Replication data for "Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods" (doi:10.7910/DVN/TMIN3H)

Document Description

Citation

Title:

Replication data for "Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods"

Identification Number:

doi:10.7910/DVN/TMIN3H

Distributor:

Harvard Dataverse

Date of Distribution:

2024-02-01

Version:

3

Bibliographic Citation:

Kenny, Christopher; McCartan, Cory; Kuriwaki, Shiro; Simko, Tyler; Imai, Kosuke, 2024, "Replication data for 'Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods'", https://doi.org/10.7910/DVN/TMIN3H, Harvard Dataverse, V3

Study Description

Citation

Title:

Replication data for "Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods"

Identification Number:

doi:10.7910/DVN/TMIN3H

Authoring Entity:

Kenny, Christopher (Harvard University)

McCartan, Cory (New York University)

Kuriwaki, Shiro (Yale University)

Simko, Tyler (Harvard University)

Imai, Kosuke (Harvard University)

Distributor:

Harvard Dataverse

Access Authority:

Kenny, Christopher

Depositor:

Kenny, Christopher

Date of Deposit:

2024-01-29

Holdings Information:

https://doi.org/10.7910/DVN/TMIN3H

Study Scope

Keywords:

Law, Social Sciences

Abstract:

The United States Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm employed for the 2020 Census and the swapping algorithm implemented for the three previous Censuses. Our evaluation leverages the Noisy Measurement File (NMF) as well as two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF contains too much noise to be directly useful, especially for Hispanic and multiracial populations. TopDown's post-processing dramatically reduces the NMF noise and produces data whose accuracy is similar to that of swapping. While the estimated errors for both TopDown and swapping algorithms are generally no greater than other sources of Census error, they can be relatively substantial for geographies with small total populations.
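As a rough illustration of the two error measures named in the abstract, the sketch below computes the bias (mean signed error) and the root-mean-square error (RMSE) of privacy-protected counts relative to original counts for a set of geographies. It is a generic textbook definition, not the estimation procedure used in the paper (that is implemented in the R scripts in this dataset); the function name `bias_and_rmse` and the toy counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def bias_and_rmse(original: Sequence[float], protected: Sequence[float]) -> Tuple[float, float]:
    """Return (bias, rmse) of protected counts relative to original counts.

    Generic definitions (an assumption, not the paper's estimator):
      bias = mean(protected - original)
      rmse = sqrt(mean((protected - original) ** 2))
    """
    if len(original) != len(protected) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Signed error for each geography
    errors = [p - o for o, p in zip(original, protected)]
    n = len(errors)
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return bias, rmse


# Hypothetical toy counts for four small geographies (not Census data)
original_counts = [12, 0, 45, 7]
protected_counts = [10, 2, 47, 7]
print(bias_and_rmse(original_counts, protected_counts))
```

The two measures are linked: the squared RMSE equals the squared bias plus the (population) variance of the errors, so RMSE captures both systematic bias and random noise, while bias alone can be near zero even when individual counts are badly perturbed.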

Other Study Description Materials

Related Studies

Abowd et al., "2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (2023-04-03)", Harvard Dataverse. https://doi.org/10.7910/DVN/1OR2A6

Related Publications

Citation

Title:

Evaluating bias and noise induced by the U.S. Census Bureau’s privacy protection methods

Identification Number:

10.1126/sciadv.adl2524

Bibliographic Citation:

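Kenny, Christopher; McCartan, Cory; Kuriwaki, Shiro; Simko, Tyler; Imai, Kosuke, 2024, "Evaluating bias and noise induced by the U.S. Census Bureau’s privacy protection methods", Science Advances, Vol. 10, Issue 18. https://doi.org/10.1126/sciadv.adl2524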

Citation

Title:

Evaluating Bias and Noise Induced by the U.S. Census Bureau’s Privacy Protection Methods (arXiv preprint)

Identification Number:

2306.07521

Bibliographic Citation:

Kenny, Christopher; McCartan, Cory; Kuriwaki, Shiro; Simko, Tyler; Imai, Kosuke, 2023, “Evaluating Bias and Noise Induced by the U.S. Census Bureau’s Privacy Protection Methods.” arXiv, October 7, 2023. https://doi.org/10.48550/arXiv.2306.07521

Other Study-Related Materials

00_README.md (text/markdown): Top-level README
d_natl.rds (application/octet-stream): Cleaned rectangular dataset for most figures and tables
00_run_r_geo.R (type/x-r-syntax)
00_run_r_proc.R (type/x-r-syntax)
README.md (text/markdown)
build_aian.R (type/x-r-syntax)
build_gaf.R (type/x-r-syntax)
gaf.R (type/x-r-syntax)
README.md (text/markdown)
01_build_nmf_all.R (type/x-r-syntax)
02_build_state_dbs.R (type/x-r-syntax)
build_db_de.sql (application/x-sql)
build_duckdb.R (type/x-r-syntax)
build_nmf.R (type/x-r-syntax)
build_pl.R (type/x-r-syntax)
build_ppmf21.R (type/x-r-syntax)
build_ppmf23.R (type/x-r-syntax)
fix_nmf_mis.R (type/x-r-syntax)
nmf.R (type/x-r-syntax)
nmf_baf.R (type/x-r-syntax)
nmf_make_agg.R (type/x-r-syntax)
nmf_stats.R (type/x-r-syntax)
nmf_workflow.R (type/x-r-syntax)
pl.R (type/x-r-syntax)
README.md (text/markdown): Readme for R_proc
utils.R (type/x-r-syntax)
00_setup.R (type/x-r-syntax)
01_create_db.R (type/x-r-syntax)
02_fig-01_pop-error.R (type/x-r-syntax)
03_fig-02_racepop-rmse.R (type/x-r-syntax)
04_fig-03-04_pop-bias-rmse_by-totpop.R (type/x-r-syntax)
05_fig-05_noisy-reg_lpme.R (type/x-r-syntax)
06-figs-bias-rmse_appendix.R (type/x-r-syntax)
07_tda_corr.R (type/x-r-syntax)
README.md (text/markdown)
bafs.R (type/x-r-syntax)
build_db_parq.sql (application/x-sql)
census_api.R (type/x-r-syntax)
consts.R (type/x-r-syntax)
db.R (type/x-r-syntax)
geo.R (type/x-r-syntax)
stats.R (type/x-r-syntax)