Part 1: Document Description
|
Citation |
|
---|---|
Title: |
2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (2023-04-03) |
Identification Number: |
doi:10.7910/DVN/1OR2A6 |
Distributor: |
Harvard Dataverse |
Date of Distribution: |
2023-05-02 |
Version: |
4 |
Bibliographic Citation: |
Abowd, John M.; Ashmead, Robert; Cumings-Menon, Ryan; Garfinkel, Simson; Heineck, Micah; Heiss, Christine; Johns, Robert; Kifer, Daniel; Leclerc, Philip; Machanavajjhala, Ashwin; Moran, Brett; Sexton, William; Spence, Matthew; Zhuravlev, Pavel, 2023, "2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (2023-04-03)", https://doi.org/10.7910/DVN/1OR2A6, Harvard Dataverse, V4 |
Citation |
|
Title: |
2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (2023-04-03) |
Identification Number: |
doi:10.7910/DVN/1OR2A6 |
Authoring Entity: |
Abowd, John M. (U.S. Census Bureau) |
Ashmead, Robert (U.S. Census Bureau) |
|
Cumings-Menon, Ryan (U.S. Census Bureau) |
|
Garfinkel, Simson (Formerly, U.S. Census Bureau) |
|
Heineck, Micah (Knexus Research Corporation) |
|
Heiss, Christine (Knexus Research Corporation) |
|
Johns, Robert (Knexus Research Corporation) |
|
Kifer, Daniel (U.S. Census Bureau) |
|
Leclerc, Philip (U.S. Census Bureau) |
|
Machanavajjhala, Ashwin (Duke University; Tumult Labs) |
|
Moran, Brett (U.S. Census Bureau) |
|
Sexton, William (Formerly, U.S. Census Bureau; Tumult Labs) |
|
Spence, Matthew (U.S. Census Bureau) |
|
Zhuravlev, Pavel (U.S. Census Bureau) |
|
Producer: |
U.S. Census Bureau |
Distributor: |
Harvard Dataverse |
Depositor: |
Barbosa, Sonia |
Date of Deposit: |
2023-04-11 |
Series Name: |
Redistricting (PL), DHC |
Holdings Information: |
https://doi.org/10.7910/DVN/1OR2A6 |
Study Scope |
|
Keywords: |
Social Sciences, census, housing, housing units, redistricting, voting age, noisy measurements, noise infusion, differential privacy, disclosure avoidance, population, race, ethnicity, group quarters, Hispanic, Latino |
Abstract: |
The 2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (2023-04-03), referred to here as the NMF, is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA) (as described in Abowd, J. et al. [2022] https://doi.org/10.1162/99608f92.529e3cb9, and implemented in https://github.com/uscensusbureau/DAS_2020_Redistricting_Production_Code). The NMF was produced using the official “production settings,” the final set of algorithmic parameters and privacy-loss budget allocations that were used to produce the 2020 Census Redistricting Data (P.L. 94-171) Summary File and the 2020 Census Demographic and Housing Characteristics File. |
The NMF consists of the full set of privacy-protected statistical queries (counts of individuals or housing units with particular combinations of characteristics) of confidential 2010 Census data relating to the redistricting data portion of the 2010 Demonstration Data Products Suite – Redistricting and Demographic and Housing Characteristics File – Production Settings (2023-04-03). These statistical queries, called “noisy measurements,” were produced under the zero-Concentrated Differential Privacy framework (Bun, M. and Steinke, T. [2016] https://arxiv.org/abs/1605.02065; see also Dwork, C. and Roth, A. [2014] https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf) implemented via the discrete Gaussian mechanism (Canonne, C., et al. [2023] https://arxiv.org/abs/2004.00010), which added positive or negative integer-valued noise to each of the resulting counts. The noisy measurements are an intermediate stage of the TDA, produced before the post-processing that the TDA then performs to ensure internal and hierarchical consistency within the resulting tables. The Census Bureau has released these 2010 Census demonstration data to enable data users to evaluate the expected impact of disclosure avoidance variability on 2020 Census data. The 2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (2023-04-03) has been cleared for public dissemination by the Census Bureau Disclosure Review Board (CBDRB-FY22-DSEP-004). |
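As a rough illustration of the discrete Gaussian mechanism described above, the following sketch adds integer-valued noise to a single count. It is not the DAS implementation: the function names, the truncation of the support, the floating-point arithmetic, and the example values (a count of 25 and a scale of 3.0) are all assumptions made for illustration, whereas the sampler in Canonne et al. is exact.

```python
import math
import random


def discrete_gaussian_pmf(sigma, tail=12):
    """Probabilities of a discrete Gaussian on the integers, truncated at
    +/- tail * sigma (an illustrative approximation, not an exact sampler)."""
    bound = max(1, math.ceil(tail * sigma))
    support = list(range(-bound, bound + 1))
    # Unnormalized weight of integer k is exp(-k^2 / (2 * sigma^2)).
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in support]
    total = sum(weights)
    return support, [w / total for w in weights]


def noisy_count(true_count, sigma, rng):
    """Return true_count plus one draw of integer-valued noise."""
    support, probs = discrete_gaussian_pmf(sigma)
    noise = rng.choices(support, weights=probs, k=1)[0]
    return true_count + noise


# Hypothetical example: five independent noisy measurements of one count.
rng = random.Random(2010)
measurements = [noisy_count(25, sigma=3.0, rng=rng) for _ in range(5)]
print(measurements)
```

Because the noise is symmetric around zero, a noisy measurement can fall below the true count and can even be negative when the true count is small.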
|
The data include zero-Concentrated Differentially Private (zCDP) (Bun, M. and Steinke, T. [2016]) noisy measurements, implemented via the discrete Gaussian mechanism. These are estimated counts of individuals and housing units included in the 2010 Census Edited File (CEF), which includes confidential data initially collected in the 2010 Census of Population and Housing. The noisy measurements included in this file were subsequently post-processed by the TopDown Algorithm (TDA) to produce the 2010 Census Production Settings Privacy-Protected Microdata File - Redistricting (P.L. 94-171) and Demographic and Housing Characteristics File (2023-04-03) (https://www2.census.gov/programs-surveys/decennial/2020/program-management/data-product-planning/2010-demonstration-data-products/04-Demonstration_Data_Products_Suite/2023-04-03/). As these 2010 Census demonstration data are intended to support study of the design and expected impacts of the 2020 Disclosure Avoidance System, the 2010 CEF records were pre-processed before application of the zCDP framework. This pre-processing converted the 2010 CEF records into the input-file format, response codes, and tabulation categories used for the 2020 Census, which differ in substantive ways from the format, response codes, and tabulation categories originally used for the 2010 Census. |
|
The NMF provides noisy estimates of counts of persons in the CEF by various characteristics and combinations of characteristics, including their reported race and ethnicity, whether they were of voting age, whether they resided in a housing unit or one of 7 group quarters types, and their census block of residence. Each estimate is the confidential count after the addition of discrete Gaussian noise, with the scale parameter determined by the privacy-loss budget allocation for that particular query under zCDP. Noisy measurements of the counts of occupied and vacant housing units by census block are also included. Lastly, data on constraints are provided: information into which no noise was infused by the Disclosure Avoidance System (DAS) and which was used by the TDA to post-process the noisy measurements into the 2010 Census Production Settings Privacy-Protected Microdata File - Redistricting (P.L. 94-171) and Demographic and Housing Characteristics File (2023-04-03). |
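Because each noisy measurement is a count plus zero-mean noise of known scale, an approximate confidence interval can be formed around it. The sketch below is only a minimal illustration under stated assumptions: it uses a normal approximation to the discrete Gaussian, the function names are hypothetical, and the sensitivity of 1 in `rho_to_sigma` is an assumed default rather than the value the DAS uses for any particular query. The relation sigma^2 = sensitivity^2 / (2 * rho) is the standard zCDP guarantee for Gaussian-type noise. In practice the scale for a given query should be taken from the NMF documentation, not derived this way.

```python
import math
from statistics import NormalDist


def rho_to_sigma(rho, sensitivity=1.0):
    """Noise scale implied by a zCDP budget rho for a query with the given
    L2 sensitivity (sensitivity=1.0 is an assumption for illustration)."""
    return sensitivity / math.sqrt(2.0 * rho)


def confidence_interval(noisy_value, sigma, level=0.95):
    """Approximate interval for the confidential count, treating the
    discrete Gaussian noise as normal with standard deviation sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma
    return noisy_value - half_width, noisy_value + half_width


# Hypothetical values: a noisy count of 100 measured with scale sigma = 5.
low, high = confidence_interval(100, sigma=5.0)
print(f"95% interval: [{low:.1f}, {high:.1f}]")
```

The interval describes only the noise added by the mechanism. It says nothing about other sources of error in the underlying census data.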
|
Country: |
United States, Puerto Rico |
Unit of Analysis: |
Housing Units (Units), Persons (Person) |
Kind of Data: |
Constraints objects, DPQuery objects |
Methodology and Processing |
|
Sources Statement |
|
Data Sources: |
The primary source for the 2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (2023-04-03) was direct collection of responses from the population of the United States. Several source documents are useful for understanding the NMFs. Chief among these are: |
2010 Census Redistricting Data (P.L. 94-171) Summary File Technical Documentation https://www.census.gov/data/datasets/2010/dec/redistricting-file-pl-94-171.html |
|
2020 Census Redistricting Data (P.L. 94-171) Summary File Technical Documentation https://www.census.gov/programs-surveys/decennial-census/technical-documentation/complete-technical-documents.html#redistricting |
|
DAS 2020 Redistricting Production Code Release https://github.com/uscensusbureau/DAS_2020_Redistricting_Production_Code (public GitHub repository for the 2020 Census DAS, vintaged as of the commit used to produce the official production run of the Redistricting product). The zCDP framework NMFs were generated in a for-internal-use-only pickled form (https://docs.python.org/3/library/pickle.html; https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.pickleFile.html) as a byproduct of the use of this code. A stand-alone script was developed and used to convert these internal-use NMFs into the Parquet format used in this product (that script is not yet publicly available). |
|
Abowd, J., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M., & Zhuravlev, P. (2022). The 2020 Census Disclosure Avoidance System TopDown Algorithm. Harvard Data Science Review, (Special Issue 2). https://doi.org/10.1162/99608f92.529e3cb9 (academic paper describing the technical details of the algorithms used in the 2020 Decennial Census DAS, focused on its design as of the release of the 2020 Census Redistricting Data (P.L. 94-171) Summary File). |
|
Cumings-Menon, R., Abowd, J., Ashmead, R., Kifer, D., Leclerc, P., Ocker, J., Ratcliffe, M., & Zhuravlev, P. (2023). Geographic Spines in the 2020 Census Disclosure Avoidance System. Journal of Privacy and Confidentiality. https://doi.org/10.48550/arXiv.2203.16654 (academic paper describing the 2020 DAS optimized geographic hierarchy used within the TopDown Algorithm). |
|
Cumings-Menon, R., Hawes, M., & Spence, M. (2023). “Computing Confidence Intervals Using the 2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (2023-04-03)” (Jupyter notebook explaining how to calculate estimates and confidence intervals from the noisy measurement files) [URL forthcoming] |
|
Data Access |
|
Other Study Description Materials |
|
Label: |
2010 Redistricting NMF 2023-11-10_Final.pdf |
Text: |
2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (2023-04-03) README File, last updated November 10, 2023: Modifications to identifiers within the parquet metadata used to support internal tracking of source data. |
Notes: |
application/pdf |
Label: |
pl94-ddps-nmf-parquetsw.zip |
Text: |
Main data files (Zip archive). Use the previewer on the file page to see the individual files in the archive. Individual files can be downloaded there, so you don't have to download the entire 16 GB zip. |
Notes: |
application/zip |