Replication Data for: Differentially Private Survey Research (doi:10.7910/DVN/X4Y2FL)

Document Description

Citation

Title:

Replication Data for: Differentially Private Survey Research

Identification Number:

doi:10.7910/DVN/X4Y2FL

Distributor:

Harvard Dataverse

Date of Distribution:

2023-12-19

Version:

1

Bibliographic Citation:

Evans, Georgina; King, Gary; Smith, Adam; Thakurta, Abhradeep, 2023, "Replication Data for: Differentially Private Survey Research", https://doi.org/10.7910/DVN/X4Y2FL, Harvard Dataverse, V1, UNF:6:1hQlAh8RGzLi+kKnI82oXw== [fileUNF]

Study Description

Citation

Title:

Replication Data for: Differentially Private Survey Research

Identification Number:

doi:10.7910/DVN/X4Y2FL

Authoring Entity:

Evans, Georgina (Harvard University)

King, Gary (Harvard University)

Smith, Adam (Boston University)

Thakurta, Abhradeep (University of California Santa Cruz)

Producer:

Georgina Evans

Distributor:

Harvard Dataverse

Access Authority:

Evans, Georgina

Depositor:

Evans, Georgina

Date of Deposit:

2022-08-29

Holdings Information:

https://doi.org/10.7910/DVN/X4Y2FL

Study Scope

Keywords:

Social Sciences, Privacy, Statistics, Inference

Abstract:

Survey researchers have long protected the privacy of respondents via de-identification (removing names and other directly identifying information) before sharing data. Although these procedures help, recent research demonstrates that they fail to protect respondents from intentional re-identification attacks, a problem that threatens to undermine vast survey enterprises in academia, government, and industry. This is especially a problem in political science because political beliefs are not merely the subject of our scholarship; they represent some of the most important information respondents want to keep private. We confirm the problem in practice by re-identifying individuals from a survey about a controversial referendum declaring life beginning at conception. We build on the concept of “differential privacy” to offer new data sharing procedures with mathematical guarantees for protecting respondent privacy and statistical validity guarantees for social scientists analyzing differentially private data. The cost of these new procedures is larger standard errors, which can be overcome with somewhat larger sample sizes.
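As a rough illustration of the trade-off the abstract describes, the sketch below applies randomized response, a textbook local differential privacy mechanism, to a single binary survey item. It is not the procedure used in the article or in the replication scripts. The function names, the true proportion, the sample size, and the value of epsilon are all assumptions chosen for illustration.

```python
import math
import random

def randomized_response(answer, epsilon, rng):
    """Report the true 0/1 answer with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return answer if rng.random() < p_truth else 1 - answer

def debiased_mean(reports, epsilon):
    """Unbiased estimate of the true proportion from the randomized reports."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = (2p - 1) * true_prop + (1 - p), so invert that relation.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

def standard_error(true_prop, n, epsilon=None):
    """Analytic standard error of the proportion estimate, with or without privacy noise."""
    if epsilon is None:
        return math.sqrt(true_prop * (1.0 - true_prop) / n)
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    q = p * true_prop + (1.0 - p) * (1.0 - true_prop)  # P(report == 1)
    return math.sqrt(q * (1.0 - q) / n) / (2.0 * p - 1.0)

rng = random.Random(0)
n, true_prop, epsilon = 5000, 0.3, 1.5  # hypothetical values, not taken from the study

answers = [1 if rng.random() < true_prop else 0 for _ in range(n)]
reports = [randomized_response(a, epsilon, rng) for a in answers]

estimate = debiased_mean(reports, epsilon)
se_plain = standard_error(true_prop, n)
se_private = standard_error(true_prop, n, epsilon)
print(f"estimate={estimate:.3f}  se_plain={se_plain:.4f}  se_private={se_private:.4f}")
```

Because both standard errors shrink at the rate 1/sqrt(n), multiplying the sample size by roughly (se_private / se_plain)^2 would bring the private estimate back to the non-private precision in this toy setting.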

Notes:

This dataset underwent an independent verification process, complying with the AJPS Verification Policy updated June 2023, that replicated the tables and figures in the primary article. For the supplementary materials, verification was performed solely for the successful execution of code. The verification process was carried out by the Odum Institute for Research in Social Science at the University of North Carolina at Chapel Hill.

The associated article has been awarded the Open Materials Badge. Learn more about the Open Practice Badges from the Center for Open Science (https://osf.io/tvyxz/wiki/home/).

Methodology and Processing

Sources Statement

Data Sources:

Rosenfeld, Bryn; Imai, Kosuke; Shapiro, Jacob, 2015, "Replication Data for: An Empirical Validation Study of Popular Survey Methodologies for Sensitive Questions", https://doi.org/10.7910/DVN/29911, Harvard Dataverse, V3, UNF:5:wfSfR7xnbL9XigVosud4zA== [fileUNF]

Data Access

Disclaimer:

The American Journal of Political Science and the Odum Institute for Research in Social Science are not responsible for the accuracy or quality of data uploaded within the AJPS Dataverse, for the use of those data, or for interpretations or conclusions based on their use.

Other Study Description Materials

Related Publications

Citation

Title:

Evans, Georgina, Gary King, Adam D. Smith, and Abhradeep Thakurta. [date]. "Differentially Private Survey Research." American Journal of Political Science. Forthcoming. http://ajps.org/

Bibliographic Citation:

Evans, Georgina, Gary King, Adam D. Smith, and Abhradeep Thakurta. [date]. "Differentially Private Survey Research." American Journal of Political Science. Forthcoming. http://ajps.org/

File Description--f7673753

File: k_sims.tab

  • Number of cases: 800

  • No. of variables per record: 9

  • Type of File: text/tab-separated-values

Notes:

UNF:6:khzDUos3LKNK++SA0qos2g==

File Description--f7673746

File: main_sims.tab

  • Number of cases: 1550

  • No. of variables per record: 9

  • Type of File: text/tab-separated-values

Notes:

UNF:6:ybdf76nGv0zkZnO8zgTgPA==
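A downloaded copy of either file can be checked against the case and variable counts listed above. The sketch below is one way to do that with the Python standard library. It assumes each .tab file starts with a header row of variable names, and the helper names and the local file path in the commented usage are hypothetical.

```python
import csv
import io

# Expected (cases, variables) copied from the file descriptions above.
EXPECTED = {"k_sims.tab": (800, 9), "main_sims.tab": (1550, 9)}

def read_tab(handle):
    """Read a tab-separated file with a header row; return (column names, data rows)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows

def check_shape(name, header, rows):
    """Return True if the row and column counts match the codebook entry for `name`."""
    cases, variables = EXPECTED[name]
    return len(rows) == cases and len(header) == variables

# Tiny in-memory stand-in so the sketch runs without the download.
sample = io.StringIO("true_coef\tsynth_coef\n1.5\t1.2\n1.4\t1.3\n")
header, rows = read_tab(sample)
print(header, len(rows))

# With the real file in the working directory (path is an assumption):
# with open("k_sims.tab", newline="") as fh:
#     header, rows = read_tab(fh)
#     print(check_shape("k_sims.tab", header, rows))
```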

Variable Description

List of Variables:

Variables

true_coef

Location: f7673753 (k_sims.tab)

Summary Statistics: Valid 800.0; Min. 1.2720189369362973; Max. 1.73668069093032; Mean 1.5027557072832878; StDev 0.07683358665682588

Variable Format: numeric

Notes: UNF:6:GKN3YY2uNDsFiGZWUSkBHg==

synth_coef

Location: f7673753 (k_sims.tab)

Summary Statistics: Valid 800.0; Min. 0.31969283513689173; Max. 2.1259068620668202; Mean 1.2466164126791646; StDev 0.22221282330510794

Variable Format: numeric

Notes: UNF:6:yJIllsz96IZ/8fyrYeI5hQ==

llm_coef

Location: f7673753 (k_sims.tab)

Summary Statistics: Valid 796.0; Min. -0.19429541441521395; Max. 5.679337874311988; Mean 1.5270863495760985; StDev 0.4967055137536442

Variable Format: numeric

Notes: UNF:6:6E/KCxvA7Utjhd3mbs+qNg==

llm_se

Location: f7673753 (k_sims.tab)

Summary Statistics: Valid 796.0; Min. 0.206101091406152; Max. 25.862257672408422; Mean 0.522523773934766; StDev 0.98766294931149

Variable Format: numeric

Notes: UNF:6:w3xcf1mQ9e4AG9+ug3p1/Q==

llm_em

Location: f7673753 (k_sims.tab)

Summary Statistics: Valid 800.0; Min. 0.6788424860121531; Max. 2.5343270594144687; Mean 1.4828276768855337; StDev 0.21934761771782632

Variable Format: numeric

Notes: UNF:6:S0m2BS9P8369gA9Od3US8A==

llm_em_var

Location: f7673753 (k_sims.tab)

Summary Statistics: Valid 800.0; Min. 0.01640300148457335; Max. 0.4071612050104414; Mean 0.05125408519622568; StDev 0.045879262059710504

Variable Format: numeric

Notes: UNF:6:AeCgCpaHo5CnPLSCQclkAQ==

epsilon_local

Location: f7673753 (k_sims.tab)

Summary Statistics: Valid 800.0; Min. 1.5; Max. 3.0; Mean 2.53125; StDev 0.6183153158278016

Variable Format: numeric

Notes: UNF:6:+59sRB2TkDe5CI8P94mzGQ==

n

Location: f7673753 (k_sims.tab)

Summary Statistics: Valid 800.0; Min. 5000.0; Max. 5000.0; Mean 5000.0; StDev 0.0

Variable Format: numeric

Notes: UNF:6:JyJP7JOyP61F787Y6DyZTA==

k

Location: f7673753 (k_sims.tab)

Summary Statistics: Valid 800.0; Min. 92.0; Max. 212.0; Mean 122.0; StDev 51.994030657663366

Variable Format: numeric

Notes: UNF:6:hhZ9KH/3POhXWJ3FuoKY+A==

true_coef

Location: f7673746 (main_sims.tab)

Summary Statistics: Valid 1550.0; Min. 0.9271491035473647; Max. 2.166432466885211; Mean 1.487662551637857; StDev 0.1621077666802211

Variable Format: numeric

Notes: UNF:6:DaAk1XB78OrNABzNCbuunQ==

synth_coef

Location: f7673746 (main_sims.tab)

Summary Statistics: Valid 1550.0; Min. 0.34355057155357593; Max. 1.9453736404322763; Mean 1.2693089130887847; StDev 0.2215281691873661

Variable Format: numeric

Notes: UNF:6:gmh1EHh88Zg6spUS+tdBbQ==

synth_se

Location: f7673746 (main_sims.tab)

Summary Statistics: Valid 1550.0; Min. 0.15125494273371956; Max. 0.3043607087676973; Mean 0.20741768677634173; StDev 0.028521218308317123

Variable Format: numeric

Notes: UNF:6:Ou0OXSxt+8Pm3XARjqPZDA==

llm_coef

Location: f7673746 (main_sims.tab)

Summary Statistics: Valid 1550.0; Min. -0.21938981837435853; Max. 4.19173586661004; Mean 1.5227978795523087; StDev 0.41882022328105667

Variable Format: numeric

Notes: UNF:6:aBdesdIF+waLdxaDDpqDaA==

llm_se

Location: f7673746 (main_sims.tab)

Summary Statistics: Valid 1550.0; Min. 0.15889221856021768; Max. 6.723271544477865; Mean 0.40696236188501206; StDev 0.3047595822764976

Variable Format: numeric

Notes: UNF:6:gDTXC90HjG7YmnejDXLM1Q==

llm_em

Location: f7673746 (main_sims.tab)

Summary Statistics: Valid 1550.0; Min. 0.6478957658974533; Max. 2.598813202844112; Mean 1.496106607240939; StDev 0.24124068894675973

Variable Format: numeric

Notes: UNF:6:H2epMSQ6JntFaG3Bz2EnPg==

epsilon_local

Location: f7673746 (main_sims.tab)

Summary Statistics: Valid 1550.0; Min. 3.5; Max. 8.0; Mean 4.516129032258072; StDev 1.3277703231781692

Variable Format: numeric

Notes: UNF:6:lIseoamlehdkVDwiKBbwXA==

n

Location: f7673746 (main_sims.tab)

Summary Statistics: Valid 1550.0; Min. 1000.0; Max. 1000.0; Mean 1000.0; StDev 0.0

Variable Format: numeric

Notes: UNF:6:jtmgwiRlZO+snpnlcFD/1g==

llm_em_var

Location: f7673746 (main_sims.tab)

Summary Statistics: Valid 600.0; Min. 0.025096825477179112; Max. 0.16255296044189924; Mean 0.056179799476229024; StDev 0.02331268246773068

Variable Format: numeric

Notes: UNF:6:wwO1f0u9hpU4OMsu0dQtBg==
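The reported means allow a rough comparison of each estimator's average with the average of true_coef. The sketch below does that subtraction for k_sims.tab (f7673753) using only figures listed in this codebook. Treating true_coef as the target quantity is an assumption based on the variable names, which the codebook does not define. Because llm_coef has 796 valid cases rather than 800, its difference is not computed over exactly the same rows as the others.

```python
# Means copied from the k_sims.tab (f7673753) entries in this codebook.
means = {
    "true_coef": 1.5027557072832878,
    "synth_coef": 1.2466164126791646,
    "llm_coef": 1.5270863495760985,  # 796 valid cases, not 800
    "llm_em": 1.4828276768855337,
}

def mean_gap(estimator, reference="true_coef"):
    """Difference between an estimator's reported mean and the reference mean."""
    return means[estimator] - means[reference]

for name in ("synth_coef", "llm_coef", "llm_em"):
    print(f"{name}: {mean_gap(name):+.4f}")
```

A difference of means is only a summary. Row-level comparisons require the data files themselves.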

Other Study-Related Materials

  • fig2a.R: type/x-r-syntax

  • fig2b.R: type/x-r-syntax

  • fig3.R: type/x-r-syntax

  • fig4.R: type/x-r-syntax

  • README: text/plain; charset=US-ASCII

  • RUN_ALL.R: type/x-r-syntax

  • run_simulations.R: type/x-r-syntax

  • run_simulations_k.R: type/x-r-syntax

  • simulation_functions.R: type/x-r-syntax

  • dp_hier_hist.R: type/x-r-syntax

  • em_code.R: type/x-r-syntax

  • logit_regression.R: type/x-r-syntax

  • poisson_regression.R: type/x-r-syntax

  • util_functions.R: type/x-r-syntax