Replication Data for: Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset (doi:10.7910/DVN/UDFZJD)

Document Description

Citation

Title:

Replication Data for: Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset

Identification Number:

doi:10.7910/DVN/UDFZJD

Distributor:

Harvard Dataverse

Date of Distribution:

2021-09-04

Version:

1

Bibliographic Citation:

Evans, Georgina; King, Gary, 2021, "Replication Data for: Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset", https://doi.org/10.7910/DVN/UDFZJD, Harvard Dataverse, V1, UNF:6:qVAL2iA9dusDRaLhZ1X4xg== [fileUNF]

Study Description

Citation

Title:

Replication Data for: Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset

Identification Number:

doi:10.7910/DVN/UDFZJD

Authoring Entity:

Evans, Georgina (Harvard University)

King, Gary (Harvard University)

Producer:

Political Analysis

Distributor:

Harvard Dataverse

Access Authority:

Evans, Georgina

Depositor:

Evans, Georgina

Date of Deposit:

2020-11-21

Holdings Information:

https://doi.org/10.7910/DVN/UDFZJD

Study Scope

Keywords:

Social Sciences

Abstract:

We offer methods to analyze the “differentially private” Facebook URLs Dataset which, at over 17 trillion cell values, is one of the largest social science research datasets ever constructed. The version of differential privacy used in the URLs dataset has specially calibrated random noise added, which provides mathematical guarantees for the privacy of individual research subjects while still making it possible to learn about aggregate patterns of interest to social scientists. Unfortunately, random noise creates measurement error which induces statistical bias, including attenuation, exaggeration, switched signs, or incorrect uncertainty estimates. We adapt methods developed to correct for naturally occurring measurement error, with special attention to computational efficiency for large datasets. The result is statistically valid linear regression estimates and descriptive statistics that can be interpreted as ordinary analyses of non-confidential data but with appropriately larger standard errors.
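The attenuation mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator; it uses the textbook errors-in-variables rescaling for a single regressor, assuming Gaussian noise with a known standard deviation is added to the covariate. All names (ols_slope, corrected_slope, noise_sd) and all numeric settings are hypothetical choices made for illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (single regressor)."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def corrected_slope(x_noisy, y, noise_sd):
    """Rescale the naive slope using the known variance of the added noise."""
    n = len(x_noisy)
    mean_x = statistics.fmean(x_noisy)
    var_noisy = sum((a - mean_x) ** 2 for a in x_noisy) / (n - 1)
    # Share of the observed variance that comes from the true covariate.
    reliability = (var_noisy - noise_sd ** 2) / var_noisy
    return ols_slope(x_noisy, y) / reliability


rng = random.Random(0)
n = 20000
true_beta = 2.0
noise_sd = 1.0  # assumption: the noise scale is known to the analyst

x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [true_beta * xi + rng.gauss(0.0, 0.5) for xi in x]
x_noisy = [xi + rng.gauss(0.0, noise_sd) for xi in x]  # noisy covariate

naive = ols_slope(x_noisy, y)
corrected = corrected_slope(x_noisy, y, noise_sd)
print(f"true={true_beta:.2f} naive={naive:.2f} corrected={corrected:.2f}")
```

With these settings the naive slope should land near half the true value, since the noise variance equals the variance of the true covariate, while the rescaled slope should land near the true value with a wider sampling spread.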

Notes:

For privacy reasons, the data used to generate the Facebook URLs results reported in Table 1 are not included in these materials. The code for producing the results can be found in the FB_code subfolder. More information on how to apply for access to the data can be found in README.txt.

Methodology and Processing

Sources Statement

Data Access

Notes:

CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0)

Other Study Description Materials

Related Publications

Citation

Title:

Forthcoming, Political Analysis

Bibliographic Citation:

Forthcoming, Political Analysis

File Description--f4771597

File: main_catalist_file.tab

  • Number of cases: 2575

  • No. of variables per record: 41

  • Type of File: text/tab-separated-values

Notes:

UNF:6:qVAL2iA9dusDRaLhZ1X4xg==
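A quick way to confirm that a downloaded copy matches the case and variable counts above is sketched below. It uses only the Python standard library; the helper names (load_tab_file, check_shape) are hypothetical, and the in-memory sample rows are invented stand-ins rather than values from the real file.

```python
import csv
import io


def load_tab_file(handle):
    """Read a tab-separated file into a header list and a list of row dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows


def check_shape(fieldnames, rows, expected_cases, expected_variables):
    """Return True when the row and column counts match the codebook."""
    return len(rows) == expected_cases and len(fieldnames) == expected_variables


# Tiny invented sample so the sketch runs without the real file.
sample = io.StringIO("stdist\tstabbr\tdistrict\nAL-01\tAL\t01\nAL-02\tAL\t02\n")
names, records = load_tab_file(sample)
sample_ok = check_shape(names, records, expected_cases=2, expected_variables=3)

# With the real file, the counts reported in this codebook would be used:
# with open("main_catalist_file.tab", newline="") as fh:
#     names, records = load_tab_file(fh)
#     print(check_shape(names, records, 2575, 41))
```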

Variable Description

List of Variables:

Variables

stdist

f4771597 Location:

Variable Format: character

Notes: UNF:6:LS48NLdl3G9DFBocYCHbIw==

stabbr

f4771597 Location:

Variable Format: character

Notes: UNF:6:WT/sXRk+BHWq9OVoksxYrw==

district

f4771597 Location:

Variable Format: character

Notes: UNF:6:iBN2xNSOfhR5SRsX7FZqcw==

sh_medinc

f4771597 Location:

Summary Statistics: StDev 1.6015785812785879; Max. 14.35802; Mean 4.668674793009709; Min. 1.786967; Valid 2575.0

Variable Format: numeric

Notes: UNF:6:WCyJ6HR9GVcxzJe6c6CzFA==

sh_white

f4771597 Location:

Summary Statistics: Mean 0.7852712333203883; Min. 0.0168164; Valid 2575.0; StDev 0.2178179025537323; Max. 0.9897156

Variable Format: numeric

Notes: UNF:6:fVBKK0ByC8msoEfgOpFjaQ==

sh_black

f4771597 Location:

Summary Statistics: StDev 0.14579735883144546; Max. 0.9286104; Valid 2575.0; Mean 0.08031672000000001; Min. 0.0

Variable Format: numeric

Notes: UNF:6:1Z/fBSBLfS/phAwAPQ+9kw==

state_income

f4771597 Location:

Summary Statistics: Valid 2575.0; Max. 64854.25; Min. 31767.06; StDev 8073.5588268243055; Mean 48051.13669902909

Variable Format: numeric

Notes: UNF:6:d6KSxxjPL6nIT2a9JpoiGA==

st_inc_q

f4771597 Location:

Summary Statistics: Valid 2575.0; StDev 1.418530023430273; Max. 5.0; Mean 3.1712621359223228; Min. 1.0

Variable Format: numeric

Notes: UNF:6:MnV8JwXuopmClqWK/FTKsg==

D.y.020

f4771597 Location:

Summary Statistics: Max. 34499.0; StDev 2685.384561746896; Valid 2575.0; Min. 0.0; Mean 925.3417475728132

Variable Format: numeric

Notes: UNF:6:FfEjdSufU+yReZkrPaVkAg==

R.y.020

f4771597 Location:

Summary Statistics: Mean 220.89009708737754; Valid 2575.0; StDev 646.2852772838223; Max. 15809.0; Min. 0.0

Variable Format: numeric

Notes: UNF:6:kCOKgo7orBN9+vWqCTmCzA==

D.y.2040

f4771597 Location:

Summary Statistics: Max. 73740.0; StDev 8356.60565560796; Valid 2575.0; Mean 6419.216310679602; Min. 3.0

Variable Format: numeric

Notes: UNF:6:VTAspNorI1Wk+DliGQPoxA==

R.y.2040

f4771597 Location:

Summary Statistics: Min. 6.0; Mean 3689.7817475728093; Max. 59272.0; Valid 2575.0; StDev 5290.988997054812

Variable Format: numeric

Notes: UNF:6:XWN+eSmpjlxBVAKoloTTiA==

D.y.4060

f4771597 Location:

Summary Statistics: Mean 5276.893980582549; Valid 2575.0; Max. 57834.0; StDev 7124.068325734575; Min. 0.0

Variable Format: numeric

Notes: UNF:6:/C/+fEbHQyzpj1Zw1SWlZQ==

R.y.4060

f4771597 Location:

Summary Statistics: StDev 6007.922747663919; Valid 2575.0; Mean 4210.49359223303; Min. 5.0; Max. 52072.0

Variable Format: numeric

Notes: UNF:6:dH/eqOeSwjRD3Dk0aJ2QGA==

D.y.6080

f4771597 Location:

Summary Statistics: Max. 50476.0; Min. 0.0; Valid 2575.0; StDev 4980.945643481704; Mean 2456.395339805801

Variable Format: numeric

Notes: UNF:6:vTGB4YSBshJUm6w3FJ6RdA==

R.y.6080

f4771597 Location:

Summary Statistics: Valid 2575.0; Min. 0.0; Mean 2293.9300970873855; StDev 4606.908381025664; Max. 43345.0

Variable Format: numeric

Notes: UNF:6:8rl4EMVXAvsy7x1UNu1vbw==

D.y.80100

f4771597 Location:

Summary Statistics: StDev 2472.6452203818612; Min. 0.0; Mean 886.0170873786361; Valid 2575.0; Max. 31148.0

Variable Format: numeric

Notes: UNF:6:NYSmC80o4VHC3tFOs2bRwA==

R.y.80100

f4771597 Location:

Summary Statistics: Valid 2575.0; Mean 898.9988349514695; Min. 0.0; Max. 29610.0; StDev 2515.513083996233

Variable Format: numeric

Notes: UNF:6:VBGmAvwaAp33fCuG+5d8Ow==

D.y.100120

f4771597 Location:

Summary Statistics: Max. 20299.0; StDev 1275.728164593118; Mean 334.0908737864126; Min. 0.0; Valid 2575.0

Variable Format: numeric

Notes: UNF:6:WcEX103NOOSZ3yVO+pd1vQ==

R.y.100120

f4771597 Location:

Summary Statistics: Max. 26850.0; Mean 357.18912621358567; StDev 1415.5299724527256; Valid 2575.0; Min. 0.0

Variable Format: numeric

Notes: UNF:6:90wZie6mWnrlQJ7lmpW4Fw==

D.y.120140

f4771597 Location:

Summary Statistics: Valid 2575.0; Mean 116.6784466019394; Max. 14471.0; StDev 618.2458585621796; Min. 0.0

Variable Format: numeric

Notes: UNF:6:O/HOIy9S2oUh7or9KBEBNQ==

R.y.120140

f4771597 Location:

Summary Statistics: Min. 0.0; Valid 2575.0; Max. 10475.0; Mean 115.03378640776559; StDev 565.2034083384813

Variable Format: numeric

Notes: UNF:6:mjkz5zdjsmX1UVOLmxBKgA==

D.y.140160

f4771597 Location:

Summary Statistics: StDev 333.8456235187749; Valid 2575.0; Min. 0.0; Mean 52.816699029126255; Max. 7789.0

Variable Format: numeric

Notes: UNF:6:ajLjRZhtPm2KA+O6s70aww==

R.y.140160

f4771597 Location:

Summary Statistics: Min. 0.0; Mean 55.21786407767051; Valid 2575.0; StDev 323.27539839637865; Max. 7487.0

Variable Format: numeric

Notes: UNF:6:KmBLwXfEi430qrAtB+4K5g==

R.y.160180

f4771597 Location:

Summary Statistics: StDev 179.3671986775572; Mean 23.071844660194206; Max. 4736.0; Valid 2575.0; Min. 0.0

Variable Format: numeric

Notes: UNF:6:3J16oufz02TLCPMEp3/6kA==

D.y.180200

f4771597 Location:

Summary Statistics: StDev 234.39033008007047; Min. 0.0; Valid 2575.0; Mean 28.771262135922523; Max. 5005.0

Variable Format: numeric

Notes: UNF:6:x1qeNIocl5q8WXFUnRHcMA==

R.y.180200

f4771597 Location:

Summary Statistics: Valid 2575.0; Max. 3422.0; StDev 216.20907075068814; Min. 0.0; Mean 31.210873786408175

Variable Format: numeric

Notes: UNF:6:wY7XFcc42hQthH/T72Vr6w==

D.unk

f4771597 Location:

Summary Statistics: Max. 1779.0; Min. 0.0; Mean 164.29980582524362; Valid 2575.0; StDev 232.91054149170213

Variable Format: numeric

Notes: UNF:6:rle2q7+sljLbgmeuzXx9Vg==

R.unk

f4771597 Location:

Summary Statistics: Mean 118.96932038834858; Max. 5530.0; Min. 0.0; StDev 245.51080741273543; Valid 2575.0

Variable Format: numeric

Notes: UNF:6:He4RMFOiG7XroPhtrgVZSw==

O.y.020

f4771597 Location:

Summary Statistics: StDev 781.2274010011688; Valid 2575.0; Mean 313.24388349514504; Max. 12669.0; Min. 0.0

Variable Format: numeric

Notes: UNF:6:GrKwPb7jeOjaCkPvjnfEyA==

O.y.2040

f4771597 Location:

Summary Statistics: Min. 4.0; StDev 4211.705878379926; Valid 2575.0; Mean 3249.9658252427394; Max. 38479.0

Variable Format: numeric

Notes: UNF:6:+3kThoyr2hdTOTDFkXsZIA==

O.y.4060

f4771597 Location:

Summary Statistics: Mean 3721.5475728155116; StDev 4795.9264110917065; Min. 3.0; Max. 42296.0; Valid 2575.0

Variable Format: numeric

Notes: UNF:6:7Jf8wDj4DzxfnDqeJ6/cig==

O.y.6080

f4771597 Location:

Summary Statistics: Valid 2575.0; Mean 2037.0019417475514; Min. 0.0; Max. 32125.0; StDev 3871.5775040979274

Variable Format: numeric

Notes: UNF:6:yHkR0jdENTSf008tNsJixA==

O.y.80100

f4771597 Location:

Summary Statistics: Valid 2575.0; Min. 0.0; Mean 760.5817475728197; Max. 19987.0; StDev 2055.7406515592565

Variable Format: numeric

Notes: UNF:6:l1mlb2GuF7Vvwq8wSUWmcw==

O.y.100120

f4771597 Location:

Summary Statistics: Max. 17262.0; Valid 2575.0; Min. 0.0; Mean 297.0031067961166; StDev 1142.882679093597

Variable Format: numeric

Notes: UNF:6:jCsSMFuHsgTPJMXhS1CEDg==

O.y.120140

f4771597 Location:

Summary Statistics: Mean 108.36504854368994; Min. 0.0; Valid 2575.0; Max. 9280.0; StDev 568.7164499883312

Variable Format: numeric

Notes: UNF:6:vwiHn1ZuJCmukQSs+zZ8Fg==

O.y.140160

f4771597 Location:

Summary Statistics: Valid 2575.0; StDev 271.4118938056109; Min. 0.0; Max. 7097.0; Mean 44.58524271844617

Variable Format: numeric

Notes: UNF:6:mgT1xUO41OAQ21jB9O2gNA==

O.y.160180

f4771597 Location:

Summary Statistics: StDev 146.59679956350473; Min. 0.0; Valid 2575.0; Mean 18.624077669902697; Max. 3588.0

Variable Format: numeric

Notes: UNF:6:doRoVRprxSWONzc75Frj3g==

O.y.180200

f4771597 Location:

Summary Statistics: Max. 3167.0; Mean 23.722718446601583; Valid 2575.0; Min. 0.0; StDev 172.81952493203653

Variable Format: numeric

Notes: UNF:6:177mofd/Pg3FMrvFM+lqkA==

O.unk

f4771597 Location:

Summary Statistics: Max. 4502.0; StDev 284.82690028049075; Mean 117.32038834951393; Min. 0.0; Valid 2575.0

Variable Format: numeric

Notes: UNF:6:4+Cj3vija5d5P6DebE/XIQ==

D.y.160180

f4771597 Location:

Summary Statistics: Max. 5541.0; Mean 23.871067961165135; Valid 2575.0; Min. 0.0; StDev 205.9961424893673

Variable Format: numeric

Notes: UNF:6:97HPPmEbSOIdWW3cWMvCFA==
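The summary statistics listed for each numeric variable (Valid, Min., Max., Mean, StDev) can be recomputed from the data file as a sanity check. The sketch below is one possible way to do so; the helper names (summarize, matches) are hypothetical, the demo column is made up, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since this codebook does not say which formula produced StDev.

```python
import math
import statistics


def summarize(values):
    """Compute the five statistics listed in the codebook for one numeric column."""
    # Treat empty strings and None as missing; everything else must parse as a number.
    valid = [float(v) for v in values if v not in ("", None)]
    valid = [v for v in valid if not math.isnan(v)]
    return {
        "Valid": float(len(valid)),
        "Min.": min(valid),
        "Max.": max(valid),
        "Mean": statistics.fmean(valid),
        "StDev": statistics.stdev(valid),  # assumption: n - 1 denominator
    }


def matches(computed, reported, rel_tol=1e-6):
    """Compare recomputed statistics with values transcribed from the codebook."""
    return all(
        math.isclose(computed[key], reported[key], rel_tol=rel_tol)
        for key in reported
    )


# Made-up column used only to show the call pattern.
demo = summarize(["1", "2", "", "3", "4"])
print(demo)
```

To check a real variable, pass one column of the loaded file to summarize and compare it against a dictionary holding the values printed in this codebook, for example matches(result, {"Valid": 2575.0, "Min.": 0.0}).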

Other Study-Related Materials

Label:

README.txt

Notes:

text/plain

Other Study-Related Materials

Label:

application.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

diagnostics.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

distributions.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

main_figures.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

main_simulation_run.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

RUN_ALL.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

simulation_functions.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

variance_time.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

Icon

Notes:

application/octet-stream

Other Study-Related Materials

Label:

urls_data_model.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

urls_data_query.py

Notes:

text/x-python

Other Study-Related Materials

Label:

figure1.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

figure2.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

figure3.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

figure4.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

figure5.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

main_sims.Rdata

Notes:

application/x-rlang-transport

Other Study-Related Materials

Label:

time_test_analytical.Rdata

Notes:

application/x-rlang-transport