Part 1: Document Description
|
Citation |
|
---|---|
Title: |
Replication Data for: Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset |
Identification Number: |
doi:10.7910/DVN/UDFZJD |
Distributor: |
Harvard Dataverse |
Date of Distribution: |
2021-09-04 |
Version: |
1 |
Bibliographic Citation: |
Evans, Georgina; King, Gary, 2021, "Replication Data for: Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset", https://doi.org/10.7910/DVN/UDFZJD, Harvard Dataverse, V1, UNF:6:qVAL2iA9dusDRaLhZ1X4xg== [fileUNF] |
Citation |
|
Title: |
Replication Data for: Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset |
Identification Number: |
doi:10.7910/DVN/UDFZJD |
Authoring Entity: |
Evans, Georgina (Harvard University) |
King, Gary (Harvard University) |
|
Producer: |
Political Analysis |
Distributor: |
Harvard Dataverse |
Access Authority: |
Evans, Georgina |
Depositor: |
Evans, Georgina |
Date of Deposit: |
2020-11-21 |
Holdings Information: |
https://doi.org/10.7910/DVN/UDFZJD |
Study Scope |
|
Keywords: |
Social Sciences |
Abstract: |
We offer methods to analyze the “differentially private” Facebook URLs Dataset which, at over 17 trillion cell values, is one of the largest social science research datasets ever constructed. The version of differential privacy used in the URLs dataset has specially calibrated random noise added, which provides mathematical guarantees for the privacy of individual research subjects while still making it possible to learn about aggregate patterns of interest to social scientists. Unfortunately, random noise creates measurement error which induces statistical bias, including attenuation, exaggeration, switched signs, or incorrect uncertainty estimates. We adapt methods developed to correct for naturally occurring measurement error, with special attention to computational efficiency for large datasets. The result is statistically valid linear regression estimates and descriptive statistics that can be interpreted as ordinary analyses of non-confidential data but with appropriately larger standard errors. |
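The attenuation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator or their replication code. It is a textbook method-of-moments correction for classical measurement error in a single regressor, under the assumption that the added noise is independent Gaussian with a known standard deviation. All names and values (`TRUE_SLOPE`, `NOISE_SD`, `N`) are hypothetical and chosen only for illustration.

```python
import random


def moments(xs, ys):
    """Return (sample covariance of xs and ys, sample variance of xs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    return cov, var


rng = random.Random(0)   # fixed seed so the run is repeatable
N = 20000                # hypothetical sample size
TRUE_SLOPE = 2.0         # hypothetical true regression slope
NOISE_SD = 1.0           # assumption: noise scale is known to the analyst

# Confidential regressor and outcome (never seen by the analyst).
x = [rng.gauss(0, 1) for _ in range(N)]
y = [TRUE_SLOPE * xi + rng.gauss(0, 1) for xi in x]

# Released regressor: the confidential value plus independent Gaussian noise.
w = [xi + rng.gauss(0, NOISE_SD) for xi in x]

cov_wy, var_w = moments(w, y)

# Naive OLS on the noisy regressor is biased toward zero (attenuation).
naive_slope = cov_wy / var_w

# Subtracting the known noise variance from var(w) undoes the attenuation.
corrected_slope = cov_wy / (var_w - NOISE_SD ** 2)

print("naive slope:    ", round(naive_slope, 3))
print("corrected slope:", round(corrected_slope, 3))
```

In this setup the naive slope lands near half of the true value, because the noise doubles the variance of the regressor, while the corrected slope lands near the true value with a larger sampling variance.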
Notes: |
For privacy reasons, the data used to generate the Facebook URLs results reported in Table 1 are not included in this material. The code that produces those results is in the FB_code subfolder. The README explains how to apply for access to the data. |
Methodology and Processing |
|
Sources Statement |
|
Data Access |
|
Notes: |
CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0)
Other Study Description Materials |
|
Related Publications |
|
Citation |
|
Title: |
Forthcoming, Political Analysis |
Bibliographic Citation: |
Forthcoming, Political Analysis |
File Description--f4771597 |
|
File: main_catalist_file.tab |
|
|
|
Notes: |
UNF:6:qVAL2iA9dusDRaLhZ1X4xg== |
List of Variables: |
|
Variables |
|
f4771597 Location: |
Variable Format: character Notes: UNF:6:LS48NLdl3G9DFBocYCHbIw== |
f4771597 Location: |
Variable Format: character Notes: UNF:6:WT/sXRk+BHWq9OVoksxYrw== |
f4771597 Location: |
Variable Format: character Notes: UNF:6:iBN2xNSOfhR5SRsX7FZqcw== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 1.786967; Max. 14.35802; Mean 4.668674793009709; StDev 1.6015785812785879 Variable Format: numeric Notes: UNF:6:WCyJ6HR9GVcxzJe6c6CzFA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0168164; Max. 0.9897156; Mean 0.7852712333203883; StDev 0.2178179025537323 Variable Format: numeric Notes: UNF:6:fVBKK0ByC8msoEfgOpFjaQ== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 0.9286104; Mean 0.08031672000000001; StDev 0.14579735883144546 Variable Format: numeric Notes: UNF:6:1Z/fBSBLfS/phAwAPQ+9kw== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 31767.06; Max. 64854.25; Mean 48051.13669902909; StDev 8073.5588268243055 Variable Format: numeric Notes: UNF:6:d6KSxxjPL6nIT2a9JpoiGA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 1.0; Max. 5.0; Mean 3.1712621359223228; StDev 1.418530023430273 Variable Format: numeric Notes: UNF:6:MnV8JwXuopmClqWK/FTKsg== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 34499.0; Mean 925.3417475728132; StDev 2685.384561746896 Variable Format: numeric Notes: UNF:6:FfEjdSufU+yReZkrPaVkAg== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 15809.0; Mean 220.89009708737754; StDev 646.2852772838223 Variable Format: numeric Notes: UNF:6:kCOKgo7orBN9+vWqCTmCzA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 3.0; Max. 73740.0; Mean 6419.216310679602; StDev 8356.60565560796 Variable Format: numeric Notes: UNF:6:VTAspNorI1Wk+DliGQPoxA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 6.0; Max. 59272.0; Mean 3689.7817475728093; StDev 5290.988997054812 Variable Format: numeric Notes: UNF:6:XWN+eSmpjlxBVAKoloTTiA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 57834.0; Mean 5276.893980582549; StDev 7124.068325734575 Variable Format: numeric Notes: UNF:6:/C/+fEbHQyzpj1Zw1SWlZQ== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 5.0; Max. 52072.0; Mean 4210.49359223303; StDev 6007.922747663919 Variable Format: numeric Notes: UNF:6:dH/eqOeSwjRD3Dk0aJ2QGA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 50476.0; Mean 2456.395339805801; StDev 4980.945643481704 Variable Format: numeric Notes: UNF:6:vTGB4YSBshJUm6w3FJ6RdA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 43345.0; Mean 2293.9300970873855; StDev 4606.908381025664 Variable Format: numeric Notes: UNF:6:8rl4EMVXAvsy7x1UNu1vbw== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 31148.0; Mean 886.0170873786361; StDev 2472.6452203818612 Variable Format: numeric Notes: UNF:6:NYSmC80o4VHC3tFOs2bRwA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 29610.0; Mean 898.9988349514695; StDev 2515.513083996233 Variable Format: numeric Notes: UNF:6:VBGmAvwaAp33fCuG+5d8Ow== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 20299.0; Mean 334.0908737864126; StDev 1275.728164593118 Variable Format: numeric Notes: UNF:6:WcEX103NOOSZ3yVO+pd1vQ== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 26850.0; Mean 357.18912621358567; StDev 1415.5299724527256 Variable Format: numeric Notes: UNF:6:90wZie6mWnrlQJ7lmpW4Fw== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 14471.0; Mean 116.6784466019394; StDev 618.2458585621796 Variable Format: numeric Notes: UNF:6:O/HOIy9S2oUh7or9KBEBNQ== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 10475.0; Mean 115.03378640776559; StDev 565.2034083384813 Variable Format: numeric Notes: UNF:6:mjkz5zdjsmX1UVOLmxBKgA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 7789.0; Mean 52.816699029126255; StDev 333.8456235187749 Variable Format: numeric Notes: UNF:6:ajLjRZhtPm2KA+O6s70aww== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 7487.0; Mean 55.21786407767051; StDev 323.27539839637865 Variable Format: numeric Notes: UNF:6:KmBLwXfEi430qrAtB+4K5g== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 4736.0; Mean 23.071844660194206; StDev 179.3671986775572 Variable Format: numeric Notes: UNF:6:3J16oufz02TLCPMEp3/6kA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 5005.0; Mean 28.771262135922523; StDev 234.39033008007047 Variable Format: numeric Notes: UNF:6:x1qeNIocl5q8WXFUnRHcMA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 3422.0; Mean 31.210873786408175; StDev 216.20907075068814 Variable Format: numeric Notes: UNF:6:wY7XFcc42hQthH/T72Vr6w== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 1779.0; Mean 164.29980582524362; StDev 232.91054149170213 Variable Format: numeric Notes: UNF:6:rle2q7+sljLbgmeuzXx9Vg== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 5530.0; Mean 118.96932038834858; StDev 245.51080741273543 Variable Format: numeric Notes: UNF:6:He4RMFOiG7XroPhtrgVZSw== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 12669.0; Mean 313.24388349514504; StDev 781.2274010011688 Variable Format: numeric Notes: UNF:6:GrKwPb7jeOjaCkPvjnfEyA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 4.0; Max. 38479.0; Mean 3249.9658252427394; StDev 4211.705878379926 Variable Format: numeric Notes: UNF:6:+3kThoyr2hdTOTDFkXsZIA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 3.0; Max. 42296.0; Mean 3721.5475728155116; StDev 4795.9264110917065 Variable Format: numeric Notes: UNF:6:7Jf8wDj4DzxfnDqeJ6/cig== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 32125.0; Mean 2037.0019417475514; StDev 3871.5775040979274 Variable Format: numeric Notes: UNF:6:yHkR0jdENTSf008tNsJixA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 19987.0; Mean 760.5817475728197; StDev 2055.7406515592565 Variable Format: numeric Notes: UNF:6:l1mlb2GuF7Vvwq8wSUWmcw== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 17262.0; Mean 297.0031067961166; StDev 1142.882679093597 Variable Format: numeric Notes: UNF:6:jCsSMFuHsgTPJMXhS1CEDg== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 9280.0; Mean 108.36504854368994; StDev 568.7164499883312 Variable Format: numeric Notes: UNF:6:vwiHn1ZuJCmukQSs+zZ8Fg== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 7097.0; Mean 44.58524271844617; StDev 271.4118938056109 Variable Format: numeric Notes: UNF:6:mgT1xUO41OAQ21jB9O2gNA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 3588.0; Mean 18.624077669902697; StDev 146.59679956350473 Variable Format: numeric Notes: UNF:6:doRoVRprxSWONzc75Frj3g== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 3167.0; Mean 23.722718446601583; StDev 172.81952493203653 Variable Format: numeric Notes: UNF:6:177mofd/Pg3FMrvFM+lqkA== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 4502.0; Mean 117.32038834951393; StDev 284.82690028049075 Variable Format: numeric Notes: UNF:6:4+Cj3vija5d5P6DebE/XIQ== |
f4771597 Location: |
Summary Statistics: Valid 2575.0; Min. 0.0; Max. 5541.0; Mean 23.871067961165135; StDev 205.9961424893673 Variable Format: numeric Notes: UNF:6:97HPPmEbSOIdWW3cWMvCFA== |
Label: |
README.txt |
Notes: |
text/plain |
Label: |
application.R |
Notes: |
type/x-r-syntax |
Label: |
diagnostics.R |
Notes: |
type/x-r-syntax |
Label: |
distributions.R |
Notes: |
type/x-r-syntax |
Label: |
main_figures.R |
Notes: |
type/x-r-syntax |
Label: |
main_simulation_run.R |
Notes: |
type/x-r-syntax |
Label: |
RUN_ALL.R |
Notes: |
type/x-r-syntax |
Label: |
simulation_functions.R |
Notes: |
type/x-r-syntax |
Label: |
variance_time.R |
Notes: |
type/x-r-syntax |
Label: |
Icon |
Notes: |
application/octet-stream |
Label: |
urls_data_model.R |
Notes: |
type/x-r-syntax |
Label: |
urls_data_query.py |
Notes: |
text/x-python |
Label: |
figure1.pdf |
Notes: |
application/pdf |
Label: |
figure2.pdf |
Notes: |
application/pdf |
Label: |
figure3.pdf |
Notes: |
application/pdf |
Label: |
figure4.pdf |
Notes: |
application/pdf |
Label: |
figure5.pdf |
Notes: |
application/pdf |
Label: |
main_sims.Rdata |
Notes: |
application/x-rlang-transport |
Label: |
time_test_analytical.Rdata |
Notes: |
application/x-rlang-transport |