Replication Data for: Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records (doi:10.7910/DVN/YGUHTD)

Document Description

Citation

Title:

Replication Data for: Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records

Identification Number:

doi:10.7910/DVN/YGUHTD

Distributor:

Harvard Dataverse

Date of Distribution:

2019-01-04

Version:

1

Bibliographic Citation:

Enamorado, Ted; Fifield, Benjamin; Imai, Kosuke, 2019, "Replication Data for: Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records", https://doi.org/10.7910/DVN/YGUHTD, Harvard Dataverse, V1

Study Description

Citation

Title:

Replication Data for: Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records

Identification Number:

doi:10.7910/DVN/YGUHTD

Authoring Entity:

Enamorado, Ted (Princeton University)

Fifield, Benjamin (Princeton University)

Imai, Kosuke (Harvard University)

Distributor:

Harvard Dataverse

Access Authority:

Fifield, Benjamin

Depositor:

Fifield, Benjamin

Date of Deposit:

2018-10-09

Holdings Information:

https://doi.org/10.7910/DVN/YGUHTD

Study Scope

Keywords:

Social Sciences

Abstract:

Since most social science research relies upon multiple data sources, merging data sets is an essential part of researchers' workflow. Unfortunately, a unique identifier that unambiguously links records is often unavailable, and data may contain missing and inaccurate information. These problems are especially severe when merging large-scale administrative records. We develop a fast and scalable algorithm to implement a canonical probabilistic model of record linkage that has many advantages over deterministic methods frequently used by social scientists. The proposed methodology efficiently handles millions of observations while accounting for missing data and measurement error, incorporating auxiliary information, and adjusting for uncertainty about merging in post-merge analyses. We conduct comprehensive simulation studies to evaluate the performance of our algorithm in realistic scenarios. We also apply our methodology to merging campaign contribution records, survey data, and nationwide voter files. An open-source software package is available for implementing the proposed methodology.
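
Illustrative sketch (not part of the deposited replication files): the abstract does not spell out the model, so the Python snippet below assumes the "canonical probabilistic model" is a Fellegi-Sunter style mixture in which fields agree or disagree independently given match status. The function names, the parameters m, u and lam, and all numeric values are hypothetical and chosen only for illustration. Fields whose comparison is missing (None) are skipped, which is one simple way to account for missing data under that independence assumption.

```python
from math import prod


def _likelihood(agreement, probs):
    """Probability of the observed agreement pattern, skipping missing fields."""
    terms = []
    for agrees, p in zip(agreement, probs):
        if agrees is None:  # comparison unavailable: field contributes nothing
            continue
        terms.append(p if agrees else 1.0 - p)
    return prod(terms)  # prod([]) == 1, so an all-missing pattern is uninformative


def match_probability(agreement, m, u, lam):
    """Posterior probability that a record pair is a true match.

    agreement: one entry per compared field (True, False, or None if missing)
    m[k]: P(field k agrees | pair is a match)      -- hypothetical value
    u[k]: P(field k agrees | pair is a non-match)  -- hypothetical value
    lam:  prior probability that a randomly chosen pair is a match
    """
    match_term = lam * _likelihood(agreement, m)
    non_match_term = (1.0 - lam) * _likelihood(agreement, u)
    return match_term / (match_term + non_match_term)


if __name__ == "__main__":
    # Hypothetical parameters for three fields (e.g. first name, last name, birth year).
    m = [0.95, 0.95, 0.90]
    u = [0.05, 0.02, 0.10]
    lam = 0.01
    for pattern in ([True, True, True], [True, None, True], [True, False, False]):
        print(pattern, match_probability(pattern, m, u, lam))
```

In a real application the parameters would be estimated from the data rather than fixed by hand, and pairs would be declared matches only when this posterior probability exceeds a chosen threshold.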

Methodology and Processing

Sources Statement

Data Access

Notes:

This dataset is not to be distributed or posted outside of the Harvard Dataverse. All downloads should take place directly on Harvard Dataverse to ensure data integrity.

Other Study Description Materials

Related Publications

Citation

Title:

Enamorado, Ted, Benjamin Fifield, and Kosuke Imai. 2019. “Using a Probabilistic Model to Assist Merging of Large-Scale Administrative Records.” American Political Science Review 113 (2): 353–371.

Identification Number:

10.1017/S0003055418000783

Bibliographic Citation:

Enamorado, Ted, Benjamin Fifield, and Kosuke Imai. 2019. “Using a Probabilistic Model to Assist Merging of Large-Scale Administrative Records.” American Political Science Review 113 (2): 353–371.

Other Study-Related Materials

Label:

appendix.tar

Text:

Replication materials for the appendix.

Notes:

application/x-tar

Other Study-Related Materials

Label:

mainpaper.tar

Text:

Replication materials for the main paper.

Notes:

application/x-tar

Other Study-Related Materials

Label:

README.pdf

Text:

Guide to the replication files for "Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records."

Notes:

application/pdf