Description
We present a synthetic Medicare claims dataset linked to environmental exposures and potential confounders. In most environmental health studies that rely on claims data, data-use restrictions apply and the data cannot be shared publicly. The Centers for Medicare & Medicaid Services (CMS) has generated synthetic, publicly available Medicare claims data for 2008-2010. In this dataset, we link the 2010 synthetic Medicare claims data to environmental exposures and potential confounders.

We aggregated the 2010 synthetic Medicare claims data to the county level. Data are compiled for the contiguous United States, which in 2010 included 3,109 counties. We merged the synthetic claims data with air pollution exposure data, specifically with estimates of PM2.5 exposure from Di et al. (2019, 2021), which provide daily and annual PM2.5 estimates on 1 km × 1 km grid cells across the contiguous United States. We obtained data on potential confounders from the Census Bureau (United States Census Bureau, 2021), the Centers for Disease Control and Prevention (CDC, 2021), and GridMET (Abatzoglou, 2013). The outcome, the mortality rate, was computed from the synthetic Medicare data (CMS, 2021).

We imputed missing observations with the average of the surrounding counties, except for the CDC confounders: for these, we generated a normal distribution for each state and imputed missing values by drawing randomly from that distribution. The steps for generating the merged dataset are provided in the NSAPH Synthetic Data GitHub repository (https://github.com/NSAPH/synthetic_data). Analytic inferences should not be drawn from this synthetic dataset. The aggregated dataset has 46 columns and 3,109 rows. (2022-02-03)
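The county-level mortality rate and the two imputation rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the NSAPH pipeline itself (the repository holds the actual steps). The function names, the `neighbors` adjacency mapping, the `state_of` county-to-state lookup, and the beneficiary record layout are all hypothetical. We also assume that the mortality rate is deaths divided by beneficiaries, that the neighbor average uses only originally observed values in a single pass, and that each state's normal distribution takes the sample mean and standard deviation of that state's observed counties.

```python
import random
import statistics
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical beneficiary-level record: (county_fips, died_in_2010)
Record = Tuple[str, bool]


def county_mortality_rate(records: List[Record]) -> Dict[str, float]:
    """Aggregate beneficiary records to a county-level mortality rate.

    Assumption: rate = deaths / number of beneficiaries in the county.
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [deaths, beneficiaries]
    for fips, died in records:
        counts[fips][0] += int(died)
        counts[fips][1] += 1
    return {fips: deaths / n for fips, (deaths, n) in counts.items()}


def impute_neighbor_mean(
    values: Dict[str, Optional[float]],
    neighbors: Dict[str, List[str]],
) -> Dict[str, Optional[float]]:
    """Fill each missing county with the mean of its observed neighbors.

    Assumption: a single pass over originally observed values only; a county
    whose neighbors are all missing is left as None.
    """
    filled = dict(values)
    for fips, value in values.items():
        if value is not None:
            continue
        observed = [values[n] for n in neighbors.get(fips, []) if values.get(n) is not None]
        if observed:
            filled[fips] = statistics.mean(observed)
    return filled


def impute_state_normal(
    values: Dict[str, Optional[float]],
    state_of: Dict[str, str],
    seed: int = 0,
) -> Dict[str, Optional[float]]:
    """Fill missing counties with random draws from a per-state normal distribution.

    Assumption: the normal's mean and standard deviation are estimated from the
    observed counties in the same state; states with no observed counties are
    left unchanged.
    """
    rng = random.Random(seed)  # seeded so the random imputation is reproducible
    by_state: Dict[str, List[float]] = defaultdict(list)
    for fips, value in values.items():
        if value is not None:
            by_state[state_of[fips]].append(value)
    filled = dict(values)
    for fips, value in values.items():
        obs = by_state.get(state_of[fips], [])
        if value is None and obs:
            # stdev needs at least two points; fall back to a zero spread otherwise
            sigma = statistics.stdev(obs) if len(obs) > 1 else 0.0
            filled[fips] = rng.gauss(statistics.mean(obs), sigma)
    return filled


if __name__ == "__main__":
    pm25 = {"A": 8.0, "B": None, "C": 10.0}
    adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    print(impute_neighbor_mean(pm25, adjacency))  # {'A': 8.0, 'B': 9.0, 'C': 10.0}
```

Seeding the random generator makes the state-level draws reproducible across runs. Note that draws from an unbounded normal distribution can fall outside a variable's plausible range (for example, a negative percentage); whether and how to constrain them is a separate design decision.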
Notes
References:
Abatzoglou, John T. 2013. “Development of Gridded Surface Meteorological Data for Ecological Applications and Modelling.” International Journal of Climatology 33 (1): 121–31. https://doi.org/10.1002/joc.3413.
Centers for Disease Control and Prevention. 2021. “Behavioral Risk Factor Surveillance System.” https://www.cdc.gov/brfss/annual_data/annual_2010.htm.
Centers for Medicare & Medicaid Services. 2021. “CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF).” https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DE_Syn_PUF.
Di, Qian, Heresh Amini, Liuhua Shi, Itai Kloog, Rachel Silvern, James Kelly, M Benjamin Sabath, et al. 2019. “An Ensemble-Based Model of PM2.5 Concentration Across the Contiguous United States with High Spatiotemporal Resolution.” Environment International 130: 104909. https://doi.org/10.1016/j.envint.2019.104909.
Di, Qian, Yaguang Wei, Alexandra Shtein, Carolynne Hultquist, Xiaoshi Xing, Heresh Amini, Liuhua Shi, et al. 2021. “Daily and Annual PM2.5 Concentrations for the Contiguous United States, 1-Km Grids, V1 (2000 - 2016).” NASA Socioeconomic Data and Applications Center (SEDAC). https://doi.org/10.7927/0rvr-4538.
United States Census Bureau. 2021. “Cartographic Boundary Files - Shapefile.” https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.2010.html.