Description
|
blockgroupvulnerability

OPPORTUNITY

The US Centers for Disease Control and Prevention (CDC) publishes a set of percentiles that compare US geographies by vulnerability across household, socioeconomic, racial/ethnic and housing themes. These Social Vulnerability Indexes (SVI) were originally intended to help public health officials and emergency response planners identify communities that will need support around an event. They are generally valuable for any public-interest organization that wants to relate its work to communities in need by geography.

The SVI publication and its basis variables are provided at the Census tract level of geographic detail. The Census' American Community Survey (ACS) is available down to the block group level, however. Recasting the SVI methods at this lower level of geography allows them to be tied to thousands of other demographic variables available. Because the SVI relies on ACS variables only available at the tract level, a projection model needs to be applied to approximate its results using block group level ACS variables. The blockgroupvulnerability dataset predicts the results of the CDC's logic at the block group level, as a new contribution to the Open Environments blockgroup series available on Harvard's Dataverse platform.

DATA

The CDC's annual SVI publication starts with 23 simple derivations using 50 ACS Census variables. Next, the SVI process ranks census geographies on each derived variable and converts each rank to a percentile rank, where Percentile Rank = (Rank - 1) / (N - 1) and N is the number of geographies ranked. The SVI themes are then calculated at the tract level as a percentile rank of a sum of the percentile ranks of the first level ACS derived variables. Finally, the overall ranking is taken as the sum of the theme percentile rankings.
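The ranking steps described above can be sketched in plain Python. This is a minimal illustration, not the CDC's or this project's code: the function names and sample values are hypothetical, ranks are ascending, and tied values are assumed to share the lowest rank in their group.

```python
def percentile_rank(values):
    """Return (rank - 1) / (n - 1) for each value, ranking ascending.

    Assumption of this sketch: tied values share the lowest rank.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two geographies to rank")
    # Zero-based position of the first occurrence of each value, i.e. rank - 1
    first_position = {}
    for position, value in enumerate(sorted(values)):
        first_position.setdefault(value, position)
    return [first_position[v] / (n - 1) for v in values]


def theme_percentile_rank(variable_columns):
    """Percentile rank of the sum of the per-variable percentile ranks.

    variable_columns: list of equal-length lists, one per derived variable.
    """
    ranked = [percentile_rank(column) for column in variable_columns]
    sums = [sum(per_tract) for per_tract in zip(*ranked)]
    return percentile_rank(sums)


# Hypothetical values for two derived variables across four tracts
poverty = [10.0, 25.0, 5.0, 40.0]
no_diploma = [8.0, 30.0, 20.0, 12.0]

variable_ranks = percentile_rank(poverty)
theme_ranks = theme_percentile_rank([poverty, no_diploma])
```

The tract with the lowest summed rank receives 0 and the highest receives 1, so each theme stays on the same [0, 1] scale as its input variables.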
The SVI data publication:

- is keyed by geography (7 columns), where ultimately the Census tract FIPS code is 2 State + 3 County + 4 Tract + 2 Tract Decimals; e.g., 56043000301 is 56 Wyoming, 043 Washakie County, Tract 3.01
- republishes Census demographics called 'adjunct variables', including area, population, households and housing units from the ACS
- takes daytime population from LandScan 2020 estimates
- derives 23 SVI variables from 50 ACS 5-Year variables, each having an estimate (E_), estimate percentage (EP_), margin of error (M_), margin percentage (MP_) and flag variable (F_) for those greater than 90% or less than 10%
- provides the final 4 themes and a composite SVI percentile annually

The published column names follow a prefix convention, which can be checked against the columns of the tract-level SVI table:

    # Variable stems: geography keys, derived variables, themes, adjunct variables
    vars = (
        ['ST', 'STATE', 'ST_ABBR', 'STCNTY', 'COUNTY', 'FIPS', 'LOCATION']
        + ['SNGPNT', 'LIMENG', 'DISABL', 'AGE65', 'AGE17', 'NOVEH', 'MUNIT', 'MOBILE',
           'GROUPQ', 'CROWD', 'UNINSUR', 'UNEMP', 'POV150', 'NOHSDP', 'HBURD', 'TWOMORE',
           'OTHERRACE', 'NHPI', 'MINRTY', 'HISP', 'ASIAN', 'AIAN', 'AFAM', 'NOINT']
        + ['TOTAL', 'THEME1', 'THEME2', 'THEME3', 'THEME4']
        + ['AREA_SQMI', 'TOTPOP', 'DAYPOP', 'HU', 'HH']
    )
    knowns = (
        vars
        # Estimates, the result of calculations against ACS variables
        + [('E_' + v) for v in vars]
        # Flag (0 or 1): whether this geography is in the 90th percentile rank (it is vulnerable)
        + [('F_' + v) for v in vars]
        # Margin of error for ACS calculations
        + [('M_' + v) for v in vars]
        # Margin of error for ACS calculations, as a percentage
        + [('MP_' + v) for v in vars]
        # Estimates of ACS calculations, as a percentage
        + [('EP_' + v) for v in vars]
        # Estimated percentile ranks
        + [('EPL_' + v) for v in vars]
        # Sum across variable percentile ranks
        + [('SPL_' + v) for v in vars]
        # Percentile rank of the sum of percentile ranks
        + [('RPL_' + v) for v in vars]
    )
    # List any published columns the naming convention does not explain;
    # svitract is the tract-level SVI DataFrame loaded elsewhere
    [c for c in svitract.columns if c not in knowns]

The SVI themes range over [0,1], but the CDC uses -999 as an NA value; this is set for ~800, or about 1%, of tracts that have no total population.
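Two details above lend themselves to a short sketch: the 2 + 3 + 4 + 2 layout of the tract FIPS code and the -999 NA value. The helper names below are hypothetical and are not taken from the supporting code. The example also assumes a FIPS code may arrive as an integer that has lost its leading zero.

```python
SVI_NA = -999  # NA value used in the SVI publication


def split_tract_fips(fips):
    """Split an 11-digit tract FIPS code into state, county and tract parts."""
    code = str(fips).zfill(11)  # restore a leading zero lost to integer parsing
    if len(code) != 11 or not code.isdigit():
        raise ValueError(f"not an 11-digit tract FIPS code: {fips!r}")
    tract_whole, tract_decimal = code[5:9], code[9:11]
    return {
        "state": code[:2],
        "county": code[2:5],
        "tract": f"{int(tract_whole)}.{tract_decimal}",
    }


def clean_percentile(value):
    """Map the -999 NA value to None so it cannot distort ranks or averages."""
    return None if value == SVI_NA else value


parts = split_tract_fips("56043000301")
cleaned = [clean_percentile(v) for v in [0.25, -999, 0.9]]
```

Replacing -999 before any arithmetic matters because a mean or rank computed over the raw column would treat it as an extremely low vulnerability score.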
The themes are numbered:

- Socioeconomic Status – RPL_THEME1
- Household Characteristics – RPL_THEME2
- Racial & Ethnic Minority Status – RPL_THEME3
- Housing Type & Transportation – RPL_THEME4

Unlike Census data, the CDC ranks Puerto Rico and Tribal tracts separately from the rest of the US.

The themes with their variables and ACS sources are as follows:

|Theme|SVI Variable|ACS Table|ACS Variables|
|---|---|---|---|
|Socioeconomic|E_UNINSUR|S2701|S2701_C04_001E|
|Socioeconomic|E_UNEMP|DP03|DP03_0005E|
|Socioeconomic|E_POV150|S1701|S1701_C01_040E|
|Socioeconomic|E_NOHSDP|B06009|B06009_002E|
|Socioeconomic|E_HBURD|S2503|S2503_C01_028E + S2503_C01_032E + S2503_C01_036E + S2503_C01_040E|
|Household|E_SNGPNT|B11012|B11012_010E + B11012_015E|
|Household|E_LIMENG|B16005|B16005_007E + B16005_008E + B16005_012E + B16005_013E + B16005_017E + B16005_018E + B16005_022E + B16005_023E + B16005_029E + B16005_030E + B16005_034E + B16005_035E + B16005_039E + B16005_040E + B16005_044E + B16005_045E|
|Household|E_DISABL|DP02|DP02_0072E|
|Household|E_AGE65|S0101|S0101_C01_030E|
|Household|E_AGE17|B09001|B09001_001E|
|Racial & Ethnic|E_TWOMORE|DP05|DP05_0083E|
|Racial & Ethnic|E_OTHERRACE|DP05|DP05_0082E|
|Racial & Ethnic|E_NHPI|DP05|DP05_0081E|
|Racial & Ethnic|E_MINRTY|DP05|DP05_0071E + DP05_0078E + DP05_0079E + DP05_0080E + DP05_0081E + DP05_0082E + DP05_0083E|
|Racial & Ethnic|E_HISP|DP05|DP05_0071E|
|Racial & Ethnic|E_ASIAN|DP05|DP05_0080E|
|Racial & Ethnic|E_AIAN|DP05|DP05_0079E|
|Racial & Ethnic|E_AFAM|DP05|DP05_0078E|
|Housing|E_NOVEH|DP04|DP04_0058E|
|Housing|E_MUNIT|DP04|DP04_0012E + DP04_0013E|
|Housing|E_MOBILE|DP04|DP04_0014E|
|Housing|E_GROUPQ|B26001|B26001_001E|
|Housing|E_CROWD|DP04|DP04_0078E + DP04_0079E|

The Census American Community Survey is updated annually and accessible by API. For this effort, variables commonly used at the block group level were retrieved at the tract level so that a predictive method could be applied at the more detailed level. The specific variables used are shown as lists in the data retrieval functions of the supporting code.
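As an illustration of the API access and the table's sums, the sketch below builds a tract-level ACS 5-year request URL and adds up the two estimates behind E_SNGPNT. It is not the project's retrieval code: the function names are hypothetical, the query layout follows the general pattern of Census API requests, and the sample response rows are made up. No network call is made.

```python
from urllib.parse import urlencode

ACS_BASE = "https://api.census.gov/data/{year}/acs/acs5"


def build_acs_url(year, variables, state, county):
    """Build a tract-level ACS 5-year request URL for one county."""
    query = [
        ("get", ",".join(["NAME"] + list(variables))),
        ("for", "tract:*"),
        ("in", f"state:{state}"),
        ("in", f"county:{county}"),
    ]
    # Keep ':', '*' and ',' unescaped so the URL stays readable
    return ACS_BASE.format(year=year) + "?" + urlencode(query, safe=":*,")


def sum_estimates(response_rows, variables):
    """Sum the named estimate columns for each data row.

    response_rows: a list of lists whose first row is the header, which is
    the shape of the API's JSON payload; values arrive as strings.
    """
    header, *rows = response_rows
    positions = [header.index(name) for name in variables]
    return [sum(float(row[i]) for i in positions) for row in rows]


# E_SNGPNT = B11012_010E + B11012_015E, per the table above
sngpnt_vars = ["B11012_010E", "B11012_015E"]
url = build_acs_url(2020, sngpnt_vars, state="56", county="043")

# Made-up response rows for illustration only
sample = [
    ["NAME", "B11012_010E", "B11012_015E", "state", "county", "tract"],
    ["Illustrative tract", "40", "12", "56", "043", "000301"],
]
e_sngpnt = sum_estimates(sample, sngpnt_vars)
```

A real response may contain missing values, which this sketch does not handle; `float` would raise on them.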
The Census' TIGER/Line publication provides the geographic shapes and properties. The TIGER/Line dataset includes:

- Geography, position: ['STATEFP', 'COUNTYFP', 'TRACTCE', 'GEOID', 'INTPTLAT', 'INTPTLON']
- Name with legal/statistical area description: ['NAME', 'NAMELSAD', 'MTFCC', 'FUNCSTAT']
- Area of land and water in square meters: ['ALAND', 'AWATER']
- Geographic shape: ['geometry']

See https://www2.census.gov/geo/pdfs/maps-data/data/tiger/tgrshp2020/TGRSHP2020_TechDoc.pdf

The supporting code is maintained on https://github.com/OpenEnvironments/blockgroupvulnerability

In general, variable names within the process are taken from the original SVI and ACS documentation. The variable names in the Dataverse publication have the E_ prefix removed, maintaining the published variables' relation to the SVI original.

MODEL

The models that generate this data publication use block group level ACS variables aggregated by the Census to the tract level. The Census TIGER/Line data adds a variable, the land area of each geography, to calculate population density.

For context, there are about 85K tracts in the United States, while there are about 200K block groups. Each tract has between 1,200 and 8,000 people in it, while each block group has between 600 and 3,000. Block groups are subdivisions of Census tracts. This level of detail is available for most of the SVI's Census sources, except for variables in the ACS Data Profiles and Subject Tables, which are only available at the tract level.

A model is trained for each of the SVI's four themes as well as its composite. Each is a regressor whose output is converted to its own percentile rank, and each is applied to a block group level version of the ACS and TIGER/Line features. Model performance is measured by comparing the original tract-level targets to the block group estimates, aggregated by mean for each tract.
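The performance comparison described above can be sketched as follows. This is an illustrative outline rather than the project's evaluation code: the function names and all values are hypothetical. It relies on the fact that the first 11 digits of a 12-digit block group GEOID identify its parent tract.

```python
import math
from collections import defaultdict


def tract_means(block_group_predictions):
    """Average block group predictions up to their parent tract.

    block_group_predictions: dict mapping a 12-digit block group GEOID
    to a predicted value.
    """
    grouped = defaultdict(list)
    for geoid, value in block_group_predictions.items():
        grouped[geoid[:11]].append(value)
    return {tract: sum(vals) / len(vals) for tract, vals in grouped.items()}


def rmse(targets, estimates):
    """Root mean squared error over the tracts present in both mappings."""
    shared = sorted(set(targets) & set(estimates))
    if not shared:
        raise ValueError("no tracts in common")
    squared = [(targets[t] - estimates[t]) ** 2 for t in shared]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values: two tracts covering three block groups
predictions = {
    "560430003011": 0.50,
    "560430003012": 0.70,
    "560430003021": 0.20,
}
svi_targets = {"56043000301": 0.50, "56043000302": 0.30}

estimates = tract_means(predictions)
score = rmse(svi_targets, estimates)
```

Because the published SVI exists only at the tract level, rolling the block group estimates back up to tracts is what makes a like-for-like error measure possible.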
The root mean squared errors (RMSE) by theme are:

|Theme|RMSE|
|-----|----|
|THEME1|0.148565|
|THEME2|0.218488|
|THEME3|0.086466|
|THEME4|0.241419|
|THEMES|0.154495|

CITATIONS

Centers for Disease Control and Prevention/ Agency for Toxic Substances and Disease Registry/ Geospatial Research, Analysis, and Services Program. CDC/ATSDR Social Vulnerability Index [Insert 2020, 2018, 2016, 2014, 2010, or 2000] Database [Insert US or State]. https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html. Accessed on November 8, 2022.

U.S. Census Bureau. (2020). 2020 American Community Survey 5-year Estimates. Retrieved from API calls to https://api.census.gov/data/2017/acs/acs5?get=NAME,B25077_001M&for=state:*

“TIGER/Line Tract Level Geographies.” Index of /Geo/Tiger/TIGER2020/Tract, US Census Bureau, 1 Feb. 2021, https://www2.census.gov/geo/tiger/TIGER2020/TRACT/.

Flanagan, Barry E.; Gregory, Edward W.; Hallisey, Elaine J.; Heitgerd, Janet L.; and Lewis, Brian (2011) "A Social Vulnerability Index for Disaster Management," Journal of Homeland Security and Emergency Management: Vol. 8: Iss. 1, Article 3. DOI: 10.2202/1547-7355.1792 Available at: http://www.bepress.com/jhsem/vol8/iss1/3

XGBoost, Xgboost.ai, https://xgboost.ai/.

Bryan, Michael B. “Block Group Datasets.” Open Environments Dataverse, Feb. 2022, https://dataverse.harvard.edu/dataverse/openenvironments.

https://github.com/OpenEnvironments/blockgroupvulnerability
|