analyze the survey of consumer finances (scf) with r
the survey of consumer finances (scf) tracks the wealth of american families. every three years, more than five thousand households answer a battery of questions about income, net worth, credit card debt, pensions, mortgages, even the lease on their cars. plenty of surveys collect annual income, but only the survey of consumer finances captures such detailed asset data. responses are at the primary economic unit (peu) level - the economically dominant, financially interdependent family members within a sampled household.
norc at the university of chicago administers the data collection, but
the board of governors of the federal reserve pays the bills and therefore calls the shots.
if you were so brazen as to open up the microdata and run a simple weighted median, you'd get the wrong answer. the five to six thousand respondents actually gobble up twenty-five to thirty thousand records in the final public use files. why oh why? well, those tables contain not one, not two, but five records for each peu. wherever missing,
these data are multiply-imputed, meaning answers to the same question for the same household might vary across implicates. each analysis must account for all that, lest your
confidence intervals be too tight. to calculate the correct statistics, you'll need to break the single file into five, necessarily complicating your life. this can be accomplished with the `meanit` sas macro buried in
the 2004 scf codebook (search for `meanit` - you'll need
the sas iml add-on). or you might blow the dust off
this website referred to in
the 2010 codebook as the home of an alternative multiple imputation technique, but all i found were broken links. perhaps it's time for plan c, and by c, i mean free. read the imputation section of
the latest codebook (search for `imputation`), then give these scripts a whirl. they've got that new r smell.
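once each statistic has been computed on all five implicates, the results get pooled. here's a minimal sketch of rubin's multiple imputation combining rules - the function name `mi_combine` and its inputs are made up for illustration, and a real scf analysis would also need the replicate weights to compute each within-imputation variance:

```r
# pool one statistic estimated separately on each of the five implicates.
# `est` and `se` are hypothetical vectors holding the point estimate
# and its standard error from each implicate.
mi_combine <-
	function( est , se ){
		m <- length( est )                       # number of implicates (five for the scf)
		q.bar <- mean( est )                     # pooled point estimate
		w.bar <- mean( se ^ 2 )                  # average within-imputation variance
		b <- var( est )                          # between-imputation variance
		t.var <- w.bar + ( 1 + 1 / m ) * b       # total variance
		c( estimate = q.bar , se = sqrt( t.var ) )
	}

# toy example: five implicate-specific estimates, each with standard error one
mi_combine( c( 1 , 2 , 3 , 4 , 5 ) , rep( 1 , 5 ) )
```

notice how the between-imputation spread widens the pooled standard error - that's exactly the extra uncertainty you'd throw away by analyzing the stacked file as one big sample.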
the lion's share of the respondents in the survey of consumer finances get drawn from a pretty standard sample of american dwellings - no nursing homes, no active-duty military. then there's this secondary sample of richer households to even out the statistical noise at the higher end of the income and assets spectrum.
you can read more if you like, but at the end of the day the weights just generalize to civilian, non-institutional american households. one last thing before you start your engine: read
everything you always wanted to know about the scf. my favorite part of that title is the word always. this new github repository contains three scripts:
1989-2010 download all microdata.R
- initiate a function to download and import any survey of consumer finances zipped stata file (.dta)
- loop through each year specified by the user (starting at the 1989 re-vamp) to download the main, extract, and replicate weight files, then import each into r
- break the main file into five implicates (each containing one record per peu) and merge the appropriate extract data onto each implicate
- save the five implicates and replicate weights to an r data file (.rda) for rapid future loading
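the implicate-splitting step above can be sketched in just a few lines. the toy data frame below stands in for the real main file, but the id structure follows the codebook: `yy1` identifies the household and `y1` equals `yy1` times ten plus the implicate number, so a little modular arithmetic recovers each implicate:

```r
# hypothetical miniature of the scf main file: two households,
# five records (implicates) apiece
scf.m <-
	data.frame(
		y1 = c( 11 , 12 , 13 , 14 , 15 , 21 , 22 , 23 , 24 , 25 ) ,
		yy1 = c( rep( 1 , 5 ) , rep( 2 , 5 ) )
	)

# break the stacked table into a list of five implicates,
# each containing exactly one record per peu
imp <- lapply( 1:5 , function( i ) scf.m[ scf.m$y1 %% 10 == i , ] )

# confirm one record per household in each implicate
sapply( imp , nrow )
```

the real download script does the same split on the full file, then merges the extract variables onto each of the five pieces before saving them to a .rda.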
2010 analysis examples.R
replicate FRB SAS output.R
click here to view these three scripts
for more detail about the survey of consumer finances (scf), visit:
notes:
nationally-representative statistics on the financial health, wealth, and assets of american households might not be monopolized by the survey of consumer finances, but there isn't much competition aside from
the assets topical module of the
survey of income and program participation (sipp). on one hand, the scf
interview questions contain more detail than sipp. on the other hand, scf's smaller sample precludes analyses of acute subpopulations. and for any three-handed martians in the audience, there are also
a few biases between these two data sources that you ought to consider.
the survey methodologists at the federal reserve take their job seriously, as evidenced by
this working paper trail. write a thank-you in their
guestbook. one can never receive enough of those.
confidential to sas, spss, stata, and sudaan users: the eighties called. they want their statistical languages back. time to transition to r. :D