The Harvard CGA Geotweet Census Archive is a subset of the Harvard CGA Geotweet Archive v2.0, enriched with nationwide census data. It contains tweet and user identification records, together with census variables and sentiment scores, for more than 2 billion geo-tagged tweets posted between January 2012 and July 2023. The sentiment scores are derived from the BERT sentiment scores in the Harvard CGA Geotweet Sentiment Archive. Unlike the Harvard CGA Geotweet Archive v2.0, whose public sharing is restricted by Twitter's redistribution policy, this dataset is available to the academic community at large. It can therefore serve as cross-validation data for publications that used data from the Harvard CGA Geotweet Archive v2.0.
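The archive does not document exactly how per-tweet BERT scores are summarized per census block, so the following is only a minimal sketch of one plausible roll-up. It assumes each per-tweet record is a hypothetical (day, GEOID20, user_id, score) tuple and that tweets are grouped by day and census block. The percentile rule (linear interpolation, the same rule as NumPy's default) and the use of the population standard deviation are assumptions; the archive may use other conventions. The output keys mirror the field names in this archive's schema.

```python
from collections import defaultdict
from statistics import mean, pstdev


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (assumed rule; same as NumPy's default).

    sorted_vals must be sorted ascending and non-empty; q is in [0, 1].
    """
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def aggregate(tweets):
    """Group per-tweet records into one summary row per (day, GEOID20).

    Each tweet is a hypothetical (day, geoid20, user_id, score) tuple.
    """
    groups = defaultdict(list)
    for day, geoid20, user_id, score in tweets:
        groups[(day, geoid20)].append((user_id, score))

    rows = []
    for (day, geoid20), items in sorted(groups.items()):
        scores = sorted(s for _, s in items)
        row = {
            "day": day,
            "GEOID20": geoid20,
            "tweet_count": len(items),
            # Distinct users who tweeted from this block on this day.
            "user_count": len({u for u, _ in items}),
            "avg_score": mean(scores),
            "max_score": scores[-1],
            "min_score": scores[0],
            # Population standard deviation (assumption; the archive may use ddof=1).
            "std_score": pstdev(scores),
        }
        for q in (10, 25, 50, 75, 90):
            row[f"score_{q}q"] = percentile(scores, q / 100)
        rows.append(row)
    return rows
```

Sorting the scores once per group lets min, max, and every percentile be read directly from the same sorted list.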
If you are interested in accessing this archive, please fill out our Geotweet Request Form. Before requesting or receiving Tweet IDs, requestors must agree to Twitter's Terms of Service, Twitter's Privacy Policy, and Twitter's Developer Policy. Geotweet ID data provided by the CGA may be used only for not-for-profit research and academic purposes. Recipients may not share CGA-provided Tweet IDs, or content derived from them, without written permission from the CGA.
Citations:
If you use the Geotweet Archive in your research, please cite it as: "Harvard CGA Geotweet IDs Archive".
========================================================
Schema of the Geotweet Census Archive
Field name----TYPE----Description
day----TEXT----The date of the tweets (YYYY-MM-DD)
GEOID20----TEXT----2020 Census block GEOID (15 digits: 2-digit state + 3-digit county + 6-digit tract + 4-digit block)
tweet_count----INTEGER----Number of tweets in the census block on that day
user_count----INTEGER----Number of unique users in the census block on that day
avg_score----FLOAT----The average tweet sentiment score in the census block on that day
max_score----FLOAT----The maximum tweet sentiment score in the census block on that day
min_score----FLOAT----The minimum tweet sentiment score in the census block on that day
std_score----FLOAT----The standard deviation of tweet sentiment scores in the census block on that day
score_10q----FLOAT----The 10th percentile of tweet sentiment scores in the census block on that day
score_25q----FLOAT----The 25th percentile of tweet sentiment scores in the census block on that day
score_50q----FLOAT----The 50th percentile (median) of tweet sentiment scores in the census block on that day
score_75q----FLOAT----The 75th percentile of tweet sentiment scores in the census block on that day
score_90q----FLOAT----The 90th percentile of tweet sentiment scores in the census block on that day
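Because GEOID20 is stored as TEXT, a 15-digit 2020 census block GEOID keeps its leading zeros and can be truncated to reach coarser geographies (the first 2 digits give the state, 5 the county, 11 the tract). The sketch below assumes a CSV export whose header uses the field names above; the file format, the helper names split_geoid20 and county_daily_mean, and the sample rows are all hypothetical. It rolls block-level rows up to county-day averages by weighting avg_score by tweet_count.

```python
import csv
import io
from collections import defaultdict


def split_geoid20(geoid20):
    """Split a 15-digit 2020 census block GEOID into nested geographies.

    Layout: state (2) + county (3) + tract (6) + block (4).
    """
    if len(geoid20) != 15 or not geoid20.isdigit():
        raise ValueError(f"not a 15-digit block GEOID: {geoid20!r}")
    return {
        "state": geoid20[:2],
        "county": geoid20[:5],   # state + county FIPS
        "tract": geoid20[:11],   # state + county + tract
        "block": geoid20,
    }


def county_daily_mean(rows):
    """Roll block rows up to (day, county), weighting avg_score by tweet_count."""
    totals = defaultdict(lambda: [0.0, 0])  # (day, county) -> [weighted sum, tweets]
    for row in rows:
        county = split_geoid20(row["GEOID20"])["county"]
        n = int(row["tweet_count"])
        acc = totals[(row["day"], county)]
        acc[0] += float(row["avg_score"]) * n
        acc[1] += n
    return {key: s / n for key, (s, n) in totals.items() if n > 0}


# Usage with two hypothetical blocks in the same county on the same day.
# csv.DictReader keeps every value as a string, so GEOID20 retains leading zeros.
sample = """day,GEOID20,tweet_count,avg_score
2020-01-01,060014001001000,3,0.6
2020-01-01,060014001001001,1,0.2
"""
rows = list(csv.DictReader(io.StringIO(sample)))
print(county_daily_mean(rows))
```

Only tweet_count, avg_score (through this weighting), max_score, and min_score combine exactly across blocks. user_count cannot simply be summed, because one user may tweet from several blocks on the same day, and the percentile fields cannot be recombined without the underlying per-tweet scores.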