Part 1: Document Description
|
Citation |
|
---|---|
Title: |
Census-block-level-TSGI |
Identification Number: |
doi:10.7910/DVN/IAYJOC |
Distributor: |
Harvard Dataverse |
Date of Distribution: |
2023-11-27 |
Version: |
3 |
Bibliographic Citation: |
Fu, Xiaokang; Jain, Devika; Hayes, Jack, 2023, "Census-block-level-TSGI", https://doi.org/10.7910/DVN/IAYJOC, Harvard Dataverse, V3 |
Citation |
|
Title: |
Census-block-level-TSGI |
Identification Number: |
doi:10.7910/DVN/IAYJOC |
Authoring Entity: |
Fu, Xiaokang (Harvard University) |
Jain, Devika (Harvard University) |
|
Hayes, Jack (Harvard University) |
|
Distributor: |
Harvard Dataverse |
Access Authority: |
Jain, Devika |
Depositor: |
Hayes, Jack |
Date of Deposit: |
2023-10-24 |
Holdings Information: |
https://doi.org/10.7910/DVN/IAYJOC |
Study Scope |
|
Keywords: |
Arts and Humanities, Earth and Environmental Sciences, Social Sciences, Geospatial, Big Data, Open Source, Social Media, Twitter |
Abstract: |
<p>Harvard CGA Geotweet Census Archive is a subset of <a href="https://doi.org/10.7910/DVN/3NCMB6">Harvard CGA Geotweet Archive v2.0</a> enriched with nationwide census data. It contains the tweet and user identification records along with census variables and sentiment scores for more than 2 billion geo-tagged tweets from January 2012 to July 2023. The sentiment scores are derived from the BERT sentiment scores in the <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/X2KJPC">Harvard CGA Geotweet Sentiment Archive</a>. Unlike the <a href="https://doi.org/10.7910/DVN/3NCMB6">Harvard CGA Geotweet Archive v2.0</a>, whose public sharing is restricted by <a href="https://developer.twitter.com/en/developer-terms/agreement-and-policy">Twitter's redistribution policy</a>, this dataset is available to the academic community at large. It can serve as cross-validation data for publications that used data from <a href="https://doi.org/10.7910/DVN/3NCMB6">Harvard CGA Geotweet Archive v2.0</a>.</p>
<p>If you are interested in accessing this archive, please fill out our <a href="https://gis.harvard.edu/geotweet-request-form">Geotweet Request Form</a>. Before requesting or receiving Tweet IDs, requestors must agree to <a href="https://twitter.com/en/tos">Twitter's Terms of Service</a>, <a href="https://twitter.com/en/privacy">Twitter's Privacy Policy</a>, and <a href="https://developer.twitter.com/en/developer-terms/policy">Twitter's Developer Policy</a>. Geotweet ID data provided by the CGA can be used only for not-for-profit research and academic purposes. Recipients may not share CGA-provided Tweet IDs or content derived from them without written permission from the CGA.</p>
<p><strong>Citations:</strong></p>
<p>If you use the Geotweet Archive in your research, please reference it: "<a href="https://doi.org/10.7910/DVN/KTRIJP">Harvard CGA Geotweet IDs Archive</a>".</p>
<p>Schema of Geotweet Census Archive</p>
<p><strong>Field name----TYPE----Description</strong></p>
<p><strong>day</strong>----TEXT----The date of the tweet (YYYY-MM-DD)</p>
<p><strong>GEOID20</strong>----TEXT----Census block GEOID</p>
<p><strong>tweet_count</strong>----INTEGER----Number of tweets in the census block</p>
<p><strong>user_count</strong>----INTEGER----Number of unique users in the census block</p>
<p><strong>avg_score</strong>----FLOAT----The average tweet sentiment score in the census block</p>
<p><strong>max_score</strong>----FLOAT----The maximum tweet sentiment score in the census block</p>
<p><strong>min_score</strong>----FLOAT----The minimum tweet sentiment score in the census block</p>
<p><strong>std_score</strong>----FLOAT----The standard deviation of tweet sentiment scores in the census block</p>
<p><strong>score_10q</strong>----FLOAT----The 10th percentile of tweet sentiment scores in the census block</p>
<p><strong>score_25q</strong>----FLOAT----The 25th percentile of tweet sentiment scores in the census block</p>
<p><strong>score_50q</strong>----FLOAT----The 50th percentile (median) of tweet sentiment scores in the census block</p>
<p><strong>score_75q</strong>----FLOAT----The 75th percentile of tweet sentiment scores in the census block</p>
<p><strong>score_90q</strong>----FLOAT----The 90th percentile of tweet sentiment scores in the census block</p> |
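The schema fields lend themselves to a small worked example. The Python below is a minimal sketch, not part of the archive's tooling: it models a subset of one record, splits a GEOID20 into its components, and pools avg_score across records weighted by tweet_count. The class and function names are hypothetical, the sample values are invented for illustration, and two assumptions are made: that each record summarizes one day in one census block, and that GEOID20 follows the standard 15-digit Census block layout (2-digit state, 3-digit county, 6-digit tract, 4-digit block).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockDayStats:
    """Hypothetical container for a subset of the schema fields."""
    day: str          # YYYY-MM-DD
    GEOID20: str      # kept as text so leading zeros are preserved
    tweet_count: int
    user_count: int
    avg_score: float


def split_geoid20(geoid: str) -> dict:
    """Split a 15-digit census block GEOID into its components."""
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"expected 15 digits, got {geoid!r}")
    return {
        "state": geoid[:2],
        "county": geoid[2:5],
        "tract": geoid[5:11],
        "block": geoid[11:],
    }


def pooled_avg_score(records) -> float:
    """Tweet-weighted mean of avg_score across records."""
    total = sum(r.tweet_count for r in records)
    if total == 0:
        raise ValueError("no tweets to average")
    return sum(r.avg_score * r.tweet_count for r in records) / total


# Invented sample values, for illustration only.
sample = [
    BlockDayStats("2012-01-01", "250173521011000", 3, 2, 0.5),
    BlockDayStats("2012-01-01", "250173521011001", 1, 1, 0.9),
]
```

Weighting by tweet_count matters because a plain mean of avg_score would count a one-tweet block as heavily as a busy one. The other summaries do not pool as simply: summing user_count can count the same user more than once, and the quantile fields cannot be combined exactly from per-record values.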
Methodology and Processing |
|
Sources Statement |
|
Data Access |
|
Archive Where Study was Originally Stored: |
<a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/3NCMB6">Harvard CGA Geotweet Archive v2.0</a> |
Access Authority: |
<p>If you are interested in accessing this archive, please fill out our <a href="https://gis.harvard.edu/geotweet-request-form">Geotweet Request Form</a>.</p> |
Citation Requirement: |
<p>If you use the Geotweet Archive in your research, please reference it: "<a href="https://doi.org/10.7910/DVN/KTRIJP">Harvard Center for Geographic Analysis Geotweet IDs Archive</a>".</p> |
Other Study Description Materials |
|
Label: |
statistics-2012_day_GEOID20__no_topic.parquet |
Text: |
2012 daily tweet data enriched with TSGI and census-level geometry |
Notes: |
application/octet-stream |
Label: |
statistics-2013_day_GEOID20__no_topic_0.parquet |
Text: |
2013 daily tweet data enriched with TSGI and census-level geometry. File 1 of 4; the 2013 data is split because of its size. |
Notes: |
application/octet-stream |
Label: |
statistics-2013_day_GEOID20__no_topic_1.parquet |
Text: |
2013 daily tweet data enriched with TSGI and census-level geometry. File 2 of 4; the 2013 data is split because of its size. |
Notes: |
application/octet-stream |
Label: |
statistics-2013_day_GEOID20__no_topic_2.parquet |
Text: |
2013 daily tweet data enriched with TSGI and census-level geometry. File 3 of 4; the 2013 data is split because of its size. |
Notes: |
application/octet-stream |
Label: |
statistics-2013_day_GEOID20__no_topic_3.parquet |
Text: |
2013 daily tweet data enriched with TSGI and census-level geometry. File 4 of 4; the 2013 data is split because of its size. |
Notes: |
application/octet-stream |
Label: |
statistics-2014_day_GEOID20__no_topic_0.parquet |
Text: |
2014 daily tweet data enriched with TSGI and census-level geometry. File 1 of 4; the 2014 data is split because of its size. |
Notes: |
application/octet-stream |
Label: |
statistics-2014_day_GEOID20__no_topic_1.parquet |
Text: |
2014 daily tweet data enriched with TSGI and census-level geometry. File 2 of 4; the 2014 data is split because of its size. |
Notes: |
application/octet-stream |
Label: |
statistics-2014_day_GEOID20__no_topic_2.parquet |
Text: |
2014 daily tweet data enriched with TSGI and census-level geometry. File 3 of 4; the 2014 data is split because of its size. |
Notes: |
application/octet-stream |
Label: |
statistics-2014_day_GEOID20__no_topic_3.parquet |
Text: |
2014 daily tweet data enriched with TSGI and census-level geometry. File 4 of 4; the 2014 data is split because of its size. |
Notes: |
application/octet-stream |
Label: |
statistics-2015_day_GEOID20__no_topic_0.parquet |
Text: |
2015 daily tweet data enriched with TSGI and census-level geometry. File 1 of 4; the 2015 data is split because of its size. |
Notes: |
application/octet-stream |
Label: |
statistics-2015_day_GEOID20__no_topic_1.parquet |
Text: |
2015 daily tweet data enriched with TSGI and census-level geometry. File 2 of 4; the 2015 data is split because of its size. |
Notes: |
application/octet-stream |
Label: |
statistics-2015_day_GEOID20__no_topic_2.parquet |
Text: |
2015 daily tweet data enriched with TSGI and census-level geometry. File 3 of 4; the 2015 data is split because of its size. |
Notes: |
application/octet-stream |
Label: |
statistics-2015_day_GEOID20__no_topic_3.parquet |
Text: |
2015 daily tweet data enriched with TSGI and census-level geometry. File 4 of 4; the 2015 data is split because of its size. |
Notes: |
application/octet-stream |
Label: |
statistics-2016_day_GEOID20__no_topic.parquet |
Text: |
2016 daily tweet data enriched with TSGI and census-level geometry |
Notes: |
application/octet-stream |
Label: |
statistics-2017_day_GEOID20__no_topic.parquet |
Text: |
2017 daily tweet data enriched with TSGI and census-level geometry |
Notes: |
application/octet-stream |
Label: |
statistics-2018_day_GEOID20__no_topic.parquet |
Text: |
2018 daily tweet data enriched with TSGI and census-level geometry |
Notes: |
application/octet-stream |
Label: |
statistics-2019_day_GEOID20__no_topic.parquet |
Text: |
2019 daily tweet data enriched with TSGI and census-level geometry |
Notes: |
application/octet-stream |
Label: |
statistics-2020_day_GEOID20__no_topic.parquet |
Text: |
2020 daily tweet data enriched with TSGI and census-level geometry |
Notes: |
application/octet-stream |
Label: |
statistics-2021_day_GEOID20__no_topic.parquet |
Text: |
2021 daily tweet data enriched with TSGI and census-level geometry |
Notes: |
application/octet-stream |
Label: |
statistics-2022_day_GEOID20__no_topic.parquet |
Text: |
2022 daily tweet data enriched with TSGI and census-level geometry |
Notes: |
application/octet-stream |
Label: |
statistics-2023_day_GEOID20__no_topic.parquet |
Text: |
2023 daily tweet data enriched with TSGI and census-level geometry |
Notes: |
application/octet-stream |
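The labels above follow one naming pattern: a single file per year, except 2013 through 2015, which are each split into four parts suffixed _0 to _3. The sketch below reproduces the listed labels, which may help when checking a download for completeness. The function name and the SPLIT_YEARS constant are hypothetical, and the pattern is inferred from the labels in this record rather than documented by the archive.

```python
# Years listed in this record as split into numbered parts (assumption
# drawn from the labels above, not from archive documentation).
SPLIT_YEARS = {2013: 4, 2014: 4, 2015: 4}


def expected_labels(first_year: int = 2012, last_year: int = 2023) -> list:
    """Return the file labels listed in this record, in year order."""
    labels = []
    for year in range(first_year, last_year + 1):
        stem = f"statistics-{year}_day_GEOID20__no_topic"
        parts = SPLIT_YEARS.get(year)
        if parts is None:
            # Unsplit years have a single file with no numeric suffix.
            labels.append(f"{stem}.parquet")
        else:
            # Split years carry a zero-based part suffix.
            labels.extend(f"{stem}_{i}.parquet" for i in range(parts))
    return labels
```

Each file can then be opened with any Parquet reader, for example `pandas.read_parquet(path, columns=["day", "GEOID20", "tweet_count"])`, assuming pandas and a Parquet engine such as pyarrow are installed.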