The Harvard Center for Geographic Analysis (CGA) has been harvesting geotweets since late 2012. "Geotweets" are tweets that contain a pair of geographic coordinates from the originating device, denoting the location where the tweet was created. Approximately 1-2% of tweets are geotweets, which amounts to roughly 1 million per day. The CGA attempts to harvest these tweets and stores them as CSV files on a hard drive. On request, tweets can be extracted manually and their tweet IDs made available; the IDs can then be rehydrated into full tweets. Contact worldmap@harvard.edu for more information.
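
As a rough illustration of the step that comes before rehydration, the sketch below reads tweet IDs from CSV text and groups them into fixed-size batches that a lookup client could then send. It makes no network calls. The column name `tweet_id`, the function names, and the default batch size of 100 are assumptions made for this example, not details of the CGA files or of the CGA rehydration tool.

```python
import csv
import io
from typing import Iterable, Iterator, List


def read_tweet_ids(csv_text: str, column: str = "tweet_id") -> List[str]:
    """Return the non-empty values of `column` from CSV text.

    The column name is hypothetical; adjust it to the file at hand.
    IDs are kept as strings so that large values are never rounded.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            ids.append(value)
    return ids


def batches(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` IDs.

    The default of 100 is an assumed per-request limit; check the
    limit of whichever lookup endpoint or tool is actually used.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch
```

Each batch would then be passed to whatever client performs the actual lookup, which is outside the scope of this sketch.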

Jun 17, 2020
Lewis, Benjamin, 2020, "Preliminary Extraction from Geotweet Archive v2.0 for COVID-19 Tweets", https://doi.org/10.7910/DVN/2TOFZS, Harvard Dataverse, V3, UNF:6:xJkff8zy9RNf3nq0eQTPkQ== [fileUNF]
These datasets present preliminary extractions from the Geotweet Archive. See each dataset for the SQL used in its extraction. In keeping with Twitter's terms of service, the data consists of tweet IDs only, which may be turned back into full tweets using the Twitter API. To make this process easier we provide a "rehydration" tool here https://github.com/cga-ha...
May 4, 2020
Lewis, Benjamin; Kakkar, Devika, 2016, "Harvard CGA Geotweet Archive v2.0", https://doi.org/10.7910/DVN/3NCMB6, Harvard Dataverse, V2
Geotweet Archive v2.0. The Harvard Center for Geographic Analysis (CGA) maintains the Geotweet Archive, a global record of tweets spanning time, geography, and language. The primary purpose of the Archive is to make a comprehensive collection of geo-located tweets available to the academic research community. The Archive extends from 2010 to July 12...
Jan 24, 2019
Lewis, Benjamin, 2019, "Harvard CGA Geotweet Archive - known bots", https://doi.org/10.7910/DVN/7OTPCI, Harvard Dataverse, V1
These bots appear to be randomly distributed spatially. These are bots discovered as of 1/15/2018. Many of these include a number that looks like a hash. There are likely to be more. Some of these and other bots may have valuable uses depending on one's perspective. This list does not include sensors as far as we know. googuns_lulz googuns_staging...
Jan 18, 2018
CGA, Harvard, 2016, "Harvard CGA 2018 Datafest WorldMap Workshop", https://doi.org/10.7910/DVN/A0HHDI, Harvard Dataverse, V11, UNF:6:FIpKsnSRKvmtjyQGYULyEA== [fileUNF]
Harvard CGA 2018 Datafest Presentation on Dataverse and WorldMap
Jul 6, 2016
CGA, Harvard, 2016, "Harvard CGA Streaming Billion Geotweet Dataset", https://doi.org/10.7910/DVN/3FDVCA, Harvard Dataverse, V1
Funded by a grant from the Sloan Foundation, and with support from Massachusetts Open Cloud, the Center for Geographic Analysis (CGA) at Harvard developed a remotely hosted "big geodata" dataset that is updated in real time. It is a prototype for a new data type that is hosted outside Dataverse, supports streaming updates, and is accessed via an API. The CGA...