Apr 14, 2025
Sood, Gaurav, 2023, "Top News: Story URLs and Text from News Feeds of Major National News Sites (2022 to 03/2025)", https://doi.org/10.7910/DVN/ZNAKK6, Harvard Dataverse, V12
Scripts at: https://github.com/notnews/top_news. We check the RSS feeds of the major news sites (ABC, CBS, CNN, LA Times, NBC, NPR, NYT, Politico, ProPublica, USA Today, and WaPo), collect the story URLs, and then parse the articles using newspaper3k and some custom scripts. To combine the usat_html parts, run: cat usat_split_* > usat_html_articles_03_25.tar.gz Related Da...
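For readers without a Unix shell, the cat step for the usat_html parts can be approximated in Python. This is a minimal sketch, not the author's script: the function name combine_parts is hypothetical, and it assumes the split parts sit in one directory and that sorting their file names gives the correct order (which is how a shell typically expands usat_split_*).

```python
from pathlib import Path
import shutil


def combine_parts(directory, pattern, output_name):
    """Concatenate files matching `pattern`, in sorted name order, into `output_name`.

    Hypothetical helper: mirrors `cat usat_split_* > output`, assuming that
    sorted file names reflect the original split order.
    """
    directory = Path(directory)
    output_path = directory / output_name
    # Exclude the output file itself in case it also matches the pattern.
    parts = sorted(p for p in directory.glob(pattern) if p != output_path)
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory}")
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part in chunks so large archives are not loaded into memory.
                shutil.copyfileobj(src, out)
    return output_path


if __name__ == "__main__":
    # File names taken from the dataset description above.
    combine_parts(".", "usat_split_*", "usat_html_articles_03_25.tar.gz")
```

The parts are copied as raw bytes, so the result should match what cat produces as long as the sort order agrees with the shell's.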
Aug 28, 2023
Sood, Gaurav; Laohaprapanon, Suriyan, 2018, "Not News: Provision of Apolitical News in British News Media", https://doi.org/10.7910/DVN/VZ8DB3, Harvard Dataverse, V3
URL-level data (URL, source_name, date, predicted and training set labels) for 5,646,436 articles that underlie Not News: Provision of Apolitical News in British News Media. For more details, see: https://github.com/notnews/uk_not_news
Sep 6, 2022
Sood, Gaurav; Laohaprapanon, Suriyan, 2021, "naampy", https://doi.org/10.7910/DVN/WZGJBM, Harvard Dataverse, V3
Data underlying the Python package `naampy: Infer Sociodemographic Characteristics from Indian Names.` GitHub link: https://github.com/appeler/naampy Here's another related package: pranaam: predict religion from name. Pranaam uses the Bihar Land Records data, plot-level land records (N = 41.87 million plots or 12.13 individuals/accounts across 35,6...
Sep 4, 2022
Sood, Gaurav, 2022, "Bihar Land Records (2022)", https://doi.org/10.7910/DVN/BI4KZS, Harvard Dataverse, V1
GitHub: https://github.com/in-rolls/bihar_land_records
Jun 4, 2021
Sood, Gaurav; Laohaprapanon, Suriyan, 2021, "Transaction Level Ration Data from Rajasthan (2021)", https://doi.org/10.7910/DVN/FIFZEX, Harvard Dataverse, V2
Transaction-level ration data from Rajasthan. Website: https://food.raj.nic.in/DistrictWiseCategoryDetails.aspx (scraped in 2021). GitHub: https://github.com/soodoku/ration
Aug 4, 2020
Sood, Gaurav, 2020, "Maxmind IP Geolocation Archival Data", https://doi.org/10.7910/DVN/RMZOEN, Harvard Dataverse, V3
Maxmind IP Geolocation Archival Data. Because of GDPR concerns, Maxmind doesn't provide historical data. We have used this data for historical studies of IP data for MTurk, etc., and it may well be useful elsewhere. Maxmind changed its db format from geolite to geolite2 and you will need to use its respective packages...
May 10, 2020
Sood, Gaurav; Laohaprapanon, Suriyan, 2018, "Category of content of unique domains in comScore data", https://doi.org/10.7910/DVN/DXSNFA, Harvard Dataverse, V2
Category of content of unique domains in comScore data using
Sep 4, 2019
Sood, Gaurav; Laohaprapanon, Suriyan, 2018, "DIME Race (1980--2014)", https://doi.org/10.7910/DVN/M5K7VR, Harvard Dataverse, V3, UNF:6:MIJQWSHoaIuOZwU/Lg0cqg== [fileUNF]
Race of people in DIME v2 data. DIME data: https://data.stanford.edu/dime. Race imputation using: https://github.com/appeler/ethnicolr. GitHub repo: https://github.com/appeler/dime_race
Feb 9, 2019
Sood, Gaurav, 2019, "Kerala English Electoral PDFs Google Vision OCR Output (Indian Electoral Rolls)", https://doi.org/10.7910/DVN/MQPPNC, Harvard Dataverse, V2
Google Vision OCR output of Kerala Electoral PDFs. For more about the project, see https://github.com/in-rolls. For the scripts behind the data, see https://github.com/in-rolls/google_vision_ocr. Check the log file in the repo for the metadata from the job. Google Vision OCR output for each pdf = PNG with bounding boxes, JSON with coords and tex...
Jun 10, 2018
Sood, Gaurav, 2018, "Street Smart: Learning from Randomly Sampled Images from Google Street View", https://doi.org/10.7910/DVN/L3HN0K, Harvard Dataverse, V2
Data behind https://github.com/geosensing/streetsmart