Top 10 News (doi:10.7910/DVN/OTJMYQ)

Document Description

Citation

Title:

Top 10 News

Identification Number:

doi:10.7910/DVN/OTJMYQ

Distributor:

Harvard Dataverse

Date of Distribution:

2020-04-09

Version:

1

Bibliographic Citation:

Sood, Gaurav; Laohaprapanon, Suriyan, 2020, "Top 10 News", https://doi.org/10.7910/DVN/OTJMYQ, Harvard Dataverse, V1, UNF:6:jlKZoJmu6AlRH7bK3zv4ig== [fileUNF]

Study Description

Citation

Title:

Top 10 News

Subtitle:

Data from Home pages and Top 10 Lists on News Sites

Identification Number:

doi:10.7910/DVN/OTJMYQ

Authoring Entity:

Sood, Gaurav

Laohaprapanon, Suriyan

Distributor:

Harvard Dataverse

Access Authority:

Sood, Gaurav

Depositor:

Sood, Gaurav

Date of Deposit:

2020-04-07

Holdings Information:

https://doi.org/10.7910/DVN/OTJMYQ

Study Scope

Keywords:

Social Sciences

Abstract:

We scraped and parsed the homepages, politics pages, and top 10 lists of prominent news sites for 2012 and 2016-2017. All collection took place in 2016-2017, so the 2012 data comes exclusively from the Internet Archive. The 2016-2017 data comes mostly from scraping live sites; for sites that we decided to scrape only after the fact, some of it also comes from the Internet Archive. For additional details, see: https://github.com/not_news/top10

Methodology and Processing

Sources Statement

Data Access

Notes:

Data available only for research purposes.

CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0)

Other Study Description Materials

File Description--f3798658

File: current-output-homepage.tab

  • Number of cases: 312547

  • No. of variables per record: 14

  • Type of File: text/tab-separated-values

Notes:

UNF:6:SCkcwV64HNZCi0UPIBBm7g==

File Description--f3798630

File: current-output-politics-homepage.tab

  • Number of cases: 66449

  • No. of variables per record: 14

  • Type of File: text/tab-separated-values

Notes:

UNF:6:dj8/fs33o+M3mwpHQBy8Ow==

File Description--f3798758

File: ia-output-politics-top10-text-all.tab

  • Number of cases: 10181

  • No. of variables per record: 13

  • Type of File: text/tab-separated-values

Notes:

UNF:6:0PC/X7C9sU15yVb3DpABmg==

Variable Description

List of Variables:

Variables

date

f3798758 Location:

Summary Statistics: Valid 10181; Min. 20120701; Max. 20161006; Mean 20143088.693055693; StDev 19799.115771795714

Variable Format: numeric

Notes: UNF:6:NCo/BJAQyHFrZyP0JDzPpg==

time

f3798758 Location:

Summary Statistics: Valid 10181; Min. 106; Max. 235959; Mean 116759.85659561942; StDev 70303.2106729336

Variable Format: numeric

Notes: UNF:6:OwVdXdmMlAUjuvETcst3+g==
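
The ranges reported for date (20120701 to 20161006) and time (106 to 235959) suggest that the two variables store a capture timestamp as YYYYMMDD and HHMMSS digits, with leading zeros lost because the values are numeric. The codebook does not state this, so the following Python sketch treats that encoding as an assumption; the function name parse_capture is hypothetical.

```python
from datetime import datetime


def parse_capture(date_value, time_value):
    """Combine a numeric date and time into one datetime.

    Assumption (not stated in the codebook): date is YYYYMMDD and
    time is HHMMSS, both stored as numbers.
    """
    # Values may arrive as floats or strings such as "2.0120701E7",
    # so normalise through float and int before formatting.
    date_str = f"{int(float(date_value)):08d}"
    # Numeric storage drops leading zeros: 106 stands for 00:01:06.
    time_str = f"{int(float(time_value)):06d}"
    return datetime.strptime(date_str + time_str, "%Y%m%d%H%M%S")
```

Under this reading, the minimum time value 106 would correspond to 00:01:06 and the maximum 235959 to 23:59:59.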

src

f3798758 Location:

Variable Format: character

Notes: UNF:6:HzIuujeo8kUuEOY9tMvePA==

order

f3798758 Location:

Summary Statistics: Valid 10181; Min. 1; Max. 20; Mean 5.004321775856926; StDev 2.943639038667501

Variable Format: numeric

Notes: UNF:6:oaqJ24dNixRLIlMJrAJXtw==

url

f3798758 Location:

Variable Format: character

Notes: UNF:6:ZmrZgrUcJbAMqDQ5SCzkaQ==

link_text

f3798758 Location:

Variable Format: character

Notes: UNF:6:CoB+I56RWcqW6G1Qx2Vtzg==

path

f3798758 Location:

Variable Format: character

Notes: UNF:6:VsGPVawEulNWOgpMo6fSLQ==

title

f3798758 Location:

Variable Format: character

Notes: UNF:6:wzoZgwkgTZxmoP3qI5ZIpA==

text

f3798758 Location:

Variable Format: character

Notes: UNF:6:bmzSyPtWZvXllC2vDjoUtA==

top_image

f3798758 Location:

Variable Format: character

Notes: UNF:6:6dGcwgiBaN5ikLh1pr8sOQ==

authors

f3798758 Location:

Variable Format: character

Notes: UNF:6:iWaV1WkOHBSS2pIK2AqMLw==

summary

f3798758 Location:

Variable Format: character

Notes: UNF:6:T7p42PUPcCGa6GqwKcHdPw==

keywords

f3798758 Location:

Variable Format: character

Notes: UNF:6:OphcoOTjqsAEKZM3gD9iVg==
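
The thirteen variables listed above for f3798758 can be checked when loading the tab-separated file. The Python sketch below uses only the standard library and assumes that the first line of the file is a header row holding these variable names, which the codebook does not confirm; the function name read_top10_rows and the field size limit are illustrative choices.

```python
import csv

# Article text can be long; raise the csv module's per-field limit.
# The value is an arbitrary illustrative choice.
csv.field_size_limit(10_000_000)

# Variable names as listed in the variable description for f3798758.
EXPECTED_COLUMNS = [
    "date", "time", "src", "order", "url", "link_text", "path",
    "title", "text", "top_image", "authors", "summary", "keywords",
]


def read_top10_rows(lines):
    """Yield one dict per record from an iterable of tab-separated lines.

    Assumption: the first line is a header row with the variable names.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        yield row


# Possible usage (file name from the file description; encoding assumed):
# with open("ia-output-politics-top10-text-all.tab", newline="", encoding="utf-8") as f:
#     for row in read_top10_rows(f):
#         print(row["src"], row["order"], row["title"])
```

All values come back as strings, so numeric variables such as order need an explicit conversion.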

Other Study-Related Materials

Label:

current-homepage-html.tar.gz

Text:

Notes:

application/gzip

Other Study-Related Materials

Label:

current-output-top10.csv

Text:

Notes:

text/csv

Other Study-Related Materials

Label:

current-politics-homepage-html.tar.gz

Text:

Notes:

application/gzip

Other Study-Related Materials

Label:

current-top10-html.tar.gz

Text:

Notes:

application/gzip

Other Study-Related Materials

Label:

ia-homepage-html-2007-2011.tar.gz

Text:

Notes:

application/gzip

Other Study-Related Materials

Label:

ia-homepage-html-2012.tar.gz.partaa

Text:

Notes:

application/gzip

Other Study-Related Materials

Label:

ia-homepage-html-2012.tar.gz.partab

Text:

Notes:

application/octet-stream

Other Study-Related Materials

Label:

ia-homepage-html-2012.tar.gz.partac

Text:

Notes:

application/octet-stream

Other Study-Related Materials

Label:

ia-news-top10-html.tar.gz.partaa

Text:

Notes:

application/gzip

Other Study-Related Materials

Label:

ia-news-top10-html.tar.gz.partab

Text:

Notes:

application/octet-stream
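
The suffixes partaa, partab, and partac on the files above suggest that the archives were split byte-wise, for example with the Unix split utility. The codebook does not say so. Under that assumption, concatenating the parts in suffix order should restore the original tar.gz file. A minimal Python sketch, with the hypothetical helper name join_parts:

```python
import shutil
from pathlib import Path


def join_parts(part_paths, out_path):
    """Concatenate split archive parts into one file, ordered by file name.

    Assumption: the parts are consecutive byte ranges of one archive,
    so sorting by name (…partaa, …partab, …) gives the right order.
    """
    ordered = sorted((Path(p) for p in part_paths), key=lambda p: p.name)
    with open(out_path, "wb") as out:
        for part in ordered:
            with open(part, "rb") as src:
                # Stream each part so large archives need not fit in memory.
                shutil.copyfileobj(src, out)
    return Path(out_path)


# Possible usage:
# join_parts(Path(".").glob("ia-homepage-html-2012.tar.gz.part*"),
#            "ia-homepage-html-2012.tar.gz")
```

If the assumption holds, the joined file should open as an ordinary gzip-compressed tar archive.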

Other Study-Related Materials

Label:

ia-output-homepage-2012-2016-notext.csv.gz

Text:

Notes:

application/gzip

Other Study-Related Materials

Label:

ia-output-homepage-2012-text.csv.gz

Text:

Notes:

application/gzip

Other Study-Related Materials

Label:

ia-output-homepage-2016-text.csv.gz

Text:

Notes:

application/gzip

Other Study-Related Materials

Label:

ia-output-politics-homepage-2012-2016-notext.csv.gz

Text:

Notes:

application/gzip

Other Study-Related Materials

Label:

ia-output-top10-text-all.csv

Text:

Notes:

text/csv

Other Study-Related Materials

Label:

ia-politics-html.tar.gz

Text:

Notes:

application/gzip

Other Study-Related Materials

Label:

ia-politics-top10-html.tar.gz

Text:

Notes:

application/gzip

Other Study-Related Materials

Label:

ia-top10-html.tar.gz

Text:

Notes:

application/gzip