Replication data for Identifying science in the news (doi:10.7910/DVN/WNDOFL)

Document Description

Citation

Title:

Replication data for Identifying science in the news

Identification Number:

doi:10.7910/DVN/WNDOFL

Distributor:

Harvard Dataverse

Date of Distribution:

2022-03-06

Version:

2

Bibliographic Citation:

Fleerackers, Alice; Nehring, Lise; Alperin, Juan Pablo; Enkhbayar, Asura; Maggio, Lauren A.; Moorhead, Laura, 2022, "Replication data for Identifying science in the news", https://doi.org/10.7910/DVN/WNDOFL, Harvard Dataverse, V2, UNF:6:8njTEk0qIvW7dD7De7ZSXw== [fileUNF]

Study Description

Citation

Title:

Replication data for Identifying science in the news

Subtitle:

An assessment of the precision and recall of Altmetric.com news mention data

Identification Number:

doi:10.7910/DVN/WNDOFL

Authoring Entity:

Fleerackers, Alice (Interdisciplinary Studies, Simon Fraser University, Canada)

Nehring, Lise (Centre for Forest Biology, University of Victoria, Canada)

Alperin, Juan Pablo (Publishing Program, Simon Fraser University, Canada)

Enkhbayar, Asura (Interdisciplinary Studies, Simon Fraser University, Canada)

Maggio, Lauren A. (Department of Medicine, Uniformed Services University, USA)

Moorhead, Laura (Journalism, College of Liberal and Creative Arts, San Francisco State University, USA)

Distributor:

Harvard Dataverse

Access Authority:

Fleerackers, Alice

Depositor:

Fleerackers, Alice

Date of Deposit:

2022-03-01

Holdings Information:

https://doi.org/10.7910/DVN/WNDOFL

Study Scope

Keywords:

Computer and Information Science, Social Sciences

Abstract:

This data set contains the data and codebook required to replicate the study "Identifying science in the news: An assessment of the precision and recall of Altmetric.com news mention data." It includes two data sets, both of which contain a collection of news stories published in the science and health sections of the following eight news media outlets during March-April 2021: The Guardian (Science Section), HealthDay, IFLScience, MedPage Today, News Medical, New York Times (Science Section), Popular Science, and Wired. The first data set (altmetric_dataset.csv) was obtained by using the Altmetric Explorer to download all of the news stories that mentioned research. The second data set (content_analysis_dataset.csv) was obtained by collecting a random sample of 400 news stories from these eight outlets and manually identifying mentions of research within them. The codebook (news_mention_codebook.pdf) contains the coding instructions that were used to identify the mentions of research in content_analysis_dataset.csv.
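As a reminder of what "precision and recall" mean when one set of mentions (for example, those reported by Altmetric) is compared against a manually coded reference set, the following is a minimal sketch only. It is not the authors' analysis code (that is cited under Related Publications); the function name `precision_recall` and the story identifiers are hypothetical and are not taken from either data set.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of item identifiers.

    retrieved: items reported by the system under evaluation.
    relevant:  items in the manually coded reference set.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
altmetric_mentions = {"story-1", "story-2", "story-3", "story-4"}
manual_mentions = {"story-2", "story-3", "story-4", "story-5", "story-6"}

precision, recall = precision_recall(altmetric_mentions, manual_mentions)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example three of the four reported mentions are also in the reference set (precision 0.75), and three of the five reference mentions were reported (recall 0.60).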

Methodology and Processing

Sources Statement

Data Sources:

Enkhbayar, Asura; Fleerackers, Alice; Alperin, Juan Pablo; Moorhead, Laura, 2022, "Articles published in the Science sections of 8 news outlets between March and April of 2021", https://doi.org/10.7910/DVN/KK6T86, Harvard Dataverse, V1, UNF:6:cxb0si5bUJlhhC4UFoo/JA== [fileUNF]

Data Access

Notes:

CC0 Waiver

Other Study Description Materials

Related Publications

Citation

Identification Number:

10.7910/DVN/KK6T86

Bibliographic Citation:

Enkhbayar, Asura; Fleerackers, Alice; Alperin, Juan Pablo; Moorhead, Laura, 2022, "Articles published in the Science sections of 8 news outlets between March and April of 2021", https://doi.org/10.7910/DVN/KK6T86, Harvard Dataverse, V1, UNF:6:cxb0si5bUJlhhC4UFoo/JA== [fileUNF]

Citation

Identification Number:

10.5281/zenodo.6332953

Bibliographic Citation:

Alperin, J. P. (2022). Code for publication: Identifying science in the news. Zenodo. https://doi.org/10.5281/zenodo.6332953

File Description--f6050037

File: content_analysis_dataset.tab

  • Number of cases: 674

  • No. of variables per record: 21

  • Type of File: text/tab-separated-values

Notes:

UNF:6:8njTEk0qIvW7dD7De7ZSXw==
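A tab-separated file of this kind can be read with Python's standard `csv` module. The sketch below is illustrative: the helper name `read_tab` is hypothetical, and the three-column sample is a made-up stand-in that borrows only column names from the variable list; its values are not from the data set.

```python
import csv
import io


def read_tab(handle):
    """Parse a tab-separated file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical stand-in for the real file; values are invented for illustration.
sample = io.StringIO(
    "identifier\toutlet\tResearchMentioned\n"
    "example-1\tWired\t1\n"
    "example-2\tHealthDay\t0\n"
)

rows = read_tab(sample)
n_cases = len(rows)         # number of data rows (header excluded)
n_variables = len(rows[0])  # number of columns per record
print(n_cases, n_variables)

# With the downloaded file (encoding is an assumption), the same helper applies:
# with open("content_analysis_dataset.tab", newline="", encoding="utf-8") as f:
#     rows = read_tab(f)
```

Applied to content_analysis_dataset.tab, the two counts would be expected to match the 674 cases and 21 variables per record listed above.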

Variable Description

List of Variables:

Variables

identifier

f6050037 Location:

Variable Format: character

Notes: UNF:6:OafPNPXWxe/lwgDyenOu3A==

URL

f6050037 Location:

Variable Format: character

Notes: UNF:6:DwXKmqVfOp+2jxLQ5goJvA==

outlet

f6050037 Location:

Variable Format: character

Notes: UNF:6:QRJ7EB7WmaYSsqfkIWIk8w==

mention_date

f6050037 Location:

Variable Format: character

Notes: UNF:6:K9DqTj+i5jH4JTZwhCIADA==

mention_title

f6050037 Location:

Variable Format: character

Notes: UNF:6:VodRFhW2epLAt32trD/72A==

Text

f6050037 Location:

Variable Format: character

Notes: UNF:6:W5ihhnPwVM/ybAysc+AusQ==

GeneralNotes

f6050037 Location:

Variable Format: character

Notes: UNF:6:3DgqA/5HWuc7eC9AYvl8cA==

MentionExcerpt

f6050037 Location:

Variable Format: character

Notes: UNF:6:EEfaM1DFsZKyrQWkJX2g9A==

MentionNotes

f6050037 Location:

Variable Format: character

Notes: UNF:6:Alv/4vNt7SICV2p0YAKO7w==

Aggregated

f6050037 Location:

Summary Statistics: Valid 674.0; Min. 0.0; Max. 1.0; Mean 0.13056379821958494; StDev 0.3371729018772002

Variable Format: numeric

Notes: UNF:6:nG6z1L2OGZ/8pom4rs81Pg==

PressRelease

f6050037 Location:

Summary Statistics: Valid 674.0; Min. 0.0; Max. 1.0; Mean 0.053412462908011854; StDev 0.2250215486967318

Variable Format: numeric

Notes: UNF:6:yXIUvuGzj03VUzBnpQWS9g==

ResearchMentioned

f6050037 Location:

Summary Statistics: Valid 674.0; Min. 0.0; Max. 1.0; Mean 0.744807121661721; StDev 0.43629335818002274

Variable Format: numeric

Notes: UNF:6:a5Yc76WqU5nr6xSUm8qgfQ==

DescribesAsresearch

f6050037 Location:

Summary Statistics: Valid 502.0; Min. 0.0; Max. 1.0; Mean 0.7749003984063749; StDev 0.418064509581353

Variable Format: numeric

Notes: UNF:6:+sk0JK8V5yyX6ksW96G4hQ==

HasLink

f6050037 Location:

Summary Statistics: Valid 502.0; Min. 0.0; Max. 1.0; Mean 0.7669322709163342; StDev 0.42320673849480306

Variable Format: numeric

Notes: UNF:6:NbXlKwSk3qAzZs9cDnMWSQ==

JournalMentioned

f6050037 Location:

Summary Statistics: Valid 502.0; Min. 0.0; Max. 1.0; Mean 0.4023904382470125; StDev 0.4908689827558265

Variable Format: numeric

Notes: UNF:6:LH1ohRynlEt9bz5XK7ABZg==

AuthorMentioned

f6050037 Location:

Summary Statistics: Valid 502.0; Min. 0.0; Max. 1.0; Mean 0.4721115537848608; StDev 0.499719605516635

Variable Format: numeric

Notes: UNF:6:jxcXLR3SjxWT5IXvZr/T5w==

InstitutionMentioned

f6050037 Location:

Summary Statistics: Valid 502.0; Min. 0.0; Max. 1.0; Mean 0.4741035856573703; StDev 0.49982700922298545

Variable Format: numeric

Notes: UNF:6:t4WQ8AMAspO9ZQVGxeweTQ==

StudyDateMentioned

f6050037 Location:

Summary Statistics: Valid 502.0; Min. 0.0; Max. 1.0; Mean 0.34860557768924294; StDev 0.477004173823539

Variable Format: numeric

Notes: UNF:6:v/VK22w0Ramm9tMhnglRCQ==

MentionExcerpt_ln

f6050037 Location:

Variable Format: character

Notes: UNF:6:d9QuDoJX96AFTxPWgFs82A==

MentionNotes_ln

f6050037 Location:

Variable Format: character

Notes: UNF:6:m2tdgU12e7J4TpuiB1YY3w==

gold_id

f6050037 Location:

Summary Statistics: Valid 674.0; Min. 0.0; Max. 673.0; Mean 336.5; StDev 194.71132478620754

Variable Format: numeric

Notes: UNF:6:ojMvsraoU+LdGbm6XbOBSg==
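For the numeric variables whose minimum is 0 and maximum is 1, the summary statistics can be read as counts, provided the variable takes only the values 0 and 1 (an assumption; the codebook PDF is the authority on the coding scheme). Under that assumption, Mean times Valid gives the number of rows coded 1, and the reported StDev agrees with the sample (n - 1) standard deviation. The sketch below works this through for two variables; the helper name `binary_summary` is hypothetical.

```python
import math


def binary_summary(mean, valid):
    """For a 0/1 variable, recover the count of 1s and the sample standard deviation.

    Assumes the variable takes only the values 0 and 1 (not confirmed by the
    listing itself, which reports only Min. 0.0 and Max. 1.0).
    """
    ones = round(mean * valid)
    p = ones / valid
    # Sample standard deviation (n - 1 denominator) of a 0/1 variable.
    stdev = math.sqrt(valid / (valid - 1) * p * (1 - p))
    return ones, stdev


# Mean and Valid values copied from the listing above.
aggregated_ones, aggregated_stdev = binary_summary(0.13056379821958494, 674)
research_ones, research_stdev = binary_summary(0.744807121661721, 674)

print(aggregated_ones, round(aggregated_stdev, 6))
print(research_ones, round(research_stdev, 6))
```

This yields 88 rows coded 1 for Aggregated and 502 for ResearchMentioned. The latter equals the Valid count of 502 reported for DescribesAsresearch through StudyDateMentioned, which is consistent with those variables being coded only for rows where research was mentioned; that reading is an inference from the numbers, not something the listing states.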

Other Study-Related Materials

Label:

altmetric_dataset.tab

Text:

Downloaded from Altmetric Explorer on Sept. 9, 2021. Contains all research mentions found in the following outlets since March 1, 2021: The Guardian, HealthDay, IFLScience, MedPage Today, News Medical, New York Times, Popular Science, and Wired

Notes:

text/tsv

Other Study-Related Materials

Label:

news_mention_codebook.pdf

Notes:

application/pdf