Text data for comparing the language used in Wikipedia, the DiabeticConnect Forum, and Research Articles (doi:10.7910/DVN/X27DSE)

Document Description

Citation

Title:

Text data for comparing the language used in Wikipedia, the DiabeticConnect Forum, and Research Articles

Identification Number:

doi:10.7910/DVN/X27DSE

Distributor:

Harvard Dataverse

Date of Distribution:

2019-07-11

Version:

1

Bibliographic Citation:

Didegah; Ghaseminik, Zahra; Alperin, Juan Pablo, 2019, "Text data for comparing the language used in Wikipedia, the DiabeticConnect Forum, and Research Articles", https://doi.org/10.7910/DVN/X27DSE, Harvard Dataverse, V1, UNF:6:pE4SFkzhKvY+PwYHdhT3sg== [fileUNF]

Study Description

Citation

Title:

Text data for comparing the language used in Wikipedia, the DiabeticConnect Forum, and Research Articles

Identification Number:

doi:10.7910/DVN/X27DSE

Authoring Entity:

Didegah (University of British Columbia)

Ghaseminik, Zahra (Scientometrics & Technological Investigation Research Group)

Alperin, Juan Pablo (Simon Fraser University)

Distributor:

Harvard Dataverse

Access Authority:

Alperin, Juan Pablo

Depositor:

Alperin, Juan Pablo

Date of Deposit:

2019-07-04

Holdings Information:

https://doi.org/10.7910/DVN/X27DSE

Study Scope

Keywords:

Computer and Information Science, Medicine, Health and Life Sciences, Social Sciences

Abstract:

This dataset contains the text data analyzed in research comparing the language used in a diabetes-related online forum with that of Wikipedia and research articles about diabetes. The dataset contains three files, one for each of the three sources. Details of the data collection methods and filters applied can be found in the related publication.

Methodology and Processing

Sources Statement

Data Access

Notes:

CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0)

Other Study Description Materials

Related Publications

Citation

Title:

Using a diabetes discussion forum and Wikipedia to detect the alignment of public interests and the research literature

Identification Number:

https://doi.org/10.1101/496927

Bibliographic Citation:

Didegah, F., Ghaseminik, Z., & Alperin, J. P. (2018). Using a diabetes discussion forum and Wikipedia to detect the alignment of public interests and the research literature. BioRxiv, 496927. https://doi.org/10/gfsck6

File Description--f3466001

File: DiabeticConnect_titles_tags_content.tab

  • Number of cases: 109071

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:9MW7dCKYxROgZ1REebMTbQ==

DiabeticConnect forum post text and tags

File Description--f3466000

File: Scopus_title_abstract_keywords.tab

  • Number of cases: 172912

  • No. of variables per record: 4

  • Type of File: text/tab-separated-values

Notes:

UNF:6:vjwIOq8IhV3OuRv1aXtmxw==

Diabetes related article titles, abstracts, and author keywords from Scopus database
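
The two tabular files above can be read with any tab-separated-values parser. The following is a minimal sketch using only the Python standard library. It assumes that each file starts with a header row whose column names match the variable names listed in this codebook, which the codebook itself does not confirm; the helper name read_tab_records and the sample row are hypothetical and for illustration only.

```python
import csv
import io

# Variable names as listed in this codebook. Whether the header row of the
# downloaded files matches them exactly is an assumption.
SCOPUS_COLUMNS = ["articleid", "title", "abstract", "author-suppliedkeywords"]
FORUM_COLUMNS = ["Discussiontitle", "Discussiontag"]


def read_tab_records(handle, expected_columns):
    """Read tab-separated records from an open text handle into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in expected_columns if name not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return list(reader)


# Tiny in-memory stand-in for Scopus_title_abstract_keywords.tab (made-up row).
sample = io.StringIO(
    "articleid\ttitle\tabstract\tauthor-suppliedkeywords\n"
    "21454\tExample title\tExample abstract\tdiabetes\n"
)
records = read_tab_records(sample, SCOPUS_COLUMNS)
print(len(records), records[0]["articleid"])
```

For a downloaded file, the same function could be called on a handle opened with open(path, newline="", encoding="utf-8"); the UTF-8 encoding is likewise an assumption. The record counts stated above (109071 and 172912) give a simple check that a file was read completely.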

Variable Description

List of Variables:

Variables

Discussiontitle

Location: f3466001

Variable Format: character

Notes: UNF:6:ukmTaw2Y8a++QsXEOBhH2Q==

Discussiontag

Location: f3466001

Variable Format: character

Notes: UNF:6:Ji48xzKWe8LryY54D8sLew==

articleid

Location: f3466000

Summary Statistics: Max. 8.5017406517E10; Valid 172912.0; Min. 21454.0; Mean 6.151215310014024E10; StDev 3.2121335706911213E10

Variable Format: numeric

Notes: UNF:6:8LgcEYBav9w/aCLp2w9gBA==
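
The summary fields reported for a numeric variable (Valid, Min, Max, Mean, StDev) can be recomputed from the data as a sanity check. The sketch below is one way to do so with the Python standard library. The codebook does not state whether StDev is the sample or the population standard deviation; the sample form (n - 1) is used here as an assumption, and the function name summarize is hypothetical.

```python
import statistics


def summarize(values):
    """Return the five summary fields listed for a numeric variable."""
    # Missing entries (None) are excluded, mirroring a count of valid cases.
    valid = [float(v) for v in values if v is not None]
    return {
        "Valid": float(len(valid)),
        "Min": min(valid),
        "Max": max(valid),
        "Mean": statistics.fmean(valid),
        # Sample standard deviation (n - 1); an assumption, not confirmed
        # by the codebook.
        "StDev": statistics.stdev(valid),
    }


stats = summarize([1, 2, 3, 4, None])
print(stats)
```

Because articleid is an identifier, its mean and standard deviation carry no analytical meaning; the Valid count, Min, and Max are the fields most useful for verifying a download.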

title

Location: f3466000

Variable Format: character

Notes: UNF:6:496rUDm+1yMzS3g+b3NtQQ==

abstract

Location: f3466000

Variable Format: character

Notes: UNF:6:iv5NK+1RbW/oTI87cL6V9A==

author-suppliedkeywords

Location: f3466000

Variable Format: character

Notes: UNF:6:83uSe8Hz4ZESvxqdlEQMlw==

Other Study-Related Materials

Label:

Wikipedia_title_html.xlsx

Text:

Title and HTML content of Wikipedia articles about diabetes

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet