Data for: Identifying Metadata Quality Issues Across Cultures (doi:10.7910/DVN/GZI7IA)

Document Description

Citation

Title:

Data for: Identifying Metadata Quality Issues Across Cultures

Identification Number:

doi:10.7910/DVN/GZI7IA

Distributor:

Harvard Dataverse

Date of Distribution:

2023-07-26

Version:

1

Bibliographic Citation:

Shi, Julie; Nason, Mike; Tullney, Marco; Alperin, Juan Pablo, 2023, "Data for: Identifying Metadata Quality Issues Across Cultures", https://doi.org/10.7910/DVN/GZI7IA, Harvard Dataverse, V1, UNF:6:4g/frGnjZiZaOWIVHs5DFQ== [fileUNF]
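The UNF strings in this codebook (the dataset-level one in the citation above and the per-file and per-variable ones below) share the layout UNF:<version>:<base64 digest>. The following is a minimal sketch for splitting such a string and checking its shape; the function name parse_unf is hypothetical, and the sketch does not recompute the fingerprint from the data. It assumes the usual UNF version 6 convention of a base64-encoded 128-bit digest.

```python
import base64


def parse_unf(text):
    """Split a 'UNF:<version>:<digest>' string into (version, raw digest bytes).

    Hypothetical helper: it only validates the layout and decodes the
    base64 part; it does not verify the fingerprint against any data.
    """
    label, version, digest = text.strip().split(":", 2)
    if label != "UNF":
        raise ValueError(f"not a UNF string: {text!r}")
    return int(version), base64.b64decode(digest, validate=True)


# Dataset-level UNF copied from the bibliographic citation.
version, raw = parse_unf("UNF:6:4g/frGnjZiZaOWIVHs5DFQ==")
print(version, len(raw) * 8)  # version number and digest length in bits
```

Comparing the parsed digest of a downloaded file's UNF against the value listed here is one way to notice that a different version of the dataset is in hand.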

Study Description

Citation

Title:

Data for: Identifying Metadata Quality Issues Across Cultures

Identification Number:

doi:10.7910/DVN/GZI7IA

Authoring Entity:

Shi, Julie (Public Knowledge Project & University of Toronto Libraries)

Nason, Mike (Public Knowledge Project & University of New Brunswick)

Tullney, Marco (Public Knowledge Project & Technische Informationsbibliothek (TIB))

Alperin, Juan Pablo (Public Knowledge Project & ScholCommLab, Simon Fraser University)

Distributor:

Harvard Dataverse

Access Authority:

Alperin, Juan Pablo

Depositor:

Alperin, Juan Pablo

Date of Deposit:

2023-07-25

Holdings Information:

https://doi.org/10.7910/DVN/GZI7IA

Study Scope

Keywords:

Computer and Information Science

Abstract:

This sample was drawn from the Crossref API on March 8, 2022. It was constructed purposefully, on the hypothesis that records with at least one known issue would be more likely to yield issues related to cultural meanings and identity. The authors and Crossref staff first selected records known or suspected to have at least one quality issue. The Crossref API was then used to randomly select additional records from the same DOI prefixes. Records in the sample represent 51 DOI prefixes that were chosen without regard for the manuscript management or publishing platform used, as well as 17 prefixes for journals known to use the Open Journal Systems (OJS) manuscript management and publishing platform. OJS was singled out because of the authors' familiarity with the platform, its international and multilingual reach, and previous work on its metadata quality.
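The abstract does not give the exact queries used. As an illustration only, the sketch below shows one way random records for a DOI prefix could be requested from the Crossref REST API, using its sample parameter on the /prefixes/{prefix}/works route. The helper names doi_prefix and sample_url are hypothetical, the code only builds the request URL (it makes no network call), and the dataset's own DOI is used solely to demonstrate prefix parsing; it is not claimed to be one of the sampled prefixes.

```python
from urllib.parse import quote, urlencode

CROSSREF_API = "https://api.crossref.org"


def doi_prefix(doi):
    """Return the registrant prefix (the part before the first '/') of a DOI."""
    doi = doi.strip()
    for lead in ("https://doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
    prefix, sep, _suffix = doi.partition("/")
    if not sep or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix


def sample_url(prefix, n=10):
    """Build a URL asking Crossref for n randomly sampled works under a prefix.

    Crossref caps the sample size at 100 per request.
    """
    if not 1 <= n <= 100:
        raise ValueError("sample size must be between 1 and 100")
    query = urlencode({"sample": n})
    return f"{CROSSREF_API}/prefixes/{quote(prefix, safe='.')}/works?{query}"


prefix = doi_prefix("doi:10.7910/DVN/GZI7IA")  # parsing demo only
print(sample_url(prefix, 5))
```

Because the sample is random, repeating such a request today would not reproduce the records drawn on March 8, 2022; the doi column of recordReview.tab is the authoritative list.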

Methodology and Processing

Sources Statement

Data Access

Other Study Description Materials

File Description--f7246882

File: issuesCountCultural.tab

  • Number of cases: 72

  • No. of variables per record: 15

  • Type of File: text/tab-separated-values

Notes:

UNF:6:Z42MOr7tJUICxl/JFjLaYw==

File Description--f7246890

File: issuesGrouping.tab

  • Number of cases: 37

  • No. of variables per record: 11

  • Type of File: text/tab-separated-values

Notes:

UNF:6:olZnjnd4Nrq8PLut7s+Dzw==

File Description--f7246889

File: issuesValueList.tab

  • Number of cases: 994

  • No. of variables per record: 7

  • Type of File: text/tab-separated-values

Notes:

UNF:6:uJBi09tNeuYHcbNLPL04vA==

File Description--f7246888

File: recordReviewCodebook.tab

  • Number of cases: 56

  • No. of variables per record: 4

  • Type of File: text/tab-separated-values

Notes:

UNF:6:evchHHZq09Lwas62dK8igA==

File Description--f7246887

File: recordReview.tab

  • Number of cases: 427

  • No. of variables per record: 24

  • Type of File: text/tab-separated-values

Notes:

UNF:6:PGMyVwal8NLfuDAgy6fg3Q==
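The case and variable counts listed above can serve as a quick sanity check after download. The sketch below is a minimal example under two assumptions that the codebook does not state: that the first row of each .tab file is a header row, and that "Number of cases" counts the rows after it. The names EXPECTED, tab_shape and matches_codebook are hypothetical.

```python
import csv
import io

# (number of cases, variables per record) as listed in the file descriptions.
EXPECTED = {
    "issuesCountCultural.tab": (72, 15),
    "issuesGrouping.tab": (37, 11),
    "issuesValueList.tab": (994, 7),
    "recordReviewCodebook.tab": (56, 4),
    "recordReview.tab": (427, 24),
}


def tab_shape(text):
    """Return (data rows, columns) for tab-separated text with a header row."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return (0, 0)
    return (len(rows) - 1, len(rows[0]))


def matches_codebook(name, text):
    """True if the parsed shape equals the shape listed in this codebook."""
    return tab_shape(text) == EXPECTED[name]


# Tiny inline example; real use would pass the contents of a downloaded file.
demo = "doi\tissuesCount\n10.1234/a\t5\n10.1234/b\t7\n"
print(tab_shape(demo))
```

With a downloaded file, the text can be read with open(path, newline="", encoding="utf-8").read() before being passed to matches_codebook.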

Variable Description

List of Variables:

Variables

issue

f7246882 Location:

Variable Format: character

Notes: UNF:6:w4DQqHAB9VwMkRgznLC43w==

issuesCount

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 1.0; Max. 4387.0; Mean 265.8787878787878; StDev 784.1376934878751

Variable Format: numeric

Notes: UNF:6:SM8HrWlPVR2fS0nzqHrfbA==

itemDOI

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 0.0; Max. 32.0; Mean 1.9393939393939403; StDev 6.661171978054021

Variable Format: numeric

Notes: UNF:6:1IVVH9iDa/8lXIQTa9K4ew==

itemAbstract

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 0.0; Max. 348.0; Mean 21.09090909090909; StDev 70.74265140120723

Variable Format: numeric

Notes: UNF:6:QBq1ZyvGnu3gUCi7jUOU7Q==

itemTitle

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 0.0; Max. 253.0; Mean 15.333333333333332; StDev 47.505701412216474

Variable Format: numeric

Notes: UNF:6:tdOd5B2y4Dwb4FpK1asDBw==

itemLicense

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 0.0; Max. 381.0; Mean 23.090909090909093; StDev 85.26772676266631

Variable Format: numeric

Notes: UNF:6:XheO89RYSDmxTp2E63+w5w==

personGeneral

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 0.0; Max. 96.0; Mean 5.818181818181818; StDev 18.071826390570187

Variable Format: numeric

Notes: UNF:6:wlgHPlWogxBk01SbGr23lg==

personGivenName

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 0.0; Max. 357.0; Mean 21.63636363636364; StDev 67.77528042261159

Variable Format: numeric

Notes: UNF:6:yW5FDJW4703xtWq31zMxKg==

personFamilyName

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 0.0; Max. 329.0; Mean 19.939393939393938; StDev 64.48010322666374

Variable Format: numeric

Notes: UNF:6:L03EweCYzLoW+gMhIKShUA==

personAffiliation

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 0.0; Max. 467.0; Mean 28.303030303030297; StDev 94.34725911774174

Variable Format: numeric

Notes: UNF:6:L3biSYEhvKEPNjfmtIkHEg==

containerPublisher

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 0.0; Max. 742.0; Mean 44.969696969696955; StDev 143.52863060389834

Variable Format: numeric

Notes: UNF:6:/dBEp+JHHF5TWZ8aRzt6nQ==

containerTitle

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 0.0; Max. 291.0; Mean 17.636363636363637; StDev 53.69346921520006

Variable Format: numeric

Notes: UNF:6:ZmQZ7KIQJg7u2i9kQ+OwXg==

containerLanguage

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 0.0; Max. 300.0; Mean 18.18181818181818; StDev 67.38622566289722

Variable Format: numeric

Notes: UNF:6:bPYD8G5U6MBrvNtCI4UbLw==

containerSubject

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 0.0; Max. 381.0; Mean 23.090909090909093; StDev 70.72895607368122

Variable Format: numeric

Notes: UNF:6:HCYpO5z3yUMIGLZ5lGGz8Q==

containerRights

f7246882 Location:

Summary Statistics: Valid 33.0; Min. 0.0; Max. 410.0; Mean 24.848484848484844; StDev 99.3453953424998

Variable Format: numeric

Notes: UNF:6:RwEgckjRcNTjzU1sygEQ/g==

issueGrouping

f7246890 Location:

Variable Format: character

Notes: UNF:6:S/lm1PxJ6fQ9niKAWepJJw==

issueDetails

f7246890 Location:

Variable Format: character

Notes: UNF:6:pnJMZTjf4vEkoLDi8CzJbA==

elementLocation

f7246890 Location:

Variable Format: character

Notes: UNF:6:OcVUXc80EpmOKldGBM4T6Q==

issueDefinition

f7246890 Location:

Variable Format: character

Notes: UNF:6:YIMISa5Tjjjwf1vnKyOXiQ==

issueExample

f7246890 Location:

Variable Format: character

Notes: UNF:6:ekShIZ4kuvuyQtQlHlLReg==

issueValueList

f7246890 Location:

Variable Format: character

Notes: UNF:6:xYMqRhXJfLntYt7N99j6Yg==

issueComments

f7246890 Location:

Variable Format: character

Notes: UNF:6:jAAg1/q4ztlTSfbnIWeqbQ==

issuePriority

f7246890 Location:

Variable Format: character

Notes: UNF:6:3Bzj///BQffJ+t6wp8BIcQ==

issuePoliticalSignificance

f7246890 Location:

Variable Format: character

Notes: UNF:6:y1DLSHbJYtIp+SEAKl0+vg==

issueHeuristics

f7246890 Location:

Variable Format: character

Notes: UNF:6:fdGJsGyVa7zr1RM+x0UscQ==

issueHeuristicsNotes

f7246890 Location:

Variable Format: character

Notes: UNF:6:bFolqXLt4PtaEqber9NlyQ==

issueGrouping (as determined in the issuesGrouping sheet)

f7246889 Location:

Variable Format: character

Notes: UNF:6:tXJNUb22QtQQG0iwWgaeBA==

issueDetails

f7246889 Location:

Variable Format: character

Notes: UNF:6:uUBDASuiKY/TrM8UFFq+0A==

issueDefinition

f7246889 Location:

Variable Format: character

Notes: UNF:6:oOBJW+eC3/i3NoIp2TbPWg==

issueValueList (as listed in column F of the issuesGrouping sheet)

f7246889 Location:

Variable Format: character

Notes: UNF:6:bNRIuwPmbrwJ/C36wisLhg==

issueValueListDefinition

f7246889 Location:

Variable Format: character

Notes: UNF:6:WTed76bJB4ESsfP33lhZCA==

Example

f7246889 Location:

Variable Format: character

Notes: UNF:6:s58z8lIW96/lUO16Shthmg==

politicalSignificance?

f7246889 Location:

Variable Format: character

Notes: UNF:6:mB6mVtl5b5bN25osGpOZSA==

General

f7246888 Location:

Variable Format: character

Notes: UNF:6:RHaZeRBm5Sho3fh40kg+cw==

Definitions

f7246888 Location:

Variable Format: character

Notes: UNF:6:HQtWJ77OA8LZ2+bI43obzA==

Column in recordReview

f7246888 Location:

Variable Format: character

Notes: UNF:6:h/UOGV4sjCgyOf/XzedTqA==

Notes

f7246888 Location:

Variable Format: character

Notes: UNF:6:iv5T/uEfWgI51IZX4Yb31Q==

doi

f7246887 Location:

Variable Format: character

Notes: UNF:6:7j85DSOzZi9HI3srYYcung==

api (json)

f7246887 Location:

Variable Format: character

Notes: UNF:6:jcM4OX7WuJtlO/QfLkAvpg==

resolve doi

f7246887 Location:

Variable Format: character

Notes: UNF:6:xhMvuBefTYzg4L3OVyvoJg==

issuesCount

f7246887 Location:

Summary Statistics: Valid 427.0; Min. 3.0; Max. 21.0; Mean 11.379391100702577; StDev 3.850046838836531

Variable Format: numeric

Notes: UNF:6:60h9urTztFS/MbRIw6c54A==

culturalIssuesCount

f7246887 Location:

Summary Statistics: Valid 427.0; Min. 3.0; Max. 20.0; Mean 10.274004683840749; StDev 3.679961424320251

Variable Format: numeric

Notes: UNF:6:Z2SdOEM15EdgrQMvfHOn8w==

nonculturalIssuesCount

f7246887 Location:

Summary Statistics: Valid 427.0; Min. 0.0; Max. 6.0; Mean 1.105386416861827; StDev 1.100850728690008

Variable Format: numeric

Notes: UNF:6:Mp0DOu8GvuWMmgFqNur95Q==
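The summary statistics for the three count variables in recordReview.tab can be cross-checked with simple arithmetic: multiplying each mean by the number of valid cases gives the implied column total. The sketch below does this with the values printed above. The reading that issuesCount is the sum of culturalIssuesCount and nonculturalIssuesCount is an inference from these numbers, not something the codebook states.

```python
import math

VALID = 427  # valid cases reported for each of the three variables

# Means copied from the summary statistics in this codebook.
MEANS = {
    "issuesCount": 11.379391100702577,
    "culturalIssuesCount": 10.274004683840749,
    "nonculturalIssuesCount": 1.105386416861827,
}


def implied_total(mean, valid):
    """Recover an integer column total from a reported mean and case count."""
    total = mean * valid
    if not math.isclose(total, round(total), abs_tol=1e-6):
        raise ValueError("mean * valid is not close to an integer")
    return round(total)


totals = {name: implied_total(mean, VALID) for name, mean in MEANS.items()}

# The two component totals should add up to the overall total if the
# inferred relationship between the three columns holds.
consistent = (
    totals["culturalIssuesCount"] + totals["nonculturalIssuesCount"]
    == totals["issuesCount"]
)
print(totals, consistent)
```

The implied totals are 4859 issues overall, 4387 cultural and 472 non-cultural. The cultural total coincides with the maximum of issuesCount in issuesCountCultural.tab (4387.0), which suggests, without confirming, that that file contains a totals row.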

itemDOI_issueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:FQayF4ghc8HfHLC8vJ5k3Q==

itemAbstract_issueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:LVLilutjl5I0PiL1sfe1KA==

itemTitle_issueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:TW0dGArBXWex1AQGXhTtwQ==

itemLicense_issueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:xd5rjF4H1od0G3/5YUwOgQ==

personGeneral_issueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:mw0c5SG4eQfmZYOJPqDPWg==

personGivenName_issueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:i9vjjK/WA/zFchM3PlLWeg==

personFamilyName_IssueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:ImU9X8xuB1lBEi8Jel6phw==

personAffiliation_IssueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:TvJkbbJBicD7r4USD/cTqg==

containerPublisher_IssueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:Uh7uUe6rb1UwD3kxh5yJXQ==

containerTitle_issueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:i7wrJjxKQYefGUM5uxROCA==

containerLanguage_IssueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:khBLePpv7RoDMLoBeGXY1g==

containerSubject_issueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:q2J4nIRBAgfSQVS17Fxxtg==

containerRights_issueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:V9XhFUC5wExTe/79fFoB+A==

other_issueDetails

f7246887 Location:

Variable Format: character

Notes: UNF:6:3snPyK+ag/NK8/ee6vmG2g==
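The per-element detail columns in recordReview.tab are not capitalized uniformly: some end in _issueDetails and others in _IssueDetails. When selecting them programmatically, a case-insensitive match avoids silently dropping columns. The sketch below is a minimal example; the function name issue_detail_columns is hypothetical, and the header list is an excerpt of the variable names listed above.

```python
def issue_detail_columns(header):
    """Map each element name to its detail column, ignoring suffix case.

    For example 'personFamilyName_IssueDetails' is keyed as 'personFamilyName'.
    """
    suffix = "_issuedetails"
    return {
        name[: -len(suffix)]: name
        for name in header
        if name.lower().endswith(suffix)
    }


# Excerpt of the recordReview.tab variable names from this codebook.
header = [
    "doi",
    "issuesCount",
    "itemDOI_issueDetails",
    "personFamilyName_IssueDetails",
    "containerLanguage_IssueDetails",
    "other_issueDetails",
    "Notes",
]
print(issue_detail_columns(header))
```

The keys produced this way (itemDOI, personFamilyName, and so on) line up with the numeric variable names in issuesCountCultural.tab, apart from the extra key other.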

Notes

f7246887 Location:

Variable Format: character

Notes: UNF:6:r+Mypdq43MjSUlLxzR7CBQ==

containerLanguagePolicy

f7246887 Location:

Variable Format: character

Notes: UNF:6:85eU7Gk/3Th0/n1EpQ0L/w==

publishingPlatform

f7246887 Location:

Variable Format: character

Notes: UNF:6:uYnmZsSNc9ouV2m64uEvLA==

itemType

f7246887 Location:

Variable Format: character

Notes: UNF:6:2WoWf+/m2Cvr8x/fsFLCkQ==

Other Study-Related Materials

Label:

Dataset Description.pdf

Notes:

application/pdf