Code Contribution and Credit in Science

Version 1.1

Brown, Eva Maxfield; Slaughter, Isaac; Weber, Nicholas, 2025, "Code Contribution and Credit in Science", https://doi.org/10.7910/DVN/KPYVI1, Harvard Dataverse, V1, UNF:6:wWLU2O+hmG5I5RHguL5VTg== [fileUNF]

Dataset Metrics

31 Downloads

Description	This dataset contains all data used and created as part of the "Code Contribution and Credit in Science" article [TODO: link/doi to paper]. It consists of six files:

1. rs-graph-v1-prod.db
2. rs-graph-v1-redacted.db
3. annotated-dev-author-em-resolved.csv
4. train-set.parquet
5. test-set.parquet
6. dev-author-em-misclassifications.csv

rs-graph-v1-redacted.db: a SQLite database of article-repository pairs. For each article, it includes basic bibliometric and author information; for each repository, it includes only basic repository metadata. For instructions on loading and accessing the data in this database, see https://github.com/evamaxfield/rs-graph

rs-graph-v1-prod.db: a SQLite database containing the same data as rs-graph-v1-redacted.db, plus the contributors of each repository and their details, as well as our predicted links between article authors and repository developers. Access to this file is restricted because it links personally identifiable information. For instructions on loading and accessing the data in this database, see https://github.com/evamaxfield/rs-graph

annotated-dev-author-em-resolved.csv: the annotations created by our team to train our author-developer account entity-matching model. As with rs-graph-v1-prod.db, access is restricted because the annotations link personally identifiable information. Although the training data are kept private and are available on request, the trained model is publicly available at https://github.com/evamaxfield/sci-soft-models

train-set.parquet and test-set.parquet: the exact training and test splits used to build the model.

dev-author-em-misclassifications.csv: the model's misclassifications on the test set.
Subject	Computer and Information Science
Keyword	Science of Science, Metascience, Scientific Software, Research Software, Software Contribution
License/Data Use Agreement	CC0 1.0
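
Because rs-graph-v1-redacted.db is a standard SQLite file, its contents can be surveyed with any SQLite client before following the loading instructions in the rs-graph repository. The sketch below uses only Python's built-in sqlite3 module and assumes the file has been downloaded locally; it makes no assumptions about the schema, discovering table names from sqlite_master and reporting a row count for each. The function name summarize_sqlite is hypothetical and is not part of rs-graph.

```python
import sqlite3
from pathlib import Path
from typing import Dict


def summarize_sqlite(db_path: str) -> Dict[str, int]:
    """Return a mapping of table name -> row count for a SQLite file.

    Hypothetical helper: table names are discovered at runtime from
    sqlite_master, so no particular rs-graph schema is assumed.
    """
    path = Path(db_path)
    if not path.exists():
        raise FileNotFoundError(path)
    # Open read-only through a URI so the downloaded file is never modified.
    conn = sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts: Dict[str, int] = {}
        for name in tables:
            # Identifiers cannot be bound as parameters, so quote them safely.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

For example, summarize_sqlite("rs-graph-v1-redacted.db") returns a dictionary keyed by table name. The same call should work on rs-graph-v1-prod.db for users who have been granted access. Counting rows in a database of this size (over 1 GB) may take some time.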

Files

File	Type	Size	Published	Downloads	Details	Access
annotated-dev-author-em-resolved.tab	Tabular data	1.7 MB	Jun 20, 2025	5	5 variables, 2999 observations; UNF:6:lGRiU7vbKhIKmmbzp2iYrA==	Restricted
dev-author-em-misclassifications.tab	Tabular data	1.9 KB	Jun 20, 2025	5	3 variables, 8 observations; UNF:6:o/C5Gy9BKFFEAOoO4jH9sA==	Restricted
rs-graph-v1-prod.db	Unknown	1.4 GB	Jun 20, 2025	6	MD5: 7a1b50bf0de0d1dd7818d91aea6e4821	Restricted
rs-graph-v1-redacted.db	Unknown	1.1 GB	Jun 20, 2025	5	MD5: fc8b9218fdbc684427f690266b8224cf	Public
test-set.parquet	Unknown	29.4 KB	Jun 20, 2025	5	MD5: b61a5df78c48c297db749afdb839c3e1	Restricted
train-set.parquet	Unknown	123.5 KB	Jun 20, 2025	5	MD5: 7cb2a0200598485ab441462706290bb9	Restricted

The two CSV files are listed under their archival tabular format (.tab); downloads are offered in either the original or the archival format.
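
Dataverse publishes an MD5 checksum for each of the .db and .parquet files, which makes it possible to confirm that a large download arrived intact. A minimal sketch using only Python's standard hashlib module: the EXPECTED_MD5 dictionary copies the published values, while the helper names md5_of and verify are illustrative. Files are read in 1 MiB chunks, so the 1.1 GB and 1.4 GB databases never need to fit in memory. The two .tab files are identified by UNF fingerprints rather than MD5, and this sketch does not compute UNFs.

```python
import hashlib
from pathlib import Path

# Published MD5 checksums for the files distributed in their original format.
EXPECTED_MD5 = {
    "rs-graph-v1-prod.db": "7a1b50bf0de0d1dd7818d91aea6e4821",
    "rs-graph-v1-redacted.db": "fc8b9218fdbc684427f690266b8224cf",
    "test-set.parquet": "b61a5df78c48c297db749afdb839c3e1",
    "train-set.parquet": "7cb2a0200598485ab441462706290bb9",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the checksum published for its name."""
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no published MD5 checksum for {name}")
    return md5_of(path) == expected[name]
```

For example, verify("rs-graph-v1-redacted.db") should return True for an intact copy of the public database; a False result suggests the download should be repeated.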

Citation Metadata

Persistent Identifier	doi:10.7910/DVN/KPYVI1
Publication Date	2025-06-20
Title	Code Contribution and Credit in Science
Author	Brown, Eva Maxfield (University of Washington), ORCID 0000-0003-2564-0373; Slaughter, Isaac (University of Washington), ORCID 0000-0002-1911-2374; Weber, Nicholas (University of Washington), ORCID 0000-0002-6008-3763
Point of Contact	Brown, Eva Maxfield (University of Washington)
Depositor	Brown, Eva Maxfield
Deposit Date	2025-02-25

Dataset Terms

License/Data Use Agreement

Our Community Norms, as well as good scientific practice, expect proper credit to be given via citation. Please use the data citation shown on the dataset page.

Creative Commons CC0 1.0 Universal Public Domain Dedication (CC0 1.0)

Restricted Files + Terms of Access

Restricted Files

There are 5 restricted files in this dataset.

Request Access

Users may request access to files.