Dec 5, 2021
Liu, Shifeng, 2021, "Replication Data for: Identifying unreported links between ClinicalTrials.gov trial registrations and their published results", https://doi.org/10.7910/DVN/MEROWG, Harvard Dataverse, V1, UNF:6:RciKFKdYt/cDo9NSBX36bg== [fileUNF]
This is the dataset we used for training and evaluation in the paper "Identifying unreported links between ClinicalTrials.gov trial registrations and their published results". The dataset was collected on 29 September 2020. The corresponding code and the structure of the dataset can be found at https://github.com/evidence-surveillance/unreported_link_iden...
Tabular Data - 346.6 KB - 2 Variables, 27369 Observations - UNF:6:pZ/N4RCHAX4DH0HJRAQSmA==
the automatically generated ground truth of clinical nctid index and PubMed publish id index pairs. NOTE: these are represented by indices transformed from the original nctid or publish id.
Comma Separated Values - 254.0 MB - MD5: a84af924369828919fffd9077b6a9aa6
the extracted clinical trial registrations. format: nctid, concatenated_text, date
Tabular Data - 2.5 MB - 2 Variables, 128493 Observations - UNF:6:7j5O4rZkqjMVMiSr4gh+jQ==
processed transformation of nctid to index. format: nctid, index
Unknown - 169.5 KB - MD5: 87aedea0c17034e896aa1e6a93565f0b
the vector storage of ground pairs.
Unknown - 169.5 KB - MD5: ae20f7aa49c2ae15e94b20685b203764
the vector storage of ground pairs.
Tabular Data - 346.6 KB - 2 Variables, 27369 Observations - UNF:6:pZ/N4RCHAX4DH0HJRAQSmA==
the automatically generated ground truth of PubMed publish id index and clinical nctid index pairs. NOTE: these are represented by indices transformed from the original nctid or publish id.
Comma Separated Values - 648.4 MB - MD5: 64100af302bc54ea5d48908fd0dcc5c9
the extracted PubMed articles. format: publishid, nctid (nan if it does not exist), concatenated_text, date
Tabular Data - 5.7 MB - 2 Variables, 378066 Observations - UNF:6:fwZ9gaEGk6ddbhOJ911beg==
processed transformation of publishid to index. format: publishid, index
Unknown - 18.4 MB - MD5: deb663cce11c36b6f3982b37634c6c21
the vectorized clinical trial registrations. These contain only the registrations on the ground truth list. The vectors are based on the term frequency-inverse document frequency (TF-IDF) of bigram tokens.
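
As a rough illustration of the vectorization described in the last file entry, the sketch below computes frequency-inverse document frequency weights over bigram tokens using only the Python standard library. It is a minimal sketch under stated assumptions, not the authors' implementation: the listing does not say whether bigrams are word-level or character-level (word-level is assumed here), the exact weighting formula is assumed, and the function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """Split text into lowercase word tokens and return adjacent word pairs.

    Assumption: word-level bigrams; the dataset may tokenize differently.
    """
    tokens = text.lower().split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def tfidf_bigram_vectors(documents):
    """Return one sparse {bigram: weight} dict per document.

    Assumed weighting: tf = count / number of bigrams in the document,
    idf = log(N / df). The weighting used for the dataset may differ.
    """
    doc_bigrams = [bigrams(doc) for doc in documents]
    n_docs = len(documents)

    # Document frequency: number of documents that contain each bigram.
    df = Counter()
    for grams in doc_bigrams:
        df.update(set(grams))

    vectors = []
    for grams in doc_bigrams:
        counts = Counter(grams)
        total = len(grams)
        vectors.append({
            gram: (count / total) * math.log(n_docs / df[gram])
            for gram, count in counts.items()
        })
    return vectors


# Hypothetical texts standing in for the concatenated_text column.
docs = [
    "randomized trial of aspirin",
    "randomized trial of statins",
]
vectors = tfidf_bigram_vectors(docs)
```

Under this weighting, bigrams shared by every document (here "randomized trial" and "trial of") receive a weight of zero, so only the distinguishing bigrams contribute when a registration is compared with an article.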