Replication Data for: Evaluating the Performance of Tools Used to Call Minority Variants from Whole Genome Short-Read Data (doi:10.7910/DVN/ZIO43M)

Document Description

Citation

Title:

Replication Data for: Evaluating the Performance of Tools Used to Call Minority Variants from Whole Genome Short-Read Data

Identification Number:

doi:10.7910/DVN/ZIO43M

Distributor:

Harvard Dataverse

Date of Distribution:

2018-01-29

Version:

5

Bibliographic Citation:

Mohammed, Khadija Said; Githinji, George, 2018, "Replication Data for: Evaluating the Performance of Tools Used to Call Minority Variants from Whole Genome Short-Read Data", https://doi.org/10.7910/DVN/ZIO43M, Harvard Dataverse, V5, UNF:6:PJqmV9vNQ9wsd+4PO+h8Jw== [fileUNF]

Study Description

Citation

Title:

Replication Data for: Evaluating the Performance of Tools Used to Call Minority Variants from Whole Genome Short-Read Data

Identification Number:

doi:10.7910/DVN/ZIO43M

Authoring Entity:

Mohammed, Khadija Said (KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya)

Githinji, George (KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya)

Other identifications and acknowledgements:

Mwango, Lillian

Distributor:

Harvard Dataverse

Access Authority:

Githinji, G

Depositor:

Githinji, George

Date of Deposit:

2018-01-18

Holdings Information:

https://doi.org/10.7910/DVN/ZIO43M

Study Scope

Keywords:

Medicine, Health and Life Sciences

Abstract:

Several minority variant callers have been developed to describe minority variant sub-populations from whole genome sequence data. These tools differ in the bioinformatics and statistical approaches they use to distinguish sequencing errors from genuine low-frequency variants. This project evaluated the diagnostic performance of four published minority variant callers and assessed the overall concordance among the callers when reporting minority variants from short-read sequence data. The ART-Illumina read simulation tool was used to generate artificial short-read datasets of varying coverage based on a Respiratory Syncytial Virus (RSV) reference genome. The samples were spiked with nucleotide variants at predetermined positions and frequencies, and variants were then called using FreeBayes, LoFreq, VarDict, and VarScan2. To identify the effect of data quality on the concordance and performance of the callers, we included datasets with error profiles.
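
The evaluation described in the abstract can be illustrated with a minimal sketch in Python. It is not the authors' analysis code (their scripts are in R and are listed under Other Study-Related Materials). It assumes each caller's output has been reduced to a set of genome positions, that the spiked positions form the truth set, and that concordance is measured as pairwise Jaccard overlap; the function names, the toy positions, and the choice of Jaccard are assumptions made for illustration only.

```python
from itertools import combinations


def confusion_counts(truth, called, candidate_positions):
    """Return (TP, FP, FN, TN) for one caller over the candidate positions."""
    tp = len(truth & called)            # spiked variants that were called
    fp = len(called - truth)            # calls at positions that were not spiked
    fn = len(truth - called)            # spiked variants that were missed
    tn = len(candidate_positions - truth - called)  # unspiked and uncalled
    return tp, fp, fn, tn


def performance(truth, called, candidate_positions):
    """Sensitivity, specificity and precision of one caller."""
    tp, fp, fn, tn = confusion_counts(truth, called, candidate_positions)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
    }


def pairwise_concordance(calls):
    """Jaccard overlap of called positions for every pair of callers."""
    result = {}
    for a, b in combinations(sorted(calls), 2):
        union = calls[a] | calls[b]
        result[(a, b)] = len(calls[a] & calls[b]) / len(union) if union else 1.0
    return result


# Hypothetical toy data: a 100-position genome with four spiked variants.
genome_positions = set(range(1, 101))
truth = {10, 20, 30, 40}
calls = {
    "FreeBayes": {10, 20, 55},
    "LoFreq": {10, 20, 30},
    "VarDict": {10, 20, 30, 40, 77},
    "VarScan2": {10, 30},
}

metrics = {name: performance(truth, called, genome_positions)
           for name, called in calls.items()}
concordance = pairwise_concordance(calls)

for name in sorted(metrics):
    print(name, metrics[name])
for pair in sorted(concordance):
    print(pair, round(concordance[pair], 3))
```

Treating calls as position sets ignores allele identity and reported frequency; a fuller comparison would likely match on position and alternate allele and stratify by coverage and spiked frequency.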

Notes:

The datasets and analysis code (R scripts) are also available on Open Science Framework. DOI: 10.17605/OSF.IO/ZW39Q (http://doi.org/10.17605/OSF.IO/ZW39Q)

Methodology and Processing

Sources Statement

Data Access

Extent of Collection:

Two (2) datasets in CSV format, analysis software code (R scripts), and a Readme file (.txt)

Citation Requirement:

Publications based on this data collection should acknowledge this source by means of bibliographic citation. To ensure that such source attributions are captured for bibliographic utilities, citations must appear in footnotes or in the reference section of publications. The bibliographic citation for this data collection is: "Mohammed, Khadija Said; Githinji, George, 2018, "Replication Data for: Evaluating the Performance of Tools Used to Call Minority Variants from Whole Genome Short-Read Data", doi:10.7910/DVN/ZIO43M, Harvard Dataverse, V1"

Notes:

These data are made available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://creativecommons.org/licenses/by/4.0/legalcode. Publications based on these data should acknowledge this source by means of bibliographic citations. For more information on these data, please contact the authors, Mohammed, Khadija Said (KSaid@kemri-wellcome.org) and Githinji, George (GGithinji@kemri-wellcome.org), or the data governance office at dgc@kemri-wellcome.org.

Other Study Description Materials

Related Studies

The datasets and analysis code (R scripts) are also available on Open Science Framework. DOI: 10.17605/OSF.IO/ZW39Q (http://doi.org/10.17605/OSF.IO/ZW39Q)

Related Publications

Citation

Title:

Mohammed, K.S., Kibinge, N., Prins, P., Agoti, C.N., Cotten, M., Nokes, D.J., Brand, S. and Githinji, G., 2018. Evaluating the performance of tools used to call minority variants from whole genome short-read data. Wellcome open research, 3.

Identification Number:

10.12688/wellcomeopenres.13538.2

Bibliographic Citation:

Mohammed, K.S., Kibinge, N., Prins, P., Agoti, C.N., Cotten, M., Nokes, D.J., Brand, S. and Githinji, G., 2018. Evaluating the performance of tools used to call minority variants from whole genome short-read data. Wellcome open research, 3.

Other Study-Related Materials

Label:

concordanceplot.R

Text:

Analysis Code (R Script)

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

density_plot.R

Text:

Analysis Code (R Script)

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

Githinji_2018_Variant_Calling_Tools_Readme.txt

Text:

Readme File

Notes:

text/plain

Other Study-Related Materials

Label:

performance_profiles.R

Text:

Analysis Code (R Script)

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

supplementary File S2 .pdf

Text:

FastQC metrics for sample used to generate simulated reads for the second dataset

Notes:

application/pdf

Other Study-Related Materials

Label:

Supplementary File S3.pdf

Text:

FastQC quality profile for sample used to generate simulated reads for the third dataset

Notes:

application/pdf

Other Study-Related Materials

Label:

truth_table.R

Text:

Analysis Code (R Script)

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

Data_Files.zip

Text:

Analysis Datasets (CSV format)

Notes:

application/zip