Part 1: Document Description
|
Citation |
|
---|---|
Title: |
Replication Data for: Evaluating the Performance of Tools Used to Call Minority Variants from Whole Genome Short-Read Data |
Identification Number: |
doi:10.7910/DVN/ZIO43M |
Distributor: |
Harvard Dataverse |
Date of Distribution: |
2018-01-29 |
Version: |
5 |
Bibliographic Citation: |
Mohammed, Khadija Said; Githinji, George, 2018, "Replication Data for: Evaluating the Performance of Tools Used to Call Minority Variants from Whole Genome Short-Read Data", https://doi.org/10.7910/DVN/ZIO43M, Harvard Dataverse, V5, UNF:6:PJqmV9vNQ9wsd+4PO+h8Jw== [fileUNF] |
Citation |
|
Title: |
Replication Data for: Evaluating the Performance of Tools Used to Call Minority Variants from Whole Genome Short-Read Data |
Identification Number: |
doi:10.7910/DVN/ZIO43M |
Authoring Entity: |
Mohammed, Khadija Said (KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya) |
Githinji, George (KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya) |
|
Other identifications and acknowledgements: |
Mwango, Lillian |
Distributor: |
Harvard Dataverse |
Access Authority: |
Githinji, G |
Depositor: |
Githinji, George |
Date of Deposit: |
2018-01-18 |
Holdings Information: |
https://doi.org/10.7910/DVN/ZIO43M |
Study Scope |
|
Keywords: |
Medicine, Health and Life Sciences |
Abstract: |
Several minority variant callers have been developed to describe minority variant sub-populations from whole genome sequence data. These tools differ in the bioinformatics and statistical approaches they use to distinguish sequencing errors from genuine low-frequency variants. This project evaluated the diagnostic performance of four published minority variant callers and assessed their overall concordance in reporting minority variants from short-read sequence data. The ART-Illumina read simulation tool was used to generate artificial short-read datasets of varying coverage based on a Respiratory Syncytial Virus (RSV) reference genome. The samples were spiked with nucleotide variants at predetermined positions and frequencies, and the variants were then called using FreeBayes, LoFreq, VarDict, and VarScan2. To identify the effect of data quality on the concordance and performance of the callers, we included datasets with error profiles. |
Notes: |
The datasets and analysis code (R scripts) are also available on Open Science Framework. DOI: 10.17605/OSF.IO/ZW39Q (http://doi.org/10.17605/OSF.IO/ZW39Q) |
Methodology and Processing |
|
Sources Statement |
|
Data Access |
|
Extent of Collection: |
Two (2) datasets in CSV format, analysis software code (R scripts), and a Readme file (.txt) |
Citation Requirement: |
Publications based on this data collection should acknowledge this source by means of bibliographic citation. To ensure that such source attributions are captured for bibliographic utilities, citations must appear in footnotes or in the reference section of publications. The bibliographic citation for this data collection is: "Mohammed, Khadija Said; Githinji, George, 2018, "Replication Data for: Evaluating the Performance of Tools Used to Call Minority Variants from Whole Genome Short-Read Data", doi:10.7910/DVN/ZIO43M, Harvard Dataverse, V1" |
Notes: |
These data are made available under the Creative Commons Attribution 4.0 International (CC BY 4.0) licence: https://creativecommons.org/licenses/by/4.0/legalcode. Publications based on these data should acknowledge this source by means of bibliographic citations. For more information on these data, please contact the authors, Mohammed, Khadija Said (KSaid@kemri-wellcome.org) and Githinji, George (GGithinji@kemri-wellcome.org), or the data governance office at dgc@kemri-wellcome.org |
Other Study Description Materials |
|
Related Studies |
|
The datasets and analysis code (R scripts) are also available on Open Science Framework. DOI: 10.17605/OSF.IO/ZW39Q (http://doi.org/10.17605/OSF.IO/ZW39Q) |
|
Related Publications |
|
Citation |
|
Title: |
Mohammed, K.S., Kibinge, N., Prins, P., Agoti, C.N., Cotten, M., Nokes, D.J., Brand, S. and Githinji, G., 2018. Evaluating the performance of tools used to call minority variants from whole genome short-read data. Wellcome open research, 3. |
Identification Number: |
10.12688/wellcomeopenres.13538.2 |
Bibliographic Citation: |
Mohammed, K.S., Kibinge, N., Prins, P., Agoti, C.N., Cotten, M., Nokes, D.J., Brand, S. and Githinji, G., 2018. Evaluating the performance of tools used to call minority variants from whole genome short-read data. Wellcome open research, 3. |
Label: |
concordanceplot.R |
Text: |
Analysis Code (R Script) |
Notes: |
type/x-r-syntax |
Label: |
density_plot.R |
Text: |
Analysis Code (R Script) |
Notes: |
type/x-r-syntax |
Label: |
Githinji_2018_Variant_Calling_Tools_Readme.txt |
Text: |
Readme File |
Notes: |
text/plain |
Label: |
performance_profiles.R |
Text: |
Analysis Code (R Script) |
Notes: |
type/x-r-syntax |
Label: |
supplementary File S2 .pdf |
Text: |
FastQC metrics for sample used to generate simulated reads for the second dataset |
Notes: |
application/pdf |
Label: |
Supplementary File S3.pdf |
Text: |
FastQC quality profile for sample used to generate simulated reads for the third dataset |
Notes: |
application/pdf |
Label: |
truth_table.R |
Text: |
Analysis Code (R Script) |
Notes: |
type/x-r-syntax |
Label: |
Data_Files.zip |
Text: |
Analysis Datasets (CSV format) |
Notes: |
application/zip |
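
The abstract describes scoring each caller against variants spiked in at known positions and then assessing concordance across callers. The following is a minimal Python sketch of that general idea, not the authors' R code: the function names, the caller labels (caller_a, caller_b, caller_c), and the toy positions are all hypothetical and are not taken from the deposited CSV files. It assumes a variant can be identified by its genome position and alternate base, and it treats concordance as the set of variants reported by every caller, which is only one of several possible definitions.

```python
from typing import Dict, Set, Tuple

# A variant is identified here by (genome position, alternate base).
# This representation is an assumption made for illustration only.
Variant = Tuple[int, str]


def confusion_counts(truth: Set[Variant], called: Set[Variant]) -> Dict[str, int]:
    """Count true positives, false positives and false negatives for one caller."""
    return {
        "tp": len(truth & called),   # spiked variants that were called
        "fp": len(called - truth),   # calls that were never spiked in
        "fn": len(truth - called),   # spiked variants that were missed
    }


def sensitivity(counts: Dict[str, int]) -> float:
    """Fraction of spiked variants that the caller recovered."""
    denom = counts["tp"] + counts["fn"]
    return counts["tp"] / denom if denom else 0.0


def precision(counts: Dict[str, int]) -> float:
    """Fraction of the caller's calls that were truly spiked in."""
    denom = counts["tp"] + counts["fp"]
    return counts["tp"] / denom if denom else 0.0


def concordant(calls: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Variants reported by every caller (one simple notion of concordance)."""
    return set.intersection(*calls.values()) if calls else set()


# Toy data, invented for this example; not results from the dataset.
truth: Set[Variant] = {(100, "A"), (250, "G"), (400, "T"), (975, "C")}
calls: Dict[str, Set[Variant]] = {
    "caller_a": {(100, "A"), (250, "G"), (400, "T"), (512, "C")},
    "caller_b": {(100, "A"), (250, "G")},
    "caller_c": {(100, "A"), (250, "G"), (975, "C"), (30, "T")},
}

for name, called in sorted(calls.items()):
    counts = confusion_counts(truth, called)
    print(name, counts, sensitivity(counts), precision(counts))

print("called by all:", sorted(concordant(calls)))
```

Sets keep the comparison independent of the order in which a caller reports its variants. A real evaluation would also need to account for the spiked frequency and the coverage of each simulated dataset, which this sketch leaves out.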