Replication Data for: Haplotype heterogeneity and low linkage disequilibrium reduce reliable prediction of genotypes for the ‑α3.7I form of α-thalassaemia using genome-wide microarray data (doi:10.7910/DVN/YTXAHR)

Document Description

Citation

Title:

Replication Data for: Haplotype heterogeneity and low linkage disequilibrium reduce reliable prediction of genotypes for the ‑α3.7I form of α-thalassaemia using genome-wide microarray data

Identification Number:

doi:10.7910/DVN/YTXAHR

Distributor:

Harvard Dataverse

Date of Distribution:

2020-09-25

Version:

4

Bibliographic Citation:

Ndila, Carolyne M.; Nyirongo, Vysaul; Macharia, Alexander W.; Jeffreys, Anna E.; Rowlands, Kate; Hubbart, Christina; Busby, George B. J.; Band, Gavin; Harding, Rosalind; Rockett, Kirk A.; Williams, Thomas N.; the MalariaGEN Consortium, 2020, "Replication Data for: Haplotype heterogeneity and low linkage disequilibrium reduce reliable prediction of genotypes for the ‑α3.7I form of α-thalassaemia using genome-wide microarray data", https://doi.org/10.7910/DVN/YTXAHR, Harvard Dataverse, V4, UNF:6:6YD+L93LLD7FbMgF6sUNIQ== [fileUNF]

Study Description

Citation

Title:

Replication Data for: Haplotype heterogeneity and low linkage disequilibrium reduce reliable prediction of genotypes for the ‑α3.7I form of α-thalassaemia using genome-wide microarray data

Identification Number:

doi:10.7910/DVN/YTXAHR

Authoring Entity:

Ndila, Carolyne M. (Department of Epidemiology and Demography, KEMRI-Wellcome Trust Research Programme, CGMRC, PO Box 230-80108, Kenya)

Nyirongo, Vysaul (United Nations Statistics Division, 760 United Nations Plaza, Manhattan, New York City, New York 10017, United States)

Macharia, Alexander W. (Department of Epidemiology and Demography, KEMRI-Wellcome Trust Research Programme, CGMRC, PO Box 230-80108, Kenya)

Jeffreys, Anna E. (Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK)

Rowlands, Kate (Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK)

Hubbart, Christina (Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK)

Busby, George B. J. (Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK; Centre for Genomics and Global Health, Big Data Institute, Old Road Campus, Oxford OX3 7LF, UK)

Band, Gavin (Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK)

Harding, Rosalind (Departments of Zoology and Statistics, Zoology Research and Administration Building, 11a Mansfield Road, Oxford OX1 3SZ, UK)

Rockett, Kirk A. (Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK; Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK)

Williams, Thomas N. (Department of Epidemiology and Demography, KEMRI-Wellcome Trust Research Programme, CGMRC, PO Box 230-80108, Kenya; Department of Infectious Diseases, Imperial College Faculty of Medicine, London W2 1NY, UK)

the MalariaGEN Consortium (United Nations Statistics Division, 760 United Nations Plaza, Manhattan, New York City, New York 10017, United States; Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; MalariaGEN Consortium members are listed in full in Supplementary Materials)

Distributor:

Harvard Dataverse

Access Authority:

Williams, Thomas N.

Access Authority:

Rockett, Kirk A.

Depositor:

Ndila, Carolyne M.

Date of Deposit:

2020-09-25

Holdings Information:

https://doi.org/10.7910/DVN/YTXAHR

Study Scope

Keywords:

Medicine, Health and Life Sciences, Malaria, α-thalassemia, Predictive Models, multinomial regression-model, Classification and Regression Tree, GWAS, haplotypes

Abstract:

This is a replication dataset for the submitted manuscript: "Haplotype heterogeneity and low linkage disequilibrium reduce reliable prediction of genotypes for the ‑α3.7I form of α-thalassaemia using genome-wide microarray data" (https://doi.org/10.12688/wellcomeopenres.16320.1).

The dataset contains genotyped α-thalassemia mutations from more than 6,000 individuals from Kilifi, Kenya. These data, along with their corresponding Illumina HumanOmni2.5-4 microarray data, were used to investigate the haplotype structure of α-thalassemia deletion variants in the population, and the potential utility of a wide range of indirect GWAS-based approaches, including the microarray-chip intensity data and haplotype imputation, as an alternative to direct typing.

Version notes: This revision was made following review of the primary manuscript related to this data package. It includes two updated files and an additional data file with an associated description file.

Notes:

Data Access: Open

License: Creative Commons Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/legalcode)

Methodology and Processing

Sources Statement

Data Access

Citation Requirement:

No restrictions are applied to this dataset, except that the data and the main publication (Ndila et al. Wellcome Open Research 2020: DOI to be added) must be cited if you use this collection.

Notes:

This data is licensed under the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/legalcode).

Other Study Description Materials

File Description--f4190908

File: Underlying_DATA1.tab

  • Number of cases: 80339

  • No. of variables per record: 28

  • Type of File: text/tab-separated-values

Notes:

UNF:6:qmd5QsWBR3qJpMQPhDieRA==

File Description--f4190906

File: Underlying_DATA2.tab

  • Number of cases: 101

  • No. of variables per record: 5

  • Type of File: text/tab-separated-values

Notes:

UNF:6:oMnR5XximSAKRA/aes7Y6Q==

File Description--f4190911

File: Underlying_DATA3.tab

  • Number of cases: 103

  • No. of variables per record: 8

  • Type of File: text/tab-separated-values

Notes:

UNF:6:3cdaz0Zqm7+Nro+U7/GaUQ==
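The case and variable counts in the file descriptions above can be used as a quick integrity check after download. The following is a minimal sketch, assuming each .tab file has a single header row and that "Number of cases" counts data rows excluding that header; the helper names (tab_shape, matches_codebook) and the toy sample are illustrative and not part of the dataset.

```python
import csv
import io

# Expected (cases, variables) taken from the file descriptions in this codebook.
EXPECTED = {
    "Underlying_DATA1.tab": (80339, 28),
    "Underlying_DATA2.tab": (101, 5),
    "Underlying_DATA3.tab": (103, 8),
}


def tab_shape(handle):
    """Return (data rows, columns) for a tab-separated file with one header row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    n_rows = sum(1 for _ in reader)
    return n_rows, len(header)


def matches_codebook(name, handle):
    """True if the file's shape equals the shape listed in the codebook."""
    return tab_shape(handle) == EXPECTED[name]


# Toy stand-in for a real file (hypothetical content, for illustration only).
sample = io.StringIO("number\ttablename\tcomment\n1\tTable 1\tfirst\n2\tTable 2\tsecond\n")
sample_shape = tab_shape(sample)
print(sample_shape)
```

For a downloaded file, the same check would be run on a handle opened with open(path, newline="", encoding="utf-8"); the encoding is an assumption and may need adjusting.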

Variable Description

List of Variables:

Variables

manifestrow

f4190908 Location:

Summary Statistics: StDev 23192.01597533082; Min. 7.0; Valid 80339.0; Mean 40176.0; Max. 80345.0

Variable Format: numeric

Notes: UNF:6:DaGKTZ/mmUKUZV0I1AdzNg==

IlmnID

f4190908 Location:

Variable Format: character

Notes: UNF:6:++cbsO3T3LLWbBJEWjQBSQ==

Name

f4190908 Location:

Variable Format: character

Notes: UNF:6:+xcjJZTBi3VTMB1hd2LAXw==

GenomeBuild

f4190908 Location:

Summary Statistics: StDev 0.0; Valid 80339.0; Mean 37.1; Min. 37.1; Max. 37.1

Variable Format: numeric

Notes: UNF:6:YgYicvvY+JJKRFeXzwGBIg==

Chr

f4190908 Location:

Summary Statistics: Min. 16.0; Mean 16.0; Valid 80339.0; StDev 0.0; Max. 16.0

Variable Format: numeric

Notes: UNF:6:I0dRupjB24OU842pVo3BTg==

MapInfo

f4190908 Location:

Summary Statistics: Mean 4.934981678617484E7; StDev 3.058756132547013E7; Valid 80339.0; Min. 84130.0; Max. 9.0170495E7

Variable Format: numeric

Notes: UNF:6:0HGqwbraRclvP+pY1xoRgg==

GWASQCgood

f4190908 Location:

Summary Statistics: Max. 1.0; Valid 80339.0; StDev 0.4440361562122193; Min. 0.0; Mean 0.7298572299879214

Variable Format: numeric

Notes: UNF:6:dS6dJ7oIUHj90ai18YLoNg==

10MBselection

f4190908 Location:

Summary Statistics: Mean 0.12102465801185179; Max. 1.0; Min. 0.0; Valid 80339.0; StDev 0.32615795911116136

Variable Format: numeric

Notes: UNF:6:F5ND0Qrq4eLmStmlZM95mQ==

400kbselection

f4190908 Location:

Summary Statistics: Min. 0.0; Valid 80339.0; Mean 0.0022156113469205094; Max. 1.0; StDev 0.04701839991750037

Variable Format: numeric

Notes: UNF:6:7+yW5T8rF2dCcA7H0y8rLQ==

a-3.7deletion

f4190908 Location:

Summary Statistics: Valid 80339.0; Max. 1.0; StDev 0.008641694305226402; Min. 0.0; Mean 7.46835285477218E-5

Variable Format: numeric

Notes: UNF:6:XL6rDz3zWRjntArSH8w3fQ==

intensityanalysis

f4190908 Location:

Summary Statistics: Max. 1.0; StDev 0.009978444850933061; Min. 0.0; Valid 80339.0; Mean 9.957803806392337E-5

Variable Format: numeric

Notes: UNF:6:BkoSOp223BCpiDif6WMxfg==

duplicateposition

f4190908 Location:

Summary Statistics: Min. 0.0; Mean 0.0058751042457578185; StDev 0.0764242114518683; Max. 1.0; Valid 80339.0

Variable Format: numeric

Notes: UNF:6:8iY4cvMH2qDLDzMmH2Ldvw==

IlmnStrand

f4190908 Location:

Variable Format: character

Notes: UNF:6:bv8pjTIQTvphIDjxVzC72Q==

SNP

f4190908 Location:

Variable Format: character

Notes: UNF:6:TVWPVjgQlZHurdgIn2UzbA==

AddressAID

f4190908 Location:

Summary Statistics: Min. 1.0600301E7; Max. 1.74810344E8; Mean 9.270116416772789E7; Valid 80339.0; StDev 5.3812700672790125E7

Variable Format: numeric

Notes: UNF:6:/o714lJlEmsVRHBiWjEsyQ==

AlleleAProbeSeq

f4190908 Location:

Variable Format: character

Notes: UNF:6:ZHBZ2rcgDZi37FP77yv3xQ==

AddressBID

f4190908 Location:

Summary Statistics: Min. 1.0654435E7; StDev 4.372503786927394E7; Max. 1.74805444E8; Valid 3388.0; Mean 1.2355457655785125E8

Variable Format: numeric

Notes: UNF:6:ZBoqTkWQVyMgDEggnVz5/Q==

AlleleBProbeSeq

f4190908 Location:

Variable Format: character

Notes: UNF:6:AxVWMj/9VcZJTFgm0ap7Xg==

Ploidy

f4190908 Location:

Variable Format: character

Notes: UNF:6:cjri4Oymy5WEg+RqQVU4ng==

Species

f4190908 Location:

Variable Format: character

Notes: UNF:6:x1C+M2rsk7cltscs590TWQ==

Source

f4190908 Location:

Variable Format: character

Notes: UNF:6:lw3lOojmN6lrAP2udgbLCg==

SourceVersion

f4190908 Location:

Summary Statistics: StDev 58.88280842680909; Mean 36.82507872888718; Max. 131.0; Valid 80339.0; Min. 0.0

Variable Format: numeric

Notes: UNF:6:i7QdlUtOMj1mwEI2txvYLg==

SourceStrand

f4190908 Location:

Variable Format: character

Notes: UNF:6:jqWQyKrUn+Idzz55Xcc8ZQ==

SourceSeq

f4190908 Location:

Variable Format: character

Notes: UNF:6:Lw9ezuDgXDXV1E4tGlrgpA==

TopGenomicSeq

f4190908 Location:

Variable Format: character

Notes: UNF:6:klkNjwUwa+RZfk4nsLyyeg==

BeadSetID

f4190908 Location:

Summary Statistics: StDev 24.660097828271656; Min. 216.0; Valid 80339.0; Max. 314.0; Mean 257.6210931179131

Variable Format: numeric

Notes: UNF:6:jE/AZE1S1vwXGo0TiZ64aw==

ExpClusters

f4190908 Location:

Summary Statistics: StDev 0.0; Min. 3.0; Valid 80339.0; Mean 3.0; Max. 3.0

Variable Format: numeric

Notes: UNF:6:eO5x+Ge9f/ub1Zgryxj/wA==

RefStrand

f4190908 Location:

Variable Format: character

Notes: UNF:6:3n9irERR/bSiPzGWRmjVsw==
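Several f4190908 variables (for example GWASQCgood, a-3.7deletion and intensityanalysis) have Min. 0.0 and Max. 1.0, which suggests 0/1 indicator flags. Under that assumption, the number of rows flagged 1 can be recovered from the summary statistics as Mean × Valid. The sketch below parses "Summary Statistics:" lines copied from this codebook and applies that arithmetic; parse_summary and flagged_rows are illustrative names, and the 0/1 interpretation is an assumption that the codebook itself does not state.

```python
def parse_summary(line):
    """Parse a 'Summary Statistics:' line into a dict of floats.

    Tolerates any ordering of the statistics and a trailing semicolon.
    """
    body = line.split(":", 1)[1]
    stats = {}
    for item in body.split(";"):
        item = item.strip()
        if not item:
            continue  # empty piece left by a trailing semicolon
        name, value = item.rsplit(" ", 1)
        stats[name.rstrip(".")] = float(value)
    return stats


def flagged_rows(stats):
    """Rows equal to 1, assuming the variable only takes the values 0 and 1."""
    return round(stats["Mean"] * stats["Valid"])


# Lines copied from the f4190908 variable entries in this codebook.
SUMMARY_LINES = {
    "GWASQCgood": "Summary Statistics: Max. 1.0; Valid 80339.0; "
                  "StDev 0.4440361562122193; Min. 0.0; Mean 0.7298572299879214",
    "a-3.7deletion": "Summary Statistics: Valid 80339.0; Max. 1.0; "
                     "StDev 0.008641694305226402; Min. 0.0; Mean 7.46835285477218E-5",
    "intensityanalysis": "Summary Statistics: Max. 1.0; StDev 0.009978444850933061; "
                         "Min. 0.0; Valid 80339.0; Mean 9.957803806392337E-5",
}

counts = {name: flagged_rows(parse_summary(line))
          for name, line in SUMMARY_LINES.items()}
print(counts)
```

If the 0/1 assumption holds, this gives 58636 flagged rows for GWASQCgood, 6 for a-3.7deletion and 8 for intensityanalysis, out of 80339 valid rows each.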

number

f4190906 Location:

Summary Statistics: StDev 11.113055385446435; Valid 38.0; Min. 1.0; Mean 19.5; Max. 38.0

Variable Format: numeric

Notes: UNF:6:b2i4E6BA+jAfparB7rxgZQ==

tablename

f4190906 Location:

Variable Format: character

Notes: UNF:6:AUwjygZLF29nwQKmcx84Vg==

comment

f4190906 Location:

Variable Format: character

Notes: UNF:6:kKrrLjc6mlOIckSKrV/E7g==

D

f4190906 Location:

Summary Statistics: Valid 0.0; Min. NaN; Max. NaN; StDev NaN; Mean NaN

Variable Format: numeric

Notes: UNF:6:K1NCHP3PnX8R35PvIK/SSw==

UNDERLYINGDATA2

f4190906 Location:

Variable Format: character

Notes: UNF:6:6osO3t5q0Ff7w/08JzOdNQ==

number

f4190911 Location:

Variable Format: character

Notes: UNF:6:oTADAuxI+JZbbAxF0NIWCw==

tablename

f4190911 Location:

Variable Format: character

Notes: UNF:6:MAZXUKfkdQecxFqF+TCVkA==

comment

f4190911 Location:

Variable Format: character

Notes: UNF:6:5gGcJycf67wF/XPpK2bOaA==

D

f4190911 Location:

Summary Statistics: Mean NaN; Max. NaN; StDev NaN; Min. NaN; Valid 0.0

Variable Format: numeric

Notes: UNF:6:BJ4RN6pcq5sto9Th0wqu9g==

E

f4190911 Location:

Summary Statistics: Max. NaN; StDev NaN; Valid 0.0; Min. NaN; Mean NaN

Variable Format: numeric

Notes: UNF:6:BJ4RN6pcq5sto9Th0wqu9g==

F

f4190911 Location:

Summary Statistics: Mean NaN; StDev NaN; Max. NaN; Min. NaN; Valid 0.0

Variable Format: numeric

Notes: UNF:6:BJ4RN6pcq5sto9Th0wqu9g==

FIELDS

f4190911 Location:

Variable Format: character

Notes: UNF:6:qKvBsHrC5EWQoWevt5nhDA==

comment

f4190911 Location:

Variable Format: character

Notes: UNF:6:9LjPA/n6g3J6KGC/YYdRfg==
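The variables D, E and F in f4190906 and f4190911 are listed with Valid 0.0 and NaN statistics, so they appear to hold no values. The sketch below shows one way to detect such columns before analysis. It assumes missing values are stored as empty cells and addresses columns by position, because the name "comment" occurs twice in f4190911; empty_columns and the toy sample are illustrative and not part of the dataset.

```python
import csv
import io


def empty_columns(handle):
    """Return (index, name) for every column with no non-blank cell.

    Columns are tracked by position so that duplicate header names
    are reported separately.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    has_value = [False] * len(header)
    for row in reader:
        # Ignore any cells beyond the header width.
        for i, cell in enumerate(row[:len(header)]):
            if cell.strip():
                has_value[i] = True
    return [(i, name) for i, name in enumerate(header) if not has_value[i]]


# Toy stand-in for a real file (hypothetical content, for illustration only).
sample = io.StringIO(
    "number\ttablename\tcomment\tD\n"
    "1\tTable 1\tnote\t\n"
    "2\tTable 2\t\t\n"
)
unused = empty_columns(sample)
print(unused)
```

Columns reported this way can be dropped on load. If the files encode missing values differently (for example as the text "NA"), the blank-cell test would need to be adapted.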

Other Study-Related Materials

Label:

Extended_Data_AUG2021.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

README_Ndila_et_al_alpha-thalassemia_in_KENYA_v2.txt

Notes:

text/plain

Other Study-Related Materials

Label:

Underlying_Data1_descriptions.xlsx

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Other Study-Related Materials

Label:

Underlying_Data2_descriptions.xlsx

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Other Study-Related Materials

Label:

Underlying_Data3_descriptions.xlsx

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Other Study-Related Materials

Label:

Underlying_data4.tab

Notes:

text/tsv

Other Study-Related Materials

Label:

Underlying_DATA4_description.txt

Notes:

text/plain