81 to 90 of 2,445 Results
Aug 18, 2023
Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2021, "Second DIHARD Challenge Development - SEEDLingS", https://doi.org/10.35111/B2YG-BZ44
Abstract Introduction Second DIHARD Challenge Development - SEEDLingS was developed by Duke University and LDC and contains approximately two hours of English child language recordings along with corresponding annotations used in support of the Second DIHARD Challenge. This release, when combined with Second DIHARD Challenge Development - Eleven So...
This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Aug 18, 2023
Sen Bhattacharya, Basabdatta; Subramanian, Aiswarya; Chatterjee, Purbayan; Dey, Sounak, 2022, "Spoken Digits in Hindi and Indian English", https://doi.org/10.35111/5WAY-1446
Abstract Introduction Spoken Digits in Hindi and Indian English was developed by the Birla Institute of Technology and Science Pilani. It contains approximately two hours of speech comprised of spoken digits from one to ten in Hindi and English with regional accents from across India. Data The speech data was collected as follows: in person, on a m...
Aug 18, 2023
Brandschain, Linda; Walker, Kevin; Graff, David, 2023, "Mixer 7 Spanish Speech", https://doi.org/10.35111/RVD7-7107
Abstract Introduction Mixer 7 Spanish Speech (LDC2023S04) was developed by the Linguistic Data Consortium (LDC) and contains 9,600 hours of audio recordings of interviews, transcript readings and conversational telephone speech involving 191 distinct native Spanish speakers. This material was collected by LDC in 2011 and 2012 as part of the Mixer p...
Aug 18, 2023
Maamouri, Mohamed; Graff, David, 2023, "Moroccan Arabic - English Lexical Database", https://doi.org/10.35111/8FZ8-R860
Abstract Introduction Moroccan Arabic - English Lexical Database was developed by the Linguistic Data Consortium (LDC). It is comprised of a set of five interrelated tables presenting each Moroccan Arabic word as an orthographic form in Arabic script and a pronunciation form in International Phonetic Alphabet (IPA) format. This release contains ove...
Aug 18, 2023
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2023, "LORELEI Thai Representative Language Pack", https://doi.org/10.35111/MYDH-2926
Abstract Introduction LORELEI Thai Representative Language Pack (LDC2023T08) consists of Thai monolingual text, Thai-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium (LDC) for the DARPA LORELEI program. The LORELEI (Low Resource Languages for Emergent Incidents) progra...
Aug 18, 2023
Hernández Mena, Carlos Daniel; Borsky, Michal; Mollberg, David; Guðmundsson, Smári Freyr; Hedström, Staffan; Pálsson, Ragnar; Jónsson, Ólafur Helgi; Þorsteinsdóttir, Sunneva; Guðmundsdóttir, Jóhanna Vigdís; Magnusdottir, Eydis Huld; Þórhallsdóttir, Ragnheiður; Gudnason, Jon, 2022, "Samrómur Children Icelandic Speech 1.0", https://doi.org/10.35111/FRRJ-QD60
Abstract Introduction Samrómur Children Icelandic Speech 1.0 was developed by the Language and Voice Lab, Reykjavik University in cooperation with Almannarómur, Center for Language Technology. The corpus contains 131 hours of Icelandic prompted speech from 3,175 speakers (children, aged 4-17 years) representing 137,597 utterances. This version 1.0...
Jul 29, 2023
Statistics Canada, 2023, "Postal Code Conversion File Plus (PCCF+) Version 8A1, December 2022 Postal Codes", https://hdl.handle.net/11272.1/AB2/FPEURY, Statistics Canada
Overview The PCCF+ is a SAS control program and set of associated datasets derived from the PCCF, a 2021 postal code population weight file, the Geographic Attribute File, Health Region boundary files, and other supplementary data. PCCF+ automatically assigns a range of Statistics Canada standard geographic areas and other geographic identifiers ba...
Jul 29, 2023
Statistics Canada, 2023, "Postal Code Conversion File Plus (PCCF+) Version 8A, December 2022 Postal Codes", https://hdl.handle.net/11272.1/AB2/0LDK7U, Statistics Canada
Overview The PCCF+ is a SAS control program and set of associated datasets derived from the PCCF, a 2021 postal code population weight file, the Geographic Attribute File, Health Region boundary files, and other supplementary data. PCCF+ automatically assigns a range of Statistics Canada standard geographic areas and other geographic identifiers ba...
Jul 25, 2023
Amith, Jonathan D.; Alcántara, Amelia Domínguez; Osollo, Hermelindo Salazar; Castañeda, Ceferino Salgado; Salgado, Eleuterio Gorostiza, 2021, "Ethnobotanical Research and Language Documentation of Nahuatl", https://doi.org/10.35111/9DJS-6V63
Abstract Introduction Ethnobotanical Research and Language Documentation of Nahuatl consists of approximately 190 hours of field recordings collected in the Sierra Nororiental and Sierra Norte regions of Puebla, Mexico. The corpus contains audio and video recordings of native Nahuatl speakers during the collection of particular plants; partial tran...
Jul 25, 2023
Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2022, "Second DIHARD Challenge Evaluation - SEEDLingS", https://doi.org/10.35111/MFAM-HF33
Abstract Introduction Second DIHARD Challenge Evaluation - SEEDLingS was developed by Duke University and the Linguistic Data Consortium (LDC) and contains approximately two hours of English child language recordings along with corresponding annotations used in support of the Second DIHARD Challenge. The DIHARD Challenges are a set of shared tasks...