
71 to 80 of 2,445 Results
Aug 30, 2023
Tracey, Jennifer; Lee, Haejoong; Strassel, Stephanie; Ismael, Safa, 2018, "BOLT Arabic Discussion Forums", https://hdl.handle.net/11272.1/AB2/DP4INP, Linguistic Data Consortium
BOLT Arabic Discussion Forums was developed by the Linguistic Data Consortium (LDC) and consists of 813,080 discussion forum threads in Egyptian Arabic harvested from the Internet using a combination of manual and automatic processes. The DARPA BOLT (Broad Operational Language Translation) program developed machine translation and information retri...
This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Aug 30, 2023
Greenberg, Craig; Martin, Alvin; Graff, David; Brandschain, Linda; Walker, Kevin, 2017, "2010 NIST Speaker Recognition Evaluation Test Set", https://hdl.handle.net/11272.1/AB2/2CPM3O, Linguistic Data Consortium
2010 NIST Speaker Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and NIST (National Institute of Standards and Technology). It contains 2,255 hours of American English telephone speech and speech recorded over a microphone channel involving an interview scenario used as test data in the NIST-spons...
Aug 30, 2023
Bu, Hui, 2018, "AISHELL-1", https://hdl.handle.net/11272.1/AB2/2WMDTT, Linguistic Data Consortium
AISHELL-1 was developed by Beijing Shell Shell Technology Co., Ltd. It contains approximately 520 hours of Chinese Mandarin speech from 400 speakers recorded simultaneously on three different devices with associated transcripts. The goal of the collection was to support speech recognition system development in 11 domains, including smart homes, aut...
Aug 26, 2023
Walker, Kevin; Ma, Xiaoyi; Graff, David; Strassel, Stephanie; Sessa, Stephanie; Jones, Karen, 2015, "RATS Speech Activity Detection", https://hdl.handle.net/11272.1/AB2/1UISJ7, Linguistic Data Consortium
RATS Speech Activity Detection was developed by the Linguistic Data Consortium (LDC) and consists of approximately 3,000 hours of Levantine Arabic, English, Farsi, Pashto, and Urdu conversational telephone speech with automatic and manual annotation of speech segments. The corpus was created to provide training, development and ini...
Aug 26, 2023
Alwan, Abeer; Lulich, Steven; Sommers, Mitchell, 2015, "The Subglottal Resonances Database", https://hdl.handle.net/11272.1/AB2/R82KKG, Linguistic Data Consortium
The Subglottal Resonances Database was developed by Washington University and University of California Los Angeles and consists of 45 hours of simultaneous microphone and subglottal accelerometer recordings of 25 adult male and 25 adult female speakers of American English between 22 and 25 years of age. The subglottal system is compose...
Aug 19, 2023
Pradhan, Sameer; Cole, Ronald Allan; Ward, Wayne, 2021, "MyST Children's Conversational Speech", https://doi.org/10.35111/CYXY-P432
MyST (My Science Tutor) Children's Conversational Speech was developed by Boulder Learning Inc. It consists of approximately 470 hours of English speech from 1,371 students in grades 3-5 conversing with a virtual science tutor in eight areas of science instruction, along with transcripts and a pronunciation dictionary. Data...
Aug 19, 2023
Hernández Mena, Carlos Daniel; Gatt, Albert; Borg, Claudia; DeMarco, Andrea; van der Plas, Lonneke, 2022, "MASRI Synthetic", https://doi.org/10.35111/WC8H-H752
MASRI (Maltese Automatic Speech Recognition I) Synthetic was developed by the MASRI team at the University of Malta and consists of approximately 99 hours of synthesized Maltese speech. Source sentences were extracted from the Maltese Language Resource Server (MLRS) corpus, comprised of written or transcribed Maltese cove...
Aug 18, 2023
Mollberg, David; Jónsson, Ólafur Helgi; Þorsteinsdóttir, Sunneva; Guðmundsdóttir, Jóhanna Vigdís; Steingrimsson, Steinthor; Magnusdottir, Eydis Huld; Fong, Judy; Borsky, Michal; Gudnason, Jon, 2022, "Samrómur Icelandic Speech 1.0", https://doi.org/10.35111/THX3-F170
Samrómur Icelandic Speech 1.0 was developed by the Language and Voice Lab, Reykjavik University in cooperation with Almannarómur, Center for Language Technology. The corpus contains 145 hours of Icelandic prompted speech from 8,392 speakers representing 100,000 utterances. This version 1.0 is equivalent to "Samrómur Icelandic...
Aug 18, 2023
Helgadóttir, Inga Rún; Kjaran, Róbert; Nikulásdóttir, Anna Björk; Gudnason, Jon, 2021, "Althingi Parliamentary Speech", https://doi.org/10.35111/695B-6697
Althingi Parliamentary Speech consists of approximately 542 hours of recorded speech from Althingi, the Icelandic Parliament, along with corresponding transcripts, a pronunciation dictionary and two language models. Speeches date from 2005-2016. This dataset was collected in 2016 by the ASR for Althingi project at Reykjavik Un...
Aug 18, 2023
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2023, "LORELEI Indonesian Representative Language Pack", https://doi.org/10.35111/6GWW-XC16
LORELEI Indonesian Representative Language Pack consists of Indonesian monolingual text, Indonesian-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium (LDC) for the DARPA LORELEI program. The LORELEI (Low Resource Languages for Emergent Incidents) p...