UBC Abacus Harvested Dataverse

81 to 90 of 2,445 Results

Second DIHARD Challenge Development - SEEDLingS Aug 18, 2023 Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2021, "Second DIHARD Challenge Development - SEEDLingS", https://doi.org/10.35111/B2YG-BZ44 Abstract Introduction Second DIHARD Challenge Development - SEEDLingS was developed by Duke University and LDC and contains approximately two hours of English child language recordings along with corresponding annotations used in support of the Second DIHARD Challenge. This release, when combined with Second DIHARD Challenge Development - Eleven So... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Spoken Digits in Hindi and Indian English Aug 18, 2023 Sen Bhattacharya, Basabdatta; Subramanian, Aiswarya; Chatterjee, Purbayan; Dey, Sounak, 2022, "Spoken Digits in Hindi and Indian English", https://doi.org/10.35111/5WAY-1446 Abstract Introduction Spoken Digits in Hindi and Indian English was developed by the Birla Institute of Technology and Science Pilani. It contains approximately two hours of speech comprised of spoken digits from one to ten in Hindi and English with regional accents from across India. Data The speech data was collected as follows: in person, on a m... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Mixer 7 Spanish Speech Aug 18, 2023 Brandschain, Linda; Walker, Kevin; Graff, David, 2023, "Mixer 7 Spanish Speech", https://doi.org/10.35111/RVD7-7107 Abstract Introduction Mixer 7 Spanish Speech (LDC2023S04) was developed by the Linguistic Data Consortium (LDC) and contains 9,600 hours of audio recordings of interviews, transcript readings and conversational telephone speech involving 191 distinct native Spanish speakers. This material was collected by LDC in 2011 and 2012 as part of the Mixer p... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Moroccan Arabic - English Lexical Database Aug 18, 2023 Maamouri, Mohamed; Graff, David, 2023, "Moroccan Arabic - English Lexical Database", https://doi.org/10.35111/8FZ8-R860 Abstract Introduction Moroccan Arabic - English Lexical Database was developed by the Linguistic Data Consortium (LDC). It is comprised of a set of five interrelated tables presenting each Moroccan Arabic word as an orthographic form in Arabic script and a pronunciation form in International Phonetic Alphabet (IPA) format. This release contains ove... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
LORELEI Thai Representative Language Pack Aug 18, 2023 Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2023, "LORELEI Thai Representative Language Pack", https://doi.org/10.35111/MYDH-2926 Abstract Introduction LORELEI Thai Representative Language Pack (LDC2023T08) consists of Thai monolingual text, Thai-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium (LDC) for the DARPA LORELEI program. The LORELEI (Low Resource Languages for Emergent Incidents) progra... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Samrómur Children Icelandic Speech 1.0 Aug 18, 2023 Hernández Mena, Carlos Daniel; Borsky, Michal; Mollberg, David; Guðmundsson, Smári Freyr; Hedström, Staffan; Pálsson, Ragnar; Jónsson, Ólafur Helgi; Þorsteinsdóttir, Sunneva; Guðmundsdóttir, Jóhanna Vigdís; Magnusdottir, Eydis Huld; Þórhallsdóttir, Ragnheiður; Gudnason, Jon, 2022, "Samrómur Children Icelandic Speech 1.0", https://doi.org/10.35111/FRRJ-QD60 Abstract Introduction Samrómur Children Icelandic Speech 1.0 was developed by the Language and Voice Lab, Reykjavik University in cooperation with Almannarómur, Center for Language Technology. The corpus contains 131 hours of Icelandic prompted speech from 3,175 speakers (children, aged 4-17 years) representing 137,597 utterances. This version 1.0... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Postal Code Conversion File Plus (PCCF+) Version 8A1, December 2022 Postal Codes Jul 29, 2023 Statistics Canada, 2023, "Postal Code Conversion File Plus (PCCF+) Version 8A1, December 2022 Postal Codes", https://hdl.handle.net/11272.1/AB2/FPEURY, Statistics Canada Overview The PCCF+ is a SAS control program and set of associated datasets derived from the PCCF, a 2021 postal code population weight file, the Geographic Attribute File, Health Region boundary files, and other supplementary data. PCCF+ automatically assigns a range of Statistics Canada standard geographic areas and other geographic identifiers ba... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Postal Code Conversion File Plus (PCCF+) Version 8A, December 2022 Postal Codes Jul 29, 2023 Statistics Canada, 2023, "Postal Code Conversion File Plus (PCCF+) Version 8A, December 2022 Postal Codes", https://hdl.handle.net/11272.1/AB2/0LDK7U, Statistics Canada Overview The PCCF+ is a SAS control program and set of associated datasets derived from the PCCF, a 2021 postal code population weight file, the Geographic Attribute File, Health Region boundary files, and other supplementary data. PCCF+ automatically assigns a range of Statistics Canada standard geographic areas and other geographic identifiers ba... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Ethnobotanical Research and Language Documentation of Nahuatl Jul 25, 2023 Amith, Jonathan D.; Alcántara, Amelia Domínguez; Osollo, Hermelindo Salazar; Castañeda, Ceferino Salgado; Salgado, Eleuterio Gorostiza, 2021, "Ethnobotanical Research and Language Documentation of Nahuatl", https://doi.org/10.35111/9DJS-6V63 Abstract Introduction Ethnobotanical Research and Language Documentation of Nahuatl consists of approximately 190 hours of field recordings collected in the Sierra Nororiental and Sierra Norte regions of Puebla, Mexico. The corpus contains audio and video recordings of native Nahuatl speakers during the collection of particular plants; partial tran... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Second DIHARD Challenge Evaluation - SEEDLingS Jul 25, 2023 Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2022, "Second DIHARD Challenge Evaluation - SEEDLingS", https://doi.org/10.35111/MFAM-HF33 Abstract Introduction Second DIHARD Challenge Evaluation - SEEDLingS was developed by Duke University and the Linguistic Data Consortium (LDC) and contains approximately two hours of English child language recordings along with corresponding annotations used in support of the Second DIHARD Challenge. The DIHARD Challenges are a set of shared tasks... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
