Linguistic Data Consortium Dataverse

This dataverse contains LDC membership data for Harvard University affiliates.

1 to 6 of 6 Results

Linguistic Data Consortium Harvard Membership - General Information

Aug 2, 2016

Barbosa, Sonia, 2016, "Linguistic Data Consortium Harvard Membership - General Information", https://doi.org/10.7910/DVN/WL1DFP, Harvard Dataverse, V1

Membership Years: 1993 (Not-for-Profit, Standard); 1994 (Not-for-Profit, Standard); 1995 (Not-for-Profit, Standard); 1996 (Not-for-Profit, Standard); 1997 (Not-for-Profit, Standard); 1998 (Not-for-Profit, Standard); 1999 (Not-for-Profit, Standard); 2000 (Not-for-Profit, Standard); 2001 (Not-for-Profit, Standard); 2002 (Not-for-Profit, Standard); 2003 (Not-for...

Continuous Speech Recognition Corpus - Disc 1 of 1

Aug 2, 2016

LDC, 2016, "Continuous Speech Recognition Corpus - Disc 1 of 1", https://doi.org/10.7910/DVN/P0PUTV, Harvard Dataverse, V1

The third ARPA Continuous Speech Recognition (CSR) Benchmark Speech Test Collection is a three CD-ROM set that contains complete development test and evaluation test suites for speaker-independent, large-vocabulary speech recognition systems. The development and evaluation tests share a common structure, consisting of two core test components ("hub...

CSR-II (WSJ1) Sennheiser Discs 1 - 3

Aug 2, 2016

LDC, 2016, "CSR-II (WSJ1) Sennheiser Discs 1 - 3", https://doi.org/10.7910/DVN/OVXSNR, Harvard Dataverse, V1

The complete WSJ1 corpus contains approximately 78,000 training utterances (73 hours of speech), 4,000 of which are the result of spontaneous dictation by journalists with varying degrees of experience in dictation. The corpus contains approximately 8,200 conventional development test utterances (eight hours of speech), 6,800 of which are from spon...

CSR-IV HUB4

Aug 2, 2016

Garofolo, John; Fiscus, Jonathan; Fisher, William; Pallett, David, 2016, "CSR-IV HUB4", https://doi.org/10.7910/DVN/BT8CTN, Harvard Dataverse, V1

This set of CD-ROMs contains all of the speech data provided to sites participating in the DARPA CSR November 1995 HUB4 (Radio) Broadcast News tests. The data consists of digitized waveforms of MarketPlace (tm) business news radio shows provided by KUSC through an agreement with the Linguistic Data Consortium and detailed transcriptions of those br...

CSR-IV HUB3

Aug 2, 2016

Fiscus, Jonathan; Garofolo, John; Pallett, David, 2016, "CSR-IV HUB3", https://doi.org/10.7910/DVN/DACJZB, Harvard Dataverse, V1

This set of CD-ROMs contains all of the speech data provided to sites participating in the DARPA CSR November 1995 HUB3 Multi-Microphone tests. The data consists of digitized waveforms collected simultaneously with eight different microphones from 40 subjects reading 15-sentence articles drawn from various North American business news publications....

CSR-I (WSJ0) Other Discs 1 - 2

Aug 2, 2016

Garofolo, John; Graff, David; Paul, Doug; Pallett, David, 2016, "CSR-I (WSJ0) Other Discs 1 - 2", https://doi.org/10.7910/DVN/ZVU9HF, Harvard Dataverse, V1

LDC93S6A - Complete CSR-I corpus
LDC93S6B - CSR-I Sennheiser speech
LDC93S6C - CSR-I other speech

During 1991, the DARPA Spoken Language Program initiated efforts to build a new corpus to support research on large-vocabulary Continuous Speech Recognition (CSR) systems. The first two CSR Corpora consist primarily of read speech with texts drawn from...
