CSR-IV HUB4 (doi:10.7910/DVN/BT8CTN)

Document Description

Citation

Title:

CSR-IV HUB4

Identification Number:

doi:10.7910/DVN/BT8CTN

Distributor:

Harvard Dataverse

Date of Distribution:

2016-08-02

Version:

1

Bibliographic Citation:

Garofolo, John; Fiscus, Johnathan; Fisher, William; Pallett, David, 2016, "CSR-IV HUB4", https://doi.org/10.7910/DVN/BT8CTN, Harvard Dataverse, V1

Study Description

Citation

Title:

CSR-IV HUB4

Identification Number:

doi:10.7910/DVN/BT8CTN

Authoring Entity:

Garofolo, John

Fiscus, Johnathan

Fisher, William

Pallett, David

Distributor:

Harvard Dataverse

Depositor:

Cabanas, Jordi

Date of Deposit:

2016-06-27

Holdings Information:

https://doi.org/10.7910/DVN/BT8CTN

Study Scope

Keywords:

Social Sciences, LDC Catalog No.: LDC96S31, ISBN: 1-58563-087-X, LDC

Abstract:

This set of CD-ROMs contains all of the speech data provided to sites participating in the DARPA CSR November 1995 HUB4 (Radio) Broadcast News tests. The data consists of digitized waveforms of MarketPlace (tm) business news radio shows, provided by KUSC through an agreement with the Linguistic Data Consortium, and detailed transcriptions of those broadcasts. The software NIST used to process and score the output of the test systems is also included. The data is organized as follows:

CD26-1: Training Data. Ten complete half-hour broadcasts with minimally verified transcripts. The transcripts are time-aligned with the waveforms at the story-boundary level.

CD26-2: Development-Test Data. Six complete half-hour broadcasts with verified transcripts. The transcripts are time-aligned with the waveforms at the story- and turn-boundary level. Index files are included that specify how the data may be partitioned into two test sets.

CD26-6: Evaluation-Test Data. Five complete half-hour broadcasts with verified/adjudicated transcripts. The transcripts are time-aligned with the waveforms at the story-, turn- and music-boundary level. An index file is included that specifies how the data was partitioned into the test set used in the CSR 1995 HUB4 tests.

Methodology and Processing

Sources Statement

Documentation and Access to Sources:

The files are too large to be provided directly on Dataverse. To access this data, please bring a Harvard University ID and a flash drive with 2 GB capacity to CGIS Knafel, Room 350, 1737 Cambridge St., Cambridge, MA 02138.

Data Access

Notes:

Use of the datasets is restricted to Harvard University affiliates.

The files are too large to be provided directly on Dataverse. To access this data, please bring a Harvard University ID and a flash drive with 2 GB capacity to CGIS Knafel, Room 350, 1737 Cambridge St., Cambridge, MA 02138.

Other Study Description Materials