Date: Tue, 21 May 1996 06:12:29 -0400


Subject: linguistic data consortium

Perhaps this will be of interest to some. From the Humanist list:

------- Forwarded Message Follows -------

Humanist Discussion Group, Vol. 10, No. 35.

Center for Electronic Texts in the Humanities (Princeton/Rutgers)

Information at


Date: Mon, 20 May 1996 08:24:24 -0400


Announcing a NEW RELEASE from the Linguistic Data Consortium:

Radio Broadcast News
Continuous Speech Recognition Corpus (Hub-4)

This set of CD-ROMs contains all of the speech data provided to sites
participating in the DARPA CSR November 1995 Hub-4 (Radio) Broadcast
News tests. The data consists of digitized waveforms of MarketPlace
(tm) business news radio shows provided by KUSC through an agreement
with the Linguistic Data Consortium, and detailed transcriptions of
those broadcasts. The software NIST used to process and score the
output of the test systems is also included.

The data is organized as follows:

CD26-1: Training Data - Ten complete half-hour broadcasts with
minimally verified transcripts. The transcripts are time aligned with
the waveforms at the story-boundary level.

CD26-2: Development-Test Data - Six complete half-hour broadcasts with
verified transcripts. The transcripts are time aligned with the
waveforms at the story- and turn-boundary level. Index files have been
included which specify how the data may be partitioned into 2 test
sets.

CD26-6: Evaluation-Test Data - Five complete half-hour broadcasts with
verified/adjudicated transcripts. The transcripts are time aligned
with the waveforms at the story-, turn-, and music-boundary level. An
index file has been included which specifies how the data was
partitioned into the test set used in the CSR 1995 Hub-4 tests.

Institutions that are members of the LDC during the 1996 Membership
Year will be able to receive a copy of the Radio Broadcast News corpus
at no additional charge, in the same manner as all other text and
speech corpora published by the LDC.

Nonmembers can receive a copy of this corpus, for research purposes
only, for a fee of $2,500. If you would like to order a copy of this
corpus, please email your request to ldc[AT SYMBOL GOES HERE]. If you
need additional information before placing your order, or would like
to inquire about membership in the LDC, please send email or call
(215) 898-0464.

Further information about the LDC and its available corpora can be
accessed on the Linguistic Data Consortium WWW Home Page at URL
Information is also available via ftp at under pub/ldc; for ftp
access, please use "anonymous" as your login name, and give your
email address when asked for a password.