Description
|
C.R.I.M.E, the Corpus of Recorded Investigative, Media, and Evidence-based proceedings, comprises audio recordings and automatic speech recognition (ASR) transcripts of investigative interviews, courtroom interactions, and related media. The resource enables analysis of linguistic, phonetic, pragmatic, and discourse-level features, supporting interdisciplinary research in linguistics, law, psychology, and computational modeling. (2025-04-28)
|
Notes
| The static version of the corpus is a table of transcripts and metadata for 23,270 recordings, with one row per recording. The columns are Playlist, Channel, ID, Title, URL, Description (if any), View Count, Duration (seconds), Uploader, Uploader ID, Uploader URL, Thumbnails, Timestamp, Release Timestamp, Availability, Live Status, Channel Verified, auto_transcript, other_transcript, wav, timed_auto, timed_other, timed_auto_words, and timed_other_words. ASR transcripts are in the timed_auto column; other transcripts are in the timed_other column. For more information, please see ... Due to the terms of use for this dataset, download access will be approved only for accounts whose email addresses are registered with academic or research institutions. |
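As an illustration of the one-row-per-recording layout described in the notes, the following is a minimal sketch of reading the table and summarizing it. It assumes the table is available as a CSV file with the column names listed above; the file format is not stated in this record, and the two sample rows, their values, and the helper names `load_rows` and `summarize` are hypothetical.

```python
import csv
import io

# Hypothetical two-row excerpt using a subset of the documented column names.
# The real file format and cell contents are assumptions, not taken from the corpus.
SAMPLE = """ID,Title,Duration (seconds),timed_auto,timed_other
abc123,Example interview,125,"[0.0] hello there",
def456,Example hearing,3600,"[0.0] please be seated","[0.0] Please be seated."
"""


def load_rows(handle):
    """Read one dict per recording from a CSV file-like object."""
    return list(csv.DictReader(handle))


def summarize(rows):
    """Count recordings, recordings with a non-ASR transcript, and total hours."""
    total_seconds = sum(float(r["Duration (seconds)"] or 0) for r in rows)
    with_other = sum(1 for r in rows if r["timed_other"].strip())
    return {
        "recordings": len(rows),
        "with_other_transcript": with_other,
        "hours": total_seconds / 3600,
    }


rows = load_rows(io.StringIO(SAMPLE))
summary = summarize(rows)
print(summary)
```

For the actual table, the in-memory sample would be replaced by an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded file.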