Part 1: Document Description
|
Citation |
|
---|---|
Title: |
Danish Legislative Speech Corpus |
Identification Number: |
doi:10.7910/DVN/PNCBKF |
Distributor: |
Harvard Dataverse |
Date of Distribution: |
2021-12-13 |
Version: |
2 |
Bibliographic Citation: |
Hjorth, Frederik, 2021, "Danish Legislative Speech Corpus", https://doi.org/10.7910/DVN/PNCBKF, Harvard Dataverse, V2 |
Citation |
|
Title: |
Danish Legislative Speech Corpus |
Identification Number: |
doi:10.7910/DVN/PNCBKF |
Authoring Entity: |
Hjorth, Frederik (University of Copenhagen) |
Date of Production: |
2024-10-15 |
Distributor: |
Harvard Dataverse |
Access Authority: |
Hjorth, Frederik |
Depositor: |
Hjorth, Frederik |
Date of Deposit: |
2021-12-13 |
Holdings Information: |
https://doi.org/10.7910/DVN/PNCBKF |
Study Scope |
|
Keywords: |
Social Sciences |
Abstract: |
This dataset contains text and speaker data for around 1.7 million snippets of legislative speech from Denmark's Parliament, Folketinget. The dataset draws in part on the ParlSpeech V2 dataset (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/L4OAKN) and in part on Folketinget's publicly available XML transcripts (https://www.ft.dk/da/dokumenter/aabne_data). To homogenize the lengths of text units, longer speeches are broken down into snippets of 3-5 sentences each. Version 2 of the data extends the coverage to late 2024. While the first version is retained here for archival purposes, most users will probably want the newest version (i.e. "paradf-v2"). |
Methodology and Processing |
|
Sources Statement |
|
Data Access |
|
Other Study Description Materials |
|
Label: |
paradf-v2.rds |
Text: |
Version 2 of the corpus, extending the data to late 2024. |
Notes: |
application/octet-stream |
Label: |
paradf.rds |
Text: |
File containing text snippets, date, and speaker name. |
Notes: |
application/octet-stream |