Replication Data for: A new database for Italian parliamentary speeches. Introducing the ItaParlCorpus dataset (doi:10.7910/DVN/RCKMDA)

Document Description

Citation

Title:

Replication Data for: A new database for Italian parliamentary speeches. Introducing the ItaParlCorpus dataset

Identification Number:

doi:10.7910/DVN/RCKMDA

Distributor:

Harvard Dataverse

Date of Distribution:

2025-03-23

Version:

1

Bibliographic Citation:

Cova, Joshua, 2025, "Replication Data for: A new database for Italian parliamentary speeches. Introducing the ItaParlCorpus dataset", https://doi.org/10.7910/DVN/RCKMDA, Harvard Dataverse, V1

Study Description

Citation

Title:

Replication Data for: A new database for Italian parliamentary speeches. Introducing the ItaParlCorpus dataset

Identification Number:

doi:10.7910/DVN/RCKMDA

Authoring Entity:

Cova, Joshua (Max Planck Institute for the Study of Societies)

Distributor:

Harvard Dataverse

Access Authority:

Cova, Joshua

Depositor:

Cova, Joshua

Date of Deposit:

2025-02-16

Holdings Information:

https://doi.org/10.7910/DVN/RCKMDA

Study Scope

Keywords:

Social Sciences, Parliamentary data, Italian politics, Text as data, Italy, Parliament, Political parties, Research methods, Text analysis

Abstract:

A common challenge in studying Italian parliamentary discourse is the lack of accessible, machine-readable, and systematized parliamentary data. To address this, this article introduces the ItaParlCorpus dataset, a new, annotated, machine-readable collection of Italian parliamentary plenary speeches for the Camera dei Deputati, the lower house of Parliament, spanning from 1948 to 2022. The dataset encompasses 470 million words and 2.4 million speeches delivered by 5,830 unique speakers representing 77 different political parties. The files are designed for easy processing and analysis in widely used programming languages, and they include metadata such as speaker identification and party affiliation. This opens up opportunities for in-depth analyses of a variety of topics related to parliamentary behavior, elite rhetoric, and the salience of political themes, including how these vary across party families and over time.

Methodology and Processing

Sources Statement

Data Access

Other Study Description Materials

Related Publications

Citation

Title:

Cova J (2025). A new database for Italian parliamentary speeches: introducing the ItaParlCorpus dataset. Italian Political Science Review/Rivista Italiana di Scienza Politica 1–10. https://doi.org/10.1017/ipo.2025.6

Identification Number:

10.1017/ipo.2025.6

Bibliographic Citation:

Cova J (2025). A new database for Italian parliamentary speeches: introducing the ItaParlCorpus dataset. Italian Political Science Review/Rivista Italiana di Scienza Politica 1–10. https://doi.org/10.1017/ipo.2025.6

Other Study-Related Materials

Label:

read_me_itaparlcorpus.txt

Text:

ReadMe file

Notes:

text/plain

Other Study-Related Materials

Label:

replication_code_itaparlcorpus.R

Text:

Replication code for Figures 3-6

Notes:

type/x-r-syntax