Replication Data for: Positioning Political Texts with Large Language Models by Asking and Averaging (doi:10.7910/DVN/YFM0BW)

Document Description
Citation
Title:	Replication Data for: Positioning Political Texts with Large Language Models by Asking and Averaging
Identification Number:	doi:10.7910/DVN/YFM0BW
Distributor:	Harvard Dataverse
Date of Distribution:	2024-11-14
Version:	2
Bibliographic Citation:	Le Mens, Gaël; Gallego, Aina, 2024, "Replication Data for: Positioning Political Texts with Large Language Models by Asking and Averaging", https://doi.org/10.7910/DVN/YFM0BW, Harvard Dataverse, V2
Study Description
Citation
Title:	Replication Data for: Positioning Political Texts with Large Language Models by Asking and Averaging
Identification Number:	doi:10.7910/DVN/YFM0BW
Authoring Entity:	Le Mens, Gaël (Pompeu Fabra University)
	Gallego, Aina (University of Barcelona and Institut Barcelona d’Estudis Internacionals)
Distributor:	Harvard Dataverse
Access Authority:	Le Mens, Gaël
Depositor:	Code Ocean
Holdings Information:	https://doi.org/10.7910/DVN/YFM0BW
Study Scope
Keywords:	Social Sciences
Abstract:	We use instruction-tuned Large Language Models (LLMs) like GPT-4, Llama 3, Mixtral, or Aya to position political texts within policy and ideological spaces. We ask an LLM where a tweet or a sentence of a political text stands on the focal dimension and take the average of the LLM responses to position political actors such as US Senators, or longer texts such as UK party manifestos or EU policy speeches given in 10 different languages. The correlations between the position estimates obtained with the best LLMs and benchmarks based on text coding by experts, crowdworkers, or roll call votes exceed .90. This approach is generally more accurate than the positions obtained with supervised classifiers trained on large amounts of research data. Using instruction-tuned LLMs to position texts in policy and ideological spaces is fast, cost-efficient, reliable, and reproducible (in the case of open LLMs) even if the texts are short and written in different languages. We conclude with cautionary notes about the need for empirical validation.
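Illustrative sketch (not part of the deposited materials): the "asking and averaging" step described in the abstract can be outlined roughly as follows. This is a minimal sketch under stated assumptions, not the authors' replication code, which is in the capsule archive listed below. The function names, the 0-10 scale, the example sentences, and the canned scores are all hypothetical; a real run would replace `canned_ask_position` with a call to an instruction-tuned LLM that returns a numeric position for one text.

```python
from statistics import mean
from typing import Callable, Dict, List


def average_positions(
    texts_by_actor: Dict[str, List[str]],
    ask_position: Callable[[str], float],
) -> Dict[str, float]:
    """Score each text separately, then average the scores per actor."""
    positions: Dict[str, float] = {}
    for actor, texts in texts_by_actor.items():
        scores = [ask_position(text) for text in texts]
        if scores:  # actors with no texts get no position estimate
            positions[actor] = mean(scores)
    return positions


# Hypothetical stand-in for an LLM: made-up scores on an assumed
# 0 (left) to 10 (right) scale, used only so the sketch runs offline.
CANNED_SCORES = {
    "Raise the minimum wage.": 2.0,
    "Expand public healthcare.": 3.0,
    "Cut corporate taxes.": 8.0,
    "Reduce regulation.": 7.0,
}


def canned_ask_position(text: str) -> float:
    """Placeholder scorer; swap in a real LLM query here."""
    return CANNED_SCORES[text]


example = {
    "actor_a": ["Raise the minimum wage.", "Expand public healthcare."],
    "actor_b": ["Cut corporate taxes.", "Reduce regulation."],
}
positions = average_positions(example, canned_ask_position)
print(positions)
```

Passing the scorer as a parameter keeps the averaging logic independent of any particular model or API, so the same function could be reused with different LLMs.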
Methodology and Processing
Sources Statement
Data Access
Other Study Description Materials
Other Study-Related Materials
Label:	capsule-1286058.zip
Notes:	application/zip
Other Study-Related Materials
Label:	result-b1dc07a9-a08d-4289-b75c-31c1ce7006c6.zip
Notes:	application/zip