Replication Data for: Cross-lingual classification of political texts using multilingual sentence embeddings (doi:10.7910/DVN/OLRTXA)

Document Description
Citation
Title:	Replication Data for: Cross-lingual classification of political texts using multilingual sentence embeddings
Identification Number:	doi:10.7910/DVN/OLRTXA
Distributor:	Harvard Dataverse
Date of Distribution:	2022-10-04
Version:	1
Bibliographic Citation:	Licht, Hauke, 2022, "Replication Data for: Cross-lingual classification of political texts using multilingual sentence embeddings", https://doi.org/10.7910/DVN/OLRTXA, Harvard Dataverse, V1, UNF:6:rG8yuayRT3euKCJ2meYa8A== [fileUNF]
Study Description
Citation
Title:	Replication Data for: Cross-lingual classification of political texts using multilingual sentence embeddings
Identification Number:	doi:10.7910/DVN/OLRTXA
Authoring Entity:	Licht, Hauke (University of Cologne, Cologne Center for Comparative Politics)
Distributor:	Harvard Dataverse
Access Authority:	Licht, Hauke
Depositor:	Code Ocean
Holdings Information:	https://doi.org/10.7910/DVN/OLRTXA
Study Scope
Keywords:	Social Sciences, multilingual embedding, multilingual text analysis, supervised machine learning
Abstract:	Established approaches to analyze multilingual text corpora require either a duplication of analysts' efforts or high-quality machine translation (MT). In this paper, I argue that multilingual sentence embedding (MSE) is an attractive alternative approach to language-independent text representation. To support this argument, I evaluate MSE for cross-lingual supervised text classification. Specifically, I assess how reliably MSE-based classifiers detect manifesto sentences' topics and positions compared to classifiers trained using bag-of-words representations of machine-translated texts, and how this depends on the amount of training data. These analyses show that when training data is relatively scarce (e.g. 20K or fewer labeled sentences), MSE-based classifiers can be more reliable and are at least no less reliable than their MT-based counterparts. Further, I examine how reliably MSE-based classifiers label sentences written in languages not in the training data, focusing on the task of discriminating sentences that discuss the issue of immigration from those that do not. This analysis shows that compared to the within-language classification benchmark, such "cross-lingual transfer" tends to result in fewer reliability losses when relying on the MSE instead of the MT approach. This study thus presents an important addition to the cross-lingual text analysis toolkit.
Methodology and Processing
Sources Statement
Data Access
Other Study Description Materials
File Description--f6544545
File: runtimes.tab
Number of cases:	5
No. of variables per record:	4
Type of File:	text/tab-separated-values
Notes:	UNF:6:rG8yuayRT3euKCJ2meYa8A==
Variable Description
List of Variables:	analysis, started_at, ended_at, hours_elapsed
Variables
analysis
Location:	f6544545
Variable Format:	character
Notes:	UNF:6:26jKf3hOmLtgwMsvplqvhQ==
started_at
Location:	f6544545
Variable Format:	character
Notes:	UNF:6:hLxDzawegW+9qdAZXSgaGw==
ended_at
Location:	f6544545
Variable Format:	character
Notes:	UNF:6:fMBZuxIDrmTKNMShzZKLPg==
hours_elapsed
Location:	f6544545
Summary Statistics:	Valid 5.0; Min. 0.0172863094011943; Mean 4.627563992222143; StDev 5.181454336286318; Max. 12.2113724167479
Variable Format:	numeric
Notes:	UNF:6:Gktc8PcOafe/RNHh+z7DJQ==
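The summary statistics reported for hours_elapsed (Valid, Min., Mean, StDev, Max.) could be recomputed from runtimes.tab along the following lines. This is a minimal sketch, not part of the replication materials: the function name summarize_hours and the sample rows are hypothetical, the timestamp columns are filled with placeholders because their format is not documented here, and using the sample standard deviation for StDev is an assumption.

```python
import csv
import io
import statistics

# Hypothetical rows for illustration only; the real values live in runtimes.tab.
# Column names follow the variable list above. Timestamp values are placeholders
# because the codebook only states that they are stored as character strings.
SAMPLE = (
    "analysis\tstarted_at\tended_at\thours_elapsed\n"
    "demo_a\tNA\tNA\t1.0\n"
    "demo_b\tNA\tNA\t2.0\n"
    "demo_c\tNA\tNA\t3.0\n"
)


def summarize_hours(handle):
    """Read a tab-separated file and summarize its hours_elapsed column."""
    reader = csv.DictReader(handle, delimiter="\t")
    hours = [float(row["hours_elapsed"]) for row in reader]
    return {
        "valid": len(hours),
        "min": min(hours),
        "mean": statistics.fmean(hours),
        # Assumption: StDev in the codebook is the sample standard deviation.
        "stdev": statistics.stdev(hours),
        "max": max(hours),
    }


summary = summarize_hours(io.StringIO(SAMPLE))
print(summary)
```

To run the same summary on the deposited file, pass an open file handle instead of the in-memory sample, for example open("runtimes.tab", newline="", encoding="utf-8"), assuming the file has been downloaded to the working directory and carries a header row.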
Other Study-Related Materials
Label:	analysis1.RData
Notes:	application/x-rlang-transport
Label:	analysis2.RData
Notes:	application/x-rlang-transport
Label:	baseline.RData
Notes:	application/x-rlang-transport
Label:	crosslingual_transfer.RData
Notes:	application/x-rlang-transport
Label:	exampleS1.tex
Notes:	application/x-tex
Label:	exampleS2.tex
Notes:	application/x-tex
Label:	exampleS3.tex
Notes:	application/x-tex
Label:	exampleS4.tex
Notes:	application/x-tex
Label:	exampleS5.tex
Notes:	application/x-tex
Label:	exampleS6.tex
Notes:	application/x-tex
Label:	figure1.pdf
Notes:	application/pdf
Label:	figure2.pdf
Notes:	application/pdf
Label:	figure3.pdf
Notes:	application/pdf
Label:	figure4.pdf
Notes:	application/pdf
Label:	figureS3.pdf
Notes:	application/pdf
Label:	figureS4.pdf
Notes:	application/pdf
Label:	figureS5.pdf
Notes:	application/pdf
Label:	figureS6.pdf
Notes:	application/pdf
Label:	figureS7.pdf
Notes:	application/pdf
Label:	figureS8.pdf
Notes:	application/pdf
Label:	output
Notes:	text/plain; charset=US-ASCII
Label:	results-03843374-51b2-4ddf-b135-020a7b72471b.zip
Text:	Ported CO capsule
Notes:	application/zip
Label:	table1.tex
Notes:	application/x-tex
Label:	table2.tex
Notes:	application/x-tex
Label:	tableS10.tex
Notes:	application/x-tex
Label:	tableS11.tex
Notes:	application/x-tex
Label:	tableS12.tex
Notes:	application/x-tex
Label:	tableS13.tex
Notes:	application/x-tex
Label:	tableS14.tex
Notes:	application/x-tex
Label:	tableS15.tex
Notes:	application/x-tex
Label:	tableS1.tex
Notes:	application/x-tex
Label:	tableS2.tex
Notes:	application/x-tex
Label:	tableS3.tex
Notes:	application/x-tex
Label:	tableS4.tex
Notes:	application/x-tex
Label:	tableS5.tex
Notes:	application/x-tex
Label:	tableS6.tex
Notes:	application/x-tex
Label:	tableS7.tex
Notes:	application/x-tex
Label:	tableS8.tex
Notes:	application/x-tex
Label:	tableS9.tex
Notes:	application/x-tex