Part 1: Document Description
|
Citation |
|
---|---|
Title: |
Replication Data for: Synthetically generated text for supervised text analysis |
Identification Number: |
doi:10.7910/DVN/JJ5BBX |
Distributor: |
Harvard Dataverse |
Date of Distribution: |
2024-11-15 |
Version: |
1 |
Bibliographic Citation: |
Halterman, Andrew, 2024, "Replication Data for: Synthetically generated text for supervised text analysis", https://doi.org/10.7910/DVN/JJ5BBX, Harvard Dataverse, V1, UNF:6:JJUrUpeMWFKHndQZmjKvEw== [fileUNF] |
Citation |
|
Title: |
Replication Data for: Synthetically generated text for supervised text analysis |
Identification Number: |
doi:10.7910/DVN/JJ5BBX |
Authoring Entity: |
Halterman, Andrew (Michigan State University) |
Producer: |
Political Analysis |
Distributor: |
Harvard Dataverse |
Access Authority: |
Halterman, Andrew |
Depositor: |
Halterman, Andrew |
Date of Deposit: |
2024-09-25 |
Holdings Information: |
https://doi.org/10.7910/DVN/JJ5BBX |
Study Scope |
|
Keywords: |
Social Sciences |
Abstract: |
Large language models are a powerful tool for conducting text analysis in political science, but using them to annotate text has several drawbacks, including high cost, limited reproducibility, and poor explainability. Traditional supervised text classifiers are fast and reproducible, but require expensive hand annotation, which is especially difficult for rare classes. This article proposes using LLMs to generate synthetic training data for training smaller, traditional supervised text models. Synthetic data can augment limited hand-annotated data or be used on its own to train a classifier with good performance and greatly reduced cost. I provide a conceptual overview of text generation, guidance on when researchers should prefer different techniques for generating synthetic text, a discussion of ethics, a simple technique for improving the quality of synthetic text, and an illustration of its limitations. I demonstrate the usefulness of synthetic training through three validations: synthetic news articles describing police responses to communal violence in India for training an event detection system, a multilingual corpus of synthetic populist manifesto statements for training a sentence-level populism classifier, and synthetic tweets describing the fighting in Ukraine for improving a named entity system. |
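The augmentation idea in the abstract can be illustrated with a minimal sketch. This is not the author's pipeline: the classifier here is a small bag-of-words Naive Bayes written with the standard library only, and every sentence, label name (`event`, `no_event`), and function name below is invented for illustration. It only shows the shape of the workflow, in which a few hand-labeled examples are pooled with synthetic ones before fitting a small supervised model.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def train_nb(examples):
    """Count classes and per-class words from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest add-one-smoothed log score."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: two "hand-labeled" and four "synthetic" examples.
hand_labeled = [
    ("police fired tear gas at the crowd", "event"),
    ("the council approved the new budget", "no_event"),
]
synthetic = [
    ("police used batons to disperse rioters", "event"),
    ("officers fired on the mob near the market", "event"),
    ("the minister opened a new school", "no_event"),
    ("farmers expect a good harvest this year", "no_event"),
]

# Augmentation: pool both sources into one training set.
model = train_nb(hand_labeled + synthetic)
prediction = predict_nb(model, "police fired on rioters")
```

In this toy setup the words "on" and "rioters" occur only in the synthetic examples, so the synthetic text extends the vocabulary available to the classifier beyond what the two hand-labeled sentences provide.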
Methodology and Processing |
|
Sources Statement |
|
Data Access |
|
Other Study Description Materials |
|
Related Publications |
|
Citation |
|
Title: |
Forthcoming, Political Analysis |
Bibliographic Citation: |
Forthcoming, Political Analysis |
File Description--f10551855 |
|
File: synth_gujarat_2024-05-17.tab |
|
|
|
Notes: |
UNF:6:o1yVfXB2c0gFEn41HHqH+g== |
File Description--f10551861 |
|
File: synth_gujarat_2024-05-24.tab |
|
|
|
Notes: |
UNF:6:aqyDEjVF3ucBtr+Ulx4fqw== |
File Description--f10551866 |
|
File: TOI_active_learning_realistic_results.tab |
|
|
|
Notes: |
UNF:6:lK9pYarmAGs5dUmaGSTcsQ== |
File Description--f10551856 |
|
File: TOI_active_learning_realistic_results_2024-08-30.tab |
|
|
|
Notes: |
UNF:6:T3PDuksi+oYhlHqF7l4WTw== |
File Description--f10551864 |
|
File: TOI_simple_results_2024-06-18.tab |
|
|
|
Notes: |
UNF:6:mOEWqLvcz1OO4RTHNymJvw== |
File Description--f10552042 |
|
File: real_vs_synth_ner.tab |
|
|
|
Notes: |
UNF:6:FOpz1zA+ucO3K+sfppK6fA== |
List of Variables: |
|
Variables |
|
f10551855 Location: |
Variable Format: character Notes: UNF:6:78CTScnIupbV/iwRTN3F0g== |
f10551855 Location: |
Variable Format: character Notes: UNF:6:gctIOeV7pWlbwIQjm88CiQ== |
f10551855 Location: |
Variable Format: character Notes: UNF:6:1R3P7ynfEnxye2gFSONclg== |
f10551861 Location: |
Variable Format: character Notes: UNF:6:RDYIwuD2LF0i1ZR8zgUfOA== |
f10551861 Location: |
Variable Format: character Notes: UNF:6:OjPDJv6HEWDRTPvaVC4fPA== |
f10551861 Location: |
Variable Format: character Notes: UNF:6:m4NJ/YD4elCDVlcPgdN+QA== |
f10551866 Location: |
Summary Statistics: Valid 15000.0; Max. 980.0; Mean 490.0; StDev 288.6270148540851; Min. 0.0 Variable Format: numeric Notes: UNF:6:kqoT4F+MeRjkMynONRezJQ== |
f10551866 Location: |
Summary Statistics: Max. 0.7710843373493976; StDev 0.23284022831277068; Mean 0.4988129145868635; Valid 15000.0; Min. 0.0 Variable Format: numeric Notes: UNF:6:X5X9be5xn0LUcArY6tFj1Q== |
f10551866 Location: |
Variable Format: character Notes: UNF:6:OsTFQCjxH53pFaeVQW/Itg== |
f10551866 Location: |
Summary Statistics: Min. 0.0; Max. 49.0; Valid 15000.0; Mean 24.5; StDev 14.431350742704252 Variable Format: numeric Notes: UNF:6:gSjV0bXJpCYbHHTbUBYEXw== |
f10551866 Location: |
Variable Format: character Notes: UNF:6:0dU3gT/6AOWw+austuV1/A== |
f10551856 Location: |
Summary Statistics: Valid 15000.0; Max. 980.0; Mean 490.0; StDev 288.6270148540851; Min. 0.0 Variable Format: numeric Notes: UNF:6:kqoT4F+MeRjkMynONRezJQ== |
f10551856 Location: |
Summary Statistics: Valid 15000.0; Max. 0.7631578947368421; StDev 0.22581957122092397; Min. 0.0; Mean 0.5056345091694852 Variable Format: numeric Notes: UNF:6:FSbCFxrP0zvFu350oOcwCQ== |
f10551856 Location: |
Summary Statistics: Mean 0.7305913603284444; Valid 15000.0; Max. 1.0; StDev 0.17460428030540925; Min. 0.0 Variable Format: numeric Notes: UNF:6:QiLZIlN6PC5zFSOihgoYIg== |
f10551856 Location: |
Summary Statistics: Valid 15000.0; StDev 0.21691624152795394; Mean 0.4188488816738817; Min. 0.0; Max. 0.79375 Variable Format: numeric Notes: UNF:6:n88JmV8S76+/s45UnEsa7g== |
f10551856 Location: |
Summary Statistics: Min. 0.931469708302169; Max. 0.9983171278982798; StDev 0.0037931706497809753; Mean 0.9931895350286711; Valid 15000.0 Variable Format: numeric Notes: UNF:6:wnLIJhQfN6HL/PfmIKEgZA== |
f10551856 Location: |
Summary Statistics: Mean 34.02846666666669; Max. 139.0; Valid 15000.0; StDev 30.183045860207248; Min. 0.0 Variable Format: numeric Notes: UNF:6:tpPGcupHRTcZHAfz64mwkg== |
f10551856 Location: |
Summary Statistics: StDev 0.08109927876184747; Max. 0.6666666666666666; Mean 0.15640469913651195; Min. 0.005263157894736842; Valid 15000.0 Variable Format: numeric Notes: UNF:6:Gmgy+P8bC4LX9RVmYrRF/Q== |
f10551856 Location: |
Variable Format: character Notes: UNF:6:OsTFQCjxH53pFaeVQW/Itg== |
f10551856 Location: |
Summary Statistics: Valid 0.0; Mean NaN; Min. NaN; StDev NaN; Max. NaN Variable Format: numeric Notes: UNF:6:GjAafq4oaAd+hOiWnQmCvQ== |
f10551856 Location: |
Summary Statistics: Valid 15000.0; Max. 49.0; StDev 14.431350742704252; Mean 24.5; Min. 0.0 Variable Format: numeric Notes: UNF:6:gSjV0bXJpCYbHHTbUBYEXw== |
f10551856 Location: |
Variable Format: character Notes: UNF:6:0dU3gT/6AOWw+austuV1/A== |
f10551864 Location: |
Summary Statistics: Mean 470.5882352941176; Max. 1000.0; Min. 0.0; StDev 386.1894081819535; Valid 20400.0 Variable Format: numeric Notes: UNF:6:Ms8vXs/wAVL8xX4RAxRYEQ== |
f10551864 Location: |
Summary Statistics: Max. 500.0; Valid 20400.0; StDev 186.29643865896256; Min. -1.0; Mean 164.5 Variable Format: numeric Notes: UNF:6:4GMWskCDWDoK4G/Gg9s/xg== |
f10551864 Location: |
Summary Statistics: Min. 0.0; Max. 49.0; Valid 20400.0; StDev 14.43122340045245; Mean 24.5 Variable Format: numeric Notes: UNF:6:cdDmDydaQplZrwSW0Lp5Og== |
f10551864 Location: |
Variable Format: character Notes: UNF:6:kMRvXsTRDJELlIL2So0ghw== |
f10551864 Location: |
Summary Statistics: Max. 0.7220216606498194; Valid 20400.0; StDev 0.2096684957771509; Min. 0.0; Mean 0.37664428997884664 Variable Format: numeric Notes: UNF:6:0WgODARLHZLb+JfrrMbkMw== |
f10551864 Location: |
Summary Statistics: Min. 0.0; Max. 0.31; StDev 0.06232873741633384; Valid 20400.0; Mean 0.06542067316681392 Variable Format: numeric Notes: UNF:6:gEJTll7xbSZOAJOl//cFsw== |
f10551864 Location: |
Summary Statistics: Min. 0.0; Valid 20400.0; Mean 430.29411764705884; Max. 1680.0; StDev 500.5471899352906 Variable Format: numeric Notes: UNF:6:eJe+l0qZWSNGI5jkEW44yQ== |
f10551864 Location: |
Variable Format: character Notes: UNF:6:q32hJL6/r5/fUQK72SvuIA== |
f10551864 Location: |
Variable Format: character Notes: UNF:6:KAv9S2dcKNPGVwdqSkcamA== |
f10551864 Location: |
Variable Format: character Notes: UNF:6:EEVgM2SxZLA1tT3mukKJww== |
f10551864 Location: |
Variable Format: character Notes: UNF:6:qS8ULz43CDIXOygTm9JbkA== |
f10552042 Location: |
Summary Statistics: Mean 0.5133968216481106; Max. 0.7792553191489362; Min. 0.0; StDev 0.20413850170125777; Valid 252.0 Variable Format: numeric Notes: UNF:6:jZCU8FXT1c+f6WRwmCv95A== |
f10552042 Location: |
Summary Statistics: Valid 252.0; Max. 300.0; StDev 68.7989030868835; Min. 50.0; Mean 127.77777777777773 Variable Format: numeric Notes: UNF:6:aCyG/DAq6fjqOWYT2Z62nQ== |
f10552042 Location: |
Variable Format: character Notes: UNF:6:XsUQFWhx5J8/Smk8kyIRgg== |
f10552042 Location: |
Summary Statistics: Min. 1.0; Max. 14.0; Mean 7.5; StDev 4.039151029097151; Valid 252.0 Variable Format: numeric Notes: UNF:6:yWxTbHu/0pCTK1/Gleu8bw== |
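The summary statistics in the variable list above (Valid, Min., Max., Mean, StDev) could be recomputed from a downloaded column along the following lines. This is a sketch under stated assumptions, not the repository's own procedure: the helper name `summarize` is hypothetical, and the example column is an invented stand-in (each integer from 0 to 49 repeated 300 times) rather than data read from the deposit. The listed StDev values look consistent with the sample (n - 1) convention, which is an inference from the numbers and not something the record states.

```python
import math
import statistics


def summarize(values):
    """Return Valid/Min./Max./Mean/StDev for a column; None marks a missing cell."""
    valid = [v for v in values if v is not None]
    if not valid:
        # Mirrors entries reported as "Valid 0.0; Mean NaN; ..."
        nan = math.nan
        return {"Valid": 0.0, "Min.": nan, "Max.": nan, "Mean": nan, "StDev": nan}
    return {
        "Valid": float(len(valid)),
        "Min.": float(min(valid)),
        "Max.": float(max(valid)),
        "Mean": statistics.fmean(valid),
        # Sample standard deviation (divides by n - 1); needs at least 2 values.
        "StDev": statistics.stdev(valid) if len(valid) > 1 else math.nan,
    }


# Assumption: an invented column with each integer 0..49 appearing 300 times.
column = list(range(50)) * 300
stats = summarize(column)
```

Under this assumption the result has Valid 15000.0, Min. 0.0, Max. 49.0, Mean 24.5, and a StDev of about 14.43135, whereas the population convention (`statistics.pstdev`) would give about 14.43087.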
Label: |
README.txt |
Notes: |
text/plain |
Label: |
requirements.txt |
Notes: |
text/plain |
Label: |
README.md |
Text: | |
Notes: |
text/markdown |
Label: |
raw_annotations--raw.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
sents.csv |
Text: | |
Notes: |
text/comma-separated-values |
Label: |
.DS_Store |
Text: | |
Notes: |
application/octet-stream |
Label: |
.DS_Store |
Text: | |
Notes: |
application/octet-stream |
Label: |
embedding_vis.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
generate_figures.pdf |
Text: | |
Notes: |
application/pdf |
Label: |
generate_figures.Rmd |
Text: | |
Notes: |
text/x-r-notebook |
Label: |
gen_synth_india.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
india_police_events_active_learning.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
india_police_events_experiment.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
theme_pub.R |
Text: | |
Notes: |
type/x-r-syntax |
Label: |
IPE_events_active_error.png |
Text: | |
Notes: |
image/png |
Label: |
IPE_synth_diff.png |
Text: | |
Notes: |
image/png |
Label: |
new_ipe_fig_synth_both.png |
Text: | |
Notes: |
image/png |
Label: |
new_ipe_fig_synth_unbalanced.png |
Text: | |
Notes: |
image/png |
Label: |
new_ipe_fig_unbalanced.png |
Text: | |
Notes: |
image/png |
Label: |
.DS_Store |
Text: | |
Notes: |
application/octet-stream |
Label: |
README.md |
Text: | |
Notes: |
text/markdown |
Label: |
.DS_Store |
Text: | |
Notes: |
application/octet-stream |
Label: |
cmp_labeled_sentences_lang.csv |
Text: | |
Notes: |
text/comma-separated-values |
Label: |
cmp_labeled_synth_statements_2024_02_16.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
cmp_labeled_synth_statements_2024_02_18.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
gpt3_synth_all_cmp_2022-12-15.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
gpt3_synth_populism2.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
gpt3_synth_populism_neg2.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
manifesto_sent_level.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
manifesto_sent_level_for_prodigy.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
manifesto_sent_level_score_2023-02-05.csv |
Text: | |
Notes: |
text/comma-separated-values |
Label: |
manifesto_sent_level_score_mlm.csv |
Text: | |
Notes: |
text/comma-separated-values |
Label: |
populism_hand_validation.csv |
Text: | |
Notes: |
text/comma-separated-values |
Label: |
populism_validation.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fit_cmp_classifier_new.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
generate_cmp_statements.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
gen_synth_populist.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
get_top_ukip.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
hand_validation.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
synth_pop_classifier_setfit.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
theme_pub.R |
Text: | |
Notes: |
type/x-r-syntax |
Label: |
config.json |
Text: | |
Notes: |
application/json |
Label: |
config_sentence_transformers.json |
Text: | |
Notes: |
application/json |
Label: |
config_setfit.json |
Text: | |
Notes: |
application/json |
Label: |
model.safetensors |
Text: | |
Notes: |
application/octet-stream |
Label: |
model_head.pkl |
Text: | |
Notes: |
application/octet-stream |
Label: |
modules.json |
Text: | |
Notes: |
application/json |
Label: |
pytorch_model.bin |
Text: | |
Notes: |
application/macbinary |
Label: |
README.md |
Text: | |
Notes: |
text/markdown |
Label: |
sentencepiece.bpe.model |
Text: | |
Notes: |
application/octet-stream |
Label: |
sentence_bert_config.json |
Text: | |
Notes: |
application/json |
Label: |
special_tokens_map.json |
Text: | |
Notes: |
application/json |
Label: |
tokenizer.json |
Text: | |
Notes: |
application/json |
Label: |
tokenizer_config.json |
Text: | |
Notes: |
application/json |
Label: |
config.json |
Text: | |
Notes: |
application/json |
Label: |
config.json |
Text: | |
Notes: |
application/json |
Label: |
config_sentence_transformers.json |
Text: | |
Notes: |
application/json |
Label: |
model_head.pkl |
Text: | |
Notes: |
application/octet-stream |
Label: |
modules.json |
Text: | |
Notes: |
application/json |
Label: |
pytorch_model.bin |
Text: | |
Notes: |
application/macbinary |
Label: |
README.md |
Text: | |
Notes: |
text/markdown |
Label: |
sentencepiece.bpe.model |
Text: | |
Notes: |
application/octet-stream |
Label: |
sentence_bert_config.json |
Text: | |
Notes: |
application/json |
Label: |
special_tokens_map.json |
Text: | |
Notes: |
application/json |
Label: |
tokenizer.json |
Text: | |
Notes: |
application/json |
Label: |
tokenizer_config.json |
Text: | |
Notes: |
application/json |
Label: |
config.json |
Text: | |
Notes: |
application/json |
Label: |
README.md |
Text: | |
Notes: |
text/markdown |
Label: |
tweet_text_partial.csv |
Text: | |
Notes: |
text/comma-separated-values |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_20_0.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_20_0.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_20_0.7_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_20_1.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_20_1.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_20_1.8_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_20_1_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_50_0.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_50_0.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_50_0.7_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_50_1.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_50_1.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_50_1.8_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_50_1_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_80_0.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_80_0.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_80_0.7_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_80_1.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_80_1.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_80_1.8_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.8_80_1_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_20_0.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_20_0.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_20_0.7_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_20_1.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_20_1.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_20_1.8_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_20_1_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_50_0.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_50_0.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_50_0.7_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_50_1.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_50_1.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_50_1.8_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_50_1_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_80_0.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_80_0.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_80_0.7_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_80_1.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_80_1.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_80_1.8_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.95_80_1_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_20_0.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_20_0.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_20_0.7_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_20_1.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_20_1.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_20_1.8_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_20_1_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_50_0.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_50_0.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_50_0.7_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_50_1.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_50_1.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_50_1.8_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_50_1_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_80_0.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_80_0.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_80_0.7_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_80_1.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_80_1.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_80_1.8_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.99_80_1_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_20_0.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_20_0.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_20_0.7_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_20_1.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_20_1.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_20_1.8_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_20_1_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_50_0.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_50_0.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_50_0.7_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_50_1.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_50_1.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_50_1.8_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_50_1_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_80_0.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_80_0.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_80_0.7_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_80_1.3_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_80_1.5_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_80_1.8_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-25_0.9_80_1_3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.8_50_0.3_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.8_50_0.5_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.8_50_0.7_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.8_50_1.3_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.8_50_1.5_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.8_50_1.8_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.8_50_1_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.95_50_0.3_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.95_50_0.5_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.95_50_0.7_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.95_50_1.3_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.95_50_1.5_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.95_50_1.8_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.95_50_1_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.99_50_0.3_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.99_50_0.5_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.99_50_0.7_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.99_50_1.3_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.99_50_1.5_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.99_50_1.8_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.99_50_1_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.9_50_0.3_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.9_50_0.5_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.9_50_0.7_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.9_50_1.3_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.9_50_1.5_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.9_50_1.8_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
fake_ukraine_tweets_2022-07-26_0.9_50_1_1.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
synth_detection_results_transformer.csv |
Text: | |
Notes: |
text/comma-separated-values |
Label: |
ukr_synth_bad_both.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
ukr_weapon_real.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
ukr_weapon_synth_best3.jsonl |
Text: | |
Notes: |
application/octet-stream |
Label: |
1_get_tweets.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
2_fine_tune_gpt2_twitter.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
3_generate_synth_tweets_gpt2.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
4_select_generation_parameters_fancy.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
5_fit_ner.py |
Text: | |
Notes: |
text/x-python-script |
Label: |
6_real_vs_synth_ner_plots.Rmd |
Notes: |
text/x-r-notebook |
Label: |
theme_pub.R |
Text: | |
Notes: |
type/x-r-syntax |
Label: |
real_vs_synth_ner_perf.pdf |
Notes: |
application/pdf |
Label: |
tweet_discrim_parameters_transformer.pdf |
Text: | |
Notes: |
application/pdf |
Label: |
config.json |
Text: | |
Notes: |
application/json |
Label: |
generation_config.json |
Text: | |
Notes: |
application/json |
Label: |
model.safetensors |
Notes: |
application/octet-stream |
Label: |
training_args.bin |
Text: | |
Notes: |
application/macbinary |
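The `fake_ukraine_tweets_*.jsonl` labels listed above follow a regular pattern: a date followed by four underscore-separated numbers. The record does not say what those numbers mean, so the sketch below only splits them apart and deliberately leaves them unnamed. The function name `parse_name` and the regular expression are illustrative assumptions, not part of the deposit.

```python
import re
from datetime import date

# Date, then four numeric fields whose meaning is NOT documented in this record.
PATTERN = re.compile(
    r"^fake_ukraine_tweets_(\d{4}-\d{2}-\d{2})"
    r"_([\d.]+)_([\d.]+)_([\d.]+)_([\d.]+)\.jsonl$"
)


def parse_name(name):
    """Return (date, (f1, f2, f3, f4)) for a matching label, else None."""
    match = PATTERN.match(name)
    if match is None:
        return None
    day = date.fromisoformat(match.group(1))
    fields = tuple(float(g) for g in match.groups()[1:])
    return day, fields


labels = [
    "fake_ukraine_tweets_2022-07-25_0.8_20_1.3_3.jsonl",
    "fake_ukraine_tweets_2022-07-26_0.99_50_1_1.jsonl",
    "ukr_weapon_real.jsonl",  # does not follow the pattern
]
parsed = {label: parse_name(label) for label in labels}
```

Grouping the parsed tuples by field is one way to check that every combination in the grid of generated files is present after download.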