Harvard Subscription Data Dataverse


71 to 80 of 2,783 Results

Gov_Election_Data_2022.xlsx
May 2, 2025 - Dave Leip Governor General County Election Data
MS Excel Spreadsheet - 1.6 MB - MD5: ded3946f87f9012675db49388b2b319b
version 1.1; 2025-05-02

The Corpus of Historical American English (COHA)
Apr 24, 2025 - Harvard Library E-Resources Licensed Data Dataverse
Davies, Mark, 2025, "The Corpus of Historical American English (COHA)", https://doi.org/10.7910/DVN/IFMZJY, Harvard Dataverse, V1
The Corpus of Historical American English (COHA) was created by Mark Davies, and it is the largest structured corpus of historical English. It is related to other corpora from English-Corpora.org, which are the most widely used corpora of English and which offer unparalleled insight into variation in English. COHA contains more than 475 million wor...

coha-db.tar
Apr 24, 2025 - The Corpus of Historical American English (COHA)
TAR Archive - 2.2 GB - MD5: 9bb4d7730682bf250dad587d82a3c3f3
Data/format: Database
Default columns: textID, ID (1 - n), wordID (link to lexicon)

coha-lexicon.txt
Apr 24, 2025 - The Corpus of Historical American English (COHA)
Plain Text - 114.4 MB - MD5: f9b5751bb5aabb8edc8942163bfaec18
Data/format: Lexicon
Default columns: wordID (link to database), word, lemma, PoS

coha-sources.txt
Apr 24, 2025 - The Corpus of Historical American English (COHA)
Plain Text - 14.4 MB - MD5: 0031366eb28f705fc379c67e07cb25d0
Data/format: Sources
Default columns: Depends on corpus. The two leftmost columns are textID and #words, followed by e.g. title, URL, source, etc.

coha-text.tar
Apr 24, 2025 - The Corpus of Historical American English (COHA)
TAR Archive - 932.9 MB - MD5: ac3d51144d06525d4e0d4eb2095f3563
Data/format: Text
Default columns: textID, text

coha-wlp.tar
Apr 24, 2025 - The Corpus of Historical American English (COHA)
TAR Archive - 1.9 GB - MD5: 1a0e2d930ec2543094171bf38c0f357d
Data/format: Word / lemma / PoS
Default columns: textID, ID (1 - n), word, lemma, PoS

The Corpus of Contemporary American English (COCA) (1990-2015)
Apr 8, 2025 - Harvard Library E-Resources Licensed Data Dataverse
Davies, Mark, 2025, "The Corpus of Contemporary American English (COCA) (1990-2015)", https://doi.org/10.7910/DVN/QEVJOH, Harvard Dataverse, V1
The Corpus of Contemporary American English (COCA) is the largest corpus of English, and the only large and balanced corpus of American English. The corpus contains more than 560 million words of text (20 million words each year 1990-2015) and it is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts.

lexicon_2020.txt
Apr 8, 2025 - The Corpus of Contemporary American English (COCA) (1990-2015)
Plain Text - 269.7 MB - MD5: 6ada8bc9677b0bfdba82dd6b6435307f

sources_2020.txt
Apr 8, 2025 - The Corpus of Contemporary American English (COCA) (1990-2015)
Plain Text - 40.7 MB - MD5: cd0b4418b4155ac09e2d7bac7648b3dd
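
Each file in the listing above is published with an MD5 checksum, which can be used to confirm that a download arrived intact. The following is a minimal sketch of that check in Python, not an official Dataverse tool. The helper names `md5_of_file` and `verify_download` are hypothetical, and the local paths are assumed to point at files you have already downloaded. Only two of the listed checksums are copied in, to keep the example short.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the listing above (subset, for illustration).
EXPECTED_MD5 = {
    "coha-lexicon.txt": "f9b5751bb5aabb8edc8942163bfaec18",
    "coha-sources.txt": "0031366eb28f705fc379c67e07cb25d0",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat, which matters for the
    multi-gigabyte TAR archives in this listing.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected value for its name.

    `expected` maps a file name to its published MD5 (hypothetical helper;
    raises KeyError if the file name has no recorded checksum).
    """
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no published MD5 recorded for {name}")
    return md5_of_file(path) == expected[name].lower()
```

A call such as `verify_download("downloads/coha-lexicon.txt")` (the directory is an assumed example path) returns `True` when the local copy matches the published checksum. MD5 is adequate here for detecting transfer corruption, but it should not be relied on as protection against deliberate tampering.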

