71 to 80 of 2,783 Results
MS Excel Spreadsheet - 1.6 MB - MD5: ded3946f87f9012675db49388b2b319b
version 1.1; 2025-05-02
Apr 24, 2025 - Harvard Library E-Resources Licensed Data Dataverse
Davies, Mark, 2025, "The Corpus of Historical American English (COHA)", https://doi.org/10.7910/DVN/IFMZJY, Harvard Dataverse, V1
The Corpus of Historical American English (COHA) was created by Mark Davies, and it is the largest structured corpus of historical English. It is related to other corpora from English-Corpora.org, which are the most widely used corpora of English and which offer unparalleled insight into variation in English. COHA contains more than 475 million wor...
TAR Archive - 2.2 GB - MD5: 9bb4d7730682bf250dad587d82a3c3f3
Data/format: Database. Default columns: textID, ID (1 - n), wordID (link to lexicon)
Plain Text - 114.4 MB - MD5: f9b5751bb5aabb8edc8942163bfaec18
Data/format: Lexicon. Default columns: wordID (link to database), word, lemma, PoS
Plain Text - 14.4 MB - MD5: 0031366eb28f705fc379c67e07cb25d0
Data/format: Sources. Default columns: depend on the corpus. The two leftmost columns are textID and #words, followed by e.g. title, URL, source, etc.
TAR Archive - 932.9 MB - MD5: ac3d51144d06525d4e0d4eb2095f3563
Data/format: Text. Default columns: textID, text
TAR Archive - 1.9 GB - MD5: 1a0e2d930ec2543094171bf38c0f357d
Data/format: Word / lemma / PoS. Default columns: textID, ID (1 - n), word, lemma, PoS
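The Database and Lexicon formats listed above are linked through wordID, so a database row can be resolved to its word, lemma, and PoS by a lookup in the lexicon. The following is a minimal sketch of that join, not the corpus author's tooling. It assumes tab-separated files with a header row (neither is confirmed by the listing), and the sample rows, tag values, and function names are hypothetical.

```python
import csv
import io

# Hypothetical sample data. The real files are far larger; the tab delimiter,
# the header row, and the PoS tag values shown here are all assumptions.
LEXICON_TSV = (
    "wordID\tword\tlemma\tPoS\n"
    "1\tthe\tthe\tat\n"
    "2\tdogs\tdog\tnn2\n"
    "3\tbarked\tbark\tvvd\n"
)
DATABASE_TSV = (
    "textID\tID\twordID\n"
    "100\t1\t1\n"
    "100\t2\t2\n"
    "100\t3\t3\n"
)


def load_lexicon(handle):
    """Build a dict mapping wordID -> (word, lemma, PoS)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["wordID"]: (row["word"], row["lemma"], row["PoS"]) for row in reader}


def resolve_tokens(database_handle, lexicon):
    """Yield (textID, ID, word, lemma, PoS) for each database row."""
    reader = csv.DictReader(database_handle, delimiter="\t")
    for row in reader:
        word, lemma, pos = lexicon[row["wordID"]]
        yield row["textID"], int(row["ID"]), word, lemma, pos


lexicon = load_lexicon(io.StringIO(LEXICON_TSV))
tokens = list(resolve_tokens(io.StringIO(DATABASE_TSV), lexicon))
```

Under these assumptions, each resolved tuple has the same columns as the Word / lemma / PoS format (textID, ID, word, lemma, PoS). For the real files, the in-memory strings would be replaced by open file handles.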
Apr 8, 2025 - Harvard Library E-Resources Licensed Data Dataverse
Davies, Mark, 2025, "The Corpus of Contemporary American English (COCA) (1990-2015)", https://doi.org/10.7910/DVN/QEVJOH, Harvard Dataverse, V1
The Corpus of Contemporary American English (COCA) is the largest corpus of English, and the only large and balanced corpus of American English. The corpus contains more than 560 million words of text (20 million words each year 1990-2015) and it is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts.
Plain Text - 269.7 MB - MD5: 6ada8bc9677b0bfdba82dd6b6435307f
Plain Text - 40.7 MB - MD5: cd0b4418b4155ac09e2d7bac7648b3dd