lexicon_2020.txt

This file is part of "The Corpus of Contemporary American English (COCA) (1990-2015)".

Version 1.2
File Citation
Davies, Mark, 2025, "The Corpus of Contemporary American English (COCA) (1990-2015)", https://doi.org/10.7910/DVN/QEVJOH, Harvard Dataverse, V1; lexicon_2020.txt [fileName]
Dataset Citation
Davies, Mark, 2025, "The Corpus of Contemporary American English (COCA) (1990-2015)", https://doi.org/10.7910/DVN/QEVJOH, Harvard Dataverse, V1
File Metrics
0 Downloads

This file has already been deleted (or replaced) in the current version. It may not be edited.

Dataset Terms

This dataset is made available under the following terms. Please confirm and/or complete the information needed below in order to continue.

Our Community Norms as well as good scientific practices expect that proper credit is given via citation. Please use the data citation shown on the dataset page.

Custom Dataset Terms - the following Custom Dataset Terms have been defined for this dataset.

1. In no case can substantial amounts of the full-text data (typically, a total of 50,000 words or more) be distributed outside the organization listed on the license agreement. For example, you cannot create a large word list or set of n-grams and then distribute it to others, and you cannot copy 70,000 words from different texts and then place them on a website where users from outside your organization would have access to the data.

2. The data cannot be placed on a network (including the Internet) unless access to the data is limited (via restricted login, password, etc.) to those from the organization listed on the license agreement. Academic Single-User licenses do not allow the data to be distributed over a network.

3. In addition to the full-text data itself, #2 also applies to derived frequency, collocate, n-gram, concordance, and similar data that are based on the corpus.

4. If portions of the derived data are made available to others, they cannot include substantial portions of the raw frequency of words (e.g. the word occurs 3,403 times in the corpus) or the rank order (e.g. it is the 304th most common word). (Note: it is acceptable to use the frequency data to place words and phrases in "frequency bands", e.g. words 1-1000, 1001-3000, 3001-10,000, etc. However, there should not be more than about 20 frequency bands in your application.)
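
The frequency-band idea in term 4 can be illustrated with a minimal Python sketch that reports only a band label for a word's rank, never the rank or raw count itself. The function name `band_label`, the constant names, and the treatment of "about 20" as a hard cap are assumptions made for this illustration and are not part of the license text; the three band boundaries are taken from the example in term 4.

```python
from bisect import bisect_left

# Inclusive upper rank bound of each band. These three values come from the
# example in term 4; a real application would choose its own boundaries.
BAND_UPPER_BOUNDS = [1000, 3000, 10000]

# The terms say "about 20" bands; treating that as a hard cap is an assumption.
MAX_BANDS = 20


def band_label(rank, upper_bounds=BAND_UPPER_BOUNDS):
    """Map a 1-based frequency rank to a coarse band label such as '1-1000'."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    # Count the open-ended band above the last boundary as one more band.
    if len(upper_bounds) + 1 > MAX_BANDS:
        raise ValueError("too many frequency bands")
    # Index of the first boundary that is >= rank.
    i = bisect_left(upper_bounds, rank)
    if i == len(upper_bounds):
        # Rank lies beyond the last boundary: open-ended band.
        return f"{upper_bounds[-1] + 1}+"
    lower = 1 if i == 0 else upper_bounds[i - 1] + 1
    return f"{lower}-{upper_bounds[i]}"


print(band_label(304))    # a word ranked 304th falls in the first band
print(band_label(1001))   # first rank of the second band
```

Whether a particular banding scheme satisfies the license is a question for the license holder; the sketch only shows the mechanics of replacing exact ranks with bands.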
Academic licenses: each license is valid for only one campus. If, for example, you are part of a research group with members at universities X, Y, and Z, each university needs to purchase the data separately.

Academic licenses: you cannot use the data to create software or products that will be sold to others.

Academic licenses: Graduate students can have access to the data for work on theses and dissertations. The data is primarily intended for use in research, not teaching. If you need corpus data for undergraduate classes, please use the standard web interface for the corpora.
Any publications or products that are based on the data should contain a reference to the source of the data: http://www.corpusdata.org.