Replication data for: Reverse Engineering Chinese Censorship: Randomized Experimentation and Participant Observation (doi:10.7910/DVN/26212)

Document Description

Citation

Title:

Replication data for: Reverse Engineering Chinese Censorship: Randomized Experimentation and Participant Observation

Identification Number:

doi:10.7910/DVN/26212

Distributor:

Harvard Dataverse

Date of Distribution:

2014-05-27

Version:

5

Bibliographic Citation:

King, Gary; Pan, Jennifer; Roberts, Margaret, E., 2014, "Replication data for: Reverse Engineering Chinese Censorship: Randomized Experimentation and Participant Observation", https://doi.org/10.7910/DVN/26212, Harvard Dataverse, V5, UNF:5:K/LGmB0vjskGYBobxbT+8g== [fileUNF]

Study Description

Citation

Title:

Replication data for: Reverse Engineering Chinese Censorship: Randomized Experimentation and Participant Observation

Identification Number:

doi:10.7910/DVN/26212

Authoring Entity:

King, Gary (Harvard University)

Pan, Jennifer (Harvard University)

Roberts, Margaret, E. (Harvard University)

Distributor:

Harvard Dataverse

Access Authority:

Gary King

Date of Deposit:

2014-05-27

Date of Distribution:

2014

Holdings Information:

https://doi.org/10.7910/DVN/26212

Study Scope

Keywords:

Social Sciences

Abstract:

Chinese government censorship of social media constitutes the largest coordinated selective suppression of human communication in recorded history. Although existing research on the subject has revealed a great deal, it is based on passive, observational methods, with well-known inferential limitations. For example, these methods can reveal nothing about censorship that occurs before submissions are posted, such as via automated review, which we show is used at two-thirds of all social media sites. We offer two approaches to overcome these limitations. For causal inferences, we conduct the first large-scale experimental study of censorship by creating accounts on numerous social media sites spread throughout the country, submitting different randomly assigned types of social media texts, and detecting from a network of computers all over the world which types are censored. Then, for descriptive inferences, we supplement the current uncertain practice of conducting anonymous interviews with secret informants by participant observation: we set up our own social media site in China, contract with Chinese firms to install the same censoring technologies as their existing sites, and -- with direct access to their software, documentation, and even customer service help desk support -- reverse engineer how it all works. Our results offer the first rigorous experimental support for the recent hypothesis that criticism of the state, its leaders, and their policies is routinely published, whereas posts about real-world events with collective action potential are censored. We also extend the hypothesis by showing that it applies even to accusations of corruption by high-level officials and massive online-only protests, neither of which is censored. We also reveal for the first time the inner workings of the process of automated review, and as a result are able to reconcile conflicting accounts of keyword-based content filtering in the academic literature.
We show that the Chinese government tolerates surprising levels of diversity in automated review technology, but still ensures a uniform outcome by post hoc censorship using huge numbers of human coders.

See also: Automated Text Analysis (http://gking.harvard.edu/category/research-interests/applications/automated-text-analysis)

Methodology and Processing

Sources Statement

Data Access

Notes:

This dataset is made available without information on how it can be used. You should communicate with the Contact(s) specified before use.

Other Study Description Materials

Related Publications

Citation

Title:

King, Gary, Jennifer Pan, and Margaret E Roberts. 2014. “Reverse-Engineering Censorship in China: Randomized Experimentation and Participant Observation.” Science 345 (6199): 1-10. Link to article: http://j.mp/1KbwkJJ

Bibliographic Citation:

King, Gary, Jennifer Pan, and Margaret E Roberts. 2014. “Reverse-Engineering Censorship in China: Randomized Experimentation and Participant Observation.” Science 345 (6199): 1-10. Link to article: http://j.mp/1KbwkJJ

File Description--f2468914

File: AiWeiwei_obs_replicate.tab

  • Number of cases: 80

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:5:uM/zT66PuAYki4wE68RN5Q==

File Description--f2468912

File: PotalaPalace_obs_replicate.tab

  • Number of cases: 142

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:5:le1jxO0+PWc5SJuetcg4Jw==

File Description--f2468915

File: results_all_replication.tab

  • Number of cases: 1200

  • No. of variables per record: 12

  • Type of File: text/tab-separated-values

Notes:

UNF:5:MFe4uwNcUrPHl8fw4U70dg==

File Description--f2468916

File: reviewed_replication.tab

  • Number of cases: 100

  • No. of variables per record: 3

  • Type of File: text/tab-separated-values

Notes:

UNF:5:/UNS7/lpOUXwpTv1NMC9jQ==

File Description--f2468913

File: UyghurLongName_obs_replicate.tab

  • Number of cases: 1520

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:5:AoUfMrFlEfK8rzRZ2j6ehg==

File Description--f2468911

File: XJPDumplingcensor.tab

  • Number of cases: 20

  • No. of variables per record: 5

  • Type of File: text/tab-separated-values

Notes:

UNF:5:bb6SKx4GOaZuEPwNeJtlxA==

File Description--f2468909

File: XJPDumplingnotcensor.tab

  • Number of cases: 20

  • No. of variables per record: 5

  • Type of File: text/tab-separated-values

Notes:

UNF:5:JVHl8NaXJjoJuyqB3ixlpw==

File Description--f2468910

File: XJPDumplingupdown.tab

  • Number of cases: 9584

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:5:h6HLeKPWrqz0BZ7O3yUg5g==
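
The case and variable counts listed for each file can double as a quick integrity check on a downloaded copy. The following is a minimal Python sketch, not part of the deposited replication materials. It assumes the .tab files sit in one local directory, are UTF-8 tab-separated text, and carry a single header row that is not counted among the cases. The names EXPECTED_SHAPES, table_shape, and check_shapes are hypothetical.

```python
import csv
from pathlib import Path

# (number of cases, variables per record), copied from the file descriptions.
EXPECTED_SHAPES = {
    "AiWeiwei_obs_replicate.tab": (80, 2),
    "PotalaPalace_obs_replicate.tab": (142, 2),
    "results_all_replication.tab": (1200, 12),
    "reviewed_replication.tab": (100, 3),
    "UyghurLongName_obs_replicate.tab": (1520, 2),
    "XJPDumplingcensor.tab": (20, 5),
    "XJPDumplingnotcensor.tab": (20, 5),
    "XJPDumplingupdown.tab": (9584, 2),
}


def table_shape(path):
    """Return (data rows, columns) for a tab-separated file.

    Assumption: the first line is a header and is not counted as a case.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])  # an empty file yields zero columns
        n_rows = sum(1 for row in reader if row)  # ignore blank lines
    return n_rows, len(header)


def check_shapes(data_dir, expected=EXPECTED_SHAPES):
    """Map each file name to 'ok', 'missing', or the shape actually found."""
    report = {}
    for name, shape in expected.items():
        path = Path(data_dir) / name
        if not path.exists():
            report[name] = "missing"
        else:
            found = table_shape(path)
            report[name] = "ok" if found == shape else found
    return report
```

Calling check_shapes on the download directory returns one entry per file. Any entry that is a tuple rather than "ok" shows the shape actually found, which may simply mean the header assumption does not hold for that file.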

Variable Description

List of Variables:

Variables

Date

f2468914 Location:

Variable Format: character

Notes: UNF:5:plgonvqSLUdKaOtYU8ukkw==

UpDown

f2468914 Location:

Variable Format: numeric

Notes: UNF:5:chGhvaKw2EfVhPwTCa3bMg==

Date

f2468912 Location:

Variable Format: character

Notes: UNF:5:8bYiipvQJ0qTcuLBO/O5sw==

UpDown

f2468912 Location:

Variable Format: numeric

Notes: UNF:5:fS+ZVdxpSo3Vpl+AkykgMQ==

UniqueID

f2468915 Location:

Variable Format: character

Notes: UNF:5:AN/EIv6ToyG0xQTAigKcyg==

Round

f2468915 Location:

Variable Format: numeric

Notes: UNF:5:ReCTuybTRePGEZChaMdNBQ==

AcctNum2

f2468915 Location:

Variable Format: numeric

Notes: UNF:5:pDDSvZwbQbW4QAtSPOPS2g==

Website

f2468915 Location:

Variable Format: numeric

Notes: UNF:5:x65PYCXz0F3M5xlFk+22Gw==

Topic

f2468915 Location:

Variable Format: numeric

Notes: UNF:5:PQWzv2Lmy9d3sh7BLWxzyQ==

Type

f2468915 Location:

Variable Format: character

Notes: UNF:5:i8Gzs+gzp8LSqh2jK/Xq1g==

Sentiment2

f2468915 Location:

Variable Format: character

Notes: UNF:5:738DeqSSyqj3Cr02RYeL+g==

AcctLogin

f2468915 Location:

Variable Format: character

Notes: UNF:5:2zpvMoW6rwx0bZRzJ3L/Tg==

PostStatus

f2468915 Location:

Variable Format: character

Notes: UNF:5:CRnDbUrKjeF5UL+Xmu1sbg==

CantPost

f2468915 Location:

Variable Format: character

Notes: UNF:5:I7J6pFyZlED4FaThiahGXA==

StatusPublished

f2468915 Location:

Variable Format: character

Notes: UNF:5:HjAoTNx8DbOWLVz2DfwY7w==

StatusPending

f2468915 Location:

Variable Format: character

Notes: UNF:5:Ka5+sDfqoz6cL/BC6wUS/w==

WebsiteURL

f2468916 Location:

Variable Format: numeric

Notes: UNF:5:BJh0puHcOc0Pv6aAdq2hNQ==

Review

f2468916 Location:

Variable Format: numeric

Notes: UNF:5:PcB72iUaG+JZEuY52qCuRw==

Type

f2468916 Location:

Variable Format: character

Notes: UNF:5:zHTkK++hTSx46DKgzZgl6g==

Date

f2468913 Location:

Variable Format: character

Notes: UNF:5:iZ6EnYtC7C8PTPmhPDJyLA==

UpDown

f2468913 Location:

Variable Format: numeric

Notes: UNF:5:hNIPKUbg9l/VEvfr6J9BYQ==

V1

f2468911 Location:

Variable Format: numeric

Notes: UNF:5:u9SsxxcOYWwqTYldbrd7vQ==

critical

f2468911 Location:

Variable Format: numeric

Notes: UNF:5:9VWQU1oDj+F+jxr19P8nXw==

irrelevant

f2468911 Location:

Variable Format: numeric

Notes: UNF:5:cSvsnO893nU/UtuyQzIyfw==

neutral

f2468911 Location:

Variable Format: numeric

Notes: UNF:5:Mc626WkNbv9XpCvYzbl0eg==

support

f2468911 Location:

Variable Format: character

Notes: UNF:5:GAHxcQmOkM5DU1ZkUw6Scw==

V1

f2468909 Location:

Variable Format: numeric

Notes: UNF:5:u9SsxxcOYWwqTYldbrd7vQ==

critical

f2468909 Location:

Variable Format: numeric

Notes: UNF:5:w5CI+zVnlvL56lwX5C1HLw==

irrelevant

f2468909 Location:

Variable Format: character

Notes: UNF:5:1hQkmj8O54EWajajSFRuWQ==

neutral

f2468909 Location:

Variable Format: numeric

Notes: UNF:5:CQGeuJ2LSi32AbK2NZYhXA==

support

f2468909 Location:

Variable Format: character

Notes: UNF:5:6QXPn7PXisgpwA6OurRzcw==

Date

f2468910 Location:

Variable Format: character

Notes: UNF:5:MTdCQ8eV7vutkz62ckySlw==

censor

f2468910 Location:

Variable Format: numeric

Notes: UNF:5:mT9/x77HMo4uwqFgk9fM1A==
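
The variable formats listed above can also be checked against the data once a file has been read. The following is a minimal Python sketch for results_all_replication.tab (f2468915) only, and it is not drawn from the deposited R scripts. It assumes each row is available as a dict of strings keyed by column name, and that empty strings and "NA" mark missing values; both are assumptions rather than facts stated in this codebook. The names RESULTS_FORMATS, MISSING, is_numeric, and format_mismatches are hypothetical.

```python
# Variable formats for results_all_replication.tab (f2468915), as listed in
# the variable description.
RESULTS_FORMATS = {
    "UniqueID": "character",
    "Round": "numeric",
    "AcctNum2": "numeric",
    "Website": "numeric",
    "Topic": "numeric",
    "Type": "character",
    "Sentiment2": "character",
    "AcctLogin": "character",
    "PostStatus": "character",
    "CantPost": "character",
    "StatusPublished": "character",
    "StatusPending": "character",
}

# Assumption: these strings denote missing values in the exported files.
MISSING = {"", "NA"}


def is_numeric(text):
    """True if the string parses as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def format_mismatches(rows, formats=RESULTS_FORMATS):
    """Return a sorted list of (variable, problem) pairs.

    A variable is reported as 'absent' if a row lacks it, and as
    'non-numeric' if it is declared numeric but holds a value that is
    neither missing nor parseable as a number.
    """
    problems = set()
    for row in rows:
        for name, fmt in formats.items():
            if name not in row:
                problems.add((name, "absent"))
                continue
            value = row[name] or ""  # csv.DictReader pads short rows with None
            if fmt == "numeric" and value not in MISSING and not is_numeric(value):
                problems.add((name, "non-numeric"))
    return sorted(problems)
```

Rows of the expected form can be produced with csv.DictReader(handle, delimiter="\t") from the standard library. An empty result means only that the declared formats are consistent with the file, not that the values themselves are correct.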

Other Study-Related Materials

Label:

100urls.txt

Text:

A list of all URLs used to run the experiment reported in our article

Notes:

text/plain

Other Study-Related Materials

Label:

AiWeiwei_obs_replicate.csv

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

PotalaPalace_obs_replicate.csv

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

readme_replication.txt

Text:

Readme file describing how to use the data

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

replication.R

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

replication.R~

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

replication_script.R

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

results_all_replication.csv

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

reviewed_replication.csv

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

UyghurLongName_obs_replicate.csv

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

XJPDumplingcensor.csv

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

XJPDumplingnotcensor.csv

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

XJPDumplingupdown.csv

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

XJPDumpling_textanalysis.R

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

XJPDumpling_textanalysis.R~

Notes:

text/plain; charset=US-ASCII