Speaker: Yejin Choi (Stony Brook)
Time: 2:15pm-3:30pm, Friday, Dec 2
Place: Room 4102, CUNY Graduate Center, 365 Fifth Ave (between 34th and 35th Streets).

In Search of Styles in Language: Identifying Deceptive Product
Reviews, Wikipedia Vandalism, and the Gender of Authors via
Statistical Stylometric Analysis

Language is a window into the mind. Stylometric analysis, the study of 
linguistic style, can help uncover the cognitive state and the 
personal identity of the writer. In this talk, I will present three case studies 
of Natural Language Processing (NLP) tasks that expand the scope of statistical 
stylometric analysis. First, I will present the study of identifying deceptive 
product reviews, i.e., fake reviews that are written by people who are paid to 
fabricate positive reviews. As it turns out, it is surprisingly hard for humans 
to distinguish fake reviews from truthful ones. Statistical analysis of language 
use, on the other hand, leads to nearly 90% accuracy and provides new clues for 
spotting suspicious reviews. Next, I will introduce the study of detecting 
Wikipedia vandalism, where textual vandalism can be viewed as a unique genre in 
which people with a similar purpose share similar linguistic behavior. 
Finally, I will present the study of gender attribution, where we will examine 
whether there are gender-specific linguistic signals that go beyond the 
boundaries of topic and genre, and whether they are traceable even in modern and 
scientific literature.

Yejin Choi is an Assistant Professor in the Computer Science Department at Stony 
Brook University (SUNY Stony Brook). She received her Ph.D. in Computer Science 
from Cornell University in 2010 in the area of Natural Language Processing. Her 
research interests include stylometric analysis, natural language generation from 
images, and opinion & sentiment analysis in text.