Time: 2pm-3pm, Monday, June 6, 2011

Place: CS Department Conference Room 225, Science Building, Queens College, CUNY

Speaker: Asad Sayeed (University of Maryland)

TITLE: A distributional and syntactic approach to fine-grained opinion
mining

ABSTRACT: We consider the problem of finding {source, target, opinion}
triples in the IT business press as part of a larger social science
research program of analyzing the diffusion of IT innovations. In this
context, we can discern a list of innovations as targets from the domain
itself. We can then use this list as an anchor for finding the other
two members of the triple at a "fine-grained" level: contexts of a
paragraph or less.

We first demonstrate a vector space model for finding opinionated
contexts in which the innovation targets are mentioned. We can find
paragraph-level contexts by searching for an
"expresses-an-opinion-about" relation between sources and targets using
a supervised model with an SVM that uses features derived from a
general-purpose subjectivity lexicon and a corpus indexing tool. We show
that our algorithm correctly filters the domain-relevant subset of
subjectivity terms so that they are more highly valued.
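
As a rough illustration of the kind of lexicon-derived features such a
classifier might consume, the sketch below maps a paragraph to a sparse
feature dictionary for one (source, target) pair. The toy lexicon, its
weights, the feature names, and the function context_features are all
hypothetical and are not taken from the talk; training the SVM itself is
left out.

```python
from collections import Counter

# Hypothetical toy lexicon: term -> prior subjectivity weight (illustrative only).
LEXICON = {"promising": 1.0, "fails": 1.0, "revolutionary": 0.8, "costly": 0.6}


def context_features(paragraph, source, target, lexicon=LEXICON):
    """Map a paragraph to a sparse feature dict for a (source, target) pair."""
    # Crude whitespace tokenisation with punctuation stripped; an assumption,
    # not the corpus indexing tool mentioned in the abstract.
    tokens = [t.strip('.,;:!?"()').lower() for t in paragraph.split()]
    counts = Counter(t for t in tokens if t)
    features = {
        "has_source": float(source.lower() in counts),
        "has_target": float(target.lower() in counts),
    }
    # One feature per lexicon term present, scaled by its frequency.
    for term, weight in lexicon.items():
        if counts[term]:
            features["lex=" + term] = weight * counts[term]
    return features


example = context_features(
    "Analysts say virtualization is promising, but costly.",
    source="analysts",
    target="virtualization",
)
```

Feature dictionaries of this shape could then be vectorised and passed to
any SVM implementation together with paragraph-level labels.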

We then turn to identifying the opinion. Typically, opinions in opinion
mining are taken to be positive or negative. We discuss a crowdsourcing
technique developed to create the seed data describing human perception
of opinion-bearing language needed for our supervised learning
algorithm. Our user interface successfully limited the meta-subjectivity
inherent in the task ("What is an opinion?") while reliably retrieving
relevant opinionated words using workers who are not domain experts.

Finally, we developed a new data structure and modeling technique for
connecting targets with the correct within-sentence opinionated
language. Syntactic relatedness tries (SRTs) contain all paths from a
dependency graph of a sentence that connect a target expression to a
candidate opinionated word. We use factor graphs to model how far a
path through the SRT must be followed in order to connect the right
targets to the right words.
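
A minimal sketch of how a trie of dependency paths might be assembled is
given below. It assumes the dependency graph is supplied as (head,
dependent) word pairs, that each word occurs once in the sentence, and
that edge directions and relation labels can be ignored; the function
build_srt and the example sentence are hypothetical. The factor-graph
model that decides how far to follow each path is not shown.

```python
from collections import deque


def build_srt(edges, target, candidates):
    """Build a trie of dependency paths from `target` to each candidate word."""
    # Treat the dependency graph as undirected (a simplifying assumption).
    adjacency = {}
    for head, dependent in edges:
        adjacency.setdefault(head, []).append(dependent)
        adjacency.setdefault(dependent, []).append(head)

    # Breadth-first search from the target records each word's predecessor.
    parent = {target: None}
    queue = deque([target])
    while queue:
        word = queue.popleft()
        for neighbour in adjacency.get(word, []):
            if neighbour not in parent:
                parent[neighbour] = word
                queue.append(neighbour)

    trie = {}
    for candidate in candidates:
        if candidate not in parent:
            continue  # candidate is not connected to the target
        path = []
        node = candidate
        while node is not None:
            path.append(node)
            node = parent[node]
        path.reverse()  # path now runs target -> ... -> candidate
        # Insert the path into the trie; shared prefixes are merged.
        level = trie
        for word in path:
            level = level.setdefault(word, {})
    return trie


# Hypothetical parse of "Analysts praise promising virtualization despite costly hardware".
edges = [
    ("praise", "analysts"),
    ("praise", "virtualization"),
    ("virtualization", "promising"),
    ("praise", "hardware"),
    ("hardware", "costly"),
]
srt = build_srt(edges, "virtualization", ["promising", "praise", "costly"])
```

In this toy case the paths to "praise" and "costly" share a prefix, so a
model that chooses how deep to follow each branch could link
"virtualization" to "promising" and "praise" while stopping before
"costly", which modifies "hardware".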

BIO: Asad Sayeed is finishing his PhD in computer science at the
University of Maryland, College Park. He has worked on problems spanning
a wide range of sub-fields in computational linguistics and information
extraction, including sentiment analysis and named-entity detection.