Time: 2:15pm-3:30pm, Friday, Sept 9, 2011
Place: Room 4102, CUNY Graduate Center, 365 Fifth Ave (between 34th and 35th Streets).
Speaker: Heng Ji (CUNY)
Title: Leveraging Redundancy for Cross-Source Information Extraction

Abstract:

One of the initial goals for Information Extraction (IE) was to create a
knowledge base from the entire input corpus, such as a profile or a
series of activities about any entity, and allow further logical
reasoning on the knowledge base. In practice, such information may be
scattered across many sources (large document collections, multiple
languages, genres and data modalities). This requires the ability to identify
topically-related documents and to integrate facts, possibly redundant,
possibly complementary, possibly in conflict, coming from these
documents. Unfortunately, the knowledge base constructed by a typical
IE pipeline often contains many erroneous and conflicting facts.
Interestingly, once the data grows beyond a certain size, the
extracted facts become inter-dependent, so we can take advantage of
information redundancy to conduct reasoning across sources, capture
information dynamics and background knowledge, and thereby improve the
performance of IE. This talk will describe a structure called
"Information Networks" to conduct more complete information fusion and
robust inference. Experiments on cross-document, cross-lingual,
cross-media and cross-genre IE that utilized this structure will be
presented and discussed.

Bio:

Heng Ji is an assistant professor and doctoral faculty in Computer
Science at Queens College and the Graduate Center of City University of
New York. She received her Ph.D. in Computer Science from New York
University in 2007. Her research interests focus on Information
Extraction and Fusion. She received a Google Research Award in
2009 and an NSF CAREER Award in 2010. She co-organized the NIST
TAC Knowledge Base Population task in 2010 and 2011. She is the
information fusion task leader at the Army Research Lab's Information
Network Academic Research Center, and the Information Extraction area
chair of NAACL-HLT 2012.