Time: 2:15pm-3:30pm, Friday, Sept 9, 2011
Place: Room 4102, CUNY Graduate Center, 365 Fifth Ave (between 34th and 35th Streets)
Speaker: Heng Ji (CUNY)
Title: Leveraging Redundancy for Cross-Source Information Extraction

Abstract: One of the initial goals of Information Extraction (IE) was to create a knowledge base from the entire input corpus, such as a profile or a series of activities for any entity, and to allow further logical reasoning over that knowledge base. In practice, such information may be scattered among a variety of sources (large-scale documents, languages, genres and data modalities). This requires the ability to identify topically related documents and to integrate facts coming from these documents, which may be redundant, complementary, or in conflict. Unfortunately, the knowledge base constructed by a typical IE pipeline often contains many erroneous and conflicting facts. Interestingly, when the data grows beyond a certain size, the extracted facts become inter-dependent, so we can take advantage of information redundancy to reason across sources, capture information dynamics and background knowledge, and thereby improve the performance of IE. This talk will describe a structure called "Information Networks" for conducting more complete information fusion and more robust inference. Experiments on cross-document, cross-lingual, cross-media and cross-genre IE that use this structure will be presented and discussed.

Bio: Heng Ji is an assistant professor and doctoral faculty member in Computer Science at Queens College and the Graduate Center of the City University of New York. She received her Ph.D. in Computer Science from New York University in 2007. Her research interests focus on Information Extraction and Fusion. She received a Google Research Award in 2009 and an NSF CAREER Award in 2010. She has co-organized the NIST TAC Knowledge Base Population task in 2010 and 2011.
She is the information fusion task leader in the Army Research Lab's Information Network Academic Research Center and the Information Extraction area chair of NAACL-HLT 2012.