Time: 12:45pm-1:45pm, Friday, March 26
Place: Room 6496, CUNY Graduate Center, 365 Fifth Ave (between 34th and 35th Streets).
Speaker: Nitin Madnani (Maryland)
Title: The Circle of Meaning: From Translation to Paraphrasing and Back

Abstract:
The preservation of meaning between input and output is perhaps
the most ambitious and, often, the most elusive goal of systems that attempt to 
process natural language. Nowhere is this goal of more obvious importance than 
for the tasks of machine translation and paraphrase generation. Preserving 
meaning between the input and the output is paramount for both, the monolingual 
vs bilingual distinction notwithstanding. In this talk, I propose a novel, 
symbiotic connection between these two tasks.

Today's SMT systems require high-quality human translations, in addition
to large bitexts, for parameter tuning. For such tuning, it is generally 
considered wise to have multiple (usually 4) reference translations to avoid 
unfair penalization of translation hypotheses. However, this reliance on 
multiple reference translations creates a problem, because reference 
translations are labor-intensive and expensive to obtain. Therefore, most 
current MT datasets contain only a single reference. This leads to the problem 
of reference sparsity, the primary open problem that I attempt to address in 
this talk and one that has a serious effect on the SMT parameter tuning 
process.
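
As a rough illustration of why a single reference can penalize a good 
hypothesis unfairly, the sketch below computes a BLEU-style clipped n-gram 
precision against one reference and then against two. The choice of metric is 
an assumption (the abstract does not name the tuning metric), and the function 
name and example sentences are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(hypothesis, references, n=1):
    """BLEU-style modified n-gram precision (assumed metric, for illustration).

    Each hypothesis n-gram count is clipped to the largest count of that
    n-gram found in any single reference.
    """
    hyp_counts = Counter(ngrams(hypothesis.split(), n))
    if not hyp_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    matched = sum(min(count, max_ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return matched / sum(hyp_counts.values())

# Invented sentences: the hypothesis is a valid translation that happens
# to use different wording from the first reference.
hypothesis = "the talks were postponed"
single_ref = ["the negotiations were delayed"]
multi_refs = single_ref + ["the talks were postponed until friday"]

print(clipped_precision(hypothesis, single_ref))
print(clipped_precision(hypothesis, multi_refs))
```

With only the first reference, half of the hypothesis words go unmatched even 
though the hypothesis is acceptable; adding a second reference with different 
wording removes that penalty.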

Bannard and Callison-Burch (2005) were the first to provide a practical 
connection between phrase-based statistical machine translation techniques 
and paraphrase generation. However, their technique is restricted to generating 
phrasal paraphrases. We build upon their approach and augment a phrasal 
paraphrase extractor into a sentential paraphraser with extremely broad
coverage. The novelty in this augmentation lies in the further strengthening of 
the connection between statistical machine translation and paraphrase 
generation; whereas Bannard and Callison-Burch rely on SMT machinery only to 
extract phrasal paraphrase rules and stop there, we take it a few steps 
further and build a full English-to-English SMT system. 
This system can, as expected, "translate" any English input sentence into a new 
English sentence with the same degree of meaning preservation that exists in a 
bilingual SMT system. In fact, being a state-of-the-art SMT system, it is able 
to generate n-best "translations" for 
any given input sentence. This sentential paraphraser, built almost entirely 
from SMT machinery, represents the first 180 degrees of the proposed circle of 
meaning.
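
For readers unfamiliar with phrasal paraphrase extraction, the following is a 
minimal sketch of the bilingual pivoting idea usually attributed to Bannard and 
Callison-Burch (2005): an English phrase is paraphrased by going through the 
foreign phrases it translates to, scoring each candidate as the sum over 
pivots f of P(f | e1) * P(e2 | f). It is not the sentential paraphraser 
described in this talk. The table names, the function name, and all 
probabilities are hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy phrase tables; the probabilities are made up.
# p_f_given_e[e][f] = P(foreign phrase f | English phrase e)
p_f_given_e = {
    "under control": {"unter kontrolle": 0.8, "im griff": 0.2},
}
# p_e_given_f[f][e] = P(English phrase e | foreign phrase f)
p_e_given_f = {
    "unter kontrolle": {"under control": 0.7, "in check": 0.3},
    "im griff": {"under control": 0.5, "in hand": 0.5},
}

def pivot_paraphrases(phrase):
    """Score paraphrases of `phrase` by marginalizing over foreign pivots."""
    scores = defaultdict(float)
    for foreign, p_foreign in p_f_given_e.get(phrase, {}).items():
        for candidate, p_candidate in p_e_given_f.get(foreign, {}).items():
            if candidate != phrase:  # skip the trivial identity paraphrase
                scores[candidate] += p_foreign * p_candidate
    # Highest-scoring paraphrases first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

for candidate, score in pivot_paraphrases("under control"):
    print(f"{candidate}\t{score:.2f}")
```

In this toy table, "in check" ranks above "in hand" because it is reached 
through the more probable pivot.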

To complete the circle, we propose a novel connection in the other direction. 
We claim that the sentential paraphraser, once built in this fashion, can 
provide a solution to the reference sparsity problem and, hence, be used to 
improve the performance of a bilingual SMT system. We posit two different 
instantiations of the sentential paraphraser and show results that provide 
empirical validation for this proposed connection.

Speaker Bio:
Nitin Madnani is a final-year PhD student at the University of Maryland, 
College Park. He works as a research assistant in the Laboratory for 
Computational Linguistics and Information Processing with his advisors Bonnie 
Dorr and Philip Resnik. Besides exploring the intersection of
and interaction between machine translation and paraphrasing as part of his 
thesis, he has also worked on multi-document summarization and information 
retrieval. He is planning to graduate in May 2010.