Time: 2pm-3pm, Friday, October 23
Place: Room 4422, CUNY Graduate Center, 365 Fifth Ave (between 34th and 35th Streets).
Speaker: Fei Huang (IBM T.J. Watson Research Center)

Title: Confidence Measure for Word Alignment


Data-driven machine translation learns various models from large
amounts of word-aligned bilingual data. Noise in the training data
often introduces many word alignment errors. We present
a confidence measure for word alignment based on the posterior
probability of alignment links. We introduce a sentence alignment
confidence measure and an alignment link confidence measure. Based on
these measures, we improve alignment quality by selecting
high-confidence sentence alignments and alignment links from multiple
word alignments of the same sentence pair. Additionally, we remove
low-confidence alignment links from the MaxEnt word alignment of a
bilingual training corpus, which increases the alignment F-score,
improves Chinese-English and Arabic-English translation quality and
significantly reduces the phrase translation table size.
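
As a rough illustration of the idea (not the speaker's implementation), the
sketch below assumes each candidate alignment comes with a posterior
probability per link. The function names, the 0.5 threshold, and the use of a
geometric mean as the sentence-level score are all assumptions made for this
example.

```python
import math

# Hypothetical data layout: an alignment is a set of (source_idx, target_idx)
# links, and `posteriors` maps each link to its posterior probability.

def link_confidence(posteriors, link):
    """Confidence of one link: its posterior probability (0.0 if unknown)."""
    return posteriors.get(link, 0.0)

def sentence_confidence(posteriors, links):
    """Sentence-level score: geometric mean of link posteriors.

    The geometric mean is an illustrative choice, not taken from the talk.
    """
    probs = [link_confidence(posteriors, link) for link in links]
    if not probs or min(probs) <= 0.0:
        return 0.0
    return math.exp(sum(math.log(p) for p in probs) / len(probs))

def filter_links(posteriors, links, threshold=0.5):
    """Keep only links whose confidence reaches the (assumed) threshold."""
    return {l for l in links if link_confidence(posteriors, l) >= threshold}

def select_alignment(candidates):
    """Pick the highest-confidence alignment for one sentence pair.

    `candidates` is a list of (links, posteriors) pairs, one per aligner.
    """
    return max(candidates, key=lambda c: sentence_confidence(c[1], c[0]))
```

For example, with link posteriors 0.9, 0.8 and 0.1, `filter_links` at the
default threshold keeps the first two links and drops the third.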

Bio: Dr. Fei Huang is a research staff member at IBM T.J. Watson
Research Center. His current research focuses on statistical machine
translation, and his broader interests span various aspects of statistical
NLP. He obtained his Ph.D. from the School of Computer Science at Carnegie
Mellon University in 2006, where he worked on information extraction
and machine translation, specializing in named entity extraction and
translation from text and speech.