Time: 2pm-3pm, Friday, Oct 8, 2010
Place: Room 4102, CUNY Graduate Center, 365 Fifth Ave (between 34th and 35th Streets).
Speaker: Martin Jansche (Google)
Title: Automatic Transliteration of Proper Names: Practice and Experience


If you've ever seen street artists who offer to write your name in
Chinese characters, you have encountered the problem of proper name
transliteration.  It is a pervasive problem that arises in many
multi-lingual applications.  I will describe recent practice and
experience with proper name transliteration in the context of a deep
internationalization of Google Maps.  To produce a world map in
Japanese, for example, millions of named entities had to be
transliterated from a variety of source languages.  I will discuss in
detail the linguistic issues underlying transliteration into a variety
of languages and describe some of the techniques used, including
hand-crafted rules, automatically trained finite-state models, and
automatic extraction of transliterations from Wikipedia.  I will
highlight several lessons we learned while designing a large-scale
multi-lingual transliteration system.

Joint work with Sascha Brawer, Richard Sproat, Hiroshi Takenaka, and
Yui Terashima.

Speaker Bio:

Martin Jansche's research interests are in empirical methods in 
automatic natural language and speech processing.  He received his PhD 
in 2003 from the Ohio State University and joined the Center for 
Computational Learning Systems at Columbia University.  In 2007, 
he joined Google, Inc. in New York as part of the speech research
group, working on web search and speech applications.