Title: Advances in Cross-Lingual Syntactic Transfer
Speaker: Ryan McDonald (Google Research)
Date: Friday 2/15
Time: 3pm
Place: Graduate Center Room 6496, 5th Ave & 34th St.

The idea of using annotated resources from one language to learn models for
another has been around for at least a decade. Typically, these models have
relied on access to parallel data. However, recent approaches have focused on
"direct" cross-lingual transfer, and in particular, delexicalized transfer.
Delexicalized parsing models are conditioned only on properties of the input
that are available across languages, typically induced tags or clusters. Since
these properties are universally available, a parser trained on English can be
applied directly to every other language. This simple method has
shown itself to be surprisingly effective and outperforms the best
weakly supervised models by a significant margin. However, the assumptions
underlying these models are far too weak to obtain parsing accuracies at the
level of monolingual supervised methods. In this talk I will focus on porting
ideas from work on selective parameter sharing in multi-source direct transfer
to highly accurate latent CRF parsing models. I will then present novel
semi-supervised learning algorithms that relexicalize these models on unlabeled
target-language data to give significant improvements. The final model brings
us one step closer to building robust syntactic parsers for all the world's
languages.

Joint work with: Oscar Tackstrom, Slav Petrov, Keith Hall, Joakim Nivre.
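
A minimal sketch of the delexicalization idea in the abstract follows, assuming
every sentence already carries universal part-of-speech tags. The Token class,
the arc feature templates and the two example sentences are hypothetical
illustrations, not the models presented in the talk. Because the features never
look at word forms, an English sentence and a German sentence with the same tag
sequence produce identical features, which is why weights learned on English
can score arcs in another language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str  # the word itself; language-specific, dropped by delexicalization
    upos: str  # universal part-of-speech tag, assumed shared across languages


def delexicalize(sentence: list[Token]) -> list[str]:
    """Keep only the cross-lingual tag of each token and drop the word form."""
    return [tok.upos for tok in sentence]


def arc_features(tags: list[str], head: int, dep: int) -> list[str]:
    """Hypothetical features for a head -> dependent arc, built from tags only.

    Index -1 stands for an artificial ROOT node.
    """
    h = "ROOT" if head == -1 else tags[head]
    d = tags[dep]
    direction = "R" if head < dep else "L"
    distance = min(abs(head - dep), 5)  # bucket long distances together
    return [
        f"h={h}",
        f"d={d}",
        f"h={h}|d={d}",
        f"h={h}|d={d}|dir={direction}|dist={distance}",
    ]


def all_arc_features(sentence: list[Token]) -> list[list[str]]:
    """Features for every candidate (head, dependent) pair in a sentence."""
    tags = delexicalize(sentence)
    n = len(tags)
    return [
        arc_features(tags, head, dep)
        for dep in range(n)
        for head in [-1, *range(n)]
        if head != dep
    ]


english = [Token("the", "DET"), Token("dog", "NOUN"), Token("barks", "VERB")]
german = [Token("der", "DET"), Token("Hund", "NOUN"), Token("bellt", "VERB")]

# Same tag sequence, so the feature space is identical across languages.
print(all_arc_features(english) == all_arc_features(german))  # True
```

The sketch only shows why transfer is possible at all; it also makes the
weakness noted in the abstract visible, since two sentences with the same tags
but different words can never be told apart by such a model.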

Ryan McDonald is a Research Scientist at Google. He received a Ph.D. from the
University of Pennsylvania and a Hon.B.Sc. from the University of Toronto.
Ryan’s thesis focused on the problem of syntactic dependency parsing. His work
allowed complex linguistic constructions to be modeled in a direct and
tractable way, which enabled parsers that are both efficient and accurate. In
2008 he wrote a book on the subject entitled 'Dependency Parsing'. Since
joining Google, Ryan has continued to work on syntactic analysis, in
particular on extending statistical models learned on resource-rich languages,
like English, to resource-poor languages. Ryan’s research also addresses how
these systems can be used to improve the quality of a number of important
user-facing technologies, such as search, machine translation, and sentiment
analysis.