TITLE: Multi-structured inference strategies for text-to-text generation


Automated personal assistants and summarization tools are increasingly prevalent
in the modern age of mobile computing, but their limitations highlight the 
longstanding challenges of natural language generation. Focused text-to-text 
generation problems such as text compression and simplification present an 
opportunity to work toward general-purpose statistical models for text generation
without strong assumptions on a domain or internal semantic representation. In 
this talk, I will present recent work on supervised sentence compression which 
simultaneously recovers heterogeneous structures such as n-gram sequences and 
dependency trees to specify an output sentence. Joint inference is obtained via 
a compact integer programming formulation using flow networks to avoid cyclic and 
disconnected structures. The resulting approach generalizes several established
compression techniques and yields significant performance gains on well-studied datasets.
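To make the joint objective concrete, here is a toy sketch of multi-structured compression: choose a subset of tokens that scores well both as a token sequence (bigram scores) and as a connected dependency subtree (arc scores). The example sentence, heads, and all scores below are invented for illustration, and the exhaustive search stands in for the actual inference; the work described in the talk instead solves a single integer program whose flow-network constraints rule out cyclic and disconnected structures.

```python
from itertools import combinations

# Toy multi-structured compression. A compression must keep tokens that
# (1) form a plausible token sequence (bigram scores) and
# (2) form a connected dependency subtree (arc scores).
# Brute force replaces the ILP/flow-network inference; all data below
# are hypothetical.

tokens = ["he", "quickly", "ate", "the", "tasty", "apple"]
head = {0: 2, 1: 2, 2: -1, 3: 5, 4: 5, 5: 2}   # dependency head per token (-1 = root)

bigram_score = lambda a, b: 1.0                 # uniform sequence scores for the sketch
arc_score = {(2, 0): 2.0, (2, 1): 0.2, (2, 5): 2.0, (5, 3): 1.0, (5, 4): 0.3}

def connected_subtree(kept):
    """Every kept token must reach the root through kept ancestors."""
    kept_set = set(kept)
    for t in kept:
        while head[t] != -1:
            t = head[t]
            if t not in kept_set:
                return False
    return True

def score(kept):
    # Joint objective: sequence score over adjacent kept tokens
    # plus tree score over the arcs to each kept token's head.
    seq = sum(bigram_score(a, b) for a, b in zip(kept, kept[1:]))
    tree = sum(arc_score.get((head[t], t), 0.0) for t in kept if head[t] != -1)
    return seq + tree

def compress(max_len):
    best = None
    for r in range(1, max_len + 1):
        for kept in combinations(range(len(tokens)), r):
            if connected_subtree(kept):
                cand = (score(kept), kept)
                if best is None or cand[0] > best[0]:
                    best = cand
    return [tokens[i] for i in best[1]]

print(compress(4))  # → ['he', 'ate', 'the', 'apple']
```

The point of the sketch is that neither score alone suffices: the sequence score favors fluent adjacency while the tree constraint guarantees grammatical connectivity, and the ILP optimizes both jointly.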

I will then examine a number of extensions to this multi-structured generation 
approach. One line of research considers practical issues of runtime and applies
dual decomposition as well as dynamic programs for dependency parsing to multi-
structured compression, resulting in significantly faster inference with no loss in
output quality. Other extensions exploit the flexibility of integer programming,
applying this approach to more challenging problems such as sentence fusion and
to producing additional representations such as directed acyclic graphs
that represent predicate-argument structures. Finally, I will briefly discuss the use of
multi-structured inference in other natural language applications such as text
alignment and summarization.
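The dual decomposition idea mentioned above can be sketched in miniature: split a joint objective over binary token indicators into a sequence subproblem and a tree subproblem, solve each exactly on its own, and take subgradient steps on Lagrange multipliers until the two solutions agree. The per-token scores below are invented, and each subproblem here is a trivial threshold; in the actual work the subproblems are solved with a Viterbi-style sequence dynamic program and a dependency-parsing dynamic program.

```python
# Toy dual decomposition for a joint objective f(y) + g(z) with y = z,
# relaxed via multipliers u:  max_y [f(y) + u.y] + max_z [g(z) - u.z].
# All scores are hypothetical stand-ins for real model scores.

seq_scores  = [1.0, -0.5, 2.0, 0.3]   # per-token scores from the sequence model
tree_scores = [0.8,  0.4, 1.5, -1.0]  # per-token scores from the tree model

def argmax_with(scores, u, sign):
    # Each subproblem decomposes per token in this sketch, so its exact
    # maximizer just thresholds score + sign*u at zero.
    return [1 if s + sign * ui > 0 else 0 for s, ui in zip(scores, u)]

def dual_decompose(steps=50, rate=0.5):
    u = [0.0] * len(seq_scores)
    for _ in range(steps):
        y = argmax_with(seq_scores, u, +1)   # sequence subproblem
        z = argmax_with(tree_scores, u, -1)  # tree subproblem
        if y == z:
            return y                         # agreement certifies optimality
        # Subgradient step on u pushes the subproblems toward agreement.
        u = [ui - rate * (yi - zi) for ui, yi, zi in zip(u, y, z)]
    return y  # fall back to the sequence solution if no agreement

print(dual_decompose())  # → [1, 0, 1, 0]
```

When the two subproblems agree, the shared solution is provably optimal for the joint objective, which is what allows fast decomposed inference with no loss in output quality.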


Kapil Thadani is a research scientist at Yahoo Labs in New York working on 
natural language processing. His current research focuses on structured 
prediction problems which lie at the intersection of natural language 
understanding and generation. He received a Master's and a PhD from Columbia 
University and has worked on a wide variety of machine learning and natural 
language applications during his doctoral studies and internships.