Time: 2:15pm-3:30pm, Friday, March 11, 2011
Place: Room 4102, CUNY Graduate Center, 365 Fifth Ave (between 34th and 35th Streets)
Speaker: Mitch Marcus (Penn)
Title: Acquiring linguistic structure automatically using minimal computation

Abstract: Modeling the acquisition of language from naturally occurring data is a central challenge for linguistics and child language development, as well as for the applied cognitive science of natural language processing. From a scientific viewpoint, this has been a central challenge for over fifty years. From a technological viewpoint, supervised machine learning methods have yielded quite powerful technologies over the past decade, but require very expensive annotated corpora for training; unsupervised learning algorithms with similar performance would be of enormous value. To force ourselves to more fully utilize the statistical and linguistic aspects of the signal the child uses to learn her native language, we have adopted a research strategy of minimal computation, attempting unsupervised language learning using only very simple counting methods. To date, we have approached the learning of morphological structure, part-of-speech induction, and part-of-speech tagging using two key sources of constraint. First, the process of language acquisition, whether in children or machines, must exploit the Zipfian statistical distribution of the underlying data stream. We have used this constraint to develop a state-of-the-art algorithm for morphology acquisition. Second, appropriate linguistic representations provide essential constraints about domains of locality that make the learning problem tractable. We have exploited locality implications of the Minimalist Program to develop new algorithms for automatically distinguishing open-class lexical items from closed-class words and grammatical formatives, and then used the output of this process to develop a fully unsupervised model of part-of-speech labeling.
This talk presents joint work with Erwin Chan, Constantine Lignos, Qiuye Zhao, and Charles Yang.

Speaker Bio: Mitchell Marcus is the RCA Professor of Artificial Intelligence in the Department of Computer and Information Science at the University of Pennsylvania. He was the principal investigator for the Penn Treebank Project through the mid-1990s; he and his collaborators continue to develop hand-annotated corpora for use worldwide as training materials for statistical natural language systems. His other research interests include statistical natural language processing, human-robot communication, and cognitively plausible models for automatic acquisition of linguistic structure. He has served as chair of Penn's Computer and Information Science Department, as chair of the Penn Faculty Senate, and as president of the Association for Computational Linguistics. He is also a Fellow of the American Association for Artificial Intelligence.