Title: Computational readability: need for a domain-oriented approach?
Speaker: Thomas François (Penn)
Place: Science Center, Rm 4102, CUNY Graduate Center, 5th Ave & 34th St.
Abstract: Readability research aims to automatically assess the difficulty of texts for a given population, using linguistic characteristics of the texts. Classic approaches (Flesch, 1948; Dale and Chall, 1948) were often concerned with providing a general tool that could be used in a wide range of situations. More recently, researchers have focused on more specific groups of readers, such as adults with intellectual disabilities (Feng et al., 2009) or readers in a foreign language (François, 2012). These studies have shown that, in such contexts, some specialized features are more valuable than the “classic” ones. So far, the impact of the corpus used to train readability models (which is strongly connected with the task a given formula is designed for) has not been much investigated. However, Collins-Thompson et al. (2005) applied their model to various corpora and noticed large differences in performance between them. In this talk, we first report on our previous work, which led to a new computational model for the readability of French as a foreign language (FFL), trained on a corpus of texts from textbooks. We then describe how this state-of-the-art model behaves when applied to a different corpus of texts, drawn from FFL “readers”. We show not only that ten-fold cross-validation does not provide an accurate estimate of the model's performance for all types of texts intended for an FFL audience, but also that the effectiveness of some features can vary dramatically depending on the characteristics of the corpus. We conclude that these findings argue for a more domain-oriented approach to readability, and that the main avenue for further performance improvement might be the reliable labelling of a large domain corpus.
Bio: Thomas François is a Belgian American Educational Foundation (BAEF) and Fulbright Fellow. He is currently a postdoctoral researcher at the Institute for Research in Cognitive Science (IRCS) of the University of Pennsylvania, where he focuses on various approaches to improving current readability models for French as a foreign language (FFL). He received his Ph.D. from the University of Louvain (Belgium) in 2011. His thesis, entitled “Les apports du traitement automatique du langage à la lisibilité du français langue étrangère”, provides a thorough review of the readability field for English and French, as well as a new readability formula for FFL. It received the Best 2012 Thesis Award from ATALA, the French association for NLP.