Machine Learning Approaches to Text and Multimedia Mining

Today's search engines are able to retrieve and index several billion
web pages, but the analysis they perform on the content of these
pages is still very shallow, and, as a consequence, so is the
functionality they can offer the user. What if these search engines
could, for example, extract the factual content from the pages they
retrieve, classify the pictures that accompany the text, disambiguate
namesakes, or mine the opinions expressed in the pages? Undoubtedly,
this would open a world of possibilities in terms of new
functionality and enhanced user experience, fueled by richer
underlying data models. In this talk, I will describe my research on
these topics, carried out over a number of years. What the approaches
I will present have in common is that they rely heavily on machine
learning techniques to train systems that classify and extract target
information. The talk will also give an overview of real-world
applications of the systems that originated from this research. For
instance, in one case we trained one of our systems to extract
information from a collection of jet engine reports provided by
Rolls-Royce, which had a positive impact on the way their engineers
search for information in the course of their work.

José Iria is a Research Associate at the University of Sheffield
(UK), where he has been working on Machine Learning-based Text and
Multimedia Mining since 2002. Before that, he worked for two years at
Siemens R&D in Lisbon, Portugal. José expects to receive his Ph.D.
in Computer Science from the University of Sheffield in 2009. He
previously obtained both his M.Sc. and his Engineering degree in
Computer Science from Instituto Superior Técnico (Technical
University of Lisbon). His
research interests include Information Extraction, Document
Classification, Opinion Mining, Supervised and Semi-supervised
Learning, Graph-based methods, and the application of these
technologies to Web and Knowledge Management problems. He has worked
on two large European projects and one UK project, and his current
activities include coordinating the multimedia mining area of the
X-Media project (http://www.x-media-project.org/), a large EU project
that deals with acquiring, sharing and reusing knowledge in large
distributed environments and that involves 15 partners, including
FIAT and Rolls-Royce as industrial partners. José has made more than
30 contributions to international journals, conferences and workshops
on the topics of his research.