Speaker: Karen Livescu (TTI-Chicago)

Where: CUNY Graduate Center Room 6496
When: Monday, December 2nd, 11:45am-1:15pm (note the unusual time)

Title: Multi-view learning of feature representations for speech (and language!)

Abstract: This talk presents an approach to learning improved features for
prediction tasks.  The main focus will be on acoustic features for speech
recognition, but the talk will also include brief forays into learning features
for other types of data (text, images) and for other tasks.  It is often
possible to improve performance of classifiers or other predictors by starting
with a high-dimensional input feature vector and applying linear or non-linear
dimensionality reduction.  The learned transformation may be unsupervised (e.g.,
principal components analysis, manifold learning) or supervised (e.g., linear
discriminant analysis, neural network-based representations).

The talk describes a recent approach that is unsupervised but uses a second
"view" of the data as additional information for learning a useful
transformation.  The different views may be audio, images, text, and others.
The approach we take, using canonical correlation analysis (CCA) and its
nonlinear extensions, finds representations of the two views that are maximally
correlated.  This approach avoids some of the disadvantages of other
unsupervised methods, such as PCA, which are sensitive to noise and data
scaling, and possibly of supervised methods, which are more task-specific. 
While most of the focus is on the unsupervised setting, the approach can also
be extended to supervised settings where an additional data view is available.
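
As a rough illustration of the linear case described above, the following
minimal sketch computes CCA projections for two views with NumPy.  It is not
the speaker's implementation: the function name linear_cca, the small ridge
term reg, and the synthetic two-view data are all assumptions made for this
example.

```python
import numpy as np


def linear_cca(X, Y, k, reg=1e-6):
    """Illustrative linear CCA (hypothetical helper, not from the talk).

    X: (n, dx) array for view 1, Y: (n, dy) array for view 2.
    Returns projection matrices A (dx, k), B (dy, k) and the top-k
    canonical correlations, sorted from largest to smallest.
    """
    # Center each view.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Covariance estimates; reg is an assumed ridge term for stability.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)

    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    A = Sxx_is @ U[:, :k]
    B = Syy_is @ Vt.T[:, :k]
    return A, B, s[:k]


# Synthetic example: two noisy views that share one latent signal z.
rng = np.random.default_rng(0)
z = rng.normal(size=(2000, 1))
X = z @ np.ones((1, 5)) + 0.1 * rng.normal(size=(2000, 5))
Y = z @ np.ones((1, 4)) + 0.1 * rng.normal(size=(2000, 4))

A, B, corrs = linear_cca(X, Y, k=2)
print(corrs)  # first value should be close to 1, second much smaller
```

The projected features X @ A and Y @ B are the learned representations of the
two views.  Because each view is whitened before the correlation is computed,
the result does not depend on how the input dimensions are scaled, unlike PCA.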

The talk will cover recent work using CCA, its nonlinear extension via kernel
CCA, and a newly proposed, parametric nonlinear extension using deep neural
networks dubbed deep CCA.  Results to date show good improvements on speech
recognition, as well as promising initial results on other tasks.


Bio:  Karen Livescu is an Assistant Professor at TTI-Chicago.  She completed her
PhD at MIT in the CSAIL Spoken Language Systems group, and was a post-doctoral
lecturer in the MIT EECS department.  Karen's main research interests are in
speech and language processing, with a slant toward combining machine learning
with knowledge about linguistics and speech science.  Her recent work has
included multi-view learning of speech representations, articulatory models of
pronunciation variation, discriminative training with low resources for spoken
term detection and pronunciation modeling, and automatic sign language
recognition.  She is a member of the IEEE Spoken Language Technical Committee,
associate editor for IEEE Transactions on Audio, Speech, and Language
Processing, and subject editor for Speech Communication.  She is an
organizer/co-organizer of a number of recent workshops, including the ISCA SIGML
workshops on Machine Learning in Speech and Language Processing, the Midwest
Speech and Language Days, and the Interspeech Workshop on Speech Production in
Automatic Speech Recognition.