Title: Information Extraction from Financial Documents
Speaker: Sarah Hoffman (FactSet)
Time: 2:15pm-3:30pm, Friday, March 2, 2012
Place: Room 4102, CUNY Graduate Center, 365 Fifth Ave (between 34th and 35th Streets).
Information extraction from financial documents can be done in many 
different ways.  At FactSet, we use a mixture of rule-based and machine 
learning techniques, depending on what we are extracting.  I will 
discuss some of our information extraction techniques, the challenges 
we faced, and the advantages of some of our approaches.

Sarah Hoffman is a Senior Software Engineer and Engineering Manager for 
the Content Collection Services parsing group at FactSet Research Systems, 
where she has been working since June 2007. Sarah is also on the board of 
Women in Engineering NY at FactSet.  Prior to FactSet, Sarah worked as an 
Information Technology Analyst at Lehman Brothers for three years. Sarah 
holds an MS in Computer Science from Columbia University, where she 
focused on Natural Language Processing and did research on automatically 
detecting deceptive speech.  Sarah also holds a BBA in Computer 
Information Systems from Baruch College.