Extraction of Information from Reports

4/10 Department Seminar


Sumali Conlon

Associate Professor, Department of MIS
School of Business Administration
University of Mississippi, University

3:00 pm, April 10, 2013
235 Weir Hall

Title: Extraction of Financial Information from Online Business Reports


CAINES (Content Analysis and INformation Extraction System) employs a semantic-based information extraction (IE) methodology, developed through a design science approach, to extract information from unstructured text on the Web.  Our system was knowledge-engineered and tested on an active business database by experts who use the database regularly to perform their job functions. We believe that by heavily involving business experts, we are able to advance our thinking about IS research to build. CAINES extracts information to meet three objectives that were deemed important by our experts: (1) understand what current market conditions impacted the growth of certain balance sheets; (2) summarize management’s discussion of potential risks and uncertainties going forward; and (3) identify significant financial activities, including mergers, acquisitions, and new business segments.  These objectives were developed based on the advice of financial experts who regularly analyze financial reports.

A total of 21 online business reports from the EDGAR database, averaging about 100 pages each, were used in this study.  Based on the opinions of financial experts, extraction rules were created to extract information from financial reports. Using CAINES, one can extract information about global and domestic market conditions, the impacts of those market conditions, and the business outlook.  The system produced 107,533 rows of data and displays information regarding mergers, acquisitions, and business segment news between 2007 and 2009.  User testing of CAINES resulted in recall of 85.91%, precision of 87.16%, and an F-measure of 86.46%. Extraction with CAINES was also faster than extracting the information manually.  Users agree that CAINES quickly and easily extracts unstructured information from financial reports on the EDGAR database.
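
For readers unfamiliar with the evaluation measures quoted above, the following is a minimal sketch of how an F-measure is commonly derived from precision and recall. It assumes the balanced F1 form (the harmonic mean); the announcement does not say which variant or averaging scheme was used, and the function name f_measure is purely illustrative, not part of CAINES. Applied to the aggregate precision and recall quoted above, the balanced form gives roughly 86.5%, close to but not identical to the reported 86.46%, which would be consistent with the reported figure having been computed differently (for example, averaged per user), though that is only a guess.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score for precision and recall given as fractions in [0, 1].

    beta = 1.0 gives the balanced F1 score (harmonic mean of precision and recall).
    This is the textbook definition, assumed here for illustration only.
    """
    if precision <= 0.0 and recall <= 0.0:
        return 0.0  # avoid division by zero when nothing was extracted correctly
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Aggregate figures quoted in the abstract, converted from percentages.
precision = 0.8716
recall = 0.8591

f1 = f_measure(precision, recall)
print(f"Balanced F-measure from the quoted aggregates: {f1:.2%}")
```

Raising beta above 1 weights recall more heavily, and lowering it below 1 favors precision; the balanced form is used here only because it is the most common default.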


Sumali Conlon is an Associate Professor of Management Information Systems at the University of Mississippi. She received her B.A. in Statistics, with a minor in Economics, from Thammasat University, Bangkok, Thailand, and her Ph.D. from the Illinois Institute of Technology. Her teaching and research interests include Sentiment Analysis, Semantic Web, Web Services, Web Mining, Natural Language Processing, Information Retrieval, Knowledge Management, and Database Systems. Her work has appeared in Decision Support Systems, Journal of Information Systems, Journal of the American Society for Information Science, Information Processing & Management, Omega, Journal of Computer Information Systems, and International Journal of Information and Management Science, among others.