Seminar - Learning Strategies to Reduce Label Cost and Achieve High Accuracy

CIS Seminar
3:00 PM, Wednesday, Apr. 1, 2015
235 Weir Hall

Title: Learning Strategies to Reduce Label Cost and Achieve High Accuracy

Abstract: This presentation examined several strategies for achieving high accuracy with as little labeled data as possible in different settings. Specifically, two areas of semi-supervised learning in machine learning -- active learning and co-training -- were discussed extensively. Multiple strategies for selecting unlabeled points for labeling in active learning were discussed; the most uncertain and most "representative" points should be sent to an oracle for labeling. Co-training is a special form of semi-supervised learning that uses two different views of the data to augment the training set. Co-training algorithms were reviewed theoretically to address why and how co-training works. When the two views are insufficient, large diversity between them ensures that co-training outputs two near-optimal classifiers without suffering from label noise and sampling bias. Finally, ensemble methods were briefly reviewed. Boosted Random Forest is an improved ensemble method that applies the AdaBoost idea to the general Random Forest: training examples are weighted, and each internal decision tree is weighted as well. The details of Boosted Random Forest were also briefly reviewed.

Bio: Zhendong Zhao is a Ph.D. student in the Department of Computer and Information Science at the University of Mississippi, currently working with Dr. Dawn E. Wilkins and Dr. Yixin Chen. He received his master's degree in Computer Science in 2008 and his Ph.D. degree in Chemistry from the University of Mississippi in 2010. He previously worked as a postdoctoral research associate in the Department of Computer and Information Science at the University of Mississippi from July 2010 to August 2013. His research interests include machine learning, data mining, large-scale computing, and web technologies.