Machine Learning in Bioinformatics

10/16 Student Conference Presentation Rehearsal Seminar

Part I: Classification and Feature Selection Using Top Discriminating Pairs with Information Gain on Microarray Data

by Tina Gui, a Ph.D. student in the Department of Computer and Information Science at the University of Mississippi, who is currently working with Dr. Dawn E. Wilkins and Dr. Yixin Chen. Tina received her Bachelor of Science in Computer Science from California State University. Her research interests include machine learning, data mining, and bioinformatics.


Abstract: In microarray analysis, gene expression classification and feature selection are commonly used techniques for diagnosing diseases. Current classification techniques have limitations when applied to gene expression microarray data. Our new approach, the Top Discriminating Pairs (TDP) classifier, is motivated by these limitations and aims to improve class prediction accuracy on microarray data. To illustrate the effectiveness of TDP, we combined the TDP methodology with information gain (IG) to achieve a more effective feature selector.
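
As a rough illustration of the information gain half of this combination, the sketch below ranks genes by the best information gain achievable with a single expression threshold. It is not the TDP classifier, which the abstract does not specify; the gene names, expression values, class labels, and function names are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Information gain from splitting the samples at `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty tells us nothing
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def best_information_gain(values, labels):
    """Best gain over thresholds placed midway between sorted distinct values."""
    distinct = sorted(set(values))
    thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max((information_gain(values, labels, t) for t in thresholds), default=0.0)


# Hypothetical toy data: expression levels of three genes across six samples.
expression = {
    "gene_a": [0.1, 0.2, 0.3, 0.9, 1.0, 1.1],  # separates the two classes cleanly
    "gene_b": [0.5, 0.9, 0.1, 0.6, 0.2, 0.8],  # only weakly related to the class
    "gene_c": [0.4, 0.4, 0.4, 0.4, 0.4, 0.4],  # constant, so it carries no information
}
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

# Rank genes from most to least informative.
ranking = sorted(
    expression,
    key=lambda gene: best_information_gain(expression[gene], labels),
    reverse=True,
)
print(ranking)
```

A feature selector built this way would keep only the top-ranked genes; how IG scores are combined with discriminating pairs in the presented work is not described here.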


Part II: Rule Based Regression and Feature Selection for Biological Data

by Sheng Liu, who received his Bachelor of Science in Biochemistry from Wuhan University and his Master of Science in Computer Science from the University of Mississippi. He is now a Ph.D. student in Computer Science in the Department of Computer and Information Science at the University of Mississippi. His research interests include machine learning and bioinformatics/computational biology.


Abstract: Regression is widely used in biological problems involving continuous outcomes. Methods for building regression models range from linear models to more complex nonlinear ones. In many practical applications the relations are nonlinear and can be modeled effectively by nonlinear regression techniques. However, many models built with nonlinear techniques are difficult to interpret, and interpretability is crucial in many biological problems. We propose a rule-based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from the generated random forests and eliminates unimportant features, and hence provides a simple interpretation. We tested the approach on several datasets. It constructs a significantly smaller set of regression rules using a subset of the attributes while achieving prediction performance comparable to that of conventional random forest regression. It shows high potential, in terms of both prediction performance and ease of interpretation, for studying nonlinear relationships.
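
The idea of selecting rules with a 1-norm penalty can be sketched as follows. This is a minimal illustration under stated assumptions, not the presented algorithm: the three candidate rules are written by hand as stand-ins for rules that would be extracted from a random forest, the solver is a plain coordinate-descent lasso, and the data, rule set, penalty value, and function names are all hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; exactly zero inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(columns, targets, penalty, sweeps=200):
    """Minimize (1/2n)*||y - b - sum_j w_j*col_j||^2 + penalty*sum_j |w_j|."""
    n = len(targets)
    weights = [0.0] * len(columns)
    intercept = sum(targets) / n
    residual = [y - intercept for y in targets]
    for _ in range(sweeps):
        # The intercept is not penalized: move it to the mean of the residual.
        shift = sum(residual) / n
        intercept += shift
        residual = [r - shift for r in residual]
        for j, col in enumerate(columns):
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # a rule that never fires cannot be fitted
            # Correlation of column j with the residual that excludes its own term.
            rho = sum(c * (r + weights[j] * c) for c, r in zip(col, residual)) / n
            new_weight = soft_threshold(rho, penalty) / norm
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                weights[j] = new_weight
    return intercept, weights


# Hypothetical data: the outcome depends on feature 0 only.
samples = [(0.1, 0.9), (0.2, 0.1), (0.3, 0.8), (0.4, 0.2),
           (0.6, 0.7), (0.7, 0.3), (0.8, 0.6), (0.9, 0.4)]
targets = [0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0]

# Hand-written candidate rules: (description, feature index, condition).
rules = [
    ("x0 > 0.5", 0, lambda x: x[0] > 0.5),
    ("x1 > 0.5", 1, lambda x: x[1] > 0.5),
    ("x0 > 0.75", 0, lambda x: x[0] > 0.75),
]

# Each rule becomes a 0/1 column: does the rule fire on the sample?
columns = [[1.0 if cond(x) else 0.0 for x in samples] for _, _, cond in rules]
intercept, weights = lasso_coordinate_descent(columns, targets, penalty=0.1)

# Rules with nonzero weight survive; features used by no surviving rule are dropped.
selected_rules = [name for (name, _, _), w in zip(rules, weights) if abs(w) > 1e-9]
used_features = sorted({f for (_, f, _), w in zip(rules, weights) if abs(w) > 1e-9})
print(selected_rules, used_features)
```

On this toy data the penalty keeps only the rule that explains the outcome, and the second feature, which appears in no surviving rule, drops out. That mirrors, in a very reduced form, the joint rule extraction and feature elimination described in the abstract.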