Exploiting similarities in writing styles to predict authorship

3/20 Department Seminar 

Thamar Solorio

Assistant Professor at the University of Alabama at Birmingham

3:00 p.m., March 20, 2013
235 Weir Hall

Title: Exploiting similarities in writing styles to predict authorship

Abstract:

Researchers in Authorship Identification (AI) assume the existence of individualized and identifiable writing styles. The success of previous work empirically supports this assumption, at least on the collections that have been used to test these approaches. However, when looking at writing samples from different authors, it is clear that many similarities exist among them. For instance, when analyzing web forum data, we can see that several authors share emoticon patterns; even the absence of emoticons is in itself a pattern shared by many authors. Similarly, other authors use punctuation marks in similar ways (some like to use more than one exclamation point to highlight emotion) or tend to write in long sentences. In this talk, I will present our research on exploiting similarities in writing styles to boost prediction accuracy. The idea is to extract these similarities across authors in a modality-specific way, where by modality I mean the different linguistic dimensions (syntactic, lexical, stylistic). These similarities are extracted by independently clustering, in an unsupervised way, the training instances using the subset of features from each modality. I will present experimental results on different corpora that show our method is promising.
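
For readers who want a concrete picture of the per-modality clustering step, here is a minimal sketch. It is not the speaker's implementation. It assumes plain k-means as the unsupervised clusterer, a hypothetical feature layout (MODALITIES maps each modality to column indices), and one hypothetical way of using the result: appending one-hot cluster memberships to each training instance before a classifier is trained. The abstract states none of these details.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (an assumed choice of clusterer); returns one cluster id per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical layout: which feature columns belong to which modality.
MODALITIES = {"lexical": [0, 1], "syntactic": [2, 3], "stylistic": [4, 5]}


def cluster_per_modality(instances, k=2):
    """Cluster the training instances once per modality, using only that modality's features."""
    clusters = {}
    for name, cols in MODALITIES.items():
        subset = [[row[c] for c in cols] for row in instances]
        clusters[name] = kmeans(subset, k)
    return clusters


def augment(instances, clusters, k=2):
    """Append one-hot cluster memberships per modality (an assumed use of the clusters)."""
    augmented = []
    for i, row in enumerate(instances):
        extra = []
        for name in MODALITIES:
            extra += [1.0 if clusters[name][i] == c else 0.0 for c in range(k)]
        augmented.append(list(row) + extra)
    return augmented
```

Because each modality is clustered independently, two authors' samples can fall in the same cluster for one modality (for example, similar punctuation habits) and in different clusters for another (for example, different vocabulary).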

Bio:

Thamar Solorio is an assistant professor at the University of Alabama at Birmingham (UAB). Before joining UAB, she was a research associate at the University of Texas at Dallas. She obtained her PhD in Computer Science from the National Institute of Astrophysics, Optics and Electronics in 2005. Her current research is funded by the National Science Foundation and the Office of Naval Research. Her present research interests include analysis of language samples for clinical purposes, syntactic analysis of bilingual discourse, and, more recently, authorship analysis on social media.