Title:      IDENTIFYING AUTHORS OF TEXT BASED ON ASSOCIATION RULES
Author(s):      Hiroshi Sugimura, Ryosuke Saga, Kazunori Matsumoto
ISBN:      978-972-8939-47-2
Editors:      Miguel Baptista Nunes, Pedro Isaías and Philip Powell
Year:      2011
Edition:      Single
Keywords:      Identifying authors, Authorship detection, Classification, Association rule, N-gram.
Type:      Poster/Demonstration
First Page:      355
Last Page:      358
Language:      English
Paper Abstract:      This paper proposes a system that identifies the author of a text based on association rules. In European languages, features such as the distribution of words, the lengths of words and sentences, and punctuation patterns are effective for identifying authors. This approach cannot be applied to many Asian languages because their text has no explicit word boundaries, and preliminary processing of the text, such as morphological analysis, may influence the final results. We therefore propose a language-independent method for identifying authors based on association rules of N-grams. The distribution of N-grams depends heavily on genre, domain, and writing period, so the distributions alone may not reveal features that characterize an author. We instead focus on combinations of N-grams and extract association rules from them. An author's feature vector consists of the occurrence probabilities of these association rules. To identify an author, the system measures the dissimilarity between two feature vectors.
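
Illustrative sketch (not from the paper): the abstract does not state the N-gram unit, the rule-mining algorithm, the probability used for each rule, or the dissimilarity measure. The Python code below therefore makes its own assumptions: character N-grams, one sentence per transaction, pairwise rules filtered by a minimum support, rule confidence as the occurrence probability, and Euclidean distance as the dissimilarity. All function names and parameters are hypothetical.

```python
from itertools import combinations
import math


def char_ngrams(text, n=2):
    """Return the set of character n-grams in text (assumed N-gram unit)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def mine_rules(sentences, n=2, min_support=0.5):
    """Return {(a, b): confidence} for pairwise rules a -> b.

    Assumption: each sentence is one transaction, and a rule is kept
    when the pair {a, b} occurs in at least min_support of the sentences.
    """
    transactions = [char_ngrams(s, n) for s in sentences]
    total = len(transactions)
    item_count = {}
    pair_count = {}
    for t in transactions:
        for g in t:
            item_count[g] = item_count.get(g, 0) + 1
        for a, b in combinations(sorted(t), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = {}
    for (a, b), count in pair_count.items():
        if count / total >= min_support:
            rules[(a, b)] = count / item_count[a]  # confidence of a -> b
            rules[(b, a)] = count / item_count[b]  # confidence of b -> a
    return rules


def dissimilarity(rules_x, rules_y):
    """Euclidean distance between two rule-confidence vectors.

    Assumption: a rule missing from one author's vector counts as 0.
    """
    keys = set(rules_x) | set(rules_y)
    return math.sqrt(
        sum((rules_x.get(k, 0.0) - rules_y.get(k, 0.0)) ** 2 for k in keys)
    )


if __name__ == "__main__":
    author_a = mine_rules(["abc", "abc", "abd"])
    author_b = mine_rules(["xyz", "xyz"])
    print(dissimilarity(author_a, author_b))
```

Under these assumptions, an unknown text would be attributed to the candidate author whose feature vector has the smallest dissimilarity to the vector mined from that text.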
   
