Title:
|
IDENTIFYING AUTHORS OF TEXT BASED ON ASSOCIATION RULES |
Author(s):
|
Hiroshi Sugimura, Ryosuke Saga, Kazunori Matsumoto |
ISBN:
|
978-972-8939-47-2 |
Editors:
|
Miguel Baptista Nunes, Pedro Isaías and Philip Powell |
Year:
|
2011 |
Edition:
|
Single |
Keywords:
|
Identifying authors, Authorship detection, Classification, Association rule, N-gram. |
Type:
|
Poster/Demonstration |
First Page:
|
355 |
Last Page:
|
358 |
Language:
|
English |
Paper Abstract:
|
This paper proposes a system that identifies the author of a text based on association rules. For European languages, features such as word distribution, word and sentence length, and punctuation patterns are effective for identifying authors. This approach cannot be applied directly to many Asian languages because their text has no explicit word boundaries, and preliminary processing such as morphological analysis may influence the final results. Therefore, we propose a language-independent method for identifying authors based on association rules of N-grams. The distribution of N-grams depends heavily on genre, domain, and writing period, so the distributions alone may not reveal features that characterize an author. We instead focus on combinations of N-grams and extract association rules from them. An author's feature vector consists of the occurrence probabilities of these association rules. For author identification, the system measures the dissimilarity between two feature vectors. |
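
The pipeline outlined in the abstract (N-grams, association rules, feature vector, dissimilarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the transaction unit, the rule measure, or the dissimilarity function. The sketch assumes character N-grams, one sentence per transaction (split naively on "."), rule confidence P(b | a) as the "probability of occurrence", and mean absolute difference as the dissimilarity. All function names and the `min_support` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def char_ngrams(text, n=2):
    """Return the set of character n-grams in `text` (no word segmentation needed)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def rule_features(text, n=2, min_support=2):
    """Map each rule (a -> b) to its confidence P(b | a) over sentence transactions."""
    # Assumption: each sentence is one transaction; splitting on "." is a simplification.
    transactions = [char_ngrams(s.strip(), n) for s in text.split(".") if s.strip()]
    item_counts = Counter()
    pair_counts = Counter()
    for grams in transactions:
        item_counts.update(grams)
        pair_counts.update(combinations(sorted(grams), 2))
    features = {}
    for (a, b), together in pair_counts.items():
        if together < min_support:
            continue  # drop rules seen in too few transactions
        features[(a, b)] = together / item_counts[a]
        features[(b, a)] = together / item_counts[b]
    return features


def dissimilarity(f1, f2):
    """Mean absolute difference of rule probabilities over the union of rules."""
    # Assumption: a rule missing from one profile counts as probability 0.
    rules = set(f1) | set(f2)
    if not rules:
        return 0.0
    return sum(abs(f1.get(r, 0.0) - f2.get(r, 0.0)) for r in rules) / len(rules)


def identify_author(unknown_text, known_texts, n=2, min_support=2):
    """Return the author whose rule profile is least dissimilar to the unknown text."""
    target = rule_features(unknown_text, n, min_support)
    profiles = {a: rule_features(t, n, min_support) for a, t in known_texts.items()}
    return min(profiles, key=lambda a: dissimilarity(target, profiles[a]))
```

Because the N-grams are taken over raw characters, the sketch needs no morphological analysis, which matches the language-independent motivation stated in the abstract. Enumerating all N-gram pairs per sentence grows quadratically, so a real system would likely use a dedicated rule-mining algorithm and stricter support thresholds.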