Title:
|
TEXT AUTHORSHIP DETECTION USING DECISION TREES AND ASSOCIATION RULES OVER N-GRAM |
Author(s):
|
Hiroshi Sugimura, Yuta Taniguchi, Ryosuke Saga, Kazunori Matsumoto |
ISBN:
|
978-972-8939-23-6 |
Editors:
|
António Palma dos Reis and Ajith P. Abraham |
Year:
|
2010 |
Edition:
|
Single |
Keywords:
|
Authorship detection, N-gram, decision tree, association rule |
Type:
|
Poster/Demonstration |
First Page:
|
183 |
Last Page:
|
188 |
Language:
|
English |
Paper Abstract:
|
This paper presents methods for detecting the authorship of Japanese texts using decision trees and association rules, both constructed over N-grams. Authorship detection for Japanese texts often requires additional grammatical information obtained through morphological analysis, so the performance of the morphological analysis tool affects the overall detection ability. To avoid this dependency, this study uses N-grams, which are sequences of N letters simply cut out from the texts. In the first part of the study, we investigate decision tree learning over N-grams. Because the number of possible N-grams grows combinatorially large, we use a forward and backward selection approach to obtain an effective subset of N-grams that optimizes the expected prediction accuracy of the decision tree. In the latter part, we describe how association rules are used for detection and compare the results of the two approaches. |
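The two ideas named in the abstract, cutting letter N-grams directly out of the text and greedily selecting a useful subset of them, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `char_ngrams` and `forward_select` and the `score` callback are assumptions introduced here; in the paper's setting `score` would presumably estimate the prediction accuracy of a decision tree trained on the chosen N-grams. Only the forward half of the selection is shown.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count every run of n consecutive characters in text.

    No morphological analysis is involved: the text is sliced as-is.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def forward_select(candidates, score, max_features):
    """Greedy forward selection over candidate N-grams (hypothetical helper).

    `score` is an assumed callback that maps a list of N-grams to a number,
    for example a cross-validated accuracy estimate; higher is better.
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        # Try adding each remaining N-gram and keep the best-scoring one.
        trials = [(score(selected + [g]), g) for g in remaining]
        top_score, top_gram = max(trials, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no candidate improves the score, so stop
        selected.append(top_gram)
        remaining.remove(top_gram)
        best = top_score
    return selected
```

A backward pass would work the same way in reverse, dropping one selected N-gram at a time whenever its removal does not lower the score.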