Digital Library

Title:      TEXT AUTHORSHIP DETECTION USING DECISION TREES AND ASSOCIATION RULES OVER N-GRAM
Author(s):      Hiroshi Sugimura, Yuta Taniguchi, Ryosuke Saga, Kazunori Matsumoto
ISBN:      978-972-8939-23-6
Editors:      António Palma dos Reis and Ajith P. Abraham
Year:      2010
Edition:      Single
Keywords:      Authorship detection, N-gram, decision tree, association rule
Type:      Poster/Demonstration
First Page:      183
Last Page:      188
Language:      English
Paper Abstract:      This paper presents methods for detecting the authorship of Japanese texts using decision trees and association rules, both constructed over N-grams. Authorship detection for Japanese texts often requires additional grammatical information obtained from morphological analysis, so the performance of the morphological analysis tool affects the overall detection ability. To avoid this dependency, this study uses a set of N-grams, that is, sequences of N letters simply cut out from the texts. In the first part of the study, we investigate decision tree learning over N-grams. Since the number of possible N-grams becomes combinatorially large, we use a forward and backward selection approach to obtain an effective subset of N-grams for which the expected prediction accuracy of the decision tree is optimal. In the latter part, we describe how association rules are used for detection and compare the results of the two approaches.
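Illustration:      The abstract describes N-grams as sequences of N letters cut directly from the text, with no morphological analysis. The following is a minimal sketch of that extraction step only, not the authors' implementation. The function names `char_ngrams` and `ngram_profile` and the sample sentence are hypothetical, chosen for illustration; the decision tree, feature selection, and association rule steps are not shown.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return every contiguous run of n characters in text, in order.

    Hypothetical helper: slices the raw string, so no tokenizer or
    morphological analyzer is involved.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # A text shorter than n yields an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_profile(text: str, n: int) -> Counter:
    """Count how often each character N-gram occurs in text."""
    return Counter(char_ngrams(text, n))


# Assumed sample sentence (7 characters), used only for illustration.
sample = "吾輩は猫である"
bigrams = char_ngrams(sample, 2)
profile = ngram_profile(sample, 2)
```

Counts such as those in `profile` could serve as candidate features for a classifier. Which N-grams to keep is the selection problem the abstract addresses with forward and backward selection.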
   
