Digital Library

Title:      EXPLOITING DOCUMENT STABILITY LEVEL FOR EFFICIENT TEMPORALLY-AWARE CLASSIFICATION
Author(s):      Thiago Salles, Isac Sandin, Luiz Carlos Oliveira, Leonardo Rocha, Marcos André Gonçalves
ISBN:      978-989-8533-01-2
Editors:      Bebo White, Pedro Isaías and Flávia Maria Santoro
Year:      2011
Edition:      Single
Keywords:      Automatic document classification, temporal effects, document stability
Type:      Full Paper
First Page:      309
Last Page:      316
Language:      English
Paper Abstract:      One major challenge for Automatic Document Classification (ADC) algorithms is that the characteristics of the documents and the classes to which they belong may change over time (a.k.a. temporal effects). Temporally-aware algorithms for ADC have recently been proposed to deal with these issues. Although these algorithms have been shown to be very effective, they have the major side effect of being naturally lazy, since they need to know the creation time of the test documents in order to properly learn the (time-based) classification model. Unlike eager algorithms, which induce a global classification model in an off-line setting, lazy algorithms postpone learning the classification model until the classifier receives a test document, incurring a potentially high classification runtime in the test phase. We propose a strategy to reduce this classification runtime for real-world ADC problems while keeping the classification task robust to temporal effects. Our proposal is based on the fact that some characteristics of the training data remain stable over time, although other characteristics may change as time goes by. We exploit this property by defining a metric called Document Stability Level (DSL) that captures the extent of the observed variations of the documents' content with respect to the classes. We then use the DSL to select a set of stable documents to be processed by a traditional classifier without being compromised by the temporal effects. The remaining (unstable) documents are left to be processed by the temporally-aware classifiers. As our experimental evaluation shows, despite the simplicity of our solution, it can maintain high classification effectiveness while reducing the test phase runtime of the temporally-aware classifiers by up to 55%.
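
The routing idea in the abstract (stable documents go to a traditional eager classifier, unstable ones to a temporally-aware classifier) can be illustrated with a minimal Python sketch. The abstract does not give the DSL formula, so the score below is a hypothetical stand-in, not the authors' metric: a term is treated as stable if its dominant class is the same in every year it occurs in the training data, and a document's score is the fraction of its distinct terms that are stable. The names `stable_terms`, `stability_score`, `route`, the threshold of 0.5, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def stable_terms(training):
    """Return terms whose dominant class is identical in every year they occur.

    `training` is an iterable of (terms, label, year) triples.
    This is a hypothetical proxy for stability, not the paper's DSL.
    """
    # term -> year -> Counter of class labels
    counts = defaultdict(lambda: defaultdict(Counter))
    for terms, label, year in training:
        for term in set(terms):
            counts[term][year][label] += 1

    stable = set()
    for term, by_year in counts.items():
        dominant = {c.most_common(1)[0][0] for c in by_year.values()}
        if len(dominant) == 1:  # same dominant class in all years
            stable.add(term)
    return stable


def stability_score(terms, stable):
    """Fraction of a document's distinct terms that are stable (assumed score)."""
    distinct = set(terms)
    if not distinct:
        return 0.0
    return len(distinct & stable) / len(distinct)


def route(terms, stable, threshold=0.5):
    """Send stable documents to the eager model, the rest to the lazy one."""
    if stability_score(terms, stable) >= threshold:
        return "eager"
    return "temporal"


# Toy training set: "cloud" changes its dominant class between years.
training = [
    (["goal", "match"], "sports", 2009),
    (["goal", "match"], "sports", 2010),
    (["cloud", "rain"], "weather", 2009),
    (["cloud", "server"], "computing", 2010),
]
stable = stable_terms(training)
print(route(["goal", "match"], stable))
print(route(["cloud"], stable))
```

In this sketch only the documents routed to "temporal" would pay the test-time cost of a lazy, time-based model; the threshold controls how many documents take the cheaper eager path.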
   
