Digital Library

Title:      SEQUENTIAL PATTERN MINING WITH APPROXIMATED CONSTRAINTS
Author(s):      Cláudia Antunes, Arlindo L. Oliveira
ISBN:      972-98947-3-6
Editors:      Nuno Guimarães and Pedro Isaías
Year:      2004
Edition:      Single
Keywords:      Data Mining, Pattern Mining, Sequential Pattern Mining, Constraints, Constraint Relaxations, Deterministic Finite Automata.
Type:      Full Paper
First Page:      1131
Last Page:      1138
Language:      English
Paper Abstract:      The lack of focus characteristic of unsupervised pattern mining in sequential data is one of the major limitations of this approach. It stems from the inherently large number of rules likely to be discovered in all but the most trivial sets of sequences. Several authors have promoted the use of constraints to reduce that number, but such constraints bring the mining task close to a hypothesis-testing task. In this paper, we propose the use of constraint approximations to guide the mining process, reducing the number of discovered patterns without compromising the prime goal of data mining: to discover unknown information. We show that existing algorithms that use regular languages as constraints can be used with minor adaptations. We propose a simple algorithm (ε-accepts) that verifies whether a sequence is approximately accepted by a given regular language.
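The abstract does not spell out how ε-accepts works, so the following is only an illustrative sketch of one plausible reading, not the authors' algorithm. It assumes that "approximately accepted" means the sequence lies within ε edit operations (insertion, deletion, substitution) of some string accepted by a deterministic finite automaton. The function name `eps_accepts`, its parameters, and the dictionary encoding of the automaton are all hypothetical.

```python
from collections import deque


def eps_accepts(seq, delta, start, accepting, alphabet, eps):
    """Return True if seq is within eps edits of a string the DFA accepts.

    Assumed encoding (not from the paper): delta maps (state, symbol) to
    the next state; missing entries mean "no transition".
    """
    n = len(seq)
    # best[(i, q)] = fewest edits found so far to consume seq[:i] and sit in q
    best = {(0, start): 0}
    queue = deque([(0, start)])

    def relax(node, cost, zero_weight):
        # Keep a node only if it stays within budget and improves on best.
        if cost > eps or cost >= best.get(node, eps + 1):
            return
        best[node] = cost
        if zero_weight:
            queue.appendleft(node)  # free moves are explored first
        else:
            queue.append(node)

    while queue:
        i, q = queue.popleft()
        cost = best[(i, q)]
        if i == n and q in accepting:
            return True  # every stored node already has cost <= eps
        if i < n:
            relax((i + 1, q), cost + 1, False)  # delete seq[i]
        for a in alphabet:
            nxt = delta.get((q, a))
            if nxt is None:
                continue
            if i < n:
                step = 0 if a == seq[i] else 1  # exact match or substitution
                relax((i + 1, nxt), cost + step, step == 0)
            relax((i, nxt), cost + 1, False)  # insert symbol a
    return False


# Hypothetical constraint: the regular language a b* c
delta = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
print(eps_accepts("abb", delta, 0, {2}, "abc", eps=1))
```

With ε = 0 the check reduces to ordinary acceptance. Larger values of ε relax the constraint, which is the sense in which an approximated constraint can still admit patterns the user did not anticipate.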
   
