Title: SEQUENTIAL PATTERN MINING WITH APPROXIMATED CONSTRAINTS
Author(s): Cláudia Antunes, Arlindo L. Oliveira
ISBN: 972-98947-3-6
Editors: Nuno Guimarães and Pedro Isaías
Year: 2004
Edition: Single
Keywords: Data Mining, Pattern Mining, Sequential Pattern Mining, Constraints, Constraint Relaxations, Deterministic Finite Automata
Type: Full Paper
First Page: 1131
Last Page: 1138
Language: English
Paper Abstract:
The lack of focus that characterizes unsupervised pattern mining in sequential data is one of the major limitations of this approach. It stems from the inherently large number of rules likely to be discovered in all but the most trivial sets of sequences. Several authors have promoted the use of constraints to reduce that number, but those constraints bring the mining task close to a hypothesis-testing task. In this paper, we propose the use of constraint approximations to guide the mining process, reducing the number of discovered patterns without compromising the prime goal of data mining: to discover unknown information. We show that existing algorithms that use regular languages as constraints can be used with minor adaptations. We propose a simple algorithm (ε-accepts) that checks whether a sequence is approximately accepted by a given regular language.
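
The abstract does not spell out how approximate acceptance is defined, so the following is only an illustrative sketch, not the authors' ε-accepts algorithm. It assumes that a sequence counts as approximately accepted when at most ε single-symbol insertions, deletions, or substitutions turn it into a sequence accepted by a deterministic finite automaton. The names `edit_distance_to_dfa` and `eps_accepts`, and the dictionary encoding of the transition function, are hypothetical.

```python
import heapq


def edit_distance_to_dfa(seq, start, accepting, delta):
    """Fewest insertions, deletions and substitutions needed to turn
    `seq` into a sequence accepted by the DFA (assumed definition).

    `delta` maps (state, symbol) -> next state. The search runs
    Dijkstra over pairs (symbols of seq consumed, current DFA state).
    """
    n = len(seq)
    best = {(0, start): 0}
    heap = [(0, 0, start)]
    while heap:
        cost, i, q = heapq.heappop(heap)
        if cost > best.get((i, q), float("inf")):
            continue  # stale heap entry
        if i == n and q in accepting:
            return cost
        moves = []
        if i < n:
            moves.append((cost + 1, i + 1, q))  # delete seq[i]
        for (state, symbol), target in delta.items():
            if state != q:
                continue
            moves.append((cost + 1, i, target))  # insert `symbol`
            if i < n:
                step = 0 if symbol == seq[i] else 1  # match or substitute
                moves.append((cost + step, i + 1, target))
        for c, j, r in moves:
            if c < best.get((j, r), float("inf")):
                best[(j, r)] = c
                heapq.heappush(heap, (c, j, r))
    return float("inf")  # no accepting state is reachable


def eps_accepts(seq, start, accepting, delta, eps):
    """True if `seq` is within `eps` edit operations of the DFA's language."""
    return edit_distance_to_dfa(seq, start, accepting, delta) <= eps


# Hypothetical constraint: the regular language a b* c.
DELTA = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
print(eps_accepts("abdc", 0, {2}, DELTA, 1))
print(eps_accepts("abdc", 0, {2}, DELTA, 0))
```

With ε = 0 the check reduces to ordinary acceptance by the automaton. Larger values of ε relax the constraint, so that patterns deviating slightly from the expected language are still reported.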