Title:
|
MINING FREQUENT INDUCED SUBTREES WITH APPLICATION TO PHYLOGENETIC TREES |
Author(s):
|
Marwa M. A. Hussein, Taysir H. A. Soliman, Omar H. Karam |
ISBN:
|
978-972-8924-40-9 |
Editors:
|
Jörg Roth, Jairo Gutiérrez and Ajith P. Abraham (series editors: Piet Kommers, Pedro Isaías and Nian-Shing Chen) |
Year:
|
2007 |
Edition:
|
Single |
Keywords:
|
Tree mining, frequent pattern mining, induced subtree, guided pattern growth, rooted ordered trees, phylogenetic trees. |
Type:
|
Full Paper |
First Page:
|
102 |
Last Page:
|
110 |
Language:
|
English |
Paper Abstract:
|
Frequent tree mining is useful in many domains that work with tree-structured data, e.g. bioinformatics, text mining, and web mining.
Many challenges have been tackled in adapting traditional frequent pattern mining techniques to tree structures.
Previous studies showed that guided pattern growth methods are more efficient than unguided candidate generation
methods. In the current work, an efficient pattern growth algorithm, Guided Induced Pattern Growth (GIP-Growth), is
introduced for discovering frequent induced subtrees in a collection of labeled, rooted, ordered trees. GIP-Growth
is based on the frequent pattern growth methodology and uses the input tree model as a guide to generate candidates. All
frequent subtrees are discovered without duplication and without generating invalid candidates. To study the
performance of the proposed algorithm, we compared GIP-Growth against the FREQT algorithm
using both synthetic and biological datasets. In both cases, GIP-Growth outperforms FREQT by an order of magnitude.
Experiments show that GIP-Growth finds all frequent subtrees while generating fewer candidates. For the
biological datasets, we used 124 phylogenetic trees from the TreeBASE dataset, inferred by 43 studies conducted on the
yeast Saccharomyces cerevisiae. These trees include 1754 different species (represented as node labels). Experiments on the
phylogenetic trees at different support values showed that GIP-Growth is more efficient than FREQT, which demonstrates the
efficiency and effectiveness of the proposed algorithm and of the applied guided pattern growth technique. |
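To make the terms "induced subtree" and "support" in the abstract concrete, the following is a minimal sketch of a naive occurrence check on labeled, rooted, ordered trees. It is not GIP-Growth or FREQT, and it generates no candidates; it only tests whether one given pattern occurs in each tree. The tuple representation and the names `matches_at`, `occurs_in`, and `support` are assumptions made for illustration, as is the definition used here: an induced occurrence preserves labels, parent-child edges, and the left-to-right order of siblings, while allowing siblings to be skipped.

```python
# Illustrative sketch only: a brute-force induced-subtree check,
# not the GIP-Growth algorithm described in the abstract.
# A tree is a tuple (label, [child trees in left-to-right order]).

def matches_at(pattern, node):
    """Return True if pattern occurs with its root mapped onto node."""
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    # Match the pattern's children, in order, against a subsequence of the
    # node's children. Taking the earliest matching child is sufficient
    # because each child match is independent of the others.
    i = 0
    for child in n_children:
        if i < len(p_children) and matches_at(p_children[i], child):
            i += 1
    return i == len(p_children)

def occurs_in(pattern, tree):
    """Return True if pattern occurs rooted at any node of tree."""
    if matches_at(pattern, tree):
        return True
    return any(occurs_in(pattern, child) for child in tree[1])

def support(pattern, forest):
    """Fraction of trees in forest that contain pattern at least once."""
    return sum(occurs_in(pattern, t) for t in forest) / len(forest)

# Hypothetical toy data (labels stand in for species names).
t1 = ("A", [("B", []), ("C", [])])
t2 = ("A", [("B", []), ("D", []), ("C", [])])
t3 = ("A", [("C", []), ("B", [])])
pattern = ("A", [("B", []), ("C", [])])

print(support(pattern, [t1, t2, t3]))
```

In this toy forest the pattern occurs in `t1` and `t2` but not in `t3`, where the sibling order is reversed, so its support is 2/3. A pattern would be called frequent if this value reached a user-chosen minimum support threshold.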