Digital Library

Title:      MINING FREQUENT INDUCED SUBTREES WITH APPLICATION TO PHYLOGENETIC TREES
Author(s):      Marwa M. A. Hussein, Taysir H. A. Soliman, Omar H. Karam
ISBN:      978-972-8924-40-9
Editors:      Jörg Roth, Jairo Gutiérrez and Ajith P. Abraham (series editors: Piet Kommers, Pedro Isaías and Nian-Shing Chen)
Year:      2007
Edition:      Single
Keywords:      Tree mining, frequent pattern mining, induced subtree, guided pattern growth, rooted ordered trees, phylogenetic trees.
Type:      Full Paper
First Page:      102
Last Page:      110
Language:      English
Paper Abstract:      Frequent tree mining is useful in many domains that employ tree structures, e.g. bioinformatics, text mining, and web mining. Adapting traditional frequent pattern mining techniques to tree structures has required overcoming many challenges. Previous studies showed that guided pattern growth methods are more efficient than unguided candidate generation methods. In the current work, an efficient pattern growth algorithm, Guided Induced Pattern Growth (GIP-Growth), is introduced for discovering frequent induced subtrees in a collection of labeled, rooted, ordered trees. GIP-Growth is based on the frequent pattern growth methodology and uses a model of the input trees as a guide to generate candidates. All frequent subtrees are discovered efficiently, without duplicates and without generating invalid candidates. To study its performance, we compared GIP-Growth against the FREQT algorithm using both synthetic and biological datasets. In both cases, GIP-Growth outperforms FREQT by an order of magnitude. Experiments show that GIP-Growth finds all frequent subtrees while generating fewer candidates. For the biological datasets, we used 124 phylogenetic trees from the TreeBASE dataset, inferred by 43 studies conducted on the yeast Saccharomyces cerevisiae. These trees include 1754 different species (represented as node labels). Experiments on the phylogenetic trees at different support values showed that GIP-Growth is more efficient than FREQT, which demonstrates the efficiency and effectiveness of the proposed algorithm and of the applied guided pattern growth technique.
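
To make the terms in the abstract concrete, the following minimal Python sketch checks whether a pattern is an induced subtree of a labeled, rooted, ordered tree and computes its support over a small collection. It is not the GIP-Growth or FREQT algorithm; it only illustrates the matching problem both address. The tuple representation, the function names, the toy labels, and the definition of support as the fraction of input trees containing the pattern are all assumptions made for illustration, and the paper's own definitions may differ.

```python
from typing import List, Tuple

# Assumed representation: a tree is (label, [child trees]) with children in left-to-right order.
Tree = Tuple[str, list]


def match_children(p_children: List[Tree], n_children: List[Tree]) -> bool:
    """Map pattern children onto an order-preserving subsequence of node children."""
    if not p_children:
        return True
    first, rest = p_children[0], p_children[1:]
    for i, child in enumerate(n_children):
        # Later pattern children may only use siblings to the right of this match.
        if matches_at(first, child) and match_children(rest, n_children[i + 1:]):
            return True
    return False


def matches_at(pattern: Tree, node: Tree) -> bool:
    """True if the pattern occurs with its root mapped onto this node.

    Induced matching: labels, parent-child edges, and sibling order are preserved.
    """
    p_label, p_children = pattern
    n_label, n_children = node
    return p_label == n_label and match_children(p_children, n_children)


def occurs_in(pattern: Tree, tree: Tree) -> bool:
    """True if the pattern occurs at the root or at any descendant of the tree."""
    return matches_at(pattern, tree) or any(occurs_in(pattern, c) for c in tree[1])


def support(pattern: Tree, forest: List[Tree]) -> float:
    """Fraction of trees containing at least one occurrence (assumed definition)."""
    return sum(occurs_in(pattern, t) for t in forest) / len(forest)


# Hypothetical toy collection; the labels stand in for species names.
forest = [
    ("A", [("B", []), ("C", [("D", [])])]),
    ("A", [("C", [("D", [])]), ("B", [])]),        # B and C in the opposite order
    ("X", [("A", [("B", []), ("E", []), ("C", [])])]),
]
ordered_pattern = ("A", [("B", []), ("C", [])])
skipping_pattern = ("A", [("D", [])])              # D is a grandchild of A, never a child
```

With these toy trees, `support(ordered_pattern, forest)` counts the first and third trees but not the second, where the sibling order is reversed, and `support(skipping_pattern, forest)` is zero because an induced subtree cannot skip the intermediate node `C`. A mining algorithm would then keep only the patterns whose support meets a chosen minimum threshold.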