Title:
|
EFFICIENT WAY OF SIMULATING PRINCIPAL COMPONENT ANALYSIS FOR SPECIES IDENTIFICATION |
Author(s):
|
Fahim Md., Aggarwal Ruchi |
ISBN:
|
978-972-8924-40-9 |
Editors:
|
Jörg Roth, Jairo Gutiérrez and Ajith P. Abraham (series editors: Piet Kommers, Pedro Isaías and Nian-Shing Chen) |
Year:
|
2007 |
Edition:
|
Single |
Keywords:
|
Genome sequence, Principal Component Analysis, Feature Descriptor Diagram, Tanimoto Distance. |
Type:
|
Short Paper |
First Page:
|
155 |
Last Page:
|
159 |
Language:
|
English |
Paper Abstract:
|
Principal Component Analysis (PCA) is a statistical and data mining technique generally used to reduce the
dimensionality of data [5]. It can also be used to identify species from their genomic databases: it calculates
the frequency of search keys (generated from all possible combinations of the four letters A, G, T, and C) in the
genomic database and forms a feature vector for every species, which becomes an identity for that species. These
vectors can then be compared to identify organisms from their genomic databases. Like other techniques, PCA
suffers from limitations, for example finding the optimum length of the search keys: there are four bases
(A, G, T, C), and search keys can be one, two, three, or four letters long [15]. In this paper, Principal
Component Analysis is simulated efficiently in software in order to remove some of its limitations and to find
the optimum search-key length that should be used to obtain efficient and accurate results when PCA is applied
to species identification from genomic databases. |
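
The feature-vector step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the normalization to relative frequencies, the overlapping-window counting, and the use of the Tanimoto distance (taken only from the keyword list) for comparison are all assumptions, and the PCA projection itself is omitted.

```python
from itertools import product

BASES = "AGTC"  # the four letters named in the abstract


def search_keys(k):
    """All 4**k search keys of length k over the four bases (hypothetical helper)."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def feature_vector(sequence, k):
    """Relative frequency of every length-k search key in a sequence.

    Assumption: overlapping windows are counted, and windows containing
    characters other than A, G, T, C are skipped.
    """
    keys = search_keys(k)
    counts = dict.fromkeys(keys, 0)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    return [counts[key] / total if total else 0.0 for key in keys]


def tanimoto_distance(u, v):
    """1 minus the Tanimoto coefficient of two real-valued vectors.

    Assumption: the abstract does not say how vectors are compared;
    this measure is suggested only by the paper's keywords.
    """
    dot = sum(a * b for a, b in zip(u, v))
    denom = sum(a * a for a in u) + sum(b * b for b in v) - dot
    if denom == 0:
        return 0.0
    return 1.0 - dot / denom


if __name__ == "__main__":
    # Made-up toy sequences, not real genomic data.
    a = feature_vector("AGTCAGTCAGGA", 2)
    b = feature_vector("TTTCCCAAAGGG", 2)
    print(len(a), tanimoto_distance(a, b))
```

Varying `k` from one to four changes the vector length from 4 to 256 entries, which is the search-key-length trade-off the abstract refers to.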