Title:      EFFICIENT WAY OF SIMULATING PRINCIPAL COMPONENT ANALYSIS FOR SPECIES IDENTIFICATION
Author(s):      Fahim Md., Aggarwal Ruchi
ISBN:      978-972-8924-40-9
Editors:      Jörg Roth, Jairo Gutiérrez and Ajith P. Abraham (series editors: Piet Kommers, Pedro Isaías and Nian-Shing Chen)
Year:      2007
Edition:      Single
Keywords:      Genome sequence, Principal Component Analysis, Feature Descriptor Diagram, Tanimoto Distance.
Type:      Short Paper
First Page:      155
Last Page:      159
Language:      English
Paper Abstract:      Principal Component Analysis (PCA) is a statistical and data mining technique generally used to reduce the dimensionality of the data involved in various techniques [5]. It can also be used to identify species from their genomic databases. In this setting, it calculates the frequency of each search key in the genomic database, where the keys are generated from all possible combinations of the four bases (A, G, T, and C). It then forms a feature vector for every species, which serves as an identity for that species. These vectors can then be compared to identify organisms from their genomic databases. Like other techniques, PCA has limitations. One example is finding the optimum length of the search keys: because there are four bases (A, G, T, C), search keys can be one, two, three, or four letters long [15]. In this paper, Principal Component Analysis is simulated efficiently in a program in order to remove some of these limitations and to find the optimum search-key length for efficient and accurate results when PCA is applied to species identification from genomic databases.
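The feature-vector step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' program. It rests on several assumptions: keys are counted in overlapping windows, counts are normalised to relative frequencies, and vectors are compared with the real-valued Tanimoto distance named in the keywords. The function names and the toy sequences are hypothetical. The sketch builds only the per-species frequency vectors that PCA would start from and does not perform the PCA projection itself. Note that a vector has 4**k entries, so the choice of key length k (1 to 4) directly sets the dimensionality.

```python
from itertools import product
from typing import Dict, List

BASES = "AGTC"  # the four bases, in the order used to enumerate keys


def search_keys(k: int) -> List[str]:
    """Return all 4**k search keys of length k over A, G, T, C (hypothetical helper)."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def feature_vector(sequence: str, k: int) -> List[float]:
    """Relative frequency of every length-k key in the sequence.

    Assumption: overlapping windows are counted, and windows containing any
    symbol other than A, G, T, C (e.g. N) are skipped.
    """
    seq = sequence.upper()
    keys = search_keys(k)
    counts: Dict[str, int] = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(keys)
    return [counts[key] / total for key in keys]


def tanimoto_distance(a: List[float], b: List[float]) -> float:
    """1 - a.b / (|a|^2 + |b|^2 - a.b), the real-valued Tanimoto distance."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:  # only when both vectors are all zeros
        return 0.0
    return 1.0 - dot / denom


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    species_a = "AGCTTAGCTAGGCTAACGTTAGC"
    species_b = "AGCTTAGCTAGCCTAACGATAGC"
    for k in range(1, 5):
        d = tanimoto_distance(feature_vector(species_a, k), feature_vector(species_b, k))
        print(f"k={k}: {4 ** k} keys, Tanimoto distance {d:.4f}")
```

Running the loop over k = 1 to 4 mirrors the question the paper raises: shorter keys give compact vectors that may fail to separate species, while longer keys give sparser, higher-dimensional vectors.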