EFFICIENT WAY OF SIMULATING PRINCIPAL COMPONENT ANALYSIS FOR SPECIES IDENTIFICATION

Fahim Md.; Aggarwal Ruchi

IADIS European Conference Data Mining 2007 (part of MCCSIS 2007)

Document Info

Title:	EFFICIENT WAY OF SIMULATING PRINCIPAL COMPONENT ANALYSIS FOR SPECIES IDENTIFICATION
Author(s):	Fahim Md., Aggarwal Ruchi
ISBN:	978-972-8924-40-9
Editors:	Jörg Roth, Jairo Gutiérrez and Ajith P. Abraham (series editors: Piet Kommers, Pedro Isaías and Nian-Shing Chen)
Year:	2007
Edition:	Single
Keywords:	Genome sequence, Principal Component Analysis, Feature Descriptor Diagram, Tanimoto Distance.
Type:	Short Paper
First Page:	155
Last Page:	159
Language:	English
Paper Abstract:	Principal Component Analysis (PCA) is a statistical and data mining technique generally used to reduce the dimensionality of the data involved in various techniques [5]. It can also be used to identify species from their genomic databases: it calculates the frequency of search keys (generated from all possible combinations of the four letters A, G, T, and C) in the genomic database and forms a feature vector for every species, which serves as an identity for that species. These vectors can then be compared to identify organisms from their genomic databases. Like other techniques, PCA suffers from various limitations, for example finding the optimum length of the search keys: with four bases (A, G, T, C), search keys can be one, two, three, or four letters long [15]. In this paper, Principal Component Analysis is simulated efficiently in a program in order to remove some of its limitations and to find the optimum search-key length that gives efficient and accurate results when PCA is applied to species identification from genomic databases.
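The feature-vector step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`search_keys`, `feature_vector`, `tanimoto_distance`), the use of overlapping windows, the normalization to relative frequencies, and the continuous form of the Tanimoto distance (taken from the keyword list, but not defined in this record) are all assumptions made for illustration.

```python
from itertools import product

BASES = "AGTC"  # the four letters named in the abstract


def search_keys(k):
    """Return all 4**k search keys of length k over the four bases."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def feature_vector(sequence, k):
    """Relative frequency of each length-k search key in the sequence.

    Assumption: keys are counted over overlapping windows, and windows
    containing symbols other than A, G, T, C are skipped.
    """
    keys = search_keys(k)
    counts = dict.fromkeys(keys, 0)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    return [counts[key] / total if total else 0.0 for key in keys]


def tanimoto_distance(a, b):
    """1 - (a.b) / (|a|^2 + |b|^2 - a.b); assumed continuous Tanimoto form."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return 0.0 if denom == 0 else 1.0 - dot / denom


if __name__ == "__main__":
    # Toy sequences, not real genomic data.
    species_a = feature_vector("AGTCAGTCAGGT", 2)
    species_b = feature_vector("TTTTCCCCAGTC", 2)
    print(len(species_a), tanimoto_distance(species_a, species_b))
```

Under these assumptions, a key length of k yields a vector with 4**k entries (4, 16, 64, or 256 for the one- to four-letter keys mentioned in the abstract), so the choice of key length directly sets the dimensionality of the data that PCA would then reduce.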
