DATA CLEANING USING FD FROM DATA MINING PROCESS

Document Info

Title:	DATA CLEANING USING FD FROM DATA MINING PROCESS
Author(s):	Kollayut Kaewbuadee , Yaowadee Temtanapat , Ratchata Peachavanish
ISSN:	1646-3692
Editors:	Pedro Isaías and Marcin Paprzycki
Year:	2006
Edition:	Vol. I, No. 2
Keywords:	Functional Dependency, Data Cleaning, Functional Dependency Discovery
Type:	Journal Paper
First Page:	117
Last Page:	131
Language:	English
Paper Abstract:	A functional dependency (FD) is an important feature that describes the relationship between attributes and candidate keys in tuples. It also shows the relationship between entities in a data model (Calvanese et al. 2001). In data cleaning research (Arenas et al. 1999; Bohannon et al. 2005), FDs are used to improve data quality. In data mining research, FD discovery techniques have been studied (Savnik and Flach 1993; Huhtala et al. 1999). However, FD discovery can find too many FDs, and using all of them directly in a cleaning process can make the process run in NP time (Bohannon et al. 2005). In this research, we developed a cleaning engine that combines an FD discovery technique with a data cleaning technique and uses a feature from query optimization, called the Selectivity Value, to reduce the number of discovered FDs. Test results showed that this approach can identify duplicates and anomalies with high recall and a low false-positive rate.
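
The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' cleaning engine. It assumes the common query-optimization definition of selectivity (distinct values divided by row count), and the function names, thresholds, and sample rows are hypothetical, chosen only for illustration. The sketch prunes candidate attributes by selectivity and then reports tuples that violate a single FD.

```python
from collections import defaultdict


def selectivity(rows, attr):
    """Distinct values of attr divided by row count (assumed definition)."""
    if not rows:
        return 0.0
    return len({row[attr] for row in rows}) / len(rows)


def prune_by_selectivity(rows, attrs, low, high):
    """Keep attributes whose selectivity lies in [low, high].

    The band is a hypothetical pruning rule; the abstract does not
    state the criterion the paper actually uses.
    """
    return [a for a in attrs if low <= selectivity(rows, a) <= high]


def fd_violations(rows, lhs, rhs):
    """Return groups of rows that agree on lhs but disagree on rhs."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row)
    return {
        key: group
        for key, group in groups.items()
        if len({row[rhs] for row in group}) > 1
    }


# Hypothetical sample data: the third row misspells the city.
rows = [
    {"zip": "10110", "city": "Bangkok", "name": "A"},
    {"zip": "10110", "city": "Bangkok", "name": "B"},
    {"zip": "10110", "city": "Bangkk", "name": "C"},
    {"zip": "50000", "city": "Chiang Mai", "name": "D"},
]

# Drop key-like attributes (selectivity 1.0) as candidate determinants.
candidates = prune_by_selectivity(rows, ["zip", "city", "name"], 0.0, 0.9)

# Check the candidate FD zip -> city and collect the conflicting groups.
anomalies = fd_violations(rows, ["zip"], "city")
```

Here `name` is unique in every row, so any FD with `name` on the left-hand side holds trivially and says nothing about data quality. Filtering such attributes out before checking is one plausible way a selectivity measure could shrink the set of FDs to test.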
