Title:
|
DATA CLEANING USING FD FROM DATA MINING PROCESS |
Author(s):
|
Kollayut Kaewbuadee, Yaowadee Temtanapat, Ratchata Peachavanish |
ISBN:
|
ISSN: 1646-3692 |
Editors:
|
Pedro Isaías and Marcin Paprzycki |
Year:
|
2006 |
Edition:
|
V I, 2 |
Keywords:
|
Functional Dependency, Data Cleaning, Functional Dependency Discovery |
Type:
|
Journal Paper |
First Page:
|
117 |
Last Page:
|
131 |
Language:
|
English |
Paper Abstract:
|
A functional dependency (FD) is an important feature for describing the relationship between attributes and candidate keys in tuples. It also shows the relationship between entities in a data model (Calvanese et al. 2001). In data cleaning research (Arenas et al. 1999; Bohannon et al. 2005), FDs are used to improve data quality. In data mining research, FD discovery techniques have been studied (Savnik and Flach 1993; Huhtala et al. 1999). However, FD discovery can find too many FDs and, if they are used directly in a cleaning process, can cause it to take NP time (Bohannon et al. 2005). In this research, we developed a cleaning engine that combines an FD discovery technique with a data cleaning technique and uses a feature from query optimization, called the selectivity value, to decrease the number of discovered FDs.
Testing results showed that this work can identify duplicates and anomalies with high recall and a low false-positive rate.
|
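As an illustration of the two ideas named in the abstract, the following is a minimal sketch, not the authors' engine. It checks whether an FD holds on a small table and reports the rows that violate it, and it computes a per-attribute selectivity. The abstract does not define the selectivity value, so the definition used here (distinct values divided by row count) is an assumption borrowed from common query-optimization usage. The function names `fd_violations` and `selectivity` and the sample rows are hypothetical.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Group rows by their lhs values and return the groups whose rows
    disagree on rhs, i.e. the groups that violate the FD lhs -> rhs."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key].append(row)

    violations = {}
    for key, members in groups.items():
        rhs_values = {tuple(m[a] for a in rhs) for m in members}
        if len(rhs_values) > 1:  # same lhs, different rhs: FD is violated
            violations[key] = members
    return violations


def selectivity(rows, attribute):
    """Assumed definition: number of distinct values / number of rows."""
    if not rows:
        return 0.0
    return len({row[attribute] for row in rows}) / len(rows)


# Hypothetical sample data: the third row misspells the city.
rows = [
    {"zip": "10110", "city": "Bangkok", "name": "A"},
    {"zip": "10110", "city": "Bangkok", "name": "B"},
    {"zip": "10110", "city": "Bangkk", "name": "C"},
    {"zip": "50200", "city": "Chiang Mai", "name": "D"},
]

suspects = fd_violations(rows, ["zip"], ["city"])
zip_selectivity = selectivity(rows, "zip")
name_selectivity = selectivity(rows, "name")
```

In this sketch, `suspects` holds the group of rows sharing one zip code but disagreeing on the city, which a cleaning step could flag as an anomaly. An attribute such as `name`, whose selectivity is 1.0, acts like a key and determines every other attribute trivially, so a pruning step might skip FDs with such a left-hand side; whether the paper prunes this way is not stated in the abstract.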