Digital Library

Title:      USING WELL DEFINED TOKENS IN SIMILARITY FUNCTION FOR RECORD MATCHING IN DATA CLEANING TECHNIQUES
Author(s):      Rawshan Basha
ISBN:      972-99353-6-X
Editors:      Nuno Guimarães and Pedro Isaías
Year:      2005
Edition:      2
Keywords:      Data Cleaning, Elimination of Duplicates, Data Linkage.
Type:      Short Paper
First Page:      190
Last Page:      194
Language:      English
Paper Abstract:      The integration of information is an important area of research in databases. Duplicate elimination, the problem of detecting database records that are approximate rather than exact duplicates yet describe the same real-world entity, is an important data cleaning problem. To ensure high data quality, a data warehouse must cleanse its data by detecting and eliminating redundant records. During the data cleaning process, multiple records that differ syntactically but are semantically equivalent are identified using data mining techniques. This paper uses a similarity function over well-defined tokens for record matching in a domain-independent algorithm that detects and removes duplicate records, making real data ready for mining techniques. Existing data cleaning techniques rely heavily on full or partial domain knowledge.
   
