Digital Library

Title:      HARNESSING WEAK SUPERVISION TECHNIQUES IN WEB MINING TO IMPROVE OVERALL DATA QUALITY
Author(s):      Allen O'Neill
ISBN:      978-989-8704-42-9
Editors:      Yingcai Xiao, Ajith Abraham, Guo Chao Peng and Jörg Roth
Year:      2022
Edition:      Single
Keywords:      Data Quality, Web Mining, Schema Mapping
Type:      Short Paper
First Page:      218
Last Page:      222
Language:      English
Paper Abstract:      Good data quality is imperative for robust information analysis and decision making. Predictions based on low-quality training data can seriously affect outcomes for both data consumers and organizations. It has been established that whether data is of sufficiently high quality, and therefore 'fit for use', depends on the context and use case of the target data. This paper outlines a novel method, currently a work in progress, that uses weak supervision techniques to improve data quality. The method identifies patterns in existing datasets and maps them to input data, a low-cost bootstrapping technique that helps overcome the problem of expensive data annotation for machine learning training datasets. The research focuses primarily on web-based eCommerce data but has implications for other, unrelated domains. The contribution of this work, when completed, is expected to be an improved approach to evaluating and building models for monitoring data quality.
   
