Title:
|
HARNESSING WEAK SUPERVISION TECHNIQUES IN WEB MINING TO IMPROVE OVERALL DATA QUALITY |
Author(s):
|
Allen O'Neill |
ISBN:
|
978-989-8704-42-9 |
Editors:
|
Yingcai Xiao, Ajith Abraham, Guo Chao Peng and Jörg Roth |
Year:
|
2022 |
Edition:
|
Single |
Keywords:
|
Data Quality, Web Mining, Schema Mapping |
Type:
|
Short Paper |
First Page:
|
218 |
Last Page:
|
222 |
Language:
|
English |
Paper Abstract:
|
Good data quality is imperative to support robust information analysis and decision making. Predictions based on low-quality training data can have a serious impact on outcomes for both data consumers and organizations. It has been established that determining whether data is of sufficiently high quality, and therefore 'fit for use', depends on the context and use case of the target data. The research presented in this paper outlines a novel method, currently a work in progress, that uses weak supervision techniques to improve data quality. This is achieved by identifying patterns in existing datasets and mapping these to input data, a low-cost bootstrapping technique that helps overcome the problem of expensive data annotation for machine learning training datasets. The research focuses primarily on web-based eCommerce data but has implications and relevance for other, unrelated domains. The contribution of this work, when completed, is expected to be an improved approach to evaluating and building models for monitoring data quality. |