Digital Library

Title:      A DATA-DRIVEN STUDY OF CITIZEN SCIENCE DATA QUALITY ASSESSMENT PROFILE
Author(s):      Jailson N. Leocadio and Antonio M. Saraiva
ISBN:      978-989-8704-34-4
Editors:      Pedro Isaías and Hans Weghorn
Year:      2021
Edition:      Single
Type:      Full
First Page:      93
Last Page:      100
Language:      English
Paper Abstract:      In Citizen Science (CS) projects, data quality (DQ) has been a major concern, and there has been much discussion of how to evaluate and ensure the quality of what volunteers produce. However, few studies have assessed how volunteers get involved and how their behavior affects data quality. This study aimed to investigate a data-driven CS profile for data quality assessment. We analyzed citizen science data extracted from iNaturalist, a platform for recording species observations. We used 58,488 observations recorded in São Paulo, Brazil, and Manchester, England, to train Random Forest machine learning models and to create a DQ profile that classifies data according to its quality. Our approach first identifies information elements (IE) and quality dimensions to describe the data and users' behavior. The data were then cleaned, pre-processed and transformed. Three models were created: a complete model (with all features), a reduced model (with dimension reduction) and a model with only the characteristics that describe users' behavior. The precision scores for the models were 0.931, 0.932 and 0.774, respectively. The results showed that data quality can be described with few features and that user behavior is very important for understanding the quality of what volunteers produce.
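The abstract reports precision scores without saying how they were computed, so the following is only a minimal sketch of the standard binary definition, precision = TP / (TP + FP). The function name `precision`, the label encoding (1 for high quality, 0 for low quality) and the sample labels are illustrative assumptions, not data or code from the paper.

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of items predicted as `positive` that truly are positive."""
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    # False positives: predicted positive but actually not positive.
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        # No positive predictions at all; return 0.0 by convention.
        return 0.0
    return tp / (tp + fp)


# Hypothetical labels (assumption): 1 = high-quality observation, 0 = low-quality.
y_true = [1, 1, 0, 1, 0, 1, 0, 1]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1]

score = precision(y_true, y_pred)
print(score)
```

Read this way, a precision of 0.931 would mean that roughly 93 of every 100 observations a model labels as the positive class actually belong to it. Whether the paper used this binary form or an averaged multi-class variant is not stated in the abstract.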
   
