Title: A DATA-DRIVEN STUDY OF CITIZEN SCIENCE DATA QUALITY ASSESSMENT PROFILE
Author(s): Jailson N. Leocadio and Antonio M. Saraiva
ISBN: 978-989-8704-34-4
Editors: Pedro Isaías and Hans Weghorn
Year: 2021
Edition: Single
Type: Full
First Page: 93
Last Page: 100
Language: English
Paper Abstract:
In Citizen Science (CS) projects, data quality (DQ) has been a major concern, and there has been much discussion of how to evaluate and ensure the quality of what volunteers produce, but few studies have assessed how volunteers get involved and how their behavior affects data quality. This study investigated a data-driven CS profile for data quality assessment. We analyzed citizen science data extracted from iNaturalist, a platform for recording species observations. We used 58,488 observations recorded in São Paulo, Brazil, and Manchester, England, to train Random Forest machine learning models and to create a DQ profile that classifies data according to its quality. Our approach first identifies information elements (IE) and quality dimensions to describe the data and the users' behavior. The data were then cleaned, pre-processed and transformed. Three models were created: a complete model (with all features), a reduced model (with dimension reduction) and a model with only the characteristics that describe the users' behavior. The precision scores for the models were 0.931, 0.932 and 0.774, respectively. The results showed that data quality can be described with few features and that user behavior is very important for understanding the quality of what volunteers produce.
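
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a precision score is computed for one class: the fraction of items predicted as that class that truly belong to it. The abstract does not state the class labels used or how precision was averaged across classes, so the function name `precision_for_class`, the labels "high" and "low", and the toy data are all assumptions made for illustration; this is not the authors' evaluation code.

```python
from typing import Hashable, Sequence


def precision_for_class(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> float:
    """Return TP / (TP + FP) for the class `positive`.

    Hypothetical helper for illustration; returns 0.0 when the class
    is never predicted, to avoid dividing by zero.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted `positive` and actually `positive`.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted `positive` but actually another class.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0
    return tp / (tp + fp)


# Toy quality labels, invented for this example only.
y_true = ["high", "high", "low", "low", "high", "low"]
y_pred = ["high", "low", "high", "low", "high", "low"]

score = precision_for_class(y_true, y_pred, positive="high")
print(f"precision for class 'high': {score:.3f}")
```

In this toy data, three observations are predicted as "high" and two of them are truly "high", so the score is 2/3. A score such as the 0.931 reported for the complete model would mean that roughly 93 of every 100 predictions for a class were correct, under whatever averaging the authors applied.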