Title:
|
AUTOMATIC IMPROVEMENT OF TERMS USED IN FOCUSED CRAWLING PROCESSES ON WEB PAGES |
Author(s):
|
Gilson Faria Costa, Guilherme Tavares de Assis and Marcos Vinicius Oliveira Souza |
ISBN:
|
978-989-8533-69-2 |
Editors:
|
Pedro Isaías and Hans Weghorn |
Year:
|
2017 |
Edition:
|
Single |
Keywords:
|
Automatic Improvement of Terms, Web Crawling, Focused Crawling |
Type:
|
Full Paper |
First Page:
|
71 |
Last Page:
|
78 |
Language:
|
English |
Paper Abstract:
|
The great popularity and, especially, the rapid growth of the Web have led to the proposal and analysis of new techniques that help users locate the information they need effectively, in a reasonable time and without much difficulty. Traditional crawlers cannot identify the sub-spaces of the Web that are relevant to a specific theme; focused crawlers, however, can solve this problem effectively and efficiently. A focused crawling process usually requires, as an input parameter, a well-defined set of terms that expresses the desired topic of interest; when that set of terms is poorly defined, the effectiveness of the crawling process may be unsatisfactory. In this work, we propose two strategies to automatically improve the set of terms needed to perform focused crawling processes based on a genre-aware approach. In our experiments, these strategies improved precision and F1 by up to 88.9% and 32.1%, respectively, in crawling processes that took poorly defined sets of terms as input. |