Title:
|
A NEW ARCHITECTURE FOR CRAWLING THE WEB |
Author(s):
|
Valerio Summo, Marcello Castellano, Nicola Pastore, Francesco Arcieri, Giuliano Bellone De Greci |
ISBN:
|
972-98947-5-2 |
Editors:
|
Pedro Isaías, Piet Kommers and Maggie McPherson |
Year:
|
2004 |
Edition:
|
2 |
Keywords:
|
Information Retrieval, Web Crawler, Web Content and Structure Mining. |
Type:
|
Short Paper |
First Page:
|
1111 |
Last Page:
|
1114 |
Language:
|
English |
Paper Abstract:
|
This paper presents a Web Crawler Architecture for extracting useful knowledge from the web. The proposed
solution covers the first step of the Web Mining process: automatically retrieving all relevant documents while
fetching as few non-relevant ones as possible. The architecture uses Information Retrieval techniques to extract
keywords from documents and combines these keywords to enlarge the set of web
pages to examine. |
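The abstract does not specify which Information Retrieval technique extracts the keywords or how they are combined. The sketch below is therefore only an illustration under stated assumptions: it swaps in TF-IDF weighting to pick the top terms of each relevant document and pairs those terms into candidate queries that could be used to find further pages. The names `extract_keywords`, `combine_keywords`, and `STOP_WORDS` are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
STOP_WORDS = {"the", "and", "from", "with", "for", "that", "this"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens of length >= 3, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if len(t) >= 3 and t not in STOP_WORDS]


def extract_keywords(docs, k=3):
    """Return the k highest TF-IDF terms of each document (assumed weighting)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    keywords = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        scores = {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:k])
    return keywords


def combine_keywords(keywords, max_queries=10):
    """Pair keywords drawn from relevant documents into candidate queries."""
    pool = sorted({kw for doc_kws in keywords for kw in doc_kws})
    queries = [" ".join(pair) for pair in combinations(pool, 2)]
    return queries[:max_queries]


if __name__ == "__main__":
    docs = [
        "web crawler fetches web pages",
        "crawler mining knowledge from pages",
        "cooking recipes pasta",
    ]
    kws = extract_keywords(docs, k=2)
    print(kws)
    # Only the first two documents are treated as relevant here.
    print(combine_keywords(kws[:2]))
```

Pairing keywords is just one simple way to "combine" them. A real focused crawler would also score each fetched page for relevance before expanding it, which is how the number of non-relevant fetches would be kept low.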