A NEW ARCHITECTURE FOR CRAWLING THE WEB

Document Info

Title:	A NEW ARCHITECTURE FOR CRAWLING THE WEB
Author(s):	Valerio Summo, Marcello Castellano, Nicola Pastore, Francesco Arcieri, Giuliano Bellone De Greci
ISBN:	972-98947-5-2
Editors:	Pedro Isaías, Piet Kommers and Maggie McPherson
Year:	2004
Edition:	2
Keywords:	Information Retrieval, Web Crawler, Web Content and Structure Mining.
Type:	Short Paper
First Page:	1111
Last Page:	1114
Language:	English
Paper Abstract:	This paper presents a Web Crawler architecture for extracting useful knowledge from the web. The proposed solution covers the first step of the Web Mining process: it automatically retrieves all relevant documents while fetching as few non-relevant ones as possible. The architecture uses Information Retrieval techniques to extract keywords from documents and combines those keywords to enlarge the set of web pages to examine.
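
The abstract does not say how keywords are extracted or combined. As a rough illustration only, the sketch below assumes plain term-frequency ranking and pairwise combination of the top terms into candidate queries. The function names, the stop-word list, and the sample text are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list (an assumption, not from the paper).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on", "with"}


def extract_keywords(text, top_n=3):
    """Return the top_n most frequent non-stop-word terms in text.

    Term frequency stands in for whatever Information Retrieval
    technique the paper actually uses.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


def combine_keywords(keywords, size=2):
    """Combine keywords into candidate queries of `size` terms each."""
    return [" ".join(combo) for combo in combinations(keywords, size)]


# Sample document text (invented for illustration).
doc = (
    "Web crawler design: a crawler fetches web pages "
    "and a crawler ranks pages for mining."
)
keywords = extract_keywords(doc)
queries = combine_keywords(keywords)
print(keywords)
print(queries)
```

In a crawler of this kind, each candidate query could be used to look up further pages to examine, which is one way to read "enlarging the set of web pages"; how the paper's architecture does this is not stated in the abstract.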
