Digital Library

Title:      SEMI-AUTOMATIC GENERATION OF SEED PAGES IN GENRE-AWARE FOCUSED CRAWLING
Author(s):      Vítor Mangaravite, Guilherme Tavares de Assis, Anderson Almeida Ferreira, Flávio Luis Cardeal Pádua
ISBN:      978-989-8533-24-1
Editors:      Pedro Isaías and Bebo White
Year:      2014
Edition:      Single
Keywords:      Seed pages, meta-crawling, focused crawling.
Type:      Full Paper
First Page:      51
Last Page:      58
Language:      English
Paper Abstract:      Focused crawlers attempt to crawl web pages that are relevant to a specific topic or user interest. Although these crawlers have been shown to be effective, their efficiency still needs improvement. A focused crawler usually relies on a priority queue, called the Frontier, which is initialized with the URLs of seed pages manually specified by users and then drives the visiting of web pages and the gathering of relevant ones. If the seed pages are not well specified, the efficiency of the crawling process may be unsatisfactory. In this work, we therefore propose and evaluate a strategy for the semi-automatic generation of seed pages to improve the efficiency of a genre-aware focused crawler. Our experimental evaluation shows, in some situations, an improvement of around 360% in the efficiency of the crawling process.
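
The role of the Frontier and the seed pages described in the abstract can be illustrated with a minimal sketch. It is not the paper's crawler or its seed-generation strategy: the names `crawl`, `harvest_rate`, `LINKS`, and `SCORES` are hypothetical, the link graph is a toy in-memory dictionary rather than the live web, and the relevance scores are made-up values standing in for whatever topic and genre classifier a real focused crawler would use. The fraction of relevant pages among those visited is used here only as an assumed stand-in for "efficiency".

```python
import heapq
from itertools import count

# Toy link graph and relevance scores (both hypothetical, for illustration only).
LINKS = {
    "hub": ["syllabus-a", "syllabus-b", "news"],
    "news": ["sports", "weather"],
}
SCORES = {"hub": 0.9, "syllabus-a": 1.0, "syllabus-b": 1.0,
          "news": 0.1, "sports": 0.0, "weather": 0.0}


def relevance(url):
    """Assumed relevance estimate in [0, 1]; unknown pages score 0."""
    return SCORES.get(url, 0.0)


def crawl(seeds, links, score, budget):
    """Best-first crawl: the Frontier is a priority queue seeded with `seeds`."""
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    tie = count()                       # tie-breaker keeps insertion order
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, next(tie), url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < budget:
        _, _, url = heapq.heappop(frontier)
        visited.append(url)
        for out in links.get(url, []):
            if out not in seen:
                seen.add(out)
                heapq.heappush(frontier, (-score(out), next(tie), out))
    return visited


def harvest_rate(visited, score, threshold=0.5):
    """Fraction of visited pages whose score reaches the threshold."""
    if not visited:
        return 0.0
    return sum(score(u) >= threshold for u in visited) / len(visited)


good = crawl(["hub"], LINKS, relevance, budget=3)
poor = crawl(["news"], LINKS, relevance, budget=3)
print(good, harvest_rate(good, relevance))
print(poor, harvest_rate(poor, relevance))
```

With the same page budget, the crawl seeded at the well-connected relevant page visits only relevant pages, while the crawl seeded at the off-topic page visits none, which is the sensitivity to seed quality that motivates generating seeds more carefully.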
   
