IMPROVING NEWS WEB PAGE CLASSIFICATION THROUGH CONTENT EXTRACTION

Document Info

Title:	IMPROVING NEWS WEB PAGE CLASSIFICATION THROUGH CONTENT EXTRACTION
Author(s):	Rafael Ferreira, Rinaldo Lima, Dimas Melo Filho, Hilário Tomaz, Fred Freitas
ISBN:	978-989-8533-01-2
Editors:	Bebo White, Pedro Isaías and Flávia Maria Santoro
Year:	2011
Edition:	Single
Keywords:	News Classification, Blog Crawling, Content Extraction, Supervised Classification.
Type:	Full Paper
First Page:	187
Last Page:	194
Language:	English
Paper Abstract:	Over the past few years, the Internet has become a major medium for publishing and reading news. One of the main concerns facing news aggregators and search engines is the classification of web news articles into categories. In this work, we conducted experiments to evaluate how content extraction algorithms and some heuristics affect the classification results for news pages. These experiments were performed on a dataset of news articles, which are structurally similar to blog posts. Additionally, we incorporated content extraction services into a framework for news/blog retrieval, while also providing easy access to a set of classification services.
