Title:
|
IMPROVING NEWS WEB PAGE CLASSIFICATION THROUGH CONTENT EXTRACTION |
Author(s):
|
Rafael Ferreira, Rinaldo Lima, Dimas Melo Filho, Hilário Tomaz, Fred Freitas |
ISBN:
|
978-989-8533-01-2 |
Editors:
|
Bebo White, Pedro Isaías and Flávia Maria Santoro |
Year:
|
2011 |
Edition:
|
Single |
Keywords:
|
News Classification, Blog Crawling, Content Extraction, Supervised Classification. |
Type:
|
Full Paper |
First Page:
|
187 |
Last Page:
|
194 |
Language:
|
English |
Paper Abstract:
|
Over the past few years, the Internet has become a great medium for publishing and reading news. One of the major concerns facing news aggregators and search engines is the classification of web news articles into categories. In this work, we conducted experiments to evaluate how Content Extraction algorithms and some heuristics affect the classification results of news pages. These experiments were performed on a dataset of news articles, which are structurally similar to blog posts. Additionally, we incorporated content extraction services into a framework for news/blog retrieval, which also provides easy access to a set of classification services. |