Title:
|
TRASH ARTICLE DETECTION USING CATEGORIZATION TECHNIQUES |
Author(s):
|
Christos Bouras, Vassilis Poulopoulos, George Tsichritzis |
ISBN:
|
978-972-8924-97-3 |
Editors:
|
Hans Weghorn and Pedro Isaías |
Year:
|
2009 |
Edition:
|
V I, 2 |
Keywords:
|
Trash articles, categorization, news articles, trash detection |
Type:
|
Full Paper |
First Page:
|
51 |
Last Page:
|
58 |
Language:
|
English |
Paper Abstract:
|
We explore techniques for detecting news articles that contain invalid information, with the help of text categorization
technology. The amount of information on the World Wide Web is large enough to distract users who are trying
to find useful information. To cope with these large amounts of data, many text categorization methodologies
have been presented. One major problem is that many articles fetched by a crawler, stored in a
back-end database, and finally given as input to a categorization subsystem may not contain valid information for the
user (trashy articles). This may lead users to lose their trust in the system. In this paper, we analyze the special
properties of the categorization of trashy news articles that allow us to detect them, and we propose a specific methodology for
trash detection. Finally, we evaluate the proposed algorithm on a news categorization system and show the overall
benefit of a trash detection mechanism for the system. |