Title:
|
SUBJECT CLASSIFICATION OF WEB PAGES |
Author(s):
|
Ludger Martin |
ISBN:
|
978-989-8533-09-8 |
Editors:
|
Bebo White and Pedro Isaías |
Year:
|
2012 |
Edition:
|
Single |
Keywords:
|
Subject Classification, Web Content Mining |
Type:
|
Full Paper |
First Page:
|
298 |
Last Page:
|
306 |
Language:
|
English |
Paper Abstract:
|
Subject classification is the task of automatically determining what a text is about. For example, a text may deal with biology or with economics. This paper discusses how the subject of a web page can be determined. This is done in several steps. First, the main content of the page is extracted. The extracted content is then analyzed using frequency classes and Wikipedia categories to determine the subject class. A case study shows the suitability of the procedure, which depends on certain parameters. The choice of these parameters is also motivated. |