Title:
|
EXTRACTION OF FEATURES WITH UNSTRUCTURED REPRESENTATION FROM HTML DOCUMENTS |
Author(s):
|
Ray R. Hashemi, Charles W. Ford, Tim Vamprooyen, John R. Talburt |
ISBN:
|
972-9027-53-6 |
Editors:
|
Pedro Isaías |
Year:
|
2002 |
Edition:
|
Single |
Keywords:
|
Web Mining, Structured Features, Unstructured Features, Text Mining, and Mining Names from Text. |
Type:
|
Full Paper |
First Page:
|
47 |
Last Page:
|
53 |
Language:
|
English |
Paper Abstract:
|
The goal of this research is to mine web pages for unstructured features (Names, Titles, and their associations). Unstructured features are hard to identify because their ASCII representations lack obvious patterns. In addition, establishing associations among the extracted features, a crucial step, adds another level of complexity to the mining process. Applying our methodology to a test bed of 20 URLs comprising 500 pages in total showed that (a) the recovery and accuracy measures for the extracted Name and Title features are quite satisfactory, and (b) the proposed methodology is highly effective. |