EXTRACTION OF FEATURES WITH UNSTRUCTURED REPRESENTATION FROM HTML DOCUMENTS

Document Info

Title:	EXTRACTION OF FEATURES WITH UNSTRUCTURED REPRESENTATION FROM HTML DOCUMENTS
Author(s):	Ray R. Hashemi, Charles W. Ford, Tim Vamprooyen, John R. Talburt
ISBN:	972-9027-53-6
Editors:	Pedro Isaías
Year:	2002
Edition:	Single
Keywords:	Web Mining, Structured Features, Unstructured Features, Text Mining, Mining Names from Text
Type:	Full Paper
First Page:	47
Last Page:	53
Language:	English
Paper Abstract:	The goal of this research is to mine web pages for unstructured features (Names, Titles, and their associations). Unstructured features are not easily identifiable because their ASCII representations lack obvious patterns. In addition, the crucial process of establishing associations among the extracted features adds another level of complexity to the mining process. Applying our methodology to a test bed of 20 URLs with 500 total pages showed that (a) the measures of recovery and accuracy of the extracted Name and Title features are quite satisfactory, and (b) the proposed methodology is highly effective.
