Title:
|
EXPERIENCES REGARDING AUTOMATIC DATA EXTRACTION FROM WEB PAGES |
Author(s):
|
Mirel Coşulschi , Bogdan Udrescu , Nicolae Constantinescu , Mihai Gabroveanu , Adrian Giurcă |
ISBN:
|
972-8924-19-4 |
Editors:
|
Pedro Isaías, Miguel Baptista Nunes and Inmaculada J. Martínez |
Year:
|
2006 |
Edition:
|
V I, 2 |
Keywords:
|
data extraction, wrapper, clustering |
Type:
|
Full Paper |
First Page:
|
281 |
Last Page:
|
288 |
Language:
|
English |
Paper Abstract:
|
Existing methods of information extraction from HTML documents include manual approaches, supervised learning, and automatic techniques. The manual method achieves high precision and recall, but it is difficult to apply to a large number of pages. Supervised learning requires human interaction to create positive and negative samples. Automatic techniques require less human effort, but the information they retrieve is less reliable. Our experiments fall into this last category: to this end, we developed a tool for automatic data extraction from HTML pages. |