Title:
|
WEB WRAPPER SPECIFICATION USING COMPOUND FILTER LEARNING |
Author(s):
|
Julien Carme, Michal Ceresna, Max Goebel |
ISBN:
|
972-8924-19-4 |
Editors:
|
Pedro Isaías, Miguel Baptista Nunes and Inmaculada J. Martínez |
Year:
|
2006 |
Edition:
|
V I, 2 |
Keywords:
|
wrapper induction, interactive learning, information extraction |
Type:
|
Full Paper |
First Page:
|
187 |
Last Page:
|
194 |
Language:
|
English |
Paper Abstract:
|
Information available on the Internet is made to be read by humans, not to be processed by machines. To automatically access this information, there is a need for intelligent services that convert HTML documents into more suitable formats like XML. This can be achieved through generation of Web wrappers, programs designed to process pages of a given Web site. To generate such Web wrappers, an efficient approach is to learn them from examples provided by the user. We present such a system, which is based on the generation, selection and combination of elementary extraction operators that we call filters. What makes this approach innovative is that generated wrappers can be easily read, interpreted and modified by the user. |
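The generate, select, and combine idea from the abstract can be illustrated with a minimal sketch. This is not the authors' system: the `Node` and `Filter` classes, the attribute-equality test, and the selection rule (keep a filter if it rejects at least one negative example) are all hypothetical simplifications chosen for illustration. It only shows how a wrapper built as a conjunction of simple filters stays readable by the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical, flattened stand-in for an HTML element."""
    tag: str
    css_class: str
    text: str


@dataclass(frozen=True)
class Filter:
    """Hypothetical elementary filter: accepts nodes whose attribute equals value."""
    attribute: str
    value: str

    def accepts(self, node: Node) -> bool:
        return getattr(node, self.attribute) == self.value


def generate_filters(positives):
    """Generation: propose one equality test per attribute shared by all positives."""
    candidates = []
    for attribute in ("tag", "css_class"):
        values = {getattr(n, attribute) for n in positives}
        if len(values) == 1:
            candidates.append(Filter(attribute, values.pop()))
    return candidates


def select_filters(candidates, negatives):
    """Selection (assumed rule): keep filters that reject at least one negative."""
    return [f for f in candidates if any(not f.accepts(n) for n in negatives)]


def combine(filters):
    """Combination: the wrapper is the conjunction of the selected filters."""
    def wrapper(nodes):
        return [n for n in nodes if all(f.accepts(n) for f in filters)]
    return wrapper


def learn_wrapper(positives, negatives):
    filters = select_filters(generate_filters(positives), negatives)
    return filters, combine(filters)


# Toy page: the user marks one price as wanted and two other nodes as unwanted.
page = [
    Node("td", "price", "12.50"),
    Node("td", "title", "Blue mug"),
    Node("td", "price", "7.00"),
    Node("span", "price", "sale"),
]
filters, wrapper = learn_wrapper(positives=[page[0]], negatives=[page[1], page[3]])
extracted = [n.text for n in wrapper(page)]
```

In this toy run the learned wrapper is the list `filters`, which a user can inspect and edit directly (here: tag is `td` and class is `price`), and `extracted` holds the text of the two matching price cells.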