Title:
|
INTERACTIVE WRAPPER LEARNING FOR WEB DOCUMENTS USING TREE ALIGNMENT |
Author(s):
|
Max Goebel, Michal Ceresna |
ISBN:
|
978-972-8924-30-0 |
Editors:
|
Nuno Guimarães and Pedro Isaías |
Year:
|
2007 |
Edition:
|
Single |
Keywords:
|
Information Extraction, Data mining, Tree alignment, Classification. |
Type:
|
Full Paper |
First Page:
|
363 |
Last Page:
|
370 |
Language:
|
English |
Paper Abstract:
|
This paper proposes an interactive wrapper learning approach to Web information extraction for semi-automatic wrapper generation. In particular, we present an algorithm that uses tree alignment techniques to learn patterns from the structure of training instances. It does so by generating structural template models for both positive and negative examples. We evaluate our system on standard benchmarks, and the results show great potential for structure learning across a variety of extraction tasks. |