Digital Library

Title:      SUPERVISED DATA EXTRACTION
Author(s):      N. Georgiev, J.M. Labat, J.L. Minel, L. Nicolas
ISBN:      972-8924-02-X
Editors:      Pedro Isaías and Miguel Baptista Nunes
Year:      2005
Edition:      1
Keywords:      Wrappers, wrapper generation, data extraction, HTML parsing, information extraction, XPath.
Type:      Full Paper
First Page:      467
Last Page:      474
Language:      English
Paper Abstract:      Data extraction from internet sources has attracted the interest of the scientific community in recent years. However, there are still no well-established standards, because the information on the Internet is heterogeneous. One thing is common nonetheless: all the data is available in HTML format for compatibility reasons. This article presents our methodology and the prototype system we have created to extract data from HTML pages. We use XPath as the data extraction language and have developed a methodology for visual wrapper generation. Our approach takes advantage of the implicit correlation between the data and the surrounding structure.
   
