Digital Library

Title:      DATA EXTRACTION FROM WEB DATABASE QUERY RESULT PAGES VIA TAGSETS AND INTEGER SEQUENCES
Author(s):      Jerome Robinson
ISBN:      972-98947-1-X
Editors:      Pedro Isaías and Nitya Karmakar
Year:      2003
Edition:      2
Keywords:      Web database, data extraction, wrapper.
Type:      Full Paper
First Page:      145
Last Page:      152
Language:      English
Paper Abstract:      The World Wide Web is a collection of databases as well as web sites. Databases associated with web sites provide public access via query forms on web pages. They constitute an enormous repository of searchable data on an extremely diverse collection of subjects, ranging from multimedia collections through archives of subject-specific data to current information such as currency conversion or interest rates and news or weather reports. Many interesting and valuable database applications could be developed if these databases were easily and reliably accessible to programs. The difficulty in extracting data lies in the number of different web page formats and their tendency to change suddenly. A rapid page analysis and wrapper creation system is needed to generate and maintain a data extraction facility for any required web site. This important goal has been the subject of substantial recent research, which models the web results page in various ways. The purpose of the current paper is to introduce a new method for rapid page analysis using the recurrence patterns of tagSet occurrence in the page.
   
