DATA EXTRACTION FROM WEB DATABASE QUERY RESULT PAGES VIA TAGSETS AND INTEGER SEQUENCES

Jerome Robinson

IADIS International Conference WWW/Internet - ICWI

IADIS International Conference WWW/Internet 2003

Document Info

Title:	DATA EXTRACTION FROM WEB DATABASE QUERY RESULT PAGES VIA TAGSETS AND INTEGER SEQUENCES
Author(s):	Jerome Robinson
ISBN:	972-98947-1-X
Editors:	Pedro Isaías and Nitya Karmakar
Year:	2003
Edition:	2
Keywords:	Web database, data extraction, wrapper.
Type:	Full Paper
First Page:	145
Last Page:	152
Language:	English
Paper Abstract:	The World Wide Web is a collection of databases as well as web sites. Databases associated with web sites provide public access via query forms on web pages. They constitute an enormous repository of searchable data on an extremely diverse collection of subjects, ranging from multimedia collections, through archives of subject-specific data, to current information such as currency conversion or interest rates and news or weather reports. Many interesting and valuable database applications could be developed if these databases were easily and reliably accessible to programs. The difficulty in extracting data lies in the large number of different web page formats and their tendency to change suddenly. A rapid page analysis and wrapper creation system is needed to generate and maintain a data extraction facility for any required web site. This important goal has been the subject of substantial recent research, which models the web results page in various ways. The purpose of the current paper is to introduce a new method for rapid page analysis using the recurrence patterns of tagSet occurrence in the page.
