BOOTSTRAPPING INFORMATION EXTRACTION USING REGULARITY OF WEB PAGES

Document Info

Title:	BOOTSTRAPPING INFORMATION EXTRACTION USING REGULARITY OF WEB PAGES
Author(s):	Norifumi Murayama, Tomoyuki Nanno, Manabu Okumura
ISBN:	978-972-8924-44-7
Editors:	Pedro Isaías, Miguel Baptista Nunes and João Barroso (associate editors Luís Rodrigues and Patrícia Barbosa)
Year:	2007
Edition:	V I, 2
Keywords:	Relation Extraction, Metadata, Location Information
Type:	Full Paper
First Page:	313
Last Page:	320
Language:	English
Paper Abstract:	To annotate web documents with metadata automatically, we must prepare a database that stores the annotation targets and their metadata. In the case of location information, we need a database that stores named entities (NEs) and their location information (i.e., telephone numbers and addresses). In this paper, we create such databases automatically by extracting the necessary information from documents on the Web. Our approach exploits the regularity of web pages that list the information to be extracted, and uses it to expand the database automatically. We describe our extraction method and report experimental results. In our experiment, we succeeded in increasing the size of the database to more than ten times that of the initial seed set. Manual evaluation of the results showed that 73-87% of the extracted information was exactly correct.
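
The bootstrapping idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so this is not the authors' method: it assumes each page is already split into text rows, that a row holds a name plus one phone number in a simple hyphenated format, and that a page counts as a regular listing once enough of its rows match entries already in the database. The names `PHONE`, `parse_row`, `bootstrap`, and `min_seed_hits` are hypothetical.

```python
import re

# Assumed phone format, for illustration only (e.g. 03-1234-5678).
PHONE = re.compile(r"\d{2,4}-\d{2,4}-\d{4}")


def parse_row(row):
    """Return (name, phone) if the row holds exactly one phone number, else None."""
    phones = PHONE.findall(row)
    if len(phones) != 1:
        return None
    # Whatever remains after removing the phone number is treated as the name.
    name = PHONE.sub("", row).strip(" :,-\t")
    return (name, phones[0]) if name else None


def bootstrap(seed, pages, min_seed_hits=2):
    """Grow `seed` (name -> phone) using pages that list enough known entries.

    `pages` is a list of pages, each a list of row strings. A page is trusted
    only if at least `min_seed_hits` of its rows agree with the database.
    """
    db = dict(seed)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for rows in pages:
            records = [r for r in map(parse_row, rows) if r is not None]
            hits = sum(1 for name, phone in records if db.get(name) == phone)
            if hits < min_seed_hits:
                continue  # too little evidence that this page is a reliable list
            for name, phone in records:
                if name not in db:
                    db[name] = phone
                    changed = True
    return db
```

In this sketch, entries learned from one page can qualify another page on a later pass, which is how the database grows beyond what the seed set alone could reach. Raising `min_seed_hits` trades coverage for precision.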