Title:
|
BOOTSTRAPPING INFORMATION EXTRACTION USING REGULARITY OF WEB PAGES |
Author(s):
|
Norifumi Murayama , Tomoyuki Nanno , Manabu Okumura |
ISBN:
|
978-972-8924-44-7 |
Editors:
|
Pedro Isaías , Miguel Baptista Nunes and João Barroso (associate editors Luís Rodrigues and Patrícia Barbosa) |
Year:
|
2007 |
Edition:
|
V I, 2 |
Keywords:
|
Relation Extraction, Metadata, Location Information |
Type:
|
Full Paper |
First Page:
|
313 |
Last Page:
|
320 |
Language:
|
English |
Paper Abstract:
|
To annotate web documents with metadata automatically, we must prepare a database that stores annotation targets and
their metadata. In the case of location information, we need a database that stores named entities (NEs) and their location
information (i.e., telephone number and address). In this paper, we automatically create such databases by extracting
necessary information from documents on the Web. Our approach uses the regularity of web pages that list information to
be extracted and expands the database automatically. We describe our extraction method and report our experimental
results. In our experiment, we succeeded in increasing the size of the database to more than ten times that of the initial seed set.
Manual evaluation of our experimental results showed that 73-87% of the extracted information was exactly correct. |