Title: REGULAR LANGUAGE INFERENCE FOR DOMAIN SPECIFIC NAMED ENTITY RECOGNITION
Author(s): Falk Brauer, Robert Rieger, Wojciech Barczynski, Adrian Mocan
ISBN: 978-972-8924-93-5
Editors: Pedro Isaías, Bebo White and Miguel Baptista Nunes
Year: 2009
Edition: 1
Keywords: Regular Languages, Grammatical Inference, Named Entity Extraction, Information Extraction, Information Retrieval
Type: Full Paper
First Page: 543
Last Page: 550
Language: English

Paper Abstract:
Named Entity Recognition (NER) is one of the most important techniques for Information Extraction (IE) from
unstructured documents. Nevertheless, regular expressions remain the first choice for detecting domain-specific entities
that follow a special syntax, such as product names, in text. In many NER scenarios, a small but representative set of
entities stored in structured form is available or can be acquired. Experienced developers can use such data to create and
test regular expressions. However, manually creating such specific rules for a given domain is a complex and time-consuming
task. In this paper, we introduce an approach for automated rule generation for NER based on example instances of
entities. We present an implementation that automatically identifies patterns in small sets of example instances, tunes
these patterns to achieve high precision and recall, and applies them for information extraction. The generated rules are
not tuned to a specific training corpus and do not require a homogeneous document structure. The evaluation of our
prototype shows very good results for three target domains of interest, with an average F-measure of about 90%.
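
The abstract outlines three steps: infer patterns from a few example entities, tune them for precision and recall, and apply them to text. The following is a minimal, hypothetical Python sketch of that idea, not the authors' algorithm. It assumes a simple generalization scheme (runs of ASCII digits, uppercase and lowercase letters collapse into character classes, all other characters stay literal), joins the per-example patterns into one alternation, and chooses between exact run lengths and `+` quantifiers by F-measure on a small labeled development text. The function names (`generalize`, `infer_pattern`, `extract`, `f_measure`, `tune`) and the product-code examples are illustrative assumptions.

```python
import re
import string
from itertools import groupby

# Hypothetical generalization scheme (an assumption, not the paper's method):
# ASCII digits and upper/lowercase letters become character classes;
# every other character is kept literally.
CLASS_REGEX = {"digit": "[0-9]", "upper": "[A-Z]", "lower": "[a-z]"}


def char_class(ch: str) -> str:
    if ch in string.digits:
        return "digit"
    if ch in string.ascii_uppercase:
        return "upper"
    if ch in string.ascii_lowercase:
        return "lower"
    return "other"


def generalize(example: str, exact_lengths: bool = True) -> str:
    """Turn one example entity into a regex by collapsing same-class runs."""
    parts = []
    for cls, chars in groupby(example, key=char_class):
        run = "".join(chars)
        if cls == "other":
            parts.append(re.escape(run))  # punctuation and spaces stay literal
        else:
            quant = "{%d}" % len(run) if exact_lengths else "+"
            parts.append(CLASS_REGEX[cls] + quant)
    return "".join(parts)


def infer_pattern(examples, exact_lengths: bool = True) -> str:
    """Union of the generalized examples, delimited by non-alphanumeric context."""
    alternatives = sorted({generalize(e, exact_lengths) for e in examples})
    return r"(?<![A-Za-z0-9])(?:" + "|".join(alternatives) + r")(?![A-Za-z0-9])"


def extract(pattern: str, text: str) -> list:
    """Apply the inferred pattern and return all matched entity strings."""
    return [m.group(0) for m in re.finditer(pattern, text)]


def f_measure(predicted: set, gold: set):
    """Precision, recall and balanced F-measure of predicted vs. gold entities."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def tune(examples, dev_text: str, dev_gold: set):
    """Pick the generalization level with the best F-measure on labeled dev text."""
    best = None
    for exact in (True, False):
        pattern = infer_pattern(examples, exact)
        score = f_measure(set(extract(pattern, dev_text)), dev_gold)[2]
        if best is None or score > best[0]:
            best = (score, exact, pattern)
    return best  # (f_measure, exact_lengths, pattern)


if __name__ == "__main__":
    examples = ["XK-200", "ZT-315", "AB-1000"]
    print(tune(examples, "Ship XK-200, QP-7 and LMN-42 now.",
               {"XK-200", "QP-7", "LMN-42"}))
```

In this sketch, exact run lengths favor precision while `+` quantifiers favor recall; selecting between them on held-out labeled data is one simple way to balance the two, and the paper's own tuning strategy may differ.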