Digital Library

Title:      REGULAR LANGUAGE INFERENCE FOR DOMAIN SPECIFIC NAMED ENTITY RECOGNITION
Author(s):      Falk Brauer, Robert Rieger, Wojciech Barczynski, Adrian Mocan
ISBN:      978-972-8924-93-5
Editors:      Pedro Isaías, Bebo White and Miguel Baptista Nunes
Year:      2009
Edition:      1
Keywords:      Regular Languages, Grammatical Inference, Named Entity Extraction, Information Extraction, Information Retrieval
Type:      Full Paper
First Page:      543
Last Page:      550
Language:      English
Paper Abstract:      Named Entity Recognition (NER) is one of the most important techniques in Information Extraction (IE) from unstructured documents. Still, regular expressions are the first choice for detecting domain-specific entities in text, such as product names, that follow a special syntax. In many NER scenarios, a small but representative number of entities stored in a structured form is available or can be acquired. Such data can be used by experienced developers to create and test regular expressions. However, creating such specific rules manually for a certain domain is a complex and time-consuming task. In this paper, we introduce an approach for automated rule generation for NER, based on example instances of entities. We present an implementation that automatically identifies patterns in small sets of example instances, tunes these patterns to achieve high precision and recall, and applies them for information extraction. The generated rules are not tuned to a specific training corpus and do not require a homogeneous document structure. The evaluation of our prototype shows very good results for three target domains of interest, with an average F-measure of about 90%.
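
The idea of deriving an extraction rule from a handful of example entities can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify; it is a naive stand-in that maps each character to a class (uppercase, lowercase, digit, or literal), collapses runs, and widens the repetition counts to cover all examples. The function names (`char_class`, `tokenize`, `infer_pattern`) and the product identifiers are hypothetical, and the sketch assumes ASCII input whose examples all share one sequence of character classes.

```python
import re


def char_class(ch):
    """Map a character to a regex class (assumes ASCII letters and digits)."""
    if ch.isdigit():
        return r"\d"
    if ch.isalpha():
        return "[A-Z]" if ch.isupper() else "[a-z]"
    return re.escape(ch)  # punctuation is kept as a literal


def tokenize(example):
    """Collapse an example into runs of [class, run_length]."""
    runs = []
    for ch in example:
        cls = char_class(ch)
        if runs and runs[-1][0] == cls:
            runs[-1][1] += 1
        else:
            runs.append([cls, 1])
    return runs


def infer_pattern(examples):
    """Infer one regex covering all examples that share the same class sequence."""
    tokenized = [tokenize(e) for e in examples]
    shape = [cls for cls, _ in tokenized[0]]
    for runs in tokenized[1:]:
        if [cls for cls, _ in runs] != shape:
            raise ValueError("examples do not share one class sequence")
    parts = []
    for i, cls in enumerate(shape):
        lengths = [runs[i][1] for runs in tokenized]
        lo, hi = min(lengths), max(lengths)
        if lo == hi == 1:
            quant = ""
        elif lo == hi:
            quant = "{%d}" % lo
        else:
            quant = "{%d,%d}" % (lo, hi)
        parts.append(cls + quant)
    return "".join(parts)


# Hypothetical product identifiers used as example instances.
pattern = infer_pattern(["AB-1234", "XY-987", "QRS-10"])
matches = re.findall(pattern, "Order AB-1234 and ZZ-42 today")
```

A rule inferred this way generalizes only as far as the examples vary, so precision and recall depend heavily on how representative the example set is; the tuning step mentioned in the abstract is not modeled here.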