Title:
|
IDENTIFYING AND CHARACTERIZING CONCEPTS IN UNSTRUCTURED TEXTS USING AUTOMATIC ANNOTATION |
Author(s):
|
Tiago Fraga, Orlando Belo and Anabela Barros |
ISBN:
|
978-989-8704-44-3 |
Editors:
|
Hans Weghorn and Pedro Isaias |
Year:
|
2022 |
Edition:
|
Single |
Keywords:
|
Annotation Systems, Natural Language Processing, Machine Learning, Text Mining, Automatic Tagging, Data Analysis |
Type:
|
Full Paper |
First Page:
|
63 |
Last Page:
|
70 |
Language:
|
English |
Paper Abstract:
|
Text annotation is an important and useful activity in semantic text analysis. Adding notes and marks to a text is one of the most common ways of enriching its content and revealing its semantics, giving readers a more concrete idea of what it expresses. For a long time, text annotation was done manually, often in an ad hoc manner and without a concrete method, which made it a very time-consuming and labor-intensive process. Today, the great development of natural language processing, machine learning and text mining has promoted a strong emergence of applications in several domains, and text annotation is no exception. By combining natural language processing and machine learning mechanisms, it is possible to develop systems that automatically analyze texts written in natural language, identifying words, creating contexts, and discovering and maintaining tags, their relationships and annotations. In this paper, we present an automatic annotation system conceived and developed specifically for tagging the texts of the Book of Properties, a codex containing the 17th-century inventory of the properties of the Archbishop's Table of Braga (Portugal). In addition to a general characterization of the system, we describe the various stages of its annotation process, with references taken from the Book of Properties. |