Title: A Comparative Assessment of Language Identification Approaches in Textual Documents
Author(s): Luciano de Souza Cabral, Rafael Dueire Lins, Rinaldo Lima, Steven J. Simske
ISBN: 978-989-8533-14-2
Editors: Hans Weghorn and Pedro Isaías
Year: 2012
Edition: Single
Keywords: Language Identification, Comparative Analysis, Document Engineering
Type: Full Paper
First Page: 67
Last Page: 74
Language: English
Paper Abstract:
This paper presents several experiments conducted to assess distinct methods for identifying the language of written texts. After introducing a new method for the language identification problem, we conducted standard experiments to evaluate the proposed approach against three other approaches. To ensure fair comparisons, all methods were evaluated on the same corpus (the EuroParl Corpus), which contains 21,000 sentences evenly distributed across 21 languages (1,000 sentences per language). We discuss the experimental results as well as the strengths and limitations of the compared algorithms. In addition, the accuracy achieved by the proposed method shows that it is highly competitive with other state-of-the-art methods.