Title:
|
PBT: PERSIAN PART OF SPEECH BRILL TAGGER |
Author(s):
|
Habib Karbasian, Parisa Rashidi |
ISBN:
|
978-972-8924-56-0 |
Editors:
|
Nuno Guimarães and Pedro Isaías |
Year:
|
2008 |
Edition:
|
Single |
Keywords:
|
Persian Part of Speech Tagger, Tagging, Brill Tagger, Corpus |
Type:
|
Short Paper |
First Page:
|
348 |
Last Page:
|
352 |
Language:
|
English |
Paper Abstract:
|
Persian is widely spoken in Iran and in neighboring countries such as Afghanistan and Tajikistan. Interest in
processing and retrieving Persian (Farsi) text has recently been growing in these countries and around the world.
Because part-of-speech tagging is a preprocessing step for natural language processing, we have applied a
rule-based tagging method to Persian to ease this path for the language.
In this paper, we describe initial findings in the development of a Persian part-of-speech tagger based on the Brill tagger.
Because Persian is both morphologically and structurally complex, we used two different sets of rules: a lexical rule set
and a contextual rule set. The tagger has been tested on five Persian test collections. We have used it to extract
a lexicon and to tag sample texts from the corpus. A tag set of 40 basic syntactic and morphological Persian tags was
used in these experiments. So far the results have been encouraging: about 95% accuracy on the sample texts. |