Title:
|
BUILDING A CORPUS TO CATEGORIZE ARABIC SHORT TEXT USING GAMES WITH A PURPOSE |
Author(s):
|
Abdennadher Slim, Ayman Heba, Sabty Caroline, Salem Reem, Tarhony Nada, Zohny Sara |
ISBN:
|
978-989-8533-24-1 |
Editors:
|
Pedro Isaías and Bebo White |
Year:
|
2014 |
Edition:
|
Single |
Keywords:
|
Games with A Purpose, Arabic short text, Topic Categorization, NLP, Crowdsourcing, Human Computation |
Type:
|
Full Paper |
First Page:
|
59 |
Last Page:
|
65 |
Language:
|
English |
Paper Abstract:
|
Text categorization, also known as text classification or topic detection, is the task of automatically sorting documents into a predefined set of categories. It is considered one of the most important fields in Natural Language Processing, especially for the Arabic language, where few attempts have been made to construct a corpus that could be used to train classification algorithms for Arabic short text. All the work surveyed focuses on document classification, whether for Arabic or other languages. Moreover, the small amount of work directed towards short text has focused on sentiment analysis and neglected categorization. In this paper, a new approach is presented to construct a corpus for short Arabic text classification using a Game with a Purpose (GWAP). "Eih elMawdoo3?" ("What is the Topic?") is a multiplayer GWAP that aims to categorize short, unstructured Arabic text while collecting various keywords that will help construct a strong, cheap, and expandable corpus for short text classification in the Arabic language. |