Title:
|
STATISTICAL WORD SENSE DISAMBIGUATION THROUGH UNSUPERVISED SENSE MARKER ENRICHMENT ON LARGE FREE TEXT CORPUSES |
Author(s):
|
Shahzad Khan, Kenan Azam |
ISBN:
|
972-99353-0-0 |
Editors:
|
Pedro Isaías and Nitya Karmakar |
Year:
|
2004 |
Edition:
|
1 |
Keywords:
|
Computational Linguistics, Information Processing, Word-Sense Disambiguation, semantic ontology. |
Type:
|
Full Paper |
First Page:
|
543 |
Last Page:
|
550 |
Language:
|
English |
Paper Abstract:
|
Word sense ambiguity is known to have a detrimental effect on the performance of information retrieval and linguistic systems. The problem arises from the inherently polysemous nature of natural languages, in which one word can have multiple meanings, or senses. Humans resolve such ambiguity with little effort, but mapping a word to its correct sense is a daunting task for a retrieval system. This paper describes two disambiguation methodologies, based on contemporary techniques, that enrich text with sense meta-information by identifying the correct sense of an ambiguous noun in a document. The research draws on contemporary statistical disambiguation methodologies and attempts to make them more effective through a novel weighting scheme that is simpler than the schemes used by other disambiguation algorithms. It builds on two recent groundbreaking research results: that words tend to have one sense per document and one sense per collocation. In the experiments, the set of senses for each polysemous word is the same as in the WordNet 1.7 repository. However, the methodologies are general and applicable to any concept repository built on a generalization/specialization framework. The two methodologies are compared with each other, and the results establish that this approach leads to an improvement in the disambiguation process. The paper also proposes a strategy for using the disambiguation methodology to enhance relevance feedback and information retrieval performance. |