Title: ROLE OF NAMED ENTITIES IN UNDERSTANDING SEMANTIC SIMILARITY OF ENGLISH TEXT
Author(s): Sumit Kumar and Shubhamoy Dey
ISBN: 978-989-8704-47-4
Editors: Piet Kommers, Inmaculada Arnedillo Sánchez and Pedro Isaías
Year: 2023
Edition: Single
Keywords: Text Similarity, Semantic Similarity, Named Entity
Type: Short Paper
First Page: 437
Last Page: 442
Language: English
Paper Abstract:
Understanding semantic similarities between documents is challenging but has enormous benefits for applications such as
plagiarism detection and information retrieval. Various techniques are available in natural language processing that help
in understanding similarities between text documents. Every approach aims to find a unique set of features that help
differentiate between two or more documents.
Names of persons, organizations, locations, medical codes, acronyms, technical terms, date and time expressions,
quantities, monetary values, and percentages (collectively known as Named Entities), together with the order in which
they appear in a document, contribute a great deal to the uniqueness of the document (Li et al., 2020). If two documents
share them, they must present the same information or discuss the same concept. Another advantage of Named Entities (NEs)
in the context of plagiarism detection is that they do not have synonyms, so replacing words with their synonyms to avoid
detection is not an option. Thus, NEs have a high potential for detecting similarities between documents.
Yet, judging by the available literature, their use remains under-researched.
In this article, we discuss and explore the concept of NEs and their meta characteristics, and propose a way of using that
information to find similarities between documents.
Our initial experimental results, discussed in this article, demonstrate the efficacy of the approach intuitively argued
above. Because the methodology of this article is unique, a direct comparison with other available methods for textual
similarity is not appropriate. Instead, we compare the results of the proposed NE-based approach with existing approaches
based on Term Frequency and TF-IDF.
The future goal of the ongoing research work is to combine NEs and their meta characteristics with other characteristics
to develop a robust and comprehensive framework for finding semantic similarities between documents.
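
As an illustration of the contrast drawn in the abstract, the following is a minimal sketch that compares a Term Frequency cosine similarity with a similarity based on shared Named Entities. It is not the authors' method: the abstract does not specify how NEs are extracted or scored, so the function names (`tf_cosine`, `naive_entities`, `ne_jaccard`), the regex-based entity extractor, and the choice of Jaccard overlap are all assumptions made for illustration. A real system would presumably use a trained NER model and could also take entity order into account.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_cosine(doc_a, doc_b):
    """Cosine similarity between raw term-frequency vectors (baseline)."""
    tf_a, tf_b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def naive_entities(text):
    """ASSUMPTION: a crude stand-in for a real NER system.

    Treats runs of capitalized words and numbers (optionally with a
    trailing %) as entities. It will also pick up ordinary capitalized
    sentence-initial words, which a trained NER model would not.
    """
    pattern = r"(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)|\d+(?:\.\d+)?%?"
    return set(re.findall(pattern, text))


def ne_jaccard(doc_a, doc_b):
    """Jaccard overlap of the two documents' entity sets (hypothetical score)."""
    ents_a, ents_b = naive_entities(doc_a), naive_entities(doc_b)
    union = ents_a | ents_b
    return len(ents_a & ents_b) / len(union) if union else 0.0
```

For a pair such as "Alice Smith joined Acme Corp in 2019." and "Acme Corp hired Alice Smith back in 2019.", the entity sets coincide even though the wording differs, whereas the Term Frequency score is lowered by the changed verbs and raised by shared function words such as "in".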