Digital Library

Title:      ROLE OF NAMED ENTITIES IN UNDERSTANDING SEMANTIC SIMILARITY OF ENGLISH TEXT
Author(s):      Sumit Kumar and Shubhamoy Dey
ISBN:      978-989-8704-47-4
Editors:      Piet Kommers, Inmaculada Arnedillo Sánchez and Pedro Isaías
Year:      2023
Edition:      Single
Keywords:      Text Similarity, Semantic Similarity, Named Entity
Type:      Short Paper
First Page:      437
Last Page:      442
Language:      English
Paper Abstract:      Understanding semantic similarities between documents is challenging but has enormous benefits, such as plagiarism detection and information retrieval. Various techniques are available in Natural Language Processing that help in understanding similarities between text documents. Every approach aims to find a unique set of features that help differentiate between two or more documents. Names of persons, organizations, locations, medical codes, acronyms, technical terms, date and time expressions, quantities, monetary values, and percentages (collectively known as Named Entities), together with the order in which they appear in a document, contribute a great deal to the uniqueness of the document (Li et al., 2020). If two documents share them, they must present the same information or discuss the same concept. Another advantage of Named Entities (NEs) in the context of plagiarism detection is that they do not have synonyms; replacing words with their synonyms to avoid detection is, therefore, not an option. Thus, NEs have a high potential for detecting similarities between documents. Yet, going by the available literature, this is an under-researched concept. In this article, we discuss and explore the concept of NEs and their meta characteristics, and propose a way of using that information to find similarities between documents. Our initial experimental results, discussed in this article, demonstrate the efficacy of the approach intuitively argued above. Because this article is unique in its methodology, comparing its results with other available methods for textual similarity is inappropriate. Instead, we have compared the results of the proposed NE-based approach with existing approaches based on Term Frequency and TF-IDF. The future goal of this ongoing research is to combine NEs and their meta characteristics with other characteristics to develop a robust and comprehensive framework for finding semantic similarities between documents.
   
