On finding similar verses from the Holy Quran using word embeddings

Department

Department of Computer Science

Was this content written or created while at IBA?

Yes

Document Type

Conference Paper

Publication Date

3-1-2020

Author Affiliation

  • Sumaira Saeed is a PhD Scholar at the Department of Computer Science, Institute of Business Administration, Karachi
  • Sajjad Haider is a Professor at the Institute of Business Administration, Karachi
  • Quratulain Rajput is an Assistant Professor at the Institute of Business Administration, Karachi

Conference Name

2020 International Conference on Emerging Trends in Smart Technologies, ICETST 2020

Conference Location

Karachi, Pakistan

Conference Dates

26-27 March 2020

ISBN/ISSN

85084946931 (Scopus)

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Abstract / Description

Finding the semantic text similarity (STS) between two pieces of text is a well-known problem in Natural Language Processing. It has applications in nearly every field, such as plagiarism detection, finding related user queries in customer service, and finding similar questions in search engines or on forums like Stack Overflow, Quora and Stack Exchange. Applied to a religious text, it can help show how related pieces of knowledge are described in different places. This paper uses Word2Vec and Sent2Vec models to facilitate knowledge extraction from a given corpus. It makes use of several English translations of the Holy Quran, the most sacred book for Muslims. Sent2Vec models were trained on several translations of the book, and the trained models were then used to study the semantic relationships between different words and sentences. The performance of the custom-built word embeddings is compared against the pre-trained embeddings provided by the spaCy library.
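To illustrate how word embeddings can score the similarity of two verses, here is a minimal sketch. It assumes each sentence vector is the average of its word vectors and that similarity is measured by cosine similarity. This is a common baseline and is not necessarily the exact procedure used in the paper. The three-dimensional vectors in TOY_VECTORS and the helper names (sentence_vector, cosine_similarity, verse_similarity) are hypothetical and for illustration only. In practice, the vectors would come from Word2Vec or Sent2Vec models trained on the Quran translations.

```python
import math

# Hypothetical toy word vectors for illustration only; real embeddings
# trained on Quran translations would have hundreds of dimensions.
TOY_VECTORS = {
    "god": [0.9, 0.1, 0.0],
    "allah": [0.85, 0.15, 0.05],
    "merciful": [0.2, 0.9, 0.1],
    "compassionate": [0.25, 0.85, 0.1],
    "water": [0.0, 0.1, 0.95],
    "rain": [0.05, 0.05, 0.9],
}


def sentence_vector(sentence, vectors):
    """Average the vectors of in-vocabulary words; other words are skipped."""
    tokens = [w.strip(".,;:!?") for w in sentence.lower().split()]
    known = [w for w in tokens if w in vectors]
    if not known:
        return None
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[w][i] for w in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verse_similarity(s1, s2, vectors=TOY_VECTORS):
    """Hypothetical helper: similarity of two sentences via averaged vectors."""
    v1 = sentence_vector(s1, vectors)
    v2 = sentence_vector(s2, vectors)
    if v1 is None or v2 is None:
        return 0.0  # no shared vocabulary to compare
    return cosine_similarity(v1, v2)


if __name__ == "__main__":
    print(verse_similarity("God is merciful.", "Allah is compassionate."))
    print(verse_similarity("God is merciful.", "Rain and water."))
```

With these toy vectors, the first pair (different wording, related meaning) scores close to 1, while the second pair scores much lower. This is the behaviour an STS system built on embeddings aims for.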
