Comparative Analysis of the Quran, Bible, and Tanakh (English Versions) using Natural Language Processing, Python, Data Mining, and Machine Learning

Farhana Akter
Nov 3, 2020


In the name of Allah, Most Gracious, Most Merciful

Project Manager: Dr. Farhana Akter

Research Team: AIsoftsolution

The Aim of The Project

  1. Compare documents (Quran, King James Bible, and Tanakh) for similarity using Python and NLP.
  2. Compare the Quran (Muslim scripture), the Tanakh (Jewish scripture), and the King James Bible (Christian scripture) using machine-learning text similarity.
  3. Find text similarity (Quran, Tanakh & King James Bible) using NLP and machine learning.
  4. Find text matching (Quran, Bible, and Tanakh) with Deep Learning.

Steps for Document Similarity

Data Reading:

We read the source files with pandas and PyPDF2 and saved the extracted text in a Python list.
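A minimal sketch of the PDF part of this step, not the project's actual code. It assumes PyPDF2 version 2.0 or later (which provides `PdfReader`); the helper names `join_pages` and `read_pdf` and the file names in the usage note are hypothetical.

```python
def join_pages(page_texts):
    """Join per-page strings into one document, skipping empty pages."""
    return "\n".join(t.strip() for t in page_texts if t and t.strip())


def read_pdf(path):
    """Extract the text of every page of a PDF into a single string."""
    # Imported lazily so the helper above works without PyPDF2 installed.
    # Assumption: PyPDF2 >= 2.0, which exposes PdfReader.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return join_pages(page.extract_text() for page in reader.pages)
```

With hypothetical file names, the list of documents could then be built as `documents = [read_pdf(p) for p in ["quran.pdf", "kjv_bible.pdf", "tanakh.pdf"]]`.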

Data Cleaning:

We then cleaned the data with the NLTK package and regular expressions, removing everything other than text: punctuation, special characters, and numeric data.
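The regular-expression part of this cleaning can be sketched as follows. This is an illustration only: it uses the standard `re` module, lowercasing is an added assumption, and plain whitespace splitting stands in for an NLTK tokenizer. The function names are hypothetical.

```python
import re


def clean_text(raw):
    """Keep only letters and single spaces."""
    text = raw.lower()                     # assumption: case is not significant
    text = re.sub(r"[^a-z\s]", " ", text)  # drop digits, punctuation, special characters
    return re.sub(r"\s+", " ", text).strip()


def tokenize(raw):
    """Split cleaned text into word tokens (stand-in for an NLTK tokenizer)."""
    return clean_text(raw).split()


print(clean_text("In the beginning (Gen. 1:1), God created!"))
# prints: in the beginning gen god created
```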

Model Selection:

After studying different models for converting text into vectors, we selected fastText, a free, open-source, lightweight library for learning text representations and text classifiers.

With this model, we generated a dictionary from our textual data containing a weight vector for each word. We then computed whole-document similarity by calculating the cosine similarity between documents, which produced a similarity matrix for the books.
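The document-level comparison described above can be sketched in plain Python. The word-vector dictionary here is a toy stand-in for vectors learned by fastText, and averaging word vectors into a document vector is an assumption about how whole documents were represented; the post does not spell this out.

```python
import math


def document_vector(tokens, word_vectors):
    """Average the vectors of all known tokens into one document vector."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        raise ValueError("document contains no known tokens")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(doc_vectors):
    """Pairwise cosine similarity between all document vectors."""
    return [[cosine_similarity(a, b) for b in doc_vectors] for a in doc_vectors]


# Toy two-dimensional vectors, purely illustrative (not fastText output).
toy_vectors = {"light": [1.0, 0.0], "water": [0.0, 1.0]}
docs = [["light"], ["water"], ["light", "water"]]
matrix = similarity_matrix([document_vector(d, toy_vectors) for d in docs])
```

The resulting matrix is symmetric with ones on the diagonal, since every document is identical to itself.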

Result and Visualization

  • We plotted cosine distance on a heat map for distance visualization. It’s important to note that human error or technical issues may affect the accuracy and adequacy of the research results.
  • Supported tools: Jupyter notebook
  • Working language: Python
  • Machine learning model: fastText (a shallow neural-network model for learning word representations)
  • Distance finding: Cosine similarity

Similarity Result Between Quran, Tanakh, And Bible

  • The heat map shows that the Quran and King James Bible have a 69% similarity.
  • The heat map demonstrates that Tanakh and King James Bible have a 38% similarity.
  • The heat map shows that Tanakh and Quran have a 14% similarity.
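For illustration, the three reported figures can be arranged into a symmetric matrix and drawn as a heat map. This is a sketch, not the original notebook code: it assumes matplotlib is available for the plotting function, the diagonal is set to 1.0 because each book is identical to itself, and the function names and output file name are hypothetical.

```python
LABELS = ["Quran", "Tanakh", "King James Bible"]

# Pairwise similarities as reported in the list above.
REPORTED = {
    ("Quran", "King James Bible"): 0.69,
    ("Tanakh", "King James Bible"): 0.38,
    ("Tanakh", "Quran"): 0.14,
}


def build_matrix(labels, pair_scores):
    """Build a symmetric similarity matrix with 1.0 on the diagonal."""
    index = {name: i for i, name in enumerate(labels)}
    n = len(labels)
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), score in pair_scores.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = score
    return matrix


def plot_heatmap(labels, matrix, out_path="similarity_heatmap.png"):
    """Draw the matrix as an annotated heat map and save it to out_path."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    image = ax.imshow(matrix, vmin=0.0, vmax=1.0, cmap="viridis")
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            ax.text(j, i, f"{value:.0%}", ha="center", va="center", color="white")
    fig.colorbar(image, ax=ax, label="cosine similarity")
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)
```

Calling `plot_heatmap(LABELS, build_matrix(LABELS, REPORTED))` would write the figure to the hypothetical file `similarity_heatmap.png`.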

Conclusion

This research finds that the Quran and the King James Bible (English versions) are 69% similar, the highest similarity among the three pairs of books. Further studies are recommended to develop in-depth knowledge of this topic.
