The Arabic–English Parallel Corpus of Authentic Hadith

Authors

  • Shatha Altammami, University of Leeds
  • Eric Atwell, University of Leeds
  • Ammar Alsalka, University of Leeds

Keywords:

Hadith, Parallel Corpus, NLP, Language Resource

Abstract

We present a bilingual parallel corpus of Islamic Hadith, the set of narratives reporting different aspects of the Prophet Muhammad's life. The Hadiths are drawn from the six canonical Hadith books, whose distinctive linguistic features and patterns are automatically extracted and annotated using a domain-specific tool for Hadith segmentation. In this article, we present the methodology used to create the corpus of 39,038 annotated Hadiths, which will be made freely available to the research community.

Published

2025-07-14