The Arabic–English Parallel Corpus of Authentic Hadith
Keywords:
Hadith, Parallel Corpus, NLP, Language Resource
Abstract
We present a bilingual Arabic–English parallel corpus of Islamic Hadith, the set of narratives reporting different aspects of the Prophet Muhammad's life. The collection is drawn from the six canonical Hadith books, whose distinctive linguistic features and patterns are automatically extracted and annotated using a domain-specific tool for Hadith segmentation. In this article, we describe the methodology used to create the corpus of 39,038 annotated Hadiths, which will be freely available to the research community.
Published
2025-07-14