Comprehensive Analysis of Arabic Tokenization System Preprocessing using the Matching Model

Ibrahim Abdelfattah Almajali

doi:10.52783/jisem.v10i4.8981

Published: Apr 30, 2025

Keywords:

Natural Language Processing, Arabic word tokenization, Arabic Language Processing, PoS Tagging, Maximum Matching Model

Ibrahim Abdelfattah Almajali, Mutlaq Moraya Nafah Alharbi

Abstract

This research paper proposes a novel knowledge-based Arabic word tokenization system. Word tokenization is the first stage of higher-order Natural Language Processing (NLP) tasks such as Part-of-Speech (PoS) tagging, parsing, and named entity recognition. The amount of text on the World Wide Web grows daily, which calls for more advanced processing tools. As the number of Arabic speakers worldwide increases, Arabic language processing systems must improve as well. Arabic is difficult to tokenize because its script has no capitalization and makes heavy use of compound words. Its cursive form also leads to inconsistent use of spaces between words, so whitespace alone is not a reliable word boundary. Once words have been tokenized, a range of downstream applications can be built on top of them. The system is developed using a maximum matching model in its two variations, forward and reverse maximum matching, and is implemented in Python. The evaluation results reported for the system indicate high performance.
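
To make the two variations named in the abstract concrete, the following is a minimal sketch of generic forward and reverse maximum matching. It is not the authors' implementation: the function names, the toy lexicons, and the single-character fallback for unknown text are assumptions made for illustration, and the input is assumed to be a string with whitespace already removed.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Scan left to right, taking the longest lexicon entry at each position."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the lexicon matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Assumed fallback: an unknown single character becomes its own token.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def reverse_max_match(text, lexicon, max_len=None):
    """Scan right to left, taking the longest lexicon entry ending at each position."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the string
    return tokens


# Hypothetical toy lexicon where the two scan directions disagree.
toy_lexicon = {"ab", "abc", "cd", "d"}
print(forward_max_match("abcd", toy_lexicon))  # ['abc', 'd']
print(reverse_max_match("abcd", toy_lexicon))  # ['ab', 'cd']

# Hypothetical Arabic lexicon; the input has no space between the two words.
arabic_lexicon = {"ذهب", "الولد", "ال", "ولد"}
print(forward_max_match("ذهبالولد", arabic_lexicon))
print(reverse_max_match("ذهبالولد", arabic_lexicon))
```

On the toy lexicon the two directions produce different segmentations, which is the usual reason for running both and comparing them. How the proposed system resolves such disagreements is not stated in the abstract.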

Issue

Vol. 10 No. 4 (2025)

Section

Articles

Journal of Information Systems Engineering and Management
