Abstractive Gujarati Text Summarization Using Sequence-To-Sequence Model and Attention Mechanism
Abstract
Introduction: In recent years, text summarization has been one of the prominent problems in natural language processing (NLP). It produces a consolidated brief of a large text document. Extractive and abstractive are the two output-based summarization techniques. For Indian languages, much research has been carried out on extractive summarization, while the performance of abstractive summarization remains a challenge for a language like Gujarati. With the rise of digital Gujarati news portals, automatic summarization can provide concise versions of news articles and make it easier for readers to grasp key information quickly.
Objectives: We aim to create an effective and efficient abstractive text summarizer for Gujarati text that can generate an understandable and expressive summary.
Methods: Our model is a sequence-to-sequence model built on an encoder-decoder architecture with an attention mechanism. The LSTM-based encoder-decoder with attention generates human-like sentences that retain the core information of the original documents.
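The attention step of such a model can be sketched in plain NumPy: at each decoding step, the decoder state is scored against every encoder hidden state, the scores are normalized with a softmax, and the resulting weights form a context vector. This is a minimal illustration rather than the paper's implementation; the dot-product (Luong-style) scoring, the array sizes, and the function name are assumptions.

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """Dot-product attention: weight encoder states by relevance to the decoder state."""
    # Alignment scores: one scalar score per source position.
    scores = encoder_states @ decoder_state          # shape: (src_len,)
    # Softmax (numerically stabilized) turns scores into a distribution over positions.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: attention-weighted sum of encoder hidden states.
    context = weights @ encoder_states               # shape: (hidden,)
    return context, weights

# Illustrative sizes: 5 source tokens, hidden dimension 8.
rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 8))
dec = rng.standard_normal(8)
ctx, w = attention_context(dec, enc)
```

The context vector is then concatenated with the decoder state to predict the next target word, which is what lets the decoder focus on different parts of the source article at each step.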
Results: Our experiments demonstrate the effectiveness of the proposed model, which achieves an accuracy of up to 87% and reduces the loss to 0.48 on Gujarati text.
Novelty: In terms of NLP, Gujarati is a low-resource language for researchers, especially for text summarization. To achieve our goal, we created our own dataset by collecting Gujarati text data, such as news articles and their headlines, from online and offline resources like daily newspapers. Because Gujarati has unique grammatical structures and morphology, we proposed a Gujarati-specific pre-processor (GujProc) to handle these linguistic features during pre-processing.
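One building block of a Gujarati-specific pre-processing pass could be script-aware tokenization: keeping only runs of characters from the Gujarati Unicode block (U+0A80–U+0AFF) and discarding foreign-script noise. The function below is a hypothetical sketch of such a step, not the actual GujProc implementation.

```python
import re

# The Gujarati script occupies Unicode block U+0A80-U+0AFF.
GUJARATI_TOKEN = re.compile(r"[\u0A80-\u0AFF]+")

def gujarati_tokens(text):
    """Return runs of Gujarati characters as tokens, dropping everything else."""
    return GUJARATI_TOKEN.findall(text)

# A Gujarati phrase mixed with Latin text and digits.
sample = "ગુજરાતી સમાચાર 2023, today!"
tokens = gujarati_tokens(sample)   # ['ગુજરાતી', 'સમાચાર']
```

A full pre-processor would add further language-specific steps (e.g. stopword removal and morphological normalization), but a script filter of this kind is a natural first stage.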