An AI-Powered Conversational Agent for Natural Disaster Management

Hathairat Ketmaneechairat, Sopida Tuammee, Maleerat Maliyaem, Valentin Obert, Samuel Jully

Abstract

This work investigates the effectiveness of lightweight large language models (LLMs) in supporting the design of specialized, domain-restricted chatbots. The study focuses on compact off-the-shelf models, specifically Phi-3, LLaMA 3-8B, and Mistral 7B, and evaluates their ability to maintain topic relevance, coherence, and overall response quality under controlled experimental conditions. To enforce domain specificity, each model is guided by a carefully designed system prompt that instructs it to respond exclusively to questions about natural disasters. The evaluation follows a multi-dimensional approach combining human judgments, benchmark testing, and automated evaluation with GPT-4. A custom web-based application was developed that lets users submit questions, view side-by-side responses from two models, and rate the outputs for helpfulness, relevance, and clarity. In addition to this interactive human evaluation, the models are assessed on standard benchmarks such as Massive Multitask Language Understanding (MMLU), GSM8K, and MT-Bench, and are subjected to pairwise preference evaluation by GPT-4. Results indicate that, even without fine-tuning, lightweight models can handle domain-specific conversations effectively when guided by system prompts. The study offers practical insights into selecting the most appropriate model for building focused, efficient chatbots in resource-constrained environments.
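
The system-prompt-based domain restriction described in the abstract can be reproduced with any chat-completion interface. The following is a minimal sketch, assuming the compared models are served locally through Ollama; the model tags and the system prompt wording are illustrative assumptions, not the exact configuration used in the study.

# Minimal sketch: domain-restricted chat with lightweight local models.
# Assumes the models are served via Ollama (pip install ollama); the system
# prompt text and model tags are illustrative, not the study's exact setup.
import ollama

SYSTEM_PROMPT = (
    "You are a disaster-management assistant. Answer only questions about "
    "natural disasters such as earthquakes, floods, storms, and wildfires. "
    "If a question is off-topic, politely decline to answer."
)

def ask(model: str, question: str) -> str:
    """Send one user question to the given model under the fixed system prompt."""
    response = ollama.chat(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response["message"]["content"]

if __name__ == "__main__":
    question = "What should I do during an earthquake?"
    # Print responses from several candidate models for side-by-side comparison,
    # in the spirit of the web application described in the abstract.
    for model in ("phi3", "llama3:8b", "mistral:7b"):
        print(f"--- {model} ---")
        print(ask(model, question))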
