Ethical Decision-Making in Humans and Large Language Models: Insights from Five Moral Domains

Mohammad Nadeem, Shagufta Afreen

Abstract

Introduction: The rapid integration of large language models (LLMs) into sensitive domains such as healthcare, law, and public policy has intensified scrutiny of their ethical decision-making. Although LLMs can express structured reasoning, their capacity to mirror human moral intuitions—especially in socially and emotionally complex situations—remains uncertain.


Objectives: This study assesses the alignment between ethical judgments made by five widely used LLMs (GPT-4.0, Copilot, Gemini, Perplexity AI, DeepSeek) and those of human participants across diverse dilemmas.


Methods: We developed 30 binary-choice ethical dilemmas spanning five domains: moral reasoning; fairness and bias; relational ethics; accountability and transparency; and privacy and human rights. LLM responses were gathered using standardized API prompts; human judgments came from an online survey of 150 demographically varied participants. Agreement rates between each model and the human majority were calculated and compared across domains.
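
For clarity, the agreement-rate calculation described above can be sketched as follows. The data structures (human_votes, model_votes, domains), the tie-breaking behavior, and all names are illustrative assumptions, not the study's actual code.

    # Minimal sketch of the agreement-rate computation (assumed structure, not the authors' code).
    from collections import Counter

    # Hypothetical inputs:
    #   human_votes[d]    -> list of "A"/"B" choices from survey participants for dilemma d
    #   model_votes[m][d] -> the single "A"/"B" choice of model m for dilemma d
    #   domains[d]        -> domain label for dilemma d (e.g., "relational ethics")

    def human_majority(votes):
        """Return the majority choice ('A' or 'B') among human participants (ties break arbitrarily)."""
        return Counter(votes).most_common(1)[0][0]

    def agreement_rates(human_votes, model_votes, domains):
        """Per-model, per-domain share of dilemmas where the model matches the human majority."""
        rates = {}
        for model, choices in model_votes.items():
            by_domain = {}
            for d, choice in choices.items():
                dom = domains[d]
                hit = int(choice == human_majority(human_votes[d]))
                n_hit, n_total = by_domain.get(dom, (0, 0))
                by_domain[dom] = (n_hit + hit, n_total + 1)
            rates[model] = {dom: hits / total for dom, (hits, total) in by_domain.items()}
        return rates

Averaging the resulting per-domain rates over all models and domains would yield the overall human–AI agreement figure reported in the Results.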


Results: Overall human–AI agreement averaged 62%. Alignment peaked at 67% in the fairness and accountability/transparency domains and fell below 45% in relational ethics. Model-specific tendencies emerged: Gemini favored outcome-focused (utilitarian) reasoning, Copilot leaned toward rule-based logic, and DeepSeek and Perplexity showed moderate flexibility but systematic privacy biases.


Conclusions: Current LLMs can reproduce structured ethical rules yet struggle with culturally and affectively nuanced judgments. We recommend ethically annotated training data and multidimensional evaluation frameworks to improve moral alignment and public trust.
