Solving the Traveling Salesman Problem with Drones Using Proximal Policy Optimization and Deep Reinforcement Learning
Abstract
The increasing demand for scalable and efficient last-mile delivery has prompted the integration of drones with trucks in hybrid logistics systems. While reinforcement learning (RL) methods have shown promise in addressing the Traveling Salesman Problem with Drones (TSP-D), most approaches focus on single truck-drone coordination, limiting their real-world applicability. This paper introduces a novel multi-agent reinforcement learning (MARL) framework based on Proximal Policy Optimization (PPO) to address a multi-truck multi-drone TSP-D scenario. Each agent (truck or drone) learns a decentralized policy with a shared global reward, enabling real-time, cooperative route planning. An enhanced state representation captures vehicle positions, battery constraints, and inter-agent interactions. The actor-critic network employs deep residual layers and agent identity encoding to support dynamic adaptation and coordination. A Dijkstra-based module ensures drone reachability under energy constraints, while a task allocation mechanism balances delivery loads and prevents conflicts. Experiments on synthetic and real-world-inspired datasets demonstrate the proposed model’s superiority over single-agent PPO and classical metaheuristics in terms of delivery time, energy efficiency, and scalability. As the number of agents and delivery nodes grows, the system maintains high performance, demonstrating strong potential for real-time autonomous logistics in urban environments.
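The abstract mentions a Dijkstra-based module that verifies drone reachability under energy constraints. The following is a minimal sketch of how such a check could work; the graph representation, edge energy costs, and the `drone_reachable` function name are illustrative assumptions, not the paper's actual implementation.

```python
import heapq

def drone_reachable(graph, start, target, battery):
    """Dijkstra over per-edge energy costs: return True if the
    minimum-energy path from `start` to `target` fits within the
    drone's remaining `battery`.

    `graph` maps node -> list of (neighbor, energy_cost) edges.
    (Illustrative sketch; the paper's module may differ.)
    """
    dist = {start: 0.0}
    pq = [(0.0, start)]  # (accumulated energy cost, node)
    while pq:
        cost, node = heapq.heappop(pq)
        if node == target:
            return cost <= battery
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nbr, w in graph.get(node, []):
            new_cost = cost + w
            # Prune paths that already exceed the battery budget.
            if new_cost <= battery and new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                heapq.heappush(pq, (new_cost, nbr))
    return False
```

In a task-allocation loop, this check would filter out delivery nodes a drone cannot reach and return from before the policy assigns them, keeping infeasible actions out of the agent's action space.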