Ensemble Learning Framework for Crop Yield Prediction with Optuna Hyperparameter Tuning
Abstract
The growing risk of food scarcity, together with climate-change-induced shifts in agriculture, demands precise crop yield prediction (CYP). Most existing machine learning (ML) and deep learning (DL) methods have difficulty integrating complex models with diverse data sources and accommodating different agro-ecological regions, and existing solutions do not offer a fully automated, explainable ensemble approach at this scale. This research proposes an automated and explainable ensemble learning framework that uses Optuna for hyperparameter optimization to tune eight regressor models, including Gradient Boosting, XGBoost, LightGBM, CatBoost, Random Forest, Bagging Regressor, and KNN, for improved accuracy and generalization. By combining multi-source agricultural data with Explainable AI (XAI), the approach seeks to achieve high performance while retaining interpretability. Among the classical ML models, the traditional Gradient Boosting model performed best, achieving R² = 0.999 and RMSE = 3298.326. Other traditional ML models could not match the performance of the proposed optimized models. SHAP analyses identified the amount of pesticide applied, temperature, and rainfall as important explanatory factors underlying yield variability, enabling precision farming. By integrating automation, optimization, and advanced algorithms, the work enables more intelligent agricultural forecasting that helps farmers make better data-driven decisions.