Integration of Statistical and Machine Learning Models for Time Series Forecasting in Optimizing Decision Making for Smart Waste Management

Carlos A. Villanueva, Thelma D. Palaog

Abstract

Introduction: Smart decision-making for efficient and effective waste management relies on contemporary technologies such as predictive analytics and machine learning. These technologies support forecasting of waste generation patterns and planning of collection schedules and resource deployment. With data-driven insights, municipalities can increase sustainability, lower operational expenses, and adopt more environmentally friendly waste management strategies.


Objectives: This research develops and compares time series and machine learning models (ARIMA, Random Forest, and XGBoost) for predicting the amount of biodegradable waste collected (weight_kg) over time. Using both temporal patterns and structured features such as year and population, the study determines which modeling technique produces the most accurate and reliable forecasts. Model performance is evaluated with common measures (MAE, MSE, RMSE, MAPE, and R² score) in order to choose the best method for producing short-term (1-year) predictions to inform data-driven waste management planning.
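
The evaluation measures named above can be illustrated with a minimal sketch. It assumes the standard textbook definitions of MAE, MSE, RMSE, MAPE, and R²; the function name `forecast_metrics` and the sample values are hypothetical and are not taken from the study's dataset or code.

```python
import math


def forecast_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, MAPE (%), and R² for two equal-length lists."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined when any actual value is zero (division by zero).
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    # R² compares the model's squared error with that of a mean-only baseline.
    # It is undefined when all actual values are identical (ss_tot == 0).
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return {"mae": mae, "mse": mse, "rmse": rmse, "mape": mape, "r2": r2}


# Hypothetical weights in kg, for illustration only.
actual_kg = [10.0, 12.0, 14.0, 16.0]
predicted_kg = [11.0, 11.0, 15.0, 15.0]
metrics = forecast_metrics(actual_kg, predicted_kg)
print(metrics)
```

Lower MAE, MSE, RMSE, and MAPE indicate smaller forecast errors, while a higher R² indicates that more of the variation in the observed values is explained.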


Methods: A machine learning workflow was adopted to study and forecast waste generation patterns. The workflow comprises data collection, preprocessing, feature selection, model training, testing, and validation to develop a predictive model.
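
One step of such a workflow, separating training data from testing data, might look like the sketch below. The abstract does not state how the data were split; a chronological hold-out (the most recent observations reserved for testing) is assumed here because it is a common choice for time series. The function name `chronological_split`, the 20% test fraction, and the sample values are all hypothetical.

```python
def chronological_split(series, test_fraction=0.2):
    """Split a time-ordered list into (train, test) without shuffling."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = max(1, round(len(series) * test_fraction))
    if n_test >= len(series):
        raise ValueError("series is too short to split")
    # The last n_test observations form the test set, so the model is
    # never evaluated on data that precedes its training period.
    return series[:-n_test], series[-n_test:]


# Hypothetical weights in kg, ordered from oldest to newest.
weights_kg = [12.1, 13.4, 11.8, 14.0, 13.2, 12.7, 14.5, 13.9, 12.3, 13.6]
train, test = chronological_split(weights_kg)
print(len(train), len(test))
```

Keeping the split in time order avoids leaking future observations into training, which would make forecast errors look smaller than they are.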


Results: Of the three models tested (ARIMA, Random Forest, and XGBoost), ARIMA was judged the best performer, with MAE 0.62, MSE 0.71, RMSE 0.84, MAPE 4.38%, and a positive pseudo-R² of 0.2254. Random Forest performed comparably on the reported metrics (MAE 0.62, MSE 0.68, RMSE 0.82, R² 0.23), following closely after ARIMA. XGBoost, by contrast, performed poorly, with higher errors (MAE 1.41, MSE 3.58, RMSE 1.89) and a negative R² of -0.4312. Overall, ARIMA was selected as the best model for this dataset.
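
For readers interpreting these figures, the relationships between the metrics can be written out as follows. This assumes the standard definitions; the abstract does not define its pseudo-R², and the symbols $y_t$ (observed weight), $\hat{y}_t$ (forecast), $\bar{y}$ (mean of the observed values), and $n$ (number of evaluation points) are introduced here only for illustration.

```latex
\begin{align}
\mathrm{MSE} &= \frac{1}{n}\sum_{t=1}^{n} \left(y_t - \hat{y}_t\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \\
R^2 &= 1 - \frac{\sum_{t=1}^{n} \left(y_t - \hat{y}_t\right)^2}
                {\sum_{t=1}^{n} \left(y_t - \bar{y}\right)^2}
\end{align}
```

The reported RMSE values agree with the reported MSE values: $\sqrt{0.71} \approx 0.84$, $\sqrt{0.68} \approx 0.82$, and $\sqrt{3.58} \approx 1.89$. Under this definition, $R^2$ is negative whenever the squared forecast error exceeds the squared deviation of the observations from their mean, meaning the forecasts are less accurate than simply predicting the mean of the evaluation data.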


Conclusions: ARIMA was judged the best performer, with low errors and a positive R², followed very closely by Random Forest. XGBoost performed worse, with higher errors and a negative R². To improve model performance further, particularly for the machine learning models, more features from the dataset (e.g., waste category, location, day of week, month, or type of collection service) may be added. These features may capture patterns and seasonal effects missed by simpler models, enabling more robust and generalizable forecasting. Integrating such variables may improve the predictive power of tree-based and deep learning models, potentially surpassing ARIMA in future iterations.
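
As an illustration of the kind of feature enrichment suggested above, the sketch below derives calendar features from a collection date using only the Python standard library. The function name `calendar_features`, the feature names, and the sample date and population are hypothetical; the study's actual dataset schema is not described in the abstract.

```python
from datetime import date


def calendar_features(collection_date, population):
    """Build a feature dictionary for one collection record."""
    day_of_week = collection_date.weekday()  # Monday = 0 ... Sunday = 6
    return {
        "year": collection_date.year,
        "month": collection_date.month,
        "day_of_week": day_of_week,
        "is_weekend": day_of_week >= 5,
        "population": population,
    }


# Hypothetical record, for illustration only.
row = calendar_features(date(2024, 3, 15), population=50_000)
print(row)
```

Features such as month and day of week give tree-based models a direct way to represent seasonal and weekly effects that a year-and-population feature set cannot express.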
