StackEnPred Framework for Enhancing Antimicrobial Peptides Prediction with Sequence-Based Features and Ensemble Machine Learning
Abstract
Antimicrobial peptides (AMPs) are typically short peptides that play important roles in many biological processes and act against a range of organisms. Antibiotics have long been a cornerstone of treatment for bacterial infections, but their overuse has driven the evolution and dissemination of microbial resistance mechanisms. This calls for innovative strategies to speed up the discovery of AMPs, which are promising alternatives to traditional antibiotics. Experimental identification of AMPs is costly and time-consuming, so machine learning-based computational methods can be employed to identify AMP sequences and expedite discovery. This research introduces StackEnPred, a stacked ensemble learning framework that combines two sequence-based feature encodings, Amino Acid Composition (AAC) and Dipeptide Composition (DPC), to predict AMPs. The model is trained on the Deep-AmPEP30 dataset, which comprises 1,777 sequences after preprocessing. StackEnPred consists of two layers. The base learner layer combines Stochastic Gradient Descent (SGD), K-Nearest Neighbors (KNN), Random Forest (RF), and Support Vector Machine (SVM) classifiers. The meta learner layer is a Multilayer Perceptron (MLP), capable of capturing nonlinear interactions for final classification. StackEnPred achieves an accuracy of 83%, an AUC-ROC of 0.89, and a Matthews Correlation Coefficient (MCC) of 0.6484, outperforming standalone models (SVM: 82% accuracy; RF: 81%) and deep learning architectures (CNN: 79%).
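As an illustration of the two sequence-based encodings named in the abstract, the following is a minimal sketch of AAC and DPC in plain Python. It uses the standard definitions (residue frequencies over the 20 canonical amino acids, and frequencies of the 400 ordered adjacent residue pairs). The function names `aac`, `dpc`, and `encode`, the handling of non-standard residues, and the concatenation into a single 420-dimensional vector are assumptions made for this example, not details confirmed by the abstract.

```python
from itertools import product

# The 20 canonical amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino Acid Composition: frequency of each canonical residue (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Non-standard residues (e.g. 'X') are not counted, so the values
    # sum to less than 1 when such residues are present (an assumption here).
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide Composition: frequency of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return [counts[d] / n_pairs for d in DIPEPTIDES]


def encode(seq):
    """Hypothetical combined feature vector: AAC followed by DPC (420 values)."""
    return aac(seq) + dpc(seq)
```

For example, `encode("GIGKFLHSAK")` returns a list of 420 values that could be passed to the base learners. Whether StackEnPred concatenates the two encodings in this way or feeds them to the learners separately is not stated in the abstract.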