NLP Based Protein Sequence Classification using CNN


Prativesh Pawar, Pinaki Ghosh

Abstract

The ability to modify proteins so that they acquire improved or novel functions has driven a marked increase in interest in protein redesign within the pharmaceutical industry. Thanks to recent technological developments, the processes of natural selection, amplification, and mutation can now be simulated in the laboratory. However, a significant barrier remains: protein sequences are complex, and the number of potential mutations is enormous, so not every possible variant of a protein can be synthesised and evaluated. Despite advances in machine learning, protein prediction algorithms have had only limited success in predicting protein structures. Furthermore, most current approaches concentrate on a narrow set of traits associated with protein sequences.


This study presents a novel approach to classifying protein sequences using artificial intelligence (AI), specifically convolutional neural networks (CNNs). We evaluate the efficacy of three distinct prediction models, each trained on a combination of single-amino-acid descriptors and descriptors derived from three-dimensional protein structure. To assess the accuracy of our predictions, we applied a variety of evaluation metrics to both publicly available and proprietary datasets. The results show that CNN models trained on amino acid property descriptors are highly effective at addressing the challenges of protein structure estimation, which makes them particularly advantageous for pharmaceutical applications.
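The abstract does not specify the descriptor set or the network architecture. As a minimal sketch only, assuming per-residue descriptors made of a one-hot amino acid identity plus the Kyte-Doolittle hydropathy index, the forward pass of a one-layer 1D CNN classifier could look like the following. The function names (`encode`, `conv1d`, `classify`) and the hand-set filter weights are hypothetical illustrations, not the authors' model; a real model would learn its weights from labelled sequences, typically with a deep-learning framework.

```python
# Kyte-Doolittle hydropathy index: one example of a per-residue
# physicochemical descriptor (an assumption; the paper's descriptors are unspecified).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD_HYDROPATHY)  # fixed channel order for one-hot encoding


def encode(sequence):
    """Map a protein sequence to per-residue feature vectors:
    a 20-dim one-hot identity plus one hydropathy value (21 channels)."""
    features = []
    for residue in sequence.upper():
        if residue not in KD_HYDROPATHY:
            raise ValueError(f"unsupported residue: {residue!r}")
        one_hot = [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]
        features.append(one_hot + [KD_HYDROPATHY[residue]])
    return features


def conv1d(x, kernels, biases):
    """'Valid' 1D convolution along the sequence.
    x: L x C input; kernels: F filters, each K x C; returns (L-K+1) x F."""
    k = len(kernels[0])
    if len(x) < k:
        raise ValueError("sequence shorter than the convolution kernel")
    out = []
    for start in range(len(x) - k + 1):
        window = x[start:start + k]
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for pos in range(k):
                for ch, value in enumerate(window[pos]):
                    total += kernel[pos][ch] * value
            row.append(total)
        out.append(row)
    return out


def relu(matrix):
    return [[max(0.0, v) for v in row] for row in matrix]


def global_max_pool(matrix):
    """Keep each filter's strongest response over all positions, so
    sequences of any length map to a fixed-size vector."""
    return [max(column) for column in zip(*matrix)]


def classify(sequence, kernels, biases, weights, out_bias):
    """Forward pass: encode -> conv -> ReLU -> max pool -> linear score.
    Returns 1 if the score is positive, else 0 (binary classification)."""
    pooled = global_max_pool(relu(conv1d(encode(sequence), kernels, biases)))
    score = out_bias + sum(w * p for w, p in zip(weights, pooled))
    return 1 if score > 0 else 0


# Hypothetical hand-set filter: width 3, reads only the hydropathy channel,
# so it fires on hydrophobic stretches of three residues.
hydrophobic_filter = [[0.0] * 20 + [1.0] for _ in range(3)]
print(classify("GGILVGG", [hydrophobic_filter], [0.0], [1.0], -6.0))  # 1
print(classify("GDEKG", [hydrophobic_filter], [0.0], [1.0], -6.0))    # 0
```

With this filter, "ILV" gives a window hydropathy of 12.5, which clears the -6.0 output bias, while every window of "GDEKG" is negative and is zeroed by the ReLU. Global max pooling is what lets one set of weights score sequences of different lengths.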
