Deep Learning for Autonomous Data Quality Enhancement: A Paradigm Shift in Machine Learning Pipelines

Subba Rao Katragadda, Ajay Tanikonda, Sudhakar Reddy Peddinti

Abstract

The effectiveness of machine learning models depends heavily on the quality, completeness, and reliability of their input data. Traditional preprocessing methods, however, struggle to automate data quality enhancement, particularly in large-scale and dynamic environments. This review explores the role of deep learning in autonomous data quality enhancement, with emphasis on advances in data cleaning, imputation, deduplication, anomaly detection, and bias mitigation. It analyzes Generative Adversarial Networks (GANs), autoencoders, transformer-based models, and self-supervised learning for their ability to improve data integrity and preprocessing efficiency. The paper also examines how deep learning integrates with data engineering pipelines and addresses challenges of scalability, interpretability, and computational overhead. Finally, we discuss future research directions and potential industry applications in which deep learning-driven data quality enhancement could reshape data preprocessing in machine learning workflows.
