Deep Hierarchical Clustering for Enhanced Analysis of Genome-Wide DNA Promoters
Abstract
Genome-wide analysis of DNA promoters is essential for understanding gene regulation and transcriptional activity, and it provides insight into cellular function and disease mechanisms. Traditional promoter analysis methods often struggle with high-dimensional genomic data, which leads to poor clustering accuracy and limited biological insight. Deep hierarchical clustering (DHC) addresses this problem by using deep learning to uncover hidden patterns in complex promoter sequences. The proposed DHC model combines a convolutional neural network (CNN) with a hierarchical clustering framework to improve both clustering accuracy and biological interpretability. In the first stage, the CNN extracts high-dimensional features from promoter sequences; in the second stage, these features are grouped by agglomerative hierarchical clustering based on cosine similarity. This two-stage architecture enables precise identification of promoter subtypes and regulatory elements. Experimental validation on publicly available genome-wide datasets shows that the proposed DHC model achieves better clustering accuracy, silhouette score, and biological consistency than k-means, hierarchical clustering, and Gaussian mixture models. In particular, the model improves accuracy by 7.3% over existing hierarchical clustering techniques. These findings highlight the potential of deep hierarchical clustering for large-scale genomic analysis and promoter classification, and they offer a powerful tool for exploring gene regulation mechanisms.
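The two-stage idea described in the abstract (convolutional feature extraction followed by agglomerative clustering on cosine similarity) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not specify the network architecture, the linkage criterion, or the data, so everything concrete here is an assumption: the two fixed motif filters (`TATAAA`, `GGGCGG`) stand in for filters a trained CNN would learn, average linkage is chosen arbitrarily, and the four toy sequences and all function names are hypothetical.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values in A, C, G, T order."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def conv_max_pool(seq, filters):
    """Slide each filter along the sequence (1D convolution), apply a ReLU
    floor of 0, and keep the maximum response (global max pooling)."""
    x = one_hot(seq)
    feats = []
    for f in filters:
        width = len(f)
        best = 0.0
        for i in range(len(x) - width + 1):
            score = sum(f[j][c] * x[i + j][c]
                        for j in range(width) for c in range(4))
            best = max(best, score)
        feats.append(best)
    return feats

def cosine_distance(u, v):
    """Return 1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def agglomerative(features, n_clusters):
    """Average-linkage agglomerative clustering on cosine distance.
    Starts with one cluster per item and merges the closest pair
    until n_clusters remain. Returns one integer label per item."""
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(cosine_distance(features[i], features[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(features)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical stand-ins for learned CNN filters: a TATA-box-like motif
# and a GC-box-like motif, expressed as one-hot weight matrices.
FILTERS = [one_hot("TATAAA"), one_hot("GGGCGG")]

# Toy sequences (invented for illustration, not real promoters).
sequences = [
    "GCGCTATAAAGGCT",
    "ACGTTATAAACGTA",
    "TTGGGCGGAACCTT",
    "CAGGGCGGTTACAC",
]

features = [conv_max_pool(s, FILTERS) for s in sequences]
labels = agglomerative(features, n_clusters=2)
print(labels)
```

In this toy setup the first two sequences contain the TATA-like motif and the last two contain the GC-like motif, so the feature vectors point in different directions and the clustering separates them. A realistic pipeline would replace the fixed filters with a trained network and the quadratic-time merge loop with a library implementation.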