Optimizing Big Data Analysis with Machine Learning: Clustering, Visualization, and Insight Extraction

Rajesh Govind Talekar, Avinash Vasantrao Khambayat

Abstract

Big Data analysis aims to extract valuable insights from large datasets. This study examines how machine learning can optimize Big Data analysis through clustering, visualization, and insight extraction. We demonstrate clustering techniques in different domains by applying them to the publicly available Mall Customer and Iris datasets. K-Means, Hierarchical Clustering, and DBSCAN are used to group the data and identify patterns. Principal Component Analysis (PCA) and t-SNE reduce the feature space for visualization, and dendrograms depict the hierarchical cluster structure; together they improve interpretability and give clear representations of complex patterns. The ideal number of clusters is assessed with the Silhouette score. In addition, classification methods such as Support Vector Machine (SVM), Logistic Regression, and K-Nearest Neighbour (K-NN) are used to assign the data to classes. The results show how clustering can support decision-making in economic and biological domains. Scalability challenges and optimization techniques for processing large datasets are discussed as directions for future work. Overall, the study illustrates how machine learning-based clustering can supply the insights needed for decision-making across different sectors.
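
As an illustration of selecting the number of clusters with the Silhouette score, the following is a minimal sketch in plain Python and not the authors' implementation. It makes several assumptions: the twelve 2-D points are hypothetical toy data (not the Mall Customer or Iris datasets), the helper names `farthest_point_init`, `kmeans`, and `silhouette_score` are introduced here for illustration only, and a simple deterministic farthest-point initialization is swapped in for whatever seeding the study used. In practice a library implementation would normally be preferred.

```python
import math

# Hypothetical toy data: three well-separated groups of 2-D points.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3),
    (9.0, 1.0), (9.2, 0.8), (8.8, 1.2), (9.1, 1.1),
]

def farthest_point_init(pts, k):
    """Deterministic seeding: start at pts[0], then repeatedly add the
    point that is farthest from its nearest already-chosen center."""
    centers = [pts[0]]
    while len(centers) < k:
        centers.append(max(pts, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(pts, k, iters=100):
    """Lloyd's algorithm; returns one integer cluster label per point."""
    centers = farthest_point_init(pts, k)
    labels = [0] * len(pts)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in pts]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels

def silhouette_score(pts, labels):
    """Mean of s = (b - a) / max(a, b) over all points, where a is the mean
    distance to the point's own cluster and b is the smallest mean distance
    to any other cluster. Singleton clusters contribute s = 0."""
    clusters = {}
    for p, lab in zip(pts, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(pts, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items() if m != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(pts)

# Score each candidate k and keep the one with the highest silhouette.
scores = {k: silhouette_score(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real data the same loop applies unchanged, although the pairwise distance computation in `silhouette_score` grows quadratically with the number of points, which is one face of the scalability challenge the abstract mentions.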
