Exploration of Big Data Pipeline Solutions for Business Analysis: A Comprehensive Survey


Pallavi G B, Latha N R, Shyamala G, Kalyana Kiran B. S. Goli, D Revanth, Gamana Yeluri R, Harika N, Keerthi P Reddy

Abstract

The rapid growth of data has led to the emergence of many big data frameworks, such as Hadoop and Flink, alongside cloud-native platforms including Azure, AWS, and Google Cloud. Although these technologies enable efficient processing, storage, and analytics for business analysis, organizations face the dilemma of selecting an appropriate framework because the options differ in scalability, automation, and performance. Managed cloud platforms emphasize seamless integration and operational efficiency, but organizations receive no direct guidance on how to select the optimal pipeline for a given workload, especially when working with real-world, heterogeneous datasets such as Yelp. This research examines the challenges of big data processing, identifying the primary inefficiencies and architectural trade-offs, to offer insights into optimizing data workflows for business analysis and decision-making. Beyond comparing the platforms, this work also offers guidance on choosing the most suitable processing pipeline for a complex business dataset such as Yelp.
