Modern Data Store Selection: A Quantitative Framework for Enterprise Architecture
Abstract
Selecting data stores has become a central challenge in contemporary enterprise architecture: global data volumes are growing exponentially while organizations maintain stringent high-availability requirements. The economic impact of storage infrastructure failures is high, and most enterprises have experienced costly downtime that directly affects both revenue and customer satisfaction. Storage technology choices involve fundamental architectural trade-offs between row-oriented and column-oriented execution. Columnar systems have proven significantly superior for analytical workloads through late materialization, block iteration, and invisible joins, techniques whose benefits compound into large performance gains. Distributed systems must trade consistency guarantees against latency requirements, because even slight but measurable increases in response time degrade user interaction. Performance characteristics differ sharply across technology categories: relational and document databases behave differently under varying connection loads and workload mixes, while in-memory caching systems deliver sub-millisecond latencies at very high throughput. Service-level objectives translate into measurable error budgets that cap acceptable downtime, compelling architects to evaluate storage technologies under worst-case recovery scenarios rather than ideal-case performance. Modern enterprises increasingly adopt polyglot persistence, using specialized data stores matched to the properties of each workload. This architectural pattern, however, adds significant integration complexity: organizations must orchestrate frequent data flows across many systems, so pipeline reliability and synchronization, rather than the performance of any individual database, become the primary operational concerns.
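The error-budget reasoning summarized in the abstract can be illustrated with a minimal sketch. The function names, the 30-day window, and the example availability target are assumptions chosen for illustration and do not come from the article; the sketch only shows how an availability objective translates into allowed downtime, and how a store's worst-case recovery time might be checked against that allowance.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    `slo` is a fraction such as 0.999; the 30-day default is an assumption.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the interval (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def recovery_fits_budget(slo: float,
                         worst_case_recovery_minutes: float,
                         window_days: int = 30) -> bool:
    """Return True if a single worst-case recovery stays within the budget."""
    return worst_case_recovery_minutes <= error_budget_minutes(slo, window_days)


if __name__ == "__main__":
    # Hypothetical comparison: a 99.9% target versus a 60-minute recovery.
    budget = error_budget_minutes(0.999)
    print(f"Budget: {budget:.1f} min per 30 days")
    print("60-minute recovery fits:", recovery_fits_budget(0.999, 60.0))
```

Under these assumptions, a single failover that takes longer than the monthly budget exhausts it in one incident, which is why worst-case recovery time can matter more than ideal-case throughput when comparing candidate stores.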