Originally published December 10, 2012
Big data is all the rage these days because of a combination of forces, including the continued growth of data volumes, the increased velocity of data creation and updates from a variety of internal and external sources, and the availability of easily installed tools for building scalable analytics platforms using commodity hardware.
Much as the expanded capacity of the interstate highway system drove a boom in automobile use, improvements in computational power and speed for business intelligence (BI) and analytics applications enable broader dissemination of actionable knowledge in organizations. Couple that with business users’ demands for faster access to information to speed up decision making, and the pressure to provide right-time intelligence capabilities grows dramatically. But in many organizations, a bottleneck in the technology infrastructure causes unwanted delays in the delivery of information. What can be done to break that bottleneck?
Batch extract, transform and load (ETL) processes might have been satisfactory when your data warehouse was refreshed every month. Now that pervasive, right-time analytics seems to be within reach, however, the batch-oriented approach is insufficient to meet today’s data integration and delivery needs, let alone tomorrow’s.
Greater storage capacities and more powerful computers allow more and more data to be generated, published and ultimately captured and stored for analysis. Flowing all that data into analytical environments enables more reports and BI information to be streamed to the operational environment for business units to act on. Data scientists want to combine these data streams with the masses of data collected and archived over the years to support deep analytics applications.
The lingering barrier to success lies in the only part of the technology infrastructure that has not scaled to meet the new demands: data provisioning, or the timely delivery of massive data sets from source systems to BI and analytics platforms. Basically, the ability to provide integrated analytics capabilities to the growing community of business users is being throttled by the inability to provide rapid access to consistent and up-to-date data. Unless organizations address the challenge of data latency, data provisioning will continue to be the biggest bottleneck to increased productivity and accurate business decision making. Data latency has several negative consequences.
Delayed access to data archives. Big data platforms are increasingly being used as interim data archival systems. These “warm archives” must be loaded with data coming from both internal and external sources, and timely migration of the data is necessary for indexing, searching, matching and then delivering information to business users. Data latency reduces the performance and effectiveness of the systems.
Longer development cycles for analytics applications. The process of developing advanced analytics applications consists of a series of iterative steps involving the development, testing and scoring of analytical models. Big data analytics applications need to be designed using large data sets, and each repetition of the model-test-score cycle may require that the data sets be tweaked and reloaded onto the development platform. Slow data availability lengthens the application development cycle and may result in missed business opportunities.
A lack of BI and analytics scalability. Demand for real-time BI and analytics capabilities from a wider community of users could cause an explosion in the number and types of analyses performed simultaneously in an organization. But that would require simultaneous availability of current data to power all of those analyses, which would be hard to achieve if data delivery were sluggish.
Delayed and questionable decision making. Lags in data delivery both into and out of BI and big data systems delay the provision of actionable information to business decision makers. At the same time, data latency raises concerns about data currency and consistency that can contribute to uneasiness about the trustworthiness of analytical results and, ultimately, of the decisions based on those results.
Recent articles by David Loshin