In the first flush of considering a BI or analytics opportunity in the business and conceiving a solution that delivers exactly the right data needed to address that pesky problem, it's easy to forget the often rocky road of design and development ahead. More often forgotten, or sometimes ignored, is the ongoing drama of maintenance. Kalido, with their origins as an internal IT team solving a real problem for the real business of Royal Dutch Shell in the late '90s, have kept these challenges front and center.
All IT projects begin with business requirements, but data warehouses have a second, equally important, staring point: existing data sources. These twin origins typically lead to two largely disconnected processes. First, there is the requirements activity often called data modeling, but more correctly seen as the elucidation of a business model, consisting of function required by the business and data needed to support it. Second, there is the ETL-centric process of finding and understanding the existing sources of this data, figuring out how to prepare and condition it, and designing the physical database elements needed to support the function required.
Most data warehouse practitioners recognize that the disconnect between these two development processes is the origin of much of the cost and time expended in delivering a data warehouse. And they figure out a way through it. Unfortunately, they often fail to recognize that each time a new set of data must be added or an existing set updated, they have to work around the problem yet again. So, not only is initial development impacted, but future maintenance remains an expensive and time-consuming task. An ideal approach is to create an integrated environment that automates the entire set of tasks from business requirements documentation, through the definition and execution of data preparation, all the way to database design and tuning. Kalido is one of a small number of vendors who have taken this all-inclusive approach. They report build effort reductions of 60-85% in data warehouse development.
Conceptually, we move from focusing on the detailed steps (ETL) of preparing data to managing the metadata that relates the business model to the physical database design. The repetitive and error-prone donkey-work of ETL, job management and administration is automated. The skills required in IT change from programming-like to modeling-like. This has none of the sexiness of predictive analytics or self-service BI. Rather, it's about real IT productivity. Arguably, good IT shops always create some or all of this process- and metadata-management infrastructure themselves around their chosen modeling, ETL and database tools. Kalido is "just" a rather complete administrative environment for these processes.
Which brings me finally back to the shores of the data lake. As described, the data lake consists of a Hadoop-based store of all the data a business could ever need, in its original structure and form, and into which any business user can dip a bucket and retrieve the data required without IT blocking the way. However, whether IT is involved or not, the process of understanding the business need and getting the data from the lake into a form that is useful and usable for a decision-making requirement is exactly identical to that described in my third paragraph above. The same problems apply. Trust me, similar solutions will be required.
Posted March 17, 2014 5:33 AM
Permalink | 1 Comment |