Originally published January 6, 2010
ďIím quite happy with the reports I currently get from the data warehouse department, but I would like a new report in which historical sales data is combined with new sales data, and I mean up-to-date sales. If a customer has paid in one of our stores in Phoenix, Arizona, five minutes later I want to be able to see his sales in my report. Can you do that?Ē asks the regional manager confidently. The warehouse specialist swallows, coughs a little, frowns his eyebrows, and starts his answer with, ďWell ÖĒ and continues with something that resembles ďÖ that might be a pickle.Ē
This is the standard reaction of a data warehouse specialist if he or she is asked to implement operational business intelligence (BI), because thatís in fact what the regional manager asks. And we all know it will be hard, especially if our current warehouse architecture is not designed for operational BI (for example, because itís refreshed only once a day or once a week). It might also be a challenge if the tools used to develop the architecture do not support this concept. Transforming an existing warehouse environment to include operational BI is a technological challenge Ė itís a technological tour de force.
Operational BI is, without a doubt, a hot topic, which is shown in the BeyeNETWORK Research Report,† Operational Business Intelligence: The State of the Art, written by Judith R. Davis, Claudia Imhoff, and Colin White. This report includes the results of a study conducted in April 2009 to determine if and how organizations were using operational BI. The final conclusion was 80% were evaluating, deploying, planning, or running operational BI applications. Forty percent of that 80% already had operational BI in production. So, organizations are adopting operational BI.
But what exactly do we mean by operational BI in this article? Different people have different opinions about what it means. Some take the view that with the term operational BI, we refer to BI applications developed for operational managers (as opposed to tactical and strategic managers). Others think that operational BI means that users get access to very timely data, the so-called operational data. And I have heard other explanations of this term.
In this article, we follow the definition given in the report mentioned earlier. There, the authors distinguish between two types of operational BI: data analytics and event analytics. The former is analytics based on data in a data warehouse environment, and the latter is analytics based on events as they flow into and through operational systems. In this article, we focus on the first form of operational BI: data analytics. This implies that the data warehouse environment should be able to supply data with a currency level ranging from seconds to at most a few minutes.
Of course, we could try to run our BI applications directly on the production databases. However, almost everyone will let you know that thatís going to be risky. At least three potential problems exist. First, the queries that are executed might disturb the workload of the production databases significantly. We could get problems, for example, with locking, or if the query uses so many resources that it slows down the data entry applications. Letís call this the interference problem. Secondly, production databases are normally designed, optimized, and tuned to support transaction processing, thus to support inserts, updates and deletes. This setup might not be ideal for running complex BI queries. It might lead to slow-running queries. We call this the query performance problem. And thirdly, it might be impossible for users if they would like to do some forecasting where they, for example, combine new data with historical data because the production databases might not have been designed to keep track of historical data; only the data warehouse does. We call this the history problem. In fact, these are just three of the many reasons why the concept of data warehouse was invented years ago.
The text above assumes that the production databases donít record historical data, and thatís an assumption that is made in most textbooks on data warehouses as well. But what if it does? What would happen if we would design a database model that makes it possible to actually keep track of all the inserted, updated and deleted data in our production databases? In that case, the third problem would automatically disappear and only the interference and query performance problems would remain.
Not so long ago, I was invited by a Dutch consultancy company called Aorta BI Solutions to discuss two of their customers. What was special about these customers was that they kept track of historical data in their production databases. The original developers of these production databases and applications had developed them in such a way that historical data would not be lost. If data was changed or deleted, it is not really removed from the tables; they use a versioning mechanism. One of the two organizations has already been keeping track of historical data in their production databases for more than fifteen years.
For both organizations, a BI environment has been created by Aorta that accesses the production databases directly. So, no staging areas, no data warehouses, no data marts, no cubes, and no ETL processes were introduced. All data is stored once and that is in the production databases. What a simplicity and what a vision those developers had years ago! In fact, this is how it should be, and how it was described and taught years ago in the first books on database technology: store every fact once! Just read (or should I say re-read), for example, An Introduction to Database Systems by Chris J. Date.
Both organizations donít belong to the top thousand of largest organizations of the world, so they donít have petabyte-sized databases. But we canít call them small either. Both are medium-sized, international organizations with production databases of a few terabytes large. And letís be honest, thatís what most organizations have.
But what about the three problems we mentioned? The first problem, the one we called the interference problem, remains in principle. But is it really a problem? Both organizations report that they are not experiencing any form of interference problem. Even when brand new data is queried, the data entry users donít notice that.
With respect to query performance problems, one organization reports to have no problems whatsoever, and the other indicates that they think they are experiencing minor performance problems with one or two of their reports. From the start, the developers of Aorta BI Solutions had decided they would use Oracle BI Server to decouple reporting from the actual data storage technology. In fact, what they created was an architecture that is in line with the data delivery platform about which I published a few articles. To solve the query performance problems, they have many options. If desired, they can add a data mart or cube to speed up the queries. If they do that, there wonít be any need to change the reports. They will have to change some of the specifications in Oracle BI Server, but thatís it. The reports themselves will stay unchanged because the model that the reports see and use will stay unchanged. Another solution they might select is that they buy a bigger and faster server, or plug in more internal memory. But whatever they select, it is purely a consideration between performance and costs: How much are they willing to pay and how much performance improvement do they need? This is a good example of applying the architectural principles of the data delivery platform Ė an architecture in which the storage structure is decoupled from the reports to improve flexibility. Changes on the storage level have no impact on the reports.
The two organizations donít regard the history problem as a problem because they can look at old data and at brand new data. Itís all there in their production databases.
The best part was that when the customers were asked whether they were satisfied with their hyper-modern and flexible operational BI environment, their response was interesting: They had no idea what we were talking about. They had never heard of the term operational BI. And after explaining it, they didnít understand why operational BI was so special. These were ďjustĒ their reports. In other words, both organizations use operational BI by accident. I think some other organizations will be pretty jealous.
Claudia Imhoff wrote a blog at the BeyeNETWORK entitled, First Steps in Operational BI. Here she proposed two ways to start with operational BI. One was to start small, and the other was to perform an honest assessment of the existing data warehouse and BI delivery capabilities. Maybe she can add a third way: start to keep track of historical data in the production databases.
What we can certainly learn from these two organizations is that in our classes and books on database modeling, and by that I mean the design of production databases, we must teach students and readers that history must always be kept; donít throw away data. Maybe historical data doesnít look that important at first; but eventually we need it, and adding it later on is a devil of a job. This way we can prevent that we have to develop complex data warehouse architectures, or must consider other constructions in order to develop the desired reports. Visit Amazon or walk to your local Barnes and Noble bookstore, randomly† pick a book on database design, and there is a 90% chance that you wonít find any guidelines on how to keep track of history in your production databases. We should start to explain to designers that is a requirement for (at least) future reporting and analytical needs. Additionally, auditability might be another need. Again, the costs for keeping track of historical data later on is humongous compared to adding it right away.
In most books, articles, and presentations, we make quite a fuss about implementing operational BI. Operational BI should not be that complicated, provided that the correct architecture is set up for the data warehouse environment. But thatís where the problems are coming from: most current architectures were designed to support classic business intelligence and only classic BI, and they are not flexible enough to easily adopt operational business intelligence.
Recent articles by Rick van der Lans