Blog: Colin White

Colin White

I like the various blogs associated with my many hobbies and even those to do with work. I find them very useful and I was excited when the Business Intelligence Network invited me to write my very own blog. At last I now have somewhere to park all the various tidbits that I know are useful, but I am not sure what to do with. I am interested in a wide range of information technologies and so you might find my thoughts will bounce around a bit. I hope these thoughts will provoke some interesting discussions.

About the author

Colin White is the founder of BI Research and president of DataBase Associates Inc. As an analyst, educator and writer, he is well known for his in-depth knowledge of data management, information integration, and business intelligence technologies and how they can be used for building the smart and agile business. With many years of IT experience, he has consulted for dozens of companies throughout the world and is a frequent speaker at leading IT events. Colin has written numerous articles and papers on deploying new and evolving information technologies for business benefit and is a regular contributor to several leading print- and web-based industry journals. For ten years he was the conference chair of the Shared Insights Portals, Content Management, and Collaboration conference. He was also the conference director of the DB/EXPO trade show and conference.


It is interesting that whenever we talk about business intelligence we immediately think of it in the context of data warehousing. The two always go together, right? Wrong! I think we have been indoctrinated into thinking this way. We have lost sight of the fact that data warehousing only came about because we couldn’t design our operational systems right in the first place.

This point was brought home to me while teaching a class on operational BI with Claudia Imhoff at the recent DAMA conference. It became obvious that some people came into the seminar thinking of operational BI in terms of data warehousing. When I started talking about the concept that operational BI should be process driven and tightly integrated with (and possibly embedded in) operational processes, it came as somewhat of a surprise to some attendees that data warehousing didn’t appear in the picture.

I also discussed master data management (MDM) in the seminar. It would be reasonable to ask what MDM has to do with operational BI. Well, the problem is that some so-called operational BI applications are not really BI applications; they are MDM applications. An example would be creating a single view of the customer. This has nothing to do with BI. It’s an operational issue, not a decision support one. This doesn’t mean that integrated customer master data cannot be used in business intelligence processing.

Hub products that frantically move data into an operational data store or data warehouse to create a single view of something add latency to the data, and are simply papering over the cracks of master data problems, rather than trying to solve the issue at the source, i.e., in the operational systems.

We should be aiming to put less data into the data warehousing environment, not more. Our long-term objective should be to eliminate the data warehouse. The first step in this process is to remove master data from both operational transaction systems and the data warehouse. This data should be stored in a separate master data store (MDS) that is maintained by separate master data applications. We can start to do this as we redesign our operational systems and move to a service-oriented architecture. The MDM system simply becomes a service. In this scheme, the MDS contains both current and historical master data. Both operational and BI applications can access the master data by calling the MDM system as a service. Initially, for performance reasons, business view subsets of the master data may be replicated into a data warehouse to act as dimension tables.
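To make the idea of an MDS behind a service interface concrete, here is a minimal Python sketch. It is an illustration only, not a description of any product: the names `MasterDataService`, `CustomerVersion`, `upsert` and `get` are hypothetical, the store is an in-memory dictionary, and changes are assumed to arrive in date order. The point it shows is that one service can answer both the operational question (what is the current record?) and the BI question (what did the record look like on a past date?).

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CustomerVersion:
    """One version of a customer master record (hypothetical layout)."""
    customer_id: str
    name: str
    segment: str
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


class MasterDataService:
    """Hypothetical MDM service holding current and historical master data."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[CustomerVersion]] = {}

    def upsert(self, customer_id: str, name: str, segment: str,
               effective: date) -> None:
        """Record a change; assumes changes arrive in date order."""
        history = self._versions.setdefault(customer_id, [])
        if history:
            # Close the previous version instead of overwriting it.
            history[-1] = replace(history[-1], valid_to=effective)
        history.append(
            CustomerVersion(customer_id, name, segment, effective, None))

    def get(self, customer_id: str,
            as_of: Optional[date] = None) -> Optional[CustomerVersion]:
        """Return the current version, or the version valid on `as_of`."""
        for version in reversed(self._versions.get(customer_id, [])):
            if as_of is None:
                if version.valid_to is None:
                    return version
            elif version.valid_from <= as_of and (
                    version.valid_to is None or as_of < version.valid_to):
                return version
        return None


mds = MasterDataService()
mds.upsert("C1", "Acme Ltd", "SMB", date(2006, 1, 1))
mds.upsert("C1", "Acme Corp", "Enterprise", date(2007, 1, 1))

current = mds.get("C1")                               # operational caller
historical = mds.get("C1", as_of=date(2006, 6, 30))   # BI caller
```

An operational application would call `get` without a date, while a BI query comparing periods would pass `as_of`. Replicating a subset into dimension tables then becomes a performance choice rather than the only place history lives.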

Removing master data from the data warehouse has the advantage of also removing much of the complexity and many of the data quality problems from data warehouse design. A separate MDM environment also simplifies the operational data store (ODS). The ODS now only needs to contain integrated business transaction data. The ODS effectively becomes an operational transaction data store, or OTDS.

The concept of splitting an ODS into an MDS and an OTDS was well accepted by many of the seminar attendees. I have seen several companies in both the US and Europe do this. In some cases the decision support environment consists of an OTDS, MDS and data marts, i.e., there is no enterprise data warehouse. Heresy, I hear people say. My answer is that we have to start thinking outside of the box. For example, there are some very viable search technologies appearing that also enable organizations to build BI applications without the need for a data warehouse.

Some of my comments above are a little tongue in cheek. The objective is to get people to accept that a data warehouse is not always required for business intelligence, and as I said earlier, the long-term objective should be to eliminate the data warehouse. Comments?


Posted March 22, 2007 1:34 AM

8 Comments

Colin,

I have a high respect for you and I have read many of your articles.

On this subject, I would strongly disagree with your statement that "a data warehouse is not always required for business intelligence".

When we talk about business intelligence, we need to understand whether the intelligence is geared towards strategic, tactical, or operational decision making. The critical factor is how much historical data is needed to make an effective decision. The answer then decides whether we need a data warehouse or a data mart.

A decision at the strategic level MUST require a data mart and/or data warehouse.

A decision at the tactical level MUST require a data mart and an ODS.

A decision at the operational level can probably be made with just an ODS (master data as well as transactional) and OLTP systems.

In summary, a data warehouse/data mart is always needed because of the very fundamental requirement of using vast amounts of historical data, which is just not possible with OLTP systems.

The current state of service-oriented architectures for BI applications is limited to the reporting side rather than the data cleansing and processing side.

Thanks
Chandra Kapireddy

Thanks for the input. As I said in my blog entry, I wrote this piece to try and get people to discuss this topic. I can show you a number of operational BI applications that don't use a data warehouse. For example, in the financial community companies are building close to real time operational BI dashboards that analyze transactions as they flow through the system. The objective is to measure the performance of the trading process. No data warehouse is involved.

I agree with you that we may need historical data in some operational BI applications for context. This information can be accessed via a service call to pre-built BI queries and analyses.

I don't agree with you about SOA and data transformation. Most of the data integration vendors now support the ability to call a data transformation service from a workflow. This means a business transaction workflow can dynamically call a data transformation service. A simple example of this is to use such a service to validate name, address and telephone formatting. This is an example of fixing data quality problems on entry and not after the fact in a data warehouse.
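As a rough illustration of fixing data quality on entry, the Python sketch below has a transaction workflow step call a validation routine before a record is stored. Everything in it is an assumption made for the example: the function names (`validate_contact`, `enter_order`), the record layout, and the US-style 10-digit telephone rule. In practice the validation would be a remote service supplied by a data integration tool rather than a local function.

```python
import re
from typing import Dict, List, Tuple


def normalize_name(raw: str) -> str:
    """Collapse stray whitespace; reject empty names."""
    cleaned = " ".join(raw.split())
    if not cleaned:
        raise ValueError("name is empty")
    return cleaned


def normalize_phone(raw: str) -> str:
    """Format a phone number; assumes US-style 10-digit numbers."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        raise ValueError("phone must have 10 digits")
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def validate_contact(record: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Stand-in for the transformation service: returns (cleaned, errors)."""
    cleaned = dict(record)
    errors: Dict[str, str] = {}
    for field, rule in (("name", normalize_name), ("phone", normalize_phone)):
        try:
            cleaned[field] = rule(record.get(field, ""))
        except ValueError as exc:
            errors[field] = str(exc)
    return cleaned, errors


def enter_order(record: Dict[str, str], orders: List[Dict[str, str]]) -> Dict[str, str]:
    """Workflow step: call the service, store only records that pass."""
    cleaned, errors = validate_contact(record)
    if not errors:
        orders.append(cleaned)
    return errors


orders: List[Dict[str, str]] = []
ok = enter_order({"name": "  Jane   Doe ", "phone": "1-415-555-0100"}, orders)
bad = enter_order({"name": "Bob", "phone": "12345"}, orders)
```

The second record is rejected at entry, so the bad telephone number never reaches a downstream store that would otherwise have to clean it.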

Regards. Colin.

First - wow! Well done Colin for saying what needs to be said so openly and clearly. Your thoughts help make us think of WHY we need data warehouses currently rather than just blindly accepting that we do. It’s always good to start from the real requirements.

I agree with your long-term goal, but I would state it as "to continue to reduce the number of places an item of data is stored physically until there is only one place to store it." I don't think it matters where this is or what label you give it (operational, ODS, MDS, etc.). The real problem is when you CAN'T satisfy all requirements from a single store. How do you manage the replication that is required (especially when the data changes over time)?

You are right to say that an important reason for having a data warehouse currently is for performance, but that is not the only reason. There are three primary issues that still need to be addressed:

1) Operational systems (and MDS hubs) typically don't hold history (either transactional history, or, more importantly master data history). History is required to answer some business questions like comparing this month's performance against the same month last year. Where will past states of data be held?

2) Operational systems (and MDS hubs) typically don't hold “what if” or scenario data for planning and scenario evaluation. Where will future state and scenario data be held?

3) The typical business person is still burdened with having to know where to go to get answers to each of their questions. They need a way to uniformly access the organization's information. This way, their IT counterparts can “swap out” the technology below the covers, and the business user never needs to know what tools or technology are delivering the result.

So I see the data warehouse as being around for some time yet. As you point out however, the landscape is changing. What we need to focus on currently is putting some insulation between the technology and the business user so they don't have to bear the pain of that change. ...

It is certainly always good to question assumptions. It seems to me that the reasons that we have data warehouses are indeed due to limitations of current technology in many ways: performance, inconsistent master data, storage of historical data slowing things down (another performance issue), the difficulty of people simply finding existing reports and information. If you could remove all these essentially technical issues then you probably wouldn't need a data warehouse. Unfortunately while some things are getting easier (e.g. applying search technology to finding reports, databases becoming more tolerant of mixed workloads) the progress has been slow, and in some areas e.g. master data handling, is as yet very immature. Hence I won't be throwing away those data warehouse books just yet. I do agree that the very existence of a data warehouse is essentially due to failure of technology. I don't think that the answer is data marts though. More on this can be found in my blog today "Philosophy and data warehouses".

Reply from Colin White

I agree with both of the last two comments. My intent with the blog was to create discussion and encourage some lateral thinking. Looks like I am succeeding!

Cliff Longman is right. We get hung up on labels. Business users just see data; they don't care where it is stored or what we call it. Every time I try to drop labels, IT folks complain. I think we like to put technologies into categories. This is fine as long as those categories don't drive everything we do.

I also agree with Andy Hayler. I don't see data warehouses going away. I don't want people to interpret my blog as saying data warehousing is dead. It isn't. However, there are now BI applications that don't need a data warehouse and these new approaches need to be considered.

Andy is also right in saying that there were, and still are, a number of technology issues that create the need for a data warehouse. However, it is worth examining how those issues are changing given the length of time that data warehousing has existed. I will be writing a series of blogs over the coming weeks examining each of those issues, and how they have changed. Keep the comments coming!

Colin,

Excellent article and excellent comments. I have argued this same topic with colleagues and friends before.

I look forward to the rest of the series.

As Colin rightly mentioned, we need not always go the traditional data warehouse way. We could judge the business scenarios first and then talk about the technology, whether it is ELT or ETL. The latest buzzword is dynamic warehousing, a BI appliance server which has most of the technology bundled.

Dear Colin,

You did a nice job opening the discussion for BI and datawarehousing.

Business people don't argue with IT about data warehousing because it's difficult for them to understand. IT people, on the other hand, always debate and argue about this issue because everyone has their own view of users' requirements.
At the end of the day, I think IT people should develop a list of questions for customers, and decide what the customer REALLY needs for BI based on the answers.

Thank you for this interesting entry and for the nice comments.

Colin - I do think that it's interesting that the leading BI organization in the states is named after the Data Warehouse.

In response to some of the comments above, I'd suggest that we need to distinguish between "data warehousing" as it's practiced today (complete with transforming ops data into tidier normalized forms) and a simpler practice of on-line archiving ops data in native form. The latter can afford historical visibility without all the headache and cost of a full blown dw program.

We've seen very large, very sophisticated clients using this approach, and it has cut program cycle time, setup, and maintenance costs significantly.

But it is not supported by the client-server/SQL/RDBMS-centric cartel and its products, so it's a bit "bleeding edge." Whether this approach can go mainstream probably depends upon the timing of (1) more powerful processors at the edge, (2) continued growth in data volumes exposing cracks in the old BI model, and (3) significant improvements in usability of the new BI tools.

Thanks for being edgy -- pushing us to think about whether there could be a better way.
