The other day a friend said to me, “I have big data. I don’t need a data warehouse.”
My reaction was not very pleasant. I stated, “A data warehouse is completely different than big data.” Even with the best big data implementation in the world, you still need a data warehouse. They are not one and the same.
So what are the issues here? The issues are very clear. Big data is a technology and a data warehouse is an architecture. They are not the same thing at all.
The discussion is about the difference between a technology and an architecture. Or, stated differently, the difference between a carpenter and an architect. A carpenter is good at cutting wood and hammering nails. An architect is good at determining where the bathroom should go, where the kitchen should go and where the front door should be. You need both a carpenter and an architect to build a house. No one confuses those two jobs. So why do we have this confusion between big data and a data warehouse?
What Is Big Data Technology?
Big data technology is capable of handling a lot of data. Big data handles data cheaply. Big data handles data in the form of unstructured strings of data. Big data does its searches independently. Big data is used to store and manage large amounts of data. That’s what big data is.
And what is a data warehouse? A data warehouse is the place where you build the corporate single version of the truth. A data warehouse contains data that is integrated and is organized along lines of subject orientation. A data warehouse contains historical data. A data warehouse contains granular data. The specifications for a data warehouse are carefully laid out in my book, Building the Data Warehouse.
Data in the data warehouse is reconcilable. Data in the data warehouse can be reshaped so that people in accounting, marketing, and sales can all use the same data for their individual reporting purposes. When a corporation needs bedrock, believable data at the most granular level that can be accessed and acknowledged as the corporate truth and can be independently audited (for compliance with the Sarbanes-Oxley Act, Basel II, and other regulations), then it is the data warehouse that provides that foundation.
It takes a lot of work to create the corporate system of record. But the corporate system of record is invaluable when building corporate reports. (In fact there is no “corporate” information without a data warehouse.) Typically, ETL is required to build a data warehouse because the underlying legacy application data was never designed or intended for incorporation into a data warehouse. Some technology – ETL – is needed to unravel all the knots that were created in the building of the legacy systems environment.
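To make the ETL point concrete, here is a minimal sketch in Python of what "unraveling the knots" can look like. Everything in it is an assumption made for illustration: the two legacy applications ("billing" and "claims"), their field names, and their encodings are hypothetical and do not come from any real system. The sketch only shows the idea that each application encodes the same facts in its own way, and that a transformation step has to map them onto one integrated format before they are loaded.

```python
from datetime import datetime

# Hypothetical extracts from two legacy applications. Each one encodes
# the same customer facts in its own way (illustrative data only).
BILLING_ROWS = [
    {"cust": "0017", "sex": "M", "opened": "03/15/1998"},
]
CLAIMS_ROWS = [
    {"customer_id": 17, "gender": 1, "open_date": "1998-03-15"},
]


def transform_billing(row):
    """Map a billing record onto the integrated warehouse format."""
    return {
        "customer_id": int(row["cust"]),
        "gender": {"M": "male", "F": "female"}[row["sex"]],
        "opened": datetime.strptime(row["opened"], "%m/%d/%Y").date(),
        "source": "billing",
    }


def transform_claims(row):
    """Map a claims record onto the same integrated format."""
    return {
        "customer_id": row["customer_id"],
        "gender": {1: "male", 0: "female"}[row["gender"]],
        "opened": datetime.strptime(row["open_date"], "%Y-%m-%d").date(),
        "source": "claims",
    }


def run_etl():
    """Extract from both sources, transform, and load into one list."""
    warehouse = []
    warehouse.extend(transform_billing(r) for r in BILLING_ROWS)
    warehouse.extend(transform_claims(r) for r in CLAIMS_ROWS)
    return warehouse
```

After `run_etl()`, both records carry the same customer key, the same gender encoding, and the same date type, so they can be reconciled against each other. A real ETL process deals with far more than this (missing values, conflicting keys, history), but the shape of the work is the same.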
There is a tremendous difference between big data technology and a data warehouse. While the two environments may share some common characteristics, when you get into the DNA of the environments, they are as different as chalk and cheese, as my old English friend Roger Haworth used to say.
Proclaiming that you don’t need a data warehouse when you have big data is revealing. It reveals that the person saying it has no understanding of what a data warehouse really is.
I suggest that people who say that big data is a replacement for a data warehouse be required to wear a t-shirt that says, “Beware. I am ignorant about data warehousing. Don’t trust me to make an informed decision.” Then management gets what it deserves if it trusts its architecture and data warehouse decisions to such people.