Business Intelligence Network business intelligence resources

Blog: Dan E. Linstedt


Data Modeling is vital to successful Business practices

Is your business growing? Are you capturing more data every day? Is your business nimble enough to stay competitive? Being nimble no longer has anything to do with being small. Size may still be an advantage, but even small businesses have politics and misguided business practices, and rarely does anyone write that some of these problems stem directly from the way we build the data/information models within our information stores.

With the advent of SOA, it's not just warehousing anymore; it's data integration, aggregation, and conglomeration across the enterprise into a SINGLE copy of the information. Hopefully the models we build to house this information free us to change the business, rather than constrain us from making necessary changes.

Over the years, data modeling has evolved, and the preferred term is now Information Modeling. What is data or information modeling? It is the logical representation of data or information in a format that can categorize, organize, and manage different sets of information. The model also provides a mechanism for us to access, alter, and add new information as needs arise. This access point is typically SQL (Structured Query Language), which includes a basic set of commands: Insert, Update, Delete, and Select.
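To make those four commands concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module and an in-memory database. The `customer` table and its columns are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database; the customer table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# INSERT: add new information
cur.execute("INSERT INTO customer (id, name, city) VALUES (?, ?, ?)", (1, "Acme Corp", "Denver"))
cur.execute("INSERT INTO customer (id, name, city) VALUES (?, ?, ?)", (2, "Globex", "Boston"))

# UPDATE: alter existing information
cur.execute("UPDATE customer SET city = ? WHERE id = ?", ("Austin", 1))

# DELETE: remove information
cur.execute("DELETE FROM customer WHERE id = ?", (2,))

# SELECT: access information
rows = cur.execute("SELECT id, name, city FROM customer ORDER BY id").fetchall()
conn.commit()
print(rows)  # [(1, 'Acme Corp', 'Austin')]
```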

All of this aside, information modeling hasn't changed much over the years; that is to say, the basis for how we categorize, manage, and maintain information has remained the same. All the while, information itself has changed and grown beyond our wildest imaginations. The Data Vault is the next evolution of information modeling. It takes the basic concepts and builds on them to overcome the challenges laid before us, like very large databases (VLDB), real-time (RT) data, and independent units of information.

If our businesses are changing, and our data sets are changing (along with the volume and frequency of data arrival and utilization), shouldn't we be changing our data modeling styles to keep up? Did you realize that Third Normal Form was defined by E. F. Codd in the early 1970s, and that Dr. Peter Chen introduced Entity-Relationship diagrams in 1976? How about Star Schema? When was that created? Sometime in the 1980s, if I'm not mistaken.

Each of these modeling techniques is wonderful when used in the context for which it was designed. But when placed against today's changing requirements (rapid data access, terabytes approaching petabytes of storage and collection, rapid data feeds), they leave a little to be desired. To my knowledge, they weren't specifically architected to meet these needs.

However, from a business perspective, what about business changes? The business is changing faster than ever before, and the pace of that change keeps increasing as companies fight to survive the competition. If Wal-Mart had to do it all over again, starting today, it would be twice as hard for them to make it work, given that companies like Target would still have innovated on their own and reached the level they are at today.

Now, there's a strange pair of trends here: 1) divergence and 2) convergence. Divergence of the PHYSICAL data modeling representations from the business, and convergence of the historical and up-to-date (real-time) information stores. Any time a model diverges from the business goals, we have to ask the questions: is it helping or hurting us? Is it constraining us from getting to our end goals faster?

In the data modeling and IT world, this translates to the following: "Sir, we have a huge enterprise warehouse. Making the change you are asking for will take X days/weeks and Y resources, and it will also impact A, B, C, and D." The business goes away and confers, then comes back and says: it costs too much, or it will take too much time, so we must "settle for less" or make a different business decision.

To me, this means that business decisions are CONSTRAINED by the information model in place, and the further the information model drifts from the business processes, the more constraints begin to show up. This may be one way that a system becomes a legacy system.

Furthermore, suppose the model was built in 3rd Normal Form (as an Active Data Warehouse, or ADW), containing both history and current information.

The business question?
When was the last time that high-level business users could be walked through a 3rd Normal Form data model and understand or SEE how their business operates? Normally the modeler points and says, "Your data X resides here, and here."
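As a rough illustration of "here, and here", the sketch below uses Python's built-in `sqlite3` with a hypothetical customer/order/product schema (not taken from any real warehouse) normalized across four tables. Even a simple business question, what did this customer buy and for how much, has to be reassembled with three joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical 3NF schema: each fact lives in exactly one table.
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, description TEXT NOT NULL,
                       unit_price REAL NOT NULL);
CREATE TABLE sales_order (order_id INTEGER PRIMARY KEY,
                          customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
                          order_date TEXT NOT NULL);
CREATE TABLE order_line (order_id INTEGER NOT NULL REFERENCES sales_order(order_id),
                         product_id INTEGER NOT NULL REFERENCES product(product_id),
                         quantity INTEGER NOT NULL,
                         PRIMARY KEY (order_id, product_id));

INSERT INTO customer VALUES (1, 'Acme Corp');
INSERT INTO product VALUES (10, 'Widget', 2.50), (11, 'Gadget', 4.00);
INSERT INTO sales_order VALUES (100, 1, '2005-04-08');
INSERT INTO order_line VALUES (100, 10, 4), (100, 11, 1);
""")

# One business question touches all four tables.
rows = conn.execute("""
SELECT c.name, o.order_date, p.description, l.quantity * p.unit_price AS line_total
FROM customer c
JOIN sales_order o ON o.customer_id = c.customer_id
JOIN order_line  l ON l.order_id    = o.order_id
JOIN product     p ON p.product_id  = l.product_id
ORDER BY p.description
""").fetchall()

for r in rows:
    print(r)
# ('Acme Corp', '2005-04-08', 'Gadget', 4.0)
# ('Acme Corp', '2005-04-08', 'Widget', 10.0)
```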

Star schemas are a bit easier to understand; they are designed and architected specifically to meet SUBJECT-ORIENTED business needs. They were not designed to meet cross-functional needs, nor to handle real-time feed requirements, nor super-huge volumes. They were designed so that the data would be aggregated (losing detail), giving less volume, faster queries, less disk space, and so on. They meet the business needs one subject at a time.
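A minimal star schema sketch, assuming a hypothetical sales subject with made-up `dim_date`, `dim_product`, and `fact_sales` tables. The fact table is stored at a monthly grain, so individual transactions are already aggregated away, and queries slice the facts by dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical) describe the context of each fact.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
-- The fact table covers one subject (sales) at a pre-aggregated grain:
-- units and revenue per product per month, not per transaction.
CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date(date_key),
                         product_key INTEGER REFERENCES dim_product(product_key),
                         units INTEGER, revenue REAL);

INSERT INTO dim_date VALUES (200503, 2005, 3), (200504, 2005, 4);
INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO fact_sales VALUES (200503, 1, 10, 250.0),
                              (200504, 1, 5, 125.0),
                              (200504, 2, 3, 300.0);
""")

# A typical dimensional query: group the facts by dimension attributes and sum.
rows = conn.execute("""
SELECT d.month, p.category, SUM(f.units), SUM(f.revenue)
FROM fact_sales f
JOIN dim_date d    ON d.date_key = f.date_key
JOIN dim_product p ON p.product_key = f.product_key
GROUP BY d.month, p.category
ORDER BY d.month, p.category
""").fetchall()
print(rows)
# [(3, 'Hardware', 10, 250.0), (4, 'Hardware', 5, 125.0), (4, 'Software', 3, 300.0)]
```

Because detail below the monthly grain is never stored, this shape stays small and fast for its one subject, which is exactly why it struggles with cross-functional or transaction-level questions.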

Star schemas are wonderful tools when built correctly and used for dimensional analysis and OLAP drill-down. In fact, building Virtual Marts is something I've taken to lately, given that making copies of huge enterprise data warehousing sets is simply not feasible. However, while building a Star Schema as a data warehouse can be done, and can be done successfully with large volumes, it can be very complicated and can also constrain the business.
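The post doesn't say how a Virtual Mart is built. One common way to get a mart without copying data is a database view over the warehouse tables; the sketch below assumes that approach, with hypothetical `wh_sales` and `mart_sales_by_region` names in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detailed warehouse table (hypothetical): one row per transaction.
CREATE TABLE wh_sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT,
                       region TEXT, amount REAL);
INSERT INTO wh_sales VALUES
  (1, '2005-04-01', 'West', 100.0),
  (2, '2005-04-02', 'West',  50.0),
  (3, '2005-04-02', 'East',  75.0);

-- Assumed "virtual mart": a view presenting a subject-oriented,
-- aggregated shape without copying rows out of the warehouse.
CREATE VIEW mart_sales_by_region AS
SELECT region, COUNT(*) AS sales_count, SUM(amount) AS total_amount
FROM wh_sales
GROUP BY region;
""")

before = conn.execute("SELECT * FROM mart_sales_by_region ORDER BY region").fetchall()

# New warehouse rows appear in the mart immediately; there is no reload step.
conn.execute("INSERT INTO wh_sales VALUES (4, '2005-04-03', 'East', 25.0)")
after = conn.execute("SELECT * FROM mart_sales_by_region ORDER BY region").fetchall()

print(before)  # [('East', 1, 75.0), ('West', 2, 150.0)]
print(after)   # [('East', 2, 100.0), ('West', 2, 150.0)]
```

The trade-off is that every query against the view is computed from the warehouse at read time, so this saves disk and load effort at the cost of query work.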

The analogy is this: suppose you had a Porsche 914 and a big-rig. If your objective is to move your house, would you use the 914? How about hooking a lot of 914s together and driving them across the country; would this work? Sure, it will work, but it's not as effective as filling and driving a big-rig one time.

If your objective is to win a race, would you chop the top off the big-rig, take off the trailer, leave the engine and frame intact, and then race it against 914s and expect to win? Probably not. You certainly could ADAPT the design, but why? Use the appropriate design/architecture for the appropriate purpose.

The Star Schema is like the 914: fast, efficient, and it works. 3NF is like the big-rig: it handles tons of data and vast real-time requirements, but has "history storing issues". The Data Vault is like the SUV of this world; it sits directly in the middle and provides comfort, style, torque, speed, and towing capacity. That's where the analogy stops, though: the Data Vault can be extended into the petabyte range of data very easily.

I've designed a new model, one that integrates the best-of-breed notions from both architectures; it's called the Data Vault. Its design concepts focus on returning the model to the business case, making the model conformant to cross-functional business, addressing compliance, handling super-huge volumes, and handling real-time feeds. (See the free white papers on the Data Vault here.)
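For readers who haven't seen the white papers, here is a minimal sketch of the three Data Vault table types as commonly described: hubs hold unique business keys, links hold relationships between hubs, and satellites hold descriptive attributes over time, with each row stamped with a load date and record source. The table and column names (`hub_customer`, `sat_customer`, `customer_sqn`, and so on) and the sample data are illustrative assumptions, not taken from the papers; SQLite is used only to keep the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hub: one row per unique business key (names are hypothetical).
CREATE TABLE hub_customer (customer_sqn INTEGER PRIMARY KEY,
                           customer_bk TEXT UNIQUE NOT NULL,
                           load_dts TEXT NOT NULL, record_source TEXT NOT NULL);
CREATE TABLE hub_product  (product_sqn INTEGER PRIMARY KEY,
                           product_bk TEXT UNIQUE NOT NULL,
                           load_dts TEXT NOT NULL, record_source TEXT NOT NULL);
-- Link: a relationship between hubs (customer purchased product).
CREATE TABLE link_purchase (purchase_sqn INTEGER PRIMARY KEY,
                            customer_sqn INTEGER NOT NULL REFERENCES hub_customer,
                            product_sqn  INTEGER NOT NULL REFERENCES hub_product,
                            load_dts TEXT NOT NULL, record_source TEXT NOT NULL);
-- Satellite: descriptive attributes keyed by hub key + load date,
-- so changes arrive as new rows and history is kept.
CREATE TABLE sat_customer (customer_sqn INTEGER NOT NULL REFERENCES hub_customer,
                           load_dts TEXT NOT NULL,
                           name TEXT, city TEXT, record_source TEXT NOT NULL,
                           PRIMARY KEY (customer_sqn, load_dts));

INSERT INTO hub_customer  VALUES (1, 'CUST-001', '2005-04-01', 'CRM');
INSERT INTO hub_product   VALUES (1, 'PROD-914', '2005-04-01', 'ERP');
INSERT INTO link_purchase VALUES (1, 1, 1, '2005-04-02', 'ERP');
-- The customer moves: a new satellite row is inserted; the old one stays.
INSERT INTO sat_customer VALUES (1, '2005-04-01', 'Acme Corp', 'Denver', 'CRM');
INSERT INTO sat_customer VALUES (1, '2005-04-05', 'Acme Corp', 'Austin', 'CRM');
""")

history = conn.execute(
    "SELECT load_dts, city FROM sat_customer WHERE customer_sqn = 1 ORDER BY load_dts"
).fetchall()

# Current picture: the latest satellite row for each hub key.
current = conn.execute("""
SELECT h.customer_bk, s.city
FROM hub_customer h
JOIN sat_customer s ON s.customer_sqn = h.customer_sqn
WHERE s.load_dts = (SELECT MAX(load_dts) FROM sat_customer
                    WHERE customer_sqn = h.customer_sqn)
""").fetchall()

print(history)  # [('2005-04-01', 'Denver'), ('2005-04-05', 'Austin')]
print(current)  # [('CUST-001', 'Austin')]
```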

It's time that we stop constraining our business decisions through our choice of model, and start changing the modeling architecture instead. The modeling architecture should adapt to meet the needs of the business, and altering it to meet those needs shouldn't cost a lot of money or take a lot of time.
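One way this can play out, sketched under the assumption of a hub/satellite layout like the Data Vault's: when a new source starts supplying new attributes, a new satellite table is added beside the existing ones instead of altering them. All names here (`sat_customer_credit`, the `CREDIT_BUREAU` source, and the rest) are hypothetical; the check against SQLite's `sqlite_master` catalog simply confirms that the existing table definitions did not change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Existing (hypothetical) model: a hub and one satellite.
conn.executescript("""
CREATE TABLE hub_customer (customer_sqn INTEGER PRIMARY KEY,
                           customer_bk TEXT UNIQUE NOT NULL,
                           load_dts TEXT NOT NULL, record_source TEXT NOT NULL);
CREATE TABLE sat_customer_crm (customer_sqn INTEGER NOT NULL REFERENCES hub_customer,
                               load_dts TEXT NOT NULL, name TEXT, city TEXT,
                               record_source TEXT NOT NULL,
                               PRIMARY KEY (customer_sqn, load_dts));
INSERT INTO hub_customer VALUES (1, 'CUST-001', '2005-04-01', 'CRM');
INSERT INTO sat_customer_crm VALUES (1, '2005-04-01', 'Acme Corp', 'Denver', 'CRM');
""")

def table_sql(name):
    """Return the stored CREATE statement for a table."""
    return conn.execute("SELECT sql FROM sqlite_master WHERE name = ?", (name,)).fetchone()[0]

existing = ("hub_customer", "sat_customer_crm")
before = {t: table_sql(t) for t in existing}

# Business change: a new source supplies credit ratings.
# Instead of ALTERing existing tables, add a new satellite on the same hub.
conn.executescript("""
CREATE TABLE sat_customer_credit (customer_sqn INTEGER NOT NULL REFERENCES hub_customer,
                                  load_dts TEXT NOT NULL, credit_rating TEXT,
                                  record_source TEXT NOT NULL,
                                  PRIMARY KEY (customer_sqn, load_dts));
INSERT INTO sat_customer_credit VALUES (1, '2005-04-08', 'AA', 'CREDIT_BUREAU');
""")

unchanged = before == {t: table_sql(t) for t in existing}

row = conn.execute("""
SELECT h.customer_bk, c.city, r.credit_rating
FROM hub_customer h
JOIN sat_customer_crm c    ON c.customer_sqn = h.customer_sqn
JOIN sat_customer_credit r ON r.customer_sqn = h.customer_sqn
""").fetchone()

print(unchanged, row)  # True ('CUST-001', 'Denver', 'AA')
```

Because nothing already loaded has to be rebuilt, queries and loads written against the original tables keep working while the new attributes come online.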

The Data Vault is free; it's just a fancy marketing name for something called the "Common Foundational Information Data Model".

I'll be blogging more on this topic later, but would love your feedback.

Cheers,
Dan Linstedt

Posted by Dan Linstedt on April 8, 2005 8:00 PM

Comments

This is very interesting thinking; not many folks have understood the critical importance of modelling in SOA, for example in compliance data services. A couple of links you might look at:
http://www.redmonk.com/jgovernor/archives/000033.html

You might well be interested in our open source Compliance Oriented Architecture model.
http://www.redmonk.com/wiki/index.php?title=COA

