Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor at The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, is the inventor of The Matrix Methodology and the Data Vault data modeling architecture, has built expert training courses and trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

I sat down with my good friend Jeff Jonas yesterday and discussed the nature and notion of contextual processing. Jeff is a phenomenal individual, and much smarter than I ever hope to be, but all that aside, we had a wonderful conversation about the nature of processing streaming data (one piece at a time, or possibly multiple pieces in parallel, but separated) and how to focus the notions of context.

How is this related to B.I.?
It has everything to do with Business Intelligence, and with how we "experience" and use our data sets (and the patterns within them) to make sense of our business, especially in an Operational B.I. world.

Processing the context on a streaming basis (as Jeff says) requires the ability to "change" all that we know (perception) at run-time based on new facts arriving on the stream. His statements went a little like this:

1) Imagine we think our friend XYZ is a good person. We just met this person 3 days ago, so we don't know much about them, but they've been nice to us - so our current perception of this individual is a loose bundle of attributes: K, U, I, O, T, and so on. We've hung out with them, so we have a whole host of experiences to draw from (mostly fun).
2) Now, 3 days later we find out from another very good friend, someone we've trusted for over 25 years, that this person has done something horrible in the past...

At that instant, considering our relationship to our very good friend, all that we know about person XYZ (perceptually) changes - usually very quickly.
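To make that concrete, here's a toy sketch of the idea in Python - mine, not Jeff's, and every name and number in it is invented: each entity carries a perception score, and each arriving fact re-weights that score according to how much we trust its source.

```python
# Toy sketch (mine, not Jeff's): a perception is a score that each
# arriving fact re-weights according to how much we trust its source.

from dataclasses import dataclass, field

@dataclass
class Perception:
    score: float = 0.5                      # 0.0 = bad, 1.0 = good; start neutral
    experiences: list = field(default_factory=list)

def absorb_fact(p: Perception, fact_score: float, source_trust: float) -> None:
    """Shift the perception toward the new fact, weighted by source trust."""
    p.experiences.append((fact_score, source_trust))
    p.score = (1 - source_trust) * p.score + source_trust * fact_score

xyz = Perception()
absorb_fact(xyz, fact_score=0.8, source_trust=0.2)   # 3 days of fun, low trust
absorb_fact(xyz, fact_score=0.0, source_trust=0.95)  # the 25-year friend's warning
print(round(xyz.score, 2))                           # 0.03 - perception collapses
```

Three days of fun barely holds up against one fact from a 25-year friend.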

Now, this isn't so bad if we are dealing with one piece of information and a very small series of memories that we are focused on. But imagine trying to do this at 10,000 transactions per second, with facts arriving out of order, while trying to affect data sitting within 100 billion rows in our database...

This brings me to my discussion. From here, Jeff and I began discussing HOW this processing needed to take place, and it reminded me of some of the conversations I'm having here at the Teradata Partners conference this week.

The questions on the table are:
1) How should the system determine the assigned context for a given fact? Well, we have to let go of the word "context"; from a systems perspective, we have to work with the notion that the data has a strong correlation to a particular STACK or SET of facts - history, or historical knowledge.
2) Once a perspective has been established for that incoming fact, what IMPACT does it or should it have against all the target data, or against patterns that are already known? For instance, suppose an area code changes from 720 to 750 (Jeff's example) - what do you need to do to change ALL of the existing phone numbers? Inserting brand new rows isn't always the answer; it would cause too much data change. Updating existing information won't work either; it would take too long. REMEMBER: at 10,000 transactions per second, we have to process this information and execute against the history in millisecond response times. (One way out of this trap is indirection - see the sketch just after this list.)
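One hedged way out of Jeff's area-code trap (my sketch, not his design, with made-up names throughout) is indirection: don't store the area code on the 100 billion rows at all. Store a key into a tiny mapping table, so the entire 720-to-750 change is one write, and every historical number reads back correctly at query time:

```python
# Hedged sketch: keep the area code behind one level of indirection so
# a split is ONE write, not an update across 100 billion rows.

area_codes = {1: "720"}                      # tiny, hot mapping table
phones = [(1, "555-0100"), (1, "555-0101")]  # stand-in for billions of rows

def render(code_key: int, local: str) -> str:
    return f"{area_codes[code_key]}-{local}"  # area code resolved at read time

area_codes[1] = "750"                        # the whole 720 -> 750 change
print([render(k, n) for k, n in phones])     # ['750-555-0100', '750-555-0101']
```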

Jeff and I began to discuss the notion of a LENS, through which focus on a particular pattern could be achieved. What's important here is the FOCUS - but again, remember the focus is for _this current piece of information_ and is not necessarily related to other currently arriving information or facts.

Well, what the heck does this have to do with B.I.?
You should already be able to see it... In a VLDW, where we have huge stores of time-based information, it is near impossible (without focus) to find what you're looking for. So the first problem is (again) establishing focus: where, oh where, does my data FIT? If you're processing in REAL-TIME, folks, listen up... Once we establish which data sets are affected, we need to understand, IN A FRACTION OF A SECOND, how to change the "known outcome" on the existing history. And, by the way, this all has to happen in PARALLEL with all the other arriving facts, or it simply won't execute in a timely fashion.
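Mechanically, here's how I picture a lens - purely illustrative Python with invented names, not any vendor's API: a cheap function that routes each arriving fact to the one slice of history it's correlated with, so the heavy work never touches the rest of the warehouse, and each fact focuses in parallel with the others:

```python
# Illustrative only: a "lens" as a cheap router from fact to the one
# partition of history it is correlated with; all names are invented.

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

history = defaultdict(list)                  # history partitioned by entity key
history["XYZ"].extend(["fun", "fun", "fun"])
history["ABC"].extend(["boring"])

def lens(fact: dict) -> list:
    """Focus: a keyed lookup, not a scan of the whole warehouse."""
    return history[fact["entity"]]

def apply_fact(fact: dict) -> int:
    focused = lens(fact)                     # milliseconds: only this slice
    focused.append(fact["event"])            # re-score just the focused history
    return len(focused)

stream = [{"entity": "XYZ", "event": "horrible"},
          {"entity": "ABC", "event": "nice"}]
with ThreadPoolExecutor() as pool:           # each fact focuses in parallel
    print(list(pool.map(apply_fact, stream)))  # [4, 2]
```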

Now what else am I saying about ALL THIS DATA we've stored?
HERE IT IS:

* Large volumes of data must be processed and learned from.
* The combined "learned" knowledge (we'll call it a derivation on average) of a STACK of related information within a topic area IS MORE IMPORTANT than the parts - all the history and the individual facts - but without all the details, we can't create the combined image.
* This combined knowledge element must be used IN CONTEXT, or AS A CONTEXT LENS, to quickly establish the relevance of the incoming information, and how it will affect the "next" view or look at the information.

In other words:
* VLDB / VLDW data by itself is important when you're digging for detailed specifics that happened at a specific point in time, but the real value is having a "mined" collective perspective on all that detail - one that allows us to establish where and how our current "transaction" will affect the outcome (there's a small sketch of this just below).
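A tiny sketch of what I mean by a "derivation on average" - a toy with assumed numbers, not a real mining model: keep an incremental summary per STACK of related facts, and judge the next transaction against that one number instead of re-reading every detail row:

```python
# Toy "derivation on average": an incremental summary per STACK of
# related facts, so the lens consults one number, not the detail rows.

class Stack:
    def __init__(self):
        self.count = 0
        self.mean = 0.0                     # the combined knowledge element

    def add(self, value: float) -> None:
        self.count += 1
        self.mean += (value - self.mean) / self.count  # incremental mean

payments = Stack()
for amount in (10.0, 12.0, 11.0, 250.0):    # the details build the summary...
    payments.add(amount)

incoming = 240.0                            # ...and the summary judges the next fact
print(abs(incoming - payments.mean) > 2 * payments.mean)  # True: off-pattern
```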

A 24x7x365 neural network / data mining engine MUST be up and running continuously. It must first be trained, and then constantly adjusted for "drift" off topic, but the neural net should be receiving the transaction inflow for "context" application in order to establish our focus, or apply a "lens" of information to our historical data set. This isn't your father's neural net, and it's not your mother's data mining engine - no... this is a different way of "scoring" the parts of interesting history that lie within the interested perception bounds (Jeff's term), so that "extraneous noise" is filtered away as one of the first processing steps.
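To show the shape of that pipeline - with a toy scorer standing in for the trained neural net, since nothing here is a real model and the event types are assumptions - the incoming stream is scored against the perception bounds FIRST, and noise is dropped before any history work begins:

```python
# Toy scorer standing in for the trained neural net: filter the noise
# FIRST, so only focused facts ever reach the heavy history processing.

BOUNDS = {"wire_transfer", "login", "address_change"}  # assumed "interesting" types

def score(txn: dict) -> float:
    """Stand-in for the model: relevance of this txn to the current lens."""
    return 1.0 if txn["type"] in BOUNDS else 0.0

def pipeline(stream):
    for txn in stream:
        if score(txn) < 0.5:                # step 1: extraneous noise filtered away
            continue
        yield txn                           # only these touch the history

stream = [{"type": "heartbeat"}, {"type": "wire_transfer"}, {"type": "ping"}]
print(list(pipeline(stream)))               # [{'type': 'wire_transfer'}]
```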

This data "mining" engine or neural net is highly focused, real-time processing based on transactions, and it houses "the many different lenses" of focus (or combined derivations) of different but interesting views of history, so that based on the incoming transaction - it can change the "lens" to match and see where the impact is.

From a B.I. perspective, I'm also saying that the sum of the whole may be more interesting and more valuable than the sum of the parts - but to get the sum of the whole, we have to have all the parts when we start. So the INTELLIGENT part of Business Intelligence is all about:
1) Knowing which patterns are most interesting / most costly to the business - establishing the RIGHT LENS at the right time, and having that lens available ahead of the arrival of the transactions.
2) Understanding that changing the color of the lens is easy when the transaction arrives, but that over time the "lens" needs to be replaced (due to virtual scratching / shifting of the answer set) and re-aligned with the full set of facts in the history (there's a small sketch of this just after the list).
3) Recognizing that real-time transaction processing IS 100% necessary in a VLDW / data warehousing environment.
4) Remembering that ALL the facts we collect are important, depending on the "viewing perspective" of the business user.
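And one last sketch for point 2 - mine, with invented numbers and an assumed tolerance: the fast incrementally-nudged lens drifts away from the full history over time, so periodically rebuild it from all the facts and swap it in:

```python
# Invented numbers: detect when the incrementally-nudged lens has
# "scratched" too far from the full history, and rebuild it offline.

def rebuild_lens(history: list) -> float:
    return sum(history) / len(history)       # slow full-history recompute

incremental_mean = 72.0                      # the lens we've nudged all day
history = [10.0, 12.0, 11.0, 250.0]          # what the facts actually say (70.75)

drift = abs(incremental_mean - rebuild_lens(history))
if drift > 0.01 * rebuild_lens(history):     # tolerance is an assumption
    incremental_mean = rebuild_lens(history) # replace the lens, re-aligned
print(incremental_mean)                      # 70.75
```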

New kinds of systems like this are in development labs, and I can help you with your efforts (should you so desire) to focus the lens. But it's advances in technology beyond what we have today that make this so interesting.

Food for thought, anyhow - I'd love to hear what you have to say.

Cheers,
Dan L
DanL@DanLinstedt.com


Posted October 9, 2007 7:33 AM

2 Comments

Interesting thoughts Dan! Thank you for sharing. When can I order? ;-)

Hi Frank,

I'm not sure when one will be available for ordering, but I know how we can build a perspective-based system today out of pieces and parts... The biggest thing is to remember that we must have an "objective view" of the particular lens we want to look through; otherwise all this technology brings in "answers" that have no bearing on what we are trying to accomplish as a business.

Thanks,
Dan L
