Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for Master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

May 2005 Archives

As I discussed in my first articles, nanotechnology is not only here to stay, it has made it into the R&D labs of some of the hottest integrated circuit manufacturers, and now it's on the front page of a mass-circulation publication: carbon nanotubes and carbon nanowires being used to help "cool" and shrink the silicon processor environment.

Watch this space: soon I will begin a journey into what I view as the creation of a super-DNA computer.

Carbon Nanotube based computing devices

Nanotech continues to move at astounding speeds; just keeping up with it will be challenging, to say the least. More to come shortly.


Posted May 31, 2005 3:59 PM

So you're curious, are you? Have I grabbed your attention yet, or is this not making any sense? In thinking about accountability, the data supply chain, and change requests, there is one key component to making this all happen: show the bad data alongside the good - maybe not in the same reports - and physically separate the bad data from the good depending on the severity of the rule breakages.

There's one more thing to think about: if we accept the Data Supply Chain as a paradigm, then we should be keeping the business key on the data unique and unchanging - once assigned, always assigned. The concept goes back to RFIDs and the manufacturing supply chain.

RFIDs are used to clean up data and provide visibility into the manufacturing supply chain; they are driving accountability in business and providing the means and mechanisms with which to improve and measure supply chains around the world. If RFIDs (which are nothing more than unique identifier keys) can do this for our manufacturing supply chain, imagine what a CONSISTENT business key can do for our DATA SUPPLY CHAIN!

It can become the RFID for the data within our systems. This means businesses MUST abide by the following rules (a minimal sketch of mechanical key assignment follows the list):
1. A business key must be assigned mechanically (by the system, not by hand).
2. A business key, once assigned, must be unique.
3. A business key, once assigned, must never be re-used.
4. The application (CRM, ERP, HR, etc.) that assigns DUIDs (Data Unique Identifiers) will provide incredible metrics, visibility, and consistency for improving the Data Supply Chain. Vendors who build this in can begin producing the ultimate application.
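To make rules 1 through 3 concrete, here is a minimal sketch (in Python, with entirely hypothetical names) of what a mechanical DUID issuer might look like: keys are generated by the system rather than typed by hand, each key is unique, and once a key is tied to a business entity it is never re-assigned or re-used.

```python
import uuid

class DuidRegistry:
    """Hypothetical sketch of a mechanical DUID (Data Unique Identifier) issuer.

    Rules illustrated:
      1. Keys are assigned mechanically (no human typing).
      2. Each key is unique.
      3. Once assigned to a business entity, a key is never re-used or re-assigned.
    """

    def __init__(self):
        self._issued = {}     # duid -> business entity description
        self._by_entity = {}  # business entity -> duid (once assigned, always assigned)

    def assign(self, entity_description: str) -> str:
        # If this entity already has a key, return the same key forever.
        if entity_description in self._by_entity:
            return self._by_entity[entity_description]
        duid = uuid.uuid4().hex          # mechanically generated, practically unique
        assert duid not in self._issued  # never re-issue an existing key
        self._issued[duid] = entity_description
        self._by_entity[entity_description] = duid
        return duid

registry = DuidRegistry()
key1 = registry.assign("customer: John Smith, first contact 2005-05-20")
key2 = registry.assign("customer: John Smith, first contact 2005-05-20")
assert key1 == key2  # once assigned, always assigned
```

The real assignment logic would of course live inside the CRM/ERP/HR application itself; the point is simply that the rules above can be enforced mechanically.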

I'm talking about more than just simple sequence numbers; we need an international numbering board. What if (just a thought) the RFIDs could carry the exact same number or ID as the Data Supply Chain? Of course, that would mean tagging the very smallest of parts in all of our assembly lines. Service companies would never have RFIDs to speak of, except maybe on invoices or paper contracts that are printed.

Hmmm, does this mean data is trackable when printed to hard copy? You bet! Imagine a filing room filled with RFIDs - how easy would it be to track down a document? Maybe the US Patent Office or the Library of Congress could undertake something like this, saving billions of dollars a year (Yea! Lower taxes?).

Next step: printers that stamp RFIDs on documents according to document numbers that come from DUIDs. So you get the point: data identifiers (business keys) are just as important as any hard-coded identifier tags we put on products. Curve ball: if we can place a value on the products we produce, and we can begin uniquely identifying our data and its elements, then there's no reason why we can't place a value on our data as well.

Back to the real world: since today we have no commonly accepted notion of truly unique identifiers, business keys will have to suffice. So what's the problem with today's business intelligence reports and systems?

The problem is, by the time the business user sees the integrated data set, every effort has been made to adjust, clean, alter, move, remove, and merge data to make it usable by the business. This is fine until we begin to question what is meant by "One version of the truth." As I've stated in several other entries, TRUTH is in the eye of the beholder and is subjective. It has NOTHING to do with the FACTS of the way the data is captured, stored, and moved around the organization.

While there is value in producing "usable information", we (as BI implementers) have long overlooked the fact that there's also value in producing the unusable facts - the raw data that is messed up, wrong, or unmatched. However, everything starts with the analysis of the business keys. I propose that there are really two answers, both right at the same time. Ahhh - a conundrum? Yup.

I propose that along with DUIDs, we should be storing a single statement of FACT in our warehouses, then moving the FACTS into polarized/colorized versions of the truth in data marts. This means two basic principles apply (a sketch follows the list):

1. Business rules move to the "output" side of the warehouse, between the warehouse and the marts.
2. Raw data that breaks business rules ends up in one or more ERROR MARTS.
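Here is the promised sketch of the two principles, with made-up rule names and record layouts: the warehouse keeps the raw statement of fact, business rules run on the way out, and rows that break a rule are physically routed to an error mart instead of being silently cleansed.

```python
# Hypothetical sketch: business rules applied on the "output" side of the
# warehouse. Rows that pass flow to the data mart; rows that break a rule
# are physically separated into an error mart, preserving the raw fact.

raw_warehouse_rows = [
    {"business_key": "SLS123", "ship_date": "2005-05-20", "amount": 1500.00},
    {"business_key": None,     "ship_date": "2005-05-21", "amount": -42.00},
]

def business_rules(row):
    """Return a list of rule names the row breaks (empty list = clean)."""
    broken = []
    if not row["business_key"]:
        broken.append("missing business key")
    if row["amount"] < 0:
        broken.append("negative amount")
    return broken

data_mart, error_mart = [], []
for row in raw_warehouse_rows:
    broken = business_rules(row)
    if broken:
        # keep the raw row untouched, plus the reason it was rejected
        error_mart.append({**row, "broken_rules": broken})
    else:
        data_mart.append(row)

print(len(data_mart), "clean rows;", len(error_mart), "rows for the error mart")
```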

Physical separation of the data is absolutely necessary: to begin pushing accountability back into the business, to begin the IQ (Information Quality) cycle and the business process clean-up, to begin providing true visibility into ALL data that exists in the source systems, and to begin showing the FULL level of rejects in our data supply chain.

Manufacturing supply chains don't throw away "bad parts"; they put them in reject bins, record them, try to figure out why they went bad, and then try to improve the process so they don't make the same mistake again (because it costs them money, time, and competitive advantage). Why shouldn't we treat our data this way? Why do so many implementation specialists INSIST on cleansing, mixing, merging, and constantly fine-tuning the "truth" so that these errors are hidden or disposed of?

By actually separating the bad data into "reject bins" at the lowest level of grain - before it is cleansed, mixed, merged, and so on - we can really begin to take inventory of our source systems and the business processes; we can finally see where our businesses are HEMORRHAGING money, time, and competitive advantage.

In our next entry, we'll walk through an example of how this worked at a real customer site. IT'S TIME for OUR DATA SUPPLY CHAIN to step up and begin working for us.

Comments?


Posted May 31, 2005 7:00 AM

We're here: dirty data, complex business processes, inconsistent integration points - sounds like what an EDW/ADW is supposed to help solve, right? Parts of it, anyhow, are solved by the EDW/ADW; other parts must be solved by accountability of end-users; still other parts must be solved through SOI (service-oriented integration, under the SOA stamp).

We've established rule #1: in a sea of data throughout our enterprises, the single most important data point is the business key - the one and only reference across the company that means something to the business, and allows the business direct access to the data set they are after.

Are we ready for rule number two? Not quite yet. Let's explore dirty data further. Not to change tack, but Information Quality is extremely important. It's not just about the data itself; it's about the people, the business processes, the metadata, and the metrics and measurements, all used to ensure continuous business improvement.

Dirty data and broken business processes can make a company "bleed money," and that's just the START! Data models that help increase accountability from end-users, and systems architectures that help raise the visibility of business process problems, help stop the bleeding and can save millions of dollars a year if done right. But to understand these statements, we must walk through just how the systems got this way.

So we take the case of the broken business and customer SLS123: we just lost $30M to our big competitor because we took 5 weeks to respond while our competitor took 3. Please note: just because they responded more quickly doesn't necessarily mean that the quality of their product is better - it just means they streamlined a portion of their sales, finance, and contracts communications. Now, if they deliver faster with higher quality, then they've truly got us beat, and we will go out of business if we don't do something to correct the situation (keep up).

By the way, this is what ERP systems attempt to address, and they sometimes do a good job of it, but obviously they leave a little to be desired (due to high levels of customization) - hence the use of additional tool sets like EAI to move the customer into CRM systems and through even more complex business processes.

After examining our business process, here's what we find:
Sally takes the first contact call.
Sally assigns SLS123 to the customer record.
Sally pre-qualifies and fills out some basic information, in which she accidentally enters the wrong address, or uses special characters to represent information that she can't store in the source system.
Because Sally wants the bonus for this customer, and doesn't want her sales counterpart Joe to get the bonus, she uses her own special characters that only she understands and can interpret to management.
Sally then hands the account off to Finance and sends an email to Jim, with whom she works closely because she has a good business relationship with him.
Jim in Finance pulls up the customer record by name; an auto-synchronization routine in the source systems moved the record from Sales to Finance last night and changed the account number from SLS123 to FIN456.
Jim then walks through a series of checkpoints in the application, and has to call Sally to understand her encoding of the special characters (over time, Jim begins to understand it, but doesn't annotate any of the metadata).
Jim then changes parts of the application and sends the FIN456 customer to management for approval/disapproval.
Financial Management then approves the customer FIN456, calls Jim, and says: pick up the customer, it's ready.
Jim then says "good to go" and marks the record for upload to Contracts.
That night the synchronization system moves the record to Contracts, and promptly changes the customer number again to CONT259.

And the cycle goes on; the complexity increases; the touch points increase. When we look at this particular scenario, we discover that there are critical touch points and manual approval mechanisms that must be in place; we also discover interesting auto-synchronization mechanisms hidden in our legacy systems, or even in our re-engineering of the legacy into ERP and CRM.

We finally discover that there are unnecessary processes the data goes through which neither improve the quality nor speed the process up. These are the business processes we wish to eliminate to stop the bleeding. Now look at the data set. One customer, John Smith, has 3 account numbers: SLS123, FIN456, CONT259. Can the business trace John Smith at an enterprise level? Not very effectively. Does the business have deep visibility into its data supply chain? No.

Business Rule #2 for effective profitability:
Once a key is assigned to a data point, it MUST NOT CHANGE.

Not in a box, not with a fox, not here nor there, not anywhere (Dr. Seuss) - the business key must stay as a consistent representation of the data point from this point forward.

Business Rule #3:
If the key changes, you can be certain that there is a break in the business process at that point, and that you are bleeding money.

Business Rule #4:
If the key changes, you can be certain that there is a flavor of data ownership (kingdoms, fiefdoms) within your organization, and that there are parts of the organization that are guaranteed to produce different financial results - every time, and nearly on purpose (it is embedded in the culture of that business unit to say the other units are "wrong" in their view of the customer).

Business Rule #5:
Use of abstract character annotations to mean certain things in metadata format is usually an indication that the incentive from corporate is misplaced. It also means that the business users cannot be held accountable for poor audits, nor are they incented to improve the data quality, even though the data itself is "broken", as is the capture system. (A sketch illustrating rules #2 through #5 follows this list.)
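Here is the sketch promised above: a minimal illustration (hypothetical data, following the SLS123/FIN456/CONT259 example) of how a metrics process might flag the breaks described in rules #2 through #5 - every hand-off where the business key changed, and every field that leans on ad-hoc special-character annotations.

```python
import re

# Hypothetical hand-off trail for one customer, in the order the record
# moved through the business (Sales -> Finance -> Contracts).
handoffs = [
    {"system": "Sales",     "business_key": "SLS123",  "notes": "addr: 123 Main St"},
    {"system": "Finance",   "business_key": "FIN456",  "notes": "credit ok ##@@"},
    {"system": "Contracts", "business_key": "CONT259", "notes": "signed"},
]

# Rules #2/#3: every change of key between consecutive hand-offs marks a
# break in the business process (and a place where money is bleeding).
key_breaks = [
    (prev["system"], curr["system"], prev["business_key"], curr["business_key"])
    for prev, curr in zip(handoffs, handoffs[1:])
    if prev["business_key"] != curr["business_key"]
]

# Rule #5: ad-hoc special-character annotations in free-text fields usually
# signal a capture system (and an incentive scheme) that is broken.
suspect_annotations = [
    (h["system"], h["notes"])
    for h in handoffs
    if re.search(r"[#@~^]{2,}", h["notes"])
]

print("key changes detected:", key_breaks)
print("fields with ad-hoc annotations:", suspect_annotations)
```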

As we continue down this track, we will discuss how an integrated data store (ADW/EDW) can help pinpoint some of these problems from a metrics-driven perspective - but only if the right models are in place. We will also begin showing how to help business users become more accountable in their positions - to actually begin issuing change requests and allocating dollars to fixing the source capture systems, thus stopping the hemorrhaging of the company while making it more nimble and streamlined.

Thoughts? Shout out, enter your comments below... I would love to hear from you.


Posted May 27, 2005 4:57 AM

I'm happy to see that Doug Laney has joined us here in the blog space. Not to take anything away from his valuable services, but I would also like to say that we are offering a free ETL score-carding mechanism. This is a very short entry to show you where the score-card lives. The downloadable score-card is free and empty, and names no vendors.

The free ETL/ELT scorecard is downloadable at: www.MyersHolum.com

My thoughts on the ETL/ELT scorecard are as follows: ETL and ELT are two different utilities and really shouldn't be compared except in the areas of metadata, GUI development (no-code environment during development), flexibility, and connectivity. Unfortunately, comparing ETL to ELT in the transformation areas is unfair, but necessary. It is important to evaluate which transformations are provided by the RDBMS vendor and which you have to add to the RDBMS yourself (as UDFs - User Defined Functions).

However, the true nature of this scorecard is that it looks at sourcing, targeting, metadata, transformations, market stability, cost, number of outside consulting firms, cost of available consulting knowledge, and a few other key metrics. Please feel free to download it, and post your comments, questions, remarks, or improvements here.
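For readers who want to see how a filled-in score-card might be tallied, here is a minimal sketch; the category names echo the list above, but the weights and the 0-10 scores are purely illustrative assumptions, not values from the downloadable score-card.

```python
# Hypothetical sketch of tallying a weighted ETL/ELT score-card.
# The weights and scores below are made up purely for illustration.
weights = {
    "sourcing": 0.15, "targeting": 0.15, "metadata": 0.15,
    "transformations": 0.20, "market stability": 0.10,
    "cost": 0.15, "consulting availability": 0.10,
}
vendor_scores = {  # fill in per vendor after hands-on evaluation
    "Vendor A": {"sourcing": 8, "targeting": 7, "metadata": 9,
                 "transformations": 6, "market stability": 8,
                 "cost": 5, "consulting availability": 7},
}

for vendor, scores in vendor_scores.items():
    total = sum(weights[c] * scores[c] for c in weights)
    print(f"{vendor}: weighted score {total:.2f} out of 10")
```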

Thanks,
Dan Linstedt
CTO, Myers-Holum, Inc.
daniel.Linstedt@myersHolum.com


Posted May 27, 2005 4:29 AM

Ok, now that we've introduced the concept, let's walk through some examples of complex business processes and dirty data. Let's find out just what we can do about starting to solve some of these problems. Furthermore, let's explore the real issue of "broken" business processes - do you have some of these in your organization?

So profitability is tied to the complexity of business processes, coupled with dirty data and too much manual intervention. What exactly does this look like?

Here's an example:
Suppose a customer calls Sales, and says: I would like product X with the following configurations: CA, CB, CC. Sales begins tracking the customer, captures some information (hopefully not fat-fingered) about the customer and their contact point, along with the product and configuration. The customer is then assigned an account number: SLS123.

The customer wants to know approximately when this will be built and shipped, or if there are ways for them to track the product through its build cycle. The business says: well, we can only track it once it's shipped to you, and we can't estimate its cost or its build time until we have designed the custom parts. Customer says: fair enough, when will you have a design complete? Sales says: can we get back to you in a week?

Ok - Sales has the customer contact; they qualify the lead through a number of manual intervention processes before passing it off to Finance. Finance takes SLS123 and changes the account number to FIN123. Now I ask you: is there any traceability in this simple example across Sales and Finance at a corporate level? No - not unless someone in Finance or Sales records the customer account number change (from/to).

Finance runs it through its paces, approves financial lending, and then passes it off to Contracts, which runs it through a series of complex business processes with manual intervention. By the way, Contracts changes the account number from FIN123 to CON456. The customer finally gets a call 3 weeks later stating there is a contract for the customer to sign. But before they can give a delivery date, they need Planning to run the manufacturing phase through their systems, so off it goes.

Another two weeks pass, and Planning returns to Contracts to provide an estimated build plan and date. We're already 5 weeks from initial contact, and by the way, the customer has put the same bid in to our competitors. Three weeks ago, our competitor returned the bid and build ETA to the customer. We call the customer back and they say: sorry, your competitor won the bid. We lose $300 million.

What happened? Our complex business process has not been optimized or streamlined. There were unnecessary hand-offs involving manual intervention and alternate business units in order to win the business. Imagine if Sales were empowered to a) check financial standing, b) run the contract up against previous builds of a similar nature (data mining with confidence levels), and c) run this by a financial analyst and a contracts approval individual - all within 2 days - and return to the customer.

This would a) make the business more profitable, b) make it cheaper to handle contracts and approve financials, c) single out contracts that are too difficult, outside our sweet spot, or specialized enough to warrant higher prices, and d) make us highly nimble and competitive.

In order to get there, we must a) reduce the number of touch points on the data, b) utilize data mining tools in an active warehouse to enable insight at the sales contact level, and c) simplify/streamline the business processes between customer contact, estimation, finance, and contracts approval - which means cycle time reduction and business process critical path analysis.

Think of the business processes - both mechanical data touch points and manual data touch points - as a graph of 2D lines (x, y coordinates). The complexity of the process going from A to B is the rise/run, or Y coordinate. The X coordinate is the process number. Then graph the business processes as best you can. Finally, begin to analyze the graph for the critical path - attempting to eliminate touch points and reduce the complexity of the business processes (reducing the Y) to end up with as "straight a line as possible".
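A minimal sketch of that graphing exercise, using made-up process steps and complexity values: each step becomes an (x, y) point, the total path complexity is summed, and the steps are ranked so the highest-complexity touch points can be targeted for elimination first.

```python
# Hypothetical sketch of the 2D "process complexity" graph, in tabular form.
# x = process number, y = complexity (rise/run) of getting from step A to B.
process_steps = [
    ("sales capture",         1, 1.0),
    ("sales -> finance",      2, 3.5),  # manual hand-off plus a key change
    ("finance approval",      3, 2.0),
    ("finance -> contracts",  4, 4.0),  # another hand-off plus a key change
    ("contracts -> planning", 5, 2.5),
]

total_complexity = sum(y for _, _, y in process_steps)
print("total path complexity:", total_complexity)

# Rank steps by complexity: the flattest possible line means eliminating or
# simplifying the highest-y touch points first.
for name, x, y in sorted(process_steps, key=lambda s: s[2], reverse=True):
    print(f"step {x} ({name}): complexity {y}")
```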

Keep in mind that changing keys to information doubles complexity, even if the changes are recorded. I think you'll be delightfully surprised. Companies that undertake this effort can save millions of dollars a year with half the investment; furthermore, it drives quality up, profitability up, complexity down, and overhead down, and it speeds up time to deliver. The result? More satisfied customers and a more nimble business.

Now let's take a look at the dirty data problem (which we'll explore further in Part 3). The first problem is that we need an enterprise view of this customer, even if it has to span business SECTORS, and not just companies within those sectors. This will be the ONLY way to roll up a single customer and pinpoint exactly where their deliveries are within the entire organization. Sometimes this is referred to as the Data Supply Chain (Jill Dyche, Baseline Consulting, TDWI 2005).

What if we kept the SAME customer account number throughout all processes? We could pinpoint exactly where in the data supply chain their application is, and we could begin tracking and monitoring (metrics, KPAs/KPIs) the efficiency of the business process. Ahh, you say, we have that in place! Ok, but what happens when you re-bill a customer? Do your systems change the invoice number? It's the same problem, different data.

Paradigm Rule #1:
1. KEYS to information within the organization must remain consistent over time.

So business keys are an extremely important starting metric for business profitability. If you start by pinpointing the places where keys are changed throughout the business, you can begin identifying major breaks in the data supply chain.

We'll dive deeper into these concepts in Part 3. Thanks. By the way - TDWI, November, Orlando: come see the Data Vault data modeling in play, or read about it at www.DanLinstedt.com.


Posted May 26, 2005 6:11 AM