Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for Masters students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

September 2005 Archives

Hmmm, I've been thinking about this for quite a while. In the tangible world we have tags for physical goods - yesterday they were bar codes, today they're RFID tags and RTLS systems. Tomorrow, physical elements may be tagged with DNA sequences, or electron signatures at the nano level.

Why then is it so hard to track intangible "data"? For applications we have the equivalent of software licenses, but for the actual data? Nothing.

In a world of hypothetical speculation, I would suppose that tagging every data element with an individual signature may be desirable. We could start with "units of work". Take this blog entry for example: tag the header with a signature, and tag the extended entry with a signature. The important thing is: the signature must travel with the unit of data - everywhere it goes. It becomes the unique ID for the data set, like an RFID tag, and should be tracked across the network.

What possibilities does this open up? In data warehousing we often tag our data with CRC32/CRC64 and MD5 (hash functions that produce mostly unique values across a row of data). Why then can't these "keys" become universal and be shared around the world? These are standard functions that produce the same keys for the same data everywhere.

What would happen if we could actually tag every "word" entered into every application? I assume data traceability would increase exponentially - talk about a boost to search engines! Unfortunately, the downside to these functions is that they produce very large keys, and for the most part functions like MD5 cannot be "reversed" - which leads to a massive storage and lookup burden. Another issue is that CRC32 can produce duplicates (as can CRC64, although less frequently).
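To make the idea concrete, here is a minimal sketch in Python (purely illustrative - the function name and sample row are my own, not part of any standard) of tagging a "unit of data" with the very functions mentioned above. The same bytes produce the same CRC32 and MD5 keys on any machine, which is exactly what would make them candidates for universal keys - and it also shows the trade-off: the MD5 key is long and one-way.

```python
# A minimal sketch: derive a deterministic "tag" for a unit of data using
# CRC32 and MD5 - the same functions often used to fingerprint warehouse rows.
import hashlib
import zlib

def tag_row(values, delimiter="|"):
    """Serialize a row of data and return its (CRC32, MD5) fingerprints."""
    # Canonical serialization matters: the same values must always yield the same bytes.
    payload = delimiter.join(str(v) for v in values).encode("utf-8")
    crc32_key = zlib.crc32(payload) & 0xFFFFFFFF   # 32-bit key: small, but collision-prone
    md5_key = hashlib.md5(payload).hexdigest()     # 128-bit key: larger, effectively one-way
    return crc32_key, md5_key

row = ["Dan Linstedt", "blog entry header", "2005-09-30"]
print(tag_row(row))                          # identical output anywhere this runs
print(tag_row(row) == tag_row(list(row)))    # True: same data, same universal key
```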

If someone were to produce a device that can "tag" data going over the internet and store the data in compressed format with the key-tag, then patterns would be easier to spot - data mining would see a huge boost, and it may be possible to aggregate what used to be seen as dissimilar data into a common keyed entry. These keys could also be shared across environments - maybe this is a call to EII vendors who are sharing data over SOA and web services?

For now it's a pipe dream, but it may step into reality with DNA computing or nanotechnology. Just think: Data Unique Universal Keys (DUUK) - a fascinating idea. From compliance and monitoring perspectives it opens a ton of doors.

Thoughts?


Posted September 30, 2005 6:24 AM
Permalink | 1 Comment |

Holy Begonias!!! Talk about breach of security, and loss of personal identity. This story: "Credit card firms don't have to warn individuals" seems too ludicrous to ignore.

This is just a plain outrage. In this blog we'll explore what this means to corporations, especially with this precedent set - beware: I'm warning you now, this entry is mostly a RANT.

This is incredible. How can a judge in San Francisco say that credit card companies don't have to report "break-ins" or successful hack attempts on 40 million+ consumers? He must not have had his identity stolen!

In terms of personal loss, this is tremendously devastating. The ruling makes a statement to corporations everywhere that they no longer need to "admit" that all that information they collected from you was stolen. That's right! Of course, that means it also makes it easier for the corporations to "sign agreements" to share information without your knowledge.

* What happens to the privacy policies? Out the window.
* How about Information "protection promises"? Meaningless.

If corporations follow suit, we will have chaos very very soon.

What if a medical claims company has its system hacked, and all your medical information is "compromised" (to use their terms)? What if this medical claims company (or financial company for that matter) doesn't have to tell you that your personal information was stolen, and now appears on free hacker sites all over the world? Did this judge stop to think about all the rules that HIPAA states about privacy of medical information? I think not. The information may have to remain private, but if it's stolen - well, the company doesn't have to tell you about it.

This is one of those things that is just a BIG MISTAKE by a judge. I'm sorry, but if I can't own the information about me, the next best thing is to hold the credit card (and all the other) companies accountable for what happens to it! So what's the score now?

Accountability ZERO, Deceit TEN.

My oh my - and have we stopped to think about what this does to Information Quality? If the information that is stolen is flat-out wrong, now there's absolutely no way to get it "fixed."

Bottom line? This ruling opened a Pandora’s box: You can't trust anything anyone ever says about keeping your information "private" anymore.

Anyone else care to Rant? Rant anonymously if you wish...


Posted September 27, 2005 3:37 PM
Permalink | No Comments |

Want to break down the barriers? Tired of "taking sides" when you don't have to? In this blog I explore a modeling technique called the Data Vault (no, it doesn't have to do with security or locking your data away). As a warehouse design, this technique sits squarely between the Inmon 3rd normal form warehouse and the Kimball star schema.

This modeling technique comprises the best of breed from both designs and is built to overcome the limitations of the adaptations made to each data modeling architecture, specifically with regard to data warehousing.

What is a Data Vault?
Definition: The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, and consistent. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses.

What is its real name?
Common Foundational Warehouse Architecture

What does it do for me?
* Saves time and money in build out of your data warehouse or enterprise integration initiative.
* Provides a consistent, repeatable architecture that can scale to your enterprise needs.
* Defines easy to use standards for all to follow
* Reduces complexity of the integration effort
* Saves storage space
* Increases visibility into both well-oiled and broken business processes
* Demonstrates a strong basis for Data Visualization and Data Mining

What are some of the benefits?
* Rapid prototyping and build out of data marts and reporting solutions.
* Produces genuine audit trail pictures of the enterprise vision of data (even if none exist on the source systems)
* Data is modeled by type, and rate of change - allowing both batch and real-time to be "added" to the warehouse at the same time.
* Contains a high degree of data attribution
* Scales to petabyte levels if necessary
* Can handle near-real-time data arriving within fractions of a second.

Yes, but are there any customers using it?
Sure - just check the web site for more information.

What are some of the success stories?
* Large manufacturing company saved millions when finding and fixing a billing error that had been occurring for the past 15 years.
* Large financial company integrated 3 M&A companies in 3 months flat
* Large banking company adds "branches" to their warehouse quickly at a low cost.
* Government operation decreases "time to build data marts" to 1 hour (from requirements to inception).

What makes the Data Vault so successful?
Its ability to be modeled AT THE BUSINESS LEVEL. The Data Vault is designed to mimic the business keys - the most important data elements in business. Once the keys are established, it models the relationships across those keys, which flushes out both process relationships and undocumented business operational relationships. Finally, the attribute or descriptive data is added to the mix and split by Type of Data and Rate of Change.

From a business perspective, the logical model is tied tightly to the physical model and architecture - therefore it is easy to change the model as the business changes. It is also based on statements of fact: the business keys are in use, and were in use, at a specific point in time. This model is built to capture those ideas.
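As a purely illustrative sketch (the structure and column names below are my own shorthand, not a prescribed layout), here is how a single source row could be split along exactly those lines: business keys in their own structures, the relationship across those keys in another, and descriptive attributes - separated by rate of change and stamped for historical tracking - in the rest.

```python
from datetime import datetime

def split_order_row(source_row, record_source="OLTP"):
    """Illustrative split of one source row into business-key, relationship,
    and descriptive-attribute structures."""
    load_dts = datetime.now().isoformat()

    # Business keys only - the most important data elements in the business.
    hub_customer = {"customer_key": source_row["customer_id"],
                    "load_dts": load_dts, "record_source": record_source}
    hub_order = {"order_key": source_row["order_id"],
                 "load_dts": load_dts, "record_source": record_source}

    # The relationship across those keys: which customer placed which order.
    link_customer_order = {"customer_key": source_row["customer_id"],
                           "order_key": source_row["order_id"],
                           "load_dts": load_dts, "record_source": record_source}

    # Descriptive data, split by rate of change, so fast- and slow-changing
    # attributes can be loaded (batch or real-time) independently.
    sat_order_status = {"order_key": source_row["order_id"],   # changes often
                        "status": source_row["status"],
                        "load_dts": load_dts}
    sat_order_detail = {"order_key": source_row["order_id"],   # changes rarely
                        "order_date": source_row["order_date"],
                        "amount": source_row["amount"],
                        "load_dts": load_dts}

    return hub_customer, hub_order, link_customer_order, sat_order_status, sat_order_detail
```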

Come hear more about this technique at TDWI, Sunday October 30th in Orlando. Or contact me directly: daniel.Linstedt@myersHolum.com

Thanks,
Dan L


Posted September 27, 2005 5:26 AM
Permalink | 3 Comments |

Pardon my ignorance, I'm still learning EII too - where it fits, how it's growing up, and what customers really can do with it in the long run. Lately I've been asked to describe EII vs EAI vs ETL in layman’s terms. I'll attempt to do this in this (short) entry. Again, if I misunderstand, please correct me for the benefit of the community, and I'll go seek additional definitions for better content next time.

By the way, I do enjoy the comments I've been getting from Tim Mathews (and others). He has a blog on Ipedo's web site: http://blogs.ipedo.com/integration_insider/

I work hard at translating terms into layman's terms, and sometimes I don't quite get it right the first time ;-) Please correct me if I misstate some truths in translation. Just don't send the BabelFish after me!

I'll try my best to explain the basic definition, differentiator, and provide an over-simplified example of where the technology might fit. Hopefully this will clear the air a bit more...

EII - Enterprise Information Integration, crudely defined as a middle tier query server; but it's much more than that. It contains a metadata layer with consolidated business definitions. It also contains (usually) an ability to communicate through web-services, database connections, or XQuery/XPath (XML translation). In fact, it relies heavily on the metadata layer to define "how and where" to get its data.

It's a PULL engine that waits for a request, splits the query (if it has to) across heterogeneous source systems (multiple sources), gathers (mostly transactional) data sets, merges them together (again relying on the metadata layer for integration rules), then pushes them out to the requestor - which could be a web service, a BI query tool, Excel, or some other front end (like an EAI or Message Queuing system).

With EII it may be safe to say: the more definition a business can provide for the metadata layer, the better the ROI the business will see, and the higher the utilization of the tool.

EII usually sits seamlessly between the requestor and the multiple scattered data sets. One final note: its job is NOT (as of today) to move massive batches of information on a scheduled basis from point A to point B through heavy translation layers.

An oversimplified example might be: a voter walks into a voting area, and the registrar needs to check his background, current address, phone numbers, driver's license records, and any recent activity involving the law. Each system has its own interface, each system is completely disparate and doesn't talk to the others, and the registrar only has a driver's license number (maybe a current address) to look the voter up with. They need a response in a matter of seconds: can this guy vote here now? EII is a perfect fit for getting this kind of job done, although the registrar uses a web interface and never "sees" the EII tool doing the work.
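As an over-simplified sketch of that PULL pattern (the source systems and lookup functions below are hypothetical stand-ins - in a real EII server the metadata layer decides where each piece of the request is routed and how the results are merged), one request is split across disparate systems, the partial answers are merged, and a single response goes back to the requestor:

```python
# Hypothetical, hard-coded "sources" standing in for disparate back-end systems.
def query_dmv(license_no):           return {"name": "J. Voter", "address": "123 Elm St"}
def query_voter_rolls(license_no):   return {"registered_precinct": "12-B"}
def query_court_records(license_no): return {"open_cases": 0}

def can_vote_here(license_no, precinct):
    # One incoming request is split across the disparate systems (the PULL)...
    partials = [query_dmv(license_no),
                query_voter_rolls(license_no),
                query_court_records(license_no)]
    # ...then the pieces are merged into a single answer for the requestor
    # (a web front end, BI tool, Excel, etc.).
    merged = {}
    for piece in partials:
        merged.update(piece)
    return merged["registered_precinct"] == precinct and merged["open_cases"] == 0

print(can_vote_here("D123-4567", "12-B"))   # True - answered on demand, in seconds
```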

EAI - Enterprise Application Integration. This one's been around for a while. In layman's terms: EAI connects your Siebel to your PeopleSoft, and your Oracle Financials to your SAP systems, and vice versa. Most EAI systems are PUSH driven: a transaction happens in your enterprise app, and an EAI listener "sees" it and pushes it out over the bus, or to a centralized queue for distribution to other applications. Most EAI engines are more "workflow" and "process flow" driven rather than on-demand.

A simple example: PeopleSoft is connected to Oracle Financials; a sales person enters a new customer order, the EAI application picks up the new customer / new order, and sends it to Oracle Financials to be recorded. EAI is also transaction oriented. EAI's major flaw? It doesn't talk to "non-applications" like legacy systems, data warehouses, Excel spreadsheets, stock tickers, unstructured data, email, and so on (although some vendors have built custom "readers" for this information).
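Here is a bare-bones sketch of that PUSH pattern (the application names and in-memory queue are just stand-ins for a real EAI bus): a listener sees the new transaction in the source application and publishes it onto the queue, and the subscribing application picks it up without ever asking for it.

```python
from queue import Queue

bus = Queue()   # stand-in for an EAI message bus / centralized queue

def peoplesoft_listener(new_order):
    # The listener "sees" the transaction in the source app and pushes it onto the bus.
    bus.put({"event": "new_order", "payload": new_order})

def oracle_financials_subscriber():
    # The downstream application consumes the event and records it - nobody queried for it.
    while not bus.empty():
        message = bus.get()
        print("Recording in Financials:", message["payload"])

peoplesoft_listener({"customer": "ACME Corp", "order_total": 1200.00})
oracle_financials_subscriber()
```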

ETL - Extract, Transform and Load, sometimes known as ELT (extract, load, THEN transform). This also is an older paradigm (although somewhat newer than EAI from an acronym standpoint). ETL/ELT offers PUSH technology, usually geared towards huge volumes and highly parallel, repetitive tasks, scheduled and continuous. These engines are the heartbeat of many integration systems around the world today - they feed massive amounts of data from point A to point B in a timely fashion, and they are responsible for performing that task on a consistent and repeatable basis. They handle massive transformations (sometimes in the database, sometimes in stream).

Most ETL/ELT engines today also run on metadata, but a different kind of metadata (compared to EII). The metadata they utilize is what I like to call PROCESS METADATA: it contains back-office workflow information, and the end results of the data integration are often seen by utilizing data marts or querying the database directly. Although rare, ETL/ELT can also be used as a device to synchronize systems around the organization on an hourly or nightly basis.

ELT/ETL engines often do NOT respond well to transaction-based requests, which is why ETL/ELT vendors are struggling with real-time integration today. An example of ELT/ETL would be: integrate all customer data from 4 or 5 of my source systems overnight, and produce a customer management table with all my customers in it. While you're at it, get me an ice cream with a cherry on top and a root beer... Just kidding.
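And a toy sketch of that overnight batch job (the source systems and column names are invented for illustration): extract customers from several systems on a schedule, transform and deduplicate them, and load one consolidated customer table - high volume, repeatable, and with nobody waiting on a response.

```python
# Extract: pull everything from each source on the nightly schedule.
source_systems = {
    "crm":     [{"cust_id": 1, "name": "ACME CORP"}],
    "billing": [{"cust_id": 1, "name": "Acme Corp"}, {"cust_id": 2, "name": "Globex"}],
    "orders":  [{"cust_id": 2, "name": "GLOBEX"}],
}

def run_nightly_load():
    customer_master = {}
    for system, rows in source_systems.items():
        for row in rows:
            # Transform: standardize the name, deduplicate on the business key.
            customer_master[row["cust_id"]] = {"cust_id": row["cust_id"],
                                               "name": row["name"].title(),
                                               "last_source": system}
    # Load: write out the consolidated customer management table (printed here).
    for record in customer_master.values():
        print(record)

run_nightly_load()
```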

Well, this brings this entry to a close. I hope you enjoyed "my version of the truth." Feel free to correct me, and I'll do more homework next time. Same B-Eye Time, Same B-Eye Channel - tune in next time for: For Whom the EII Bell Tolls??

Cheers,
Dan L


Posted September 20, 2005 7:30 PM
Permalink | 6 Comments |

I blogged about the Stinkiest Shoe Competition back in May 2005. Read it here... This time I come to find out that a sock manufacturer has actually created (or is working on creating) anti-stink SOCKS through nanotech.

**Note, this blog is not for those with severe allergic reaction to big stinks...** (this entry is short and light hearted) :)

I asked the question: what would happen if a shoe manufacturer could utilize or apply nanotech in a way that would make tennis shoes less stinky? Now, with less-stinky socks, do we even need to worry about stink-free shoes? Probably - especially for those who don't wear socks.

But that's beside the point - read about this Boston sock manufacturer and their application of nanotech to sock material here... Really, I'm not kidding. I think there's big money to be made in the business of applying nanotech to solving everyday problems like stinky shoes and now, stinky feet.

All I need now is a pair of socks that "don't get cold when my feet sweat."

Weigh in now, tell us how you feel - would you buy a pair of these socks? You can answer anonymously or "for a friend of yours" if you like...

See you next time, Dan L


Posted September 16, 2005 10:45 AM
Permalink | 1 Comment |
