Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

May 2007 Archives

Context fascinates me, not to mention I love a good challenge... Perhaps it's the fever I have tonight, and perhaps it's just a wandering mind. In this entry I'm going to explore a couple of perspectives that I've been creating lately, along with a few theories that I'm proving out - and they have to do with (what else?) the obvious: the fact that metadata drives our data model structures, but rarely is metadata synchronized with definitional context (derived automatically from unstructured data sources), and rarely is it visualized beyond the standard two-dimensional data models that we are so used to seeing and working with.

This entry is a thought experiment that dives into a land of "what-if" analysis, and attaches it to what I call Dynamic Data Warehousing - which also leads to Dynamic Automated Architecture Manageability. The problem is: how can we build a consistent, standardized, and solid foundational data model that will adapt itself going forward as the business and its needs change? (All of this, of course, without losing sight of all the history that has already been collected.) Impossible you say? Not at all...

A few of my good friends (much brighter than I am) discuss semantic notions of context on a continuous basis. You can read about some of these phenomenal thoughts here. One of them discusses notions of semantic neutrality, and the fact that examining semantic reconciliation is the first step to success:

It is important to address semantic reconciliation before other analytical processes (e.g., statistical analysis, market segmentation, link analysis, etc.). This is a "first things first" principle because semantic reconciliation makes secondary analytic and computational problems that much easier and that much more accurate.

All too many "semantic-driven engines" on the market today address the data as the starting point, establishing statistical analysis, market segmentation, etc., before ever addressing _any_ of the semantics of the data model (i.e., the naming conventions, prefixes, suffixes, abbreviations, definitions, correlations, and so on). This is fine if what you want is a model that will hold your current data set, but neither your future nor your past ones.
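To make that concrete, here's a minimal thought-experiment sketch (Python; the abbreviation glossary and column names are entirely hypothetical) of what name-level semantic reconciliation looks like: expand the prefixes, suffixes, and abbreviations buried in physical column names, then match columns across models on meaning rather than on data values.

```python
# A minimal sketch of name-level semantic reconciliation: expand the
# abbreviations embedded in physical column names so that columns from
# different source models can be matched on meaning, not on data values.
# The glossary and the sample columns below are hypothetical.

ABBREVIATIONS = {
    "cust": "customer", "custmr": "customer",
    "nbr": "number", "num": "number", "no": "number",
    "addr": "address", "dt": "date", "amt": "amount",
}

def canonical_name(physical_name: str) -> str:
    """Normalize a physical column name into a canonical business term."""
    tokens = physical_name.lower().split("_")
    expanded = [ABBREVIATIONS.get(tok, tok) for tok in tokens]
    return " ".join(expanded)

def reconcile(model_a: list[str], model_b: list[str]) -> list[tuple[str, str]]:
    """Pair columns from two models that share the same canonical meaning."""
    index = {canonical_name(col): col for col in model_a}
    return [(index[canonical_name(col)], col)
            for col in model_b if canonical_name(col) in index]

# Two hypothetical source models that name the same concepts differently:
print(reconcile(["CUST_NBR", "CUST_ADDR", "ORDER_DT"],
                ["CUSTMR_NO", "SHIP_ADDR", "ORDER_DATE"]))
# -> [('CUST_NBR', 'CUSTMR_NO'), ('ORDER_DT', 'ORDER_DATE')]
```

Notice that no data values were read at all: the match came purely from the semantics of the model.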

An engine that builds a data model based on just the data set is completely ignoring 80% of the problem. Any semantic engine that "scans the data" to build a model might build the "most correct model" for today's snapshot of data, but it fails to be dynamic, adaptable, or even consistent (in structural layers) going forward.

The problem with basing metadata or newly constructed models on "data" is that it only captures the value of TODAY. And if you're lucky enough to have history, it might form a picture of yesterday - but then again, the tool may make compromises based on outliers and strange patterns that occurred (errors in the data). Or worse yet, it may not be able to make heads or tails of the data, coming to no conclusion at all about what the model should be.

There is a solution: Data Model Architectural Consolidation, based on the metadata or definitional elements held within. Coupling the existing models with semantic definitions and a few other elements not only increases the value of the metadata, but soon provides enough context to make a decision as to what the model really should be. This is what "Model Driven Architecture" is truly about; MDA is NOT about "data driven architecture" as some vendors claim.

What I'm proposing here is that not only can an engine be developed to automate consolidation of data models, but that it can in fact apply new changes to an existing consolidated data model based on semantic discovery and association. This can lead to a common data model which would last for years within a business and provide a solid, repeatable foundation from which to build.
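As a sketch of that idea (again hypothetical, reusing the canonical_name helper from the sketch above), a consolidation engine could fold a changed source model into the common model additively, so that nothing already collected is ever lost:

```python
# Hypothetical sketch: fold a changed source model into the common model
# additively - new business concepts extend the model, and nothing is ever
# dropped, so the history already collected stays queryable.
# canonical_name() is the reconciliation helper from the sketch above.

def consolidate(common_model: dict[str, str], source_columns: list[str]) -> dict[str, str]:
    """common_model maps canonical business term -> chosen physical name."""
    for col in source_columns:
        term = canonical_name(col)
        if term not in common_model:
            # a genuinely new concept discovered semantically: extend additively
            common_model[term] = col
        # existing concepts keep their physical name, so history is untouched
    return common_model

common = {"customer number": "CUST_NBR", "order date": "ORDER_DT"}
# The business adds a loyalty tier, and a source system renames two columns:
consolidate(common, ["CUSTMR_NO", "ORDER_DATE", "LOYALTY_TIER"])
print(common)
# {'customer number': 'CUST_NBR', 'order date': 'ORDER_DT',
#  'loyalty tier': 'LOYALTY_TIER'}
```

The design choice here is the whole point: the model only ever grows, which is what lets it stay standardized while the business changes underneath it.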

It is also reasonable to assume that in order to reach Dynamic Data Warehousing, one must be willing to accept the idea that data model changes can be automated and applied dynamically to everything, including queries, load routines, web-services interfaces, and yes (eventually) with security in place.

Metadata is what ties all this together: when the business changes, the models change; when the models change, the context of the data (in most cases) changes... Data discovery to build a model is a crucial technology that can lead to better understanding, but it should *not* be the driving factor in building common data models for EDW or common web services going forward. Using models to drive models, and then making statements through data discovery, is the proper way to forge ahead in a model-driven world.

All of this should change the way we view metadata; we should begin to realize just how important metadata and models are, and the impact that naming conventions can have on our business for years to come.

I'd love to hear your thoughts, whimsical or otherwise - again this is a thought experiment.

All the best,
Dan Linstedt


Posted May 28, 2007 8:54 PM

Well, it's been a thousand miles and a million years since I've seen a good metadata interface GUI - or, for that matter, a complete enterprise metadata data warehouse (MDDW): something that not only reports and integrates the metadata but also allows modifications from a front-end user perspective, with security, thin-client access, read-write (bi-directional) ability, and so on. In this blog I discuss what I'd like to see in the future of BI and metadata management. Right now the market is very disjointed. This is somewhat of a one-sided rant; if the vendors would like to respond, I welcome the new information.

I've seen a lot of metadata products, and heard about a lot of metadata products. Of course I've read the reviews by Gartner and Meta, and so on... But across multiple clients, I have yet to see a successful tool in use. Companies claim they have customers, but where are these implementations? One would think that if the vendor makes a claim, the company that produced a metadata success would be winning all kinds of awards and showing off its enterprise metadata project.

Where are we now?
Since this hasn't happened, since it seems like all the metadata features are still detached from the real business interfaces, and since the metadata collection and integration devices are disjointed, I thought I'd discuss where the market should go. This entry is based on comments I hear at client sites, and on consulting experience.

The questions I get frequently are:
* Which "tool" is the best metadata tool on the market?
* Why can't I update and maintain my metadata from within my BI reporting tool?
* What's the difference between business and technical metadata?
* Why do I need metadata anyhow?
* How can I tie my [technical] metadata to my business requirements?
* How does business process workflow play a part in my metadata?
* How can I lock down my metadata with roles and responsibilities?
* Metadata Data Warehouse (MDDW)? What's that?
* Why do these metadata repositories cost so much but offer so little?
* Is there a best of breed metadata tool out there?
* How do I build my common metadata repository incrementally?

Ok, you get the point.

Where's my Metadata?
Business users are screaming for it (they've been screaming for years). BI reporting vendors (it seems) have only scratched the surface by offering simplistic "pop-up definitions" of terms - and at that, there's no ability to maintain, manage, or control multiple term overrides depending on login and line of business. There's no differentiation across tools to manage the metadata. There are a few great repository tools for collecting a variety of metadata sources in the background, and some of these are rather expensive. Below is a short list of the places I see metadata pop up (a sketch of pulling a few of them into one common shape follows the list):

* Excel spreadsheets (from-to or source to target mapping control documents)
* ETL coded routines
* ELT (SQL Views)
* Excel Spreadsheets (aggregations and groupings of user data sets)
* Word documents (requirements and addendums to projects)
* Source and Target data models (logical and physical)
* BI Reporting tool repositories (computational fields, yet another logical model)
* EAI, EII, and web-services tools
* User hierarchies and security (in the database engine)
* Database metrics (size, growth, down-time, CPU utilization)
* Business Process Workflow engines
* Excel Charts
* OLAP Cubes
* Microsoft Access Databases
* E-Mail
* Images (charts, graphs, pictures, photos)
* XML Files and schema tags
* Organizational Charts
* Visio Diagrams
* Web Pages
* Code (of any type: .NET, VB, Perl, PHP, Java, etc...)
* Semantic definitions and ontologies (RDF and OWL)
* Neural Nets and learning systems
* Network topologies
* Security topologies

I'll huff, I'll puff, and I'll blow your metadata in!
I've yet to see a proper metadata tool that addresses more than a surface scratch into some of these areas. Don't build your metadata solution out of straw, or on a sand-bar... A good metadata project is just that: a PROJECT. It needs to be treated like a standard project to be completed properly. Bill Inmon discusses the need for metadata in DW2.0 (as a requirement to be DW2.0 compliant). Don't think you can simply "slap metadata in place" - it just won't happen. Buying an off-the-shelf tool for its repository is a good start, pending its interfaces to GRAB metadata, but most of these tools don't provide the GUI interface to update metadata, nor do their repositories VERSION metadata or provide a "metadata data warehouse".
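To be concrete about what I mean by VERSION, here's a minimal sketch (hypothetical, not any vendor's API) of the least a repository should do: never update a definition in place - append a new version with author and timestamp, so the repository doubles as a metadata data warehouse.

```python
# Hypothetical sketch of a versioning metadata repository: definitions are
# never updated in place; each change appends a new version with author and
# timestamp, so the repository keeps history like a data warehouse does.

from datetime import datetime, timezone

class MetadataRepository:
    def __init__(self):
        self._versions: dict[str, list[dict]] = {}   # term -> version history

    def write(self, term: str, definition: str, author: str) -> int:
        history = self._versions.setdefault(term, [])
        history.append({
            "version": len(history) + 1,
            "definition": definition,
            "author": author,
            "as_of": datetime.now(timezone.utc),
        })
        return history[-1]["version"]

    def current(self, term: str) -> dict:
        return self._versions[term][-1]

    def history(self, term: str) -> list[dict]:
        return list(self._versions[term])

repo = MetadataRepository()
repo.write("customer", "A party that has placed at least one order", "dan")
repo.write("customer", "A party with an active account or an order", "finance")
print(repo.current("customer")["definition"])   # the latest definition wins
print(len(repo.history("customer")))            # but both versions are retained
```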

Chicken Little the Meta Sky is Falling!
No, it isn't... Metadata is typically an afterthought, but with compliance breathing down our necks and massive consolidation projects on the rise, metadata is becoming more important. (What is a meta-sky anyhow?!?)

I say Holmes, what do you make of this?
Well, Sherlock Holmes I'm not... but my intuition tells me that EII is positioned to access all of these components, with a good repository on the back end to manage them; and with (hopefully) new EII user front ends, metadata can be managed in an ontological format.

What experiences do you have with metadata? Are there any particular vendors you've worked with that you like/dislike?

Thanks,
Dan Linstedt
We have openings in the Master of Science program for Business Intelligence; see more about the program at: http://www.COBICC.org


Posted May 15, 2007 7:03 AM

EII has been getting a lot of buzz lately, especially with the purchase of Meta Matrix by Red Hat. I want to turn your attention (instead) to where EII needs to go as an industry. These are my opinions, and I welcome your constructive comments. EII (Enterprise Information Integration) is a pull technology - grabbing data on demand when needed from all kinds of sources, and building a single integrated view of the current world of "transactional data." So what's left?
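Before getting to what's left, here's a bare-bones sketch of what the pull model already gives us. The two source adapters are hypothetical stand-ins; the point is that nothing is persisted - the integrated view is assembled at request time from live sources.

```python
# A bare-bones sketch of the EII "pull" model: no staging, no persistence.
# Each request fans out to live sources and assembles one integrated view
# on the fly. The two source functions are hypothetical adapters.

def fetch_orders_from_erp(customer_id: str) -> list[dict]:
    # stand-in for a live query against the ERP system
    return [{"order_id": "A-100", "amount": 250.0}]

def fetch_profile_from_crm(customer_id: str) -> dict:
    # stand-in for a live query against the CRM system
    return {"customer_id": customer_id, "name": "Acme Corp"}

def integrated_customer_view(customer_id: str) -> dict:
    """Assemble the single view of current transactional data on demand."""
    profile = fetch_profile_from_crm(customer_id)
    orders = fetch_orders_from_erp(customer_id)
    return {**profile,
            "orders": orders,
            "open_order_total": sum(o["amount"] for o in orders)}

print(integrated_customer_view("C-42"))
# Nothing was copied or stored: rerunning the call re-pulls current data.
```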

In the future, as we progress towards heterogeneous appliances, we will need EII more and more, especially with its persistence of data in a virtual world. But what we are missing today are a few feeds on metadata (both business and technical), infrastructure and management of multiple web-services domains (both inside and outside the company walls), and the ability to track changes to data models - be they web-service structure changes or physical data model changes in source systems.

EII will become more and more important as a back-office integration system and "glue" providing the framework needed to run the back office more efficiently. I would expect that the EII tool of the future will pick up and integrate the appliances, along with managing the network of appliances in a plug-and-play scope. The more we can virtualize the information on a transactional level (and integrate it on the fly), the better we can manage all the back-office systems.

Furthermore, I expect the GUI of the EII tools to be focused more on front-end users, bringing integration management out of the back office and more into the business user world. I believe that focusing the EII GUI on its plug-and-play nature will provide additional power to business rule engines, workflow engines, processing engines, metrics engines, and of course metadata engines.

The EII GUI will reach the front office and be simplified (as it should be), while the advanced interface will still be available for the IT staff; however, business users should be able to switch context within their portals and not know or care that they are using EII for data exploration. Plugging EII directly into source data systems and pumping the data into MS-OLAP cubes (MDB) or Excel will push utilization forward.

Metadata collection systems are being built and focused on, particularly over the past year, by all kinds of vendors including Meta Integration, ASG Systems, CA, and so on. However, the interfaces used to collect and manage (not to mention link together) the metadata leave a bit to be desired. EII is a perfect fit for integrating all kinds of metadata in a visual format, and for providing a repeatable metadata integration and management front end. By leveraging EII's ability to connect to all kinds of sources, and by visualizing the metadata stores, we can easily combine the metadata into a common data model and write the metadata back.

Not only should EII be providing visualization of metadata, but it should also plug in to the reporting tools out there, and provide the metadata feed on the fly with all the security and accessibility that the reporting tools offer. Management of the metadata MUST be built into a GUI somewhere, and it should leverage EII's ability to not only allow alteration but also provide write-back of the metadata to a common repository.
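To make "write-back with security" concrete, here's a hypothetical sketch: the front end funnels every alteration through one function that enforces roles before the change lands in the common repository. The role names and record layout are assumptions of mine, not any product's API.

```python
# Hypothetical sketch of metadata write-back from a reporting front end:
# the alteration is accepted only if the user's role allows it, then
# appended to the common repository with a note of who changed it.

EDIT_ROLES = {"data_steward", "metadata_admin"}   # assumed role names

class WriteBackError(Exception):
    pass

def write_back(repository: dict, user: dict, term: str, new_definition: str) -> None:
    """Apply a user's definition change to the common metadata repository."""
    if user["role"] not in EDIT_ROLES:
        raise WriteBackError(f"{user['name']} may view but not edit metadata")
    repository.setdefault(term, []).append({
        "definition": new_definition,
        "changed_by": user["name"],
    })

repo: dict = {}
write_back(repo, {"name": "maria", "role": "data_steward"},
           "net revenue", "Gross revenue minus returns and allowances")
print(repo["net revenue"][-1])
try:
    write_back(repo, {"name": "guest", "role": "viewer"}, "net revenue", "???")
except WriteBackError as e:
    print(e)   # guest may view but not edit metadata
```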

Summary:
EII of the future will have a much more robust GUI. There will be two different components to the GUI: a business user interface and a technical interface. The EII GUI will become a BI tool in its own right, and should plug and play with business rules engines, business process workflow engines, web services, metadata engines, and front-end applications like Excel and Microsoft OLAP cubes. The write-back capabilities should be leveraged to manage change and to assist with producing a common data model for the data to reside in (which will eventually be an exploration warehouse of sorts).

Do you have any thoughts?

Thanks,
Dan Linstedt
Check out the Master of Science in Business Intelligence at: http://www.COBICC.org


Posted May 10, 2007 5:55 AM

Appliances have a long way to go to mature - this is true. There are still a lot of customers asking for software packages to run on their existing systems. They rightfully want to leverage their existing investment in infrastructure. However, there are companies that are smaller (and some that are larger) that want to become more nimble, lower their maintenance overhead, replace old technologies with new for competitive advantage, and so on. These companies are looking at appliances in the market space. Appliances are growing up - albeit slowly.

What does this have to do with Convergence?
Software vendors are converging and acquiring to compete; they are beginning to see the value of packing incredible power and performance onto pre-configured hardware platforms, and of providing web interfaces for configuration. Software and hardware vendors alike are converging on the notion that integration is king.

In previous blog entries I've discussed the notion of convergence, along with the notions of appliances.

War of the appliances and convergence
What will the future integration component have?
Appliances are coming to EDW
Open and Closed Appliances

And on Convergence:
War of the appliances and convergence
ETL, EAI, EII, and EIEIO

Oracle buys Sunopsis, Red Hat acquires Meta Matrix, and so on.

What's going on out there?
In my humble opinion, I see some major trends. For example, I went to buy network devices for our new offices recently, and what I found confirmed quite a few things. For instance: a DSL modem with built-in wireless, wired, and switching capabilities, combining a web interface for administration, a firewall, updatable firmware components, and wireless routing capabilities. Another new wireless router combined high speed, Draft N specifications, QoS, streaming video, and a few other functions. A print server combined with a wireless router eliminates the need for a full computer to do the job.

There's no shortage of convergence: home phones are combining networking, video, computer readiness, and information bases - and they've had answering machines built in for years. Hand-held devices are combining computing power, graphics, video, music, networking, email, phone, and so on. Pretty soon, the PDA will include biometric fingerprinting and an electronic key to open your car, start your car, and maybe even open your house.

What's this have to do with BI?
Convergence is happening in the BI world: convergence of software vendors (as mentioned above), convergence of hardware vendors, and convergence of software and hardware vendors. All of this is affecting the BI market space. If we want to be nimble, we must be enabled to move faster than our competition. We have to be able to do more with less, which means the machines we use have to become smarter and more specific in their tasks.

Specialized machines can converge the functionality of software with the speed and ease of maintenance of hardware. This makes it easier for plug-and-play devices to appear on the market, and easier to mass-produce those devices. These devices can enable the operator to "set it and forget it" when it comes to configuration switches, and when a newer version of the device appears, it is easy to upgrade: replace the entire device (because it's so cheap).

Why are we looking at appliances?
Because the nature and notion of the appliance is changing to meet the needs of convergence and standardization. Vendors are finding that in order to meet the demand (do more with less, and make it cost less), they have to go the route of mass-production and specialization; again, convergence of hardware and software. So the specialized appliance is created.

What's coming in the BI space that will change our market?
A few things are headed our way. For instance, why did Red Hat buy Meta Matrix? (The jury is still out on this one.) Here's my opinion: they found a need to combine their specialized operating systems with highly complex data integration software in the EII space. What I'm really saying is that by combining their OS with EII (gutting the EII, and rebuilding the best of breed), they are enabling their software as ready-made components with a web-interface control system and data integration layers. If they want to take the cake, they can then put this combined set of software (a couple of years from now) onto a hardware platform, specialize one hardware platform to do features "X, Y, Z" and a different hardware platform to produce features "P, Q, R," and sell them as appliances. In reality, they need to compete with Microsoft, IBM, Oracle, and Teradata.

But where does that leave the rest of the integration space?
You'll begin to see it (it has already started happening): IBM has acquired hardware and software vendors over the past several years, Oracle has acquired several large software vendors, and HP has acquired consulting services and software vendors and already has the hardware under its belt. Microsoft has acquired software vendors and has effective hardware partnerships in place. Teradata, well - Teradata has continued to improve the quality of its existing solutions. Netezza has entered the market space as an appliance and has done well; DatAllegro has built its business on the nature and notions of convergence and has also done well.

Where does this leave companies like Informatica, Composite Software, Ab-Initio, Ipedo, Silver Creek, X-Aware, Meta Integration, and others?
Directly in the line of sight: either as acquisition targets, or as acquirers of other software vendors - and, to boot, as partners of hardware companies, producing specialized appliances that cannot be purchased anywhere else.

Why?
SOA will become a specialized, appliance-driven world. Implementing SOA and all that it entails will require easy-to-use, plug-and-play appliances that talk to each other over the network. The size and scope of SOA at an enterprise level require that we do more with less, that we improve our quality and performance, and that we lower maintenance costs. All of this screams for converged, specialized appliances that are managed through web interfaces.

Quite simply put:
* Cost
* Ease of Use
* Performance
* Flexibility
* Expansion
* Improved ability to be nimble
* Competitive nature

Data integration technology will be created on an appliance basis in the future, to get it out of the hands of the experts and into the hands of the mass population. Commoditization is important to innovation and moving forward. Convergence is one way to get there.

I'd love to hear what you think, please respond with your comments.
Thanks,
Dan Linstedt
http://www.COBICC.org


Posted May 3, 2007 5:58 AM