Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for Master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor at The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

The problem today is that a patchwork of applications is needed to produce a serious integrated view of the enterprise. We should have the following components embedded within our enterprise projects in order to make sense of them: ETL/ELT, EII, EAI, Web Services, Registry Managers, and Metadata Management software. It's a well-known fact that each of these tool sets brings its own value to the table, and that most enterprises today have nearly all the components already in house. As far as compliance is concerned, we are seeing devices (appliances) now that capture and compress transactions as they flow through the enterprise.

A while back I blogged on the future "appliance" needed to keep integration alive, where everything will converge onto a single appliance. I still believe this is true. Hardware is getting cheaper, and software (for some strange reason) is getting more expensive. Sooner or later, hardware vendors will partner with or buy out software vendors and then merge the software onto the hardware platform.

Of course we know why software is getting more expensive:
1. The vendors offer more bells and whistles than they ever have.
2. The vendors need to justify the high costs of engineering parallelism, HACMP, and partitioning.
3. The vendors are producing ever more proprietary algorithms to solve problems that have already been solved in the MPP world of hardware.
4. The vendors (tongue in cheek) need to pay for their acquisitions of other companies (sorry, just a joke...).

The future integration component will be a plug-and-play device that offers an MPP-style interconnect, self-discovery of other "like devices" on the corporate intranet, and, of course, parallelism, HACMP, partitioning, and compliance within the device - all of it hardware driven. Needless to say, the device will also offer compression, network sharing, historical copies of raw data, and other pieces like a query interface, information quality transformations, data mining, visualization, and a data modeling interface (just to name a few).

Functionality-wise, it will offer right-time data collection along with batch data collection, compression, and dissemination. The device will be capable of talking to other devices on the intranet and sharing information, recognizing similar information, and moving information around the enterprise during idle or partially idle times.
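To make the self-discovery idea a bit more concrete, here's a rough sketch in Python of how "like devices" might announce themselves and find peers on an intranet segment via UDP broadcast. This is purely illustrative - the port number, device names, and message format are all made up, not part of any real appliance.

```python
# Hypothetical sketch: "like devices" announcing themselves on the intranet
# via UDP broadcast. Port number and message format are made up for illustration.
import json
import socket

DISCOVERY_PORT = 47800          # arbitrary port chosen for this example
BROADCAST_ADDR = "255.255.255.255"

def announce(device_id: str, capabilities: list[str]) -> None:
    """Broadcast a small JSON announcement so peer devices can find us."""
    msg = json.dumps({"device": device_id, "capabilities": capabilities}).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(msg, (BROADCAST_ADDR, DISCOVERY_PORT))

def listen_for_peers(timeout: float = 5.0) -> list[dict]:
    """Collect announcements from other devices for `timeout` seconds."""
    peers = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", DISCOVERY_PORT))
        sock.settimeout(timeout)
        try:
            while True:
                data, addr = sock.recvfrom(4096)
                peers.append({"addr": addr[0], **json.loads(data)})
        except socket.timeout:
            pass
    return peers

if __name__ == "__main__":
    announce("dis-node-01", ["compression", "query", "replication"])
    print(listen_for_peers(timeout=2.0))
```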

Of course, with the rise of right-time, the word "idle" will mean something different: instead of "idle" CPU, it will mean "idle" data sets. Speaking of data sets, the focus will be on compressing and utilizing compressed/encrypted data.
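Just to show what "compressed/encrypted data sets" looks like in practice, here's a tiny sketch. The library choices (Python's zlib plus the third-party cryptography package's Fernet) are mine, picked only for illustration:

```python
# Sketch: compress a data set, then encrypt it for storage or transport.
# Library choices (zlib + cryptography's Fernet) are illustrative only.
import zlib
from cryptography.fernet import Fernet

def pack(raw: bytes, key: bytes) -> bytes:
    """Compress then encrypt a raw data set."""
    return Fernet(key).encrypt(zlib.compress(raw, level=9))

def unpack(blob: bytes, key: bytes) -> bytes:
    """Decrypt then decompress, recovering the original bytes."""
    return zlib.decompress(Fernet(key).decrypt(blob))

if __name__ == "__main__":
    key = Fernet.generate_key()
    data = b"customer_id,order_total\n" * 10_000   # toy "idle" data set
    blob = pack(data, key)
    assert unpack(blob, key) == data
    print(f"{len(data):,} bytes -> {len(blob):,} bytes stored")
```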

So what drives this huge integration?
One word: Metadata. A registry of registries, along with a registry of machines, a registry of data sets, a registry of services, a registry of business logic, and registries of technical metadata and process metadata. Each of these registries will be managed in a virtual network of registries (a master registry - aka master data management). This registry system will live in a separate device until we figure out how to integrate it, too, into the single working component.
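A "registry of registries" is easier to picture with a toy data structure. The sketch below (the class names and fields are mine, purely illustrative) shows per-domain registries for data sets, services, and so on, federated under one master registry that can be searched across domains:

```python
# Toy "registry of registries": per-domain registries (machines, data sets,
# services, business logic, ...) federated under a single master registry.
# All names and fields here are illustrative, not a real product's model.
from dataclasses import dataclass, field

@dataclass
class RegistryEntry:
    name: str
    owner: str
    attributes: dict = field(default_factory=dict)

class Registry:
    def __init__(self, domain: str):
        self.domain = domain
        self._entries: dict[str, RegistryEntry] = {}

    def register(self, entry: RegistryEntry) -> None:
        self._entries[entry.name] = entry

    def find(self, term: str) -> list[RegistryEntry]:
        return [e for e in self._entries.values() if term.lower() in e.name.lower()]

class MasterRegistry:
    """The 'registry of registries': routes searches to every domain registry."""
    def __init__(self):
        self._registries: dict[str, Registry] = {}

    def attach(self, registry: Registry) -> None:
        self._registries[registry.domain] = registry

    def search(self, term: str) -> dict[str, list[RegistryEntry]]:
        return {d: r.find(term) for d, r in self._registries.items() if r.find(term)}

if __name__ == "__main__":
    datasets = Registry("data sets")
    datasets.register(RegistryEntry("orders_2005", "finance", {"rows": 1_200_000}))
    services = Registry("services")
    services.register(RegistryEntry("orders_lookup_service", "it", {"protocol": "SOAP"}))
    master = MasterRegistry()
    master.attach(datasets)
    master.attach(services)
    print(master.search("orders"))
```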

Bottom line: the data warehouse, or better yet the data integration store (DIS), of the future will in fact be a plug-and-play device. Self-healing will be built in through replication. A shared-nothing architecture with the appropriate interchange levels will be provided, along with fault-tolerant, dual fail-over network connections. The device will be cheap and will scale in cost based on storage needs, but storage needs will shrink as compression and encryption take over.

If you want to get a handle on your business TODAY, then I would strongly suggest beginning a master data management project, with classification and ontologies of metadata - developing registries and the business intelligence to view/alter/manage those registries across the enterprise. Putting this into a collaborative environment will help it thrive; adding incentives for employees to add metadata and information to the registries, and to manage it, will also produce a thriving result.
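As a small illustration of what "classification and ontologies of metadata" can buy you, here's a rough sketch. The ontology terms, object names, and tags below are invented examples, not a prescribed model - the point is simply that tagged registry entries can be rolled up to shared business concepts:

```python
# Minimal sketch of classifying metadata entries against a small ontology
# of business terms. The ontology and tags are made-up examples.
ONTOLOGY = {
    "customer": "party",
    "supplier": "party",
    "order": "transaction",
    "invoice": "transaction",
    "party": None,          # top-level concepts
    "transaction": None,
}

def lineage(term: str) -> list[str]:
    """Walk a term up the ontology to its root concept."""
    chain = []
    while term is not None:
        chain.append(term)
        term = ONTOLOGY.get(term)
    return chain

# Metadata entries tagged by the people who register them.
entries = {
    "crm.customers": "customer",
    "erp.purchase_orders": "order",
    "ap.invoices": "invoice",
}

# A BI-style rollup: which registered objects fall under each root concept?
rollup: dict[str, list[str]] = {}
for obj, tag in entries.items():
    rollup.setdefault(lineage(tag)[-1], []).append(obj)

print(rollup)  # {'party': ['crm.customers'], 'transaction': ['erp.purchase_orders', 'ap.invoices']}
```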

I'd love to hear your thoughts on the matter.

Thanks,
Dan L


Posted November 23, 2005 4:33 AM

1 Comment

I'm not sure how much I subscribe to the "device" as part of the plug and play, as I still think the long-overdue "tsunami" of web services / technology as a service will come. Whether it's a device or a SaaS provider, I COMPLETELY AGREE on the absolute key being the metadata.

As a metadata geek myself (see my most recent UKOUG presentation on extending OWB for custom objects), I need no prodding to embrace metadata as an important part of INFORMATION technology. However, MANY, MANY of my colleagues don't feel the same. I think only when metadata does these two things will people embrace it as something "worthwhile":

1) Be used to define how those systems run, thus removing the need for any extensive programming or implementation (MDA). That device you mention would use these metamodels as its source and that's it.
2) Provide context all over. Everywhere in the ecosystem, if you're looking at an information subject you should be able to instantly understand what you're looking at. A HUGE part of that is where this data came from, how it was processed, additional annotations, etc.

Anyhow, I rant.
