

Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW, and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for Master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor at The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

The market is shifting: vendors are packing more and more features and functionality into their devices, and they are making those devices smaller and smaller. What does the future data warehouse look like? Can it be an appliance-like device? What kinds of partnerships or acquisitions can we expect? Why would we choose an appliance DW over our own component selections?

In this blog I look into the future, just to see if we can answer these questions. I believe there are changes coming; long-overdue changes.

In the land of yesterday we would have to go in search of "best-of-breed" software, pair that up with best-of-breed hardware, size it appropriately, install it all, and integrate it ourselves (within IT). I believe all that is changing. If it hasn't already, it certainly will shortly.

New vendors on the market are offering coupled hardware with built-in RDBMSs. This is just the start, and as good a start as it is, it still has a ways to go. Let's talk for a minute. What if you could walk out and buy an ADW (active data warehouse) appliance - self-configured to perform optimally on the machine, embedded within the BIOS, with encapsulated storage and a black-box interface... Would you do it? Especially at a lower cost than buying from RDBMS vendor 1 and hardware vendor 2?

So what does the future device look like?
It should contain not only the RDBMS, but also the ETLT software. This software should be embedded on the machine for fastest performance, along with optimized disk routines and mechanized load balancing. The ETLT software should have two types of inputs: a flat-file loading process, and a real-time network plug that reads JMS queues once configured. The ETLT should be fully self-contained on its own processor slot so that it doesn't interfere with the RDBMS operating in parallel, at high speed, on the disk.
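Just to make those two inputs concrete, here's a rough sketch. Everything in it is invented for illustration - the broker URL, queue name, staging path, and the loadBatch() routine are assumptions, not any vendor's actual API (I'm using ActiveMQ's JMS client simply as an example provider):

```java
// Hypothetical sketch: the appliance's two ETLT inputs converging on one bulk-load path.
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;
import java.io.IOException;
import java.nio.file.*;
import java.util.List;

public class DualInputLoader {

    // Input 1: bulk flat-file loading.
    static void loadFlatFile(Path file) throws IOException {
        List<String> rows = Files.readAllLines(file);
        loadBatch(rows);                       // hand the whole file to the bulk path
    }

    // Input 2: real-time feed that reads a pre-configured JMS queue.
    static void listenToQueue(String brokerUrl, String queueName) throws JMSException {
        ConnectionFactory factory = new ActiveMQConnectionFactory(brokerUrl);
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createQueue(queueName));
        consumer.setMessageListener(message -> {
            try {
                loadBatch(List.of(((TextMessage) message).getText())); // one row per message
            } catch (JMSException e) {
                e.printStackTrace();
            }
        });
        connection.start();                    // begin delivering messages
    }

    // Both inputs feed the same bulk-apply routine; stubbed out here.
    static void loadBatch(List<String> rows) {
        System.out.println("Applying " + rows.size() + " row(s) to the warehouse");
    }

    public static void main(String[] args) throws Exception {
        loadFlatFile(Path.of("staging/customers.dat"));            // assumed staging path
        listenToQueue("tcp://localhost:61616", "warehouse.feed");  // assumed queue config
    }
}
```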

There should also be a BI (reporting tool) card built in. It should have its own IP connections and reside on its own processor slot as well. The tool and the box configuration should all be browser-based. All administration could be fat-client, I suppose, but why? Why not make it all web/app server? It's separated from the RDBMS and ETLT engine slots, again so that it can run in parallel, although the BI tool and the ETLT tool should be based on a common metadata framework.
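To picture what a common metadata framework might mean in practice, here's a toy sketch in which both slots read table definitions from one shared catalog. Every name in it is made up; it's only meant to show the idea of one definition driving both tools:

```java
// Illustrative only: the ETLT engine and the BI card consult one shared catalog.
import java.util.List;
import java.util.Map;

public class SharedMetadata {

    // One catalog entry: a table name and its column names.
    record TableDef(String name, List<String> columns) {}

    // The shared catalog both slots consult.
    static final Map<String, TableDef> CATALOG = Map.of(
            "sales", new TableDef("sales", List.of("order_id", "amount", "load_ts")));

    public static void main(String[] args) {
        TableDef sales = CATALOG.get("sales");
        // The ETLT slot uses the definition to validate incoming rows...
        System.out.println("ETLT expects " + sales.columns().size() + " columns");
        // ...and the BI slot uses the very same definition to build its report query.
        System.out.println("BI query: SELECT " + String.join(", ", sales.columns())
                + " FROM " + sales.name());
    }
}
```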

Now, depending on the number of nodes purchased, hooking them together through a third pre-configured IP would allow them to load-balance across a high-speed backbone. Again, the nodes have nothing to do with each other except to distribute the workload.
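One simple way to picture that kind of share-nothing distribution is routing each unit of work to a node by a hash of its key. A minimal sketch, with assumed node addresses:

```java
// Hash-based work distribution across pre-configured nodes (illustrative only).
import java.util.List;

public class NodeRouter {
    // Pre-configured backbone addresses of each appliance node (assumed values).
    private static final List<String> NODES =
            List.of("10.0.0.1", "10.0.0.2", "10.0.0.3");

    // Route a unit of work purely by hashing its key: the nodes never
    // coordinate with each other, they simply split the workload.
    static String nodeFor(String workKey) {
        int slot = Math.floorMod(workKey.hashCode(), NODES.size());
        return NODES.get(slot);
    }

    public static void main(String[] args) {
        for (String key : List.of("customer:42", "order:9001", "product:7")) {
            System.out.println(key + " -> " + nodeFor(key));
        }
    }
}
```

Adding a node just means adding an address to the pre-configured list - which is exactly the plug-and-play scalability story an appliance would sell.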

What kind of partnerships or acquisitions can we expect?
I think in the future you'll see storage vendors partner more heavily with RDBMS vendors, who are already working on "blade servers" - some vendors have the pre-packaged solution there; other vendors are coming up to speed. I also think you'll see a larger effort to integrate the BI and ETLT software onto hardware platforms. It's getting cheaper to architect and build hardware - and most of the time we need the extra performance boost. Even if the BI application card is running on a dual CPU at 450 MHz, it's the RDBMS that needs the power.

That's all fine and dandy, but where's the value proposition?
The value comes as follows: automatic updates to the software and firmware over the web; little to no configuration needed (it all comes pre-installed and factory-tuned); no fancy load-balancing or parallelism software needed to gain performance; no messy dual environments for upgrades; no multiple purchase costs; plug-and-play scalability; and speed and performance built into the hardware/firmware.

I think you may see compliance vendors entering this game too; they are already partnering with storage vendors for appliance-based storage.

What makes this work and why?
The RDBMS engine on the appliance must be extremely fast and extremely scalable. It must focus on bulk-applying data sets, or images of data, that are time-stamped. It must have high-quality, high-end compression, data quality, and delta-processing capabilities available. As we consolidate our resources, it becomes easier to manage, upgrade, and replace. In this environment you're paying for the engineering to be "done": out of the box, into the rack - load your data and away you go. All data is denormalized inside the box, so compression ratios are very high, storage needs are low, and performance is super fast - we don't have to worry about data modeling or indexing any more (the end goal).
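As a back-of-the-napkin illustration of that time-stamped, delta-only bulk apply: compare the incoming image of a table against the current rows and apply only what changed, stamping each applied row with the load time. The table contents and row shape here are invented:

```java
// Sketch of delta processing: apply only inserts and changes from a new image.
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

public class DeltaApply {

    // Apply a new image of the table: insert new keys, update changed rows,
    // and stamp every applied row with the load time.
    static void applyImage(Map<String, String> current, Map<String, String> incoming) {
        Instant loadTime = Instant.now();
        for (Map.Entry<String, String> row : incoming.entrySet()) {
            String existing = current.get(row.getKey());
            if (existing == null) {
                System.out.println(loadTime + " INSERT " + row.getKey());
                current.put(row.getKey(), row.getValue());
            } else if (!existing.equals(row.getValue())) {
                System.out.println(loadTime + " UPDATE " + row.getKey());
                current.put(row.getKey(), row.getValue());
            }
            // Unchanged rows are skipped entirely: that is the delta.
        }
    }

    public static void main(String[] args) {
        Map<String, String> current = new HashMap<>(Map.of("C1", "Acme", "C2", "Globex"));
        Map<String, String> image   = Map.of("C1", "Acme", "C2", "Globex Corp", "C3", "Initech");
        applyImage(current, image);   // expect: UPDATE C2, INSERT C3
    }
}
```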

There are a number of companies out there worth watching who are moving in these directions. It won't be long before they can meet all these needs with one appliance.

Of course, it wouldn't hurt for these companies to consider a metadata appliance as well, or possibly to incorporate that directly into the warehouse appliance.

Just a few random thoughts. See you next time.


Posted May 3, 2005 4:15 AM

1 Comment

You are right; this is exactly where it is going. Even when the big vendors are simplifying things, it is still too complex to create a solution for the customer.

Customers are primarily looking for packaged reports/dashboards that speak their industry's language most of the time. They usually do not know, and do not want to know, about ETL, warehouses/marts, analytics, giga/terabytes, or even slice-and-dice cubes.

How do I know this? We wanted to offer an all-in-one BI solution, from ETL to dashboards with slice-and-dice. Customers actually made us back off our complete offering and simplify it.

Now we are doing visualization with some reporting, sprinkled with some analysis. The important components to them are integrated security and a user-friendly multi-level application where they can manage and maintain their own (meta)data.

They love this solution - packaged, industry-centric BI from operational data integration to dashboards, ready for fast implementation. Basically plug in and go.
