
Blog: Dan E. Linstedt


Data Warehouse Appliance, another look

Appliance-based data warehousing is on the rise, and no wonder: the cost per terabyte is lower, and for specific applications of the warehouse these platforms are sometimes blazingly fast. They offer plug-and-play technology, with HA (high availability) and fail-over added just by plugging in another appliance. They offer remote management and self-updates to the BIOS and firmware, and most of them run on open operating systems like Linux. In this blog entry I'll discuss both the pros and cons of appliance-based warehousing. I still believe that this will be a market segment to watch, and that appliances will eventually flood the market as the backbone for high-availability data integration and warehouses.

There was a comment a while back that discussed an article in DMReview about appliances. It was written by Roger Gaskell of WhiteCross Systems; they build hardware for high-performance MPP and, lo and behold, produce a PROPRIETARY appliance.

What do appliances bring to the business?
They bring a number of wonderful features, all pre-packaged in a single domain (this is by no means a complete list):
* High Availability
* Fast loading capabilities
* Compression and Encryption (native in some cases)
* Plug and Play MPP units
* SQL Query interfaces
* Super Fast Data Access
* Low cost per terabyte options
* Plug and Play Fail-Over
* Automatic self-updating (in some cases)
* Remote Monitoring
* Compliance for data (in some cases, they include data versioning by date/time)

We saw it with the disk market in the '80s, and we saw it with other devices in the '90s, like the consolidation of the cell phone with podcasts, downloads and now music on demand - appliances are everywhere. I've written on this subject of "CONVERGENCE" on B-Eye before; convergence is everywhere. The disk manufacturers have now grown up: disks are no longer "just simply storage". They contain CPUs, RAM, caching algorithms, load-balancing mechanisms, reformatting (under the covers), hot-swapping, fail-over, dynamic traffic re-routing, hot-spot contention resolution, self-monitoring and remote updates. More than that, they all adhere to common SAN or NASD standards, meaning we can plug in an IBM device next to an EMC device and they don't care - they'll talk to each other over a standard disk I/O protocol.

What is missing from the Appliance today?
There are advances in the data warehouse appliance that must begin to take shape for this market to really grab market share. They include some of the following:
* Standards-based HA and Fail-Over. LARGE organizations (Fortune 50) will end up with more than "one" data warehouse appliance vendor over time; this is inevitable. They will require that plug and play be orchestrated across multiple vendors' devices, so that they can plug them together in a grid fashion or over a WAN and have them talk to each other.
* Development of a standard high-speed data exchange interface that can bridge multiple vendors together. The vendor today that "opens" the architecture to this sort of component will have a majority of the market share tomorrow.
* Partnerships with software vendors that do data integration. I've said it before, I'll say it again - establishing a low-cost option that is OEM'd inside the DW appliance to get people off the ground would be a huge boost to off-the-shelf productivity. Partnerships with vendors of "registry solutions" and "web-based management portals" could also be a huge boost to sales and market share, further reducing the cost of getting data integration in the door and standardized, particularly if the appliance vendor can "standardize" basic integration or web-services efforts.

I do not believe that proprietary hardware will "stop" the flow of appliances, nor do I believe it's necessarily a bad thing. EMC has it, IBM has it, Fujitsu has it; just about every disk manufacturer out there has it in their appliance, and they are well received today. It's a matter of opening up the architecture to a STANDARDS-BASED service exchange, one that is obviously high-speed. These standards do not exist today, but they will - particularly as companies purchase these solutions from multiple vendors. Just look at IBM's DB2 UDB MPP option that sits on Data Blades: it shares similar concepts, although it is maybe not quite an "appliance" just yet.

Feel free to contact me for more information. I would also love to hear what you think, both positive and negative: add your bullets to the list of why, or why not, appliances in the future.

Thanks,
Dan L

  Posted by Dan Linstedt on March 1, 2006 6:42 AM

Comments

Dan, Roger Gaskell is enjoying a well-earned skiing break, so I'd like to step in and update you on WhiteCross's move away from proprietary to open, blade-based platforms. First, WhiteCross merged with Kognitio in August of 2005 (www.kognitio.com). The WhiteCross DES (Data Exploration Server) has been replaced by the software-only 'WX2' MPP database.

Historically WhiteCross was a developer of blades out of necessity, before they even gained that name. The technological wave finally caught up with us in 2000: blades appeared and quickly progressed from low-performance, low-capacity devices with minimal networking to high-performance, high-capacity compute engines with lots of networking. WhiteCross DES was already using a thin layer of Linux, so the break-out step to industry-standard blades was very easy. Today customers can use existing blade infrastructure or easily purchase it from their vendor of choice. TCO is very low.

WX2 will turn a set of blades (one to thousands) into a single virtual appliance in less than 2 hours' work! WX2 immediately exploits large amounts of RAM (Terabytes!) without the contention problem described in your recent blog about VLDB clusters. In a recent test we were scanning 23 billion records in under 2 seconds using a temporary blade farm. Blade farms allow simple (per blade) incremental growth and can easily exploit blade developments over time. WX2 just sits on top of this, harnessing the available resources. The virtual appliance has interesting implications in the utility computing arena, and various large vendors are researching that with us. WX2 allows temporary boosts in platform capacity by utilising additional blades at times of peak work load.

Like the worst kind of reformed smoker (13 years before we kicked the habit), we can tell you from experience why proprietary is bad and why open commodity platforms are the safe, performant, low-cost way forward for large-scale databases.

Paul Groom
Director, Business Intelligence
Kognitio

Hi Paul,

I appreciate your comment on this blog; thank you for correcting my knowledge about Kognitio. I'd like to explore your options more thoroughly. Would you agree that my statements about appliance computing becoming a commodity platform are correct? It sounds like plug and play is a main part of your vision, and that standards-based interfacing with other systems/platforms is definitely a part of what you do.

Do you have any customers who can comment on their use of your appliance based solution, here on the blog?

Thank you kindly,
Dan Linstedt

I am generally leery of these appliance products unless they can get certification from the major ETL and BI tool vendors; that is, unless their ODBC connections to these tools are on the supported-platforms list.

Otherwise those vendors will disavow any tech support because you're on an unsupported platform...
