Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including: IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

There's a movement afoot in the appliance world. Appliances are growing up. They are getting faster, smaller, cheaper, and yes: more specialized. I had it in my mind that the appliance market would converge on a single platform and provide common plug & play hardware interfaces. Well, that just doesn't seem to be the case (maybe in the future, but then again maybe not). In this entry we will explore the different classes of appliances that are available, and what they do. We will also take a look at where they may go within the next 12 to 18 months.

There are several classes of appliances for DW and BI these days, and there's still some debate about what "appliances" really are. But all of that aside, if we take a general definition of the "Appliance" for the BI / EDW space - then it might look something like this:

A basic plug and play black box with a programmable interface, and an embedded data management engine (might be a database, might be an indexing engine, or something else). In any case, it manages "data in, and data out" at high rates of speed, mostly through network traffic, and a listener of some sort. Most of these appliance boxes come as autonomous network aware components with self-contained hardware. The "magic" is usually buried somewhere in the firmware algorithms, the high speed data stores, the caching mechanisms, and the internal data placement.

Some would go so far as to say: if you could predict the context of a query, and match it with the context of the data before executing the query, you'd have the world’s fastest data retrieval and data placement engine. The problem is: the context changes when the query / question changes. The other problem is: data by itself does not determine its context; therefore grouping data by logical context does not make sense as a storage pattern.
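To make that point a little more concrete, here is a minimal toy sketch in Python. It is not how any real appliance places data; the rows, the column layout, and the helper names `place_by` and `blocks_touched` are all hypothetical. It only shows that a placement chosen for one kind of question stops helping as soon as the question changes.

```python
from collections import defaultdict

# Hypothetical toy rows: (region, product, amount). Not from any real appliance.
ROWS = [
    ("east", "widget", 10),
    ("east", "gadget", 20),
    ("west", "widget", 30),
    ("west", "gadget", 40),
    ("north", "widget", 50),
]


def place_by(rows, key_index):
    """Group rows into 'blocks' keyed by one column, mimicking physical placement."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[key_index]].append(row)
    return dict(blocks)


def blocks_touched(blocks, column_index, value):
    """Count the blocks that hold at least one row where column == value."""
    return sum(
        1
        for rows in blocks.values()
        if any(row[column_index] == value for row in rows)
    )


# Placement assumes every question will filter on region (column 0).
blocks = place_by(ROWS, 0)

by_region = blocks_touched(blocks, 0, "east")     # question matches the placement
by_product = blocks_touched(blocks, 1, "widget")  # the question has changed

print(by_region, by_product)
```

With the rows placed by region, the region question finds its matches in a single block, while the product question finds matches scattered across every block. Re-placing the data by product would simply reverse the two results.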

This is the age-old problem of Random Access File Systems on Physical Disk (Hard drives). Some of this can be solved with RAM disk, USB Flash Drives, and so on - but I ramble.

So let's see what we can find about appliances. I used a web search with the following terms to dig up information:

Term: "BI Appliance"
IBM MidMarket Appliance entry
Celequest Operational BI Appliance
Business Objects and their Open Appliance Initiative
HP Launches BI Appliance
Business Objects forms alliance with Netezza
Ingres and JasperSoft prepare BI Appliance

Term: "DW Appliance"
Sun & Greenplum DW Appliance
DATAllegro Appliance
Active-Base discusses Hybrid DW Appliance

And let's not forget: "Data Appliance"
Dataupia Product Overview
ArcGIS Data Appliance: http://www.esri.com/software/arcgis/arcgisonline/about/data-appliance.html
Data Mining Appliance: http://news.thomasnet.com/fullstory/482588/2585
Data Backup and Restore Appliance from Dataedge

Ok, so what does all this mean?
Well, I like to think that it means several things:
1) Reading through these articles, news releases, and other information, it is easy to tell that the water is muddy. (The word "Appliance" can mean almost anything.) I even found a post from Oracle, dated 1999, about a "Data Appliance" that Oracle proclaimed it would produce.
2) There seems to be a distinction being made between "Data appliance", "BI appliance" and "Data Warehouse/Warehousing Appliance".

This brings me to the following conclusion: A data appliance simply manages the data access and retrieval, regardless of the type of data, the source or target of the data and the functionality. Within "Data Appliances" there might be specific needs (like backup / restore), high speed text / unstructured access, or database "file" access. The question here is: what, then, differentiates a "Data Appliance" from something like an EMC, Hitachi, or Fujitsu Smart SAN/NASD array? At first glance, not much. However, when we pull back the covers we begin to see some levels of specialized functions: like managing database files, or managing unstructured text documents, or specifically backup/restore as if it were a hot-swap drive within an existing RAID array.

This is a hard market to compete in, and producing differentiators will be critical to the success of specific vendors like Dataupia. They are already making a splash in the BI world, but they'll have to go a few steps further (which they are already investigating, according to their partners page).

The BI appliance can mean all kinds of things, but one thing (in my mind) it certainly DOESN'T mean is: "I don't need a data warehouse anymore." That couldn't be further from the truth. There are some vendors out there touting Business Intelligence without a Data Warehouse. That may be, and yes, you can get Operational Business Intelligence (a phrase coined by Claudia Imhoff and Colin White) without a Data Warehouse, but Claudia and I agree: to get the analytics, historical trends and patterns, and provide true data mining capabilities, a data warehouse must be part of the picture.

So, where does that leave the BI APPLIANCE?
Out in the rain... (sorry, just kidding). Really, they are extremely useful, and very valuable - they can replace "data mart" solutions (I define a data mart as any architected/governed/managed data delivery solution where data is rolled up, cleansed, aggregated, and altered from its original state). Remember, I define a Data Warehouse as RAW data... for another time.

The BI Appliance can become the Data Mart Appliance, or act as an Operational BI Appliance, possibly incorporating the physical ODS (operational data store) within its hardware. The BI Appliance can speed up queries, understand data access patterns and focus on sharing data across web services (transactional data that is). But this is an Operational BI Appliance.

What are you talking about?
The term "BI" (business intelligence) really refers to people, architecture, data, databases, hardware, ETL, ELT, data delivery, cleansing, quality, metadata, management, reports, and so on. Business Intelligence includes all components and PEOPLE who make the decisions - it isn't simply the "REPORTING" engine as we've all been led to believe. Step back for a minute: Business Intelligence has been around since the inception of competitive business... Ok - enough soap box, you get the picture, BI includes data warehousing.

So what, why should I care?
Probably because the appliances are coming to an I.T. department near you -- and for $9.95 *actual price may vary, today only, you can get a set of GINSU knives too! Just kidding. In reality, appliances have a VERY attractive proposition: They LOWER the total cost of ownership (TCO).

How do they lower the cost?
1. The knowledge needed to manage, tweak, and maintain a database engine can (in certain circumstances) be eliminated. For instance, when a Netezza Appliance is purchased (it is a really good replacement for star schemas that are currently housed in traditional DBMS systems), the System Admin doesn't need to know anything about indexing, block sizes, performance and tuning and so on. In fact, fewer DBA resources are needed to manage it, so the cost drops. But does it work with Operational Data, or Normalized Data Warehouse architectures? NO. This is where it falls down.
2. Performance increases through engineering buried in the hardware and firmware, and through partnerships.
3. Self-enclosed, self-healing devices, just hot-swap the disk if there's a failure. I'm still waiting for hot-swappable RAM, hmmmm... RAID 5 for RAM? Interesting.
4. Network pluggable, self configurable (for the most part)
5. Single management console, no matter how many "devices/racks/stacks" are put into the network.
6. Embedded functionality. Dataupia and Celequest offer functionality embedded within their appliances. Things like ETL, ELT, OLAP cube support, and so on - some of these are done at the firmware level, others are done in the partners' software, optimized for high speed data access.

Ok, so you get the picture - there's a whole lot more to this than I've discussed, but it is clear that this bundling of functionality is very attractive. Who wants or needs to maintain "separate hardware, separate databases, separate functionality" anymore? Why not have it bundled and working together, pre-packaged and already performance tuned?

What does the future look like?
I think we'll see a continued pendulum swing toward specialized appliances, lower cost for each appliance, highly focused on solving tasks, before we see someone produce a "hardware grid appliance" which will attempt to standardize the management of all the appliances plugged in (this may take a couple years to get to).

For now, there will be an appliance for reporting, one for databases, one for OLAP cubes, one for Web Services, one for ETL, one for Quality/Cleansing, hopefully one for Metadata and so on.

The race goes on, and we will continue to see different vendors enter and leave this space. I also think this space is ripe for acquisitions and consolidations.

Cheers,
Dan L
Get your masters degree in BI from Denver University
See our board members at: http://www.COBICC.org


Posted August 14, 2007 8:26 AM

2 Comments

Another good article. I really don't envy the people who need to make purchasing decisions around DW and BI.

You used to be able to just compare the major DW vendors, but now appliances and EII and enterprise search vendors are offering easier implementations and shortcuts. The big vendors are countering with mega suites and pseudo appliances. Open source vendors have moved to suites instead of stand alone products. You also have SaaS offerings and web 2.0 offerings to go with it and OEM products.

All these offerings require different ways to calculate the total cost of ownership - a complex equation of skills, re-use, flexibility, ease of implementation etc.

No wonder so many software makers are forced to spend more money on marketing than software development! One challenge for specialised appliance makers is that once a mega vendor gets one product into a company, it becomes easier for the customer to keep choosing compatible products from that vendor to avoid this product selection confusion - with the risk that you are not getting best of breed.

Nice overview, Dan. Anyone in the BI marketplace has to consider these products, as they offer some real advantages to their big name counterparts. However, appliances don't fit in every scenario. All the more reason to understand our business requirements before making hardware/software decisions.
