

To Be or Not to Be a BI Appliance Embracer

Originally published September 23, 2009

I was at a business intelligence (BI) presentation recently, and a professor from Berkeley characterized the current data explosion using the phrase “Industrial Revolution of data.” The phrase resonated because it highlights a key contributor to the data volumes we are now challenged with: data produced by automated systems such as self-service teller machines, the Internet, cell phones and the like. As these systems continue to grow, the rate of data growth keeps accelerating. Enterprises, like it or not, have to brace themselves for this data onslaught.

Traditional databases such as Oracle, DB2 and Microsoft SQL Server have managed to deliver BI value with data volumes up to 4 or 5TB, and even that is possible only with expensive high-end iron. For the average enterprise IT shop, managing data beyond 5TB on these traditional database platforms has been almost impossible to even imagine.

The continued success of Teradata and the very successful initial public offering (IPO) of Netezza just a couple of years ago are a clear indication of the value of innovation in this space. Teradata’s success has come primarily from targeting Fortune 500 customers with deep pockets to invest in its proprietary hardware/software/services solution. Netezza, on the other hand, attempted to expand the set of customers that could leverage such BI technologies by significantly reducing the entry-level price of its solution, but it still requires a proprietary hardware/software stack. Both players pushed the limits of “shared nothing” massively parallel processing (MPP) architectures to scale to many tens of terabytes, an approach that beat the shared architectures of the traditional database players hands down. However, both solutions remain relatively expensive, and the proprietary nature of their hardware platforms has not been well received by many enterprises.

Google has proven that it is possible to leverage commodity hardware in an extremely effective manner and still deal with data volumes that are orders of magnitude greater than those of the largest enterprise datasets. The big upside of working with commodity hardware is that you benefit from the billions of dollars that hardware companies are pumping into their products, constantly lowering prices and improving performance. An architecture that leverages commodity hardware gains the benefits of this ever-evolving platform.

This commodity hardware-based architecture has become the foundation for several BI appliance start-ups attempting to bring to enterprise “structured” data environments what Google has done for the unstructured world. Players like Greenplum, Dataupia, Kognitio and Aster Data have all pioneered this approach with some variations, and these new players have also based their solutions on shared-nothing MPP architectures. As expected of start-ups, they have been extremely aggressive in highlighting and proving their key value proposition, i.e., price/performance.

I’ve been involved in two BI appliance bake-offs over the last year, and in both cases these new players had a very significant upper hand on price/performance. Their ability to scale out linearly on commodity hardware has also been a huge value proposition for enterprises.

Most of these players offer the choice of either a “software-only” solution on recommended hardware platforms (a restriction driven more by support considerations than by technology) or a packaged solution that bundles hardware and software, giving IT organizations additional flexibility in choosing the type of hardware they want.

The BI appliance players make big claims of performance gains not just on the “querying” of data, but also on the loading process. In one of the proofs of concept (PoCs), I put this loading process to the test. A particular load process, built with an established ETL company’s solution, was taking about 33 hours to complete. This 33-hour process included loading data from flat files into a staging area, then into a star schema, and finally building a set of aggregates. The process included data inserts, deletes and updates, exercising all of the load operations. Each PoC run had to start with the same set of flat files and, at the end of the run, have data in all of the final tables, including the aggregates. We did a table-by-table differential at the end of each run against the baseline tables to ensure that the run produced the same results.
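To make the verification step concrete, here is a minimal sketch of such a table-by-table differential check. The table names, database files and use of SQLite are my own illustrative assumptions, not the actual PoC tooling; any DB-API 2.0 driver could be swapped in.

    # Minimal sketch of the table-by-table differential check run at the end of each PoC load.
    # Table names and database files are hypothetical, not the actual PoC environment.
    import sqlite3

    TABLES_TO_CHECK = ["fact_sales", "dim_customer", "agg_sales_by_month"]  # hypothetical

    def diff_table(conn, table):
        """Return rows that differ between the PoC output (main) and the baseline copy."""
        cur = conn.cursor()
        cur.execute(f"SELECT * FROM main.{table} EXCEPT SELECT * FROM baseline.{table}")
        extra = cur.fetchall()    # rows in the PoC output that are missing from the baseline
        cur.execute(f"SELECT * FROM baseline.{table} EXCEPT SELECT * FROM main.{table}")
        missing = cur.fetchall()  # rows in the baseline that are missing from the PoC output
        return extra, missing

    if __name__ == "__main__":
        conn = sqlite3.connect("poc_run.db")                       # hypothetical PoC output
        conn.execute("ATTACH DATABASE 'baseline.db' AS baseline")  # hypothetical baseline copy
        for table in TABLES_TO_CHECK:
            extra, missing = diff_table(conn, table)
            status = "OK" if not (extra or missing) else "MISMATCH"
            print(f"{table}: {status} (+{len(extra)} / -{len(missing)} rows)")

A run was accepted only when every table reported no extra and no missing rows relative to the baseline.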

Even though these appliances claimed they could deliver significant performance gains without aggregates, we ensured that they built all of the aggregates. This was primarily for two reasons (a sketch of one such aggregate build follows the list below):

  • To ensure we had a clear baseline for comparing load-process performance.
  • To avoid rewriting the many reporting and analytical applications built on top of these aggregates, which eliminating them would have required.
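For concreteness, here is a minimal sketch of what one of these aggregate build steps might look like; the star-schema and aggregate table names are hypothetical and do not come from the actual PoC.

    # Minimal sketch of one aggregate build step in the load process.
    # Table and column names (fact_sales, dim_date, agg_sales_by_month) are hypothetical.
    import sqlite3

    AGG_SALES_BY_MONTH = """
        INSERT INTO agg_sales_by_month (year, month, product_id, total_units, total_revenue)
        SELECT d.year, d.month, f.product_id, SUM(f.units), SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY d.year, d.month, f.product_id
    """

    def rebuild_monthly_aggregate(conn):
        """Truncate and rebuild the monthly sales aggregate from the star schema."""
        conn.execute("DELETE FROM agg_sales_by_month")
        conn.execute(AGG_SALES_BY_MONTH)
        conn.commit()

    if __name__ == "__main__":
        conn = sqlite3.connect("warehouse.db")  # hypothetical warehouse database
        rebuild_monthly_aggregate(conn)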

It took each of the appliance players less than a week to build the scripts that mimicked the 33-hour load process. Two of the appliance players that participated in this bake-off demonstrated performance gains that were, to say the least, mind-blowing: both brought the load time down from 33 hours to about 30 minutes. Just incredible! I do want to note that the process did not involve very complex transformations, but even so, this performance gain was far too significant to ignore.

These BI appliances showcased significant performance gains on the query side as well, but IT management was so impressed by the load performance gains that those alone were enough to make the business case.

I also included simultaneous load-and-query tests to see how effective the appliances were at minimizing downtime in these BI environments. Both players had architected their systems so that querying and data loading could happen at the same time, eliminating the traditional bottlenecks and non-availability situations, and they were able to demonstrate that there was no degradation in performance under mixed workloads as well.
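To illustrate the shape of such a mixed-workload test, here is a minimal sketch of a load-and-query harness. The load and query functions are placeholders standing in for the actual PoC load scripts and reporting queries.

    # Minimal sketch of a mixed-workload (simultaneous load and query) test harness.
    # load_batch() and run_report_query() are hypothetical placeholders for the real PoC work.
    import threading
    import time
    from statistics import mean

    def load_batch():
        """Placeholder for one incremental load step (inserts, updates, deletes)."""
        time.sleep(0.5)  # simulate load work

    def run_report_query():
        """Placeholder for a representative reporting query; returns elapsed seconds."""
        start = time.perf_counter()
        time.sleep(0.1)  # simulate query work
        return time.perf_counter() - start

    def loader(stop_event):
        """Run load batches continuously until asked to stop."""
        while not stop_event.is_set():
            load_batch()

    def measure_query_latency(n_queries=20):
        return [run_report_query() for _ in range(n_queries)]

    if __name__ == "__main__":
        # Baseline: query latency with no concurrent load running.
        baseline = measure_query_latency()

        # Mixed workload: the same queries while a loader thread runs continuously.
        stop = threading.Event()
        t = threading.Thread(target=loader, args=(stop,))
        t.start()
        mixed = measure_query_latency()
        stop.set()
        t.join()

        print(f"avg query time, no load running:     {mean(baseline):.3f}s")
        print(f"avg query time, during data loading: {mean(mixed):.3f}s")

Comparing the two averages gives a simple read on whether queries degrade while loads are running, which is exactly what the appliances were able to show they avoided.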

Most projects in this customer’s environment required planning for separate development, testing and preproduction environments during the project life cycle. Creating these environments was often on the project’s critical path, and each setup took anywhere from 3 to 5 business days. With the new BI appliance platform, this task could be cut down to under an hour, significantly lowering project costs. This was one of the key selling points for the appliance business case.

To conclude, I would strongly encourage every enterprise dealing with growing data volumes, even as small as a terabyte, to explore the appliance options and leverage the huge value they can provide. BI appliances are here to stay, and the sooner enterprises embrace them, the sooner they will be able to leverage the performance gains to deliver incredible value to their business users at a price point that does not require them to file for Chapter 11.

Haranath Gnana
    Haranath Gnana has over 20 years of experience in information systems and is a Practice Area Leader at Saama Technologies, where he was instrumental in creating the thriving Life Sciences and Healthcare practices. His technology skill set includes the development of complex information management and business intelligence systems, enabling enterprises to make data-driven, informed decisions. Customers benefit from his broad knowledge of the information management lifecycle, from formulating BI roadmaps and strategy to transforming those strategies into operational realities.

    During his time at Saama, he was also responsible for their "labs" initiatives to explore emerging technologies, both commercial and open source, to derive additional value for customers. As part of the labs initiatives, he built Savii (Smart Audience Viewing Intelligence & Insights, an enhanced variation of Nielsen ratings for TV based on TiVo data) and PSL (Predictive Store Locator, which small retailers use to identify optimal store locations) by leveraging syndicated public data sets along with open source data mining and predictive modules. He received a patent for his pioneering work on PSL. This broad experience and expertise has helped him enable clients to adopt and leverage leading technologies such as BI appliances and mobile BI; more recently, he has been helping clients understand and embrace the big data phenomenon. Haranath has been published in leading outlets such as Information-Management.com, BeyeNetwork.com and Office of the CIO (oocio.com).

    Haranath completed his MBA with Honors from Lucas Graduate School of Business at San Jose State University, and his B.S. in Electronics Engineering from Bangalore University. He is a Certified Business Intelligence Professional (CBIP) in Leadership & Strategy at the Mastery Level from TDWI.


