Business Intelligence Network

Blog: Krish Krishnan


December 29, 2007

Why an Analytical Data Warehouse Appliance

We have all been discussing, writing and reading about Data Warehouse Appliances. Though still in its formative years, this technology is already making the rounds in the data center. Recently I had the opportunity to spend time on appliances based on columnar databases, and this technology is just awesome for analytical applications. So why do we need a separate appliance for analytical purposes?

Current RDBMS technologies are geared towards OLTP applications. They all provide analytical functions built on the database platform, but when these functions run on a traditional SMP architecture they cannot perform at high speed and run into severe disk and CPU constraints. This is especially true when running OLAP queries on an SMP architecture.

This is where a columnar database differs. Traditional databases process queries in a row-based fashion, while columnar databases process queries in a columnar fashion. The architecture of a columnar database includes:

1. Columnar data storage
2. High-performance processing of data loads and updates
3. Shared-nothing and Massively Parallel Processing architecture
4. Adaptive compression of data based on data type and length
5. Data is stored compressed on the disk
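To make the storage items above a little more concrete, here is a minimal Python sketch (not any vendor's implementation) that pivots a small row-oriented table into columns and picks a compression scheme per column. The table, the function names `to_columns`, `rle_encode` and `choose_encoding`, and the selection rules (run-length encoding for long runs of repeated values, dictionary encoding for repetitive strings, bit-packing integers into the narrowest width that covers their range) are all illustrative assumptions standing in for "adaptive compression based on data type and length".

```python
from typing import Any

# Hypothetical sample table: each tuple is one row (region, year, sales).
ROWS = [
    ("east", 2007, 120),
    ("east", 2007, 95),
    ("east", 2007, 130),
    ("west", 2007, 80),
    ("west", 2007, 110),
]
COLUMN_NAMES = ["region", "year", "sales"]


def to_columns(rows: list[tuple], names: list[str]) -> dict[str, list[Any]]:
    """Pivot a row-oriented table into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values: list[Any]) -> list[tuple[Any, int]]:
    """Run-length encode: consecutive equal values become (value, run_length)."""
    runs: list[tuple[Any, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def choose_encoding(values: list[Any]) -> str:
    """Assumed rules for picking a per-column compression scheme."""
    if len(rle_encode(values)) <= len(values) // 2:
        return "rle"  # long runs of repeated values
    if all(isinstance(v, str) for v in values) and len(set(values)) < len(values):
        return "dictionary"  # repeated strings: store each distinct value once
    if all(isinstance(v, int) for v in values):
        # integers: store (v - min) in just enough bits for the value range
        width = (max(values) - min(values)).bit_length()
        return f"bitpack{width}"
    return "plain"


columns = to_columns(ROWS, COLUMN_NAMES)
for name, values in columns.items():
    print(name, choose_encoding(values))
print(rle_encode(columns["region"]))  # [('east', 3), ('west', 2)]
```

Running it prints `region rle`, `year rle` and `sales bitpack6`. Each column can get its own scheme because it holds a single type of value, which a row store mixing all three types in every record cannot easily exploit.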

In a columnar database, query processing works as follows:

1. Only columns relevant to the query being executed are retrieved
2. All operations are done in parallel (a traditional DBMS scans all of the data sequentially)
3. Data retrieval carries very low overhead
4. Data is scanned in compressed form and expanded only on retrieval
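The query-side points can be sketched the same way. This is a hypothetical, simplified example, again not any product's engine: three "nodes" each hold their own slice of every column, the `region` column is run-length encoded, and `SUM(sales) WHERE region = 'east'` is answered by testing the predicate once per run, reading only the `region` and `sales` columns, and adding up per-node partial results. The data, column names and functions are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table split across three shared-nothing "nodes".
# region is run-length encoded as (value, run_length); sales is stored plain.
# customer exists only to show that unneeded columns are never read.
PARTITIONS = [
    {"region": [("east", 2), ("west", 1)], "sales": [120, 95, 80],
     "customer": ["a", "b", "c"]},
    {"region": [("east", 1), ("west", 2)], "sales": [130, 110, 70],
     "customer": ["d", "e", "f"]},
    {"region": [("west", 2)], "sales": [60, 40],
     "customer": ["g", "h"]},
]


def scan_partition(part: dict, region: str) -> int:
    """SUM(sales) WHERE region = :region on one node, one test per run."""
    total = 0
    pos = 0
    for value, run_len in part["region"]:
        if value == region:
            total += sum(part["sales"][pos:pos + run_len])
        pos += run_len
    return total


def count_rows(part: dict, region: str) -> int:
    """COUNT(*) WHERE region = :region, answered from the runs alone."""
    return sum(n for value, n in part["region"] if value == region)


def parallel_sum(partitions: list[dict], region: str) -> int:
    # Each node computes a partial result independently;
    # a coordinator then adds the partials together.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(lambda p: scan_partition(p, region), partitions)
        return sum(partials)


print(parallel_sum(PARTITIONS, "east"))  # 120 + 95 + 130 = 345
print(sum(count_rows(p, "east") for p in PARTITIONS))  # 3
```

Threads stand in for the MPP nodes here; in CPython they show the divide-and-combine structure rather than a real speedup. `count_rows` shows the other extreme: a filtered `COUNT(*)` never expands the compressed data at all.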

Because the underlying database is architected to store data compressed and to retrieve it in columns rather than rows, multidimensional queries built on this platform gain an advantage. While the columnar database has the potential to provide a platform advantage for the analytical data warehouse, the other appliance technologies provide a similar performance advantage.
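For the multidimensional case, here is a rough sketch of the idea: a `GROUP BY region, year` with `SUM(sales)` touches only those three columns, and if the dimension columns are dictionary encoded, the grouping runs on small integer codes, with values decoded only for the final result. The column store, the data and the function names `rollup` and `decode` are hypothetical.

```python
from collections import defaultdict

# Hypothetical column store: every column is a separate list.
# Dimension columns are dictionary encoded (integer codes plus a dictionary).
region_dict = ["east", "west"]
region_codes = [0, 0, 1, 1, 0, 1]
year_dict = [2006, 2007]
year_codes = [0, 1, 0, 1, 1, 1]
sales = [100, 120, 80, 90, 130, 70]
product = ["p1", "p2", "p3", "p1", "p2", "p3"]  # never read by this query


def rollup(dim_a_codes: list[int], dim_b_codes: list[int],
           measure: list[int]) -> dict[tuple[int, int], int]:
    """GROUP BY two dimensions with SUM(measure), working on integer codes."""
    totals: dict[tuple[int, int], int] = defaultdict(int)
    for a, b, m in zip(dim_a_codes, dim_b_codes, measure):
        totals[(a, b)] += m
    return totals


def decode(totals: dict[tuple[int, int], int], dict_a: list, dict_b: list) -> dict:
    # Codes are translated back to values only for the small final result.
    return {(dict_a[a], dict_b[b]): s for (a, b), s in sorted(totals.items())}


result = decode(rollup(region_codes, year_codes, sales), region_dict, year_dict)
print(result)
# {('east', 2006): 100, ('east', 2007): 250, ('west', 2006): 80, ('west', 2007): 160}
```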

One columnar database vendor has already demonstrated its database's strength by running the TPC-H benchmark.

As Operational BI matures over the next few years and the demand for operational reporting increases, demand for data availability and data accessibility will grow. These are the technologies that will be deployed in the data center to augment the data warehouse by taking on part of its workload.

Watch this Blog for further details on this topic.

  Posted by kkrishnan at 6:15 PM


December 28, 2007

A few things needed in any Data Warehouse Appliance

I have been asked this question a number of times: "What is in the data warehouse appliance? I do not feel comfortable with it being a black box." Every time, my answer has been that there is no black box concept. The interfaces for managing a data warehouse appliance may not yet be as robust as those of the mainstream databases, but that is part of the technology's own maturing process.

While vendors are working to make maintaining and managing Data Warehouse Appliances easier, here are a few things I would like to see implemented in every appliance technology.

1. GUI - A thin-client user interface for technical and user management of the appliance. A few appliances already offer one, but not yet at a level that instills confidence in users. Vendors whose appliances lack one will have to make it a priority.

2. DBA and administrator documentation - While the appliance runs on Linux and Unix platforms, additional commands have been added for integrating the MPP engine. Robust documentation on configuration and system administration would be greatly appreciated. Similarly, from a DBA perspective, documentation and a management GUI will be absolute success criteria.

3. TPC-H benchmarks - I'm not suggesting that every vendor needs to run a TPC-H benchmark. But published TPC-H results give IT users statistics for their decision-making process and an apples-to-apples comparison across platforms.

4. Reports - To ensure manageability and provide in-depth information, reports should be available on disk and CPU utilization, data allocation, etc. These will serve as an educational input and will also support operations.
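As an illustration of the data allocation part of such a report, here is a small Python sketch that takes per-node row counts for one table and reports each node's share plus a skew ratio (largest node divided by the average). The node names and counts are made up, and how an appliance would actually expose these numbers is an assumption; the point is the kind of summary an operator would want to see.

```python
# Hypothetical rows-per-node figures for one table on a four-node appliance.
ROWS_PER_NODE = {
    "node01": 2_510_000,
    "node02": 2_490_000,
    "node03": 3_750_000,
    "node04": 2_500_000,
}


def skew_ratio(rows_per_node: dict[str, int]) -> float:
    """Largest node's row count divided by the average row count."""
    mean = sum(rows_per_node.values()) / len(rows_per_node)
    return max(rows_per_node.values()) / mean


def allocation_report(rows_per_node: dict[str, int]) -> str:
    """Format each node's row count and share of the table, then the skew."""
    total = sum(rows_per_node.values())
    lines = ["node             rows   share"]
    for node, rows in sorted(rows_per_node.items()):
        lines.append(f"{node}  {rows:>12,}  {100 * rows / total:5.1f}%")
    lines.append(f"skew (max/mean): {skew_ratio(rows_per_node):.2f}")
    return "\n".join(lines)


print(allocation_report(ROWS_PER_NODE))
```

Because each node in a shared-nothing system scans its own slice of the data, a node holding a third more rows than average (a ratio of 1.33 here) will broadly set the pace for every full-table query.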

  Posted by kkrishnan at 6:01 AM


December 12, 2007

Why the Appliance is gaining ground

Like it or not, the Data Warehouse Appliance has been making the rounds, and it is finally getting attention. Looking at Gartner's latest Magic Quadrant, we see three appliance vendors in the quadrant (hey, they are in there with the big boys who have been around forever). By this time next year we will see more names in this area. What has made the Appliance click and get attention?

The initial attention showered on the Appliance centered on its cost advantage, but not anymore. Appliance vendors have started delivering features and functionality that traditional database vendors are still placing on their roadmaps for future releases. The fact that these companies are young yet ready to take on the challenge and deliver solutions in record time is proof enough.

Appliance vendors have long shown the ability to move large volumes of data at record speeds. The technology is built to perform and in most cases eliminates overheads such as indexes; there are situations where you might need them, but they are rare. Commoditization has proven the ability to execute on non-proprietary platforms, and all the vendors have demonstrated the ability to scale.

Leading BI and data integration vendors such as Business Objects, MicroStrategy and Informatica, to name a few, have partnered with appliance vendors.

Yes, this is a relatively new technology and there is room for improvement. More exciting things are coming from this area next year, and a lot of these technologies will be future trendsetters.


Watch, read and participate in this channel for more information on Appliances, their integration, issues, etc.

  Posted by kkrishnan at 10:04 PM


December 5, 2007

Data Agility

With the ever-growing need for data to be available in real time for consumption by business and non-business users, we are seeing a new rush for data agility and a need for a new backbone architecture for data integration. YouTube has given new meaning to information sharing in the media; similarly, digital dashboards have become an integral decision-making tool for business owners and executives. Real-time demand forecasting engines have started making supply chains more agile than ever, and real-time customer feedback has become a major investment for theme parks (e.g. EuroDisney).

What drives this demand is the need to be agile in your business. Whether it is a meat packaging plant in rural Iowa or a theme park in one of the world's largest cities, the need to be agile and responsive to the customer has brought new meaning to data availability and data integration. I do agree that you cannot change production processes or schedules or alter goods already manufactured, but with the right information available at the right time, you can work wonders in managing your products, your offerings or services, or better yet your production schedule.

To meet this ever-growing demand, technology has also been improving consistently. CPUs have become faster and less expensive overall, and memory has roughly doubled in performance while dropping in price. Disk has become incredibly cheap; in fact, with the world going digital with SD cards and flash drives (even in camcorders), demand for disk storage will increase in the future.

In the data warehouse space, the demand for data agility has been consistently met with new and innovative offerings. Data Warehouse Appliances have established a strong footprint in data centers around the globe this year, and I see this technology being embraced by data centers and data warehouse IT staff in the coming years.

Data integration architectures are being revamped to accommodate data agility needs. Bill Inmon's DW 2.0 is pathbreaking in its techniques for integrating unstructured data. We are also seeing the ODS (operational data store) revived in light of Operational BI requirements and the data agility needs that come with them.

Data requirements in retail and financial services have just about quadrupled in the last couple of years. I am also seeing the healthcare industry's growing pains with data, and solutions are getting ready to address these issues.

Whichever way you choose to look, the next phase of this journey for all data practitioners is going to be an interesting and rewarding one.

  Posted by kkrishnan at 12:00 PM