Business Intelligence Network business intelligence resources

Blog: Krish Krishnan


September 23, 2007

Is the SMP architecture suited for the Data Warehouse Ecosystem?

Whenever RDBMS vendors debate who has the fastest database on which platform, you are directed to TPC results and to how each vendor can outperform the others in controlled tests. That is one thing when we talk about transaction processing, but when we talk about the data warehouse, it takes on a whole different meaning.

Having been a DBA myself for a number of years, I do not mean to belittle the TPC and its benchmarks. But in the data warehouse ecosystem we are not talking about small, discrete transactions; we are talking about mixed queries that need processing power from the underlying database, storage and network.

The current stack of database, storage and network platforms, for all its power and processing capability, has not yet solved the problem of query speed and sustained performance. This is where end users are left frustrated and IT is often helpless: the platform cannot cope with expanding demands, because the underlying architecture of these traditional solutions is SMP-based.

The net result is more spending across the board to sustain speed and performance, on top of paying for database and code redesign and deployment. Does this cycle ever slow down, let alone stop? Is there an alternative?

There is an answer if the goal is more performance for less spend. Before your next splurge, take a look at the Data Warehouse Appliance, and if you need answers, ask the hard questions. An appliance is a single integrated stack of database, storage and network, built for performance and sustained scalability and, most importantly, based on an MPP architecture.
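To make the SMP/MPP distinction concrete: in an SMP system all processors share one memory and one set of disks, while in an MPP (shared-nothing) system each node owns its own CPU, memory and disk. Data is distributed across the nodes, each node works only on its own slice of a query, and a coordinator combines the partial results. The Python sketch below only simulates this inside one process, with threads standing in for nodes; the node count and function names are hypothetical and are not any appliance vendor's API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical number of MPP nodes


def partition(rows, num_nodes=NUM_NODES):
    """Hash-distribute (key, amount) rows across nodes, as many shared-nothing systems do at load time."""
    nodes = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        nodes[hash(key) % num_nodes].append((key, amount))
    return nodes


def local_aggregate(node_rows):
    """Each node sums only the rows it owns; nothing is shared with other nodes."""
    totals = defaultdict(int)
    for key, amount in node_rows:
        totals[key] += amount
    return totals


def mpp_sum_by_key(rows, num_nodes=NUM_NODES):
    """Simulated MPP query: SUM(amount) GROUP BY key."""
    nodes = partition(rows, num_nodes)
    # Threads stand in for independent nodes in this simulation.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, nodes))
    # The coordinator merges the partial results from every node.
    result = defaultdict(int)
    for part in partials:
        for key, total in part.items():
            result[key] += total
    return dict(result)
```

For example, `mpp_sum_by_key([("east", 10), ("west", 5), ("east", 7), ("north", 3)])` gives the same totals a single SMP database would compute, but the scan and aggregation work is split across the nodes; adding nodes adds CPU, memory and disk together rather than contending for one shared pool.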

There are multiple vendors in the market with Data Warehouse Appliance offerings, and each has a solution that can be applied to specific problems. Yes, they are all up-and-coming technologies, but rewind the clock and so were the database, storage and network vendors in their early years.

No early adopter of this technology has reported a failure so far, which in itself testifies that appliance technology is here to stay and can help solve the problems that the SMP stack cannot. Being MPP is not all that matters; building the right solution at the right price is where the appliances are making inroads.

Whether we choose to look at this technology or to ignore it, it is becoming clear that the SMP architecture cannot fuel the data warehouse ecosystem for much longer on its own, and that the MPP architecture will need to co-exist with it to provide a combined platform.

  Posted by kkrishnan at 8:13 PM


September 13, 2007

Historical Data - Costly Maintenance

Historical data (data more than 3 years old): do your users need it all the time? Can you afford to move it to an offsite location, and will you be able to get it back and make it available in the data warehouse when required? If the answer is yes, you are in good shape. If the answer to any of these questions is no, read on.
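The 3-year rule above is simple to apply mechanically. As a minimal sketch, assuming each row carries a date and arrives as a `(row_date, payload)` tuple, the hypothetical helpers below compute the cutoff and split rows into current and historical sets; only the 3-year threshold comes from the post.

```python
from datetime import date

HISTORY_YEARS = 3  # from the post: data older than 3 years counts as historical


def cutoff_date(today, years=HISTORY_YEARS):
    """Return the date exactly `years` years before `today`."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # `today` is Feb 29 and the target year is not a leap year.
        return today.replace(year=today.year - years, day=28)


def split_by_age(rows, today):
    """Split (row_date, payload) tuples into (current, historical) lists."""
    cutoff = cutoff_date(today)
    current, historical = [], []
    for row_date, payload in rows:
        # Strictly older than the cutoff means "greater than 3 years".
        target = historical if row_date < cutoff else current
        target.append((row_date, payload))
    return current, historical
```

With `today = date(2007, 9, 13)` the cutoff is 13 September 2004, so a row dated 1 May 2003 lands in the historical list while rows from 2006 and 2007 stay current.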

Look at the overall cost that historical data imposes on your bottom line:

1. Increases the cost of storage
2. Increases the response time of your queries
3. Increases the time you take to load data to the warehouse

Most of us know the obvious issues listed above, but there are other issues that are often overlooked:

1. Metadata and master data for the historical data need to be maintained in the data warehouse.
2. If ETL code was developed in increments over the historical data, that code is an additional overhead that needs to be maintained.
3. If the historical data is archived and later needs to be brought back, the metadata may be missing, and with it the interface to the data is lost.
4. Integrating the historical data with the new data always causes issues.
5. The content of data fields may have changed between the historical and current data, which affects comparison reporting.
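Item 5 is easiest to see with a small example. Suppose, hypothetically, that a `region` field held single-letter codes in the historical data but full names in the current data. Unless the historical values are conformed first, a comparison report treats "E" and "East" as different regions. The minimal Python sketch below uses an invented field name and mapping; in practice such a mapping is exactly the kind of metadata item 1 says must be maintained.

```python
# Hypothetical mapping from historical region codes to current region names.
HISTORICAL_TO_CURRENT = {"E": "East", "W": "West"}


def conform(record):
    """Return a copy of a historical record with its region code translated."""
    region = record["region"]
    return {**record, "region": HISTORICAL_TO_CURRENT.get(region, region)}


def totals_by_region(records):
    """Sum the 'amount' field per region."""
    out = {}
    for r in records:
        out[r["region"]] = out.get(r["region"], 0) + r["amount"]
    return out


def compare_totals(historical, current):
    """Totals for both periods, with historical codes conformed first."""
    return totals_by_region(map(conform, historical)), totals_by_region(current)
```

Calling `compare_totals([{"region": "E", "amount": 100}, {"region": "W", "amount": 40}], [{"region": "East", "amount": 120}])` yields `{"East": 100, "West": 40}` for the historical period and `{"East": 120}` for the current one, so the two "East" figures can be compared directly.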

When you look at the overall impact of both maintaining and not maintaining the historical data, it can be hard to decide which way to proceed. Making the right decision requires looking at the value this data provides to your business. In other words, before you decide how to manage the data lifecycle within the data warehouse, involve the business by taking the problem to your data governance and steering committee. Show the business the pain the data volume is causing them, and explain the different options for mitigating that pain. One such option is to consider a new platform that stores your historical data onsite at a lower cost of ownership.

A data warehouse appliance lets you keep your historical data without paying the storage and reloading costs described above. All of the appliance vendors are ANSI SQL compatible, so you should have no issues pointing your BI tools at this platform to analyze historical data. If you need to do comparison analysis, you can build a special dataset on this platform and then bring it into your data warehouse for further reporting.
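The "build a special dataset, then bring it to the warehouse" pattern can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for the appliance and the data warehouse, and the table names, columns and rows are hypothetical. The point is that a plain SQL aggregate runs where the history lives, and only the small summary is moved.

```python
import sqlite3

# Two in-memory databases stand in for the appliance (holding history)
# and the data warehouse; all table and column names are hypothetical.
appliance = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

appliance.execute(
    "CREATE TABLE sales_history (sale_year INTEGER, region TEXT, amount INTEGER)"
)
appliance.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?)",
    [(2003, "East", 100), (2003, "West", 40), (2004, "East", 90)],
)

# Standard SQL aggregate executed on the appliance builds the comparison dataset.
summary = appliance.execute(
    "SELECT sale_year, region, SUM(amount) FROM sales_history "
    "GROUP BY sale_year, region ORDER BY sale_year, region"
).fetchall()

# Only the summarized dataset is loaded into the warehouse for reporting.
warehouse.execute(
    "CREATE TABLE history_summary (sale_year INTEGER, region TEXT, total INTEGER)"
)
warehouse.executemany("INSERT INTO history_summary VALUES (?, ?, ?)", summary)
warehouse.commit()
```

Against a real appliance the connection would come from that vendor's driver rather than `sqlite3`; the design choice being illustrated is simply to aggregate on the platform that holds the history and move the result, not the raw rows.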

The bottom line is that there are new and flexible technology options for your data warehouse solution architecture; how soon you adopt them depends on how intense your pain and financial drain are.

  Posted by kkrishnan at 10:28 AM