Business Intelligence Network business intelligence resources

Blog: Krish Krishnan

October 26, 2007

Data Warehouse Appliances Get Attention

As the year comes to an end and we start assessing the ups and downs of the data warehouse industry, including the successes and failures of its companies, one bright spot in the entire spectrum is the attention that data warehouse appliances are finally getting.

In my opinion, this concept is more than a solution to data warehousing problems of speed and cost. It is a platform that can address data-related problems in general. When we talk about speed and cost, we often get so mired in cost that we hedge our bets with known technologies, despite their several limitations, because the appliance has yet to be proven at large scale.
The recent spate of activity and postings from vendors, analysts and other industry experts on how the appliance has been implemented, together with the fact that the TDWI 2007 Q4 conference is dedicating sessions to the subject, shows that the appliance is coming of age and that adoption of this technology will gain momentum in the next few years.
My colleagues who attend the TDWI conference will post reports from the event on www.beyenetwork.com. While you wait for those reports, I can assure you of one thing: this, right here and right now, is the next best thing in data warehousing. Let's live the moment and applaud all of those individuals who dreamt of this concept and brought it to life.

  Posted by kkrishnan at 11:17 AM


October 23, 2007

Where Is My Data? - Part 2

In my previous posting, we started looking at the topic of auditing the data in the data warehouse and some of the benefits of doing so. This post continues the same topic, which is too deep and broad to cover in one sitting.

Every time anybody asks me why the data within the data warehouse needs to be audited, and whether auditing only addresses the compliance need, my response is no: it goes beyond making your warehouse compliant. It gives you insight into how business rules are applied to the data and how effective they are, and it helps establish rigor around how the ETL / ELT process is working, from both a performance perspective and a logic perspective. The data collected from this process, if reported on, tells business users about the data content they have in their hands, any differences in the numbers between source and data warehouse, and why data was rejected, all of which is invaluable.

But above and beyond all these tangible benefits, there are two key points that need to be understood. By implementing an auditing process you get profound insight into:

1. Data lineage - you get the ability to trace data from the data warehouse all the way back to source systems.
2. Data quality - a large benefit from the auditing exercise is your ability to handle the quality issues surrounding data.
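To make the two points above concrete, here is a minimal sketch of a load step that records lineage and quality information as it runs. Everything in it is an assumption made for illustration: the `AuditLog` structure, the `load_batch` and `trace` functions, the lineage column names and the "missing customer_id" rule are all hypothetical, not part of any particular ETL tool.

```python
from dataclasses import dataclass, field


@dataclass
class AuditLog:
    """Hypothetical audit store: loaded rows with lineage, plus rejects."""
    loaded: list = field(default_factory=list)
    rejected: list = field(default_factory=list)


def load_batch(rows, source_system, batch_id, audit):
    """Tag accepted rows with lineage columns; record rejected rows with a reason."""
    for line_no, row in enumerate(rows, start=1):
        lineage = {
            "source_system": source_system,
            "batch_id": batch_id,
            "source_line": line_no,
        }
        # Assumed business rule for illustration: a row needs a customer_id.
        if not row.get("customer_id"):
            audit.rejected.append({**lineage, "reason": "missing customer_id"})
        else:
            audit.loaded.append({**row, **lineage})


def trace(audit, customer_id):
    """Return (source_system, batch_id, source_line) for each loaded row of a customer."""
    return [
        (r["source_system"], r["batch_id"], r["source_line"])
        for r in audit.loaded
        if r["customer_id"] == customer_id
    ]


audit = AuditLog()
load_batch(
    [{"customer_id": "C1", "amount": 10}, {"customer_id": "", "amount": 5}],
    source_system="crm",
    batch_id="2007-10-23-01",
    audit=audit,
)
```

Calling `trace(audit, "C1")` walks a warehouse row back to the source line it came from, while `audit.rejected` holds the reason each bad row was turned away.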

Apart from the benefits mentioned above, you will also have information on things like ETL performance, server utilization and network latency, which will start to provide helpful insight into the overall solution architecture of the data loading process.

Let's assume that you do decide to embark on this process. If you are still building your data warehouse, it is another change request; if you have already deployed your data warehouse, it is a different challenge. How to start and how to go about making decisions will be a topic of discussion for another day.

  Posted by kkrishnan at 2:27 PM


October 10, 2007

Where Is My Data?

Every business user has asked their IT counterpart this question at least once. IT users often feel frustrated that they are held responsible for every byte of data that has been processed and are compelled to provide an audit reconciling what they received from sources with what was processed and loaded into the data warehouse or datamart.

The question is not whether IT or business owns this process, but what the process itself is. If you have not thought of the answer by now, we are talking about the need to audit your data warehouse. Auditing the data warehouse provides insight into all of its areas, beginning with the source systems and culminating in the datamart or reports. Whether you run it as a large-scale enterprise initiative or a small group-level initiative, the key benefits are clear answers for business and IT users on how data moves through the system, where potential data flaws lie and how business rules execute. Other benefits include the ability to provide counts by each type of data and totals for sales data, and the ability to explain discrepancies and correct them at the source system.
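As a rough illustration of counts by type, totals and discrepancies, the sketch below compares a source extract with what landed in the warehouse. The row layout (a `type` and an `amount` field) and the function names `summarize` and `reconcile` are assumptions chosen for the example; a real audit would read from the actual source and warehouse tables.

```python
from collections import defaultdict
from decimal import Decimal


def summarize(rows):
    """Return {data_type: (row_count, total_amount)} for a list of row dicts."""
    summary = defaultdict(lambda: [0, Decimal("0")])
    for row in rows:
        entry = summary[row["type"]]
        entry[0] += 1
        entry[1] += Decimal(str(row["amount"]))
    return {k: (v[0], v[1]) for k, v in summary.items()}


def reconcile(source_rows, warehouse_rows):
    """List every data type whose count or total differs between source and warehouse."""
    src, dwh = summarize(source_rows), summarize(warehouse_rows)
    empty = (0, Decimal("0"))
    discrepancies = []
    for data_type in sorted(set(src) | set(dwh)):
        s, w = src.get(data_type, empty), dwh.get(data_type, empty)
        if s != w:
            discrepancies.append({
                "type": data_type,
                "count_diff": s[0] - w[0],    # positive: rows missing in warehouse
                "amount_diff": s[1] - w[1],   # positive: amount missing in warehouse
            })
    return discrepancies


# Illustrative data only: one sale row never reached the warehouse.
source = [
    {"type": "sale", "amount": "100.00"},
    {"type": "sale", "amount": "50.25"},
    {"type": "refund", "amount": "-20.00"},
]
warehouse = [
    {"type": "sale", "amount": "100.00"},
    {"type": "refund", "amount": "-20.00"},
]
report = reconcile(source, warehouse)
```

Amounts are held as `Decimal` rather than floats so that totals compare exactly; an empty `report` would mean source and warehouse agree for every data type.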

To execute this type of process in your data warehouse, you need to be able to enforce rules of engagement such as control files for source data, reference data for all the common entities across the data warehouse, and common data format definitions. A business sponsor with an IT enabler will be required to make this initiative a success, and they will need backing and a business case from a steering committee or a data warehouse governance body within the organization.
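One way a control file can work, sketched under stated assumptions: the source system ships each extract with a small file declaring how many rows it sent and what they sum to, and the load step refuses to trust an extract that does not match. The `key=value` control format, the field names `row_count` and `amount_total`, and the functions below are all hypothetical.

```python
import csv
import io
from decimal import Decimal


def parse_control_file(text):
    """Parse 'key=value' lines into the expected row count and amount total."""
    fields = dict(line.split("=", 1) for line in text.strip().splitlines())
    return int(fields["row_count"]), Decimal(fields["amount_total"])


def validate_extract(extract_csv, control_text):
    """Compare a CSV extract against its control file and return a list of problems."""
    expected_rows, expected_total = parse_control_file(control_text)
    rows = list(csv.DictReader(io.StringIO(extract_csv)))
    actual_total = sum((Decimal(r["amount"]) for r in rows), Decimal("0"))
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"row count: expected {expected_rows}, got {len(rows)}")
    if actual_total != expected_total:
        problems.append(f"amount total: expected {expected_total}, got {actual_total}")
    return problems


# Illustrative extract and two control files, one matching and one not.
extract = "order_id,amount\n1,100.00\n2,50.25\n"
good_control = "row_count=2\namount_total=150.25\n"
bad_control = "row_count=3\namount_total=150.25\n"
```

With these inputs, `validate_extract(extract, good_control)` returns an empty list, while the mismatched control file yields a row-count problem that can be reported back to the source system owner.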

The success of this type of initiative can be measured by the following:

1. A data flow scorecard which will provide a traceability matrix.
2. The ability to detect and report anomalies
3. The ability to fix anomalies
4. Data consistency in statistics of good or bad data.
5. Peaks and valleys of data volumes
6. Inference on data quality to reported data information.

Overall, the process of auditing the data warehouse is a linchpin of the success of the data warehouse itself. It is interesting to note that this effort is often overlooked because of initial cost concerns, and then, when there is mayhem in the data warehouse, it is the effort undertaken to solve the issue. How we execute this process, when we start and how we integrate it are all aspects that will be covered in forthcoming blogs.

  Posted by kkrishnan at 9:00 AM