Blog: Steve Dine Subscribe to this blog's RSS feed!

Steve Dine

If you're looking for a change from analysts, thought leaders and industry gurus, you've come to the right place. Don't get me wrong, many of these aforementioned are my colleagues and friends that provide highly intelligent insight into our industry. However, there is nothing like a view from the trenches. I often find that there is what I like to call the "conference hangover." It is the headache that is incurred after trying to implement the "best" practices preached to your boss at a recent conference. It is the gap between how business intelligence (BI) projects, programs, architectures and toolsets should be in an ideal world versus the realities on the ground. It's that space between relational and dimensional or ETL and ELT. This blog is dedicated to sharing experiences, insights and ideas from inside BI projects and programs of what works, what doesn't and what could be done better. I welcome your feedback of what you observe and experience as well as topics that you would like to see covered. If you have a specific question, please email me at sdine@datasourceconsulting.com.

About the author >

Steve Dine is President and founder of Datasource Consulting, LLC. He has more than 12 years of hands-on experience delivering and managing successful, highly scalable and maintainable data integration and business intelligence (BI) solutions. Steve is a faculty member at The Data Warehousing Institute (TDWI) and a judge for the Annual TDWI Best Practices Awards. He is the former director of global data warehousing for a major durable medical equipment manufacturer and former BI practice director for an established Denver based consulting company. Steve earned his bachelor's degree from the University of Vermont and a MBA from the University of Colorado at Boulder.

Editor's Note: More articles and resources are available in Steve's BeyeNETWORK Expert Channel. Be sure to visit today!

March 2010 Archives

In the never ending quest to determine exactly why so many BI projects and programs still fail, regardless of everything we've learned over the years, another trend has hit my top 10 list.  It appears that in the quest to reduce cost and time on BI projects, there seems to be more cases where data is being consolidated rather than integrated.  What's the difference?  Data consolidation is the process of bringing together entities and attributes into a data warehouse, each retaining their form, value and technical characteristics as they exist in the source.  Data integration is the process of combining entities and attributes such that they have common form, meaning and technical characteristics.  (See graphic)

Data_Consolidation_Graphic-1.png

Why does data consolidation present a challenge in the data warehouse, and/or data mart?  It reduces the ability of users to understand and analyze the data. Users are left to try and determine which attributes and values mean the same thing, and often attach different meaning than their colleagues.  It adds time to every analysis and often reduces the value of the result.  I usually find that companies try and compensate by building the complex integration rules into the semantic layers of the front end tools.  Aside from the additional time required to create the model, it can significantly slow down response time, especially if the rules are being processed by the BI tool instead of the database.

Why do companies end up with data consolidations rather than data integration?  In my experience, there are a number of reasons.  One cause is an inexperienced DW data modeler.  Often data modelers have more experience in the OLTP world than the DW world.  They reverse engineer the sources and try to preserve all the attributes that meet reporting requirements.  Another reason is that it takes more time to integrate data.  It requires data profiling, data analysis and interaction with the subject matter experts.  There can also be political factors involved with reaching common meaning.  It's also difficult to integrate data.  Fortuntely, we've learned techniques over the years for handling different situations, like preserving source values while still providing integrated data.  Lastly, a DW may start out with only a single source and the data model is not created to accomodate integrated data. 

If your BI program isn't delivering the value that was promised, maybe the issue is with what your delivering. 


You can also follow me on Twitter @steve_dine


Posted March 30, 2010 8:46 AM
Permalink | No Comments |

Chances are that if you have data that's taking too long to load, monster queries that are taking too long to return and/or planning to add complex analytics to the mix, you are considering purchasing a data warehouse appliance.  DW appliances leverage integrated hardware with a high-speed back-plane, embedded operating system, scalable storage and analytic database technology to deliver a compelling price to performance solution.  For existing BI programs, a proof-of-concept is usually requested to have the vendor prove that their DW appliance will solve their biggest pain points.  It's difficult not to get excited when that 3 hour query returns in 3 minutes.  However, before you sign that contract, you may also want to ensure that the DW appliance:

  1. Is certified to work with your existing, or planned, front-end BI tools.  Your BI tools may not generate SQL that is recognized by the appliance, be able to leverage the built-in database functions or support the same data types.
  2. Has native drivers that work with your existing, or planned, ETL tools.  Many appliances require the use of their bulk loaders to move large volumes of data in and out of the database and are drastically slower with ETL tools.  (note: a fast driver may not be necessary if they support push-down optimization, aka ELT)
  3. Supports correlated subqueries.  While appliance vendors will correctly argue that correlated subqueries are inherently slow, due to the fact that they require nested loops, chances are that your BI tool will produce them.
  4. Is compatible with your data modeling tool.  Many DW appliances are built upon open source databases, such as Postgres.  Make sure your data modeling tool can reverse engineer and produce the DDL that these databases can consume.
  5. Automatically handle the addition of new nodes.  Since most appliances utilize a massively parallel processing (MPP) architecture, data is distributed across the nodes to ensure balanced processing.  However, this can cause challenges when adding new nodes and often require the unloading and reloading of the data.

The best approach is to include these items, or those that are critical to your requirements, in the proof-of-concept.  While you can't mitigate all your risks, adding these items to your list of requirements will help to ensure a more successful implementation for you and your appliance vendor. 


Posted March 21, 2010 9:03 PM
Permalink | No Comments |