Appliances in the Enterprise Data Warehouse – Ready for Prime Time?

Originally published August 17, 2009

I recently launched a global data warehouse using appliances in a hub-and-spoke design. Some may argue whether it is being fully utilized or whether it is being incorporated into day-to-day processing, but that is a topic for a future article. This article covers the areas in which readers can make their own determination of whether appliances are truly ready to handle the day-to-day operations of an enterprise. Let me break this down into three areas – volume, integration of the data and extraction of the data for reporting.

Volume – Over the last 5-10 years, companies have mastered processing large volumes of data. The factors to look at are resource usage of the systems, table sizes and record sizes, which in turn relate to page sizes. The longer the record, the longer it takes to copy the data. Mastering volume primarily comes down to hardware capability; however, the network and operating system are contributing layers as well. Because appliance design incorporates all the layers of technology, it has an advantage in two areas – cost and flexibility. We need to be careful, though: just because you can copy faster doesn't mean you can integrate faster (let's save that for the next section). The combination of the technology stack within the appliance enables it to adjust quickly to business needs. Inserting data into a database is well supported on both enterprise platforms and appliances. People may argue sequential vs. multithreaded insert, but appliances have greater flexibility in this area – advantage appliances.
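
As a rough illustration of why insert strategy matters, here is a minimal sketch contrasting row-at-a-time inserts with batched inserts. sqlite3 stands in for the warehouse platform; the table name, row count and timings are illustrative, not drawn from the article:

```python
import sqlite3
import time

# Minimal sketch: contrast row-at-a-time inserts with a batched insert.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")

rows = [(i, i * 1.5) for i in range(100_000)]

start = time.perf_counter()
for row in rows:
    conn.execute("INSERT INTO sales VALUES (?, ?)", row)  # one statement per row
conn.commit()
print(f"row-at-a-time: {time.perf_counter() - start:.2f}s")

conn.execute("DELETE FROM sales")

start = time.perf_counter()
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)  # single batched call
conn.commit()
print(f"batched:       {time.perf_counter() - start:.2f}s")
```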

Integration of the data – Most people get volume and integration flipped, but we won’t. Data integration has to provide a comprehensive, unified, open capability for global development.

Data integration includes these layers:

  • Source layer

  • Integration layer

  • Base layer

  • Package layer

  • Presentation layer

  • Reporting or consumption layer

  • Metadata layer

An enterprise acquisition strategy needs to be able to support one or more of these layers in various combinations, as the sketch below illustrates.
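
A minimal sketch of that idea, assuming hypothetical platform capability profiles (the layer names come from the list above; everything else is illustrative):

```python
from enum import Enum, auto

class Layer(Enum):
    SOURCE = auto()
    INTEGRATION = auto()
    BASE = auto()
    PACKAGE = auto()
    PRESENTATION = auto()
    REPORTING = auto()   # reporting or consumption
    METADATA = auto()

# Hypothetical capability profiles; real products vary.
APPLIANCE = {Layer.SOURCE, Layer.BASE, Layer.PACKAGE}
ENTERPRISE = set(Layer)

def supports(platform: set, required: set) -> bool:
    """True if the platform covers every layer the strategy requires."""
    return required <= platform

print(supports(APPLIANCE, {Layer.SOURCE, Layer.INTEGRATION}))   # False
print(supports(ENTERPRISE, {Layer.SOURCE, Layer.INTEGRATION}))  # True
```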

On the front end of the data acquisition life cycle, the appliance gets out of the gate fast. One can connect right out of the box and get some level of simple integration capability. However, the further you venture down the data life cycle, where more complex data must be integrated, the more apparent the limitations of appliances become. Appliances integrate data very quickly as long as the data integration business rules are simple. As soon as you start to add subject area after subject area, the process slows down. The bottleneck tends to occur around the integration layer, where data integration, separation and transformation are handled. Data that passes the quality rules and definitions moves forward to the next step of integration; data that does not is flagged with the classification and severity of the anomaly, and on rare occasions it is rejected outright. The anomalous data remains available for reporting to data quality assurance teams and to source systems, and the rules governing which anomalies are eligible for reporting are enforced when the data is pulled from the base layer to build the package objects. The movement of so many parts tends to make appliances overheat very quickly – disadvantage appliance.
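
A minimal sketch of that quality gate, assuming hypothetical rule names and severity levels (none of these identifiers come from a specific product):

```python
from dataclasses import dataclass, field

@dataclass
class Anomaly:
    classification: str   # e.g., "missing_value", "out_of_range"
    severity: str         # e.g., "warning", "reject"

@dataclass
class Record:
    data: dict
    anomalies: list = field(default_factory=list)

def check_quality(record: Record) -> str:
    """Apply illustrative quality rules; return 'pass', 'flag', or 'reject'."""
    if record.data.get("customer_id") is None:
        record.anomalies.append(Anomaly("missing_value", "reject"))
    if record.data.get("amount", 0) < 0:
        record.anomalies.append(Anomaly("out_of_range", "warning"))
    if any(a.severity == "reject" for a in record.anomalies):
        return "reject"          # rare: the data is rejected outright
    return "flag" if record.anomalies else "pass"

passed, flagged, rejected = [], [], []
for rec in [Record({"customer_id": 1, "amount": 9.5}),
            Record({"customer_id": 2, "amount": -3.0}),
            Record({"customer_id": None, "amount": 1.0})]:
    {"pass": passed, "flag": flagged, "reject": rejected}[check_quality(rec)].append(rec)

# Flagged and rejected records stay available for data quality reporting.
print(len(passed), len(flagged), len(rejected))  # 1 1 1
```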

Extraction of the data for reporting – This is the area the business sees the most, and it is the make-or-break area. Can you display information in a meaningful way to help the company generate revenue?

Before we can dive into the last subject of the article, we need to ensure that processes have been defined and followed, governance has been communicated and adhered to, and data stewardship has been assigned with ownership established.

Understanding both the similarities and the differences of the package and presentation layers is important in assessing appliance readiness for the enterprise data warehouse.

The package layer is a set of tables that store special-purpose information, designed explicitly to meet defined business intelligence (BI) and customer intelligence (CI) requirements.

For dashboard and scorecard reporting, as well as supporting drill-down detail, the design follows the dimensional model (fact, dimension and aggregate tables). Package layer tables can represent anything from denormalized summary tables to highly normalized detail source data for drill-down purposes. These tables may be limited to current record values, or they may contain distinct historical versions of a record's changes over time.
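
A minimal sketch of what such tables can look like. The table and column names are illustrative; the history columns follow the common slowly-changing-dimension pattern, which the article implies but does not name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension limited to current record values only.
conn.execute("""
CREATE TABLE dim_product_current (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
)""")

# Dimension keeping distinct historical versions of each record:
# effective dates plus a current-row flag.
conn.execute("""
CREATE TABLE dim_customer_history (
    customer_key   INTEGER PRIMARY KEY,
    customer_id    INTEGER,
    region         TEXT,
    effective_from TEXT,
    effective_to   TEXT,
    is_current     INTEGER
)""")

# Fact table at the grain of one row per sale, plus a denormalized aggregate.
conn.execute("""
CREATE TABLE fact_sales (
    product_key  INTEGER,
    customer_key INTEGER,
    sale_date    TEXT,
    amount       REAL
)""")
conn.execute("""
CREATE TABLE agg_sales_by_region (
    region       TEXT,
    sale_month   TEXT,
    total_amount REAL
)""")
```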

The presentation layer is a set of tables and views accessible to end users and their tools to meet reporting, analytics and campaign requirements. The presentation layer tables are identical in structure to those in the package layer; the difference lies in accessibility, since BI tools can query the presentation layer but cannot access package layer tables directly.
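
Continuing the sketch above, the presentation layer can be as thin as a set of views over the package tables. Real platforms would restrict BI tool accounts to the views with database grants; sqlite3 here just illustrates the structural split, and the names are again illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (product_key INTEGER, sale_date TEXT, amount REAL)")
conn.execute("INSERT INTO fact_sales VALUES (1, '2009-08-17', 99.0)")

# Presentation layer object: a view the BI tool is allowed to query.
# The underlying package table stays hidden behind it.
conn.execute("""
CREATE VIEW v_sales AS
SELECT product_key, sale_date, amount
FROM fact_sales
""")

for row in conn.execute("SELECT * FROM v_sales"):
    print(row)  # (1, '2009-08-17', 99.0)
```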

So the question is: Can appliances’ underlying structure handle these complex physical structures? More importantly, are the BI vendors lining up to work with the appliance companies to ensure all the pre-defined models can work effectively? Are the appliance companies working on providing the integration with BI vendors at this time? I don’t see it – disadvantage appliance.

Conclusion

Appliances have come a long way. The concept of combining hardware, network and database technologies into a self-contained appliance is compelling and cost effective. With any enterprise data warehouse, there is typically a strong desire to analyze and interpret data in much deeper ways than traditional BI tools provide. Integrating the data and presenting it to the business in a useful manner is crucial to getting the business to both use and continue to support the enterprise data warehouse. So until appliance vendors figure out a way to integrate information with very complex transformations and enable the presentation layer, the information appliance will continue to fill the low-cost, simple-needs niche and will not deliver on industrial-strength enterprise data warehouse requirements.
Timothy Leonard
Timothy Leonard has more than 20 years of experience in configuration management planning for data warehouses, operational data stores (ODS) and order entry systems. He has led several large-scale infrastructure hardware (SAN, network) designs for clients, and he helps companies maintain and enhance their existing environments for optimal performance.

His expertise includes delivering best practice methodologies around solutions used to identify, control and track changes for projects, integrating touch points between multiple functional areas of the implementation team. Tim has developed procedures that enforce a formal method for managing change requests to baseline documents and software release schedules associated with data warehouse projects. A patent holder for an Internet order entry system, Tim uses his technical background along with his project management skills across all phases of a software development life cycle (SDLC), focused on the distinct requirements and standards of data warehousing projects. He has hands-on experience with all phases of a data warehouse project life cycle and has had direct responsibility for creating a configuration management plan for enterprise data warehouses that tracks all actions and changes from project initiation through production support. Tim is a featured speaker at industry and vendor events such as TDWI and DAMA.



Comments


Posted February 11, 2010 by Michael Martin

It seems like a tough problem to solve with the tools available today. Finding a way to tag, label or cut up the data so that it can be referenced quickly and thoroughly seems the most likely solution. Or at least it is the part that is most in our hands.
