Blog: Wayne Eckerson

Wayne Eckerson

Welcome to Wayne's World, my blog that illuminates the latest thinking about how to deliver insights from business data and celebrates out-of-the-box thinkers and doers in the business intelligence (BI), performance management and data warehousing (DW) fields. Tune in here if you want to keep abreast of the latest trends, techniques, and technologies in this dynamic industry.

About the author >

Wayne has been a thought leader in the business intelligence field since the early 1990s. He has conducted numerous research studies and is a noted speaker, blogger, and consultant. He is the author of two widely read books: Performance Dashboards: Measuring, Monitoring, and Managing Your Business (2005, 2010) and The Secrets of Analytical Leaders: Insights from Information Insiders (2012).

Wayne is currently director of BI Leadership Research, an education and research service run by TechTarget that provides objective, vendor-neutral content to business intelligence (BI) professionals worldwide. Wayne’s consulting company, BI Leader Consulting, provides strategic planning, architectural reviews, internal workshops, and long-term mentoring to both user and vendor organizations. For many years, Wayne served as director of education and research at The Data Warehousing Institute (TDWI), where he oversaw the company’s content and training programs and chaired its BI Executive Summit. He can be reached by email at weckerson@techtarget.com.

(Editor's note: This is the fourth in a series on the Big Data Revolution.)


The Big Data revolution has arrived and it's transforming long-established data warehousing architectures into vibrant, multi-faceted analytical ecosystems.

Gone are the days when all analytical processing first passed through a data warehouse or data mart (or their less sanctified spreadmart or data shadow system brethren). Now data winds its way to users through a multiplicity of corporate data structures, each tailored to the type of content it contains and the type of user who wants to consume it.

Figure 1 depicts a reference architecture for the new analytical ecosystem that has the fingerprints of Big Data all over it. The objects in blue represent the traditional data warehousing environment, while those in pink represent new architectural elements made possible by Big Data technologies, namely Hadoop, NoSQL databases, high-performance analytical engines (e.g. analytical appliances, MPP databases, in-memory databases), and interactive, in-memory visualization tools.

Most source data now flows through Hadoop, which primarily acts as a staging area and online archive. This is especially true for semi-structured data, such as log files and machine-generated data, but also for some structured data that companies can't cost-effectively store and process in SQL engines (e.g., call detail records in a telecommunications company). From Hadoop, data is fed into a data warehousing hub, which often distributes data to downstream systems, such as data marts, operational data stores, and analytical sandboxes of various types, where users can query the data using familiar SQL-based reporting and analysis tools.

Today, data scientists analyze raw data inside Hadoop by writing MapReduce programs in Java and other languages. In the future, users will be able to query and process Hadoop data using familiar SQL-based data integration and query tools.
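To make the MapReduce pattern concrete, here is a minimal sketch in plain Python. It is not Hadoop's API and it runs in a single process; it only mimics the map, shuffle, and reduce phases that a Hadoop job distributes across a cluster. The log format, the sample lines, and every function name are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical web-log lines in the form "<ip> <HTTP status> <path>".
LOG_LINES = [
    "10.0.0.1 200 /home",
    "10.0.0.2 404 /missing",
    "10.0.0.1 200 /products",
    "10.0.0.3 500 /checkout",
    "10.0.0.2 200 /home",
]


def map_phase(line):
    """Emit one (status, 1) pair for a raw log line."""
    _ip, status, _path = line.split()
    yield status, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(LOG_LINES))
```

The point of the sketch is that the analyst writes only the map and reduce logic against raw, unmodeled data; no schema has to be designed before the question can be asked.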

Figure 1. The New Analytical Ecosystem

Harmonizing Opposites

The Big Data revolution is not only about analyzing large volumes and new sources of data; it's also about balancing data alignment and consistency with flexible, ad hoc exploration. As such, the new analytical ecosystem features both top-down and bottom-up data flows that together meet business requirements for reporting and analysis.

The top-down world. In the top-down world, source data is processed, refined, and stamped with a predefined data structure--typically a dimensional model--and then consumed by casual users using SQL-based reporting and analysis tools. In this domain, IT developers create data and semantic models so business users can get answers to known questions and executives can track performance of predefined metrics. Here, design precedes access. The top-down world also takes great pains to align data along conformed dimensions and deliver clean, accurate data. The goal is to deliver a consistent view of the business entities so users can spend their time making decisions instead of arguing about the origins and validity of data artifacts.
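As a rough illustration of "design precedes access," the sketch below builds a tiny dimensional model (one fact table, one dimension) in an in-memory SQLite database and answers a known question with plain SQL. The table names, columns, and sample rows are hypothetical and stand in for whatever model an IT team would actually define.

```python
import sqlite3

# In-memory database standing in for a data mart; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_region (
        region_id   INTEGER PRIMARY KEY,
        region_name TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        sale_id   INTEGER PRIMARY KEY,
        region_id INTEGER NOT NULL REFERENCES dim_region(region_id),
        amount    REAL NOT NULL
    );
    INSERT INTO dim_region VALUES (1, 'East'), (2, 'West');
    INSERT INTO fact_sales VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
    """
)


def revenue_by_region(connection):
    """Answer a predefined question: total sales amount per region."""
    query = """
        SELECT d.region_name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_region AS d ON d.region_id = f.region_id
        GROUP BY d.region_name
        ORDER BY d.region_name
    """
    return connection.execute(query).fetchall()


if __name__ == "__main__":
    for region, total in revenue_by_region(conn):
        print(region, total)
```

Because every fact row joins to the same region dimension, any report built on this model slices revenue the same way, which is the consistency the top-down approach is after.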

The underworld. Creating a uniform view of the business from heterogeneous sets of data is not easy. It takes time, money, and patience, often more than most departmental heads and business analysts are willing to tolerate. They often abandon the top-down world for the underworld of spreadmarts and data shadow systems. Using whatever tools are readily available and cheap, these data-hungry users create their own views of the business. Eventually, they spend more time collecting and integrating data than analyzing it, undermining both their productivity and a consistent view of business information.

The bottom-up world. The new analytical ecosystem brings these prodigal data users back into the fold. It carves out space within the enterprise environment for true ad hoc exploration and promotes the rapid development of analytical applications using in-memory departmental tools. In a bottom-up environment, users can't anticipate the questions they will ask on a daily or weekly basis or the data they'll need to answer those questions. Often, the data they need doesn't yet exist in the data warehouse.

The new analytical ecosystem creates analytical sandboxes that let power users explore corporate and local data on their own terms. These sandboxes include Hadoop, virtual partitions inside a data warehouse, and specialized analytical databases that offload data or analytical processing from the data warehouse or handle new, untapped sources of data, such as Web logs or machine data. The new environment also gives department heads the ability to create and consume dashboards built with in-memory visualization tools that point to both a corporate data warehouse and other independent sources.

Combining top-down and bottom-up worlds is not easy. BI professionals need to assiduously guard data semantics while opening access to data. For their part, business users need to commit to adhering to corporate data standards in exchange for getting the keys to the kingdom. To succeed, organizations need robust data governance programs and lots of communication among all parties.

Summary. The Big Data revolution brings major enhancements to the BI landscape. First and foremost, it introduces new technologies, such as Hadoop, that make it possible for organizations to cost-effectively consume and analyze large volumes of semi-structured data. Second, it complements traditional top-down data delivery methods with more flexible, bottom-up approaches that promote ad hoc exploration and rapid application development.


Posted February 15, 2012 6:23 AM

2 Comments

Wayne,
Great perspective! I love the idea of having older data warehouse technology in collaboration with big data platforms like Hadoop and analytic platforms. In many of my conversations with business analysts, they are frustrated by lack of access to data, the amount of time it takes to gather data and return analytic results, and the lack of interaction between the various systems. When I talk to them about the idea of creating a "playground" or workspace for the analyst, their eyes light up. It seems that most analysts are spending anywhere from 50% to 80% of their time just gathering data or waiting for queries to run. The idea of a new analytic ecosystem where they are free to do ad hoc discovery work, whenever they want to do it, without any limitations on the amount of data they can access is quite appealing. Big data analytics are on the move!

I have one quick question: is this name, "New Analytic Ecosystem," just what you are calling this in the article? Or is that the naming construct you are going to use for it going forward? Just curious...

Where did you get the diagram? Because I know the author.
