
Blog: Wayne Eckerson

Wayne Eckerson

Welcome to Wayne's World, my blog that illuminates the latest thinking about how to deliver insights from business data and celebrates out-of-the-box thinkers and doers in the business intelligence (BI), performance management and data warehousing (DW) fields. Tune in here if you want to keep abreast of the latest trends, techniques, and technologies in this dynamic industry.

About the author >

Wayne has been a thought leader in the business intelligence field since the early 1990s. He has conducted numerous research studies and is a noted speaker, blogger, and consultant. He is the author of two widely read books: Performance Dashboards: Measuring, Monitoring, and Managing Your Business (2005, 2010) and The Secrets of Analytical Leaders: Insights from Information Insiders (2012).

Wayne is founder and principal consultant at Eckerson Group, a research and consulting company focused on business intelligence, analytics, and big data.

Recently in Data Warehousing Category

The previous two articles in this series covered the organizational and technical factors required to succeed with advanced analytics. But as with most things in life, the hardest part is getting started. This final article shows how to kickstart an analytics practice and rev it into high gear.

The problem with selling an analytics practice is that most business executives who would support and fund the initiative haven't heard of the term. Some will think it's another IT boondoggle in the making and will politely deny or put off your request. You're caught in the chicken-or-egg riddle: it's hard to sell the value of analytics until you've shown tangible results. But you can't deliver tangible results until an executive buys into the program.

Of course, you may be fortunate to have enlightened executives who intuitively understand the value of analytics and are coming to you to build a practice. That's a nice fairy tale. Even with enlightened executives, you still need to prove the value of the technology and, more importantly, your ability to harness it. Even in a best-case scenario, you get one chance to prove yourself.

So, here are ten steps you can take to jumpstart an analytics practice, whether you are working at the grassroots level or at the behest of an eager senior executive.

1. Find an Analyst. This seems too obvious to state, but it's hard to do in practice. Good analysts are hard to come by. They combine a unique knowledge of business process, data, and analytical tools. As people, they are critical thinkers who are inquisitive, doggedly persistent, and passionate about what they do. Many analysts have M.B.A. degrees or trained as social scientists, statisticians, or Six Sigma practitioners. Occasionally, you'll be able to elevate a precocious data analyst or BI report developer into the role.

2. Find an Executive. Good sponsors are almost as rare as good analysts. A good sponsor is someone who is willing to test long-held assumptions using data. For instance, event companies mail their brochures 12 weeks before every conference. Why? No one knows; it's the way it's always been done. But maybe they could get a bigger lift from their marketing investments if they mailed the brochures 11 or 13 weeks out, or shifted some of their marketing spend from direct mail to email and social media channels. A good sponsor is willing to test such assumptions.
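One way to test an assumption like the 12-week mailing rule is a simple split test: mail one group on the usual schedule and another on a different schedule, then compare response rates. The sketch below is only illustrative. The function name `response_lift` and all the counts are hypothetical, and it uses a standard pooled two-proportion z-score as a rough check on whether the difference is more than noise.

```python
import math

def response_lift(control_sent, control_resp, test_sent, test_resp):
    """Return (relative_lift, z_score) for a test group versus a control group."""
    p_control = control_resp / control_sent
    p_test = test_resp / test_sent
    # Pooled response rate under the assumption that timing makes no difference.
    pooled = (control_resp + test_resp) / (control_sent + test_sent)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_sent + 1 / test_sent))
    z_score = (p_test - p_control) / std_err
    relative_lift = (p_test - p_control) / p_control
    return relative_lift, z_score

# Hypothetical counts: brochures mailed 12 weeks out (control) vs. 11 weeks out (test).
lift, z = response_lift(control_sent=5000, control_resp=100,
                        test_sent=5000, test_resp=130)
print(f"relative lift: {lift:.0%}, z-score: {z:.2f}")
```

A z-score near 2 or above is a common rule of thumb for "probably not chance," but the right threshold depends on how costly a wrong decision would be.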

3. Focus Your Efforts. If you've piqued an executive's interest, then explain what resources you need, if any, to conduct a test. But don't ask for much, because you don't need much to get going. Ideally, you should be able to make do with people and tools you have in-house. A good analyst can work miracles with Excel and SQL, and there are many open source data mining packages available today, as well as low-cost statistical add-ins to Excel and BI tools. Select a project that is interesting enough to be valuable to the company, but small enough to minimize risk.

4. Talk Profits. It's very important to remember that your business sponsor won't trust your computer model. They will go with their gut instinct rather than rely on a mathematical model to make a major decision. They will only trust the model if it shows either tangible lift (i.e., more revenues or profits), or it validates their own experience and knowledge. For example, the head of marketing for an online retailer will trust a market basket model if he realizes that the model has detected purchasing habits of corporate procurement officers who buy office items for new hires.
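To make the market basket idea concrete, here is a minimal sketch of the three numbers such a model typically reports for a pair of items: support, confidence, and lift. The baskets, item names, and the function `pair_rules` are all made up for illustration; a real model would run against order history and use a dedicated data mining package.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets):
    """Compute (support, confidence, lift) for every ordered pair of items."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n                       # share of baskets with both items
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            lift = confidence / (item_counts[rhs] / n)
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules

# Hypothetical orders from an online office-supply retailer.
baskets = [
    ["desk lamp", "stapler", "monitor stand"],
    ["desk lamp", "stapler"],
    ["stapler", "monitor stand"],
    ["coffee"],
    ["desk lamp", "stapler", "coffee"],
]
rules = pair_rules(baskets)
support, confidence, lift = rules[("desk lamp", "stapler")]
```

A lift above 1 means the two items appear together more often than chance would predict, which is the kind of pattern a marketing head can check against his own experience.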

5. Act on Results. There is no point creating analytical models if the business doesn't act on them. There are many ways to make models actionable. You can present the results to executives whose go-to-market strategies might be shaped by the findings. Or you can embed the models in a weekly churn report distributed to sales people that indicates which customers are likely to attrite in the near future. (See figure 1.) Or you can embed models in operational applications so they are triggered by new events (e.g., a customer transaction) and automatically spit out recommendations (e.g., cross-sell offers.)

Figure 1. An Actionable Report

6. Make it Useful. The models not only should be actionable, they should be proactive. The worst thing you can do is tell a salesperson something they already know. For instance, if the model says, "This customer is likely to churn because they haven't purchased anything in 90 days", a salesperson is likely to say, "Duh, tell me something I don't already know." A better model would be one that detects patterns not immediately obvious to the salesperson. For example, "This customer makes frequent purchases but their overall monthly expenditures have dropped ten percent since the beginning of the year."
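As a rough sketch of the "better model" described in step 6, the rule below skips customers who have obviously lapsed and flags only those who keep buying while their monthly spend slides. The customer names, figures, thresholds, and the function `spend_decline_alerts` are assumptions for illustration, not a real churn model.

```python
def spend_decline_alerts(monthly_spend, threshold=0.10, min_active_months=3):
    """Flag customers who keep buying but whose spend has fallen since the first month."""
    alerts = {}
    for customer, spend in monthly_spend.items():
        active_months = sum(1 for amount in spend if amount > 0)
        # Skip lapsed customers: the salesperson already knows about those.
        if active_months < min_active_months or spend[0] <= 0:
            continue
        decline = (spend[0] - spend[-1]) / spend[0]
        if spend[-1] > 0 and decline >= threshold:
            alerts[customer] = round(decline, 3)
    return alerts

# Hypothetical monthly spend per customer since the beginning of the year.
monthly_spend = {
    "Acme": [1000, 980, 950, 890],   # still buying every month, spend down 11%
    "Globex": [500, 510, 495, 505],  # steady
    "Initech": [800, 0, 0, 0],       # obvious lapse, not worth reporting
}
alerts = spend_decline_alerts(monthly_spend)
```

Only Acme is flagged here: the obvious lapse is deliberately left out so the report tells the salesperson something new.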

7. Consolidate Data. Too often, analysts play the role of IT manager by accessing, moving, and transforming data before they begin to analyze it. Although the DW team will never be able to identify and consolidate all the data that analysts might need, it can always do a better job understanding their requirements and making the right data available at the right level of granularity. This might require purchasing demographic data and creating specialized wide, flat tables preferred by modelers. It might also mean supporting specialized analytical functions inside the database that let the modelers profile, prepare, and model data.

8. Unlock Your Data. Unfortunately, most IT managers don't provide analysts ready access to corporate data for fear that their SQL queries will grind an operational system or data warehouse to a halt. To balance access and performance, IT managers should create an analytical sandbox that enables modelers to upload their own data and mix it with corporate data in the warehouse. These sandboxes can be virtual table partitions inside the data warehouse or dedicated analytical machines that contain a replica of corporate data or an entirely new data set. In either case, the modelers get free and open access to data and IT managers get to worry less about resource contention.

9. Govern Your Data. Because analysts are so versatile with data, they often get pulled in multiple directions. The lowest value-added activity they perform is creating ad hoc queries for business colleagues. This type of work is better left to super users in each department. But to prevent super users from generating thousands of duplicate or conflicting reports, the BI team needs to establish a report governance committee that evaluates requests for new reports, maps them to an existing inventory, and decides which ones to build or roll into existing report structures. Ideally, the report governance committee consists of super users who are already creating most of the reports that users rely on.

10. Centralize Analysts. It's imperative that analysts feel part of a team and not isolated in some departmental silo. An Analytics Center of Excellence can help build camaraderie among analysts, cross train them in different disciplines and business processes, and mentor new analysts. A director of analytics needs to prioritize analytics projects, cultivate an analytics mindset in the corporation, and maintain a close alliance with the data warehousing team. In fact, it's best if the director of analytics also has responsibility for the data warehouse. Ideally, 80% to 90% of analysts are embedded in the departments where they work side by side with business users and the rest reside at corporate headquarters where they focus on cross-departmental initiatives.


Although some of the steps defined above are clearly for novices, even analytics teams that are more advanced still struggle with many of the items. To succeed with analytics ultimately requires a receptive culture, top-notch people (i.e., analysts), comprehensive and clean data, and the proper tools. Success will not come quickly but takes a sustained effort. But the payoff, when it comes, is usually substantial.

Posted November 21, 2011 7:34 AM

The prior article in this series discussed the human side of analytics. It explained how companies need to have the right culture, people, and organization to succeed with analytics. The flip side is the "hard stuff"--the architecture, platforms, tools, and data--that makes analytics possible. Although analytical technology gets the lion's share of attention in the trade press--perhaps more than it deserves for the value it delivers--it nonetheless forms the bedrock of all analytical initiatives. This article examines the architecture, platforms, tools, and data needed to deliver robust analytical solutions.


The term "analytical architecture" is an oxymoron. In most organizations, business analysts are left to their own devices to access, integrate, and analyze data. By necessity, they create their own data sets and reports outside the purview and approval of corporate IT. By definition, there is no analytical architecture in most organizations--just a hodge-podge of analytical silos and spreadmarts, each with conflicting business rules and data definitions.

Analytical sandboxes. Fortunately, with the advent of specialized analytical platforms (discussed below), BI architects have more options for bringing business analysts into the corporate BI fold. They can use these high-powered database platforms to create analytical sandboxes for the explicit use of business analysts. These sandboxes, when designed properly, give analysts the flexibility they need to access corporate data at a granular level, combine it with data that they've sourced themselves, and conduct analyses to answer pressing business questions. With analytical sandboxes, BI teams can transform business analysts from data pariahs to full-fledged members of the BI community.

There are four types of analytical sandboxes:

  • Staging Sandbox. This is a staging area for a data warehouse that contains raw, non-integrated data from multiple source systems. Analysts generally prefer to query a staging area that contains all the raw data rather than each source system individually. Hadoop is a staging area for large volumes of unstructured data that a growing number of companies are adding to their BI ecosystems.

  • Virtual Sandbox. A virtual sandbox is a set of tables inside a data warehouse assigned to individual analysts. Analysts can upload data into the sandbox and combine it with data from the data warehouse, giving them one place to go to do all their analyses. The BI team needs to carefully allocate compute resources so analysts have enough horsepower to run ad hoc queries without interfering with other workloads running on the data warehouse.

  • Free-standing sandbox. A free-standing sandbox is a separate database server that sits alongside a data warehouse and contains its own data. It's often used to offload complex, ad hoc queries from an enterprise data warehouse and give business analysts their own space to play. In some cases, these sandboxes contain a replica of data in the data warehouse, while in others, they support entirely new data sets that don't fit in a data warehouse or run faster on an analytical platform.

  • In-memory BI sandbox. Some desktop BI tools maintain a local data store, either in memory or on disk, to support interactive dashboards and queries. Analysts love these types of sandboxes because they connect to virtually any data source and enable analysts to model data, apply filters, and visually interact with the data without IT intervention.
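The virtual sandbox is the easiest of these to picture in code. The sketch below uses SQLite purely as a stand-in for a data warehouse: an attached schema named `sandbox` plays the analyst's private area, and the analyst joins an uploaded table to warehouse data in a single query. The table names, columns, and rows are hypothetical, and a real warehouse would add the workload management that this toy omits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                      # stands in for the data warehouse
conn.execute("ATTACH DATABASE ':memory:' AS sandbox")   # the analyst's private area

# Corporate data maintained by the BI team.
conn.execute("CREATE TABLE main.sales (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO main.sales VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 50.0), (3, 300.0)])

# Data the analyst sourced and uploaded without IT involvement.
conn.execute("CREATE TABLE sandbox.segments (customer_id INTEGER, segment TEXT)")
conn.executemany("INSERT INTO sandbox.segments VALUES (?, ?)",
                 [(1, "trial"), (2, "trial"), (3, "enterprise")])

# One query mixes corporate data with the analyst's own data.
rows = conn.execute("""
    SELECT g.segment, SUM(s.amount) AS revenue
    FROM main.sales AS s
    JOIN sandbox.segments AS g ON g.customer_id = s.customer_id
    GROUP BY g.segment
    ORDER BY g.segment
""").fetchall()
```

The point of the design is that the analyst never copies warehouse data out to a spreadsheet; the self-sourced data comes to the warehouse instead.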

Next-Generation BI Architecture. Figure 1 depicts a BI architecture with the four analytical sandboxes colored in green. The top half of the diagram represents a classic top-down, data warehousing architecture that primarily delivers interactive reports and dashboards to casual users (although the streaming/complex event processing (CEP) engine is new.) The bottom half of the diagram depicts a bottom-up analytical architecture with analytical sandboxes along with new types of data sources. This next-generation BI architecture better accommodates the needs of business analysts and data scientists, making them full-fledged members of the corporate BI ecosystem.

Figure 1. The New BI Architecture

The next-generation BI architecture is more analytical, giving power users greater options to access and mix corporate data with their own data via various types of analytical sandboxes. It also brings unstructured and semi-structured data fully into the mix using Hadoop and nonrelational databases.

Analytical Platforms

Since the beginning of the data warehousing movement in the early 1990s, organizations have used general-purpose data management systems to implement data warehouses and, occasionally, multidimensional databases (i.e., "cubes") to support subject-specific data marts, especially for financial analytics. General-purpose data management systems were designed for transaction processing (i.e., rapid, secure, synchronized updates against small data sets) and only later modified to handle analytical processing (i.e., complex queries against large data sets.) In contrast, analytical platforms focus entirely on analytical processing at the expense of transaction processing.

The analytical platform movement. In 2002, Netezza (now owned by IBM) introduced a specialized analytical appliance, a tightly integrated hardware-software database management system designed explicitly to run ad hoc queries against large volumes of data at blindingly fast speeds. Netezza's success spawned a host of competitors, and there are now more than two dozen players in the market (see Table 1).

Table 1. Types of Analytical Platforms

Today, the technology behind analytical platforms is diverse: appliances, columnar databases, in-memory databases, massively parallel processing (MPP) databases, file-based systems, nonrelational databases, and analytical services. What they all have in common, however, is that they provide significant improvements in price-performance, availability, load times, and manageability compared with general-purpose relational database management systems. Every analytical platform customer I've interviewed has cited order-of-magnitude performance gains that most initially didn't believe.

Moreover, many of these analytical platforms contain built-in analytical functions that make life easier for business analysts. These functions range from fuzzy matching algorithms and text analytics to data preparation and data mining functions. By putting functions in the database, analysts no longer have to craft complex, custom SQL or offboard data to analytical workstations, which limits the amount of data they can analyze and model.

Companies use analytical platforms to support free-standing sandboxes (described above) or as replacements for data warehouses running on MySQL and SQL Server, and occasionally major OLTP databases from Oracle and IBM. They also improve query performance for ad hoc analytical tools, especially those that connect directly to databases to run queries (versus those that download data to a local cache.)

Analytical Tools

In 2010, vendors turned their attention to meeting the needs of power users after ten years of enhancing reporting and dashboard solutions for casual users. As a result, the number of analytical tools on the market has exploded.

Analytical tools come in all shapes and sizes. Analysts generally need one of every type of tool. Just as you wouldn't hire a carpenter to build an addition to your house with just one tool, you don't want to restrict an analyst to just one analytical tool. Like a carpenter, an analyst needs a different tool for every type of job they do. For instance, a typical analyst might need the following tools:

  • Excel to extract data from various sources, including local files, create reports, and share them with others via a corporate portal or server (managed Excel).

  • BI Search tools to issue ad hoc queries against a BI tool's metadata.

  • Planning tools (including Excel) to create strategic and tactical plans, each containing multiple scenarios.

  • Mashboards and ad hoc reporting tools to create ad hoc dashboards and reports on behalf of departmental colleagues.

  • Visual discovery tools to explore data in one or more sources of data and create interactive dashboards on behalf of departmental colleagues.

  • Multidimensional OLAP (MOLAP) tools to explore small and medium sets of data dimensionally at the speed of thought and run complex dimensional calculations.

  • Relational OLAP tools to explore large sets of data dimensionally and run complex calculations.

  • Text analytics tools to parse text data and put it in a relational structure for analysis.

  • Data mining tools to create descriptive and predictive models.

  • Hadoop and MapReduce to process large volumes of unstructured and semi-structured data in a parallel environment.

Figure 2. Types of Analytical Tools

Figure 2 plots these tools on a graph where the x axis represents calculation complexity and the y axis represents data volumes. Ad hoc analytical tools for casual users (or more realistically super users) are clustered in the bottom left corner of the graph, while ad hoc tools for power users are clustered slightly above and to the right. Planning and scenario modeling tools cluster further to the right, offering slightly more calculation complexity against small volumes of data. High-powered analytical tools, which generally rely on machine learning algorithms and specialized analytical databases, cluster in the upper right quadrant.


Business analysts function like one-man IT shops. They must access, integrate, clean, and analyze data, and then present it to other users. Figure 2 depicts the typical workflow of a business analyst. If an organization doesn't have a mature data warehouse that contains cross-functional data at a granular level, analysts often spend an inordinate amount of time sourcing, cleaning, and integrating data (steps 1 and 2 in the analyst workflow). They then create a multiplicity of analytical silos (step 5) when they publish data, much to the chagrin of the IT department.

Figure 2. Analyst Workflow

In the absence of a data warehouse that contains all the data they need, business analysts must function as one-man IT shops where they spend an inordinate amount of time iterating between collecting, integrating, and analyzing data. They run into trouble when they distribute their hand-crafted data sets broadly.

Data Warehouse. The most important way that organizations can improve the productivity and effectiveness of business analysts is to maintain a robust data warehousing environment that contains most of the data that analysts need to perform their work. This can take many years. In a fast-moving market where the company adds new products and features continuously, the data warehouse may never catch up. But, nonetheless, it's important for organizations to continuously add new subject areas to the data warehouse, otherwise business analysts have to spend hours or days gathering and integrating this data themselves.

Atomic Data. The data warehouse also needs to house atomic data, or data at the lowest level of transactional detail, not summary data. Analysts generally want the raw data because they can repurpose it in many different ways depending on the nature of the business questions they're addressing. This is the reason that highly skilled analysts like to access data directly from source systems or a data warehouse staging area. At the same time, less skilled analysts appreciate the heavy lifting done by the IT group to clean and integrate disparate data sets using common metrics, dimensions, and attributes. This base level of data standardization expedites their work.
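A small sketch shows why atomic data matters. With transaction-level rows, the same data can be rolled up by whichever attribute a question calls for; once only a by-product summary is stored, the by-channel view can no longer be recovered. The records and the helper `rollup` are hypothetical.

```python
from collections import defaultdict

def rollup(transactions, key):
    """Sum transaction amounts by any attribute carried at the atomic level."""
    totals = defaultdict(float)
    for txn in transactions:
        totals[txn[key]] += txn["amount"]
    return dict(totals)

# Hypothetical atomic (transaction-level) records.
transactions = [
    {"product": "widget", "channel": "web", "amount": 40.0},
    {"product": "widget", "channel": "store", "amount": 60.0},
    {"product": "gadget", "channel": "web", "amount": 25.0},
]

by_product = rollup(transactions, "product")   # one business question
by_channel = rollup(transactions, "channel")   # a different question, same data
```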

Once a BI team integrates a sufficient number of subject areas in a data warehouse at an atomic level of data, business analysts can have a field day. Instead of downloading data to an analytical workstation, which limits the amount of data they can analyze and process, they can now run calculations and models against the entire data warehouse using analytical functions built into the database or that they've created using database development toolkits. This improves the accuracy of their analyses and models and saves them considerable time.
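To illustrate running analytical functions inside the database rather than on a workstation, the sketch below registers a custom `median` aggregate with SQLite through Python's `sqlite3.Connection.create_aggregate`. SQLite is only a stand-in here; analytical platforms ship their own toolkits for this, and the table and values are invented.

```python
import sqlite3
import statistics

class Median:
    """User-defined aggregate that the database engine calls row by row."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        return statistics.median(self.values) if self.values else None

conn = sqlite3.connect(":memory:")
conn.create_aggregate("median", 1, Median)   # name, number of arguments, class

conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("east", 10.0), ("east", 30.0), ("east", 500.0),
    ("west", 20.0), ("west", 40.0),
])

# The calculation runs where the data lives; only the small result set comes back.
medians = conn.execute(
    "SELECT region, median(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```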


The technical side of analytics is daunting. There are many moving parts that all have to work together. However, the most important part of the technical equation is the data. The old adage holds true: "garbage in, garbage out." Analysts can't deliver accurate insights if they don't have access to good quality data. And it's a waste of their time to spend days trying to prepare the data for analysis. A good analytics program is built on a solid data warehousing foundation that embeds analytical sandboxes tailored to the requirements of individual analysts.

Posted November 15, 2011 7:44 AM

Business intelligence is changing. I've argued in several reports that there is no longer just one intelligence--i.e., business intelligence--but multiple intelligences, each supporting a unique architecture, design framework, end-users, and tools. But all these intelligences are still designed to help business users leverage information to make smarter decisions and support the creation of either reporting or analysis applications.

The four intelligences are:

  1. Business Intelligence. Addresses the needs of "casual users," delivering reports, dashboards, and scorecards tailored to each user's role, populated with metrics aligned with strategic objectives and powered by a classic data warehousing architecture.

  2. Analytics Intelligence. Addresses the needs of "power users," providing ad hoc access to any data inside or outside the enterprise to answer business questions that can't be identified in advance using spreadsheets, desktop databases, OLAP tools, data mining tools and visual analysis tools.

  3. Continuous Intelligence. Collects, monitors, and analyzes large volumes of fast-changing data to support operational processes. It ranges from near real-time delivery of information (i.e., hours to minutes) in a data warehouse to complex event processing and streaming systems that trigger alerts.

  4. Content Intelligence. Gives business users the ability to analyze information contained in documents, Web pages, email messages, social media sites and other unstructured content using NoSQL and semantic technology.
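Of the four, continuous intelligence is the one most naturally expressed as code. The sketch below is a deliberately tiny stand-in for a complex event processing rule: it keeps a rolling window of recent readings and signals an alert when their average crosses a limit. The class name `ThresholdMonitor`, the window size, and the readings are all assumptions; real CEP engines offer far richer pattern matching.

```python
from collections import deque

class ThresholdMonitor:
    """Signal an alert when the rolling average of recent readings exceeds a limit."""

    def __init__(self, window, limit):
        self.readings = deque(maxlen=window)   # oldest reading drops off automatically
        self.limit = limit

    def observe(self, value):
        self.readings.append(value)
        window_full = len(self.readings) == self.readings.maxlen
        average = sum(self.readings) / len(self.readings)
        return window_full and average > self.limit

# Hypothetical stream of fast-changing operational readings.
monitor = ThresholdMonitor(window=3, limit=100)
alerts = [monitor.observe(value) for value in [90, 95, 100, 120, 130, 60]]
```

Waiting for a full window before alerting trades a little latency for fewer false alarms on the first few events.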

You may wonder how all these intelligences fit together architecturally. They do, but it's not the clean, neat architecture that you may have seen in data warehousing books of yore. Figure 1 below depicts a generalized architecture that supports the four intelligences.

Figure 1. BI Ecosystem of the Future

The top half of the diagram represents the classic top-down, data warehousing architecture that primarily delivers interactive reports and dashboards to casual users (although the streaming/complex event processing (CEP) engine is new.) The bottom half of the diagram adds new architectural elements and data sources that better accommodate the needs of business analysts and data scientists and make them full-fledged members of the corporate data environment.

A recent report I wrote describes the components of this architecture in some detail and provides market research on the adoption of analytic platforms (e.g. DW appliances and columnar and MPP databases), among other things. The report is titled: "Big Data Analytics: Profiling the Use of Analytical Platforms in User Organizations." You can download it for free at Bitpipe by clicking on the hyperlink in the previous sentence.

Since the "Multiple Intelligences" framework and the BI ecosystem that supports it represent what I think the future holds for BI, I'd love to get your feedback.

Posted October 21, 2011 9:35 AM

I used to think that data virtualization tools were great for niche applications, such as creating a quick and dirty prototype or augmenting the data warehouse with real-time data in an operational system or accessing data outside the corporate firewall. But now I think that data virtualization is the key to creating an agile, cost-effective data management infrastructure. In fact, data architects should first design and deploy a data virtualization layer prior to building any data management or delivery artifacts.

What is Data Virtualization? Data virtualization software makes data spread across physically distinct systems appear as a set of tables in a local database. Business users, developers, and applications query this virtualized view and the software automatically generates an optimized set of queries that fetch data from remote systems, merge the disparate data on the fly, and deliver the result to users. Data virtualization software consumes virtually any type of data, including SQL, MDX, XML, Web services, and flat files and publishes the data as SQL tables or Web services. Essentially, data virtualization software turns data into a service, hiding the complexity of back-end data structures behind a standardized information interface.
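Stripped to its essentials, the idea looks something like the sketch below: a "virtual view" that holds no data of its own, fetches from two physically distinct sources at query time, and merges the results on the fly. Everything here is a simplified assumption, including the in-memory SQLite database standing in for an operational system, the CSV text standing in for a flat file, and the function name `customer_revenue`. Commercial tools add query optimization and pushdown that this toy ignores.

```python
import csv
import io
import sqlite3

# Source 1: an operational database (hypothetical orders system).
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?)",
                      [(1, 250.0), (2, 75.0), (1, 50.0)])

# Source 2: a flat file (hypothetical CRM export).
crm_export = io.StringIO("customer_id,name\n1,Acme\n2,Globex\n")

def customer_revenue():
    """Virtual view: fetch from both sources at query time and merge in memory."""
    crm_export.seek(0)
    names = {int(row["customer_id"]): row["name"]
             for row in csv.DictReader(crm_export)}
    totals = orders_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return [(names[customer_id], total) for customer_id, total in totals]

view = customer_revenue()
```

Callers see one integrated result and never learn that half of it came from a file, which is what lets administrators swap a back-end system without touching downstream applications.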

With data virtualization, organizations can integrate data without physically consolidating it. In other words, they don't have to build a data warehouse or data mart to deliver an integrated view of data, which saves considerable time and money. In addition, data virtualization lets administrators swap out or redesign back-end databases and systems without affecting downstream applications.

The upshot is that IT project teams can significantly reduce the time they spend sourcing, accessing, and integrating data, which is the lion's share of work in any data warehousing project. In other words, data virtualization speeds project delivery, increases business agility, reduces costs, and improves customer satisfaction. What's not to like?

Long Time Coming. Data virtualization has had a long history. In the early days of data warehousing (~1995), it was called virtual data warehousing (VDW) and advocates positioned it as a legitimate alternative to building expensive data warehouses. However, data warehousing purists labeled VDW as "voodoo and witchcraft" and chased it from the scene. During the next 10 years, data virtualization periodically resurfaced, each time with a different moniker, including enterprise information integration (EII) and data federation, but the technology never got much traction, and vendors came and went.

Drawbacks. One reason data virtualization failed to take root is politics. Source system owners don't want BI tools submitting ad hoc queries against their operational databases. And these administrators have the clout to lock out applications they think will bog down system performance.

Other traditional drawbacks of data virtualization are performance, scalability, and query complexity. The engineering required to query two or more databases is complex and becomes exponentially more challenging as data volumes and query complexity grow. As a result, data virtualization tools historically have been confined to niche applications involving small volumes of clean, consistent data sets that require little to no transformation or complex joins.

Architectural Centerpiece?

Today, however, data virtualization is making a resurgence. Advances in network speeds, CPU performance, and available memory have significantly increased the performance and scalability of data virtualization tools, expanding the range of applications they can support. Moreover, data virtualization vendors continue to enhance their query optimizers to handle more complex queries and larger data volumes. Also, thanks to the popularity of data center virtualization, many organizations are open to exploring the possibility of virtualizing their data as well.

But does this mean data virtualization is ready to take center stage in your data management architecture? I think yes. Last week, I listened to data architects from Qualcomm, BP, Comcast, and Bank of America discuss their use of data virtualization tools at "Data Virtualization Day," a one-day event hosted by Composite Software, a leading data virtualization vendor. After hearing their stories, I am convinced that data virtualization is the missing layer in our data architectures.

These architects reported no performance or scalability issues with data virtualization. If they encounter a slow query, they simply persist or cache the target data in a traditional database and reconfigure the semantic layer accordingly. In other words, they virtualize everything they can and persist when they must. This approach overcomes all physical and political obstacles to data virtualization, while improving query performance and project agility. And some hard-core "data virtualizers" do away with a data warehouse altogether, preferring to persist snapshots of data that requires an historical or time-series view.
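The "virtualize everything you can, persist when you must" policy can be sketched in a few lines. The class below serves a query live until it proves too slow, then keeps a persisted copy and answers from that. The names `VirtualView` and `slow_federated_query`, the threshold, and the simulated delay are all hypothetical, and a real deployment would also need a refresh schedule for the persisted copy.

```python
import time

class VirtualView:
    """Serve a query live, but persist the result once it proves too slow."""

    def __init__(self, fetch, slow_seconds):
        self.fetch = fetch
        self.slow_seconds = slow_seconds
        self.persisted = None

    def query(self):
        if self.persisted is not None:
            return self.persisted            # answer from the persisted copy
        start = time.perf_counter()
        result = self.fetch()                # answer live from the source systems
        if time.perf_counter() - start > self.slow_seconds:
            self.persisted = result          # too slow: persist for next time
        return result

calls = {"count": 0}

def slow_federated_query():
    calls["count"] += 1
    time.sleep(0.05)                         # simulate a slow join across remote systems
    return [("Acme", 300.0)]

view = VirtualView(slow_federated_query, slow_seconds=0.01)
first = view.query()    # runs live, exceeds the threshold, gets persisted
second = view.query()   # served from the persisted copy
```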

Today, the biggest obstacles to the growth of data virtualization are perceptions and time. Given the innate bias among data warehousing professionals to persist everything, most data architects doubt that data virtualization tools offer adequate performance for their query workloads. In addition, it takes time to introduce data virtualization tools into an existing data warehousing architecture. The tools must prove their worth in an initial application and build on their success. Since enterprise-caliber data virtualization tools cost several hundred thousand dollars, they need a well-respected visionary to advocate for their usage. In most organizations, it's easier to go with the flow than buck the trend.

Nonetheless, the future looks bright for data virtualization. Since most of our data environments are heterogeneous (and always will be), it just makes sense to implement a virtualization layer that presents users and applications with a unified interface to access any back-end data no matter where it's located or how it's structured. A layer of abstraction that balances federation and persistence can do two things that every IT department must deliver: lower costs and quicker deployments.

Posted October 17, 2011 3:37 PM

The first part of this blog examines the pros and cons of centralized, decentralized, and federated BI architectures using Intuit as a case study. The second part looks at organizational models and how they influence BI architectures using Harley-Davidson and Dell as case studies. The third part shows different ways that corporate and divisional groups can divide responsibility for BI architectural components in the context of various organizational models.


The Federated Option

"Business needs trump architectural purity every time." Douglas Hackney

For years, business intelligence (BI) experts have diagrammed data warehousing architectures in which a single set of enterprise data is extracted and transformed from various operational systems and loaded into a staging area or data warehouse that feeds downstream data marts. This classic architecture is designed to deliver a single version of high-quality enterprise data while tailoring information views to individual departments and workgroups. It also ensures an efficient and scalable method for managing the flow of operational data to support reporting and analysis applications.

That's the theory, anyway. Too bad reality doesn't work that way.

Federated Reality. Most organizations have a hodgepodge of insight systems and non-aligned reporting and analysis applications that make achieving the veritable single version of truth a quest that only Don Quixote could appreciate. Let's face it, our BI architectures are a mess. But this is largely because our organizations are in a constant state of flux. If an ambitious, tireless, and lucky BI team achieved architectural nirvana, their edifice wouldn't last more than a few months. Mergers, acquisitions, reorganizations, new regulations, new competitors, and new executives with new visions and strategies all conspire to undermine a BI manager's best laid architectural blueprints.

Thus, the BI reality in which most of us live is federated. We have lots of overlapping and competing sources of information and insight, most of which have valid reasons for existence. So rather than trying to roll a boulder up a hill, it's best to accept a federated reality and learn how to manage it. And in the process, you'll undoubtedly discover that federation--in most of its incarnations--provides the best way to balance the need for enterprise standards and local control and deliver both efficient and effective BI solutions.

Understanding the Options - Evolution of Intuit's BI Architecture

Federation takes the middle ground between centralized and decentralized BI approaches. Ideally, it provides the best of both worlds, while minimizing the downsides. Table 1 describes the pros and cons of centralized and decentralized approaches to BI. Not surprisingly, the strengths of one are the weaknesses of the other, and vice versa.

Table 1. Centralized versus Decentralized BI
Table 1.jpg

Founded in 1983, Intuit is a multi-billion dollar software maker of popular financial applications, including Quicken, TurboTax, and QuickBooks. Intuit's BI evolution helps to illuminate the major characteristics of the two approaches and explains how they ended up with a federated BI architecture. (See figure 1.)

Decentralized Approach. In the late 1990s, after a series of acquisitions, Intuit had a decentralized BI environment. Essentially, every division--most of which had been independent companies--handled its own BI requirements without corporate involvement. Divisions with BI experts built their own applications to meet local needs. Deployments were generally quick, affordable, and highly tailored to end user requirements. As is true with any decentralized environment, the level of BI expertise in each division varied significantly. Some divisions had advanced BI implementations while others had virtually nothing. In addition, there was considerable redundancy in BI software, staff, systems, and applications across the business units, and no two systems defined key metrics and dimensions in the same way.

Centralized Approach. To obtain promised financial synergies from its acquisitions and create badly needed enterprise views of information, Intuit in 2000 implemented a classic, top-down, centralized data warehousing architecture. Here, a newly formed corporate BI team built and managed an enterprise data warehouse (EDW) to house all corporate and divisional data. Divisions rewrote their reports to run against the unified metrics and dimensions in the data warehouse. The goal was to reduce costs through economies of scale, standardize data definitions, and deliver enterprise views of data.

Although Intuit realized these gains, the corporate BI team quickly became overwhelmed with project requests, much to the chagrin of the divisions, which began to grumble about the backlog. When the divisions began to look elsewhere to meet their information requirements, the corporate BI team recognized it needed to reinvent itself and its BI architecture.

Figure 1. Intuit's BI Architectural Evolution
Figure 1.jpg
Courtesy Intuit

Federated Approach. To forestall a grassroots insurrection, Intuit adopted a federated architecture that blends the best of top-down and bottom-up approaches. To uncork its project backlog, the corporate BI team decided to open up its data warehousing environment to the divisions. It gave each division a dedicated partition in the EDW to develop its own data marts and reports. It also taught the divisions how to use the corporate ETL tool and allowed them to blend local data with corporate data inside their EDW partitions. For divisions that had limited or no BI expertise, the corporate BI team continued to build custom data marts and reports as before. (See figure 2.)

Figure 2. Intuit's Federated Architecture
Figure 2.jpg

Benefits. The federated approach alleviated Intuit's project backlog and gave divisions more control over their BI destiny. They could now develop custom-tailored applications and reports without having to wait in line. For its part, the corporate BI team achieved economies of scale by maintaining a single platform to run all BI activity and maintained tight control over enterprise data elements. Today, the corporate BI team no longer tries to be the only source of corporate data. Rather, it focuses on capturing and managing data that is shared between two or more divisions and develops BI applications when asked.

Table 2. Federated BI
Table 2.jpg

Challenges. The main challenge with federation is keeping everyone who is involved in the BI program, either formally or informally, on the same page. This requires a lot of communication and consensus building, or "social architecture" as Intuit calls it. The corporate BI team needs to relinquish some of its control back to the business units--which is a scary thing for a corporate group to do--while business units need to adhere to basic operating standards when using the enterprise platform. Ultimately, to succeed, federation requires a robust BI Center of Excellence with a well-defined charter, a clear roadmap, strong business-led governance, an active education program, and frequent in-house meetings and conferences. (See Table 2.)

Ultimately, managing a federated environment is like walking a tightrope. You have to balance yourself to keep from swinging too far to the left or right and falling off the rope. To succeed, you have to make continuous adjustments, ensuring that everyone on the extended BI team is moving in the same direction. Without sufficient communication and encouragement, a federated approach courts catastrophe.

Organizational Models That Impact BI

"Architecture must conform with corporate culture and evolve with the prevailing winds."

In Part I of this series, we discussed the strengths and weaknesses of different architectural options, including federation. And we used Intuit as an example of an organization that evolved through decentralized and centralized architectural models until it settled on federation as a means to marry the best of both worlds.

It's important to recognize that the real problem that federation addresses is not architectural; it's organizational. There is natural tension between local and central organizing forces within a company. Every organization needs to give local groups some degree of autonomy to serve customers effectively. Yet it also needs a central, guiding force to achieve economies of scale (efficiency), present a unified face to the customer (effectiveness), and deliver enterprise views of corporate performance. BI, as an internal service, needs to reflect a company's organizational structure. But knowing where to draw the line architecturally is challenging, especially as a company's business and organizational strategy evolves.

Organizational Models

At a high level, there are three basic organizational models that define the relationship between corporate and divisional groups: Conglomerate, Cooperative, and Centralized. Although this article treats each as a unique entity, the models represent points along a spectrum of potential organizational structures. The key to each model's identity and defining characteristics, however, stems from a company's product strategy. (See table 3.)

Conglomerate Model. A conglomerate has a highly diversified product portfolio that spans multiple industries. Here, each division sells radically different products to very different customers. General Electric, for example, is a conglomerate: it sells everything from financial services to household appliances to wind turbines. A conglomerate typically gives business unit executives complete autonomy to run their operations as they see fit using their own money and people. Each business unit has its own BI team and technology and shares little, if any, data with other business units.

Table 3. Organizational Models
Table 3.jpg

Centralized Model. The centralized model falls at the other end of the organizational spectrum. Here, an organization sells one type of product to anyone, everywhere. As a result, corporate plays a strong hand in defining product and marketing strategies while business units primarily execute sales in their regions. In a centralized model, IT and BI are shared services that run centrally at corporate. There is one BI team and enterprise data warehouse that supports all business units and users.

Dell, for example, sells personal computers and servers on a global basis. Although its regional business units had significant autonomy to meet customer needs during the company's rapid expansion, Dell now has retrenched. Without dampening its entrepreneurial zeal, Dell has strengthened its centralized shared services, including BI, to streamline redundancies, reduce overhead costs, improve cross-selling opportunities, and deliver more timely, accurate, and comprehensive views of enterprise performance to top executives.

Cooperative Model. The Cooperative model--which is the most pervasive of the three--falls somewhere between the Conglomerate and Centralized models on the organizational spectrum. Here, business units sell similar, but distinct, products to a broad set of overlapping customers. Consequently, there are significant cross-selling opportunities, which require business units to share data and work cooperatively for the greater good of the organization. For example, Intuit, with its various financial software products, can increase revenues by cross-selling its various consumer and business products across divisions. Most financial services firms, which offer banking, insurance, and brokerage products, also employ a Cooperative model.

Although most divisions in a Cooperative model maintain BI teams, corporate headquarters also fields a sizable corporate BI group. Given the distribution of BI teams and technology across divisions, the Cooperative model by default requires a federated approach to BI. The challenge here is figuring out how to divide the BI stack between corporate and divisional units. There are many places to draw the line. Often, the division of responsibility is not clear cut and changes over time as a company evolves its business strategy to meet new challenges.

Harley-Davidson. For example, Harley-Davidson Motor Co., an iconic, U.S.-based manufacturer of motorcycles, decided to broaden its business strategy after the recent recession to focus more on global markets, e-commerce, and custom-made products. Consequently, the BI team created a new strategy to meet these emerging business requirements that emphasizes the need for greater agility and flexibility.

Currently, Harley-Davidson has well-defined, centralized processes for sourcing, validating and delivering data. Any application or project that requires data must be approved by the Integration Competency Center (ICC), which generally also performs the data integration work. The ICC's job is to ensure clean, consistent, integrated data throughout the company.

Currently, the ICC handles all business requests the same way, regardless of the nature or urgency of the request. In the future, to go as fast as the business wants, the ICC will break free from its "one size fits all" methodology and take a more customized approach to business requests. "We need to make sure our processes don't get in the way of the business," said Jim Keene, systems manager of global information services.

Dell. While Harley-Davidson is loosening up, Dell is tightening up. After experiencing rapid growth during the 1990s and 2000s through an entrepreneurial business model that gave regional business units considerable autonomy, the company now wants to achieve both fast growth and economies of scale by strengthening its central services, including BI. In the mid-2000s, Dell had hundreds of departmental systems, thousands of BI developers, and tens of thousands of reports that made it difficult to gain a global view of customers and sales.

As a result, Dell kicked off an Enterprise BI 2.0 initiative chartered to consolidate and integrate independent data marts and reports into a centralized data warehouse and data management framework. It also created a BI Competency Center that oversees centralized BI reporting teams that handle report design and access, metric definitions, and governance processes for different business units. By eliminating redundancies, the EBI program has delivered a 961% return on investment, improved data quality and consistency, and enhanced self-service BI capabilities.

While Harley-Davidson and Dell represent two ends of the spectrum--a centralized BI team that is rethinking its centralized processes and a decentralized BI environment that is consolidating BI activity and staff--most organizations are somewhere in the middle. Part III of this article will map our three organizational models to a BI stack and show the various ways that a BI team can allocate BI architectural elements between corporate and divisional entities.

Part III
The Dividing Line

"If it can't be my design, tell me where do we draw the line." Poets of the Fall

In Part II of this article, we discussed the impact of organizational models on BI development. In particular, we described Conglomerate, Cooperative, and Centralized models and showed how two companies (Harley-Davidson and Dell) migrated across this spectrum and how that migration affected their BI architectures and organizations.

As any veteran BI professional knows, BI architectures come in all shapes and sizes. There is no one way to build a data warehouse and BI environment. Many organizations try various approaches until they find one that works, and then they evolve that architecture to meet new business demands. Along the way, each BI team needs to allocate responsibility for various parts of a BI architecture between corporate and business unit teams. Finding the right place to draw the proverbial architectural line is challenging.

The BI Stack. Figure 3 shows a typical BI stack with master data flowing into a data warehouse along with source data via an ETL tool. Data architects create business rules that are manifested in a logical model for departmental marts and business objects within a BI tool. BI developers, who write code, and assemblers, who stitch together predefined information objects, create reports and dashboards for business users. Typically, most databases and servers that power operational and analytical systems run in a corporate data center.

Using this high-level architectural model, we can study the impact of the three organizational models described in Part II of this article on BI architectures.

Figure 3. Typical BI Stack
Figure 3.jpg

Conglomerate - Shared Data Center. In a Conglomerate model, business units have almost complete autonomy to design and manage their own operations. Consequently, business units also typically own the entire BI stack, including the data sources, which are operational systems unique to the business unit. Business units populate their own data warehouses and marts using their own ETL tools and business rules. They purchase their own BI tools, hire their own BI developers, and develop their own reports. The only thing that corporate manages is a data center which houses business unit machines and delivers economies of scale in data processing. (See figure 4.)

Figure 4. Conglomerate BI
Figure 4.jpg

Cooperative BI - Virtual Enterprise. In a Cooperative model where business units sell similar, but distinct, products, business units must work synergistically to optimize sales across an overlapping customer base. Here, there is a range of potential BI architectures based on an organization's starting point. In the Virtual Enterprise model, an organization starts to move from a Conglomerate business model to a more Centralized model to develop an integrated view of customers for cross-selling and upselling purposes and maintain a single face to customers who purchase products from multiple business units.

In a Virtual Enterprise, business units still control their own operational systems, data warehouses, data marts, and BI tools and employ their own BI staff. Corporate hasn't yet delivered enterprise ERP applications but is thinking about it. Its first step towards centralization is to identify and match mutual customers shared by its business units. To do this, the nascent corporate BI team develops a master data management (MDM) system, which generates a standard record or ID for each customer that business units can use in sales and service applications. (See figure 5.)

Figure 5. Cooperative BI - Virtual Enterprise
Figure 5.jpg
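The customer-matching step that the MDM system performs can be sketched in a few lines of Python. This is an illustration only. The match rule (case-folded email plus a letters-only name), the record fields, and the `CUST-` ID format are assumptions, and commercial MDM tools use far richer matching logic.

```python
import re
from typing import Dict, List, Tuple


def normalize(name: str, email: str) -> Tuple[str, str]:
    """Build a crude match key: case-folded email plus letters-only name."""
    clean_name = re.sub(r"[^a-z]", "", name.lower())
    return (email.strip().lower(), clean_name)


def assign_master_ids(records: List[Dict[str, str]]) -> Dict[Tuple[str, str], str]:
    """Map each (business unit, local ID) pair to a shared master customer ID."""
    master_by_key: Dict[Tuple[str, str], str] = {}
    crosswalk: Dict[Tuple[str, str], str] = {}
    for rec in records:
        key = normalize(rec["name"], rec["email"])
        # First time we see this customer: mint a new master ID (format assumed).
        if key not in master_by_key:
            master_by_key[key] = f"CUST-{len(master_by_key) + 1:06d}"
        crosswalk[(rec["unit"], rec["local_id"])] = master_by_key[key]
    return crosswalk
```

Each business unit keeps its own local customer IDs. The crosswalk simply tells sales and service applications which local records refer to the same person.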

The corporate group also creates a fledgling enterprise data warehouse to deliver an enterprise view of customers, products, and processes common across all business units. This enterprise data warehouse is really a data mart of distributed data warehouses. That is, it sources data from the business unit data warehouses, not directly from operational systems. This can be a persistent data store populated with an ETL tool or a set of virtual views populated on the fly using data virtualization software. A persistent store is ideal for non-volatile data (i.e., dimensions that don't change much) or when enterprise views require complex aggregations, transformations, multi-table joins, or large volumes of data. A virtual data store is ideal for delivering enterprise views quickly at low cost and building prototypes and short-lived applications.
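The difference between the persistent and virtual options can be seen in a toy example using Python's built-in `sqlite3` module. Two tables stand in for the business unit data warehouses, which is a simplification: in practice these would be separate systems federated by a data virtualization tool. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-ins for two business unit data warehouses (hypothetical schemas).
    CREATE TABLE bu_a_sales (customer_id TEXT, amount REAL);
    CREATE TABLE bu_b_sales (customer_id TEXT, amount REAL);
    INSERT INTO bu_a_sales VALUES ('CUST-1', 100.0), ('CUST-2', 50.0);
    INSERT INTO bu_b_sales VALUES ('CUST-1', 25.0);

    -- Virtual option: the enterprise view is computed on the fly.
    CREATE VIEW enterprise_sales_virtual AS
        SELECT customer_id, SUM(amount) AS total
        FROM (SELECT * FROM bu_a_sales UNION ALL SELECT * FROM bu_b_sales)
        GROUP BY customer_id;

    -- Persistent option: an ETL-style load materializes the same result.
    CREATE TABLE enterprise_sales_persisted AS
        SELECT * FROM enterprise_sales_virtual;
""")


def totals(table: str) -> dict:
    """Return {customer_id: total} for one of the two enterprise stores."""
    rows = conn.execute(f"SELECT customer_id, total FROM {table}").fetchall()
    return dict(rows)


# A new sale lands in business unit B after the load has run.
conn.execute("INSERT INTO bu_b_sales VALUES ('CUST-2', 10.0)")

print(totals("enterprise_sales_virtual"))    # reflects the new sale
print(totals("enterprise_sales_persisted"))  # stale until the next load
```

The virtual view picks up the new sale immediately, while the persisted table stays as it was at load time. That is the trade-off described above: the view is cheap to stand up and always current, while the persisted store pays for its load once and then answers heavy aggregations without touching the sources.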

Cooperative BI - Shared BI Platform. The next step along the spectrum is a Shared BI Platform. Here, corporate expands its appetite for data processing. It replaces business unit operational applications with enterprise resource planning (ERP) applications (e.g., finance, human resources, sales, service, marketing, manufacturing, etc.) to create a more uniform operating environment. It also fleshes out its BI environment, creating a bona fide enterprise data warehouse which pulls data directly from various source systems, including the new ERP applications, instead of departmental data warehouses. This reduces redundant extracts and ensures greater information consistency. (See figure 6.)

Figure 6. Cooperative BI - Shared Platform
Figure 6.jpg

Meanwhile, business units still generate localized data and require custom views of information. While they may still have budget and license to run their own BI environments, they increasingly recognize that they can save time and money by leveraging the corporate BI platform. To meet them halfway, the corporate BI team forms a BI Center of Excellence and teaches the business units how to use the corporate ETL tool to create virtual data marts inside the enterprise data warehouse. The business units can upload local data to these virtual marts, giving them both enterprise and local views of data without having to design, build, and maintain their own data management systems.

In addition, the corporate BI team builds a universal set of shared data objects (i.e., a semantic layer) that it makes available within the corporate BI tool. Although business units may still use their own BI tools, they increasingly recognize the value of building new reports with the corporate BI tools because they provide access to the standard business objects and definitions, which they are required or highly encouraged to use. The business units still hire and manage their own BI developers and assemblers, but they now have dotted line responsibility to the corporate BI team and are part of the BI Center of Excellence. This is the architectural approach used by Intuit. (See Part I.)
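As a rough illustration of what a semantic layer of shared business objects buys you, the Python sketch below binds each business term to one agreed definition and expands it into SQL on request. The metric names, expressions, and the `edw.sales` table are invented for the example and do not describe Intuit's or any vendor's implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Metric:
    """One shared business object: a name bound to a single agreed definition."""
    name: str
    expression: str  # SQL expression over the physical table
    table: str


# Hypothetical corporate definitions; table and column names are illustrative.
SEMANTIC_LAYER: Dict[str, Metric] = {
    "net_revenue": Metric("net_revenue", "SUM(gross_amount - discount)", "edw.sales"),
    "active_customers": Metric("active_customers", "COUNT(DISTINCT customer_id)", "edw.sales"),
}


def build_query(metric_name: str, group_by: str) -> str:
    """Expand a business term into SQL so every report uses the same definition."""
    metric = SEMANTIC_LAYER[metric_name]
    return (
        f"SELECT {group_by}, {metric.expression} AS {metric.name} "
        f"FROM {metric.table} GROUP BY {group_by}"
    )
```

A divisional developer who asks for `build_query("net_revenue", "division")` gets the corporate definition of net revenue without having to know, or re-invent, how it is calculated.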

Centralized BI - Shared Service. Some organizations, like Dell, don't stop with a Shared Platform model; they continue to centralize BI operations to improve information integrity and consistency and squeeze all redundancies and costs out of the BI pipeline. Here, the corporate BI team manages the entire BI stack and creates tailored reports for each business unit based on requirements. For example, Dell's EBI 2.0 program (see Part II) cut the number of report developers in half and reassigned them to centralized reporting teams under the direction of a BI Competency Center where they develop custom reports for specific business units. The challenge here, as Harley-Davidson discovered, is to keep the corporate BI bureaucracy from getting too large and lumbering and ensure it remains responsive to business unit requirements. This is a tall task, especially in a fast-moving company whose business model, products, and customers change rapidly.


Federation is the most pervasive BI architectural model, largely because most organizations cycle between centralized and decentralized organizational models. A BI architecture, by default, needs to mirror organizational structures to work effectively. Contrary to popular opinion, a BI architecture is a dynamic environment, not a blueprint written in stone. BI managers must define an architecture based on prevailing corporate strategies and then be ready to deviate from the plan when the business changes due to an unanticipated circumstance, such as a merger, acquisition, or new CEO.

Federation also does the best job of balancing the twin needs for enterprise standards and local control. It provides enough uniform data and systems to keep the BI environment from splintering into a thousand pieces, preserving an enterprise view critical to top executives. But it also gives business units enough autonomy to deploy applications they need without delay or IT intervention. Along the way, it minimizes BI overhead and redundancy, saving costs through economies of scale.

Posted September 16, 2011 12:03 PM