Blog: Wayne Eckerson

Welcome to Wayne's World, my blog that illuminates the latest thinking about how to deliver insights from business data and celebrates out-of-the-box thinkers and doers in the business intelligence (BI), performance management, and data warehousing (DW) fields. Tune in here if you want to keep abreast of the latest trends, techniques, and technologies in this dynamic industry.

About the author

Wayne has been a thought leader in the business intelligence field since the early 1990s. He has conducted numerous research studies and is a noted speaker, blogger, and consultant. He is the author of two widely read books: Performance Dashboards: Measuring, Monitoring, and Managing Your Business (2005, 2010) and The Secrets of Analytical Leaders: Insights from Information Insiders (2012).

Wayne is currently director of BI Leadership Research, an education and research service run by TechTarget that provides objective, vendor neutral content to business intelligence (BI) professionals worldwide. Wayne’s consulting company, BI Leader Consulting, provides strategic planning, architectural reviews, internal workshops, and long-term mentoring to both user and vendor organizations. For many years, Wayne served as director of education and research at The Data Warehousing Institute (TDWI) where he oversaw the company’s content and training programs and chaired its BI Executive Summit. He can be reached by email at weckerson@techtarget.com.

January 2011 Archives

Ever walk through mud up to your ankles? It's dirty and slow going.

And that sums up the experience of many BI teams these days. They are mired in organizational and technical mud; they find it difficult to capture the interest of the business community and gain momentum to deliver real value to the business. "We need a spark," one BI manager confided in me recently.

Typically, ERP and other IT projects divert resources from the BI team, which can't get ahead of its queue of major BI projects. Casual users are slow to adopt self-service BI tools, while power users continue to create spreadmarts at a furious pace. Many users continue to submit requests for highly personalized versions of existing reports, sucking up the BI team's spare time. Meanwhile, the BI team struggles to find time and money to upgrade its data warehouse and BI platforms and deliver the functionality and performance that users demand. As a result, the BI team gets a reputation for being slow, unresponsive, and expensive.

For every step forward, it feels like you're taking two backwards. And most of your day is spent fighting fires or defending your processes and people.

You're Not Alone! If this sounds familiar, you're not alone. According to TDWI's BI Maturity Model assessment, almost two-thirds of BI teams (64%) are in the Teenager phase of their BI journey, teetering on the precipice of the Chasm, which separates BI adolescence from BI adulthood. There are a lot of growing pains involved in achieving BI adulthood, and there are rarely shortcuts. Often, the only recourse is patience and persistence: keep moving ahead and eventually you will succeed. Trust me.

Sponsorship Issues

Why? In the long term, nothing stays the same. If you have a competent BI team and effective processes and you're still mired in the mud, then in all likelihood you have a sponsorship problem. (See "Humpty Dumpty and the CEO.") Business executives who don't value fact-based decision making--or business intelligence--don't last long. If they don't understand that lasting competitive advantage comes from combining human intelligence and business intelligence, they are managing blind. Eventually, the marketplace will catch up with them and knock them out of the game. (See "What Every Executive Should Know about Business Intelligence.")

So, if you wait long enough, the marketplace will get you a new set of executives who hopefully value fact-based decision making and look to your team to drive new strategic initiatives.

In the meantime, as you struggle with unenlightened sponsors, your only recourse is to evangelize BI relentlessly. But remember words are cheap, and no one really listens much these days. A picture, on the other hand, is worth a thousand words.

Find a Quick Win. What you need is a quick win to galvanize attention and help executives see the possibilities of BI. (See "The Art of the Quick Win".) Find an executive with a burning need for information and then volunteer to build something fast that relieves their pain. Be ready to work overtime for a few months and be ready to break your own BI team's "ironclad" rules regarding architecture, development, and tools. Plead with a vendor to allocate resources free of charge to get an application up and running. Or use open source tools and the cloud to deliver fast with little upfront cost.

Bottom line: show what BI can do! If you capture the imagination of the business, you'll get funding to put the application on a sturdier IT foundation. (It's always better to meet the business needs first and fix the architecture later.)

Of course, there is a chance that your quick win fails and your reputation suffers. If that happens, do a post-mortem to figure out what happened. Talk to consultants or experts to get their perspective. (Let's chat!) Learn from your mistakes. And then move on.

If All Else Fails... I mean that literally. Find a new job. The industry is always looking for good, talented BI professionals. There is no reason to suffer in a dead-end environment forever. Consider doing a stint at a consultancy to broaden your horizons and skill sets. The more BI projects you work on and the more BI teams you work with, the more you know and the more you can contribute to future BI projects.

Change is good even if the only thing you can change is yourself!


Posted January 28, 2011 6:39 AM

When I talk to audiences about performance dashboards, I'm invariably asked how to get executives and analysts to abandon eye-bending spreadsheets packed with hundreds of data points per page in favor of more visual displays populated with charts and graphics.

This is a complex question because the answer involves sorting out and addressing two intertwined issues: 1) people's inherent resistance to change and 2) the quality of the visual design. In the end, users will adopt a visual display of quantitative information if it adds informational value, not because of its visual appeal.

Change Management

No Surprises. As I argued in a recent blog on change management, change disrupts people's rhythm and makes them less productive in the short run. Some will become anxious and lash out against the change even when it is in their long-term best interests.

Any time you change the way you present data to users, you force them to work harder--at least in the short term. They need to spend additional time learning the new layout, double checking numbers, experimenting with new functions and filters, and so on. Most people resist this short-term hit to their productivity. To avoid a mutiny, you need to prepare users for the change way ahead of time. If they know what's coming, they can budget their time and resources accordingly. No one likes a surprise!

Classes of Users. You also need to understand the classes of business users affected by the change and how each might react. Executives will react differently than managers, who will react differently than analysts and operational workers. You need to develop a change management strategy that addresses the concerns and issues of each group and provides the appropriate levels of training and support.

For example, there will always be a small group of users who will resist change at all costs. These folks need high-touch training and support. That means offering one-on-one training (especially if they are executives). It also means duplicating the old environment inside the new one. If they are used to viewing spreadsheets, you need to show them how to use the new system to export a spreadsheet to their desktop that contains the same data and layouts as the old environment. Gradually, you can wean them off the older views by showing them how the new environment can make them more productive. But this takes a lot of patience and hand-holding.

Prototyping. It's also important to manage expectations and get users to buy into the new environment. The best way to do that is to solicit user feedback on an initial prototype. For example, at 1-800 Contacts, an online provider of contact lenses that I profiled in the second edition of my book, Performance Dashboards: Measuring, Monitoring, and Managing Your Business, the BI team built an executive dashboard to monitor sales and orders every 15 minutes. It discovered that some executives preferred to see data as line charts, while others wanted gauges and others preferred tables. So the team displayed all three graph types on the one-page dashboard to address every executive's visual preferences. This simple dashboard is very effective in helping executives keep their fingers on the pulse of the business.

Be careful during prototyping not to abdicate responsibility for the design of the environment. While it's important to get user feedback, it's critical that you let designers skilled in the visual display of quantitative data establish the look and feel of the dashboard--the fonts, colors, layouts, and navigational cues. Once you've established the framework, then ask users to comment on the value of data and metrics displayed, the navigational flow of the environment, and its ease of use.

Visual Design Techniques

The second major impediment to adoption is that many visual displays of quantitative information are suboptimally designed. These displays make it harder, not easier, for users to glean the meaning of the depicted data: users must spend more time examining the display to spot problems, trends, and outliers and to make relevant comparisons.

Stephen Few, a renowned visualization expert, cites many visual design faux pas in his books and articles. These range from the overuse of color to 3-D chart types and poorly labeled graphs. His maxim, "Make every pixel count," is a wise one. One key issue that is often overlooked, however, is the need to tailor and evolve the density of visual displays.

Sparsity versus Density. Sparse displays contain fewer information objects than dense visual displays. As a result, sparse displays are easier to use because users can absorb relevant information quickly. However, sparse displays risk containing too little information. If users have to click several times to view information that they could previously view on a single page, they will get frustrated and stop using the tool.

On the other hand, dense displays can overwhelm or intimidate users with too much information at once. Users may feel reluctant to learn the new environment and revert to prior methods of consuming information.

When deciding how many objects to put on a single screen, it's important to remember that each class of users (and individuals within each class) will fall on a different point in the sparsity-to-density spectrum. Tailoring displays to these preferences can significantly aid adoption. It's also important to recognize that user preferences change over time. A simple, sparse screen may suffice when users first begin using a new display, but as they become more familiar with the layout, the data, and functionality of the new environment, they will demand denser displays. As a result, you will need to continuously evolve your visual displays to keep pace with the visual IQ of your users.

Summary

The bottom line: users want information and answers, not pretty pictures. They will reject graphical displays that obscure the meaning of the data and make them work harder, no matter how visually appealing the interface. Conversely, they will overcome their natural resistance to change and adopt visual tools that make it easier than their existing methods to spot trends, problems, and outliers.

If you focus on the data, not the pictures, and manage change wisely, you will succeed in converting your users to new visual methods for viewing quantitative information.


Posted January 20, 2011 1:37 PM

Self-service BI is the holy grail of business intelligence practitioners. It promises to empower business users to create their own reports without relying on the IT department, which may take days, weeks, or even months to fulfill user requests. With self-service BI, everybody wins: business users get the information they need, when they want it, how they want it; the IT department reduces its backlog of requests and can focus on more value-added projects.

The Dark Side of Self-Service BI

Or so the theory goes. In my experience, self-service BI tools often exacerbate the problem they're designed to solve. On one hand, most business users find the tools too difficult to use and revert to asking the IT department for custom reports, or worse yet, do without solid information when making decisions, relying largely on gut feel. On the other, a few tech-savvy users overuse the tools, creating report chaos, or worse yet, use the tools to submit runaway queries that bog down the performance of the data warehouse. This further alienates the majority of users, who find the BI environment less friendly than before the advent of self-service BI.

Know Your Users

Before purchasing BI tools, companies need to take an inventory of business users and classify them by their styles of interacting with information for decision making. In general, there are two classes of users: casual users and power users. (See Table 1.)

[Table 1. Two classes of BI users: casual users and power users]

Casual Users. Casual users use information to do their jobs. They are executives, managers, field and operations staff, and even customers and suppliers. Eighty percent of the time, casual users simply want to consume information created by power users; they are passive recipients of information who use data to support their daily tasks and decisions. Ideally, casual users consume information that is "tailored" to their roles, usually in the form of a report or dashboard whose data is sourced from a data warehouse or data mart.

Power Users. Power users, on the other hand, are business users for whom information is their job. They are super users, business analysts (a.k.a. Excel jockeys), analytical modelers (i.e., SAS and SPSS developers), and IT professionals. Power users spend 100% of their time producing information, either for themselves or for others. They usually access data in an ad hoc, exploratory fashion using powerful analytical and authoring tools that source data from the data warehouse and other systems, both inside and outside the organization.

The Missing 20 Percent

Given this dichotomy, it's fair to say that casual and power users represent two distinct user populations that require different sets of tools, and even different architectures, to support their information habits. However, reality is not quite that simple, because 20% of the time casual users need information that is not found in the standard, tailored content delivered to them. In these situations, they need to become "producers." Unfortunately, casual users by definition don't have the skills to produce information, nor do they want to take the time to learn producer skills.

The Role of Super Users. Many organizations rely on "super users" to plug this information gap. Super users are business users in each department who gravitate to information technology and quickly become the "go to" people that colleagues rely upon to create custom views and reports. Super users are absolutely critical to the success of any self-service BI initiative. The smartest BI directors find out who the super users are in each department and make them an extension of their BI teams and Competency Centers: they provide first-level support to the super users and recruit them to serve on the BI working committee to guide BI development, select products, and shape the roadmap.

BI Tool Capabilities

Hierarchies of Consumer and Producer Functionality. At the same time, the current generation of BI tools is moving the needle on what various classes of business users can do for themselves. Table 2 shows a hierarchy of functionality available to casual and power users that some BI tools now support.

[Table 2. Hierarchy of consumer and producer functionality in BI tools]

An executive, for example, may only want to view static reports and dashboards or perhaps print them out. But over time, the executive may want to navigate a dashboard by drilling down predefined paths or applying predefined filters. A manager may want to view and navigate data as well as modify it (e.g., add or delete columns, rank and sort data, change chart types) and perhaps even explore data dimensionally via a visualization or search module.

On the power user side, a super user may first produce reports by assembling them from a library of reporting gadgets, which consist of tables, charts, and filters gleaned from existing reports and dashboards. At some point, they may graduate to crafting reports from data objects embedded in a semantic layer, or perhaps even source data and then model and transform it into data objects of their own. Overall, BI tools are making it easier for casual users to become more analytical and power users to develop more sophisticated reports without IT assistance.

Finding the Comfort Zone

It's important to recognize that no user wants or needs all the functionality listed in Table 2, except perhaps the most sophisticated power users. Every user eventually settles into a comfort zone of functionality that they need and use. Over time, users may evolve their skill sets as they become more familiar with the tool and the data environment.

Thus, self-service BI tools shouldn't overwhelm business users with too much functionality at once; they should expose functionality only when users need it and are ready to use it. Much of this tailored delivery can be administered through role-based authentication. But the best BI tools build dynamic displays that put additional functionality at users' fingertips so it's available as soon as they want it; otherwise, they might never know to ask for it.
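
To make the idea concrete, here is a minimal sketch of role-based, progressively unlocked BI functionality. It is plain Java, and the role names, feature tiers, and the unlock trigger are hypothetical illustrations of the technique, not the behavior of any particular BI product:

```java
import java.util.EnumSet;
import java.util.Map;

// Hypothetical sketch: progressive, role-based exposure of BI functionality.
// Feature tiers loosely follow the consumer/producer hierarchy discussed above.
public class FeatureGate {

    enum Feature { VIEW, NAVIGATE, MODIFY, EXPLORE, ASSEMBLE, AUTHOR, MODEL }

    // Baseline grants by role (illustrative values only).
    private static final Map<String, EnumSet<Feature>> BASELINE = Map.of(
        "executive",  EnumSet.of(Feature.VIEW, Feature.NAVIGATE),
        "manager",    EnumSet.of(Feature.VIEW, Feature.NAVIGATE, Feature.MODIFY),
        "super-user", EnumSet.of(Feature.VIEW, Feature.NAVIGATE, Feature.MODIFY,
                                 Feature.EXPLORE, Feature.ASSEMBLE)
    );

    private final EnumSet<Feature> granted;

    FeatureGate(String role) {
        this.granted = EnumSet.copyOf(BASELINE.getOrDefault(role, EnumSet.of(Feature.VIEW)));
    }

    boolean canUse(Feature feature) {
        return granted.contains(feature);
    }

    // Dynamic display: once usage signals readiness, unlock the next tier
    // so the capability is at the user's fingertips when they want it.
    void unlockNextTier() {
        for (Feature f : Feature.values()) {
            if (!granted.contains(f)) { granted.add(f); return; }
        }
    }

    public static void main(String[] args) {
        FeatureGate exec = new FeatureGate("executive");
        System.out.println("Can modify? " + exec.canUse(Feature.MODIFY)); // false
        exec.unlockNextTier(); // e.g., triggered after repeated drill-downs
        System.out.println("Can modify? " + exec.canUse(Feature.MODIFY)); // true
    }
}
```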

Summary. The key to self-service BI is threefold: 1) know your users and their BI functionality "comfort zones"; 2) create a network of super users to address the 20% of casual user requirements for ad hoc information that can't be met with standard content; and 3) purchase BI tools that expose BI functionality on demand.


Posted January 13, 2011 2:40 PM

As companies grapple with the gargantuan task of processing and analyzing "big data," certain technologies have captured the industry limelight: massively parallel processing (MPP) databases, such as those from Aster Data and Greenplum; data warehousing appliances, such as those from Teradata, Netezza, and Oracle; and, most recently, Hadoop, an open source framework that pairs a distributed file system with the MapReduce programming model to process key-value data in parallel across large numbers of commodity servers.

SMP Machines. Missing in action from this list is the venerable symmetric multiprocessing (SMP) machine, which parallelizes operations across multiple CPUs (or cores). The industry today seems to favor "scale out" parallel processing approaches (where processes run across many commodity servers) rather than "scale up" approaches (where processes run on a single server). However, with the advent of multi-core servers that can now pack upwards of 48 cores into a single box, the traditional SMP approach is worth a second look for processing big data analytics jobs.

The benefits of applying parallel processing within a single server rather than across multiple servers are obvious: reduced processing complexity and a smaller server footprint. Why buy 40 servers when one will do? MPP systems require more boxes, which require more space, cooling, and electricity. Also, distributing data across multiple nodes chews up valuable processing time, and overcoming node failures--which are more common when you string together dozens, hundreds, or even thousands of servers into a single, coordinated system--adds overhead that reduces performance.

Multi-Core CPUs. Moreover, since chipmakers maxed out the processing frequency of individual CPUs in 2004, the only way they can deliver improved performance is by packing more cores into a single chip. Chipmakers started with two-core chips, then quad-cores, and now eight- and 16-core chips are becoming commonplace.

Unfortunately, few software programs that could benefit from parallelized operations have been redesigned to exploit the tremendous amount of processing power and memory available within multi-core servers. Big data analytics applications are especially good candidates for thread-level parallel processing. As developers recognize the untold power lurking within their commodity servers, I suspect that next year SMP processing will gain an equivalent share of attention among big data analytics proselytizers.
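
To illustrate the kind of thread-level parallelism being left on the table, here is a minimal sketch in plain Java (not tied to any vendor product): a single-server, scale-up aggregation over simulated fact-table values that the standard parallel streams machinery spreads across every available core. The data size and the sales metaphor are made up for illustration:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.LongStream;

// Illustrative only: a scale-up aggregation that uses every core on one server.
public class ScaleUpAggregate {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Running on " + cores + " cores");

        // Simulate 20 million sales amounts (in cents) held on a single machine.
        long[] sales = ThreadLocalRandom.current().longs(20_000_000, 0, 10_000).toArray();

        long start = System.nanoTime();
        // parallel() splits the work across a thread pool sized to the core count.
        long total = LongStream.of(sales).parallel().sum();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.printf("Total: %d cents in %d ms%n", total, elapsedMs);
    }
}
```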

Pervasive DataRush

One company at the forefront of exploiting multi-core chips for analytics is Pervasive Software, a $50 million software company best known for its Pervasive Integration ETL software (which it acquired from Data Junction) and Pervasive PSQL, its embedded database (a.k.a. Btrieve).

In 2009, Pervasive released a new product called Pervasive DataRush, a parallel dataflow platform designed to accelerate data preparation and analytics tasks. It fully leverages the parallel processing capabilities of multi-core processors and SMP machines, making it unnecessary to implement clusters (or MPP grids) to achieve suitable performance when processing and analyzing moderate to heavy volumes of data.

Sweet Spot. As a parallel dataflow engine, Pervasive DataRush is often used today to power batch processing jobs, and it is particularly well suited to data preparation tasks (e.g., sorting, deduplicating, aggregating, cleansing, joining, loading, validating) and machine learning programs, such as fuzzy matching algorithms.
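
For a feel of the data preparation work being parallelized here, the sketch below deduplicates and sorts a batch of simulated customer keys using only standard Java multi-core primitives; it illustrates the general technique, not the DataRush API:

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: parallel data preparation (dedupe + sort) on one multi-core server.
public class ParallelDataPrep {
    public static void main(String[] args) {
        // Simulate a batch of 10 million customer keys that contains duplicates.
        long[] keys = ThreadLocalRandom.current().longs(10_000_000, 0, 2_000_000).toArray();

        // Deduplicate in parallel, then sort the survivors across all cores.
        long[] distinctKeys = Arrays.stream(keys).parallel().distinct().toArray();
        Arrays.parallelSort(distinctKeys);

        System.out.printf("Reduced %d rows to %d distinct, sorted keys%n",
                keys.length, distinctKeys.length);
    }
}
```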

Today, DataRush will outperform Hadoop on complex processing jobs that address data volumes ranging from 500GB to tens of terabytes. It is not yet geared to handling hundreds of terabytes to petabytes of data, which is the territory of MPP systems and Hadoop. However, as chipmakers continue to add cores to their chips, and when Pervasive releases DataRush 5.0 later this year with support for small clusters, DataRush's high-end scalability will continue to increase.

Architecture. DataRush is not a database; it's a development environment and execution engine that runs in a Java Virtual Machine. Its Eclipse-based development environment provides a library of parallel operators that developers use to create parallel dataflow programs. Although developers need to understand the basics of parallel operations--such as when it makes sense to partition data and/or processes based on the nature of their application--DataRush handles all the underlying details of managing threads and processes across one or more cores to maximize utilization and performance. As you add cores, DataRush automatically readjusts the underlying parallelism without forcing the developer to recompile the application.
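
For readers unfamiliar with the dataflow style, here is a rough sketch of the concept in plain java.util.concurrent code (emphatically not the DataRush API): a reader operator feeds a partitioned transform operator over a queue, and the transform's degree of parallelism is simply sized to however many cores the JVM reports at run time, so the same program uses more threads on a bigger box without recompiling:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.LongAdder;

// Illustrative only: a tiny dataflow with a reader operator feeding a partitioned
// "transform" operator whose parallelism is sized to the available cores.
public class MiniDataflow {
    private static final long POISON = Long.MIN_VALUE; // end-of-stream marker

    public static void main(String[] args) throws Exception {
        int parallelism = Runtime.getRuntime().availableProcessors();
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(10_000);
        LongAdder accepted = new LongAdder();

        ExecutorService pool = Executors.newFixedThreadPool(parallelism + 1);

        // Reader operator: produces 1 million synthetic records.
        pool.submit(() -> {
            try {
                for (long i = 0; i < 1_000_000; i++) queue.put(i);
                for (int i = 0; i < parallelism; i++) queue.put(POISON); // one marker per worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Partitioned filter/transform operator: one worker thread per core.
        for (int w = 0; w < parallelism; w++) {
            pool.submit(() -> {
                try {
                    for (long rec = queue.take(); rec != POISON; rec = queue.take()) {
                        if (rec % 7 == 0) accepted.increment(); // stand-in for real work
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.printf("%d workers accepted %d records%n", parallelism, accepted.sum());
    }
}
```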

Versus Hadoop. To run DataRush, you feed the execution engine formatted flat files or database records and it executes the various steps in the dataflow and spits out a data set. As such, it's more flexible than Hadoop, which requires data to be structured as key-value pairs and partitioned across servers, and MapReduce, which forces developers to use one type of programming model for executing programs. DataRush also doesn't have the overhead of Hadoop, which requires each data element to be duplicated in multiple nodes for failover purposes and requires lots of processing to support data movement and exchange across nodes. But like Hadoop, it's focused on running predefined programs in batch jobs, not ad hoc queries.

Competitors. Perhaps the closest competitors to Pervasive DataRush are Ab Initio, a parallelizable ETL tool, and Syncsort, a high-speed sorting engine. But these tools were developed before the advent of multi-core processing and don't exploit it to the same degree as DataRush. Plus, DataRush is not focused solely on back-end processing; its dataflow development environment and engine are generic, so it can handle front-end analytic processing as well. DataRush actually makes a good complement to MPP databases, which often suffer from a data loading bottleneck: when used as a transformation and loading engine, DataRush can achieve 2TB/hour throughput, according to company officials.

Despite all the current hype about MPP and scale-out architectures, it could be that scale-up architectures that fully exploit multi-core chips and SMP machines will win the race for mainstream analytics computing. Although you can't apply DataRush to existing analytic applications (you have to rewrite them), it will make a lot of sense to employ it for most new big data analytics applications.


Posted January 4, 2011 11:17 AM