Blog: Wayne Eckerson

Wayne Eckerson

Welcome to Wayne's World, my blog that illuminates the latest thinking about how to deliver insights from business data and celebrates out-of-the-box thinkers and doers in the business intelligence (BI), performance management and data warehousing (DW) fields. Tune in here if you want to keep abreast of the latest trends, techniques, and technologies in this dynamic industry.

About the author

Wayne has been a thought leader in the business intelligence field since the early 1990s. He has conducted numerous research studies and is a noted speaker, blogger, and consultant. He is the author of two widely read books: Performance Dashboards: Measuring, Monitoring, and Managing Your Business (2005, 2010) and The Secrets of Analytical Leaders: Insights from Information Insiders (2012).

Wayne is currently director of BI Leadership Research, an education and research service run by TechTarget that provides objective, vendor-neutral content to business intelligence (BI) professionals worldwide. Wayne’s consulting company, BI Leader Consulting, provides strategic planning, architectural reviews, internal workshops, and long-term mentoring to both user and vendor organizations. For many years, Wayne served as director of education and research at The Data Warehousing Institute (TDWI), where he oversaw the company’s content and training programs and chaired its BI Executive Summit. He can be reached by email at weckerson@techtarget.com.

Every year at the end of winter, I spend a day and a half with the good folks from Informatica who provide the analyst community with an up-close-and-personal look at the company's strategy and current and future product portfolio. Here are some of the highlights from this year's event:

1. Accommodations. Thanks to the herculean effort of Peggy O'Neill and her team, we were once again treated to royal accommodations at the Rosewood Hotel in Menlo Park, California, and a thoughtful schedule of briefings on day one, followed by a half-day of all-important one-on-one meetings with top company executives on day two. Kudos again!

2. Strategy. Both CEO Sohaib Abbasi and Marge Breya, the company's newly minted chief marketing officer, emphasized that despite a down year financially, Informatica aims to move from a category leader (i.e., in the data integration market) to an industry leader (i.e., in the big data analytics market). With that goal in mind, Sohaib hinted that Informatica will evolve from a best-of-breed player into an all-in-one player. This makes more sense than it did several years ago, since most large software vendors now offer a complete BI/analytics/big data stack and the overall trend in the computer industry is to provide complete solutions (e.g., integrated hardware, software, and applications). So, might Informatica purchase a database vendor or BI vendor? Or plunge into business applications? There are certainly many candidate companies that would complement Informatica's data integration portfolio. I wouldn't be surprised to see a blockbuster deal within two years.

3. Data Governance. I was particularly captivated by Rob Karel's strategy for data governance. An erstwhile Forrester analyst, Rob has created a wonderful online tool called the Data Governance Maturity Assessment, which consists of 22 questions that help companies understand their readiness to implement a data governance program. (You can take the assessment at www.governyourdata.com.) To help companies identify the best initiative with which to begin their data governance journey, he has created a Business Opportunity Assessment, also available at the same site. These are wonderful tools that can help companies not just talk about treating data as a corporate asset, but do something about it.

4. Data Integration Hub. This is an interesting solution that fits between messaging middleware and ETL software. Messaging software (e.g., an enterprise service bus) uses a publish/subscribe mechanism to move events bidirectionally among applications in near real-time. In contrast, ETL tools move large files unidirectionally between a source and a target system in a batch operation. If the target isn't available, the ETL job can't finish. The Data Integration Hub, which combines Informatica's PowerCenter and B2B products, uses a publish/subscribe mechanism to move both events and files bidirectionally in either real-time or batch. Moreover, unlike messaging middleware, the Hub can apply complex transformations to the data it moves, thanks to its PowerCenter engine. One compelling use for the Hub, which acts like an enterprise ODS, is to replicate data among multiple data warehouses, data marts, and applications.
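To make the publish/subscribe pattern concrete, here is a minimal, in-memory Python sketch. It is not Informatica's API: the `ToyHub` class, its methods, and the topic and field names are hypothetical. Publishers send records to a named topic without knowing who consumes them, every subscriber receives its own copy, and a subscriber may attach a transformation, loosely standing in for the transformations the Hub applies with its PowerCenter engine. The same `publish` call handles a single event or a batch.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Record = dict
Transform = Callable[[Record], Record]
Handler = Callable[[Record], None]


def identity(record: Record) -> Record:
    return record


class ToyHub:
    """Hypothetical in-memory publish/subscribe hub (not Informatica's API)."""

    def __init__(self) -> None:
        # Topic name -> list of (transform, handler) pairs registered by subscribers.
        self._subscribers: DefaultDict[str, List[Tuple[Transform, Handler]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler, transform: Transform = identity) -> None:
        self._subscribers[topic].append((transform, handler))

    def publish(self, topic: str, records: List[Record]) -> int:
        """Deliver one event (a one-item list) or a batch to every subscriber; return deliveries."""
        delivered = 0
        for transform, handler in self._subscribers[topic]:
            for record in records:
                # Copy each record so one subscriber's transform cannot alter another's data.
                handler(transform(dict(record)))
                delivered += 1
        return delivered


warehouse: List[Record] = []
mart: List[Record] = []
hub = ToyHub()
hub.subscribe("orders", warehouse.append)
hub.subscribe("orders", mart.append,
              transform=lambda r: {**r, "amount_usd": r["amount_cents"] / 100})

hub.publish("orders", [{"id": 1, "amount_cents": 1250}])  # a single real-time event
sent = hub.publish("orders", [{"id": 2, "amount_cents": 500},
                              {"id": 3, "amount_cents": 75}])  # a small batch "file"
```

A production hub would also persist messages so that a subscriber that is temporarily offline can catch up later; this sketch delivers synchronously and omits that.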

5. Data Masking. I was also intrigued by Informatica's data masking and archiving technology. I've overlooked this class of products until now because it is geared to database administrators. But in talking with Adam Wilson, general manager of Informatica's Information Lifecycle Management business unit, I learned that these products can save companies a bundle of money as well as improve compliance and reduce the risk of data breaches. Adam's group performs one-day Business Value Assessments that help companies identify dormant data that they can delete from their systems to improve performance and reduce storage costs. His team also performs Risk Assessments that identify and profile sensitive data and duplicate data so companies can better protect their data assets. Adam says these assessments usually uncover sizable ROI opportunities.

6. Operational Analytics. Ash Parikh runs marketing for Informatica's emerging technologies group, including many tools designed to support operational analytics, such as complex event processing, ultra messaging, and data virtualization. It seems that after many years, products that support operational analytics are about to go mainstream, as many more companies seek to reap the benefits of bringing insights to operational decisions, which in some cases can be automated with these new technologies.

Informatica will soon reach a billion dollars in annual revenues. This is a far cry from the scrappy startup of 20 years ago that offered an engine-based approach to building data marts and data warehouses. But with some bold moves on the product side, supplemented by glossy marketing orchestrated by Marge Breya, Informatica is poised to join the ranks of elite software vendors.


Posted March 6, 2013 6:25 AM

After attending several big data conferences, I had to ask myself, "What's really new here?" After all, as a data warehousing practitioner, I've been doing "big data" for some 20 years. Sure, the scale and scope of the solutions have expanded along with the types of data that are processed. But much of what people are discussing seems a rehash of what we've already figured out.

After some deliberation, I came to the conclusion that there are six unique things about the current generation of "big data," which has become synonymous with Hadoop. Here they are:

  1. Unstructured data. Truth be told, the data warehousing community never had a good solution for processing unstructured and semi-structured data. Sure, we had workarounds, like storing this data as binary large objects or pointing to data in file systems. But we couldn't really query this data with SQL and combine it with our other data (although Oracle and IBM have pretty good extenders that do just this). But now with Hadoop, we have a low-cost solution for storing and processing large volumes of unstructured and semi-structured data. Hadoop has quickly become an industry "standard" for dealing with this type of data. Now we just have to standardize the interfaces for integrating unstructured data in Hadoop with structured data in data warehouses.
  2. HDFS. The novel element of Hadoop (at least to SQL proponents) is that it's not based on a relational database. Rather, under the covers, Hadoop runs on a distributed file system into which you can dump any data without having to structure or model it first. The Hadoop Distributed File System, or HDFS, runs on low-cost commodity servers, which it assumes will fail regularly. To ensure reliability in such an unreliable environment, Hadoop automatically shifts work to an alternate server if one server fails. To make this possible, HDFS replicates each block of data (three times by default; the replication factor is configurable) and places the copies on different servers and racks. So with HDFS, your big data is three times bigger than your raw data. But this data expansion helps ensure high availability in a low-cost processing environment based on commodity servers.
  3. Schema at Read. Because Hadoop runs on a file system, you don't have to model and structure the data before loading it as you would with a relational database. Consequently, the cost of loading data into Hadoop is much lower than the cost of loading data into a relational database. However, if you don't structure the data up front at load time, you have to structure it at query time. This is what "schema at read" (often called "schema on read") means: whoever queries the data has to know its structure to write a coherent query. In practice, this means that often only the people who load the data know how to query it. This will change once Hadoop gets some reasonable metadata, but right now, issuing queries is a buyer-beware environment.
  4. MapReduce. Hadoop is a parallel processing environment, like most high-end, SQL-based analytical platforms. Hadoop spreads data across all its nodes, each of which has direct-attached storage. But writing parallel applications is complex. MapReduce is an API that shields developers from having to know the intricacies of writing parallel applications on a distributed file system. It takes care of all the underlying inter-nodal communications, error checking, and so on. All developers need to know is which elements of their application can be parallelized.
  5. Open source. Hadoop is free; you can download it from the Apache Software Foundation and start building with it. For a big data platform, this is a radical proposition, especially since most commercial big data software easily carries a six- to seven-digit price tag. Hadoop grew out of technologies that Google developed as a cost-effective way to build its Web search indexes; Google then published papers describing its file system and MapReduce so others could benefit from its innovations. Google could have used relational databases to build its search indexes, but the cost of doing so would have been astronomical, and it would not have been the most elegant way to process Web data, which is not classically structured.
  6. Data scientist. You need data scientists to extract value from Hadoop. From what I can tell, data scientists combine the skills of a business analyst, a statistician, a business domain expert, and a Java coder. In other words, they really don't exist. And if you can find one, they are expensive to hire. But the days of the data scientist are numbered; soon, the Hadoop community will deliver higher-level languages and interfaces that make it easier for mere mortals to query the environment. Meanwhile, SQL-based vendors are working feverishly to integrate their products with Hadoop so that users can query Hadoop using familiar SQL-based tools without having to know how to access or manipulate Hadoop data.
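The MapReduce model in item 4 is usually illustrated with the classic word-count example. Below is a single-process Python sketch of the map, shuffle, and reduce phases; it is not Hadoop's Java API, and the function names are illustrative. In Hadoop, the framework would run the mapper on each node holding an input split, move the intermediate pairs across the network during the shuffle, and recover from failures, which is exactly the plumbing the developer is spared.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all the values emitted for a single key."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # Each split could be mapped on a different node; here they run sequentially.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big insight", "data without action"])
```

The developer writes only the mapper and reducer; partitioning the input, scheduling, and grouping by key are the framework's job.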

So, those are the six unique things that Hadoop brings to the market. Probably the most significant is that Hadoop dramatically lowers the cost of loading data into an analytical environment. As a result, organizations can now load all their data into Hadoop with relative impunity, both financially and technically. The mantra shifts from "load only what you need" to "load in case you need it." This makes Hadoop a much more flexible and agile environment, at least on the data loading side of the equation.
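Schema at read, which is what makes loading so cheap, can be sketched in a few lines of Python. The records below are hypothetical JSON clickstream lines stored exactly as they arrived; a schema is imposed only when someone reads them, and each reader decides which fields matter and how to treat malformed rows. This illustrates the idea only; it is not how Hadoop or any particular Hadoop query tool implements it.

```python
import json
from typing import Dict, Iterator, List

# "Load" step: raw lines are stored exactly as they arrived, with no upfront modeling.
raw_store: List[str] = [
    '{"user": "ann", "page": "/pricing", "ms": 420}',
    '{"user": "bob", "page": "/docs"}',
    'not even json',
]


def read_with_schema(lines: List[str], schema: Dict[str, type]) -> Iterator[dict]:
    """Impose a schema at query time; fields absent from a record become None."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the reader, not the loader, decides how to handle malformed rows
        if not isinstance(record, dict):
            continue
        yield {name: (cast(record[name]) if name in record else None)
               for name, cast in schema.items()}


# Two readers project different schemas onto the same raw data.
latency = list(read_with_schema(raw_store, {"user": str, "ms": int}))
pages = list(read_with_schema(raw_store, {"page": str}))
```

Loading is trivial, but every reader has to know that `ms` exists and that some lines are not JSON, which is the buyer-beware trade-off described in item 3.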


Posted February 19, 2013 11:59 AM

When I went to my first big data conference almost three years ago, I thought I had been transported to a parallel universe: everyone was talking about data and analytics, yet data warehousing, SQL, and relational databases were dirty words.

Then, I looked at how people were dressed. I was the only person in the hall with a sports jacket, collared shirt, and leather shoes. Everyone else was wearing jeans, T-shirts, and sneakers and sported a ponytail. Then, it dawned on me: these were Java developers who had outgrown MySQL and were looking for a more scalable open source platform to run data-intensive, Web-based applications. And Hadoop was the answer to their big data dreams.

Immersing yourself in a foreign culture often crystallizes who you are and where you come from. For the first time in my professional life, I realized that I was a data guy from corporate IT who was wedded to commercial software and SQL-based processing. Standing brazenly in my blue blazer amidst a sea of Java coders, I also realized who I wasn't: an application developer who valued open source software.

Yet, my presence at this early big data event symbolized the beginning of the convergence of these two distinct communities: "data people" and "applications people" have worked side by side for many years but rarely intermingled or aligned approaches. Fast forward two years. The big data conference I attended this fall had just as many "suits" as ponytails in the audience. The convergence is proceeding apace, as both communities recognize the opportunities of joining forces as well as the risks of remaining isolated.

Opportunities and Threats

Opportunities. For SQL-based vendors, the world of Hadoop and NoSQL opens new lucrative markets consisting of customers that want to harness large volumes of unstructured and semi-structured data for business gain. For Hadoop vendors, SQL-based products represent hundreds of potential applications that can legitimize the Hadoop platform once they interface with or are ported to Hadoop.

Threats. At the same time, Hadoop and NoSQL products represent a huge threat to traditional SQL-based vendors. Hadoop is like a Swiss Army knife that can be used to do almost anything. Consequently, many advocates believe Hadoop sounds the death knell of SQL-based databases and data warehouses. And they might have a point, since many data warehousing managers are just starting to question why they would want to move data out of Hadoop to do query, reporting, and data mining.

Conversely, SQL-based vendors, which collectively represent hundreds of billions of dollars in annual sales, aren't likely to cede this new market to a handful of open source upstarts. They are already circling the wagons, co-opting Hadoop and NoSQL software by embedding them into their commercial products. This surround-and-drown strategy could spell the doom of independent, open source Hadoop vendors.

The only remaining question is: which community wins in the end? My bet is on the commercial SQL vendors, which are much larger, more established, and offer robust, enterprise-caliber products that today's organizations rely on to run their businesses. They may have to radically transform their architectures and product suites to co-opt upstart Hadoop and NoSQL approaches, but they'll do what they need to stay on top and in control.


Posted February 13, 2013 1:58 PM

Have you ever seen anything more hyped in the history of information management than big data? I haven't. OK, artificial intelligence probably incited a similar media storm, but that was before my time.

What's in a number? The ironic thing is that data by itself has no intrinsic value. For example, if I gave you three numbers--100,000, 300,000, and 500,000--would you say they provide any value to you or your organization? Of course not. What if I told you those numbers referred to US currency? That's context, but no value. What if I said those figures referred to your manufacturing organization's net profits for the past three quarters? Now, that's interesting and certainly good news; but there is still no business value.

But what if I said that your profit growth is due to home builders in the Midwest who are bundling your company's biggest electrical generators into their building packages in response to severe storms caused by climate change? Now, that's data--or insight--that you can take to the bank. For instance, armed with this knowledge, your organization might manufacture more high-end generators and fewer lower-end ones to accommodate the new demand. Or even better, you might identify the builders in the Midwest who haven't yet bundled your high-end generators with their home products and give them a 10% coupon to follow the lead of their peers.

Insights and actions. The point is that data--even big data--is useless without analysis and insight. Therefore, instead of talking about "big data", we should be talking about "big data analytics." Joining analytics and data can deliver real business value.

But there is a caveat: analysis without action produces no value. It's one thing to know what drives profit growth in our example above; it's another to do something about it. Insights without actions don't get you very far. So, instead of talking about "big data analytics," we should really be talking about "big data actionable analytics."

Impacts. At the risk of getting didactic, even actionable analytics doesn't guarantee business value. That's because actions that don't move the organization in a positive direction are useless. That's like a salesman saying he should receive a commission for saying and doing all the right things with a client even though he didn't win the deal.

So, the most technically accurate term for this new phenomenon that is taking our industry by storm is: "Big data insights that drive actions that help an organization achieve its goals." Of course, that is too wordy and would never fly as an industry buzzword. But you get the point: data without analysis, and analysis without action, and action without positive impact, deliver no value.

So when you hear the hype surrounding big data, remember that data by itself has no value; it's what you do with it that counts.


Posted February 1, 2013 2:19 PM

"It's ironic that we are the group that measures everything in the organization, but we don't have a good way to measure ourselves and our effectiveness."

This is basically what Eric Colson, former VP of Data Science and Engineering at Netflix, told me when I asked him how he measured the success of his BI and analytics team. (Eric is now chief data officer at Stitch Fix.)

When I pressed him, Eric said the best empirical measure of BI success that he could come up with was the number of times business unit heads mentioned his team at their regular operational review meetings. If a business head tells executives that he is partnering with the BI team to deliver a strategic project, Eric takes that as a sign his team has the confidence of the business and is making value-added contributions. "We must be doing something right if they keep wanting to work with us as strategic partners," says Eric.

If you are like most BI managers, measuring success is an afterthought. It's hard enough to get approval for a project and deliver results without having to kick off another project to measure your team's performance. And if you do have the time and interest, what exactly do you measure?

Usage tracking. Most BI managers track usage to gauge performance and value. They monitor how many users have BI licenses, how often they log in, how many reports they run on average, how many queries they run against which data elements, and so on. But high usage doesn't necessarily mean users are getting a lot of value or that this value is commensurate with the organization's investments in BI. For example, although you might have 1,000 users, perhaps only 25% log in weekly, and when they do, they only run one report each, which they look at for just five minutes. So, there is a lot of activity, but very little uptake.
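The arithmetic behind this example is easy to automate once you have a usage log. The sketch below assumes a hypothetical weekly log with one row per report view (user, report, minutes); real BI platforms expose usage data in their own formats, and the `ReportView` and `weekly_usage_summary` names are invented for illustration. It reproduces the scenario above: 1,000 licensed users, 250 of whom run one five-minute report a week.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReportView:
    """One row of a hypothetical BI usage log covering a single week."""
    user_id: int
    report_id: str
    minutes: float


def weekly_usage_summary(licensed_users: int, views: List[ReportView]) -> dict:
    """Contrast raw activity with how deeply the licensed population actually uses BI."""
    active_users = {v.user_id for v in views}
    total_minutes = sum(v.minutes for v in views)
    return {
        "active_rate": len(active_users) / licensed_users,
        "reports_per_active_user": len(views) / len(active_users) if active_users else 0.0,
        "minutes_per_licensed_user": total_minutes / licensed_users,
    }


# Scenario from the text: 250 of 1,000 licensed users each run one report for five minutes.
log = [ReportView(user_id=u, report_id="sales_summary", minutes=5.0) for u in range(250)]
summary = weekly_usage_summary(1000, log)
```

The result is a 25% weekly active rate, one report per active user, and just 1.25 minutes of report viewing per licensed user per week: plenty of logins to count, very little uptake.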

Surveys. Some more ambitious BI managers send surveys to BI users to gauge their satisfaction with the BI tools and reports. Unfortunately, from what I've seen, the response rate to these surveys is pretty dismal (but that's true for all surveys these days), which means results are potentially skewed. Typically, you only hear from those who are really happy, frustrated, or disappointed. Yet the most telling feedback would come from the great unwashed masses who didn't respond. In my mind, this undermines the value of a survey as a measuring stick of success.

Social media analysis. One BI manager I know wants to add social media features to his team's BI reports, such as giving users the ability to rate, comment on, and share reports with peers. Then, using social media analytics, he can evaluate the value of each report, using both empirical and subjective data, and by extension, the value users get from BI deliverables. This also helps the BI team delete unused and undervalued reports and get a better sense of what data in what format users find helpful.
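Here is a minimal sketch of how such a manager might combine empirical data (views) with subjective data (ratings) to flag reports for retirement. The `ReportStats` fields, the 90-day window, and both thresholds are hypothetical choices for illustration, not features of any particular BI tool.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class ReportStats:
    """Hypothetical per-report usage and feedback collected by a BI portal."""
    name: str
    views_last_90_days: int                           # empirical signal
    ratings: List[int] = field(default_factory=list)  # subjective signal, e.g. 1-5 stars

    def avg_rating(self) -> float:
        return float(mean(self.ratings)) if self.ratings else 0.0


def retirement_candidates(reports: List[ReportStats], min_views: int = 10,
                          min_rating: float = 2.5) -> List[str]:
    """Return names of reports that are rarely opened or consistently rated poorly."""
    flagged = []
    for report in reports:
        rarely_used = report.views_last_90_days < min_views
        poorly_rated = bool(report.ratings) and report.avg_rating() < min_rating
        if rarely_used or poorly_rated:
            flagged.append(report.name)
    return flagged


reports = [
    ReportStats("Daily sales", views_last_90_days=420, ratings=[5, 4, 5]),
    ReportStats("Legacy inventory", views_last_90_days=3),
    ReportStats("Churn detail", views_last_90_days=60, ratings=[2, 1, 2]),
]
candidates = retirement_candidates(reports)
```

Reports with no ratings are judged on views alone, so a lack of feedback doesn't count against an otherwise popular report.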

Spreadmarts. In the past, I've jokingly said that a BI program's success is inversely proportional to the number of spreadmarts in its environment. In theory, the fewer renegade data shadow systems that exist in a company, the more likely that users are getting value from the BI team's reports, dashboards and self-service reporting tools. Of course, this means a BI manager has to find and monitor all the spreadmarts in her organization. But this is like playing whack-a-mole. As soon as she discovers one spreadmart and consolidates it into the data warehouse, three more spring up that she isn't aware of.

Cost efficiencies. The best BI managers track the costs of making decisions. Before they initiate a BI project, they establish a baseline set of figures that take into account the cost of hardware and software licenses plus the number of hours per week that analysts spend accessing data rather than analyzing it, multiplied by their fully loaded hourly salaries. After completing a BI project, these BI managers measure these items again and compare the results to the baseline to gauge the financial lift of the BI project.
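The before-and-after comparison can be expressed as a simple cost model. In this Python sketch, all dollar figures, head counts, and hours are made up, and annualizing the weekly labor cost over 48 working weeks is an assumption; the structure (hardware plus software licenses plus hours spent accessing data times the fully loaded hourly rate) follows the description above.

```python
from dataclasses import dataclass


@dataclass
class DecisionCost:
    """Annual cost of delivering data to decision makers (illustrative model)."""
    hardware: float               # annual hardware cost
    software_licenses: float      # annual software license cost
    analysts: int                 # analysts who gather data for decisions
    access_hours_per_week: float  # hours each spends accessing (not analyzing) data
    loaded_hourly_rate: float     # fully loaded hourly salary
    weeks_per_year: int = 48      # assumption: working weeks per year

    def annual_total(self) -> float:
        labor = (self.analysts * self.access_hours_per_week
                 * self.loaded_hourly_rate * self.weeks_per_year)
        return self.hardware + self.software_licenses + labor


# All figures below are invented for illustration.
baseline = DecisionCost(hardware=50_000, software_licenses=80_000, analysts=20,
                        access_hours_per_week=15, loaded_hourly_rate=75)
after = DecisionCost(hardware=70_000, software_licenses=120_000, analysts=20,
                     access_hours_per_week=4, loaded_hourly_rate=75)
lift = baseline.annual_total() - after.annual_total()
```

With these invented numbers, the baseline costs $1,210,000 a year and the post-project environment $478,000, a lift of $732,000 even though hardware and license spending rose. The model deliberately ignores the value of better decisions, which is much harder to quantify.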

Companies that implement BI for the first time can usually wring lots of costs from their decision making processes by making the data acquisition and delivery process more efficient. But companies with mature BI programs don't have this luxury because they've already streamlined BI processes. BI managers here must justify continued investment in the BI program on the basis of the program's strategic value to the organization. This usually involves measuring the value of better decisions, mission-critical processes powered by data, or more informed workers. This is not easy to do, but it can be done. Unfortunately, the results can always be disputed by a cynical executive.

Full circle. And this brings us back to Eric Colson. Perhaps tracking the number of mentions the BI team gets in an executive meeting is not an exact science. And perhaps it's a bit unseemly or egomaniacal (and Eric doesn't do this). But as far as I can tell, it's the best metric we have for truly measuring the value of a BI program.

Let me know what techniques you've used to measure BI success. (And you can learn more about Eric Colson and other top BI leaders in my new book, "Secrets of Analytical Leaders: Insights from Information Insiders" available at Amazon.com.)


Posted January 30, 2013 3:13 PM