Blog: Wayne Eckerson

Wayne Eckerson

Welcome to Wayne's World, my blog that illuminates the latest thinking about how to deliver insights from business data and celebrates out-of-the-box thinkers and doers in the business intelligence (BI), performance management, and data warehousing (DW) fields. Tune in here if you want to keep abreast of the latest trends, techniques, and technologies in this dynamic industry.

About the author

Wayne has been a thought leader in the business intelligence field since the early 1990s. He has conducted numerous research studies and is a noted speaker, blogger, and consultant. He is the author of two widely read books: Performance Dashboards: Measuring, Monitoring, and Managing Your Business (2005, 2010) and Secrets of Analytical Leaders: Insights from Information Insiders (2012).

Wayne is currently director of BI Leadership Research, an education and research service run by TechTarget that provides objective, vendor-neutral content to business intelligence (BI) professionals worldwide. Wayne's consulting company, BI Leader Consulting, provides strategic planning, architectural reviews, internal workshops, and long-term mentoring to both user and vendor organizations. For many years, Wayne served as director of education and research at The Data Warehousing Institute (TDWI), where he oversaw the company's content and training programs and chaired its BI Executive Summit. He can be reached by email at weckerson@techtarget.com.

November 2012 Archives

It's never too early to start prognosticating about the year ahead. But rather than deliver another set of ho-hum predictions, I'm going to articulate new catch phrases that you will hear in 2013. Below I've concocted six catch phrases, each for a different segment of the BI market, from bleeding edge to newbie. If you've heard any of these already, let me know! Or add your own in the comments section below or send them to me at weckerson@bileadership.com.

"Hadoop is the data warehouse." You'll need to be a bleeding-edge BI aficionado to hear this one. But internet companies that have made a serious commitment to all things new and innovative (i.e. Hadoop and the cloud) are intent on moving all their data--including their master data--to Hadoop, while relegating their relational data warehouses to "data mart" status. This migration will be gradual. But as they get more comfortable running Hadoop in production, they will move the bulk of their data processing and analytic applications to Hadoop and reserve their traditional data warehouses for applications that require relational processing.

"We treat analysts as salespeople." Mature BI programs realize that you can bring a horse to water but you can't make it drink. Analysts who concoct brilliant insights and deliver them in the form of reports, dashboards, models, or artfully drawn infographics ultimately fail if the business doesn't act on them. To succeed, analysts must have skin in the game. They must be incented not by the quality of their insights, but the impact their insights have on the business. In other words, analysts must close the deal, like a salesperson on commission. They only get credit (i.e. a bonus) if the business acts on their insights.

"Data governance is dead; long live data governance!" This is the new war cry heard in mature BI circles. Most veteran BI managers have implemented formal data governance programs that quickly died on the vine through lack of interest. As important as data governance is, they discovered that you can't legislate it. Data governance must arise out of business need. Business people won't participate in data governance programs, but they will attend meetings to ensure their pet project sees the light of day. Thus, the key to data governance is to kill it as a program and embed it into project methodologies.

"Federate to integrate." Those BI programs emerging out of infancy into adulthood run smack into a brick wall: they suddenly realize that their organization has multiple, overlapping data warehouses and data marts. And most don't have the appetite, money, or time to build an uber-data warehouse. So, after some hand wringing, they realize their only option is to federate. Thankfully, data virtualization tools have improved immensely, making it possible to build a virtual EDW of data warehouses or spawn new views of disparate data without the expense or time of physically instantiating the data in a data warehouse.

"Can I have that in the cloud?" Mid-market companies, startups, and BI newbies see the world of BI with fresh eyes untainted by the long-standing tradition of inhouse IT operations and development. They already run their sales, payroll, expense payment, and numerous other applications in the cloud and, as a result, are less daunted by cloud's alleged data security issues. In fact, these groups are leading the charge to cloud BI, just as they did for e-commerce in the late 1990s, when security also was an issue. Larger companies will also follow suit as they move new and lower-risk BI applications to the cloud.

"Self service BI requires a lot of hand holding." I pity BI newbies, there is so much to learn! So many myths to deconstruct, the biggest being the holy grail of self-service BI. Although business users say they want self-service, most really don't. Casual users, who represent 80% of your population, only want greater degrees of interactivity. Power users, on the other hand, want true authorship, and different degrees at that. And counter-intuitively, empowering power users to become proficient with BI authoring tools requires a surprising degree of hand holding, and this catches most BI newbies off guard.

Please use the comments section below to add your own catch phrase of the year, or send your thoughts to me at weckerson@bileadership.com.


Posted November 21, 2012 8:53 AM

We all know that people, process, and technology are the keys to unlocking the business value of information technology. Although many organizations know how to set up and manage technology projects, they are less adept at setting up and managing their human resources.

Although there are no hard and fast rules about how to implement a BI Center of Excellence, top-performing business intelligence programs usually adopt a common structure. After interviewing dozens of BI leaders for my recently published book, Secrets of Analytical Leaders: Insights from Information Insiders, I began to see that most BI Centers of Excellence have a tripartite structure consisting of an executive team, a business team, and a technical team. (See Figure 1.)

Figure 1. BI Center of Excellence

EXECUTIVE TEAM

The executive team consists of line-of-business heads who sponsor and fund BI projects for their units. Also known as an executive committee or steering group, the executive team usually meets monthly at first and quarterly once the BI program gets established. Although its ostensible job is to review and approve the BI roadmap, allocate funds, and prioritize development, its primary purpose is to manage the politics that surround any BI program that proves it can deliver business value quickly. An executive team earns its pay by balancing the parochial interests of its individual members with the global interests of the company. By running political interference, the executive team frees the BI team to work without internecine distractions.

BUSINESS TEAM

BI director. The business team is run by a director of BI (or analytics) who oversees the entire BI program, ranging from data warehousing to business intelligence to analytics and big data. The BI director sits outside of the IT department and reports to a C-level business executive, usually a COO, CFO, or CIO. This reporting structure is critical to the success of the BI program. Unlike other types of information technology, BI must straddle business and IT to ensure it aligns with ever-changing business requirements.

A BI Center of Excellence requires a strong and capable leader to succeed. My book details the characteristics of such leaders. In essence, they must be "purple people"--not "blue" in the business or "red" in IT, but a perfect blend of the two, hence "purple." They talk the language of both worlds and build bridges that eliminate the "us versus them" mentality that exists in many organizations. For instance, they excel at creating teams of business and technical people who sit side by side and work together to deliver rich BI solutions. In short, these leaders are the glue that binds all the components of a BI Center of Excellence together.

BOBI. Assisting the BI director are several business-oriented BI (BOBI) professionals. The primary purpose of this BOBI team is to develop and evangelize the BI strategy and coordinate its development with the BI technical team (see below). The BOBI team identifies people doing BI work in business units and establishes relationships with them. It often recruits them to serve on a BI working committee that acts as an extension of the BOBI team and helps develop the BI strategy, troubleshoot problems, select tools, and manage the company's report portfolio. In addition, the BOBI team defines and documents BI best practices, oversees data governance programs, and gathers requirements for major BI projects.

Embedded developers and analysts. The business team also includes report developers (i.e. super users) and business analysts who work inside a business department. These embedded developers and analysts sit with business people, participate in all their meetings, and are considered full-fledged members of the business team. Although they usually report to the line-of-business head, they typically have a dotted-line relationship to the BI director and meet regularly with their peers in other business units to share ideas and collaborate on cross-departmental issues or initiatives. These may be the same individuals who serve on the BOBI team's working committee described above.

Statisticians. The business team also includes statisticians (or data scientists) who develop analytical models that describe patterns in large data sets and predict outcomes. In small organizations, statisticians typically reside in a central group, since individual departments usually don't have enough work to keep a statistician busy all the time. In large organizations, statisticians are typically embedded in departments but report directly to a director of analytics. Even more so than business analysts, statisticians need an affiliation with a central group that fosters collaboration, continuing education, and career development.

TECHNICAL TEAM

The technical BI team consists of data and technical architects, ETL and BI developers, data and DW administrators, requirements specialists, quality assurance testers, trainers, and technical writers, among others. These folks are responsible for implementing the strategy established by the BOBI team and its departmental surrogates. In essence, the technical BI team builds and maintains the organization's enterprise data warehouse and associated data marts as well as any complex reports and dashboards that require skilled programmers. It also implements data definitions and rules within BI tools, data models, ETL tools, and data quality tools and works closely with data center specialists, such as database administrators, to ensure the BI environment delivers adequate scalability and performance.

Like the BOBI team, the technical BI team sits outside of the IT department and reports to the director of BI. Occasionally, however, the technical team resides within IT while the BOBI team resides outside of it. Succeeding with this type of hybrid structure requires that the director of BI and the director of IT maintain a close working relationship and communicate constantly.

CONCLUSION

Although there are infinite ways to organize a BI team, best-in-class organizations develop a tripartite organizational structure consisting of executive, business, and technical teams. The director of BI (or analytics) is the glue that holds these three teams together and must possess strong business and technical skills. Ideally, the business and technical teams reside outside of IT to align more closely with the business. These teams ensure further alignment by embedding report developers and analysts (and sometimes statisticians) within business departments. However, to ensure continuity and cross-departmental coordination, these embedded developers and analysts also maintain a reporting relationship with the director of BI and often serve on a BOBI working committee that supports BI deployment for the entire organization.

Author's Note: If you would like more information about how to organize and motivate BI and analytical professionals, my book contains several chapters on these topics: Secrets of Analytical Leaders: Insights from Information Insiders


Posted November 14, 2012 11:22 AM

Strata+Hadoop World 2012, held in New York City several weeks ago, showcased dozens of big data technologies, showing the breadth and depth of solutions that now run on or integrate with Hadoop. In a prior blog, I discussed the overarching trends fueling the big data movement and its implications for traditional data warehousing and business intelligence vendors and ecosystems. (See "TDWI 1997 Versus Hadoop World 2012".)

But in this blog, I want to discuss the numerous analytical, data integration, database, and infrastructure vendors I met with, most of whom were touting innovative products designed to meet emerging big data needs. Perhaps the two most interesting categories of tools I saw are those that provide real-time SQL queries against Hadoop and end-to-end analytical tools with embedded data processing capabilities.

Real-time Queries

Impala. One significant limitation of Hadoop is that it's a batch processing environment that can't support real-time queries. This makes it challenging for business analysts to explore Hadoop data efficiently and effectively without moving it into a relational database. To remedy this situation, several vendors announced or exhibited products that embed real-time query engines inside Hadoop. Chief among these is Cloudera, which announced a new Apache-licensed open source project called Impala, currently in beta, that embeds a real-time query engine alongside MapReduce. This should give some relief to Hadoop users who find Hive, which runs SQL-like queries as batch MapReduce jobs against virtual tables in Hadoop, too slow and cumbersome for exploratory querying.

Impala, which Cloudera calls Real-Time Query for customers who pay for support, currently runs as a separate processing engine alongside MapReduce with its own parallel processing framework. According to CEO Mike Olson, Cloudera plans to port Impala to the next generation of Hadoop, called YARN, which will provide native support for data processing engines other than MapReduce. Olson also said Cloudera will eventually upgrade Impala, which supports HiveQL, to support ANSI-standard SQL. HiveQL is a SQL-like language that is missing many basic SQL functions, making it challenging for BI vendors to provide customers with rich SQL support for Hadoop data.

Hadapt. One vendor that beat Cloudera to the punch is Hadapt, which implements a Postgres database on every node in a Hadoop cluster. It converts ANSI-standard SQL queries into MapReduce jobs, which parallelize the work across nodes, and pushes the relational portions back down as SQL for processing in the local Postgres instances. This lets BI users and BI tools submit rich SQL queries against structured data stored in Hadoop. To overcome the batch processing constraint, Hadapt just announced support for real-time queries that bypass the MapReduce layer in Hadoop, just like Cloudera's Impala. Hadapt also embeds the Solr search engine for processing text.
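
Hadapt doesn't publish its internals, so what follows is only a toy sketch of the general split-execution pattern in Python: a SQL aggregate is pushed down to a local database on every "node," and the partial results are merged at a coordinator. In-memory sqlite3 databases stand in for the per-node Postgres instances, a thread pool stands in for MapReduce's parallelism, and the shard data is invented:

    import sqlite3
    from concurrent.futures import ThreadPoolExecutor

    # Pretend each list is the slice of data stored on one Hadoop node.
    shards = [
        [("clicks", 10), ("views", 200)],
        [("clicks", 7), ("views", 180)],
        [("clicks", 12), ("views", 310)],
    ]

    def partial_aggregate(shard):
        """Run the pushed-down SQL against one node's local database."""
        node = sqlite3.connect(":memory:")  # stand-in for per-node Postgres
        node.execute("CREATE TABLE events (kind TEXT, n INTEGER)")
        node.executemany("INSERT INTO events VALUES (?, ?)", shard)
        return node.execute(
            "SELECT kind, SUM(n) FROM events GROUP BY kind").fetchall()

    # Fan the query out to all nodes in parallel ...
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, shards))

    # ... then merge the partial aggregates at the coordinator.
    totals = {}
    for rows in partials:
        for kind, n in rows:
            totals[kind] = totals.get(kind, 0) + n
    print(totals)  # {'clicks': 29, 'views': 690}

Note that the pushed-down query and the merge step differ (a SUM of partial SUMs here); rewriting a query into those two stages is exactly what a SQL-on-Hadoop planner does automatically.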

Others. It's more than likely that other Hadoop vendors, such as MapR and Hortonworks, as well as startups, will jump on the real-time query bandwagon in the near future. This trend will gain momentum when the Apache Hadoop community elevates YARN--otherwise known as Hadoop 2.0--from alpha into general release sometime in the next year or so. But it remains to be seen whether these products are more than just kludgy workarounds of Hadoop's batch processing environment.

End-to-End Analytical Toolsets

The current way to query Hadoop data in real time is to move the data out of Hadoop and into an analytical platform optimized for real-time query processing. This is what most BI and data warehousing vendors recommend. But the big data market has spawned a host of new analytical vendors hawking end-to-end BI capabilities. Here are a few I met with at Strata+Hadoop World 2012.

SiSense. SiSense provides a complete analytical processing environment that includes an in-memory columnar database, a visual data mashup tool, and visualization software based on the open source D3.js library. SiSense's claim to fame is that it can process enormous volumes of structured data at an extremely low price. At Strata+Hadoop World 2012, the company showed how it could analyze 1TB of data on a $750 laptop with 8GB of RAM. The product's secret sauce is its ability to combine memory-based computing with a vectorized, columnar database that places no limit on data volumes while processing most queries in memory.
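
SiSense doesn't disclose its engine internals, but the basic appeal of a columnar, in-memory layout is easy to demonstrate in a few lines of Python (with made-up data): an analytical aggregate touches one tightly packed column rather than every field of every row.

    from array import array

    # Row store: each record carries all of its fields.
    rows = [("East", 2012, 100.0), ("West", 2012, 250.0), ("East", 2011, 75.0)]

    # Column store: one contiguous, tightly packed array per field.
    regions = ["East", "West", "East"]
    years   = array("i", [2012, 2012, 2011])
    amounts = array("d", [100.0, 250.0, 75.0])

    # SUM(amount): the columnar version scans only the 'amounts' array --
    # a cache-friendly pass over contiguous memory that a vectorized
    # engine can process in large chunks.
    print(sum(amounts))             # 425.0
    print(sum(r[2] for r in rows))  # same answer, but every row is touched

At three rows the difference is invisible, but over billions of values the column scan reads a small fraction of the bytes, which is how such products keep large data volumes responsive on modest hardware.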

Platfora. Startup Platfora also provides a complete analytical environment, but unlike SiSense, its visual ETL tools generate MapReduce jobs that load data into classic star schema models running on an in-memory columnar database, which can be incrementally updated as new data arrives in Hadoop. The MapReduce jobs can also parse and aggregate semi-structured data so that it can be analyzed in Platfora, which offers both SQL and REST interfaces and can run in the cloud or on premises, usually adjacent to an existing Hadoop cluster.

Alteryx. Alteryx recently expanded its focus from spatial analytics to an all-in-one analytical environment geared to business analysts. Alteryx Designer Desktop comes with a point-and-click development tool, personal ETL, industry content to enrich applications, data quality tools, R for predictive analytics, and a database. The product works on large volumes of data and can be deployed in the cloud as well.

Quest Kitenga. Recently purchased by Quest Software (which was itself recently purchased by Dell), Kitenga is a native Hadoop application that offers visual ETL, Solr-based search, natural language processing, Mahout-based data mining, and advanced visualization capabilities. It's a big data godsend for sophisticated analysts who want a robust analytical toolbox.

Pentaho. Open source BI vendor Pentaho offers a complete set of big data tools that extract, transform, load, report, analyze, and explore data in Hadoop. Its new Instaview product enables business analysts to connect to any data source--including Hadoop, HBase, Cassandra, MongoDB, Web data (e.g., Twitter, Facebook, log files, and Web logs), and SQL sources--visually prepare data for analysis, and instantly visualize and explore data.

Others

Other vendors I spoke with included analytics vendors (e.g. SAP, SAS, MetaMarkets, and Revolution Analytics), database vendors (Couchbase, Calpont, and Kognitio), and data fabric vendors (ScaleOut, Terracotta, and Continuity). And this was just the tip of the iceberg! I hope to be able to drill down on these and other vendors' offerings in the near future.


Posted November 8, 2012 3:00 PM

I've attended numerous big data events during the past three years, but Strata+Hadoop World in New York City two weeks ago had the most people and buzz of any so far. With more than 2,500 attendees crammed into the New York Hilton Hotel, the conference was a bit of a madhouse, exploding with energy and possibility. It's clear that after simmering for several years, big data has finally captured the imagination and attention of the industry as a whole.

In fact, this year's Hadoop World reminded me of the early years of data warehousing. In 1997, The Data Warehousing Institute (TDWI) held its largest conference ever in a ramshackle two-story hotel on the outskirts of San Diego alongside the freeway. At the time, data warehousing was a red-hot phenomenon, and there was so much interest in this new technology that I missed getting a room at the event hotel. So every morning, I walked under the freeway to learn about the latest and greatest in data warehousing. This daily 20-minute trek was a small sacrifice, knowing that I was on to something groundbreaking!

Since then, data warehousing has become a fixture in corporate IT environments. And although it no longer attracts the same buzz and has taken its punches over the years, it has also matured. With trial and error, the data warehousing market has refined the processes, methodologies, and technologies for collecting, integrating, and reporting on large volumes of disparate data. Data warehousing is here to stay.

Pace of evolution. My first question these days is whether big data (a.k.a. Hadoop) will evolve the same way as data warehousing (and most information technologies) or whether it will navigate a unique course, given its supercharged expectations. My guess is that in 2013 big data will slide down Gartner's trough of disillusionment as companies discover just how raw and costly the technologies are to manage in a production environment at scale.

But that doesn't mean Hadoop won't compete with established technologies and vendors for a growing portion of data processing and analytical workloads. In fact, there are indications that Hadoop may move through the maturity curve faster than most technologies. It certainly has the potential to replace technologies that companies use today to build and manage information-intensive applications. Vendors are quickly plugging the gaps in the Hadoop ecosystem with both open source and commercial software, and user organizations are moving rapidly from kicking Hadoop's tires to implementing it in production environments.

Dress code. One indication of the rapid evolution of the big data market is the changing dress code at big data events. Two years ago, the typical attendee at Hadoop World had a ponytail and wore jeans, Converse sneakers, and a t-shirt. But this year, I saw a sizable number of people in blue blazers or business suits sans ties with gray around the temples (including me!). Clearly, big data is no longer just a forum for Java developers and open source adherents. The buzz around big data has attracted the attention of commercial software vendors, venture capitalists, industry experts, and other market followers who are gearing up for the next big thing.

Complementary or Competitive?

The next question is whether Hadoop will complement or replace our existing analytical environments. Die-hard Hadoop advocates tout Hadoop as a replacement for data warehouses, relational databases, and data integration tools. However, top executives at Hadoop vendors, including Mike Olson at Cloudera and Rob Bearden at Hortonworks, are more diplomatic and publicly declare that Hadoop plays a complementary role to existing technologies. Some of this "play nice" verbiage is necessary: for Hadoop to grow, it needs applications to run on top of it, and the quickest way to do that is to play nice with existing database, ETL, and BI vendors. But some of it is based on the current realities of Hadoop implementations.

Staging area and archive. Today, most companies use Hadoop as a staging area for semi-structured data. Most parse and aggregate Web logs and then load the results into a data warehouse for reporting and analysis. As such, Hadoop offers companies a cost-effective way to collect, report, and analyze unstructured data, something that is not easily accomplished in the data warehousing world.
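
For readers new to this pattern, here is a hedged sketch of what such a staging job looks like as a Hadoop Streaming-style mapper and reducer in Python. The script assumes Apache combined-format logs (where the requested URL is the seventh whitespace-separated field); the per-page hit counts it emits are what would then be loaded into the warehouse:

    import sys
    from itertools import groupby

    def mapper(lines):
        """Parse raw log lines and emit one (page, 1) pair per hit."""
        for line in lines:
            fields = line.split()
            if len(fields) > 6:            # skip malformed lines
                print(f"{fields[6]}\t1")   # fields[6] = requested URL

    def reducer(lines):
        """Sum the counts per page (Hadoop delivers input sorted by key)."""
        pairs = (line.rstrip("\n").split("\t") for line in lines)
        for page, hits in groupby(pairs, key=lambda kv: kv[0]):
            print(f"{page}\t{sum(int(n) for _, n in hits)}")

    if __name__ == "__main__":
        mode = sys.argv[1] if len(sys.argv) > 1 else "map"
        (mapper if mode == "map" else reducer)(sys.stdin)

Run under Hadoop Streaming, the framework handles distributing, sorting, and shuffling the data between the two functions; the same script also works on a laptop with ordinary shell pipes for testing.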

Hadoop also serves as a cost-effective data archive, since you simply add commodity servers to house more data and never move data offline. As one colleague likes to joke, "Hadoop is the new tape."

Load and go. Most importantly, Hadoop offers a new approach to collecting and managing data that is quite liberating. In the data warehousing world, you have to model and structure data before you load it into a data warehouse, a process that is time-consuming and expensive. Consequently, the mantra from data warehousing experts is, "Collect only the data you need." But since Hadoop is based on a file system, you don't have to do any upfront modeling. You just load and go. As a result, the mantra from Hadoop developers is, "Collect any data you might need." That's because there is no longer any significant cost to accumulating data. You load it, explore it, and only when you find something valuable do you structure it and load it into the data warehouse.
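
A minimal sketch, assuming JSON-per-line events in a hypothetical events.log file, shows how "load and go" differs from model-first warehousing: ingestion is just an append, and structure is imposed only at read time, for only the fields a given question needs (often called schema-on-read):

    import json

    # Ingest: append whatever arrives -- no modeling, no ETL, no schema.
    with open("events.log", "w") as f:
        f.write('{"user": "al", "action": "click", "ms": 31}\n')
        f.write('{"user": "bo", "action": "view"}\n')   # missing field? fine
        f.write('{"user": "al", "action": "click", "ms": 44}\n')

    # Explore: decide what the data means when you query it, not before.
    with open("events.log") as f:
        events = [json.loads(line) for line in f]

    clicks = [e for e in events if e.get("action") == "click"]
    avg_ms = sum(e["ms"] for e in clicks) / len(clicks)
    print(f"{len(clicks)} clicks, avg latency {avg_ms:.1f} ms")  # 2 clicks, 37.5 ms

Only after exploration reveals that, say, click latency matters would you design a table for it and promote the data into the warehouse.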

Ultimately, companies need both models of collecting and processing data. I call the data warehousing model "top down" intelligence and the Hadoop model "bottom up" intelligence. In the top down world, you know the questions you want to ask in advance, while in the bottom up world, you do not. Each model requires different architectural approaches and technologies.

Replacement trends. As such, Hadoop currently complements a data warehousing environment. However, the real question is whether Hadoop will bust out of this complementary role and begin to subsume additional analytical workloads--or indeed, all of them.

There is ample indication that Hadoop may in fact cannibalize additional components of the analytical environment. For example, this spring I conducted a survey of BI professionals who have implemented Hadoop. Most plan to significantly increase the amount of ad hoc querying, visual exploration, data mining, and reporting that they run directly against Hadoop. While most of this querying will undoubtedly run against new data (i.e. unstructured data), it may also run against traditional data sets in the future.

Exploiting this desire to query data directly in Hadoop, Cloudera announced two weeks ago a real-time query engine that moves Hadoop out of the realm of batch processing and into the world of iterative querying and analytical workloads. In addition, Hortonworks earlier this year announced a metadata catalog called HCatalog that makes it easier for users to query data in Hadoop. Moreover, there is an Apache project under development called YARN that extends MapReduce's resource manager to support other processing frameworks and engines. As such, YARN promises to make Hadoop better optimized to handle diverse sets of workloads, such as messaging, data mining, and real-time queries. In other words, Hadoop promises to do almost everything a relational database or data warehouse can do, and more.

Bleeding edge companies. Not surprisingly, companies that operate on the bleeding edge of technology, such as Netflix, have seen the future and are aggressively moving to it. Netflix now refers to its massive Teradata data warehouse as a "data mart" and Hadoop as its "data warehouse." Currently, only 10% of its master data resides in Hadoop and the rest in Teradata, but that ratio will soon be reversed. Since Netflix stages most of its data in Hadoop, it sees a natural progression toward moving most of its data and query processing to Hadoop as well. The data warehouse will either disappear or be relegated to handling certain types of dimensional reporting.

Surrounding Hadoop. Given the stakes, it's no wonder that established software vendors, like Teradata, Oracle, Microsoft, SAP, and IBM, as well as the multitude of traditional ETL and BI vendors, are working overtime to court Hadoop. On one hand, big data represents a sizable new market opportunity for them--big data opens up a whole new set of applications running on unstructured data. But on the other hand, Hadoop is a threat, and they need to make sure that they don't cede their market hegemony to a slew of Hadoop startups.

Battle Lines

So the battle lines are drawn: Hadoop vendors currently pitch the complementary nature of Hadoop but keep releasing functionality that puts them in competition with established vendors. And established vendors hawk interoperability with Hadoop while aggressively trying to surround it to maintain control of customer accounts and market dynamics. They want to keep Hadoop a niche technology for handling unstructured data only, while exploiting it for market gain.

We'll learn a lot about the future of Hadoop in 2013. We'll find out just how reliable, secure, and manageable it is for enterprise deployments; we'll better understand the true cost of implementing and maintaining Hadoop environments; and we'll know whether it can live up to its hype as a be-all and end-all for data and information processing.


Posted November 6, 2012 1:15 PM