Blog: Wayne Eckerson

Wayne Eckerson

Welcome to Wayne's World, my blog that illuminates the latest thinking about how to deliver insights from business data and celebrates out-of-the-box thinkers and doers in the business intelligence (BI), performance management and data warehousing (DW) fields. Tune in here if you want to keep abreast of the latest trends, techniques, and technologies in this dynamic industry.

About the author

Wayne has been a thought leader in the business intelligence field since the early 1990s. He has conducted numerous research studies and is a noted speaker, blogger, and consultant. He is the author of two widely read books: Performance Dashboards: Measuring, Monitoring, and Managing Your Business (2005, 2010) and The Secrets of Analytical Leaders: Insights from Information Insiders (2012).

Wayne is currently director of BI Leadership Research, an education and research service run by TechTarget that provides objective, vendor neutral content to business intelligence (BI) professionals worldwide. Wayne’s consulting company, BI Leader Consulting, provides strategic planning, architectural reviews, internal workshops, and long-term mentoring to both user and vendor organizations. For many years, Wayne served as director of education and research at The Data Warehousing Institute (TDWI) where he oversaw the company’s content and training programs and chaired its BI Executive Summit. He can be reached by email at weckerson@techtarget.com.

July 2011 Archives

(Note: This is an excerpt from the opening section of the "Dashboard Verdict," a compendium of dashboard product reviews that I'm writing for the Business Applications Research Center (BARC), a German research firm that does in-depth evaluations of business intelligence products. To be notified when the dashboard reviews are available for purchase, register at www.bileadership.com/mailing-list-form.html. To view and subscribe to BARC research, go to www.bi-verdict.com.)

One way to differentiate dashboard products is to position them within the MAD framework that I devised several years ago. MAD stands for Monitor, Analyze, and Drill to detail and is represented by a pyramid divided into three sections. The shape of the pyramid represents the amount of data at each level.

A well-designed MAD dashboard consists of about 10 metrics at the top level, 100 metrics at the analysis layer, and 1,000 metrics at the detail layer. The top 10 metrics are filtered by about 20 dimensions, which generate the lower-level views and metrics. In essence, a performance dashboard is an interactive information sandbox that is big enough to answer 60% to 80% of the questions that users might want to ask about their performance objectives, but not so big that they get lost in the data. (See figure 1.)

The monitoring layer consists of graphical metrics (e.g., charts, stoplights, and gauges) tailored to an individual's role. With a quick glance, users can see whether everything is going according to plan. If something is awry, they can drill to the analysis layer and perform root-cause analysis by slicing and dicing data dimensionally or applying a variety of filters. If they need transaction data to resolve the issue, or want more context about it, they can drill into detailed data, which might be stored in the data warehouse, an operational system, or a detailed report. MAD dashboards focus users on key metrics aligned with strategic objectives and give them access to any data they need in three clicks or fewer.
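
To make the three layers concrete, here is a minimal sketch in Python of how a MAD dashboard's content might be organized; the metric names, dimensions, and drill path are invented for illustration, not taken from any particular product:

    # A rough sketch of the MAD framework as a data structure.
    # Metric names, dimensions, and counts are illustrative only.
    mad_dashboard = {
        "monitor": {                      # ~10 top-level graphical metrics
            "metrics": ["revenue_vs_plan", "pipeline_coverage", "churn_rate"],
            "display": "gauges_and_stoplights",
        },
        "analyze": {                      # ~100 views generated by slicing the
            "dimensions": ["region", "product", "channel", "time"],   # top metrics
            "actions": ["slice", "dice", "filter"],                   # by ~20 dimensions
        },
        "detail": {                       # ~1,000 metrics / transaction-level data
            "sources": ["data_warehouse", "operational_system", "detail_report"],
        },
    }

    def drill_path(metric, dimension):
        """Trace the three-click path from a monitored metric to detail data."""
        return [
            f"monitor: {metric} is off plan",
            f"analyze: slice {metric} by {dimension}",
            f"detail: fetch the transactions behind the outlier",
        ]

    print(drill_path("revenue_vs_plan", "region"))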

Five years ago when organizations built custom dashboards, they stitched together multiple tools to implement the MAD framework. Typically, they used portal software for the monitoring layer, an OLAP tool for the analysis layer, and a reporting tool for the detail layer. Today, although there are many so-called dashboard products on the market, very few support the entire framework in a seamless fashion. Most support only one of the three layers.

Figure 1. BI Products Applied to the MAD Framework

For example, dashboard tools, such as Domo CenterView, SAP BusinessObjects Dashboards, and iDashboards, primarily support the monitoring layer. Analysis tools, such as Tableau, QlikView, and Information Builders' Visual Discovery, primarily support the analysis layer. And reporting tools, such as Microsoft SQL Server Reporting Services and SAP BusinessObjects Web Intelligence, only support the detail layer. Only enterprise BI platforms, such as those sold by SAP and Information Builders, and ROLAP tools, such as those from MicroStrategy and Oracle, offer tools that encompass the entire MAD stack. (Note: many BI platforms consist of lightly or non-integrated product modules that don't always deliver a seamless experience as users traverse the three layers of the MAD framework.)

(The rest of this section in the soon-to-be-published BARC report uses scalability, architecture, and price/performance as ways to differentiate dashboard products.)


Posted July 27, 2011 11:17 AM

This is the second in a four-part blog series on cloud computing for BI professionals.

Cloud computing offers a compelling new way for organizations to manage and consume compute resources. Rather than purchase, install, and maintain hardware and software, organizations rent shared resources from an online service provider and dynamically configure the services themselves. This model of computing dramatically speeds deployment times and lowers costs. (See prior article "What is Cloud Computing?")

Although cloud computing shares the above attributes, it can be deployed in several different ways. The key factor is whether the cloud service provider is an external vendor or an internal IT department. There are three deployment options for cloud computing:

  • Public Cloud. Application and compute resources are managed by a third-party service provider.
  • Private Cloud. Application and compute resources are managed by an internal data center team.
  • Hybrid Cloud. A private cloud that leverages the public cloud to handle peak capacity; a reserved "private" space within a public cloud; or a hybrid architecture in which some components run in a data center and others in the public cloud.

Public Cloud

Most of the discussion about cloud computing in the press refers to public cloud offerings. The public cloud offers the most potential benefits and the greatest potential risks. With a public cloud, organizations can obtain application and computing resources without having to make an upfront capital expenditure or use internal IT resources. Moreover, customers only pay for what they use on a usage or monthly subscription basis, and they can terminate at any time. Thus, public clouds accelerate deployments and reduce costs, at least in the short run. This is sweet news to BI teams that often must spend millions of dollars and months of development time before they can deliver their first application.

In addition, a public cloud obviates the need for customers to maintain and upgrade application code and infrastructure. Many public cloud customers are astonished to see new software features automatically appear in their software without notice or additional expense. And the public cloud frees up IT departments to focus on more value-added activities rather than hardware and software upgrades and maintenance. In short, there is something for everyone to like about the public cloud.

Security and Privacy. But the public cloud also comes with risks. Security and privacy are the biggest bugaboos. Some executives fear that moving data and processing beyond their own firewalls exposes them to security and privacy risks. They fear that moving data across public networks and commingling it with other companies' data in a public cloud might make it easier for sensitive corporate data to get into the wrong hands.

While security and privacy are always an issue, the fact is that most corporate resources are more secure in the public cloud than in a corporate data center. Public cloud providers, after all, specialize in data center operations and must meet the most stringent requirements for security and privacy. However, compliance regulations legally require some organizations to maintain data within corporate firewalls or to pinpoint the exact location of their data, which is generally impossible in a public cloud that virtualizes data and processing across a grid of national or international computers.

Other Challenges. The public cloud poses other challenges:


  • Reliability. Executives may question the reliability of public cloud resources. For example, Amazon EC2 has had two short but high-profile outages, which left companies that ran mission-critical parts of their business there stranded without much visibility into the nature or duration of the outage.

  • Costs. It can be extremely difficult to estimate public cloud costs because pricing is complex and often companies can't accurately estimate their usage (which is why they want to migrate workloads to the cloud in the first place.)

  • Blank Slate. Administrators must redefine corporate policies and application workflows from scratch in the public cloud, which generally provides plain vanilla services.

  • Vendor and Technology Viability. The public cloud market is evolving fast so it's difficult to know which vendors and technologies will be around in the future.

Private Cloud

For the above reasons, many organizations are beginning their journey into the cloud with private clouds. This is especially true in the infrastructure-as-a-service arena, where IT administrators are implementing virtualization software to consolidate servers and increase overall server utilization, flexibility, and efficiency. In addition, a private cloud gives an organization greater control over its processing and data resources, providing peace of mind for worried executives, if not greater security and privacy for sensitive data. And since a private cloud runs in an existing data center, IT administrators don't have to recreate security and other policies from scratch in a new environment.

But the private cloud has its own challenges. IT administrators have to learn and install new software (hypervisors and cloud management utilities). They need to manage two compute environments side by side and keep IT policies aligned in both. This adds to complexity and staff workload. And it goes without saying that a private cloud runs in an existing corporate data center, which carries high fixed costs to maintain.

Hybrid Cloud

Companies are increasingly pursuing a two-pronged strategy that uses the private cloud for the bulk of processing and the public cloud to handle peak loads. The key to a hybrid cloud is obtaining cloud management software that spans both private and public cloud environments. The software supports the same hypervisors used in each environment (ideally it's the same hypervisor) and has built-in interfaces to the public cloud provider so internal IT policies and virtual images can be transferred to the public cloud environment.

In addition, many public cloud vendors allow customers to carve out private clouds within the public cloud domain. For example, Amazon.com offers a virtual private cloud within its Elastic Compute Cloud (EC2) environment that lets customers reserve dedicated machines and static IP addresses, which they can link to their internal data centers via virtual private networks. Hybrid clouds are obviously more complex and challenging to manage. Currently, few people have experience blending private and public clouds in a seamless way.

Adding Public Cloud Components to a BI Architecture

Another form of hybrid cloud uses public cloud facilities to enhance an existing architecture. In a BI environment, there are several ways that organizations can mix and match public cloud offerings with their on-premises software (which may or may not be running in a private cloud).

Scenario #1 - Analytic Sandbox. When a data warehouse is running at full capacity, administrators might consider offloading complex ad hoc queries submitted by a handful of business analysts to a public cloud replica. In this scenario, complex queries submitted by the analysts are bogging down performance of the data warehouse. Since it's difficult to estimate ad hoc processing requirements and the costs of replicating a data warehouse are high, the IT staff decides it's faster and cheaper to create a new data mart in the public cloud and point the business analysts to it. The IT staff (or analysts) can increase or decrease capacity on demand using self-provisioning capabilities of the public cloud. (See figure 1.)

Figure 1. Analytic Sandbox Using a Public Cloud

The primary challenge in this scenario is the cost and time required to move data across the internet from an internal data center to the cloud. Since the initial load may take days or weeks depending on data volumes, IT staff will usually ship a disk to the cloud provider to load manually. Thereafter, the IT staff needs to figure out whether it can move daily deltas across the internet in time within an allotted batch window. Considering that it takes six days to move 100GB across a T-1 line, organizations may need to skip doing batch loads and instead trickle feed data into the data warehouse replica. In addition, it is often difficult to estimate pricing for such data transfers and charges may add up quickly. Cloud providers generally charge for transferring data in and out of the cloud and storing it. (Amazon, however, has recently discontinued fees for transferring data into EC2.)
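
The six-day figure is easy to verify with back-of-the-envelope arithmetic. The short calculation below assumes a T-1's nominal 1.544 Mbps and ignores protocol overhead and contention, so real-world transfers would take even longer:

    # Rough transfer-time estimate for 100 GB over a T-1 line (nominal rate,
    # ignoring protocol overhead and contention).
    data_gb = 100
    t1_mbps = 1.544                                 # T-1 nominal bandwidth

    bits = data_gb * 8 * 1000**3                    # decimal GB -> bits
    seconds = bits / (t1_mbps * 1000**2)            # Mbps -> bits per second
    print(f"{seconds / 86400:.1f} days")            # roughly 6 days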

Also, depending on the speed of network connections, the business analysts might experience delays in query response times due to internet latency. Invariably, internet speeds won't match internal LAN speeds so users might notice a difference. Finally, there are security and privacy issues discussed in the previous article. (See "What is Cloud Computing?")

Scenario #2. Cloud-based Departmental Dashboard. A more common scenario is when a department head purchases a Software-as-a-Service (SaaS) BI solution from a SaaS BI vendor, of which there are many. Here, an organization's source systems and data warehouse remain in the corporate data center but the dashboard and associated data mart run in the cloud. (See figure 2.)

Figure 2. Cloud-based Departmental Dashboard

SaaS BI tools are popular among department heads who want a dashboard on the cheap and don't want to involve corporate IT. Unfortunately, designing a data mart, whether in the cloud or on premises, is never easy or quick, especially if it involves integrating multiple operational sources. (See "Expectations Versus Reality in the Cloud: Understanding the Dynamics of the SaaS BI Market.")

This is not a problem if organizations are willing to pay the costs of creating a custom data mart and wait three to four months, which is the time it usually takes to build out a relatively complex, custom environment. It's also not a problem if they simply want to visualize an existing spreadsheet. But if they believe the cloud provides quick, easy, and inexpensive deployments for any type of BI deployment, they will be disappointed. Also, they still need to transfer data to the cloud, and users may experience response-time delays due to internet latencies.

Scenario #3. BI in the Cloud Without the Data. To eliminate security, privacy, and data transfer issues, companies may want to keep data locally in a corporate data center while maintaining the BI application in the cloud. (See figure 3.) BI developers can configure the SaaS BI tool to meet their branding and workflow requirements, gaining the speed and cost advantages of cloud deployments, while minimizing data security and privacy problems.

Figure 3. BI in the Cloud Without Data

While this scenario sounds like it optimally balances the risks and rewards of cloud-based BI deployments, it has a major deficiency: it requires the IT department to open a port in the corporate firewall to support incoming queries. If an organization is worried enough about data security to insist on keeping its data locally, its IT department will kill the project as soon as it recognizes the security vulnerability this approach presents.


Scenario #4. Data Warehouse in the Cloud. The final scenario is to put the entire data warehousing environment in the cloud. (See figure 4.) Today, this only makes sense if all your operational applications also run in the cloud. Obviously, this scenario only applies to a few companies, namely internet startups that have fully embraced cloud computing for all application processing. However, these companies have to manage all the problems associated with the public cloud (i.e., security, reliability, availability, and vendor viability). At some point in the future, this architecture may prove dominant once we get past security and latency hurdles.

Figure 4. Data Warehouse in the Cloud

Summary

There are three major deployment options for cloud computing: public, private, and hybrid. As with most things in life, there is rarely a clear-cut solution. So, too, with cloud computing. Organizations will experiment with public and private clouds, and most will probably end up with a mix of both. Most data center shops have already implemented virtualization, which is the first step on the way to private clouds. Once they get comfortable with private clouds, they will soon experiment with hybrid cloud computing to support peak loads rather than spend millions on new hardware to support a few days or weeks of peak processing a year. And if the data is particularly sensitive, they may begin with a virtual private cloud inside a public cloud data center to ease their fears about security, privacy, and reliability.

When push comes to shove, economics and convenience always trump principles and ideals. This is how e-commerce overcame the security bogeyman and gained its footing in the consumer marketplace, and I suspect the same will happen with cloud computing.


Posted July 19, 2011 8:26 AM

This is the first in a four-part blog series on cloud computing for BI professionals.

There is a lot of confusion about cloud computing, even among professionals in the field. But that's true of any new, fast-moving field that spawns a raft of technologies and methods. After reading a few definitions of cloud computing that caused me to nod off at my keyboard, I created a simpler one:
Shared, online compute resources that you rent from a service provider and dynamically configure yourself.

Let's unpack this definition a bit:

  • Shared: You share compute resources with other groups or companies, even your direct competitors! Obviously, this raises security and privacy concerns.
  • Online: You access the compute resources via a Web browser or a programmatic Web application programming interface. In this respect, cloud computing delivers online "services".
  • Compute resources: Compute resources consist of the infrastructure (servers, storage, and networks), development tools, and applications. Basically, the whole stack, accessible via a Web browser or service call.
  • Rent: You only pay for what you use, and you can terminate the service at any time (although there may be some exit fees). This is value-based pricing. Cloud infrastructure vendors generally charge by the hour, while cloud software providers generally charge per user per month. (See the rough comparison after this list.)
  • Service provider: A service provider could be your internal IT department (private cloud) or an external company (public cloud).
  • Dynamically configure: Unlike traditional hardware and software, you don't purchase, install, test, tune, and maintain cloud-based resources. With cloud-based infrastructure, you simply configure a virtual image of your compute environment (hardware, storage, network) using a Web browser. With cloud-based software, you simply configure your application using a Web browser to conform with your branding and workflow requirements.
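
As a rough illustration of the two pricing styles (hourly infrastructure versus per-user-per-month software), here is a quick comparison; the rates and user counts are invented for the example, not quoted from any vendor:

    # Hypothetical rates -- real cloud pricing varies by vendor, region, and tier.
    iaas_rate_per_hour = 0.50          # e.g., one mid-sized virtual server
    saas_rate_per_user_month = 30.00   # e.g., one BI dashboard seat

    hours_per_month = 24 * 30
    users = 25

    iaas_monthly = iaas_rate_per_hour * hours_per_month
    saas_monthly = saas_rate_per_user_month * users

    print(f"IaaS server: ${iaas_monthly:,.2f}/month")   # $360.00
    print(f"SaaS seats:  ${saas_monthly:,.2f}/month")   # $750.00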

Three Services

As you probably have already surmised, cloud computing is divided into three classes of services, each of which can be applied to the business intelligence market: 1) software-as-a-service (applications), 2) platform-as-a-service (application development), and 3) infrastructure-as-a-service (compute resources). (See figure 1.)

Figure 1. Three Types of Cloud Services with BI Examples

  • Software-as-a-Service (SaaS) delivers applications. SaaS was first popularized by Salesforce.com, which was founded in 1999 to deliver online sales applications to small- and medium-sized businesses with few IT or capital resources. Salesforce.com now has 92,000 customers of all sizes and has spawned a multitude of imitators. Within the BI market, many startups and established BI players offer SaaS BI services, although the uptake of such services has been slower than expected. (See "Expectations Versus Reality in the Cloud: Understanding the Dynamics of the SaaS BI Market.") SaaS BI vendors include Birst, PivotLink, GoodData, Indicee, Rosslyn Analytics, and SAP, among others.

  • Platform-as-a-Service (PaaS) enables developers to build applications online. PaaS services provide development environments, such as programming languages and databases, so developers can create and deliver applications without having to purchase and install hardware. In the BI market, the SaaS BI vendors (above) are actually PaaS BI vendors, which is the primary reason why growth of SaaS BI is slow. Before you can consume a SaaS BI application, you have to build a data mart, which is often tedious and highly customized work since it involves integrating data from multiple, unique sources, cleaning and standardizing the data, and modeling and transforming the data. SaaS BI vendors are peddling a finished product when they are actually selling a custom PaaS development effort.

  • Infrastructure-as-a-Service (IaaS) provides online computing resources (servers, storage, and networking) that customers use to augment or replace their existing compute resources. In 2006, Amazon popularized IaaS when it began renting virtualized capacity in its own data centers to outside parties. Some BI vendors are beginning to offer software components within public cloud or hosted environments. For example, the analytic databases from Vertica and Teradata are now available as services within Amazon EC2, while Kognitio offers a hosted service. ETL vendors Informatica and SnapLogic offer services in the cloud.

    Key Characteristics of the Cloud

    Virtualization. Virtualization is the foundation of cloud computing. You can't do cloud computing without virtualization; but virtualization by itself doesn't constitute cloud computing.

    Virtualization abstracts, or virtualizes, the underlying compute infrastructure using a piece of software called a hypervisor. With virtualization, you create virtual servers (or virtual machines) to run your applications. Your virtual server can have a different operating system than the physical hardware upon which it is running. For the most part, users no longer have to worry about whether they have the right operating system, hardware, and networking to support an application. Virtualization shields them from the underlying complexity (as long as the IT department has created appropriate virtual machines for them to use).

    With virtualization, organizations can run multiple, heterogeneous virtual servers on a single physical server to maximize utilization, or they can run a single virtual server on multiple physical servers to increase scalability. Because virtualization decouples applications from the underlying hardware, IT administrators can migrate applications to new hardware without having to reinstall software. They also can spawn multiple instances of a single application using virtual servers and run them in parallel on a single physical server to improve application performance and throughput. (See figure 2.)

    Figure 2. Virtualization Use Cases
    Left: heterogeneous system images and applications run on a single server, maximizing server utilization. Middle: a single image runs across multiple physical machines, increasing scalability. Right: multiple instances of an application run in parallel on a single machine, increasing efficiency.

    In short, virtualization increases the flexibility, scalability, efficiency, and availability of data center resources, and it dramatically lowers data center costs by enabling the IT department to consolidate servers and reduce power, cooling, space, and staffing overhead.

    To the Cloud: Dynamic Provisioning

    Browser Interface. To turn virtualization into cloud computing, you need to add software that enables business users to dynamically provision their own virtual servers and use the servers as long as they desire.

    For instance, developers using a Web browser can configure a custom virtual server to support a new development and test bed. Or, they can select a virtual image (i.e., server and applications) from a library of virtual images created in advance by the IT department. Once the developers are finished using the virtual images, they "release" them. Thus, developers no longer need to submit requests to the IT department for servers, storage, and networking capacity. They either configure their own virtual machine or select one from a library that meets their application's processing requirements. They no longer have to wait for purchasing and legal to execute a purchase order or the IT department to install, tune, test, and deploy the systems.
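
    For readers who prefer code to clicks, here is a minimal sketch of the same provision-and-release cycle using the boto library against Amazon EC2; it assumes AWS credentials are already configured, and the AMI ID and instance type are placeholders for an image the IT department might publish to its library:

        import boto.ec2

        # Assumptions: AWS credentials are configured, and the AMI ID below is a
        # placeholder for an image published to the IT department's library.
        conn = boto.ec2.connect_to_region("us-east-1")

        # "Provision" a development/test server from a pre-approved virtual image.
        reservation = conn.run_instances("ami-12345678", instance_type="m1.large")
        server = reservation.instances[0]
        print("Provisioned:", server.id)

        # ...develop and test...

        # "Release" the server when the work is done, stopping the hourly charges.
        conn.terminate_instances(instance_ids=[server.id])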

    Services Interface. To make the leap to cloud computing, you also need a services interface so administrators can programmatically provision servers based on a schedule or events (e.g., an ETL job that begins). Administrators use Web services interfaces to support auto-scaling, failover, and backups.

    With auto-scaling, a BI administrator uses a cloud services interface to automatically provision and release virtual BI servers during the course of a day to efficiently allocate processing power among servers to support various BI workloads. For example, at 2 a.m. in a typical BI environment, the system fires up an ETL server and database server to run nightly ETL jobs, while at 4 a.m. it releases the ETL server and provisions a BI server to process and burst daily reports. At 10 a.m. it provisions an additional BI server and database server to handle peak usage. Failovers and backups work much the same way.
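
    Here is a small sketch of that daily schedule as it might look in a provisioning script; the server names and the provision/release helpers are hypothetical stand-ins for whatever cloud services interface is actually in use:

        import datetime

        # Hypothetical wrappers around a cloud provider's services interface.
        def provision(server_type): print(f"provision {server_type}")
        def release(server_type):   print(f"release {server_type}")

        # Illustrative BI workload schedule from the example above.
        SCHEDULE = {
            2:  [("provision", "etl_server"), ("provision", "database_server")],
            4:  [("release", "etl_server"), ("provision", "bi_server")],
            10: [("provision", "bi_server"), ("provision", "database_server")],
        }

        hour = datetime.datetime.now().hour
        for action, server in SCHEDULE.get(hour, []):
            provision(server) if action == "provision" else release(server)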

    Cloud Management Software. Cloud computing also requires management software to help IT administrators keep track of all the moving parts in a virtualized environment. Cloud management software enables IT administrators to define systems-level policies (e.g., security and usage), create and manage virtual images which enforce the policies, manage virtual server versions, monitor servers and performance, manage user roles and access, track usage, and manage chargebacks or accounting, among other things. There are a variety of vendors that offer cloud management software, including cloud data center providers, such as Amazon.com and Rackspace, and independent software vendors, such as Eucalyptus and RightScale.

    Multi-tenancy

    Another key characteristic of cloud computing (in particular, Software-as-a-Service) is that applications are multi-tenant, which means multiple users from different organizations share the same application code running on the same hardware. This is different from a traditional hosting or outsourcing environment, in which each customer owns or rents a dedicated set of hardware and software in the service provider's data center. The hosted model leads to a lot of wasted compute resources, since customers can use only their own resources even when other machines in the data center sit idle. In contrast, multi-tenancy makes much more efficient use of hardware and software resources, delivering economies of scale that make cloud computing an attractive business model for service providers, as long as they can attract enough customers.

    One problem with multi-tenancy is that applications must be designed from scratch to support it. Multi-tenancy creates virtual partitions within the application and database for each distinct customer. Customers usually configure the application to match their unique branding and workflow requirements. On the data side, customer data is either interleaved by row and separated using unique identifiers or partitioned into separate tables or database instances.
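
    A minimal sketch of the row-interleaving approach, using SQLite and an invented schema; the point is simply that every row carries a tenant identifier and every query filters on it:

        import sqlite3

        # Invented schema: rows from different customers share one table and are
        # separated by a tenant_id column (the "interleaved by row" approach).
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE sales (tenant_id TEXT, region TEXT, amount REAL)")
        conn.executemany(
            "INSERT INTO sales VALUES (?, ?, ?)",
            [("acme", "east", 100.0), ("acme", "west", 250.0), ("globex", "east", 75.0)],
        )

        # Every query issued on behalf of a customer filters on its tenant_id,
        # so one customer never sees another customer's rows.
        rows = conn.execute(
            "SELECT region, SUM(amount) FROM sales WHERE tenant_id = ? GROUP BY region",
            ("acme",),
        ).fetchall()
        print(rows)   # e.g., [('east', 100.0), ('west', 250.0)]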

    Legacy applications not designed for multi-tenancy have to fudge it. Either the service provider creates dedicated environments for each customer, which is highly inefficient (e.g., the old application service provider model), or it uses virtualization software to run parallel instances of each application (e.g., a virtual appliance). In some respects, the virtual appliance approach is more flexible than multi-tenancy because the virtual appliances can be ported to run on almost any hardware. (See figure 3.)

    Figure 3. Application Architectures
    Traditional on-premises software (far left) tightly couples logic and data to hardware in a LAN environment. A hosted environment (second from left) gives each customer their own dedicated hardware and software resources in a third-party data center, which they access via a virtual private network. A true multi-tenant environment (second from right) partitions a single application and database so different customers get their own unique views while sharing the same application, database, hardware, and network connection. A virtual appliance model (far right) enables legacy software not written for multi-tenancy to run parallel instances, essentially virtualizing multi-tenancy.

    SaaS BI vendors have long waged battles over whether their respective software is truly multi-tenant or not. The virtual appliance model gives legacy software vendors venturing into SaaS a more equal footing on which to compete.

    Summary

    This post defined cloud computing and discussed some of its more salient attributes. However, there are several ways to deploy the cloud, and these deployment options have significant implications for cost, security, and staffing. The next post in this series will discuss the differences between public, private, and hybrid clouds and show how an organization might architect its BI environment to leverage public cloud offerings.

    *******
    By the way, I'm once again speaking at CFO Magazine's Corporate Performance Management Conference, which is being held September 11-12 in Dallas, Texas. I'll be delivering a presentation on Monday about the future of business intelligence, using my BI Delivery Framework 2020 as the basis for the presentation. On Tuesday afternoon, I'll be delivering a half-day seminar on performance dashboards. If you are interested in registering for the all-access pass, use the code LF1000 to get a $1,000 discount. Cool!


    Posted July 11, 2011 3:56 PM

    One of the unwritten jobs of an industry analyst is to define industry terms. This is risky business because no matter what you say, most people will disagree.

    Our industry (and most industries) has a semantics problem. The most commonly used terms are always the most abused semantically. Everyone creates definitions that align with their individual perspectives. This is especially true among software vendors which must ensure that definitions harmonize with their product portfolios.

    One of the more popular terms in recent years is analytics. The root of the word is "analysis" or "analyze". Technically, to analyze something is to break it into its constituent parts. A less formal definition is to examine something critically to understand its essence or identify causes and key factors.

    Who better, then, to define analytics than an industry "analyst"? We presumably spend every day "thinking critically" about software and vendors. (This is also a wonderful way to justify a liberal arts education, whose primary mission is to teach students to think critically.)

    To increase my chances of gaining consensus, I'm offering two definitions of analytics. (Yes, this is wishy washy, but bear with me.) We need two definitions because every commonly used industry term has two major dimensions: an industry context and a technology context.

    So, given this context, Analytics with a capital "A" is an umbrella term that represents our industry at a macro level, and analytics with a small "a" refers to technology used to analyze data.

    Capital Analytics

    From a macro perspective, Analytics is the processes, technologies, and best practices that turn data into information and knowledge that drive business decisions and actions.

    The cool thing about such industry definitions is that you can reuse them every five years or so. (For example, I used the same definition to define "Data Warehousing" in 1995, "Business Intelligence" in 2000, and "Performance Management" in 2005.) Our industry perpetually recreates itself under a new moniker with a slightly different emphasis to expand its visibility and reenergize its base. (See my blog "What's in a Word? The Evolution of BI Semantics.")

    Today, many people use the term Analytics as a proxy for everything we do in this space. The most prominent person who defines Analytics this way is Tom Davenport, whose Harvard Business Review articles and books on the subject have prompted many executives to pursue Analytics as a sustainable source of competitive advantage. Davenport is savvy enough to know that if he had called his book "Competing on Business Intelligence" instead of "Competing on Analytics", he would not be the industry rock star that he is today. (I still prefer the term "Business Intelligence" because it perfectly describes what we do: use information to make the business run more intelligently.)

    Small Analytics

    This leaves the term "analytics" to describe various technologies that business people use to analyze data. This is a broad category of tools that spans everything from Excel, OLAP, and visual analysis tools to statistical modeling and optimization tools. There is a natural divide within these technologies so I'm tempted to create two sub-definitions: deductive analytics and inductive analytics.

    (Interestingly, all of our former capitalized terms now refer to a category of tools: data warehousing refers to data modeling and ETL tools; business intelligence refers to query and reporting tools; and performance management refers to dashboard, scorecard, and planning tools.)

    Deductive and Inductive Analytics

    With deductive analytics, business users use tools like Excel, OLAP, and visual analysis tools to explore a hypothesis. First, they make an educated guess about the root cause of some anomaly or performance alert. Then, they use analytical tools to explore the data and either verify or negate the hypothesis. If the hypothesis proves false, they come up with a new hypothesis and start looking in that direction.

    Inductive analytics is the opposite. Business users don't start with a hypothesis; they start with a business outcome or goal, such as: "Which 10% of our customers and prospects are most likely to respond to this offer?" Then, they gather historical data that they think might correlate with the desired behavior. They then use analytics to create statistical or machine learning models that they can apply to the data to prioritize their customers. In other words, they don't start with a hypothesis; they start with the data and let the analytical tools discover the patterns and anomalies for them.
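
    As a small illustration of the inductive approach, the sketch below uses scikit-learn with made-up data; the flow (fit a model on historical responses, score everyone, then take the top 10%) is what matters, not the specific features:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)

        # Made-up historical data: two behavioral features per customer and a flag
        # indicating whether they responded to a past offer.
        features = rng.normal(size=(500, 2))
        responded = (features[:, 0] + 0.5 * features[:, 1] + rng.normal(size=500) > 0.8)

        # Fit a simple propensity model on the historical outcomes.
        model = LogisticRegression().fit(features, responded)

        # Score current customers and keep the top 10% most likely to respond.
        scores = model.predict_proba(features)[:, 1]
        top_10pct = np.argsort(scores)[::-1][: len(scores) // 10]
        print("customers to target:", top_10pct[:5], "...")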

    Summary

    As the saying goes, there are many ways to skin a cat. Although I've offered two definitions of analytics (or Analytics), you are welcome to define it however you want. And you probably already have.

    But remember, words are very powerful. They are our primary modes of communication. The more people you can get to use the same meaning of things, the more power you have to communicate and get things done. (So I hope you use my definitions!)


    Posted July 8, 2011 10:29 AM


    This is part three in a four-part series on cloud computing for BI professionals.

    There are no shortcuts in business intelligence (BI). And Software-as-a-Service (SaaS) BI vendors and some of their Cloud-based customers are finding this out the hard way.

    I'm a firm believer that most computing will eventually move to the Cloud but I've been surprised that the adoption of SaaS BI services has been slower than expected. Most pureplay SaaS BI vendors today are small and struggling, and leading BI vendors no longer market their SaaS BI solutions to a significant degree (if at all.) So the question is "Why?"

    Red Herrings. The two most commonly cited obstacles to SaaS BI adoption are security and data transfer rates. The security issue is mostly a red herring, in my opinion, except at organizations with strict compliance regulations. Data can be safer in the Cloud than in many corporate data centers. In terms of data transfer rates, a majority of organizations simply don't generate enough daily data to overwhelm a reasonable internet connection. And internet speeds are getting faster and cheaper all the time. Another red herring.

    The Missing Link

    I believe there is something deeper going on. There is a fundamental flaw in the SaaS BI equation. And I think I've found it.

    But first, it's important to recognize that there is a lot to like about the Cloud. There are numerous benefits to running your applications as a service rather than on premises. There is no hardware or software to buy, install, tune, and upgrade. Consequently, there are no IT people to hire, pay, and manage. As a result, software services drive down costs and speed delivery. What's not to like?

    Preparing Data. Unfortunately, this equation doesn't add up in the BI space. That's because the hard part about delivering BI applications is not what users see--the graphical report or dashboard--it's collecting, cleaning, normalizing, integrating, and aggregating data from various systems so it can be viewed in a clear, coherent way by business users.

    Preparing data is hard, tedious work, but it's the foundation of BI. Do it right, and you can ice your cake with sweet-tasting frosting. Do it wrong or not at all, and there is no cake to ice! Too many SaaS BI vendors have been peddling the icing and downplaying the need to bake the cake, and now they're suffering. The same thing is happening with visual analysis tools, such as QlikView. They are great at handling simple data sets, but give them dirty data from complex operational systems and they fall apart. Someone, somewhere has to do (and pay for) the dirty work of preparing data or else everyone goes hungry.

    Software Services or Professional Services?

    Let me take another slight digression: What's the difference between a SaaS BI vendor and a BI consultancy? Not much.

    Custom Data Marts. On one hand, you can argue that pureplay SaaS BI vendors, such as GoodData, Indicee, Birst, and PivotLink, offer software, which consultancies don't, and that the best of them offer true multi-tenant BI services that run in a virtualized environment. But on the other hand, SaaS BI vendors, just like BI consultancies, provide professional services to build custom data marts for their customers. Like consultants, they need to gather requirements, build a data model, extract and map source data, and build reports. This is a lot of work. If you peel back the covers on many SaaS BI deployments, they are really custom consulting jobs masquerading as a software service. But that's not the end of it.

    Operational Management. Once the development work is done, BI consultancies go home or move on to the next job, but SaaS BI vendors have to stick around and run the BI environment, just like an in-house IT staff would. They have to schedule and execute jobs to extract and clean data and then transform and load it into the data mart. They have to manage change control and error processes, troubleshoot problems, and staff a help desk to answer any questions customers might have. And before they can upgrade their software, they need to test every customization that they've built for every customer (which undermines one of the major benefits of Cloud-based services: rapid delivery of software upgrades).

    Fixed Costs. Adding insult to injury, before SaaS BI vendors can begin collecting money, they have to build out and staff a highly secure and scalable data center that offers full backup/recovery, failover, and disaster recovery services. Customers have been trained to demand the highest level of IT platform and administration services possible from a Cloud or hosting vendor, even though many would not pay for the same level of services in their own data centers.

    Subscription Pricing. Obviously, all of this involves a lot of work and is very expensive. So you would think that SaaS BI vendors command premium prices, right? Well, not really. In fact, mostly the opposite. Customers pay only for what they use on a monthly or annual basis and they can cancel their subscription at any time (although there may be exit fees.) Compared to on-premise software where vendors get all their money upfront, SaaS BI vendors have to wait several years before they accrue a comparable sum. But, in the meantime, they have to finance an expensive technical and organizational infrastructure that requires large upfront capital outlays and ongoing expenditures. In short, the business model for SaaS BI just doesn't work.

    Wrong Audience? SaaS BI vendors have backed themselves into this corner by touting their services as low cost, easy to use, and fast to deploy. They've had a receptive audience among the unwashed masses of small- and medium-sized businesses that have little or no IT budget or staff and little knowledge of BI. They've also done well selling to department heads at large companies that have clamped down hard on IT budgets. So, SaaS BI vendors have done a good job of selling an information-rich vision to data-hungry business people who have few capital dollars, tight budgets, and minimal understanding of BI.

    Unfortunately, unlike on premises software vendors, SaaS BI vendors have to back up their claims. They can't sell a promise and then vacate the premises. They have to live daily with the expectations that they've created among their customers who demand low-cost, high-speed delivery of robust BI services. So SaaS BI vendors are stuck between a rock and a hard place: it costs more money to deliver SaaS BI solutions than customers seem to be willing to pay for them.

    Market Strategies - The Way Forward

    As I see it, SaaS BI vendors have five options to extricate themselves from this pickle:

    1) Consult. If SaaS BI vendors want to deliver a complete BI solution that solves real business problems, they should shift from selling software services to professional services, and compete head-on with BI consultancies. SaaS BI vendors would have several advantages here:
    -- SaaS BI vendors can not only develop custom solutions, they can run them. And they can do so in a cost-effective (but not inexpensive) way due to the economies of scale of a virtualized, hosted infrastructure.
    -- They can also develop solutions faster than BI consultancies because they can leverage prebuilt software, models, and metrics built for other customers (although veteran consultancies will also have at least prebuilt models and metrics to contribute to a project.)

    I haven't come across any SaaS BI vendor that is taking this approach overtly, although many are doing so in practice. Perhaps the closest is SAP BusinessObjects OnDemand.

    2) Simplify and Shift. Another approach is for SaaS BI vendors to strip out all the custom work from the equation by making the application as simple as possible, shifting the burden of uploading, modeling, and mapping data to the customer. In other words, the SaaS BI vendor does the easy stuff and the customer does the hard stuff.

    The challenge here is making the modeling and mapping tools both easy to use and suitably sophisticated. This is a devilish tradeoff and, in most cases, a SaaS BI vendor will side with simplicity rather than power and flexibility. This means that their customers will likely hit the wall with such tools once they want to do something complex. And if the application is really simple, then it is probably more cost effective to build it in Excel than in the Cloud.

    Of all the SaaS BI vendors, Indicee seems to be following this path most closely.

    3) Package and Configure. Another way to minimize the amount of custom development is to deliver packaged analytic applications that come with canned but configurable data mappings, data models, metrics, and reports. The mappings extract, transform, and load data from a specific source application (e.g., Salesforce.com) to a target data model with predefined dimensions, hierarchies, and metrics. Packaged analytic applications streamline development and accelerate deployment.
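
    To make "canned but configurable" concrete, here is a hypothetical sketch of what such a packaged mapping might look like as configuration; the source fields, target tables, and metrics are invented for illustration:

        # Hypothetical packaged-application configuration: a canned mapping from one
        # source application to a predefined target model, with a few knobs exposed.
        SALES_PIPELINE_PACKAGE = {
            "source_application": "crm_system",          # e.g., a hosted CRM
            "mappings": {
                "Opportunity.Amount":    "fact_pipeline.amount",
                "Opportunity.CloseDate": "fact_pipeline.close_date",
                "Account.Region":        "dim_account.region",
            },
            "dimensions": ["account", "region", "sales_stage", "time"],
            "metrics": {
                "pipeline_value": "SUM(fact_pipeline.amount)",
                "win_rate": "closed_won / closed_total",
            },
            # Configuration, not customization: customers may adjust these knobs only.
            "configurable": {"fiscal_year_start": "02-01", "currency": "USD"},
        }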

    The challenge with packaged analytic applications is that they only work if the customer has the same source application that the package supports and can live with the canned reports, dashboards, and metrics with some modification. Packages typically fall apart when customers want to customize rather than configure the application, or when they want to extract data from more than one source application to feed the canned data models and reports. At that point, the implementation becomes a custom consulting engagement. The key to making the packaged approach work is for vendors to build out a sizable portfolio of applications that meet the majority of customers' needs out of the box. This obviously takes time and long-term investment.

    PivotLink and GoodData seem to be following this approach, although GoodData claims it only packages back-end mappings to various Cloud-based applications, such as Salesforce.com, Microsoft Dynamics CRM Online, and SugarCRM. (And most of its packages only source data from a single Cloud-based application.) GoodData reportedly leaves the front-end fully customizable although they offer rich templates that embed metrics and reports for each source application. In essence, GoodData delivers a series of packaged operational reports for various Cloud-based applications.

    4) Go On Premises. Another option is to abandon the Cloud, either in part or in full, and deliver software as an on-premises solution. Here, the SaaS BI vendor gets its money upfront and leaves the customer with the responsibility of managing its data and delivering a BI solution. However, if the vendor also maintains a SaaS BI service, it can use the cost differential between its on-premises and Cloud-based offerings to educate customers about the true expense of building and maintaining a BI solution. This might push customers to purchase the SaaS BI service if they don't want the hassle of building a solution themselves.

    The challenge here is that the vendor needs to offer both Cloud services and on-premises software, which is a mixed business model that might be hard to sustain. The vendor still has to maintain a large-scale data center operation while it also has to provide maintenance and support for on-premise software. The vendor will need patient investors to achieve economies of scale to support both models.

    There is a chance that Birst might follow this course so it can better compete head on with what it considers its chief rival, QlikTech.

    5) Offer a Real Software Service. Another approach is to offer a software service, not a solution service, which is what most SaaS BI vendors deliver today. A software service takes a component of a BI solution and makes it available as a service via the Cloud or a hosted environment. We have already seen database, ETL, and data quality vendors put their software in the Cloud and provide subscription-based access to it. This includes companies such as Kognitio (database), SnapLogic (ETL), and Melissa Data (data quality). These vendors don't purport to deliver a complete BI solution, only a piece of a larger puzzle.

    Conclusion

    The only way to make money in the Cloud is to have a lot of customers. The only way to get a lot of customers quickly is to give everyone the same configurable application and avoid custom development work. (A configurable application lets users customize the GUI, create unique workflows, and extend the data model.) In the Cloud, economies of scale are everything. But BI is largely a custom development effort. Unfortunately, most business customers don't realize this, and most SaaS BI vendors have done little to disabuse them of the notion. In addition, most SaaS BI vendors have underestimated the challenge of delivering robust BI services that address real business needs and are now struggling to find a sustainable business model that will deliver real profitability.

    Ultimately, the industry will figure out a way to make SaaS BI work for everyone involved. We may have to ratchet down our expectations on both sides of the equation. But there is too much value in running applications remotely in a virtualized environment for SaaS BI not to succeed in the long run.


    Posted July 1, 2011 9:43 AM