BeyeNETWORK Blogs. Copyright BeyeNETWORK 2005 - 2012 http://www.b-eye-network.com/rss/content.php

The Rise of the Crowd - Part 1

In the recently concluded Super Bowl 2012, the NY Giants won the championship. In the weeks before the game, there was a growing wave of sentiment expressed on Twitter about Eli Manning, and we all know the result.

If you have read James Surowiecki's book The Wisdom of Crowds, you will recall a famous example of the power of the crowd demonstrated by Sir Francis Galton. As the story goes, in 1906 he was visiting a livestock fair in England, where he stumbled upon an intriguing contest. An ox was put on display, and the villagers were invited to guess the animal's weight after it was slaughtered and dressed, paying 6 pence to participate. Nearly 800 people participated, but not one person hit the exact mark: 1,198 pounds. Galton collected the answers and calculated the statistical mean of these guesses from independent people in the crowd. Astonishingly, the mean of those 800 guesses was 1,197 pounds, accurate to within a fraction of a percent. This marked the first of a series of experiments conducted by scientists to demonstrate the collective intelligence of the crowd.
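
The arithmetic behind the Galton story can be sketched in a few lines of Python. The actual weight (1,198 pounds) and the crowd mean (1,197 pounds) come from the story above; the five individual guesses are hypothetical stand-ins, not Galton's data, chosen only so that their mean matches the figure quoted.

```python
from statistics import mean

ACTUAL_WEIGHT_LB = 1198  # dressed weight of the ox, as quoted in the story


def crowd_estimate(guesses):
    """Combine independent guesses by taking their arithmetic mean."""
    return mean(guesses)


def relative_error_pct(estimate, actual):
    """Absolute error of an estimate, as a percentage of the actual value."""
    return abs(estimate - actual) / actual * 100


# Hypothetical guesses (NOT Galton's data), picked so that their mean is 1,197.
guesses = [1100, 1250, 1180, 1260, 1195]

estimate = crowd_estimate(guesses)
crowd_error = relative_error_pct(estimate, ACTUAL_WEIGHT_LB)
individual_errors = [relative_error_pct(g, ACTUAL_WEIGHT_LB) for g in guesses]

print(f"crowd estimate: {estimate} lb, off by {crowd_error:.3f}%")
print(f"best individual guess is off by {min(individual_errors):.3f}%")
```

In this toy data set no single guess is exact, yet the mean lands closer to the true weight than any individual does, which is the effect the story describes.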

What this suggests is that when you put a group of smart people to work on a problem, any problem, the chances of finding a solution are much higher than when a single person tries to do the same. Today the same type of contest is held by companies such as Kaggle, 99Designs, Innocentive, CrowdAnalytix and many others, where statisticians and analytics experts compete to solve such problems.

What is the use of these contests and these business models? Well, there are several benefits:

  • A crowd can often solve the problem both better and faster
  • The open innovation platform provides you access to more experts than any consulting expertise can provide
  • Costs can be better managed in an open contest where the solution has a fixed price and timeline
And the list goes on. We will look at the challenges that arise in this area in tomorrow's blog.

The topic is deep and wide. Next week at TDWI Las Vegas, I'm hosting a night school session on this subject; feel free to attend.


http://www.b-eye-network.com/blogs/krishnan/archives/2012/02/the_rise_of_the.php Tue, 7 Feb 2012 18:15:02 MST
Two Markets for Big Data: Comparing Value Propositions

Editor's note: This is part III in a multi-part series on Big Data. Part II is titled "Big Data Liberation Theology."

There are two types of Big Data in the market today. There is open source software, centered largely around Hadoop, which eliminates upfront licensing costs for managing and processing large volumes of data. And then there are new analytical engines, including appliances and column stores, which provide significantly higher price-performance than general purpose relational databases, which have dominated the market for three decades. Both sets of Big Data software deliver higher returns on investment than previous generations of data management technology, but in vastly different ways.

Hadoop

Free Software. Hadoop is an open source framework, built around a distributed file system and available through the Apache Software Foundation, that is capable of storing and processing large volumes of data in parallel across a grid of commodity servers. Hadoop emanated from large internet providers, such as Google and Yahoo, who needed a cost-effective way to build search indexes. They knew that traditional relational databases would be prohibitively expensive and technically unwieldy, so they came up with a low-cost alternative that they built themselves and eventually gave to the Apache Software Foundation so others could benefit from their innovations.

Today, many companies are implementing Hadoop software from Apache as well as third party providers, such as Cloudera, Hortonworks, EMC, and IBM. Developers see Hadoop as a cost-effective way to get their arms around large volumes of data that they've never been able to do much with before. For the most part, companies use Hadoop to store, process, and analyze large volumes of Web log data so they can get a better feel for the browsing and shopping behavior of their customers.
Previously, most companies outsourced the analysis of their clickstream data or simply let it "fall on the floor" since they didn't have a way to process it in a timely and cost-effective way.

Data Agnostic. Besides being free, the other major advantage of Hadoop software is that it's data agnostic. It can handle any type of data. Unlike a data warehouse or traditional relational database, Hadoop doesn't require administrators to model or transform data before they load it. With Hadoop, you don't define a structure for the data; you simply load and go. This significantly reduces the cost of preparing data for analysis compared to what happens in a data warehouse. Most experts assert that 60% to 80% of the cost of building a data warehouse, which can run into the tens of millions of dollars, involves extracting, transforming, and loading (ETL) data. Hadoop virtually eliminates this cost.

As a result, many companies are starting to use Hadoop as a general purpose staging area and archive for all their data. So, a telecommunications company can store 12 months of call detail records instead of aggregating that data in the data warehouse and rolling the details to offline storage. With Hadoop, they can keep all their data online and eliminate the cost of data archival systems. They can also let power users query Hadoop data directly if they want to access the raw data or can't wait for the aggregates to be loaded into the data warehouse.

Hidden Costs. Of course, nothing in technology is ever free. When it comes to processing data, you either "pay the piper" upfront, as in the data warehousing world, or at query time, as in the Hadoop world. Before querying Hadoop data, a developer needs to understand the structure of the data and all of its anomalies. With a clean, well understood, homogenous data set, this is not difficult. But most corporate data doesn't fit this description.
So a Hadoop developer ends up playing the role of a data warehousing developer at query time, interrogating the data and making sure its format and content match their expectations. Querying Hadoop today is a "buyer beware" environment.

Moreover, to run Big Data software, you still need to purchase, install, and manage commodity servers (unless you run your Big Data environment in the Cloud, say through Amazon Web Services). While each server may not cost a lot, collectively the price adds up.

But what's more costly is the expertise and software required to administer Hadoop and manage grids of commodity servers. Hadoop is still bleeding edge technology and few people have the skills or experience to run it efficiently in a production environment. These folks are hard to find, and they don't come cheap. Members of the Apache Software Foundation admit that Hadoop's latest release is equivalent to version 1.0 software, so even the experts have a lot to learn since the technology is evolving at a rapid pace. But nonetheless, Hadoop and its NoSQL brethren have opened up a vast new frontier for organizations to profit from their data.

Analytic Platforms

The other type of Big Data predates Hadoop and NoSQL variants by several years. This version of Big Data is less a "movement" than an extension of existing relational database technology optimized for query processing. These analytical platforms span a range of technology, from appliances and columnar databases to shared nothing, massively parallel processing databases. The common thread among them is that most are read-only environments that deliver exceptional price-performance compared to general purpose relational databases originally designed to run transaction processing applications.

Teradata laid the groundwork for the analytical platform market when it launched the first analytical appliance in the early 1980s. Sybase was also an early forerunner, shipping the first columnar database in the mid 1990s.
Netezza kicked the current market into high gear in 2003 when it unveiled a popular analytical appliance, and was soon followed by dozens of startups. Recognizing the opportunity, all the big names in software and hardware--Oracle, IBM, Hewlett-Packard, and SAP--subsequently jumped into the market, either by building or buying technology, to provide purpose-built analytical systems to new and existing customers.

Although the price tag of these systems often exceeds a million dollars, customers find that the exceptional price-performance delivers significant business value, in both tangible and intangible form. For example, XO Communications recovered $3 million in lost revenue from a new revenue assurance application it built on an analytical appliance, even before it had paid for the system! It subsequently built or migrated a dozen applications to run on the new purpose-built system, testifying to its value.

Kelley Blue Book purchased an analytical appliance to run its data warehouse, which was experiencing performance issues, giving the provider of online automobile valuations a competitive edge. For instance, the new system reduces the time needed to process hundreds of millions of automobile valuations from one week to one day, among other things. Kelley Blue Book now uses the system to analyze its Web advertising business and deliver dynamic pricing for its Web ads.

Challenges. Given the upfront costs of analytical platforms, organizations usually undertake a thorough evaluation of these systems before jumping on board.

First, companies must assess whether an analytical platform outperforms their existing data warehouse database to a degree that warrants migration and retraining costs. This requires a proof of concept (POC) in which customers test the systems in their own data center using their own data across a range of queries. The good news is that the new analytical platforms usually deliver jaw-dropping performance for most queries tested.
In fact, many customers don't believe the initial results and rerun the queries to make sure that the results are valid.

Second, companies must choose from more than two dozen analytical platforms on the market today. For instance, they must decide whether to purchase an appliance or a software-only system, a columnar database or an MPP database, or an on-premise system or a Web service. Evaluating these options takes time and many companies create a short-list that doesn't always contain comparable products.

Finally, companies must decide what role an analytical platform will play in their data warehousing architectures. Should it serve as the data warehousing platform? If so, does it handle multiple workloads easily or is it a one-trick pony? If the latter, what applications and data sets make sense to offload to the new system? How do you rationalize having two data warehousing environments instead of one?

Today, we find that companies which have tapped out their SQL Server or MySQL data warehouses often replace them with analytical platforms to get better performance. However, companies that have implemented an enterprise data warehouse on Oracle, Teradata, or IBM often find that the best use of analytical platforms is to sit alongside the data warehouse and offload existing analytical workloads or handle new applications. This architecture helps organizations avoid a costly upgrade to their data warehousing platform, which might easily exceed the cost of purchasing an analytical platform.

Summary

The Big Data movement consists of two separate, but interrelated, markets: one for Hadoop and open source data management software and the other for purpose-built SQL databases optimized for query processing. Hadoop avoids most of the upfront licensing and loading costs endemic to traditional relational database systems. However, since the technology is still immature, there are hidden costs that have thus far kept many Hadoop implementations experimental in nature.
On the other hand, analytical platforms are a more proven technology, but impose significant upfront licensing fees and potential migration costs. Companies wading into the waters of the Big Data stream need to evaluate their options carefully.
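
The "pay the piper upfront or at query time" trade-off described under Hidden Costs can be illustrated with a small sketch. This is plain Python rather than Hadoop code, and the record layout, field names, and sample lines are all invented for illustration. The first function validates records once while loading (the data warehouse approach); the second stores nothing but raw lines, so every query has to parse and check them itself (the load-and-go approach).

```python
from datetime import datetime

# Hypothetical raw call detail records: caller,callee,start_time,seconds
RAW_RECORDS = [
    "555-0100,555-0199,2012-01-15T09:30:00,125",
    "555-0101,555-0142,2012-01-15T09:31:10,n/a",  # anomaly: bad duration
    "555-0102,555-0177,15/01/2012 09:32,60",      # anomaly: other date format
    "555-0103,555-0111,2012-01-15T09:40:00,300",
]


def parse_record(line):
    """Apply the expected structure to one raw line; raise ValueError if it does not fit."""
    caller, callee, start, seconds = line.split(",")
    return {
        "caller": caller,
        "callee": callee,
        "start": datetime.strptime(start, "%Y-%m-%dT%H:%M:%S"),
        "seconds": int(seconds),
    }


def load_with_schema_on_write(lines):
    """Warehouse style: validate while loading, so only clean rows are stored."""
    clean, rejected = [], []
    for line in lines:
        try:
            clean.append(parse_record(line))
        except ValueError:
            rejected.append(line)
    return clean, rejected


def total_seconds_schema_on_read(lines):
    """Load-and-go style: raw lines are stored as-is, so each query must parse and check them."""
    total, skipped = 0, 0
    for line in lines:
        try:
            total += parse_record(line)["seconds"]
        except ValueError:
            skipped += 1
    return total, skipped


clean, rejected = load_with_schema_on_write(RAW_RECORDS)    # cost paid once, upfront
total, skipped = total_seconds_schema_on_read(RAW_RECORDS)  # cost paid on every query
print(len(clean), len(rejected), total, skipped)
```

Both paths run the same parsing logic; the difference is when and how often it runs, and who has to write it.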

http://www.b-eye-network.com/blogs/eckerson/archives/2012/02/part_iii_-_two.php Mon, 6 Feb 2012 08:30:19 MST
The First Step to Becoming Data-Centric

In my last post I outlined the importance of IT making the change from being application-centric to becoming data-centric. What does this mean? And what are the steps IT should take to become data-centric? I will address this in my next series of posts.

The first step is to determine what your business data is. Occasionally, IT has performed business data investigations: exercises to determine which applications contain certain data, such as customer data, and which applications use and/or change this data. While it is necessary to know these things, this is not the way to become data-centric.

Back to Step 1 for becoming data-centric: To determine what your business data is requires that business data be (1) named; (2) defined; and (3) published. Let's look at these individually:

  • Named business data - This seems trivial yet very few organizations have a list of the data used by the business. Naming the data is the first action.
  • Defined business data - Defining the data is the second action. This means a descriptive definition of the meaning of each datum and a rigorous specification of the rules to which the datum must adhere to have a basic level of integrity; that is, the constraints on the values of the datum that allow it to be considered valid outside of further business contexts.
  • Published business data - In order to be data-centric, it is necessary to be able to interact with the universe of data used in the business. This means a business glossary, a directory of business terminology, or another definitive form of published information about the data.
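
The three actions above (named, defined, published) can be pictured as a small data structure. The following is a minimal sketch, not a product design: the class names `GlossaryEntry` and `BusinessGlossary`, the "Customer Email" datum, and its two rules are all hypothetical, chosen only to show how a name, a definition with integrity rules, and a published lookup fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GlossaryEntry:
    name: str          # (1) named
    definition: str    # (2) defined: the descriptive meaning of the datum
    # (2) defined: integrity rules every value of the datum must satisfy
    rules: List[Callable[[object], bool]] = field(default_factory=list)

    def is_valid(self, value) -> bool:
        """A value has basic integrity only if it passes every rule."""
        return all(rule(value) for rule in self.rules)


class BusinessGlossary:
    """(3) published: one definitive place to look up business data."""

    def __init__(self):
        self._entries: Dict[str, GlossaryEntry] = {}

    def publish(self, entry: GlossaryEntry) -> None:
        if entry.name in self._entries:
            raise ValueError(f"{entry.name!r} is already published")
        self._entries[entry.name] = entry

    def lookup(self, name: str) -> GlossaryEntry:
        return self._entries[name]

    def names(self) -> List[str]:
        return sorted(self._entries)


# Hypothetical example datum; the rules are deliberately simplistic.
glossary = BusinessGlossary()
glossary.publish(GlossaryEntry(
    name="Customer Email",
    definition="The address at which a customer has agreed to receive electronic mail.",
    rules=[lambda v: isinstance(v, str), lambda v: "@" in v],
))

entry = glossary.lookup("Customer Email")
print(entry.is_valid("pat@example.com"), entry.is_valid("not-an-address"))
```

Refusing to publish the same name twice is one possible way to keep the glossary definitive; a real governance program would decide how names are versioned and retired.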

Taking this step will require IT to implement a data discipline and governance program that commits to the published business data as the means to interact with the business community on their data and information usage and needs. Another way to think of this is as defining the semantic layer between the business and the data and information IT delivers to the business.

Using a published business glossary will also change the way in which business requirements are gathered and the way in which business applications are designed and managed. But these are topics for future posts.



http://www.b-eye-network.com/blogs/skriletz/archives/2012/02/the_first_step.php Fri, 3 Feb 2012 12:58:21 MST
The Super (Bowl) Success of Analytics

The role of analytics in sports isn't a particularly new story. Back in 2003, Michael Lewis - in his book Moneyball - chronicled how the use of analytics helped baseball's Oakland Athletics reach the playoffs on one of the game's smallest payrolls. The story became so popular that it was made into an Academy Award nominated film in 2011. While the application of analytics in football has been less publicized, one of the biggest success stories of analytics in professional football will be on display this weekend at the Super Bowl: The New England Patriots.

Before delving into analytics, two quick caveats. First, before any New York Giants fans (or Patriots haters) send any nasty emails, this post isn't an endorsement of any particular football team, but rather an attempt to examine the benefits of analytics through the lens of sports.  Second... well, go Pats!

Setting the Stage for Super Bowl Analytics

In the 1990s, the National Football League introduced two significant changes to the league: free agency and the salary cap. Free agency allowed players, once their contracts with one team had expired, to sign with another team.  The salary cap essentially limited the amount of money a team could spend on its players. The intention, and net effect, was to create parity among all the teams in the league.  In other words, in a hyper-competitive industry, all participants had relatively equal access to talent and resources.

Yet within this environment of parity, the New England Patriots have appeared in (six, counting this weekend's game) and won (three) more Super Bowls than any other franchise over the same period. Many have attributed this success, all other things being equal, to the use of analytics by the Patriots. Here are a few of the ways head coach Bill Belichick and the Patriots have been able to use analytics for competitive advantage.

Understanding the Value of Resources through Analytics

In 2000, the Patriots drafted a relatively unknown quarterback named Tom Brady in the sixth round, the 199th player selected overall. The Patriots' current roster features 18 players who weren't drafted by the NFL when they left college. In case you don't follow football, Tom Brady has gone on to become one of the greatest players of all time (according to NFL Network), and those 18 undrafted players are now playing for football's grand prize. While all teams have access to a wealth of data on these players, the Patriots have the ability to look at that data in a unique way, and fit it into their overall system. In other words, they understand the processes that drive success for their organization and have been able to quantify and evaluate the available resources that will most positively impact those processes.

Applying Analytics to Structured Processes

The rules for American football are fairly well established: there are a fixed number of players on each side (11), and a finite number of ways for one to advance the ball (pass, run or kick), yet teams are constantly finding novel ways to combine how those eleven players advance the ball on each play. The Patriots have been known to take the application of analytics in play calling to new heights, allowing them to select plays based on sound information given a myriad of factors in a given situation rather than relying on "conventional wisdom." Of course, this use of analytics doesn't always work out, but given that the Patriots have the highest winning percentage in the NFL for the past decade, this application of analytics has seemed to work for them.

Creating Long Term Value through Analytics

The Patriots are one of the most valuable franchises in the NFL according to Forbes. And while the team ranks third for overall value behind the Dallas Cowboys and the Washington Redskins, it's seen a higher increase in value (245%) over the past decade than either.  Winning certainly helps, but many attribute the Patriots' increase in value to the emphasis the team places on fan satisfaction analysis. The Patriots organization uses analytics to determine and improve the "total fan experience." They even go so far as hiring 20-25 people for each home game to make quantitative measurements of stadium food, parking, personnel and bathroom cleanliness.  Many people credit the Patriots for their "attention to detail." I think what sets them apart, however, is their analytics of the details.

What Does it All Mean?

Many organizations, in any industry, wish they were as successful as the New England Patriots. Are analytics the only factor that has led to their success? Of course not. But any organization looking to gain competitive advantage through analytics can benefit from the Patriots' example. Companies can use analytics to better understand their competitive landscape and evaluate available resources; they can apply analytics to defined processes for improved performance; and they certainly can analyze customer information for increased loyalty. Will the Patriots winning the Super Bowl help to validate the value of analytics? No... but it sure would be nice.

By Adrian Alleyne, Patriots Fan and Director of Market Research

© 2012, DecisionPath Consulting



http://www.b-eye-network.com/blogs/williams/archives/2012/02/the_super_bowl.php Thu, 2 Feb 2012 22:12:47 MST
Looking at BI in the Cloud

Last week I attended MicroStrategy World in Miami and just posted a general overview of the conference and takeaways on the BI4SMB community forum. One of the company's key strategic focuses is cloud computing, with three tiers of solution offerings that have recently been released to the market. As I won't be giving an overview of the solutions here, more details are available at http://www.microstrategy.com/cloud/.

Because MicroStrategy is traditionally an enterprise offering in relation to market penetration and overall focus, the increasing focus on cloud offerings provides more support for SMB adoption over time. After all, the promise of no upfront costs coupled with the ability to take advantage of leading BI and data warehousing technologies is valuable when evaluating the costs and benefits associated with BI deployments. Hopefully these solutions will indeed provide this value to SMBs. Currently, the free offerings can at least give organizations an idea of whether broader adoption will benefit them.

Based on security, performance, features, etc. there isn't much that can be negatively attributed to moving to the cloud. But, SMBs should be aware that when looking at the costs over time, cloud may not be as cost effective. Evaluations should include the following:

  1. Upfront hardware costs or allocating specific hardware to a BI project.
  2. Resources required to maintain hardware/software/project over time.
  3. New hardware over time to account for expansions.
  4. Yearly fee structure - how fees change based on number of users, data stored, type of user, etc.
  5. Cost comparison based on initial hardware expenditure and internal resources vs. cloud expansion over time.
  6. What features/modules are available and do these differ based on price points.
  7. What about data access and integration?
In reality, the list is close to endless. In addition to costs, organizations need to look at the effort required from business and technical staff as well as the ease of use, training required, etc.
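
The cost comparison in item 5 can be sketched as a simple running total. Every number below is a made-up placeholder, not MicroStrategy pricing or any vendor's actual fee structure; the point is only to show how a cloud subscription that starts cheaper can overtake an on-premise deployment as the user count grows.

```python
def cumulative_on_premise(years, hardware_upfront, annual_staff, expansion_per_year):
    """Running total: hardware bought upfront, then staff and expansion hardware each year."""
    totals, running = [], hardware_upfront
    for _ in range(years):
        running += annual_staff + expansion_per_year
        totals.append(running)
    return totals


def cumulative_cloud(years, fee_per_user, users_year_one, user_growth_per_year):
    """Running total of a per-user yearly subscription with a growing user count."""
    totals, running, users = [], 0, users_year_one
    for _ in range(years):
        running += fee_per_user * users
        totals.append(running)
        users += user_growth_per_year
    return totals


def first_year_cloud_costs_more(on_premise, cloud):
    """Return the first year (1-based) in which cumulative cloud cost exceeds on-premise, else None."""
    for year, (p, c) in enumerate(zip(on_premise, cloud), start=1):
        if c > p:
            return year
    return None


# Hypothetical inputs for a five-year comparison.
on_premise = cumulative_on_premise(5, hardware_upfront=60000, annual_staff=20000,
                                   expansion_per_year=5000)
cloud = cumulative_cloud(5, fee_per_user=1200, users_year_one=25,
                         user_growth_per_year=15)
crossover = first_year_cloud_costs_more(on_premise, cloud)

for year, (p, c) in enumerate(zip(on_premise, cloud), start=1):
    print(f"year {year}: on-premise {p:>7}  cloud {c:>7}")
print("cloud becomes more expensive in year:", crossover)
```

Changing the growth rate or fee in this toy model moves the crossover year, which is why the fee structure in item 4 deserves as much attention as the upfront costs in item 1.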

BI in the cloud is definitely opening up the playing field to many organizations that could not adopt broad BI in the past due to limitations. Vendors such as MicroStrategy, which base their cloud offerings on the same solution platform as their on-premise version, can help provide full BI offerings to SMBs, provided the above factors make sense.



http://www.b-eye-network.com/blogs/wise/archives/2012/01/looking_at_bi_in_the_cloud.php Tue, 31 Jan 2012 08:32:46 MST
CIOs to Business Analytics: Sorry We Neglected You

Last week, I posted “Can Organizations Get Business Analytics Strategy Right?” which looked at a recent Gartner report that predicted that more than 70% of companies wouldn’t be able to successfully connect analytics and business strategy.

Gartner followed up this report with their 2012 CIO survey. One of the interesting findings was that business intelligence/business analytics jumped back up to the number one priority for CIOs after slipping to 5th place in 2011.

There are a number of reasons why this shift occurred: virtualization projects have wrapped up, or 2011 projects were more focused on cost savings vs. growth, for example.  We thought it would be amusing to take a look at how the shift took place in the following video:

While this is a somewhat tongue in cheek look at the role analytics is playing for CIOs, and for that matter organizations in general, it does underscore a theme that the two Gartner reports keep coming back to: business analytics has the potential to play a major role in supporting and shaping business strategy.



http://www.b-eye-network.com/blogs/williams/archives/2012/01/cios_to_busines.php Thu, 26 Jan 2012 20:31:23 MST
Investment Dollars Flowing into Performance Management

Within the past week, two very different performance management vendors have received millions of dollars of outside investment. I say very different because one is generating excitement due to its application of the latest technology, and the other is getting attention because of its new approach to solving a long-standing business challenge. The companies are Tidemark and XLerant. You can read what we have recently written about each of them here and here. They are both good companies with a solid vision and very experienced teams.

Venture capital guys tend to like to get in on the next big thing. Performance management is an established big thing, so why the investments now? For one, they must expect continued growth. In addition, each of these companies does have a 'next big thing' element. Tidemark is wedding the proven principles of performance management to the next generation of technology. It is still at a relatively early stage, but the potential is huge. XLerant is approaching the crowded, but still in high demand, budgeting solutions area from a new angle. There certainly is room for both of these companies to succeed.

These investments also bode well for the established performance management vendors, as they are one more validation that performance management is an important and growing area. In particular, Adaptive Planning and Host Analytics, pioneers in bringing the latest technology to performance management, should see interest in their solutions increase, as the investment in and related coverage of Tidemark may help more companies recognize the value of this approach.

XLerant received $3 million, and Tidemark $24 million.



http://www.b-eye-network.com/blogs/schiff/archives/2012/01/investment_doll.php Wed, 25 Jan 2012 06:06:29 MST
How does decision support processing differ from transaction processing?
by Dan Power
Editor, DSSResources.com

Information systems can be categorized in many ways, but historically business information systems began as tools to record and process transactions. It is still useful to distinguish between informational decision support and transaction processing systems. Transaction processing is divided into individual, indivisible operations, called transactions. More specifically, a transaction is a discrete unit of work that a computer system must process completely; otherwise the whole transaction fails. Entering a customer order is one example of a transaction. Decision support or informational systems summarize and report on transactions.
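
The distinction can be made concrete with a small sketch using Python's built-in `sqlite3` module. The two-table schema, the item name, and the customers are hypothetical. `enter_order` is a transaction: recording the order and reducing stock either both happen or neither does. `units_ordered_by_item` is a decision support style query: it changes nothing and only summarizes the transactions already recorded.

```python
import sqlite3


def make_database():
    """Create an in-memory database with one item in stock (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stock (item TEXT PRIMARY KEY, "
        "qty INTEGER NOT NULL CHECK (qty >= 0))"
    )
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
        "item TEXT, qty INTEGER)"
    )
    conn.execute("INSERT INTO stock VALUES ('widget', 10)")
    conn.commit()
    return conn


def enter_order(conn, customer, item, qty):
    """Transaction processing: both statements succeed together or are rolled back together."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "INSERT INTO orders (customer, item, qty) VALUES (?, ?, ?)",
                (customer, item, qty),
            )
            conn.execute(
                "UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint refused a negative stock level; the order row is undone too.
        return False


def units_ordered_by_item(conn):
    """Decision support: a read-only summary of the recorded transactions."""
    return dict(conn.execute("SELECT item, SUM(qty) FROM orders GROUP BY item"))


conn = make_database()
first = enter_order(conn, "Acme", "widget", 4)   # fits within stock
second = enter_order(conn, "Bolt", "widget", 20)  # would drive stock negative
summary = units_ordered_by_item(conn)
remaining = conn.execute("SELECT qty FROM stock WHERE item = 'widget'").fetchone()[0]
print(first, second, summary, remaining)
```

The failed order leaves no trace in either table, which is what "completely processed or it fails" means in practice.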

Continue reading at
http://dssresources.com/faq/index.php?action=artikel&id=241


http://www.b-eye-network.com/blogs/power/archives/2012/01/how_does_decisi.php Sun, 22 Jan 2012 06:50:55 MST
It Is Time for IT to Become Data-Centric

This blog will investigate putting data first rather than having business data be treated as exclusive to applications. While functionality is important, software functionality is based on consistent operations on data or groups of data. The application-centric view holds this consistency within the boundaries of one or more, but not all, applications. The problems that arise from application-centric IT are well known: disparate data in data silos, making it difficult to share data across applications; inconsistent and redundant data; and the need to transform and integrate data for enterprise reporting and analytics.

A data-centric view maintains that any consistent operations on data or groups of data must hold for all applications, eliminating data transformations and making it easy to share data across applications. Putting data first means we ensure the data is what we want it to be for the business and all its business uses before we build any functionality. The consistent operations become data rules that are employed whenever an operation, such as data entry or reporting, occurs. Thus software functionality is no more than the execution of pre-defined data rules, combined with usage rules for the user interface (such as a web page, report, or query) through which a function is performed.
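
One way to picture data rules that "hold for all applications" is a single rule table consulted by every function that touches the data. The sketch below is an illustration under assumptions, not a reference design: the two data items, their rules, and the function names are hypothetical. Data entry and reporting both call the same `check` function, so neither carries its own private copy of the rules.

```python
import re

# Hypothetical data rules, defined once per datum and shared by every function.
DATA_RULES = {
    "postal_code": lambda v: isinstance(v, str) and re.fullmatch(r"\d{5}", v) is not None,
    "order_total": lambda v: isinstance(v, (int, float)) and v >= 0,
}


def check(datum, value):
    """Apply the shared rule for a datum; raise ValueError if the value breaks it."""
    if not DATA_RULES[datum](value):
        raise ValueError(f"{datum!r} rejects value {value!r}")
    return value


def enter_order(store, postal_code, order_total):
    """Data entry function: applies the shared rules before storing anything."""
    store.append({
        "postal_code": check("postal_code", postal_code),
        "order_total": check("order_total", order_total),
    })


def total_by_postal_code(store):
    """Reporting function: applies the same rules, with no separate transformation step."""
    totals = {}
    for row in store:
        code = check("postal_code", row["postal_code"])
        totals[code] = totals.get(code, 0) + check("order_total", row["order_total"])
    return totals


store = []
enter_order(store, "02139", 25.0)
enter_order(store, "02139", 10)
enter_order(store, "10001", 5)
try:
    enter_order(store, "2139", 5)  # four digits: rejected by the shared rule
    rejected = False
except ValueError:
    rejected = True

report = total_by_postal_code(store)
print(rejected, report)
```

Adding a new function in this arrangement means adding usage logic only; the rules it depends on already exist.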

The benefits of this are clear: data-centric data is standardized and correct across all applications and uses; there is easy sharing of data across applications; there is little, if any, need to transform or change data before it can be used with other data; data rules are known and applied consistently across all uses of data; all data is ready to be used for new reports, business uses, and software functionality; and, most importantly, new software functionality is purely incremental, comprised of new data, new data rules, and usage rules for new user interface components that build upon existing data, data rules, and usage rules without replicating, duplicating, or redoing any data or data rule.

While becoming data-centric may sound like an ideal that is impossible to achieve, we are at a time when the technology exists to make it possible. I will explore these technologies in future articles and blog posts. The challenge for IT is to move from its current application-centric environment, where all old applications are legacy silos and any new application immediately becomes one, to a data-centric environment where data, rules, and usage are unified, consistent, and always good-to-go. We know the problems of application-centric IT cannot be overcome easily--we live with them every day. It's time to make IT data-centric.



http://www.b-eye-network.com/blogs/skriletz/archives/2012/01/its_time_for_it.php Fri, 20 Jan 2012 08:15:00 MST
The Big Data Database Saga Continues

By now all of you have heard about Amazon's announcement of DynamoDB, its latest database, which throws together ideas from NoSQL, Cassandra, Voldemort, Riak and a lot of other tools. It is completely hosted in the cloud and scales on demand, with true elastic scalability similar to EC2. Throw a MapReduce interface on top of this and you have a Big Data database that can truly scale.

What sets DynamoDB apart in my simple tests over the past few hours is the simplicity that it brings to Big Data processing. While my tests are not complete yet, initial results are definitely encouraging. As I write this blog, I have also read DataStax's comparison of Cassandra and DynamoDB, "DataStax questions DynamoDB's performance." The comparison is a long post full of technical comparisons around operations per second, but it does not mention the cost or service provisioning of DataStax. If you look at cost, Amazon says the services start at $1 per gigabyte per month. Data transfer is free for incoming data. It's also free for the first 10 terabytes per month and between AWS services (like Elastic MapReduce and S3). Once you surpass 10 terabytes, taking data out of the service is $0.12 per gigabyte through 40 terabytes and then lower rates up to 350 terabytes. Throughput capacity is $0.01 per hour for every 10 units of write capacity and $0.01 per hour for every 50 units of read capacity.
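
To see what the storage and throughput rates quoted above add up to, here is a back-of-the-envelope sketch. It uses only the figures stated in this post (launch-era numbers, not current AWS pricing), ignores data transfer charges, and makes two simplifying assumptions of its own: a 720-hour month, and cost that scales proportionally with capacity units rather than in billing blocks.

```python
# Rates as quoted in the post above (not current AWS pricing).
STORAGE_USD_PER_GB_MONTH = 1.00
WRITE_USD_PER_HOUR_PER_10_UNITS = 0.01
READ_USD_PER_HOUR_PER_50_UNITS = 0.01


def monthly_cost(storage_gb, write_units, read_units, hours=720):
    """Estimate one month of storage plus provisioned throughput.

    Assumptions: a 30-day (720-hour) month, proportional scaling with
    capacity units, and no data transfer charges.
    """
    storage = storage_gb * STORAGE_USD_PER_GB_MONTH
    writes = (write_units / 10) * WRITE_USD_PER_HOUR_PER_10_UNITS * hours
    reads = (read_units / 50) * READ_USD_PER_HOUR_PER_50_UNITS * hours
    return {
        "storage": storage,
        "writes": writes,
        "reads": reads,
        "total": storage + writes + reads,
    }


# Hypothetical workload: 100 GB stored, dialed up and then dialed down.
busy = monthly_cost(storage_gb=100, write_units=100, read_units=500)
quiet = monthly_cost(storage_gb=100, write_units=20, read_units=100)

print(f"busy month:  ${busy['total']:.2f}")
print(f"quiet month: ${quiet['total']:.2f}")
```

In this toy workload the storage charge stays fixed while the throughput charge follows the dial, so provisioned capacity is the lever that controls cost.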

Given where several internet-based service companies have built their business models and found success, they will not hesitate to adopt the DynamoDB platform. Especially with the ability to dial scalability up and down, you can really control costs, which even on a sustained basis will be much lower than on-site provisioning for these companies. DynamoDB has beta clients like Elsevier, Formspring and SmugMug, which are definitely encouraging names.

If an organization were to choose a cloud-based services provider for Big Data, Amazon sounds like a logical choice on several fronts. But is your big data initiative internet deployable? And do you have the staffing to execute the program even if you host the data on the cloud? While you digest more content apart from this blog on DynamoDB, I will return to running more experiments and share more information in the next few days on scalability tests and consistency of the database.

There are also several NoSQL databases to compare DynamoDB against for a fair comparison at the database level.

Watch for further information on specifics.


http://www.b-eye-network.com/blogs/krishnan/archives/2012/01/the_big_data_da.php Thu, 19 Jan 2012 21:38:31 MST http://www.b-eye-network.com/blogs/krishnan/archives/2012/01/the_big_data_da.php
Let the Revolution Begin: Big Data Liberation Theology

(Editor's note: This is the second in a multi-part series on big data.)

The big data movement is revolutionary. It seeks to overturn cherished tenets in the world of data processing. And it's succeeding.

Most people define "big data" by three attributes: volume, velocity, and variety. This is a good start, but misses what's really new. (See "The Vagaries of the Three V's: What's Really New About Big Data".) At its heart, big data is about liberation and balance. It's about liberating data from the iron grip of software vendors and IT departments. And it's about establishing balance within corporate analytical architectures long dominated by top-down data warehousing approaches. Let me explain.

Liberating Data from Vendors

Most people focus on the technology of big data and miss the larger picture. They think big data software is uniquely designed to process, report, and analyze large data volumes. This is simply not true. You can do just about anything in a data warehouse that you can do in Hadoop; the major difference is cost and complexity for certain use cases.

For example, contrary to popular opinion on the big data circuit, many data warehouses can store and process unstructured and semi-structured content, execute analytical functions, and run processes in parallel across commodity servers. For instance, back in the 1990s (and maybe even still today), 3M drove its Web site via its Teradata data warehouse, which dynamically delivered Web content stored as blobs or external files. Today, Intuit and other customers of text mining tools parse customer comments and other textual data into semantic objects that they store and query within a SQL database. Kelley Blue Book uses parallelized fuzzy matching algorithms baked into its Netezza data warehouse to parse, standardize, and match automobile transactions derived from auction houses and other data sources.
In fact, you can argue that Google and Yahoo could have used SQL databases to build their search indexes instead of creating Hadoop and MapReduce. However, they recognized that a SQL approach would have been wildly expensive because of database licensing costs due to the data volumes. They also realized that SQL isn't the most efficient way to parse URLs from millions of Web pages and such an approach would have jacked up engineering costs.

Open Source. The real weapon in the big data movement is not a technology or data processing framework; it's open source software. Rather than paying millions of dollars to Oracle or Teradata to store big data, companies can download Apache Hadoop and MapReduce for free, buy a bunch of commodity servers, and store and process all the data they want without having to pay expensive software licenses and maintenance fees and fork over millions to upgrade these systems when they need additional capacity.

This doesn't mean that big data is free or doesn't carry substantial costs. You still have to buy and maintain hardware and hire hard-to-find data scientists and Hadoop administrators. But the bottom line is that it's no longer cost prohibitive to store and process hundreds of terabytes or even petabytes of data. With big data, companies can begin to tackle data projects they never thought possible.

Counterpoint. Of course, this threatens most traditional database vendors who rely on a steady revenue stream of large data projects. They are now working feverishly to surround and co-opt the big data movement. Most have established interfaces to move data from Hadoop to their systems, preferring to keep Hadoop as a nice staging area for raw data, but nothing else. Others are rolling out Hadoop appliances that keep Hadoop in a subservient role to the relational database.
Still others are adopting open source tactics and offering scaled down or limited use versions of their databases for free, hoping to lure new buyers and retain existing ones.

Of course, this is all good news for consumers. We now have a wider range of offerings to choose from and will benefit from the downward price pressure exerted by the availability of open source data management and analytics software. Money ultimately drives all major revolutions, and the big data movement is no different.

Liberating Data from IT

The big data revolution not only seeks to liberate data from software vendors, it wants to free data from the control of the IT department. Too many business users, analysts, and developers have felt stymied by the long arm of the IT department or data warehousing team. They now want to overthrow these alleged "high priests" of data who have put corporate data under lock and key for architectural or security reasons or who take forever to respond to requests for custom reports and extracts. The big data movement gives business users the keys to the data kingdom, especially highly skilled analysts and developers.

Load and Go. The secret weapon in the big data arsenal is something that Amr Awadallah, founder and CTO at Cloudera, calls "schema at read." Essentially, this means that with Hadoop you don't have to model or transform data before you query it. This cultivates a "load and go" environment where IT no longer stands between savvy analysts and the data. As long as analysts understand the structure and condition of the raw data and know how to write Java MapReduce code or use higher level languages like Pig or Hive, they can access and query the data without IT intervention (although they may need permission).

For John Rauser, principal engineer at Amazon.com, Hadoop is a godsend. He and his team are using Hadoop to rewrite many data intensive applications that require multiple compute nodes.
During his presentation at Strata Conference in New York City this fall, Rauser touted Hadoop's ability to handle myriad applications, both transactional and analytical, with both small and large data volumes. His message was that Hadoop promotes agility. Basically, if you can write MapReduce programs, you can build anything you want quickly without having to wait for IT. With Hadoop, you can move as fast or faster than the business.

This is a powerful message. And many data warehousing chieftains have tuned in. Tim Leonard, CTO of US Xpress, loves Hadoop's versatility. He has already put it into production to augment his real-time data warehousing environment, which captures truck engine sensor data and transforms it into various key performance indicators displayed on near real-time dashboards. Also, a BI director for a well-known internet retailer uses Hadoop as a staging area for the data warehouse. He encourages his analysts to query Hadoop when they can't wait for data to be loaded into the warehouse or need access to detailed data in its raw, granular format.

Buyer Beware. To be fair, Hadoop today is a "buyer beware" environment. It is beyond the reach of ordinary business users, and even many power users. Today, Hadoop is agile only if you have lots of talented Java developers who understand data processing and an operations team with the expertise and time to manage Hadoop clusters. This steep expertise curve will diminish over time as the community refines higher level languages, like Hive and Pig, but even then, it's still pretty technical.

In contrast, a data warehouse is designed to meet the data needs of ordinary business users, although they may have to wait until the IT team finishes "baking the data" for general consumption. Unlike Hadoop, a data warehouse requires a data model that enforces data typing and referential integrity and ensures semantic consistency.
It preprocesses data to detect and fix errors, standardize file formats, and aggregate and dimensionalize data to simplify access and optimize performance. Finally, data warehousing schemas present users with simple, business views of data culled from dozens of systems that are optimized for reporting and analysis. Hadoop simply doesn't do these things, nor should it.

Restoring Balance

Clearly, big data and data warehousing environments are very different animals, each designed to meet different needs. They are the yin and yang of data processing. One delivers agility, the other stability. One unleashes creativity, the other preserves consistency. Thus, it's disconcerting to hear some in the big data movement dismiss data warehousing as if it's an inferior and antiquated form of data processing best preserved in a computer museum but not used for genuine business operations.

Every organization needs both Hadoop and data warehousing. These two environments need to work synergistically together. And it's not just that Hadoop should serve as a staging area for the data warehouse. That's today's architecture. Hadoop will grow beyond this to become a full-fledged reporting and analytical environment as well as a data processing hub. It will become a rich sandbox for savvy business analysts (whom we now call data scientists) to mine mountains of data for million dollar insights and answer unanticipated or urgent questions that the data warehouse is not designed to handle.

Summary

Thomas Jefferson once said, "The tree of liberty must be refreshed from time to time with the blood of patriots and tyrants." He was referring to the natural process by which political and social structures stagnate and stratify. This principle holds true for many aspects of life, including data processing. For too long, we've tried to shoehorn all analytical pursuits into a single top-down data delivery environment that we call a data warehouse.
This framework is now creaking and groaning from the strain of carrying too much baggage. It's time we liberate the data warehouse to do what it does best, which is deliver consistent, non-volatile data to business users to answer predefined questions and populate key performance indicators within standard corporate and departmental dashboards and reports.

It's gratifying to see Hadoop come along and shake up data warehousing orthodoxy. The big data movement helps clarify the strengths and limitations of data warehousing and underscore its role within an analytics architecture. And this leaves Hadoop and NoSQL technologies to do what they do best, which is provide a cost-effective, agile development environment for processing and querying large volumes of unstructured data. The organizations that figure out how to harmonize these environments will be the data champions of tomorrow.
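
For readers who want to see what "load and go" looks like in miniature, here is a small Python sketch of the "schema at read" idea described in this article. It is not Hadoop code. The raw log, the column names, and the helper functions are all hypothetical and exist only for illustration. The point is that the raw text is stored untouched, and column names and types are applied only at the moment someone queries it.

```python
import csv
import io

# Raw data is stored exactly as it arrived; nothing was modeled up front.
# (Hypothetical sample: day, truck id, engine state, minutes in that state.)
RAW_LOG = """\
2012-01-19,truck-17,idle,42
2012-01-19,truck-09,moving,0
2012-01-20,truck-17,idle,15
"""

# The "schema" lives with the query, not with the stored data.
SCHEMA = [("day", str), ("truck", str), ("state", str), ("minutes", int)]


def read_with_schema(raw_text, schema):
    """Apply column names and types only when the data is read."""
    for row in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, row)}


def idle_minutes_by_truck(raw_text):
    """Total idle minutes per truck, computed straight from the raw text."""
    totals = {}
    for rec in read_with_schema(raw_text, SCHEMA):
        if rec["state"] == "idle":
            totals[rec["truck"]] = totals.get(rec["truck"], 0) + rec["minutes"]
    return totals


if __name__ == "__main__":
    print(idle_minutes_by_truck(RAW_LOG))
```

A different analyst could read the same raw text with a different schema tomorrow, which is the agility argument. The flip side is the "buyer beware" point: nothing checked the data types or cleaned the records before they were stored.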

http://www.b-eye-network.com/blogs/eckerson/archives/2012/01/let_the_revolut.php Thu, 19 Jan 2012 09:50:46 MST http://www.b-eye-network.com/blogs/eckerson/archives/2012/01/let_the_revolut.php
The Vagaries of the Three Vs: What Is Really Unique About Big Data

(Editor's note: this is the first article in a multi-part series on big data.)

Most people define "big data" by three attributes: volume, velocity, and variety. These describe the main characteristics of big data, but aren't exclusive to it. Many data warehouses today exhibit these same characteristics. This article drills into these attributes and shows what's common and not between data warehousing and big data environments.

Volume. Data volume is a slippery term. Many observers have noted this. What's large for some organizations is small for others. So, experts now define the term as "data that is no longer easy to manage." This is still pretty squishy as far as definitions go.

From a historical context, data warehousing has always been about "big data." The real difference is scale and scope, which have been growing steadily for years. In the 1990s, high-end data warehouses contained hundreds of gigabytes and then terabytes. Today, they have breached the petabyte range, and surely will ascend to exabytes sometime in the future.

Does that make a data warehouse a big data initiative? Not really. The big data movement today is largely about using open source data management software to cost effectively capture, store, and process semi-structured Web log data for a variety of tasks. (See "Let the Revolution Begin: Big Data Liberation Theology.") While data warehousing is focused solely on delivering structured data for reporting and analysis applications, the big data movement has broader implications. Hadoop and NoSQL can manage any type of data (structured, semi-structured, and unstructured) for virtually any type of application (analytical or transactional).

Velocity. If you have big data, by default you have to load it in real-time using streaming or mini-batch load intervals. Otherwise, you can never keep up. This is nothing new.
Most data warehousing teams have already converted from weekly and nightly batch refreshes to mini-batch cycles of 15 minutes or less that insert only deltas using change data capture and trickle feeding techniques. Hadoop and NoSQL databases are also evolving from batch loading of data to streaming it in real-time.

Some organizations also embrace real-time loading to meet operational business needs. For example, 1-800 CONTACTS displays orders and revenues in a data warehouse-driven dashboard updated every 15 minutes. US Xpress tracks idle time of its trucks by capturing sensor data from truck engines fed into a data warehouse that drives several real-time dashboards. Currently, most big data installations don't support real-time reporting environments, but the technology is evolving fast and this capability will soon become standard fare.

Variety. Variety generally refers to the ability to capture, store, and process multiple types of data. This is perhaps the biggest differentiator between data warehousing and big data environments. Hadoop is agnostic about data type and format. Just dump your data into a file and then write a Java program to get it out. For example, a Hadoop cluster can store Twitter and Facebook data, audio and video, documents and transactions, and so on.

In addition, the same Hadoop file can contain a jumble of different records--or key value pairs--each representing different entities or attributes. Although you can also do this in a columnar database, it's standard fare for Hadoop. (This is the "complexity" attribute that some industry observers add as a fourth attribute of big data.) Mixing record types puts the onus on the developer/analyst to sort through the records to find only the ones they want, which presumes foreknowledge about record types and identifiers. This would never fly in a data warehouse. The SQL would grok.
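
To illustrate the point about mixed record types, here is a minimal Python sketch, not actual Hadoop or MapReduce code. The sample records, the "type" field, and the helper name are hypothetical. It shows how the reader of the file, not the storage layer, has to know which identifier marks the records of interest and has to step around anything that doesn't parse.

```python
import json

# One "file" holding a jumble of record types, plus a line that isn't
# even valid JSON. (Hypothetical sample data for illustration only.)
MIXED_FILE = [
    '{"type": "tweet", "user": "a", "text": "hello"}',
    '{"type": "order", "order_id": 7, "amount": 19.5}',
    '{"type": "tweet", "user": "b", "text": "big data"}',
    'not json at all',
]


def records_of_type(lines, wanted):
    """Yield only the parsed records whose 'type' field matches `wanted`."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # the storage layer never rejected this line; we skip it
        if isinstance(rec, dict) and rec.get("type") == wanted:
            yield rec


if __name__ == "__main__":
    for tweet in records_of_type(MIXED_FILE, "tweet"):
        print(tweet["user"], tweet["text"])
```

The filter works only because the analyst already knows that a "type" field exists and what its values mean, which is the foreknowledge the paragraph above refers to.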
Summary

The three Vs provide a reasonable map to the big data landscape as long as you don't dig too deeply into the details. There, you'll find there is considerable overlap with traditional data warehousing techniques. The real difference between the two environments is that big data is better suited to handling a variety of data (i.e., unstructured and complex data) than a data warehouse, which is designed to work with standardized, non-volatile data.

http://www.b-eye-network.com/blogs/eckerson/archives/2012/01/the_vagaries_of.php Thu, 19 Jan 2012 09:43:28 MST http://www.b-eye-network.com/blogs/eckerson/archives/2012/01/the_vagaries_of.php
Responding to the Nontechnical Challenges of Business Analytics

Can Organizations Get Analytics Strategy Right?

In a new report released by Gartner last week, the most salient finding was that more than 70% of BI initiatives will consist of analytics metrics that lack synchronicity with overarching business strategy.  According to the report: "Organizations often develop and deploy hindsight-oriented reports and/or query applications focusing on metrics that users may find interesting, but they don't represent the operational or strategic controls used to facilitate business performance." The report's author, Andreas Bitterer, goes on to say "The immediate future of the BI landscape is one of a disconnect between marketing hype about pressing challenges on the one hand and reality on the other."

I'm not as quick to castigate the marketing function of major BI vendors for building the hype around BI technology; some might argue that many analysts have been equally enamored with what's next on the horizon for BI technology rather than on the business problems BI can help solve. 

The Real Disconnect Between Business Analytics and Business Process

The disconnect Gartner talks about is something we've seen with our clients for years.  What's interesting is that the report predicts that this disconnect will be widespread (over 70%) for the foreseeable future. We've been evangelists and advocates for making the connection between business analytics and business for bottom-line impact for close to a decade, yet the issue persists.

The real disconnect comes from the perception among business users that analytics is primarily a technology tool. Our own research shows that many business users think they understand and have adequate analytics to support their core business processes; while nearly three-quarters of business users we surveyed indicate that they use traditional reporting, only 40% report using advanced or predictive analytics. Another example of this disconnect is the fact that these business users prioritize other business initiatives over analytics.  In other words, they aren't able to connect BI with true business value.

Making the Business Analytics to Business Process Connection

I suppose the good news for any organization looking to leverage analytics to improve business performance is that most of their peers aren't doing it...yet. But making the connection doesn't happen overnight.  It takes an assessment of the organization's state of readiness to leverage business analytics.  Is there a culture around process improvement and analytically driven decision making?  Is there a partnership between IT and business (e.g., is the CIO business savvy, and are departmental executives such as the CFO and COO technically savvy)? Is the IT and analytics infrastructure in place to allow the organization to leverage analytics?

If the answer is no for any of the above, then some remediation has to occur to even have a chance to move out of Gartner's "misaligned" 70%. To move into the ranks of the minority of companies that use analytics for competitive advantage, an organization must then look at analytics from the top down, knowing the answers to some basic business questions:

  1. What competitive and external factors influence my business?
  2. What business strategy do I employ to compete in this environment?
  3. What business processes drive this business strategy?
  4. How do I measure the success of these business processes?

With the answers to these questions in place, an organization's analytics team (and by team, I mean a collaboration between IT and line of business leadership) can then start to look at opportunities where analytics can improve how these business processes are measured. This in turn allows one to do things like provide better root cause analysis and predictive analytics.

So...Can Organizations Get Analytics Strategy Right?

While I agree with Bitterer's observation about where companies are right now in terms of business and analytics alignment, I'm not sure that I'd agree that the majority wouldn't be able to get there by 2014. Organizations just need to take the time to establish the process of understanding their readiness to leverage analytics, and then determine the opportunities that business analytics can offer.

By Adrian Alleyne, Director of Market Research
© DecisionPath Consulting, 2012



http://www.b-eye-network.com/blogs/williams/archives/2012/01/responding_to_t.php Wed, 18 Jan 2012 17:22:44 MST http://www.b-eye-network.com/blogs/williams/archives/2012/01/responding_to_t.php
Big Data, Big Mistakes?
Now, I may be accused of getting up on my soap box in this first post of 2012, but... a few recent articles on the topic of big data / predictive analytics have really got me thinking. Well, worrying, to be more precise. My worry is that there seems to be a growing belief in the somehow magical properties of big data and a corresponding deification of those on the leading edge of working with big data and predictive analytics. What's going on?

The first article I came across was "So, What's Your Algorithm?" by Dennis Berman in the Wall Street Journal. He wrote on January 4th, "We are ruined by our own biases. When making decisions, we see what we want, ignore probabilities, and minimize risks that uproot our hopes. What's worse, 'we are often confident even when we are wrong,' writes Daniel Kahneman, in his masterful new book on psychology and economics called 'Thinking, Fast and Slow.' An objective observer, he writes, 'is more likely to detect our errors than we are.'"

I've read no more than the first couple of chapters of Kahneman's book (courtesy of Amazon Kindle samples), so I don't know what he concludes as a solution to the problem posed above--that we are deceived by our own inner brain processes. However, my intuitive reaction to Berman's solution was visceral: how can he possibly suggest that the objective observer advocated by Kahneman could be provided by analytics over big data sets? In truth, the error Berman makes is blatantly obvious in the title of the article... it always is somebody's algorithm.



http://www.b-eye-network.com/blogs/devlin/archives/2012/01/big_data_big_mi.php Mon, 16 Jan 2012 08:28:55 MST http://www.b-eye-network.com/blogs/devlin/archives/2012/01/big_data_big_mi.php
Evolving BI Roles: From Data Experts to Decision Experts

Most business intelligence (BI) methodologies feature a circular workflow that might include the following steps: collect, integrate, report, analyze, decide, act. Unfortunately, these information technology (IT) centric workflows overlook the most important parts of the decision making process: collaborate and review.

Collaborate

Most people don't make decisions in a vacuum; they share ideas, options, and perspectives with others. Nor do they analyze data in a vacuum, at least not the anomalies or variances that require further attention. When people exchange ideas on a topic, they refine each other's knowledge, fill in gaps, and challenge assumptions. The result is a more comprehensive understanding of a situation and a better course of action.

Most of the time, people collaborate with peers in a live, two-way exchange of information. Today, this sharing typically occurs by telephone and in face-to-face meetings, or asynchronously via email. But fanned by the rising popularity of social media sites, like Facebook and LinkedIn, business software vendors are looking to bring online collaboration features to business organizations.

For example, BI vendors, such as Panorama, Lyza Software, Actuate, Tibco Spotfire, and Yellowfin, now embed annotations, discussions, shared workspaces and other collaboration features into their products. Other vendors sell general purpose collaboration platforms that serve as virtual water coolers and conference rooms where users can informally and formally share a wide range of information on almost any topic. Popular products here are Jive Software, which recently went public, SAP Streamwork, IBM Connections, and Microsoft SharePoint. By all accounts, 2012 will be a breakout year for business collaboration software.
Review

But collaboration alone is not enough to guarantee excellent decision outcomes. To do that, people must review their decisions and analyze how they could have done things better. Otherwise, they are doomed to repeat their mistakes. Success comes not just from working hard, but working smart. And that requires replaying past events and learning from them.

In the book, "How We Decide," author Jonah Lehrer tells the story of Bill Robertie, a world-class backgammon player (as well as chess and poker), who turned a childhood obsession into a lucrative career. "Robertie didn't become a world champion just by playing a lot of backgammon. 'It's not the quantity of practice, it's the quality,' he says. According to Robertie, the most effective way to get better is to focus on your mistakes.... After Robertie plays a chess match, or a poker hand, or a backgammon game, he painstakingly reviews what happened. Every decision is critiqued and analyzed.... Even when he wins--and he almost always wins--he insists on searching for his errors, dissecting those decisions that could have been a little bit better. He knows that self-criticism is the secret to self-improvement, negative feedback is the best kind."

Interestingly, experts, like Robertie, after years spent learning from their mistakes, internalize this knowledge. This enables them to operate on a different plane of consciousness from non-experts. In the heat of action, their intuition takes over, and they simply "see" or "feel" what needs to be done. For example, Robertie said, "I knew I was getting good when I could just glance at a board and know what I should do. The game started to become very much a matter of aesthetics. My decisions increasingly depended on the look of things..."

Lehrer also describes how Tom Brady, the star quarterback for the New England Patriots football team, is able to make dozens of split-second decisions during a passing play. "Tom Brady spends hours watching game tape every week, critically looking at each of his passing decisions..." This weekly routine of self-criticism builds a literal body of knowledge that gives him an incredibly accurate "gut feel" when passing the ball during a game. When asked to explain his abilities to make the right passing decisions, Brady says, "I don't know how I know where to pass. There are no firm rules. You just feel like you're going to the right place... And that's where I throw it."

Business teams, like individual experts, can build up a body of knowledge that enables them to make more accurate decisions, sometimes reflexively. But this can happen only if they assiduously study the impact of their decisions in a given area over a long period and strive to continuously improve.

Summary

To improve corporate decision making, individuals and teams not only need to collaborate, but they need to document and review each of their decisions. This will improve decision effectiveness and help build a true learning organization.

As BI professionals, we need to understand that our job is not done when we provide data to the business. We need to shepherd them along the entire analysis and decision making process. We need to embed collaboration into BI tools and link them to general purpose decision making platforms. In short, we not only need to be data experts, but decision experts as well.

http://www.b-eye-network.com/blogs/eckerson/archives/2012/01/evolving_bi_fro.php Fri, 13 Jan 2012 15:39:45 MST http://www.b-eye-network.com/blogs/eckerson/archives/2012/01/evolving_bi_fro.php