
Blog: Wayne Eckerson

Wayne Eckerson

Welcome to Wayne's World, my blog that illuminates the latest thinking about how to deliver insights from business data and celebrates out-of-the-box thinkers and doers in the business intelligence (BI), performance management and data warehousing (DW) fields. Tune in here if you want to keep abreast of the latest trends, techniques, and technologies in this dynamic industry.

About the author

Wayne has been a thought leader in the business intelligence field since the early 1990s. He has conducted numerous research studies and is a noted speaker, blogger, and consultant. He is the author of two widely read books: Performance Dashboards: Measuring, Monitoring, and Managing Your Business (2005, 2010) and The Secrets of Analytical Leaders: Insights from Information Insiders (2012).

Wayne is founder and principal consultant at Eckerson Group, a research and consulting company focused on business intelligence, analytics, and big data.

December 2010 Archives

I recently spoke with James Phillips, co-founder and senior vice president of products at Membase, an emerging NoSQL provider that powers many highly visible Web applications, such as Zynga's Farmville and AOL's ad targeting applications. James helped clarify for me the role of NoSQL in today's big data architectures.

Membase, like many of its NoSQL brethren, is an open source, key-value database. Membase was designed to run on clusters of commodity servers so it could "solve transaction problems at scale," says Phillips. Because of its transactional focus, Membase is not technology that I would normally talk about in the business intelligence (BI) sphere.

Same Challenges, Similar Solutions

However, today the transaction community is grappling with many of the same technical challenges as the BI community--namely, accessing and crunching large volumes of data in a fast, affordable way. Not coincidentally, the transactional community is coming up with many of the same solutions--namely, distributing data and processing across multiple nodes of commodity servers linked via high-speed interconnects. In other words, low-cost parallel processing.

Key-Value Pairs. But the NoSQL community differs in one major way from a majority of analytics vendors chasing large-scale parallel processing architectures: it relinquishes the relational framework in favor of key-value pair data structures. For data-intensive, Web-based applications that must dish up data to millions of concurrent online users in the blink of an eye, key-value pairs are a fast, flexible, and inexpensive approach. For example, you just pair a cookie with its ID, slam it into a file with millions of other key-value pairs, and distribute the files across multiple nodes in a cluster. A read works in reverse: the database finds the node with the right key-value pair to fulfill an application request and sends it along.
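The write-and-read cycle described above can be sketched in a few lines. This is a hypothetical illustration, not Membase's actual implementation: it hashes each key to pick a node, writes the pair there, and reads it back from the same node.

```python
import hashlib

# Hypothetical sketch: route each key to one of several nodes by hashing
# the key -- the core routing idea behind a distributed key-value store.
NODES = ["node-a", "node-b", "node-c"]
store = {name: {} for name in NODES}  # each node holds its own pairs

def node_for(key: str) -> str:
    # Hash the key and map it onto a node; the same key always
    # lands on the same node, so reads find what writes stored.
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

def put(key, value):
    store[node_for(key)][key] = value       # write goes to one node

def get(key):
    return store[node_for(key)].get(key)    # read goes to the same node

# Pair a cookie with its data, exactly as in the example above.
put("cookie:abc123", {"user_id": 42})
print(get("cookie:abc123"))
```

Production systems use more sophisticated schemes (consistent hashing, replicas) so that adding or removing a node doesn't remap every key, but the routing principle is the same.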

The beauty of NoSQL, according to Phillips, is that you don't have to put data into a table structure or use SQL to manipulate it. "With NoSQL, you put the data in first and then figure out how to manipulate it," Phillips says. "You can continue to change the kinds of data you store without having to change schemas or rebuild indexes and aggregates." Thus, the NoSQL mantra is "store first, design later." This makes NoSQL systems highly flexible but programmatically intensive, since you have to build programs to access the data. But since most NoSQL advocates are application developers (i.e., programmers), this model aligns with their strengths.
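The "store first, design later" trade-off can be made concrete with a toy sketch (the names here are illustrative, not any vendor's API): values of different shapes go in under keys with no schema change, but the application code, not the database, must then know each value's shape.

```python
# Hypothetical schemaless store: any shape of value under any key.
# No ALTER TABLE, no index rebuild when a new field appears.
store = {}

store["user:1"] = {"name": "Ada"}                    # original shape
store["user:2"] = {"name": "Alan", "tags": ["vip"]}  # new field added freely

# The flexibility has a cost: the program must handle shape differences
# itself, e.g. tolerating records that lack the "tags" field.
names = [v["name"] for v in store.values()]
vip_users = [v["name"] for v in store.values() if "vip" in v.get("tags", [])]
print(names, vip_users)
```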

In contrast, most analytics-oriented database vendors and SQL-oriented BI professionals haven't given up on the relational model, although they are pushing it to new heights to ensure adequate scalability and performance when processing large volumes of data. Relational database vendors are embracing techniques, such as columnar storage, storage-level intelligence, built-in analytics, hardware-software appliances, and, of course, parallel processing across clusters of commodity servers. BI professionals are purchasing these purpose-built analytical platforms to address performance and availability problems first and foremost and data scalability issues secondarily. And that's where Hadoop comes in.

Hadoop. Hadoop is an open source analytics architecture for processing massively large volumes of structured and unstructured data in a cost-effective manner. Like its NoSQL brethren, Hadoop abandons the relational model in favor of a file-based, programmatic approach based on Java. And like Membase, Hadoop uses a scale-out architecture that runs on commodity servers and requires no predefined schema or query language. Many Internet companies today use Hadoop to ingest and pre-process large volumes of clickstream data which are then fed to a data warehouse for reporting and analysis. (However, many companies are also starting to run reports and queries directly against Hadoop.)
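The clickstream pre-processing described above follows the map/reduce pattern that Hadoop popularized. Real Hadoop jobs are typically written in Java and run across a cluster; the sketch below (with made-up sample data) just runs the two phases locally to show the shape of the computation.

```python
from collections import defaultdict

# Hypothetical raw clickstream records, one line per page hit.
clickstream = [
    "2010-12-01 10:01 /home",
    "2010-12-01 10:02 /products",
    "2010-12-01 10:03 /home",
]

# Map phase: each record emits a (page, 1) key-value pair.
pairs = [(line.split()[-1], 1) for line in clickstream]

# Reduce phase: pairs are grouped by key and the counts are summed,
# producing a compact summary suitable for loading into a warehouse.
counts = defaultdict(int)
for page, n in pairs:
    counts[page] += n

print(dict(counts))
```

On a cluster, the map tasks run in parallel on the nodes holding the data, and the framework shuffles pairs by key to the reduce tasks, which is what makes the approach scale to very large volumes.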

Membase has a strong partnership with Cloudera, one of the leading distributors of open source Hadoop software. Membase wants to create bidirectional interfaces with Hadoop to easily move data between the two systems.

Membase Technology

Membase's secret sauce--the thing that differentiates it from its NoSQL competitors, such as Cassandra, MongoDB, CouchDB, and Redis--is that it incorporates Memcache, an open source caching technology. Memcache is used by many companies to provide reliable, ultra-fast performance for data-intensive Web applications that dish out data to millions of concurrent users. Today, many customers manually integrate Memcache with a relational database that stores cached data on disk, preserving transactions or activity for future use.
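The manual integration described above is commonly known as the cache-aside pattern. A minimal sketch, with plain dictionaries standing in for Memcache and the relational database: check the cache first, fall back to the database on a miss, then populate the cache so subsequent reads are fast.

```python
# Hypothetical cache-aside integration of a cache with a disk-backed store.
cache = {}                                  # stands in for Memcache
database = {"user:7": {"name": "Grace"}}    # stands in for the relational DB

def read(key):
    if key in cache:
        return cache[key]        # cache hit: no database round trip
    value = database.get(key)    # cache miss: fetch from the slow store
    if value is not None:
        cache[key] = value       # warm the cache for the next read
    return value

print(read("user:7"))   # first read misses the cache, hits the database
print(read("user:7"))   # second read is served straight from the cache
```

What Membase offers, per the article, is this integration done for you, so the application sees one system instead of wiring cache and database together by hand.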

Membase, on the other hand, does that integration upfront. It ties Memcache to a MySQL database which stores transactions to disk in a secure, reliable, and highly performant way. Membase then keeps the cache populated with working data that it pulls rapidly from disk in response to application requests. Because Membase distributes data across a cluster of commodity servers, it offers blazingly fast and reliable read/write performance required by the largest and most demanding Web applications.

Document Store. Membase will soon transform itself from a pure key-value database to a document store (a la MongoDB). This will give developers the ability to write functions that manipulate data inside data objects stored in predefined formats (e.g., JSON, Avro, or Protocol Buffers). Today, Membase can't "look inside" data objects to query, insert, or append information that the objects contain; it largely just dumps object values into an application.
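The distinction between the two models can be shown with a small hypothetical example using JSON: a key-value store hands back the whole value as an opaque blob, while a document store understands the format and can address fields inside it.

```python
import json

# A stored value in JSON format (illustrative data, not a real API).
blob = json.dumps({"user": "kim", "orders": [101, 102]})

# Key-value view: the database returns the value unparsed; any
# modification means the application rewrites the entire blob.
opaque = blob

# Document view: the store parses the format and can operate on a
# field in place, e.g. appending to a nested array.
doc = json.loads(blob)
doc["orders"].append(103)
print(doc["orders"])
```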

Phillips said the purpose of the new document architecture is to support predefined queries within transactional applications. He made it clear that the goal isn't to support ad hoc queries or compete with analytics vendors: "Our customers aren't asking for ad hoc queries or analytics; they just want super-fast performance for pre-defined application queries."

Pricing. Customers can download a free community edition of Membase or purchase an annual subscription that provides support, packaging, and quality assurance testing. Pricing starts at $999 per node.

Posted December 23, 2010 9:38 AM
Permalink | 2 Comments |

The hardest part about implementing business intelligence (BI) solutions is managing change. Let's face it: we humans don't like change. Change can be terrifying if our livelihood, safety, or productivity is at risk. And even small changes can trip us up in subtle or subconscious ways, as I recently discovered.

Last weekend, I decided to clean and rearrange my home office to mark the commencement of a new job. I cleared all books from the shelves, removed all pictures from the walls, boxed up mementoes, got rid of a 1950s era steel file cabinet (who collects paper anymore?), and demolished a built-in closet to free up space.

Today, I sit happily in an echo chamber surrounded by four white walls (squash anyone?). All that remains before I redecorate is an easy chair, a printer on a table, and my desk on which are a lamp, an inbox, and a laptop. I feel like I can breathe again!

Change is good! Or maybe not....

Soon after I decluttered my office, I discovered the perils of change. One morning, as I was getting ready to leave for the airport at 5:00 a.m., I realized I could not find my wallet. When I searched all the usual places and came up empty, I began to panic: you can't get very far these days without a driver's license and a credit card. Honestly, the stress made me want to scream and punch a hole through the wall. Yikes!

With time running out, I grabbed my passport, my wife's credit card, and $200 in cash from my 18-year-old son (an unforeseen benefit of having a teenager in the house) and jumped into a waiting limo. After a few minutes, I called my wife to figure out what went wrong and what to do about it. As we were talking, she walked into my office and exclaimed, "Wayne, your wallet is sitting right on top of your desk!" Exactly where I always put it. How embarrassing!

I must have looked at my desk a dozen times that morning. Even though I had placed my wallet beside the lamp as always, I didn't see it! Somehow, the changes I made to my office and desk altered my perceptions. The missing wallet was hiding in plain sight!

The Stress of Change

When deploying new reports, dashboards, or BI tools, we need to remember that the smallest changes can disrupt the habits, schedules, and thought processes of the business users we are trying to support. They may lash out at us because they're feeling stress from the change we have induced. Even though we offer them a better way (e.g., faster, easier tools; more tailored, accurate reports), they don't want to change. They become temporarily irrational; they don't want our solution no matter how good it is. They prefer the inferior option that is familiar and easy and doesn't interfere with their ability to get things done.

Thus, to succeed with business intelligence, we need to master the art of change management. Although implementing technology can be difficult, getting people to change the way they consume information and make decisions is even more challenging.

So what can we do?

Empathize. First, before introducing anything new, take a moment to empathize with the business users whose decision-making lives we are about to throw into disarray. Second, be ready for a backlash and remain professional. Remember that when people are angry, it's not them who are talking, it's their anxiety. They are afraid they won't make a critical deadline or won't be as successful as before.

Manage Expectations. We also need to address change management issues as part of our project plans. We need to manage expectations from the outset. That means communicating early and often about changes--before they happen, when they happen, and after they happen. We need to get top executives to trumpet the rationale or benefits of the change, if it's a major one. And we can devise different marketing and communications plans for each distinct group affected by the change.

Multi-touch Support. Once deployed, we need to offer rich, multi-channel support. Some people will need more hand-holding than others. For example, executives may need one-on-one attention, and we may have to duplicate the old environment (e.g., paper report, Excel interface) in the new environment to overcome resistance to change. We need to make sure we offer plenty of online help and that our help desks are ready to answer any questions.

Track Usage. Finally, we need to track usage. We need to estimate what the uptake and resistance will be. If it's higher or lower than expected, we need to find out what's happening. We can't be passive. We need to talk with our business users and ask them what they like and don't like. If they are angry, giving them the opportunity to vent is part of the change management process. Some people will only change after a considerable amount of kicking and screaming. So don't short-circuit the process!

In the end, we have to build bridges from the old environment to the new. Some people will race across the bridge, wondering why we waited so long. Most will cross the bridge in due time as good corporate citizens. And a few will hold out until the bitter end. We need to know who our holdouts might be and give them extra attention beforehand to quell their anxieties and fears.

Posted December 15, 2010 7:52 AM
Permalink | No Comments |

For a while the Hadoop community was proselytizing the new open source distributed processing framework as a relational database killer. But wiser minds have prevailed, namely that of Mike Olson, long-time database executive and current CEO of Cloudera, a leading distributor of Hadoop and related open source add-ons.

I recently sat down with Olson and Jon Kreisa, Cloudera VP of Marketing, and heard loud and clear that Hadoop plays a complementary role to relational-oriented data warehouses and BI tools. "It would be foolish for us to duplicate the functionality of a relational database which has more than 20 years of development behind it," says Olson.

According to Olson, Hadoop's sweet spot is processing large volumes of semi-structured and unstructured data in batch-oriented programs written by developers. Many BI architects see Hadoop as a perfect environment for staging and processing large volumes of clickstream and other unconventional data not commonly stored in a data warehouse.

In effect, Hadoop serves as a staging area and ETL system to filter and process "big data" so it can be loaded into a data warehouse and joined with other corporate data for reporting and analysis purposes. Hadoop also makes a terrific low-cost archival system that enables companies to keep all their data online without having to summarize it or migrate it to tape.

Last year, Cloudera notched partnerships with a bevy of relational database vendors, who also see the complementary nature of Hadoop to their data warehousing business. This year, Olson says, Cloudera will establish partnerships with multiple ETL and BI vendors, solidifying Hadoop's position as a key component in a large-scale BI architecture. Already, Cloudera has partnered with database, ETL, and BI vendors to create bridges between the two worlds. Database partners of Cloudera include Aster Data, Greenplum, Membase, Netezza, Quest, Teradata, and Vertica. Its ETL partners include Informatica, Pentaho Data Integration, and Talend. And its BI vendors include Jaspersoft, MicroStrategy, and Pentaho.

In closing, Olson admitted that despite the current cooperation between the Hadoop and BI communities, each is aggressively developing capabilities offered by the other, which will eventually minimize the need for such partnerships. In fact, many large Internet companies, including eBay, which recently presented on a Cloudera Webcast, say they are using Hadoop for reporting and analysis as well as for staging, archiving, and preprocessing.

So, while the two camps are playing nice today, the battle has only just begun!

Posted December 2, 2010 2:52 PM
Permalink | No Comments |