In this short series of articles, I explore the concept and reality of “big data.” What is it and where does it come from? Why is it important? How does it add value to the business? What is its impact on traditional data warehousing and business intelligence? In Part 1, I explored the first two questions: What is it and where does it come from? In Part 2, I examine the next two questions: Why is it important and how does it add business value?
The Importance of Big Data
Strange though it may seem, big data is nothing new! In one sense, based on the definition from Wikipedia I quoted in Part 1 of this series, “Big Data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage and process the data within a tolerable elapsed time,” big data has existed since the first days of computing. The game-changing IBM System/360 mainframe first shipped in mid-1965 with disks (or DASD, direct access storage devices, as they were then known) that had an unformatted capacity of 7.25 megabytes. (For comparison, the IBM PC/XT shipped in 1983 with a 10MB hard drive.) When combined with main memories of only a few KB, as was often the case, it’s easy to see that it didn’t take very large (by today’s standards) data sets to tax the ability of the systems of the time.
Since the 1990s, data warehousing and data mining have brought numerous examples of big (for their time) data. Walmart’s Teradata-based data warehouse, dating from that period, has been consistently among the largest data warehouses in the world. Walmart is notoriously secretive about their system, but a variety of sources on the Web give figures. In the early 1990s, it started at 340GB. By 2000, it was said to be 70TB. By 2004, it was 500TB or half a petabyte. In 2008, a figure of 4PB was being mentioned. For their time, these are certainly big data, before the term was invented, with correspondingly large storage and management costs. And Walmart, as the world’s largest retailer through most of that time, is certainly renowned for getting value for its money.
From these and other examples, we may draw a couple of conclusions. First, data size has always been pushing the limits of computer storage and processing technology. In that sense, what we’re seeing today is not new. So far, technology has been able to accommodate this data growth, so it is probably a reasonable assumption that it will continue to do so. Second, some highly successful, leading-edge companies have consistently invested in the (relatively) expensive technologies needed to use big data and have reaped significant value from it. This, in a nutshell, is the importance of big data: Big data enables innovative businesses to become leading-edge adopters of new approaches to doing business and thus become particularly successful.
Of course, this is not to say that any business adopting big data will necessarily become a leading-edge adopter of a new business approach. There are other highly significant factors such as the market environment in which the business operates and the organization’s ability to adapt to change. Furthermore, like any technical innovation, big data may confer first-mover advantage on a particular business, but then becomes mandatory for the competition simply to survive. There is hardly a large retailer who hasn’t used data in a manner similar to Walmart; however, for a variety of reasons, perhaps related to Walmart’s market share or business ethos, they have been unable to achieve Walmart’s level of success.
The Value of Big Data
Recognizing that big data has long been with us allows us to look at the historical value of big data, as well as current examples. This allows a wider sample of use cases, beyond the Internet giants who are currently leading the field in using big data. This leads us to the identification of value in two broad categories: pattern discovery and process invention.
Pattern Discovery
First, let me be clear that pattern discovery alone is not of value to the business. The title is simply shorthand for “pattern discovery and innovative reaction!” Clearly, discovering a pattern in, for example, customer behavior may be very interesting, but the real value occurs when we put that discovery to use by changing something that reduces costs or increases sales.
Pattern discovery leads us directly back to data mining. There are probably few of us who haven’t heard and perhaps repeated the “beer and diapers (or nappies)” story: A large retailer, supposedly Walmart, discovered through basket analysis – data mining till receipts – that men who buy diapers on Friday evenings are also likely to buy beer. They rearranged the store layout to place the beer near the diapers and watched beer sales climb. Sadly, this story is now widely believed to be an urban legend or sales pitch rather than a true story of unexpected and momentous business value gleaned from data mining. Nevertheless, it makes the point as well as any number of real examples: there are nuggets of useful information to be discovered through statistical methods in any large body of data, and action can be taken to benefit from these insights.
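The kind of basket analysis described in this story can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any retailer's actual system: the function name `pair_rules`, the `min_support` parameter and the sample receipts are all assumptions made up for the example. It counts how often pairs of items appear together and reports the usual support, confidence and lift measures.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs.

    Hypothetical helper for illustration only.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicate items on one receipt
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]       # P(y in basket | x in basket)
            lift = confidence / (item_counts[y] / n)  # > 1: bought together more than chance
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Made-up till receipts, purely for illustration.
baskets = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["diapers", "wipes"],
    ["beer", "chips"],
    ["milk", "bread"],
    ["diapers", "beer", "milk"],
]

for x, y, support, confidence, lift in pair_rules(baskets):
    print(f"{x} -> {y}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests that two items are bought together more often than chance alone would predict. Whether that justifies moving the beer is, as noted above, the business decision that creates the value.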
This particular story illustrates data mining in a single, well-understood data set, with the insight used to target a previously unidentified segment of the customer set – “fathers who buy diapers when supplies run low at home” or some similar categorization. Such uses of big data remain common and provide business value through targeted marketing to smaller and ever more specific micro-segments of the market until we reach the nirvana of the “segment of one.” (In my research, I was surprised to find this concept being discussed as far back as 1989 by the Boston Consulting Group, before data warehousing and data mining were advanced enough to make it possible!)
More recently, combining data sets from multiple sources, both related and unrelated, with increasing emphasis on computer logs such as clickstreams and publicly available data sets has become popular. Sometimes referred to as “mining the data exhaust,” this approach can allow specific individuals to be identified without requiring them to opt in by providing individually identifiable information, as was needed previously. Clearly, there is also business value here, but at what cost to privacy and individual choice?
Process Invention
As discussed, mining behavior data allows existing processes to be tweaked and changed to provide better business value. However, the second approach to getting business value from big data involves using the data operationally to invent an entirely new process or substantially re-engineer an existing one. Beyond basket analysis, most retailers in the 1990s also used the cash register data to re-engineer their restocking and supply chain processes. It makes sense that goods purchased at the register deplete stock on the shelves, which must then be replenished. Restocking shelves depletes the store room and supplies must be reordered, and so on. In the past, restocking and reordering were largely triggered by a manual process in which floor or warehouse staff had to first notice that stocks were low and then take action. By automating these processes from triggers in the sales/inventory system, enormous financial and customer satisfaction benefits accrue. The twin terrors of retail – out of stock or overstock situations – are avoided.
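The automated chain from sale to shelf restocking to supplier reorder can be sketched as a simple event handler. This is a minimal sketch under stated assumptions, not a real inventory system: the `StockLevel` fields, the reorder points and the `record_sale` function are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class StockLevel:
    """Hypothetical per-product stock record; all fields are illustrative."""
    sku: str
    on_shelf: int
    in_store_room: int
    shelf_reorder_point: int       # restock the shelf at or below this level
    shelf_capacity: int
    store_room_reorder_point: int  # reorder from the supplier at or below this level
    order_quantity: int
    order_pending: bool = False    # avoids raising the same order on every sale


def record_sale(stock, units):
    """Apply a register sale and return the actions it triggers."""
    actions = []
    stock.on_shelf = max(stock.on_shelf - units, 0)

    # A sale depletes the shelf, which is refilled from the store room.
    if stock.on_shelf <= stock.shelf_reorder_point:
        move = min(stock.shelf_capacity - stock.on_shelf, stock.in_store_room)
        if move > 0:
            stock.on_shelf += move
            stock.in_store_room -= move
            actions.append(("restock_shelf", stock.sku, move))

    # Restocking depletes the store room, which triggers a supplier order.
    if stock.in_store_room <= stock.store_room_reorder_point and not stock.order_pending:
        stock.order_pending = True
        actions.append(("reorder_from_supplier", stock.sku, stock.order_quantity))

    return actions


stock = StockLevel("beer-6pk", on_shelf=10, in_store_room=20,
                   shelf_reorder_point=4, shelf_capacity=12,
                   store_room_reorder_point=12, order_quantity=48)
print(record_sale(stock, 3))  # shelf still above its reorder point
print(record_sale(stock, 4))  # shelf restocked, store room falls low, order raised
```

The point of the sketch is that no member of staff has to notice anything: each action is derived directly from the register event.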
Machine sensor data, the first category of big data discussed in Part 1 of this series, is key to this level of process re-engineering. Such data consists essentially of raw, unadulterated operational events that can drive new or re-engineered processes. A prime example of this type of process invention comes from the automobile insurance industry, where on-board sensors of acceleration and braking, among others, transmit driving behavior information in real time to the insurance company. Premiums are adjusted based on measured behavior. This is an entirely new way of offering automobile insurance that was impossible before the advent of this type of big data, and clearly allows new ways of achieving business value.
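To make the idea of adjusting premiums from measured behavior concrete, here is a deliberately simplified sketch. Everything in it is an assumption for illustration: the g-force thresholds, the discount and surcharge percentages, and the names `DrivingEvent` and `adjusted_premium` are invented, and real insurers' rating models are certainly more sophisticated.

```python
from dataclasses import dataclass


@dataclass
class DrivingEvent:
    kind: str       # "accel" or "brake"
    g_force: float  # magnitude of the measured acceleration, in g


# Illustrative thresholds above which an event counts as "harsh" (assumed values).
HARSH_THRESHOLD_G = {"accel": 0.35, "brake": 0.40}


def harsh_events_per_100km(events, distance_km):
    """Rate of harsh acceleration or braking events per 100 km driven."""
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    harsh = sum(1 for e in events if e.g_force >= HARSH_THRESHOLD_G[e.kind])
    return harsh * 100.0 / distance_km


def adjusted_premium(base_premium, events, distance_km,
                     surcharge_per_event=0.02, max_discount=0.10, max_surcharge=0.30):
    """Scale the base premium by measured behavior (assumed pricing rule)."""
    rate = harsh_events_per_100km(events, distance_km)
    # Start from the full discount and add a surcharge for each harsh event per 100 km.
    factor = 1.0 - max_discount + surcharge_per_event * rate
    factor = min(factor, 1.0 + max_surcharge)  # cap the total surcharge
    return round(base_premium * factor, 2)


events = [
    DrivingEvent("brake", 0.50),  # harsh
    DrivingEvent("brake", 0.20),
    DrivingEvent("accel", 0.40),  # harsh
    DrivingEvent("accel", 0.10),
]
print(adjusted_premium(1000.0, events, distance_km=200.0))
```

In this sketch a driver with no harsh events receives the full discount, while the surcharge is capped so that the premium never exceeds a fixed multiple of the base.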
For those contemplating investment in big data, the most important conclusion from this article is to recognize that there are very specific combinations of circumstances in which big data can drive real business value. Sometimes, of course, it is the price for simply staying in the game, as we saw in the case of retailing. Other times, it can open up a new market niche for profitable exploitation, at least for the first movers. Recognizing and taking advantage of such opportunities demands a close partnership between IT and the business to understand all aspects of the situation.
In Part 3 of this series, I’ll take a look at the tools and techniques for using and managing big data and the importance of understanding the roles of IT and business users.
Recent articles by Barry Devlin