With Big Data Out of the Box, Maintaining Order is a Must

Originally published January 29, 2013

In Greek mythology, Prometheus offended Zeus, the king of the gods, by giving fire to mankind without permission. To avenge this deed, Zeus sent the first woman, Pandora, along with her eponymous box, to Earth as a gift to Epimetheus, the brother of Prometheus. Fearing a trick, Prometheus begged Pandora not to open the box. However, curiosity overcame her. When she peeked inside, all the ills of the world flew out, leaving but one thing within – hope.

As in the myth, "big data" certainly poured out of the box in 2012. Whether it proves to be an evil or largely a source of hope and business benefits for organizations remains to be seen. The past year has shown both sides – from privacy issues and nonsensical statistics to improved medical treatments and modeling of climate change. The possibilities for using big data are proving endless, and the technology to enable them has evolved by leaps and bounds.

But there's more to the story. Prometheus means "foresight" in Greek, and Epimetheus means "after-thought." In today's business environment, big data has literally shifted our thinking on analytics from looking back to looking forward. As a result, we are beginning to see that big data raises real questions about how data is gathered, managed and used.

The year 2013 will certainly see big data technology continue to be developed and improved, along with the emergence of new business uses – and indeed, some dubious schemes – for it. But the IT focus must now shift to the architectural and governance issues that arise in big data environments. The phenomenon shows that what we have long processed in business is but a small proportion of the real – and almost limitless – world of information. IT and business intelligence (BI) managers must be prepared to handle much more than that.

I propose a new model for understanding big data in the context of all the structured data and less structured information created and captured by companies. The model (see Figure 1) includes three distinct domains:

  • Human-Sourced Information.  All information ultimately originates from people – it's the highly subjective record of human experiences, from text and images to audio and video, now almost entirely digitized and stored electronically. Loosely structured and often ungoverned, this information must be systematized and standardized for reliable use, by modeling and validating it in operational and BI systems to create the data that's in the second domain.
  • Process-Mediated Data. Business processes record and monitor all business events, such as registering a customer and manufacturing a product. Process-mediated data is the highly structured and modeled data, as well as the contextual metadata, produced by these processes. Such data has long been the vast majority of what IT processed and managed in relational databases.

  • Machine-Generated Data. Sensors and various machines record data on a wide array of events and situations that they monitor. Their output is machine-generated data; from simple sensor records to complex computer logs, it is well-structured and very reliable. As sensors proliferate, the data they capture is becoming an important component of the information used for BI and analytics. The data's size and delivery speed are often beyond what traditional approaches can handle; in such cases, standalone high-performance relational and NoSQL databases are needed.




Figure 1: Unstructured forms of big data are most effective for analytics uses when combined with traditional structured data, according to the author's model.

In essence, the model I propose shows that emerging big data sources, often poorly governed or managed themselves, need to be enhanced with traditional process-mediated data to deliver useful and relevant business analytics. As a result, the market focus is likely to shift substantially from big data startups and small vendors to more established vendors with enterprise-scale technologies for semantic and physical integration of multiple data types from various sources – a trend we've seen emerging.
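To make the idea of enhancing big data with process-mediated data concrete, here is a minimal Python sketch. It is an illustration only, not part of the model itself: the record types `MeterReading` and `CustomerAccount`, their fields, and the `enrich` function are all hypothetical names chosen for this example. It assumes machine-generated readings carry a key (`meter_id`) that also exists in governed account data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeterReading:
    """Hypothetical machine-generated record, e.g. from a smart meter."""
    meter_id: str
    kwh: float


@dataclass(frozen=True)
class CustomerAccount:
    """Hypothetical process-mediated record from a governed business system."""
    meter_id: str
    customer: str
    tariff_per_kwh: float


def enrich(readings, accounts):
    """Join raw readings to account data on meter_id.

    Returns (enriched, unmatched): enriched rows carry business context,
    while readings with no matching account are set aside for review.
    """
    by_meter = {account.meter_id: account for account in accounts}
    enriched, unmatched = [], []
    for reading in readings:
        account = by_meter.get(reading.meter_id)
        if account is None:
            unmatched.append(reading)
            continue
        enriched.append({
            "customer": account.customer,
            "kwh": reading.kwh,
            "cost": round(reading.kwh * account.tariff_per_kwh, 2),
        })
    return enriched, unmatched


if __name__ == "__main__":
    readings = [MeterReading("m1", 10.0), MeterReading("m9", 3.0)]
    accounts = [CustomerAccount("m1", "Acme", 0.25)]
    rows, orphans = enrich(readings, accounts)
    print(rows)
    print(orphans)
```

On its own, a reading is just a number attached to a device. It becomes usable for analytics only once it is tied to a customer and a tariff. Readings that cannot be matched are kept apart rather than silently dropped, which is one simple way to surface the governance gaps discussed above.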

The second emergent trend in 2012 that is likely to accelerate in 2013 is an increased emphasis on business value. So far, we've seen much interest in analysis of social media information for brand awareness, emerging product issues and more, as well as big data analytics in support of operational excellence. The focus is set to shift, however, toward process innovation – the use of previously unavailable data from a variety of sources to invent new ways of doing old business.

Finally, flying under the radar now is the next wave of really big data technologies from Web denizens such as Google and Facebook, whose needs have exceeded the capabilities of file-based tools like Hadoop. A new wave of tools – Dremel, Caffeine, Pregel, Spanner and Prism – may be upon us as the biggest big data proponents inexorably move the needle from a batch-oriented, eventually consistent paradigm toward a distributed but ACID-compliant database mind-set.

     
  • Barry Devlin
    Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

    Barry's interest today covers the wider field of a fully integrated business, spanning informational, operational and collaborative environments and, in particular, how to present the end user with a holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry’s latest book, Business unIntelligence: Insight and Innovation Beyond Analytics, now available in print and eBook editions.

    Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

