

Big Data – Is it Old Wine in a New Bottle?

Originally published September 16, 2013





Many of you who have been in the data management/business intelligence field during the last 3-7 years or so will recognize that many of the projects now being tagged as “big data” projects are no different from projects that were executed during that time – the only distinction being that they were not called “big data” projects.

Volume

Over the years, I’ve personally been involved in many projects where data volumes appeared to be a major challenge for IT. Business users were pushing to get access to larger amounts of data, both historical and from different sources. The rise of Greenplum, Aster Data (acquired by Teradata in 2011), Vertica (acquired by HP in 2011), DATAllegro (acquired by Microsoft in 2008) and so on was the innovation response to provide enterprises with options to support this business need. Of course, these players were up against the veterans who had been providing enterprises with solutions to address this very need, including Teradata and the smaller, newer but fighting alternative Netezza. Both took very similar approaches, combining custom hardware and software, and both were expensive to purchase and deploy.

The change that Greenplum and Aster Data brought was software-only solutions that leveraged commodity hardware. The microprocessors being churned out by Intel and AMD are now very powerful and inexpensive, with not just dual and quad cores, but up to 8 and 16 cores and projections of 32, 64 and more cores on a single CPU. Thus, the software-based solutions pushed the limits of parallel processing and distributed computing, taking advantage of Moore’s law and the billions of dollars that Intel and AMD were pumping into CPU and hardware performance research. Every new release of CPUs led to these companies claiming faster and better performance. The rest of the industry, and especially the big players, saw this as the next big evolution, which led to all of these vendors being acquired and ceasing to exist as independent companies, including Netezza (acquired by IBM in 2010).

As a result, the key problem around the volume of data being generated – more specifically making that entire data available for analysis – was addressed. Thus, the “volume” aspect of big data has been dealt with by enterprises for a few years.

The cost bar has been pushed even lower with further advances in technology and with platforms like Hadoop becoming more enterprise grade and commercially available, making it possible for even the smallest enterprises with shoestring IT budgets to consider taking advantage of all the data they have and need.

Velocity

The second V, “velocity,” and the trickier third V, “variety,” are what make big data so interesting, challenging and “new.” I’m sure you can argue that “velocity” is not really new, but with these greatly reduced platform costs, even manufacturing process logs, which in many cases can generate several terabytes of data every day, are now being targeted for analysis and optimization instead of being purged. One use case is preventing “unplanned” downtimes. Another is more efficiently and optimally identifying and tagging products produced during a defective production run window (i.e., a run during which certain production parameters were out of bounds), being able to identify exactly when the parameter(s) went out of bounds rather than rejecting the entire run. There is strong interest in such solutions now because of the significantly lower cost.
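
To make the defective-run example concrete, here is a minimal sketch of how flagging out-of-bounds windows in a process log might look in Python with pandas. The file name, column names and tolerance band are illustrative assumptions, not details from any specific project.

```python
import pandas as pd

# Hypothetical process log: one row per sensor reading with a timestamp,
# a production run identifier and a monitored parameter (all names assumed).
log = pd.read_csv("process_log.csv", parse_dates=["timestamp"])

TEMP_MIN, TEMP_MAX = 180.0, 220.0  # assumed tolerance band for the parameter

# Mark readings where the parameter is outside the tolerance band.
log["out_of_bounds"] = ~log["temperature_c"].between(TEMP_MIN, TEMP_MAX)

# Collapse consecutive out-of-bounds readings into windows, per run.
log["window_id"] = (log["out_of_bounds"] != log["out_of_bounds"].shift()).cumsum()
windows = (
    log[log["out_of_bounds"]]
    .groupby(["run_id", "window_id"])["timestamp"]
    .agg(start="min", end="max")
    .reset_index()
)

# Each row gives the start and end of one out-of-bounds window, so only the
# products made in that window need to be rejected, not the entire run.
print(windows[["run_id", "start", "end"]])
```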

Variety

The third V, “variety,” is, in my opinion, a completely new kid on the block. Enterprises have the least skills, expertise and experience with this aspect of big data. Unstructured data, including imaging data, phone logs, biological and chemical molecular structures, etc., in addition to all of the “textual” information captured in enterprise systems, makes this a whole new type of challenge. At the same time, these data sets together form the most comprehensive data set enterprises have ever been able to pull together. This is no simple challenge, and there are no silver-bullet solutions to solve this problem. However, the lack of such solutions has triggered a race. Every major enterprise is trying to get to the finish line first because it presents a huge first-mover advantage. Enterprises have carried out pilots and POCs in the last 12-18 months to prove the value and address the skeptics. The leading-edge enterprises are now embarking on full-fledged projects to operationalize the value of these comprehensive data sets.

New Bottle or New Wine?

So is it old wine in a new bottle? In my opinion, only a very small aspect of it is. Everything else is completely new, not just a new bottle. It’s probably even a completely different class of wine. Consider that there are three classes of wine – table, sparkling and dessert. (I’m no wine expert, so please don’t take me to task on this; I’m just trying to convey a point.) With big data, you could say we are creating a whole new class that we could call “BD wine.”

If you haven’t started to reap the benefits of this new "big data" vintage, it’s not too late. Very soon, the marketplace will make it mandatory. Just as ERP and CRM are no longer competitive differentiators, decisions based on insights from big data will become the norm. However, as I mentioned, the early adopters will gain a competitive edge. It’s clearly time to identify your top big data initiatives and begin working on them.

 

  • Haranath Gnana
    Haranath Gnana has over 20 years of experience in information systems and is a Practice Area Leader at Saama Technologies, where he was instrumental in creating the thriving Life Sciences and Healthcare practices. His technology skill set includes the development of complex information management and business intelligence systems, enabling enterprises to make data-driven, informed decisions. Customers benefit from his broad knowledge of the information management lifecycle, from formulating BI roadmaps and strategy to transforming strategies into operational realities.

    During his time at Saama, he was also responsible for their "labs" initiatives to explore emerging technologies, both commercial and open source, to derive additional value for customers. As part of the labs initiatives, he built Savii (Smart Audience Viewing Intelligence & Insights, an enhanced variation of Nielsen ratings for TV, based on TiVo data) and PSL (Predictive Store Locator, which small retailers use to identify optimal store locations) by leveraging syndicated public data sets along with open source data mining and predictive modules. He received a patent for his pioneering work on PSL. This broad experience and expertise has helped him enable clients to adopt and leverage leading technologies such as BI appliances and mobile BI, and more recently he has been helping clients understand and embrace the big data phenomenon. Haranath has been published in leading outlets such as Information-Management.com, BeyeNetwork.com and Office of the CIO (oocio.com).

    Haranath completed his MBA with Honors from Lucas Graduate School of Business at San Jose State University, and his B.S. in Electronics Engineering from Bangalore University. He is a Certified Business Intelligence Professional (CBIP) in Leadership & Strategy at the Mastery Level from TDWI.


