

5 Steps to Readiness Assessment for Big Data
Part 2 of Big Data Nuts and Bolts

Originally published January 15, 2013

This is the second article in my Big Data Nuts and Bolts series. Part 1, Big Data: What It Is and What It Means for Today’s Enterprises, described the evolution of the business intelligence maturity model needed to enable "big data" processing. This part of the series concentrates on how that evolution impacts information management services.

Big Data Information Management Services

To start off, let me state this clearly: Big data is not completely new. Rather, it requires extensions of our current processing technologies to take advantage of new sources of information and of advances in how we process, integrate and analyze them. Before extending into big data, however, it is necessary to ensure that the current intelligence and integration pathways are mature enough for this growth. Those who read the previous article in this series may remember that I call these new sources of information “untapped resources.” They remain untapped for a number of reasons, which may include:
  • The technology to incorporate them was not available or not cost effective.

  • There was no clear business case for incorporating the untapped data resources.

  • New ways of using extended information sources have only recently become common, requiring companies to expand resource needs and utilization to stay competitive.
These, among other reasons, may explain why some resources remain untapped. So what is the driving force behind the interest and excitement surrounding big data? Gartner’s Hype Cycle for Cloud Computing, 2012, states, “Big data will deliver transformational benefits to enterprises within 2 to 5 years, and by 2015 will enable enterprises adopting this technology to outperform competitors by 20% in every available financial metric.” That should make any C-level executive stand up and take notice.

Unfortunately, big data efforts, like any initiative of a similar nature, tend to have very low success rates. In fact, Gartner's top predictions for 2012 and beyond included this prediction about big data: “Through 2015, more than 85 percent of Fortune 500 organizations will fail to effectively exploit big data for competitive advantage.” This leads to the question: Why are so many organizations failing in their big data initiatives?

That question is my incentive for writing this article: to help clients understand what big data truly implies, what the foundational concepts are, and how to utilize them effectively. The first article painted a picture of big data; this part of the series imparts the foundational concepts without which a successful big data implementation is in jeopardy. I will skip the standard top 10 reasons integration projects fail, such as scope creep and poor expectation setting, since a number of good lists are available on BeyeNETWORK.com, and focus instead on the maturity requirements relevant to big data success.

Before starting a big data initiative, the following five readiness assessment focus areas should be reviewed:
  1. Measure the organization’s maturity level. Review the processes, the technology capabilities and the human resource skill sets.

  2. Establish a strong governance program, combined with a metadata management policy, to help alleviate or mitigate the risk inherent in broadening the types of information accessed.

  3. Identify the sources of information required and the business case for each.

  4. Determine how to interpret and integrate the data.

  5. Review the data retention period for each source, determining what should be kept long term versus passed through; this will help in the next step, technology enablement. (One way to record the results of these reviews is sketched after this list.)
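The sketch below shows one hypothetical way to capture the outcome of these five reviews as a per-source inventory record; every field name and example value is an illustrative assumption, not a formal standard.

```python
from dataclasses import dataclass

@dataclass
class SourceAssessment:
    name: str              # e.g., "billing OLTP", "social media feed"
    business_case: str     # why the source is worth tapping (step 3)
    steward: str           # accountable owner under governance (step 2)
    integration_path: str  # e.g., "ETL", "API", "virtualization" (step 4)
    retention: str         # e.g., "7 years", "pass-through only" (step 5)

sources = [
    SourceAssessment("billing OLTP", "revenue reporting", "finance IT",
                     "ETL", "7 years"),
    SourceAssessment("social media feed", "brand sentiment", "marketing ops",
                     "API", "pass-through only"),
]

# Readiness flag: a source with no steward or no retention decision is not ready.
not_ready = [s.name for s in sources if not s.steward or not s.retention]
print(not_ready)  # -> [] once every source has an owner and a retention policy
```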
In my opinion, the ability to succeed with big data is directly related to the maturity of the organization. I have seen clients dive head first into big data without ever having implemented a data governance program. Another client was interested in big data but was still writing custom code to run its business intelligence initiatives. Neither had the ability to extend its services into the untapped big data resources, or even to apply a common standard such as stewardship or a master source to resolve data conflicts. If you have trouble identifying a master source of data in your traditional world today, how will that be impacted when you open the doors to the new data sources? For example, what is the correct billing address: the one from the traditional billing OLTP, the address change service, or a new feed from the post office master file? Can you use this new information to provide a new contact point or a customized campaign for this customer? As I hope you can see, extending into the untapped resources may result in new and challenging data conflicts.
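As a concrete illustration of resolving such a conflict, the following toy sketch applies a survivorship rule: prefer the most authoritative source as ranked by the data steward, then break ties on recency. The source names and the priority order are hypothetical assumptions, not recommendations.

```python
from datetime import date

# Hypothetical steward-assigned authority ranking: lower number wins.
SOURCE_PRIORITY = {"postal_master_file": 1,
                   "address_change_service": 2,
                   "billing_oltp": 3}

# Conflicting billing addresses for one customer, one record per source.
candidates = [
    {"source": "billing_oltp", "address": "12 Oak St",
     "as_of": date(2012, 3, 1)},
    {"source": "address_change_service", "address": "9 Elm Ave",
     "as_of": date(2012, 11, 15)},
    {"source": "postal_master_file", "address": "9 Elm Ave",
     "as_of": date(2012, 10, 2)},
]

# Survivorship: most authoritative source first, most recent record second.
winner = min(candidates,
             key=lambda c: (SOURCE_PRIORITY[c["source"]],
                            -c["as_of"].toordinal()))
print(winner["address"], "from", winner["source"])
# -> 9 Elm Ave from postal_master_file
```

The point is not the few lines of logic, but that someone with governance authority must own the priority table before the new sources arrive.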

To commence the assessment of maturity level and technological readiness for big data, the first step is to review the configuration of the information management services. Figure 1 (below) shows the layers that comprise the information management services, from data access or acquisition through to analytics.

Governance and metadata management are the pillars of information management services. While a mature business intelligence initiative will already have these services in place, extensions will be needed to incorporate the untapped data sources. For those who have yet to develop a strong foundation in these areas, expanding into big data will prove a challenge, to say the least. For example, consider an attempt at predictive analytics based on trending exploration of branding events. This analysis will combine data from operational systems, the call center, social media and perhaps traditional snail mail campaign results. Without a strong foundation in the pillars, how do you manage conflicting information if you can’t clearly determine where the elements originated and which is the master or driver of information? Who will own relevancy and prioritization? I understand that this is a gray area; after all, who can “govern” a social media feed? However, steps must be taken to reduce risk, anticipate and direct collision resolution, and control master driver issues.



Figure 1: Information Management Services

Once the pillars are in place, the “back office,” or data acquisition, can be extended to access additional untapped resources. ETL (extract, transform and load), EAI (enterprise application integration), API (application programming interface) and data virtualization processes have long been part of the information services and are well understood. Most vendors in this space are providing new or improved access layers for non-traditional sources of data such as semi-structured and unstructured data, the web, social media and others. We have been accessing semi-structured and unstructured data for many years, but technological advances in API-layer capabilities around sentiment analysis have vastly improved the value of the return.

Capturing information in the data acquisition phase from untapped internal and external sources may require one-to-many extensions to the data acquisition tools or, for some customers, the purchase of new ones. Some internal sources, such as emails and IMs (instant messages), cannot cleanly adapt to the same ETL code used to source information from, let’s say, the Oracle ERP (enterprise resource planning) system. Extensions provide the ability to access email and IMs, which for some corporations have become the contact mechanism of choice as human resources become increasingly distributed. For messaging systems, the intelligence we are after is in the text itself and its relationship to other texts. This is where sentiment analysis comes in: the ability to understand the context within the message or email.
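To make the idea concrete, here is a deliberately tiny, lexicon-based sketch of sentiment scoring. Real sentiment analysis relies on trained models and far richer vocabularies; the word lists here are illustrative assumptions only.

```python
import re

# Toy lexicons; a production system would use a trained model instead.
POSITIVE = {"great", "love", "resolved", "thanks", "happy"}
NEGATIVE = {"broken", "angry", "cancel", "refund", "unhappy"}

def sentiment_score(text: str) -> int:
    """Return > 0 for net-positive wording, < 0 for net-negative."""
    words = re.findall(r"[a-z']+", text.lower())
    return (sum(w in POSITIVE for w in words)
            - sum(w in NEGATIVE for w in words))

print(sentiment_score("Thanks, the billing issue was resolved - great support"))
# -> 3
```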

The same arguments apply to acquiring untapped external content. Understanding the context of a tweet, blog or posting requires some type of sentiment analysis, also called opinion mining or text analytics. Taxonomies and ontologies play into semi-structured and unstructured content as well. For instance, how are you tagging the data in your enterprise content management system? Can you trust the tags that are available, and are all items tagged? More on this topic will be covered in the big data enablement section later in the series. A case can also be made for incorporating some of these tools in other service layers, such as sentiment analysis in the analytics layer, or in the integration layer via virtualization or in-memory tools.
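Before trusting tags for integration, a simple coverage audit is a reasonable first check. The document structure below is a hypothetical stand-in for whatever your content management system actually exposes.

```python
# Hypothetical extract from an enterprise content management system.
documents = [
    {"id": "doc-1", "tags": ["invoice", "2012"]},
    {"id": "doc-2", "tags": []},   # untagged content is a quality gap
    {"id": "doc-3", "tags": ["contract"]},
]

untagged = [d["id"] for d in documents if not d["tags"]]
coverage = 1 - len(untagged) / len(documents)
print(f"tag coverage: {coverage:.0%}, untagged: {untagged}")
# -> tag coverage: 67%, untagged: ['doc-2']
```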

The data services layer also requires changes or an expansion of services. The untapped sources of information bring their own quality, completeness and accuracy challenges, so data services will be needed to cleanse and standardize them before they can be linked in the integration layer. Imagine that customer names now arrive from emails, LinkedIn and documents in addition to the current traditional data sources. Will the current name standardization process work for all of these new sources? Doubtful, but it is possible to extend the current standardization routines to incorporate the new types via additional business rules.
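A minimal sketch of that extension approach follows: a shared standardization routine with per-source business rules layered in front. The source labels and the rules themselves are assumptions for illustration.

```python
def standardize_name(raw: str, source: str) -> str:
    """Standardize a customer name, with source-specific pre-processing."""
    name = raw.strip()
    if source == "email":
        # e.g., drop an address artifact: "jane doe <jane@example.com>"
        name = name.split("<")[0].strip()
    elif source == "linkedin":
        # e.g., drop credential suffixes: "JANE DOE, PMP"
        name = name.split(",")[0].strip()
    # Shared, source-independent rules are applied last.
    return " ".join(part.capitalize() for part in name.split())

print(standardize_name("jane doe <jane@example.com>", "email"))  # Jane Doe
print(standardize_name("JANE DOE, PMP", "linkedin"))             # Jane Doe
```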

The integration layer may require the greatest change or expansion; consider that we have to “link” or associate the data to integrate the traditional sources with the untapped ones. Unstructured content, such as audio, video, x-rays or other blob information, will affect your current integration pathways, and success depends in large part on how your metadata and tagging foundation has been implemented. If scientific or log data must reside in a columnar database, that will obviously require new integration patterns. In some cases, physical integration between traditional and untapped data sources is not advisable, and virtualization techniques provide the best abstraction layer.
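When a clean shared key does exist, linkage can be as simple as the sketch below, which joins a social media record to a CRM record through a standardized email address. The records are hypothetical, and in practice untapped sources often lack such a key, requiring probabilistic or fuzzy matching instead.

```python
# Hypothetical traditional source: CRM customers keyed by standardized email.
crm_customers = {
    "jane@example.com": {"customer_id": 1001, "name": "Jane Doe"},
}

# Hypothetical untapped source: social media records with a raw email field.
social_posts = [
    {"handle": "@janed", "email": "Jane@Example.com",
     "text": "Love the new product line"},
]

for post in social_posts:
    key = post["email"].lower()  # standardization from the data services layer
    match = crm_customers.get(key)
    if match:
        print(post["handle"], "linked to customer", match["customer_id"])
# -> @janed linked to customer 1001
```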

The analytics layer is broken down into two services, presentation and advanced analytics, in order to clearly delineate between traditional BI and big data analytics capabilities. The presentation layer is enhanced by the expansion of information: reports and dashboards can incorporate the new sources or supplementary ones as they are added. The advanced analytics phase may already be under way in some organizations. These types of analysis include predictive analytics or modeling, trend analysis, and customer churn/segmentation, which can be addressed in traditional data warehousing systems.

Unfortunately, without the untapped data sources discussed in this article, these types of advanced analytics may not be able to encompass data outside the data warehouse, are difficult for business users to extend or mine, and do not effortlessly address competitive analytics. This leads to the next step in utilizing the untapped sources of data: determining how to link them with traditional ones.


Figure 2: Incorporating the Untapped Data

Figure 2 shows the conceptual relationship of the untapped information sources to the more traditional ones (shown in blue). The internal content, recognized as existing within the organization, is the content within the circle, while non-corporate and public domain information is in the rectangle at the right. The Data Science block represents big data business enablement, which will be addressed later in the series.

To wrap up this article in the series: Big data can provide a wealth of information to identify opportunities for new revenue generation, or vehicles to increase current revenue, but the foundation that enables the extension into this world must be addressed first. Review and address your organization’s maturity model; ensure the people, processes and technology can extend and function properly; determine the sources of information and the value they provide to your organization; learn how to properly integrate and link them; and understand the retention periods. At that point, you will be in the proper state to revise your architecture as needed and proceed with the biggest bang for your buck. Big data can offer big rewards to your organization if you proceed in a thoughtful, intelligent manner.

The next article in this series will begin with the big data technology enablement section.  


Calla Knopman
Calla has more than 15 years of consulting expertise in data integration, including application integration, data warehousing, data quality, metadata management, and business intelligence, with an emphasis on system design and architecture. Calla is currently the managing member and founder of Knopman IT Consulting, LLC; her past roles have included senior positions at KPMG, BearingPoint, IBM and VIP. She is a frequent guest speaker at IUG, EDW and TDWI chapters. Calla can be reached at cknopman@knopmanit.com.
