The amount of data available is growing like never before, and hidden in all that data are insights, ideas and explanations just waiting to be discovered. Traditional business intelligence (BI) tools can only answer questions that you already know you should be asking. They can’t identify what is meaningful or what should be ignored. They are built for a paradigm where you have to know the question before you can get an answer.
But what if you don’t know the questions you should be asking? That’s precisely the problem.
The Changing Nature of Data and Analysis
The challenge is really that the nature of data analysis is changing. The two branches of data analysis can be defined as follows:
| Enterprise BI Platforms | Data Discovery & Visual Analytics |
| Key Buyers: IT | Key Buyers: Business |
| Methodology: Top-down, IT modeled (semantic layers), query existing repositories, frame the questions first before you design the solution | Methodology: Bottom-up, business-user mapped (data mashups), just give me access to data and I will figure out my solution |
| Deliverables: Reports, KPIs, dashboards | Deliverables: Visualization, scenarios, storyboarding |
| Use Cases: Monitoring, reporting, what-if scenarios | Use Cases: Analysis, business hypothesis tests, insight generation |
Today we require more flexibility and dynamic interpretation of the data than we did ten years ago for several reasons:
- Increased complexity of the data, now that unstructured and semi-structured data are treated as “data of substance.”
- Increased velocity of data and data change is forcing us to analyze the data in real time; in other words, there is a critical need to drastically reduce the latency between the event and the response.
- Increased business demand for the use of data in all aspects of decision making; in other words, businesses are aspiring to become data-driven businesses.
The key is focusing on what you are trying to accomplish. Are you trying to find out exactly how much inventory you have in stock for a specific product at a specific location, or are you trying to understand the most likely buyers (consumer segment) at that location for that specific product?
Note that these two questions have very different degrees of specificity. For the first question, you don't want or need to do exploration. But for the second question, you need to do a fair amount of data foraging work just to understand the data sources you want to include or exclude.
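The difference between the two questions can be sketched in a few lines of Python. This is only an illustration: the records, field names and values below are invented, and the helper functions are hypothetical. The first function answers the known question with a direct lookup; the second only summarizes buyers by segment, which is a starting point for exploration rather than an answer.

```python
from collections import Counter

# Hypothetical records, invented purely for illustration.
inventory = [
    {"product": "P-100", "location": "Austin", "units": 42},
    {"product": "P-100", "location": "Boston", "units": 7},
    {"product": "P-200", "location": "Austin", "units": 15},
]
sales = [
    {"product": "P-100", "location": "Austin", "segment": "students"},
    {"product": "P-100", "location": "Austin", "segment": "students"},
    {"product": "P-100", "location": "Austin", "segment": "families"},
    {"product": "P-100", "location": "Austin", "segment": "students"},
    {"product": "P-100", "location": "Boston", "segment": "retirees"},
    {"product": "P-200", "location": "Austin", "segment": "families"},
]


def units_in_stock(product, location):
    """Known question: a precise lookup that needs no exploration."""
    return sum(
        row["units"]
        for row in inventory
        if row["product"] == product and row["location"] == location
    )


def buyers_by_segment(product, location):
    """Open question: count past buyers per segment, most frequent first."""
    counts = Counter(
        row["segment"]
        for row in sales
        if row["product"] == product and row["location"] == location
    )
    return counts.most_common()
```

The lookup has exactly one right answer. The segment summary, by contrast, depends on which data sources were included in the first place, which is where the foraging work comes in.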
It is also important to understand that when we say we don't know the questions, what we actually mean is that we don't know the relationships within the data we have selected to answer the questions we have in mind. The original intent of business intelligence tools was to allow business users to get information from systems without having to write complex queries (hence the design focus on semantic layers and abstraction from physical data models). The resulting output was less important than delivering the data in some manageable form (charts, graphs, etc.). An unfortunate consequence is that BI solutions became constrained by design to deliver answers to a set of known questions.
One may ask whether most business users already know the complex questions they need answered, or whether there is an upward trend in which the vast majority of business users want to continuously frame new questions, using data discovery and visual analytics as a new paradigm.
In my view, there was never a lack of ideas. Business users know the key questions they need to answer to impact their top line/bottom line. However, due to a lack of the right tools and processes, those questions stayed on the back burner for a long time.
Data Discovery and Visual Analytics
Data discovery and visual analytics helps us address this space effectively. From the starting point of the business user discussion, we can craft the "question space" of all (or at least many) of the sub-questions. At that point, understanding and sourcing the relevant data becomes critical. Defining the question space does not have to be exhaustive or big bang by nature; but the better we define it and the more iterative we become, the more robust our data set will be. Once we have done this, data discovery and visual analytics becomes extremely important. It allows us both to uncover relationships we didn't anticipate and to reject relationships that we thought were significant in the first place, once closer scrutiny reveals that they are not. We must be prepared to throw out the insights and start all over again.
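As a rough illustration of keeping some relationships and rejecting others, the sketch below screens candidate variables against a target using the Pearson correlation coefficient. This is a deliberately simple stand-in and not a method prescribed here: the data is invented, the `screen` helper and its `threshold` of 0.5 are assumptions, and a correlation cutoff is not a statistical significance test.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def screen(target, candidates, threshold=0.5):
    """Split candidates into kept and rejected by absolute correlation.

    The threshold is an arbitrary, assumed cutoff for illustration.
    """
    kept, rejected = {}, {}
    for name, values in candidates.items():
        r = pearson(values, target)
        bucket = kept if abs(r) >= threshold else rejected
        bucket[name] = round(r, 2)
    return kept, rejected


# Hypothetical data, invented for illustration only.
weekly_sales = [10, 12, 15, 19, 24, 30]
candidates = {
    "ad_spend": [1, 2, 3, 4, 5, 6],
    "store_temperature": [21, 19, 22, 20, 21, 19],
}

kept, rejected = screen(weekly_sales, candidates)
```

With this made-up data, `ad_spend` lands in `kept` and `store_temperature` in `rejected`. In practice the screening step would be repeated as the question space is refined and new sources are added.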
Data discovery and visual analytics is not a graphical depiction of data. Virtually any software application can produce a chart, gauge or dashboard. Data discovery and visual analytics offers something much more profound. It is the process of analytical reasoning facilitated by interactive visual interfaces.
Data (or information) discovery and visual analytics is more about assembling the many different data sources available and finding insight into what you don't know. This can include structured and unstructured data, and the presentations may vary.
Data discovery and visual analytics goes beyond pure search or visualization. It is a new interdisciplinary science aimed at drawing inferences and conclusions from data. To fully leverage data discovery and visual analytics platforms, it is highly recommended to enrich your data assets and augment them with useful annotations using natural language processing (NLP), machine learning, tagging, reasoning and inference.
SOURCE: Analytics: Detect the Expected – Discover the Unexpected