Business Intelligence Data Analysis and Visualization: What’s in a Name? Part 1

Originally published December 17, 2008

Back in October, a new product was announced that offered “desktop software that allows you to gather data easily, analyze it naturally and communicate your conclusions compellingly.” The objective of the product was to provide a business friendly tool (“a tool designed by analysts for analysts”) that would overcome the complexity of existing business intelligence (BI) tools.

The reviews of this product were generally positive, but one particular review was especially scathing about the product and its developers: “… they didn’t bother engaging the services of designers who actually understand data visualization or data analysis.” This latter review led to a deluge of blogs that either supported the reviewer’s comments or strongly disagreed with them.

As a participant in these blogs, I came to the conclusion that neither side in the debate was necessarily wrong. It really depended on how you viewed the product’s audience and objectives. I felt the discussion, although heated, did provide some valuable insights. In order to be able to articulate these insights, I asked each of the bloggers involved to define data analysis and data visualization. The result is this article.

I have deliberately kept everything anonymous so that the focus is on the results of the discussions, rather than on the discussions themselves and the people involved. I have also edited the feedback for clarity and to help the flow of the article. Although I asked each person for a one-sentence definition for each term, people felt they needed to add explanatory text. This, in turn, caused a series of interleaved discussions and debates! 

Webster’s New World Dictionary Definitions

Data: Things known or assumed; facts or figures from which conclusions can be inferred; information.

Analysis: A separating or breaking up of any whole into its parts, especially with an examination of these parts to find out their nature, proportion, function, interrelationship, etc.

Visualize: To form a mental image of something not present to the sight; envision. 

The Vendor’s Position

Data Analysis: A process in which the analyst moves laterally and recursively between three modes: describing data (profiling, correlation, summarizing), assembling data (scrubbing, translating, synthesizing, filtering) and creating data (deriving, formulating, simulating).

Data Visualization: A method of exposing existing data and/or its attributes (provenance, metadata, distribution) to the eyes, which includes everything from table browse to still charts and multidimensional animation.

Visual data analysis is a sub-genre of data analysis, in which any or all forms of data visualization may be used to provide feedback signals to the analyst. Our product uses visual signals (charts, interactive browsing, workflow process cues) to assist the analyst in moving through the three modes of data analysis.

The crowd of people huddled around the cottage industry of building cool pictures from canned datasets does not like anyone using the words visualization or analysis for anything other than slice, dice, paint and drill. Regarding the visualization family of words, we’ve moved away from visualization to use the term charting. Our target customers understand that word. On the other hand, the word analysis means something dear to our target customers – despite the fact that they use that term very differently from some information technology (IT) and BI specialists. Our customers use the word analysis for the whole process of integrating, manipulating, enriching, filtering and scrubbing their data, which certainly includes, but is not limited to, interacting with their data via charts.

The IT geeks' torturing of academic and semantic borders between data transformation and business rules and analysis is just completely irrelevant to those people on the ground who've always had to do the whole thing in a holistic process, who don't use extract, transform, and load (ETL) or BI tools and so never cared about technological cubbyhole definitions.

Crisp demarcation used by the guardians of the status quo serves to maintain power structures. Analysts, however, will speak as they see fit, which is to say that they will develop language structures that are beneficial to them.  If our goal is to benefit those people, perhaps we should take our cue from them and not try to fight the tide of democratization of our hallowed ground.

The Product Reviewer’s Position

Data Analysis: Data sense-making. The process of discovering and understanding the meanings of data. (Not to be confused with the preliminary steps taken to prepare data for the process of analysis.)

Data Visualization: The use of visual representations to explore, make sense of and communicate data. As such, data visualization is a core and usually essential means to perform data analysis and then, once the meanings have been discovered and understood, to communicate those meanings to others.

Some terms, such as dashboard, tend to have an annoyingly broad range of meanings based on whom you ask, but I don't think of data analysis and data visualization as belonging to this category.

Data analysis need not necessarily involve mathematics or statistics. While it is true that analysis often involves one or both, and that many analytical pursuits cannot be handled without them, much of the data analysis that people perform in the course of their work involves at most mathematics no more complicated than the calculation of the mean of a set of values. The essential activity of analysis is comparison (of values, patterns, etc.), which can often be done by simply using our eyes.

The goal of analysis is not to discover interesting information in the data. Rather, this is merely an important part of the process. The goal is to make sense of data (i.e., to understand what it means) and then to make decisions based on the understanding that's achieved. Information in and of itself is not useful. Even understanding information in and of itself is not useful. The goal of analysis is to enable better decisions.

It's important in a discussion like the one we're having here to be conscious of established meanings that have been assigned to terms like data visualization and be careful not to contribute to the confusion that already exists, especially in the minds of the masses who often assume that every word that comes from our mouths is gospel.

Confusion regarding terms such as data analysis and data visualization exists in the BI community because little effort has been made to sufficiently define them. Our industry tolerates a freewheeling, define-it-as-you-wish attitude toward these and other terms to the detriment of our customers. In the academic world, which I keep one foot in, a greater effort is made to define the terms to provide the shared meanings that are required to communicate, yet even in academia it gets a bit murky at times. I believe that terms are so inadequately defined in the BI community in part because ours is an industry that has largely been defined for marketing purposes, rather than as a rational discipline. It serves the interests of software vendors to keep the terms vague.
 
I agree that we must be open to one another’s ideas and definitions, but I believe the goal of this openness, after thinking long and hard, is to narrow, not expand, our use of these terms. As it is today, these terms are barely useful because they are defined too loosely, broadly and inconsistently. Expanding the definitions will only add to the problem.

Blogger 1 Perspective

Data Analysis: Application of mathematical transformations ranging from simple aggregation to complex statistical analyses to discern interesting information in data.

Data Visualization: Graphical rendering of data values in a fashion that communicates interrelationships of variables.

The data analysis process starts with gathering of data that can contribute to the solution of a business or other problem, and with the structuring of that data in some regular form. It involves identifying and applying a statistical or deterministic schema or model of the data that can be manipulated for explanatory or predictive purposes. It then involves an interactive or automated solution that explores the structured data in order to extract information – a solution to the business problem – from the data. 

The vendor involved in these discussions defines data analysis in an idiosyncratic and self-serving fashion. Our industry distinguishes data transformation and integration from data analysis, and it is misleading when a vendor abuses those established industry categories. 

Blogger 2 Perspective

Data Analysis: The process of understanding data by one or more of five broad categories of methods: aggregation, comparison, correlation, projection and imputation.

Data Visualization: The graphical presentation of data in order to facilitate analysis.

It is not data analysis when someone uses a customer sales report to find what products customers purchased. If, however, the same person sums the total sales, compares customers, or even just sorts customer data to find those customers who bought products on the same day, then this would be data analysis.
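The distinction drawn above can be sketched in a few lines of Python. This is only an illustration: the sales records, customer names and function names are hypothetical and do not come from any product discussed in the debate. The first function simply reads the report, which under this definition is not analysis; the other three perform the aggregation, comparison and same-day grouping described above.

```python
from collections import defaultdict

# Hypothetical sales report rows: (customer, product, purchase date, amount).
SALES = [
    ("Acme", "Widget", "2008-10-01", 120.0),
    ("Birch", "Gadget", "2008-10-01", 80.0),
    ("Acme", "Gadget", "2008-10-03", 50.0),
    ("Cedar", "Widget", "2008-10-07", 200.0),
]


def products_bought(sales, customer):
    """Plain lookup: reads the report without analyzing it."""
    return [product for cust, product, _, _ in sales if cust == customer]


def total_sales(sales):
    """Aggregation: sum every sale amount."""
    return sum(amount for _, _, _, amount in sales)


def sales_by_customer(sales):
    """Aggregation per customer, the basis for comparing customers."""
    totals = defaultdict(float)
    for cust, _, _, amount in sales:
        totals[cust] += amount
    return dict(totals)


def same_day_customers(sales):
    """Group by purchase date; keep dates shared by two or more customers."""
    by_date = defaultdict(set)
    for cust, _, date, _ in sales:
        by_date[date].add(cust)
    return {
        date: sorted(custs)
        for date, custs in by_date.items()
        if len(custs) > 1
    }
```

Comparing customers then amounts to ranking the per-customer totals, for example with `max(totals, key=totals.get)` on the dictionary returned by `sales_by_customer`.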

Graphical presentation of data may be used for presentation purposes only, without facilitating any of the five analytic methods defined above, but this should not be called visualization. There are some grey areas, however. For example, a key performance indicator (KPI), say the classic red traffic light, may not appear to enable analysis, but it implicitly enables a comparison to a target and therefore counts as visualization. On the other hand, people use the conditional formatting features in Microsoft Excel, such as data bars, in ways that could not really be called visualization because they only marginally enable any analysis at all – they provide data comparison at most.

Blogger 3 Perspective

Given the variety of the definitions presented so far, there is little new to add. The interesting point to observe is that there is nothing that has been said about the two terms that one could disagree with entirely.

The natural propensity of language is to evolve. And in a rapidly changing (however forced) environment such as IT, such evolution can be rapid. Software vendors, analysts, consultants and writers all continuously bend and twist the few available words to carry any meaning they require. Few people are classically trained, so we lack the old approach of taking a Latin or Greek stem and creating an entirely new word. In every community and sub-community, thought leaders go through the process of adding meaning to, and subtracting meaning from, words and phrases according to the needs and preconceived notions of their own communities. Add to that the decreasing attention span of people who are overloaded with mostly irrelevant and often invasive stimuli, and we have a situation where misunderstanding is easy and widespread.
 
The lesson of the review that started this debate and, indeed, of this debate itself is to tread lightly. I invite all of us to be open to the belief that we all believe intrinsically and for good reason in our own definitions, but that we can be open to the definitions of others too. Let's therefore use the discussion to expand the narrow meanings to more inclusive ones that can be more widely agreed – at least in the short term.

A Last Few Words from the Author

After reviewing and summarizing the feedback I received on this debate, I have a few comments of my own to make.

At a detailed level, two questions dominate the discussion:

  1. Are data transformation and integration different from data analysis? There are many examples of applications that retrieve data from multiple sources, restructure and aggregate it, and then load the results into a data warehouse. Similarly, data federation and data streaming technologies allow users not only to do dynamic in-motion data transformation and integration, but also data aggregation and summarization. These are all examples of processes that perform some level of data analysis. The ability to clearly delineate data transformation from data analysis is fast disappearing, and to say data transformation is completely different from data analysis makes no sense.

  2. Is data presented for presentation purposes only a form of data visualization? The mere fact that some of the comments got into semantic debates about what is data and what is information, and about whether a user is actually analyzing the results or not, suggests that a more pragmatic viewpoint is required. From my perspective, if data or information is presented to a user in a format that aids decision making, then that constitutes data visualization.

At a more macro level, it is important to define the role of a so-called expert or specialist. Our job is to help people understand and use new and evolving technologies and products for business benefit. As such, we need to use clear definitions and terminology that aid in this understanding. However, it is important that we accept that other people may have different definitions, and we need to find common ground. Defending our positions at all costs does not aid the industry. We also have to accept that business users may employ technology and use some terms in a completely different way, and it is important to adjust our positions and explanations accordingly. Unless we do that, business intelligence will continue to be usable only by the small subset of users who employ it today.

What do you think?

  • Colin White

    Colin White is the founder of BI Research and president of DataBase Associates Inc. As an analyst, educator and writer, he is well known for his in-depth knowledge of data management, information integration, and business intelligence technologies and how they can be used for building the smart and agile business. With many years of IT experience, he has consulted for dozens of companies throughout the world and is a frequent speaker at leading IT events. Colin has written numerous articles and papers on deploying new and evolving information technologies for business benefit and is a regular contributor to several leading print- and web-based industry journals. For ten years he was the conference chair of the Shared Insights Portals, Content Management, and Collaboration conference. He was also the conference director of the DB/EXPO trade show and conference.

    Editor's Note: More articles and resources are available in Colin's BeyeNETWORK Expert Channel. Be sure to visit today!


Comments


Posted April 3, 2012 by Sanath kumar sanath7285@gmail.com

Thanks for sharing tips about business intelligence data analysis and visualization. This topic is very useful, as we must think carefully before starting any business. Business growth should also be visualized as a graph, in terms of percentage. I am extremely thankful to the author for posting such a great and important article.

After reading this article, I also visualized our online business, which is a T-shirt design software tool.


Posted October 29, 2009 by flexmonsters@gmail.com

Great article, Colin! It might be interesting for you to check out the approaches we use in data analysis and visualization at Flexmonster, where we have developed components based on Flex/Flash that can work quickly with vast amounts of data.
