

Big Data: Search vs. Analysis

Originally published October 3, 2013

Search and analysis are not the same at all. And the fact that they are different really matters.

What is search? Search starts with knowing what you want to search for. Suppose we want to find all occurrences of the name “Joe Foster.” A search turns up all the places where “Joe Foster” is found. Sounds pretty simple, right?

Well, let’s look a little closer at this seemingly simple proposition. The first assumption is that we know what we are looking for before we start. In this case, we are looking for “Joe Foster.” As long as we know that up front, everything is fine. But what if we don’t know what we are looking for? What if we want to find all men who are older than 50, are retired military and live in Alabama?

Search technology is fine if we know what we are looking for at the outset. But if we don’t, search technology is pretty limited.

But search has other limitations. What happens when we do a search on “Joe Foster” and we find some hits? What do we know about “Joe Foster”? We know that he exists in a database, and we know how many times he appears there. But that’s it. We don’t know if Joe is a preacher, a rock collector, a dog trainer or a clown. We don’t even know if the first “Joe Foster” that we found is the same “Joe Foster” that appears elsewhere. The occurrences may be references to entirely different people.
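To make the limitation concrete, here is a minimal sketch of a search over raw text. The documents and the `search` helper are invented for illustration and do not represent any particular search product. Note that the result is only a list of locations.

```python
# Hypothetical unstructured documents; the contents are invented for illustration.
documents = [
    "Joe Foster preached on Sunday. Later Joe Foster went home.",
    "The circus hired Joe Foster as a clown.",
    "Nothing relevant here.",
]


def search(term, docs):
    """Return (document index, character offset) for every occurrence of term."""
    hits = []
    for doc_index, text in enumerate(docs):
        start = text.find(term)
        while start != -1:
            hits.append((doc_index, start))
            # Continue looking just past the start of the previous hit.
            start = text.find(term, start + 1)
    return hits


hits = search("Joe Foster", documents)
print(len(hits), "occurrences found")
for doc_index, offset in hits:
    print("document", doc_index, "offset", offset)
```

The search tells us where the name occurs and how often. It says nothing about whether the preacher and the clown are the same person.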

So we see that search has its limitations.

Now what about analysis? Analysis assumes we know just a little bit more about the data that we are going to analyze. In particular, analysis assumes that we have metadata about that data. Metadata is just another form of context. In general, search assumes no context, but analysis assumes that there is context associated with the data that will be analyzed.

Therefore, when we do analysis we don’t have to know what we are looking for before we start. We can do a general purpose query such as: Show me all the men in Tennessee who played football for the Green Bay Packers. Or we can pose queries such as: Count the number of long-arm quilters in Colorado who do independent work. In analysis we have context so we can ask much more incisive analytical questions that rely on the context to help shape the query.
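As a rough sketch of what context buys us, suppose each record carries named attributes. The field names (`sex`, `age`, `state`, `status`) and the sample records below are assumptions made up for this example. With that context in place, the earlier question about retired military men over 50 in Alabama becomes a simple filter, even though we did not know any names in advance.

```python
# Hypothetical records with metadata (named fields); all values are invented.
people = [
    {"name": "Joe Foster", "sex": "M", "age": 62, "state": "AL", "status": "retired military"},
    {"name": "Joe Foster", "sex": "M", "age": 34, "state": "CO", "status": "clown"},
    {"name": "Ann Lee", "sex": "F", "age": 58, "state": "AL", "status": "retired military"},
    {"name": "Sam Ortiz", "sex": "M", "age": 71, "state": "AL", "status": "retired military"},
]

# The query relies on the field names, not on knowing who we are looking for.
matches = [
    person
    for person in people
    if person["sex"] == "M"
    and person["age"] > 50
    and person["state"] == "AL"
    and person["status"] == "retired military"
]

names = [person["name"] for person in matches]
print(len(matches), "matching people:", names)
```

The same fields also tell the two "Joe Foster" records apart, which a plain text search could not do.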

So you see, there is a really big difference between search and analysis.

The Big Data Situation

Why does all of this matter? It actually matters a great deal. In today’s world, people are being pushed toward big data. Indeed, in some cases there are some very good reasons for bringing in big data. But stop and consider this: All data in big data is unstructured. Why is this important? It is important because with unstructured data you have no metadata. You have no context with unstructured data. This means you can only do the simplest of searches against unstructured data. Since there is no context, you can’t do any real analytical processing.

If all you want are simple searches, then big data may be your thing. But real business value comes from analytical processing, not simple searches.

So what’s the problem here? The problem is that in a vendor’s zeal to sell you a big data solution, one point gets lost: you can’t do sophisticated analytical processing against big data. And if you can’t do sophisticated analytical processing, you are going to have a hard time getting any real return on investment for the infrastructure you have to build to support big data.

Just some food for thought.

  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

