Blog: Krish Krishnan

Krish Krishnan

"If we knew what it was we were doing, it would not be called research, would it?" - Albert Einstein.

Hello, and welcome to my blog.

I would like to use this blog for constructive communication and exchanges of ideas in the business intelligence community, on topics from data warehousing to SOA to governance and everything under the umbrella of these subjects.

To maximize this blog's value, it must be an interactive venue. This means your input is vital to the blog's success. All that I ask from this audience is to treat everybody in this blog community and the blog itself with respect.

So let's start blogging and share our ideas, opinions, perspectives and keep the creative juices flowing!

About the author

Krish is a recognized expert worldwide in the strategy, architecture and implementation of high performance data warehousing solutions. He is a visionary data warehouse thought leader and an independent analyst, writing and speaking at industry leading conferences, user groups and trade publications. He has authored two eBooks, more than 75 articles, viewpoints and case studies on business intelligence, data warehousing, and data warehouse appliances and architectures. In his 19 plus years of professional experience, he has been solving complex architecture problems spanning all aspects of data warehousing and business intelligence for Fortune 1000 clients. He has designed and tuned some of the world’s largest data warehouses.

The Vice President of Strategy at Chicago Business Intelligence Group, Krish teaches regularly at TDWI, DAMA, IRM UK and other conferences, and is helping drive and mature the data warehouse appliance market. Krish also serves as Associate Vice President of Programs for DAMA Chicago and is Ethics and Governance Advisor to DAMA International.

Editor's Note: More articles and resources are available in Krish's BeyeNETWORK Expert Channel. Be sure to visit today!

In the blur of Big Data, there is an element of suspense and mystery that keeps many from adopting it: what information is available, and where are the integration points for linking it to your enterprise? While several technologies address the volume problem, there is one way to address the complexity and ambiguity side of Big Data: using taxonomies to drive a data discovery exercise.

Taxonomies have long been used to create catalogs and indexes in metadata-driven approaches to data management, and even more so in Web-driven architectures, where you need linked context behind the scenes. The very same taxonomy family can be used to create what we call word clouds, or tags, from the content within Big Data. These tags can be used to create powerful linkages that form a lineage and a graph.
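
To make the tagging idea concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the tiny taxonomy, the sample documents and the function names `tag_document` and `build_links` are assumptions, not part of any vendor's product. A real taxonomy would carry a full hierarchy and far more terms.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical taxonomy: term -> category it belongs to.
TAXONOMY = {
    "laptop": "Electronics",
    "tablet": "Electronics",
    "warranty": "Service",
    "refund": "Service",
    "shipping": "Logistics",
}

def tag_document(text):
    """Return the set of taxonomy terms that appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {word for word in words if word in TAXONOMY}

def build_links(docs):
    """Map each pair of co-occurring tags to the ids of documents containing both.

    The result is a simple edge list: tags are the nodes, and shared
    documents are the linkages between them.
    """
    links = defaultdict(set)
    for doc_id, text in docs.items():
        tags = sorted(tag_document(text))
        for first, second in combinations(tags, 2):
            links[(first, second)].add(doc_id)
    return links

# Hypothetical content pulled from three different sources.
DOCS = {
    "email-1": "My laptop arrived late, shipping was slow.",
    "chat-7": "I want a refund for the laptop under warranty.",
    "review-3": "Tablet shipping was fast!",
}

LINKS = build_links(DOCS)

for (first, second), doc_ids in sorted(LINKS.items()):
    print(f"{first} <-> {second}: {sorted(doc_ids)}")
```

Each printed line is one edge of the graph: two tags, plus the documents that connect them. Following those edges from tag to tag is one simple way to trace a lineage across sources.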

What about data quality? That is the biggest advantage of using taxonomies. When you have spelling errors and language issues, the intrinsic structure of a taxonomy lets you apply a margin of error when matching terms and often arrive at a close match.
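
As a rough illustration of matching within a margin of error, the sketch below uses Python's standard `difflib` module to map a misspelled word to the closest taxonomy term. The term list, the `closest_term` helper and the 0.8 similarity cutoff are assumptions chosen for this example; the technique shown is plain string similarity, which is only one of several ways to get a close match.

```python
import difflib

# Hypothetical list of taxonomy terms to match against.
TAXONOMY_TERMS = ["warranty", "refund", "shipping", "laptop", "tablet"]

def closest_term(word, cutoff=0.8):
    """Return the taxonomy term most similar to `word`, or None.

    `cutoff` is the margin of error: a similarity score between 0 and 1
    below which a candidate is rejected.
    """
    matches = difflib.get_close_matches(
        word.lower(), TAXONOMY_TERMS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None

for raw in ["waranty", "Shiping", "banana"]:
    print(raw, "->", closest_term(raw))
```

Lowering the cutoff catches more spelling variants but also raises the chance of a wrong match, so the margin needs tuning against real data.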

Will this work on all types of Big Data? In my experiments and learnings, it has worked with almost all types of data that can be deciphered by human minds. My next article in this channel will focus on this subject.

What can you do with the output from such a discovery? The obvious answer is that you can create a data roadmap with linkages to all data across the enterprise. This is a foundational first step in a Big Data journey.

Posted May 17, 2012 12:51 PM
In the last year there has been a lot of buzz about "Big Data" and the vast resources it holds, which can provide unmatched value and insights when mined for information. The question is how much data we are talking about mining, and how long it will take to create the data structures that drive metrics and KPIs from this data. This time lapse costs opportunities and has often been the entry barrier to "big data" adoption in many companies today.

It is in this world of chaos that we will see the strength of an emerging area of data architecture: taxonomies. Taxonomies themselves have been popular since the days of Aristotle, and are even said to be referenced in the writings of Chinese emperors around 3000 B.C. The word is derived from Greek and refers to the practice of classifying and naming things, most famously species. Taxonomies have been used in biology and language for many years.

Today we have taxonomies available for nearly every product and subject area in the world, thanks to companies like Wand Inc., Pingar and others. These taxonomies provide a clear metadata lineage and relationship map, which can be applied directly to any kind of data to navigate and classify it. Another benefit of taxonomies is the ability to integrate different types of data about the same subject and create powerful mashups.

The biggest advantage of using taxonomies is the ability to navigate data in its native form without having to transport it to a single location. This removes latency and keeps integration work to a minimum. A second advantage is that you can navigate multiple subject dimensions in one document, video or picture without reprocessing the data multiple times.

In the land of "big data", you can discover hidden nuggets of information with this approach and then create powerful visualizations using lightweight reporting tools like Tableau or Spotfire.

To learn more on these subjects and their usage, attend EDW 2012, TDWI and Strata Conferences.

Posted April 15, 2012 4:19 PM
In recent months, there has been a lot of activity surrounding Big Data and its entire ecosystem: several mergers and acquisitions, new vendor announcements, capability enhancements and more. Along the way there have also been several announcements from the pure-play analytics community, including vendors, contributors and organizations, in support of extending analytical capabilities to the Big Data platform or including Big Data as part of the analytic source and data ecosystem. At first pass this seems to be a natural progression, but do not wear your regular thinking cap and take that for granted.

If you step outside your normal periphery and look at data, Big Data in particular, you will find that in spite of its sheer vastness in volume, velocity, variety and complexity, this data is very easily visualized and understood when seen through analytics. Imagine, for example, that the additional recommendations for your search on Amazon.com were all purely textual; you would have little interest in going back to Amazon.com. Instead, the data is presented as a statistic with associated confidence factors, which makes it easy for you to shop there repeatedly. This is a simple example of why analytics matters.

As you start looking at Big Data, remember to look for analytics too; without the latter, the former will never provide you with useful insights.

Posted March 13, 2012 1:01 PM
There has been a lot of buzz surrounding Big Data and how it makes every organization look cool and competitive (the multiple layers of intricacy involved notwithstanding). There are infrastructure providers, community-driven projects with associated support, research efforts, venture-capital-backed vendors and much more.

While all this buzz is great and early adopters of this new and shiny object have made inroads, Big Data has not yet been embraced by business users at a mass adoption level. The reason is one key aspect: visualization. One of the fundamental things to understand here is that the way we approach Big Data, its modeling and its integration is very different from any other data integration exercise. Here is a simple way of looking at the difference:

  •  Traditional Data Integration - Business Requirements & Analysis --> Model --> Organize --> Collect --> Integrate --> Store --> Analyze --> Visualize
  •  Big Data - Collect --> Store --> Organize --> Visualize --> Analyze --> Business Requirements & Analysis --> Model --> Integrate --> Visualize
As you can see from the flows above, Big Data needs visualization before you can settle on business requirements, and again after integration. You might wonder whether this is really a huge problem or whether we are hyping it up. In reality it is a problem, and very few options are available as solutions at this point. I do not want to classify any "App store" downloads as robust solutions; they are all aimed at the personal consumer market.

The reason for the current situation can be analyzed in two ways:

  • Infrastructure Focus - Web 1.0 and 2.0 focused on infrastructure and its underpinnings; the OSI model and the current solutions in the marketplace point to that.
  • Data Ambiguity and Complexity - Big Data is by nature complex and ambiguous, which requires additional effort from deep SMEs, and sometimes quants, to think through, integrate and solve. These folks need to be able to analyze the data visually rather than read machine data or long pages of text. The tools are not there yet for this purpose.
It is simply unfair to expect anyone to look through large sets of raw data and derive value from them. We need tools that provide an interrogation platform for that data. These tools will, and should, be heavily focused on ontologies and semantics, as we are not yet ready to model or integrate the data.

The journey ahead is still greenfield. There are a few vendors who are visionaries, and among them a few who are considered leaders. This year we will see a flurry of activity and investment in this side of the house. The frenzy of Big Data has not peaked yet, but the peak is not too far in the future.

Posted March 3, 2012 6:37 AM
In the last part of this blog entry, we discussed the basic premise of the Crowd and why we are looking more to the Crowd for insights. There have been many studies on whether the Crowd is really smart or is a distortion field swayed by the voices of a few powerful influencers. Most notable among these is a 2009 study led by Carnegie Mellon University (CMU) professor Vassilis Kostakos, which pokes a big hole in the prevailing belief that the "wisdom of crowds" is a trustworthy force on today's web. His research studied voting patterns across several sites featuring user-generated reviews, including Amazon, IMDb and BookCrossing. But did you know that the Crowd has moved beyond just user reviews?

Let's look at some real-life examples. A popular one, quoted by Don Tapscott in his book Macrowikinomics, is "Local Motors", a company that builds custom cars based on design ideas submitted by car enthusiasts from around the world. Not only can you design your own car, you can also visit a Local Motors location, order parts, customize your car and do whatever it takes to build it to your specification. Their mission statement reads "To lead the next generation of crowd-powered automotive manufacturing, design, and technology in order to enable the creation of game changing vehicles". Another great example is the Fiat Mio, an open innovation car project by Fiat Motor Company.

Why is open innovation with crowdsourcing becoming more popular? There are two reasons: (1) The internet has created a virtual world with no physical boundaries. This means you can easily form communities of common interest, which leads to a crowd that is very knowledgeable and passionate about a particular topic or set of topics. (2) When you open a problem to a large crowd of individuals, you get a set of solutions with a higher confidence of working and a potentially lower risk.

Let's come back to the reviews and feedback on various forums. Why do you care, why does an online or brick-and-mortar retailer care, and why does your 401(k) investment manager care? From a business perspective, the reason is to create transparency and to build trust with you as a customer or prospect. Your opinion will be shaped by the experiences of others, and you need to hear positive and negative sentiments equally to reach an informed opinion or decision. Word-of-mouth marketing in any social media channel gives you access to more metrics than ever before, and the socially aware customer is a better customer to acquire and retain.

Crowdsourcing has also become mainstream, with companies like Kaggle, Threadless, CrowdAnalytix, Cambrian House and InnoCentive hosting competitions in which the Crowd solves posted problems. These problems range from simple to complex puzzles and often come with cash prizes.

Linux, Hadoop and many other popular software projects today are stellar examples of the Wisdom of Crowds.

In conclusion, we see from ancient and recent history that the Crowd formed from a community of interest has often proved to be powerful and game changing. As we move into the future, the internet and social media together have created the platform for virtual crowds to form and create powerful game changing movements.

Posted February 20, 2012 7:16 PM