
Blog: Krish Krishnan

Krish Krishnan

"If we knew what it was we were doing, it would not be called research, would it?" - Albert Einstein.

Hello, and welcome to my blog.

I would like to use this blog for constructive communication and exchanges of ideas in the business intelligence community, on topics from data warehousing to SOA to governance, and all the topics under the umbrella of these subjects.

To maximize this blog's value, it must be an interactive venue. This means your input is vital to the blog's success. All that I ask from this audience is to treat everybody in this blog community and the blog itself with respect.

So let's start blogging and share our ideas, opinions, perspectives and keep the creative juices flowing!

About the author

Krish Krishnan is a worldwide-recognized expert in the strategy, architecture, and implementation of high-performance data warehousing solutions and big data. He is a visionary data warehouse thought leader and is ranked as one of the top data warehouse consultants in the world. As an independent analyst, Krish regularly speaks at leading industry conferences and user groups. He has written prolifically in trade publications and eBooks, contributing over 150 articles, viewpoints, and case studies on big data, business intelligence, data warehousing, data warehouse appliances, and high-performance architectures. He co-authored Building the Unstructured Data Warehouse with Bill Inmon in 2011, and Morgan Kaufmann will publish his first independent writing project, Data Warehousing in the Age of Big Data, in August 2013.

With over 21 years of professional experience, Krish has solved complex solution architecture problems for global Fortune 1000 clients, and has designed and tuned some of the world’s largest data warehouses and business intelligence platforms. He is currently promoting the next generation of data warehousing, focusing on big data, semantic technologies, crowdsourcing, analytics, and platform engineering.

Krish is the president of Sixth Sense Advisors Inc., a Chicago-based company providing independent analyst, management consulting, strategy and innovation advisory, and technology consulting services in big data, data warehousing, and business intelligence. He serves as a technology advisor to several companies, and is actively sought after by investors to assess startup companies in data management and associated emerging technology areas. He publishes with BeyeNETWORK.com, where he leads the Data Warehouse Appliances and Architecture Expert Channel.

Editor's Note: More articles and resources are available in Krish's BeyeNETWORK Expert Channel. Be sure to visit today!

November 2011 Archives

Pretty soon we may have weathermen announcing alerts such as a "Data Storm" or "Data Blizzard" headed toward a city, and I mean this with all seriousness. The bizarre volume of data growing in the public domain is simply too much for a human mind to comprehend. The frenzy for generating data is fueled by continuous innovation in mobility, wireless communications, stronger infrastructure and an ever-ready audience. If you want to know what is happening in a particular celebrity's personal life at any given moment, you can find a million social media sites and forums; if you want to know where the next internet-led revolution will start, you can stay tuned to multiple Facebook pages and other social media sites. The point is that as facilities scale up and technology becomes more affordable, you will have more data producers than data consumers, and ever more relevant will be the question: "what is noise and what is real information?"

Data generation is not a bad thing in itself, but who uses the data and how useful it can be is the question of relevance. If you are on Facebook and post to your page every hour, chances are someone in your primary or extended network will be watching it or posting on your wall. Those interactions are exactly what Facebook needs to run ads and study influencer behaviors; it really does not care about the personal side of the data at all.

Now let's take a look at a consumer's own behavior. For example, how much value do you see in the data on your Facebook pages, and how much do you care about that data after a day, a week, a month or a year? You really do not think about it, because you are using the platform for free (actually paying for it in some very indirect way). This is why we need to look out for Data Storms in the future.

The information overload from public-domain data produced for content sharing and sentiment sharing is causing a tizzy. There is point-in-time value in this data, but where is it stored? Who needs it? Does it make sense to keep it? What are the security threats to this data? Thus comes the big question of "Data Governance for Public Data." Sooner or later we need to address this question, and I recommend that the answer be driven and defined by the same consumers who generate the data every day, as it is their information privacy, security and more at stake.

But until then, let's watch for weathermen to announce Data Storms or Data Weather Readings.

Posted November 30, 2011 9:24 AM
Permalink | No Comments |
Over the weekend, I participated in BigDataCamp in Chicago and met with over 100 people who spent a Saturday trying to understand what Big Data and the Hadoop/NoSQL stack are all about. What is very interesting to learn from this experience, and from other engagements over the last six months, is that most people have yet to grasp the complexities of processing Big Data. While technologies such as Hadoop, Mahout, Cassandra, HBase, Hive and more will help you start, there is a lot beneath that surface.

The world of Big Data will require one to think in terms of algorithmic approaches to problem solving. This is no longer a drag-and-drop world of processing data through fancy GUIs, though vendors are attempting that evolution. There are reasons why you need to slow down when ingesting this type of data, and bite off the complexities in incremental chunks:

  • Big Data comes in several structured, semi-structured and unstructured formats. Most of this data is buried in context, and the content does not lend itself to processing without that context.
  • Big Data cannot be tokenized very easily; this becomes clear when you start thinking about the end results of the process and the KPIs you want to derive. Tokenization needs two parts - robust metadata and a strong taxonomy.
  • Big Data needs several data quality rules to be implemented, which means extensive data cleansing and processing. Beyond English, there are several other languages you will need to deal with.
  • There are no set paths to process images, videos and more complex data. These techniques are still evolving and need more time to mature.
  • The correlation and derivation of meaning from compressed terms is not simple for anyone who is not a business SME.
  • Different types of content need multiple rules to process them, derive the actual underlying meaning and then prepare them for integration.
  • Taxonomy-driven navigation is a good start, but often not enough to create the appropriate context.
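To make the metadata-and-taxonomy point concrete, here is a minimal sketch of taxonomy-backed tokenization. The taxonomy entries, compressed terms and category names are hypothetical illustrations, not any product's actual vocabulary:

```python
# A minimal taxonomy mapping compressed/raw terms to an expanded term
# and a business category. All entries here are invented examples.
TAXONOMY = {
    "chkout": ("checkout", "Order Processing"),
    "cust":   ("customer", "CRM"),
    "rtn":    ("return",   "Order Processing"),
}

def tokenize(text):
    """Split text, expand compressed terms, and attach a category from the taxonomy."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        expanded, category = TAXONOMY.get(word, (word, "Unclassified"))
        tokens.append({"raw": word, "term": expanded, "category": category})
    return tokens

print(tokenize("Cust reported rtn issue at chkout"))
```

Without the taxonomy, "rtn" is just noise; with it, the token carries both its expanded meaning and a business context, which is the two-part foundation the bullet above describes.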
While the platform for processing is ready and available, there is no simple process to walk through the complexities. A couple of good solutions out there are Textual ETL (Bill Inmon) and the Mahout project (machine learning from the Apache community). The Textual ETL engine is a business-facing rules-creation tool that helps process Big Data; Mahout is more complex, but has ready-made algorithms for recommendation engines and more. Feel free to look at both solutions, as they solve different problems and are complementary rather than competing.
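The idea behind Mahout-style item-based recommenders can be illustrated in a few lines of plain Python. This is a toy co-occurrence counter under invented basket data, greatly simplified from what Mahout actually does, not Mahout's API:

```python
from collections import defaultdict

# Toy purchase baskets; each set is one transaction.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "jam"},
]

# Count how often each pair of items appears in the same basket.
cooccur = defaultdict(lambda: defaultdict(int))
for basket in baskets:
    for a in basket:
        for b in basket:
            if a != b:
                cooccur[a][b] += 1

def recommend(item, k=2):
    """Return the items most often bought alongside `item`."""
    ranked = sorted(cooccur[item].items(), key=lambda kv: -kv[1])
    return [name for name, _ in ranked[:k]]

print(recommend("milk"))  # bread co-occurs twice, butter once
```

Real recommenders add similarity weighting, sparsity handling and distributed computation, but the co-occurrence intuition is the same.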

As I mentioned earlier, the Big Data world is fascinating and provides a lot of valuable insight, but the hidden complexities of processing and integrating this data to make it meaningful can make it seem a daunting task, or even an over-engineered solution build.

Posted November 21, 2011 7:21 AM
I have worked on several "social BI" projects over the past two years; some have been extremely successful, others have had their fair share of adoption issues. The bottom-line question that hits home: is the program worth all the effort?

Well, if one were to ask this question throughout the program, the answer would be "yes," and if you wonder why, there are several reasons:

  • Social Media has rich content on consumer sentiments, brand image, brand reach, consumer awareness, competitive information, research data and more.
  • Social Media holds information on patterns of behavior of products, markets and consumers over time, across the world.
  • Social Media has several independent venues providing critical research on anything under the sun.
  • Social Media also hosts the most important digital presences - your organization and your customers.
If one were to harness all this data and apply the right content disambiguation rules to it, it would provide rich information that, used in conjunction with your internal systems, gives you a collage of the behaviors, events and triggers that lead to your product and its adoption, your consumers and their sentiments, and much more.
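A very small sketch of what such content rules look like in practice: rule-based sentiment tagging of social posts. The word lists and sample posts are hypothetical; real systems use far richer lexicons and disambiguation logic:

```python
# Invented positive/negative word lists for illustration only.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"late", "broken", "slow"}

def score_post(text):
    """Tag a post positive, negative or neutral by simple word matching."""
    words = {w.strip(".,!?") for w in text.lower().split()}
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

posts = [
    "Delivery was late again",
    "Love the new produce selection, great service",
]
print([score_post(p) for p in posts])  # ['negative', 'positive']
```

Aggregating such tags over time, by brand or region, is what turns raw feeds into the sentiment and brand-reach signals listed above.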

Now let us talk economics. Let's say your average marketing spend is $500k per year for the enterprise, and your line of business is home delivery of groceries. Your business has slowed down, and you hire a market research company to assess why. Assume they come back and recommend that you increase marketing spend and start looking at new locations for additional markets; would you spend the dollars to increase marketing?

Not yet, because through additional research of your own you find that there are competitive threats and consumer sentiment about a decline in your quality of service, and you want to learn more. So you decide to spend, say, $250k to integrate social media data, with Twitter, forums and Facebook pages as your target sources. In harvesting the data feeds through various listening posts, you discover that:

  • You have lost 20% of your market to competition due to service delivery issues.
  • You have lost 8% of your clients due to economic conditions.
  • You can gain additional clients in remote areas if you can expand services for certain brands of foods.
  • Your preferred food brands have created a group of loyal followers, who can be your word-of-mouth marketing if you cater to them well, and their clout can bring you a potential 30% net new clients.
  • Your products need revamping.
  • Your call centers need more data to help and support customers beyond just orders.
  • Your ability to cross-sell and up-sell was limited, and customers left for more lucrative offers.
Upon getting better insights, you change your product strategy, your CRM improves drastically, your quality metrics go up and, most importantly, you start developing a "crowd" of customers who become your ambassadors and advisers.

Now let's say you spend $200k of new dollars on marketing: your sales jump 5% from new clients, 10% of customers take additional cross-sell and up-sell options, and 40% of your customers give you an "AAA" rating, a lift of 18% from before.

In this situation, the economics of social media data is seen very clearly. There are many more such instances that can be discussed and business cases that can be articulated.
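A back-of-the-envelope version of the arithmetic above, using the post's figures plus a hypothetical baseline revenue (the $10M figure is an assumption added purely to make the calculation concrete):

```python
# Figures from the scenario above; annual_revenue is a hypothetical baseline.
annual_revenue     = 10_000_000   # assumed baseline revenue for illustration
social_media_cost  =    250_000   # integrating Twitter/forum/Facebook feeds
targeted_marketing =    200_000   # marketing spend redirected by the insights
sales_lift_pct     = 0.05         # the 5% jump in new-client sales

lift_revenue = annual_revenue * sales_lift_pct
total_cost   = social_media_cost + targeted_marketing
net_gain     = lift_revenue - total_cost

print(f"lift: ${lift_revenue:,.0f}, cost: ${total_cost:,.0f}, net: ${net_gain:,.0f}")
# Under these assumptions the program pays for itself in year one ($50,000 net),
# before counting retention, cross-sell and word-of-mouth effects.
```

The point is not the exact numbers but that the social media investment can be measured against revenue lift like any other spend.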

My two cents: there is a lot of rich and varied content available, and you need to decide what is best for your organization. If you want to compete and remain in business over the next decade, you will adopt social media strategies, and the economics of social media data will provide measurable gains. Software tools such as TIBCO Spotfire and Tableau can help you create powerful dashboards and give you the ability to drill down to better insights.

A number of technologies and data integration points need to be designed and implemented; that will be the subject of a whitepaper, available in December 2011 in my channel.

Posted November 12, 2011 7:50 PM
Like the Earth itself, data has a history too. We had the Ice Age of data - the punch card and large magnetic tape days - when data was primitive and binary in nature. Then we matured into the Stone Age of data, with mainframe computers and languages such as COBOL; along came PASCAL and FORTRAN, and data started becoming mature and available for consumption.

Next came the Iron Age, where we discovered UNIX and C (thanks to Ritchie and Thompson); we were suddenly capable of building strong processing platforms and found that we could process data on the mainframe and on Unix as well. Data was fast emerging from the shadows as a commodity in the enterprise. We saw the convergence of relational theory and glimpses of the first relational databases.

Next came the Bronze Age, with the advent of the Macintosh, OS/2 and Windows. We discovered client-server computing - what a novel idea - and along with client/server came networking, security, and the emergence of development platforms such as Visual Basic and PowerBuilder. We could now build smart applications and tie them to powerful back-end DBMS systems such as Oracle, Sybase, Informix, DB2 for UNIX and more. Data was fast emerging as the backbone of enterprise applications.

As we moved out of the Bronze Age of data, we were already swimming in ERP, supply chain, logistics, transportation, warehousing and CRM. This was already making consumers of data dizzy.

And now we are stepping into the new age of data - Big Data. Data has grown in size and volume, as the name indicates.

This new age of data is ushering in a new age of possibilities. We can now visualize everything from the behavior of people on websites to what drives communities and interests, by following the clickstream. We can see crowdsourcing in action on the internet. We can get insights like never before on patterns and what drives them. As we swim in this fast-moving current of data, it is becoming clear that we will innovate faster than ever in solutions and applications with these newfound capabilities. This new age of data will create a market and opportunity like never before. In reality, we can say it is the coming of age of the internet.

As I write this blog, the Silicon Valley VC community is pumping money into ventures that so much as mention Big Data or Hadoop. This kind of action was last seen in the early '90s. We are probably well into the next boom period, and it will be a long-tail boom, meaning volume driven.

Posted November 7, 2011 8:01 PM
The buzz and hype are all around you: whether it is a leading technology publication, a vendor, any IT-related website or any conference, the keyword is "BIGDATA". Any search on the topic brings up a word cloud of technologies, companies and more.

Let's get real: when you start talking about a Big Data project, it is a transformation for the enterprise. You are not talking about just another data project; it is not another data warehouse or another application. It is a combination of everything the organization has around data, analyzing it and presenting the results in dynamic dashboards, on mobile technologies and more.

A Big Data project is not about tinkering with the hottest technologies; rather, it is an innovation and disruption (internal and external) that you want to bring to the enterprise. The success criterion is not how many petabytes you have, but what influence the insights from the data had on you as an organization, and the benefits they bring to your business and its customers. Think different, because you are dealing with something that has never been done in the enterprise before this program.

Another key thing to remember is that the success of any Big Data transformation starts with champions among the people in your organization. We are seeing the emergence of a new role for these champions - Data Scientists.

While the technology is important, and a lot of it is new and emerging, the people and process sides of this transformation are key to making it a valuable asset for the enterprise. How do you monetize the data you have accumulated in the enterprise? It gets better: how do you monetize social media data that provides rich customer sentiment and enriched competitive intelligence? These are the major goals for your own Big Data program.

The bottom line: ensure you have executive sponsorship, and for that your executives will need to understand that this is a transformation, not a tinkering exercise.

Watch this blog and Twitter for announcements on technologies, frameworks and implementation methodologies, all for your Big Data success.

Posted November 4, 2011 3:58 PM