

Blog: Krish Krishnan

Krish Krishnan

"If we knew what it was we were doing, it would not be called research, would it?" - Albert Einstein.

Hello, and welcome to my blog.

I would like to use this blog to have constructive communication and exchanges of ideas in the business intelligence community on topics from data warehousing to SOA to governance, and all the topics in the umbrella of these subjects.

To maximize this blog's value, it must be an interactive venue. This means your input is vital to the blog's success. All that I ask from this audience is to treat everybody in this blog community and the blog itself with respect.

So let's start blogging and share our ideas, opinions, perspectives and keep the creative juices flowing!

About the author >

Krish Krishnan is a worldwide-recognized expert in the strategy, architecture, and implementation of high-performance data warehousing solutions and big data. He is a visionary data warehouse thought leader and is ranked as one of the top data warehouse consultants in the world. As an independent analyst, Krish regularly speaks at leading industry conferences and user groups. He has written prolifically in trade publications and eBooks, contributing over 150 articles, viewpoints, and case studies on big data, business intelligence, data warehousing, data warehouse appliances, and high-performance architectures. He co-authored Building the Unstructured Data Warehouse with Bill Inmon in 2011, and Morgan Kaufmann will publish his first independent writing project, Data Warehousing in the Age of Big Data, in August 2013.

With over 21 years of professional experience, Krish has solved complex solution architecture problems for global Fortune 1000 clients, and has designed and tuned some of the world’s largest data warehouses and business intelligence platforms. He is currently promoting the next generation of data warehousing, focusing on big data, semantic technologies, crowdsourcing, analytics, and platform engineering.

Krish is the president of Sixth Sense Advisors Inc., a Chicago-based company providing independent analyst, management consulting, strategy and innovation advisory and technology consulting services in big data, data warehousing, and business intelligence. He serves as a technology advisor to several companies, and is actively sought after by investors to assess startup companies in data management and associated emerging technology areas. He publishes with the BeyeNETWORK.com where he leads the Data Warehouse Appliances and Architecture Expert Channel.


October 2011 Archives

There is widespread confusion in the market about this buzzword, or catchphrase: Big Data. Sit through presentations from 10 vendors and 20 different definitions are likely to come forward, each skewed to support that vendor's products or services.

This is where the issue lies. I have had people tell me they work on Big Data and then talk about the volume of data rather than the type of data. There are people who talk about Big Data and then focus on CLOB and BLOB text. There are people implementing sentiment analytics tools who say they are doing Big Data.

To cut through the confusion, let's look at the world this way:

1. Companies need to make business decisions and compete on products and services. To accomplish this, they build operational systems: transaction processing, CRM, ERP, and SCM systems. All of these systems create and generate data at different touchpoints. That data is then collected in an ODS and EDW/data marts to generate reports and feed analytics. Once the reports and analytics are done, the data is consumed by business managers and executives who take action. All of that action happens in emails and documents and is never related back to the EDW or ODS. The data generated by all these systems, including email, documents, and the operational systems, collectively forms one kind of Big Data: internal Big Data.

2. Then there is external Big Data: clickstream logs, machine-generated logs, third-party data, sentiments, forums, blogs, and so on, which need to be analyzed for competitive intelligence, voice of customer, behavior trends, speech-to-text (call center), and image processing.

All of these types of data, taken together as a company uses them, form the true definition of Big Data. Hadoop is one platform for solving this problem, but there are other emerging and semi-established platforms too. All major vendors in the DW space have already pledged support for Hadoop and talk about Big Data-specific solutions.
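As a minimal sketch, the two-part classification above can be expressed as a simple tagging function. The source names and category sets here are illustrative, not drawn from any real system:

```python
# Illustrative sketch: tag data sources as internal or external Big Data,
# following the two-part classification above. Source names are hypothetical.
INTERNAL_SOURCES = {"crm", "erp", "scm", "ods", "edw", "email", "documents"}
EXTERNAL_SOURCES = {"clickstream", "machine_logs", "third_party",
                    "sentiments", "forums", "blogs"}

def classify_source(name: str) -> str:
    """Return 'internal', 'external', or 'unknown' for a data source."""
    key = name.lower()
    if key in INTERNAL_SOURCES:
        return "internal"
    if key in EXTERNAL_SOURCES:
        return "external"
    return "unknown"

sources = ["CRM", "clickstream", "email", "forums"]
print({s: classify_source(s) for s in sources})
```

The point of such a taxonomy is less the code than the discipline: every data set a vendor pitches as "Big Data" should land in one of the two buckets, or be questioned.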

This post gives you my perspective on the definition of Big Data. You may or may not agree with it, but it is grounded in experience from several Big Data projects I have worked on over the past few months.

Posted October 24, 2011 10:07 PM
The market is maturing in its understanding of the value of social media and the data it generates. Though not every company will derive direct benefit from data in the social media space, there is something for everyone.

Let us classify the types of intelligence one can derive from social media:

  1. Customer Behavior
  2. Customer Sentiments
  3. Product Behavior
  4. Megatrends
  5. Competitive Intelligence
In 2010, Google's Eric Schmidt said, "I don't believe society understands what happens when everything is available, knowable and recorded by everyone all the time." Using such data, different organizations will derive different success factors from social data integration.

But just understanding the data is only the first step. One needs to get at the content and its context. In the last two years, several banks, retail organizations, and travel and transportation companies have started integrating social data into their mainstream systems, but even these teams are still scratching the surface.

To derive the true power of this data, one needs to adopt a clear set of machine learning algorithms that turn data into insights and eventually deliver value. For example, Amazon.com derives 30% of its sales from its recommendation engine. That capability took a few years to mature from inception, but nevertheless the value is there and visible.

How do you set up machine learning? There are multiple techniques, the easiest of which is to have your business SMEs and marketing SMEs examine this data and recommend a set of rules to process it. Then you take a platform such as Hadoop or Textual ETL, define the rules, and process the data. As you discover more trends and patterns, you keep maturing the rules. The process is complex but will provide ROI in days, not years.
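The rule-first approach described above can be sketched in a few lines. The rules and posts here are invented for illustration; in practice this logic would run at scale on a platform such as Hadoop or Textual ETL, and the rule set would grow as SMEs discover new patterns:

```python
# Minimal sketch of SME-authored rules applied to raw social-media text.
# Each rule maps a keyword to a tag; the rules and posts are hypothetical.
SME_RULES = [
    ("refund", "complaint"),
    ("love", "positive_sentiment"),
    ("broken", "product_issue"),
]

def apply_rules(text, rules):
    """Return the tags whose keyword appears in the text."""
    lowered = text.lower()
    return [tag for keyword, tag in rules if keyword in lowered]

posts = [
    "I love the new phone!",
    "Screen arrived broken, I want a refund.",
]
for post in posts:
    print(post, "->", apply_rules(post, SME_RULES))
```

Maturing the rules is then just editing `SME_RULES` as new trends surface, which is why this approach pays back quickly compared with training a model from scratch.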

If you ever wonder about the worthiness of social data, never doubt it: there is value, and you need to explore and examine it.

Tomorrow we will start looking at semi-structured data and its worthiness.

Posted October 20, 2011 9:28 PM
In the world of unstructured data, you often find that companies have been keeping data around for longer than required by their own policies or by compliance regulations. Recently, companies such as Facebook have been fined thousands of euros and dollars for harboring data. One wonders at such times: is content (the life of data after you consume the analytics, plus all the semi-structured and unstructured material) a good thing to keep or a bad thing to get rid of? We are talking about the retention of content beyond archival to offsite and backup locations: does it have any business value?

Let's tell a fictional story here. XYZ Corporation is a multi-faceted services company providing financial, marketing, legal, and hosting services. Among its large clients is MNC Autos, a large automobile parts manufacturer and supplier. XYZ manages all the financial affairs of MNC employees, including payroll, 401(k), and pension funds, as well as legal contracts, marketing channels and agreements, and document creation and retention. Due to a large recall of auto parts for a certain car and truck manufacturer, MNC Autos is subject to litigation and is asked to produce all contracts it has with its suppliers, manufacturing plant records, and the like. The court asks for seven years of data, which MNC is not able to provide electronically. Due to the long-drawn-out proceedings, MNC loses clients, decides to lay off employees, and eventually files for Chapter 11. Now things take a worse turn: the company has to produce all its financial records and contracts, and the court requires a full-blown legal discovery process.

At this juncture, MNC executives wonder what value the data on paper could bring, and finally decide to convert all the paper into electronic format and try to derive value from it to get the company back on track.

This is where a classic business case for unstructured data management comes into play. Technology aside, in this situation MNC has three opportunities to derive business value from unstructured content:

  • The original contract data with suppliers, vendors, and clients, when converted via OCR, will provide an opportunity for contract management, vendor quality, and compliance.
  • The financial data, when integrated with the contract and quality data, will provide new insights into the company's own financials and behavior trends.
  • The data from the first two processes, when integrated with supply chain data within MNC and its clients' supply chains, will create meaningful insights into the root cause of the problem and potentially help MNC clear its name.
When MNC does get the data into electronic format, can it even integrate it with the other data, and how can it present all the evidence to the courts? Here is where eDiscovery, or services like it, enters the conversation.
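To make the third bullet concrete, here is a hypothetical sketch of the integration step: once the paper contracts have been OCR'd into records, they can be joined with recall data to trace a defective part back to its supplier and contract terms. All field names and records below are invented for the story:

```python
# Hypothetical sketch: join OCR'd contract records with recall records by
# part number, producing the evidence trail a discovery process would need.
contracts = [
    {"supplier": "AcmeParts", "part": "BRK-100", "quality_clause": "ISO-9001"},
    {"supplier": "BoltCo",    "part": "AXL-200", "quality_clause": "none"},
]
recalls = [{"part": "BRK-100", "defect": "premature wear"}]

# Index contracts by part number, then join against the recall list.
by_part = {c["part"]: c for c in contracts}
evidence = [
    {**r,
     "supplier": by_part[r["part"]]["supplier"],
     "quality_clause": by_part[r["part"]]["quality_clause"]}
    for r in recalls if r["part"] in by_part
]
print(evidence)
```

The hard part in reality is not the join but getting OCR output clean enough that part numbers and supplier names match at all, which is exactly where unstructured data management earns its keep.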



Posted October 20, 2011 8:45 PM
A lot has been written about Big Data, and more is being written even as you read this blog post. The most famous and most quoted piece is the McKinsey Big Data report, which has been downloaded by organizations and innovators everywhere.

Have you ever paused for a minute and wondered: why Big Data? Is this data really so valuable to your organization? Will it bring real insights that you have missed? If you pause and step back, it will dawn on you that yes, you will need Big Data. But wait: do not rush into a project. Chances are your team and even your executives will all jump at the project and give the go-ahead. If you rush into a POC or POV exercise without asking the "why?" question, you might end up with a marginally successful project, or realize that what you expected and what you achieved are completely different. Chances are you will lose interest and not proceed further.

Here are a few compelling business cases to think about:
  • There are many CRM-related programs that are not garnering significant success
  • There is a lot of competitive intelligence that your teams are not able to tap into and learn from
  • There is speculation that your analytics are not predicting or explaining the behaviors of people, markets, or products
Where or how does Big Data make these programs successful? With the technology advances available today, there are several options for harnessing data, external and internal to the organization, into an analytical database. Now let's ask again, "Why Big Data?" Here I would point to the famous Zachman Framework: create a matrix that crosses each type of data you have identified as Big Data with the questions you want it to answer. This structure gives you a roadmap for making every question you ask of "Why Big Data?" a successful one.
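As a sketch of that idea, the matrix might cross the classic Zachman interrogatives with the data types you have identified. The data types and cell entries below are assumed examples, not a prescribed model:

```python
# Sketch (assumed structure): a Zachman-style matrix crossing the classic
# interrogatives with candidate Big Data types. Filling a cell means you can
# answer that question for that data type before starting a POC.
INTERROGATIVES = ["what", "how", "where", "who", "when", "why"]
DATA_TYPES = ["clickstream", "sentiments", "machine_logs", "internal_docs"]

# Start with an empty roadmap matrix.
matrix = {d: {q: None for q in INTERROGATIVES} for d in DATA_TYPES}

# Example entries an architecture team might record.
matrix["clickstream"]["why"] = "explain abandoned shopping carts"
matrix["sentiments"]["who"] = "customers posting on forums and blogs"

# Any unfilled cell is an open question to resolve before the project starts.
open_questions = [(d, q) for d in DATA_TYPES
                  for q in INTERROGATIVES if matrix[d][q] is None]
print(f"{len(open_questions)} cells still need answers")
```

Walking the empty cells before the POC is exactly the discipline that keeps "Why Big Data?" from being answered by enthusiasm alone.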

If you are intrigued by this thought, watch this space for an article on "Delivering Insights with Big Data using the Zachman Approach".

Posted October 18, 2011 9:51 PM
Well, the first wave was between 2000 and 2003, with Netezza and its early emergence. Then came the second wave, between 2004 and mid-2009, with Greenplum, Vertica, ParAccel, Kickfire, AsterData, Dataupia, DataAllegro, Infobright, Kognitio, and Exadata. Between mid-2009 and 2010 we saw a quick round of acquisitions, partnerships, mergers, and more.

Towards the end of 2010, it all started again for the data warehouse appliance: Act III, Part I. This time it was a resurgence with a vengeance, with support for Big Data and Hadoop. AsterData started the trend with an architecture interface supporting distributed compute and Hadoop/MapReduce, followed by Greenplum, Vertica, Kognitio, Netezza, ParAccel, Oracle Exadata+Exalytics, and Microsoft+Hortonworks.

Like they say, "third time's a charm": the third act is the proving ground for the data warehouse appliance. The platform purpose-built for large-scale data processing and analytical workloads finally has the right connections to large, unstructured, Big Data content, which can be processed with technologies such as Hadoop MapReduce, Ruby on Rails, and the Forest Rim Textual ETL Engine. Such data can then be processed into the EDW with taxonomy and semantic ports and integrated into analytic structures, which can be computed on the appliance using the Hadoop principle: compute shall move closer to storage. The results can then be ported to the cloud for distribution to mobile and desktop platforms via reporting engines such as Tableau, QlikView, and Spotfire.
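The "compute moves closer to storage" principle can be illustrated with a toy example: each node summarizes its own partition first, and only the small per-node summaries travel to the final merge, rather than the raw records. The partitions and events below are invented:

```python
# Toy illustration of "compute moves closer to storage": each node
# aggregates its local partition (map + combine), then only the compact
# per-node summaries are merged (reduce). Event data is hypothetical.
from collections import Counter
from functools import reduce

# Three "nodes", each holding a local partition of event records.
partitions = [
    ["view", "buy", "view"],
    ["view", "view"],
    ["buy", "view", "buy"],
]

# Map/combine phase: summarize locally, where the data lives.
local_counts = [Counter(part) for part in partitions]

# Reduce phase: merge the small summaries, not the raw records.
totals = reduce(lambda a, b: a + b, local_counts)
print(dict(totals))  # {'view': 5, 'buy': 3}
```

On an appliance this is the difference between shipping terabytes across the interconnect and shipping a few kilobytes of aggregates, which is why the principle matters at scale.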

What is more encouraging is the emergence of high-performance memory stores such as Violin and Fusion-io; these technologies, combined with SSDs, will make disk I/O a thing of the past. Even more appealing, they can scale to a petabyte in half a rack. With the integration of such scalable, fault-tolerant technologies into the hardware architecture, the DW appliance will emerge as the enterprise platform for DW computing of the future.

The other compelling reason to start looking at these platforms is that by 2015, most enterprises will be developing apps, not applications. Yes, you read that right: we are talking about apps in the enterprise app store, not traditional application development. This means the underlying data platform must carry all the workload, which will require the data warehouse appliance architecture. We have begun to see the rumblings from Oracle, Teradata, Microsoft, and IBM, and on the hardware side from HP, Oracle, Dell, IBM, and EMC, among others.

The new avatar of the data warehouse appliance promises to be the best yet, and with all the Big Data wars unfolding right now, the data warehouse space looks more exciting than ever. Add speech, video, and image storage and processing to the bundle to mix things up.

The maturing of data virtualization will add another dimension of data and application processing to the platform, which is now more elastic than before with the integration of Hadoop.

To sum it up, this is a solution architect's pipe dream, and they will get to live it well.

Posted October 17, 2011 9:09 PM