
Blog: Krish Krishnan

Krish Krishnan

"If we knew what it was we were doing, it would not be called research, would it?" - Albert Einstein.

Hello, and welcome to my blog.

I would like to use this blog for constructive communication and exchanges of ideas in the business intelligence community, on topics from data warehousing to SOA to governance and everything under the umbrella of these subjects.

To maximize this blog's value, it must be an interactive venue. This means your input is vital to the blog's success. All that I ask from this audience is to treat everybody in this blog community and the blog itself with respect.

So let's start blogging and share our ideas, opinions, perspectives and keep the creative juices flowing!

About the author

Krish Krishnan is a worldwide-recognized expert in the strategy, architecture, and implementation of high-performance data warehousing solutions and big data. He is a visionary data warehouse thought leader and is ranked as one of the top data warehouse consultants in the world. As an independent analyst, Krish regularly speaks at leading industry conferences and user groups. He has written prolifically in trade publications and eBooks, contributing over 150 articles, viewpoints, and case studies on big data, business intelligence, data warehousing, data warehouse appliances, and high-performance architectures. He co-authored Building the Unstructured Data Warehouse with Bill Inmon in 2011, and Morgan Kaufmann will publish his first independent writing project, Data Warehousing in the Age of Big Data, in August 2013.

With over 21 years of professional experience, Krish has solved complex solution architecture problems for global Fortune 1000 clients, and has designed and tuned some of the world’s largest data warehouses and business intelligence platforms. He is currently promoting the next generation of data warehousing, focusing on big data, semantic technologies, crowdsourcing, analytics, and platform engineering.

Krish is the president of Sixth Sense Advisors Inc., a Chicago-based company providing independent analyst, management consulting, strategy and innovation advisory and technology consulting services in big data, data warehousing, and business intelligence. He serves as a technology advisor to several companies, and is actively sought after by investors to assess startup companies in data management and associated emerging technology areas. He publishes with the BeyeNETWORK.com where he leads the Data Warehouse Appliances and Architecture Expert Channel.

Editor's Note: More articles and resources are available in Krish's BeyeNETWORK Expert Channel. Be sure to visit today!

December 2011 Archives

The three big things to tame in unstructured data (Big Data) are volume, velocity and complexity. While we can see infrastructure growing to handle the volume and velocity equations, the third and toughest task is taming complexity.

Complexity comes in a variety of shapes and sizes within the unstructured world. This is because everything textual, audio, video and more is based on human reasoning and thinking. Human reasoning fundamentally relates every piece of information to a context. For example, when you go to a nice restaurant and order food, you relate the restaurant to more than the food: to an occasion, to the people you were with, and to the date on which you went there. If you write about the food experience, your document will contain more than just the food. If we were to process this as data without the relevant context, it would be pure noise with hidden layers of complexity, due to the different patterns of thought that went into the document.

If we now take a look at everything we do, without context we are lost. Hence a robust set of contextualized rules is needed to process data in the unstructured world. Textual ETL is one such rules engine that can solve the complexity equation. You can also do the same in Java and MapReduce, though it is very laborious.
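
To make the idea of contextualized rules concrete, here is a minimal sketch in Python. It is not how Textual ETL works internally; the rule names, patterns and sample review are all hypothetical and exist only to show how a restaurant write-up carries context (occasion, people, date) beyond the food itself.

```python
import re

# Hypothetical context rules: each label maps to a simple pattern.
# A real rules engine would be far richer; this is only an illustration.
MONTHS = (
    "january|february|march|april|may|june|july|"
    "august|september|october|november|december"
)
CONTEXT_RULES = {
    "food": re.compile(r"\b(food|dish|dessert|menu|meal)\b", re.IGNORECASE),
    "occasion": re.compile(
        r"\b(birthday|anniversary|wedding|graduation)\b", re.IGNORECASE
    ),
    "people": re.compile(
        r"\b(wife|husband|friends?|family|colleagues?)\b", re.IGNORECASE
    ),
    "date": re.compile(r"\b(" + MONTHS + r")\s+\d{1,2}\b", re.IGNORECASE),
}


def tag_contexts(text):
    """Return a dict mapping each context label to the terms found in text."""
    tags = {}
    for label, pattern in CONTEXT_RULES.items():
        # Collect distinct matches, lowercased and sorted for stable output.
        matches = sorted({m.group(0).lower() for m in pattern.finditer(text)})
        if matches:
            tags[label] = matches
    return tags


# Hypothetical review text, made up for this example.
review = (
    "We went there for our anniversary on December 3 with friends. "
    "The dessert was great, but the menu was short."
)
print(tag_contexts(review))
```

Even this toy version shows why the work gets laborious when done by hand: every new context needs its own rule, and the rules must be maintained as the language in the documents changes.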

Posted December 30, 2011 9:16 AM
A leading university recently published a study concluding that social media has minimal influence on behavior (read it here - Harvard Study). My issue with such a study is that its perception and viewpoints do not account for all that is happening in reality, and that it focuses solely on Facebook. If you need a prime example of social media and its influence, look at the forces that reshaped the political landscape across the world this year, and at Time Magazine naming the Protester as Person of the Year, a collection of personas and personalities both digital and physical.

Social media has become a force to be reckoned with and a vital tool for information exchange, and to a large extent it has fostered many startups hoping to cash in on building platforms and services around the subject.

With enterprise social adoption becoming the biggest trend among large corporations, it is a pity that we are still grasping at straws over its influence factors and its value.

In my opinion, the question of value is really based on the underlying purpose of the social network. It relates to the goals and direction of a community, based on its common subjects of interest and intent. If you want to derive the value quotient from this community, you need to study its behavior and the relationships among its members. You can use many algorithms for such purposes, though there are no established methods available as commercial or open source solutions. You can definitely use technologies like Hadoop, Cassandra, Mahout, R and Textual ETL to create solutions that drive analytics, which in turn can help you create, define and measure the value of a social network.

Social media metrics are a nascent area, still emerging from concept to solution. It is immature to think, "because I do not know how to measure it, I will assume the exercise is futile." Look instead at the whole movement behind Mahout and R; this area will emerge in 2012 as the most adopted solution platform.
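
As one small illustration of studying relationships among members, the sketch below computes degree centrality, a standard graph measure of how connected each member is. It is only one of many possible algorithms, and the member names and interaction list are hypothetical.

```python
from collections import defaultdict


def degree_centrality(interactions):
    """Return each member's share of the other members they interact with.

    interactions is a list of (member_a, member_b) pairs, treated as
    undirected links. Self-links are ignored.
    """
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {member: 0.0 for member in neighbors}
    # Normalize by the maximum possible number of neighbors (n - 1).
    return {member: len(peers) / (n - 1) for member, peers in neighbors.items()}


# Hypothetical community interactions (replies, mentions, shares).
interactions = [
    ("ana", "raj"),
    ("ana", "lee"),
    ("ana", "kim"),
    ("raj", "lee"),
]
scores = degree_centrality(interactions)
most_connected = max(scores, key=scores.get)
print(most_connected, scores)
```

On a real community the interaction list would come out of a platform such as Hadoop or Cassandra, and a measure like this would be only one input into a view of the network's value.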

Posted December 24, 2011 10:49 AM
I'm getting a lot of questions from audiences: Big Data - what is it? Hadoop - what is this? And more importantly, why do I need to understand this now? Will my organization need it? The bottom line is that the user community is completely befuddled by the whole Big Data and Hadoop message. Let us take a step back and clear the slate.

What is Big Data? In a true sense, the data that has been used to make decisions about your business forms the foundation for Big Data: transactional data, spreadsheets, emails, campaigns, sales force, call center, competitive intelligence, analytics, legal contracts, manuals, web data, consumer forums, content management systems and more. Another way to view it: data that lies outside transactional and EDW systems, yet influences your business decisions and provides insight into consumer and product behavior, is also called Big Data, as it has no defined structure or storage mechanism. Big Data is big in terms of size, volume and complexity, independent of time.

Now let us come to Hadoop. It is a software framework distributed by the Apache Software Foundation. The architecture of Hadoop lends itself to distributed and parallel computing techniques, with which you can manage massive volumes of data and process them to harness the underlying value in a relatively manageable time. Hadoop is also an ecosystem: it is a community-developed project with continued contributions from its community of developers.

Is Big Data worth pursuing for any organization? Very much so. But like any other solution, you need a strong business case to implement this type of program. You need a very strong governance model to analyze the type of data you want and the business value it will bring to the organization. It is a maturity journey, not a turnkey solution where you switch on a technology from any vendor and declare victory.

Do I need to use Hadoop and learn everything it has as a framework? Well, if you have yet to start the Big Data journey, all RDBMS and DW appliance vendors are racing against time to bring Hadoop integration to your favorite DB platform. But even after a given vendor completes that integration, you will still need to understand how Hadoop works and what types of problems it will solve. You will also need to understand Textual ETL and how you can write business rules in English to process text data of any type, and much more. The silver lining in this cloud of complexity is that all these technologies are evolving and will mature by the time you are ready to adopt them. They are all GUI driven and self-managing, and they all address problems that your current RDBMS or EDW platforms cannot resolve.

Two words of advice: learn everything you can about these topics, and separate the blather from the real information.

Remember, not every business is Facebook or Twitter or FourSquare, but every business will evolve to adopt similar models in order to become customer centric. As you start this journey, remember that it is a continuum with multiple stages of maturity.
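
To give a feel for the processing style Hadoop is built around, here is a minimal single-process sketch of the map, shuffle and reduce steps in plain Python. It does not use any Hadoop API, and the function names and sample documents are hypothetical. On a real cluster the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input documents.
docs = ["big data is big", "data has value"]
counts = word_count(docs)
print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work can be split across nodes, which is what makes massive volumes manageable.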

Posted December 15, 2011 8:38 PM
As we wind down 2011, the year has broken ground for "new age BI". BI is changing from traditional to new in the following respects:

  • Agile - the demand to provide reporting and analytics on demand on the most recent data.
  • Mobile - the report and analytics must be supported on mobile platforms.
  • Multi-Sourced - the reports and dashboards should be able to integrate data across sources. Needs a strong metadata footprint to support this.
  • Self-Service - the new BI tool should support self-service reporting and analytics.
  • Light Weight - the BI tool should be lightweight and have a nimble footprint.
  • App-like capabilities - must be capable of being deployed and run as Apps (as in iPad and Android Apps).
  • Built-in support for Office or Open Office - the BI tool must have support for Office or Open Office apps.
  • Unstructured Data - the BI tool must support unstructured data and its requirements.
  • In-Memory Capabilities

The list can go on and on, but these are the driving shifts that will move BI to the next generation, and in 2012 they will define the course for the leading platforms in this realm: QlikView, Spotfire, Tableau and MicroStrategy Mobile. Organizations have already started using the new platforms to augment existing platforms such as Cognos, Business Objects, OBIEE and MicroStrategy. In the current trend, the tools most easily adopted by business users are Tableau and Spotfire, and enterprise users have started taking a hard look at QlikView.

2012 will be a new year for BI and will probably kick off the new decade for BI as well with the new tools and trends. Only time will tell. But the future is here and being explored as we read this.

Posted December 6, 2011 9:02 PM
An intriguing lament that I have heard from most users is "what is the value of my data?". Well, if you are looking for dollars and cents, your data has value from the time it is created until it is analyzed a few hours later; after that it is history and, in strict terms, dead weight. However, being born collectors, we humans tend to keep all this weight around and end up with a very obese and unhealthy data collection that will eventually just die of its sheer volume (read: weight).

The value of your data is measured at different points in its life cycle:
  • Origination - This is the point of creation of the data. It can be a transaction, an email, an application for insurance or a claim. Data is often deemed to be dirty at this juncture, and data quality rules are applied for correction. This is the point in time where data has the highest value.
  • Transformation - This is the point of collecting and transforming the data to be ingested into analytical and reporting platforms. At this point, again due to the number of rules that are applied, data will have a very high value.
  • Analysis and Reporting - This is the last point in the life cycle where the data value is held high. The data here points to trends and behaviors as simple metrics, yet it serves a very useful purpose as a treasured indicator.

There are a number of data quality indicators that can measure the effectiveness of quality across the enterprise, in both the origination and transformation phases. These indicators are a very useful way to prove that data of good quality has a high value, as it helps speed up decision support platforms. This is one way to assess and prove the value of your data.

The second way to assess and prove the value of data is to measure its effectiveness when used as metrics and KPIs in reports and analytics. The quality and timeliness of data can be measured here by comparing future results with current results; the differential lift can be attributed to the value of the data being available with the right quality and at the right time.

While none of these are new techniques, the same question arises so often that it is prudent to revisit the good old ways. The simplicity of using them may be the best innovation you accomplish in assessing the value of your data.
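
The differential lift mentioned above can be expressed as simple arithmetic. The sketch below assumes one common definition, the relative change of a KPI against its baseline; the KPI figures are hypothetical, and in practice you would also need to rule out other causes before attributing the whole lift to the data.

```python
def differential_lift(baseline, current):
    """Return the relative change of a KPI against its baseline value.

    Assumes lift = (current - baseline) / baseline, one common definition.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative lift")
    return (current - baseline) / baseline


# Hypothetical KPI: orders per week before and after cleaner,
# more timely data reached the reports.
orders_before = 200
orders_after = 230
lift = differential_lift(orders_before, orders_after)
print(f"Lift: {lift:.1%}")
```

Tracking this figure per KPI over time gives a running, if rough, estimate of what good and timely data is worth.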

Posted December 2, 2011 6:00 AM