Blog: Krish Krishnan http://www.b-eye-network.com/blogs/krishnan/ "If we knew what it was we were doing, it would not be called research, would it?" - Albert Einstein. Hello, and welcome to my blog. I would like to use this blog to have constructive communication and exchanges of ideas in the business intelligence community on topics from data warehousing to SOA to governance, and all the topics in the umbrella of these subjects. To maximize this blog's value, it must be an interactive venue. This means your input is vital to the blog's success. All that I ask from this audience is to treat everybody in this blog community and the blog itself with respect. So let's start blogging and share our ideas, opinions, perspectives and keep the creative juices flowing! Copyright 2012
Exploring Big Data - Taxonomies
Taxonomies have long been used to create catalogs and indexes in metadata-driven data management, and even more so in Web-driven architectures where you need linked context behind the scenes. The very same taxonomy family can be used to create what we call word clouds or tags from content within Big Data. These tags can be used to create powerful linkages that form a lineage and a graph.

What about data quality? That is the biggest advantage of using taxonomies. When you have spelling errors and language issues, the intrinsic nature of taxonomies lets you match terms within a margin of error and often arrive at a close match.
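One way to picture the close-match idea is a fuzzy lookup of a misspelled tag against a list of taxonomy terms. The sketch below is only an illustration: the term list, the function name closest_term and the 0.8 similarity cutoff are assumptions made for the example, and it uses Python's standard difflib module rather than any taxonomy product.

```python
import difflib

# Hypothetical taxonomy terms; a real taxonomy would be far larger.
TAXONOMY_TERMS = [
    "data warehouse",
    "data governance",
    "business intelligence",
    "metadata",
]

def closest_term(raw_tag, terms=TAXONOMY_TERMS, cutoff=0.8):
    """Return the taxonomy term most similar to raw_tag, or None if nothing clears the cutoff."""
    matches = difflib.get_close_matches(raw_tag.lower(), terms, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# A misspelled tag still resolves to the intended term.
print(closest_term("buisness inteligence"))
# An unrelated word falls below the cutoff and is left unmatched.
print(closest_term("zebra"))
```

The cutoff plays the role of the margin of error: raising it makes matching stricter, lowering it accepts looser matches at the risk of false positives.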

Will this work on all types of Big Data? From my experiments and learnings, it has worked with almost all types of data that can be deciphered by human minds. My next article in this channel will focus on this subject.

What can you do with the output from such a discovery? The obvious answer is that you can create a data road map with linkages to all data across the enterprise. This is a foundational first step in a Big Data journey.
http://www.b-eye-network.com/blogs/krishnan/archives/2012/05/exploring_big_d.php Thu, 17 May 2012 12:51:20 -0700
Discovering Hidden Nuggets With Taxonomies
This world of chaos is where we will see the strength of a nascent area of data architecture called taxonomies. Taxonomies themselves have been popular since the early days of Aristotle, and have even been found referenced in the writings of Chinese emperors in 3000 B.C. The word is derived from the Greek language and refers to classifying and identifying species. Taxonomies have been used in biology and language for many years.

Today we have taxonomies available for every product and subject area in the world, thanks to companies like Wand Inc, Pingar and others. These taxonomies provide a clear metadata lineage and relationship map, which can be used directly on any kind of data to navigate and classify it. Another benefit of taxonomies is the ability to integrate different types of data about the same subject and create powerful mashups.

The biggest advantage in using taxonomies is the ability to navigate data in its native form without having to transport it to a single location. This removes latencies and keeps integration work minimal. A second advantage is that you can navigate multiple subject dimensions in one document, video or picture without reprocessing the data multiple times over.

In the land of "big data", you can discover hidden nuggets of information with this approach and then create powerful visualizations using lightweight reporting tools like Tableau or Spotfire.

To learn more on these subjects and their usage, attend EDW 2012, TDWI and Strata Conferences.
http://www.b-eye-network.com/blogs/krishnan/archives/2012/04/doscovering_hid.php Sun, 15 Apr 2012 16:19:21 -0700
Why Analytics Matters
If you stand outside your normal periphery and look at data, Big Data in particular, then in spite of its sheer vastness in volume, velocity, variety, complexity and such, it is very easily visualized and understood when seen through analytics. Imagine, for example, that your search on Amazon.com came back with additional recommendations that were purely textual in nature: you would have little interest in going back to Amazon.com. Instead, the data is presented as a statistic with associated confidence factors, which makes it easy for you to shop there repeatedly. This is a simple example of why analytics matters.

As you start looking at Big Data, remember to look for analytics too; without the latter, the former will never provide you with useful insights.
http://www.b-eye-network.com/blogs/krishnan/archives/2012/03/why_analytics_m.php Tue, 13 Mar 2012 13:01:48 -0700
Visualizing Big Data
While all this buzz is great and early adopters of this new and shiny object have made inroads, Big Data has not yet been embraced by business users at a mass adoption level. The reason is a key missing piece: visualization. One fundamental point to understand here is that the way we approach Big Data and its modeling and integration is very different from any other data integration exercise. Here is a simple way of looking at the difference:

  • Traditional Data Integration - Business Requirements & Analysis --> Model --> Organize --> Collect --> Integrate --> Store --> Analyze --> Visualize
  • Big Data - Collect --> Store --> Organize --> Visualize --> Analyze --> Business Requirements & Analysis --> Model --> Integrate --> Visualize
As you can see from the flows above, Big Data needs visualization before you can settle on business requirements, and again after integration. You might wonder whether this is really a huge problem or whether we are hyping it up. In reality it is a problem, and there are very few options available at this point as solutions. I do not want to classify any "App store" downloads as a robust solution; they are all aimed at the personal consumer market.

The reason for the current situation can be analyzed in two ways:

  • Infrastructure Focus - Web 1.0 and 2.0 focused on infrastructure and the underpinnings; the OSI model and the current solutions in the marketplace definitely point to that.
  • Data Ambiguity and Complexity - Big Data by nature is complex and ambiguous, and it requires additional effort from deep SMEs and sometimes quants to think, integrate and solve. These folks need to be able to analyze the data visually rather than read machine data or long pages of text. The tools are not there yet for this purpose.
It is simply unfair to expect anyone to look through large sets of raw data and derive value from them. We need tools that provide an interrogation platform for that data. The tools will and should be heavily ontology and semantics focused, as we are not ready to model or integrate the data yet.

The journey ahead is still greenfield. There are a few vendors who are visionaries, and among them a few who are considered leaders. This year we will see a flurry of activity and investment in this side of the house. The frenzy of Big Data has not peaked yet, but the peak is not too far in the future.
http://www.b-eye-network.com/blogs/krishnan/archives/2012/03/visualizing_big.php Sat, 03 Mar 2012 06:37:28 -0700
The Rise of the Crowd - Part 2
Amazon, IMDb, and BookCrossing. Do you know that the Crowd has moved beyond just user reviews?

Let's look at some real-life examples. A popular one, quoted by Don Tapscott in his book Macrowikinomics, is Local Motors, a company that builds custom cars based on design ideas submitted by car enthusiasts from around the world. Not only can you design your own car, you can also visit a Local Motors location, order parts, customize your car and do whatever you want to build it to your specification. Their mission statement reads "To lead the next generation of crowd-powered automotive manufacturing, design, and technology in order to enable the creation of game changing vehicles". Another great example is the Fiat Mio, an open-innovation car by Fiat Motor Company.

Why is open innovation with crowdsourcing becoming more popular? There are two reasons: (1) The internet has created a virtual world with no physical boundaries. This means you can easily form communities of common interest, which leads to a crowd that is very knowledgeable and passionate about a particular topic or topics. (2) When you open a problem to a large crowd of individuals, you get a set of solutions that have a higher confidence of working and potentially a lower risk factor.

Let's come back to the reviews and feedback on various forums. Why do you care, and why does an online or brick-and-mortar retailer or your 401(k) investment manager care? From a business perspective, the reason is to create transparency and build trust with you as a customer or a prospect. Your opinion will be formed based on the experiences of others, and you need to hear the positive and negative sentiments equally to reach an informed opinion or decision. Word-of-mouth marketing in any social media channel gives access to more metrics than ever before, and the socially aware customer is a better customer to acquire and retain.

Crowdsourcing has also become mainstream with companies like Kaggle, Threadless, CrowdAnalytix, Cambrian House and Innocentive hosting competitions for the Crowd to solve. These problems range from simple to complex puzzles and often come with cash prizes.

Linux, Hadoop and many other popular software projects today are stellar examples of the Wisdom of Crowds.

In conclusion, we see from ancient and recent history that the Crowd formed from a community of interest has often proved to be powerful and game changing. As we move into the future, the internet and social media together have created the platform for virtual crowds to form and create powerful game changing movements.
http://www.b-eye-network.com/blogs/krishnan/archives/2012/02/the_rise_of_the_1.php Mon, 20 Feb 2012 19:16:10 -0700
The Rise of the Crowd - Part 1
If you have read James Surowiecki's book The Wisdom of Crowds, you will know the famous example of the power of the crowd demonstrated by Sir Francis Galton. The story goes that in 1906 he was visiting a livestock fair in England, where he stumbled upon an intriguing contest. An ox was put on display, and the villagers were invited to guess the animal's weight after it was slaughtered and dressed, paying 6 pence to participate. Nearly 800 people participated, but not one person hit the exact mark: 1,198 pounds. Galton collected the answers and computed the statistical mean of these guesses from independent people in the crowd. Astonishingly, the mean of those 800 guesses was 1,197 pounds, accurate to a fraction of a percent. This marks the first in a series of experiments conducted by scientists to prove the collective intelligence of the crowd.
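The arithmetic behind the story can be sketched with a small simulation. This is not Galton's data: the guesses below are simulated, and the assumption that each guess is independent and unbiased (Gaussian noise with a spread of 75 pounds around the true weight) is made purely for illustration, as are the function name and the seed.

```python
import random
import statistics

TRUE_WEIGHT = 1198  # pounds, the figure quoted in the story

def simulate_guesses(n=800, spread=75.0, seed=42):
    """Simulated guesses: independent, unbiased Gaussian noise around the true weight (an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(TRUE_WEIGHT, spread) for _ in range(n)]

guesses = simulate_guesses()

# The crowd's estimate is simply the mean of all individual guesses.
crowd_estimate = statistics.mean(guesses)
crowd_error = abs(crowd_estimate - TRUE_WEIGHT)

# Average miss of a single guesser, for comparison.
typical_individual_error = statistics.mean(abs(g - TRUE_WEIGHT) for g in guesses)

print(f"crowd error: {crowd_error:.1f} lb, typical individual error: {typical_individual_error:.1f} lb")
```

Under these assumptions the individual errors largely cancel out in the mean, which is why the crowd estimate lands much closer than a typical single guess. If the guesses shared a common bias, the averaging would not remove it.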

What this proves to us is that when you apply a set of smart people to a problem, any problem, the chances of reaching a solution are much higher than when a single person tries to do the same. Today the same type of contest is held by companies such as Kaggle, 99Designs, Innocentive, CrowdAnalytix and many others, where statisticians and analytics experts compete to solve such problems.

What is the use of these contests and these business models? Well, there are several benefits:

  • A crowd can often solve the problem both better and faster
  • The open innovation platform provides you access to more experts than any consulting firm can provide
  • Costs can be better managed in an open contest where the solution has a fixed price and timeline
And the list goes on. We will see how challenges arise in this subject in tomorrow's blog.

The topic is deep and wide. Next week at TDWI Las Vegas I am hosting a night school session on this subject; feel free to attend.
http://www.b-eye-network.com/blogs/krishnan/archives/2012/02/the_rise_of_the.php Tue, 07 Feb 2012 18:15:02 -0700
The Big Data Database Saga Continues
What sets DynamoDB apart in my simple tests over the past few hours is the simplicity that it brings to Big Data processing. While my tests are not complete yet, initial results are definitely encouraging. As I write this blog, I have also read DataStax's comparison of Cassandra and DynamoDB ("DataStax questions DynamoDB's performance"). The comparison is a long post full of technical comparisons around operations per second, but it does not mention the cost or services provisioning of DataStax. If you look at cost, Amazon says the services start at $1 per gigabyte per month. Data transfer is free for incoming data. It's also free for the first 10 terabytes per month and between AWS services (like Elastic MapReduce and S3). Once you surpass 10 terabytes, taking data out of the service is $0.12 per gigabyte through 40 terabytes and then lower rates up to 350 terabytes. Throughput capacity is $0.01 per hour for every 10 units of write capacity and $0.01 per hour for every 50 units of read capacity.
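To get a feel for how these rates add up, here is a rough back-of-the-envelope sketch. It uses only the storage and throughput figures quoted above and ignores data transfer. The function name, the 720-hour month and the rounding of capacity up to whole blocks of 10 or 50 units are assumptions for illustration, not Amazon's billing rules.

```python
import math

# Rates as quoted in this post (early 2012); treat them as illustrative only.
STORAGE_PER_GB_MONTH = 1.00
WRITE_BLOCK_UNITS, WRITE_BLOCK_PER_HOUR = 10, 0.01
READ_BLOCK_UNITS, READ_BLOCK_PER_HOUR = 50, 0.01

def monthly_cost(storage_gb, write_units, read_units, hours=720):
    """Estimate a monthly bill in dollars from storage and provisioned throughput.

    Assumes capacity is billed in whole blocks and a 720-hour month;
    data transfer charges are left out.
    """
    storage = storage_gb * STORAGE_PER_GB_MONTH
    writes = math.ceil(write_units / WRITE_BLOCK_UNITS) * WRITE_BLOCK_PER_HOUR * hours
    reads = math.ceil(read_units / READ_BLOCK_UNITS) * READ_BLOCK_PER_HOUR * hours
    return round(storage + writes + reads, 2)

# 100 GB stored, 100 write units and 500 read units provisioned all month:
# 100 (storage) + 72 (writes) + 72 (reads)
print(monthly_cost(100, 100, 500))  # 244.0
```

Dialing the write and read units up or down changes only the throughput terms, which is where most of the cost control in this model comes from.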

Given where several internet-based service companies have built their models and found success, they will have no hesitation in adopting the DynamoDB platform. Especially with the ability to dial scalability up and down, you can really control costs, which even on a consistent basis will be much lower than on-site provisioning for these companies. DynamoDB has beta clients like Elsevier, Formspring and SmugMug, which are definitely encouraging names.

As an organization, if you were to choose a cloud-based services provider for Big Data, Amazon sounds like a logical choice on several fronts. But is your Big Data initiative internet deployable? And do you have the staffing to execute the program even if you host the data in the cloud? While you digest more content on DynamoDB beyond this blog, I will go back to running more experiments, and over the next few days I will share more information on scalability tests and the consistency of the database.

There are also several NoSQL databases to compare DynamoDB against for a fair comparison at the DB level.

Watch for further information on specifics.
http://www.b-eye-network.com/blogs/krishnan/archives/2012/01/the_big_data_da.php Thu, 19 Jan 2012 21:38:31 -0700
EDW - It will be the Enterprise Data Warehouse
It is true that Hadoop is getting several upgrades and new distributors, but this does not mean you can move all your EDW data into that platform. Structured data is best processed on RDBMS platforms.

You can argue that one needs a hammer to drive a nail into the wall, but what type of hammer, what type of nail and what type of wall, all of these matter.

There are several articles on the internet, including presentations from the Hadoop community, on the question of why EDW. I urge you to do some research and understand them. Plan on attending TDWI Las Vegas or Chicago this year to learn more on this, or plan to attend Enterprise Data World 2012 in Atlanta. We have several discussions and sessions on this subject.

Bottom line: the EDW is here to stay and is not getting retired soon.
http://www.b-eye-network.com/blogs/krishnan/archives/2012/01/edw_-_it_will_b.php Wed, 11 Jan 2012 21:59:26 -0700
Patterns
The patterns are what we formed into thoughts and behaviors that manifested as Big Data, and it is these very same patterns that need to be disambiguated with context. If you come full circle, patterns play an important role in every aspect of data processing.

Pattern processing is intricate and definitely complex, but there are robust techniques to accomplish it. With the advent of parallel processing techniques for large-scale data, pattern-based processing has become more scalable and flexible.

While the subject is not new, thinking about processing complex data from this perspective is one approach to tackling the problem of Big Data processing.
http://www.b-eye-network.com/blogs/krishnan/archives/2012/01/patterns.php Wed, 04 Jan 2012 21:39:34 -0700
Unstructured Data - Complexity
Complexity comes in a variety of shapes and sizes within the unstructured world. The reason is that all things textual, audio, video and more are based on human reasoning and thinking. The fundamental concept behind human reasoning relates every piece to a context. For example, you go to a nice restaurant and order food; more than the food, you relate the restaurant to an occasion, the people you were with and the date on which you went there. Assume that you write about the food experience: your document will contain more than just the food. If we were to process this as data without the relevant context, it is pure noise with hidden layers of complexity, due to the different patterns of thought that have gone into the document.

If we now take a look at everything we do, without context we are lost. Hence a robust set of contextualized rules is needed to process data in the unstructured world. Textual ETL is one such rules engine that can solve the complexity equation. You can also do the same in Java and MapReduce, though it is very laborious.
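As a toy illustration of what contextualized rules might look like, the sketch below tags the restaurant example with the contexts it touches. It is not how Textual ETL works internally; the rule set, the labels and the function name tag_contexts are all hypothetical and chosen only to make the idea concrete.

```python
import re

# Hypothetical rules: each context label maps to a pattern that signals it.
CONTEXT_RULES = {
    "food": re.compile(r"\b(pasta|dessert|menu|dish)\b", re.IGNORECASE),
    "occasion": re.compile(r"\b(anniversary|birthday|celebration)\b", re.IGNORECASE),
    "people": re.compile(r"\b(wife|husband|friends|family)\b", re.IGNORECASE),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def tag_contexts(text):
    """Return a dict of context label -> list of matched phrases found in text."""
    found = {}
    for label, pattern in CONTEXT_RULES.items():
        hits = [match.group(0) for match in pattern.finditer(text)]
        if hits:
            found[label] = hits
    return found

review = "We went for our anniversary on 12/24/2011 with friends; the pasta was superb."
print(tag_contexts(review))
```

A single sentence about food ends up carrying occasion, people and date contexts as well, which is the hidden complexity described above. Real rule sets would need far richer vocabularies and disambiguation than a handful of regular expressions.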


http://www.b-eye-network.com/blogs/krishnan/archives/2011/12/unstructured_da_1.php Fri, 30 Dec 2011 09:16:58 -0700
Social Media - Does it really influence?
Harvard Study). My issue with such a study is that the perceptions and viewpoints do not account for all that is happening in reality, and the study is focused solely on Facebook. If you need a prime example of social media and its influence, look at the powers that reshaped the political landscape across the world this year, with Time Magazine naming the Protester as Person of the Year; this is a collection of all personas and personalities, digital and physical.

Social media has become a force to reckon with and a vital tool for information exchange, and to a large extent has fostered many startups hoping to cash in by building platforms and services around the subject.

With enterprise social adoption becoming the biggest trend among large corporations, it is a pity that we are still grasping at straws on the influence factors and their value.

In my opinion, the question of value is really based on the underlying purpose of the social network. It relates to the goals and direction of a community, based on its common subjects of interest and intent. If you want to derive the value quotient from this community, you need to study its behavior and the relationships among its members. You can use many algorithms for such purposes, though there are no set methods available as a commercial or open source solution. You can definitely use technologies like Hadoop, Cassandra, Mahout, R and Textual ETL to create solutions that drive analytics, which in turn help you create, define and measure the value of a social network.
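One of the simplest algorithms for studying relationships among members is degree centrality: how many distinct members each person interacts with, relative to the size of the community. The sketch below is a minimal plain-Python illustration; the member names and interaction pairs are made up, and this is only one of many possible measures, not a recommended standard.

```python
from collections import defaultdict

# Hypothetical interactions: (member_a, member_b) pairs, e.g. replies or mentions.
INTERACTIONS = [
    ("ana", "raj"),
    ("ana", "li"),
    ("ana", "sam"),
    ("raj", "li"),
    ("sam", "kim"),
]

def degree_centrality(edges):
    """Map each member to the share of other members they are directly connected to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    member_count = len(neighbors)
    if member_count < 2:
        return {}
    return {member: len(linked) / (member_count - 1) for member, linked in neighbors.items()}

scores = degree_centrality(INTERACTIONS)
most_connected = max(scores, key=scores.get)
print(most_connected, scores[most_connected])
```

On a real network the same idea would be run at scale, which is where platforms such as Hadoop or Mahout come in; the measure itself stays this simple.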

Social media metrics have been a nascent area and are still emerging from a concept to a solution. It is immature to think, "because I do not know how to measure it, I will assume the exercise is futile." Rather, look at the whole movement behind Mahout and R; this area will emerge in 2012 as the most adopted solution platform.

http://www.b-eye-network.com/blogs/krishnan/archives/2011/12/social_media_-.php Sat, 24 Dec 2011 10:49:35 -0700
Big Data - Worth it or not?
What is Big Data? In a true sense, the data that has been used to make decisions about your business (transactional data, spreadsheets, emails, campaigns, sales-force, call center, competitive intelligence, analytics, legal contracts, manuals, web data, consumer forums, content management systems and more) forms the foundation for Big Data. Another way to view it: data that lies outside of transactional and EDW systems, yet influences your business decisions and provides insights into consumer and product behavior, is also called Big Data, as it has no defined structure or storage mechanism. Big Data is big in terms of size, volume and complexity, independent of time.

Now let us come to Hadoop. It is a software framework distributed by the Apache Software Foundation. The architecture of Hadoop lends itself to distributed and parallel computing techniques, by which you can manage massive volumes of data and process them to harness the underlying value in a relatively manageable time. Hadoop is an ecosystem, as it is a community-developed project with continued contributions from the community of developers.
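The programming model that Hadoop popularized, MapReduce, can be sketched in a few lines of plain Python. This is not the Hadoop API and nothing here runs in parallel; the function names are made up for the example. It only shows the three steps (map, shuffle, reduce) that a framework like Hadoop would spread across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(documents):
    # In a real cluster each document (or block) would be mapped on a different node.
    pairs = [pair for document in documents for pair in map_phase(document)]
    return reduce_phase(shuffle(pairs))

print(word_count(["big data", "Big insights"]))
```

Because each map call looks at only one document and each reduce call at only one key, the work can be split across machines without the pieces needing to talk to each other, which is what makes massive volumes manageable.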

Is Big Data worth pursuing for any organization? Very much yes. But like any other solution, you need a strong business case to implement this type of program. You need a very strong governance model to analyze the type of data you want and what business value it will bring to the organization. It is a maturity journey, not a turnkey solution where you switch on a technology from any vendor and declare victory.

Do I need to use Hadoop and learn everything it has as a framework? Well, if you have yet to start the Big Data journey, all RDBMS and DW appliance vendors are racing against time to bring Hadoop integration to your favorite DB platform. But even after a given vendor completes that integration, you still need to understand how Hadoop works and what types of problems it will solve; you will need to understand Textual ETL and how you can write business rules in English to process text data of any type, and much more. The silver lining in this cloud of complexity is that all these technologies are evolving and will mature by the time you are ready to adopt them. They are all GUI driven and self managing, and they all address problems that your current RDBMS or EDW platforms cannot resolve.

Two words of advice: learn everything you can about these topics, and separate the blather from the real information.

Remember not every business is Facebook or Twitter or FourSquare, but every business will evolve in the future to adopt similar models to emerge Customer Centric. As you start this journey, remember it is a Continuum and has multiple stages of maturity.
http://www.b-eye-network.com/blogs/krishnan/archives/2011/12/big_data_-_wort.php Thu, 15 Dec 2011 20:38:26 -0700
The New Age BI
  • Agile - the demand to provide reporting and analytics on demand on the most recent data.
  • Mobile - the report and analytics must be supported on mobile platforms.
  • Multi-Sourced - the reports and dashboards should be able to integrate data across sources. Needs a strong metadata footprint to support this.
  • Self-Service - the new BI tool should support self-service reporting and analytics.
  • Lightweight - the BI tool should be lightweight and have a nimble footprint.
  • App-like capabilities - must be able to run and deploy as apps (as in iPad and Android apps).
  • Built-in support for Office or Open Office - the BI tool must have support for Office or Open Office apps.
  • Unstructured Data - the BI tool must support unstructured data and its requirements.
  • In-Memory Capabilities


The list can go on and on, but these are the driving shifts that will move BI to the next generation and, in 2012, will define the course for the leading platforms in this realm: QlikView, Spotfire, Tableau and MicroStrategy Mobile. Organizations have already started using the new platforms to augment existing platforms such as Cognos, Business Objects, OBIEE and MicroStrategy. The tools most easily adopted by business users in the current trend are Tableau and Spotfire, and enterprise users have started taking a hard look at QlikView.


2012 will be a new year for BI and will probably kick off the new decade for BI as well with the new tools and trends. Only time will tell. But the future is here and being explored as we read this.
http://www.b-eye-network.com/blogs/krishnan/archives/2011/12/the_new_age_bi.php Tue, 06 Dec 2011 21:02:57 -0700
What is the value of my Data?
The value of your data is measured at different points:
  • Origination - This is the point of creation of the data, which can be a transaction, an email, an application for insurance or a claim. Data is often deemed to be dirty at this juncture, and data quality rules are applied for correction. This is the point in time where data has the highest value.
  • Transformation - This is the point of collecting and transforming the data to be ingested into analytical and reporting platforms. Here again, due to the number of rules that are applied, data will have a very high value.
  • Analysis and Reporting - This is the last point in the life cycle where the data value is held high. The data here points to trends and behaviors as simple metrics, yet it serves a very useful purpose as a treasured indicator.

There are a number of data quality indicators that can measure the effectiveness of quality across the enterprise, in both the origination and transformation phases. These indicators are a very useful way to prove that data of good quality has a high value, as it helps speed up decision support platforms. This is one way to assess and prove the value of your data.
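Two common indicators of this kind are completeness (how often a field is filled in) and validity (how often a filled-in value passes a rule). The sketch below shows one possible way to compute them; the sample claim records, the field name and the deliberately naive email rule are hypothetical.

```python
def completeness(records, field):
    """Share of records where the field is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if record.get(field) not in (None, ""))
    return filled / len(records)

def validity(records, field, is_valid):
    """Share of populated values for the field that pass the is_valid rule."""
    values = [record[field] for record in records if record.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for value in values if is_valid(value)) / len(values)

# Hypothetical claim records captured at origination.
claims = [
    {"claim_id": 1, "email": "a@example.com"},
    {"claim_id": 2, "email": ""},
    {"claim_id": 3, "email": "not-an-email"},
    {"claim_id": 4, "email": "b@example.com"},
]

email_completeness = completeness(claims, "email")           # 3 of 4 records populated
email_validity = validity(claims, "email", lambda v: "@" in v)  # 2 of 3 populated values pass
print(email_completeness, email_validity)
```

Tracking such ratios at origination and again after transformation gives a simple before-and-after view of what the quality rules actually achieved.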

The second way to assess and prove the value of data is to measure its effectiveness when used as metrics and KPIs in reports and analytics. The quality and timeliness of data here can be measured by comparing future results with current results, and the differential lift can be attributed to the value of data being available with the right quality and at the right time.
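The differential lift can be expressed as a simple relative change between the current result and the future result of a KPI. The sketch below assumes that definition; the function name and the sample figures are illustrative only.

```python
def differential_lift(current_result, future_result):
    """Relative change of a KPI from the current result to the future result."""
    if current_result == 0:
        raise ValueError("current_result must be non-zero to compute a relative lift")
    return (future_result - current_result) / current_result

# Hypothetical KPI: 200 conversions before the data improvements, 230 after.
lift = differential_lift(200.0, 230.0)
print(f"{lift:.0%}")  # 15%
```

Attributing the whole lift to data quality and timeliness is of course an assumption in itself; other changes over the same period would need to be accounted for before claiming it.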


None of these are new techniques, but since the same question keeps arising, it is prudent to revisit the good old ways; the simplicity of using them may turn out to be the best innovation you accomplish in assessing the value of your data.

http://www.b-eye-network.com/blogs/krishnan/archives/2011/12/what_is_the_val.php Fri, 02 Dec 2011 06:00:00 -0700
Watch Out - Data Storm Headed This Way !!!
Data generation is not a bad idea, but who uses that data and how useful that data can be is a question of relevance. If you are on Facebook and post to your page every hour, chances are someone in your primary or extended network will be watching it or posting on your wall. Those critical transactions are needed for FB to run ads, look at influencer behaviors and so on; FB really does not care about the personal side of the data at all.

Now let's take a look at a consumer's own behavior. For example (to make it easy), how much value do you see in the data on your FB pages, and how much do you care about that data after, say, one day, week, month or year? You really do not think about it, as you are using the platform for free (actually paying for it in some very indirect way). This is why we need to look out for Data Storms in the future.

The information overload from public domain data produced for content sharing and sentiment sharing is causing a tizzy. There is value at points in time in this data, but where is it stored? Who needs it? Does it make sense to keep it? What are the security threats for this data? Thus comes the big question of "Data Governance for Public Data". Sooner or later we need to address this question, and I recommend that the decision be driven and defined by the same consumers who generate the data every day, as it is about their information privacy, security and more.

But till then, let's watch for weathermen to announce Data Storms or Data Weather Readings.
http://www.b-eye-network.com/blogs/krishnan/archives/2011/11/watch_out_-_dat.php Wed, 30 Nov 2011 09:24:42 -0700