Blog: Krish Krishnan

Krish Krishnan

"If we knew what it was we were doing, it would not be called research, would it?" - Albert Einstein.

Hello, and welcome to my blog.

I would like to use this blog to have constructive communication and exchanges of ideas in the business intelligence community on topics from data warehousing to SOA to governance, and all the topics in the umbrella of these subjects.

To maximize this blog's value, it must be an interactive venue. This means your input is vital to the blog's success. All that I ask from this audience is to treat everybody in this blog community and the blog itself with respect.

So let's start blogging and share our ideas, opinions, perspectives and keep the creative juices flowing!

About the author

Krish is a recognized expert worldwide in the strategy, architecture and implementation of high performance data warehousing solutions. He is a visionary data warehouse thought leader and an independent analyst, writing and speaking at industry leading conferences, user groups and trade publications. He has authored two eBooks, more than 75 articles, viewpoints and case studies on business intelligence, data warehousing, and data warehouse appliances and architectures. In his 19 plus years of professional experience, he has been solving complex architecture problems spanning all aspects of data warehousing and business intelligence for Fortune 1000 clients. He has designed and tuned some of the world’s largest data warehouses.

The Vice President of Strategy at Chicago Business Intelligence Group, Krish teaches regularly at TDWI, DAMA, IRM UK and other conferences, and is helping drive and mature the data warehouse appliance market. Krish also serves as Associate Vice President of Programs for DAMA Chicago and is Ethics and Governance Advisor to DAMA International.

Editor's Note: More articles and resources are available in Krish's BeyeNETWORK Expert Channel. Be sure to visit today!

By now all of you have learned of Amazon's announcement of DynamoDB, its latest database: a NoSQL store in the same family as Cassandra, Voldemort and Riak, completely hosted in the cloud, with the ability to scale on demand, a true elastic scalability similar to EC2. Throw a MapReduce interface on top of this and you have a Big Data database that can truly scale.

What sets DynamoDB apart in my simple tests over the past few hours is the simplicity that it brings to Big Data processing. While my tests are not complete yet, the initial results are definitely encouraging. As I write this blog, I have also read DataStax's comparison of Cassandra and DynamoDB (DataStax questions DynamoDB's performance). The comparison is a long post full of technical comparisons around operations per second, but it does not mention the cost or service provisioning on the DataStax side. If you look at cost, Amazon says the services start at $1 per gigabyte per month. Incoming data transfer is free. Transfer is also free for the first 10 terabytes per month and between AWS services (like Elastic MapReduce and S3). Once you surpass 10 terabytes, taking data out of the service is $0.12 per gigabyte through 40 terabytes, with lower rates up to 350 terabytes. Throughput capacity is $0.01 per hour for every 10 units of write capacity and $0.01 per hour for every 50 units of read capacity.
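To make the pricing above concrete, here is a minimal back-of-the-envelope sketch in Python. It uses only the storage and throughput rates quoted in this post. The function name, the 720-hour month, and the linear proration of capacity units are assumptions made for illustration, and data transfer charges are left out.

```python
# Rates as quoted in this post; treat them as inputs, not as current pricing.
STORAGE_USD_PER_GB_MONTH = 1.00
WRITE_USD_PER_HOUR_PER_10_UNITS = 0.01
READ_USD_PER_HOUR_PER_50_UNITS = 0.01


def estimate_monthly_cost(storage_gb, write_units, read_units, hours=720):
    """Rough monthly estimate (hypothetical helper).

    Assumes a 720-hour month and linear proration of capacity units;
    data transfer charges are not included.
    """
    storage = storage_gb * STORAGE_USD_PER_GB_MONTH
    writes = (write_units / 10) * WRITE_USD_PER_HOUR_PER_10_UNITS * hours
    reads = (read_units / 50) * READ_USD_PER_HOUR_PER_50_UNITS * hours
    return {
        "storage": storage,
        "writes": writes,
        "reads": reads,
        "total": storage + writes + reads,
    }


if __name__ == "__main__":
    # 100 GB stored, 100 write units and 500 read units provisioned all month.
    print(estimate_monthly_cost(storage_gb=100, write_units=100, read_units=500))
```

Under these assumptions, 100 GB with 100 write units and 500 read units provisioned for the whole month comes to roughly $100 for storage plus $72 each for writes and reads, about $244 in total. Lowering the provisioned units for part of the month reduces the throughput portion proportionally.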

Given how several internet-based service companies have built their models and found success, they will not hesitate to adopt the DynamoDB platform. Especially with the ability to dial scalability up and down, you can really control costs, which even on a sustained basis will be much lower than on-site provisioning for these companies. DynamoDB has beta clients like Elsevier, Formspring and SmugMug, which are definitely encouraging names.

As an organization, if you were to choose a cloud-based services provider for Big Data, Amazon sounds like a logical choice on several fronts. But is your Big Data initiative internet-deployable? And do you have the staffing to execute the program even if you host the data in the cloud? While you digest more content on DynamoDB beyond this blog, I will return to running more experiments and will share more information in the next few days on scalability tests and the consistency of the database.

For a fair comparison at the database level, there are also several other NoSQL databases to compare DynamoDB against.

Watch for further information on specifics.

Posted January 19, 2012 9:38 PM
Recently a tweet caught several people's attention: "Eventually, Hadoop will swallow the EDW". Let us be very clear: the EDW will be needed now and in the future. The premise of an EDW is processing and storing data for consumption across the enterprise for analytical and reporting purposes. Hadoop is a platform for managing the processing of Big Data. It is not a relational data store, nor is it engineered to replace the EDW. Several people share this misconception, but Hadoop and the EDW are distinct platforms that serve different purposes, and they will be integrated via strong metadata relationships.

It is true that Hadoop is getting several upgrades and new distributors, but this does not mean you can move all your EDW data into that platform. Structured data is best processed on RDBMS platforms.

You can argue that one needs a hammer to drive a nail into a wall, but the type of hammer, the type of nail and the type of wall all matter.

There are several articles on the internet, including presentations from the Hadoop community, on why the EDW is still needed. I urge you to do some research and understand them. Plan on attending TDWI Las Vegas or Chicago this year to learn more about this, or plan to attend Enterprise Data World 2012 in Atlanta. We have several discussions and sessions on this subject.

Bottom line: the EDW is here to stay and is not getting retired soon.

Posted January 11, 2012 9:59 PM
At a recent event where I gave a keynote, an audience member asked why Big Data means processing with patterns. Let us take a step back and analyze this thought. Patterns have always been the way we have learned. Whether it is languages through symbols or patterns in music, the human mind can absorb those patterns and reproduce them. This concept extends to computing too, where we reduce different types of data into binary symbols that are interpreted by the system.

These patterns are what we formed into the thoughts and behaviors that manifested as Big Data, and it is the very same patterns that need to be disambiguated with context. If you come full circle, patterns play an important role in every aspect of data processing.

Pattern processing is intricate and definitely complex, but there are robust techniques to accomplish it. With the advent of parallel processing techniques for large-scale data, pattern-based processing has become more scalable and flexible.
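As a small illustration of why pattern-based processing parallelizes well, here is a minimal sketch in Python written in a map-and-reduce style. The three patterns and all function names are hypothetical, chosen only for this example. The map step runs sequentially here, but each chunk is processed independently, which is the property a parallel framework would exploit.

```python
import re
from collections import Counter
from functools import reduce

# Hypothetical patterns chosen for illustration only.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "hashtag": re.compile(r"#\w+"),
    "money": re.compile(r"\$\d+(?:\.\d+)?"),
}


def map_chunk(chunk):
    """Map step: count matches of each pattern within one chunk of text."""
    return Counter({name: len(rx.findall(chunk)) for name, rx in PATTERNS.items()})


def reduce_counts(left, right):
    """Reduce step: merge two partial counts (zero counts are dropped)."""
    return left + right


def count_patterns(chunks):
    # map() runs sequentially here; because chunks are independent, the same
    # map_chunk function could be handed to a process pool or a MapReduce job.
    return reduce(reduce_counts, map(map_chunk, chunks), Counter())


if __name__ == "__main__":
    sample = [
        "Mail bob@example.com about #bigdata, budget $12.50",
        "More #hadoop and #bigdata talk",
    ]
    print(count_patterns(sample))
```

Real pattern processing involves far richer patterns than regular expressions, so treat this only as a picture of the split, match and merge shape of the work.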

While the subject is not new, thinking about processing complex data from this perspective is one approach to tackling the problem of Big Data processing.

Posted January 4, 2012 9:39 PM
The three big things to tame in unstructured data (Big Data) are volume, velocity and complexity. While we can see infrastructure growing to handle the volume and velocity equations, the third and toughest task is taming complexity.

Complexity comes in a variety of shapes and sizes within the unstructured world. The reason is that all things textual, audio, video and more are based on human reasoning and thinking. The fundamental concept behind human reasoning is that it relates every piece to a context. For example, when you go to a nice restaurant and order food, you relate the restaurant to more than the food: the occasion, the people you were with, and the date on which you went there. If you write about the food experience, your document will contain more than just the food. If we were to process this as data without the relevant context, it would be pure noise with hidden layers of complexity, because of the different patterns of thought that have gone into the document.

If we take a look at everything we do, without context we are lost. Hence a robust set of contextualized rules is needed to process data in the unstructured world. Textual ETL is one such rules engine that can solve the complexity equation. You can also do the same in Java and MapReduce, though it is very laborious.
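To give a feel for what contextualized rules mean in practice, here is a toy sketch in Python based on the restaurant example. It is not how Textual ETL or any real rules engine works. The rule table, the cue words and the function names are all hypothetical, and real rules would need to handle synonyms, negation, word order and much more.

```python
import re

# Hypothetical context rules: each context is signalled by a set of cue words.
CONTEXT_RULES = {
    "food": {"pasta", "dessert", "menu", "dish", "wine"},
    "occasion": {"anniversary", "birthday", "celebration"},
    "people": {"wife", "husband", "friends", "family"},
    "time": {"saturday", "evening", "december"},
}


def tag_sentence(sentence):
    """Return the sorted contexts whose cue words appear in the sentence."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return sorted(ctx for ctx, cues in CONTEXT_RULES.items() if words & cues)


def contextualize(document):
    """Split a document into sentences and attach context tags to each one."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [(s, tag_sentence(s)) for s in sentences]


if __name__ == "__main__":
    review = (
        "We went for our anniversary on Saturday evening. "
        "The pasta was great. My wife loved the dessert."
    )
    for sentence, contexts in contextualize(review):
        print(contexts, "<-", sentence)
```

Even this toy version shows the point: only one of the three sentences is purely about food, and the rest of the review is occasion, people and time.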



Posted December 30, 2011 9:16 AM
Recently a study by a leading university concluded that social media has minimal influence on behavior (read it here - Harvard Study). My issue with such a study is that its perception and viewpoints do not account for all that is happening in reality, and the study is focused solely on Facebook. If you need a prime example of social media and its influence, look at the forces that reshaped the political landscape across the world this year, with Time Magazine naming the Protester as Person of the Year. That is a collection of all personas and personalities, digital and physical.

Social media has become a force to reckon with and a vital tool for information exchange, and to a large extent it has fostered many startups hoping to cash in on building platforms and services around the subject.

With enterprise social adoption becoming the biggest trend among large corporations, it is a pity that we are still grasping at straws on the influence factors and their value.

In my opinion, the question of value is really based on the underlying purpose of the social network. It relates to the goals and direction of a community, based on its common subjects of interest and intent. If you want to derive the value quotient from this community, you need to study its behavior and the relationships among its members. You can use many algorithms for such purposes, though there are no set methods available as a commercial or open source solution. You can definitely use technologies like Hadoop, Cassandra, Mahout, R and Textual ETL to create solutions that drive analytics, which can help you create, define and measure the value of a social network.
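As one very simple illustration of studying relationships among members, here is a sketch in plain Python that computes two standard graph measures, degree centrality and density, from a list of member interactions. The function names and the sample members are hypothetical, and these two measures are only a starting point, not a complete way to measure the value of a community.

```python
from collections import defaultdict


def build_graph(interactions):
    """Build an undirected graph from (member_a, member_b) interaction pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Share of the other members each member is directly connected to."""
    n = len(graph)
    if n < 2:
        return {member: 0.0 for member in graph}
    return {member: len(neighbors) / (n - 1) for member, neighbors in graph.items()}


def density(graph):
    """Fraction of possible member-to-member ties that actually exist."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edges / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical interactions (replies, mentions, shares) between members.
    interactions = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
    graph = build_graph(interactions)
    print(degree_centrality(graph))
    print(density(graph))
```

Members who never interact with anyone do not appear in the graph here, so a real analysis would also need the full membership list. At scale, the same measures would typically be computed on a platform such as Hadoop rather than in memory.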

Social media metrics are a nascent area, still emerging from concept to solution. It is immature to think that because I do not know how to measure something, I should assume the exercise is futile. Instead, look at the whole movement behind Mahout and R. This area will emerge in 2012 as the most adopted solution platform.


Posted December 24, 2011 10:49 AM