

Blog: Krish Krishnan

Krish Krishnan

"If we knew what it was we were doing, it would not be called research, would it?" - Albert Einstein.

Hello, and welcome to my blog.

I would like to use this blog for constructive communication and the exchange of ideas in the business intelligence community, on topics ranging from data warehousing to SOA to governance, and on everything under the umbrella of these subjects.

To maximize this blog's value, it must be an interactive venue. This means your input is vital to the blog's success. All I ask of this audience is that you treat everybody in this blog community, and the blog itself, with respect.

So let's start blogging, share our ideas, opinions, and perspectives, and keep the creative juices flowing!

About the author

Krish Krishnan is an internationally recognized expert in the strategy, architecture, and implementation of high-performance data warehousing solutions and big data. He is a visionary data warehouse thought leader and is ranked as one of the top data warehouse consultants in the world. As an independent analyst, Krish regularly speaks at leading industry conferences and user groups. He has written prolifically in trade publications and eBooks, contributing over 150 articles, viewpoints, and case studies on big data, business intelligence, data warehousing, data warehouse appliances, and high-performance architectures. He co-authored Building the Unstructured Data Warehouse with Bill Inmon in 2011, and Morgan Kaufmann will publish his first independent writing project, Data Warehousing in the Age of Big Data, in August 2013.

With over 21 years of professional experience, Krish has solved complex solution architecture problems for global Fortune 1000 clients, and has designed and tuned some of the world’s largest data warehouses and business intelligence platforms. He is currently promoting the next generation of data warehousing, focusing on big data, semantic technologies, crowdsourcing, analytics, and platform engineering.

Krish is the president of Sixth Sense Advisors Inc., a Chicago-based company providing independent analyst, management consulting, strategy and innovation advisory, and technology consulting services in big data, data warehousing, and business intelligence. He serves as a technology advisor to several companies and is actively sought after by investors to assess startup companies in data management and associated emerging technology areas. He publishes with BeyeNETWORK.com, where he leads the Data Warehouse Appliances and Architecture Expert Channel.

Editor's Note: More articles and resources are available in Krish's BeyeNETWORK Expert Channel. Be sure to visit today!

January 2012 Archives

By now, all of you have heard Amazon's announcement of DynamoDB, the latest database, which throws NoSQL, Cassandra, Voldemort, Riak, and a lot of other tools together. It is hosted entirely in the cloud and can scale on demand, a true elastic scalability similar to EC2. Throw a MapReduce interface on top of this and you have a Big Data database that can truly scale.

What sets DynamoDB apart in my simple tests over the past few hours is the simplicity it brings to Big Data processing. While my tests are not complete yet, the initial results are definitely encouraging. As I write this post, I have also read DataStax's comparison of Cassandra and DynamoDB, "DataStax questions DynamoDB's performance." It is a long post full of technical comparisons of operations per second, but it does not mention cost or DataStax's own service provisioning.

If you look at cost, Amazon says pricing starts at $1 per gigabyte per month. Data transfer is free for incoming data. It's also free for the first 10 terabytes per month and between AWS services (like Elastic MapReduce and S3). Once you surpass 10 terabytes, taking data out of the service costs $0.12 per gigabyte through 40 terabytes, and then lower rates apply up to 350 terabytes. Throughput capacity costs $0.01 per hour for every 10 units of write capacity and $0.01 per hour for every 50 units of read capacity.
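To make the quoted rates concrete, here is a rough monthly cost sketch in Python. It uses only the storage and provisioned-throughput rates quoted above (the January 2012 figures as stated in this post, not current AWS pricing) and ignores data transfer. It assumes a 730-hour month, capacity held constant for the whole month, and throughput cost that scales linearly with the number of provisioned units. The function and constant names are hypothetical.

```python
from dataclasses import dataclass

# Rates as quoted in this post (DynamoDB launch pricing, January 2012).
# Illustrative only; check current AWS pricing before relying on them.
STORAGE_USD_PER_GB_MONTH = 1.00
WRITE_USD_PER_HOUR_PER_10_UNITS = 0.01
READ_USD_PER_HOUR_PER_50_UNITS = 0.01
HOURS_PER_MONTH = 730  # assumption: an average month of ~730 hours


@dataclass
class MonthlyEstimate:
    storage: float
    writes: float
    reads: float

    @property
    def total(self) -> float:
        return self.storage + self.writes + self.reads


def estimate_monthly_cost(storage_gb: float, write_units: int, read_units: int,
                          hours: float = HOURS_PER_MONTH) -> MonthlyEstimate:
    """Estimate storage plus provisioned throughput for one period.

    Assumes throughput cost scales linearly with provisioned units and that
    capacity stays constant for `hours`. Data transfer is not included.
    """
    storage = storage_gb * STORAGE_USD_PER_GB_MONTH
    writes = (write_units / 10) * WRITE_USD_PER_HOUR_PER_10_UNITS * hours
    reads = (read_units / 50) * READ_USD_PER_HOUR_PER_50_UNITS * hours
    return MonthlyEstimate(storage, writes, reads)


if __name__ == "__main__":
    est = estimate_monthly_cost(storage_gb=100, write_units=500, read_units=1000)
    print(f"storage={est.storage:.2f} writes={est.writes:.2f} "
          f"reads={est.reads:.2f} total={est.total:.2f}")
```

For example, 100 GB of data provisioned at 500 write units and 1,000 read units works out to $100 of storage, $365 of write capacity, and $146 of read capacity, roughly $611 for the month under these assumptions. Because capacity can be dialed up and down, calling the function once per period with different unit counts is one simple way to compare elastic provisioning against provisioning for a fixed peak.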

Given where several internet-based service companies have built their models and found success, they should have no hesitation in adopting the DynamoDB platform. With the ability to dial scalability up and down, you can really control costs, which even under a steady load should be much lower than on-site provisioning for these companies. DynamoDB's beta customers include Elsevier, Formspring, and SmugMug, which are definitely encouraging names.

If an organization were choosing a cloud-based services provider for Big Data, Amazon seems a logical choice on several fronts. But is your big data initiative deployable over the internet? And do you have the staff to execute the program, even if you host the data in the cloud? While you digest more content on DynamoDB beyond this blog, I will return to running experiments and will share more information over the next few days on scalability tests and the consistency of the database.

For a fair comparison at the database level, DynamoDB should also be compared against several other NoSQL databases.

Watch for further information on specifics.

Posted January 19, 2012 9:38 PM
Recently a tweet caught several people's attention: "Eventually, Hadoop will swallow the EDW." Let us be very clear: the EDW is needed now and will be needed in the future. The premise of an EDW is to process and store data for consumption across the enterprise for analytical and reporting purposes. Hadoop is a platform for managing the processing of Big Data; it is not a relational data store, nor is it engineered to replace the EDW. Many people share this misconception, but Hadoop and the EDW are separate platforms, and they will be integrated through strong metadata relationships.

It is true that Hadoop is getting several upgrades and new distributors, but this does not mean you can move all your EDW data into that platform. Structured data is best processed on RDBMS platforms.

You can argue that one needs a hammer to drive a nail into the wall, but the type of hammer, the type of nail, and the type of wall all matter.

There are several articles on the internet, including presentations from the Hadoop community, on why the EDW is still needed. I urge you to do some research and understand the arguments. Plan on attending TDWI Las Vegas or Chicago this year to learn more, or attend Enterprise Data World 2012 in Atlanta. We have several discussions and sessions on this subject.

Bottom line: the EDW is here to stay and is not getting retired soon.

Posted January 11, 2012 9:59 PM
At a recent event where I gave a keynote, an audience member asked why Big Data means processing with patterns. Let us take a step back and analyze this thought. Patterns have always been the way we learn. Whether it is languages through symbols or music through patterns, the human mind can imbibe those patterns and reproduce them. This concept extends to computing too, where we reduce different types of data into binary symbols that are interpreted by the system.

Patterns are what we formed into the thoughts and behaviors that manifest as Big Data, and it is these very same patterns that need to be disambiguated with context. Coming full circle, patterns play an important role in every aspect of data processing.

Pattern processing is intricate and definitely complex, but there are robust techniques to accomplish it. With the advent of parallel processing techniques for large-scale data, pattern-based processing has become more scalable and flexible.
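The post does not name a specific technique, so the following is only a hypothetical illustration of how pattern matching fits the map/reduce style of parallel processing: each chunk of data is scanned independently for patterns (map), and the partial counts are merged (reduce). The pattern names and regular expressions are invented for this example, and everything runs in a single process here; a framework such as Hadoop MapReduce would be what distributes the map step across nodes.

```python
import re
from collections import Counter
from functools import reduce
from typing import Iterable

# Hypothetical patterns for illustration only; real pattern libraries would
# be domain-specific and far richer than two regular expressions.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def map_chunk(chunk: str) -> Counter:
    """Map step: count matches of each pattern in one independent chunk."""
    return Counter({name: len(p.findall(chunk)) for name, p in PATTERNS.items()})


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge the partial counts from two chunks."""
    return a + b


def count_patterns(chunks: Iterable[str]) -> Counter:
    # Chunks share no state, so the map step could run on many workers;
    # in this sketch it simply runs sequentially in one process.
    return reduce(reduce_counts, map(map_chunk, chunks), Counter())


if __name__ == "__main__":
    docs = ["mail a@b.com or call 312-555-0100", "x@y.org", "no matches here"]
    print(count_patterns(docs))
```

The design point is that the map function depends only on its own chunk, which is what lets pattern matching scale out as data volume grows.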

While the subject is not new, thinking about processing complex data from this perspective is one approach to tackling the problem of Big Data processing.

Posted January 4, 2012 9:39 PM