This week, at the PASS Summit, Microsoft unveiled its inevitable "big data" strategy. Big data is the new uncharted territory in information management, and the big vendors are jumping on board. "New economy" giants like eBay, Twitter, Facebook and Google are the early adopters - and many even built the big data tools that everything is based on.
It would be too easy to dismiss big data as a Valley-only phenomenon, and you shouldn't. Microsoft's information management tools serve perhaps the widest-ranging set of clients anywhere. Either Microsoft has made its move to "keep up with the Joneses" (Oracle had some big data announcements last week) or there must be some Global 2000 budgets in it. The industry will not thrive without some of the latter, and that's what I'm betting on.
There's vast utility in unstructured and machine-generated data (somehow tweets count in this category). There are also many reasons, starting with monetary ones, why a company that finds some use for this data will choose a big data tool like Hadoop rather than a relational database management system to store it. Yes, and it will even live with the tradeoffs: lack of ACID compliance, lack of transactions, lack of SQL (although this is eroding by the day), lack of schema sharing, the need for users to assemble the pieces themselves (although this is also eroding) and node failures being a way of life. Indeed, the "secret sauce" of Hadoop is the distribution of data and recovery from node failure - RAID-like, but less costly.
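The "RAID-like" idea can be sketched in a few lines of Python. This is only a toy model, not HDFS code: the function names, node names and random placement are assumptions made for illustration (real HDFS placement is rack-aware), though the replication factor of 3 matches the HDFS default. Each block of data is copied to several nodes, and when a node dies, its blocks are copied again from the surviving replicas.

```python
import random


def place_blocks(block_ids, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes (random placement, an assumption)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in block_ids}


def handle_node_failure(placement, failed_node, live_nodes, replication=3, seed=1):
    """Drop the failed node and re-replicate any block that fell below the target."""
    rng = random.Random(seed)
    for holders in placement.values():
        holders.discard(failed_node)
        # Nodes that are alive and do not already hold a copy of this block.
        candidates = [n for n in live_nodes if n not in holders]
        while len(holders) < replication and candidates:
            target = rng.choice(candidates)
            candidates.remove(target)
            holders.add(target)  # stands in for copying from a surviving replica
    return placement
```

Because every block lives on several cheap machines, losing one machine costs some copying time rather than any data, which is why node failures can be treated as routine.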
It's better to play with this "hippy developed" (as one skeptic referred to it) Hadoop than to ignore it at this point. That's what Microsoft has done. Microsoft is working to deploy Hadoop on Windows and cloud-based Azure. This could really work in Microsoft's big data land grab. It's a hedge against going too hard-core into the open-source world. It's comfortable Windows combined with Hadoop. For the many, many fence-sitters out there, this is good timing. Many want to trace movements of physical objects, web clicks and other Web 2.0 activity. They want to do this without sacrificing the enterprise standards they are used to with products like Windows and its management toolset.
Development will occur with Hortonworks, the Yahoo spin-off, and will be contributed to Apache. This announcement follows the development of the Sqoop-compatible Microsoft SQL Server Connector for Apache Hadoop.
A simultaneous Microsoft big data announcement was an ODBC Driver to Hive. Hive was developed by Facebook to make access to data in Hadoop easier than writing MapReduce jobs. Every day, Facebook runs 150,000 jobs. Only 500 are MapReduce; the rest are HiveQL. HiveQL is SQL-like and, in some ways, actually exceeds SQL capabilities with complex types like associative arrays, lists and structs. And soon, it will have an ODBC driver from Microsoft.
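To see why most of those jobs end up in HiveQL, consider counting clicks per page. In HiveQL that is roughly one statement, something like `SELECT page, COUNT(*) FROM clicks GROUP BY page;` (the `clicks` table and `page` column are hypothetical). Written MapReduce-style, the developer spells out each phase. The following is a minimal in-memory Python sketch of those phases, not Hadoop API code, and all the function names are made up for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (key, 1) pair for every click record."""
    for record in records:
        yield record["page"], 1


def shuffle(pairs):
    """Shuffle: group emitted values by key (the framework does this in real Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def count_clicks_per_page(records):
    """Chain the three phases, mirroring a GROUP BY ... COUNT(*) query."""
    return reduce_phase(shuffle(map_phase(records)))
```

Even in this toy form, the gap between one declarative line and three hand-written phases suggests why a SQL-like layer, and an ODBC driver on top of it, matters to the enterprise audience.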
The announcements didn't coincide with anything demonstrable, so apparently there's still some work to be done before we have substantially more information, but it's definitely worth watching as a milestone in the big data journey.
Posted October 15, 2011 2:12 PM