
Blog: William McKnight

William McKnight

Hello and welcome to my blog!

I will periodically be sharing my thoughts and observations on information management here in the blog. I am passionate about the effective creation, management and distribution of information for the benefit of company goals, and I'm thrilled to be a part of my clients' growth plans and connect what the industry provides to those goals. I have played many roles, but the perspective I come from is benefit to the end client. I hope the entries can be of some modest benefit to that goal. Please share your thoughts and input to the topics.

About the author

William is the president of McKnight Consulting Group, a firm focused on delivering business value and solving business challenges utilizing proven, streamlined approaches in data warehousing, master data management and business intelligence, all with a focus on data quality and scalable architectures. William functions as strategist, information architect and program manager for complex, high-volume, full life-cycle implementations worldwide. William is a Southwest Entrepreneur of the Year finalist, a frequent best-practices judge, has authored hundreds of articles and white papers, and given hundreds of international keynotes and public seminars. His team's implementations from both IT and consultant positions have won Best Practices awards. He is a former IT Vice President of a Fortune company, a former software engineer, and holds an MBA. William is author of the book 90 Days to Success in Consulting. Contact William at wmcknight@mcknightcg.com.

Editor's Note: More articles and resources are available in William's BeyeNETWORK Expert Channel. Be sure to visit today!

Recently in BigData NoSQL Category

Couchbase announced the general availability of Couchbase Server 2.0 today. This is the long-anticipated release that carries Couchbase's capabilities over the threshold from a key-value store to a document store.

Key-value stores have little to no understanding of the value part of a key-value pair. It is simply a blob, keyed by a unique identifier that serves a "primary key" function. This key is used for get, put and delete. You can also search inside the value, although the performance may be suboptimal. The application consuming the record will need to know what to do with the blob/value.
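To make the get/put/delete model concrete, here is a minimal in-memory sketch. It is a toy, not Couchbase's API: the class name `KeyValueStore`, its methods and the `scan_values` helper are all hypothetical, chosen only to show that the store treats the value as opaque bytes and that searching inside values means scanning every record.

```python
import json
from typing import Dict, List, Optional


class KeyValueStore:
    """Toy key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # The store never inspects the bytes it holds.
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # True if the key existed and was removed.
        return self._data.pop(key, None) is not None

    def scan_values(self, needle: bytes) -> List[str]:
        # Searching inside values has no index to lean on:
        # every stored blob must be examined.
        return sorted(k for k, v in self._data.items() if needle in v)


store = KeyValueStore()
store.put("cart:42", json.dumps({"user": "ann", "items": ["book"]}).encode())

# The application, not the store, knows the blob is JSON.
blob = store.get("cart:42")
cart = json.loads(blob) if blob is not None else None
```

Only the consuming application can decode the blob; the store would hold an image or a serialized session just as readily.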

Key-value stores are prominent today, but with a few additional features they can become a document store or a column store, and Couchbase has made its server a document store. When records are simple and have little structure in common, a key-value store may be all you need. However, I don't see any of Couchbase's current workloads moving away from Couchbase due to the innovations. These workloads include keeping track of everything related to the new class of web, console and mobile applications that are run simultaneously by thousands of users, sometimes sharing sessions - as in games. They also include shopping carts.

Workloads involving relationships among data, complex data or the need to operate on multiple keys/records at once need some of the functions newly added in 2.0, which include distributed indexing and querying, JSON storage, online compaction, incremental map reduce and cross data center replication.
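As a rough illustration of what JSON storage plus indexing buys over a plain key-value store, the sketch below keeps a secondary index that is built by a user-supplied map function and updated incrementally on each write. This is a single-process toy under my own assumptions; `DocumentStore`, `by_game` and the sample documents are hypothetical and do not reflect Couchbase's actual view or query interfaces.

```python
import json
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Set


class DocumentStore:
    """Toy document store with one incrementally maintained index."""

    def __init__(self, map_fn: Callable[[Dict[str, Any]], Iterable[str]]) -> None:
        self._map_fn = map_fn
        self._docs: Dict[str, Dict[str, Any]] = {}
        self._index: Dict[str, Set[str]] = defaultdict(set)  # emitted key -> doc ids
        self._emitted: Dict[str, List[str]] = {}              # doc id -> emitted keys

    def put(self, doc_id: str, doc_json: str) -> None:
        doc = json.loads(doc_json)  # the store understands the value
        self._unindex(doc_id)       # drop entries from any older version
        self._docs[doc_id] = doc
        keys = list(self._map_fn(doc))
        self._emitted[doc_id] = keys
        for key in keys:
            self._index[key].add(doc_id)

    def _unindex(self, doc_id: str) -> None:
        # Only the changed document is touched, not the whole data set.
        for key in self._emitted.pop(doc_id, []):
            self._index[key].discard(doc_id)
            if not self._index[key]:
                del self._index[key]

    def query(self, key: str) -> List[str]:
        return sorted(self._index.get(key, ()))

    def count(self, key: str) -> int:
        # A simple "reduce" over the index entries for one key.
        return len(self._index.get(key, ()))


def by_game(doc: Dict[str, Any]) -> Iterable[str]:
    # Map function: emit an index key only for documents that have a "game" field.
    if "game" in doc:
        yield doc["game"]


sessions = DocumentStore(by_game)
sessions.put("s1", '{"user": "ann", "game": "chess"}')
sessions.put("s2", '{"user": "bob", "game": "chess", "score": 10}')
sessions.put("c1", '{"user": "ann", "cart": ["book"]}')
```

Documents of different shapes sit side by side, and `sessions.query("chess")` answers a multi-record question without the application fetching and decoding every value itself.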

Document stores tend to be excellent for logging online events of different types when those events can have varying characteristics. They work well as pseudo content management systems that support website content. Because they can manipulate data at a granular level, document stores are good for real-time analytics that change dynamically and that the application can take advantage of immediately.

Key-value stores have historically (ok, it's all a short history) performed better than the more functional document stores.  Couchbase is navigating their capabilities forward in a very sensible manner.  There is no reason for performance degradation or lack of functionality and Couchbase is beginning to show this.  While there's still a release or two to go to be at a functionality standard for document stores, Couchbase is coming at it from a strong performance base.

Posted December 12, 2012 7:41 AM

This week, at the PASS Summit, Microsoft unveiled its inevitable "big data" strategy. The world of big data is the new uncharted land in information management and the big vendors are jumping on board. "New economy" giants like eBay, Twitter, Facebook and Google are the early adopters - and many even built the big data tools that everything is based on.


It would be too easy to dismiss big data as a Valley-only phenomenon, and you shouldn't.  Microsoft's information management tools serve perhaps the widest ranging set of clients anywhere.  They've either made their move to "keep up with the Joneses" (Oracle had some big data announcements last week) or there must be some Global 2000 budgets in it.  The industry will not thrive without some of the latter and that's what I'm betting on.


There's vast utility in unstructured and machine-generated data (somehow tweets count in this category). There are also many reasons, starting with monetary ones, why a company that finds some use for it will choose a big data tool like Hadoop rather than a relational database management system to store the data. Yes, and even live with the tradeoffs: lack of ACID compliance, lack of transactions, lack of SQL (although this is eroding by the day), lack of schema sharing, the need to user-assemble (although this is also eroding) and node failures being a way of life. Indeed, the "secret sauce" of Hadoop is the distribution of data and recovery from node failure - RAID-like, but less costly.
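The idea of distributing data and recovering from node failure can be sketched in a few lines. The following is a toy model under my own assumptions, not Hadoop's actual block placement policy: `ToyCluster`, its methods and the node names are hypothetical. Each block is copied to several nodes, and when a node is lost its blocks are re-copied from the surviving replicas.

```python
from typing import Dict, List, Set


class ToyCluster:
    """Toy replicated block store (illustrative only)."""

    def __init__(self, nodes: List[str], replication: int = 3) -> None:
        if replication > len(nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.replication = replication
        self.nodes: Dict[str, Dict[str, bytes]] = {n: {} for n in nodes}

    def _least_loaded(self, exclude: Set[str]) -> List[str]:
        # Candidate nodes ordered by block count, then by name for determinism.
        candidates = [n for n in self.nodes if n not in exclude]
        return sorted(candidates, key=lambda n: (len(self.nodes[n]), n))

    def write(self, block_id: str, data: bytes) -> None:
        for node in self._least_loaded(set())[: self.replication]:
            self.nodes[node][block_id] = data

    def holders(self, block_id: str) -> List[str]:
        return sorted(n for n, blocks in self.nodes.items() if block_id in blocks)

    def read(self, block_id: str) -> bytes:
        # Any surviving replica will do.
        for blocks in self.nodes.values():
            if block_id in blocks:
                return blocks[block_id]
        raise KeyError(block_id)

    def fail_node(self, node: str) -> None:
        lost = self.nodes.pop(node)
        for block_id in lost:
            current = set(self.holders(block_id))
            if not current:
                continue  # every replica is gone; nothing to copy from
            data = self.read(block_id)
            missing = self.replication - len(current)
            for target in self._least_loaded(current)[:missing]:
                self.nodes[target][block_id] = data


cluster = ToyCluster(["a", "b", "c", "d"], replication=3)
cluster.write("block-1", b"clickstream chunk")
cluster.fail_node("a")
```

After the failure the block is still readable and is back to three copies, using ordinary nodes rather than specialized storage hardware, which is the "RAID-like, but less costly" point.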


It's better to play with this "hippy developed" (as one skeptic called it) Hadoop than to ignore it at this point. That's what Microsoft has done. Microsoft is working to deploy Hadoop on Windows and cloud-based Azure. This could really work in Microsoft's big data land grab. It's a hedge against going too hard-core into the open-source world. It's comfortable Windows combined with Hadoop. For the many, many fence-sitters out there, this is good timing. Many want to trace movements of physical objects, trace web clicks and other Web 2.0 activity. They want to do this without sacrificing the enterprise standards they are used to with products like Windows and its management toolset.


Development will occur with the Yahoo-legacy Hortonworks and will go into Apache.  This announcement follows the development of the Sqoop-compatible Microsoft SQL Server Connector for Apache Hadoop.


A simultaneous Microsoft big data announcement was an ODBC Driver to Hive. Hive was developed by Facebook to make data access to Hadoop easier than with MapReduce. Every day, Facebook runs 150,000 jobs. Only 500 are MapReduce; the rest are HiveQL. HiveQL is SQL-like and, in some ways, actually exceeds SQL capabilities with complex types like associative arrays, lists and structs. And soon, it will have an ODBC driver from Microsoft.


The announcements didn't coincide with any showable development so apparently there's still some work involved before we will have substantially more information, but it's definitely worth watching as a milestone in the big data journey.

Posted October 15, 2011 2:12 PM

