NoSQL solutions are data stores that do not accept the SQL language. Ancillary to this, most do not store data in the structure SQL was built for: tables. Though the label reads "no SQL", the idea is that "not only SQL" solutions are needed to meet information needs today. The Wikipedia article states "Carlo Strozzi first used the term NoSQL in 1998 as a name for his open source relational database that did not offer a SQL interface". Some of these NoSQL solutions are already coming perilously close to accepting broad parts of the SQL language. Soon, NoSQL may be an inappropriate label, but I suppose that's what happens when a label describes something by what it is NOT.
So what is it? It must be worth being part of: there are currently at least 122 products claiming the space. My information management assessments over the past year have had to be fine-grained, routing workloads across relational databases, cubes, stream processing, data warehouse appliances, columnar databases, master data management and Hadoop (one of the NoSQL solutions). Even so, there are many more viable categories and products in NoSQL that actually do meet real business needs for data storage and retrieval.
Commonalities across NoSQL solutions include high-volume data, which lends itself to a distributed architecture. The data stored is typically not the usual alphanumeric data, hence the near-synonymous use of NoSQL and "Big Data". Lacking full SQL generally corresponds to a decreased need for real-time query. And many use HDFS for data storage. Technically, though columnar databases such as Vertica, InfiniDB, ParAccel, InfoBright and the extensions by Teradata 14, Oracle (Exadata), SQL Server (Denali) and Informix Warehouse Accelerator deviate from the "norm" of full-row-together storage, they are not NoSQL by most definitions (since they accept SQL and the data is still stored in tables).
NoSQL solutions all require specialized skill sets quite dissimilar to those of traditional business intelligence. This split between the people who work with SQL and those who work with NoSQL within an organization has already led to high walls between the two classes of projects and an influx of software connectors between "traditional" product data and NoSQL data. At the least, a partnership with Cloudera and a connector to Hadoop seems to be the ticket to claiming Hadoop integration.
NoSQL solutions fall into categories. These labels may (I dare say should) replace "NoSQL" as the operative term since, despite the similarities, the divergences are many and growing. Whereas once all this data was excluded from management (or force-fit into relational databases), NoSQL solutions handle it better, cost less and do not carry a per-CPU cost model. Naturally, many of the solutions are open source and embraced by various vendors with value-added code, training, support, etc.
The categories (and future industries) are:
Key-value stores (KVS) like Redis store each value paired with its key, accessible through a navigable tree structure or a hash table. KVS support dynamic online activity with unstructured data.
Document stores like MongoDB and CouchDB are schema-less and support sharding for high availability.
Column stores like HBase and Cassandra share the concept of column-by-column storage with columnar databases and the columnar extensions to row-based databases, but they do not store data in tables; they store it in massively distributed architectures.
Graph stores like Bigdata represent connections across nodes and are useful for relationships among associative data sets.
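To make the four categories a little more concrete, here is a rough sketch of the shape of data each one holds, using nothing but in-memory Python structures. This is an illustration under my own assumptions, not how Redis, MongoDB, HBase or any graph product works internally; the keys, field names and helper functions (`find`, `connect`, `friends_of_friends`) are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Key-value store: a value looked up by its key through a hash table.
kv_store = {}
kv_store["session:42"] = "cart=3 items"

# Document store: schema-less records; each document may carry different fields.
documents = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Lin", "tags": ["admin", "beta"]},
]

def find(docs, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

# Column store: row key -> column family -> column -> value.
# Rows are sparse; a row only holds the columns that were written to it.
column_store = defaultdict(lambda: defaultdict(dict))
column_store["user:1"]["profile"]["name"] = "Ada"
column_store["user:1"]["activity"]["last_login"] = "2011-09-14"
column_store["user:2"]["profile"]["name"] = "Lin"

# Graph store: nodes connected by edges, kept here as adjacency sets.
graph = defaultdict(set)

def connect(a, b):
    """Add an undirected edge between two nodes."""
    graph[a].add(b)
    graph[b].add(a)

def friends_of_friends(person):
    """Nodes two hops away, excluding the person and their direct neighbors."""
    direct = set(graph[person])
    two_hops = {n for friend in direct for n in graph[friend]}
    return two_hops - direct - {person}

connect("Ada", "Lin")
connect("Lin", "Sam")
```

The point of the sketch is the divergence: a key lookup, a field match across uneven documents, a sparse row of column families and a two-hop traversal are four quite different access patterns, which is part of why a single "NoSQL" label covers them so poorly.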
Posted September 14, 2011 6:22 PM