
Blog: Rick van der Lans

Rick van der Lans

Welcome to my blog where I will talk about a variety of topics related to data warehousing, business intelligence, application integration, and database technology. Currently my special interests include data virtualization, NoSQL technology, and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl.

About the author >

Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence, database technology and data virtualization. He is managing director and founder of R20/Consultancy. An internationally acclaimed speaker who has lectured worldwide for the last 25 years, he is the chairman of the annual European Enterprise Data and Business Intelligence Conference held in London. In the summer of 2012 he published his new book Data Virtualization for Business Intelligence Systems. He is also the author of one of the most successful books on SQL, the popular Introduction to SQL, which is available in English, Chinese, Dutch, Italian and German. He has written many white papers for various software vendors. Rick can be contacted by sending an email to rick@r20.nl.

Editor's Note: Rick's blog and more articles can be accessed through his BeyeNETWORK Expert Channel.

February 2013 Archives

Quite recently, during my trip to Silicon Valley in the San Francisco Bay Area, I visited some of the well-known NoSQL vendors. What became obvious after a couple of meetings is that all of them have added, or are adding, SQL interfaces (or some form of SQL) to their products. For example, Cloudera will release Impala, MapR is working on Drill, and DataStax has CQL. These are all SQL interfaces. And the list goes on: data virtualization vendors such as Composite, Denodo, and Informatica support access to NoSQL products; Hadapt offers SQL on top of Hadoop; Simba and Quest have released SQL support for various NoSQL products; and MarkLogic, a NoSQL vendor that has developed a search-based transactional database server, will release a SQL interface. In other words, the SQL-fication of NoSQL has started and continues at a rapid pace.

Adding SQL is a wise decision, because through SQL, (big) data stored in these systems becomes available to a much larger audience and therefore becomes more valuable to the business. It makes it possible to use a much broader set of products to query and analyze that data. Evidently, not all of these SQL implementations are perfect today, but I don't doubt that they will improve over time.
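To make concrete what such a SQL-style interface looks like, here is a small sketch in CQL, the Cassandra Query Language mentioned above. The table and column names are invented for illustration, and CQL deliberately supports only a subset of full SQL (for example, queries generally have to be restricted by the partition key):

```sql
-- Illustrative only: the table and column names are hypothetical.
-- CQL borrows SQL's syntax while restricting it to queries a
-- distributed store like Cassandra can answer efficiently.
CREATE TABLE clickstream (
    user_id  uuid,
    event_ts timestamp,
    page     text,
    PRIMARY KEY (user_id, event_ts)
);

-- A SQL-shaped query that any tool with a CQL driver could issue;
-- note the WHERE clause pins down the partition key (user_id).
SELECT page, event_ts
FROM clickstream
WHERE user_id = 62c36092-82a1-3380-8d3a-cf5a2946592d
ORDER BY event_ts DESC
LIMIT 10;
```

The point is not the specific dialect but the shape: familiar SELECT/FROM/WHERE syntax over a non-relational store, which is exactly what makes the data reachable from standard reporting and BI tools.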

Considering this SQL-fication that's going on, how much longer can we state that the term NoSQL stands for NO SQL? Maybe in a few years we will say that NoSQL stands for Not Originally SQL.

In a way, this transformation reminds me of the history of Ingres. This database server started out as a NoSQL product as well. In the beginning, Ingres supported a database language called QUEL (a relational language, but not SQL). Eventually, the market forced Ingres to convert to SQL. Not Originally SQL certainly applies to them.

Anyway, the SQL-fication of NoSQL products and big data has started and continues. To me, this is a great development, because more and more organizations understand what a major asset their data is. Data, any data, big or small, should be stored in systems that can be accessed by as many tools, applications, and users as possible, and that's what SQL offers. Such a valuable asset should not be hidden and buried deep in systems that can only be accessed by experts and technical wizards.

Posted February 8, 2013 8:53 AM
Permalink | 2 Comments |
The 1880 census of the US was taking a long time, much too long for the decision makers of the United States Census Office (USCO). The reason it took so long was that it was a largely manual process. In the end, the exercise took eight years to complete.

Halfway through the process, in 1884, it was already evident that the count would drag on. Therefore, one of the employees of the USCO was asked to design a machine that would speed up the process for the upcoming 1890 census. This machine had to make it possible to process the enormous amount of data much faster.

That employee was Herman Hollerith. In fact, William Hunt and Charles Pidgin were asked the same question. A benchmark was prepared in which all three could demonstrate how fast their solutions were. Coding took 144 hours with Hunt's method, 100 hours with Pidgin's method, and 72 hours with Hollerith's. Processing the data took 55, 44, and 5 hours, respectively. The conclusion: Hollerith's solution was by far the fastest and was therefore selected by the USCO.

For the 1890 census, 50,000 men were employed to gather the data and put it on punch cards. It was decided to record far more attributes: 235 instead of the 6 used in the 1880 census. Hollerith also invented a machine for punching the cards, which made it possible for one person to produce 700 punch cards per day. Because of Hollerith's machines, 6,000,000 persons could be counted per day. His machines reduced a ten-year job to a few months, and in total his inventions led to $5 million in savings.

Hollerith's ideas for automation of the census are described in Patent No. 395,782 of Jan. 8, 1889 which starts with the following sentence: "The herein described method of compiling statistics ..."

Does this all sound familiar? Massive amounts of data, compiling statistics, the need for a better performance. To me it sounds as if Hollerith was working on the first generation of big data systems.

Hollerith started his own company, the Tabulating Machine Company, in 1896. In 1911 it merged with some other companies into the Computing-Tabulating-Recording Company (CTR), and in 1924 the name CTR was changed to IBM. In other words, IBM has always been in the business of big data, analytics, and appliances.

Why did it take so long before we came up with the term big data when, evidently, we have been developing big data systems since the very beginning of computing? You could say that the first automated information processing system was a big data system using analytics. This means that Hollerith, besides being a very successful inventor, can be considered the grandfather of big data.

Posted February 1, 2013 7:00 AM
Permalink | No Comments |