

Insights into the NoSQL Environment: A Spotlight Q&A with Pramod Sadalage of ThoughtWorks

Originally published November 15, 2012

This BeyeNETWORK spotlight features Ron Powell's interview with Pramod Sadalage, Principal Consultant at ThoughtWorks, Inc. Ron and Pramod discuss NoSQL and its use cases.
Pramod, NoSQL Distilled, the book you co-authored with Martin Fowler, was recently published. One of the people who reviewed your book on Amazon.com called your book "a wonderful, accessible, product-agnostic introduction to the world of NoSQL." Why did you feel it was the right time to write this book?

Pramod Sadalage:
Ron, I have been doing NoSQL for a long time now, and so has my company, ThoughtWorks. I have seen its usage go up. There's this whole question of whether to use a key value store, or a document store, or some other store. These discussions keep happening all the time, and we thought it would be good to step back and think about the trends that are driving this NoSQL movement. Our book looks at the design decisions you need to think about and how these trends are affecting the use of NoSQL products.

The concept of aggregates, well described in Eric Evans' book Domain-Driven Design, is about how data is used, and that in turn affects which kind of product you choose: a document store, a key value store, or a column family store. We wrote our book from a similar perspective so readers can think about how they're going to use those products and the design implications as they proceed. I think the main reason our book is so popular is that we are not only talking about a key value store or a document store, but also giving readers perspective on how to choose a particular product based on the design goals they will encounter as they start using these products.
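To make the aggregate idea concrete, here is a minimal sketch in which an order, its line items and its shipping address are stored together as one value under a single key. The order fields and IDs are hypothetical, and a plain Python dict stands in for a real key value store. In a relational design the same order would typically be split across order, line-item and address tables and reassembled with joins.

```python
import json

# Hypothetical order aggregate: the order, its line items and its shipping
# address are read and written together as one unit.
order = {
    "id": "order-99",
    "customer_id": "cust-1",
    "line_items": [
        {"product_id": "p-27", "quantity": 1, "price": 32.45},
        {"product_id": "p-12", "quantity": 2, "price": 9.99},
    ],
    "shipping_address": {"city": "Chicago"},
}

# A plain dict stands in for a key value store (an assumption for illustration).
kv_store = {}

def save_order(o):
    """Store the whole aggregate under its key in a single write."""
    kv_store[o["id"]] = json.dumps(o)

def load_order(order_id):
    """Fetch the whole aggregate back with one key lookup; no joins needed."""
    return json.loads(kv_store[order_id])

def order_total(o):
    """Sum quantity * price over the aggregate's line items."""
    return round(sum(li["quantity"] * li["price"] for li in o["line_items"]), 2)

save_order(order)
print(order_total(load_order("order-99")))
```

Because the aggregate is the unit of storage, the application's most common access (load an order, display it) is one lookup; the trade-off is that queries cutting across aggregates, such as "all orders containing product p-27", become harder.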

What do you hope your book accomplishes and what value will our readers gain from reading NoSQL Distilled?

Pramod Sadalage: I hope our book makes readers understand that there is more than one storage technology they can use, and that they should choose a storage technology based on what they are using the data for instead of blindly assuming that what they have right now is good enough. We have learned that the requirements for storing data change depending on how you're going to use it – what data is actually coming in and what data needs to go out. I hope this book gives people the chance to look at the different data storage technologies and make the right choice for the long-term needs of their applications as well as their enterprises.

The book will also help people keep current on their career choices because sometimes technologies just kind of fade away and then people are stuck with that technology, and there are no more jobs in that space.

Who can benefit from reading this book?

Pramod Sadalage: Everybody! It will be very helpful for architects, data developers, data analysts, data architects, technical project managers, CIOs, CTOs – basically anyone who has to deal with data. And, as you know, everybody is dealing with large amounts of data on a daily basis nowadays.

Many SQL users and practitioners are questioning whether NoSQL will be relevant in the future. Based on your experience, do you think NoSQL is a technology that SQL professionals should embrace?

Pramod Sadalage: Well, I definitely think they should embrace it – not just as a replacement for SQL but as a complement, as something that provides a better solution to your user base for certain use cases. It’s not that you’ll be learning something new and forgetting SQL. Different storage technologies will still be in play.

You mentioned NoSQL as being a complement. I recently spoke with a company that replaced their RDBMS with MongoDB. Do you see NoSQL technologies replacing the RDBMS?

Pramod Sadalage: It depends. Like I was saying, it's a complement in the sense that it could take certain things the RDBMS is doing for you and replace them with a NoSQL technology. At the same time, it can complement things you are currently doing in an RDBMS but are having difficulty scaling, or having difficulty doing in the right way. NoSQL technologies can be introduced architecturally so that you're using the RDBMS for the right task and NoSQL for the tasks it suits. A great example is an event-logging component of an application that writes all user activity to logs within an RDBMS. In that situation, it is difficult to scale because millions of events come in every hour. We replaced that component with a NoSQL database, but the existing application still talked to an RDBMS for everything else. So NoSQL is a complement in a lot of ways.
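The split described in this answer, with high-volume event logging moved out of the RDBMS while the rest of the application keeps using it, can be sketched as follows. This is an illustration under stated assumptions, not the actual system from the interview: an in-memory sqlite3 database stands in for the RDBMS, and the hypothetical EventLogStore and ShopApp classes are stand-ins for a real NoSQL client and application.

```python
import sqlite3
import time
from collections import defaultdict

class EventLogStore:
    """Hypothetical in-memory stand-in for a NoSQL event store.

    Events are only ever appended, grouped by user ID: a write-heavy
    workload that does not need joins or multi-row transactions.
    """
    def __init__(self):
        self._events = defaultdict(list)

    def append(self, user_id, event):
        self._events[user_id].append(event)

    def events_for(self, user_id):
        return list(self._events[user_id])

class ShopApp:
    """Hypothetical application that keeps its orders in the RDBMS."""
    def __init__(self):
        # An in-memory sqlite3 database stands in for the existing RDBMS.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id TEXT, total REAL)"
        )
        self.events = EventLogStore()

    def record_click(self, user_id, page):
        # User activity goes to the event store, not to the RDBMS.
        self.events.append(user_id, {"type": "click", "page": page, "ts": time.time()})

    def place_order(self, user_id, total):
        # Transactional business data stays in the RDBMS.
        with self.db:  # commits on success, rolls back on error
            cur = self.db.execute(
                "INSERT INTO orders (user_id, total) VALUES (?, ?)", (user_id, total)
            )
        # The order event is also logged, but the order itself lives in SQL.
        self.events.append(
            user_id, {"type": "order", "order_id": cur.lastrowid, "ts": time.time()}
        )
        return cur.lastrowid
```

The design choice is that each store gets the workload it handles well: the relational database keeps transactional order data, while the append-only activity stream, which dominates write volume, never touches it.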

Well Pramod, we hear a lot about NoSQL being “schemaless.” Does that mean there is no migration required?

Pramod Sadalage: I think that's a total misconception. It is true that NoSQL databases are “schemaless” in the sense that you can deploy applications without having to change your database. But people sometimes forget that the schema has not disappeared; it has moved out of the database and into the application. As you change your application's schema, the data already in the database no longer matches what your application expects, and as a result you may not find the data you're looking for. So it doesn't mean you don't have to do migration. One option is lazy migration. Among the techniques we describe in the book is that you don't have to migrate everything at the same time: you can migrate one document or one key-value pair at a time until you eventually reach a fully migrated state. I think that's what people need to look at so they don't fall into the trap of thinking they don't need to migrate anything at all.
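Lazy migration can be sketched as migrate-on-read: each document records which schema version it was written with, and whenever the application reads an older document it upgrades it and writes it back, so the store converges on the new schema one document at a time. The schema_version field, the change from a single "name" field to "first_name"/"last_name", and the dict used as the store are all hypothetical choices for illustration, not the book's exact technique.

```python
# Hypothetical schema change: version 1 stored a single "name" field;
# version 2 splits it into "first_name" and "last_name".
CURRENT_VERSION = 2

def migrate_v1_to_v2(doc):
    """Upgrade one document from schema version 1 to version 2."""
    first, _, last = doc.pop("name", "").partition(" ")
    doc["first_name"] = first
    doc["last_name"] = last
    doc["schema_version"] = 2
    return doc

# One migration step per old version; each step bumps schema_version.
MIGRATIONS = {1: migrate_v1_to_v2}

def load(store, key):
    """Read a document, migrating it (and writing it back) only if it is old."""
    doc = dict(store[key])  # work on a copy of the stored document
    version = doc.get("schema_version", 1)  # unversioned docs count as v1
    if version < CURRENT_VERSION:
        while version < CURRENT_VERSION:
            doc = MIGRATIONS[version](doc)
            version = doc["schema_version"]
        store[key] = doc  # write back: this document is now migrated
    return doc
```

Documents that are never read stay in the old format, so in practice a background job that simply calls load on every key is often paired with this approach to finish the migration.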

That is very interesting. From your experience, what are most enterprises now trying to solve from a big data perspective?

Pramod Sadalage: I see two or three trends. One of the major trends is the amount of data that is coming at them. Previously some of these events were not even looked at because storing them takes an enormous amount of storage, but now people are tracking data all over the place: what did the user click on, what did he do, did he buy, did he not buy, at what stage did he leave our sales pipeline, etc. That is one aspect of data collection. The other aspect is usage: how much is my server being used, what is the CPU usage, what is the disk usage, what machine failed, when did it fail, and why did it fail. Nowadays, all this kind of data is being tracked, and NoSQL is frequently deployed to deal with the data this tracking generates.

Enterprises are using NoSQL technologies to store this data and analyze it over the long term to figure out how they can improve the value they provide to customers, gain more business from those customers, and things like that. You see the term “data scientists” used a lot, and data science is basically about having the ability to store and analyze vast amounts of data.

Are the data scientists typically statisticians?

Pramod Sadalage: Some are coming from the statistics world. There are some who are coming from the astronomy world. Basically, data scientists are people who are used to looking at terabytes of data on an hourly basis. There are a lot of developers who got interested in how to analyze data. Data scientists are coming from all over the place – especially people who are “geeked out” about data in some way. They are using enormous amounts of data to come up with some kind of value.

Another trend that I see is visualization techniques. There are a lot of front-end technologies that are coming in that can visualize this data such as D3.js or Tableau. Those technologies can now deal with large data and show you awesome graphics. Previously, if you had data but had no way to visualize it, there was a lot less value to that data. But now since it can be visualized, it’s a really awesome way of looking at data.

There are so many NoSQL options available today. Are there specific situations where one NoSQL option would be better than another?

Pramod Sadalage: Again, it depends what you are trying to use NoSQL for. A great example would be if you're trying to store session information, simple user preferences, or things like that: a key value store works really well for that because there's the key, the user ID, and there is a value associated with it. If you're doing event logging or building some kind of content management system, a document store fits that model. If you have lots of data and high write volumes, a column family database matches up with that. Graph databases are really suited for interconnected data: a social graph, which captures who is a friend of whom and how you are connected to a particular person, fits graph databases very well. Another example would be routing information, such as routing trucks, or even routing money to determine the cheapest way to get your money from Bank A to Bank X. Both kinds of problems can be solved easily with graph databases.
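The money-routing example is a weighted shortest-path problem. The sketch below uses Dijkstra's algorithm over an in-memory adjacency map to find the cheapest chain of transfers; the banks and transfer fees are made up for illustration. A graph database would store the nodes and weighted edges natively and typically offers built-in path queries, so this shows the underlying computation rather than any particular product's API.

```python
import heapq

def cheapest_route(graph, source, target):
    """Dijkstra's algorithm over a dict-of-dicts graph with non-negative fees.

    Returns (total_fee, path), or (inf, []) if target is unreachable.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry; a cheaper path was already found
        visited.add(node)
        if node == target:
            break
        for neighbor, fee in graph.get(node, {}).items():
            new_cost = cost + fee
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in dist:
        return float("inf"), []
    # Walk predecessors back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical transfer fees between banks (directed edges).
transfers = {
    "Bank A": {"Bank B": 5, "Bank C": 1},
    "Bank C": {"Bank B": 2, "Bank X": 8},
    "Bank B": {"Bank X": 3},
}
print(cheapest_route(transfers, "Bank A", "Bank X"))
```

With these fees the direct-looking routes (A to B to X, or A to C to X) cost 8 and 9, while A to C to B to X costs 6, which is the kind of non-obvious answer that graph traversal finds.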

We hear a lot about Hadoop. How do you classify Hadoop and what are its best use cases? We also hear about Cassandra. Could you give us insight into those?

Pramod Sadalage: Hadoop is a framework for distributed processing of large sets of data. People generally use Hadoop to process large quantities of data, and do long-tail analysis on it. I know of companies that store large amounts of information and use Hadoop to analyze it over a long period of time running MapReduce jobs. In addition, you might have also heard of Pig and Hive, which are technologies that give you a framework to write data processing jobs that can run on top of Hadoop.
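Hadoop's MapReduce model can be illustrated with a local, single-process simulation of its three phases: map each input record to key-value pairs, shuffle the pairs so all values for a key end up together, and reduce each group to a result. This is not Hadoop's Java API; it is a hypothetical sketch that counts page views from made-up activity log lines of the form "user_id page". On a real cluster, Hadoop runs many mappers and reducers in parallel across machines and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map one log line ("user_id page") to a (page, 1) pair."""
    _user_id, page = line.split()
    return [(page, 1)]

def shuffle(pairs):
    """Group all emitted values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def run_job(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

logs = ["u1 /home", "u2 /cart", "u1 /home", "u3 /checkout"]
print(run_job(logs))
```

Because map_phase looks at one record at a time and reduce_phase sees only one key's values, both can be spread across many machines, which is what makes the model suit the large, long-running batch analysis described above.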

Cassandra is a column-family database. Data is stored in rows, each identified by a row key, and each row holds many columns; related columns are grouped together into column families. It's similar to a complex, nested hash map data structure that you can store a lot of data in. Many of these NoSQL solutions, Cassandra among them, are “masterless” in the sense that there's no master node coordinating the other nodes. You write data to the cluster of nodes, and it doesn't matter which nodes are up or not; the nodes manage among themselves how to get data back to a node that was down. This is known as hinted handoff. It is a really awesome way of having resiliency in your data storage solution, in the sense that if one node is down, it doesn't really matter: some other node picks up that load and data. When the node that was down comes back up, the other nodes tell it they have data for it so it can take the data back. That's the hand-off that happens, and it is really good for any kind of data storage where you don't want to have to worry about the master being down while data is being replicated to other nodes. Cassandra is such a database and is used extensively in large data enterprises like Netflix.
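Both ideas in this answer, rows stored as a nested map and hinted handoff, can be sketched with a toy in-memory cluster. This is a deliberately simplified illustration, not Cassandra's implementation: replica placement here is a hypothetical CRC32-modulo scheme rather than Cassandra's token ring, the first live node acts as coordinator, and real Cassandra hints also expire and carry timestamps. The Node and Cluster classes and their methods are invented for this example.

```python
import zlib
from collections import defaultdict

class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.rows = defaultdict(dict)   # row_key -> {column: value}: a nested map
        self.hints = defaultdict(list)  # down node's name -> [(row_key, column, value)]

class Cluster:
    def __init__(self, names, replication_factor=2):
        self.nodes = {n: Node(n) for n in names}
        self.order = list(names)
        self.rf = min(replication_factor, len(self.order))

    def replicas_for(self, row_key):
        # Hypothetical placement: hash the key onto the node list and take
        # the next rf nodes (Cassandra itself uses a token ring).
        start = zlib.crc32(row_key.encode()) % len(self.order)
        return [self.order[(start + i) % len(self.order)] for i in range(self.rf)]

    def write(self, row_key, column, value):
        live = [n for n in self.nodes.values() if n.up]
        if not live:
            raise RuntimeError("no live nodes")
        coordinator = live[0]  # no master: any live node can coordinate
        for name in self.replicas_for(row_key):
            replica = self.nodes[name]
            if replica.up:
                replica.rows[row_key][column] = value
            else:
                # Replica is down: the coordinator keeps a hint for it.
                coordinator.hints[name].append((row_key, column, value))

    def mark_down(self, name):
        self.nodes[name].up = False

    def mark_up(self, name):
        node = self.nodes[name]
        node.up = True
        # Every node holding hints for the recovered node hands them off.
        for other in self.nodes.values():
            for row_key, column, value in other.hints.pop(name, []):
                node.rows[row_key][column] = value
```

A short usage example: with three nodes and a replication factor of three, taking one node down before a write leaves a hint on the coordinator, and bringing it back up replays that hint so all replicas agree.

```python
cluster = Cluster(["n1", "n2", "n3"], replication_factor=3)
cluster.mark_down("n2")
cluster.write("user:42", "email", "a@example.com")  # n1 holds a hint for n2
cluster.mark_up("n2")                                # hint handed off to n2
print(cluster.nodes["n2"].rows["user:42"])
```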

Pramod, thank you so much for providing this helpful view into the world of NoSQL. I'm sure many of our readers will soon be reading your book, NoSQL Distilled.


  • Ron Powell
    Ron is an independent analyst, consultant and editorial expert with extensive knowledge and experience in business intelligence, big data, analytics and data warehousing. Currently president of Powell Interactive Media, which specializes in consulting and podcast services, he is also Executive Producer of The World Transformed Fast Forward series. In 2004, Ron founded the BeyeNETWORK, which was acquired by TechTarget in 2010. Prior to the founding of the BeyeNETWORK, Ron was cofounder, publisher and editorial director of DM Review (now Information Management). He maintains an expert channel and blog on the BeyeNETWORK and may be contacted by email at rpowell@powellinteractivemedia.com.

    More articles and Ron's blog can be found in his BeyeNETWORK expert channel.

