Originally published December 12, 2011
BeyeNETWORK Spotlights focus on news, events and products in the business intelligence ecosystem that are poised to have a significant impact on the industry as a whole; on the enterprises that rely on business intelligence, analytics, performance management, data warehousing and/or data governance products to understand and act on the vital information that can be gleaned from their data; or on the providers of these mission-critical products.
Presented as a Q&A-style article, these interviews conducted by the BeyeNETWORK present the behind-the-scenes view that you won’t read in press releases.
This BeyeNETWORK Spotlight features Ron Powell's interview with Sandy Steier, co-founder of 1010data. Ron and Sandy discuss the history of 1010data, its theory of interacting intimately with data, and how 1010data makes that possible.
Sandy, your background intrigues me. You came from Wall Street and the business side, while most of the founders of software companies have come from the IT side. Can you share how you got from Wall Street to 1010data?
Sandy Steier: First, I should mention that both my co-founder, Joel Kaplan, and I have more technical academic backgrounds. Joel’s is in operations research, and mine is in computer science and mathematics. In fact, we spent a couple of years before our careers on Wall Street doing more technical things. Joel worked in sales and sales support at IBM, and I was at CBS. But it's true that on Wall Street at Morgan Stanley, Lehman Brothers and UBS, we were primarily on the business side. Joel was on the stock trading side, and I was on the fixed income side.
The interesting thing was that in our businesses, we had to do a lot of technical development. The businesses that we were part of were analytically intense. Joel was trading large amounts of stock, and an enormous amount of data went into that analysis. On the fixed income side in mortgage-backed securities, where I primarily operated, there were very complicated securities. There was a heavy premium on the ability to do complex analytics. That is something that we spent a lot of time on, and we developed some interesting techniques and philosophies around that sort of thing. Being an intense environment, Wall Street takes a toll. When we decided to do something else and strike out on our own, we realized that the techniques, methods, and philosophies that we had used so effectively as users on the business side could be sold commercially.
Can you explain how your Wall Street philosophies have impacted your product philosophies?
Sandy Steier: We were on the business side, and we had to react to changing markets and competitive pressures and always stay on top of things. We had to come up with new ways of viewing, exploring, and analyzing data. We developed techniques that allowed us to get at the data and play with it in a very intimate sense. There were no spreadsheets at that time, but the technologies we used were essentially the 1970s and early 1980s predecessors of spreadsheets.
In a spreadsheet, you actually interact with the data. It's not as though you write analytical programs to probe a database through some remote means. In a spreadsheet, you're directly in touch with the data – you see it and manipulate it live. We found that our experience of being very involved with the data was very effective and gave us an edge on our competition. We realized the ability to interact intimately with data was something that, to a certain degree, people hadn’t yet discovered.
At 1010data, you have a data warehouse platform. It’s no secret there are many data warehouses and analytic platforms on the market today. What makes 1010data so unique?
Sandy Steier: I think it starts with interacting intimately with data. It is very much like a spreadsheet in that the user of the data – the analysts – can actually see and manipulate the data in real time and do things with it that would normally be a cumbersome process. The way most data analytics systems and data warehouses work is very complicated. There's a data warehouse component, which is the place where the data gets stored. There's a data integration component, which is how the data gets into the data warehouse. There's the business intelligence component, which is how the data gets out of the data warehouse.
All of these things require fairly technical skills. They involve many disciplines, and it is a very cumbersome process. The users are given very restrictive access – just little windows into the data – and the types of data they can access are also very restricted. There is a complex chain of what data gets into the data warehouse and how it can be analyzed. It's all very limited and restrictive. Additionally, because it is so complicated and there are so many disciplines involved in designing and setting up the data warehouse, it can take months or sometimes years to build. So the users, who are ultimately the reason that the whole thing is being put together, have very limited access to only parts of the data and only in certain ways. We remove all that and allow the users to take all the data in its most detailed form and do whatever they want with it in a completely extemporaneous, impromptu, spur-of-the-moment way. As a result, they can easily perform the analysis that is most relevant to their business on a particular day.
You're marketing your product as a trillion-row spreadsheet. Will you explain what you mean by that?
Sandy Steier: In a sense, that is something of an oxymoron to most people because a spreadsheet is something that is on your desktop and works on small amounts of information. It is very intimate as opposed to a data warehouse or something that deals with large amounts of data, which is far more complex. Typically, those are on big servers in the data center, and there are all sorts of people feeding, taking care of and allowing access to it.
But what is a spreadsheet? A spreadsheet is something that is visual. The users can actually see the data, which psychologically gives them a certain sense for what they're doing. It is interactive, and users get immediate results right on the screen. They do the next thing, and they get another result. It’s very sequential. Users can "undo" if they didn’t mean to do something or if they realize that it would've been better to do it differently. They undo and then continue from that point. It is very interactive, unlike most programs that require users to submit a query and get the result sometime later. In this case, users interact with the data and get immediate, incremental results in real time.
Finally, spreadsheets are unguided and unscripted. There's nothing about a spreadsheet that is biased toward one type of analysis or another. It's basically add, subtract, multiply and divide in whatever order you want. It's a way for a user to become intimate with the data.
That is exactly what we provide, but obviously a spreadsheet running on your desktop could not handle a trillion rows. Desktops even today can't handle a trillion records of data because there isn't enough space or processing power on them. With 1010data, that issue goes away because the data is in the cloud. Users work on their desktops and everything is going on in the cloud. They see spreadsheets similar to Excel spreadsheets, and they can scroll through, manipulate and interact with the data, just as they would with a desktop spreadsheet. It just so happens that it has hundreds of billions or a trillion rows. Users get the same type of interactivity even though there is that much data. If users have 10,000 rows in a desktop spreadsheet, they can expect reasonable response times. With 1010data in the cloud, users get the same response times with hundreds of billions of rows.
There really is the sense that you are working with a spreadsheet. It's just that all the power is in the cloud, and it's not actually running on your desktop. That's what is unique. Most data warehouse – most database – technologies would not be able to support such a thing. Most databases are designed for particular kinds of queries and uses. The notion that users can use something as general as a spreadsheet and do whatever they want flies in the face of traditional practices. Also, most databases simply could not handle that type of completely ad hoc analysis. Our database is designed specifically for that because our experience on Wall Street showed us this was the best approach.
I think a customer example would be helpful so our readers could understand how companies use it. We talked about the cloud, but a lot of people are a little leery of the cloud’s security and safety. However, Dollar General, one of your largest customers with close to $12 billion in revenue, has standardized 1010data as their data warehouse and primary analytic platform. Can you tell us how that came about and how easy it is for the people at Dollar General to use 1010data?
Sandy Steier: Dollar General is a successful retail company, which is a very difficult thing to be. Margins are very tight, and you have to be fleet of foot and very good to succeed in retail. They are succeeding very nicely. Dollar General has the largest number of stores of any retailer in the United States – approaching 10,000 stores. They obviously think about things very carefully.
They started using 1010data for things their legacy data warehouse simply could not do. They needed to look at the detailed point-of-sale data, the beep-by-beep cash register data that they were collecting and every item that every person bought. Their original data warehouse simply could not do that. They began to use 1010data to analyze that data, and they had a very, very high ROI almost immediately. They discovered all sorts of buying patterns and usage patterns within various stores that led them to change certain policies and the way they approached their marketing campaigns, which improved their business dramatically. That was how it got started. When they decided they wanted to have a single data warehouse – basically to replace their legacy data warehouse – they put us into competition with some of the other leading vendors, and we won that handily. Not just because what we offer is unique in terms of a spreadsheet, but also because of the pure processing power. We beat the competition by an order of magnitude or more on the time it took to run certain queries.
On that basis, they decided to go with 1010data, and they found they can use it in many interesting ways. There are a multitude of canned reports that have been built for the many people at Dollar General who need the analysis handed to them. They also have people who use the spreadsheet as an ad hoc analytical tool.
Before they made the decision to switch their entire data warehouse to 1010data, they visited us, as do most customers. They had a tour of our data center, which is a co-location facility; we own some cages at various facilities around the country. We arranged a tour, and it was quite clear to Dollar General that these facilities have extremely high-quality physical security. They have tremendously effective backups in terms of Internet connectivity and power. These facilities know how to do what they do, and they do a very good job. I would say that they do a better job than most companies do with their own data centers. We don't own the facility, so I'm being fairly objective here. Having seen individual company facilities and having seen these facilities, I think they do a very good job and are impressive. I think that based on that tour and the credentials of the facility we use, and based on our experience and credentials with our system in terms of security and reliability, they decided that it would make sense to use 1010data. In the back of their minds, I'm sure they thought they could always switch to a local facility and we would provide them with a local installation. We have sometimes done that, and it is an option that is available to customers. But they decided they would go with the 1010data-hosted version, and it's working out very nicely for them.
Obviously, in the cloud, they have much more power and they don’t have to worry about all the back-end maintenance and so forth. Is that right?
Sandy Steier: Exactly.
There are a lot of sophisticated types of analysis – time-series analysis, statistical analysis, regressions, cluster analysis – that aren't typical spreadsheet operations but are appropriate for big data, which we're hearing a lot about lately. Are 1010data customers out of luck if they want that type of analysis?
Sandy Steier: I would characterize it as quite the opposite. We have built all those types of analysis into our system. In fact, it is easier to do them in our system than it is in virtually any other. For time-series analysis, our system understands order and time natively. In a relational database, time and order are not really understood by the database, and you have to work around that fact. In our case, the database understands time and order, so you can talk about the price of the previous sale. You can talk about the average price last month versus this month. You can do those kinds of time comparisons or sequence comparisons very easily. You can look for patterns over time. In the pharmaceutical industry, for instance, you can find how people are filling, refilling and renewing prescriptions; switching to generics; or dropping out altogether. That type of longitudinal study, as it's called, can be done very easily in 1010data. Those are very difficult things to do in relational databases.
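To make the two time-aware questions in this answer concrete (the price of the previous sale, and the average price last month versus this month), here is a minimal sketch in plain Python. It is not 1010data syntax, which the interview does not show; the sample records and the function names `with_previous_price` and `monthly_average` are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical point-of-sale records: (sale date, item, price).
sales = [
    (date(2011, 10, 3), "soap", 2.00),
    (date(2011, 10, 17), "soap", 2.20),
    (date(2011, 11, 2), "soap", 2.50),
    (date(2011, 11, 20), "soap", 2.30),
]

def with_previous_price(rows):
    """Pair each sale with the price of the previous sale of the same item."""
    last_price = {}
    out = []
    # Sorting the tuples orders them by date first, which is what gives
    # "previous" its meaning.
    for day, item, price in sorted(rows):
        out.append((day, item, price, last_price.get(item)))
        last_price[item] = price
    return out

def monthly_average(rows):
    """Average price per (year, month)."""
    buckets = defaultdict(list)
    for day, _item, price in rows:
        buckets[(day.year, day.month)].append(price)
    return {month: sum(prices) / len(prices) for month, prices in buckets.items()}

previous = with_previous_price(sales)
averages = monthly_average(sales)
# This month (November) versus last month (October).
change = averages[(2011, 11)] - averages[(2011, 10)]
```

Both questions depend on the rows having a defined order, which is the property Steier says a relational database does not supply on its own.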
We also have a whole range of statistical analyses built in, including all types of regressions and cluster analysis. They are very easy to use. Other products may require you to write programs using techniques like MapReduce, which requires a set of skills beyond the normal statistical skills a data scientist should have. In our case, there is no additional programming involved and these analyses are done directly in the system.
Another advantage is that because it is done within the system and there's no outside programming that needs to be written, it is very fast. In our case, you can do a regression on a very large data set very quickly because you are doing it within the confines of the database. It's in-database analytics in the ultimate sense, and that's very powerful as well.
In fact, one of the points that we like to make is that in virtually every system other than ours, people do sampling and various kinds of aggregations to reduce the problem to the point where they're able to handle it. So if you use the more sophisticated statistical or analytical tools, you typically don't apply them to a billion rows and certainly not a hundred billion rows. You typically apply them to a sample that you've extracted from that original data, and the process of extracting samples is fraught with issues.
If you don't do it right, you don't get good results. The sample has to reflect the entire population of information. If it doesn't, your analysis is going to be wrong. Ensuring the sampling is correct and a good statistical sample is a nontrivial operation that takes a long time. It could take weeks for somebody to make sure that his or her sampling technique is unbiased and statistically sound.
With 1010data, you don't do sampling. We encourage people to do the analysis and statistics on the entire dataset. Because they can get to it right away, they can be confident that there is no sample bias, and they don't have to spend time worrying about the sampling. They can just get to the core or the crux of the analysis that they're interested in and do their analysis. We think of this as a "whole data" idea. Basically, don't deal with samples or aggregates – deal with all the original raw data in its full glory. Do your analysis directly on that full dataset, and you'll get better results in less time.
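The sampling risk described in this answer can be shown with a small, deliberately artificial example in Python. The numbers are invented: 900 transactions of $10 from small stores followed by 100 transactions of $50 from large stores. Taking the first 100 rows stands in for any sample that fails to reflect the whole population.

```python
# Hypothetical transaction totals; the values are made up for illustration.
small_store_sales = [10.0] * 900   # 900 transactions at $10
large_store_sales = [50.0] * 100   # 100 transactions at $50
all_sales = small_store_sales + large_store_sales

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# A convenience sample: the first 100 rows, which are all from small stores.
biased_sample = all_sales[:100]

whole_data_mean = mean(all_sales)         # computed on every row
biased_sample_mean = mean(biased_sample)  # computed on the skewed sample
```

Here the whole-data average is $14, while the skewed sample reports $10. A carefully drawn random sample would come much closer, but verifying that a sample is sound is the extra work Steier is describing.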
Is it fair to say the foresight that you and your partner gained from your Wall Street experience regarding the types of analysis that produced the best results is what enabled you to build 1010data to do the things that everybody else is trying to catch up with now?
Sandy Steier: That is correct. We have a lot of interesting technology in our product that makes it very powerful, fast and flexible. But, ultimately, I think the most interesting concept that we have brought to the table is that users should be able to have direct, open access to data and have a powerful, intuitive and easy-to-use tool that allows them to access data and do whatever they want with it. It's a certain notion of transparency and democratization of data that I think is, unfortunately, relatively rare in the world, and it is what makes us unique. This is something that we learned in our careers on Wall Street, and we recognized that it made sense to bring it to the rest of the world to the extent that we could.
Are there any other benefits that your customers receive because you're cloud based? Do you bring in any external data that they can access, for example?
Sandy Steier: I mentioned earlier that one of the advantages of the cloud is that it removes the limitations of how much memory, storage and processing power a PC has. By the way, the cloud means different things to different people. Some people think of the cloud as a virtualization concept, the notion that you're running on virtual machines in a machine cluster. We do some of that. But, to me, that is not the most interesting aspect of the cloud.
Another definition of cloud some people have is the notion of the Internet, the fact that it is out there and not on premises. Certainly, the word cloud implies that to a certain degree. We focus on that aspect of it; the fact that we're out there on the Internet and not local to a particular company opens up possibilities that have really not existed previously. The clearest case is the simple idea that companies can now share data. Let me give you some examples of that.
Let me first explain some of the challenges of data sharing. It is difficult enough sometimes for departments within a company to share data and open up mutual access to data. If one department has data, it's complicated to open access to that data to another department. There have to be meetings to discuss what type of access is required and how that will be accomplished from a technical perspective. To do that kind of data sharing between companies, not just within companies, is a very difficult task. Additionally, the companies that are sharing may be in different industries or businesses.
A classic example would be a retailer sharing data with a consumer packaged goods (CPG) company, the people who supply the retailers with their products. Let’s use Procter & Gamble as an example, as they're a well-known and highly respected CPG company. The retailer has data they would like to share with Procter & Gamble, and Procter & Gamble would love to see that data because, in a sense, that data represents the market. In that data, Procter & Gamble can see how and where their products and competitors’ products are selling and who is buying them. There is tremendous value in the retailer's data for both the retailer and Procter & Gamble. If the retailer can share its data with Procter & Gamble, they may get paid for it, but even if they don't, Procter & Gamble will give them a lot of advice on how they can sell product better. Typically, retailers tend to rely, to some degree, on CPG companies for that type of advice. Historically, consumer packaged goods companies have been very sophisticated. Procter & Gamble, for example, is well-known for their analytical sophistication, and retailers look to CPG companies like Procter & Gamble for help in stocking their shelves, running various marketing and promotional campaigns, and maintaining their inventory. Those kinds of considerations are very important to the retailer, and they're very important to the CPG company as well.
Now, retailers and consumer packaged goods companies are in very different businesses. The retailer’s day-to-day concern is selling product in the stores, and the CPG company has other concerns. They may interact, but they're not the same, and enabling data sharing between them would be very difficult to implement. For instance, who is going to figure out how that interaction should happen? Who is going to figure out what queries the consumer of the data can ask of the data? Who's going to design the interface that allows Procter & Gamble to get to the retailer's data? The retailer can't because the retailer doesn't understand what Procter & Gamble does; ultimately, it's a different business. Procter & Gamble can't because it doesn't have access to the data, at least not initially. So who's going to design that interface? It becomes a big problem.
When you have a spreadsheet idea like ours, the problem goes away because the owner of the data merely puts the data into the cloud; and the consumer of the data, in our example Procter & Gamble, can get to that data using a spreadsheet-like interface that allows them to do whatever they want. Nobody has to design an interface. That is a very powerful idea, and it's where the cloud and the spreadsheet come together to allow this kind of unprecedented sharing.
We have multiple examples of that. In the capital markets, suppliers of mortgage information, consumer credit data, housing prices, unemployment statistics, etc., can make it available to traders worldwide through 1010data. They merely put it onto our platform or we put it onto our platform for them, under some arrangement. Then the consumers of the data – the traders, the money managers, the rating agencies – can get to that data on 1010data. Nobody has to exchange data in any other way. It becomes very easy for people to get to and use the data they need. Data suppliers like it because they have to deliver it only to 1010data, and it’s easier for them to sell it because nobody has to do any technical work. They just have to get on 1010data and there it is. It's a tremendous boon to the data users because, again, they don't have to deal with building databases. Everyone makes the money they need to make through commercial arrangements.
Again, retail is a good example. Dollar General and Rite Aid are 1010data customers, and both share data with CPG companies. They have what they call vendor portals. To them, the vendors are the CPG companies. Those portals are just windows into their data. With the proper permissions, a company like Procter & Gamble could actually get into Dollar General's data. They could see it in a spreadsheet as if they were at Dollar General. They see the same data that Dollar General sees, as long as they have been given access. For instance, they may not be able to see what Dollar General pays for a product, but they can see what products are sold, to whom, and for what price. That's very powerful. It means that Pepsi can see everything that Dollar General sells, including sales of Coke, and they can see how their product is selling versus how Coke's product is selling. They can try to figure out why they're doing better or worse and come up with strategies for either situation. We have that working for Dollar General and Rite Aid, and we expect to see more of that because they have been very successful programs.
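The idea of a vendor portal as a permissioned window onto the same table can be sketched in Python as follows. This only illustrates column-level visibility and is not how 1010data implements it; the column names, the `VISIBLE_COLUMNS` mapping and the `view_for` function are all hypothetical.

```python
# Hypothetical sales rows; the products, columns and values are invented.
rows = [
    {"product": "Cola A", "store": 101, "price": 1.50, "cost": 0.90},
    {"product": "Cola B", "store": 101, "price": 1.40, "cost": 0.85},
]

# Columns each audience may see; the vendor is not shown the retailer's cost.
VISIBLE_COLUMNS = {
    "retailer": {"product", "store", "price", "cost"},
    "vendor": {"product", "store", "price"},
}

def view_for(audience, data):
    """Return the same rows, restricted to the columns the audience may see."""
    allowed = VISIBLE_COLUMNS[audience]
    return [{key: value for key, value in row.items() if key in allowed}
            for row in data]

vendor_view = view_for("vendor", rows)
retailer_view = view_for("retailer", rows)
```

The vendor sees every row, including a competitor's product, but never the `cost` column, which matches the example of seeing what is sold and for what price without seeing what the retailer pays.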
It sounds like a great program. They can benefit from their data in more ways than one.
Sandy Steier: Exactly.
Is there anything else you'd like our readers to know about 1010data? What we have learned has been very interesting.
Sandy Steier: Well, again, I think what’s interesting is that users can interact with data intimately in a way that you normally can’t with a database. With a database, you submit queries and run programs. In a spreadsheet, you don't submit things and you don't run things. You just do things, and that kind of experience is what we allow even for big data. It is not an exaggeration when we talk about a trillion-row spreadsheet. You could have a trillion-row spreadsheet in 1010data. We have single tables in 1010data that have many, many hundreds of billions of rows, and we expect to have even larger ones in the near future. It is a reality.
The fact that it is in the cloud allows the additional dimension of sharing between companies. So not only does the spreadsheet allow people within a company to more effectively analyze their data, but also it allows people in different companies to share and analyze data in a way that has never been seen before. In fact, I think that is the more exciting thing.
Our technology is an enabler for the spreadsheet idea, and the spreadsheet idea is the enabler for the data sharing. I believe that a few years from now people will see this as a real change in the way business gets done.
Thank you, Sandy. It was great to hear about 1010data and the success you've achieved. We look forward to watching even greater successes in the near future.
Sandy Steier: Thank you.