This BeyeNETWORK spotlight features Ron Powell's interview with Eldad Farkash, Founder of SiSense, and Bruno Aziza, the company's Vice President of Worldwide Marketing. Ron, Eldad and Bruno discuss SiSense’s unique approach to analytics on big data that is effective and affordable for companies of any size.
Bruno, for those in our audience who may not be familiar with SiSense, can you give us a brief overview of the company?
Bruno Aziza: SiSense is a new startup in the big data/analytics world. We raised our first round of funding about 18 months ago, and since then we have had incredible success with helping customers solve their big data problems. We have more than 350 customers, from Target and Merck to smaller companies like Wix. We have customers across 48 countries.
Bruno, when we talk about big data and big data analytics, companies are looking at the tremendous investment they need to make to do this type of work. What do you feel is wrong with that assumption?
Bruno Aziza: The perception is that to get in the game with big data, you have to spend a lot of money on software and hardware, hire an army of data scientists, and deal with petabytes of information. That’s why we call it “big box” or “big bucks” big data. It scares away CIOs, and it scares people from even considering trying to win in big data.
What we see in the real world today is actually that spending millions on big data is the exception, not the rule. More than half a million U.S. companies that have more than 20 employees are currently dealing with big data issues. There is very little taught to them, and there are few options for them to deal with big data on their own hardware, with their current infrastructure and their current people. And that’s the problem we’re attacking.
We want big data democracy. We want big data to be something that can be solved by any company. As you’ll see, we have all types of customers. We have the small companies, we have the departments inside the big enterprises, and we have the big enterprises as well. They’re recognizing that the idea of investing millions into new technology is just going to make them fail. They want something that’s more pragmatic and that works on their current infrastructure.
What do you feel are the biggest problems today preventing companies from getting the most out of their big data?
Bruno Aziza: There are quite a few issues. The first one is the issue of knowledge and the perception that the data scientists will come and solve their problems. In fact, what we’re finding is a lot of the data analysts and business analysts are well equipped to win in this space.
The second one is a technology issue. Today when we look at the solutions offered in the space to solve big data analytic issues, there are three options. The first one is a traditional IT-driven infrastructure offered by Oracle, SAP, IBM and so on. This is fine, but it assumes that you have the time and a large IT team to support you, which is not the case for all companies. The second option, which is called the second generation, is the place where QlikView and Tableau fit. They are essentially bringing a value of agility to these enterprises. They go inside departments and allow business users to build quick solutions. What we have found, though, is that even these solutions are limited because they cannot handle the amount of data that is required nowadays. A terabyte is the average amount of data that any company or any group is going to deal with, and solutions that rely primarily on RAM-based in-memory are limited very quickly. That’s the substantial limitation because as business users across companies look to be agile with big data, they realize that the options they have actually make them go back to an IT solution where they have to either bring in consultants or learn the technical intricacies of building a data warehouse in order to handle big data.
We consider ourselves a third generation because we combine the best of both worlds. We have a columnar database, and we have disk and RAM-based in-memory technology, which is very rare in this space. There are actually just a few companies that work like that. SAP HANA is one of them. We think that by providing all of that in one simple box, which is the SiSense solution, it is going to allow all companies to grow their infrastructure without actually breaking their current model.
Eldad, what are your thoughts on this?
Eldad Farkash: First of all, we see the big data problem as a major pain point for companies and that scale becomes a problem very quickly. Hadoop solves the scaleout problem. You can add more nodes as you add more data, and the assumption is that you can add hundreds or thousands of nodes and solve the scaleout problem. Our customers and their users – the data heroes as we call them – rarely have the ability to set up this type of cluster. They have a few machines out of which they want to squeeze as much as possible. We call this "micro-scaling" as opposed to "scaling out." SiSense software is tuned to squeeze more performance out of the CPU. So instead of being known as a black box, we are hardware aware. This is one of the biggest differences between the typical big box, scale-out model and the SiSense model.
In-memory solutions did that very well. It took QlikView years to convince the market that there is an alternative to OLAP. They did eventually convince the market. The problem is that data grows faster than anyone anticipated, and suddenly you have much more data. There is a relationship between how much data you have and how much memory you have. You still need to be able to offer those types of users a way to deal with the data size problem with tools they are capable of handling.
Cisco, one of our customers, likened us to a speedboat. They had the big battleships but really wanted us to focus on being a speedboat for big data. That’s what we did, and we think we have the best speedboat for solving big data problems as opposed to trying to send you a battleship.
Eldad, based on your speedboat analogy, how is SiSense changing the status quo and making big data analytics accessible to businesses of any size, not just large enterprises?
Eldad Farkash: It starts with the way the product was designed. Business intelligence (BI) is all about an ecosystem. It’s about being able to have one tool for ETL, one tool for data visualization, and so on. Everybody knows their spot. We took that model and threw it away and said, “Listen. If the typical Joe who knew how to run Crystal Reports a few years back now knows how to make an OLAP cube, if he really tried hard, what kind of product would he need?” Of course, it needs to be able to do everything. So instead of throwing more technology at CIOs, we hide it, and provide everything in one package – from data import to ETL, even SQL and, of course, the web interface for visualization. The product is built from the ground up to be downloaded. You can download and play with it and get everything, even the embedded web server, so you can share your dashboard five minutes after you download the product.
Are you saying that the non-technical business analyst can extract actual insights from data and they don’t always need an IT team or data scientist?
No, I don’t think that the typical business user is capable of doing the modeling, for example. You always need the data hero. Typical customers of SiSense are of two types. One is the data hero and the other is the business user. The business user will use a BI studio, the web, and the dashboards. But the data hero is the person who needs the most help. He is the person who needs to mash up data from different data sources. He is the person who needs a very fast tool like our ElastiCube Manager to be able to mash up data without using Informatica for one part and BusinessObjects for another part, and having to deal with multiple vendors and multiple technologies. We all know how fast things change in this industry.
Bruno Aziza: To summarize, there are really two points. One is that we approach the data problem drastically differently from everyone else in terms of how our in-memory technology works and the way the software is bundled, if you will, in one solution that executes both the database and the automatic ETL, all the way to the front-end tools and dashboard views. That, in terms of the offer, is a lot more approachable than the typical layers that people would buy to solve their big data problems.
I was talking with a large company recently, and they were describing to me that they had a database, and they had Vertica, and they had a visualization tool. On top of that, they had their custom tool. So for them it is just very hard to get from data to insight because they have layers of multiple vendors and multiple solutions. It’s an integration problem. So we aim at solving that.
The second point that is really powerful is that the software is designed so that you don’t have to think about the IT complexity that typically would be put in front of you. What’s an index? What’s a join? I was talking to a customer recently and asked them why they picked SiSense. The response was, “We turned on behavioral analytics on our website, and our website traffic was growing by 50% every day. We realized that none of the solutions that we had could scale fast enough.” That’s why our demonstration of analyzing one terabyte of data on a laptop with only 8GB of RAM is useful. It puts in context the amount of stress that regular companies have with their data. They can’t control the growth of it right now, especially if they’re in an online business. But what’s interesting is the answer you get when you ask these particular users of our technology how big their data is. I asked one customer, “Is your data a terabyte or two terabytes? What is the size?” You know what his answer was? He said, “I actually don’t know what a terabyte is.” We’re dealing here with a new generation of people who have realized that there are things about the infrastructure and the plumbing that they would rather not have to worry about. In some cases, they don’t care what the size is. They just know it is big, and they don’t want to give up the performance of analysis on top of that big data.
When you say it is big, is it actually possible to crunch a terabyte of data on a standard issue laptop?
Bruno Aziza: Yes, but we don’t advise our customers to do that, of course. We would like them to think about the problem a little differently for multiple reasons. One reason is their data is going to grow to much larger than a terabyte as they start working with data. But I think what you’re seeing is there is a new generation of technology that has understood that the prior generations of technology will turn into an IT problem and will stop the adoption of big data if all they require is more hardware and more people in order to scale. The solution to our big data problem today is not going to be hardware and people. It’s going to be smarter software.
And, today, with the in-memory technology we have, on the same machine why would you focus on the sub-optimal solution, which is focusing on RAM, rather than going after the disk, which has much more capacity and can be leveraged much faster? I think the answer is that it is an evolution, and we are at the beginning of that. That’s the reason SAP HANA has put so much emphasis on this market. They’re right on identifying the problem, but the question is the way to approach the solution. We don’t think the answer is to spend millions of dollars and buy a big box and have dedicated new people to put on the problem instead of taking advantage of the resources you have.
Eldad Farkash: Our demo about analyzing a terabyte of data on a laptop was a fun example. But in reality, the real difference is between having one commodity server with 100 GB of RAM that costs less than $10K today, or having 50 additional nodes inside Hadoop. The difference is that most departments, most data heroes, know how to get one machine, but very few know how to set up and manage a cluster because once you have the cluster, it has a life of its own. It invents problems that people start to work around. It grows and grows. Most of the engineers at SiSense come from electrical engineering. And nobody took that to the database before we did. So this is why we call it a query column within the CPU cache. And that's why you can handle 1 TB on such a machine. You can handle billions of rows on such a machine, which costs $10K. Apart from the price, even if you had $50K to spend on this cluster, you would need two or three people to manage it. So the “know-how” is crucial here, and this is why most of our customers will need time to ramp up before they take Hadoop where they envisioned and start using it. This is where we come in.
What type of actionable insights can you realistically extract using SiSense’s Prism?
Bruno Aziza: The luxury we have is we have a very large customer base, and we spend a lot of time with those customers looking at what they do. In this space, there is an imbalance between the amount spent on the data side versus the amount spent on analytics. I was looking at Gartner’s size of market, and I think they were saying that $30 billion was spent on big data infrastructure in 2012, and then analytics was $7 billion. It’s kind of crazy when you think about it because the most leverage, the most value, is actually in the analytics. The data is very important, but what’s interesting is that the industry doesn’t seem to be catching up to spending more time on the types of analytics that they should develop on top of big data. A lot of it, I think, is because they can’t work with big data and scale out the way they want to do it.
Some customer examples include Target where they do theft analysis across multiple data sources at scale. That’s in the financial function. We see marketers like our customer Plastic Jungle, which is an online card trading company, deal with a lot of online transactions to price in real time. Their marketing users are actually using the solution. We see Magellan Vacations, a luxury travel company, that is able to provide real-time analytics to their travel agents on the entire dataset. This is another thing that is really important and a consequence of our technology. Because the querying of large datasets is complicated, most companies provide their end users with a subset of the data – that’s the concept of in-memory analysis on the RAM or the concept of OLAP. With our technology, you really are querying the entire dataset at lightning speed. As an example, going back to Plastic Jungle, they query 18 months’ worth of history in less than a minute. We’re entering a new era here where business users need to analyze more and more data, and a solution like ours is really the only one that can provide them the analytics at scale on the entire dataset.
Eldad Farkash: Web companies like Uber, Fusion-io, Wix and WeFi are analyzing how their users behave. For example, Wix is one of the biggest do-it-yourself website vendors. They have millions of users, and they want to learn what they’re doing with their products. WeFi is the biggest hub in the world for WiFi information. They have hundreds of millions of WiFi hotspots that they are collecting and querying in real time.
What are some ways SiSense customers are mashing up data from Hadoop, Salesforce, Google Adwords, ZenDesk, etc. to increase profitability and solve business problems?
A great example is Online Commerce Group, a company that is looking at inventory, and the type of inventory and sizes they need to forecast in order to maximize profit. They look at the holiday season for how much they’re going to ship in terms of sizes and colors. They sell all types of products – pillows, linens and event t-shirts. If you’ve ever ordered t-shirts for an event, you know how challenging it is to determine how many of each size to order. They obviously deal with situations like that on a very large scale during the holiday season. The problem is if you use a traditional analytics tool, you can only go so far. They use our software to go five years back and look at the trends of the last five years. They build a forecasting model based on what type of year it was, how many weeks were in a month, and which day of the week most orders are received. There are so many factors that you can lay out across years if you have access to that data. For them, the ability to use SiSense Prism to crunch that much data very quickly allows them to be more accurate and maximize profit.
Thank you, Eldad and Bruno, for providing this insight into SiSense Prism and big data democracy.