The Power of HANA and Hadoop: A Q&A with John Schitka of SAP

Originally published October 1, 2014

This BeyeNETWORK article features Ron Powell’s interview with John Schitka, solutions marketing manager at SAP. John and Ron talk about in-memory for big data, specifically the power of HANA and Hadoop.
John, from a customer’s perspective, are you only seeing large organizations deploying Hadoop with in-memory?

John Schitka: We’re not. It’s a misconception. I think when people think of big data, they think complex and very expensive. It can be something that is complex. It is something that organizations really have to think about, but it really is something that is touching every organization. And if organizations don’t act very soon, they will be kind of left behind. So we are seeing a variety of sizes of organizations. We do see very large organizations – the McKessons of the world – doing this. We also see smaller organizations. We have startups like CIR Foods in Europe that are using a great deal of data to optimize the way they manage their staffing and their food supply chain when they work with restaurants. We’re looking at Schukat, a company in Europe with 140 employees, which is using HANA to optimize and work with big data. We have some 1,500 startups at this point in time. Not all of those are big data, but a considerable number are driving big data-enabled applications leveraging HANA. We see a great range of adoption from small to large businesses. Our Edge offerings and cloud capabilities are meeting the needs of that small to medium enterprise market segment where cost is a consideration.

Can you describe SAP’s open Hadoop strategy?

John Schitka: That is both on premises and in the cloud because both SAP HANA and the Hadoop distributions are available both ways. We have taken a very agnostic, open approach at SAP. We will operate with any distribution of Hadoop that the customer wishes to work with. We’re doing that because as we go into a customer, a lot of customers have already started to experiment and already have a preference. And we don’t want to tell them they have to stop what they’re doing. We will interoperate with anyone. We are working right now with all three approaches to SQL on Hadoop – Hive, both batch and disk optimized, and now Spark – and there is a Spark distribution that is optimized for HANA and provides access to the in-memory capabilities in Hadoop through Spark. We now are able to interact with large volumes of data. You have the real-time speed of HANA coupled with a massive reservoir that is Hadoop, where you can do exploration and work. You no longer have to move data. You now have the capability of interacting with data exactly where it sits. So I can start a job in HANA and reach down through Hive to get into Hadoop. The latency depends on how you connect. If you’re doing traditional ways of interacting and not using Spark, there will be latency to get the results back. With Spark, there will be some latency because it’s an in-memory cache, but it’s still far faster than the other ways of accessing it. We’ll work with anyone. We do have a partnership with Hortonworks, where we have the ability to resell their distribution and support. We are very open and agnostic, especially now with the Spark distribution because it allows access directly to HDFS. So now we’re no longer dependent on the distribution. If the customer has HDFS and they download the Spark distribution, they have the ability to access that HDFS store directly. And, with RDDs built into Spark, HANA can use those RDDs to reach into other NoSQL databases such as Cassandra. It’s a very open way of accessing things – trying to weave that data fabric together.
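To make the "query the data where it sits" idea concrete, here is a minimal sketch of SQL running over files that stay in HDFS via Spark. It assumes a plain Apache Spark setup rather than SAP's HANA-optimized distribution, and the HDFS path, view name, and column names are hypothetical; the HANA-side federation (reaching through Hive or Spark from a HANA query) is configured separately and is not shown.

```python
# Minimal sketch: SQL over data that stays in HDFS, accessed through Spark.
# The HDFS path, view name, and columns are hypothetical examples.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sql-on-hadoop-sketch")
    .getOrCreate()
)

# Read detail records straight from the HDFS store -- no data movement.
events = spark.read.parquet("hdfs:///data/engine_tests/")

# Expose the files as a SQL-queryable view.
events.createOrReplaceTempView("engine_tests")

# Ad hoc exploration over the full reservoir of detail data.
summary = spark.sql("""
    SELECT test_id, AVG(reading) AS avg_reading
    FROM engine_tests
    GROUP BY test_id
""")
summary.show()
```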

Enterprises today have a lot of in-memory options. Why would you say HANA is different from the other in-memory options that are available?

John Schitka: I touched on it a little bit earlier when I talked about Spark being a cache. One of the biggest differences is that HANA really is true in-memory. First off, it’s not an option. When you look at some of the in-memory capabilities in terms of databases, they’re options you add onto a database and they’re not necessarily true in-memory. A lot of them are a cached version of memory. The data isn’t all in memory. The database itself is still on disk and I’m pre-loading what I want to work with. They do provide speed, but they’re not the true in-memory that HANA provides. The best analogy to that is probably electric vehicles. If I take a look at hybrid cars, I have taken a car designed for combustion engines. I still have the combustion engine, so I still have the space and the weight of the engine. I have the transmission. I have the size and weight of the gas tank, and all I’ve done is thrown in some batteries and an electric motor. The option of the electric vehicle is there, but it really is an optional add-on. It’s a bolt-on. It isn’t a re-envisioning or a purpose-built product. It’s a hybrid that literally has all the drawbacks of the previous version – the internal combustion engine. It doesn’t really offer the full value that you would get from something that was truly electric-driven like a Tesla, where somebody sat down and designed it from the ground up. Yes, it looks like a car. It has four wheels, doors, seats and a steering wheel, but in terms of a propulsion system, it has been built from the ground up for electric propulsion. You have that custom-built view. And that’s what HANA is to the other in-memory options. It’s not an option. It is something that was built purposely, designed purposely to be an in-memory database, and that is the most significant differentiator.

When you’re working with HANA and Hadoop, how do you decide from an enterprise perspective what data is stored where?

John Schitka: Customers decide that. When it comes to big data, SAP takes an interesting approach. We don’t start with technology. It has to be a strategic business discussion. We sit down with a customer first and figure out what they have, what they can do, and what they need to do. Working with them through a series of workshops and design thinking, we can come up with a big data plan that is actionable for them. Then they will decide what data has to go where. Ideally, you’re going to leverage the optimal store for what you have. So HANA, being in-memory, is where you want your mission-critical, speed-sensitive data. Anything that requires a timely response, anything that will benefit from that instant in-memory access, I want in HANA. The rest of the data I can store in a near-line store in IQ, or I can store it in a mass commodity store like Hadoop. The issue is I’m not going to stick mission-critical data in something like Hadoop. It’s not ACID compliant, and you have some concerns around data governance and reliability. Mission-critical data is going to be in a traditional data warehouse, ideally in HANA for that speed. And then there is the information that is less time sensitive, information whose value I may not yet know but that is being kept – because that’s what we’ve come to realize. We used to aggregate data. We used to throw away bits of data. Now we’re keeping all the details because we’ve realized the details are important. Ten years from now, I actually may want to look back at the details, which I never thought I’d be doing. So that vast data store is ideally kept in Hadoop, where I’m not necessarily worried about ACID compliance. It’s a place where I can do that analysis and exploration.
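The placement rule John outlines can be boiled down to a simple routing decision: hot, mission-critical, time-sensitive records go to the in-memory store, and cold detail records go to the commodity store. The sketch below is purely illustrative; the 90-day cutoff, the field names, and the store labels are assumptions, not an SAP-defined policy.

```python
# Illustrative data-tiering rule: route hot, time-sensitive records to the
# in-memory store and cold detail records to the Hadoop reservoir.
# The threshold and field names are hypothetical assumptions.
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=90)   # assumed cutoff for "time-sensitive"

def choose_store(record: dict, now: datetime) -> str:
    """Return the target store for a single record."""
    is_recent = now - record["timestamp"] <= HOT_WINDOW
    if record.get("mission_critical") or is_recent:
        return "hana"      # in-memory: instant, governed, speed-sensitive
    return "hadoop"        # commodity store: cheap, keeps every detail

# Example usage with a hypothetical record
record = {"timestamp": datetime(2014, 9, 15), "mission_critical": False}
print(choose_store(record, datetime(2014, 10, 1)))   # -> "hana" (recent)
```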

Are there any specific customer examples that you’d like to highlight for our audience?

John Schitka: Mercedes-AMG, which is using big data to design engines, is a prime example. They’re trying to design better engines. A typical engine test runs 50 minutes and generates up to 30,000 data elements per second. So we’re talking about massive amounts of data that have to be processed very, very quickly. And this is where that ideal split between HANA and Hadoop works. When I’m running the test and analyzing it, I need the data in HANA right now to get the best results. In addition, I’m going to keep the historical information in Hadoop, as I may want to go back over six months’ worth of tests and look at something later, but I can’t store all of that in an in-memory database.
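Taking the quoted figures at face value, a quick back-of-envelope calculation shows why the historical runs end up in Hadoop rather than in memory:

```python
# Back-of-envelope volume for one engine test, using the figures quoted above.
elements_per_second = 30_000   # up to 30,000 data elements per second
test_minutes = 50              # a typical test run

elements_per_test = elements_per_second * test_minutes * 60
print(f"{elements_per_test:,} data elements per test")   # 90,000,000
```

A single 50-minute test already yields on the order of 90 million data elements, so months of accumulated runs quickly outgrow what it makes sense to keep entirely in memory.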

We’ve got Kaiser Compressor, which is doing the same sort of thing, using the sensors on their compressors to help them understand mean time to value and providing their customers with that. We have healthcare examples. MKI is using HANA, Hadoop and R for genomics analysis. And NCT, a cancer center, is using HANA to manage hundreds of thousands of data sets with several million data points. Without in-memory, that could take weeks. With HANA, they have the ability to look, in a very timely manner, at what will definitely work for specific patients. T-Mobile is dealing with incredible volumes of data in varying forms and structures, coming from both inside and outside of T-Mobile. They’re using HANA and related tools to coordinate the customer experience across a multi-channel approach. They’re using HANA to mine data and uncover deep insights into customer needs. They’re able to drive a better customer experience, and they see speed as a competitive advantage. We have energy companies like Oleander or Venture Point in the United States or AGL in Australia that are using sensor data – smart meters – combined with a range of other data, from historic loads to weather forecasts and weather histories, to do forecasting of load demand and weather response. They’re able to take a look at how to respond to upcoming events and changes.

HANA has a wide variety of abilities and uses, spanning everything from healthcare to manufacturing to energy to retail.

John, thank you for discussing with us how SAP’s open Hadoop strategy with HANA is making it easier for companies to harness all of the benefits of in-memory.


  • Ron Powell
    Ron is an independent analyst, consultant and editorial expert with extensive knowledge and experience in business intelligence, big data, analytics and data warehousing. Currently president of Powell Interactive Media, which specializes in consulting and podcast services, he is also Executive Producer of The World Transformed Fast Forward series. In 2004, Ron founded the BeyeNETWORK, which was acquired by TechTarget in 2010. Prior to the founding of the BeyeNETWORK, Ron was cofounder, publisher and editorial director of DM Review (now Information Management). He maintains an expert channel and blog on the BeyeNETWORK and may be contacted by email at rpowell@powellinteractivemedia.com.

    More articles and Ron's blog can be found in his BeyeNETWORK expert channel.
