Data Warehouse Evolution in the Age of Big Data: A Q&A with Brian Wood of SAP
by Ron Powell
Originally published November 12, 2014
This BeyeNETWORK article features Ron Powell’s interview with Brian Wood, a product strategist for SAP and previously a Gartner analyst. Brian and Ron discuss the impact of big data on the evolution of the enterprise data warehouse, or EDW.
How do I know if my business problem is an EDW or a big data problem?
Brian Wood: Well, that’s an interesting one. I get that question a fair bit, and my answer is always the same: it’s the wrong question. Really, the enterprise data warehouse tends to be the repository of the majority of an organization’s master data and metadata. Understanding the context for big data usually involves integration with your enterprise data warehouse. A lot of early big data analysis was very much just exploratory – looking to see if you could find something interesting in the data. But to really achieve the appropriate return on your investment for big data, what you want to do is analyze big data in the context of your own organization with respect to your products, your customers, your channels – really all of the issues that are specific to your own organization.
So it’s not a question of an enterprise data warehouse or a big data problem. It’s a question of how those two technologies and pieces of your architecture fit together and work in a way that really is synergistic so that you get more out of it than you put into it.
Let’s talk a little bit about architecture. The traditional data warehouse approach of creating a monolithic architecture is evolving more toward a logical data warehouse or an information fabric approach, extending the traditional EDW but leveraging its strengths and capabilities more widely. How does SAP’s data warehousing strategy align with this trend?
Brian Wood: When most people look at data warehouses, they’re thinking about a database. So when we ask people what they use for their data warehouse, they’ll name one of the usual suspects in the database arena. But there’s a lot more to a data warehouse than just the database. All of the processes and services that you need – like security authorization, life cycle management as well as orchestration, scheduling and monitoring of processes – these are all things that you need to do in a data warehouse. What we’ve done within SAP with our Business Warehouse product is that we have created a set of enterprise data warehouse services that originally were applied to the data that was in the Business Warehouse (BW). Over the last three or four years, we’ve been working on extending those enterprise data warehouse services to data that does not reside within BW. So we’re able to extend the security authorization concept to other databases that might contain some of that data. And with respect to the logical data warehouse and the information fabric, they include different form factors as well. So you may have cloud-based applications where the data that’s created resides in the cloud, but you need access to that data in order to do reporting and analytics that will include data from your traditional on-premise data warehouse. We’ve also included the ability to federate data so that you can, as they say, “play it where it lies” rather than having to load it into that more traditional, monolithic architecture. We’ve extended a lot of these services so you can reduce the TCO and the time to value when you have additional enhancements and want to include additional information domains within your enterprise data warehouse.
You talked about external data. How do I incorporate all of this into BW and HANA?
Brian Wood: Again, it’s interesting. You mentioned both BW and HANA. I didn’t mention HANA explicitly. HANA is our data management platform that includes the HANA database and a number of other features like predictive algorithms, text analysis and other capabilities. I mentioned adding the federation capability to our enterprise data warehouse. That’s actually done through HANA. HANA has a capability called Smart Data Access that allows you to look at tables on other databases. The most common databases are supported – Oracle, DB2, Teradata, SQL Server and a number of others. Essentially, you can point at that data, and at runtime the queries are federated so that you get the data from those other repositories and bring it in at runtime. In addition to that, we’ve added a number of features. You probably know that Business Warehouse (BW) contains its own ETL capabilities as well as having an interface to our SAP Data Services, a single enterprise-class solution for data integration, data quality, data profiling and text analysis. So we also have the ability to access data in any of these in a more traditional, batch-oriented process of data acquisition. So there are two ways that you can include these. One would be if you wanted to move off of those other platforms. Then you would use those data provisioning capabilities in BW and in Data Services to load that data directly into BW. People have thought for years that SAP’s BW was only good for SAP data. The reason they thought that is we have provided extractors that do all of the work of reformatting the data into an analytical model for the SAP data models. But there’s really nothing that stops you from loading data from anywhere – whether it’s from third-party applications, custom-built applications or information providers.
Like I said, those can be done as a physical persistence where you do the typical ETL or ELT, or they can be done at runtime using the federation capabilities when you’re running BW on the HANA platform.
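The difference between runtime federation and physical persistence can be illustrated with a small sketch. This is only an analogy, not SAP code: it uses Python's built-in sqlite3 module, with an attached second database standing in for an external source. The table names (products, orders, orders_copy) and the helper functions are hypothetical and exist only for this illustration; Smart Data Access itself is configured differently.

```python
import sqlite3

# Same aggregate query, run either against the remote table (federated)
# or against a local copy (persisted). {orders} is filled in by the helpers.
TOTALS_SQL = """
SELECT p.name, SUM(o.qty)
FROM main.products AS p
JOIN {orders} AS o ON o.product_id = p.product_id
GROUP BY p.name
ORDER BY p.name
"""


def build_demo():
    """Create a local 'warehouse' plus a separately attached 'remote' source."""
    conn = sqlite3.connect(":memory:")
    # Attach the second database first, before any transaction is open.
    conn.execute("ATTACH DATABASE ':memory:' AS remote")
    conn.execute("CREATE TABLE main.products (product_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO main.products VALUES (?, ?)",
                     [(1, "Widget"), (2, "Gadget")])
    conn.execute("CREATE TABLE remote.orders (product_id INTEGER, qty INTEGER)")
    conn.executemany("INSERT INTO remote.orders VALUES (?, ?)",
                     [(1, 10), (1, 5), (2, 7)])
    return conn


def federated_totals(conn):
    """Read the remote table at query time; nothing is copied locally."""
    return conn.execute(TOTALS_SQL.format(orders="remote.orders")).fetchall()


def load_orders(conn):
    """Batch-style load: copy the remote rows into a local table."""
    conn.execute("DROP TABLE IF EXISTS main.orders_copy")
    conn.execute("CREATE TABLE main.orders_copy AS "
                 "SELECT product_id, qty FROM remote.orders")


def persisted_totals(conn):
    """Query the local copy; results are only as fresh as the last load."""
    return conn.execute(TOTALS_SQL.format(orders="main.orders_copy")).fetchall()
```

In this sketch, a row added to remote.orders shows up in federated_totals immediately, while persisted_totals reflects it only after load_orders runs again. That is the trade-off being described: federation avoids moving the data, and persistence gives you a local copy that has to be refreshed.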
That’s excellent. How would you apply the SAP approach to big data analytics and business context?
Brian Wood: One proof of concept that we’ve done internally is to show how you can have an integrated system. I refer to it as an information supply chain. The example that I often use starts with analysis in your data warehouse. Let’s say you want to look at your five worst-selling products. When you look at those, you’ll get what your planned sales were and what your actuals are. You’ll see all the typical structured data that you get from a data warehouse. But your next question might be “Why aren’t these selling?” So that’s more of a big data question. So what we would do is take the master data that we have in the current query, which shows the five worst-selling products, and kick off a job through our Data Services product that then spawns a MapReduce job in Hadoop that does text analysis and sentiment analysis. It says to Hadoop, “Tell me what people are saying and feeling about these five products.” When that MapReduce job finishes, the results of that analysis are made available using this federation capability, using Smart Data Access in HANA, so that the query can actually get the results that come from that analysis of unstructured data. Obviously, it’s derived data. It will tell you how many people had very positive, positive, neutral, negative and very negative sentiment. Of course, if you want to look at the actual text, you need to drill down into that, and we have the ability to include that hyperlink. But that way you have a round trip where you discover something about your structured data, about your business, and you want to investigate it further. That really becomes a very suitable task for some of the big data technologies like Hadoop and MapReduce. Then again, using HANA as the platform, you have the ability to integrate that data back, either via the federation of Smart Data Access or through a more traditional persistence mechanism that loads it into HANA either directly or through BW info providers.
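The round trip described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the SAP proof of concept: the sample data, the keyword lists and every function name here are hypothetical, the map and reduce steps run in-process rather than on Hadoop, and sentiment is reduced to three levels instead of five.

```python
import re
from collections import Counter, defaultdict

# Hypothetical structured data from the warehouse query.
SALES = [
    {"product": "P1", "planned": 100, "actual": 90},
    {"product": "P2", "planned": 100, "actual": 15},
    {"product": "P3", "planned": 100, "actual": 40},
    {"product": "P4", "planned": 100, "actual": 5},
    {"product": "P7", "planned": 100, "actual": 95},
]

# Hypothetical unstructured data: (product, free-text comment).
COMMENTS = [
    ("P4", "Terrible battery and a broken hinge."),
    ("P4", "Love the colour!"),
    ("P2", "It is fine."),
    ("P2", "Bad value."),
    ("P7", "Great product."),
]

# Deliberately tiny keyword lists; a real job would use a proper model.
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "terrible", "broken"}


def worst_sellers(sales, n=5):
    """Step 1: the structured query, lowest actual sales first."""
    return sorted(sales, key=lambda row: row["actual"])[:n]


def map_sentiment(product, text):
    """Map step: turn one comment into a (product, label) pair."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return product, label


def reduce_sentiment(pairs):
    """Reduce step: count labels per product."""
    counts = defaultdict(Counter)
    for product, label in pairs:
        counts[product][label] += 1
    return counts


def investigate(sales, comments, n=5):
    """Round trip: pick the worst sellers, analyse only their comments,
    then join the derived sentiment counts back onto the sales rows."""
    worst = worst_sellers(sales, n)
    wanted = {row["product"] for row in worst}
    pairs = (map_sentiment(p, t) for p, t in comments if p in wanted)
    counts = reduce_sentiment(pairs)
    return [{**row, "sentiment": dict(counts[row["product"]])} for row in worst]
```

Calling investigate(SALES, COMMENTS, n=2) returns the two lowest-selling products with their planned and actual figures alongside the derived sentiment counts, which mirrors the idea of the master data driving the unstructured analysis and the results flowing back into the original query.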
So it really does provide the big data integration that most enterprises are looking for today.
Brian Wood: We do provide that. In many cases, most organizations know they want to do something with big data, but I’m not sure they’ve all figured out exactly what. They know that they have a very useful and relatively new tool in their tool chest. When you have a hammer, everything looks like a nail. Well, big data is a hammer, but not everything is a nail. We help people figure out when a nail is a nail and thereby create more efficiency and ROI.
That’s great. Thank you for discussing with us how SAP is addressing the evolution of the EDW due to the impact of big data and making enterprises today more competitive.
Copyright 2004 — 2019. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC