Incorporating Big Data into an Enterprise Information Architecture - A Q&A Spotlight with Oracle's George Lumpkin

Originally published October 24, 2011

BeyeNETWORK Spotlights focus on news, events and products in the business intelligence ecosystem that are poised to have a significant impact on the industry as a whole; on the enterprises that rely on business intelligence, analytics, performance management, data warehousing and/or data governance products to understand and act on the vital information that can be gleaned from their data; or on the providers of these mission-critical products.

Presented as a Q&A-style article, these interviews conducted by the BeyeNETWORK present the behind-the-scenes view that you won’t read in press releases.


This BeyeNETWORK Spotlight features Ron Powell's interview with George Lumpkin, Oracle's Vice President of Product Management, Data Warehousing. Ron and George discuss the trends in data warehousing, big data analytics and grid computing, and how Oracle's Exadata platform is taking these trends to a new level.

George, we're witnessing a massive explosion of data volumes. At the BeyeNETWORK and SearchBusinessAnalytics.com, we're hearing a lot about big data analytics. It seems everyone is talking about big data, cloud computing and grid computing, and most enterprises are very demanding. They want deeper, faster analysis of data. They want more people to be able to use analytics, and they want to be confident that their data is always going to be available. Nobody wants to wait for anything today. Enterprises want quicker query response time, faster business insight and the ability to use many different data types and sources. If we think back to the early days of data warehousing, we were lucky if the data warehouse was updated daily – most of the time it was updated in batch mode at night. Obviously, the old technology wouldn’t make the cut in today's fast-paced world. We'll talk about a lot of different topics in this interview, but let's start by getting your definition of big data.

George Lumpkin: In data warehousing, we've always had a trend of enterprises wanting more – more data, a larger population of business users, faster query response times and more sophisticated analysis. I want to draw a distinction here – big data is not just about more. It's not just about larger volumes of data or data size. It's something that's a little bit qualitatively different than the stepwise evolution of data warehousing capabilities. Big data is about reevaluating the potential value of business data and working to build an infrastructure that taps into all of an enterprise's data assets – looking for some of the previously unknown relationships between products, customers, and suppliers. What that really means is big data can be characterized by a couple of new requirements.

First is the embracement of new types of data into the information architecture of an organization, and often this is semi-structured data, not the traditional rows-and-columns relational data. I think a prime example of semi-structured data is data coming from sensors, or machine-generated data, such as RFID-type data and location information coming from the mobile devices that we all carry around. In utilities, there are smart meters, and for quite a few years companies have been working to gain information from web logs. I don’t think semi-structured data just stops at the sensor data; it also includes the documents and emails that all organizations have. These new data elements are often produced at much higher rates than the classical transactional data. Your cell phone generates location data far more often than, for example, you go to the ATM and make a withdrawal from your bank. There is a lot more data coming in at much higher rates, and enterprises need to be able to manage these new types of data and incorporate them into their overall information architecture framework. These new types of data are one of the new characteristics of big data.

The second characteristic of big data is the need for deeper and more sophisticated analysis of this data. You want to be able to do new types of statistical analysis – not over a small sample of gigabytes of data, but over terabytes, potentially even petabytes. We talk a lot about social media. Well, being able to do graph analytics and start looking at the relationship between different individuals, for example, would be a new type of analytics. We have some capabilities in the Oracle database platform, for example, and half of them have been used in a widespread manner in other types of predictive analytics.

It's not just being able to do these new types of analytics; you're going to need the parallel infrastructure to be able to make these analytics scale across the huge data volumes of big data.

Those are the two characteristics that we think help define big data. I think as organizations look to encompass all of their data, there are a lot of new challenges that have arisen and there is tremendous potential in terms of business value for organizations that can harness their data, turn it into new products, and use this information to better optimize their business processes.

Not all these technologies are new. For example, people have done text mining in the past, but they've used very specialized solutions. The opportunity today is to have a cohesive big data platform that brings together some of the capabilities of the previous point solutions and puts all these capabilities together into an overarching enterprise infrastructure that scales to petabytes of big data with the new data types.

George, you bring up a key point. Obviously, there's a lot more data out there, and you mentioned the ability to look at location data from a smartphone, which really illustrates the “I want it now” culture of today. Smartphones and other mobile devices are really increasing the need for immediate information. What do you see regarding the Oracle platform that meets this need for today's enterprise?

Lumpkin: You know, there's been a huge amount of innovation in database platforms over the past few years and especially in the field of data warehousing for databases. There's been a confluence of new technologies and new database approaches that have all come together in the Exadata platform. Examples of a few of these technologies are the incorporation of flash-based storage in addition to disk-based storage. The Exadata platform delivers database processing in the storage tier, providing a whole new processing tier for doing database optimizations at the storage level and also at the server level, where database processing has always occurred in the past.

We have columnar storage capabilities. Instead of storing rows of data, we can store on disks – sort of a columnar organization of the data. We've been working on in-memory database optimizations, and our Exadata platform is built upon InfiniBand so we have a much faster networking infrastructure as well.
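As a rough illustration of the columnar idea described above, the following Python sketch pivots a handful of rows into per-column lists and shows why that layout can suit analytic scans and compression. It is a toy model, not Oracle's storage format; the sample rows and the helper names `to_columns` and `run_length_encode` are hypothetical and exist only for this example.

```python
# Illustrative only: a toy comparison of row-oriented and column-oriented
# layouts. This is NOT how Exadata or the Oracle database stores data.

# Hypothetical sample data, stored row by row.
rows = [
    {"region": "EAST", "product": "A", "amount": 10},
    {"region": "EAST", "product": "B", "amount": 20},
    {"region": "WEST", "product": "A", "amount": 30},
    {"region": "WEST", "product": "A", "amount": 40},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeated values into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


columns = to_columns(rows)

# An aggregate over one column only needs to read that column's values.
total_amount = sum(columns["amount"])

# Values within a column tend to repeat, so simple encodings shrink them.
region_runs = run_length_encode(columns["region"])

print(total_amount)
print(region_runs)
```

In this sketch the sum reads only the `amount` list, and the `region` column collapses into two runs. Real columnar engines use far more sophisticated encodings, so this should be read as a conceptual aid only.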

All of these new innovations I’ve mentioned are integrated on the Exadata platform. With Exadata we delivered 10X or more performance gains over conventional data warehouse platforms. The great news and some of the success of Exadata is that we have exceeded the expectations and exceeded the requirements of some of today's data warehouses. When we look to big data, what the Exadata platform is providing is not only the ability for existing data warehouses to solve their “I want it now” business problems, but also the scalability and performance smart organizations want in order to tackle new challenges of big data.

How would you answer the concerns of a CTO or data warehouse administrator who thinks that bringing in Exadata would add another layer of complexity and potentially make things more difficult? 

Lumpkin: I think it's actually quite the opposite. Exadata does not make things more difficult. Exadata actually simplifies an organization's infrastructure. Let me explain how.

First, Exadata is not really new in many senses. There are a lot of new technological innovations inside of Exadata, but Exadata runs the Oracle 11g database. It supports all the same applications and tools as any other Oracle database, and it relies on the same Oracle DBAs and the same Oracle administration and management approaches. So in the first sense, Exadata is very much continuous with the current IT best practices. It's continuing to use the Oracle database.

The second point – and perhaps more important – is that Exadata delivers order-of-magnitude performance gains. The gains in Exadata performance and scalability greatly decrease the amount of hardware required to put together the large systems required for data warehouses, for example. Again and again, we've seen customers who have replaced five or even ten racks of legacy hardware with a single rack of Exadata. It's simpler in the data center to manage a single rack of Exadata than to manage a much larger number of older servers.

Finally, there's a third way Exadata simplifies the IT infrastructure and that's because Exadata is an integrated product – what Oracle calls engineered systems. In other words, we built the product – the database, the database server, the storage, the storage software, the network – and then integrated and tested them together. If you contrast that to what an IT organization needs to do to build their own large system without Exadata, the difference is significant. They’d have to acquire all the pieces separately, assemble them themselves, and go through the process of tuning all of the components together to build a high-performance system. With Exadata, all that engineering has already been done. We've built a system that delivers the highest levels of scalability and performance without requiring the system to be assembled. So for many reasons, Exadata is a much simpler solution for large-scale database systems than any other solution on the market.

Well, that definitely answers the question – simplicity and reduction of complexity is key. Where are you seeing the most interest in Exadata and are some industries bigger users of Exadata than others?

Lumpkin: When version 1 of Exadata was released, it was a product solely focused on data warehousing – large-scale databases, lots of data, and high performance parallel queries. Currently, Exadata is in its third generation and it's positioned for all database workloads, not just data warehousing workloads. So, we've seen our customer base evolve with the Exadata product capabilities. Today, the majority of Exadata systems are still used for data warehousing, but it’s a very slim majority. Almost half of the new systems are being used for other purposes. The primary use beyond data warehousing is for consolidation. Exadata has become a very popular platform for hosting large numbers of Oracle databases. In other words, moving Oracle databases off of individual servers, consolidating them onto a single Exadata platform – and this is really an instantiation of cloud computing. It's about organizations who are building private clouds, consolidating all of their database workloads onto a smaller number of platforms and having a more efficient operating environment for their databases. So we've seen two large categories of usage for the Exadata platform evolve.

The second part of your question is about industries. I don’t think it'll surprise anyone to hear that financial services and telecommunications have been the two largest industries for Exadata. But just as we have Oracle databases across all industries, we've seen Exadata being adopted in all industries as well.

Well, the emergence of the private clouds in most enterprises has been very, very dramatic, so it makes a lot of sense that Exadata would be positioned in a private cloud. One other area that you mentioned earlier is the integration with storage and storage needs for an enterprise. Obviously, with large amounts of data, storage is a key concern for most organizations. How does Exadata help an organization make effective use of storage?

Lumpkin: First, Exadata has enabled organizations to store a lot more data. One of the big innovations of Exadata was a new compression capability. So we have what we call hybrid columnar compression that delivers 10-to-1 compression for data warehouse data. We have some large data warehouses that are able to store a few hundred terabytes of data in a single rack of Exadata. We think this is somewhat revolutionary for IT or certainly for business users – the concept that potentially a petabyte of business data is not several rows of machines in a data center but potentially three or four racks of machines. From a business standpoint, this opened up the ability to store large volumes of data with a relatively small hardware footprint and a relatively small overall cost of acquiring and managing the system. The real potential for the businesses is to be able to start looking at how they can use these larger data volumes. They can look at how to get more value from data that they haven’t been able to analyze or store before and start driving toward some new analyses to solve business problems.
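The rack arithmetic in this answer can be checked with a short Python sketch. The interview gives only "10-to-1" and "a few hundred terabytes" per rack. The specific per-rack figures used below (250, 300 and 350 TB), the convention that 1 petabyte equals 1,000 terabytes, and the function names are assumptions made for illustration.

```python
import math


def compressed_size_tb(logical_tb, compression_ratio):
    """Physical space needed for logical_tb of data at a given ratio."""
    return logical_tb / compression_ratio


def racks_needed(total_tb, tb_per_rack):
    """Whole racks required to hold total_tb of business data."""
    return math.ceil(total_tb / tb_per_rack)


PETABYTE_TB = 1000  # assumption: decimal petabyte

# At the quoted 10-to-1 ratio, 1 PB of data occupies about 100 TB on disk.
print(compressed_size_tb(PETABYTE_TB, 10))

# Assumed per-rack capacities standing in for "a few hundred terabytes".
for tb_per_rack in (250, 300, 350):
    print(tb_per_rack, racks_needed(PETABYTE_TB, tb_per_rack))
```

Under these assumed capacities the result is three or four racks per petabyte, which is consistent with the range given in the answer.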

Very good. I think that 10-to-1 compression capability would excite a lot of people out there. Wayne Eckerson completed a comprehensive study on analytic platforms for us, and there are a lot of platforms out there today. What do you feel is the most compelling reason for someone to choose Oracle Exadata as their analytic platform?

Lumpkin: I think the most compelling reason is that Oracle Exadata is a solution that allows you to have your cake and eat it too. Exadata delivers all the performance and scalability and all the latest advantages in hardware and database software technologies. Yet at the same time, it's running the Oracle database and supports every major database application and every major database tool. The popularity of Exadata has been driven by enterprises realizing they don’t have to retool their whole infrastructure to accommodate a specialized analytic platform. With Exadata they get all the benefits of the specialized analytic platform, but they get to keep their enterprise database – the database they understand – and maintain the highest levels of availability, security and manageability.

Now, I talked a lot earlier about some of the performance, scalability, and compression advantages of Exadata. I also want to point out that there are a lot of analytic capabilities in the Exadata platform itself. We've been very much focused within the Oracle database to deliver in-database analytics. We have the ability to do spatial analysis in the database. I talked earlier about graph analysis. We have native OLAP capabilities, predictive algorithms and data mining algorithms built into the database. We support text analytics in the database, and it's all integrated into the same Exadata platform. We’re providing the functionality of the analytics, the scalability and the performance of Exadata – all built into what's been the Oracle database. It's really sort of the standard database operating environment within enterprises.

Obviously, you have the Exadata database itself. You mentioned that an Oracle DBA can support an Exadata analytic platform because they already know the Oracle database and the utilities that interface with it. You make a really good case.

The last question I have, which I have been asking almost everyone when I do these spotlights, is what do you see for the future? Obviously, you meet with a lot of customers and you've watched Exadata move very quickly into this analytics market space. You mentioned it moved much faster than you had even envisioned. What do you see for the next two to five years?

Lumpkin: Well, I think over the next few years it's going to be interesting. I think the potential business benefits of analytics in the broad area of business intelligence and data warehousing are greater than ever before. The reason I say that is because most companies have their business processes online today. They have the ERP systems and so forth. But increasingly, you have a lot more data that's come online that hadn't been available for analysis before. You have social interactions, location data, and other types of sensor data, and this is all now coming within the grasp of enterprises to start to analyze. I think there are a lot more potential business benefits that can be tapped out there.

And with big data, we talk a lot about the various technical hurdles. You're going to have hurdles around scalability, around the complexity of the data, around the rate at which the data is coming from the sensors and managing data latency. There are certainly going to be challenges around some of the softer issues such as the quality, privacy and security of the data. But I think that as we look forward, some of the more interesting things are going to be the technical problems to solve. From my position as a vendor, we're really looking forward to working to solve all those technical requirements.

But what's going to play out within an IT organization? I think one of the biggest challenges for a CTO or a CIO is finding the right analytic solution to deliver the biggest payoffs. We’ve put a platform like Exadata out there, and it raises the potential of storing a petabyte of business data in a small number of racks of Exadata. That's never been possible before. So now, you have the infrastructure, the groundwork for tackling some of these big data problems. We have a lot of analytic capabilities and a lot of new tools to tackle these problems. I think what the future holds in the next few years is that we're going to see how organizations start applying this new technology, these new capabilities, to solve new business problems, potentially generate new business or even make pretty dramatic changes in the way some businesses are run today. I think that's the potential, and I think that's what we're going to see evolving in this field over the next few years.

Well George, I definitely agree with your prediction because the rate of change has been quite dramatic. The ability of technology to handle this amount of data at very fast speeds is truly remarkable. I would like to thank you for taking the time for this interview and good luck in the future.

Lumpkin: Thank you. I appreciate the opportunity to talk to you.

  • Ron Powell
    Ron, an independent analyst and consultant, has an extensive technology background in business intelligence, analytics and data warehousing. In 2005, Ron founded the BeyeNETWORK, which was acquired by TechTarget in 2010. Prior to the founding of the BeyeNETWORK, Ron was cofounder, publisher and editorial director of DM Review (now Information Management). Ron also has a wealth of consulting expertise in business intelligence, business management and marketing. He may be contacted by email at rpowell@wi.rr.com.

    More articles and Ron's blog can be found in his BeyeNETWORK expert channel. Be sure to visit today!
