Big Data Analytics Requires Orchestration, not Federation: A Q&A with Chris Twogood of Teradata

Originally published December 12, 2014

This BeyeNETWORK article features an interview with Chris Twogood, vice president of product and services marketing at Teradata, conducted by Phil Bowermaster, an independent analyst and consultant specializing in big data and data warehousing.
There is so much Teradata is doing these days that I thought it would be good to step back and talk about some of the industry trends that are driving these big changes. Maybe we could start off with this whole idea of an evolution toward an analytical ecosystem. Can you tell us what that means?

Chris Twogood: It’s really interesting. The market at large has come to the conclusion that there is no single system that is going to do everything. You really need a cohesive analytic ecosystem that ties together different file systems and different analytic engines, bringing Hadoop technology together with discovery platform technology and data warehousing technology. The challenge, though, is that this introduces complexity. So how do you minimize the cost and the complexity in that kind of environment? Teradata has made some interesting announcements, most notably one that helps minimize that complexity. With QueryGrid we integrate Teradata to Teradata systems, Teradata to Aster, and Teradata to Hadoop through our extended partnership with Cloudera on top of our partnership with Hortonworks. We also announced QueryGrid connectivity from Teradata to Oracle systems. So you can have this cohesive environment, and the business user doesn’t have to know where the data is sitting. It’s transparent: business users work through the architecture, and Teradata does all of the heavy lifting. The market is moving toward an analytical ecosystem, and Teradata is making it easier to do that.
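To make that transparency concrete, here is a minimal sketch of what such a cross-system query might look like from Python, assuming Teradata’s teradatasql driver. The host, credentials, table names and the foreign-server name hadoop_cdh are all hypothetical, and the table@server reference is only meant to illustrate the idea of reaching remote data through a single SQL interface, not documented QueryGrid setup.

    # Illustrative sketch only; connection details and object names are invented.
    import teradatasql  # Teradata's DB-API driver for Python

    with teradatasql.connect(host="tdprod", user="analyst", password="secret") as con:
        cur = con.cursor()
        # One SQL statement joins a local warehouse table to a table that
        # physically lives in Hadoop; the analyst never has to know or care
        # where each table is stored.
        cur.execute("""
            SELECT s.region, SUM(w.clicks)
            FROM warehouse_sales s
            JOIN weblogs@hadoop_cdh w   -- remote table reached via a foreign server
              ON s.customer_id = w.customer_id
            GROUP BY s.region
        """)
        for row in cur.fetchall():
            print(row)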

The reality is that there is not one product or one technology that is going to solve all the problems, so you have to be, to some extent, in the business of making all of those different parts work together.

Chris Twogood: Absolutely, and we call that orchestration. It’s not about federation, because you want to be able to leverage the processing power of each of those systems. So we push processing down to open source Hadoop, and then we deliver the result sets back to Teradata. We orchestrate that kind of analytic framework.
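The difference is easy to see in a toy sketch. In the federated style, raw rows cross the network and the compute happens centrally; in the orchestrated style, the aggregation is pushed down to the engine that holds the data, and only a tiny result set moves. Everything here is hypothetical pseudo-infrastructure for illustration:

    # Conceptual sketch; "remote_engine" stands in for any engine, such as a
    # Hadoop cluster, that can execute SQL over data it stores locally.

    def federated_count(remote_engine):
        # Federation: ship every raw row across the network, then compute
        # centrally. Network transfer grows with the size of the data.
        rows = remote_engine.execute("SELECT user_id FROM clickstream")
        return len(set(rows))

    def orchestrated_count(remote_engine):
        # Orchestration: push the aggregation down to the remote engine and
        # move back a single-row result, however large the source data is.
        result = remote_engine.execute(
            "SELECT COUNT(DISTINCT user_id) FROM clickstream"
        )
        return result[0]

The orchestrated version moves one value over the network instead of millions of rows, which is exactly the heavy lifting being delegated to the system that owns the data.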

Let’s talk about open source. A lot of these solutions that are being incorporated into an overall ecosystem are open source. What role does open source play now and into the future and how is Teradata addressing that?

Chris Twogood: If you look at all the investments in big data startups and all the different names out there, there is always a latest solution in the marketplace claiming it will solve all of your specific problems. But I think part of the challenge is how we as an industry create clarity, rather than confusion, around how these technologies work together. To that end, there are a number of things Teradata is doing. One of them is our partnership with Cloudera. Traditionally you’ve heard Cloudera say, for example, that Hadoop is going to replace the data warehouse. And I think a market maturity has come around that says that doesn’t make sense. They each have a place in the ecosystem.

Teradata also just announced a new service called the Data Integration Optimization Service that helps companies understand the best place to do their data integration. Is it better to do it on Hadoop? Is it better to optimize it within the warehouse? Is it better to do it on another server?

The other thing that is very key is our acquisition of Think Big Analytics. Think Big is exclusively focused on leveraging open source, whether that’s Cassandra, Hadoop, Storm, Spark or MongoDB, and on integrating those technologies into a broader ecosystem. As you build out an analytic ecosystem, you have to look at how to integrate these different technologies to drive real value. Teradata is doing a number of things, such as the acquisition of Think Big Analytics, our Data Integration Optimization Service and our Cloudera partnership, to help provide that clarity.

One of the things this is doing for Teradata is putting you in a position where you really have to be a trusted advisor to the customer. You really have to look at each customer’s particular situation and put together the right mix of technologies to solve those problems.

Chris Twogood: If you look at Teradata’s heritage, we have always been about helping companies gain value from data. While we have great technology that empowers that, over half of what we do as a company is consulting services. It’s about helping people understand how to get value, and that extends to all of these emerging technologies. I think Teradata is a trusted advisor that helps people understand how to get value and how to integrate the technologies that drive it.

Speaking of emerging technologies, one of the big buzzwords we hear lately is the “data lake” architecture. What’s Teradata’s take on that and how does what you’re doing factor that in?

Chris Twogood: We really view data lakes as an emerging architectural pattern. Not everybody is deploying them, but the promise of a data lake is actually quite interesting: bring in all of your data, process it, keep it sitting there in its original fidelity, and then feed downstream analytic platforms that serve it up to business users. Now the problem with a data lake is that the more data you put into it, the more it disappears. That’s because Hadoop as an architecture hasn’t done a great job of managing metadata so far. In fact, I heard someone say the other day that if you take a glass of water and pour it into the lake, the minute you pour it in, the water is gone; you cannot recreate that glass of water again. We recently announced Teradata Loom as a result of our acquisition of Revelytix. Teradata Loom provides integrated metadata, data lineage and data wrangling, all in a single self-service UI, so you can understand the metadata within a Hadoop cluster, make sense of your data lake and get real value out of it.

Having that understanding of metadata is also really important for orchestrating alongside other analytic systems. So it’s a key value-add in the market. And, by the way, it’s available for download combined with the Hortonworks or Cloudera sandbox, so developers can try it out for free and see its value.
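To illustrate the kind of bookkeeping a tool like Loom automates, here is a toy lineage registry in Python. It is purely a conceptual sketch; the dataset names and paths are invented, and this is not Loom’s actual API.

    # Toy metadata/lineage registry; names and paths are invented.
    from dataclasses import dataclass, field

    @dataclass
    class Dataset:
        path: str                                    # location in the lake
        schema: dict                                 # column name -> type
        parents: list = field(default_factory=list)  # names of source datasets

    registry = {}

    def register(name, path, schema, parents=()):
        registry[name] = Dataset(path, schema, list(parents))

    def lineage(name, depth=0):
        """Walk upstream so a derived table can be traced back to raw sources."""
        ds = registry[name]
        print("  " * depth + f"{name}: {ds.path}")
        for parent in ds.parents:
            lineage(parent, depth + 1)

    register("raw_clicks", "/lake/raw/clicks", {"user_id": "str", "ts": "int"})
    register("sessions", "/lake/curated/sessions", {"user_id": "str"},
             parents=["raw_clicks"])
    lineage("sessions")  # prints sessions, then the raw source it came from

With lineage recorded as data lands and is transformed, the glass of water poured into the lake can be traced and poured back out.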

So you get the advantage of the data lake architecture but you don’t have that one-way street problem, right?

Chris Twogood: Exactly. And you’ve heard people in the marketplace say it could become a data swamp if you don’t manage it appropriately.

Let’s talk about analytics in big data environments. One of the things I keep hearing is that there are a lot of challenges in applying traditional analytic models to big data. What do you think? Is it time for a new kind of analytics for big data?

Chris Twogood: I think people have gotten to the point where they’re capturing lots of data, but now they are trying to figure out how to look through all of it. They are trying to determine how to parse through the data, look for patterns and uncover unique types of insights. As part of our drive toward introducing new algorithms for big data, we announced our Connection Analytics capability. Connection Analytics provides a lot of new algorithms that sit on top of a native graph engine as well as SQL and our MapReduce engine. It enables us to analyze the relationships between different entities. The entities could be people to people, people to products, products to processes, or machines to people. It’s not about analyzing the entities as discrete units, but about the information that flows between them. There are very interesting algorithms, including loopy belief propagation, personalized SALSA and Shapley values. All of these help drill into the data and uncover insight that wasn’t available before, and they do it in a very automated, machine-learning style of architecture. So it’s very interesting, and I think it will drive a whole new level of connection analytics.
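As a small illustration of the flavor of these graph techniques, the sketch below runs personalized PageRank from the open source networkx library over a tiny entity graph. Personalized PageRank is used here as a stand-in for the relationship-centric scoring described above (it is a cousin of personalized SALSA, not Teradata’s implementation), and the graph and entity names are invented.

    # Score entities by their connections relative to one customer.
    import networkx as nx

    G = nx.DiGraph()
    # People-to-people and people-to-product relationships as directed edges.
    G.add_edges_from([
        ("alice", "bob"), ("bob", "carol"), ("carol", "alice"),
        ("alice", "product_x"), ("bob", "product_x"), ("dave", "carol"),
    ])

    # Bias the random walk toward one entity so scores reflect influence
    # and affinity relative to "alice", not global popularity.
    scores = nx.pagerank(G, personalization={"alice": 1.0})
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{node}: {score:.3f}")

What comes back is not a property of any single entity but of the web of relationships around it, which is the shift from analyzing discrete units to analyzing what flows between them.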

It’s not just new insights – it’s new kinds of insights.

Chris Twogood: Absolutely. You might see things that help you with churn, social network analysis, viral marketing or fraud detection. There are all kinds of interesting use cases around that.

It’s interesting what you don’t know that you don’t know until you can start making connections that you couldn’t make before.

Chris Twogood: Exactly.

There is one more trend I’d like to talk about and that is in-memory. There seems to be greater demand for in-memory performance, and obviously Teradata has been playing in this space for some time. What can you tell us about the future for in-memory technology?
 
Chris Twogood: I think the demand for analytics and the demand for data are growing faster than ever before, which means your systems have to perform at future scale. How do you provide the performance required to meet that demand? It means that as vendors in this space we have to reduce the number of bottlenecks between the different tiers in an architecture. The challenge has always been that disks are the slowest component in the architecture; then you have memory, and then you have CPU, and the I/O between those tiers degrades performance. When in-memory first came out, people were asking, “Can I take a lot of the stuff that was on disk and put it in memory?” What Teradata has done is drive even more advanced engineering around how to process data in memory: advanced pipelining and in-memory-friendly structures like columns, so the data is easily consumed by the CPU. We’ve taken it a step further and pushed processing into the CPU itself, using new technology from Intel around vectorization to reduce movement between memory and the CPU, because memory is almost becoming the new bottleneck. Now we have to move up to the CPU.
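A toy numpy comparison shows why columnar, vectorized processing keeps the CPU fed. numpy’s array operations run as tight native loops that modern CPUs execute with SIMD instructions, loosely analogous in spirit to the CPU-level vectorization described above; the data and workload here are invented.

    # Row-at-a-time vs. columnar/vectorized processing of the same query.
    import numpy as np

    n = 1_000_000
    prices = np.random.rand(n)                 # one contiguous column
    quantities = np.random.randint(1, 10, n)   # another contiguous column

    def revenue_rowwise():
        # Row-at-a-time: an interpreted loop that touches one value per step,
        # wasting cache lines and giving the CPU no chance to vectorize.
        total = 0.0
        for i in range(n):
            total += prices[i] * quantities[i]
        return total

    def revenue_vectorized():
        # Columnar: one pass over contiguous arrays that the underlying
        # native code can stream through SIMD registers.
        return float(np.dot(prices, quantities))

    print(revenue_rowwise(), revenue_vectorized())  # same answer, very different cost

The vectorized version is typically orders of magnitude faster, and that gap is the difference between an architecture bottlenecked on data movement and one that keeps the processor busy.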

So we’re going from in-memory storage to in-memory computing – fascinating. Thank you for taking the time to share these insights into Teradata’s advancements to help organizations gain value from their data.


  • Phil Bowermaster
    Phil Bowermaster is an independent analyst and consultant specializing in big data, business intelligence and analytics. Phil is the founder of Speculist Media, which produces blogs, podcasts, and other social and traditional media exploring the role of technology, particularly data technology, in shaping the future. He works with select clients in developing and executing content strategies related to big data. Phil can be reached at phil@speculist.com.
