This BeyeNETWORK article features Wayne Eckerson’s interview with Dan Graham. Wayne is founder and principal consultant at Eckerson Group, a research and consulting company focused on business intelligence, analytics and big data. Dan is director of technical marketing at Teradata. Wayne and Dan discuss the newly announced Teradata Listener for dealing with streaming data.
Dan, you have your fingers in a lot of the new technologies at Teradata. Tell us what’s new.
Dan Graham: We’ve recently announced Teradata Listener, which is a new platform for dealing with streaming data – real-time data.
What was the origin of Listener – why now?
Dan Graham: We have a new president at Teradata Labs this year, Oliver Ratzesberger. At two retail companies where he worked, his team had to build a streaming tool to serve the data analysts and data scientists. At the first retailer, the tool worked – it certainly did the job – but it was a point solution. It wasn’t general purpose, it couldn’t be used for multiple streams, and for a number of reasons it became a maintenance problem. Then the team got hired away to another retailer, and they found themselves building the same thing again. They wished somebody would give them a streaming tool so they didn’t have to keep building one – something more general purpose, more enterprise-class, something flexible. So when Oliver came to Teradata, building that tool was one of his priorities.
It seems like streaming is in the air these days. There is a lot of talk about it, and it seems there are a lot of new products. What core use case needs will Listener meet?
Dan Graham: For today, it is fairly simple because it really is a subsystem that allows programmers and data scientists to capture a stream and deliver it to any number of repositories very quickly. There is a big emphasis on ease of use. So I could have you use this product, and within 15 minutes, you could grab a Twitter stream and deliver it to your data warehouse or to Hadoop. It really takes only 10 or 15 minutes.
There are two things going on here that are actually innovative. One is speed, of course – the stream is coming at you very fast. The other is ease of use, and that changes things. In the past, to get a tool like this I’d have to go to IT, submit a requirements ticket, ask for a priority to get on the schedule, get a programmer in three weeks or three months, wait for the programmer to work on it and deliver it, and then work out any bugs. I don’t have to do that anymore.
So I don’t need to call IT. I’m not a data scientist or a programmer, and I don’t have to be to use Listener, right?
Dan Graham: That’s right – you don’t have to be a data scientist or a programmer.
What skills do I need, if any? Is it all just drag and drop, point and click?
Dan Graham: There is a dashboard for drag and drop to identify the incoming source and the destinations, and that’s about it. You really just hook the two things up, much like a plumber connecting two hoses. There’s a little more to it than that, but it is genuinely easy. We give you the REST API call, and we give you the statement. All you have to do is insert it into your script. So there is not a whole lot to it, which means the programmer doesn’t have to go through a governance process or a development cycle. He doesn’t have to do anything except hook it up.
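To make the "insert the REST call into your script" step concrete, here is a minimal sketch of what a producer might do. The endpoint URL, authentication header, and source key below are placeholders, not the actual Listener API – the real values would come from the Listener dashboard when a source is registered.

```python
import json
import urllib.request

# Hypothetical Listener ingest endpoint and source secret -- placeholders
# for illustration; the real URL and auth scheme come from the dashboard.
LISTENER_URL = "https://listener.example.com/message"
SOURCE_SECRET = "my-source-secret-key"

def build_ingest_request(event: dict) -> urllib.request.Request:
    """Package one streaming event as a REST POST to the ingest endpoint."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        LISTENER_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "token " + SOURCE_SECRET,  # placeholder auth
        },
        method="POST",
    )

# A producer script would call urllib.request.urlopen(req) for each event:
req = build_ingest_request({"sensor_id": 42, "temp_c": 21.5})
print(req.get_method(), req.full_url)
```

The point of the sketch is how little the producer has to know: one URL, one credential, one POST per event – no development cycle on the producing side.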
And I will tell you, business users are not going to use this. Business users are going to get the data, but they’re not going to use the tool directly and they’re not going to go into Listener and set it up.
Obviously, some of your more advanced customers need this instead of trying to build it on their own, as you mentioned. Do you see demand from the majority of your customer base for this now or is that something you think will build over time?
Dan Graham: We have just announced this, so I think most of our customers are a little surprised. But if you talk to any of them long enough, you’ll find they have streaming data that they’ve been throwing away. It’s been too hard for them to work with it. They didn’t know how to capture it, how to put it into the database, or how to put it into Hadoop. They didn’t have those answers, so they’ve done nothing – it’s dark data that goes away. Almost every one of our customers has probably 5 to 10 streams that they would be working with if they could get their arms around them easily.
Look at the data scientists. They do complex work for the corporation, and they’re sitting there saying, “Well, if I just had a little more data to calibrate my model or to calibrate my thinking, it would really help.” This is a very easy way for them to get it.
There is a tremendous amount of data out there. Think about it. I always laugh about weather data – what do I care about it? Well, if I’m a trucking company, I care. If I’m an airline, I should care. Even in the consumer packaged goods business, I might need to know, for example, why sales were down, and I can correlate with the weather data at that point in time. Now, because of the speed of this data, most of the applications will be operational analytic applications. You and I used to call this operational BI, or pervasive or active BI. Those kinds of workloads are the first ones that come to mind.
You mentioned retail. I assume the streams they might be using are point of sale, but aren’t there a lot of other use cases like social media and the Internet of Things?
Dan Graham: Absolutely. And that holds even within retail, because point of sale doesn’t cover every use case – in fact, point of sale is the slowest stream they have to deal with. Web clicks are another one. You’re getting web clicks every moment on the website. What are people doing? Why are they doing it? Did they call into the contact center looking for help, or try to buy something and give up because it didn’t make sense? This stuff is arriving in real time. If you can respond to it in less than a minute, or even 20 seconds, you can at least deal with the fact that grandma is having trouble with your website. She tried to order something – or maybe it’s your site that’s having the trouble. Either way, we can get back to the customer right away instead of tomorrow.
Let’s talk about the inputs and outputs. What kinds of streams can it pull from, and where does it put the data, assuming you’re not doing much transformation or analysis on the fly?
Dan Graham: We’re trying to keep this simple right now. We’re not going after the complex event processing business. This is a new segment, a new genre of streaming tools that is coming into being. We’ve had chats with a number of analysts, and they’ve all said, “I have to redo some of my slides to encompass this segment.”
We’ve got a lot of incoming streams – anything that looks like a continuous stream of data rather than a file. People think about Twitter quite a bit and point of sale comes to mind. The Internet of Things is pretty popular because the sensors are continuously emitting data. Sometimes it’s very slow, but if you have one thousand sensors, it certainly adds up. There is data of all kinds from all kinds of industries.
One of the ones that has become popular is surveillance data. We’ve got a lot of security concerns in this country lately and certainly in Europe as well.
So is there a common API to connect the backend sources?
Dan Graham: For the incoming sources, we use a REST API. So with anything you can insert a REST API statement into – which is just about everything – it becomes very easy to get the data. On the output side, we use fundamentally two APIs. One is ODBC: we feed mini-batches of data into your database – in our case, the Teradata Database or the Aster Database. And then we use REST APIs to flood data into Hadoop. We’re already set up, so anything that can accept ODBC or a REST API on the output side can receive the data.
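The mini-batching Dan describes on the ODBC side can be sketched in a few lines. This is a generic illustration, not Listener’s actual batching policy – it assumes batches are cut purely by row count, where each batch would then become one multi-row INSERT over ODBC.

```python
from itertools import islice
from typing import Iterable, Iterator, List

def mini_batches(events: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group a continuous event stream into fixed-size mini-batches,
    the shape in which rows would be handed to an ODBC target."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))  # pull up to batch_size events
        if not batch:
            return  # stream exhausted
        yield batch

# Seven events cut into batches of three -> sizes 3, 3, 1; in practice each
# batch would be bound to one parameterized multi-row INSERT statement.
stream = ({"click_id": i} for i in range(7))
sizes = [len(b) for b in mini_batches(stream, batch_size=3)]
print(sizes)
```

Batching by count is only one choice; a production system would typically also flush on a time interval so a slow stream doesn’t leave rows stranded in a half-full batch.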
What’s the future?
Dan Graham: We’ll probably be putting in some capabilities that look like complex event processing, or CEP. We’re not aiming to compete with the complex event processing vendors – that’s not our goal. But we do intend to put some logic into these streams and do it in the streaming clusters. I should mention that this is not only a self-service cluster; you can make the cluster 100 nodes if you want to. It’s a software-only product – the customer brings his own hardware.
We’re going to have to start adding some filters and the ability to do analytics and transformations. But I think we’ll start by simply pumping the data to things like Spark and Storm and some of our partner products like IBM Streams, and let them do some of the analytics, or whatever their product can contribute to a stream in flight.
You mentioned a runtime cluster. Is this a Hadoop cluster or some other cluster?
Dan Graham: Just a cluster – it’s any cluster. You just have to rack up some number of Intel nodes. We are running an awful lot of open source, so we’re running on OpenStack. It can be a cloud cluster. And we’re also running on Mesos, which is a cluster coordinator in the Apache projects. We have integrated about 10 different open source modules into one very large system as well as adding a lot of Teradata expertise – the guys that came from the retailers plus Teradata Labs as well.
Sounds like an interesting product coming at the right time. Thanks for enlightening us about it.
SOURCE: Teradata Listener for Streaming Data