In our last chat about the production analytic platform, you gave us a great view of the theory: operational versus informational worlds and so on. Can you give us something a little more practical to grab hold of?
Barry Devlin: Ron, you know me too well. I'm just one of these architecture guys. Yes, I can, and I was thinking about this as I was getting ready for this interview. What I thought was we should just look at a practical example. Everyone knows that airplanes are collecting huge amounts of data from the moment that the pilot switches on the engines, perhaps even before, until the passengers disembark at the destination. There is a huge amount of data being continuously gathered. Every time a piece of equipment is turned on or off, every measure of fuel flow, every pressure reading in the various manifolds: it's all being collected. And this type of data, called Internet of Things data, is the basis for the analysis that we're going to be doing. That analysis is about allowing the airlines to make predictions about how the plane might perform better, how they could reduce fuel consumption, how they could reduce costs, and when parts are likely to fail so they can do preventive maintenance. It also allows airports to make plans about how to improve flight patterns and so on.
All of that data is what we call time-series data. And that data is very important when we're looking at the production analytic platform. In fact, that data is one of the reasons that we thought about the production analytic platform. I call the data events, measures and messages. Messages, by the way, are the messages that are being exchanged between the pilot and the control tower, in this case. All this data coming from the Internet of Things is the basis for getting benefit from the "Things" themselves.
The Internet of Things (IoT) is really big. Obviously we are getting so much sensor data, and it really makes a lot of sense to be able to use a time-series analysis to help us predict things. Everyone mentions volume, variety and velocity as key issues. How do you see it?
Barry Devlin: Well, I think they're interesting terms, especially if you're in the marketing business for one of those products. But I believe we actually need to understand this data at a slightly deeper level. The key factor about this data is that it is essentially all about time. By that I mean each of those messages, events and measures from the airplane comes at a very specific moment in time. The order in which they arrive, the order in which they're generated, is particularly important. It's the element of time embedded in the data that makes it very different from the traditional business data that we've had in the past. It's called time-series data because it is essentially a series of events in time sequence, and that has led to a new way of analyzing and interpreting what's going on with the data coming out of the Internet of Things.
So how and why is time so relevant, Barry?
Barry Devlin: It's because there is now so much of it, or indeed in our own experience perhaps there's so little of it. So much of the data comes out so fast and in such great quantity, and that brings us back to the volumes and velocities, that the only way to actually analyze it is in time buckets: intervals that slice this data up into useful sets.
Let's think a little bit about what it looks like. If you can, imagine these records that are coming out of the sensors. The sensor is saying something like, "This is the time." Then it says, "My name is," that's the identity of the sensor, "and here's the payload of data." All of these Internet of Things events and measures have this same structure. So you might have the time, the sensor on this particular engine, and the rate of fuel flow at a particular time. That set of data is actually quite structured, and it's very important to be able to use it and to analyze it in series.
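As a rough illustration of the record structure described above (time, sensor identity, payload), here is a minimal Python sketch. The class name, sensor names and payload fields are hypothetical and are not taken from any real aircraft system; the only point is that every record carries the same three parts and can be put back into time order.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SensorReading:
    """One IoT record: a timestamp, a sensor identity, and a payload."""
    timestamp: datetime  # "This is the time."
    sensor_id: str       # "My name is ..."
    payload: dict        # "... and here's the payload of data."


def at(second: int) -> datetime:
    """Helper for the example: a UTC timestamp at 12:00:<second> on an arbitrary day."""
    return datetime(2024, 1, 1, 12, 0, second, tzinfo=timezone.utc)


# Hypothetical readings, listed in the order they happened to arrive.
readings = [
    SensorReading(at(2), "engine1.fuel_flow", {"kg_per_hour": 1210.0}),
    SensorReading(at(0), "engine1.oil_pressure", {"psi": 61.5}),
    SensorReading(at(1), "engine2.fuel_flow", {"kg_per_hour": 1198.0}),
]

# Time order is what matters for analysis, so sort by timestamp first.
ordered = sorted(readings, key=lambda r: r.timestamp)
```

Sorting by the embedded timestamp rather than by arrival order reflects the point made earlier in the interview: the order in which events are generated is part of the data.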
People think time-series data is new, but actually it has been around for a long time. If you think about the keystrokes coming out of a keyboard, that's time-series data. But more interesting is the fact that if you look at business data in the past, we had a very different way of looking at time. So, think of a bank. If I'm a banker or a customer of the bank, of course I'm interested in the fact that the transactions, money amounts, are going in and out. They are essentially events. But in terms of the banking business and my personal business, the way that the bank allocates interest is not based on the transactions going in and out. It's based on the amount of money that is in an account over any particular time period. That's status data. That's another form of temporal data. It says that my account had this amount of money in it between last Friday and this Thursday. And that is a very different type of data, and that's the data we've analyzed and used over the years for most of our data warehousing, business intelligence and analytics needs.
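To make the distinction between event data and status data concrete, the sketch below turns a list of transactions (events) into balance-over-interval rows (status). The function name, the day numbering and the amounts are illustrative assumptions, not a description of how any bank actually computes interest.

```python
def balance_intervals(opening_balance, transactions, period_end):
    """Convert (day, amount) events into (start_day, end_day, balance) status rows.

    Days are plain integers counted from the start of the period (day 0).
    """
    intervals = []
    balance = opening_balance
    start = 0
    for day, amount in sorted(transactions):
        if day > start:
            # The balance held steady from `start` up to this transaction.
            intervals.append((start, day, balance))
        balance += amount
        start = day
    if period_end > start:
        intervals.append((start, period_end, balance))
    return intervals


# Hypothetical week: 1000 to start, a deposit on day 3, a withdrawal on day 5.
rows = balance_intervals(1000, [(3, 500), (5, -200)], period_end=7)

# Interest-style calculations work on the status rows, not the raw events.
average_balance = sum((end - start) * bal for start, end, bal in rows) / 7
```

Here `rows` holds three intervals, one for each stretch of days during which the balance did not change, which is the "my account had this amount between last Friday and this Thursday" view of the same underlying events.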
Time-series data, on the other hand, has forced us to look at it in a different way. It forces us to look at the data as time intervals, as time buckets, and to be able to slide an analytic tool along this time series to see what is going on. What was the average over this period of time, or the maximums and minimums over that period of time, and what does that tell us about what is going on? Is this unusual? Is this something I should worry about? Has the temperature of the sensor peaked and does that mean that this particular component is going to fail? Unless I know whether there is something else going on, like a vibration somewhere else, I can't really say. The analytics here is about taking data from many sensors, blocking it up into time segments, and then doing the magic of analysis on it. And that's very different and indeed more complex than what we've done with analytics in the past.
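The bucketing idea described above can be sketched in a few lines of Python. This is a simplified illustration using fixed, non-overlapping buckets rather than a true sliding window, and the function name, the 60-second bucket width and the temperature values are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def bucket_stats(readings, bucket_seconds):
    """Group (seconds, value) pairs into fixed-width time buckets.

    Returns a dict mapping each bucket's start time to its min, max and mean.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        start = int(ts // bucket_seconds) * bucket_seconds
        buckets[start].append(value)
    return {
        start: {"min": min(vals), "max": max(vals), "mean": mean(vals)}
        for start, vals in sorted(buckets.items())
    }


# Hypothetical temperature readings: (seconds since engine start, degrees).
temps = [(0, 410.0), (20, 412.0), (45, 415.0),
         (70, 430.0), (95, 450.0), (130, 455.0)]

stats = bucket_stats(temps, bucket_seconds=60)
```

With these inputs the readings fall into three buckets starting at 0, 60 and 120 seconds. Comparing the summaries of successive buckets, and of several sensors over the same bucket, is one simple way to ask whether a rising temperature is unusual or is accompanied by something else, such as vibration.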
Barry Devlin: In the case of the Internet of Things, this is the way it works. By the way, it's not just the Internet of Things that we're thinking about here. With stock market data, the ticker tape is a similar time series and you can do similar analyses on it. But this idea of analyzing the data over a time series is what is different and what is important.
What it leads to is the ability to save money. If I can do preventive maintenance, then I don't have to take my airplane off the route at a time when it is busiest. I anticipate that something is likely to fail and I can make sure that I get the aircraft to the place where the component is so it can be replaced. This is where it links us back to our production analytic platform. When I start talking about decisions like that (where is the aircraft, where is the part, where are the skilled people to do the work), I'm actually talking about operational things at that stage rather than what we traditionally and typically do in business intelligence or analytics.
So we need to think about how these two things fit together. I've got the analytics on one side, that's the ability to do analysis on the time-series data, but then I have to go back into production operations to take immediate action in order to get the benefit.
This is why, when I started thinking about the production analytic platform, it made sense that we needed to take the data out of the analytic environment and bring it back into a more production-oriented environment like a traditional, high-performance, high-reliability, high-scalability relational database.
Over the years we've always looked at operations on one side and analytics on the other. But the production analytic platform is marrying those two worlds together.
Barry Devlin: Ron, I think you've got it. The idea here is that we can no longer afford the time gap that we had in the past between the informational world, the analytic world, and the operational world. You and I remember the days when it was okay to say to the business people that they could go away and have a cup of coffee before we came back with the answer. It's no longer the case. They need the answers immediately, and that's why the production analytic platform becomes so important.
Thank you, Barry, for discussing with us how the production analytic platform is the answer to handling the complexities of analyzing time-dependent data, especially for the Internet of Things.