Barry Devlin: Thanks, Ron. It’s good to be back with you for the start of 2018 – another year and another set of challenges, I’m sure.
Ron: Well, things change, but at the same time, they remain the same in many respects. When I look at the final “ThoughtPoint” of our series on the production analytic platform, you are starting to predict the future. Is that wise?
Barry Devlin: Wisdom – I don’t know about that, Ron. It’s a tough one. But to be honest, I’d say from my point of view as an old architect, it is really the architect’s job to try to predict the future because if you do a good job of predicting the future, then you actually get to develop an architecture that has legs and is going to last for some time. I suppose I can’t resist at this moment just mentioning that this year – 2018 – is the thirtieth anniversary of the first publication of the data warehouse architecture by me and Paul Murphy back in 1988. We’re 30 years doing that architecture so I must have done a reasonably good job of predicting the future back then. So let’s see how I do now in my advanced years!
I think it’s more difficult these days because technology is now evolving so fast that it’s really hard to keep up. Actually, I increasingly see that technology is driving business. So rather than looking at the future business needs, which would be the old way of predicting the future, now I think we need to start with the technology and look to see how that is going to impact business needs.
Ron: I totally agree, Barry, about the importance of an architecture that is able to endure for many, many years – like the 30 years you mentioned for the data warehouse. That is a wonderful achievement. Looking at the production analytic platform, you talk a lot about artificial intelligence as being the key driver for the future evolution of this platform. Why is that?
Barry Devlin: I think that artificial intelligence is evolving so fast in the world at the moment that it’s going to impact everything. The whole area of autonomous vehicles is an example of what’s going on in artificial intelligence. If you think about the impact that autonomous vehicles are going to have on the economy, on social structures, and on the way we design and build our cities, it is going to be huge. And underneath the covers is artificial intelligence – image recognition, natural language processing, event detection and so on. All of those things are observing the world and deciding what an event means, such as whether a child on the curb is going to run out in front of the car. All of those issues are about making decisions and acting upon those decisions. If we can do it in real time – which we have to do with autonomous vehicles – then it’s surely going to become extremely important in enterprise computing as well.
From that point of view, the production analytic platform has to take into account the emergence of artificial intelligence (AI). There is another parallel here because if you go back to the story in one of our earlier podcasts about the train going across Europe, all of that is about data coming from the Internet of Things (IoT) – sensors and so on. That is trying to make similar decisions but on an enterprise scale.
So we’re now looking at a production analytic platform that is based on data coming from the Internet of Things and also from other sources, and talking about analytics in the current environment. Analytics is where we started with the production analytic platform, and I see a very thin line between analytics and AI going forward. Analytics is also about finding patterns, the meanings of patterns, and what decisions should come out of those patterns. And, again, if you push it to the extreme, it’s about making decisions and taking action – as we used to do in operational BI and now we’re doing with operational analytics.
The parallels are just too close to ignore, and for that reason I think AI is going to be a very strong driver for technology in general, but also for enterprise technology and our production analytic platform as well.
Ron: Barry, like analytics, AI depends heavily on large data volumes for training and high data velocity for operations, leading to the production analytic platform residing next to a data lake in a largely centralized environment. Do you see this configuration changing in the future?
Barry Devlin: I think it is going to have to change, and I think it is also going to have to stay the same. In a way I think we’re going to have a continuation of a centralized environment, but we are also going to move toward a distributed, decentralized approach in the future.
I say that because looking at what is going on in AI at the moment on the leading edge of research, there are two really important things that are happening. One is that there is a focus on using less data in training. It’s really actually quite interesting. When I started looking at AI a few years back, I would say the parallel between analytics and AI was that they both needed big data for training and they both depended on big data in order to work.
But the AI folks have now started to move away from this idea of being dependent on huge amounts of data because training on it is a very expensive and very slow process. So there is a lot of work going on in a field called reinforcement learning, which is essentially getting AI systems to learn from the results of their own experiments without having large training data sets. They’re given a goal, they keep playing toward that goal, and they see how they move toward it. It’s how children learn. If you really want to explore that, read some of the material on the Internet over the last couple of months about AlphaGo Zero, a game-playing machine. In its previous incarnation, the system learned from literally thousands of pre-loaded games from experts in Go. This current incarnation of AlphaGo takes no data from experts. It just plays against itself and learns. It handily beat the original AlphaGo without any expert input.
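The “learn by playing toward a goal, with no training data set” idea Barry describes can be sketched with tabular Q-learning, the simplest form of reinforcement learning. This is purely illustrative – a toy six-cell world, with made-up parameter values, not any real system’s code: the agent starts with no data at all and learns to reach the goal purely from the rewards its own moves produce.

```python
import random

# Minimal tabular Q-learning sketch: an agent on a six-cell track learns to
# reach the goal at the right end purely from observed reward, with no
# pre-loaded training data. All names and parameters are illustrative.

N_STATES = 6          # cells 0..5; the goal is cell 5
ACTIONS = [-1, +1]    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

random.seed(0)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def choose(s):
    """Epsilon-greedy: mostly exploit, sometimes explore; break ties randomly."""
    if random.random() < EPSILON or q[(s, -1)] == q[(s, +1)]:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(s, a)])

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        a = choose(s)
        s2 = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s2 == N_STATES - 1 else 0.0
        # Update the value estimate from the observed outcome alone.
        best_next = max(q[(s2, -1)], q[(s2, +1)])
        q[(s, a)] += ALPHA * (reward + GAMMA * best_next - q[(s, a)])
        s = s2

# After training, the learned policy steps right from every non-goal cell.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

AlphaGo Zero uses vastly more sophisticated machinery (deep networks plus tree search), but the principle is the same: the only “data” is the outcome of the system’s own play.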
So this move away from big training data sets is going to be important, and that is going to change some of the architectural constraints that we currently see around bringing everything into one place in the big data and big data processing approach. So that’s one driver from the AI side.
But the other area that’s really important is that we see more and more of this artificial intelligence being pushed out to the edge of the network. Put it on your smartphone so it can answer your questions without going off to the Internet to figure out your natural language processing. Put it in the car so we don’t have to spend so much time downloading and uploading data. All of those things say to me that we’re looking at a much more distributed and decentralized approach in the longer term.
And I see that coming from Teradata as well. If you look at their Teradata Everywhere approach, what you see is that we still need a large piece of iron with a lot of processing power and a lot of data, but we also need to be able to move our processing out into different devices – smaller machines, out into open source systems and so on. That idea that we’re really pushing everything we can into a much more distributed and decentralized environment is essentially where we’re going to go in the future.
Ron: Barry, we’ve covered a lot in this series of podcasts on the production analytic platform. Could you give our audience a summary of the key aspects that we’ve covered?
Barry Devlin: I can try. This is a very big topic. It reminds me of data warehousing in the early days because there are so many different things going on. But I think, like the data warehouse, we have to ask what are the key technologies and environments that we need to support.

Call me old-fashioned, if you like, but I still see that the heart of the production analytic platform is going to be relational database technology. Reliability, availability, scalability, maintainability and performance levels that span from operational to analytic use – all those things require a relational database at the core, but with built-in non-relational support for all sorts of formats and processing approaches, from analytics through AI, linked in via SQL as well as native languages. And then we have data in all sorts of places, so data virtualization – what some people call logical data warehousing – provides access to and use of data stored remotely. Increasingly, distributed and decentralized operation, as I’ve just said, is another aspect.

And integration, in terms of making all these technologies work together, is going to be key. That’s something that Teradata does very well. And also integration in terms of providing an experience for the users, so that we move away from this old idea that there’s an operational world, an informational world and an analytic world that don’t really talk to one another, and say instead that they all need to meet in one place so that we can make them work together. Those are the things that are going to be really important as we roll out this production analytic platform concept.
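The pattern Barry describes – a relational core with analytic logic linked in via SQL as well as native languages – can be sketched in miniature with Python’s standard sqlite3 module, which lets a native (Python) function be registered and called from inside a SQL query. SQLite here is just a stand-in for any relational engine, and the table, column and function names are invented for the example:

```python
import sqlite3

# Illustrative sketch: a native analytic function made callable from SQL,
# so the analysis runs inside the relational query rather than after an
# export step. Names (readings, zscore_band) are made up for this example.

def zscore_band(value, mean, stddev):
    """Classify a reading by how far it sits from the mean."""
    z = (value - mean) / stddev
    return "outlier" if abs(z) > 2 else "normal"

conn = sqlite3.connect(":memory:")
conn.create_function("zscore_band", 3, zscore_band)

conn.execute("CREATE TABLE readings (sensor_id TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("s1", 10.0), ("s1", 10.5), ("s1", 25.0)])

# The analytic is invoked through plain SQL alongside relational columns.
rows = conn.execute(
    "SELECT sensor_id, value, zscore_band(value, 10.0, 2.0) FROM readings"
).fetchall()
print(rows)
```

Production platforms expose the same idea at scale through in-database analytic functions and user-defined functions, keeping the relational engine at the core while non-relational processing plugs in beside it.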
Ron: Barry, from a transitional perspective, isn’t it much easier for enterprises, regardless of where they are in their informational or operational journey, to be able to utilize a lot of their existing technologies with the relational database at the core?
Barry Devlin: Yes, Ron, that’s a good point and thank you for bringing it up. I think it is absolutely essential to keep that in mind. Over my many years of working in this industry, one of the things that I’ve often seen that drives many failures is the attempt to rip and replace because the new technology looks better or, as is often the case, is cheaper. I think that’s a dangerous strategy, and I think you were right in saying that building on what you have is generally a safer place to start. For sure, the relational database with 40 years of history behind it has a lot of things going for it. I think we would be putting ourselves in danger by moving too far away from that as a core technology.
Ron: Thank you, Barry, for discussing with us how the production analytic platform will evolve in the coming years to address the future demands of analytics and the Internet of Things.