For this article, Ron Powell, independent analyst and expert with the BeyeNETWORK and the Business Analytics Collaborative, interviewed Mark Shainman, Director of Presto and QueryGrid for Teradata. They discussed the new features in QueryGrid 2.0 and how it is benefiting customers.
Mark, what is QueryGrid?
Mark Shainman: QueryGrid is a data fabric that Teradata has developed that enables customers who have the vision of a Unified Data Architecture (UDA) to query from a platform such as Teradata or Aster Data and reach out and query other platforms. It also has the ability to push down some of the processing into those other platforms like Hadoop or even from Teradata to Teradata, or Teradata to Aster Data. It also can bring that data back, do join processes and aggregation processes in the Teradata engine. Or, you can initiate those queries from one of the other engines as well – from Presto or Aster.
When we last talked, you were just introducing QueryGrid. Now you’ve just released QueryGrid 2.0. What’s new in QueryGrid 2.0?
Mark Shainman: We’ve actually gone through a process of re-architecting the QueryGrid product to not only make it easier for us to more rapidly develop connectors to other platforms, but also make it much easier for our end users to manage the QueryGrid environment.
The new QueryGrid architecture is modular. There are multiple pieces, such as a link and Listener component, that are reused every time we create other connectors. The second component we’ve built out is a management server. Through this management server, you can now leverage Viewpoint, which is the graphical interface that most of our management tools and applications within Teradata use. You’re able to set up connectors, dynamically upgrade connectors, or test those connectors. Through the management server, you can now trace the performance and the path of a query when it’s executed in one platform and as it flows to the other platform. You can also look at the SQL that’s executed on each one of the platforms and how much data was brought back. In the management server, you also now have the ability to set up security and encryption.
One of the biggest themes in QueryGrid 2.0 is really that overall management of the data fabric and architecture.
What benefits are customers seeing? Obviously, as you analyze these queries and you are able to change them and optimize them, what kind of a performance effect are they receiving?
Mark Shainman: One of the biggest benefits that customers are seeing just on a QueryGrid level is that traditionally if you didn’t have QueryGrid in place, a customer would have to put in an IT ticket to indicate what data was needed. Two days later, IT would look at what data was needed and then ship it over to the business. Then you’d go through that join process and you’d do the analysis. Then you’d discover that you needed some other data. It would take another two or three days for IT to ship the data over. Now, with QueryGrid 2.0, we’re giving the end users instant access and the ability to reach out and grab that data from another platform and join it with data they have. So it’s really that self-service use and giving them the ability to reach out to all those other platforms and get value out of data versus it just being silos of data – a silo within your Hadoop environment, a silo within your Aster environment. With QueryGrid’s integration with Presto, you can also query data in not only Hadoop but numerous other platforms such as PostgreSQL, MySQL or even Amazon S3. Now through QueryGrid you’re able to actually give access almost at an instantaneous level to those users so they can actually get value out of that data.
So you actually take a big load off of IT?
Mark Shainman: Exactly, because IT does not have to go through the process of shipping and moving all that data. One of the other great things is you’re actually able to push processing down into those other platforms. For example, if you had data that exists in Hadoop and data that exists in Teradata, if QueryGrid didn’t exist, you’d have to bring all that data into Teradata or Aster, and then you would need to use CPU cycles within your Teradata environment. Or, if it was another platform, you still would have to move that data in. Now with QueryGrid, you can actually push that processing down into that other engine and use the capacity in that other engine, which creates a value proposition of not using CPU cycles in the Teradata or Aster platform, etc.
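The pushdown idea described above can be illustrated with a minimal sketch. This is not QueryGrid code or its API: `RemoteTable`, `scan`, `scan_where` and the sample data are hypothetical names invented for illustration. It only shows why filtering in the remote engine moves fewer rows than shipping everything and filtering locally.

```python
from dataclasses import dataclass


@dataclass
class RemoteTable:
    """Hypothetical stand-in for a table that lives on another platform."""
    rows: list

    def scan(self):
        # Without pushdown: every row crosses the network.
        return list(self.rows)

    def scan_where(self, predicate):
        # With pushdown: the remote engine applies the filter itself
        # and ships only the matching rows.
        return [r for r in self.rows if predicate(r)]


def is_error(row):
    return row["status"] == "error"


# Made-up sample data: 1000 rows, one in every hundred marked as an error.
clicks = RemoteTable(
    rows=[{"id": i, "status": "error" if i % 100 == 0 else "ok"} for i in range(1000)]
)

# No pushdown: move everything, then filter on the local platform.
moved_all = clicks.scan()
local_result = [r for r in moved_all if is_error(r)]

# Pushdown: filter on the remote platform, move only what matches.
moved_filtered = clicks.scan_where(is_error)

print(len(moved_all), len(moved_filtered))  # prints "1000 10"
```

Both paths produce the same answer, but the pushdown path transfers 10 rows instead of 1000 and spends the filtering work on the remote engine's capacity rather than the local one.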
What does the roadmap look like for QueryGrid? What will you be doing next?
Mark Shainman: The first iteration that we announced at Partners is QueryGrid 2.0 with the connector to Presto. In Q4, we’re coming out with the QueryGrid connector to Hive, and then in the beginning of 2017 we’re coming out with a connector for Aster as well.
One of the other neat things coming out with the release of version 16 of the Teradata Database is our Adaptive Optimizer. With that, when you execute a query, the Optimizer will have statistics from, for example, the Hadoop platform, and it will be smart enough to determine whether it is best to push the join down into Hadoop and do the join there rather than doing the final join in Teradata. Or it might determine that there is a small table in Hadoop and a large table in Teradata, so it queries the data in Hadoop, brings the smaller table back and then does the join in Teradata. So it’s a smart Optimizer that’s aware of statistics that exist in these other platforms. And that’s coming in the future as well.
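The join-placement decision described above can be sketched as a toy heuristic. This is an assumption-laden illustration, not Teradata's optimizer: `TableStats` and `choose_join_site` are hypothetical names, and a real cost-based optimizer would weigh far more than row counts.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table statistics available to the optimizer."""
    platform: str   # where the table lives, e.g. "hadoop" or "teradata"
    row_count: int  # row-count statistic for that table


def choose_join_site(left: TableStats, right: TableStats) -> str:
    """Return the platform on which the join should run (toy rule)."""
    # Both tables on the same platform: push the whole join down there.
    if left.platform == right.platform:
        return left.platform
    # Otherwise move the smaller table to wherever the larger one lives,
    # so that as little data as possible crosses between platforms.
    larger = left if left.row_count >= right.row_count else right
    return larger.platform


# Both inputs live in Hadoop: the join is pushed down into Hadoop.
print(choose_join_site(TableStats("hadoop", 5_000_000),
                       TableStats("hadoop", 20_000)))

# Small Hadoop table, large Teradata table: bring the small table
# back and join in Teradata.
print(choose_join_site(TableStats("hadoop", 20_000),
                       TableStats("teradata", 5_000_000)))
```

The point of the sketch is only that the decision depends on statistics from both platforms. Without row counts from the remote side, the optimizer could not tell which table is cheaper to move.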
This really gives a boost to query performance.
Mark Shainman: It does!
That’s great, Mark. Thank you for taking the time to talk about QueryGrid.