GPU Technology for Real-Time Analytics on Large and Streaming Datasets
by Ron Powell
Originally published June 21, 2017
Ron Powell, independent analyst and expert with the BeyeNETWORK and the Business Analytics Collaborative, interviews Amit Vij, CEO of Kinetica. They discuss how their GPU-accelerated database is being used for real-time analytics on big data.
Your product is a GPU-accelerated analytics database for real-time insights into large and streaming datasets. Can you start by telling us the difference between a central processing unit (CPU) and a graphics processing unit (GPU)?
Amit Vij: A CPU is good for general-purpose computing, where it is responsible for running the operating system and managing the input and output of the computer’s components. It executes a large, flexible instruction set in a largely sequential fashion. The GPU is highly specialized for running simpler operations in a massively parallel fashion, which is very good for graphics processing. Now the GPGPU, the general-purpose graphics processing unit, is used in data centers for running SIMD operations (single instruction, multiple data) in a highly parallel fashion.
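The contrast between sequential and SIMD-style execution described above can be sketched in a few lines of Python. This is only a conceptual illustration, not GPU code: the function names and the `lanes` parameter are assumptions made for this example, and the "lanes" are emulated with list slices rather than real hardware.

```python
from typing import List, Tuple


def scalar_add(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """Sequential style: one addition is issued per element."""
    out: List[float] = []
    steps = 0
    for x, y in zip(a, b):
        out.append(x + y)
        steps += 1  # one step per data element
    return out, steps


def simd_style_add(a: List[float], b: List[float], lanes: int = 4) -> Tuple[List[float], int]:
    """SIMD style (emulated): one 'add' is issued per group of `lanes` elements.

    On real hardware the elements of a group are processed at the same time;
    here the grouping only models how many instructions would be issued.
    """
    out: List[float] = []
    steps = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
        steps += 1  # one step covers up to `lanes` data elements
    return out, steps


a = list(range(10))
b = [10] * 10
scalar_result, scalar_steps = scalar_add(a, b)
simd_result, simd_steps = simd_style_add(a, b, lanes=4)
```

Both functions produce the same result; the difference is in how many instruction steps are needed, which is the property a GPU exploits with thousands of cores.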
Could you tell our audience how Kinetica got started?
Amit Vij: We were incubated within the U.S. Army Intelligence Command and the NSA. We started as a real-time geospatial and temporal computational engine that slowly matured into a highly available, distributed database that is GPU accelerated. The project that I was on was called the Brain Program. Essentially, the mission of the project was to kill or capture terrorists in real time. We were moving from document-based search to entity-based search, and from a traditional relational database to open source technologies, while ingesting more than 200 different streaming data feeds. These included drones tracking every asset that moves at 30 frames per second; mobile devices emitting their metadata every few seconds; social media, including Twitter, Facebook and other types of data feeds; and also cyberdata. Cyberdata is probably the most difficult data feed to process in real time because it is so large, coming in so fast, never stopping and always changing. We were trying to find that needle in the haystack.
Our project had an almost unlimited budget. We had enterprise licenses of Oracle and SAP, hundreds of nodes of Netezza, and acres of Hadoop servers. We tried just about every flavor of NoSQL database. We started with Cassandra. We liked Cassandra very much as it could handle our large data feeds, but because it offers only eventual consistency, we migrated to HBase. We liked HBase very much, but it had a number of security holes. So our military project leveraged Accumulo, which was created by the NSA and later donated to the Apache Software Foundation (ASF). Also from the NSA came NiagaraFiles, which the NSA transferred to the ASF and which became Apache NiFi. We were one of the few commercial technologies born under military leadership. It started as a geospatial and temporal computational engine and slowly evolved into a highly available and distributed in-memory database that is accelerated by GPUs.
How would a typical enterprise use GPU technology today?
Amit Vij: NVIDIA's GPU technology is useful today if an organization is looking to get real-time analytics from large payloads of big data feeds that are always streaming in. Large organizations that operate within a country or globally often find that their data sizes get very large as well. To be able to leverage today’s hardware and have over 4,000 cores of a GPU to run analytic and machine learning workloads in a parallel fashion – and be able to Gigathread per node rather than multi-thread per node using CPU architectures – we feel the GPU is ideal.
Many organizations are starting to use machine learning and deep learning. We have several financial customers using TensorFlow, Caffe and Torch to do various types of analytics related to risk and fraud. In the past, these technologies were used by Google and Facebook to do image recognition to see who a person is in a picture. These technologies are maturing very quickly and becoming more and more useful for enterprises across multiple verticals.
Could you provide some examples of how your customers are benefiting directly with a GPU-accelerated database?
Amit Vij: Customers are running very fast OLAP, or analytical processing, on large data feeds and getting reports in sub-second time instead of waiting minutes or hours, or relying on nightly reports. Customers also can accelerate their Tableau dashboards using Kinetica. Our technology can displace technologies that leverage scale-up hardware; we leverage scale-out architectures.
Another big use case for our customers is for converging machine learning and deep learning into a database running in-database analytics. An organization can register, for example, a TensorFlow library, run a machine learning model as a user-defined function from our database, and execute that upon a data table that may be several billion records in size. They then can see millions to billions of breadcrumbs from sensors that are flowing in today’s world and correlate a person or any type of asset, their location and time, and not only do historical but also real-time analysis on that data. We’re seeing that as an easy workflow for an organization to use.
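The workflow described above (register a model as a user-defined function, then execute it against a large table) can be sketched as follows. This is a minimal illustration under stated assumptions, not Kinetica's API: `UDF_REGISTRY`, `register_udf`, `execute_udf` and `toy_risk_model` are hypothetical names, the table is a list of in-memory shards, and a simple weighted sum stands in for a trained TensorFlow model.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Udf = Callable[[List[Row]], List[float]]

# Hypothetical stand-in for a database's UDF catalog (not a real Kinetica API).
UDF_REGISTRY: Dict[str, Udf] = {}


def register_udf(name: str, fn: Udf) -> None:
    """Store a function under a name so it can be invoked against a table later."""
    UDF_REGISTRY[name] = fn


def toy_risk_model(rows: List[Row]) -> List[float]:
    """Placeholder for a trained model: scores each row with a fixed weighted sum."""
    return [0.7 * (r["amount"] / 1000.0) + 0.3 * r["velocity"] for r in rows]


def execute_udf(name: str, shards: List[List[Row]]) -> List[float]:
    """Run the registered function shard by shard, so the data never leaves the 'database'.

    A distributed engine would process the shards in parallel across nodes;
    this sketch visits them one after another.
    """
    fn = UDF_REGISTRY[name]
    scores: List[float] = []
    for shard in shards:
        scores.extend(fn(shard))
    return scores


# A tiny two-shard "table" standing in for billions of records.
table_shards = [
    [{"amount": 1000.0, "velocity": 1.0}, {"amount": 2000.0, "velocity": 0.0}],
    [{"amount": 0.0, "velocity": 2.0}],
]
register_udf("risk_score", toy_risk_model)
scores = execute_udf("risk_score", table_shards)
```

The design point is that the function travels to the data rather than the data being exported to a separate analytics system, which is what makes in-database analytics attractive for tables of this size.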
Route optimization is another use case. The largest logistics company is able to save millions of gallons of gas because they can do this in real time.
In the automotive industry, customers are collecting sensor data from their cars, correlating it and running analytics to make their automobiles run more efficiently.
In the financial industry, we see fraud and risk modeling using the GPU to do so in real time. Where normally they execute on a small delta window, by using a GPU database they can execute on the entire data corpus and take into account more historical data and also the real-time data that is coming in.
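The difference between scoring against a small delta window and scoring against the entire data corpus can be illustrated with a toy example. The transaction amounts, the z-score rule and the threshold below are all assumptions made for this sketch; real fraud and risk models are far more elaborate.

```python
from statistics import mean, stdev
from typing import List


def is_outlier(value: float, reference: List[float], threshold: float = 3.0) -> bool:
    """Flag `value` if it lies more than `threshold` standard deviations from the reference mean."""
    z_score = abs(value - mean(reference)) / stdev(reference)
    return z_score > threshold


# Illustrative (made-up) transaction amounts for one account, oldest first.
full_history = [20, 25, 22, 30, 500, 480, 510, 24, 26, 21, 23, 25]
delta_window = full_history[-5:]  # only the most recent activity
new_transaction = 495

flagged_with_window = is_outlier(new_transaction, delta_window)
flagged_with_full_history = is_outlier(new_transaction, full_history)
```

With only the recent window, the new transaction looks anomalous; with the full history, which includes earlier transactions of similar size, it does not. Taking more historical data into account changes the verdict, which is the benefit described above.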
In telecom, our technology provides real-time analytics on dropped calls and helps companies determine where to put cell towers to get better signal quality.
Customers in retail, supply chain management and logistics all use Kinetica. We work with Fortune 10 and Fortune 50 companies in the area of industrial IoT (IIoT) because they have so much sensor data coming in. We have major manufacturers with thousands of parts sensors emitting their readings. These are humongous, high-velocity data feeds. The GPU is really helping them to brute-force compute in real time and get immediate results.
How does the cloud factor into all of this?
Amit Vij: We have partnerships with all three major players. Amazon was the first one to have GPU instances, in Q4 of 2016. Microsoft Azure and Google Cloud Platform both have their GPU instances live now as well. We have customers running our product in the cloud. One example is a customer who was running a rack of Teradata for various types of analytics. By pushing their data into the cloud, they can run it on two nodes. GPU instances in the cloud are a very turnkey solution for organizations, at low cost and with low overhead for maintaining such servers.
Is Kinetica open source?
Amit Vij: We’re taking a different approach to this. When we look at the open source community, many times you have to duct tape together five to ten different technologies. I see organizations taking months, if not years, to put these technologies into production as they’re all on different release cycles and the development process for each is different. We’re trying to be more of a complete solution where we function as the data warehouse and you can run your analytics within our database. Then, by using GPUs, we can even render pictures and videos that are generated on the fly.
With a very clean and simple solution like Kinetica, adoption is very easy for a major enterprise. We have ODBC and JDBC connectors, so you can push data in and take data out simply through SQL.
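The push-in, pull-out pattern through SQL looks roughly like the sketch below. To keep the example self-contained and runnable, it uses Python's built-in `sqlite3` module as a stand-in database; it does not connect to Kinetica, and the `sensor_readings` table and its columns are made up for illustration. With an ODBC connector, the same cursor-based pattern would typically be used against a real connection string.

```python
import sqlite3

# Stand-in database: an in-memory SQLite instance, NOT a Kinetica connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Push data in through SQL (hypothetical table and columns).
cur.execute("CREATE TABLE sensor_readings (sensor_id TEXT, reading REAL)")
cur.executemany(
    "INSERT INTO sensor_readings (sensor_id, reading) VALUES (?, ?)",
    [("a", 1.5), ("a", 2.5), ("b", 10.0)],
)
conn.commit()

# Take data out through SQL: an aggregate per sensor.
cur.execute(
    "SELECT sensor_id, AVG(reading) FROM sensor_readings "
    "GROUP BY sensor_id ORDER BY sensor_id"
)
averages = cur.fetchall()
conn.close()
```

Because the interface is plain SQL, existing tools that already speak ODBC or JDBC can read and write the data without custom integration code.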
If someone were going to implement your technology, where would they start?
Amit Vij: Find us on the web at Kinetica.com, fill out your information, and we’d love to talk to you.
Thank you, Amit, for discussing with us how GPUs are accelerating the use of analytics across a broad range of use cases and having a profound impact within a wide variety of industries.
Copyright 2004 — 2017. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC