Enabling Big Data Analytics Inside of Hadoop: A Spotlight Q&A with Justin Borgman of Hadapt
by Ron Powell
Originally published September 11, 2012
This BeyeNETWORK Spotlight features Ron Powell's interview with Justin Borgman, Co-Founder and CEO of Hadapt. Justin explains how Hadapt integrates SQL and Hadoop in a single unified platform, with no connectors, added complexity, or rigid structure, and he shares the resulting benefits of that approach.
Justin, "big data" is a term we hear a lot these days. What do you feel is fueling this big data trend?
Justin Borgman: I believe the primary driver for big data is the pursuit of value (which, coincidentally, is the one “V” left out of the traditional big data definition) for businesses and customers alike. Organizations are inundated with the big data concepts of volume, variety, and velocity, which are all relevant and important; however, I believe the real driver is the unprecedented value organizations are finding by leveraging emerging analytic paradigms. To be more specific, part of what makes big data so compelling and such an interesting concept for companies is that they can now get a more holistic view of not only their customers but also their businesses by leveraging those concepts. Legacy technologies are struggling to keep up with both the size and complexity of big data, and that's driving innovation in a lot of areas and creating opportunities for new technologies that can deal with this emerging analytic paradigm.
I know that when we started with data warehousing a long time ago, we talked about creating a single view of the customer. I don't think we ever envisioned that it was going to go beyond data warehousing. When I look at big data, it is definitely adding new elements and providing a more holistic view of the customer. Now Hadoop has gained more and more mindshare in the context of big data. What makes Hadoop so disruptive?
Justin Borgman: It's a completely different way of thinking about how to handle your data. In the traditional world of data warehousing, there were so many rigid constraints: you would have to predetermine your sources of data, know ahead of time what kinds of questions you wanted to ask of that data, determine the schema and pre-processing steps to prepare the data for your warehouse, and then load it into the warehouse. Hadoop allows you to do things a little bit differently. First, Hadoop is a wonderful storage platform, an alternative to expensive network-attached storage and other available options. You can store massive amounts of data in your Hadoop cluster very cost-effectively, and that's driving people to store data that they might otherwise have eliminated during the ETL process. You can now hold onto all of that raw data and do analytics across larger, more diverse data sets. I think Hadoop is disruptive from a cost standpoint because it allows you to store data cheaply, and it is also disruptive in terms of the kind of data that you can store. You can store both structured and unstructured data in this platform and decide later what kinds of questions you want to ask about it. That allows you to be more flexible about keeping larger sets of data for longer periods rather than having to be very judicious upfront about which data you decide is valuable; in many cases, the previous methodologies precluded you from asking the second, third, and fourth follow-on questions due to a lack of detail-level data. With Hadoop, these limitations are alleviated.
Hadoop has taken a lot of flak because it requires a skill set that only the most advanced Internet companies possess. What is Hadapt doing to make Hadoop easier to use in the enterprise?
Justin Borgman: I think that many Fortune 1000 companies are challenged by how to work with data inside Hadoop. They understand the value proposition of the infrastructure, and they're excited to use Hadoop. The promise of Hadoop is very compelling, but when they try to implement it, they struggle with how their business analysts will be able to interact with data inside Hadoop. Those analysts are not programmers, and they don't know how to write MapReduce jobs. Hadapt makes Hadoop more consumable and more useful by bringing an SQL interface to Hadoop. SQL, of course, has been the language of choice for 30 years. It’s the language that everyone knows, and it is the language that allows people to use business intelligence tools to interact with the data. With Hadapt, the business analyst who either knows SQL or knows how to use business intelligence tools can now access data inside Hadoop. The enterprise doesn’t have to try to hire extremely difficult-to-find programmers who understand how to write MapReduce jobs. Hadapt makes this cool new technology work with the investments and the skills that an enterprise already has. That's part of what we do.
Is Hadapt similar to HBase or Hive?
Justin Borgman: It is more similar to Hive than HBase. There is a lot of confusion about the best uses for those technologies. HBase is really better for short-request processing (more key-value / operational in nature) and is a complementary technology to Hadapt. Hive is designed to be an SQL interface for Hadoop; in that sense, it is more similar to what we do, but Hive is lacking in terms of performance and SQL completeness. Many people trying to use Hive become quickly frustrated because it's very slow and doesn't allow them to get the desired connectivity to business intelligence tools. Hive simply doesn't support all the SQL that's necessary.
How does Hadapt help in that situation?
Justin Borgman: Hadapt has built its own SQL interface that supports more of the SQL language, allowing greater connectivity to SQL investments. That includes SQL applications that you may have built internally. It also includes business intelligence tools, such as MicroStrategy, Tableau, Cognos, or BusinessObjects; with Hadapt, you can now connect these tools to your Hadoop cluster to do business intelligence. It also means that you can write ad hoc SQL queries. This does not have to be a batch-oriented system anymore. You can actually sit at a terminal, write an SQL query, get a response fairly quickly, and continue to iterate on that data to understand it better.
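From an analyst's point of view, that ad hoc workflow looks like ordinary SQL querying through a standard database interface. A minimal sketch of the idea follows, using Python's built-in sqlite3 as a stand-in for an SQL-on-Hadoop connection (a real Hadapt cluster would be reached through its JDBC/ODBC driver; the table and column names here are hypothetical):

```python
import sqlite3

# Stand-in for an SQL-on-Hadoop connection; sqlite3 substitutes for the
# cluster's driver purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id INTEGER, page TEXT)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)",
                 [(1, "home"), (1, "checkout"), (2, "home")])

# An ad hoc query: no batch job to submit, just SQL returning rows
# interactively, ready for the next iteration of the question.
rows = conn.execute(
    "SELECT page, COUNT(*) AS visits FROM clicks GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('checkout', 1), ('home', 2)]
```

The point is the interaction model: each refinement of the question is just another query, not another MapReduce program.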
Excellent. Can you give us examples of how some of your early customers are using Hadapt?
Justin Borgman: In our early customer program, we had a lot of customers trying to get a better understanding of consumer behavior. We call this consumer behavior analytics, trying to understand how consumers behave across multiple channels. How does an email campaign or an online banner ad campaign influence both online and off-line activity? The ability to answer these questions enables them to have a more complete picture of their customers. This type of analysis requires correlating various data sources – whether it's clickstream data from the website, email data, or point-of-sale data from brick-and-mortar stores. With Hadapt, they can bring that all together and draw correlations across the data.
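The kind of cross-channel correlation described above reduces to an SQL join once the sources live in one platform. A minimal sketch, again using sqlite3 in place of a Hadoop-backed SQL engine, with hypothetical schemas for an email campaign and point-of-sale data:

```python
import sqlite3

# Hypothetical schemas for two channels; in practice these would be large
# tables inside the Hadoop cluster, queried through the SQL layer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_campaign (customer_id INTEGER, opened INTEGER)")
conn.execute("CREATE TABLE pos_sales (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO email_campaign VALUES (?, ?)",
                 [(1, 1), (2, 0), (3, 1)])
conn.executemany("INSERT INTO pos_sales VALUES (?, ?)",
                 [(1, 40.0), (1, 10.0), (3, 25.0)])

# Correlate email engagement with in-store spend per customer: did the
# customers who opened the campaign email spend more at the register?
result = conn.execute("""
    SELECT e.opened, SUM(s.amount) AS total_spend
    FROM email_campaign e
    JOIN pos_sales s ON s.customer_id = e.customer_id
    GROUP BY e.opened
""").fetchall()
print(result)  # [(1, 75.0)]
```

Because the inner join only matches customers present in both channels, the one customer who never opened the email and never bought in-store drops out, leaving the opened-email cohort's total spend.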
We've also seen interesting use cases that combine structured and unstructured data along with full text search, which is a capability that we feel we uniquely provide. Our customers can do things like call center analysis to understand why people call them and draw correlations between the recorded transcripts of a call and the very structured data that gets captured when every call is logged. We've added parallel full text search capability to the product, which has allowed some customers to use Hadoop and Hadapt as an email archival and retrieval system (in support of e-discovery use cases, for example). In the event of legal discovery, they can very quickly search through hundreds of terabytes of emails and gather the ones that they need. We're seeing broad appeal for the platform in a number of different areas.
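Mixing a structured predicate with a text predicate in one query is the essence of that call-center use case. In the sketch below, a plain SQL LIKE filter stands in for Hadapt's parallel full-text search, and the call-log schema is invented for illustration:

```python
import sqlite3

# Call logs: structured fields plus free-text transcripts in one table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (call_id INTEGER, duration_sec INTEGER, transcript TEXT)")
conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", [
    (1, 120, "customer asked about a billing error on the latest invoice"),
    (2, 300, "caller reported the mobile app crashes on login"),
    (3, 90,  "question about store hours"),
])

# One query combines a structured predicate (call duration) with a text
# predicate (topic); LIKE approximates a full-text search here.
hits = conn.execute(
    "SELECT call_id FROM calls "
    "WHERE duration_sec > 100 AND transcript LIKE '%billing%'"
).fetchall()
print(hits)  # [(1,)]
```

A dedicated full-text engine would add ranking, stemming, and parallel scan, but the query shape, structured columns and text filtered together, is the same.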
How about social media? Are you seeing Facebook, Twitter, or anything else from that area being involved?
Justin Borgman: We certainly see companies use data from those sources to correlate it with other data that they have. That's another case where the combination of structured and unstructured data is really helpful. A lot of that social media data tends to be lacking in structure, which makes it difficult to work with in a traditional database. However, leveraging the power of Hadoop and the full text search that we built into our product, our customers can really gain a better understanding of what's going on with social media and perhaps how that's impacting sales and activity on their websites. Those are definitely interesting data sources that more and more companies, especially retailers, are trying to capture and understand.
Justin, who does Hadapt compete with? Who do you see as your biggest competitors?
Justin Borgman: There's no one else out there that has actually integrated SQL with Hadoop. We are the only database inside of Hadoop, and that’s the innovation that really spawned the creation of the company. The most similar architecture is actually a combination of technologies. You'll see a lot of enterprises today that use the Hadoop cluster in conjunction with a database or data warehouse. They use what's known as a Hadoop connector, and so they do some things with unstructured data in that Hadoop platform and then move it – with the Hadoop connector – into their database so they can do business intelligence and deeper SQL analytics.
We think that two-system architecture is inefficient, unnecessary, and ultimately architecturally offensive. In fact, we believe you can perform all those analytics and provide SQL connectivity all within the Hadoop cluster. You never have to move the data around. When you're talking about terabytes of data, data movement is never a good thing. In addition, you can save a lot of money by simply consolidating in a Hadoop platform rather than maintaining two systems.
Another thing about Hadapt is just the simplicity of it. It's very complicated to do some things in Hadoop and then move them over into another system. Any time you want to work with structured and unstructured data, you're doing an elaborate multistep process across two architectures, and that adds a lot of complexity to the process. We simplify that greatly by bringing it all into one package.
Your value proposition, then, is the elimination of the ETL step of moving data back and forth. You reduce the time required to do analytics in Hadoop, is that right?
Justin Borgman: Our value proposition is much greater than improving ETL; we enable customers to bring their applications, toolsets, and skills to the burgeoning Hadoop ecosystem and interact with it via SQL. There are many operational efficiencies that come along with the integration of SQL and Hadoop into a single, unified platform; however, the most powerful value proposition comes in the form of new analytic capabilities, a holistic view of both business and customers, and a well-understood common interface to this incredible ecosystem of Hadoop. We are enabling customers to execute analytics across multi-structured data and breaking down the perception that Hadoop is “just an ETL tool for a database” – our goal at Hadapt is to be the database for Hadoop.
Excellent. Justin, thank you for introducing our readers to Hadapt and your unique value proposition for using Hadoop.
Copyright 2004 — 2020. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC