IT Insights from Machine-Generated Big Data: A Q&A Spotlight with Sanjay Mehta of Splunk

Originally published April 26, 2012

BeyeNETWORK Spotlights focus on news, events and products in the business intelligence ecosystem that are poised to have a significant impact on the industry as a whole; on the enterprises that rely on business intelligence, analytics, performance management, data warehousing and/or data governance products to understand and act on the vital information that can be gleaned from their data; or on the providers of these mission-critical products.

Presented as Q&A-style articles, these interviews conducted by the BeyeNETWORK present the behind-the-scenes view that you won't read in press releases.

This BeyeNETWORK spotlight features Ron Powell's interview with Sanjay Mehta, Vice President of Product Marketing at Splunk. Sanjay and Ron discuss machine-generated big data and the insights it can provide.

Sanjay, one of the sources of big data is machine-generated data, and it's undoubtedly the fastest growing, most complex, and most valuable segment of big data. Can you explain machine-generated data and where it comes from?

Sanjay Mehta: Absolutely. Machine-generated data, or machine data for short, is all the data generated by the IT systems that power the enterprise. This includes live data from packaged and custom applications – for example, app servers, Web servers, databases, networks, virtual machines, telecom equipment, and much more. Every activity that occurs in the infrastructure leaves a trace in the machine data. Consequently, that data contains the definitive record of customer behavior, user transactions, application behavior, service levels, and so on. It is incredibly valuable data.

One area where machine-generated data is getting more important is service level agreements. The ability to monitor SLAs and assist in controlling them is crucial to all organizations. Is that also a concern for your customers?

Sanjay Mehta: As more services get digitized, quality levels are becoming increasingly important. The thing with machine data is that a service touches multiple devices, applications, networks, and so on along the way from the service origination point to the service consumption point. So it's really only in the machine data where you can get the historical perspective about exactly what's been going on from the origination point to the consumption point. This data is useful not just for service levels, but also to start analyzing exactly how the service is being consumed, by whom, where, and at what time of day. You can start slicing and dicing it across many different vectors. That's the other interesting aspect of machine data.

How else can the machine-generated data be analyzed and what other insights can you derive from it?

Sanjay Mehta: Making use of machine data can provide significant benefit to many different aspects of the enterprise. The value of the data is across the entire enterprise, so I'll give you just a few examples:
  • Transaction monitoring for online businesses providing 24/7 operations
  • Web activity and Web access usage data to gain customer intelligence, understand capacity usage and track digital assets
  • Service level monitoring to support all internal SLAs
  • Call and event detail records that are generated by telecom equipment to uncover keys to more profitable services for communication service providers
  • Mobile data to better understand customer location and behavior
  • Monitoring social media networks for sentiment analysis and to spot trends
  • Map and visualize threats and area behavior patterns to improve security
What I provided is a range of very diverse examples. These examples illustrate that there are many points of value where the data is in context with the business or what the business needs. As I mentioned, security mapping and visualization is an excellent example of the use of machine-generated data to not only look for known threats, but also for unknown threats to be able to respond to them in real time.

Another example is being able to see customer behavior as it's occurring when launching new services. And then, finally, there is the ability to see all the infrastructure and ensure that everything is running correctly so that downtime is eliminated, or at least minimized, and services are offered at the highest level of quality.

Sanjay, another major trend is "big data." What areas within an enterprise can now benefit from a big data solution like Splunk?

Sanjay Mehta: What we're finding is that the value of the data is applicable enterprise wide. For example, in the IT group, watching or monitoring machine data for patterns or specific issues in real time helps maintain service levels and reduce escalations if an issue does occur. Typically what's happened in the past is that if something does actually happen and they need to resolve it quickly, the different silos that exist within an IT group look within their specific area of applications and data, prove their innocence and pass it to another group for further investigation. That really introduces the notion of human latency and takes lots of time. Splunk provides the ability to look at and navigate all of your machine data from one place and reduces that time dramatically – from many hours to often just a few minutes. We hear that a lot throughout our customer base.

But the same data that is used to troubleshoot and reduce mean time to resolve issues is also valuable in other areas of the business. You can enrich it, for example, with data from a more traditional data warehouse or relational database management system to provide insights useful to the business. How is a new product or service performing in real time by device type, by time, location, or user type? We have those kinds of examples everywhere. In the telecom world, MetroPCS and Cricket Communications are two great examples of companies in North America that are combining call detail records with other data to get a view on revenue and cross-carrier charges. Other huge online businesses, like Expedia and Macys.com, are using their data to not only keep systems running, but also to get intelligence on the consumption of services being deployed across their infrastructures. A final example is harnessing machine data or machine-generated big data, as I like to refer to it, for security and compliance, looking for known threats, unknown threats, and being able to improve overall security.

For our readers who are strategically approaching big data, how would they get started?

Sanjay Mehta: The good news is Splunk grows incrementally on commodity hardware as an organization's data needs grow and develop. There's no need to try to boil the ocean upfront. You can start somewhere; and as the value of the data becomes more obvious, you can grow these systems incrementally. That's the first point. But I would say that in terms of strategically approaching big data, it's really about focus. Focus first on the low-hanging fruit – what you want to get from the data. Socialize with other groups in the organization to understand their data needs. Splunk can be implemented quickly, and you can connect it to diverse data sources. You can start browsing and analyzing your data very quickly, and then pinpoint and identify uses of that data. By focusing on that and proving the value, you can then come up with a broader strategy.

You mentioned low-hanging fruit, and we talked about telcos and a number of other industries. If a company was about to deploy a big data solution like Splunk, would you be able to come in and help a telco figure out how to begin?

Sanjay Mehta: One of the foundational principles for our product was to make Splunk as easy to use as possible. It's available as a free download from our website. Anyone can download it, and it's very easy to use. But in the event there is a complex environment, certainly we can provide support and help in best practices through our professional services organization.

One thing I would say is we focus very heavily on community with Splunk. We have many examples of users who download the product just for themselves, but then start bridging use cases beyond their own world into other areas and start interacting with business users and providing intelligence. With our community, we have the opportunity for our partners and our users to build apps that combine queries, research, visualization, and workflows for specific use cases and specific technologies and then post them on a website so that the app can be available for anyone to download. We support the ones that we post. We find this really provides vision of use. It lets people who may have just one focus for how the data can be used see all these other examples. There are a couple hundred apps out there right now.

So there are really three answers to your point. One is trying to make the product as simple to use as possible to gain a very quick "aha" experience when people download it and use it. Secondly, we have the community that is an excellent source of innovation and vision of different kinds of uses of the data. And the third is that we have lots of documentation online available for free, and lots of other support that we can provide as well.

I like how you support the community. Your community provides a vision and presents a wide variety of use cases. Seeing these use cases lets companies see additional ways they can benefit from Splunk. And then because you stand behind it, they obviously know that it's production ready. You've really created an open community.

Sanjay Mehta: We've done that because we don't want to limit innovation in any way. This community is extremely bright, innovative and entrepreneurial. Entrepreneurial users are gaining just incredible insights from the data and finding different ways to correlate it. So they can package it up as an app and post it. They also have forums and knowledge exchange sites where people discuss ideas – it's an incredibly interesting area. What we do is support it and provide a spot for that conversation.

Can you give us specific examples that stand out with your customers that are using Splunk? Which ones stand out for you?

Sanjay Mehta: What we find is that at the very beginning, the use case is often around infrastructure management or application management, so it's finding one place to search, navigate, analyze and monitor their systems to troubleshoot and keep things running. And we're tremendously successful in that area. We see dramatic improvements in uptimes, dramatic reductions in time to resolve issues and troubleshoot. That's typically where Splunk starts getting used. Once people start harnessing the data and start really looking at it, we find that these use cases evolve quite quickly. The first example is Expedia, where more than 90% of their machine-generated infrastructure data is consumed by Splunk. Splunk ingests it so it's available for analysis.

I provided an earlier example around keeping systems running and troubleshooting. That company is also now using Splunk to monitor and analyze different aspects of their online business. They have more than 3,000 users of Splunk in their environments, and it's pretty exciting to see their combination of uses both by IT and business users.

Another example is Salesforce.com, which is a big user of Splunk. They use it for maintaining the service levels and providing insight into those service levels. They also use it to monitor their own services like Force.com where you can host platform-as-a-service or infrastructure-as-a-service. People can host applications there, and Salesforce.com wants to be able to provide people with insights as to how things are running. They provide that information through Splunk.

Another example is NPR. They have a very interesting use case because they deployed a more traditional web analytics solution, but it wasn't giving them the data they were interested in. It was giving them page views and the understanding of the website, which Splunk can also provide because all that information is in the web server log and is machine data. But they also wanted to see all the way back from the media servers and other components that are delivering digital assets through the website. They wanted to be able to see all the way from there to where it was being consumed so that they can then reconcile that against royalty payments and see what's being consumed simultaneously, by whom and on what device. With Splunk, they can look at all these different things because it's all machine data and they can correlate it.

The final example is a new company called Optimizely. It was founded by a gentleman who looked after the online side of the Obama campaign. He realized there were things that he just couldn't get off the shelf. They have a very exciting and interesting business focused on A/B testing, where they host your website and you can try different A/B examples on the page, or on multiple pages, and they can provide you the performance of that in terms of where people are clicking. All of that data is being provided by Splunk, so that's a really great example of a business being built on big data – specifically machine data and Splunk.

Thank you, Sanjay. Obviously, it's a whole new frontier out there with Splunk and big data, and your customer examples are certainly intriguing.

  • Ron Powell
    Ron, an independent analyst and consultant, has an extensive technology background in business intelligence, analytics and data warehousing. In 2005, Ron founded the BeyeNETWORK, which was acquired by TechTarget in 2010. Prior to the founding of the BeyeNETWORK, Ron was cofounder, publisher and editorial director of DM Review (now Information Management). Ron also has a wealth of consulting expertise in business intelligence, business management and marketing. He may be contacted by email at rpowell@wi.rr.com.

    More articles and Ron's blog can be found in his BeyeNETWORK expert channel. Be sure to visit today!
