Data Integration That’s One Step Ahead: A Q&A with Ash Parikh of Informatica
by Ron Powell
Originally published December 27, 2012
This BeyeNETWORK Spotlight features Ron Powell's interview with Ash Parikh, Senior Director of Product Marketing at Informatica. Ron and Ash discuss data integration, business intelligence, operational BI and data virtualization.
Ash, what is your role at Informatica?
Ash Parikh: I'm the Senior Director of Product Marketing, leading product strategy around some of our emerging products, or what we like to call next-generation data integration technologies and tools. We call them next generation because we at Informatica continuously strive to bring innovations to the field of data integration; these innovations are based on both our highly differentiated vision and real needs, requirements, and requests from our customers. Since we are the data integration market leader, we pride ourselves not only on continuously improving our solid set of scalable and proven data integration products, but also on keeping one step ahead by helping our customers lower costs, reduce risk, and increase efficiency with our innovations. Bringing these innovations to the market is a big part of my role at Informatica.
The world of business intelligence (BI) continues to evolve. Can you talk about the changes you've seen in BI over the last few years?
Ash Parikh: Absolutely. It's a great question, Ron, and I think to lend a little bit of credibility here, I’d love to quote my friend at Forrester Research, Boris Evelson. He has a blog entitled “Top 10 Business Intelligence Predictions for 2012.” In that blog, Boris says:
“Demands by users of business intelligence (BI) applications to ‘just get it done’ are turning typical BI relationships, such as business/IT alignment and the roles that traditional and next-generation BI technologies play, upside down. As business users demand more control over BI applications, IT is losing its once-exclusive control over BI platforms, tools, and applications.”
The reason I am sharing this is that it is a very interesting observation. What Boris is actually telling us in his blog is the undercurrent that we're seeing in the market today. Business intelligence, analytics, visibility, or just having more intelligence about how the business is run or having a single view of the business has been top of mind for CIOs across the world every single year. Every year we hear that BI is a top priority for the CIO, over and over again.
But the theme changes, ever so slightly. I believe that BI will continue to be a top priority for the business on an ongoing basis because it's all about visibility and having a hand on the pulse of your business, your customers, and your operations in general so you can make decisions that are impactful and that can help you reduce cost, increase efficiency, and reduce risk. But this blog by Boris really lends credence to the fact that it is the business that really wants things faster – faster reports and faster insight. They're kind of putting a message out there to IT that if IT doesn't do it, they will do it.
We have a number of customers who are doing this. Recently, after my presentation at an industry conference, a BI manager came up and told me that his company uses shadow IT teams because they just can't wait for the reports that they need. This was not news to me. I had so often heard that the business wants the reports it needs and trusts now, while IT takes too long – often 3-6 months. As a result, the business often went around IT by leveraging their shadow IT teams to simply get the job done by creating custom, one-off “spread-marts.” In fact, the record is about 60,000 Access databases built by such shadow IT teams, so it’s quite possible these customers have sensitive data strewn across the enterprise. I'm sure somewhere out there is a number much larger than that, but you can clearly see that the business wants more self-service. However, IT won’t want to give up control.
But, I think there's a nice common ground that we can get to as far as enabling more agility in BI is concerned.
What are some of the challenges you're seeing with respect to agile BI and can you also reflect on self-service BI that you mentioned?
Ash Parikh: Absolutely. I can put this in context of a typical meeting that probably takes place in companies everywhere. You may have the C-level executive who stands up in the middle of the room at the end of a quarter, looking around the room at all his or her business managers, line of business people, saying, “I want a single view of the business now. I want a complete, current, and a trusted view of my business across all of our enterprise data sources. Don't tell me you're going to give it to me in three to six months. That's going to be too late. I want it now because I need to make decisions that are extremely critical, and I need to report to the Board. There are certain things that we just have to do, and I need that information right now. Tomorrow is too late.”
So if that is the case, then the call of the hour is urgency. The issues we are seeing are manifold. First, each line of business typically embarks on its own business intelligence project using whatever technologies and techniques they already have at their disposal – EAI, ESBs, ETL, data federation, hand-coding, and so on. Second, we typically see many BI tools, not just one, in an enterprise. So there's complexity at the consuming layer and there's also complexity at the data source layer. Third, there is significant backlog as far as IT is concerned, which means that any new request must take a number, and wait its turn. Next, data is everywhere, not just in a data warehouse – so it may not be readily available. Finally, on top of all that, everyone is trying to do business intelligence, as a silo within their own department or line of business, across the organization. So when the C-level executive asks for that single, current, complete and trusted view of the business, he or she is not going to get it, and certainly not in a couple of days or weeks. Additionally, the following challenges make sure that the problem runs really deep:
A key driver for enterprises from a competitive standpoint is to be agile. Is that also a major focus for Informatica?
Ash Parikh: I think you touched on something that is becoming increasingly important to businesses worldwide. It’s definitely about becoming more competitive and efficient, but there's also another angle that’s often forgotten. When people talk about agility, they often forget the productivity aspect. And the reason I say this is you don't want to keep reinventing the wheel. There should be a rich set of reusable integration logic that is readily available, which you should be able to use. You don't want to start hand-coding certain things. You want more of a metadata-driven approach, a zero-code kind of environment to do things faster.
It’s not just about agility for us. We think agility and productivity go hand in hand, and along with that, a very important piece of the puzzle is giving enough power to the business users and analysts, who truly know the data the best. That's also extremely important.
Imagine asking IT to go ahead and embark on requirements that only the business knows. Some of them may have some familiarity with the data, but by and large, IT doesn't have as good of a hold on business terms, on the data the business needs, or on business rules, etc. It’s best that this responsibility be left with the business user or the analyst. But at the same time when you look at a data integration process, it’s a multistep process. It starts with the business asking IT for a new report and outlining the requirements. But after that, there are a number of steps like integration, testing and deployment that IT has to go through.
And we must remember IT has a huge backlog. You're never the first in line. There's a whole queue of requests that IT has to balance at any point in time. So while ideally the business wants to sit next to IT and provide iterative input into the reports that they're asking for, that's not possible in today's day and age. And that's the problem. The business user says, “I asked for this report. First, you take four to six months. And the second point is what you gave me after four to six months is not what I asked for. It looks different, it's missing certain attributes that I think I require, and there are inaccuracies. An ideal situation would have been if we could have collaborated iteratively. I could have iteratively told you that I wanted to add a certain attribute or I wanted to do some checking on a particular set of values – and I wanted to do it in real time – sitting right next to you. That would have sped up the process.” Now that's something that is not addressed by BI tools, or by simple data federation.
So, for us, it’s about both agility and productivity. Both go hand in hand because they both help in accelerating report delivery. Informatica has enabled many customers to significantly increase agility and productivity in BI projects by up to five times.
You mentioned data federation. An area that we see a lot about is data virtualization. How is this different from data federation and what is Informatica’s data virtualization solution?
Ash Parikh: Before I get into the subject of data virtualization, I did want to point out a little bit of history. In the early part of this century, there was data federation, enterprise information integration, EII, etc., and these were the precursors to what we today know as data virtualization. It was the simple act of combining data across heterogeneous sources without physically moving the data, or leaving the data in the back-end sources as is, and then simply, at query time, sourcing that data or pointing to the data so as to report on it. It was all about creating a virtual layer and then reporting on that layer on an on-demand basis.
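As an illustration of the query-time combination described above, here is a minimal Python sketch. It is not Informatica's product or API: the CRM database, the billing CSV, and the `customer_revenue_view` function are all hypothetical stand-ins. The point it shows is that the data stays in its two sources and is only merged when the view is queried.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a relational system, here an in-memory SQLite database.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2 (hypothetical): a flat-file export, here a CSV held in a string.
BILLING_CSV = "customer_id,amount\n1,100.0\n1,50.0\n2,75.5\n"


def customer_revenue_view():
    """Virtual view: nothing is copied or staged; both sources are read at query time."""
    totals = {}
    for row in csv.DictReader(io.StringIO(BILLING_CSV)):
        cid = int(row["customer_id"])
        totals[cid] = totals.get(cid, 0.0) + float(row["amount"])
    for cid, name in crm.execute("SELECT id, name FROM customers ORDER BY id"):
        yield {"customer": name, "revenue": totals.get(cid, 0.0)}


report = list(customer_revenue_view())
```

Because the view is evaluated on demand, a customer added to the CRM source shows up the next time the view is queried, with no reload step.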
Now that's where the story ended. That's about all that these tools and technologies could do. They were SQL or XQuery code-heavy technologies designed for the developers, not business users or analysts. So again, remember all the challenges that I spoke about. You want to have more productivity, more self-service, and more business-IT collaboration. So, if that's the kind of tool you are providing for enabling fast or agile data integration, then that is not going to help.
Now, fast-forward ten years to 2009-2010, and there is a new term, data virtualization. Basically, data federation vendors simply poured old wine into a new bottle. These data federation technologies simply renamed themselves as data virtualization. There’s been a lot of hype, and people hoped there was something new here. But it was basically the same old data federation technology, the same tools, the same SQL and XQuery technologies put out there, with very little by way of innovation. A few things that were missing in the simplistic data federation approach have been improved or added, but they are still missing the key point. You still have a very developer-centric way of creating that virtual layer. It is a SQL or XQuery code-heavy way of developing that layer. Only Informatica takes a model-based and metadata-driven approach to developing this virtual layer that can be leveraged by both analysts and developers, using role-based tools that share common metadata, thus fostering instant collaboration.
The second point is that the data virtualization technologies based on data federation do not have any notion of how to do data quality. It’s not just about calling a third-party address cleansing Web service. You need a full-featured data quality technology that works within this virtual layer to cleanse, enrich, or even mask that merged information in real time, without the overhead of staging or post processing. Nobody does this comprehensively in the market except for Informatica. That is what we mean when we at Informatica talk about advanced data virtualization. You get the benefits of data virtualization. So, you don't move the data. You create a virtual layer, and in real time access and merge diverse data sources and apply complex business rules to the layer. You go against a common data access layer and query it, do on-demand analysis, etc. You let the business users or analysts access and merge diverse data, profile it, cleanse it, own the data and define the rules, while IT retains control and the deployment piece of the process. You also bring the sophistication of traditional data integration into the mix – by which I mean, physically materialize the virtual layer or view, if needed, with just a few clicks. That's our secret sauce. What we have done at Informatica is truly innovative. Informatica PowerCenter Data Virtualization Edition combines the sophistication of data integration with the agility of data federation, and that's why we call it advanced data virtualization.
Imagine the entire power of a proven and sophisticated data integration platform like ours, but now incorporated within this virtual layer. You get all the prebuilt advanced transformations that can be easily dragged and dropped into a graphical interface. There's no hand coding. You can graphically develop transformations and rules from scratch, or start using a rich prebuilt library of - and these are sophisticated advanced data integration transformations including data quality that we are talking about. They're not transformations that are only limited to SQL or XQuery.
In addition, we lead the market not just for data integration, but also for data quality. The entire power of our data quality platform is also brought to this virtual layer so it's not just simply going out there and connecting to an address standardization web service, as I mentioned earlier. It is bringing the whole palette of prebuilt data quality, cleansing, standardization, and matching kind of transformations to the mix. It does this on the fly, and that's what’s important and unique to Informatica. There's no staging. There's no post processing. We do these transformations on the fly on this virtual layer.
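To make the idea of cleansing on the fly, without staging or post processing, more concrete, here is a small Python sketch. It is only an illustration under assumed names: `source_rows`, `cleanse`, and the `STATE_FIXES` lookup are hypothetical and do not reflect how Informatica implements its transformations. Each row is standardized as it streams through the virtual layer, so no intermediate copy is written.

```python
# Hypothetical lookup table used to standardize state codes.
STATE_FIXES = {"ca": "CA", "n.y.": "NY"}


def source_rows():
    """Stand-in for rows arriving from a back-end source, with typical quality issues."""
    yield {"name": "  acme corp ", "state": "ca"}
    yield {"name": "Globex", "state": "N.Y."}


def cleanse(rows):
    """Standardize each row as it passes through; nothing is staged or post-processed."""
    for row in rows:
        name = " ".join(row["name"].split()).title()  # trim and normalize spacing and case
        state = STATE_FIXES.get(row["state"].lower(), row["state"].upper())
        yield {"name": name, "state": state}


virtual_rows = list(cleanse(source_rows()))
```

A generator pipeline is used here because it processes one row at a time, which mirrors the "no staging" point: the cleansed result exists only as it is consumed.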
So again, if you're keeping count here, you're actually speeding things up. With on-the-fly profiling and cleansing, you're taking away any post processing. And we are metadata-driven and provide a zero code environment. We offer role-based tools for both business and for IT. And the key part here is all these tools share common metadata. So for example, Ron, if you were the business user, the analyst, and I was the IT developer, any change you made in your web browser analyst tool, I would immediately see in my developer-centric tool, without loss in translation. That brings us back to the self-service aspect. And, in fact, we call it managed self-service because we allow the businessperson to do as much as possible such as combining data, creating that virtual layer, doing profiling on that virtual layer, and applying these advanced transformations if they are available, without waiting for IT or IT’s help. And if there are transformations that need to be created, it is a simple step of writing an annotation within the web browser tool, and the IT person sees it immediately in his tool and goes ahead and creates those advanced transformations. There's no time lost, no waiting, no paper pushing. It's all done through these collaborative and iterative techniques.
So as you can see, we take time out of the equation at every stage in a data integration process. And to take this further, when ready, the business analyst says, “This is how the virtual layer should look. I've profiled it, and I've validated that the data looks exactly how I want it. I've made sure that the attributes that I want in the report are there in the virtual layer, so now please go ahead and deploy it.”
The IT person can then immediately deploy this for BI or instantly reuse it with a single click for a Web service to serve a portal. Also, here is the bit that is truly unique. You're not going to throw away a data warehouse; you're not going to be doing away with ETL, right? We allow you to co-exist and decide if you want to prototype and jumpstart your project using advanced data virtualization. And with a single click, move that virtual view into our ETL tool, which is called PowerCenter. You get the best of both worlds. You can jumpstart projects and prototype using our data virtualization capability. And if you want to persist, you can go ahead and, with a single click, move into PowerCenter. There is no retraining. The same skillset knows how to do both.
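The difference between querying a virtual view and persisting it can be sketched with plain SQLite from Python. This is a generic illustration, not PowerCenter: the `orders` table, the `sales_by_region` view, and the `sales_by_region_mat` table are hypothetical names. The same view definition is first queried live and then materialized into a physical table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'west', 10.0), (2, 'west', 5.0), (3, 'east', 7.0);
    -- Virtual: only the definition is stored, rows are computed at query time.
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total FROM orders GROUP BY region;
""")

QUERY = "SELECT region, total FROM {} ORDER BY region"
live_before = db.execute(QUERY.format("sales_by_region")).fetchall()

# Persisted: the same definition is materialized into a physical table.
db.execute("CREATE TABLE sales_by_region_mat AS SELECT * FROM sales_by_region")

# A later change in the source is visible in the view but not in the snapshot.
db.execute("INSERT INTO orders VALUES (4, 'east', 3.0)")
live_after = db.execute(QUERY.format("sales_by_region")).fetchall()
materialized = db.execute(QUERY.format("sales_by_region_mat")).fetchall()
```

The view always reflects the current source data, while the materialized table is a snapshot that has to be refreshed by a load process, which is the trade-off between prototyping virtually and persisting through ETL.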
It is this same combination of data integration and data federation in a single environment that is also helping many of our customers to jumpstart their data warehousing projects with rapid prototyping. They leverage our advanced data virtualization solution for both agility and productivity, as I have described, but then also for flexibility – they can rapidly prototype first, get the requirements right upfront rather than after the fact, validate the results and then deploy the solution instantly.
Finally, there is a reason why data federation solutions don’t make it into the Gartner Data Integration Magic Quadrant report. I always say that a one-trick pony that does only data federation, even though it may be renamed as data virtualization, won’t cut it in the enterprise. One has to realize that data federation on its own, however compelling a technology it may be, will continue to struggle to gain widespread adoption. Gartner Research has put out a number of reports on what they call the logical data warehouse which only validates this thought process.
We discussed agile BI. What is operational BI? What are some of the challenges your customers are facing with operational BI?
Ash Parikh: Operational BI is a very interesting realm. I hear a lot of talk around operational and real-time business intelligence. There are a few things that I ask people when I discuss agile business intelligence. When I talk to them about how long it takes to get reports from IT, I also ask them how important it is not to directly hit their operational systems. And many times, they put their hands up and say that is extremely important to them. Their IT team is not going to allow them to report off or directly access their operational systems.
But as a business user or a C-level executive, you absolutely want to have insight into the most up-to-date or current information available from your operational systems. Data virtualization is a nice technique to bring in various forms of data regardless of their format or location. But if you combine the concept of data virtualization with the ability to also access operational systems without impacting them, you have a winning combination. And that takes us into the realm of data virtualization and data replication working together to enable operational BI.
How is Informatica uniquely addressing operational BI with data replication?
Ash Parikh: The challenges that I just spoke about are the challenges that we take care of with our data replication technologies. So I'll tell you a little bit about operational BI first. We use our data replication capabilities to create a replicated source or a copy of systems that you don't want to go against. And all this is done in a non-invasive way. The technology supports heterogeneity. Basically, we can connect to any data source or type of data that's out there, and we do it in a high volume, high performance mode. That replication is very quick, and people love it because they're getting this non-invasive way of replicating information.
Now when you have created that copy, you can use that copy as just another data source to your virtualization layer. So what you have now is you're using virtualization for agile data integration, and you can serve your virtual view or your virtual table for reports. But that virtual view now includes source data from the operational systems as a result of having created that replicated copy. You're getting the best of both worlds.
You asked how Informatica is unique from a data replication perspective. First, I think that has a lot to do with Informatica in general. We are a non-database company so we're database agnostic. If you look around, the technologies that are out there for data replication typically are database-centric. Obviously, if you had your own database, you'd optimize for your own database environment. Having said that, typically large enterprises have multiple locations, multiple places or silos that are using different replication technologies from different vendors. These vendors, except for Informatica, are database dependent. They have their own database technology. So you're looking at different silos of data replication technologies within that enterprise trying to do the same things, which is basically replicate or copy that information at high speeds.
But in this economic environment, it is all about reducing cost, reducing risks, and improving efficiency. Every CIO wants to consolidate technologies and standardize on a single toolset for all types of data integration. Informatica’s data replication technology allows enterprises to standardize on a single platform for data replication and other styles of data integration including ETL, message-oriented styles of data integration, or even data federation or data virtualization as we discussed before. That's the first thing.
The second area where we're unique is actually a couple of things. The first advantage is it’s a very easy-to-use graphical interface. Compare that with the overhead of writing scripts and a parameter-based way of development. Our approach is very graphical and it’s very intuitive. The second advantage is being non-intrusive. So we monitor logs versus putting triggers, or staging data on the database, or doing disk access. That's extremely beneficial because if you were creating triggers and staging data, etc. you’d be impacting performance, slowing down operations and incurring higher costs. Nobody wants that.
The third advantage that we already covered to some extent is the notion of standardization. It’s all about data replication as part of an end-to-end data integration platform, versus a point solution that limits choice, creates more silos, and hence increases cost and risk.
The fourth advantage is the scalability and reliability. There's a big benefit around auto parallelism and being built on a proven scalable platform.
And to sum everything up, it's about having a single platform that is complete, that is comprehensive and holistic and that provides a solution that does not create yet another silo. That's extremely important. One of our customers told us that what he really likes is the freedom of choice with Informatica: because we don’t have a specific database bias, we give him the option to standardize on a single platform which does a fantastic job of supporting all types of data integration. I think that probably summarizes it.
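The non-intrusive, log-based replication mentioned among these advantages can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Informatica's replication engine: `source_log` stands in for a database transaction log, and `apply_log` is a hypothetical helper. The replica is built purely by reading log entries, so no triggers or staging tables are added to the operational system.

```python
# Hypothetical change log: (operation, key, value) entries in commit order.
source_log = [
    ("insert", 1, "Acme"),
    ("insert", 2, "Globex"),
    ("update", 1, "Acme Corp"),
    ("delete", 2, None),
]


def apply_log(replica, log, start=0):
    """Replay log entries from position `start` onto the replica; return the new position."""
    for op, key, value in log[start:]:
        if op == "delete":
            replica.pop(key, None)
        else:  # inserts and updates both set the latest value
            replica[key] = value
    return len(log)


replica = {}
position = apply_log(replica, source_log)

# Later changes are picked up incrementally from the saved log position.
source_log.append(("insert", 3, "Initech"))
position = apply_log(replica, source_log, position)
```

Tracking the log position means each pass reads only new entries, and the resulting replica can then serve as just another source for the virtual layer.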
If you were going to summarize why people choose Informatica, what would you say?
Ash Parikh: First, we've been in the data integration market for a long time. We actually created the market called data integration. We lead all aspects of data integration in the market – ETL, real-time data integration, data quality, MDM, and data virtualization. We have innovations across almost every aspect of the data integration landscape including cloud and now even big data. And this is a company that has been singularly focused on data integration. That's what we do, and we do that day in and day out, really well. We have customers betting their businesses on us because they know that Informatica is going to continue to improve its technologies and keep one step ahead by delivering game-changing innovations that are extremely valuable and practical and that will always help them save money, increase efficiencies and reduce risk.
To keep the dialogue going and to hear more about Informatica’s innovations in data integration, I would like to invite your readers to join the Next Generation Data Integration and Analytics Group on LinkedIn.
Ash, I want to thank you for providing our readers with a solid understanding of Informatica, data integration and data virtualization.
Copyright 2004 — 2014. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC