A number of technologies, now market-ready and widely adopted, are shifting emphasis away from the data warehouse as the be-all and end-all of information management. Perhaps none is so clearly associated with this transition in information architecture as data virtualization.
Data virtualization was pushed too hard, too fast in the heyday of data warehouse building, and it earned the proverbial black eye through the unworkable “virtual data warehouse.” That style of warehouse left data scattered in pieces around the organization and imagined a layer over all of them, stitching together result sets on the fly. In most cases, this was unworkable. The “hard part” of consolidation, ideally supported by governance, was still required. It still is. You should not forgo building a data warehouse and simply rely on data virtualization. For the reasons discussed in the other articles in this series,1 the data warehouse is still required. Today, however, data virtualization can serve as either a short-term or a long-term solution.
Controlling Your Information Asset
Every organization should expect chaos over the next decade as it works to bring the important asset of information under control. Architecture will lag behind the delivery of information to users, and much of that delivery will be custom. Data virtualization is useful in this custom, creative delivery scenario: it can provide a single view of data spread across the organization, simplifying access so that the consumer need not know the architectural underpinnings. These are the short-term aspects of data virtualization.
How long is short term? “Term” is a vague concept in this era of technological advancement, but it is nonetheless important to have timeframes in mind for solutions in order to make wise investments. For data virtualization, short term lasts until the architecture supports a physical view through integration, or until it becomes evident that data will remain consciously fragmented, at which point data virtualization becomes the long-term solution.
The right architectural answer may be to not centralize everything, and the right business answer may be to not take the extended time to design and develop solutions in three distinct technologies – business intelligence, data warehousing and data integration.
Leveraging Data Virtualization
While many programs like the data warehouse, CRM, ERP, big data storage, and sales and service support systems will be obvious epicenters of company data, there is the inevitable cross-system query that must be done periodically or regularly. Many organizations start out by leveraging their data virtualization investment to rapidly produce operational and regulatory reports that require data from heterogeneous sources.
Further queries spawn from this beginning, since data virtualization alone offers a “bird’s-eye view” of the entire data ecosystem, structured and unstructured: seamless access, through a federated query engine, to every data store that has been identified to the virtualization tool, including NoSQL stores and cloud-managed stores.
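The cross-system reporting described above can be sketched in miniature. The following Python example stands in for a federated query: an in-memory SQLite database plays the relational source, a plain dictionary plays a NoSQL-style customer store, and a small function joins them into one report. All names, schemas and values here are invented for illustration; no particular virtualization product is implied.

```python
import sqlite3

# Relational source: an in-memory SQLite database of orders.
orders_db = sqlite3.connect(":memory:")
orders_db.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id TEXT, amount REAL)"
)
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "C1", 250.0), (2, "C2", 99.5), (3, "C1", 40.0)],
)

# NoSQL-style source: customer profiles keyed by customer_id.
customers = {
    "C1": {"name": "Acme Corp", "region": "EMEA"},
    "C2": {"name": "Globex", "region": "APAC"},
}

def federated_report():
    """Join the two sources on customer_id, as a federated engine would,
    returning one consolidated result set for the report consumer."""
    rows = orders_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ).fetchall()
    return [
        {
            "customer": customers[cid]["name"],
            "region": customers[cid]["region"],
            "total": total,
        }
        for cid, total in rows
    ]

for line in federated_report():
    print(line)
```

The consumer of `federated_report` sees a single result set and never learns that the data came from two unrelated stores, which is precisely the simplification data virtualization offers.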
As middleware, data virtualization works with two primary objects: views and data services. The virtualization platform consists of components that perform development, runtime and management functions: an integrated development environment, a server environment and a management environment. Together, these transform data into consistent forms for use.
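The two objects can be illustrated concretely. In the hypothetical sketch below, two attached SQLite databases stand in for physically separate stores; a view stitches them into one logical table, and a data service (here, simply a function) is the governed entry point consumers call. Database and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the server environment
conn.execute("ATTACH DATABASE ':memory:' AS sales")
conn.execute("ATTACH DATABASE ':memory:' AS hr")

# Two physically separate stores.
conn.execute("CREATE TABLE sales.deals (rep_id INTEGER, value REAL)")
conn.execute("CREATE TABLE hr.reps (rep_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO sales.deals VALUES (?, ?)", [(1, 500.0), (2, 120.0), (1, 80.0)]
)
conn.executemany("INSERT INTO hr.reps VALUES (?, ?)", [(1, "Kim"), (2, "Ravi")])

# The view: consumers see one logical table; the cross-store join is
# virtual and computed at query time. (TEMP is used because SQLite only
# lets temporary views reference multiple attached databases.)
conn.execute("""
    CREATE TEMP VIEW rep_revenue AS
    SELECT r.name AS rep, SUM(d.value) AS revenue
    FROM hr.reps r JOIN sales.deals d ON r.rep_id = d.rep_id
    GROUP BY r.name
""")

def rep_revenue_service(min_revenue=0.0):
    """Data service: a parameterized, governed entry point over the view."""
    return conn.execute(
        "SELECT rep, revenue FROM rep_revenue WHERE revenue >= ?",
        (min_revenue,),
    ).fetchall()
```

A consumer calling `rep_revenue_service(min_revenue=200.0)` gets consolidated rows without ever addressing the `sales` or `hr` stores directly; redefining the view relocates or reshapes the underlying data without breaking the service contract.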
Integrated Business Intelligence
Data virtualization is used primarily to provide integrated business intelligence, something formerly associated only with the data warehouse. It extends the data warehouse concept to data not immediately under the control of the physical data warehouse. In many organizations, data warehouses have reached their limits in terms of major known data additions to the platform, so providing the functionality the organization needs means virtualizing the rest of the data.
Data virtualization brings value to the seams of the enterprise: the gaps between data warehouses, data marts, operational databases, master data hubs, big data hubs and query tools. It is delivered both as a standalone tool and as extensions to other technology platforms, such as business intelligence tools and enterprise service buses.
Data virtualization of the future will bring intelligent harmonization of data under a single vision. It is not quite there yet, but it is doubtless the target of research and investment, given the escalating trends of competitive pressure, company capability and system heterogeneity.

References
1. Previous articles in this series by William McKnight on technologies changing our view of the data warehouse: Hadoop, Columnar Databases, and Master Data Management.