Increasingly, data warehouse components, as well as many operational systems, are moving to the cloud. By the cloud, I mean systems that conform to the NIST definition: on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service. The cloud has lowered barriers to entry, both in the IT competencies that need to be employed and in hardware, software, power, floor space, storage, network, procurement and accounting. In addition, cloud providers offer more professional chargeback capabilities. Obviously, much personal software has gone to the cloud - e.g., Microsoft 365, Dropbox, Google Docs, MobileMe and the impending Google Chrome device. But what about core enterprise systems like data warehouses?
There, the cloud is often resisted because of the perceived loss of control. However, we must lose the fear and facilitate the right cloud strategy for data warehousing. Some of my clients are eagerly moving various systems components to the cloud and, in so doing, are going to apply different standards for cloud data warehouses than they would for in-house data warehouses. For example, the usual 99.99% availability mark gets compromised in Amazon's public cloud, which offers 99.95% availability. Many public cloud relationships have also been derailed, such as the one between Eli Lilly and Amazon. However, even with Amazon's recent outage, I find that companies, despite the media FUD about it, are responding not by moving away from the cloud but by doubling down with high availability systems. Upon further inspection, availability, security and performance in the cloud can often be considered better than with in-house systems.
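To put the two availability figures above in perspective, here is a minimal sketch that converts an availability percentage into permitted downtime per year. It assumes a 365-day year and that downtime is measured across the whole year; an actual SLA may define its measurement window and exclusions differently, and the function name is purely illustrative.

```python
# Illustrative only: assumes a 365-day year and ignores SLA exclusions.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def annual_downtime_minutes(availability_pct: float) -> float:
    """Return the minutes of downtime per year allowed at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


in_house = annual_downtime_minutes(99.99)      # the usual in-house mark
public_cloud = annual_downtime_minutes(99.95)  # the public cloud figure cited above

print(f"99.99%: {in_house:.1f} minutes per year")
print(f"99.95%: {public_cloud:.1f} minutes per year")
```

Under these assumptions, the gap is roughly 53 minutes a year versus about 4.4 hours a year, which is the kind of difference a client has to weigh against the rest of the cloud value proposition.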
So which type of cloud, and what services, for data warehousing? As would be expected, initially there is a preference for the private cloud, as it gives clients more comfort and more control over security, compliance and integration - even though some accountants would prefer the public option so that costs are treated as operating expenses without question.
I believe that over time, data warehouse infrastructures in the cloud will evolve to a hybrid approach, with public cloud components and private cloud components connected through integration. This reflects the fact that data warehouses today are integrated with data marts of various stripes and with various alternative technologies to row-based relational systems. One of the first ports for data in the cloud is simply data storage: file servers, including content management systems, off-site backups, etc. However, CIOs have plans to see 50% of their databases in the cloud in the next five years. The cloud is truly the biggest change ever to threaten how IT operates, as companies consider what their true core competencies are and weigh the allure of the savings that can be realized with cloud done right.
So which part of the multi-component data warehouse will move to the cloud? There will be many and varied paths, but one popular approach will be to move the databases first. The information delivery layer is a natural follow-on, and then the integration layer, which must maintain connectivity between the source systems and the data warehouse.
A great time to consider the cloud for data warehouses is during consolidation, which many companies are undertaking today. Finding a mixture of data volume, concurrency, query complexity, data latency and data sensitivity that works well for the cloud - public or private - is important in developing the cloud value proposition. Developing a disaster plan will be important as well, and it may look very different from the one used with in-house data warehouses.
So should we have different, lower standards for a cloud data warehouse? Some may well choose to, because of the overarching value proposition that they see in the cloud. Be open to considering all aspects of the cloud value proposition as you architect your data warehouse consolidation or next major data warehouse architecture change.
Posted May 26, 2011 9:50 AM