Data Warehousing in the Clouds: Making Sense of the Cloud Computing Market
by Lou Agosta
Originally published October 9, 2008
Cloud computing has come up fast, with companies such as Amazon, Google and Salesforce.com stealing a march on information technology infrastructure stalwarts such as HP, IBM, Microsoft, Dell, EMC, Sun and Oracle. The latter are certainly participating, but more behind the scenes, notwithstanding some high-profile press releases.
The standard IT big guys are allowing the leaders in the new digital economy to incur the risks, presumably while selling them hardware and software to provision the virtual network and data center in the clouds. It would be premature to say that cloud computing is a trend in data warehousing, though Blink Logic is making a splash with business intelligence (BI) dashboards and scorecards, among other venues, in the cloud provided by OpSource. With a few exceptions, cloud computing remains firmly oriented in the transaction processing area and is an ideal solution for start-ups, prototyping and non-mission-critical applications. However, it is not premature to ask how it will impact data warehousing and business intelligence, as it surely will. This requires taking a step back and getting oriented in the fog of technology, pardon the pun, that surrounds redefining what is possible in business through cloud computing. As with any emerging market, established vendors are putting a toe in the water and start-ups are jumping in with both feet.
So what’s new with cloud computing and how does it differ from the grid, software as a service (SaaS) and simple web hosting? Key differentiators include:
The latter (#4) in particular is a work in progress with many kinks to be worked out before cloud computing is ready for the enterprise, simply because the market is new and a certain amount of trial and error is occurring. Hopefully, not too much error. See Figure 1.
Of course, cloud computing takes its name from the icon for the network connection that occurs in design and flow charts. The network is not the only component that is being virtualized. The notion of a virtual private data center consisting of virtual private servers is an extension of virtualization of the network. Data center (and server) virtualization has always been a compelling idea, but one that has proven difficult for end-user enterprises to implement. Even large users of IT services – telecom, finance and retail – have struggled to get it right and obtain value from virtualization, though everyone knew it was a good idea. It turns out that there will be virtualization specialists who will expose the computing power to the network (cloud). The critical mass of knowledge, best practices and technology innovation in data center operations at Amazon and Google turns out to be generalizable and separately marketable. At least that is the value proposition.
Thus, cloud computing is related to the grid, but different from it. The grid was supposed to be a distributed, parallel computing infrastructure along the model of the delivery of electricity – hence, the “grid” metaphor. Plug in and obtain CPU cycles on demand, exploiting the fact that most CPUs are busy between 2 and 10% of the time. Not that busy, so let’s find a way to capture the other 90 – 98% of the computing cycles. Hence, the grid. The grid envisioned combining heterogeneous operating systems, scheduling, authentication, storage and administration beneath a hardware/software abstraction layer that made service virtual. And here “service” means “web service.” Competing standards and a lack of standards mean the grid is a high bar to get over. Cloud computing faces many of the same challenges, but goes straight to the application level, letting the business demand for innovative computing services drive the infrastructure build-out. Grid computing continues to be a work in progress in scientific and academic communities – such as NASA, Fermilab and related large governmental agencies – where levels of professionalism and authentication are high. Don’t laugh. The Internet was once a Defense Department research project. However, development latency is high and commercial business application results are years away. It is conceivable that the Amazon cloud (Elastic Compute Cloud [EC2]), Google cloud (App Engine), IBM cloud (Blue Cloud), as well as private enterprise corporate clouds built using 3Tera, Enomalism, or Kaavo design tools on web hosting networks such as Terremark, AT&T, OpSource or IBM’s many facilities in retail, finance, consumer goods, etc. could eventually be tied together to become the next high concept – the global grid, now redescribed as the commercial cloud.
Cloud computing is also related to software as a service (SaaS), but different from it. Software as a service is a hosted application that provides a particular business solution to multiple tenants (clients). The poster child for this kind of web service is Salesforce.com. The complexity of in-house customer relationship management (CRM) systems has created a powerful case for a kinder, simpler model of managing the sales life cycle. The success of the model has spawned a whole series of “____ as a service” phenomena – hardware as a service, software as a service, colocation as a service, everything as a service. The latest of these is Force.com (from Salesforce.com), which is a cloud-like platform as a service for your business application of choice, especially if it is relevant to CRM. This is arguably a cloud computing platform where end users act as high-level developers to configure their own solution. In contrast, a purer approach to business intelligence software as a service is represented by Blink Logic, now working closely with technical, application and business operations provider OpSource. Blink Logic is arguably the first BI vendor in the cloud, providing a subscription-based, self-service model for collaboration around dashboards, KPIs, scorecards, geographic intelligence and related capabilities.
Cloud computing is related to web hosting, but is different from it. It will accelerate the commoditization of the services provided by those firms that (for example) get named on the Gartner Magic Quadrant for web hosting such as AT&T, DataPipe, The Planet, IBM and SAVVIS. Cloud computing pushes web hosting and web services further down in the technology stack. In comparison, Terremark is trying to differentiate itself from the pack – as others presumably will soon be doing too if they have not already done so – by offering cloud-like interfaces and SLAs to social networking and multiperson gaming providers. Companies such as Areti Internet (acquired by Alentus in June), Fortress ITX, Enki, Layered Technology and OpSource provide a variety of managed services, colocation, hosting, billing and Internet service provider (ISP) infrastructure. What lifts them above the level of a mere computing utility is the ability to build a remote (or distributed) application using a drag-and-drop tool for application provisioning over the cloud. One such tool is from 3Tera, which seems to have stolen a march on the competition, the latter being limited to open source solutions such as Enomalism and Kaavo. This approach is in contrast to coding to a high-level API such as that furnished by Amazon’s Elastic Compute Cloud (EC2), Simple Storage Service (S3) and Simple Database (SimpleDB). 3Tera positions itself as competing with Amazon; but as a software product, it must partner with infrastructure providers such as those mentioned above.
A word to the wise – it may turn out that the market for building enterprise clouds within the enterprise is ultimately larger than that for small start-ups offering consumer products and services. Of course, since such a market does not yet exist, sizing it is speculation. It is easy to imagine such giants as Wal-Mart, Chase, Toyota, etc. evolving their intranets and corporate backbones in the direction of enterprise clouds that finally succeed in offering business services – including both transactional and business intelligence – within the four walls. It is then a logical next step to allow access to suppliers, clients and business partners through defined interfaces and services. That would not be a new idea, but it would provide new, innovative business options and content through a cloud that extended itself to provisioning and providing point-of-service business applications.
The Emerging Ecosystem
Another indication that cloud computing is for real is the emergence of an entire ecosystem around cloud computing providers such as Force.com from Salesforce.com, with supplementary utility players such as DreamFactory and Cloudworks. A substantial complementary offering is being delivered by Callidus Software, which complements the Salesforce.com performance management with incentive compensation management (ICM). Arguably, OpSource is another candidate ecosystem with business intelligence from Blink Logic and technical, application and business operations from other sources.
On another front, Amazon itself is constellating an ecosystem with Elastra, CohesiveFT and EnterpriseDB, though the latter hasten to state that they are not limited to the Amazon Elastic Compute Cloud (EC2) environment; and they promise to address other clouds such as FlexiScale (an xCalibre company in the UK) or GoGrid. Vertica – yes, the column database and analytic engine – is offering its application on Amazon’s EC2. This may actually make it the first to market in a data warehousing context [1]. If anyone is using this, I would like to hear from them.
Cloud Computing Meets Data Warehousing
Such is the dynamic landscape of the emerging cloud computing environment. What will be the effect of the encounter between cloud computing and data warehousing? First, data warehousing will do to the cloud what it did to web services – raise the bar. Second, it will push the pendulum back in the direction of data marts. Third, it will deflate the inevitable hype being generated in the press.
First, data warehousing raises the bar on cloud computing. Capabilities such as data aggregation, roll up and related query intensive operations may usefully be exposed at the interface whether as Excel-like functions or actual API calls. Cloud computing is the opposite of traditional data warehousing. Cloud computing wants data to be location independent, transparent and function shippable, whereas the data warehouse is a centralized, persistent data store. Run-time metadata will be needed so that data sources can be registered, get on the wire and be accessible as a service. In the race between computing power and the explosion of data, large volumes of data continue to be stuffed behind I/O subsystems with limited bandwidth. Growing data volumes are winning. Still, with cloud computing (as with web services), the service, not the database, is the primary data integration method.
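To make the idea of run-time metadata and roll-up "exposed at the interface" concrete, here is a minimal sketch in Python. It is an illustration only: the `DataSource` and `Registry` names, their fields and the sample sales rows are all hypothetical and do not correspond to any vendor's API. The `fetch` callable stands in for whatever remote call a real cloud service would make.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

Row = Dict[str, object]


@dataclass
class DataSource:
    """Run-time metadata for one registered source (hypothetical fields)."""
    name: str
    dimensions: Tuple[str, ...]          # columns a caller may group by
    measures: Tuple[str, ...]            # numeric columns a caller may sum
    fetch: Callable[[], Iterable[Row]]   # stand-in for a remote service call


class Registry:
    """Hypothetical catalog that makes registered sources callable as a service."""

    def __init__(self) -> None:
        self._sources: Dict[str, DataSource] = {}

    def register(self, source: DataSource) -> None:
        self._sources[source.name] = source

    def describe(self, name: str) -> Dict[str, object]:
        # Callers discover what they can ask for without knowing where data lives.
        s = self._sources[name]
        return {"name": s.name,
                "dimensions": list(s.dimensions),
                "measures": list(s.measures)}

    def roll_up(self, name: str, by: str, measure: str) -> Dict[object, float]:
        # Sum one measure grouped by one dimension, validated against metadata.
        source = self._sources[name]
        if by not in source.dimensions:
            raise ValueError(f"{by!r} is not a dimension of {name!r}")
        if measure not in source.measures:
            raise ValueError(f"{measure!r} is not a measure of {name!r}")
        totals: Dict[object, float] = defaultdict(float)
        for row in source.fetch():
            totals[row[by]] += float(row[measure])
        return dict(totals)


# Made-up sample data for illustration only.
sales_rows = [
    {"region": "East", "product": "A", "revenue": 100.0},
    {"region": "East", "product": "B", "revenue": 50.0},
    {"region": "West", "product": "A", "revenue": 75.0},
]

registry = Registry()
registry.register(
    DataSource("sales", ("region", "product"), ("revenue",), lambda: sales_rows)
)
print(registry.roll_up("sales", by="region", measure="revenue"))
```

The caller names a source, a dimension and a measure; where the rows physically reside is hidden behind `fetch`, which is the sense in which the service, rather than the database, becomes the integration point.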
Second, data warehousing in the cloud will push the pendulum back in the direction of data marts and analytic applications. Why? Because it is hard to imagine anyone moving an existing multiterabyte data warehouse to the cloud. Such databases will be exposed to intra-enterprise corporate clouds, so the database will need to be web service friendly. In any case, it is easy to imagine setting up a new ad hoc analytic app based on an existing infrastructure and a data pull of modest size. This will address the problem of data mart proliferation since it will make the cost clear and provide incentives for the business to throw the data mart away when it is no longer needed.
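The point about visible cost can be sketched as a simple meter attached to each data mart. Everything below is an assumption for illustration: the `DataMartMeter` class, the `should_retire` rule and all rates and volumes are invented, and none of the numbers are actual prices from Amazon or any other provider.

```python
from dataclasses import dataclass


@dataclass
class DataMartMeter:
    """Hypothetical usage-based cost meter for one cloud-hosted data mart."""
    storage_gb: float
    rate_per_gb_month: float        # assumed storage tariff, not a real price
    compute_hours_per_month: float
    rate_per_compute_hour: float    # assumed compute tariff, not a real price

    def monthly_cost(self) -> float:
        storage = self.storage_gb * self.rate_per_gb_month
        compute = self.compute_hours_per_month * self.rate_per_compute_hour
        return storage + compute


def should_retire(meter: DataMartMeter,
                  queries_last_month: int,
                  max_cost_per_query: float) -> bool:
    """Flag a data mart whose cost per query exceeds what the business accepts."""
    if queries_last_month == 0:
        return True  # paying for a mart that nobody queried
    return meter.monthly_cost() / queries_last_month > max_cost_per_query


# Invented figures: 200 GB of storage and 100 compute hours per month.
meter = DataMartMeter(storage_gb=200, rate_per_gb_month=0.25,
                      compute_hours_per_month=100, rate_per_compute_hour=0.5)
print(meter.monthly_cost())
print(should_retire(meter, queries_last_month=20, max_cost_per_query=2.0))
```

Because the bill arrives per mart and per month, an idle or rarely used data mart shows up as a cost per query that someone has to justify, which is the incentive to throw it away.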
Third, the inevitable hype around cloud computing will get a good dose of reality when it confronts the realities of data warehousing. Questions that a client surely needs to ask are: If I want to host the data myself, is there a tool to move it? Since this might be a special project, how much does it cost? What are the constraints on tariffs (costs)? The phone company requires regulatory approval to raise your rates; but that is not the case with Amazon or Google or Layered Technology. Granted, strong incentives exist to exploit network effects (economies of scale and Moore’s Law-like pricing). It is a familiar and proven revenue model to give away the razor and charge a little bit extra for the razor blade. Technology lock-in! It is an easy prediction to make that something like that will occur once the computing model has been demonstrated to be scalable, reliable and popular.
Under a best-case scenario, economies of scale – large data warehousing applications – will enable a win-win scenario where large clients benefit from inexpensive options. However, in an economic downturn, the temptation will be overwhelming to raise prices once technology lock-in has occurred. Since this is a new infrastructure play, it is too soon for anything like that to occur. Indeed, this is precisely the kind of innovation that will enable the economy to dig itself out of the hole into which the mortgage mess has landed us. Unfortunately, it will not make houses more affordable. It will, however, enable business executives and information technology departments to do more with less, to work around organizational latency in any department and to compete with agility in the digital economy. It is simply not credible to assert that any arbitrary cloud computing provider will be able to accommodate a new client who starts out requiring an extra ten terabytes of storage. Granted, the pipeline to the hardware vendors is likely to be a high-priority one. The sweet spot for fast provisioning of data warehousing in the cloud is still small- and medium-sized businesses and applications.
1. Ericson, Jim. Columns in the Clouds, DM Review, August 2008.
Copyright 2004 — 2020. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC