Could you put your enterprise data warehouse (DW) in the cloud? If you think you can’t, you’re not alone. Many businesses remain wary due to the perceived increased security risk for cloud-based data warehousing and the intensive data transfer process that it would require.
But you may be inching closer to a cloud enterprise data warehouse (EDW) without realizing it. Cloud infrastructure service providers offer the ability to launch and run virtual servers in the cloud. Self-service access to on-demand computing and storage resources in a multi-tenant cloud has lowered barriers to entry for cloud hosting of many general-purpose uses, including database-oriented analytic applications. And the market for cloud business intelligence (BI) is exploding no matter what research data you read. Enterprises are getting a taste of the cloud in bits and bytes, especially for business intelligence. We’re feeling more comfortable with the cloud.
Still, it’s daunting to fully commit to a cloud EDW. A number of hard problems have to be solved. But over the past few years, cloud infrastructure providers have been responding to some of these issues with what I can best describe as the “cloud BI/DW appliance.” Data warehouse appliance vendors such as Teradata and HP Vertica have been evolving their cloud appliance offerings since their introductions, but Amazon’s recent entry into this market with Redshift seems to have really ignited the marketplace. If, as I expect, this trend continues, we can look forward to cloud DW appliances continuing to enhance features, lowering barriers to entry and driving down price points, all of which should work to heat up this market.
Since the whole concept of a data warehouse appliance is based on having a tightly integrated technology stack from software down to the bare metal, let’s explore how an appliance approach translates to a cloud service, and the challenges and opportunities this architecture would involve.
Six Cloud EDW Appliance Challenges and Opportunities
Challenge: Performance of virtual servers lags that of bare metal servers. The virtual machine layer of abstraction makes it possible to oversubscribe hardware and so obtain decent utilization (something bare metal architectures rarely achieve), but at the same time it prevents any one application from really using the whole machine.
Opportunity: Virtual machine performance efficiency has gradually improved through technology developments like hardware-assisted virtualization with CPU and other hardware extensions to support virtualization, or paravirtualization techniques using hypervisor or hardware-specific code. Another important hybrid virtualization technique, of particular interest to large scale data warehousing, is the uniform packaging of hardware into cluster server units or nodes, which makes it easier to predict performance and tune the other layers of the technology stack.
Challenge: Virtualization layers have brought us cool features like the ability to create, save and move machine images that can run anywhere, but they have also left us with less control of the stack. Data warehouse appliances are purpose-built; they excel because they can be tuned from top to bottom – application, DBMS, operating system, device drivers, storage, hardware and network. Typically, tuning also involves configuring the optimal processor/storage ratio to achieve the best cluster performance on parallel query workloads.
Opportunity: The cluster server units can be tuned (software, operating system, hardware, and networking) for DW workloads. Of course, because they are purpose-built, the servers may not be as readily reusable for other needs should there be excess capacity, so managing appliance assets may be more complicated or more costly than generic virtual machine hosting.
Challenge: These days, if you aren’t employing full data encryption and fine-grained access control, you should be. There has been a general lack of out-of-the-box solutions for virtual machine encryption, so cloud practitioners must take extra care and build out multiple layers of data encryption and access controls to protect data security and privacy.
Opportunity: A purpose-built uniform server cluster can be outfitted with full data encryption not just for permanent objects, but also for landing, staging, temp, spill and other interim data storage, thereby decreasing the attack surface from above and below the point of service. Landing and staging zones can also be configured to require encryption, and all data movement must be encrypted.
Challenge: Although it’s easy to scale up virtual machines by adding more virtual processors and memory, and striping storage volumes to improve I/O performance, there are limits to the total amount of storage and I/O with a virtual machine. Cluster configurations have been available, but in general they have been better suited to HPC (high performance computing) for CPU or memory-intensive applications; cross-instance I/O rates have been a limitation.
Opportunity: Cloud providers are beginning to extend their offerings to include clusters that are purpose-built for high I/O applications. Using these pre-configured nodes with high-speed interconnects and software support for MPP (massively parallel processing) makes it possible to overcome the previous storage and I/O limits. Of course, this limits portability – you can’t just move a DW appliance instance to any old virtual machine or rack and expect to have the same level of performance. The cloud provider has to build out and maintain pools of like cluster servers in multiple availability zones. Performance tweaks appear in higher layers of the stack as well (e.g., sophisticated data compression and partitioning features).
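To make the MPP idea a little more concrete, here is a minimal Python sketch of how rows might be spread across cluster nodes by hashing a distribution key, with each node aggregating its own slice before the partial results are merged. It is an illustration only, not any vendor's implementation: the function names, the choice of CRC32 as the hash, and the sample rows are all my own assumptions.

```python
import zlib
from collections import defaultdict


def node_for_key(key: str, node_count: int) -> int:
    """Map a distribution key to a node index using a stable hash.

    CRC32 is an assumption chosen for illustration; real MPP engines
    use their own hash functions.
    """
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Place each (key, amount) row on the node selected by its key."""
    slices = defaultdict(list)
    for key, amount in rows:
        slices[node_for_key(key, node_count)].append((key, amount))
    return slices


def local_totals(node_rows):
    """Aggregate one node's slice. On a real cluster every node would
    run this step at the same time; here the nodes are visited in turn."""
    totals = defaultdict(float)
    for key, amount in node_rows:
        totals[key] += amount
    return dict(totals)


def cluster_totals(rows, node_count):
    """Merge the per-node partial results into one answer.

    Because rows are distributed on the grouping key, a given key lives
    on exactly one node, so merging is a plain dictionary update.
    """
    merged = {}
    for node_rows in distribute(rows, node_count).values():
        merged.update(local_totals(node_rows))
    return merged


if __name__ == "__main__":
    # Hypothetical sample data: (customer, sale amount).
    sales = [("acme", 10.0), ("globex", 5.0), ("acme", 2.5), ("initech", 7.0)]
    print(cluster_totals(sales, node_count=4))
```

The answer is the same whether the cluster has one node or forty; what changes is how much data each node has to scan, which is why uniform, predictable nodes matter so much for this kind of workload.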
Challenge: Parallel loads and extracts require high bandwidth connections to move gigabytes and terabytes of data. Keeping up with the growing demand for network bandwidth is a challenge faced by many IT departments. Often, existing EDWs are colocated close to data sources, reducing the demand on the networks.
Opportunity: Colocating the DW appliance cluster nodes keeps data local and makes for efficient internode communication and data movement. Likewise, high bandwidth network connectivity between the cluster and file landing zones is required, with support for parallel import and export. High-speed backup and restore is also mandatory. This still leaves one potential limiting factor – the bandwidth required to move data to and from the cluster should the data sources reside elsewhere.
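A quick back-of-the-envelope calculation shows why that last point matters. The Python sketch below estimates how long a bulk transfer would take for a given data volume and link speed. The function name, the decimal unit convention (1 GB = 8,000 megabits) and the default 80% link efficiency are assumptions of mine for illustration, not measured figures.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move data_gb gigabytes over a link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of that speed actually achieved
    after protocol overhead and contention (0.8 is a placeholder).
    """
    if data_gb < 0:
        raise ValueError("data_gb must not be negative")
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")

    data_megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = data_megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical scenarios: data volume in GB, link speed in Mbps.
    for volume_gb, speed_mbps in [(100, 100), (1000, 1000), (10000, 1000)]:
        hours = transfer_hours(volume_gb, speed_mbps)
        print(f"{volume_gb} GB over {speed_mbps} Mbps: about {hours:.1f} hours")
```

Running numbers like these against your own data volumes and load windows is a reasonable first test of whether remote data sources will be the bottleneck.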
- SUNK COST
Challenge: Most organizations have invested a lot of money in their existing EDW platforms. They are not going to have an appetite to move to something new unless the current platform is just not doing the job or a major upgrade is required.
Opportunity: Cloud DW appliance pricing models are helping to change the way we think about cost. Some cloud DW appliance services include managed services such as cluster management, backup, archiving, or scaling events (e.g., adding, subtracting, or changing cluster nodes). Because cloud DW appliances tend to be built with commercial off-the-shelf (COTS) hardware and clusters can start small and grow as needed, entry price points can be very attractive.
These are just some of the key challenges and opportunities facing cloud BI/DW practitioners. But the heartening fact is that although it hasn’t been easy, many of us have been successfully solving these architecture and implementation issues, building and managing cloud analytics services for the past several years. The good news is that the benefits realized are significant: low up-front capital cost, low operational expense, flexibility to quickly scale up and down to meet changing demand, and access to world-class infrastructure, to name a few.
It’s a little early to predict how quickly adoption will occur or how fast cloud DW appliances will take market share away from physical appliance vendors, or whether physical appliance vendors will morph into tomorrow’s cloud appliance vendors. These are interesting times for business intelligence and data warehousing.
So, would you consider a cloud data warehouse appliance? That’s the big question, and cloud service providers are betting “yes.” Let me know what you think.
SOURCE: Emerging Cloud Appliances for Business Intelligence and Data Warehousing
Recent articles by John Bair