Blog: Dan E. Linstedt

VLDW: What happens in a scaled cluster?

I wrote a blog entry on this a while back, about MPP vs. Clustering. Now I'm going to discuss what happens in an Active cluster (to use an MS term) that usually causes problems. I'll also talk about clustering within a single node group under an MPP option. While there are many issues surrounding clustering and volume, some are more prevalent than others. The golden rule is: Volume and Latency change everything! Come see me in August at TDWI, where I teach a class on VLDW and its technical aspects.

In my last entry on this topic, I stated that it is better to run with MPP than with Clustering and that the more volume the clusters contain, the costlier it is to keep them going. Here's what's happening under the covers.

This pertains to a completely clustered SMP system:
1. Active clustering means that all nodes share the entire RAM and all the data, and run copies of the same process at the same time.
2. Each node "knows" about the others.
3. Assuming I have 5 clustered SMP nodes, and each node has 4 GB of RAM with 8 CPUs, the applications running on each node believe they have access to 20 GB of RAM and 40 CPUs.
4. In order to make "20 GB" of RAM addressable, each machine must share a master memory allocation table. Along with that allocation table, it shares semaphores, or locking mechanisms.
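
The arithmetic in point 3 can be sketched in a few lines. This is only an illustration of the numbers above; the `Node` class and `apparent_resources` function are names I made up for the example, not part of any clustering product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One SMP node in the cluster (hypothetical model for illustration)."""
    ram_gb: int
    cpus: int


def apparent_resources(nodes):
    """Return (total RAM in GB, total CPUs) as a clustered application sees them."""
    return sum(n.ram_gb for n in nodes), sum(n.cpus for n in nodes)


# Figures from the example above: 5 nodes, each with 4 GB of RAM and 8 CPUs.
cluster = [Node(ram_gb=4, cpus=8) for _ in range(5)]
total_ram_gb, total_cpus = apparent_resources(cluster)
print(total_ram_gb, total_cpus)  # 20 40
```

The application sees one 20 GB pool, but every byte of it still lives on one particular 4 GB machine, which is why the shared allocation table and its locks are needed.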

The network between the nodes MUST be dedicated to server-to-server communication. If it is mixed with disk traffic, client traffic, or other server traffic, the communication layers begin to break down. Keeping "up with the nodes", in other words synchronizing each node in the cluster every millisecond for access to the shared master memory allocation table, becomes a bear. The more nodes that are added to the SMP cluster, the more network traffic there will be, and the harder it becomes (mathematically) to keep them all in sync, due to the limits on the speed of the network, the CPUs, the RAM, and the disk. These upper limits are constantly being raised as hardware gets faster. However, the practical limit drops again as RAM is added to the machines or as more data has to be managed on shared disk.
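
To put a rough number on "mathematically", here is a minimal sketch. It assumes every node synchronizes directly with every other node (a full mesh); that is my simplifying assumption for illustration, and real cluster interconnects may use other topologies. The function name `sync_links` is hypothetical.

```python
def sync_links(node_count: int) -> int:
    """Number of node-to-node pairs that must stay in sync in a full mesh."""
    return node_count * (node_count - 1) // 2


for n in (2, 5, 10, 20):
    print(n, sync_links(n))
# 2 1
# 5 10
# 10 45
# 20 190
```

Under this assumption, doubling the cluster from 5 to 10 nodes more than quadruples the number of pairs to keep synchronized, while the network between them is no faster than before.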

In a clustered environment such as this, everything is shared across all machines. The next thing that happens is the sharing of disk. Sharing disk introduces I/O collisions across the I/O network (which should also be independent of every other kind of traffic). I/O contention must be managed, because all nodes have access to the same data at the same time. The trick is (again) to set up a master data access table, just like the master RAM access table, and then synchronize the master data access table across all nodes in the cluster. The problem comes (again) as the data set grows and localization of the information becomes a burden. For example, suppose the database running on the cluster needs to scan 50 million rows and run computations.
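
A toy version of such a master data access table might look like the sketch below. It is a single-process stand-in I wrote for illustration: the class `MasterAccessTable`, its methods, and the node and block names are all hypothetical, and a real cluster would have to replicate this table across machines over the network, which is exactly where the cost comes from.

```python
class MasterAccessTable:
    """Toy shared table mapping a disk block to the node currently holding it."""

    def __init__(self):
        self._owner = {}      # block id -> node name
        self.collisions = 0   # how many requests were refused

    def acquire(self, node: str, block: int) -> bool:
        """Try to take a block; refuse if another node already holds it."""
        holder = self._owner.get(block)
        if holder is not None and holder != node:
            self.collisions += 1
            return False
        self._owner[block] = node
        return True

    def release(self, node: str, block: int) -> None:
        """Give a block back, but only if this node is the holder."""
        if self._owner.get(block) == node:
            del self._owner[block]


table = MasterAccessTable()
got_a = table.acquire("node-a", 17)    # node-a takes block 17
got_b = table.acquire("node-b", 17)    # node-b collides and must wait
table.release("node-a", 17)
retry_b = table.acquire("node-b", 17)  # succeeds once node-a lets go
print(got_a, got_b, retry_b, table.collisions)  # True False True 1
```

Every refused request is a node sitting idle, and every acquire or release is a change that all the other nodes need to hear about.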

The table is partitioned, but the scan becomes a shared job. The process starts on a single node in the cluster. The database thinks it has 20 GB of RAM to access, so it begins to load the RAM in each of the clustered machines with different data sets. As it does, this sharply increases the network traffic between the machines (in order to synchronize the RAM and CPU actions across the machines), and it increases the network traffic between the machines and the disk device. Then a second request for large data comes in while the first request hasn't finished yet. There's not much RAM left on each clustered node, so SWAPPING ensues. This is again a steep increase in I/O (I/O includes everything from network to RAM to CPU to disk access). Again, the synchronization routines take over, and every single node in the cluster tries its best to balance the resources.
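
The swapping step can be made concrete with a small sketch. The 4 GB node size comes from the example earlier in this entry; the working-set sizes for the two requests (3.5 GB and 2 GB per node) are numbers I invented for illustration, as is the function name `swapped_gb`.

```python
from typing import Sequence


def swapped_gb(node_ram_gb: float, working_sets_gb: Sequence[float]) -> float:
    """GB that no longer fit in one node's RAM and must spill to swap."""
    return max(0.0, sum(working_sets_gb) - node_ram_gb)


# First request alone fits in a 4 GB node.
print(swapped_gb(4.0, [3.5]))       # 0.0
# The second request arrives before the first finishes.
print(swapped_gb(4.0, [3.5, 2.0]))  # 1.5
```

In this simplified model, the 1.5 GB of overflow turns memory accesses into disk accesses on every node at once, on top of the synchronization traffic that was already there.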

This of course leads to extremely slow response times for both the first and the second requests, and for any that follow. The synchronization routines slow the process down to a crawl. Operations begin to take on a sequential nature as opposed to a parallel nature, because they run out of RAM, run out of CPU computing power, and the network gets so bogged down that it cannot handle any further requests. Now, we think that by adding a new clustered node we'll solve the problem, but instead it only makes the problem worse.

I think by now it's evident that clustering for a very large data warehousing solution is NOT desirable. Can you put a small number (2 to 4) of clustered nodes on a single MPP solution? Yes, if you architect the nodes to operate independently. Clustered nodes in a single MPP solution are one way to handle this kind of volume growth. Adding another MPP node of clusters is OK, because it maintains autonomy and scales linearly. BUT if you put too much data on a single "clustered" node within the MPP, you run into the same problems that large clusters present. Large "clustering" of machines (in my opinion) won't necessarily be feasible until we have speed-of-light communication between the clusters and we are using RAM-based or nanotech-based data storage rather than physical mechanical disk.

MPP, on the other hand, splits the load. The trick with MPP is to avoid a "hot node", which acts like a cluster in trouble. Balance of the data and the processing in the MPP world is EVERYTHING. But with balance and an appropriate "split" of the data sets, near-linear scalability can be achieved.
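
One simple way to spot a hot node is to compare the busiest node's row count to the average. The sketch below assumes rows are assigned to nodes by hashing a distribution key, which is a common approach but not necessarily how any particular MPP database does it. The function names, the `cust-` keys, and the 4-node setup are all hypothetical.

```python
import zlib
from collections import Counter


def assign_node(key: str, node_count: int) -> int:
    """Pick a node by hashing the distribution key (CRC32 is used for repeatability)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def rows_per_node(keys, node_count):
    """Count how many rows land on each node."""
    counts = Counter(assign_node(k, node_count) for k in keys)
    return [counts.get(i, 0) for i in range(node_count)]


def skew(counts):
    """Busiest node's rows divided by the average; 1.0 means perfectly balanced."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


# 10,000 rows with distinct keys versus 10,000 rows dominated by one key.
spread_keys = [f"cust-{i}" for i in range(10_000)]
hot_keys = ["cust-0"] * 9_000 + [f"cust-{i}" for i in range(1, 1_001)]

print(skew(rows_per_node(spread_keys, 4)))
print(skew(rows_per_node(hot_keys, 4)))
```

With the dominant key, one node holds at least 9,000 of the 10,000 rows, so the skew is 3.6 or higher and that node does most of the work while the others wait. Choosing a distribution key with many evenly used values is what keeps the number close to 1.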

Today, I nearly always choose the MPP option for data warehousing. In another entry at another time, we will explore MPP versus Clustering from an operational standpoint.

If you have success stories about clustering, I'd like to hear about them - please also include the estimated size of the data set, number of nodes, amount of RAM on each clustered node, and number of CPUs on each clustered node. If you have horror stories, I welcome those too. By sharing your experiences we can begin to shed light on this subject.

Hope to hear from you,
Dan L

Posted by Dan Linstedt on March 3, 2006 5:19 AM