I have the wonderful opportunity to teach a VLDW course at TDWI, and I also have the sheer joy of dealing with data that qualifies as VLDW feeds, and some of the most massive systems in the country, on a consulting basis. But there always seems to be one question: which is better - clustering machines together, or one "big honking box"? (Honking = pushing the horn button in the middle of your steering wheel.)
Well, as usual, I have a very opinionated stance on this, and I've discussed it with Kent Graziano, Richard Winter, and a couple of my other friends. This entry is based on my experience and what I've seen. I then speculate on what happens to each scenario as the system grows (again, based on experience). If anyone out there has had a different experience, I'd love to hear their thoughts.
Here we go.
So what does VLDW mean anyway?
VLDW means different things to different people, but in my case, VLDW means moving 88M to 1.5B rows per load job, moving over 5B (billion) rows each load cycle, AND having billions of rows in the target warehouse to start with. In my world, it doesn't mean moving 88M rows one time for a historical load, then moving thousands or hundreds of thousands of rows per cycle after that. It also doesn't mean having terabytes of information "sitting" inactive in the warehouse, never compared, utilized, queried, or altered during a load cycle. It means actively accessing a majority of those terabytes throughout a week's worth of up-time.
Let's also define what I mean by Clustering:
The clustering that I'm referring to is: a single shared disk across several machines, with each machine "wired" to the others to synchronize memory, CPU, and disk processing - in other words, it is meant to "LOOK" like one big machine. I am NOT including a "cluster" that doesn't share disk; that would be a cluster-MPP mix. And of course a cluster that doesn't share RAM or CPU isn't a cluster either - it's MPP (Massively Parallel Processing).
Let's define what I mean by MPP:
Non-shared disk, non-shared RAM, non-shared CPU. Each unit is SMP under the covers. The units share a high-speed interconnect that allows them to talk to each other, but each unit is independent of the others; they act as a collective operation, and everything runs in parallel.
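The MPP idea can be sketched with a toy hash-partitioning model (my own illustration, not any vendor's implementation): rows are routed to independent nodes by hashing a distribution key, each node owns a disjoint slice, and the only cross-node traffic is combining partial results over the interconnect.

```python
# Toy sketch of MPP data distribution: each node owns its own disjoint
# slice of the rows and never coordinates RAM, CPU, or disk with its peers.

NUM_NODES = 4

def node_for_row(distribution_key, num_nodes=NUM_NODES):
    """Route a row to exactly one node; no shared disk map is needed."""
    return hash(distribution_key) % num_nodes

# Hypothetical data: 1,000 rows of (customer key, amount).
rows = [("cust-%d" % i, i * 10) for i in range(1000)]

# Each node holds only its own slice.
node_slices = {n: [] for n in range(NUM_NODES)}
for key, amount in rows:
    node_slices[node_for_row(key)].append((key, amount))

# A query fans out: every node aggregates its slice locally and in
# parallel; only the small partial results cross the interconnect.
partials = [sum(amount for _, amount in node_slices[n])
            for n in range(NUM_NODES)]
total = sum(partials)
print(total)  # same answer as a single-machine scan, with the work divided
```

The point of the sketch is that the expensive part (scanning the rows) never leaves a node; only the tiny partials do.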
Ok, what have you seen in the marketplace?
I cannot list vendor names here - I try not to write with a vendor bias - so I'll use X's, Y's, and Z's instead. Contact me if you want to discuss specifics. These are TRUE case studies of customers I've visited who are sitting on 12TB to 45TB of active data in their warehouses and are experiencing issues. Remember - these customers are ALL running data warehouses; this is NOT about the OLTP side of the house.
Customer 1: DB X - 45 TB, Clustered environment, having trouble with Network bottlenecks and I/O synchronization, has a daily call with the engineering staff of DB X vendor, with patches written just to keep their DBMS up and running.
Customer 2: DB X - 35 TB, Clustered environment, having trouble with Network bottlenecks, and I/O throughput - wondering what they can do to "fix" their problems, doesn't want custom patches from DB X vendor.
Customer 3: DB X - 12 TB, Clustered environment, trouble with data mining operations, switched to DB Y (MPP solution).
Customer 4: DB X - 18 TB, Clustered environment, trouble with loads, and query times - switched to DB Z and is growing rapidly now.
Customer 5: DB Q - 12 TB, Clustered Environment, trouble with loads, index maintenance, and RAM allocation, switched to DB Y and is growing rapidly.
Ok, here's my two cents - take it for what it's worth. This is the thought experiment I set out on to find out WHY clustering begins to exhibit problems at specific levels of volume while MPP shows no signs of slowing down. Here's what I found, and what I speculate:
1. There is an inherent CAP on the processing power available in most clustered environments. The same can be said for MPP - BUT typically in an MPP environment the CPUs are much faster, the bus speeds are much faster, and the CPUs don't have the EXTRA load of trying to synchronize RAM and I/O across the network - in MPP, each independent node is responsible for its own operations.
2. As the data set increases in size, managing the historical information and keeping all the records in SYNC requires an exponentially increasing amount of hardware and hardware performance. That means: in a cluster, keeping the DISK MAP synchronized across all nodes requires more and more network bandwidth, and more and more I/O bandwidth.
3. The DISK MAP is only one map; there's a RAM map and a CPU MAP to maintain as well. Remember: the true cluster "appears" as one big honking machine to the application.
4. As the data set increases, the synchronization effort doubles, and doubles again - exponentially eating up the available resources.
5. As the data set increases, more SMP nodes must be attached to the cluster - but adding more SMP nodes also exponentially increases the difficulty of synchronization.
And so on; the problem compounds itself in such a way that no amount of money in the world (thrown at the problem) can buy the performance required to handle such large data sets.
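Points 1-5 can be put in a back-of-the-envelope model (my own numbers, not a benchmark): if every node in a shared-everything cluster must keep its disk/RAM/CPU maps consistent with every other node, the coordination channels grow with the number of node pairs, while in MPP adding a node simply shrinks each node's independent slice of the data.

```python
# Back-of-the-envelope model of the scaling argument above. This is a
# simplification: pairwise map-synchronization stands in for the cluster's
# coordination cost; real systems vary.

def cluster_sync_channels(nodes):
    """Pairwise synchronization channels a shared-disk cluster must keep
    consistent: n * (n - 1) / 2, which grows faster than the node count."""
    return nodes * (nodes - 1) // 2

def mpp_rows_per_node(total_rows, nodes):
    """Each MPP node owns an independent slice; adding a node divides the
    work instead of adding coordination."""
    return total_rows // nodes

TOTAL_ROWS = 5_000_000_000  # one 5B-row load cycle, as described in the post

for nodes in (2, 4, 8, 16, 32):
    print(nodes,
          cluster_sync_channels(nodes),          # coordination cost climbs
          mpp_rows_per_node(TOTAL_ROWS, nodes))  # per-node work shrinks
```

Doubling a cluster from 16 to 32 nodes takes the sync channels from 120 to 496, while the same doubling in MPP just halves each node's share of the rows - that asymmetry is the whole argument in miniature.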
A basic tenet in life is to "divide and conquer" when we are faced with large problems. We need to learn to apply this to our data warehouses, especially in VLDW. The only way to divide and conquer is to use MPP - OR to buy a really "big honking SMP machine".
What am I saying?
I'm saying that in order to technologically bridge the gap, the first problem to solve is network throughput - and today, with SMP clusters, there is an upper limit at which the networks between servers can operate. I have yet to see a DS3 "IP card" or even a "T1" IP card that is capable of wiring together clustered SMP nodes, or of wiring disk to CPU. But the problem goes beyond that: the network card (at a hardware level) would then have to take on CPU power to overcome the next problem - the lack of CPU available to run the "synchronization routines" - and the problem expands. The bigger the data set, the more challenging it is to SOLVE. There's a hard mathematical upper limit to what the technology can "do".
This is why, if you look at a machine that runs as a HUGE SMP (32 to 64 CPUs and 64 GB of RAM), you see a super powerhouse - and also why these machines cost so much. The company that produces such a machine has gone to the trouble of solving these problems (or eliminating them) through hardware BUS architecture. It's only on these machines that DB X has been scaled beyond the TB levels I've listed here.
Now, there are a couple of other things I wish to note. There are SMP clusters that are rack-mount, where the interconnect is a backbone with a direct connection across the machines. These delay the problems, but only for a little while. The next thing I'd like to note are the SMP appliances: when plugged in, they act as MPP architecture - independent nodes - and are taking market share from the leading MPP RDBMS vendors. Dedicated rack-mount SMP clusters that act as a "unit" within an MPP environment (handling only a portion of the overall data set) work REALLY REALLY WELL and are extremely fast; plus, they offer the benefit of fail-over and recovery at lower cost than a single LARGE SMP unit within an MPP environment.
Mathematically, there doesn't seem to be an upper limit to MPP data handling, mostly because adding another "node" to the MPP chain divides the work further - and doesn't necessarily "add" to the complexity, because synchronization is not needed.
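A minimal sketch of that elasticity (assumed generic behavior, not a specific vendor's): when an MPP system grows from N to N+1 nodes, rows are simply re-routed by the same hash function modulo the new node count. Each node's share shrinks, and no shared map has to be kept in sync.

```python
# Sketch: growing an MPP system by one node. Re-hashing moves some rows,
# but every node's share of the work gets smaller, and there is no global
# disk/RAM/CPU map to synchronize while it happens.

def partition_counts(keys, nodes):
    """Count how many rows land on each of `nodes` independent nodes."""
    shares = [0] * nodes
    for k in keys:
        shares[hash(k) % nodes] += 1
    return shares

keys = ["row-%d" % i for i in range(100_000)]  # hypothetical 100K-row table

before = partition_counts(keys, 8)  # 8-node system
after = partition_counts(keys, 9)   # one node added

# The busiest node's share drops after the add - that's the "divide the
# work further" effect, with no synchronization cost attached.
print(max(before), max(after))
```

(The inverse also holds in this model: every node you add keeps dividing the largest share, which is why there's no obvious mathematical ceiling.)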
I'll give you several cases I've seen that have MPP in their environments (these are all commercial environments; the public-sector environments are much larger, but cannot be discussed).
1. MPP - 348 TB (2 years ago) and still growing
2. MPP - 5 Petabytes (that's all they can keep on-line); they generate 1 Petabyte for every experiment they run.
3. MPP - 150 TB (last year) and still growing
4. MPP - 268 TB (last year) and still growing
5. MPP - 3 Petabytes (scientific research) and still growing
Big problems require big solutions. I'd be happy to speak with you off-line about this information, as I teach VLDW and performance and tuning, as well as systems architecture, design, and scalability for the future.
Bottom line: clustering (the way I've defined it here) is not suggested for your future if you expect large volumes. I welcome any thoughts, critical or otherwise - I'd love to hear about successes in clustered environments; maybe we can flesh out what's acceptable to cluster.
Posted December 16, 2005 5:49 AM