Business Intelligence Network business intelligence resources

Blog: Dan E. Linstedt


Operating Systems: What you need for VLDW

So you read my post on Hardware, right? If not, take a gander at it; I'd like to think it's mostly complete. This entry focuses on Operating Systems: what the machine needs to work under severe volume loads, and how the operating systems react. In the near future I'll have a post on Applications, including ETL / ELT and Database engines, and then I'll move back into the business side: skill sets, requirements, defining/gathering/estimating, etc. All of these are items that I teach at TDWI in VLDW. I try to keep it fresh. This, however, is a short posting.

Operating Systems require "room to breathe." In high-volume and/or low-latency situations (like near-real-time), OSs are often left in the dust when considering the "operating" aspects of the system, i.e. how a system actually works. People tend to forget that an operating system can make or break the application layer on top.

Let's take a look at some of these systems and how they work (in general), then we'll see how to help them work better with very large data warehouses.

1) Mainframe type OS: Very good at high-speed computation, extremely good at parallelism and partitioning of the work, with multi-threaded operations based on complex algorithms built into the firmware, even when there is no "multi-threading" in the applications. An extremely high-speed hardware bus underneath supports parallel applications. These Operating Systems typically run each application in its own "slice" of memory, CPU, and disk resources. They time-slice accurately and efficiently. The downside? High engineering costs mean high costs to consumers; you're paying for all this fancy engineering. The upside? Things like LPAR and VPAR abilities (logical and virtual partitioning), meaning you can split the machine into multiple partitions and, say, install Windows on one partition, AIX on another, and "Mainframe/COBOL" code on a third.

The partitions can all talk to each other without ever hitting the network, can "share" memory across boundaries, and are extremely fast and reliable.

Usually, what you get out of the box is not necessarily "tunable" per se, but it is already pretty high quality and pretty fast.

2) Unix OS systems: These vary depending on the manufacturer, but there are three or four major players: IBM (AIX), HP (HP-UX), Sun (Solaris), and SCO (Unix). These systems are highly tunable, with many different knobs to tweak depending on the hardware they reside on. The largest thing to remember here is that HP-UX, AIX, and Solaris all manage memory differently; they also handle swap space and threading differently at the core level. This means you'll get different levels of performance depending on the actual hardware configuration. With HP you might want a Superdome with MPP built into the scalable chipset. With Solaris, you want a similar machine. With IBM AIX, you might even consider a P-Series or Z-Series system, also capable of running z/OS underneath with LPARs and VPARs.

The other big thing to remember about Unix OS systems is the allocation of SWAP or TEMP space. It's all about the DISK I/O throughput at that point, and in each Unix (except AIX) you MUST set the space available to 1.5x to 2x the size of RAM. So if I have 16GB of RAM, I'd want at least 24GB, and really 32GB, of temp or swap, so that disk fragmentation is kept to a minimum and multiple threads don't constantly run at 80% of the machine resources. This is called a "run-cool" setup and will serve you well.
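The sizing rule above is simple arithmetic, and a minimal sketch of it might look like the following. The function name and its parameters are made up for illustration; the only inputs taken from the post are the 1.5x and 2x factors and the 16GB example.

```python
def recommended_swap_gb(ram_gb, low_factor=1.5, high_factor=2.0):
    """Return the (low, high) swap/temp sizes in GB for a given amount of RAM.

    The 1.5x and 2x defaults come from the rule of thumb in the post;
    the function itself is a hypothetical helper, not a vendor tool.
    """
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    return ram_gb * low_factor, ram_gb * high_factor


if __name__ == "__main__":
    low, high = recommended_swap_gb(16)
    # Prints: 16GB RAM -> 24GB to 32GB of swap
    print(f"16GB RAM -> {low:g}GB to {high:g}GB of swap")
```

Aiming for the high end of the range is what gives the "run-cool" headroom described above.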

3) Linux OS systems: These are very similar to Unix and have multiple knobs for tuning and tweaking. It's best to call in an expert for both Unix and Linux to tune to specific hardware platforms, or the OS simply will not perform _as well as it could_ in those circumstances. (The same thing applies to swap/temp.)
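On Linux, one quick way to check the swap/temp rule before calling in that expert is to compare the MemTotal and SwapTotal fields of /proc/meminfo. The sketch below assumes the usual "Name: value kB" layout of that file; the helper names and the sample text are invented for illustration, and the 1.5 threshold is just the low end of the rule of thumb in this post.

```python
# Sample text in the /proc/meminfo layout (values are made up: 16GB RAM, 32GB swap).
SAMPLE = """MemTotal:       16777216 kB
MemFree:         1234567 kB
SwapTotal:      33554432 kB
SwapFree:       33554432 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name: <number> [unit]" lines.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_to_ram_ratio(text):
    """Return configured swap divided by physical RAM."""
    info = parse_meminfo(text)
    return info["SwapTotal"] / info["MemTotal"]


def swap_is_adequate(text, minimum_ratio=1.5):
    """True when swap is at least minimum_ratio times RAM."""
    return swap_to_ram_ratio(text) >= minimum_ratio


if __name__ == "__main__":
    print(f"swap/RAM ratio: {swap_to_ram_ratio(SAMPLE):.2f}")
    print("adequate" if swap_is_adequate(SAMPLE) else "too small")
```

On a real Linux box you would pass in the contents of /proc/meminfo instead of SAMPLE. The commercial Unix systems expose the same numbers through their own tools, so this check is Linux-only.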

4) Windows OS systems, 32-bit: These should not be utilized for VLDW systems. Why? For several reasons: 1) PageFile.sys (swap/temp) is SINGLE-THREADED BLOCKING I/O. 2) EVERY process in 32-bit Windows is confined to a 4GB virtual address space, with only 2GB of that available to the application by default!! 3) Multi-TASKING is available, yes, but in practice threads do not get the consistent time slices that heavy parallel work needs. Example: launch Windows Media Player, start one of your favorite songs, then launch Outlook, two or three good-sized Word documents, an Excel document, and to top it off, start a "download" of a significant-sized file. What happens to the playback of Windows Media Player? It skips, even if it's made high priority... This is the result of 1, 2, and 3 working against you. Imagine what happens in a VLDW environment...

Now: Windows 64-bit, yes, a good choice with SQL Server 2005 64-bit... BUT you MUST MUST MUST give it enough hardware to perform properly. Without the right hardware, it will only execute SLOWER than the 32-bit systems.
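For a sense of scale between the 32-bit and 64-bit options, here is a rough sketch of the address-space arithmetic. It only computes the theoretical limit implied by the pointer width. Real operating systems reserve part of that range for the kernel, and real 64-bit hardware implements fewer than 64 address bits, so treat the numbers as ceilings. The function names are hypothetical.

```python
GIB = 2 ** 30  # bytes in one GiB


def address_space_bytes(pointer_bits):
    """Theoretical number of addressable bytes for a given pointer width."""
    return 2 ** pointer_bits


def fits_in_address_space(working_set_gib, pointer_bits):
    """True if a working set of the given size could even be addressed."""
    return working_set_gib * GIB <= address_space_bytes(pointer_bits)


if __name__ == "__main__":
    # Prints: 32-bit ceiling: 4 GiB
    print(f"32-bit ceiling: {address_space_bytes(32) // GIB} GiB")
    # A 16 GiB working set cannot be addressed by one 32-bit process.
    print("16 GiB in 32-bit:", fits_in_address_space(16, 32))
    print("16 GiB in 64-bit:", fits_in_address_space(16, 64))
```

The larger ceiling only helps if the machine actually has the RAM and I/O to back it, which is the point about giving 64-bit enough hardware.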

Cheers for now,
Dan L

  Posted by Dan Linstedt on September 6, 2007 9:35 PM
