Blog: Dan E. Linstedt

Disk and VLDW: What you need, can use

In very large data warehousing, or VLDB for that matter, I am constantly asked: What kind of disk should I have? Can I use a SAN? How about a NASD? What about DASD? I have RAID 5; is that good? Now there is RAID 7, 5+, S, 10, and so on. The choice of disk DOES make a performance difference, and when you are dealing with very large data sets you MUST have throughput. This is the optimal end-game.

The answer is quite simple, really: the faster the better, but a minimum throughput of 300 MB to 400 MB per second is required. (That figure is from 3 years ago!) Today, I would suggest 400 MB to 500 MB per second, and again, the faster the better. Now, that said, if your disk cannot achieve these throughput levels, then it's … and so on. Until you can reach these levels of throughput, performance will be difficult, if not impossible, to achieve in a 30+ or even 100+ terabyte system. Did you know that CERN produces 100 TB of data every time they smash an atom? That's an interesting tidbit to chew on, and to top it off, they capture it all.

OK, so what do you need in your disk device? I don't mind whether it's SAN, NASD, or DASD, but I will say this: internal disks are fastest; next up is DASD (the preferred choice for HUGE volumes); then NASD and SAN, but only IF the network is a direct VPN connection to the server and has guaranteed throughput. If you are working with a storage hosting vendor, make sure you have an SLA in place that guarantees throughput, then ask to see the throughput test results on a bi-monthly basis. This will keep them honest. I've been in places where the hosting service will "move" your data around to different disk arrays in their system based on their own internal needs.
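To put the 300 to 500 MB per second figures in perspective, here is a minimal Python sketch. It is my own illustration, not the author's method. It computes how long a single full pass over a given volume takes at a sustained throughput. It also includes a rough fsync-based write test that you could run against a scratch directory when checking a vendor's numbers.

The sketch rests on a few assumptions:
- It uses decimal units (1 TB = 1,000,000 MB).
- It assumes a single sequential stream.
- The names `scan_hours` and `measure_write_mb_s` are hypothetical.

```python
import os
import tempfile
import time

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes, as disk vendors usually quote
MB_PER_TB = 1_000_000     # assumption: decimal terabytes


def scan_hours(volume_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to read volume_tb terabytes once at a sustained MB/s rate."""
    seconds = volume_tb * MB_PER_TB / throughput_mb_s
    return seconds / 3600


def measure_write_mb_s(directory: str, size_mb: int = 128, block_mb: int = 4) -> float:
    """Rough sequential-write throughput (MB/s) for a scratch file in `directory`.

    Writes size_mb of random data in block_mb chunks, calls fsync so the data
    is flushed past the OS page cache, then deletes the file. This is only a
    sanity check, not a substitute for a proper benchmark tool.
    """
    block = os.urandom(block_mb * BYTES_PER_MB)
    blocks = max(1, size_mb // block_mb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the data to the device before stopping the clock
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return blocks * block_mb / elapsed


if __name__ == "__main__":
    # Time for one full pass over 30 TB at the throughput levels discussed above.
    for rate in (300, 400, 500):
        print(f"30 TB at {rate} MB/s: {scan_hours(30, rate):.1f} h")
    print(f"measured write rate: {measure_write_mb_s(tempfile.gettempdir()):.0f} MB/s")
```

Under these assumptions, one pass over 30 TB takes about 27.8 hours at 300 MB/s and about 16.7 hours at 500 MB/s. That gap is why throughput dominates at this scale. The write test ignores concurrency and controller caching, so treat its result as a rough check alongside the vendor's own test results. A purpose-built tool such as fio gives far more reliable figures.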
I've also been in places where they will _not_ guarantee dedicated access, nor will they guarantee exclusive access to disk devices. Think about it: if you're willing to spend that much money on a high-volume solution, shouldn't you be protecting yourself and getting your money's worth? Love to hear your thoughts,