I recently had the chance to work with the Beta program of SQLServer2005, and Integration services. Let me tell you, it was amazing. There are several highlights in the technology that I'd like to share. This entry will go through these items. There are only a couple of things that I'd like to see improved. My basic thoughts are: this will give customers who are tight on budget, or not moving huge volumes per load-cycle, an opportunity to really begin building out a true data warehousing/BI solution.
* Database/Table partitioning. At long last, SQLServer has implemented range-partitioning. This will be a HUGE boost to performance on larger data sets, as long as the partitions are architected properly across the existing disk sets. If you're not familiar with partitioning, now's your chance to read up on it - do all the reading you can before you upgrade. Partitioning is not something you simply want to "turn-on". However, SQLQueries that use clustered indexes can finally avoid table scanning when partitions are invoked, why? next feature... Other types of partitioning (like hash, and sub-partitions) are said to be in the works for a future release.
* Query Parallelism. Query parallelism has been implemented to take advantage of the database and table partitioning. The queries can and will be broken into their respective components to take advantage of the range partitions. Queries also take advantage of new performance of the SQLServer2005.
* Performance. SQLServer2005 has extended its performance by optimizing to 8k block sizes, windows operating system hooks, and shares RAM caches with I/O buffers. They've streamlined the I/O from SQLServer2005 to the disk, and as I understand it - bypassed the page-file to bring data into RAM (although page-file is still used for locking and swapping). The performance of SQLServer2005 is much higher than 2000, and can process millions of rows per minute, even without clustering.
* Clustering. There are new mechanisms for clustering SQLServer2005, and as I understand it, it can take shape of multiple clusters on a network of nodes (SMP under an MPP look and feel). This brings new meaning to low-cost high power computing. This notion is still challenging to setup, and make happen with SQLServer2005, but will get easier as Microsoft continues to release their product.
* Fast fast fast, is all I can say. I saw a windows platform processing 2 Million rows in 2 minutes from a flat file, into SQLServer2005 through Integration services, and through a data mining transformation that provided name and address cleansing (and learned) - It was based on a neural network that you could tune. This was done on a single CPU laptop running all the software. It's been tuned to run with Flat Files, and SQLServer connections in parallel. Finally, direct connections gain parallel utilization and catch up with other DBMS vendors.
* Tons of transformations. Comes out of the box - lots of transformations, as I mentioned just a minute ago, one of them is a neural network. It also houses the parallel processing properties for splitting data (in parallel) down multiple transformation paths. They also have the notion of an error control and error target, which you can control the rules for.
* Workflow style transformation - all transformations are workflow style, meaning they are "runnable designs". You can see the row count increase as the data passes through the transformations along the way.
Improvements I'd like to see:
* Metadata. I had heard that they don't have a real "metadata repository", they apparently are working something out with Meta Integration (you'll hear about them shortly). They need to get into the OPEN metadata game.
* Import/Export of transformation logic via XML. There apparently is no import/export of the transformation designs, therefore there is no current capability for sharing the transformation logic across servers.
* Reusable transformation "workflows" - they do not have the concept of reusability down yet. I've heard they are working on this.
* Biggest one: Connectivity options. Today, if you want Oracle Data, DB2 UDB, DB2 AS/400, Teradata, Sybase, MySQL, or Informix you have three options: use BRIDGE software provided by the other vendor (which is slow), use ODBC and OLE*DB drivers from Microsoft, some are fast, some are slow, or write your own for native connectivity using their SDK.
Bottom line: if you're a Microsoft only shop, SQLServer2005 and Integration services is a HUGE boost to productivity over DTS and SQLServer2000 - well worth the upgrade cost. If you're a small shop that deals with mostly flat-file transfers from other databases, it will help as well. If youâ€™re into volume - and native connectivity, I suggest looking elsewhere, and letting Microsoft grow into this space.
More to come, did I miss something? Post a reply!
Posted April 26, 2006 5:45 AM
Permalink | 1 Comment |