Blog: Dan E. Linstedt« DNA Computing - What should it be used for? | Main | High Volume, Low Performance - Can CDC Help? » High volume and low performance - what to do?If you're like the rest of the world these days you've got an ever growing data set, and at the same time an ever shrinking processing window. This is not something you want to treat lightly. In most cases, you are also experiencing severe performance problems and don't know how to deal with it, or haven't been able to solve these issues. Well, there are ways and means in which performance can be improved - I've been teaching, and consulting on performance of VLDW integration systems for 10+ years, there are techniques and manners in which your performance can improve. The catch? You have to be willing to swallow the blue pill (from the movie: The Matrix). Let's just see how far done the rabbit hole goes... So you're caught in performance issues, be it ETL, ELT, EII, EAI, or worse: RDBMS system. You have big data to move or maybe lots of data in very short time windows. Maybe you have huge data to bounce it against (look up, match, consolidate, quality cleanse, etc). Maybe you're pulling together 200M rows for one feed to a dimension. You're source is 20 to 40 tables, and you've got a window to worry about. How do you handle this situation and others like it? What do you look for? There are many different facets to examine in performance and tuning of your systems and your architecture. The major facets that I examine are: Of course, we also consider where the data is coming from: ERP, CRM, BPM, Financials, and other applications that lay up-stream, and whether or not the systems are delivering data in real-time, batch, or both. If there's one thing I rely on in many of these situations it's the numbers. The throughput numbers usually tell the story as to where the problems are originating, and then through orders of scale - what we can do about it. The numbers tell me how much, how fast, and give me the ability to re-factor the architecture where the pain points exist. In plain English please... I go through more than 100+ points in an architecture to pinpoint the top 25 that are causes to the performance problems in a system (too numerous to mention here), but I'll give you a taste of what I usually look at in the RDBMS side of the house. By the way, the customers who go through my performance and tuning assessment have seen on average, anywhere between 400% and 4000% performance improvements - taking run times from 48 hours down to 8 hours, from 8 hours to 2 hours, from 6 hours to 45 minutes, from 56 hours to just under 12 hours and so on. But it means the customers were willing to take my suggestions to heart and implement them. In the RDBMS world here are some things I measure by: And so on... If you have an interest in a particular area, please post a question or comment, and I'll try to blog on that going forward. In the mean-time, please be aware that we offer these assessments with fantastic results. Thank-you, |
Comments
Hi Dan,
Yours was a interesting reading.I am an ETL Developer myself, and I found your blog very informative and fascinating, I would like to know more about designing data ware house in ETL Architect perspective
Posted by: Hemanth | December 18, 2006 2:32 PM