Blog: Dan E. Linstedt

High Volume, Low Performance - Can CDC Help?

In my last entry, I blogged about the high volume and low performance issues that you might run into. In this entry we'll talk a little about CDC (Change Data Capture) and why it is paramount to the success of your systems moving forward. If you've got high volume issues and you don't have CDC in place today, you may be fighting an ever-growing data set that will at some point become unmanageable. If CDC and Real-Time / Right-Time processing (you know how I feel about the term Real-Time) are not implemented together, your right-time delivery system runs a high risk of pushing all kinds of extra traffic across the wire.

Change Data Capture, or CDC, should be a vital part of any back-office BI solution that is put in place today. It may mean getting over the hurdle of signing SLAs with your data service providers, but believe me, it will be worth it. The question that is often missed is: what is my ever-growing cost of NOT implementing a CDC solution? When we think about it this way, we end up at the right conclusion. Why? Because the data sets continue to grow, and when data sets grow, traffic on our network grows, and the logic to decipher changes and transform / remove / record duplicates becomes more complex. With complexity comes system slow-down; with more network traffic also comes system slow-down. All in all, not implementing CDC causes costs to rise, and the faster the business changes and moves, the quicker the cost of not having CDC at the source rises.

Wait a minute! CDC at the SOURCE? How in the world can I do that? I don't even own all my sources... CDC is required on ALL source systems, and by the way, if you are BUILDING an SOA or an MDM solution, or you're setting up a data governance initiative, putting CDC in place will sooner or later become a necessity.
This applies not just to the source systems you own, but also to the data and service providers you don't. Let's take Salesforce as an example. If you outsource your sales management to Salesforce, then you'll want them to implement CDC on any of the changes that take place. Change Data Capture systems become the "expert" logic for providing the traceability and auditability demanded by auditors and compliance initiatives around the world. They provide a safe and consistent means to extract every data set that changes, when it changed, and what it was versus what it changed to.

The overhead on the source systems is often used as an excuse NOT to engage in CDC. This is the wrong way to look at the cost. A better question to ask is: what is the cost of all that extra traffic on my network, traveling through my transformation tools (be it EAI, EII, or ETLT)? I'm sure the cost of all that extra traffic is much, much higher than the overhead cost of CDC on source systems, especially when the data set grows again, or when the interval between deliveries is shortened again.

Now, what do you want CDC feeding? What kind of features should you look for in a CDC offering? Ultimately, if a record changes and has 360 fields (let's say it's a mainframe record or a COBOL-based structure), can I gear the CDC to issue an UPDATE transaction with JUST THE PRIMARY KEY and JUST THE CHANGES? Maybe only 10%, or 36 fields, changed; I don't want all 360 fields running across my network...

These are just some of the questions I would ask of CDC vendors. There are others, but this is a start. If you have CDC installed, I'd love to hear your comments on how it helped your business, and what your headaches may have been in putting CDC in place. If you don't have CDC, and your business is fighting the concept, I'd love to hear the arguments used against CDC (post anonymously if you wish).

Thank you very much for your time.
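
To make the 360-field question above a little more concrete, here is a minimal Python sketch of what "just the primary key and just the changes" could look like. It is an illustration only, not how any particular CDC product works: the function name `sparse_update`, the field names, and the message layout are all hypothetical, and it assumes a fixed record layout where the before and after images carry the same fields.

```python
from typing import Any, Dict


def sparse_update(key_field: str,
                  before: Dict[str, Any],
                  after: Dict[str, Any]) -> Dict[str, Any]:
    """Build an UPDATE message with the primary key and only the changed fields.

    Hypothetical message layout, chosen for illustration. Each changed field
    carries both its old and new value, so a consumer can see what the value
    was versus what it changed to.
    """
    if before[key_field] != after[key_field]:
        raise ValueError("before and after images describe different records")

    changes = {
        name: {"old": before.get(name), "new": value}
        for name, value in after.items()
        if name != key_field and before.get(name) != value
    }
    return {"op": "UPDATE", "key": {key_field: after[key_field]}, "changes": changes}


# A made-up 360-field record: one key plus 359 other fields.
before = {"id": 1001, **{f"field_{i:03d}": 0 for i in range(1, 360)}}

# Change 36 of those fields (10% of the record).
after = dict(before)
for i in range(1, 37):
    after[f"field_{i:03d}"] = 1

message = sparse_update("id", before, after)
print(len(before), "fields in the record,", len(message["changes"]), "sent as changes")
```

In this sketch the downstream tools receive the key plus 36 old/new pairs rather than the full record. Whether a given CDC offering can do this at the source is exactly the question to put to the vendor.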
Comments
I have never seen CDC in a source system done properly. Usually it is something thrown together around log scraping or database triggers (shudder). I have certainly never seen any implementation get close to the 11 functions you mentioned.
I don't think people know what CDC can do, and I don't see it pushed by vendors. I find the ETL vendors, for example, to be quite passive about their CDC add-ons, hardly promoting them at all.
Posted by: Vincent | December 4, 2006 5:20 PM