Blog: Dan E. Linstedt

November 23, 2005

Data Integration and Thanksgiving

As we celebrate Thanksgiving this year, I would like to remind you of the many turkeys that will be spread out on dinner tables all across America. There will be pandemonium (which should have already begun among the young turkey cookers anonymous) in preparing all the fixings and even the turkey itself. The quest is for the ultimate tasty, juicy, and best-browned bird in the world.

Then the families will gather around a single bird - hopefully large enough to feed everyone (whose eyes are bigger than their tummies - I know mine are)... The stuffing will be cooked in parallel with the bird, and subsequently partitioned - once the bird and the stuffing are sufficiently cooked. The mashed potatoes, green beans, and other vegetables will be in various states of reference on the stove, and the cook will hope that all the independent food preparation jobs finish on-time and in-budget. There's the additional complication of moving all the independent food categories in harmonious synchronicity to the table at the same time; hopefully they'll all be steaming hot - and won't need a quick run through the oven process of warming over (once again) before being served.

Of course, each separate item requires a different amount of preparation, and given that there are only 4 or 5 burners on the typical stove, and generally one oven - it's a miracle that we deliver the dinner on-time at all. Of course the warming-up-the-buns process requires a different business rule (temperature) than either keeping the turkey warm (once done) or cooking it in the first place. These different business rules usually require 3 to 5 cookbooks (more abstract business rules) that all provide strange instructions on how to prepare the single dish that is being built.
Not to mention that the turkey has a dependency on being thawed out before it can actually be processed. Now we want the gravy - well, we have two choices: go the easy route and buy an external source of gravy, or go the hard route, and WAIT until the turkey is done, then in one fell swoop dip down, and bring the meat juices to the pre-prepared gravy additives - making sure that the temperature reflects just the right amount of integration heat.

Then we get to the preparation - the setting of the table. Time to call on the kids (child processes - and I mean this kindly) to cooperate - who by now are running around the house playing with other child processes and wreaking havoc on the neatly put away toys. Not only do they have to synchronize, they have to put all the plates, silverware, glassware, drinks, gravy boats, carved turkey, and vegetables on the table all at the same time - of course it has to be done before the food gets cold, so they are forced into performing an immediate task, some would say an emergency fix, and producing an acceptable result for the business users, uh - I mean adults.

Meanwhile all the adults are standing around, bellies already full of other information, due to the over-processing of wine, cheese, and crackers - possibly the occasional olive or two. Now, the adults must acknowledge the children's efforts, and everyone is expected to sit down in great harmony. Of course there's always the one adult process that fights with the other adult process over who sits where; this is resolved by swapping one adult with another for a place at the CPU, oops - I mean table. Of course this doesn't prevent the children from kicking each other under the CPU, whereby each of those processes promptly receives a time-out from the moderator or host. Just think, this is happening every year - across America (and for all those who celebrate Thanksgiving), at a dinner table near you.
Imagine trying to assign the metadata to manage all these independent processes across every household, and assign metadata registry entries to manage all the dependencies, the turkey cooking, the timing, where everyone sits, who gets along with whom - and all of this has to happen before midnight on the 24th of November. This is just a peek into our world of Enterprise Integration specialists. For now - have a happy Thanksgiving, I know I will.... Wine and cheese are waiting...

How does your Thanksgiving represent your job? I'd love to hear feedback.

Latest Nanotech Highlights

There is a lot of buzz in the Nanotech sector these days. Many developments have come forth in just the past year alone. Things that people have said can't be done for another five to ten years have been accomplished; everything from self-assembling structures, to utilization of motor molecules to move things around. There are a few things that have caught my eye, and in this blog I will recap just a few of these.

The first is: "Neuroscientists break code on sight"

In this unbelievable article, the neuroscientists actually figured out (or at least started to figure out) a way to encode images, or found some of the mechanisms within the brain that are responsible for encoding images that are seen. Why is this important to me? From a business intelligence perspective it could mean a. much better data visualization, b. new ways of abstracting information, c. a combination of form and function where data points represent the neural network - resulting in "learning something new" rather quickly.

Think about it this way: what if we constructed the world's ONLY "universal data model" with specific functions attached to each point, and then by lighting up those points with different intensities (applicability scores) we could end up with an image or a thought or a fact? This is the way I see this particular advancement. More on how that might work, later...
The next story comes from a company I've been watching for the past two years: Nanosys.

"This technology covers a broad variety of devices including Field Effect Transistors (FET), light emitting devices including Light Emitting Diodes (LEDs) and nanolasers, solar cells, thermoelectric devices, optical detectors, and chemical and biological sensors."

What's interesting here is that Nanosys has proven with this one device that nanotechnology does indeed cross many different aspects of life, from the technology sector to the chemical and biological sector. This underscores the importance of convergence, something I've been blogging on for over a year. The next quote from this story raises some very interesting questions in my mind...

"The technology to integrate different materials at the nanoscale enables us to create nanostructures that perform as devices with multiple functions rather than just materials," said Calvin Chow, Nanosys' Chief Executive Officer. "This significantly increases the value of our nanostructures while simplifying their incorporation into products."

The questions I have are these: assuming that a nanomaterial is now also a nanodevice at the same time, do we now have the ability to create the product (or part of it) known as Wellstone (Hacking Matter, Wil McCarthy)? We could also conceivably create a piece of "wood" made of nanomaterial that can change its composition to a piece of fabric or steel based on programmatic arrangement. Maybe these nanowires are not yet that advanced. Maybe we only have the ability to create a "computationally smart coffee table." Nonetheless, this is a very important discovery.
Here's a fun one: "Molecules that suck"

And finally: "Study shows nanoparticles could damage plant life"

We are already aware of the dangers of aluminum particles in the human body, causing everything from memory loss to Alzheimer's disease - basically that the aluminum is absorbed through the skin, lodges itself in the brain, and blocks normal activity. It's no surprise that a metal like this is dangerous to plants as well. But it raises the question: are there circumstances where trace amounts of aluminum nanoparticles could be helpful? And if so, where should they be applied and under what circumstances? If there were a way to keep them from floating through the air when "sprayed on", maybe we have the next generation weed killer, as long as we don't inhale it or get it on our skin.

Nanotech itself is a phenomenal field of discovery and advancement; each of the pieces I've included highlights a different area of nanotech and its applications or the effects thereof. It will only become more exciting as we dive into next year, and begin to see business applications of these components in everyday life.

What will the future integration component have?

The problem today is that a patchwork of applications is needed to produce a serious integrated view of the enterprise. We should have the following components: ETL/ELT, EII, EAI, Web Services, Registry Managers, and Metadata Management software embedded within our enterprise projects in order to make sense of them. It's a well-known fact that each of these tool sets brings its own value to the table, and that most enterprises today have nearly all the components already in house.

As far as compliance is concerned, we are now seeing devices (appliances) that capture and compress transactions as they flow through the enterprise. A while back I blogged on the future "appliance" needed to keep integration alive, where everything will converge onto a single appliance. I still believe this is true.
Hardware is getting cheaper, and software (for some strange reason) is getting more expensive. Sooner or later, hardware vendors will partner with or buy out software vendors, and then they will merge the software onto the hardware platform. Of course we know why software is getting more expensive:

The future integration component will be a plug and play device that offers an MPP style interconnect, self-discovery of other "like devices" on the corporate intranet, and of course parallelism, HACMP, partitioning, and compliance within the device - all of it hardware driven. Needless to say the device will also offer compression, network sharing, historical copies of raw data, and other pieces like a query interface, information quality transformations, data mining, visualization, and a data modeling interface (just to name a few).

Functionality-wise it will offer right-time data collection, along with batch data collection, compression, and dissemination. The device will be capable of talking to other devices on the intranet and sharing information, recognizing similar information, and moving information around the enterprise during idle or partially idle times. Of course with the rise of right-time, the word "idle" will mean something different: instead of "idle" CPU, it will mean "idle" data sets. Speaking of data sets, the focus will be on compressing and utilizing compressed/encrypted data.

So what drives this huge integration? Bottom line, the data warehouse, or better yet the data integration store (DIS) of the future, will in fact be a plug and play device. Self-healing will be built in through replication. A shared-nothing architecture with the appropriate interchange levels will be provided, along with fault-tolerant dual-fail-over network connections. The device will be cheap and will scale in cost based on storage needs, but storage needs will shrink as compression and encryption take over.
If you want to get a handle on your business TODAY, then I would strongly suggest beginning a master data management project, with classification and ontologies of metadata - developing registries and business intelligence to view/alter/manage the registries across the enterprise. Putting this into a collaborative environment will help it thrive; adding incentives for employees to add metadata and information to the registries, along with managing it, will also produce a thriving result.

I'd love to hear your thoughts on the matter.

Thanks,

November 17, 2005

Is Modeling in your future?

In my studies of nanotech reports, massive scale computing, and extreme parallelism I constantly come across items that lead to the same end. They all have similar findings, they all proclaim the same thing, and it seems a universal axiom is bubbling to the top: Information Modeling is at the heart of successful processing and integration on a grand scale. In this blog I will explore some interesting experiments that have been conducted in DNA computing, which is one of the precursors to the actualization of the Nanohouse.

Don't get me wrong, the computational side is very important as well, and in fact - to get the scalability, FORM AND FUNCTION MUST CONVERGE, and the FORM (the data models) must be flexible and dynamic in nature. This is where Nanotech and Biotech come in; they are currently defining the use of "wet-technology" or natural world models in our current technological world.

"Computational Mechanisms in Bio-Substrates... Leverage massive parallelism, Harvest Nature's toolkit." (1)

The study goes on to discuss how DNA computing is scalable, programmable, and can exist in a 2D and 3D landscape; it also discusses the nature of self-assembly - a concept reserved for Nanotechnologists. As noted in one of my earlier papers on DNA computing, 1 gram of DNA can store multiple terabytes of information. This certainly leads to the notion of a compact nanohouse.
The impact of 3D modeling has already been discussed: the ability to fold relationships, see data in a new light, and begin to program systems based on "landscape" notions, or proximity in height, width and depth. The notions of "model driven development" are central to the development of nanotechnology.

A parallel can be drawn when we look at business development and understanding, particularly in terms of SOA. When we go to build SOA, the "data models" underneath make all the difference in terms of scalability and flexibility. When I look at VLDB / VLDW - it's the same thing all over again: MPP systems are the tip of the iceberg, and shared-nothing architectures rely HEAVILY on the model of the data underneath in order to achieve maximum performance of the queries.

If we add the DARPA term SPATIO-TEMPORAL modeling to the mix, we can begin to uncover the power of 3D modeling. "Capturing interactions in the network of Gene-protein interactions"(1) - If we can capture the effects of interactions between data sets, and weigh their significance using neural computation models, we can begin to dynamically compose and decompose relationships in massively parallel fashion. Beyond that, we can also begin to establish those that are of more importance to us based on historical content and knowledge or small-context discovery. This would be the self-assembly component of the Nanohouse.

Source (2) lists many different programs that DARPA is involved in; while many of these remain closed to the public, their titles are informative and show a heavy convergence in the Nanotech area. Another report shows that bio-molecular computing requires specific modeling methods, and that models can have an impact on both the type of computing and the ability of the computational device to serve its purpose.
The Nanohouse is built from the neural model in the brain; as a massively parallel system tied together with specified form and function, it can scale beyond our current dreams. If you have some interesting links you'd like to share, or thoughts about the future of Nanohousing, I'd love to hear them.

Sources:

EII - Fight the Hype, Build EII for the Right Reasons!

Here we go again, YET ANOTHER EII vendor pushing the fact that they can "replace" the need for a data warehouse. In this blog I will talk about the issues that customers face if they DON'T implement a data warehouse. There are pros and cons to everything; fight the hype that EII is the be-all-end-all solution, it's NOT. EII is one successful piece of the puzzle; we just need to know where it fits.

The article is at: http://www.metamatrix.com/news/cbr-030705.pdf

The first quote I want to discuss is as follows:

Now just hold on, first off who says data marts are expensive? By whose measuring stick? Where are the numbers to back this up? Is this person discussing the entire corporate warehouse or just a single star schema? Unfortunately we don't have answers to these questions, but if a vendor comes to you and pushes EII in this light, I would strongly suggest you ask them these questions.

Let's dive in a little more. EII doesn't replace a physical extract unless the data that is wanted is current. It might be more accurate to say that if EII replaced anything at all, it may be the need for the ODS itself, not necessarily the data marts. Data marts with history serve a huge strategic purpose. Running integration systems (ETL/ELT) night after night, with the cleansing and integration of data on a massive scale, serves a purpose. Most times the enterprise is looking for strategic answers across much of the history. EII is not built for TREND analysis; that's the job of the warehouse. Now, can EII access the warehouse behind the scenes? YES!
That's the beauty of the EII system: it can leverage the existing investment, and it can also leverage all the knowledge and rules that have been built to integrate the data historically.

Here's the next quote:

The point here is interesting. Again, if all that is needed is OPERATIONAL (now) data with no history, EII IS a great solution. But if data has to be trended across history, it will need a data warehouse behind the scenes. Data Warehousing experts do not typically "integrate large amounts of data" for the fun of it. We have business reasons that derive value from all that history. Who says we always replicate it in three or four databases? This is rarely the truth. It's like justifying their argument by saying: who needs three or four operational systems?

Here's the next quote:

True, but only if the business reports are operational in nature. What this gentleman does NOT discuss is the impact on the operational systems of an EII query to originally GET the data out. There is a cost to using this technology, and it needs to be discussed. If you're talking to your sales reps about an EII solution, either bring in an expert to help with the evaluation, or ask the pertinent questions regarding impacts, standards, and best practices.

Finally, at long last, someone else in the article begins discussing the true value of EII as it pertains to the enterprise:

I agree - it definitely augments the reporting with a "fresh" view of the now or current data. EII is a powerful paradigm, allowing the strategic or historical batch warehouses to become an Active Data Warehouse overnight, without the higher costs associated with "activating" the warehouse by writing your own code to load it dynamically. EII has tremendous value in the web-services production layer, along with the enterprise metadata management strategies.
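To make the split between "now" questions and trend questions concrete, here is a minimal sketch of how a reporting layer might decide where a request goes. Everything in it is an assumption made for illustration: the `ReportRequest` type, the `choose_source` function, and the one-day "current window" are hypothetical and are not taken from any EII product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ReportRequest:
    """Hypothetical description of a report: a name and the date range it covers."""
    name: str
    start: date
    end: date


def choose_source(request: ReportRequest, today: date, current_window_days: int = 1) -> str:
    """Pick a data source for a report.

    Assumption for this sketch: anything inside the last `current_window_days`
    counts as "current" data that a federated (EII) query can serve, and
    anything older is history that only the warehouse holds.
    """
    cutoff = today - timedelta(days=current_window_days)
    if request.start >= cutoff:
        return "eii"        # purely operational: query the sources directly
    if request.end < cutoff:
        return "warehouse"  # purely historical: trend analysis
    return "both"           # history from the warehouse plus a fresh view via EII


today = date(2005, 11, 17)
requests = [
    ReportRequest("open orders right now", today, today),
    ReportRequest("sales trend for 2004", date(2004, 1, 1), date(2004, 12, 31)),
    ReportRequest("year to date including today", date(2005, 1, 1), today),
]
for request in requests:
    print(request.name, "->", choose_source(request, today))
```

The third case is the one argued for above: EII augments the warehouse with a fresh view of current data rather than replacing the history behind it.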
What we need to do as practitioners is figure out HOW to leverage the metadata across other tool sets, like retrofitting it (and any changes to it) automatically into ETL/ELT, and propagating the business metadata into a metadata management tool for classification and ontology. One might say that another piece of a successful EII/ETL data warehousing effort would be a service registry tool/utility, and a business metadata rules management component.

It saddens me to find that specific vendors are hyping the "Kool-Aid" approach to EII. This does nothing but damage the notions of EII and what it CAN do, damages the vendors (in the analysts' eyes), and produces false expectations of the technology which IT cannot meet. EII is a GREAT resource, but should be utilized in the right light, and as an augmentation to the data warehouse in place, not as a "replacement" for future data marting or historical efforts.

Thoughts? Vignettes? Ideas? Pulitzer Prize winning essays?

November 11, 2005

IT: Changing from Cost Center to Profit Center

For too long I've heard over and over and over again how IT is this, IT is that - ok, true to some degree (for whatever is meant by this and that). But I've also heard how IT groups (specifically data warehousing groups, enterprise integration groups, SOA groups, web-services groups, etc...) are in fact cost centers. I'm here to share with you a true story about turning a data warehousing project in IT from a cost center to a profit center. Here's a small part of how we did it, and what happened when we did.

I've been blogging about accountability of information, the change in the notions of what a warehouse means to people in the industry, and the impacts of auditability and accountability. This particular entry builds on all of those concepts.
In order to have a profit center in IT it is imperative that the warehouse never be wrong (it becomes a statement-of-fact rather than a version-of-the-truth). Value, value, value to the enterprise - it's like "location-location-location" in real estate. Our job is to build a bulletproof system that passes audits, and actually begins to pass as a system-of-integrated-record for the enterprise.

Ok, enough of that, how do I turn my IT "project" from a cost center into a profit center?

I've been in this position and it's a good feeling. We had more projects funded in the first 3 months of production than any other IT program. We also had the business stating: "IT finally delivered on-time and in budget, a quality product." Of course, they were unhappy that they didn't get the "world" in Phase I, but by Phase II three months later, they were singing like larks. We also had business units around the world asking our team for answers, and whether we could replicate the success with their IT units - in fact, teach their IT department HOW to make it work. We were finally a profit center for our business units and IT. It felt good.

There are many more steps to making this a success; feel free to discuss them by replying to this post.

November 9, 2005

Solving your Business Problems with RFI's

Have you ever wondered how to keep up with the barrage of technology these days? Have you begun an effort for enterprise integration, but are having trouble assembling the right team, understanding the technology, or getting an unbiased opinion from the right consulting firm? I'm going to shed a little light on these different questions in hopes of helping you get your projects off the ground correctly.

I've spent part of my career as an employee, and another part as a professional services consultant. I've worked for PS at a software vendor, as a consultant for a systems integration firm that resold software, and now as a consultant for a systems integration firm that does not resell software.
There are plenty of differences. As a CTO I'm constantly tasked with keeping up with technology. It's up to me not only to see what's coming, but to understand the vendors that are in the space and their capabilities. It's a daunting task, even as a full-time job (amongst my other duties as well).

I suggest that in order to answer the questions above, the enterprise first have a clear vision of where it wants to be in 3, 6, 12, and 18 months. Setting milestones and goals is an important step to getting where you want to be. There are tools - no, not software tools - but consulting tools that can help you achieve your goals quickly.

Some of the keys to business success are as follows:

Of course these are just the top set of requirements. In order to answer the next question I put forward in the header of this blog, let's examine some interesting facts:

1. Enterprise Integration (no matter what you call it) is a journey - not a destination.

A recent project I was involved in completed a POC in a couple of months and assisted in building a team, establishing executive sponsorship, documenting requirements, and vendor selection in Information Quality, ETL/ELT engines, VLDB RDBMS engines, Hardware platform selection, and Business Intelligence delivery. What made this a success was having proven best practices ready, vendor score-cards (and the metadata/business requirements for each), along with a highly skilled focus on how the technology impacted the firm in different areas.

There are many more steps to this process which I will gladly discuss with you. What are you struggling with? What are you interested in? Let me know, and I'll post more about these interesting topics in hopes of helping you overcome these obstacles.

November 3, 2005

Redefining the "Data Warehouse" and Combining ODS+EDW

Alright - let's get down to business.
I've blogged before about the convergence happening in the marketplace, but we've not stopped to consider what should be happening across the ODS and Data Warehouse. I went to dinner with Bill Inmon the other night, and he told me his "basic definition of what a data warehouse is, is changing." I agree - and by the way, it's not just me. I've spoken with Stephen Brobst, Claudia Imhoff, Larry English, and quite a few others. The base definition of what a warehouse is and represents is changing - and for good reason.

There are compliance initiatives afoot, there are problems with multiple copies of the data hanging around in the system, and there are issues of change to be covered in the source systems... These are real cases from customers today, and from an environment I worked in - in Government over 10 years ago, long before SOX or Basel made it to the commercial arena. We had no choice: our warehouse HAD to be auditable. Believe it or not, the customer we built this for saved millions of dollars, and found huge processing holes in their operational systems which were then fixed, all as a result of capturing RAW data in an integrated format in the data warehouse.

What's happening to my Data Warehouse?

Now hold on, did you just say the Data Warehouse is becoming a system of record? Let me ask you this: Where is the "single copy of customer master list, or product master list, or service master list, or contracts master list?" Does it exist in any one source system or is it generated by the data warehouse? Ahh - caught you looking... It usually exists in the Data Warehouse - BUT it should really exist as a data mart (we'll talk about this shortly). When an auditor walks in, will they be able to find or audit this customer master list in any single source system as a system of record?
Probably not; the notion of System Of Record has shifted, the responsibility now lies with the one system that is producing these "master lists" - after all, you run your business on these lists, don't you? Here's another reason for system of record shifting to the warehouse: Suppose you are pulling sales information from 5 different source systems, the business goes for 4 months, and makes a change to the source system data architecture - and makes an additional change to the processing engine. 12 months go by, and an auditor wants to "see" the old data, and audit the old system (the way business USED to be done). How easy or hard is it to "restore" the old system, with the old data? Fairly difficult, why? the data architecture changed underneath. This does not bode well for a source system that SHOULD have been a system of record. Flip the coin, and store RAW data as-it-stood on the source system, but in an integrated fashion in your data warehouse; now what have you got? A solid architecture (if modeled properly) which allows data to be auditable from that time period before the change. The Data Warehouse has now become a system-of-record. Here's one more: As companies move towards right-time integration, or active data warehousing, the system (the warehouse itself) becomes more and more involved in "operational" activities. The uptime goes up, the SLA's increase, the requirements for "now" data increase, and by the way, the responsibility or accountability of that system goes up as well. One might call this an "Operational Data Warehouse" which combines both ODS and Data Warehouse in one fell swoop. However, unless there are two copies of the data to be kept (which I don't agree with), there can be a single data architecture and an enterprise strategy to keep this ODW running, accountable, and auditable - but it also requires a change in the way we "do" warehousing as a business. 
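The "store RAW data as-it-stood" idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a warehouse design: the `RawHistoryStore` class, its method names, and the sample order record are all hypothetical. The point it shows is that an insert-only store keyed by business key and load time can still answer the auditor's question after the source system's data architecture has changed.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


class RawHistoryStore:
    """Insert-only store: rows are kept exactly as the source delivered them."""

    def __init__(self):
        # business key -> list of (loaded_at, source, payload), kept sorted by load time
        self._versions = defaultdict(list)

    def load(self, business_key, source, loaded_at, payload):
        """Append a new version; existing versions are never updated or deleted."""
        self._versions[business_key].append((loaded_at, source, dict(payload)))
        self._versions[business_key].sort(key=lambda version: version[0])

    def as_of(self, business_key, when):
        """Return the row as it stood at `when`, or None if nothing was loaded yet."""
        versions = self._versions.get(business_key, [])
        load_times = [version[0] for version in versions]
        position = bisect_right(load_times, when)
        if position == 0:
            return None
        loaded_at, source, payload = versions[position - 1]
        return {"loaded_at": loaded_at, "source": source, "payload": dict(payload)}


store = RawHistoryStore()
# The row as the source system produced it before its architecture changed...
store.load("ORDER-42", "sales_sys_1", datetime(2004, 6, 1), {"amt": "100"})
# ...and as it looks after the change, with different column names.
store.load("ORDER-42", "sales_sys_1", datetime(2004, 11, 1),
           {"amount_usd": "100.00", "channel": "web"})

# Twelve months later the auditor asks how the order looked in August 2004.
old_view = store.as_of("ORDER-42", datetime(2004, 8, 15))
print(old_view["payload"])
```

Because the payload is stored untouched, the old shape of the data survives the source system change without anyone having to restore the old system.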
Typically data is munged, changed, aggregated, and cleansed BEFORE it's moved into the warehouse. But NOT before it's moved into the ODS, right? Hence the original need for the split - but again, once data has been altered, changed, aggregated, or cleansed it is no longer auditable UNLESS the process has kept "what it was before the change, what it is after the change, and when it changed" as a log somewhere - doubling the amount of data it has to deal with. This drives complexity up on the source systems and keeps us from loading data into our warehouses when it's "ready" on the source systems, and rarely do these processes keep compliant audit trails. Not only that - but when there's an error, or the business changes, there's a HUGE impact to the warehouse and the data within, sometimes causing the need to "re-state" the data, or change the entire warehouse to suit the needs of the users "today" (see my blog on version of the truth versus facts).

Well, I'm here to say that this isn't the way it should go. Business should not be constrained from making a needed change just because the warehouse and its incoming processing suffer a huge impact as a result of the change.

I'm here to tell you that tomorrow's warehouse will load RAW data at the lowest level of grain, as it stood on the source system - integrated by business keys, nothing more; no more cleansing, no more quality, no more complicated processing to load the warehouse. No, that moves downstream to the PULL from the warehouse in creating the data marts that the business needs to survive strategically. I'm not saying you can't mark the data with errors; I'm saying that tomorrow's warehouse will combine the ODS in a single structure, feed in both right-time and batch, and will keep a system of record of data across the source systems. The warehouse or Operational Data Warehouse of tomorrow (today) will require compliance and auditability and parallel loading, and will reduce the amount of data flowing outward to the marts.
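Here is a small sketch of what "cleansing moves downstream to the PULL" could look like. It is only an illustration under assumptions: the row layout, the `flag_errors` and `build_customer_mart` functions, and the "last source wins" rule are hypothetical stand-ins for whatever business rules a real mart would apply. What matters is that the raw rows are read but never modified, and that bad rows are marked rather than silently repaired.

```python
from copy import deepcopy

# Raw rows as they stood on the sources, integrated only by business key.
RAW_ROWS = [
    {"customer_key": "C-001", "source": "crm", "name": "  acme corp ", "country": "us"},
    {"customer_key": "C-001", "source": "billing", "name": "ACME Corporation", "country": "US"},
    {"customer_key": "C-002", "source": "crm", "name": "", "country": "DE"},
]


def flag_errors(row):
    """Mark problems in a raw row without changing the row itself."""
    errors = []
    if not row["name"].strip():
        errors.append("missing name")
    return errors


def error_report(raw_rows):
    """List (business key, source, errors) for every raw row that fails a check."""
    return [(row["customer_key"], row["source"], flag_errors(row))
            for row in raw_rows if flag_errors(row)]


def build_customer_mart(raw_rows):
    """Apply today's business rules on the way OUT of the warehouse.

    Hypothetical rules for this sketch: skip rows with errors, tidy names and
    country codes, and let the last source listed win for each business key.
    """
    mart = {}
    for row in raw_rows:
        if flag_errors(row):
            continue
        mart[row["customer_key"]] = {
            "customer_key": row["customer_key"],
            "name": row["name"].strip().title(),
            "country": row["country"].upper(),
        }
    return mart


snapshot = deepcopy(RAW_ROWS)
customer_master = build_customer_mart(RAW_ROWS)
print(customer_master)
print(error_report(RAW_ROWS))
print("raw data untouched:", RAW_ROWS == snapshot)
```

If the business changes its rules tomorrow, only `build_customer_mart` changes and the mart is rebuilt; the raw history does not have to be re-stated.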
The business processing (rules, cleansing, quality, and production of "master" lists) will become the job of data marts. Why? Because any time your process changes data, it's the current version of the "truth" at that time, and is subjective to whoever is looking at it. Master lists are really data marts. The quality is needed, the aggregation is needed, but if we can't show where the errors are in our source data set (raw data set), we can't prove where the operational reports are broken. We also can't be held to compliance or be auditable.

Bottom line: the Data Warehouse must contain current, up-to-date, and historical views of raw data, integrated by business key, but not changed or altered in any way. The architecture must support business changes without losing any history. It can be done, and has been done. The business rules processing, including quality, cleansing, munging and aggregating, must move downstream from the warehouse - we must realize that any time we change the "statement of fact" we are producing a slanted version of the truth, today's truth for today's business user - which in essence is a data mart.

This whole topic may strike some of you as "wrong"; I'd love to hear what you feel about this.

Until next time,

November 2, 2005

Real-time versus Right-Time, Who's Right?

Nothing makes my skin crawl more than to hear "REAL-TIME SYSTEMS" shouted from the pulpits, especially from those in-the-know who should never use this phrase. This is a media phrase used by marketers for marketing to grab market share. It's a FALLACY, a falsehood, a nonsensical term. Not even the fastest systems in the world are REAL-TIME.

Warning: This blog is a VENT, read at your own risk of agreement. Also note, parts of these ideas are credited to Stephen Brobst, as we have had many discussions on this topic.

First, I have to apologize to those who need to use the term for marketing purposes, but second I have to say: CHANGE YOUR LITERATURE!
The language is just flat-out wrong. There is no such thing as REAL-TIME SYSTEMS.
OK, here's my point. Humans create machines and systems in their own image. My brain works a certain way at a certain speed - we have not been able to replicate the way a full human functions. For example: if I set my hand down on a hot stove, I've usually burned my finger by the time I pull my hand away. If my body were real-time, I would've known as I was setting my hand down that the stove was too hot and would burn me, and I would've pulled my hand away before it got burned. If my body and my own nervous system can't operate in real-time, how in the world can I ever construct a system that operates in real-time?
Anyhow, the correct term is RIGHT-TIME. The data arrives at the RIGHT TIME to make the business decision. If we're talking about government systems or classified information, we're talking about milliseconds or less. If we're talking about general business decisions, we may be talking 3 to 5 minute refresh intervals, or maybe 1 minute refreshes - depending on the BUSINESS NEED. Right-time is similar to "version of the truth": it's interpreted and it's subjective. Right-time today may not be right-time tomorrow. On the other hand, if the business can show the return on investment they'll get by answering the questions within X time-frame, then they may have a case they can justify with money.
Think about it in technical terms: faster networks require faster switches. Faster switches and networks (usually) require more computing power, and more computing power usually requires faster disk storage. To buy all this "faster stuff" requires money, lots of it. It also requires a better architecture. For example: what "worked" to move 5 rows in 5 minutes from point A to point B no longer works to move 15 million rows in 5 minutes from point A to point B.
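As a rough back-of-the-envelope check on that comparison, the sustained rate a load has to hold is simply the row count divided by the window. The helper below is a hypothetical illustration (the function name and the assumption of a perfectly even, uninterrupted transfer are mine), not a sizing tool.

```python
def required_rows_per_second(rows: int, window_minutes: float) -> float:
    """Sustained rate needed to move `rows` within `window_minutes`.

    Assumes an even, uninterrupted transfer, which real loads rarely get.
    """
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    return rows / (window_minutes * 60)


small = required_rows_per_second(5, 5)
large = required_rows_per_second(15_000_000, 5)

print(f"5 rows in 5 minutes:          {small:.4f} rows/s")
print(f"15 million rows in 5 minutes: {large:,.0f} rows/s")
print(f"scale-up factor:              {large / small:,.0f}x")
```

Fifteen million rows in five minutes works out to 50,000 rows every second, three million times the rate of the small case. A design that was comfortable at the small rate has no reason to survive that jump, which is why the architecture and the budget have to change with the requirement.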
What works (architecturally) to move 15 million rows in 5 minutes no longer works to move 500 million rows in 15 minutes, and so on. The problem becomes exponentially more difficult to solve, requiring better planning, better design, and faster, more expensive processing power. The question is: for what? What is the objective you (business person) are trying to achieve? The truth is, when told how much it costs to implement "right-time", business people back down and say, well - we could use a 10 minute refresh instead of a 5 second refresh. That would work just as well.
Back to the point: there simply is no such thing as real-time. It's all about right-time: finding out what's needed, and when, and delivering it within that time frame, keeping in mind that right-time is subjective and means different things to different people. So what's the right-time today may not be the right-time tomorrow.
Do you have a "right vs real" time situation to discuss? I'd love to hear your thoughts, feel free to comment.
What does ETL do that EAI Can't?
The reason for these posts under SOA is that this is where the convergence of these technologies is headed. In other words, the SOA "fabric" relies on all of these tool sets and paradigms working together to achieve best-of-breed integration. I've been asked about this specific subject on another post, so here are my thoughts. My thoughts tend to be on the edge, and will hopefully spur some comments from knowledgeable individuals in the field. With that, let's take a look.
Those who have been living in the EAI world (especially if they're from a vendor) will try to tell you that their EAI tool is the be-all and end-all solution for integration. This is just flat-out wrong. If it were true, then we wouldn't have had the rise of ETL / ELT, and now the rise of EII engines and web services. EAI DOESN'T do everything. One of the arguments they give us is: the whole world should be "real-time", so there's no need for batch at all... That's flat-out untrue.
The whole world can't be real-time, and it's not real-time at all anyhow (this is a fallacy all by itself). The CORRECT notion is RIGHT-TIME, not real-time (a blog for another day).
OK, so if we take the marketing statements they have, which are geared to sell their product, and apply them to the business, here's what usually falls out:
EAI:
Again, EAI is GREAT for application integration; it is GREAT for "transactional capture and movement". Now let's take a look at why not everyone needs this.
Right-Time Data Requirements:
There are others (which I will blog on in the near future). BUT for now, the business needs to understand what they're getting into. They often ask for the world and a silver spoon, but when it comes down to brass tacks, can they pay for it and justify the cost with business value? EAI, again, is a PUSH technology (see my post about Push Vs Pull). EAI - WON'T HANDLE BIG BATCHES OF HISTORY.
ETL or ELT or ELTL:
Most of all, it is often lower-cost with a quicker implementation time-frame, and it's about using the right tool for the right job.
There are hundreds of articles out there that discuss why ETL, why EAI, and what the purposes of each are - I'm just trying to give a glimpse into each.
Recap:
We typically use both tool sets in conjunction with each other - best of breed to answer the questions the business has. But generally this is only the case if the business has already justified the need for EAI; otherwise ETL/ELT is the tool of choice for warehousing integration.