Blog: Dan E. Linstedt

May 31, 2005
Carbon Nanotubes for IC Chips

As I discussed in my first articles, nanotechnology is not only here to stay, it has made it into the R&D labs of some of the hottest integrated circuit manufacturers, and now it's on the front page of a mass-circulation publication. Carbon nanotubes and carbon nanowires are being used to help "cool" and shrink the silicon processor environment. Watch this space: soon I will begin a journey into what I view as the creation of a super-DNA computer.

Carbon Nanotube based computing devices

Nanotech continues to move at astounding speeds; just keeping up with it will be challenging, to say the least. More to come shortly.

Profitability in all business cycles - part 4

So you're curious, are you? Have I grabbed your attention yet, or is this not making any sense? In thinking about accountability, the data supply chain, and change requests, there is one key component to making this all happen. That is: to show the bad data with the good, maybe not in the same reports, but to physically separate the bad data from the good depending on the severity of rules breakages.

There's one more thing to think about: if we accept Data Supply Chain as a paradigm, then we should be keeping the business key on the data unique, and the same - once assigned, always assigned. The concept goes back to RFIDs and the manufacturing supply chain.

RFIDs are used to clean up data and provide visibility into the manufacturing supply chain. They are causing accountability in business and providing the means and mechanisms with which to improve and measure supply chains around the world. If RFIDs (which are nothing more than unique identifier keys) can do this for our manufacturing supply chain, imagine what a CONSISTENT business key can do for our DATA SUPPLY CHAIN! It can become the RFID for the data within our systems.
This means businesses MUST abide by the following rules:

I'm talking about more than just simple sequence numbers. We need an international numbering board. What if (just a thought) the RFIDs could carry the exact same number or ID as the Data Supply Chain? Of course that would mean tagging the very smallest of parts in all of our assembly lines. Service companies would never have RFIDs to speak of, except maybe on invoices or paper contracts that are printed. Hmmm, does this mean data is trackable when printed to hard-copy? You bet! Imagine a filing room filled with RFIDs - how easy would it be to track down a document? Maybe the US Patent Office or the Library of Congress could undertake something like this, saving billions of dollars a year (Yea! Less Taxes?)

Next step: Printers that stamp RFIDs on documents according to document numbers that come from DUIDS.

So you get the point: data identifiers (business keys) are just as important as any hard-coded identifier tags we put on products. Curve ball: if we can place a value on the products we produce, and we can begin uniquely identifying our data and its elements, then there's no reason why we can't place a value on our data as well.

Back to the real world: since today we have no commonly accepted notion of truly unique identifiers, business keys will have to suffice.

So what's the problem with today's business intelligence reports and systems? The problem is that by the time the business user sees the integrated data set, every effort has been made to adjust, clean, alter, move, remove, and merge data to make it usable by the business. This is fine until we begin to question what is meant by "One version of the truth." As I've stated in several other entries, TRUTH is in the eye of the beholder and is subjective. It has NOTHING to do with the FACTS of the way the data is captured, stored, and moved around the organization.
While there is value in producing "usable information", we (as BI implementors) have long overlooked the fact that there's also value in producing the unusable facts - the raw data that is messed up, wrong, unmatched. However, everything starts with the analysis of the business keys.

I propose that there are really two answers, both right at the same time. Ahhh - a conundrum? Yup. I propose that along with DUIDS, we should be storing a single statement of FACT in our warehouses, then moving the FACTS into polarized/colorized versions of the truth in Data Marts. This means two basic principles apply:

1. Business rules move to the "output" side of the warehouse, between the warehouse and the marts.

Physical separation of the data is absolutely necessary to begin pushing the accountability back into the business, to begin the IQ cycle and the business process clean up, to begin providing true visibility into ALL data that exists in the source systems, and to begin showing the FULL level of rejects in our data supply chain.

Manufacturing supply chains don't throw away "bad parts". They put them in reject bins, record them, try to figure out why they went bad, and then try to improve the process so they don't make the same mistake again (because it costs them money, time, and competitive advantage). Why shouldn't we treat our data this way? Why do so many implementation specialists INSIST on cleansing, mixing, merging, and constantly fine-tuning the "truth" so that these errors are hidden or disposed of?

By actually separating the bad data into "reject bins" at the lowest level of grain, before it is cleansed, mixed, merged, and so on, we can really begin to take inventory of our source systems and the business processes - we can finally see where our businesses are HEMORRHAGING money, time and competitive advantage.

In our next entry, we'll walk through an example of how this worked at a real customer site. IT'S TIME for OUR DATA SUPPLY CHAIN to step up and begin working for us.
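
To make the reject-bin idea concrete, here is a minimal sketch in Python. It is only an illustration of the principle, not a description of any particular warehouse product: the rule names, the 1-to-3 severity scale, and the record fields are all hypothetical.

```python
from collections import defaultdict

# Hypothetical business rules: (name, severity, check). Severity 1 is a
# minor breakage, 3 is severe. None of these come from a real system.
RULES = [
    ("missing_business_key", 3, lambda r: bool(r.get("business_key"))),
    ("negative_amount", 2, lambda r: r.get("amount", 0) >= 0),
    ("missing_region", 1, lambda r: bool(r.get("region"))),
]


def route(records):
    """Split raw records into 'good' rows and severity-keyed reject bins.

    Nothing is cleansed or discarded: every input row lands in exactly
    one place, and rejected rows keep the list of rules they broke.
    """
    good = []
    reject_bins = defaultdict(list)
    for record in records:
        broken = [(name, sev) for name, sev, check in RULES if not check(record)]
        if not broken:
            good.append(record)
            continue
        # File the row under the worst severity it triggered.
        worst = max(sev for _, sev in broken)
        reject_bins[worst].append(
            {"record": record, "broken_rules": [name for name, _ in broken]}
        )
    return good, dict(reject_bins)


raw = [
    {"business_key": "SLS123", "amount": 100.0, "region": "West"},
    {"business_key": "", "amount": 50.0, "region": "East"},
    {"business_key": "SLS124", "amount": -5.0, "region": ""},
]
good, bins = route(raw)
```

The point of the design is that the row count going in equals the row count coming out, so the reject bins can be counted and reported back to the business rather than silently cleansed away.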
Comments?

May 27, 2005
Profitability in all business cycles - part 3

We're here: dirty data, complex business processes, inconsistent integration points - sounds like what an EDW/ADW is supposed to help solve, right? Parts of it, anyhow, are solved by an EDW/ADW; other parts must be solved by accountability of end-users; still other parts must be solved through SOI (service oriented integration, under the SOA stamp).

We've established rule #1: in a sea of data throughout our enterprises, the single most important data point is the business key - the one and only reference across the company that means something to the business, and allows the business direct access to the data set they are after. Are we ready for rule number two? Not quite yet. Let's explore dirty data further.

Not to change track, but Information Quality is extremely important. It's not just about the data itself; it's about the people, the business processes, the metadata, and the metrics and measurement all used to ensure continuous business improvements. Dirty data and broken business processes can make a company "bleed money." And that's just the START!

Data models that help increase accountability from end-users, and systems architectures that help raise the visibility of business process problems, help stop the bleeding and can save millions of dollars a year if done right. But to understand these statements, we must walk through just how the systems got this way.

So we take the case of the broken business, customer SLS123: we just lost $30M to our big competitor because we took 5 weeks to respond, and our competitor took 3 weeks to respond. Please note, just because they responded quicker doesn't necessarily mean that the quality of their product is better - it just means they stream-lined a portion of their sales, finance, and contracts communications.
Now if they deliver faster, with higher quality - then they've truly got us beat, and we will go out of business if we don't do something to correct the situation (keep up). By the way, this is what ERP systems attempt to address, and sometimes they do a good job of it, but obviously they leave a little bit to be desired (due to high levels of customization), hence the usage of additional tool sets like EAI to move the customer into CRM systems and through even more complex business processes.

After examining our business process here's what we find:

And the cycle goes on, the complexity increases, the touch-points increase. When we look at this particular scenario we discover that there are critical touch points and manual approval mechanisms that must be in place. We also discover interesting auto-synchronization mechanisms hidden in our legacy systems, or even in our re-engineering of the legacy into ERP and CRM. We finally discover that there are unnecessary processes that the data goes through which neither improve the quality nor speed the process up. These are the business processes we wish to eliminate to stop the bleeding.

Now there's the data set. One customer: John Smith, 3 Account Numbers - SLS123, FIN456, CONT259. Can the business trace John Smith at an enterprise level? Not very effectively. Does the business have deep visibility into their data supply chain? No.

Business Rule #2 for effective profitability: Not in a box, not with a fox, not here nor there, not anywhere (Dr. Seuss) - the business key must stay as a consistent representation of the data point from this point forward.

Business Rule #3:

Business Rule #4:

Business Rule #5:

As we continue down this track, we will discuss how an integrated data store (ADW/EDW) can help pinpoint some of these problems from a metrics driven perspective - but only if the right models are in place.
We will also begin showing how to help business users become more accountable in their positions - and actually begin to issue change requests, and allocate dollars to fixing the source capture systems, thus stopping the hemorrhaging of the company while making it more nimble and stream-lined.

Thoughts? Shout out, enter your comments below... I would love to hear from you.

ETL/ELT Scorecarding

I'm happy to see that Doug Laney has joined us here in the blog space. Not to take anything away from his valuable services, but I also would like to say that we are offering a free ETL score-carding mechanism. This is a very short entry to show you where the score-card lives. The downloadable score-card is free and empty, and lists no vendors.

The free ETL/ELT scorecard is downloadable at: www.MyersHolum.com

My thoughts on the ETL/ELT scorecard are as follows: ETL and ELT are two different utilities and really shouldn't be compared except in the areas of metadata, GUI development (no-code environment during development), flexibility, and connectivity. Unfortunately, comparing ETL to ELT in the transformation areas is unfair, but necessary. It is important to evaluate which transformations are provided to you by the RDBMS vendors, and which you have to add to the RDBMS yourself (UDF - User Defined Functions).

However, the true nature of this scorecard looks at sourcing, targeting, metadata, transformations, market stability, cost, number of outside consulting firms, cost of available consulting knowledge, and a few other key metrics.

Please feel free to download, and post your comments, questions, remarks or improvements here.

Thanks,

May 26, 2005
Profitability in all business cycles - part 2

Ok, now that we've introduced the concept, let's walk through some examples of complex business processes and dirty data. Let's find out just what we can do about starting to solve some of these problems.
Furthermore, let's explore the real issue of "broken" business processes. Do you have some of these in your organization?

So profitability is tied to complexity of business processes coupled with dirty data coupled with too much manual intervention. What exactly does this look like? Here's an example:

The customer wants to know approximately when this will be built and shipped, or if there are ways for them to track the product through its build cycle. The business says: well, we can only track it once it's shipped to you, and we can't estimate its cost or its build time until we have designed the custom parts. Customer says: fair enough, when will you have a design complete? Sales says: can we get back to you in a week?

Ok - sales has the customer contact. They qualify the lead through a number of manual intervention processes before passing it off to Finance. Finance takes SLS123 and changes the account number to FIN123. Now I ask you, is there any traceability in this simple example across Sales and Finance at a corporate level? No, not unless someone in finance or sales records the customer account number change (from/to).

Finance runs it through its paces, approves financial lending, and then passes it off to contracts, who run it through a series of complex business processes with manual intervention. By the way, contracts changes the account number from FIN123 to CON456.

The customer finally gets a call 3 weeks later stating they have a contract for the customer to sign. But before they can give a delivery date they need planning to run the manufacturing phase through their systems, so off it goes. Another two weeks, and planning returns to Contracts to provide an estimated build plan and date.

We're already 5 weeks from initial contact, and by the way, the customer has put the same bid in to our competitors. 3 weeks ago, our competitor returned the bid and build ETA to the customer.
We call the customer back and they say: sorry, your competitor won the bid. We lose $300 million.

What happened? Our complex business process has not been optimized or stream-lined. There were unnecessary hand-offs between manual interventions and alternate business units in order to win the business. Imagine if Sales were empowered to a) check financial standing, b) run the contract up against previous builds of similar nature (data mining with confidence levels), and c) run this by a financial analyst and a contracts approval individual - all within 2 days, and return to the customer.

This would a) make the business more profitable, b) make it cheaper to handle contracts and approve financials, c) single out contracts that are too difficult, not our sweet spot, or specialized enough to warrant higher prices, and d) make us highly nimble and competitive.

In order to get there, we must a) reduce the number of touch points on the data, b) utilize data mining tools in an active warehouse to enable insight at the sales contact level, and c) simplify/streamline the business processes between customer contact, estimation, finance, and contracts approval - which means Cycle Time Reduction and business process critical path analysis.

Think of the business processes, both mechanical data touch points and manual data touch points, as a graph of 2D lines (x,y coordinates). Complexity of the process going from A to B is the rise over run, or Y coordinate. The X coordinate is the process number. Then graph the business processes as best as possible. Finally, begin to analyze the graph for critical path - attempting to eliminate touch points and reduce the complexity of the business processes (reducing the Y) to end up with "as straight a line as possible".

Keep in mind that changing keys to information doubles complexity, even if the changes are recorded. I think you'll be delightfully surprised.
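
As a rough sketch of the graph described above, the following Python assigns each hand-off a process number (X) and a complexity score (Y), doubling the score wherever the business key changes. The complexity scores are made-up illustration values, and the `Step`, `profile`, and `candidates` names are hypothetical. The ranking is a simple sort, not a real critical path algorithm.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    complexity: float          # hypothetical score for the hand-off (the Y value)
    changes_key: bool = False  # True if the account number is re-keyed here


def profile(steps):
    """Return (x, y) points: x is the process number, y the complexity.

    A step that changes the business key has its complexity doubled,
    following the rule of thumb in the text.
    """
    points = []
    for x, step in enumerate(steps, start=1):
        y = step.complexity * (2 if step.changes_key else 1)
        points.append((x, y))
    return points


def candidates(steps, top=2):
    """Rank steps by adjusted complexity, highest first."""
    ys = [y for _, y in profile(steps)]
    ranked = sorted(zip(steps, ys), key=lambda pair: pair[1], reverse=True)
    return [step.name for step, _ in ranked[:top]]


# Illustrative scores only; the hand-offs mirror the sales example above.
process = [
    Step("sales qualifies lead", 2),
    Step("finance approval", 3, changes_key=True),   # SLS123 -> FIN123
    Step("contracts review", 4, changes_key=True),   # FIN123 -> CON456
    Step("planning estimate", 3),
]
points = profile(process)
worst = candidates(process)
```

With these illustrative numbers, the two steps that re-key the account stand out as the peaks in the line, which is where one would look first when trying to flatten it.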
All companies who undertake this effort can save millions of dollars a year with half the investment. Furthermore, this drives quality up, profitability up, complexity down, and overhead down, and it speeds up time to deliver. Result? More satisfied customers, and a more nimble business.

Now let's take a look at the dirty data problem (which we'll explore further in Part 3). The first problem is that we need an enterprise view of this customer, even if it has to span business SECTORS, and not just companies within those sectors. This will be the ONLY way to roll up a single customer and pinpoint exactly where their deliveries are within the entire organization. Sometimes this is referred to as the Data Supply Chain (Jill Dyche, Baseline Consulting TDWI 2005).

What if we kept the SAME customer account number throughout all processes? We can pinpoint exactly where in the data supply chain their application is, and we can begin tracking and monitoring (metrics, KPA/KPI) the efficiency of the business process.

Ahh, you say, we have that in place! Ok, but what happens when you re-bill a customer? Do your systems change the Invoice Number? It's the same problem, different data.

Paradigm Rule #1:

So business keys are extremely important to start with as a metric in business profitability. If you can start by pinpointing the places where keys are changed throughout the business, you can begin identifying major breaks in the data supply chain.

We'll dive deeper into these concepts in Part 3.

Thanks,

By the way, TDWI - November, Orlando - come see the Data Vault Data Modeling in play, or read about it at: www.DanLinstedt.com

May 25, 2005
Profitability in all business cycles - part 1

Business should understand how decreasing cycle time, improving quality, and straightening out business processes all lead to increased profitability.
Business should also understand that profitability is directly tied to traceability and accountability, both in business and in the data that business deals with. In these entries we explore the connected notions of cycle time, quality (data and business process), business accountability, and success.

In math, what is the shortest distance between two points? Can anyone tell me what the shortest distance between Customer Contact and Delivery of goods or services is? What does the straight line represent?

Machines do a wonderful job of tracking data, massive amounts of it - humans do a wonderful job of turning that data into information and making it useful for organizations. However, somewhere in the mix, the real "business" that earns profit is lost in translation when the machines are given complex tasks and dirty data.

Information quality and location, along with business accountability/complexity, are two key factors in profitability measurement. The straight line in business should run from the first point of customer contact through all the business processes to delivery of the final goods or services. But I'm sure you already know this.

For instance, most extremely large manufacturing businesses have a cycle as follows:

Each cycle is represented by business units. Each business unit typically owns its own "data" and operational systems, and each business unit typically uses its own "customer key" to represent a customer throughout the life-cycle. Furthermore, there are many major and minor processes in each of these business units that alter and change the customer data. Finally, as the hand-off of the customer account occurs (from one business unit to the next), the customer account numbers frequently change.

What I'm saying is:

Bottom line for this series (the theme) is to answer: how does this affect my profitability?
You may have heard of this approach in the 80s, called Lean Initiatives or Cycle Time Reduction - these days they call it BAM (business activity monitoring) or BPM (business process management). However, these particular concepts roll up into something bigger: TBM (Total Business Management - which includes activities, processes, data, quality of data, accountability, profitability, overhead costs, and so on).

As this series progresses, we will discuss examples of problems and possible solutions. For now, if you care to sound off about what you see in your organization, that would be wonderful.

In the meantime, I've been asked to talk about the data modeling and architecture sides of this house at the IQ conference in Houston, TX (September). Hope to see you there.

May 6, 2005
Structural Mining, Dynamic Data Warehousing, Neural Nets

I can't decide if this fits under nanotech or if it fits here, but I'll put it in this category and focus on the business sides of the house. The winds are blowing outside today as I sit here, anxious for a return call. I've contacted a few individuals at a university which is currently studying structural mining techniques, and will hopefully be discussing some of their progress soon.

In this entry, we will explore the brave new world of what I like to call Dynamic Data Warehousing. I'm not referring to dynamic data sets; I'm referring to dynamic structuring and restructuring of the information systems as a whole.

What is Dynamic Data Warehousing?
In a business sense, the simplistic definition is: to add and/or change the structure of information on the fly based on "content analysis". The adaptation of the structure is in near-real time, and will result in learning things we didn't know before. It basically changes the data model underneath the covers by using neural net techniques and structural analysis ideas.

Why would I want Dynamic Data Warehousing?

What is structural mining and why would you want it?
Structural mining, or structural analysis, is the ability to find out what's right and wrong with the architecture, and to discover new and different methods for storing, retrieving and hooking data up. Structural mining is a key component of Dynamic Data Warehousing (it could be Dynamic Data Integration too) and of the ability to change structure on the fly.

Imagine this: you build a web-service to accept incoming transactions from a provider. Today, it has name and address on it. Tomorrow you ink a deal for them to provide city, state, and zip. It shows up on the feed that night. Let's say that IT "hasn't gotten around to changing the structure" yet, and you have a structural analysis engine applied to the service. No sweat: the new fields arrive, and they are in context of the customer record - the SAE (structural analysis engine) doesn't see any harm in automatically adding the fields to the data model and proceeding with the load. This is a level 3 change (on a scale of 1 to 3: 1 being manual intervention needed before change; 2 being warning: change occurred - 60% to 80% sure that it works; 3 being no problem, context determined with 90% or better confidence rating - change applied).

From time to time (as with all neural nets) we'd have to correct the neural model that the SAE has built, but for the time being, it becomes a central part of the glue for building a Dynamic Data Warehouse (or Dynamic Data Integration store).

On the flip side, it would mean learning some lessons about fraud detection, and teaching those to the SAE as well - so that it can spot potentially fraudulent data trying to get into the system. A gate-keeper of sorts.

I believe that Dynamic Data Warehousing or Dynamic Data Information Stores are the next level of integration. However, to get there requires a data modeling technique that is capable of being altered without losing existing information or corrupting existing structural integrity.
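
The 1-to-3 change scale can be sketched as a small Python function. This is only an illustration under stated assumptions: the text pins down the 60% and 90% marks but leaves the bands under 60% and between 80% and 90% undefined, so the cut-offs below fill those in by assumption. `change_level`, `apply_feed`, the `score_field` callable (a stand-in for the neural model), and the confidence values are all hypothetical.

```python
def change_level(confidence):
    """Map a context-confidence score (0.0 to 1.0) to the 1-3 change scale.

    Thresholds follow the text where it is explicit (60% and 90%).
    Treating 80%-90% as level 2 and anything under 60% as level 1 is an
    assumption; the text leaves those bands undefined.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= 0.90:
        return 3   # context determined, change applied automatically
    if confidence >= 0.60:
        return 2   # change applied, warning raised
    return 1       # hold for manual intervention


def apply_feed(schema, incoming_fields, score_field):
    """Add unseen fields to a schema (a set of column names) by level.

    score_field is a hypothetical callable standing in for the neural
    model; it returns a confidence that the field belongs in context.
    """
    added, warnings, held = [], [], []
    for field in incoming_fields:
        if field in schema:
            continue
        level = change_level(score_field(field))
        if level == 1:
            held.append(field)       # not added until a person approves it
            continue
        schema.add(field)
        added.append(field)
        if level == 2:
            warnings.append(field)
    return added, warnings, held


schema = {"name", "address"}
# Made-up confidence values for illustration.
scores = {"city": 0.97, "state": 0.95, "zip": 0.72, "promo_code": 0.30}
added, warnings, held = apply_feed(
    schema, ["name", "city", "state", "zip", "promo_code"], lambda f: scores[f]
)
```

Holding level 1 fields out of the schema entirely is what gives the engine its gate-keeper role: a low-confidence field never reaches the model without someone looking at it.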
What might be the ROI on something like this?

Thoughts? Would love to hear your comments on this.

May 5, 2005
Big Data = Big Problems = Huge ROI if done right.

There's a lot of talk in the industry today about VLDW/VLDB (very large data sets), and how too much data might not be such a good thing. I take a different opinion on this subject. In this blog I hope to explore the following questions: What is VLDW/VLDB? What are some of the problems with it? What kinds of ROI multipliers might I find in a big-data set? I've recently had discussions with a major credit card processor, and as a result will share with you some of the common issues that they face daily.

VLDW/VLDB is defined to be big data. Does it mean we have a 1TB, or 10TB, or 100TB data store sitting there? No. If the data is sitting there and is not used for business purposes, then by all means it shouldn't be stored on-line (due to cost), or the business may not be looking at their information hard enough, or with the right questions, to use all the data.

Something to think about: Data Mining has begun to be a viable solution for providing analytics, trend analysis, and forecasting above and beyond traditional statistics. In other words, companies with extreme competitive advantage are using Data Mining to reach and discover things about their business that they didn't previously know, or to predict future outcomes with a confidence rating that enables business decisions that make sense.

Having big data and using it are two different things. If you use 80% or better of your big-data sets, then you have a VLDB or a VLDW.

The base definition of Big Data means different things to different people. Someone sitting at 500MB might think "big" is 2TB. Someone at 2TB might think "big" is 8TB or 10TB, and so on. Instead of trying to define big data, I'll discuss the different levels of changes that happen within terabyte sized data sets.

Ranges:

The ranges are defined as a rough guide.
Things change within each range: data models, disk layouts, CPU to disk ratio, speed of networks, sizes of nodes, large SMP boxes vs small MPP vs clusters, queries, indexing, constraints and so on. In other words: what works at 2TB doesn't work at 5-6TB. What works at 6TB won't work at 20TB, and so on.

Of course there are some hardware vendors out there who provide so much horsepower that these ranges don't apply, and in fact as they progress and "data warehousing appliances" become more commonplace, they will handle most of these issues for us under the covers. But for now, assuming we are on existing systems, this is something to think about.

What are some of the problems with VLDB/VLDW?
List of potential problems: (assuming large SMP boxes)

As far as mitigation strategies go, relying on experts or those that have built and architected systems of these sizes is paramount. Architecture is everything in these systems. Without long-term architecture and forward thinking, the systems experience growing pains at around 20TB to 48TB, and then the company must call an all-engines-stop and re-build from the ground up (very costly), or migrate to a new platform (which can also be very costly).

Denormalization is one mitigation strategy that will help, but only in certain cases. Remember that denormalization of data sets will instantly double or triple the storage requirements.

Here's a fallacy for you: Storage is cheap. NOT SO at big data levels. If you buy cheap storage, you get "poor performance" or lack of parallelism. Furthermore, the more "performance" you want to drive out of a VLDB/VLDW, the more storage you may actually need.

So what about the data sets? Why can't we/shouldn't we reduce them?
There are two basic types of information in VLDB/VLDW:

The business users are divided into multiple user groups: 80%-90% are those that use the good data, or moderately good data (good data is open to the end-user's interpretation), and 10%-20% are those that require transactional details.

In the good data set, there's no reason to keep around "old" or unwanted/unused data sets. They should be removed, or placed on a rolling usage cycle. The transactional data set (transactional with history), however, is at the lowest possible grain. The more data the better! Especially if the business is mining the data set, and/or has audit requirements or federal/international mandates that state it must be kept on line.

Data mining loves big data: the more data it can mine, the better its predictions and confidence ratings. The less granular detail it can mine, the worse its predictions are - you might as well go back to aggregates and standard statistics.

In this case, the credit-card processing company also has SLAs with its vendors, along with the need to detect fraudulent activity - they MUST (and do) use a data mining tool on the transactional historical data.

With all these headaches, why build a VLDW? Why not just go back to the old-style analytics backed with aggregations, averages, and statistics? Won't that save cost?

The reason? They are missing enough data to significantly impact their decision making capabilities, especially with the data mining engine. In this game, the business must spend a little to gain a lot - especially if they know what questions to ask and have a firm grasp on how the answers will make them more effective and more competitive.

There's more, a lot more - I discuss the details in my class, along with mitigation strategies. I'd love to meet with you at TDWI in DC (May 19th, 2005) should you wish to drop by.

See you next time.

May 3, 2005
Nanomorphing feedback loops, terminator Eyeballs.

That's right, Terminator as in T2 Eyeballs.
Well, not really that advanced (yet). I just read in May's issue of Scientific American about nanomorphing silicon implants that take the place of damaged light recognition cells in the back of the eye, basically allowing a blind person to "see" images and outlines. They admit the resolution isn't that hot yet, but it will advance like everything else. This article will explore Form and Function, discuss the nature of adaptable neural models, and consider what it means to build a system that could potentially mimic the human brain.

According to the article, the brain can operate at 10 billion synapse firings per second. Who's Synapse? What's a Synapse? And why does it fire? For answers to those questions and more, see your local brain surgeon (just kidding).

Here's the poor man's definition: imagine for a minute a series of interlinked spider webs. Got the picture? Ok. Now, imagine the spider at the center of each web. Each center of the web represents a neuron. Each part of the web spanning outward, let's call that a synapse. Where one web attaches to another, let's call that a dendrite (receptor). A spider catches prey first by having a sticky web, and second by feeling the vibrations caused on the web when something gets stuck there.

Now imagine the neuron (center of the web) building up a charge and sending that charge down one or more synapses (all at once). Once the charge gets high enough, it fires across the inhibitors to the dendrite receptors on the other side. In other words, it is capable of shaking another spider web with a directed charge.

Now imagine 15 layers of these webs, each interconnected with the others, and each layer responsible for a "part" of coverage. The interconnectivity can provide a feedback loop to build up a charge, or to "morph" its neural structure and learn things - or in this case, focus on what's important, like edges and highlights.
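
The charge-and-fire behavior in the spider web picture can be sketched as a toy threshold unit in Python. This is a deliberately simplified illustration of the analogy, not a model from the article or of real neurons: the `Neuron` class, the threshold values, and the link weights are all hypothetical.

```python
class Neuron:
    """Toy integrate-and-fire unit: accumulate charge, fire past a threshold."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold   # hypothetical firing threshold
        self.charge = 0.0
        self.targets = []            # downstream neurons and link weights

    def connect(self, other, weight):
        """Attach this web to another one (acyclic links only in this sketch)."""
        self.targets.append((other, weight))

    def receive(self, amount):
        """Add charge; fire and reset once the threshold is reached."""
        self.charge += amount
        if self.charge < self.threshold:
            return False
        self.charge = 0.0
        # Firing "shakes" every connected web with a directed charge.
        for other, weight in self.targets:
            other.receive(weight)
        return True


a, b = Neuron(), Neuron(threshold=0.5)
a.connect(b, weight=0.6)
fired = [a.receive(0.4), a.receive(0.4), a.receive(0.4)]
```

Here the first unit stays quiet for two small inputs and fires on the third, passing enough charge to trip the second unit as well. A feedback loop would need a guard against endless re-firing, which this sketch leaves out.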
Nanomorphing is changing the hardware layers to suit the needs of the situation, rather than changing the software layers. The nanotech part of this allows different chemical bonds to be "favored" and "unfavored" depending on the electrical current and stimulation, thus changing the configuration at "run-time".

This is an example of just how important it is to bind form and function closely together - the more specific and targeted the functions are, the more compact and the more efficient they can be. The more bound the form is to that function, the more adaptable the form can be - thus more resilient, and quicker to respond or adapt to its environment. Also, surprisingly, the architecture gets more standard, fault-tolerant and redundant, which by the way leads to adapted efficiencies during run-time.

This eye piece (according to the article) is made up of transistors modeled in a neural net fashion, with nanotechnology components, layered 5 layers thick. Each layer provides feedback loops to the last, to allow a charge to build up in a specific area and "fire" a nerve ending in the back of the eye to the brain, resulting in a perceived image.

Note to self: Where's the ACTIVE feedback loop in our Data Warehouses? Are we still in the cave-man stage here? Sorry about that... Moving on.

You think this stuff is too far out? Hasn't happened yet? Too difficult to build? Think again: there's a company "in my back-yard" in Boulder, CO called Genobyte... Check them out: http://www.genobyte.com/ They are already building adaptable hardware and, quite surprisingly, have been doing this since 1997.

Anyhow, my point (that seems to take so long to get to) is this: CONVERGENCE IS EVERYTHING. When it comes to nanotechnology and nanohousing (nano data warehousing of the future), we will be forced to combine form and function in order to build adaptable systems with virtually unlimited scalability.
If we can build a system of nanomorphing hardware, and compensating software with encapsulated dynamic feedback loops, we may have the beginning of something interesting. Would love to hear your comments, thoughts, or questions. Cheers,

Look into the future: Appliance Data Warehouses

The market is shifting. Vendors are packing more and more features and functionality into their devices, and they are also making their devices smaller and smaller. What does the future Data Warehouse look like? Can it be an appliance-like device? What kind of partnerships or acquisitions can we expect? Why would we choose an appliance DW over our own component selections? In this blog I look into the future, just to see if we can answer these questions.

I believe there are changes coming, long overdue changes. In the land of yesterday we would have to go in search of "best-of-breed" software, and then pair that up with best-of-breed hardware, size it appropriately, install it all, and integrate it ourselves (within IT). I believe all that is changing. If it hasn't already, it certainly will shortly. New vendors on the market are offering coupled hardware with built-in RDBMSs. This is just the start, and as good a start as it is, it still has a little ways to go.

Let's talk for a minute. What if you could walk out and buy an ADW appliance (active data warehouse) - self-configured to perform optimally on the machine, embedded within the BIOS, with encapsulated storage and a black-box interface... Would you do it? Especially at a lower cost than buying from RDBMS vendor 1 and hardware vendor 2?

So what does the future device look like? There should also be a BI (reporting tool) card built in. It should have its own IP connections, and reside in its own processor slot as well. The tool and the box configuration should all be browser based. All administration could be fat client I suppose, but why? Why not make it all web/app server?
It's separated from the RDBMS and ETLT engine slots, again so that it can run in parallel, although the BI tool and the ETLT tool should be based on a common metadata framework. Now, depending on the number of nodes purchased, hooking them together through a third pre-configured IP allows them to load-balance across a high-speed backbone. Again, the nodes have nothing to do with each other except to distribute the work-load.

What kind of partnerships or acquisitions can we expect? That's all fine and dandy, but where's the value proposition? I think you may see compliance vendors entering this game too; they are already partnering with storage vendors for appliance-based storage. What makes this work and why? There are a number of companies to watch out there who are moving in these directions. It won't be long before they can meet all these needs with one appliance. Of course it wouldn't hurt for these companies to consider a metadata appliance either, or possibly incorporate that directly into the warehouse appliance. Just a few random thoughts. See you next time.

May 2, 2005

ELT and ETL - candid view of pros and cons.

Now that I've blogged on the needs for an ETL-T engine, I think it only fair to discuss what EL-T still leaves to be desired, and what is required to make EL-T perform. While ETL-T is the industry direction, EL-T has a ways to go before it can "take over". Of course, the notions of ELT "successes" are highly dependent on the RDBMS engine that it puts its data in. Let's explore these notions a little deeper...

EL-T (as I blogged recently) is where the integration industry is headed. Some of the comments I received were in regards to specific tool sets in the integration space. In another blog this week, I'll explore what these tools will need to have in order to survive the next couple of years.
Let's start with the advantages of ETL over ELT: Now, before we knock ETL, let's just say there are still some big benefits to being able to perform "T" in stream, even though the ETL paradigm is indeed "dead" or morphing into something else.
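To make the distinction concrete, here is a minimal sketch of the two styles using Python's built-in `sqlite3` module. Everything in it is assumed for illustration: the `SOURCE_ROWS` sample data, the table names, and the trivial "T" step (trimming a text amount and converting it to a number). Real engines do far more, but the shape is the same: ETL transforms in stream before the load, while ELT loads the raw data first and lets the RDBMS do the transformation.

```python
import sqlite3

# Hypothetical source rows: (customer_key, amount as messy text)
SOURCE_ROWS = [("C1", " 10.50 "), ("C2", "3"), ("C1", "4.5")]


def run_etl(conn, rows):
    """ETL: transform in stream (here, in Python), then load the clean rows."""
    conn.execute("CREATE TABLE sales_etl (customer_key TEXT, amount REAL)")
    cleaned = ((key, float(amount.strip())) for key, amount in rows)
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def run_elt(conn, rows):
    """ELT: load the raw rows untouched, then transform with SQL in the database."""
    conn.execute("CREATE TABLE sales_raw (customer_key TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT customer_key, CAST(TRIM(amount) AS REAL) AS amount FROM sales_raw"
    )


def totals(conn, table):
    """Sum amounts per customer key for the given table."""
    query = (
        f"SELECT customer_key, SUM(amount) FROM {table} "
        "GROUP BY customer_key ORDER BY customer_key"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn, SOURCE_ROWS)
run_elt(conn, SOURCE_ROWS)
print(totals(conn, "sales_etl"))
print(totals(conn, "sales_elt"))
```

Both paths end up with the same totals. The difference is where the work happens: in the ETL path the cleanup runs before the database ever sees the data, while in the ELT path the raw table stays in the database and the RDBMS engine carries the transformation load.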
OK, now let's talk about ELT and what its pros and cons are. Some of the cons: OK, that said: at the end of the day, I still would like the option of ETL-T at a lower cost, flexible enough to deal with the situations that arise. More to come. Cheers for now,