Blog: Dan E. Linstedt

October 26, 2005

Got Dirty SOX? EII & ADW & IQ

Recently my discussions in the field have centered on Information Quality (or the lack thereof), the EII tool set, and the Active Data Warehouse (right-time data warehouse). We will explore this exceedingly dry (hopefully interesting) aspect in this blog entry, particularly in relation to Compliance and Integration - but I felt that it fits under SOA as well - so here goes.

Information Quality (according to Larry English) includes business processes, data, reporting, and the people involved in interpreting the information. But Information Quality both helps and hurts compliance efforts, particularly when the corporation is audited. One over-simplified definition of SOX (at least at the data level) is: can your system show the "before it was changed, after it was changed, and when this change occurred" audit trail? Any "software product" that claims to be SOX compliant without being able to answer these questions is flat out wrong.

What does this have to do with EII? I've written an EAB (executive action brief) on this site that talks about "making your data integration processes compliant" - click on B-Eye-Network, go to the HOME page, and look for the "education" link in the lower left corner. Now, if EII is _not_ transforming data, then the data set it pulls from should be "SOX compliant" - it shifts the onus back on to the source systems to maintain audit trails of any information they change. For source systems, this is a no-brainer: they are capture systems and are supposed to be "systems of record" for the business, which means the business is already supposed to "trust" these systems - even though the data quality may or may not be there.

Time out - this doesn't make a lot of sense, where's the quality in all of this?
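As a rough illustration of the "before, after, and when" audit trail mentioned in this entry, here is a minimal Python sketch. It is not taken from any SOX tooling or vendor product; the names `AuditEntry`, `apply_with_audit`, and `normalize_state` are hypothetical and exist only for this example. The idea is simply that any process that changes data gets wrapped so that every changed field is logged with its old value, new value, and timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class AuditEntry:
    """One before/after/when record for a single changed field."""
    key: str
    field: str
    before: Any
    after: Any
    changed_at: datetime


def apply_with_audit(
    key: str,
    row: Dict[str, Any],
    transform: Callable[[Dict[str, Any]], Dict[str, Any]],
    audit_log: List[AuditEntry],
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Run `transform` on a copy of `row` and log every field it changes."""
    when = now or datetime.now(timezone.utc)
    new_row = transform(dict(row))  # work on a copy so the raw row is untouched
    for field in sorted(set(row) | set(new_row)):
        before, after = row.get(field), new_row.get(field)
        if before != after:
            audit_log.append(AuditEntry(key, field, before, after, when))
    return new_row


def normalize_state(row: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical cleansing rule, for illustration only.
    row["state"] = row["state"].strip().upper()
    return row


audit_log: List[AuditEntry] = []
raw = {"name": "Acme Corp", "state": " co "}
clean = apply_with_audit("cust-1", raw, normalize_state, audit_log)
```

After this runs, `raw` still holds the value as it arrived, `clean` holds the transformed value, and `audit_log` carries one entry tying the two together with a timestamp.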
Quality Tools are nothing more than transformation engines (ok - they do a LOT more than that), but when it comes down to bare bones they are CHANGING DATA sets. Therefore: everything that applies to ETL/EAI and data mining (in accordance with compliance) also applies to EII, and to the processes that load active warehouses.

Wait a minute! Active Warehouses have a refresh cycle that's too fast to put a quality trigger in play, right? What we will say is this: even ADW's still have "strategic" initiatives to them, which means that only the tactical sides forgo the quality settings (until the strategic quality engine cleanses the historical data, and sometimes that historical data is returned to the source during transactional/tactical processing).

Remember this: Information Quality is SUBJECTIVE. It is one version/one flavor of the truth - truth is subjective, and will change depending on the eye of the beholder (the end user). Therefore, quality engines MUST be held accountable and auditable by surrounding them with processes that capture before-after-when (BAW).

Can EII use IQ tools or data mining processes in-stream? Can ADW use IQ tools or data mining processes in-stream? Quality should come "after" the load of the raw data to the data warehouse, or "after" the load of the raw data into the EII engine. It should be secondary, and applied only if there is an audit trail mechanism in place to trace back to the original data.

Thoughts and comments are welcome; I'll blog more on the subject if there's an interest.

Cheers,

October 25, 2005

RNA and RNAi in Nanohousing

There has been renewed interest in RNAi and RNA lately in the biotech world (don't forget, biotech is a part of nanotech - or the other way around). RNA (or ribonucleic acid) apparently has encoding and decoding instructions for gene sequences; RNAi apparently has the ability to block or inhibit specific gene sequences. See an introductory article here.
In this blog we will explore (theoretically anyhow) what this might mean to the nanohouse and DNA computing. There are some neat pictures (simulated/generated) showing the DNA structure here. If you don't think nanohousing is being worked on, think again. Here's an IEEE link to a conference that occurred in 2004. Ok, let's get started.

In the Nanohouse, we need to learn from this. The future nanohouse won't be JUST a data warehouse, or JUST an ODS, or JUST an OLTP system - no, it will be an "integrated data store" where the molecules collect "data" as history, when it pertains to the context in which it lives - assigned by "key" components of information that only it recognizes. Different parts of the DNA structure will represent different and distinct chemical keys - for storing different types of information.

Well that's all well and good, but we need functionality in the form of RNA and RNAi to act on the DNA strands that we "build". We also need catalyst-type events to trigger interaction across the DNA sequences. Here's a quote from a Vienna RNA project that discusses this:

Biomolecules exhibit a close interplay between structure and function. Therefore the growing number of RNA molecules with complex functions, beyond that of encoding proteins, has brought increased demand for RNA structure prediction methods. While prediction of tertiary structure is usually infeasible, the area of RNA secondary structures is an example where computational methods have been highly successful.
http://nar.oxfordjournals.org/cgi/content/full/31/13/3429

Wow! So this means a Nanohouse is definitely feasible? Where does this impact my business today? This is interesting, how does modeling take place in this element? How does this play with RNA and RNAi?

The first practical dynamic programming algorithms to predict the optimal secondary structure of an RNA sequence date back over 20 years (1).
Since then they have been extended to allow prediction of suboptimal structures (2,3) and thermodynamic ensembles (4), which allow to assign a confidence level or ‘well definedness’ to the predictions (5).
http://nar.oxfordjournals.org/cgi/content/full/31/13/3429

So does this mean the Nanohouse is a "dynamic structure" model?

Recently, several methods have addressed the problem of predicting a consensus structure for a group of related RNA sequences (6–11). Such conserved structures are of particular interest, since conservation of structure in spite of sequence variation implies that the structure must be functionally important. By enhancing energy rules with sequence covariation these methods also obtain much better prediction accuracies.

In other words, the structure itself of the RNA stays the same - much the same as the structure of a neuron. Even though the memories change, the connections in the brain change, and the thought patterns change, the basic structure of the neurons in the brain stays the same.

What does this mean to Nanohousing? A stretch of the imagination might be to say: Can we beg, borrow and steal some of these concepts today?

* a close interplay between structure and function (the data model MUST be closely related to the functions in business)

You can find more on the Data Vault modeling technique (for free) here. What do you think will happen in your Nanohouse?

October 20, 2005

Push-Pull Pros and Cons

I've been asked about the pros and cons of ETL push-pull, so I thought I'd generalize the issue a little more into the pros and cons of Push-Pull technology in general. I'm including EII and EAI in this posting. It's not that push or pull is necessarily bad by itself; it's more about using the right notion for the right data access at the right time.
Ok - down to brass tacks. The nature of PUSH technology is basically the realm of EAI and Message Queuing. In this realm we deal with the publish/subscribe model, or maintaining a broadcast message to anyone listening. Really "easy" technology until you get to the engineering underneath. The real work is deciding WHICH transactions are important, and WHICH are not. Then there's the decision on how often, how fast, and how to write the drivers to "plug" in to each of the applications, or legacy apps that service transactions to begin with.

Ok - enough of the engineering talk, let's get back to the business aspects. Push technology is GREAT when wanting to distribute transactions as-they-happen. Stock tickers and other types of financial institution transactions are very important when it comes to push technology. How about disasters and notification? Again, important. What about the different components?

Now wait just a minute - aren't ETL/ELT engines getting stronger and faster? Yes - they are. But they still aren't "architected" for real-time dynamic data integration. The world's BEST ETL/ELT engines will focus on transforming as many transactions as possible (in batch) in the shortest amount of time. That's their strength - and they should STICK to it (Stick to your ticket Harry, very important that you STICK to your ticket... - Harry Potter). We could learn a few things from this line; no really!

ETL/ELT is GREAT at PULL technology - go get the data on a scheduled timing interval, not just the data - but ALL the data, en masse. Bring me everything that meets criteria X, across ALL disparate systems, then integrate it all en masse (batch style) - and do it as fast as possible so that I can replicate the system with new information, and transformed information.
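To make the contrast concrete, here is a small Python sketch of the two styles described so far. It is only an illustration under simplifying assumptions: everything runs in memory, and the names `Publisher`, `pull_batch`, `orders_system`, and `billing_system` are hypothetical rather than part of any EAI or ETL product.

```python
from typing import Any, Callable, Dict, List

Transaction = Dict[str, Any]


class Publisher:
    """PUSH: each subscriber is handed every transaction as it happens."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Transaction], None]] = []

    def subscribe(self, handler: Callable[[Transaction], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, txn: Transaction) -> None:
        # Broadcast to anyone listening; each handler acts on the
        # transaction as it stands.
        for handler in self._subscribers:
            handler(txn)


def pull_batch(
    sources: List[List[Transaction]],
    criteria: Callable[[Transaction], bool],
) -> List[Transaction]:
    """PULL: fetch everything matching `criteria` from all sources, en masse."""
    return [txn for source in sources for txn in source if criteria(txn)]


# Push: a subscriber reacts only to the transactions it cares about.
alerts: List[Transaction] = []


def on_price(txn: Transaction) -> None:
    if txn["price"] > 100:
        alerts.append(txn)


ticker = Publisher()
ticker.subscribe(on_price)
ticker.publish({"symbol": "ABC", "price": 101.5})
ticker.publish({"symbol": "XYZ", "price": 20.0})

# Pull: on a schedule, gather matching rows across systems and land them.
orders_system = [{"id": 1, "region": "west"}, {"id": 2, "region": "east"}]
billing_system = [{"id": 3, "region": "west"}]
landed = pull_batch([orders_system, billing_system], lambda t: t["region"] == "west")
```

In the push half the work happens inside the handler at the moment of publication; in the pull half nothing happens until the batch is requested, and the result needs somewhere to be stored.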
Ok - well, ETL/ELT engines will HAVE to process near real time in the near future in order to survive. While batch will not go away any time soon, the windows are shrinking, the data sets are growing, and the timeliness of critical data is becoming more important. ETL/ELT are GREAT at static rules, parallelism, partitioning, and performance - they require huge amounts of processing power to get the job done right (with very large data sets). This is the nature of PULL.

I guess one could speculate that PULL technologies require a place to "land" the data once it's been transformed. That's not something that PUSH technology needs, nor wants. PUSH technology wants to ACT on the transaction as it stands, once it reaches its destination. This is a primary difference between PUSH and PULL.

Now let's not get confused! There's such a thing as IMMEDIATE PULL, or PULL ON DEMAND. This is new - it's called EII (as a paradigm). EII in this nature offers many different things and is a _complementary_ technology to EAI and ETL/ELT. Pull on demand isn't (usually) interested in massive history sets, nor is it interested in "doing" something with the transaction, such as applying it to another system based on business process workflow (although this could change in the near future). It is more interested in managing the metadata layers in between the business and data set; it is more interested in immediate access and immediate integration of CURRENT state than it is in history.

Now hold on! Don't get me wrong - EII can be used to access warehouses just the same as it can be used to access current OLTP/ODS, Staging areas, and Stock Tickers. It's the FOCUS of what EII does that makes PULL ON DEMAND different than PULL on batch schedule. The focus is much different. That same focus makes it a complementary technology to the EAI and ETL/ELT world. Using the right tool for the right job makes all the difference.

EII also can transform/conform, and write-back.
Something that EAI does (write-back), but ETL frequently is not "architected" for - mostly because the "work" that ETL does must be checked before it is re-integrated with the source systems.

Now take Active or Right-Time Data Warehousing: there's a combination of technologies being utilized to get the data into the warehouse at the right time, and there's a combination (including data mining and scoring analysis) to re-deliver the transactions back to the source systems at the right time. Of course this is neither push nor pull, but rather "closed loop processing." Ok - it uses push to get the transaction to the warehouse, and push to get it back from the warehouse to the OLTP system.

So at the bottom of this blog entry, we are still left with the question: what are the pros and cons of push and pull? Let's see if we can sum it up (forgive me, I may forget a few):

Push Cons:
Pull Pros:
Pull Cons:
PULL ON DEMAND Pros:
PULL ON DEMAND Cons:

Ok, none of these are complete lists by any stretch of the imagination (*some might say I have none :) But hopefully they give a peek into what might be some of the top differentiators across these technologies. Thoughts? Comments? Have some pros/cons you'd like to add? Please, feel free.

Thanks,

Standards, Compliance, and Successes

I've been asked about standards, and what they contribute to the success of a project within business, particularly following the entry on Architecture, Standards, and Business. Standards contribute quite a bit actually. But standards can also be overkill. There are some neat comments on the Agile Modeling forum regarding the use of standards, and I've spoken with Scott Ambler about some of these things (but not yet in detail). Grady Booch and I have discussed the nature of useful standards in brief conversations, from which we still have to draw some conclusions - with that let me continue my entry.

http://www.MyersHolum.com

What kinds of standards do we have in industry?
When an industry or business is NEW or yet-undefined, there are no real standards or accepted methodologies for build out. Take Data Warehousing for example: when it was first discussed (in the early 70’s) there were no best practices, no standards, no suggested ways of completing projects. However as time went on and practitioners built data warehouses, they discovered that when best practices and standards were applied, the businesses reaped significantly more benefits (lower cost, reduced risk, easier implementation, faster build-out) and so on. Now I can tell you some hairy stories about standards and over-kill. When we first introduced SEI/CMM standards (lock stock and barrel) to our manufacturing organization, we had severe trouble implementing CMM Level 3 – too many standards for tiny little projects which had small impacts on the state of business. In other words, the standards were too thick, too heavy to implement. Then we applied “standards thinners” (like paint thinners) which didn’t destroy the quality of the standards, but rather reduced them to a working plan. The project still followed standards and best practices, only less of them. Of course SEI was first proposing CMM as a level of software engineering, as they still do. What we did was apply SEI and CMM best practices to a blend of data warehousing best practices and standards. We wanted a system that was repeatable (in architecture and design), easy to build, consistent, with reduced risk and rapid build out. We quickly reached CMM Level 5 with our organization and our data warehouse as we followed this hybrid paradigm. We combined Spiral methodologies with Waterfall checkpoints at major steps to reduce risk, we trained individuals before engaging them on projects, we undertook versioning, and centralized store of documentation, we also put together risk analysis spreadsheets and project size estimates by using FUNCTION POINTS. 
We also continued to label and number ALL requirements, refocusing the requirements as needed to be specific reachable, and measurable goals (using RUPP processes). Finally we attached the project plan numbered items to each specific requirement, so we could produce business process metrics and answer user questions on progress and risk at any given point. We had a team of 3 people working on this project at any given time. So you see, even with small projects, a certain level of standards help the team achieve what they need to do – with quality. We helped befriend the business users, upon completion they threw MORE work at us than we could handle. We helped turn the business belief from: IT can’t deliver; we’ll build it ourselves, TO: IT has done a tremendous job AND saved us tons of money and time. By the way, here’s something I want you to walk away with (I teach this in my VLDW class at TDWI): The larger the data set, OR the larger the project, the less likely you are to produce a success WITHOUT standards! In other words: The larger the project, the larger the data set – the more likely you are to succeed with standards and best practices. Without standards and best practices – your project will fall into chaos and disarray and quickly succumb to unforeseen / unmitigated risks. I created something called The Matrix Methodology for data warehousing, now I’m creating a new methodology much more advanced and incorporating ideas like Agile Data Modeling (process wise), Data Vault Data Modeling (physical), Master Data Management, and SEI/CMM components for the market place. My current company puts these principles to work in the projects, and RFI’s that we assist with. By the way, while I didn’t focus on it very much, standards certainly assist those in need of compliant projects, and compliant data stores – get to where they need to go, but that’s an entry for another time. 
Hope this helps,

October 18, 2005

Information Valuation - Part 2

I've been asked if there was a way to quantify information as an asset on the books, and since then have been asked what % of companies may be doing this lately. It's hard to quantify something that companies have long considered an intangible asset. However, in this blog we will explore the base possibilities and present a single scenario which may begin the process of accounting for data (quantifying data). This is an experimental topic; any thoughts or feedback are welcome.

What is data valuation as an asset?

The First Method: Top Down Valuation

We've been doing this for years in quantifying the cost of hardware on which these data sets run, but we just haven't carried this into the data itself - some have assumed that the data is too intangible an element to be measured.

Notice we didn't ask any questions about the quality or compliance of the data set... We only asked about tangible and measurable loss of the data set. One might say that Top Down measurement is akin to itemizing the entire set of data with the hardware for disaster recovery classification. True.

Now would be a good time to insure against these failures and periods of unavailability. Well, a big part of that is getting insurance for the LOSS of an entire data store. Maybe rates for insurance of data are LOWER for those with a proven and tested (audited by the insurance adjusters) disaster recovery program. Maybe the rates are lower for those businesses that have business definitions and attached understanding for the uses of their data - in relation to the actual SLA's signed by the business. Maybe the lower rates indicate a better handle on understanding and quantifiable results of the data and what it means to the business. Once the data is insured - it should be attributed as an asset on the books for the amount it is insured for.
Maybe insurance rates go up for those companies that DON'T meet their SLA's, and aren't accountable for failure recovery. Hmmm, talk about Information Quality Improvement efforts.

Ok - so we've quantified (somewhat) what the BLOB of data represents to the business, we can justify its loss, and account for its recovery. What about depreciation? That's a VERY interesting question. Think of data in a Data Warehouse or Enterprise Data Integration store as different from that in an ODS or OLTP system (which it is). Even an Active Data Warehouse falls into this category. Depreciation (hypothesis) can only be measured by the loss of historical data, and what that translates to in a quantifiable manner to the business. In other words: does the SLA cover "old" data; if so, how old? Where is the cut-off point at which the "old" information is not valuable to the organization anymore? If the "old" data is always valuable, just less valuable, then ask these questions: If my systems were to lose data that is X months/years old - what would it cost me to recover? What SLA's are in place to force me to be back up and running? Are my data mining engines relying on the history to produce active responses today? If so, then the "old" data is just as valuable as the current or new data. If not, then the "old" data depreciates in value as it ages - it's up to you to negotiate that value point with the data insurance adjusters.

The Second Method: Bottom Up Valuation

Suppose you had a customer table - well, forgive me, but if you don't have customers, you don't have revenue, and if you don't have revenue, you aren't in business to make money. Suppose this customer table held 3 million customers. Each customer row has a specific value. Assuming you're a direct marketing firm, or even a banking firm - each customer has a specific investment or makes a specific amount of "money" for you by being in your customer table.
As a marketing firm, you want to offer free subscriptions - and the gender of the customer becomes important as ONE of 32 elements that make a difference in determining WHAT to offer them. If you send the wrong type of magazine to the wrong individual because you're missing the gender field, and they unsubscribe from ALL your services - you've just lost a good customer. What does that equate to in revenue dollars? The bottom up method requires attaching a "row-score" to each row, weighting the importance of each row - of course this weighting or scoring mechanism must change every time the customer information changes (apply this concept to EVERY table within your data store). Then there is a general or average dollar amount, a mean, and median, max and min dollar amount for each row. Some customers are outliers and we DON'T want to adjust their outlying investments to the average numbers. Now, based on the weighting and the statistical dollar amounts calculate an overall value for each table - please take into account the EMPTY fields and the importance of having GOOD data, this should affect the weighting factors. Finally, add up the weighted dollar amounts for each row, and ask yourself the business questions: is the value of this "data set" really the total value to the business? Add an overall adjustment factor to the final dollar amounts, and test it with top-down recovery costs for an approximate range. Now you should have a fairly decent idea what your data sets are worth. Now it's up to you to negotiate with the data insurance investors for actual valuation. Ok, there's a lot of talk here about valuation of data, and insuring the data - some of which we may already do. But what about listing it as an asset for tax purposes? Well - that requires a change in the laws around the world, unfortunately that's wide-open speculation that I do not engage in. 
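For readers who want to experiment with the bottom-up method described above, here is a minimal Python sketch. It makes several assumptions that are mine, not part of the method as stated: the row weight is simply the fraction of important fields that are populated, each row carries a hypothetical `annual_revenue` dollar amount, and the overall adjustment factor is a single multiplier. A real weighting scheme would be negotiated with the business.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def completeness(row: Row, important_fields: Sequence[str]) -> float:
    """Row score: fraction of important fields that are populated.

    Assumption for illustration: empty strings and None count as missing.
    """
    filled = sum(1 for f in important_fields if row.get(f) not in (None, ""))
    return filled / len(important_fields)


def table_value(
    rows: List[Row],
    important_fields: Sequence[str],
    adjustment: float = 1.0,
) -> float:
    """Sum each row's dollar amount weighted by its score, then adjust."""
    weighted = sum(
        row["annual_revenue"] * completeness(row, important_fields)
        for row in rows
    )
    return weighted * adjustment


# Hypothetical customer rows; the second is missing its gender field.
customers = [
    {"name": "A", "gender": "F", "annual_revenue": 200.0},
    {"name": "B", "gender": "", "annual_revenue": 100.0},
]
value = table_value(customers, ["name", "gender"], adjustment=0.5)
```

The resulting figure is only a starting point; as the entry says, it should be tested against the top-down recovery cost to see whether the two land in the same approximate range.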
If you can insure your data, maybe, just maybe, you can convince your tax auditors and your local government to at least look at the issues seriously.

In regards to what I see in the field (purely speculation - not backed by any scientific studies of any kind): I see on average 10% of the Fortune 500 engaging in these activities. However, I see 60% to 80% of businesses today working hard to have fault-tolerance and disaster recovery programs in place. What I don't see is the follow-through with tangible valuation of data as an asset.

Good luck, hope this helps – thoughts or comments? Love to hear from you.

Data to Information, Architectural Roles for Business

Dave Wells, Director of Education at TDWI, and I have had several discussions on this topic: Turning your Data Into Business Information. In these discussions we covered the Business Dimensions and the business points of pivoting which take place when layering the data for presentation. Data is often overlaid with additional business dimensions to make it usable. I'm not talking about the technical dimensions that we produce within the data marts; I'm talking about individual columns labeled as dimensional aspects of the data. This isn't to say that parts of these ideas aren't available today; it is merely to say that some level of automation and underlying base data architecture are missing from the scene today.

For instance, there are the common and major dimensions: Sales, Finance, HR, Manufacturing, etc. There are the other common dimensions such as: hours worked, revenue, taxes paid, cost of goods, etc. But hidden within these are additional layers of business dimensions which we frequently ignore. These dimensions are the most powerful - allowing the business user to slice and dice the data by column to reach a single cell of information. It's N-DIMENSIONAL information, something that could be utilized by data visualization engines.
In this N-DIMENSIONAL space, we have all the other columns or data elements - but they are arranged in the manner in which we USE the data within business.

Wait a minute! Just hold on there partner, are you telling me that one of these dimensions could just be equated to a Satellite? Keep in mind that Satellites will still split by Type of Data and Rate of Change - this will help define the Business Dimensional aspect of the information housed within. Each column within the Satellite (at that point) becomes a pivot objective if desired.

So where's the challenge? The challenge is to connect EACH of these dimensional definitions in an X, Y, or Z axis for viewing when desired - allow the end-user to pivot on each of these dimensions, allow each of the dimensions to move in the hierarchy (up or down), and define them as full and complete metadata for the business users. In other words, build a metadata repository for all elements in the Data Vault model, then make it accessible through cube-views or some such delivery mechanism.

The ultimate goal would be to have database technology powerful enough to collapse the business delivery into a virtually defined layer or layers, driven by metadata and virtual definition of the web of connectivity across the metadata. Then to have this layer be the single point of access by all BI and delivery mechanisms.

More about the Data Vault data modeling architecture can be found here, on free forum discussions.

Thoughts?

October 16, 2005

A funny idea: Slower Melting Snow

I've been thinking: with all the advancements that are being made in nanotech, why can't we create a molecule that melts more slowly, and lasts longer in warmer temperatures? This blog is a hypothetical look at an idea I would love to see discussed... Imagine, slower melting snow - made from nanotech.
Snow that still melts, so it doesn't harm the environment and nature can still experience the seasons - but something that might be able to be created on the ski-slopes on the first of September every year, and doesn't melt until late May or June. Maybe it's a silly idea, but maybe, just maybe, there might be something to it.

Imagine if we could keep a water molecule crystallized for just a little bit longer than usual - get it to release heat less quickly, or get it to absorb heat slower. What kinds of applications could this lead to?

Let's speculate for a moment - I know nothing (other than what I've seen on Nova) about avalanches, and how they are caused by melting sheets of snow - turning top layers of snow to water on warm days, freezing at night into ice sheets - then new snow falling on the ice sheets; eventually the weight causes the layer of snow to "slide" off the ice, starting an avalanche. Suppose this type of extreme crystallization could be stopped or prolonged - in other words, suppose the snow melts more slowly: less water, less ice at night. Do you think that the sheets could be "thinned" out enough to be crushed under the weight of new snow rather than cause a slide? Maybe.

Or how about slow-melting ice in drinks (but still melts); let’s just say I'm into the old-fashioned ice cubes, rather than the plastic re-freezable ones.

Well, back to snow. If we could construct slow melting snow molecules we might have longer lasting ski seasons. What are some of the dangers? Another possible danger is that the slow-melting snow, if applied to bare ground, might actually trap heat in the ground - because it doesn't absorb the heat as fast as regular snow. I'm not sure of all the impacts, but at first glance, this doesn't sound good.

Well, here's one more possibility: Slower melting snow may actually hold a colder internal temperature than regular snow, so if you got it on your hands or down your back - it would be colder to the touch.
However - that requires a heat absorption rate within the snow molecule itself. Now that I think about it, to the touch - this slower melting snow may not feel as cold (not sure about this one).

Here's an interesting (possibly dangerous) use: applying slower melting snow (or some offshoot) to warmer ocean waters that have traditionally been "cold". What if it could be used to slowly lower the temperature of what are supposed to be cold regions of water?

Of course this is a silly idea, and one made from a fictional thought - but I just thought maybe someone was daydreaming (like me) about a longer ski-season. Thoughts? What do you see as the dangers, or possibilities of this type of idea? What makes it infeasible/feasible?

October 14, 2005

What is the TRUTH anyway?

We've all heard of "Single Version of the Truth", and we've all seen shows, presentations, and even read a paper or two on this topic. In fact, searching Google for this phrase returns 36,900 hits! But this raises the question: what the heck is "TRUTH"? How can you hold "TRUTH" accountable? Is a "Single version of the Truth" compliant?

I stand here sure as the sun will rise today to tell you that I believe TRUTH is subjective in nature. Of course I think it also depends on how you define "Truth" in your enterprise integration efforts. My friend and mentor Bill Inmon wrote about this here. I would tend to say after reading and re-reading this article that indeed, SVT (single version of the truth) is in fact a GOAL, nothing more. I would also tend to say that truth itself is in the eye of the beholder (or money-holder as the case may be), because as we all know - one person's truth is not necessarily coherent with another’s. If all truth were equal we would not have discussions about metadata or common meta definitions. Nor would we fight over what the master system is for customer information, nor would we re-state or alter data in the warehouse when one major money holder leaves, and another takes their place.
The operational systems' data tells one story (which for the most part matches the business expectations), while the metadata and business requirements usually tell a different story. The "gold" or real profitability within the organization is in resolving the discrepancies: find and fix the business issues causing them - this will also put you on the road to full compliance, auditability, and end-user accountability.

I would tend to say that SVT is a wonderful goal - but ONLY achievable within a specific point in time as viewed and presented to the current money holder who agrees with and accepts the definitions set before them. What then really makes a truly robust, scalable integration platform? Be it for metadata, data, or business rules?

SVT says: we have truth when data is munged and agrees with whatever the current business rules state it should match. SISDF says: we have truth when data is integrated by common semantically defined sets of business keys, and is the raw data that arrived in accordance with the source system feed - both on the same level of detail, and with exacting traceability.

SVT goes on to say: cleansed, merged, mixed and matched data should be put in the integration store. SISDF takes a different approach: data will be loaded to the integration store AS IT STANDS, a 1 for 1 match with what arrived on the source feeds. This pushes the notion of SVT downstream from the SISDF integration store - and puts the burden of proof (to be labeled an SVT) on the delivery mechanisms - which are usually data marts.

In other words: Sales and Finance don't agree with each other on what "revenue" means, so one data mart has a sales SVT, and the other data mart contains a finance SVT. Here we have a case where both SVT's are correct at the same time, but depending on who's using the data - it could be wrong, meaning that "truth" falls apart because of interpretation.
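The Sales versus Finance example can be sketched in a few lines of Python. This is only an illustration of the idea of keeping the integration store raw and applying each department's rule on the way out to its mart; the order rows, the `status` values, and both revenue rules are hypothetical and invented for this example.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Integration store: rows loaded AS THEY ARRIVED, with no cleansing or merging.
raw_store: List[Row] = [
    {"order_id": 1, "amount": 100.0, "status": "booked"},
    {"order_id": 2, "amount": 250.0, "status": "invoiced"},
    {"order_id": 3, "amount": 75.0, "status": "cancelled"},
]


def sales_rule(row: Row) -> bool:
    # Hypothetical Sales definition: booked or invoiced orders count.
    return row["status"] in {"booked", "invoiced"}


def finance_rule(row: Row) -> bool:
    # Hypothetical Finance definition: only invoiced orders count.
    return row["status"] == "invoiced"


def build_mart(rows: List[Row], rule: Callable[[Row], bool]) -> Dict[str, Any]:
    """Apply one department's rule downstream of the raw store.

    The contributing order ids are kept so the figure stays traceable
    back to the unaltered source rows.
    """
    included = [r for r in rows if rule(r)]
    return {
        "revenue": sum(r["amount"] for r in included),
        "source_order_ids": [r["order_id"] for r in included],
    }


sales_mart = build_mart(raw_store, sales_rule)
finance_mart = build_mart(raw_store, finance_rule)
```

Both marts report a different "revenue", both can be traced to the same untouched rows, and changing either definition later means rebuilding a mart rather than restating the integration store.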
Typically though, this case is overcome through hard work, data stewards, metadata management, and SLA's with the end-user base to agree on two definitions: SALES REVENUE and FINANCE REVENUE, each calculated differently.

On the other hand I've seen this happen first hand: SVT is built "on the way in" to the enterprise data warehouse. All is fine and dandy until the current money-holder is replaced with a new money-holder. The new money-holder says: hey, my sky isn't blue - it's white, and the SVT you think you have is WRONG until you change it. They are henceforth left with "restating" the data within the enterprise integration store (data warehouse). They've not only broken the SVT, they've broken compliance, traceability, auditability, and strangely enough - the SVT that WAS in place? It WAS correct for the time it WAS in place. A conundrum I think is what they call this.

Well, with the onslaught of SOA, enterprise integration efforts are hotter than ever (as is metadata definition and management), right along with data quality. It's high time we fed ourselves with an architecture that presents SISDF rather than SVT, and moved our SVT's downstream (to the point where we load data delivery platforms such as data marts). In other words - it's high time we STOPPED all this transformation, cleansing and data alteration on the way into our enterprise warehouses, and STARTED all the transformation, cleansing, and data alteration on the way OUT of our EDW and into our marts.

I'm not saying "don't deliver SVT", I'm not saying "don't cleanse or quality check your data", I'm not saying "it's bad to integrate or complete your metadata efforts." I AM saying: if you want a real, compliant, single/consistent integrated version of the DATA within your enterprise - you need to move the SVT notions downstream - and produce the "truth" as you produce the delivery mechanisms.
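To make the "raw on the way in, truth on the way out" idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the record layout, the field names (gross, discount, returned), and the two revenue rules are hypothetical and are not taken from any real SISDF implementation. It only shows the shape of the idea: the integration store keeps each row exactly as it arrived, keyed and timestamped, and each mart applies its own definition on the way out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from types import MappingProxyType


@dataclass(frozen=True)
class RawRecord:
    """One row in the integration store, kept exactly as the feed delivered it."""
    business_key: str
    source: str
    loaded_at: datetime
    payload: MappingProxyType  # read-only copy of the source row


def load_raw(feed_rows, source, key_field):
    """Load rows 1 for 1 with the source feed: no cleansing, no merging."""
    loaded_at = datetime.now(timezone.utc)
    return [
        RawRecord(
            business_key=row[key_field],
            source=source,
            loaded_at=loaded_at,
            payload=MappingProxyType(dict(row)),
        )
        for row in feed_rows
    ]


def sales_revenue(store):
    """Hypothetical SALES rule: every booked order counts at gross value."""
    return sum((Decimal(r.payload["gross"]) for r in store), Decimal("0"))


def finance_revenue(store):
    """Hypothetical FINANCE rule: net of discounts, returned orders excluded."""
    return sum(
        (
            Decimal(r.payload["gross"]) - Decimal(r.payload["discount"])
            for r in store
            if not r.payload["returned"]
        ),
        Decimal("0"),
    )


feed = [
    {"order_id": "A1", "gross": "100.00", "discount": "10.00", "returned": False},
    {"order_id": "A2", "gross": "50.00", "discount": "0.00", "returned": True},
]

store = load_raw(feed, source="orders_feed", key_field="order_id")

# Each mart derives its own "truth" from the same untouched raw rows.
sales_mart = {"SALES REVENUE": sales_revenue(store)}
finance_mart = {"FINANCE REVENUE": finance_revenue(store)}
```

If a new money-holder redefines revenue, only the mart rule changes and the mart is rebuilt; the rows in the store are never restated, so the trail back to what actually arrived stays intact.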
I am also saying: take a hard look at producing an ERROR MART or two, or more - move the dirty data through the system, and put it in the hands of savvy business users who will begin seeing "who can clean up their data fastest" as a game. Your enterprise might just be surprised what they uncover when this type of effort is implemented. I know of several companies that found and saved $15M to $45M in the first six months of operation of a SISDF.

You can read more about the data architecture behind these notions on my web site.

Thoughts?

October 13, 2005

Oracle - Fully Loaded? Or Dried up like dust?

I suppose it's all in how you look at it, but take a look at these two new E-Week stories: (This is an extremely opinionated entry, would love your feedback).

Oracle Scores Open-Source InnoDB Storage Engine

I like Oracle Database (for specific projects), but let's take a walk on the wild side... What do you suppose is happening? Did they (Oracle) decide their core-engine is too big, too cumbersome and can't take the heat anymore? Or do you think it's a Microsoft-like move to squash extremely great technology never to be heard of again? For the sake of discussion, let's talk about both sides of the coin.

In the first situation, we have to make some assumptions: (these assumptions are only 1/2 based in reality - the rest are what I've personally experienced at customer sites).

1. Oracle's engine has been "added on and added on and added on" over the years, it's grown up like a big huge ball of band-aids. They've done some serious modification to parts of the core engine and in their latest releases (10g and on), they've finally added some LONG overdue functionality.

Well that leads me to say: Think about it, Oracle has to charge huge fees in order to pay its army of core engineers. With new smaller, leaner, and faster core engines - away with some or more of this massive expense!!
Now it's not wise (and I'm not suggesting) that the ENTIRE Oracle core engine be tossed, although Hmmmm.... I am saying that re-engineering, cost reduction, and smaller faster/leaner meaner engineering needs to take its place if Oracle is to compete in the new market place.

That still leaves the second question: Is this just another attempt by a large corporation (like Oracle) to squash upcoming technology? I think honestly, Oracle needs to breathe new life into their old technology engines, and simply bought the expertise - now if they're really smart, they'll learn from the existing company at InnoDB, instead of squashing it into the Oracle Culture.

What's your opinion?

October 11, 2005

Data Visualization, just a flash in the pan?

A couple of comments on an entry I made a while ago: "What does your Dashboard look like?" led me to continue this dive into visualization. Some of the comments are interesting, and ask questions like: what's the business value of visualization? Is it really needed in our industry? It might be nice, but it might be a niche product too... In this entry we'll explore a few of these things, and see what kinds of answers (if any) we can dig up.

Is data visualization just another fad?

There's even convergence across scientific areas and business in the form of nanotech. New standards are being born, and as one paradigm slowly tails off, new ones spring up. It's the ever changing nature of change.

"What doesn't change - dies. What evolves - grows and adapts. The only thing constant is change."

In order to understand if Visualization is just a fad, we should look at the existing technologies and delivery mechanisms and ask, how have they changed in the past 10 years? However, they haven't changed much since their inception. Businesses change, paradigms shift - albeit slowly.
The way businesses view and review their data should also be changing; a natural extension of the current graphics is 3D landscapes, and interactive scenarios laid out in new ways. In other words, if I want to view my business in a new way, or force myself to think differently, I should be looking for different ways in which to experience my data. The real world is not made up of just 2D surfaces, or numbers (addresses on a mail box), it is made up of interactive experiences. There have been many experiments conducted across many well respected institutions that show: experiential settings appear to be one of the better ways of learning new things and thinking in new manners. From this we arrive at 3D interactive graphics for new and different ways to visualize data.

Again, I make the point: the BI vendors have made tremendous strides to make their engines solid, to bring the value proposition of their engines to the forefront, to include data mining and other mechanisms of retrieving data. All I'm suggesting is that "visualization" include the next step, the next layer, the experiential 3D learning environment. I don't think visualization is just a fad, I think it's a broad range of delivery mechanisms that include bar, pie, line-graphs, and Excel spreadsheets - but I think the next "change" is again, to include the new interactive means of examining data.

"If you don't change what you're doing, you're going to end up right where you're headed."

Is this new visualization really needed in our industry?

I'm not so sure how to answer this question, but I do know that sometimes showing new ways of seeing and interacting with information can spur new ways of application. I also know that sharp business minds are already experimenting with this technology as a competitive advantage. So I hypothesize that it's a symbiotic need, both parties (technology and business) need to come to the table to really spur the movement.
Sitting back on our laurels and waiting (on either side of the fence) won't get us anywhere. I suppose I could go back to the Oil & Gas exploration industry: they use land-maps, geological studies, and earth core samples to figure out where the best place is to house tank systems, drill for new oil, and run pipes on solid ground.

In the financial sector, what if a representative model could be developed? Different levels of transactions representing different levels and strength of ground layers, different levels of revenue and aggregation points representing pipes and flow valves, different divisions representing different tanks. Then map this to "find the leak", or "see which tank isn't producing enough", or ask: what's the optimal or maximum flow capacity of our financials? It could raise some eyebrows.

It's funny, we say business needs to drive technology, true. But sometimes business doesn't know what's possible until technology says: Hey, look at me, this is a new way of thinking - do you have a use? Personally I believe interactive landscapes and 3D modeling are just an evolution of data visualization, because in a way - bar charts, pie graphs and even spreadsheets are visualization of information too.

Thoughts and comments are welcome.

October 6, 2005

US govt spends $3.7 Billion on Nanotech

Nanotech is coming, and the government is spending billions of dollars a year - but it's not just the US government, it's happening all over the world! We think compliance is big, security is big money, well you haven't seen anything yet. Nanotech spending tops 'em all, and the spending is only due to increase. The following is compliments of Lux Research:

* China has moved from also-ran to power player when it comes to nanoscience. China's share of academic publications on nanoscale science and engineering topics rose from 7.5% in 1995 to 18.3% in 2004, taking the country from fifth to second in the world.

What would you do with a $3.7 Billion budget? I might go skiing...
on NanoSnow that never melts ;-)

October 5, 2005

EII and Unstructured Data - Blowout Party of the year!

Ok, so maybe a piece of software can't really party - but we can! :) Claudia just posted a blog on the need for garnering semi-structured and unstructured data within the enterprise warehouse. Bill Inmon has got an unstructured/semi-structured data retrieval and visualization tool, and we see more information being pushed under the compliance umbrella. That leaves us asking many questions, like: Do I need to monitor all e-mails? How do I decide what's important and what's not in my "sea of word-docs?" What impact does it have on my EDW, and how?

It will take a long time to answer all those questions, but one thing in the EII world that has been overlooked is its ability to access, reference, and integrate semi-structured and unstructured data. Someone somewhere once said: "only 20% of the world's data lives in the structured realm, the other 80% lives in semi-structured and unstructured content." Well, if this is really true, and we've seen ROI's for EDW's as high as 400%, then what do you think the ROI could be when integrating the other 80% of our business? It certainly should raise some eyebrows.

Now I'm not suggesting that EII replace ETL, and in fact there are some misunderstandings out there about ETL - one of which says: ETL handles only Batch, and is used for only historical data - this simply is not true. Alright, 80% of the time this may be true, but there are times when an Active Data Warehouse has been built and ETL is utilized on a 5 minute or 3 minute refresh increment. I've also seen ETL utilized with Queuing mechanisms for real-time transformation (by no means an easy task). There's another customer using ETL to synchronize all their source systems across the enterprise and they don't even have a warehouse. But: ETL also works with only STRUCTURED data.
To make ETL "fit" a real-time integration paradigm is like fitting a round peg in a square hole: challenging, costly, and more complex. Now this is where EII really begins to shine: EII can make it much easier to integrate real-time data - not to mention unstructured and semi-structured data.

Let's focus on the following two components: e-mail and documents. What if the metadata for my warehouse was stored in an "appendix" or glossary of terms in a word-doc? What if I had answered 4 or 5 key questions about how certain elements are computed through emails? Would it be helpful to a) know that this information exists, b) have it catalogued in the warehouse, and c) be able to integrate these elements within my BI reporting solution as "pop-overs" or pop-ups?

This is all fine and dandy, but by now the old-timer ETL jockeys say: I can write Perl to conform this stuff to structured data, and load it in - why do I need EII? Well, here's the case: What if over the following two minutes I answer two more questions (while a class is in training)? EII can easily detect the new emails and provide the information in real-time to the training class. If I then add a word-doc to the central library that has FAQ's, then the class can make use of that information as well (immediately).

Granted, this is just one small case of solving a very specific problem - EII can solve many more problems like this, and much larger in scope, but it demonstrates a differentiator between EII and ETL. Utilizing EII to access unstructured data will drive up ROI on integration projects at a much faster rate. Besides which, the ETL jockeys could use EII to help "discover" information about their integration projects - it may even help speed up the build-out process for EDW efforts.

Thoughts?
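For the curious, the training-class example can be boiled down to a toy Python sketch. This is not how any particular EII or ETL product works; the class names (DocumentLibrary, BatchSnapshot, FederatedView) and the simple substring search are hypothetical stand-ins chosen only to show the difference between copying content at load time and asking the source at query time.

```python
class DocumentLibrary:
    """Stand-in for a mailbox or shared document library (the source system)."""

    def __init__(self):
        self._docs = []

    def add(self, title, text):
        self._docs.append((title, text))

    def documents(self):
        # Return a copy so callers cannot modify the library by accident.
        return list(self._docs)


def _search(docs, term):
    """Return titles of documents whose text mentions the term."""
    term = term.lower()
    return [title for title, text in docs if term in text.lower()]


class BatchSnapshot:
    """Batch-load style: content is copied once, at load time."""

    def __init__(self, library):
        self._docs = library.documents()

    def search(self, term):
        return _search(self._docs, term)


class FederatedView:
    """Query-time style: every search goes back to the live source."""

    def __init__(self, library):
        self._library = library

    def search(self, term):
        return _search(self._library.documents(), term)


library = DocumentLibrary()
library.add("Email: how revenue is computed", "Revenue is gross less discounts.")

snapshot = BatchSnapshot(library)   # loaded before the class starts
live_view = FederatedView(library)

# Two minutes into the class, a new FAQ document lands in the library.
library.add("FAQ: revenue", "Returned orders are excluded from revenue.")

stale_hits = snapshot.search("revenue")
fresh_hits = live_view.search("revenue")
```

The snapshot only knows about the first email until the next load runs, while the live view picks up the FAQ immediately. A batch job on a short refresh cycle narrows that gap, which is the 3 to 5 minute ETL case mentioned earlier.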
October 3, 2005

Personal Security and your information

I've blogged about this recently: the judge in SF who basically ruled that credit card companies don't have to be accountable for telling you if your information is stolen, right? Well, here's the flip side to this story. Turns out CardSystems is having stock trouble, and on-line card processing merchants have seen sales fall a couple of percentage points since the breach. Maybe they'll begin paying attention? Check out these stories on e-week:

Visa USA Delays Plan to Cut Ties with CardSystems

And on and on. The government can't agree on how to solve these problems, yet the justice system seems quite content with "letting these breaches slip on through". At least the credit card companies are stepping up to the plate, but is it too little too late?

Let's look at this another way: a small vendor (mom & pop shop) is breached, their credit card storage is stolen, and all the cards are erroneously charged. The owners of the cards report these bogus charges, and the credit card company says: Due to the number of chargebacks that the small vendor experiences, their account will be "immediately discontinued." I don't see any waiting period or grace period for the small companies, why then does such a large company like "CardSystems" get a break of several months after the breach?

Can you say double standard? This is absurd. They'll punish the little guys at the first sign of trouble, but the big-boys get a break?? Ok, so the mom & pop shops are always told: never keep the credit card numbers on file anyhow. Most of the shops abide by this rule, so what makes CardSystems any different? One word: Money.

The problem is: we've got issues when we can't even control our own personal information, nor hold the vendors liable for breaches that they and their sub-contractors are responsible for. It's just a sad story.

Cheers,