Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW, and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

October 2005 Archives

Recently my discussions in the field have centered on Information Quality (or the lack thereof), the EII tool set, and the Active Data Warehouse (right-time data warehouse). We will explore this admittedly dry (but hopefully interesting) topic in this blog entry, particularly in relation to Compliance and Integration - though I felt it fits under SOA as well - so here goes.

Information Quality (according to Larry English) includes business processes, data, reporting, and people involved in interpretation of the information. But Information Quality both helps and hurts compliance efforts, particularly when the corporation is audited.

One of the over-simplified definitions of SOX (at least at the data level) is: can your system show the "before it was changed, after it was changed, and when the change occurred" audit trail? Without being able to answer these questions, any "software product" that claims it is SOX compliant is flat out wrong.
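Just to make that concrete, here's a tiny Python sketch of what a before/after/when audit record might look like. The table, column, and record_change helper are my own made-up illustrations - not a SOX-certified design:

```python
from datetime import datetime, timezone

# Hypothetical audit record: capture the value before the change,
# the value after the change, and when the change occurred.
def record_change(audit_log, table, key, column, before, after):
    audit_log.append({
        "table": table,
        "key": key,
        "column": column,
        "before_value": before,
        "after_value": after,
        "changed_at": datetime.now(timezone.utc).isoformat(),
    })

audit_log = []
record_change(audit_log, "customer", 1001, "credit_limit", 5000, 7500)
print(audit_log[0])  # before, after, and when - the three things the auditors ask for
```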

What does this have to do with EII?
EII can be quality driven too - but in doing so, it can and will break compliance IF it is told to transform data in the middle without producing an audit trail of what it did, what it used, and when. This is where the Write-Back capability of EII comes in handy - we almost need an EII-SOX warehouse to record the information flowing THROUGH EII tool sets in order to meet compliance and auditability.

I've written an EAB (executive action brief) on this site that talks about making your data integration processes compliant - click on B-Eye-Network, go to the HOME page, and look for the "education" link in the lower left corner.

Now, if EII is _not_ transforming data, then the data set it pulls from should be "SOX compliant" - this shifts the onus back onto the source systems to maintain audit trails of any information they change. For source systems, this is a no-brainer: they are capture systems and are supposed to be "systems of record" for the business, which means the business is already supposed to "trust" these systems - even though the data quality may or may not be there.

Time out - this doesn't make a lot of sense; where's the quality in all of this?
Ok - compliance is one thing, but the reason I talk about it is this: Information Quality tools CHANGE DATA under the covers; therefore, in order to meet compliance initiatives and be auditable, we must surround these tools with a before-and-after process. At the DATA level, this means that if we introduce "quality processes" in-stream with EII, we could be in serious trouble with Compliance - again, unless we record the effects (before/after and when).

Quality tools are nothing more than transformation engines (ok - they do a LOT more than that), but when it comes down to bare bones, they are CHANGING DATA sets. Therefore, everything that applies to ETL/EAI and data mining (with respect to compliance) also applies to EII and to the processes that load active warehouses.

Wait a minute! Active Warehouses have a refresh cycle that's too fast to put a quality trigger in play, right?
Right and wrong - remember, active warehouses are "right-time" warehouses; it's all about latency. However, there are active warehouses that cannot use quality initiatives in-stream because the data decays too fast.

Now what we will say is this: even ADW's still have "strategic" initiatives to them, which means that only the tactical side forgoes the quality settings (until the strategic quality engine cleanses the historical data - and sometimes that historical data is returned to the source during transactional/tactical processing).

Remember this: Information Quality is SUBJECTIVE; it is one version/one flavor of the truth - truth is subjective and will change depending on the eye of the beholder (the end user). Therefore, quality engines MUST be held accountable and auditable by surrounding them with processes that capture before-after-when (BAW).

Can EII use IQ tools or Data mining processes in-stream?
Sure, and they probably should - especially when sourcing external or freely available data. I'm just saying that EII will have to take the extra hit and write back the BAW somewhere in order to be compliant. The challenge here is deciding when to initiate a quality process, and keeping it from impacting the query timing too significantly. Now, if EII is pulling from the strategic side of the warehouse - wonderful, it should be pulling quality data (already altered/cleansed/patched).

Can ADW use IQ tools or data mining processes in stream?
Yes and no - it depends on the latency requirements. Most ADW's at 5 minutes or less of latency don't run IQ processes in-stream; companies at this level plug IQ tools directly into their source/capture systems, which raises other questions, like: how do I find my broken business processes? But that's for another day, another entry.

Quality should come "after" the load of the raw data into the data warehouse, or "after" the load of the raw data into the EII engine. It should be secondary, and applied only if there is an audit trail mechanism in place to trace back to the original data.
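Here's a rough Python sketch of that idea - a cleansing rule wrapped with a before-after-when (BAW) capture so the original values stay traceable. The rule, field names, and log format are all illustrative assumptions on my part:

```python
from datetime import datetime, timezone

def cleanse_gender(value):
    # Example quality rule: normalize free-text gender codes; unknowns become "U".
    mapping = {"m": "M", "male": "M", "f": "F", "female": "F"}
    return mapping.get(str(value).strip().lower(), "U")

def apply_with_baw(rows, column, cleanse_fn, baw_log):
    """Apply a quality rule in place and record before/after/when for every change."""
    for row in rows:
        before = row[column]
        after = cleanse_fn(before)
        if after != before:
            baw_log.append({
                "row_id": row["id"],
                "column": column,
                "before": before,
                "after": after,
                "when": datetime.now(timezone.utc).isoformat(),
            })
            row[column] = after
    return rows

rows = [{"id": 1, "gender": "male"}, {"id": 2, "gender": "F"}]
baw_log = []
apply_with_baw(rows, "gender", cleanse_gender, baw_log)
print(baw_log)  # one BAW record for row 1; row 2 was already clean
```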

Thoughts and comments are welcome; I'll blog more on the subject if there's an interest.

Cheers,
Dan L


Posted October 26, 2005 6:33 AM
Permalink | 2 Comments |

There has been renewed interest in RNAi and RNA lately in the biotech world (don't forget, biotech is a part of nanotech - or the other way around). RNA (or ribonucleic acid) apparently carries encoding and decoding instructions for gene sequences; RNAi apparently has the ability to block or inhibit specific gene sequences. See an introductory article here.

In this blog we will explore (theoretically anyhow) what this might mean to the nanohouse and DNA computing.

There are some neat pictures (simulated/generated) showing the DNA structure here. If you don't think nanohousing is being worked on, think again. Here's an IEEE link to a conference that occurred in 2004.

Ok, let's get started...
For quite a while I've blogged and written about convergence of form and function, along with convergence of industries: bio, chemistry, technology, physics, etc. Back in an early paper I wrote for B-Eye, I predicted that the future technologist would have to have skills well beyond mere "technology" in order to survive (or face the threat of outsourcing). Well, form and function in biotech are a BIG part of what makes it work.

In the Nanohouse, we need to learn from this. The future nanohouse won't be JUST a data warehouse, or JUST an ODS, or JUST an OLTP system - no, it will be an "integrated data store" where the molecules collect "data" as history as it pertains to the context in which it lives, assigned by "key" components of information that only it recognizes. Different parts of the DNA structure will represent different and distinct chemical keys - for storing different types of information.

Well, that's all well and good, but we need functionality in the form of RNA and RNAi to act on the DNA strands that we "build". We also need catalyst-type events to trigger interaction across the DNA sequences. Here's a quote from a Vienna RNA project that discusses this:

Biomolecules exhibit a close interplay between structure and function. Therefore the growing number of RNA molecules with complex functions, beyond that of encoding proteins, has brought increased demand for RNA structure prediction methods. While prediction of tertiary structure is usually infeasible, the area of RNA secondary structures is an example where computational methods have been highly successful.
http://nar.oxfordjournals.org/cgi/content/full/31/13/3429

Wow! So this means a Nanohouse is definitely feasible?
Yes - but it's still at least 5 to 10 years off before we understand enough to create one. However, the study of RNA and RNAi sequences along with the DNA strands is important, and will help build a foundation of knowledge from which the Nanohouse can be built.

Where does this impact my business today?
Quite frankly, it doesn't yet. How soon it does will depend on the advances in both the biotech and nanotech sectors. I would speculate that if your top information technologists/scientists and researchers are not yet involved in this field, they should be. The paradigm is already beginning to shift as we see applications of this technology being created in labs around the world. Like any paradigm shift, this one will take time - and lots of it.

This is interesting, how does modeling take place in this element?
In order to answer this question, we must look at not just data visualization, but model visualization. Model visualization consists of putting data models into a 3D landscape and combining them with hub-and-spoke-like structures that resemble molecular connections (a poor man's neural network); see the Data Vault data modeling references on this site.

How does this play with RNA and RNAi?
RNA can help with the interaction of the molecules, while RNAi can specifically block or inhibit interaction. But more than that, the dynamics of this interaction/blocking need to be scored and measured.

The first practical dynamic programming algorithms to predict the optimal secondary structure of an RNA sequence date back over 20 years (1). Since then they have been extended to allow prediction of suboptimal structures (2,3) and thermodynamic ensembles (4), which allow to assign a confidence level or ‘well definedness’ to the predictions (5).
http://nar.oxfordjournals.org/cgi/content/full/31/13/3429

So does this mean the Nanohouse is a "dynamic structure" model?
Interesting - the answer is: it depends. Dynamic structure in the sense of adding new DNA components, extracting, and connecting the DNA to other molecules - yes; but changing the core structure underneath - no. RNA itself also has a structure, and that structure is rigid.

Recently, several methods have addressed the problem of predicting a consensus structure for a group of related RNA sequences (6–11). Such conserved structures are of particular interest, since conservation of structure in spite of sequence variation implies that the structure must be functionally important. By enhancing energy rules with sequence covariation these methods also obtain much better prediction accuracies.

In other words, the structure of the RNA itself stays the same - much like the structure of a neuron. Even though the memories change, the connections in the brain change, and the thought patterns change, the basic structure of the neurons in the brain stays the same.

What does this mean to Nanohousing?
It means that the architecture of our structure must be consistent, repeatable, and redundant - but the inter-relations, the functions, and the sequences can change (leading to a dynamic set of rules for inter-relationships, but a static structural foundation from which to scale infinitely).

A stretch of the imagination might be to say:
The equivalent of ‘data mining activities’ has been found within the RNA and RNAi operations.

Can we beg, borrow and steal some of these concepts today?
Yes - we should be utilizing what we learn in these fields and applying it to our current modeling techniques and data warehouses.

* a close interplay between structure and function (the data model MUST be closely related to the functions in business)
* structure must be functionally important
* assign a confidence level or 'well definedness' to the relationship (dynamic relationships can be created, weighed, tested, and destroyed depending on their viability for associating information - see the sketch below)
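Purely as a speculative sketch of my own (no nanohouse tool exists that does this), here's what weighted, testable relationships might look like in Python - each relationship carries a confidence score, and anything that falls below a viability threshold gets destroyed:

```python
# Speculative sketch: hub-and-link style relationships with a confidence score.
# The names (Hub, Link, prune_links) and threshold are illustrative only.
class Hub:
    def __init__(self, business_key):
        self.business_key = business_key

class Link:
    def __init__(self, hub_a, hub_b, confidence):
        self.hub_a = hub_a
        self.hub_b = hub_b
        self.confidence = confidence  # 0.0 - 1.0 "well definedness" of the relationship

def prune_links(links, threshold=0.5):
    """Destroy relationships whose confidence falls below the viability threshold."""
    return [link for link in links if link.confidence >= threshold]

customer = Hub("CUST-42")
links = [Link(customer, Hub("PROD-7"), 0.91), Link(customer, Hub("PROD-9"), 0.18)]
print(len(prune_links(links)))  # only the high-confidence relationship survives
```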

You can find more on the Data Vault modeling technique (for free) here.

What do you think will happen in your Nanohouse?
Dan L


Posted October 25, 2005 7:16 AM
Permalink | No Comments |

I've been asked about the pros and cons of ETL push-pull; I thought I'd generalize the issue a little more into the pros and cons of push-pull technology overall. I'm including EII and EAI in this posting. It's not that push or pull is necessarily bad by itself; it's more about using the right notion for the right data access at the right time.


Push-Pull, a direction I find myself pulled in, many different times during the day. (Seriously folks... :)

Ok - down to brass tacks, the nature of PUSH technology is basically the realm of EAI and Message Queuing. In this realm we deal with the publish/subscribe model, or maintaining a broadcast message to anyone listening.

Really "easy" technology until you get to the engineering underneath. The real work is deciding WHICH transactions are important, and WHICH are not. Then there's the decision on how often, how fast, and how to write the drivers to "plug" in to each of the applications, or legacy apps that service transactions to begin with. Ok - enough of the engineering talk, let's get back to the business aspects.

Push technology is GREAT when you want to distribute transactions as they happen. Stock tickers and other financial institution transactions are prime candidates for push technology. How about disasters and notification? Again, important.
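For illustration only, here's a bare-bones publish/subscribe sketch in Python (not modeled on any particular EAI product) - the broker pushes each transaction to everyone listening the moment it is published:

```python
# Minimal publish/subscribe sketch: subscribers get each transaction as it happens.
class Broker:
    def __init__(self):
        self.subscribers = {}  # topic -> list of callback functions

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, transaction):
        # Push the transaction to everyone listening on this topic, right now.
        for callback in self.subscribers.get(topic, []):
            callback(transaction)

broker = Broker()
broker.subscribe("stock_ticks", lambda t: print("risk engine got", t))
broker.subscribe("stock_ticks", lambda t: print("dashboard got", t))
broker.publish("stock_ticks", {"symbol": "XYZ", "price": 101.25})
```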

What about the different components?
For EAI: Push technology is its life-blood; this is what it's built on - making the applications "talk" when the transactions are available.
For ETL/ELT: Not so important - even in an "Active Data Warehouse" it's not so important. Ok, the PUSH of the transaction is important, but the ETL component? It gets in the way of getting the data to the warehouse at the right time for analysis.

Now wait just a minute - aren't ETL/ELT engines getting stronger and faster? Yes, they are. But they still aren't "architected" for real-time dynamic data integration. The world's BEST ETL/ELT engine will focus on transforming as many transactions as possible (in batch) in the shortest amount of time; that's their strength, and they should STICK to it. ("Stick to your ticket Harry, very important that you STICK to your ticket..." - Harry Potter.) We could learn a few things from this line; no really!

ETL/ELT is GREAT at PULL technology - go get the data on a scheduled interval; not just the data, but ALL the data, en masse. Bring me everything that meets criteria X across ALL disparate systems, then integrate it all en masse (batch style) - and do it as fast as possible so that I can replicate the system with new and transformed information.
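A crude Python sketch of that PULL pattern might look like this - on a scheduled run, go get everything meeting criteria X across the sources, transform it en masse, and land it. The source names and fields here are invented for illustration:

```python
from datetime import date

# Illustrative source extracts: in practice these would be bulk queries
# against each disparate system, pulled on a scheduled interval.
SOURCES = {
    "orders_east": [{"order_id": 1, "amount": "120.50", "region": "east"}],
    "orders_west": [{"order_id": 2, "amount": "75.00", "region": "west"}],
}

def extract_all(criteria):
    """Pull ALL rows meeting the criteria from every source, en masse."""
    rows = []
    for source, data in SOURCES.items():
        rows.extend({**r, "source": source} for r in data if criteria(r))
    return rows

def transform(rows):
    # Batch-style transformation: conform types and stamp the load before landing the data.
    return [{**r, "amount": float(r["amount"]), "load_date": date.today().isoformat()}
            for r in rows]

landing_area = transform(extract_all(lambda r: float(r["amount"]) > 50))
print(landing_area)  # the transformed batch, sitting in its landing area
```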

Ok - well, ETL/ELT engines will HAVE to process in near real time in the near future in order to survive. While batch will not go away any time soon, the windows are shrinking, the data sets are growing, and the timeliness of critical data is becoming more important. ETL/ELT engines are GREAT at static rules, parallelism, partitioning, and performance - and they require huge amounts of processing power to get the job done right (with very large data sets). This is the nature of PULL. I guess one could speculate that PULL technologies require a place to "land" the data once it's been transformed.

That is not something PUSH technology needs, nor wants. PUSH technology wants to ACT on the transaction as it stands, once it reaches its destination. This is a primary difference between PUSH and PULL.

Now let's not get confused! There's such a thing as IMMEDIATE PULL, or PULL ON DEMAND, this is new - it's called EII (as a paradigm).

EII in this sense offers many different things and is a _complementary_ technology to EAI and ETL/ELT. Pull on demand isn't (usually) interested in massive history sets, nor is it interested in "doing" something with the transaction, such as applying it to another system based on business process workflow (although this could change in the near future). It is more interested in managing the metadata layers between the business and the data set; it is more interested in immediate access and immediate integration of CURRENT state than it is in history.

Now hold on! Don't get me wrong - EII can be used to access warehouses just the same as it can be used to access current OLTP/ODS systems, staging areas, and stock tickers. It's the FOCUS of what EII does that makes PULL ON DEMAND different from PULL on a batch schedule. The focus is much different, and that same focus makes it a complementary technology to the EAI and ETL/ELT world.
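Here's a loose sketch of PULL ON DEMAND in Python - a metadata layer maps a business term to the sources that can answer it right now, and the query is resolved at request time rather than on a batch schedule. The metadata map and the next-source-in-line fallback are my own simplifications, not any vendor's EII engine:

```python
# Pull-on-demand sketch: resolve a business term through a metadata layer
# at query time, falling back to the next source in line if one is unavailable.
METADATA = {
    "current_customer_balance": ["billing_oltp", "customer_ods"],  # preferred order
}

AVAILABLE_SOURCES = {
    "billing_oltp": lambda cust: {"customer": cust, "balance": 412.07, "source": "billing_oltp"},
    # "customer_ods" is registered in the metadata but happens to be offline right now.
}

def query_on_demand(business_term, customer_id):
    for source in METADATA[business_term]:
        fetch = AVAILABLE_SOURCES.get(source)
        if fetch is not None:
            return fetch(customer_id)  # current-state answer, fetched WHEN requested
    raise RuntimeError("no source currently available for " + business_term)

print(query_on_demand("current_customer_balance", "CUST-42"))
```

If billing_oltp were offline, the loop would simply fall through to customer_ods - the dynamic sourcing point in the pros list further down.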

Using the right tool for the right job makes all the difference. EII can also transform/conform and write back - something that EAI does (write-back), but that ETL frequently is not "architected" for, mostly because the "work" that ETL does must be checked before it is re-integrated with the source systems.

Now take Active or Right-Time Data Warehousing: there's a combination of technologies being utilized to get the data into the warehouse at the right time, and a combination (including data mining and scoring analysis) to re-deliver the transactions back to the source systems at the right time. Of course this is neither push nor pull, but rather "closed-loop processing." Ok - it uses push to get the transaction to the warehouse, and push to get it back from the warehouse to the OLTP system.

So at the bottom of this blog entry, we are still left with the question, what are the pros and cons of push and pull? Let's see if we can sum it up (forgive me, I may forget a few):
Push Pros:
1. Instant transaction communication
2. Feedback on the transaction after the business processes are invoked.
3. Transaction by Transaction / Guaranteed delivery mechanisms
4. Mass Distribution, or publish subscribe to those that want it.
5. Visual Business Rule Processing Engines (are usually in place).
6. TACTICAL in nature (for solving business problems)
7. New sources can come on-line and push out new transactions (integrating with ease into existing layers).

Push Cons:
1. Independent transactions - meaning can't rely on "history", can't rely on "trends", and can’t rely on an understanding.
2. Difficult to establish context
3. Can't transform "massive sets of data" at once - technology just isn't fast enough yet - this may change with Nanotech and DNA computing.
4. Once a transaction is sent - it's gone. No "recorded history", although some EAI engines actually have mitigated this point over the years.
5. Sometimes tends to be a highly code-driven environment under the covers.
6. The number of crisscrossing attachments to transactions means it's harder to "unhook" legacy systems that are providing the information...

Pull Pros:
1. Massive sets of transactions in parallel/partitioned can be handled in ever smaller execution windows.
2. Increase in processing power means increase in data set that can be dealt with.
3. We can get what we want when we want it via scheduling.
4. Predictive support, predictive failures, predictive model - leading to standardization, and automation.
5. STRATEGIC IN NATURE.

Pull Cons:
1. Requires a Landing Area for the transformed data sets.
2. Requires massive sets of processing power (for large data)
3. Batch Windows are continually shrinking while data sets are ever growing.
4. No "NOW" data available, in other words, little to no visibility into the transactions occuring RIGHT NOW.
5. Once a source, always a source (static SOURCING, static TARGETING)

PULL ON DEMAND Pros:
1. Focus on the metadata integration layer
2. Focus on the business rules of integration
3. Utilized by services to conform NOW transactions, WHEN requested (as opposed to WHEN they happen)
4. Provides access to previously inaccessible systems (like Word docs, emails, PowerPoint files, and so on).
5. Dynamic and Distributed query sets mean the queries and their plans can change in accordance with the data set changes (straight PULL is STATIC QUERY BASED - unless the RDBMS engine tunes the query under the covers).
6. Dynamic Sourcing, Dynamic Targeting - if one source isn't available, the metadata layer and engine can determine the "next source in line" and fire the query just the same.
7. TACTICAL IN NATURE!!

PULL ON DEMAND Cons:
1. Requires STRICT adherence and agreement by the enterprise to metadata management, and development.
2. Requires (or forces the hand of) data quality initiatives ON THE SOURCE SYSTEMS.
3. Increases management costs and required processing power, BUT DECREASES long-term costs of implementing "Services", be it B2B, B2C, and so on.
4. Requires sources to be defined and set up ahead of time (before accessing) - but strategic PULL has the same requirement.

Ok, none of these are complete lists by any stretch of the imagination (some might say I have none :), but hopefully they give a peek into what might be some of the top differentiators across these technologies.

Thoughts? Comments? Have some pros/cons you'd like to add? Please, feel free.

Thanks,
Dan L


Posted October 20, 2005 2:42 PM
Permalink | 1 Comment |

I've been asked about standards and what they contribute to the success of a project within business - particularly following the entry on Architecture, Standards, and Business. Standards actually contribute quite a bit, but they can also be overkill. There are some neat comments on the Agile Modeling forum regarding the use of standards, and I've spoken with Scott Ambler about some of these things (but not yet in detail). Grady Booch and I have discussed the nature of useful standards in brief conversations, from which we still have to draw some conclusions - with that, let me continue my entry.

What kinds of standards do we have in industry?
There are hundreds, if not thousands, of standards across mature industries. Some of the ones I can think of right now include: ISO, HIPAA, Basel II, ANSI, SEI/CMM, PMBOK, ASCII, RS-232C, FireWire, encryption, security, and so on.

When an industry or business is NEW or yet-undefined, there are no real standards or accepted methodologies for build-out. Take data warehousing, for example: when it was first discussed (in the early 70's) there were no best practices, no standards, no suggested ways of completing projects. However, as time went on and practitioners built data warehouses, they discovered that when best practices and standards were applied, the businesses reaped significantly more benefits (lower cost, reduced risk, easier implementation, faster build-out, and so on).

Now I can tell you some hairy stories about standards and overkill. When we first introduced SEI/CMM standards (lock, stock, and barrel) to our manufacturing organization, we had severe trouble implementing CMM Level 3 - too many standards for tiny little projects which had small impacts on the state of the business. In other words, the standards were too thick, too heavy to implement. Then we applied "standards thinners" (like paint thinners), which didn't destroy the quality of the standards, but rather reduced them to a working plan. The projects still followed standards and best practices, only fewer of them.

Of course, SEI was first proposing CMM as a measure of software engineering maturity, as they still do. What we did was apply SEI/CMM best practices to a blend of data warehousing best practices and standards. We wanted a system that was repeatable (in architecture and design), easy to build, and consistent, with reduced risk and rapid build-out. We quickly reached CMM Level 5 with our organization and our data warehouse as we followed this hybrid paradigm.

We combined spiral methodologies with waterfall checkpoints at major steps to reduce risk; we trained individuals before engaging them on projects; we undertook versioning and a centralized store of documentation; and we put together risk analysis spreadsheets and project size estimates using FUNCTION POINTS. We also continued to label and number ALL requirements, refocusing the requirements as needed into specific, reachable, and measurable goals (using RUPP processes). Finally, we attached the numbered project plan items to each specific requirement, so we could produce business process metrics and answer user questions on progress and risk at any given point. We had a team of 3 people working on this project at any given time.
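A trivial sketch of that traceability idea (my own illustration, nothing vendor-specific): number every requirement, attach the project plan tasks to it, and the progress metrics fall out of the join:

```python
# Illustrative traceability: numbered requirements tied to project plan tasks.
requirements = {
    "REQ-001": "Load daily sales into the warehouse",
    "REQ-002": "Produce month-end revenue report",
}

plan_tasks = [
    {"task": "Build sales extract", "requirement": "REQ-001", "done": True},
    {"task": "Build sales load",    "requirement": "REQ-001", "done": False},
    {"task": "Design revenue mart", "requirement": "REQ-002", "done": True},
]

def progress_by_requirement(reqs, tasks):
    """Answer 'how far along is each requirement?' straight from the plan."""
    report = {}
    for req_id in reqs:
        linked = [t for t in tasks if t["requirement"] == req_id]
        done = sum(t["done"] for t in linked)
        report[req_id] = f"{done}/{len(linked)} tasks complete"
    return report

print(progress_by_requirement(requirements, plan_tasks))
```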

So you see, even with small projects, a certain level of standards helps the team achieve what they need to do - with quality. We befriended the business users, and upon completion they threw MORE work at us than we could handle. We helped turn the business belief from "IT can't deliver; we'll build it ourselves" to "IT has done a tremendous job AND saved us tons of money and time."

By the way, here's something I want you to walk away with (I teach this in my VLDW class at TDWI): the larger the data set, OR the larger the project, the less likely you are to produce a success WITHOUT standards! In other words: the larger the project and the larger the data set, the more likely you are to succeed with standards and best practices. Without standards and best practices, your project will fall into chaos and disarray and quickly succumb to unforeseen/unmitigated risks.

I created something called The Matrix Methodology for data warehousing. Now I'm creating a new, much more advanced methodology that incorporates ideas like Agile data modeling (process-wise), Data Vault data modeling (physical), Master Data Management, and SEI/CMM components for the marketplace. My current company puts these principles to work in the projects and RFI's that we assist with.

By the way, while I didn't focus on it very much, standards certainly assist those in need of compliant projects and compliant data stores in getting where they need to go - but that's an entry for another time.

Hope this helps,
Dan Linstedt


Posted October 20, 2005 5:51 AM
Permalink | No Comments |

I've been asked if there is a way to quantify information as an asset on the books, and since then I have been asked what percentage of companies may be doing this lately. It's hard to quantify something that companies have long considered an intangible asset. However, in this blog we will explore the basic possibilities and present a single scenario which may begin the process of accounting for data (quantifying data). This is an experimental topic; any thoughts or feedback are welcome.

What is data valuation as an asset?
To understand this, we have to first accept the fact that data in and of itself is valuable to the organization. Then we must take steps to measure parts of that value. In my mind there are two major methods by which valuation of the data set can begin.

The first method: Top down valuation
Top-down valuation (from my perspective) means lumping the entire data set together as a single asset, or perhaps treating each of the OLTP systems and data integration stores (ok - data stores) as their own assets. The questions we have to ask with top-down valuation might include the following:
1. Would I lose business if I lost this data store?
2. How much money per minute would I owe to partners/customers based on SLA's if this data set were UNAVAILABLE?
3. What tangible profits have been received partly as a result of this data set being built? (See the sketch below for one rough way to roll these answers up.)
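As a back-of-the-napkin illustration (the figures and names are invented, not industry benchmarks), here's one way those three questions could be rolled up into a rough value floor for a single data store:

```python
# Back-of-the-napkin top-down valuation of a single data store.
# All figures are illustrative assumptions, not benchmarks.
def top_down_value(penalty_per_minute, expected_outage_minutes_per_year,
                   attributed_annual_profit, recovery_cost):
    sla_exposure = penalty_per_minute * expected_outage_minutes_per_year
    # A crude floor: what it costs you to lose it plus what it helps you earn.
    return sla_exposure + attributed_annual_profit + recovery_cost

value = top_down_value(penalty_per_minute=500,
                       expected_outage_minutes_per_year=120,
                       attributed_annual_profit=250_000,
                       recovery_cost=75_000)
print(f"Rough annual value floor for this data store: ${value:,.0f}")
```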

We've been doing this for years in quantifying the cost of the hardware on which these data sets run; we just haven't carried it into the data itself - some have assumed that the data is too intangible an element to be measured. Notice we didn't ask any questions about the quality or compliance of the data set... We only asked about tangible and measurable loss of the data set. One might say that top-down measurement is akin to itemizing the entire set of data with the hardware for disaster recovery classification. True. Now would be a good time to insure against these failures and periods of unavailability.

Well, a big part of that is getting insurance for the LOSS of an entire data store. Maybe rates for insuring data are LOWER for those with a proven and tested (audited by the insurance adjusters) disaster recovery program. Maybe the rates are lower for those businesses that have business definitions and an attached understanding of the uses of their data - in relation to the actual SLA's signed by the business. Maybe the lower rates indicate a better handle on understanding the data, and on quantifying what it means to the business.

Once the data is insured, it should be listed as an asset on the books for the amount it is insured for. Maybe insurance rates go up for those companies that DON'T meet their SLA's and aren't accountable for failure recovery. Hmmm, talk about Information Quality improvement efforts.

Ok - so we've quantified (somewhat) what the BLOB of data represents to the business; we can justify its loss and account for its recovery. What about depreciation?

That's a VERY interesting question. Think of data in a data warehouse or enterprise data integration store as different from that in an ODS or OLTP system (which it is). Even an Active Data Warehouse falls into this category. Depreciation (my hypothesis) can only be measured by the loss of historical data and what that translates to, in a quantifiable manner, for the business.

In other words: does the SLA cover "old" data, and if so, how old? Where is the cut-off point at which the "old" information is no longer valuable to the organization? If the "old" data is always valuable, just less valuable, then ask this question: if my systems were to lose data that is X months/years old, what would it cost me to recover?

What SLA's are in place to force me to be back up and running? Are my data mining engines relying on the history to produce active responses today? If so, then the "old" data is just as valuable as the current or new data. If not, then the "old" data depreciates in value as it ages - it's up to you to negotiate that value point with the data insurance adjusters.
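One hypothetical way to express that depreciation in code: value decays with age unless active processes (like data mining) still rely on the history, in which case it holds its value. The decay rate is an invented assumption:

```python
# Hypothetical depreciation of historical data value by age in years.
def depreciated_value(base_value, age_years, annual_decay=0.2, feeds_active_mining=False):
    """Old data keeps full value if active processes still rely on it;
    otherwise it loses a fixed fraction per year, never dropping below zero."""
    if feeds_active_mining:
        return base_value
    return max(0.0, base_value * (1 - annual_decay) ** age_years)

print(depreciated_value(100_000, age_years=3))                             # value decays with age
print(depreciated_value(100_000, age_years=3, feeds_active_mining=True))   # history still earning its keep
```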

The Second Method: Bottom Up Valuation
What does this mean? Bottom-up valuation (again, my hypothesis) is the manner in which EACH data element is weighed on the grand scale of value to the business. Here, data quality IS measured on a row-by-row, cell-by-cell level. It is tedious, and much of the work can be done by utilizing a data profiling tool, or a data mining tool engaged to profile for trends of missing or bad information.

Suppose you had a customer table - well, forgive me, but if you don't have customers, you don't have revenue, and if you don't have revenue, you aren't in business to make money. In this customer table you have 3 million customers. Each customer row has a specific value. Assuming you're a direct marketing firm, or even a banking firm, each customer represents a specific investment or makes a specific amount of "money" for you by being in your customer table.

As a marketing firm, you want to offer free subscriptions - and the gender of the customer becomes important as ONE of 32 elements that make a difference in determining WHAT to offer them. If you send the wrong type of magazine to the wrong individual because you're missing the gender field, and they unsubscribe from ALL your services - you've just lost a good customer. What does that equate to in revenue dollars?

The bottom-up method requires attaching a "row score" to each row, weighting the importance of each row - of course, this weighting or scoring mechanism must change every time the customer information changes (apply this concept to EVERY table within your data store). Then there is a general or average dollar amount, along with a mean, median, max, and min dollar amount for each row.

Some customers are outliers, and we DON'T want to adjust their outlying investments to the average numbers. Now, based on the weighting and the statistical dollar amounts, calculate an overall value for each table - and please take into account the EMPTY fields and the importance of having GOOD data; this should affect the weighting factors. Finally, add up the weighted dollar amounts for each row, and ask yourself the business question: is the value of this "data set" really the total value to the business? Add an overall adjustment factor to the final dollar amounts, and test it against top-down recovery costs for an approximate range.
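Here's a rough Python sketch of that bottom-up math - the field weights and revenue figures are invented for illustration. Score each row for completeness, weight its dollar value, then roll the weighted amounts up to table-level statistics:

```python
from statistics import mean, median

# Illustrative bottom-up valuation of a customer table.
FIELD_WEIGHTS = {"gender": 0.1, "email": 0.4, "segment": 0.5}  # invented weights

customers = [
    {"annual_revenue": 1200.0, "gender": "F", "email": "a@x.com", "segment": "gold"},
    {"annual_revenue": 300.0,  "gender": None, "email": "b@x.com", "segment": "silver"},
    {"annual_revenue": 9500.0, "gender": "M", "email": None,      "segment": "gold"},
]

def row_score(row):
    """Completeness-weighted score: empty fields drag the row's value down."""
    filled = sum(w for field, w in FIELD_WEIGHTS.items() if row.get(field))
    return filled / sum(FIELD_WEIGHTS.values())

def table_valuation(rows):
    weighted = [r["annual_revenue"] * row_score(r) for r in rows]
    return {
        "total_weighted_value": sum(weighted),
        "mean": mean(weighted),
        "median": median(weighted),
        "max": max(weighted),
        "min": min(weighted),
    }

print(table_valuation(customers))
```

Swap in your own weights and an overall adjustment factor, and you have a starting point to test against the top-down recovery numbers.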

Now you should have a fairly decent idea of what your data sets are worth. From there, it's up to you to negotiate with the data insurance adjusters for actual valuation.

Ok, there's a lot of talk here about valuing data and insuring the data - some of which we may already do. But what about listing it as an asset for tax purposes? Well, that requires a change in the laws around the world, and unfortunately that's wide-open speculation that I do not engage in. If you can insure your data, then maybe, just maybe, you can convince your tax auditors and your local government to at least look at the issue seriously.

In regard to what I see in the field (purely speculation - not backed by any scientific studies of any kind): I see on average 10% of the Fortune 500 engaging in these activities. However, I see 60% to 80% of businesses today working hard to put fault-tolerance and disaster recovery programs in place. What I don't see is the follow-through with tangible valuation of data as an asset.

Good luck, and I hope this helps - thoughts or comments? I'd love to hear from you.
Dan L


Posted October 18, 2005 6:34 AM
Permalink | No Comments |
