Business Intelligence Network business intelligence resources

Blog: Dan E. Linstedt


April 28, 2005

DNA Robots and Computing Technology

I usually don't use a title that someone else has used, but I feel that this is a VERY important breakthrough. See this site for the story I'm blogging on: http://www.jefallbright.net/node/2616

In this blog I will go on an exploratory journey into what it would be like to establish computing power at the DNA level. This, again, is conjecture - pure speculation - so it's OK to let your mind wander a bit.

In one of my recent blogs, I predicted (not the first to say this) that DNA computing appears to be the strongest and most rapidly advancing field (in terms of nanotechnology applied to computational ability). There are other parts of nanotech that are advancing rapidly as well (in other areas, like bio-informatics).

This is quite astounding. Man-made, atomic-level substances built to do something specific, walking around in a liquid solution - attaching to other molecular tracks and actually "walking" across the strands. All in parallel, all under atomic-level control. Fascinating.

Now let's step off the path for a minute and see what this might lead to in terms of computational power. Let's ask a few questions, and hope that someone in this field (familiar with this technology) will comment for us.

The first question that comes to my mind is: why was the wheel re-invented? In other words, there are enzymes that travel down a single DNA strand and unzip it, there are other enzymes that travel down the same DNA strand and zip it back together, and there are additional enzymes that replicate DNA strands and sometimes introduce "changes" to DNA (evolution of DNA). It seems as though these walker molecules are an attempt to re-invent the wheel, or are they?

In this case, I suppose the walkers were built because (a) we still don't fully understand enzymes, (b) we can't build our own controllable/programmable enzymes, and (c) the walkers can do things the enzymes can't, like be non-intrusive (non-destructive), carry loads, attach and separate themselves, etc.

Now, let's talk about the computational power of the walkers. Let's assume for a minute that they had advanced the walkers enough to carry a load (which they are working on). Can the walkers be programmed to release the load at a specific point? If two walkers "meet", can they join forces and combine their loads if the conditions are favorable? Would this make a single, larger walker that's twice as powerful? Ok, let's assume for a minute that we take some of the "self-assembling" nanotech that has advanced in the area of crystalline structures and apply it to the walkers.

Then we'd have to program each walker with a specific set of identifying codes (business rules) that say when the walker can merge with another, when it can't, when and what it can combine or mix its loads with, and when it shouldn't. The end result is a tiny bit of "self-intelligence" (using the term intelligence very very loosely here). Now, if we can program and encode walkers to interact, they may actually build or self-assemble something we've never seen before.

They may actually perform "load combinations" or chemical experiments that we can't create in the lab, and they may actually create new substances that we haven't seen before - provided the loads themselves can actually be combined to form specific results (that reaction is bound by the laws of chemistry and physics). Remember, this entry is hypothetical.

What would the combined walkers create? How would they structurally "merge" or self-assemble? How much load could combined walkers carry? My cousin, Adam Linstedt, is a microbiologist and professor at Carnegie Mellon University. He's suggested to me that when conditions are right, the man-made atoms (nanotech) can actually be separated from the combined chemicals. In other words, he said: making conditions favorable to binding allows the "walkers" to bind to the target molecules; making conditions unfavorable can have the chemicals self-separate from their target molecules.

So what about the power of computing at this level? What can be said about this advancement? I believe that when this gets far enough, the walkers will indeed be encoded, and instructed to carry different loads. I also predict that self-assembly of specific kinds of walkers is inevitable and only a matter of time. The self-assembly is an interesting point when it comes to modeling.

What if we can construct models with functions that understand the context of what they contain, and what they can combine with? In other words: pair form with function, mimic the nanotech industry - we might discover new modeling and new computational models we hadn't thought of before. Self-assembling processes? Yep.

Just a thought anyhow. Cheers for now.

Posted by Dan Linstedt at 7:49 AM | Comments (0)


April 27, 2005

Architecture, Standards, and Business

On one of my last blogs, I received an interesting comment. I've requested clarification of the term "redundant synonyms" via email, and am hopeful that I will yet receive a reply. However, I wish to expound a little bit on the nature of architecture and design, in terms of what I've seen and worked through in the past 13 years in the industry.

In this blog, I will explore more deeply the business meaning of having a clear, consistent, and repeatable design architecture. The methods of applying standards to data modeling are also discussed here.

Welcome back to my fire-side chat. As I recline in my rocker, I invite you to sit back, put on your house-shoes and relax a little with me. I always welcome your feedback, and please - feel free to comment, I'm open to learning new things particularly when it's an area of interest.

Down to business... Data Modeling, Information Modeling, Business Modeling - can they, should they, be one and the same? No, probably not. There is certainly a difference between Information and Data, and there is a visible difference between Information and Business.

But what happens when these modeling techniques diverge from business too much? Costs rise, impacts rise when change is needed, inconsistencies in architecture bubble up, and band-aids appear. Then IT says: Stop, we either need to buy an out-of-the-box solution to replace this mess, or we need to re-write this spaghetti from the ground up. It's as if we started with a single rose bush and ended up with a thicket - and we don't know where the starting point or ending point is for trimming it.

By the same token, they cannot be exactly the same - hence the different modeling methods: business process modeling, data modeling, systems architecture modeling, and so on. Wait a minute... Where's Information in all of this? Isn't there some standard on Information Modeling? No, there really isn't, probably because it's the grey area that crosses between Data and Business, and its usefulness to business.

They can however be bound together by similar needs, similar architecture so that when the business needs change, the underlying data model can change without the heavy business impacts, and without the high cost of maintenance, and without severe divergence. But that's a topic for another day. Let's get on to STANDARDS in Data Modeling.

For years, people have insisted on telling me that "no two data warehouses can be modeled the same way." I beg to differ: why then does Universal Data Models enjoy such large success? It doesn't stop there: why then does CRM have competing vendors with very similar feature/functionality sets (probably with very different models under the covers)? They bear the high cost and brunt of changes... No wait, the customer who decides to upgrade undergoes brain surgery every time a major upgrade is released...

Is it because the data warehousing or Integration industry hasn't been bold enough to step forward and proclaim a data model as a standard starting point? But it's more than that: it's not just the data model that's important - it's the architecture and design of the data model that's important. The architecture provides the guidelines for the infrastructure from which the enterprise builds its vision.

I liken it to this: a two-story house has many many designs, and as long as the foundation and support beams are in the right place, it can be created very differently. Compare that to a 32-floor high-rise office in the city. Limited footprint, must rise straight up (usually), must be flexible in high winds at the top, must withstand earthquakes (in California), glass must be shatterproof (extra weight), and of course the pylons and support structure (infrastructure) must be solid.

I'm no expert in high-rise buildings, but I'm going to assume that the building codes get stricter, the larger the structure that is to be built. Now imagine, after they frame a high-rise and the owner wants to "move" where the elevator shaft is, how possible is this? What if the owner of a two-story house builds an elevator, and wants it moved during framing? Compare the costs, the impacts - much different.

But that's where the similarities end. On the architecture side of the house, there's a ton of planning, design, and standard, proven architecture (a reusable, redundant, and consistent set of standards) that governs the build of a high-rise for success. In a two-story house there's more leeway; standards still exist, but it can be "thumbed" up to size. Can you take a bunch of two-story houses, stack them on top of one another and make a high-rise out of it? Probably not going to work. Can a high-rise hold a bunch of "two-story" housing units? Yes, if it's partitioned that way.

The bottom line is, it's time to converge our modeling efforts, it's time to provide some consistency, repeatability, and standards to our designs/architecture - whether we're modeling our data, our information or our business - they should all tie together.

I've seen too many data models where the data modeler points to the top left and says: here we implemented X architecture to get around this problem, down here (bottom right) we built Y because the source system had an issue, and over here (top right) Z is in place because it was built before I got here... and the story goes on. Band-aids to the architecture, because the original architecture and data model no longer meet the needs of the business.

It's best to start with a foundational data modeling approach that lends itself to a repeatable and consistent design when it's being extended to meet new enterprise needs. In this manner, metadata can also be captured through naming conventions and architectural design.

More later... Hope this is a useful topic for everyone. Feedback?

Posted by Dan Linstedt at 8:29 PM | Comments (5)


Automated Enterprise Data Warehouse Modeling

In my last blog in this category, I discussed repeatable architecture, and repeatable process to build a solid foundational enterprise architecture. I hope I was not giving the message out that the model elements must be repeated, for that is not the case. Data modeling is definitely a cross-combination of understanding the business need (the practitioner) and their ability to represent the business in a structured format. NOTE: This is a biased blog entry, based on a new data modeling technique called the Data Vault. I'll be talking more about the architecture in coming blogs.

However, I believe that with an integrated non-aggregated, low-level detail architecture, there is a mechanism by which to achieve a standard data modeling architecture. Particularly when it comes to "integrating" different enterprises, why else would something like Universal Data Models ever have taken off?

In this blog, I explore beyond the simple data model. I have suggested a new revolution in data modeling (available here: www.DanLinstedt.com) which is based on standard, repeatable architecture - an architecture that builds a granular, integrated, and foundational enterprise view of the facts (the data itself).

It doesn't mean that this model should be used for information dissemination. It just means this model should be used to construct an enterprise data warehouse, non-accessible except to power users and data miners. From this point, we can generate information stores, star-schemas (turn data into information) through integration, quality, cleansing, and aggregation.

At the end of this modeling effort, we begin to realize that it's nothing more than a STANDARD set forth on how to build a decent model. With any standard, the next evolution is automation. Well, I've done it. I've built a "Data Modeling Wizard", one that takes in multiple source data models from a number of relational databases, and spits out: Staging Areas, Data Vault data models, and ETL loading code to go with it. The next version of the software will actually produce all the mathematical combinations of "star-schemas" that appear to be "useful" to the end-user, and allow the end-user (IT modeler) to pick the stars to generate, then cross them with date/time aggregation options.

In other words, I've automated the process of building a back-end enterprise data warehouse data model. One step closer to the truly "Dynamic Data Warehouse" or Dynamic restructuring of information in near-real-time.

I can now produce a data model that is 60% to 80% of the final result that I want, in under 10 minutes (11,000 source tables). Of course the software has limitations, it reads only relational source data models, and relies heavily on Primary/Foreign keys. Also, the quality of the data model output depends directly on the quality of the data model input. But then again, if I can automate what used to take 3 months, down to 1 day - then I can use the rest of the week to manually tweak the data model to my liking.
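To give a feel for why primary and foreign keys matter so much to this kind of automation, here is a minimal Python sketch. It is not the wizard described above: the `SourceTable` class, the `propose_vault_model` function, and the naming prefixes are all hypothetical, and the rules are a deliberately naive reading of Data Vault structure (a hub per keyed table, a link per foreign key, a satellite for the remaining descriptive columns).

```python
from dataclasses import dataclass, field


@dataclass
class SourceTable:
    """Hypothetical description of one relational source table."""
    name: str
    primary_key: list[str]
    columns: list[str]
    # Maps a local foreign-key column to the table it references.
    foreign_keys: dict[str, str] = field(default_factory=dict)


def propose_vault_model(tables: list[SourceTable]) -> dict[str, list[str]]:
    """Derive a first-cut list of hubs, links, and satellites from key metadata."""
    model: dict[str, list[str]] = {"hubs": [], "links": [], "satellites": []}
    for table in tables:
        # Assumption: every table with a primary key is a candidate hub.
        if table.primary_key:
            model["hubs"].append(f"hub_{table.name}")
        # Assumption: every foreign key is a candidate link between two hubs.
        for target in table.foreign_keys.values():
            model["links"].append(f"link_{table.name}_{target}")
        # Whatever is not a key is descriptive data and goes to a satellite.
        keys = set(table.primary_key) | set(table.foreign_keys)
        if any(column not in keys for column in table.columns):
            model["satellites"].append(f"sat_{table.name}")
    return model


source = [
    SourceTable("customer", ["customer_id"], ["customer_id", "name"]),
    SourceTable(
        "orders",
        ["order_id"],
        ["order_id", "customer_id", "total"],
        {"customer_id": "customer"},
    ),
]
model = propose_vault_model(source)
print(model)
```

A source model with missing or wrong keys gives this kind of generator nothing to work with, which is one way to read the remark that output quality depends directly on input quality.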

The point of this blog is not to sell the software (although I am looking for VCs/Angels), or its usage, but to point out that there is another revolution coming: automated build-outs of enterprise information stores, and dynamic model changes. For the first time that I can recall, I can play "what-if" games with my architecture before I sink tons of cost and time into it.

After using and generating the Data Vault, a business has the responsibility of turning the DATA into INFORMATION, and actually writing the correct business rules into the processing engine to accomplish this task. This also requires different modeling techniques like Star Schema, and something Dave Wells recently wrote about in Flashpoint (November 2004): Master Dimensions, Master Fact Tables.

Posted by Dan Linstedt at 9:26 AM | Comments (0)


April 25, 2005

ETL, ELT, EAI, EII and E-I-E-I-O

Well well, lookie here - Old MacDonald had a farm, E-I-E-I-O. (Sorry, on a bit of a funny kick today.) What do all these things have in common? Moreover, what problem are they trying to solve? Are some of these technology stacks "sun-setting"?

In this blog we explore some of these garbled acronyms, and no - I won't repeat the farm joke... We'll also take a hard look at some of the existing business issues that are forcing changes in the way we (IT) work. If nothing else, a bit of light reading - you might get a laugh or two out of this... :)

Sometimes I wonder just a bit - why we have so many different mechanisms to solve the same problem. Oh yes, but what exactly is the problem to begin with?

It all boils down to this: moving data from point A to point B.
Yes, it really is that simple!

The options (by-products of having the data available)? Integrating, changing, recording, merging, matching, and cleansing are all by-products. We have it in-stream, in-transit, en route - now we need to do something with it.

Is ETL dead? Yes, I believe so - in its current form it won't last much longer as a paradigm. It needs to morph if it is to survive, and change into something more "becoming" of the integration age we are currently feeling (a wave or movement which, by the way, started over 4 years ago).

ETL = Extract, Transform, and Load
ELT = Extract, Load to the RDBMS, then Transform
EAI = Enterprise Application Integration
EII = Enterprise Information Integration
EIEIO = Old MacDonald had a farm... (sorry, did it again).

As I mention in the data modeling blog, the paradigm is shifting: the need to move data from "all sources" into an integrated business model that houses both current and historical views of consistent data is being sharply focused by business acumen.

So what? That means EL, doesn't it? Yep - from a data movement perspective, extract and load - basically detecting existence of new/changed data, and doing a delta comparison is all that's needed. Icing on the cake is having a visual no-code, drag and drop development paradigm that handles and manages metadata along the way.
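As a rough illustration of the "detect new/changed data and do a delta comparison" step, here is a minimal Python sketch. It assumes each row carries a unique `id` and that the target keeps one fingerprint per loaded row; the function names and the hashing scheme are illustrative choices, not a description of any particular tool.

```python
import hashlib


def row_fingerprint(row: dict) -> str:
    """Hash the row's values in a stable column order."""
    payload = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_delta(source_rows, target_fingerprints, key="id"):
    """Return (new_rows, changed_rows) relative to what the target already holds."""
    new_rows, changed_rows = [], []
    for row in source_rows:
        known = target_fingerprints.get(row[key])
        if known is None:
            new_rows.append(row)          # key never seen before
        elif known != row_fingerprint(row):
            changed_rows.append(row)      # same key, different content
    return new_rows, changed_rows


# Fingerprints of what was loaded on the previous run (assumed to be stored).
loaded = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
target = {row["id"]: row_fingerprint(row) for row in loaded}

extract = [
    {"id": 1, "name": "Ann"},     # unchanged, should be skipped
    {"id": 2, "name": "Robert"},  # changed
    {"id": 3, "name": "Cy"},      # new
]
new_rows, changed_rows = detect_delta(extract, target)
print(new_rows, changed_rows)
```

Only the new and changed rows need to move, and no transformation happens on the way.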

Let's talk about the Transformation section for a minute, the Big "T". It's a bottleneck, in fact - it's THE bottleneck in most VLDW / VLDB and very large data integration systems. Over the years it has been more efficient to transform the data in-stream, because the RDBMS engines lacked the scalability, and sometimes the functionality to handle all the complex transformations that are necessary.

Today, all that has changed. RDBMS engines now contain highly complex optimizers, incredible business transformations on the SQL level including (but not limited to) object level transformation, in-database data mining engines, in-database data quality/cleansing/profiling plug-ins, statistical algorithms, hierarchical and drill-down functionality, and on and on...

Along with the paradigm shift toward bringing all the data to a single statement-of-fact (across the enterprise), the forces of convergence and consolidation are now saying: it's more efficient to perform any type of "transformation" within the bounds of the RDBMS engine itself. After all, the engines have begun growing up and offering multi-terabyte solutions; some hundred+ terabyte solutions have been around for a long long time.

If we shift the "T" bottleneck from the ETL into ELT, we have a very strong case for scalability - the resulting engine leveraging the best-of-breed, latest RDBMS capabilities, and taking advantage of every ounce of scalability and parallelism (and load-balancing) that the RDBMS can muster up.
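A small sketch of the ELT idea, using Python's built-in `sqlite3` module as a stand-in for a real warehouse engine. The table names and sample rows are invented for illustration. The point is only that the raw data is loaded untouched into a staging table, and the cleansing and aggregation then run as a single set-based SQL statement inside the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales (region TEXT, amount TEXT)")
conn.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")

# E and L: land the raw rows exactly as they arrive, messy values and all.
raw_rows = [(" east ", "10.50"), ("EAST", "4.50"), ("west", "7.00")]
conn.executemany("INSERT INTO stg_sales VALUES (?, ?)", raw_rows)

# T: the engine cleans, casts, and aggregates in one statement.
conn.execute(
    """
    INSERT INTO sales_by_region (region, total)
    SELECT UPPER(TRIM(region)), SUM(CAST(amount AS REAL))
    FROM stg_sales
    GROUP BY UPPER(TRIM(region))
    """
)
conn.commit()

totals = dict(conn.execute("SELECT region, total FROM sales_by_region"))
print(totals)
```

On an engine with a parallel optimizer, the same statement is free to use whatever parallelism and load-balancing the database offers, which is the scalability argument made above.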

So, ETL is "dead". There, I said it. ETL vendors MUST re-tool towards EL with a focus on "T", or better yet - why not make the transition EASY, and make the tool "ETL-T"? Give the designers the option to convert to "EL-T" where it makes sense. After all, we have sunk costs and development time in intensive ETL routines; let them stand for a while and earn back their keep.

Now what? What about ELT, EAI, and EII?

Ok, EAI is a 10+ year old paradigm that focuses on integrating applications. There are vendors like Tibco that "run Wall Street". EAI is going strong as long as there are applications to integrate, but does EAI overlap into the world of EL? Let's first define EAI: Every time a change happens in an application "plugged in" to the EAI tool, it pushes the change to the message bus, and looks for business rules and other "listeners" that need to be notified of the change.
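The push model in that definition can be sketched in a few lines of Python. This is a toy in-process bus, not how any EAI product is built; the `MessageBus` class, the topic names, and the `unheard` counter are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable

Listener = Callable[[dict], None]


class MessageBus:
    """Toy in-process bus; real EAI products are far more involved."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Listener]] = defaultdict(list)
        self.unheard = 0  # changes published with nobody listening

    def subscribe(self, topic: str, listener: Listener) -> None:
        self._listeners[topic].append(listener)

    def publish(self, topic: str, change: dict) -> int:
        """Push a change to every listener on the topic; return how many heard it."""
        listeners = self._listeners.get(topic, [])
        if not listeners:
            self.unheard += 1
        for listener in listeners:
            listener(change)
        return len(listeners)


bus = MessageBus()
billing_inbox: list[dict] = []
bus.subscribe("customer.changed", billing_inbox.append)

# The application pushes every change, whether or not anyone wants it.
heard = bus.publish("customer.changed", {"id": 42, "field": "address"})
ignored = bus.publish("inventory.changed", {"sku": "A-1", "qty": 3})
print(heard, ignored, bus.unheard)
```

The second publish still costs a trip across the bus even though no listener exists for it.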

These business rules can consist of manual intervention. Is this really necessary? Or is EAI just another "band-aid" for overcoming source system capture problems and integration problems that exist in (for instance) mainframe interchange protocols? I would argue that EAI is more than that, because it focuses on the business processes of the data - it goes above and beyond simple integration and begins to look at HOW the data becomes information, and where/when/why it should be utilized.

I will say this though: EAI as a paradigm is also dead. What? How can I possibly say this? This is blasphemous. In my opinion, EAI is a technology whose time has come - who wants to "push" all this traffic onto centralized buses, especially if there's no one listening? What I mean is, simply pushing data out onto a bus or into a queuing system just because we have a change in the application doesn't mean it's vibrant and desired data that needs to be absorbed down-stream. Off-topic: how many times during the day do you hear noises that you "tune out"? What if that noise were never made in the first place?

Besides that, EAI focuses only on ONE aspect of the business: the APPLICATION making the data change. There are many more places that data changes within a corporate environment, and some of them are not application based (take unstructured data for instance)...

What if I don't have applications to integrate? What if my picture is bigger than that? Say, web-services or SOA? Ok, EAI vendors must also adapt to meet the needs of SOA - so they have a paradigm shift to undertake if they wish to survive. Even though the paradigm has outlived its usefulness, as long as there are new applications to "install locally" within a company, there will exist a need for EAI.

There is a shift afoot: "Applications are being out-sourced" say CIO magazine, DMReview, and a couple of other sources. Software and app providers are taking up the SOA provisions, and EAI, like its predecessor EDI (electronic data interchange), will be lost in the fray.

However, if the EAI vendors take heed and re-tool, they have a TREMENDOUS business value proposition already built into their "routing and business flow management" side of the house, so why lose all that investment? They could (if they wanted) take the SOA management of components and integration by storm; there is one such vendor I'm thinking of right now who could do this in a flash...

That leaves EII: EII picks up where ETL and EAI left off. It's a pull-on-demand solution, which I evaluated 3 years ago (privately). I saw the EII paradigm as a niche player, and still do - it has a VERY limited life-span unless it too encompasses some additional technology and re-tools. EII is wonderful for getting data at its source, on-demand. It could very well fill some of the needs of an SOA if desired (and in some cases does very nicely). EII, too, has some nifty capabilities to handle business metadata in its form of meta-modeling. I've never seen such gracefulness in dealing with multiple modeling formats. Once I got over the horrendous learning curve (of one particular tool), it began to make sense.

Some of EII's problems are: it can't handle massive volumes of data, and it performs transformations on the fly (row by row, column by column). The tools that implement EII usually rely on a middle-tier meta-model; some of them have the capability of defining business models and allow business users to actually CHANGE the model without IT intervention - nifty trick. However, the transformation is, again, squarely in the way of the scalability of these engines, as is write-back capability.

Ouch, write-back? If I set up an EII query to source from a web-service, a stock ticker, and a data warehouse, how on earth can write-back be enforced, let alone a two-phase commit? The rules to determine write-back must be extremely complex, and again - volume and complexity are set directly against each other, and both are inversely proportional to performance over a given period of time.

So we're back to square one (whatever that was). I'm a bit disillusioned by the gap between the vendor hype and what is truly delivered, and I haven't even begun to discuss XML, web-services and their transformation ability. I will say this though: There is a LOT of good technology buried in each of these solutions which has been developed; now if each of these vendors could focus on solving some of these problems:

1. Move the transformation logic into the RDBMS engines (they can do it, they really can!) Maybe add some "RDBMS tuning wizards" to the tool set, metrics gathering and collection would be nice...
2. Move the EL logic into loaders and connectors with high-throughput and CLEAR network channels (to move big-data, fast, and in parallel). Maybe even offering COMPRESSED network traffic as a free add-on? Adding CDC on the SOURCE as a free add-on?
3. Leverage RDBMS BEST features and plug-ins, like built-in data mining, built-in-data quality.
4. Focus their tool set on ease of use (from a business user perspective, developing business process flows).
5. Focus their tool set on "staging data" in an RDBMS of choice, so that write-back becomes a reality (with the caveat, that not all sources can be written back to, only our "copy" or snapshot of a stock ticker feed can actually be changed).
6. Focus their tool set on metadata, BUSINESS METADATA, and how it works with the business process flows, the management, maintenance and reporting of that metadata.
7. Focus their tool set on managing, setting up, and maintaining web-services and the security around them.

I think there would be a winning paradigm, maybe it would be called:
E-L-A-I-I-T-I-BMM (sounds like a foreign language).
Extract
Load
Application
Information
Transformation (in-RDBMS)
Integrate
Business Metadata Manager
(now breathe deeply)

You never know, it might just show up on EggHead's shelves! (just kidding, horrible acronym, but it describes the concepts). Of course if Disk Vendors have their way, they will take over the EL portion soon, and pair it with the compliance packages.

These are just ramblings on the state of these technology areas. I beg your pardon, this is not meant to be an attack, just a very opinionated blog as to how I see this industry evolving. Invest in BMM today! Comments?

Posted by Dan Linstedt at 4:51 PM | Comments (10)


SOA and Business Rule Changes

Welcome to part two of this entry. Here we will discuss the impact of business rule changes as they pertain to an SOA, compared to the impact of changes on a Data Warehouse. We will also begin a discussion (that will continue for a while) on the impact of these changes to the metadata underneath, and the other systems that SOA might use, such as EII, and EAI.

Again, this blog is open to comments and corrections - I'm always willing to learn new things, and SOA is a new adventure for me too. I'd be honored if an SOA heavyweight would weigh in and help clarify things.

Business rules change every day, and as I've suggested in a previous blog or white paper, the notion that I follow is that Business Rules constitute "today's version of the truth." But that again is for yabe (yet another blog entry).

In an SOA, as commented in the previous blog entry - the SOA can be like a restaurant menu - describing choices, prices, and contents. Enough for a customer to make an informed decision - of course all menus follow a similar paradigm: appetizers, soups/salads or lighter fare, main dishes (chicken, beef, pastas), desserts and drinks. Sometimes they vary slightly, but if you've read one menu - you can read and understand the rest.

That says a lot about common and acceptable metadata in the restaurant world and how they compete, AND do business. They're not afraid to publish pricing, but you have to enter their establishment to see these things (though nowadays their menus are sometimes listed on the web). On the other hand, what can they change from a business rules perspective? They can change: the layout of the restaurant, the nature of service, the number of tables and wait-staff, the number of cooks, quality of ingredients, pricing of products, product specials and so on.

Let's take a look at it from a business intelligence standpoint. We can change the way we report the data, the type of data that's on the report, how the data is aggregated and loaded, and of course - the kicker - what it means to us (the interpretation). In a restaurant, a steak might taste one way to you, and another way to someone else - but there's no right or wrong "taste", it just is. Furthermore, how do you describe a "taste" to someone? It's an experience, nothing more.

Back to business intelligence. Interpreting what is "right and wrong" is up to the executive staff - they need to agree what the real business rules are, and how the data will be interpreted. But as far as businesses go, and humans go, we all interpret requirements differently - which leads to different implementations of the same business "rule". When these business rules change, the descriptions in the SOA need to change (metadata changes), furthermore, it may be that the operations behind the SOA that retrieve the data, change. It may even be that we change WHERE we source the data from.

Today we might get the data from an EII feed, off a source system. Tomorrow the request is for: get the driver of the vehicle, but also get all their history. Or maybe the request changes to: summarize their history, and tell me if the driver's current actions are in accordance with their previous actions (is there a pattern of activity here).

Metadata is the great equalizer. In this case it allows a central point of tracking for impact analysis, discovery, and understanding. If implemented properly, it can support IT, the business and the end-customers using the SOA all at the same time. IT can use it to determine which processes are changing, pulling, and referencing the data element that is to change or be added. Businesses can use it as a metrics measuring point, and an impact analysis assistant - along with the same use as the customer, to understand and define what these elements and combination of elements mean to the business.
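One way to picture metadata as a central point for impact analysis is a small registry that records what each data element means and which processes touch it. The Python sketch below is a hypothetical illustration; the `MetadataRegistry` class and the element and process names are invented, and a real metadata repository would hold far more than this.

```python
from collections import defaultdict


class MetadataRegistry:
    """Hypothetical store of business definitions and element usage."""

    def __init__(self) -> None:
        self._definitions: dict[str, str] = {}
        self._used_by: dict[str, set[str]] = defaultdict(set)

    def define(self, element: str, meaning: str) -> None:
        """Record what the element means to the business."""
        self._definitions[element] = meaning

    def register_usage(self, process: str, element: str) -> None:
        """Record that a process reads or writes the element."""
        self._used_by[element].add(process)

    def impact_of_change(self, element: str) -> list[str]:
        """List every process that would be touched if the element changed."""
        return sorted(self._used_by.get(element, set()))

    def meaning_of(self, element: str) -> str:
        return self._definitions.get(element, "(no business definition recorded)")


registry = MetadataRegistry()
registry.define("driver_history", "All recorded events for a driver, oldest first")
registry.register_usage("nightly_dw_load", "driver_history")
registry.register_usage("soa_driver_lookup", "driver_history")
registry.register_usage("soa_driver_lookup", "driver_name")

affected = registry.impact_of_change("driver_history")
print(affected, registry.meaning_of("driver_history"))
```

IT would use the usage list, while the business and the end-customer would mostly read the definitions, all from the same place.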

As far as implementing a change into an SOA goes, it's the same process as implementing a change into the data warehouse, or the EAI system, or even the source system. The only difference is that the service architecture usually is much larger than just that of the data warehouse alone, and thus requires more resources to build and maintain. This is one reason why we are seeing SOA providers spring up on the web (so that SOAs can be used without the overhead of the technology underneath).

From a data warehousing viewpoint, the SOA utilizes the DW as just another data source to meet its needs, thereby increasing the value to the enterprise by re-using what has already been built and accepted.

Posted by Dan Linstedt at 11:02 AM | Comments (0)


SOA and DW beyond the big picture

I am delighted to see a comment requesting further clarification on SOA, DW and reporting beyond the big picture. I will do my best to dig into additional explanation as to why they fit together, and what business purposes they serve. As always, I'm learning about the SOA world as well, so any feedback or comments (or even corrections) are more than welcome. This blog is basically an opinionated viewpoint on where this technology is heading.

I think there are two blogs to answering this question, so I'll split it into two parts. The first part goes through basic explanations of SOA and operational analysis. The second part will dive into the impact of changes to an SOA, and business metadata.

First, let's clear the air: SOA is not a product, nor is it a technology - it is an architecture. Plain and simple - a set of rules and standards that define service oriented interactions between mechanical solutions. The architecture as it's defined goes deeper than that, and assists with how the tools, technologies, and other items below the surface should react when faced with a request. But basically SOA is an architecture.

If you're familiar with Web-Services and the old CORBA model, you can easily relate it to the SOA definitions, which as a part of the architecture define HOW these "services" play with the outside world. In other words, exposing the services to the internet and B2B, B2C etc... communication.

An example (albeit not so great) might be as follows. Suppose you build a (large) retail store-front, it's important to 1) keep customers coming, 2) provide for large numbers of customers to be serviced quickly, 3) allow them to browse the products without feeling pushed, but at the same time 4) provide quick answers to their questions should they have any.

This is all about service - of course, the less service one receives the more they must know about the products before they walk in the store, or they are most likely to be unsatisfied customers. Along with that, the less service they receive, the less overhead for operating the company, therefore - reduced prices.

Now, back to SOA. An SOA is made operational by defining the electronic services behind the "clerk's desk". It's like a menu at a restaurant, or a list of products on a web-site that are categorized and described with specifications. It's also a little more than that: it includes HOW the access from the outside world will come in and request/retrieve/write data to and from the systems behind the systems. But that's where it stops. It does NOT define the systems, methods, or procedures that are used behind the request to meet the need. Just like a menu doesn't tell you how the food is cooked.

With all of that, what roles do data warehousing, metadata, and ETL/ELT/EII/EAI play in an SOA architecture?

Data Warehousing is supposed to provide additional value to the enterprise by containing 60% to 80% of the information that the SOA is requesting, which includes the history of the information and a single consistent view of the enterprise. If an existing data warehouse can service a "live request", then there is no need to "rebuild" all the integration, business rules, and cleansing that has already been built. In other words, don't re-invent the wheel.

If on the other hand, the DW doesn't have the data, then the service request must look to additional technologies to get the data, that's when it turns to EII and EAI to find an answer (in most cases). However, when we reach the technology level - we have to rebuild the business logic in EII, or EAI platforms in order to provide the data that they requested.
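To make the "warehouse first, then EII/EAI" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name make_customer_service, the two lookup sources, and the sample records are hypothetical, and a real SOA would sit behind a proper service interface rather than a pair of dictionaries.

```python
from typing import Callable, Optional

# A lookup takes a business key and returns a record, or None if it has no answer.
Lookup = Callable[[str], Optional[dict]]

def make_customer_service(warehouse: Lookup, fallback: Lookup) -> Lookup:
    """Build a service that answers from the warehouse when it can."""
    def get_customer(customer_key: str) -> Optional[dict]:
        record = warehouse(customer_key)
        if record is not None:
            return record              # reuse what the DW already integrated
        return fallback(customer_key)  # otherwise go to the EII/EAI-style source
    return get_customer

# Hypothetical data sources standing in for the DW and an operational system.
warehouse_data = {"XYZPDA1": {"name": "Example Co", "source": "DW"}}
operational_data = {"NEW0001": {"name": "New Co", "source": "EII"}}

service = make_customer_service(warehouse_data.get, operational_data.get)
print(service("XYZPDA1"))
print(service("NEW0001"))
```

The caller only sees the service; whether the answer came from the warehouse or from somewhere else stays behind the "menu".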

Let's talk a little about metadata, and security. In regards to metadata, the SOA registered services must be able to describe in business terms what they offer, again like a menu at a restaurant. Would you order food at a fancy restaurant just by its title? Probably not, not unless it tells you what's in it, and what comes with it. We need descriptions of the entire offering in the service in order to understand if it's the service we want to use. It relies heavily on metadata.

But it goes deeper than that! In medical communities, and soon in other verticals it will rely more and more on commonly accepted definitions, and commonly accepted metadata. Why? Because these services are interchanging data with many other companies around the world (depending on corporate "service and data exchange" agreements). Metadata is a HOT world inside SOA, and must be the focus of any good SOA implementation, otherwise end-users/customers won't understand how or why to use the service itself.

In regards to security, because these services are exposed (like e-filing your Taxes on the IRS web-site), they must be secured. In other words, users/agents making requests should be certified, logged in, registered - however you term it; and the actions must be monitored and logged - in other words, at a restaurant it would be similar to recording your favorite table, getting to know you, and what you order when you go to the restaurant. This way, if the restaurant has a sale on your favorite item, they might entice you back in for more business.

All I've done is wrap security in a sugar-like blanket. The real notions are to catch fraudsters, and protect confidential data sets, and to provide you or your software agents with only the services they paid for.

So as we see, the SOA is an architecture that defines WHAT can and will be put in place, how it will be seen (viewed by the outside world), and what will interface the requests to other businesses. Now, on to part two of this blog.

  Posted by Dan Linstedt at 10:10 AM | | Comments (1)


April 22, 2005

Averages and Outliers - Where's the REAL business BI?

My good friend Claudia recently blogged on the misuse of averages. (click here) I agree with her statements, particularly in light of what the "average" ignores in terms of the outliers in information. Too many times, averages are used to get a green or red colored "single point of light" (graph) on our executive dashboards.

Statements like: Our company is performing fine, we're in the Green on our "average graph" can be extremely misleading. Warning: Opinionated statement in 3-2-1... Sometimes I wonder if the BI vendors in the industry are in business to sell software, or to actually make business better (there are those vendors who have real solutions, and those that just sell software with pretty dials).

Let's take a look at some of the facts about averages and averaging.

1. Averages ignore outliers.
2. Very large data sets tend to produce clusters of outliers which averages smooth out, and remove.
3. If a VLDS (very large data set) is averaged, the really important details can be lost.
4. In a VLDS there are MORE needles in the haystack, not less (more gems in the rough).
5. Some of these needles are like gold, when you can find them. Averages hide these facts.
6. Producing more averages over smaller clusters of averages (where the clusters are clustered data points according to market basket or neural-net mining) will produce a much better graph.
7. When was the last time you heard an executive say: "Yea, I just made a 5 Billion investment based on the average performance of the company over its customer base..." Usually they make an investment for very specific reasons: the gold-needle that will bring ROI fast.
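Points 1 and 6 above can be seen in a few lines of Python. This is only a sketch with invented numbers: the order values and the cluster labels ("steady", "at_risk", "gold") are hypothetical, and the labels stand in for whatever a real mining step (market basket, neural net) would assign.

```python
from statistics import mean

# Hypothetical order values, each tagged with a cluster label.
orders = [
    ("steady", 100), ("steady", 102), ("steady", 98), ("steady", 100),
    ("at_risk", 5), ("at_risk", 15),
    ("gold", 190), ("gold", 190),
]

# The single "dashboard" number over the whole data set.
overall_average = mean(value for _, value in orders)

# The same data, averaged per cluster instead.
by_cluster = {}
for cluster, value in orders:
    by_cluster.setdefault(cluster, []).append(value)
cluster_averages = {cluster: mean(values) for cluster, values in by_cluster.items()}

print(overall_average)
print(cluster_averages)
```

The overall average lands right on the "steady" customers and says nothing about the two small clusters on either side of it; the per-cluster averages show them immediately.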

Here's an interesting thought... Many people today are of the opinion that too much data is a bad thing. Well, here's the news: good, bad, or indifferent, more data allows us to learn more about our business specifics than not enough data.

Averages operate poorly over very large pools of data: they tell "less and less" about the data set underneath. Data mining, on the other hand, operates very poorly in small data sets. Mining data, clustering data, and understanding the data is better done with too much data than with too little; the answers and assumptions (along with confidence levels) improve when there's more data. Does a data mining engine reach its conclusions with "averages"? Not usually; it must go through ALL the details it's given (unless it's given a sample set) to find the correct answer.

Averages hide the gold-needle in the haystack of business data. I would suggest that a 3 dimensional landscape graph that pinpoints and monitors clusters of data (resulting from market-basket analysis, and neural networks) would be of more use to the BI world than the standard tank-full chart.

  Posted by Dan Linstedt at 10:37 AM | | Comments (0)


April 20, 2005

Information Valuation - Is data an Asset?

An interesting thought has been rolling around in my head for several years regarding information, and its value (or lack thereof). I've spoken with Bill on this subject in the past, and numerous others - including a representative for the Government of Sweden.

There is (and always will be) a lot of talk about TCO of different BI systems, but these notions seem to ignore the data itself, or the measuring of value of the data sets. This entry in my blog is filled with unanswered questions, and if you are an information asset evaluator, insurance adjuster, or in the financial industry, I'd love to hear your comments on the subject.

It all started one day when I was teaching "How to manage your ever-growing data sets" at a Wilshire conference. I was discussing the nature of data in a VLDW/VLDB context, and talking about a specific case that Herb Edelstein had spoken on - the value of Data Mining to impute (or compute/fill in) missing data elements as part of a data quality solution.

I discussed this for a while in the context of a magazine marketing company (which is also Mr. Edelstein's example). It goes a little like this:

Suppose you're a magazine subscription company that makes money off advertisements sold in specific magazines, and you want to offer complimentary subscriptions (one year's worth) to a few of your best customers. In your data set you have a table which contains subscribers; to decide which subscriptions to offer, the algorithms devised utilize the gender column as a heavy weighting factor. Among the millions of subscribers, there are two in particular that we'll focus on: Sam Smith #1 and Sam Smith #2, for whom we have the addresses and their magazine subscriptions, but not their gender.

To make matters worse, we don't have any other subscribers at their address. All we know is the following:
Sam Smith #1 - subscribes to Hot-Rod magazine
Sam Smith #2 - subscribes to CIO Magazine

This isn't enough information to offer an appropriate free subscription, and currently the VALUE of the gender column is 98% of our weighting factor. Now under further mining, or statistics we find that Sam Smith #1 also subscribes to Young Bride, and Sam Smith #2 also subscribes to FHM. The certainty of "setting the gender column" is better, at least for the purposes of adding a subscription that each Smith would like.

Suppose Sam Smith #2 is one of our largest revenue producers (Sam Smith #2 re-sells). What's the tangible value of the gender column when it's filled in? Is it the value of the revenue that the customer is bringing to the table?

What if we make Sam Smith #2 angry by sending them an unwanted subscription (because we impute the wrong value for the gender) and they leave us for our competitor?

The real questions: What's the value of the gender column, and if we can establish a value, is it a weighted value based on different factors for each customer? Can we compute an average, and a max and min value for the entire set of customers? If we can compute a value for the data set, then can we insure it as an asset on the accounting books? Are there companies that insure data as an asset? Which of course raises the question: which data is an asset, and which data is now a liability and what's its value?
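I don't have an answer, but here is one way the arithmetic could be sketched in Python, purely as a thought experiment. The valuation rule is my assumption for illustration (the column is worth the share of a customer's revenue that depends on getting it right), the revenue figures are invented, and only the 98% weighting comes from the example above. It is not an accounting method.

```python
from statistics import mean

# From the example above: gender drives 98% of the offer decision.
GENDER_WEIGHT_PERCENT = 98

# Hypothetical annual revenue per customer; these figures are invented.
annual_revenue = {
    "Sam Smith #1": 250,
    "Sam Smith #2": 12000,      # the re-seller
    "Another subscriber": 750,
}

# Assumed rule: value of the column for a customer = revenue * weighting.
column_value = {
    customer: revenue * GENDER_WEIGHT_PERCENT / 100
    for customer, revenue in annual_revenue.items()
}

average_value = mean(column_value.values())
highest_value = max(column_value.values())
lowest_value = min(column_value.values())

print(column_value)
print(average_value, highest_value, lowest_value)
```

Even in this toy version the average sits far from both the minimum and the maximum, which suggests a per-customer weighted value may say more than a single number for the whole column.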

Valuation of data or information is a very tricky process, however it appears as if some companies and some governments must accomplish this in the generic sense. It would be an interesting study to conduct to find out who has already worked on this concept. I invite all readers to share their opinions.

  Posted by Dan Linstedt at 8:21 AM | | Comments (5)


Business Intelligence and the IRS

In the BI Industry, someone is constantly talking about accountability, or better security, and in this world of VLDW - we must take caution to secure our data. In case you hadn't heard, the IRS "may have had" a security breach leading to identity theft. MSNBC - IRS & Identity Theft

When we build VLDW's or VLDB's we MUST be concerned with security. The greater the number of needles in the haystack, and the bigger the haystack gets - the more likely we are to have others who want to find the needles, and most of the time, the others are not authorized. If an SOA is simply "thrown on top" of an existing system without proper planning and architecture, it can lead to disastrous results.

One has to wonder in this day and age, how systems like this (particularly with e-file), get put into place. Where's the business intelligence in e-filing if you're going to get your identity stolen from a public sector industry?

The article continues... "The agency has fixed 32 of the 53 problems that turned up in a 2002 review, the GAO said. But the GAO found 39 new security problems on top of the 21 that remain unfixed."

If we built a BI system, on top of a VLDW that had 32 of the 53 security problems fixed (that were found 3 years ago), our business users would be furious, especially if it compromised the security of our systems.

Finally the article says: "An IRS spokesman declined to comment further. Michigan Rep. John Conyers, a Democrat, said the Judiciary Committee will consider whether additional measures are needed to strengthen computer security."

In the private sector, we'd never get by on selling a project like this - especially if the IT or consulting staff were to "refuse to comment further".

What does this mean to the VLDW systems and the SOA's that we are currently building?
In the private sector, it means, we must take extra caution to:
1) install proper SOA monitoring devices,
2) pay the extra dollar, go the extra mile and try to break the security of the system,
3) further lock down access to the large and centralized data stores (the bigger the data sets, the more likely we can have security leaks).
4) Identify accountable resources in business who can and will take ownership when problems are found.

Just because we have more powerful tools in the Business Intelligence world today, and just because we have more data than ever collected into a single instance - doesn't mean we can be more lax with our restrictions. VLDW/VLDB, and SOA are all large undertakings for any organization (government or not), and require new measures to ensure their success. Again, convergence of IT with the business is paramount to making the new world of Business Intelligence work.

In the case of the IRS, I wonder when they will actually get around to fixing the "known bugs" and stopping the security leaks. Loss of business accountability is tremendously detrimental to any BI system.

  Posted by Dan Linstedt at 7:56 AM | | Comments (0)


April 13, 2005

It's Business Modeling, not Data Modeling.

Whether it's an IT shop or a business user - it's critical to change the paradigm of thinking to Business Modeling. In the physical notion of storing data, IT is data modeling - however it should all be based on business modeling efforts.

There has been a divergence between the way business gets done, and the models that house the data for the business. This is causing severe friction, particularly when businesses go to make a change.

IT needs to change their thinking about modeling; in the March 15, 2005 issue of CIO magazine, Rick Pastore commented: "Alignment is Dead: Long Live Convergence".

As stated by Rick:

To initiate convergence, there's no one button to push; there are a whole slew of things to pursue - such as populating steering committees with business and IT leaders who work from one strategic blueprint, colocating IT managers in business units, abolishing the notion of IT projects in favor of business-funded Business Projects enabled by technology.

What is meant by "Business Modeling"? Isn't this just another form of process modeling or re-engineering? No. What is meant by business modeling is the idea that business processes are the drivers for collecting and using information (data), therefore the systems that are built to house that data should be modeled after the business.

For years, IT has spent time and money training data modelers how to build "data models" that meet IT needs and desires. Then, they go through rounds of requirements gathering and interpretation of those requirements to arrive at some arcane data model. Note: Not all data models are arcane, there are some great data modelers out there... But that doesn't take away from the fact that the models themselves diverge from the business interests.

A business comes to request a change to the model, IT says: it will cost X & Y, and take resources Q & R & S a couple months to get the change in. The business decides, it costs too much, or the impact is too large, or it will take too much time to make the change - now the data model and IT are constraining business, they are no longer enabling business and the data model is now out of alignment with the business.

Unfortunately these techniques for data modeling are old (for purposes of enterprise integration). In a recent blog it was pointed out that 3rd Normal Form was created in the early 1970's, and Star Schema was created in the 1980's. Technology has moved on, changed, gotten faster and more agile. Why then hasn't IT changed their notions of how data models are built?

When was the last time the business had a model walk-through, and understood where and why the data is "modeled" the way it is? There has been a divergence of methods between how the business models (which has evolved over time), and how IT models the data for the business. Constant "alignment" takes place to keep the models up to date and matching the business changes. By alignment, IT continues to "review, update, and impact" existing data models.

Consider modeling in this light: If you have sales totals across the country to "watch", would you rather receive the data in columnar or tabular format (modeled as text), or would you rather start with a high level graph that showed further summaries of information (modeled as a graphic)? Most decision makers choose the graph, hence the dashboards and all the fancy analytics that re-model the data on display.

Now, what is the impact if the business person wants to see another country on the graph? Pretty minimal to the data model and to IT if, in fact, IT has the systems set up to capture world-wide information. On the other hand, what if the business acquires a new company, and wants to see their information included on the graph? The impact can be tremendous, cost loads of money, and take incredible amounts of time to implement. This is NOT the way it should be done.

Two years ago I had the opportunity of meeting some decision makers for a large New York company, and discussing how they were handling their M&A activities. They had told me that their requirement was to merge data from 3 acquired companies in 3 months, they also let me know they were using modeling activities similar to the Data Vault, and were extremely pleased with the flexibility; they had just completed the 90 day M&A activities for their information systems.

Think about it, if SAP or PeopleSoft has a change to make to their data model, then they also usually have a huge impact to their application and all of their customers. It's a pain for businesses to upgrade to the new version to receive the new business functionality.

From a data modeling perspective, we in IT are often found "re-aligning our data models to match the new business models". We should begin converging our data modeling practices with the notions of business modeling.

What should be focused on for Business Modeling?
Business Modeling is done by the business, for the business and means something to the business. It may be implemented by IT going forward as an enabling technology, but it shouldn't constrain the business's ability to execute decisions. IT should be a part of the business, and know that it's their job to represent the business properly.

So what does this really mean?
IT resources should begin working like business users, rotating in and out of business circles and understanding the business problems. Along with this comes the information modeling component, metadata embedding, and changes to the way the systems work which will permanently and positively impact the business going forward.

Why do we build applications? To meet business needs. How does the business track, locate, and identify data? By using keys to information - and sometimes applications to gain access to that information. Why then do the data models separate the business from the information that it owns by introducing layers and layers of complexity?

What should Business Modeling be?
1. Businesses identify their information by key, if they can't key it uniquely then they cannot reference it - or they create new records containing duplicate information.

2. Businesses then describe that key with contextual elements, otherwise known as "making sense of what the key means" - in other words, the characteristics of the key. For instance, if you have a key to your house and a key to your "condo" - what tells you which key is which? Could be the appearance after a while, the shape, the color - maybe you put red nail polish on one key at first to discern each key. That's a descriptive attribute; the color can change over time. But the key number from the manufacturer is unique.

3. Then, the business proceeds to make associations between these key elements, mostly known as transactions or intersections, or relationships.

Let's take a look at a business example. The customer; the customer has a customer number: "XYZPDA1", if we lose this key, or the key changes between contracts and finance, then can we accurately identify this customer across our enterprise? No. Like it or not, businesses that "change" operational keys to business information are making a ghastly mistake, hence all the data quality efforts after the fact to clean this up - back to the example...

The customer has a name, address, and income bracket. These are descriptive attributes which can change over time - these are business elements that are peripheral information and describe context about the customer key "XYZPDA1". The key itself is meaningless outside the "system" which tracks millions of customers (unless the keys have been encoded to match a business need, which does happen).

Now, we have an interaction between XYZPDA1 and EMP001. EMP001 closes a sale to this customer, and wants to record it: they sold $140,000.00 worth of data model changes (it was a good day). There's a service/product key hidden here too: SVC4000.

Three keys with different semantic meanings and an interaction between these keys on 1/1/2005 for $140,000.00, so we have a defined grain for the transaction.

What does the business model look like?
CUSTOMER KEYS -> CUSTOMER-EMPLOYEE-SERVICE-SALE <- EMPLOYEE KEYS
                               ^
                          SERVICE KEYS

Each set of keys, including the cross-grain sale item, has descriptive elements. This is an appropriate data model that's simple, easy to understand, and stores the data according to the business model.

A breakdown of the thought process:
1. Lists of keys (like Customer Keys and Employee Keys) are called Hubs. They are the central hub of information - one hub for each key set, since they are separated by business definition. We are capturing business metadata; some of it is embedded in the architecture.

2. Linkages between Keys (like employee sells to customer) are stored in Link tables. The number and type of Hub keys embedded in the link define the granularity of the transaction, in other words, the link is defined BY the business definition of the transaction.

3. Descriptive data fits into Satellites. Descriptions for customer that are solely dependent on the customer key are built in one-level deep Satellites, dependent on the Hub keys and recorded by date/time of data arrival or information generation. There can be satellites that describe the link tables too; they describe the transaction or the sale (like the $140,000.00 amount, and when the sale took place).
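The three pieces above can be sketched as tables, here using Python's built-in sqlite3 module so it runs anywhere. This is a simplified illustration and not the full Data Vault specification: the table and column names (hub_customer, link_sale, sat_sale and so on) are my assumptions for the example, and the customer name is invented. The keys, date, and amount come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs: one list of business keys per business concept.
CREATE TABLE hub_customer (customer_key TEXT PRIMARY KEY);
CREATE TABLE hub_employee (employee_key TEXT PRIMARY KEY);
CREATE TABLE hub_service  (service_key  TEXT PRIMARY KEY);

-- Satellite on a hub: descriptive data, one level deep, kept by load date.
CREATE TABLE sat_customer (
    customer_key   TEXT REFERENCES hub_customer(customer_key),
    load_date      TEXT,
    name           TEXT,
    income_bracket TEXT,
    PRIMARY KEY (customer_key, load_date)
);

-- Link: the hub keys it carries define the grain of the transaction.
CREATE TABLE link_sale (
    sale_id      INTEGER PRIMARY KEY,
    customer_key TEXT REFERENCES hub_customer(customer_key),
    employee_key TEXT REFERENCES hub_employee(employee_key),
    service_key  TEXT REFERENCES hub_service(service_key)
);

-- Satellite on the link: describes the sale itself.
CREATE TABLE sat_sale (
    sale_id   INTEGER REFERENCES link_sale(sale_id),
    sale_date TEXT,
    amount    REAL,
    PRIMARY KEY (sale_id, sale_date)
);
""")

conn.execute("INSERT INTO hub_customer VALUES ('XYZPDA1')")
conn.execute("INSERT INTO hub_employee VALUES ('EMP001')")
conn.execute("INSERT INTO hub_service VALUES ('SVC4000')")
conn.execute("INSERT INTO sat_customer VALUES ('XYZPDA1', '2005-01-01', 'Example Co', 'high')")
conn.execute("INSERT INTO link_sale VALUES (1, 'XYZPDA1', 'EMP001', 'SVC4000')")
conn.execute("INSERT INTO sat_sale VALUES (1, '2005-01-01', 140000.00)")

sale = conn.execute("""
    SELECT l.customer_key, l.employee_key, l.service_key, s.sale_date, s.amount
    FROM link_sale AS l
    JOIN sat_sale AS s ON s.sale_id = l.sale_id
""").fetchone()
print(sale)
```

Adding a new descriptive attribute later means adding a satellite, not reshaping the hubs or the link.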

Below are several of the backing notions for this model:
1. All data "about" a particular business key, is always and only one level deep from the business key.
2. Data can be split in the Satellites by type of data and rate of change (delta processing), resulting in lower storage costs, and more flexible modeling techniques.
3. Business can add new pieces without destroying existing data sets.
4. Business can experiment with "what if" questions, in other words: what is the impact of this business change, and does it make business "easier or harder"?
5. Issues in the business processes are easier to spot and fix.
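Point 2 above (splitting Satellites by rate of change) is easier to see with a small sketch. The Python below is an illustration under my own assumptions: the function apply_delta, the two satellite names, and the daily feed are all hypothetical, and plain dictionaries stand in for tables.

```python
def apply_delta(satellite, key, load_date, attributes):
    """Append a row for `key` only if its attributes differ from the latest row."""
    rows = satellite.setdefault(key, [])
    if rows and rows[-1][1] == attributes:
        return False              # unchanged: store nothing
    rows.append((load_date, dict(attributes)))
    return True

# Two satellites on the same hub key, split by how often the data changes.
sat_customer_address = {}   # changes rarely
sat_customer_balance = {}   # changes often

# Hypothetical daily feed: (load date, address attributes, balance attributes).
daily_feed = [
    ("2005-04-01", {"city": "Denver"}, {"balance": 100}),
    ("2005-04-02", {"city": "Denver"}, {"balance": 140}),
    ("2005-04-03", {"city": "Denver"}, {"balance": 90}),
]
for load_date, address, balance in daily_feed:
    apply_delta(sat_customer_address, "XYZPDA1", load_date, address)
    apply_delta(sat_customer_balance, "XYZPDA1", load_date, balance)

address_rows = len(sat_customer_address["XYZPDA1"])
balance_rows = len(sat_customer_balance["XYZPDA1"])
print(address_rows, balance_rows)
```

Had the address and balance lived in one wide satellite, the unchanged address would have been copied on every balance change; split apart, the slow-moving data is stored once.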

Here's an example of a business issue (real case) - I've changed the type of information around a bit to protect the company.

Example: A widget building company has a planning operation and a manufacturing operation. They build computer motherboards, all of the motherboards are custom made to order. They reward (bonus) planners for building and architecting plans that meet or exceed actuals in the manufacturing bill. That is: actual dates, cost, and labor.

In this model, we are to track parts. The planned parts and the manufactured parts that roll up on to a bill of material. We are also told to track the quantities manufactured, number of rejects, successes, and reworked items. Along with that, we are told to track the dates that the actual work starts and stops - but before the work starts, it is suggested that we track the planned dates for when the work should start and stop.

Objective from the business: 1) critical path analysis through their planning and manufacturing phase 2) understanding if their bonus plan for planned vs actuals is meeting corporate goals.

We carried the following tables: Part Key Hub, Bill Number Hub, Assembly number hub, Link between assembly number and itself (hierarchical bill), Link between parts and assemblies (describing which parts were on what assemblies), and two Assembly satellites: One housing quantities, one housing dates (planned dates and actual dates).

From this simple model, we were able to save the business millions of dollars (quickly). The business did a little statistical analysis, and found the following: out of 15 planners, 13 had re-planned parts only 3 times. 2 had re-planned parts 12 to 15 times (Planner YY and Planner ZZ).

Planner YY had 5 to 8 rejects with every manufactured actual, and was 80% on-time with the plan matching the actual.

Planner ZZ had 0 to 1 rejects with every manufactured actual, and was 98% on-time with the plan matching the actual.

What was interesting was that Planner ZZ's planned start and end dates held an increasingly accurate pattern (when viewed in a Gantt Chart). Started out 20% on-time, and grew to 98% on-time.

Long story short, Planner ZZ was discovered to be cheating the system, Planner YY was given 2 weeks to re-plan their complex part. Both actions saved the company money. Bottom line: the model lent itself to human pattern recognition, quite simply: the planned vs actual date satellite was growing at an unusual rate of change which sparked the business inquiry. It was the way the model was built that helped make the difference.
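The kind of check the business ran can be sketched in a few lines of Python. The rows below are made up to mirror the story (13 planners with 3 re-plans each, two with 12 and 15); the planner names and the "more than twice the typical count" threshold are my assumptions, not what the company actually used.

```python
from collections import Counter
from statistics import median

# Hypothetical rows from the planned-vs-actual date satellite:
# one entry per (re)plan, recorded against the planner who made it.
rows = []
for n in range(1, 14):
    rows += [f"Planner{n:02d}"] * 3
rows += ["PlannerYY"] * 12 + ["PlannerZZ"] * 15

replans = Counter(rows)                 # re-plans per planner
typical = median(replans.values())      # what "normal" looks like

# Assumed rule of thumb: flag anyone re-planning more than twice the typical rate.
flagged = sorted(p for p, count in replans.items() if count > 2 * typical)
print(typical, flagged)
```

Because every re-plan lands as a new row in the date satellite, the unusual growth shows up as a simple row count per planner.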

Simple, effective, elegant. This modeling paradigm can be (and is currently) utilized by customers who wish to eliminate operational data stores, staging areas, data warehouses, and data marts, all in favor of a single copy of the facts.

In this model, current data and historical data are available; exactly as it was entered. IT can also build virtual data marts until the hardware platform can't handle the size or performance; at which time physical tables can be built to help with the performance issues.
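A "virtual data mart" in this sense can be as simple as a view over the detail tables. Here is a minimal sketch using Python's built-in sqlite3 module; the table sale_detail, the view mart_sales_by_customer, and the second and third rows of data are hypothetical and only there to make the example run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detail rows stored once, exactly as they arrived.
CREATE TABLE sale_detail (customer_key TEXT, sale_date TEXT, amount REAL);
INSERT INTO sale_detail VALUES
    ('XYZPDA1', '2005-01-01', 140000.0),
    ('XYZPDA1', '2005-02-01',  10000.0),
    ('ABC0002', '2005-01-15',   5000.0);

-- The "virtual mart": a view, so no second copy of the facts exists.
CREATE VIEW mart_sales_by_customer AS
    SELECT customer_key, COUNT(*) AS sales, SUM(amount) AS total_amount
    FROM sale_detail
    GROUP BY customer_key;
""")

mart = conn.execute(
    "SELECT * FROM mart_sales_by_customer ORDER BY customer_key"
).fetchall()
print(mart)
```

If the view becomes too slow, it can be replaced by a physical table with the same name and columns, and the queries that read it need not change.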

How we do business should be directly reflected in the way IT builds solutions, not the other way around. IT needs to change their data models into working "Business Models"; the Data Vault helps with this paradigm shift.

Cheers,
Dan L

  Posted by Dan Linstedt at 2:40 PM | | Comments (1)


April 8, 2005

Data Modeling is vital to successful Business practices

Is your business growing? Are you capturing more data every day? Is your business Nimble enough to stay competitive? Being nimble no longer has anything to do with being small. While that may still be an advantage, even some small businesses have politics and misguided business practices. But rarely is it written that some of these things stem directly from the way we build our data/information models within our information stores.

With the advent of SOA, it's not just warehousing anymore, it's data integration, aggregation, conglomeration across the enterprise to a SINGLE data copy of information. Hopefully the models we build to house this information free us to change the business, rather than constrain us from making necessary changes.

Over the years, data modeling has evolved. The preferred term is now Information Modeling. What is data or information modeling? It is the logical representation of data or information in a format that can categorize, organize and manage different sets of information. The model also provides a mechanism for us to access, alter and add new information as the needs arise. This access point is typically referred to as SQL (Structured Query Language) and includes a basic set of commands: Insert, Update, Delete, and Select.
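For readers who haven't seen them, here are the four basic commands in action, run through Python's built-in sqlite3 module. The customer table and its contents are hypothetical; only the key XYZPDA1 is borrowed from my other examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_key TEXT PRIMARY KEY, name TEXT)")

# Insert: add new information.
conn.execute("INSERT INTO customer VALUES (?, ?)", ("XYZPDA1", "Example Co"))
conn.execute("INSERT INTO customer VALUES (?, ?)", ("ABC0002", "Other Co"))

# Update: alter existing information.
conn.execute(
    "UPDATE customer SET name = ? WHERE customer_key = ?",
    ("Example Corp", "XYZPDA1"),
)

# Delete: remove information.
conn.execute("DELETE FROM customer WHERE customer_key = ?", ("ABC0002",))

# Select: access what is there.
remaining = conn.execute("SELECT customer_key, name FROM customer").fetchall()
print(remaining)
```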

All of this aside – information modeling hasn’t changed much over the years – that is to say that the basis for how we categorize, manage, and maintain the information has remained the same. All the while, information itself has changed and grown beyond our wildest imaginations. The Data Vault is the next evolution of information modeling. It takes the basic concepts and builds on them to overcome the challenges laid before us like very large data (VLDB), real-time (RT), and independent units of information.

If our businesses are changing, and our data sets are changing (along with volume and frequency of data arrival/utilization), shouldn't we be changing our data modeling styles to keep up? Did you realize that the 3rd Normal Form was created in the early 1970's, and that Dr. Peter Chen introduced Entity Relationship Diagrams in 1976? How about Star Schema? When was that created? Sometime in the 1980's if I'm not mistaken.

Each of these modeling techniques is wonderful when used in the context for which it was designed. But when placed in today's changing requirements of rapid data access, terabytes (approaching petabytes) of storage/collection, rapid data feeds - they leave a little bit to be desired. To my knowledge, they weren't specifically architected to meet these needs.

However from a business perspective, what about business changes? The business is changing faster than ever before, and the speed at which business changes is increasing exponentially (in order to survive competition). If Wal-Mart had to do it all over again, but started today - it would be twice as hard for them to make it work. This of course given that other companies like Target would have innovated on their own, and be at the level they are at today as well.

Now, there's a strange paradigm here: 1) divergence, 2) convergence. Divergence of the PHYSICAL data modeling representations from the business, and convergence of the Historical and up-to-date (real-time) information stores. Any time a model diverges from the business goals, we have to ask the questions: is it helping or hurting us? Is it constraining us from getting to our end-goals faster?

In the data modeling and IT world this translates to the following: Sir, we have a huge enterprise warehouse, to make the change you are asking for will take X days/weeks and Y number of resources. It will also impact A, B, C, and D. The business goes away and confers... they come back and say: costs too much, or it will take too much time, therefore we must "settle for less" or make a different business decision.

To me, this means that the business decisions are CONSTRAINED by the information model that's in place, and the further away from business processes that the information model gets, the more constraints begin to show up. This may be one way that a system becomes a legacy system.

Furthermore, suppose the model was built in 3rd Normal Form (as an Active Data Warehouse - ADW), containing both history and current information.

The business question?
When was the last time that high-level business users could be walked through a 3rd Normal Form data model and understand or SEE how their business operates? Normally the modeler points and says, your data X resides here and here.

Star Schemas are a bit easier to understand; they are designed and architected specifically to meet SUBJECT ORIENTED business needs. They were not designed to meet cross-functional needs, nor were they designed to handle real-time feed requirements, nor super-huge volumes. They were designed so that the data would be aggregated (losing detail), giving less volume, faster queries, less disk space, etc., and meeting the business needs one subject at a time.

Star Schemas are wonderful tools when built correctly and utilized for dimensional analysis and OLAP drill-down. In fact, building Virtual Marts is something I've taken to lately, given that making copies of huge enterprise data warehousing sets is simply not feasible. However, while building a Star Schema as a data warehouse can be done - and can be done successfully with large volumes - it can be very complicated and can also cause constraints on the business.

The analogy is this: Suppose you had a Porsche 914, and a Big-Rig. If your objective is to move your house, would you use the 914? How about hooking a lot of 914s together, and driving them across the country, would this work? Sure, it will work, but it's not as effective as filling and driving a big-rig one time.

If your objective is to win a race, would you chop the top off the big-rig, take off the trailer, leave the engine and frame intact and then run the race against 914s and expect to win? Probably not. You certainly could ADAPT the design, but why? Use the appropriate design/architecture for the appropriate purpose.

The Star Schema is like the 914, fast, efficient, works. 3NF is like the big-rig, handles tons of data, and vast real-time requirements but has "history storing issues". The Data Vault is like the SUV of this world, it comes directly in the middle, and provides comfort, style, torque, speed, and towing capacity. Although that's where the analogy stops - the Data Vault can be extended into the Petabyte ranges of data very easily.

I've designed a new model, one that integrates the best-of-breed notions from both architectures; it's called the Data Vault. Its design concepts focus on: returning the model to the business case, making the model conformant to cross-functional business, addressing compliance, handling super-huge volumes, and real-time feeds. (see the free white papers on the Data Vault Here)

It's time that we stop constraining our business decisions through the modeling choice, and start making changes to the modeling architecture that is chosen. The modeling architecture should adapt to meet the needs of the business, and shouldn't cost a lot of money, or take a lot of time in order to be altered to meet the needs of the business.

The Data Vault is free - it's just a fancy marketing name for something called: "Common Foundational Information Data Model"

I'll be blogging more on this topic later, but would love your feedback.

Cheers,
Dan Linstedt

  Posted by Dan Linstedt at 8:00 PM | Comments (2)


Is this the ULTIMATE SOA Payback application?

Ok, I just finished blogging a rant - sorry guys, but it had to be said. In that blog (posted here) I suggest that maybe I should build my own data warehouse on myself. Well, that would require 1) rights to access my own information, 2) free rights to access it any time I wanted, from anywhere I wanted.

Sounds like an SOA to me. So let's open the soap-box a little further shall we?

Here we go.... So you want an SOA? Cool, these are extremely good things to implement in the Enterprise Architecture, and don't get me wrong - I really do believe that SOA is a new step in the right direction (for utilizing, and conglomerating resources and driving up ROI).

So let's examine a real-world business need: my need to access my information at any time from anywhere on the web, and download with full security and authenticity any information said, collected, or written down about me. This is a pipe dream, I know - but follow me.

If ever there is an ultimate ROI, it might be to invest in myself. I could learn a lot about the way I'm perceived in public and by these companies that capture all this information, RANT-ON: after all, I helped create the information they captured, right? RANT-OFF.

So what's an SOA got to do with it?
Suppose I want to build my own personal data warehouse, on myself, for myself, and capture all that I can from the structured and unstructured world (don't worry, the government is already thinking about this for us - scary thought). If nothing else, it would let me hold the insurance companies, medical companies, credit companies, and possibly banks accountable for ALL interactions they have with me.

After the warehouse is built, it would not be enough for me to record my thoughts, my ideas, my notions of a two-way conversation with Company A. I would need some manner in which to "request" the information they collected from me, assuming they were required by law first to verify that I am me, and then to provide the information to me over a web service.

I would then have to implement an SOA, and schedule the on-demand services from my perspective to contact their gateway and request a log of the information just captured. Maybe my process would even introduce a manual step for THEM, like: you are required to call me in 10 minutes to let me know if you received my fax or not. Maybe within 10 minutes I start the service up again, and look for an annotation in their logs that says they got it, or they didn't. RANT-ON: What ever happened to personal accountability? RANT-OFF.
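Just to show the shape of that "ask, wait 10 minutes, check their log" loop, here is a rough sketch in Python. Everything in it is an assumption on my part: there is no real gateway, `fetch_log` stands in for whatever web-service call a company might expose, and the `"<request id>:received"` annotation format is invented for the example.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_acknowledgement(
    fetch_log: Callable[[], Iterable[str]],  # hypothetical call to the company's gateway
    request_id: str,
    attempts: int = 3,
    wait_seconds: float = 600.0,  # the "10 minutes" from the text
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bool]:
    """Poll the log until it says whether the request arrived.

    Returns True if received, False if explicitly not received,
    and None if no annotation showed up in any attempt.
    """
    for attempt in range(attempts):
        for entry in fetch_log():
            if entry == f"{request_id}:received":
                return True
            if entry == f"{request_id}:not-received":
                return False
        if attempt < attempts - 1:
            sleep(wait_seconds)  # wait before asking again
    return None


# Fake gateway for illustration: empty log on the first poll, annotated on the second.
fake_logs = iter([[], ["fax-42:received"]])
result = wait_for_acknowledgement(
    lambda: next(fake_logs), "fax-42", sleep=lambda seconds: None
)
```

The `sleep` parameter is only there so the loop can be tried without actually waiting ten minutes. A real version would also need the identity check and security mentioned above, which this sketch leaves out entirely.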

Now, fast-forward: A couple years pass, I've managed to collect all this information over the SOA, and somehow miraculously I've built this cool dashboard on the world’s perception of "DAN" today, wouldn't that be the ultimate payoff? I could look at my dashboard indicator, and if it's in the red - decide to go play golf for the day, or sit down and deal with the issues that might put me back into the green.

Of course, then I'd spend all my time fighting with vendors over whose data is right, and whose is wrong - but if I have an SOA in place, I could easily correct it - write it back over the SOA, and move on. After all, the customer is always right - right?

Now for the ULTIMATE PAYOFF: at or near the end of my life, I package up all this information and offer it to the highest bidder. Why? To pay for my grand-kids to go through college.

On second thought, this is way too scary. But I wonder - if businesses are PAYING each other to share information about me over SOA, and it may be incorrect, why then do I have to go to court to get it fixed? Why can't I get the information for free?

I'll get back to business on SOA's shortly, they are wondrous things - but compliance, privacy and information access are only going to get harder as we go, not easier.

See you next time,
Dan Linstedt
