Business Intelligence Network business intelligence resources

Blog: Dan E. Linstedt


March 29, 2005

Nanotech - genes and computation ability

Genomics and DNA computing are HOT in nanotechnology. As one researcher has written in US News (March 28th, 2005), there are many things we haven't yet considered. Can genes act as tiny computers? Can they shape the way we move into nanotechnology? This blog explores this type of thought process.

"Using the powerful tools of molecular biology and comparative genomics, they're finding specific changes in the DNA that can account for 17,000 species of butterfly or why insects have only six legs instead of a dozen." US News, March 28th, 2005 - "Dances with fruit flies".

While personally I do not agree with evolution as the total story, there are certainly some implications in this article which make genomic computation interesting. The study of nanotechnology includes something called Bio-Informatics, or a crude way to put it: biology that can compute information when manipulated for our purposes. There's also the study of genomics, the manner in which genes interpret how we look, what we like, how much hair we have, and so on.

DNA is a very powerful molecule that dictates all of these things. The genes in the cells of the skin are no different from the genes in the cells of the brain. However, different sequencing and emphasis of these genes allow the body to create different parts that function in different ways. Form vs. function.

I believe that consistent form is necessary for accurate computation, regardless of where or what the computation is that is taking place. I also believe that the form is a vital key to the success of redundancy of information content, and computational power. In addition (if I may be so bold), I believe that computational ability (and information usefulness) is directly proportional to the amount of non-redundant information that is time-keyed.

All that said, the function of the information, at least for genomes, appears to be in its position within the DNA sequence, and in whether or not inhibitors limit the functionality according to where the gene is applied (i.e., skin, hair, teeth, bone).

Genes can create computation (cells that communicate), otherwise the mind would be a collection of dull grey matter which has no function. The function of those cells is regulated by the enzymes surrounding the cell matter and the location in the body in which these cells exist. Location also has an effect on how the gene is folded - different purposes for the DNA strand cause the DNA to fold in different areas, leading to different results.

Of course if we step back from the DNA form itself and ask the question: what folds the gene? It's an enzyme that does the work. Are they all folded the same way? No. Then how does DNA understand where to fold? First, this is the incorrect question. The correct question is: how does the Enzyme know where to fold the DNA? This is not known today, but let's speak hypothetically for a moment.

Suppose DNA is just the FORM of the data set, suppose the enzyme is the Function (like a program on top of data in a data model - two programs use the same data model for different purposes). Suppose there are three basic (simplistic model) enzymes: one that unzips DNA, one that copies DNA (organizes other molecules into a new copy of the DNA strand), and one that ZIPS it back together again.

Given our simplistic model, where would it make most sense to have some form of rule that says: activate this molecule to fold here when put back together. Would it make sense to have the structure of the DNA actually tell us? How about one of the enzyme engines? In order to answer this question, we'd need a ton of research. However for the purposes of discussion: I'm going to say neither.

I'm going to say that it's hidden in the informational content within the DNA, and that when the DNA is re-combined with other cells (because it's being replicated), the information triggers a chemical reaction at specific spots in the DNA strand that cause it to fold up. In other words, the form and the function are necessary for computation, but it's the content within the form that makes all the difference. I believe there are (as of yet) undiscovered atoms/cells which indicate the informational content of genes.

I think that as we move forward in this age of computation, we will find that the gene itself cannot act as a tiny computer without the enzymes and the correct information. I think however that in the world of "data warehousing and business intelligence" that we must become smarter.

WARNING: CONTROVERSIAL STATEMENT:
We must recognize that there is one way to model information within our systems, one form (today, tomorrow it will evolve) - and that there are thousands of functions of this information, but by building tiny self-contained computational modules on top of the form, larger systems can begin to interpret the information in parallel, and that reduction of redundancy of information is paramount to the evolution of understanding within those systems.

Can genes act as tiny computers? Yes, but not through form itself. Genes must be combined with enzymes and information in order to do their jobs properly. Can genes become the ultimate computing device? I think they already are (at least for today and until evolution makes it better). If we can learn to pair up non-invasive information with genetic structures, then we can begin working with informational systems on a level never before seen.

Can DNA computing shape the way we move into nanotechnology? Absolutely, it already has. There are different forms of nanotechnology today, ranging from single atomic layer control to multi-cell DNA computation. I think DNA computing will become more and more powerful as we begin to understand how to switch on and off different genes in the DNA strand. But it's not just on and off. It's shades of grey: understanding how to inhibit genes to different levels (like gradient colors) can become a huge boon to understanding DNA computing. By the way, DARPA has already done such a thing - in 1999. See one of the nanotech articles on this site for more information.

I'd love to hear your comments and thoughts. This is a difficult topic with many, many opinions, but one thing is certain: we can't test theories that don't exist, and if we don't think outside the box, we can't create the theories to test.

Cheers for now,
Dan L

  Posted by Dan Linstedt at 5:49 AM | | Comments (0)


March 26, 2005

Compliance, Data Integration, Part 2

As strange as it sounds, statement-of-fact is exactly what the data warehouse should become. See Bill Inmon's article on his vision for a data warehouse. In this second installment we explore compliance, auditability, and integration routines.

Let's take a look at data we think is not auditable by compliance rules...

For a hypothetical example, let's assume we are a marketing company whose business is to gain subscribers to magazines through postal mail. In our business, addresses are worth more than names - but we've seen subscription rates and response rates rise when we utilize the correct names. Of course, having the address correct saves us tons of money in postage, returns, and bounced mail.

Now suppose a goal of our new campaign is to give away a free subscription to a magazine, in hopes of gaining advertising dollars or new paid subscribers. In order to know what to "offer" in this free subscription, we decide to use one of our other data elements: the "gender" column, optionally provided of course.

After profiling our data (basic analysis) we find that 60% of our gender values aren't filled in, or aren't filled in with appropriate values. This column is of great importance as a deciding factor to subscriptions, so we decide to run a data mining algorithm to "figure out" what the gender should be set to.

Uh-oh - we're bordering on changing data in the warehouse! Warning, this is NOT compliant nor advisable. But what if we could insert the imputed value into a new table, along with a system-generated record source and a datetime that shows when we computed it? We've managed to leave the original data intact, and we can show the before and after, so I think we meet compliance. But it goes deeper than this....
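To make the "new table" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and the record source string are all hypothetical, chosen only for illustration; the point is simply that the imputed value is appended alongside a record source and a load datetime, and the original row is never updated.

```python
import sqlite3
from datetime import datetime, timezone


def build_demo_warehouse() -> sqlite3.Connection:
    """Create an in-memory database with a raw table and a separate imputed-values table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE subscriber (                 -- data exactly as received
            subscriber_id INTEGER PRIMARY KEY,
            name          TEXT NOT NULL,
            gender        TEXT                    -- optional, often NULL
        );
        CREATE TABLE subscriber_gender_imputed (  -- derived values only
            subscriber_id  INTEGER NOT NULL REFERENCES subscriber(subscriber_id),
            imputed_gender TEXT NOT NULL,
            record_source  TEXT NOT NULL,         -- which process produced the value
            load_datetime  TEXT NOT NULL          -- when it was computed (UTC, ISO 8601)
        );
    """)
    conn.executemany(
        "INSERT INTO subscriber VALUES (?, ?, ?)",
        [(1, "Sam Smith", None), (2, "Sam Smith", None)],
    )
    return conn


def record_imputed_gender(conn, subscriber_id, gender, record_source):
    """Append an imputed value; the subscriber row itself is never updated."""
    conn.execute(
        "INSERT INTO subscriber_gender_imputed VALUES (?, ?, ?, ?)",
        (subscriber_id, gender, record_source,
         datetime.now(timezone.utc).isoformat()),
    )


def before_and_after(conn, subscriber_id):
    """Return (original gender, latest imputed gender, record source)."""
    return conn.execute("""
        SELECT s.gender, i.imputed_gender, i.record_source
        FROM subscriber s
        LEFT JOIN subscriber_gender_imputed i
               ON i.subscriber_id = s.subscriber_id
        WHERE s.subscriber_id = ?
        ORDER BY i.load_datetime DESC
        LIMIT 1
    """, (subscriber_id,)).fetchone()
```

An auditor asking "what did this look like before?" can then be answered with one query: the original column still holds whatever the source supplied, and every computed value carries its own origin and timestamp.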

After mining the data sets we discover two particular rows that we are struggling with: the data says they are both: Sam Smith at two different addresses with no gender filled in. Through mining we discovered that Sam Smith #1 subscribes to Hot-Rod Magazine, and Sam Smith #2 subscribes to Home Architecture. We mine a little further and discover that Sam Smith #1 also subscribes to Home Maker and Girls Fashion, Sam Smith #2 also subscribes to Bowling Weekly, and FHM.

Finally, we produce the right gender and send out the correct invitations to additional magazine titles.
-------------
Ok, back to compliance and auditing.... If our business is revenue driven through subscriptions, and subscriptions are based on the Gender column, then are Gender column values auditable? Do they hold financial value for the company? What if the Gender values are "falsified" or changed according to today's business user-version of the truth?

You get the idea. If one single column can hold financial value for the company, then most likely almost all data elements can be included in this argument. Does this mean that ALL data can now be treated as an asset, valuated, and insured? Well - that's another blog - but it certainly means the auditor can look at how the financial figures were produced, and within that audit they might very well look at the process that generates Gender values.

Does that also mean that if the data is "incorrectly calculated" that the Implementor can be held accountable? That's for the auditors to decide, but I don't want to be left in that bucket without a solid traceable set of original and unmodified information.

  Posted by Dan Linstedt at 1:44 AM | | Comments (0)


March 24, 2005

Compliance, Data Integration, Accountability?

In this week's newsletter Bill discusses Sarbanes-Oxley and what it means to business. See Bill Inmon's newsletter article. In this blog we take it a step deeper - into the implementation world of data integration.

What does compliance mean to those building ETL, EAI, EII and Web Services routines? What does it mean to the data set both IN the data warehouse and now being loaded into the data warehouse? What will data Integration have to endure in the coming year or two commercially? This category of blogs will explore these questions and more.

For our first entry in this space, we will discuss the following question: What does compliance mean to data integration routines, and the implementors building EAI, ETL, EII, and Web Services components?

For the purpose of this discussion, we will translate "Compliance at a business level" to mean "accountability and traceability of the data at the lowest grain." When an auditor holds a firm "in-compliance" or "out-of-compliance," that could potentially mean that the firm's accounting numbers (bottom line) don't match up with the auditor's records or investigation.

As Bill Inmon said: "In many ways, the financial transaction is the pie, and the audit needs to look at how the pie was baked and how the apples were cut up before being put in the pie."

So what if we were to assume the apples and the cutting process were to include not only e-mail (along with other unstructured data), but also the actual integration routines that cleanse, check, and alter data on the way IN to our data warehouses?

This would change the picture drastically. Suppose we are given a project to build a data warehouse that consolidates a single view of financial transactions from 3 source systems. The business signs off on some requirements declaring "how to transform and cleanse the data" to bring it into the warehouse. We build the data warehouse model, followed by the integration routines to load it.

The story begins...
Along the way, we "find" some really bad and discrepant data - bad according to the end-users' "viewpoint." So again, the end-user signs off, and we "fix" the loading mechanism to change the data. A couple of months pass and the end-users are happy.

Then, the business sponsor leaves the company, and a new individual "ABE" takes their place. ABE decides that he doesn't like the way the financial data is being rolled together, comes to the data warehouse team and says: change the business rules, this data isn't bad - this other data is bad. So we change the rules, and again the users, including ABE are happy.

A few more months pass - and an auditor comes. The auditor asks questions like: how did this data get this way? There seems to be an error in this data, it doesn't seem to ever have been correct. Where is the problem, ABE? ABE says: check with JJJ, he works for me - and so on down the line.

Finally the auditor walks in to your office (you the implementor) and asks: did you write the routines that change this data? Yes? Hmmm, can you show me what the data looked like BEFORE your routine changed it? Did you know that the data your routine produces is not auditable?

You struggle for a minute, and say: yes - but the source system is supposed to be the system of record, oh - and the business user JJJ signed off on these requirements... The auditor continues - JJJ said you built the routines that changed the data; he's only responsible for what they DO with the data and how they interpret the results.

The auditor smiles and gives you the benefit of the doubt: show me these source systems, and if you can trace this information back to the source - then I'll go find out how this data was captured and where the break is. If not, then I'm afraid I'm going to involve you in this audit too.

-----------
What happened? Why can't the source be used as a system of record? It just so happens that 1) aggregation, cleansing, and quality initiatives are "filters" - today's version of the truth, which changes when the business changes; 2) applying these things to data in-stream on the way into the data warehouse causes a loss of grain and a loss of traceability; 3) the source system is only a system of record FOR ITSELF; 4) the data warehouse (like it or not) is fast becoming a system-of-record for auditing purposes, because it's the ONLY place this integrated view of the data exists; and 5) the implementation expert takes the heat of the audit for not being able to back-track and prove how this data came to be.

Ok - that's the most severe case, but as Bill has said many times in his career: the Data Warehouse should NEVER be a system of record - he's right, we should make it a "statement-of-fact": one that tracks data coming into the warehouse on a granular level AS-IT-STOOD in the source system, to meet auditors' needs, thus allowing traceability back to the real SOR - the source system. Finally, this means moving the business logic, cleansing, quality, and aggregation to the OUTPUT side of the EDW - changing data "because of today's version of the truth" lends itself well to single data marts, when it's backed with a fully normalized EDW that carries all the details.

What does this mean to the implementors of integration logic? If it is a requirement to be compliant, then it means that the data must be tracked: what it was before it was changed, when it was changed, and what it was changed into. Oftentimes, a single source system just isn't enough, or it offloads the data onto un-restorable backup systems.
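As a rough illustration of tracking "what it was before, when it was changed, and what it was changed into," here is a small Python sketch. The `ChangeRecord` fields, the `apply_rule` helper, and the rule name are hypothetical names invented for this example; a real integration tool would have its own mechanism for this.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One audited change: what it was, what it became, which rule, and when."""
    key: str
    column: str
    value_before: object
    value_after: object
    rule_name: str
    changed_at: datetime


def apply_rule(row, column, rule, rule_name, audit_log):
    """Return a changed copy of row, logging before/after; the input row is untouched."""
    before = row[column]
    after = rule(before)
    if after != before:
        audit_log.append(
            ChangeRecord(row["id"], column, before, after, rule_name,
                         datetime.now(timezone.utc))
        )
    return {**row, column: after}


# Usage: a hypothetical cleansing rule that strips a leading minus sign.
audit_log = []
raw = {"id": "TXN-1", "amount": "-125.00"}
clean = apply_rule(raw, "amount", lambda v: v.lstrip("-"), "abs_amount_v1", audit_log)
```

The design choice worth noting is that the rule never mutates the raw row. The as-it-stood value survives, and when the business sponsor changes the rules later, the log shows which version of the truth produced which value.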

If the data is not "auditable" by compliance rules then you have nothing to worry about... or do you?

Watch this category, more coming soon.
Cheers,
Dan L

  Posted by Dan Linstedt at 2:07 PM | | Comments (0)


March 23, 2005

What's the big stink about anyway?

What do Nanotechnology, sneakers, and Business Intelligence have in common? (no-worries, we don't have an aroma device hooked to the blog)

Does this sound interesting? Maybe not. But then again, would you provide customer loyalty to shoe manufacturer XYZ in exchange for sneakers that never get stinky, how about your Gym Socks? Can nanotechnology come to the rescue and solve the problem? This blog explores an interesting perspective on a smelly topic.

I just heard on the radio about a "stinkiest sneaker competition" that was held somewhere in the US. Not so interesting, but it got me thinking: what if sneaker companies could eliminate odor? What if I could buy Gym Clothes, Shoes, Socks, and a Gym Bag that never stunk? Would that boost sales? Would it lead to new BI research or competitive analysis on Just How Stinky can your sneaker get?

It might. Let's just assume for now that it is interesting. Maybe you have a pair of XYZ's and you love those shoes! But because they stink, alas, you have to throw them away. What if XYZ came out with a shoe that never gets stinky, would you buy them and wear them until they were worn out? Would it increase your customer loyalty? Ok, so there's the business problem - what's the solution?

Well, first we have to find a way to judge stinky sneakers, and we have to find a way to make sneakers stinky to begin with. It has been said by the winner of this competition: "The secret is to never wear socks in your sneakers, ever." Ok, with that - we have a way to create a bunch of stinky sneakers.

How we judge them must become scientific - we must find a way to collect quantitative facts, maybe with an aroma-meter (ok - stink-o-meter). We enter the information into the computer and find out which sneakers stink the worst, and what skin types, perspiration, acidity, and other things affect the sneaker material.

We run through all the testing, and produce the BI results that show what causes the biggest stink. Then we have to build a solution - something that resides in the shoe material, that doesn't rub off, sweat off, leak off, or wear off over time. This solution must bind with the shoe material, and it must be capable of warding off the "stink-causing agents".

Enter nanotechnology. This is the exact type of application that Nanotechnology can solve, today. Properly engineered solutions can be sprayed on to all the material prior to assembly. Half the molecule can be engineered to bind with the material, the other half of the molecule can be engineered to ward off the stink-causing agents, be-it skin acid, sweat (most likely), salt, and other elements.

The tough part, as they say, is to 1) figure out what causes shoes to stink, and 2) develop the nanotech spray that eliminates the absorption of these molecules into the shoe material. Then, the shoe can be made and XYZ might capture additional market share, and a couple of incredible endorsements....

As they say in Monsters Inc, "Go-ahead and stink-it-up"... See you in the Gym.
Dan L

  Posted by Dan Linstedt at 11:48 AM | | Comments (0)


March 18, 2005

Nanotech and the "Next Big Thing"

Nanohousing for me (right now) is a hobby, something under development - an extreme beyond the horizon. Ok, enough blather - but for gosh sakes have you seen CIO, Business Week, InfoWorld, or B-Eye Networks lately? Nanotechnology is everywhere. In this blog I am opening up this section for new developments.

Nanotechnology has come a long ways and in an over-simplified definition might be defined as:

Wet-Technology
The blending of man made technological advances, using natural world models, and chemical elements. In fact, some parts of nanotechnology create new chemical elements.

Anyhow, it might boil down to "man's control over the atom." That's right, the atom or the atomic level. So what does this mean? It means anyone who is interested in the future of technology should start learning now about nanotechnology, what it is, what it does, how it operates. Nanotechnology is not necessarily a bad thing (some places on the web report that nanotech is the end of all we know), but with all significant advancements in the field - it can be utilized for a lot of good.

In this blog I will be exploring the application of nanotechnology, rather than the definition of nanotechnology itself. One of the applications I am dreaming up is called: Nanohousing, which is nothing more than a whim right now.

I hope you'll follow along in this journey, and offer plenty of comments and new ways to think about the future, for it has significant impacts on what we do in technology, and how technology will be applied.

Cheers,
Dan

  Posted by Dan Linstedt at 4:46 PM | | Comments (0)


Unstructured data, and Business Interpretation of results

This is a follow on to my previous blog where many questions were asked in regards to what it might be like to "mine" web-blogs for content. A good comment was made on watching the context of the blogging statements. Here we explore the notion of that comment.

A comment was received on the previous blog entry about the fact that "it is very dangerous to take a sentence or part of a sentence out of context."

I would tend to agree. One of the things data mining engines have been traditionally good at is mining pre-structured information, the context for this structured information (just one view of this) might be the table structure, the organization and categorization of this information - sometimes dictated by the surrounding data, other times assisted by the data model itself. Of course the full meaning of the information content is defined only by a) the person asking the mining questions and b) the individual(s) who put together the data model being utilized as a source.

However, the age of unstructured information integration has already arrived. Is the technology there yet? Maybe; is context inference available? Not so sure. Of course limited context parsing engines are being built to associate words and word patterns with many different formulas, ranging from statistics to calculus and neural nets.

I would like to say that the comment above is appropriate, taking sentences out of full context for results in a data mining structured manner is very dangerous. It would be akin to misquoting a speaker in a public forum without exploring the meaning behind it, and yet - there is some base-level value here.

Another question was asked in the comment: "In the end is it cheaper and safer to go for searchable blog archives instead mined blogs?"

In my opinion, yes - it is probably cheaper and safer to go for searchable blog archives instead of mined blogs. However, let's not get ahead of ourselves. I stated a minute ago that it is my belief that there is base value in mining blogs; let me explain.

For example: when an executive brings accounting numbers to the table in a financial meeting, they don't just bring the numbers - they do homework to understand what the numbers mean.

I feel that in order to add value to the data mining results, due diligence should be done to place the context and the meaning of the phrases back into the picture. This way the mining tool can be utilized for "discovery" purposes only, maybe in a protected manner - followed by human research and investigation beyond that point.
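One simple way to keep context attached to a mined phrase is to return every hit together with its neighboring sentences, so the human doing the follow-up research sees more than the fragment. The sketch below is only an illustration of that idea in Python; `hits_with_context` and its `window` parameter are hypothetical names, and the sentence splitting is deliberately naive.

```python
import re


def hits_with_context(text, term, window=1):
    """Return each sentence containing term, plus `window` sentences on either side."""
    # Naive split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    results = []
    for i, sentence in enumerate(sentences):
        if pattern.search(sentence):
            start = max(0, i - window)
            end = min(len(sentences), i + window + 1)
            results.append({
                "hit": sentence,
                "context": " ".join(sentences[start:end]),
            })
    return results
```

A mining run that surfaces "It was a disaster at first." on its own tells a very different story than the same sentence shown with the ones before and after it, which is exactly the danger the commenter pointed out.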

Just a few more thoughts...

Great questions, see you next time.
Dan

  Posted by Dan Linstedt at 2:37 PM | | Comments (0)


March 15, 2005

Unstructured Data and Blogging, and Data Mining?

What would you suppose they have in common? Is anyone considering mining blogs for business value? Sure there are aggregators out there (and some great ones at that), but what would happen if business began looking for real-patterns?

Has anyone set up a web-blogging data mining component? Interesting thought. Let's just say fictional company A has employees that blog, and fictional company B is in competition with them. It would seem to me that competitive intelligence is just one aspect of utilizing a blog-miner.

Ok - so let's take a quick look at what that might take (technology wise) to get into. You might want: a) a genuine aggregator of good blogs to garner links and information from; b) a list of relevant terms and words that make up your interest; c) a ranking of those keywords and key-phrases; d) a web-scraping or blog-scraping/RSS capture tool; e) a back-end data model to load the blogs into; f) a data mining tool that would either mine text, or a structured data model to fit "parts" of the scanned blog into; and finally, g) an analysis tool to make sense of the mined results.
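The term list and keyword ranking (items b and c above) can be sketched in a few lines of Python. This is a toy illustration, not a real blog-miner: the function names and the weights are made up, it handles single-word lowercase terms only, and it assumes the blog text has already been captured by some scraping or RSS tool.

```python
import re
from collections import Counter


def score_post(text, term_weights):
    """Sum weight * occurrence count for each tracked term (case-insensitive words)."""
    words = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    return sum(weight * words[term] for term, weight in term_weights.items())


def rank_posts(posts, term_weights):
    """Return (title, score) pairs, highest score first, dropping posts with no hits."""
    scored = [(title, score_post(body, term_weights)) for title, body in posts.items()]
    return sorted((p for p in scored if p[1] > 0), key=lambda p: p[1], reverse=True)


# Usage with made-up posts and weights.
posts = {
    "Loading woes": "Our ETL pipeline loads the warehouse. ETL is hard.",
    "Lunch": "Lunch was great.",
    "Go-live": "The warehouse is ready.",
}
ranking = rank_posts(posts, {"etl": 2.0, "warehouse": 1.0})
```

Anything that scores above zero would then flow on to the data model and the mining and analysis tools; the ranking only decides what is worth loading first.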

What do they have in common? It's another form of super expression, no middle man, just pure information, and it's all unstructured - free flowing. Data mining and business intelligence provide a way to garner information from it. The information might be competitive, maybe watchdog, maybe useful in trying to mine the CEO's latest thoughts on the business.

What if we mined all the CEO blogs for "how to run a company successfully"? We might find some things we don't expect.

Cheers for now,
Dan L

  Posted by Dan Linstedt at 3:16 PM | | Comments (1)


March 14, 2005

Welcome!

Thank you to Bill Inmon for asking that I participate. I am looking forward to bringing readers interesting thoughts and insights into Data Warehousing / Business Intelligence, Nanotechnology, ETL technology (integration in general), Service Oriented Architecture and other items related to Data Warehousing. I hope that all the postings are enjoyed. I look forward to lots of feedback from the audience. All the best, Dan Linstedt, CTO Myers-Holum, Inc (www.myersholum.com)

  Posted by Dan Linstedt at 4:55 PM | | Comments (0)