Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor at The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, has trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

Recently in Nanohousing Category

In this entry, I return to Nanohousing(tm), the notion of utilizing nanotechnology for computing and Business Intelligence purposes. Remember that these writings are an attempt to go beyond the horizon; they are futuristic guesses about where specific aspects of nanotech can be applied within the DW / BI world. It will take years to get to these points, but rest assured - changes are happening. One of the areas that has really interested me in nanotech is the notion of DNA computing - that is, using a DNA strand's form and function (combined) to serve specific computational purposes and answer specific questions.

"The hope of this field is that the pattern matching and polymerization processes of DNA chemistry, combined with the enormous numbers of molecules in a pound, will make feasible computations that are now too hard for conventional computers." DNA Computing, http://www.fas.org/irp/agency/dod/jason/dna.pdf

First I'd like to point out (as I have a few times before) that form and function are recombined at the DNA computing level. In the BI / DW world of today, we have separated form from function, and it is inhibiting our ability to move forward, not to mention a severe drain on flexibility, scalability, and applicability. Form in our BI / DW world today would consist of models: process models, business models, data models, architecture models, network models, and so forth. Function would be what these models do with the data / information passing through them.

For instance, data models today hardly resemble the business processes in which the data sets flow - while there have been some advances, like UML and Object Oriented modeling, they are still (for the most part) divorced from the true business functions. We strive to make sense of the data and the architectural modeling paradigms by assigning metadata - descriptive context. We are also now headed back towards convergence of business function and "architecture" with Master Data Models and Master Data sets. Finally we're beginning to get it - but still, the nature of the RDBMS engine in today's world is to apply common functionality to models designed by external means. They are not tightly coupled.

When we examine DNA Computing as a function of nanotechnology, we find it to be a tightly coupled form and function process. The "model" in which the data sits - even where the information is encoded within the strand - becomes important. The "function" is built into the type of DNA strand created, in a bio-chemical sense.

"No arithmetical operations are performed, or have been envisioned, in DNA computing. Instead, the potential power of DNA computing lies in the ability to prepare and sort through an exhaustive library of all possible answers to problems of a certain size. ... A single strand of DNA can be abstracted as a string made up of the letters A, C, G, T. ... Complementary strands of DNA will form a doulbe strand (the famous double helix). Two strings are complementary if the second, read backwards is the same as the first, except that A and T are interchanged, and C and G are interchanged."

Now what happens in the BI / DW space if we were to follow this "wet-technology" model? What would happen if we were to combine form and function like the DNA computation machine? Would we see tremendous leaps in traditional computational power? I hypothesize that we would - that if we were to simulate DNA computation in a newly designed DNA-type database engine, we would see a number of things happen. But remember, I'm not talking about traditional DNA modeling software on a traditional CPU / computing engine - no, I'm talking about a machine that currently exists only in bio-tech labs, in the test tubes.

OK, so what could we do better today that we haven't done in the past, using conventional computing resources?
We can begin converging form and function: start small (with a web-service, for example), combine it with security, access rules, metadata, and a definition of groups from a common set of elements (taxonomies), and make it available to the world. Self-encapsulated, it might interact (on its own) with other web-services; in other words, discovery and deterministic decision-making are parts of this web-service. It discovers other web-services, then decides whether an available service has information it can use and, if it has access, pulls the information in and assimilates it automatically.
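To make that a bit more concrete, here's a rough, hypothetical sketch of such a self-discovering service. Every name in it (Service, discover, assimilate, the registry) is illustrative only - this is not a real API:

```python
# Hypothetical sketch of a self-discovering web-service: it discovers peers,
# decides if a peer has usable information (shared taxonomy), checks access,
# and assimilates what it is allowed to read.
class Service:
    def __init__(self, name, taxonomy, acl, knowledge=None):
        self.name = name                    # service identity
        self.taxonomy = taxonomy            # elements it understands (a set)
        self.acl = acl                      # peers allowed to read from it
        self.knowledge = knowledge or {}    # information assimilated so far

    def discover(self, registry):
        """Scan the registry for other available services (discovery)."""
        return [s for s in registry if s is not self]

    def useful(self, peer):
        """Decide whether the peer has information we can use."""
        return bool(self.taxonomy & peer.taxonomy)

    def assimilate(self, peer):
        """If we have access, pull the peer's information in automatically."""
        if self.name in peer.acl and self.useful(peer):
            self.knowledge.update(peer.knowledge)

registry = [
    Service("sales", {"customer", "order"}, {"finance"}, {"order#9": "open"}),
    Service("finance", {"customer", "invoice"}, {"sales"}, {"inv#3": "paid"}),
]
for svc in registry:
    for peer in svc.discover(registry):
        svc.assimilate(peer)
```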

Obviously the web-service is part of an extended neural network, one capable of being taught, learning on its own, and being corrected over time. So we still have some incorporation of traditional practices (due to the ultimate abstraction). This is a fundamental difference between the computational world and the DNA computing world. DNA Computing uses bio-chemistry to solve its problems and learn new things. Security is built in (as a function of what a DNA strand can and cannot "tie" to, bond with, or cut and merge with - and how it will execute these things).

As a matter of interest to DARPA, here is an interesting look at the applications of nanotech in today's world.

How do you see DNA computing affecting the future of BI / DW?

Cheers,
Dan Linstedt
CTO, Myers-Holum, Inc


Posted November 17, 2006 7:46 AM
Permalink | 1 Comment |

I've recently searched the web for references to my nanohousing articles, and I found two interesting references that I'd like to point out. I've taken a hiatus from blogging on nanohousing to learn more about nanotechnology; I hope you won't be disappointed with the efforts. Anyhow, this is a short entry; I thought you might enjoy these two items.

The first is a success story that uses one of my articles here on the B-Eye-Network on Nanohousing; check out this slide show about a robot that was built to "LEARN" systems.

eTrium Corporation in the Czech Republic has created a robot based on nanotechnology, and has quoted the nanohousing paper.

The second is a reference to ComputerWire, in which an author from their team plagiarized my work - word for word - and claimed it as their own. For the past several years, ComputerWire has been selling my article for $45 (of which I see NO remuneration). They have promised to issue a press release retracting the claim that it was that author's original work, but I've never seen it.

What's even more interesting is that they've back-dated the material to 2003, when I actually wrote it in 2005!!

What's funny is that you can get the article for free, HERE on B-Eye-Network. It was originally published in Bill Inmon's newsletter many years ago.

I think it's interesting that they still insist on selling my original work without paying me for it.

Cheers,
Dan Linstedt
CTO, Myers-Holum, Inc


Posted September 8, 2006 6:28 AM
Permalink | No Comments |

I've blogged on this before, suggesting that there be an equivalent "hidden signal" embedded in a data set - something that uniquely identifies each word, each paragraph, and each document (context). I wish there were a way (electronically) to construct and send unique keys for all data sets around the world; a single unified key structure (open and public, but unique).

What's the business benefit? The ability to key across B2B applications, to recognize duplicate data, and to check remote web sites and B2B applications for unique data exchange. There are many more business benefits that I will elaborate on as we go forward.

Today, the only technology that can accomplish this is DNA computing within the Nanotech sector. DNA would give us the ability to uniquely "sign" data sets without destroying the data itself.

It's an interesting thought, I think, that bears more discussion and research; sort of an RFID for data, if you will.
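Until DNA computing arrives, the closest conventional stand-in I can think of is content hashing. This is my own sketch, not the DNA approach - and note that a hash is derived from the data rather than embedded within it, so it only approximates the "hidden signal" idea:

```python
# A conventional approximation of the "RFID for data" idea: each word,
# paragraph, and document gets a key derived from its content, so identical
# data hashes to the same key anywhere in the world. (Only a stand-in for
# the non-destructive DNA-based signing discussed above.)
import hashlib

def data_key(text: str) -> str:
    """An open, public, unique key for any piece of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

document = "Nanohousing combines form and function."
doc_key = data_key(document)                                # document key
paragraph_keys = [data_key(p) for p in document.split("\n\n")]
word_keys = {w: data_key(w) for w in document.split()}      # per-word keys
```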


Posted June 3, 2006 5:59 AM
Permalink | 1 Comment |

DNA computing is rapidly making strides in the nanotech industry. There is an interesting evolution with absolutely profound implications: control over a single DNA molecule via nano-crystal antennae. The presentation is available for a small fee, but shows just what is possible. Imagine a massively parallel computing engine running at phenomenal speeds, controlling millions or billions of DNA molecules via radio signals... Wow! How about a thumb drive with 10^8 terabytes of storage in a couple grams of DNA solution? Searching this solution in less than 3 seconds for answers, computing within the solution in 3 to 10 seconds...
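As a sanity check on that 10^8 terabyte figure, some back-of-the-envelope arithmetic lands in the same ballpark. The assumptions here are mine, not from the presentation: double-stranded DNA at roughly 650 g/mol per base pair, and 2 bits of information per base pair.

```python
# Back-of-the-envelope check on the 10^8 TB claim. Assumptions (mine):
# ~650 g/mol per base pair of double-stranded DNA, 2 bits per base pair.
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # approximate mass of one base pair
BITS_PER_BP = 2.0

grams = 2.0                # "a couple grams" of DNA
base_pairs = grams / BP_MASS_G_PER_MOL * AVOGADRO
terabytes = base_pairs * BITS_PER_BP / 8 / 1e12

print(f"{terabytes:.2e} TB")  # ~4.6e+08 TB, on the order of 10^8 terabytes
```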

The presentation is on the MIT web site.
The implications are profound. The notion of controlling a single DNA molecule with a radio wave is incredible. Let's step off the edge and look into the future, over the horizon - let's see if we can think of applications and implications of this technology within the DW / BI space. Beyond the obvious applications in bio-tech and medical science, let's see what we can come up with.

The web blurb talks about the following:

Anyone can imagine controlling a model car or airplane with radio signals, remotely guiding the machine along a prescribed pathway. In this Knowledge Update, readers learn that the same is being done with DNA and other molecules. This Update describes the tools behind this molecular control, which relies on nanotechnology. In addition, readers learn how this technique can control the binding of DNA, which governs biological processes from cell division to switching genes on and off. Consequently, controlling biomolecular operations opens many possibilities, such as using this nano-control for genetic testing, building molecule-size devices that move on command, and much more.

Now, let's dive into nano-computing for a moment: imagine a computing system containing a few grams of DNA - say, within the size of a thumb drive for a USB port. Within that thumb drive are two things: modified DNA with nano-crystal antennae, and a computing system that produces super-short, very "weak" radio transmission waves - just enough of a wave to reach the localized DNA. Of course the frequency must be localized as well, and the radio wave must be too weak to travel outside the bounds of the thumb drive - maybe the inside of the thumb drive is coated with a shielding material that keeps the radio waves within the device.

Power consumption is low for this kind of thing. It would be very easy to "program" the DNA, especially since the radio waves cut, splice, and control the on/off state of the molecules. The challenge would be in reading the DNA results. Suppose there are two mechanisms available to "read results." One possibility might be based on a solution, encouraging and discouraging bonding through ionization of the molecules; the reading mechanism might then be a segment of light that passes through the entire solution, where shadow and/or intensity of shadow produces a read-out of the result. Or, instead of light and colors, maybe additional radio waves are passed through the solution - ones that don't interact with the antennae - and whatever bounces is read into an "imaging" device; the image is then interpreted by standard programmatic methods.

It is possible then, by combining existing technology with nanotechnology in a single device, to see how "exponentially hard" computational problems could be solved through a simple USB plug-and-play, with existing technology used to "read" the answers and to send the signals in parallel to the actual computation engine. However, now that I think of it, why not use this for simple problems too? Solved in parallel, all the DNA strands and programmable DNA molecules should come up with the same answer, every time.

Radio waves offer the dynamics of delivering the same signal to each programmable element at the same time; using imaging and light/color/shadowing techniques, the solution could be "read". Localizing the radio waves and shielding the cover would minimize interference.

I'd love to hear from you, and see what you think of this future vision.

Thank you,
Dan Linstedt


Posted March 8, 2006 8:01 AM
Permalink | No Comments |

There has been renewed interest in RNAi and RNA lately in the biotech world (don't forget, biotech is a part of nanotech - or the other way around). RNA (ribonucleic acid) apparently carries encoding and decoding instructions for gene sequences; RNAi apparently has the ability to block or inhibit specific gene sequences. See an introductory article here.

In this entry we will explore (theoretically, anyhow) what this might mean for the nanohouse and DNA computing.

There are some neat pictures (simulated/generated) showing the DNA structure here. If you don't think nanohousing is being worked on, think again. Here's an IEEE link to a conference that occurred in 2004.

OK, let's get started...
For quite a while I've blogged and written about the convergence of form and function, along with the convergence of industries: bio, chemistry, technology, physics, etc. Back in an early paper I wrote for B-Eye, I predicted that the future technologist would have to have skills well beyond mere "technology" in order to survive (or face the threat of outsourcing). Well, form and function in bio-tech are a BIG part of what makes it work.

In the Nanohouse, we need to learn from this. The future nanohouse won't be JUST a data warehouse, or JUST an ODS, or JUST an OLTP system - no, it will be an "integrated data store" where the molecules collect "data" as history when it pertains to the context in which they live - assigned by "key" components of information that only they recognize. Different parts of the DNA structure will represent different and distinct chemical keys - for storing different types of information.

Well, that's all well and good, but we need functionality in the form of RNA and RNAi to act on the DNA strands that we "build". We also need catalyst-type events to trigger interaction across the DNA sequences. Here's a quote from the Vienna RNA project that discusses this:

Biomolecules exhibit a close interplay between structure and function. Therefore the growing number of RNA molecules with complex functions, beyond that of encoding proteins, has brought increased demand for RNA structure prediction methods. While prediction of tertiary structure is usually infeasible, the area of RNA secondary structures is an example where computational methods have been highly successful.
http://nar.oxfordjournals.org/cgi/content/full/31/13/3429

Wow! So this means a Nanohouse is definitely feasible?
Yes - but it's still at least 5 to 10 years off before we understand enough to create one. However, the study of RNA and RNAi sequences, along with the DNA strands, is important and will help build a foundation of knowledge from which the Nanohouse can be built.

Where does this impact my business today?
Quite frankly, it doesn't yet. How soon it does will depend on the advances in both the biotech and nanotech sectors. I would speculate that if your top information technologists, scientists, and researchers are not yet involved in this field, they should be. The paradigm is already beginning to shift as we see applications of this technology being created in labs around the world. Like any paradigm shift, this one will take time - and lots of it.

This is interesting; how does modeling take place in this element?
In order to answer this question, we must look at not just data visualization, but model visualization. Model visualization consists of putting data models into a 3D landscape and combining them with hub-and-spoke-like structures that resemble molecular connections (a poor man's neural network); see the Data Vault data modeling references on this site.
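For those unfamiliar with the Data Vault, the hub-and-spoke structure boils down to hubs (business keys), links (associations between hubs), and satellites (descriptive history hanging off either). A bare-bones sketch, with purely illustrative names and fields:

```python
# Bare-bones sketch of hub-and-spoke (Data Vault style) structures: hubs hold
# business keys, links relate hubs, satellites carry descriptive history.
# Names and fields here are illustrative only.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Hub:
    business_key: str            # e.g. a customer number

@dataclass
class Link:
    hubs: tuple                  # the hubs this link associates

@dataclass
class Satellite:
    parent: object               # a Hub or a Link
    attributes: dict             # descriptive context
    load_date: datetime = field(default_factory=datetime.now)

customer = Hub("CUST-42")
product = Hub("PROD-7")
purchase = Link((customer, product))
history = Satellite(purchase, {"quantity": 3, "channel": "web"})
```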

How does this play with RNA and RNAi?
RNA can help with the interaction of the molecules, while RNAi can specifically block or inhibit interaction. But more than that, the dynamics of this interaction/blocking need to be scored and measured.

The first practical dynamic programming algorithms to predict the optimal secondary structure of an RNA sequence date back over 20 years (1). Since then they have been extended to allow prediction of suboptimal structures (2,3) and thermodynamic ensembles (4), which allow to assign a confidence level or ‘well definedness’ to the predictions (5).
http://nar.oxfordjournals.org/cgi/content/full/31/13/3429

So does this mean the Nanohouse is a "dynamic structure" model?
Interesting question; the answer is that it depends. Dynamic structure in the sense of adding new DNA components, extracting, and connecting the DNA to other molecules - yes; changing the core structure underneath - no. RNA itself also has a structure, and that structure is rigid.

Recently, several methods have addressed the problem of predicting a consensus structure for a group of related RNA sequences (6–11). Such conserved structures are of particular interest, since conservation of structure in spite of sequence variation implies that the structure must be functionally important. By enhancing energy rules with sequence covariation these methods also obtain much better prediction accuracies.

In other words, the structure of the RNA itself stays the same - much like the structure of a neuron. Even though the memories change, the connections in the brain change, and the thought patterns change, the basic structure of the neurons in the brain stays the same.

What does this mean to Nanohousing?
It means that the architecture of our structure must be consistent, repeatable, and redundant - but that the inter-relations, the functions, and the sequences can change (leading to a dynamic set of rules for inter-relationships, but a static, structurally based foundation from which to scale infinitely).

A stretch of the imagination might be to say:
The equivalent of ‘data mining activities’ has been found within the RNA and RNAi operations.

Can we beg, borrow and steal some of these concepts today?
Yes - we should be utilizing what we learn in these fields and applying it to our current modeling techniques and data warehouses.

* a close interplay between structure and function (the data model MUST be closely related to the functions in business)
* structure must be functionally important
* assign a confidence level or ‘well definedness’ to the relationship (dynamic relationships can be created, weighed, tested, and destroyed depending on viability to associative information - see the sketch after this list)
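To make that last point concrete, here's a tiny sketch of confidence-scored relationships. The update rule and thresholds are my own illustrative assumptions, not an established algorithm:

```python
# Tiny sketch of dynamic, confidence-scored relationships: created, weighed,
# tested, and destroyed depending on their viability. The update rate and
# viability threshold are illustrative assumptions.
class Relationship:
    def __init__(self, source, target, confidence=0.5):
        self.source, self.target = source, target
        self.confidence = confidence  # 'well definedness' of the link

    def observe(self, supports: bool, rate: float = 0.2):
        """Nudge confidence up or down as new evidence arrives."""
        goal = 1.0 if supports else 0.0
        self.confidence += rate * (goal - self.confidence)

    def viable(self, threshold: float = 0.2) -> bool:
        return self.confidence >= threshold

relationships = [Relationship("customer", "order")]
relationships[0].observe(supports=True)                     # weigh and test
relationships = [r for r in relationships if r.viable()]    # destroy the weak
```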

You can find more on the Data Vault modeling technique (for free) here.

What do you think will happen in your Nanohouse?
Dan L


Posted October 25, 2005 7:16 AM
Permalink | No Comments |