Business Intelligence Network business intelligence resources

Blog: Dan E. Linstedt


December 23, 2005

Who's on First - Abbott and Costello (parody)

This funny parody was submitted by my good friend Kent Graziano, but we do not know who the original author is. Abbott and Costello ringing your phone this holiday?

Subject: Remember "Who's on first?" Enjoy!

You have to be old enough to remember Abbott and Costello, and too old to REALLY understand computers to fully appreciate this, AND for those of us who sometimes get frustrated by our computers, please read on...

If Bud Abbott and Lou Costello were alive today, their infamous sketch, "Who's on First?" might have turned out something like this:

COSTELLO CALLS TO BUY A COMPUTER FROM ABBOTT

ABBOTT: Super Duper computer store. May I help you?
COSTELLO: Thanks. I'm setting up an office in my den and I'm thinking about buying a computer.
ABBOTT: Mac?
COSTELLO: No, the name's Lou.
ABBOTT: Your computer?
COSTELLO: I don't own a computer. I want to buy one.
ABBOTT: Mac?
COSTELLO: I told you, my name's Lou.
ABBOTT: What about Windows?
COSTELLO: Why? Will it get stuffy in here?
ABBOTT: Do you want a computer with Windows?
COSTELLO: I don't know. What will I see when I look at the windows?
ABBOTT: Wallpaper.
COSTELLO: Never mind the windows. I need a computer and software.
ABBOTT: Software for Windows?
COSTELLO: No. On the computer! I need something I can use to write proposals, track expenses and run my business. What do you have?
ABBOTT: Office.
COSTELLO: Yeah, for my office. Can you recommend anything?
ABBOTT: I just did.
COSTELLO: You just did what?
ABBOTT: Recommend something.
COSTELLO: You recommended something?
ABBOTT: Yes.
COSTELLO: For my office?
ABBOTT: Yes.
COSTELLO: OK, what did you recommend for my office?
ABBOTT: Office.
COSTELLO: Yes, for my office!
ABBOTT: I recommend Office with Windows.
COSTELLO: I already have an office with windows! OK, let's just say I'm sitting at my computer and I want to type a proposal. What do I need?
ABBOTT: Word.
COSTELLO: What word?
ABBOTT: Word in Office.
COSTELLO: The only word in office is office.
ABBOTT: The Word in Office for Windows.
COSTELLO: Which word in office for windows?
ABBOTT: The Word you get when you click the blue "W".
COSTELLO: I'm going to click your blue "w" if you don't start with some straight answers. What about financial bookkeeping? You have anything I can track my money with?
ABBOTT: Money.
COSTELLO: That's right. What do you have?
ABBOTT: Money.
COSTELLO: I need money to track my money?
ABBOTT: It comes bundled with your computer.
COSTELLO: What's bundled with my computer?
ABBOTT: Money.
COSTELLO: Money comes with my computer?
ABBOTT: Yes. No extra charge.
COSTELLO: I get a bundle of money with my computer? How much?
ABBOTT: One copy.
COSTELLO: Isn't it illegal to copy money?
ABBOTT: Microsoft gave us a license to copy Money.
COSTELLO: They can give you a license to copy money?
ABBOTT: Why not? THEY OWN IT!

(A few days later)
ABBOTT: Super Duper computer store. Can I help you?
COSTELLO: How do I turn my computer off?


ABBOTT: Click on "START".............

  Posted by Dan Linstedt at 7:42 AM | | Comments (1)


December 16, 2005

VLDW: Clustering Versus MPP

I have the wonderful opportunity to teach a VLDW course at TDWI. I also have the sheer joy of dealing, on a consulting basis, with data that qualifies as VLDW feeds and with some of the most massive systems in the country. But there always seems to be the question: which is better, clustering machines together or one "big honking box"? (Honking = to push the horn button in the middle of your steering wheel.)

Well, as usual, I have a very opinionated stance on this - and have discussed this with Kent Graziano, Richard Winter, and a couple of my other friends. This entry is based on my experience, and what I've seen. I then speculate on what happens to each scenario as the system grows (again based on experience). If there is anyone out there with a different experience, I'd love to hear their thoughts.

Here we go.

So what does VLDW mean anyway?
VLDW means different things to different people, but in my case - VLDW means moving 88M to 1.5B rows per load Job, moving over 5B (billion) rows each load cycle, AND having billions of rows in the target warehouse to start with. In my job, it doesn't mean moving 88M rows one time for historical load purposes, then moving thousands or hundreds of thousands of rows per cycle after that. It also doesn't mean having terabytes of information "sitting" inactive in the warehouse that isn't compared, utilized, queried, or altered during a load cycle. It means actively accessing a majority of those terabytes throughout a week's worth of up-time.

Let's also define what I mean by Clustering:
The clustering that I'm referring to is a single shared disk across several machines, with the machines "wired" together to synchronize memory, CPU and disk processing - in other words, it is meant to "LOOK" like one big machine. I am NOT including a "cluster" that doesn't share disk; that would be a cluster-MPP mix. And of course a cluster that doesn't share RAM or CPU isn't a cluster either - it's MPP (Massively Parallel Processing).

Let's define what I mean by MPP:
Non-Shared Disk, Non-Shared RAM, Non-Shared CPU. Each unit is SMP underneath the covers. The units share a high-speed interconnect that allows them to talk to each other, but each unit is independent of the others. They act as a collective operation, and everything is run in parallel.

Ok, what have you seen in the marketplace?
I cannot list vendor names here - I try not to write with a vendor bias, so I'll use X's, and Y's, and Z's instead. Contact me if you want to discuss specifics. These are TRUE case studies of customers I've visited who are sitting around 25TB to 45TB of active data in their warehouse, and are experiencing issues. Remember - these customers are ALL running data warehouses, this is NOT about the OLTP side of the house.

Customer 1: DB X - 45 TB, Clustered environment, having trouble with Network bottlenecks and I/O synchronization, has a daily call with the engineering staff of DB X vendor, with patches written just to keep their DBMS up and running.
Customer 2: DB X - 35 TB, Clustered environment, having trouble with Network bottlenecks, and I/O throughput - wondering what they can do to "fix" their problems, doesn't want custom patches from DB X vendor.
Customer 3: DB X - 12 TB, Clustered environment, trouble with data mining operations, switched to DB Y (MPP solution).
Customer 4: DB X - 18 TB, Clustered environment, trouble with loads, and query times - switched to DB Z and is growing rapidly now.
Customer 5: DB Q - 12 TB, Clustered Environment, trouble with loads, index maintenance, and RAM allocation, switched to DB Y and is growing rapidly.

Ok, here's my two cents: take it for what it's worth, this is the thought experiment I set out on to find out WHY clustering began to exhibit problems at specific levels of volume when MPP showed no signs of slowing down. Here's what I found, and what I speculate:
1. There is an inherent CAP on processing power available in most clustered environments. The same can be said for MPP - BUT typically in an MPP environment the CPUs are much faster, the bus speeds are much faster, and the CPUs don't have the EXTRA load of trying to synchronize RAM and I/O across the network. In MPP each independent node is responsible for its own operations.
2. As the data set increases in size, managing the historical information and keeping all the records in SYNC requires an exponentially increasing amount of hardware and hardware performance. That means: in a cluster - to keep the DISK MAP synchronized across all nodes requires more and more network bandwidth, and more and more I/O bandwidth.
3. The DISK MAP is only one map, there's a RAM map, and a CPU MAP to maintain as well. Remember: the true cluster "appears" as one big honking machine to the application.
4. As the data set increases, the synchronization efforts double, and double again - exponentially eating up the available resources.
5. As the data set increases, it requires more SMP nodes to be attached to the cluster - but adding more SMP nodes also exponentially increases the difficulty of synchronization.
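
To put rough numbers on the synchronization argument, here is a minimal sketch. It is a back-of-the-envelope model, not a measurement: it assumes (my simplification, not any vendor's figure) that in a shared-everything cluster every node has to keep its maps in sync with every other node, so the number of synchronization channels is n(n-1)/2, while shared-nothing MPP nodes need no map synchronization at all. Under that assumption the growth is quadratic rather than literally exponential, but the direction is the same: the sync overhead grows faster than the capacity being added. The function names are hypothetical.

```python
def sync_channels(nodes: int) -> int:
    """Pairwise synchronization channels in a shared-everything cluster.

    Assumption for illustration only: every node keeps its disk/RAM/CPU
    maps in sync with every other node, so channels = n * (n - 1) / 2.
    """
    return nodes * (nodes - 1) // 2


def mpp_sync_channels(nodes: int) -> int:
    """Shared-nothing MPP: each node owns its slice, so no map sync (assumed)."""
    return 0


if __name__ == "__main__":
    for n in (2, 4, 8, 16, 32):
        print(f"{n:>2} nodes: cluster={sync_channels(n):>3} channels, "
              f"MPP={mpp_sync_channels(n)}")
```

Doubling the node count roughly quadruples the channel count in this model, which is one way to read "double, and double again".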

And so on, the problem compounds itself in such a way, that no amount of money in the world (to throw at the problem) can solve the amount of performance required to handle such large data sets.

A basic tenet in life is to "Divide and Conquer" when we are faced with large problems. We need to learn to apply this to our data warehouses, especially in VLDW. The only way to divide and conquer is to use MPP - OR to really buy a "big-honking SMP machine".

What am I saying?
I'm saying that in order to technologically bridge the gap, the first problem to solve is the network throughput - and today with SMP clusters, there is an upper limit between servers at which networks can operate. I have yet to see a DS3 "IP card" or even a "T1" IP card that is capable of wiring together clustered SMP nodes, or wiring disk to CPU. But the problem goes beyond that, the network card (at a hardware level) would then have to take on CPU power to overcome the next problem, lack of CPU available to run the "synchronization routines", and the problem expands. The bigger the data set, the more challenging it is to SOLVE. There's a mathematical hard-core upper limit to what technology can "do".

This is why, if you look at a machine that runs as a HUGE SMP (32 to 64 CPUs and 64 GB RAM), you see a super power-horse, and also why these machines cost so much. The company that produces that machine has gone through the trouble of solving these problems (or eliminating them) through hardware BUS architecture. It's only on these machines that DB X has been scaled beyond the TB levels I've put here.

Now, there are a couple of other things I wish to note. First, there are SMP clusters that are rack-mount, where the interconnects are a back-bone and a direct connect across the machines. These hold off the problems, but only for a little while. Second, there are the SMP appliances: when plugged in, they act as an MPP architecture of independent nodes, and they are taking market share from the leading MPP RDBMS vendors. Dedicated rack-mount SMP clusters that act as a "unit" within an MPP environment (handling only a portion of the overall data set) work REALLY REALLY WELL and are extremely fast, plus they offer the benefit of fail-over and recovery at lower cost than a single LARGE SMP unit within an MPP environment.

Mathematically there doesn't seem to be an upper limit to MPP data handling, mostly because adding another "node" to the MPP chain divides the work further - and doesn't necessarily "add" to the complexity because synchronization is not needed.
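
The "adding a node divides the work further" idea can be sketched in a few lines. This is an illustration under my own assumptions, not how any particular MPP database distributes data: rows are assigned to nodes by hashing a key (the `node_for` scheme is hypothetical), each node totals its own shard without looking at anyone else's, and the partial results are combined at the end.

```python
from collections import defaultdict
from zlib import crc32


def node_for(key: str, node_count: int) -> int:
    """Pick the owning node for a row by hashing its key (hypothetical scheme)."""
    return crc32(key.encode("utf-8")) % node_count


def partition(rows, node_count):
    """Split (key, amount) rows into per-node shards."""
    shards = defaultdict(list)
    for key, amount in rows:
        shards[node_for(key, node_count)].append((key, amount))
    return shards


def node_total(shard):
    """Work done independently on one node: no shared disk, RAM, or CPU."""
    return sum(amount for _, amount in shard)


if __name__ == "__main__":
    rows = [(f"cust-{i}", i) for i in range(1000)]
    for nodes in (2, 4, 8):
        shards = partition(rows, nodes)
        partials = [node_total(s) for s in shards.values()]
        # The combined answer is the same however many nodes share the work.
        print(nodes, "nodes ->", sum(partials))
```

The answer does not change as nodes are added; only the size of each node's slice does, and no node waits on another until the final combine step.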

I'll give you several cases where I've been that have MPP in their environments (these are all commercial environments, as the public sector environments are much larger, but cannot be discussed).
1. MPP - 348 TB (2 years ago) and still growing
2. MPP - 5 Petabytes (that's all they can keep on-line), they generate 1 Petabyte for every experiment they run.
3. MPP - 150 TB (last year) and still growing
4. MPP - 268 TB (last year) and still growing
5. MPP - 3 Petabytes (scientific research) and still growing

Big problems require big solutions. I'd be happy to speak with you off-line about this information, as I teach VLDW, and performance and tuning, as well as systems architecture, design, and scalability for the future.

Bottom line: Clustering (the way I've defined it here) is not suggested for your future if you expect large volumes. I welcome any thoughts, critical or otherwise. I'd love to hear about successes in the clustered environment; maybe we can flesh out what's acceptable to cluster.

Thanks,
Dan Linstedt

  Posted by Dan Linstedt at 5:49 AM | | Comments (2) | TrackBacks (1)


December 13, 2005

Competitive Decision Time Is Shrinking

We've all heard it, it's there. Most of us know it - yet we refuse to accept it. There are strange happenings within the strategic use of information across the organization. The REAL question going forward will be: what will the value of STRATEGIC data sets be to the organization in the future? The whole question invites the opposite thought process: does a bunch of fast TACTICAL decisions today define the STRATEGIC decision of tomorrow?

In this day and age executives and decision makers are finding less and less time to "decide" what to do strategically with the organization. Yet strategic decisions become more valuable when they are made in the RIGHT-TIME. Tactical decision making is on the rise, and in fact tactical decision makers are using "learned" information from a strategic base of data (patterns to create knowledge) to TEST their tactical decisions.

Confused yet? Sorry. Here's an example:
15 years ago, if you told me that you had a decision for where the company needed to go (you had a strategic plan for the next 5 to 10 years) and that you used 20 years or 30 years of historical data and trends to reach that plan, I might have said: great - you seem like a solid company, and the plan looks good.

Today, if you tell me you have a strategic plan for the next 10 years, I might begin to question how agile your company is to changing market conditions. I might tell you your company may not be around in 5 years (unless the plan changes as you go along).

What does this mean?
It means a few things.
1. A decision maker has less time (overall) to make good decisions today than 5 or 10 years ago.
2. Each decision applies to "less" time moving forward, particularly because the market conditions change so quickly.
3. Larger and larger companies must become more adaptable and agile in order to compete with the leading edge companies - particularly once an idea has proved itself worthy, and is beginning to "cut a new market" out.
4. BETTER decisions made faster require an ever-increasing amount of background data (usually learned patterns from a result of data mining).
5. STATIC Strategic long-term plans are no good. GOALS on the other hand, and OBJECTIVES are wonderful - but need to be created in such a manner that the strategic plans are flexible, and can be "thrown out at a moment's notice and replaced with new plans." This means sunk-cost for poor strategic decisions.
6. Competition is driving ever shorter "decision cycles", as long as we continue with increasing direct contact with the customer, and empowering the customer to make choices, there will be less and less "group think" and more and more "individualism", making it harder to "trend" and harder to predict why people do what they do.
7. Each decision is becoming more and more critical, not to mention more and more personalized. This is making it even harder to set and stick to a static strategic plan.

I'm not saying that all strategic plans are washed up, nor am I saying that strategic decision making is completely gone, nor will it ever completely go away. I am saying that the nature of strategic decision making is changing - to be more agile. The lines of what's tactical and what's strategic are changing and blurring together. I am saying that strategic and tactical decisions (if made incorrectly, or without enough learned background) cost more today than they did in the past. I am also suggesting that tactical decisions need to be made on the basis of data mining of all that history.

The notions of time are speeding up (see Ray Kurzweil, and The Age of Spiritual Machines). What I would like to know is: in your organization, or companies you've worked in (without mentioning names), what have you seen in regards to their ability to think Tactically vs Strategically? Do they have long term plans that guide them and are unchanging?

It no longer pays to be a dinosaur of giant proportions; it seems to pay better to be more like a body of water - fluid, dynamic, and possibly covering large areas of ground.

Cheers,
Dan L

  Posted by Dan Linstedt at 6:58 AM | | Comments (0)


Where does EII need to go?

In this entry I will explore some futuristic capabilities (a wish list) of features that I would like to see EII work towards. The real questions are beginning to surface about EII and ETL / ETLT and EAI. There are other questions about web-services, security, standardization, and the best practices needed for implementation of SOA around the enterprise. Let's take a look at the feature set that may be needed via an EII tool in the near future.

What are some of the business problems that EII solves compared to ETL and EAI?
* Access to "now data", current view of transactions across multiple disparate source systems
* Management of Metadata (currently mostly meta-models) for "conforming" of the data model across the enterprise. In this manner, it may actually assist in the development of the data warehouse of the future.
* Dynamic integration of unstructured and semi-structured data
* Real-Time / Right-Time reporting

Technical Problems that EII solves
* Access to XML, XQuery, XPath data and documents.
* Access to Web Services
* Access to semi-structured and unstructured data sets
* Control over publication of Web-Services
* Definition of consistent enterprise metadata
* GUI Development Interface for web-services

What we need is a single tool, a single interface to handle a much broader set of requirements. EII has such a narrow scope right now (because most EII tools are just now coming into the second generation) that additional functionality is necessary to really take a chunk of the market space. For instance, a huge potential exists for a very strong single GUI in an EII tool to manage, maintain, and help define UDDI registries (in other words manage the web-services through metadata). Today, there appear to be partnerships between EII vendors and "Registry" vendors. This is good, but won't remain a differentiator for long.

Wish list of features
* Virtual Tables
* Registry (UDDI) management and integration
* Automated Query Tuning
* Two-Phase commit across sources that allow write-back
* Management of Security Policies
* Business Metadata and Ontology Support
* Additional bi-directional metadata interface, particularly to work with MetaIntegration
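
Of the wish-list items, two-phase commit across write-back sources is the easiest to show in miniature. The sketch below is a toy, not any EII vendor's API: `Source`, `prepare`, `commit`, `rollback` and `two_phase_commit` are hypothetical names, and a real coordinator would also need durable logging and recovery when the coordinator or a source fails between phases.

```python
class Source:
    """Hypothetical write-back source; not any vendor's API."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.committed = {}
        self._pending = None

    def prepare(self, change):
        """Phase 1: stage the change and vote yes (True) or no (False)."""
        if not self.can_commit:
            return False
        self._pending = dict(change)
        return True

    def commit(self):
        """Phase 2, all voted yes: apply the staged change."""
        if self._pending is not None:
            self.committed.update(self._pending)
            self._pending = None

    def rollback(self):
        """Phase 2, someone voted no: discard the staged change."""
        self._pending = None


def two_phase_commit(sources, change):
    """Apply change to every source or to none of them."""
    votes = [source.prepare(change) for source in sources]
    if all(votes):
        for source in sources:
            source.commit()
        return True
    for source in sources:
        source.rollback()
    return False
```

If one source votes no in the prepare phase, nothing is written anywhere, which is the property that matters when an EII tool writes back to several systems at once.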


The next generation of EII tool will have to extend its metadata reach into business metadata, across process metadata, and down into Web Services management and maintenance. It will have to add Version Control of Registries, web-services as a whole, security policies, and so on. Why? The EII space will need to continue to take chunks of "very hard domain problems" and show enterprise information integration with their solution. They will need to focus LESS on transformation (although that will remain a key function), and focus MORE on additional TYPES of information integration.

Their ability to truly integrate the enterprise and ALL of its data (not necessarily in volume, but remaining true to the notions of currency) will have a huge impact IF this information can also be managed. Reaching into new domains of information integration will help EII grow into a major player in the implementation space.

SOA is growing, best practices are being developed, and web-services and EII are major players in the success of SOA, particularly when EII can provide the management of the Web-Services and their metadata. It's a domain that is a natural fit for EII. The EII vendor of the future will "purchase" a registry solution as their own, and will begin to differentiate beyond other vendors in this area and in what they can do with the metadata. One of the largest keys to success will be: how does the EII tool tackle the problem of "bringing that management to the end-user?"

In other words, can the tool provide enough of an end-user or business user interface to entice metadata management to take place as a natural function of business? The GUI interface and codeless solutions will become more and more important, tying the metadata to a master integrated meta-model (single view of the enterprise) will also become paramount to success. Finally, the EII tool that can communicate bi-directionally with a metadata solution will have tremendous success, as business users see added leverage for utilizing a single GUI interface to assist with true EII.

Do you agree / disagree? I'd love to hear your thoughts on the matter.

Thanks,
Dan L

  Posted by Dan Linstedt at 6:35 AM | | Comments (0)


December 8, 2005

Virtual "Data Tables" for EII

There is a new concept on the horizon of EII known as Virtual Tables: structures and temporary data stores that capture data from the sources and refresh it on request. In this entry we will explore the nature of virtual tables and temporary data storage - the pros and cons of this mechanism. I'm not sure it belongs in Dynamic Data Warehousing, but it's not your "ordinary" mechanism for data access, therefore it's dynamic in nature - ever changing without management. Without further ado, let's take a look at this concept.

There's a vendor: Ipedo, who has produced a thing called Virtual Tables for EII queries. What does this mean? What does this bring to the table? Who can use it? Should it be implemented across the board?

We'll try to answer a few of these questions. All that I blog on here is based on my personal experience and speculation for what the future holds. There's nothing I like more than to have those in the field offer their opinions in response to my blog entries. Thank you to all those who've commented in the past year, and I look forward to additional comments here on this.

What's a Virtual Table anyway?
Well, according to Ipedo, a virtual table is a structure that houses data - it is built on the fly and captures the data on request from the source (in case the same request comes down the pike again). The Virtual table can represent a "time-based snapshot" of data at a particular instant. It won't hold data for long (how long is the question I have yet to ask the vendor), but nonetheless, holding data that "may already have changed" or has disappeared from access can lead to some very interesting results.

What's the power in a virtual table?
The power is: capturing the data in the CURRENT structure during the CURRENT request, and time-stamping the result. The Business Power is the ability to say: Wait, did I just see what I thought I saw? And re-run the request and get the same result. This is usually reserved as a fundamental tenet of Data Warehousing questions. But in this case we are dealing with TACTICAL data (real-time data). The impacts to the business can be huge. Imagine asking a question in which the source data changes every 5 seconds, or "disappears" after a short period of time. In these cases having the data AND the structure stored in a virtual format can make all the difference in the world - and enable the same report to be run twice. That is, the same TACTICAL report.
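
To make "re-run the request and get the same result" concrete, here is a minimal sketch of the idea as I understand it. It is not Ipedo's implementation: the `VirtualTable` class, the `fetch` callable, and the time-to-live value are all assumptions for illustration. The snapshot keeps the timestamp, the structure (column names) and the rows together, and serves them again until the time-to-live runs out.

```python
import time


class VirtualTable:
    """Timestamped snapshot of one source request (illustrative only)."""

    def __init__(self, fetch, ttl_seconds, clock=time.time):
        self._fetch = fetch          # callable returning (columns, rows)
        self._ttl = ttl_seconds      # how long a snapshot stays valid
        self._clock = clock          # injectable so the sketch is testable
        self._snapshot = None

    def query(self):
        """Return (captured_at, columns, rows), reusing a fresh snapshot."""
        now = self._clock()
        if self._snapshot is None or now - self._snapshot[0] >= self._ttl:
            columns, rows = self._fetch()
            # Copy so later changes at the source cannot alter the snapshot.
            self._snapshot = (now, tuple(columns), [tuple(r) for r in rows])
        return self._snapshot


if __name__ == "__main__":
    ticks = iter(range(100))

    def ticker():
        """Hypothetical source whose data changes on every request."""
        return ["symbol", "price"], [("ABC", next(ticks))]

    table = VirtualTable(ticker, ttl_seconds=60)
    print(table.query())
    print(table.query())  # same snapshot, even though the source has moved on
```

The second call returns the same timestamp, structure and rows as the first, which is the "did I just see what I thought I saw?" guarantee for as long as the snapshot lives.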

Don't get me wrong, a Virtual Table is NOT a replacement for the warehouse.
In fact, the "data" in a virtual table should be FED to the warehouse as a part of the "transactional" or real-time/right-time feeds. In this regard the virtual table buys time for the warehouse to collect the data (while it changes in the background). This is particularly special IF a lazy (queued) feed can be constructed to provide this data to the warehouse. But we would want to accompany it along with WHEN the data was pulled, WHAT the structure looks like RIGHT NOW, and any metadata that arrived on that feed as well to describe the data housed in the virtual table.

Virtual Tables when used within the right context provide the EII tool with a powerful solution.
Particularly (again) when the data set on the source side fades, or changes, or is no longer available.

What are the pros and cons of the Virtual Table within an EII solution?
I speculate the following and welcome any comments that extend either list.
Pros:
* Temporary static structure
* Temporary static data
* Ability to "get the same results" from a query run within X minutes
* Dynamic restructuring (after the data decays and the structure changes, a new virtual structure can be built)
* If the structure remains consistent, the query can be run every X minutes to determine CDC (depending on the load on the source systems)
* A mechanism to "house" an XML feed, a stock ticker, or the results of a service request from a provider.
* Possibility to define extra METADATA on the virtual table and structure.
* Possibility to "monitor" the virtual table for compliance reasons, and to feed the data in a queuing fashion down to the data warehouse.
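
The CDC item in the pros list can be sketched by comparing two snapshots taken X minutes apart. This is a simple full-compare under my own assumptions (rows are tuples, one column is a stable key, and the structure did not change between the two pulls); the function name is hypothetical and real CDC mechanisms are usually log- or trigger-based.

```python
def diff_snapshots(old_rows, new_rows, key_index=0):
    """Compare two snapshots keyed on one column.

    Returns (inserts, updates, deletes) as lists of rows.
    """
    old = {row[key_index]: row for row in old_rows}
    new = {row[key_index]: row for row in new_rows}
    inserts = [new[k] for k in new if k not in old]
    deletes = [old[k] for k in old if k not in new]
    updates = [new[k] for k in new if k in old and new[k] != old[k]]
    return inserts, updates, deletes


if __name__ == "__main__":
    earlier = [(1, "a"), (2, "b"), (3, "c")]
    later = [(1, "a"), (2, "B"), (4, "d")]
    print(diff_snapshots(earlier, later))
```

The three lists are exactly what a queued feed to the warehouse would need, along with the time each snapshot was pulled.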

Cons:
* Extra memory necessary within the EII server.
* Extra processing power needed to keep the virtual tables in sync
* Possibility for structure change between "calls" to the source data, resulting in multiple memory images of virtual tables.
* Requires Metadata management cycles in the EII tool to help handle the expansion of the table sets, and definition of the sources.

What are some of the challenges?
* Managing a virtual table to represent unstructured data sources.
* Managing changing structures on web-services. However, Registries should keep this clean through published APIs of metadata, structure, and accessibility.

Bottom line?
Virtual tables within EII are a baby step in the right direction. I feel there is tremendous hidden and yet undiscovered power within the Virtual Table concept - I think, given time - applications for the Virtual Table will become an integral part of data integration processing (which includes the feeds to the enterprise data warehouse).

Thoughts? Comments? What do you think about a virtual table?

Cheers,
Dan L

  Posted by Dan Linstedt at 12:18 PM | | Comments (2)


December 2, 2005

Data Mining and the Active Data Warehouse

Where is data mining these days? What power can it bring to the table? If I build an Active Data Warehouse, will it come? Is it necessary?

There are many questions floating around these days, and I've written a little bit about this topic in the past. In this post I will attempt to discuss some of the newer thoughts about this subject, and push the envelope out a little further than maybe we're comfortable with. This entry is a thought experiment, but has implications in today’s computing arena.

Data Mining has grown up over recent years. It's been around for a long, long time, but I guess I should say it's become much more "usable" in the business user's eyes, and it's beginning to appear as embedded technology. It's now plugged in to Teradata RDBMS, Oracle RDBMS, SQL Server 2005 Integration Services and RDBMS, DB2 UDB RDBMS, FirstLogic IQ Suite, SAS ETL and BI tools, and so on - there are too many to list here. The point is that it is beginning to be utilized to enhance the quality of information.

"According to the example of Baosteel production, this paper introduces the way of using data mining technology -- SAS/EM to discover the rules that we don’t know before and it can improve the quality of products and decrease the cost." (1)

Data mining is not just about data quality, it's also about business process quality, deeper understanding of our environment, and the quality of our products. In this company's case they concluded that "...How to use data is an important thing that faces everyone. We should apply the data mining technology to more fields." (1)

I would tend to agree. Additional fields (in my mind) include mining active data as it arrives - in context with the strategic data that it's already "learned" or established a knowledge pattern for. Other areas may include mining the architecture in which the data sits (i.e. the data model), mining the processes that link the data together - looking for flaws or better ways to deal with it, mining the metadata around the data set for additional context establishment, and so on.

"In this paper we introduce data quality mining (DQM) as a new and promising data mining approach from the academic and the business point of view. The goal of DQM is to employ data mining methods in order to detect, quantify, explain and correct data quality deficiencies in very large databases. Data quality is crucial for many applications of knowledge discovery in databases (KDD). So a typical application scenario for DQM is to support KDD projects, especially during the initial phases. Moreover, improving data quality is also a burning issue in many areas outside KDD. That is, DQM opens new and promising application fields for data mining methods outside the field of pure data analysis. To give a first impression of a concrete DQM approach, we describe how to employ association rules for the purpose of DQM." (2)
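
As a rough illustration of using association rules for data quality (not the algorithm from the paper, which I am not reproducing here), the sketch below takes one hand-picked rule, measures its confidence over the records, and flags the records that break it once the rule is strong enough to trust. The field names, the sample data and the 0.9 threshold are all made up for the example.

```python
def rule_confidence(records, lhs, rhs):
    """Confidence of the rule lhs -> rhs; lhs and rhs are (field, value) pairs."""
    lhs_field, lhs_value = lhs
    rhs_field, rhs_value = rhs
    matching_lhs = [r for r in records if r.get(lhs_field) == lhs_value]
    if not matching_lhs:
        return 0.0
    matching_both = [r for r in matching_lhs if r.get(rhs_field) == rhs_value]
    return len(matching_both) / len(matching_lhs)


def suspects(records, lhs, rhs, min_confidence=0.9):
    """Records that break a rule strong enough to be trusted."""
    if rule_confidence(records, lhs, rhs) < min_confidence:
        return []
    lhs_field, lhs_value = lhs
    rhs_field, rhs_value = rhs
    return [r for r in records
            if r.get(lhs_field) == lhs_value and r.get(rhs_field) != rhs_value]


if __name__ == "__main__":
    # Hypothetical data: 19 consistent rows and one likely typo.
    rows = [{"zip": "80301", "city": "Boulder"} for _ in range(19)]
    rows.append({"zip": "80301", "city": "Bolder"})
    print(suspects(rows, ("zip", "80301"), ("city", "Boulder")))
```

A real data quality mining run would discover the rules from the data rather than being handed one, but the flagging step looks much like this.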

Active Data Warehousing is about integrating the ODS and Data Warehouse into a single instance, single data store. It's about capturing data as it happens (at the right time) into the warehouse as a statement of fact, and then using that data or leveraging the data to make both strategic and tactical decisions in time with the enterprise. Active Data Warehousing also brings in massive sets of information to deal with, thus making the task doubly difficult. Of course, with an Active Warehouse we also need to utilize real-time arriving data.

One notion I've believed in is something I call Active Mining. Active Mining is the ability to start a neural net, pre-load it with the historical data, and then as data arrives (when it arrives), add it to the neural net already in play. In other words - no waiting, no "re-running" of the mining algorithms to get the result. Of course in the beginning (or depending on how much history is mined when started), the neural net may need to be shut-down and restarted - but as time goes on, less and less correction is necessary.
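
Here is a minimal sketch of what I mean by Active Mining, using the smallest possible "neural net": a single neuron (a perceptron) that is warmed up on history and then adjusted one record at a time as live data arrives, with no re-run over the history. The class and function names, the learning rate and the sample data are assumptions for illustration; a production model would be far richer.

```python
class OnlinePerceptron:
    """Single-neuron learner updated one record at a time (illustrative)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        """Return 1 if the weighted score is positive, else 0."""
        score = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1 if score > 0 else 0

    def learn(self, features, label):
        """Adjust weights only when the current prediction is wrong."""
        error = label - self.predict(features)
        if error:
            self.bias += self.learning_rate * error
            self.weights = [w + self.learning_rate * error * x
                            for w, x in zip(self.weights, features)]


def preload(model, history, passes=10):
    """Warm the model on historical rows before live data starts arriving."""
    for _ in range(passes):
        for features, label in history:
            model.learn(features, label)


if __name__ == "__main__":
    model = OnlinePerceptron(n_features=1)
    preload(model, [([1.0], 1), ([-1.0], 0)])   # made-up history
    model.learn([0.5], 0)                        # a live record arrives
    print(model.weights, model.bias)
```

The live record nudges the existing weights in place, which is the "no waiting, no re-running" behavior described above.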

I believe that active mining will take the forefront and will be embedded in every process through the streams of data that we deal with on a daily basis. However, that's not to say that there's no value in storing the existing level of detail as a statement-of-fact in the warehouse; there certainly is value to that. But moving forward, dynamically understanding how well the new data fits may become a critical factor of business operations.

Speaking of business operations, there is a company called Purple Insight which has (in my opinion) begun to master the ability to tie data mining, and results to visualization. Check them out here. Using Active Mining to feed a live visualization of the data may also begin to play a powerful role in the future "use" of our information sets.

References:
1. Data Mining Quality Improvement - http://www2.sas.com/proceedings/sugi27/p111-27.pdf
2. Data Quality Mining - http://www.cs.cornell.edu/johannes/papers/dmkd2001-papers/p5_hipp.pdf

  Posted by Dan Linstedt at 6:03 AM | | Comments (1)


DNA Computing & Tic-Tac-Toe

I came across this entry this morning, where DNA computing in enzymes has been activated to play tic-tac-toe. Apparently (the article says) the system cannot be beaten. The article also goes on to discuss how the enzymes affect the DNA strands around them, cutting, splicing, and attaching depending on their choice. In this blog posting I will explore what some of the "possible applications" of this technology might be - a simple thought experiment, if you will.

The article can be found here. I've spent a lot of time writing about the nature of convergence, and the fact that I believe "wet-technology" or the mix between natural world models and our electronic models is coming together. Nothing is more evident here. In this particular case we have electronic gates / switches that we normally use to play tic-tac-toe, only they are placed into DNA enzymes. This raises some very interesting questions:

1. How parallel is this DNA computer?
2. If it has so many parallel operations, how fast is it in terms of "operations per second?"
3. What would happen if we took multiples of these tic-tac-toe boards, and tied them together for a game of Othello? If we did this for Chess, what would need to change?

We're always molding our natural world into models that we see to fit our needs, for instance - moving the tubes into a sequence to represent tic-tac-toe. What if instead, we utilized a single strand of folded DNA in three dimensions to represent a tic-tac-toe board? Could the single solution with a single DNA strand play the game on a much smaller level?

This is the type of question that would lead deeper into the Nanohouse abilities. The ability to control a single DNA strand, and utilize a model that already exists to achieve our goals. We would have a much larger scale repeatable model if we could do this.

The thought experiment:
Say for an instant that you had 120 train tracks back to back, and 120 trains (1 on each track). Now say each train has 120 cars - each one uniquely different - and each train's engine was unique: color, size, shape, horsepower, motor drive, etc. Now suppose within each train, a series of cars represented a "square" on the tic-tac-toe board. In other words, 20 cars from each train represented one square. Each train represents a different tic-tac-toe board.

Now suppose we released 120 people and told them to go "take" six cars from each train - the only requirement is that they all need to choose a different 6-car set. This might represent the chemical release to a DNA strand, where each of the "people" or incoming chemical mix matches with a specific DNA place in the chain. By repeating this process, and having the "computer" or the "game" choose other car sets, you've effectively re-created a logic gate computing device at the DNA strand level.

The other thing we've done here is suggest that the computations occur in parallel, and that data sets can be different for each "action" - told to attach itself to DNA at different parts of the strand. We've effectively re-created the possibility to play a very large number of finite "games", all in parallel. Very quickly the "winning pattern" will emerge; these may become the rules that are applied to the next engine going forward - in other words, spot the "learning pattern".

Of course, change the game - and we have to start all over again. The learned rules for Tic-Tac-Toe don't necessarily work for checkers or chess.
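
To show how finite tic-tac-toe really is, here is a small sketch that plays out every possible game on an ordinary computer. It runs one game after another rather than in parallel, so it only stands in for the massively parallel exploration imagined above, and it has nothing to do with how the DNA system in the article works. The function names are my own.

```python
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]


def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_games(board=None, player="X"):
    """Count finished games reachable from board as (x_wins, o_wins, draws)."""
    if board is None:
        board = [None] * 9
    won = winner(board)
    if won == "X":
        return (1, 0, 0)
    if won == "O":
        return (0, 1, 0)
    if all(cell is not None for cell in board):
        return (0, 0, 1)
    totals = [0, 0, 0]
    for i in range(9):
        if board[i] is None:
            board[i] = player                      # try this move
            result = count_games(board, "O" if player == "X" else "X")
            board[i] = None                        # undo it
            totals = [t + r for t, r in zip(totals, result)]
    return tuple(totals)


if __name__ == "__main__":
    x_wins, o_wins, draws = count_games()
    print(x_wins, o_wins, draws, x_wins + o_wins + draws)
```

With X moving first there are 255,168 complete games in total, a number small enough to exhaust in moments. Checkers or chess would need a different move generator and a very different budget, which is the "start all over again" point.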

Some of the other questions still rolling around in my head are:
1. Is DNA Computing (or will it ever be) faster than electron based spin technology?
2. What are the pros and cons of DNA computing versus electron spin technology?
3. What are the recent strides that electron spin technology has made?

It seems to me that electron spin computing has a way to go, and isn't making advances as fast as DNA computing, but that remains to be seen. It also appears to be a more difficult challenge, as DNA molecules are much larger than the atomic scale at which electron-based control has to work. However, I must ask the question: if we can search 10^8 terabytes of DNA solution in 3 seconds, how fast (if it ever can be done) will an electron spin computing device search 10^8 terabytes? I must also ask: is it really worth the cost or difficulty of overcoming electron spin's obstacles to make it happen?

I'd love to hear your thoughts and ideas.

  Posted by Dan Linstedt at 5:37 AM | | Comments (0)