Blog: Dan E. Linstedt


Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW, and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

November 2005 Archives

As we celebrate Thanksgiving this year, I would like to remind you of the many turkeys that will be spread out on dinner tables all across America. There will be pandemonium (which should have already begun among the young turkey cookers anonymous) in preparing all the fixings and even the turkey itself - the quest for the ultimate tasty, juicy, best-browned bird in the world. Then the families will gather around a single bird - hopefully large enough to feed everyone (whose eyes are bigger than their tummies - I know mine are)...

The stuffing will be cooked in parallel with the bird, and subsequently partitioned - once the bird and the stuffing are sufficiently cooked. The mashed potatoes, green beans, and other vegetables will be in various states of readiness on the stove, the cook hoping that all the independent food preparation jobs will finish on time and on budget.

There's the additional complication of moving all the independent food categories in harmonious synchronicity to the table at the same time; hopefully they'll all be steaming hot - and won't need a quick run through the oven process of warming over (once again) before being served. Of course, each separate item requires a different amount of preparation, and given that there are only 4 or 5 burners on the typical stove, and generally one oven - it's a miracle that we deliver the dinner on time at all.

Of course, the warming-up-the-buns process requires a different business rule (temperature) than either keeping the turkey warm (once done) or cooking it in the first place. These different business rules usually require 3 to 5 cookbooks (more abstract business rules) that all provide strange instructions on how to prepare the single dish being built. Not to mention that the turkey has a dependency on being thawed out before it can actually be processed. Now we want the gravy - well, we have two choices: go the easy route and buy an external source of gravy, or go the hard route and WAIT until the turkey is done, then in one fell swoop dip down and bring the meat juices to the pre-prepared gravy additives - making sure that the temperature reflects just the right amount of integration heat.

Then we get to the preparation - the setting of the table. Time to call on the kids (child processes - and I mean this kindly) to cooperate - who by now are running around the house playing with other child processes and wreaking havoc on the neatly put-away toys. Not only do they have to synchronize, they have to put all the plates, silverware, glassware, drinks, gravy boats, carved turkey, and vegetables on the table all at the same time - and of course it has to be done before the food gets cold, so they are forced into performing an immediate task, some would say an emergency fix, and producing an acceptable result for the business users, uh - I mean adults.

Meanwhile all the adults are standing around, bellies already full of other information, due to the over-processing of wine, cheese, and crackers - possibly the occasional olive or two. Now the adults must acknowledge the children's efforts, and everyone is expected to sit down in great harmony. Of course there's always the one adult process that fights with another adult process over who sits where; this is resolved by swapping one adult with another for a place at the CPU, oops - I mean table. Of course this doesn't prevent the children from kicking each other under the CPU, whereby each of those processes promptly receives a time-out from the moderator or host.

Just think: this is happening every year, across America (and everywhere else Thanksgiving is celebrated), at a dinner table near you. Imagine trying to assign the metadata to manage all these independent processes across every household, and to assign metadata registry entries to manage all the dependencies, the turkey cooking, the timing, where everyone sits, who gets along with whom - and that this all has to happen before midnight on the 24th of November.

This is just a peek into our world of Enterprise Integration specialists. For now - have a happy Thanksgiving, I know I will.... Wine and cheese are waiting...

How does your Thanksgiving represent your job? I'd love to hear feedback.


Posted November 23, 2005 3:45 PM
Permalink | 1 Comment |

There is a lot of buzz in the Nanotech sector these days. Many developments have come forth in just the past year alone. Things that people said couldn't be done for another five to ten years have been accomplished: everything from self-assembling structures to the use of motor molecules to move things around. There are a few things that have caught my eye, and in this blog I will recap just a few of them.

The first is: "Neuroscientists break code on sight." In this unbelievable article, the neuroscientists actually figured out (or at least started to figure out) a way to encode images - or rather, found some of the mechanisms within the brain that are responsible for encoding the images that are seen.

Why is this important to me?
From a business perspective, it could mean that a) we could build better visual recognition systems, and b) we could encrypt or encode images in extremely compressed formats. Imagine: if all we needed were the "neuron imaging program" to rebuild the image from a very small set of data, this would change the entire nature of compression/decompression. In other words - what does it take to constitute a particular image with precise information? Notice that the article shows the images in black and white; there must be another component of the neural network that processes colors. However, this does show a recognition of depth through hue and saturation.

From a business intelligence perspective it could mean a) much better data visualization, b) new ways of abstracting information, and c) a combination of form and function where data points represent the neural network - resulting in "learning something new" rather quickly. Think about it this way: what if we constructed the world's ONLY "universal data model" with specific functions attached to each point, and then, by lighting up those points with different intensities (applicability scores), we could end up with an image or a thought or a fact? This is the way I see this particular advancement. More on how that might work, later...
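
To make that idea slightly more concrete, here is a minimal Python sketch of my own (purely hypothetical - the names, structure, and scoring are my assumptions, not anything from the article) of a model whose points carry functions and are "lit up" with applicability scores:

```python
# Toy sketch of a "universal data model": every point in the model carries a
# function, and a query "lights up" points with different intensities
# (applicability scores). The composite of the activated outputs stands in
# for the "image, thought, or fact" described above. All names here are
# illustrative assumptions, not an implementation from the article.

from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ModelPoint:
    name: str
    function: Callable[[float], float]  # behavior attached to this point


class UniversalModel:
    def __init__(self) -> None:
        self.points: Dict[str, ModelPoint] = {}

    def add_point(self, name: str, function: Callable[[float], float]) -> None:
        self.points[name] = ModelPoint(name, function)

    def activate(self, intensities: Dict[str, float]) -> Dict[str, float]:
        """Light up points with the given intensities and collect their outputs."""
        return {
            name: self.points[name].function(score)
            for name, score in intensities.items()
            if name in self.points and score > 0.0
        }


if __name__ == "__main__":
    model = UniversalModel()
    model.add_point("edge_detector", lambda s: s * 0.8)
    model.add_point("color_channel", lambda s: s * 0.2)
    model.add_point("depth_cue", lambda s: s * 0.5)

    # Two different activation patterns over the same model produce two
    # different "facts" without changing the model itself.
    print(model.activate({"edge_detector": 1.0, "depth_cue": 0.6}))
    print(model.activate({"color_channel": 0.9}))
```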

The next story comes from a company I've been watching for the past two years: Nanosys.
"Nanosys Announces Issued Patent Covering Fundamental Nanowire Heterostructures"
This story is also interesting for what Nanosys has accomplished. When we read the intro blurb about the development, notice what Nanosys says about the application of this technology:

"This technology covers a broad variety of devices including Field Effect Transistors (FET), light emitting devices including Light Emitting Diodes (LEDs) and nanolasers, solar cells, thermoelectric devices, optical detectors, and chemical and biological sensors."

What's interesting here is that Nanosys has proven with this one device that nanotechnology does indeed cross many different aspects of life, from the technology sector to the chemical and biological sectors. This underscores the importance of convergence, something I've been blogging on for over a year. The next quote from this story raises some very interesting questions in my mind...

"The technology to integrate different materials at the nanoscale enables us to create nanostructures that perform as devices with multiple functions rather than just materials," said Calvin Chow, Nanosys' Chief Executive Officer. "This significantly increases the value of our nanostructures while simplifying their incorporation into products."

The questions I have are:
1. If nanostructures enable the creation of multiple devices with multiple functions, then when does a device begin and a material end?
2. Will we be able to tell the difference between a nanodevice and a nanomaterial?
3. Is it safe to say that a nanomaterial is now a nanodevice and vice-versa?

Assuming that a nanomaterial is now also a nanodevice at the same time, we now have the ability to create the product (or part of it) known as Wellstone (Hacking Matter, Will McCarthy). We could also conceivably create a piece of "wood" made of nanomaterial that can change its composition to a piece of fabric or steel based on programmatic arrangement. Maybe these nanowires are not yet that advanced. Maybe we only have the ability to create a "computationally smart coffee table." Nonetheless, this is a very important discovery.

Here's a fun one: "Molecules that suck"
The interesting part of this is the notion that molecules can "pick up" other molecules, then be told to re-arrange and "release" them. In other words, it sounds as if it's temporary bonding. If I extrapolate the thought process, this could potentially provide a battery-operated surface for gloves and shoes that can "bond" with metal molecules - say, a steel wall - then be released, and re-bonded again. Could it lead to nano-devices for "walking up walls"?

And finally: "Study shows nanoparticles could damage plant life"
This article is very interesting in that it discusses how nanoparticles actually damage other natural-world particles. It brings to the forefront (in my mind) the potential danger of nanotech. In this case, they've actually shown that nanotech can in fact have harmful effects on the natural world. They admit to not knowing "how" this happens yet, but one can speculate that it's like the clogging of the pores in your skin: that the aluminum nanoparticles block the water-absorption pores of the plant root, or maybe that they are absorbed as part of the water and then somehow block the oxygen-creation process within the plant.

We are already aware of the dangers of aluminum particles in the human body, causing everything from memory loss to Alzheimer's disease - basically, the aluminum is absorbed through the skin, lodges itself in the brain, and blocks normal activity. It's no surprise that a metal like this is dangerous to plants as well. But it raises the question: are there circumstances where trace amounts of aluminum nanoparticles could be helpful? And if so, where should they be applied and under what circumstances? If there were a way to keep them from floating through the air when "sprayed on," maybe we have the next-generation weed killer - as long as we don't inhale it or get it on our skin.

Nanotech itself is a phenomenal field of discovery and advancement; each of the pieces I've included highlights a different area of nanotech and its applications or the effects thereof. It will only become more exciting as we dive into next year and begin to see business applications of these components in everyday life.


Posted November 23, 2005 5:05 AM
Permalink | No Comments |

The problem today is that a patchwork of applications is needed to produce a serious integrated view of the enterprise. We should have the following components embedded within our enterprise projects in order to make sense of them: ETL/ELT, EII, EAI, Web Services, Registry Managers, and Metadata Management software. It's a well-known fact that each of these tool sets brings its own value to the table, and that most enterprises today have nearly all the components already in house. As far as compliance is concerned, we are seeing devices (appliances) now that capture and compress transactions as they flow through the enterprise.

A while back I blogged on the future "appliance" needed to keep integration alive, where everything will converge onto a single appliance. I still believe this is true. Hardware is getting cheaper, and software (for some strange reason) is getting more expensive. Sooner or later, hardware vendors will partner with or buy out software vendors, and then they will merge the software onto the hardware platform.

Of course we know why software is getting more expensive:
1. the vendors offer more bells and whistles than they ever have
2. the vendors need to justify the high costs of engineering parallelism, HACMP, and partitioning
3. the vendors are producing ever more proprietary algorithms to solve problems that have already been solved in the MPP world of hardware.
4. the vendors (tongue in cheek) need to pay for their acquisitions of other companies (sorry, just a joke...)

The future integration component will be a plug-and-play device that offers an MPP-style interconnect, self-discovery of other "like devices" on the corporate intranet, and of course parallelism, HACMP, partitioning, and compliance within the device - all of it hardware driven. Needless to say, the device will also offer compression, network sharing, historical copies of raw data, and other pieces like a query interface, information quality transformations, data mining, visualization, and a data modeling interface (just to name a few).

Functionality-wise, it will offer right-time data collection along with batch data collection, compression, and dissemination. The device will be capable of talking to other devices on the intranet and sharing information, recognizing similar information, and moving information around the enterprise during idle or partially idle times.

Of course, with the rise of right-time, the word "idle" will mean something different: instead of "idle" CPU, it will mean "idle" data sets. Speaking of data sets, the focus will be on compressing and utilizing compressed/encrypted data.

So what drives this huge integration?
One word: Metadata. A registry of registries, along with a registry of machines, a registry of data sets, a registry of services, a registry of business logic, and a registry of technical and process metadata. Each of these registries will be managed in a virtual network of registries (a master registry - aka master data management). This registry system will live in a separate device until we figure out how to integrate it, too, into the single working component.
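
As a rough illustration only (a toy structure of my own, not a product design - all names here are hypothetical), a registry of registries can be thought of as a thin index that routes each lookup to the specialized registry that owns that category of metadata:

```python
# Minimal sketch of a "registry of registries" (master registry). Each
# specialized registry maps names to descriptive metadata; the master
# registry only knows which specialized registry handles which category.
# Every name and field below is an illustrative assumption.

from typing import Any, Dict


class Registry:
    def __init__(self, category: str) -> None:
        self.category = category
        self.entries: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, metadata: Dict[str, Any]) -> None:
        self.entries[name] = metadata

    def lookup(self, name: str) -> Dict[str, Any]:
        return self.entries.get(name, {})


class MasterRegistry:
    """Registry of registries: routes lookups by category."""

    def __init__(self) -> None:
        self.registries: Dict[str, Registry] = {}

    def add_registry(self, registry: Registry) -> None:
        self.registries[registry.category] = registry

    def lookup(self, category: str, name: str) -> Dict[str, Any]:
        registry = self.registries.get(category)
        return registry.lookup(name) if registry else {}


if __name__ == "__main__":
    services = Registry("services")
    services.register("customer_lookup", {"endpoint": "http://example/cust", "owner": "CRM team"})

    datasets = Registry("data_sets")
    datasets.register("orders_2005", {"location": "dw.orders", "steward": "Finance"})

    master = MasterRegistry()
    master.add_registry(services)
    master.add_registry(datasets)

    print(master.lookup("services", "customer_lookup"))
    print(master.lookup("data_sets", "orders_2005"))
```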

Bottom line: the data warehouse - or better yet, the data integration store (DIS) - of the future will in fact be a plug-and-play device. Self-healing will be built in through replication. A shared-nothing architecture with the appropriate interchange levels will be provided, along with fault-tolerant, dual-fail-over network connections. The device will be cheap and will scale in cost based on storage needs, but storage needs will be reduced as compression and encryption take over.

If you want to get a handle on your business TODAY, then I would strongly suggest beginning a master data management project, with classification and ontologies of metadata - developing registries and the business intelligence to view/alter/manage those registries across the enterprise. Putting this into a collaborative environment will help it thrive; adding incentives for employees to add metadata and information to the registries, and to manage it, will also produce a thriving result.

I'd love to hear your thoughts on the matter.

Thanks,
Dan L


Posted November 23, 2005 4:33 AM
Permalink | 1 Comment |

In my studies of nanotech reports, massive-scale computing, and extreme parallelism, I constantly come across items that lead to the same end. They all have similar findings, they all proclaim the same thing; it seems a universal axiom is bubbling to the top: Information Modeling is at the heart of successful processing and integration on a grand scale.

In this blog I will explore some interesting experiments that have been conducted in DNA computing, which is one of the precursors to the actualization of the Nanohouse.

Don't get me wrong, the computational side is very important as well; in fact, to get the scalability, FORM AND FUNCTION MUST CONVERGE, and the FORM (the data models) must be flexible and dynamic in nature. This is where Nanotech and Biotech come in; they are currently defining the use of "wet technology," or natural-world models, in our current technological world.

"Computational Mechanisms in Bio-Substrates... Leverage massive parallelism, Harvest Nature's toolkit." (1)
"Computational models of Cells - Natural Computation" (1)

The study goes on to discuss how DNA computing is scalable, programmable, and can exist in a 2D and 3D landscape; they also discuss the nature of self-assembly - a concept usually reserved for nanotechnologists. As I noted in one of my earlier papers and references on DNA computing, 1 gram of DNA can store multiple terabytes of information. This certainly leads to the notion of a compact nanohouse.

The impact of 3D modeling has already been discussed: the ability to fold relationships, see data in a new light, and begin to program systems based on "landscape" notions, or proximity in height, width, and depth. The notions of "model-driven development" are central to the development of nanotechnology.

A parallel can be drawn when we look at business development and understanding, particularly in terms of SOA. When we go to build SOA, the "data models" underneath make all the difference in terms of scalability and flexibility. When I look at VLDB/VLDW, it's the same thing all over again: MPP systems are the tip of the iceberg, and shared-nothing architectures rely HEAVILY on the model of the data underneath in order to achieve maximum query performance.

If we add the DARPA term SPATIO-TEMPORAL modeling to the mix, we can begin to uncover the power of 3D modeling. "Capturing interactions in the network of Gene-protein interactions" (1) - if we can capture the effects of interactions between data sets and weigh their significance using neural computation models, we can begin to dynamically compose and decompose relationships in massively parallel fashion. Beyond that, we can also begin to establish those that are of more importance to us based on historical content and knowledge, or small-context discovery. This would be the self-assembly component of the Nanohouse.
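
Purely as a thought experiment (my own toy sketch in Python, not anything from the DARPA material - the weighting scheme and threshold are assumptions), capturing weighted interactions between data sets and composing or decomposing relationships might look like this:

```python
# Toy sketch: capture pairwise interactions between data sets, weight them by
# observed co-usage, and dynamically "compose" (keep) or "decompose" (prune)
# relationships based on a significance threshold. This is an illustrative
# assumption about how such weighting could work, not a reference model.

from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


class InteractionNetwork:
    def __init__(self) -> None:
        self.weights: Dict[Tuple[str, str], float] = defaultdict(float)

    def observe(self, datasets_used_together: List[str], significance: float = 1.0) -> None:
        """Strengthen relationships between every pair of data sets seen together."""
        for a, b in combinations(sorted(set(datasets_used_together)), 2):
            self.weights[(a, b)] += significance

    def compose(self, threshold: float) -> Set[Tuple[str, str]]:
        """Keep only relationships whose accumulated weight clears the threshold."""
        return {pair for pair, w in self.weights.items() if w >= threshold}

    def decompose(self, threshold: float) -> None:
        """Drop relationships that have fallen below the threshold."""
        self.weights = defaultdict(
            float, {pair: w for pair, w in self.weights.items() if w >= threshold}
        )


if __name__ == "__main__":
    net = InteractionNetwork()
    net.observe(["orders", "customers", "products"])
    net.observe(["orders", "customers"], significance=2.0)
    net.observe(["web_logs", "products"], significance=0.5)

    print(net.compose(threshold=2.0))   # strong relationships only
    net.decompose(threshold=1.0)        # prune weak relationships in place
```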

(2) lists many different programs that DARPA is involved in; while many of these remain closed to the public, their titles are informative and show a heavy convergence in the Nanotech area.

Another report:
"An overview and categorization of existing research in DNA based computation, the possible advantages that different models have over conventional computational methods, and potential applications that might emerge from, or serve to motivate, the creation of a working Bio-molecular Computer." (3)

This report shows that bio-molecular computing requires specific modeling methods, and that models can have an impact on both the type of computing and the abilities of the computational device to serve its purpose. The Nanohouse is built from the neural model in the brain; as a massively parallel system tied together with specified form and function, it can scale beyond our current dreams.

If you have some interesting links you'd like to share, or thoughts about the future of Nanohousing, I'd love to hear them.

Sources:
1. DARPA Military Briefing, http://www.darpa.mil/ipto/solicitations/closed/01-26_briefing1.pdf
2. DARPA Listing of Programs, http://www.darpa.mil/dso/programs.htm
3. http://publish.uwo.ca/~jadams/dnaapps1.htm


Posted November 17, 2005 6:38 PM
Permalink | No Comments |

Here we go again: YET ANOTHER EII vendor pushing the claim that they can "replace" the need for a data warehouse. In this blog I will talk about the issues that customers face if they DON'T implement a data warehouse. There are pros and cons to everything; fight the hype that EII is the be-all, end-all solution - it's NOT. EII is one successful piece of the puzzle; we just need to know where it fits.

The article is at: http://www.metamatrix.com/news/cbr-030705.pdf

The first quote I want to discuss is as follows:
"Proponents argue that EII replaces a physical extract of a data warehouse, thereby removing the need to spin-off expensive data marts."

Now just hold on. First off, who says data marts are expensive? By whose measuring stick? Where are the numbers to back this up? Is this person discussing the entire corporate warehouse or just a single star schema?

Unfortunately we don't have answers to these questions, but if a vendor comes to you and pushes EII in this light, I would strongly suggest you ask them these questions. Let's dive in a little more.

EII doesn't replace a physical extract unless the data that is wanted is current. It might be more purposeful to say that if EII replaced anything at all, it may be the need for the ODS itself, not necessarily the data marts. Data marts with history serve a huge strategic purpose. Integration systems (ETL/ELT) running night after night, cleansing and integrating data on a massive scale, serve a purpose. Most of the time the enterprise is looking for strategic answers across much of that history. EII is not built for TREND analysis; that's the job of the warehouse.

Now, can EII access the warehouse behind the scenes? YES! That's the beauty of the EII system: it can leverage the existing investment, and it can also leverage all the knowledge and rules that have been built to integrate the data historically.
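
As a hedged illustration of that point (a toy federation sketch, not how any actual EII product is implemented - the tables and data are invented, and two in-memory SQLite databases stand in for live systems), the value is in joining a fresh operational slice with the history the warehouse already holds:

```python
# Toy federation sketch: combine "fresh" operational rows with historical rows
# already integrated in the warehouse, then trend across both. Real EII
# engines federate live source systems; here two in-memory SQLite databases
# stand in for them, and all table and column names are invented.

import sqlite3

# Pretend operational system: today's orders only.
ops = sqlite3.connect(":memory:")
ops.execute("CREATE TABLE orders (customer TEXT, amount REAL, order_date TEXT)")
ops.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("ACME", 120.0, "2005-11-17"), ("Globex", 75.5, "2005-11-17")],
)

# Pretend warehouse: integrated history.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE order_history (customer TEXT, amount REAL, order_date TEXT)")
dw.executemany(
    "INSERT INTO order_history VALUES (?, ?, ?)",
    [("ACME", 900.0, "2005-10-31"), ("Globex", 410.0, "2005-09-30")],
)

# "Federated" view: union current detail with history, then trend by customer.
combined = [row for row in ops.execute("SELECT * FROM orders")]
combined += [row for row in dw.execute("SELECT * FROM order_history")]

totals = {}
for customer, amount, _date in combined:
    totals[customer] = totals.get(customer, 0.0) + amount

print(totals)  # history plus the fresh operational slice
```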

Here's the next quote:
“What’s the point of integrating a large amount of data when you only need a small slice of it?” Chappell asks. “Because the nature of corporate information is dynamic, trying to keep it replicated and synched in three or four databases when joined with another is impractical, especially if it’s accessed infrequently.”

The point here is interesting. Again, if all that is needed is OPERATIONAL (now) data with no history, EII IS a great solution. But if data has to be trended across history, it will need a data warehouse behind the scenes. Data warehousing experts do not typically "integrate large amounts of data" for the fun of it. We have business reasons that derive value from all that history.

Who says we always replicate it in three or four databases? This is rarely the truth. It's like justifying their argument by saying: who needs three or four operational systems?

Here's the next quote:
“EII can help to ease the backlog of requested business reports,” says Chappell. “Changes to a data warehouse model to bring in new data can take months. EII isn’t as brittle as the procedural ETL scripts and can effect the necessary changes much more quickly.”

True, but only if the business reports are operational in nature. What this gentleman does NOT discuss is the impact on the operational systems of the EII query that has to GET the data out in the first place. There is a cost to using this technology, and it needs to be discussed. If you're talking to your sales reps about an EII solution, either bring in an expert to help with the evaluation, or ask the pertinent questions regarding impacts, standards, and best practices.

Finally, at long last, someone else in the article begins discussing the true value of EII as it pertains to the enterprise:
“EII isn’t an alternative to a data warehouse,” says Walcott. “Rather it augments historical time-series BI reporting with fresher operational detail.” He adds that EII is especially useful for situations where you want to get to detailed data that is usually omitted from the warehouse.

I agree - it definitely augments the reporting with a "fresh" view of the now, or current, data. EII is a powerful paradigm, allowing a strategic or historical batch warehouse to become an Active Data Warehouse overnight, without the higher costs associated with "activating" the warehouse by writing your own code to load it dynamically.

EII has tremendous value in the web-services production layer, along with enterprise metadata management strategies. What we need to do as practitioners is figure out HOW to leverage the metadata across other tool sets - for example, retrofitting it (and any changes to it) automatically into ETL/ELT, and propagating the business metadata into a metadata management tool for classification and ontology. One might say that other pieces of the successful EII/ETL - data warehousing picture would be a service registry tool/utility and a business metadata rules management component.

It saddens me to find that specific vendors are hyping the "Kool-Aid" approach to EII. This does nothing but damage the notions of EII and what it CAN do, damage the vendors (in the analysts' eyes), and produce false expectations of the technology which IT cannot meet. EII is a GREAT resource, but it should be utilized in the right light, and as an augmentation to the data warehouse in place - not as a "replacement" for future data marting or historical efforts.

Thoughts? Vignettes? Ideas? Pulitzer Prize winning essays?


Posted November 17, 2005 4:29 AM
Permalink | No Comments |
