

Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, unstructured data, DW, and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for Master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, has trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

September 2005 Archives

Let's cut right to the chase: I've been blogging about how data modeling affects our ability to actively "execute" on data - in other words, how well we can interpret or interpolate on-screen information into actionable business decisions.

I've found a software company on the nanotech side of molecular modeling and 3D visualization that specializes in "bringing data to life." Talk about drill-down and simulation! These guys appear to have it together. Read on to find out why I believe these kinds of advancements are needed to light a fire under our BI systems.

On another note: a tiny droplet of water has been moved up a slope using "nanomachines." Very short article, really cool technology!

OK, back to the 3D visualization. I received some comments about data visualization and how complex it can be to map business data to three-dimensional models. Let's step back and take a slightly different tack.

A completely different field (nanotech) is way out front in this area. Read a COOL article here and see the models they offer. Their business users (scientists, chemists, physicists, and so on) are all required to use simulation and modeling software in real time and in 3D space. They are working in a virtual world. These users have complete control over "what-if" analysis: storing and saving simulations, replaying them, intersecting them, and building new representative models of the information displayed within. Granted - they work in a 3D world, where molecules have a basic shape and can be described with motion vectors.

However, this is where I get off the bus, look around, and ask: where are the BI vendors that are "striving" to break the curve, push the envelope, and blow our socks off with a new paradigm? Why aren't the BI vendors partnering with data mining vendors and data visualization specialists? Why can't the BI vendors bring in animation specialists? It's not just the nanotech sector!

Take a look around: the BI industry is evolving, and the BI vendors appear to have been "left on the shore." I've got news: the BOAT HAS LEFT THE DOCK and is already 10 miles out to sea. Just look at the education industry, the internet, or the gaming industry. When was the last time you said, "This game is cool with its bar charts and pie graphs; I think I'll play that one non-stop!"?

Or how about this: "Wow! Look at this website! It's got a title and static text! I think I'll build my corporate portal this way!"

OK - here's one more: when was the last time you took a class that consisted of the ever-fun and addictive "read this Word document, then we'll test you on it online" routine?

No! You want to hear the instructor, see animation, watch the components in action, and replay the learning pieces. You want to see the website updated, with graphics and sound and flashy movement - spiffy looking and functional. What about games? Why were DOOM and Descent so popular? 1) Addictive, interactive play; 2) sequences that varied every time you played; 3) intensive graphics and incredible "experiences."

You know, our BI vendors could take some lessons from this - it's time they brought their visualization interfaces into the 21st century. I'm tired of hearing "turn this (picture of a spreadsheet) into this... (picture of red-green-yellow speed dials and bar charts)."

Here's my ad draft for the next big BI visualization tool:

"Interact with this, develop visualization scenarios, view your data across multiple axis (dimensions), swap your dimensional points in and out of your graph to change the landscape, walk around the graph, give motion to your graph in real time - backed with the latest in data mining and visualization technology, we BLOW the covers off the other BI vendors in presentation, style, and interactivity.

That's right! PLAY with your data in a way that is educational, have FUN in a 3D virtual world, see connections across data relationships like you've never seen before!"

OK - so I'm not a salesman, but I want a FUN tool with hot graphics and the option of time-lining data (like a video editor in playback mode) over a 3D landscape. If I could adapt the nanotech visualization and modeling tools to the BI world, I would. The next BI vendor to pull off this paradigm shift could get rich, really, really fast.
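None of this existed in a 2005 BI suite, but the "video editor in playback mode" idea is easy to prototype. Below is a minimal, hypothetical sketch in Python - the axes, measures, and random data are all my own invention for illustration - in which each animation frame replays one period of a fact table as a 3D scatter, stepping the playhead forward like a timeline:

# A hypothetical sketch of "video-style" playback of a fact table in 3D.
# Axes, measures, and data are invented for illustration only.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

rng = np.random.default_rng(42)
periods = 12                               # e.g. twelve months of history
points = 50                                # e.g. fifty product/region combinations
x = rng.uniform(0, 10, points)             # dimension 1 (say, region index)
y = rng.uniform(0, 10, points)             # dimension 2 (say, product group)
z = rng.uniform(1, 5, (periods, points))   # measure (say, revenue) per period

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
scatter = ax.scatter(x, y, z[0], c=z[0], cmap="viridis")
ax.set_xlabel("region")
ax.set_ylabel("product")
ax.set_zlabel("revenue")

def update(frame):
    # Step the "playhead" forward one period, like a video editor.
    scatter._offsets3d = (x, y, z[frame])
    scatter.set_array(z[frame])
    ax.set_title(f"period {frame + 1} of {periods}")
    return scatter,

anim = FuncAnimation(fig, update, frames=periods, interval=500)
plt.show()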

A couple of questions for the readers:
If you had to tell a BI vendor HOW to make this happen, what would you say?
Would you want a system like this? How would it impact your business and business decisions?

See you soon, Dan L


Posted September 16, 2005 9:49 AM

The availability of real-time access to live business data — business visibility, as Cisco likes to call it — will draw a line under enterprise investment in data warehouse products, says Michael Carter, co-founder and chief marketing officer of CXO Systems: "The data warehouse is going the way of the mainframe." This from an article at Loosely Coupled.

Seriously, folks! I'm not kidding. Is the data warehouse really going the way of the mainframe because of EII?

http://www.businessintelligence.com/ex/asp/code.115/xe/article.htm

I do not agree with this statement at all. In fact, if we look at EII and its value to the industry (which is considerable), EII fits in nicely as another mechanism for back-end reporting - allowing the EDW to remain strategic in nature while the tactical and operational data is handled ON TOP of the EDW. If we dig deeper and examine the larger picture of SOA, we find that a strategic EDW becomes CRITICAL to the mix of back-end components required to expedite the creation of an enterprise SOA system - hardly legacy technology.

Just because EII is becoming critical in the component stack doesn't mean the amassed data sets are old, brittle, and non-conformant to the business. In fact, it's just the opposite. Successful data warehouses play a huge role in the strategic success of understanding the business and feeding vendors, supply chains, external customers, and internal customers as much data as they can handle. Without a consolidated, quality-controlled data warehouse, the SOA is just another EAI system with exposure to the outside world.
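To make the "EII on top of the EDW" point concrete, here is a minimal sketch, assuming two hypothetical sources: a warehouse query for strategic history and an operational query for today's activity. The function names, row shapes, and data are invented for illustration only; the point is that the EII layer unions them into one virtual view at query time without copying or restating anything.

# A hypothetical sketch of EII-style federation "on top of" an EDW:
# strategic history comes from the warehouse, tactical rows come from the
# operational system, and a virtual view unions them at query time.
# Function names, columns, and data are invented for illustration only.
from datetime import date

def query_edw_history(customer_id):
    """Stand-in for a query against the enterprise data warehouse."""
    return [
        {"customer_id": customer_id, "order_date": date(2005, 6, 1), "amount": 120.0},
        {"customer_id": customer_id, "order_date": date(2005, 7, 15), "amount": 85.5},
    ]

def query_operational_today(customer_id):
    """Stand-in for a query against the live operational (OLTP) source."""
    return [
        {"customer_id": customer_id, "order_date": date.today(), "amount": 42.0},
    ]

def federated_customer_view(customer_id):
    """The EII layer: one virtual result set, no data copied or restated."""
    rows = query_edw_history(customer_id) + query_operational_today(customer_id)
    return sorted(rows, key=lambda r: r["order_date"])

for row in federated_customer_view(1001):
    print(row)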

OK - I'm upset, but shouldn't I be? I don't mind the shift to EII, nor the need for EII to be involved in SOA initiatives - but don't tell me the data warehouse is going the way of the mainframe and then fail to back it up with quantitative facts. A more accurate statement would be that the EDW is evolving into a more dynamic and integral part of the overall enterprise architecture.

Here's another article, "The Data Warehouse Is Dead," written in 2004 after the fall of Enron. Data warehouses are NOT dead; they are alive and kicking - in fact, most are expanding. Michael Carter, again.

I agree there is value in distributed intelligence, don't get me wrong. I also agree there is lots of value in up-to-date information. However, I feel he is tremendously discounting the nature of quality efforts, the ROI that companies have seen, the first look at an integrated or patterned history of customer activity, the data mining results netting corporations millions of dollars, and so on. As with ANY project, GOALS and OBJECTIVES must be set, RISKS must be mitigated, and REQUIREMENTS must be written.

It's a shame for the EII industry that this gentleman feels the need to discount one of his major sources of quality data for enterprise views. Where do his reports and services GET their (historical) information if an enterprise integrated view is NOT available? Can auditors answer the question of what happened on Day X if it's not stored in a data warehouse somewhere? I'm not so sure.

In another post, Andy Hayler (Kalido) notes that EII requires provisions to access the data, as does the service-oriented architecture that will feed the enterprise's needs. He goes on to list several major issues that EII as an industry has yet to overcome (IF it wants to replace the data warehouse entirely rather than feed from it): gathering and integrating history consistently; producing snapshots of data AS OF a particular point in time, especially when the source systems have "dumped" the data because they are operational; managing and controlling query speed and timing against operational systems; trend analysis; and so on.
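The "AS OF a particular point in time" issue is worth a concrete illustration. Here is a minimal sketch, assuming a hypothetical effective-dated snapshot history that only a warehouse would retain once the operational system purges the row; the statuses and dates are invented for illustration.

# A hypothetical sketch of why "as of Day X" needs a warehouse: the EDW keeps
# effective-dated snapshots even after the operational source purges the row.
# Structures and data are invented for illustration only.
from datetime import date
from bisect import bisect_right

# Effective-dated history as a warehouse would retain it (sorted by date).
snapshots = [
    (date(2005, 1, 1), {"status": "prospect"}),
    (date(2005, 4, 10), {"status": "active"}),
    (date(2005, 8, 20), {"status": "suspended"}),
]

def as_of(history, when):
    """Return the snapshot that was current on the requested day, if any."""
    dates = [d for d, _ in history]
    idx = bisect_right(dates, when) - 1
    return history[idx][1] if idx >= 0 else None

print(as_of(snapshots, date(2005, 5, 1)))   # -> {'status': 'active'}
print(as_of(snapshots, date(2004, 12, 1)))  # -> None (before any history)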

What I would say is that EII has value, and it has its place - not to mention that it's a technology built to solve specific business problems. EII as an industry needs to mature. By mature I mean: build standards, define methodologies for implementation, provide best practices and tips and tricks for implementation, develop case studies showing how it solves specific business problems and what the ROI on those problems is, and begin defining risk mitigation for projects and implementations across the board.

Again, EII isn't the issue here - EII is an additive component that brings value to the table for existing data warehouses, and it increases the need for corporations that don't yet have one to build an EDW - particularly an active data warehouse with right-time data delivery to the SOA, achieved by using BI and EII together.

Thoughts?


Posted September 14, 2005 6:19 AM

http://sawww.epfl.ch/SIC/SA/publications/SCR02/scr13_page23e.html

The Nanohouse computing device is still just a dream today, and it may be bound to stay that way for some time. It never hurts, though, to explore the "what-if" side of things. In this blog entry we explore the advances made in DNA computing and self-assembly. Self-assembly is an important part of nanoscale machines: it provides the ability to produce consistent, repeatable (and ordered) circuitry. These patterns are the very foundation of the Nanohouse large-scale data capture and modeling efforts.

"This stuff is coming," Uldrich says, "and it's coming a lot sooner than many people believe." ComputerWorld.

Molecular electronics is one of the most promising directions in nanotechnology [1]. The building blocks of future molecular electronic devices could be specially designed organic molecules assembled on appropriate substrates into useful circuits through the processes of self-assembly, i.e. the spontaneous organization of the molecular building blocks... SuperComputing Review Publication.

My hypothesis:
The larger systems get, the more order they must have - or they become unmanageable, unwieldy, and begin behaving badly.

For example, consider the initial construction of the automobile. When Henry Ford sat down and thought about the problem of "mass production with consistent quality," he came up with a revolutionary system: build all automobiles the same way every time - that answers the quality side of it - and then add repeatable, redundant tasks along a series of checkpoints. Voila: the assembly line.

What do you think would have happened to the creation of the automobile if he had said: build 100 autos a day, every worker needs to be an expert in their field and build their own car from bottom to top (without an assembly line)?
Chaos would have ensued, and his factory probably would have fallen apart from all the mistakes that were made. No single individual could have been an expert in every aspect of building the car. Consistency, repeatability, and order are the keys to automation - and thus to the self-assembly of the nanoscale warehouse.

"The concept of a mass-produced structure with dimensions measured in atoms helps explain why researchers are turning to nanotechnology as the next great hope for Moore's Law..." ComputerWorld

The Nanohouse relies strongly on these principles - so strongly, in fact, that it forces us to rethink the way we compute, store, and utilize information (data). Data models that represent 2D space are no longer enough. We must concentrate our efforts on 3D modeling and learn from the molecules involved in nanoscale calculations.

Example: "Another important simplification is made when the interaction of valence electrons with the electrons of the inner electronic shells of atoms is described by effective atomic pseudo potentials."

Let's paraphrase and over-simplify as we apply this to the Nanohouse:
"Another important simplification is made when the interaction of [two or more business keys] with the [business keys of other elements] is described by [relevancy and frequency of relationship] potentials." The business keys provide the unique reference points into the information housed within the nanoscale devices.

The jobs of the nanoscale devices are to:
a. understand the data they carry (have some knowledge of what would constitute a weak or strong bond, i.e. relevancy);
b. understand which other nanoscale components they are allowed to "connect with" or self-assemble to;
c. propel themselves through the environment looking for other elements to attach to.

The results would show an incredible ability to form "memory-like" structures - hopefully one baby step closer to the functionality of the human brain. Such a system would be capable of re-wiring itself by changing its self-assembled structure, or by being stimulated by an outside charge.
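To make (a), (b), and (c) slightly less abstract, here is a toy sketch of the bonding rule in Python. The elements, attributes, scoring function, and threshold are all my own invention for illustration; it only shows the shape of the idea - pairs of business keys "self-assemble" into relationships when a relevancy score clears a threshold.

# A toy sketch of "self-assembly" driven by relevancy between business keys:
# each element bonds to another only when a relevancy score clears a threshold.
# The scoring rule and threshold are invented for illustration only.
from itertools import combinations

elements = {
    "CUST-1001": {"domain": "customer", "region": "EU"},
    "ORD-5001":  {"domain": "order",    "region": "EU"},
    "ORD-5002":  {"domain": "order",    "region": "US"},
    "PROD-9001": {"domain": "product",  "region": "EU"},
}

def relevancy(a, b):
    """Crude stand-in for "bond strength": the count of shared attribute values."""
    return sum(1 for k in a if k in b and a[k] == b[k])

BOND_THRESHOLD = 1  # bond only when at least one attribute matches

bonds = [
    (key1, key2)
    for (key1, attrs1), (key2, attrs2) in combinations(elements.items(), 2)
    if relevancy(attrs1, attrs2) >= BOND_THRESHOLD
]
print(bonds)  # pairs of keys that "self-assembled" into relationships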

Let's examine this from a scientific perspective as it relates to the modeling necessary to represent a system like this:
"The most demanding parts of the calculations are i) the fast Fourier transforms (FFT) needed to evaluated the total charge density in real space and ii) the scalar products between wavefunctions, which are necessary to enforce orthogonality between the orbitals. Both operations can be efficiently parallelized [4] so that the overwhelming majority of the operations are performed locally on each processor through calls to optimised library routines (matrix-matrix multiplications (MMM) and one-dimensional FFT), while a carefully written proprietary three-dimensional (3D) FFT routine assures that the communication overload is minimized during grid transpositions." SuperComputing Journal

We must change our "data modeling" skills into biomechanical modeling skills. Why is this a big leap? Why is it so important for our success moving forward? What impact does it have on the Nanohouse of the future?

"Information and algorithms appear to be central to biological organization and processes, from the storage and reproduction of genetic information to the control of developmental processes to the sophisticated computations performed by the nervous system. Much as human technology uses electronic microprocessors to control electro-mechanical devices, biological organisms use biochemical circuits to control molecular and chemical events. The ability to engineer and program biochemical circuits, in vivo and in vitro, is poised to transform industries that make use of chemical and nano-structured materials." California Institute of Technology

What we need to address NOW is our primitive thought processes. It's time to think outside the box - time to expand our horizons. Can we get a data modeling tool vendor to finally come to the table and offer 3D modeling based on variances, strength of bonding (associative properties), and relevance? If we can build some of these attributes into our respective data models, that's one step closer to the Nanohouse. Of course, there are hundreds of miles to go before we get there. The modeling is where it starts; from there we can begin to focus our efforts on the programmatic shifts that must take place.

Additional blog entries will continue exploring the Nanohouse, along with the notions of DNA computing and self-assembly. We will explore the hypothesis stated earlier and work at uncovering what happens to a system when it expands beyond order.

Seen any interesting nanotech articles lately? I'd love to hear about them. What's your view on Nanotech, DNA computing, Information Modeling? Sound off!


Posted September 13, 2005 6:03 AM

We could learn A LOT about information modeling from the nano-molecular level if we only paid attention. Self-assembly at the nanoscale provides many clues about how we should model our information systems as they grow. This blog entry highlights self-assembly and its attributes: repeatable, consistent, and reliable.

"Although man's understanding of how to build and control molecular machines is still at an early stage, nanoscale science and engineering could have a life-enhancing impact on human society comparable in extent to that of electricity, the steam engine, the transistor and the Internet." -- Professor David Leigh, Edinburgh University


ComputerWorld reports that self-assembly and mixed silicon circuits are 5 to 7 years off. However, the article does present some very interesting findings from the leading laboratories in the nation. Here we explore the impacts and apply the ideas to a cross-field: data modeling.

"The neat thing about SAMs is they're very well ordered," McGimpsey says. A field of these SAMs protrudes from the substrate at a well-defined angle—like a small patch of thick, well-tended grass—and can perform several duties, such as improving conductivity or increasing surface area. Such order, McGimpsey says, "means predictability of structure, and thus of properties." (from the ComputerWorld article mentioned earlier)

The order means predictable structure and properties - shouldn't we be taking our data modeling cues from nature? Our current data modeling efforts inside RDBMS engines are ancient history; third normal form has only a few ties to natural structure. Our data models must reflect the natural models at the nanoscale. They need to be repeatable, predictable, and redundant. This is a foundation of the Nanohouse. See my web site for more information.

What does order bring to the table?
Redundancy, fault tolerance, control, scalability, and repeatability are all attributes of order. If we can provide an ordered data model for our information systems (one that resonates with natural models), we can begin predicting how it will act under certain circumstances. We can also begin producing, automatically, the models that will house our data.
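As a thought experiment on that last point - producing the models automatically - here is a minimal sketch, assuming a hub/link-style naming convention; the convention, keys, and columns are my own invention for illustration. The only property that matters here is that the generation is deterministic and ordered: the same business keys always yield the same structures.

# A hypothetical sketch of "producing the models automatically": given nothing
# but business keys and their relationships, emit the same ordered, repeatable
# structures every time. The HUB_/LNK_ convention is assumed for illustration.
business_keys = ["CUSTOMER", "ORDER", "PRODUCT"]
relationships = [("CUSTOMER", "ORDER"), ("ORDER", "PRODUCT")]

def generate_model(keys, rels):
    """Deterministic generation: the same inputs always yield the same model."""
    hubs = [f"HUB_{k}(business_key, load_date, record_source)" for k in sorted(keys)]
    links = [
        f"LNK_{a}_{b}(hub_{a.lower()}_key, hub_{b.lower()}_key, load_date, record_source)"
        for a, b in sorted(rels)
    ]
    return hubs + links

for entity in generate_model(business_keys, relationships):
    print(entity)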

No matter how large the data sets grow, we can still predict how they will perform - especially through the use of Fourier transforms and other mathematical formulas. "Natural systems form nano-scale structures," and natural systems also provide accurate accounts of form and function. Why, then, do we in IT insist on creating artificial modeling elements in a two-dimensional world to house our data? We should be focused squarely on 3D modeling capabilities with repeatable and redundant design (ordered systems).

With IT moving toward SOA, we should also be focusing on the data model behind the scenes. Can it self-assemble in the future? Can self-assembly mean self-maintaining data models? Can data models proactively change in response to newly arriving stimuli? Can we teach our modeling systems (in the information industry) the way we run chemical experiments? When will our data modelers finally learn that it's about FORM and FUNCTION, not just the data itself?

For now, focusing on the biological aspects of nano self-assembly can bring tremendous gains to the data modeling world - if for nothing else, for housing huge quantities of information in an itsy-bitsy space, in an ordered and repeatable fashion.

Do you believe Data Modeling needs an overhaul? Sound off!


Posted September 13, 2005 5:59 AM

In my blog: "Stuck in 1985", I discuss the nature of graphing, and how I believe the current BI Reporting vendors aren't doing enough to represent the data for visual recognition. There's a flip side or an underside to this current as well. The question I'm driving here is: Is accurate data visualization driven by data modeling architecture of the warehouse behind the scenes?

I would tend to say YES, it is. In this blog entry, we explore the notion in a bit more depth. Take a look and let me know what you think...

I begin by pointing to visualization tools, just as I pointed to graphing tools in the last round. In this particular case, there's an open source data visualization component from IBM called OpenVis. The enhanced data model section discusses how the "data model" plays a critical role in the visualization capabilities. With OpenVis, the data model is apparently an object-oriented component. Here, they discuss the details of the data model in action.

I believe that data modeling is a key that opens many doors. The data model should be consistent (in architecture), repeatable, redundant, and flexible to change (without restating data). In this case, the components or entity types should be standardized beyond just "parent-child." In order to gain some sort of two-dimensional understanding of a data model, the patterns within the data model itself must be easily recognized.

“If we assume that the viewer is an expert in the subject area but not data modeling, we must translate the model into a more natural representation for them. For this purpose we suggest the use of orienteering principles as a template for our visualizations.” http://www.thearling.com/text/dmviz/modelviz.htm

In this case, orienteering is the use of "anchor points," as in a 3D landscape where we anchor ourselves to visual cues: street corners, addresses, the height of buildings, and so on. In the data modeling case, orienteering could easily mean data points treated as geographical or spatial coordinates. In other words, the data model can be capable of driving multi-axis (multi-dimensional) graphing qualities; in fact, I blogged about this earlier.

Here is a very interesting knowledge portal used to visualize information in a moving format (theBrain). Here the data model is virtual - embedded in the software's reference layer to the content it collects. It reflects a neural net behind the scenes. What if we were to extrapolate the notions behind a neural net? What if we were to over-simplify the representation of information into a standardized data modeling format? Would we be better equipped to visualize and mine the information in its native stored format?

I've attempted to do just that with the Data Vault data modeling architecture. It's a standardized set of entity types that represents a poor man's neural net. It provides a two-dimensional data storage space with the capacity for N-dimensional bisection/association based on the physical data stored within the entities. It is based on business keys and the semantic definition of those business keys, along with the grain of those keys. In this manner, grain might be considered one dimension; semantic definition could be another. Within the model we can add gradient and mechanical relevance scores to assist in defining associative properties between elements. In turn, it becomes easier to represent this information in a 3D modeling format, where the data can be visualized and explored on (for instance) landscape maps.
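Here is a minimal sketch, assuming a heavily simplified reading of the paragraph above: each business key carries a grain and a semantic tag (two candidate dimensions), and each association carries a relevance score that could drive the landscape view. Every name, scale, and score below is my own invention for illustration; this is not the Data Vault specification, just the shape of the idea.

# A minimal sketch: business keys mapped to (grain, semantic) coordinates,
# with relevance-scored associations that could drive a 3D landscape view.
# Names, scales, and scores are invented for illustration only.
keys = {
    "CUSTOMER": {"grain": 1, "semantic": "party"},
    "ORDER":    {"grain": 2, "semantic": "transaction"},
    "LINE":     {"grain": 3, "semantic": "transaction"},
}

associations = [
    ("CUSTOMER", "ORDER", 0.9),   # strong, frequently exercised relationship
    ("ORDER", "LINE", 0.95),
    ("CUSTOMER", "LINE", 0.3),    # weak, indirect relationship
]

semantic_axis = {"party": 0, "transaction": 1}

def landscape_points(key_defs):
    """Map each business key to (grain, semantic) coordinates for plotting."""
    return {
        name: (d["grain"], semantic_axis[d["semantic"]])
        for name, d in key_defs.items()
    }

points = landscape_points(keys)
for a, b, score in associations:
    print(f"{a}{points[a]} <-> {b}{points[b]}  relevance={score}")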

I believe that the key to visualization, and to a better understanding of our information, rests heavily on the architecture or data model housing that information. You can read more about the Data Vault here...


Posted September 8, 2005 8:37 AM
