Business Intelligence Network business intelligence resources

Blog: Dan E. Linstedt


September 30, 2005

Can we get RFIDS for Data?

Hmmm, I've been thinking about this for quite a while. In the tangible world we have tags for physical goods - yesterday they were bar codes, today they're RFID tags and RTLS systems. Tomorrow, physical elements may be tagged with DNA sequences, or electron signatures at the nano level.

Why then is it so hard to track intangible "data"? For applications we have the equivalent of software licenses, but for the actual data? Nothing.

In a world of hypothetical speculation, I would suppose that tagging every data element with an individual signature may be desirable. We could start with "units of work". Take this blog entry for example: tag the header with a signature, and tag the extended entry with a signature. The important thing is: the signature must travel with the unit of data - everywhere it goes. It becomes the unique ID for data sets, like an RFID tag, and should be tracked across the network.

What possibilities does this open up? In data warehousing we often tag our data with CRC32/CRC64 checksums and MD5 hashes (functions that produce mostly unique values across a row of data). Why then can't these "keys" become universal, and shared around the world? These are standard functions that produce the same keys for the same data everywhere.
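To make the row-tagging idea concrete, here is a minimal sketch using Python's standard `zlib` and `hashlib` modules. The function name `row_key`, the `|` delimiter, and the sample row are my own assumptions for illustration, not a standard anyone has agreed on.

```python
import hashlib
import zlib


def row_key(row, sep="|"):
    """Return (crc32_hex, md5_hex) for a row of values.

    Hypothetical helper: the delimiter and normalisation rules are
    assumptions, and every party would have to agree on them before
    the keys could be shared.
    """
    # Join the fields with a delimiter so that ("ab", "c") and
    # ("a", "bc") do not collapse into the same byte string.
    text = sep.join("" if v is None else str(v).strip() for v in row)
    payload = text.encode("utf-8")
    crc = format(zlib.crc32(payload) & 0xFFFFFFFF, "08x")
    md5 = hashlib.md5(payload).hexdigest()
    return crc, md5


sample_row = ("Dan", "Linstedt", "2005-09-30")
print(row_key(sample_row))
```

The same bytes give the same keys on any machine, which is the property that would let such keys be shared. The catch is that both sides must normalise and encode the row identically before hashing.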

What would happen if we could actually tag every "word" entered into every application? I assume data traceability would increase dramatically - talk about a boost to search engines! Unfortunately, the downside is that these functions produce keys that are large relative to the data they tag (an MD5 digest is 128 bits, longer than most words), and functions like MD5 cannot be "reversed" - so getting from a key back to its data requires a massive storage and lookup facility. Another issue is that CRC32 can produce duplicates (as can CRC64, although less frequently).
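How often would duplicates turn up? A rough answer comes from the standard "birthday" approximation, sketched below. It assumes keys are spread uniformly over the key space, which real data only approximates, and the function name `collision_probability` is mine.

```python
import math


def collision_probability(n_items, key_bits):
    """Approximate chance that at least two of n_items share a key.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2 * 2**bits)),
    assuming uniformly distributed keys.
    """
    keyspace = 2.0 ** key_bits
    exponent = n_items * (n_items - 1) / (2.0 * keyspace)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


for bits in (32, 64, 128):
    print(bits, collision_probability(1_000_000, bits))
```

Under that assumption, a 32-bit key reaches roughly even odds of at least one duplicate at about 77,000 distinct rows, which is why CRC32 alone is a shaky universal identifier.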

If someone were to produce a device that can "tag" data going over the internet, store the data in compressed format with the key-tag, then pattern recognition would be easier to spot - data mining would see a huge boost, and it may be possible to aggregate what used to be seen as dissimilar data into a similar keyed entry. These keys could also be shared across environments - maybe this is a call to EII vendors who are sharing data over SOA and web-services?

For now it's a pipe dream, but it may step into reality with DNA computing or nanotechnology. Just think: Data Unique Universal Keys (DUUK) - a fascinating idea. From compliance and monitoring perspectives it opens a ton of doors.

Thoughts?

  Posted by Dan Linstedt at 6:24 AM | | Comments (1)


September 27, 2005

Oh Where - Oh Where has my identity gone?

Holy Begonias!!! Talk about breach of security, and loss of personal identity. This story: "Credit card firms don't have to warn individuals" seems too ludicrous to ignore.

This is just a plain outrage. In this blog we'll explore what this means to corporations, especially with the precedent it sets - beware: I'm warning you now, this entry is mostly a RANT.

This is incredible. How can a judge in San Francisco say that credit card companies don't have to report "break-ins" or successful hack attempts on 40 million+ consumers? He must not have had his identity stolen!

In terms of personal loss, this is tremendously devastating. The ruling makes a statement to corporations everywhere that they no longer need to "admit" that all the information they collected from you was stolen. That's right! Of course, that means it also makes it easier for the corporations to "sign agreements" to share information without your knowledge.

* What happens to the privacy policies? Out the window.
* How about Information "protection promises"? Meaningless.

If corporations follow suit, we will have chaos very very soon.

What if a medical claims company has its system hacked, and all your medical information is "compromised" (to use their terms)? What if this medical claims company (or financial company for that matter) doesn't have to tell you that your personal information was stolen, and now appears on free hacker sites all over the world? Did this judge stop to think about all the rules that HIPAA states about privacy of medical information? I think not. The information may have to remain private, but if it's stolen - well, the company doesn't have to tell you about it.

This is one of those things that is just a BIG MISTAKE by a judge. I'm sorry, but if I can't own the information about me, the next best thing is to hold the credit card (and all the other) companies accountable for what happens to it! So what's the score now?

Accountability ZERO, Deceit TEN.

My oh my, and if we stop to think of what this does to Information Quality? If the information that is stolen is flat-out wrong, now there's absolutely no way to get it "fixed."

Bottom line? This ruling opened a Pandora’s box: You can't trust anything anyone ever says about keeping your information "private" anymore.

Anyone else care to Rant? Rant anonymously if you wish...

  Posted by Dan Linstedt at 3:37 PM | | Comments (0)


Between Inmon and Kimball Design

Want to break down the barriers? Tired of "taking sides" when you don't have to? In this blog I explore a modeling technique called the Data Vault (no it doesn't have to do with security or locking your data away). This technique sits squarely between Inmon 3rd normal form warehouse and Kimball Star Schema design as a warehouse.

This modeling technique combines the best of both designs and is built to overcome the limitations of the adaptations made to each data modeling architecture, specifically with regard to data warehousing.

What is a Data Vault?
Definition: The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, and consistent. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses.

What is its real name?
Common Foundational Warehouse Architecture

What does it do for me?
* Saves time and money in build out of your data warehouse or enterprise integration initiative.
* Provides a consistent, repeatable architecture that can scale to your enterprise needs.
* Defines easy to use standards for all to follow
* Reduces complexity of the integration effort
* Saves storage space
* Increases visibility into both: well-oiled and broken business processes
* Demonstrates a strong basis for Data Visualization and Data Mining

What are some of the benefits?
* Rapid prototyping and build out of data marts and reporting solutions.
* Genuine audit trail pictures produced of the enterprise vision of data (even if none exist on the source systems)
* Data is modeled by type, and rate of change - allowing both batch and real-time to be "added" to the warehouse at the same time.
* Contains a high degree of data attribution
* Scales to petabyte levels if necessary
* Can handle near real time data arrival at a fraction of a second.

Yes, but are there any customers using it?
Sure - just check the web site for more information.

What are some of the success stories?
* Large manufacturing company saved millions when finding and fixing a billing error that had been occurring for the past 15 years.
* Large financial company integrated 3 companies acquired through M&A in 3 months flat
* Large banking company adds "branches" to their warehouse quickly at a low cost.
* Government operation decreases "time to build data marts" to 1 hour (from requirements to inception).

What makes the Data Vault so successful?
Its ability to be modeled AT THE BUSINESS LEVEL. The Data Vault is designed to mimic the business keys - the most important data element in business. Once the keys are established, it models the relationships across those keys, which fleshes out both process relationships and undocumented business operational relationships. Finally the attribute data or descriptive data is added to the mix and split by Type of Data and Rate Of Change.

From a business perspective, the logical model is tied tightly with the physical model and architecture - therefore it is easy to change the model as the business changes. It is also based on a statement of fact. The business keys are in use and were in use at a specific point in time. This model is built to capture those ideas.
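A toy sketch may help show the three layers just described: business keys, relationships across those keys, and descriptive attributes grouped by rate of change and kept as history. Everything here (`BusinessKey`, `Vault`, the method names, the sample customer and order) is hypothetical and in-memory. It is an illustration of the idea, not the actual Data Vault table design.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BusinessKey:
    kind: str   # e.g. "customer"
    value: str  # e.g. "C-1001"


class Vault:
    """Hypothetical in-memory stand-in for the three layers."""

    def __init__(self):
        self.keys = {}     # BusinessKey -> first time the key was seen
        self.links = {}    # tuple of BusinessKeys -> first time seen together
        self.details = []  # (key, group, attributes, loaded_at), append-only

    def add_key(self, key, seen_at):
        # A key is recorded once; later sightings do not overwrite it.
        self.keys.setdefault(key, seen_at)

    def add_link(self, keys, seen_at):
        for key in keys:
            self.add_key(key, seen_at)
        self.links.setdefault(tuple(keys), seen_at)

    def add_details(self, key, group, attributes, loaded_at):
        # "group" separates attributes by type and rate of change.
        self.add_key(key, loaded_at)
        self.details.append((key, group, dict(attributes), loaded_at))

    def history(self, key, group):
        rows = [d for d in self.details if d[0] == key and d[1] == group]
        return sorted(rows, key=lambda d: d[3])

    def current(self, key, group):
        rows = self.history(key, group)
        return rows[-1][2] if rows else None


# Made-up sample data, for illustration only.
vault = Vault()
cust = BusinessKey("customer", "C-1001")
order = BusinessKey("order", "O-77")
vault.add_link((cust, order), datetime(2005, 9, 27))
vault.add_details(cust, "address", {"city": "Denver"}, datetime(2005, 9, 27))
vault.add_details(cust, "address", {"city": "Boulder"}, datetime(2005, 9, 30))
print(vault.current(cust, "address"))
print(len(vault.history(cust, "address")))
```

Nothing is ever updated in place: a changed address is a new row, so the audit trail comes for free. A new kind of attribute becomes a new group rather than a change to existing structures.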

Come hear more about this technique at TDWI, Sunday October 30th in Orlando. Or contact me directly: daniel.Linstedt@myersHolum.com

Thanks,
Dan L

  Posted by Dan Linstedt at 5:26 AM | | Comments (3)


September 20, 2005

EII, EAI, and ETL in Layman's Terms

Pardon my ignorance, I'm still learning EII too - where it fits, how it's growing up, and what customers really can do with it in the long run. Lately I've been asked to describe EII vs EAI vs ETL in layman’s terms. I'll attempt to do this in this (short) entry. Again, if I misunderstand, please correct me for the benefit of the community, and I'll go seek additional definitions for better content next time.

By the way, I do enjoy the comments I've been getting from Tim Mathews (and others). He has a blog on Ipedo's web site: http://blogs.ipedo.com/integration_insider/

I work hard at translating terms into layman’s terms, and sometimes I don't quite get it right the first time ;-) Please correct me if I misstate some truths in translation. Just don't send the Babel Fish after me!

I'll try my best to explain the basic definition, differentiator, and provide an over-simplified example of where the technology might fit. Hopefully this will clear the air a bit more...

EII - Enterprise Information Integration, crudely defined as a middle tier query server; but it's much more than that. It contains a metadata layer with consolidated business definitions. It also contains (usually) an ability to communicate through web-services, database connections, or XQuery/XPath (XML translation). In fact, it relies heavily on the metadata layer to define "how and where" to get its data.

It's a PULL engine, that waits for a request - splits the query (if it has to) across heterogeneous source systems (multiple sources), gathers transactional (mostly) data sets, merges them together (again relying on the metadata layer for integration rules), then pushes them out to the requestor; which could be a web-service, a BI query tool, Excel, or some other front-end (like EAI or Message Queuing Systems).

With EII it may be safe to say:
The more definition that a business can provide for the metadata layer, the better the ROI the business will see, and the higher the utilization of the tool.

EII usually sits seamlessly between the requestor and the multiple scattered data sets. One final note: its job is NOT (as of today) to move massive batches of information on a scheduled basis from point A to point B through heavy translation layers.

An oversimplified example might be: A voter walks into a voting area, and the registrar needs to check his background, current address, phone numbers, driver’s license records, and any recent activity involving the law. Each system has its own interface, the systems are completely disparate and don't talk to one another, and the registrar only has a driver’s license number (maybe a current address) to look him up with. The registrar needs a response in a matter of seconds: Can this guy vote here now? EII is a perfect fit for getting this kind of job done, although the registrar uses a web-interface and never "sees" the EII tool doing the work.
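The registrar example can be sketched as a tiny PULL engine: a request arrives, a metadata layer says which sources to ask, and the answers are merged on the way back. The two dictionaries standing in for source systems, the `METADATA` layout, and the function name `query` are all assumptions made for illustration. A real EII product does far more (query splitting, pushdown, security).

```python
# Stand-ins for two disparate source systems (made-up sample data).
DMV = {"D123": {"name": "Pat Doe", "address": "1 Main St"}}
VOTER_ROLL = {"D123": {"precinct": "12", "registered": True}}

# Hypothetical metadata layer: for each business entity, where to get
# its data and how to fetch it by key.
METADATA = {
    "person": [
        ("dmv", lambda key: DMV.get(key, {})),
        ("voter_roll", lambda key: VOTER_ROLL.get(key, {})),
    ]
}


def query(entity, key):
    """Wait for a request, fan out to each source, merge the results."""
    merged = {}
    for source_name, fetch in METADATA[entity]:
        for field_name, value in fetch(key).items():
            # On conflicting fields, the source listed first wins.
            merged.setdefault(field_name, value)
    return merged


print(query("person", "D123"))
```

Nothing is copied or stored ahead of time. The data stays in the source systems until someone asks, which is the main contrast with the batch-oriented tools described further down.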

EAI - Enterprise Application Integration. This one's been around for a while. In layman’s terms: EAI connects your Siebel to your PeopleSoft, and your Oracle Financials to your SAP systems, and vice-versa. Most EAI systems are PUSH driven, a transaction happens in your Enterprise App, and an EAI listener "sees" it and pushes it out over the bus, or to a centralized queue for distribution to other applications. Most EAI engines are more "workflow" and "process flow" driven rather than on-demand.

A simple example is: PeopleSoft is connected to Oracle Financials, and a sales person enters a new customer order; the EAI application picks up the new customer / new order, and sends it to Oracle Financials to be recorded. EAI is also transaction oriented. EAI's major flaw? It doesn't talk to "non-applications" like legacy systems, data warehouses, Excel spreadsheets, stock tickers, unstructured data, email, and so on (although some vendors have built custom "readers" for this information).
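The PUSH style can be sketched with a minimal in-process message bus. The class `Bus`, the topic name `order.created`, and the list standing in for the financial system are hypothetical. Real EAI products add queuing, transformation, and guaranteed delivery on top of this basic shape.

```python
from collections import defaultdict


class Bus:
    """Hypothetical message bus: applications publish, listeners react."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Push the message to every application listening on the topic.
        for handler in self._subscribers[topic]:
            handler(message)


financials_ledger = []  # stand-in for the receiving financial system
bus = Bus()
bus.subscribe("order.created", financials_ledger.append)


def enter_order(order):
    """Stand-in for the order-entry application recording a new order."""
    bus.publish("order.created", order)


enter_order({"customer": "ACME", "amount": 250})
print(financials_ledger)
```

The receiving side never asks for anything. The transaction in the source application is what sets the flow in motion.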

ETL - Extract Transform and Load, sometimes known as ELT (extract load THEN transform). This also is an older paradigm (although somewhat newer than EAI from an acronym standpoint). ETL/ELT offer PUSH technology. Usually geared towards huge volumes, highly parallel, repetitive tasks, scheduled and continuous. These are a kind of heart-beat of many integration systems around the world today - they feed massive amounts of data from point A to point B in a timely fashion. They are responsible for performing that task on a consistent and repeatable basis. They handle massive transformations (sometimes in the database, sometimes in stream).

Most ETL/ELT engines today also run on metadata, but a different kind of metadata (compared to EII). I like to call the metadata they utilize PROCESS METADATA. It contains back-office workflow information; the end results of the data integration are often seen through data marts or by querying the database directly. Although rare, ETL/ELT can also be used as a device to synchronize systems around the organization on an hourly or nightly basis.

ELT/ETL engines often do NOT respond well to transaction based requests, which is why ETL/ELT vendors are struggling with Real-Time integration today. An example of ELT/ETL would be: Integrate all customer data from 4 or 5 of my source systems overnight - produce a customer management table with all my customers in it. While you're at it, get me an ice-cream with a cherry on top and a root beer... Just kidding.
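The overnight customer integration can be sketched as three small steps run as one batch. The field names, the cleanup rules, and the "first occurrence wins" matching rule are assumptions for illustration. Real customer matching is much harder than comparing tidied-up IDs.

```python
def extract(sources):
    """Yield (system_name, raw_row) for every row in every source."""
    for system, rows in sources.items():
        for row in rows:
            yield system, row


def transform(system, row):
    """Standardise one raw row into the target layout."""
    return {
        "customer_id": row["id"].strip().upper(),
        "name": " ".join(row["name"].split()).title(),
        "source": system,
    }


def load(records):
    """Build the customer table; the first record seen per ID wins."""
    table = {}
    for rec in records:
        table.setdefault(rec["customer_id"], rec)
    return table


def run_batch(sources):
    return load(transform(s, r) for s, r in extract(sources))


# Made-up sample data from two hypothetical source systems.
sources = {
    "crm": [{"id": "c1 ", "name": "acme  corp"}],
    "billing": [{"id": "C1", "name": "ACME Corp"}, {"id": "c2", "name": "globex"}],
}
customers = run_batch(sources)
print(customers)
```

The whole job runs start to finish over every row, on a schedule. That suits volume, and it also shows why the approach struggles with a single transaction arriving right now.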

Well, this brings this entry to a close. I hope you enjoyed "my version of the truth." Feel free to correct me, and I'll do more homework next time. Same B-Eye Time, Same B-Eye Channel, tune in next time for: For Whom the EII bell tolls??

Cheers,
Dan L

  Posted by Dan Linstedt at 7:30 PM | | Comments (5)


September 16, 2005

Stinky Feet - Nanotech, the laces that bind them!

I blogged about the Stinkiest Shoe Competition back in May 2005. Read it here... This time I come to find out that a sock manufacturer has actually created (or is working on creating) anti-stink SOCKS through nanotech.

**Note, this blog is not for those with severe allergic reaction to big stinks...** (this entry is short and light hearted) :)

I asked the question: what would happen if a shoe manufacturer could utilize or apply nanotech in a way that would make tennis shoes less stinky? Now with less stinky socks do we even need to worry about stink free shoes? Probably - especially for those who don't wear socks.

But besides the point, read about this Boston Sock Manufacturer and their application of nanotech to sock material here... Really, I'm not kidding. I think there's big money to be made in the business of applying nanotech to solving every day problems like stinky shoes and now, stinky feet.

All I need now is a pair of socks that "don't get cold when my feet sweat."

Weigh in now, tell us how you feel - would you buy a pair of these socks? You can answer anonymously or "for a friend of yours" if you like...

See you next time, Dan L

  Posted by Dan Linstedt at 10:45 AM | | Comments (1)


What should your Dashboard look like?

Let's cut right to the chase, I've been blogging on how data modeling affects our abilities to actively "execute" on data - in other words, how well we can interpret or interpolate the results from on-screen information to actionable business decisions.

I've found a software company in the Nanotech side of molecular modeling and 3D visualization that specializes in "bringing data to life". Talk about drill-down, and simulation! These guys appear to have it together. Read on to find out why I believe these types of advancements are needed to light a fire under our BI systems.

On another note: Tiny droplet of water moved up a slope using "nanomachines". Very short article, really cool technology!

Ok, back to the 3D visualization. I received some comments about data visualization and how it can be extremely complex to try and map business data to 3 dimensional models. Let's step back and take a slightly different tack.

A completely different field (nanotech) is way out front in this area. Read a COOL article here, see the models they offer. Their business users: scientists, chemists, physicists, and so on are all required to utilize simulation and modeling software in real-time and 3D space. They are working in a virtual world. Their business users have complete control over "what-if" analysis, storing and saving simulations, replaying, intersecting the simulations, and building new representative models of the information displayed within. Granted - they work in a 3D world, where molecules have a basic shape and can be defined with motion vectors.

However this is where I get off the bus and look around to ask: where are the BI vendors that are "striving" to break the curve, push the paradigm, and blow our socks off with a new paradigm? Why aren't the BI vendors partnering up with data mining vendors, and data visualization specialists? Why can't the BI vendors bring in animation specialists? It's not just the nanotech sector!!

Take a look around: the industry of BI is evolving, and the BI vendors appear to have been "left on the shore." I got news: The BOAT HAS LEFT THE DOCK and is already 10 miles out to sea. Just look around at the education industry, the internet, or the gaming industry. When was the last time you said: "this game is cool with its bar charts and pie graphs, I think I'll play that one non-stop!"

Or how about this: "Wow! Look at this web-site! It's got a title, and static text! I think I'll create my corporate portal this way!"

Ok - here's one more: When was the last time you took a class that consisted of the ever-fun and addicting "read this word document" - then we'll test you on it, on line.

No!! You want to hear the instructor, see animation, watch the components in action, replay the learning pieces. You want to see the web-site updated, graphics and sound, flashy movement - spiffy looking and functional. What about games? Why was DOOM or Descent so popular? 1. Addictive interactive play, 2. The sequences were varied every time it was played 3. Intensive graphics and incredible "experiences".

You know, our BI vendors could take some lessons on this - it's time they brought their visualization interfaces into the 21st century. I'm tired of hearing "turn this (picture of spreadsheet) into this... (picture of red-green-yellow speed dials and bar charts)."

Here's my AD draft for the next big BI visualization tool:

"Interact with this, develop visualization scenarios, view your data across multiple axis (dimensions), swap your dimensional points in and out of your graph to change the landscape, walk around the graph, give motion to your graph in real time - backed with the latest in data mining and visualization technology, we BLOW the covers off the other BI vendors in presentation, style, and interactivity.

That's right! PLAY with your data in a way that is educational, have FUN in a 3D virtual world, see connections across data relationships like you've never seen before!"

Ok - so I'm not a salesman, but I want a FUN tool with hot graphics and the option of time-lining data (like a video editor in playback mode) over a 3D landscape. If I could adapt the Nanotech visualization and modeling tool to the BI world, I would. The next BI vendor to make this paradigm shift could be rich, really really fast.

A couple of questions for the readers:
If you had to suggest HOW to make this happen to a BI vendor, what would you say?
Would you want a system like this? How would it impact your business and business decisions?

See you soon, Dan L

  Posted by Dan Linstedt at 9:49 AM | | Comments (3)


September 14, 2005

Is an EDW Legacy Technology because of EII?

The availability of real-time access to live business data - business visibility, as Cisco likes to call it - will draw a line under enterprise investment in data warehouse products, says Michael Carter, co-founder and chief marketing officer of CXO Systems: "The data warehouse is going the way of the mainframe." This from an article at Loosely Coupled.

Seriously folks! I'm not kidding. Is the Data Warehouse really going the way of the mainframe because of EII?

http://www.businessintelligence.com/ex/asp/code.115/xe/article.htm

I do not agree with this statement at all. In fact, if we look at EII and its value to the industry (which it has quite a bit of), EII is fitting in nicely as another mechanism to backend reporting - allowing the EDW to remain in a strategic nature, while handling the tactical and operational data ON TOP of the EDW. If we dig deeper, and examine the larger picture of SOA we find that a strategic EDW becomes CRITICAL to the mix of back-end components required to expedite the creation of an enterprise SOA system - hardly legacy technology.

Just because EII is becoming critical in the component stack doesn't mean the amassed data sets are old, brittle, and non-conformant to business. In fact it’s just the opposite. Successful data warehouses play a huge role in the strategic success of understanding the business and feeding vendors, supply chains, external customers, internal customers as much data as they can handle. Without a consolidated and quality controlled data warehouse, the SOA is just another EAI system with exposure to the outside world.

Ok - I'm upset, but shouldn't it be that way? I don't mind the change to EII, nor the need for EII to be involved in SOA initiatives - but don't tell me the Data Warehouse is going the way of the mainframe, and then not back it up with quantitative facts. A more accurate statement would be that the EDW is changing into a more dynamic and integral part of the overall enterprise architecture.

Here's another article: "The Data Warehouse Is Dead", written in 2004 after the fall of Enron. Data warehouses are NOT dead; they are alive and kicking - in fact, most are expanding. Michael Carter, again.

I agree there is value in distributed intelligence, don't get me wrong. I also agree there is lots of value in up-to-date information. However, I feel he is tremendously discounting the nature of quality efforts, the ROI that companies have seen, the first look at an integrated or patterned history of customer activity, the data mining results netting corporations millions of dollars and so on. As with ANY project, GOALS and OBJECTIVES must be set, RISKS must be mitigated, and REQUIREMENTS must be written.

It's a shame for the EII industry that this gentleman feels the need to discount one of his major sources of quality data for enterprise views. Where do his reports and services GET their (historical) information from if an enterprise integrated view is NOT available? Can auditors answer the questions of what happened on Day X if it's not stored in a data warehouse somewhere? I'm not so sure.

In another post, Andy Haylor (Kalido) says: EII requires provisions to access the data, as does the Service Oriented Architecture that will feed the enterprise needs. He goes on to state several major issues that EII as an industry has yet to overcome (IF it wants to replace the data warehouse entirely rather than feed from it): the nature of gathering and integrating history (consistently), producing snapshots of data AS OF a particular point in time, especially when the source systems have "dumped" the data because they are operational, managing and controlling query access speed and timing against operational systems, trending analysis and so on.

What I would say is EII has value, and it also has its place - not to mention it's a technology built to solve specific business problems. EII as an industry needs to mature. By mature I mean: build standards, define methodologies for implementation, provide best practices and tips and tricks for implementation, develop case studies showing how it solves specific business problems and what the ROI on those problems is, and begin defining risk mitigation for projects and implementations across the board.

Again, EII isn't the issue here - EII is an additive component that brings value to the table for existing data warehouses, and increases the need for corporations (those that don't have one) to build EDWs, particularly Active Data Warehouses with Right-time data delivery to the SOA through utilizing BI and EII together.

Thoughts?

  Posted by Dan Linstedt at 6:19 AM | | Comments (2) | TrackBacks (1)


September 13, 2005

Checking in on the Nanohouse Computer

http://sawww.epfl.ch/SIC/SA/publications/SCR02/scr13_page23e.html

The Nanohouse computing device is still just a dream today, and it may be bound to stay that way for some time. It never hurts though to explore the "what-if" side of things. In this blog entry we explore the advances made in DNA computing and self-assembly. Self-assembly is an important part of nanoscale machines. It provides the ability to produce consistent, repeatable (and ordered) circuitry. These patterns are the very foundation of the Nanohouse large-scale data capture and modeling efforts.

"This stuff is coming," Uldrich says, "and it's coming a lot sooner than many people believe." ComputerWorld.

Molecular electronics is one of the most promising directions in nanotechnology [1]. The building blocks of future molecular electronic devices could be specially designed organic molecules assembled on appropriate substrates into useful circuits through the processes of self-assembly, i.e. the spontaneous organization of the molecular building blocks... SuperComputing Review Publication.

My hypothesis:
The larger systems get, the more order they must have - or they become unmanageable, unwieldy, and begin behaving badly.

For example, consider the initial construction of the automobile. When Henry Ford sat down and thought about the problem of "mass production with consistent quality", he came up with a revolutionary system: build all automobiles the same way every time - that answers the quality side of it, and then add repeatable and redundant tasks along a series of checkpoints - voila the assembly line.

What do you think would have happened to the creation of the automobile if he had said: build 100 autos a day, everyone needs to be an expert in their field - and build their own car from bottom to top (without an assembly line)?
Chaos would have ensued, and his factory probably would have fallen apart from all the mistakes that were made. No single individual could have been an expert in every aspect of building the car. Consistency, repeatability, and order are the keys to automation - and thus self assembly of the nanoscale warehouse.

"The concept of a mass-produced structure with dimensions measured in atoms helps explain why researchers are turning to nanotechnology as the next great hope for Moore's Law..." ComputerWorld

The nanohouse relies strongly on these principles, in fact so strongly that it forces us to rethink the way we compute, store, and utilize information (data). Data models that represent 2D space are no longer enough. We must concentrate our efforts on 3D modeling and learn from the molecules involved in the nanoscale calculations.

Example: "Another important simplification is made when the interaction of valence electrons with the electrons of the inner electronic shells of atoms is described by effective atomic pseudo potentials."

Let's paraphrase and over-simplify as we apply this to the nanohouse:
"Another important simplification is made when the interaction of [two or more business keys] with the [business keys of other elements] is described by [relevancy and frequency of relationship] potentials." The business keys provide the unique reference points into the information housed within the nanoscale devices.

The job of the nanoscale devices is to:
a. understand the data they carry (have some knowledge as to what would constitute a weak or strong bond i.e. relevancy)
b. understand what other nanoscale components they are allowed to "connect with" or self-assemble to.
c. propel themselves through the environment looking for other elements to attach to.
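One very down-to-earth reading of "relevancy and frequency of relationship" is simply to count how often business keys show up together and treat that share as the strength of the bond. The sketch below does only that. The function name `bond_strengths`, the scoring rule, and the sample keys are assumptions of mine, nowhere near a model of nanoscale self-assembly.

```python
from collections import Counter
from itertools import combinations


def bond_strengths(transactions):
    """Score each pair of business keys by how often they co-occur.

    Hypothetical measure: the share of transactions in which both
    keys appear together (0.0 = never, 1.0 = always).
    """
    counts = Counter()
    for keys in transactions:
        # Sort and de-duplicate so (a, b) and (b, a) count as one pair.
        for pair in combinations(sorted(set(keys)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items()}


# Made-up sample: each tuple lists the business keys seen together.
transactions = [
    ("cust:1", "prod:A"),
    ("cust:1", "prod:A"),
    ("cust:1", "prod:B"),
    ("cust:2", "prod:B"),
]
for pair, strength in sorted(bond_strengths(transactions).items()):
    print(pair, strength)
```

A model that carried scores like these could tell a "strong bond" from a "weak bond" between keys, which is the kind of attribute the modeling discussion here is asking for.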

The results would show incredible ability to form "memory like" structures hopefully one baby step closer to the human brain functionality. It would have the capabilities of re-wiring itself by changing the self-assembled structure, or by being stimulated by an outside charge.

Let's examine this from a scientific perspective as it relates to the modeling necessary to represent a system like this:
"The most demanding parts of the calculations are i) the fast Fourier transforms (FFT) needed to evaluated the total charge density in real space and ii) the scalar products between wavefunctions, which are necessary to enforce orthogonality between the orbitals. Both operations can be efficiently parallelized [4] so that the overwhelming majority of the operations are performed locally on each processor through calls to optimised library routines (matrix-matrix multiplications (MMM) and one-dimensional FFT), while a carefully written proprietary three-dimensional (3D) FFT routine assures that the communication overload is minimized during grid transpositions." SuperComputing Journal

We must change our "data modeling" skills into biomechanical modeling skills. Why is this a big leap? Why is it so important for our success moving forward? What impact does it have on the Nanohouse of the future?

"Information and algorithms appear to be central to biological organization and processes, from the storage and reproduction of genetic information to the control of developmental processes to the sophisticated computations performed by the nervous system. Much as human technology uses electronic microprocessors to control electro-mechanical devices, biological organisms use biochemical circuits to control molecular and chemical events. The ability to engineer and program biochemical circuits, in vivo and in vitro, is poised to transform industries that make use of chemical and nano-structured materials." California Institute of Technology

What we need to address NOW is our primitive thought processes. It's time to think outside the box - time to expand our horizons. Can we get a Data Modeling tool vendor to finally come to the table and offer 3-D modeling based on variances, strength of bonding (associative properties), and relevance? If we can build some of these attributes into our respective data models - that's one step closer to the nanohouse. Of course there are hundreds of miles to go before we get there. The modeling is where it starts; from there we can begin to focus our efforts on the programmatic shifts that must take place.

Additional blog entries will continue exploring the nanohouse along with the notions of DNA computing, and self-assembly. We will explore the notions of the hypothesis stated earlier and work at uncovering what happens to a system when it expands beyond order.

Seen any interesting nanotech articles lately? I'd love to hear about them. What's your view on Nanotech, DNA computing, Information Modeling? Sound off!

  Posted by Dan Linstedt at 6:03 AM | | Comments (0)


Nanotech lessons for Data Modelers

We could learn A LOT about information modeling from the nano molecular levels if we only paid attention. Self-assembly at the nanoscale provides many clues about how we should model our information systems as they grow. This blog entry highlights self assembly and its attributes: repeatable, consistent, and reliable.

"Although man's understanding of how to build and control molecular machines is still at an early stage, nanoscale science and engineering could have a life-enhancing impact on human society comparable in extent to that of electricity, the steam engine, the transistor and the Internet." -- Professor David Leigh, Edinburgh University


ComputerWorld reports that self-assembly and mixed silicon circuits are 5 to 7 years off. However, they do present some very interesting findings from the lead laboratories in the nation. Here we explore impacts, and apply the ideas to a cross-field: data modeling.

"The neat thing about SAMs is they're very well ordered," McGimpsey says. A field of these SAMs protrudes from the substrate at a well-defined angle—like a small patch of thick, well-tended grass—and can perform several duties, such as improving conductivity or increasing surface area. Such order, McGimpsey says, "means predictability of structure, and thus of properties." (from the ComputerWorld article mentioned earlier)

The order means predictable structure and properties - shouldn't we be taking our data modeling cues from nature? Our current data modeling efforts inside RDBMS engines are ancient history; 3rd normal form has only a few ties to natural structure. Our data models must reflect the natural models at the nanoscale. They need to be repeatable, predictable, and redundant. This is a foundation of the Nanohouse. See my web site for more information.

What does Order bring to the table?
Redundancy, fault-tolerance, control, scalability, repeatability are all attributes of order. If we can provide an ordered data model for our information systems (one that resonates with natural models) we can begin predicting how it will act under certain circumstances. We can also begin producing (automatically) the models that will house our data.

No matter how large the data sets grow, we can always predict exactly how the model will perform – especially through the use of Fourier transforms and mathematical formulas. "Natural systems form nano-scale structures." Natural systems also provide accurate accounts of form and function. Why then do we in IT insist on creating artificial modeling elements in a 2-dimensional world to house our data? We should be solely focused on 3D modeling capabilities with repeatable and redundant design (ordered systems).

With IT moving toward SOA, we should also be focusing on the data model behind the scenes – can it self-assemble in the future? Can self-assembly mean self-maintaining data models? Can data models proactively change according to newly arriving stimuli? Can we teach our modeling systems (in the information industry) like chemical experiments? When will our data modelers finally learn that it’s about the FORM and FUNCTION, not just the data itself?

For now, focusing on the biological aspects of nano self-assembly can bring tremendous gains to the data modeling world – if for nothing else, housing huge quantities of information in an itsy-bitsy space and in an ordered and repeatable fashion.

Do you believe Data Modeling needs an overhaul? Sound off!

  Posted by Dan Linstedt at 5:59 AM | | Comments (0)


September 8, 2005

Data Modeling may also be stuck in 1985!

In my blog: "Stuck in 1985", I discuss the nature of graphing, and how I believe the current BI Reporting vendors aren't doing enough to represent the data for visual recognition. There's a flip side or an underside to this current as well. The question I'm driving here is: Is accurate data visualization driven by data modeling architecture of the warehouse behind the scenes?

I would tend to say YES, it is. In this blog, we explore this notion a bit more in depth. Take a look and let me know what you think...

I begin by pointing towards visualization tools, just as I pointed towards graphing tools in the last round. In this particular case, there's an Open Source data visualization component called OpenVis from IBM. In the enhanced data model section it discusses how the "data model" plays a critical role in the visualization capabilities. With OpenVis, apparently the data model is an object-oriented component. Here, they discuss the details of the data model in action.

I believe that data modeling is a key to open many doors. The data model should be consistent (in architecture), repeatable, redundant, and flexible to change (without restating data). In this case, the components or entity types should be standardized beyond just "parent child". In order to gain some sort of 2-dimensional understanding of a data model, patterns within the data model itself must be easily recognized.

“If we assume that the viewer is an expert in the subject area but not data modeling, we must translate the model into a more natural representation for them. For this purpose we suggest the use of orienteering principles as a template for our visualizations.” http://www.thearling.com/text/dmviz/modelviz.htm

In this case, orienteering is the use of “anchor points” like a 3D landscape where we anchor ourselves to visual cues, street corners, addresses, height of buildings, etc. In the data modeling case, orienteering could easily mean data points treated as geographical or spatial coordinates. In other words, the data model can be capable of driving multi-axis (multi-dimensional) graphing qualities; in fact, I blogged about this earlier.

Here is a very interesting knowledge portal used to visualize information in a moving format (theBrain). Here the data model is virtual – embedded in the software’s reference layer to the content it collects. It reflects a neural net behind the scenes. What if we were to extrapolate the notions behind neural net? What if we were to over-simplify the representation of information in a standardized data modeling format? Would we be better equipped to visualize and mine the information in its native stored format?

I’ve attempted to do just that with the Data Vault data modeling architecture. It’s a standardized set of entity types that represent a poor man's neural net. It provides a two-dimensional data storage space with the capacity for N-dimensional bisection/associations based on the physical data stored within the entities. It is based on the business keys and semantic definition of those business keys, along with the grain of those keys. In this manner – grain might be considered one dimension; semantic definition could be another dimension. Within the model we can add gradient and mechanical relevance scores to assist in defining associative properties between elements. In turn, it becomes easier to represent this information in a 3D modeling format, where the data can be visualized and explored on (for instance) landscape maps.
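
To make the paragraph above a little more concrete, here is a minimal Python sketch of what a standardized set of entity types keyed on business keys might look like. The names Hub, Link, and Satellite follow common Data Vault terminology; everything else (the field names, the MD5-based key, the `relevance` score, and the sample values) is an assumption made up for illustration, not the definitive Data Vault specification.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic key from one or more business keys (assumed MD5)."""
    normalized = "|".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Hub:
    """A business key, together with its grain and semantic definition."""
    business_key: str
    grain: str        # hypothetical field: e.g. "customer", "order"
    definition: str   # hypothetical field: plain-language meaning of the key

    @property
    def key(self) -> str:
        # The grain is part of the key, so the same value at two grains differs.
        return hash_key(self.grain, self.business_key)


@dataclass(frozen=True)
class Link:
    """An association between hubs, carrying an illustrative relevance score."""
    hubs: tuple
    relevance: float = 1.0  # hypothetical strength-of-association score

    @property
    def key(self) -> str:
        return hash_key(*(h.key for h in self.hubs))


@dataclass
class Satellite:
    """Descriptive attributes attached to a hub or link, stamped with load time."""
    parent_key: str
    attributes: dict
    loaded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Hypothetical sample data
customer = Hub("C-1001", "customer", "a party that buys from us")
order = Hub("O-77", "order", "a single purchase transaction")
placed = Link((customer, order), relevance=0.9)
detail = Satellite(customer.key, {"name": "Acme Corp"})
```

In this sketch the grain and the business key together determine a hub's key, and the relevance score on a link is one possible way to carry the strength of an association that a 3D visualization could later map to distance or height.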

I believe that the key to visualization, and better understanding of our information relies heavily on the architecture or data model housing that information. You can read more about the Data Vault Here...

  Posted by Dan Linstedt at 8:37 AM | | Comments (1)


September 6, 2005

Stuck in 1985!

I've had a bit of time to think while I've been away. I've pondered this question quite a bit as of late. I'm looking into business intelligence presentation layers and wondering why the vendors seem to be stuck in 1985. I'm not talking about the physical data, or the drill through, or the web-capabilities.... No, I'm talking about the graphs and graphics available by the vendors themselves.

I find myself asking the question: In this day and age, is it really necessary to "get excited" over executive dashboards that contain the latest bar chart, pie chart, line graph, speed dial, and red-yellow-green coloring?

Let me step back for a moment and discuss a few other capabilities that I'm referring to. Take for instance web sites, and web-presence. Most corporate web-sites are moving - full of animation, video, sound, dynamic content and vivid coloring. These days, the old point and click simplistic 2D-looking site (mine included) can only hold one's attention for roughly 2 minutes (if I'm lucky). This is one example of the evolution of human-machine interface.

Why then must I settle (as an executive officer or senior management) for 2D sterile 3 color bar-charts? Is there something in the corporate culture that prohibits me from having a dynamic 3D fly-through interspatial data exploration experience? Well, there might be...

Ok, let's talk about another example. If we move on to Oil and Gas exploration - what kind of interface do the engineers use to explore the ground for possible oil and gas deposits? How about managing or mapping an existing oil well? For example, check out some of the representations at the bottom of this page (Petroleum Graphics). Granted, geography exists in 3D space - so physically we can represent the data this way according to lat-long, and height or depth. Here's an example of the graphs that they might use for land mapping (RockWorks). There's one particular graph that shows leakage of radium from tanks. What if we could say the tanks are business units, and the leakage was loss of revenue? Would we be inclined to "pinpoint" the hole in the tank? Would we be interested to know if the hole is the same in all tanks, the same size, location, how many holes?

However, why shouldn't we ask the BI vendors to begin producing more sophisticated graphing choices? What if we could get a 3D graph of our business in real-time, where some of the points along the height axis are moving up and down, the 2D axes bisected with planes, and colors representing more stagnant components of business? In other words - a graph we can "fly-through" or truly drill into.

This type of graph might make BI more fun, dare I say addicting? Imagine how much time we might spend investigating productivity gains or playing with business profitability if motion and fly-through were a part of it... Ok, so there's a lot of mathematics behind the engines, and it may require a deep understanding of the business to assist in developing a custom solution - after all, it's one of the ways a business maintains its competitive edge, right? Just think of the visual correlations that could be made with this type of graph, immediate conclusions that were otherwise hidden in two separate 2D graphs.

Here's a colored plot (Aspire Software) that begins to be interesting. It represents sound waves and frequencies. Here's another technical plotting (Techplot) software that can represent all kinds of neat real-world elements. Give the page some time to load, then scroll through it.

At this point I wonder, is it the business users who have been lulled into believing they can only get 2D charts and graph representations from their vendors, or is it the vendors touting that this is the latest and greatest available, and look at this neat coloring feature of this graph? In order to be fair, let me say again - these vendors have put a tremendous amount of fantastic engineering into getting, manipulating, drilling, and walking through the data - it's just the graphing component of the presentation layer that I'm referring to.

In all fairness we have to ask: is the average business user ready to make use of all this power on their desktop? Here's a quote from a vendor: Who Uses This Stuff? that you might find interesting:

EVS- Environmental Visualization System provides state-of-the art analysis and visualization tools for geologists, geochemists, environmental or mining engineers, oceanographers, archaeologists and modelers. EVS provides true 3D volumetric modeling, analysis and visualization. Advanced site visualization can help reduce site assessment costs and enhance data presentation capabilities for remediation planning, litigation support, regulatory reporting, and public relations.

Personally I think it's high time big business intelligence and data reporting vendors step up to the table, and begin to demonstrate the true power of data visualization. I think the current state of reporting engines is still "stuck in 1985" for the time being.

I'm open to comments, thoughts, questions... Again, the BI vendors are strong in the back-end engineering and data integration/retrieval - I'm targeting the graphic presentation specifically.

  Posted by Dan Linstedt at 7:10 PM | | Comments (2)