Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I'll cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, unstructured data, DW, and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for Master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

July 2005 Archives

With the super-swell of data sets these days, it can be challenging, if not impossible, to sort out which DBMS vendor does what. Most vendors offer different sets of features and functionality, and they all leapfrog one another from one feature to the next, one version to the next - but at the end of the day, we as customers must decipher which solution fits our needs.

This entry is an attempt to suggest which features are critical (in a generic sense) to managing VLDB/VLDW going forward. If you have features you'd like to suggest, or things your company really needs, please comment.

In the VLDB/VLDW world things change, as strategic EDWs become tactical EDWs and our world shifts into near-real-time this and that. Instant responses aren't always what they're cracked up to be. Vendors throw around the term "single version of the truth" when all the while it really should be "single version of the FACTS," because "truth" is purely subjective, and squarely in the hands of the BI user.

However, volume does funny things to our systems. It forces us to shift paradigms from SMP and shared-X to MPP, or MPP/SMP clusters with shared-nothing under the covers. It forces our architectures to change, our data models to change, and our latency for loading data to shrink. Of course, I'm assuming this is all business driven, right?

Let's put it this way: you can wash 1 car 5 ways from Sunday and take all day Sunday to get it sparkling clean, but if you have 500 cars to wash, well - you need a system, a standardized system whereby each part takes X amount of time, and there are multiple people working in parallel to get all the cars clean. If you double that to 1,000 cars a day, then 2,000, then 4,000, then 8,000, pretty soon you've overloaded the mechanism for cleaning all these cars in one day. You begin to need efficient machines that work on the standardized system, giant machines all operating in parallel, that can wash 500 cars in two hours or so.

Just like this example, there is a breaking point in the architectures of DBMS vendors that promote SMP clustering (without MPP controllers) and shared-X architectures. What you could do architecturally with 5,000 rows of data and 1 hour doesn't necessarily work at 500M rows in the same hour.
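To make the breaking point concrete, here's a back-of-the-envelope sketch in Python. The per-row cost and node counts are made-up illustrative numbers, not benchmarks from any vendor:

```python
# Hypothetical numbers: a fixed per-row load/transform cost, a 1-hour
# batch window, and ideal shared-nothing scaling across nodes.
PER_ROW_SECONDS = 0.0004   # assumed cost to load and transform one row
WINDOW_HOURS = 1.0         # the batch window we must fit inside

def wall_clock_hours(rows, nodes=1):
    """Ideal wall-clock time, assuming work divides evenly across nodes."""
    return rows * PER_ROW_SECONDS / nodes / 3600

for rows in (5_000, 500_000_000):
    for nodes in (1, 64):
        t = wall_clock_hours(rows, nodes)
        verdict = "fits" if t <= WINDOW_HOURS else "misses"
        print(f"{rows:>11,} rows on {nodes:>3} node(s): {t:9.4f} h ({verdict} the window)")
```

Five thousand rows fit on a single box with hours to spare; 500M rows on the same box blow the window by more than fifty times, and only spreading the work shared-nothing across many nodes pulls it back inside.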

I would suggest that the following criteria are important when evaluating VLDW/VLDB vendors. (I'm not talking about just having 500M rows and never using them; I'm talking about active information that flows into the database and is utilized: queried, summarized, and acted on.) A toy scoring sketch follows the list.

* Shared-nothing MPP
* Fail-over and fault-tolerant SMPs underneath
* Redundant networking, redundant disk, redundant CPU, redundant RAM
* High-speed throughput
* Compression
* Dynamic and batch loading capabilities
* High-speed, redundant, dedicated I/O: 300-400 MB per second or better in raw data copy speed
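As a way to keep an evaluation honest, here's a minimal, hypothetical scorecard in Python. The weights and the sample scores are placeholders of my own invention; substitute your business-driven weights and measured results:

```python
# Illustrative weights for the criteria above -- tune these to your needs.
CRITERIA_WEIGHTS = {
    "shared_nothing_mpp":     0.25,
    "failover_smp":           0.15,
    "redundancy":             0.15,  # network, disk, CPU, RAM
    "throughput":             0.15,
    "compression":            0.10,
    "dynamic_and_batch_load": 0.10,
    "raw_io_speed":           0.10,  # score 1.0 at >= 400 MB/sec raw copy
}

def score_vendor(scores):
    """Weighted sum of per-criterion scores, each in [0.0, 1.0]."""
    return sum(CRITERIA_WEIGHTS[c] * scores.get(c, 0.0) for c in CRITERIA_WEIGHTS)

# Hypothetical vendor results from a proof-of-concept load test.
vendor_a = {"shared_nothing_mpp": 1.0, "failover_smp": 0.8, "redundancy": 0.9,
            "throughput": 0.7, "compression": 0.5, "dynamic_and_batch_load": 0.6,
            "raw_io_speed": 1.0}
print(f"Vendor A: {score_vendor(vendor_a):.2f} out of 1.00")
```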

There are hundreds more criteria, but these should get you out of the blocks. If embarking on VLDW or VLDB, it would be wise to review your current architecture, and possibly to load test it by duplicating the data set you currently have. Vendors know where and when you'll hit the wall with your VLDB/VLDW - and in some cases, if they're called in to save the day, they'll jack up the prices (not always, and not all vendors) to help with the switchover, because they know you have no choice. Your systems will reach a point of no return and fail. I've seen it happen.

If you'd like to hear more about this subject, feel free to reply with thoughts and comments. If you disagree, I'd like to know why, and what your experience has been - especially if it's been positive with a particular vendor.


Posted July 28, 2005 7:04 AM

EII, aka Enterprise Information Integration. Does it really have a chance to survive, or is it just another passing fad?

As an architecture it makes sense, a lot of sense - but then there's SOA, with a much larger view of the world and a lot more integration under the covers. So is EII just the technology to make SOA work, or is there something else going on here?

EII is an interesting topic; it gets a lot of buzz, both positive and negative, in the industry. The vendors in this space today are new and considered first generation (by their own accounts), but they are rapidly racing to come up with generation two.

What I've seen so far is that EII as a niche player provides some value to the business, as long as the business wants to integrate "data now" across the organization and is interested in an enterprise view (including outside or external data sources) of all its information. Where EII runs into limitations today: write-back to source systems that meets ACID tests, and attempts to overtake the entire data warehousing effort by claiming to be a "virtual warehouse."

I think that EII is an interesting category when it comes to replacing the ODS - and maybe the marketeers should be trumpeting "Virtual ODS" instead. If the ODS is built according to most standard definitions (not containing any history except transactional history, because that's reflected in most source systems), then EII can hold a candle to that. I think EII falls short of creating a virtual warehouse; that may come in the future, but for now it just doesn't happen (for a variety of reasons).
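To show what the "virtual ODS" idea looks like in practice, here's a toy sketch in Python. The source systems, functions, and fields are hypothetical stand-ins for real adapters (JDBC/ODBC connectors, web services, flat-file wrappers):

```python
# EII / virtual-ODS sketch: answer an enterprise question by federating
# live sources at query time. Nothing is persisted, so there's no history;
# re-running the query reflects the sources' current state.

def query_crm(customer_id):
    # stand-in for a live call to the CRM system
    return {"customer_id": customer_id, "name": "Acme Corp", "segment": "Enterprise"}

def query_billing(customer_id):
    # stand-in for a live call to the billing system
    return {"customer_id": customer_id, "balance_due": 1250.00, "currency": "USD"}

def virtual_ods_view(customer_id):
    """Federate current-state data across systems at request time."""
    record = {}
    record.update(query_crm(customer_id))
    record.update(query_billing(customer_id))
    return record

print(virtual_ods_view("C-1001"))
```

The moment the business asks "what did this look like last quarter?", the virtual view has nothing to offer - which is exactly why it complements, rather than replaces, the warehouse.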

As the EII vendors rush to generation two, the SOA "vendors" are gearing up with generation one. As I've stated before: the tools underneath SOA architecture and EII have a lot of overlap, and some EII vendors are actually tooling EII generation two to include SOA offerings (with web services, security, and compliance).

As with any technology, there's convergence in the marketplace - convergence across EII, EAI, ETL, and web services. SOA is the architectural icing on the cake.

EII is a technology to watch, and today - if you have a very specific question about your enterprise that needs data from many sources (but not history), then EII may solve the problem elegantly.

Comments?


Posted July 27, 2005 6:58 PM

I'd like to explore the material known as Wellstone. There are some interesting aspects to this material, which is written about in "Hacking Matter" by Wil McCarthy. It is not necessarily nanotechnology so much as it is quantum-level materials and bio-molecular control over nano-sized or meso-sized particles.

We will return to the world of Nanotechnology and DNA computing shortly, for now - let's talk about Wellstone.

Definitions from the book:

Quantum Well: "When layered in particular ways, doped silica can trap conduction electrons in a membrane so thin that, from one face to the other, their behavior as tiny quantum wave packets takes precedence over their behavior as particles. This structure is called a quantum well."

"From there, confining the electrons along a second dimension produces a quantum wire, and finally, with three dimensions, a quantum dot."

These are interesting definitions of nano-scale particles. If we were to play "what-if" games, one might begin to imagine that we could do some very strange things if we can harness the power of a quantum well - using wave dynamics to penetrate surfaces and pass information from point A to point B. But it gets more interesting than that:

"The unique trait of a quantum dot, as opposed to any other electronic component, is that the electrons trapped in it will arrange themselves as though they were part of an atom, even though there's no atomic nucleus for them to surround. Which atom they emulate depends on the number of electrons and the exact geometry of the wells that confine them, and in fact where a normal atom is spherical, such designer atoms can be fashioned into cubes or tetrahedrons or any other shape..."

Wellstone is just such a structure, capable of trapping quantum dots in a translucent structure. Given Wellstone, and the nature of the quantum dots, all one has to do is add or remove electrons to change the "chemical makeup" of the designer atoms, thus changing the look and feel (at the macro level) of the object.

In other words, it can look and feel like gold; change the electron count, and it can look and feel like iron, impervium, or even wood. This is, in a true sense, programmable matter. What does this mean to the business world? An interesting question indeed. From a commercial perspective it could mean wealth and power. From a consumer perspective it may mean things like flat computer screens that can change to "writable paper" and back to LCD-like images. It may mean changing the table from opaque to translucent - the table, of course, being made of Wellstone.

More on this soon, what would you do with Wellstone?


Posted July 27, 2005 6:37 PM

This is a marketing entry, but in hopes of helping our EDW community: we have released free (blank) RDBMS and IQ scorecards on our website, along with the ETL scorecards and metadata tool scorecards already there.

Our web site is: www.MyersHolum.com

Thank you,
Dan L


Posted July 21, 2005 7:45 AM

In recent posts I have begun to discuss a notion, or concept, regarding something I call Dynamic Data Warehousing. Its real name should be Dynamic Structural Change and Adaptation Data Warehousing - but who would buy that? Not very marketable, if I do say so myself.

I recently blogged on the war of the appliance vendors, and have written articles in the past on Convergence, and the wave of integration and partnerships sweeping the industry. This is just one of the futuristic items that I believe is completely possible to build with today's technology.

Today it may be expensive, but it can be done. In the future as the market makes way for more consolidations and integrations or partnerships between hardware and software vendors we will see additional efforts headed toward automatic structural manipulation.

I also added an entry on 3-D modeling capabilities, and if the DDW device is ever produced, attaching a 3-D modeling landscape would increase its value 50-fold or more. So what are the basics needed for a DDW "device" or appliance?

Here's a partial list of architectural components:
1. High-speed hardware with the RDBMS software embedded in the firmware, such that model changes and data movement are quick and painless, and indexing is not needed.
2. A back-plane with several card slots.
3. Each slot should be taken by one of the following cards: a Data Mining/Structure Mining card, a Security and Access card, a Data Access (web-browser) card, an SOA and web-services card, and a Real-Time and Batch data integration card.
4. If I had my choice, one more card: chemical modeling software retrofitted to represent data and data clusters, so we can visualize the information in 3-D format.

Each card has a job to do, but they all talk to each other through the backplane's high-speed bus. Data never leaves the appliance; it stays on disk or is buffered in high-speed RAM.
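Here's a minimal structural sketch of that card-and-backplane idea in Python. All of the class and card names are hypothetical; the point is only the publish/subscribe pattern over a shared bus:

```python
class Backplane:
    """High-speed bus: cards attach, then publish and receive messages."""
    def __init__(self):
        self.cards = []

    def attach(self, card):
        self.cards.append(card)
        card.bus = self

    def publish(self, topic, payload):
        for card in self.cards:
            card.receive(topic, payload)

class Card:
    name = "generic"
    bus = None

    def receive(self, topic, payload):
        pass  # each card reacts only to the topics it cares about

class StructureMiningCard(Card):
    name = "structure-mining"
    def receive(self, topic, payload):
        if topic == "rows.arrived":
            print(f"[{self.name}] profiling {len(payload['rows'])} new rows")

class IntegrationCard(Card):
    name = "integration"
    def load(self, rows):
        # real-time or batch arrival; either way, announce it on the bus
        self.bus.publish("rows.arrived", {"rows": rows})

bus = Backplane()
bus.attach(StructureMiningCard())
loader = IntegrationCard()
bus.attach(loader)
loader.load([{"cust_id": 1}, {"cust_id": 2}])
```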

The dynamic section of this device is the use of neural nets and data mining capabilities across the structural components - to explore and dynamically adapt to newly arriving business elements. What we want from an IT perspective is the ability to plug and play a system: as we feed it data, it "discovers" the inherent structure; we teach it to model, and we give it rules for performance and for interaction with the existing storage devices, CPUs, and RAM the card is plugged into. We fine-tune the neural net over time, and it becomes a highly responsive processing machine that knows the "what" portion of our business.
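The "discovers the inherent structure" step is easiest to see in code. Here's a toy profiling skeleton in Python - a real appliance would feed statistics like these into the mining/neural-net card, and the field names are my own hypothetical examples:

```python
def propose_structure(rows):
    """Profile arriving records: field names, inferred types, candidate keys."""
    fields = {}
    for row in rows:
        for name, value in row.items():
            stats = fields.setdefault(name, {"types": set(), "values": set()})
            stats["types"].add(type(value).__name__)
            stats["values"].add(value)
    n = len(rows)
    return {
        name: {
            "inferred_type": "/".join(sorted(stats["types"])),
            "candidate_key": len(stats["values"]) == n,  # unique across the sample
        }
        for name, stats in fields.items()
    }

sample = [{"cust_id": 1, "region": "west"}, {"cust_id": 2, "region": "west"}]
print(propose_structure(sample))
# {'cust_id': {'inferred_type': 'int', 'candidate_key': True},
#  'region': {'inferred_type': 'str', 'candidate_key': False}}
```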

From the business side, we make structural inferences with confidence ratings, put the models back in the hands of business through BI and 3-D modeling efforts, and bring the storage of the data, the model, and the presentation layers up to meet the business processes - closing the gap between IT and business. Train the business users (the few who wish to perform the tasks), and give them full rein to explore the data sets through a 3-D landscape.

From a structure perspective, we adapt, change, and alter the structure through the structural neural net - tweaking for the balance between performance and business representation.

And finally, from an information quality perspective, the appliance helps with both structural improvements and data improvements through a second neural net - one that imputes values, standardizes, cleanses, and reports the results as metadata.
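Here's a deliberately simple sketch of that information-quality pass in Python. The rules (mean imputation, uppercase standardization) are stand-ins of my own for whatever the trained second net would actually decide, and the field names are hypothetical:

```python
from statistics import mean

def quality_pass(rows, numeric_field, text_field):
    """Impute missing numerics, standardize text, report changes as metadata."""
    known = [r[numeric_field] for r in rows if r.get(numeric_field) is not None]
    fill = mean(known) if known else 0.0
    metadata = {"imputed": 0, "standardized": 0, "fill_value": fill}
    for r in rows:
        if r.get(numeric_field) is None:
            r[numeric_field] = fill          # impute from the observed mean
            metadata["imputed"] += 1
        cleaned = r[text_field].strip().upper()
        if cleaned != r[text_field]:
            r[text_field] = cleaned          # standardize the text value
            metadata["standardized"] += 1
    return rows, metadata

rows = [{"amount": 100.0, "state": " co"}, {"amount": None, "state": "CO"}]
cleaned_rows, meta = quality_pass(rows, "amount", "state")
print(meta)  # {'imputed': 1, 'standardized': 1, 'fill_value': 100.0}
```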

If packaged properly, this device can rapidly become "smart" - no, not think for itself, but it will begin to lower costs, lower overhead, and allow "what-if" games to be played with the architecture while judging the impact (visually).

Dynamic Data Warehousing is just beginning; I hope to hear back from you with your thoughts.

Thanks,
Dan L


Posted July 15, 2005 11:29 PM
