Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including: IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

April 2006 Archives

Imagine a smart RFID (radio frequency identification) tag: not just one that bounces back a signal received from a transmitter, but one that emanates a unique number, a RIN (RFID identification number), like a VIN only for RFIDs. I realize that RTLS (real-time locator systems) already have this technology embedded, but imagine it at a smaller scale. RTLS devices are currently very large compared to RFID tags. How would this affect BI? What if the tag could use nanotechnology and an embedded power source (which nanotech reports suggest is possible) to power a unique signal? What would happen to the supply chain, for example?

This entry is just a thought experiment.

Well, I was thinking about my can of soda; yes, I drink soda, and it's usually Pepsi, uhhh, I mean Coke... Oh well, I drink both. Anyhow, what if the can's paint could contain a RIN and a modified RFID that generated signals? What if Coke/Pepsi cared about the geographic location of the can? It is possible to send a satellite signal to each MRFID (modified RFID). This would have to be done using nanotech for an internal power source, and a transmitter or an encoding device would have to be embedded.

In other words, since the power source is usually too weak to respond to a satellite signal, the tag would have to record where it was (latitude and longitude). Every hour it would record its lat/long in a DNA-computing style by folding DNA elements.

Yeah, so what? What if Pepsi/Coke could track the can; what difference would it make?
Well - from a vendor perspective they could start to discover where the cans went when they left the store. Perhaps a scary thought, perhaps not. In any case, it's bound to happen, and not just with the drink manufacturers, but with cars, clothing, artwork, and so on. In fact, with OnStar in GM cars, it's already happening (only on a larger scale). Imagine what marketing power the cola company would have if they knew that on July 4th many of the cans were not only purchased at Wal-Mart, but driven to a remote location where they were subsequently consumed; in other words, a campground.

If the cola manufacturer could figure out how to open a store closer to that location, they might see a boost in sales. They could also place a dispenser machine there, or drive a truck up to the location to sell or promotionally give away their product, all for the sake of loyalty.

But hey, we're talking tracking of the products that we purchase. This raises serious invasion of privacy concerns. I may not want my cola / pants / T-Shirt producer tracking my activities and locations. They'd quickly find out that I'm not worth tracking - moving around the country to present at IT meetings, and working at home most of the time.

On the other hand, think of what law enforcement could do from a business intelligence perspective - a criminal purchases a set of pants or a mask that's tagged with MRFID, and all of a sudden the FBI has a fix on their location... Hey, maybe it's good for tracking wanted individuals. But we'll leave that alone.

What I'm suggesting is the following:
* This technology will come to pass, like it or not. It will happen within 15 to 20 years (or sooner), because vendors would see a huge increase in revenue as a result.
* Nanotech is already here, and there are limitless uses for it.
* Privacy and ethics are a hot debate in the nanotech industry.
* There are some interesting applications for MRFID in the productized world.

Care to share some ideas? Thoughts?
Daniel Linstedt


Posted April 29, 2006 4:23 PM

If you haven't heard of this company, and you're looking for Master Metadata Management, you'll want to check them out. They are a privately held firm based in Silicon Valley, and are OEM'ing their core technology to just about every major vendor out there. That said, they do sell their complete package with all kinds of cool features for a reasonable cost. The business reasons to use their software? Read on - let's try to shed some light on this...

If you're involved in metadata implementation, master metadata initiatives, or in fact data management, data administration, change management of process flows, and other such items, then this tool can fill the bill. Meta Integration (http://www.MetaIntegration.com) is:

a metadata repository with versioning and over 50 different bridges for BOTH import and export of technical metadata. But it goes deeper than that. It manages process metadata, data model metadata, logical metadata, ETL / ELT flow metadata, BI query and database view metadata, and so on. The bridges can connect to many different tools and pull in the metadata from all of them, and through a process called stitching the tool can overlap the metadata from those tools (where appropriate).

From a business perspective their tool provides drill down and FULL HISTORY DATA LINEAGE from the source to the target, and can provide this lineage across the business, in a web-based browser to the business users. The repository can also handle registries of business metadata information, and allow the business users to maintain copies of the metadata (those who have privileges and access may upload the new metadata to Meta Integration as a new version).
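
To make the stitching and lineage idea concrete, here is a minimal sketch in Python. It is not Meta Integration's API or data model; the edge lists, element names, and the stitch and lineage functions are all hypothetical, and the only assumption is that each tool's metadata can be reduced to source-to-target pairs.

```python
from collections import defaultdict

# Hypothetical source-to-target mappings, as if imported from three different tools.
MODEL_EDGES = [("crm.customer.name", "staging.customer.name")]
ETL_EDGES = [("staging.customer.name", "dw.dim_customer.full_name")]
BI_EDGES = [("dw.dim_customer.full_name", "report.sales.customer")]


def stitch(*edge_sets):
    """Merge edges from several tools into one target -> sources map."""
    upstream = defaultdict(set)
    for edges in edge_sets:
        for source, target in edges:
            upstream[target].add(source)
    return upstream


def lineage(upstream, target):
    """Return every upstream element of `target`, nearest first."""
    seen, order, queue = {target}, [], [target]
    while queue:
        current = queue.pop(0)
        for source in sorted(upstream.get(current, ())):
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


graph = stitch(MODEL_EDGES, ETL_EDGES, BI_EDGES)
print(lineage(graph, "report.sales.customer"))
# ['dw.dim_customer.full_name', 'staging.customer.name', 'crm.customer.name']
```

The stitching here is nothing more than shared element names lining up across tools; a real repository would need matching rules for names that differ from tool to tool.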

It is quite a tool. It can save time and money by providing PROFILING of the existing FLOW of the data set across the systems involved in data integration. This is a massive time-saving device, even if the tool is not used for metadata management.

Here are a few of the connectivity options it has: ERwin, Informatica, Ascential, Teradata, Oracle, DB2 UDB, SAP, PeopleSoft, Oracle Financials, SAS, and so on.

We have built metadata management best practices with roles and responsibilities, business definitions, and execution strategies around their tool. We can really speed up or enhance your data management / data administration and metadata initiatives.

Thanks,
Dan Linstedt


Posted April 28, 2006 6:28 AM

I recently had the chance to work with the Beta program of SQLServer2005, and Integration services. Let me tell you, it was amazing. There are several highlights in the technology that I'd like to share. This entry will go through these items. There are only a couple of things that I'd like to see improved. My basic thoughts are: this will give customers who are tight on budget, or not moving huge volumes per load-cycle, an opportunity to really begin building out a true data warehousing/BI solution.

SQLServer2005:

* Database/Table partitioning. At long last, SQLServer has implemented range-partitioning. This will be a HUGE boost to performance on larger data sets, as long as the partitions are architected properly across the existing disk sets. If you're not familiar with partitioning, now's your chance to read up on it - do all the reading you can before you upgrade. Partitioning is not something you simply want to "turn on". However, SQL queries that use clustered indexes can finally avoid table scanning when partitions are invoked. Why? See the next feature. Other types of partitioning (like hash, and sub-partitions) are said to be in the works for a future release.

* Query Parallelism. Query parallelism has been implemented to take advantage of the database and table partitioning. The queries can and will be broken into their respective components to take advantage of the range partitions. Queries also benefit from the performance improvements in SQLServer2005.

* Performance. SQLServer2005 has extended its performance by optimizing for 8k block sizes, using Windows operating system hooks, and sharing RAM caches with I/O buffers. They've streamlined the I/O from SQLServer2005 to the disk and, as I understand it, bypassed the page file to bring data into RAM (although the page file is still used for locking and swapping). The performance of SQLServer2005 is much higher than 2000, and it can process millions of rows per minute, even without clustering.

* Clustering. There are new mechanisms for clustering SQLServer2005, and as I understand it, it can take the shape of multiple clusters on a network of nodes (SMP under an MPP look and feel). This brings new meaning to low-cost, high-power computing. This notion is still challenging to set up and make work with SQLServer2005, but it will get easier as Microsoft continues to develop the product.
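
For readers new to range partitioning, here is a small Python sketch of the idea behind the partitioning and query parallelism bullets above: boundary values split a key range into partitions, and a query on a key range only has to read the partitions that range overlaps. This illustrates the concept only, not SQLServer2005's implementation; the boundary values and function names are made up, and I assume each boundary value belongs to the partition on its right.

```python
from bisect import bisect_right

# Hypothetical boundaries: each value starts a new partition (dates as YYYYMMDD).
BOUNDARIES = [20040101, 20050101, 20060101]


def partition_for(key):
    """Return the 0-based partition number holding `key`."""
    return bisect_right(BOUNDARIES, key)


def partitions_for_range(low, high):
    """Return the partitions a query on low <= key <= high must read."""
    return list(range(partition_for(low), partition_for(high) + 1))


print(partition_for(20031231))                     # 0
print(partition_for(20050101))                     # 2
print(partitions_for_range(20050315, 20050930))    # [2]
print(partitions_for_range(20041201, 20060115))    # [1, 2, 3]
```

A query for mid-2005 touches one partition out of four, which is why a scan of the whole table can be avoided; the partitions that are touched can then be read independently of each other.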

Integration Services:
* Fast, fast, fast is all I can say. I saw a Windows platform processing 2 million rows in 2 minutes from a flat file into SQLServer2005 through Integration Services, passing through a data mining transformation that provided name and address cleansing (and learned as it went). It was based on a neural network that you could tune. This was done on a single-CPU laptop running all the software. It's been tuned to run with flat files and SQLServer connections in parallel. Finally, direct connections gain parallel utilization and catch up with other DBMS vendors.

* Tons of transformations. Lots of transformations come out of the box; as I mentioned just a minute ago, one of them is a neural network. It also has parallel processing properties for splitting data (in parallel) down multiple transformation paths. There is also the notion of an error control and an error target, for which you can control the rules.

* Workflow style transformation - all transformations are workflow style, meaning they are "runnable designs". You can see the row count increase as the data passes through the transformations along the way.
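
The splitting and error-target idea in the bullets above can be sketched in a few lines of Python. This is not the Integration Services object model or its API; route_rows, the path names, and the sample rows are hypothetical, and the sketch runs sequentially rather than in parallel. It only shows the shape of the pattern: rows that pass a rule go down the first matching path, and rows that fail are diverted to an error target with a reason attached.

```python
def route_rows(rows, is_valid, paths):
    """Send valid rows down the first matching path; send failures to an error target."""
    outputs = {name: [] for name, _ in paths}
    errors = []
    for row in rows:
        try:
            if not is_valid(row):
                raise ValueError("failed validation rule")
            for name, predicate in paths:
                if predicate(row):
                    outputs[name].append(row)
                    break
            else:
                raise ValueError("no matching path")
        except (ValueError, KeyError, TypeError) as exc:
            # The error target keeps the row and the reason, so the rule can be reviewed.
            errors.append({"row": row, "reason": str(exc)})
    return outputs, errors


rows = [
    {"name": "Ann", "country": "US"},
    {"name": "", "country": "US"},   # fails the validation rule
    {"name": "Bo", "country": "DE"},
    {"name": "Cy"},                  # missing column, caught as an error
]
paths = [
    ("domestic", lambda r: r["country"] == "US"),
    ("international", lambda r: r["country"] != "US"),
]
outputs, errors = route_rows(rows, lambda r: bool(r.get("name")), paths)
print(len(outputs["domestic"]), len(outputs["international"]), len(errors))  # 1 1 2
```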

Improvements I'd like to see:
* Metadata. I had heard that they don't have a real "metadata repository"; they are apparently working something out with Meta Integration (you'll hear about them shortly). They need to get into the OPEN metadata game.

* Import/Export of transformation logic via XML. There apparently is no import/export of the transformation designs, therefore there is no current capability for sharing the transformation logic across servers.

* Reusable transformation "workflows" - they do not have the concept of reusability down yet. I've heard they are working on this.

* Biggest one: Connectivity options. Today, if you want Oracle, DB2 UDB, DB2 AS/400, Teradata, Sybase, MySQL, or Informix data, you have three options: use BRIDGE software provided by the other vendor (which is slow); use ODBC and OLE DB drivers from Microsoft (some are fast, some are slow); or write your own native connectivity using their SDK.

Bottom line: if you're a Microsoft only shop, SQLServer2005 and Integration services is a HUGE boost to productivity over DTS and SQLServer2000 - well worth the upgrade cost. If you're a small shop that deals with mostly flat-file transfers from other databases, it will help as well. If you’re into volume - and native connectivity, I suggest looking elsewhere, and letting Microsoft grow into this space.

More to come. Did I miss something? Post a reply!

Cheers,
Dan Linstedt


Posted April 26, 2006 5:45 AM

What do I mean by Local versus Localized copies of Master Data? In this entry I will try to explain my definition for each. This is a short entry, and as always, comments are welcome.

I did not say "local copies" should not exist; what I did suggest is that no "localized" copies should exist. If I did actually refer to "no local copies" then shame on me, I made a mistake.

Local copies of the data
Replicated Master Lists across geographically dispersed regions for fast access times. This is fine.

Localized copies of the data
Master Data that has been "changed, altered, or is different" in some way from the source of the Master Data Set. In other words, someone not only changed the master metadata (the definition of how the data set is used in context - which is OK to override), but they also changed the MASTER DATA itself, to represent "their version" of the master data. This is NOT ok. It re-creates the silos and defeats the purpose of master data, which is to span the horizontal enterprise.
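
One way to picture the difference is a check that compares a regional copy against the master. The Python sketch below is only an illustration; the record layout, the keys, and the localized_records function are hypothetical. A local copy is a faithful replica, so the check returns nothing; a localized copy has had its data values changed, so the check flags those records.

```python
MASTER = {
    "C001": {"name": "Acme Corp", "country": "US"},
    "C002": {"name": "Globex", "country": "DE"},
}


def localized_records(master, replica):
    """Return keys whose replica values differ from, or are absent in, the master."""
    changed = []
    for key, values in replica.items():
        if master.get(key) != values:
            changed.append(key)
    return sorted(changed)


# A local copy: replicated as-is for fast regional access.
local_copy = {key: dict(values) for key, values in MASTER.items()}

# A localized copy: someone edited the master data itself.
localized_copy = {key: dict(values) for key, values in MASTER.items()}
localized_copy["C002"]["name"] = "Globex GmbH (EMEA)"

print(localized_records(MASTER, local_copy))      # []
print(localized_records(MASTER, localized_copy))  # ['C002']
```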

I hope this clears things up.
Cheers,
Dan Linstedt
CTO, Myers-Holum, Inc
http://www.MyersHolum.com


Posted April 21, 2006 8:55 AM

This is a new category for me. Typically I don't focus on "vendor specific" items, but in this category I will begin doing so. I may blog on Microsoft SQLServer2005 and its intelligence services, or Teradata, Netezza, DatAllegro, Ipedo, MetaMatrix, Oracle, DB2 UDB, IBM Ascential, IBM WebSphere or any number of other vendors that I think are doing something interesting in the market space as it pertains to Business Intelligence. All entries here will be vendor specific. Feel free to send me questions about specific vendors. Thank-you! Hope to see you out here.


Posted April 15, 2006 7:35 AM
