Business Intelligence Network

Blog: Dan E. Linstedt

July 28, 2005

VLDB/VLDW And Vendor attentiveness

With the super-swell of data sets these days, it can be challenging, if not impossible, to work out which "DBMS" vendor does what. Most vendors offer different sets of features and functionality, and they all leapfrog from one feature to the next and one version to the next - but at the end of the day, we as customers must decipher which solution fits our needs.

This blog is an attempt at suggesting what features are critical (in a generic sense) to managing VLDB/VLDW going forward. If you have features you'd like to suggest, or things your company really needs, please comment.

In the VLDB/VLDW world things change, as strategic EDWs become tactical EDWs, and our world shifts into near-real-time this and that. Instant responses aren't always what they're cracked up to be. Vendors throw around the term "single version of the truth" when all the while it really should be "single version of the FACTS" because "truth" is purely subjective, and squarely in the hands of the BI user.

However, volume does funny things to our systems. It forces us to shift paradigms (from SMP and shared X) to MPP, or MPP/SMP clusters with shared nothing under the covers. It forces our architectures to change, our data models to change, and our latency for loading data to get smaller. Of course - I'm assuming this is all business driven, right?

Let's put it this way: you can wash 1 car 5 ways from Sunday, and take all day Sunday to get it sparkling clean, but if you have 500 cars to wash, well - you need a system, a standardized system whereby each part takes X amount of time, and there are multiple people working in parallel to get all the cars clean. If you double that again to 1000 cars a day, then 2000, then 4000, then 8000, pretty soon you've overloaded the mechanism for cleaning all these cars in one day. You begin to need efficient machines that work on the standardized system, giant machines all operating in parallel, that can wash 500 cars in two hours or so.

Just like this example, there is a breaking point in DBMS vendors' architectures that promote SMP clustering (without MPP controllers) and shared-X architectures. What you could do architecturally with 5000 rows of data and 1 hour doesn't necessarily work at 500M rows in the same hour.

I would suggest that the following criteria be important when evaluating VLDW/VLDB vendors (I'm not just talking about having 500M rows, and not using them. I'm talking about active information - that flows into the database, and is utilized or queried, summarized and acted on).

* Shared Nothing MPP
* Fail-over and fault-tolerant SMPs underneath
* Redundant Networking, Redundant Disk, Redundant CPU, Redundant RAM
* High speed throughput
* Compression
* Dynamic and Batch loading capabilities
* High speed I/O, redundant I/O, dedicated I/O 300-400MB per second or better in raw data copy speed.
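To put rough numbers on the rows-per-hour comparison and the raw copy speed criterion, here is a back-of-the-envelope sketch in Python. The 200-byte row width is purely an assumption for illustration (it is not a figure from any vendor), and the function names are hypothetical. Real loads also pay for parsing, indexing, and logging, so raw copy speed is only a floor.

```python
# Back-of-the-envelope throughput arithmetic for a load window.
# ASSUMED_BYTES_PER_ROW is a made-up average row width, not a measured value.
ASSUMED_BYTES_PER_ROW = 200
ONE_HOUR_SECONDS = 3600


def required_mb_per_second(rows: int, bytes_per_row: int, window_seconds: float) -> float:
    """Raw throughput (MB/s) needed to move `rows` rows within the window."""
    total_mb = rows * bytes_per_row / 1_000_000
    return total_mb / window_seconds


def hours_to_copy(total_gb: float, mb_per_second: float) -> float:
    """Hours needed to copy `total_gb` of raw data at a sustained MB/s rate."""
    total_mb = total_gb * 1000
    return total_mb / mb_per_second / 3600


if __name__ == "__main__":
    small = required_mb_per_second(5_000, ASSUMED_BYTES_PER_ROW, ONE_HOUR_SECONDS)
    large = required_mb_per_second(500_000_000, ASSUMED_BYTES_PER_ROW, ONE_HOUR_SECONDS)
    print(f"5000 rows in an hour needs {small:.6f} MB/s")
    print(f"500M rows in an hour needs {large:.2f} MB/s")
    print(f"Copying 1 TB at 300 MB/s takes {hours_to_copy(1000, 300):.2f} hours")
```

The same window that is trivial at 5000 rows needs a sustained rate 100,000 times higher at 500M rows, before any transformation work is counted.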

There are hundreds more criteria, but these should get you out of the blocks. If embarking on VLDW or VLDB, it would be wise to review your current architecture, and possibly to load test it by duplicating the data set you currently have. Vendors know where and when you'll hit the wall with your VLDB/VLDW - and in some cases if they're called in to save the day, they'll jack the prices (not always, and not all vendors) to help with the switchover, because they know you have no choice. Your systems will reach a point of no-return and fail. I've seen it happen.

If you'd like to hear more about this subject, feel free to reply with thoughts and comments. If you disagree, I'd like to know why, and what your experience has been - especially if it's been positive with a particular vendor.

  Posted by Dan Linstedt at 7:04 AM | Comments (0)


July 27, 2005

EII - does it have a chance to survive?

EII, aka Enterprise Information Integration: does it really have a chance to survive? Or is it just another passing fad?

As an architecture it makes sense, a lot of sense - but then there's SOA, with a much larger view of the world and a lot more integration under the covers. So is EII just the technology to make SOA work? Or is there something else going on here?

EII is an interesting topic; it gets a lot of buzz, both positive and negative, in the industry. The vendors in this space today are new, and considered first generation (by their own accounts), but are rapidly racing to come up with generation two.

What I've seen so far is that EII as a niche player provides some value to the business, as long as the business wants to integrate "data now" across the organization, and is interested in an enterprise view (including outside or external data sources) of all their information. Where EII runs into limitations today is in two places: write-back to source systems in a way that meets ACID tests, and overtaking the entire data warehousing effort by claiming to be a "virtual warehouse".

I think that EII is an interesting category when it comes to replacing the ODS - and maybe the marketeers should be trumpeting "Virtual ODS". If the ODS is built according to most standard definitions (not containing any history except transactional history, because it's reflected in most source systems), then EII can hold a candle to that. I think EII falls short in creating a virtual warehouse; that may be in the future - but for now, it just doesn't happen (for a variety of reasons).
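As a rough illustration of the "Virtual ODS" idea, the sketch below answers a "data now" question by querying two source systems at request time and merging the results in memory, storing nothing and keeping no history. The two in-memory SQLite databases, the table layouts, and the function names are all hypothetical stand-ins for real source systems. This is not how any particular EII product works.

```python
import sqlite3


def make_source(ddl: str, insert_sql: str, rows: list) -> sqlite3.Connection:
    """Create an in-memory database that stands in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Hypothetical source systems: a CRM and a billing application.
crm = make_source(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
billing = make_source(
    "CREATE TABLE invoices (customer_id INTEGER, amount REAL)",
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 100.0), (1, 250.0), (2, 75.0)],
)


def current_balances(crm_conn: sqlite3.Connection, billing_conn: sqlite3.Connection) -> list:
    """Join live data from both sources at query time; nothing is persisted."""
    totals = dict(
        billing_conn.execute(
            "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"
        )
    )
    customers = crm_conn.execute("SELECT id, name FROM customers ORDER BY id")
    return [(name, totals.get(cust_id, 0.0)) for cust_id, name in customers]


if __name__ == "__main__":
    print(current_balances(crm, billing))
```

Because every call goes back to the sources, a new invoice shows up in the next answer immediately, but there is no way to ask what the balance was last month. That is the gap between a virtual ODS and a warehouse.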

As the EII vendors rush to generation two, the SOA "vendors" are gearing up with generation one. As I've stated before: the tools underneath SOA architecture and EII have a lot of overlap, and some EII vendors are actually tooling EII generation two to include SOA offerings (with web services, security, and compliance).

As with any technology, there's convergence in the market place. Convergence across EII, EAI, ETL, and web-services. SOA is the architectural icing on the cake.

EII is a technology to watch, and today - if you have a very specific question about your enterprise that needs data from many sources (but not history), then EII may solve the problem elegantly.

Comments?

  Posted by Dan Linstedt at 6:58 PM | Comments (2)


Quantum Wells, Quantum Wire, and Quantum Dots

I'd like to explore the material known as Wellstone. There are some interesting aspects to this material, and it is written about in "Hacking Matter" by Wil McCarthy. It is not necessarily nanotechnology, so much as it is quantum-level materials and bio-molecular control over nano-sized or meso-sized particles.

We will return to the world of Nanotechnology and DNA computing shortly, for now - let's talk about Wellstone.

Definitions from the book:

Quantum Well: "When layered in particular ways, doped silica can trap conduction electrons in a membrane so thin that, from one face to the other, their behavior as tiny quantum wave packets takes precedence over their behavior as particles. This structure is called a quantum well."

"From there, confining the electrons along a second dimension produces a quantum wire, and finally, with three dimensions, a quantum dot."

These are interesting definitions of nano-scale particles. If we were to play "what-if", one might begin to imagine that we could do some very strange things if we could harness the power of a quantum well: using wave dynamics to penetrate surfaces and pass information from point A to point B. But it gets more interesting than that:

"The unique trait of a quantum dot, as opposed to any other electronic component, is that the electrons trapped in it will arrange themselves as though they were part of an atom, even though there's no atomic nucleus for them to surround. Which atom they emulate depends on the number of electrons and the exact geometry of the wells that confine them, and in fact where a normal atom is spherical, such designer atoms can be fashioned into cubes or tetrahedrons or any other shape..."

Wellstone is just such a structure capable of trapping quantum dots in a translucent structure. Given Wellstone, and the nature of the quantum dots, all one has to do is to add or remove electrons to change the "chemical makeup" of the designer atoms; thus resulting in changing the look and feel (at the macro level) of the object.

In other words, it can look and feel like gold; change the electron count, and it can look and feel like iron, impervium, or even wood. This is, in a true sense, programmable matter. What does this mean to the business world? An interesting question indeed. From a commercial perspective it could mean wealth and power. From a consumer perspective it may mean things like flat computer screens that can change to "writable paper" and back to LCD-like images. It may mean changing the table from opaque to translucent; of course, the table would be made of Wellstone.

More on this soon. What would you do with Wellstone?

  Posted by Dan Linstedt at 6:37 PM | Comments (2)


July 21, 2005

RDBMS and IQ ScoreCards Available

This is a marketing entry, but in hopes of helping our EDW community, we have released Free (blank) RDBMS and IQ ScoreCards on our website, along with the ETL scorecards, and Metadata Tool Scorecards already there.

Our web site is: www.MyersHolum.com

Thank you,
Dan L

  Posted by Dan Linstedt at 7:45 AM | Comments (0)


July 15, 2005

Core definitions of Dynamic Data Warehousing

In recent posts I have begun to discuss a notion, or concept, that I call Dynamic Data Warehousing. Its real name should be Dynamic Structural Change and Adaptation Data Warehousing - but who would buy that? Not very marketing-like, if I say so myself.

I recently blogged on the war of the appliance vendors, and have written articles in the past on Convergence, and the wave of integration and partnerships sweeping the industry. This is just one of the futuristic items that I believe is completely possible to build with today's technology.

Today it may be expensive, but it can be done. In the future as the market makes way for more consolidations and integrations or partnerships between hardware and software vendors we will see additional efforts headed toward automatic structural manipulation.

I also added an entry on 3-D modeling capacities, and if the DDW device ever is produced, attaching a 3-D modeling landscape would increase the value 50 fold or more. So what are the basics needed for DDW "device" or appliance?

Here's a partial list of architectural components:
1. High speed hardware with embedded RDBMS software (embedded into the firmware), such that model changes and data movement are quick and painless, and indexing is not needed.
2. Back-plane with several card-slots.
3. Each of the slots should be taken by one of the following cards: Data Mining/Structure Mining, Security and Access Card, Data Access - web-browser card, SOA and web-services card, Real-Time and Batch data integration card.
4. If I had my choice, one more card: chemical modeling software retrofitted to represent data, and data clusters - so we can visualize the information in 3-D format.

Each particular card has a job to do, but they all talk to each other through the backplane - high speed bus transfer. Data never leaves disk, or is buffered in high speed RAM.

The dynamic section of this device uses Neural Nets and data mining capacities across the structural components to explore and dynamically adapt to newly arriving business elements. What we want from an IT perspective is the ability to plug and play a system: as we feed it data, it "discovers" the inherent structure; we teach it to model, and give it rules for performance and interaction with the existing storage device, CPUs and RAM that the card is plugged into. We fine tune the neural net over time, and it becomes a highly responsive processing machine that knows the "what" portion of our business.
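To give a feel for what "discovering" structure from arriving data might mean at its very simplest, here is a small sketch that infers a column type for each field from sample rows. It is a rule-based stand-in, not a neural net, and it is nowhere near the adaptive device described here. The function names, the type labels, and the single date format are all assumptions made for the example.

```python
from datetime import datetime


def _all_parse(values, parser) -> bool:
    """Return True if every value can be parsed by `parser` without error."""
    for value in values:
        try:
            parser(value)
        except ValueError:
            return False
    return True


def infer_column_type(values) -> str:
    """Pick the narrowest type label that fits every non-empty value."""
    present = [v for v in values if v.strip() != ""]
    if not present:
        return "TEXT"
    if _all_parse(present, int):
        return "INTEGER"
    if _all_parse(present, float):
        return "REAL"
    # Only ISO-style dates are recognized in this sketch.
    if _all_parse(present, lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "DATE"
    return "TEXT"


def discover_structure(rows) -> dict:
    """Map each column name to an inferred type, given rows as dicts of strings."""
    columns = rows[0].keys()
    return {col: infer_column_type([row[col] for row in rows]) for col in columns}


if __name__ == "__main__":
    sample = [
        {"id": "1", "amount": "9.50", "sold_on": "2005-07-15", "region": "west"},
        {"id": "2", "amount": "12", "sold_on": "2005-07-16", "region": ""},
    ]
    print(discover_structure(sample))
```

A learning system would go well beyond this, for instance by proposing keys and relationships with confidence ratings, but the input and output have the same shape: raw data in, a proposed structure out.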

From the business side, we make the structural inferences with confidence ratings, put the models back in the hands of business through BI and 3-D modeling efforts, and bring the storage of the data, the model, and the presentation layers up to meet the business processes - closing the gap between IT and business. Train the business users (a few who wish to perform the tasks), and give them free rein over exploring the data sets through a 3-D landscape.

From a structure perspective, we adapt, change, and alter the structure through the structural neural net - tweaking for the balance between performance and business representation.

And finally, from an information quality perspective, the appliance helps with both the structural improvements and the data improvements through a second neural net that imputes values, standardizes, cleanses, and reports metadata (the results).

If packaged properly, this device can rapidly become "smart" - no, not think for itself, but it will begin to lower costs, lower overhead, allow "what-if" games to be played with the architecture and judge the impact (visually).

Dynamic Data Warehousing is just beginning, I hope to hear back from you on your thoughts.

Thanks,
Dan L

  Posted by Dan Linstedt at 11:29 PM | Comments (0)


War of the Appliances, and Convergence

In BI we've seen the trend; it's been written about for over 2 years now. There's a war afoot across vendor land, between the software makers' best-of-breed solutions and the hardware vendors' scalable devices. Compliance and storage have partnered up, as have security and storage. As of late, RDBMS vendors and storage have partnered as well.

In the early days of Data Warehousing and BI we saw a split into best-of-breed software vendors and best-of-breed hardware devices. The market got hot, so hot that it exploded, nearly died, and is being rebuilt as we speak. What will convergence look like over the next two years? What kinds of devices can we look forward to? What do these super-corporations need to learn or know to move forward?

Let's take a look at some of the vendors and what's happened over the past several years.

With IBM: They've acquired numerous vendors, both hardware and software, to bolster a huge SOA and enterprise integration effort. The list is too large to mention, but here are a few of the products and companies they've bought: NumaQ (Sequent) for high-performance hardware, MPP distribution, parallelism and partitioning; and Informix XPS for large data sets, high-performance database engine practices, parallelism and partitioning. They've integrated these two into IBM hardware and DB2 UDB to make it a super-powerful option for big data volumes and high speed throughput. On another front, they've bought vendors like Ascential, AlphaBlox, and others to bolster IBM Data Integrator and WebSphere, and their SOA offerings.

IBM isn't the only vendor to move in these directions: Netezza has built an appliance, Sun and HP are entering (or are actively competing in) this area, and Microsoft has already begun the same initiatives.

Back to convergence: appliances of today do not offer "everything" that they could to the enterprise. We still are left to buy bits and pieces and integrate them across the board to make the enterprise vision work (I still remember a time when I was told that "there will never be such a thing as enterprise view, because it's too hard to get everyone to agree as to what that means."). However, let's take a look at what makes the appliances so appealing.

You can buy an RDBMS device, and not "worry" about data modeling, or worry about managing, maintaining, or growing the system - it's plug in, load, and go - taking snapshots of existing data sets as they stand. With security devices, it's the same story - plug and go, built-in fire-walls, data mining concepts, real-time hack alerts, web-interfaces and management reports along with Web-Site service updates. With compliance devices, again it is the same story. Plug and go, get snapshots of the before & after data changes, find out when and who accessed what - down to the IP Packet level if desired.

What I predict is the continued merging of software and hardware vendors (at the very least, partnerships). Software vendors offering best of breed will begin to produce "firmware" plug & play updatable cards that fit into the back-planes of pre-engineered systems. These systems will include high-speed, tuned I/O, data placement optimization, Appliance like look and feel, and integration between multiple software vendors’ cards across the backplane.

In the future, we will receive firmware updates rather than software updates, and will probably be purchasing hardware cards instead of CDs, customized to meet the needs of the integrated enterprise. If I were to guess, I would say that the following categories of firmware cards will be made:

1. Information Integration cards, handling both real-time and batch loading, backup, extract, restore of information within the system.
2. Data Mining and statistical analysis cards, handling Information Quality, Metrics measurement, data testing, validation, imputing values (profiling and cleansing), and alerting or triggering mechanisms.
3. Web Access front-end cards handling graphical interfaces, user access layers, data distribution, and additional configuration/administration features.
4. SOA / Web Services card, handling the web-services responsibilities.
5. Security/Compliance Card with extended features for replication of "Compliance" based data sets - such that all other cards run through the security layer, providing single logon, and other key features.

The hardware vendor of the initial device will handle high-speed networking, fail-over, compression/decompression, and plug-and-play (either grid computing environments or MPP shared nothing environments).

This I believe is the future device. While software will never disappear completely (due to the "ease" at which it can be created, relative to hardware), the mature products should find their way onto integrated circuit cards.

What's the value add for the software makers to bear the extra expense?
1. No "copies" of the software can be pirated; the hardware card itself must be pirated with replicating hardware.
2. Cost of hardware manufacturing will continue to drop, as nanotech encroaches on existing IC technology (driving nanotech/hardware engineering cost up, and existing IC engineering costs down).
3. Strength in partnerships across multiple best-of-breed providers.
4. Higher performance across the board with dedicated hardware options.
5. Licensing issues across "dual-core" vs "single-core" will disappear.
6. Happier customers with a plug and play environment.

What will the customer get from this?
1. Firmware updates instead of software updates
2. Appliance device bundle purchases (pre-configured, pre-tested, plug-and-play)
3. Better SMB support
4. Better cross-integration
5. Super high speed devices
6. Lower maintenance cost, lower support costs
7. Better vendor support

Do you think this is possible? If not, why not?

Cheers for now,
Dan L

  Posted by Dan Linstedt at 10:55 AM | Comments (0)


July 13, 2005

3-D Visualization of Data Models

In the world of Nanotech, chemistry, biology and medicine we already have 3-D modeling capabilities for everything from Neurons to Chemical compounds to molecular structures. The business users in these areas are already manipulating their own structures, producing "fly-throughs" and interactive mappings of their modeling world.

Why then doesn't the BI community have the same? Why must our business users and our data modelers be stuck in the stone age? What is with the standard and traditional 2D modeling approaches of today's data modeling tools and BI Query Tools?

These are problems that I think the vendors should step up and solve. It's been years since the business truly understood why, where, and how their information is stored (of course, there are some truly great data modelers out there who go to great pains to make the models conform to business, bravo I say).

But what about the rest of us? Why do we insist on simple 2D data models? Why can't we incorporate shading, gradients, depth and 3D views of the data sets and their activity? What would an INTERACTIVE data model such as this reveal about our businesses?

What if someone adapted a chemical modeling software product to act on data models, and we applied different interaction rules - such as gathering metrics of usage, volume, performance, join capacity, redundancies, and compression ratios? If we could then apply these rules in a dynamic fashion, we could play visual what-if games with our enterprise architectures, watch mathematical formulas in brilliant colors, and see how our businesses may be affected by changes to the model, structures, indexes, and so on.

This would be a world like no other - we could begin to explore the real business rules. Hopefully we could "walk through" the model, represent height, depth, color, perspective, shading and gradients to act as guides in our world of data; "Data Art" if you will.

Ok back to the real world, how does this benefit business?
1. If RED meant trouble, (size trouble, or performance trouble, or some other marker - metadata) we could easily identify the Red areas of our business.
2. If Height meant activity, then we could see the areas of data that receive the most activity (either refresh or utilization - additional dimensions to shade with).
3. If Perspective and depth meant reach, or span across the business, we could begin to view our business in a whole new light.
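The three mappings above can be sketched as a tiny function that turns table metadata into visual attributes for a renderer. This is only an illustration of the idea: the metric names, the 5-second "trouble" threshold, and the 10,000-access height scale are all invented for the example, and no real modeling tool's API is implied.

```python
from dataclasses import dataclass


@dataclass
class TableMetrics:
    """Hypothetical metadata collected for one table in the data model."""
    name: str
    avg_query_seconds: float   # performance marker
    daily_accesses: int        # activity (refresh or utilization)
    departments_using: int     # reach, or span across the business


def visual_attributes(metrics: TableMetrics,
                      slow_threshold: float = 5.0,
                      max_accesses: int = 10_000) -> dict:
    """Map metadata to color (trouble), height (activity) and depth (reach)."""
    color = "red" if metrics.avg_query_seconds > slow_threshold else "green"
    # Height is normalized to the range 0..1 and capped at the assumed maximum.
    height = min(metrics.daily_accesses / max_accesses, 1.0)
    return {
        "name": metrics.name,
        "color": color,
        "height": round(height, 2),
        "depth": metrics.departments_using,
    }


if __name__ == "__main__":
    orders = TableMetrics("orders", avg_query_seconds=8.2,
                          daily_accesses=7_500, departments_using=4)
    print(visual_attributes(orders))
```

Changing the thresholds and re-rendering is the cheapest form of the what-if game: the model itself is never touched, only the way it is drawn.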

Finally, going back to the original example, we as IT and data modelers could bring the data (in a representative format) back to the business users, and probably "play all day" with what-if scenarios without ever making a modeling change. The impacts could be very dramatic, the time savings tremendous. I just wish I had a tool like this today. Anyone interested in funding a software startup?

Cheers,
Dan L

  Posted by Dan Linstedt at 9:14 PM | Comments (0)


The ground-swell of Nanotech

I've been writing (and learning) about nanotech for the past two years. It seems like only yesterday when I first broke ground to try to bring Nanotech definitions and explanations to the Data Warehousing community. I've been off doing more research, in hopes of making my blog entries more interesting, and to the point.

Well, I've found something I want to blog about - the silicon industry versus the nanotech chip makers of the future. This is interesting indeed. Will the silicon giants of today realize what they're facing? Will they re-tool and rebuild ahead of the 8-ball? Will they sink faster than a stone in water?

There are many things I've learned about technology over the years, and many of these things are taught in school, such as the cyclical nature of inventions. The stone age, the bronze age, the mechanical age, the information age, and now: the nanotech age - it is upon us.

Here's a link to a February 2005 Business Week article that discusses nano chip makers of the future versus Silicon Chip makers of today.

In the article, some of the silicon execs are refusing to see the light - but are they bluffing? Are they secretly building their nanotech advances in the back-room labs? They should be! If they haven't started working in true nanotech, they will be out of business within 10 years, possibly sooner. But then again, 10 years is a long time - or is it?

In my opinion they need to take what they have learned about manufacturing processes, architectures and designs, and find a way to model them at the molecular level. OR they should entirely scrap the whole process and re-invent their companies from the ground up. I tend to agree with the author of the article: Nanotech is not a force to be taken lightly nor should it be ignored. We should learn as much about it as possible.

Of course, ultimate power lies in the manner in which the technology is applied to solve problems. However, nanotech is different - much different from standard Silicon-based technology. Nanotech provides horizons and abilities we've only dreamt of or seen in sci-fi movies. A quote from the article: "They could conceivably marry electronics, for example, to biology, coming up with self-replicating computational devices. Researchers in Israel have already harnessed transistors to strands of DNA." I've already begun discussing this in some of my nanotech white papers here on the B-Eye-Network.

If you think I'm kidding about Nanotech and its impact on our society, think again. There are thousands of stories out there about products emerging on the marketplace today, like this one: a nanotech based golf ball, or how about a Samsung 8GB Compact Flash Card, or a Samsung silver nanoseal refrigerator that resists stains, mold and mildew.

Here's another Business Week article about nanomaterials. A quote from this article is one I especially like: "The laws of gravity, optics, and acceleration represent averages, not the quirky behavior of each single nanoparticle. For those principles, researchers must venture into quantum physics. Those who come to grips with this realm and can harness its power stand to become the titans of the nano age."

Is anyone involved in Nanotech? Love to hear from you.

  Posted by Dan Linstedt at 8:56 PM | Comments (1)


July 12, 2005

Security and our Data Warehousing Solutions

There's been a lot of talk about security, fire-walls, VPN access, anti-spam, and anti-hack systems - but most of the break-ins and data leaks appear to be caused when a hacker reaches the RDBMS systems we put in place.

Why then, aren't the RDBMS vendors stepping to the plate to join up with many of the security firms? Why aren't we seeing acquisitions of security technology to become embedded within the RDBMS engines?

Even major authority figures have written about the hacks that can compromise database security. Here, Donald K Burleson (a well-known author on Oracle) discusses 9i hacks - as early as 2002.

A more recent breach found in December 2004, is written here. An instance of a hack against Teradata controllers was written here (click then FIND Teradata). Even with 10g, there are suggested security management procedures outside the database. Again, Don Burleson reports.

There's even a quote in the DB2 UDB PDF:
"by 2005, enterprises that don't encrypt stored sensitive data will spend 50 percent more than enterprises that do, due to failure to comply with regulatory or contractual data protection requirements (0.7 probability)." - When and how to use enterprise data encryption, Rich Mogull, Gartner, March 2004

Here are some press releases that I found where vendors are beginning to work at the solution:
Teradata Press Release. June 2005
I was unable to locate one for Oracle10g - but will gladly accept links to one in the comments.
DB2 UDB Cross-Vendor Solution.

There are many different articles out there, spread across many different vendors, some positive, some negative - but the general gist is this: Businesses must consolidate their information stores in order to...
1. Save on cost of management
2. Become more competitive with the information assets they have
3. Become better integrated with the information assets they want to create
4. Understand their own business better

But with consolidation comes risk - greater risk of security problems. Not only can the business get better answers, but now anyone hacking into the centralized system all of a sudden has access to better answers too. Read this article on Data Center Server consolidations.

It's nice to hear that vendors are training people in RDBMS security, but you'd think by now that more RDBMS vendors (especially given the recent breaches) would pay more attention to row- and column-level encryption and overall database security. You'd think that the industry would have learned its lesson!

It's up to us, the customers, to present a rallying cry to the vendors: embed security at the RDBMS management level, make it seamless, and assist us in managing it in real time with alerts, audit tracking, and highly sophisticated software. And of course, partner with the best of breed - please don't write your own.
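As a small taste of what audit tracking inside the database engine can look like, here is a sketch using SQLite (through Python's standard `sqlite3` module) with a trigger that records every balance change. The table names and columns are invented for the example. SQLite has no user accounts, so this captures "what changed and when" but not "who", which a full RDBMS would add from the session. It is an illustration of the idea, not a substitute for a vendor's security features.

```python
import sqlite3

# Hypothetical schema: one sensitive table plus an audit table fed by a trigger.
SCHEMA = """
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE audit_log (
    account_id  INTEGER,
    old_balance REAL,
    new_balance REAL,
    changed_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER accounts_audit AFTER UPDATE OF balance ON accounts
BEGIN
    INSERT INTO audit_log (account_id, old_balance, new_balance)
    VALUES (OLD.id, OLD.balance, NEW.balance);
END;
"""


def open_audited_db() -> sqlite3.Connection:
    """Create an in-memory database with the audit trigger installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def audit_trail(conn: sqlite3.Connection) -> list:
    """Return the before and after values recorded by the trigger, oldest first."""
    return conn.execute(
        "SELECT account_id, old_balance, new_balance FROM audit_log ORDER BY rowid"
    ).fetchall()


conn = open_audited_db()
conn.execute("INSERT INTO accounts VALUES (1, 100.0)")
conn.execute("UPDATE accounts SET balance = 40.0 WHERE id = 1")

if __name__ == "__main__":
    print(audit_trail(conn))
```

Because the trigger lives in the database, the application cannot forget to write the audit row, which is the point of embedding this at the RDBMS level rather than bolting it on outside.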

One other thing crosses my mind in this: VLDB - the larger the consolidated data sets get, the harder they become to secure (as an afterthought). Architecture is paramount in a VLDB/VLDW, and if security is integrated within these massive information stores, we might stand a chance of fending off hackers once they reach "the mother lode". With the advent of SOA now reaching directly from the web into the RDBMS back-ends, one has to wonder: how will this all be managed?

Thoughts?

  Posted by Dan Linstedt at 10:36 PM | Comments (0)


July 5, 2005

Microsoft wants Spyware??

General business sense says WARNING! WARNING! DANGER WILL ROBINSON! (sorry, line from an old US TV show). Apparently Microsoft is vying to buy a well-known spyware company. Check out the news article here.

Is this move brilliant or stupid? We'll leave that one to the analysts - as it is just a rumor for now anyhow, but I know what I would do..

So you're using an MS-centric PC? No room for Macs in your company? What happens when a trusted enterprise server software company like Microsoft is possibly going to buy out a well-known spyware/malware company?

Why would they want it? From a BI perspective, what does it give them? Well, deep data mining capabilities and characteristics for one - apparently this spyware company has a huge set of spyware collected information. I've put together a list of what I think Microsoft could use this technology for:

Positive things:
* Increasing customer share
* Increasing knowledge on what and how to market to customers
* Increasing the security of their fire-walls, and making a serious statement about putting a stop to spyware and malware.
* Finally finding and fixing some of their most blatant operating system errors (in terms of security).

Now some of the negative things:
* Embedding spyware/malware directly into every OS on the planet and releasing it as invisible hooks (some of which they already have) to reach your machine when you're not looking.
* Popups for their own advertising campaigns.
* Controlling the advertising banners that appear on your Windows PC - replacing every 3rd banner with a Microsoft banner.

All this aside, it's a very interesting position that MS has put themselves in, and according to the story - they have declined comment on the rumor stating that it's just that, a rumor.

From a BI perspective, and understanding their customer base - it makes some limited business sense - but from a security perspective it only makes sense if they use it to beef up the defenses of the OS, rather than providing more back-doors into the system.

From a personal perspective I have to ask myself these questions:
* Did I ever authorize any of this personal information about me to be captured by some spyware company?
* How much information on me do they truly have?
* What gives them the right to "sell" my personal information for profit? Shouldn't Microsoft be paying me money for my information? Or at least asking me if I allow this information to become a part of their operations, especially since it was most likely obtained without my knowledge?

I also have to ask the industry this: at what point does "selling and using" personal information (without my consent) actually become "identity theft by corporate business?"

One more question comes to mind, or at least a parallel observation: In the credit industry we have laws that allow us (although challenging and difficult) to monitor and change (to some degree) our credit reports. In this industry, they collect the "equivalent" of a credit report - our surfing habits, and other personal information, yet there are NO laws that protect our rights to monitor, change or even file in court against those who use that information illegally or for profit without our consent.

Injustice I say, injustice. If Microsoft buys this company and uses the spyware for anything other than improving their security, then there may very well be a Macintosh in my future...

Comments? How do you feel about this?

  Posted by Dan Linstedt at 4:19 PM | Comments (0)