Blog: Dan E. Linstedt Subscribe to this blog's RSS feed!

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including: IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata.  He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology, and the Data Vault Data modeling architecture. He has built expert training courses, and trained hundreds of industry professionals, and is the voice of Bill Inmons' Blog on http://www.b-eye-network.com/blogs/linstedt/.

There are many trends afoot in the industry, but I think there are two or three that are here to stay, and quite frankly, they will make a huge impact on the way BI/EDW is done. Some of the technologies that are ground-shaking and game-changing include Solid State Disk, Database appliances on the Cloud, and data services on the cloud.  I've been watching the Database market now for several years, and here are a few of my predictions for 2010.

Here's the skinny of it:

1) Solid State Disk.  It's not your grand-fathers "RAM on a USB stick" plugged in to the computer, no - it's high-speed, ultra parallel, screaming bus performance RAM.  As one analyst recently put it: "CPU's have gained 5 million times faster performance over the last 30 years, while disk drives have improved only about five times, he says. SSD's offer 150x improvement in random I/O's per second over spinning disk."  Information Week, Oct 26, 2009, Douglas Henschen.

So what does this really mean in business terms?

In business terms this brings the ability to finally execute high speed parallel operations for massive BI queries, and massive bulk-loading systems.  But wait, there's more (yes, it gets better)...  This also means that we can finally take advantage of extreme flexibility in logical modeling layers, and have to "worry less" about physical modeling components... Which of course leads us right to the Data Vault Modeling methodology, or 6th normal form (can become a reality if you like). 

In business terms, if we have extreme flexibility in our architecture, we can finally enable IT to become nimble and fast again.  It means we can reduce delivery times, and execution cycle times.  It also means we can increase up-time, and oh-wait...  HOT DATA?  Did all of you forget about Temperature driven data?  Shame on you!  Hot data will reside in the SSD devices, allowing hyper-cubes to take place, AND allowing DYNAMIC changes to both the data models, the queries, and the loading systems to occur.

Why is that important to me?

Well, I'll tell you...  In my opinion, it's important to you because you should care about the architecture you are using, you should care about the accountability and auditability of the data set, you should (and usually do) care about the performance of the system you're building your solution on; and by the way: your going to have MORE and MORE data to deal with, like it or not: the EDW/BI solutions of today are accountable (they have become a system of record in their own right).

Yea ok, so what...  Solid state disk, faster machines, phooey...

You can say what you like, but IBM Smart Analytic System, Oracles' Exadata, Teradata Extreme Performance Appliance 4555, are just a few of these notions.  What would happen (do you think) if ParAccel, or Vertica, or Sybase IQ jumped on this ship?  Especially into Amazon EC2 cloud...  Hmmmm Now it get's interesting!

Which brings me to my next point:

2) Database technology is changing for 2010.  RDBMS making use of SSD will change the game, it will be an uneven playing field, but then making these devices available on the EC2 cloud (wether it's amazon or google providing the services) - based on HADOOP (mostly) - it won't matter.  The RDBMS technology will get faster, and now - in-company network bandwidth will be OUTSHINED by cloud privisioned bandwidth.

What?  What does that mean outshined by cloud bandwidth?  I don't get it...

You will have to get it if you want to lower cost, and succeed in business going forward.  What I mean is: Data Warehousing and BI solutions will be offered as an outsourced service.  That's right folks, for the first time in DW/BI history, (with just these two advancements coupled together), Data Warehousing and BI as a service will become a reality.  Mark my words!   Ok, don't take my word for it... 

All you need to do is call up a company called: Quantivo they are processing/ingesting 7 billion customer records, and allowing 2 to 3 second query response times against this sizable data.  What do they sell?  packaged data warehousing/BI analytics.  In other words, they do the loading of the raw data, they manage their internal data model, they perform the cross-mapping, the indexing, and they build the "data mart" availability of the data set.

So what's the catch, there's always a catch, and I want to know what it is!

Of course there's a catch... today there's an ethical, security, and privacy dilemma.  How do you "outsource" data to a company providing a service that is put into a cloud owned by a 3rd party company (say Amazon or Google), and keep it all safe and secure?  Well, I would argue there are several layers to making "DW/BI as a service" available.

Layer A: is what Quantivo Does: doesn't keep (or receive) any specific or private information about individual customers that may risk identity theft if the security wall is broken.  They provide a VPN service, dedicated encrypted channels for the customer to get at their own data sets.  They also slice the data sets off - for each customer, they setup different grids in the cloud - ensuring all data stays separate.

Layer B: Hardware vendor/RDBMS vendor offers use of their software/hardware within the cloud as a direct contract or managed service for upgrades, the customer contracts them to execute the cloud installation, and then the customer manages, owns the data, the data modeling, the queries, and all other components within the cloud over VPN.

I think beyond this, there may be some additional options for EDW/BI as a service however they have to be scrutinized carefully.

Ok, ok - so what's the real business benefit here?

Still don't see it eh?  Well, here are just a few of the business benefits to be received by executing plans like this:

  • Lower cost of ownership
  • Faster turn-around time (like in Quantivo's case - they are specialized in customer analytics and data mining)
  • More parallelism with "sloppier design" - wait, that's a risk!  Yes, it is...  but sloppy designs will get a significant performance boost from SSD, and from cloud technology
  • More agile, more nimble (Dynamic data marts) - here today, and possibly trashed tomorrow, only to build a new one!
  • Cube Write-Back, yes - because of the nature of cubes, LARGE CUBES on SSD in a CLOUD now can offer FAST write-back. Hey Microsoft, you listening?
  • More power in the hands of the business user - direct access to create on-demand data mart solutions IN-RAM, that's right... SSD on a cloud over VPN, lets users PLAY with their data directly (if the outsourced service has implemented properly).  finally, play with LARGE to HUMUNGOUS sets of data on essentially a RAM drive!  Pay as you go, pay for only what you need.

I don't mean to dig on Microsoft, I think just the opposite - they have cube write back technology today, and their 2008 SQLServer R2 release is INCREDIBLE...  it is screaming fast, and super scalable (even on standard 32 bit windows servers, if you know how to tune it).  I just think it's an easy jump for them to implement something on SSD and in the cloud, and voila - what an INCREDIBLE solution.  It's so easy, I don't know why their gold partners haven't done it yet...  Anyone want to invest in me?  I'll build you a company to do it.

Alright, what are the risks...

The risks include: not managing the services effectively, not setting the quotas and policies for each user appropriately (example: a low rung business user fires off a query that expands the cloud to 12 to 20 machines for 1 hour, but can't authorize the payment for the service usage).  Ok - Governance, Governance, and more governance....  Oh, did I mention: GOVERNANCE over the solution?  Absolutely required.

Another risk is: letting yourself believe that this is the answer to "all the poor quality data modeling" or "sloppy ETL" or "poorly behaved BI queries" because now the performance is incredible out of the gate.  Wrong, wrong wrong.  Even though this opportunity affords HUGE gains in performance, it is no excuse for lack of GOVERNANCE and DATA MANAGEMENT best practices.  Your data models can & should be improved, your sloppy ETL can and should be tuned, and your poorly behaved BI queries must be optimized.  Then instead of seeing only a 10x gain, you will be able to see a 150x gain.

You know what they say: No PAIN, no GAIN... In this case, the pain is paying for the right architecture to be put in place, the right training for performance and tuning, and the right data modeling to be utilized under the covers.

You can find out more about Data Vault (550+ users, 2000 unique visitors a month, 150k hits a month) at http://www.DataVaultInstitute.com

You can find out more about ETL performance and Tuning at http://www.TrainOvation.com, and http://www.GeneseeAcademy.com

OR: Contact me directly: danL@DataVaultInstitute.com  (even if you're not using Data Vault)

Cheers for now,

Dan Linstedt

 


Posted October 28, 2009 12:03 AM
Permalink | No Comments |

Leave a comment

    
Search this blog
Categories ›
Archives ›
Recent Entries ›