Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best. Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, has trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

Many things come to mind when reading the title of this entry. It's a HUGE space with even larger prospects: from the app servers to the databases, from the tips of BI reporting all the way to ethics, security, and privacy laws. And then there's the dreaded question: "What if the company supporting my current cloud apps and data fails?"

In this entry we will explore the tip of the iceberg, as it were: some of the notions to consider when looking at Business Intelligence and Data Warehousing in the cloud. Why? Because the CLOUD is big and getting bigger, and because it IS a central and important part of technology evolution in 2010.

Who can define "CLOUD" computing? Not me, that's for sure. It has the same problem every other industry-changing paradigm shift of the past 5 years has had: multiple definitions, multiple meanings, multiple provisions, all correct for different reasons.

Well, for lack of a better definition, let's focus on the one provided by techtarget.com:

DEFINITION - Cloud computing is a general term for anything that involves delivering hosted services over the Internet. These services are broadly divided into three categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). The name cloud computing was inspired by the cloud symbol that's often used to represent the Internet in flow charts and diagrams.

http://searchcloudcomputing.techtarget.com/sDefinition/0,,sid201_gci1287881,00.html

Let's narrow that definition to focus specifically on 1) data warehouses and data marts in the cloud, and 2) Business Intelligence "reporting" and analytics in the cloud.

Is it a good thing? Yes, I think so. I believe it can help reduce complexity, help consolidate disparate data sets, increase efficiency, and offer "pay by use" computing power as data sets grow.

But where might it fall short? Firms that have poor (or poorly optimized) data architectures, coupled with ever-growing data sets, will continue to see costs rise, quite quickly and quite possibly beyond what they ever dreamed possible. In other words, at the outset costs will drop as migration into the cloud occurs. Then, over a period of 6 to 12 months (as the data sets grow), the cost of computing power will grow, exponentially or worse. Why? Because querying a bad data architecture, and physically performing joins across engines without the ability to scale MPP at the core, means running CPU-intensive, single-streamed data access. So costs will keep growing: for the cloud to meet end-user BI performance expectations, computing power will have to be added to support additional user logins, each with its own single-streamed workload.

Regardless of what business users (or IT, for that matter) like to believe, performance is a MAJOR cost driver in cloud computing. To mitigate the high cost, or the potential for an exponential cost increase, the ONLY thing that might help is a strong, parallel data architecture for back-office enterprise data warehouses. It MUST be parallel in design and by nature, because the cloud is inherently built around MPP scalability (on-demand MPP, that is). That means a GOOD or GREAT data model will be at the core of cost containment, which in turn means: call in a consultant who understands terabytes, petabytes, and MPP systems BEFORE transitioning. Optimize your BI data models and their accessibility before transitioning.
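To make "parallel in design" a little more concrete, here is a minimal Python sketch, assuming a toy fact table of (customer_key, amount) rows. It hash-partitions the rows on the key so that every row for a given key lands in exactly one partition; each partition can then be aggregated independently, and the partial results merge without any cross-partition join. The function names and the use of CRC32 as the distribution hash are illustrative assumptions, not any particular vendor's implementation, and the thread pool merely stands in for what an MPP engine would do across separate nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition_of(key: str, n_partitions: int) -> int:
    # Hypothetical distribution function: a stable hash, so the same key
    # always lands on the same partition (the "distribution key" idea).
    return crc32(key.encode("utf-8")) % n_partitions


def hash_partition(rows, n_partitions):
    # Split (key, amount) rows into n_partitions independent buckets.
    parts = [[] for _ in range(n_partitions)]
    for key, amount in rows:
        parts[partition_of(key, n_partitions)].append((key, amount))
    return parts


def aggregate_partition(part):
    # Each partition is summed on its own; it never needs another partition's data.
    totals = Counter()
    for key, amount in part:
        totals[key] += amount
    return totals


def parallel_totals(rows, n_partitions=4):
    parts = hash_partition(rows, n_partitions)
    # Threads here only illustrate independent units of work; a real MPP
    # engine would run each partition on a separate node.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(aggregate_partition, parts))
    # Every key lives in exactly one partition, so merging is a plain union.
    result = {}
    for partial in partials:
        result.update(partial)
    return result


if __name__ == "__main__":
    sales = [("a", 1), ("b", 2), ("a", 3)]
    print(parallel_totals(sales, n_partitions=2))
```

The design choice that matters is the distribution key: because rows are co-located by key, no partition ever needs data from another. A model that forces cross-partition joins on every query drifts back toward the single-streamed access pattern described above, no matter how many nodes the cloud adds.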

OK, next point: what else can go wrong? Well, let's put it this way: clouds and cloud computing services are just that: services. And as services to the enterprise, they are equipped with APIs (application programming interfaces), these days commonly exposed as web services. That means potential for hacks and security breaches. But that's not all! The computing resources that house YOUR corporate data are now OUTSIDE your firewall. As a company, you no longer know where (physically) your data lives in the cloud, and as cloud services expand on demand to handle the load, the cloud service provider brings more machines online with visibility into your data sets, your queries, and your online access.

So security is a HUGE deal. Here's an interesting thought (quite possibly a real engineering challenge): construct a cloud on a virtual private network, with a maximum number of dedicated private machines and encrypted data on the back end. Wow, that's a mouthful. I'm not even sure it's possible (although with enough money, anything is possible). So what does that mean? It might mean buying servers and hardware, and bringing the fundamentals of cloud technology inside the corporate walls. But where do you find IT resources with this knowledge? Another tough question. If you make this kind of decision, outsourcing clearly is not an option, unless the outsourced talent works in house. But wait: that's yet another security breach waiting to happen.

If you have sensitive, protected data, then whoever you hire to work with it had better be trained and monitored closely. Let me guess: you've heard about Cisco and the "back door" they put into their routers, switches, and networking devices? I'm not saying all companies or all people are like this, but I've been in enough situations to understand that this is a security risk, and it is ripe for the picking in a cloud environment.

There's one more thing to think about: privacy and ethics laws. More data means more management costs; more data also means a higher risk of "losing data to the outside world," with the missing data going unnoticed or untraced. If a workload is executed in a cloud, how do you know that the machine/hardware you are using doesn't have a virus on it, or a sophisticated monitoring program that, once it is "turned on" and added to the compute cluster in real time, finds and shares sensitive data passing through its hardware?

Again, traceability is in question. Let me say this: cloud computing IS (I believe) the future; we MUST find a way to work with it and leverage it in the proper manner. I am just saying that we must tread carefully into these waters, and, like every other project, we must put a few things in place: security tests, breach procedures, and yes, off-site backups of our cloud data in case the cloud vendor goes belly up. What would you do if that happened while you were running your business on a cloud outside your organization?

Just a few things to think about. Please reply; I'd love to hear what you are considering in your cloud implementation.

Cheers,

Dan Linstedt


Posted March 3, 2010 5:55 PM