Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including: IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMI Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

January 2008 Archives

Every good BI/EDW solution is backed by a good architecture, and DW2.0 is no different. The framework that DW2.0 provides is sound, with all the components necessary. That said, in addition to the framework, architectures need to exist at different levels, as do standards and templates. A solid enterprise data warehouse project usually contains many of the following components, which implementers and consultants use to make a project successful.

The components include:
1. Common Project Plan (usually around 1400 to 2200 line items)
2. Business Requirements Document
3. Technical Requirements Document
4. Work Breakdown Structure (WBS)
5. Data Breakdown Structure (DBS)
6. Organizational Breakdown Structure (OBS)
7. Process Breakdown Structure (PBS)
8. Skills Assessment Matrix
9. Risk Analysis Document (high level)
10. Risk Analysis documents for critical tasks
11. Critical Process Flow Definitions (as-is), both business and technical
12. As-is data model documents
13. To-Be (proposed) E-R diagrams for staging, EDW, and at least the first mart structure
14. To-Be (proposed) interface structures
15. Cross-Reference (from/to) table.element to table.element definition (source to target design spec with transformations)
16. Standards Templates for loading CLASSIFICATIONS of table structures.
17. Reporting Templates for providing data - usually part of the business requirements document, that is: existing reports they may be receiving. Sometimes the To-Be reports are defined in the technical requirements document. However, reporting templates should be designed according to the GRAIN of data they provide, along with the OBS that they serve.

This is not a comprehensive list, but it has the major components that are necessary in all projects. Keep in mind that a lot of these items can usually be purchased as templates from ITIL (the Information Technology Infrastructure Library), or ISACA. Some of these things are available for purchase from SOME consulting companies.

I, being of sound mind (although some would beg to differ) hereby offer up (within this series) documents that might assist you along the way. I'll post these as we proceed on my companies' web site. Ok - enough of this... back to the topic at hand.

What are the key components?
1. It is vital, absolutely vital, to put all of these (and other project artifacts) under a single, company-wide version control system.
2. ALL EMPLOYEES DOWN TO THE LAST BUSINESS WORKER SHOULD HAVE ACCESS TO THE ENTIRE STACK OF DOCUMENTS AT ANY TIME, unless of course you're running a cleared/classified project. All these documents should be available for read-only access on the web, by any member of the business or technical side of the house.
3. ALL documents require sign-off sheets for changes and adjustments - these sheets are signed and dated by the money holder, project sponsor, technical project lead, and project manager. If you have a PMO, they should handle SOME of this documentation; it is unlikely they handle it all unless they are SEI / CMMI Level 5 sanctioned.
4. ALL documents must contain technical numbering. It starts with the business requirements, followed by the tech requirements, followed by the project plan.

For instance: 2.1.5.1 in my business requirements doc should match 2.1.5.1 in my tech requirements doc (where we discuss the implementation aspects), which should match 2.1.5.1 in my project plan - the resource load, progress tracking, and dependencies for getting it done. The 2.1.5.1 title in the project plan should be a tech requirements title; major numbers like 2.0 should carry a business requirements doc paragraph title. MIX the metadata!!!

Oh, it gets better....
2.1.5.1 should also match the WBS - telling me what work needs to be done to accomplish the tasks identified.
2.1.5.1 should also match the OBS - telling me which roles & responsibilities (which are usually loaded in to project plans) are needed, and what the escalation paths are in the organization should something go wrong. It also attaches the sign-off document to individual components.
2.1.5.1 should also match the PBS - which defines the current AS-IS processes involved in getting the work done today, which of course also define the critical path elements (business process management/re-engineering) so that the TO-BE process designs are more efficient.
2.1.5.1 should also match the DBS - which defines what data is needed to meet the requirements.

In other words, the project line-item number should be pervasively applied to every document in the stack, so that monitoring project progression and completeness can be done. You can't reach Level 5 CMMI without monitoring, and automating (where possible), and you can't fix what's broken unless you can plainly see the hold-ups, delays, and large issues in the way.
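The cross-document numbering check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `stack` dictionary, the document names, and the `missing_items` helper are hypothetical, and in practice the numbers would be extracted from the real documents under version control.

```python
from typing import Dict, Set


def missing_items(stack: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each document, list the line-item numbers that appear
    somewhere in the stack but are absent from that document."""
    all_numbers: Set[str] = set().union(*stack.values())
    gaps = {}
    for doc, numbers in stack.items():
        absent = all_numbers - numbers
        if absent:
            gaps[doc] = absent
    return gaps


# Hypothetical stack: every document should carry the same line-item numbers.
stack = {
    "business_requirements": {"2.1.5.1", "2.1.5.2"},
    "technical_requirements": {"2.1.5.1", "2.1.5.2"},
    "project_plan": {"2.1.5.1", "2.1.5.2"},
    "WBS": {"2.1.5.1"},  # 2.1.5.2 has no work defined yet
    "OBS": {"2.1.5.1", "2.1.5.2"},
    "PBS": {"2.1.5.1", "2.1.5.2"},
    "DBS": {"2.1.5.1", "2.1.5.2"},
}

gaps = missing_items(stack)
print(gaps)  # {'WBS': {'2.1.5.2'}}
```

A report like this is one way to "plainly see" an incomplete line item before the work starts, rather than discovering it mid-project.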

WHAT??? There's no possible way I can get all of this for the whole project before I start... that would be MASSIVE effort!!
Yes and no.... it's all about scope. Make absolutely certain that you scope down what you are implementing, and scope down the documents as needed. The only documents which usually don't scope down are the project plan, and the business requirements documents. Once you have these, scoping is much easier.

Remember that the size of your implementation team can have a huge impact on the scope you can accomplish and the time you can accomplish that scope in. In other words, not having enough resources contributes to overall project RISK. That risk needs to be identified so it can be discussed with the business before moving forward.

Now, that said, I speak from experience. As a technical project manager in a HUGE organization, I once went from zero to all documents in place, with the project phased and then scoped, within 3 months. Also within those 3 months my technical team of 3 - 4 people went from zero to delivery of a single report/single star schema. It is possible. Sometimes this is "sold" as a 90-day time box deliverable, but be wary of deliverables that do not include the methodology and the approach, and are missing the documents I specified above.

Why? Because when the consulting company finishes the work and turns over the effort, if you don't receive the methodology behind it, and the documents aren't numbered, then it is VERY hard for you (the customer) to learn about what you have, much less maintain it going forward.

KEY POINT ABOUT SCOPE:
Usually business requirements documents are huge. In order to meet all the items within the BPR, it is absolutely vital to divide it into phases (I like a 3-phased approach). This way, eventually all the requirements can be met, and in the first phase some low-hanging fruit (like necessary infrastructure) can be built alongside an expandable and flexible architecture, and also alongside NECESSARY requirements that are on fire.

LET THE USERS WORK OUT WHAT'S HIGH PRIORITY FOR PHASE 1. Once they understand they _will_ eventually get all requirements, just not in the first phase, they usually settle in to the task at hand of helping to define scope WITH YOU. Keep helping them define the scope of the project as you move forward.

As usual, I'm interested in your thoughts and feedback. What have been your experiences, good or bad, within projects?

Thanks,
Dan Linstedt


Posted January 31, 2008 2:47 AM

With these posts I hope to shed some light on what makes projects work, no matter the scale, no matter the time-line; always with an eye on costs, overhead, and a watch on the number of errors. In part 1 of this series I introduced top-level concepts required by nearly all "good" projects. In this entry I will dive a little deeper into what these concepts bring to the table, and also add some lower-level concepts that are also necessary in successful projects. I'm specifically targeting 2nd generation warehouses, and DW2.0 - in an effort to move forward.

In the last post we discussed the following ideas:

1. Singularity
2. Repeatability and Standardization
3. Measurable
4. Automatable

Lest we forget it's always about:
1. People - AKA: Organizational Breakdown Structure (OBS)
2. Process - Process Breakdown Structure (PBS)
3. Data - Data Breakdown Structure (DBS)
4. Work - Work Breakdown Structure (WBS)

We need to apply the concepts:
1. Accountability
2. Version Control
3. Governance

To everything we build, be it a warehouse, a staging table, or a single star schema. Without these fundamental tenets, we have a difficult time reaching the top 4 goals listed above (defined in part 1).

What does this mean in real terms?
a. Everything must be numbered and versioned. Use technical numbering patterns for extensibility and hierarchical concept organization.
b. Glossaries must be developed and maintained; common terminology is extremely important.
c. Common check-in and check-out procedures must be applied to ALL project documentation. It is no longer acceptable to keep a spreadsheet, project plan, or requirements document out of sight, or on a single person's PC/desktop. It needs to be available to the entire team in a common shared repository.
d. AS-IS diagrams, documents, statements, use-cases are just as important as TO-BE defining where they need to go. Gap Analysis is a critical part of the success to any project. Without Gap Analysis, it becomes difficult to MEASURE the LOE (level of effort) to get from point A (as-is) to point B (to-be).
e. ALL BREAK-DOWN STRUCTURES SHOULD BE NUMBERED! And attached to line-items in the project plan, as I've blogged about in the past, and will bring back again shortly.
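To illustrate why technical numbering helps with extensibility and hierarchy, here is a minimal Python sketch. The helper names `number_key` and `parent` are hypothetical; the point is only that numbers like 2.1.5.1 sort and nest correctly when treated as sequences of integers rather than as plain text.

```python
from typing import Optional, Tuple


def number_key(number: str) -> Tuple[int, ...]:
    """Turn '2.1.5.1' into (2, 1, 5, 1) so items sort numerically, not as text."""
    return tuple(int(part) for part in number.split("."))


def parent(number: str) -> Optional[str]:
    """Return the enclosing item, e.g. '2.1.5' for '2.1.5.1'; None at the top level."""
    parts = number.split(".")
    return ".".join(parts[:-1]) if len(parts) > 1 else None


items = ["2.10", "2.1.5.1", "2.2", "2.1", "2.1.5"]
ordered = sorted(items, key=number_key)
print(ordered)            # ['2.1', '2.1.5', '2.1.5.1', '2.2', '2.10']
print(parent("2.1.5.1"))  # 2.1.5
```

A plain text sort would place 2.10 before 2.2, which is the kind of small mismatch that makes cross-referencing documents by number harder than it needs to be.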

If you can, or if you remember how to: apply function points. The secret number is the number of person-hours per function point. Usually for the average "already trained" I.T. employee, it's roughly 1.5 person-hours per function point. Divide the function points into groups: easy, moderate, difficult. However, don't change the hours per function point. Why? The resource expertise leveling will do this for you.

Divide the resources into "expert, knowledgeable, and beginner". Assign risk categories to each level of resource. Then assign a multiplier to each resource group, the lower the group knowledge, the higher the risk, also - the more person-hours per function point it will take. Ideally you want each person on the project resource weighted. Now, keep in mind that these are NOT general categories for resources. Each resource may be expert in a different category. Where do you get the categories from? The Work Breakdown Structure.
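As a rough sketch of the arithmetic, assuming the 1.5 person-hours per function point baseline mentioned above: the multiplier values, the resource name, and the task data below are purely hypothetical, chosen only to show how a per-category expertise level scales the estimate while the base rate stays fixed.

```python
BASE_HOURS_PER_FP = 1.5  # baseline from the post; never adjusted per task

# Hypothetical multipliers: lower expertise means higher risk and more hours.
SKILL_MULTIPLIER = {"expert": 1.0, "knowledgeable": 1.5, "beginner": 2.0}

# Expertise is per WBS category, not one general label per person.
resource_skills = {
    "alice": {"data modeling": "expert", "ETL": "beginner"},
}

# (line item, WBS category, function points, assigned resource)
tasks = [
    ("2.1.5.1", "data modeling", 20, "alice"),
    ("2.1.5.2", "ETL", 20, "alice"),
]


def estimate_hours(function_points: int, skill_level: str) -> float:
    """Person-hours = function points x base rate x expertise multiplier."""
    return function_points * BASE_HOURS_PER_FP * SKILL_MULTIPLIER[skill_level]


estimates = {
    number: estimate_hours(points, resource_skills[who][category])
    for number, category, points, who in tasks
}
print(estimates)  # {'2.1.5.1': 30.0, '2.1.5.2': 60.0}
```

The same person and the same function point count produce different estimates because the expertise level differs by category, which is the leveling effect described above.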

Ok, so you're beginning to get the point. You need a matrix to cross-reference all this information, yes. Start with a solid project plan, where the line items are numbered down to EACH element and EACH paragraph, or better yet, specific sentences in the requirements document, so that every project plan line-item is tied to the requirements. THEN cross the project plan numbers to a work breakdown structure. By using this strategy, you can "resource load" your project plan with roles and responsibilities (held in the WBS).

Once you assign resources, you can set up risk leveling for each line-item in the project plan, and tie them to difficulty counts in the function point analysis (task overview). But I get ahead of myself.
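One possible shape for such a cross-reference matrix is sketched below in Python. Everything here is an assumption made for illustration: the `MatrixRow` fields, the sample requirements, the roles, and the risk labels are hypothetical, and a real matrix would be driven by the actual project plan and WBS.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MatrixRow:
    number: str       # project plan line-item number
    requirement: str  # title taken from the requirements document
    work: str         # work item from the WBS
    role: str         # role loaded onto the line item
    risk: str         # risk level derived from the role's expertise


def number_key(number: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in number.split("."))


def build_matrix(requirements: Dict[str, str],
                 wbs: Dict[str, Tuple[str, str]],
                 skill_by_role: Dict[str, str],
                 risk_by_skill: Dict[str, str]) -> List[MatrixRow]:
    """Join the documents on the shared line-item number."""
    rows = []
    for number in sorted(requirements, key=number_key):
        work, role = wbs[number]
        risk = risk_by_skill[skill_by_role[role]]
        rows.append(MatrixRow(number, requirements[number], work, role, risk))
    return rows


# Hypothetical sample data keyed by line-item number.
requirements = {"2.1.5.2": "Load customer dimension",
                "2.1.5.1": "Stage customer source data"}
wbs = {"2.1.5.1": ("Build staging load", "ETL developer"),
       "2.1.5.2": ("Build dimension load", "data modeler")}
skill_by_role = {"ETL developer": "beginner", "data modeler": "expert"}
risk_by_skill = {"expert": "low", "knowledgeable": "medium", "beginner": "high"}

matrix = build_matrix(requirements, wbs, skill_by_role, risk_by_skill)
for row in matrix:
    print(row.number, row.role, row.risk)
```

Because every source is keyed by the same number, a missing WBS entry raises a `KeyError` immediately, which is arguably the behavior you want from a completeness check.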

Proceed at once to the AS-IS documentation:
a) AS-IS data models
b) AS-IS process flows
c) AS-IS resource allocations
d) AS-IS results (that the business is receiving)

Then, proceed to define the TO-BE:
a) TO-BE data models (tie these back to the results, by project plan number)
b) TO-BE process flows (tie these back to the results, by project plan number)
c) TO-BE resource allocations (if the team assignments are to change)
d) TO-BE results (this ties directly BACK to the requirements document and project plan, by number)

Then, assess the Gap between the two, and generate a risk assessment complete with analysis and documentation, necessary people, roles, knowledge, and data. Then, assign importance to each item in the risk assessment, and tie the results of the risk assessment back to the project plan, again by number.
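A minimal sketch of the gap step in Python, assuming the AS-IS and TO-BE descriptions are both keyed by project plan number. The sample entries, the `importance` ranking, and the function names are hypothetical; a real risk assessment would carry far more detail (people, roles, knowledge, data) per item.

```python
from typing import Dict, List, Optional, Tuple

Gap = Tuple[Optional[str], str]  # (as-is state or None, to-be state)


def gap_analysis(as_is: Dict[str, str], to_be: Dict[str, str]) -> Dict[str, Gap]:
    """Return every line item whose TO-BE state differs from (or lacks) an AS-IS state."""
    gaps = {}
    for number, target in to_be.items():
        current = as_is.get(number)
        if current != target:
            gaps[number] = (current, target)
    return gaps


def risk_register(gaps: Dict[str, Gap], importance: Dict[str, int]) -> List[str]:
    """Order the gap items by assigned importance (1 = most important)."""
    return sorted(gaps, key=lambda number: importance[number])


# Hypothetical AS-IS and TO-BE results, keyed by project plan number.
as_is = {"2.1.5.1": "weekly spreadsheet extract",
         "2.1.5.2": "nightly batch load"}
to_be = {"2.1.5.1": "daily star schema report",
         "2.1.5.2": "nightly batch load",
         "2.1.5.3": "data quality dashboard"}
importance = {"2.1.5.1": 2, "2.1.5.3": 1}

gaps = gap_analysis(as_is, to_be)
register = risk_register(gaps, importance)
print(register)  # ['2.1.5.3', '2.1.5.1']
```

Item 2.1.5.2 drops out because nothing changes between A and B, so the level of effort is only measured where a real gap exists.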

What we are doing is crossing the PBS, DBS, WBS, and OBS together with the gap analysis, requirements document, and project plan, so we can clearly see _exactly_ what it takes to get from A to B in the shortest time possible. We are also clearly illuminating any obstacles in the way, be it the training, hardware, or software needed to make it happen. We can then MEASURE our progress as we begin down the project plan.

Now, we are taking charge and governing our build process for EDW/DW2.0 - once we begin this effort, the business will begin to take us seriously, and see that we too are accountable for our delivery efforts - and every step of the way they can monitor the progress.

Thoughts? What have you experienced? Have you executed function points? (yes they're from the 1970's but they still work when done properly).

Thanks,
Dan Linstedt


Posted January 28, 2008 2:27 PM

It's been a while since my last blog entry; my apologies. I've been heads down building companies lately, and they seem to be starting to gain steam. All that aside however, I've been thinking quite a lot over the years about these notions we hear about: top 10 this, top 10 that... most of these are about "mistakes" we can make, followed by short sound bites about what or how to look for answers. There are books and books filled with great information about building, maintaining and deploying an Enterprise Data Warehouse, and then there are architectural discussions from John Zachman, Bill Inmon, and others. All good information....

But what I've been thinking about is a practical guide, a good sense guide to how to get from point A to point B, and then do it over again, repeatably, reliably, and consistently. Not just to do it the first time, but also for those of you getting in to 2nd Generation Warehousing, or DW2.0, what does it mean? How should it be approached?

Please keep in mind, I'm no expert, and I hope I never claim to be; that title is reserved for people like Colin White, Bill Inmon, Claudia Imhoff, Ralph Kimball, and a few others who really are the true thought leaders. No, what I want to share is the culmination of 15+ years in the industry of building enterprise data warehouses and business intelligence systems. What to do, what to look for, how to do it, how to make it work.

I'm not (hopefully) going to dive in to the nitty gritty of it all (I've done that in postings in the past), but I'm going to stick to the architectural roots, and try to make business sense out of all of this.

For starters, we (most of us anyhow) are familiar with Dr Kimball’s approach to enterprise data warehousing; and when we reference his books in the LifeCycle Toolkit, we talk about things like Star Schemas, Enterprise Service Busses, and Staging Areas. But let me back up...

There are a few things that the CIO should be discussing and contemplating when building ANY enterprise based systems:
1. Singularity - cost of management, maintenance, organization, and flow - everything should have a center or master point of contact. This includes data, people, process, organization, and so on. Businesses have singularity (usually) in their organizational structures/reporting structures, so why should data and business process be any different?
2. Repeatability, Standardization - Once a success has been established, it should be repeated again, for a lower cost, higher value, and measured/monitored effort. CIOs need to focus on things like ITIL, COBIT, CMMI, PMP, and Six Sigma. If a CIO doesn't understand these terms, or can't bring them to bear within their own organization, they should either be sent to school to learn these tools, or be let go from their position. The effect of deploying these measures within a corporation is to enable the corporation to grow, safely and collectively, with practices that scale. The business gets better and better at singularity and success. Small shops should focus on leveraging these ideas at a high level, getting the corporate culture used to operating this way.
3. Measurable - Without standardization of people, processes, and data, measuring and estimating the next big project could turn out to be a disaster. Measuring the process in the estimation cycle, or in the projection to completion cycle, or in the current effort cycle (level of effort aka LOE) is extremely important. Understanding the measurements of the people AND the data AND the processes involved goes a long way toward controlling project cost overruns and budgetary concerns. These measurements also help to point out where things go wrong, have gone wrong, or are in danger of going wrong in the near future, thus enabling the business to navigate around the issue or solve the issue, whichever they choose to do. If you can't measure the effect of the people, the systems, processes or the data, then you won't be able to improve what you are doing, and thus make a repeatable success next time.
4. Automatable - once the CIO has gotten a handle on the processes, and the corporate culture has turned the corner (gotten used to standardized practices, and version control, document control, PMP and six sigma practices), then the CIO should look at automating parts of the business.

Why am I rehashing old ideas here that we have known for I.T. in general for years? Well, guess what: the EDW teams over the past 10 years (for the most part) have claimed: no standards to follow, it's a different project, every single customer is different, all pieces are special, new, and are exceptions to the rule... You name it, the EDW projects have done it (in a blaze of glory no doubt) in order to deliver on time.

Ok, so in walk the methodologies from the Big 6 (now the final four, if you can call them that) and the various consulting companies. I'm guilty of that too - I wrote one of these behemoths in my past life... They are massive in documentation, enforce rigid standards, and usually are waved in the client's face as proof that the company knows what they are doing.

This isn't the problem. No, the problem isn't that these methodologies exist, nor is it a problem that they are followed (which in most cases they are...). The problem is: when the consulting companies leave your realm... what do they leave you, the customer, with?

Empty arms, lack of knowledge, minimal training... They turn over the project, walk in to do their song and dance, show what they've provided - but no one on your team understands the infrastructure, the standards, or the closely guarded methodologies they used to build the huge solution set. It's like walking in to the Empire State Building in NY and not knowing where the elevators are because you can't see them; yet you know they are there.

CIOs, with all the off-shoring, in-shoring, on-shoring, near-shoring, and beach belly wagging (just kidding)... have gotten lost in the fray here. Companies need to get back to doing business the way they do business, and they need to get back to contracting consulting companies that do business the way THEY need it done. Which means internal controls, internal standards, internal documentation, internal improvement processes...

But I digress. Back to the point. Enterprise Data Warehouses need guidelines, standards, processes, architectures, and workable solutions - otherwise you'll be stuck in a quagmire of mud for a bit, while stretching to reach the DW2.0 bank of solid ground.

These entries will start at high levels, and dive down a little bit to discuss how, and what we do to build successful systems. Remember, these entries come from my own personal experiences of working in the trenches - I'm open to comments on all forthcoming entries to see what other knowledge we can share.

Cheers,
Dan L


Posted January 27, 2008 7:58 PM