Blog: Jill Dyché

Jill Dyché

There you are! What took you so long? This is my blog and it's about YOU.

Yes, you. Or at least it's about your company. Or people you work with in your company. Or people at other companies that are a lot like you. Or people at other companies that you'd rather not resemble at all. Or it's about your competitors and what they're doing, and whether you're doing it better. You get the idea. There's a swarm of swamis, shrinks, and gurus out there already, but I'm just a consultant who works with lots of clients, and the dirty little secret - shhh! - is my clients share a lot of the same challenges around data management, data governance, and data integration. Many of their stories are universal, and that's where you come in.

I'm hoping you'll pour a cup of tea (if this were another Web site, it would be a tumbler of single-malt, but never mind), open the blog, read a little bit and go, "Jeez, that sounds just like me." Or not. Either way, welcome on in. It really is all about you.

About the author

Jill is a partner and co-founder of Baseline Consulting, a technology and management consulting firm specializing in data integration and business analytics. Jill is the author of three acclaimed business books, the latest of which is Customer Data Integration: Reaching a Single Version of the Truth, co-authored with Evan Levy. Her blog, Inside the Biz, focuses on the business value of IT.

Editor's note: More articles, resources, news and events are available in Jill's BeyeNETWORK Expert Channel. Be sure to visit today!


You may have noticed we've slowed down our "In The Field" blog entries, but it's for a good reason. Last week, Baseline launched its latest e-book, co-authored by Baseline consultants and frequent bloggers, Carol Newcomb and Caryn Maresic.

The Data Quality eBook is both a cautionary tale and a nuts-and-bolts toolkit for bringing a set of formalized data quality processes to your company. When the Central Health Alliance discovers just how costly bad data can be, the health care provider launches a data quality program that not only improves services—it can actually save lives. This e-book looks at data issues faced by companies across industries, and shows you how to apply a step-by-step process to prevent over-investment in untrustworthy data and drive business value in the bargain.

The book is currently available for download at Information-management.com.

And now, a brief excerpt from the book:



At Central Health Alliance—as with many companies—protracted explanations and guesswork cede to manual effort.   If there is a problem hidden in the data, an analyst will surely find it. The question is: how long will that take?

The problem with manual data exploration is that you’ve got a lot of data, probably a lot more than you know. Data is captured, copied and transformed; it is everywhere in all shapes and forms. When digging through the data, where do you start? More importantly, where do you stop? Unfocused and manual data profiling might lead to interesting discoveries, but won’t get you a cohesive roadmap to better data quality. Moreover, it’s hardly scalable.

The right way to improve data quality is by focusing on four incremental steps:

Identify the Business Issue – Defining the business issue and its impact on business operations, strategic goals, or decision making maintains focus for the remainder of this process. The scope of the business issue should be well understood. You might identify several related business issues that have bad data as their core. Or you might have a number of overarching issues, as Central Health Alliance does.

Assess Conformance to Requirements – After your business issue is well understood, it is time to do a data quality assessment. The assessment is a focused effort to determine where in the data lifecycle things go wrong. Central Health Alliance knows its business issues and is poised to kick off the data assessment.

Discover the Root Causes – After you’ve assessed your data quality issues, it is time to discover why these problems are occurring. What are the root causes? Is there a lack of consistent training for the people who key in data?   Is there some buggy code that is moving data around behind the scenes?   Maybe there is some confusion about what the data actually means?

Formalize Improvements – Once you know the “what” and the “why,” it is time for action. Improving data quality is often a two-pronged effort: you’ve got to fix what’s wrong and you’ve got to put a monitoring system in place so that you will know when something goes awry in the future. By fixing the data problem at its source, you not only prevent it from recurring, you also improve the quality of the data in downstream systems.

What are you waiting for? Go download the entire e-book today!


Posted August 26, 2010 6:00 AM

By Caryn Maresic, Senior Consultant


Contribute to society and human well-being.   Avoid harm to others.   Be honest and trustworthy.   Be fair and take action not to discriminate.   Those are the first four items in the ACM Code of Ethics.   The ACM, for those who may not be familiar, is the Association for Computing Machinery, whose mission is to advance computing as a science and a profession.

In the course of a recent assignment with a major insurance carrier, our team was asked to create various target lists for sales and marketing based on certain selection criteria. While it is likely that everything they asked for was legal and ethical, we never questioned it. As good Data Stewards, what should we have done in this case? Should we be asking the business to justify their selection criteria? Should we be checking to make sure there are no legal or ethical violations inherent in the rules? A little research on the topic turned up this presentation, which is very interesting and thought-provoking. That being said, it focuses more on hot-topic issues like privacy and identity theft than on the ethical dilemmas of sales and marketing.

This article tells the story of an “Agent Profile System” set up by an insurer in Texas to rate its agents. Agents who didn’t score well were punished by not getting any new business. The agents filed suit contending this was illegal as it compelled them to drop clients with low credit ratings, low income, and/or those who lived in undesirable locations in order to boost their own score. Is the IT team that built the Agent Profile System responsible, at least in part, for discrimination?

When we are dealing with situations where lives are in danger, the ethical answer is clear. For example, no reasonable person would deny that engineers working on Space Shuttle software have a duty to report concerns regarding possible malfunction. In the BI community our issues are not always so clear-cut. Sometimes discrimination is good for the business’s bottom line, yet still unethical and possibly illegal. If we go back to the statements “Avoid harm to others” and “Be fair” and “take action not to discriminate,” it appears that we should take seriously our responsibility to be involved in how the business uses data. In fact, I would argue that we should make ethical considerations part of our data governance program.

photo by Robert S. Donovan via Flickr (Creative Commons License)


Caryn has over 20 years of experience providing high-quality data solutions to clients in the areas of Business Intelligence, Data Warehousing and System Integration. Caryn has expertise across industries with an emphasis on Pharmaceutical, Manufacturing, and Insurance. Prior to joining Baseline, she ran her own consulting company.


Posted July 15, 2010 6:00 AM

By Caryn Maresic, Senior Consultant

Design

The Data Architect is the core of any BI team. It is important to choose a person with the right skill set. As I tried to put together a list of skills I looked to IT Toolbox and Database Answers for help, but my mind wandered a bit. System Construction. Data Architect. Data Warehouse. Software Factory. We like to portray what we do in terms of construction and/or manufacturing. A recent client bemoaned her department’s inability to move from “building custom cars” to “an assembly line.” Comparing ourselves to these burly industries might make us feel strong, but does it accurately represent what we aspire to be?

What is a Data Architect?   What should they know how to do?     I borrowed the following description from this article. Before you click, read on and see if you can guess what this is really describing.   I think it is a great description for a Data Architect:

A Data Architect is qualified by education, experience, and imagination to enhance the function and quality of systems. The purpose of this pursuit is to improve the quality of life, increase productivity, and protect the health, security, and welfare of the business.

The best Data Architects are capable of analyzing a client's needs, goals, safety and business requirements and integrating this information into a design that is both pleasing to the eye and functional. They will work with the client closely to develop preliminary design concepts that meet their aesthetic, functional, and economic needs while maintaining adherence to standards.

In essence, the best Data Architects are part detective, part artist, and part psychologist and they use these skill sets to create systems that fit a client's tastes and needs with their budget in mind.

Doesn’t that sound like a great job?   Sign me up!   What this is actually describing is an interior designer.   While I doubt that HGTV has any plans to showcase the next dashboard you build, we are indeed closer to Designing Women than Rosie the Riveter!   Stay tuned for future posts on the talents of a good Data Design Star.

photo by Annahape Gallery via Flickr (Creative Commons License)




Posted July 8, 2010 6:00 AM

By Caryn Maresic, Senior Consultant


Most Data Warehouse designs include constructs for Address, Phone, and/or Email for Customers. Len Silverston came up with what he calls a Universal Data Model that does a very good job of abstracting address, email and phone number data. I have seen clients use the Contact Point portion of his model, both as-is and with a few simplifications, with great success. That being said, in the area of Marketing and Sales, the manner in which we reach out to our customers and prospects gets more diverse every day. Disneyland has just partnered with Verizon so that park guests can get real-time information about the park and play Disney games on their phones... and, of course, Disney gets access to more information about its customers!

How does this new and ever-changing world of communication change the way we think about and model contact points? What would my “address” look like if I were near the Haunted Mansion looking for a lunch spot? Would it be different than if I were at Downtown Disney looking for a cup of coffee? On Main Street looking for Winnie the Pooh? In all instances I would be using the same phone, possibly the same IP address, but I would be in different locations, which would be important to the marketeers at Disney.

As time goes by (and cell phone GPS systems become more accurate) I suspect that the way we run marketing campaigns to smart phones will be similar to the way in which we use billboards today. Where the customer is physically located at any given time will be as important as the phone number and/or IP address, thus creating a two-dimensional contact point.
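
One way to picture a two-dimensional contact point is a record that carries the channel (phone number or IP address) and, separately, where the device was last seen. The Python sketch below is only an illustration: the class names, fields, phone number, and coordinates are all made up for the example and are not taken from Silverston’s model or from any Disney system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Location:
    """Where the device was observed (hypothetical fields)."""
    latitude: float
    longitude: float
    observed_at: datetime


@dataclass(frozen=True)
class ContactPoint:
    """A contact point with two dimensions: the channel and the place."""
    customer_id: str
    phone_number: Optional[str] = None
    ip_address: Optional[str] = None
    location: Optional[Location] = None  # second dimension; None when unknown

    def is_two_dimensional(self) -> bool:
        # True only when we know both how to reach the customer and where they are.
        has_channel = self.phone_number is not None or self.ip_address is not None
        return has_channel and self.location is not None


same_phone = "555-0100"  # made-up number
# Illustrative coordinates and timestamp, not a real park location.
with_location = ContactPoint(
    "C1",
    phone_number=same_phone,
    location=Location(33.81, -117.92, datetime(2010, 7, 8, 12, 0)),
)
without_location = ContactPoint("C1", phone_number=same_phone)
```

The same phone number yields two different contact points here, which is the whole idea: the location is part of the contact point rather than an attribute of the customer.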

Have you come across this issue in your organization? Have you changed your data model to include two-dimensional contact points? If not, has the use of smart phones changed your data model in other ways?

photo by wrayckage via Flickr (Creative Commons license)




Posted July 1, 2010 6:00 AM

By Carol Newcomb, Senior Consultant

Diamond in the Rough: Data Quality

The third part of my summertime primer addresses Data Quality Analysis. Don’t even start a data quality analysis until you have completed the first two steps: Root Cause Analysis, where you investigate and prioritize any potential causative factors, and the start of your metadata assessment. Otherwise, you may be misled by your findings.

Data quality is defined as complete and accurate data that is ready for business consumption. Sources of poor data quality may include lack of data entry rules, unclear data element definitions, inconsistent metadata definitions for field type, format or intent, or breakdowns in data transformation processes as data flows between systems or applications. Poor data quality results in bad business decisions; it contributes to major problems in using data effectively, and costs companies millions of dollars per year in rework and inefficiency. Data quality, in combination with robust metadata definitions, is part of the foundation of good data governance.

Data Quality Analysis

A Data Quality Management process should be designed to enable an area to start with a simple approach and over time to mature to one that is more proactive and comprehensive.   Initially, investigation may be focused on single data elements or events.   As patterns, data commonalities and other relationships appear, the data quality management process will grow to support complete business processes.     A mature data quality management process will not just resolve individual issues; it will also track relationships between data elements, ensure that business rules are consistent and generate statistical analyses to monitor previously addressed issues to ensure that data quality is stable and that an early warning system is in place as part of the data governance program.   The goal is to design a data quality management lifecycle, as shown in this diagram:

[Figure: data quality management lifecycle]

Initial Data Quality Analysis Process

I. Define data scope
    • Determine data elements that are associated with or are direct results of the reported issue
    • Check that all metadata definitions are present and current
    • Enlist the involvement of the Data SME or Data Stewards
    • Identify all source systems where the data originates, is entered, or is derived
II. Extract and profile the data
    • Extract the relevant data from all key source systems.
    • Design the profile.   A profile will consist, at a minimum, of total record counts, min/max values, frequency of unique values, and frequency of invalid values (if defined) for each data element profiled.  
    • Profile the data to determine key characteristics that are contributing to the issue, such as:
      1. Wrong values
      2. Missing values
      3. Corrupt transformation processes
      4. Incorrect business rules
      5. Incorrect usage rules
III. Analyze Data Profile Results
    • Summarize the key findings from the profile detail
    • Determine what key drivers are contributing to the impact
    • Determine accountability for the data quality issue
    • Involve other Data Stewards in troubleshooting and designing the data quality solution
IV. Design the Corrective Action Plan

Two types of plans should be developed to address known data quality issues: a corrective action plan to fix the immediate source of the problem identified, and an ongoing monitoring plan, where thresholds have been determined and metrics are routinely collected and reported to data stakeholders.   This monitoring process should be scalable based on the number of data elements being tracked.

    1. Corrective Action Plan
      • Does scope of problem warrant change in metadata definitions, business practices or data entry rules?
      • Does scope of problem warrant a data governance standard?
      • Does the corrective action plan include details on how to fix the source of the problem as well as ways to correct historical data in the system?
    2. Preventive Action Plan
      • This plan will be designed to minimize the probability of data quality issues recurring
      • Determine ‘early warning triggers’ based on designated thresholds.   These thresholds should reflect the business tolerance for inaccurate data (is 95% acceptable?)
      • If data latency is the source of a data quality issue, then latency thresholds should be included in the monitoring plan
      • Determine how frequently results of the monitoring plan will be reported to data stakeholders or governance oversight committees
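
To make the profiling and monitoring steps concrete, here is a minimal Python sketch. It computes the minimum profile described above (total record count, min/max, frequency of unique values, and a count of invalid values when a rule is defined) and then checks the result against a tolerance threshold. The function names, the sample ages, the 0-to-120 validity rule, and the choice to count missing and invalid values together are assumptions for illustration; the 95% default simply echoes the question asked above.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, Optional


def profile_element(values: Iterable[Any],
                    is_valid: Optional[Callable[[Any], bool]] = None) -> Dict[str, Any]:
    """Profile one data element: counts, min/max, value frequencies, invalid count."""
    rows = list(values)
    present = [v for v in rows if v is not None]
    profile: Dict[str, Any] = {
        "total_records": len(rows),
        "missing": len(rows) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "value_frequency": Counter(present),
        "invalid": None,  # stays None when no validity rule is defined
    }
    if is_valid is not None:
        profile["invalid"] = sum(1 for v in present if not is_valid(v))
    return profile


def breaches_threshold(profile: Dict[str, Any], min_valid_ratio: float = 0.95) -> bool:
    """Early warning trigger: True when the share of usable records is below tolerance."""
    total = profile["total_records"]
    if total == 0:
        return False  # nothing to judge
    bad = profile["missing"] + (profile["invalid"] or 0)
    return (total - bad) / total < min_valid_ratio


ages = [34, 51, None, 29, 212, 47, 34, 63, 58, 41]  # made-up sample
age_profile = profile_element(ages, is_valid=lambda a: 0 <= a <= 120)
alert = breaches_threshold(age_profile)
```

In this sample, one missing value and one out-of-range value leave 8 of 10 records usable, so the 95% trigger fires. A real monitoring plan would run a check like this on a schedule and report the results to data stakeholders.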
So, now that summer is officially here, this wraps up my Data Governance Primer series.   Time for some iced tea and my favorite beach towel.   Come August, these little refreshers might be just the thing!

photo by Swamibu via Flickr (Creative Commons License)


Carol Newcomb is a Senior Consultant with Baseline Consulting. She specializes in developing BI and data governance programs to drive competitive advantage and fact-based decision making. Carol has consulted for a variety of health care organizations, including Rush Health Associates, Kaiser Permanente, OSF Healthcare, the Blue Cross Blue Shield Association and more. While working at the Joint Commission and Northwestern Memorial Hospital, she designed and conducted scientific research projects and contributed to statistical analyses.

Posted June 24, 2010 6:00 AM

By Carol Newcomb, Senior Consultant

Minding Your Metadata

The second part of my summertime primer addresses ‘Minding your Metadata’.   I can just hear the collective groans and yawns now.   Sorry, but metadata collection is one of those necessary evils that may not be fun in the doing, but having it available as a resource to understand your data and use it appropriately is invaluable.   And you just might find some interesting surprises along the way!


Metadata: What Is It & Why Do I Need It?

As you start your Root Cause Analysis (see last week’s primer), you first need to examine existing data definitions (or lack thereof).   Metadata is the foundation of good data management and forms the basis for Data Governance.     Pardon me for stating the obvious, but metadata is fundamental to investigating and resolving data issues and it is the first place to start when investigating data quality issues.

Metadata is “data about data.” Plain and simple. It includes descriptive information about electronic data used in common daily business practice. Metadata includes items usually found in a data dictionary: field name, field length, retention rules, and security access, as well as additional descriptive information that may include data origin (source or system), creation/entry date, method of creation (key-entry or the result of a calculation), purpose of the data (its intended use), how frequently it gets updated or refreshed, and current location in a database (table, view, schema). If a data element is the result of calculation logic or groupings (such as age categories), those business rules used to generate the resulting data values should be collected as part of the metadata.
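
As a rough illustration, a single data dictionary entry could be captured in a small record like the one below. This is a sketch, not a standard: the class name, the field names, the sample values, and the age bands are all hypothetical, chosen only to mirror the items listed above and to show a business rule stored alongside a derived element.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataElementMetadata:
    """One data dictionary entry (hypothetical layout)."""
    field_name: str
    field_length: int
    retention_rule: str
    security_access: str
    origin: str                 # source or system
    created_on: date            # creation/entry date
    creation_method: str        # "key-entry" or "calculation"
    purpose: str                # intended use
    refresh_frequency: str
    location: str               # table, view, or schema
    business_rule: Optional[str] = None  # required for derived elements


def age_category(age: int) -> str:
    """The grouping logic that the business_rule text below documents."""
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"


# All values below are made up for the example.
age_category_meta = DataElementMetadata(
    field_name="AGE_CATEGORY",
    field_length=6,
    retention_rule="7 years",
    security_access="restricted",
    origin="patient registration system",
    created_on=date(2010, 6, 17),
    creation_method="calculation",
    purpose="reporting by age group",
    refresh_frequency="nightly",
    location="warehouse.patient_dim",
    business_rule="under 18 = minor; 18 to 64 = adult; 65 and over = senior",
)
```

Keeping the rule text next to the element means anyone researching a surprising value can see how it was derived without hunting down the code.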

A good example of metadata that you may use every day would be ‘document properties’ in a Word document.   This feature captures data on the original document creation date, most recent access and update times, document creator, count of characters, words and pages.   If the document should be private, this will be indicated in its properties.   You may also tag the document by indicating key words in order to make it easier to find by you or others.

A few of the benefits of Metadata Management include:

  • Clarify rules for data entry
  • Reduce ambiguity around appropriate use of data elements
  • Eliminate problems associated with not having data definitions, business rules or transformation logic available
  • Validate legitimate values at the data element level
  • Provide evidence to regulators that security and confidentiality are protected
  • Centralize the storage and accessibility of metadata for end-users
  • Reduce the amount of effort required to research data results.

A Metadata Management Repository is a central location or system to collect and store metadata that may exist in disparate parts of the organization (data dictionaries, systems, spreadsheets, or people’s brains). The metadata repository will store detailed definitions centrally on a network where other users can find it.

There are three general sources of metadata that should be included in this repository:

Business Metadata – Business metadata attributes facilitate identification, understanding, and appropriate use of existing data elements.   These include clear business names and descriptions, relevant business rules, descriptions of the data sources, security and privacy rules, etc.  
Technical Metadata – Describes the technical attributes of data such as physical location (host server, database server, schema, etc.), data types, any transformations applied and domain of valid values, relationships to other data elements, precision, and lineage.   Technical metadata is used by business users and by IT staff to design efficient databases, queries, and applications, and to reduce duplication of data.  
Operational Metadata – Describes the attributes of routine operations on data and related statistics. These include job schedules and descriptions, data movement and transformation processes, data read, update and performance statistics, volume statistics, backup and archival information. Operational metadata is used by operations staff and DBAs to tune the system and ensure its continued efficient operation. It is also used by business users to track such events as “last use” of a field and “last load” of a data element.
Exciting stuff, huh?   Well, the whole point of metadata is to have the information about data available to a multitude of users when they need it, to keep it current, and to avoid confusion around usage.   So if you appreciate having a clean bathroom, and knowing where you keep your antiperspirant, you will also appreciate having good metadata!   The time for spring cleaning is well overdue.



Posted June 17, 2010 6:00 AM

By Carol Newcomb, Senior Consultant


They say that Data Governance is about People, Process and Organization. Much of the design work in planning for data governance is around people’s roles and responsibilities, then designing the organizational structure that will provide authority for decisions to be made and enforced. The processes, however, are not new. They are probably already being practiced within your organization, just in a decentralized, informal way. In this blog series, I discuss the processes for 1) investigating and isolating the data quality issues (Root Cause Analysis), 2) starting to collect complete Metadata Definitions, and 3) performing Data Quality Analysis. Only when your governance group has worked through each step, in order, are you likely to design the appropriate solution.

Root Cause Analysis

The process of data governance is fundamentally very simple.

  1. Identify the data quality issues to address
  2. Prioritize the portfolio of issues to isolate/tackle the most important
  3. Perform Root Cause Analysis to determine the true source of the data issue
  4. Design the corrective action
  5. Formalize the correction through consideration & approval by the Data Governance organization
  6. Implement the fix
  7. Monitor the results

It seems like when we start to map out the discrete steps involved in the data governance process, much of the work is already being done in informal ways throughout the organization.   What some folks don’t realize is that data governance is often nothing more than formalizing a whole bunch of informal processes that either don’t get communicated, or aren’t accepted as a data standard.

Root Cause Analysis is the process of identifying probable causes of a data issue, and isolating the contributing factors.   In order to resolve any particular issue, root cause analysis involves fact-finding, drilling into details of the problem, talking to the right people, and separating out other associated (but not contributing) factors.

A standard tool for supporting the detailed findings is the Ishikawa Diagram, below.   

[Figure: Ishikawa diagram]
To conduct a thorough Root Cause Analysis, use the following checklist:
  • Diagnose the problem as if you are a physician or a detective. Consider all possible sources of the symptom. Don’t rule anything out yet!
  • Boil the ocean—be exhaustive and creative.
  • Don't practice problem solving before collecting all possible causes.
  • Practice the “5 Whys”: don’t stop asking “Why” until you have exhausted every conceivable potential reason.
  • Rank the factors if possible.   Identify the Primary causes versus the Secondary or associated factors.
  • Rule out each possible factor one at a time.   Justify why (you may need to come back to this later).
  • Find all potential business process and data owners to involve them in your understanding of the possible sources of the problem.
  • Share the findings with everyone involved in troubleshooting. They could rule out certain factors with their knowledge.
  • Test your hypotheses with actual data.     
  • Fix the problem and test again.
  • Publish/share your findings and fixes.   Communicating your findings may reveal additional factors you hadn’t considered.
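
The ranking and ruling-out items in the checklist lend themselves to a simple log of candidate causes. The Python sketch below is one possible way to keep that log, so that every factor you rule out carries its justification and can be revisited later. The class, the function, the rank labels, and the sample causes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Primary causes sort ahead of secondary ones; unranked factors come last.
RANK_ORDER = {"primary": 0, "secondary": 1, "unranked": 2}


@dataclass
class CandidateCause:
    description: str
    rank: str = "unranked"
    ruled_out: bool = False
    justification: Optional[str] = None

    def rule_out(self, justification: str) -> None:
        """Rule the factor out, but only with a recorded reason."""
        if not justification.strip():
            raise ValueError("record why the factor was ruled out")
        self.ruled_out = True
        self.justification = justification


def open_causes(causes: List[CandidateCause]) -> List[CandidateCause]:
    """Factors still under investigation, primary causes first."""
    still_open = [c for c in causes if not c.ruled_out]
    return sorted(still_open, key=lambda c: RANK_ORDER[c.rank])


# Made-up examples of candidate causes.
causes = [
    CandidateCause("Ambiguous field definition", rank="secondary"),
    CandidateCause("Buggy transformation code", rank="primary"),
    CandidateCause("Inconsistent data entry training", rank="primary"),
]
causes[1].rule_out("Job output matched the source for the sampled records")
remaining = open_causes(causes)
```

After the one factor is ruled out, the remaining list holds the training issue first and the definition issue second, and the justification stays on record in case the team needs to come back to it.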

After a thorough Root Cause Analysis has been completed, Data Stewards should proceed to Metadata Analysis and Data Quality Analysis. These two techniques will be discussed in my next posts.




Posted June 10, 2010 6:00 AM

By Carol Newcomb, Senior Consultant

Newcomb_Graphic_01b

They say that Data Governance is about People, Process and Organization.   Much of the design work in planning for data governance is around people’s roles and responsibilities, then designing the organizational structure that will provide authority for decisions to be made and enforced.   The processes, however, are not new.   They are probably already being practiced within your organization, just in a decentralized, informal way.   In this blog series, I discuss the processes for 1) investigating and isolating the data quality issues—Root Cause Analysis—, 2) starting to collect complete Metadata Definitions, and 3) performing Data Quality Analysis.   Only when your governance group has worked through each step, in order, will you be more likely to design the appropriate solution.

Root Cause Analysis

The process of data governance is fundamentally very simple.

  1. Identify the data quality issues to address
  2. Prioritize the portfolio of issues to isolate/tackle the most important
  3. Perform Root Cause Analysis to determine the true source of the data issue
  4. Design the corrective action
  5. Formalize the correction through consideration & approval by the Data Governance organization
  6. Implement the fix
  7. Monitor the results

It seems like when we start to map out the discrete steps involved in the data governance process, much of the work is already being done in informal ways throughout the organization.   What some folks don’t realize is that data governance is often nothing more than formalizing a whole bunch of informal processes that either don’t get communicated, or aren’t accepted as a data standard.

Root Cause Analysis is the process of identifying probable causes of a data issue, and isolating the contributing factors.   In order to resolve any particular issue, root cause analysis involves fact-finding, drilling into details of the problem, talking to the right people, and separating out other associated (but not contributing) factors.

A standard tool for supporting the detailed findings is the Ishikawa Diagram, below.   

Newcomb_Graphic_02
To conduct a thorough Root Cause Analysis, use the following checklist:
  • Diagnose the problem as if you are a physician or a detective. Consider all possible sources of the symptom. Don’t rule anything out yet!
  • Boil the ocean—be exhaustive and creative.
  • Don't practice problem solving before collecting all possible causes.
  • Practice the ”5 Why’s”—don’t stop asking ”Why” until you have exhausted every conceivable potential reason.
  • Rank the factors if possible.   Identify the Primary causes versus the Secondary or associated factors.
  • Rule out each possible factor one at a time.   Justify why (you may need to come back to this later).
  • Find all potential business process and data owners to involve them in your understanding of the possible sources of the problem.
  • Share the findings with everyone involved in troubleshooting. They could rule out certain factors with their knowledge.
  • Test your hypotheses with actual data.     
  • Fix the problem and test again.
  • Publish/share your findings and fixes.   Communicating your findings may reveal additional factors you hadn’t considered.

After a thorough Root Cause Analysis has been completed, Data Stewards should proceed to Metadata Analysis and Data Quality Analysis.   These two techniques will be discussed in my next blogs.


CarolNewcomb_thumb Carol Newcomb is a Senior Consultant with Baseline Consulting. She specializes in developing BI and data governance programs to drive competitive advantage and fact-based decision making. Carol has consulted for a variety of health care organizations, including Rush Health Associates, Kaiser Permanente, OSF Healthcare, the Blue Cross Blue Shield Association and more. While working at the Joint Commission and Northwestern Memorial Hospital, she designed and conducted scientific research projects and contributed to statistical analyses.


Posted June 10, 2010 6:00 AM

By Caryn Maresic, Senior Consultant

Parents on Vacation via Flickr (Creative Commons)

Julia’s parents were planning a vacation. Her mother thought Pensacola would be a great destination; she’d heard so much about the wildlife, especially the dolphins! Her father wanted to see the National Naval Aviation Museum and the Blue Angels. Since Julia had traveled extensively, her parents asked her to make all the arrangements. While having dinner with them to discuss plans, she jotted down the following notes:

  • Location: Moderately priced hotel close to water/sights.
  • Budget: $3,000 for transportation and accommodations.
  • Activities: Beach and nature activities (Mom), science/historic sights (Dad).
  • Duration: 10 days.

Julia felt honored that her parents trusted her to get the job done. After doing some online research, she made all the reservations and met with her parents to review them. She eagerly awaited the look on her parents’ faces as they scanned the vacation itinerary and read through the glossy brochures.

“Hawaii?” they said in unison. “We didn’t want to go to Hawaii!”

“Honey, we chose Florida because we can drive there. I don’t want to fly anymore. Flying is such a pain,” Dad grumbled.

“I appreciate what you’ve done, Julia, but an old friend of mine lives near Pensacola and I was hoping to visit while we were there,” said Mom.

“But, Mom!” exclaimed Julia. “You said you wanted beaches, dolphins, sunny weather. Dad, you like science and history. What about Pearl Harbor? You two can’t go to the Gulf Coast. What about the oil spill?”

What happened here is typical of what happens on IT projects all the time. It’s easy to say that we wouldn’t do what Julia did. But wouldn’t we? Don’t we often:

  • Interview the business and record the requirements in an abstract way.
  • Believe that we can deliver something better than what the business asked for.
  • Assume that the business lacks the capability to understand the technology.
  • Fail to get all of the requirements.   Not exactly our fault, but still a problem.
  • Neglect to keep the business involved in the process.

There has been a lot of buzz about IT-business alignment of late, including this article on some specific companies that are going the extra mile, Beyond Alignment, and this one on lack of user involvement, Why IT Projects Fail: Lack of User Involvement. Most companies aren’t as progressive. The willingness to work together has to exist at all levels. Only when we let the business drive can we deliver, if not exactly what they asked for, then at least something useful.

photo by stevendepolo via Flickr (Creative Commons license)


Caryn has over 20 years of experience in providing high-quality data solutions to clients in the areas of Business Intelligence, Data Warehousing and System Integration. Caryn has expertise across industries, with an emphasis on Pharmaceutical, Manufacturing, and Insurance. Prior to joining Baseline, she ran her own consulting company.


Posted June 3, 2010 6:00 AM

By Rob Paller, Consultant

Buried_in_sand by eden pictures via Flickr (Creative Commons License)

Recently at a client, the data warehouse administrator was asked to define a sandbox environment in the production data warehouse for analysts and developers working on a small project. The idea behind this sandbox was to give the team a working area for collaboration and intermediate storage of results while working with the data in a purely ad hoc capacity. It was instantly recognized that this could be the start of something bigger within the organization, something that the incumbent business intelligence tools could not currently provide. The response had to be formulated quickly to avoid stifling the creativity of the analysts or, worse, the progress of the project. But care had to be taken as well: if managed incorrectly, the sandbox could get out of hand and become a waste of system resources and a drain on human resources that had already been spread thin. The business unit in question was looking to move beyond the confines of the current business intelligence environment and push the edges.

This was a group of analysts who wanted to get their hands dirty and weren’t afraid to fail. They wanted to mash data together in ways that the business intelligence tools, with their controlled ad hoc environments, had not previously allowed. This was data mining for the next set of KPIs that would shape the way the business moves forward.

The concept of agile analytics is not new; eBay presented on and blogged about this concept in 2008. The idea at this client was simple. By leveraging the existing enterprise data warehouse system to house the sandbox environments, the duplication of data would be all but eliminated. Groups interested in sharing data between their sandbox environments would be strongly discouraged from doing so until the data had been properly integrated into the production environment. The sandbox environments would also be given a short life expectancy at their inception, to prevent the prototypes from becoming production and the data from ending up in a wasteland. This all sounded great on paper.
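
For the curious, here is one way the "short life expectancy" rule could be tracked. It is a minimal sketch under my own assumptions, not what the client built: the `Sandbox` record, the sandbox names, and the 90-day default are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Sandbox:
    name: str
    owner: str
    created: date
    lifetime_days: int = 90  # assumed default; pick your own policy

    @property
    def expires(self):
        """Date on which the sandbox reaches the end of its life."""
        return self.created + timedelta(days=self.lifetime_days)

    def is_expired(self, today):
        return today >= self.expires


def due_for_review(sandboxes, today):
    """Names of sandboxes that have outlived their life expectancy."""
    return [s.name for s in sandboxes if s.is_expired(today)]


# Hypothetical sandboxes, checked on a hypothetical review date.
sandboxes = [
    Sandbox("kpi_prototype", "analytics", date(2010, 1, 4)),
    Sandbox("churn_mashup", "marketing", date(2010, 5, 3), lifetime_days=30),
]
print(due_for_review(sandboxes, today=date(2010, 5, 27)))
```

An expired sandbox is flagged for review rather than dropped automatically, so someone still decides whether the prototype should be promoted to production or retired.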

In the midst of a development architecture overview, a brief conversation among a few enterprise architects uncovered the potential Screw-Me Scenario that could bring the concept of agile analytics to an untimely demise. “The users of the data warehouse are not permitted to write ad hoc queries outside of a controlled business intelligence tool. They might write a bad query.” Thanks for the warning. We’ll be sure to refine our pitch to the enterprise architects to defuse this scenario before it turns ugly.

In Oliver Ratzesberger’s presentation for eBay’s Analytics as a Service, he acknowledges that the metrics we already know are cheap and the unknown metrics are expensive. But the known metrics are not pushing the edges. Known metrics are found in the middle of the box. Agile analytics is about pushing the edges of how your enterprise data warehouse is used, to improve the response to the needs of the business. It is about the evolution of the user community from one that plays in controlled ad hoc environments to one that experiments with new ideas and does not fear failing along the way. Agile analytics is about encouraging your users to reach out for the edges and P U S H. Only once the edges are stretched can the middle of the box be redefined.

photo by edenpictures via Flickr (Creative Commons License)


Rob Paller is an expert at business analytics and database administration. Since joining Baseline, Rob has been responsible for developing a case analysis system to streamline the oversight of food assistance benefits, implementing a common citizen data model, and assisting in the rollout of a new public assistance data model integrating data from over 10 years of legacy with a new benefit eligibility determination system.

Posted May 27, 2010 6:00 AM