

Agile Data Governance to Harness Enterprise Big Data

Originally published June 19, 2012

Data management has been a long-time affection, addiction and affliction of ours because it is an essential factor in any decision support system. It was decision support that led us to data warehousing in the early 1990s, and it soon became very clear that metadata – data about the data – was the critical layer that held the architecture together. The reason, of course, is that it is in the metadata – and the process models that include time and stakeholders and provide the greater business context – where one finds the rules and relationships that define how everything is supposed to work. How can we tell the size and type of a data element? How do we know which specific version of a table is the most recent? How can we decide whether the value of a field is valid or not? Where do we find business definitions of data elements to clarify their actual meaning? How can we disambiguate between synonyms when we have several acceptable options? These are just some examples of the role of metadata and business context in a data warehousing environment. And every single instance points back to data governance.
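To make the role of metadata concrete, here is a minimal sketch (in Python, with purely illustrative names and fields, not any particular product's schema) of the kind of record a metadata repository could keep to answer the questions above:

```python
from dataclasses import dataclass, field

@dataclass
class DataElementMetadata:
    """Illustrative metadata record for one data element."""
    name: str                 # physical column name
    data_type: str            # e.g., "CHAR(1)" or "DECIMAL(10,2)"
    business_definition: str  # what the value actually means
    source_table: str         # where the element lives
    version: int              # which version of the table is current
    valid_values: list = field(default_factory=list)  # empty = no enumeration
    synonyms: list = field(default_factory=list)      # acceptable alternate names

    def is_valid(self, value) -> bool:
        """Check a value against the enumerated domain, if one exists."""
        return not self.valid_values or value in self.valid_values

# The questions above, answered from metadata rather than tribal knowledge:
order_status = DataElementMetadata(
    name="ORDER_STATUS",
    data_type="CHAR(1)",
    business_definition="Lifecycle state of a sales order",
    source_table="SALES.ORDERS",
    version=3,
    valid_values=["N", "P", "S", "C"],  # new, picked, shipped, cancelled
    synonyms=["STATUS_CD", "ORD_STAT"],
)
assert order_status.is_valid("P")      # a valid value passes
assert not order_status.is_valid("X")  # an invalid value is caught
```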

Governance is the act of governing; it refers to control or authority. Good dictionary definitions include: to make and administer the affairs of; to control the speed or magnitude of; to regulate; to control the actions or behavior of; to keep under control, etc. In all cases it refers to setting rules, which in data management leads you back to the metadata.

But data management “ain’t what it used to be.” Once upon a time, heads of data processing organizations needed to be concerned with manuals, data dictionaries, punched card files, librarian programs, tape storage and the like. With the explosion of structured data and the need to address the data problems predicted for Y2K, we moved massively into data conversions and data migrations, and rushed to Enterprise Resource Planning (ERP) systems. The result was a transformation in enterprise data environments, and thus in the nature of data governance.

But the new data governance approaches and supporting toolkits never truly managed to establish themselves in earnest, because they led to the usual confrontation between the theoreticians, wanting to “do it right,” and the practitioners, needing to “just do it.” Furthermore, to truly “do it right” there would be unavoidable skirmishes over some of IT’s “sacred cows,” such as relational databases and the world of SQL.

But now we have been hit with the explosion of very large structured data sets, such as from real-time retail transactions, and unstructured data tied to social media, imagery, surveillance, bioengineering and all the other sources that have come together under the rubric of “big data.” Now, we need to both “do it right” and “just do it.” It is clear that we need agile data governance if we are going to survive and prosper in the era of big data for the enterprise.

Agile Data Governance: What Is It and How Do We Do It?

Agile data governance (ADG) is a streamlined approach that emphasizes the value produced over adherence to formal procedure, albeit using procedures of its own. This is the foundation of all agile approaches, whether for software development or project management, and data management is no exception. ADG also focuses on ensuring there is a direct line of sight between corporate policies and any business activities involving downstream engineering implementation, which is critical for speed and low cost. With ADG, business changes can be implemented quickly, without an army of experts performing extensive modifications for even simple cases, as is the norm today.

The obvious question is “How is this possible?” The most important step is to learn from the trouble points of past efforts and stay focused on the business value of data management. This means that governance, and its products of standards and common models, must be flexible, adaptive and engineering ready. It also means we are ready to acknowledge that the trusted techniques that worked well in departmental systems are simply not capable of being agile at enterprise scale.

Lessons learned from many projects distill into a few major hurdles that span organizational, process and technology issues: group rivalry, terminology confusion, poor knowledge sharing and inflexible designs. Most of us have lived through these problems and have seen the negative effects they produce, as well as the frustration and inefficiency they create in teams. The good news is there are ways to overcome them.

Group rivalry is typically the most immediate and difficult challenge. It can show up as tepid cooperation in which people emphasize bureaucratic procedures over substantive results (a kind of “civil disobedience”), or as overt disagreement with the authority for, or even the need to build, enterprise data. The solution, and the way to achieve almost immediate agile success, is to avoid the underlying friction points that are rooted in the fear of losing funding, prestige and jobs. First, openly embrace all important business variations and recognize that there is no valid reason, business or technical, to force everyone to use one view exclusively. In fact, the notion that there is one “single version of the truth” is a myth promulgated by people stuck in a simpler era of departmental system scope. Everything depends on its business context. Agile governance concentrates on identifying and defining key business entities in just enough detail to manage them efficiently, flexibly and adaptively. We certainly want to consolidate variations into a small set of definable standards, but only those that make business sense. This alone (in our experience) can turn a room full of uncooperative people into a unified problem-solving assembly.

Terminology confusion refers to variable meanings of the same term in different contexts, and it arises in almost all medium-to-large environments. Some of the most common, and counterintuitive, problem terms are “customer,” “person,” “job,” “fund,” and even an ERP “part number.” Forget about trying to simply line up these terms and build a single version with lineage. The problem is that the data values morph over time and become intricately tied to business activities or application functions (which also tend to be undocumented). This is data semantics: what the values mean in business context. Why can’t a good architecture solve this? It could, if the requisite corporate knowledge could be captured, decomposed, normalized and distributed. But this rarely happens, because the critical information is buried in a complicated web spread across the organization. Hence, top-down projects tend to be lengthy, expensive and unfulfilling.

What is missing is a simple way to identify how and why the terms have different meanings, and how to implement these variations in data models, extract-transform-load (ETL) and business intelligence. More good news: this can be solved rather easily by doing the same thing, at the same time, that we did to overcome group rivalry (this is agile!). We accept the variations for most, if not all, important terms, embrace them, and use a guided framework of well-known concepts (we use organization, process and technology in our Ψ-KORS™ approach) to rapidly identify, define and implement them as related but unique entities. We have used this approach to produce consensus-driven normalized terms with multiple variations in a few hours; compare this to the same people struggling for years. Importantly, these results are engineering ready and have line of sight between governance, data models, integration and data stores.
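As a rough sketch of what “related but unique entities” can look like once captured (all names here are illustrative assumptions, not the Ψ-KORS™ tooling itself), each variant of a shared term carries its own context, definition and an engineering-ready mapping to the physical model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TermVariant:
    """One context-specific meaning of a shared business term."""
    term: str        # the shared name, e.g., "customer"
    context: str     # the organization, process or technology context
    definition: str  # what the term means in that context
    maps_to: str     # engineering-ready target: a model entity or column

# "Customer" is not one thing; it is several related but unique entities.
customer_variants = [
    TermVariant("customer", "sales process",
                "A party with at least one open or closed order",
                "dim_customer.sales_customer_id"),
    TermVariant("customer", "support organization",
                "A party entitled to service under an active contract",
                "dim_customer.support_customer_id"),
    TermVariant("customer", "billing system",
                "A party that receives invoices, possibly for others",
                "dim_customer.billing_party_id"),
]

# Line of sight: from each governed definition straight to the physical model.
for v in customer_variants:
    print(f"{v.term} [{v.context}] -> {v.maps_to}")
```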

Poor knowledge sharing should not be a surprise, since it is one of the most common complaints heard at professional meetings. If a business is to achieve success, the people creating data models and ETL rules should know whether there are situations in which the stored data will differ among applications for valid business reasons, such as numeric data mixed with free-text comments, or ERP part numbers that are linked to sales processes in one instance and external regulations in another. The problem is exacerbated when people work independently, without visibility, and share only documents. We need an integrated metadata capability that does not stop people from working in independent tools, but in which the important artifacts (e.g., business models, data models, glossaries, code lists and integration rules) are visible, coordinated and able to be referenced. This matters because it lets everyone understand current business context as annotated to data model tables and columns, glossary terms, and rules for ETL transforms. This is a living environment: it must be simple and intuitive to use, available to people across the organization, and not based simply on static documents. In our ADG approach we use a new commercial Web-based collaboration system that was implemented specifically for the agile methods we are discussing.
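A minimal sketch of such an integrated capability might look like the registry below. Every identifier and artifact here is a hypothetical illustration, not the commercial system we use; the point is that artifacts remain in their owners’ tools while being registered and cross-referenced so anyone can pull the business context for a given table, term or rule:

```python
from collections import defaultdict

class ArtifactRegistry:
    """Toy registry: artifacts stay in their owners' tools but are
    registered here so they are visible and cross-referenced."""

    def __init__(self):
        self._artifacts = {}            # artifact id -> (kind, description)
        self._links = defaultdict(set)  # artifact id -> related artifact ids

    def register(self, artifact_id, kind, description):
        self._artifacts[artifact_id] = (kind, description)

    def link(self, a, b):
        # Record that two artifacts reference each other.
        self._links[a].add(b)
        self._links[b].add(a)

    def context_for(self, artifact_id):
        # Everything annotated to this artifact, for shared understanding.
        return [(rid, *self._artifacts[rid]) for rid in self._links[artifact_id]]

registry = ArtifactRegistry()
registry.register("glossary:part_number", "glossary term",
                  "ERP part number; meaning varies by sales vs. regulatory use")
registry.register("model:parts.part_no", "data model column",
                  "Physical column holding the part number")
registry.register("etl:load_parts", "integration rule",
                  "Transform that reconciles part numbers on load")
registry.link("glossary:part_number", "model:parts.part_no")
registry.link("model:parts.part_no", "etl:load_parts")

# Anyone can see the business context attached to a column:
for rid, kind, desc in registry.context_for("model:parts.part_no"):
    print(f"{kind}: {rid} ({desc})")
```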

Finally, inflexible designs are the same rigid structures that might have worked at a departmental scale but that create complicated, expensive bottlenecks at the enterprise level. In decades past, computing power and data storage were expensive and not always available, so it made sense to “get it right” and build highly structured data models, ETL rules and physical database structures. But these rigid approaches are based on two assumptions that are no longer valid:

  1. Requirements are stable over the project timeline.

  2. Data semantics can be clearly specified in data structures.

We have already described why these are invalid and how to overcome their limitations for governance.

Furthermore, we need to ensure that upstream agility does not turn into downstream sluggishness. We can do so as part of agile governance by avoiding rigidity and taking advantage of cheap computing and disk storage. Several new approaches are being promoted, with one of the fastest-growing techniques (NoSQL) based on the flat storage of key-value pairs. But the NoSQL approach loses value on its own because it lacks a direct line of sight to governance decisions about important entities like “customer.” We use a hybrid approach (Corporate NoSQL™) that blends traditional table structures, which give a direct connection to governance and fit existing modeling skills, with NoSQL storage, for its immense flexibility and inherently faster execution speeds.
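As a rough illustration of the hybrid idea (a sketch under our own assumptions, not the Corporate NoSQL™ implementation itself): the governed entity keeps a conventional typed table with direct line of sight to governance, while volatile or variant attributes live in a flat key-value side table that needs no schema change:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Governed core: stable, typed columns with direct line of sight to the
# "customer" entity agreed on in governance.
cur.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        legal_name  TEXT NOT NULL
    )""")

# Flexible extension: flat key-value pairs, so new attributes arrive
# without a schema change or a round of re-modeling.
cur.execute("""
    CREATE TABLE customer_kv (
        customer_id INTEGER REFERENCES customer(customer_id),
        attr_name   TEXT NOT NULL,
        attr_value  TEXT,
        PRIMARY KEY (customer_id, attr_name)
    )""")

cur.execute("INSERT INTO customer VALUES (1, 'Acme Corp')")
cur.executemany("INSERT INTO customer_kv VALUES (?, ?, ?)", [
    (1, "loyalty_tier", "gold"),         # added without altering the model
    (1, "twitter_handle", "@acmecorp"),
])

# Query the governed core and the flexible attributes together.
for row in cur.execute("""
        SELECT c.legal_name, kv.attr_name, kv.attr_value
        FROM customer c JOIN customer_kv kv USING (customer_id)"""):
    print(row)
conn.close()
```

The typed core preserves the governance connection and existing modeling skills; the key-value side absorbs change cheaply.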

Summary

We live and work in a world that is very different from the one that spawned the most ingrained approaches and methodologies still in use for governing data at the enterprise level. Data has exploded to the point that 2.5 quintillion (2.5 x 10^18) bytes of data are created every day. We have compressed the time dimension so that we no longer have the luxury of weeks or months to react and think through our responses to acts of war or marketing moves from the competition. We are now forced to operate and respond in near real time in business environments that are replete with complexity, uncertainty and change. We need to move and “just do it,” but within an intelligent and adaptable framework that allows us to “do it right” in terms of approach, scope and direction. That is what agile data governance is all about. It is real, it works and it’s here to stay.

  • Dr. Ramon Barquin

    Dr. Barquin has been the President of Barquin International, a consulting firm, since 1994. He specializes in developing information systems strategies – particularly data warehousing, customer relationship management, business intelligence and knowledge management – for public and private sector enterprises. He has consulted for the U.S. military, many government agencies, and international governments and corporations.

    He had a long career at IBM, spanning more than 20 years of technical assignments and corporate management, including overseas postings and responsibilities. Afterwards he served as president of the Washington Consulting Group, where he had direct oversight of major U.S. Federal Government contracts.

    Dr. Barquin was elected a National Academy of Public Administration (NAPA) Fellow in 2012. He serves on the Cybersecurity Subcommittee of the Department of Homeland Security’s Data Privacy and Integrity Advisory Committee; is a Board Member of the Center for Internet Security and a member of the Steering Committee for the American Council for Technology-Industry Advisory Council’s (ACT-IAC) Quadrennial Government Technology Review Committee. He was also the co-founder and first president of The Data Warehousing Institute, and president of the Computer Ethics Institute. His PhD is from MIT. 

    Dr. Barquin can be reached at rbarquin@barquin.com.

    Editor's note: More articles from Dr. Barquin are available in the BeyeNETWORK's Government Channel.

     

  • Dr. Geoffrey Malafsky
    Dr. Malafsky is founder and CEO of Phasic Systems Inc., a consulting and software products company. He works on agile data methods, especially agile governance for term standardization and common models, and agile warehousing and integration. This builds on extensive solutions for government and corporate clients facing the toughest challenges in building common data with semantic consistency across the data lifecycle, delivered at his previous company, TECHi2, where he was founder and President. Before that, he was Director of Advanced Technologies for SAIC, a major government contractor, and a research scientist at the Naval Research Laboratory. He has more than thirty years of experience and is an expert in multiple fields, including information and data engineering, knowledge management, knowledge discovery and dissemination, and nanotechnology. He has a PhD from The Pennsylvania State University. He can be reached at gmalafsky@phasicsystemsinc.com.


Comments


Posted June 19, 2012 by Naveen Sidda

Very interesting article! Agile methodology simulates the real world, and you guys have put it in the right perspective for the data world. Thanks
