

Introduction to the Government Information Factory (GIF)

Originally published July 19, 2004

The GIF is a framework for planning the architecture to support the information needs of a government agency. The GIF is shaped by many factors including the need to:

  • exchange information with other government agencies;
  • protect information held in confidence within the boundaries of the agency;
  • provide online transaction processing between individuals and the agency;
  • operate information processing within a finite budget;
  • provide long-term archiving for certain kinds of records;
  • provide public access through the Internet;
  • protect agency processing from intrusive access from the Internet;
  • provide both proactive and reactive security throughout the agency’s information systems;
  • manage a large volume of data;
  • monitor activity as it occurs;
  • integrate data into a cohesive whole as it is collected from disparate sources;
  • support the usage of sophisticated reporting and analytical tools;
  • service many different kinds of users across the agency’s domain;
  • provide accurate and timely information;
  • provide data at a low enough level of detail that it can be reshaped to support all information needs; and
  • provide a definitive source of information in cases where the accuracy of information is questioned.

Every agency has a large charter that needs to be satisfied by its information architecture, and the GIF is the framework for satisfying the mandates of that charter.

The following figure shows the GIF:

[Figure: the Government Information Factory]

The GIF is not designed to be built all at once. Even the richest, most sophisticated agency builds its GIF a component at a time. Indeed, in many agencies, the GIF will never be completed. Instead, parts of the GIF that are constructed in one agency may simply not be needed in another.

Some of the differences between agencies that lead to different interpretations of the GIF include:

  • the need for security;
  • the need for online processing;
  • the need for volumes of data;
  • the need for archival processing, etc.

Despite these differences in implementation, the GIF is designed to serve all the information needs of all the agencies. It is useful even where there is no immediate information processing need: by implementing the GIF, a future need can be positioned correctly within the overall technological architecture, and the GIF becomes the road map for the agency’s future informational needs.

Different components of the GIF can be considered “major” or cornerstone elements. The figure below outlines these cornerstone components:

[Figure: cornerstone components of the GIF]

These cornerstone components are:

  • The operational applications are where the day-to-day detailed processing of the agency occurs. The operational applications are where individual records are created, deleted, and updated. Most interactions between the agency and the public it serves occur in the operational applications, and nearly all online processing happens here. In some environments the operational applications are also called “legacy applications” because of their age.
  • The data warehouse contains all of the detailed historical data that the agency has. The data warehouse contains integrated data that is stored and organized at a low level of detail. This detailed data can be shaped and reshaped in many ways. The key to responsiveness in the data warehouse is that the data is ready for new requirements in addition to serving existing requirements. The data warehouse represents the agency-wide view of the data. One of its essential components is the ETL (extract/transform/load) component: the software and procedures that integrate and convert data from its application format into a truly agency-wide format. The data warehouse contains snapshots of data and does not support update processing; if a change needs to be made to data inside the data warehouse, a new snapshot record is written. Every record of data found in the data warehouse is time-stamped, and there is one moment in time at which every record in the data warehouse is accurate. A record in the data warehouse may have data from any number of source application systems, and one record is related to another by means of primary and foreign keys (a sketch of this snapshot approach appears after this list). The optimal design technique for the data warehouse is a normalized data model, not a star schema or a multi-dimensional model. Because historical data is stored at a detailed level in the data warehouse, data grows to very large volumes; for medium- and large-sized warehouses, it is optimal to store data on more than one storage medium.
  • The data mart is where different departments or agencies have their own rendition of the data found in the data warehouse. To create the data mart, the granular data found in the data warehouse is accessed and restructured into the form mandated by the department or agency. Usually data is summarized and aggregated as it passes from the data warehouse to the data mart (a sketch of this pattern follows below). The agency or department can access the data in the data mart using specialized business intelligence tools.
  • The global ODS is where online transaction processing occurs against integrated, current data. In many regards the global ODS is the place where hybrid processing occurs: the processing in the global ODS is a combination of operational and informational processing.
  • The project mart, sometimes referred to as the “prototype,” is where there is high flexibility and rapid response to data needs. In the project mart, data can be restructured and analyzed in ways that cannot be done anywhere else. When it comes to looking at data in an unstructured manner, nothing beats the project mart. The project mart is a temporary structure and often mutates into some other form of processing, such as a data mart, an exploration warehouse, or a data warehouse.
  • DSS applications are those applications that depend on the data warehouse for the bulk of their processing data. DSS applications operate at the micro level or the macro level.
  • The exploration warehouse and data mining facility is where heavy statistical processing occurs, looking for patterns of activity that have never before been discovered. The exploration warehouse and the data mining facility contain data of the most granular type and a great deal of historical data. The requirements for processing are unknown upon entering the exploration process. The exploration warehouse and the data mining facility contain data structured as flat files, which in turn are optimal for statistical analysis.
  • Nearline storage (sometimes called alternate storage) is storage that is used to hold bulk amounts of data that do not have a high probability of access. Nearline storage is not disk storage but a storage medium that is cheaper and slower than disk storage. There is a performance penalty to pay when using nearline storage but, given that data placed on nearline storage does not have a high probability of access, the penalty is mitigated.
  • The integrated archival storage facility is where older data with a low probability of access is placed. The integrated archival storage facility holds bulk stores of data for a lengthy period of time on a storage medium that is inexpensive and reliable, holding data safely for the long term.
  • The web environment is where the agency meets the public electronically. The web environment allows the public to see information designed for public consumption and allows a certain amount of messaging between the agency’s systems and the general public.
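
To make the snapshot discipline described above concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table, columns, and sample values are hypothetical illustrations (the GIF does not prescribe a schema); the point is simply that every row carries a time stamp and that a change produces a new snapshot row rather than an update in place.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE citizen_address_snapshot (
        citizen_id    INTEGER NOT NULL,          -- foreign key to a (hypothetical) citizen table
        snapshot_ts   TEXT    NOT NULL,          -- the moment at which this record was accurate
        street        TEXT,
        city          TEXT,
        source_system TEXT,                      -- which application system supplied the data
        PRIMARY KEY (citizen_id, snapshot_ts)    -- identity plus time stamp; rows are never overwritten
    )
""")

def record_address(citizen_id, street, city, source_system):
    """A change of address is captured as a new snapshot row; nothing is updated in place."""
    conn.execute(
        "INSERT INTO citizen_address_snapshot VALUES (?, ?, ?, ?, ?)",
        (citizen_id, datetime.now(timezone.utc).isoformat(), street, city, source_system),
    )

record_address(42, "1 Elm St",  "Springfield", "permits_app")
record_address(42, "9 Oak Ave", "Springfield", "tax_app")   # a move: a second snapshot; the first row is kept

for row in conn.execute("SELECT * FROM citizen_address_snapshot ORDER BY snapshot_ts"):
    print(row)
```

Because history accumulates as rows rather than overwrites, any past state of the data can be reconstructed by selecting the snapshot that was current at the moment of interest.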

There are other components of the GIF that are important in their own way. However, the components that are listed here are those that hold most of the data and do most of the processing.
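
To illustrate the summarization step described for the data mart above, the following Python sketch (with invented permit data and an invented monthly grouping) shows granular warehouse rows being aggregated into the summary shape a single department has asked for:

```python
from collections import defaultdict

# Granular, detailed warehouse rows (hypothetical permit applications).
warehouse_detail = [
    {"dept": "housing", "month": "2004-06", "fee": 125.0},
    {"dept": "housing", "month": "2004-06", "fee": 90.0},
    {"dept": "housing", "month": "2004-07", "fee": 125.0},
    {"dept": "parks",   "month": "2004-06", "fee": 40.0},
]

# The data mart keeps only the summary shape mandated by the department:
# application counts and total fees per month.
mart = defaultdict(lambda: {"applications": 0, "total_fees": 0.0})
for row in warehouse_detail:
    if row["dept"] == "housing":          # this mart serves one department only
        cell = mart[row["month"]]
        cell["applications"] += 1
        cell["total_fees"] += row["fee"]

for month, cell in sorted(mart.items()):
    print(month, cell)
```

The detail stays in the warehouse; if the department later mandates a different shape, the mart can be rebuilt from the same granular rows without touching the source applications.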

DEVELOPING THE COMPONENTS OF THE GOVERNMENT INFORMATION FACTORY

The different components of the GIF have very different characteristics. These characteristics differ widely in terms of:

  • volumes of data;
  • transaction response time;
  • security needs;
  • system usage, etc.

Because there are vast differences in these characteristics, it is not surprising that different development methodologies are used to build them. The parts of the GIF best suited to the classical SDLC (waterfall) methodology are the:

  • operational application environment;
  • global ODS environment;
  • nearline storage environment;
  • integrated archival storage environment; and the
  • web environment.

The parts of the GIF that are subject to the SDLC waterfall methodology are those parts that are repetitive and predictable in nature. The detailed activities of the operational applications, the web activities, and the activities passing through the global ODS are all full of repetitive, predictable processes and therefore lend themselves to the SDLC approach to systems development.

The spiral development approach is also applicable to the development of components of the GIF and is most useful in the development of:

  • the data warehouse;
  • parts of the global ODS;
  • project marts;
  • DSS applications;
  • parts of nearline storage; and
  • parts of integrated archival storage.

The spiral development approach works best where only partial requirements are known and where development results are desired very quickly. Occasionally, some of the operational applications can be built under the spiral development approach as well.

Still other parts of the GIF are best built under the heuristic development approach. In particular, the components which lend themselves to this approach include:

  • the exploration/data mining facility;
  • parts of nearline storage; and
  • parts of the integrated archival store.

There are two entry points in the ideal order in which an organization builds the components of the GIF. The classical entry point is the operational applications. Once the operational applications are built, the data warehouse is built. After the data warehouse is built, one can build the global ODS, the data marts, the project marts, the DSS applications, and the exploration warehouse/data mining facility. Once the data warehouse is built and large amounts of data are stored, the nearline (alternate) storage facility and the integrated archival storage facility can be constructed.

The other entry point to the building of the GIF is to start with the web environment. In this case, the web environment is built and then the web data flows into the data warehouse.

In many ways the building of the Government Information Factory is like the building of a city. A city is built from a plan over many years. There are many complex and expensive components to building a city, each of which rises and falls on its own merits. There are financial, residential, municipal, and recreational districts. There are plans for houses, buildings, parks, etc., all within the city plan. The city plan may take years to fulfill or, in some cases, may never be fulfilled entirely. But the city plan proves its worth from the very first day of its adoption.

A city plan varies from a house plan in many respects. A city plan usually requires components which may never exist. A house plan requires all the components to exist. You don’t build a house without electricity and a bathroom, for example. A city plan may take years to complete. A house plan generally has a finite time for completion. A city is paid for by all of its citizens and legal entities who reside in it. A house is paid for by its occupants.

The GIF, then, is like a city plan, whereas the specifications for an application such as the data warehouse are more comparable to the plans for a house or an office building.

When there is disagreement between two or more informational processes, such as a cube found in a data mart and a report coming from a DSS application, the data warehouse provides a foundation for determining why there is a difference. But the data in the data warehouse serves another very important function. Once the data warehouse has been built and is in place, its data can be accessed and used to build a new data mart or DSS application quickly. The data warehouse sets the stage for a rapid development process in that once the infrastructure is built, it is immediately available for other purposes.

For these very important reasons – reconcilability of information and speed of development – the data warehouse becomes the hub of the GIF.
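
As a toy illustration of that reconciliation role (all figures invented), two informational processes that disagree can both be recomputed from the same warehouse detail, so the discrepancy can be traced to its derivation rather than argued about:

```python
# Hypothetical warehouse detail: one time-stamped row per case handled.
detail = [
    {"case_id": 1, "closed": True,  "region": "north"},
    {"case_id": 2, "closed": False, "region": "north"},
    {"case_id": 3, "closed": True,  "region": "south"},
]

# A data mart cube counted all cases; a DSS report counted only closed ones.
mart_count = len(detail)
dss_count = sum(1 for r in detail if r["closed"])

# The two figures disagree (3 vs 2); the granular detail shows exactly why.
discrepancy = [r["case_id"] for r in detail if not r["closed"]]
print(mart_count, dss_count, discrepancy)   # 3 2 [2]
```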

Bill Inmon

Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.
