The Florida K-20 Education Data Warehouse (EDW) was created by legislative appropriation in July 2000. A consulting firm was engaged and the EDW was completed in July 2003 at a cost of $5.73 million. Like many data warehouse projects, it encountered the common pains of scope creep, fuzzy objectives and unclear sponsorship. Shortly after going live, design and architectural problems began to emerge. Data storage volumes were greater than anticipated, expansion of certain subject areas was difficult, unexpected data volatility made answers hard to interpret and users were unable to access data marts or queries as the front end tools collapsed under the quantity of data retrieved. In December 2003, work began on version two of the EDW.
The first step in the EDW redesign was to address the architectural issues. Although there was an information technology (IT) architecture in place, it primarily addressed technical issues such as ETL (extract-transform-load) processes and database structure. A more robust architecture is required. In order to accomplish this, the architecture must meet certain requirements:
The only data warehouse architecture that meets these requirements is Bill Inmon’s Government Information Factory (GIF). Using GIF as a basis for design, the EDW Architecture is shown in Figure 1.
There are five key components to the EDW Architecture: the operational environment (yellow), the information environment (green), the volatile reporting environment (blue), the communications environment (light brown) and the quality assurance environment (red).
Operational Environment
The operational environment consists of information systems located throughout the state of Florida at local education agencies (LEA) such as school districts as well as the systems located at the Department of Education (DOE) in Tallahassee. The data at the LEAs is consolidated by the DOE. This data is extracted into a staging area from more than 25 systems at DOE and represents 67 school districts, 28 community colleges and 11 state universities, as well as central systems controlling such things as state-wide facility management and teacher certifications.
Information Environment
The information environment contains the core components of a data warehouse: normalized database, reporting facilities, data mining and exploration environments and ETL. Not all of these components are currently available, but inclusion of them in our architecture ensures that they are considered in long-range planning and budgeting. The GIF framework allows for this kind of flexibility and completeness.
Volatile Reporting Environment
The kinds of data being tracked in the EDW create numerous interdependencies that affect the load schedule. For example, we cannot add a student’s course schedule until we have loaded the master schedule of course offerings. However, certain high-stakes information requires analysis before it is loaded into the EDW. To accommodate these issues, we have included a volatile reporting environment. This component consists of an operational data store (ODS) in which the volatile information is held and a set of reporting marts, which draw data from the ODS and EDW to provide current and longitudinal/trend analyses.
Communication Environment
Education information both draws from and provides vital information to other agencies. We must also be accountable to the citizens of Florida. The communications environment provides those mechanisms for inter-agency, policy maker and public access to education information.
Quality Assurance Environment
The EDW Architecture is concerned with two aspects of quality assurance: the stored information is accurate; and the reported information is interpreted correctly. The first entails a trace-ability and inspection process. The second is controlled through careful design and access to reports and data marts.
We view the EDW Architecture as a master plan for the next five years. Even then, we don’t foresee more than incremental adjustments to keep it functional for many years to come. At the current time, roughly 40 percent of the EDW Architecture has been implemented. Work is actively proceeding on another 10 percent. However, having a roadmap of where we want to go helps keep us focused on how the pieces fit together and lets us evaluate new technologies as they become available. It also helps us define requirements for new components because we have a “big picture” of where everything has to fit and what components it must interact with. We have become firm believers in the power and utility of the GIF as a planning and management tool. Just as you would never build a house without a set of blueprints, you should never build a data warehouse without an information architecture.