This article focuses on the basic education an aspiring business intelligence (BI) practitioner needs to get on track to becoming a competent professional, particularly in a government workplace. To do this, let us briefly review the need for the discipline, the tools of the trade and the needs of the marketplace. From there it should be easier to construct an appropriate business intelligence curriculum.
Business intelligence attempts to extract meaning from mountains of bits and bytes. Developments in transmission, collection and storage of data and the ability to analyze it allow us to leverage that data to achieve our business or policy objectives. Knowledge workers in today’s marketplace must have access to the right information at the right time to make the right decisions, and all the while avoid “drowning” in information overload. Business intelligence is the principal discipline that helps us get there.
As we have often stated in previous articles, the need for business intelligence in the public sector, especially the U.S. federal government, is significant, urgent and has the potential to dramatically impact service-delivery to the citizen. Whether for national security, defense, diplomacy, and law enforcement or in transportation, health care, agriculture, housing or treasury issues, agencies need business intelligence to improve efficiency and service, as well as to meet stringent reporting, compliance and budget requirements. Business intelligence can help government achieve its policy priorities – and the stakes are very high.
All too often, however, information lies in “silos.” It resides in agency databases, on individual hard drives, or somewhere in emails, documents or the Web. Workers, some of whom may soon retire, possess vast amounts of explicit and tacit knowledge in their minds. Data and text, especially in the age of social media, are often accessible but often difficult to analyze, compare and report. These are the challenges that business intelligence seeks to help us overcome.
Business intelligence is a broad category of systems, processes, management principles and technologies to turn bits and bytes of data into knowledge. Some of the underlying principles of BI systems include:
- Integrated information. In order to make quick and sound decisions, organizations need to integrate data and text from various sources. Developments in data warehousing, online analytical processing (OLAP), and data mining allow us to combine information from a number of different sources to deliver a holistic view of our business objectives.
- Web-enabled. Enterprise information portals help ensure that information is collected and distributed across the Internet, giving users real-time access to the information they need, whenever they need it.
- Architected environment. Methodologies like service-oriented architecture (SOA) and enterprise architecture help organizations ensure that IT and business intelligence investments map to their unique objectives and requirements.
- Analysis, reporting, querying and prediction. Business intelligence applications continue to evolve and improve, providing the ability to not only report historical data, but to use real-time data to predict what is to come. Data can increasingly be reported and analyzed in the format that decision makers need to improve their business performance.
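To make the first and last of these principles concrete, here is a minimal sketch of integrating records from two separate sources and then rolling them up along chosen dimensions, in the spirit of an OLAP-style summary. The source names, field names and figures are invented for illustration, and the helper functions `integrate` and `roll_up` are hypothetical; a real BI platform would do this inside a data warehouse rather than in memory.

```python
from collections import defaultdict

# Hypothetical records from two separate "silos"; all names and figures are invented.
grants_db = [
    {"agency": "Housing", "year": 2012, "amount": 120.0},
    {"agency": "Housing", "year": 2013, "amount": 150.0},
    {"agency": "Transportation", "year": 2013, "amount": 300.0},
]
contracts_csv = [
    {"agency": "Housing", "year": 2013, "amount": 50.0},
    {"agency": "Transportation", "year": 2012, "amount": 200.0},
]


def integrate(*sources):
    """Combine records from several sources into one list, tagging each with its origin."""
    combined = []
    for name, rows in sources:
        for row in rows:
            combined.append({**row, "source": name})
    return combined


def roll_up(rows, *dimensions):
    """Sum 'amount' grouped by the chosen dimensions (an OLAP-style roll-up)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)


facts = integrate(("grants", grants_db), ("contracts", contracts_csv))
by_agency = roll_up(facts, "agency")               # coarse view: one total per agency
by_agency_year = roll_up(facts, "agency", "year")  # finer view: agency by year
```

Passing fewer dimensions to `roll_up` gives the coarser summary and passing more gives the drill-down view, which is the basic idea behind reporting the same integrated data at whatever level a decision maker needs.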
With this as prelude, let’s explore what a business intelligence education program should look like for the public sector. Most of what this article addresses is based on our personal experience running such a program in our firm: the Barquin International Business Intelligence Certification Program.
The intent of the program is to equip students with a background and foundation in business intelligence concepts, principles, technologies, methodologies and developments. The courses aim to help them understand the ways in which the discipline can help organizations achieve their business and policy objectives, and to expose the participants to some of the innovative tools and techniques they ought to explore when implementing business intelligence solutions on their own.
The Business Intelligence Certification program offers all-around training in business intelligence. It is divided into 6 modules of 6 hours each, for a total of 36 classroom hours. This is complemented by approximately 4 hours of reading and exercises per course, for a total of about 60 hours of education and training. The curriculum is scheduled to be taught over a 9-month period, with make-up sessions offered as needed.
The following are the major modules covered in the program, essentially in this order:
- Knowledge Management
- Data Warehousing
- Data Mining
- Knowledge Exploration
- Data Quality
- Data Visualization
- Big Data
Let us provide a brief overview of what is covered in each module.
- The Knowledge Management (KM) module is taught first in order to present the big picture right up front. The reality is that we have a government that is losing institutional memory due to a rapidly changing workforce; is replete with functional stovepipes including non-interoperable data domains that prevent enterprise-level integration; has significant attachment to legacy due to resistance to change and agency cultures; has unwillingness to share information because of the “not-invented-here” syndrome; and has huge difficulty in identifying and disseminating best practices government-wide.
As outlined through the syllabus of the KM module, we heighten the student’s awareness that BI is the result of analysis done on your data, that analysis should precede action and that decision support is the primary use for business intelligence. Because action should be based on knowledge rather than on data, the process of knowledge creation, extraction and sharing is central to the BI practitioner. This clinches the KM loop once they understand that knowledge management is not just about technology, but must take into account people and processes too.
- Data warehouses are repositories of content and platforms for analysis. Data Warehousing, therefore, is an essential tool for the production of business intelligence. In order to do data warehousing, BI practitioners must have a solid understanding of what it is and what it is not; hence, concepts and definitions are important. In addition, data warehousing architectures define entities such as third normal form (3NF) data warehouses, data marts and operational data stores (ODSs), along with schemas such as star and snowflake that are often critical for performance. The layered structure of data warehousing beyond the data warehouse database must be covered. That means extraction, transformation and loading (ETL) as well as cleansing; the end user delivery layer; and metadata. Beyond this, the BI practitioner must know that data warehousing is generally complex, expensive and carries some historical baggage for many IT shops. Thus the challenges to data warehousing deserve discussion and review.
- Data Mining is defined by Herb Edelstein, an internationally recognized expert in data mining and co-founder of The Data Warehousing Institute, as “an information extraction activity whose goal it is to discover hidden facts contained in data bases.” Implicit in the definition, of course, is that if to do mining you need a mine, to do data mining you need a “data-mine,” that is, a database or, better still, a data warehouse. In looking at the applications of data mining, in addition to classical areas such as customer segmentation, propensity to buy, customer profitability and the like, we also address those that are of special interest to the public sector such as fraud detection and personnel attrition. The course reviews the tools of the trade, starting with some foundational statistics, and touches on probability theory, correlation and multivariate analysis, as well as the basics of decision trees, neural nets, etc. It differentiates between data mining as a discipline and the broader process of knowledge discovery, and then reviews some of the principal products and vendors currently in the marketplace.
- Given the deluge of unstructured data that we experience these days, Knowledge Exploration becomes a very important module. It is in this course that we focus on the need to obtain business intelligence not only from the fixed alphanumeric fields we store in data repositories, but also from large amounts of unstructured content. As a result we explain the eight-step framework for knowledge exploration and introduce the concepts of the “knowledge space” and of the “knowledge exploration workbench.” Search becomes a frequent starting point for knowledge exploration, and this topic is given due treatment early on. Likewise, we review the toolkit that allows us to organize, explore, tag, summarize and manipulate content in the process of preparing a knowledge base for analysis. We touch on some of the architectures and techniques – more thoroughly covered in the Data Warehousing and Data Mining modules – to introduce the analytics component of the course. A review of visualization techniques is carried out, and we conclude with the need for rigor in interpretation of results.
- We all learned about GIGO (garbage in, garbage out) early on. Data Quality is essentially about addressing GIGO and reducing its impact on operations in general and on the value of business intelligence in particular. The first sections deal with awareness, definitions and examples. A methodology for implementing effective data quality initiatives is presented, also introducing the concept of data integrity as a meta-concept in the data quality space. A review of the principal vendors and tools is carried out, and some demos are presented.
- The Data Visualization module deals with the display and visualization of information with an emphasis on social media. It is premised on the fact that we must look increasingly for ways to extract business intelligence from Facebook, Twitter and YouTube as well as from their successors to come in Web 2.0 and beyond. And then we need to provide better visualization for the results of our analysis. As we struggle not just with the huge volumes of data, but also with the different sources, formats and structures as well as with the compression of time in the virtual world, we need to devise approaches that provide us with an instant understanding of results. The graphs, charts or constructs must be intuitive and user-friendly, and must give us the ability to learn from the data without needing to spend much time comprehending the tools.
- The Big Data module tackles the buzzword of the day and attempts to define the term and explain its importance. After navigating the differences between perception and reality and exploring the imagery, acoustics and other big data segments, we focus on text, mainly coming from social media, as the principal category for analysis. Here we emphasize semantic analysis, text analytics, sentiment analysis and real-time stream processing. We end the session with a look at Hadoop, MapReduce, NoSQL and some of the other tools in the big data toolkit.
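For readers new to the MapReduce idea mentioned in the Big Data module, the following is a minimal single-process sketch of the pattern applied to short text posts. It only illustrates the map, shuffle and reduce steps in plain Python. It is not Hadoop code, the function names are hypothetical, and the sample posts are invented.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a collection of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Invented sample posts standing in for social media text.
posts = ["great service today", "slow service", "Great staff"]
counts = word_count(posts)
```

In a real cluster the map and reduce steps would run in parallel on many machines over partitions of the data; here they run in sequence only to show how the pieces fit together.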
These seven modules are accompanied by a suggested reading list, which is expanded and culled dynamically as new relevant literature emerges in each area. And each session has substantial class interaction often accompanied by case studies, demos or problem sets.
In all, there are surely more comprehensive approaches that could deliver more in-depth education in the BI discipline, but for purposes of addressing the basic needs of a new practitioner, our curriculum seems quite adequate. Its goal is to give a solid platform of understanding from which to begin operating on one’s own. The program provides references in the form of literature and case studies, and offers methodologies and processes that can serve students well as the building blocks for their practice.
SOURCE: Business Intelligence Education and the Public Sector
Recent articles by Dr. Ramon Barquin