Originally published July 21, 2008
While most business leaders agree that data is a critical strategic asset, the decision on how best to manage it eludes most enterprises. What is known is that the amount of enterprise data continues to grow, and more often than not it is maintained in an ever-increasing number of locations. For example, a recent IDC report noted that the average company has 49 applications spread across 14 different databases and typically has no more than 20% of its customer data residing in any one location.
Given this, data architects have an urgent and important decision to make. What is the best architectural design to enable an enterprise to leverage its data to enhance relationships, increase efficiencies and lower costs? Many data practitioners are stumped when it comes to choosing a master data strategy; some lean toward a more traditional approach that takes advantage of data warehousing principles, while others insist that infusing master data management (MDM) with service-oriented architecture (SOA) can bring about a better result.
Let’s take a look at both approaches to understand what method works best for specific business types.
Many of today’s MDM practitioners earned their stripes in the business intelligence world. These experts understand data and have a true appreciation for the pain, risk and cost that poor data can create for a business.
However, this skilled group has typically taken a more traditional approach to data management that does not translate easily to today's more advanced MDM environment. The business intelligence mind-set is often to create one large data warehouse that holds all of a corporation's data (master, reference and transaction data alike), with no differentiation between data types. While this makes good use of an existing data warehouse, it does not create the high-performance, single-focused, real-time environment that many of today's enterprises require.
Data warehouses were designed to consume large volumes and varieties of data over historical periods and to derive, prepare and propagate that data in batch. This makes them extremely useful for data analytics; but because it can take hours to extract, transform and load data, they are not well suited to real-time business functions. When pushed to accommodate alterations to existing data, these systems can become brittle, causing the entire system to experience availability, performance and scalability issues. For example, 99.999% availability is a requirement for online consumer-based businesses with real-time customer demand. At that level, the data system can be down for maintenance or upgrades for only about five minutes and 15 seconds per year. A traditional data warehouse is optimized for batch processing, not real-time transactions, making it extremely challenging to meet these high-availability requirements.
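The downtime budget implied by an availability target is simple arithmetic, and it is worth seeing how quickly the allowance shrinks as nines are added. A minimal sketch (the function name here is illustrative, not from any particular library):

```python
# Allowable downtime per year for a given availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes (ignoring leap years)

def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of allowable downtime per year at the given availability."""
    return MINUTES_PER_YEAR * (1.0 - availability)

for label, target in [("three nines", 0.999),
                      ("four nines", 0.9999),
                      ("five nines", 0.99999)]:
    print(f"{label} ({target:.3%}): "
          f"{downtime_minutes_per_year(target):.2f} minutes/year")
```

At "five nines" the result is about 5.26 minutes per year, which is why batch-oriented maintenance windows become untenable for real-time consumer systems.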
In a data warehouse environment, scalability and performance are tied to one another; as the volume of data in a warehouse expands, so too do opportunities for performance issues. For instance, as significant numbers of rows are added, some data tables grow exponentially and are queried regularly while others are used only moderately or infrequently. Performance problems arise when frequently queried data cannot be delivered in the required timeframe because infrequently used data tables are consuming the warehouse's memory or index space. Additionally, performance will be impacted when multiple queries, inserts, updates or deletes are requested at the same time. A data warehouse manages each of these requests using a single engine, meaning each request vies for priority, creating a lag in response times.
And lastly, the cost to create, manage and maintain a traditional data warehouse is difficult to forecast and is often much greater than initial budget predictions. Industry experts have estimated that creating a single data structure for all of an enterprise's data will take a minimum of two years. In addition to factoring in the cost of this large time investment and an inability to adequately access corporate data for an extended period, business owners must consider that maintenance and hardware costs will likely escalate for such a complex endeavor.
While a traditional approach has its place in a data management discussion, a growing number of data architects are shifting gears to focus on a more nimble approach that applies SOA and relies on the idea of separation of concerns. But how exactly do SOA and separation make an MDM environment more nimble?
SOA is an effective architectural strategy for master data that resides in more than one location. A properly employed SOA can help enterprises leverage their existing systems by creating a master data service (MDS) that runs interference between each of the existing data systems. These SOA-based services allow systems to remain “autonomous,” meaning that they can stay independent of one another, which eliminates the need for each system to know the details of how other systems manage their data.
Separation of concerns is a long-standing architectural design principle that, when applied to MDM, directs businesses to separate each type of master data into its own hub, so that all customer data is grouped together, all product data is grouped together and so on. When combined with an SOA, these specialized data domains gain agility because they can easily connect to one another through an enterprise service bus (ESB). As a first-class service on the ESB, this MDS is tailored to manage the enterprise-wide business rules for data quality in one place and is fine-tuned to be the arbiter of data quality among the systems and services with which it interacts.
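The hub-per-domain idea can be sketched in a few lines. Everything below is a hypothetical illustration, not any vendor's API: each hub owns one type of master data, and a single master data service applies the enterprise-wide quality rules in one place before any hub is touched.

```python
# Hypothetical sketch: domain-separated master data hubs fronted by one
# master data service (MDS). Names (Hub, MasterDataService) are illustrative.

class Hub:
    """One specialized store per master data domain (customer, product, ...)."""
    def __init__(self, domain: str):
        self.domain = domain
        self.records: dict[str, dict] = {}

    def upsert(self, key: str, record: dict) -> None:
        self.records[key] = record


class MasterDataService:
    """First-class service on the bus: the single arbiter of data quality."""
    def __init__(self):
        self.hubs: dict[str, Hub] = {}

    def register_hub(self, hub: Hub) -> None:
        self.hubs[hub.domain] = hub

    def upsert(self, domain: str, key: str, record: dict) -> None:
        # Enterprise-wide quality rule lives here, once, rather than being
        # re-implemented inside every connected system.
        if not key or not record:
            raise ValueError("empty key or record rejected by quality rules")
        self.hubs[domain].upsert(key, record)


mds = MasterDataService()
mds.register_hub(Hub("customer"))
mds.register_hub(Hub("product"))
mds.upsert("customer", "C-001", {"name": "Acme Corp"})
mds.upsert("product", "P-100", {"sku": "WIDGET-1"})
```

Because each domain lives in its own hub, the product catalog's structure can change without the customer hub ever being involved, which is the agility argument made above.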
An SOA-based approach provides a level of flexibility that hasn’t been feasible before. This nimble model can support a flat structure where only a handful of data attributes are managed, or it can accommodate much more complex circumstances where hundreds of data attributes and complex interrelationships exist. Additionally, master data services can be coupled or decoupled to orchestrate new business processes. For example, an organization can combine services so that when customers are notified of a completed order transaction, the notification scheme automatically updates the asset service and product service, too, but does not impact any other unrelated data domains.
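The order-notification example above amounts to publish/subscribe routing on a bus: only the services subscribed to an event are invoked, and unrelated domains are never touched. A minimal sketch (again hypothetical, not modeled on any particular ESB product):

```python
# Hypothetical sketch: an order-completed event fans out only to the
# services that subscribed to it; unrelated domains are not impacted.
from collections import defaultdict

class Bus:
    """A toy stand-in for an ESB's publish/subscribe routing."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)


updated = []
bus = Bus()
# Asset and product services both care about completed orders.
bus.subscribe("order.completed", lambda e: updated.append(("asset", e["order_id"])))
bus.subscribe("order.completed", lambda e: updated.append(("product", e["order_id"])))
# The supplier hub listens on a different topic and is never invoked here.
bus.subscribe("supplier.changed", lambda e: updated.append(("supplier", e)))

bus.publish("order.completed", {"order_id": "O-42"})
```

After the publish, only the asset and product handlers have run; coupling or decoupling a service is just adding or removing a subscription.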
Scalability and performance are also dramatically improved with an SOA-based system, since one data hub can grow, change and be queried without impacting the performance response rates of the other hubs. Maintenance is also simplified with an SOA-based solution. For example, data owners can easily alter the structure of a product catalog without taking customer or account data offline, because separate types of master data can exist on their own rather than being intermingled with other data types. As new data is created, it can easily be added to the specialized hub, and new types of hubs can be created and connected through an ESB if a business begins collecting new types of data.
Finally, since an SOA-based approach to MDM does not require all data to be moved to a centralized location – a process that is costly and time intensive – enterprises are able to stand up a service that is unique to one kind of data within six to eight months. When compared to the one- to three-year time frame anticipated for completing a data warehouse, this condensed schedule has valuable cost and ROI implications.
An organization’s leadership must consider some specific business issues before determining whether to proceed with a traditional or SOA-based approach to MDM. In addition to determining the desired time frame for completing the data project, other considerations should include the channels through which customers are engaged, whether data domains will change over time and whether there are real-time requirements for data.
By evaluating these business criteria, a data architect can make a decision that serves the company not only today, but also in the future. Given these considerations, when will a traditional approach to MDM suffice, and under what circumstances does it make sense to turn to an SOA-based MDM style?
A traditional approach to master data should work well for businesses that use their data for analytical purposes, rather than for real-time transactions, and that are comfortable with the latency that will accompany their data requests. This is particularly true for an organization that has already invested in a data warehouse and that doesn’t see a change on the horizon for how its data will be used. A single database could also suffice for businesses with a small customer base (thousands or tens of thousands), a limited product catalog and no changes in requirements that would significantly alter the data warehouse.
Additionally, some companies currently master their data by way of a commercial package, such as an ERP system, which may have the ability to manage a “single version of the truth” amid a myriad of other applications that also master data. This traditional approach leverages a company’s significant investment and could be suitable as long as the organization does not have performance or transactional requirements that will stress the application beyond its capabilities. For instance, any business with transaction rates of more than a few per second may face scalability and availability requirements that are beyond the capacity of their existing database architecture. Bear in mind, too, that if a commercial package ever needs to scale significantly, an organization might face hardware, database, storage and related software costs that can escalate into the tens of millions of dollars.
On the flip side, an SOA-based approach is better suited for companies that have real-time requirements or have transaction rates higher than a few per second (most often companies that engage with customers online), and that have no downtime allowances. Again, looking toward the future, an SOA-based approach is the best option if a business expects to grow, anticipates changes in its systems infrastructure or experiences significant spikes in transaction rates. Also, businesses with high volumes of data would be well served to examine an SOA-based style. For example, enterprises with tens of millions of rows of data or higher will face significant cost and complexity challenges with a traditional system that they will not encounter with an MDS approach.
A nimble, SOA-based style is also better suited for those companies and industries that are highly dynamic, experience consolidation or must meet changing regulatory reporting requirements where agility is critical. In addition, if time and cost are concerns, an SOA-based MDM approach is likely a better choice, as this style can set up a single-purposed, uniquely focused, highly specialized MDS in a matter of months. This approach not only enables a business to leverage pre-existing investments in legacy systems, but takes advantage of the fact that these software service costs are usually vastly lower than costs associated with a traditional approach.
Traditional and SOA-based approaches to MDM both have a place at the data table. To determine which approach is best for an organization, data architects must look at their business requirements for today as well as their needs for tomorrow. Examining the importance of scalability, flexibility, transaction rates and time to deploy will help ensure that the best long-term decision is made.
Enterprises with small data sets, no need for real-time transactions, low transaction rates, or a pre-existing system that is meeting the company’s needs for latency or regulatory requirements might be well served by a traditional system. Those organizations that require a highly scalable, flexible system that can accommodate high transaction rates in real-time, or that must master a particular kind of data in a matter of months instead of years, should investigate a more nimble, SOA-based MDM approach.