We use cookies and other similar technologies (Cookies) to enhance your experience and to provide you with relevant content and ads. By using our website, you are agreeing to the use of Cookies. You can change your settings at any time. Cookie Policy.

Transaction Information and Analysis for Retail Business Intelligence

Originally published August 16, 2005

Over the next three months, we will describe how to unlock the treasure trove of information of transaction data contained in the retail data warehouse. This month, in Part One, we examine the data model that captures transaction information and how it fits with the traditional star schema approach for analysis. Over the next two months, we will outline a series of analytical areas for transaction data that will provide a deep insight into the operations of a retail business.

Retail transaction information contains an incredible wealth of data about many aspects of a retail enterprise. From this data, a retailer can derive customer behavior, employee and store-level performance, promotional effectiveness, profit drivers and many other important factors. However, most retailers do not seem to be taking advantage of what can be learned from this data. According to a recent Retail Information Systems (RIS) / Gartner Tech Study, only 16 percent of respondents surveyed indicated that they had technology in place to perform customer segmentation or market basket analysis at the transaction level. An equal percentage said they had started such endeavors, which leaves a surprising 65 percent to 70 percent of “best-in-class” retailers without anything in place.

The reasons for this can be different for each firm, but in general, there is some commonality in the obstacles to performing transaction analysis effectively:

  • Size of the data: transaction detail information is usually a large amount of data. There is at least one table in some database that needs to capture a row for every song you buy from your online music store, every bottle of milk you buy from the local supermarket chain and every shirt you buy from your favorite clothier. While storage costs have continued to decline and processing power has continued to multiply, building a history of transaction data can easily run into terabytes of data rather quickly.
  • Poor structure for analysis by business intelligence tools: as we discuss below, the structure of the data captured by point-of-sale (POS) systems is not ideal for reporting. Most business intelligence tools struggle with the highly normalized nature of the information, and data warehouses are often populated with transaction data exactly as it comes from the POS system with minimal or no optimization for reporting purposes.
  • Calculations are complex: to accomplish market basket analysis effectively, multiple passes are often required against transaction data, or SQL self-joins may be needed to determine which products sell together. In addition, in some cases the SQL required to accomplish an analysis may need to join multiple transaction-related tables together, which requires sophistication in the business intelligence tool and some serious horsepower in the database engine.

Those retailers that can overcome these obstacles through the creative use of data warehousing technology stand to gain tremendous benefits and competitive advantage, given the relatively small number of their competitors who have managed to tackle this challenge. To start our article series, we examine the data model for capturing transaction information from a POS system.

The operational data model for transaction information is relatively complex and rich. By its nature, it is well suited for optimizing the performance of a POS system. That means that the retail transaction operational data (like most other operational data sets) has the following characteristics:

  1. Data is highly normalized (typically 3rd normal form);
  2. Single transaction information is spread across several tables (typically 3 to 4 tables);
  3. Information has a high degree (nearing 100 percent) of referential integrity and foreign key relationships; and
  4. Structure is optimized for insert speed, not reporting.

While there are a number of purveyors of POS systems, the data model that is employed to capture POS data is fairly common across the different products. It typically consists of three main entities (tables):

Transaction Header—this table typically contains one row per transaction. A complicating factor in transaction data modeling is the notion that a transaction is not assigned an enterprise-wide unique key at point of sale. Most POS systems assign unique keys only at the register level, which creates a requirement for a multi-valued primary key for a transaction at the enterprise level. This multi-valued key often consists of store, register, transaction and (sometimes) date values. The header record for a transaction usually contains:

  • Transaction time of day;
  • Register number within a store;
  • Employee who completed the transaction;
  • Indicator field for a return or regular sale;
  • Indicator field for voided transactions; and
  • Customer identifier or loyalty program number.

To make the transaction data model friendlier for reporting, experienced data warehouse teams will solve the transaction primary key problem upon data warehouse load, substituting the multi-valued key with a single value unique identifier for a transaction. Other optimizations at the header level might include adding pre-calculated fields like the number of items in the transaction and several fields related to the total retail price, taxes and promotional amounts (e.g. coupons) that are contained in the detail records. This would allow a simple report like average sales price per item or average items per basket to be calculated from header information without accessing the detail table, which can be several times larger.

Transaction Detail—this table typically contains one row per item in a transaction basket. The transaction detail table is often very wide, containing dozens of attributes about each item in a transaction. Captured fields typically include the following:

  • Pricing information such as retail price, discounted price and markdown amounts;
  • Manually keyed price changes at the point of sale;
  • UPC codes, which can tie to vendors;
  • Package sizes and weights;
  • Timestamps for each item; and
  • Tax amounts (sometimes) by item.

The detail table is the place where you would go to analyze product affinities, promotional effectiveness, customer service determined by register wait time and many other important measures. Again, here is a place where a unique transaction identifier is helpful in simplifying the SQL needed to aggregate and analyze the detail. Conversely, there may be fields that need to be de-normalized into the detail table for reporting such that extra joins are not required at reporting time.

Transaction Tenders—this table typically contains on average slightly more than one row per transaction, depending on how customers pay for their purchases. A tender can be associated with cash, check, charge, gift cards and coupons. If a customer has three coupons and pays by credit card, this table would contain four rows for that particular transaction. Typical information captured at the tender level includes:

  • Tender type and amount—cash, credit, charge, coupon, etc.;
  • Specific information about each tender (e.g., credit card number, coupon identifier, check number); and
  • Timestamp information.

The Transaction Tender table is a hidden backdoor into customer segmentation analysis, even in absence of a customer loyalty program. Unique customers can be identified by their credit card numbers or bank account numbers and their purchase histories tracked and aggregated. The sophisticated retailer substitutes its own replacement key value for these very sensitive pieces of data.

There can be other entities tracked at the transaction level in some circumstances that are specific to a particular type of retailer or individual firm. Sometimes, taxes can, and should be, tracked separately for a number of regulatory reasons and at least one common industry data model will include a reference to customer survey data, which can be linked to transaction information.

All of the common aggregations of retail data warehouse data should be derived directly from transaction information, but many or most of the high-value analyses should be performed at this level. The successful data warehouse team puts in place data model optimizations for transaction information that make it a little easier for common business intelligence tools to access this information.

In future articles, we will examine the valuable analyses that can be performed on transaction data, and the means to accomplish them.

  • Dan RossDan Ross
    Dan is the Managing Partner of the Retail Practice at Claraview, a strategy and technology consultancy that helps leading companies and government agencies use business intelligence to achieve competitive advantage and operational excellence. Claraview clients realize measurable results: faster time to decision, improved information quality and greater strategic insight. Dan is a frequent contributor to business intelligence literature, writing on topics spanning technical approaches and business impact, and the Claraview Retail Practice serves some of the world's most advanced users of retail data warehouses.

    Editor's note: More retail articles, resources, news and events are available in the BeyeNETWORK's Retail Channel. Be sure to visit today!

Recent articles by Dan Ross



Want to post a comment? Login or become a member today!

Be the first to comment!