
Analyzing Retail Transactions: Avoiding Obstacles

Originally published September 29, 2005

In last month’s article, I discussed how transaction data is captured in a typical retail data warehouse. The article also focused on the obstacles that the information’s structure presents to the merchandiser attempting to collect information from it. This article will discuss ways to avoid those obstacles. Next month I will focus on the types of analysis retailers should be conducting against their transaction data.

As I have stated before, we estimated that only 15-30% of retailers are doing any analysis against transaction information. Some of the reasons for this include:

  1. The data volumes are quite large.
  2. The data is typically not structured well for business intelligence tools to analyze it easily.
  3. The calculation complexity requires a great deal of horsepower and resources (disk space and CPU processing power).

How do you overcome these issues and start generating business value?

Use Less Data

Using less data seems obvious, right? If too much data is causing you problems, find a way to use less and still derive value. This is easier than it sounds. Many meaningful analyses can be performed at the transaction summary level, which eliminates the need to constantly access tables that contain a row for every item in a basket. While I will soon consider what analysis can be performed at this level, for now just remember to limit access to the detail unless it is absolutely required. Moving analysis from the detail level to the transaction summary level can be accomplished through an effective ETL process that does some of the heavy lifting before reports are run with your business intelligence tool. Remember that your goal is deriving value, not making your business intelligence tool perform tricks against a large, poorly structured database. This is especially important in grocery, where baskets of “12 items or less” are the exception.
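The rollup from detail to summary can be sketched in a few lines. This is a minimal illustration, not a production ETL job; the row layout and field names (transaction id, SKU, quantity, extended price) are assumptions for the example.

```python
from collections import defaultdict

# Hypothetical transaction-detail rows: (transaction_id, sku, quantity, extended_price)
detail = [
    ("T1", "SKU-100", 2, 9.98),
    ("T1", "SKU-205", 1, 4.50),
    ("T2", "SKU-100", 1, 4.99),
]

def summarize(detail_rows):
    """Roll detail rows up to one summary row per transaction."""
    summary = defaultdict(lambda: {"item_count": 0, "total": 0.0})
    for txn_id, _sku, qty, price in detail_rows:
        summary[txn_id]["item_count"] += qty
        summary[txn_id]["total"] += price
    return dict(summary)

summary = summarize(detail)
```

Once this runs in the batch window, reports that only need basket counts and totals never touch the detail table at all.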

Another way to minimize data size is to carefully examine your use of history. When trying to gain analytical insight, you do not necessarily need to analyze every transaction in your data warehouse over the last 10 years. Grocers, for example, accumulate so many transactions and repeat customers over short periods that they might be able to start with six months of transaction detail (while the ideal might be one to two years). The same could probably be said for big box retail, where enough repeat customers allow a shorter history. Specialty retailers have comparably fewer transactions, which naturally limits data set size, but they can probably start with two years of data (although five is ideal) to get a meaningful sample from customers that visit once or twice a year.

Tremendous data set reductions can come from examining the applicability of transactions to a given analysis. Not all transactions are comparable, and they should not all be compared with one another. Within a national retail chain, you might see dramatically different customer behavior and purchasing by region or season. In a Northeast home improvement store, for example, you probably would not see lawn care products being sold in January, while the same chain could sell lawn care products year-round in the South and West. In grocery, beer and wine sales can often substantially increase the overall basket price, but not every store can carry beer and wine because of state alcohol regulations. You should construct a set of “relevant” transactions for each analysis, resulting in a smaller data set. You should also realize that even though analysis must be conducted at the basket level, the valuable correlation and affinity analyses likely involve product information at a higher level than SKU.

For example, if you are interested in how often people buy paintbrushes and painter’s tape with paint, you probably do not care what color of paint people were buying, or about the package size of the paint or the paintbrushes. In fashion retailing, it could be very profitable to incent sales of hosiery with every shoe purchase, but you might not care about the detailed characteristics, assuming similar margins on similar SKUs. Before conducting affinity analysis, you should introduce a step in your process that replaces a SKU with the most relevant department or product class.
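The SKU-to-class substitution step can be sketched as follows. The mapping dictionary, SKU names and class names here are invented for illustration; in practice the mapping would come from a clean item master.

```python
from collections import Counter
from itertools import combinations

# Hypothetical SKU-to-class mapping; a real one would come from the item master.
sku_to_class = {
    "SKU-PAINT-RED-1G": "PAINT",
    "SKU-PAINT-BLUE-1G": "PAINT",
    "SKU-BRUSH-2IN": "PAINTBRUSH",
    "SKU-TAPE-BLUE": "PAINTER_TAPE",
}

# Hypothetical baskets, each a list of SKUs sold in one transaction.
baskets = [
    ["SKU-PAINT-RED-1G", "SKU-BRUSH-2IN"],
    ["SKU-PAINT-BLUE-1G", "SKU-BRUSH-2IN", "SKU-TAPE-BLUE"],
    ["SKU-TAPE-BLUE"],
]

def class_pair_counts(baskets, mapping):
    """Count how often each pair of product classes appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # Collapse SKUs to distinct classes before pairing, per the text above.
        classes = sorted({mapping[sku] for sku in basket})
        counts.update(combinations(classes, 2))
    return counts

pairs = class_pair_counts(baskets, sku_to_class)
print(pairs[("PAINT", "PAINTBRUSH")])  # 2
```

Because two different paint colors collapse into one PAINT class, the paint-and-brush affinity is counted once per basket instead of being fragmented across every color and size combination.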

In short, there are numerous ways of reducing the amount of data necessary to access or conduct a meaningful analysis of your transactions. Whenever possible, do not access the detail. When you do, make sure you have created a relevant subset of transactions for analysis.

Simplify the Structure

Too many companies pull data directly off their POS systems, create an identical structure in their data warehouse and place a business intelligence tool on top of it. Transactional data is complex and hard to expose in an unvarnished fashion to business people, and business intelligence tools struggle to handle the entity relationships correctly. In addition, the heavy normalization and lack of pre-calculation make analysis difficult and slow-performing. This technique is easy for IT because it reduces the amount of data integration and ETL that must be built. It also speeds the deployment of a transactional data warehouse. Unfortunately, it doesn’t make analysis very easy.

There are three strategies that will make the data warehouse structure easier for a typical business intelligence tool and business user to analyze:

  • Learn about master data management—The most chaotic operational data is the item master and store master information. Remember to use robust master data management processes that work well with your merchandising, inventory and ordering systems.
  • Don’t copy the POS data model into the data warehouse—Take the time to create ETL processes that actually transform the data into more usable models. De-normalize and pre-calculate without leaving out necessary information. For example, adding a field for the number of items to the Transaction Header table saves you from accessing Transaction Detail and eliminates the need to count rows at analysis time. You should also solve the transaction primary key problem to avoid complex, multi-column joins in reports that need to access multiple transaction-related tables. Create good aggregate tables (you will need these even if your vendor says otherwise) for item movement analyses that don’t need to be recreated upon a product re-class.
  • Realize that transaction data breaks the “dimension” structure—The typical dimensional data structure in retail involves three main dimensions (products, stores and the retail calendar) that carry attributes. These attributes relate to each other independently of sales and inventory amounts; only when placed alongside sales and inventory do the different dimensions relate to one another. Transaction data is one of the relatively few exceptions, at least as far as business intelligence tools are concerned. A transaction is a dimension with its own attributes, different from those of products, stores or time. In fact, products, stores and time are really just attributes of a transaction. It can even be said that a data warehouse containing transaction data is really a one-dimension data warehouse. This is not easy for business intelligence tools to deal with and might require special treatment.
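One concrete piece of the second strategy, solving the transaction primary key problem, can be sketched simply. POS systems often identify a transaction by a composite of store, register, date and sequence number; an ETL step can collapse that into a single surrogate key so downstream joins use one column. The composite fields and values below are assumptions for illustration.

```python
# Hypothetical composite POS key: (store_id, register_id, business_date, txn_seq).
# The ETL assigns a single surrogate key so reports join on one column
# instead of a complex, multi-column key.

def build_key_map(composite_keys):
    """Assign a compact surrogate key to each distinct composite transaction key."""
    key_map = {}
    for composite in composite_keys:
        if composite not in key_map:
            key_map[composite] = len(key_map) + 1
    return key_map

pos_keys = [
    (101, 3, "2005-09-01", 57),
    (101, 3, "2005-09-01", 58),
    (101, 3, "2005-09-01", 57),  # same transaction seen again in detail rows
]
key_map = build_key_map(pos_keys)
print(key_map[(101, 3, "2005-09-01", 57)])  # 1
```

Every transaction-related table then carries this one surrogate column, and the business intelligence tool never has to reproduce the four-column join.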

Use Your Horsepower Wisely

Unfortunately, transaction analysis requires significant disk space and CPU processing power. You cannot avoid this. However, you should prudently consider your horsepower. There are many strategies for this:

  • Use your batch window to the max—As much as possible, push heavy calculation and repeatable processing into ETL and the batch process. You should also optimize the mundane data movement and loading processes as much as possible. By doing this, you will free cycles for real transformation through pre-calculation and de-normalization. Ideally, this work is completed during off hours rather than in real time, when it impacts other users’ queries.
  • Target your analysis—Especially when doing product affinity and customer analysis, use the techniques above to avoid analyzing the entire database at once. Common sense is important as well: in general, there is enough lift from a manageable set of affinity analyses to produce a substantial ROI. Avoid open-ended analyses that attempt to determine affinity by analyzing every transaction (in every store) for two years. If an open-ended analysis is required, draw a statistically valid sample of transactions. Even for a general analysis, you do not need to examine every transaction to generate actionable information.
  • Proactively tune your data warehouse—You will need as many processing cycles as you can recapture. Instead of asking your CFO for millions of dollars for hardware, it is much better to spend continuous time tuning your data warehouse and ensuring optimal performance.

Effectively analyzing transaction information is not easy, which is why many retailers still have not mastered it. Using these techniques, though, you can put yourself in a position to begin deriving value from transaction information. You can also learn more about your customers’ behavior, among many other things. Next month, I will discuss the extent of actionable information you can derive from analyzing POS data.

  • Dan Ross
    Dan is the Managing Partner of the Retail Practice at Claraview, a strategy and technology consultancy that helps leading companies and government agencies use business intelligence to achieve competitive advantage and operational excellence. Claraview clients realize measurable results: faster time to decision, improved information quality and greater strategic insight. Dan is a frequent contributor to business intelligence literature, writing on topics spanning technical approaches and business impact, and the Claraview Retail Practice serves some of the world's most advanced users of retail data warehouses.

    Editor's note: More retail articles, resources, news and events are available in the BeyeNETWORK's Retail Channel. Be sure to visit today!
