<-- Back to full color view

Sizing Your Retail Data Warehouse

Originally published January 19, 2006

In this month’s article, we examine a difficult question regarding data warehousing: How much data is enough data for your data warehouse? For the business intelligence practitioner, there are many sources of information concerning best practices for data warehousing. Despite this, remarkably few of them provide any advice on how much (and which) data should be stored online in the data warehouse. In our business, the common understanding seems to be “more is better.” 

For numerous reasons, however, a “more is better” approach may not be feasible. Cost is a driving reason in most cases. Not all retailers have an IT budget that can support the kind of systems required to store and process multiple years of detailed sales and inventory information. A corollary could be that the expected return on the costs required to effectively handle storage and then to access data is insufficient, regardless of the affordability factor. In contrast, today’s retailers must create and use detailed sales and inventory analyses as a competitive edge by best-in-class organizations. These retailers must also squeeze efficiencies out of their business through using information to compete.

Assuming that most retailers are not in a position to have everything, we should discuss how to make difficult, yet sensible decisions about ways to optimize the amount of data stored in the data warehouse. This must be done to strike a balance between cost/maintainability and delivering effective business value. Simply put, we need a framework in which an enterprise can categorize data with its planned or possible usage. This framework must define what to load into the data warehouse.

Which data is the most valuable?
While providing a number in gigabytes or terabytes would be wonderful, there is simply no way to do this in a general fashion. This is impossible given the vast differences in data volumes from retailer to retailer. Instead, I recommend that you make your sizing decisions based on the value of the data and determine the minimums. After doing this, you should analyze what the “nice-to-haves” are and include them as your project’s resources allow.

Every retail data warehouse should contain at least some level of sales/transactions, inventory, vendor/supply chain, promotions and customer information. These business areas may include the places where effective usage of information can deliver the most tangible bottom-line benefits. If you must choose because of size/processing constraints, I would suggest encompassing more business areas of data instead of carrying additional years of history.

Sales/transaction information
Analyzing aggregate level sales certainly helps organizations review and predict corporate performance. As I discussed in my last article, Best Practices for Mining Retail Transactions, analysis of transaction data provides a wealth of information about how customers shop in your stores; what an average visit to the store consists of; how factors like location and store size affect item sales and many other interesting issues. I believe that the sales area is the most useful to include in your data warehouse. There would be few retail data warehouses in production without sales information.

Though keeping a history is important to discover seasonal or year-by-year trends, it also impacts the degree that one can perform valuable customer analysis. In a sales-analysis only data warehouse, the most valuable (and most accessed) data is for the trailing quarter (or season) to date, and the comparable periods in years prior. A decision could be made to load only the detailed level data for those periods and use aggregate level information for the in-between periods. But the extra processing required to load and unload periods may be more “expensive” than merely adding storage capacity.

The amount of sales/transaction history to include depends upon the kind of retail business under analysis. In the grocery/fast-mover business, for example, there are many transactions. This industry also has frequent re-ordering and visits by the same customers over short periods of time. Given those patterns, a grocer could ostensibly only need to store the most recent one to two years of history, and would still get a usable picture of the business. In the specialty or fashion retail business, transaction volumes and basket content counts are smaller and customers tend to visit less often. Because of this, these warehouses should contain at least three years of history, and preferably five years at a minimum.

Inventory Information
Putting inventory information into the data warehouse supports trending analysis on operational metrics that combine sales and inventory. Such information includes weeks/days of supply (future), sell-through (historical), inventory turns, GMROI, shrinkage, etc. While sales constitute the revenue side of the business, inventory obviously accounts for a significant component of the business’s cost side. You must at least combine sales and inventory to perform detailed trending analysis on the company P&L.

Whereas a financial system can certainly capture revenues and costs at the aggregate level, the data warehouse should be the place to go for supporting detail. The data warehouse can also help you analyze how and why certain parts of the financial plan missed or exceeded expectations. Typical financial planning for retail extends only to the store-department level, and perhaps in weekly or even monthly increments. Many retailers are now able to track up-to-the-minute achievement against these plans using active data warehousing techniques. The most fundamental analyses in retail is to have a thorough picture of how fast items are selling and how many of them are in the supply chain.  

Because snapshots of complete inventory information are so bulky (even bulkier than sales data), many retail data warehouses do not include them. Keeping inventory information at the weekly level or higher in the data warehouse and using derivation to calculate necessary day-level positions is a common approach to minimize data set size. Most of the meaningful analyses (except stock-outs and lost sales) do not require daily level detail anyway. In addition, it may not be necessary to keep a great deal of inventory history. Perhaps only the last six months of detailed stock information is required, with monthly information kept for the last 24 months.

Vendor/Supply Chain Data 
If sales and inventory information tell you how much of a particular item you are selling and how much you have left in inventory, adding a vendor and supply chain data will tell you how well you are re-supplying. This will also indicate how well your vendors/partners are helping you. Every good retail data warehouse should have the capability to perform vendor scorecarding. Tracking a vendor/partner’s performance in successfully filling orders is a basic process that should be analyzed over time. Many retailers use performance metrics to hold their vendors accountable for their actions, and some even mandate that vendors meet a service level agreement to keep their business. 

Keeping order history in the data warehouse is a (relatively) small addition to the size of the database and facilitates vendor performance analysis. Coupled with a detailed map of the products that each vendor ultimately supplies, retailers should create a basic vendor portal capability. This allows each vendor to track the sales of the products they supply (and their inventory levels) as well as their performance.  

Promotion Data
Promotion and markdown are essential retail functions that should be tracked in the data warehouse. The action of promoting or marking down an item is a “give to get” situation. Obviously, the action is attempting to influence customer behavior in some fashion (presumably a net positive for the retailer). Without including information in the data warehouse about promotions, it would be quite difficult to determine if the action was successful.

There are many subtleties in the area of promotion. While a single item’s promotion and resulting sales impact could presumably be tracked outside the data warehouse, the approach makes it difficult to gauge the complete collateral impacts of a promotion. Customers may react to the promotion by adding other items to their basket. Such items would complement the promoted item. Sales of other items may suffer. There may be variations in how the promotion impacts sales over different regions or time periods. Without the rich context of the data warehouse, these impacts could go unnoticed.

Customer Data
Customer data is a vital component used to support marketing needs. Understanding the characteristics of a retail operation’s customers is vital input to the process of planning, buying and pricing the right products. Even if database marketing is outsourced, the minimum information to identify customers with their purchase history must be created.

Many retailers have brought in third party demographic databases to “household” their customers along with their vital characteristics. At a higher level, getting census data for zip code demographics is easier to accomplish and is less personal than the householding information. Nonetheless, this data can and should be used to understand and predict differences among store performance based on where each store is located.

In summary, the areas we have discussed are the “minimums” of retail data warehouse contents. Retailers should first strive to include these areas in their business intelligence applications, then add other areas and broaden their history ranges as resources allow. Incorporating these minimums would put a retailer in the top quartile of business intelligence maturity in their industry, which just might be enough. 

SOURCE: Sizing Your Retail Data Warehouse

  • Dan RossDan Ross
    Dan is the Managing Partner of the Retail Practice at Claraview, a strategy and technology consultancy that helps leading companies and government agencies use business intelligence to achieve competitive advantage and operational excellence. Claraview clients realize measurable results: faster time to decision, improved information quality and greater strategic insight. Dan is a frequent contributor to business intelligence literature, writing on topics spanning technical approaches and business impact, and the Claraview Retail Practice serves some of the world's most advanced users of retail data warehouses.

    Editor's note: More retail articles, resources, news and events are available in the BeyeNETWORK's Retail Channel. Be sure to visit today!

Recent articles by Dan Ross



Want to post a comment? Login or become a member today!

Be the first to comment!


Copyright 2004 — 2020. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC