Sizing Your Retail Data Warehouse
by Dan Ross
Originally published January 19, 2006
In this month’s article, we examine a difficult question regarding data warehousing: How much data is enough data for your data warehouse? For the business intelligence practitioner, there are many sources of information concerning best practices for data warehousing. Despite this, remarkably few of them provide any advice on how much (and which) data should be stored online in the data warehouse. In our business, the common understanding seems to be “more is better.”
For numerous reasons, however, a “more is better” approach may not be feasible. Cost is a driving reason in most cases. Not all retailers have an IT budget that can support the kind of systems required to store and process multiple years of detailed sales and inventory information. A corollary could be that the expected return on the costs required to effectively handle storage and then to access data is insufficient, regardless of the affordability factor. In contrast, today’s retailers must create and use detailed sales and inventory analyses as a competitive edge by best-in-class organizations. These retailers must also squeeze efficiencies out of their business through using information to compete.
Assuming that most retailers are not in a position to have everything, we should discuss how to make difficult, yet sensible decisions about ways to optimize the amount of data stored in the data warehouse. This must be done to strike a balance between cost/maintainability and delivering effective business value. Simply put, we need a framework in which an enterprise can categorize data with its planned or possible usage. This framework must define what to load into the data warehouse.
Which data is the most valuable?
Every retail data warehouse should contain at least some level of sales/transactions, inventory, vendor/supply chain, promotions and customer information. These business areas may include the places where effective usage of information can deliver the most tangible bottom-line benefits. If you must choose because of size/processing constraints, I would suggest encompassing more business areas of data instead of carrying additional years of history.
Though keeping a history is important to discover seasonal or year-by-year trends, it also impacts the degree that one can perform valuable customer analysis. In a sales-analysis only data warehouse, the most valuable (and most accessed) data is for the trailing quarter (or season) to date, and the comparable periods in years prior. A decision could be made to load only the detailed level data for those periods and use aggregate level information for the in-between periods. But the extra processing required to load and unload periods may be more “expensive” than merely adding storage capacity.
The amount of sales/transaction history to include depends upon the kind of retail business under analysis. In the grocery/fast-mover business, for example, there are many transactions. This industry also has frequent re-ordering and visits by the same customers over short periods of time. Given those patterns, a grocer could ostensibly only need to store the most recent one to two years of history, and would still get a usable picture of the business. In the specialty or fashion retail business, transaction volumes and basket content counts are smaller and customers tend to visit less often. Because of this, these warehouses should contain at least three years of history, and preferably five years at a minimum.
Whereas a financial system can certainly capture revenues and costs at the aggregate level, the data warehouse should be the place to go for supporting detail. The data warehouse can also help you analyze how and why certain parts of the financial plan missed or exceeded expectations. Typical financial planning for retail extends only to the store-department level, and perhaps in weekly or even monthly increments. Many retailers are now able to track up-to-the-minute achievement against these plans using active data warehousing techniques. The most fundamental analyses in retail is to have a thorough picture of how fast items are selling and how many of them are in the supply chain.
Because snapshots of complete inventory information are so bulky (even bulkier than sales data), many retail data warehouses do not include them. Keeping inventory information at the weekly level or higher in the data warehouse and using derivation to calculate necessary day-level positions is a common approach to minimize data set size. Most of the meaningful analyses (except stock-outs and lost sales) do not require daily level detail anyway. In addition, it may not be necessary to keep a great deal of inventory history. Perhaps only the last six months of detailed stock information is required, with monthly information kept for the last 24 months.
Vendor/Supply Chain Data
Keeping order history in the data warehouse is a (relatively) small addition to the size of the database and facilitates vendor performance analysis. Coupled with a detailed map of the products that each vendor ultimately supplies, retailers should create a basic vendor portal capability. This allows each vendor to track the sales of the products they supply (and their inventory levels) as well as their performance.
There are many subtleties in the area of promotion. While a single item’s promotion and resulting sales impact could presumably be tracked outside the data warehouse, the approach makes it difficult to gauge the complete collateral impacts of a promotion. Customers may react to the promotion by adding other items to their basket. Such items would complement the promoted item. Sales of other items may suffer. There may be variations in how the promotion impacts sales over different regions or time periods. Without the rich context of the data warehouse, these impacts could go unnoticed.
Many retailers have brought in third party demographic databases to “household” their customers along with their vital characteristics. At a higher level, getting census data for zip code demographics is easier to accomplish and is less personal than the householding information. Nonetheless, this data can and should be used to understand and predict differences among store performance based on where each store is located.
In summary, the areas we have discussed are the “minimums” of retail data warehouse contents. Retailers should first strive to include these areas in their business intelligence applications, then add other areas and broaden their history ranges as resources allow. Incorporating these minimums would put a retailer in the top quartile of business intelligence maturity in their industry, which just might be enough.
Recent articles by Dan Ross
Copyright 2004 — 2020. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC