
RFM: A Precursor to Data Mining

Originally published May 14, 2009

RFM stands for recency, frequency and monetary value. It has been used by direct marketers for more than 40 years as a segmentation tool to increase marketing return on investment (ROI). The basic premise of RFM is that customers who have purchased more recently, more frequently and have spent more with your company are your best prospects for future direct marketing campaigns. Like data mining/response modeling, the goal of RFM is to increase marketing ROI by communicating (via direct mail, call center, etc.) only with customers who are likely to respond. Done well, RFM increases your ROI because you attain almost the same number of sales while contacting only a fraction of your customer base.

RFM, business intelligence, data mining and optimization represent a common progression away from mass marketing for many organizations as their marketing efforts become more analytically based and targeted.


As depicted above, the adoption of each technique is a function of many factors. Consequently, a technique like RFM can still be a new and promising approach to many companies today. It is simple to understand, contributes to ROI, is inexpensive and can be utilized as a reliable stepping stone to more advanced techniques like data mining.

RFM in Action

RFM was initially utilized by marketers in the B-to-C space – specifically in industries like cataloging, insurance, retail banking and telecommunications. There are a number of scoring approaches that can be used with RFM. We’ll take a look at three:

RFM – Basic ranking

RFM – Within parent cell ranking

RFM – Weighted cell ranking

Each approach has experienced proponents that argue one over the other. The point is to start somewhere and experiment to find the one that works best for your company and your customer base. Let’s look at a few examples.

RFM – Basic Ranking

This approach involves scoring customers based on each RFM factor separately. It begins with sorting your customers based on recency, i.e., the number of days or months since their last purchase. Once sorted so that the most recent purchasers are at the top (ascending order of time since last purchase), the customers are then split into quintiles, or five equal groups. The customers in the top quintile represent the 20% of your customers that most recently purchased from you; they receive a recency score of 5, and the bottom quintile receives a score of 1.


This process is then undertaken for frequency and monetary as well. Each customer is in one of the five cells for R, F and M (see below).


Experience tells us that the best prospects for an upcoming campaign are those customers that are in quintile 5 for each factor – those customers that have purchased most recently, most frequently and have spent the most money. In fact, a common approach to creating an aggregated score is to concatenate the individual RFM scores together resulting in 125 cells (5x5x5).

A customer’s score can range from 555 (the highest) to 111 (the lowest).
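The basic ranking described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names, the tuple layout and the sample customers are hypothetical, frequency is assumed to mean number of orders and monetary to mean total spend, and ties are simply left in their input order.

```python
def quintile_scores(values, higher_is_better=True):
    """Return a 1-5 score per value; 5 marks the best 20% of customers."""
    n = len(values)
    # Order customer positions from best to worst for this factor.
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        # rank 0 is the best customer; each block of n/5 ranks shares a score.
        scores[idx] = 5 - (rank * 5) // n
    return scores


def basic_rfm(customers):
    """customers: list of (customer_id, days_since_last, num_orders, total_spend)."""
    # Fewer days since the last purchase is better, so recency sorts ascending.
    r = quintile_scores([c[1] for c in customers], higher_is_better=False)
    f = quintile_scores([c[2] for c in customers])
    m = quintile_scores([c[3] for c in customers])
    # Concatenate the three digits into one cell label such as "555".
    return {c[0]: f"{r[i]}{f[i]}{m[i]}" for i, c in enumerate(customers)}


# Hypothetical sample data, one customer per quintile.
customers = [
    ("ann", 12, 9, 840.0),
    ("bob", 45, 4, 310.0),
    ("cy", 200, 1, 25.0),
    ("dee", 90, 6, 150.0),
    ("eve", 30, 2, 560.0),
]
print(basic_rfm(customers))
# {'ann': '555', 'bob': '333', 'cy': '111', 'dee': '242', 'eve': '424'}
```

Because each factor is ranked independently over the whole file, only three sorts are needed, but the 125 cells will generally not be of equal size.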

RFM – Within Parent Cell Ranking

This approach is advocated by Arthur Middleton Hughes1 – one of the biggest proponents of RFM analysis. It begins like the one above, i.e., all customers are initially grouped into 5 cells based on recency. The next step takes customers in a given recency cell – say cell number 5, and then ranks those customers based on frequency. Then customers in the 55 (RF) cell are ranked by monetary value. The illustration below shows that this method really requires quite a number of sorts on the database.
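A rough Python sketch of the nested ranking may make the difference from the basic method clearer. The helper names and the dictionary keys (`recency_days`, `frequency`, `monetary`) are assumptions chosen for this example, and the sketch is not taken from Hughes’ own materials.

```python
def split_into_quintiles(rows, key, higher_is_better=True):
    """Sort rows best-first by key and yield (score, group) from 5 down to 1."""
    ordered = sorted(rows, key=key, reverse=higher_is_better)
    n = len(ordered)
    for q in range(5):
        # Integer boundaries keep group sizes within one row of each other.
        yield 5 - q, ordered[q * n // 5:(q + 1) * n // 5]


def nested_rfm(customers):
    """customers: list of dicts with id, recency_days, frequency, monetary."""
    cells = {}
    by_recency = lambda c: c["recency_days"]
    by_frequency = lambda c: c["frequency"]
    by_monetary = lambda c: c["monetary"]
    # Fewer days since the last purchase is better, hence higher_is_better=False.
    for r, r_group in split_into_quintiles(customers, by_recency, False):
        # Frequency is ranked only among customers sharing this recency cell.
        for f, f_group in split_into_quintiles(r_group, by_frequency):
            # Monetary is ranked only within the current (R, F) cell.
            for m, m_group in split_into_quintiles(f_group, by_monetary):
                for c in m_group:
                    cells[c["id"]] = f"{r}{f}{m}"
    return cells
```

This sketch performs 1 + 5 + 25 = 31 sorts instead of three, which is the price paid for ending up with 125 cells of nearly equal size.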


RFM – Weighted Ranking

Weightings used by RFM practitioners vary. For example, some advocate adding the RFM scores together – thus giving equal weight to each factor. Consequently, scores can range from 15 (5+5+5) to 3 (1+1+1). Another weighting arrangement often used is 3xR + 2xF + 1xM. In this case, scores can range from 30 (15+10+5) to 6 (3+2+1).
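As a small illustration, a weighted score can be computed from the three quintile scores as follows. The function name and its `weights` parameter are hypothetical; only the two weighting schemes themselves come from the text.

```python
def weighted_score(r, f, m, weights=(1, 1, 1)):
    """Combine 1-5 quintile scores into one number using per-factor weights."""
    wr, wf, wm = weights
    return wr * r + wf * f + wm * m


# Equal weighting: simply add the three scores.
print(weighted_score(5, 5, 5))             # 15, the best possible customer
print(weighted_score(1, 1, 1))             # 3, the weakest possible customer

# 3xR + 2xF + 1xM weighting.
print(weighted_score(5, 5, 5, (3, 2, 1)))  # 30
print(weighted_score(1, 1, 1, (3, 2, 1)))  # 6
```

Keeping the weights as a parameter makes it easy to test several schemes against the same campaign results.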

So which to use? In reality, there are many other permutations of approaches that are being used today. Best-practice marketing analytics requires a fine mix of mathematical and statistical science, creativity and experimentation. Bottom line: test multiple scoring methods to see which works best for your unique customer base. The graphical analysis below is a great first step in determining a weighting scheme that is appropriate for your company.

So far we have assumed R is more important than F, which is more important than M. This is a great start, but in reality some businesses find that a different order works best given the unique nature of their business and customer base. The graphs below represent an analysis of a recent campaign for a hypothetical company. When looking at actual response across each RFM factor, the graphs suggest that this company may be better off developing a scoring scheme based on some weighting of MRF, attributing the highest weight to monetary since it is associated with the highest response rate.
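The kind of analysis behind such graphs can be approximated with a short Python sketch that tabulates response rate by score for each factor. The record layout (keys `R`, `F`, `M` and `responded`) is an assumption, and ordering the factors by their peak response rate is only one simple heuristic, not a prescribed method.

```python
from collections import defaultdict


def response_rate_by_score(records, factor):
    """records: dicts with 'R', 'F', 'M' (1-5 scores) and 'responded' (bool)."""
    mailed = defaultdict(int)
    buyers = defaultdict(int)
    for rec in records:
        mailed[rec[factor]] += 1
        if rec["responded"]:
            buyers[rec[factor]] += 1
    # One response rate per score value that actually occurs in the data.
    return {score: buyers[score] / mailed[score] for score in sorted(mailed)}


def peak_rate(records, factor):
    """Highest response rate seen in any score group of this factor."""
    return max(response_rate_by_score(records, factor).values())


def rank_factors(records):
    """Order R, F and M from highest to lowest peak response rate."""
    return sorted("RFM", key=lambda fac: peak_rate(records, fac), reverse=True)
```

A factor ordering produced this way, such as `['M', 'R', 'F']`, is a starting point for choosing weights and should still be validated with a test mailing.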


Okay, so now you have scores – how do you decide which customers should be contacted based on those scores?

Establishing a Score Threshold

After a test or production campaign, you will find that some of the cells were profitable while some were not. Let’s turn to a case study to see how you can establish a threshold that will help maximize your profitability. This study comes from Professor Charlotte Mason of the Kenan-Flagler Business School and utilizes a real-life marketing study performed by The BookBinders Book Club.2

BookBinders is a specialty book seller that utilizes multiple marketing channels. BookBinders traditionally did mass marketing and wanted to test the power of RFM. To do so, they initially did a random mailing to 50,000 customers. The customers were mailed an offer to purchase The Art History of Florence. Response data was captured and a “post-RFM” analysis was completed. This “post analysis” was done by freezing the files of the 50,000 test customers prior to the actual test offer. That way, the test campaign itself could not distort the analysis by coding the actual buyers among the 50,000 test subjects as the most recent purchasers. The results firmly support the use of RFM as a highly effective segmentation approach.


Customers that purchased the book were more recent purchasers, more frequent purchasers and had spent the most with BookBinders.

The response rates by decile for recency paint an even more compelling picture (see graph below).


The response rate for the top decile (18%) was twice the response rate associated with the 5th decile (9%).

Results from this test were then used by BookBinders to identify which of their remaining customers should receive the same mailing. BookBinders used a breakeven response rate calculation to determine the appropriate RFM cells to mail.

The following cost information was used as input:
 Cost per mail piece        $  0.50
 Selling price              $ 18.00
 BookBinders' book cost     $  9.00
 Shipping costs             $  3.00

Breakeven is achieved when the cost of mailing one piece equals the expected net profit from that piece, i.e., the response rate multiplied by the net profit from a single sale. In this case:

Breakeven  = cost to mail the offer / net profit from a single sale
           = $0.50 / ($18.00 - $9.00 - $3.00)
           = $0.50 / $6.00
           = 8.3% = breakeven response rate
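The same arithmetic can be wrapped in a small Python function so that it is easy to rerun with your own costs. The function and parameter names are hypothetical; the input values are the BookBinders figures used in the calculation above.

```python
def breakeven_response_rate(mail_cost, selling_price, product_cost, shipping_cost):
    """Response rate at which the expected profit per piece mailed is zero."""
    net_profit_per_sale = selling_price - product_cost - shipping_cost
    if net_profit_per_sale <= 0:
        raise ValueError("each sale must earn a positive net profit")
    return mail_cost / net_profit_per_sale


rate = breakeven_response_rate(
    mail_cost=0.50, selling_price=18.00, product_cost=9.00, shipping_cost=3.00
)
print(f"{rate:.1%}")  # 8.3%
```

Any cell whose test response rate is above this value is expected to earn more than it costs to mail.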

So, according to the test offer, profit can be obtained by mailing to cells that exhibited a response rate of greater than 8.3% – or cells with an RFM score greater than 425. BookBinders compared the profitability of RFM versus their old mass marketing approach in the table below.



RFM dramatically improved profitability by capturing 71% of buyers (3,214/4,522) while mailing only about 45% of their customers (22,731/50,000). And the return on marketing expenditures using RFM was more than eight times (69.7/8.5) that of a mass mailing.
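The comparison can be reconstructed with a short Python sketch. The profit definitions below are an assumption (net profit is taken as buyers times the $6 net profit per sale, minus $0.50 per piece mailed), but with the counts quoted above they do reproduce the 8.5 and 69.7 figures. The function and key names are hypothetical.

```python
def campaign_summary(pieces_mailed, buyers, mail_cost=0.50, net_profit_per_sale=6.00):
    """Mailing cost, net profit and return on marketing spend for one campaign."""
    mailing_cost = pieces_mailed * mail_cost
    net_profit = buyers * net_profit_per_sale - mailing_cost
    return {
        "mailing_cost": mailing_cost,
        "net_profit": net_profit,
        "return_on_spend": net_profit / mailing_cost,
    }


mass = campaign_summary(pieces_mailed=50_000, buyers=4_522)
rfm = campaign_summary(pieces_mailed=22_731, buyers=3_214)

print(f"{mass['return_on_spend']:.1%}")  # 8.5%
print(f"{rfm['return_on_spend']:.1%}")   # 69.7%
```

Under these assumptions the RFM mailing earns $7,918.50 on $11,365.50 of mailing cost, against $2,132.00 on $25,000.00 for the mass mailing.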

Number of Cells and Cell Size Considerations

As previously mentioned, RFM was initially utilized by companies that operated in the B-to-C marketplace and generally possessed a very large number of customers. The idea of generating 125 cells using quintiles for R, F and M has been a very good practice as an initial modeling effort. But what if you are a B-to-B marketer with relatively fewer customers? Or, what if you are a B-to-C marketer with an extremely large file with millions of customers? The answer is to use the same approach that is used in data mining – be flexible and experiment.

Establishing a minimum test cell size is a good place to start. Arthur Hughes recommends the following formula:

    Test cell size = 4 / breakeven response rate

The breakeven response rate was addressed above in the BookBinders case study. The number "4" is a constant that Hughes has found to work well across the many studies he has performed. BookBinders’ breakeven response rate was 8.3%. Using the above formula, you would need a minimum of 48 customers in each cell (4/0.083). BookBinders actually had 400 customers per cell (50,000/125), so they could be more than comfortable with the significance of their test. In fact, BookBinders could have created as many as 1,041 cells (50,000/48) if they were comfortable using the minimum of 48 per cell. As an example, they could have used deciles as opposed to quintiles and established 1,000 cells (10 x 10 x 10). The more cells, the finer the analysis, but of course the law of diminishing returns will apply.

The same calculation can guide the cell structure for small files. If your breakeven response rate is 3%, your minimum cell size would be 133 customers (4/0.03). Therefore, if you have 12,000 customers, you could have about 90 cells (12,000/133). As such, a 5 x 5 x 4 (100 cells) or a 5 x 4 x 4 (80 cells) approach may be appropriate.
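The cell-size arithmetic is easy to script. In this sketch the function names are hypothetical, and the results are rounded down to whole customers and whole cells to match the figures in the text; rounding the cell size up instead would be the more cautious choice.

```python
import math


def min_cell_size(breakeven_rate, constant=4):
    """Hughes' rule of thumb: minimum test cell size = 4 / breakeven rate."""
    return math.floor(constant / breakeven_rate)


def max_cells(total_customers, breakeven_rate, constant=4):
    """Largest number of cells that still meets the minimum cell size."""
    return total_customers // min_cell_size(breakeven_rate, constant)


print(min_cell_size(0.083), max_cells(50_000, 0.083))  # 48 1041
print(min_cell_size(0.03), max_cells(12_000, 0.03))    # 133 90
```

With a ceiling of about 90 cells for the 12,000-customer file, a 5 x 4 x 4 grid (80 cells, 150 customers per cell) stays above the minimum, while a 5 x 5 x 4 grid (100 cells, 120 per cell) falls slightly below it.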


RFM, business intelligence and data mining are all part of an evolutionary path that is common to many marketing organizations. While RFM has been practiced for over 40 years, it still holds great value for many organizations. Its merits include:
  • Simplicity – easy to understand and implement

  • Relatively low cost

  • Proven ROI

  • The demand on data requirements is relatively low in terms of variables required and the number of records

  • Once utilized, it sets up a broader foundation (from an infrastructure and business case perspective) to undertake more sophisticated data mining efforts
RFM’s challenges include:
  • Contact fatigue can be a problem for the higher scoring customers. A high level cross-campaign communication strategy can help prevent this.

  • Your lowest scoring customers may never hear from you. Again, a cross-campaign communications plan should ensure that all of your customers are communicated with periodically to ensure low scoring customers are given the opportunity to meet their potential. Also, data mining and the prediction of customer lifetime value can help address this shortcoming.

  • RFM includes only three variables. Data mining typically finds RFM-based variables to be quite important in response models. But there are additional variables that data mining typically uses (e.g., detailed transaction, demographic and firmographic) that help produce improved results. Moreover, data mining techniques can also increase response rates via the development of richer segment/cell profiles that can be used to vary offer content and incentives.
As stated before, successful marketing efforts require analytics and experimentation. RFM has proven itself as an effective approach to predicting response and improving profitability. It can be an important stage in your company’s evolution in marketing analytics.

End Notes:
  1. Arthur Middleton Hughes, Vice President, The Database Marketing Institute

  2. Recency, Frequency and Monetary (RFM) Analysis, Professor Charlotte Mason, Kenan-Flagler Business School, University of North Carolina, 2003.
  • Jim Stafford
    Jim has worked for leading companies in the marketing automation space (business intelligence, data mining, campaign management and eMarketing) for more than 12 years. He has directed SPSS’ pre-sales engineers in North America and has played the role of Product Marketing Manager for Unica’s Model (data mining) application. Jim has developed response models and customer segmentation strategies for many industries including catalogers, financial services, retailers and hospitality. Learn more about Jim’s services here. He can be reached at Jim@StaffordSBSG.com.

