Recency, Frequency, Monetary (RFM) segmentation

In [6]:
import os
for root, dirs, files in os.walk("data/chapter_2"):  
    for filename in files:
        print(filename)
datamart_rfm.csv
datamart_rfm_and_scores.csv
datamart_rfm_scores_named_segment.csv
online12M.csv
._datamart_rfm.csv
._datamart_rfm_and_scores.csv
._datamart_rfm_scores_named_segment.csv
._online12M.csv
In [27]:
import pandas as pd
df = pd.read_csv("data/chapter_1/online.csv",index_col = 0)
In [28]:
df['Spend'] = df['Quantity'] * df['UnitPrice']
df_spend = df.groupby('CustomerID')['Spend'].sum().reset_index()
In [35]:
df_spend['Spend_Quartile'] = pd.qcut(df_spend['Spend'],q=4,labels = range(1,5))
In [38]:
df_spend.sort_values('Spend').head()
Out[38]:
CustomerID Spend Spend_Quartile
100 12977 0.65 1
2465 16454 1.00 1
2789 16913 1.25 1
2459 16446 1.25 1
3067 17343 1.74 1

Here, pd.qcut cuts the Spend value by quantile.

We know, RFM is based on the persion instead of transaction. Thus, here is to calculate the aggregate value for FRM.

In [76]:
snapshot_date = dt.datetime(2011,12,10,0,0,0)
In [91]:
df_rfm = df.groupby('CustomerID').agg({
    'Spend': 'sum',
    'InvoiceNo': 'count',
    'InvoiceDate': lambda x: (snapshot_date - pd.to_datetime(x.max())).days
}) \
.rename(columns={'InvoiceDate': 'Recency',
                 'InvoiceNo': 'Frequency',
                 'TotalSum': 'MonetaryValue'})
In [94]:
df_rfm.head()
Out[94]:
Spend Frequency Recency
CustomerID
12747 992.82 27 2
12748 7522.06 967 0
12749 813.45 37 3
12820 268.02 17 3
12822 146.15 9 70
In [103]:
datamart = pd.read_csv("data/chapter_2/datamart_rfm.csv")
In [107]:
r_labels = range(3,0,-1);f_labels = range(1,4);m_labels = f_labels
r_groups = pd.qcut(datamart['Recency'],q = 3, labels =  r_labels)
f_groups = pd.qcut(datamart['Frequency'],q = 3, labels =  f_labels)
m_groups = pd.qcut(datamart['MonetaryValue'],q = 3, labels =  m_labels)
datamart = datamart.assign(
    R = r_groups.values,
    F = f_groups.values,
    M = m_groups.values
)
datamart['Score'] = datamart[['R','F','M']].sum(axis = 1)
datamart.head()
Out[107]:
CustomerID Recency Frequency MonetaryValue R F M Score
0 12747 3 25 948.70 3 3 3 9.0
1 12748 1 888 7046.16 3 3 3 9.0
2 12749 4 37 813.45 3 3 3 9.0
3 12820 4 17 268.02 3 3 3 9.0
4 12822 71 9 146.15 2 2 2 6.0

Here, the qcut process is similar to the normalized process.

When we use the avg formula to get the score, we need to get the customer levels for the easy understanding purpose.

In [112]:
datamart = pd.read_csv('data/chapter_2/datamart_rfm_and_scores.csv')
In [117]:
def rfm_level(x):
    if x >= 10:
        return 'Top'
    elif x >= 6 and x < 10:
        return 'Middle'
    else:
        return 'Low'
datamart['FRM_Level'] = datamart['RFM_Score'].apply(rfm_level)
In [119]:
datamart.head()
Out[119]:
CustomerID Recency Frequency MonetaryValue R F M RFM_Segment RFM_Score FRM_Level
0 12747 3 25 948.70 4 4 4 444 12.0 Top
1 12748 1 888 7046.16 4 4 4 444 12.0 Top
2 12749 4 37 813.45 4 4 4 444 12.0 Top
3 12820 4 17 268.02 4 3 3 433 10.0 Top
4 12822 71 9 146.15 2 2 3 223 7.0 Middle

Finally, we need to tell our readers the performance of our model.

In [123]:
datamart.groupby('FRM_Level').agg({
    'Recency': 'mean',
    'Frequency': 'mean',
    'MonetaryValue': ['mean','count']
}).round(1)
Out[123]:
Recency Frequency MonetaryValue
mean mean mean count
FRM_Level
Low 180.8 3.2 52.7 1075
Middle 73.9 10.7 202.9 1547
Top 20.3 47.1 959.7 1021