Recency, Frequency, Monetary (RFM) segmentation
import os
for root, dirs, files in os.walk("data/chapter_2"):
for filename in files:
print(filename)
import pandas as pd
df = pd.read_csv("data/chapter_1/online.csv",index_col = 0)
df['Spend'] = df['Quantity'] * df['UnitPrice']
df_spend = df.groupby('CustomerID')['Spend'].sum().reset_index()
df_spend['Spend_Quartile'] = pd.qcut(df_spend['Spend'],q=4,labels = range(1,5))
df_spend.sort_values('Spend').head()
Here, pd.qcut
cuts the Spend
value by quantile.
We know, RFM is based on the persion instead of transaction. Thus, here is to calculate the aggregate value for FRM.
snapshot_date = dt.datetime(2011,12,10,0,0,0)
df_rfm = df.groupby('CustomerID').agg({
'Spend': 'sum',
'InvoiceNo': 'count',
'InvoiceDate': lambda x: (snapshot_date - pd.to_datetime(x.max())).days
}) \
.rename(columns={'InvoiceDate': 'Recency',
'InvoiceNo': 'Frequency',
'TotalSum': 'MonetaryValue'})
df_rfm.head()
datamart = pd.read_csv("data/chapter_2/datamart_rfm.csv")
r_labels = range(3,0,-1);f_labels = range(1,4);m_labels = f_labels
r_groups = pd.qcut(datamart['Recency'],q = 3, labels = r_labels)
f_groups = pd.qcut(datamart['Frequency'],q = 3, labels = f_labels)
m_groups = pd.qcut(datamart['MonetaryValue'],q = 3, labels = m_labels)
datamart = datamart.assign(
R = r_groups.values,
F = f_groups.values,
M = m_groups.values
)
datamart['Score'] = datamart[['R','F','M']].sum(axis = 1)
datamart.head()
Here, the qcut
process is similar to the normalized process.
When we use the avg formula to get the score, we need to get the customer levels for the easy understanding purpose.
datamart = pd.read_csv('data/chapter_2/datamart_rfm_and_scores.csv')
def rfm_level(x):
if x >= 10:
return 'Top'
elif x >= 6 and x < 10:
return 'Middle'
else:
return 'Low'
datamart['FRM_Level'] = datamart['RFM_Score'].apply(rfm_level)
datamart.head()
Finally, we need to tell our readers the performance of our model.
datamart.groupby('FRM_Level').agg({
'Recency': 'mean',
'Frequency': 'mean',
'MonetaryValue': ['mean','count']
}).round(1)