Recency, Frequency, Monetary (RFM) segmentation

import os
for root, dirs, files in os.walk("data/chapter_2"):  
    for filename in files:
        print(filename)

datamart_rfm.csv
datamart_rfm_and_scores.csv
datamart_rfm_scores_named_segment.csv
online12M.csv
._datamart_rfm.csv
._datamart_rfm_and_scores.csv
._datamart_rfm_scores_named_segment.csv
._online12M.csv

import pandas as pd
df = pd.read_csv("data/chapter_1/online.csv",index_col = 0)

df['Spend'] = df['Quantity'] * df['UnitPrice']
df_spend = df.groupby('CustomerID')['Spend'].sum().reset_index()

df_spend['Spend_Quartile'] = pd.qcut(df_spend['Spend'],q=4,labels = range(1,5))

df_spend.sort_values('Spend').head()

Here, pd.qcut cuts the Spend value by quantile.

We know, RFM is based on the persion instead of transaction. Thus, here is to calculate the aggregate value for FRM.

snapshot_date = dt.datetime(2011,12,10,0,0,0)

df_rfm = df.groupby('CustomerID').agg({
    'Spend': 'sum',
    'InvoiceNo': 'count',
    'InvoiceDate': lambda x: (snapshot_date - pd.to_datetime(x.max())).days
}) \
.rename(columns={'InvoiceDate': 'Recency',
                 'InvoiceNo': 'Frequency',
                 'TotalSum': 'MonetaryValue'})

df_rfm.head()

datamart = pd.read_csv("data/chapter_2/datamart_rfm.csv")

r_labels = range(3,0,-1);f_labels = range(1,4);m_labels = f_labels
r_groups = pd.qcut(datamart['Recency'],q = 3, labels =  r_labels)
f_groups = pd.qcut(datamart['Frequency'],q = 3, labels =  f_labels)
m_groups = pd.qcut(datamart['MonetaryValue'],q = 3, labels =  m_labels)
datamart = datamart.assign(
    R = r_groups.values,
    F = f_groups.values,
    M = m_groups.values
)
datamart['Score'] = datamart[['R','F','M']].sum(axis = 1)
datamart.head()

Here, the qcut process is similar to the normalized process.

When we use the avg formula to get the score, we need to get the customer levels for the easy understanding purpose.

datamart = pd.read_csv('data/chapter_2/datamart_rfm_and_scores.csv')

def rfm_level(x):
    if x >= 10:
        return 'Top'
    elif x >= 6 and x < 10:
        return 'Middle'
    else:
        return 'Low'
datamart['FRM_Level'] = datamart['RFM_Score'].apply(rfm_level)

datamart.head()

Finally, we need to tell our readers the performance of our model.

datamart.groupby('FRM_Level').agg({
    'Recency': 'mean',
    'Frequency': 'mean',
    'MonetaryValue': ['mean','count']
}).round(1)

	CustomerID	Spend	Spend_Quartile
100	12977	0.65	1
2465	16454	1.00	1
2789	16913	1.25	1
2459	16446	1.25	1
3067	17343	1.74	1

	Spend	Frequency	Recency
CustomerID
12747	992.82	27	2
12748	7522.06	967	0
12749	813.45	37	3
12820	268.02	17	3
12822	146.15	9	70

	CustomerID	Recency	Frequency	MonetaryValue	R	F	M	Score
0	12747	3	25	948.70	3	3	3	9.0
1	12748	1	888	7046.16	3	3	3	9.0
2	12749	4	37	813.45	3	3	3	9.0
3	12820	4	17	268.02	3	3	3	9.0
4	12822	71	9	146.15	2	2	2	6.0

	CustomerID	Recency	Frequency	MonetaryValue	R	F	M	RFM_Segment	RFM_Score	FRM_Level
0	12747	3	25	948.70	4	4	4	444	12.0	Top
1	12748	1	888	7046.16	4	4	4	444	12.0	Top
2	12749	4	37	813.45	4	4	4	444	12.0	Top
3	12820	4	17	268.02	4	3	3	433	10.0	Top
4	12822	71	9	146.15	2	2	3	223	7.0	Middle

	Recency	Frequency	MonetaryValue
	mean	mean	mean	count
FRM_Level
Low	180.8	3.2	52.7	1075
Middle	73.9	10.7	202.9	1547
Top	20.3	47.1	959.7	1021