---
title: pyks
keywords: fastai
sidebar: home_sidebar
summary: "Calculate KS statistic for models"
---

example 1

import pandas as pd
import numpy as np
data = pd.read_csv('refs/two_class_example.csv')

File paths also support autocompletion.

data.describe().pipe(print)
data.count().pipe(print)
                y          yhat
count  500.000000  5.000000e+02
mean     0.516000  5.447397e-01
std      0.500244  4.138621e-01
min      0.000000  1.794262e-07
25%      0.000000  7.289481e-02
50%      1.000000  6.569442e-01
75%      1.000000  9.794348e-01
max      1.000000  9.999965e-01
y       500
yhat    500
dtype: int64

y = 1 marks a good customer; accordingly, yhat is generally higher for those rows.

data["good"] = data.y
data["bad"] = 1 - data.y
data["score"] = data.yhat
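As a quick sanity check on this column mapping, the sketch below compares the mean score of good and bad rows. It uses a small hypothetical frame named `toy` in place of `refs/two_class_example.csv`, so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical toy frame standing in for refs/two_class_example.csv.
toy = pd.DataFrame({
    "y": [0, 0, 0, 1, 1, 1],
    "yhat": [0.05, 0.20, 0.60, 0.40, 0.90, 0.95],
})

# Same mapping as above: y = 1 is good, y = 0 is bad, yhat is the score.
toy["good"] = toy.y
toy["bad"] = 1 - toy.y
toy["score"] = toy.yhat

# Mean score per class; the good class is expected to score higher.
mean_score = toy.groupby("good")["score"].mean()
print(mean_score)
```

If the mean for good rows were lower than for bad rows, the score direction would likely need to be flipped before computing KS.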

summary

summary(df, n_group=10)

Calculate the KS statistic. Inspired by WenSui Liu's blog post at https://statcompute.wordpress.com/2012/11/18/calculating-k-s-statistic-with-python/

Parameters

df : pandas.DataFrame of size M x N. M is the number of bins. N is the number of metrics related to KS.
n_group : float. The number of groups to cut the scores into.

Returns

agg2 : The returned DataFrame with KS and related metrics.

summary(data, n_group = 10)
        min_scr   max_scr  bads  goods  total   odds  bad_rate     ks  max_ks
0  1.794262e-07  0.002773    50      0     50   0.00   100.00%  20.66
1  2.810221e-03  0.036310    49      1     50   0.02    98.00%  40.52
2  3.670582e-02  0.122027    43      7     50   0.16    86.00%  55.58
3  1.225460e-01  0.325715    37     13     50   0.35    74.00%  65.83
4  3.269821e-01  0.655164    31     19     50   0.61    62.00%  71.27   <----
5  6.587248e-01  0.853443    22     28     50   1.27    44.00%  69.51
6  8.561391e-01  0.958957     7     43     50   6.14    14.00%  55.74
7  9.623505e-01  0.987179     1     49     50  49.00     2.00%  37.16
8  9.875471e-01  0.997897     2     48     50  24.00     4.00%  19.38
9  9.979229e-01  0.999997     0     50     50    inf     0.00%  -0.00
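The metrics in this table can be derived from the `good`, `bad`, and `score` columns alone. The following is a minimal sketch of one way to do that, not the pyks source. The function name `ks_table` and the toy data are hypothetical. Binning is assumed to be equal-frequency via `pd.qcut`, and `ks` is assumed to be the gap between the cumulative share of bads and the cumulative share of goods, in percent.

```python
import numpy as np
import pandas as pd


def ks_table(df, n_group=10):
    """Hypothetical sketch of a KS summary table; not the pyks implementation."""
    # Assign each row to one of n_group equal-frequency score bins (lowest first).
    binned = df.assign(bucket=pd.qcut(df["score"], n_group, labels=False))
    grouped = binned.groupby("bucket")
    agg = pd.DataFrame({
        "min_scr": grouped["score"].min(),
        "max_scr": grouped["score"].max(),
        "bads": grouped["bad"].sum(),
        "goods": grouped["good"].sum(),
    })
    agg["total"] = agg["bads"] + agg["goods"]
    # Odds of good to bad; a bin with no bads yields inf.
    agg["odds"] = (agg["goods"] / agg["bads"]).round(2)
    agg["bad_rate"] = agg["bads"] / agg["total"]
    # KS per bin: cumulative share of bads minus cumulative share of goods.
    cum_bad = agg["bads"].cumsum() / agg["bads"].sum()
    cum_good = agg["goods"].cumsum() / agg["goods"].sum()
    agg["ks"] = ((cum_bad - cum_good) * 100).round(2)
    # Mark the bin where the gap is largest.
    agg["max_ks"] = np.where(agg["ks"] == agg["ks"].max(), "<----", "")
    return agg


# Hypothetical toy data: 20 evenly spaced scores, goods concentrated at the top.
good = np.array([0] * 5 + [1, 0, 0, 1, 0] + [1, 1, 0, 1, 1] + [1] * 5)
demo = pd.DataFrame({
    "score": np.linspace(0.01, 0.99, 20),
    "good": good,
    "bad": 1 - good,
})
table = ks_table(demo, n_group=4)
print(table)
```

The overall KS statistic is then the largest value in the `ks` column, which is the row flagged in `max_ks`.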

example 2

import pandas as pd
import numpy as np
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
data = pd.read_csv('refs/two_class_example.csv')