---
title: Demo
keywords: fastai
sidebar: home_sidebar
summary: "Use autoencoders to detect fraud samples."
---
Using TensorFlow backend.

Build an autoencoder in Keras to detect fraudulent credit card transactions.

Make inputs

import numpy as np
np.random.seed(1) # reproducibility
import pandas as pd
df = pd.read_csv("data/creditcard.csv")
df.head()

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | ... | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | 0 |
| 1 | 0.0 | 1.191857 | 0.266151 | 0.166480 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | ... | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.167170 | 0.125895 | -0.008983 | 0.014724 | 2.69 | 0 |
| 2 | 1.0 | -1.358354 | -1.340163 | 1.773209 | 0.379780 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | ... | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0 |
| 3 | 1.0 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | ... | -0.108300 | 0.005274 | -0.190321 | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 123.50 | 0 |
| 4 | 2.0 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | 0.095921 | 0.592941 | -0.270533 | 0.817739 | ... | -0.009431 | 0.798278 | -0.137458 | 0.141267 | -0.206010 | 0.502292 | 0.219422 | 0.215153 | 69.99 | 0 |

5 rows × 31 columns

df.shape
(284807, 31)
y = df['Class']
X = df.drop(['Time', 'Class'], axis = 1)
from sklearn.model_selection import train_test_split
# ?train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

Define and train model

build_model

build_model(input_dim=29, outer_layer_dim=14, inner_layer_dim=7)
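
The body of `build_model` is not shown on this page, only its signature. A minimal sketch of what such a function might look like follows, assuming a symmetric dense autoencoder (29 -> 14 -> 7 -> 14 -> 29) built with the Keras functional API. The layer arrangement and the activations are assumptions, not the author's confirmed implementation.

```python
from keras.layers import Dense, Input
from keras.models import Model


def build_model(input_dim=29, outer_layer_dim=14, inner_layer_dim=7):
    """Hypothetical symmetric autoencoder: input -> outer -> inner -> outer -> input."""
    inputs = Input(shape=(input_dim,))
    # Encoder: compress the features down to the bottleneck (assumed ReLU activations).
    encoded = Dense(outer_layer_dim, activation="relu")(inputs)
    bottleneck = Dense(inner_layer_dim, activation="relu")(encoded)
    # Decoder: expand back to the original feature dimension.
    decoded = Dense(outer_layer_dim, activation="relu")(bottleneck)
    # Linear output so the reconstruction can take any real value (assumption).
    outputs = Dense(input_dim, activation="linear")(decoded)
    return Model(inputs=inputs, outputs=outputs)


autoencoder = build_model()
```

The default `input_dim=29` matches the 29 columns left in `X` after dropping `Time` and `Class`.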

# ?autoencoder.compile
autoencoder.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
history = autoencoder.fit(X_train, X_train, epochs = 10, batch_size = 32, validation_split=0.2)
Train on 205060 samples, validate on 51266 samples
Epoch 1/10
205060/205060 [==============================] - 10s 47us/step - loss: 58.7521 - accuracy: 0.8417 - val_loss: 0.9116 - val_accuracy: 0.8536
Epoch 2/10
205060/205060 [==============================] - 10s 50us/step - loss: 2.4403 - accuracy: 0.8613 - val_loss: 0.7706 - val_accuracy: 0.8786
Epoch 3/10
205060/205060 [==============================] - 9s 45us/step - loss: 1.6772 - accuracy: 0.8856 - val_loss: 0.7180 - val_accuracy: 0.8955
Epoch 4/10
205060/205060 [==============================] - 9s 44us/step - loss: 1.3769 - accuracy: 0.8986 - val_loss: 0.9737 - val_accuracy: 0.9111
Epoch 5/10
205060/205060 [==============================] - 9s 45us/step - loss: 1.7829 - accuracy: 0.9004 - val_loss: 0.7955 - val_accuracy: 0.9119
Epoch 6/10
205060/205060 [==============================] - 9s 45us/step - loss: 1.2445 - accuracy: 0.9101 - val_loss: 0.6326 - val_accuracy: 0.9171
Epoch 7/10
205060/205060 [==============================] - 10s 47us/step - loss: 1.5845 - accuracy: 0.9157 - val_loss: 0.6526 - val_accuracy: 0.9168
Epoch 8/10
205060/205060 [==============================] - 9s 44us/step - loss: 1.2776 - accuracy: 0.9221 - val_loss: 0.7749 - val_accuracy: 0.9067
Epoch 9/10
205060/205060 [==============================] - 9s 44us/step - loss: 1.1414 - accuracy: 0.9253 - val_loss: 0.5749 - val_accuracy: 0.9349
Epoch 10/10
205060/205060 [==============================] - 9s 44us/step - loss: 1.0061 - accuracy: 0.9317 - val_loss: 0.5736 - val_accuracy: 0.9319
autoencoder.save("model/creditcard_autoencoders_model.h5")

plot_acc

plot_acc(history)

plot_acc(history.history)

plot_loss

plot_loss(history)

plot_loss(history.history)
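
The definitions of `plot_acc` and `plot_loss` are not shown on this page. A rough sketch of how they could be written with matplotlib follows, assuming each takes the `history.history` dictionary and that the keys are `accuracy`/`val_accuracy` and `loss`/`val_loss`, as in the training log above. The shared helper `plot_metric` is a hypothetical name introduced for this illustration.

```python
import matplotlib.pyplot as plt


def plot_metric(history, metric):
    """Plot the training and validation curves of one metric (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.plot(history[metric], label="train")
    ax.plot(history["val_" + metric], label="validation")
    ax.set_xlabel("epoch")
    ax.set_ylabel(metric)
    ax.legend()
    return ax


def plot_acc(history):
    # Assumes the metric key is "accuracy", as printed in the training log.
    return plot_metric(history, "accuracy")


def plot_loss(history):
    return plot_metric(history, "loss")
```

With these definitions, `plot_acc(history.history)` and `plot_loss(history.history)` each draw one figure with two curves.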
from keras.models import load_model
autoencoder2 = load_model("model/creditcard_autoencoders_model.h5")
autoencoder2.__class__
keras.engine.training.Model
preds = autoencoder.predict(X_test)
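X_test_arr = np.array(X_test)
for i in range(5):
    # Squared reconstruction error of row i divided by 30, capped at 1.
    # X has 29 columns, so 30 is a scaling constant rather than an exact mean.
    err = np.sum(np.square(preds[i] - X_test_arr[i])) / 30
    print(err if err < 1 else 1)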

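A row-by-row if/else loop is too slow for the whole test set, so compute the same capped score for all rows at once with np.where.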

errors = np.sum(np.square(preds - np.array(X_test)), axis=1) / 30
y_preds = np.where(errors < 1, errors, 1)
y_preds.__class__
numpy.ndarray
y_preds.shape
(28481,)
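
To make the score above concrete, here is a small self-contained sketch of the same calculation on toy arrays: sum the squared reconstruction error per row, divide by a constant, and cap the result at 1. The function name `capped_reconstruction_error` and the toy data are made up for illustration. The divisor of 30 and the cap of 1 mirror the code above.

```python
import numpy as np


def capped_reconstruction_error(x, x_hat, divisor=30, cap=1.0):
    """Per-row sum of squared errors divided by `divisor`, capped at `cap`."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    err = np.sum(np.square(x_hat - x), axis=1) / divisor
    return np.where(err < cap, err, cap)


# Toy inputs: row 0 is reconstructed fairly well, row 1 very badly.
x = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
x_hat = np.array([[0.0, 3.0, 0.0], [1.0, 2.0, 33.0]])
scores = capped_reconstruction_error(x, x_hat)
print(scores)  # row 0: 9 / 30 = 0.3; row 1: 900 / 30 = 30, capped to 1.0
```

A larger score means a worse reconstruction, which is the signal used here to flag a transaction as suspicious.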
import pyks
data = pd.DataFrame({'y': y_test, 'yhat': y_preds})
pyks.plot(data)
0.8452397152926965
0.8452397152926965
<Figure size 432x288 with 0 Axes>
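
The internals of `pyks` are not shown here. Assuming the number printed above is a Kolmogorov-Smirnov (KS) statistic, that is, the largest gap between the cumulative score distributions of the two classes, a NumPy-only sketch of that quantity looks like the following. The function name `ks_statistic` and the toy data are illustrative and are not part of `pyks`.

```python
import numpy as np


def ks_statistic(y_true, y_score):
    """Largest gap between the empirical score CDFs of class 1 and class 0."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = np.sort(y_score[y_true == 1])
    neg = np.sort(y_score[y_true == 0])
    # Evaluate both empirical CDFs at every distinct score value.
    thresholds = np.unique(y_score)
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / pos.size
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / neg.size
    return float(np.max(np.abs(cdf_pos - cdf_neg)))


# Toy example: the two positives mostly receive higher scores than the negatives.
y_true = np.array([0, 0, 0, 0, 1, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.9, 0.8, 1.0])
ks = ks_statistic(y_true, y_score)
print(ks)  # 0.75
```

A value close to 1 means the scores separate the two classes almost completely, and a value close to 0 means the score distributions are nearly identical.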