Using cardholders' credit-card transaction behavior, we build a fraud-detection model: when a new transaction arrives, the model judges whether it is the legitimate cardholder acting or someone fraudulently using the card.
Because the dataset has already been reduced with PCA, the sensitive details of the original records are hidden while most of the information in the data is preserved.
Deep neural networks are hard to interpret, and with PCA-transformed features the model overfits easily. Adding the weights to the loss function, so that large weights are penalized, reduces overfitting.
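Written out, the regularized objective used later in this post is the cross-entropy loss plus a scaled sum of squared weights (λ = 0.001 in the code below; note that tf.nn.l2_loss computes sum(w**2) / 2, hence the 1/2 factor):

$$
L = L_{\mathrm{CE}} + \lambda \sum_{l} \tfrac{1}{2}\,\lVert W_l \rVert_2^2
$$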
Inspecting the data
The data can be downloaded from the Kaggle website (login required).
The dataset contains 284,807 transactions made by European cardholders over two days in September 2013, of which 492 (0.172%) are fraudulent. PCA maps the features into the numeric attributes V1, V2, ..., V28; only the transaction time and amount are left untransformed. The output variable Class is binary: 1 for a fraudulent transaction, 0 for a normal one.
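The column summary and descriptive statistics below were produced along these lines (the CSV path is an assumption; point it at your local copy of the Kaggle download):

```python
import pandas as pd

# Path is an assumption -- adjust to wherever creditcard.csv was saved.
data_df = pd.read_csv("creditcard.csv")
data_df.info()                 # column dtypes and non-null counts
print(data_df.describe().T)    # transposed per-column summary statistics
```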
```
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
Time 284807 non-null float64
V1 284807 non-null float64
V2 284807 non-null float64
V3 284807 non-null float64
V4 284807 non-null float64
V5 284807 non-null float64
V6 284807 non-null float64
V7 284807 non-null float64
V8 284807 non-null float64
V9 284807 non-null float64
V10 284807 non-null float64
V11 284807 non-null float64
V12 284807 non-null float64
V13 284807 non-null float64
V14 284807 non-null float64
V15 284807 non-null float64
V16 284807 non-null float64
V17 284807 non-null float64
V18 284807 non-null float64
V19 284807 non-null float64
V20 284807 non-null float64
V21 284807 non-null float64
V22 284807 non-null float64
V23 284807 non-null float64
V24 284807 non-null float64
V25 284807 non-null float64
V26 284807 non-null float64
V27 284807 non-null float64
V28 284807 non-null float64
Amount 284807 non-null float64
Class 284807 non-null int64
dtypes: float64(30), int64(1)
memory usage: 67.4 MB
```
```
        count     mean          std           min          25%          \
Time    284807.0  9.481386e+04  47488.145955  0.000000     54201.500000
V1      284807.0  3.919560e-15  1.958696      -56.407510   -0.920373
V2      284807.0  5.688174e-16  1.651309      -72.715728   -0.598550
V3      284807.0  -8.769071e-15 1.516255      -48.325589   -0.890365
V4      284807.0  2.782312e-15  1.415869      -5.683171    -0.848640
V5      284807.0  -1.552563e-15 1.380247      -113.743307  -0.691597
V6      284807.0  2.010663e-15  1.332271      -26.160506   -0.768296
V7      284807.0  -1.694249e-15 1.237094      -43.557242   -0.554076
V8      284807.0  -1.927028e-16 1.194353      -73.216718   -0.208630
V9      284807.0  -3.137024e-15 1.098632      -13.434066   -0.643098
V10     284807.0  1.768627e-15  1.088850      -24.588262   -0.535426
V11     284807.0  9.170318e-16  1.020713      -4.797473    -0.762494
V12     284807.0  -1.810658e-15 0.999201      -18.683715   -0.405571
V13     284807.0  1.693438e-15  0.995274      -5.791881    -0.648539
V14     284807.0  1.479045e-15  0.958596      -19.214325   -0.425574
V15     284807.0  3.482336e-15  0.915316      -4.498945    -0.582884
V16     284807.0  1.392007e-15  0.876253      -14.129855   -0.468037
V17     284807.0  -7.528491e-16 0.849337      -25.162799   -0.483748
V18     284807.0  4.328772e-16  0.838176      -9.498746    -0.498850
V19     284807.0  9.049732e-16  0.814041      -7.213527    -0.456299
V20     284807.0  5.085503e-16  0.770925      -54.497720   -0.211721
V21     284807.0  1.537294e-16  0.734524      -34.830382   -0.228395
V22     284807.0  7.959909e-16  0.725702      -10.933144   -0.542350
V23     284807.0  5.367590e-16  0.624460      -44.807735   -0.161846
V24     284807.0  4.458112e-15  0.605647      -2.836627    -0.354586
V25     284807.0  1.453003e-15  0.521278      -10.295397   -0.317145
V26     284807.0  1.699104e-15  0.482227      -2.604551    -0.326984
V27     284807.0  -3.660161e-16 0.403632      -22.565679   -0.070840
V28     284807.0  -1.206049e-16 0.330083      -15.430084   -0.052960
Amount  284807.0  8.834962e+01  250.120109    0.000000     5.600000
Class   284807.0  1.727486e-03  0.041527      0.000000     0.000000

        50%           75%            max
Time    84692.000000  139320.500000  172792.000000
V1      0.018109      1.315642       2.454930
V2      0.065486      0.803724       22.057729
V3      0.179846      1.027196       9.382558
V4      -0.019847     0.743341       16.875344
V5      -0.054336     0.611926       34.801666
V6      -0.274187     0.398565       73.301626
V7      0.040103      0.570436       120.589494
V8      0.022358      0.327346       20.007208
V9      -0.051429     0.597139       15.594995
V10     -0.092917     0.453923       23.745136
V11     -0.032757     0.739593       12.018913
V12     0.140033      0.618238       7.848392
V13     -0.013568     0.662505       7.126883
V14     0.050601      0.493150       10.526766
V15     0.048072      0.648821       8.877742
V16     0.066413      0.523296       17.315112
V17     -0.065676     0.399675       9.253526
V18     -0.003636     0.500807       5.041069
V19     0.003735      0.458949       5.591971
V20     -0.062481     0.133041       39.420904
V21     -0.029450     0.186377       27.202839
V22     0.006782      0.528554       10.503090
V23     -0.011193     0.147642       22.528412
V24     0.040976      0.439527       4.584549
V25     0.016594      0.350716       7.519589
V26     -0.052139     0.240952       3.517346
V27     0.001342      0.091045       31.612198
V28     0.011244      0.078280       33.847808
Amount  22.000000     77.165000      25691.160000
Class   0.000000      0.000000       1.000000
```
Since the features are mostly produced by PCA, not much preprocessing is needed.
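One caveat: Amount is the one remaining feature that is not PCA-transformed, and the statistics above show it spans a wide range (std ≈ 250, max ≈ 25,691). If training proves unstable, standardizing it is a cheap option; a minimal sketch, not part of the original pipeline:

```python
# Optional: standardize the raw Amount column to zero mean / unit variance.
# data_df is the DataFrame loaded in the listing below.
amount = data_df["Amount"]
data_df["Amount"] = (amount - amount.mean()) / amount.std()
```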
Modeling: a deep learning model
A notable property of this dataset is that positive labels are very rare, so each training batch should balance positive and negative samples.
```python
import numpy as np
import pandas as pd

np.set_printoptions(suppress=True)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

data_df = pd.read_csv("/Users/wangsen/ai/13/homework/creditcard.csv")
# data_df.info()
# print(data_df.describe().T)

data_df = data_df.drop("Time", axis=1)   # drop the untransformed Time column
neg_df = data_df[data_df.Class == 0]     # normal transactions
pos_df = data_df[data_df.Class == 1]     # fraudulent transactions
neg_data = neg_df.drop('Class', axis=1).values
pos_data = pos_df.drop('Class', axis=1).values
print("neg_data shape:", neg_data.shape)
print("pos_data shape:", pos_data.shape)

import tensorflow as tf

X = tf.placeholder(dtype=tf.float32, shape=[None, 29])
label = tf.placeholder(dtype=tf.float32, shape=[None, 2])
net = tf.layers.dense(X, 16, tf.nn.relu)
net = tf.layers.dense(net, 256, tf.nn.relu)
net = tf.layers.dense(net, 256, tf.nn.relu)
net = tf.layers.dense(net, 256, tf.nn.relu)
net = tf.layers.dense(net, 256, tf.nn.relu)
y = tf.layers.dense(net, 2, None)        # logits; softmax is applied inside the loss
loss = tf.losses.softmax_cross_entropy(label, y)
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(y, 1), tf.argmax(label, 1)), tf.float32))
train_step = tf.train.AdamOptimizer(0.0001).minimize(loss)

neg_high = neg_data.shape[0]
pos_high = pos_data.shape[0]

# Every training batch is balanced: 32 negatives followed by 32 positives.
input_y = np.zeros([64, 2])
input_y[:32, 0] = 1
input_y[32:, 1] = 1

# Balanced 900-sample test set. Note the 450 test positives also stay in the
# training pool, so the accuracy below is somewhat optimistic.
test_x = np.concatenate([neg_data[10000:10000 + 450], pos_data[0:450]])
test_y = np.zeros([900, 2])
test_y[:450, 0] = 1
test_y[450:, 1] = 1

sess = tf.Session()
sess.run(tf.global_variables_initializer())
l = []
s = []
for itr in range(10000):
    neg_ind = np.random.randint(0, neg_high, 32)
    pos_ind = np.random.randint(0, pos_high, 32)
    input_x = np.concatenate([neg_data[neg_ind], pos_data[pos_ind]])
    _, loss_var = sess.run((train_step, loss), feed_dict={X: input_x, label: input_y})
    if itr % 100 == 0:
        accuracy_var = sess.run(accuracy, feed_dict={X: test_x, label: test_y})
        print("iter:%d accuracy:%f loss:%f" % (itr, accuracy_var, loss_var))
        s.append(accuracy_var)
        l.append(loss_var)

import matplotlib.pyplot as plt
plt.plot(l, color="red")    # training loss
plt.plot(s, color="green")  # test accuracy
plt.show()
```

```
neg_data shape: (284315, 29)
pos_data shape: (492, 29)
```
```
iter:9900 accuracy:0.988889 loss:0.062907
```
![](https://img.laitimes.com/img/9ZDMuAjOiMmIsIjOiQnIsICMyYTMvw1dvwlMvwlM3VWaWV2Zh1Wa-cmbw5CMwd2Nq92a3UjcvwlN5YDM3EjMtUGall3LcVmdhNXLwRHdo9CXt92YucWbpRWdvx2Yx5yazF2Lc9CX6MHc0RHaiojIsJye.png)
Loss (red) and test accuracy (green)
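Accuracy on a balanced test set hides the precision/recall trade-off that matters most for fraud detection. A fuller check, sketched under the assumption that the sess, y, X, test_x, and test_y from the listing above are still live (scikit-learn is an extra dependency, not used in the original post):

```python
from sklearn.metrics import classification_report, confusion_matrix

# Predicted class = argmax over the two logits; true class from the one-hot labels.
pred = sess.run(tf.argmax(y, 1), feed_dict={X: test_x})
true = test_y.argmax(axis=1)

print(confusion_matrix(true, pred))
print(classification_report(true, pred, target_names=["normal", "fraud"]))
```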
To prevent overfitting, add an L2 penalty:
```python
# Collect the L2 norm of every kernel (weight-matrix) variable; biases are excluded.
loss_w = [tf.nn.l2_loss(var) for var in tf.trainable_variables() if "kernel" in var.name]
print("variables:", tf.trainable_variables())
weights_norm = tf.reduce_sum(loss_w)
# Cross-entropy plus a small weight penalty (lambda = 0.001).
loss = tf.losses.softmax_cross_entropy(label, y) + 0.001 * weights_norm
```
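Since this redefines the loss tensor, the training op has to be rebuilt against it before the next run; a one-line sketch:

```python
# Rebuild the training op so it minimizes the regularized loss.
train_step = tf.train.AdamOptimizer(0.0001).minimize(loss)
```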
```
variables: [<tf.Variable 'dense/kernel:0' shape=(29, 16) dtype=float32_ref>, <tf.Variable 'dense/bias:0' shape=(16,) dtype=float32_ref>, <tf.Variable 'dense_1/kernel:0' shape=(16, 256) dtype=float32_ref>, <tf.Variable 'dense_1/bias:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'dense_2/kernel:0' shape=(256, 256) dtype=float32_ref>, <tf.Variable 'dense_2/bias:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'dense_3/kernel:0' shape=(256, 256) dtype=float32_ref>, <tf.Variable 'dense_3/bias:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'dense_4/kernel:0' shape=(256, 256) dtype=float32_ref>, <tf.Variable 'dense_4/bias:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'dense_5/kernel:0' shape=(256, 2) dtype=float32_ref>, <tf.Variable 'dense_5/bias:0' shape=(2,) dtype=float32_ref>]
iter:4900 accuracy:0.972222 loss:0.291221 weight:197.521942
```
Training log with the L2 penalty term added