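Code study
Scatter plots, several lines on one figure, plotting polynomial fits
Setting axis ticks
Drawing lines through fixed points, plotting the logistic function
Marking fixed points, visualizing binary classification, drawing text and arrows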
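4.1 Linear regression
Prediction: weighted sum of the features plus the bias term
Goal: find the θ that minimizes the mean squared error (MSE) cost function
Code implementation
Normal equation: W_LIN = θ = X⁺y, where X⁺ is the pseudo-inverse of X (equal to (XᵀX)⁻¹Xᵀ when XᵀX is invertible)
Code implementation:
np.linalg.inv inverts XᵀX to evaluate the normal equation and obtain θ0 and θ1
Second way to get the pseudo-inverse: np.linalg.pinv() returns X⁺ directly
Then a matrix product gives y_predict, the y values at the two endpoints of the fitted line
Scikit-Learn:
Instantiate, train the model, read θ0 and θ1 from intercept_ and coef_, draw the line with predict
Alternative: np.linalg.lstsq returns θ0 and θ1 directly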
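4.2 Gradient descent
Starting from an initial θ, each step lowers the cost function MSE(θ) until it converges to a minimum
Goal: find the parameter combination W_LIN that minimizes the cost function MSE(θ)
∇MSE(θ), the gradient (vector of partial derivatives) of the cost function: points in the direction of steepest ascent, so each step moves the opposite way
Learning rate η: the step size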
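For reference, the gradient and the update step described in these notes can be written out as follows. This is the standard form for the MSE cost with m training instances, assuming X already contains the bias column of ones:

```latex
\nabla_{\theta}\,\mathrm{MSE}(\theta) = \frac{2}{m}\,X^{\top}\left(X\theta - y\right),
\qquad
\theta^{(\text{next})} = \theta - \eta\,\nabla_{\theta}\,\mathrm{MSE}(\theta)
```

The first expression corresponds to the line `gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)` in the batch gradient descent cell, and the second to `theta = theta - eta * gradients`.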
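Batch gradient descent
Computes the gradient ∇MSE(θ) over the full training set at every step (iteration)
Code implementation:
Stochastic gradient descent (SGD)
Code implementation:
At every step (iteration) picks one random instance to compute the gradient, and gradually lowers the learning rate (step size η) as the iterations proceed
Learning schedule function: eta = learning_schedule(epoch * m + i)
Make sure the instances are IID: shuffle them so they are no longer sorted by label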
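The shuffling idea can be sketched as follows. This is a minimal illustration, not the notebook's own cell: the function name `sgd_shuffled`, the seeds and the synthetic data are assumptions, while the schedule `t0 / (t + t1)` with `t0, t1 = 5, 50` follows the notes. Each epoch visits every instance exactly once in a fresh random order, instead of drawing indices with replacement.

```python
import numpy as np

def sgd_shuffled(X_b, y, n_epochs=50, t0=5, t1=50, seed=0):
    """SGD for linear regression that reshuffles the instances every epoch.

    X_b must already contain the bias column of ones.
    """
    rng = np.random.default_rng(seed)
    m, n = X_b.shape
    theta = rng.standard_normal((n, 1))        # random initialization
    for epoch in range(n_epochs):
        order = rng.permutation(m)             # new random order each epoch
        for i, idx in enumerate(order):
            xi = X_b[idx:idx + 1]              # one instance, kept 2-D
            yi = y[idx:idx + 1]
            gradients = 2 * xi.T @ (xi @ theta - yi)
            eta = t0 / (epoch * m + i + t1)    # decaying learning rate
            theta = theta - eta * gradients
    return theta

# Hypothetical demo data in the same shape as the notebook's X and y
rng = np.random.default_rng(42)
X_demo = 2 * rng.random((100, 1))
y_demo = 4 + 3 * X_demo + rng.standard_normal((100, 1))
X_demo_b = np.c_[np.ones((100, 1)), X_demo]
theta_sgd = sgd_shuffled(X_demo_b, y_demo)
```

The result should land close to the least-squares solution, though not exactly on it, because the last steps still carry some noise.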
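Scikit-Learn:
SGDRegressor: train the model, then read the bias term and the weights
Mini-batch gradient descent
Computes each gradient on a small random batch of instances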
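The notebook has no cell for mini-batch gradient descent, so here is a minimal sketch under stated assumptions: the function name `minibatch_gd`, the batch size, the constant learning rate and the synthetic data are all illustrative choices rather than values from the notes.

```python
import numpy as np

def minibatch_gd(X_b, y, batch_size=20, n_epochs=100, eta=0.1, seed=0):
    """Mini-batch gradient descent for linear regression (MSE cost).

    X_b must already contain the bias column of ones.
    """
    rng = np.random.default_rng(seed)
    m, n = X_b.shape
    theta = rng.standard_normal((n, 1))        # random initialization
    for epoch in range(n_epochs):
        indices = rng.permutation(m)           # reshuffle once per epoch
        for start in range(0, m, batch_size):
            batch = indices[start:start + batch_size]
            xi, yi = X_b[batch], y[batch]
            # Same gradient formula as batch GD, averaged over the batch only
            gradients = 2 / len(batch) * xi.T @ (xi @ theta - yi)
            theta = theta - eta * gradients
    return theta

# Hypothetical demo data in the same shape as the notebook's X and y
rng = np.random.default_rng(42)
X_demo = 2 * rng.random((100, 1))
y_demo = 4 + 3 * X_demo + rng.standard_normal((100, 1))
X_demo_b = np.c_[np.ones((100, 1)), X_demo]
theta_mb = minibatch_gd(X_demo_b, y_demo)
```

With a constant learning rate the parameters keep jittering around the minimum; a decaying schedule like the one used for SGD would let them settle.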
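4.3 Polynomial regression
Goal: fit nonlinear data with a linear model
Idea: add powers of each feature as new features, then train a linear model on the extended feature set
Scikit-Learn:
PolynomialFeatures adds the feature combinations automatically; fit_transform returns the feature set with the squared column added; then apply linear regression
4.4 Learning curves
Ways to check how well a model generalizes: cross-validation, learning curves
Principle: compare the RMSE on the training set and on the validation set across different training set sizes
Custom plotting function plot_learning_curves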
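4.5 Regularization
Principle: constrain the model weights θ (or lower the polynomial degree) to reduce overfitting
Ridge regression ??
Ridge regression cost function: adds α/2 times the squared ℓ2 norm of the weights as a constraint
Closed-form solution for Ridge regression: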
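A minimal sketch of the closed-form solution, assuming the commonly quoted form θ = (XᵀX + αA)⁻¹Xᵀy, where A is the identity matrix with a 0 in the top-left corner so that the bias term is not penalized. The function name `ridge_closed_form` and the demo data are illustrative. Note that this form minimizes the sum of squared errors plus α times the squared weights, so α is scaled differently than in a cost written with MSE and α/2.

```python
import numpy as np

def ridge_closed_form(X, y, alpha=1.0):
    """Return theta = (X_b^T X_b + alpha * A)^-1 X_b^T y for Ridge regression."""
    X_b = np.c_[np.ones((len(X), 1)), X]   # prepend the bias column
    A = np.eye(X_b.shape[1])
    A[0, 0] = 0.0                          # leave the bias term unpenalized
    # solve() is used instead of an explicit inverse for numerical stability
    return np.linalg.solve(X_b.T @ X_b + alpha * A, X_b.T @ y)

# Hypothetical demo data
rng = np.random.default_rng(0)
X_demo = 2 * rng.random((50, 1))
y_demo = 4 + 3 * X_demo + 0.5 * rng.standard_normal((50, 1))
theta_ols = ridge_closed_form(X_demo, y_demo, alpha=0.0)     # no penalty: ordinary least squares
theta_ridge = ridge_closed_form(X_demo, y_demo, alpha=10.0)  # slope is shrunk toward 0
```

Setting `alpha=0` recovers the normal equation, which is a quick sanity check for the implementation.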
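Lasso regression ??
Lasso regression cost function: adds α times the ℓ1 norm of the weights as a constraint
Elastic Net
A mix ratio r weights the Lasso penalty and (1 - r) weights the Ridge penalty
ElasticNet model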
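For comparison, the three penalized cost functions summarized in this section can be written side by side. This is one standard textbook form, assuming n feature weights θ₁…θₙ with the bias θ₀ left unpenalized:

```latex
\begin{aligned}
J_{\text{Ridge}}(\theta) &= \mathrm{MSE}(\theta) + \frac{\alpha}{2}\sum_{i=1}^{n}\theta_i^{2} \\
J_{\text{Lasso}}(\theta) &= \mathrm{MSE}(\theta) + \alpha\sum_{i=1}^{n}\lvert\theta_i\rvert \\
J_{\text{ElasticNet}}(\theta) &= \mathrm{MSE}(\theta) + r\,\alpha\sum_{i=1}^{n}\lvert\theta_i\rvert + \frac{1-r}{2}\,\alpha\sum_{i=1}^{n}\theta_i^{2}
\end{aligned}
```

With r = 0 the Elastic Net cost reduces to Ridge, and with r = 1 it reduces to Lasso.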
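Early stopping
Once the validation error E_val starts to rise, the model is overfitting; roll the model parameters back to the point where E_val was lowest
Algorithm implementation: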
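The rollback idea can be sketched with plain batch gradient descent. This is a minimal illustration under stated assumptions: the function name `gd_with_early_stopping`, the learning rate, the epoch count and the synthetic quadratic data are not taken from the notebook. Training runs for all epochs, but the parameters from the epoch with the lowest validation error are kept and returned.

```python
import numpy as np

def gd_with_early_stopping(X_train, y_train, X_val, y_val, eta=0.01, n_epochs=500):
    """Batch GD on MSE; returns the parameters with the lowest validation MSE seen."""
    m, n = X_train.shape
    theta = np.zeros((n, 1))
    best_theta, best_epoch, best_val = theta.copy(), -1, np.inf
    for epoch in range(n_epochs):
        gradients = 2 / m * X_train.T @ (X_train @ theta - y_train)
        theta = theta - eta * gradients
        val_error = np.mean((X_val @ theta - y_val) ** 2)
        if val_error < best_val:               # new minimum of E_val: remember it
            best_val, best_epoch = val_error, epoch
            best_theta = theta.copy()          # snapshot to roll back to
    return best_theta, best_epoch, best_val

# Hypothetical demo: noisy quadratic data with features [1, x, x^2]
rng = np.random.default_rng(42)
x_demo = 6 * rng.random((100, 1)) - 3
y_demo = 0.5 * x_demo**2 + x_demo + 2 + rng.standard_normal((100, 1))
X_feat = np.c_[np.ones((100, 1)), x_demo, x_demo**2]
best_theta, best_epoch, best_val = gd_with_early_stopping(
    X_feat[:80], y_demo[:80], X_feat[80:], y_demo[80:])
```

Keeping a snapshot rather than breaking out of the loop at the first increase avoids stopping on a temporary bump in the validation error.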
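4.6 Logistic regression ??
Turns the linear score z into a binary prediction: the sigmoid function maps z to a probability p (the logarithm enters through the log loss cost function), and y is predicted as 1 when p ≥ 0.5
Decision boundary
Using the petal width feature to detect Iris virginica:
Get the petal width data X and the 0/1 labels y, train a logistic regression model, build a grid of x-axis points and estimate the class probabilities at each one, then plot
Softmax regression
For a given instance, compute the exponential of each class k's score, normalize, and have the classifier return the single class with the highest probability
Scikit-Learn: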
# Homework
a = 0
while a < 3:
    a += 1
    if a == 2:
        break
    print(a)
a = 0
while a < 3:
    a += 1
    if a == 2:
        continue
    print(a)
for i in range(4):
    for j in range(4):
        if j == 0:
            continue
            print(i)  # never reached: continue skips the rest of this iteration
        if j == 2:
            break
        print(i, j)
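In the nested for loops:
continue: skips the rest of the body and moves on to the next iteration; the inner loop's first iteration (j == 0) produces no output and execution goes straight to the second, so the printed j values never include 0
break: ends the inner loop, so only the second inner iteration produces output; the third and fourth produce none
In the while loops:
continue: ends the current iteration and starts the next one
break: ends the loop immediately
When the score is in the interval [90, 100), the index is 3 and the first A is returned
When the score is 100 and there is no second A, E is returned
After adding a second A, a score of 100 gives index 4 and the second A is returned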
import matplotlib as mpl
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
X = 2 * np.random.rand(100, 1)
y = 4 + 3*X + np.random.randn(100, 1)
from matplotlib.ticker import MultipleLocator  # the locator class lives in matplotlib.ticker
# plt.figure(figsize=(20,8), dpi=80)
plt.scatter(X, y)
plt.tick_params(axis='both',which='major',labelsize=14)
plt.xlabel('X1', fontsize=12)
plt.ylabel('y', fontsize=12)
# Tick spacing
x_major_locator = MultipleLocator(0.25)
y_major_locator = MultipleLocator(2)
# Current axes instance
ax = plt.gca()
# Major ticks
ax.xaxis.set_major_locator(x_major_locator)
ax.yaxis.set_major_locator(y_major_locator)
# Axis ranges
plt.xlim(0, 2.)
plt.ylim(0, 15)
plt.show()
X_b = np.c_[np.ones((100, 1)), X]
X_b
array([[1. , 0.69056896],
[1. , 1.09503641],
[1. , 0.64677682],
[1. , 1.30597366],
[1. , 1.83011935],
[1. , 1.05779863],
[1. , 0.54006116],
[1. , 0.41063882],
[1. , 0.84434918],
[1. , 1.48040929],
[1. , 0.32755812],
[1. , 0.95007289],
[1. , 1.6222861 ],
[1. , 1.36803173],
[1. , 0.67995342],
[1. , 0.28375754],
[1. , 0.22192431],
[1. , 0.81670441],
[1. , 0.3227235 ],
[1. , 1.93555247],
[1. , 0.88910284],
[1. , 0.90670084],
[1. , 0.85993465],
[1. , 1.56791135],
[1. , 1.7641284 ],
[1. , 1.71865204],
[1. , 1.27361387],
[1. , 1.52356172],
[1. , 1.06240312],
[1. , 1.19423602],
[1. , 0.68403175],
[1. , 1.23269376],
[1. , 0.28656663],
[1. , 0.56283408],
[1. , 0.64074196],
[1. , 0.18367042],
[1. , 0.71660301],
[1. , 0.13905064],
[1. , 0.35977406],
[1. , 0.4753392 ],
[1. , 0.45272167],
[1. , 0.93375507],
[1. , 0.68137743],
[1. , 0.2406101 ],
[1. , 0.60453245],
[1. , 1.02911014],
[1. , 1.76223207],
[1. , 1.89428423],
[1. , 1.41578483],
[1. , 1.08903995],
[1. , 0.28713063],
[1. , 1.83932683],
[1. , 1.72162226],
[1. , 1.9239099 ],
[1. , 1.89466128],
[1. , 1.69328722],
[1. , 1.04791071],
[1. , 1.08904717],
[1. , 0.32126084],
[1. , 1.31982875],
[1. , 1.24665312],
[1. , 0.71878169],
[1. , 1.43133907],
[1. , 1.03540358],
[1. , 1.1733726 ],
[1. , 1.90103184],
[1. , 1.24935772],
[1. , 0.85959727],
[1. , 0.11931619],
[1. , 1.08489517],
[1. , 0.53089631],
[1. , 1.15935157],
[1. , 0.50477505],
[1. , 0.82253989],
[1. , 0.42160345],
[1. , 0.05328369],
[1. , 1.7194971 ],
[1. , 1.03510607],
[1. , 0.54010678],
[1. , 1.1045144 ],
[1. , 0.75630817],
[1. , 1.66230773],
[1. , 1.36801615],
[1. , 0.51904843],
[1. , 0.76730644],
[1. , 1.23956514],
[1. , 0.06545095],
[1. , 1.97346577],
[1. , 0.17622071],
[1. , 0.38396168],
[1. , 1.06051449],
[1. , 0.23115968],
[1. , 1.15153478],
[1. , 0.97422361],
[1. , 0.25063895],
[1. , 0.61187805],
[1. , 0.90840842],
[1. , 1.34972953],
[1. , 1.82988156],
[1. , 0.63734291]])
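The cell that computes `theta_best` does not appear in this export. Based on the notes (np.linalg.inv applied to the normal equation), it presumably looked like the sketch below. The helper name `normal_equation` and the seeded demo data are assumptions, so the numbers will differ from the notebook's output.

```python
import numpy as np

def normal_equation(X_b, y):
    """Solve theta = (X_b^T X_b)^-1 X_b^T y with an explicit inverse."""
    return np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

# Hypothetical demo data with the same shape as the notebook's X and y
rng = np.random.default_rng(42)
X_demo = 2 * rng.random((100, 1))
y_demo = 4 + 3 * X_demo + rng.standard_normal((100, 1))
X_demo_b = np.c_[np.ones((100, 1)), X_demo]   # add x0 = 1 to every instance
theta_demo = normal_equation(X_demo_b, y_demo)
# In the notebook this would presumably be: theta_best = normal_equation(X_b, y)
```

The recovered parameters should be close to the generating values 4 and 3, with the gap explained by the Gaussian noise.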
theta_best
array([[4.07132793],
[3.00060297]])
X_new = np.array([[0], [2]])
X_new
array([[0],
[2]])
X_new_b = np.c_[np.ones((2,1)), X_new]
y_predict = X_new_b.dot(theta_best)
y_predict
array([[ 4.07132793],
[10.07253387]])
plt.plot(X_new, y_predict, 'r-')
plt.plot(X, y, 'b.')
plt.axis([0, 2, 0, 15])
plt.show()
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
lin_reg.intercept_, lin_reg.coef_
(array([4.07132793]), array([[3.00060297]]))
array([[ 4.07132793],
[10.07253387]])
theta_best_svd, residuals, rank, s = np.linalg.lstsq(X_b, y, rcond=1e-6)
theta_best_svd
array([[4.07132793],
[3.00060297]])
array([[4.07132793],
[3.00060297]])
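The second, identical output above has no input cell in this export; it presumably comes from the pseudo-inverse route mentioned in the notes. A minimal sketch of that route, using seeded demo data (an assumption) and checking it against np.linalg.lstsq:

```python
import numpy as np

# Hypothetical demo data with the same shape as the notebook's X and y
rng = np.random.default_rng(0)
X_demo = 2 * rng.random((100, 1))
y_demo = 4 + 3 * X_demo + rng.standard_normal((100, 1))
X_demo_b = np.c_[np.ones((100, 1)), X_demo]

# Pseudo-inverse of X times y
theta_pinv = np.linalg.pinv(X_demo_b).dot(y_demo)
# Least-squares solver; rcond=None selects NumPy's default cutoff for small singular values
theta_lstsq, residuals, rank, sv = np.linalg.lstsq(X_demo_b, y_demo, rcond=None)
```

Both routes rely on a singular value decomposition, so they still work when XᵀX is singular and np.linalg.inv would fail.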
count = 0
eta = 0.1  # learning rate
n_iterations = 1000
m = 100
theta = np.random.randn(2,1)  # random initialization
xml = MultipleLocator(0.5)
yml = MultipleLocator(2)
fig, ax = plt.subplots()
ax.plot(X, y,'b.')
for iteration in range(n_iterations):
    gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)
    theta = theta - eta * gradients
    # theta3 = np.append(theta3, theta, axis=1)
    # if count < 10:
    #     print(theta3)
    # Draw the current fit; X_new_b holds the two endpoints with the bias column
    # (the original used xx here, which is only defined in a later cell)
    ax.plot(X_new, X_new_b.dot(theta),'r-')
    count += 1
plt.xlabel('X1', fontsize=12)
plt.ylabel('y', fontsize=12)
ax.xaxis.set_major_locator(xml)
ax.yaxis.set_major_locator(yml)
plt.xlim(0,2.)
plt.ylim(0, 15)
plt.show()
n_epochs = 100
t0, t1 = 5, 50
def learning_schedule(t):
    return t0 / (t + t1)
theta = np.random.randn(2,1)
xx = np.c_[np.ones((2,1)), np.array([[0],[2]])]
for epoch in range(n_epochs):
    for i in range(m):
        random_index = np.random.randint(m)
        xi = X_b[random_index:random_index+1]
        yi = y[random_index:random_index+1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients
plt.plot(X, y, 'b.')
plt.plot(X_new, xx.dot(theta))
plt.show()
from sklearn.linear_model import SGDRegressor
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1)
sgd_reg.fit(X, y.ravel())
SGDRegressor(eta0=0.1, penalty=None)
sgd_reg.intercept_, sgd_reg.coef_
(array([4.1182052]), array([3.1043404]))
plt.plot(X,y, 'b.')
plt.plot(X_new, sgd_reg.predict(X_new), 'g-')  # rrr was undefined; presumably the SGD predictions
plt.show()
m = 100
X = 6 * np.random.rand(m ,1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)
plt.plot(X, y, 'b.')
plt.show()
# Looking forward to hiking; idea: a Classical Chinese translation learner (could write something fun of my own, ready to start)
from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
X[0]
array([0.04875089])
array([0.04875089, 0.00237665])
X_poly.shape
(100, 2)
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
lin_reg.intercept_, lin_reg.coef_
(array([2.05438764]), array([[0.96842795, 0.48549735]]))
lin_reg2 = LinearRegression()
lin_reg2.fit(X, y)
lin_reg2.intercept_, lin_reg2.coef_
# Second fit: a straight line
y_predict2 = lin_reg2.predict(np.array([[-3],[3]]))
poly = PolynomialFeatures(degree=300,include_bias=False)
poly.fit(X)
X3 = poly.transform(X)
lin_reg3 = LinearRegression()
lin_reg3.fit(X3,y)
y_predict3 = lin_reg3.predict(X3)
# y_predict3.shape
X3.shape
(100, 300)
# First fit: the quadratic curve
y_predict = lin_reg.predict(X_poly)
y_predict
plt.plot(X, y, 'r.')
plt.plot(np.sort(X,axis=None), y_predict[np.argsort(X,axis=None)], 'g-')
plt.plot(np.sort(X,axis=None), y_predict3[np.argsort(X,axis=None)], 'k:')
plt.plot(np.array([[-3],[3]]), y_predict2, 'y--')
plt.show()
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
def plot_learning_curves(model, X, y):
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
    train_errors, val_errors = [], []
    # Retrain on the first m training instances for each m and record both errors
    for m in range(1, len(X_train)):
        model.fit(X_train[:m], y_train[:m])
        y_train_predict = model.predict(X_train[:m])
        y_val_predict = model.predict(X_val)
        train_errors.append(mean_squared_error(y_train[:m], y_train_predict))
        val_errors.append(mean_squared_error(y_val, y_val_predict))
    plt.plot(np.sqrt(train_errors), 'r-+', linewidth=2, label='train')
    plt.plot(np.sqrt(val_errors), 'b-', linewidth=3, label='val')
    plt.legend()
    plt.xlabel('Training set size', fontsize=12)
    plt.ylabel('RMSE', fontsize=12)
lin_reg = LinearRegression()
plot_learning_curves(lin_reg, X, y)
plt.show()
from sklearn.pipeline import Pipeline
polynomial_regression = Pipeline([
    ('poly_features', PolynomialFeatures(degree=10, include_bias=False)),
    ('lin_reg', LinearRegression())
])
poly_features = PolynomialFeatures(degree=10, include_bias=False)
X_poly = poly_features.fit_transform(X)
lin_reg = LinearRegression()
# lin_reg.fit(X_poly,y)
# Plot the curves for the degree-10 pipeline; passing lin_reg with X would repeat the plain linear plot
plot_learning_curves(polynomial_regression, X, y)
plt.show()
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=0.1, solver='cholesky')
ridge_reg.fit(X,y)
ridge_reg.predict([[1.5]])
array([[5.01749775]])
sgd_reg = SGDRegressor(penalty='l2')
sgd_reg.fit(X,y.ravel())
sgd_reg.predict([[1.5]])
array([5.00194116])
from sklearn.linear_model import Lasso
lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X,y)
lasso_reg.predict([[1.5]])
array([4.97201777])
sgd_reg=SGDRegressor(penalty='l1')
sgd_reg.fit(X,y.ravel())
sgd_reg.predict([[1.5]])
array([5.00530233])
from sklearn.linear_model import ElasticNet
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X,y)
elastic_net.predict([[1.5]])
array([4.97514089])
from sklearn.preprocessing import StandardScaler
from copy import deepcopy
X_train, X_val, y_train, y_val = train_test_split(X, y.ravel(), test_size=0.2)
poly_scaler = Pipeline([
    ('poly_features', PolynomialFeatures(degree=90, include_bias=False)),
    ('std_scaler', StandardScaler())
])
X_train_poly_scaled = poly_scaler.fit_transform(X_train)
X_val_poly_scaled = poly_scaler.transform(X_val)
# max_iter=1 with warm_start=True: each fit() call runs one more epoch from the current weights
# tol=None disables the built-in stopping criterion (np.infty no longer exists in NumPy 2.0)
sgd_reg = SGDRegressor(max_iter=1, tol=None, warm_start=True,
                       penalty=None, learning_rate='constant', eta0=0.0005)
minimum_val_error = float('inf')
best_epoch = None
best_model = None
for epoch in range(1000):
    sgd_reg.fit(X_train_poly_scaled, y_train)
    y_val_predict = sgd_reg.predict(X_val_poly_scaled)
    val_error = mean_squared_error(y_val, y_val_predict)
    if val_error < minimum_val_error:
        minimum_val_error = val_error
        best_epoch = epoch
        # deepcopy keeps the fitted weights; sklearn's clone() would return an unfitted copy
        best_model = deepcopy(sgd_reg)
t = np.linspace(-10, 10, 100)
# Logistic (sigmoid) function
sig = 1 / (1 + np.exp(-t))
# Figure size
plt.figure(figsize=(9,3))
# Solid line at y=0, dotted lines at y=0.5 and y=1, solid line at x=0
plt.plot([-10,10],[0,0],'k-')
plt.plot([-10,10], [0.5,0.5], 'k:')
plt.plot([-10,10],[1,1], 'k:')
plt.plot([0,0], [-1.1,1.1], 'k-')
plt.plot(t, sig, 'b-', linewidth=2, label=r'$\sigma(t) = \frac{1}{1 + e^{-t}}$')
plt.xlabel('t')
plt.legend(loc='upper left', fontsize=20)
plt.axis([-10, 10, -0.1, 1.1])
# save_fig('logistic_function_plot')
plt.show()
from sklearn import datasets
iris = datasets.load_iris()
list(iris.keys())
['data',
'target',
'frame',
'target_names',
'DESCR',
'feature_names',
'filename',
'data_module']
X = iris['data'][:, 3:]
y = (iris['target'] == 2).astype(int)
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression()
log_reg.fit(X,y)
LogisticRegression()
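# Data: X is the petal width (the fourth column of iris['data']); y is 1 where target == 2 (Iris virginica), else 0
# Points along the x axis
X_new = np.linspace(0,3,1000).reshape(-1,1)
# Class probabilities from the logistic regression model at each point
y_proba = log_reg.predict_proba(X_new)
# Decision boundary: the first x at which the positive-class probability reaches 0.5 (indexed as a scalar)
decision_boundary = X_new[y_proba[:, 1] >= 0.5][0, 0]
plt.figure(figsize=(8,3))
# y contains 50 ones and 100 zeros (not Iris virginica)
# X[y==0], y[y==0]: the 100 non-virginica petal widths and 100 zeros, drawn as blue squares
# X[y==1], y[y==1]: the 50 virginica petal widths and 50 ones, drawn as green triangles
plt.plot(X[y==0], y[y==0], 'bs')
plt.plot(X[y==1], y[y==1], 'g^')
# Dotted line at the decision boundary
plt.plot([decision_boundary, decision_boundary], [-1,2], 'k:', linewidth=2)
# Estimated probability of the positive class (second column) and of the negative class (first column)
plt.plot(X_new, y_proba[:, 1], 'g-', linewidth=2, label='Iris virginica')
plt.plot(X_new, y_proba[:, 0], 'b--', linewidth=2, label='Not Iris virginica')
# Text position; ha sets the horizontal alignment
plt.text(decision_boundary+0.02, 0.15, 'Decision boundary', fontsize=14, color='k', ha='center')
# Arrow tail position, x/y offsets giving direction and length, head width and length, face and edge colors
plt.arrow(decision_boundary, 0.08, -0.3, 0, head_width=0.05, head_length=0.1, fc='b', ec='b')
plt.arrow(decision_boundary, 0.92, 0.3, 0, head_width=0.05, head_length=0.1, fc='g', ec='g')
plt.xlabel('Petal width (cm)', fontsize=14)
plt.ylabel('Probability', fontsize=14)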
plt.legend(loc='center left', fontsize=14)
plt.axis([0,3,-0.02,1.02])
plt.show()
array([1, 0])
Xl, Yl = np.meshgrid(
    np.linspace(0,1000,20).reshape(-1,1),
    np.linspace(0,500,20).reshape(-1,1)
)
plt.plot(Xl, Yl,
         color='limegreen',  # point color
         marker='.',         # marker type: small dot
         linestyle='')       # empty line style, so no line connects the points
plt.grid(True)
plt.show()
from sklearn.linear_model import LogisticRegression
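# Petal length and petal width
X = iris['data'][:, (2,3)]
# 0 = not virginica, 1 = virginica (the first 100 rows are 0, the last 50 are 1)
y = (iris['target'] == 2).astype(int)
# solver: one of the available optimizers; C: inverse of the regularization strength
log_reg = LogisticRegression(solver='lbfgs', C=10**10, random_state=42)
log_reg.fit(X, y)
# Build the grid coordinate matrices, each with 200 rows and 500 columns
x0, x1 = np.meshgrid(
    np.linspace(2.9,7,500).reshape(-1,1),
    np.linspace(0.8,2.7,200).reshape(-1,1)
)
# Flatten each to 1-D, then stack them as two columns
X_new = np.c_[x0.ravel(), x1.ravel()]
# Class probabilities for every grid point in X_new
y_proba = log_reg.predict_proba(X_new)
plt.figure(figsize=(10,4))
# Boolean mask on the rows (True for the first 100, False for the last 50 when y==0), plus a column index 0 or 1
# The 100 non-virginica rows: petal length (column 0) against petal width (column 1)
plt.plot(X[y==0, 0], X[y==0,1], 'bs')
# The 50 virginica rows: petal length (column 0) against petal width (column 1)
plt.plot(X[y==1, 0], X[y==1,1], 'g^')
zz = y_proba[:,1].reshape(x0.shape)
# Two independent variables and one dependent variable: unfilled contour lines, like elevation lines on a map
contour = plt.contour(x0, x1, zz, cmap=plt.cm.brg)
left_right = np.array([2.9, 7])
boundary = -(log_reg.coef_[0][0] * left_right + log_reg.intercept_[0]) / log_reg.coef_[0][1]
# Label the contour lines with their probability levels
plt.clabel(contour, inline=1, fontsize=12)
plt.plot(left_right, boundary, 'k--', linewidth=3)
plt.text(3.5, 1.5, 'Not Iris virginica', fontsize=14, color='b', ha='center')
plt.text(6.5, 2.3, 'Iris virginica', fontsize=14, color='g', ha='center')
plt.xlabel('Petal length', fontsize=14)
plt.ylabel('Petal width', fontsize=14)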
plt.axis([2.9,7,0.8,2.7])
plt.show()
X = iris['data'][:, (2,3)]
y = iris['target']
softmax_reg = LogisticRegression(multi_class='multinomial',solver='lbfgs', C=10)
softmax_reg.fit(X,y)
softmax_reg.predict([[5,2]])
array([2])
array([[6.38014896e-07, 5.74929995e-02, 9.42506362e-01]])
# Softmax regression with batch gradient descent, implementing early stopping
X = iris['data'][:, (2,3)]
y = iris['target']
# Build the design matrix: prepend the bias feature x0 = 1 to every instance
X_with_bias = np.c_[np.ones([len(X), 1]), X]
np.random.seed(2042)
# Manually split into train, validation and test sets (a random split, not a stratified one); set the test and validation ratios and the total size
test_ratio = 0.2
validation_ratio = 0.2
total_size = len(X_with_bias)
# Number of instances in each set
test_size = int(total_size * test_ratio)
validation_size = int(total_size * validation_ratio)
train_size = total_size - test_size - validation_size
# permutation: a random ordering of the indices 0 .. total_size - 1 (side note: in a 2-D array, axis 0 runs down the rows and axis 1 across the columns)
rnd_indices = np.random.permutation(total_size)
# Slice the shuffled indices to build each set
X_train = X_with_bias[rnd_indices[:train_size]]
y_train = y[rnd_indices[:train_size]]
X_valid = X_with_bias[rnd_indices[train_size:-test_size]]
y_valid = y[rnd_indices[train_size:-test_size]]
X_test = X_with_bias[rnd_indices[-test_size:]]
y_test = y[rnd_indices[-test_size:]]
# One-hot encoding: start from an all-zero matrix and set a 1 at each instance's class index
def to_one_hot(y):
    n_classes = y.max() + 1
    m = len(y)
    Y_one_hot = np.zeros((m, n_classes))
    Y_one_hot[np.arange(m), y] = 1
    return Y_one_hot
array([0, 1, 2, 1, 1, 0, 1, 1, 1, 0])
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[1., 0., 0.]])
# Convert all target labels to one-hot vectors
Y_train_one_hot = to_one_hot(y_train)
Y_valid_one_hot = to_one_hot(y_valid)
Y_test_one_hot = to_one_hot(y_test)
# Softmax: each exponential divided by the sum of the exponentials in its row
def softmax(logits):
    exps = np.exp(logits)
    exp_sums = np.sum(exps, axis=1, keepdims=True)
    return exps / exp_sums
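One caveat about the softmax helper above: np.exp overflows for large logits. A common remedy, shown here as a hedged sketch with the hypothetical name `softmax_stable`, is to subtract each row's maximum before exponentiating. The result is mathematically unchanged because the shift cancels in the ratio.

```python
import numpy as np

def softmax_stable(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in np.exp."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)

# The second row would overflow without the shift
probs = softmax_stable(np.array([[1.0, 2.0, 3.0],
                                 [1000.0, 1000.0, 1000.0]]))
```

For the iris features used here the logits stay small, so the simpler version in the notebook behaves the same way.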
# Number of input features (including the bias) and number of label classes
n_inputs = X_train.shape[1]
n_outputs = len(np.unique(y_train))
# Equations needed to train the model
# Cost function
# $J(\mathbf{\Theta}) =
# - \dfrac{1}{m}\sum\limits_{i=1}^{m}\sum\limits_{k=1}^{K}{y_k^{(i)}\log\left(\hat{p}_k^{(i)}\right)}$
# Gradient computation
# $\nabla_{\mathbf{\theta}^{(k)}} \, J(\mathbf{\Theta}) =
# \dfrac{1}{m} \sum\limits_{i=1}^{m}{ \left ( \hat{p}^{(i)}_k - y_k^{(i)} \right ) \mathbf{x}^{(i)}}$
eta = 0.01
n_iterations = 5001
m = len(X_train)
epsilon = 1e-7
Theta = np.random.randn(n_inputs, n_outputs)
# Batch gradient descent for softmax regression
for iteration in range(n_iterations):
    logits = X_train.dot(Theta)
    Y_proba = softmax(logits)
    if iteration % 500 == 0:
        loss = -np.mean(np.sum(Y_train_one_hot * np.log(Y_proba+epsilon), axis=1))
        print(iteration, loss)
    error = Y_proba - Y_train_one_hot
    gradients = 1/m * X_train.T.dot(error)
    Theta = Theta - eta * gradients
0 5.446205811872683
500 0.8350062641405651
1000 0.6878801447192402
1500 0.6012379137693313
2000 0.5444496861981872
2500 0.5038530181431525
3000 0.47292289721922487
3500 0.44824244188957774
4000 0.4278651093928793
4500 0.41060071429187134
5000 0.3956780375390374
Theta
array([[ 3.32094157, -0.6501102 , -2.99979416],
[-1.1718465 , 0.11706172, 0.10507543],
[-0.70224261, -0.09527802, 1.4786383 ]])
# Predict on the validation set and check the accuracy score
logits = X_valid.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)
accuracy_score = np.mean(y_predict == y_valid)
accuracy_score
0.9666666666666667
# Add an l2 penalty to the cost function
eta = 0.1
n_iterations = 5001
m = len(X_train)
epsilon = 1e-7
alpha = 0.1
Theta = np.random.randn(n_inputs, n_outputs)
for iteration in range(n_iterations):
    logits = X_train.dot(Theta)
    Y_proba = softmax(logits)
    if iteration % 500 == 0:
        xentropy_loss = -np.mean(np.sum(Y_train_one_hot * np.log(Y_proba + epsilon), axis=1))
        l2_loss = 1/2 * np.sum(np.square(Theta[1:]))
        loss = xentropy_loss + alpha * l2_loss
        print(iteration, loss)
    error = Y_proba - Y_train_one_hot
    gradients = 1/m * X_train.T.dot(error) + np.r_[np.zeros([1, n_outputs]), alpha * Theta[1:]]
    Theta = Theta - eta * gradients
0 5.401014020496038
500 0.5399802167300589
1000 0.5055073771883054
1500 0.4953639890209271
2000 0.49156703270914
2500 0.4900134074001495
3000 0.48934877664358845
3500 0.48905717267345383
4000 0.488927251858594
4500 0.4888688023117297
5000 0.4888423408562912
# Check the model's performance on the validation set
logits = X_valid.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)
accuracy_score = np.mean(y_predict == y_valid)
accuracy_score
1.0
# Early stopping: compute the validation error E_val at every iteration and stop as soon as it starts to rise
eta = 0.1
n_iterations = 5001
m = len(X_train)
epsilon = 1e-7
alpha = 0.1
best_loss = np.inf
Theta = np.random.randn(n_inputs, n_outputs)
for iteration in range(n_iterations):
    logits = X_train.dot(Theta)
    Y_proba = softmax(logits)
    error = Y_proba - Y_train_one_hot
    gradients = 1/m * X_train.T.dot(error) + np.r_[np.zeros([1, n_outputs]), alpha * Theta[1:]]
    Theta = Theta - eta * gradients
    logits = X_valid.dot(Theta)
    Y_proba = softmax(logits)
    xentropy_loss = -np.mean(np.sum(Y_valid_one_hot * np.log(Y_proba + epsilon), axis=1))
    l2_loss = 1/2 * np.sum(np.square(Theta[1:]))
    loss = xentropy_loss + alpha * l2_loss
    if iteration % 500 == 0:
        print(iteration, loss)
    if loss < best_loss:
        best_loss = loss
    else:
        print(iteration - 1, best_loss)
        print(iteration, loss, 'early stopping!')
        break
0 2.897275838876366
500 0.5702751662442892
1000 0.5425654873413586
1500 0.5353090385301479
2000 0.5331256731252507
2500 0.5325827330917428
2736 0.5325454243382794
2737 0.5325454252101579 early stopping!
logits = X_valid.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)
accuracy_score = np.mean(y_predict == y_valid)
accuracy_score
1.0
# Plot the model's predictions
x0, x1 = np.meshgrid(
    np.linspace(0, 8, 500).reshape(-1,1),
    np.linspace(0, 3.5, 200).reshape(-1,1)
)
X_new = np.c_[x0.ravel(), x1.ravel()]
X_new_with_bias = np.c_[np.ones([len(X_new), 1]), X_new]
logits = X_new_with_bias.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)
zz1 = Y_proba[:, 1].reshape(x0.shape)
zz = y_predict.reshape(x0.shape)
plt.figure(figsize=(10, 4))
plt.plot(X[y==2,0], X[y==2,1], 'g^', label='Iris virginica')
plt.plot(X[y==1,0], X[y==1,1], 'bs', label='Iris versicolor')
plt.plot(X[y==0,0], X[y==0,1], 'yo', label='Iris setosa')
from matplotlib.colors import ListedColormap
custom_cmap = ListedColormap(['#fafab0', '#9898ff', '#a0faa0'])
plt.contourf(x0, x1, zz, cmap=custom_cmap)
contour = plt.contour(x0, x1, zz1, cmap=plt.cm.brg)
plt.clabel(contour, inline=1, fontsize=12)
plt.xlabel('Petal length',fontsize=14)
plt.ylabel('Petal width',fontsize=14)
plt.legend(loc='upper left', fontsize=14)
plt.axis([0,7,0,3.5])
plt.show()
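# Measure the model's accuracy on the test set
# It comes out lower than on the validation set, probably because N is small and the result changes with how the three sets happen to be split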
logits = X_test.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)
accuracy_score = np.mean(y_predict == y_test)
accuracy_score
0.9333333333333333