
Chapter 4 Training Models: Code Study. 4.1 Linear Regression / 4.2 Gradient Descent / 4.3 Polynomial Regression / 4.4 Learning Curves / 4.5 Regularization / 4.6 Logistic Regression

Code study

Scatter plots, drawing several lines in one figure, plotting polynomial curves

Setting axis tick intervals

Drawing reference lines at given positions, plotting the logistic function

Marking given points, visualizing binary classification, drawing text annotations and arrows

4.1 Linear Regression

A linear model predicts by computing a weighted sum of the input features plus a bias term

Goal: find the θ that minimizes the MSE cost function

Code implementation

Normal equation: $\hat{\theta} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}$; in pseudo-inverse form, $W_{LIN} = \hat{\theta} = \mathbf{X}^{+}\mathbf{y}$

Code implementation:

Method 1: evaluate the normal equation with np.linalg.inv (invert $\mathbf{X}^{T}\mathbf{X}$) to obtain θ0 and θ1

Method 2: np.linalg.pinv() returns the pseudo-inverse of X directly

Then matrix-multiply to get y_predict, the y values at the two endpoints of the fitted line

Scikit-Learn:

Instantiate the model, train it, read θ0 and θ1 from intercept_ and coef_, and call predict to get the line

Alternative: np.linalg.lstsq returns θ0 and θ1 directly

4.2 Gradient Descent

Starting from a randomly initialized θ, lower the cost function MSE(θ) a little at each step until it converges to a minimum

Goal: find the parameter vector θ that minimizes the cost function MSE(θ)

∇MSE(θ), the vector of partial derivatives of the cost function (the gradient), points in the direction of steepest ascent

The learning rate η sets the size of each step

Batch gradient descent

Compute the gradient ∇MSE(θ) over the whole training set at every step (iteration), as in the formulas below
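
In formulas, the gradient vector and the update rule (exactly what the batch gradient descent cell further down computes) are:

$$\nabla_{\theta}\,\mathrm{MSE}(\theta) = \frac{2}{m}\,\mathbf{X}^{T}(\mathbf{X}\theta - \mathbf{y}), \qquad \theta^{(\mathrm{next\ step})} = \theta - \eta\,\nabla_{\theta}\,\mathrm{MSE}(\theta)$$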

Code implementation:

Stochastic gradient descent (SGD)

Code implementation:

At each step (iteration), pick one random instance and compute the gradient on it alone; gradually reduce the learning rate (step size η) as the iterations proceed

Learning-rate schedule: eta = learning_schedule(epoch * m + i)

To keep the instances IID, shuffle the training set so that any ordering by label is destroyed

Scikit-Learn:

SGDRegressor: train the model, then read the bias term and the weights

Mini-batch gradient descent

Compute each step's gradient on a small batch of randomly drawn instances (see the sketch below)
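
The notes below do not include a mini-batch cell. A minimal sketch under the same setup as the batch gradient descent cell (it reuses X_b, y and m from that cell; the batch size of 20 and the constant eta are arbitrary choices):

theta = np.random.randn(2, 1)                      # random initialization
n_epochs = 50
minibatch_size = 20
eta = 0.1                                          # constant learning rate, for simplicity
for epoch in range(n_epochs):
    shuffled_indices = np.random.permutation(m)    # reshuffle so every mini-batch is drawn i.i.d.
    X_b_shuffled = X_b[shuffled_indices]
    y_shuffled = y[shuffled_indices]
    for i in range(0, m, minibatch_size):
        xi = X_b_shuffled[i:i + minibatch_size]
        yi = y_shuffled[i:i + minibatch_size]
        gradients = 2 / len(xi) * xi.T.dot(xi.dot(theta) - yi)   # gradient on the mini-batch only
        theta = theta - eta * gradients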

4.3 Polynomial Regression

Goal: fit nonlinear data with a linear model

Idea: add powers of the features as new features, then train the linear model on this extended feature set

Scikit-Learn:

PolynomialFeatures adds the feature combinations automatically; fit_transform returns the feature set with the added squared column; then fit a LinearRegression on it

4.4 Learning Curves

Ways to check a model's generalization performance: cross-validation and learning curves

Idea: plot the RMSE on the training set and on the validation set as a function of training-set size

Custom plotting function plot_learning_curves

4.5 Regularization

Idea: constrain the model weights θ to reduce the model's degrees of freedom and thus reduce overfitting

Ridge regression

Ridge cost function: adds half the squared ℓ2 norm of the weights as a penalty, $J(\theta) = \mathrm{MSE}(\theta) + \alpha\,\frac{1}{2}\sum_{i=1}^{n}\theta_i^2$

Closed-form ridge solution: $\hat{\theta} = (\mathbf{X}^{T}\mathbf{X} + \alpha\mathbf{A})^{-1}\mathbf{X}^{T}\mathbf{y}$, where A is the identity matrix with a 0 in the top-left cell (the bias term is not regularized)

Lasso regression

Lasso cost function: adds α times the ℓ1 norm of the weights, $J(\theta) = \mathrm{MSE}(\theta) + \alpha\sum_{i=1}^{n}|\theta_i|$

Elastic Net

A mix ratio r blends the two penalties: r weights the Lasso (ℓ1) term and (1 − r) weights the Ridge (ℓ2) term

ElasticNet model

Early stopping

When the validation error starts rising the model is beginning to overfit; roll the parameters back to the point where the validation error was lowest

Algorithm implementation:

4.6 Logistic Regression

Pass the linear model's output z = θᵀx through the sigmoid function to get the estimated probability p̂ = σ(z), and predict y = 1 when p̂ ≥ 0.5; the logarithm appears in the cost function (the log loss)
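
In formulas:

$$\hat{p} = \sigma(\mathbf{x}^{T}\theta) = \frac{1}{1 + e^{-\mathbf{x}^{T}\theta}}, \qquad J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log\hat{p}^{(i)} + \big(1 - y^{(i)}\big)\log\big(1 - \hat{p}^{(i)}\big)\Big]$$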

Decision boundary

Use the petal width feature to detect whether a flower is Iris virginica:

Get the width data X and the 0/1 labels y, train a logistic regression model, build a grid of x-axis points, compute the estimated class probabilities at those points, and plot

Softmax regression

For an instance, compute a score for each class k, exponentiate the scores, then normalize them; the classifier returns the class with the highest probability (see the formula below)
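
In the usual notation, the softmax score and probability for class k are

$$s_k(\mathbf{x}) = \mathbf{x}^{T}\theta^{(k)}, \qquad \hat{p}_k = \frac{\exp\!\big(s_k(\mathbf{x})\big)}{\sum_{j=1}^{K}\exp\!\big(s_j(\mathbf{x})\big)}$$

which is what the softmax(logits) helper further down computes.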

Scikit-Learn:

# Homework
a=0
while a<3:
    a+=1
    if a == 2:
        break
    print(a)
a=0
while a<3:
    a+=1
    if a == 2:
        continue
    print(a)
for i in range(4):
    for j in range(4):
        if j == 0:
            continue
            print(i)  # unreachable: continue has already jumped to the next iteration
        if j == 2:
            break
        print(i, j)
In the nested for loops:
continue: skips the rest of the body and moves on to the next iteration; here the first inner iteration (j == 0) prints nothing and the loop goes straight to the second iteration, so the printed j values never include 0
break: ends the inner loop, so only the second inner iteration produces output; the third and fourth inner iterations print nothing
In the while loops:
continue: ends the current iteration and moves on to the next one
break: exits the loop immediately
When score is in the interval [90, 100), the index is 3 and the first 'A' is returned
When the input is 100 and there is no second 'A', the function returns 'E'
After adding a second 'A', an input of 100 gives index 4 and returns the second 'A'
           
           
import matplotlib as mpl
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
           
X = 2 * np.random.rand(100, 1)
y = 4 + 3*X + np.random.randn(100, 1)
           
from matplotlib.pyplot import MultipleLocator
           
# plt.figure(figsize=(20,8), dpi=80)
plt.scatter(X, y)
plt.tick_params(axis='both',which='major',labelsize=14)
plt.xlabel('X1', fontsize=12)
plt.ylabel('y', fontsize=12)
# tick spacing
x_major_locator = MultipleLocator(0.25)
y_major_locator = MultipleLocator(2)
# get the current Axes instance
ax = plt.gca()
# set the major tick locators
ax.xaxis.set_major_locator(x_major_locator)
ax.yaxis.set_major_locator(y_major_locator)
# axis limits
plt.xlim(0, 2.)
plt.ylim(0, 15)
plt.show()
           
X_b = np.c_[np.ones((100, 1)), X]
X_b
           
array([[1.        , 0.69056896],
       [1.        , 1.09503641],
       [1.        , 0.64677682],
       [1.        , 1.30597366],
       [1.        , 1.83011935],
       [1.        , 1.05779863],
       [1.        , 0.54006116],
       [1.        , 0.41063882],
       [1.        , 0.84434918],
       [1.        , 1.48040929],
       [1.        , 0.32755812],
       [1.        , 0.95007289],
       [1.        , 1.6222861 ],
       [1.        , 1.36803173],
       [1.        , 0.67995342],
       [1.        , 0.28375754],
       [1.        , 0.22192431],
       [1.        , 0.81670441],
       [1.        , 0.3227235 ],
       [1.        , 1.93555247],
       [1.        , 0.88910284],
       [1.        , 0.90670084],
       [1.        , 0.85993465],
       [1.        , 1.56791135],
       [1.        , 1.7641284 ],
       [1.        , 1.71865204],
       [1.        , 1.27361387],
       [1.        , 1.52356172],
       [1.        , 1.06240312],
       [1.        , 1.19423602],
       [1.        , 0.68403175],
       [1.        , 1.23269376],
       [1.        , 0.28656663],
       [1.        , 0.56283408],
       [1.        , 0.64074196],
       [1.        , 0.18367042],
       [1.        , 0.71660301],
       [1.        , 0.13905064],
       [1.        , 0.35977406],
       [1.        , 0.4753392 ],
       [1.        , 0.45272167],
       [1.        , 0.93375507],
       [1.        , 0.68137743],
       [1.        , 0.2406101 ],
       [1.        , 0.60453245],
       [1.        , 1.02911014],
       [1.        , 1.76223207],
       [1.        , 1.89428423],
       [1.        , 1.41578483],
       [1.        , 1.08903995],
       [1.        , 0.28713063],
       [1.        , 1.83932683],
       [1.        , 1.72162226],
       [1.        , 1.9239099 ],
       [1.        , 1.89466128],
       [1.        , 1.69328722],
       [1.        , 1.04791071],
       [1.        , 1.08904717],
       [1.        , 0.32126084],
       [1.        , 1.31982875],
       [1.        , 1.24665312],
       [1.        , 0.71878169],
       [1.        , 1.43133907],
       [1.        , 1.03540358],
       [1.        , 1.1733726 ],
       [1.        , 1.90103184],
       [1.        , 1.24935772],
       [1.        , 0.85959727],
       [1.        , 0.11931619],
       [1.        , 1.08489517],
       [1.        , 0.53089631],
       [1.        , 1.15935157],
       [1.        , 0.50477505],
       [1.        , 0.82253989],
       [1.        , 0.42160345],
       [1.        , 0.05328369],
       [1.        , 1.7194971 ],
       [1.        , 1.03510607],
       [1.        , 0.54010678],
       [1.        , 1.1045144 ],
       [1.        , 0.75630817],
       [1.        , 1.66230773],
       [1.        , 1.36801615],
       [1.        , 0.51904843],
       [1.        , 0.76730644],
       [1.        , 1.23956514],
       [1.        , 0.06545095],
       [1.        , 1.97346577],
       [1.        , 0.17622071],
       [1.        , 0.38396168],
       [1.        , 1.06051449],
       [1.        , 0.23115968],
       [1.        , 1.15153478],
       [1.        , 0.97422361],
       [1.        , 0.25063895],
       [1.        , 0.61187805],
       [1.        , 0.90840842],
       [1.        , 1.34972953],
       [1.        , 1.82988156],
       [1.        , 0.63734291]])
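
theta_best is used in the next cell but the cell that computes it is missing; judging from the np.linalg.inv note above, it was presumably the normal equation:

theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)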
           
theta_best
           
array([[4.07132793],
       [3.00060297]])
           
X_new = np.array([[0], [2]])
X_new
           
array([[0],
       [2]])
           
X_new_b = np.c_[np.ones((2,1)), X_new]
y_predict = X_new_b.dot(theta_best)
y_predict
           
array([[ 4.07132793],
       [10.07253387]])
           
plt.plot(X_new, y_predict, 'r-')
plt.plot(X, y, 'b.')
plt.axis([0, 2, 0, 15])
plt.show()
           
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
lin_reg.intercept_, lin_reg.coef_
           
(array([4.07132793]), array([[3.00060297]]))
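
The output below matches y_predict above; the missing cell was presumably a prediction on X_new with the fitted model:

lin_reg.predict(X_new)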
           
array([[ 4.07132793],
       [10.07253387]])
           
theta_besta_svd, residuals, rank, s = np.linalg.lstsq(X_b, y, rcond=1e-6)
theta_besta_svd
           
array([[4.07132793],
       [3.00060297]])
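
The second, identical θ estimate below most likely comes from the pseudo-inverse route mentioned in the notes:

np.linalg.pinv(X_b).dot(y)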
           
array([[4.07132793],
       [3.00060297]])
           
count = 0
eta = 0.1
n_iterations = 1000
m = 100
theta = np.random.randn(2,1)
xml = MultipleLocator(0.5)
yml = MultipleLocator(2)
fig, ax = plt.subplots()
for iteration in range(n_iterations):
    gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)
    theta = theta - eta * gradients
#     theta3 = np.append(theta3, theta, axis=1)
#     if count < 10:
#     print(theta3)
    ax.plot(X, y,'b.')
    ax.plot(X_new, X_new_b.dot(theta), 'r-')   # current fit line through the two endpoints (X_new_b defined earlier)
    plt.xlabel('X1', fontsize=12)
    plt.ylabel('y', fontsize=12)
    ax.xaxis.set_major_locator(xml)
    ax.yaxis.set_major_locator(yml)
    plt.xlim(0,2.)
    plt.ylim(0, 15)
    count += 1
plt.show()
           
n_epochs = 100
t0, t1 = 5, 50
def learning_schedule(t):
    return t0 / (t + t1)
theta = np.random.randn(2,1)
xx = np.c_[np.ones((2,1)), np.array([[0],[2]])]
for epoch in range(n_epochs):
    for i in range(m):
        random_index = np.random.randint(m)
        xi = X_b[random_index:random_index+1]
        yi = y[random_index:random_index+1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients
        plt.plot(X, y, 'b.')
        plt.plot(X_new, xx.dot(theta))
plt.show()
           
from sklearn.linear_model import SGDRegressor
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1)
sgd_reg.fit(X, y.ravel())
           
SGDRegressor(eta0=0.1, penalty=None)
           
sgd_reg.intercept_, sgd_reg.coef_
           
(array([4.1182052]), array([3.1043404]))
           
rrr = sgd_reg.predict(X_new)   # predictions of the SGD model at the two endpoints
plt.plot(X, y, 'b.')
plt.plot(X_new, rrr, 'g-')
plt.show()
           
m = 100
X = 6 * np.random.rand(m ,1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)
plt.plot(X, y, 'b.')
plt.show()
           
# Ideas for later: hill climbing and a classical-Chinese translation learner (could write something fun of my own; ready to start)
from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
X[0]
           
array([0.04875089])
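
The next output matches the first transformed instance, so the missing cell was presumably:

X_poly[0]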
           
array([0.04875089, 0.00237665])
           
X_poly.shape
           
(100, 2)
           
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
lin_reg.intercept_, lin_reg.coef_
           
(array([2.05438764]), array([[0.96842795, 0.48549735]]))
           
lin_reg2 = LinearRegression()
lin_reg2.fit(X, y)
lin_reg2.intercept_, lin_reg2.coef_
# second fit: a plain straight line for comparison
y_predict2 = lin_reg2.predict(np.array([[-3],[3]]))
           
poly = PolynomialFeatures(degree=300,include_bias=False)
poly.fit(X)
X3 = poly.transform(X)
lin_reg3 = LinearRegression()
lin_reg3.fit(X3,y)
y_predict3 = lin_reg3.predict(X3)
# y_predict3.shape
X3.shape
           
(100, 300)
           
# first curve: the degree-2 polynomial fit
y_predict = lin_reg.predict(X_poly)
y_predict
plt.plot(X, y, 'r.')
plt.plot(np.sort(X,axis=None), y_predict[np.argsort(X,axis=None)], 'g-')
plt.plot(np.sort(X,axis=None), y_predict3[np.argsort(X,axis=None)], 'k:')
plt.plot(np.array([[-3],[3]]), y_predict2, 'y--')
plt.show()
           
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
           
def plot_learning_curves(model,X,y):
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
    train_errors, val_errors = [], []
    for m in range(1, len(X_train)):
        model.fit(X_train[:m], y_train[:m])
        y_train_predict = model.predict(X_train[:m])
        y_val_predict = model.predict(X_val)
        train_errors.append(mean_squared_error(y_train[:m], y_train_predict))
        val_errors.append(mean_squared_error(y_val, y_val_predict))
    plt.plot(np.sqrt(train_errors), 'r-+', linewidth=2, label='train')
    plt.plot(np.sqrt(val_errors), 'b-', linewidth=3, label='val')
    plt.legend()
    plt.xlabel('Training set size', fontsize=12)
    plt.ylabel('RMSE', fontsize=12)
           
lin_reg = LinearRegression()
plot_learning_curves(lin_reg, X, y)
plt.show()
           
from sklearn.pipeline import Pipeline
polynomial_regression = Pipeline([
    ('poly_features', PolynomialFeatures(degree=10, include_bias=False)),
    ('lin_reg', LinearRegression())
])
           
# learning curves for the 10th-degree polynomial model, using the pipeline defined above
plot_learning_curves(polynomial_regression, X, y)
plt.show()
           
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=0.1, solver='cholesky')
ridge_reg.fit(X,y)
ridge_reg.predict([[1.5]])
           
array([[5.01749775]])
           
sgd_reg = SGDRegressor(penalty='l2')
sgd_reg.fit(X,y.ravel())
sgd_reg.predict([[1.5]])
           
array([5.00194116])
           
from sklearn.linear_model import Lasso
lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X,y)
lasso_reg.predict([[1.5]])
           
array([4.97201777])
           
sgd_reg=SGDRegressor(penalty='l1')
sgd_reg.fit(X,y.ravel())
sgd_reg.predict([[1.5]])
           
array([5.00530233])
           
from sklearn.linear_model import ElasticNet
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X,y)
elastic_net.predict([[1.5]])
           
array([4.97514089])
           
from sklearn.preprocessing import StandardScaler
           
from sklearn.base import clone
X_train, X_val, y_train, y_val = train_test_split(X, y.ravel(), test_size=0.2)
poly_scaler = Pipeline([
    ('poly_features', PolynomialFeatures(degree=90, include_bias=False)),
    ('std_scaler', StandardScaler())
])
X_train_poly_scaled = poly_scaler.fit_transform(X_train)
X_val_poly_scaled = poly_scaler.transform(X_val)
# warm_start=True: each call to fit() continues training from where it left off
sgd_reg = SGDRegressor(max_iter=1, tol=-np.infty, warm_start=True,
                      penalty=None, learning_rate='constant', eta0=0.0005)
minimum_val_error = float('inf')
best_epoch = None
best_model = None
for epoch in range(1000):
    sgd_reg.fit(X_train_poly_scaled, y_train)
    y_val_predict = sgd_reg.predict(X_val_poly_scaled)
    val_error = mean_squared_error(y_val, y_val_predict)
    if val_error < minimum_val_error:
        # remember the best epoch and keep a copy of the best model seen so far
        minimum_val_error = val_error
        best_epoch = epoch
        best_model = clone(sgd_reg)
           
t = np.linspace(-10, 10, 100)
# the sigmoid function
sig = 1 / (1 + np.exp(-t))
# figure size
plt.figure(figsize=(9,3))
# solid line at y=0, dotted lines at y=0.5 and y=1, solid line at x=0
plt.plot([-10,10],[0,0],'k-')
plt.plot([-10,10], [0.5,0.5], 'k:')
plt.plot([-10,10],[1,1], 'k:')
plt.plot([0,0], [-1.1,1.1], 'k-')
plt.plot(t, sig, 'b-', linewidth=2, label=r'$\sigma(t) = \frac{1}{1 + e^{-t}}$')
plt.xlabel('t')
plt.legend(loc='upper left', fontsize=20)
plt.axis([-10, 10, -0.1, 1.1])
# save_fig('logistic_function_plot')
plt.show()
           
from sklearn import datasets
iris = datasets.load_iris()
list(iris.keys())
           
['data',
 'target',
 'frame',
 'target_names',
 'DESCR',
 'feature_names',
 'filename',
 'data_module']
           
X = iris['data'][:, 3:]
y = (iris['target'] == 2).astype(int)
           
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression()
log_reg.fit(X,y)
           
LogisticRegression()
           
# X: petal width (column index 3 onward); y: 1 if target is class 2 (Iris virginica), else 0
# x-axis grid
X_new = np.linspace(0,3,1000).reshape(-1,1)
# class probabilities from the fitted logistic regression
y_proba = log_reg.predict_proba(X_new)
# decision boundary: the first x where the probability of the positive class reaches 0.5
decision_boundary = X_new[y_proba[:, 1] >= 0.5][0]
plt.figure(figsize=(8,3))
# y contains 50 ones (virginica) and 100 zeros (not virginica)
# X[y==0], y[y==0]: the 100 non-virginica petal widths and their 0 labels, drawn as blue squares
# X[y==1], y[y==1]: the 50 virginica petal widths and their 1 labels, drawn as green triangles
plt.plot(X[y==0], y[y==0], 'bs')
plt.plot(X[y==1], y[y==1], 'g^')
# dotted vertical line at the decision boundary
plt.plot([decision_boundary, decision_boundary], [-1,2], 'k:', linewidth=2)
# column 1: probability of the positive class; column 0: probability of the negative class
plt.plot(X_new, y_proba[:, 1], 'g-', linewidth=2, label='Iris virginica')
plt.plot(X_new, y_proba[:, 0], 'b--', linewidth=2, label='Not Iris virginica')
# text position and horizontal alignment
plt.text(decision_boundary+0.02, 0.15, 'Decision boundary', fontsize=14, color='k', ha='center')
# arrow tail position, dx/dy offsets, head width/length, fill and edge colors
plt.arrow(decision_boundary, 0.08, -0.3, 0, head_width=0.05, head_length=0.1, fc='b', ec='b')
plt.arrow(decision_boundary, 0.92, 0.3, 0, head_width=0.05, head_length=0.1, fc='g', ec='g')
plt.xlabel('Petal width (cm)', fontsize=14)
plt.ylabel('Probability', fontsize=14)
plt.legend(loc='center left', fontsize=14)
plt.axis([0,3,-0.02,1.02])
plt.show()
           
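
The array([1, 0]) output below is consistent with predicting two petal widths on either side of the decision boundary; the missing cell may have been something like:

log_reg.predict([[1.7], [1.5]])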
array([1, 0])
           
Xl,Yl = np.meshgrid(
        np.linspace(0,1000,20).reshape(-1,1), 
        np.linspace(0,500,20).reshape(-1,1)
)
plt.plot(Xl, Yl,
         color='limegreen',  # marker color
         marker='.',  # draw each grid point as a dot
         linestyle='')  # no connecting lines, points only
plt.grid(True)
plt.show()
           
from sklearn.linear_model import LogisticRegression
# petal length and width
X = iris['data'][:, (2,3)]
# 0 = not virginica (first 100 instances), 1 = virginica (last 50)
y = (iris['target'] == 2).astype(int)
# 'lbfgs' is one of the available solvers; C is the inverse of the regularization strength (a huge C means almost no regularization)
log_reg = LogisticRegression(solver='lbfgs', C=10**10, random_state=42)
log_reg.fit(X, y)
# build a grid of points, 200 rows by 500 columns
x0, x1 = np.meshgrid(
        np.linspace(2.9,7,500).reshape(-1,1),
        np.linspace(0.8,2.7,200).reshape(-1,1)
)
# flatten both grids and stack them as (length, width) pairs
X_new = np.c_[x0.ravel(), x1.ravel()]
# class probabilities at every grid point
y_proba = log_reg.predict_proba(X_new)
plt.figure(figsize=(10,4))
# X[y==0]: the 100 non-virginica rows (length in column 0, width in column 1)
plt.plot(X[y==0, 0], X[y==0,1], 'bs')
# X[y==1]: the 50 virginica rows
plt.plot(X[y==1, 0], X[y==1,1], 'g^')
zz = y_proba[:,1].reshape(x0.shape)
# unfilled contour lines of the probability surface (like elevation contours)
contour = plt.contour(x0, x1, zz, cmap=plt.cm.brg)
left_right = np.array([2.9, 7])
# decision boundary: solve theta0 + theta1*x0 + theta2*x1 = 0 for x1
boundary = -(log_reg.coef_[0][0] * left_right + log_reg.intercept_[0]) / log_reg.coef_[0][1]
# label the contour lines with their probability values
plt.clabel(contour, inline=1, fontsize=12)
plt.plot(left_right, boundary, 'k--', linewidth=3)
plt.text(3.5, 1.5, 'Not Iris virginica', fontsize=14, color='b', ha='center')
plt.text(6.5, 2.3, 'Iris virginica', fontsize=14, color='g', ha='center')
plt.xlabel('Petal length', fontsize=14)
plt.ylabel('Petal width', fontsize=14)
plt.axis([2.9,7,0.8,2.7])
plt.show()
           
X = iris['data'][:, (2,3)]
y = iris['target']
softmax_reg = LogisticRegression(multi_class='multinomial',solver='lbfgs', C=10)
softmax_reg.fit(X,y)
softmax_reg.predict([[5,2]])
           
array([2])
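
The probability array below presumably comes from predict_proba on the same sample:

softmax_reg.predict_proba([[5, 2]])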
           
array([[6.38014896e-07, 5.74929995e-02, 9.42506362e-01]])
           
# Softmax regression trained with batch gradient descent, plus early stopping, implemented by hand
X = iris['data'][:, (2,3)]
y = iris['target']
# add the bias term x0 = 1 to every instance
X_with_bias = np.c_[np.ones([len(X), 1]), X]
np.random.seed(2042)
# split into train/validation/test sets by hand; set the test and validation ratios and the total size
test_ratio = 0.2
validation_ratio = 0.2
total_size = len(X_with_bias)
# number of instances in each split
test_size = int(total_size * test_ratio)
validation_size = int(total_size * validation_ratio)
train_size = total_size - test_size - validation_size
# np.random.permutation(total_size) returns the indices 0..total_size-1 in random order
rnd_indices = np.random.permutation(total_size)
# slice the shuffled indices into the three sets
X_train = X_with_bias[rnd_indices[:train_size]]
y_train = y[rnd_indices[:train_size]]
X_valid = X_with_bias[rnd_indices[train_size:-test_size]]
y_valid = y[rnd_indices[train_size:-test_size]]
X_test = X_with_bias[rnd_indices[-test_size:]]
y_test = y[rnd_indices[-test_size:]]
# one-hot encoding: each row is all zeros except for a 1 at the class index
def to_one_hot(y):
    n_classes = y.max() + 1
    m = len(y)
    Y_one_hot = np.zeros((m, n_classes))
    Y_one_hot[np.arange(m), y] = 1
    return Y_one_hot
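
The next output looks like a check of the first ten training labels, presumably:

y_train[:10]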

           
array([0, 1, 2, 1, 1, 0, 1, 1, 1, 0])
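
The one-hot encoding of those same labels, presumably:

to_one_hot(y_train[:10])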
           
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.]])
           
# one-hot encode all the target labels
Y_train_one_hot = to_one_hot(y_train)
Y_valid_one_hot = to_one_hot(y_valid)
Y_test_one_hot = to_one_hot(y_test)
           
# softmax: exponentiate the scores and divide by the sum of the exponentials
def softmax(logits):
    exps = np.exp(logits)
    exp_sums = np.sum(exps, axis=1, keepdims=True)
    return exps / exp_sums
           
# number of inputs (features plus bias) and number of output classes
n_inputs = X_train.shape[1]
n_outputs = len(np.unique(y_train))
           
# Equations used to train the model
# Cross-entropy cost function:
# $J(\mathbf{\Theta}) =
# - \dfrac{1}{m}\sum\limits_{i=1}^{m}\sum\limits_{k=1}^{K}{y_k^{(i)}\log\left(\hat{p}_k^{(i)}\right)}$
# Gradient of the cost function:
# $\nabla_{\mathbf{\theta}^{(k)}} \, J(\mathbf{\Theta}) = 
# \dfrac{1}{m} \sum\limits_{i=1}^{m}{ \left ( \hat{p}^{(i)}_k - y_k^{(i)} \right ) \mathbf{x}^{(i)}}$
eta = 0.01
n_iterations = 5001
m = len(X_train)
epsilon = 1e-7
Theta = np.random.randn(n_inputs, n_outputs)
# batch gradient descent on the softmax model
for iteration in range(n_iterations):
    logits = X_train.dot(Theta)
    Y_proba = softmax(logits)
    if iteration % 500 == 0:
        loss = -np.mean(np.sum(Y_train_one_hot * np.log(Y_proba+epsilon), axis=1))
        print(iteration, loss)
    error = Y_proba - Y_train_one_hot
    gradients = 1/m * X_train.T.dot(error)
    Theta = Theta - eta * gradients
           
0 5.446205811872683
500 0.8350062641405651
1000 0.6878801447192402
1500 0.6012379137693313
2000 0.5444496861981872
2500 0.5038530181431525
3000 0.47292289721922487
3500 0.44824244188957774
4000 0.4278651093928793
4500 0.41060071429187134
5000 0.3956780375390374
           
Theta
           
array([[ 3.32094157, -0.6501102 , -2.99979416],
       [-1.1718465 ,  0.11706172,  0.10507543],
       [-0.70224261, -0.09527802,  1.4786383 ]])
           
# predict on the validation set and check the accuracy
logits = X_valid.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)

accuracy_score = np.mean(y_predict == y_valid)
accuracy_score
           
0.9666666666666667
           
# add an l2 penalty to the cost function
eta = 0.1
n_iterations = 5001
m = len(X_train)
epsilon = 1e-7
alpha = 0.1
Theta = np.random.randn(n_inputs, n_outputs)
for iteration in range(n_iterations):
    logits = X_train.dot(Theta)
    Y_proba = softmax(logits)
    if iteration % 500 == 0:
        xentropy_loss = -np.mean(np.sum(Y_train_one_hot * np.log(Y_proba + epsilon), axis=1))
        l2_loss = 1/2 * np.sum(np.square(Theta[1:]))
        loss = xentropy_loss + alpha * l2_loss
        print(iteration, loss)
    error = Y_proba - Y_train_one_hot
    gradients = 1/m * X_train.T.dot(error) + np.r_[np.zeros([1, n_outputs]), alpha * Theta[1:]]
    Theta = Theta - eta * gradients
           
0 5.401014020496038
500 0.5399802167300589
1000 0.5055073771883054
1500 0.4953639890209271
2000 0.49156703270914
2500 0.4900134074001495
3000 0.48934877664358845
3500 0.48905717267345383
4000 0.488927251858594
4500 0.4888688023117297
5000 0.4888423408562912
           
# check the model's performance on the validation set
logits = X_valid.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)

accuracy_score = np.mean(y_predict == y_valid)
accuracy_score
           
1.0
           
# Early stopping: compute the validation loss at every iteration and stop as soon as it starts to rise
eta = 0.1
n_iterations = 5001
m = len(X_train)
epsilon = 1e-7
alpha = 0.1
best_loss = np.infty
Theta = np.random.randn(n_inputs, n_outputs)
for iteration in range(n_iterations):
    logits = X_train.dot(Theta)
    Y_proba = softmax(logits)
    error = Y_proba - Y_train_one_hot
    gradients = 1/m * X_train.T.dot(error) + np.r_[np.zeros([1, n_outputs]), alpha * Theta[1:]]
    Theta = Theta - eta * gradients
    
    logits = X_valid.dot(Theta)
    Y_proba = softmax(logits)
    xentropy_loss = -np.mean(np.sum(Y_valid_one_hot * np.log(Y_proba + epsilon), axis=1))
    l2_loss = 1/2 * np.sum(np.square(Theta[1:]))
    loss = xentropy_loss + alpha * l2_loss
    if iteration % 500 == 0:
        print(iteration, loss)
    if loss < best_loss:
        best_loss = loss
    else:
        print(iteration -1, best_loss)
        print(iteration, loss, 'early stopping!')
        break
           
0 2.897275838876366
500 0.5702751662442892
1000 0.5425654873413586
1500 0.5353090385301479
2000 0.5331256731252507
2500 0.5325827330917428
2736 0.5325454243382794
2737 0.5325454252101579 early stopping!
           
logits = X_valid.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)
accuracy_score = np.mean(y_predict == y_valid)
accuracy_score
           
1.0
           
# plot the model's predictions over the feature space
x0, x1 = np.meshgrid(
        np.linspace(0, 8, 500).reshape(-1,1),
        np.linspace(0, 3.5, 200).reshape(-1,1)
)
X_new = np.c_[x0.ravel(), x1.ravel()]
X_new_with_bias = np.c_[np.ones([len(X_new), 1]), X_new]
logits = X_new_with_bias.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)
zz1 = Y_proba[:, 1].reshape(x0.shape)
zz = y_predict.reshape(x0.shape)
plt.figure(figsize=(10, 4))
plt.plot(X[y==2,0], X[y==2,1], 'g^', label='Iris virginica')
plt.plot(X[y==1,0], X[y==1,1], 'bs', label='Iris versicolor')
plt.plot(X[y==0,0], X[y==0,1], 'yo', label='Iris setosa')
from matplotlib.colors import ListedColormap
custom_cmap = ListedColormap(['#fafab0', '#9898ff', '#a0faa0'])
plt.contourf(x0, x1, zz, cmap=custom_cmap)
contour = plt.contour(x0, x1, zz1, cmap=plt.cm.brg)
plt.clabel(contour, inline=1, fontsize=12)
plt.xlabel('Petal length', fontsize=14)
plt.ylabel('Petal width', fontsize=14)
plt.legend(loc='upper left', fontsize=14)
plt.axis([0,7,0,3.5])
plt.show()
           
# measure the model's accuracy on the test set
# note: accuracy is a bit lower here; the dataset is small, and results vary with how the three sets happen to be split
logits = X_test.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)
accuracy_score = np.mean(y_predict == y_test)
accuracy_score
           
0.9333333333333333
           
