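Code study

Scatter plots, drawing several lines in one figure, plotting polynomials

Axis tick settings

Drawing lines through given points, plotting the logistic function

Point markers, visualizing binary classification, drawing text and arrows

4.1 Linear Regression

The prediction is a weighted sum of the input features plus a bias term.

Goal: find the θ that minimizes the mean squared error (MSE) cost function.

Code implementation

Normal equation: w_LIN = θ = X⁺y, where X⁺ = (XᵀX)⁻¹Xᵀ is the pseudoinverse of X.

Code implementation:

Use np.linalg.inv to invert XᵀX in the normal equation, which gives θ0 and θ1.

Second way to get the pseudoinverse: np.linalg.pinv() returns the pseudoinverse of X directly.

A matrix product then gives y_predict, the y values at the start and end points of the fitted line.

Scikit-Learn:

Instantiate the model, train it, read θ0 and θ1 from intercept_ and coef_, and call predict to get the line.

Alternative: np.linalg.lstsq returns θ0 and θ1 directly.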
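The three routes listed above can be compared side by side. The following is a minimal sketch on synthetic data of the same form as these notes use (y = 4 + 3x plus Gaussian noise); the seed and the variable names theta_inv, theta_pinv and theta_lstsq are assumptions made for illustration.

```python
import numpy as np

# Synthetic data in the spirit of these notes: y = 4 + 3x + Gaussian noise.
rng = np.random.default_rng(42)          # seed chosen only for reproducibility
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))

# Prepend x0 = 1 to every instance so that theta[0] acts as the bias term.
X_b = np.c_[np.ones((100, 1)), X]

# 1) Normal equation: invert X^T X explicitly.
theta_inv = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# 2) Pseudoinverse of X_b (NumPy computes it through an SVD).
theta_pinv = np.linalg.pinv(X_b) @ y

# 3) Least-squares solver; it returns the solution plus diagnostics.
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_b, y, rcond=None)

print(theta_inv.ravel(), theta_pinv.ravel(), theta_lstsq.ravel())
```

All three should agree to floating-point precision on well-conditioned data like this. The pinv and lstsq routes are usually the safer choice when XᵀX is close to singular.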

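4.2 Gradient Descent

Start from a randomly initialized θ and lower the cost function MSE(θ) at every step until it converges to a minimum.

Goal: find the parameter combination w_LIN that minimizes the cost function MSE(θ).

∇MSE(θ), the vector of partial derivatives of the cost function (the gradient), points in the direction of steepest ascent, so each step moves in the opposite direction.

The learning rate η sets the step size.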
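Written out for linear regression, the cost, its gradient and the update step take the form below. This is the standard textbook form, stated here under the assumption that X holds one instance per row with x0 = 1 prepended and that m is the number of instances; it matches the expression 2/m * X_b.T.dot(X_b.dot(theta) - y) used in the notebook code.

```latex
\mathrm{MSE}(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta^{\top}\mathbf{x}^{(i)} - y^{(i)}\right)^{2},
\qquad
\nabla_{\theta}\,\mathrm{MSE}(\theta) = \frac{2}{m}\,\mathbf{X}^{\top}\left(\mathbf{X}\theta - \mathbf{y}\right),
\qquad
\theta \leftarrow \theta - \eta\,\nabla_{\theta}\,\mathrm{MSE}(\theta)
```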

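Batch gradient descent

Compute the gradient ∇MSE(θ) over the full training set at every step (iteration).

Code implementation:

Stochastic gradient descent (SGD)

Code implementation:

At every step (iteration), pick one instance at random and compute the gradient on it alone; gradually lower the learning rate (step size η) as the iterations proceed.

Learning schedule function: eta = learning_schedule(epoch * m + i)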
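To see what the schedule does, it helps to print a few values. The sketch below reuses the hyperparameters t0 = 5 and t1 = 50 from the SGD code in these notes; the training-set size m = 100 and the sampled (epoch, i) pairs are assumptions chosen for illustration.

```python
t0, t1 = 5, 50   # schedule hyperparameters, same values as in the SGD code of these notes

def learning_schedule(t):
    """Learning rate at global step t: starts at t0 / t1 and decays roughly as 1 / t."""
    return t0 / (t + t1)

m = 100          # assumed training-set size
for epoch, i in [(0, 0), (0, 50), (1, 0), (10, 0), (49, 99)]:
    step = epoch * m + i   # global step counter across all epochs
    print(f"epoch={epoch:2d} i={i:2d} step={step:4d} eta={learning_schedule(step):.5f}")
```

The first steps are large enough to move quickly away from the random start, and the later steps are small enough for θ to settle near the minimum instead of bouncing around it.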

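Make sure the instances are IID: shuffle them so that they are no longer sorted by label.

Scikit-Learn:

SGDRegressor: train the model, then read off the bias term and the weights.

Mini-batch gradient descent

Compute the gradient on small batches of randomly chosen instances.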
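The notes give no code for the mini-batch variant, so here is a minimal sketch of how it could look, combined with the per-epoch shuffle mentioned above. The batch size, the schedule hyperparameters t0 and t1, and the seed are all assumed values, not taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X + rng.standard_normal((m, 1))
X_b = np.c_[np.ones((m, 1)), X]          # prepend the bias input x0 = 1

n_epochs = 50
batch_size = 20            # assumed value; any size between 1 and m works
t0, t1 = 200, 1000         # assumed schedule hyperparameters

def learning_schedule(t):
    return t0 / (t + t1)

theta = rng.standard_normal((2, 1))      # random initialization
t = 0
for epoch in range(n_epochs):
    # Reshuffle once per epoch so the batches do not depend on the storage order.
    shuffled = rng.permutation(m)
    X_shuf, y_shuf = X_b[shuffled], y[shuffled]
    for start in range(0, m, batch_size):
        xi = X_shuf[start:start + batch_size]
        yi = y_shuf[start:start + batch_size]
        # Same gradient formula as batch GD, averaged over the mini-batch only.
        gradients = 2 / len(xi) * xi.T @ (xi @ theta - yi)
        theta = theta - learning_schedule(t) * gradients
        t += 1

# Exact least-squares solution, for comparison.
theta_exact, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(theta.ravel(), theta_exact.ravel())
```

The result should land close to the least-squares solution without matching it exactly, since each step only sees part of the data.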

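4.3 Polynomial Regression

Goal: fit nonlinear data with a linear model.

Idea: add powers of each feature as new features, then train the model on the extended feature set.

Scikit-Learn:

PolynomialFeatures adds the feature combinations automatically; fit_transform returns the feature set with the squared column added; a linear regression is then fitted on it.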
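For a single feature and degree 2, the expansion amounts to appending a column of squares. The sketch below does this by hand in NumPy, as an illustration of the idea rather than of how PolynomialFeatures is implemented; the seed and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100
X = 6 * rng.random((m, 1)) - 3
y = 0.5 * X**2 + X + 2 + rng.standard_normal((m, 1))   # quadratic data plus noise

# Degree-2 expansion of a single feature, without a bias column: [x, x^2].
X_poly = np.c_[X, X**2]

# Ordinary least squares on the expanded features (bias column added here).
A = np.c_[np.ones((m, 1)), X_poly]
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coef_x, coef_x2 = theta.ravel()
print(intercept, coef_x, coef_x2)
```

The fitted values should come out near the generating coefficients 2, 1 and 0.5, which shows that the model is still linear in its parameters even though the curve is not a straight line.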

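4.4 Learning Curves

Ways to check a model's generalization performance: cross-validation and learning curves.

Principle: plot the RMSE on the training set and on the validation set as a function of the training set size.

Custom plotting function: plot_learning_curves

4.5 Regularization

Principle: constrain the model (shrink the weights θ or lower the polynomial degree) to reduce overfitting.

Ridge regression

Ridge regression cost function: adds α/2 times the squared ℓ2 norm of the weights as a penalty.

Closed-form solution for ridge regression: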
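The formula itself is missing from these notes. A common way to write it is θ = (XᵀX + αA)⁻¹Xᵀy, where A is the identity matrix with a 0 in the top-left corner so that the bias term is not penalized. The sketch below implements that form; the function name ridge_closed_form and the data are assumptions for illustration.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Solve (X_b^T X_b + alpha * A) theta = X_b^T y, leaving the bias unpenalized."""
    m, n = X.shape
    X_b = np.c_[np.ones((m, 1)), X]    # prepend the bias input x0 = 1
    A = np.eye(n + 1)
    A[0, 0] = 0.0                      # do not regularize the bias term theta_0
    # Solving the linear system is more stable than forming an explicit inverse.
    return np.linalg.solve(X_b.T @ X_b + alpha * A, X_b.T @ y)

rng = np.random.default_rng(2)
X = 6 * rng.random((100, 1)) - 3
y = 0.5 * X**2 + X + 2 + rng.standard_normal((100, 1))

theta_ridge = ridge_closed_form(X, y, alpha=0.1)
x_new = np.array([[1.0, 1.5]])         # [x0 = 1, x1 = 1.5]
print(theta_ridge.ravel(), x_new @ theta_ridge)
```

With alpha = 0 this reduces to ordinary least squares, and larger alpha values pull the slope toward zero.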

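Lasso regression

Lasso regression cost function: adds α times the ℓ1 norm of the weights as a penalty.

Elastic Net

A mix ratio r and its complement (1 - r) weight the lasso and ridge penalty terms respectively.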
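As a worked example of the mix, the sketch below evaluates a cost of the form MSE(θ) + r·α·Σ|θi| + (1 - r)·(α/2)·Σθi², which is the textbook formulation assumed here. The function name elastic_net_cost and the toy data are illustrative. Scikit-Learn's ElasticNet scales the data term differently, so these numbers are not directly comparable with its objective.

```python
import numpy as np

def elastic_net_cost(theta, X_b, y, alpha, r):
    """MSE plus a mix of the l1 penalty (weight r) and the l2 penalty (weight 1 - r).

    theta[0] is treated as the bias term and is left out of both penalties.
    """
    mse = np.mean((X_b @ theta - y) ** 2)
    w = theta[1:]
    l1 = np.sum(np.abs(w))
    l2 = 0.5 * np.sum(w ** 2)
    return mse + r * alpha * l1 + (1 - r) * alpha * l2

X_b = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[1.0], [4.0], [7.0]])
theta = np.array([[1.0], [3.0]])         # fits the three points exactly, so MSE = 0

cost_ridge = elastic_net_cost(theta, X_b, y, alpha=0.1, r=0.0)   # pure ridge penalty
cost_lasso = elastic_net_cost(theta, X_b, y, alpha=0.1, r=1.0)   # pure lasso penalty
cost_mixed = elastic_net_cost(theta, X_b, y, alpha=0.1, r=0.5)   # equal mix
print(cost_ridge, cost_lasso, cost_mixed)
```

Because the fit is exact, each printed value is just the penalty: 0.1 · 4.5 for ridge, 0.1 · 3 for lasso, and the average of the two for r = 0.5.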

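ElasticNet model

Early stopping

When the validation error E_val starts to rise, the model is overfitting; roll the parameters back to the point where E_val was lowest.

Algorithm implementation: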
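The rollback step is the part that is easy to get wrong, so here is a minimal NumPy-only sketch that keeps a copy of the best parameters while training. The helper features, the polynomial degree, the learning rate and the train/validation split are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 60
X = 6 * rng.random((m, 1)) - 3
y = 0.5 * X**2 + X + 2 + rng.standard_normal((m, 1))

def features(X, degree=8):
    """Bias column plus x, x^2, ..., x^degree, with x rescaled from [-3, 3] to [-1, 1]."""
    powers = np.hstack([(X / 3.0) ** d for d in range(1, degree + 1)])
    return np.c_[np.ones((len(X), 1)), powers]

X_train, y_train = features(X[:40]), y[:40]
X_val, y_val = features(X[40:]), y[40:]

eta, n_epochs = 0.05, 2000
theta = rng.standard_normal((X_train.shape[1], 1))
best_val_error, best_epoch, best_theta = np.inf, None, None
val_errors = []

for epoch in range(n_epochs):
    # One batch gradient descent step on the training set.
    gradients = 2 / len(X_train) * X_train.T @ (X_train @ theta - y_train)
    theta = theta - eta * gradients
    # Track the validation error and remember the best parameters seen so far.
    val_error = np.mean((X_val @ theta - y_val) ** 2)
    val_errors.append(val_error)
    if val_error < best_val_error:
        best_val_error, best_epoch = val_error, epoch
        best_theta = theta.copy()      # copy, so later updates do not overwrite it

theta = best_theta                     # roll back to the best epoch
print(best_epoch, best_val_error)
```

This version trains for a fixed number of epochs and rolls back at the end. Stopping as soon as the validation error has not improved for some number of epochs is a common refinement.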

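4.6 Logistic Regression

Pass the linear score z = θᵀx through the sigmoid function to get the probability p = σ(z), then predict y = 1 if p ≥ 0.5 and y = 0 otherwise; training minimizes the log loss.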
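The pipeline from score to probability to class can be sketched in a few lines of NumPy. The parameter vector theta below is hypothetical, picked by hand rather than fitted, and the function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real score to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X_b, theta):
    """Estimated probability of the positive class for each row of X_b."""
    return sigmoid(X_b @ theta)

def log_loss(p, y, eps=1e-12):
    """Average log loss; p is clipped so that log(0) never occurs."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

X_b = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])   # bias input x0 = 1, one feature
y = np.array([0, 0, 1])
theta = np.array([-4.0, 2.0])    # hypothetical parameters, not fitted values

p_hat = predict_proba(X_b, theta)          # scores are -3, -1 and 1
y_pred = (p_hat >= 0.5).astype(int)        # threshold at 0.5, i.e. at score 0
print(p_hat, y_pred, log_loss(p_hat, y))
```

Thresholding the probability at 0.5 is the same as thresholding the score z at 0, which is why the decision boundary of this model is linear in the features.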

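Decision boundaries

Detect Iris virginica using only the petal width feature:

Get the petal width data X and the 0/1 labels y, train a logistic regression model, build a range of x-axis points and estimate the class probabilities at each of them, then plot.

Softmax regression

For a given instance, compute a score for every class k, take the exponential of each score and normalize; the classifier returns the class with the highest probability.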
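The normalization step can be written directly in NumPy. The sketch below subtracts the row maximum before exponentiating, a common safeguard against overflow that does not change the result; this variant and the example scores are assumptions for illustration, not the code used later in these notes.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax of a (n_instances, n_classes) score matrix."""
    # Subtracting the row maximum leaves the ratios unchanged but avoids overflow in exp.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)

scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])   # large scores would overflow a naive exp
proba = softmax(scores)
predicted_class = np.argmax(proba, axis=1)      # class with the highest probability
print(proba, predicted_class)
```

Each row of proba sums to 1. When several classes tie, as in the second row, np.argmax returns the first of them.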

Scikit-Learn:

# Homework
a=0
while a<3:
    a+=1
    if a == 2:
        break
    print(a)
a=0
while a<3:
    a+=1
    if a == 2:
        continue
    print(a)
for i in range(4):
    for j in range(4):
        if j == 0:
            continue
            print(i)  # never reached: continue has already jumped to the next j
        if j == 2:
            break
        print(i, j)
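# In the nested for loop:
# continue: skips the rest of the loop body and moves on to the next iteration. The first inner iteration (j == 0) prints nothing and the inner loop goes straight to its second iteration, so the printed j values never include 0.
# break: ends the inner loop. Only the second inner iteration (j == 1) produces output; the third and fourth (j == 2 and j == 3) print nothing.
# In the while loops:
# continue: ends the current iteration and starts the next one.
# break: ends the loop immediately.
# When the score is in the interval [90, 100), the index is 3 and the first A is returned.
# When the score is 100 and there is no second A, E is returned.
# Once a second A is added, a score of 100 gives index 4 and the second A is returned.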

import matplotlib as mpl
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

X = 2 * np.random.rand(100, 1)
y = 4 + 3*X + np.random.randn(100, 1)

from matplotlib.pyplot import MultipleLocator

# plt.figure(figsize=(20,8), dpi=80)
plt.scatter(X, y)
plt.tick_params(axis='both',which='major',labelsize=14)
plt.xlabel('X1', fontsize=12)
plt.ylabel('y', fontsize=12)
# Tick spacing
x_major_locator = MultipleLocator(0.25)
y_major_locator = MultipleLocator(2)
# Current axes instance
ax = plt.gca()
# Major ticks
ax.xaxis.set_major_locator(x_major_locator)
ax.yaxis.set_major_locator(y_major_locator)
# Axis limits
plt.xlim(0, 2.)
plt.ylim(0, 15)
plt.show()

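Chapter 4: Training Models. Code study: 4.1 Linear Regression, 4.2 Gradient Descent, 4.3 Polynomial Regression, 4.4 Learning Curves, 4.5 Regularization, 4.6 Logistic Regression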

X_b = np.c_[np.ones((100, 1)), X]
X_b

array([[1.        , 0.69056896],
       [1.        , 1.09503641],
       [1.        , 0.64677682],
       [1.        , 1.30597366],
       [1.        , 1.83011935],
       [1.        , 1.05779863],
       [1.        , 0.54006116],
       [1.        , 0.41063882],
       [1.        , 0.84434918],
       [1.        , 1.48040929],
       [1.        , 0.32755812],
       [1.        , 0.95007289],
       [1.        , 1.6222861 ],
       [1.        , 1.36803173],
       [1.        , 0.67995342],
       [1.        , 0.28375754],
       [1.        , 0.22192431],
       [1.        , 0.81670441],
       [1.        , 0.3227235 ],
       [1.        , 1.93555247],
       [1.        , 0.88910284],
       [1.        , 0.90670084],
       [1.        , 0.85993465],
       [1.        , 1.56791135],
       [1.        , 1.7641284 ],
       [1.        , 1.71865204],
       [1.        , 1.27361387],
       [1.        , 1.52356172],
       [1.        , 1.06240312],
       [1.        , 1.19423602],
       [1.        , 0.68403175],
       [1.        , 1.23269376],
       [1.        , 0.28656663],
       [1.        , 0.56283408],
       [1.        , 0.64074196],
       [1.        , 0.18367042],
       [1.        , 0.71660301],
       [1.        , 0.13905064],
       [1.        , 0.35977406],
       [1.        , 0.4753392 ],
       [1.        , 0.45272167],
       [1.        , 0.93375507],
       [1.        , 0.68137743],
       [1.        , 0.2406101 ],
       [1.        , 0.60453245],
       [1.        , 1.02911014],
       [1.        , 1.76223207],
       [1.        , 1.89428423],
       [1.        , 1.41578483],
       [1.        , 1.08903995],
       [1.        , 0.28713063],
       [1.        , 1.83932683],
       [1.        , 1.72162226],
       [1.        , 1.9239099 ],
       [1.        , 1.89466128],
       [1.        , 1.69328722],
       [1.        , 1.04791071],
       [1.        , 1.08904717],
       [1.        , 0.32126084],
       [1.        , 1.31982875],
       [1.        , 1.24665312],
       [1.        , 0.71878169],
       [1.        , 1.43133907],
       [1.        , 1.03540358],
       [1.        , 1.1733726 ],
       [1.        , 1.90103184],
       [1.        , 1.24935772],
       [1.        , 0.85959727],
       [1.        , 0.11931619],
       [1.        , 1.08489517],
       [1.        , 0.53089631],
       [1.        , 1.15935157],
       [1.        , 0.50477505],
       [1.        , 0.82253989],
       [1.        , 0.42160345],
       [1.        , 0.05328369],
       [1.        , 1.7194971 ],
       [1.        , 1.03510607],
       [1.        , 0.54010678],
       [1.        , 1.1045144 ],
       [1.        , 0.75630817],
       [1.        , 1.66230773],
       [1.        , 1.36801615],
       [1.        , 0.51904843],
       [1.        , 0.76730644],
       [1.        , 1.23956514],
       [1.        , 0.06545095],
       [1.        , 1.97346577],
       [1.        , 0.17622071],
       [1.        , 0.38396168],
       [1.        , 1.06051449],
       [1.        , 0.23115968],
       [1.        , 1.15153478],
       [1.        , 0.97422361],
       [1.        , 0.25063895],
       [1.        , 0.61187805],
       [1.        , 0.90840842],
       [1.        , 1.34972953],
       [1.        , 1.82988156],
       [1.        , 0.63734291]])

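# Normal equation: theta = (X^T X)^-1 X^T y
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
theta_best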

array([[4.07132793],
       [3.00060297]])

X_new = np.array([[0], [2]])
X_new

array([[0],
       [2]])

X_new_b = np.c_[np.ones((2,1)), X_new]
y_predict = X_new_b.dot(theta_best)
y_predict

array([[ 4.07132793],
       [10.07253387]])

plt.plot(X_new, y_predict, 'r-')
plt.plot(X, y, 'b.')
plt.axis([0, 2, 0, 15])
plt.show()

from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
lin_reg.intercept_, lin_reg.coef_

(array([4.07132793]), array([[3.00060297]]))

lin_reg.predict(X_new)

array([[ 4.07132793],
       [10.07253387]])

theta_best_svd, residuals, rank, s = np.linalg.lstsq(X_b, y, rcond=1e-6)
theta_best_svd

array([[4.07132793],
       [3.00060297]])

np.linalg.pinv(X_b).dot(y)

array([[4.07132793],
       [3.00060297]])

eta = 0.1
n_iterations = 1000
m = 100
theta = np.random.randn(2, 1)
xml = MultipleLocator(0.5)
yml = MultipleLocator(2)
fig, ax = plt.subplots()
# Plot the training data once
ax.plot(X, y, 'b.')
for iteration in range(n_iterations):
    # Gradient of the MSE over the full training set
    gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)
    theta = theta - eta * gradients
    # Draw the current fit; X_new_b holds the endpoints x = 0 and x = 2 with x0 = 1 prepended
    ax.plot(X_new, X_new_b.dot(theta), 'r-')
ax.set_xlabel('X1', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.xaxis.set_major_locator(xml)
ax.yaxis.set_major_locator(yml)
ax.set_xlim(0, 2.)
ax.set_ylim(0, 15)
plt.show()

n_epochs = 100
t0, t1 = 5, 50
def learning_schedule(t):
    return t0 / (t + t1)
theta = np.random.randn(2,1)
xx = np.c_[np.ones((2,1)), np.array([[0],[2]])]
for epoch in range(n_epochs):
    for i in range(m):
        random_index = np.random.randint(m)
        xi = X_b[random_index:random_index+1]
        yi = y[random_index:random_index+1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients
        plt.plot(X, y, 'b.')
        plt.plot(X_new, xx.dot(theta))
plt.show()

from sklearn.linear_model import SGDRegressor
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1)
sgd_reg.fit(X, y.ravel())

SGDRegressor(eta0=0.1, penalty=None)

sgd_reg.intercept_, sgd_reg.coef_

(array([4.1182052]), array([3.1043404]))

plt.plot(X,y, 'b.')
# Line predicted by the trained SGDRegressor at the two endpoints in X_new
plt.plot(X_new, sgd_reg.predict(X_new), 'g-')
plt.show()

m = 100
X = 6 * np.random.rand(m ,1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)
plt.plot(X, y, 'b.')
plt.show()

# Looking forward to going hiking; idea: a classical-Chinese translation learner (something fun to write myself, ready to start)
from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
X[0]

array([0.04875089])

X_poly[0]

array([0.04875089, 0.00237665])

X_poly.shape

(100, 2)

lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
lin_reg.intercept_, lin_reg.coef_

(array([2.05438764]), array([[0.96842795, 0.48549735]]))

lin_reg2 = LinearRegression()
lin_reg2.fit(X, y)
lin_reg2.intercept_, lin_reg2.coef_
# Second fit: a straight line
y_predict2 = lin_reg2.predict(np.array([[-3],[3]]))

poly = PolynomialFeatures(degree=300,include_bias=False)
poly.fit(X)
X3 = poly.transform(X)
lin_reg3 = LinearRegression()
lin_reg3.fit(X3,y)
y_predict3 = lin_reg3.predict(X3)
# y_predict3.shape
X3.shape

(100, 300)

# First fit: the quadratic curve
y_predict = lin_reg.predict(X_poly)
y_predict
plt.plot(X, y, 'r.')
plt.plot(np.sort(X,axis=None), y_predict[np.argsort(X,axis=None)], 'g-')
plt.plot(np.sort(X,axis=None), y_predict3[np.argsort(X,axis=None)], 'k:')
plt.plot(np.array([[-3],[3]]), y_predict2, 'y--')
plt.show()

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def plot_learning_curves(model,X,y):
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
    train_errors, val_errors = [], []
    for m in range(1, len(X_train)):
        model.fit(X_train[:m], y_train[:m])
        y_train_predict = model.predict(X_train[:m])
        y_val_predict = model.predict(X_val)
        train_errors.append(mean_squared_error(y_train[:m], y_train_predict))
        val_errors.append(mean_squared_error(y_val, y_val_predict))
    plt.plot(np.sqrt(train_errors), 'r-+', linewidth=2, label='train')
    plt.plot(np.sqrt(val_errors), 'b-', linewidth=3, label='val')
    plt.legend()
    plt.xlabel('Training set size', fontsize=12)
    plt.ylabel('RMSE', fontsize=12)

lin_reg = LinearRegression()
plot_learning_curves(lin_reg, X, y)
plt.show()

from sklearn.pipeline import Pipeline
polynomial_regression = Pipeline([
    ('poly_features', PolynomialFeatures(degree=10, include_bias=False)),
    ('lin_reg', LinearRegression())
])

# Pass the pipeline itself, so that the degree-10 features are rebuilt for every training subset
plot_learning_curves(polynomial_regression, X, y)
plt.show()

from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=0.1, solver='cholesky')
ridge_reg.fit(X,y)
ridge_reg.predict([[1.5]])

array([[5.01749775]])

sgd_reg = SGDRegressor(penalty='l2')
sgd_reg.fit(X,y.ravel())
sgd_reg.predict([[1.5]])

array([5.00194116])

from sklearn.linear_model import Lasso
lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X,y)
lasso_reg.predict([[1.5]])

array([4.97201777])

sgd_reg=SGDRegressor(penalty='l1')
sgd_reg.fit(X,y.ravel())
sgd_reg.predict([[1.5]])

array([5.00530233])

from sklearn.linear_model import ElasticNet
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X,y)
elastic_net.predict([[1.5]])

array([4.97514089])

from sklearn.preprocessing import StandardScaler

from copy import deepcopy
X_train, X_val, y_train, y_val = train_test_split(X, y.ravel(), test_size=0.2)
poly_scaler = Pipeline([
    ('poly_features', PolynomialFeatures(degree=90, include_bias=False)),
    ('std_scaler', StandardScaler())
])
X_train_poly_scaled = poly_scaler.fit_transform(X_train)
X_val_poly_scaled = poly_scaler.transform(X_val)
# tol=None disables the stopping criterion, so each fit() call runs exactly one epoch;
# warm_start=True makes fit() continue from the current weights instead of restarting
sgd_reg = SGDRegressor(max_iter=1, tol=None, warm_start=True,
                      penalty=None, learning_rate='constant', eta0=0.0005)
minimum_val_error = float('inf')
best_epoch = None
best_model = None
for epoch in range(1000):
    sgd_reg.fit(X_train_poly_scaled, y_train)
    y_val_predict = sgd_reg.predict(X_val_poly_scaled)
    val_error = mean_squared_error(y_val, y_val_predict)
    if val_error < minimum_val_error:
        minimum_val_error = val_error
        best_epoch = epoch
        # deepcopy keeps the trained weights; clone() would return an unfitted copy
        best_model = deepcopy(sgd_reg)

t = np.linspace(-10, 10, 100)
# Logistic function
sig = 1 / (1 + np.exp(-t))
# Figure size
plt.figure(figsize=(9,3))
# Solid line at y = 0, dotted lines at y = 0.5 and y = 1, solid line at x = 0
plt.plot([-10,10],[0,0],'k-')
plt.plot([-10,10], [0.5,0.5], 'k:')
plt.plot([-10,10],[1,1], 'k:')
plt.plot([0,0], [-1.1,1.1], 'k-')
plt.plot(t, sig, 'b-', linewidth=2, label=r'$\sigma(t) = \frac{1}{1 + e^{-t}}$')
plt.xlabel('t')
plt.legend(loc='upper left', fontsize=20)
plt.axis([-10, 10, -0.1, 1.1])
# save_fig('logistic_function_plot')
plt.show()

from sklearn import datasets
iris = datasets.load_iris()
list(iris.keys())

['data',
 'target',
 'frame',
 'target_names',
 'DESCR',
 'feature_names',
 'filename',
 'data_module']

X = iris['data'][:, 3:]
y = (iris['target'] == 2).astype(int)

from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression()
log_reg.fit(X,y)

LogisticRegression()

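# X: petal width (the fourth column of the data); y: 1 if target is 2 (Iris virginica), else 0
# x-axis values
X_new = np.linspace(0,3,1000).reshape(-1,1)
# Class probabilities from the logistic regression model
y_proba = log_reg.predict_proba(X_new)
# Decision boundary: the first x at which the virginica probability reaches 1/2 (taken as a scalar)
decision_boundary = X_new[y_proba[:, 1] >= 0.5][0, 0]
plt.figure(figsize=(8,3))
# y holds 50 ones (virginica) and 100 zeros (not virginica)
# X[y==0], y[y==0]: the 100 non-virginica petal widths and 100 zeros, drawn as blue squares
# X[y==1], y[y==1]: the 50 virginica petal widths and 50 ones, drawn as green triangles
plt.plot(X[y==0], y[y==0], 'bs')
plt.plot(X[y==1], y[y==1], 'g^')
# Dotted line at the decision boundary
plt.plot([decision_boundary, decision_boundary], [-1,2], 'k:', linewidth=2)
# Second column: probability of the positive class; first column: the negative class
plt.plot(X_new, y_proba[:, 1], 'g-', linewidth=2, label='Iris virginica')
plt.plot(X_new, y_proba[:, 0], 'b--', linewidth=2, label='Not Iris virginica')
# Text position and horizontal alignment
plt.text(decision_boundary+0.02, 0.15, 'Decision boundary', fontsize=14, color='k', ha='center')
# Arrow tail position, x/y offsets giving the direction, head width and length, face and edge colors
plt.arrow(decision_boundary, 0.08, -0.3, 0, head_width=0.05, head_length=0.1, fc='b', ec='b')
plt.arrow(decision_boundary, 0.92, 0.3, 0, head_width=0.05, head_length=0.1, fc='g', ec='g')
plt.xlabel('Petal width (cm)', fontsize=14)
plt.ylabel('Probability', fontsize=14)
plt.legend(loc='center left', fontsize=14)
plt.axis([0,3,-0.02,1.02])
plt.show()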

array([1, 0])

Xl,Yl = np.meshgrid(
        np.linspace(0,1000,20).reshape(-1,1), 
        np.linspace(0,500,20).reshape(-1,1)
)
plt.plot(Xl, Yl,
         color='limegreen',  # point color
         marker='.',  # draw each grid point as a dot
         linestyle='')  # empty line style, so no line connects the points
plt.grid(True)
plt.show()

from sklearn.linear_model import LogisticRegression
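# Petal length and petal width
X = iris['data'][:, (2,3)]
# 0 = not virginica, 1 = virginica (the first 100 labels are 0, the last 50 are 1)
y = (iris['target'] == 2).astype(int)
# solver: one of the available optimizers; C: inverse of the regularization strength
log_reg = LogisticRegression(solver='lbfgs', C=10**10, random_state=42)
log_reg.fit(X, y)
# Build the grid of points; x0 and x1 both have 200 rows and 500 columns
x0, x1 = np.meshgrid(
        np.linspace(2.9,7,500).reshape(-1,1),
        np.linspace(0.8,2.7,200).reshape(-1,1)
)
# Flatten each grid to 1-D, then stack them as two columns
X_new = np.c_[x0.ravel(), x1.ravel()]
# Class probabilities for every grid point in X_new
y_proba = log_reg.predict_proba(X_new)
plt.figure(figsize=(10,4))
# y == 0 is a boolean mask over the rows; the second index selects the column
# The 100 non-virginica rows: petal length (column 0) against petal width (column 1)
plt.plot(X[y==0, 0], X[y==0,1], 'bs')
# The 50 virginica rows: petal length (column 0) against petal width (column 1)
plt.plot(X[y==1, 0], X[y==1,1], 'g^')
zz = y_proba[:,1].reshape(x0.shape)
# Unfilled contour lines of the virginica probability over the two features, like elevation lines on a map
contour = plt.contour(x0, x1, zz, cmap=plt.cm.brg)
left_right = np.array([2.9, 7])
# Points where the linear score is 0, i.e. where the probability is exactly 0.5
boundary = -(log_reg.coef_[0][0] * left_right + log_reg.intercept_[0]) / log_reg.coef_[0][1]
# Label the contour lines
plt.clabel(contour, inline=1, fontsize=12)
plt.plot(left_right, boundary, 'k--', linewidth=3)
plt.text(3.5, 1.5, 'Not virginica', fontsize=14, color='b', ha='center')
plt.text(6.5, 2.3, 'Virginica', fontsize=14, color='g', ha='center')
plt.xlabel('Petal length', fontsize=14)
plt.ylabel('Petal width', fontsize=14)
plt.axis([2.9,7,0.8,2.7])
plt.show()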

X = iris['data'][:, (2,3)]
y = iris['target']
softmax_reg = LogisticRegression(multi_class='multinomial',solver='lbfgs', C=10)
softmax_reg.fit(X,y)
softmax_reg.predict([[5,2]])

array([2])

softmax_reg.predict_proba([[5,2]])

array([[6.38014896e-07, 5.74929995e-02, 9.42506362e-01]])

# Softmax regression with batch gradient descent and early stopping
X = iris['data'][:, (2,3)]
y = iris['target']
# Build the design matrix: prepend the bias input x0 = 1 to every instance
X_with_bias = np.c_[np.ones([len(X), 1]), X]
np.random.seed(2042)
# Split manually into training, validation and test sets (a random split, not a stratified one): set the ratios and the total size
test_ratio = 0.2
validation_ratio = 0.2
total_size = len(X_with_bias)
# Size of each set
test_size = int(total_size * test_ratio)
validation_size = int(total_size * validation_ratio)
train_size = total_size - test_size - validation_size
# permutation: a random ordering of all the sample indices
rnd_indices = np.random.permutation(total_size)
# Slice the shuffled indices to pick each set
X_train = X_with_bias[rnd_indices[:train_size]]
y_train = y[rnd_indices[:train_size]]
X_valid = X_with_bias[rnd_indices[train_size:-test_size]]
y_valid = y[rnd_indices[train_size:-test_size]]
X_test = X_with_bias[rnd_indices[-test_size:]]
y_test = y[rnd_indices[-test_size:]]
# One-hot encode the labels: start from an all-zero matrix, then set the entry at each label's index to 1
def to_one_hot(y):
    n_classes = y.max() + 1
    m = len(y)
    Y_one_hot = np.zeros((m, n_classes))
    Y_one_hot[np.arange(m), y] = 1
    return Y_one_hot

y_train[:10]

array([0, 1, 2, 1, 1, 0, 1, 1, 1, 0])

to_one_hot(y_train[:10])

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.]])

# Convert all the target labels to one-hot vectors
Y_train_one_hot = to_one_hot(y_train)
Y_valid_one_hot = to_one_hot(y_valid)
Y_test_one_hot = to_one_hot(y_test)

# Softmax: each exponential divided by the sum of the exponentials in its row
def softmax(logits):
    exps = np.exp(logits)
    exp_sums = np.sum(exps, axis=1, keepdims=True)
    return exps / exp_sums

# Number of inputs (the two features plus the bias term) and number of classes
n_inputs = X_train.shape[1]
n_outputs = len(np.unique(y_train))

# Equations needed to train the model
# Cost function (cross entropy)
# $J(\mathbf{\Theta}) =
# - \dfrac{1}{m}\sum\limits_{i=1}^{m}\sum\limits_{k=1}^{K}{y_k^{(i)}\log\left(\hat{p}_k^{(i)}\right)}$
# Gradient
# $\nabla_{\mathbf{\theta}^{(k)}} \, J(\mathbf{\Theta}) = 
# \dfrac{1}{m} \sum\limits_{i=1}^{m}{ \left ( \hat{p}^{(i)}_k - y_k^{(i)} \right ) \mathbf{x}^{(i)}}$
eta = 0.01
n_iterations = 5001
m = len(X_train)
epsilon = 1e-7
Theta = np.random.randn(n_inputs, n_outputs)
# Batch gradient descent for softmax regression
for iteration in range(n_iterations):
    logits = X_train.dot(Theta)
    Y_proba = softmax(logits)
    if iteration % 500 == 0:
        loss = -np.mean(np.sum(Y_train_one_hot * np.log(Y_proba+epsilon), axis=1))
        print(iteration, loss)
    error = Y_proba - Y_train_one_hot
    gradients = 1/m * X_train.T.dot(error)
    Theta = Theta - eta * gradients

0 5.446205811872683
500 0.8350062641405651
1000 0.6878801447192402
1500 0.6012379137693313
2000 0.5444496861981872
2500 0.5038530181431525
3000 0.47292289721922487
3500 0.44824244188957774
4000 0.4278651093928793
4500 0.41060071429187134
5000 0.3956780375390374

Theta

array([[ 3.32094157, -0.6501102 , -2.99979416],
       [-1.1718465 ,  0.11706172,  0.10507543],
       [-0.70224261, -0.09527802,  1.4786383 ]])

# Predict on the validation set and check the accuracy score
logits = X_valid.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)

accuracy_score = np.mean(y_predict == y_valid)
accuracy_score

0.9666666666666667

# Add an l2 penalty to the cost function
eta = 0.1
n_iterations = 5001
m = len(X_train)
epsilon = 1e-7
alpha = 0.1
Theta = np.random.randn(n_inputs, n_outputs)
for iteration in range(n_iterations):
    logits = X_train.dot(Theta)
    Y_proba = softmax(logits)
    if iteration % 500 == 0:
        xentropy_loss = -np.mean(np.sum(Y_train_one_hot * np.log(Y_proba + epsilon), axis=1))
        l2_loss = 1/2 * np.sum(np.square(Theta[1:]))
        loss = xentropy_loss + alpha * l2_loss
        print(iteration, loss)
    error = Y_proba - Y_train_one_hot
    gradients = 1/m * X_train.T.dot(error) + np.r_[np.zeros([1, n_outputs]), alpha * Theta[1:]]
    Theta = Theta - eta * gradients

0 5.401014020496038
500 0.5399802167300589
1000 0.5055073771883054
1500 0.4953639890209271
2000 0.49156703270914
2500 0.4900134074001495
3000 0.48934877664358845
3500 0.48905717267345383
4000 0.488927251858594
4500 0.4888688023117297
5000 0.4888423408562912

# Check the model on the validation set
logits = X_valid.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)

accuracy_score = np.mean(y_predict == y_valid)
accuracy_score

1.0

# Early stopping: compute the validation error at every iteration and stop as soon as it starts to rise
eta = 0.1
n_iterations = 5001
m = len(X_train)
epsilon = 1e-7
alpha = 0.1
best_loss = np.inf
Theta = np.random.randn(n_inputs, n_outputs)
for iteration in range(n_iterations):
    logits = X_train.dot(Theta)
    Y_proba = softmax(logits)
    error = Y_proba - Y_train_one_hot
    gradients = 1/m * X_train.T.dot(error) + np.r_[np.zeros([1, n_outputs]), alpha * Theta[1:]]
    Theta = Theta -eta * gradients
    
    logits = X_valid.dot(Theta)
    Y_proba = softmax(logits)
    xentropy_loss = -np.mean(np.sum(Y_valid_one_hot * np.log(Y_proba + epsilon), axis=1))
    l2_loss = 1/2 * np.sum(np.square(Theta[1:]))
    loss = xentropy_loss + alpha * l2_loss
    if iteration % 500 == 0:
        print(iteration, loss)
    if loss < best_loss:
        best_loss = loss
    else:
        print(iteration -1, best_loss)
        print(iteration, loss, 'early stopping!')
        break

0 2.897275838876366
500 0.5702751662442892
1000 0.5425654873413586
1500 0.5353090385301479
2000 0.5331256731252507
2500 0.5325827330917428
2736 0.5325454243382794
2737 0.5325454252101579 early stopping!

logits = X_valid.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)
accuracy_score = np.mean(y_predict == y_valid)
accuracy_score

1.0

# Plot the model's predictions
x0, x1 = np.meshgrid(
        np.linspace(0, 8, 500).reshape(-1,1),
        np.linspace(0, 3.5, 200).reshape(-1,1)
)
X_new = np.c_[x0.ravel(), x1.ravel()]
X_new_with_bias = np.c_[np.ones([len(X_new), 1]), X_new]
logits = X_new_with_bias.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)
zz1 = Y_proba[:, 1].reshape(x0.shape)
zz = y_predict.reshape(x0.shape)
plt.figure(figsize=(10, 4))
plt.plot(X[y==2,0], X[y==2,1], 'g^', label='Iris virginica')
plt.plot(X[y==1,0], X[y==1,1], 'bs', label='Iris versicolor')
plt.plot(X[y==0,0], X[y==0,1], 'yo', label='Iris setosa')
from matplotlib.colors import ListedColormap
custom_cmap = ListedColormap(['#fafab0', '#9898ff', '#a0faa0'])
plt.contourf(x0, x1, zz, cmap=custom_cmap)
contour = plt.contour(x0, x1, zz1, cmap=plt.cm.brg)
plt.clabel(contour, inline=1, fontsize=12)
plt.xlabel('Petal length',fontsize=14)
plt.ylabel('Petal width',fontsize=14)
plt.legend(loc='upper left', fontsize=14)
plt.axis([0,7,0,3.5])
plt.show()

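# Check the model's accuracy on the test set
# The score is lower here because the dataset is small, and the result also changes with how the three sets happen to be split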
logits = X_test.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)
accuracy_score = np.mean(y_predict == y_test)
accuracy_score

0.9333333333333333
