NTU Hsuan-Tien Lin, *Machine Learning Foundations*: Homework 1 Python implementation
NTU Hsuan-Tien Lin, *Machine Learning Foundations*: Homework 2 Python implementation
NTU Hsuan-Tien Lin, *Machine Learning Foundations*: Homework 3 Python implementation
NTU Hsuan-Tien Lin, *Machine Learning Foundations*: Homework 4 Python implementation

Full code:

https://github.com/xjwhhh/LearningML/tree/master/MLFoundation

Follows and stars are welcome.

While studying and writing this up I referred to quite a few other blog posts, and my own level is limited; if you spot a mistake, please point it out so we can learn and improve together.
## 17, 18

The classification method is "positive and negative rays", covered in lecture.

Problem 17 asks us to sample 20 points in [-1, 1]; sorting them splits the interval into 21 regions that supply the candidate theta values, and with the two choices of sign this gives 42 hypotheses in total. Enumerate all of them, find the hypothesis that minimizes E_in, and record that minimum E_in.

Problem 18 takes the best hypothesis found in problem 17 and computes E_out using the formula from problem 16.

Notes:

1. Per problem 16, 20% flipping noise must be added to the labels.
2. Use the answer to problem 16 to compute E_out.
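For reference, the problem-16 formula used in the code below can be derived as follows. With the stump hypothesis h_{s,θ}(x) = s·sign(x − θ), x uniform on [−1, 1], target f(x) = sign(x), and each label flipped independently with probability 0.2, the noiseless disagreement probability is μ(θ) = |θ|/2 for s = +1 and 1 − |θ|/2 for s = −1, so:

```latex
E_{\text{out}}(h_{s,\theta})
  = 0.8\,\mu + 0.2\,(1 - \mu)
  = 0.2 + 0.6\,\mu
  = 0.5 + 0.3\,s\,(|\theta| - 1)
```

Both sign cases collapse into the single expression on the right, which is exactly what the code accumulates into `sum_Eout`.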
The code is as follows:
```python
import numpy as np


# generate input data with 20% flipping noise
def generate_input_data(time_seed):
    np.random.seed(time_seed)
    raw_X = np.sort(np.random.uniform(-1, 1, 20))
    # flip each label with probability 0.2
    noised_y = np.sign(raw_X) * np.where(np.random.random(raw_X.shape[0]) < 0.2, -1, 1)
    return raw_X, noised_y


def calculate_Ein(x, y):
    # candidate thetas: -inf, the midpoints of adjacent sorted points, +inf
    thetas = np.array([float("-inf")] + [(x[i] + x[i + 1]) / 2 for i in range(0, x.shape[0] - 1)] + [float("inf")])
    Ein = x.shape[0]
    sign = 1
    target_theta = 0.0
    # positive and negative rays
    for theta in thetas:
        y_positive = np.where(x > theta, 1, -1)
        y_negative = np.where(x < theta, 1, -1)
        error_positive = np.sum(y_positive != y)
        error_negative = np.sum(y_negative != y)
        if error_positive > error_negative:
            if Ein > error_negative:
                Ein = error_negative
                sign = -1
                target_theta = theta
        else:
            if Ein > error_positive:
                Ein = error_positive
                sign = 1
                target_theta = theta
    # two corner cases: clamp +/-inf back into [-1, 1] for the E_out formula
    if target_theta == float("inf"):
        target_theta = 1.0
    if target_theta == float("-inf"):
        target_theta = -1.0
    return Ein, target_theta, sign


if __name__ == '__main__':
    T = 5000
    total_Ein = 0
    sum_Eout = 0
    for i in range(0, T):
        x, y = generate_input_data(i)
        curr_Ein, theta, sign = calculate_Ein(x, y)
        total_Ein = total_Ein + curr_Ein
        # E_out = 0.5 + 0.3 * s * (|theta| - 1), from problem 16
        sum_Eout = sum_Eout + 0.5 + 0.3 * sign * (abs(theta) - 1)
    # 17
    print((total_Ein * 1.0) / (T * 20))
    # 18
    print((sum_Eout * 1.0) / T)
```
My run gives 0.17014 for problem 17 and 0.2561434158451364 for problem 18.
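As a sanity check on the enumeration idea, the stump hypothesis h_{s,θ}(x) = s·sign(x − θ) can be exercised on a tiny hand-made dataset. The helper name `stump_predict` and the sample values here are illustrative, not part of the assignment code:

```python
import numpy as np

def stump_predict(x, s, theta):
    # h_{s,theta}(x) = s * sign(x - theta), with sign(0) treated as -1
    return s * np.where(x > theta, 1, -1)

# tiny hand-made dataset, separable by a positive ray near theta = -0.05
x = np.array([-0.9, -0.4, 0.3, 0.8])
y = np.array([-1, -1, 1, 1])

# enumerate both signs and all candidate thetas, keep the lowest-error stump
candidates = [(int(np.sum(stump_predict(x, s, t) != y)), s, float(t))
              for s in (1, -1)
              for t in [-1.0] + list((x[:-1] + x[1:]) / 2)]
best_err, best_s, best_theta = min(candidates)
print(best_err, best_s, best_theta)  # zero errors with a positive ray
```

Because the data are perfectly separated by a positive ray, the exhaustive search finds a stump with E_in = 0, which is the same search the assignment code runs on noisy data.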
## 19

Download the training data. For each dimension separately, compute the error rate with the method from problems 16-18 and keep the best hypothesis for that dimension. Among the nine per-dimension winners, pick the overall best hypothesis, and record its parameters, the dimension it belongs to, and the minimum error rate.

This is essentially no different from 16-18; just remember to store and compare the intermediate results.

The code is as follows:
```python
import numpy as np


def read_input_data(path):
    x = []
    y = []
    for line in open(path).readlines():
        items = line.strip().split(' ')
        tmp_x = []
        for i in range(0, len(items) - 1):
            tmp_x.append(float(items[i]))
        x.append(tmp_x)
        y.append(float(items[-1]))
    return np.array(x), np.array(y)


def calculate_Ein(x, y):
    # candidate thetas: -inf, the midpoints of adjacent sorted points, +inf
    thetas = np.array([float("-inf")] + [(x[i] + x[i + 1]) / 2 for i in range(0, x.shape[0] - 1)] + [float("inf")])
    Ein = x.shape[0]
    sign = 1
    target_theta = 0.0
    # positive and negative rays
    for theta in thetas:
        y_positive = np.where(x > theta, 1, -1)
        y_negative = np.where(x < theta, 1, -1)
        error_positive = np.sum(y_positive != y)
        error_negative = np.sum(y_negative != y)
        if error_positive > error_negative:
            if Ein > error_negative:
                Ein = error_negative
                sign = -1
                target_theta = theta
        else:
            if Ein > error_positive:
                Ein = error_positive
                sign = 1
                target_theta = theta
    return Ein, target_theta, sign


if __name__ == '__main__':
    # 19
    x, y = read_input_data("hw2_train.dat")
    # record the optimal decision stump parameters
    Ein = x.shape[0]
    theta = 0
    sign = 1
    index = 0
    # pick the best decision stump across all dimensions
    for i in range(0, x.shape[1]):
        input_x = x[:, i]
        input_data = np.transpose(np.array([input_x, y]))
        input_data = input_data[np.argsort(input_data[:, 0])]
        curr_Ein, curr_theta, curr_sign = calculate_Ein(input_data[:, 0], input_data[:, 1])
        if Ein > curr_Ein:
            Ein = curr_Ein
            theta = curr_theta
            sign = curr_sign
            index = i
    print((Ein * 1.0) / x.shape[0])
```
The result is 0.25.
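The per-dimension handling above hinges on one detail: each feature column must be sorted together with its labels before the interval midpoints become valid theta candidates. A minimal illustration on toy data (the values here are made up):

```python
import numpy as np

# toy data: 3 samples, 2 features; the labels follow feature 1 exactly
x = np.array([[0.9, -0.5],
              [-0.2, 0.7],
              [0.4, -0.1]])
y = np.array([-1.0, 1.0, -1.0])

for i in range(x.shape[1]):
    # sort the i-th feature together with its labels so that the
    # midpoints between consecutive sorted points are valid thetas
    order = np.argsort(x[:, i])
    xi, yi = x[order, i], y[order]
    thetas = (xi[:-1] + xi[1:]) / 2
    print(i, xi, yi, thetas)
```

Sorting a column without reordering `y` the same way would silently scramble the (x, y) pairing, which is why the assignment code stacks `input_x` and `y` into one array before applying `argsort`.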
## 20

Take the best hypothesis obtained in problem 19 and compute E_out on the corresponding dimension of the test data set.

Note that E_out is now computed on a test set, not with a formula as in 16-18.

The code is as follows; it can be appended directly inside the `if __name__ == '__main__':` block of problem 19 (indented to match):
```python
    # 20
    # test process: evaluate the chosen stump on the test set
    test_x, test_y = read_input_data("hw2_test.dat")
    test_x = test_x[:, index]
    if sign == 1:
        predict_y = np.where(test_x > theta, 1.0, -1.0)
    else:
        predict_y = np.where(test_x < theta, 1.0, -1.0)
    Eout = np.sum(predict_y != test_y)
    print((Eout * 1.0) / test_x.shape[0])
```
The result is 0.355.
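The evaluation in problem 20 reduces to the misclassification rate of the fixed stump on fresh data. A minimal sketch with made-up numbers (the stump parameters and test points below are illustrative, not the homework data or answer):

```python
import numpy as np

# hypothetical stump parameters chosen on some training set
sign, theta = -1, 0.3

# made-up test points and labels
test_x = np.array([-0.8, 0.1, 0.5, 0.9])
test_y = np.array([1.0, 1.0, -1.0, 1.0])

# E_out estimate = fraction of test points the stump misclassifies
predict_y = sign * np.where(test_x > theta, 1.0, -1.0)
Eout = float(np.mean(predict_y != test_y))
print(Eout)  # 0.25: one of the four test points is misclassified
```

Multiplying by `sign` folds the two `if`/`else` branches of the assignment code into a single expression; the behavior is the same.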