NTU Hsuan-Tien Lin, *Machine Learning Foundations*: Homework 1 Python implementation
NTU Hsuan-Tien Lin, *Machine Learning Foundations*: Homework 2 Python implementation
NTU Hsuan-Tien Lin, *Machine Learning Foundations*: Homework 3 Python implementation
NTU Hsuan-Tien Lin, *Machine Learning Foundations*: Homework 4 Python implementation

Full code:

https://github.com/xjwhhh/LearningML/tree/master/MLFoundation

Follows and stars are welcome.

While studying and writing this up I referred to quite a few other blog posts, and my own level is limited; if you spot a mistake, please point it out so we can learn and improve together.
## 17, 18

The classification method is "positive and negative rays", covered in lecture.

Problem 17 asks us to sample 20 points in [-1, 1]; sorting them splits the interval into 21 regions that supply the candidate theta values, and with the two choices of sign this gives 42 hypotheses in total. Enumerate all of them, find the hypothesis that minimizes E_in, and record that minimum E_in.

Problem 18 takes the best hypothesis found in problem 17 and computes E_out using the formula from problem 16.

Notes:

1. Per problem 16, 20% flipping noise must be added to the labels.
2. Use the answer to problem 16 to compute E_out.
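For reference, the problem-16 formula used in the code below can be derived as follows. With the stump hypothesis h_{s,θ}(x) = s·sign(x − θ), x uniform on [−1, 1], target f(x) = sign(x), and each label flipped independently with probability 0.2, the noiseless disagreement probability is μ(θ) = |θ|/2 for s = +1 and 1 − |θ|/2 for s = −1, so:

```latex
E_{\text{out}}(h_{s,\theta})
  = 0.8\,\mu + 0.2\,(1 - \mu)
  = 0.2 + 0.6\,\mu
  = 0.5 + 0.3\,s\,(|\theta| - 1)
```

Both sign cases collapse into the single expression on the right, which is exactly what the code accumulates into `sum_Eout`.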
The code is as follows:
```python
import numpy as np


# generate input data with 20% flipping noise
def generate_input_data(time_seed):
    np.random.seed(time_seed)
    raw_X = np.sort(np.random.uniform(-1, 1, 20))
    # flip each label with probability 0.2
    noised_y = np.sign(raw_X) * np.where(np.random.random(raw_X.shape[0]) < 0.2, -1, 1)
    return raw_X, noised_y


def calculate_Ein(x, y):
    # candidate thetas: -inf, the midpoints of adjacent sorted points, +inf
    thetas = np.array([float("-inf")] + [(x[i] + x[i + 1]) / 2 for i in range(0, x.shape[0] - 1)] + [float("inf")])
    Ein = x.shape[0]
    sign = 1
    target_theta = 0.0
    # positive and negative rays
    for theta in thetas:
        y_positive = np.where(x > theta, 1, -1)
        y_negative = np.where(x < theta, 1, -1)
        error_positive = np.sum(y_positive != y)
        error_negative = np.sum(y_negative != y)
        if error_positive > error_negative:
            if Ein > error_negative:
                Ein = error_negative
                sign = -1
                target_theta = theta
        else:
            if Ein > error_positive:
                Ein = error_positive
                sign = 1
                target_theta = theta
    # two corner cases: clamp +/-inf back into [-1, 1] for the E_out formula
    if target_theta == float("inf"):
        target_theta = 1.0
    if target_theta == float("-inf"):
        target_theta = -1.0
    return Ein, target_theta, sign


if __name__ == '__main__':
    T = 5000
    total_Ein = 0
    sum_Eout = 0
    for i in range(0, T):
        x, y = generate_input_data(i)
        curr_Ein, theta, sign = calculate_Ein(x, y)
        total_Ein = total_Ein + curr_Ein
        # E_out = 0.5 + 0.3 * s * (|theta| - 1), from problem 16
        sum_Eout = sum_Eout + 0.5 + 0.3 * sign * (abs(theta) - 1)
    # 17
    print((total_Ein * 1.0) / (T * 20))
    # 18
    print((sum_Eout * 1.0) / T)
```
My run gives 0.17014 for problem 17 and 0.2561434158451364 for problem 18.
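As a sanity check on the enumeration idea, the stump hypothesis h_{s,θ}(x) = s·sign(x − θ) can be exercised on a tiny hand-made dataset. The helper name `stump_predict` and the sample values here are illustrative, not part of the assignment code:

```python
import numpy as np

def stump_predict(x, s, theta):
    # h_{s,theta}(x) = s * sign(x - theta), with sign(0) treated as -1
    return s * np.where(x > theta, 1, -1)

# tiny hand-made dataset, separable by a positive ray near theta = -0.05
x = np.array([-0.9, -0.4, 0.3, 0.8])
y = np.array([-1, -1, 1, 1])

# enumerate both signs and all candidate thetas, keep the lowest-error stump
candidates = [(int(np.sum(stump_predict(x, s, t) != y)), s, float(t))
              for s in (1, -1)
              for t in [-1.0] + list((x[:-1] + x[1:]) / 2)]
best_err, best_s, best_theta = min(candidates)
print(best_err, best_s, best_theta)  # zero errors with a positive ray
```

Because the data are perfectly separated by a positive ray, the exhaustive search finds a stump with E_in = 0, which is the same search the assignment code runs on noisy data.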
## 19

Download the training data. For each dimension separately, compute the error rate with the method from problems 16-18 and keep the best hypothesis for that dimension. Among the nine per-dimension winners, pick the overall best hypothesis, and record its parameters, the dimension it belongs to, and the minimum error rate.

This is essentially no different from 16-18; just remember to store and compare the intermediate results.

The code is as follows:
```python
import numpy as np


def read_input_data(path):
    x = []
    y = []
    for line in open(path).readlines():
        items = line.strip().split(' ')
        tmp_x = []
        for i in range(0, len(items) - 1):
            tmp_x.append(float(items[i]))
        x.append(tmp_x)
        y.append(float(items[-1]))
    return np.array(x), np.array(y)


def calculate_Ein(x, y):
    # candidate thetas: -inf, the midpoints of adjacent sorted points, +inf
    thetas = np.array([float("-inf")] + [(x[i] + x[i + 1]) / 2 for i in range(0, x.shape[0] - 1)] + [float("inf")])
    Ein = x.shape[0]
    sign = 1
    target_theta = 0.0
    # positive and negative rays
    for theta in thetas:
        y_positive = np.where(x > theta, 1, -1)
        y_negative = np.where(x < theta, 1, -1)
        error_positive = np.sum(y_positive != y)
        error_negative = np.sum(y_negative != y)
        if error_positive > error_negative:
            if Ein > error_negative:
                Ein = error_negative
                sign = -1
                target_theta = theta
        else:
            if Ein > error_positive:
                Ein = error_positive
                sign = 1
                target_theta = theta
    return Ein, target_theta, sign


if __name__ == '__main__':
    # 19
    x, y = read_input_data("hw2_train.dat")
    # record the optimal decision stump parameters
    Ein = x.shape[0]
    theta = 0
    sign = 1
    index = 0
    # pick the best decision stump across all dimensions
    for i in range(0, x.shape[1]):
        input_x = x[:, i]
        input_data = np.transpose(np.array([input_x, y]))
        input_data = input_data[np.argsort(input_data[:, 0])]
        curr_Ein, curr_theta, curr_sign = calculate_Ein(input_data[:, 0], input_data[:, 1])
        if Ein > curr_Ein:
            Ein = curr_Ein
            theta = curr_theta
            sign = curr_sign
            index = i
    print((Ein * 1.0) / x.shape[0])
```
The result is 0.25.
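The per-dimension handling above hinges on one detail: each feature column must be sorted together with its labels before the interval midpoints become valid theta candidates. A minimal illustration on toy data (the values here are made up):

```python
import numpy as np

# toy data: 3 samples, 2 features; the labels follow feature 1 exactly
x = np.array([[0.9, -0.5],
              [-0.2, 0.7],
              [0.4, -0.1]])
y = np.array([-1.0, 1.0, -1.0])

for i in range(x.shape[1]):
    # sort the i-th feature together with its labels so that the
    # midpoints between consecutive sorted points are valid thetas
    order = np.argsort(x[:, i])
    xi, yi = x[order, i], y[order]
    thetas = (xi[:-1] + xi[1:]) / 2
    print(i, xi, yi, thetas)
```

Sorting a column without reordering `y` the same way would silently scramble the (x, y) pairing, which is why the assignment code stacks `input_x` and `y` into one array before applying `argsort`.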
## 20

Take the best hypothesis obtained in problem 19 and compute E_out on the corresponding dimension of the test data set.

Note that E_out is now computed on a test set, not with a formula as in 16-18.

The code is as follows; it can be appended directly inside the `if __name__ == '__main__':` block of problem 19 (indented to match):
```python
    # 20
    # test process: evaluate the chosen stump on the test set
    test_x, test_y = read_input_data("hw2_test.dat")
    test_x = test_x[:, index]
    if sign == 1:
        predict_y = np.where(test_x > theta, 1.0, -1.0)
    else:
        predict_y = np.where(test_x < theta, 1.0, -1.0)
    Eout = np.sum(predict_y != test_y)
    print((Eout * 1.0) / test_x.shape[0])
```
The result is 0.355.
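The evaluation in problem 20 reduces to the misclassification rate of the fixed stump on fresh data. A minimal sketch with made-up numbers (the stump parameters and test points below are illustrative, not the homework data or answer):

```python
import numpy as np

# hypothetical stump parameters chosen on some training set
sign, theta = -1, 0.3

# made-up test points and labels
test_x = np.array([-0.8, 0.1, 0.5, 0.9])
test_y = np.array([1.0, 1.0, -1.0, 1.0])

# E_out estimate = fraction of test points the stump misclassifies
predict_y = sign * np.where(test_x > theta, 1.0, -1.0)
Eout = float(np.mean(predict_y != test_y))
print(Eout)  # 0.25: one of the four test points is misclassified
```

Multiplying by `sign` folds the two `if`/`else` branches of the assignment code into a single expression; the behavior is the same.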