
Gradient Descent - Logistic Regression: Study Notes

Solving Logistic Regression with Gradient Descent

Goal: build a classifier, i.e. solve for the three parameters

$\theta_0,\ \theta_1,\ \theta_2$

then set a threshold and use it to decide the admission result.

Modules to implement:

  • sigmoid : maps a value to a probability
  • model : returns the predicted value
  • cost : computes the loss for the given parameters
  • gradient : computes the gradient direction for each parameter
  • descent : performs the parameter update
  • accuracy : computes the classification accuracy

Part 1: the sigmoid function

Formula:

$g(z) = \frac{1}{1+e^{-z}}$

def sigmoid(z):
	return 1 / (1 + np.exp(-z))
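
A quick sanity check of basic sigmoid properties (illustrative only):

import numpy as np

print(sigmoid(0))     # 0.5: the midpoint
print(sigmoid(10))    # ~0.99995: approaches 1 for large positive z
print(sigmoid(-10))   # ~0.0000454: approaches 0 for large negative z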
           

Part 2: the model function

Formula:

$\begin{pmatrix}\theta_{0} & \theta_{1} & \theta_{2}\end{pmatrix} \times \begin{pmatrix}1\\ x_{1}\\ x_{2}\end{pmatrix} = \theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}$

def model(X, theta):
	return sigmoid(np.dot(X, theta.T))
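
For reference: with X of shape (m, 3) (after the column of ones is inserted below) and theta of shape (1, 3), np.dot(X, theta.T) gives an (m, 1) column of scores that sigmoid maps to probabilities. A small made-up example:

import numpy as np

X_demo = np.array([[1.0, 34.6, 78.0],
                   [1.0, 30.3, 43.9]])   # two hypothetical rows: [1, exam1, exam2]
theta_demo = np.zeros([1, 3])
print(model(X_demo, theta_demo).shape)   # (2, 1)
print(model(X_demo, theta_demo))         # all 0.5 while theta is all zeros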
           

Part 3: the cost function

Negative log-likelihood (the log-likelihood with the sign flipped):

$D(h_\theta(x), y) = -y\log(h_\theta(x)) - (1-y)\log(1-h_\theta(x))$

Average loss:

$J(\theta)=\frac{1}{n}\sum_{i=1}^{n} D(h_\theta(x_i), y_i)$

def cost(X, y, theta):
	left = np.multiply(-y, np.log(model(X, theta)))
	right = np.multiply(1 - y, np.log(1- model(X, theta)))
	return np.sum(left - right) / (len(X))
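
One practical caveat: if model(X, theta) ever returns exactly 0 or 1 (possible for extreme inputs before the data are standardised), np.log yields -inf and the cost blows up. A clipped variant, purely as an illustrative safeguard:

import numpy as np

def cost_stable(X, y, theta, eps=1e-15):
    # clip predictions away from 0 and 1 so both logs stay finite
    p = np.clip(model(X, theta), eps, 1 - eps)
    left = np.multiply(-y, np.log(p))
    right = np.multiply(1 - y, np.log(1 - p))
    return np.sum(left - right) / len(X)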
           

Part 4: computing the gradient

Formula:

$\frac{\partial J}{\partial \theta_j}=-\frac{1}{n}\sum_{i=1}^{n}(y_i - h_\theta(x_i))\,x_{ij}$

def gradient(X, y, theta):
	grad = np.zeros(theta.shape)
	error = (model(X, theta) - y).ravel()
	for j in range(len(theta.ravel())):
		term = np.multiply(error, X[:,j])
		grad[0, j] = np.sum(term) / len(X)

	return grad
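
For reference, the same gradient can be computed without the explicit loop over parameters (an equivalent vectorised sketch):

def gradient_vec(X, y, theta):
    error = model(X, theta) - y          # shape (m, 1)
    return (X.T @ error).T / len(X)      # X.T @ error is (3, 1); transpose to theta's (1, 3) shape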
           

Gradient Descent - Logistic Regression (full code)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('LogiReg_data.txt', header=None, names=['Exam 1','Exam 2','Admitted'])

# plot the data to take a look
positive = df[df['Admitted'] == 1]
negative = df[df['Admitted'] == 0]
fig,ax = plt.subplots(figsize = (10,8))
ax.scatter(positive['Exam 1'], positive['Exam 2'], s=60,color='g',marker='o',label='Admitted')
ax.scatter(negative['Exam 1'], negative['Exam 2'], s=60,color='r',marker='x',label='Not Admitted')
ax.legend()
ax.grid()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')
           
(Figure: scatter plot of Exam 1 vs. Exam 2 scores, admitted vs. not admitted)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
def model(X, theta):
    return sigmoid(np.dot(X, theta.T))

df.insert(0,'Ones',1)  # insert a column of ones (the intercept term)
orig_data = df.values
cols = orig_data.shape[1]
X = orig_data[:,0:cols-1]
y = orig_data[:,cols-1:cols]
theta = np.zeros([1,3])

def cost(X, y, theta):
    left = np.multiply(-y, np.log(model(X, theta)))
    right= np.multiply(1-y, np.log(1-model(X,theta)))
    return np.sum(left - right) / len(X)
cost(X, y, theta)  # initial cost; with theta all zeros the model predicts 0.5 everywhere, so this is -log(0.5) ≈ 0.6931

def gradient(X, y, theta):
    grad = np.zeros(theta.shape)
    error = (model(X, theta) - y).ravel()
    for j in range(len(theta.ravel())):
        term = np.multiply(error, X[:,j])
        grad[0,j]= np.sum(term)/len(X)
    return grad
           

Gradient descent, three variants (full-batch, stochastic, mini-batch), with three stopping strategies:

stop_iter = 0
stop_cost = 1
stop_grad = 2
def stopCriterion(type, value, threshold):
    if type == stop_iter:
        return value > threshold
    elif type == stop_cost:
        return abs(value[-1]-value[-2]) < threshold
    elif type == stop_grad:
        return np.linalg.norm(value) < threshold  # vector norm of the gradient; see https://blog.csdn.net/hqh131360239/article/details/79061535
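
A quick illustration of the three criteria (numbers made up):

print(stopCriterion(stop_iter, 5001, 5000))                              # True: iteration count exceeded
print(stopCriterion(stop_cost, [0.6312, 0.6311], 0.001))                 # True: the cost barely changed
print(stopCriterion(stop_grad, np.array([[0.01, 0.005, 0.002]]), 0.05))  # True: gradient norm is small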

import numpy.random
def shuffleData(data):
    np.random.shuffle(data)
    cols = data.shape[1]
    X = data[:, 0:cols-1]
    y = data[:, cols-1:]
    return X,y

import time
def descent(data, theta, batchSize, stopType, thresh, alpha):
    init_time = time.time()
    i = 0
    k = 0
    X,y = shuffleData(data)
    grad = np.zeros(theta.shape)
    costs = [cost(X,y,theta)]
    
    while True:
        grad = gradient(X[k:k+batchSize], y[k:k+batchSize], theta)
        k += batchSize
        if k >= n:  # n is the total number of samples, set globally below (n = 100)
            k = 0
            X,y = shuffleData(data)
 
        theta = theta - alpha*grad
        costs.append(cost(X, y, theta))
        i += 1
        if stopType == stop_iter:
            value = i
        elif stopType == stop_cost:
            value = costs
        elif stopType == stop_grad:
            value = grad
        if stopCriterion(stopType, value, thresh):
            break
    return theta, i-1, costs, grad, time.time() - init_time

def runExpe(data, theta, batchSize, stopType, thresh, alpha):
    theta, iter, costs, grad, dur = descent(data,theta,batchSize,stopType,thresh,alpha)
    name = 'Original' if (data[:,1]>2).sum() > 1 else 'Scaled'  # raw exam scores are all > 2; standardised scores are not
    name += ' data - learning rate: {} - '.format(alpha)
    if batchSize == n:
        strDescType = 'Gradient'
    elif batchSize == 1:
        strDescType = 'Stochastic'
    else:
        strDescType = 'Mini-batch ({})'.format(batchSize)
    name += strDescType + ' descent - Stop: '
    if stopType == stop_iter:
        strStop = '{} iterations'.format(thresh)
    elif stopType == stop_cost:
        strStop = 'costs change < {}'.format(thresh)
    else:
        strStop = 'gradient norm < {}'.format(thresh)
    name += strStop
    print('***{}\nTheta: {} - Iter: {} - Last cost: {:.2f} - Duration: {:.2f}s'.
          format(name, theta, iter, costs[-1], dur))
    fig,ax = plt.subplots(figsize=(15,4))
    ax.plot(np.arange(len(costs)), costs, 'r')
    ax.set_xlabel('Iterations')
    ax.set_ylabel('Cost')
    ax.set_title(name + ' - Error vs. Iteration')
    ax.grid()
    return theta
           

Set the parameters:

n = 100  # total number of samples; batchSize = n gives full-batch gradient descent
runExpe(orig_data, theta, n, stop_iter, thresh=5000, alpha=0.000001)
           
(Figure: cost vs. iteration for full-batch gradient descent on the original data)

Tuning the parameters:

runExpe(orig_data, theta, 1, stop_iter, thresh=1000, alpha=0.001)
runExpe(orig_data, theta, 1, stop_iter, thresh=15000, alpha=0.000001)
runExpe(orig_data, theta, 16, stop_iter, thresh=15000, alpha=0.0001)
           
(Figures: cost vs. iteration for the three runs above)

- Tuning again after preprocessing (standardising) the data:

from sklearn import preprocessing as pp
scaled_data = orig_data.copy()
scaled_data[:, 1:3] = pp.scale(orig_data[:, 1:3])
runExpe(scaled_data,theta,n,stop_iter,thresh=5000,alpha=0.001)
           
(Figure: cost vs. iteration for full-batch gradient descent on the standardised data)
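
A quick check of what preprocessing.scale did (each scaled column should have mean ~0 and standard deviation ~1):

print(scaled_data[:, 1:3].mean(axis=0))   # ~[0, 0]
print(scaled_data[:, 1:3].std(axis=0))    # ~[1, 1]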
runExpe(scaled_data, theta, n, stop_grad, thresh=0.02, alpha=0.001)
runExpe(scaled_data, theta, 1, stop_grad, thresh=0.0004, alpha=0.001)
           
(Figures: cost vs. iteration for the two runs above)
  • Computing the accuracy
def predict(X, theta):
    return [1 if x >= 0.5 else 0 for x in model(X, theta)]  # threshold the probabilities at 0.5
# make sure theta holds parameters trained on the scaled data, e.g. from the stop_grad run above:
theta = runExpe(scaled_data, theta, n, stop_grad, thresh=0.02, alpha=0.001)
scaled_X = scaled_data[:, :3]
y = scaled_data[:, 3]
predictions = predict(scaled_X, theta)
correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0 for (a, b) in zip(predictions, y)]
accuracy = sum(map(int, correct)) / len(correct) * 100  # fraction correct, as a percentage
print('accuracy = {:.0f}%'.format(accuracy))
           

accuracy = 90%
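
As a closing illustration, the learned parameters define a linear decision boundary theta0 + theta1*x1 + theta2*x2 = 0 in the standardised feature space, which can be drawn over the scatter plot (a minimal sketch using the variables defined above):

# sketch: decision boundary implied by the trained theta, in standardised coordinates
pos = scaled_data[scaled_data[:, 3] == 1]
neg = scaled_data[scaled_data[:, 3] == 0]
fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(pos[:, 1], pos[:, 2], s=60, color='g', marker='o', label='Admitted')
ax.scatter(neg[:, 1], neg[:, 2], s=60, color='r', marker='x', label='Not Admitted')
x1 = np.linspace(scaled_data[:, 1].min(), scaled_data[:, 1].max(), 50)
x2 = -(theta[0, 0] + theta[0, 1] * x1) / theta[0, 2]   # solve theta0 + theta1*x1 + theta2*x2 = 0 for x2
ax.plot(x1, x2, 'b', label='Decision boundary')
ax.legend()
ax.set_xlabel('Exam 1 Score (standardised)')
ax.set_ylabel('Exam 2 Score (standardised)')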