Contents

1. Overview
2. Data Fields
3. Hands-On

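1. Overview

I did this work in Jupyter and then exported it to markdown. The command for exporting an ipynb file to markdown is:

jupyter nbconvert --to markdown xxx.ipynb

The source code and data file are available here.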

2. Data Fields

Name		Data Type	Meas.	Description
	----		---------	-----	-----------
	Sex		nominal			M, F, and I (infant)
	Length		continuous	mm	Longest shell measurement
	Diameter	continuous	mm	perpendicular to length
	Height		continuous	mm	with meat in shell
	Whole weight	continuous	grams	whole abalone
	Shucked weight	continuous	grams	weight of meat
	Viscera weight	continuous	grams	gut weight (after bleeding)
	Shell weight	continuous	grams	after being dried
	Rings		integer			+1.5 gives the age in years

The dataset has 9 fields: the first 8 are features and the last one, Rings, is the prediction target. See the data file for details.
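The Rings row of the description says that adding 1.5 gives the age in years. A minimal sketch of that conversion, using a few Rings values copied from the sample rows in this article (the variable names are only for illustration):

```python
import pandas as pd

# A few Rings values taken from the sample rows shown in this article
rings = pd.Series([15, 7, 9, 10, 7], name="Rings")

# Per the data description, age in years is Rings + 1.5
age_years = rings + 1.5
print(age_years.tolist())  # [16.5, 8.5, 10.5, 11.5, 8.5]
```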

3. Hands-On

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
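The cell that creates `dataframe01` is not part of the export. One plausible way to load it is sketched here, assuming a header-less comma-separated file. That format is an assumption, and so is the `io.StringIO` stand-in, which replaces the real file path so the snippet runs on its own.

```python
import io
import pandas as pd

# Column names as listed in the data description
columns = ['Sex', 'Length', 'Diameter', 'Height', 'Whole weight',
           'Shucked weight', 'Viscera weight', 'Shell weight', 'Rings']

# Stand-in for the real file: three rows copied from the preview in this article.
# With a real file, pass its path to read_csv instead of this StringIO object.
raw = io.StringIO(
    "M,0.455,0.365,0.095,0.5140,0.2245,0.1010,0.150,15\n"
    "M,0.350,0.265,0.090,0.2255,0.0995,0.0485,0.070,7\n"
    "F,0.530,0.420,0.135,0.6770,0.2565,0.1415,0.210,9\n"
)

# header=None because the assumed file has no header row
dataframe01 = pd.read_csv(raw, header=None, names=columns)
print(dataframe01.head())
```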

	Sex	Length	Diameter	Height	Whole weight	Shucked weight	Viscera weight	Shell weight	Rings
0	M	0.455	0.365	0.095	0.5140	0.2245	0.1010	0.150	15
1	M	0.350	0.265	0.090	0.2255	0.0995	0.0485	0.070	7
2	F	0.530	0.420	0.135	0.6770	0.2565	0.1415	0.210	9
3	M	0.440	0.365	0.125	0.5160	0.2155	0.1140	0.155	10
4	I	0.330	0.255	0.080	0.2050	0.0895	0.0395	0.055	7
5	I	0.425	0.300	0.095	0.3515	0.1410	0.0775	0.120	8
6	F	0.530	0.415	0.150	0.7775	0.2370	0.1415	0.330	20
7	F	0.545	0.425	0.125	0.7680	0.2940	0.1495	0.260	16
8	M	0.475	0.370	0.125	0.5095	0.2165	0.1125	0.165	9
9	F	0.550	0.440	0.150	0.8945	0.3145	0.1510	0.320	19

# Check the size of the data
dataframe01.shape

(4177, 9)

Index(['Sex', 'Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight',
       'Viscera weight', 'Shell weight', 'Rings'],
      dtype='object')

# Clean the data
# Replace the feature values: convert the characters in Sex to integers
dataframe02 = dataframe01.copy()

# Map each category to an integer code in one step (avoids chained assignment)
dataframe02['Sex'] = dataframe01['Sex'].map({'I': 0, 'F': 1, 'M': 2})

	Sex	Length	Diameter	Height	Whole weight	Shucked weight	Viscera weight	Shell weight	Rings
0	2	0.455	0.365	0.095	0.5140	0.2245	0.1010	0.150	15
1	2	0.350	0.265	0.090	0.2255	0.0995	0.0485	0.070	7
2	1	0.530	0.420	0.135	0.6770	0.2565	0.1415	0.210	9
3	2	0.440	0.365	0.125	0.5160	0.2155	0.1140	0.155	10
4	0	0.330	0.255	0.080	0.2050	0.0895	0.0395	0.055	7
5	0	0.425	0.300	0.095	0.3515	0.1410	0.0775	0.120	8
6	1	0.530	0.415	0.150	0.7775	0.2370	0.1415	0.330	20
7	1	0.545	0.425	0.125	0.7680	0.2940	0.1495	0.260	16
8	2	0.475	0.370	0.125	0.5095	0.2165	0.1125	0.165	9
9	1	0.550	0.440	0.150	0.8945	0.3145	0.1510	0.320	19
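Sex is a nominal variable, so coding it as 0/1/2 gives the linear model an ordering (I < F < M) that the data description does not imply. One common alternative is one-hot encoding. The sketch below uses `pandas.get_dummies` on a toy frame. It illustrates a different technique from the one the notebook uses, and the frame `toy` is made up from the sample rows.

```python
import pandas as pd

# Toy frame with Sex and Length values copied from the first five sample rows
toy = pd.DataFrame({"Sex": ["M", "M", "F", "M", "I"],
                    "Length": [0.455, 0.350, 0.530, 0.440, 0.330]})

# One indicator column per category; drop_first removes one redundant column,
# which is useful when the model also fits an intercept
encoded = pd.get_dummies(toy, columns=["Sex"], drop_first=True, dtype=int)
print(encoded.columns.tolist())  # ['Length', 'Sex_I', 'Sex_M']
```

With `drop_first=True` the category F becomes the baseline, so the coefficients of `Sex_I` and `Sex_M` are read as offsets relative to female abalone.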

# Import the linear regression library
from sklearn.linear_model import LinearRegression as LR
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score

data_index

['Sex',
 'Length',
 'Diameter',
 'Height',
 'Whole weight',
 'Shucked weight',
 'Viscera weight',
 'Shell weight',
 'Rings']

# Get the column names for the feature matrix X and for the target Y
X_index = data_index[0:-1]
Y_index = data_index[-1]

X_index, Y_index

(['Sex',
  'Length',
  'Diameter',
  'Height',
  'Whole weight',
  'Shucked weight',
  'Viscera weight',
  'Shell weight'],
 'Rings')

	Sex	Length	Diameter	Height	Whole weight	Shucked weight	Viscera weight	Shell weight
0	2	0.455	0.365	0.095	0.5140	0.2245	0.1010	0.150
1	2	0.350	0.265	0.090	0.2255	0.0995	0.0485	0.070
2	1	0.530	0.420	0.135	0.6770	0.2565	0.1415	0.210
3	2	0.440	0.365	0.125	0.5160	0.2155	0.1140	0.155
4	0	0.330	0.255	0.080	0.2050	0.0895	0.0395	0.055

0    15
1     7
2     9
3    10
4     7
Name: Rings, dtype: int64
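The cells that build `X` and `Y` are not shown in the export. Presumably they select columns by the names in `X_index` and `Y_index`. The sketch below does that on a toy frame `df`, a stand-in for the cleaned data with fewer columns.

```python
import pandas as pd

# Toy stand-in for the cleaned frame (values copied from the previews in this article)
df = pd.DataFrame({
    "Sex": [2, 2, 1],
    "Length": [0.455, 0.350, 0.530],
    "Rings": [15, 7, 9],
})

data_index = list(df.columns)
X_index, Y_index = data_index[:-1], data_index[-1]

X = df[X_index]   # DataFrame holding the feature columns
Y = df[Y_index]   # Series holding the target column
print(X.shape, Y.shape)
```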

# Split into a training set and a test set
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X,Y,test_size=0.2,random_state=420)

	Sex	Length	Diameter	Height	Whole weight	Shucked weight	Viscera weight	Shell weight
2763	0	0.550	0.425	0.135	0.6560	0.2570	0.1700	0.203
439	2	0.500	0.415	0.165	0.6885	0.2490	0.1380	0.250
1735	2	0.670	0.520	0.165	1.3900	0.7110	0.2865	0.300
751	2	0.485	0.355	0.120	0.5470	0.2150	0.1615	0.140
1626	1	0.570	0.450	0.135	0.7805	0.3345	0.1850	0.210

2763    10
439     13
1735    11
751     10
1626     8
Name: Rings, dtype: int64

# Reset the index of the feature sets
for i in [Xtrain, Xtest]:
    i.index = range(i.shape[0])

# Reset the index of the target sets
for i in [Ytrain, Ytest]:
    i.index = range(i.shape[0])

	Sex	Length	Diameter	Height	Whole weight	Shucked weight	Viscera weight	Shell weight
0	0	0.550	0.425	0.135	0.6560	0.2570	0.1700	0.203
1	2	0.500	0.415	0.165	0.6885	0.2490	0.1380	0.250
2	2	0.670	0.520	0.165	1.3900	0.7110	0.2865	0.300
3	2	0.485	0.355	0.120	0.5470	0.2150	0.1615	0.140
4	1	0.570	0.450	0.135	0.7805	0.3345	0.1850	0.210

0    10
1    13
2    11
3    10
4     8
Name: Rings, dtype: int64
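The loops that assign a fresh range to `.index` do the job. pandas also provides `reset_index(drop=True)`, which returns a renumbered copy. A small sketch on a toy Series, with values and index labels copied from the output shown earlier:

```python
import pandas as pd

# Toy Series carrying the shuffled index that train_test_split leaves behind
Ytrain = pd.Series([10, 13, 11], index=[2763, 439, 1735], name="Rings")

# drop=True discards the old index instead of keeping it as a column
Ytrain = Ytrain.reset_index(drop=True)
print(Ytrain.index.tolist())  # [0, 1, 2]
```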

# Fit the standardization class on the training set first, then use the fitted class to transform the training set and the test set separately
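The notebook states this rule but does not show the corresponding code. A minimal sketch of the fit-on-train, transform-both pattern with scikit-learn's `StandardScaler`, on synthetic data (the `*_demo` arrays are made up for illustration and are not the abalone features):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
Xtrain_demo = rng.normal(loc=5.0, scale=2.0, size=(80, 3))
Xtest_demo = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Learn the mean and standard deviation from the training set only
scaler = StandardScaler().fit(Xtrain_demo)

# Apply the same training-set statistics to both sets
Xtrain_std = scaler.transform(Xtrain_demo)
Xtest_std = scaler.transform(Xtest_demo)

print(Xtrain_std.mean(axis=0).round(6))
print(Xtrain_std.std(axis=0).round(6))
```

Fitting on the training set alone keeps information about the test set out of the preprocessing step. The transformed test set will therefore not have exactly zero mean and unit variance.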

# Build the model
reg = LR().fit(Xtrain, Ytrain)

4.22923686878166

22.656846035572762

array([  0.40527178,  -0.88791132,  13.01662939,  10.39250886,
         9.64127293, -20.87747601, -10.50683081,   7.70632772])

Xtrain.columns

Index(['Sex', 'Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight',
       'Viscera weight', 'Shell weight'],
      dtype='object')

[('Sex', 0.4052717783379893),
 ('Length', -0.8879113179582045),
 ('Diameter', 13.016629389061475),
 ('Height', 10.39250886428478),
 ('Whole weight', 9.64127293101552),
 ('Shucked weight', -20.87747600529615),
 ('Viscera weight', -10.506830809919672),
 ('Shell weight', 7.706327719866024)]
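The export does not show which calls produced the two bare numbers printed after the fit, so they are left unlabeled here. For reference, this sketch shows common ways to score a linear regression with scikit-learn, run on synthetic data rather than the abalone set. All `*_demo` names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression as LR
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic regression problem with known weights and a little noise
rng = np.random.default_rng(420)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=420)
reg_demo = LR().fit(Xtr, ytr)
pred = reg_demo.predict(Xte)

# Mean squared error and R^2 on the held-out test set
mse = mean_squared_error(yte, pred)
r2 = r2_score(yte, pred)

# 5-fold cross-validated MSE; scikit-learn reports it negated, so flip the sign
cv_mse = -cross_val_score(LR(), X_demo, y_demo, cv=5,
                          scoring="neg_mean_squared_error").mean()
print(mse, r2, cv_mse)
```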

# Intercept
reg.intercept_

2.7888240054011835
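Together, the coefficients and the intercept define the fitted model: a prediction is the intercept plus the sum of each feature multiplied by its coefficient. A short check of that identity on synthetic data (the `*_demo` names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free synthetic target: y = 2*x0 - 1*x1 + 4
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 2))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + 4.0

reg_demo = LinearRegression().fit(X_demo, y_demo)

# Prediction by hand: X @ coef_ + intercept_
manual = X_demo @ reg_demo.coef_ + reg_demo.intercept_
print(np.allclose(manual, reg_demo.predict(X_demo)))  # True
```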

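# Custom least-squares attempt
def my_least_squares(x_array, y_array):
    '''
    :param x_array: array-like, an m*n feature matrix
    :param y_array: array-like, the m target values
    :return: coef: list of the n regression coefficients   intercept: float
    '''
    # Convert to float arrays (a column stored as object dtype would break np.linalg.inv)
    arr_x_01 = np.asarray(x_array, dtype=float)
    arr_y_01 = np.asarray(y_array, dtype=float)

    # Turn the m*n matrix into an m*(n+1) matrix whose first column is all ones
    # Get the number of rows
    row_num = arr_x_01.shape[0]

    # Constant column for the intercept, an m*1 matrix
    arr_b = np.ones((row_num, 1))

    # Combine into an m*(n+1) matrix
    arr_x_02 = np.hstack([arr_b, arr_x_01])

    # Normal equation: w = (X^T X)^-1 X^T y
    w = np.linalg.inv(np.matmul(arr_x_02.T, arr_x_02))
    w = np.matmul(w, arr_x_02.T)
    w = np.matmul(w, arr_y_01)

    # w has n+1 entries: w[0] is the intercept, w[1:] are the coefficients
    result = list(w)
    intercept = result.pop(0)
    coef = result

    return coef, intercept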

# Still being debugged
my_least_squares(Xtrain,list(Ytrain))
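To sanity-check a hand-written least-squares routine, its output can be compared with scikit-learn on data where the true weights are known. The sketch below does this with the normal equation on synthetic data. It uses `np.linalg.solve` in place of an explicit inverse, which is a substitution made here and not something the notebook does, and all `*_demo` names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([0.5, 1.0, -2.0]) + 1.5 + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so the first weight plays the role of the intercept
X_aug = np.hstack([np.ones((X_demo.shape[0], 1)), X_demo])

# Solve (X^T X) w = X^T y without forming the inverse explicitly
w = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y_demo)
intercept, coef = w[0], w[1:]

# Reference fit from scikit-learn
ref = LinearRegression().fit(X_demo, y_demo)
print(np.allclose(coef, ref.coef_), np.isclose(intercept, ref.intercept_))
```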

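# Gradient descent attempt
# X, Y and theta are expected to be np.matrix objects, so that * means matrix multiplication
def costFunc(X,Y,theta):
    '''
    Cost function: sum of squared errors divided by 2m
    '''
    inner = np.power((X*theta.T)-Y,2)
    return np.sum(inner)/(2*len(X))

def gradientDescent(X,Y,theta,alpha,iters):
    '''
    Batch gradient descent
    :param X: m*n np.matrix   Y: m*1 np.matrix   theta: 1*n np.matrix
    :param alpha: learning rate   iters: number of iterations
    :return: the final theta and the cost after each iteration
    '''
    # np.matrix is used because the np.mat alias is no longer available in NumPy 2.0
    temp = np.matrix(np.zeros(theta.shape))
    cost = np.zeros(iters)
    thetaNums = int(theta.shape[1])
    for i in range(iters):
        error = (X*theta.T-Y)
        for j in range(thetaNums):
            derivativeInner = np.multiply(error,X[:,j])
            temp[0,j] = theta[0,j] - (alpha*np.sum(derivativeInner)/len(X))

        # Copy so that theta and temp do not share memory
        theta = temp.copy()
        cost[i] = costFunc(X,Y,theta)

    return theta,cost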
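The same update can be written without the inner loop over j by using plain NumPy arrays. What follows is a vectorized sketch of batch gradient descent for the same cost function, run on synthetic data. The function name `gradient_descent`, its default `alpha` and `iters`, and the demo arrays are all assumptions made for illustration.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=2000):
    """Batch gradient descent for the cost sum((X @ theta - y)**2) / (2 * m)."""
    m, n = X.shape
    theta = np.zeros(n)
    cost = np.zeros(iters)
    for i in range(iters):
        error = X @ theta - y
        # Gradient of the cost with respect to theta is X^T error / m
        theta = theta - alpha * (X.T @ error) / m
        cost[i] = np.sum((X @ theta - y) ** 2) / (2 * m)
    return theta, cost

# Synthetic, noise-free data with known weights; the first column of ones
# lets theta[0] act as the intercept
rng = np.random.default_rng(3)
features = rng.normal(size=(200, 2))
X_aug = np.hstack([np.ones((200, 1)), features])
y_demo = X_aug @ np.array([1.0, 2.0, -3.0])

theta, cost = gradient_descent(X_aug, y_demo)
print(theta.round(4))
```

On features with very different scales, the learning rate that works here may diverge or converge slowly, which is one practical reason to standardize the inputs first.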
