[MachineLearning] Implementing the K-Nearest Neighbors Algorithm

1. Steps

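Data preparation: clean and preprocess the data so that each record becomes a vector.
Distance computation: compute the distance between the test data and each training sample.
Neighbor search: find the K training samples closest to the test data.
Classification decision: apply a decision rule to the K neighbors to obtain the class of the test data.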
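The four steps above can be sketched compactly with NumPy. This is only an illustrative sketch: the function name `knn_sketch` and the toy data are made up for this example and are not part of the walkthrough that follows.

```python
import numpy as np


def knn_sketch(query, train_x, train_y, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the query to every training vector
    distances = np.linalg.norm(train_x - query, axis=1)
    # Step 3: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote over the neighbors' labels
    values, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return str(values[np.argmax(counts)])


# Step 1: each record is already a numeric vector (toy data, made up for illustration)
toy_x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
toy_y = ['near', 'near', 'near', 'far', 'far', 'far']

print(knn_sketch(np.array([0.5, 0.5]), toy_x, toy_y, k=3))  # near
print(knn_sketch(np.array([5.2, 5.1]), toy_x, toy_y, k=3))  # far
```

In this sketch a tie in the vote goes to whichever label sorts first, because `np.unique` returns the labels in sorted order; a real application may want a different tie-breaking rule.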

The rest of this post walks through a complete KNN classification workflow.

(1) Data generation

"""Generate sample data
"""
import numpy as np


def create_data():
    features = np.array(
        [[2.88, 3.05], [3.1, 2.45], [3.05, 2.8], [2.9, 2.7], [2.75, 3.4],
         [3.23, 2.9], [3.2, 3.75], [3.5, 2.9], [3.65, 3.6], [3.35, 3.3]])
    labels = ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
    return features, labels

"""Print the sample data
"""
features, labels = create_data()
print('features: \n', features)
print('labels: \n', labels)

"""Plot the sample data
"""
from matplotlib import pyplot as plt
%matplotlib inline

plt.figure(figsize=(5, 5))
plt.xlim((2.4, 3.8))
plt.ylim((2.4, 3.8))
x_feature = list(map(lambda x: x[0], features))  # x feature of every sample
y_feature = list(map(lambda y: y[1], features))  # y feature of every sample
plt.scatter(x_feature[:5], y_feature[:5], c="b")  # Plot the samples labeled "A"
plt.scatter(x_feature[5:], y_feature[5:], c="g")  # Plot the samples labeled "B"
plt.scatter([3.18], [3.15], c="r", marker="x")  # Test point at [3.18, 3.15]

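Legend:

Samples labeled A (blue dots) sit in the lower-left region.

Samples labeled B (green dots) sit in the upper-right region.

The red × is the test point to be classified.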

(2) Algorithm implementation

Distance measure: Euclidean distance
Decision rule for classification: majority voting
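The two rules can be tried in isolation before they are combined. The sketch below uses only the standard library; the names `euclidean_distance` and `majority_vote` are illustrative choices for this example, not helpers taken from the post.

```python
import math
from collections import Counter


def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def majority_vote(neighbor_labels):
    """Return the most frequent label among the neighbors."""
    return Counter(neighbor_labels).most_common(1)[0][0]


print(euclidean_distance([0, 0], [3, 4]))        # 5.0
print(majority_vote(['B', 'B', 'A', 'A', 'B']))  # B
```

With an even K the vote can tie; `Counter.most_common` then returns the label that was encountered first, which is one possible tie-breaking rule among several.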

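test_data: the input vector to classify.
train_data: the training sample set.
labels: the class labels of the training samples.
k: the number of nearest neighbors to use.

"""Complete KNN implementation
"""


def d_euc(x, y):
    # Euclidean distance. This helper is not defined in this section, so a minimal version is assumed here.
    return np.sqrt(np.sum(np.square(x - y)))


def majority_voting(class_count):
    # Majority vote. Also assumed here: returns the label with the most votes.
    return max(class_count, key=class_count.get)


def knn_classify(test_data, train_data, labels, k):
    distances = np.array([])  # Empty array that will hold the distances

    for each_data in train_data:  # Measure similarity with the Euclidean distance
        d = d_euc(test_data, each_data)
        distances = np.append(distances, d)

    sorted_distance_index = distances.argsort()  # Indices that sort the distances in ascending order
    sorted_distance = np.sort(distances)
    # Radius for the plot: midway between the k-th and (k+1)-th nearest distances (requires k < len(train_data))
    r = (sorted_distance[k]+sorted_distance[k-1])/2

    class_count = {}
    for i in range(k):  # Count the votes of the k nearest neighbors
        vote_label = labels[sorted_distance_index[i]]
        class_count[vote_label] = class_count.get(vote_label, 0) + 1

    final_label = majority_voting(class_count)
    return final_label, r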

Classification prediction

Classify the unknown point [3.18, 3.15], initially setting K to 5.

test_data = np.array([3.18, 3.15])
final_label, r = knn_classify(test_data, features, labels, 5)
final_label

A plot makes the way KNN reaches its decision easier to see.

def circle(r, a, b):  # Draw the circle from its parametric form: x = a + r*cos(θ), y = b + r*sin(θ)
    theta = np.arange(0, 2*np.pi, 0.01)
    x = a+r * np.cos(theta)
    y = b+r * np.sin(theta)
    return x, y


k_circle_x, k_circle_y = circle(r, 3.18, 3.15)

plt.figure(figsize=(5, 5))
plt.xlim((2.4, 3.8))
plt.ylim((2.4, 3.8))
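x_feature = list(map(lambda x: x[0], features))  # x feature of every sample
y_feature = list(map(lambda y: y[1], features))  # y feature of every sample
plt.scatter(x_feature[:5], y_feature[:5], c="b")  # Plot the samples labeled "A"
plt.scatter(x_feature[5:], y_feature[5:], c="g")  # Plot the samples labeled "B"
plt.scatter([3.18], [3.15], c="r", marker="x")  # Test point at [3.18, 3.15]
plt.plot(k_circle_x, k_circle_y)  # Circle enclosing the K nearest neighbors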

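When K is 5, of the 5 training samples closest to the test sample (those inside the blue circle), 3 belong to class B and 2 belong to class A, so majority voting assigns the test sample to class B.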
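To see how the decision depends on K, the vote can be recomputed for several values of K on the same ten samples. This is a self-contained sketch: the data are copied from the data generation step, while `vote_for_k` is a hypothetical helper introduced only for this check.

```python
import numpy as np
from collections import Counter

features = np.array(
    [[2.88, 3.05], [3.1, 2.45], [3.05, 2.8], [2.9, 2.7], [2.75, 3.4],
     [3.23, 2.9], [3.2, 3.75], [3.5, 2.9], [3.65, 3.6], [3.35, 3.3]])
labels = ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
test_point = np.array([3.18, 3.15])

# Distance from the test point to every sample, then indices sorted nearest first
distances = np.linalg.norm(features - test_point, axis=1)
order = np.argsort(distances)


def vote_for_k(k):
    """Return (winning label, vote counts) for the k nearest samples."""
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0], dict(votes)


for k in (3, 5, 7, 9):
    label, votes = vote_for_k(k)
    print(k, label, votes)
```

On this data the winner is B for K = 3, 5 and 9, but A for K = 7 (4 votes to 3), which shows that the choice of K can change the predicted class.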

from ipywidgets import interact, fixed


def change_k(test_data, features, k):
    final_label, r = knn_classify(test_data, features, labels, k)
    k_circle_x, k_circle_y = circle(r, 3.18, 3.15)
    plt.figure(figsize=(5, 5))
    plt.xlim((2.4, 3.8))
    plt.ylim((2.4, 3.8))
    x_feature = list(map(lambda x: x[0], features))  # x feature of every sample
    y_feature = list(map(lambda y: y[1], features))  # y feature of every sample
    plt.scatter(x_feature[:5], y_feature[:5], c="b")  # Plot the samples labeled "A"
    plt.scatter(x_feature[5:], y_feature[5:], c="g")  # Plot the samples labeled "B"
    plt.scatter([3.18], [3.15], c="r", marker="x")  # Test point at [3.18, 3.15]
    plt.plot(k_circle_x, k_circle_y)


interact(change_k, test_data=fixed(test_data),
         features=fixed(features), k=[3, 5, 7, 9])
