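Python Machine Learning in Practice: Building a KNN Model to Predict Kidney Disease (Source Code and Results)

What this implements:

A KNN model built in Python to predict kidney disease, with the complete code and results.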
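
For readers new to the method, the sketch below shows what a k-nearest-neighbors classifier does at prediction time: it finds the k training points closest to the query and returns the majority label among them. This is a plain-Python illustration, not scikit-learn's implementation; the function name `knn_predict` and the toy points are made up for this example. Euclidean distance and an unweighted vote correspond to the defaults of `KNeighborsClassifier` (which uses k=5 unless told otherwise).

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two features already scaled to [0, 1]
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.8, 0.9), (0.9, 0.8), (0.85, 0.95)]
labels = ["notckd", "notckd", "notckd", "ckd", "ckd", "ckd"]

print(knn_predict(points, labels, (0.2, 0.2), k=3))
print(knn_predict(points, labels, (0.9, 0.9), k=3))
```

Because the vote depends entirely on distances, a feature with a large numeric range can dominate the result, which is why the numeric columns are scaled to a common range before fitting.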

Implementation code:

import pandas as pd
import warnings
warnings.filterwarnings("ignore")
pd.set_option('display.max_columns', 26)

#========================== Load the data ======================================
# Raw string so the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r"E:\資料雜壇\datasets\kidney_disease.csv")
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)
df.drop("id", axis=1, inplace=True)
print(df.head())
print(df.dtypes)
# Collapse the target to two classes: any value other than "notckd" is treated as "ckd"
df["classification"] = df["classification"].apply(lambda x: x if x == "notckd" else "ckd")
# Names of the categorical columns
cat_cols = [col for col in df.columns if df[col].dtype == "object"]
# Names of the numeric columns
num_cols = [col for col in df.columns if df[col].dtype != "object"]

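# ======================== Missing-value handling ============================
def random_value_imputate(col):
    """
    Random-sample imputation (for columns with many missing values).
    """

    # 1. Count the missing values and draw that many random samples
    #    from the non-missing values (sampling is without replacement)
    random_sample = df[col].dropna().sample(df[col].isna().sum())
    # 2. Give the samples the index of the rows that are missing
    random_sample.index = df[df[col].isnull()].index
    # 3. Fill the missing rows via loc
    df.loc[df[col].isnull(), col] = random_sample


def mode_impute(col):
    """
    Fill missing values with the mode.
    """
    # 1. Find the mode
    mode = df[col].mode()[0]
    # 2. Fill with fillna
    df[col] = df[col].fillna(mode)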

for col in num_cols:
    random_value_imputate(col)

for col in cat_cols:
    if col in ['rbc', 'pc']:
        # Random-sample imputation for the current column
        random_value_imputate(col)
    else:
        mode_impute(col)

# ====================== Feature scaling and encoding ============================
from sklearn.preprocessing import MinMaxScaler
# Scale every numeric column to the range [0, 1]
mms = MinMaxScaler()
df[num_cols] = mms.fit_transform(df[num_cols])

from sklearn.preprocessing import LabelEncoder
# Encode each categorical column (including the target) as integers
led = LabelEncoder()
for col in cat_cols:
    df[col] = led.fit_transform(df[col])

print(df.head())

#=========================== Train/test split ===============================
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

# Shuffle first, so that X and y are taken from the shuffled frame
df = shuffle(df)
X = df.drop("classification", axis=1)
y = df["classification"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

#=========================== Modeling =====================================
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

def create_model(model):
    # Train the model
    model.fit(X_train, y_train)
    # Predict on the test set
    y_pred = model.predict(X_test)
    # Accuracy
    acc = accuracy_score(y_test, y_pred)
    # Confusion matrix
    cm = confusion_matrix(y_test, y_pred)
    # Classification report
    cr = classification_report(y_test, y_pred)

    print(f"Test Accuracy of {model} : {acc}")
    print(f"Confusion Matrix of {model}: \n{cm}")
    print(f"Classification Report of {model} : \n {cr}")

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
create_model(knn)
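
The random-fill strategy used by `random_value_imputate` can be tried in isolation on a small Series. The following is a minimal sketch, assuming a hypothetical helper `random_impute` and made-up values; unlike the function in the script, it returns a new Series instead of modifying a global `df`, and it takes a seed so the result is repeatable.

```python
import numpy as np
import pandas as pd


def random_impute(series, seed=0):
    """Return a copy of `series` whose missing values are filled with values
    drawn at random (without replacement) from the observed values."""
    filled = series.copy()
    missing = filled.isna()
    # Draw as many observed values as there are gaps
    sample = filled.dropna().sample(int(missing.sum()), random_state=seed)
    # Re-label the sample with the index of the missing rows so assignment aligns
    sample.index = filled[missing].index
    filled.loc[missing] = sample
    return filled


# Toy column (hypothetical values)
s = pd.Series(["normal", np.nan, "abnormal", "normal", np.nan, "normal"])
result = random_impute(s)
print(result)
```

Sampling without replacement needs at least as many observed values as missing ones; for a column where that does not hold, passing `replace=True` to `sample` is one way around it.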

Results:
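
As an aid to reading what `create_model` prints, the sketch below derives accuracy, precision, and recall from a 2x2 confusion matrix. The counts are invented for illustration and are not results from the kidney disease dataset. The layout follows scikit-learn's convention, where rows are true classes and columns are predicted classes, and `LabelEncoder` assigns codes in sorted order, so "ckd" becomes 0 and "notckd" becomes 1.

```python
# Hypothetical counts, NOT results from the kidney disease data.
# Rows are true classes, columns are predicted classes.
cm = [
    [45, 5],   # true ckd (0):    45 predicted ckd, 5 predicted notckd
    [3, 27],   # true notckd (1): 3 predicted ckd, 27 predicted notckd
]

total = sum(sum(row) for row in cm)
# Accuracy: correct predictions (the diagonal) over all predictions
accuracy = (cm[0][0] + cm[1][1]) / total

# Precision for ckd: of everything predicted ckd, how much really was ckd
precision_ckd = cm[0][0] / (cm[0][0] + cm[1][0])
# Recall for ckd: of everything that really was ckd, how much was found
recall_ckd = cm[0][0] / (cm[0][0] + cm[0][1])

print(f"accuracy={accuracy:.3f} precision={precision_ckd:.3f} recall={recall_ckd:.3f}")
```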

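During my graduate studies I published five SCI papers related to data mining, and I now do data mining research at a research institute. Drawing on my own research experience, I will share fundamentals and case studies on Python machine learning, deep learning, and data mining from time to time.
I am committed to original content and to explaining things in the simplest way possible. Follow me so we can learn and grow together.
Follow the subscription account 資料雜壇 and message me there to get the datasets and source code, along with e-books on data analysis, data mining, machine learning, and deep learning.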
