Machine Learning with Decision Trees in Practice: Predicting Contact Lens Type

2023-08-07 01:14:34

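Steps:

Collect the data: use the small dataset provided in the book

Prepare the data: preprocess the text data, e.g. parse each data line

Analyze the data: quickly inspect the data, and draw the final tree diagram with the createPlot() function

Train the decision tree: train it with the createTree() function

Test the decision tree: write a simple test function to check the tree's output and the plotted result

Use the decision tree: optionally store the trained tree so it can be reused at any time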
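The book's createTree() is not reproduced in this post. As a minimal sketch of what the train, test and use steps involve, the code below assumes an ID3-style tree: split on the feature with the largest information gain (entropy reduction), store the tree as nested dicts, and save it with pickle. All names here (load_rows, entropy, best_feature, create_tree, classify, the file name lenses_tree.pkl) are hypothetical; this is not the book's code. Note that scikit-learn's DecisionTreeClassifier, used in the code below, defaults to Gini impurity rather than information gain.

```python
import math
import os
import pickle
import tempfile
from collections import Counter

# The rows of lenses.txt from section 1 (tab-separated; last column is the class).
LENSES_TSV = """young\tmyope\tno\treduced\tno lenses
young\tmyope\tno\tnormal\tsoft
young\tmyope\tyes\treduced\tno lenses
young\tmyope\tyes\tnormal\thard
young\thyper\tno\treduced\tno lenses
young\thyper\tno\tnormal\tsoft
young\thyper\tyes\treduced\tno lenses
young\thyper\tyes\tnormal\thard
pre\tmyope\tno\treduced\tno lenses
pre\tmyope\tno\tnormal\tsoft
pre\tmyope\tyes\treduced\tno lenses
pre\tmyope\tyes\tnormal\thard
pre\thyper\tno\treduced\tno lenses
pre\thyper\tno\tnormal\tsoft
pre\thyper\tyes\treduced\tno lenses
pre\thyper\tyes\tnormal\tno lenses
presbyopic\tmyope\tno\treduced\tno lenses
presbyopic\tmyope\tno\tnormal\tno lenses
presbyopic\tmyope\tyes\treduced\tno lenses
presbyopic\tmyope\tyes\tnormal\thard
presbyopic\thyper\tno\treduced\tno lenses
presbyopic\thyper\tno\tnormal\tsoft
presbyopic\thyper\tyes\treduced\tno lenses
presbyopic\thyper\tyes\tnormal\tno lenses
"""
FEATURES = ['age', 'prescript', 'astigmatic', 'tearRate']


def load_rows(text):
    """Prepare step: split each non-empty line on tabs."""
    return [line.split('\t') for line in text.strip().splitlines() if line.strip()]


def entropy(rows):
    """Shannon entropy (base 2) of the class label in the last column."""
    n = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def best_feature(rows):
    """Index of the feature whose split gives the largest information gain."""
    base, n = entropy(rows), len(rows)
    best_i, best_gain = 0, -1.0
    for i in range(len(rows[0]) - 1):
        remainder = 0.0
        for value in set(row[i] for row in rows):
            subset = [row for row in rows if row[i] == value]
            remainder += len(subset) / n * entropy(subset)
        if base - remainder > best_gain:
            best_i, best_gain = i, base - remainder
    return best_i


def create_tree(rows, names):
    """Train step: build {feature: {value: subtree or class label}} recursively."""
    labels = [row[-1] for row in rows]
    if labels.count(labels[0]) == len(labels):  # pure node -> leaf
        return labels[0]
    if len(rows[0]) == 1:  # no features left -> majority class
        return Counter(labels).most_common(1)[0][0]
    i = best_feature(rows)
    rest = names[:i] + names[i + 1:]
    node = {names[i]: {}}
    for value in set(row[i] for row in rows):
        subset = [row[:i] + row[i + 1:] for row in rows if row[i] == value]
        node[names[i]][value] = create_tree(subset, rest)
    return node


def classify(node, names, sample):
    """Test step: follow the branches matching the sample until a leaf."""
    while isinstance(node, dict):
        feature = next(iter(node))
        node = node[feature][sample[names.index(feature)]]
    return node


if __name__ == '__main__':
    tree = create_tree(load_rows(LENSES_TSV), FEATURES)
    # Use step: store the trained tree, then load it back later (hypothetical path).
    path = os.path.join(tempfile.gettempdir(), 'lenses_tree.pkl')
    with open(path, 'wb') as f:
        pickle.dump(tree, f)
    with open(path, 'rb') as f:
        stored = pickle.load(f)
    print(classify(stored, FEATURES, ['presbyopic', 'myope', 'yes', 'normal']))
```

On this dataset tearRate has the largest information gain, so it becomes the root, and every `reduced` sample is `no lenses`. Because each feature combination occurs exactly once, the tree reproduces every training label, so the final line prints `hard`.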

1. Dataset (lenses.txt; tab-separated columns: age, prescript, astigmatic, tearRate, class)

young	myope	no	reduced	no lenses
young	myope	no	normal	soft
young	myope	yes	reduced	no lenses
young	myope	yes	normal	hard
young	hyper	no	reduced	no lenses
young	hyper	no	normal	soft
young	hyper	yes	reduced	no lenses
young	hyper	yes	normal	hard
pre	myope	no	reduced	no lenses
pre	myope	no	normal	soft
pre	myope	yes	reduced	no lenses
pre	myope	yes	normal	hard
pre	hyper	no	reduced	no lenses
pre	hyper	no	normal	soft
pre	hyper	yes	reduced	no lenses
pre	hyper	yes	normal	no lenses
presbyopic	myope	no	reduced	no lenses
presbyopic	myope	no	normal	no lenses
presbyopic	myope	yes	reduced	no lenses
presbyopic	myope	yes	normal	hard
presbyopic	hyper	no	reduced	no lenses
presbyopic	hyper	no	normal	soft
presbyopic	hyper	yes	reduced	no lenses
presbyopic	hyper	yes	normal	no lenses

2. Code

# -*- coding: UTF-8 -*-
from io import StringIO		# sklearn.externals.six no longer exists in current scikit-learn releases
from sklearn.preprocessing import LabelEncoder
from sklearn import tree
import pandas as pd
import pydotplus

if __name__ == '__main__':
	# Load the file and split each non-empty line on tabs
	with open('lenses.txt', 'r') as fr:
		lenses = [inst.strip().split('\t') for inst in fr if inst.strip()]
	# The class of each sample is the last column
	lenses_target = [each[-1] for each in lenses]

	# Feature labels, in column order
	lensesLabels = ['age', 'prescript', 'astigmatic', 'tearRate']
	# Build a dict {feature name: column values}, used to create the pandas DataFrame
	lenses_dict = {}
	for i, each_label in enumerate(lensesLabels):
		lenses_dict[each_label] = [each[i] for each in lenses]
	lenses_pd = pd.DataFrame(lenses_dict)

	# Encode each string column as integers; LabelEncoder numbers the sorted unique values 0, 1, 2, ...
	# (the same encoder is refit on every column, so afterwards it only remembers tearRate)
	le = LabelEncoder()
	for col in lenses_pd.columns:
		lenses_pd[col] = le.fit_transform(lenses_pd[col])

	# Create the DecisionTreeClassifier and fit it to the data
	clf = tree.DecisionTreeClassifier(max_depth=4)
	clf = clf.fit(lenses_pd.values.tolist(), lenses_target)

	# Export the tree in Graphviz DOT format and save it as a PDF
	dot_data = StringIO()
	tree.export_graphviz(clf, out_file=dot_data,
						feature_names=list(lenses_pd.columns),
						class_names=list(clf.classes_),
						filled=True, rounded=True,
						special_characters=True)
	graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
	graph.write_pdf("tree.pdf")

	# Predict one encoded sample: [1, 1, 1, 0] = presbyopic, myope, yes, normal
	print(clf.predict([[1, 1, 1, 0]]))
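The prediction input `[1, 1, 1, 0]` is already label-encoded, which makes it hard to read. LabelEncoder numbers each column's unique values in sorted order (its `classes_` are sorted), so the mapping can be reproduced in plain Python as a sketch. `COLUMN_VALUES`, `build_codebook`, `encode` and `decode` are hypothetical helpers, not part of scikit-learn.

```python
# Unique values of each column, taken from the dataset in section 1 (column order matters).
COLUMN_VALUES = {
    'age': ['young', 'pre', 'presbyopic'],
    'prescript': ['myope', 'hyper'],
    'astigmatic': ['no', 'yes'],
    'tearRate': ['reduced', 'normal'],
}


def build_codebook(values):
    """Mimic LabelEncoder: codes follow the sorted order of the unique values."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


CODEBOOKS = {col: build_codebook(vals) for col, vals in COLUMN_VALUES.items()}


def encode(sample):
    """Turn {column: category} into the integer list the classifier expects."""
    return [CODEBOOKS[col][sample[col]] for col in COLUMN_VALUES]


def decode(codes):
    """Turn an encoded integer list back into {column: category}."""
    inverse = {col: {i: v for v, i in book.items()} for col, book in CODEBOOKS.items()}
    return {col: inverse[col][c] for col, c in zip(COLUMN_VALUES, codes)}


print(decode([1, 1, 1, 0]))
```

Sorting puts `pre` before `presbyopic` before `young`, so `[1, 1, 1, 0]` decodes to presbyopic / myope / yes / normal, a row the dataset labels `hard`. In real code, keeping one fitted LabelEncoder per column (for example in a dict) and calling its `transform()` on new samples avoids maintaining such a table by hand.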
