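Before learning something, be clear about what it is you are learning.
What is machine learning?
Machine learning means using data, through computation, to fit models (regression, for example) and make predictions.
It includes supervised learning, unsupervised learning, reinforcement learning, and deep learning.
Supervised learning: labeled data is used to fit a mathematical model; the parameters that give good accuracy on that data can then be used to predict the labels of unseen data.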
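As a minimal sketch of that definition, the snippet below fits a model on a handful of labeled examples and then predicts the label of an input it has not seen. The numbers are made up for illustration (the labels follow y = 2x + 1), and the variable names are assumptions, not part of the example that follows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical labeled data: each feature value x has a known label y = 2x + 1
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

# "Training" means estimating the model parameters from the labeled examples
model = LinearRegression()
model.fit(X_train, y_train)

# The fitted parameters are then used to predict the label of unseen data
X_unseen = np.array([[10.0]])
predicted_label = model.predict(X_unseen)
print(predicted_label)
```

Because the toy labels lie exactly on a line, the prediction for x = 10 should come out very close to 21.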
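Now let's work through a small program to see this in practice:
a regression model that looks at the relationship between life satisfaction and a country's wealth (GDP per capita).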
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import linear_model
# First, process the life satisfaction data
# Load the data
oecd_bli = pd.read_csv("oecd_bli_2015.csv",thousands = ',')
oecd_bli = oecd_bli[oecd_bli['Inequality']=='Total']
oecd_bli = oecd_bli.pivot(index = 'Country', columns = 'Indicator',values = 'Value')
# Next, process the GDP data
gdp_per_capita = pd.read_csv('gdp_per_capita.csv',thousands = ',',
                             delimiter = '\t', encoding ='latin1',na_values = 'n/a')
gdp_per_capita.rename(columns = {'2015':'GDP per captial'},inplace = True)
gdp_per_capita.set_index('Country', inplace = True)
gdp_per_capita.head(2)
# Merge the two tables on their shared Country index
full_country_stats = pd.merge(left = oecd_bli, right = gdp_per_capita,
                              left_index = True, right_index = True)
full_country_stats.sort_values(by = 'GDP per captial', inplace = True)
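# Split the data: hold out a few countries, keep the rest as the sample
remove_indices = [0,1,6,8,33,34,35]
# sorted() keeps the rows in GDP order; a bare set has no guaranteed order
keep_indices = sorted(set(range(36)) - set(remove_indices))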
sample_data = full_country_stats[["GDP per captial",'Life satisfaction']].iloc[keep_indices]
missing_data = full_country_stats[["GDP per captial","Life satisfaction"]].iloc[remove_indices]
# Plot the data
sample_data.plot(kind = 'scatter',x= 'GDP per captial',y = 'Life satisfaction', figsize = (5,3))
plt.axis([0,60000,0,10])
position_text = {
    "Hungary":(5000,1),
    "Korea":(18000,1.7),
    "France":(29000,2.4),
    "Australia":(40000,3.0),
    "United States":(52000,3.8)
}
for country, pos_text in position_text.items():
    pos_data_x, pos_data_y = sample_data.loc[country]
    # Shorten the label for the United States
    country = 'U.S.' if country == "United States" else country
    plt.annotate(country, xy = (pos_data_x, pos_data_y), xytext = pos_text,
                 arrowprops = dict(facecolor = 'black', width = 0.5, shrink = 0.1, headwidth = 5))
    plt.plot(pos_data_x,pos_data_y,'ro')

# Prepare the feature (GDP per capita) and target (life satisfaction) arrays
country_stats = sample_data
x = np.c_[country_stats['GDP per captial']]
y = np.c_[country_stats['Life satisfaction']]
# Visualize the data
country_stats.plot(kind='scatter', x="GDP per captial", y='Life satisfaction')
plt.show()
# Select a linear model and train it
lin_reg_model = linear_model.LinearRegression()
lin_reg_model.fit(x, y)
#Make a prediction for Cyprus
X_new = [[22587]]
print(lin_reg_model.predict(X_new))
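
The fitted model is just a straight line: life satisfaction = intercept + slope × GDP per capita. The sketch below shows one way to read those two parameters off a fitted LinearRegression and check that predict() matches evaluating the line by hand. It uses made-up numbers rather than the OECD files, so the values and variable names here are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical (GDP per capita, life satisfaction) pairs, not real OECD figures
gdp = np.array([[10000.0], [20000.0], [30000.0], [40000.0], [50000.0]])
satisfaction = np.array([[5.0], [5.5], [6.0], [6.5], [7.0]])

model = LinearRegression()
model.fit(gdp, satisfaction)

# With a 2-D target, intercept_ has shape (1,) and coef_ has shape (1, 1)
intercept = model.intercept_[0]   # value of the line at GDP = 0
slope = model.coef_[0][0]         # change in satisfaction per unit of GDP

# predict() is equivalent to evaluating the line by hand
gdp_new = 22587.0
by_hand = intercept + slope * gdp_new
by_model = model.predict([[gdp_new]])[0][0]
print(intercept, slope, by_hand, by_model)
```

The same two attributes, intercept_ and coef_, are available on lin_reg_model above once it has been fitted.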