
sklearn & TensorFlow Machine Learning 01: Overview and a Regression Model (Life Satisfaction vs. National GDP)

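Before learning something, be clear about what it is you are learning.

What is machine learning?

Machine learning means speaking the language of data: it uses computation to fit models that perform tasks such as regression and prediction.

It includes supervised learning, unsupervised learning, reinforcement learning, and deep learning.

Supervised learning: labeled data is fed into a mathematical model, and the model's parameters are tuned until its predictions are reasonably accurate. The trained model can then predict labels for data it has not seen.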
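To make the supervised-learning idea concrete, here is a minimal sketch. The data is synthetic (an assumption for illustration, not the data set used in this post): the labels follow roughly y = 3x + 2 plus noise. Part of the labeled data is held out to play the role of "unseen" data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic labeled data (hypothetical): y is roughly 3*x + 2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=200)

# Hold out a quarter of the labeled data to stand in for unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the parameters on the labeled training data.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict labels for the held-out data and score the fit (R^2).
y_pred = model.predict(X_test)
r2 = model.score(X_test, y_test)
print(f"R^2 on held-out data: {r2:.3f}")
```

The score on the held-out part is what tells you whether the learned parameters generalize, rather than just memorizing the training labels.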

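Let's start with a small piece of code to get a feel for it.

We use a regression model to look at the relationship between life satisfaction and a country's wealth (GDP per capita).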

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import linear_model

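# First, process the life-satisfaction data
# Load the data
oecd_bli = pd.read_csv("oecd_bli_2015.csv",thousands = ',')
# Keep only the rows aggregated over the whole population
oecd_bli = oecd_bli[oecd_bli['Inequality']=='Total']
# Reshape to one row per country and one column per indicator
oecd_bli = oecd_bli.pivot(index = 'Country', columns = 'Indicator',values = 'Value')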

# Next, process the GDP data
gdp_per_capita = pd.read_csv('gdp_per_capita.csv',thousands = ',', 
                             delimiter = '\t', encoding ='latin1',na_values = 'n/a')
gdp_per_capita.rename(columns = {'2015':'GDP per captial'},inplace = True)
gdp_per_capita.set_index('Country', inplace = True)
print(gdp_per_capita.head(2))  # quick look at the first two rows

# Merge the two tables on their shared Country index

full_country_stats = pd.merge(left = oecd_bli, right = gdp_per_capita, 
                              left_index = True, right_index = True)
full_country_stats.sort_values(by = 'GDP per captial', inplace = True)

# Split the data: hold out a few countries (assumes the merged table has 36 rows)
remove_indices = [0,1,6,8,33,34,35]
keep_indices = sorted(set(range(36)) - set(remove_indices))
sample_data = full_country_stats[["GDP per captial",'Life satisfaction']].iloc[keep_indices]
missing_data = full_country_stats[["GDP per captial","Life satisfaction"]].iloc[remove_indices]

# Plot
sample_data.plot(kind = 'scatter',x= 'GDP per captial',y = 'Life satisfaction', figsize = (5,3))
plt.axis([0,60000,0,10])
position_text = {
        "Hungary":(5000,1),
        "Korea":(18000,1.7),
        "France":(29000,2.4),
        "Australia":(40000,3.0),
        "United States":(52000,3.8)     
        }
for country, pos_text in position_text.items():
    pos_data_x, pos_data_y = sample_data.loc[country]
    # Use a shorter label for the United States
    label = 'U.S.' if country == "United States" else country
    plt.annotate(label, xy = (pos_data_x, pos_data_y), xytext = pos_text,
                 arrowprops = dict(facecolor = 'black', width = 0.5, shrink = 0.1, headwidth = 5))
    plt.plot(pos_data_x,pos_data_y,'ro')
plt.show()
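The loading steps above (filter, pivot, merge) need the two CSV files. To see what those steps do without the files, here is a sketch on a tiny made-up table. The country names, indicator values, and the `GDP per capita` column below are all hypothetical; only the pandas calls mirror the code in this post.

```python
import pandas as pd

# Hypothetical long-format rows: one row per (country, inequality group, indicator).
bli_long = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Inequality": ["Total", "Total", "Men", "Total", "Total", "Men"],
    "Indicator": ["Life satisfaction", "Employment rate", "Life satisfaction",
                  "Life satisfaction", "Employment rate", "Life satisfaction"],
    "Value": [7.0, 70.0, 6.9, 5.0, 60.0, 5.1],
})

# Keep only the whole-population rows, then reshape:
# one row per country, one column per indicator.
total_only = bli_long[bli_long["Inequality"] == "Total"]
bli_wide = total_only.pivot(index="Country", columns="Indicator", values="Value")

# Hypothetical GDP table indexed by country; "C" has no well-being data.
gdp = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "GDP per capita": [50000.0, 12000.0, 30000.0],
}).set_index("Country")

# Merge on the shared index (inner join by default), then sort by GDP.
merged = pd.merge(left=bli_wide, right=gdp, left_index=True, right_index=True)
merged = merged.sort_values(by="GDP per capita")
print(merged)
```

Because `pd.merge` defaults to an inner join, a country that appears in only one of the two tables (country "C" here) is dropped from the result.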
# Prepare the data: a feature column (x) and a target column (y)
country_stats = sample_data
x = np.c_[country_stats['GDP per captial']]
y = np.c_[country_stats['Life satisfaction']]

# Visualize the data
country_stats.plot(kind='scatter', x="GDP per captial", y='Life satisfaction')
plt.show()

# Select a linear model
lin_reg_model = linear_model.LinearRegression()
# Train the model
lin_reg_model.fit(x, y)

# Make a prediction for Cyprus
X_new = [[22587]]  # Cyprus' GDP per capita
print(lin_reg_model.predict(X_new))
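With a single feature, a fitted `LinearRegression` is just a straight line: life satisfaction = θ0 + θ1 × GDP per capita, where θ0 is stored in `intercept_` and θ1 in `coef_`. The sketch below uses made-up points that lie exactly on a line (an assumption, so the parameters are easy to check by hand) and shows that `predict` is the same as evaluating that line yourself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical (GDP per capita, life satisfaction) pairs lying exactly on
# the line: satisfaction = 4.0 + 5e-5 * gdp.
gdp = np.array([[10000.0], [20000.0], [30000.0], [40000.0]])
satisfaction = 4.0 + 5e-5 * gdp  # column vector, same shape as gdp

model = LinearRegression().fit(gdp, satisfaction)

# The learned parameters of the line.
theta0 = model.intercept_[0]   # intercept
theta1 = model.coef_[0][0]     # slope

# Predicting by hand gives the same value as model.predict.
x_new = 25000.0
manual = theta0 + theta1 * x_new
predicted = model.predict([[x_new]])[0][0]
print(theta0, theta1, manual, predicted)
```

Reading `intercept_` and `coef_` on `lin_reg_model` in the same way would give the line fitted to the real country data.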
      