An introduction to the svm parameters in sklearn

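svm is sklearn's support vector machine package and is widely used. If you do not know what each parameter means and always train with the defaults, you cannot get the most out of SVM, which would be a pity. To deepen my understanding and make the class easier to use, I have written notes on its parameters based on what I currently know and on the official documentation. They are for my own regular review and also serve as a brief introduction for readers. If anything here is wrong, please point it out.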

Taking SVC, the support vector classifier in svm, as the example, its parameters are:

class sklearn.svm.SVC(
            C=1.0, 
            kernel='rbf', 
            degree=3, 
            gamma='auto', 
            coef0=0.0, 
            shrinking=True, 
            probability=False, 
            tol=0.001, 
            cache_size=200, 
            class_weight=None, 
            verbose=False, 
            max_iter=-1, 
            decision_function_shape='ovr', 
            random_state=None)

Each parameter is described below:

C : float, optional (default=1.0)

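    Penalty parameter of the error term. It is usually searched over powers of 10, e.g. 10^-5, 10^-4, ..., 10^0, 10, 100, 1000; in Python these can be written as pow(10, n) for n = -5, -4, and upward.
    A larger C penalizes the slack variables more heavily and pushes them toward 0, i.e. misclassifications are penalized more and the model tries to classify every training sample correctly. This gives high accuracy on the training set but weaker generalization.
    A smaller C reduces the penalty on misclassification and tolerates more errors, which tends to generalize better (although too small a C can underfit).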
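The search over powers of ten described above can be sketched with GridSearchCV. This is a minimal sketch, assuming synthetic data from make_classification and an illustrative grid from 10^-5 to 10^3; on real data the best C will differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data, for illustration only
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Candidate values of C: 10**-5, 10**-4, ..., 10**3 (same as pow(10, n))
param_grid = {"C": np.logspace(-5, 3, 9)}

# 5-fold cross-validation picks the C with the best mean accuracy
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

best_C = search.best_params_["C"]
print("best C:", best_C, "CV accuracy:", round(search.best_score_, 3))
```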

kernel : string, optional (default='rbf')

    The kernel type used by SVC.
    It can be 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed', or a callable you supply yourself. The default is 'rbf'.
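The two less obvious options, a callable kernel and 'precomputed', can be compared in a small sketch. The helper name linear_kernel and the synthetic data are illustrative assumptions. A callable receives two sample matrices and must return their Gram matrix; with 'precomputed' you pass the kernel matrices yourself, of shape (n_train, n_train) for fit and (n_test, n_train) for predict.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=5, random_state=0)
X_train, X_test, y_train = X[:100], X[100:], y[:100]

def linear_kernel(A, B):
    """Hypothetical custom kernel: Gram matrix of shape (len(A), len(B))."""
    return A @ B.T

# Option 1: pass a callable as the kernel
clf_callable = SVC(kernel=linear_kernel).fit(X_train, y_train)
pred_callable = clf_callable.predict(X_test)

# Option 2: 'precomputed', fit on the (n_train, n_train) kernel matrix...
clf_pre = SVC(kernel="precomputed").fit(X_train @ X_train.T, y_train)
# ...and predict from the (n_test, n_train) kernel matrix
pred_pre = clf_pre.predict(X_test @ X_train.T)

print((pred_callable == pred_pre).all())
```

Both routes feed the same Gram matrices to the solver, so their predictions should match.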

degree : int, optional (default=3)

    Degree of the polynomial kernel when kernel='poly'; the default is 3 (a cubic polynomial).
    Ignored by all other kernels, i.e. this parameter only affects 'poly'.
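For reference, the 'poly' kernel is K(x, x') = (gamma * <x, x'> + coef0) ** degree, so degree is the exponent. The sketch below checks that formula against sklearn.metrics.pairwise.polynomial_kernel; the random data and the chosen gamma/coef0 values are arbitrary assumptions. Note that polynomial_kernel defaults to coef0=1 whereas SVC defaults to coef0=0, so both are passed explicitly here.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.RandomState(0)
X = rng.rand(4, 3)

degree, gamma, coef0 = 3, 0.5, 1.0

# Polynomial kernel written out by hand
K_manual = (gamma * (X @ X.T) + coef0) ** degree
# The same kernel from scikit-learn's pairwise helpers
K_sklearn = polynomial_kernel(X, X, degree=degree, gamma=gamma, coef0=coef0)

print(np.allclose(K_manual, K_sklearn))  # True
```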

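gamma : {'scale', 'auto'} or float, optional (default='auto'; changed to 'scale' in version 0.22)

    Kernel coefficient for 'rbf', 'poly' and 'sigmoid'.
    With 'auto', the coefficient is 1/n_features; with 'scale', it is 1/(n_features * X.var()), where X.var() is the variance of the whole training matrix.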
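The two string settings can be made concrete by computing the equivalent floats by hand and checking that they produce the same model. This is a sketch on synthetic data; it assumes scikit-learn 0.22 or later, where 'scale' is the default, and uses the formulas 1/n_features and 1/(n_features * X.var()).

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
n_features = X.shape[1]

gamma_auto = 1.0 / n_features               # value used by gamma='auto'
gamma_scale = 1.0 / (n_features * X.var())  # value used by gamma='scale'

same = {}
for name, value in [("auto", gamma_auto), ("scale", gamma_scale)]:
    by_name = SVC(gamma=name).fit(X, y)     # gamma given as a string
    by_value = SVC(gamma=value).fit(X, y)   # gamma given as the equivalent float
    same[name] = (by_name.decision_function(X) == by_value.decision_function(X)).all()
    print(name, round(value, 4), same[name])
```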

coef0 : float, optional (default=0.0)

    Independent (constant) term of the kernel function.
    Only significant when kernel is 'poly' or 'sigmoid'; the default is 0.
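A quick way to see that coef0 only matters for 'poly' and 'sigmoid' is to change it under two kernels and compare the decision functions. This is a sketch on synthetic data; the value 5.0 is an arbitrary assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# 'rbf' does not use coef0, so the two models should be identical
rbf_a = SVC(kernel="rbf", coef0=0.0).fit(X, y).decision_function(X)
rbf_b = SVC(kernel="rbf", coef0=5.0).fit(X, y).decision_function(X)
print("rbf unchanged:", np.array_equal(rbf_a, rbf_b))

# 'poly' uses (gamma * <x, x'> + coef0) ** degree, so the model changes
poly_a = SVC(kernel="poly", coef0=0.0).fit(X, y).decision_function(X)
poly_b = SVC(kernel="poly", coef0=5.0).fit(X, y).decision_function(X)
print("poly changed:", not np.allclose(poly_a, poly_b))
```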

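probability : boolean, optional (default=False)

    Whether to enable probability estimates.
    It must be set before calling fit(), and it slows fit() down because scikit-learn then runs an internal 5-fold cross-validation to calibrate the probabilities. The default is False.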
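A minimal sketch of the difference, assuming the iris dataset: with probability=True the model exposes predict_proba, and without it the method is unavailable and raises AttributeError.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# probability must be enabled before fit()
clf = SVC(probability=True, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:5])
print(proba.shape)        # (5, 3): one column per class
print(proba.sum(axis=1))  # each row sums to 1

# Without probability=True, predict_proba cannot be used
plain = SVC().fit(X, y)
raised = False
try:
    plain.predict_proba(X[:5])
except AttributeError as exc:
    raised = True
    print("predict_proba unavailable:", exc)
```

Because the probabilities come from a separate calibration step, argmax of predict_proba can occasionally disagree with predict.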

shrinking : boolean, optional (default=True)

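    If we knew in advance which variables correspond to support vectors, training on those samples alone would be enough and the others could be ignored. This does not change the result, but it shrinks the problem and speeds up the solve. Going further, if we knew which variables are at the upper bound (i.e. α_i = C), those could be held fixed while only the remaining variables are optimized, making the problem smaller still and greatly reducing training time. This is the shrinking technique. In practice the solver guesses such variables heuristically during optimization and temporarily removes them.

    Shrinking relies on the observation that support vectors make up only a small fraction of the training samples, and that in many problems most support vectors have Lagrange multipliers equal to C.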
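Since shrinking is a speed heuristic, switching it off should not change the fitted classifier in any meaningful way. The sketch below compares the two settings on two well-separated synthetic clusters (an assumption chosen so that both runs are easy to compare); the printed timings depend on your machine and are not a benchmark.

```python
import time
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters (synthetic, for illustration)
X, y = make_blobs(n_samples=1000, centers=[[-5, -5], [5, 5]],
                  cluster_std=1.0, random_state=0)

predictions = {}
for shrinking in (True, False):
    start = time.perf_counter()
    clf = SVC(kernel="rbf", shrinking=shrinking).fit(X, y)
    elapsed = time.perf_counter() - start
    predictions[shrinking] = clf.predict(X)
    print(f"shrinking={shrinking}: {elapsed:.3f}s, "
          f"{clf.n_support_.sum()} support vectors")

# Fraction of samples on which the two models agree
agreement = np.mean(predictions[True] == predictions[False])
print("prediction agreement:", agreement)
```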

tol : float, optional (default=1e-3)

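    Tolerance for the stopping criterion: the solver stops once its optimality conditions are satisfied to within this tolerance. The default is 1e-3, i.e. 0.001.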

cache_size : float, optional (default=200)

    Size of the kernel cache, in MB. The default is 200 MB.

class_weight : {dict, 'balanced'}, optional (default=None)

    Per-class weights. If not given, all classes have weight one.
    When passed as a dict {class_label: weight}, the parameter C of class i is set to class_weight[i]*C, so errors on a heavily weighted class cost more.
    The 'balanced' mode uses the values of y to set weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)).
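The 'balanced' formula can be checked by hand on an imbalanced toy label vector (90 samples of class 0, 10 of class 1; an illustrative assumption). The sketch compares the manual result with sklearn.utils.class_weight.compute_class_weight and with the class_weight_ attribute of a fitted SVC, and also shows the dict form.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 90 + [1] * 10)  # imbalanced labels
classes = np.unique(y)

# n_samples / (n_classes * np.bincount(y)): approx. 0.556 for class 0, 5.0 for class 1
manual = len(y) / (len(classes) * np.bincount(y))
helper = compute_class_weight(class_weight="balanced", classes=classes, y=y)
print(manual, helper)

# Toy features: shift the minority class so the problem is learnable
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
X[y == 1] += 2

clf_bal = SVC(class_weight="balanced").fit(X, y)          # weights computed from y
clf_dict = SVC(class_weight={0: 1.0, 1: 9.0}).fit(X, y)   # weights given explicitly
print(clf_bal.class_weight_, clf_dict.class_weight_)
```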

verbose : bool, default: False

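    Whether to enable verbose output. The default is False.
    This relies on a per-process setting in libsvm, so it may not work as expected in a multithreaded context.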

max_iter : int, optional (default=-1)

    Hard limit on the number of solver iterations.
    The default, -1, means no limit.
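Setting a very small max_iter shows what happens when the limit is hit: the solver stops early, the fitted model records fit_status_ = 1, and scikit-learn emits a warning. This is a sketch on synthetic data; max_iter=1 is chosen only to force early termination.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf = SVC(max_iter=1).fit(X, y)  # stop after a single solver iteration

# fit_status_ is 0 for a normal fit and 1 when the solver stopped early
print("fit_status_:", clf.fit_status_)
for w in caught:
    print(w.category.__name__, w.message)
```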

decision_function_shape : 'ovo', 'ovr', default='ovr'

    Shape of the output of decision_function(). With 'ovr' (one-vs-rest), it has shape (n_samples, n_classes), like all other classifiers; with 'ovo' (one-vs-one), it is libsvm's original decision function, of shape (n_samples, n_classes * (n_classes - 1) / 2).
    For multiclass problems SVC always trains one-vs-one classifiers internally; this parameter only changes how decision_function() reports the results.

    Changed in version 0.19: decision_function_shape is 'ovr' by default.

    New in version 0.17: decision_function_shape='ovr' is recommended.

    Changed in version 0.17: Deprecated decision_function_shape='ovo' and None.
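The shape difference is easiest to see with more than three classes, because with exactly three classes n_classes and n_classes * (n_classes - 1) / 2 are both 3. A sketch with four synthetic blobs:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=4, random_state=0)  # 4 classes

ovr = SVC(decision_function_shape="ovr").fit(X, y)
ovo = SVC(decision_function_shape="ovo").fit(X, y)

dec_ovr = ovr.decision_function(X)
dec_ovo = ovo.decision_function(X)
print(dec_ovr.shape)  # (200, 4): one column per class
print(dec_ovo.shape)  # (200, 6): one column per pair of classes, 4 * 3 / 2

# Training is one-vs-one in both cases, so predictions agree
print((ovr.predict(X) == ovo.predict(X)).all())
```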

random_state : int, RandomState instance or None, optional (default=None)

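    Controls the pseudo-random number generation used to shuffle the data for probability estimates; ignored when probability=False. Pass an int for reproducible results.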
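Because random_state only affects the internal cross-validation used for probability estimates, its visible effect is on predict_proba. A sketch, assuming the iris dataset and the arbitrary seed 42: two fits with the same int seed should give identical probabilities.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Same seed -> same shuffling inside the probability calibration
p1 = SVC(probability=True, random_state=42).fit(X, y).predict_proba(X)
p2 = SVC(probability=True, random_state=42).fit(X, y).predict_proba(X)
print(np.array_equal(p1, p2))
```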

Some of the attributes of a fitted model:

Attributes: 

support_ : array-like, shape = [n_SV]

    Indices of support vectors.

support_vectors_ : array-like, shape = [n_SV, n_features]

    Support vectors.

n_support_ : array-like, dtype=int32, shape = [n_class]

    Number of support vectors for each class.

dual_coef_ : array, shape = [n_class-1, n_SV]

    Coefficients of the support vector in the decision function. For multiclass, coefficient for all 1-vs-1 classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. See the section about multi-class classification in the SVM section of the User Guide for details.

coef_ : array, shape = [n_class * (n_class-1) / 2, n_features]

    Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

    coef_ is a readonly property derived from dual_coef_ and support_vectors_.

intercept_ : array, shape = [n_class * (n_class-1) / 2]

    Constants in decision function.
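The relationships between these attributes can be checked on a small binary problem with a linear kernel. This is a sketch on synthetic blobs; for the binary case coef_ is dual_coef_ @ support_vectors_, and the decision function is X @ coef_.T + intercept_.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)  # binary problem
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# support_ indexes rows of X; support_vectors_ holds those rows
rows_match = np.allclose(X[clf.support_], clf.support_vectors_)
# n_support_ counts support vectors per class
counts_match = clf.n_support_.sum() == len(clf.support_)

# Linear kernel only: coef_ is derived from dual_coef_ and support_vectors_
coef_match = np.allclose(clf.coef_, clf.dual_coef_ @ clf.support_vectors_)

# Decision function of a linear SVM: X @ coef_.T + intercept_
manual = (X @ clf.coef_.T + clf.intercept_).ravel()
decision_match = np.allclose(manual, clf.decision_function(X))

print(rows_match, counts_match, coef_match, decision_match)
```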
