
Python sklearn: A detailed guide to the RobustScaler function in sklearn and how to use it

Introduction to sklearn's RobustScaler function and how to use it

       RobustScaler scales features using statistics that are robust to outliers. This scaler removes the median and scales the data according to a quantile range (by default the IQR, i.e. the interquartile range). The IQR is the range between the 1st quartile (25th quantile) and the 3rd quartile (75th quantile). Centering and scaling happen independently on each feature, by computing the relevant statistics on the samples in the training set. The median and quantile range are then stored so they can be applied to later data via the ``transform`` method.

       Standardization of a dataset is a common requirement for many machine learning estimators. Typically this is done by removing the mean and scaling to unit variance. However, outliers can often influence the sample mean / variance in a negative way. In such cases, the median and the interquartile range often give better results.
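       To see what this robustness means in practice, here is a minimal sketch (assuming scikit-learn and NumPy are installed; the data values are made up for illustration) that scales a single feature containing one extreme outlier with both StandardScaler and RobustScaler. The outlier inflates the mean and standard deviation used by StandardScaler, while the median and IQR used by RobustScaler are barely affected.

# Minimal sketch: effect of one outlier on StandardScaler vs RobustScaler.
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])  # 1000.0 is an extreme outlier

std_scaled = StandardScaler().fit_transform(X)  # uses mean / standard deviation
rob_scaled = RobustScaler().fit_transform(X)    # uses median / IQR

# StandardScaler squeezes the four inliers together because the outlier dominates
# the mean and variance; RobustScaler centers on the median (3.0) and divides by
# the IQR (2.0), so the inliers keep a usable spread.
print("StandardScaler:", std_scaled.ravel())
print("RobustScaler:  ", rob_scaled.ravel())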

class RobustScaler Found at: sklearn.preprocessing._data

class RobustScaler(TransformerMixin, BaseEstimator):

   """Scale features using statistics that are robust to outliers.

   This Scaler removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range).  The IQR is the range between the 1st quartile (25th quantile) and the 3rd quartile (75th quantile).

   Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Median and interquartile range are then stored to be used on later data using the ``transform`` method.

Standardization of a dataset is a common requirement for many machine learning estimators. Typically this is done by removing the mean and scaling to unit variance. However, outliers can often influence the sample mean / variance in a negative way. In such cases, the median and the interquartile range often give better results.

   .. versionadded:: 0.17

   Read more in the :ref:`User Guide <preprocessing_scaler>`.


   Parameters

   ----------

   with_centering : boolean, True by default. If True, center the data before scaling. This will cause ``transform`` to raise an exception when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.

   with_scaling : boolean, True by default. If True, scale the data to the interquartile range.

   quantile_range : tuple (q_min, q_max), 0.0 < q_min < q_max < 100.0. Default: (25.0, 75.0) = (1st quantile, 3rd quantile) = IQR. Quantile range used to calculate ``scale_``.

   .. versionadded:: 0.18

   copy : boolean, optional, default is True. If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.
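   As a quick illustration of the quantile_range parameter (the toy data below is an assumption, not part of the original text): widening the range from the default IQR to the 10th-90th percentile span changes the scale_ that fit learns.

import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.arange(20, dtype=float).reshape(-1, 1)   # toy single-feature data

default_scaler = RobustScaler().fit(X)                           # quantile_range=(25.0, 75.0)
wide_scaler = RobustScaler(quantile_range=(10.0, 90.0)).fit(X)   # wider quantile range

print(default_scaler.scale_)  # 75th - 25th percentile of the column
print(wide_scaler.scale_)     # 90th - 10th percentile (larger, so milder scaling)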

   Attributes

   ----------

   center_ : array of floats. The median value for each feature in the training set.

   scale_ : array of floats. The (scaled) interquartile range for each feature in the training set.


   .. versionadded:: 0.17
      *scale_* attribute.

   Examples

   --------

   >>> from sklearn.preprocessing import RobustScaler

   >>> X = [[ 1., -2.,  2.],

   ...      [ -2.,  1.,  3.],

   ...      [ 4.,  1., -2.]]

   >>> transformer = RobustScaler().fit(X)

   >>> transformer

   RobustScaler()

   >>> transformer.transform(X)

   array([[ 0. , -2. ,  0. ],
          [-1. ,  0. ,  0.4],
          [ 1. ,  0. , -1.6]])
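   To make these numbers concrete: the first column of X is [1., -2., 4.]; its median is 1.0 and, with the default (25.0, 75.0) quantile range and linear interpolation, its quartiles are -0.5 and 2.5, so the stored scale_ is 3.0. Then (1-1)/3, (-2-1)/3 and (4-1)/3 give the 0., -1. and 1. seen in the first output column. The same check can be reproduced directly with NumPy (a small verification sketch, not part of the original docstring):

import numpy as np

col = np.array([1., -2., 4.])                 # first feature of the example above
q25, q75 = np.percentile(col, [25.0, 75.0])   # -0.5 and 2.5 with linear interpolation
print((col - np.median(col)) / (q75 - q25))   # [ 0. -1.  1.], the first column of transform(X)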

   See also

   robust_scale: Equivalent function without the estimator API.

   :class:`sklearn.decomposition.PCA`

       Further removes the linear correlation across features with 'whiten=True'.

   Notes

   -----

   For a comparison of the different scalers, transformers, and normalizers, see :ref:`examples/preprocessing/plot_all_scaling.py <sphx_glr_auto_examples_preprocessing_plot_all_scaling.py>`.

https://en.wikipedia.org/wiki/Median https://en.wikipedia.org/wiki/Interquartile_range

   """

   @_deprecate_positional_args

   def __init__(self, *, with_centering=True, with_scaling=True,

       quantile_range=(25.0, 75.0), copy=True):

       self.with_centering = with_centering

       self.with_scaling = with_scaling

       self.quantile_range = quantile_range

       self.copy = copy

    def fit(self, X, y=None):
        """Compute the median and quantiles to be used for scaling.

        Parameters
        ----------
        X : array-like, shape [n_samples, n_features]. The data used to compute the median and quantiles used for later scaling along the features axis.
        """
        # At fit time, convert sparse matrices to CSC for optimized computation of the quantiles
        X = self._validate_data(X, accept_sparse='csc', estimator=self,
                                dtype=FLOAT_DTYPES,
                                force_all_finite='allow-nan')

       q_min, q_max = self.quantile_range

       if not 0 <= q_min <= q_max <= 100:

           raise ValueError(

               "Invalid quantile range: %s" % str(self.quantile_range))

       if self.with_centering:

           if sparse.issparse(X):

                raise ValueError(
                    "Cannot center sparse matrices: use `with_centering=False`"
                    " instead. See docstring for motivation and alternatives.")

           self.center_ = np.nanmedian(X, axis=0)

       else:

           self.center_ = None

       if self.with_scaling:

           quantiles = []

           for feature_idx in range(X.shape[1]):

               if sparse.issparse(X):

                    column_nnz_data = X.data[X.indptr[feature_idx]:
                                             X.indptr[feature_idx + 1]]
                    column_data = np.zeros(shape=X.shape[0], dtype=X.dtype)
                    column_data[:len(column_nnz_data)] = column_nnz_data
                else:
                    column_data = X[:, feature_idx]

               quantiles.append(np.nanpercentile(column_data,

                       self.quantile_range))

           quantiles = np.transpose(quantiles)

           self.scale_ = quantiles[1] - quantiles[0]

           self.scale_ = _handle_zeros_in_scale(self.scale_, copy=False)

        else:
            self.scale_ = None

       return self
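In short, fit only records per-feature robust statistics: center_ is the NaN-aware median of each column, and scale_ is the difference between the upper and lower requested percentiles (with zeros replaced so the later division is safe). A short sketch with made-up data, showing that the fitted attributes agree with a direct NumPy computation:

import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))           # toy dense data, no NaNs or constant columns

scaler = RobustScaler().fit(X)

# center_ is the column-wise median, scale_ the column-wise IQR
assert np.allclose(scaler.center_, np.nanmedian(X, axis=0))
q = np.nanpercentile(X, [25.0, 75.0], axis=0)
assert np.allclose(scaler.scale_, q[1] - q[0])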

    def transform(self, X):
        """Center and scale the data.

        Parameters
        ----------
        X : {array-like, sparse matrix}
            The data used to scale along the specified axis.
        """
        check_is_fitted(self)
        X = check_array(X, accept_sparse=('csr', 'csc'), copy=self.copy,
                        estimator=self, dtype=FLOAT_DTYPES,
                        force_all_finite='allow-nan')

        if sparse.issparse(X):
            if self.with_scaling:
                inplace_column_scale(X, 1.0 / self.scale_)
        else:
            if self.with_centering:
                X -= self.center_
            if self.with_scaling:
                X /= self.scale_
        return X
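Because centering a sparse matrix would force it to become dense, transform (like fit) refuses to center sparse input; scaling alone is still supported. A brief sketch of the with_centering=False path on sparse data (the CSR matrix below is an assumed toy example):

import numpy as np
from scipy import sparse
from sklearn.preprocessing import RobustScaler

X_sparse = sparse.csr_matrix(np.array([[0., 1.], [2., 0.], [4., 3.]]))

# with_centering=False: only divide by the per-feature quantile range, keep sparsity
scaler = RobustScaler(with_centering=False).fit(X_sparse)
print(scaler.transform(X_sparse).toarray())

# The default with_centering=True would raise a ValueError on this sparse input.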

    def inverse_transform(self, X):
        """Scale back the data to the original representation.
        X : array-like. The data used to scale along the specified axis.
        """
        check_is_fitted(self)
        X = check_array(X, accept_sparse=('csr', 'csc'), copy=self.copy,
                        estimator=self, dtype=FLOAT_DTYPES, force_all_finite='allow-nan')
        if sparse.issparse(X):
            if self.with_scaling:
                inplace_column_scale(X, self.scale_)
        else:
            if self.with_scaling:
                X *= self.scale_
            if self.with_centering:
                X += self.center_
        return X

   def _more_tags(self):

        return {'allow_nan': True}
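Finally, inverse_transform undoes the scaling and centering, so a transform / inverse_transform round trip recovers the original data up to floating-point error. A minimal sketch reusing the data from the docstring example:

import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[1., -2., 2.], [-2., 1., 3.], [4., 1., -2.]])

scaler = RobustScaler().fit(X)
X_back = scaler.inverse_transform(scaler.transform(X))

assert np.allclose(X, X_back)  # the round trip reproduces the original values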