sklearn.feature_extraction.DictVectorizer: converting dictionary-format data into features

2018-01-29 23:50:00

class sklearn.feature_extraction.DictVectorizer(dtype=<class 'numpy.float64'>, separator='=', sparse=True, sort=True)

Transforms lists of feature-value mappings to vectors.

This transformer turns lists of mappings (dict-like objects) of feature names to feature values into NumPy arrays or scipy.sparse matrices for use with scikit-learn estimators.

When feature values are strings, this transformer will do a binary one-hot (aka one-of-K) coding: one boolean-valued feature is constructed for each of the possible string values that the feature can take on. For instance, a feature “f” that can take on the values “ham” and “spam” will become two features in the output, one signifying “f=ham”, the other “f=spam”.
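The one-hot behaviour described above can be sketched as follows. The sample data is made up for illustration, and sparse=False is used only so that the result prints as a plain array.

```python
from sklearn.feature_extraction import DictVectorizer

# Illustrative data: one string-valued feature "f" with two possible values.
samples = [{"f": "ham"}, {"f": "spam"}, {"f": "ham"}]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(samples)

# One output column per (feature, value) pair, joined by the separator "=".
print(vec.feature_names_)  # ['f=ham', 'f=spam']
print(X)
# [[1. 0.]
#  [0. 1.]
#  [1. 0.]]
```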

However, note that this transformer will only do a binary one-hot encoding when feature values are of type string. If categorical features are represented as numeric values such as int, the DictVectorizer can be followed by OneHotEncoder to complete binary one-hot encoding.
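A minimal sketch of that two-step approach, assuming a hypothetical "color" feature whose categories are stored as integer codes. DictVectorizer keeps the codes as one numeric column, and OneHotEncoder (from sklearn.preprocessing) then expands that column.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: "color" is a category stored as an int code.
samples = [{"color": 1}, {"color": 2}, {"color": 3}]

vec = DictVectorizer(sparse=False)
X_numeric = vec.fit_transform(samples)
print(vec.feature_names_)  # ['color'] (a single numeric column, not one-hot)
print(X_numeric.ravel())   # [1. 2. 3.]

# OneHotEncoder treats every input column as categorical.
encoder = OneHotEncoder()
X_onehot = encoder.fit_transform(X_numeric).toarray()
print(X_onehot)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```

Because OneHotEncoder encodes every column it receives, this sketch only fits the case where all columns are categorical. If genuinely numeric columns are mixed in, the encoder would need to be restricted to the categorical ones.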

Features that do not occur in a sample (mapping) will have a zero value in the resulting array/matrix.
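For instance, with samples that do not all share the same keys (the feature names below are invented for illustration), each missing entry shows up as 0:

```python
from sklearn.feature_extraction import DictVectorizer

# Illustrative samples with different sets of keys.
samples = [
    {"width": 3.0, "height": 1.5},
    {"width": 2.0},
    {"depth": 4.0},
]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(samples)

print(vec.feature_names_)  # ['depth', 'height', 'width']
print(X)
# [[0.  1.5 3. ]
#  [0.  0.  2. ]
#  [4.  0.  0. ]]
```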

Parameters:

dtype : callable, optional

The type of feature values. Passed to NumPy array/scipy.sparse matrix constructors as the dtype argument.

separator : string, optional

Separator string used when constructing new features for one-hot coding.

sparse : boolean, optional.

Whether transform should produce scipy.sparse matrices. True by default.

sort : boolean, optional.

Whether feature_names_ and vocabulary_ should be sorted when fitting. True by default.
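The following sketch shows how these parameters change the output. The sample data and the ":" separator are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import sparse as sp
from sklearn.feature_extraction import DictVectorizer

samples = [{"b": "y", "a": 1}, {"b": "x"}]

# Defaults: float64 values, "=" separator, sparse output, sorted feature names.
default_vec = DictVectorizer()
X_sparse = default_vec.fit_transform(samples)
print(sp.issparse(X_sparse))       # True
print(default_vec.feature_names_)  # ['a', 'b=x', 'b=y']

# Custom dtype and separator, dense NumPy output.
custom_vec = DictVectorizer(dtype=np.int32, separator=":", sparse=False)
X_dense = custom_vec.fit_transform(samples)
print(custom_vec.feature_names_)   # ['a', 'b:x', 'b:y']
print(X_dense)
# [[1 0 1]
#  [0 1 0]]

# sort=False keeps the feature names in the order they were first seen.
unsorted_vec = DictVectorizer(sort=False).fit(samples)
print(unsorted_vec.feature_names_)
```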

Attributes:

vocabulary_ : dict

A dictionary mapping feature names to feature indices.

feature_names_ : list

A list of length n_features containing the feature names (e.g., “f=ham” and “f=spam”).
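A short sketch of how the two attributes relate after fitting, using made-up data: vocabulary_ maps each name to a column index, and feature_names_ lists the names in column order.

```python
from sklearn.feature_extraction import DictVectorizer

vec = DictVectorizer()
vec.fit([{"f": "ham", "n": 1}, {"f": "spam"}])

print(vec.feature_names_)  # ['f=ham', 'f=spam', 'n']
print(vec.vocabulary_)     # {'f=ham': 0, 'f=spam': 1, 'n': 2}

# The two attributes are inverse views of the same mapping.
for name, index in vec.vocabulary_.items():
    assert vec.feature_names_[index] == name
```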
