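To define a custom transformer, all you need is a class that implements three methods: fit(), transform(), and fit_transform(). Adding TransformerMixin as a base class gives you the last one for free. If you also add BaseEstimator as a base class (and avoid *args and **kwargs in the constructor), you get two extra methods, get_params() and set_params(), which are very useful for automatic hyperparameter tuning.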
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# Column indices of the attributes combined below
room_ix, bedrooms_ix, population_ix, household_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):
        # Named keyword argument only (no *args / **kwargs), so that
        # get_params() and set_params() can see this hyperparameter
        self.add_bedrooms_per_room = add_bedrooms_per_room

    def fit(self, X, y=None):
        return self  # nothing to learn from the data

    def transform(self, X, y=None):
        rooms_per_household = X[:, room_ix] / X[:, household_ix]
        population_per_household = X[:, population_ix] / X[:, household_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, room_ix]
            # Append the new attributes as extra columns on the right
            return np.c_[X, rooms_per_household, population_per_household,
                         bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]

attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=False)
housing_extra_attribs = attr_adder.transform(housing.values)
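To make it concrete what the two base classes contribute, here is a minimal sketch. The RatioAdder class, its num_ix and den_ix parameters, and the small array X are all hypothetical and only serve as illustration. The class itself defines only fit() and transform(); the other three methods are inherited.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class RatioAdder(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: appends column num_ix / column den_ix."""

    def __init__(self, num_ix=0, den_ix=1):
        # Explicit keyword arguments only, so get_params() can discover them.
        self.num_ix = num_ix
        self.den_ix = den_ix

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        ratio = X[:, self.num_ix] / X[:, self.den_ix]
        return np.c_[X, ratio]


# Made-up data, two rows and three columns
X = np.array([[6.0, 2.0, 3.0],
              [8.0, 4.0, 2.0]])

adder = RatioAdder()
params = adder.get_params()            # inherited from BaseEstimator
X_default = adder.fit_transform(X)     # inherited from TransformerMixin

adder.set_params(num_ix=1, den_ix=2)   # inherited from BaseEstimator
X_changed = adder.transform(X)
```

Because the hyperparameters are exposed through get_params() and set_params(), search tools such as GridSearchCV can vary them without knowing anything about the class.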
Additional notes
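- np.r_ concatenates two matrices along the row axis, that is, it stacks one on top of the other, so they must have the same number of columns.
- np.c_ concatenates two matrices along the column axis, that is, it places them side by side, so they must have the same number of rows.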
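The difference between the two is easiest to see on small arrays. The following sketch uses made-up values; a, b, and col are illustrative names only.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# np.r_ stacks vertically: column counts must match
stacked = np.r_[a, b]          # shape (4, 2)

# np.c_ joins side by side: row counts must match
side_by_side = np.c_[a, b]     # shape (2, 4)

# A 1-D array passed to np.c_ is treated as a single column,
# which is how the transformer above appends new attributes to X
col = np.array([10, 20])
with_col = np.c_[a, col]       # shape (2, 3)

print(stacked.shape, side_by_side.shape, with_col.shape)
```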