Custom transformers in sklearn

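To define a custom transformer, all you need to do is create a class and implement three methods: fit() (which returns self), transform(), and fit_transform(). If you add TransformerMixin as a base class, you get the last one for free. If you also add BaseEstimator as a base class (and avoid *args and **kwargs in the constructor), you get two more methods, get_params() and set_params(), which are very useful for automatic hyperparameter tuning.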
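As a rough illustration of what the two base classes contribute, here is a minimal sketch. The class ColumnScaler, its factor parameter, and the X_demo array are hypothetical names made up for this example; only BaseEstimator, TransformerMixin, fit_transform(), get_params() and set_params() come from scikit-learn.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnScaler(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: multiplies every value by a fixed factor."""

    def __init__(self, factor=1.0):
        # Explicit keyword arguments (no *args / **kwargs) let BaseEstimator
        # discover the hyperparameters from this signature.
        self.factor = factor

    def fit(self, X, y=None):
        # Nothing to learn from the data; just return self.
        return self

    def transform(self, X):
        return np.asarray(X) * self.factor


scaler = ColumnScaler(factor=2.0)
X_demo = np.array([[1.0, 2.0], [3.0, 4.0]])

# fit_transform() is not defined above; it is inherited from TransformerMixin.
scaled = scaler.fit_transform(X_demo)

# get_params() and set_params() are inherited from BaseEstimator.
params_before = scaler.get_params()
scaler.set_params(factor=10.0)
params_after = scaler.get_params()
```

Because the hyperparameter is exposed through get_params() and set_params(), a tool such as a grid search should be able to vary it like any built-in estimator parameter.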

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# Column positions of the source attributes in the input array
room_ix, bedrooms_ix, population_ix, household_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):  # no *args or **kwargs
        self.add_bedrooms_per_room = add_bedrooms_per_room

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X, y=None):
        rooms_per_household = X[:, room_ix] / X[:, household_ix]
        population_per_household = X[:, population_ix] / X[:, household_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, room_ix]
            # Append the three derived columns to the right of X
            return np.c_[X, rooms_per_household, population_per_household,
                         bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]

# `housing` is assumed to be a pandas DataFrame loaded elsewhere (not shown here)
attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=False)
housing_extra_attribs = attr_adder.transform(housing.values)
           

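Additional notes

  • np.r_ stacks two matrices vertically, placing one on top of the other (it concatenates, it does not add); the matrices must have the same number of columns.
  • np.c_ joins two matrices side by side, left to right; the matrices must have the same number of rows. A 1-D array is treated as a single column, which is why transform() above can append the derived ratios to X.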
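The difference between the two helpers is easiest to see on small arrays. The following is a minimal sketch; the arrays a, b, c and ratio are made-up values used only for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2): same number of columns as a
c = np.array([[7], [8]])         # shape (2, 1): same number of rows as a

# np.r_ stacks along the first axis: b ends up below a.
stacked_rows = np.r_[a, b]       # shape (3, 2)

# np.c_ stacks along the last axis: c ends up to the right of a.
stacked_cols = np.c_[a, c]       # shape (2, 3)

# With np.c_, a 1-D array is turned into a column before joining.
ratio = np.array([0.5, 0.25])    # shape (2,)
with_ratio = np.c_[a, ratio]     # shape (2, 3)
```

Mismatched shapes (for example, trying np.c_ on arrays with different row counts) raise a ValueError rather than being silently padded.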
