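Transformers are usually combined with classifiers, regressors or other estimators to build a composite estimator.
A Pipeline packages a sequence of data preprocessing steps and a model together and fixes the order in which the data is processed. A single call to fit and predict then runs the whole workflow, and grid search can tune the parameters of every step in the pipeline at once.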
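The composition rules are as follows (the earlier steps process the data; the last step either feeds the data to a model or finishes processing it):
- Every estimator in the pipeline except the last one must be a transformer
- The last estimator can be of any type (transformer, classifier, etc.)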
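The two rules above can be sketched as follows. This is a minimal illustration, not part of the original notes: the iris dataset, the extra `StandardScaler` step, and the variable names are assumptions chosen only to make the example runnable.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed toy data, used only for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every step except the last is a transformer; the last one is a classifier.
clf_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("reduce_dim", PCA(n_components=2)),
    ("model", SVC()),
])
clf_pipeline.fit(X_train, y_train)     # fits each transformer in order, then the SVC
y_pred = clf_pipeline.predict(X_test)  # applies the fitted transformers, then predicts

# When the last step is also a transformer, the pipeline itself acts as a transformer.
prep_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("reduce_dim", PCA(n_components=2)),
])
X_reduced = prep_pipeline.fit_transform(X_train)
print(y_pred.shape, X_reduced.shape)
```

Because the final step decides which methods are available, `clf_pipeline` exposes `predict` while `prep_pipeline` exposes `transform`.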
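Main functions:
- union (`FeatureUnion` in scikit-learn) is a composite that contains only transformers; it applies them in parallel and concatenates their outputs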
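A short sketch of a transformer-only union may help here. It assumes the iris dataset and pairs `PCA` with `SelectKBest`; both choices and the step names are illustrative and do not come from the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

# Assumed toy data, used only for illustration.
X, y = load_iris(return_X_y=True)

# Each transformer is fitted on the same input; the outputs are stacked column-wise.
union = FeatureUnion([
    ("pca", PCA(n_components=2)),   # contributes 2 columns
    ("kbest", SelectKBest(k=1)),    # contributes 1 column
])
X_combined = union.fit_transform(X, y)
print(X_combined.shape)  # (150, 3)
```

Unlike a Pipeline, which chains its steps one after another, the union runs its transformers side by side on the same input.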
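1. Basic usage of Pipeline
Initialize a pipeline with a dict-like list of (name, estimator) tuples, or call make_pipeline() and pass the estimators directly as arguments, in which case the step names are generated automatically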
    >>> from sklearn.pipeline import Pipeline
    >>> from sklearn.svm import SVC
    >>> from sklearn.decomposition import PCA
    >>> # each step is a (name, estimator) tuple
    >>> estimators = [('reduce_dim', PCA()), ('model', SVC())]
    >>> test_pipeline = Pipeline(estimators)
    >>> test_pipeline
    Pipeline(steps=[('reduce_dim', PCA()), ('model', SVC())])
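For comparison, the `make_pipeline()` shortcut mentioned above can be sketched like this. The variable names are illustrative; the step names come from the lowercased class names of the estimators.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Estimators are passed as positional arguments, not as a list of tuples.
auto_pipeline = make_pipeline(PCA(), SVC())

# Step names are derived from the estimator class names.
step_names = [name for name, _ in auto_pipeline.steps]
print(step_names)  # ['pca', 'svc']
```

The explicit `Pipeline([...])` form is the better fit when the step names matter, for example when they are referenced later in a parameter grid.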
Each estimator can be accessed by index, by name (dict-style), or through the steps attribute
    >>> print(test_pipeline[0])             # by position
    PCA()
    >>> print(test_pipeline.steps)          # list of (name, estimator) tuples
    [('reduce_dim', PCA()), ('model', SVC())]
    >>> print(test_pipeline['reduce_dim'])  # by step name
    PCA()
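Name-based access is also available through the `named_steps` attribute, which is handy for inspecting a step after fitting. The sketch below assumes the iris dataset and `n_components=2`, neither of which appears in the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumed toy data, used only for illustration.
X, y = load_iris(return_X_y=True)
pipe = Pipeline([("reduce_dim", PCA(n_components=2)), ("model", SVC())])
pipe.fit(X, y)

# named_steps supports both key access and attribute access.
pca_step = pipe.named_steps["reduce_dim"]
same_step = pipe.named_steps.reduce_dim
print(pca_step is same_step)  # True

# Fitted attributes of an individual step can be read through it.
print(pca_step.explained_variance_ratio_)
```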
Slice notation returns a sub-pipeline
    >>> print(len(test_pipeline))
    2
    >>> sub_pipeline = test_pipeline[1:]
    >>> print(sub_pipeline)
    Pipeline(steps=[('model', SVC())])
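One common use of slicing is to run only the preprocessing part of a fitted pipeline. The following is a sketch under assumptions: the iris dataset and `n_components=2` are illustrative choices, not taken from the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumed toy data, used only for illustration.
X, y = load_iris(return_X_y=True)
pipe = Pipeline([("reduce_dim", PCA(n_components=2)), ("model", SVC())])
pipe.fit(X, y)

# Everything except the final estimator; the slice reuses the already fitted steps.
preprocessing = pipe[:-1]
X_reduced = preprocessing.transform(X)
print(X_reduced.shape)  # (150, 2)
```

The slice shares its estimators with the original pipeline rather than copying them, so no refitting is needed here.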
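Use the <estimator>__<parameter> syntax, that is, the name of the estimator's step in the pipeline, a double underscore, and the parameter name, to set or read the parameters of a particular estimator in the pipeline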
    >>> test_pipeline.set_params(model__C=10)
    Pipeline(steps=[('reduce_dim', PCA()), ('model', SVC(C=10))])
    >>> test_pipeline.get_params()['model__C']
    10
    >>> from sklearn.model_selection import GridSearchCV
    >>> # keys must use the step names given to Pipeline ('reduce_dim', 'model')
    >>> param_grid = dict(reduce_dim__n_components=[2, 5, 10], model__C=[0.1, 10, 100])
    >>> grid_search = GridSearchCV(test_pipeline, param_grid=param_grid)
    >>> grid_search
    GridSearchCV(estimator=Pipeline(steps=[('reduce_dim', PCA()),
                                           ('model', SVC(C=10))]),
                 param_grid={'model__C': [0.1, 10, 100],
                             'reduce_dim__n_components': [2, 5, 10]})
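Constructing the search object does not fit anything yet. A minimal sketch of actually running the search might look like the following. The iris dataset, `cv=5`, and the smaller `n_components` values are assumptions made for illustration (iris has only 4 features, so values such as 5 or 10 would fail during fitting), and every key in the grid has to match a step name given to the Pipeline.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumed toy data, used only for illustration.
X, y = load_iris(return_X_y=True)

pipe = Pipeline([("reduce_dim", PCA()), ("model", SVC())])

# Keys follow the <step name>__<parameter> convention.
param_grid = {
    "reduce_dim__n_components": [2, 3],
    "model__C": [0.1, 10, 100],
}

search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X, y)  # refits the whole pipeline for every parameter combination

print(search.best_params_)
print(search.best_score_)
```

Since the whole pipeline is refitted inside each cross-validation split, the preprocessing steps only ever see the training portion of each split.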
To be continued.