Apologies if this is obvious, but I couldn't find a clear answer to this:
Say I've used a pretty typical pipeline:
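    from sklearn import preprocessing
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import RandomizedLogisticRegression
    from sklearn.pipeline import Pipeline

    feat_sel = RandomizedLogisticRegression()
    clf = RandomForestClassifier()
    pl = Pipeline([
        ('preprocessing', preprocessing.StandardScaler()),
        ('feature_selection', feat_sel),
        ('classification', clf)])
    pl.fit(X, y)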
Now, when I apply pl to a new set, is RandomizedLogisticRegression going to be refit on that data, or are the columns that were selected in training going to be used on the new data? If it is refit, is there a way for the pipeline to differentiate between feature selectors and the feature extractors/scalers/other transforms that should be applied to the new input? Until I'm sure, I'm skipping the pipeline feature and just doing each step manually and maintaining state myself.
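For reference, here is a minimal sketch of how this could be checked empirically. It is not the exact pipeline above: it assumes synthetic data from make_classification, and it swaps in SelectKBest as a stand-in selector, since RandomizedLogisticRegression is not available in recent scikit-learn releases. The variable names (X_train, X_new, mask_after_fit, and so on) are made up for illustration. The idea is to record the selector's column mask after fit, call predict on unseen rows, and compare the result with doing each step by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for this sketch): 150 training rows, 50 "new" rows.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)
X_train, y_train = X[:150], y[:150]
X_new = X[150:]

# SelectKBest is a stand-in for RandomizedLogisticRegression here.
pl = Pipeline([
    ("preprocessing", StandardScaler()),
    ("feature_selection", SelectKBest(f_classif, k=5)),
    ("classification", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pl.fit(X_train, y_train)

scaler = pl.named_steps["preprocessing"]
selector = pl.named_steps["feature_selection"]
clf = pl.named_steps["classification"]

# Columns chosen during fit, copied so a later change would be detectable.
mask_after_fit = selector.get_support().copy()

# Predicting on new data goes through transform() of each fitted step.
pred = pl.predict(X_new)
mask_after_predict = selector.get_support()

# The same result obtained by applying the fitted steps by hand.
manual_pred = clf.predict(selector.transform(scaler.transform(X_new)))

same_mask = np.array_equal(mask_after_fit, mask_after_predict)
same_pred = np.array_equal(pred, manual_pred)
print("mask unchanged by predict:", same_mask)
print("pipeline matches manual steps:", same_pred)
```

If both checks come out true, that would suggest predict reuses the columns selected during fit and does not refit the selector, which is what doing the steps manually with saved state achieves.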