Describe the bug
When setting the memory parameter of a transformer Pipeline (i.e., one whose last step is a transformer), the final transformer is never cached: its fit is re-run on every call to the pipeline's fit, even when nothing upstream has changed. Discovered at https://stackoverflow.com/q/71812869/10495893.
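
For context, here is a minimal, self-contained sketch of the caching scheme as I understand it (a paraphrase, not scikit-learn's actual source): the steps before the last are routed through a joblib-memoized fit_transform helper, while the final step's fit is always invoked directly.

# Minimal sketch of Pipeline's caching scheme (a paraphrase, NOT the
# actual scikit-learn source).
from joblib import Memory

memory = Memory("tmp/cache", verbose=0)

def _fit_transform_one(transformer, X, y):
    # This whole call is what gets memoized on disk for intermediate steps.
    return transformer.fit(X, y).transform(X), transformer

fit_transform_one_cached = memory.cache(_fit_transform_one)

def fit_like_pipeline(steps, X, y=None):
    Xt = X
    for name, transformer in steps[:-1]:
        # Intermediate steps: on a cache hit, neither fit nor transform runs.
        Xt, transformer = fit_transform_one_cached(transformer, Xt, y)
    # Final step: fit is called unconditionally, bypassing the cache, which
    # is why "that_column" prints on every pipeline.fit in the repro below.
    return steps[-1][1].fit(Xt, y)

The real implementation also clones each step before fitting, but the shape is the same: the last step never passes through memory.cache.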
Steps/Code to Reproduce
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
import time


class Test(BaseEstimator, TransformerMixin):
    def __init__(self, col):
        self.col = col

    def fit(self, X, y=None):
        print(self.col)
        return self

    def transform(self, X, y=None):
        for t in range(5):
            # just to slow it down / check caching.
            print(".")
            time.sleep(1)
        # print(self.col)
        return X


pipeline = Pipeline(
    [
        ("test", Test(col="this_column")),
        ("test2", Test(col="that_column")),
    ],
    memory="tmp/cache",
)

pipeline.fit(None)
pipeline.fit(None)
pipeline.fit(None)
Expected Results
this_column
.
.
.
.
.
that_column
Actual Results
this_column
.
.
.
.
.
that_column
that_column
that_column
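
The repeated that_column lines come from the final step being refit on every call, while both later runs hit the cache for the first step. Until this is fixed, a possible workaround (my suggestion, not documented behavior) is to append a no-op "passthrough" final step, so the real last transformer becomes an intermediate, cached step:

pipeline = Pipeline(
    [
        ("test", Test(col="this_column")),
        ("test2", Test(col="that_column")),
        # No-op final step; "test2" is now an intermediate step and its
        # fit_transform result is memoized like the others.
        ("noop", "passthrough"),
    ],
    memory="tmp/cache",
)

pipeline.fit(None)  # fits and caches both Test steps
pipeline.fit(None)  # fully cached, prints nothing

The trade-off is that test2.transform now also runs once during the first fit (five extra dots), since intermediate steps are cached via fit_transform rather than fit alone.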
Versions
System:
    python: 3.7.13 (default, Mar 16 2022, 17:37:17) [GCC 7.5.0]
executable: /usr/bin/python3
   machine: Linux-5.4.144+-x86_64-with-Ubuntu-18.04-bionic

Python dependencies:
          pip: 21.1.3
   setuptools: 57.4.0
      sklearn: 1.0.2
        numpy: 1.21.5
        scipy: 1.4.1
       Cython: 0.29.28
       pandas: 1.3.5
   matplotlib: 3.2.2
       joblib: 1.1.0
threadpoolctl: 3.1.0

Built with OpenMP: True