Description
When subsampling features, IsolationForest fails input validation when calling predict().
from sklearn.ensemble import IsolationForest
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target
clf = IsolationForest(max_features=0.8)
clf.fit(X, y)
clf.predict(X)
gives the following:
scikit-learn/sklearn/tree/tree.pyc in _validate_X_predict(self, X, check_input)
392 " match the input. Model n_features is %s and "
393 " input n_features is %s "
--> 394 % (self.n_features_, n_features))
395
396 return X
ValueError: Number of features of the model must match the input. Model n_features is 3 and input n_features is 4
In predict(), one of the individual fitted estimators is used for input validation: self.estimators_[0]._validate_X_predict(X, check_input=True), but it is passed the full X, which has all the features.
After looking into it a bit, bagging.py sub-samples the features itself, whereas forest.py delegates it to the underlying DecisionTree.
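To illustrate the mismatch without depending on scikit-learn internals, here is a minimal pure-Python sketch. ToyTree, ToyBagging, take_columns, validate_buggy and validate_per_estimator are hypothetical names made up for this example; they are not scikit-learn APIs, and the second validation path is only one possible way to avoid the error, not necessarily how the library handles it.

```python
import random


class ToyTree:
    """Hypothetical stand-in for a fitted base estimator (not a scikit-learn class)."""

    def fit(self, X):
        # Remember how many columns the tree was trained on.
        self.n_features_ = len(X[0])
        return self

    def validate_X_predict(self, X):
        n_features = len(X[0])
        if self.n_features_ != n_features:
            raise ValueError(
                "Number of features of the model must match the input. "
                "Model n_features is %s and input n_features is %s"
                % (self.n_features_, n_features)
            )
        return X


def take_columns(X, features):
    """Return X restricted to the given column indices."""
    return [[row[j] for j in features] for row in X]


class ToyBagging:
    """Hypothetical ensemble that sub-samples features itself, as described above."""

    def __init__(self, n_estimators=3, max_features=0.8, seed=0):
        self.n_estimators = n_estimators
        self.max_features = max_features
        self.seed = seed

    def fit(self, X):
        rng = random.Random(self.seed)
        n_features = len(X[0])
        n_sub = max(1, int(self.max_features * n_features))  # 0.8 * 4 -> 3
        self.estimators_ = []
        self.estimators_features_ = []
        for _ in range(self.n_estimators):
            features = sorted(rng.sample(range(n_features), n_sub))
            # Each tree only ever sees its own subset of columns.
            self.estimators_.append(ToyTree().fit(take_columns(X, features)))
            self.estimators_features_.append(features)
        return self

    def validate_buggy(self, X):
        # Mirrors the reported behaviour: the full X goes to a tree
        # that was fitted on a feature subset.
        return self.estimators_[0].validate_X_predict(X)

    def validate_per_estimator(self, X):
        # One possible alternative: validate each tree against its own columns.
        for tree, features in zip(self.estimators_, self.estimators_features_):
            tree.validate_X_predict(take_columns(X, features))
        return X


X = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]
model = ToyBagging().fit(X)

try:
    model.validate_buggy(X)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

With four input columns and max_features=0.8, each toy tree is fitted on three columns, so validate_buggy raises the same kind of error as in the traceback above, while validate_per_estimator passes because every tree is checked against the columns it was trained on.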