
IsolationForest(max_features=0.8).predict(X) fails input validation #5732

Closed
@betatim

Description


When subsampling features, IsolationForest fails input validation when predict() is called.

from sklearn.ensemble import IsolationForest
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

clf = IsolationForest(max_features=0.8)
clf.fit(X, y)  # y is ignored: IsolationForest is unsupervised
clf.predict(X)  # raises ValueError on the affected version

gives the following:

scikit-learn/sklearn/tree/tree.pyc in _validate_X_predict(self, X, check_input)
    392                              " match the input. Model n_features is %s and "
    393                              " input n_features is %s "
--> 394                              % (self.n_features_, n_features))
    395
    396         return X

ValueError: Number of features of the model must  match the input. Model n_features is 3 and  input n_features is 4
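The 3 in the message comes from the float max_features. Here is a minimal arithmetic check. It assumes, without confirmation from this report, that the ensemble turns a float max_features into a column count as int(max_features * n_features):

```python
# Assumption: float max_features -> column count via int(max_features * n_features).
n_features = 4      # iris has 4 columns
max_features = 0.8
n_sub = max(1, int(max_features * n_features))  # int(3.2) -> 3
print(n_sub)  # 3
```

Each tree is therefore fitted on 3 columns, while predict() validates the full 4-column X against one of those trees.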

In predict(), one of the individual fitted estimators is used for input validation: self.estimators_[0]._validate_X_predict(X, check_input=True). However, it is passed the full X, which has all the features, while that estimator was fitted on only a subset of them. After looking into it a bit: bagging.py subsamples the features itself, whereas forest.py delegates feature subsampling to the underlying DecisionTree.
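The mismatch can be reproduced without scikit-learn. The sketch below is a toy model under stated assumptions, and ToyTree, ToyBagging, take_columns, validate_buggy and validate_fixed are all hypothetical names, not scikit-learn's API. The tree only records how many columns it was fitted on. The ensemble subsamples columns itself, as bagging.py does, and keeps the chosen column indices for each tree.

```python
import random

# Hypothetical toy classes illustrating the shape mismatch; NOT scikit-learn's API.


class ToyTree:
    """Stand-in for a fitted tree: it only remembers how many columns it saw."""

    def fit(self, X):
        self.n_features_ = len(X[0])
        return self

    def validate_X_predict(self, X):
        n_features = len(X[0])
        if n_features != self.n_features_:
            raise ValueError(
                "Model n_features is %s and input n_features is %s"
                % (self.n_features_, n_features)
            )
        return X


def take_columns(X, cols):
    """Return X restricted to the given column indices."""
    return [[row[j] for j in cols] for row in X]


class ToyBagging:
    """Bagging-style ensemble: picks each tree's columns itself before fitting."""

    def __init__(self, n_estimators=3, max_features=0.8, seed=0):
        self.n_estimators = n_estimators
        self.max_features = max_features
        self.seed = seed

    def fit(self, X):
        rng = random.Random(self.seed)
        self.n_features_ = len(X[0])
        # Assumed float -> int conversion of max_features.
        k = max(1, int(self.max_features * self.n_features_))
        self.estimators_, self.estimators_features_ = [], []
        for _ in range(self.n_estimators):
            cols = sorted(rng.sample(range(self.n_features_), k))
            self.estimators_.append(ToyTree().fit(take_columns(X, cols)))
            self.estimators_features_.append(cols)
        return self

    def validate_buggy(self, X):
        # Mirrors the reported bug: the full X goes to a tree that saw only k columns.
        return self.estimators_[0].validate_X_predict(X)

    def validate_fixed(self, X):
        # Check X against the ensemble's own feature count...
        if len(X[0]) != self.n_features_:
            raise ValueError("X has %d features, expected %d" % (len(X[0]), self.n_features_))
        # ...then hand each tree only the columns it was trained on.
        for tree, cols in zip(self.estimators_, self.estimators_features_):
            tree.validate_X_predict(take_columns(X, cols))
        return X


X = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [6.3, 3.3, 6.0, 2.5]]
model = ToyBagging(max_features=0.8).fit(X)
try:
    model.validate_buggy(X)
except ValueError as exc:
    print(exc)  # Model n_features is 3 and input n_features is 4
model.validate_fixed(X)  # passes: every tree is checked against its own 3 columns
```

The fixed path shows one possible approach, not necessarily the one the project adopted. It validates X against the ensemble's own feature count, then slices X per tree using the stored column indices, so each tree is only ever checked against the columns it was fitted on.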
