Missing features removal with SimpleImputer #16426

### Code sample
In the sample code below, `transform` drops a column from the dataset because that column contained only `np.nan` during `fit`:

```python
>>> from sklearn.impute import SimpleImputer
>>> import numpy as np
>>> imp = SimpleImputer()
>>> imp.fit([[0, np.nan], [1, np.nan]])
>>> imp.transform([[0, np.nan], [1, 1]])
array([[0.],
       [1.]])
```

### Problem description
Currently `sklearn.impute.SimpleImputer` silently removes features that are `np.nan` in every training sample.

This can break later pipeline steps because the output's `shape` no longer matches the input's, e.g. in an in-place assignment such as:

```python
dataset[:, columns_to_impute_with_median] = imp.fit_transform(dataset[:, columns_to_impute_with_median])
```
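To make the failure concrete, here is a minimal sketch of that assignment on a made-up 3x3 array (the data and the `assignment_error` variable are illustrative assumptions, not part of the original report). It assumes the default behaviour described in this issue, where the all-`np.nan` column is dropped:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative data: column 1 is np.nan on every sample.
dataset = np.array([
    [0.0, np.nan, 5.0],
    [1.0, np.nan, np.nan],
    [2.0, np.nan, 7.0],
])
columns_to_impute_with_median = [0, 1, 2]

imp = SimpleImputer(strategy="median")
imputed = imp.fit_transform(dataset[:, columns_to_impute_with_median])
print(imputed.shape)  # (3, 2): the all-NaN column is gone

# Writing the result back fails because the shapes no longer match.
try:
    dataset[:, columns_to_impute_with_median] = imputed
    assignment_error = None
except ValueError as exc:
    assignment_error = exc
    print(f"Assignment failed: {exc}")
```

Note that if only one column survives, NumPy may broadcast it across all target columns instead of raising, which would corrupt the data without any error.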

### Possible solutions
For the affected features, `transform` could either keep their values where they are valid or impute them with `fill_value`. I suggest adding a new parameter that enables this behaviour, together with a warning that names the affected features.

I'm willing to implement this feature and would welcome any advice.
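As a rough illustration of the proposed behaviour, the sketch below wraps `SimpleImputer` in two hypothetical helpers, `fit_keep_empty` and `transform_keep_empty`. These names, the default `fill_value=0.0`, and the warning text are assumptions for illustration only, not an existing scikit-learn API. The sketch also assumes at least one column has an observed value during fitting.

```python
import warnings

import numpy as np
from sklearn.impute import SimpleImputer


def fit_keep_empty(X_train, strategy="mean"):
    """Fit on the non-empty columns and remember which ones were empty."""
    X_train = np.asarray(X_train, dtype=float)
    empty = np.isnan(X_train).all(axis=0)  # True where a column is all NaN
    if empty.any():
        warnings.warn(
            f"Features {np.flatnonzero(empty).tolist()} have no observed values."
        )
    imputer = SimpleImputer(strategy=strategy).fit(X_train[:, ~empty])
    return imputer, empty


def transform_keep_empty(imputer, empty, X, fill_value=0.0):
    """Impute without dropping the columns that were empty during fit."""
    out = np.asarray(X, dtype=float).copy()
    out[:, ~empty] = imputer.transform(out[:, ~empty])
    # Columns that were empty at fit time: keep valid values, fill the rest.
    block = out[:, empty]
    block[np.isnan(block)] = fill_value
    out[:, empty] = block
    return out


# Same data as the code sample in this issue.
imputer, empty = fit_keep_empty([[0, np.nan], [1, np.nan]])
result = transform_keep_empty(imputer, empty, [[0, np.nan], [1, 1]])
print(result.shape)  # (2, 2): the input shape is preserved
```

Here the second column keeps the valid value `1` from the transform data, and the remaining `np.nan` is replaced by `fill_value`.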
