.. currentmodule:: sklearn
August 7, 2013
- Missing values with sparse and dense matrices can be imputed with the transformer :class:`preprocessing.Imputer` by `Nicolas Trésegnie`_.
- The core implementation of decision trees has been rewritten from scratch, allowing for faster tree induction and lower memory consumption in all tree-based estimators. By `Gilles Louppe`_.
- Added :class:`ensemble.AdaBoostClassifier` and :class:`ensemble.AdaBoostRegressor`, by `Noel Dawe`_ and `Gilles Louppe`_. See the :ref:`AdaBoost <adaboost>` section of the user guide for details and examples.
- Added :class:`grid_search.RandomizedSearchCV` and :class:`grid_search.ParameterSampler` for randomized hyperparameter optimization. By `Andreas Müller`_.
- Added :ref:`biclustering <biclustering>` algorithms (:class:`sklearn.cluster.bicluster.SpectralCoclustering` and :class:`sklearn.cluster.bicluster.SpectralBiclustering`), data generation methods (:func:`sklearn.datasets.make_biclusters` and :func:`sklearn.datasets.make_checkerboard`), and scoring metrics (:func:`sklearn.metrics.consensus_score`). By `Kemal Eren`_.
- Added :ref:`Restricted Boltzmann Machines<rbm>` (:class:`neural_network.BernoulliRBM`). By `Yann Dauphin`_.
- Python 3 support by :user:`Justin Vincent <justinvf>`, `Lars Buitinck`_, :user:`Subhodeep Moitra <smoitra87>` and `Olivier Grisel`_. All tests now pass under Python 3.3.
- Ability to pass one penalty (alpha value) per target in :class:`linear_model.Ridge`, by @eickenberg and `Mathieu Blondel`_.
- Fixed :mod:`sklearn.linear_model.stochastic_gradient.py` L2 regularization issue (minor practical significance). By :user:`Norbert Crombach <norbert>` and `Mathieu Blondel`_.
- Added an interactive version of `Andreas Müller`_'s Machine Learning Cheat Sheet (for scikit-learn) to the documentation. See :ref:`Choosing the right estimator <ml_map>`. By `Jaques Grobler`_.
- :class:`grid_search.GridSearchCV` and :func:`cross_validation.cross_val_score` now support the use of advanced scoring functions such as area under the ROC curve and f-beta scores. See :ref:`scoring_parameter` for details. By `Andreas Müller`_ and `Lars Buitinck`_. Passing a function from :mod:`sklearn.metrics` as ``score_func`` is deprecated.
- Multi-label classification output is now supported by :func:`metrics.accuracy_score`, :func:`metrics.zero_one_loss`, :func:`metrics.f1_score`, :func:`metrics.fbeta_score`, :func:`metrics.classification_report`, :func:`metrics.precision_score` and :func:`metrics.recall_score` by `Arnaud Joly`_.
- Two new metrics :func:`metrics.hamming_loss` and :func:`metrics.jaccard_similarity_score` are added with multi-label support by `Arnaud Joly`_.
- Speed and memory usage improvements in :class:`feature_extraction.text.CountVectorizer` and :class:`feature_extraction.text.TfidfVectorizer`, by Jochen Wersdörfer and Roman Sinayev.
- The ``min_df`` parameter in :class:`feature_extraction.text.CountVectorizer` and :class:`feature_extraction.text.TfidfVectorizer`, which used to be 2, has been reset to 1 to avoid unpleasant surprises (empty vocabularies) for novice users who try it out on tiny document collections. A value of at least 2 is still recommended for practical use.
- :class:`svm.LinearSVC`, :class:`linear_model.SGDClassifier` and :class:`linear_model.SGDRegressor` now have a ``sparsify`` method that converts their ``coef_`` into a sparse matrix, meaning stored models trained using these estimators can be made much more compact.
- :class:`linear_model.SGDClassifier` now produces multiclass probability estimates when trained under log loss or modified Huber loss.
- Hyperlinks to documentation in example code on the website by :user:`Martin Luessi <mluessi>`.
- Fixed bug in :class:`preprocessing.MinMaxScaler` causing incorrect scaling of the features for non-default ``feature_range`` settings. By `Andreas Müller`_.
- ``max_features`` in :class:`tree.DecisionTreeClassifier`, :class:`tree.DecisionTreeRegressor` and all derived ensemble estimators now supports percentage values. By `Gilles Louppe`_.
- Performance improvements in :class:`isotonic.IsotonicRegression` by `Nelle Varoquaux`_.
- :func:`metrics.accuracy_score` has an option ``normalize`` to return the fraction or the number of correctly classified samples by `Arnaud Joly`_.
- Added :func:`metrics.log_loss` that computes log loss, aka cross-entropy loss. By Jochen Wersdörfer and `Lars Buitinck`_.
- A bug that caused :class:`ensemble.AdaBoostClassifier` to output incorrect probabilities has been fixed.
- Feature selectors now share a mixin providing consistent ``transform``, ``inverse_transform`` and ``get_support`` methods. By `Joel Nothman`_.
- A fitted :class:`grid_search.GridSearchCV` or :class:`grid_search.RandomizedSearchCV` can now generally be pickled. By `Joel Nothman`_.
- Refactored and vectorized implementation of :func:`metrics.roc_curve` and :func:`metrics.precision_recall_curve`. By `Joel Nothman`_.
- The new estimator :class:`sklearn.decomposition.TruncatedSVD` performs dimensionality reduction using SVD on sparse matrices, and can be used for latent semantic analysis (LSA). By `Lars Buitinck`_.
- Added self-contained example of out-of-core learning on text data :ref:`sphx_glr_auto_examples_applications_plot_out_of_core_classification.py`. By :user:`Eustache Diemert <oddskool>`.
- The default number of components for :class:`sklearn.decomposition.RandomizedPCA` is now correctly documented to be ``n_features``. This was the default behavior, so programs using it will continue to work as they did.
- :class:`sklearn.cluster.KMeans` now fits several orders of magnitude faster on sparse data (the speedup depends on the sparsity). By `Lars Buitinck`_.
- Reduce memory footprint of FastICA by `Denis Engemann`_ and `Alexandre Gramfort`_.
- Verbose output in :mod:`sklearn.ensemble.gradient_boosting` now uses a column format and prints progress in decreasing frequency. It also shows the remaining time. By `Peter Prettenhofer`_.
- :mod:`sklearn.ensemble.gradient_boosting` provides out-of-bag improvement :attr:`~sklearn.ensemble.GradientBoostingRegressor.oob_improvement_` rather than the OOB score for model selection. An example that shows how to use OOB estimates to select the number of trees was added. By `Peter Prettenhofer`_.
- Most metrics now support string labels for multiclass classification by `Arnaud Joly`_ and `Lars Buitinck`_.
- New OrthogonalMatchingPursuitCV class by `Alexandre Gramfort`_ and `Vlad Niculae`_.
- Fixed a bug in :class:`sklearn.covariance.GraphLassoCV`: the 'alphas' parameter now works as expected when given a list of values. By Philippe Gervais.
- Fixed an important bug in :class:`sklearn.covariance.GraphLassoCV` that prevented all folds provided by a CV object from being used (only the first 3 were used). When providing a CV object, execution time may thus increase significantly compared to the previous version (but results are correct now). By Philippe Gervais.
- :func:`cross_validation.cross_val_score` and the :mod:`grid_search` module are now tested with multi-output data by `Arnaud Joly`_.
- :func:`datasets.make_multilabel_classification` can now return the output in label indicator multilabel format by `Arnaud Joly`_.
- K-nearest neighbors, :class:`neighbors.KNeighborsRegressor` and :class:`neighbors.KNeighborsClassifier`, and radius neighbors, :class:`neighbors.RadiusNeighborsRegressor` and :class:`neighbors.RadiusNeighborsClassifier`, support multioutput data by `Arnaud Joly`_.
- Random state in LibSVM-based estimators (:class:`svm.SVC`, :class:`svm.NuSVC`, :class:`svm.OneClassSVM`, :class:`svm.SVR`, :class:`svm.NuSVR`) can now be controlled. This is useful to ensure consistency in the probability estimates for the classifiers trained with ``probability=True``. By `Vlad Niculae`_.
- Out-of-core learning support for discrete naive Bayes classifiers :class:`sklearn.naive_bayes.MultinomialNB` and :class:`sklearn.naive_bayes.BernoulliNB` by adding the ``partial_fit`` method by `Olivier Grisel`_.
- New website design and navigation by `Gilles Louppe`_, `Nelle Varoquaux`_, Vincent Michel and `Andreas Müller`_.
- Improved documentation on :ref:`multi-class, multi-label and multi-output classification <multiclass>` by `Yannick Schwartz`_ and `Arnaud Joly`_.
- Better input and error handling in the :mod:`metrics` module by `Arnaud Joly`_ and `Joel Nothman`_.
- Speed optimization of the :mod:`hmm` module by :user:`Mikhail Korobov <kmike>`
- Significant speed improvements for :class:`sklearn.cluster.DBSCAN` by cleverless
- The :func:`auc_score` was renamed :func:`roc_auc_score`.
- Testing scikit-learn with ``sklearn.test()`` is deprecated. Use ``nosetests sklearn`` from the command line.
- Feature importances in :class:`tree.DecisionTreeClassifier`, :class:`tree.DecisionTreeRegressor` and all derived ensemble estimators are now computed on the fly when accessing the ``feature_importances_`` attribute. Setting ``compute_importances=True`` is no longer required. By `Gilles Louppe`_.
- :func:`linear_model.lasso_path` and :func:`linear_model.enet_path` can return their results in the same format as that of :func:`linear_model.lars_path`. This is done by setting the ``return_models`` parameter to ``False``. By `Jaques Grobler`_ and `Alexandre Gramfort`_.
- :class:`grid_search.IterGrid` was renamed to :class:`grid_search.ParameterGrid`.
- Fixed bug in :class:`KFold` causing imperfect class balance in some cases. By `Alexandre Gramfort`_ and Tadej Janež.
- :class:`sklearn.neighbors.BallTree` has been refactored, and a :class:`sklearn.neighbors.KDTree` has been added which shares the same interface. The Ball Tree now works with a wide variety of distance metrics. Both classes have many new methods, including single-tree and dual-tree queries, breadth-first and depth-first searching, and more advanced queries such as kernel density estimation and 2-point correlation functions. By `Jake Vanderplas`_
- Support for scipy.spatial.cKDTree within neighbors queries has been removed, and the functionality replaced with the new :class:`KDTree` class.
- :class:`sklearn.neighbors.KernelDensity` has been added, which performs efficient kernel density estimation with a variety of kernels.
- :class:`sklearn.decomposition.KernelPCA` now always returns output with ``n_components`` components, unless the new parameter ``remove_zero_eig`` is set to ``True``. This new behavior is consistent with the way kernel PCA was always documented; previously, the removal of components with zero eigenvalues was tacitly performed on all data.
- ``gcv_mode="auto"`` no longer tries to perform SVD on a densified sparse matrix in :class:`sklearn.linear_model.RidgeCV`.
- Sparse matrix support in :class:`sklearn.decomposition.RandomizedPCA` is now deprecated in favor of the new ``TruncatedSVD``.
- :class:`cross_validation.KFold` and :class:`cross_validation.StratifiedKFold` now enforce ``n_folds >= 2``; otherwise a ``ValueError`` is raised. By `Olivier Grisel`_.
- :func:`datasets.load_files`'s ``charset`` and ``charset_errors`` parameters were renamed ``encoding`` and ``decode_errors``.
- Attribute ``oob_score_`` in :class:`sklearn.ensemble.GradientBoostingRegressor` and :class:`sklearn.ensemble.GradientBoostingClassifier` is deprecated and has been replaced by ``oob_improvement_``.
- Attributes in OrthogonalMatchingPursuit have been deprecated (copy_X, Gram, ...) and precompute_gram renamed precompute for consistency. See #2224.
- :class:`sklearn.preprocessing.StandardScaler` now converts integer input to float, and raises a warning. Previously it rounded for dense integer input.
- :class:`sklearn.multiclass.OneVsRestClassifier` now has a ``decision_function`` method. This will return the distance of each sample from the decision boundary for each class, as long as the underlying estimators implement the ``decision_function`` method. By `Kyle Kastner`_.
- Better input validation, warning on unexpected shapes for y.
List of contributors for release 0.14 by number of commits.
- 277 Gilles Louppe
- 245 Lars Buitinck
- 187 Andreas Mueller
- 124 Arnaud Joly
- 112 Jaques Grobler
- 109 Gael Varoquaux
- 107 Olivier Grisel
- 102 Noel Dawe
- 99 Kemal Eren
- 79 Joel Nothman
- 75 Jake VanderPlas
- 73 Nelle Varoquaux
- 71 Vlad Niculae
- 65 Peter Prettenhofer
- 64 Alexandre Gramfort
- 54 Mathieu Blondel
- 38 Nicolas Trésegnie
- 35 eustache
- 27 Denis Engemann
- 25 Yann N. Dauphin
- 19 Justin Vincent
- 17 Robert Layton
- 15 Doug Coleman
- 14 Michael Eickenberg
- 13 Robert Marchman
- 11 Fabian Pedregosa
- 11 Philippe Gervais
- 10 Jim Holmström
- 10 Tadej Janež
- 10 syhw
- 9 Mikhail Korobov
- 9 Steven De Gryze
- 8 sergeyf
- 7 Ben Root
- 7 Hrishikesh Huilgolkar
- 6 Kyle Kastner
- 6 Martin Luessi
- 6 Rob Speer
- 5 Federico Vaggi
- 5 Raul Garreta
- 5 Rob Zinkov
- 4 Ken Geis
- 3 A. Flaxman
- 3 Denton Cockburn
- 3 Dougal Sutherland
- 3 Ian Ozsvald
- 3 Johannes Schönberger
- 3 Robert McGibbon
- 3 Roman Sinayev
- 3 Szabo Roland
- 2 Diego Molla
- 2 Imran Haque
- 2 Jochen Wersdörfer
- 2 Sergey Karayev
- 2 Yannick Schwartz
- 2 jamestwebber
- 1 Abhijeet Kolhe
- 1 Alexander Fabisch
- 1 Bastiaan van den Berg
- 1 Benjamin Peterson
- 1 Daniel Velkov
- 1 Fazlul Shahriar
- 1 Felix Brockherde
- 1 Félix-Antoine Fortin
- 1 Harikrishnan S
- 1 Jack Hale
- 1 JakeMick
- 1 James McDermott
- 1 John Benediktsson
- 1 John Zwinck
- 1 Joshua Vredevoogd
- 1 Justin Pati
- 1 Kevin Hughes
- 1 Kyle Kelley
- 1 Matthias Ekman
- 1 Miroslav Shubernetskiy
- 1 Naoki Orii
- 1 Norbert Crombach
- 1 Rafael Cunha de Almeida
- 1 Rolando Espinoza La fuente
- 1 Seamus Abshere
- 1 Sergey Feldman
- 1 Sergio Medina
- 1 Stefano Lattarini
- 1 Steve Koch
- 1 Sturla Molden
- 1 Thomas Jarosch
- 1 Yaroslav Halchenko