.. currentmodule:: sklearn

Version 0.14

August 7, 2013

Changelog

Missing values with sparse and dense matrices can be imputed with the transformer :class:`preprocessing.Imputer` by `Nicolas Trésegnie`_.
The core implementation of decisions trees has been rewritten from scratch, allowing for faster tree induction and lower memory consumption in all tree-based estimators. By `Gilles Louppe`_.
Added :class:`ensemble.AdaBoostClassifier` and :class:`ensemble.AdaBoostRegressor`, by `Noel Dawe`_ and `Gilles Louppe`_. See the :ref:`AdaBoost <adaboost>` section of the user guide for details and examples.
Added :class:`grid_search.RandomizedSearchCV` and :class:`grid_search.ParameterSampler` for randomized hyperparameter optimization. By `Andreas Müller`_.
Added :ref:`biclustering <biclustering>` algorithms (:class:`sklearn.cluster.bicluster.SpectralCoclustering` and :class:`sklearn.cluster.bicluster.SpectralBiclustering`), data generation methods (:func:`sklearn.datasets.make_biclusters` and :func:`sklearn.datasets.make_checkerboard`), and scoring metrics (:func:`sklearn.metrics.consensus_score`). By `Kemal Eren`_.
Added :ref:`Restricted Boltzmann Machines<rbm>` (:class:`neural_network.BernoulliRBM`). By `Yann Dauphin`_.
Python 3 support by :user:`Justin Vincent <justinvf>`, `Lars Buitinck`_, :user:`Subhodeep Moitra <smoitra87>` and `Olivier Grisel`_. All tests now pass under Python 3.3.
Ability to pass one penalty (alpha value) per target in :class:`linear_model.Ridge`, by @eickenberg and `Mathieu Blondel`_.
Fixed :mod:`sklearn.linear_model.stochastic_gradient.py` L2 regularization issue (minor practical significance). By :user:`Norbert Crombach <norbert>` and `Mathieu Blondel`_ .
Added an interactive version of `Andreas Müller`_'s Machine Learning Cheat Sheet (for scikit-learn) to the documentation. See :ref:`Choosing the right estimator <ml_map>`. By `Jaques Grobler`_.
:class:`grid_search.GridSearchCV` and :func:`cross_validation.cross_val_score` now support the use of advanced scoring function such as area under the ROC curve and f-beta scores. See :ref:`scoring_parameter` for details. By `Andreas Müller`_ and `Lars Buitinck`_. Passing a function from :mod:`sklearn.metrics` as score_func is deprecated.
Multi-label classification output is now supported by :func:`metrics.accuracy_score`, :func:`metrics.zero_one_loss`, :func:`metrics.f1_score`, :func:`metrics.fbeta_score`, :func:`metrics.classification_report`, :func:`metrics.precision_score` and :func:`metrics.recall_score` by `Arnaud Joly`_.
Two new metrics :func:`metrics.hamming_loss` and :func:`metrics.jaccard_similarity_score` are added with multi-label support by `Arnaud Joly`_.
Speed and memory usage improvements in :class:`feature_extraction.text.CountVectorizer` and :class:`feature_extraction.text.TfidfVectorizer`, by Jochen Wersdörfer and Roman Sinayev.
The min_df parameter in :class:`feature_extraction.text.CountVectorizer` and :class:`feature_extraction.text.TfidfVectorizer`, which used to be 2, has been reset to 1 to avoid unpleasant surprises (empty vocabularies) for novice users who try it out on tiny document collections. A value of at least 2 is still recommended for practical use.
:class:`svm.LinearSVC`, :class:`linear_model.SGDClassifier` and :class:`linear_model.SGDRegressor` now have a sparsify method that converts their coef_ into a sparse matrix, meaning stored models trained using these estimators can be made much more compact.
:class:`linear_model.SGDClassifier` now produces multiclass probability estimates when trained under log loss or modified Huber loss.
Hyperlinks to documentation in example code on the website by :user:`Martin Luessi <mluessi>`.
Fixed bug in :class:`preprocessing.MinMaxScaler` causing incorrect scaling of the features for non-default feature_range settings. By `Andreas Müller`_.
max_features in :class:`tree.DecisionTreeClassifier`, :class:`tree.DecisionTreeRegressor` and all derived ensemble estimators now supports percentage values. By `Gilles Louppe`_.
Performance improvements in :class:`isotonic.IsotonicRegression` by `Nelle Varoquaux`_.
:func:`metrics.accuracy_score` has an option normalize to return the fraction or the number of correctly classified sample by `Arnaud Joly`_.
Added :func:`metrics.log_loss` that computes log loss, aka cross-entropy loss. By Jochen Wersdörfer and `Lars Buitinck`_.
A bug that caused :class:`ensemble.AdaBoostClassifier`'s to output incorrect probabilities has been fixed.
Feature selectors now share a mixin providing consistent transform, inverse_transform and get_support methods. By `Joel Nothman`_.
A fitted :class:`grid_search.GridSearchCV` or :class:`grid_search.RandomizedSearchCV` can now generally be pickled. By `Joel Nothman`_.
Refactored and vectorized implementation of :func:`metrics.roc_curve` and :func:`metrics.precision_recall_curve`. By `Joel Nothman`_.
The new estimator :class:`sklearn.decomposition.TruncatedSVD` performs dimensionality reduction using SVD on sparse matrices, and can be used for latent semantic analysis (LSA). By `Lars Buitinck`_.
Added self-contained example of out-of-core learning on text data :ref:`sphx_glr_auto_examples_applications_plot_out_of_core_classification.py`. By :user:`Eustache Diemert <oddskool>`.
The default number of components for :class:`sklearn.decomposition.RandomizedPCA` is now correctly documented to be n_features. This was the default behavior, so programs using it will continue to work as they did.
:class:`sklearn.cluster.KMeans` now fits several orders of magnitude faster on sparse data (the speedup depends on the sparsity). By `Lars Buitinck`_.
Reduce memory footprint of FastICA by `Denis Engemann`_ and `Alexandre Gramfort`_.
Verbose output in :mod:`sklearn.ensemble.gradient_boosting` now uses a column format and prints progress in decreasing frequency. It also shows the remaining time. By `Peter Prettenhofer`_.
:mod:`sklearn.ensemble.gradient_boosting` provides out-of-bag improvement :attr:`~sklearn.ensemble.GradientBoostingRegressor.oob_improvement_` rather than the OOB score for model selection. An example that shows how to use OOB estimates to select the number of trees was added. By `Peter Prettenhofer`_.
Most metrics now support string labels for multiclass classification by `Arnaud Joly`_ and `Lars Buitinck`_.
New OrthogonalMatchingPursuitCV class by `Alexandre Gramfort`_ and `Vlad Niculae`_.
Fixed a bug in :class:`sklearn.covariance.GraphLassoCV`: the 'alphas' parameter now works as expected when given a list of values. By Philippe Gervais.
Fixed an important bug in :class:`sklearn.covariance.GraphLassoCV` that prevented all folds provided by a CV object to be used (only the first 3 were used). When providing a CV object, execution time may thus increase significantly compared to the previous version (bug results are correct now). By Philippe Gervais.
:class:`cross_validation.cross_val_score` and the :mod:`grid_search` module is now tested with multi-output data by `Arnaud Joly`_.
:func:`datasets.make_multilabel_classification` can now return the output in label indicator multilabel format by `Arnaud Joly`_.
K-nearest neighbors, :class:`neighbors.KNeighborsRegressor` and :class:`neighbors.RadiusNeighborsRegressor`, and radius neighbors, :class:`neighbors.RadiusNeighborsRegressor` and :class:`neighbors.RadiusNeighborsClassifier` support multioutput data by `Arnaud Joly`_.
Random state in LibSVM-based estimators (:class:`svm.SVC`, :class:`NuSVC`, :class:`OneClassSVM`, :class:`svm.SVR`, :class:`svm.NuSVR`) can now be controlled. This is useful to ensure consistency in the probability estimates for the classifiers trained with probability=True. By `Vlad Niculae`_.
Out-of-core learning support for discrete naive Bayes classifiers :class:`sklearn.naive_bayes.MultinomialNB` and :class:`sklearn.naive_bayes.BernoulliNB` by adding the partial_fit method by `Olivier Grisel`_.
New website design and navigation by `Gilles Louppe`_, `Nelle Varoquaux`_, Vincent Michel and `Andreas Müller`_.
Improved documentation on :ref:`multi-class, multi-label and multi-output classification <multiclass>` by `Yannick Schwartz`_ and `Arnaud Joly`_.
Better input and error handling in the :mod:`metrics` module by `Arnaud Joly`_ and `Joel Nothman`_.
Speed optimization of the :mod:`hmm` module by :user:`Mikhail Korobov <kmike>`
Significant speed improvements for :class:`sklearn.cluster.DBSCAN` by cleverless

API changes summary

The :func:`auc_score` was renamed :func:`roc_auc_score`.
Testing scikit-learn with sklearn.test() is deprecated. Use nosetests sklearn from the command line.
Feature importances in :class:`tree.DecisionTreeClassifier`, :class:`tree.DecisionTreeRegressor` and all derived ensemble estimators are now computed on the fly when accessing the feature_importances_ attribute. Setting compute_importances=True is no longer required. By `Gilles Louppe`_.
:class:`linear_model.lasso_path` and :class:`linear_model.enet_path` can return its results in the same format as that of :class:`linear_model.lars_path`. This is done by setting the return_models parameter to False. By `Jaques Grobler`_ and `Alexandre Gramfort`_
:class:`grid_search.IterGrid` was renamed to :class:`grid_search.ParameterGrid`.
Fixed bug in :class:`KFold` causing imperfect class balance in some cases. By `Alexandre Gramfort`_ and Tadej Janež.
:class:`sklearn.neighbors.BallTree` has been refactored, and a :class:`sklearn.neighbors.KDTree` has been added which shares the same interface. The Ball Tree now works with a wide variety of distance metrics. Both classes have many new methods, including single-tree and dual-tree queries, breadth-first and depth-first searching, and more advanced queries such as kernel density estimation and 2-point correlation functions. By `Jake Vanderplas`_
Support for scipy.spatial.cKDTree within neighbors queries has been removed, and the functionality replaced with the new :class:`KDTree` class.
:class:`sklearn.neighbors.KernelDensity` has been added, which performs efficient kernel density estimation with a variety of kernels.
:class:`sklearn.decomposition.KernelPCA` now always returns output with n_components components, unless the new parameter remove_zero_eig is set to True. This new behavior is consistent with the way kernel PCA was always documented; previously, the removal of components with zero eigenvalues was tacitly performed on all data.
gcv_mode="auto" no longer tries to perform SVD on a densified sparse matrix in :class:`sklearn.linear_model.RidgeCV`.
Sparse matrix support in :class:`sklearn.decomposition.RandomizedPCA` is now deprecated in favor of the new TruncatedSVD.
:class:`cross_validation.KFold` and :class:`cross_validation.StratifiedKFold` now enforce n_folds >= 2 otherwise a ValueError is raised. By `Olivier Grisel`_.
:func:`datasets.load_files`'s charset and charset_errors parameters were renamed encoding and decode_errors.
Attribute oob_score_ in :class:`sklearn.ensemble.GradientBoostingRegressor` and :class:`sklearn.ensemble.GradientBoostingClassifier` is deprecated and has been replaced by oob_improvement_ .
Attributes in OrthogonalMatchingPursuit have been deprecated (copy_X, Gram, ...) and precompute_gram renamed precompute for consistency. See #2224.
:class:`sklearn.preprocessing.StandardScaler` now converts integer input to float, and raises a warning. Previously it rounded for dense integer input.
:class:`sklearn.multiclass.OneVsRestClassifier` now has a decision_function method. This will return the distance of each sample from the decision boundary for each class, as long as the underlying estimators implement the decision_function method. By `Kyle Kastner`_.
Better input validation, warning on unexpected shapes for y.

People

List of contributors for release 0.14 by number of commits.

277 Gilles Louppe

245 Lars Buitinck

187 Andreas Mueller

124 Arnaud Joly

112 Jaques Grobler

109 Gael Varoquaux

107 Olivier Grisel

102 Noel Dawe

99 Kemal Eren

79 Joel Nothman

75 Jake VanderPlas

73 Nelle Varoquaux

71 Vlad Niculae

65 Peter Prettenhofer

64 Alexandre Gramfort

54 Mathieu Blondel

38 Nicolas Trésegnie

35 eustache

27 Denis Engemann

25 Yann N. Dauphin

19 Justin Vincent

17 Robert Layton

15 Doug Coleman

14 Michael Eickenberg

13 Robert Marchman

11 Fabian Pedregosa

11 Philippe Gervais

10 Jim Holmström

10 Tadej Janež

10 syhw

9 Mikhail Korobov

9 Steven De Gryze

8 sergeyf

7 Ben Root

7 Hrishikesh Huilgolkar

6 Kyle Kastner

6 Martin Luessi

6 Rob Speer

5 Federico Vaggi

5 Raul Garreta

5 Rob Zinkov

4 Ken Geis

3 A. Flaxman

3 Denton Cockburn

3 Dougal Sutherland

3 Ian Ozsvald

3 Johannes Schönberger

3 Robert McGibbon

3 Roman Sinayev

3 Szabo Roland

2 Diego Molla

2 Imran Haque

2 Jochen Wersdörfer

2 Sergey Karayev

2 Yannick Schwartz

2 jamestwebber

1 Abhijeet Kolhe

1 Alexander Fabisch

1 Bastiaan van den Berg

1 Benjamin Peterson

1 Daniel Velkov

1 Fazlul Shahriar

1 Felix Brockherde

1 Félix-Antoine Fortin

1 Harikrishnan S

1 Jack Hale

1 JakeMick

1 James McDermott

1 John Benediktsson

1 John Zwinck

1 Joshua Vredevoogd

1 Justin Pati

1 Kevin Hughes

1 Kyle Kelley

1 Matthias Ekman

1 Miroslav Shubernetskiy

1 Naoki Orii

1 Norbert Crombach

1 Rafael Cunha de Almeida

1 Rolando Espinoza La fuente

1 Seamus Abshere

1 Sergey Feldman

1 Sergio Medina

1 Stefano Lattarini

1 Steve Koch

1 Sturla Molden

1 Thomas Jarosch

1 Yaroslav Halchenko

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.14.rst.txt

v0.14.rst.txt

Version 0.14

Changelog

API changes summary

People

Files

v0.14.rst.txt

Latest commit

History

v0.14.rst.txt

File metadata and controls

Version 0.14

Changelog

API changes summary

People