.. _related_projects:
=====================================
Related Projects
=====================================
Below is a list of sister projects, extensions, and domain-specific packages.
Interoperability and framework enhancements
-------------------------------------------
These tools adapt scikit-learn for use with other technologies or otherwise
enhance the functionality of scikit-learn's estimators.
- `ML Frontend <https://github.com/jeff1evesque/machine-learning>`_ provides
dataset management and SVM fitting/prediction through
`web-based <https://github.com/jeff1evesque/machine-learning#web-interface>`_
and `programmatic <https://github.com/jeff1evesque/machine-learning#programmatic-interface>`_
interfaces.
- `sklearn_pandas <https://github.com/paulgb/sklearn-pandas/>`_ Bridge between
scikit-learn pipelines and pandas data frames, with dedicated transformers.
- `Scikit-Learn Laboratory
<https://skll.readthedocs.io/en/latest/index.html>`_ A command-line
wrapper around scikit-learn that makes it easy to run machine learning
experiments with multiple learners and large feature sets.
- `auto-sklearn <https://github.com/automl/auto-sklearn/>`_
An automated machine learning toolkit and a drop-in replacement for a
scikit-learn estimator.
- `TPOT <https://github.com/rhiever/tpot>`_
An automated machine learning toolkit that optimizes a series of scikit-learn
operators to design a machine learning pipeline, including data and feature
preprocessors as well as the estimators. Works as a drop-in replacement for a
scikit-learn estimator.
- `sklearn-pmml <https://github.com/alex-pirozhenko/sklearn-pmml>`_
Serialization of (some) scikit-learn estimators into PMML.
- `sklearn2pmml <https://github.com/jpmml/sklearn2pmml>`_
Serialization of a wide variety of scikit-learn estimators and transformers
into PMML with the help of `JPMML-SkLearn <https://github.com/jpmml/jpmml-sklearn>`_
library.
Other estimators and tasks
--------------------------
Not everything belongs in, or is mature enough for, the central scikit-learn
project. The following are projects providing interfaces similar to
scikit-learn for additional learning algorithms, infrastructures
and tasks.
- `pylearn2 <http://deeplearning.net/software/pylearn2/>`_ A deep learning and
neural network library built on Theano with a scikit-learn-like interface.
- `sklearn_theano <http://sklearn-theano.github.io/>`_ scikit-learn compatible
estimators, transformers, and datasets which use Theano internally.
- `lightning <https://github.com/scikit-learn-contrib/lightning>`_ Fast state-of-the-art
linear model solvers (SDCA, AdaGrad, SVRG, SAG, etc.).
- `Seqlearn <https://github.com/larsmans/seqlearn>`_ Sequence classification
using HMMs or structured perceptron.
- `HMMLearn <https://github.com/hmmlearn/hmmlearn>`_ Implementation of hidden
Markov models that was previously part of scikit-learn.
- `PyStruct <https://pystruct.github.io>`_ General conditional random fields
and structured prediction.
- `pomegranate <https://github.com/jmschrei/pomegranate>`_ Probabilistic modelling
for Python, with an emphasis on hidden Markov models.
- `py-earth <https://github.com/scikit-learn-contrib/py-earth>`_ Multivariate adaptive
regression splines.
- `sklearn-compiledtrees <https://github.com/ajtulloch/sklearn-compiledtrees/>`_
Generate a C++ implementation of the predict function for decision trees (and
ensembles) trained by sklearn. Useful for latency-sensitive production
environments.
- `lda <https://github.com/ariddell/lda/>`_ Fast implementation of Latent
Dirichlet Allocation in Cython.
- `Sparse Filtering <https://github.com/jmetzen/sparse-filtering>`_
Unsupervised feature learning based on sparse filtering.
- `Kernel Regression <https://github.com/jmetzen/kernel_regression>`_
Implementation of Nadaraya-Watson kernel regression with automatic bandwidth
selection.
- `gplearn <https://github.com/trevorstephens/gplearn>`_ Genetic Programming
for symbolic regression tasks.
- `nolearn <https://github.com/dnouri/nolearn>`_ A number of wrappers and
abstractions around existing neural network libraries.
- `sparkit-learn <https://github.com/lensacom/sparkit-learn>`_ Scikit-learn functionality and API on PySpark.
- `keras <https://github.com/fchollet/keras>`_ Theano-based Deep Learning library.
- `mlxtend <https://github.com/rasbt/mlxtend>`_ Includes a number of additional
estimators as well as model visualization utilities.
- `kmodes <https://github.com/nicodv/kmodes>`_ k-modes clustering algorithm for categorical data, and
several of its variations.
- `hdbscan <https://github.com/lmcinnes/hdbscan>`_ HDBSCAN and Robust Single Linkage clustering algorithms
for robust variable density clustering.
- `lasagne <https://github.com/Lasagne/Lasagne>`_ A lightweight library to build and train neural networks in Theano.
- `multiisotonic <https://github.com/alexfields/multiisotonic>`_ Isotonic regression on multidimensional features.
- `spherecluster <https://github.com/clara-labs/spherecluster>`_ Spherical K-means and mixture of von Mises-Fisher clustering routines for data on the unit hypersphere.
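As a rough illustration of what an interface "similar to scikit-learn" means for the projects above, the sketch below shows a toy classifier that follows the usual estimator conventions: hyperparameters set in ``__init__``, ``fit`` returning ``self``, learned attributes ending in an underscore, and ``predict``/``score`` methods. The class name ``MajorityClassifier`` and its ``fallback`` parameter are hypothetical and do not come from any of the listed packages; the code uses only the Python standard library, so it mimics the conventions without depending on scikit-learn.

```python
from collections import Counter


class MajorityClassifier:
    """Toy estimator (hypothetical) that always predicts the most common label."""

    def __init__(self, fallback=None):
        # Hyperparameters are stored unmodified in __init__.
        self.fallback = fallback

    def get_params(self, deep=True):
        # Expose hyperparameters so tools can inspect and clone the estimator.
        return {"fallback": self.fallback}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        counts = Counter(y)
        # Attributes learned from data end with an underscore.
        self.classes_ = sorted(counts)
        self.majority_ = counts.most_common(1)[0][0] if counts else self.fallback
        # fit returns self so that calls can be chained.
        return self

    def predict(self, X):
        # One prediction per input sample; the features themselves are ignored.
        return [self.majority_ for _ in X]

    def score(self, X, y):
        # Mean accuracy of the predictions against the true labels.
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


clf = MajorityClassifier().fit([[0], [1], [2]], ["a", "b", "a"])
print(clf.predict([[5], [6]]))
```

Because the object exposes ``fit``, ``predict``, ``get_params`` and ``set_params``, code written against these method names can treat it much like a built-in estimator; real third-party estimators typically also inherit from scikit-learn's base classes, which this sketch deliberately avoids.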
Statistical learning with Python
--------------------------------
Other packages useful for data analysis and machine learning.
- `Pandas <http://pandas.pydata.org>`_ Tools for working with heterogeneous and
columnar data, relational queries, time series and basic statistics.
- `theano <http://deeplearning.net/software/theano/>`_ A CPU/GPU array
processing framework geared towards deep learning research.
- `statsmodels <http://statsmodels.sourceforge.net/>`_ Estimating and analysing
statistical models. More focused on statistical tests and less on prediction
than scikit-learn.
- `PyMC <http://pymc-devs.github.io/pymc/>`_ Bayesian statistical models and
fitting algorithms.
- `REP <https://github.com/yandex/REP>`_ Environment for conducting data-driven
research in a consistent and reproducible way.
- `Sacred <https://github.com/IDSIA/Sacred>`_ Tool to help you configure,
organize, log and reproduce experiments.
- `gensim <https://radimrehurek.com/gensim/>`_ A library for topic modelling,
document indexing and similarity retrieval.
- `Seaborn <http://stanford.edu/~mwaskom/software/seaborn/>`_ Visualization library based on
matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
- `Deep Learning <http://deeplearning.net/software_links/>`_ A curated list of deep learning
software libraries.
Domain-specific packages
~~~~~~~~~~~~~~~~~~~~~~~~
- `scikit-image <http://scikit-image.org/>`_ Image processing and computer
vision in Python.
- `Natural language toolkit (nltk) <http://www.nltk.org/>`_ Natural language
processing and some machine learning.
- `NiLearn <https://nilearn.github.io/>`_ Machine learning for neuro-imaging.
- `AstroML <http://www.astroml.org/>`_ Machine learning for astronomy.
- `MSMBuilder <http://msmbuilder.org/>`_ Machine learning for protein
conformational dynamics time series.
Snippets and tidbits
---------------------
The `wiki <https://github.com/scikit-learn/scikit-learn/wiki/Third-party-projects-and-code-snippets>`_ has more!