.. _metrics:
Pairwise metrics, Affinities and Kernels
========================================
The :mod:`sklearn.metrics.pairwise` submodule implements utilities to evaluate
pairwise distances or affinity of sets of samples.
This module contains both distance metrics and kernels. A brief summary of
the two is given here.
Distance metrics are functions ``d(a, b)`` such that ``d(a, b) < d(a, c)``
if objects ``a`` and ``b`` are considered "more similar" than objects ``a``
and ``c``. Two objects exactly alike would have a distance of zero.
One of the most popular examples is Euclidean distance.
To be a 'true' metric, it must obey the following four conditions::
1. d(a, b) >= 0, for all a and b
2. d(a, b) == 0, if and only if a = b, positive definiteness
3. d(a, b) == d(b, a), symmetry
4. d(a, c) <= d(a, b) + d(b, c), the triangle inequality
Kernels are measures of similarity, i.e. ``s(a, b) > s(a, c)``
if objects ``a`` and ``b`` are considered "more similar" than objects
``a`` and ``c``. A kernel must also be positive semi-definite.
There are a number of ways to convert between a distance metric and a
similarity measure, such as a kernel. Let ``D`` be the distance, and ``S`` be
the kernel:
1. ``S = np.exp(-D * gamma)``, where one heuristic for choosing
``gamma`` is ``1 / num_features`` (see the sketch after this list)
2. ``S = 1. / (D / np.max(D))``
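As an illustration, here is a minimal sketch of the first conversion applied to a
Euclidean distance matrix computed with :func:`euclidean_distances` (the toy data and
the choice of ``gamma`` are made up for this example, and the exact output formatting
depends on the NumPy version)::

    >>> import numpy as np
    >>> from sklearn.metrics.pairwise import euclidean_distances
    >>> X = np.array([[0., 0.], [1., 1.]])
    >>> D = euclidean_distances(X)    # pairwise Euclidean distances
    >>> gamma = 1. / X.shape[1]       # heuristic: 1 / num_features
    >>> S = np.exp(-D * gamma)        # conversion 1: distance -> similarity
    >>> S
    array([[1.        , 0.49...],
           [0.49..., 1.        ]])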
.. currentmodule:: sklearn.metrics.pairwise
.. _cosine_similarity:
Cosine similarity
-----------------
:func:`cosine_similarity` computes the L2-normalized dot product of vectors.
That is, if :math:`x` and :math:`y` are row vectors,
their cosine similarity :math:`k` is defined as:
.. math::
k(x, y) = \frac{x y^\top}{\|x\| \|y\|}
This is called cosine similarity, because Euclidean (L2) normalization
projects the vectors onto the unit sphere,
and their dot product is then the cosine of the angle between the points
denoted by the vectors.
This kernel is a popular choice for computing the similarity of documents
represented as tf-idf vectors.
:func:`cosine_similarity` accepts ``scipy.sparse`` matrices.
(Note that the tf-idf functionality in ``sklearn.feature_extraction.text``
can produce normalized vectors, in which case :func:`cosine_similarity`
is equivalent to :func:`linear_kernel`, only slower.)
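For instance, a minimal usage sketch on two made-up row vectors (the exact output
formatting depends on the NumPy version)::

    >>> from sklearn.metrics.pairwise import cosine_similarity
    >>> X = [[1., 0., 1.], [0., 1., 1.]]
    >>> cosine_similarity(X)    # pairwise cosine similarities of the rows of X
    array([[1. , 0.5],
           [0.5, 1. ]])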
.. topic:: References:
* C.D. Manning, P. Raghavan and H. Schütze (2008). Introduction to
Information Retrieval. Cambridge University Press.
http://nlp.stanford.edu/IR-book/html/htmledition/the-vector-space-model-for-scoring-1.html
.. _linear_kernel:
Linear kernel
-------------
The function :func:`linear_kernel` computes the linear kernel, that is, a
special case of :func:`polynomial_kernel` with ``degree=1`` and ``coef0=0`` (homogeneous).
If ``x`` and ``y`` are column vectors, their linear kernel is:
.. math::
k(x, y) = x^\top y
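For example, a short sketch (with made-up data) showing that the linear kernel is
simply the matrix of pairwise dot products; output formatting may vary with the
NumPy version::

    >>> import numpy as np
    >>> from sklearn.metrics.pairwise import linear_kernel
    >>> X = np.array([[1., 2.], [3., 4.]])
    >>> linear_kernel(X, X)    # equivalent to np.dot(X, X.T)
    array([[ 5., 11.],
           [11., 25.]])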
.. _polynomial_kernel:
Polynomial kernel
-----------------
The function :func:`polynomial_kernel` computes the degree-d polynomial kernel
between two vectors. The polynomial kernel represents the similarity between two
vectors. Conceptually, the polynomial kernel considers not only the similarity
between vectors along the same dimension, but also across dimensions. When used
in machine learning algorithms, this allows feature interactions to be taken
into account.
The polynomial kernel is defined as:
.. math::
k(x, y) = (\gamma x^\top y +c_0)^d
where:
* ``x``, ``y`` are the input vectors
* ``d`` is the kernel degree
If :math:`c_0 = 0` the kernel is said to be homogeneous.
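A minimal sketch with made-up data and explicitly chosen ``degree``, ``gamma`` and
``coef0`` parameters of :func:`polynomial_kernel` (output formatting may vary with
the NumPy version)::

    >>> from sklearn.metrics.pairwise import polynomial_kernel
    >>> X = [[0., 1.], [1., 1.]]
    >>> polynomial_kernel(X, degree=2, gamma=1, coef0=1)    # (x . y + 1) ** 2 per pair
    array([[4., 4.],
           [4., 9.]])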
.. _sigmoid_kernel:
Sigmoid kernel
--------------
The function :func:`sigmoid_kernel` computes the sigmoid kernel between two
vectors. The sigmoid kernel is also known as the hyperbolic tangent kernel, or the
Multilayer Perceptron kernel (because, in the neural network field, it is often
used as a neuron activation function). It is defined as:
.. math::
k(x, y) = \tanh( \gamma x^\top y + c_0)
where:
* ``x``, ``y`` are the input vectors
* :math:`\gamma` is known as slope
* :math:`c_0` is known as intercept
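A minimal sketch with made-up data, using the ``gamma`` (slope) and ``coef0``
(intercept) parameters of :func:`sigmoid_kernel` (values shown approximately)::

    >>> from sklearn.metrics.pairwise import sigmoid_kernel
    >>> X = [[0., 1.], [1., 1.]]
    >>> sigmoid_kernel(X, gamma=0.5, coef0=0)    # tanh(0.5 * x . y) per pair
    array([[0.46..., 0.46...],
           [0.46..., 0.76...]])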
.. _rbf_kernel:
RBF kernel
----------
The function :func:`rbf_kernel` computes the radial basis function (RBF) kernel
between two vectors. This kernel is defined as:
.. math::
k(x, y) = \exp( -\gamma \| x-y \|^2)
where ``x`` and ``y`` are the input vectors. If :math:`\gamma = \sigma^{-2}`
the kernel is known as the Gaussian kernel of variance :math:`\sigma^2`.
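A minimal sketch with made-up data (values shown approximately; ``gamma`` is a
keyword parameter of :func:`rbf_kernel`)::

    >>> from sklearn.metrics.pairwise import rbf_kernel
    >>> X = [[0., 0.], [1., 0.]]
    >>> rbf_kernel(X, gamma=1)    # exp(-||x - y||^2) per pair
    array([[1.        , 0.36...],
           [0.36..., 1.        ]])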
.. _laplacian_kernel:
Laplacian kernel
----------------
The function :func:`laplacian_kernel` is a variant on the radial basis
function kernel defined as:
.. math::
k(x, y) = \exp( -\gamma \| x-y \|_1)
where ``x`` and ``y`` are the input vectors and :math:`\|x-y\|_1` is the
Manhattan distance between the input vectors.
It has proven useful in machine learning applied to noiseless data.
See e.g. `Machine learning for quantum mechanics in a nutshell
<http://onlinelibrary.wiley.com/doi/10.1002/qua.24954/abstract/>`_.
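A minimal sketch with made-up data (values shown approximately)::

    >>> from sklearn.metrics.pairwise import laplacian_kernel
    >>> X = [[0., 0.], [1., 1.]]
    >>> laplacian_kernel(X, gamma=1)    # exp(-||x - y||_1) per pair
    array([[1.        , 0.13...],
           [0.13..., 1.        ]])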
.. _chi2_kernel:
Chi-squared kernel
------------------
The chi-squared kernel is a very popular choice for training non-linear SVMs in
computer vision applications.
It can be computed using :func:`chi2_kernel` and then passed to an
:class:`sklearn.svm.SVC` with ``kernel="precomputed"``::
>>> from sklearn.svm import SVC
>>> from sklearn.metrics.pairwise import chi2_kernel
>>> X = [[0, 1], [1, 0], [.2, .8], [.7, .3]]
>>> y = [0, 1, 0, 1]
>>> K = chi2_kernel(X, gamma=.5)
>>> K # doctest: +ELLIPSIS
array([[ 1. , 0.36..., 0.89..., 0.58...],
[ 0.36..., 1. , 0.51..., 0.83...],
[ 0.89..., 0.51..., 1. , 0.77... ],
[ 0.58..., 0.83..., 0.77... , 1. ]])
>>> svm = SVC(kernel='precomputed').fit(K, y)
>>> svm.predict(K)
array([0, 1, 0, 1])
It can also be directly used as the ``kernel`` argument::
>>> svm = SVC(kernel=chi2_kernel).fit(X, y)
>>> svm.predict(X)
array([0, 1, 0, 1])
The chi-squared kernel is given by
.. math::
k(x, y) = \exp \left (-\gamma \sum_i \frac{(x[i] - y[i]) ^ 2}{x[i] + y[i]} \right )
The data is assumed to be non-negative, and is often normalized to have an L1-norm of one.
The normalization is motivated by the connection to the chi-squared distance,
which is a distance between discrete probability distributions.
The chi-squared kernel is most commonly used on histograms (bags) of visual words.
.. topic:: References:
* Zhang, J. and Marszalek, M. and Lazebnik, S. and Schmid, C.
Local features and kernels for classification of texture and object
categories: A comprehensive study
International Journal of Computer Vision 2007
http://research.microsoft.com/en-us/um/people/manik/projects/trade-off/papers/ZhangIJCV06.pdf