6.7. Kernel Approximation
See also: Polynomial regression: extending linear models with basis functions for an exact polynomial transformation.
6.7.1. Nystroem Method for Kernel Approximation

The Nystroem method, as implemented in Nystroem, is a general method for low-rank approximations of kernels. It achieves this by essentially subsampling the data on which the kernel is evaluated. By default Nystroem uses the rbf kernel, but it can use any kernel function or a precomputed kernel matrix. The number of samples used - which is also the dimensionality of the features computed - is given by the parameter n_components.
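For illustration, a minimal usage sketch (the random data, gamma, and n_components values below are arbitrary choices, not recommendations):

>>> import numpy as np
>>> from sklearn.kernel_approximation import Nystroem
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((500, 10))                 # 500 samples, 10 features
>>> feature_map = Nystroem(kernel='rbf', gamma=0.2, n_components=100, random_state=0)
>>> X_transformed = feature_map.fit_transform(X)     # subsamples 100 points, then maps all 500
>>> X_transformed.shape                              # dimensionality equals n_components
(500, 100)

The transformed features can then be passed to any linear estimator in place of the original data.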
6.7.2. Radial Basis Function Kernel

The RBFSampler constructs an approximate mapping for the radial basis function (RBF) kernel. The mapping relies on a Monte Carlo approximation to the kernel values. The fit function performs the Monte Carlo sampling, whereas the transform method performs the mapping of the data. Because of the inherent randomness of the process, results may vary between different calls to the fit function.
The fit function takes two arguments: n_components, which is the target dimensionality of the feature transform, and gamma, the
parameter of the RBF-kernel. A higher n_components will result in a better approximation of the kernel and will yield results more
similar to those produced by a kernel SVM. Note that “fitting” the feature function does not actually depend on the data given to the
fit function. Only the dimensionality of the data is used. Details on the method can be found in [RR2007].
For a given value of n_components, RBFSampler is often less accurate than Nystroem. RBFSampler is cheaper to compute, though, making use of larger feature spaces more efficient.
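As a sketch of typical usage (the toy data and parameter values are illustrative only), the approximate map can be fed to a linear classifier such as SGDClassifier:

>>> from sklearn.kernel_approximation import RBFSampler
>>> from sklearn.linear_model import SGDClassifier
>>> X = [[0, 0], [1, 1], [1, 0], [0, 1]]
>>> y = [0, 0, 1, 1]
>>> rbf_feature = RBFSampler(gamma=1, random_state=1)       # fit draws the random Fourier directions
>>> X_features = rbf_feature.fit_transform(X)
>>> X_features.shape                                        # default n_components is 100
(4, 100)
>>> clf = SGDClassifier(max_iter=1000).fit(X_features, y)   # linear model on the approximate map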
6.7.3. Additive Chi Squared Kernel
The additive chi squared kernel is a kernel on histograms, often used in computer vision.

The additive chi squared kernel as used here is given by

k(x, y) = \sum_i \frac{2 x_i y_i}{x_i + y_i}

This is not exactly the same as sklearn.metrics.pairwise.additive_chi2_kernel. The authors of [VZ2010] prefer the version above as it is always positive definite. Since the kernel is additive, it is possible to treat all components x_i separately for embedding. This makes it possible to sample the Fourier transform in regular intervals, instead of approximating using Monte Carlo sampling.

The class AdditiveChi2Sampler implements this component-wise deterministic sampling. Each component is sampled n times, yielding 2n + 1 dimensions per input dimension (the multiple of two stems from the real and complex part of the Fourier transform). In the literature, n is usually chosen to be 1 or 2, transforming the dataset to size n_samples * 5 * n_features (in the case of n = 2).
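A small usage sketch (the random histogram-like data is illustrative; the sample_steps parameter controls the number of sampling points and is related to the n discussed above):

>>> import numpy as np
>>> from sklearn.kernel_approximation import AdditiveChi2Sampler
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((5, 10))                      # histogram-like data must be non-negative
>>> chi2_feature = AdditiveChi2Sampler(sample_steps=2)  # deterministic: no random_state needed
>>> X_transformed = chi2_feature.fit_transform(X)       # each input dimension expands into several Fourier features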
The approximate feature map provided by AdditiveChi2Sampler can be combined with the approximate feature map provided by
RBFSampler to yield an approximate feature map for the exponentiated chi squared kernel. See [VZ2010] for details and [VVZ2010] for the combination with the RBFSampler.
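One way to realize this combination is to chain the two transformers in a pipeline; the step order follows the description above, while the gamma value and the final classifier are illustrative assumptions:

>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.kernel_approximation import AdditiveChi2Sampler, RBFSampler
>>> from sklearn.linear_model import SGDClassifier
>>> exp_chi2_model = make_pipeline(
...     AdditiveChi2Sampler(sample_steps=2),    # approximate additive chi squared feature map
...     RBFSampler(gamma=0.5, random_state=0),  # approximate RBF map on top of it
...     SGDClassifier(max_iter=1000),           # linear classifier in the combined feature space
... )

Fitting exp_chi2_model on non-negative histogram data then approximates training with the exponentiated chi squared kernel.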
6.7.4. Skewed Chi Squared Kernel

The skewed chi squared kernel is given by

k(x, y) = \prod_i \frac{2 \sqrt{x_i + c} \sqrt{y_i + c}}{x_i + y_i + 2c}
It has properties that are similar to the exponentiated chi squared kernel often used in computer vision, but allows for a simple
Monte Carlo approximation of the feature map.
The usage of the SkewedChi2Sampler is the same as the usage described above for the RBFSampler. The only difference is in the
free parameter, which is called c. For a motivation for this mapping and the mathematical details see [LS2010].
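A brief usage sketch (the random data and parameter values are arbitrary); the free parameter c is exposed as the skewedness constructor argument:

>>> import numpy as np
>>> from sklearn.kernel_approximation import SkewedChi2Sampler
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((5, 10))               # entries must satisfy x_i + c > 0
>>> chi2_feature = SkewedChi2Sampler(skewedness=0.01, n_components=10, random_state=0)
>>> X_features = chi2_feature.fit_transform(X)   # Monte Carlo feature map, as for RBFSampler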
6.7.5. Polynomial Kernel Approximation via Tensor Sketch

The polynomial kernel is a popular type of kernel function given by

k(x, y) = (\gamma x^\top y + c_0)^d

where x and y are the input vectors and d is the kernel degree.
Intuitively, the feature space of the polynomial kernel of degree d consists of all possible degree-d products among input features,
which enables learning algorithms using this kernel to account for interactions between features.
The TensorSketch [PP2013] method, as implemented in PolynomialCountSketch, is a scalable, input data independent method for
polynomial kernel approximation. It is based on the concept of Count sketch [WIKICS] [CCF2002], a dimensionality reduction
technique similar to feature hashing, which instead uses several independent hash functions. TensorSketch obtains a Count Sketch
of the outer product of two vectors (or a vector with itself), which can be used as an approximation of the polynomial kernel feature
space. In particular, instead of explicitly computing the outer product, TensorSketch computes the Count Sketch of the vectors and
then uses polynomial multiplication via the Fast Fourier Transform to compute the Count Sketch of their outer product.
Conveniently, the training phase of TensorSketch simply consists of initializing some random variables. It is thus independent of the
input data, i.e. it only depends on the number of input features, but not the data values. In addition, this method can transform
samples in O(n_samples (n_features + n_components log(n_components))) time, where n_components is the desired output dimension, determined by the n_components parameter.
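A minimal sketch of PolynomialCountSketch (the random data and parameter values are illustrative):

>>> import numpy as np
>>> from sklearn.kernel_approximation import PolynomialCountSketch
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((100, 20))
>>> ps = PolynomialCountSketch(degree=2, gamma=1.0, coef0=0, n_components=300, random_state=0)
>>> X_features = ps.fit_transform(X)   # fit only draws the random hash functions
>>> X_features.shape
(100, 300)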
6.7.6. Mathematical Details

Kernel methods like support vector machines or kernelized PCA rely on a property of reproducing kernel Hilbert spaces. For any positive definite kernel function k (a so-called Mercer kernel), it is guaranteed that there exists a mapping ϕ into a Hilbert space H, such that

k(x, y) = \langle \phi(x), \phi(y) \rangle

where \langle \cdot, \cdot \rangle denotes the inner product in the Hilbert space.
The classes in this submodule make it possible to approximate the embedding ϕ, thereby working explicitly with the representations ϕ(x_i), which obviates the need to apply the kernel or store training examples.
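This relation can be checked empirically with any of the approximations in this module; the sketch below (using Nystroem on random data with arbitrary parameter values) compares inner products of the approximate features with the exact kernel matrix:

>>> import numpy as np
>>> from sklearn.kernel_approximation import Nystroem
>>> from sklearn.metrics.pairwise import rbf_kernel
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((200, 5))
>>> K_exact = rbf_kernel(X, gamma=1.0)               # exact kernel values k(x, y)
>>> feature_map = Nystroem(kernel='rbf', gamma=1.0, n_components=100, random_state=0)
>>> X_phi = feature_map.fit_transform(X)             # explicit approximate features phi(x)
>>> K_approx = X_phi @ X_phi.T                       # <phi(x), phi(y)> approximates k(x, y)
>>> K_approx.shape == K_exact.shape                  # entrywise error shrinks as n_components grows
True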
References:

[RR2007]
“Random features for large-scale kernel machines” Rahimi, A. and Recht, B. - Advances in Neural Information Processing Systems 2007.

[LS2010]
“Random Fourier approximations for skewed multiplicative histogram kernels” Li, F., Ionescu, C., and Sminchisescu, C. - Pattern Recognition, DAGM 2010, Lecture Notes in Computer Science.

[VZ2010]
“Efficient additive kernels via explicit feature maps” Vedaldi, A. and Zisserman, A. - Computer Vision and Pattern Recognition 2010.

[VVZ2010]
“Generalized RBF feature maps for Efficient Detection” Vempati, S., Vedaldi, A., Zisserman, A. and Jawahar, C. V. - 2010.

[PP2013]
“Fast and scalable polynomial kernels via explicit feature maps” Pham, N. and Pagh, R. - 2013.

[CCF2002]
“Finding frequent items in data streams” Charikar, M., Chen, K. and Farach-Colton, M. - 2002.

[WIKICS]
“Wikipedia: Count sketch”
© 2007 - 2023, scikit-learn developers (BSD License).