tf.signal.linear_to_mel_weight_matrix
Returns a matrix to warp linear scale spectrograms to the mel scale.
tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=20,
    num_spectrogram_bins=129,
    sample_rate=8000,
    lower_edge_hertz=125.0,
    upper_edge_hertz=3800.0,
    dtype=tf.dtypes.float32,
    name=None
)
Returns a weight matrix that re-weights a Tensor containing num_spectrogram_bins bins of linearly sampled frequency information from [0, sample_rate / 2] into num_mel_bins bands of frequency information from [lower_edge_hertz, upper_edge_hertz] on the mel scale.
This function follows the Hidden Markov Model Toolkit (HTK) convention, defining the mel scale in terms of a frequency in hertz according to the following formula:
$$\textrm{mel}(f) = 2595 \cdot \log_{10}\left(1 + \frac{f}{700}\right)$$
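As a minimal sketch of the formula above, the conversion can be written in plain Python. The names `hertz_to_mel` and `mel_to_hertz` are illustrative helpers, not part of the TensorFlow API; the inverse is simply the formula solved for `f`.

```python
import math


def hertz_to_mel(frequency_hertz):
    """HTK mel scale: mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + frequency_hertz / 700.0)


def mel_to_hertz(mel):
    """Algebraic inverse of hertz_to_mel (illustrative helper)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The default band edges from the signature, expressed in mels.
lower_mel = hertz_to_mel(125.0)
upper_mel = hertz_to_mel(3800.0)
print(lower_mel, upper_mel)
```

Because the mapping is logarithmic, equal steps in mels correspond to progressively wider steps in hertz, which is why the higher mel bands span more spectrogram bins than the lower ones.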
In the returned matrix, all the triangles (filterbanks) have a peak value
of 1.0.
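To illustrate how such triangular filters can be laid out, here is a NumPy sketch under the HTK convention described above. It is an assumption-laden illustration, not TensorFlow's implementation: the function name `mel_filterbank_sketch` is hypothetical, and it assumes the band edges are equally spaced on the mel scale.

```python
import numpy as np


def hertz_to_mel(f):
    """HTK mel scale applied elementwise."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_filterbank_sketch(num_mel_bins=20, num_spectrogram_bins=129,
                          sample_rate=8000, lower_edge_hertz=125.0,
                          upper_edge_hertz=3800.0):
    """Hypothetical helper: triangular weights, shape [bins, mel_bins]."""
    # Frequency of every linear bin, from 0 Hz up to the Nyquist frequency.
    bin_hertz = np.linspace(0.0, sample_rate / 2.0, num_spectrogram_bins)
    bin_mel = hertz_to_mel(bin_hertz)[:, np.newaxis]  # [bins, 1]

    # num_mel_bins + 2 edges, equally spaced in mels (assumption).
    edges = np.linspace(hertz_to_mel(lower_edge_hertz),
                        hertz_to_mel(upper_edge_hertz),
                        num_mel_bins + 2)
    lower, center, upper = edges[:-2], edges[1:-1], edges[2:]

    # Rising and falling sides of each triangle, clipped at zero.
    rising = (bin_mel - lower) / (center - lower)
    falling = (upper - bin_mel) / (upper - center)
    return np.maximum(0.0, np.minimum(rising, falling))


A_sketch = mel_filterbank_sketch()
print(A_sketch.shape)
```

In this sketch each continuous triangle peaks at 1.0 at its centre; the largest sampled weight in a column can be slightly lower when no spectrogram bin falls exactly on the band centre.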
For example, the returned matrix A can be used to right-multiply a spectrogram S of shape [frames, num_spectrogram_bins] of linear scale spectrum values (e.g. STFT magnitudes) to generate a "mel spectrogram" M of shape [frames, num_mel_bins].
# `S` has shape [frames, num_spectrogram_bins]
# `M` has shape [frames, num_mel_bins]
M = tf.matmul(S, A)
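For context, an end-to-end version of the snippet above might look like the following sketch. The 440 Hz test tone and the STFT frame settings are assumptions chosen only so that the spectrogram has the default 129 bins; they do not come from this page.

```python
import numpy as np
import tensorflow as tf

sample_rate = 8000

# One second of a 440 Hz tone as a stand-in input signal (assumption).
t = np.arange(sample_rate, dtype=np.float32) / sample_rate
signal = tf.constant(np.sin(2.0 * np.pi * 440.0 * t))

# With fft_length=256 the STFT keeps 256 // 2 + 1 = 129 non-redundant bins.
stft = tf.signal.stft(signal, frame_length=256, frame_step=128,
                      fft_length=256)
S = tf.abs(stft)  # magnitudes, shape [frames, 129]

A = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=20,
    num_spectrogram_bins=S.shape[-1],
    sample_rate=sample_rate,
    lower_edge_hertz=125.0,
    upper_edge_hertz=3800.0,
)  # shape [129, 20]

M = tf.matmul(S, A)  # mel spectrogram, shape [frames, 20]
print(M.shape)
```

Passing `S.shape[-1]` for `num_spectrogram_bins` keeps the weight matrix consistent with whatever FFT size produced the spectrogram.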
The matrix can be used with tf.tensordot to convert an arbitrary rank Tensor of linear-scale spectral bins into the mel scale.
# S has shape [..., num_spectrogram_bins].
# M has shape [..., num_mel_bins].
M = tf.tensordot(S, A, 1)
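The shape behaviour of the contraction can be checked with a small NumPy sketch. NumPy's `np.tensordot` is used here as a stand-in because it follows the same `axes=1` convention (contract the last axis of the first argument with the first axis of the second); the random arrays are placeholders, not real spectrograms or mel weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: a batch of spectrograms with extra leading dimensions.
S = rng.random((2, 3, 5, 129))   # [..., num_spectrogram_bins]
A = rng.random((129, 20))        # [num_spectrogram_bins, num_mel_bins]

# axes=1 contracts the last axis of S with the first axis of A.
M = np.tensordot(S, A, 1)        # [..., num_mel_bins]

# Equivalent formulation: flatten the leading axes, multiply, restore them.
M_flat = (S.reshape(-1, 129) @ A).reshape(2, 3, 5, 20)
print(M.shape)
```

All leading dimensions are carried through unchanged; only the final frequency axis is replaced by the mel axis.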
Args
| num_mel_bins | Python int. How many bands in the resulting mel spectrum. |
| num_spectrogram_bins | An integer Tensor. How many bins there are in the source spectrogram data, which is understood to be fft_size // 2 + 1, i.e. the spectrogram only contains the nonredundant FFT bins. |
| sample_rate | An integer or float Tensor. Samples per second of the input signal used to create the spectrogram. Used to figure out the frequencies corresponding to each spectrogram bin, which dictates how they are mapped into the mel scale. |
| lower_edge_hertz | Python float. Lower bound on the frequencies to be included in the mel spectrum. This corresponds to the lower edge of the lowest triangular band. |
| upper_edge_hertz | Python float. The desired top edge of the highest frequency band. |
| dtype | The DType of the result matrix. Must be a floating point type. |
| name | An optional name for the operation. |
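The relationship between the FFT size, num_spectrogram_bins, and sample_rate described in the arguments can be sketched as follows. The value `fft_size = 256` is an assumption chosen to reproduce the default of 129 bins.

```python
import numpy as np

fft_size = 256        # assumed FFT length
sample_rate = 8000    # samples per second

# Only the non-redundant half of the FFT is kept.
num_spectrogram_bins = fft_size // 2 + 1

# Bins are linearly spaced from 0 Hz to the Nyquist frequency.
bin_hertz = np.linspace(0.0, sample_rate / 2.0, num_spectrogram_bins)
bin_spacing_hertz = sample_rate / fft_size

print(num_spectrogram_bins, bin_hertz[:3], bin_spacing_hertz)
```

These per-bin frequencies are what determine how each linear bin is mapped onto the mel bands.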
Returns
A Tensor of shape [num_spectrogram_bins, num_mel_bins].
Raises
| ValueError | If any of num_mel_bins, num_spectrogram_bins, or sample_rate is not positive, if lower_edge_hertz is negative, if the frequency edges are incorrectly ordered, or if upper_edge_hertz is larger than the Nyquist frequency (sample_rate / 2). |
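The documented error conditions can be mirrored in a small pre-flight check, for example to validate configuration before building a graph. This is a sketch only: `check_mel_args` is a hypothetical helper, it assumes plain Python numbers rather than Tensors, and it treats equal edges as incorrectly ordered, which is an assumption about the boundary case.

```python
def check_mel_args(num_mel_bins, num_spectrogram_bins, sample_rate,
                   lower_edge_hertz, upper_edge_hertz):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if num_mel_bins <= 0:
        raise ValueError("num_mel_bins must be positive")
    if num_spectrogram_bins <= 0:
        raise ValueError("num_spectrogram_bins must be positive")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if lower_edge_hertz < 0:
        raise ValueError("lower_edge_hertz must be non-negative")
    # Assumption: equal edges count as incorrectly ordered.
    if lower_edge_hertz >= upper_edge_hertz:
        raise ValueError("lower_edge_hertz must be below upper_edge_hertz")
    nyquist_hertz = sample_rate / 2.0
    if upper_edge_hertz > nyquist_hertz:
        raise ValueError("upper_edge_hertz must not exceed the Nyquist frequency")


# The documented defaults pass all checks.
check_mel_args(20, 129, 8000, 125.0, 3800.0)
```

With the default sample_rate of 8000, the Nyquist frequency is 4000 Hz, so the default upper_edge_hertz of 3800.0 sits just inside the allowed range.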
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.
Last updated 2024-04-26 UTC.