Soxeffect
NAME
SoX - Sound eXchange, the Swiss Army knife of audio manipulation
SYNOPSIS
sox [global-options] [format-options] infile1
[[format-options] infile2] ... [format-options] outfile
[effect [effect-options]] ...
play [global-options] [format-options] infile1
[[format-options] infile2] ... [format-options]
[effect [effect-options]] ...
rec [global-options] [format-options] outfile
[effect [effect-options]] ...
DESCRIPTION
Introduction
SoX reads and writes audio files in most popular formats and can optionally apply effects to them. It can
combine multiple input sources, synthesise audio, and, on many systems, act as a general purpose audio
player or a multi-track audio recorder. It also has limited ability to split the input into multiple output files.
All SoX functionality is available using just the sox command. To simplify playing and recording audio, if
SoX is invoked as play, the output file is automatically set to be the default sound device, and if invoked as
rec, the default sound device is used as an input source. Additionally, the soxi(1) command provides a con-
venient way to just query audio file header information.
The heart of SoX is a library called libSoX. Those interested in extending SoX or using it in other pro-
grams should refer to the libSoX manual page: libsox(3).
SoX is a command-line audio processing tool, particularly suited to making quick, simple edits and to batch
processing. If you need an interactive, graphical audio editor, use audacity(1).
* * *
The overall SoX processing chain can be summarised as follows:
Input(s) → Combiner → Effects → Output(s)
Note however, that on the SoX command line, the positions of the Output(s) and the Effects are swapped
w.r.t. the logical flow just shown. Note also that whilst options pertaining to files are placed before their re-
spective file name, the opposite is true for effects. To show how this works in practice, here is a selection of
examples of how SoX might be used. The simple
sox recital.au recital.wav
translates an audio file in Sun AU format to a Microsoft WAV file, whilst
sox recital.au -b 16 recital.wav channels 1 rate 16k fade 3 norm
performs the same format translation, but also applies four effects (down-mix to one channel, sample rate
change, fade-in, normalise), and stores the result at a bit-depth of 16.
sox -r 16k -e signed -b 8 -c 1 voice-memo.raw voice-memo.wav
converts ‘raw’ (a.k.a. ‘headerless’) audio to a self-describing file format,
sox slow.aiff fixed.aiff speed 1.027
adjusts audio speed,
sox short.wav long.wav longer.wav
concatenates two audio files, and
sox -m music.mp3 voice.wav mixed.flac
mixes together two audio files.
play "The Moonbeams/Greatest/*.ogg" bass +3
plays a collection of audio files whilst applying a bass boosting effect,
play -n -c1 synth sin %-12 sin %-9 sin %-5 sin %-2 fade h 0.1 1 0.1
plays a synthesised ‘A minor seventh’ chord with a pipe-organ sound,
rec -c 2 radio.aiff trim 0 30:00
records half an hour of stereo audio.
The soxi(1) command can be used to display information from audio file headers.
Determining & Setting The File Format
There are several mechanisms available for SoX to use to determine or set the format characteristics of an
audio file. Depending on the circumstances, individual characteristics may be determined or set using dif-
ferent mechanisms.
To determine the format of an input file, SoX will use, in order of precedence and as given or available:
1. Command-line format options.
2. The contents of the file header.
3. The filename extension.
To set the output file format, SoX will use, in order of precedence and as given or available:
1. Command-line format options.
2. The filename extension.
3. The input file format characteristics, or the closest that is supported by the output file type.
For all files, SoX will exit with an error if the file type cannot be determined. Command-line format options
may need to be added or changed to resolve the problem.
Playing & Recording Audio
The play and rec commands are provided so that basic playing and recording is as simple as
play existing-file.wav
and
rec new-file.wav
These two commands are functionally equivalent to
sox existing-file.wav -d
and
sox -d new-file.wav
Of course, further options and effects (as described below) can be added to the commands in either form.
* * *
Some systems provide more than one type of (SoX-compatible) audio driver, e.g. ALSA & OSS, or
SUNAU & AO. Systems can also have more than one audio device (a.k.a. ‘sound card’). If more than one
audio driver has been built-in to SoX, and the default selected by SoX when recording or playing is not the
one that is wanted, then the AUDIODRIVER environment variable can be used to override the default.
For example (on many systems):
set AUDIODRIVER=oss
play ...
The AUDIODEV environment variable can be used to override the default audio device, e.g.
set AUDIODEV=/dev/dsp2
play ...
sox ... -t oss
or
set AUDIODEV=hw:soundwave,1,2
play ...
sox ... -t alsa
Note that the way of setting environment variables varies from system to system—for some specific exam-
ples, see ‘SOX_OPTS’ below.
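For POSIX-style shells, a minimal sketch of the per-command form is shown here. It assumes the driver name alsa and the device name hw:0, which are placeholders and may not exist on a given system. The sh -c child process merely stands in for play, so that the effect of the assignment can be seen without an audio device.

```shell
# Assumptions: a POSIX-style shell; 'alsa' and 'hw:0' are placeholder values.
# Variables assigned on the same line as a command are exported to that
# command only and leave the calling shell's environment untouched.
seen=$(AUDIODRIVER=alsa AUDIODEV=hw:0 sh -c 'echo "$AUDIODRIVER $AUDIODEV"')
echo "child process saw: $seen"

# The same form applied to SoX (commented out: it needs an audio device):
# AUDIODRIVER=alsa AUDIODEV=hw:0 play existing-file.wav
```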
When playing a file with a sample rate that is not supported by the audio output device, SoX will automati-
cally invoke the rate effect to perform the necessary sample rate conversion. For compatibility with old
hardware, the default rate quality level is set to ‘low’. This can be changed by explicitly specifying the rate
effect with a different quality level, e.g.
play ... rate -m
or by using the --play-rate-arg option (see below).
* * *
On some systems, SoX allows audio playback volume to be adjusted whilst using play. Where supported,
this is achieved by tapping the ‘v’ & ‘V’ keys during playback.
To help with setting a suitable recording level, SoX includes a peak-level meter which can be invoked (be-
fore making the actual recording) as follows:
rec -n
The recording level should be adjusted (using the system-provided mixer program, not SoX) so that the me-
ter is at most occasionally full scale, and never ‘in the red’ (an exclamation mark is shown). See also -S
below.
Accuracy
Many file formats that compress audio discard some of the audio signal information whilst doing so. Con-
verting to such a format and then converting back again will not produce an exact copy of the original au-
dio. This is the case for many formats used in telephony (e.g. A-law, GSM) where low signal bandwidth is
more important than high audio fidelity, and for many formats used in portable music players (e.g. MP3,
Vorbis) where adequate fidelity can be retained even with the large compression ratios that are needed to
make portable players practical.
Formats that discard audio signal information are called ‘lossy’. Formats that do not are called ‘lossless’.
The term ‘quality’ is used as a measure of how closely the original audio signal can be reproduced when
using a lossy format.
Audio file conversion with SoX is lossless when it can be, i.e. when not using lossy compression, when not
reducing the sampling rate or number of channels, and when the number of bits used in the destination for-
mat is not less than in the source format. E.g. converting from an 8-bit PCM format to a 16-bit PCM for-
mat is lossless but converting from an 8-bit PCM format to (8-bit) A-law isn’t.
N.B. SoX converts all audio files to an internal uncompressed format before performing any audio process-
ing. This means that manipulating a file that is stored in a lossy format can cause further losses in audio fi-
delity. E.g. with
sox long.mp3 short.mp3 trim 10
SoX first decompresses the input MP3 file, then applies the trim effect, and finally creates the output MP3
file by re-compressing the audio—with a possible reduction in fidelity above that which occurred when the
input file was created. Hence, if what is ultimately desired is lossily compressed audio, it is highly recom-
mended to perform all audio processing using lossless file formats and then convert to the lossy format only
at the final stage.
N.B. Applying multiple effects with a single SoX invocation will, in general, produce more accurate results
than those produced using multiple SoX invocations.
Dithering
Dithering is a technique used to maximise the dynamic range of audio stored at a particular bit-depth. Any
distortion introduced by quantisation is decorrelated by adding a small amount of white noise to the signal.
In most cases, SoX can determine whether the selected processing requires dither and will add it during
output formatting if appropriate.
Specifically, by default, SoX automatically adds TPDF dither when the output bit-depth is less than 24 and
any of the following are true:
• bit-depth reduction has been specified explicitly using a command-line option
• the output file format supports only bit-depths lower than that of the input file format
• an effect has increased effective bit-depth within the internal processing chain
For example, adjusting volume with vol 0.25 requires two additional bits in which to losslessly store its re-
sults (since 0.25 decimal equals 0.01 binary). So if the input file bit-depth is 16, then SoX’s internal repre-
sentation will utilise 18 bits after processing this volume change. In order to store the output at the same
depth as the input, dithering is used to remove the additional bits.
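The arithmetic behind this example can be sketched as follows. The snippet assumes a gain factor that is an exact power of two (such as 0.25) and uses awk only as a calculator. It illustrates the reasoning above and does not describe SoX's internal code.

```shell
# Extra bits needed to store a scaled sample exactly: log2(1 / factor).
# Assumption: the factor is an exact power of two, as with 'vol 0.25'.
input_bits=16
factor=0.25
extra_bits=$(awk -v f="$factor" 'BEGIN { printf "%.0f", log(1 / f) / log(2) }')
internal_bits=$((input_bits + extra_bits))
echo "vol $factor on $input_bits-bit audio needs $internal_bits bits"
```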
Use the -V option to see what processing SoX has automatically added. The -D option may be given to
override automatic dithering. To invoke dithering manually (e.g. to select a noise-shaping curve), see the
dither effect.
Clipping
Clipping is distortion that occurs when an audio signal level (or ‘volume’) exceeds the range of the chosen
representation. In most cases, clipping is undesirable and so should be corrected by adjusting the level
prior to the point (in the processing chain) at which it occurs.
In SoX, clipping could occur, as you might expect, when using the vol or gain effects to increase the audio
volume. Clipping could also occur with many other effects, when converting one format to another, and
even when simply playing the audio.
Playing an audio file often involves resampling, and processing by analogue components can introduce a
small DC offset and/or amplification, all of which can produce distortion if the audio signal level was ini-
tially too close to the clipping point.
For these reasons, it is usual to make sure that an audio file’s signal level has some ‘headroom’, i.e. it does
not exceed a particular level below the maximum possible level for the given representation. Some stan-
dards bodies recommend as much as 9dB headroom, but in most cases, 3dB (≈ 70% linear) is enough. Note
that this wisdom seems to have been lost in modern music production; in fact, many CDs, MP3s, etc. are
now mastered at levels above 0dBFS i.e. the audio is clipped as delivered.
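The relation between a headroom figure in dB and the corresponding linear amplitude is 10^(-dB/20). The sketch below uses awk as a calculator to evaluate it for the two figures mentioned above. The helper name headroom_linear is made up for this illustration and is not part of SoX.

```shell
# Convert a headroom figure in dB to a linear fraction of full scale.
headroom_linear() {
    awk -v db="$1" 'BEGIN { printf "%.3f\n", 10 ^ (-db / 20) }'
}
three_db=$(headroom_linear 3)
nine_db=$(headroom_linear 9)
echo "3 dB headroom -> peak at $three_db of full scale"
echo "9 dB headroom -> peak at $nine_db of full scale"
```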
SoX’s stat and stats effects can assist in determining the signal level in an audio file. The gain or vol effect
can be used to prevent clipping, e.g.
sox dull.wav bright.wav gain -6 treble +6
guarantees that the treble boost will not clip.
If clipping occurs at any point during processing, SoX will display a warning message to that effect.
See also -G and the gain and norm effects.
Input File Combining
SoX’s input combiner can be configured (see OPTIONS below) to combine multiple files using any of the
following methods: ‘concatenate’, ‘sequence’, ‘mix’, ‘mix-power’, ‘merge’, or ‘multiply’. The default
method is ‘sequence’ for play, and ‘concatenate’ for rec and sox.
For all methods other than ‘sequence’, multiple input files must have the same sampling rate. If necessary,
separate SoX invocations can be used to make sampling rate adjustments prior to combining.
If the ‘concatenate’ combining method is selected (usually, this will be by default) then the input files must
also have the same number of channels. The audio from each input will be concatenated in the order given
to form the output file.
The ‘sequence’ combining method is selected automatically for play. It is similar to ‘concatenate’ in that
the audio from each input file is sent serially to the output file. However, here the output file may be closed
and reopened at the corresponding transition between input files. This may be just what is needed when
sending different types of audio to an output device, but is not generally useful when the output is a normal
file.
If either the ‘mix’ or ‘mix-power’ combining method is selected then two or more input files must be given
and will be mixed together to form the output file. The number of channels in each input file need not be
the same, but SoX will issue a warning if they are not and some channels in the output file will not contain
audio from every input file. A mixed audio file cannot be un-mixed without reference to the original input
files.
If the ‘merge’ combining method is selected then two or more input files must be given and will be merged
together to form the output file. The number of channels in each input file need not be the same. A merged
audio file comprises all of the channels from all of the input files. Un-merging is possible using multiple in-
vocations of SoX with the remix effect. For example, two mono files could be merged to form one stereo
file. The first and second mono files would become the left and right channels of the stereo file.
The ‘multiply’ combining method multiplies the sample values of corresponding channels (treated as num-
bers in the interval -1 to +1). If the number of channels in the input files is not the same, the missing chan-
nels are considered to contain all zero.
When combining input files, SoX applies any specified effects (including, for example, the vol volume ad-
justment effect) after the audio has been combined. However, it is often useful to be able to set the volume
of (i.e. ‘balance’) the inputs individually, before combining takes place.
For all combining methods, input file volume adjustments can be made manually using the -v option (be-
low) which can be given for one or more input files. If it is given for only some of the input files then the
others receive no volume adjustment. In some circumstances, automatic volume adjustments may be ap-
plied (see below).
The -V option (below) can be used to show the input file volume adjustments that have been selected (ei-
ther manually or automatically).
There are some special considerations that need to be made when mixing input files:
Unlike the other methods, ‘mix’ combining has the potential to cause clipping in the combiner if no balanc-
ing is performed. In this case, if manual volume adjustments are not given, SoX will try to ensure that clip-
ping does not occur by automatically adjusting the volume (amplitude) of each input signal by a factor of
¹/n, where n is the number of input files. If this results in audio that is too quiet or otherwise unbalanced
then the input file volumes can be set manually as described above. Using the norm effect on the mix is an-
other alternative.
If mixed audio seems loud enough at some points but too quiet in others then dynamic range compression
should be applied to correct this—see the compand effect.
With the ‘mix-power’ combine method, the mixed volume is approximately equal to that of one of the input
signals. This is achieved by balancing using a factor of ¹/√n instead of ¹/n. Note that this balancing factor
does not guarantee that clipping will not occur, but the number of clips will usually be low and the resultant
distortion is generally imperceptible.
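The two balancing factors described above can be compared with a short calculation. The sketch below assumes four input files (an arbitrary count chosen for illustration) and uses awk only as a calculator.

```shell
# Balancing factors for n input files: 1/n for 'mix', 1/sqrt(n) for 'mix-power'.
# Assumption: n=4 is an arbitrary illustrative input count.
n=4
mix_factor=$(awk -v n="$n" 'BEGIN { printf "%.3f", 1 / n }')
mix_power_factor=$(awk -v n="$n" 'BEGIN { printf "%.3f", 1 / sqrt(n) }')
echo "mix:       each input scaled by $mix_factor"
echo "mix-power: each input scaled by $mix_power_factor"
```

With manual balancing, a factor of this kind would instead be supplied per input through the -v option, placed before each input file name.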
Output Files
SoX’s default behaviour is to take one or more input files and write them to a single output file.
This behaviour can be changed by specifying the pseudo-effect ‘newfile’ within the effects list. SoX will
then enter multiple output mode.
In multiple output mode, a new file is created when the effects prior to the ‘newfile’ indicate they are done.
The effects chain listed after ‘newfile’ is then started up and its output is saved to the new file.
In multiple output mode, a unique number will automatically be appended to the end of all filenames. If the
filename has an extension then the number is inserted before the extension. This behaviour can be custom-
ized by placing a %n anywhere in the filename where the number should be substituted. An optional num-
ber can be placed after the % to indicate a minimum fixed width for the number.
Multiple output mode is not very useful unless an effect that will stop the effects chain early is specified be-
fore the ‘newfile’. If end of file is reached before the effects chain stops itself then no new file will be cre-
ated as it would be empty.
The following is an example of splitting the first 60 seconds of an input file into two 30 second files and ig-
noring the rest.
sox song.wav ringtone%1n.wav trim 0 30 : newfile : trim 0 30
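As a rough illustration of the naming rule, the loop below prints the names that the %1n pattern in this example would be expected to expand to. It assumes the counter starts at 1, which the text above does not state. printf's %01d stands in for SoX's own substitution, so no audio is processed.

```shell
# Emulates the '%1n' substitution with printf; SoX itself is not invoked.
# Assumption: numbering starts at 1.
names=""
for i in 1 2; do
    name=$(printf 'ringtone%01d.wav' "$i")
    names="$names$name "
    echo "$name"
done
```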
Stopping SoX
Usually SoX will complete its processing and exit automatically once it
has read all available audio data from the input files.
If desired, it can be terminated earlier by sending an interrupt signal
the given sample rate option will be applied to all three vox
files.
-p, --sox-pipe
This can be used in place of an output filename to specify that
the SoX command should be used as an input pipe to another SoX
command. For example, the command:
play "|sox -n -p synth 2" "|sox -n -p synth 2 tremolo 10" stat
plays two ‘files’ in succession, each with different effects.
-p is in fact an alias for ‘-t sox -’.
-d, --default-device
This can be used in place of an input or output filename to spec-
ify that the default audio device (if one has been built into SoX)
is to be used. This is akin to invoking rec or play (as described
above).
-n, --null
This can be used in place of an input or output filename to spec-
ify that a ‘null file’ is to be used. Note that here, ‘null file’
refers to a SoX-specific mechanism and is not related to any oper-
ating-system mechanism with a similar name.
Using a null file to input audio is equivalent to using a normal
audio file that contains an infinite amount of silence, and as
such is not generally useful unless used with an effect that spec-
ifies a finite time length (such as trim or synth).
Using a null file to output audio amounts to discarding the audio
and is useful mainly with effects that produce information about
the audio instead of affecting it (such as noiseprof or stat).
The sampling rate associated with a null file is by default
48 kHz, but, as with a normal file, this can be overridden if de-
sired using command-line format options (see below).
Supported File & Audio Device Types
See soxformat(7) for a list and description of the supported file for-
mats and audio device drivers.
OPTIONS
Global Options
These options can be specified on the command line at any point before
the first effect name.
The SOX_OPTS environment variable can be used to provide alternative de-
fault values for SoX’s global options. For example:
SOX_OPTS="--buffer 20000 --play-rate-arg -hs --temp /mnt/temp"
Note that setting SOX_OPTS can potentially create unwanted changes in
the behaviour of scripts or other programs that invoke SoX. SOX_OPTS
might best be used for things (such as in the given example) that re-
flect the environment in which SoX is being run. Enabling options such
as --no-clobber as default might be handled better using a shell alias
since a shell alias will not affect operation in scripts etc.
One way to ensure that a script cannot be affected by SOX_OPTS is to
clear SOX_OPTS at the start of the script, but this of course loses the
benefit of SOX_OPTS carrying some system-wide default options. An al-
ternative approach is to explicitly invoke SoX with default option
values, e.g.
SOX_OPTS="-V --no-clobber"
...
sox -V2 --clobber $input $output ...
Note that the way to set environment variables varies from system to
system. Here are some examples:
Unix bash:
export SOX_OPTS="-V --no-clobber"
Unix csh:
setenv SOX_OPTS "-V --no-clobber"
MS-DOS/MS-Windows:
set SOX_OPTS=-V --no-clobber
MS-Windows GUI: via Control Panel : System : Advanced : Environment
Variables
Mac OS X GUI: Refer to Apple’s Technical Q&A QA1067 document.
--buffer BYTES, --input-buffer BYTES
Set the size in bytes of the buffers used for processing audio
(default 8192). --buffer applies to input, effects, and output
processing; --input-buffer applies only to input processing (for
which it overrides --buffer if both are given).
Be aware that large values for --buffer will cause SoX to become
slow to respond to requests to terminate or to skip the current
input file.
--clobber
Don’t prompt before overwriting an existing file with the same
name as that given for the output file. This is the default be-
haviour.
--combine concatenate|merge|mix|mix-power|multiply|sequence
Select the input file combining method; for some of these, short
options are available: -m selects ‘mix’, -M selects ‘merge’, and
-T selects ‘multiply’.
See Input File Combining above for a description of the different
combining methods.
-D, --no-dither
Disable automatic dither—see ‘Dithering’ above. An example of why
this might occasionally be useful is if a file has been converted
from 16 to 24 bit with the intention of doing some processing on
it, but in fact no processing is needed after all and the original
16 bit file has been lost, then, strictly speaking, no dither is
needed if converting the file back to 16 bit. See also the stats
effect for how to determine the actual bit depth of the audio
within a file.
--effects-file FILENAME
Use FILENAME to obtain all effects and their arguments. The file
is parsed as if the values were specified on the command line. A
new line can be used in place of the special : marker to separate
effect chains. For convenience, such markers at the end of the
file are normally ignored; if you want to specify an empty last
effects chain, use an explicit : by itself on the last line of the
file. This option causes any effects specified on the command
line to be discarded.
-G, --guard
Automatically invoke the gain effect to guard against clipping.
E.g.
sox -G infile -b 16 outfile rate 44100 dither -s
is shorthand for
sox infile -b 16 outfile gain -h rate 44100 gain -rh dither -s
See also -V, --norm, and the gain effect.
-h, --help
Show version number and usage information.
--help-effect NAME
Show usage information on the specified effect. The name all can
be used to show usage on all effects.
--help-format NAME
Show information about the specified file format. The name all
can be used to show information on all formats.
--i, --info
Only if given as the first parameter to sox, behave as soxi(1).
-m|-M
Equivalent to --combine mix and --combine merge, respectively.
--magic
If SoX has been built with the optional ‘libmagic’ library then
this option can be given to enable its use in helping to detect
audio file types.
--multi-threaded | --single-threaded
By default, SoX is ‘single threaded’. If the --multi-threaded op-
tion is given however then SoX will process audio channels for
most multi-channel effects in parallel on hyper-threading/multi-
core architectures. This may reduce processing time, though some-
times it may be necessary to use this option in conjunction with a
larger buffer size than is the default to gain any benefit from
multi-threaded processing (e.g. 131072; see --buffer above).
--no-clobber
Prompt before overwriting an existing file with the same name as
that given for the output file.
N.B. Unintentionally overwriting a file is easier than you might
think, for example, if you accidentally enter
sox file1 file2 effect1 effect2 ...
when what you really meant was
play file1 file2 effect1 effect2 ...
then, without this option, file2 will be overwritten. Hence, us-
ing this option is recommended. SOX_OPTS (above), a ‘shell’ alias,
script, or batch file may be an appropriate way of permanently en-
abling it.
--norm[=dB-level]
Automatically invoke the gain effect to guard against clipping and
to normalise the audio. E.g.
sox --norm infile -b 16 outfile rate 44100 dither -s
is shorthand for
sox infile -b 16 outfile gain -h rate 44100 gain -nh dither -s
See also the norm, vol, and gain effects, and see Input File
Combining above.
Input & Output File Format Options
These options apply to the input or output file whose name they immedi-
ately precede on the command line and are used mainly when working with
headerless file formats or when specifying a format for the output file
that is different to that of the input file.
-b BITS, --bits BITS
The number of bits (a.k.a. bit-depth or sometimes word-length) in
each encoded sample. Not applicable to complex encodings such as
MP3 or GSM. Not necessary with encodings that have a fixed number
of bits, e.g. A/µ-law, ADPCM.
For an input file, the most common use for this option is to in-
form SoX of the number of bits per sample in a ‘raw’ (‘header-
less’) audio file. For example
sox -r 16k -e signed -b 8 input.raw output.wav
converts a particular ‘raw’ file to a self-describing ‘WAV’ file.
For an output file, this option can be used (perhaps along with
-e) to set the output encoding size. By default (i.e. if this op-
tion is not given), the output encoding size will (providing it is
supported by the output file type) be set to the input encoding
size. For example
sox input.cdda -b 24 output.wav
converts raw CD digital audio (16-bit, signed-integer) to a 24-bit
(signed-integer) ‘WAV’ file.
-c CHANNELS, --channels CHANNELS
The number of audio channels in the audio file. This can be any
number greater than zero.
For an input file, the most common use for this option is to in-
form SoX of the number of channels in a ‘raw’ (‘headerless’) audio
file. Occasionally, it may be useful to use this option with a
‘headered’ file, in order to override the (presumably incorrect)
value in the header—note that this is only supported with certain
file types. Examples:
sox -r 48k -e float -b 32 -c 2 input.raw output.wav
converts a particular ‘raw’ file to a self-describing ‘WAV’ file.
play -c 1 music.wav
interprets the file data as belonging to a single channel regard-
less of what is indicated in the file header. Note that if the
file does in fact have two channels, this will result in the file
playing at half speed.
For an output file, this option provides a shorthand for specify-
ing that the channels effect should be invoked in order to change
(if necessary) the number of channels in the audio signal to the
number given. For example, the following two commands are equiva-
lent:
sox input.wav -c 1 output.wav bass -b 24
sox input.wav output.wav bass -b 24 channels 1
though the second form is more flexible as it allows the effects
to be ordered arbitrarily.
For an output file, this option can be used (perhaps along with
-b) to set the output encoding type. For example
sox input.cdda -e float output1.wav
each other.
sampless
Specifies the number of samples directly, as in ‘8000s’. For
large sample counts, e notation is supported: ‘1.7e6s’ is the same
as ‘1700000s’.
Time specifications can also be chained with + or - into a new time
specification where the right part is added to or subtracted from the
left, respectively: ‘3:00-200s’ means two hundred samples less than
three minutes.
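As a worked example of the chained form, the sketch below evaluates ‘3:00-200s’ as a sample count. It assumes a 44.1 kHz sample rate, which is not stated above, because a time in minutes can only be converted to samples once a rate is known.

```shell
# '3:00-200s' = three minutes minus 200 samples.
# Assumption: the audio is sampled at 44100 Hz.
rate=44100
minutes=3
position=$((minutes * 60 * rate - 200))
echo "3:00-200s at $rate Hz is sample $position"
```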
To see if SoX has support for an optional effect, enter sox -h and look
for its name under the list: ‘EFFECTS’.
Supported Effects
Note: a categorised list of the effects can be found in the accompanying
‘README’ file.
allpass frequency[k] width[h|k|o|q]
Apply a two-pole all-pass filter with central frequency (in Hz)
frequency, and filter-width width. An all-pass filter changes the
audio’s frequency to phase relationship without changing its fre-
quency to amplitude relationship. The filter is described in de-
tail in [1].
This effect supports the --plot global option.
band [-n] center[k] [width[h|k|o|q]]
Apply a band-pass filter. The frequency response drops logarith-
mically around the center frequency. The width parameter gives
the slope of the drop. The frequencies at center + width and cen-
ter - width will be half of their original amplitudes. band de-
faults to a mode oriented to pitched audio, i.e. voice, singing,
or instrumental music. The -n (for noise) option uses the alter-
nate mode for un-pitched audio (e.g. percussion). Warning: -n in-
troduces a power-gain of about 11dB in the filter, so beware of
output clipping. band introduces noise in the shape of the fil-
ter, i.e. peaking at the center frequency and settling around it.
This effect supports the --plot global option.
See also sinc for a bandpass filter with steeper shoulders.
bandpass|bandreject [-c] frequency[k] width[h|k|o|q]
Apply a two-pole Butterworth band-pass or band-reject filter with
central frequency frequency, and (3dB-point) band-width width.
The -c option applies only to bandpass and selects a constant
skirt gain (peak gain = Q) instead of the default: constant 0dB
peak gain. The filters roll off at 6dB per octave (20dB per
decade) and are described in detail in [1].
These effects support the --plot global option.
See also sinc for a bandpass filter with steeper shoulders.
bandreject frequency[k] width[h|k|o|q]
Apply a band-reject filter. See the description of the bandpass
effect for details.
out-length.
See also the splice effect.
fir [coefs-file|coefs]
Use SoX’s FFT convolution engine with given FIR filter coeffi-
cients. If a single argument is given then this is treated as the
name of a file containing the filter coefficients (white-space
separated; may contain ‘#’ comments). If the given filename is
‘-’, or if no argument is given, then the coefficients are read
from the ‘standard input’ (stdin); otherwise, coefficients may be
given on the command line. Examples:
sox infile outfile fir 0.0195 -0.082 0.234 0.891 -0.145 0.043
sox infile outfile fir coefs.txt
with coefs.txt containing
# HP filter
# freq=10000
1.2311233052619888e-01
-4.4777096106211783e-01
5.1031563346705155e-01
-6.6502926320995331e-02
...
This effect supports the --plot global option.
flanger [delay depth regen width speed shape phase interp]
Apply a flanging effect to the audio. See [3] for a detailed de-
scription of flanging.
All parameters are optional (right to left).
Parameter  Range     Default  Description
delay      0 - 30    0        Base delay in milliseconds.
depth      0 - 10    2        Added swept delay in milliseconds.
regen      -95 - 95  0        Percentage regeneration (delayed signal feedback).
width      0 - 100   71       Percentage of delayed signal mixed with original.
speed      0.1 - 10  0.5      Sweeps per second (Hz).
shape                sin      Swept wave shape: sine|triangle.
phase      0 - 100   25       Swept wave percentage phase-shift for multi-channel
                              (e.g. stereo) flange; 0 = 100 = same phase on each channel.
interp               lin      Digital delay-line interpolation: linear|quadratic.
gain [-e|-B|-b|-r] [-n] [-l|-h] [gain-dB]
Apply amplification or attenuation to the audio signal, or, in
some cases, to some of its channels. Note that use of any of -e,
-B, -b, -r, or -n requires temporary file space to store the audio
to be processed, so may be unsuitable for use with ‘streamed’ au-
dio.
Without other options, gain-dB is used to adjust the signal power
level by the given number of dB: positive amplifies (beware of
Clipping), negative attenuates. With other options, the gain-dB
amplification or attenuation is (logically) applied after the pro-
cessing due to those options.
A popular sound:
play snare.flac phaser 0.89 0.85 1 0.24 2 -t
More severe:
play snare.flac phaser 0.6 0.66 3 0.6 2 -t
pitch [-q] shift [segment [search [overlap]]]
Change the audio pitch (but not tempo).
shift gives the pitch shift as positive or negative ‘cents’ (i.e.
100ths of a semitone). See the tempo effect for a description of
the other parameters.
See also the bend, speed, and tempo effects.
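Because cents are a logarithmic unit (1200 to the octave), the frequency ratio for a given shift is 2^(cents/1200). The sketch below uses awk as a calculator to show this for a few shift values. The helper name cents_to_ratio is made up for the illustration and is not part of SoX.

```shell
# Frequency ratio corresponding to a pitch shift given in cents.
cents_to_ratio() {
    awk -v c="$1" 'BEGIN { printf "%.4f\n", 2 ^ (c / 1200) }'
}
semitone_up=$(cents_to_ratio 100)
octave_up=$(cents_to_ratio 1200)
octave_down=$(cents_to_ratio -1200)
echo "100 cents -> x$semitone_up, 1200 -> x$octave_up, -1200 -> x$octave_down"
```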
rate [-q|-l|-m|-h|-v] [override-options] RATE[k]
Change the audio sampling rate (i.e. resample the audio) to any
given RATE (even non-integer if this is supported by the output
file format) using a quality level defined as follows:
Option  Quality    Band-width  Rej dB      Typical Use
-q      quick      n/a         ≈30 @ Fs/4  playback on ancient hardware
-l      low        80%         100         playback on old hardware
-m      medium     95%         100         audio playback
-h      high       95%         125         16-bit mastering (use with dither)
-v      very high  95%         175         24-bit mastering
where Band-width is the percentage of the audio frequency band
that is preserved and Rej dB is the level of noise rejection.
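To make the Band-width column concrete, here is a rough sketch that turns the percentage into a frequency. It assumes that ‘the audio frequency band’ means 0 Hz up to the Nyquist frequency of the lower of the input and output rates; that reading is an assumption, not something stated above, and the function name is hypothetical.

```python
def preserved_band_hz(in_rate: float, out_rate: float, bandwidth_percent: float) -> float:
    """Approximate upper edge (Hz) of the preserved band.

    Assumption: the percentage is taken relative to the Nyquist
    frequency of the lower of the two sampling rates.
    """
    nyquist = min(in_rate, out_rate) / 2.0
    return nyquist * bandwidth_percent / 100.0


if __name__ == "__main__":
    # 96 kHz material resampled to 44.1 kHz at two bandwidth settings.
    for label, percent in (("low", 80), ("high", 95)):
        edge = preserved_band_hz(96000, 44100, percent)
        print(f"{label:>4}: about {edge:.1f} Hz preserved")
```

Under that assumption, a 95% setting when converting to 44.1 kHz preserves content up to roughly 20.9 kHz, and an 80% setting up to roughly 17.6 kHz.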
Increasing levels of resampling quality come at the expense of in-
creasing amounts of time to process the audio. If no quality op-
tion is given, the quality level used is ‘high’ (but see ‘Playing
& Recording Audio’ above regarding playback).
The ‘quick’ algorithm uses cubic interpolation; all others use
band-limited interpolation. By default, all algorithms have a
‘linear’ phase response; for ‘medium’, ‘high’ and ‘very high’, the
phase response is configurable (see below).
The rate effect is invoked automatically if SoX’s -r option speci-
fies a rate that is different to that of the input file(s). Al-
ternatively, if this effect is given explicitly, then SoX’s -r op-
tion need not be given. For example, the following two commands
are equivalent:
sox input.wav -r 48k output.wav bass -b 24
sox input.wav output.wav bass -b 24 rate 48k
though the second command is more flexible as it allows rate op-
tions to be given, and allows the effects to be ordered arbitrar-
ily.
* * *
Warning: technically detailed discussion follows.
The simple quality selection described above provides settings
Note that repeating once yields two copies: the original audio and
the repeated audio.
reverb [-w|--wet-only] [reverberance (50%) [HF-damping (50%)
[room-scale (100%) [stereo-depth (100%)
[pre-delay (0ms) [wet-gain (0dB)]]]]]]
Add reverberation to the audio using the ‘freeverb’ algorithm. A
reverberation effect is sometimes desirable for concert halls that
are too small or contain so many people that the hall’s natural
reverberance is diminished. Applying a small amount of stereo re-
verb to a (dry) mono signal will usually make it sound more natu-
ral. See [3] for a detailed description of reverberation.
Note that this effect increases both the volume and the length of
the audio, so to prevent clipping in these domains, a typical in-
vocation might be:
play dry.wav gain -3 pad 0 3 reverb
The -w option can be given to select only the ‘wet’ signal, thus
allowing it to be processed further, independently of the ‘dry’
signal. E.g.
play -m voice.wav "|sox voice.wav -p reverse reverb -w reverse"
for a reverse reverb effect.
reverse
Reverse the audio completely. Requires temporary file space to
store the audio to be reversed.
riaa Apply RIAA vinyl playback equalisation. The sampling rate must be
one of: 44.1, 48, 88.2, 96 kHz.
This effect supports the --plot global option.
silence [-l] above-periods [duration threshold[d|%]
[below-periods duration threshold[d|%]]]
Removes silence from the beginning, middle, or end of the audio.
‘Silence’ is determined by a specified threshold.
The above-periods value is used to indicate if audio should be
trimmed at the beginning of the audio. A value of zero indicates
no silence should be trimmed from the beginning. When above-periods
is non-zero, the effect trims audio up until it finds non-silence.
Normally, when trimming silence from the beginning of the audio,
above-periods will be 1, but it can be increased to higher values
to trim all audio up to a specific count of non-silence periods.
For example, if you had an audio file with two songs that each
contained 2 seconds of silence before the song, you could specify
an above-period of 2 to strip out both silence periods and the
first song.
When above-periods is non-zero, you must also specify a duration
and threshold. duration indicates the amount of time that
non-silence must be detected before it stops trimming audio. By
increasing the duration, bursts of noise can be treated as silence
and trimmed off.
threshold is used to indicate what sample value you should treat
as silence. For digital audio, a value of 0 may be fine but for
audio recorded from analog, you may wish to increase the value to
[ASCII diagram not recoverable from this copy; its labelled spans are length1, excess (twice), and leeway.]
fact.
The -rms option will convert all output average values to ‘root
mean square’ format.
The -v option displays only the ‘Volume Adjustment’ value.
The -freq option calculates the input’s power spectrum (4096 point
DFT) instead of the statistics listed above. This should only be
used with a single channel audio file.
The -d option displays a hex dump of the 32-bit signed PCM data
audio in SoX’s internal buffer. This is mainly used to help track
down endian problems that sometimes occur in cross-platform ver-
sions of SoX.
See also the stats effect.
stats [-b bits|-x bits|-s scale] [-w window-time]
Display time domain statistical information about the audio chan-
nels; audio is passed unmodified through the SoX processing chain.
Statistics are calculated and displayed for each audio channel
and, where applicable, an overall figure is also given.
For example, for a typical well-mastered stereo music file:
             Overall     Left      Right
DC offset   0.000803 -0.000391  0.000803
Min level  -0.750977 -0.750977 -0.653412
Max level   0.708801  0.708801  0.653534
Pk lev dB      -2.49     -2.49     -3.69
RMS lev dB    -19.41    -19.13    -19.71
RMS Pk dB     -13.82    -13.82    -14.38
RMS Tr dB     -85.25    -85.25    -82.66
Crest factor       -      6.79      6.32
Flat factor     0.00      0.00      0.00
Pk count           2         2         2
Bit-depth      16/16     16/16     16/16
Num samples    7.72M
Length s     174.973
Scale max   1.000000
Window s       0.050
DC offset, Min level, and Max level are shown, by default, in the
range ±1. If the -b (bits) option is given, then these three
measurements will be scaled to a signed integer with the given
number of bits; for example, for 16 bits, the scale would be
-32768 to +32767. The -x option behaves the same way as -b except
that the signed integer values are displayed in hexadecimal. The
-s option scales the three measurements by a given floating-point
number.
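The scaling described above can be sketched as a simple multiplication. This is an illustration only: it assumes that -b scales by 2^(bits-1), which is consistent with the 16-bit range quoted above, and it does not model how SoX rounds or formats the displayed values. The function name is hypothetical.

```python
from typing import Optional


def scale_level(level: float, bits: Optional[int] = None,
                scale: Optional[float] = None) -> float:
    """Scale a measurement given in the default ±1 range.

    bits  -- mimic the -b option (assumed factor: 2 ** (bits - 1))
    scale -- mimic the -s option (plain floating-point multiplier)
    """
    if bits is not None:
        return level * 2 ** (bits - 1)
    if scale is not None:
        return level * scale
    return level


if __name__ == "__main__":
    # Min level of the left channel from the example table.
    print(scale_level(-0.750977, bits=16))   # about -24608
    print(scale_level(-0.750977, scale=100))
```

With 16 bits, a level of -1 maps to -32768, matching the lower end of the range given above.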
Pk lev dB and RMS lev dB are standard peak and RMS level measured
in dBFS. RMS Pk dB and RMS Tr dB are peak and trough values for
RMS level measured over a short window (default 50ms).
Crest factor is the standard ratio of peak to RMS level (note:
not in dB).
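Because the crest factor is a plain ratio while the peak and RMS levels are reported in dB, the two can be cross-checked. The sketch below uses the standard amplitude relation (20 dB per factor of ten); the function name is hypothetical, and small discrepancies are to be expected because the displayed dB values are rounded.

```python
def crest_factor(peak_db: float, rms_db: float) -> float:
    """Peak-to-RMS amplitude ratio from two levels given in dB."""
    return 10.0 ** ((peak_db - rms_db) / 20.0)


if __name__ == "__main__":
    # 'Pk lev dB' and 'RMS lev dB' from the example table.
    print(f"Left : {crest_factor(-2.49, -19.13):.2f}")
    print(f"Right: {crest_factor(-3.69, -19.71):.2f}")
```

For the example table, this reproduces the listed crest factors of 6.79 (left) and 6.32 (right).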
Flat factor is a measure of the flatness (i.e. consecutive sam-
ples with the same value) of the signal at its peak levels (i.e.
to 1.
If -m, -s, or -l is specified, the default value of segment will
be calculated based on factor, while default search and overlap
values are based on segment. Any values you provide still override
these default values.
factor gives the ratio of new tempo to the old tempo, so e.g. 1.1
speeds up the tempo by 10%, and 0.9 slows it down by 10%.
The optional segment parameter selects the algorithm’s segment
size in milliseconds. If no other flags are specified, the de-
fault value is 82 and is typically suited to making small changes
to the tempo of music. For larger changes (e.g. a factor of 2),
41 ms may give a better result. The -m, -s, and -l flags will
cause the segment default to be automatically adjusted based on
factor. For example, using -s (for speech) with a tempo of 1.25
will calculate a default segment value of 32.
The optional search parameter gives the audio length in millisec-
onds over which the algorithm will search for overlapping points.
If no other flags are specified, the default value is 14.68.
Larger values use more processing time and may or may not produce
better results. A practical maximum is half the value of segment.
Search can be reduced to cut processing time at the risk of de-
grading output quality. The -m, -s, and -l flags will cause the
search default to be automatically adjusted based on segment.
The optional overlap parameter gives the segment overlap length
in milliseconds. The default value is 12, but the -m, -s, or -l
flags automatically adjust overlap based on segment size. Increasing
overlap increases processing time and may increase quality. A
practical maximum for overlap is the value of search, with overlap
typically being (at least) a little smaller than search.
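The relationships between factor, segment, search, and overlap described above can be sketched as follows. This is only an illustration of the stated rules of thumb (search at most half of segment, overlap at most search) using the documented defaults; the helper names are hypothetical and do not correspond to anything inside SoX.

```python
def new_duration(old_seconds: float, factor: float) -> float:
    """Duration after a tempo change; factor is new tempo / old tempo."""
    return old_seconds / factor


def check_tempo_params(segment: float = 82.0, search: float = 14.68,
                       overlap: float = 12.0) -> list:
    """Return warnings for values beyond the practical maxima (all in ms)."""
    warnings = []
    if search > segment / 2:
        warnings.append("search exceeds half of segment")
    if overlap > search:
        warnings.append("overlap exceeds search")
    return warnings


if __name__ == "__main__":
    print(new_duration(60.0, 1.25))           # a 60 s clip played 25% faster
    print(check_tempo_params())               # documented defaults
    print(check_tempo_params(segment=20.0))   # search now too large
```

The documented defaults (82, 14.68, 12) satisfy both rules of thumb, so the checker returns no warnings for them.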
See also speed for an effect that changes tempo and pitch to-
gether, pitch and bend for effects that change pitch only, and
stretch for an effect that changes tempo using a different algo-
rithm.
treble gain [frequency[k] [width[s|h|k|o|q]]]
Apply a treble tone-control effect. See the description of the
bass effect for details.
tremolo speed [depth]
Apply a tremolo (low frequency amplitude modulation) effect to the
audio. The tremolo frequency in Hz is given by speed, and the
depth as a percentage by depth (default 40).
trim {position(+)}
Cuts portions out of the audio. Any number of positions may be
given; audio is not sent to the output until the first position is
reached. The effect then alternates between copying and discard-
ing audio at each position. Using a value of 0 for the first po-
sition parameter allows copying from the beginning of the audio.
For example,
sox infile outfile trim 0 10
will copy the first ten seconds, while
play infile trim 12:34 =15:00 -2:00
and
play infile trim 12:34 2:26 -2:00
will both play from 12 minutes 34 seconds into the audio up to 15
minutes into the audio (i.e. 2 minutes and 26 seconds long), then
resume playing two minutes before the end of audio.
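The equivalence of the two play commands above comes down to simple time arithmetic: 15:00 minus 12:34 is 2:26. The sketch below checks this; it handles only colon-separated positions such as 12:34 or 1:02:03 and plain seconds, not the full range of position syntax, and the helper names are hypothetical.

```python
def to_seconds(spec: str) -> float:
    """Convert 'ss', 'mm:ss' or 'hh:mm:ss' to seconds (simplified parser)."""
    seconds = 0.0
    for part in spec.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def to_clock(seconds: float) -> str:
    """Format a number of seconds as 'm:ss'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


if __name__ == "__main__":
    start = to_seconds("12:34")
    absolute_end = to_seconds("15:00")
    print(to_clock(absolute_end - start))   # length of the first copied span
```
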
upsample [factor]
Upsample the signal by an integer factor: factor-1 zero-value sam-
ples are inserted between each pair of input samples. As a re-
sult, the original spectrum is replicated into the new frequency
space (imaging) and attenuated. This attenuation can be compen-
sated for by adding vol factor after any further processing. The
upsample effect is typically used in combination with filtering
effects.
For a general resampling effect with anti-imaging, see rate. See
also downsample.
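Zero-insertion upsampling as described above can be sketched in a few lines. This is an illustrative sketch, not the SoX implementation; in particular, appending zeros after the final sample (so that the output length is exactly factor times the input length) is an assumption, and the function name is hypothetical.

```python
def upsample(samples: list, factor: int) -> list:
    """Insert factor-1 zero-valued samples after each input sample."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for sample in samples:
        out.append(sample)
        out.extend([0] * (factor - 1))
    return out


if __name__ == "__main__":
    signal = [4, 8, 12]
    stretched = upsample(signal, 3)
    print(stretched)
    # The sum is unchanged but the length grows, so the average level
    # drops by the factor; this is the attenuation mentioned above.
    print(sum(signal) / len(signal), sum(stretched) / len(stretched))
```

The drop in average level by a factor of factor is why the text suggests compensating with vol factor after any further processing.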
vad [options]
Voice Activity Detector. Attempts to trim silence and quiet back-
ground sounds from the ends of (fairly high resolution i.e.
16-bit, 44-48kHz) recordings of speech. The algorithm currently
uses a simple cepstral power measurement to detect voice, so may
be fooled by other things, especially music. The effect can trim
only from the front of the audio, so in order to trim from the
back, the reverse effect must also be used. E.g.
play speech.wav norm vad
to trim from the front,
play speech.wav norm reverse vad reverse
to trim from the back, and
play speech.wav norm vad reverse vad reverse
to trim from both ends. The use of the norm effect is recom-
mended, but remember that neither reverse nor norm is suitable for
use with streamed audio.
Options:
Default values are shown in parentheses.
-t num (7)
The measurement level used to trigger activity detection.
This might need to be changed depending on the noise level,
signal level and other characteristics of the input audio.
-T num (0.25)
The time constant (in seconds) used to help ignore short
bursts of sound.
-s num (1)
The amount of audio (in seconds) to search for qui-
eter/shorter bursts of audio to include prior to the de-
tected trigger point.
-g num (0.25)
Allowed gap (in seconds) between quieter/shorter bursts of
audio to include prior to the detected trigger point.
-p num (0)
The amount of audio (in seconds) to preserve before the
trigger point and any found quieter/shorter bursts.
Advanced Options:
These allow fine tuning of the algorithm’s internal parameters.
-b num
The algorithm (internally) uses adaptive noise estima-
tion/reduction in order to detect the start of the wanted
audio. This option sets the time for the initial noise es-
timate.
-N num
Time constant used by the adaptive noise estimator for when
the noise level is increasing.
-n num
Time constant used by the adaptive noise estimator for when
the noise level is decreasing.
-r num
Amount of noise reduction to use in the detection algorithm
(e.g. 0, 0.5, ...).
-f num
Frequency of the algorithm’s processing/measurements.
-m num
Measurement duration; by default, twice the measurement pe-
riod; i.e. with overlap.
-M num
Time constant used to smooth spectral measurements.
-h num
‘Brick-wall’ frequency of high-pass filter applied at the
input to the detector algorithm.
-l num
‘Brick-wall’ frequency of low-pass filter applied at the in-
put to the detector algorithm.
-H num
‘Brick-wall’ frequency of high-pass lifter used in the de-
tector algorithm.
-L num
‘Brick-wall’ frequency of low-pass lifter used in the detec-
tor algorithm.
See also the silence effect.
vol gain [type [limitergain]]
Apply an amplification or an attenuation to the audio signal. Un-
like the -v option (which is used for balancing multiple input
files as they enter the SoX effects processing chain), vol is an
effect like any other so can be applied anywhere, and several
times if necessary, during the processing chain.
The amount to change the volume is given by gain which is inter-
preted, according to the given type, as follows: if type is ampli-
tude (or is omitted), then gain is an amplitude (i.e. voltage or
linear) ratio, if power, then a power (i.e. wattage or voltage-
squared) ratio, and if dB, then a power change in dB.
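The three interpretations of gain can be compared by converting each to an equivalent amplitude ratio. The sketch below uses the standard relationships (power is amplitude squared; 20 dB per factor of ten in amplitude) and assumes a non-negative gain for the power type; the function name is hypothetical and is not part of SoX.

```python
import math


def to_amplitude_factor(gain: float, gain_type: str = "amplitude") -> float:
    """Convert a vol-style gain to an equivalent amplitude (linear) ratio."""
    if gain_type == "amplitude":
        return gain
    if gain_type == "power":
        # Power is proportional to amplitude squared (gain assumed >= 0).
        return math.sqrt(gain)
    if gain_type == "dB":
        return 10.0 ** (gain / 20.0)
    raise ValueError(f"unknown gain type: {gain_type}")


if __name__ == "__main__":
    # Three ways of (approximately) doubling the amplitude.
    print(to_amplitude_factor(2))
    print(to_amplitude_factor(4, "power"))
    print(round(to_amplitude_factor(6.0206, "dB"), 4))
```

So, under these relationships, an amplitude gain of 2, a power gain of 4, and a dB gain of about 6.02 all describe the same change in volume.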
When type is amplitude or power, a gain of 1 leaves the volume