Processing Stereo Audio Files
Just how does stereo work, and how can you manipulate a stereo audio signal to your mix's advantage?
Hugh Robjohns
Everyone is familiar with stereo sound, but when it comes to mixing there's more to it than being able to place a sound at a point between two speakers. Various processes can be applied to a stereo signal to rebalance a stereo image, to make mono sources appear to be heard in stereo, or just to make something sound more impressive in stereo. However, for some processes there are trade-offs: have you ever used a widener to pan guitars outside the speakers, only to find that you can't hear them in mono? This article explains how stereo works, explores what you can do to manipulate stereo files, and discusses the trade-offs. I'll start with a little history...

Intensity Stereo

The origins of stereophonic audio reproduced over two channels can be traced back to Clément Ader and the Paris Electrical Exhibition of 1881, but the real basis of two-channel stereo as we know it today dates from the pioneering work of Alan Blumlein and his EMI colleagues in the early 1930s. Blumlein realised that sound reproduction using multiple speakers inherently means that both ears hear all of the speakers. Consequently, trying to reproduce time-of-arrival differences captured by spaced microphones would be extremely problematic: the physical spacing of the speakers relative to the listener would add further time-of-arrival differences, compromising the accuracy of the imaging. Blumlein saw that this apparent problem (both ears hearing both speakers) could be used to advantage, if only level or intensity differences were relayed from the two speakers. If the physical placement of the speakers was controlled, the inherent time-of-arrival differences between the speakers and ears could be used to fool the human hearing system into converting the source-signal intensity differences into perceived time-of-arrival differences, and hence into creating believable and stable stereo imaging. For this reason, Blumlein's stereophonic system was originally referred to as producing 'Intensity Stereo'.

Blumlein In Practice

A handy advantage of Blumlein's approach is that it is inherently mono-compatible: combining the two channels results in a clean mono mix, with no unwanted coloration. To work correctly, the physical relationship between the listener and the two speakers is constrained, such that they each sit at the corners of an equilateral triangle, typically with the length of each side between two and four metres, depending on the size of the speakers and the room. The interaction of the signals from both speakers arriving at each ear results in the creation of a new composite signal, which is identical in wave shape but shifted in time. The time-shift is towards the louder sound and creates a fake time-of-arrival difference between the ears, so the listener interprets the information as coming from a sound source at a specific bearing somewhere within a 60-degree angle in front. If the two speakers produce equally loud sounds, the signal combinations at both ears are identical, so there are no apparent time-of-arrival differences and the sound image is perceived to be directly in front of the listener, as a phantom centre image. Varying the relative levels of the two channels introduces apparent time-shifts, and offsets the perceived source position towards the louder side.

Alan Blumlein, the Godfather of modern stereophonic reproduction.
Although the exact level offset needed for a given position varies slightly with hearing acuity and the monitoring conditions, a figure of 12-16dB is generally sufficient to place a sound firmly over to the louder side. The inter-channel level differences required to create the illusion of a sound source somewhere between the speakers can be created artificially using a pan pot, of course, but real spatial information can also be captured when recording, using a coincident microphone array. If, rather than sitting at the apex of the ideal listener-speaker triangle, the listener moves over to one side, the stereo image quickly collapses into the nearer speaker, because the signal from the closer speaker arrives much earlier than that from the more distant one. The resulting physical time-of-arrival differences completely swamp those generated by the inter-channel level differences of the sounds they are reproducing.

The basic physics underpinning Blumlein's theory: the sound from each speaker takes longer to reach one ear than it does the other, due to distance and the shadowing effect of the head.

Panning

Let's move on to consider the different ways of controlling and manipulating the stereo image. The obvious starting point is the pan pot, originally called the 'panoramic potentiometer' and invented in 1938 by Disney's sound department as part of their pioneering work for the film Fantasia. The pan pot is a device with one input and two outputs, and varies the signal level reaching each output. When set to a central position, equal amounts of the input signal are passed to each output. There's no inter-channel level difference, so there's a phantom centre image. As the control is rotated towards one side, that output receives a constant amount of input signal, while the opposite side receives less and less. The resulting inter-channel level difference creates the required stereo image position from the speakers. On stereo channels, the pan pot is usually replaced with a balance control. This type of control usually leaves one channel at full level and progressively attenuates the other as it is turned, adjusting the relative level of an existing stereo signal's two channels rather than re-panning it.
Left: the best listening position for stereo sound, with an equilateral triangle between the listener's ears and the speakers. Middle: moving away from the points of the triangle, the stereo image is compromised. Right: sounds from each speaker reach each ear at different times, and this is the basis of stereo sound.
It follows from this that the balance between the Mid and Side signals determines stereo width. If the Side signal is removed completely, all that remains is a mono sum, and the resulting sound is often not quite what you might expect or hope for. For example, if the stereo recording was captured with spaced mics, or has timing differences between channels (such as those caused by an azimuth error on a tape machine), the mono signal may well sound dull compared with the stereo version. This is a surprisingly common issue with some samples and loops, and comes back to our old friend, mono compatibility! Increasing the level of the Side signal relative to the Mid increases the significance of the difference elements within the stereo image, giving the effect of a wider image: elements panned towards the edges become more dominant.

Stereo sound can be captured and conveyed either in the left/right format (used in conventional systems like mixing consoles and CD players), or in M/S format (which is used for FM radio broadcasts, and is effectively at the heart of stereo vinyl records). It's simple to convert between the two formats using a phase-amplitude matrix. The same process is used to create M/S from L/R, or L/R from M/S, and the necessary equations are:

Mid = (Left + Right) -3dB
Side = (Left - Right) -3dB
Left = (Mid + Side) -3dB
Right = (Mid - Side) -3dB

The 3dB attenuations are optional: they're included so that a complete round-trip process (say, L/R to M/S to L/R) doesn't result in an increase in signal level. Many matrix systems don't apply the attenuation as part of the conversion, so the overall level may need to be reduced manually after multiple passes through the matrix.

Creating An M/S Matrix Manually

Most DAWs include an M/S conversion plug-in, but plenty of third-party plug-ins can do the job, such as the freeware Voxengo MSED (www.voxengo.com/group/freevst), and hardware equivalents are available too. Using a dedicated conversion matrix is the easiest way to convert between the formats, but the matrices are trivially simple to create manually in hardware or software mixers. To convert L/R to M/S, you need to both sum the two channels together (which is exactly what a mixing bus does) and subtract them. For the subtraction, all that's required is to flip the polarity of one channel and then mix them together again: if the two channels are carrying the same material there will be no output (because there's no difference), and if they're carrying different things there will be an output.

The first thing to do is route the matrix input channels to a pair of buses (let's say 47 and 48). Route the left input equally to both buses, and duplicate the right input, with the original version going to bus 47 and the duplicate to bus 48. This duplicated channel also requires a polarity inversion. So, bus 47 receives the left and right inputs (L+R = Mid), while bus 48 receives the left input and a polarity-inverted right input (L-R = Side). Obviously, the two right channels must have perfectly matched signal levels, and the overall gain must equal that of the left channel through to the buses. Depending on the specific configuration of the mixer, it may therefore be necessary to fine-tune the signal path gains to match levels properly. Often, where a pan pot has to be used as part of the signal routing, the built-in attenuation (or gain) of the pan pot when panned fully to one side, or in the centre, will mess up the levels slightly.
So, having set the routing up, it's worth checking and adjusting the level from the inputs through to the buses with a reference alignment tone or similar. Exactly the same routing arrangements are used to convert back from M/S to L/R. The Mid signal feeds both left and right buses, while the Side signal is duplicated with a polarity inversion in the feed to the right bus. In this case, though, being able to adjust the level of the Side signal relative to the Mid gives you direct control over the stereo width.
Plug-ins such as Voxengo's freeware MSED perform the M/S matrix processing for you, but as you can see from the diagram, it's also relatively easy to set up the conversion process using a DAW or a desk.
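For readers who prefer to experiment offline rather than patching buses, the matrix maths above is easy to express in code. Below is a minimal Python/NumPy sketch (my own illustration, not part of the original article; the function names are arbitrary) implementing both directions of the conversion with the optional 3dB scaling, so that a complete round trip leaves the levels untouched.

import numpy as np

SCALE = 1.0 / np.sqrt(2.0)  # the '-3dB' term: 20*log10(1/sqrt(2)) is roughly -3dB

def lr_to_ms(left, right):
    """Sum-and-difference matrix: Mid = (L + R) -3dB, Side = (L - R) -3dB."""
    mid = (left + right) * SCALE
    side = (left - right) * SCALE
    return mid, side

def ms_to_lr(mid, side):
    """Inverse matrix: Left = (M + S) -3dB, Right = (M - S) -3dB."""
    left = (mid + side) * SCALE
    right = (mid - side) * SCALE
    return left, right

# Quick check with arbitrary test signals: a full round trip is transparent.
t = np.linspace(0, 1, 48000, endpoint=False)
left = np.sin(2 * np.pi * 440 * t)
right = 0.5 * np.sin(2 * np.pi * 554 * t)
mid, side = lr_to_ms(left, right)
l2, r2 = ms_to_lr(mid, side)
assert np.allclose(left, l2) and np.allclose(right, r2)

If the two scale factors are omitted, as many hardware matrices omit them, each pass through the matrix raises coherent material by 6dB, which is why the overall level may then need trimming by hand.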
Another simple application of fiddling with the balance of Mid and Side signals can be to help tame an over-reverberant stereo recording. Captured reverberation in a stereo signal is normally incoherent between the two channels, and so tends to exist mainly in the Side channel. Reducing the level of the Side signal can therefore reduce the apparent amount of audible reverberation to a useful degree. With judicious use of equalisation in the Side channel, it's often possible to rein in the more obvious reverberation elements without squeezing the wanted signal back to mono too. Lots of plug-ins are available to perform this kind of processing with greater or lesser degrees of sophistication, such as the various offerings from Brainworx, and there are also some hardware units, such as the Rupert Neve Designs Portico 5014 Stereo Field Editor, which do much the same thing.

In fact, the use of equalisation on the Side signal is fairly fundamental to the strength of M/S as a processing format. For example, a little boost across just the top octave or two (a shelf boost above 8-16kHz) generally adds width at the top end, making the mix sound a little more spacious, airy and open. If the stereo source being processed was captured with coincident mics, or derived using pan pots with multitracked sources, the image will become wider, but also remain precise and sharply focused. However, if the stereo signal was captured using spaced mics, increasing the Side signal will tend to blur the imaging even more than spaced-mic arrays do naturally, resulting in a wider, but less well-defined stereo image. In practice, this is unlikely to be an issue, and the perceived benefits of the extra width will probably outweigh the less accurate imaging.

Applying some LF cut to the Side signal has the effect of narrowing the bass, making it much easier to cut as a vinyl record, and often making it sound more cohesive and punchy at the bottom end too. Conversely, boosting the LF in the Side signal will make it sound much more spacious and natural, although it's a good idea to limit the boost to no more than 6dB with a shelf equaliser, with maximum boost below about 250Hz (and turning over below 600Hz). Once again, Blumlein got here first, with a process he called 'Shuffling' which, in effect, converts small LF phase differences between channels into useful level differences that enhance the stereo spread. This technique works very well with simple coincident mic arrays, and is well worth experimenting with.

You can also use the M/S domain to process the dynamics of a mix, taking advantage of the format to affect central sounds independently of more widely spaced sounds. For example, if the lead vocals (which are normally central) are a little too loud in the mix, compressing the Mid channel independently of the Side channel can often help to re-balance things without disturbing the more widely spaced backing singers, guitars and drums. Often, in this kind of application, though, it helps to use a multi-band approach, using equalisation to restrict the part of the spectrum over which the dynamics processor has effect. Again, the Brainworx bx_dynEQ allows this kind of approach, as does a new model called bx_shredspread, which is optimised for dealing with electric guitars. Another application combining equalisation and dynamics is the processing of sibilance in a vocal without affecting the brilliance and clarity of the more widely spaced cymbal crashes or guitar and keyboard parts.
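To make the 'EQ the Side signal' idea concrete, here is a short Python sketch of the top-end widening trick described above: a gentle high-shelf boost applied to the Side channel only, before decoding back to left/right. It is only an illustration under my own assumptions: the shelf frequency and gain are arbitrary starting points, the biquad coefficients follow the widely used RBJ 'cookbook' formulas rather than anything specified in the article, and SciPy is assumed for the filtering.

import numpy as np
from scipy.signal import lfilter

def high_shelf(x, fs, f0=8000.0, gain_db=3.0):
    """RBJ 'cookbook' high-shelf biquad applied to a mono signal array."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    cosw, sinw = np.cos(w0), np.sin(w0)
    alpha = sinw / 2.0 * np.sqrt(2.0)            # shelf slope S = 1
    b = np.array([A * ((A + 1) + (A - 1) * cosw + 2 * np.sqrt(A) * alpha),
                  -2 * A * ((A - 1) + (A + 1) * cosw),
                  A * ((A + 1) + (A - 1) * cosw - 2 * np.sqrt(A) * alpha)])
    a = np.array([(A + 1) - (A - 1) * cosw + 2 * np.sqrt(A) * alpha,
                  2 * ((A - 1) - (A + 1) * cosw),
                  (A + 1) - (A - 1) * cosw - 2 * np.sqrt(A) * alpha])
    return lfilter(b / a[0], a / a[0], x)

def widen_top_end(left, right, fs, f0=8000.0, gain_db=3.0):
    """Shelve up only the Side channel above f0, then return to left/right."""
    mid = 0.5 * (left + right)                   # L/R -> M/S (0.5 scaling so that
    side = 0.5 * (left - right)                  # the inverse below is exact)
    side = high_shelf(side, fs, f0, gain_db)
    return mid + side, mid - side                # M/S -> L/R

Swapping the shelf for a high-pass filter, or simply multiplying the whole Side signal by a gain below 1.0, gives the bass-narrowing and de-reverberation treatments mentioned above.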
Some of the more sophisticated M/S processing plug-ins also allow some manipulation of the phase relationship between the Mid and Side signals. This affects how they recombine when converting back to L/R stereo, and has the effect of altering the perceived depth of the stereo image, essentially allowing central sources to be pulled forward or pushed back relative to the edge sources. Again, the Portico Stereo Field Editor includes this kind of feature.

Faking Stereo With M/S

A common requirement is to create a stereo effect from a mono source, and there are several different ways of achieving this, each with various pros and cons. One very effective, and totally mono-compatible, solution is to treat the mono source as the Mid element of a Mid/Side stereo signal, create a fake Side signal to go with it, and then decode them together to form a normal left-right stereo signal. This is essentially what most stereo enhancers actually do, though possibly with a few extra bells and whistles thrown in...
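The article doesn't prescribe a particular recipe for the fake Side signal, but by way of illustration, the Python sketch below derives one from a short delay of the mono source, high-passed so that the bass stays anchored in the centre and set well below the Mid in level. The delay time, filter frequency and Side level are all arbitrary assumptions to experiment with. Because the Mid is the untouched mono original, the decoded left and right channels sum straight back to it, which is what makes the trick mono-compatible.

import numpy as np
from scipy.signal import butter, lfilter

def fake_stereo(mono, fs, delay_ms=12.0, side_level_db=-9.0, hpf_hz=200.0):
    """Treat a mono track as 'Mid' and synthesise a 'Side' from a delayed,
    high-passed, attenuated copy, then decode to left/right.
    All parameter values are arbitrary starting points, not magic numbers."""
    # Fake Side: a delayed copy of the mono source...
    delay = int(fs * delay_ms / 1000.0)
    side = np.concatenate([np.zeros(delay), mono[:len(mono) - delay]])
    # ...with the low frequencies removed so the bass stays central...
    b, a = butter(2, hpf_hz / (fs / 2.0), btype='highpass')
    side = lfilter(b, a, side)
    # ...and sitting well below the Mid in level, to keep the effect subtle.
    side *= 10.0 ** (side_level_db / 20.0)
    # M/S -> L/R decode. The mono sum (L + R) is simply 2 x Mid, so the
    # result collapses back to the original source when heard in mono.
    left = mono + side
    right = mono - side
    return left, right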
Panning a signal inherently boosts or attenuates it, and various pan laws have been developed to avoid the problems that this creates.
There are several different pan pot laws in common use that determine the relationship between the control rotation and the inter-channel level difference, generally following a sine/cosine relationship. An important aspect of the panning law is the actual output level from each channel when the pan pot is at the centre position, and common options are 3, 4.5 or 6dB of attenuation. Why the different options?

In systems where mono compatibility is critically important, such as in broadcast environments, it makes sense to ensure that the signal level of a source doesn't vary as it is panned across the sound stage. When panned centrally, the input signal is obviously sent to both output channels, and when summed to mono those two channels are added together. Summing two identical signals electrically results in a 6dB increase in level, so to maintain a constant derived mono signal level regardless of pan position, the centre point of the pan pot needs to attenuate both outputs by 6dB relative to the extreme edge positions. This is called a 'constant-voltage' law. Some systems achieve this level variation by passing the input signal unchanged when panned fully to one side or the other, but attenuating it progressively as the pan nears the centre. Others increase the signal gain as the input is panned towards the edges, and some use a combination of both techniques. There are advantages and disadvantages to each approach, but the important aspect is that when panned centrally, the level at both outputs is reduced relative to the level when panned fully to one side.

As we have seen, for ideal mono compatibility, 6dB of attenuation is required for centrally panned sources. However, when listening to stereo speakers their outputs combine acoustically, not electrically, and this implies that a different attenuation amount is required. When reproducing the same signal from both speakers, the perceived level increases by only about 3dB compared with the same signal from one speaker only. So to maintain a constant perceived level as the pan pot is rotated, the centre-position attenuation needs to be just 3dB rather than 6dB. This is called a 'constant-power' law.

While a lot of modern DAWs allow the user to choose the most appropriate pan law for their application, it isn't very practical to reconfigure analogue mixing consoles for different pan laws. So a compromise option has been widely used for many decades, providing 4.5dB of attenuation for central sources. For someone listening in mono, panning a source across the sound stage using this compromise law will result in a barely noticeable 1.5dB bulge around the central position. A stereo listener would hear a similarly marginal drop in level across the centre. In practice, these pan-level errors are usually negligible and few casual listeners even notice. Nevertheless, it does pay to be aware of the effect of different panning laws, as they do affect the fine balance of panned elements within a mix when auditioned in mono and stereo.
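To put some numbers on this, here is a small Python sketch (mine, not from the article) of a pan pot whose centre attenuation can be set to any of the three laws discussed. It uses a generalised sine/cosine law, raising the gains to a power chosen so that the requested centre drop is met exactly; the extreme positions always pass the signal at unity gain.

import numpy as np

def pan_gains(position, centre_drop_db=3.0):
    """Return (left_gain, right_gain) for a pan position from -1 (hard left)
    to +1 (hard right). centre_drop_db sets the attenuation at the centre:
    3.0 gives a constant-power (sin/cos) law, 6.0 a constant-voltage law,
    and 4.5 the common analogue-console compromise."""
    theta = (position + 1.0) * np.pi / 4.0                 # 0 .. pi/2 across the range
    p = centre_drop_db / (20.0 * np.log10(np.sqrt(2.0)))   # exponent: ~1, ~1.5 or ~2
    return np.cos(theta) ** p, np.sin(theta) ** p

# Centre-position behaviour for the three laws discussed in the text.
for law in (3.0, 4.5, 6.0):
    gl, gr = pan_gains(0.0, centre_drop_db=law)
    print(f"{law:>4.1f}dB law: centre gain = {20 * np.log10(gl):6.2f}dB per channel, "
          f"mono sum = {20 * np.log10(gl + gr):+5.2f}dB")

Running it confirms the trade-off described above: the 3dB law sums about 3dB hot in mono for a centrally panned source, the 6dB law sums flat, and the 4.5dB compromise leaves the familiar 1.5dB bulge.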
When using two dynamics processors, such as limiters or compressors, to process a normal left-right stereo signal, it is normally essential that the gain reduction is applied identically to both channels. In practice, this is achieved with a stereo-link switch, which ensures that the control voltages generated by the two side-chains are applied equally to both channels. Depending on the design of the units, one channel might provide complete control of all parameters (threshold, ratio, attack, release and so on) for both units, but more usually each unit has to be adjusted for identical settings, so that both side-chains work in the same way. Linking the two channels modifies the overall level of the stereo signal in the appropriate way, and the stereo image is not affected.

If the two compressors were not linked, a loud sound on the extreme left-hand side, say, would trigger the left-channel compressor to reduce the gain, while the right-hand channel's compressor would not react at all. As a result, central sources (which should have equal level in both channels) would appear to rush to the right-hand side, as they would be louder in the right channel than in the compressed left channel. As the left channel's compressor dumped the gain reduction, the image would drift back towards the centre. Clearly, this kind of image shifting is normally very undesirable, hence the need to link compressors when processing left-right stereo.

However, when using dynamics for Mid/Side processing, the aim is deliberately to process the Mid and Side channels independently, specifically to achieve a rebalancing effect. So the stereo-linking facility is not used when processing Mid/Side signals, and if the Mid channel is compressed slightly without corresponding compression of the Side channel, the re-converted left-right stereo signal will appear to 'breathe' in and out: the stereo width will tend to vary as the processing is applied. Fortunately, most people are very insensitive to this kind of image variation, and perceive only the altered dynamics, which will have the effect of rebalancing the centre and edge sounds.
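As a rough sketch of where the link actually sits in the signal flow, the Python below implements a deliberately simplified feed-forward peak compressor. When linked, the two side-chains are combined (here by taking the louder of the two envelopes) and the single resulting gain curve is applied to both channels, so the left/right balance, and therefore the image, is preserved; unlinked, each channel gets its own gain curve and the image-shifting described above can occur. The attack, release, threshold and ratio figures are arbitrary defaults, and this is not intended as a model of any particular hardware design.

import numpy as np

def envelope(x, fs, attack_ms=5.0, release_ms=100.0):
    """Simple one-pole peak envelope follower."""
    att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    env = np.zeros_like(x)
    level = 0.0
    for i, s in enumerate(np.abs(x)):
        coeff = att if s > level else rel
        level = coeff * level + (1.0 - coeff) * s
        env[i] = level
    return env

def compress_stereo(left, right, fs, threshold_db=-20.0, ratio=4.0, linked=True):
    """Peak compressor. When 'linked', one gain curve (driven by the louder
    channel) is applied to both sides, so the stereo image does not shift."""
    eps = 1e-12
    env_l = envelope(left, fs)
    env_r = envelope(right, fs)
    if linked:
        env_l = env_r = np.maximum(env_l, env_r)   # shared side-chain
    def gain_curve(env):
        level_db = 20.0 * np.log10(env + eps)
        over_db = np.maximum(level_db - threshold_db, 0.0)
        return 10.0 ** (-over_db * (1.0 - 1.0 / ratio) / 20.0)
    return left * gain_curve(env_l), right * gain_curve(env_r)

For M/S dynamics processing of the kind described above, you would simply run the Mid and Side channels through separate, unlinked instances with different settings.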
Before considering the various ways of processing stereo material, it would help to understand what stereo actually is and how it works. Our human hearing system, although only equipped with two sound receptors, is capable of detecting and locating (with varying degrees of accuracy) sounds in the full sphere surrounding us, and it does this using a combination of three basic techniques: time-of-arrival and phase differences, level differences, and spectral analysis.

The primary method involves the detection of time-of-arrival differences of a sound at each ear. A sound wave takes roughly one millisecond to travel one foot (roughly 3ms per metre). So for any sound source that is not directly in front of or behind the head, sound waves will reach one ear fractionally before the other. The average spacing of an adult's ears is a little over six inches (15cm), suggesting that the largest possible time-of-arrival difference for a sound to reach each ear is about 0.5ms. In practice, most people can determine the bearing of a sound in front of them (in the horizontal plane) to within about two degrees, and that corresponds to a time-of-arrival difference of less than 0.01 milliseconds! The brain is believed to latch on to the leading edge of a transient sound as the reference point for measuring time-of-arrival differences, and in normal life most sounds contain plentiful transient information on which to base arrival-time difference measurements. It follows, though, that if a sound does not contain transients, a time-of-arrival difference cannot be calculated, and this is revealed clearly when trying to locate the acoustic source of a continuous tone signal.

While time-of-arrival differences provide left-right positioning information, they're unable to differentiate between frontal and rearward sounds. The time-of-arrival difference for a sound at 45 degrees (front-right) to the listener is the same as that for a sound approaching from 135 degrees (the same angle behind, to the back-right). This inherent ambiguity is resolved by small, automatic and unconscious movements of the head, which augment the time-of-arrival information. By rotating and tilting the head slightly, the time-of-arrival difference changes: for example, rotating the head to the left will decrease the time-of-arrival difference if the source is in front, but will increase it if the source is behind the listener. If you are unable to move your head it becomes almost impossible to tell whether a sound source is in front or behind, and this may be the
Real-world Examples
There's plenty of commercially released material in which you can hear interesting stereo-manipulation processes at work, or where mono and stereo playback affect the tonality differently, due to phase cancellation when summed to mono. I've commented on a few examples below.

Madonna: 'Vogue', from the album The Immaculate Collection. This album mostly contains Madonna's hit tracks remixed using a processing system called QSound, from Archer Communications (now QSound Labs), which, it is claimed, gives a 3D soundstage via stereo replay systems. It certainly produces an impressively large and wide stereo sound, but is generally less satisfactory (or even unpleasant) in mono. 'Vogue' has huge keyboard pads from the start, the tonality of which changes significantly, and not for the better, when auditioned in mono. Many other instruments also change character in detrimental ways, including some vocal lines and percussion.

Eurythmics: 'Ball and Chain', from the album Be Yourself Tonight. Midway through this album track, there is a sequence where the main motif starts to revolve in a circular motion around the listener. It's a very powerful effect on headphones, and actually remains quite effective on stereo speakers. However, when auditioned in mono, the volume of the track dips to nothing repeatedly. The reason is that the rotating effect is achieved with a co-ordinated combination of autopanning and polarity reversals, the latter being responsible for the level nulls when the pan passes through the centre!

Nirvana: 'Smells Like Teen Spirit', from the album Nevermind. Andy Wallace's mix of this song may be impressive, but if you compare the track in stereo and mono, you'll notice that the tonality of the guitar parts changes significantly, with an obvious attenuation of the higher harmonic frequencies when listened to in mono.

In all of these examples, the mono version is perfectly acceptable, but the sound quality or character is definitely compromised in comparison with the stereo original. They provide good illustrations of how the engineer, producer or artist has struck a balance between creating interest in the stereo version and keeping a solid and acceptable mono mix balance.