
Naudio: Low Latency Audio Recording, Such As That Found in Digital Audio Workstation Software Used in

This document discusses recording audio from a microphone in .NET. It describes how to set up audio recording using NAudio to access the Windows audio APIs, check the recording level by visualizing it, and adjust the recording volume level. Key aspects covered include initializing recording with a chosen sample rate and device, processing audio data in a callback, displaying the level on a progress bar using MVVM bindings, and controlling the input volume with a slider.

Uploaded by

dharshbtech
Copyright
© Attribution Non-Commercial (BY-NC)

In this article I demonstrate how to record from the microphone in .NET, with support for setting
the recording level, trimming noise from the start and end, visualizing the waveform in WPF and
converting to MP3.

Audio Recording in .NET


The .NET framework does not provide any direct support for recording audio, so I will make use
of the open source NAudio project, which includes wrappers for a number of Windows audio
recording APIs.

Note: It is important to point out that .NET is not an appropriate choice for high sample rate and
low latency audio recording, such as that found in Digital Audio Workstation software used in
recording studios. This is because the .NET garbage collector can interrupt the process at any
point. However, for purposes of recording speech from the microphone, the .NET framework is
more than capable. By default, NAudio asks the soundcard to give us data every 100ms, which
gives plenty of time for the garbage collector to run as well as our own code.

We will make use of the wrappers for the waveIn APIs, as these are the most universally
supported, and allow us freedom to choose the sample rate. We will record in mono, 16 bit at
8kHz, which is more than good enough audio quality for speech, and will not overly tax the
processor, which is important as we want to visualize the waveform as well.

Choosing a Capture Device

Normally, you will be able to use the default audio capture device without any difficulties, but
should you need to offer the user a choice, NAudio will allow you to do so. You can use
WaveIn.DeviceCount and WaveIn.GetCapabilities to find out how many recording
devices are present, and query for their name and number of supported channels.

On my computer, I have a single waveIn device (Microphone Array) until I plug my headset in,
at which point, a new device appears and becomes the default (device 0 is always the default).

int waveInDevices = WaveIn.DeviceCount;
for (int waveInDevice = 0; waveInDevice < waveInDevices; waveInDevice++)
{
    WaveInCapabilities deviceInfo = WaveIn.GetCapabilities(waveInDevice);
    Console.WriteLine("Device {0}: {1}, {2} channels",
        waveInDevice, deviceInfo.ProductName, deviceInfo.Channels);
}

This produces the following output on my computer:

Device 0: Microphone / Line In (SigmaTel , 2 channels
Device 1: Microphone Array (SigmaTel High, 2 channels

Unfortunately these device names are truncated because the WAVEINCAPS structure only
supports 31 characters. There is a way of getting the full device name, but it is rather convoluted.
Normally, you will choose Device 0 (the default), but if you wish to select a different input
device, simply set the DeviceNumber property on your WaveIn object to the desired number.

Checking the Recording Level

The first step in recording is usually to help the user determine if their microphone is working or
not. This is especially important if the user has more than one input on their soundcard. The way
we achieve this is simply by starting recording and displaying the level of audio detected to the
user with a volume meter. The waveIn APIs do not write anything to disk, so no audio is actually
being ‘recorded’ at this point; we are simply examining the input level and then throwing the
captured audio samples away.

To begin capturing audio from the soundcard, we use the WaveIn class in NAudio. We
configure it with the WaveFormat in which we would like to record (in our case 8kHz mono),
before calling StartRecording, to start capturing audio from the device.

waveIn = new WaveIn();
waveIn.DeviceNumber = selectedDevice;
waveIn.DataAvailable += waveIn_DataAvailable;
int sampleRate = 8000; // 8 kHz
int channels = 1; // mono
waveIn.WaveFormat = new WaveFormat(sampleRate, channels);
waveIn.StartRecording();

The DataAvailable event handler will notify us whenever a buffer of audio has been returned to
us from the sound card. The data comes back as an array of bytes, representing PCM sample
data. This is fine if we are planning to write the audio directly to disk, but what if we wish to
have a look at the audio data itself? Each audio sample is 16 bits, i.e. two bytes, meaning that we
will need to convert pairs of bytes into shorts to be able to make sense of the data.

Note: if we were recording in stereo, the 16 bit samples would themselves come in pairs, first the
left sample, then the right sample.

The following code shows how we might process the raw bytes in the DataAvailable event, and
read the individual audio samples out. Notice that we use the BytesRecorded property, not the
buffer's Length property, since the buffer may not be completely full. Also, I have chosen to
convert the samples to 32 bit floating point format and scale them so the maximum volume is
1.0f. This makes processing them through effects and visualizing them much easier.

void waveIn_DataAvailable(object sender, WaveInEventArgs e)
{
    for (int index = 0; index < e.BytesRecorded; index += 2)
    {
        short sample = (short)((e.Buffer[index + 1] << 8) |
                                e.Buffer[index + 0]);
        float sample32 = sample / 32768f;
        ProcessSample(sample32);
    }
}
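
For the stereo case mentioned in the note above, the same byte-pair conversion applies, but the samples alternate left, right, left, right. The following is a minimal sketch rather than code from the sample application: the PcmConverter class and its method names are hypothetical, and it assumes 16 bit little-endian PCM as used in the listing above.

```csharp
using System;

// Hypothetical helper; not part of NAudio or the sample application.
public static class PcmConverter
{
    // Converts one little-endian 16 bit PCM sample starting at 'index'
    // into a float in the range [-1.0, 1.0).
    public static float ToFloat(byte[] buffer, int index)
    {
        short sample = (short)((buffer[index + 1] << 8) | buffer[index]);
        return sample / 32768f;
    }

    // Splits interleaved stereo PCM (left, right, left, right, ...) into
    // separate channel arrays. Only the first 'bytesRecorded' bytes are read.
    public static void SplitStereo(byte[] buffer, int bytesRecorded,
        out float[] left, out float[] right)
    {
        const int bytesPerFrame = 4; // 2 channels x 2 bytes per sample
        int frames = bytesRecorded / bytesPerFrame;
        left = new float[frames];
        right = new float[frames];
        for (int frame = 0; frame < frames; frame++)
        {
            int offset = frame * bytesPerFrame;
            left[frame] = ToFloat(buffer, offset);
            right[frame] = ToFloat(buffer, offset + 2);
        }
    }
}
```

In a stereo DataAvailable handler you would call SplitStereo(e.Buffer, e.BytesRecorded, out left, out right) and then process each channel separately.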

Note: One complication of using the waveIn and waveOut APIs is deciding on a callback
mechanism. NAudio offers three options. First is function callbacks. This means that the waveIn
API is given a (pinned) function pointer which it calls back onto. This means that your
DataAvailable callback will come in on a background thread. In some ways this is the cleanest
approach, but you need to beware of rogue soundcard drivers that can hang in calls to
waveOutReset when using function callbacks (the SoundMAX chipset found on a lot of laptops
is particularly prone to this problem).

The second is to supply a window handle. The waveIn APIs will post a message back to be
handled on the message queue of that window handle. This method tends to be the most reliable
and most commonly used. One gotcha to watch out for is that if you stop recording and
immediately restart, a message from the old recording session could get handled in the new
session resulting in a nasty exception.

The third is to let NAudio create its own new window and post messages to that. This gets round
any danger of messages from one recording session getting muddled up with another. This is the
callback method that NAudio will use by default if you call the default WaveIn constructor. But
don't use this from a background thread or from a console application, or the new window that
NAudio creates won't actually get round to processing its message queue.

Visualizing the Recording Level

We have seen how we can begin to capture audio from the soundcard for the purposes of
checking the recording level. Now we need to give the user some visual feedback. We will use
WPF for our sample recording application. The simplest control we have available to display a
single numeric value graphically is the ProgressBar. And because it is WPF, we can fully
customize the graphical appearance of the progress bar to look a little more like a volume meter.
I have used a gradient going from green to red to show the current volume level. You can read
more about how I created this ProgressBar template here.

Figure 1 - A Progress Bar Showing the Current Microphone Volume Level

To help provide the volume level to display, I have created a SampleAggregator class. This is
passed every audio sample value we receive and keeps track of the maximum and minimum
values. Then, after a specified number of samples, it raises an event allowing the GUI
components to respond. We need to be careful not to raise too many of these events or
performance will be badly affected. I am raising one every 800 samples, meaning we get 10
updates per second to the screen.

Because I am using data binding, when one of these updates fires, I must raise a
PropertyChanged event on my DataContext object (also known as the “ViewModel” in the
MVVM pattern). Here's the XAML syntax for binding to my CurrentInputLevel property:

<ProgressBar Orientation="Horizontal"
             Value="{Binding CurrentInputLevel, Mode=OneWay}"
             Height="20" />

And here's the code in the ViewModel that ensures that the GUI updates whenever we calculate a
new maximum input level:

private float lastPeak;

void recorder_MaximumCalculated(object sender, MaxSampleEventArgs e)
{
    lastPeak = Math.Max(e.MaxSample, Math.Abs(e.MinSample));
    RaisePropertyChangedEvent("CurrentInputLevel");
}

// multiply by 100 because the Progress bar's default maximum value is 100
public float CurrentInputLevel { get { return lastPeak * 100; } }
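
For reference, the SampleAggregator that feeds this handler could look something like the sketch below. The type names SampleAggregator and MaxSampleEventArgs, and the MaxSample and MinSample properties, come from the article. The Add method, the NotificationCount property and the placement of the MaximumCalculated event on the aggregator itself are assumptions made for illustration, so the class in the sample download may differ.

```csharp
using System;

public class MaxSampleEventArgs : EventArgs
{
    public MaxSampleEventArgs(float minSample, float maxSample)
    {
        MinSample = minSample;
        MaxSample = maxSample;
    }

    public float MinSample { get; private set; }
    public float MaxSample { get; private set; }
}

public class SampleAggregator
{
    private float maxValue;
    private float minValue;
    private int count;

    public SampleAggregator()
    {
        // 800 samples gives 10 notifications per second at 8 kHz
        NotificationCount = 800;
    }

    // Assumed name: number of samples to gather before raising the event.
    public int NotificationCount { get; set; }

    public event EventHandler<MaxSampleEventArgs> MaximumCalculated;

    // Assumed name: called once for every sample read in DataAvailable.
    public void Add(float sample)
    {
        maxValue = Math.Max(maxValue, sample);
        minValue = Math.Min(minValue, sample);
        count++;
        if (count >= NotificationCount)
        {
            EventHandler<MaxSampleEventArgs> handler = MaximumCalculated;
            if (handler != null)
            {
                handler(this, new MaxSampleEventArgs(minValue, maxValue));
            }
            // start a fresh measurement window
            count = 0;
            maxValue = 0;
            minValue = 0;
        }
    }
}
```

Keeping the notification interval configurable makes it easy to trade display smoothness against the cost of raising events.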

Note: Model View ViewModel (MVVM) is a pattern that is growing in popularity amongst WPF
and Silverlight developers. The basic idea is that you have no code behind whatsoever on your
View (i.e. your xaml markup file), and simply specify all communications with your business
logic by means of data binding. The ViewModel serves as an adapter to ease the process of data
binding. This approach gives very good separation of appearance and behavior. For the most
part, this pattern works very well, but there are a few tricky areas, for which you will need to
either write a few lines of code behind, or make use of some cunning tricks such as attached
dependency properties or custom triggers. There are several excellent open source helper
libraries that can take some of the work out of getting an MVVM application up and running.
Have a look here for a comprehensive list.

Adjusting the Recording Level

Suppose the current input level is too high or too low. We would like to be able to support
modifying the recording level. Again, we would like to use data binding to do so, so we will add
a volume slider to our XAML:

<Slider Orientation="Horizontal"
        Value="{Binding MicrophoneLevel, Mode=TwoWay}"
        Maximum="100"
        Margin="5" />

Now we have to get hold of the MixerLine that will allow us to access the input volume control
for our waveIn device. This requires us to make use of the Windows mixer APIs, which also
have wrappers in NAudio. Getting hold of this volume control is not always as straightforward as
you might hope (and can require different approaches for XP and Vista), but the following is
code that seems to work on most systems:

private void TryGetVolumeControl()
{
    int waveInDeviceNumber = 0;
    var mixerLine = new MixerLine((IntPtr)waveInDeviceNumber,
                                  0, MixerFlags.WaveIn);
    foreach (var control in mixerLine.Controls)
    {
        if (control.ControlType == MixerControlType.Volume)
        {
            volumeControl = control as UnsignedMixerControl;
            break;
        }
    }
}

Now we can use the Percent property on the UnsignedMixerControl to set volume to a value
anywhere between 0 and 100.
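
To connect the slider binding to the volume control, the ViewModel needs a MicrophoneLevel property. The sketch below is one possible shape for it, not the code from the sample application. The VolumeViewModel class name is hypothetical, and the two delegates stand in for reading and writing volumeControl.Percent so that the example runs without NAudio.

```csharp
using System;
using System.ComponentModel;

// Hypothetical ViewModel fragment; the delegates stand in for the
// Percent property of the control found by TryGetVolumeControl.
public class VolumeViewModel : INotifyPropertyChanged
{
    private readonly Func<double> getPercent;
    private readonly Action<double> setPercent;

    public VolumeViewModel(Func<double> getPercent, Action<double> setPercent)
    {
        this.getPercent = getPercent;
        this.setPercent = setPercent;
    }

    public event PropertyChangedEventHandler PropertyChanged;

    // Bound two-way to the Slider; values are clamped to the range 0..100.
    public double MicrophoneLevel
    {
        get { return getPercent(); }
        set
        {
            double clamped = Math.Max(0, Math.Min(100, value));
            setPercent(clamped);
            PropertyChangedEventHandler handler = PropertyChanged;
            if (handler != null)
            {
                handler(this, new PropertyChangedEventArgs("MicrophoneLevel"));
            }
        }
    }
}
```

In the real application the delegates might be supplied as () => volumeControl.Percent and v => volumeControl.Percent = v, after checking that a volume control was actually found.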

Starting Recording

Now we have got our recording levels set up correctly, we are ready to actually start recording.
But since we have already opened our waveIn device, all we need to do is start writing the data
we have received into a file.

NAudio has a class called WaveFileWriter which will allow us to write our recorded data to a
file. For now, we will write it to a temporary file in PCM format, and convert it later into a better
compressed format such as MP3. The following code creates a new WAV file:

writer = new WaveFileWriter(waveFileName, recordingFormat);

Now we can write to the file as we receive notifications from the waveIn device:

void waveIn_DataAvailable(object sender, WaveInEventArgs e)
{
    if (recordingState == RecordingState.Recording)
        writer.WriteData(e.Buffer, 0, e.BytesRecorded);

    // ...
}

Note: There are three main options for how to store audio while it is being recorded. First, you
can write it to a MemoryStream. This saves the inconvenience of dealing with a temporary file,
but you need to be careful not to run out of memory. Also, if your recording program crashes
half way through, you have lost everything. At the sample rate we are using for this demo, one
minute of audio takes just under 1 MB of memory, but if you were recording at 44.1kHz stereo
(the standard for music), you would need about 10 MB per minute.

Second, you can write to a temporary WAV file to be converted to another format later, as we
are doing here. While this is not a disk space efficient format, it is very easy to work with, and
particularly useful if you are planning to apply any effects or edit the audio in any way after
recording.

Third, you can pass the audio directly to an encoder (such as WMA or MP3) as it is being
recorded. This might be the best option if you are making a longer recording, and have no need
to edit it after recording.
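
The memory figures quoted in the note above follow directly from the PCM format: bytes per second is the sample rate times the number of channels times the bytes per sample. A small sketch of that arithmetic follows; the PcmSize class is a hypothetical helper, not part of NAudio.

```csharp
using System;

// Hypothetical helper illustrating the storage arithmetic.
public static class PcmSize
{
    // Bytes of uncompressed PCM produced by one minute of recording.
    public static long BytesPerMinute(int sampleRate, int channels,
        int bitsPerSample)
    {
        long bytesPerSecond = (long)sampleRate * channels * (bitsPerSample / 8);
        return bytesPerSecond * 60;
    }
}
```

BytesPerMinute(8000, 1, 16) gives 960,000 bytes, which is the "just under 1 MB" for this demo, and BytesPerMinute(44100, 2, 16) gives 10,584,000 bytes, or roughly 10 MB.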

Stopping Recording

Obviously we will stop when the user clicks the stop recording button, but we might also want to
set a maximum recording duration to stop the user inadvertently filling up their hard disk. For
this example, we will allow one minute of recording.

long maxFileLength = this.recordingFormat.AverageBytesPerSecond * 60;

int toWrite = (int)Math.Min(maxFileLength - writer.Length,
                            bytesRecorded);
if (toWrite > 0)
    writer.WriteData(buffer, 0, toWrite); // write only what still fits under the limit
else
    Stop();

Note: Something that can be slightly confusing for users is that when using window callbacks
with WaveIn, the last bit of audio you recorded comes in after you have asked recording to stop,
so make sure you don't close the file you are saving to until you have got all the audio back. The
RecordingStopped event on the WaveIn object will help you determine when it is safe to close
the WaveFileWriter and clean up your resources.

Visualizing the Wave Form

It is often desirable to display the audio waveform to the user. Displaying the waveform while
you are recording is sometimes called “confidence recording”, because it allows you to see that
audio is being recorded as expected and the levels are still right.

There are a variety of possible approaches for drawing audio waveforms. The simplest is to draw
a vertical line showing the minimum and maximum values every time our sample aggregator
fires:

Figure 2 - Audio Waveform using vertical lines

At first glance it may seem that this would be trivial to implement in WPF, but there is a real
danger of consuming too many resources. For example, simply adding a new line to a Canvas
every time a new maximum sample is calculated performs very badly, so it is better to have a
fixed number of vertical lines and resize them dynamically.

Another approach is to create a polygon. This requires us to add two points to a Polygon's Points
collection every time we receive a new sample. The trick is to add these points in the middle of
the Points collection, rather than at the end, so that the end result is a single shape. This means
our waveform can have a different outline color and fill color. To stop the edges from appearing
too jagged, we plot points two units apart along the X axis.

Figure 3 - Audio Waveform rendered using a Polygon

Like the microphone volume meter, the waveform drawing control needs to receive several
notifications a second of the maximum and minimum sample values received by the
SampleAggregator. When each sample value is received, we either insert new points into our
polygon, or, if the whole screen is full, we go back to the left-hand edge and continue drawing
from there.

For the confidence recording display I have used the Polygon method, which is in a class called
PolygonWaveFormControl. Here's the code which calculates the new points or updated point
locations as we receive a new maximum sample:

public void AddValue(float maxValue, float minValue)
{
    int visiblePixels = (int)(ActualWidth / xScale);
    if (visiblePixels > 0)
    {
        CreatePoint(maxValue, minValue);

        if (renderPosition > visiblePixels)
        {
            renderPosition = 0;
        }
        int erasePosition = (renderPosition + blankZone) %
                            visiblePixels;
        if (erasePosition < Points)
        {
            double yPos = SampleToYPosition(0);
            waveForm.Points[erasePosition] =
                new Point(erasePosition * xScale, yPos);
            waveForm.Points[BottomPointIndex(erasePosition)] =
                new Point(erasePosition * xScale, yPos);
        }
    }
}

private void CreatePoint(float topValue, float bottomValue)
{
    double topYPos = SampleToYPosition(topValue);
    double bottomYPos = SampleToYPosition(bottomValue);
    double xPos = renderPosition * xScale;
    if (renderPosition >= Points)
    {
        int insertPos = Points;
        waveForm.Points.Insert(insertPos, new Point(xPos, topYPos));
        waveForm.Points.Insert(insertPos + 1, new Point(xPos,
            bottomYPos));
    }
    else
    {
        waveForm.Points[renderPosition] = new Point(xPos, topYPos);
        waveForm.Points[BottomPointIndex(renderPosition)] =
            new Point(xPos, bottomYPos);
    }
    renderPosition++;
}
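
The listing above relies on helpers (Points, BottomPointIndex and SampleToYPosition) that are not shown in the article. The sketch below is one plausible way they could work, modelled on a plain list so that it runs without WPF. The PolygonPointsModel class, its Columns and Append members, and the exact scaling in SampleToYPosition are assumptions for illustration. The idea is that top points run left to right from the start of the list while bottom points mirror them from the end, so inserting in the middle keeps the outline a single closed shape.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the polygon's point list. Tuples of (x, y) are
// used in place of System.Windows.Point so the sketch runs without WPF.
public class PolygonPointsModel
{
    private readonly List<Tuple<double, double>> points =
        new List<Tuple<double, double>>();
    private readonly double height;

    public PolygonPointsModel(double height)
    {
        this.height = height;
    }

    // Number of X positions drawn so far; each has a top and a bottom point.
    public int Columns
    {
        get { return points.Count / 2; }
    }

    public Tuple<double, double> this[int index]
    {
        get { return points[index]; }
    }

    // Bottom points are stored in reverse order at the end of the list.
    public int BottomPointIndex(int column)
    {
        return points.Count - column - 1;
    }

    // Maps a sample in [-1, 1] to a Y coordinate: +1 is the top edge (0),
    // -1 is the bottom edge (height), and silence is the centre line.
    public double SampleToYPosition(float sample)
    {
        return height / 2 - sample * (height / 2);
    }

    // Adds a new column by inserting both points in the middle of the list.
    public void Append(double x, float maxValue, float minValue)
    {
        int insertPos = Columns;
        points.Insert(insertPos, Tuple.Create(x, SampleToYPosition(maxValue)));
        points.Insert(insertPos + 1, Tuple.Create(x, SampleToYPosition(minValue)));
    }
}
```

After two calls to Append the list order is top 0, top 1, bottom 1, bottom 0, which is the order a Polygon needs to trace the upper edge and then return along the lower edge.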

The erase position calculation is to blank out some previous sample values to make it obvious
where the new data is appearing after we have wrapped around once:

Figure 4 - PolygonWaveForm control's “blank zone”

Note: There are faster ways to perform rendering in WPF. One option is to use the
WriteableBitmap class and draw directly onto it. This could be a good approach if you were
using the vertical lines method of rendering. The second is to use DrawingVisual objects, which
are lightweight drawing objects offering better performance than using classes derived from
Shape. The down-side is the loss of features such as DataBinding and the ability to fully describe
the picture in XAML, but for WaveForm drawing this is not really a drawback. I use the
DrawingVisual method in the Save Audio part of this application.

Another challenge was how the waveform drawing control could receive notifications since I am
using MVVM so I have no direct access to the SampleAggregator. A simple way around this was
to create a Dependency Property on PolygonWaveFormControl:

public static readonly DependencyProperty SampleAggregatorProperty =
    DependencyProperty.Register(
        "SampleAggregator",
        typeof(SampleAggregator),
        typeof(PolygonWaveFormControl),
        new PropertyMetadata(null, OnSampleAggregatorChanged));

public SampleAggregator SampleAggregator
{
    get { return
        (SampleAggregator)this.GetValue(SampleAggregatorProperty); }
    set { this.SetValue(SampleAggregatorProperty, value); }
}

private static void OnSampleAggregatorChanged(object sender,
    DependencyPropertyChangedEventArgs e)
{
    PolygonWaveFormControl control = (PolygonWaveFormControl)sender;
    control.Subscribe();
}

This allows us to bind the PolygonWaveFormControl to the SampleAggregator made public on
our DataContext:

<my:PolygonWaveFormControl
    Height="40"
    SampleAggregator="{Binding SampleAggregator}" />

Trimming the Audio

We have created a temporary WAV file, but before the user saves it to a file of their choosing,
we want to allow them to trim off any unwanted parts from the start and end of the recording. To
do this I would like to display the entire recorded waveform, with a selection rectangle
superimposed on top to allow a sub-range to be selected.

Figure 5 - GUI to allow selection of a portion of the recorded audio

To accomplish this kind of interface we need three components. The first is a ScrollViewer. The
ScrollViewer allows us to scroll left and right through the WaveForm if it is too big to fit onto a
screen, which is likely if you record more than a few seconds of audio.

The second is a new type of WaveForm renderer that will render an entire file, rather than my
PolygonWaveFormControl which started again at the left when the screen filled up. For this I
created WaveFormVisual which uses DrawingVisual objects to draw the entire WaveForm.
Obviously if we wanted to record for a long period, this approach would need to be optimised as
the polygon it creates would have thousands of points, but for short recordings, it works fine.

The third piece was the hardest to get right – the selection rectangle to support mouse dragging
selection of the waveform. For this I created the RangeSelectionControl.

The RangeSelectionControl is simply a blue rectangle with a solid outline and semi-transparent
fill sitting on a Canvas. The magic occurs in the mouse handler. We need to detect when the user
hovers over the left or right edge of the rectangle, and set the cursor to show a horizontal resizing
icon. This can be done in the MouseMove event, checking the X coordinate and then setting the
Cursor property:

Cursor = Cursors.SizeWE;

When the user clicks the left-button while over the edge, we begin to drag. Key to this is calling
Canvas.CaptureMouse. If we don't do this, as soon as you try to drag the rectangle bigger, the
mouse move events are lost to other controls underneath.

void RangeSelectionControl_MouseDown(object sender,
    MouseButtonEventArgs e)
{
    if (e.LeftButton == MouseButtonState.Pressed)
    {
        Point position = e.GetPosition(this);
        Edge edge = EdgeAtPosition(position.X);
        DragEdge = edge;
        if (DragEdge != Edge.None)
        {
            mainCanvas.CaptureMouse();
        }
    }
}
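
The EdgeAtPosition helper used above is not shown in the article. A minimal sketch of how such a hit test might be written follows. Only Edge.None appears in the original listing, so the Left and Right enum members, the EdgeHitTest class, and the extra parameters (the rectangle's left position, its width and a tolerance) are assumptions made to keep the example self-contained.

```csharp
using System;

// Edge.None appears in the article; Left and Right are assumed names.
public enum Edge { None, Left, Right }

// Hypothetical helper; the real control reads the rectangle's position
// from its own fields rather than taking it as parameters.
public static class EdgeHitTest
{
    // Returns the edge of the selection rectangle that lies within
    // 'tolerance' units of x, or Edge.None if neither edge is close.
    public static Edge EdgeAtPosition(double x, double left, double width,
        double tolerance)
    {
        double right = left + width;
        double distanceToLeft = Math.Abs(x - left);
        double distanceToRight = Math.Abs(x - right);
        bool nearLeft = distanceToLeft <= tolerance;
        bool nearRight = distanceToRight <= tolerance;

        if (nearLeft && nearRight)
        {
            // Very narrow selection: choose whichever edge is closer.
            return distanceToLeft <= distanceToRight ? Edge.Left : Edge.Right;
        }
        if (nearLeft) return Edge.Left;
        if (nearRight) return Edge.Right;
        return Edge.None;
    }
}
```

The same function can serve the hover logic: when it returns anything other than Edge.None, the control sets the resize cursor.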

Now in the MouseMove handler, we can change the Canvas.Left and Width properties of the
rectangle to resize it.

The ScrollViewer is quite straightforward to use, but you must remember to set the
CanContentScroll property to true, and also to set the size of the items within the ScrollViewer
correctly.

<ScrollViewer CanContentScroll="True"
              HorizontalScrollBarVisibility="Visible"
              VerticalScrollBarVisibility="Hidden">
    <Grid>
        <my:WaveFormVisual Height="100"
                           HorizontalAlignment="Left"
                           x:Name="waveFormRenderer"/>
        <my:RangeSelectionControl
            HorizontalAlignment="Left"
            x:Name="rangeSelection" />
    </Grid>
</ScrollViewer>

We set the appropriate Width of the WaveFormVisual and RangeSelectionControl based on the
total number of points we have drawn in the waveform.

Saving the Audio

So we are finally ready to save the audio. We will offer the user two choices of format to save in.
The first is simply to save as a WAV file. If the user has selected the entire recording, we only
need to copy the audio across to their desired location. If, however, the user has selected a sub-
range, then we need to trim the WAV file. This can be quickly accomplished using a
TrimWavFile utility function that copies from a WAV file reader to a WAV file writer, skipping
over a certain number of bytes from the beginning and end.

public static void TrimWavFile(string inPath, string outPath,
    TimeSpan cutFromStart, TimeSpan cutFromEnd)
{
    using (WaveFileReader reader = new WaveFileReader(inPath))
    {
        using (WaveFileWriter writer =
            new WaveFileWriter(outPath, reader.WaveFormat))
        {
            int bytesPerMillisecond =
                reader.WaveFormat.AverageBytesPerSecond / 1000;

            int startPos = (int)cutFromStart.TotalMilliseconds *
                bytesPerMillisecond;
            startPos = startPos - startPos %
                reader.WaveFormat.BlockAlign;

            int endBytes = (int)cutFromEnd.TotalMilliseconds *
                bytesPerMillisecond;
            endBytes = endBytes - endBytes %
                reader.WaveFormat.BlockAlign;
            int endPos = (int)reader.Length - endBytes;

            TrimWavFile(reader, writer, startPos, endPos);
        }
    }
}

private static void TrimWavFile(WaveFileReader reader,
    WaveFileWriter writer, int startPos, int endPos)
{
    reader.Position = startPos;
    byte[] buffer = new byte[1024];
    while (reader.Position < endPos)
    {
        int bytesRequired = (int)(endPos - reader.Position);
        int bytesToRead = Math.Min(bytesRequired, buffer.Length);
        int bytesRead = reader.Read(buffer, 0, bytesToRead);
        if (bytesRead == 0)
        {
            // no more data available; stop rather than loop forever
            break;
        }
        writer.WriteData(buffer, 0, bytesRead);
    }
}
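
The position arithmetic in TrimWavFile can be looked at in isolation. The sketch below uses a hypothetical TrimMath helper (not part of the sample application) that converts a duration to a byte offset and rounds it down to a whole block, as the listing does with BlockAlign. It is a variation rather than the article's exact method: it multiplies by AverageBytesPerSecond directly instead of first dividing by 1000.

```csharp
using System;

// Hypothetical helper illustrating the offset calculation.
public static class TrimMath
{
    // Converts a duration into a byte offset, rounded down so that it
    // never falls in the middle of a sample frame.
    public static long ToAlignedBytes(TimeSpan duration,
        int averageBytesPerSecond, int blockAlign)
    {
        long bytes = (long)(duration.TotalSeconds * averageBytesPerSecond);
        return bytes - bytes % blockAlign;
    }
}
```

For the 8 kHz, 16 bit mono format used in this article (16,000 bytes per second, block align of 2), cutting 1.5 seconds gives an offset of 24,000 bytes with either approach. The difference only shows at rates such as 44.1 kHz stereo, where AverageBytesPerSecond / 1000 truncates 176.4 to 176 and the offsets come out slightly early.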

We also want to offer the ability to save as MP3. The easiest way to create MP3 files is to use the
open source LAME MP3 encoder (do a web search for lame.exe to get hold of this application if
you haven't already got it). Our application will look in the current directory, and prompt the user
to find lame.exe if it is not present, as we do not include it in the application download.
Assuming you do provide a valid path, we can then convert our (trimmed) WAV file to MP3 by
simply calling lame.exe with the appropriate parameters.

public static void ConvertToMp3(string lameExePath,
    string waveFile, string mp3File)
{
    Process converter = Process.Start(lameExePath, "-V2 \"" + waveFile
        + "\" \"" + mp3File + "\"");
    converter.WaitForExit();
}
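
A slightly more defensive variation on ConvertToMp3 is sketched below. It keeps the same -V2 arguments but hides the console window and reports whether the encoder succeeded. The LameRunner class and BuildArguments method are hypothetical names, and treating an exit code of 0 as success is an assumption based on the usual command-line convention.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper around lame.exe; not part of the sample application.
public static class LameRunner
{
    // Builds the same command line as the article: -V2 plus quoted paths.
    public static string BuildArguments(string waveFile, string mp3File)
    {
        return string.Format("-V2 \"{0}\" \"{1}\"", waveFile, mp3File);
    }

    // Runs the encoder without a console window and returns true if it
    // exits with code 0 (assumed here to mean success).
    public static bool ConvertToMp3(string lameExePath,
        string waveFile, string mp3File)
    {
        var startInfo = new ProcessStartInfo(lameExePath,
            BuildArguments(waveFile, mp3File));
        startInfo.UseShellExecute = false;
        startInfo.CreateNoWindow = true;

        using (Process converter = Process.Start(startInfo))
        {
            if (converter == null)
            {
                return false;
            }
            converter.WaitForExit();
            return converter.ExitCode == 0;
        }
    }
}
```

Returning a bool lets the ViewModel show an error message instead of silently leaving the user without an MP3 file.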

We end up with a nice compact MP3 file containing the selected portion of our microphone
recording.

Exploring the Sample Code Solution

The main WPF sample application is found in the VoiceRecorder project. This contains the
main window along with the three views and their associated ViewModels.
VoiceRecorder.Core contains some WPF helper classes and user controls to help with the
plumbing and GUI of the application, while VoiceRecorder.Audio contains the classes that
actually perform the recording, editing and converting of audio.

About the Author

Mark Heath is a software developer currently working for NICE CTI Systems in Southampton,
UK. He specializes in .NET development with a particular focus on client side technologies and
audio playback. He blogs about audio, WPF, Silverlight and software engineering best practices
at http://mark-dot-net.blogspot.com. He is the author of several open source projects hosted at
CodePlex, including NAudio, a low-level .NET audio toolkit (http://www.codeplex.com/naudio).
