Lab 04: Synthesis of Sinusoidal Signals-Music Synthesis: Signal Processing First
Pre-Lab and Warm-Up: You should read at least the Pre-Lab and Warm-up sections of this lab assignment
and go over all exercises in the Pre-Lab section before going to your assigned lab session.
Verification: The Warm-up section of each lab must be completed during your assigned Lab time and
the steps marked Instructor Verification must also be signed off during the lab time. One of the laboratory
instructors must verify the appropriate steps by signing on the Instructor Verification line. When you have
completed a step that requires verification, simply demonstrate the step to the TA or instructor. Turn in the
completed verification sheet to your TA when you leave the lab.
Lab Report: It is only necessary to turn in a report on Section 4 with graphs and explanations. You are
asked to label the axes of your plots and include a title for every plot. To keep track of plots, include
each plot inline within your report. If you are unsure about what is expected, ask the TA who will grade
your report.
1 Introduction
This lab includes a project on music synthesis with sinusoids. The piece, Fugue #2 from the Well-Tempered
Clavier by Bach (footnote 1), has been selected for the synthesis program. The project requires an extensive
programming effort and should be documented with a complete formal lab report. A good report should
include the following items: a cover sheet, commented MATLAB code, explanations of your approach,
conclusions and any additional tweaks that you implemented for the synthesis. Since the project must be
evaluated by listening to the quality of the synthesized song, the criteria for judging a good song are given
at the end of this lab description. In addition, it may be convenient to place the final song on a web site so
that it can be accessed remotely by a lab instructor who can then evaluate its quality. If you would like to
try other songs, the SP First CD-ROM includes information about alternative tunes: Minuet in G, Für Elise,
Beethoven’s Fifth, Jesu, Joy of Man’s Desiring and Twinkle, Twinkle, Little Star.
The music synthesis will be done with sinusoidal waveforms of the form

$$x(t) = \sum_k A_k \cos(\omega_k t + \phi_k) \qquad (1)$$

so it will be necessary to establish the connection between musical notes, their frequencies, and sinusoids.
A secondary objective of the lab is the challenge of trying to add other features to the synthesis in order
to improve the subjective quality for listening. Students who take this challenge will be motivated to learn
more about the spectral representation of signals—a topic that underlies this entire course.
2 Pre-Lab
In this lab, the periodic waveforms and music signals will be created with the intention of playing them out
through a loudspeaker. Therefore, it is necessary to take into account the fact that a conversion is needed
from the digital samples, which are numbers stored in the computer memory, to the actual voltage waveform
that will be amplified for the speakers.
1
See http://www.hnh.com/composer/bach.htm.
signal x(t), which is sampled by the continuous-to-discrete (C-to-D) converter to produce a sequence of
samples x[n] = x(nTs ), where n is the integer sample index and Ts is the sampling period. The sampling
rate is fs = 1/Ts where the units are samples per second. As described in Chapter 4 of the text, the ideal
discrete-to-continuous (D-to-C) converter takes the input samples and interpolates a smooth curve between
them. The Sampling Theorem tells us that if the input signal x(t) is a sum of sine waves, then the output y(t)
will be equal to the input x(t) if the sampling rate is more than twice the highest frequency fmax in the input,
i.e., fs > 2fmax. In other words, if we sample fast enough, the D-to-C converter can reconstruct the
continuous audio signal exactly from x[n].
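The condition fs > 2fmax, and what goes wrong when it fails, can be checked numerically. The Python sketch below is illustrative only (the lab itself uses MATLAB). The helper names and the 7000 Hz test tone are assumptions chosen for illustration. It samples cosines exactly as an ideal C-to-D converter would and shows that a tone above fs/2 produces the same samples as a lower "alias" frequency.

```python
import math

def sample_cosine(f_hz, fs, num_samples):
    """Samples x[n] = cos(2*pi*f*n*Ts), with Ts = 1/fs (ideal C-to-D)."""
    ts = 1.0 / fs
    return [math.cos(2 * math.pi * f_hz * n * ts) for n in range(num_samples)]

def satisfies_sampling_theorem(fs, f_max):
    """The condition quoted in the text: fs > 2*f_max."""
    return fs > 2 * f_max

fs = 11025
print(satisfies_sampling_theorem(fs, 800))   # True
print(satisfies_sampling_theorem(fs, 7000))  # False

# A 7000 Hz cosine violates the condition. Because 7000 = 11025 - 4025,
# its samples coincide with those of a 4025 Hz cosine, so the D-to-C
# converter would reconstruct 4025 Hz instead of 7000 Hz.
too_high = sample_cosine(7000, fs, 50)
alias = sample_cosine(4025, fs, 50)
print(max(abs(a - b) for a, b in zip(too_high, alias)) < 1e-9)  # True
```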
(a) The ideal C-to-D converter is, in effect, being implemented whenever we take samples of a
continuous-time formula, e.g., x(t) at t = tn. We do this in MATLAB by first making a vector of times, and then
evaluating the formula for the continuous-time signal at the sample times, i.e., x[n] = x(nTs) if
tn = nTs. This assumes perfect knowledge of the input signal, but we have already been doing it this
way in previous labs.
To begin, create a vector x1 of samples of a sinusoidal signal with A1 = 100, ω1 = 2π(800), and
φ1 = −π/3. Use a sampling rate of 11025 samples/second (footnote 2), and compute a total number of samples
equivalent to a time duration of 0.5 seconds. You may find it helpful to recall that a MATLAB statement
such as tt=(0:0.01:3); would create a vector of numbers from 0 through 3 with increments of 0.01.
2
This sampling rate is one quarter of the rate (44,100 Hz) used in audio CD players.
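Part (a) can be sketched outside MATLAB as follows. This is a Python illustration in which the helper name sample_sinusoid is hypothetical. It mirrors the MATLAB idiom tt = 0:(1/fs):dur, which includes both endpoints and therefore yields floor(dur*fs) + 1 samples.

```python
import math

def sample_sinusoid(A, f_hz, phi, fs, dur):
    """Samples of A*cos(2*pi*f*t + phi) at t = n/fs for 0 <= t <= dur.

    Like tt = 0:(1/fs):dur in MATLAB, both endpoints are included,
    giving floor(dur*fs) + 1 samples.
    """
    num = math.floor(dur * fs) + 1
    return [A * math.cos(2 * math.pi * f_hz * n / fs + phi) for n in range(num)]

fs = 11025
x1 = sample_sinusoid(100, 800, -math.pi / 3, fs, 0.5)
print(len(x1))          # 5513
print(round(x1[0], 6))  # 50.0, i.e. 100*cos(-pi/3)
```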
(b) Now create another vector x2 of samples of a second sinusoidal signal (0.8 seconds in duration) for
the case A2 = 80, ω2 = 2π(1200), and φ2 = +π/4. Listen to the signal reconstructed from these
samples. How does its sound compare to the signal in part (a)?
(c) Concatenate the two signals x1 and x2 with a short duration of 0.1 seconds of silence in between.
You should be able to use a statement something like:
xx = [ x1, zeros(1,N), x2 ];
assuming that both x1 and x2 are row vectors. Determine the correct value of N to make 0.1 seconds
of silence. Listen to this new signal to verify that it is correct.
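At 11025 samples/second, 0.1 seconds corresponds to 1102.5 samples, which is not an integer, so N has to be rounded. The Python sketch below is an illustration, not the lab's MATLAB code, and its helper tone() is hypothetical. It rounds half away from zero as MATLAB's round() does and then concatenates the signals in the same order as [ x1, zeros(1,N), x2 ].

```python
import math

fs = 11025
silence_sec = 0.1

# 0.1 * 11025 = 1102.5 is not an integer, so it must be rounded.
# MATLAB's round() rounds halves away from zero; Python's round() uses
# round-half-to-even, so floor(x + 0.5) is used here to match MATLAB.
N = math.floor(silence_sec * fs + 0.5)

def tone(A, f_hz, phi, dur):
    num = math.floor(dur * fs) + 1          # like tt = 0:(1/fs):dur
    return [A * math.cos(2 * math.pi * f_hz * n / fs + phi) for n in range(num)]

x1 = tone(100, 800, -math.pi / 3, 0.5)
x2 = tone(80, 1200, math.pi / 4, 0.8)
xx = x1 + [0.0] * N + x2                    # same layout as [ x1, zeros(1,N), x2 ]
print(N)                                    # 1103
print(len(xx) / fs)                         # total duration in seconds
```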
(d) To verify that the concatenation operation was done correctly in the previous part, make the following
plot:
tt = (1/11025)*(1:length(xx)); plot( tt, xx );
This will plot a huge number of points, but it will show the “envelope” of the signal and verify that the
amplitude changes from 100 to zero and then to 80 at the correct times. Notice that the time vector
tt was created to have exactly the same length as the signal vector xx.
(e) Now send the vector xx to the D-to-A converter again, but change the sampling rate parameter in
soundsc(xx, fs) to 22050 samples/second. Do not recompute the samples in xx, just tell the
D-to-A converter that the sampling rate is 22050 samples/second. Describe how the duration and
pitch of the signal were affected. Explain.
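One way to reason about part (e) is to track what each sample "means" in seconds. The samples are unchanged, but each one is now held for 1/22050 s instead of 1/11025 s. A minimal Python sketch of that arithmetic follows. The function playback_effect is a hypothetical helper and not part of MATLAB or the SP First toolbox.

```python
def playback_effect(num_samples, fs_computed, fs_play, f_hz):
    """Hypothetical helper: duration and pitch heard when samples computed
    at fs_computed are sent to a D-to-A converter running at fs_play."""
    duration_sec = num_samples / fs_play
    heard_hz = f_hz * fs_play / fs_computed   # each period now lasts fewer seconds
    return duration_sec, heard_hz

one_second = 11025                                      # samples computed at 11025 Hz
print(playback_effect(one_second, 11025, 11025, 800))   # (1.0, 800.0)
print(playback_effect(one_second, 11025, 22050, 800))   # (0.5, 1600.0)
```

Doubling the playback rate halves the duration and doubles every frequency, which raises the pitch by one octave.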
One of the most useful modes of the debugger causes the program to jump into “debug mode” whenever
an error occurs. This mode can be invoked by typing:
dbstop if error
With this mode active, you can snoop around inside a function and examine local variables that probably
caused the error. You can also choose this option from the debugging menu in the M ATLAB editor. It’s
sort of like an automatic call to 911 when you’ve gotten into an accident. Try help dbstop for more
information. To turn this mode off, use:
dbclear if error
Create an M-file coscos.m containing the code below and use the debugger to find the error(s) in the
function. Call the function with the test case: [xn,tn] = coscos(2,3,20,1). Use the debugger to:
1. Set a breakpoint to stop execution when an error occurs and jump into “Keyboard” mode,
3. determine the size of all vectors by using either the size() function or the whos command.
4. and, lastly, modify variables while in the “Keyboard” mode of the debugger.
Figure 2: Layout of a piano keyboard. Key numbers are shaded. The notation C4 means the C-key in the
fourth octave.
[Keyboard diagram with one octave marked. White-key numbers: C3 28, D3 30, E3 32, F3 33, G3 35, A3 37,
B3 39, C4 40 (middle C), D4 42, E4 44, F4 45, G4 47, A4 49 (A-440), B4 51, C5 52, D5 54, E5 56, F5 57,
G5 59, A5 61, B5 63. Black-key numbers 41 and 43 are also shown.]
middle-C which is usually called A-440 (or A4) because its frequency is 440 Hz. (In this lab, we are using
the number 40 to represent middle C. This is somewhat arbitrary; for instance, the Musical Instrument Digital
Interface (MIDI) standard represents middle C with the number 60.) Each octave contains 12 notes (5 black
keys and 7 white), and the ratio between the frequencies of successive notes is constant. Because the
frequency doubles over one octave (12 steps), this ratio must be $2^{1/12}$. Since middle C is 9 keys below
A-440, its frequency is approximately 261 Hz. Consult Chapter 9 for even more details.
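Written as a formula with A-440 (key 49 in Figure 2) as the reference note, the constant-ratio rule gives the following. The middle-C value agrees with the 261 Hz quoted in the text.

```latex
f(k) = 440 \cdot 2^{(k-49)/12}\ \text{Hz}

f(40) = 440 \cdot 2^{-9/12} \approx 261.6\ \text{Hz}, \qquad
f(61) = 440 \cdot 2^{12/12} = 880\ \text{Hz}
```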
Musical notation shows which notes are to be played and their relative timing (half, quarter, or eighth).
Figure 3 shows how the keys on the piano correspond to notes drawn in musical notation. The white keys
are all labeled as A, B, C, D, E, F, and G; but the black keys are denoted with “sharps” or “flats.” A sharp
such as A# is one key number larger than A; a flat is one key lower, e.g., A♭4 is key number 48.
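The sharp/flat rule, combined with the key numbers in Figure 2, is enough to translate note names into key numbers. The Python sketch below is illustrative. The name note_to_key and the ASCII spelling ('#' for sharp, 'b' for flat) are assumptions. It places C4 (middle C) at key 40 and starts each octave number at C, as on the keyboard in Figure 2.

```python
import re

# Semitone offset of each white key above C within an octave.
WHITE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_key(name):
    """Convert names such as 'C4', 'A#4', 'Ab4' to the key numbers of Figure 2."""
    m = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", name)
    if m is None:
        raise ValueError(f"unrecognized note name: {name!r}")
    letter, accidental, octave = m.group(1), m.group(2), int(m.group(3))
    key = 40 + 12 * (octave - 4) + WHITE_OFFSETS[letter]   # C4 = 40
    if accidental == "#":
        key += 1    # a sharp is one key number larger
    elif accidental == "b":
        key -= 1    # a flat is one key number smaller
    return key

print([note_to_key(n) for n in ["C4", "A4", "A#4", "Ab4", "C5"]])
# [40, 49, 50, 48, 52]
```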
Figure 3: Musical notation is a time-frequency diagram where vertical position indicates which note is to be
played. Notice that the shape of the note defines it as a half, quarter or eighth note, which in turn defines the
duration of the sound.
3
If you have little or no experience reading music, don’t be intimidated. Only a little music knowledge is needed to carry out this
lab. On the other hand, the experience of working in an application area where you must quickly acquire knowledge is a valuable
one. Many real-world engineering problems have this flavor, especially in signal processing which has such a broad applicability
in diverse areas such as geophysics, medicine, radar, speech, etc.
3 Warm-up
3.1 Note Frequency Function
Now write an M-file to produce a desired note for a given duration. Your M-file should be in the form of a
function called key2note.m. Your function should have the following form:
function xx = key2note(X, keynum, dur)
% KEY2NOTE Produce a sinusoidal waveform corresponding to a
% given piano key number
%
% usage: xx = key2note (X, keynum, dur)
%
% xx = the output sinusoidal waveform
% X = complex amplitude for the sinusoid, X = A*exp(j*phi).
% keynum = the piano keyboard number of the desired note
% dur = the duration (in seconds) of the output note
%
fs = 11025; %-- or use 8000 Hz
tt = 0:(1/fs):dur;
freq = %<=============== fill in this line
xx = real( X*exp(j*2*pi*freq*tt) );
For the freq = line, use the formulas given above to determine the frequency for a sinusoid in terms
of its key number. You should start from a reference note (middle-C or A-440 is recommended) and solve for
the frequency based on this reference. Notice that the xx = real( ) line generates the actual sinusoid
as the real part of a complex exponential at the proper frequency.
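For comparison, here is a Python sketch of the same idea as key2note.m. It is not the MATLAB file you are asked to write. It uses A-440 (key 49) as the reference note, which is one possible way to fill in the freq line. As in the MATLAB template, the output is the real part of X·exp(j2πf t).

```python
import cmath
import math

def key2note(X, keynum, dur, fs=11025):
    """Sketch of key2note.m: real part of X*exp(j*2*pi*freq*t) for 0 <= t <= dur."""
    freq = 440.0 * 2 ** ((keynum - 49) / 12)   # A-440 (key 49) as the reference
    num = math.floor(dur * fs) + 1             # like tt = 0:(1/fs):dur
    return [(X * cmath.exp(2j * math.pi * freq * n / fs)).real for n in range(num)]

X = 1.0 * cmath.exp(-1j * math.pi / 2)         # A = 1, phi = -pi/2 gives a sine
xx = key2note(X, 49, 0.25)
print(len(xx))                                 # 2757
```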
Instructor Verification (separate page)
n2 = n1 + length(tone) - 1;
xx(n1:n2) = xx(n1:n2) + tone; %<=== Insert the note
n1 = n2 + 1;
end
soundsc( xx, fs )
For the tone = line, generate the actual sinusoid for keynum by making a call to the function
key2note() written previously. Note that play_scale.m allocates a vector of zeros large enough to hold
the entire scale and then inserts each note into its proper place in the vector xx.
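The allocate-then-insert pattern used by play_scale.m can be sketched in Python as follows. This is illustrative only. The key list (the white keys from C4 to C5) and the 0.25 s note length are assumptions, not values taken from play_scale.m. The block repeats a small key2note helper so that it runs on its own. Python uses 0-based, end-exclusive slices, so n2 = n1 + len(tone) plays the role of MATLAB's n2 = n1 + length(tone) - 1.

```python
import cmath
import math

FS = 11025

def key2note(X, keynum, dur):
    """Sinusoid for a key number, using A-440 (key 49) as the reference."""
    freq = 440.0 * 2 ** ((keynum - 49) / 12)
    num = math.floor(dur * FS) + 1
    return [(X * cmath.exp(2j * math.pi * freq * n / FS)).real for n in range(num)]

keys = [40, 42, 44, 45, 47, 49, 51, 52]   # hypothetical scale: C4 up to C5
dur = 0.25                                # hypothetical note length in seconds

# Allocate room for the whole scale first, then insert each note in place.
samples_per_note = math.floor(dur * FS) + 1
xx = [0.0] * (len(keys) * samples_per_note)
n1 = 0
for k in keys:
    tone = key2note(1.0, k, dur)
    n2 = n1 + len(tone)                   # end-exclusive Python slice
    xx[n1:n2] = [a + b for a, b in zip(xx[n1:n2], tone)]
    n1 = n2
print(len(xx), n1)                        # 22056 22056
```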
Instructor Verification (separate page)
(a) Generate the signal for the scale with play_scale.m.
(b) Use the function specgram(xx,512,fs). Zoom in to see the progression of three consecutive
notes in the scale (help zoom), and identify the note A-440 in your spectrogram. The second
argument (footnote 4) is the window length, which can be varied to get different-looking spectrograms. The
spectrogram is able to “see” the separate spectrum lines with a longer window length, e.g., 1024 or
2048 (footnote 5).
(c) If you are working at home, you might not have the specgram() function because it is part of the
“Signal Processing Toolbox.” In that case, use the function plotspec(xx,fs) which is in the SP
First toolbox. Show that you get the same result as in part (b). Explain why the result is correct. If
necessary, add a grid so that frequencies can be measured accurately.
• Note: The argument list for plotspec() has a different order from specgram, because plotspec()
uses an optional third argument for the window length (default value is 256).
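Why does a longer window separate spectrum lines better? Each spectrogram frame of length L has frequency bins spaced fs/L apart. That is about 21.5 Hz for L = 512 and about 10.8 Hz for L = 1024 at fs = 11025. The Python sketch below analyzes a single frame. It uses a naive DFT for clarity, whereas specgram() and plotspec() use the FFT, and it applies no taper, which is an assumption. It reports the frequency of the strongest bin for a 440 Hz tone.

```python
import cmath
import math

def frame_peak_hz(x, fs, start, win_len):
    """Frequency of the largest DFT bin in the frame x[start:start+win_len].

    Naive O(N^2) DFT with a rectangular window (no taper), for illustration.
    """
    frame = x[start:start + win_len]
    best_bin, best_mag = 0, -1.0
    for k in range(win_len // 2 + 1):          # bins from 0 up to fs/2
        s = sum(v * cmath.exp(-2j * math.pi * k * n / win_len)
                for n, v in enumerate(frame))
        if abs(s) > best_mag:
            best_bin, best_mag = k, abs(s)
    return best_bin * fs / win_len

fs = 11025
x = [math.cos(2 * math.pi * 440 * n / fs) for n in range(4096)]
for L in (512, 1024):
    print(L, fs / L, frame_peak_hz(x, fs, 0, L))   # window, bin spacing, peak
```

The trade-off is time resolution. A 2048-sample window spans about 0.19 s, which is longer than a sixteenth note at 120 beats per minute (0.125 s). This is why a shorter window can be the better choice for fast passages.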
4
If the second argument is made equal to the “empty matrix” then its default value of 256 is used.
5
Usually the window length is chosen to be a power of two, because a special algorithm called the FFT is used in
the computation. The fastest FFT programs are those where the signal length is a power of 2.
(a) Determine a sampling frequency that will be used to play out the sound through the D-to-A system of
the computer. This will dictate the time Ts between samples of the sinusoids.
(b) Determine the total time duration needed for each note, and also determine the frequency (in hertz)
for each note (see Fig. 2 and the discussion of the well-tempered scale in the warm-up). A data
file called bach_fugue.mat is provided with this information stored in MATLAB structures; this
contains the portion of the piece needed for this lab. A second file called bach_fugue_short.mat
has the same information for the first few measures of the piece; you may find this useful for initial
debugging. Both of these files can be found in the MATLAB files link.
(c) Synthesize the waveform as a combination of sinusoids, and play it out through the computer’s built-in
speaker or headphones using soundsc().
(d) Make a plot of a few periods of two or three of the sinusoids to illustrate that you have the correct
frequency (or period) for each note.
(e) Include a spectrogram image of a portion of your synthesized music—probably about 1 or 2 secs—so
that you can illustrate the fact that you have all the different notes. This piece has many sixteenth
notes, so a window length of 512 might be the best choice for specgram(). In addition, the
spectrogram M-files will scale the frequency axis to run from zero to half the sampling frequency, so
it might be useful to “zoom in” on the region where the notes are. Consult help zoom, or use
the zoom tool in MATLAB-v5.3 figure windows.
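As a numerical complement to the period plots in part (d), the frequency of a single synthesized note can be estimated from the spacing of its upward zero crossings. The Python sketch below is one such check. The helper name is hypothetical and the choice of test note (key 40) is arbitrary. It uses linear interpolation between the two samples that straddle each crossing.

```python
import math

def estimate_freq_zero_crossings(x, fs):
    """Estimate the frequency of one sinusoid from its upward zero crossings."""
    crossings = []
    for n in range(len(x) - 1):
        if x[n] < 0 <= x[n + 1]:
            frac = -x[n] / (x[n + 1] - x[n])   # fractional sample position of zero
            crossings.append(n + frac)
    if len(crossings) < 2:
        raise ValueError("need at least two upward zero crossings")
    period_samples = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return fs / period_samples

fs = 11025
f_key40 = 440 * 2 ** ((40 - 49) / 12)          # middle C, about 261.6 Hz
x = [math.cos(2 * math.pi * f_key40 * n / fs) for n in range(fs // 4)]
print(round(estimate_freq_zero_crossings(x, fs), 1))
```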
After the load command is executed, a new variable called theVoices will be present in the workspace.
Type whos to see that you have this new variable.
The variable theVoices is a vector whose elements are structures. Each structure gives information
about a single melody in the song; in Fugues, such melodies are often called “voices.” You can determine
the number of melodies in the song by calculating the length of the vector theVoices with the
command length(theVoices). This number will also equal the maximum number of notes that are ever
simultaneously played in the song.
Each structure theVoices(i) has three fields: noteNumbers, startPulses, and durations.
A typical structure theVoices(i) looks like
The value of theVoices(i).noteNumbers(j) is a single note’s key number. The note’s starting pulse
(where there are four pulses per quarter note, or 16 pulses per measure) and its duration in pulses are given by
the corresponding elements in the other two fields.
Measures and beats are the basic time intervals in a musical score. A measure is denoted in the score by a
vertical line that cuts from the top to the bottom of one line in the score. For example, in Fig. 4 there are three
such vertical lines dividing that part of the musical score into four measures. Each measure contains a fixed
number of beats which, in this case, equals four. The label “C” at the left of Fig. 4 describes this relationship
and is called the time signature of the song. By convention, “C” denotes “common time,” in which there
are four beats per measure and a single beat is the length of one quarter note. For example, typing
theVoices(1).noteNumbers(6) at the MATLAB command prompt returns the number 52, which
describes the C in the first measure. Because the note is a sixteenth note and a sixteenth note is one pulse,
theVoices(1).durations(6) equals one. The value of theVoices(1).startPulses(6) is
11 because this note begins eleven pulses from the beginning of the song.
bpm = 120;
Computer programs that let musicians record, modify, and play back notes played on a keyboard or other
electronic instrument are called “sequencers” (footnote 7). The timing resolution of a sequencer is usually measured in
“pulses per quarter note,” or PPQ. In this lab, we will employ four pulses per quarter note. A real commercial
sequencer would have a much higher PPQ to capture the subtle timing nuances of a real human playing
a real instrument. The starting times and durations of notes in the music file provided to you are specified in
terms of “pulses,” so it will be helpful to compute the number of “seconds per pulse,” for instance
via:
beats_per_second = bpm/60;
seconds_per_beat = 1/beats_per_second;
seconds_per_pulse = seconds_per_beat / 4;
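Once seconds per pulse is known, every note's start and length convert directly to sample indices, and the voices can be summed into one waveform. The Python sketch below shows one way to do this. The dictionaries imitate the three fields of theVoices, but the two-voice fragment is made up and is not taken from bach_fugue.mat. Pulses are counted from 0 here, which is an assumption, so check how startPulses is indexed in the provided file.

```python
import math

FS = 11025
PULSES_PER_BEAT = 4                     # four pulses per quarter note

def key_to_freq(keynum):
    return 440.0 * 2 ** ((keynum - 49) / 12)

def synthesize(voices, bpm, fs=FS):
    """Sum one cosine per note over all voices (no envelopes, unit amplitude)."""
    seconds_per_pulse = (60.0 / bpm) / PULSES_PER_BEAT
    notes = []
    for v in voices:
        for key, start, dur in zip(v["noteNumbers"], v["startPulses"], v["durations"]):
            n1 = round(start * seconds_per_pulse * fs)   # first sample of the note
            num = round(dur * seconds_per_pulse * fs)    # note length in samples
            notes.append((key_to_freq(key), n1, num))
    xx = [0.0] * max(n1 + num for _, n1, num in notes)
    for f, n1, num in notes:
        for n in range(num):
            xx[n1 + n] += math.cos(2 * math.pi * f * n / fs)
    return xx

# Hypothetical two-voice fragment (not from bach_fugue.mat).
voices = [
    {"noteNumbers": [52, 51, 52], "startPulses": [0, 1, 2], "durations": [1, 1, 2]},
    {"noteNumbers": [40], "startPulses": [0], "durations": [4]},
]
xx = synthesize(voices, bpm=120)
print(len(xx) / FS)   # about 0.5 s: four pulses make one beat at 120 bpm
```

Because bpm enters only through seconds_per_pulse, changing the tempo is a one-argument change.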
Half notes are twice as long as quarters; eighth notes are half as long. Triplets, as in Jesu, appear to be
eighth notes, but should actually be a little shorter: three of them should have a total duration equal to a
quarter note. If the tempo is defined in only one place, it is easy to change: for example, setting bpm =
240 would make the whole piece play twice as fast.
For example, a quarter note is designated by the number 4, an eighth note by the number 8, a half note
by the number 2, a whole note by 1, and so on. When we need a note that lasts longer than a whole note
then we just use a fraction, e.g., 0.5 designates a note that is twice as long as a whole note.
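With this numbering, a note value v lasts 4/v beats, assuming a beat is one quarter note. A triplet note lasts two-thirds of its written value. The Python sketch below turns that rule into seconds. The function name and its triplet flag are illustrative assumptions.

```python
def note_value_seconds(value, bpm, triplet=False):
    """Seconds for a note written as 4 (quarter), 8 (eighth), 2 (half),
    1 (whole), 0.5 (twice a whole note), with one beat = one quarter note."""
    beats = 4.0 / value
    if triplet:
        beats *= 2.0 / 3.0      # three triplet notes fill the time of two regular ones
    return beats * 60.0 / bpm

print(note_value_seconds(4, 120))     # 0.5
print(note_value_seconds(8, 120))     # 0.25
print(note_value_seconds(0.5, 120))   # 4.0
print(3 * note_value_seconds(8, 120, triplet=True))   # about 0.5, one quarter note
```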
Another timing issue is related to the fact that when a musical instrument is played, the notes are not
continuous. Therefore, inserting very short pauses between notes usually improves the musical sound
because it imitates the natural transition that a musician must make from one note to the next. An envelope
(discussed below) can accomplish the same thing.
Figure 5: ADSR (attack, decay, sustain, release) profile for an envelope function E(t), plotted versus time t.
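A piecewise-linear ADSR envelope like the one in Figure 5 is straightforward to build sample by sample. The Python sketch below uses segment lengths given as fractions of the note length and a sustain level of 0.7. These values are illustrative assumptions, not values from the lab. Because the release segment ends at zero, the envelope also provides the short gap between notes mentioned in the text.

```python
def adsr_envelope(num, attack=0.1, decay=0.1, sustain_level=0.7, release=0.2):
    """Piecewise-linear ADSR envelope with num samples.

    attack, decay and release are fractions of the note length (assumed
    values); the sustain segment fills whatever remains.
    """
    na, nd, nr = int(attack * num), int(decay * num), int(release * num)
    ns = max(num - na - nd - nr, 0)
    env = [n / na for n in range(na)]                               # 0 -> 1
    env += [1 - (1 - sustain_level) * n / nd for n in range(nd)]    # 1 -> S
    env += [sustain_level] * ns                                     # hold S
    env += [sustain_level * (1 - (n + 1) / nr) for n in range(nr)]  # S -> 0
    return env[:num]

env = adsr_envelope(1000)
print(len(env), env[0], env[100], env[500], env[-1])
```

To apply it, multiply sample by sample, e.g. shaped = [e * s for e, s in zip(env, tone)] for a tone of the same length.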
8
In the early 80’s, a company called Digital Keyboards produced a commercial synthesizer called the Synergy in which
the user created sounds via “additive synthesis” by specifying the envelopes of individual frequency components. This is
a quite powerful, albeit tedious and challenging, way to create realistic sounds. American composer Wendy Carlos (best
known for Switched-On Bach and her score for A Clockwork Orange) used it extensively in her score for Tron. See
http://www.synthmuseum.com/synergy/synergy01.html
Verified: Date/Time:
Part 3.2 Complete and demonstrate the script file play_scale.m:
Verified: Date/Time:
Part 3.3 Demonstrate the spectrogram of the scale generated by play_scale.m:
Verified: Date/Time:
Does the file play notes? [ ] All notes [ ] Most [ ] Treble only
Overall Impression:
Excellent: Enjoyable sound, good use of extra features such as harmonics, envelopes, etc.
Good: Bass and Treble clefs synthesized and in sync, few errors, one or two special features.
OK: Basic sinusoidal synthesis, including the bass, with only a few errors.
Poor: No bass notes, or treble and bass not synchronized, many wrong notes.