Smartphone-Based Real-Time Digital Signal Processing
Nasser Kehtarnavaz, Shane Parris, and Abhishek Sehgal, University of Texas at Dallas
Synthesis Lectures on Signal Processing
Series Editor: José Moura
Series ISSN: 1932-1236
ISBN: 978-1-62705-816-2
Morgan & Claypool Publishers
www.morganclaypool.com

ABOUT SYNTHESIS
This volume is a printed version of a work that appears in the Synthesis Digital Library of Engineering and Computer Science. Synthesis Lectures provide concise, original presentations of important research and development topics, published quickly, in digital and print formats. For more information visit www.morganclaypool.com
Smartphone-Based Real-Time
Digital Signal Processing
Synthesis Lectures on Signal
Processing
Editor
José Moura, Carnegie Mellon University
Synthesis Lectures in Signal Processing publishes 80- to 150-page books on topics of interest to signal processing engineers and researchers. The Lectures exploit in detail a focused topic. They can be at different levels of exposition—from a basic introductory tutorial to an advanced monograph—depending on the subject and the goals of the author. Over time, the Lectures will provide a comprehensive treatment of signal processing. Because of its format, the Lectures will also provide current coverage of signal processing, and existing Lectures will be updated by authors when justified.
Lectures in Signal Processing are open to all relevant areas in signal processing. They will cover theory and theoretical methods, algorithms, performance analysis, and applications. Some Lectures will provide a new look at a well-established area or problem, while others will venture into a brand new topic in signal processing. By carefully reviewing the manuscripts we will strive for quality both in the Lectures' contents and exposition.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations
in printed reviews, without the prior permission of the publisher.
DOI 10.2200/S00666ED1V01Y201508SPR013
Lecture #13
Series Editor: José Moura, Carnegie Mellon University
Series ISSN
Print 1932-1236 Electronic 1932-1694
Smartphone-Based Real-Time
Digital Signal Processing
Morgan & Claypool Publishers
ABSTRACT
Real-time or applied digital signal processing courses are offered as follow-ups to conventional or theory-oriented digital signal processing courses in many engineering programs for the purpose of teaching students the technical know-how for putting signal processing algorithms or theory into practical use. These courses normally involve access to a teaching laboratory that is equipped with hardware boards, in particular DSP boards, together with their supporting software. A number of textbooks have been written discussing how to achieve real-time implementation on these hardware boards. This book discusses how smartphones can be used as hardware boards for real-time implementation of signal processing algorithms as an alternative to the hardware boards that are currently being used in signal processing teaching laboratories. The fact that mobile devices, in particular smartphones, have now become powerful processing platforms has led to the development of this book, thus enabling students to use their own smartphones to run signal processing algorithms in real-time considering that these days nearly all students possess smartphones. Changing the hardware platforms that are currently used in applied or real-time signal processing courses to smartphones creates a truly mobile laboratory experience or environment for students. In addition, it relieves the cost burden associated with using a dedicated signal processing board noting that the software development tools for smartphones are free of charge and are well-developed. This book is written in such a way that it can be used as a textbook for applied or real-time digital signal processing courses offered at many universities. Ten lab experiments that are commonly encountered in such courses are covered in the book. This book is written primarily for those who are already familiar with signal processing concepts and are interested in their real-time and practical aspects. Similar to existing real-time courses, knowledge of C programming is assumed. This book can also be used as a self-study guide for those who wish to become familiar with signal processing app development on either Android or iPhone smartphones. All the lab codes can be obtained as a software package from
http://sites.fastspring.com/bookcodes/product/bookcodes
KEYWORDS
real-time implementation of signal processing algorithms on smartphones; using smartphones for applied digital signal processing courses; mobile laboratory for signal processing
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Smartphone Implementation Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Smartphone Implementation Shells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Android Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 iPhone Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Overview of ARM Processor Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 Data Flow and Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Organization of Chapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Software Package of Lab Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
7 Adaptive Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.1 Infinite Impulse Response Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.2 Adaptive Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.3 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
L7 LAB 7:
IIR Filtering and Adaptive FIR Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
L7.1 IIR Filter Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
L7.2 Adaptive FIR Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
L7.3 Lab Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Preface
For over twenty-five years, I have been teaching real-time or applied digital signal processing courses as follow-ups to conventional or theory-oriented digital signal processing courses that are taught in most electrical engineering curricula. The purpose of offering applied courses has been to teach students the technical know-how for putting signal processing algorithms or theory into practical use. In the past, I have used various implementation platforms including TI DSP boards and NI FPGA boards and have used different coding methods including DSP Assembly, C, and LabVIEW.
All the lab courses I taught until a year ago had to be held within the confines of a teaching laboratory that needed to be equipped with the appropriate hardware and software for implementing signal processing algorithms. One issue that students always raised in the lab courses was that they were not getting enough time to carry out thorough experimentation during the three-hour duration for which labs were normally scheduled. The driving force behind introducing this book has thus been to address this issue. The fact that mobile devices, in particular smartphones, have become powerful processing platforms led me to develop this book towards enabling students to use their own smartphones as implementation platforms for running signal processing algorithms as apps considering that these days nearly all students possess smartphones.
Changing the hardware platforms that are currently used in applied or real-time signal processing courses to smartphones creates a truly mobile laboratory experience or environment for students. In addition, it relieves the cost burden associated with using a dedicated signal processing board noting that the software development tools for smartphones are free of charge and are well-developed.
This book is written in such a way that it can be used as a textbook for applied or real-time digital signal processing courses offered at many universities. Ten lab experiments that are commonly encountered in such courses are covered in the book. This book is written primarily for those who are already familiar with signal processing concepts and are interested in their real-time and practical aspects. Similar to existing real-time courses, knowledge of C programming is assumed. This book can also be used as a self-study guide for those who wish to become familiar with signal processing app development on either Android or iPhone smartphones.
My hope is that by introducing this alternative paradigm for learning applied or real-time
digital signal processing, the interest of students in digital signal processing is further stimulated
as they are made to view their smartphones as a powerful signal processing mobile laboratory.
Finally, I would like to thank my students and co-authors Shane Parris and Abhishek Sehgal for their contributions to the lab experiments, allowing me to complete this alternative implementation approach in about one year.
Nasser Kehtarnavaz
August 2015
CHAPTER 1
Introduction
Applied or real-time digital signal processing courses offered at many universities have greatly enhanced students' learning of signal processing concepts by covering practical aspects of implementing signal processing algorithms. DSP processor boards are often deployed in these courses. To a lesser extent, ARM-based boards such as Raspberry Pi [1] are utilized. A number of textbooks are available discussing how to implement signal processing algorithms on DSP boards, e.g., [2–6]. This book is written to provide an alternative hardware platform which students can use in a truly mobile manner and at no cost, as it is already in their possession: their own smartphones.
Not only are there hardware and software costs associated with equipping a teaching laboratory with DSP or other types of signal processing boards, but in many cases these boards are also confined to a specific teaching laboratory location. Taking advantage of the ubiquitous utilization
of ARM processors in mobile devices, in particular smartphones, this book covers an alternative
approach to teaching applied or real-time DSP courses by enabling students to use their own
smartphones to implement signal processing algorithms. Changing the hardware platforms that
are currently used for applied or real-time signal processing courses to smartphones creates a truly
mobile laboratory experience or environment for students. In addition, it relieves the cost burden
associated with using a dedicated signal processing board noting that the software development
tools for smartphones are free of charge and are well-developed.
This book addresses the process of developing signal processing apps on smartphones in a step-by-step manner. It shows how to acquire sensor data, implement typical signal processing algorithms encountered in a real-time or applied digital signal processing course, and generate output or display information. It should be noted that these steps are carried out for both the Android and iOS operating systems and, besides smartphones, the apps developed can be run on any ARM-based mobile target such as tablets. The laboratory experiments that are included cover the following topics: signal sampling and I/O buffering, quantization effects, fixed-point versus floating-point implementation, FIR filtering, IIR filtering, adaptive filtering, DFT/FFT frequency transformation, and optimization techniques to gain computational efficiency.
Figure 1.1: Components of the developed shell programs to run C codes on iPhone and Android
smartphones.
Processing – This module allows running C codes within the Android shell. Additional code segments are written to interface with the Java modules using the Java Native Interface (JNI) programming framework.
Chapter 3 and Lab 2 (p. 43) are the counterparts of Chapter 2 and Lab 1 (p. 20), focusing instead on the iOS operating system. Chapter 3 details the setup of the Xcode programming environment and duplicates the "Hello World" app from Lab 1 (p. 20). It also includes the debugging tool for an iOS platform.
Chapter 4 introduces the topics of signal sampling and frame-based processing, and the steps that are required to interface with the A/D and D/A (analog-to-digital and digital-to-analog) converters for audio signal input and output on a smartphone target. As part of this process, the smartphone app shells for the Android and iOS platforms are covered in detail. The Java and Objective-C shells are discussed, and the steps to incorporate C codes are explained.
Labs 3 (p. 52) and 4 (p. 61) in Chapter 4 show how to sample an analog signal, process it, and produce an output in real-time on an Android and iOS smartphone target, respectively. Lab 3 (p. 52) covers the Android development environment, and Lab 4 (p. 61) the iOS development environment. These lab experiments involve processing a frame of signal samples captured by the smartphone microphone. The frame length can be altered by the user through a graphical-user-interface (GUI) settings menu. The sampling rate can also be altered depending on the sampling rates permitted by the A/D converter of the smartphone target used. It is normally possible to alter the sampling rate on a smartphone from 8 kHz to 48 kHz. A low-pass FIR filter together with a user-specified delay are considered in this lab experiment. The delay is meant to simulate an additional signal processing algorithm running on the ARM processor of the smartphone. The delay can be changed by the user through the settings menu, adding additional processing time to the low-pass filtering time. By increasing the sampling frequency or lowering the sampling time interval, data frames will get skipped and hence a real-time throughput cannot be met. Besides skipped frames noted on the GUI, one can hear both the original signal and the filtered signal through the speaker of the smartphone and notice the distortion caused by skipped frames due to the real-time demand. Distortion can also be experienced by increasing the processing time delay, thus demonstrating that a real-time throughput is a balance between computational complexity and computation rate. Processing of one frame of data needs to be done in less than N·dt seconds in order to achieve a real-time throughput, where N denotes the frame length and dt the sampling time interval. For example, for a sampling rate of 8 kHz and a frame length of 256, the processing needs to be completed within 32 msec in order for all the frames to get processed without any frames getting skipped.
In Chapter 5, fixed-point and floating-point number representations are discussed and their differences are pointed out. Lab 5 (p. 77) in Chapter 5 gives suggestions on how one may cope with the overflow problem. This lab experiment involves running an FIR filter on a smartphone using fixed-point arithmetic. 16 bits are used to quantize the double-precision floating-point filter coefficients generated by a filter design package. Due to quantization, the frequency response of the filter is affected. The quantization word length can be adjusted in the settings menu and the deviation of the frequency response magnitude can be observed in a graph displayed automatically in the user interface. The settings menu allows the user to alter the quantization bits to examine the deviation of the frequency response from the frequency response of the floating-point implementation. In addition, due to quantization, overflows may occur depending on the number of coefficients. This experiment shows how scaling can be used to overcome overflows by scaling down input samples and scaling back up output samples generated by the filter.
Chapters 6 and 7 discuss common filters used in digital signal processing applications. Lab 6 (p. 88) in Chapter 6 covers FIR (finite impulse response) filtering and Lab 7 (p. 101) in Chapter 7 shows how adaptive filtering can be used to perform system identification. The experiment in Lab 7 (p. 101) exhibits adaptive filtering where an adaptive FIR filter based on the least mean squares (LMS) coefficient update is implemented to match the output of an IIR filter. The error between the output of the adaptive FIR filter and the IIR filter for an input signal is measured and displayed on the smartphone screen in real-time as the app runs. Over time the error between the two outputs converges towards zero. The user can experiment with the rate of convergence by altering the adaptive filter order through the settings menu without needing to recompile the code. As the filter order is increased, it can be observed that the convergence rate also increases. The drawback of increasing the filter order, which is an increase in the processing time, can also be observed. This experiment allows one to see how a tradeoff between convergence rate and real-time throughput can be established.
Chapter 8 covers frequency domain transforms and their implementation using frame-based processing. Lab 8 (p. 119) explores the computational complexity of Fourier transform algorithms and shows the utilization of the Fourier transform for solving linear systems. The first part of this lab experiment compares the computational complexity of the discrete Fourier transform (DFT) and the fast Fourier transform (FFT) by first computing the DFT directly, having the computational complexity of O(N²), and then via the FFT, having the computational complexity of O(N log N). In the second part of this lab, a filter is implemented in the frequency domain by using the Fourier transform three times. Frequency domain filtering is done by complex multiplication between two transformed signals. This approach is observed to be more computationally efficient than convolution when the length of the filter is made long.
Code efficiency issues are addressed in Chapter 9, in which optimization techniques, as well as the use of intrinsics to access hardware features of the ARM processor, are discussed. Lab 9 (p. 132) in this chapter provides a walk-through of optimization techniques and their impact on a signal processing app. In this lab experiment, the steps one can take to speed up code execution on a smartphone target are covered. These steps include changing compiler settings, writing efficient C code, and using architecture-specific functions for the ARM processor. The FIR filtering (linear convolution) code is used here to show the effects of these steps on the real-time throughput. Compiler options constitute the simplest but an effective optimization step. By changing these options, the compiler produces executable binaries that are optimized either for higher processing speed or for lower memory footprint. After carrying out various compiler optimization options and observing the computational efficiency gains, one can take advantage of the NEON SIMD coprocessor that modern smartphones possess to perform vector data processing. One method of using the NEON coprocessor is the use of NEON intrinsics within C codes. These intrinsics allow access to architecture-specific operations such as fused multiply-accumulate, the Newton-Raphson method for division and square root, data format conversions, and saturating arithmetic operations. In other words, many of the architecture-specific features of the ARM processor can be accessed by utilizing intrinsic functions within C codes. The initial processing algorithms can be used as a basis for deciding where to utilize intrinsics. In this lab, it is demonstrated that the convolution of two signal sequences can be performed more efficiently by utilizing a vectorized loop via NEON intrinsics.
Chapter 10 presents an optional alternative approach, presented in [14], using the Simulink
Coder from the company MathWorks that can be used to rapidly take a signal processing algo-
rithm implemented in MATLAB and transfer it to a smartphone target. e lab experiment
covered in the chapter exhibits the setup process for the Simulink Support Package for Sam-
sung Galaxy Android Devices and Simulink Support Package for Apple iOS Devices provided
by MathWorks. is requires the use of MATLAB version 2015a with Simulink. e experi-
ment shows how to implement a signal processing app as a Simulink model by incorporating the
Simulink blocks for audio input and output on a smartphone target while the signal processing
app is implemented via a MATLAB script within a MATLAB code block.
1.6 REFERENCES
[1] https://www.raspberrypi.org/ 1
[2] N. Kehtarnavaz, Real-time Digital Signal Processing Based on the TMS320C6000, Elsevier,
2004. 1
[3] N. Kehtarnavaz, Digital Signal Processing System Design, Second Edition: LabVIEW-Based
Hybrid Programming, Academic Press, 2008.
[4] T. Welch, C. Wright, and M. Murrow, Real-Time Digital Signal Processing from MATLAB
to C with the TMS320C6x DSPs, CRC Press, 2011.
[5] S. Kuo and B. Lee, Real-Time Digital Signal Processors: Implementations, Applications and
Experiments with the TMS320C55x, Wiley, 2001.
[6] N. Kehtarnavaz and S. Mahotra, Digital Signal Processing Laboratory: LabVIEW-Based
FPGA Implementation, Universal Publishers, 2010. 1
[7] http://developer.android.com/sdk/index.html 2
[8] http://developer.android.com/tools/sdk/ndk/index.html 2
[9] https://developer.apple.com/library/ios/referencelibrary/GettingStarted/RoadMapiOS/index.html 2
[10] https://developer.apple.com/programs/ios/ 2
[11] https://code.google.com/p/ios-coreaudio-example/ 3
[12] http://www.arm.com/products/processors/technologies/neon.php 4
[13] ARM Ltd., ARM Architecture Reference Manual ARMv7-A and ARMv7-R Edition, 2011,
http://www.arm.com 4
CHAPTER 2
create the directories by using the Browse option and create a Studio folder and an sdk folder.
When the installer is finished, do not allow it to start Android Studio as additional configuration
is still needed.
The last step is to extract the Android NDK to the folder C:\Android by placing the archive executable in the folder and running it. When this action is completed, rename the folder android-ndk-<version> to ndk.
the Advanced tab, see Figure 2.4. Then, create new system variables by clicking the New... button below the System variables section, shown in Figure 2.5. There are three new system variables that need to be set: ANDROID_SDK_HOME with the value C:\Android , ANDROID_SDK_ROOT with the value %ANDROID_SDK_HOME%\sdk , and ANDROID_NDK_HOME with the value %ANDROID_SDK_HOME%\ndk .
Then, add the following text to the end of your system path variable as shown in Figure 2.6: ;%ANDROID_SDK_ROOT%\platform-tools . Be sure to include the semicolon, which is used to separate the variables. Modifications are now complete and the settings menus can be closed.
Now run the SDK Manager, whose entry can be found by clicking on the Configure option.
The SDK Manager will automatically select any components of the Android SDK which need
updating, as illustrated in Figure 2.9. From this menu, additional system images for emulation
and API packages for future Android versions can also be added.
Click the Install option and allow the update process to complete.
For the first launch of the AVD, it is best to select the Wipe user data and Save to snapshot
options (see Figure 2.13). Click Launch and wait for the AVD to boot. Once the AVD launches,
unlock the screen and dismiss the greeting message. Open the apps menu and do the same with
the message that appears there. Go back to the home screen and close the emulator. In the AVD
Manager, select the Start option for your AVD again. This time, make sure that only the Launch
from snapshot box is checked in the Launch Options menu. Click Launch and the emulator should
boot significantly faster than it did previously.
Specifying that the snapshot not be saved for subsequent launches ensures that the emulator
always remains clean for testing apps. To ensure that this setting is carried over when developing
apps with Android Studio, the default run configurations need to be changed. Close the AVD
and SDK Managers and return to the main Android Studio screen (see Figure 2.8). Navigate
down to the Configure and Project Defaults entries to get to the Run Configurations option (see
Figure 2.14).
Expand the Defaults listing and select the Android Application option. Under the General
tab, scroll down to the Target Device section and select the Show chooser dialog option. Next, switch
to the Emulator tab and enter the text -no-snapshot-save in the field marked Additional
command line options and ensure that the checkbox is enabled (see Figure 2.15). Repeat these steps
for the Android Tests section as well. Later on when apps are being installed onto the emulator,
this will ensure that the emulator does not retain the old app data and remains clean.
L1 LAB 1:
GETTING FAMILIAR WITH ANDROID SOFTWARE
TOOLS
This lab covers the construction of a simple "Hello world!" app on the Android smartphone
platform. Android Studio and NDK tools are used for code development, emulation, and code
debugging. All the codes needed for this and other labs can be extracted from the package
mentioned in Chapter 1. Start by launching Android Studio and, if not already done, set up an
Android Virtual Device (AVD) for use with the Android emulator.
• Begin by creating a new Android project using the Quick Start menu found on the Android
Studio home screen.
• Set the Application Name to HelloWorld and the project location to a folder within the
C:\Android directory.
• The Target Android device should be set to Phone and Tablet using a Minimum SDK
setting of API 15 .
• Click Next and on the following screen choose to create a Blank Activity .
• Click Next again to create a Blank Activity and leave the default naming.
• Select Finish. The new app project is now created and the main app editor will open to show
the GUI layout of the app.
• Navigate to the java directory of the app in the Project window and open the
MainActivity.java file under com.dsp.helloworld.
The class that typically defines an Android app is called an Activity. Activities are generally
used to define user interface elements. An Android app has activities containing the various
sections that the user might interact with, such as the main app window. Activities can also be
used to construct and display other activities, for example when a settings window is needed.
Whenever an Android app is opened, the onCreate function or method is called. This method
can be regarded as the "main" of an activity. Other methods may also be called during various
portions of the app lifecycle as detailed at the following website:
http://developer.android.com/training/basics/activity-lifecycle/starting.html
In the default code created by the SDK, setContentView(R.layout.activity_main)
displays the GUI. The layout is described in the file res/layout/activity_main.xml in the Package
Explorer window. Open this file to preview the user interface. Layouts can be modified using the
WYSIWYG editor which is built into Android Studio. For now the basic GUI suits our purposes
with one minor modification, detailed as follows:
• Open the XML text of the layout (see Figure L1.2) by double clicking on the
Hello world! text or by clicking on the activity_main.xml tab next to the Graphical
Layout tab.
• Add the line android:id="@+id/Log" within the <TextView/> section on a new line
and save the changes. This gives a name to the TextView UI element.
The TextView in the GUI acts similarly to a console window: it displays text, and additional
text can be appended to it. By adding the android:id directive to the TextView, it can be
interfaced with in the app code.
After setting up the emulator and the app GUI, let us now cover interfacing with C codes.
Note that it is not required to know the Java code syntax. The purpose is to show that the Java
Native Interface (JNI) is a bridge between Java and C codes. Java is useful for handling Android
APIs for sound and video I/O, whereas the signal processing codes are written in C. Of course,
familiarity with C programming is assumed.
A string returned from a C code is considered here. The procedure to integrate native code
consists of creating a C code segment and performing further alterations to the project. First, it is
required to add support for the native C code to the project. The first step is to create a folder in
which the C code will be stored. In the Project listing, navigate down to New > Folder > JNI to
create a folder in the listing called jni . Refer to Figure L1.3 through Figure L1.6. Figure L1.5
shows how the Project listing view may be changed in order to show the jni folder in the main
source listing.
Android Studio now needs to be configured to build C code using the Gradle build
system. Begin by specifying the NDK location in the project local.properties file according to
Figure L1.7. Assuming the directory C:\Android is used for setting up the development tools,
the location specification would be as follows:

ndk.dir=C\:\\Android\\ndk
Next, the native library specification needs to be added to the build.gradle file within the
project listing. This specification declares the name of the native library, which is needed by Java
to actually load the library, as well as the library target platform (e.g., armeabi, x86, mips). This
is done by adding the following code to the defaultConfig section:

ndk {
    moduleName "HelloWorld"
    abiFilter "armeabi"
}
#include <jni.h>
jstring Java_com_dsp_helloworld_MainActivity_getString (
JNIEnv* env, jobject thiz ) {
return (*env)->NewStringUTF(env, "Hello UTD!");
}
This code defines a method that returns a Java string object according to the JNI specifications
with the text Hello UTD! . The naming for this method depends on what is called the fully
qualified name of the native method, which is defined in the MainActivity class. There are
alternate methods of defining native methods that will be discussed in later labs.
It is important to note that due to a bug currently present in the Gradle build system,
a dummy C source file needs to be created in the jni folder in order for the build process to
complete successfully. Simply create a new source file, named dummy.c for example, without any
code content.
Next, the native method needs to be declared within the MainActivity.java class (see Fig-
ure L1.9) according to the naming used in the C code. To do so, add the following declaration
below the onCreate method already defined:

public native String getString();
Now, add the following code within public class to load the native library:
static {
System.loadLibrary("HelloWorld");
}
To use the TextView GUI object, it needs to be imported by adding the following declaration
to the top of the MainActivity.java file:
import android.widget.TextView;
This will cause the text displayed in the TextView to be changed by the second line, which
calls the C getString method.
Save the changes and select the Make Project option (located under the Build category on
the main toolbar). Android Studio will display the build progress and notify you if any errors
occur. Next, run the app on the Android emulator using the Run app option located in the Run
menu of the toolbar. If an emulator is already running, an option will be given to deploy the app
to the selected device (see Figure L1.10). Android Studio should launch the emulator and the
screen (see Figure L1.11) should display Hello UTD! . To confirm that the display is being
changed, comment out the line log.setText() and run the app again. This time the screen
should display Hello World! .
Note that the LogCat feature of Android Studio can be used to display a message from
the C code. LogCat is the main system log, or display of execution information. Here, the code
from the previous project is modified to enable log output as follows:
• Add the logging library to the build.gradle file (see Figure L1.8) by adding the line
ldLibs "log" to the ndk section which was added previously.
• Add the Android logging import to the top of the HelloWorld.c source file (see Figure L1.12)
by adding the line #include <android/log.h> .
• Add the code that outputs the test message before the return statement. The message
DSP 9001.001 would appear in the LogCat listing if the previous procedures were performed
properly (see Figure L1.12).
This table shows some common data types. A multi-dimensional array is represented as an
array of arrays. Since an array is an object in Java, a multi-dimensional array appears as an
array of Java objects (which are themselves arrays of primitives, e.g., floating-point values).
For the example above, the function used is:
jstring Java_com_dsp_helloworld_MainActivity_getString (
JNIEnv* env, jobject thiz ) {
return (*env)->NewStringUTF(env, "Hello UTD!");
}
According to the JNI convention, the inputs to this method, i.e., JNIEnv* env and
jobject thiz , are always required. Additional input variables may be added and the return
type may be changed, as noted below:
jfloat Java_com_dsp_helloworld_MainActivity_getArea (
JNIEnv* env, jobject thiz, jfloat radius) {
return 3.14159f*radius*radius;
}
with the corresponding native method in Java declared as

public native float getArea(float radius);
CHAPTER 3
iOS Software Development Tools
8. Click Next. On the next page, remember to deselect the Create Git Repository option.
After clicking Create, the settings screen of the project is shown. Here the features of the
app can be altered, the devices supported by your project can be changed, and any additional
frameworks or libraries to be utilized by your project can be added.
If the warning "No signing identity found" is displayed, it means that your Apple Developer
Account needs to be accepted for iOS app development. Also, your device must be certified
for app development.
3.2 SETTING-UP APP ENVIRONMENT
The left column in the Xcode window is called the Navigator. Here one can select or organize
the different files and environments for a project.
• In the Navigator pane, the Main.Storyboard entry is seen. This is used to design the layout
of your app. Different UI elements in multiple views provided by the IDE can be used to
design the interface of an app. However, here this is done programmatically.
• AppDelegate.h and AppDelegate.m are Objective-C files that can be used to handle events
such as:
– app termination
– app entering background or foreground
– app loading
These files are not accessed here.
• The files ViewController.m and ViewController.h are used to define methods and properties
specific to a particular view in the storyboard.
@interface ViewController ()
@property UILabel *label;
@property UIButton *button;
@end
Initialize the label and button and assign them to the view. This can be done by adding the
following code in the viewDidLoad method.
- (IBAction)buttonPress:(id)sender {
}
• Right click on the HelloWorld folder in your project navigator in the left column and select
New File.
• Write the file name as Algorithm and select Also create a header file.
• After clicking Next, select the destination to store the files. Preferably store the files in the
folder of your project.
• In the project navigator, you can view the two new added files. Select Algorithm.c.
• The function HelloWorld() prints a string and returns a char pointer upon execution.
Let us call this function on the button press action in the view controller and alter the label.
• To allow this function to be called from Objective-C, the function needs to be declared in
the header file. For this purpose, in Algorithm.h, add the following line before #endif :
This code line alters the text of the label in the program.
• Run the program in the simulator.
On pressing the button in the simulator, the following is observed:
1. The text of the label changes.
2. In the Xcode window, "Method Called" gets printed in the Debug Console at the bottom.
This shows that printing can be done from the C function to the debug console in Xcode.
This feature is used for debugging purposes.
L2 LAB 2:
IOS APP DEBUGGING
After getting familiar with the Xcode IDE by creating and modifying an iOS app project
and running the app on an iPhone simulator, the following lab experiment can be done to debug
C codes via the built-in Xcode debugger.
To obtain familiarity with the Xcode debugging tool, perform the following:
• Begin by acquiring the C code to be used for this lab.
• Open the folder containing the project.
• Double click on the file with the extension .xcodeproj.
• Navigate to the C code in the Project Navigator.
The app can be built by going to Product -> Build.
After the project is successfully built, debug points can be placed inside the C code. Debug
points can be placed by clicking in the column next to the line to be debugged or by pressing
CMD+\. A blue arrow appears (see Figure L2.1) that points to the line to be debugged.
The recent Xcode version at the time of this writing includes the LLDB debugger, which
allows one to view the data in an array via a pointer by typing the following command in the
debug console after the debug point is encountered:
Hints – When attempting to debug with Xcode, first get the code to a point where it will
compile. Then, fix the logical errors in the code. The code will compile as is, but the processing
functionality is commented out. When testing, the reference output found in lab2_testsignal.txt
can be used to verify the result. It should match the output from the C code in your app.
CHAPTER 4
Analog-to-Digital Signal
Conversion
The process of analog-to-digital signal conversion consists of converting a continuous time and
amplitude signal into discrete time and amplitude values. Sampling and quantization constitute
the steps needed to achieve analog-to-digital signal conversion. To minimize any loss of
information that may occur as a result of this conversion, it is important to understand the
underlying principles behind sampling and quantization.
4.1 SAMPLING
Sampling is the process of generating discrete time samples from an analog signal. First, it is
helpful to see the relationship between analog and digital frequencies. Let us consider an analog
sinusoidal signal x(t) = A cos(ωt + φ). Sampling this signal at t = nTs, with the sampling time
interval of Ts, generates the discrete time signal

x[n] = A cos(ωnTs + φ) = A cos(θn + φ)    (4.1)

where θ = ωTs denotes the digital frequency in radians.
Figure 4.1: Different sampling of two different analog signals leading to the same digital signal.
Figure 4.2: Different sampling of the same analog signal leading to two different digital signals.
Fourier transform pair for analog signals:

X(jω) = ∫_{−∞}^{∞} x(t) e^{−jωt} dt
x(t) = (1/2π) ∫_{−∞}^{∞} X(jω) e^{jωt} dω        (4.2)

Fourier transform pair for discrete signals:

X(e^{jθ}) = Σ_{n=−∞}^{∞} x[n] e^{−jθn},  θ = ωTs
x[n] = (1/2π) ∫_{−π}^{π} X(e^{jθ}) e^{jθn} dθ        (4.3)
Figure 4.3: (a) Fourier transform of a continuous-time signal, and (b) its discrete time version.
As illustrated in Figure 4.3, when an analog signal with a maximum frequency of fmax
(or bandwidth of W ) is sampled at a rate of fs = 1/Ts, its corresponding frequency response is
repeated every 2π radians, or fs. In other words, the Fourier transform in the digital domain
becomes a periodic version of the Fourier transform in the analog domain. That is why, for
discrete signals, one is only interested in the frequency range 0 to fs/2.
Therefore, in order to avoid any aliasing or distortion of the frequency content of the discrete
signal, and hence to be able to recover or reconstruct the frequency content of the original analog
signal, the sampling frequency must obey the rate fs ≥ 2fmax. This is known as the Nyquist
rate; that is, the sampling frequency should be at least twice the highest frequency in the signal.
Normally, before any digital manipulation, a frontend antialiasing analog lowpass filter is used to
limit the highest frequency of the analog signal.
Figure 4.4 shows the Fourier transform of a sampled sinusoid with a frequency of fo. As
can be seen, there is only one frequency component at fo. The aliasing problem can be further
illustrated by considering an under-sampled sinusoid as depicted in Figure 4.5. In this figure, a
1 kHz sinusoid is sampled at fs = 0.8 kHz, which is less than the Nyquist rate. The dashed-
line signal is a 200 Hz sinusoid passing through the same sample points. Thus, at this sampling
frequency, the output of an A/D converter would be the same if either of the sinusoids were the
input signal. On the other hand, over-sampling a signal provides a richer description than that of
the same signal sampled at the Nyquist rate.
4.2 QUANTIZATION
An A/D converter has a finite number of bits (or resolution). As a result, continuous amplitude
values get represented or approximated by discrete amplitude levels. The process of converting
continuous into discrete amplitude levels is called quantization. This approximation leads to an
error called quantization noise. The input/output characteristic of a 3-bit A/D converter is shown
in Figure 4.6 to see how analog values get approximated by discrete levels.
Figure 4.6: Characteristic of a 3-bit A/D converter: (a) input/output static transfer function, and
(b) additive quantization noise.
To avoid saturation or out-of-range distortion, the input voltage must be between Vref−
and Vref+. The full-scale (FS) signal Vref is defined as

VFS = Vref = Vref+ − Vref−    (4.4)

and one least significant bit (LSB) is given by

1 LSB = Δ = Vref / 2^N    (4.5)
where N is the number of bits of the A/D converter. Usually, it is assumed that quantization
noise is signal independent and is uniformly distributed over −0.5 LSB and 0.5 LSB. Figure 4.8
shows the quantization noise of an analog signal quantized by a 3-bit A/D converter. It is seen
that, although the histogram of the quantization noise is not exactly uniform, it is reasonable to
consider the uniformity assumption.
L3 LAB 3:
ANDROID AUDIO SIGNAL SAMPLING
This lab provides an understanding of the tools provided by the Android API for capturing audio
signals and outputting processed audio signals. The Android API documentation is available online
at http://developer.android.com/reference/packages.html. The two relevant packages
for this lab are android.media.AudioRecord for audio input and android.media.AudioTrack
for audio output. As noted in the previous labs, the Android emulator does not support audio
input. Additionally, the computation time on the emulator is not accurate or stable, so an actual
smartphone target is required in order to obtain proper computation times in the exercises.
This lab involves an example app demonstrating how to use the Android APIs supplied
in Java and how to wrap C code segments so that they can be executed using the Java Native
Interface (JNI). The example app records an audio signal from the smartphone microphone and
applies a lowpass filter to the audio signal. An overview of the dataflow is shown in Figure L3.1.

Figure 4.8: Quantization of an analog signal by a 3-bit A/D converter: (a) output signal and
quantization error, (b) histogram of quantization error, and (c) bit stream.

Input samples can come from either a file or from the microphone input. These samples
are stored in a Java WaveFrame object (a data wrapper class used for transferring data). The
BlockingQueue interface is used to transfer data from the input source to the processing code,
and finally to the output destination (either file or speaker). Using the BlockingQueue interface
is advantageous because it allows a small buffer for accumulating data, while at the same time
functioning as a First-In-First-Out (FIFO) queue. The WaveFrame objects are helper objects
which serve to store audio samples in an array. These objects are stored and re-used to reduce
garbage collection in the Java VM.
Enable software installation from Unknown sources. The Unknown sources option can be found
in the general Settings configuration on the smartphone, in the Security submenu.
The project source of the app is also provided for your reference. This app performs signal
sampling using the smartphone microphone. It can also read in a previously sampled signal from
WAV or PCM files in the device storage. The input signal is accumulated into frames of audio data
and then passed through JNI to C code for lowpass filtering. An artificial computational delay is
added to the filtering code by the use of the usleep function within the C processing segment.
The filtered signal is then passed back to Java and is saved to a file or played back through the
target audio output. The app settings menu allows the adjustment of the sampling rate, frame
size, and computational delay of the signal processing pipeline.
An important aspect of the app manifest is enabling the correct permissions to perform
file IO and record audio. Permissions are set with the uses-permission directive, where
RECORD_AUDIO is defined for audio input and WRITE_EXTERNAL_STORAGE enables saving files
to the device storage. A detailed list of the permissions can be found on the Android developer
website at:
http://developer.android.com/reference/android/Manifest.permission.html
The code to be executed is referred to as an Activity, in this case named .RealTime in the
manifest, which corresponds to the RealTime.java file. The manifest indicates namings which are
necessary for hooking into the UI elements defined in the layout. RealTime.java contains the
code which hooks into the UI elements and controls the app execution.
L3.3 RECORDING
The API that supplies audio recording capability is found in android.media.AudioRecord.
A reference implementation of this API appears within the WaveRecorder class of the sampling
code. To determine which sampling rates are supported by an Android device, the function
WaveRecorder.checkSamplingRate() is called upon app initialization. This function
initializes the AudioRecord API with a preset list of sampling rates. If a sampling rate
is not supported, an error will occur and that particular sampling rate will not be added to
the list of supported sampling rates. The supported sampling rates as determined by the function
checkSamplingRate are listed in the app settings menu.
To initialize the recorder, the size of the data buffer to be used by the recorder is first
computed by calling the function getMinBufferSize as follows:

int BufferLength = AudioRecord.getMinBufferSize(FS, CHANNELS, FORMAT);

where FS specifies the desired sampling rate, CHANNELS is the channel configuration which
can be either stereo or mono, and lastly FORMAT specifies 16-bit PCM or 8-bit PCM audio
data. The return value of this function is the minimum size of the buffer in bytes. It is important
to mention that the size of this buffer is dependent upon the Android target and will vary from
target to target. To ensure that no audio data is lost, this value can be scaled up to provide enough
overhead, but it needs to be set to at least the size returned by getMinBufferSize .
Scaling up the size of this buffer will also affect the latency of the recording; larger buffers will
cause increased delay before the sampled audio is available for processing. The same is true for
the AudioTrack API; increasing the output buffer size will increase the delay before the processed
audio is output to the smartphone speaker. This BufferLength value is used to instantiate the
AudioRecord object:
AudioRecord object:
e values used for FS, CHANNELS, and FORMAT should match those used to
calculate the buffer length. ere are several options for the SOURCE parameter, detailed
at http://developer.android.com/reference/android/media/MediaRecorder.AudioS
ource.html. For most cases it should be specified as AudioSource.CAMCORDER as this ensures
that the built-in filters for noise reduction and gain correction for voice calls are not active during
the recording.
The audio recorder does not begin to accumulate data when it is instantiated and must
be controlled. To begin collecting data, the function recorder.startRecording() is called.
The recorder object is then polled to retrieve audio data by calling one of the read functions. If
audio is not read from the recorder at a sufficient rate, the internal buffer will overflow. The read
function must be supplied with a buffer to write data into, an initial offset, and a final offset.
These functions are blocking, meaning that the program flow will not continue until the desired
amount of audio data has been read into the supplied buffer. The recording loop is shown below:
loop:while(true){
if(isRecording.get()) {
out = recycleQueue.take();
recorder.read(out.getAudio(), 0, Settings.stepSize);
output.put(out);
} else {
output.put(Settings.STOP);
break loop;
}
}
loop:while(true) {
    WaveFrame currentFrame = null;
    currentFrame = input.take();
    if(currentFrame == Settings.STOP){
        output.put(currentFrame);
        break loop;
    }
    process(currentFrame.getAudio());
    getOutput(currentFrame.getAudio(), Settings.output);
    output.put(currentFrame);
}
process() takes the short array input corresponding to the input signal and processes it using
the native code methods. After the processing method is called, the samples corresponding to the
filtered output will be stored temporarily in a memory location allocated by the native code. To
retrieve the filtered signal, the getOutput() method is used to overwrite the contents of the
WaveFrame audio buffer with the desired output as selected by an integer based switch.
package <tld>.<your_domain>.<your_package>;
public class <YourClass> {
public static native float[] <yourMethod>(float[] in, float a,
int b);
}
2. Method naming within the native code should follow a set pattern and handle JNI variables
as indicated below. Note that the input and output types shown in the above yourMethod
declaration correspond to the input and output types in the declaration noted below with
the addition of some JNI fields.
jfloatArray
Java_<tld>_<your_domain>_<your_package>_<YourClass>_<yourMethod>(
JNIEnv* env, jobject thiz, jfloatArray in, jfloat a, jint b)
{ /*do stuff*/ }
3. Load the native libraries into memory when the program runs. The following code should
be placed in the class file responsible for defining the Android activity. In the example app,
this can be found in the GUI class:
static {
System.loadLibrary("yourlibrary");
}
System.loadLibrary links the native method declaration in the Java class with the processing
code declared in the C file. There are alternative methods of declaring native functions and
performing the linking which will be covered in a later chapter.
Depending on the input data types in the methods and the expected return data types, the
JNI method signature will change. If any of these steps are not correctly done, Java will not be
able to find the native methods and an exception will be thrown when the app attempts to call
the native method.
2. Experiment with various frame sizes and computation delays to find the maximum
acceptable computation delay for a given frame size. Explain the situations when real-time
processing breaks down.
Hints – Here are the steps that need to be taken for running the prebuilt app code on your
Android smartphone target:
• Enable application installation from unknown sources. On Android 4.3, this option is
in the Settings > More > Security section of the Settings menu. This option may also
be in Settings > Applications on older Android versions.
• Put the application APK on the smartphone and open the APK to install the app.
You should then be able to run the app. You can record your own audio samples by selecting
a sampling rate from the app Settings menu. Set the Debugging Level to Wave for the audio
to be saved. Test the case when recording audio and the processing takes too long, and then
the case when reading from a file and the processing takes too long.
L4 LAB 4:
IOS AUDIO SIGNAL SAMPLING
This lab is the iOS version of Lab 3 (p. 52) for capturing audio signals and outputting processed
audio signals on an iPhone smartphone target. The iOS API documentation is available online
at https://developer.apple.com/library/ios/navigation/. The relevant framework for
this lab is AudioToolbox. As noted previously, the iPhone simulator does not support audio input.
Additionally, the computation time on the simulator is not accurate or stable, thus an actual
smartphone target is required in order to obtain actual computation times in the exercises.
The structure of the previous debugging lab is re-used here. This lab involves an example
app demonstrating how to use the iOS APIs supplied in Objective-C and how to properly link
C code segments so that they can be executed using just a header file. The example app records
an audio signal from the smartphone microphone and applies a lowpass filter to the audio signal.
An overview of the dataflow is shown in Figure L4.1.

Figure L4.1: Dataflow of the example app: microphone input to the IosAudioController input
buffer, lowpass filter in C, and output buffer to the speaker.
The input samples can come from either a file or from the microphone input. They get
stored as short arrays in an AudioBuffer. The i/o buffers are used to transfer data from the input
source to the processing code, and finally to the speaker as output. The use of callbacks allows one
to input data from either the microphone or a file, store the samples in a software buffer, process
them, and then output them to the speaker.
62 4. ANALOG-TO-DIGITAL SIGNAL CONVERSION
L4.1 APP SOURCE CODE
e project source of this app allows one to study sampling rate and frame size and their effects on
the maximum computation delay allowable in a real-time signal processing pipeline. It is required
to use a real iOS device for this app, as the simulator timings do not reflect those of an actual target
device.
This app performs signal sampling using the microphone of an iPhone smartphone. It can
also read an audio file in CAF or PCM format from the device storage. The input signal is accumulated into
frames of audio data and then passed to the C code for lowpass filtering. An artificial computation
delay is added to the filtering code using the function usleep within the C code segment. The app
settings menu allows adjusting the sampling rate, the frame size, and the delay.
iOS does not require any permissions to be added to a manifest. It automatically asks the
user for permission to access the microphone. As the audio file provided for testing with this app
is included in the main bundle, no special permissions are required to access it.
The code that is executed is called from the main.m file. It returns the app main in an
autoreleasepool, i.e., it handles the automatic releasing of memory variables.
#import <UIKit/UIKit.h>
#import "AppDelegate.h"
#import "IosAudioController.h"
L4.2 RECORDING
The API that supplies the audio recording capability is part of the AudioToolbox. It can be
included within a file using the #import directive. To specify the audio format to be used by
the app, an instance of AudioStreamBasicDescription needs to be created. This specifies the audio
data to be processed.
AudioStreamBasicDescription audioFormat;
audioFormat.mSampleRate = 44100.0;
audioFormat.mFormatID = kAudioFormatLinearPCM;
audioFormat.mFormatFlags = (kAudioFormatFlagIsSignedInteger |
                            kAudioFormatFlagIsPacked);
audioFormat.mFramesPerPacket = 1;
audioFormat.mChannelsPerFrame = 1;
audioFormat.mBitsPerChannel = 16;
audioFormat.mBytesPerPacket = 2;
audioFormat.mBytesPerFrame = 2;
The above specifications indicate to the device that the audio data being handled has a
sampling rate of 44.1 kHz, is linear PCM, and is packed in the form of 16-bit integers. The
audio is mono with only one frame per packet. The frame size is not set here because the
hardware determines the size of the frame at runtime.
This description is used to initialize the Audio Unit that will handle the audio i/o. The
audio data is handled by a separate C function called a callback, which is shown below:
AudioBuffer buffer;
buffer.mNumberChannels = 1;
buffer.mDataByteSize = inNumberFrames * 2;
buffer.mData = malloc( inNumberFrames * 2 );
if (getMic()) {
    OSStatus status;
    status = AudioUnitRender([iosAudio audioUnit],
                             ioActionFlags,
                             inTimeStamp,
                             inBusNumber,
                             inNumberFrames,
                             &bufferList);
    checkStatus(status);
    TPCircularBufferProduceBytes(inBuffer, audioBuffer.mData,
                                 audioBuffer.mDataByteSize);
}
As the device collects data, the recording callback is called. Audio samples are collected
and stored in a software buffer. The function processStream is used to process the audio samples
in the software buffer.
- (void) processStream {
//Frame Size
UInt32 frameSize = getFrameSize() * sizeof(short);
int32_t availableBytes;
free(output);
free(buffer);
duration+=clock() - startTime;
count++;
startTime = clock();
• The function to be linked is coded in the .c file, and the corresponding header file is included
with it.
//FIRFilter.c
#include "FIRFilter.h"
• The function declaration is provided in the header file, through which it can be linked to other
files importing it.
//FIRFilter.h
#include <stdio.h>
Once the function is declared in the header file, it can be accessed anywhere in the app
simply by importing the same header file.
CHAPTER 5
Fixed-Point vs. Floating-Point
D(B) = -b_{N-1} 2^{N-1} + b_{N-2} 2^{N-2} + \cdots + b_1 2^1 + b_0 2^0.    (5.1)
The 2's-complement representation allows a processor to perform integer addition and subtraction
by using the same hardware. When using unsigned integer representation, the sign bit is
treated as an extra magnitude bit, so only positive numbers get represented this way.
There is a limitation to the dynamic range of the foregoing integer representation scheme.
For example, in a 16-bit system, it is not possible to represent numbers larger than +2^15 - 1 =
32767 or smaller than -2^15 = -32768. To cope with this limitation, numbers are normalized
between -1 and 1. In other words, they are represented as fractions. This normalization is achieved
by the programmer moving the implied or imaginary binary point (note that there is no physical
memory allocated to this point) as indicated in Figure 5.1. This way, the fractional value is given
by

F(B) = -b_{N-1} 2^0 + b_{N-2} 2^{-1} + \cdots + b_1 2^{-(N-2)} + b_0 2^{-(N-1)}.    (5.2)
This representation scheme is referred to as Q-format or fractional representation. The programmer
needs to keep track of the implied binary point when manipulating Q-format numbers.
70 5. FIXED-POINT VS. FLOATING-POINT
For instance, let us consider two Q15 format numbers and a 16-bit wide memory. Each number
consists of 1 sign bit plus 15 fractional bits. When these numbers are multiplied, a Q30 format
number is obtained (the product of two fractions is still a fraction), with bit 31 being the sign
bit and bit 32 another sign bit (called the extended sign bit). If not enough bits are available to store
all 32 bits, and only 16 bits can be stored, it makes sense to store the most significant bits. This
translates into storing the upper portion of the 32-bit product register, minus the extended sign
bit, by doing a 1-bit left shift followed by a 16-bit right shift. In this manner, the product is
stored in Q15 format (see Figure 5.2). The notation for Q-format numbers is QM.N, where M
represents the number of bits corresponding to the whole-number part and N the number of bits
corresponding to the fractional part.
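The shift sequence described above can be sketched in C. This is an illustrative helper, not code from the book; a single arithmetic right shift by 15 produces the same bits as the 1-bit left shift followed by the 16-bit right shift (except for the corner case (-1) x (-1), which overflows Q15):

```c
#include <stdint.h>

/* Multiply two Q15 numbers and store the product back in Q15.
   The Q30 product carries a sign bit and an extended sign bit;
   shifting right by 15 keeps the most significant fractional bits. */
static int16_t mul_q15(int16_t a, int16_t b) {
    int32_t product = (int32_t)a * (int32_t)b;  /* Q30 in a 32-bit register */
    return (int16_t)(product >> 15);            /* back to Q15 */
}
```

For example, 0.5 x 0.5 in Q15 is 16384 x 16384, which comes back as 8192, i.e., 0.25.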
Based on 2's-complement representation, a dynamic range of -2^{N-1} \le D(B) \le
2^{N-1} - 1 can be achieved, where N denotes the number of bits. For an easy illustration, let us
consider a 4-bit system where the most negative number is -8 and the most positive number 7.
The decimal representations of the numbers are shown in Figure 5.3. Notice how the numbers
change from most positive to most negative with the sign bit. Since only the integer numbers
falling within the limits -8 and 7 can be represented, it is easy to see that any multiplication or
addition resulting in a number larger than 7 or smaller than -8 will cause overflow. For example,
when 6 is multiplied by 2, the number 12 is obtained. This result is greater than the
representation limit and wraps around the circle to 1100, which is -4.
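This wraparound can be reproduced in C by masking to 4 bits and sign-extending bit 3. The helper below is ours, for illustration only (not from the book):

```c
/* Interpret the low 4 bits of v as a 2's-complement 4-bit number. */
static int wrap4(int v) {
    return ((v & 0xF) ^ 0x8) - 0x8;   /* XOR/subtract trick sign-extends bit 3 */
}
```

wrap4(6 * 2) evaluates to -4, matching the wheel of decimal values in Figure 5.3.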
Q-format representation addresses this problem by normalizing the dynamic range between
-1 and 1. Any resulting multiplication falls within the limits of this dynamic range. Using Q-format
representation, the dynamic range is divided into 2^N sections, where 2^{-(N-1)} is the size
of a section. The most negative number is always -1 and the most positive number is 1 - 2^{-(N-1)}.
5.1. Q-FORMAT NUMBER REPRESENTATION 71
The following example helps one to see the difference between the two representation schemes.
As shown in Figure 5.4, the multiplication of 0110 by 1110 in binary is the equivalent of multiplying
6 by -2 in decimal, giving an outcome of -12, a number exceeding the dynamic range of the
4-bit system. Based on the Q3 representation, these numbers correspond to 0.75 and -0.25,
respectively. The result is -0.1875, which falls within the fractional range. Notice that the hardware
generates the same 1's and 0's; what differs is the interpretation of the bits.
When multiplying QN numbers, it should be remembered that the result will consist of 2N
fractional bits, one sign bit, and one or more extended sign bits. Based on the data type used, the
result has to be shifted accordingly. If two Q15 numbers are multiplied, the result will be 32 bits
wide, with the MSB being the extended sign bit followed by the sign bit. The implied binary
point will be after the 30th bit. After discarding the extended sign bit with a 1-bit left shift, a right
shift of 16 is required to store the result in a 16-bit memory location as a Q15 number. It should
be realized that some precision is lost, of course, as a result of discarding the smaller fractional bits.
Since only 16 bits can be stored, the shifting allows one to retain the higher precision fractional
bits. If a 32-bit storage capability is available, a left shift of 1 can be performed to remove the
extended sign bit and store the result as a Q31 number.
To further understand the possible precision loss when manipulating Q-format numbers,
let us consider another example where two Q3.12 numbers corresponding to 7.5 and 7.25 are
multiplied and the available memory space is 16 bits wide. As can be seen from Figure 5.5,
the resulting product might be left shifted by 4 bits to store all the fractional bits corresponding
to Q3.12 format. However, doing so results in a product value of 6.375, which is different from
the correct value of 54.375. If the product is instead stored in a lower-precision Q-format, say
Q6.9, then the correct product value can be stored.
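The arithmetic can be checked in C. The helper names below are ours, not from the book; the first function stores the Q6.24 product in Q6.9 as the text recommends, while the second reproduces the failed attempt to keep Q3.12:

```c
#include <stdint.h>

/* Q3.12 x Q3.12 gives a Q6.24 product in 32 bits; shift right by
   24 - 9 = 15 fractional bits to store it as Q6.9. */
static int16_t mul_q312_to_q69(int16_t a, int16_t b) {
    int32_t product = (int32_t)a * (int32_t)b;   /* Q6.24 */
    return (int16_t)(product >> 15);             /* Q6.9 */
}

/* Failed attempt: left shift by 4 and keep the upper 16 bits so the
   result stays in Q3.12; the integer bits above 2^3 are lost. */
static int16_t mul_q312_to_q312(int16_t a, int16_t b) {
    int32_t product = (int32_t)a * (int32_t)b;             /* Q6.24 */
    return (int16_t)(((uint32_t)product << 4) >> 16);      /* truncated */
}
```

With a = 7.5 x 4096 = 30720 and b = 7.25 x 4096 = 29696, the Q6.9 result is 27840, i.e., 27840/2^9 = 54.375, while the forced Q3.12 result decodes to 6.375.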
Although Q-format solves the problem of overflow in multiplication, addition and subtraction
still pose a problem. When adding two Q15 numbers, the sum can exceed the range of Q15
representation. To solve this problem, the scaling approach, discussed later in the chapter, needs
to be employed.
5.2. FLOATING-POINT NUMBER REPRESENTATION 73
where s denotes the sign bit (bit 31), exp the exponent bits (bits 23 through 30), and frac the
fractional or mantissa bits (bits 0 through 22); see Figure 5.6.
Consequently, numbers as big as 3.4 \times 10^{38} and as small as 1.175 \times 10^{-38} can be processed.
In the double-precision format, more fractional and exponent bits are used as indicated below

(-1)^s \times 2^{(exp - 1023)} \times 1.frac    (5.4)

where the exponent bits are bits 20 through 30 and the fractional bits are all the bits of one
word and bits 0 through 19 of the other word; see Figure 5.7. In this manner, numbers as big as
1.7 \times 10^{308} and as small as 2.2 \times 10^{-308} can be handled.
When using a floating-point processor, all the steps needed to perform floating-point arithmetic
are done by the floating-point CPU hardware. For example, consider adding two floating-point
numbers a and b expressed as

a = a_{frac} \times 2^{a_{exp}}
b = b_{frac} \times 2^{b_{exp}}.    (5.5)

The sum c = a + b is given by

c = (a_{frac} + b_{frac} \times 2^{-(a_{exp} - b_{exp})}) \times 2^{a_{exp}}   if a_{exp} \ge b_{exp}    (5.6)
c = (a_{frac} \times 2^{-(b_{exp} - a_{exp})} + b_{frac}) \times 2^{b_{exp}}   if a_{exp} < b_{exp}.
These parts are computed by the floating-point hardware. This shows that, though possible,
it is inefficient to perform floating-point arithmetic on fixed-point processors, since all the
operations involved, such as those in the above equations, need to be performed in software.
5.4.1 DIVISION
The floating-point NEON coprocessor in modern smartphones provides an instruction, named
VRECPE, which provides an estimate of the reciprocal of an input number [1]. The accuracy can
be improved by using this instruction as the seed point v[0] for the iterative Newton-Raphson
algorithm expressed by the equation

v[n+1] = v[n] \times (2 - x \times v[n])    (5.7)

where x is the value whose reciprocal is to be found. Accuracy is increased by each iteration of this
equation. A portion of this equation, i.e., (2.0 - x \times v[n]), can be computed on the NEON
coprocessor via the instruction VRECPS [2]. A full iteration is then achieved by the multiplication
with the previous value. In other words, on the floating-point NEON coprocessor, division can
be achieved by taking the reciprocal of the denominator and then multiplying the reciprocal
by the numerator. Accuracy is increased by performing more iterations. More details of the
above division approach are covered in [3].
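A scalar C sketch of this flow is given below. It is a stand-in for the NEON intrinsics, with the seed passed in explicitly to mimic the VRECPE estimate; the function names and seed values are ours, not from the book:

```c
/* Newton-Raphson reciprocal: v[n+1] = v[n] * (2 - x * v[n]).
   Each VRECPS-style step computes the (2 - x*v[n]) factor; the
   multiply by v[n] completes one iteration. */
static double nr_reciprocal(double x, double seed, int iterations) {
    double v = seed;
    for (int i = 0; i < iterations; i++)
        v = v * (2.0 - x * v);
    return v;
}

/* Division b / a realized as b * (1 / a). */
static double nr_divide(double b, double a, double seed, int iterations) {
    return b * nr_reciprocal(a, seed, iterations);
}
```

Starting from a rough seed such as 0.2 for x = 4, a handful of iterations converge to 0.25; convergence is quadratic, so each iteration roughly doubles the number of correct digits.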
5.4.2 SINE AND COSINE
Trigonometric functions such as sine and cosine can be approximated by using the Taylor series
expansion. For sine, the following expansion can be used:
sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \text{higher-order terms}.    (5.8)
Clearly, adding higher order terms leads to more precision. For implementation purposes,
this expansion can be rewritten as follows:
sin(x) \approx x \left(1 - \frac{x^2}{2 \cdot 3}\left(1 - \frac{x^2}{4 \cdot 5}\left(1 - \frac{x^2}{6 \cdot 7}\left(1 - \frac{x^2}{8 \cdot 9}\right)\right)\right)\right).    (5.9)
Similarly, for cosine, the following expansion can be used:
cos(x) \approx 1 - \frac{x^2}{2} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} = 1 - \frac{x^2}{2}\left(1 - \frac{x^2}{3 \cdot 4}\left(1 - \frac{x^2}{5 \cdot 6}\left(1 - \frac{x^2}{7 \cdot 8}\right)\right)\right).    (5.10)
Furthermore, to generate sine and cosine, the following recursive equations can be used:
sin(nx) = 2 cos(x) sin((n-1)x) - sin((n-2)x)
cos(nx) = 2 cos(x) cos((n-1)x) - cos((n-2)x).    (5.11)
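The nested forms of Eqs. (5.9) and (5.10) map directly to C. The following sketch is our construction, not code from the book, and can be checked against the math library:

```c
/* Truncated Taylor series for sine, evaluated in the nested form of Eq. (5.9). */
static double sin_taylor(double x) {
    double x2 = x * x;
    return x * (1 - x2 / (2 * 3) * (1 - x2 / (4 * 5)
             * (1 - x2 / (6 * 7) * (1 - x2 / (8 * 9)))));
}

/* Truncated Taylor series for cosine, nested form of Eq. (5.10). */
static double cos_taylor(double x) {
    double x2 = x * x;
    return 1 - x2 / 2 * (1 - x2 / (3 * 4)
             * (1 - x2 / (5 * 6) * (1 - x2 / (7 * 8))));
}
```

For |x| up to about 1 radian the truncation error stays below 1e-6; larger arguments should first be range-reduced toward zero.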
5.4.3 SQUARE-ROOT
Square-root sqrt(y) can be approximated by the following Taylor series expansion, considering
that y^{0.5} = (x + 1)^{0.5}:

sqrt(y) \approx 1 + \frac{x}{2} - \frac{x^2}{8} + \frac{x^3}{16} - \frac{5x^4}{128} + \frac{7x^5}{256}
        = 1 + \frac{x}{2} - 0.5\left(\frac{x}{2}\right)^2 + 0.5\left(\frac{x}{2}\right)^3 - 0.625\left(\frac{x}{2}\right)^4 + 0.875\left(\frac{x}{2}\right)^5.    (5.12)
Here, it is assumed that x is in Q15 format. In this equation, the estimation error would
be small for x values near unity. Hence, to improve accuracy in applications where the range of x
is known, x can be scaled by a^2 to bring it close to 1 (i.e., sqrt(a^2 x) where a^2 x \approx 1). The result
should then be scaled back by 1/a.
It is also possible to compute square-root by using the Newton-Raphson algorithm. The
NEON coprocessor provides an instruction for the Newton-Raphson iteration of the reciprocal
of square-root:

v[n+1] = v[n] \times (3 - x \times v[n] \times v[n]) / 2.    (5.13)

This instruction is named VRSQRTE, which provides an estimate of 1/sqrt(x) [3].
VRSQRTS provides the recursive equation part (3 - x \times w[n])/2, where w[n] = v[n] \times v[n]. A
multiplication with the previous v[n] is then needed for one full iteration of the above
equation.
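A scalar C stand-in for the VRSQRTE/VRSQRTS flow is sketched below; the names and the explicit seed are our assumptions, not from the book:

```c
/* Newton-Raphson reciprocal square-root, Eq. (5.13):
   v[n+1] = v[n] * (3 - x * v[n] * v[n]) / 2.
   VRSQRTS supplies the (3 - x*w[n])/2 factor with w[n] = v[n]*v[n]. */
static double nr_rsqrt(double x, double seed, int iterations) {
    double v = seed;
    for (int i = 0; i < iterations; i++)
        v = v * (3.0 - x * v * v) / 2.0;
    return v;
}

/* sqrt(x) obtained as x * (1/sqrt(x)). */
static double nr_sqrt(double x, double seed, int iterations) {
    return x * nr_rsqrt(x, seed, iterations);
}
```

As with the division case, the iteration is quadratically convergent, provided the seed is close enough that 3 - x*v*v stays positive.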
5.5 REFERENCES
[1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204j/CIHCHECJ.html
[2] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204j/CIHDIACI.html
[3] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka14282.html
L5 LAB 5:
FIXED-POINT AND FLOATING-POINT OPERATIONS
In this lab, a typical computation function is implemented using both floating-point and fixed-point
arithmetic and the differences in the outcomes are compared. The function considered is
division.
A division instruction exists on ARM processors, which is performed by the floating-point
coprocessor VFP. This instruction is generally slower than a multiplication instruction in terms
of the number of clock cycles it takes to complete. For example, on an ARM
Cortex-A9 processor, fixed-point division takes 15 cycles and floating-point division
takes 25 cycles. Division can be obtained by using the following inversion operation:
To double the accuracy, inputs can be scaled into the range 1 < x < 2 by appropriately moving the
binary point in x and y. This scaling needs to be undone to produce the final output.
In the above code, the variables of type float32x4_t refer to NEON registers. The type
specifies that the register holds four 32-bit floating-point numbers. Since there is a total of 128 bits
in such registers, they are referred to as quadword (Q) registers. NEON registers containing 64 bits
are referred to as doubleword (D) registers. The NEON register bank is described in more detail
in [2] and later in Chapter 9.
If the instructions are to operate on quadword registers, the suffix "q" must be
added to the instruction intrinsic (as indicated above); otherwise, the registers are assumed to
be doubleword. The data type of the instruction needs to be specified as an additional _{type} suffix
to the instruction. Supported types include 8-, 16-, 32-, and 64-bit signed and unsigned integers, as
well as 32-bit floating-point. A complete listing of data types is available in [3].
3. Write a floating-point C code to compute division by square-root for two numbers b and a
by making use of NEON intrinsics. You should use the provided reference code for Newton-
Raphson division as the basis for your implementation. Compare the initial estimation with
the result produced from the square-root operation performed in MATLAB. Add iterations
as before and evaluate the outcome as the number of iterations is increased. How does the
rate of convergence for square-root approximation compare to the rate of convergence for
the division approximation?
5.7 REFERENCES
[1] http://infocenter.arm.com/help/topic/com.arm.doc.dui0491c/BABIIBBG.html
[2] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0002a/ch01s03s02.html
[3] http://infocenter.arm.com/help/topic/com.arm.doc.dui0473c/CIHDIBDG.html
[4] http://infocenter.arm.com/help/topic/com.arm.doc.dui0204j/CIHCHECJ.html
[5] http://infocenter.arm.com/help/topic/com.arm.doc.dui0204j/CIHDIACI.html
CHAPTER 6
Real-Time Filtering
For carrying out real-time filtering, it is required to know how to acquire input samples, process
them, and provide the result. This chapter addresses these issues toward achieving a real-time
filtering implementation on the ARM processor of a smartphone.
or
// Filtering
for( i = 0; i < N; i++ ) {
    result += (samples[i] * coefficients[i]) << 1;
}
In the CircularBuffer structure, buff contains the data and writeCount stores the
number of times the circular buffer has been written to. The rest of the code for the circular buffer
appears below.
newCircularBuffer->writeCount = 0;
newCircularBuffer->bitMask = pow2Size-1;
newCircularBuffer->buff = (short*)calloc(pow2Size,sizeof(short));
return newCircularBuffer;
}
6.2. CIRCULAR BUFFERING 85
void writeCircularBuffer(CircularBuffer* buffer, short value){
//buff[writeCount % pow2Size] = value;
//writeCount = writeCount + 1;
buffer->buff[(buffer->writeCount++) & buffer->bitMask] = value;
}
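A self-contained sketch of the same power-of-two masking idea follows, with a hypothetical read helper added so the behavior can be demonstrated; the names are ours, not from the book:

```c
#include <stdlib.h>

typedef struct {
    short *buff;
    unsigned int writeCount;
    unsigned int bitMask;   /* pow2Size - 1 */
} CircBuf;

/* pow2Size must be a power of two for the bit mask to work. */
static CircBuf *newCircBuf(unsigned int pow2Size) {
    CircBuf *cb = (CircBuf *)malloc(sizeof(CircBuf));
    cb->writeCount = 0;
    cb->bitMask = pow2Size - 1;
    cb->buff = (short *)calloc(pow2Size, sizeof(short));
    return cb;
}

static void cbWrite(CircBuf *cb, short value) {
    cb->buff[(cb->writeCount++) & cb->bitMask] = value;
}

/* Read the sample written `age` writes ago (age = 0 is the newest). */
static short cbRead(const CircBuf *cb, unsigned int age) {
    return cb->buff[(cb->writeCount - 1 - age) & cb->bitMask];
}
```

The bit mask replaces the modulo operation, which is why the buffer size must be a power of two; this is exactly the property writeCircularBuffer relies on.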
Although the performance of this method may appear acceptable, it is not the best that
can be achieved on ARM processors. To reduce the computational burden of creating a moving
window of input samples, an alternative approach known as frame processing is discussed next.
6.5 REFERENCES
[1] J. Proakis and D. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Prentice-Hall, 1996.
L6 LAB 6:
REAL-TIME FIR FILTERING, QUANTIZATION EFFECT
AND OVERFLOW
The purpose of this lab is to design and then run an FIR filter written in C on the ARM processor
of a smartphone. Also, the quantization effect and overflow are examined. The application shells
introduced in the previous labs are used here for collecting an audio signal, passing the sampled signal
to a C code segment for filtering, and saving the output to a file for analysis. The design of the FIR
filter, i.e., the generation of filter coefficients, is realized using the MATLAB filter design tool. Other
filter design tools may be used for the generation of filter coefficients. The lab experiment involves
the implementation of this filter in C code. The base application shell is used here to insert the
filtering C code. Note that the previous lab provided a floating-point C code implementation of
FIR filtering, while the fixed-point C code implementation of FIR filtering is covered here.
L6.1 FILTER DESIGN
To generate the filter coefficients, the Parks-McClellan method is used to design a lowpass filter
for the specifications stated below:
%compute deviations
dev = [(10^(rpass/20)-1)/(10^(rpass/20)+1) 10^(-rstop/20)];
This code creates an array of n + 1 double-precision filter coefficients meeting the above
specifications. In order to confirm that the filter matches the specifications, a synthesized signal is
considered for testing via the following MATLAB code (see Figure L6.1). This code synthesizes
a sinusoidal signal composed of three frequency components. The signal gets filtered, and the
spectra of the input and output signals are displayed along with the frequency response of the
filter.
subplot(3,1,1);
plot(frq*(fs/ns),X);
grid on;
#ifdef __arm__
.align 2
.global addStatus
addStatus:
@ r0 = input A and sum output address
@ r1 = input B and status output address
ldr r2, [r0] @ load contents at [r0] into r2
ldr r3, [r1] @ load contents at [r1] into r3
adds r2, r2, r3 @ add r2 and r3, store in r2
@ set status flags in APSR
mrs r3, APSR @ copy APSR to r3
str r2, [r0] @ store r2 at address [r0]
str r3, [r1] @ store r3 at address [r1]
bx lr @ return to caller
#elif __arm64__
.align 2
.global addStatus
addStatus:
@ x0 = input A and sum output address
@ x1 = input B and status output address
ldr w2, [x0] @ load contents at [x0] into w2
ldr w3, [x1] @ load contents at [x1] into w3
adds w2, w2, w3 @ add w2 and w3, store in w2
@ set status flags in NZCV
mrs x3, NZCV @ copy NZCV to x3
str w2, [x0] @ store w2 at address [x0]
str w3, [x1] @ store w3 at address [x1]
ret
#endif
Since the ARMv7 and ARMv8 instruction sets are used on different iOS devices, the assembly
code to check the status register needs to be implemented properly for each instruction set. The line
#ifdef __arm__ checks at compile time whether the supported instruction set is ARMv7. If the
instruction set is not ARMv7, it checks whether the instruction set is ARMv8 using the __arm64__
flag. There are some notable differences between the two implementations. ARMv8 uses 64-bit
registers, thus the register naming in the assembly code is different. On ARMv8, the register
file consists of 64-bit registers R0 through R30. When used in assembly coding, these registers
must be further qualified to indicate the operand data size. Registers beginning with X refer to
full-width 64-bit registers, whereas registers beginning with W refer to 32-bit registers. Also note
that the mnemonic for the status register is APSR on ARMv7 and NZCV on ARMv8. The
assembly code segment can then be called from C in the manner shown below:
After the execution of addGetStatus, the returned value contains the contents of the status
register and result contains the sum of the addition. The following test cases illustrate the
operation of the addStatus function.
On an Android platform, __android_log_print is used to send the output to LogCat:
short A = 32767;
short B = 32767;
status = addGetStatus(A, B, &result);
__android_log_print(ANDROID_LOG_ERROR, "Add Status",
"A: %d, B: %d, C: %d, Status: %#010x", A, B, result, status);
A = 32767;
B = -32768;
status = addGetStatus(A, B, &result);
__android_log_print(ANDROID_LOG_ERROR, "Add Status",
"A: %d, B: %d, C: %d, Status: %#010x", A, B, result, status);
A = 10;
B = 11;
status = addGetStatus(A, B, &result);
__android_log_print(ANDROID_LOG_ERROR, "Add Status",
"A: %d, B: %d, C: %d, Status: %#010x", A, B, result, status);
A = -10;
B = -10;
status = addGetStatus(A, B, &result);
__android_log_print(ANDROID_LOG_ERROR, "Add Status",
"A: %d, B: %d, C: %d, Status: %#010x", A, B, result, status);
A = 100;
B = -1000;
status = addGetStatus(A, B, &result);
__android_log_print(ANDROID_LOG_ERROR, "Add Status",
"A: %d, B: %d, C: %d, Status: %#010x", A, B, result, status);
A = -100;
B = -32768;
status = addGetStatus(A, B, &result);
__android_log_print(ANDROID_LOG_ERROR, "Add Status",
"A: %d, B: %d, C: %d, Status: %#010x", A, B, result, status);
A = 32767;
B = 1000;
status = addGetStatus(A, B, &result);
__android_log_print(ANDROID_LOG_ERROR, "Add Status",
"A: %d, B: %d, C: %d, Status: %#010x", A, B, result, status);
On an iOS platform, the same output is shown using the printf method:
short A = 32767;
short B = 32767;
status = addGetStatus(A, B, &result);
printf("A: %d,B: %d,C: %d,Status: %#010x\n", A, B, result, status);
A = 32767;
B = -32768;
status = addGetStatus(A, B, &result);
printf("A: %d,B: %d,C: %d,Status: %#010x\n", A, B, result, status);
A = 10;
B = 11;
status = addGetStatus(A, B, &result);
printf("A: %d,B: %d,C: %d,Status: %#010x\n", A, B, result, status);
A = -10;
B = -10;
status = addGetStatus(A, B, &result);
printf("A: %d,B: %d,C: %d,Status: %#010x\n", A, B, result, status);
A = 100;
B = -1000;
status = addGetStatus(A, B, &result);
printf("A: %d,B: %d,C: %d,Status: %#010x\n", A, B, result, status);
A = -100;
B = -32768;
status = addGetStatus(A, B, &result);
printf("A: %d,B: %d,C: %d,Status: %#010x\n", A, B, result, status);
A = 32767;
B = 1000;
status = addGetStatus(A, B, &result);
printf("A: %d,B: %d,C: %d,Status: %#010x\n", A, B, result, status);
The first half-byte of the status word copied from the APSR contains the NZCV (negative,
zero, carry, and overflow) bit flags. The outcome from the test cases is shown in Figures L6.2
and L6.3. The first hexadecimal character corresponds to the bits of the NZCV flags. For the case
of (2^15 - 1) + (2^15 - 1), adding the largest positive value representable by Q15 numbers to itself,
one can see the resulting status of 0x9, or binary 1001. This means that the result became negative
and produced an overflow.
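The flag bits can be picked apart in C. The decoder below is an illustrative helper of ours, not part of the lab shell:

```c
#include <stdint.h>

/* Extract the NZCV flags from the top nibble of the saved status word. */
static int flag_n(uint32_t s) { return (s >> 31) & 1; }  /* negative */
static int flag_z(uint32_t s) { return (s >> 30) & 1; }  /* zero     */
static int flag_c(uint32_t s) { return (s >> 29) & 1; }  /* carry    */
static int flag_v(uint32_t s) { return (s >> 28) & 1; }  /* overflow */
```

For the saturating case above, the status 0x90000000 decodes to N = 1 and V = 1 (negative result and overflow), with Z and C clear.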
Next, quantize the filter by using the MATLAB fixed-point toolbox function sfi().
For example, if coeffs denotes double-precision filter coefficients,
ficoeffs = sfi(coeffs,bits,bits-intgr-1) can be used to convert to quantized
values, with bits denoting the wordlength, intgr the number of integer bits, and
bits-intgr-1 the number of fractional bits. If the magnitude of any of the coefficients is
greater than or equal to 1, an appropriate number of integer bits needs to be used. To retrieve
the quantized coefficients, ficoeffs.data can be used.
Test your filter using the signal chirp(t,0,ts*ns,fs/2) with ns equal to 256 samples.
Determine the minimum number of bits needed to represent the coefficients such that the
maximum absolute error of the filter in the frequency domain is less than five percent; i.e.,
compare the frequency spectrum of the output produced by the quantized filter with that
produced by the double-precision unquantized filter.
2. Implement the filter designed above in C using 16-bit short values for the coefficients as
well as for the input samples. All computations are to be performed using fixed-point
representation.
The filter output may overflow due to the multiplications and summations involved
in the FIR filtering equation. Develop a scheme to detect such overflows and implement a
prevention measure. The most effective way to avoid overflows is by scaling down the input
signal magnitude before filtering and then reversing the scaling when the output is returned.
Keep scaling the input signal samples by a scaling factor less than one (a scaling factor of
1/2 can be achieved simply by right shifting) until the overflow disappears.
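One possible shape for such a scheme is sketched below. This is our illustration, not the lab solution: the filter accumulates in 32 bits, reports overflow of the 16-bit result, and a wrapper halves the input until the output fits, returning the shift count so the caller can undo the scaling.

```c
#include <stdint.h>

/* Q15 FIR dot product; returns 1 if the Q15 result overflows 16 bits. */
static int fir_q15(const int16_t *x, const int16_t *h, int n, int16_t *out) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * (int32_t)h[i];
    acc >>= 15;                                  /* back to Q15 */
    if (acc > 32767 || acc < -32768)
        return 1;                                /* overflow detected */
    *out = (int16_t)acc;
    return 0;
}

/* Halve the input until no overflow occurs; *shift reports how many times. */
static int16_t fir_q15_scaled(const int16_t *x, const int16_t *h, int n,
                              int *shift) {
    int16_t tmp[64];                             /* assumes n <= 64 */
    int16_t out = 0;
    *shift = 0;
    for (int i = 0; i < n; i++) tmp[i] = x[i];
    while (fir_q15(tmp, h, n, &out)) {
        for (int i = 0; i < n; i++) tmp[i] >>= 1; /* scale by 1/2 */
        (*shift)++;
    }
    return out;    /* true output is approximately out << *shift */
}
```

With four taps of 0.5 (a gain of 2) and full-scale input samples, one halving suffices and the overflow disappears.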
CHAPTER 7
Adaptive Filtering
In this chapter, an adaptive FIR filter is used to model the behavior of an Infinite Impulse
Response (IIR) filter. Let us first examine IIR filtering.
where the a_k's and b_k's denote the coefficients. The recursive behavior of the filter is caused by the
feedback provided from the a_k coefficients acting on the previous output terms y[n-k]. This is
called Direct Form I, and the following C code implements it:
bSum = 0;
for( i = 0; i <= N; i++) {
    bSum += x[N-i]*b[i];
}
aSum = 0;
for( i = 1; i <= N; i++ ) {
    aSum += y[i-1]*a[i];
}
y[0] = bSum - aSum;
return y[0];
}
Compared to FIR filters, IIR filters have advantages and disadvantages. IIR filters allow
meeting a desired frequency response characteristic with fewer coefficients than an
equivalent FIR filter, resulting in a lower computation time. FIR filters provide linear phase
response, whereas IIR filters do not. Unlike FIR filters, which have no poles and are thus guaranteed
to be stable, special care must be taken with IIR filters. The stability of IIR filters is heavily dependent
on quantization. Recall the finite word length effect discussed in previous chapters. The amount
of shift in the positions of poles and zeros can be related to the amount of quantization error in
the coefficients. For an Nth-order IIR filter, the sensitivity of the i-th pole p_i with respect to the
k-th coefficient a_k can be derived to be [1]
\frac{\partial p_i}{\partial a_k} = -\frac{p_i^{N-k}}{\prod_{l=1, l \neq i}^{N} (p_i - p_l)}.    (7.2)
This means that the change in the position of a pole is influenced by the positions of all
the other poles. That is the reason an Nth-order IIR filter is normally implemented as a
series of second-order IIR filters: the cascade decouples this dependency among the poles and
keeps the coefficient values real.
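A cascade of second-order sections is straightforward to code. The sketch below is ours, not from the book, and uses Direct Form I biquads with a[0] normalized to 1:

```c
typedef struct {
    double b[3], a[3];   /* numerator and denominator coefficients, a[0] = 1 */
    double x[2], y[2];   /* input and output delay lines */
} Biquad;

/* One Direct Form I step:
   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]. */
static double biquad_step(Biquad *s, double in) {
    double out = s->b[0] * in + s->b[1] * s->x[0] + s->b[2] * s->x[1]
               - s->a[1] * s->y[0] - s->a[2] * s->y[1];
    s->x[1] = s->x[0]; s->x[0] = in;
    s->y[1] = s->y[0]; s->y[0] = out;
    return out;
}

/* Feed the output of each section into the next one. */
static double cascade_step(Biquad *secs, int n, double in) {
    for (int i = 0; i < n; i++)
        in = biquad_step(&secs[i], in);
    return in;
}
```

For example, cascading two identical averaging sections with b = {0.5, 0.5, 0} turns an impulse into 0.25, 0.5, 0.25, the convolution of the single-section response with itself.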
Figure: Adaptive FIR filter modeling an unknown system. The input x[n] drives both the unknown system, whose output is d[n], and the adaptive FIR filter, whose output is y[n]; the error e[n] = d[n] - y[n] drives the coefficient adaptation.
where the h's denote the FIR filter coefficients. The output y[n] converges to d[n]. The rate of
convergence is governed by the step size δ. Small step sizes ensure convergence at the cost of a slow
adaptation rate. Larger step sizes lead to faster adaptation at the cost of overshooting the solution.
Additionally, the order of the FIR filter used limits how accurately the unknown system can be
modeled. A low-order filter will likely not be able to produce an accurate model. Conversely, a
high-order filter might produce an accurate model, but the computational complexity of such a
filter would be prohibitive for real-time operation.
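The LMS loop can be illustrated in a few lines of C. The following sketch is our construction, not the lab code: it identifies a known 2-tap "unknown system" with a 2-tap adaptive filter driven by a deterministic pseudo-random input.

```c
/* Deterministic pseudo-random input in roughly [-1, 1] (linear congruential). */
static unsigned int lcg_state = 1u;
static double urand(void) {
    lcg_state = lcg_state * 1103515245u + 12345u;
    return ((lcg_state >> 16) & 0x7fffu) / 16383.5 - 1.0;
}

/* Adapt h to match the unknown system d[n] = 0.5*x[n] - 0.25*x[n-1]. */
static void lms_identify(double *h0, double *h1) {
    double h[2] = {0.0, 0.0};
    double xprev = 0.0;
    double mu = 0.05;                         /* step size (delta above) */
    for (int n = 0; n < 5000; n++) {
        double x = urand();
        double d = 0.5 * x - 0.25 * xprev;    /* unknown system output */
        double y = h[0] * x + h[1] * xprev;   /* adaptive filter output */
        double e = d - y;                     /* error signal */
        h[0] += mu * e * x;                   /* LMS coefficient update */
        h[1] += mu * e * xprev;
        xprev = x;
    }
    *h0 = h[0];
    *h1 = h[1];
}
```

Because the model order matches and there is no measurement noise, the coefficients converge toward 0.5 and -0.25; shrinking mu slows the convergence, as described above.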
7.3 REFERENCES
[1] J. Proakis and D. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Prentice-Hall, 1996.
L7 LAB 7:
IIR FILTERING AND ADAPTIVE FIR FILTERING
This lab consists of two parts. In the first part, the direct form realization of an IIR filter of order
N is compared to its cascade form realization (i.e., N/2 second-order filters). In the second part,
an adaptive FIR filter is used to model or match the IIR filter.
L7.1 IIR FILTER DESIGN
The unknown system that is used as the target for the adaptive FIR filter is an eighth-order
bandpass IIR filter. The MATLAB function yulewalk is used here to achieve the desired filter design.
Note that the frequency band definitions are decimal numbers ranging from 0 to 1, with 1
representing the Nyquist frequency. The actual sampling rate of the input signal is not needed to generate
the filter coefficients. The MATLAB code below is used to design the filter with the pass-band
from π/3 to 2π/3 radians and 20 dB stop-band attenuation:
The output of the filter can be verified by graphing the filter response in MATLAB as well
as the frequency content of the sampled signal and the filtered signal (shown in Figures L7.1 and
L7.2). The MATLAB code to produce these comparisons was previously shown in Lab 6 (p. 88).
When performing the implementation on a smartphone target, the stop-band and pass-band
frequencies and gains may need to be adjusted to hear an audible difference, as the frequency
response of the filter will scale along with the sampling rate used to record and process audio on
the smartphone.
error = infiniteIR(inSample)-firOutput;
weight = mu*error;
Study the round-off error between the direct form and the second-order cascade form using
MATLAB. Use the MATLAB function tf2sos to convert the transfer function into cascade
form and then apply the filter sections sos as indicated below:

Z = x;
for i=1:size(sos,1)
    Z = filter(sos(i,1:3),sos(i,4:6), Z);
end
Examine the effect of various word lengths on the output and report your observations.
Recall that you can quantize the filter by using the MATLAB fixed-point toolbox function
sfi(). For example, if coeffs denotes double-precision filter coefficients, the expression
ficoeffs = sfi(coeffs,bits,bits-intgr-1) can be used to convert to quantized
values, with bits denoting the wordlength, intgr the number of integer bits, and
bits-intgr-1 the number of fractional bits. Although this issue did not arise during the
exercise involving FIR coefficient quantization, note that the number of integer bits must be
sufficient to accommodate IIR filter coefficients whose magnitude is greater than 1.
First, compare the frequency spectra of the filtered outputs of the direct form filter based on the quantized and unquantized coefficients. Then, compare the frequency spectra of the filtered outputs based on the direct form quantized coefficients and the quantized second-order sections coefficients.
2. Over time, the output of the FIR filter should converge to that of the IIR filter. Confirm
this by comparing the output of the two filters or by examining the decline in the error term.
Experiment with different values of the step size mu and the filter length N and report your observations.
Next, add a delay to the adaptive FIR filtering pipeline to make the real-time processing fail
on purpose. A possible solution to address the real-time processing aspect is to update only a
fraction of the coefficients using the LMS equation during each iteration. For example, you
may update all even coefficients during the first iteration, and then all odd coefficients during
the second iteration. Implement such a coefficient update scheme and report the results in
terms of the tradeoff between convergence rate, convergence accuracy, and processing time.
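As a starting point, the alternating even/odd update described above can be sketched as follows (this is an illustrative fragment, not the lab's code; the stand-in for the unknown system and all names here are hypothetical):

```c
#define N_TAPS 8

/* Hypothetical stand-in for the unknown system: a short FIR.
   x points to the newest sample; x[i] is the sample i steps in the past. */
static float unknown(const float* x) {
    return 0.5f * x[0] + 0.25f * x[1];
}

/* One LMS iteration that updates only the coefficients whose index has
   parity `phase` (0 = even taps, 1 = odd taps), halving the per-iteration
   update cost at the price of slower convergence. Returns the error. */
float lmsPartialStep(float* w, const float* x, float mu, int phase) {
    float y = 0.0f;
    int i;
    for (i = 0; i < N_TAPS; i++)        /* adaptive FIR output */
        y += w[i] * x[i];
    float e = unknown(x) - y;           /* error against the unknown system */
    for (i = phase; i < N_TAPS; i += 2) /* update every other coefficient */
        w[i] += mu * e * x[i];
    return e;
}
```

Alternating phase between 0 and 1 on successive frames implements the even/odd scheme; the error magnitude over time indicates the convergence rate lost relative to a full update.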
CHAPTER 8
Frequency Domain Transforms
Fourier transform pairs for discrete signals:

    X(e^{j\theta}) = \sum_{n=-\infty}^{\infty} x[n] e^{-j\theta n},  where \theta = \omega T_s
                                                                              (8.1)
    x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\theta}) e^{j\theta n} d\theta
These two equations allow the transformation of signals from the time domain to the frequency domain and from the frequency domain back to the time domain.
Fourier series for periodic analog signals:

    X_k = \frac{1}{T} \int_{-T/2}^{T/2} x(t) e^{-j\omega_0 k t} dt
                                                                              (8.2)
    x(t) = \sum_{k=-\infty}^{\infty} X_k e^{j\omega_0 k t}

where T denotes the period and \omega_0 the fundamental frequency.
Discrete Fourier transform (DFT) for periodic discrete signals:

    X[k] = \sum_{n=0}^{N-1} x[n] e^{-j\frac{2\pi}{N} nk},  k = 0, 1, ..., N-1
                                                                              (8.3)
    x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] e^{j\frac{2\pi}{N} nk},  n = 0, 1, ..., N-1
float sumXr[N];
float sumXi[N];
This code takes an input structure containing arrays for the real and imaginary components of a signal segment, along with the number of samples contained in the segment. The DFT is then computed and the input signal is overwritten by the DFT. Notice that this code is computationally inefficient, as it calculates each twiddle factor (wR and wI) using a math library call at every iteration. It is also important to note that the frequency-domain resolution of the DFT may be increased by increasing the size of the transform and zero-padding the input signal. Zero-padding allows representing the signal spectrum with a greater number of frequency bins. In the above code, the size of the transform matches the size of the array provided by the input structure. For example, if the input array is allowed to store four times the length of the original signal, with the remaining three-quarters zero-padded, the resulting transform will contain four times the frequency resolution of the original transform.
//bit reversal
for(i=1;i<(k-1);i++) {
L=m;
while(j>=L) {
j = j-L;
L = L/2;
}
j = j+L;
if(i<j) {
tempReal = data->real[i];
tempImaginary = data->imaginary[i];
data->real[i] = data->real[j];
data->imaginary[i] = data->imaginary[j];
data->real[j] = tempReal;
data->imaginary[j] = tempImaginary;
}
}
L = 0;
m = 1;
n = k/2;
//computation
for(i=k; i>1; i=(i>>1)) {
L = m;
m = 2*m;
o = 0;
Note that, unlike in the DFT code, the twiddle factors (cos and sin) are pre-computed and stored in the cosine and sine lookup tables, respectively. This function takes the same input structure as the DFT function and performs a transformation which overwrites the input signal with the transformed signal.
8.2 LEAKAGE
When computing the DFT, the signal is assumed to be periodic with a period of Ns samples. Figure 8.1 illustrates a sampled sinusoid which is no longer periodic. To make sure that the sampled version remains periodic, the analog frequency should satisfy this condition [1]:

    f_o = \frac{m}{N_s} f_s                                                   (8.5)

where m denotes the number of cycles over which the DFT is computed.
When the periodicity constraint is not met, a phenomenon known as leakage occurs. Figure 8.2 shows the effect of leakage on the FFT computation. In this figure, the FFTs of two sinusoids with frequencies of 250 Hz and 251 Hz are shown. The amplitudes of the sinusoids are unity. Although there is only a 1 Hz difference between the sinusoids, the FFT outcomes are significantly different due to improper sampling. In Figure 8.2a, it can be seen that the signal energy resides primarily in the 250 Hz band. Leakage causes the signal energy to be spread out to the other bands of the transform. This is evident in Figure 8.2b by the diminished peak at 250 Hz and the increased amplitude of the bands to either side of the peak.
8.3 WINDOWING
Leakage can be reduced by applying a windowing function to the incoming signal. In the time
domain, a windowing function is shaped such that when it is applied to a signal, the beginning
and end taper towards zero. One such window is the Hanning window, shown in Figure 8.3.
float* Hanning(int N) {
    // A minimal sketch of the window generation; the book's listing may
    // differ in detail. w[n] = 0.5*(1 - cos(2*pi*n/(N-1))), n = 0..N-1.
    float* window = (float*)malloc(N*sizeof(float));
    int n;
    for(n = 0; n < N; n++) {
        window[n] = 0.5f*(1.0f - cosf(2.0f*M_PI*n/(N - 1)));
    }
    return window;
}
domain resolution of the Fourier transform is increased. In overlap processing, instead of processing signal samples in discrete chunks, samples are buffered and shifted through a time-domain window. Each shift through the buffer retains some of the previous signal information, to which the windowing function is applied as illustrated in Figure 8.4. In this figure, the input signal is x(n) = u(n) - u(n - 221) and a Hanning window is generated by calling Hanning(485) using the C function provided earlier. The frame size, or shift, is considered to be 221 samples, which
Figure 8.4: Fourier transform windowing (from left to right: iteration 1, 2, and 3).
corresponds to the length of the rectangular pulse. This leads to greater resolution in the time domain.
8.5 RECONSTRUCTION
Reconstruction or synthesis is the process by which a time-domain signal is recovered from a frequency-domain signal. It involves performing an inverse Fourier transform, together with overlap-add reconstruction when overlap processing is used.
8.5.1 INVERSE FOURIER TRANSFORM
The inverse Fourier transform is operationally very similar to the forward Fourier transform. From Equation (8.3), one can see that to recover the time-domain signal, the complex conjugate of the twiddle factor W_N can be used while scaling the resulting value by the inverse of the transform size. The code is easily implemented by modifying the code stated earlier for the DFT/FFT. An inverse transform C code is shown below.
float sumXr[N];
float sumXi[N];
8.6 REFERENCES
[1] Texas Instruments, Application Report SPRA291. http://www.ti.com/lit/an/spra291/spra291.pdf
L8 LAB 8:
FREQUENCY DOMAIN TRANSFORMS - DFT AND FFT
In this lab, the C implementations of the Discrete Fourier Transform (DFT) and the Fast Fourier Transform (FFT) are considered.
In the previous filtering labs, although audio data samples were passed in frames (in order to accommodate the requirements of the audio APIs), the actual filtering operation was done on a sample-by-sample basis using linear convolution. However, in performing the DFT (or FFT), the transform requires access to a window of audio data samples which may or may not contain more than one frame of data. This is referred to as frame processing. In frame processing, N samples
need to be captured first and then operations are applied to all N samples with the computation
time measured in terms of the duration of a frame.
The application shell for this lab resembles that of the previous labs. The code basically follows the same initialization, computation, and finalization methods covered in the previous labs. The DFT and FFT implementations, given in Chapter 8, appear in the file Transforms.c. In addition, an audio spectrogram application is provided for the Android and iOS targets by taking into consideration the concepts discussed in the chapter. This app features a graphical display of the frequency spectrum and allows the adjustment of the transform parameters (see Figures L8.1 through L8.3).
Figure L8.2: Android spectrogram app main screen and settings menu.
Find the output of the system y(n) for an input audio signal x(n) via the overlap-and-add convolution method. Record the processing time for the case with 256-sample frames as input. It helps to use the MATLAB fdesign tool to generate the filter coefficients for this system.
3. Frequency Domain Filtering – Solve the previous bandpass filter system in the frequency domain by using two forward and one inverse FFT, based on Y(k) = H(k)X(k) (the convolution property). For the frequency-domain case, consider the results when using 512-point FFTs and 256-sample frames for the following two cases:
CHAPTER 9
Code Optimization
In this chapter, code optimization techniques which often have a major impact on the computational efficiency of C code are covered. These techniques include compiler optimizations, efficient C code writing, and architecture-specific instructions of the ARM processor. For a better understanding of these techniques, they are illustrated through the signal processing example app of linear convolution. In general, to write efficient C code, it helps to know how the processor executes the code as it is written.
The subsections that follow are:
• Code Timing
• Linear Convolution
• Compiler Options
• Coding Techniques
• Architecture Specific Instructions
The variables will show the total execution time of any code placed where the comment section is indicated. More information on timing functionality may be found in the documentation for the relevant C headers.
newFIR->numCoefficients = numCoefficients;
newFIR->frameSize = frameSize;
newFIR->coefficients =
(float*)malloc(numCoefficients*sizeof(float));
newFIR->window = (float*)calloc(numCoefficients + frameSize,
sizeof(float));
newFIR->result = (float*)malloc(frameSize*sizeof(float));
int i;
for(i=0;i<numCoefficients;i++) {
newFIR->coefficients[(numCoefficients - 1) - i] =
(float)coefficients[i];
}
return newFIR;
}
ndk {
moduleName "yourLibrary"
abiFilter "armeabi"
ldLibs "log"
cFlags "-O3"
}
When using Xcode, all options for C code libraries can be set within the Build Settings
of the app by changing the Optimization Level under the Apple LLVM 6.1—Code Generation
section.
The array window is stored in heap memory using the previously defined FIRFilter structure, as these values need to be retained between calls to the compute method. Memory allocation is time consuming, and repeated allocations should be avoided if possible.
Another way to improve code performance is to reduce the logic necessary for the loop to
operate. Although the above two loops may appear fine, it still takes extra operations to compute
the array index and thus the memory address of the desired value. A method involving pointer
manipulation can be used as shown in the following code block:
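A pointer-based inner loop along the lines described can be sketched as follows (a minimal sketch: the structure mirrors the FIRFilter fields shown earlier, with coefficients stored in time-reversed order; the function name is illustrative):

```c
#include <stdlib.h>

/* Hypothetical mirror of the FIRFilter structure used earlier. */
typedef struct {
    int numCoefficients;
    int frameSize;
    float* coefficients;   /* stored in time-reversed order */
    float* window;         /* numCoefficients + frameSize state samples */
    float* result;
} FIRFilter;

/* Pointer-based compute method: the inner dot product advances two
   pointers instead of recomputing an array index (and thus a memory
   address) on every iteration. */
void computeFIR(FIRFilter* fir, const float* input, float* output) {
    int i, j;
    /* place the new frame after the retained history samples */
    float* frame = fir->window + fir->numCoefficients - 1;
    for (i = 0; i < fir->frameSize; i++)
        frame[i] = input[i];
    for (i = 0; i < fir->frameSize; i++) {
        const float* x = fir->window + i;    /* oldest sample needed */
        const float* c = fir->coefficients;  /* reversed coefficients */
        float sum = 0.0f;
        for (j = 0; j < fir->numCoefficients; j++)
            sum += (*x++) * (*c++);          /* pointer-walked dot product */
        output[i] = sum;
    }
    /* retain the newest numCoefficients-1 samples for the next frame */
    for (i = 0; i < fir->numCoefficients - 1; i++)
        fir->window[i] = fir->window[fir->frameSize + i];
}
```

Because both pointers simply advance by one element per iteration, the compiler can keep them in registers and avoid the index-to-address arithmetic of the bracketed form.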
ndk {
moduleName "yourLibrary"
abiFilter "armeabi armeabi-v7a mips x86"
ldLibs "log"
cFlags "-O3"
}
This generates native libraries compiled specifically for the targets listed in the abiFilter directive. For each targeted ABI, the compiler generates a native library which gets included with the application. By default, all available ABIs will be built by the build system. When the application is installed, the corresponding library also gets installed.
Architecture-specific optimizations may be included by setting flags and enabling code
sections at compile time. Flags can be set in the build.gradle file by checking the target using the
productFlavors directive as follows:
productFlavors {
armv7 {
ndk {
abiFilter "armeabi-v7a"
cFlags "-mfloat-abi=softfp -mfpu=neon -march=armv7-a
-DMETHOD=1"
}
}
}
Code sections can then be enabled or disabled depending on the flags set when the library is compiled. This allows architecture-specific optimizations to be included in one main set of source files. As noted below, the METHOD flag is defined to enable NEON code blocks:
#if METHOD == 1
/* Normal code */
#elif METHOD == 2
/* NEON code */
#endif
Note that this is only one case; the Gradle build system allows compilation of completely separate source sets for different architectures. This eliminates the need for using compiler flags for source code selection when building for different architectures. In addition, separate compilation flags may be set for each product flavor, allowing one to fine-tune to a specific architecture. It is to be emphasized that this discussion of the Gradle build system may change due to the relatively recent release of Android Studio as well as the continued development effort by Google on the Android Studio IDE.
The overall result is the same as in the previous code versions, but now the linear convolution result is computed with vectors containing four elements each.
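The four-element vector idea can be sketched with NEON intrinsics as follows (a sketch only, assuming an ARMv7 target built with -mfpu=neon; the function name is illustrative and n is assumed to be a multiple of 4):

```c
#include <arm_neon.h>

/* Vector form of the FIR dot product: one fused multiply-accumulate
   processes four float products per loop iteration. */
float dotProductNeon(const float* x, const float* h, int n) {
    float32x4_t acc = vdupq_n_f32(0.0f);
    int i;
    for (i = 0; i < n; i += 4)
        acc = vmlaq_f32(acc, vld1q_f32(x + i), vld1q_f32(h + i));
    /* horizontal sum of the four accumulator lanes */
    float32x2_t s = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    s = vpadd_f32(s, s);
    return vget_lane_f32(s, 0);
}
```

Guarding this function with the METHOD flag shown earlier keeps the scalar and NEON variants in one source file.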
9.6 REFERENCES
[1] A. Sloss, D. Symes, and C. Wright, ARM System Developer’s Guide, Morgan Kaufmann
Publishers, 2004. 128
L9 LAB 9:
CODE OPTIMIZATION
The purpose of this lab is to experiment with the optimization steps discussed above. These steps include changing compiler settings, writing efficient code constructs, and using architecture-specific instructions for the ARM processor. The FIR filtering (linear convolution) example is considered as a model case to show the effects of these steps on the real-time throughput.
Consider a lowpass filter whose passband covers the human vocal frequency range. The specification used to generate the filter in MATLAB is shown below:
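One plausible form of such a specification, using the fdesign tool mentioned in the earlier lab (the cutoff frequencies, ripple values, and sampling rate below are assumptions, not the book's exact numbers), is:

```matlab
% Lowpass FIR covering the vocal range (values are illustrative).
Fs = 48000;                                         % sampling rate, Hz
d  = fdesign.lowpass('Fp,Fst,Ap,Ast', 4000, 4500, 1, 60, Fs);
h  = design(d, 'equiripple');
B  = h.Numerator;    % coefficients to copy into the C implementation
```

The resulting coefficient array B is what the C FIR filter uses for the timing experiments.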
CHAPTER 10
Implementation via
Simulink/MATLAB
This chapter presents the steps one needs to take in order to run a signal processing algorithm designed in Simulink or MATLAB on the ARM processor of smartphones, which was first discussed in [1]. The steps are conveyed by transitioning the linear convolution filtering algorithm as a Simulink model to a smartphone. This chapter also shows how MATLAB script can be embedded into Simulink models for smartphone deployment. Considering that Simulink programming is widely used in signal processing, the approach presented in this chapter is beneficial due to the ease with which a signal processing algorithm may be adapted to run on smartphones.
In addition to the most recent available versions of MATLAB and Simulink from Math-
Works, the appropriate support package for the smartphone platform is required. For Android
targets the package “Simulink Support Package for Samsung GALAXY Android Devices” is
required to be installed, and for iOS the package “Simulink Support Package for Apple iOS De-
vices.” More details on these packages are available at the MathWorks websites:
http://www.mathworks.com/help/supportpkg/android/index.html
http://www.mathworks.com/hardware-support/ios-device-simulink.html
These links provide setup instructions as well as explanations for the Simulink blocks that provide access to the sensors and outputs of smartphones.
Connections between the blocks must be added following the highlighted lines, with the
final model layout appearing as shown below in Figure 10.2:
Parameters for the blocks may then be changed by double-clicking on the block to be configured. Set the Frame Size of the Audio Capture block to 1024 samples and the sampling rate to 16 kHz. Next, open the configuration for the Discrete FIR Filter block and go to the Data Types tab. Set the Output word length to "Inherit: Same as Input" and check the "Saturate on integer overflow" option. In the Main tab, the filter coefficients can be set from a pre-computed transfer function in array form. The coefficients are shown below together with the Data Types screen in Figure 10.3.
The model is now ready to be deployed to a smartphone. Set the simulation stop time to inf. Now navigate to the Tools > Run on Target Hardware menu and select the Prepare to Run... option. From the window that opens, select the smartphone target and click OK to save the setting. From the main model view, it is now possible to click the Deploy to Hardware button, and the model should automatically be built and installed on a connected smartphone. When the model is installed, it will immediately start running and processing data. In the next subsection, the MATLAB code block approach for deployment on smartphones is presented.
10.2 MATLAB CODE BLOCKS
MATLAB code blocks allow MATLAB functions to be incorporated into a Simulink model.
MATLAB code blocks can be added to a Simulink model by dragging the MATLAB Function
block from the Simulink Library Browser. e block is located under the Simulink > User-
Defined Functions category.
Continuing with the above Simulink example, the same filter is implemented here as a
MATLAB function and incorporated into the Simulink model. Start by replacing the Discrete
FIR Filter block from the previous model with a MATLAB Function block. Double click the
block and the MATLAB editor will open showing the function code of the block. The default code simply passes the input through to the output. Inputs to the function block retain the data type of the previous Simulink block, and outputs must match the data type of the subsequent Simulink block. The MATLAB code for the FIR filtering function is shown below:
function y = fcn(u)
B = [0.0311877009838756 -0.0146995512360664 -0.0819729557864041
0.00273929607168683 0.186691140568207 0.378547772724477
0.378547772724477 0.186691140568207 0.00273929607168683
-0.0819729557864041 -0.0146995512360664 0.0311877009838756];
persistent buffer;
if isempty(buffer)
buffer = zeros(2, size(u, 1) + size(B, 2));
end
This function uses the same filter coefficients specified previously. In order to properly perform the filtering operation, a buffer of previous input samples is maintained in a persistent array. In this case, one call to the MATLAB block equates to one frame of audio data due to the Audio Capture block output. On the initial call to the function, the buffer is initialized to zero with a data type of double-precision floats. The filter coefficients stored in the B array are also double-precision. After initialization, the buffer is shifted and the input is converted to double-precision, transposed, and stored in the upper portion of the buffer. The transpose operation is necessary because the MATLAB function filter(B, A, X) operates using rows as channels, whereas the input has channels defined as columns. Once the data are stored in the buffer, the samples are filtered and the result is converted back to int16, which is the data type expected by the Audio Playback block. Finally, the output of the filter function is truncated to retain the newest filtered samples, and the output is transposed and passed along to the MATLAB function output variable.
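The steps just described can be sketched as a completion of the function body shown earlier (a sketch; the book's exact listing may differ in detail):

```matlab
function y = fcn(u)
B = [0.0311877009838756 -0.0146995512360664 -0.0819729557864041 ...
     0.00273929607168683 0.186691140568207 0.378547772724477 ...
     0.378547772724477 0.186691140568207 0.00273929607168683 ...
     -0.0819729557864041 -0.0146995512360664 0.0311877009838756];
persistent buffer;
if isempty(buffer)
    buffer = zeros(2, size(u, 1) + size(B, 2));
end
% shift the buffer and store the transposed input in its upper portion
buffer = [buffer(:, size(u,1)+1:end), double(u')];
% filter along rows (channels are rows after the transpose)
filtered = filter(B, 1, buffer, [], 2);
% keep the newest filtered samples, convert to int16, transpose back
y = int16(filtered(:, end-size(u,1)+1:end))';
```

Because the buffer retains size(B,2) past samples ahead of the current frame, the newest frameSize outputs are unaffected by the zero initial conditions of each filter call.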
Although many of the built-in MATLAB functions will work with the Simulink coder, not all are supported. In cases where the functions do not work, they need to be implemented manually by the programmer. In general, MATLAB Function blocks allow a wide range of signal processing functionality. The most important consideration when using these blocks is awareness of input and output data types. Persistent variable storage can be accomplished by declaring a persistent variable and performing a one-time initialization. After this declaration and initialization, any data may be retained between calls to a block. With these techniques, practically any signal processing algorithm, whether in the form of MATLAB code or a Simulink model, can be compiled to run on Android or iOS smartphones.
10.3 REFERENCES
[1] R. Pourreza-Shahri, S. Parris, F. Saki, I. Panahi, and N. Kehtarnavaz, "From Simulink to Smartphone: Signal Processing Application Examples," Proceedings of IEEE ICASSP Conference, Australia, April 2015.
Authors’ Biographies
NASSER KEHTARNAVAZ
Nasser Kehtarnavaz is Professor of Electrical Engineering at University of Texas at Dallas. His
research areas include digital signal and image processing, real-time processing on embedded
processors, pattern recognition, and biomedical image analysis. He has published more than 300
articles in these areas and 8 other books pertaining to signal and image processing. He regularly
teaches applied digital signal processing courses, for which this book is intended. Dr. Kehtarnavaz
is a Fellow of IEEE, a Fellow of SPIE, and a licensed Professional Engineer. Among his many
professional activities, he is serving as Editor-in-Chief of Journal of Real-Time Image Processing.
SHANE PARRIS
Shane Parris received his BS degree in Electrical Engineering from University of Texas at Dallas
in 2013. He is currently pursuing his MS degree in Electrical Engineering at the University of
Texas at Dallas. His research interests include signal and image processing, and real-time implementation of signal and image processing algorithms.
ABHISHEK SEHGAL
Abhishek Sehgal received his BE degree in Instrumentation Technology from Visvesvaraya
Technological University in India in 2012. He is currently pursuing his MS degree in Electrical Engineering at the University of Texas at Dallas. His research interests include signal and image processing, and real-time implementation of signal and image processing algorithms.