Voice Over IP Security
Patrick Park
Cisco Press
800 East 96th Street
Indianapolis, Indiana 46240 USA
TK5105.8865.P37 2008
004.69'5--dc22
2008036070
ISBN-13: 978-1-58705-469-3
ISBN-10: 1-58705-469-8
Trademark Acknowledgments
All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized.
Cisco Press or Cisco Systems, Inc., cannot attest to the accuracy of this information. Use of a term in this book
should not be regarded as affecting the validity of any trademark or service mark.
Feedback Information
At Cisco Press, our goal is to create in-depth technical books of the highest quality and value. Each book is crafted
with care and precision, undergoing rigorous development that involves the unique expertise of members from the
professional technical community.
Readers’ feedback is a natural continuation of this process. If you have any comments regarding how we could
improve the quality of this book, or otherwise alter it to better suit your needs, you can contact us through email at
[email protected]. Please make sure to include the book title and ISBN in your message.
We greatly appreciate your assistance.
Publisher Paul Boger
Associate Publisher Dave Dusthimer
Cisco Press Program Manager Jeff Brady
Executive Editor Brett Bartow
Managing Editor Patrick Kanouse
Development Editor Dan Young
Project Editor Seth Kerney
Copy Editor Margaret Berson
Technical Editors Bob Bell
Dan Wing
Editorial Assistant Vanessa Evans
Designer Louisa Adair
Composition Octal Publishing, Inc.
Indexer WordWise Publishing Services LLC
Proofreader Water Crest Publishing, Inc.
Dedication
This book is dedicated to our God who lifted me up for this opportunity, my wonderful wife, Sun, and my children, Janice
and Jayden. Thank you all for making me complete.
Acknowledgments
I’d like to give special recognition to Dan Young and Andrew Cupp for providing their expert technical knowledge
in editing the book and working hard to keep the book on time.
A big “thank you” goes out to Dan Wing and Bob Bell for giving great comments during the review process and
helping me complete this book.
Thanks to Allan Konar, Yoon Son, and Mo Kang for contributing their technical expertise, which helped me find the
right direction in the initial writing of this book.
Last but not least, I’d like to thank my current manager, Shamim Pirzada, who mentors me and encourages me to
spend extra time for personal development. Also, thanks to my colleagues, the Photon team, who gave great inspiration
and technical information.
Contents at a Glance
Introduction
Part I VoIP Security Fundamentals
Chapter 1 Working with VoIP
Chapter 2 VoIP Threat Taxonomy
Chapter 3 Security Profiles in VoIP Protocols
Chapter 4 Cryptography
Chapter 5 VoIP Network Elements
Part II VoIP Security Best Practices
Chapter 6 Analysis and Simulation of Current Threats
Chapter 7 Protection with VoIP Protocol
Chapter 8 Protection with Session Border Controller
Chapter 9 Protection with Enterprise Network Devices
Part III Lawful Interception (CALEA)
Chapter 10 Lawful Interception Fundamentals
Chapter 11 Lawful Interception Implementation
Index
Contents
Introduction
Part I VoIP Security Fundamentals
Chapter 1 Working with VoIP
VoIP Benefits
VoIP Disadvantages
Sources of Vulnerability
IP-Based Network Infrastructure
Open or Public Networks
Open VoIP Protocol
Exposed Interface
Real-Time Communications
Mobility
Lack of Security Features and Devices
Voice and Data Integration
Vulnerable Components
Myths Versus Reality
Legacy Versus VoIP Systems
Protecting Networks Using Strict Authentication and Encryption
Protecting Networks Using a Data Security Infrastructure
Summary
End Notes
References
Chapter 2 VoIP Threat Taxonomy
Threats Against Availability
Call Flooding
Malformed Messages (Protocol Fuzzing)
Spoofed Messages
Call Teardown
Toll Fraud
Call Hijacking
Registration Hijacking
Media Session Hijacking
Server Impersonating
QoS Abuse
Security Profiles
Digest Authentication
Identity Authentication
Secure/Multipurpose Internet Mail Extensions (S/MIME)
Secure RTP
TLS
IPSec
MGCP
Overview
Basic Call Flow
Security Profiles
Summary
End Notes
References
Chapter 4 Cryptography
Symmetric (Private) Key Cryptography
DES
3DES
AES
SubBytes
ShiftRows
MixColumns
AddRoundKey
Asymmetric (Public) Key Cryptography
RSA
Digital Signature
Hashing
Hash Function (MD5)
SHA
Message Authentication Code
MAC Versus Digital Signature
Key Management
Key Distribution
Summary
End Notes
References
Summary
End Notes
References
Chapter 8 Protection with Session Border Controller
Border Issues
Between Access and Core Networks
Between Core and Peer Networks
Access and Peer SBCs
SBC Functionality
Network Topology Hiding
Example of Topology Hiding
DoS Protection
Policy-Driven Access Control
Hardware Architecture
Overload Prevention
Registration Timer Control
Ping Control
Load Balancing
NAT Traversal
Lawful Interception
Other Functions
Protocol Conversion
Transcoding
Number Translation
QoS Marking
Service Architecture Design
High Availability
Active-Standby
Active-Active
Network Connectivity
Service Policy Analysis
Virtualization
Optimization of Traffic Flow
Deployment Location
Media Control
Summary
End Notes
References
[Icon legend: Router, Switch, PC, Server, Certificate Authority, IAD, ATA, Voice Gateway, SIP Server, CallManager, IP Phone, Phone, Signaling Controller, Network Cloud, Laptop, NAT, Hub, Fax, Router with Firewall]
Introduction
Voice over Internet Protocol (VoIP) has been popular in the telecommunications world since its emergence
in the late 90s, as a new technology transporting multimedia over the IP network. In this book, the
multimedia (or rich media) includes not only voice, but also video, instant message, presence data, and
fax data over the IP network.
Today people commonly make phone calls with IP phones or client software (such as Skype or iChat)
on their computer, or send instant messages to their friends. This gives them convenience and cost
savings. Many telecommunications companies and other organizations have been switching their legacy
phone infrastructure to a VoIP network, which reduces costs for lines, equipment, manpower, and
maintenance.
However, the benefits of VoIP are not free. There are disadvantages to using VoIP. The integrated rich
media makes it difficult to design the network architecture. Multiple VoIP protocols and different methods
of implementation create serious interoperability issues. Integration with existing data networks creates
quality of service issues. The fact that so many network elements are involved through open (or public)
networks creates serious security issues, because each element and network has vulnerable factors.
The security issues especially are becoming more serious because traditional security devices (such as
firewalls) and protocols (such as encryption) cannot protect VoIP services or networks from recent intelligent
threats.
This book focuses on the important topic of VoIP security by analyzing current and potential threats and
demonstrating the methods of prevention.
NOTE This chapter approaches the topics at a high level. The technical details are described in
Part II, “VoIP Security Best Practices.”
Like every technology, VoIP has many benefits and disadvantages. The following section
describes the benefits of VoIP.
6 Chapter 1: Working with VoIP
[Figure: network diagram. Labels: Service Provider Network, PSTN, Media Gateway, IP, VoIP Servers, Consumer Network, Enterprise Network, DSL Router, CallManager, ATA, Voice, Video, Fax, Presence, IM.]
VoIP Benefits
The reason for the prevalence of VoIP is that it gives significant benefits compared to
legacy phone systems. The key benefits are as follows:
• Cost savings—The most attractive feature of VoIP is its cost-saving potential. When
we move away from public switched telephone networks, long-distance phone calls
become inexpensive. Instead of being processed across conventional commercial
telecommunications line configurations, voice traffic travels on the Internet or over
private data network lines.
For the enterprise, VoIP reduces cost for equipment, lines, manpower, and maintenance.
All of an organization’s voice and data traffic is integrated into one physical network,
bypassing the need for separate PBX tie lines. Although there is a significant initial
setup cost, significant net savings can result from managing only one network and not
needing to sustain a legacy telephony system in an increasingly digital and data-centered
world. Also, the network administrators’ burden may be lessened because they can now
focus on a single network. There is no longer a need for one team to manage a
data network and another to manage a voice network.
For consumers, VoIP reduces subscription and usage charges, especially for
long-distance and international calls.
• Rich media service—The legacy phone system mainly provides voice and fax
service even though limited video service is possible. However, the demand of users
is much higher than that, as shown in today’s rich media communications through the
Internet. People check out friends’ presence (such as online, offline, busy), send
instant messages, make voice or video calls, transfer images, and so on. VoIP
technology makes rich media service possible, integrating with other protocols
and applications.
Rich media service not only provides multiple options of media to users, but also
creates new markets in the communications industry, such as VoIP service in mobile
phones.
• Phone portability—The legacy phone system assigns a phone number with a
dedicated line, so you generally cannot move your home phone to another place if you
want to use the same phone number. It is a common hassle to call the phone company
and ask for a phone number update when moving to a new house. However, VoIP
provides number mobility: The phone device can use the same number virtually
everywhere as long as it has proper IP connectivity. Many businesspeople today bring
their IP phones or softphones when traveling, and use the same numbers everywhere.
• Service mobility—The context of mobility here includes service mobility as well.
Wherever the phone goes, the same services could be available, such as call features,
voicemail access, call logs, security features, service policy, and so on.
• Integration and collaboration with other applications—VoIP protocols (such as
Session Initiation Protocol [SIP], H.323) run on the application layer and are able to
integrate or collaborate with other applications such as email, web browser, instant
messenger, social-networking applications, and so on. The integration and collaboration
create synergy and provide valuable services to the users. Typical examples are
voicemail delivery via email, click-to-call service on a website, voice call button on
an email, presence information on a contact list, and so on.
• User control interface—Most VoIP service providers provide a user control
interface, typically a web GUI, to their customers so that they can change features,
options, and services dynamically. For example, the users log in to the web GUI and
change call forwarding number, speed dial, presence information (online, offline),
black/white list, music-on-hold option, anonymous call block, and so on.
• No geographical boundary—The VoIP service area becomes virtualized without
geographical limit. That is, the area code or country code is no longer bound to a
specific location. For example, you could live in South Korea but subscribe to a U.S.
phone number, which makes it possible that all calls to the U.S. become domestic calls
(cheaper) even though you live in South Korea.
• Rich features—VoIP provides rich features like click-to-call on a web page, Find-
Me-Follow-Me (FMFM), selective call forwarding, personalized ring tones (or
ringback tone), simultaneous rings on multiple phones, selective area or country code,
and so on.
Now that you are aware of many of the benefits, the next section takes a look at several
disadvantages.
VoIP Disadvantages
The benefits of VoIP do not come free of charge. There are significant disadvantages for
using VoIP, as follows:
• Complicated service and network architecture—Integrated rich media services
(such as voice, video, IM, presence, and fax) make it difficult to design the service and
network architecture because many different types of devices for each service are
involved, as well as different protocols and characteristics of each media. Rich
features (such as click-to-call and FMFM) also make the architecture more complicated
because many different applications (such as web and email) and platforms are
involved. This complication requires extra time and resources when designing,
testing, and deploying. It also causes various errors and makes it harder to
troubleshoot and isolate them.
• Interoperability issues between different protocols, applications, or products—
There are multiple VoIP protocols (such as SIP, H.323, Media Gateway Control
Protocol [MGCP], and Skinny), and product vendors choose whichever they
like when developing products, which means there are always interoperability issues
between the products that use different protocols. Even between the products using
the same protocol, interoperability issues still come up because of different ways of
implementation, different versions (extensions), or different feature sets. Therefore, it
is common for VoIP service providers to spend a significant amount of time and
resources for testing interoperability and resolving the issues.
• Quality of service (QoS) issues—Voice and video streams flow over an IP network
as real-time packets, passing through multiple networks and devices (such as switches,
routers, firewalls, and media gateways). Therefore, ensuring QoS is very difficult and
costs lots of time and resources to meet the user’s expectations. The main factors in
QoS are packet loss, delay (latency), and jitter (packet delay variation).
In a comparison of VoIP QoS versus traditional circuit-switched networks, Sinden (End Note 2)
reported data from a Telecommunications Industry Association (TIA) study that
showed even a fairly small percentage of lost packets could push VoIP network QoS
below the level users have come to expect on their traditional phone lines. Each coder-
decoder (codec) the TIA studied experienced a steep downturn in user satisfaction
when latency crossed the 150-ms point. However, even with less than 150 ms of
latency, a packet loss of 5 percent caused VoIP traffic encoded with G.711 (an
international standard for encoding telephone audio on a 64-kbps stream) to drop
below the QoS levels of the PSTN, even with a packet loss concealment scheme.
Similarly, losses of 1 and 2 percent, respectively, were enough to place quality in VoIP
networks encoded with G.723.1 (for very low bit-rate speech compression) and
G.729A (for voice compression on an 8-kbps stream) below this threshold. At losses
of 3 and 4 percent, respectively, the performance of these networks resulted in a
majority of dissatisfied users.
• Power outages—Legacy home phones continue to work even during a power outage
because the phone line supplies 48 volts constantly. However, VoIP phones use regular
data network lines that do not provide power in most cases, which means you cannot
use VoIP phones during power outages. Of course, there are inline power solutions
(such as Power over Ethernet), but these are mainly for enterprise environments.
• Emergency calls—Unlike legacy phone connections, which are tied to a physical
location, VoIP allows phone portability as described in the previous section, which is
convenient for users. However, the flexibility complicates the provision of emergency
services like an E-911 call, which provides the caller’s location to the 911 dispatch
office based on the caller ID (phone number). Especially for users using softphones
on their mobile computers, E-911 service is almost impossible unless the users notify
the service provider of their physical location every time they move. Although most
VoIP vendors have workable solutions for E-911 service, government regulators and
vendors are still working out standards and procedures for 911 services in VoIP
environments.
• Security issues—In a legacy phone system, the main security issue is the interception
of conversations, which requires physical access to phone lines or compromise of the office
PBX. In VoIP, which is based on open or public networks, the security issues go well beyond
that. Between a caller and callee, many elements (such as IP phones, access devices,
media gateways, proxy servers, and protocols) are involved in setting up the call and
transferring the media. Each element has vulnerable factors that are targets for
attackers. The next few sections provide examples.
• Legal issues (lawful interception)—Legal wiretapping in VoIP, also called lawful
interception (LI), is much more complicated than that in legacy phone systems,
because of the complexity of VoIP service architecture. For the details, refer to
Chapter 10, “Lawful Interception Fundamentals.”
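The QoS figures quoted in the list above can be turned into a rough check. The following is a minimal sketch, not a measurement tool: the per-codec loss levels and the 150-ms latency point come from the TIA data cited above, while the function names and the simple "either limit reached" rule are assumptions made for illustration.

```python
# Loss levels (percent) at which the cited TIA data placed each codec below
# PSTN quality, and the latency point where user satisfaction dropped steeply.
PSTN_LOSS_LIMIT_PCT = {"G.711": 5.0, "G.723.1": 1.0, "G.729A": 2.0}
LATENCY_LIMIT_MS = 150.0


def loss_percent(sent: int, received: int) -> float:
    """Return packet loss as a percentage of the packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def below_pstn_quality(codec: str, loss_pct: float, latency_ms: float) -> bool:
    """Return True if the call is expected to fall below PSTN-level quality.

    Hypothetical rule of thumb: latency above 150 ms, or loss at or above
    the per-codec level quoted in the text, counts as below PSTN quality.
    """
    return latency_ms > LATENCY_LIMIT_MS or loss_pct >= PSTN_LOSS_LIMIT_PCT[codec]
```

For example, a G.711 call that loses 50 of 1,000 packets sits at 5 percent loss and is flagged even when latency is well under 150 ms.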
Among these disadvantages, the security issues are becoming more serious because
traditional security devices (such as firewalls and Intrusion-Detection Systems) and
protocols (such as encryption) cannot protect VoIP services or networks from recent
intelligent threats.
The following sections look into the vulnerability from the following aspects:
• What are the sources of vulnerability?
• What are the vulnerable components?
• What do people misunderstand about the vulnerability?
Sources of Vulnerability
VoIP has two types of vulnerability. One is the inherited vulnerability coming from an
existing infrastructure such as the network, operating system, or web server that VoIP
applications are running on. The other is its own vulnerability coming from VoIP protocols
and devices, such as IP phone, voice gateway, media server, signaling controller, and so on.
Basically, these vulnerabilities are derived from the characteristics of VoIP that are shown
in Figure 1-2.
Exposed Interface
A client/server model is the basic architecture of VoIP service. Generally, servers are
located in a protected network (the enterprise’s or the service provider’s), but the interfaces
receiving call requests are open to clients that are located in an open or public network. It
is possible for attackers to scan random IPs/ports and find the exposed interfaces for
sending malicious traffic, such as Denial of Service (DoS), toll fraud, and so on.
Real-Time Communications
Unlike regular data services such as email, VoIP services work with real-time media traffic that
is very sensitive to packet delay, loss, and jitter (packet delay variation). Even minor
packet delay or jitter could be recognized by users and impact the overall QoS. Packet loss
also can impact the QoS because VoIP uses User Datagram Protocol (UDP) packets in most
cases, and there is no retransmission mechanism.
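To make "jitter" concrete, the sketch below computes a running jitter estimate from packet send and receive times. It uses the smoothing formula that RTP defines in RFC 3550, which this section does not itself mention; the function name and the millisecond timestamps are assumptions for illustration.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate in the style of RFC 3550.

    For each consecutive packet pair, D is the change in transit time:
    (receive spacing) - (send spacing). The estimate moves 1/16 of the
    way toward |D| on every packet, which smooths out single spikes.
    """
    if len(send_times) != len(recv_times):
        raise ValueError("send_times and recv_times must have the same length")
    jitter = 0.0
    for i in range(1, len(send_times)):
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter
```

If every packet experiences the same delay, D is zero for each pair and the estimate stays at zero; a single packet arriving 5 ms late is enough to make it nonzero.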
Mobility
A legacy phone system assigns a dedicated line to a certain phone number and does not
provide the users with mobility. It typically requires physical access for an attacker to spoof
the identity (the telephone number or line). However, generally, VoIP allows endpoints to
be virtually everywhere as long as they have proper IP connectivity, which complicates
protection against identity spoofing.
Vulnerable Components
All components involved in VoIP service have vulnerable elements that are affected directly
or indirectly. The following are VoIP’s main components and their vulnerability.
• Operating system of the VoIP application—VoIP applications run on many
different types of operating systems such as Linux/Unix, Microsoft Windows, or real-
time operating system (RTOS), and are affected by the vulnerabilities inherent in
those operating systems and network code implementations (for example, IP and
TCP). The frequent security patches issued for these operating systems show that new
vulnerabilities continue to be found.
• VoIP application—There are many different types of VoIP applications; for example,
softphones (Skype, Google Talk), instant messengers (AOL AIM and MSN
Messenger), call managers, softswitches, and so on. The application itself may have
security issues because of bugs or errors, which could make VoIP service insecure.
• Management interface—For management purposes, most VoIP devices have service
interfaces such as Simple Network Management Protocol (SNMP), Secure Shell
(SSH), Telnet, and HTTP. The interfaces could be the source of vulnerability,
especially when being configured carelessly. For example, if a VoIP device uses a
“public” community name in SNMP, an attacker can get valuable information (for
example, configuration) by using SNMP queries. If a VoIP device uses the default ID/
password for its management interface, it is easy for an attacker to break in.
• TFTP Server—Many VoIP devices, especially customer premise equipment (CPE),
download their configurations from a TFTP server. An attacker could sniff the packets
and gather the server information. Or, an attacker could impersonate a TFTP server by
spoofing the connection, and then distribute a malicious configuration to the CPE.
• Web client/server—Many VoIP applications are embedded into a web client (that is,
a browser) to provide web services (for example, click-to-dial service, corporate
directory lookup, and timecard services). These services inherit the vulnerability of
web client/servers, such as malicious code or worms.
• Access device (switch, router)—All VoIP traffic flows through access devices
(Layer 2 and 3 switch or router) that are in charge of switching or routing. Compromised
access devices could create serious security issues because they have full control
of packets. Even a minor misconfiguration could be a potential security hole. For
example, an attacker compromises a Layer 2 switch and sets up a monitoring port for
a particular voice VLAN. The attacker can capture all VoIP signals and media through
the monitoring port without any impact on end users. Another example is that a
misconfigured Layer 3 router could create an unnecessary broadcast domain
where a potential attacker could sniff broadcast messages that are used for further
attacks.
• Network—The network itself can be the vulnerable component because of uncontrolled
traffic, whether malicious or not. For example, flooded traffic from certain
endpoints not only threatens the target server, but also exhausts network bandwidth so
that other legitimate traffic cannot go through. The flooded traffic could come from
either malicious sources as a part of a Denial-of-Service (DoS) attack, or legitimate devices
that are misconfigured or have bugs.
• VoIP protocol stack—Security was not a major consideration when most VoIP
protocols (for example, SIP and H.323) were designed. For example, the initial version
of SIP (RFC 2543) allows clear-text–based credentials; that is, anyone can see the
password as long as they can sniff the packets. The latest version of SIP (RFC 3261)
supports the digest format of password (that is, hashed password), but it is still
vulnerable to brute-force or dictionary attack. Quite a large number of current threats
abuse this kind of security weakness on the protocol. Therefore, these protocols
recommend combining with other security protocols (for example, Transport Layer
Security [TLS], Secure/Multipurpose Internet Mail Extensions [S/MIME]) when
implementing them.
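The weakness of digest credentials noted in the list above can be sketched as follows. SIP digest authentication follows HTTP Digest (RFC 2617); this minimal example uses the basic form without the qop extension. Every value used with it (user, realm, nonce, password list) is hypothetical, and the helper names are made up for illustration.

```python
import hashlib


def _md5_hex(text: str) -> str:
    """Return the hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute a digest response in the basic (no qop) RFC 2617 form."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")
    ha2 = _md5_hex(f"{method}:{uri}")
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")


def dictionary_attack(observed, candidates, username, realm, method, uri, nonce):
    """Return the first candidate password that reproduces the observed response.

    Everything except the password travels in clear text, so anyone who
    sniffs one exchange can test guesses offline, without contacting the server.
    """
    for guess in candidates:
        if digest_response(username, realm, guess, method, uri, nonce) == observed:
            return guess
    return None
```

The point of the sketch is that the hash hides the password only as well as the password resists guessing, which is why the text recommends combining SIP with TLS or S/MIME.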
Now that you are aware of the vulnerable components in VoIP, the next section explains
some misunderstandings about the vulnerability.
Summary
VoIP has been popular in the telecommunication world since its emergence in the late
90s, as a new technology transporting multimedia over the IP network. The reason for its
prevalence is that VoIP gives significant benefits compared to legacy phone systems. The
key benefits are cost savings, rich media service, phone portability, service portability,
integration with other applications, lack of geographical boundary, and rich features.
The benefits of VoIP do not come without cost. There are significant disadvantages for
using VoIP, such as complicated service architecture, interoperability issues, QoS issues,
power outages, and legal and security issues. Among these disadvantages, VoIP security
issues are becoming more serious because traditional security devices (for example, firewalls,
IDS/IPS), protocols (for example, encryption), and architectures do not adequately protect
VoIP service or network from recent intelligent threats.
There are two types of vulnerability in VoIP. One is the inherited vulnerability coming from
an existing infrastructure such as network, operating system, or web server that VoIP
applications are running on. The other is its own vulnerability coming from VoIP protocol
and devices, such as IP phone, voice gateway, media server, signaling controller, and so on.
These vulnerabilities are derived from the characteristics of VoIP, which uses IP-based
network infrastructure, public (or open) networks, standard protocol, exposed interface to
the public, real-time communications, mobility, and integration with data.
All components involved in VoIP service have vulnerable elements that affect it directly or
indirectly. The main components of vulnerability are the operating system of the VoIP
application, the VoIP application itself, the management interface, TFTP server, web client/
server, access device (switch, router), network, and VoIP protocol stack.
There are some misunderstandings related to VoIP’s vulnerability and protection. The
reality is that a VoIP system is more secure than a legacy phone system as long as it
maintains a basic level of security infrastructure. Strict authentication and encryption are
not enough to protect networks and end users against today’s sophisticated threats. A secure
data network infrastructure helps make a VoIP network secure but is not enough to
protect against application-specific attacks.
End Notes
1 Security Considerations for VoIP Systems, NIST (National Institute of Standards and
Technology), January 2000.
2 Comparison of Voice over IP with circuit switching techniques, R. Sinden
(Southampton University, UK), January 2002.
References
“A Security Blueprint of Enterprise Networks,” Cisco Systems, http://www.cisco.com/warp/public/cc/so/cuso/epso/sqfr/safe_wp.pdf.
“Comprehensive VoIP Security for the Enterprise,” Sipera Systems, http://www.sipera.com/assets/Documents/whitepapers/Sipera_Enterprise_VoIP_Security_WP.pdf.
Hersent, O., J. P. Petit, and D. Gurle. IP Telephony (Deploying Voice-over-IP Protocols).
Wiley, 2005.
RFC 3261, “SIP (Session Initiation Protocol),” J. Rosenberg, H. Schulzrinne, G. Camarillo,
A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler, June 2002.
This chapter covers the taxonomy of VoIP threats based on the following categories:
• Threats against availability
• Threats against confidentiality
• Threats against integrity
• Threats against social context
CHAPTER 2
NOTE For an exhaustive list of all current and potential threats, go to www.voipsa.org (Voice over
IP Security Alliance).
There are many possible ways to categorize the threats. This book uses the following four
categories that most VoIP threats can belong to:
• Threats against availability
• Threats against confidentiality
• Threats against integrity
• Threats against social context
Each section in this chapter covers each category with typical threat examples. To give you
a better understanding, each section uses figures and protocol examples with Session
Initiation Protocol (SIP).
20 Chapter 2: VoIP Threat Taxonomy
NOTE This chapter approaches these threats at a high level, focusing on the taxonomy. If you want
to see a detailed analysis with simulation, refer to Chapter 6, “Analysis and Simulation of
Current Threats.”
The following section introduces the most critical threats that impact service availability.
Call Flooding
The typical example of DoS is intentional call flooding: an attacker floods a target system
(for example, a VoIP server, a client, or the underlying infrastructure) with heavy valid or
invalid traffic (signals or media), significantly degrading its performance or bringing the
system down. The typical methods of flooding are as follows:
• Valid or invalid registration flooding—Attackers commonly use this method
because most registration servers accept the request from any endpoints in the public
Internet as an initial step of authentication. Regardless of whether the messages are
valid or invalid, the large number of request messages in a short period of time (for
example, 10,000 SIP REGISTER messages per second) severely impacts the
performance of the server.
• Valid or invalid call request flooding—Most VoIP servers have a security feature
that blocks flooded call requests from unregistered endpoints. So, an attacker registers
first after spoofing a legitimate user, and then sends flooded call requests in a short
period of time (for example, 10,000 SIP INVITE messages per second). This impacts
the performance or functionality of the server regardless of whether the request
message is valid or not.
• Call control flooding after call setup—An attacker may flood valid or invalid call
control messages (for example, SIP INFO, NOTIFY, Re-INVITE) after call setup.
Most proxy servers are vulnerable because they do not have a security feature to
ignore and drop those messages.
• Ping flooding—Like Internet Control Message Protocol (ICMP) ping, VoIP protocols
use application-layer ping messages (such as the SIP OPTIONS message) to check the
availability of a server or to keep the pinhole open in the local Network Address
Translation (NAT) server. Most IP network devices (for example, a router or firewall)
in the production network do not allow ICMP pings for security reasons. However,
many VoIP servers must allow the application-layer ping for proper serviceability,
which could be a critical security hole.
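All of the flooding methods listed above come down to too many requests from a source in too short a time. The sketch below shows one simple way a server could notice this: a sliding-window counter per source address. The class name, the threshold, and the one-second window are assumptions for illustration, not a feature of any particular VoIP product.

```python
from collections import defaultdict, deque


class FloodDetector:
    """Per-source sliding-window limiter (hypothetical helper)."""

    def __init__(self, max_per_window: int, window_s: float = 1.0):
        self.max_per_window = max_per_window
        self.window_s = window_s
        self._seen = defaultdict(deque)  # source -> timestamps of accepted requests

    def allow(self, source: str, now: float) -> bool:
        """Return True if a request from source at time now is within the limit."""
        stamps = self._seen[source]
        # Forget accepted requests that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_s:
            stamps.popleft()
        if len(stamps) >= self.max_per_window:
            return False  # over the limit: treat as flooding and drop
        stamps.append(now)
        return True
```

A limit keyed on the source address helps against a single flooding endpoint, but it does little on its own when the traffic is spread over many sources, each staying under the threshold.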
Figure 2-1 illustrates the example of distributed flooding with zombies; an attacker
compromises other computers with malware (for example, a virus) and uses them as
zombies flooding registration messages. Each zombie sends 1,000 SIP REGISTER
messages per second with different credentials that are randomly generated.
[Figure 2-1. Labels: Attacker, Infect, Zombies, Flooded REGISTERs, SIP Registrar, Responses, Service Outage, Legitimate Users.]
In Figure 2-1, the flooded messages will impact the registration server (SIP Registrar)
severely as long as the server processes and replies with any error codes, such as “401
Unauthorized,” “404 Not Found,” “400 Bad Request,” and so on. The impact can be high
resource consumption (for example, CPU, memory, network bandwidth), system malfunction,
or service outage. Whether the server responds or not, flooding the SIP registrar with
sufficient registration messages will result in the degradation of service to the legitimate
endpoints.
In addition to the intentional flooding just mentioned, unintentional flooding (so-called
“self-attack”) also exists in VoIP networks because of incorrect configuration of devices,
architectural service design problems, or unique circumstances. Here are some examples:
• Regional power outage and restoration—When the power is backed up after a
regional outage, all endpoints (for example, 10,000 IP phones) boot up and send
registration messages to the server at almost the same time, which amounts to
unintentional flooding. Because those phones are legitimate and distributed over a wide
area, it is hard to control the flooding traffic proactively.
• Incorrect configuration of device—The most common incorrect configuration is
setting endpoint devices (for example, IP phones) to send too many unnecessary
messages, such as a registration interval that is too short.
• Misbehaving endpoints—Problematic software (firmware) or hardware could create
unexpected flooding, especially when multiple or anonymous types of endpoints are
involved in the VoIP service network.
• Legitimate call flooding—There are unusual days or moments when many legitimate
calls are made almost at the same time. One example is Mother’s Day, when a lot of
calls are placed in the United States. Another example is natural disasters (for
example, earthquakes), when people within the area make a lot of calls to emergency
numbers (for example, 911) and their family and friends make calls to the affected
area at the same time.
These types of intentional and unintentional call flooding are among the most common and
critical threats to VoIP service providers, who have to maintain service availability continually.
The next type is another form of threat against service availability, by means of malformed
messages.
NOTE Protocol fuzzing is another name for malformed messages. A small difference is that
protocol fuzzing also includes malicious messages that have correct syntax but break the
expected sequence of messages, which may cause a system error by confusing the state
machine.
Message body
Session Description Protocol
Session Description Protocol Version (v): = = = = = = 0
Owner/Creator, Session Id (o): 2 2 2 IN IP4 CAL-D600-5814.cc-ntd1.example.com
Session Name (s): Session SDP
Connection Information (c): IN IP4 192.168.10.10
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 9876 RTP/AVP 0
Media Attribute (a): rtpmap:0 PCMU/8000
Note that the comments (bold letters) in Example 2-1 are not shown in the actual SIP
INVITE message. The INVITE message in the example contains several errors: three SIP
fields (Request-URI, From, and Call-Id) and the version line in the Session
Description Protocol (SDP) body have the wrong format.
The server receiving this kind of unexpected message could be confused (fuzzed) and react
in many different ways depending on the implementation. The typical impacts are as
follows:
• Infinite loop of parsing
• Buffer overflow, which may permit execution of arbitrary code
• Break state machine
• Unable to process other normal messages
• System crash
This vulnerability comes from the following sources in general:
1 Weakness of protocol specification
Most VoIP protocols are open to the public and do not strictly define every single line,
so attackers can look for weaknesses in the syntax. Additionally, there are many
customizable fields or tags.
2 Ease of creating the malformed message
Creating a message like that in Example 2-1 is easy for regular programmers. Even
for nonprogrammers, many tools are available to make customized messages.
3 Lack of exception handling in the implementation
Because of time restrictions, most implementers are apt to focus on product features
and interfaces, rather than create exception handling for massive negative cases.
4 Difficulty of testing all malformed cases
It is very difficult to test all the negative cases, even though sophisticated testing tools
covering more cases are coming out these days.
The threat of malformed messages should be preventable as long as the parsing algorithm
handles them properly.
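To make "handles them properly" concrete, here is a minimal Python sketch of defensive parsing: each line is checked against a strict pattern and a length bound, and anything unexpected is rejected with an exception instead of being processed further. The simplified request-line grammar, the 1024-character limit, and all function names are assumptions for illustration; a production parser would follow the full SIP and SDP grammars.

```python
import re

# Assumed, simplified grammar: METHOD SP sip-or-sips-URI SP "SIP/2.0"
REQUEST_LINE = re.compile(r"^([A-Z]{3,20}) (sips?:[^\s<>]{1,256}) SIP/2\.0$")
MAX_LINE_LENGTH = 1024  # hypothetical bound that limits parsing work per line


class MalformedMessage(ValueError):
    """Raised when a message fails validation; the caller should drop it."""


def parse_request_line(line: str) -> tuple[str, str]:
    """Return (method, request_uri) or raise MalformedMessage."""
    if len(line) > MAX_LINE_LENGTH:
        raise MalformedMessage("request line too long")
    match = REQUEST_LINE.match(line)
    if match is None:
        raise MalformedMessage("request line does not match METHOD URI SIP/2.0")
    return match.group(1), match.group(2)


def parse_sdp_version(line: str) -> int:
    """SDP defines the version line as exactly 'v=0'; reject anything else."""
    if line != "v=0":
        raise MalformedMessage(f"unexpected SDP version line: {line!r}")
    return 0


print(parse_request_line("INVITE sip:bob@192.0.2.10:5060 SIP/2.0"))
try:
    parse_sdp_version("v======0")  # a malformed version line
except MalformedMessage as error:
    print("dropped:", error)
```

Raising one dedicated exception type gives the caller a single place to count and discard bad input, so a malformed message never reaches the call state machine.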
The next threat is spoofed messages that are not malformed but still impact service
availability.
Spoofed Messages
An attacker may insert fake (spoofed) messages into a certain VoIP session to interrupt the
service, or insert them to steal the session. The typical examples are “call teardown” and
“toll fraud.”
Call Teardown
In a malicious call teardown, an attacker monitors a SIP dialog, obtains the session
information (Call-ID, From tag, and To tag), and sends a call termination message
(for example, SIP BYE) to the communication device while the users are talking. The
device receiving the termination message closes the call session immediately. Figure
2-2 illustrates the example with SIP messages.
[Figure 2-2. Diagram labels: SIP Proxy, INVITE, 200 OK, Media, User A, User B, BYE, Attacker]
Figure 2-2 assumes that the attacker has already monitored the call signals between User A
and User B and knows the session information (SIP dialog). The attacker inserts the session
information into the BYE message. The IP phone of User A receives the BYE and disconnects
the media channel.
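The reason the forged BYE works can be sketched in a few lines of Python. The sketch models a phone that matches an incoming BYE against its dialog using only the three identifiers named above, plus a hypothetical stricter variant that also compares the source address. The class, the function names, and the addresses are assumptions for illustration; the tag and Call-ID values are made up. Note that a source-address check can itself be defeated by IP spoofing over UDP, so it is not a complete defense.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dialog:
    call_id: str
    local_tag: str   # tag this phone put in its own From/To header
    remote_tag: str  # tag chosen by the other party
    peer_ip: str     # address the dialog's signaling has been arriving from


def bye_matches(dialog: Dialog, call_id: str, from_tag: str, to_tag: str) -> bool:
    """Header-only matching: exactly the values an eavesdropper can copy."""
    return (call_id, from_tag, to_tag) == (
        dialog.call_id, dialog.remote_tag, dialog.local_tag)


def bye_accepted_strict(dialog: Dialog, call_id: str, from_tag: str,
                        to_tag: str, source_ip: str) -> bool:
    """Hypothetical stricter policy: also require the expected source address."""
    return bye_matches(dialog, call_id, from_tag, to_tag) and source_ip == dialog.peer_ip


# Dialog as seen by User A's phone (all values are illustrative).
dialog = Dialog(call_id="9252226543-0001", local_tag="2345",
                remote_tag="4567", peer_ip="192.0.2.10")

# The attacker copies the sniffed identifiers into a BYE sent from another host.
forged = dict(call_id="9252226543-0001", from_tag="4567", to_tag="2345")
print(bye_matches(dialog, **forged))                                   # True
print(bye_accepted_strict(dialog, **forged, source_ip="203.0.113.66"))  # False
```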
Another method of attack is that an attacker sends termination messages to random
devices (especially a proxy server) without knowing the session information, which may
affect current call sessions.
Compared to previous threats in this section, the malicious call teardown is not a common
attack because the attacker must monitor the target call session before sending a
termination message (BYE).
The next type of attack, toll fraud, also requires preliminary information, such as
credentials, before making fraudulent calls, but it happens commonly because of the
monetary benefit.
Toll Fraud
A fraudulent toll call is one of the common threats these days, especially for long distance
or international calls. Because most mediation devices (for example, public switched
telephone network [PSTN] media gateway, proxy server) require valid credentials (for
example, ID and password) before setting up the toll call, an attacker collects the credentials
first in many different ways. Typically, an attacker creates spoofed messages for brute-force
password assault on the server until he receives authorization. If the clients use default
passwords or easy-to-guess passwords, it is much easier to find them, especially when an
attacker uses a password dictionary (see Note).
NOTE A password dictionary is a file that contains millions of frequently used passwords.
Most passwords are manually created by humans (rather than by computers), so it’s highly
likely that they will be simple and easy to remember. No one really wants to have to
remember random passwords that are longer than 10 digits, except perhaps system
administrators. For example, a user named John Kim is apt to have passwords such as
“jkim,” “iamjohn,” “johnkim,” “john2kim,” “john4me,” and so on. Therefore, an attacker
using a password dictionary containing millions of commonly used passwords would not
need much time to crack most user-created passwords.
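The patterns in this Note are easy to generate mechanically, which is also how an administrator could reject such passwords when accounts are provisioned. The following Python sketch derives the five example patterns from a user's name and checks a candidate password against them and against a tiny stand-in for a dictionary file. The function names and the four-entry `COMMON_PASSWORDS` set are assumptions for illustration only.

```python
# Tiny stand-in for a real dictionary file with millions of entries.
COMMON_PASSWORDS = {"password", "123456", "admin", "1234"}


def guessable_passwords(first: str, last: str) -> set[str]:
    """Build the name-derived patterns listed in the Note for one user."""
    first, last = first.lower(), last.lower()
    return {
        first[0] + last,      # jkim
        "iam" + first,        # iamjohn
        first + last,         # johnkim
        first + "2" + last,   # john2kim
        first + "4me",        # john4me
    }


def is_weak(password: str, first: str, last: str) -> bool:
    """True if the password is in the dictionary or derived from the user's name."""
    candidate = password.lower()
    return candidate in COMMON_PASSWORDS or candidate in guessable_passwords(first, last)


print(is_weak("john2kim", "John", "Kim"))    # True
print(is_weak("x9!Tq#4vLm", "John", "Kim"))  # False
```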
In some cases, the server does not require credentials but checks the source IP
address or subnet of the client to control access. Access control based on the source IP
or subnet is especially common when call trunking (for example, SIP trunking) is set up
between a VoIP service provider and an enterprise customer. An attacker
may be able to access the server by spoofing the source IP address.
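A minimal Python sketch of such source-based access control shows why spoofing defeats it: the decision depends on nothing but the address the packet claims to come from. The subnet and the function name are hypothetical.

```python
import ipaddress

# Hypothetical subnet of the enterprise customer's SIP trunk.
TRUNK_ACL = [ipaddress.ip_network("203.0.113.0/24")]


def trunk_allows(source_ip: str) -> bool:
    """Allow a request purely on the basis of its claimed source address."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in TRUNK_ACL)


print(trunk_allows("203.0.113.25"))  # True: inside the trusted subnet
print(trunk_allows("198.51.100.1"))  # False: outside it
# A packet whose source field is forged to 203.0.113.25 passes the same check,
# because no credential is ever examined.
```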
Call Hijacking
Hijacking occurs when some transactions between a VoIP endpoint and the network are
taken over by an attacker.
The transactions can be registration, call setup, media flow, and so on. This hijacking can
cause serious service interruption by preventing legitimate users from using the VoIP
service. It is similar to call teardown in that session information is stolen as a
preliminary step, but the actual form of attack and the impact are different.
The typical cases are registration hijacking, media session hijacking, and server
impersonating. The next few sections describe each of these cases.
Registration Hijacking
The registration process allows an endpoint to identify itself to the server (for example, SIP
Registrar) as the device at which a user is located.
An attacker monitors this transaction and sends spoofed messages to the server in order to
hijack the session. When a legitimate user has been compromised, that user cannot receive
inbound calls. Figure 2-3 illustrates the example with SIP messages.
[Figure 2-3. Diagram labels: SIP Registrar, Proxy, Inbound Call to User A, REGISTER (From: User A, To: User A), OK, REGISTER (From: User A, To: Attacker), OK, User A, Attacker]
In Figure 2-3, an attacker impersonates a user agent by modifying the “From” header and
adding the attacker’s address to the “To” header when it sends a REGISTER message,
which updates the address-of-record of the target user. All inbound calls to User A will be
routed to the attacker.
This threat arises when the user agent server (Registrar) relies only on SIP headers
to identify the user agent.
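By contrast, a registrar that challenges each REGISTER does not have to trust the headers alone. The Python sketch below computes an HTTP Digest response in its simplest form (MD5, no `qop`), which is the kind of scheme SIP borrows from HTTP; the text above does not prescribe this method, so treat it as one possible illustration. The user name, realm, password, URI, and nonce are all made-up values.

```python
import hashlib
import hmac


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """Digest response without qop: MD5(HA1:nonce:HA2)."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")
    ha2 = _md5_hex(f"{method}:{uri}")
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")


def register_is_authentic(claimed_response: str, username: str, realm: str,
                          password: str, uri: str, nonce: str) -> bool:
    """Registrar side: recompute the response from the stored password and compare."""
    expected = digest_response(username, realm, password, "REGISTER", uri, nonce)
    return hmac.compare_digest(expected, claimed_response)


# Hypothetical values for illustration.
args = dict(username="userA", realm="example.com", uri="sip:example.com", nonce="abc123")
genuine = digest_response(password="s3cret", method="REGISTER", **args)
forged = digest_response(password="guess", method="REGISTER", **args)

print(register_is_authentic(genuine, password="s3cret", **args))  # True
print(register_is_authentic(forged, password="s3cret", **args))   # False
```

An attacker who only rewrites the From and To headers cannot produce a valid response without the password, although a weak password remains open to the dictionary attack described under toll fraud.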
[Figure 2-4. Diagram labels: SIP Proxy, INVITE, Ringing, User A, User B, 200 OK, Media, Voicemail Box, Attacker]
In Figure 2-4, User A tries to make a call to User B and the IP phone of User B is ringing.
Having monitored call requests to User B, an attacker detects the call and sends a 200 OK
message to User A with the IP address and port of the attacker’s voicemail server. User A
leaves a voice message for User B in the attacker’s voicemail box. This hijacking happens
before the media session is established between User A and (the intended) User B.
Even after the media session is established between A and B, an attacker can still hijack an
active session by sending a Re-INVITE message to User A.
Server Impersonating
A VoIP client sends a request message to a server in the target domain for registration, call
setup or routing, and so on. It is possible for an attacker to impersonate the server, receive
the request message, and then manipulate it for malicious purposes.
The typical method of impersonating a server begins with an attack on the local TFTP
server or Domain Name Service (DNS) server. An attacker may break into the TFTP server
and replace the configuration file for IP phones with a file that contains the IP address
of a malicious server (for example, SIP Registrar).
The IP phones downloading the malicious file will send a request message to the wrong
server.
An attacker may also compromise the DNS server and replace the entry for the current VoIP
server with the IP address of a malicious server. The IP phones looking up the server IP will
receive the wrong one. Figure 2-5 illustrates an example based on SIP transactions with a
Redirect server.
[Figure 2-5. Diagram labels: DNS Server, Compromise, Redirect Server, Attacker, 10.10.10.10, original.redirect.com, IP: 10.1.1.10]
In Figure 2-5, the attacker has first compromised the local DNS server and replaced the IP
address (10.1.1.10) of original.redirect.com with 10.10.10.10, which is the attacker’s
redirect server.
When User A tries to make a call to User B, the IP phone looks up the IP address of the
redirect server (original.redirect.com) and receives the IP (10.10.10.10) of the
impersonated server. The INVITE message is sent to the impersonated server, which replies
“302 Moved Temporarily” with wrong contact information that could be a dummy address or
the attacker’s proxy server for further threats. The original redirect server (10.1.1.10)
cannot receive any call request in this situation.
QoS Abuse
The elements of a media session, such as media type, coder-decoder (codec) bit rate, and
payload type, are negotiated between VoIP endpoints during call setup. For example, it
may be necessary or desirable to use G.729 when leaving a network (to conserve bandwidth)
but to use G.711 when calls are staying inside a network (to keep call quality higher). An
attacker may intervene in this negotiation and abuse the Quality of Service (QoS), by
replacing, deleting, or modifying codecs or payload type.
Another method of QoS abuse is exhausting the limited bandwidth with a malicious tool so
that legitimate users cannot use bandwidth for their service. Some VoIP service providers
or hosting companies limit the bandwidth for certain groups of hosts to protect the network.
An attacker may know the rate limit and generate excessive media traffic through the
channel, so voice quality between users may be degraded.
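A short calculation shows how much is at stake in the codec choice and the rate limit. The Python sketch below estimates the one-way IP bandwidth of a voice stream for G.711 (64 kbps) and G.729 (8 kbps) and how many such streams fit under a limit. The 20 ms packetization interval, the omission of layer-2 overhead, and the 1 Mbps limit are assumptions chosen for illustration.

```python
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers, no layer-2 overhead
PACKETS_PER_SECOND = 50      # assumed 20 ms packetization interval

CODEC_BPS = {"G.711": 64_000, "G.729": 8_000}


def stream_bps(codec_bps: int) -> int:
    """One-way IP bandwidth of a single voice stream, in bits per second."""
    payload_bytes = codec_bps // 8 // PACKETS_PER_SECOND
    return (payload_bytes + HEADER_BYTES) * 8 * PACKETS_PER_SECOND


def streams_within(limit_bps: int, codec_bps: int) -> int:
    """How many such streams fit under a rate limit."""
    return limit_bps // stream_bps(codec_bps)


LIMIT_BPS = 1_000_000  # hypothetical rate limit for a group of hosts
for name, rate in CODEC_BPS.items():
    print(name, stream_bps(rate), streams_within(LIMIT_BPS, rate))
# G.711 80000 12
# G.729 24000 41
```

Under these assumptions, forcing calls from G.729 to G.711 cuts the capacity of the limited channel from 41 streams to 12, and an attacker who fills the channel with bogus media leaves correspondingly less room for legitimate calls.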
In this section so far, you have learned about threats against availability, such as call
flooding, malformed messages, spoofed messages (call teardown, toll fraud), call hijacking
(registration and media session hijacking, server impersonating), and QoS abuse. The next
section covers another type of threat: attacks against call data and media confidentiality.
Eavesdropping Media
Eavesdropping on someone’s conversation has been a popular threat for as long as
telecommunication services have existed, although the methods of eavesdropping differ
between legacy phone systems and VoIP systems.
In VoIP, an attacker typically uses one of two methods. One is sniffing media packets in
the same broadcasting domain as a target user, or on the same path as the media. The other
is compromising an access device (for example, Layer 2 switch) and forwarding (duplicating)
the target media to an attacker’s device.
The media can be voice-only or integrated with video, text, fax, or image. Figure 2-6
illustrates these cases.
[Figure 2-6. Diagram labels: L3 Router, Duplicating, Attacker, Hub, Broadcasting Domain, Attacker, User A]
In Figure 2-6, the attacker’s device that is in the same broadcasting domain as the IP phone
of User A can capture all signals and media through the hub. This figure also shows the
possibility that the attacker breaks into a switch or router, configures a monitoring port
for the voice VLAN, and forwards (duplicates) the media to the attacker’s capturing device.
Another possible way of eavesdropping on media is for an attacker to tap the same path as
the media itself, which is similar to the legacy tapping technique on the PSTN. For example,
the attacker has access to the T1 itself and physically splits the T1 into two signals.
Whereas this technique targets media, the next method (call pattern tracking)
targets signaling information.
INVITE sip:[email protected]:5060 SIP/2.0
Via: SIP/2.0/UDP 10.10.10.10:5060;branch=z9hG4bK00002000005
From: Alice <sip:[email protected]:5060>;tag=2345
To: Bob <sip:[email protected]>
Call-Id: 9252226543-0001
CSeq: 1 INVITE
Contact: <sip:[email protected]>
Expires: 1200
Max-Forwards: 70
Content-Type: application/sdp
Content-Length: 143
===========================================================
SIP/2.0 200 OK
Via: SIP/2.0/UDP 10.10.10.10:5060;branch=z9hG4bK00002000005
From: Alice <sip:[email protected]:5060>;tag=2345
To: Bob <sip:[email protected]>;tag=4567
Call-Id: 9252226543-0001
CSeq: 1 INVITE
Contact: <sip:[email protected]>
Content-Type: application/sdp
Content-Length: 131
The following list shows sample information that the attacker may extract from Example 2-2:
• The IP address of the SIP proxy server is 192.168.10.10, and the listening port is 5060.
• The endpoints use User Datagram Protocol (UDP) packets for signaling without any
encryption, such as Transport Layer Security (TLS) or Secure/Multipurpose Internet
Mail Extensions (S/MIME).
• The proxy server does not require authentication for a call request.
• The caller (Alice), who has a phone number 4085251111, makes a call to Bob at
9252226543.
• The IP address of Alice’s phone is 10.10.10.10 and a media gateway is 172.26.10.10
(supposing that the call goes to PSTN).
• The media gateway opens a UDP port, 20000, to receive Real-time Transport Protocol
(RTP) stream from Alice’s phone.
• The media gateway accepts only G.729a codec (Alice’s phone offered G.711a,
G.711u, and G.729a initially).
The information just presented can be used for future attacks, such as a DoS attack on the
proxy server or the media gateway.
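Extracting this kind of information from captured signaling takes very little code, which is why unencrypted signaling is so exposed. The Python sketch below parses a sample INVITE and pulls out the first few items in the list. The sample message is reconstructed for illustration from the values stated in the list (the exact URIs, including the host used in the From header, are assumptions), and the function name is hypothetical.

```python
import re

# Illustrative message; the URIs are assumptions built from the values listed above.
SAMPLE_INVITE = (
    "INVITE sip:9252226543@192.168.10.10:5060 SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 10.10.10.10:5060;branch=z9hG4bK00002000005\r\n"
    "From: Alice <sip:4085251111@10.10.10.10:5060>;tag=2345\r\n"
    "To: Bob <sip:9252226543@192.168.10.10>\r\n"
    "Call-Id: 9252226543-0001\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
)


def summarize(message: str) -> dict[str, str]:
    """Pull call-pattern details out of a plain-text SIP request."""
    lines = message.split("\r\n")
    request_uri = lines[0].split(" ")[1]
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    proxy = re.search(r"@([\d.]+):(\d+)", request_uri)
    via = re.match(r"SIP/2\.0/(\w+) ([\d.]+):(\d+)", headers["via"])
    caller = re.search(r"sip:(\d+)@", headers["from"])
    callee = re.search(r"sip:(\d+)@", headers["to"])
    return {
        "proxy_ip": proxy.group(1),
        "proxy_port": proxy.group(2),
        "transport": via.group(1),
        "caller_ip": via.group(2),
        "caller_number": caller.group(1),
        "callee_number": callee.group(1),
    }


for key, value in summarize(SAMPLE_INVITE).items():
    print(f"{key}: {value}")
```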
Data Mining
Like email spammers who collect email addresses from various sources like web pages or
address books, VoIP spammers also collect user information like phone numbers from
intercepted messages, which is one example of data mining.
The general meaning of data mining in VoIP is the unauthorized collection of identifiers that
could be user name, phone number, password, URL, email address, strings or any other
identifiers that represent phones, server nodes, parties, or organizations on the network. In
Example 2-2, you can see that kind of information from the messages.
An attacker utilizes the information for subsequent unauthorized connections such as:
• Toll fraud calls
• Spam calls (for example, voice, Instant Messaging [IM], presence spam)
• Service interruptions (for example, call flooding, call hijacking, and call teardown)
• Phishing (identity fraud; see the section “Threats Against Social Context” for more
information)
With valid identities, attackers have a better chance of interrupting service by sending
many different types of malicious messages, because many servers reject all messages
except registration unless the endpoint is registered.
Reconstruction
Reconstruction means any unauthorized reconstruction of voice, video, fax, text, or
presence information after capturing the signals or media between parties. The reconstruction
includes monitoring, recording, interpretation, recognition, and extraction of any type of
communications without the consent of all parties. A few examples are as follows:
• Decode credentials encrypted by a particular protocol.
• Extract dual-tone multifrequency (DTMF) tones from recorded conversations.
• Extract fax images from converged communications (voice and fax).
• Interpret the mechanism of assigning session keys between parties.
These reconstructions do not affect current communications, but they are utilized for future
attacks or other deceptive practices.
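As an illustration of the DTMF case, the following Python sketch recovers a key press from audio samples with the Goertzel algorithm, a standard way to measure the energy at a few known frequencies. It assumes 8 kHz audio, clean tones, and blocks that each contain exactly one key press; the tone is synthesized in the sketch itself rather than taken from a recording, and the function names are hypothetical. Real recordings would also need silence detection and thresholds.

```python
import math

SAMPLE_RATE = 8000  # Hz, assumed telephony sampling rate
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples: list[float], freq: float) -> float:
    """Signal power at one frequency, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def detect_digit(samples: list[float]) -> str:
    """Pick the strongest row and column frequency and map them to a key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, ROW_FREQS[i]))
    col = max(range(4), key=lambda i: goertzel_power(samples, COL_FREQS[i]))
    return KEYPAD[row][col]


def synth_tone(digit: str, n_samples: int = 400) -> list[float]:
    """Generate a clean two-frequency DTMF tone for one key (test input)."""
    for row, keys in enumerate(KEYPAD):
        col = keys.find(digit)
        if col >= 0:
            f_row, f_col = ROW_FREQS[row], COL_FREQS[col]
            break
    else:
        raise ValueError(f"not a DTMF key: {digit!r}")
    return [
        math.sin(2 * math.pi * f_row * t / SAMPLE_RATE)
        + math.sin(2 * math.pi * f_col * t / SAMPLE_RATE)
        for t in range(n_samples)
    ]


recovered = "".join(detect_digit(synth_tone(d)) for d in "1234#")
print(recovered)  # 1234#
```

This is why digits entered during a call, such as a PIN typed into an IVR system, can be recovered from a recording of the media.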
In this section so far, you have learned about threats against confidentiality such as
eavesdropping media, call pattern tracking, data mining, and reconstruction. The next
section covers another type of threat: breaking message and media integrity.
Message Alteration
Message alteration is the threat in which an attacker intercepts messages between
communicating entities and alters certain information to reroute the call, change
information, interrupt the service, and so on. The typical examples are call rerouting and
black holing.
Call Rerouting
Call rerouting is any unauthorized change of call direction by altering the routing
information in the protocol message. The result of call rerouting is either to exclude
legitimate entities or to include illegitimate entities in the path of call signal or media.
Figure 2-7 illustrates the example of including a malicious entity during call setup.
[Figure 2-7. Diagram labels: Redirect Server, IP: 192.168.10.10, Proxy Server, 302 Moved Temp (Contact: 10.1.1.10), IP: 172.26.1.10, Attacker, 302 Moved Temp (Contact: 172.26.1.10), INVITE, Proxy Server]
In Figure 2-7, an attacker keeps monitoring the call request message (for example, SIP
INVITE) from User A to a redirect server. When User A initiates a call, the IP phone sends
an INVITE message to the redirect server, as shown in Example 2-3.
The attacker detects the INVITE and intercepts the response message (that is, “302 Moved
Temporarily”) from the redirect server, as shown in the continuation of Example 2-3.
SIP/2.0 302 Moved Temporarily
From: UserA <sip:[email protected]:5060>;tag=2345
To: Bob <sip:[email protected]>;tag=6789
Call-Id: 9252226543-0001
CSeq: 1 INVITE
Contact: <sip:[email protected]>
Content-Length: 0
The attacker replaces the IP address of the proxy server (10.1.1.10) in the Contact header
with the address of his own proxy server (172.26.1.10) and sends the altered response to
the IP phone, as shown in the continuation of Example 2-3.
SIP/2.0 302 Moved Temporarily
From: UserA <sip:[email protected]:5060>;tag=2345
To: Bob <sip:[email protected]>;tag=6789
Call-Id: 9252226543-0001
CSeq: 1 INVITE
Contact: <sip:[email protected]>
Content-Length: 0
The IP phone sends a new INVITE to the attacker’s proxy server rather than the legitimate
server, and the attacker’s server relays the message as shown in Figure 2-7. From now on,
the attacker in the middle can see all signals between the endpoints and modify them for
any malicious purpose.
Media Alteration
Media alteration is the threat in which an attacker intercepts media between communicating
entities and alters media information to inject unauthorized media, degrade the QoS, delete
certain information, and so on. The media can be voice-only or integrated with video, text,
fax, or image. The typical examples are media injection and media degrading.
Media Injection
Media injection is an unauthorized method in which an attacker injects new media into an
active media channel or replaces media in an active media channel. The consequence of
media injection is that the end user (victim) may hear advertisement, noise, or silence in the
middle of conversation. Figure 2-8 illustrates the example with voice stream.
[Figure 2-8. Diagram labels: Media Gateway, PSTN, Media, Media Injection, User A, User B, Attacker]
In Figure 2-8, User A with an IP phone makes a call to User B, who has a PSTN phone,
through a media gateway. After the call setup, the IP phone sends voice (RTP) packets to
the media gateway. An attacker in the middle monitors the RTP sequence number of the
voice packets, adjusts the sequence number of the illegitimate packets (for example,
advertisements) to match, and injects them into the voice channel so that they arrive
before the legitimate packets. User B in the PSTN hears the injected voice.
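The role of the sequence number can be shown with a toy receiver. The Python sketch below builds and parses the fixed 12-byte RTP header and models a playout buffer that keeps the first packet it sees for each sequence number, so an injected packet that arrives first displaces the legitimate one. The buffer policy, class name, SSRC value, and payloads are assumptions for illustration; real gateways differ in how they treat duplicates, and the sketch ignores sequence-number wraparound.

```python
import struct


def build_rtp(seq: int, timestamp: int, ssrc: int, payload: bytes,
              payload_type: int = 0) -> bytes:
    """Fixed 12-byte RTP header (version 2, no padding/extension/CSRC) plus payload."""
    first_byte = 2 << 6
    header = struct.pack("!BBHII", first_byte, payload_type & 0x7F, seq, timestamp, ssrc)
    return header + payload


def parse_rtp(packet: bytes) -> dict:
    first_byte, second_byte, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {"version": first_byte >> 6, "payload_type": second_byte & 0x7F,
            "seq": seq, "timestamp": timestamp, "ssrc": ssrc, "payload": packet[12:]}


class FirstArrivalBuffer:
    """Toy playout buffer: the first packet seen for a sequence number wins."""

    def __init__(self) -> None:
        self._slots: dict[int, bytes] = {}

    def receive(self, packet: bytes) -> bool:
        fields = parse_rtp(packet)
        if fields["seq"] in self._slots:
            return False  # a later packet with the same number is discarded
        self._slots[fields["seq"]] = fields["payload"]
        return True

    def playout(self) -> list[bytes]:
        return [self._slots[seq] for seq in sorted(self._slots)]


SSRC = 0x1234  # hypothetical stream identifier copied by the attacker
buffer = FirstArrivalBuffer()
buffer.receive(build_rtp(100, 0, SSRC, b"voice-100"))
buffer.receive(build_rtp(101, 160, SSRC, b"ADVERT"))              # injected, arrives first
kept = buffer.receive(build_rtp(101, 160, SSRC, b"voice-101"))    # legitimate, arrives late
print(kept, buffer.playout())  # False [b'voice-100', b'ADVERT']
```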
Media Degrading
Media degrading is an unauthorized method in which an attacker manipulates media or
media control (for example, Real-Time Control Protocol [RTCP]) packets and reduces the
QoS of any communication. Here are a couple of examples:
1 An attacker intercepts RTCP packets in the middle, and changes (or erases) the
statistics of the media traffic (packet loss, delay, and jitter) so that the endpoint
devices may not control the media properly.
2 An attacker intercepts RTP packets in the middle, and changes the sequence number
of the packets so that the endpoint device may play the media in the wrong sequence,
which degrades the quality.
In this section so far, you have learned about VoIP threats against integrity such as message
alteration (call rerouting, call black holing) and media alteration (media injection, media
degrading). The next section covers another type of threat: social threats.
Spam generally means unsolicited bulk email of the kind you may see every day. It wastes
network bandwidth and system resources and annoys email users. Spam exists in the VoIP
space as well, as so-called VoIP spam, in the form of voice, IM, and presence spam.
This section looks into each type of VoIP spam with SIP. The content refers to
RFC 5039 (see End Notes, item 1).
Phishing is becoming popular in the VoIP world these days as a method of obtaining
somebody’s personal information by disguising the identity of the attacker.
The following sections give more details about these social threats.
NOTE These same types of attacks are equally available in today’s PSTN environment.
Misrepresentation
Misrepresentation is the intentional presentation of a false identity, authority, rights, or
content as if it were true so that the target user (victim) or system may be deceived by the
false information. These misrepresentations are common elements of a multistage attack,
such as phishing.
Identity misrepresentation is the typical threat, in which an attacker presents a false
identity, such as a false caller name, number, domain, organization, email address,
or presence information.
Authority or rights misrepresentation is the method of presenting false information to an
authentication system to obtain access, or bypassing an authentication system by
inserting the appearance of authentication when there was none. It includes presentation of
a password, key, certificate, and so on. The consequence of this threat could be improper
access to toll calls, toll calling features, call logs, configuration files, presence information
of others, and so on.
Content misrepresentation is the method of presenting false content as if it came from a
trusted source of origin. It includes false impersonation of voice, video, text, or image of
a caller.
SPIT for many reasons: low hardware cost, low line cost, ease of writing a spam application,
no boundary for international calls, and so on. Additionally, in some countries, such
telemarketing calls over the PSTN are regulated.
In some cases, spammers utilize computational and bandwidth resources provided by
others, by infecting their machines with viruses that turn them into “zombies” that can be
used to generate call spam.
Another reason SPIT is getting popular is its effectiveness compared to email spam. With
email spam, you may already have noticed the big difference between turning a spam filter
on and off for your email account. In fact, most spam filters for email today work very
well (filtering more than 90 percent of spam) because of the store-and-forward nature of
email: all emails can be stored and examined in one place before being forwarded to users.
Even though users may still receive a small percentage of email spam, they usually look
at the profile (for example, sender name and subject) and delete most of it without seeing
the contents. However, the method of filtering emails does not work for SPIT because voice
is real-time media. Users can recognize whether a call is spam only after listening to some
of it. So, spammers try to put the main information in the initial
announcement so that users may listen to it before hanging up the phone. There is a way to
block those call attempts based on a blacklist (spammers’ IP address or caller ID), but it is
useless if spammers spoof the source information.
You can find more information on SPIT and mitigation methods in Chapter 6, “Analysis and
Simulation of Current Threats.”
The next topic is a different type of VoIP spam, IM spam.
IM Spam (SPIM)
IM spam is similar to email spam. It is defined as a bulk unsolicited set of instant messages,
whose content contains the message that the spammer is seeking to convey. This is often
called Spam over Instant Messaging, or SPIM.
SPIM is usually sent in the form of request messages that cause content to automatically
appear on the user’s display. The typical request messages in SIP are as follows:
• SIP MESSAGE request (most common)
• INVITE request with large Subject headers (since the Subject is sometimes rendered
to the user)
• INVITE request with text or HTML bodies
======================================================================
MESSAGE sip:[email protected]:5060 SIP/2.0
Via: SIP/2.0/UDP 10.10.10.10:5060;branch=z9hG4bK00002000005
From: Spammer <sip:[email protected]:5060>;tag=2345
To: Bob <sip:[email protected]>
Call-Id: 9252226543-0001
CSeq: 1 MESSAGE
Max-Forwards: 70
Content-Type: text/plain
Content-Length: 25
SPIM is very much like email, but much more intrusive than email. In today’s systems, IMs
automatically pop up and present themselves to the user. Email, of course, must be
deliberately selected and displayed.
In SIP, this is done using the watcherinfo event package. This package allows a user to learn
the identity of the watcher, in order to make an authorization decision. This could provide
a vehicle for conveying information to a user; Example 2-5 shows the example with SIP
SUBSCRIBE.
Example 2-5 Presence Spam
A spammer in Example 2-5 generates the SUBSCRIBE request from the identity (sip:buy-
[email protected]), and this brief message can be conveyed to the
user, even though the spammer does not have permission to access presence. As such,
presence spam can be viewed as a form of IM spam, where the amount of content to be
conveyed is limited. The limit is equal to the amount of information generated by the
watcher that gets conveyed to the user through the permission system.
Phishing
The general meaning of phishing is an illegal attempt to obtain somebody’s personal
information (for example, ID, password, bank account number, credit card information) by
posing as a trusted entity in the communication. In VoIP, phishing typically happens
through voice or IM communication, and voice phishing is sometimes called “vishing.”
The typical sequence is that a phisher picks target users and creates request messages (for
example, SIP INVITE) with spoofed identities, pretending to be a trusted party. When the
target user accepts the call request, either voice or IM, the phisher provides fake
information (for example, bank policy announcement) and asks for personal information.
Some information like user name and password may not be directly valuable to the phisher,
but it may be used to access more information useful in identity theft.
Here are a couple of phishing examples:
1 A phisher makes a call to a target user and leaves a voice message like: “This is an
important message from ABC Bank. Because our system has changed, you need to
change your password. Please call back at this number: 1-800-123-4567.” When the
target user calls the number back, the phisher’s Interactive Voice Response (IVR)
system picks up the call and acquires the user’s password by asking “Please enter your
current password for validation purposes . . ..”
2 A phisher sends an instant text message to smart phone (for example, PDA phone)
or softphone (for example, Skype client) users, saying “This message is from ABC
Bank. Your credit card rate has been increased. Please check it out on our website:
http://www.abcbank.example.com.” When the users click the URL, it goes to a
phisher’s website (example.com) that appears to have exactly the same web page that
ABC Bank has. The fake website collects IDs and passwords that the users type in.
In this section, you have learned about VoIP threats in a social context, such as
misrepresentation, call spamming, IM spamming, presence spamming, and phishing. For more
detailed information about VoIP spamming, refer to Chapter 6, “Analysis and Simulation
of Current Threats.”
Summary
VoIP vulnerabilities can be exploited to create many different kinds of threats. The threats
can be categorized as four different types: threats against availability, confidentiality,
integrity, and social context.
A threat against availability is a threat against service availability that is supposed to be
running 24/7. That is, the threat is aiming at VoIP service interruption, typically, in the form
of DoS. The examples are call flooding, malformed messages (protocol fuzzing), spoofed
messages (call teardown, toll fraud), call hijacking (registration or media session hijacking),
server impersonating, and QoS abuse.
A threat against confidentiality does not impact current communications generally, but
provides an unauthorized means of capturing conversations, identities, patterns, and
credentials that are used for the subsequent unauthorized connections or other deceptive
practices. VoIP transactions are highly exposed to the confidentiality threat because most
VoIP services do not provide full confidentiality (both signal and media) end-to-end. The
threat examples are eavesdropping media, call pattern tracking, data mining, and
reconstruction.
A threat against integrity is altering messages (signals) or media after intercepting them in
the middle of the network. That is, an attacker can see the entire signaling and media stream
between endpoints as an intermediary. The alteration can consist of deleting, injecting, or
replacing certain information in the VoIP message or media. The typical examples are call
rerouting, call black holing, media injection, and media degrading.
A threat against social context focuses on how to manipulate the social context between
communication parties so that an attacker can misrepresent himself as a trusted entity and
convey false information to the target user. The typical examples are misrepresentation
(identity, authority, rights, and content), voice spam, instant message spam, presence spam,
and phishing.
End Notes
1 RFC 5039, “SIP and Spam,” J. Rosenberg, C. Jennings,
http://www.ietf.org/rfc/rfc5039.txt, January 2008.
References
“Phishing,” Wikipedia, http://en.wikipedia.org/wiki/Phishing.
RFC 3261, “SIP (Session Initiation Protocol),” J. Rosenberg, H. Schulzrinne, G. Camarillo,
A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler, June 2002.
RFC 3428, “Session Initiation Protocol (SIP) Extension for Instant Messaging,” B. Campbell,
J. Rosenberg, H. Schulzrinne, C. Huitema, D. Gurle, December 2002.
Trammell, Dustin D. “VoIP Attacks,” http://www.dustintrammell.com/presentations/.
“VoIP Security Threat Taxonomy,” VOIPSA,
http://www.voipsa.org/Activities/taxonomy-wiki.php.
This chapter covers the security profiles in the following VoIP protocols:
• H.323
• Session Initiation Protocol (SIP)
• Media Gateway Control Protocol (MGCP)
CHAPTER 3
Security Profiles in VoIP Protocols
Three protocols dominate VoIP networks today: H.323, Session Initiation Protocol
(SIP), and Media Gateway Control Protocol (MGCP). SIP and H.323 are peer-to-peer
session protocols that are necessary for global VoIP service, especially interconnecting
heterogeneous service networks. MGCP is a device control protocol that provides a simple
and centralized mechanism of controlling media gateways. Similar control protocols are
H.248 (Megaco), Network-based Call Signaling Protocol Specification (NCS), and Skinny
Call Control Protocol (SCCP).
These protocols define specific security mechanisms as part of the protocols, or recommend
a combined solution with other security protocols, such as IP Security (IPSec), Transport
Layer Security (TLS), or Secure Real-time Transport Protocol (SRTP).
This chapter gives an overview of each of the following protocols and looks into its
security profiles at a high level:
• H.323
— Overview
— Security Profiles
H.235 Annex D (Baseline Security)
H.235 Annex E (Signature Security)
H.235 Annex F (Hybrid Security)
• SIP
— Overview
— Security Profiles
Digest Authentication
Identity Authentication
S/MIME
SRTP
TLS
IPSec
• MGCP
— Overview
— Security Profiles
Even though these security profiles are not enough to make the whole VoIP service secure,
they are essential elements as part of a comprehensive solution.
NOTE The detailed usage of SIP security is demonstrated in Chapter 7, “Protection with VoIP
Protocol.”
H.323
H.323¹ is the International Telecommunication Union (ITU) specification describing the
complete architecture and operations of audio and video communications across packetized
networks. H.323 was the first VoIP standard in public use, and it adopts Real-Time
Transport Protocol (RTP) to transport voice and video over the IP network. Since H.323
was first released in 1996, it has been updated with many enhancements; the latest
version, commonly referred to as H.323v6, was released in 2006.
Before looking into the security profiles, this section briefly summarizes H.323 as follows.
Overview
H.323 is an umbrella specification that encompasses many other protocols; in particular, the
following protocols are key components:
• H.225 (Q.931)—Defines call setup messages and procedures used to establish a call,
request changes in bandwidth of the call, get status of the endpoints in the call, and
disconnect the call. It also defines Registration, Admission, and Status (RAS) messages
and procedures.
• H.245—Defines control messages and procedures used to exchange capabilities (for
example, coder-decoder [codec]) and open the media channels.
• RTP/RTCP—Real-Time Transport Protocol (RTP) provides end-to-end network
transport functions suitable for applications transmitting real-time data, such as voice
and video. Real-Time Transport Control Protocol (RTCP) provides statistical
information on Quality of Service (QoS), such as packet loss, delay, and jitter.
• H.235—Defines security profiles for H.323, such as authentication, message
integrity, signature security, and voice encryption.
The contents in this section refer to the H.323 specification, and the first topic is its
components.
Components
The H.323 service network consists of several components: terminal, gateway (GW), gatekeeper
(GK), Multipoint Controller (MC), and Multipoint Control Unit (MCU). Each component has
different roles and functions to establish communications between end users.
A terminal (endpoint) is a user device, such as an IP phone. It contains a protocol stack
implementing the basic functionality of real-time communications, such as H.225, H.245,
and RTP. This endpoint communicates with a gatekeeper to send or receive a call.
A gatekeeper is a key component that provides call control services to endpoints. More than
one gatekeeper may be present and they may communicate with each other in an unspecified
fashion. The gatekeeper is logically separate from the endpoints; however, its physical
implementation may coexist with a terminal, MCU, gateway, MC, or other non-H.323
network device. When it is present in a system, the gatekeeper shall provide the following
services:
• Address Translation—Translate H.323 alias address to transport address. This
should be done using a translation table, which is updated using the registration
messages.
• Admissions Control—Authorize network access using H.225 messages (admission
request, confirm or reject). This may be based on call authorization, bandwidth, or
some other criteria that are left to the manufacturer. It may also be a null function,
which admits all requests.
• Bandwidth Control—Support bandwidth control messages (bandwidth request,
confirm or reject). It may also be a null function that accepts all requests.
• Zone Management—Provides the other three functions in this list for terminals,
MCUs, and gateways that have registered with it.
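As a rough illustration of the address translation, admission control, and bandwidth control services just listed, the following Python sketch models a gatekeeper whose translation table is updated by registration. It is a sketch under stated assumptions, not an H.323 implementation: the class name, method names, aliases, addresses, and the single zone-wide bandwidth budget are all hypothetical, and a real gatekeeper exchanges encoded RAS messages rather than Python tuples.

```python
from dataclasses import dataclass, field


@dataclass
class Gatekeeper:
    """Toy gatekeeper: address translation plus admission/bandwidth control."""
    zone_bandwidth_kbps: int                     # hypothetical zone-wide budget
    table: dict = field(default_factory=dict)    # alias -> transport address
    in_use_kbps: int = 0

    def register(self, alias: str, transport_addr: str) -> None:
        # Registration keeps the translation table up to date.
        self.table[alias] = transport_addr

    def admission_request(self, caller: str, callee: str, kbps: int):
        """Return ("ACF", callee transport address) or ("ARJ", reason)."""
        if caller not in self.table:
            return ("ARJ", "caller not registered")
        if callee not in self.table:
            return ("ARJ", "callee alias unknown")
        if self.in_use_kbps + kbps > self.zone_bandwidth_kbps:
            return ("ARJ", "insufficient bandwidth")
        self.in_use_kbps += kbps
        return ("ACF", self.table[callee])


gk = Gatekeeper(zone_bandwidth_kbps=128)
gk.register("alice", "192.0.2.10:1720")
gk.register("bob", "192.0.2.20:1720")
print(gk.admission_request("alice", "bob", 64))    # admitted, carries Bob's address
print(gk.admission_request("alice", "carol", 64))  # rejected, alias unknown
```

A gatekeeper that implements admission or bandwidth control as a null function would simply skip the corresponding checks and confirm every request.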
The gatekeeper may also perform other optional functions such as:
• Call Control Signaling—Process the call signaling with endpoints.
• Call Authorization—Authorize call attempts. Through the use of H.225 signaling,
the gatekeeper may reject calls from a terminal due to authorization failure. The
reasons for rejection may include restricted access to/from particular terminals or
gateways and restricted access during certain periods of time.
• Bandwidth Management—Control the number of H.323 terminals permitted
simultaneous access to the network. Through the use of the H.225.0 signaling, the
gatekeeper may reject calls from a terminal due to bandwidth limitations.
• Alias Address Modification—May return a modified alias address in an admission
confirm (ACF) so that the endpoint may use the alias address in establishing the
connection.
• Dialed Digit Translation—May translate dialed digits into an E.164 number or a
private network number.
The gatekeeper with these functions receives a call request from an endpoint and terminates
the call according to the routing and security policy. One of the common termination points
is a gateway, especially for a PSTN call.
A gateway is a translation device between an H.323 network and other networks, such as
ISDN or a mobile network. The typical usage of the gateway is enabling terminal users to
make calls to public switched telephone network (PSTN) users, and the reverse. It may also
be possible for an endpoint on one segment of the network to call out through one gateway
and back onto the network through another gateway in order to bypass a router or a low-
bandwidth link.
A Multipoint Controller (MC) is a control device supporting conferences between three or
more endpoints in a multipoint conference. The MC carries out the capabilities exchange
with each endpoint in a multipoint conference. The MC sends a capability set to the
endpoints in the conference indicating the operating modes in which they may transmit.
The MC may revise the capability set that it sends to the terminals as a result of terminals
joining or leaving the conference or for other reasons.
A Multipoint Control Unit (MCU) is an endpoint that provides support for multipoint
conferences. It uses H.245 messages and procedures to implement features. A gatekeeper
or gateway may also include the MCU as a separate module.
Now that you have learned about the functions of each component in H.323, the next
section takes a look at the basic call flow.
The method of implementing the steps varies depending on the service architecture, type of
service, and call scenarios. One of the typical call flows with a gatekeeper is shown in
Figure 3-1; Endpoint A makes a call to Endpoint B through Gatekeeper.
[Figure 3-1. Messages shown: ARQ (S1), ACF/ARJ (S2), Setup (S3), ACF/ARJ (S6),
Alerting (S7), Connect (S8), RTP/RTCP (S12)]
In the scenario shown in Figure 3-1, both endpoints are registered to the same gatekeeper,
and the gatekeeper has chosen direct call signaling. Here is the description of each signal:
• S1—Endpoint A (calling endpoint) initiates the ARQ (admission request) to Gatekeeper.
• S2—Gatekeeper responds with an ACF (admission confirm) that carries the Call
Signaling Channel Transport Address of Endpoint B (called endpoint).
• S3—Endpoint A then sends the Setup message directly to Endpoint B using that
Transport Address.
• S4—Endpoint B responds with Call Proceeding to indicate that it is processing the call.
• S5 and S6—If Endpoint B wants to accept the call, it initiates an ARQ/ACF
exchange with Gatekeeper. It is possible that an ARJ (admission reject) is received
by Endpoint B, in which case it sends Release Complete (disconnection) to
Endpoint A.
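The ordering of steps S1 through S6 can be sketched as a small Python function that returns the messages exchanged for a given admission outcome. This is only an illustration of the sequence described above; the function name, the tuple layout, and the "-" label used for the Release Complete branch (which the text does not number) are assumptions.

```python
def direct_call_signaling(caller_admitted: bool, callee_admitted: bool):
    """Return (step, sender, receiver, message) tuples for steps S1 to S6."""
    flow = [("S1", "Endpoint A", "Gatekeeper", "ARQ")]
    if not caller_admitted:
        # The gatekeeper may answer the first ARQ with a reject.
        flow.append(("S2", "Gatekeeper", "Endpoint A", "ARJ"))
        return flow
    flow.append(("S2", "Gatekeeper", "Endpoint A", "ACF"))
    flow.append(("S3", "Endpoint A", "Endpoint B", "Setup"))
    flow.append(("S4", "Endpoint B", "Endpoint A", "Call Proceeding"))
    flow.append(("S5", "Endpoint B", "Gatekeeper", "ARQ"))
    if callee_admitted:
        flow.append(("S6", "Gatekeeper", "Endpoint B", "ACF"))
    else:
        flow.append(("S6", "Gatekeeper", "Endpoint B", "ARJ"))
        # Endpoint B disconnects; this step has no label in the text.
        flow.append(("-", "Endpoint B", "Endpoint A", "Release Complete"))
    return flow


for step in direct_call_signaling(caller_admitted=True, callee_admitted=False):
    print(step)
```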
Security Profiles
H.235² describes security enhancements within the framework of H.323 to incorporate
security services such as authentication and privacy. The proposed scenario is applicable to
both simple point-to-point and multipoint conferences for any terminals that utilize H.245
as a control protocol.
NOTE This section uses many terms related to cryptography. For more detailed information, refer
to Chapter 4, “Cryptography.”
The latest version (3) of H.235 was released in 2003. Among other additions, it features a
procedure for encrypted dual-tone multifrequency (DTMF) signals, object identifiers for
the Advanced Encryption Standard (AES) algorithm for media payload encryption, the
enhanced OFB (EOFB; see the following Note) stream-cipher mode for encryption of media
streams, and an authentication-only option (see Note) for Network Address Translation
(NAT)/firewall traversal.
NOTE Output Feedback Mode (OFB) defines an operation mode that deploys a stream cipher
using block encryption algorithms. The OFB mode provides:
• Improved performance through reduced encryption processing delay
Enhanced OFB (EOFB) is a slightly modified OFB mode, which deploys the same features
as OFB but in addition to that:
• Uses a salting key (KS) in addition to the encryption key (KE)
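To make the idea of "a stream cipher built from a block encryption algorithm" concrete, here is a minimal Python sketch of plain OFB. It rests on loud assumptions: `block_function` is a stand-in built from SHA-256 so that the example runs with the standard library only (it is not AES and is not what H.235 specifies), the function names are hypothetical, and the salting key that EOFB adds is not modeled at all.

```python
import hashlib

BLOCK = 16  # block size in bytes, chosen to match a 128-bit block cipher


def block_function(key: bytes, block: bytes) -> bytes:
    # Stand-in for a real block cipher (an assumption for this sketch).
    # OFB only ever runs the cipher in the forward direction.
    return hashlib.sha256(key + block).digest()[:BLOCK]


def ofb_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt in OFB mode; the same operation does both."""
    out = bytearray()
    feedback = iv
    for i in range(0, len(data), BLOCK):
        feedback = block_function(key, feedback)            # O_i = E_K(O_{i-1})
        chunk = data[i:i + BLOCK]
        out.extend(b ^ k for b, k in zip(chunk, feedback))  # C_i = P_i xor O_i
    return bytes(out)


key, iv = b"0123456789abcdef", bytes(BLOCK)
ciphertext = ofb_xor(key, iv, b"RTP payload bytes go here")
print(ofb_xor(key, iv, ciphertext))  # recovers the original bytes
```

Because the keystream depends only on the key and the initial value, it can be computed before the media arrives, which is one way to read the reduced processing delay mentioned in the Note.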
NOTE H.235 uses the following terms for provisioning the security services:
• Authentication and integrity—This is a combined security service part of the
baseline profile that supports message integrity in conjunction with user authentication.
The user may ensure authentication by correctly applying a shared secret key
procedure. Both security services are provided by the same security mechanism.
• Authentication-only—This security service offered by the baseline security profile
as an option supports authentication of selected fields only, but does not provide full
message integrity. The authentication-only security profile is applicable for signaling
messages traversing NAT/firewall devices. The user may ensure authentication by
correctly applying a shared secret key procedure.
H.235 includes several annexes that each hold security profiles of H.235. A security profile
specifies specific usage of H.235 or a subset of H.235 functionality for well-defined
environments with scoped applicability.
Depending on the environment and application, security profiles may be implemented
either selectively or all together. Typically, H.235-enabled systems indicate within object
identifiers as part of signaling messages which security profiles they deploy. H.235-enabled
systems should select the security profile according to their needs.
The following sections describe the security profiles of H.235 that this section refers to.
                               Call Functions
Security Services              RAS                     H.225                   H.245           RTP
Authentication and integrity   Password,               Password,               Password,       –
                               HMAC-SHA1-96            HMAC-SHA1-96            HMAC-SHA1-96
Nonrepudiation                 –                       –                       –               –
Confidentiality                –                       –                       –               –
Access control                 –                       –                       –               –
Key management                 Subscription-based      Subscription-based      –               –
                               password assignment     password assignment
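The HMAC-SHA1-96 entries in the preceding table denote an HMAC computed with SHA-1 and truncated to its leftmost 96 bits. The Python sketch below shows that computation and the matching check on the receiving side. The function names are hypothetical, and `shared_secret` merely stands for the key that both sides hold as a result of password assignment; how H.235 derives that key and which message fields it covers are not modeled here.

```python
import hashlib
import hmac


def hmac_sha1_96(shared_secret: bytes, message: bytes) -> bytes:
    """HMAC-SHA1 truncated to its leftmost 96 bits (12 bytes)."""
    return hmac.new(shared_secret, message, hashlib.sha1).digest()[:12]


def check_tag(shared_secret: bytes, message: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(hmac_sha1_96(shared_secret, message), tag)


secret = b"hypothetical-shared-secret"
tag = hmac_sha1_96(secret, b"ARQ from Endpoint A")
print(check_tag(secret, b"ARQ from Endpoint A", tag))   # same message, same key
print(check_tag(secret, b"ARQ from Endpoint X", tag))   # altered message
```

A matching tag gives the receiver both services in the "Authentication and integrity" row at once: only a holder of the shared secret could have produced it, and any change to the message changes the tag.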
Optionally, the voice-encryption security profile can be combined smoothly with the
baseline security profile. Audio streams may be encrypted using the voice-encryption
security profile deploying Data Encryption Standard (DES), RC2-compatible or triple-
DES, and using the authenticated Diffie-Hellman key-exchange procedure.
The baseline security profile mandates the fast connect procedure with integrated key
management elements. Signaling means are provided also for tunneled H.245 key-update
and synchronization. For long-duration calls, these messages require tunneling of H.245
within H.225.0 messages.
That was a brief summary of baseline security. The next topic, Annex E, covers signature
security.
NOTE Nonrepudiation is the concept of ensuring that a communication party cannot deny the
validity of a message. In a digital signature, the private key is only accessible by its holder;
the signature proves that the message was signed “only” by the holder, which offers
nonrepudiation. For more detailed information about digital signatures, refer to Chapter 4,
“Cryptography.”
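The asymmetry behind nonrepudiation, where only the private-key holder can sign but anyone can verify, can be illustrated with a deliberately tiny textbook RSA signature in Python. Everything here is an assumption made for illustration: the key is built from two small Mersenne primes, no padding scheme is applied, and the function names are hypothetical. It is not the H.235 signature profile and must not be used to protect anything.

```python
import hashlib

# Toy key pair; far too small and too structured for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                               # public modulus
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    # Reduce the SHA-1 digest so that it fits under the toy modulus.
    return int.from_bytes(hashlib.sha1(message).digest(), "big") % N


def sign(message: bytes) -> int:
    # Requires D, which only the key holder knows.
    return pow(digest_as_int(message), D, N)


def verify_signature(message: bytes, signature: int) -> bool:
    # Requires only the public values N and E.
    return pow(signature, E, N) == digest_as_int(message)


sig = sign(b"Setup from Endpoint A")
print(verify_signature(b"Setup from Endpoint A", sig))
print(verify_signature(b"Setup from Endpoint B", sig))
```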
The signature security profile supports hop-by-hop security as well as true end-to-end
authentication with simultaneous use of H.235 proxies or intermediate gatekeepers.
The features provided by these profiles include, for RAS, H.225.0 and H.245 messages:
• User authentication to a desired entity irrespective of the number of application-level
hops that the message traverses.
• Integrity of all or critical portions (fields) of messages arriving at an entity irrespective
of the number of application-level hops that the message traverses. Integrity of the
message itself using a strongly generated random number is also optional.
• Application-level hop-by-hop message authentication, integrity, and nonrepudiation
provide these security services for the entire message.
• Nonrepudiation of messages exchanged between two entities irrespective of the
number of application-level hops that the message traverses can also be provided.
Specifically, the nonrepudiation is provided for critical portions (fields) of the
message. For instance, this may be the case when an endpoint sends a SETUP
message to its gatekeeper and the two (endpoint and gatekeeper) are separated by one
or more proxies.
Table 3-2 shows the scope of the signature security profile. An option within the profile is to
select between RSA-SHA1 and RSA-MD5 digital signatures.
Table 3-2 Signature Security Profile
                    Call Functions
Security Services   RAS                     H.225                   H.245                RTP
Authentication      SHA1/MD5                SHA1/MD5                SHA1/MD5             –
                    Digital signature       Digital signature       Digital signature
Nonrepudiation      SHA1/MD5                SHA1/MD5                SHA1/MD5             –
                    Digital signature       Digital signature       Digital signature
Integrity           SHA1/MD5                SHA1/MD5                SHA1/MD5             –
                    Digital signature       Digital signature       Digital signature
Confidentiality     –                       –                       –                    –
Access control      –                       –                       –                    –
Key management      Certificate Allocation  Certificate Allocation  –                    –
That was a brief summary of signature security. The next topic, Annex F, covers hybrid
security.
user mobility as well. It applies asymmetric cryptography with signatures and certificates
only where necessary and otherwise uses simpler and more efficient symmetric techniques.
It provides tunneling of H.245 messages for H.245 message integrity and also implements
some provisions for nonrepudiation of messages.
The hybrid security profile mandates the GK-routed model and is based on the H.245
tunneling techniques. Support for non GK-routed models is for further study.
Table 3-3 shows an overview of hybrid security with security mechanisms.
Table 3-3 Hybrid Security Profile
                    Call Functions
Security Services   RAS                       H.225                     H.245                  RTP
Authentication      RSA Digital Signature     RSA Digital Signature     RSA Digital Signature  –
                    (SHA1),                   (SHA1),                   (SHA1),
                    HMAC-SHA1-96              HMAC-SHA1-96              HMAC-SHA1-96
Nonrepudiation      (possible only on first   (possible only on first
                    message)                  message)
Integrity           RSA Digital Signature     RSA Digital Signature     RSA Digital Signature  –
                    (SHA1),                   (SHA1),                   (SHA1),
                    HMAC-SHA1-96              HMAC-SHA1-96              HMAC-SHA1-96
Confidentiality     –                         –                         –                      –
Access control      Certificate Allocation    Certificate Allocation    –                      –
Key management      (authenticated            (authenticated            –                      –
                    Diffie-Hellman            Diffie-Hellman
                    key-exchange)             key-exchange)
The preceding sections were a brief summary of hybrid security. So far, you have learned
about the components, basic call flow, and security profiles of H.323. The next section
covers those same points with SIP.
SIP
SIP (RFC 3261³) is an application-layer control protocol that can establish, modify, and
terminate multimedia sessions such as Internet telephony (VoIP) calls. SIP can also invite
participants to already existing sessions, such as multicast conferences. Media can be added
to (and removed from) an existing session. SIP transparently supports name mapping and
redirection services, which supports personal mobility—users can maintain a single
externally visible identifier regardless of their network location.
In this section, you learn about the security profiles of SIP, as well as the components, basic
call flow, and session setup examples, based on RFC 3261.
Overview
SIP is not a vertically integrated communications system. SIP is rather a component that
can be used with other protocols to build a complete multimedia architecture.
Typically, these architectures will include protocols such as RTP for transporting real-time
data and providing QoS feedback, Real-Time Streaming Protocol (RTSP) for controlling
delivery of streaming media, MGCP for controlling gateways to the PSTN, and Session
Description Protocol (SDP) for describing multimedia sessions. Therefore, SIP should be
used in conjunction with other protocols to provide complete services to the users.
However, the basic functionality and operation of SIP does not depend on any of these
protocols.
SIP also provides a suite of security services, which include Denial-of-Service (DoS)
prevention, authentication (both user-to-user and proxy-to-user), integrity protection, and
encryption and privacy services.
SIP supports five facets of establishing and terminating multimedia communications:
1 User location—Determination of the end system to be used for communication.
2 User availability—Determination of the willingness of the called party to engage in
communications.
3 User capabilities—Determination of the media and media parameters to be used.
The remaining two facets defined in RFC 3261 are session setup (“ringing” and the
establishment of session parameters at both the called and calling party) and session
management (including transfer and termination of sessions, modifying session
parameters, and invoking services).
Components
In SIP, there are two logical entities: User Agent Client (UAC) and User Agent Server
(UAS), often just called User Agents.
A UAC is a logical entity that creates a new request, and then uses the client transaction state
machinery to send it. The role of UAC lasts only for the duration of that transaction. In other
words, if a piece of software initiates a request, it acts as a UAC for the duration of that
transaction. If it receives a request later, it assumes the role of a user agent server for the
processing of that transaction.
On the other hand, a UAS is a logical entity that receives a request and generates a response
to that SIP request. The UAS accepts, rejects, or redirects the request. This role lasts only
for the duration of that transaction. In other words, if a piece of software responds to a
request, it acts as a UAS for the duration of that transaction. If it generates a request later,
it assumes the role of a user agent client for the processing of that transaction.
Therefore, all IP phones supporting the SIP protocol can be either UAC or UAS depending
on the direction of the call request.
It is possible to make a call directly between endpoints (that is, end-to-end call setup), but
in most cases, servers are involved in the communication for authentication, call routing,
advanced feature services, and so on. There are four servers in SIP: registrar, redirect, and
proxy servers, and a Back-to-Back User Agent (B2BUA). Here is the description of each
server.
1 Registrar—A registration server that accepts REGISTER requests and places the
information it receives in those requests into the location service for the domain it
handles. It maintains a list of bindings that are accessible to proxy servers and redirect
servers within its administrative domain.
2 Redirect server—A redirect server that generates 3xx responses to requests it
receives, directing the client to contact an alternate set of uniform resource identifiers
(URIs).
3 Proxy server—An intermediary entity that acts as both a server and a client for the
purpose of making requests on behalf of other clients. A proxy server primarily plays
the role of routing, which means its job is to ensure that a request is sent to another
entity “closer” to the targeted user. A proxy server is also useful for enforcing policy
(for example, making sure a user is allowed to make a call). It interprets, and, if
necessary, rewrites specific parts of a request message before forwarding it.
4 Back-to-Back User Agent—A logical entity that receives a request and processes it
as a UAS. To determine how the request should be answered, it acts as a UAC and
generates requests. Unlike a proxy server, it maintains dialog state and must
participate in all requests sent on the dialogs it has established. Because it is a
concatenation of a UAC and UAS, no explicit definitions are needed for its behavior.
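How the registrar, proxy server, and redirect server relate to the location service can be sketched in a few lines of Python. The sketch is an assumption-laden simplification: the class and function names are hypothetical, the contact address is invented for the example, and real servers parse full SIP messages, handle expiry of bindings, and apply policy before routing.

```python
class LocationService:
    """Maps a user's public SIP URI to the contact URIs currently registered."""

    def __init__(self):
        self.bindings = {}

    def add_binding(self, public_uri: str, contact: str) -> None:
        self.bindings.setdefault(public_uri, []).append(contact)

    def lookup(self, public_uri: str) -> list:
        return list(self.bindings.get(public_uri, []))


def registrar_handle_register(service, public_uri: str, contact: str) -> str:
    # A registrar places the information from a REGISTER into the location service.
    service.add_binding(public_uri, contact)
    return "200 OK"


def proxy_route(service, request_uri: str):
    # A proxy forwards the request to an entity closer to the targeted user.
    contacts = service.lookup(request_uri)
    return ("forward", contacts[0]) if contacts else ("404 Not Found", None)


def redirect_route(service, request_uri: str):
    # A redirect server answers with a 3xx response listing alternate URIs.
    contacts = service.lookup(request_uri)
    return ("302 Moved Temporarily", contacts) if contacts else ("404 Not Found", [])


service = LocationService()
registrar_handle_register(service, "sip:UserB@example.com", "sip:UserB@192.0.2.201")
print(proxy_route(service, "sip:UserB@example.com"))
print(redirect_route(service, "sip:UserB@example.com"))
```

The difference between the two routing functions mirrors the text: the proxy takes on the job of forwarding, whereas the redirect server hands the alternate addresses back to the client and steps out of the way.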
These servers could be separate entities physically, or integrated in a single machine, for
example, a softswitch. An example of the relationship between a server and a client (UAC)
is shown in Figure 3-2.
Now that you are aware of the components of SIP, the next section takes a look at the basic
call flow to understand more about SIP.
[Figure: SIP call flow. Messages shown: ACK (M3), INVITE (M4), INVITE (M5),
100 Trying (M6), 180 Ringing (M7), 180 Ringing (M8), 200 OK (M9), 200 OK (M10),
ACK (M11), ACK (M12), RTP/RTCP, BYE (M13), BYE (M14), 200 OK (M15), 200 OK (M16)]
• M6—Notifies that the proxy server received the request and continues to process it.
• M7 and M8—UserB sends 180 Ringing when the phone is ringing.
• M9 and M10—UserB sends 200 OK with SDP when picking up the phone.
• M11 and M12—Acknowledgment. After this, the media channel (RTP/RTCP) is
opened.
• M13 and M14—UserB sends BYE when hanging up the phone.
• M15 and M16—Confirmation of disconnecting.
This is a typical example of call flow to set up and disconnect a SIP dialog among a UAC,
a UAS, and a proxy server.
The following section shows examples of actual messages with detailed information.
[Figure 3-3. Messages shown: INVITE (M1), INVITE (M2), 100 Trying (M3),
180 Ringing (M4), 180 Ringing (M5), 200 OK (M6), 200 OK (M7), ACK (M8), ACK (M9),
RTP/RTCP]
Example 3-1 shows the initial INVITE (M1) from UserA to proxy server.
Example 3-1 M1
v=0
o=UserA 2890844526 2890844526 IN IP4 userAclient.example.com
s=-
c=IN IP4 192.0.2.101
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000
UserA calls UserB using his SIP identity, a type of URI called a SIP URI. In this case, it
is sip:UserB@example.com, where example.com is the domain of UserB’s SIP service
provider. UserA has a SIP URI of sip:UserA@example.com.
SIP also provides a secure URI, called a SIPS URI. An example would be
sips:UserB@example.com. A call made to a SIPS URI guarantees that secure, encrypted transport
(namely TLS) is used to carry all SIP messages from the caller to the domain of the callee.
From there, the request is sent securely to the callee, but with security mechanisms that
depend on the policy of the domain of the callee.
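The difference between the two URI schemes can be made visible with a short Python sketch that splits a URI into its parts and flags whether secure transport is required. This is a simplified illustration only: the function name and the `requires_tls` key are hypothetical, and the parser ignores ports, passwords, URI parameters, and headers that the full SIP URI syntax allows.

```python
def parse_sip_uri(uri: str) -> dict:
    """Split a simple sip: or sips: URI into scheme, user, and host."""
    scheme, sep, rest = uri.partition(":")
    scheme = scheme.lower()
    if sep != ":" or scheme not in ("sip", "sips"):
        raise ValueError(f"not a SIP or SIPS URI: {uri!r}")
    user, at, host = rest.rpartition("@")
    if not host:
        raise ValueError(f"missing host in {uri!r}")
    return {
        "scheme": scheme,
        "user": user if at else None,      # the user part is optional
        "host": host,
        "requires_tls": scheme == "sips",  # SIPS asks for TLS toward the callee's domain
    }


print(parse_sip_uri("sip:UserB@example.com"))
print(parse_sip_uri("sips:UserB@example.com"))
```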
SIP is based on an HTTP-like request/response transaction model. Each transaction consists
of a request that invokes a particular method, or function, on the server and at least one
response. In this example, the transaction begins with UserA’s IP phone sending an INVITE
request addressed to UserB’s SIP URI. INVITE is an example of a SIP method that specifies
the action that the requestor (UserA) wants the server (UserB) to take. The INVITE request
contains a number of header fields. Header fields are named attributes that provide additional
information about a message. The ones present in an INVITE include a unique identifier
for the call, the destination address, UserA’s address, and information about the type of
session that UserA wishes to establish with UserB.
The first line of the text-encoded message contains the method name (INVITE). The lines
that follow are a list of header fields. This example contains a minimum required set; the
following six headers are mandatory. The header fields are briefly described in the following
list:
• Via—Contains the address (userAclient.example.com) at which UserA is expecting
to receive responses to this request. It also contains a branch parameter that identifies
this transaction.
• Max-Forwards—Serves to limit the number of hops a request can make on the way
to its destination. It consists of an integer that is decremented by one at each hop.
• From—Contains a display name (UserA) and a SIP or SIPS URI
(sip:UserA@example.com) that indicate the originator of the request. This header
field also has a tag parameter containing a random string (9fxced76sl) that was added
to the URI by the UAC. It is used for identification purposes.
• To—Contains a display name (UserB) and a SIP or SIPS URI (sip:UserB@example.com)
toward which the request was originally directed.
• Call-ID—Contains a globally unique identifier for this call, generated by the
combination of a random string and the IP phone’s host name or IP address. The
combination of the To tag, From tag, and Call-ID completely defines a peer-to-peer
SIP relationship between UserA and UserB and is referred to as a dialog.
• CSeq or Command Sequence—Contains an integer and a method name. The CSeq
number is incremented for each new request within a dialog and is a traditional
sequence number.
The following three headers also can be used for specific purposes, even though they are
not mandatory.
• Contact—Contains a SIP or SIPS URI that represents a direct route to contact UserA,
usually composed of a username at a fully qualified domain name (FQDN). The Via
header field tells other elements where to send the response, and the Contact header
field tells other elements where to send future requests.
• Content-Type—Contains a description of the message body, which is typically
application/sdp (described next).
• Content-Length—Contains an octet (byte) count of the message body.
The remaining portion is the body of a SIP message, typically SDP (RFC 4566), which
contains the description of the session, such as the type of media, codec, and port (“m=”
line), IP address (“c=” line), and sampling rate (“a=” line).
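As an illustration of how the header fields and the SDP body described above fit together, the following Python sketch assembles an INVITE as text. It is a sketch under stated assumptions rather than a reproduction of the book's Example 3-1: the function name and parameters are hypothetical, the Call-ID value is invented, and Max-Forwards is set to 70 (the initial value that RFC 3261 recommends). The SDP lines, branch, and tag are the ones that appear in the examples of this section.

```python
def build_invite(caller, callee, domain, client_host, call_id, branch, tag, sdp):
    """Assemble an INVITE with the header fields described above plus an SDP body."""
    # SIP uses CRLF line endings in both the headers and the SDP body.
    body = "".join(line + "\r\n" for line in sdp.strip().splitlines())
    lines = [
        f"INVITE sip:{callee}@{domain} SIP/2.0",                   # method and Request-URI
        f"Via: SIP/2.0/UDP {client_host}:5060;branch={branch}",    # where responses go
        "Max-Forwards: 70",                                        # hop limit
        f"From: {caller} <sip:{caller}@{domain}>;tag={tag}",
        f"To: {callee} <sip:{callee}@{domain}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}@{client_host}>",                  # where future requests go
        "Content-Type: application/sdp",
        f"Content-Length: {len(body.encode('utf-8'))}",            # octet count of the body
    ]
    return "\r\n".join(lines) + "\r\n\r\n" + body


SDP = """v=0
o=UserA 2890844526 2890844526 IN IP4 userAclient.example.com
s=-
c=IN IP4 192.0.2.101
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000
"""

invite = build_invite(
    "UserA", "UserB", "example.com", "userAclient.example.com",
    call_id="hypothetical-call-id@userAclient.example.com",  # invented value
    branch="z9hG4bK74bf9", tag="9fxced76sl", sdp=SDP,
)
print(invite)
```

Computing Content-Length from the encoded body, instead of hard-coding it, keeps the header consistent whenever the session description changes.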
Example 3-2 shows INVITE (M2) from proxy server to UserB in Figure 3-3.
Example 3-2 M2
v=0
o=UserA 2890844526 2890844526 IN IP4 userAclient.example.com
s=-
c=IN IP4 192.0.2.101
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000
Example 3-3 shows 100 Trying (M3) from proxy server to UserA in Figure 3-3.
Example 3-3 M3
The proxy server receives SIP requests and forwards them on behalf of the requestor. In this
example, the proxy server receives the INVITE request and sends a 100 Trying response
back to UserA’s IP phone. The 100 Trying response indicates that the INVITE has been
received and that the proxy is working on UserA’s behalf to route the INVITE to the
destination. Responses in SIP use a three-digit code followed by a descriptive phrase. This
response contains the same To, From, Call-ID, CSeq, and branch parameter in the Via as in
the INVITE, which allows UserA’s IP phone to correlate this response to the sent INVITE.
Before forwarding the request, the proxy server adds an additional Via header field value
that contains its own address (the INVITE already contains UserA’s address in the first Via).
The proxy server consults a database, generically called a location service, which contains
the current IP address of UserB.
Example 3-4 shows 180 Ringing (M4) from UserB to proxy server in Figure 3-3.
Example 3-4 M4
Example 3-5 shows 180 Ringing (M5) from proxy server to UserA in Figure 3-3.
Example 3-5 M5
UserB’s SIP phone receives the INVITE and alerts UserB to the incoming call from UserA
so that UserB can decide whether to answer the call, that is, UserB’s phone rings. UserB’s
SIP phone indicates this in a 180 Ringing response, which is routed back through the proxy
in the reverse direction. The proxy uses the Via header field to determine where to send the
response and removes its own address from the top.
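The way a proxy uses Via header field values, adding its own on a forwarded request and removing it from the response on the way back, can be sketched in Python as simple list handling. The function names are hypothetical and the ownership check is a plain substring match chosen for brevity; the two Via values are the ones shown in the 200 OK examples of this section.

```python
def add_via(via_stack: list, proxy_via: str) -> list:
    """Via list of the forwarded request: the proxy's entry goes on top."""
    return [proxy_via] + via_stack


def strip_own_via(via_stack: list, proxy_host: str) -> list:
    """Via list of the relayed response: the proxy removes its own top entry."""
    if not via_stack or proxy_host not in via_stack[0]:
        raise ValueError("topmost Via does not belong to this proxy")
    return via_stack[1:]


ua_via = "SIP/2.0/UDP userAclient.example.com:5060;branch=z9hG4bK74bf9"
proxy_via = "SIP/2.0/UDP ss2.example.com:5060;branch=z9hG4bK2d4790.1"

request_vias = add_via([ua_via], proxy_via)        # as seen by UserB
response_vias = strip_own_via(request_vias, "ss2.example.com")
print(request_vias)
print(response_vias)                               # only UserA's Via remains
```

After the proxy strips its entry, the remaining topmost Via tells it where to send the response, which is how the response retraces the request path in reverse.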
When UserA’s IP phone receives the 180 Ringing response, it passes this information to
UserA, perhaps using an audio ringback tone or by displaying a message on UserA’s screen.
Example 3-6 shows 200 OK (M6) from UserB to proxy server in Figure 3-3.
Example 3-6 M6
SIP/2.0 200 OK
Via: SIP/2.0/UDP ss2.example.com:5060;branch=z9hG4bK2d4790.1
;received=192.0.2.222
Via: SIP/2.0/UDP userAclient.example.com:5060;branch=z9hG4bK74bf9
;received=192.0.2.101
From: UserA <sip:UserA@example.com>;tag=9fxced76sl
To: UserB <sip:UserB@example.com>;tag=314159
Call-ID: [email protected]
CSeq: 1 INVITE
Contact: <sip:[email protected]>
Content-Type: application/sdp
Content-Length: 147
v=0
o=UserB 2890844527 2890844527 IN IP4 userBclient.example.com
s=-
c=IN IP4 192.0.2.201
t=0 0
m=audio 3456 RTP/AVP 0
a=rtpmap:0 PCMU/8000
Example 3-7 shows 200 OK (M7) from proxy server to UserA in Figure 3-3.
Example 3-7 M7
SIP/2.0 200 OK
Via: SIP/2.0/UDP userAclient.example.com:5060;branch=z9hG4bK74bf9
;received=192.0.2.101
From: UserA <sip:UserA@example.com>;tag=9fxced76sl
To: UserB <sip:UserB@example.com>;tag=314159
Call-ID: [email protected]
CSeq: 1 INVITE
Contact: <sip:[email protected]>
Content-Type: application/sdp
Content-Length: 147
v=0
o=UserB 2890844527 2890844527 IN IP4 userBclient.example.com
s=-
c=IN IP4 192.0.2.201
t=0 0
m=audio 3456 RTP/AVP 0
a=rtpmap:0 PCMU/8000
In this example, UserB decides to answer the call. When he picks up the handset, his SIP
phone sends a 200 OK response to indicate that the call has been answered. The 200 OK
contains a message body with the SDP media description of the type of session that UserB
Security Profiles
The SIP protocol describes several security features and their usage guidelines. The main
features are as follows:
• Digest authentication
• Identity authentication
• Message encryption (S/MIME)
• Media encryption (SRTP)
• Transport layer security (TLS)
• Network layer security (IPSec)
Digest Authentication
SIP provides challenge-based Digest authentication, which is derived from HTTP authentication.
The challenge runs in one direction, either between a UAC and a UAS (including a registrar) or
between a user agent (UA) and a proxy server.
When a UAS, proxy, or registrar receives a request, it may challenge the request to obtain
assurance of the originator’s identity. The originator can reply with its credentials protected
by a hash (for example, MD5), or reject the challenge. When the credentials are received,
the server verifies them and sends back the corresponding response code, such as
401 (Unauthorized) or 200 (OK).
The high-level mechanism is shown in Figure 3-4.
[Figure 3-4: Digest authentication between SIP client and SIP server: (1) initial request
(INVITE); (2) challenge (407 Auth Required); (3) request with encrypted credentials (MD5);
(4) authorized or unauthorized.]
Because it sends credentials essentially in cleartext, the earlier Basic authentication
method (RFC 2543) is no longer acceptable: a server is expected to reject or ignore it.
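The Digest exchange described above can be sketched in a few lines. This is a minimal illustration that assumes the response formula of RFC 2617 (listed in the References); the function names and the sample SIP values are hypothetical and are not taken from this chapter.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the lowercase hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None):
    """Compute the Digest 'response' value in the style of RFC 2617.

    The password itself is never sent; only this hash goes on the wire.
    """
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # depends on the shared secret
    ha2 = md5_hex(f"{method}:{uri}")                 # depends on the request
    if qop is None:
        # Form used when the challenge carries no qop directive
        return md5_hex(f"{ha1}:{nonce}:{ha2}")
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Hypothetical values: a client answering a challenge to its INVITE
response = digest_response(
    username="usera", realm="example.com", password="secret",
    method="INVITE", uri="sip:ss1.example.com", nonce="f84f1cec41e6cbe5aea9c8e88d359",
)
```

The server performs the same computation with its stored copy of the secret and compares the two values. Because the nonce comes from the server's challenge, a captured response is of limited use against a later challenge that carries a different nonce.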
NOTE This section uses many cryptographic terms and methods. For more detailed information,
refer to Chapter 4, “Cryptography.”
Identity Authentication
In general, the “From” header in a SIP request message contains the identity of an
originator (address-of-record; see the following Note), and the originator may manipulate
or spoof the identity when making a call. This type of identity issue, and an authentication
mechanism for SIP that addresses it, is defined in RFC 4474.⁴
NOTE An address-of-record (AoR) is a SIP or SIPS URI that points to a domain with a location
service that can map the URI to another URI where the user might be available. Typically,
the location service is populated through registrations. An AoR is frequently thought of as
the “public address” of the user.
RFC 3261 itself does not define a robust mechanism for securely identifying the originators
of SIP requests. Instead, it recommends that a user agent authenticate itself to a
local proxy server, which in turn authenticates itself to a remote proxy server via mutual
TLS, creating a two-link chain of transitive authentication between the originator and
the remote domain. This transitive trust is inherently weaker than an assertion that can be
validated end-to-end. It is possible for SIP requests to cross multiple intermediaries in
separate administrative domains, in which case transitive trust becomes even less
compelling.
One solution to this problem is to use “trusted” SIP intermediaries that assert an identity
for users in the form of a privileged SIP header. A mechanism for doing so (with the
P-Asserted-Identity header) is given in RFC 3325. However, this solution allows only hop-
by-hop trust between intermediaries, not end-to-end cryptographic authentication, and it
assumes a managed network of nodes with strict mutual trust relationships, an assumption
that is incompatible with widespread Internet deployment.
Accordingly, RFC 4474 specifies a means of sharing a cryptographic assurance of end-user
SIP identity in an interdomain or intradomain context that is based on the concept of an
“authentication service” and a new SIP header, the Identity header.
The RFC 4474 specification allows either a user agent or a proxy server to provide identity
services and to verify identities. To maximize end-to-end security, it is obviously preferable
for end users to acquire their own certificates and corresponding private keys; if they do,
they can act as an authentication service. However, end-user certificates may be neither
practical nor affordable, given the difficulties of establishing a Public Key Infrastructure
(PKI) that extends to end users. Accordingly, in the initial use of this mechanism, it is likely
that intermediaries will instantiate the authentication service role.
Here is a usage example: Imagine that Alice, whose home proxy is example.com and whose
address-of-record (AoR) is sip:[email protected], wants to communicate with Bob,
sip:[email protected].
Alice generates an INVITE and places her identity in the From header field of the request.
She then sends an INVITE over TLS to an authentication service proxy for her domain. The
authentication service authenticates Alice (possibly by sending a Digest authentication
challenge) and validates that she is authorized to assert the identity that is populated in the
From header field. This value may be Alice’s AoR, or it may be some other value that the
policy of the proxy server permits her to use. It then computes a hash over some particular
headers, including the From header field and the body of the message (which usually
contains SDP). This hash is signed with the private key that corresponds to the certificate
for the domain (example.com, in Alice’s case) and inserted in a new header field in the SIP
message, the “Identity” header.
The authentication service, as the holder of the private key of its domain, is asserting that
the originator of this request has been authenticated and that she is authorized to claim the
identity (the SIP address-of-record) that appears in the From header field. The proxy also
inserts a companion header field, Identity-Info, that tells Bob how to acquire its certificate,
if he does not already have it.
When Bob’s domain receives the request, it verifies the signature provided in the Identity
header, and thus can validate that the domain indicated by the host portion of the AoR in
the From header field authenticated the user, and permitted the user to assert that From
header field value. This same validation operation may be performed by Bob’s UAS.
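The hashing step performed by the authentication service can be sketched as follows. This is an illustration only: it assumes the field order and the SHA-1 hash that RFC 4474 specifies for the signed string, it stops before the RSA signature (which would require the domain's private key), and all header values are hypothetical.

```python
import hashlib


def identity_digest_string(from_uri, to_uri, call_id, cseq, date,
                           contact_uri, body):
    """Join the fields covered by the Identity signature with '|'.

    Assumed order (per RFC 4474): From URI, To URI, Call-ID, CSeq,
    Date, Contact URI, message body.
    """
    return "|".join([from_uri, to_uri, call_id, cseq, date, contact_uri, body])


def identity_hash(digest_string: str) -> bytes:
    """SHA-1 hash that the authentication service would then sign."""
    return hashlib.sha1(digest_string.encode("utf-8")).digest()


# Hypothetical request from Alice to Bob
fields = dict(
    from_uri="sip:alice@example.com",
    to_uri="sip:bob@example.org",
    call_id="a84b4c76e66710",
    cseq="314159 INVITE",
    date="Thu, 21 Feb 2002 13:02:03 GMT",
    contact_uri="sip:alice@pc33.example.com",
    body="v=0\r\n",
)
hash_to_sign = identity_hash(identity_digest_string(**fields))
```

Because the From URI and the body are both part of the hashed string, a change to either one in transit yields a different hash, and the verifier in Bob's domain would reject the signature.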
Secure RTP
Secure RTP (SRTP) is an extension of RTP, which provides security features, such as
encryption and authentication.
The method of securing RTP packets was not defined when SIP (RFC 3261) was released.
In 2004, researchers from Cisco and Ericsson authored the specification, which the IETF
published as RFC 3711. It provides a framework for encryption and message authentication
of RTP and RTCP streams (note that SRTP includes SRTCP in this context).
SRTP has not yet been widely deployed for VoIP services because of issues such as
performance, implementation complexity, and interoperability. However, it is a critical
technology for ensuring the confidentiality and integrity of media streams.
SRTP uses a common security mechanism in which the communicating parties share keys and
use them to encrypt and decrypt RTP packets. Chapter 7 demonstrates the usage of SRTP.
TLS
As mentioned before, fully encrypting a message is almost impossible within public service
networks because intermediary servers must read parts of it. We therefore need a lower-layer
security mechanism that encrypts entire SIP requests and responses on the wire to
provide confidentiality and integrity of messages.
[Figure: TLS handshake between client and server, messages M1 through M9, ending with
Finished (M9) and followed by Application Data.]
• M1 and M2—The client hello and server hello are used to establish security capabilities
between client and server, such as protocol version, session ID, cipher suite, and
compression method.
• M3—Following the hello messages, the server will send its certificate containing the
server’s public key, name, and Certificate Authority (CA), for example, VeriSign. The
client may contact the CA to confirm that the certificate is authentic.
• M4—Hello done message, indicating that the hello-message phase of the handshake
is complete. The server will then wait for a client response.
• M5—With the client key exchange message, the pre-master secret is set: the client
encrypts a random number with the server’s public key, and both sides use it to
generate session keys for the connection.
• M6—Change cipher spec message is sent by the client, and the client copies the
pending Cipher Spec into the current Cipher Spec.
• M7—The client then immediately sends the finished message under the new
algorithms, keys, and secrets.
• M8 and M9—In response, the server will send its own change cipher spec message,
transfer the pending to the current Cipher Spec, and send its finished message under
the new Cipher Spec. At this point, the handshake is complete, and the client and
server may begin to exchange application layer data.
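In practice, an application does not build the handshake messages above by hand; a TLS library exchanges them when the connection is opened. The following sketch, which assumes Python's standard ssl module and a hypothetical helper name, prepares a client-side configuration that insists on certificate verification (the check described for M3).

```python
import ssl


def make_sip_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS configuration for a SIP-over-TLS connection."""
    # The default context verifies the server certificate chain and host name
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_sip_tls_context()
# With a connected TCP socket `sock` to a proxy (hypothetical host name),
# the full handshake would run inside this call:
#   tls_sock = context.wrap_socket(sock, server_hostname="proxy.example.com")
```

Only after the handshake completes does the application start sending SIP requests as application data.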
Typically, SIP uses TLS to provide hop-by-hop security in the service network and
eventually give end-to-end security between UAs. For example, think about this kind of
common situation: User agent A tries to make a call to user agent B through A’s proxy
server and B’s proxy server. Also, there is no trust between A and B, but A trusts A’s proxy
server and B trusts B’s proxy server through TLS (or another way like IPSec). In this case,
we can provide end-to-end security by exchanging certificates between A’s and B’s proxy
server through TLS.
NOTE Transport mechanisms are specified on a hop-by-hop basis in SIP, so a user agent that sends
requests over TLS to a proxy server has no assurance that TLS will be used end-to-end.
The following section describes another lower-layer security mechanism, IPSec, which SIP
also recommends.
IPSec
IPSec is a suite of network-layer protocols that secure IP network communications by
encrypting and authenticating data. It is generally used for Virtual Private Network (VPN)
connections.
Basically, the IPSec protocol (network layer) is independent of the SIP protocol
(application layer) and there is no required integration between them. Unlike the integration
with TLS, SIP does not provide any indication of IPSec in the messages. However,
practically speaking, IPSec is very useful to provide security between SIP entities,
especially between a UA and a proxy server. UAs that have a preshared keying relationship
with their first-hop proxy server are good candidates to use IPSec.
Implementers should treat IPSec as a security mechanism separate from the SIP protocol,
because IPSec is usually deployed at the operating system level in a host, or on a security
gateway (for example, a VPN server) that provides confidentiality and integrity for all
traffic that it receives from a particular interface.
In this section, you have learned about the security profiles of SIP, such as Digest
authentication, identity authentication, S/MIME, SRTP, TLS, and IPSec. The next section
covers another VoIP protocol, MGCP.
MGCP
MGCP was initially defined in RFC 2705 as a control protocol for media gateways, and was
updated in 2003 as RFC 3435. RFC 3435 is still MGCP version 1 because it makes only
minor updates and error fixes. As a variant, the PacketCable organization adapted this
protocol and released NCS, which is available on the PacketCable website. The content in
this section refers to RFC 3435.⁶
Overview
As the name Media Gateway Control Protocol (MGCP) implies, it is a protocol based on a
master-slave relation between entities. The master is called Call Agent, which controls the
slave, called Media Gateway. Figure 3-6 illustrates the transaction between the elements;
the call agent initiates transactions (commands) to manage or configure the media gateway.
The protocol is text-based and offers a set of simple primitives.
[Figure 3-6: The Call Agent sends commands to the Media Gateway, and the Media Gateway
returns responses.]
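Because the protocol is text-based, the first line of a command is easy to take apart. The sketch below assumes the command-line layout of RFC 3435 (verb, transaction identifier, endpoint name, protocol version); the class name, function name, and endpoint name are hypothetical.

```python
from typing import NamedTuple


class MgcpCommand(NamedTuple):
    verb: str            # for example CRCX, MDCX, RQNT, NTFY
    transaction_id: int  # echoed by the gateway in its response
    endpoint: str        # the gateway endpoint the command applies to
    version: str         # protocol name and version


def parse_command_line(line: str) -> MgcpCommand:
    """Split the first line of an MGCP command into its four fields."""
    # The version field itself contains a space ("MGCP 1.0"), so split
    # at most three times.
    verb, transaction_id, endpoint, version = line.strip().split(" ", 3)
    return MgcpCommand(verb.upper(), int(transaction_id), endpoint, version)


command = parse_command_line("CRCX 1204 aaln/1@rgw.example.net MGCP 1.0")
```

The transaction identifier is what lets the call agent match each response from the gateway to the command it sent.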
A media gateway is a network element that provides conversion between the audio signals
carried on telephone circuits and data packets carried over the Internet or over other packet
networks. Some examples of media gateways are:
• Trunking gateways—Interface between the telephone network and a VoIP network.
Such gateways typically manage a large number of digital circuits.
• Residential gateways—Provide a traditional analog (RJ11) interface to a VoIP
network. Examples of residential gateways include cable modem/cable set-top boxes,
DSL devices, and broadband wireless devices.
• Access gateways—Provide a traditional analog (RJ11) or digital PBX interface to a
VoIP network. Examples of access gateways include small-scale VoIP gateways.
MGCP assumes a call control architecture where the call control “intelligence” is outside
the gateways and handled by external call control elements. The MGCP assumes that these
call control elements (call agents) will synchronize with each other to send coherent
commands to the gateways under their control. MGCP does not define a mechanism for
synchronizing call agents.
Security Profiles
MGCP does not specify a security profile of its own, but refers to lower-layer
security protocols. It recommends that MGCP messages always be carried over secure
Internet connections, as defined in IPSec using either the IP Authentication Header (AH) or
the IP Encapsulating Security Payload (ESP). The complete MGCP protocol stack would
thus include the layers in Figure 3-8.
[Figure: MGCP call flow among User A, Gateway A, the Call Agent, Gateway B, and User B.
Each command is acknowledged with an ACK. Sequence: Off-Hook NTFY; RQNT (Dialtone);
NTFY (Digits); CRCX (Recvonly); CRCX (Sendrecv); MDCX (Recvonly); RQNT (Ringback,
Ringing); RQNT; Off-Hook NTFY; MDCX (Sendrecv); RTP/RTCP.]
[Figure 3-8: MGCP protocol stack, from top to bottom]
MGCP
UDP
IP Security (Authentication or Encryption)
IP
Transmission Media
Adequate protection of the connections will be achieved if the gateways and the call agents
only accept messages for which IP security provided an authentication service. An encryption
service will provide additional protection against eavesdropping or traffic analysis, thus
preventing third parties from monitoring the connections set up by a given endpoint.
The encryption service will also be useful if the session descriptions are used to carry
session keys, as defined in SDP.
These procedures do not necessarily protect against Denial-of-Service attacks by misbehaving
gateways or misbehaving call agents. However, they will provide an identification of these
misbehaving entities, which should then be deprived of their authorization through
maintenance procedures.
For the protection of media connections, MGCP allows the call agent to provide gateways
with “session keys” that can be used to encrypt the audio messages, protecting against
eavesdropping, based on RFC 4568.
A specific problem of packet networks is “uncontrolled barge-in.” This attack can be
performed by directing media packets to the IP address and UDP port used by a connection.
If no protection of the media is implemented, the packets will be decoded and played to the
user. A basic protection against this attack is to only accept packets from communication
parties; however, this tends to conflict with RTP principles. This also has two issues:
• It slows down connection establishment—To enable the address-based protection,
the call agent must obtain the source address of the egress gateway and pass it to the
ingress gateway (see Note). This requires at least one network round trip, and leaves
us with a dilemma: either allow the call to proceed without waiting for the round trip
to complete, and risk for example “clipping” a remote announcement; or wait for the
full round trip and settle for slower call setup procedures.
Summary
VoIP protocols (SIP, H.323, and MGCP) define specific security mechanisms as part of the
protocols, or recommend combining them with other security protocols. Even though
these security profiles are not enough to make the whole VoIP service secure, they are
essential elements of a comprehensive solution.
H.323 is the ITU specification describing the complete architecture and operations of audio
and video communications across packetized networks. It is an umbrella specification that
encompasses many other protocols, such as H.225, H.235, H.245, and RTP/RTCP. The
main components are terminal, gateway, gatekeeper, MC, and MCU.
H.235 describes security enhancements within the framework of H.323 to incorporate
security services such as authentication and privacy. The proposed scheme is applicable to
both simple point-to-point and multipoint conferences for any terminals that utilize H.245
as a control protocol.
H.235 includes several annexes that each hold security profiles of H.235. Annex D defines
a simple, baseline security profile that provides security mechanism by simple means using
secure password-based cryptographic techniques. Annex E describes a security profile
deploying digital signatures. Annex F describes an efficient and scalable, PKI-based hybrid
security profile deploying digital signatures from Annex E and deploying the baseline
security profile from Annex D.
SIP (RFC 3261) is an application-layer control protocol that can establish, modify, and
terminate multimedia sessions such as VoIP calls. It is not a vertically integrated
communications system, but a component that can be used with other protocols to build a
complete multimedia architecture. Typically, these architectures will include protocols
such as RTP/RTCP, RTSP, and SDP.
SIP describes several security features and their usage guidelines. The main features are
digest authentication, identity authentication, message encryption (S/MIME), media
encryption (SRTP), TLS, and network layer security (IPSec).
Digest authentication means that, when a server receives a request, it may challenge the
request to obtain assurance of the originator’s identity. The originator can reply with its
hashed credentials, and the server verifies them and sends back the corresponding response
code.
Identity authentication is defined in RFC 4474, which specifies a means of sharing a
cryptographic assurance of end-user SIP identity in an interdomain or intradomain context
that is based on the concept of an “authentication service” and a new SIP header, the
Identity header.
S/MIME allows SIP UAs to encrypt MIME bodies within SIP and secure the bodies end-
to-end without affecting message headers.
SRTP is an extension of RTP that provides a framework for encryption and message
authentication of RTP and RTCP streams between SIP endpoints.
TLS provides transport-layer security over connection-oriented protocols (TCP).
Typically, SIP uses TLS to provide hop-by-hop security in the service network and
eventually give end-to-end security between UAs.
IPSec is a suite of network-layer protocols securing IP network communications by
encrypting and authenticating data. It is independent of the SIP protocol and there is no
required integration between them. However, IPSec is very useful to provide security
between SIP entities, especially between UA and a proxy server.
MGCP is a device control protocol that provides a simple and centralized mechanism of
controlling media gateways, based on a master-slave relation between call agent and media
gateway. The protocol is text-based and offers a set of simple primitives.
MGCP does not define any specification of security profile, but refers to lower-layer
security protocols. It recommends that MGCP messages always be carried over secure
Internet connections, as defined in IPSec using either the IP AH or the IP ESP.
End Notes
1 H.323, “Packet-based multimedia communications systems,” ITU-T, June 2006.
2 H.235, “Security and encryption for H-series (H.323 and other H.245-based)
multimedia terminals,” ITU-T, August 2003.
3 RFC 3261, “SIP (Session Initiation Protocol),” J. Rosenberg, H. Schulzrinne,
G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler,
June 2002.
4 RFC 4474, “Enhancements for Authenticated Identity Management in the SIP,”
J. Peterson, C. Jennings, August 2006.
5 RFC 4346, “Transport Layer Security (TLS) Protocol,” T. Dierks, E. Rescorla,
April 2006.
6 RFC 3435, “Media Gateway Control Protocol (MGCP) Version 1.0,” F. Andreasen,
B. Foster, January 2003.
References
“Security Considerations for VoIP Systems,” NIST (National Institute of Standards and
Technology), January 2005.
RFC 2617, “HTTP Authentication: Basic and Digest Access Authentication,” J. Franks,
P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen, L. Stewart, June 1999.
RFC 3264, “An Offer/Answer Model with the Session Description Protocol (SDP),”
J. Rosenberg, H. Schulzrinne, June 2002.
RFC 3711, “Secure Real-time Transport Protocol (SRTP),” M. Baugher, D. McGrew,
M. Naslund, E. Carrara, K. Norrman, March 2004.
RFC 4566, “Session Description Protocol,” M. Handley, V. Jacobson, C. Perkins, July 2006.
RFC 4568, “Session Description Protocol (SDP) Security Descriptions for Media
Streams,” F. Andreasen, M. Baugher, D. Wing, July 2006.
RFC 4961, “Symmetric RTP/RTP Control Protocol,” D. Wing, July 2007.
This chapter covers the basic concept and practice of the following topics in cryptography:
• Symmetric (Private) Key Cryptography
— DES
— 3DES
— AES
• Asymmetric (Public) Key Cryptography
— RSA
— Digital Signature (DSA)
• Hashing
— MD5
— SHA
— Message Authentication Code (MAC)
• Key Management
CHAPTER 4
Cryptography
The topic of “VoIP security” includes many aspects. One of the key aspects is the
methodology of information hiding; that is, how to conceal the signals and media in real-time
communications from unauthorized entities. Cryptography is the main solution for this aspect.
NOTE The purpose of this chapter is to give a high-level understanding of each technique with
comprehensible figures, rather than looking into the mathematical detail of cryptographic
algorithms.
As an introduction, here are some explanations of the terminology related to this topic.
Cryptography is one part of cryptology, a word derived from the Greek cryptos (meaning
hidden) and logos (meaning science), which literally means the science of hiding
information. Cryptology consists of two areas: cryptography and cryptanalysis.
Cryptography is the practice and study of hiding information based on a secret key. Only
people who have access to the key can encrypt or decrypt the information.
Cryptanalysis (informally called “code breaking”) is the practice and study of deciphering
encrypted information without any information about the keys that are used. In a positive
way, cryptanalysis helps cryptologists evaluate certain cryptography and create better
algorithms. In a negative way, it is illegally used for cracking encrypted information.
This chapter briefly covers cryptanalysis, but mainly focuses on cryptography in terms of
basic concept and high-level algorithms.
Cryptography is divided into two categories according to the usage of keys: symmetric and
asymmetric key cryptography.
Symmetric key cryptography is based on a single key that both the sender and the receiver
use for encrypting and decrypting the information. Asymmetric key cryptography is based
on two keys: one for encrypting and the other for decrypting. Implementations vary,
and this chapter covers the following well-known cryptographic methods:
• Symmetric (Private) Key Cryptography
— DES (Data Encryption Standard)
— 3DES
— AES (Advanced Encryption Standard)
• Asymmetric (Public) Key Cryptography
— RSA
— Digital Signature Algorithm (DSA)
Additionally, this chapter covers cryptographic hashing functions such as Message-Digest
Algorithm 5 (MD5), Secure Hash Algorithm (SHA), and Message Authentication Code
(MAC), which provide message integrity or authenticity.
Besides these cryptographic methods focusing on the protection of information, key
management is another important aspect in cryptography. Key management includes key
generation, distribution, storage, replacement, and final destruction. The last section of
this chapter discusses this topic, mainly focusing on key distribution.
The first section introduces symmetric key cryptography.
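[Figure: symmetric key cryptography. User A’s message passes through the encryption
algorithm, the encrypted message crosses the network where an attacker may observe it, and
User B’s decryption algorithm recovers the message.]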
Data Encryption Standard (DES) was approved as an official Federal Information Processing
Standard (FIPS 46) in 1976, and subsequently was reaffirmed several times. It has been
popular since then, but it is considered insecure because of the relatively short length
of the key (56 bits). Therefore, the latest version (FIPS 46-3), released in 1999, recommends
using Triple DES (3DES), which runs the DES algorithm three times. DES was superseded
by Advanced Encryption Standard (AES) in 2002. Even though DES was superseded, it
remains in widespread use. The National Institute of Standards and Technology (NIST) has
approved 3DES through the year 2030 for sensitive government information.
The following three sections give the details of each algorithm.
DES
DES is one of the best-known cryptographic algorithms; it specifies a method of encrypting
and decrypting data with a secret key.
The algorithm is designed to encrypt and decrypt blocks of data consisting of 64 bits under
control of a 64-bit key. Only 56 bits of the key are used and the remaining 8 bits are used
for parity check. Decrypting must be accomplished by using the same key as for encrypting,
but with the schedule of addressing the key bits altered so that the deciphering process is
the reverse of the enciphering process.
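The split between key bits and parity bits can be checked with a short sketch. It assumes the convention in the DES standard that each of the eight key bytes carries odd parity, with one bit of each byte serving as the parity bit; the function names are hypothetical.

```python
def has_odd_parity(key: bytes) -> bool:
    """Return True if every byte of a 64-bit DES key has odd parity."""
    if len(key) != 8:
        raise ValueError("a DES key is 8 bytes (64 bits) long")
    # Odd parity: the number of 1 bits in each byte is odd
    return all(bin(byte).count("1") % 2 == 1 for byte in key)


def effective_key_bits(key: bytes) -> int:
    """One bit in each byte is parity, so only 7 bits per byte are secret."""
    return len(key) * 7


key = bytes.fromhex("0123456789abcdef")
parity_ok = has_odd_parity(key)
secret_bits = effective_key_bits(key)
```

With the sample key, the parity check passes and the effective key length comes out to the 56 bits mentioned above.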
Figure 4-2 illustrates the general diagram of the DES algorithm where the following
notation is used:
The 64-bit input data can be denoted LR. L is 32 left-hand bits and R is 32 right-hand bits.
K1–K16 are DES subkeys that are derived from the 64-bit original key K.
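[Figure 4-2: DES algorithm]
INPUT -> INITIAL PERMUTATION -> PERMUTED INPUT: L0 (32 bits), R0 (32 bits)
Round 1 (K1): L1 = R0, R1 = L0 + f(R0, K1)
Round 2 (K2): L2 = R1, R2 = L1 + f(R1, K2)
...
Round n (Kn): Ln = Rn-1, Rn = Ln-1 + f(Rn-1, Kn)
...
Round 16 (K16): L16 = R15, R16 = L15 + f(R15, K16)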
The 64-bit input data is divided into 32 L bits and 32 R bits when passing through the initial
permutation. After this, 16 operations (called DES rounds) are performed, and inverse
permutation of the two blocks of data is calculated at the last step. The output is 64-bit
encrypted data.
In each of the 16 operations, the R bits become the new L bits, and the original L bits
are combined, by bit-by-bit modulo-2 addition (XOR), with the output of the function F to
form the new R bits. The function F depends on the R bits and a subkey. Assuming that the
output of each step is denoted by L’R’ and the input is LR, the operation can be defined
as follows, where + denotes bit-by-bit XOR:
L’ = R
R’ = L + F(R,K)
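The round structure above can be demonstrated with a toy cipher. This sketch is not DES: the round function toy_f and the subkeys are hypothetical stand-ins, and the permutations are omitted. It only shows that applying the same rounds with the subkeys in reverse order undoes the encryption.

```python
MASK32 = 0xFFFFFFFF


def toy_f(right: int, subkey: int) -> int:
    """Stand-in round function (NOT the DES f): mixes R with the subkey."""
    return ((right * 0x9E3779B1) ^ subkey) & MASK32


def feistel(block: int, subkeys) -> int:
    """Run one pass of rounds L' = R, R' = L xor F(R, K) over a 64-bit block."""
    left, right = block >> 32, block & MASK32
    for subkey in subkeys:
        left, right = right, left ^ toy_f(right, subkey)
    # Swap the halves at the end so that the same routine decrypts
    # when it is given the subkeys in reverse order.
    return (right << 32) | left


subkeys = list(range(1, 17))          # 16 hypothetical subkeys, K1..K16
plaintext = 0x0123456789ABCDEF
ciphertext = feistel(plaintext, subkeys)
recovered = feistel(ciphertext, subkeys[::-1])
```

Note that toy_f never has to be inverted. Decryption only recomputes it and XORs the result away, which is why the deciphering process can be the reverse of the enciphering process with nothing changed but the subkey order.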
The next algorithm, 3DES, uses this DES three times to add more complexity.
3DES
DES has been used internationally for a long time since its public release, but it is
considered insecure because of the short length of its key, 56 bits. In 1999, two nonprofit
organizations (Distributed.net and Electronic Frontier Foundation) collaborated to
publicly break a DES key in 22 hours and 15 minutes to demonstrate its weakness. Besides
this event, there are some analytical papers showing theoretical vulnerabilities in DES.
NOTE For more information on the vulnerability in the DES, go to your favorite search engine and
search for “DES Challenges.”
Therefore, the latest version of DES (FIPS 46-3) recommends using 3DES, which runs
DES three times and is a practical means of providing a more secure mechanism.
Figure 4-3 illustrates the algorithm; each block is the same as that of DES in Figure 4-2.
Now that you are aware of the basic cryptographic mechanism of DES and 3DES, the next
section takes a look at the latest algorithm, AES.
[Figure 4-3: 3DES. Three DES blocks in sequence, each as in Figure 4-2: input L0 and R0,
rounds with subkeys K1 through Kn, ending with L16 = R15 and R16 = L15 + f(R15, K16).]
AES
AES,¹ which supersedes DES, is a symmetric block cipher that processes data blocks of
128 bits using cipher keys with lengths of 128, 192, or 256 bits. Based on the fixed block
size of 128 bits, AES operates on a 4x4 array of bytes, called the State.
At the start of the encryption, the input data is copied to the State array and processed by
the following sequence:
Step 1 Initial round
— AddRoundKey
Step 2 Rounds (being executed multiple times)
— SubBytes
— ShiftRows
— MixColumns
— AddRoundKey
Step 3 Final round
— SubBytes
— ShiftRows
— AddRoundKey
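The sequence above can be written out as a round plan. This sketch assumes the round counts given in the AES standard (10, 12, or 14 rounds in total for 128-, 192-, or 256-bit keys, counting the final round); it only lists the function names in order and performs no encryption. The names ROUNDS_BY_KEY_BITS and aes_round_plan are hypothetical.

```python
ROUNDS_BY_KEY_BITS = {128: 10, 192: 12, 256: 14}


def aes_round_plan(key_bits: int):
    """Return the ordered list of functions applied to the State."""
    total_rounds = ROUNDS_BY_KEY_BITS[key_bits]
    plan = [["AddRoundKey"]]                       # Step 1: initial round
    main_round = ["SubBytes", "ShiftRows", "MixColumns", "AddRoundKey"]
    # Step 2: main rounds, one fewer than the total round count
    plan += [list(main_round) for _ in range(total_rounds - 1)]
    # Step 3: final round, which omits MixColumns
    plan.append(["SubBytes", "ShiftRows", "AddRoundKey"])
    return plan


plan_128 = aes_round_plan(128)
round_keys_needed = sum(step.count("AddRoundKey") for step in plan_128)
```

For a 128-bit key the plan has 11 entries and uses 11 round keys, which is why the key expansion routine has to derive more key material than the cipher key itself provides.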
The first step adds the initial round key to the State. Round keys are values derived from
the cipher key using the key expansion routine.
The second step consists of the main rounds, each made up of four round functions:
SubBytes, ShiftRows, MixColumns, and AddRoundKey. The main round is applied to the State
array multiple times, depending on the key length (9, 11, or 13 times for 128-, 192-, or
256-bit keys).
The third step is the final round, which executes three of the functions (all but
MixColumns) once. The final State is then copied to the output. Here is a brief
description of the four main functions.
SubBytes
The SubBytes() function is a non-linear byte substitution that operates independently on
each byte of the State using a substitution table (S-box). The S-box, implemented as a
lookup table, takes an input value and transforms it into an output value. For example,
if the State array element (1,1) has the value {32}, the substitution value is
determined by the intersection of the row with index ‘3’ and the column with index ‘2’ in
Table 4-1. This results in a new value {4b}.
Table 4-1 Substitution Table
      0    1    2    3    4
0     ab   11   3d   e2   c1
1     24   bf   ca   bc   19
2     e8   a7   12   7e   f7
3     17   ee   4b   5d   16
4     99   2d   0f   bn   54
Figure 4-4 illustrates the effect of the SubBytes() function on the State.
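The lookup described for Table 4-1 can be sketched as follows. The values are copied from Table 4-1 as printed and are treated here as an illustrative excerpt (only rows and columns 0 through 4 are available); the names TABLE_4_1 and sub_byte are hypothetical.

```python
# Rows are indexed by the high hex digit, columns by the low hex digit.
TABLE_4_1 = {
    0: ["ab", "11", "3d", "e2", "c1"],
    1: ["24", "bf", "ca", "bc", "19"],
    2: ["e8", "a7", "12", "7e", "f7"],
    3: ["17", "ee", "4b", "5d", "16"],
    4: ["99", "2d", "0f", "bn", "54"],  # "bn" kept verbatim from the table
}


def sub_byte(value: int) -> str:
    """Look up one State byte: high nibble selects the row, low the column."""
    row, column = value >> 4, value & 0x0F
    return TABLE_4_1[row][column]


substituted = sub_byte(0x32)   # row 3, column 2
```

A full implementation would use a 16x16 table so that every one of the 256 possible byte values has an entry.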
ShiftRows
In the ShiftRows() function, the bytes in the last three rows of the State are cyclically
shifted over different numbers of bytes (offsets). The first row, r = 0, is not shifted.
This has the effect of moving bytes to “lower” positions in the row, while the “lowest” bytes
wrap around into the “top” of the row. This transformation provides diffusion in the cipher.
Figure 4-5 illustrates the ShiftRows() function.
[Figure 4-5: ShiftRows() transforms the State S into S'.]
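A minimal sketch of the shift follows. It assumes the offsets defined in the AES standard, where row r is rotated left by r bytes, and it represents the State as a list of four rows; both the representation and the function name are choices made for this illustration.

```python
def shift_rows(state):
    """Rotate row r of the State left by r positions (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


state = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
shifted = shift_rows(state)
```

After the shift, the four bytes that started in one column sit in four different columns, which is the diffusion effect the text refers to.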
MixColumns
The MixColumns() function operates on the State column by column, treating each column
as a four-term polynomial. The four bytes of each column of the State are combined using
an invertible linear transformation. The MixColumns function takes four bytes as input and
outputs four bytes, where each input byte affects all four output bytes. Like ShiftRows,
MixColumns provides diffusion in the cipher. Figure 4-6 illustrates the MixColumns()
function.
[Figure 4-6: MixColumns() maps each State column (S0,c, S1,c, S2,c, S3,c) to a new column
(S'0,c, S'1,c, S'2,c, S'3,c).]
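The column transformation can be sketched for a single column. The sketch assumes the fixed coefficients {02}, {03}, {01}, {01} and the reduction polynomial of the AES standard, neither of which is spelled out in this section; the function names are hypothetical.

```python
def xtime(byte: int) -> int:
    """Multiply a byte by {02} in the finite field used by AES."""
    byte <<= 1
    if byte & 0x100:          # overflow past 8 bits: reduce modulo 0x11B
        byte ^= 0x11B
    return byte & 0xFF


def mul3(byte: int) -> int:
    """Multiply a byte by {03}, computed as {02} times the byte XOR the byte."""
    return xtime(byte) ^ byte


def mix_column(column):
    """Apply the MixColumns transformation to one 4-byte column."""
    a0, a1, a2, a3 = column
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```

Every output byte is computed from all four input bytes, so changing any one input byte changes the whole column.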
AddRoundKey
In the AddRoundKey() function, a round key is added to the State by a simple bitwise XOR
operation. Each round key consists of Nb words from the key schedule (Rijndael’s key
schedule); each key is the same size as the State. Those Nb words are each added into the
columns of the State. The action of this transformation is shown in Figure 4-7, where l =
round * Nb.
[Figure 4-7: AddRoundKey() XORs each State column (S0,c, S1,c, S2,c, S3,c) with the round
key word Wl+c, where l = round * Nb, producing the column (S'0,c, S'1,c, S'2,c, S'3,c).]
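The XOR step can be sketched directly. Here the State is represented as a list of four columns and the round key as a list of four 4-byte words, one per column; the byte values and the function name are hypothetical.

```python
def add_round_key(state_columns, round_key_words):
    """XOR each State column with the matching round-key word."""
    return [
        [s ^ k for s, k in zip(column, word)]
        for column, word in zip(state_columns, round_key_words)
    ]


state_columns = [[0x32, 0x43, 0xF6, 0xA8], [0x88, 0x5A, 0x30, 0x8D],
                 [0x31, 0x31, 0x98, 0xA2], [0xE0, 0x37, 0x07, 0x34]]
round_key = [[0x2B, 0x7E, 0x15, 0x16], [0x28, 0xAE, 0xD2, 0xA6],
             [0xAB, 0xF7, 0x15, 0x88], [0x09, 0xCF, 0x4F, 0x3C]]

mixed_in = add_round_key(state_columns, round_key)
restored = add_round_key(mixed_in, round_key)
```

Because XOR is its own inverse, applying the same round key a second time restores the original State, which is how decryption removes each round key.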
Now that you are aware of the basic algorithm of AES, the next section covers public-key
based asymmetric cryptography.
NOTE Many asymmetric key systems use the public/private keys to securely exchange a symmetric
key. The symmetric key is then used to send/receive the actual data between the two
endpoints. This is done because symmetric cryptography uses fewer CPU resources than
public key cryptography. TLS, for example, does this.
The section “Key Management” in this chapter provides more detail.
Another popular usage of asymmetric key cryptography is the Digital Signature (DS), an
electronic analogue of a written signature that proves the message was signed by the
originator.
In other words, the receiver can verify the identity of the originator (that is, authenticity)
through the signature. Additionally, the digital signature provides a mechanism to verify
that the message has not been altered in transit (that is, message integrity).
This section introduces two commonly used standards: Rivest, Shamir, and Adleman
(RSA) and Digital Signature Algorithm (DSA). RSA can be used for both message
encryption and digital signature. DSA can be used only for digital signature.
NOTE Keep in mind that the purpose of this section is not to examine the mathematical details of
cryptographic algorithms, but to give a high-level understanding of each technique with
comprehensible figures.
RSA
Since RSA was publicly released in 1977, it has been the most popular type of public key
cryptography. The name RSA is the surname initials of the inventors (Rivest, Shamir, and
Adleman) at Massachusetts Institute of Technology (MIT).
RSA uses two keys: one key for encrypting and the other key for decrypting data. For
message privacy (hiding), the public key is for encryption and the private key for decryption.
For digital signature, the private key is for encryption and the public key for decryption
(authentication). Figures 4-8 and 4-9 show the difference.
The public key can be and often is shared with other parties. However, the private key
remains a secret of the device (or user), and is never shared.
It is relatively easy for communication parties to calculate the public/private pair of keys.
However, it is almost impossible for an attacker in the middle to determine the private key
even if the attacker knows the public key and the cryptographic algorithm.
RSA can be used for both message privacy and digital signature. Figure 4-8 illustrates the
usage of message privacy, and Figure 4-9 the usage of digital signature.
Figure 4-8 (not reproduced): User A holds B’s public key and User B holds B’s private key; a CA and an attacker in the middle are also shown.
Figure 4-8 shows how to protect the privacy of User A’s message with User B’s public key.
When User A sends a message to User B, User A encrypts it with User B’s public key. User
B receives the encrypted message and decrypts it with User B’s private key based on
the RSA algorithm. An attacker in the middle may already know User B’s public key and
intercept User A’s encrypted message, but no practical method is known for recovering
the original message or the private key from them.
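The following toy example shows the message-privacy flow with numbers small enough to follow by hand. It is only a sketch: the primes 61 and 53, the exponent 17, and the function names are assumptions chosen for illustration, and real RSA uses much larger keys together with a padding scheme.

```python
# Toy key generation for User B (tiny primes, for illustration only).
p, q = 61, 53
n = p * q                       # modulus shared by both keys (3233)
phi = (p - 1) * (q - 1)
e = 17                          # public exponent
d = pow(e, -1, phi)             # private exponent (needs Python 3.8+)

public_b = (e, n)               # published by User B
private_b = (d, n)              # kept secret by User B


def encrypt(message, public_key):
    """User A encrypts an integer message (< n) with User B's public key."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, private_key):
    """User B recovers the message with the matching private key."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


ciphertext = encrypt(65, public_b)
plaintext = decrypt(ciphertext, private_b)
print(ciphertext, plaintext)
```

With numbers this small an attacker could simply factor n, so the sketch shows the mechanics only, not the security.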
Figure 4-9 (not reproduced): User A holds A’s private key and User B holds A’s public key; a CA and an attacker in the middle are also shown.
Figure 4-9 shows the mechanism of a digital signature, which establishes two things: that only
User A could have sent the message (authenticity), and that the message was not changed while
being transferred (integrity). When User A sends a message, A encrypts it with User A’s private
key, which is never exposed to the public. User B receives the message and decrypts it with
User A’s public key to verify its authenticity and integrity.
There are many ways to pass User A’s public key to User B. A popular one is for User A to
include his own certificate in the message. The certificate contains User A’s public key and
is signed by a Certificate Authority (CA; see Note). When User B receives the message
(including the certificate), User B validates the certificate with the public key of the CA, and
then uses User A’s public key.
An attacker in the middle may intercept the message and change some information, but
User B can detect the attack while authenticating the message with User A’s public key.
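A minimal sketch of this signing flow is shown below. It reuses a toy RSA key pair (n = 3233, e = 17, d = 2753), and, as an assumption of this example, it signs a SHA-256 digest reduced modulo n rather than the message itself so that the value fits the tiny modulus. The function names are illustrative.

```python
import hashlib

# Toy key pair for User A (illustration only; real keys are far larger).
N, E, D = 3233, 17, 2753


def digest(message: bytes) -> int:
    """Reduce a SHA-256 digest to a number below the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """User A 'encrypts' the digest with the private exponent."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """User B 'decrypts' with the public exponent and compares digests."""
    return pow(signature, E, N) == digest(message)


signature = sign(b"call me at noon")
print(verify(b"call me at noon", signature))
```

If an attacker alters the message in transit, the digest that User B computes will almost certainly no longer match the value recovered from the signature, so verification fails.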
NOTE Certificate Authority (CA) is an organization or network entity that issues a digital certificate,
which contains a public key and the owner’s identity information according to the request.
Because of security issues, a CA is supposed to be a trusted entity (for example, a trusted
third party) that both applicants and communication parties can rely on.
There are many third-party commercial CAs, such as VeriSign or Comodo, as well as free
CAs. Some organizations, such as governments, use their own CA.
Digital Signature
The purpose of a digital signature is verifying two things, as mentioned previously: the
authenticity of the originator and the integrity of the message.
The best-known digital signature standard is DSA, which NIST proposed in
1991 and specified in FIPS 186. Figure 4-10 illustrates the DSA mechanism: how the
digital signature is created, transferred, and verified at a high level.
In Figure 4-10, User A sends a message to User B, along with User A’s signature. The steps
of the process can be summarized as follows:
Step 1 User A generates a hash value from User A’s original message. (Refer to
the next section for hash functions.)
Step 2 User A creates a digital signature with the hash value and User A’s private
key, by DSA algorithm.
Step 3 User A attaches the signature to the original message.
Step 4 User A sends the combined message (original message and signature) to User B.
Step 5 User B divides the combined message into the original message and signature.
Step 6 User B generates a hash value from the received original message.
Step 7 User B uses the hash value and User A’s public key, and generates a value
by DSA.
Step 8 User B compares the value with the signature, and determines the
authenticity of User A and message integrity.
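Figure 4-10 (not reproduced): User A signs with A’s private key, User B verifies with A’s public key and compares the results; a CA and an attacker in the middle are also shown.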
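The signing and verification steps can be sketched with a toy DSA implementation. Everything numeric here is an assumption made for illustration: the domain parameters P = 607 and Q = 101 are far too small for real use, the generator is derived from the base 2, and the hash is simply reduced modulo Q. Function names are hypothetical.

```python
import hashlib
import secrets

# Toy domain parameters: Q is prime and divides P - 1 (606 = 6 * 101).
P, Q = 607, 101
G = pow(2, (P - 1) // Q, P)          # element of order Q in the group mod P


def digest(message: bytes) -> int:
    """Hash the message (Steps 1 and 6), reduced to the toy group size."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % Q


def generate_keys():
    x = secrets.randbelow(Q - 1) + 1  # private key, 1 .. Q-1
    y = pow(G, x, P)                  # public key
    return x, y


def sign(message: bytes, x: int):
    """Step 2: create the signature (r, s) from the hash and the private key."""
    h = digest(message)
    while True:
        k = secrets.randbelow(Q - 1) + 1          # fresh per-message secret
        r = pow(G, k, P) % Q
        s = (pow(k, -1, Q) * (h + x * r)) % Q
        if r != 0 and s != 0:
            return r, s


def verify(message: bytes, signature, y: int) -> bool:
    """Steps 7 and 8: recompute a value from the hash and public key, then compare."""
    r, s = signature
    if not (0 < r < Q and 0 < s < Q):
        return False
    h = digest(message)
    w = pow(s, -1, Q)
    u1, u2 = (h * w) % Q, (r * w) % Q
    v = (pow(G, u1, P) * pow(y, u2, P)) % P % Q
    return v == r


private_a, public_a = generate_keys()
sig = sign(b"hello", private_a)
print(verify(b"hello", sig, public_a))
```

The comparison in verify() corresponds to the final step: the value User B computes matches the signature only if the message is unchanged and the signature was made with User A's private key.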
In this section so far, you have learned about asymmetric cryptography, which uses two
keys (private and public key) for message privacy and digital signature. The next section
covers hashing algorithms, which are often used as building blocks of other cryptographic schemes.
Hashing
This section covers well-known hashing algorithms that are employed by many applications,
such as Transport Layer Security (TLS), Secure Shell (SSH), Secure/Multipurpose Internet
Mail Extensions (S/MIME), and IP Security (IPsec). This section introduces three well-known
algorithms: Message-Digest algorithm 5 (MD5), SHA, and MAC.
The first algorithm is MD5 as described in the following section.
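Figure (not reproduced): the message “Hi” passes through a hash function, which outputs a fixed-length digest (shown as c3fcd3d76192e4007dfb496cca67e13b); an attacker is also shown.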
The most popular hash function is Message-Digest algorithm 5 (MD5), defined in RFC 1321.²
The MD5 algorithm takes a message of arbitrary length as input and produces a 128-bit
“fingerprint” or “message digest” as output. RFC 1321 conjectured that it is computationally
infeasible to produce two messages having the same message digest, or to produce any
message having a given target message digest. Practical collision attacks on MD5 have since
been demonstrated, so it should no longer be relied on for collision resistance.
The MD5 algorithm is intended for digital signature applications, where a large file must
be “compressed” in a secure manner before being encrypted with a private (secret) key
under a public-key cryptosystem, such as RSA.
The MD5 algorithm is designed to be quite fast on 32-bit machines. In addition, the MD5
algorithm does not require any large substitution tables; the algorithm can be coded quite
compactly.
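To see the fixed-length fingerprint in practice, the short sketch below uses Python's standard hashlib module. The helper name md5_hex is an assumption for this example; the inputs are test strings from RFC 1321.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the 128-bit MD5 digest of data as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


# Inputs of very different lengths all produce a 128-bit (16-byte) digest.
for sample in (b"", b"abc", b"abcdefghijklmnopqrstuvwxyz", b"x" * 10_000):
    print(len(sample), md5_hex(sample))
```

The digest for the 26-letter lowercase alphabet is c3fcd3d76192e4007dfb496cca67e13b, the same value that appears in this section's figure.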
The next hashing algorithm, Secure Hash Algorithm (SHA), provides a more secure
mechanism, and is considered to be the successor to MD5.
SHA
Secure Hash Standard (SHS) specifies, as defined in FIPS 180-2,³ four Secure Hash
Algorithms: SHA-1, SHA-256, SHA-384, and SHA-512. All four of the algorithms are
iterative, one-way hash functions that can process a message to produce a condensed
representation called a message digest. These algorithms enable the determination of a
message’s integrity: any change to the message will, with a very high probability, result in
a different message digest. This property is useful in the generation and verification of
digital signatures and message authentication codes, and in the generation of random
numbers (bits).
Each algorithm can be described in two stages: preprocessing and hash computation.
Preprocessing involves padding a message, parsing the padded message into m-bit blocks,
and setting initialization values to be used in the hash computation. The hash computation
generates a message schedule from the padded message and uses that schedule, along
with functions, constants, and word operations, to iteratively generate a series of hash
values. The final hash value generated by the hash computation is used to determine the
message digest.
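The preprocessing stage can be illustrated with a small padding routine. This is a sketch for the 512-bit-block algorithms (SHA-1 and SHA-256) only, assuming byte-aligned messages; the function names sha256_pad and parse_blocks are made up for this example.

```python
def sha256_pad(message: bytes) -> bytes:
    """Pad a message to a multiple of 512 bits (64 bytes).

    Padding is a single 1 bit (byte 0x80), then zero bytes, then the
    original message length in bits as a 64-bit big-endian integer.
    """
    bit_length = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)   # leave 8 bytes for the length
    padded += bit_length.to_bytes(8, "big")
    return padded


def parse_blocks(padded: bytes, block_bytes: int = 64):
    """Split the padded message into fixed-size blocks for hashing."""
    return [padded[i:i + block_bytes] for i in range(0, len(padded), block_bytes)]


blocks = parse_blocks(sha256_pad(b"abc"))
print(len(blocks), blocks[0].hex())
```

A 3-byte message fits in one block, while a 56-byte message needs two, because the length field no longer fits after the mandatory 1 bit.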
The four algorithms differ most significantly in the number of bits of security that are
provided for the data being hashed; this is directly related to the message digest length. When
a secure hash algorithm is used in conjunction with another algorithm, there may be
requirements specified elsewhere that require the use of a secure hash algorithm with a certain
number of bits of security. For example, if a message is being signed with a digital signature
algorithm that provides 128 bits of security, that signature algorithm may require the use of
a secure hash algorithm that also provides 128 bits of security (for example, SHA-256).
Additionally, the four algorithms differ in terms of the size of the blocks and words of data
that are used during hashing. Table 4-2 presents the basic properties of all four secure hash
algorithms.
Table 4-2 SHA Properties
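The block and digest sizes of the four algorithms can be read directly from Python's hashlib module, as in the sketch below. The last column is an assumption of this example: it uses half the digest length as the nominal collision-resistance figure (the generic birthday bound), which is consistent with the 128 bits quoted above for SHA-256.

```python
import hashlib


def sha_properties():
    """Return (name, block size in bits, digest size in bits, nominal security bits)."""
    rows = []
    for name in ("sha1", "sha256", "sha384", "sha512"):
        h = hashlib.new(name)
        digest_bits = h.digest_size * 8
        rows.append((name, h.block_size * 8, digest_bits, digest_bits // 2))
    return rows


for name, block_bits, digest_bits, security_bits in sha_properties():
    print(f"{name:8} block={block_bits:5} digest={digest_bits:4} security={security_bits:4}")
```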
The next algorithm, Message Authentication Code (MAC), differs from MD5 and SHA
in that it uses a shared secret key, as follows.
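Figure 4-12 (not reproduced): User A and User B each run the authentication algorithm, and User B compares the result with the received MAC; an attacker in the middle is also shown.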
In Figure 4-12, when User A sends a message to User B, User A first generates a MAC from
the original message and the shared secret key, using the authentication algorithm. User A
attaches the MAC to the original message and sends both to User B. When User B receives them,
she extracts the original message and generates her own output (MAC) with the same algorithm
and key. User B compares that output with the MAC that User A sent to verify message integrity.
If User A and B fully trust each other and the secret key is shared only between them, this
verification establishes the authenticity of the originator as well.
An attacker in the middle may intercept and modify the message, but without the shared key
the attacker cannot produce a matching MAC, so User B discovers the modification when
comparing the MACs.
MAC algorithms also can be constructed with cryptographic hash functions like MD5 or
SHA-1, in combination with a shared secret key. This MAC algorithm is called Keyed-
Hashing Message Authentication Code (HMAC), which is defined in RFC 2104. The
cryptographic strength of HMAC depends on the properties of the underlying hash
function.
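A minimal HMAC sketch using Python's standard hmac and hashlib modules follows. The choice of SHA-256 as the underlying hash and the helper names make_mac and verify_mac are assumptions for this example; the key and message are placeholders.

```python
import hashlib
import hmac


def make_mac(key: bytes, message: bytes) -> bytes:
    """User A: compute an HMAC over the message with the shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_mac(key: bytes, message: bytes, mac: bytes) -> bool:
    """User B: recompute the HMAC and compare it in constant time."""
    return hmac.compare_digest(make_mac(key, message), mac)


shared_key = b"example-shared-secret"
message = b"INVITE sip:userb@example.com"
mac = make_mac(shared_key, message)

print(verify_mac(shared_key, message, mac))            # unchanged message
print(verify_mac(shared_key, message + b"x", mac))     # modified in transit
```

hmac.compare_digest is used instead of == so that the comparison time does not reveal how many leading bytes of a forged MAC were correct.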
Key Management
The previous sections examined cryptographic methods and algorithms with symmetric or
asymmetric keys, which focus on how to provide message integrity, confidentiality, and
authenticity with the keys. The remaining topic in this chapter is key management,
which focuses on how to securely maintain those keys from creation to final destruction.
Key management includes the following aspects: key generation, distribution, storage,
replacement, and final destruction.
The key generation must be unpredictable, even for the key users. If certain keys have a
higher possibility of being generated, it would be easier for attackers to find the key. So, the
key must be generated randomly by machine, with enough complexity. If the system of key
generation requires the user’s input, it should combine multiple users’ input so that a single
user cannot predict the source of key generation.
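The two generation approaches described above can be sketched as follows. This is an illustration only: the 32-byte key length, the use of SHA-256 to mix inputs, and the function names are assumptions of this example rather than a prescribed method.

```python
import hashlib
import secrets


def generate_key(num_bytes: int = 32) -> bytes:
    """Generate an unpredictable key from the operating system's random source."""
    return secrets.token_bytes(num_bytes)


def generate_key_from_inputs(user_inputs, num_bytes: int = 32) -> bytes:
    """Mix several users' inputs with fresh machine randomness.

    No single user can predict the result, because every input and the
    random value all contribute to the hash. Supports up to 32 bytes.
    """
    h = hashlib.sha256()
    h.update(secrets.token_bytes(32))                 # machine-generated randomness
    for item in user_inputs:
        h.update(len(item).to_bytes(4, "big"))        # length prefix avoids ambiguity
        h.update(item)
    return h.digest()[:num_bytes]


key = generate_key()
mixed_key = generate_key_from_inputs([b"input from user 1", b"input from user 2"])
print(len(key), len(mixed_key))
```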
The key distribution is most critical because the key in transit could be exposed to attackers.
The key distribution system should assume that all keys could be modified in transit and
prepare a secure mechanism for the case.
One of the desirable system capabilities is detecting any modification of a key while being
transferred and discarding the key immediately.
Using additional keys for distribution (named transportation keys) is also a desirable solution.
The following subsection shows how to use asymmetric keys (a private and public key) to
distribute another key (a symmetric key).
The session keys should be stored in a secure format (for example, encrypted format) that
requires a storage key. The key storage protects the key even if an intruder obtains the
encrypted key. The storage keys and transportation keys are often called meta keys.
Key replacement is closely related to the security level of the system, which depends on how
frequently the system changes the keys. Even if an attacker intercepts and cracks an
encrypted key with sophisticated tools, which usually takes time, the cracked key is useless
if the system has already changed the key. The frequency of replacement could be determined
by the importance of data, strength of the cryptographic algorithm, security level of the
network, and so on.
Key destruction means to completely erase used keys. If an attacker could find used keys,
even though they are not used now, they might give a clue of a pattern for generating keys in
the system.
Key Distribution
The most significant aspect of key management is key distribution, because keys in transit
could be exposed to potential attackers. There are many ways to distribute keys securely based
on either symmetric or asymmetric cryptography. One popular way, based on the
RSA algorithm, uses asymmetric keys (a private and public key) to distribute another key
(a symmetric key), as shown in Figure 4-13.
Figure 4-13 (not reproduced): User A encrypts the key and User B decrypts it; a CA and an attacker in the middle are also shown.
In Figure 4-13, User A sends a KEY (symmetric key) to User B, by using their given
asymmetric keys. The steps of key distribution from User A to User B are summarized as
follows:
Step 1 User A encrypts a key with User A’s private key.
Step 4 User B receives and decrypts it with User B’s private key.
— Now User B has the symmetric key used for the data sent by User
A and B.
As long as the private keys (A’s and B’s) are maintained securely, it is virtually impossible
for an attacker in the middle to extract the key from the encrypted message, even when the
attacker knows User A’s and User B’s public keys.
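One way to combine both key pairs when distributing a symmetric key is sketched below. This arrangement is an assumption of the example, not necessarily the exact sequence of steps in Figure 4-13: User A wraps the key with User B's public key for confidentiality, then signs the wrapped value with User A's private key so that User B can check where it came from. The toy primes and function names are illustrative only.

```python
import secrets


def make_keypair(p, q, e=17):
    """Return (public, private) toy RSA keys built from two small primes."""
    n, phi = p * q, (p - 1) * (q - 1)
    return (e, n), (pow(e, -1, phi), n)


def rsa(value, key):
    """Apply an RSA key (exponent, modulus) to an integer smaller than the modulus."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


pub_a, priv_a = make_keypair(89, 97)     # User A, modulus 8633
pub_b, priv_b = make_keypair(61, 53)     # User B, modulus 3233

# User A: pick a toy symmetric key, wrap it for B, then sign the wrapped value.
session_key = secrets.randbelow(3000) + 2
wrapped = rsa(session_key, pub_b)        # only B's private key can unwrap this
signature = rsa(wrapped, priv_a)         # only A's private key can produce this

# User B: check the signature with A's public key, then unwrap with B's private key.
authentic = rsa(signature, pub_a) == wrapped
unwrapped = rsa(wrapped, priv_b)
print(authentic, unwrapped == session_key)
```

An attacker who knows both public keys can recover the wrapped value from the signature, but cannot unwrap it without User B's private key or forge a valid signature without User A's.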
Summary
The purpose of cryptography is to provide message integrity, confidentiality, and authenticity
between communication parties, by means of shared keys. It is divided into two categories
according to the type of key used: symmetric and asymmetric key cryptography.
Symmetric key cryptography is based on a single key that both the sender and the receiver
use for encrypting and decrypting the message. The well-known standards are DES, 3DES,
AES, and MAC.
DES is designed to encrypt and decrypt blocks of data consisting of 64 bits under control
of a 64-bit key. Only 56 bits of the key are used and the remaining 8 bits are used for parity
checks. DES was considered to be insecure because of the relatively short length of the key
(56 bits). Therefore, the latest version (FIPS 46-3), released in 1999, recommends using
3DES, which runs the DES algorithm three times.
AES, superseding DES, is a symmetric block cipher that can process data blocks of 128 bits,
using cipher keys with lengths of 128, 192, and 256 bits. Based on the fixed block size of
128 bits, AES operates on a 4x4 array of bytes called the State. At the start of the encryption,
the input data is copied to the State array and processed by three steps; each step executes
its own round functions.
Asymmetric key cryptography uses two keys: one for encryption and the other for decryption.
The most common usage (RSA algorithm) is for message encryption in which a public key
is used for encrypting and a private key for decrypting. A message receiver maintains both
keys and exposes only the public key to the public. A sender encrypts his own message with
the receiver’s public key whenever sending to the receiver. It is virtually impossible to crack
the encrypted message without the private key.
Another popular usage of asymmetric key cryptography is the digital signature, which is an
electronic analogue of a written signature, which proves the message was signed by the
originator. In other words, the receiver can verify the identity of the originator (that is,
authenticity) through the signature. Additionally, the digital signature provides a mechanism
to verify that the message has not been altered in transit (that is, message integrity).
A cryptographic hash function converts various lengths of data (input) into fixed-length
data (output) without using a key, which is different from regular key-based cryptographic
algorithms. Calculating the output is simple and quick, but there is no reverse algorithm that
retrieves the original message. Generally, the hash function is used as part of other
cryptographic schemes, such as DSA. The well-known hashing algorithms are MD5, SHA, and HMAC.
Besides the cryptographic methods described in this chapter, which focus on the protection
of information, key management is another important aspect in cryptography. It focuses on
how to securely maintain those keys from generation to distribution, storage, replacement,
and final destruction. The key distribution is most critical because the key in transit could
be exposed to men in the middle. The key distribution system should assume that all keys
could be modified in transit, and prepare a secure mechanism for the case.
End Notes
1 Advanced Encryption Standard (FIPS 197), NIST (National Institute of Standards
and Technology), November, 2001.
2 RFC 1321, “MD5 Message-Digest Algorithm,” R. Rivest, April, 1992.
3 Secure Hash Standard (FIPS 180-2), NIST (National Institute of Standards and
Technology), August, 2002.
References
“Certificate Authority,” Wikipedia, http://en.wikipedia.org/wiki/Certificate_authority.
“Cryptography Basics,” Tech-invite, http://www.tech-invite.com/Ti-crypto.html.
“Data Encryption Standard,” Wikipedia, http://en.wikipedia.org/wiki/Data_Encryption_Standard.
Data Encryption Standard (FIPS 46-3), NIST (National Institute of Standards and
Technology), October 1999.
Digital Signature Algorithm (FIPS 186-2), NIST (National Institute of Standards and
Technology), January 2000.
“SHA hash functions,” Wikipedia, http://en.wikipedia.org/wiki/SHA.
“Message authentication code,” Wikipedia, http://en.wikipedia.org/wiki/Message_authentication_code.
RFC 2104, “Keyed-Hashing for Message Authentication (HMAC),” H. Krawczyk,
M. Bellare, C. Canetti, February 1997.
van der Lubbe, J.C.A. Basic Methods of Cryptography. Cambridge, UK: Cambridge
University Press, 1998.
This chapter covers fundamental information about the following VoIP network elements
from a security perspective:
• Security devices
— VoIP-aware firewall
— Network Address Translation (NAT)
— Session Border Controller
— Lawful Interception Server
• Service devices
— Customer Premise Equipment (CPE)
• Call processing servers
CHAPTER 5
VoIP Network Elements
Security Devices
The security devices are primarily designed for providing security itself. There are two
types of VoIP security devices. One originated from legacy data security, such as firewalls
and NAT. The other was invented for VoIP service, such as Session Border Controller and
Lawful Interception server. The following section gives a brief description of these devices.
VoIP-Aware Firewall
A firewall is a primary device for security in an IP network that protects the internal network
and devices from external attacks. The general function is blocking certain types of traffic
based on a policy that an administrator preconfigured. This policy consists of the range of
IP addresses, port numbers, protocols, traffic directions, bandwidth consumption, and so on.
There are two types of firewall in terms of capability of recognizing VoIP protocols: legacy
and VoIP-aware firewalls.
The legacy firewall handles packets only in the network and transport layers, and does not
inspect which protocol is carried in the application layer. However, the VoIP-aware
firewall has the additional capability to inspect and manipulate VoIP packets in the
application layer for secure service.
Next, you will learn about the VoIP-aware firewall.
An Access Control List (ACL) is a primary method used by a firewall to protect VoIP
servers, media gateways, and CPEs from external devices that are not supposed to communicate
with them. Using ACL for VoIP traffic is not simple because the ports used by VoIP entities
change dynamically based on the call setup. You may use a static configuration, such as a
certain range being always opened or blocked, but that creates potential vulnerability.
In general, an endpoint and a server (for example, SIP proxy) are using the client/server
model for signaling for call setup, and the media channel between endpoints is established
directly, that is, end-to-end. If the call signaling message does not go through a firewall, the
media stream cannot pass through it because the firewall does not know which ports need
to be opened.
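The idea of opening media ports dynamically from the call signaling can be sketched as follows. This is a simplified illustration: the class name PinholeTable, its methods, and the call identifier are hypothetical, and a real firewall would also track source addresses, RTCP ports, and timeouts.

```python
EXAMPLE_SDP = """v=0
o=UserA 2890844526 2890844526 IN IP4 10.10.10.10
s=-
c=IN IP4 10.10.10.10
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000
"""


class PinholeTable:
    """Tracks media ports opened on behalf of signaled calls (illustrative)."""

    def __init__(self):
        self._open = {}                      # call_id -> (ip, port)

    def on_signaling(self, call_id, sdp):
        """Read the media address and port from the SDP and open a pinhole."""
        ip = port = None
        for line in sdp.splitlines():
            if line.startswith("c=IN IP4 "):
                ip = line.split()[2]
            elif line.startswith("m=audio "):
                port = int(line.split()[1])
        if ip and port:
            self._open[call_id] = (ip, port)

    def on_call_end(self, call_id):
        """Close the pinhole when the call is torn down."""
        self._open.pop(call_id, None)

    def allows(self, dst_ip, dst_port):
        """Media is passed only if signaling announced this address and port."""
        return (dst_ip, dst_port) in self._open.values()


table = PinholeTable()
table.on_signaling("call-1", EXAMPLE_SDP)
print(table.allows("10.10.10.10", 49172), table.allows("10.10.10.10", 49174))
```

If the signaling never passes through the firewall, on_signaling() is never called, so allows() rejects the media stream, which is the situation the paragraph above describes.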
Besides the dynamic port assignment, an advanced VoIP-aware firewall has the following
capabilities:
• Protocol message inspection—An advanced VoIP-aware firewall checks out the
integrity of protocol messages (for example, Session Initiation Protocol), and blocks
the originator if it detects any malformed messages. If those malformed messages
pass through without being blocked, the receiver (VoIP server, IP phone, and so on)
may experience a system error.
• Denial-of-Service (DoS) protection—It detects any flooded messages and blocks the
originator for a certain amount of time, based on the policy. The policy may include
number of call attempts per second, number of messages per second, number of
invalid messages, and so on.
• Bandwidth control—It can assign maximum bandwidth for each endpoint (or group),
and block any overused endpoint.
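A flood-detection policy of the kind listed above (messages per second, with a temporary block) might be sketched like this. The class name FloodGuard, its parameters, and the thresholds in the usage example are assumptions for illustration, not values recommended by any product.

```python
from collections import defaultdict, deque


class FloodGuard:
    """Blocks a source that exceeds max_messages within window_seconds."""

    def __init__(self, max_messages, window_seconds, block_seconds):
        self.max_messages = max_messages
        self.window = window_seconds
        self.block_seconds = block_seconds
        self._history = defaultdict(deque)   # source -> recent message times
        self._blocked_until = {}             # source -> time the block ends

    def allow(self, source, now):
        """Return True if a message from source at time now may pass."""
        if self._blocked_until.get(source, 0) > now:
            return False
        history = self._history[source]
        while history and history[0] <= now - self.window:
            history.popleft()                # forget messages outside the window
        history.append(now)
        if len(history) > self.max_messages:
            self._blocked_until[source] = now + self.block_seconds
            history.clear()
            return False
        return True


guard = FloodGuard(max_messages=3, window_seconds=1.0, block_seconds=10.0)
results = [guard.allow("198.51.100.7", t) for t in (0.0, 0.1, 0.2, 0.3, 5.0, 10.5)]
print(results)
```

The fourth message inside one second triggers the block, later messages are dropped until the block expires, and other sources are unaffected.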
Because a firewall handles a large amount of traffic by nature, capabilities and performance
need to be taken into account. Performance includes the amount of latency, which the
firewall can increase if it is under high load or even under attack. The general rule in VoIP
deployment is to keep the CPU usage less than 60 percent for normal usage. If the CPU
usage goes up more than 60 percent, especially in sustained high usage, the quality of
service (QoS) will degrade and phones will start to unregister. When this happens, the
phones will attempt to reregister with a VoIP server, which increases the load on the firewall
even more.
NAT
Network Address Translation (NAT), as defined in RFC 2663,¹ is a method by which IP
addresses are mapped from one realm to another, in an attempt to provide transparent
routing to hosts. Traditionally, NAT devices are used to connect an isolated address realm
with private unregistered addresses to an external realm with public unique registered
addresses. There are four different types of NAT based on RFC 3489,² as follows, even
though these well-known names are inadequate for describing real-life NAT behavior (see
the following Note):
1 Full cone NAT
In full cone NAT, all requests from the same internal IP address and port are mapped
to the same external IP address and port. Furthermore, any external host can send a
packet to the internal host, by sending a packet to the mapped external address.
2 Restricted cone NAT
In restricted cone NAT, all requests from the same internal IP address and port are
mapped to the same external IP address and port. Unlike a full cone NAT, an external
host (with IP address X) can send a packet to the internal host only if the internal host
had previously sent a packet to IP address X.
3 Port restricted cone NAT
Port restricted cone NAT is like a restricted cone NAT, but the restriction includes port
numbers. Specifically, an external host can send a packet, with source IP address X
and source port P, to the internal host only if the internal host had previously sent a
packet to IP address X and port P.
110 Chapter 5: VoIP Network Elements
4 Symmetric NAT
In symmetric NAT, all requests from the same internal IP address and port, to a
specific destination IP address and port, are mapped to the same external IP address
and port. If the same host sends a packet with the same source address and port, but
to a different destination, a different mapping is used. Furthermore, only the external
host that receives a packet can send a User Datagram Protocol (UDP) packet back to
the internal host.
Determining the type of NAT is important in many cases. Depending on what the
application wants to do, it may need to take the particular behavior into account.
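The four behaviors can be compared with a small simulation. This is a toy model written for this discussion, not an implementation of any real NAT: the class name Nat, the type labels, the starting port 40000, and the addresses are all assumptions, and only UDP-style mapping and filtering are modeled.

```python
class Nat:
    """Toy UDP NAT following the RFC 3489 classification (illustrative only)."""

    def __init__(self, nat_type, public_ip="161.10.10.10"):
        # nat_type: "full", "restricted", "port_restricted", or "symmetric"
        self.nat_type = nat_type
        self.public_ip = public_ip
        self._next_port = 40000
        self._mappings = {}      # mapping key -> external port
        self._owner = {}         # external port -> internal (ip, port)
        self._contacted = {}     # external port -> set of contacted (ip, port)

    def outbound(self, src, dst):
        """Translate an outgoing packet and return the external (ip, port)."""
        key = (src, dst) if self.nat_type == "symmetric" else src
        if key not in self._mappings:
            self._mappings[key] = self._next_port
            self._owner[self._next_port] = src
            self._contacted[self._next_port] = set()
            self._next_port += 1
        port = self._mappings[key]
        self._contacted[port].add(dst)
        return (self.public_ip, port)

    def inbound(self, remote, ext_port):
        """Return the internal host a packet reaches, or None if it is dropped."""
        if ext_port not in self._owner:
            return None
        contacted = self._contacted[ext_port]
        if self.nat_type == "full":
            accepted = True
        elif self.nat_type == "restricted":
            accepted = remote[0] in {ip for ip, _ in contacted}
        else:                    # port_restricted and symmetric
            accepted = remote in contacted
        return self._owner[ext_port] if accepted else None


phone = ("10.10.10.10", 5060)
gateway = ("162.10.10.10", 5060)
stranger = ("203.0.113.5", 5060)

for kind in ("full", "restricted", "port_restricted", "symmetric"):
    nat = Nat(kind)
    _, port = nat.outbound(phone, gateway)
    print(kind, nat.inbound(gateway, port), nat.inbound(stranger, port))
```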
NOTE RFC 3489 used the terms “full cone,” “restricted cone,” “port restricted cone,” and
“symmetric” to refer to different variations of NATs applicable to UDP only. Unfortunately,
this terminology has been the source of much confusion, as it has proven inadequate for
describing real-life NAT behavior. Therefore, RFC 4787 refers to specific individual NAT
behaviors instead of using the cone/symmetric terminology.
NOTE In the rest of this section, NAT means Network Address and Port Translation (NAPT).
Multiple internal hosts use only private addresses when communicating with each other,
and share the single public IP when communicating with external hosts.
The other benefit of NAT is providing access security, much like a firewall that blocks
incoming unsolicited packets. The access from internal to external hosts is relatively simple
depending on the type of NAT. However, the access from external to internal hosts is almost
impossible because the addresses of internal hosts are not publicly routable. The only way
to make it work is that the internal host makes a mapping first on the NAT and the external
host sends packets through the pinhole, which provides strict access security. However, this
benefit has serious side effects on VoIP.
NAT devices are application-unaware in that the translations are limited to IP, TCP, UDP,
Internet Control Message Protocol (ICMP) headers, and ICMP error messages only.
NAT devices do not change the payload of the packets, as payloads tend to be application-
specific. For this reason, there are serious issues with VoIP protocols, such as SIP/Session
Description Protocol (SDP). Figure 5-1 illustrates the NAT traversal issue with the SIP/SDP
protocol. User A (IP phone user) makes a call to User B (public switched telephone network
[PSTN] phone user) through NAT and media gateway.
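Figure 5-1 (not reproduced): User A’s IP phone (10.10.10.10) on the internal network (10.10.10.0) sends an INVITE through the NAT (161.10.10.10) to the media gateway (162.10.10.10), which connects to User B over the PSTN. The gateway has no media path to 10.10.10.10.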
The IP phone uses a private IP address (10.10.10.10) and the NAT device maps it to the
public address (161.10.10.10) whenever sending outbound packets. The media gateway has
a public IP address (162.10.10.10) and works as an endpoint on behalf of User B’s phone.
(Note that this example does not consider port numbers, just to simplify.)
The SIP INVITE message in Figure 5-1 is shown in Example 5-1.
Example 5-1 SIP/SDP Messages Through NAT
v=0
o=UserA 2890844526 2890844526 IN IP4 10.10.10.10
s=-
c=IN IP4 10.10.10.10
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000
v=0
o=UserA 2890844526 2890844526 IN IP4 10.10.10.10
s=-
c=IN IP4 10.10.10.10
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000
When the media gateway receives the INVITE message, it looks at the “c=” line in the SDP
to find out where it will send media. After establishing a SIP dialog, the media gateway tries
to send User B’s voice to the IP phone, but it fails because the private IP (10.10.10.10) is not
publicly routable. This is a typical problem when NATed endpoints try to communicate
with other external endpoints.
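The check that exposes this problem is simple to sketch: read the connection address from the SDP and test whether it is a private address. The helper names media_address and is_private_media are assumptions for this example; the SDP body is the one from Example 5-1.

```python
import ipaddress

SDP = """v=0
o=UserA 2890844526 2890844526 IN IP4 10.10.10.10
s=-
c=IN IP4 10.10.10.10
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000
"""


def media_address(sdp):
    """Return the address on the SDP connection (c=) line, or None."""
    for line in sdp.splitlines():
        if line.startswith("c="):
            return line.split()[-1]
    return None


def is_private_media(sdp):
    """True if the announced media address is private, so an external peer cannot reach it."""
    address = media_address(sdp)
    return address is not None and ipaddress.ip_address(address).is_private


print(media_address(SDP), is_private_media(SDP))
```

Because the NAT rewrites only the IP and UDP headers, the media gateway still sees 10.10.10.10 in the payload; a device that rewrote the c= line to 161.10.10.10 would make the same check return False.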
There are some sophisticated NAT devices that understand the application protocols and replace
all IP/port information in the application layer, but those devices are not commonly deployed yet.
Another typical problem happens when making inbound calls, supposing that the IP phone
of User A is registered to the media gateway. When User B makes a call to User A, the
media gateway sends an initial INVITE message to the IP phone based on the address-of-
record (IP phone’s mapped IP and port). That is, the registration process is very critical
because that is the only way for the media gateway to know the actual mapped IP/port
address of the IP phone. The problem happens when the registration interval is not short
enough (for example, the IP phone registers every 30 minutes), as the following three
examples show:
• If the NAT device refreshes the mapping table, the media gateway cannot reach the
IP phone until the next registration message comes in.
• If the internal address of the IP phone is changed, the media gateway cannot reach
the phone until the next registration message comes in.
• If the media gateway is rebooted (for example, because of system error) and loses
the registration information, it cannot reach the phone until the next registration
message comes in.
Of course, you can minimize the impact as long as you make the registration interval very
short, but doing that consumes more bandwidth and resources.
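The trade-off can be made concrete with some simple arithmetic. The model below is an assumption for illustration: the NAT mapping is taken to expire after a fixed idle lifetime, the phone sends nothing except its periodic registrations, and the figures in the usage example (a 60-second mapping lifetime and 10,000 phones) are invented.

```python
def worst_case_outage(registration_interval_s, mapping_lifetime_s):
    """Longest time (seconds) the phone can be unreachable between registrations."""
    return max(0, registration_interval_s - mapping_lifetime_s)


def registrations_per_hour(registration_interval_s, phones):
    """Registration messages per hour that the server and firewall must handle."""
    return phones * 3600 / registration_interval_s


for interval in (1800, 30):
    print(interval,
          worst_case_outage(interval, mapping_lifetime_s=60),
          registrations_per_hour(interval, phones=10_000))
```

Under these assumptions, a 30-minute interval leaves the phone unreachable for up to 29 minutes, while a 30-second interval removes the outage but multiplies the registration load by 60.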
The next topic is Session Border Controller, which is another important element for secure
VoIP service.
Figure (not reproduced): Session Border Controllers at the borders of the core network (service provider), with an enterprise customer access network and a consumer access network.
There are typically two network borders from a VoIP service provider’s perspective. One is
between the customer’s access network and the service provider’s network (core network).
The other is between the core network and the other service provider’s network (peer
network).
The customer’s access network is most likely that of the local Internet service provider
(ISP) who provides Internet access service, which is generally different from the telephony
service provider’s network. (Note that it is possible for the telephony service provider to
provide the access network, especially for enterprise customers.) The peer network is
typically a call-termination network, such as a PSTN termination or IP hand-off.
The role of SBC is, simply speaking, resolving border issues that include interoperability
and security issues as described in the following list:
• DoS (intentional flooding)—Malicious traffic from a large number of infected
devices around public networks (Distributed Denial-of-Service [DDoS]), or from an
attacker’s machine generating massive call requests. Most VoIP servers are vulnerable
to this type of attack because it’s very difficult to implement sophisticated access
control.
• DoS (unintentional flooding)—This is not malicious traffic, but the impact is almost
the same as intentional flooding. An example is a large number of registration requests
issued at the same time when power is restored after a widespread power outage.
• Exposed topology of core network—Most IP addresses and port numbers of VoIP
servers are exposed for public service, which means that attackers may send probe
messages to learn the characteristics of the servers and then generate many types of
malicious calls, such as spoofed or malformed messages.
• Traversing firewall or NAT—Most enterprise customers use firewall or NAT for
security purposes, but this may cause a one-way or no-audio issue when traversing
two different networks. The SBC can resolve this issue.
• Protocol conflict—Each service provider has its own VoIP protocol and there are
always interoperability issues between them, even if they use the same standard
protocol, such as coder-decoder (codec) conflict. Most issues are not directly related
to security, but some of them are related. For example, one requires Transport Layer
Security (TLS) connection when sending SIP messages, but the other does not.
• Regulatory mandate (lawful interception)—There is a complicated governmental
security issue when intercepting VoIP traffic in this border because of many different
types of call routing through heterogeneous networks. The details of lawful
interception are discussed in Part III, “Lawful Interception (CALEA),” in this book.
• Ensuring quality of service—This is a generic issue when VoIP traffic goes through
heterogeneous networks (not directly related to border security).
For details about SBC, refer to Chapter 8, “Protection with Session Border Controller.”
The call content is, for example, voice or video. The call data is a dialed number, call
direction, call duration or signaling information, and so on. The target subscriber is identified
generally by a phone number. The LEA could be any agency that is able to request the lawful
interception. For example, in the United States, the FBI or a police officer requests it with
a corresponding warrant.
LI in PSTN networks has been executed for a long time in most developed countries. The
scope of LI in this context is a VoIP network, managed by telecommunication service
providers (TSPs) who are being asked to meet legal and regulatory requirements for the
interception of voice and data communications in IP networks in a variety of countries
worldwide. Almost every developed country has its own LI requirements and has adopted
global standards (or proposals) fully or partially, developed by standard organizations.
The LI servers perform multiple functions to provide LI service. The functions are
broadly categorized as access, delivery, collection, service provider administration, and law
enforcement administration functions. Each function can be performed by its own logical
server. The relationship between these functional categories is shown in Figure 5-3.
[Figure 5-3 (labels recovered from the diagram): Lawful Authorization; Telecommunication
Service Provider; Access Function; Delivery Function; Service Provider Administration
Function]
In Figure 5-3, the Access Function, Delivery Function, and Service Provider Administration
Function are the responsibility of the TSP, and the Collection Function and Law
Enforcement Administration Function are the responsibility of the LEA.
116 Chapter 5: VoIP Network Elements
NOTE The name of each LI function is capitalized and abbreviated with a trailing letter
“F,” such as “AF” for “Access Function,” because these names are defined by LI
specifications and are not general terms.
Service Devices
The service devices are primarily designed for providing VoIP services like call setup,
media control, protocol conversion, voicemail access, user interaction, and so on. They also
have some security features in order to provide access control or protect service features.
The following sections give a brief description of service devices that are commonly used
in VoIP service.
• IP phone—An IP-based phone that converts digital signals to analog tones, and vice
versa. It communicates with a VoIP server (for example, Softswitch) to send and
receive calls. An example is the Cisco 7960 series.
Most IP phones are password-protected and provide an interface (for example, HTTP)
to enable or disable communication ports or security features (for example, media
encryption).
• Softphone—A software-based phone that runs on a computer. It relies on computer
resources (for example, sound cards, CPU, and memory) to process calls, which
makes it relatively easy to implement phone functions. An example is the Skype client
program.
Most softphones provide an interface to enable or disable communication ports or
security features, similar to IP phones. They also rely on the security features from the
operating system (for example, the Windows firewall).
• Analog Telephone Adapter (ATA)—An access device that has Foreign Exchange
Stations (FXS) and Ethernet interfaces. Regular analog phones are connected to the
FXS ports and send/receive calls through an IP network. An example is Cisco ATA 188.
The security features are almost the same as those of IP phones. Generally, an ATA provides
an interface to apply a different security policy to each port.
• Integrated Access Device (IAD)—An access device that provides multiple types
of interfaces to users, such as analog (FXS, FXO) and digital (T1/E1) interfaces.
Different types of phones (analog or IP phones) are connected to the interfaces and
send/receive calls through an IP trunk. Typically, it has proxy functions (for example,
SIP proxy) that negotiate call setup with external servers.
One of the well-known IADs is the Cisco IAD 2400 series. An IAD provides relatively
rich security features because of multiple interfaces, high volume of traffic, and
different types of call control. The features might include access control for adminis-
tration, user credential management, encryption, session key control, and ACL for
signal or media.
Besides these CPE, the following call processing servers are also essential service devices.
There are many different kinds of call processing servers, and the typical ones are as
follows:
• Protocol proxy—An intermediary entity that sets up call connections between
clients. It acts as both a receiver and sender for the purpose of making requests on
behalf of other clients.
A protocol proxy server mainly plays the role of routing, which means its job is to
ensure that a request is sent to another entity closer to the target user. It is also useful
for enforcing policy, such as applying call permissions depending on source or
destination numbers. It may interpret and rewrite specific parts of a request message
before forwarding it. Some examples of protocol proxies are SIP proxy, H.323
gatekeeper, and Media Gateway Control Protocol (MGCP) call agent.
The basic security mechanism is that protocol proxies rely on authentication of each
endpoint before setting up the call. The endpoints are generally authenticated based
on user ID, password, IP address, or credentials.
• Back-to-Back User Agent (B2BUA)—A logical SIP entity that receives a request
and processes it as a user agent server, and regenerates the request as a user agent
client to the target user agent. Unlike a SIP proxy, it maintains dialog state and
participates in all requests sent on the dialogs it has established. So, it has full control
of all messages.
B2BUA also relies on the authentication of each endpoint before processing the call,
as protocol proxies do.
• IP PBX—An IP-based PBX that manages internal calls within the domain and
terminates external calls through an IP trunk or PSTN media gateway. It has features
of legacy PBX plus advanced ones, such as call forward, call transfer, call routing, call
waiting, call parking, interactive voice response (IVR), music on hold, FMFM (Find
Me Follow Me), VoIP protocol conversion, SIP trunking, and so on. An example of
IP PBX is Asterisk.
Generally, IP PBX authenticates each phone based on phone number, MAC address,
IP address, user ID, or password.
• Softswitch—Works like a legacy Class 4 or 5 switch in the central office, providing
Class 4 features (for example, routing) and Class 5 features (for example, call transfer
and forward). However, the difference is that it is a program running on regular
operating systems (for example, Linux), and it is located in an IP network; that is, all
in and out traffic consists of IP packets. It typically has multiple VoIP protocol inter-
faces (for example, SIP, H.323, and MGCP) and converts them when passing through,
as a B2BUA. An example of Softswitch is Sylantro Softswitch.
Most Softswitches have rich security features like user authentication, message
encryption, media encryption, TLS (transport layer security), CAC (call admission
control), Denial-of-Service protection, and ACL (access control list).
• Rich media server—A controlling device that provides rich media communications
like voice, video, instant messaging (IM), presence, web collaboration, and multimedia
conferencing. An example is Cisco Unified Communications Manager. The security
features are almost the same as those of a Softswitch.
• Media gateway—A gateway device located in between different types of networks,
such as Time-Division Multiplexing (TDM), IP, and Next Generation Network
(NGN). Typically, it converts media packets (IP network) to digitized signals (TDM
network), and vice versa. An example of a media gateway is the Cisco AS5400 series.
The security features of media gateway include
— User authentication (for example, Password Authentication Protocol [PAP])
— Challenge Handshake Authentication Protocol (CHAP; see the “PAP Versus
CHAP” section)
— Multilevel password protection
— ACL
— DoS protection
— IP spoofing prevention
— Remote Authentication Dial-in User Service (RADIUS; see the “RADIUS
Versus TACACS+” section) for network access management
— Terminal Access Control Access System Plus (TACACS+; see the “RADIUS
Versus TACACS+” section)
— Logging
its own calculation of the expected hash value. If the values match, the authentication is
acknowledged; otherwise, the connection should be terminated. CHAP provides protection
against playback attack through the use of an incrementally changing identifier and a
variable challenge value. The use of repeated challenges is intended to limit the time of
exposure to any single attack. The authenticator is in control of the frequency and timing of
the challenges.
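The hash comparison described above can be sketched in a few lines. This is a minimal illustration, assuming the MD5 algorithm that RFC 1994 defines for CHAP (a hash over the identifier octet, the shared secret, and the challenge, in that order). The secret, identifier, and function names are hypothetical and exist only for this example.

```python
import hashlib
import hmac
import os


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """Hash the identifier octet, the shared secret, and the challenge (RFC 1994, MD5)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def authenticator_accepts(identifier: int, secret: bytes,
                          challenge: bytes, received: bytes) -> bool:
    """The authenticator computes its own expected value and compares the two."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, received)


# Hypothetical values for illustration only.
secret = b"shared-secret"
identifier = 1
challenge = os.urandom(16)          # a fresh, unpredictable challenge per attempt

response = chap_response(identifier, secret, challenge)
print(authenticator_accepts(identifier, secret, challenge, response))

# A response captured earlier does not match a new identifier and challenge,
# which is what defeats a simple playback attack.
print(authenticator_accepts(identifier + 1, secret, os.urandom(16), response))
```

Because the identifier and the challenge change for every exchange, a captured response is useless for a later exchange even though the secret itself never crosses the wire.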
Summary
The network architecture of VoIP security consists of two groups of devices: service
devices and security devices.
The security devices are primarily designed for providing security services like access
control, intrusion detection, DoS protection, lawful interception, and so on. Examples of
those devices are VoIP-aware firewall, NAT, SBC, and Lawful Interception server.
A VoIP-aware firewall has legacy firewall functions and additional capability to inspect and
manipulate VoIP packets in the application layer for secure service, based on predefined
policy. The capability includes dynamic port assignment, DoS protection, protocol message
inspection, and bandwidth control. The policy consists of the range of IP addresses, port
numbers, protocols, traffic directions, bandwidth consumption, and so on.
The benefit of NAT (NAPT) is reducing the usage of public IP addresses and providing
access security. The access from external to internal hosts through NAT is almost impossible
because the addresses of internal hosts are not publicly routable. The only way to make it
work is that the internal host makes a mapping first on the NAT and the external host sends
packets through the pinhole, which provides strict access security. However, this benefit has
serious side effects on VoIP. NAT devices are application-unaware in that the translations
are limited to IP, TCP, UDP, and ICMP.
Such NAT devices do not change the payload of the packets, as payloads tend to be
application-specific. For this reason, there can be one-way or no-media issues with VoIP
protocols such as SIP/SDP.
An SBC is a controlling device located on a border of two network sessions that are logical
boundaries of a VoIP network. The role of an SBC is resolving border issues that include
interoperability and security issues, such as Denial-of-Service, exposed topology of core
network, traversing firewall or NAT, protocol conflict, lawful interception, ensuring quality
of service, and so on.
LI, also known as wiretapping, is the lawfully authorized interception of communications
and call-identifying information for a particular telecommunication subscriber, requested
by a law enforcement agency. The LI servers perform multiple functions to provide LI
service. The functions are broadly categorized as AF, DF, CF, SPAF, and LEAF. Each
function could be performed by each logical server.
The service devices are primarily designed for providing VoIP services like call setup,
media control, protocol conversion, voicemail access, user interaction, and so on. As a
secondary purpose, most service devices provide limited security features. Examples of
those devices are CPE (IP phone, softphone, ATA, IAD) and call processing servers
(Softswitch, protocol proxy, B2BUA, IP PBX, rich media server, and media gateway).
There is no magic bullet; that is, no single device or architecture can protect the whole
VoIP service network. The best practice is to analyze current vulnerabilities and apply
a “consolidated” solution that includes all possible network devices.
End Notes
1 RFC 2663, “IP Network Address Translator (NAT) Terminology and Considerations,”
P. Srisuresh, M. Holdrege, August 1999.
2 RFC 3489, “STUN—Simple Traversal of UDP Through NAT,” J. Rosenberg,
J. Weinberger, C. Huitema, R. Mahy, March 2003.
3 RFC 1334, “PPP Authentication Protocols,” B. Lloyd, W. Simpson, October, 1992.
References
ATIS T1.678, Lawfully Authorized Electronic Surveillance (LAES) for Voice over Packet
Technologies in Wireline Telecommunications Networks, Alliance for Telecommunications
Industry Solutions, http://www.atis.org.
Cisco Unified Communications SRND, based on Cisco Unified Communications Manager
Release 6.x, http://www.cisco.com/en/US/products/sw/voicesw/ps556/products_
implementation_design_guide_book09186a008085eb0d.html.
RFC 1994, “PPP Challenge Handshake Authentication Protocol (CHAP),” W. Simpson,
August 1996.
RFC 2865, “Remote Authentication Dial in User Service (RADIUS),” C. Rigney, S. Willens,
A. Rubens, W. Simpson, June 2000.
RFC 3261, “SIP (Session Initiation Protocol),” J. Rosenberg, H. Schulzrinne, G. Camarillo,
A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler, June 2002.
RFC 4787, “Network Address Translation (NAT) Behavioral Requirements for Unicast
UDP,” F. Audet, C. Jennings, January 2007.
Security Considerations for VoIP Systems, NIST (National Institute of Standards and
Technology), January 2005.
PART II
VoIP Security Best Practices
Chapter 6 Analysis and Simulation of Current Threats
NOTE The contents in this chapter are written more from the perspective of the enterprise or
service provider that sees and manages VoIP networks from access to core network. The
end users also can utilize the contents of this chapter to identify current threats and gain
hands-on experience with them.
Denial of Service
A Denial-of-Service (DoS) attack is the most common threat in VoIP networks, and it
comes in many different patterns, more than the known patterns in a pure data network (for
example, TCP SYN attack; see the following Note). The typical method of DoS is flooding;
for example, an attacker floods a targeted system with heavy valid or invalid traffic, and as a
result the performance drops significantly or the system breaks down.
NOTE TCP SYN attack—The basic method of TCP connection between a client and a server is
as follows:
Step 1 (Client) SYN -------------> (Server)
Step 2 (Client) <------------- SYN-ACK (Server)
Step 3 (Client) ACK -------------> (Server)
The client sends a SYN message to the server (Step 1), and the server acknowledges the
SYN message by sending a SYN-ACK message (Step 2), and the client finishes establishing the
connection by replying with an ACK message (Step 3). This flow applies to all TCP
connections like Telnet, web, email, and so on.
An attacker (client) exploits this connection mechanism by flooding SYN messages with
spoofed IP addresses, and then never responding to SYN-ACK from the server. That is, the
server has sent acknowledgment (SYN-ACK) for the flooded SYN, but has not received
ACK from the client (attacker), which leaves a lot of “half-open” connections on the server.
The data structure of the half-open connections on the server system will eventually fill, and
then the system will not be able to accept new legitimate connections until the table is
emptied out.
There is generally a timeout for the pending connections so that the half-open connections
will eventually expire and be dumped out. However, as long as the attacker keeps sending
the flooded SYN with spoofed IPs, the timeout does not help.
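The half-open connection table and its timeout can be modeled with a small simulation. This is only a toy sketch, not how any real TCP stack is implemented. The table size (128 entries), the 30-second timeout, and the attack rate of one spoofed SYN every 10 ms are assumptions chosen for illustration.

```python
from collections import OrderedDict


class SynBacklog:
    """Toy model of a server's table of half-open TCP connections."""

    def __init__(self, capacity: int, timeout_ms: int):
        self.capacity = capacity
        self.timeout_ms = timeout_ms
        self.pending = OrderedDict()   # source -> arrival time of its SYN (ms)

    def _expire(self, now_ms: int) -> None:
        # Entries are kept in arrival order, so only the oldest need checking.
        while self.pending:
            source, arrived = next(iter(self.pending.items()))
            if now_ms - arrived < self.timeout_ms:
                break
            self.pending.popitem(last=False)

    def on_syn(self, source, now_ms: int) -> bool:
        """Return True if the SYN gets a slot (a SYN-ACK would be sent)."""
        self._expire(now_ms)
        if source in self.pending:
            return True                # retransmitted SYN, slot already held
        if len(self.pending) >= self.capacity:
            return False               # table full: the new connection is refused
        self.pending[source] = now_ms
        return True

    def on_ack(self, source) -> bool:
        """The final ACK completes the handshake and frees the slot."""
        return self.pending.pop(source, None) is not None


def legit_client_accepted(attack: bool) -> bool:
    backlog = SynBacklog(capacity=128, timeout_ms=30_000)
    if attack:
        # One spoofed SYN every 10 ms for 45 seconds; no ACK ever follows.
        for i in range(4_500):
            backlog.on_syn(("spoofed", i), now_ms=i * 10)
    return backlog.on_syn(("legit", 0), now_ms=45_005)


print("without attack:", legit_client_accepted(attack=False))
print("during attack: ", legit_client_accepted(attack=True))
```

In this model the expired slots are refilled by the next spoofed SYNs as soon as they free up, which is why the timeout alone does not help while the flood continues.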
There are many solutions in the market for data networks, but you cannot apply those in
VoIP networks directly because there are many VoIP protocol-specific attacks, such as SIP
INVITE message flooding. This section focuses on VoIP protocol layer attacks, rather than
network layer.
Not only malicious, intentional attacks but also unintentional ones (so-called
self-attacks) are possible because of wrong configuration of devices, architectural service
design issues, or unique circumstances.
The first example of DoS is intentional flooding, as described in the following section.
Intentional Flooding
Intentional flooding includes all malicious DoS flooding typically from external attackers.
The typical methods of flooding are as follows:
• Valid or invalid registration flooding—An attacker commonly uses this method
because most registration servers accept requests from any endpoint on the Internet
as an initial step of authentication. Regardless of whether the messages are valid or
invalid, a large number of request messages in a short period of time (for example,
10,000 SIP REGISTER messages per second) severely degrades the performance of the
server.
• Valid or invalid call request flooding—Most VoIP servers have a security feature
that blocks flooded call requests from unregistered endpoints. So, an attacker registers
first after spoofing a legitimate user (assuming the attacker stole the identity), and
sends flooded call requests in a short period of time (for example, 10,000 SIP INVITE
messages per second). It impacts the performance or functionality of the server
regardless of whether the request message is valid or not.
• Call control flooding after call setup—An attacker may flood valid or invalid call
control messages (for example, SIP INFO, NOTIFY, Re-INVITE) after call setup.
Most proxy servers are vulnerable because they do not have a security feature to
ignore and drop those messages.
• Ping flooding—Like Internet Control Message Protocol (ICMP) ping, VoIP protocols
use ping messages in the application layer, such as a SIP OPTIONS message, to check
the availability of a server or to keep the pinhole open in the local Network Address
Translation (NAT) device. Although most IP network devices (for example, routers
or firewalls) in a production network do not allow ICMP pings for security reasons,
many VoIP servers must allow the application-layer ping for proper serviceability,
which can be a critical security hole.
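When you examine a packet capture, it helps to sort requests into the four flooding types listed above. The following sketch does that by reading only the SIP request line. The category mapping is an assumption made for illustration, and it is deliberately simplified: telling an initial INVITE from a Re-INVITE would require inspecting the dialog (for example, the To tag), which is not done here.

```python
from collections import Counter

# Simplified, hypothetical mapping from SIP method to the flooding types above.
FLOOD_CATEGORY = {
    "REGISTER": "registration",
    "INVITE": "call request",
    "INFO": "call control",
    "NOTIFY": "call control",
    "OPTIONS": "ping",
}


def sip_method(message: str):
    """Return the method of a SIP request, or None for responses and non-SIP data."""
    request_line = message.split("\r\n", 1)[0]
    parts = request_line.split(" ")
    # A request line has the form: METHOD Request-URI SIP-Version
    if len(parts) == 3 and parts[2].startswith("SIP/"):
        return parts[0]
    return None


def summarize(messages):
    """Count messages per (source IP, flooding category)."""
    counts = Counter()
    for source_ip, message in messages:
        category = FLOOD_CATEGORY.get(sip_method(message), "other")
        counts[(source_ip, category)] += 1
    return counts


# Hypothetical captured traffic: (source IP, raw message) pairs.
captured = [
    ("10.0.0.5", "OPTIONS sip:proxy.example.com SIP/2.0\r\n\r\n"),
    ("10.0.0.5", "OPTIONS sip:proxy.example.com SIP/2.0\r\n\r\n"),
    ("10.0.0.9", "REGISTER sip:example.com SIP/2.0\r\n\r\n"),
    ("10.0.0.9", "SIP/2.0 200 OK\r\n\r\n"),
]
for key, count in sorted(summarize(captured).items()):
    print(key, count)
```

A source whose count in one category is far above that of its peers during the same interval is a candidate for closer inspection.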
Simulation
Simulating a flooding attack can give you a better idea and understanding of the threat. A
tool that generates malicious traffic is generally called a negative testing tool, which is used
by testers who validate security features of a product. Of course, attackers use the tool in a
malicious way.
There are many negative testing tools for generating flooded messages. One of the most
popular tools is SIPSAK (SIP Swiss Army Knife), which is freeware; you can download it
from http://www.sipsak.org/#download.
The contents of this section refer to the SIPSAK website.1
SIPSAK is a Session Initiation Protocol (SIP) stress and diagnostics utility. It sends SIP
requests to the server within the sip-uri and examines received responses. It runs in one of
the following modes:
• default mode—A SIP message is sent to destination in sip-uri and reply status
is displayed. The request is either taken from a filename or generated as a new
OPTIONS message.
• traceroute mode (-T)—This mode is useful for learning the request’s path. It
operates similarly to IP-layer utility traceroute.
• message mode (-M)—This mode sends a short message (similar to short message
service [SMS] from mobile phones) to a given target. With the option -B, the content
of the MESSAGE can be set. The options -c and -O might be useful in this mode.
• usrloc mode (-U)—Stress mode for SIP registrar. SIPSAK keeps registering to a SIP
server at a high pace. Additionally, the registrar can be stressed with the -I or the -M
option. If -I and -M are omitted, SIPSAK can be used to register any given contact
(with the -C option) for an account at a registrar and to query the current bindings for
an account at a registrar.
• randtrash mode (-R)—Parser torture mode. SIPSAK keeps sending randomly
corrupted messages to torture a SIP server’s parser.
• flood mode (-F)—Stress mode for SIP servers. SIPSAK keeps sending requests to a
SIP server at a high pace.
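To make the default mode more concrete, the following sketch assembles a minimal SIP OPTIONS request as text. It is not SIPSAK's own code or its exact output; it only includes the header fields that RFC 3261 requires in a request. The host names, the user name, and the function name are hypothetical, and the function builds the message without sending it.

```python
import uuid


def build_options(target_uri: str, local_host: str,
                  local_port: int = 5060, user: str = "tester") -> str:
    """Assemble a minimal SIP OPTIONS request (text only, nothing is sent)."""
    # RFC 3261 requires the Via branch parameter to start with this magic cookie.
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        f"OPTIONS {target_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{local_host}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{target_uri}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        "CSeq: 1 OPTIONS",
        "Accept: application/sdp",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line closes the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical lab addresses.
print(build_options("sip:proxy.lab.example.com", "pc.lab.example.com"))
```

The message is small, and that is part of the threat described in this chapter: a sender spends almost nothing to produce it, while the server has to parse each one and build a response.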
Here is the full syntax for generating traffic with SIPSAK. You also can see it by typing the
command in the command-line interface (CLI).
shoot : sipsak [-f FILE] [-L] -s SIPURI
trace : sipsak -T -s SIPURI
usrloc : sipsak -U [-I|M] [-b NUMBER] [-e NUMBER] [-x NUMBER] [-z NUMBER] -s SIPURI
usrloc : sipsak -I|M [-b NUMBER] [-e NUMBER] -s SIPURI
usrloc : sipsak -U [-C SIPURI] [-x NUMBER] -s SIPURI
message: sipsak -M [-B STRING] [-O STRING] [-c SIPURI] -s SIPURI
flood : sipsak -F [-e NUMBER] -s SIPURI
random : sipsak -R [-t NUMBER] -s SIPURI
To insert a forwarding contact for myself at work to me at home for one hour and
authenticate with password if required:
Prompt> sipsak -U -C sip:me@home -x 3600 -a password -s sip:myself@company
To query the currently registered bindings for myself at work and authenticate with
password if required:
Prompt> sipsak -I -C empty -a password -s sip:myself@work
To send the instant message “Lunch time!” to a colleague and show the result:
Prompt> sipsak -M -v -s sip:colleague@work -B "Lunch time!"
Now try to generate flooding traffic to a targeted SIP proxy server with SIPSAK. Here are
the assumptions before the test:
• SIPSAK is installed on the Microsoft Windows system of your PC, which has IP
connectivity with the proxy server.
• You already know the IP address (or fully qualified domain name [FQDN]) of the SIP
proxy server.
• The phone number you test is a known number for a proxy server so that it will not
reject the request message from SIPSAK.
• The network bandwidth between your PC and proxy server is enough to pass heavy
traffic during a short period of time. It requires at least 1 Mbps.
WARNING This flooding test may cause serious damage to the performance or functionality of the SIP
proxy server. It is highly recommended to use a lab system so that there is no real customer
traffic. Do not try this on your production system or service provider’s network.
Step 3 This is a pretrial step. Send flooded SIP OPTIONS messages to the
targeted SIP proxy server for 5 seconds as shown in Example 6-1 and
stop sending (press Ctrl-C) as shown in Example 6-2. Make sure that
there is no error message, and that those packets are going out to proxy
server properly (you may use Ethereal trace to confirm).
Example 6-1 Execution of SIP OPTIONS Message Flooding
C:\sipsak>
NOTE As you can see in Examples 6-1 and 6-2, SIPSAK generated about 15,000 SIP OPTIONS
messages in 5 seconds; that is, about 3,000 messages per second. You can increase or
decrease the number of messages per second by adding or deleting the -v parameter in the
command line. Refer to the command syntax as explained previously.
Step 4 Log in to the SIP proxy server, turn on the system monitoring tool, and
then check out the current usage of system resources, such as CPU and
memory. Because this is before the flooding, you are supposed to see
normal and low resource usage. Figure 6-1 is an example of a resource
monitoring tool in the UNIX or Linux system using the top command.
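If you prefer to record the idle figure rather than watch the top screen, the same number can be derived from two samples of the aggregate cpu line in /proc/stat on Linux. The sketch below is one possible way to do that, not the method used in this test. The sample values are made up so that the result matches an 8.2 percent idle reading.

```python
from typing import Tuple


def cpu_times(stat_line: str) -> Tuple[int, int]:
    """Return (idle, total) tick counts from the aggregate 'cpu' line of /proc/stat."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Fields: user nice system idle iowait irq softirq steal (guest fields ignored,
    # because guest time is already included in user time).
    values = [int(v) for v in fields[1:9]]
    return values[3], sum(values)


def idle_percent(before: str, after: str) -> float:
    """Percentage of time the CPU was idle between two samples."""
    idle0, total0 = cpu_times(before)
    idle1, total1 = cpu_times(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("samples must be taken at different times")
    return 100.0 * (idle1 - idle0) / elapsed


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (Linux only); not called in this example."""
    with open(path) as stat_file:
        return stat_file.readline()


# Made-up samples taken before and during a flood.
before = "cpu  100 0 50 850 0 0 0 0 0 0"
during = "cpu  700 0 368 932 0 0 0 0 0 0"
print(f"idle: {idle_percent(before, during):.1f}%")
```

On a live Linux server, you would call read_cpu_line() twice, a second or so apart, and pass both lines to idle_percent().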
NOTE If your proxy server is running on a Windows system, you can monitor the resources by
activating the Task Manager, as shown in Figure 6-2.
Step 5 While continuing to watch the monitoring screen, execute the flooding as
instructed in Step 3. You may need to generate for more than 5 seconds
to see consistent results on the server. Figure 6-3 shows an example.
NOTE Depending on the capacity of the server machine, the resource usage could vary. You may
adjust the verbosity option (-v) or time duration of flooding to see the degree of impairment
on the server.
Step 6 You may make a regular call with an IP phone while the server is attacked
by the flooding in Step 5 in order to experience the degradation (that is,
late call setup time) or outage of the service.
Analysis
As you saw in Figure 6-3, 3,000 SIP OPTIONS messages per second consume more than
90 percent of the CPU (8.2% idle), which is a very critical state. If the high usage is
sustained, the proxy server cannot provide proper service for other call attempts; in the
worst case, it could go down or reboot.
Additionally, keep in mind that the proxy server being tested in this example is a well-
known carrier-grade product, even though it is designed for a lab environment with a little
lower hardware specification. Even a high-performance machine would suffer the same
problems if the verbosity option is increased.
How is it possible for one PC to create a serious impairment on a carrier-grade machine that
is designed to serve many customers, say, 50,000 endpoints? The reason is that the
requests arrive at a very high rate within a short period of time.
Table 6-1 describes the difference between regular heavy traffic during busy hours and
malicious flooding.
Table 6-1 Comparison Between Regular Heavy Traffic and Malicious Flooding
What kind of VoIP messages (signals) can be a tool of flooding? One example is, as you
have already seen, SIP OPTIONS. There are many more messages, and they can be
categorized as shown in Table 6-2.
Table 6-2 Flooding Messages
Mitigation
How can you mitigate those VoIP flooding attacks? There are several ways to mitigate, and
they can be summarized as follows for your best practice:
• Do not allow ping messages, such as SIP OPTIONS, from endpoints. Also, do not
allow the process to make response messages, so that the CPU of the server will not
be engaged.
• Limit the number of registration requests within a certain period of time, such as only
up to five times per second. The server must drop the excess packets without CPU
engagement.
• Require credentials for registration and call requests so that unauthorized endpoints
cannot occupy the resources of the server, such as registration cache or memory for
call state information. Also, a server should limit the number of rejection messages to
save CPU resources.
• Do not allow multiple simultaneous call requests from a single endpoint, except for
multiple lines on a single endpoint.
• Maintain a “black list,” put the misbehaving endpoint into the list, and drop all
messages from it for a certain period of time, say, 60 seconds.
• Limit the total number of messages from a specific endpoint within a certain period of
time, such as up to 30 messages in 30 seconds.
The number can be calculated based on the normal behavior of the endpoint. Drop the
excess packets or put the endpoint into the black list to demote it.
• Limit the total bandwidth for each endpoint and put it into the black list for a certain
period of time when it exceeds the bandwidth. You may need to calculate the
maximum bandwidth for normal phone usage, which varies depending on protocol
and type of services.
• Use ACL (Access Control List) to block the source of unauthorized IP traffic.
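Two of the practices above, the per-endpoint message limit and the black list, can be combined in a short sketch. The thresholds (30 messages in 30 seconds, a 60-second penalty) are the example figures from the list; the class and its names are hypothetical. The code illustrates the policy only. A production device would enforce it in a fast path before the SIP stack, so that dropped packets do not engage the CPU.

```python
from collections import defaultdict, deque


class EndpointLimiter:
    """Sliding-window message limit per endpoint, with a temporary black list."""

    def __init__(self, max_messages: int = 30, window: float = 30.0,
                 penalty: float = 60.0):
        self.max_messages = max_messages
        self.window = window
        self.penalty = penalty
        self.history = defaultdict(deque)   # endpoint -> timestamps of accepted messages
        self.blocked_until = {}             # endpoint -> time the black list entry ends

    def allow(self, endpoint: str, now: float) -> bool:
        """Return True if the message should be processed, False if it is dropped."""
        if self.blocked_until.get(endpoint, 0.0) > now:
            return False                    # still on the black list
        self.blocked_until.pop(endpoint, None)

        recent = self.history[endpoint]
        while recent and now - recent[0] >= self.window:
            recent.popleft()                # forget messages outside the window

        if len(recent) >= self.max_messages:
            # Limit exceeded: black-list the endpoint and drop this message.
            self.blocked_until[endpoint] = now + self.penalty
            recent.clear()
            return False

        recent.append(now)
        return True


limiter = EndpointLimiter()
accepted = sum(limiter.allow("192.0.2.10", now=i * 0.001) for i in range(3000))
print("accepted from flooding endpoint:", accepted)
print("well-behaved endpoint allowed:", limiter.allow("192.0.2.20", now=3.0))
```

The limit is tracked per endpoint, so a flooding source is penalized without affecting other endpoints. An attacker who spoofs many source addresses weakens this scheme, which is one reason to pair it with authentication and ACLs.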
NOTE Legacy network-layer security tools, such as a generic firewall that blocks ICMP ping
or TCP SYN attacks, cannot mitigate these application (VoIP protocol) layer attacks.
The methods of mitigation described in this section are just guidelines for people who
prepare secure VoIP networks, especially flooding protection. More specific or different
methods of mitigation are possible, depending on the network architecture, type of service
(ToS), type of protocol, type of endpoints, and so on.
Many VoIP servers like a Softswitch have some of the features described in this section, but
they are very limited because the server manufacturers believe that a VoIP server itself is
not a security device like a firewall. In reality, it is not a recommended method for the VoIP
server to have all the security features, which use up significant resources as well. For
example, maintaining a black list requires extra processing power and memory.
That is why an efficient method of mitigation is using external VoIP security devices like
Session Border Controller (SBC), IP-to-IP gateway, or VoIP-aware firewall. Chapter 8,
“Protection with Session Border Controller,” and Chapter 9, “Protection with Enterprise
Network Devices,” show the details of the usage of those devices to mitigate the flooding.
Unintentional Flooding
It is possible to see unintentional flooding in a production service environment without any
malicious external attacks. It is not common, but it happens because of wrong configuration
of devices, wrong network design, unique circumstances or misbehaving devices, and so on.
It is generally easier to isolate and fix the problem of unintentional flooding compared to
malicious flooding, because the legitimate devices are under the enterprise’s or service
provider’s control. However, it is apt to damage the VoIP service quickly without being
filtered by security devices.
This section analyzes the well-known cases and shows the methods of mitigation. Because
of the lack of tools, actual simulation of each case is not included.
Analysis
Here are three well-known cases and their analysis: global power outage and backup,
wrong configuration of devices, and misbehaving endpoints.
The reason for the short timer in the default setting is that many phone manufacturers
consider the worst case of NAT traversal: Some local NAT devices clear idle entries from
the mapping table very quickly, for example after 30 seconds, so the phone should send a
dummy packet (for example, SIP OPTIONS; see the following Note) at intervals of less than
30 seconds in order to keep the pinhole open.
You may need to use the short timer if your service network is like that case, but it is not
necessary in most cases.
NOTE Using SIP messages, such as OPTIONS, is a common way of keeping the pinhole of NAT
open, but the SIP Working Group does not recommend this method any more because of
performance issues, as specified in draft-ietf-sip-outbound-13. Instead, the SIP Working
Group selected the Session Traversal Utilities for NAT (STUN) mechanism, which is very
robust, far less CPU-intensive, and allows the detection of a changed IP address. For more
information, refer to the draft and RFC 3489 (“Simple Traversal of UDP Through NATs”).
Not only endpoint devices, but also VoIP servers (registrar, gatekeeper, or call agent) may
have wrong configuration that generates unnecessary heavy traffic like these examples:
• Too short an interval between ping messages from a server (for example, MGCP AUEP,
SIP OPTIONS)
• Too short a registration timer, which makes endpoint devices send registrations too
frequently
You may need to set the server to have short intervals depending on service type or local
environment, but it is not necessary in most cases.
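A quick calculation shows why timer settings matter. The sketch below estimates the steady message rate that a population of endpoints produces for a given interval. The 50,000-endpoint figure is the example size used earlier in this chapter; the intervals are hypothetical, and the model assumes that the endpoints are spread evenly over time.

```python
def messages_per_second(endpoints: int, interval_seconds: float) -> float:
    """Average message rate when every endpoint sends once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return endpoints / interval_seconds


endpoints = 50_000   # example network size used earlier in this chapter

# Hypothetical timer settings, for comparison only.
for label, interval in [("25-second keepalive", 25),
                        ("5-minute keepalive", 300),
                        ("1-hour registration", 3600)]:
    rate = messages_per_second(endpoints, interval)
    print(f"{label}: {rate:,.1f} messages per second")
```

Under these assumptions, a 25-second keepalive alone produces a load of the same order as the 3,000 messages per second used in the flooding test. After a power outage the endpoints are not spread evenly, so the burst can be far worse than this average.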
Misbehaving Endpoints
There is a well-known flood problem with Address Resolution Protocol (ARP) broadcast
that you might have seen before; for example, one PC has a network interface card (NIC)
problem and floods lots of ARP packets sucking up all internal network bandwidth. It
causes people to complain about the delay of downloading files, emails, or web contents.
Sound familiar? Even in a VoIP network, you may see a similar situation because of
device error.
A few years ago, when I was working for a Softswitch company, I found there was
significant call setup delay in our corporate VoIP network along with high bandwidth
consumption, even though not many users used the phones. Eventually, the problem was
isolated: One MGCP phone that had a firmware or hardware error kept flooding RSIP
messages until it was accepted by the Softswitch.
As in my experience, it gets worse if those problematic phones and servers are within
the same corporate network, because security devices like an SBC are unlikely to be
involved in internal traffic.
Software (firmware) or hardware problems could create this kind of unexpected flooding,
especially if multiple or anonymous types of endpoints are involved in the service network.
Mitigation
Mitigating unintentional (internal) flooding is somewhat easier than mitigating an
intentional (external) attack, because the root cause is within your control. The basic
steps of mitigation are as follows:
Step 1 Monitor traffic.
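As one possible way to monitor traffic for a flooding endpoint, the following sketch counts messages per source address in a sliding time window. The class name, the limit, and the window length are hypothetical choices for illustration, not values taken from any product.

```python
from collections import defaultdict, deque


class FloodDetector:
    """Flags a source that sends more than `limit` messages within `window_s` seconds."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._seen = defaultdict(deque)  # source IP -> timestamps of recent messages

    def observe(self, source_ip: str, timestamp: float) -> bool:
        """Record one message; return True if the source now exceeds the limit."""
        recent = self._seen[source_ip]
        recent.append(timestamp)
        # Forget messages that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window_s:
            recent.popleft()
        return len(recent) > self.limit


if __name__ == "__main__":
    detector = FloodDetector(limit=5, window_s=1.0)
    # A hypothetical phone repeating the same message every 50 ms.
    for i in range(10):
        if detector.observe("192.168.11.11", i * 0.05):
            print(f"message {i}: source exceeds 5 messages per second")
```

A counter of this kind could feed an alarm or a temporary block, so that one faulty phone is isolated before it affects call setup for everyone else.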
Malformed Messages
Here is an experience that I had a few years ago:
Our demo Softswitch in the lab suddenly went down one day. I checked the network traffic,
but found nothing suspicious, such as an intrusion or DoS flooding. The internal logs of the
Softswitch did not show any error critical enough to crash the whole system either. It
returned to normal after I rebooted it, but it crashed again a little while later. So I
captured and analyzed all packets and found the root cause: the Softswitch went into an
infinite loop after receiving a call from one IP phone that sent a malformed SIP header
(the character “>” was missing in the Contact header, as in the following example):
Contact: <sip:192.168.11.11;transport=udp
The bug was fixed on both IP phone and Softswitch, and everything worked fine.
It may sound funny that one syntax error on the VoIP message caused the server crash, but
this could happen in a real service environment, not only caused by a system bug but also
by an external attacker generating massive malformed messages on purpose.
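The missing “>” case can be handled defensively with a bounded search and an explicit error path. The following is a minimal sketch, not the fix that was actually applied to the Softswitch: the function name, the exception class, and the 1024-character limit are hypothetical, and only the bracketed (name-addr) form of the Contact header is handled.

```python
MAX_HEADER_LEN = 1024  # hypothetical upper bound on one header line


class MalformedHeader(ValueError):
    """Raised when a header line cannot be parsed safely."""


def parse_contact_uri(line: str) -> str:
    """Return the URI enclosed in <...> of a Contact header line.

    Every search is a single bounded scan of the line, so a missing
    delimiter leads to an exception instead of an unbounded loop.
    """
    if len(line) > MAX_HEADER_LEN:
        raise MalformedHeader("header line too long")
    name, sep, value = line.partition(":")
    if not sep or name.strip().lower() not in ("contact", "m"):
        raise MalformedHeader("not a Contact header")
    start = value.find("<")
    if start == -1:
        raise MalformedHeader("missing '<'")
    end = value.find(">", start + 1)
    if end == -1:
        raise MalformedHeader("missing closing '>'")
    uri = value[start + 1:end].strip()
    if not uri.lower().startswith(("sip:", "sips:")):
        raise MalformedHeader("unsupported URI scheme")
    return uri


if __name__ == "__main__":
    print(parse_contact_uri("Contact: <sip:192.168.11.11;transport=udp>"))
    try:
        parse_contact_uri("Contact: <sip:192.168.11.11;transport=udp")
    except MalformedHeader as err:
        print("rejected:", err)
```

The point of the design is that the parser stops and reports the error as soon as the syntax is wrong, rather than continuing to look for a delimiter that never arrives.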
In fact, it is hard to prevent those attacks because there are so many possibilities of wrong
syntax. That is, it is very difficult to code the fixes case by case. Fortunately, many testing
tools have been released recently, and they help to verify the vulnerability before deploying
a production system.
This section discusses those malformed messages from simulation to analysis and
mitigation.
Simulation
There are many tools for testing malformed messages in the market. Some of them are free
to use and already verified by many users, such as PROTOS created by the University of
Oulu in Finland. For your simulation, download the PROTOS SIP version from the
following link, or you can search “PROTOS” in the university website (www.ee.oulu.fi).
The contents in this section refer to the website:2
http://www.ee.oulu.fi/research/ouspg/protos/testing/c07/sip/#download
Download the latest version of test-material as well. Because it is a JAR package, your PC
should have the Java Virtual Machine to run it.
PROTOS manipulates only the SIP INVITE message and generates 4,527 INVITE test cases
(all but one of them malformed) for the whole test. One of the examples is shown in Example 6-3.
Example 6-3 Malformed SIP Message
Message body
Session Description Protocol
Session Description Protocol Version (v): = = = = = = 0
Owner/Creator, Session Id (o): 2 2 2 IN IP4 CAL-D600-5814.cc-ntd1.covad.com
Session Name (s): Session SDP
Connection Information (c): IN IP4 192.168.10.10
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 9876 RTP/AVP 0
Media Attribute (a): rtpmap:0 PCMU/8000
The INVITE message contains three malformed fields: one in the Request-URI (aaaaaaaaa),
another in the From header (;;;;;;;;;), and the third in the SDP version (======). In an
actual test run, PROTOS sends such malformed messages one at a time.
All the test cases (4,527) that PROTOS executes are grouped in Table 6-4.
Table 6-4 Test Cases of PROTOS
Name (1) | Exceptional Elements (2) | First Index # (3) | Test Cases (3)
valid | n/a | 0 | 1
SIP-Method | overflow-general, overflow-space, overflow-null, fmtstring, utf-8, ansi-escape | 1 | 193
SIP-Request-URI | sip-URI | 194 | 61
SIP-Version | sip-version | 255 | 75
SIP-Via-Host | ipv4-ascii | 330 | 106
SIP-Via-Hostcolon | overflow-colon | 436 | 16
SIP-Via-Hostport | integer-ascii | 452 | 46
SIP-Via-Version | sip-version | 498 | 75
SIP-Via-Tag | sip-tag | 573 | 57
SIP-From-Displayname | overflow-general, overflow-space, overflow-null, fmtstring, utf-8, ansi-escape | 630 | 193
SIP-From-Tag | sip-tag | 823 | 57
SIP-From-Colon | overflow-colon | 880 | 16
SIP-From-URI | sip-URI | 896 | 61
SIP-Contact-Displayname | overflow-general, overflow-space, overflow-null, fmtstring, utf-8, ansi-escape | 957 | 193
SIP-Contact-URI | sip-URI | 1150 | 61
SIP-Contact-Left-Paranthesis | overflow-leftbracket | 1211 | 16
SIP-Contact-Right-Paranthesis | overflow-rightbracket | 1227 | 16
SIP-To | overflow-general, overflow-space, overflow-null, fmtstring, utf-8, ansi-escape | 1243 | 193
SIP-To-Left-Paranthesis | overflow-leftbracket | 1436 | 16
SIP-To-Right-Paranthesis | overflow-rightbracket | 1452 | 16
SIP-Call-Id-Value | overflow-general, overflow-space, overflow-null, fmtstring, utf-8, ansi-escape | 1468 | 193
SIP-Call-Id-At | overflow-at | 1661 | 16
SIP-Call-Id-Ip | ipv4-ascii | 1677 | 106
SIP-Expires | integer-ascii | 1783 | 46
SIP-Max-Forwards | integer-ascii | 1829 | 46
SIP-Cseq-Integer | integer-ascii | 1875 | 46
SIP-Cseq-String | overflow-general, overflow-space, overflow-null, fmtstring, utf-8, ansi-escape | 1921 | 193
SIP-Content-Type | overflow-general, overflow-space, overflow-null, fmtstring, utf-8, ansi-escape, content-type | 2114 | 247
SIP-Content-Length | integer-ascii | 2361 | 46
SIP-Request-CRLF | crlf | 2407 | 10
CRLF-Request | crlf | 2417 | 10
SDP-Attribute-CRLF | crlf | 2427 | 10
SDP-Proto-v-Identifier | overflow-general, overflow-space, overflow-null, fmtstring, utf-8, ansi-escape | 2437 | 193
SDP-Proto-v-Equal | overflow-equal | 2630 | 16
SDP-Proto-v-Integer | integer-ascii | 2646 | 46
SDP-Origin-Username | overflow-general, overflow-space, overflow-null, fmtstring, utf-8, ansi-escape | 2692 | 193
SDP-Origin-Sessionid | integer-ascii | 2885 | 46
SDP-Origin-Networktype | overflow-general, overflow-space, overflow-null, fmtstring, utf-8, ansi-escape | 2931 | 193
SDP-Origin-Ip | ipv4-ascii | 3124 | 106
SDP-Session | overflow-general, overflow-space, overflow-null, fmtstring, utf-8, ansi-escape | 3230 | 193
SDP-Connection-Networktype | overflow-general, overflow-space, overflow-null, utf-8, fmtstring | 3423 | 188
SDP-Connection-Ip | ipv4-ascii | 3611 | 106
SDP-Time-Start | integer-ascii | 3717 | 46
SDP-Time-Stop | empty | 3763 | 1
SDP-Media-Media | overflow-general, overflow-space, overflow-null, fmtstring, utf-8, ansi-escape | 3764 | 193
SDP-Media-Port | integer-ascii | 3957 | 46
SDP-Media-Transport | overflow-general, overflow-space, overflow-null, fmtstring, ansi-escape | 4003 | 118
SDP-Media-Type | integer-ascii | 4121 | 46
SDP-Attribute-Rtpmap | overflow-general, overflow-space, overflow-null, fmtstring, ansi-escape | 4167 | 118
SDP-Attribute-Colon | overflow-colon | 4285 | 16
SDP-Attribute-Payloadtype | integer-ascii | 4301 | 46
SDP-Attribute-Encodingname | integer-ascii | 4347 | 118
SDP-Attribute-Slash | overflow-slash | 4465 | 16
SDP-Attribute-Clockrate | integer-ascii | 4481 | 46
1. The “Name” column represents the tag-names of the test-groups. Tags reflect the header and field names in
the protocol specification. Tags can be used to follow which parts of the Protocol Data Unit (PDU) are being
tested.
2. The “Exceptional Elements” column describes which exceptional element categories are integrated in the
test-group.
3. The “First Index #” and “Test Cases” columns describe the first test-case number for a test-group, and
the number of cases from there on.
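Because each group in Table 6-4 starts where the previous one ends, the table can be checked and searched mechanically. The following sketch transcribes the first index and case count of every group from the table; the function names are hypothetical helpers and are not part of PROTOS.

```python
from bisect import bisect_right

# (group name, first index, number of test cases), transcribed from Table 6-4.
TABLE_6_4 = [
    ("valid", 0, 1), ("SIP-Method", 1, 193), ("SIP-Request-URI", 194, 61),
    ("SIP-Version", 255, 75), ("SIP-Via-Host", 330, 106),
    ("SIP-Via-Hostcolon", 436, 16), ("SIP-Via-Hostport", 452, 46),
    ("SIP-Via-Version", 498, 75), ("SIP-Via-Tag", 573, 57),
    ("SIP-From-Displayname", 630, 193), ("SIP-From-Tag", 823, 57),
    ("SIP-From-Colon", 880, 16), ("SIP-From-URI", 896, 61),
    ("SIP-Contact-Displayname", 957, 193), ("SIP-Contact-URI", 1150, 61),
    ("SIP-Contact-Left-Paranthesis", 1211, 16),
    ("SIP-Contact-Right-Paranthesis", 1227, 16), ("SIP-To", 1243, 193),
    ("SIP-To-Left-Paranthesis", 1436, 16), ("SIP-To-Right-Paranthesis", 1452, 16),
    ("SIP-Call-Id-Value", 1468, 193), ("SIP-Call-Id-At", 1661, 16),
    ("SIP-Call-Id-Ip", 1677, 106), ("SIP-Expires", 1783, 46),
    ("SIP-Max-Forwards", 1829, 46), ("SIP-Cseq-Integer", 1875, 46),
    ("SIP-Cseq-String", 1921, 193), ("SIP-Content-Type", 2114, 247),
    ("SIP-Content-Length", 2361, 46), ("SIP-Request-CRLF", 2407, 10),
    ("CRLF-Request", 2417, 10), ("SDP-Attribute-CRLF", 2427, 10),
    ("SDP-Proto-v-Identifier", 2437, 193), ("SDP-Proto-v-Equal", 2630, 16),
    ("SDP-Proto-v-Integer", 2646, 46), ("SDP-Origin-Username", 2692, 193),
    ("SDP-Origin-Sessionid", 2885, 46), ("SDP-Origin-Networktype", 2931, 193),
    ("SDP-Origin-Ip", 3124, 106), ("SDP-Session", 3230, 193),
    ("SDP-Connection-Networktype", 3423, 188), ("SDP-Connection-Ip", 3611, 106),
    ("SDP-Time-Start", 3717, 46), ("SDP-Time-Stop", 3763, 1),
    ("SDP-Media-Media", 3764, 193), ("SDP-Media-Port", 3957, 46),
    ("SDP-Media-Transport", 4003, 118), ("SDP-Media-Type", 4121, 46),
    ("SDP-Attribute-Rtpmap", 4167, 118), ("SDP-Attribute-Colon", 4285, 16),
    ("SDP-Attribute-Payloadtype", 4301, 46),
    ("SDP-Attribute-Encodingname", 4347, 118),
    ("SDP-Attribute-Slash", 4465, 16), ("SDP-Attribute-Clockrate", 4481, 46),
]


def check_contiguous(table):
    """Verify each group starts where the previous one ended; return the total."""
    expected = 0
    for name, first, count in table:
        if first != expected:
            raise ValueError(f"{name}: expected first index {expected}, got {first}")
        expected += count
    return expected


def group_for(index, table=TABLE_6_4):
    """Return the name of the test group that contains test case `index`."""
    if index < 0:
        raise IndexError("test-case index must not be negative")
    firsts = [first for _, first, _ in table]
    name, first, count = table[bisect_right(firsts, index) - 1]
    if index >= first + count:
        raise IndexError("test-case index beyond the last group")
    return name


if __name__ == "__main__":
    print("total test cases:", check_contiguous(TABLE_6_4))
    print("case 3000 belongs to:", group_for(3000))
```

The total agrees with the 4,527 test cases stated in the text, counting the single valid case at index 0, and a lookup such as `group_for(3000)` tells you which field an individual test case exercises.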
You can execute all of them sequentially, or individual cases indexed by the number. You
can see the usage and syntax with a help option as in Example 6-4.
Example 6-4 PROTOS Usage
The following steps show an example that runs all the test cases, along with a valid INVITE
message to make sure the SIP proxy server runs properly right after each test case.
Here is an example of a command and its sequence of execution:
C:\PROTOS>java -jar c07-sip-r2.jar -touri [email protected] -teardown -validcase
1 Sends the INVITE test-case to address [email protected] default SIP port 5060
over UDP.
2 Sends CANCEL.
WARNING This test of malformed messages may cause serious damage to the functionality of a SIP
proxy server. It is highly recommended to use a lab system on which there is no other traffic.
Do not try this on your production system or service provider’s network.
Example 6-5 is the initial output after executing the command in the example.
Example 6-5 Test Output with PROTOS
If you need to execute an individual test case, such as #3000 in Table 6-4, use the case
number as in the following line:
C:\PROTOS>java -jar c07-sip-r2.jar -touri [email protected] -teardown -single 3000
Analysis
After running the test-material package targeting your SIP proxy server, you may find
certain errors on the server like memory overflow, high CPU usage, system crash or
automatic reboot, and so on.
1. Meaning of symbols:
X: Failure
I: Inconclusive
?: Unknown
–: Pass
As shown in Table 6-5, many SIP servers are vulnerable to malformed messages, which
external attackers could use as a method of DoS attack.
There are two main reasons why even commercial products have this kind of security hole:
• Most VoIP solution providers focus on the general VoIP service itself like functionality,
interoperability, features, and capacity. Many of them believe that additional security
devices (for example, SBC) are required because VoIP server is not a security device.
• Technically, it is very difficult to implement a complete solution for every single
malformed case. Even PROTOS with 4,527 test cases does not cover every possible
case (limited to the SIP INVITE message as well).
Mitigation
No solution is 100 percent complete for a malformed message attack, but you can
mitigate the risk significantly with methods like those in the following list. The
guidelines are categorized by stage of the product life cycle.
• Development (protocol) level
— Limit the buffer size on malformed message lines.
— Prevent infinite loop caused by syntax errors (for example, unending clause
in any message line).
— Define a clear procedure of exception handling after the error is detected,
like stopping the parsing process immediately and flushing the error.
• Test level
— Use testing tools (for example, PROTOS) and verify whether the target
server handles properly without abusing resources. If not, additional
development is required.
— Perform the qualification process for targeted endpoint devices to make sure
they do not send any type of malformed message.
• Operational level
— Whenever the VoIP server detects the error, it should send a trap to the
administration server, so that a network administrator may handle the error
immediately.
— The VoIP server should demote the endpoint device sending the malformed
message, like dropping all packets from the device for a certain period of
time.
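The operational guidelines above (notify the administrator, then temporarily drop traffic from the offending device) can be sketched as a small "penalty box." The class name, the 60-second penalty, and the `notify` callback are hypothetical; in a real server the callback would raise an SNMP trap or a similar alarm.

```python
class PenaltyBox:
    """Temporarily drops traffic from sources that sent a malformed message."""

    def __init__(self, penalty_s: float = 60.0, notify=None):
        self.penalty_s = penalty_s
        self.notify = notify          # hypothetical hook, e.g. a trap sender
        self._blocked_until = {}      # source IP -> time at which the block ends

    def report_malformed(self, source_ip: str, now: float, reason: str = "") -> None:
        """Start (or restart) the penalty period and alert the administrator."""
        self._blocked_until[source_ip] = now + self.penalty_s
        if self.notify is not None:
            self.notify(source_ip, reason)

    def should_drop(self, source_ip: str, now: float) -> bool:
        """Return True while the source is still inside its penalty period."""
        until = self._blocked_until.get(source_ip)
        if until is None:
            return False
        if now >= until:
            del self._blocked_until[source_ip]   # penalty served
            return False
        return True


if __name__ == "__main__":
    box = PenaltyBox(penalty_s=60.0, notify=lambda ip, why: print("ALERT", ip, why))
    box.report_malformed("192.168.11.11", now=0.0, reason="missing '>' in Contact")
    print(box.should_drop("192.168.11.11", now=30.0))   # still blocked
    print(box.should_drop("192.168.11.11", now=61.0))   # released
```

A time-limited block keeps a misbehaving phone from consuming parser resources while still letting it recover automatically after a firmware fix or reboot.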
Sniffing/Eavesdropping
The biggest concern that most Internet telephony users have is that someone may hear their
conversation somewhere in the middle of the network. It is technically possible but not
as easy as they believe. The eavesdropper must have network access to the same local
broadcasting domain or the media path to capture the packets, as well as special tools to
decode voice conversation.
This section shows how sniffing and eavesdropping work, and how to mitigate them.
Simulation
The call scenario of this simulation is simple: You (the caller) make a phone call to the
callee while activating the sniffing tool on your local network. That is, you pretend to be
the eavesdropper as well.
There are many options in this tool, but we will use only the essential ones to sniff VoIP
packets. Follow these steps to make the tool ready.
Step 1 Click Options on the Capture menu in the main window (or press the
shortcut key Ctrl-K). Then you can see the Capture Options window as
shown in Figure 6-5.
Step 2 Select the correct network interface card in the Interface list box if you
have multiple cards, and set the choices in the Display Options area as
shown in Figure 6-6 to see the packets in real time.
Step 3 Click the Start button, and then you can see all real-time packets going
through the hub that your phone and computer are connected to, if
everything is properly configured. Figure 6-7 is an example.
If you see packets scrolling by, you are ready to sniff VoIP packets.
Now, pick up the phone, make a phone call, talk a little bit, and hang up the phone. You can
see many SIP and RTP packets scrolling by in Wireshark during the test. Stop the
capture and save it as a file.
Analysis
Because the capture file has non-VoIP packets as well, you need to filter them out to see
only VoIP packets.
To see the call setup messages as in Figure 6-8, type sip in the filter box and then click
Apply.
As shown in Figure 6-8, you can see some information about the targeted user and the call,
such as:
• What phone number the user has
• What number the user dialed
• What ID and password were used (in this case, the password is hashed with Digest
authentication)
• IP address and User Datagram Protocol (UDP) port of the user phone, proxy server,
or media server
• The duration of the conversation
It is possible that an attacker may use this kind of information and spoof its identity or
attack network nodes.
Now it is time to filter actual voice (RTP) packets and retrieve the voice with Wireshark.
You can filter RTP packets as in Figure 6-9; type rtp in the filter box and then click Apply.
In order to hear the actual audio conversation, you need to take the following steps:
Step 1 Choose Statistics > RTP > Stream Analysis, and you can see the RTP
Stream Analysis window as in Figure 6-10.
Step 2 Depending on the voice direction that you want to hear, you can choose
it in the option. The default is, as shown in Figure 6-10, the caller to
callee. Click the Save payload button, and you can see another window
for saving the audio file as in Figure 6-11.
Step 3 Select the audio type (.au) and filename (for example, voice.au) as
in Figure 6-11, and save it. Now, you can hear the original voice
conversation by playing the audio file (voice.au) with a media player. If
you want to hear the other direction of voice, you can select other options
in Step 2.
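To show roughly what saving the payload involves, the following sketch strips the RTP header from each packet and wraps the G.711 mu-law (PCMU) payload in a Sun/NeXT .au header. It is an illustration under simplifying assumptions, not Wireshark's implementation: the packets are assumed to belong to one direction of one stream and to arrive in order with no loss, and the function names are hypothetical.

```python
import struct

AU_MAGIC = 0x2E736E64        # the four bytes ".snd"
AU_ENCODING_MULAW_8 = 1      # 8-bit G.711 mu-law
PCMU_PAYLOAD_TYPE = 0        # static RTP payload type for PCMU


def rtp_payload(packet: bytes):
    """Return (payload_type, payload) for one RTP version 2 packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP header")
    first, second = packet[0], packet[1]
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (first & 0x0F)              # skip CSRC identifiers
    if first & 0x10:                              # header extension present
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words
    end = len(packet)
    if first & 0x20:                              # padding present
        end -= packet[-1]
    if offset > end:
        raise ValueError("truncated RTP packet")
    return second & 0x7F, packet[offset:end]


def to_au(packets, sample_rate: int = 8000) -> bytes:
    """Concatenate PCMU payloads and prepend a 24-byte .au header."""
    audio = b"".join(payload for pt, payload in map(rtp_payload, packets)
                     if pt == PCMU_PAYLOAD_TYPE)
    header = struct.pack("!6I", AU_MAGIC, 24, len(audio),
                         AU_ENCODING_MULAW_8, sample_rate, 1)
    return header + audio


if __name__ == "__main__":
    # Two synthetic 20 ms PCMU packets (160 samples each) for demonstration.
    fake = [struct.pack("!BBHII", 0x80, 0, seq, seq * 160, 0x1234) + b"\xff" * 160
            for seq in (1, 2)]
    data = to_au(fake)
    print(len(data), data[:4])
```

Because nothing in plain RTP protects the payload, anyone who can capture the packets can rebuild the audio in this way, which is why the mitigation that follows focuses on encryption and on limiting who can see the traffic.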
Therefore, it is possible for an eavesdropper to hear someone’s conversation with this kind
of sniffing tool.
Mitigation
You may be surprised by how practical this kind of sniffing and eavesdropping is. As you
have seen, it is realistically possible, but it is not too difficult to prevent.
At a high level, there are two solutions: encrypting and non-broadcasting.
Non-broadcasting is the more common approach these days because of its cost-effectiveness.
Table 6-6 shows the difference.
Table 6-6 Two Solutions for Sniffing
Spoofing/Identity Theft
Spoofing means that an attacker pretends to be an actual registered user and inserts fake
messages in order to interrupt VoIP service or make toll calls. The following are examples
of spoofing:
• Insert a fake SIP REGISTER message to steal the registration
• Insert a fake SIP BYE message into the current call session to tear it down
• Respond with out-of-service messages (for example, 486 BUSY HERE) to inbound
calls after stealing the registration
• Make toll calls after stealing the user identity
In common cases, an attacker sniffs original registrations or call setup messages from
legitimate users as exemplified in the previous section in order to spoof the user’s identity.
Simulation
Many spoofing scenarios are possible, but the most common type of spoofing is that an
attacker steals a user’s identity (or session) first and generates a fake message to break the
service, which will be simulated in this section.
There are two steps of preparation before generating the attack: prespoofing scan and
identity theft. We will use a tool called SIPcrack to simulate both. You can download it from
the following website. The content in this section refers to the website:3
http://www.remote-exploit.org/codes_sipcrack.html
SIPcrack runs on UNIX/Linux systems. Download the tar.gz file and extract it into a folder
of your choice. Switch to the folder and type the make command to build the files. If you
do not have OpenSSL, or if errors occur while building, try the make no-openssl
command to build with the integrated Message Digest 5 (MD5) function, which is slower than
the OpenSSL implementation.
After that, you can see two modules in the package: the SIPdump command for scanning
and the SIPcrack command for getting identity. Now, you are ready to do the simulation.
Prespoofing Scan
Prespoofing scan means collecting the user’s session messages, such as registration or call
setup messages, by means of any type of sniffing tools as shown in the previous section.
SIPdump provides this function.
SIPdump sniffs SIP Digest authentication exchanges from all passing packets and writes the
captured (hashed) passwords into the file that you specify.
For example, if you want to sniff Ethernet 0 and write the passwords into the
sniffed_passwords.dump file, use the following command:
Prompt> sipdump -i eth0 sniffed_passwords.dump
If your phones and computer are in the same broadcast domain (for example, connected to the
same hub), you can see the hashed passwords in the dump file after you make phone
calls or reboot the phones. Now, you need another tool (SIPcrack) for cracking them.
Identity Theft
SIPcrack works as follows: when you select the hashed password that you want to crack,
SIPcrack reads a password dictionary file containing up to millions of candidate words,
hashes each word with MD5, and compares the result with the target until they
match. So, you need an additional password dictionary file (see the following Note) to
execute it.
NOTE To get the password dictionary, go to your favorite search engine and search for the phrase
“password dictionary.” Alternatively, you can create your own text-based dictionary with
sample words just for this simulation.
For example, if you want to crack the sniffed_passwords.dump file with the dictionary file
pass_dic.txt, the usage is as follows:
Prompt> sipcrack -w pass_dic.txt sniffed_passwords.dump
Analysis
According to the simulation in Example 6-6, the password was cracked in 13 seconds even
though it is hashed with MD5. You may have a question: Is it that easy to crack the password?
Technically, it is almost impossible to crack a strong hashed password by brute force
(trying every possible combination), because the average time to crack a password with a
single PC is many years.
However, most passwords are created manually by humans (rather than by computers), and
there is a high possibility that they choose similar ones that are simple and easy
to remember. Hardly anyone, except system administrators, wants to remember a random
password of more than 10 characters. For example, someone named “John Kim”
is apt to have passwords like “jkim,” “iamjohn,” “johnkim,” “john2kim,” “john4me,” and
so on.
Therefore, if you are using a password dictionary containing millions of commonly used
passwords, it does not take long to crack passwords that ordinary users created.
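To see why a weak password falls so quickly, and to audit your own lab accounts, it helps to look at how a Digest response is computed. The following sketch uses the basic Digest form without the qop parameter; it is not SIPcrack's code, and the user name, realm, nonce, and candidate list are hypothetical values for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(user, realm, password, method, uri, nonce) -> str:
    """Digest response without qop: MD5(HA1:nonce:HA2)."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def find_password(candidates, user, realm, method, uri, nonce, observed):
    """Return the first candidate whose Digest response equals `observed`, else None."""
    for word in candidates:
        if digest_response(user, realm, word, method, uri, nonce) == observed:
            return word
    return None


if __name__ == "__main__":
    # Hypothetical lab account whose owner chose a name-based password.
    params = dict(user="jkim", realm="example.com", method="REGISTER",
                  uri="sip:example.com", nonce="3f2a9c")
    observed = digest_response(password="john2kim", **params)
    guesses = ["jkim", "iamjohn", "johnkim", "john2kim", "john4me"]
    print(find_password(guesses, observed=observed, **params))
```

Every value in the computation except the password travels in the clear, so the only secret protecting the account is the password itself. A short, name-based password appears early in any dictionary, whereas a long random one does not appear at all.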
When the identity is exposed, an attacker can easily impersonate the user and generate any
type of spoofed message, as well as make toll calls. The attacks fall into the following
categories:
• Stealing registration
— The user cannot make outbound calls or receive inbound calls: the phone
goes into an out-of-service state.
— The attacker receives all inbound calls and either rejects (for example,
486 BUSY HERE) or receives the call.
— The attacker may send flooded registrations to the server in order to
interrupt the service.
• Session interruption
— The attacker sends Re-INVITE to either caller or callee, and this results in
one-way or no audio.
— The attacker sends BYE message to the current call session and tears it down.
• Toll call
— The attacker sends INVITE with a stolen identity and makes toll calls.
Mitigation
The first and best way of mitigating this spoofing attack is to prevent the prespoofing scan
(that is, sniffing), because after the identity or call session information is exposed, there
is a high possibility of it being cracked or manipulated.
The prevention of sniffing is already described in Table 6-6 with two methods: designing
a non-broadcasting network and encrypting.
The other methods of mitigating are summarized as follows:
• Require authentication for call request message (for example, SIP INVITE) as well as
registration: Some service providers do not require this for faster call setup.
• Do not use default ID and password when installing initially.
• Pre-provision a machine-generated random user ID and password into the phone,
rather than giving users the option to set up their own ID and password.
• The proxy server should track the original IP addresses of the caller and callee, and
detect (or block) any session message (for example, SIP BYE or Re-INVITE) that comes
from a different source IP address.
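Two of the methods above lend themselves to a short illustration: pre-provisioning random credentials and checking the source address of in-session messages. The following sketch shows one way each could look; the names and the 16-character length are hypothetical, and a real proxy would also have to allow for endpoints behind NAT whose address legitimately changes.

```python
import secrets
import string


def random_credential(length: int = 16) -> str:
    """Generate a machine-made random ID or password for pre-provisioning."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


class DialogTracker:
    """Remembers which IP addresses set up each call and checks later requests."""

    def __init__(self):
        self._peers = {}   # Call-ID -> set of IP addresses seen at call setup

    def start(self, call_id: str, caller_ip: str, callee_ip: str) -> None:
        self._peers[call_id] = {caller_ip, callee_ip}

    def allow(self, call_id: str, source_ip: str) -> bool:
        """True only if a BYE or Re-INVITE comes from a known party of the call."""
        return source_ip in self._peers.get(call_id, set())

    def end(self, call_id: str) -> None:
        self._peers.pop(call_id, None)


if __name__ == "__main__":
    print("provisioned password:", random_credential())
    tracker = DialogTracker()
    tracker.start("call-1", "192.168.10.10", "192.168.11.11")
    print(tracker.allow("call-1", "192.168.11.11"))   # legitimate party
    print(tracker.allow("call-1", "10.9.9.9"))        # unknown source
```

A rejected message can simply be dropped, or it can be logged and reported so that the administrator learns about the spoofing attempt.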
VoIP Spam
The general meaning of spam is unsolicited bulk email, which you may see every day. It
wastes network bandwidth and system resources, as well as annoying email users.
Spam also exists in the VoIP space, where it is called VoIP spam, in the form of voice,
instant message (IM), and presence spam. This section looks into each type of VoIP spam with
the SIP protocol and provides several solutions for mitigation. The content in this section
refers to RFC 5039.4
Voice Spam
Voice (or call) spam is defined as a bulk unsolicited set of session initiation attempts (that
is, INVITE requests), attempting to establish a voice or video communications session. If
the user answers, the spammer proceeds to relay a message over the real-time media. This
is the classic telemarketer spam, applied to VoIP protocol (SIP). This is often called SPam
over Ip Telephony, or SPIT.
The main reason that SPIT is getting popular is that it is cost-effective for spammers. As
you know, the legacy PSTN-call spam already exists in the form of telemarketer calls.
Although these calls are annoying, they do not arrive in the same kind of volume as email
spam. The difference is cost; it costs more for the spammer to make a phone call than it does
to send email. This cost manifests itself in terms of the cost for systems that can perform
telemarketer calls, and in cost per call. However, the cost drops dramatically with
SPIT, for the following reasons:
• Easy to write a spam application—It is just a SIP User Agent that initiates, in
parallel, a large number of calls. If a call connects, the spam application generates an
ACK and proceeds to play out a recorded announcement, and then it terminates the
call. This kind of application can be built entirely in software, using readily available
off-the-shelf software components.
• Low hardware cost—It can run on a low-end PC and requires no special expertise to
execute.
• Low line cost— A normal PSTN phone line allows only one call to be placed at a
time. If additional lines are required, a user must purchase another line. Typically, a
T1 or T3 would be required for a large-volume telemarketing service. However, SPIT
uses a broadband Internet connection. For example, if a spammer uses a typical
broadband Internet that provides 500 Kbps of upstream bandwidth, initiating a call
requires just a single INVITE message that is about 1 KB, which allows about 62 call
attempts per second.
• No boundary for international calls—Currently, there are few telemarketing calls
across international borders, largely due to the large cost of making international calls.
However, IP network provides no boundaries for them, and calls to any SIP URI are
possible from anywhere in the world. This will allow for international spam at a
significantly reduced cost.
• Higher hitting rate—Its content is much more likely to be examined by a user if a
call attempt is successful, because a user has to listen to an initial announcement
(spam) to judge whether it is a spam call or not.
• Finite address space—Unlike email addresses, phone numbers are a finite address
space and one that is fairly densely packed. As a result, going sequentially through
phone numbers is likely to produce a fairly high hit rate.
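The figure of about 62 call attempts per second in the line-cost item follows directly from the bandwidth and the message size. The sketch below reproduces the arithmetic, assuming that 1 KB means 1,000 bytes and ignoring IP and UDP overhead; the function name is hypothetical.

```python
def invites_per_second(upstream_kbps: float, invite_bytes: int) -> float:
    """Upper bound on INVITE messages per second for a given upstream rate."""
    bytes_per_second = upstream_kbps * 1000 / 8   # kilobits/s -> bytes/s
    return bytes_per_second / invite_bytes


if __name__ == "__main__":
    # 500 Kbps upstream and a 1 KB INVITE, as in the text.
    print(invites_per_second(500, 1000))
```

With 500 Kbps and 1,000-byte messages, the result is 62.5 attempts per second, which rounds down to the 62 quoted in the text.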
This low cost is enough to be attractive to spammers. In some cases, spammers
use computational and bandwidth resources belonging to others, by infecting their
machines with viruses that turn them into “zombies” that can be used to generate spam.
This can reduce the cost of call spam to nearly zero.
IM Spam
Instant Message (IM) spam is similar to email spam. It is defined as a bulk unsolicited set
of instant messages, whose content contains the message that the spammer is seeking to
convey. IM spam is most naturally sent using the SIP MESSAGE request. However, any
other request that causes content to automatically appear on the user’s display will also
suffice. That might include INVITE requests with large Subject headers (since the Subject
is sometimes rendered to the user), or INVITE requests with text or HTML bodies. This is
often called SPam over Instant Messaging, or SPIM.
SPIM is very much like email, but much more intrusive than email. In today’s systems, IMs
automatically pop up and present themselves to the user. Email, of course, must be deliberately
selected and displayed. However, most popular IM systems employ white lists, which only
allow IM to be delivered if the sender is on the white list. Thus, whether or not IM spam
will be useful seems to depend a lot on the nature of the systems as the network is opened
up. If they are ubiquitously deployed with white-list access, the value of IM spam is likely
to be low.
It is important to point out that there are two different types of IM systems: page mode and
session mode. Page mode IM systems work much like email, with each IM being sent as a
separate message. In session mode IM, there is signaling in advance of communication
to establish a session, and then IMs are exchanged, perhaps point-to-point, as part of the
session. The modality impacts the types of spam techniques that can be applied. Techniques
for email can be applied identically to page mode IM, but session mode IM is more like
telephony, and many techniques (such as content filtering) are harder to apply.
Presence Spam
Presence spam (SPPP) is similar to IM spam. It is defined as a bulk unsolicited set of
presence requests (that is, SIP SUBSCRIBE requests) in an attempt to get on the “buddy
list” or “white list” of a user in order to send them IMs or initiate other forms of
communications. This is occasionally called SPam over Presence Protocol, or SPPP.
The cost of SPPP is within a small constant factor of IM spam, so the same cost estimates
can be used here. What would be the effect of such spam? Most presence systems provide
some kind of consent framework. A watcher that has not been granted permission to see the
user’s presence will not gain access to their presence. However, the presence request is
usually noted and conveyed to the user, allowing them to approve or deny the request.
In SIP, this is done using the watcherinfo event package. This package allows a user to learn
the identity of the watcher, in order to make an authorization decision. This could provide
a vehicle for conveying information to a user; for example, by generating SUBSCRIBE
requests from identities such as sip:[email protected], brief messages can be
conveyed to the user, even though the sender does not have permission to access
presence. As such, presence spam can be viewed as a form of IM spam, where the amount
of content to be conveyed is limited. The limit is equal to the amount of information
generated by the watcher that gets conveyed to the user through the permission system.
Mitigation
There is no single magic bullet that prevents all voice, IM, and presence spam problems.
However, the problems would be much less significant if solutions had been deployed
globally before the problems became widespread.
RFC 5039 introduces a number of solutions, most of them derived from techniques for email
spam. Here are some notable solutions from the RFC.
Content Filtering
In email space, the most common form of spam protection is content filtering: a spam filter
analyzes the content of the email and looks for clues that the email is spam.
This technique is also useful for IM spam, which is similar to email: an IM spam filter can
leverage the latest email filtering techniques.
However, this technique is not that effective for SPIT. There are two reasons. First, in the
case where the user answers the call, the call is already established and the user is paying
attention before the content is delivered. The call cannot be analyzed before the user hears
it. Second, if the content is stored before the user accesses it (for example, with voicemail),
the content will be in the form of recorded audio or video. Speech and video recognition
technology is not likely to be good enough to analyze the content and determine whether
or not it is spam. Indeed, if a system tried to perform speech recognition on a recording in
order to perform such an analysis, it would be easy for the spammers to make calls with
background noises, poor grammar, and varied accents, all of which will throw off recogni-
tion systems. Video recognition is even harder to do and remains primarily an area of
research.
Turing Test
In email, Turing tests are mechanisms whereby the sender of the message is given some
kind of puzzle or challenge, which only a human can answer. If the puzzle is answered
correctly, the sender is placed on the user’s white list. These puzzles frequently take the
form of recognizing a word or sequence of numbers in an image with a lot of background
noise. The tests need to be designed such that automata cannot easily perform the image
recognition needed to extract the word or number sequence, but a human user usually can.
Designing such tests is not easy because ongoing advances in image processing and
artificial intelligence continually raise the bar.
Like many of the other email techniques, Turing tests are dependent on sender identity,
which cannot easily be authenticated in email.
Turing tests can be used to prevent IM spam in much the same way they can be used to
prevent email spam.
Turing tests can be applied to SPIT as well, although not directly, because SPIT does not
usually involve the transfer of images and other content that can be used to verify that a
human is on the other end. If most of the calls are voice, the technique needs to be adapted
to voice.
This is not that difficult to do. The following sidebar shows how it could be done, for
example.
User A calls User B and is not on User B’s white or black list. User A is transferred to an
Interactive Voice Response (IVR) system. The IVR system tells the user that they are going
to hear a series of numbers (say 5 of them), and that they have to enter those numbers on
the keypad. The IVR system reads out the numbers while background music is playing,
making it difficult for an automated speech recognition system to be applied to the media.
The user then enters the numbers on the keypad. If they are entered correctly, the user is
added to the white list.
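The sidebar's logic can be sketched in a few lines. This is only an outline of the decision flow, not an IVR implementation: playing the digits over background music and collecting DTMF input are left out, and the class and method names are hypothetical.

```python
import secrets


class VoiceTuringTest:
    """Challenge unknown callers with random digits; white-list those who pass."""

    def __init__(self, digits: int = 5):
        self.digits = digits
        self.white_list = set()
        self._pending = {}   # caller -> digits the IVR read out

    def challenge(self, caller: str) -> str:
        """Create the digit sequence that the IVR reads to the caller."""
        code = "".join(secrets.choice("0123456789") for _ in range(self.digits))
        self._pending[caller] = code
        return code

    def verify(self, caller: str, keypad_input: str) -> bool:
        """Check the digits entered on the keypad; each challenge is single-use."""
        expected = self._pending.pop(caller, None)
        if expected is not None and secrets.compare_digest(expected, keypad_input):
            self.white_list.add(caller)
            return True
        return False


if __name__ == "__main__":
    ivr = VoiceTuringTest()
    spoken = ivr.challenge("sip:userA@example.com")
    print(ivr.verify("sip:userA@example.com", spoken))   # caller typed the digits
    print("sip:userA@example.com" in ivr.white_list)
```

Making each challenge single-use means that a caller who fails must request a new sequence, which slows down automated guessing.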
This kind of voice-based Turing test can be extended to a variety of media, such as video and text.
In the case of voice, the Turing test would need to run in the language of the
caller. This is possible in SIP, using the Accept-Language header field, though this header
is not widely used at the moment and is meant for the languages of SIP message components,
not the media streams.
The primary problem with the voice Turing test is the same one that email tests have:
instead of having an automaton process the test, a spammer can pay cheap workers to take
the tests.
As an alternative to paying cheap workers to take the tests, the tests can be taken by human
users who are tricked into completing the tests to gain access to what they believe is a
legitimate resource. This was done by a spambot that posted the tests on a pornography site,
and required users to complete the tests in order to gain access to content.
Because of these limitations, Turing tests may never completely solve the problem.
Reputation System
A reputation system is also used in conjunction with white or black lists. Assume that User A
is not on User B’s white list, and A attempts to contact B. If a consent-based system is used,
Address Obfuscation
Address obfuscation is a fundamental way of minimizing spam by preventing spammers
from gathering email addresses through websites or other public sources of information.
One way to minimize spam is to make your address difficult or impossible to gather. Spam
bots typically look for text in pages of the form “user@domain” and assume that anything
of that form is an email address. To hide from such spam bots, many websites place email
addresses in an obfuscated form, usable to humans but difficult for an automaton to read as
an email address. For example,
patrick at cisco dot com
p a t r i c k a t c i s c o d o t c o m
These techniques are equally applicable to the prevention of VoIP spam, and are likely to be
equally effective or ineffective in preventing it.
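To see why the obfuscated forms defeat a simple spam bot, consider the following sketch. The regular expression stands in for a naive harvester that looks only for the “user@domain” form; the pattern, the function names, and the sample address are assumptions made for this example, and a real harvester may well be smarter.

```python
import re

# A deliberately naive harvester pattern for text of the form user@domain.
ADDRESS_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest(page_text):
    """Return everything on the page that looks like user@domain."""
    return ADDRESS_PATTERN.findall(page_text)

def obfuscate(address):
    """Rewrite user@domain.tld as 'user at domain dot tld'."""
    user, domain = address.split("@", 1)
    return f"{user} at {domain.replace('.', ' dot ')}"
```

The plain address is harvested, while the obfuscated form of the same address is not. A harvester that also matches “ at ” and “ dot ” would recover it, which is why this technique only raises the cost of gathering addresses.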
It is worth mentioning that the source of addresses need not be a website—any publicly
accessible service containing addresses will suffice. As a result, Telephone Number
Mapping (ENUM) has been cited as a potential gold mine for spammers. It would allow a
spammer to collect SIP and other URIs by traversing the tree in e164.arpa and mining it for
data. This problem is mitigated in part if only number prefixes, as opposed to actual numbers,
appear in the Domain Name System (DNS). Even in that case, however, it provides a
technique for a spammer to learn which phone numbers are reachable through cheaper
direct SIP connectivity.
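The tree under e164.arpa is easy to walk because the mapping from a telephone number to a domain name is mechanical: the digits are reversed, separated by dots, and suffixed with e164.arpa (RFC 3761). A minimal sketch follows; the function name and the sample number are assumptions made for the example, and no DNS query is performed.

```python
def enum_domain(e164_number):
    """Map an E.164 number to its ENUM domain name under e164.arpa."""
    digits = [c for c in e164_number if c.isdigit()]  # drop '+', '-', spaces
    if not digits:
        raise ValueError("no digits in number")
    return ".".join(reversed(digits)) + ".e164.arpa"
```

For example, `enum_domain("+1-555-0100")` yields `0.0.1.0.5.5.5.1.e164.arpa`, the name a client (or a spammer) would query for NAPTR records.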
Limited-Use Address
With the limited-use address technique, a user has a large number of email
addresses at their disposal, each of which has constraints on its applicability. A limited-use
address can be time-bound, so that it expires after a fixed period. Or, a different email
address can be given to each correspondent. When spam arrives from that correspondent,
the limited-use address they were given is terminated. In another variation, the same
limited-use address is given to multiple users that share some property; for example, all
work colleagues, all coworkers from different companies, all retailers, and so on. Should
spam begin arriving on one of the addresses, it is invalidated, preventing communications
from anyone else who received the limited-use address.
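The bookkeeping behind limited-use addresses can be sketched as follows. The class, its method names, and the random address format are hypothetical, chosen only to illustrate the time-bound and per-correspondent variations described above.

```python
import secrets
import time

class LimitedUseAddresses:
    """Track limited-use addresses handed out to correspondents."""

    def __init__(self, domain):
        self.domain = domain
        self.active = {}  # address -> (correspondent, expiry time or None)

    def issue(self, correspondent, lifetime_seconds=None, now=None):
        """Create a new address, optionally time-bound."""
        now = time.time() if now is None else now
        address = f"u{secrets.token_hex(4)}@{self.domain}"
        expiry = None if lifetime_seconds is None else now + lifetime_seconds
        self.active[address] = (correspondent, expiry)
        return address

    def is_valid(self, address, now=None):
        """An address is valid until it expires or is revoked."""
        now = time.time() if now is None else now
        entry = self.active.get(address)
        if entry is None:
            return False
        _, expiry = entry
        return expiry is None or now < expiry

    def revoke(self, address):
        """Terminate an address, for example after spam arrives on it."""
        self.active.pop(address, None)
```

Revoking one address leaves every other correspondent's address untouched, which is the benefit of giving each correspondent a different one.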
This technique is equally applicable to SIP. One of the drawbacks of the approach is that
it can make it hard for people to reach you; if an email address you hand out to a friend
becomes spammed, changing it requires you to inform your friend of the new address. SIP
can help solve this problem in part, by making use of presence. Instead of handing out your
email address to your friends, you would hand out your presence URI. When a friend wants
to send you an email, they subscribe to your presence, which can include an email address
where you can be reached. This email address can be obfuscated and be of single use,
different for each buddy who requests your presence. The addresses can also be constantly
changed, as these changes are pushed directly to your buddies. In a sense, the buddy list
represents an automatically updated address book, and would therefore eliminate the
problem.
Another approach is to give a different address to each and every correspondent, so that it
is never necessary to tell a “good” user that an address needs to be changed. This is an
extreme form of limited-use address, which can be called a single-use address. However,
the hard part remains a useful mechanism for distribution and management of those
addresses.
These systems are used widely in presence and IM. Because most of today’s popular IM
systems only allow communications within a single administrative domain, sender identities
can be authenticated. Email often uses similar consent-based systems for mailing lists.
They use a form of authentication based on sending cookies to an email address to verify
that a user can receive mail at that address.
This solution could mitigate call spam (SPIT), but it might just change the nature of the
spam. Instead of being bothered with content, in the form of call spam or IM spam, users
are bothered with consent requests. Those requests for communications do not convey
much useful content to the user, but they can convey some. At the very least, they will
convey the identity of the requester. The user part of the SIP URI allows for limited free-
form text, and thus could be used to convey brief messages. For example, the SIP URI could
be “sip:[email protected]”. Fortunately, it is possible to
apply traditional content-filtering systems to the header fields in the SIP messages, thus
reducing these kinds of consent request attacks.
In order for the spammer to convey more extensive content to the user, the user must
explicitly accept the request. This is unlike email spam, where, even though much spam is
automatically deleted, some percentage of the content does get through, and is seen by
users, without their explicit consent. Thus, if consent is required first, the value in sending
spam is reduced, and perhaps it will cease for those spam cases where consent is not given
to spammers.
Summary
This chapter analyzes, demonstrates, and provides mitigation guidelines for current
VoIP threats: denial of service (DoS), malformed messages, sniffing (eavesdropping), spoofing
(identity theft), and VoIP spam.
A typical DoS attack floods target VoIP servers with valid or invalid VoIP messages
in order to degrade performance or bring down the system. To mitigate, limit
the number of registration requests, require credentials for registration and call requests,
maintain a dynamic “black list,” limit the total number of messages for a certain period of
time, limit the total bandwidth for each endpoint, use access control lists (ACL) to block
the source of unauthorized IP traffic, and do not allow application-layer ping messages from
endpoints. Unintentional flooding can also occur because of misconfigured devices, architectural
service design issues, or unique circumstances.
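As an illustration of limiting the total number of messages for a certain period of time, the following sketch counts messages per endpoint in a sliding window. The class name, the parameters, and the choice of a sliding window (rather than, say, a token bucket) are assumptions made for the example; a production server would enforce this in its signaling path.

```python
from collections import defaultdict, deque

class MessageRateLimiter:
    """Allow at most max_messages per endpoint within window_seconds."""

    def __init__(self, max_messages, window_seconds):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self.history = defaultdict(deque)  # endpoint -> times of accepted messages

    def allow(self, endpoint, now):
        timestamps = self.history[endpoint]
        # Forget messages that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_messages:
            return False  # over the limit: drop, or feed the dynamic black list
        timestamps.append(now)
        return True
```

An endpoint that keeps exceeding the limit is a natural candidate for the dynamic black list mentioned above.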
Malformed messages are another way of attacking VoIP servers by causing system errors.
To mitigate, developers should limit the buffer size for malformed message lines, prevent
infinite loops caused by syntax errors, and define a clear exception-handling procedure
for when an error is detected. Testers should use sophisticated testing tools and verify that
the target server handles malformed messages properly without exhausting resources. Administrators
should prepare a management system so that they can demote malicious endpoints immediately
after receiving traps from VoIP servers.
The local broadcasting domain allows attackers to sniff (eavesdrop) someone’s conversation
or call information. To mitigate, encrypt signaling and media, and prevent packet
broadcasting in the local domain.
In spoofing, an attacker pretends to be a registered user after stealing the
user’s identity, and inserts fake messages in order to interrupt VoIP service or make toll
calls. To mitigate, prevent prespoofing scans, require authentication for call request messages,
do not keep the default ID and password after initial installation, pre-provision a
machine-generated random user ID and password into the phone, and track the original IP
addresses of the caller and callee during the call session.
VoIP spam is unsolicited bulk voice (SPIT), IM (SPIM), and presence spam (SPPP). To
mitigate, combine possible solutions such as content filtering, Turing tests, reputation
systems, address obfuscation, limited-use addresses, and consent-based black/white lists. The
problems will be much less significant if the selected solutions are deployed globally
before the problems become widespread.
End Notes
1 SIPSAK, SIP Swiss Army Knife, http://www.sipsak.com.
2 PROTOS, Security Testing of Protocol Implementations,
http://www.ee.oulu.fi/research/ouspg/protos/index.html.
3 SIPcrack, SIP login dumper/cracker, http://www.remote-exploit.org/codes_sipcrack.html.
4 RFC 5039, “SIP and Spam,” J. Rosenberg, C. Jennings, http://www.ietf.org/rfc/rfc5039.txt,
January 2008.
References
CERT Advisory CA-1996-21 “TCP SYN Flooding and IP Spoofing Attacks,”
http://www.cert.org/advisories/CA-1996-21.html.
draft-ietf-sip-outbound-13.txt, “Managing Client Initiated Connections in the Session
Initiation Protocol (SIP),” C. Jennings, R. Mahy, March 2008.
RFC 3489, “STUN—Simple Traversal of User Datagram Protocol (UDP) Through
Network Address Translators (NATs),” J. Rosenberg, J. Weinberger, C. Huitema, R. Mahy,
March 2003.
RFC 3761, “The E.164 to Uniform Resource Identifiers (URI) Dynamic Delegation
Discovery System (DDDS) Application (ENUM),” P. Faltstrom, M. Mealling, April 2004.
This chapter covers the methodology of protection with the VoIP protocol (SIP) in the
following areas:
• Authentication
• Encryption
• Transport and network layer security
• Threat model and prevention
• Limitation
CHAPTER
7
Authentication
SIP provides challenge-based Digest authentication, which is defined for HTTP authentication
in RFC 2617. The challenge runs in one direction, either between a user agent client (UAC)
and a user agent server (UAS), including a registrar, or between a user agent (UA) and a
proxy server.
176 Chapter 7: Protection with VoIP Protocol
When a UAS, proxy, or registrar receives a request, it may challenge the request to obtain
assurance of the originator’s identity. The originator can reply with its credentials,
protected by a hash algorithm (for example, Message Digest Algorithm 5 [MD5]), or reject the
challenge. When the credentials are received, the server verifies them and sends back the
corresponding response code, such as 401 (Unauthorized) or 200 (OK).
The high-level mechanism is shown in Figure 7-1.
[Figure 7-1: message flow between SIP Client and SIP Server (Proxy, Registrar): 1. Initial
Request; 2. Challenge (Using Nonce Value); 3. Request with Credentials (Based on MD5);
4. Authorize or Unauthorize]
Because of its security weakness (the password is sent effectively in cleartext), the earlier
method of Basic authentication (RFC 2543) is no longer acceptable; requests that use it
should be rejected or ignored.
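The credential computed in step 3 can be sketched as follows. This follows the basic RFC 2617 Digest calculation without the optional qop, cnonce, and nonce-count parameters; the function names and the sample values used with them are assumptions made for the example, not taken from a real SIP stack.

```python
import hashlib
import hmac

def md5_hex(text):
    """Hex-encoded MD5 of a string, as used by Digest authentication."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri, nonce):
    """Compute the Digest 'response' value (RFC 2617, no qop)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who the client is
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being requested
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to the server's nonce

def verify(stored_password, username, realm, method, uri, nonce, response):
    """Server side: recompute the response and compare in constant time."""
    expected = digest_response(username, realm, stored_password, method, uri, nonce)
    return hmac.compare_digest(expected, response)
```

The password itself never crosses the network; only the hash does, and because the server picks the nonce, a captured response cannot simply be replayed against a fresh challenge.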
The next section shows the details of the Digest mechanism and the usage among user agent
and server.
User-to-Proxy Authentication
As mentioned, SIP uses Digest authentication derived from HTTP authentication based on
RFC 2617. The typical authentication is between UA and the proxy server: Example 7-1
shows an example of one-way challenge from the proxy server.
Note that the comments (“Request-Line” and “Message Header”) do not exist in actual SIP
messages. In Example 7-1, there are four messages between UA and proxy server to
accomplish the authentication. Each message corresponds to each step as follows:
Step 1 UA sends REGISTER without any credential.
User-to-User Authentication
When a UAC sends a request to a UAS (including a registrar), the UAS may authenticate the
UAC before processing the request. Similar to the authentication between a UAC and a proxy
server, the UAS challenges the UAC to provide credentials by rejecting the request with a
status code 401 (“Unauthorized”), and the UAC sends the request again with its credentials
computed with the requested algorithm (for example, MD5). The basic mechanism is shown in
Figure 7-2.
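Note that the response code for the challenge (401) and the authentication headers are
different from the proxy server case, where the challenge is 407 (Proxy Authentication
Required) and the Proxy-Authenticate and Proxy-Authorization headers are used. Example 7-2
shows an example with a SIP INVITE request.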
Example 7-2 SIP Messages for User-to-User Authentication
Each message in Example 7-2 can be analyzed as a step to accomplish the authentication
mechanism. Here is the detail of each message:
Step 1 UAC sends an INVITE request to UAS without any credentials.
Step 2 UAS challenges with a status code 401.
— A “WWW-Authenticate” response header (the shaded text) must be
included.
— The field value consists of authentication scheme and parameters
applicable to the realm.
Encryption
End-to-end full encryption is the most common way to provide message confidentiality and
integrity between communication endpoints. The SIP standard (RFC 3261) also recommends
encryption for this purpose, but there are limitations to providing full encryption.
It is almost impossible, or at least not practical, to encrypt all SIP requests and
responses end-to-end because intermediaries like the proxy server have to look at the
message fields to route properly. In particular, the “Request-URI”, “Route”, and “Via” headers
should be visible to the proxy server so that it can route the call. Furthermore, the proxy
server needs to modify some message fields, such as the “Via” header, by adding its own IP
address.
Therefore, SIP UAs need an authentication mechanism that lets them trust proxy servers
before this encryption is implemented. For this purpose, you should have lower-layer
security mechanisms like TLS or IPSec between UAs and proxy servers, which are discussed
in the next section.
If there are limitations to end-to-end full encryption, what is the alternative? Two parts of
a SIP transaction can be encrypted: message body and media. The message body encryption
with S/MIME is recommended in RFC 3261. The media encryption is defined in RFC 3711.
The details of each method, with usage examples, follow.
NOTE S/MIME, as the name implies, is a combination of the MIME format plus a security
specification. MIME was developed by the Internet Engineering Task Force (IETF) to define
the format of email messages supporting characters beyond US-ASCII, non-text attachments,
multi-purpose message bodies, and header information in non-ASCII characters. This MIME
format has also been adopted by other protocols like HTTP as a supplement (SIP is derived
from HTTP). The security specification was originally defined in the de facto standard
PKCS #7 by RSA Laboratories, showing how to encrypt messages with a public key. The IETF
adopted PKCS #7 and documented it in RFC 2315 (CMS, “Cryptographic Message Syntax”).
However, there could be an issue if some network intermediaries rely on reading or
modifying the message body (SDP). Typical proxy servers do not modify the SDP, but some
servers, such as a Back-to-Back User Agent (B2BUA) or a Session Border Controller (SBC), do
modify it. For information on SBC, refer to Chapter 5, “VoIP Network Elements,” and Chapter
8, “Protection with Session Border Controller.”
Now that you are aware of the general concept of S/MIME, the next topic is detailed usage
of S/MIME with SIP.
S/MIME Certificates
The certificate for S/MIME is used to identify an end user, asserting that the holder is
identified by an end-user address, which is a combination of the “userinfo”, “@”, and
“domainname” portions of a SIP or SIPS Uniform Resource Identifier (URI; see the
following Note), typically the user’s address-of-record.
NOTE A SIP or SIPS URI identifies a communications resource. Like all URIs, SIP and SIPS
URIs may be placed in web pages, email messages, or printed literature. They contain
sufficient information to initiate and maintain a communication session with the resource.
Examples of communications resources include the following:
• A user of an online service
A SIPS URI specifies that the resource be contacted securely. This means, in particular, that
TLS is to be used between the UAC and the domain that owns the URI. From there, secure
communications are used to reach the user, where the specific security mechanism depends
on the policy of the domain. Any resource described by a SIP URI can be “upgraded” to a
SIPS URI by just changing the scheme, if the goal is to communicate with that resource
securely.
The certificate is associated with private/public keys that are used to sign or encrypt bodies
of SIP messages. As a public-key–based cryptographic mechanism, bodies are signed with
the private key of the sender (who may include their public key with the message as appropriate),
but bodies are encrypted with the public key of the intended recipient. Obviously, senders
must have foreknowledge of the public key of recipients in order to encrypt message bodies.
Public keys can be stored within a UA on a virtual keyring.
Each user agent that supports S/MIME must contain a keyring specifically for end users’
certificates. This keyring should map between address-of-record and corresponding
certificates. Over time, users should use the same certificate when they populate the
originating URI of signaling with the same address-of-record.
The certificate can be acquired from known public certificate authorities or well-known
centralized directories that distribute end-user certificates. Note that there is no
comparable standard way to obtain someone else’s certificate. It is also possible for users
to create self-signed certificates for a particular service.
The next section shows how SIP distributes public keys through S/MIME.
d4z+p7Kxe3L23ExE0phaJKBEj2TSGZ3V1ExI9Q1tv5VG/+onyohs+JH09B41bY8i7RaWgSu
OF1s4GgD/oI34a8iSrUxq4Jw0e7wi/ZhSAXGKsZfoVi/G7NNTSljf2YUeyxDKE8H5BQP1Gp
2NOM/Kl4vTyg+W4o4GBMH8wDAYDVR0TAQH/BAIwADAOBgNVHQ8BAf8EBAMCBsAwHwYDVR0j
BBgwFoAUcEQ+gi5vh95K03XjPSC8QyuT8R8wHQYDVR0OBBYEFL5sobPjwfftQ3CkzhMB4v3
jl/7NMB8GA1UdEQQYMBaBFEFsaWNlRFNTQGV4YW1wbGUuY29tMAkGByqGSM44BAMDMAAwLQ
Now that you are aware of SIP message encryption with S/MIME, the next section takes a
look at how media can be securely encrypted.
Media Encryption
Secure RTP (SRTP) is an extension of Real-time Transport Protocol (RTP), which provides
security features, such as encryption and authentication.
The method of securing RTP packets was not defined when SIP (RFC 3261) was released.
In 2004, researchers from Cisco and Ericsson proposed the specification and the IETF
published it as RFC 3711. It provides a framework for encryption and message authentication
of RTP and Real-time Transport Control Protocol (RTCP) streams. Note that in this section,
SRTP also covers Secure RTCP (SRTCP).
SRTP has not been widely deployed yet for VoIP services because of some issues, such as
performance, complexity of implementation, and interoperability. However, it is a critical
technology that you can use to provide the confidentiality and integrity of media streams.
It follows the common approach in which the communicating parties exchange
keys and then encrypt and decrypt RTP packets.
The usage of SRTP is described next in terms of key derivation and packet processing,
based on RFC 3711. A method of simulating the SRTP process is also introduced for your
hands-on test.
NOTE The following subsections give the high-level concept and idea of SRTP, rather than
describing every detail. For more information, refer to RFC 3711.
Key Derivation
SRTP uses two types of keys to make RTP packets secure: master key and session key.
The master key is a random bit string provided by the key management protocol and it is
used to derive session keys.
NOTE Like salting bland food, a salt key is used to salt a bland encryption system; in other
words, to add complexity to encryption/decryption. What is the benefit? It makes it far
harder to recover the protected value (for example, a password) with a dictionary or
brute-force attack.
For example, think about a regular authentication system like a Microsoft Windows login;
a user creates a new password when requested and Windows hashes the password before
storing it. You cannot read the actual password from the hash alone, but you
may easily crack it with a password-dictionary attack because humans use very simple
passwords compared to those generated by a machine. Because of this vulnerability, a salt
(machine-generated random values) is appended to the password before it is hashed.
Without knowing the salt, cracking this password with a precomputed dictionary is impractical.
This salt key mechanism is applied to RTP packet encryption as well as generating session
keys from master keys.
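The effect of a salt can be seen in a few lines of Python. This is a minimal sketch of the idea in the preceding Note, not how any particular operating system stores passwords; SHA-256, the 16-octet salt, and the function name are assumptions made for the example, and a real password store would use a deliberately slow key derivation function.

```python
import hashlib
import os

def hash_password(password, salt=None):
    """Hash a password with a random salt appended before hashing.

    Returns (salt, hex digest); the salt is needed again to verify.
    """
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.sha256(password.encode("utf-8") + salt).hexdigest()
    return salt, digest
```

Two users with the same simple password end up with different stored hashes, so a dictionary of precomputed hashes no longer matches either of them.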
The key derivation can be depicted at a high level, as shown in Figure 7-3.
[Figure 7-3: key derivation. Recoverable labels: Session Key (Salt), Packet Index, Sender,
Receiver]
Note that RFC 3711 does not describe how to pass the master key and salt from the key
management system to the sender or receiver. It could be implemented in many different
ways (for example, containing the keys in the SDP during the call setup).
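To make the role of the master salt and packet index more concrete, the following sketch builds the input block that the default key derivation function of RFC 3711 feeds into AES in Counter Mode. It follows the RFC's description as summarized here: an 8-bit label (0 for the session encryption key, 1 for the authentication key, 2 for the session salt) is combined with the packet index divided by the key derivation rate, XORed into the 112-bit master salt, and shifted left by 16 bits. The AES step itself is omitted because it needs a cipher library, and the function name is an assumption made for the example.

```python
def key_derivation_input(master_salt, label, packet_index, key_derivation_rate=0):
    """Build the 128-bit counter-mode input block for one session key."""
    if len(master_salt) != 14:
        raise ValueError("master salt must be 14 octets (112 bits)")
    # r = index DIV key_derivation_rate, defined as 0 when the rate is 0.
    r = 0 if key_derivation_rate == 0 else packet_index // key_derivation_rate
    key_id = (label << 48) | r            # 8-bit label followed by 48-bit r
    x = int.from_bytes(master_salt, "big") ^ key_id
    return (x << 16).to_bytes(16, "big")  # x * 2^16
```

Encrypting this block (and its successors) with the master key produces the session key for the given label, so a different label yields an unrelated key from the same master key and salt.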
The next section shows how communication parties process SRTP packets.
Step 5 Encrypt the RTP payload (the default cipher is Advanced Encryption
Standard [AES]).
Step 6 If the master key identifier (MKI) indicator is set to one, append the MKI
to the packet.
Step 7 For message authentication, compute the authentication tag for the
authenticated portion of the packet.
Step 8 If necessary, update the rollover counter using the packet index
determined in Step 2.
These steps make the SRTP packets that the sender will send to the receiver.
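The authentication tag in Step 7 can be sketched with the Python standard library. RFC 3711 specifies HMAC-SHA1 as the default transform, computed over the authenticated portion of the packet followed by the rollover counter and truncated to 80 bits; the function names and the byte layout of the sample inputs are assumptions made for the example.

```python
import hashlib
import hmac

def srtp_auth_tag(auth_key, authenticated_portion, rollover_counter, tag_length=10):
    """HMAC-SHA1 over (authenticated portion || ROC), truncated to tag_length octets."""
    message = authenticated_portion + rollover_counter.to_bytes(4, "big")
    return hmac.new(auth_key, message, hashlib.sha1).digest()[:tag_length]

def tag_is_valid(auth_key, authenticated_portion, rollover_counter, tag):
    """Receiver side: recompute the tag and compare in constant time."""
    expected = srtp_auth_tag(auth_key, authenticated_portion, rollover_counter, len(tag))
    return hmac.compare_digest(expected, tag)
```

Because the rollover counter is part of the authenticated data but is not sent in the packet, a receiver whose counter is out of step rejects the packet.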
The receiver will take the following steps to authenticate and decrypt the SRTP packets:
Step 1 Determine which cryptographic context to use.
Step 2 Get the index of the SRTP packet.
Step 3 Determine the master key and master salt (if the MKI indicator in the
context is set to one, use the MKI in the SRTP packet; otherwise, use the
index from the previous step).
Step 4 Determine the session keys and session salts by using the master key/salt,
key derivation rate, and session key-lengths in the cryptographic context
with the index, determined in Steps 2 and 3.
Step 5 For message authentication and replay protection, first check if the
packet has been replayed by using the replay list (if the packet is
replayed, the packet must be discarded).
Step 6 Decrypt the encrypted portion of the packet by using the decryption
algorithm indicated in the cryptographic context, the session encryption
key and salt found in Step 4.
Step 7 Update the rollover counter and highest sequence number in the
cryptographic context.
Step 8 Remove the MKI and authentication tag fields from the packet.
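The replay check in Step 5 is typically implemented as a sliding window over packet indexes. The following sketch shows the idea; the class name, the window size of 64, and the use of a Python set are assumptions made for the example, and a real implementation would record an index only after the authentication tag has been verified.

```python
class ReplayWindow:
    """Remember recently received packet indexes and reject repeats."""

    def __init__(self, size=64):
        self.size = size
        self.highest = -1  # highest index accepted so far
        self.seen = set()  # accepted indexes still inside the window

    def check_and_update(self, index):
        if index <= self.highest - self.size:
            return False   # older than the window: cannot tell, so discard
        if index in self.seen:
            return False   # replayed packet: discard
        self.seen.add(index)
        if index > self.highest:
            self.highest = index
            # Slide the window forward and forget indexes that fell out.
            self.seen = {i for i in self.seen if i > self.highest - self.size}
        return True
```

Packets may arrive out of order within the window and still be accepted once; a second copy of the same index is discarded.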
Now that you are aware of the basic steps of constructing, authenticating, and decrypting
SRTP packets, the next section gives you the opportunity to simulate the SRTP process with
a test tool.
SRTP Test
You can simulate the SRTP process with a test tool named libSRTP. libSRTP is an
open-source implementation of SRTP originally created by David McGrew of Cisco
Systems, who is one of the authors of RFC 3711.
libSRTP uses the default key derivation function, which uses AES-128 in Counter Mode.
It requires a 16-octet master key and a 14-octet master salt in order to generate session keys.
You can download libSRTP from the following website (it is available under a Berkeley
Software Distribution [BSD]-style license):
http://srtp.sourceforge.net/download.html
Download the latest version from the website and compile it on your target machine as
follows:
> gunzip srtp-X.Y.Z.tgz
> tar xvf srtp-X.Y.Z.tar
> cd srtp
> autoconf
> ./configure
> make
You can see the basic usage with the rtpw command as follows:
[root@ test]# ./rtpw
error: neither sender [-s] nor receiver [-r] specified
usage: ./rtpw [-d <debug>]* [-k <key> [-a][-e]] [-s | -r] dest_ip dest_port
or ./rtpw -l
where -a use message authentication
-e use encryption
-k <key> sets the srtp master key
-s act as rtp sender
-r act as rtp receiver
-l list debug modules
-d <debug> turn on debugging for module <debug>
Now, run the receiver first with a random 30-octet key/salt (you may use any generation
tool) and then run the sender with the same master key/salt. The libSRTP sender sends random
words automatically, and you can see the same words displayed on the receiver’s screen, as
shown in Example 7-5.
Example 7-5 SRTP Test with libSRTP
word: &c
word: 'd
word: 'em
word: 'll
word: 'm
word: 'mid
word: 'midst
word: 'mongst
word: 'prentice
word: 're
word: 's
word: 'sblood
In Example 7-5, the first command is for a sender (“-s” parameter) and the second
command is for a receiver (“-r” parameter). Both the sender and the receiver are on the same
local machine (127.0.0.1) in this example. The receiver receives the words in the same
order in which the sender sends them.
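If you do not have a key generation tool at hand, the 30-octet master key/salt for this test can be produced with a short Python sketch. It assumes that the -k option of rtpw expects the 16-octet key followed by the 14-octet salt as one hexadecimal string; check the usage text of your libSRTP version before relying on that. The function name is made up for the example.

```python
import secrets

def make_master_key_and_salt():
    """Return 30 random octets (16-octet key + 14-octet salt) as hex text."""
    return secrets.token_hex(30)
```

Print the value once and pass the same string to both the receiver and the sender.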
You have learned about message encryption with S/MIME and media encryption with
SRTP in this section. The next section shows how to secure VoIP service in the transport
and network layers.
[Figure: trust relationships for TLS. Recoverable labels: Exchange Certificate (A’s Proxy,
B’s Proxy); Pre-Existing Trust (twice); No Pre-Existing Trust (UA-A, UA-B)]
NOTE The transport mechanisms are specified on a hop-by-hop basis in SIP, so a user agent
that sends requests over TLS to a proxy server has no assurance that TLS will be used
end-to-end.
TLS can be specified in SIP-URI or Via header signifying TLS over TCP, as shown in the
following examples:
INVITE sip:[email protected]; transport=tls
Via: SIP/2.0/TLS 10.10.10.155:5060;branch=z9hG4bK-a7140dfd
The Advanced Encryption Standard (AES) must be supported at a minimum when TLS is
used in a SIP application. For the purpose of backward compatibility, all SIP servers (proxy,
redirect, and registrar) should support triple DES (3DES).
SIP can also specify the usage of TLS when targeting a specific resource by means of the SIPS
URI format, which is the same as the SIP URI except that it uses the “sips:” scheme, as follows:
sips:[email protected]:5060;uri-parameters?headers
Using SIPS means that TLS is expected on each hop until the request reaches the terminating
UAS that has the target resource. However, in real service environments, some other security
mechanism could be used on part of the path rather than an end-to-end chain of TLS connections.
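The three indications of TLS described in this section (the “sips:” scheme, the transport=tls URI parameter, and the TLS transport in the Via header) can be checked with simple string handling. The following is a minimal sketch rather than a full SIP URI parser; the function names are assumptions, the sample addresses are placeholders, and the Via argument is the header value without the “Via:” name.

```python
def uses_tls(request_uri, via_value=""):
    """Return True if the URI or the Via header value indicates TLS."""
    uri = request_uri.strip().lower()
    if uri.startswith("sips:"):
        return True
    # URI parameters follow the first ';' and end before any '?headers' part.
    params = [p.strip() for p in uri.split("?", 1)[0].split(";")[1:]]
    if "transport=tls" in params:
        return True
    return via_value.strip().upper().startswith("SIP/2.0/TLS")

def upgrade_to_sips(uri):
    """'Upgrade' a SIP URI to a SIPS URI by changing only the scheme."""
    if uri.lower().startswith("sip:"):
        return "sips:" + uri[4:]
    return uri
```

Keep in mind the Note above: these indications describe what is requested for a hop, not proof that every hop actually used TLS.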
Now that you are aware of the basic usage of TLS with SIP, the next topic is network-layer
security with IPSec.
IPSec (Tunneling)
IPSec is a suite of network-layer protocols that secure IP network communications by
encrypting and authenticating data. It is generally used for virtual private network (VPN)
connection.
IPSec (network layer) is independent of SIP (application layer), and no integration
between them is required. Unlike the integration with TLS,
SIP does not provide any indication of IPSec in its messages. However, practically speaking,
IPSec is very useful for providing security between SIP entities, especially between a UA
and a proxy server. UAs that have a preshared keying relationship with their first-hop proxy
server are good candidates to use IPSec.
Implementers should treat IPSec as a security mechanism separate from the SIP protocol because
IPSec is usually deployed at the operating system level in a host, or on a security gateway
(for example, a VPN server) that provides confidentiality and integrity for all traffic that it
receives from a particular interface. (The detailed usage of IPSec is beyond the scope of this
book. For more information on IPSec, go to Cisco.com.)
You have learned about transport and network layer security with SIP in this section. The
next section shows threat models and prevention from a SIP perspective.
Registration Hijacking
SIP registration allows a user agent to identify itself to a registrar as a device at which a
user is located. A registrar assesses the identity asserted in the From header field of a
REGISTER message to determine whether this request can modify the contact addresses associated
with the address-of-record in the To header field. In most cases, these two fields are the same,
which means the user agent registers its own address-of-record. However, these two fields
could be different in the case of third-party registration, which means the third party
registers the user agent (address-of-record) on the user’s behalf.
Here is a serious security hole through which the registration could be hijacked:
An attacker impersonates a user agent by forging the From header and supplying the attacker’s
own address as the contact when it sends a REGISTER message, which updates the contact
addresses bound to the address-of-record of the target user. Typically, the attacker first
unregisters the existing contacts, then registers its own address and hijacks all the
messages going to the target user.
This threat happens when the user agent server (registrar) is relying only on SIP headers
to identify the user agent. The method of prevention is that the user agent server should
authenticate the originator of requests based on cryptographic assurance; for example,
by TLS.
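The difference between trusting the headers and trusting an authenticated identity can be sketched as follows. The function, its parameters, and the third-party table are hypothetical; the authenticated identity stands for whatever the registrar learned from Digest authentication or a TLS client certificate, never from the From header itself.

```python
def may_register(authenticated_identity, from_aor, to_aor, third_party_allowed=None):
    """Decide whether a REGISTER may change the contacts of to_aor.

    authenticated_identity: identity proven cryptographically, or None.
    third_party_allowed: maps a registering identity to the set of
    addresses-of-record it may register on a user's behalf.
    """
    third_party_allowed = third_party_allowed or {}
    if authenticated_identity is None:
        return False  # the headers alone prove nothing
    if authenticated_identity != from_aor:
        return False  # From header does not match the proven identity
    if from_aor == to_aor:
        return True   # the user agent registers its own address-of-record
    return to_aor in third_party_allowed.get(from_aor, set())
```

With this check, an attacker who merely rewrites the From header is refused because the proven identity does not match it.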
The next threat is impersonating a server.
Impersonating a Server
Generally, a user agent sends a request to a server (proxy, registrar, or redirect server) in the
target domain, which is specified in the Request-URI. It is possible for an attacker to
impersonate the server, receive all requests, and manipulate them. Here is an example:
User agent A sends requests to its redirect server in the same domain (abc.com) when
making a call. An attacker’s redirect server in a different domain (xyz.com) impersonates
A’s redirect server by malicious means, such as attacking a Domain Name System (DNS)
server. From then on, A’s requests go to the attacker’s redirect server, and the attacker
redirects the call to a malicious proxy server that is totally under the attacker’s control.
If a registrar server is impersonated in this way, the situation is worse. The attacker
responds with a SIP 301 (Moved Permanently) carrying wrong contact information to a REGISTER
request from the user agent, which makes the user agent register with the wrong registrar
server from then on.
The method of prevention is providing the mechanism of cryptographic authentication from
user agents to SIP servers; for example, by TLS.
The next threat is tearing down sessions.
User agent A makes a call to user agent B. An attacker in the middle sniffs all SIP messages
(for example, INVITE, OK, ACK) and records the dialog identifiers: the From tag, To tag,
and Call-ID. Then, the attacker tears down the session by sending a BYE to either A or B. Or,
the attacker can eavesdrop on the media by sending a re-INVITE to either A or B and anchoring
the media through the attacker’s server.
The method of prevention is authenticating the sender of the BYE (or re-INVITE). The user
agent needs to know that the BYE (or re-INVITE) came from the same party with whom
the corresponding dialog was established. Another possible method of prevention is
encrypting all headers so that an attacker cannot see the session information, but this is
generally not practical because many headers (for example, Via, From, To) are supposed to
be visible to intermediaries like proxies. TLS can also be used to prevent the attack.
The next section shows the most common and critical threat: denial-of-service and
amplification.
protocol is not originally designed for security. Therefore, you need to have a policy-based
security device like SBC that is able to detect and prevent DoS attacks, as described in
Chapter 8, “Protection with Session Border Controller.”
Now that you are aware of the many different types of threats and protection methods with
SIP, the next section takes a look at what limitations those methods have.
Limitations
The SIP protocol (RFC 3261) itself provides a few security features, and also gives
guidelines for using other protocols (for example, S/MIME, TLS, and IPSec) as described
in the previous sections, which are very useful for building a secure VoIP service network.
However, you also need to know what limitations exist.
S/MIME Limitations
The number one issue with S/MIME in SIP is the man-in-the-middle (MITM) attack
because of its loose key management.
If self-signed certificates are used, which is allowed in SIP, an attacker on the path of the
initial request can intercept it, modify it, and send forged certificates to the other party. From
then on, the attacker in the middle can monitor or manipulate all the messages between the two
parties.
Of course, this attack works only when the initial self-signed certificate is intercepted;
otherwise, the UAs can detect any change of certificate. Therefore, the most critical point is
how the keys are initially distributed from the key management system to UAs, or how they are
exchanged between UAs.
The next limitations are on TLS, as follows.
TLS Limitations
The biggest limitation of TLS is that it cannot run over UDP. It requires a TCP (connection-
oriented) connection, which consumes far more resources than UDP, especially when a
long-lived TCP connection is used. There is a relatively new protocol, Datagram Transport
Layer Security (DTLS; see the following Note), that supports TLS-equivalent security over
UDP, but DTLS is not commonly deployed yet.
For the same reason, TLS creates a scalability issue when a large number of user agents
each maintain a long-lived TCP connection with a proxy or registrar server, which is very
common in global VoIP networks. That is why TLS has not been widely deployed in large
service networks even though it has significant security benefits.
TLS allows UAs to authenticate only the adjacent server, which means there is no guarantee
of end-to-end TLS in a SIP transaction.
The next limitations are on SIPS URI as follows.
NOTE DTLS (RFC 4347) specifies the Datagram Transport Layer Security protocol, which provides
communications privacy for datagram protocols. It allows client/server applications
to communicate in a way that is designed to prevent eavesdropping, tampering, or message
forgery. It is based on the TLS protocol and provides equivalent security guarantees. Datagram
semantics of the underlying transport are preserved by the DTLS protocol.
Summary
This chapter demonstrates how to make VoIP service secure with SIP and other supplementary
protocols (S/MIME, SRTP, TLS, and IPSec). It focuses especially on the methodology
of protection in the areas of authentication, encryption, and transport/network layer security,
in conjunction with threat models and limitations.
The SIP protocol itself is not secure enough to provide VoIP service through the public
Internet because it was not originally designed for security, but its security features are
essential to build up the whole security structure in conjunction with existing security
models derived from other protocols, such as HTTP and SMTP.
SIP provides challenge-based Digest authentication between UAC and UAS, or between
UA and proxy server. When a server receives a request, it may challenge the request to
obtain assurance of the originator’s identity. The originator can reply with its credential
in hashed form (for example, using MD5), or reject the challenge. When the credential is
received, the server verifies it and sends back the corresponding response code.
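The Digest exchange summarized above can be sketched as follows. This is a minimal illustration of the RFC 2617 calculation in its basic form (without the optional qop parameter), not a complete SIP authentication implementation; the user name, realm, password, and nonce are hypothetical values chosen for the example.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    """Return the lowercase hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """Compute the Digest response (RFC 2617 form without qop)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # part bound to the shared secret
    ha2 = md5_hex(f"{method}:{uri}")                 # part bound to the request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def server_accepts(stored_password: str, username: str, realm: str,
                   method: str, uri: str, nonce: str, response: str) -> bool:
    """Recompute the response on the server side and compare it."""
    expected = digest_response(username, realm, stored_password,
                               method, uri, nonce)
    return hmac.compare_digest(expected, response)


# Hypothetical values for illustration only.
resp = digest_response("alice", "example.com", "secret",
                       "REGISTER", "sip:example.com", "abc123")
```

The password itself never crosses the wire; only the hash does, and the server-issued nonce ties the hash to one challenge.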
End-to-end full encryption is the most common way to provide message confidentiality and
integrity between communication endpoints. SIP also recommends encryption for this
purpose, but there is a limitation: encrypting all SIP requests and responses end-to-end is
not practical because intermediaries such as proxy servers have to read the message fields
to route properly. Therefore, two parts of a SIP transaction are recommended to be
encrypted: the message body (with S/MIME) and the media (with SRTP).
Another security mechanism is necessary so that proxy servers are trusted by UAs. For this
purpose, you need a low-layer security mechanism that encrypts entire SIP requests and
responses on the wire on a hop-by-hop basis, and allows UAs to verify the identity of proxy
servers to whom they send requests. SIP recommends TLS and IPSec for the purpose.
SIP protocol-specific threat models exist, such as registration hijacking, impersonating
a server, tearing down sessions, and DoS. These can be mitigated by authentication,
encryption, or lower-layer security methods.
End Notes
1 RFC 3261, “SIP (Session Initiation Protocol),” J. Rosenberg, H. Schulzrinne,
G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler,
June 2002.
2 RFC 2617, “HTTP Authentication: Basic and Digest Access Authentication,”
J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen,
L. Stewart, June 1999.
References
RFC 2633, “S/MIME Version 3 Message Specification,” B. Ramsdell, June 1999.
RFC 2315, “Cryptographic Message Syntax Version 1.5,” B. Kaliski, March 1998.
This chapter covers the methodology of protection with a Session Border Controller (SBC).
The main subjects are as follows:
• Network border issues
• Access and peer SBCs
• SBC functionality
• Service architecture design
CHAPTER 8
Protection with Session Border Controller
There is no single solution to secure a VoIP service network entirely. The best practice is to
integrate all possible solutions according to service model, network architecture, protocol
model, target customers, peering partners, and so on.
Chapter 7, “Protection with VoIP Protocol,” demonstrated the methods of how to protect
VoIP service with VoIP protocols. This chapter will demonstrate additional methods of
protection with a major security device, Session Border Controller (SBC).
An SBC is, as the name implies, a controlling device located on a border of two network
sessions. The session is a logical boundary of a VoIP network. For a better understanding,
in this book it is also referenced as either domain or realm. Figure 8-1 shows an example
of session borders among different VoIP networks.
The role of SBC is, simply speaking, resolving border issues. What are the border issues,
then? They are interoperability and security issues taking place in the border, such as
Denial-of-Service (DoS), call flooding, traversing media (one-way audio), coder-decoder
(codec) conflict, and so on.
This chapter covers the details of border issues in a VoIP network, especially security
issues, and the methodology of preventing them with an SBC.
NOTE The content in this chapter is written from the perspective of VoIP service providers or
enterprises, who generally design and deploy SBCs into their service network.
[Figure 8-1: Session borders among different VoIP networks. The core network has borders with an enterprise customer network, a residential customer network, a partner network (PSTN termination), and a partner network (IP hand-off).]
Border Issues
There are typically two network borders from a VoIP service provider’s perspective. One is
between the customer’s access network and the service provider’s network (core network).
The other is between the core network and the other service provider’s network (peer
network).
The customer’s access network is most likely that of the local Internet service provider
(ISP) who provides Internet access service, which is generally different from the service
provider’s network (see the following Note).
NOTE Many VoIP service providers, such as Vonage or Skype, do not provide the access network;
however, some do provide it, especially for enterprise customers, to ensure quality of
service and security. Some ISPs, such as Comcast or AT&T, also provide both the access
and service network for their customers.
The peer network is typically a call-termination network like public switched telephone
network (PSTN) termination. Figure 8-2 illustrates the VoIP service between the different
networks at a high level.
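[Figure 8-2: VoIP service between different networks. Access network: residential customer (IAD), enterprise customer (firewall/NAT), and local ISP. Peer network: service provider for PSTN termination (proxy, MG) and the PSTN (CO). The networks are connected through ERs.]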
In an access network, most residential customers use an integrated access device (IAD; for
example, a digital subscriber line [DSL] or cable modem) that the local ISP provides in order
to access the Internet. They can send and receive calls with an IP phone or softphone through
IP connectivity. An IAD may have extra Foreign Exchange Station (FXS) port interfaces for
regular PSTN phones. Enterprise customers generally maintain their own internal network and
use a local ISP for Internet access through a firewall/Network Address Translation
(NAT) device. They may use an IP PBX, multiport IAD, IP phones, softphones, and so on for VoIP
service.
In the core network, the service provider maintains all servers, such as the Softswitch, proxy,
gatekeeper, application server, and database, in order to process call requests or registrations.
Many servers have interfaces that accept call requests directly from clients in public (access)
networks, and route the calls to either the service provider’s own network or another service
provider’s network that is a peer network (for example, PSTN termination). Generally, a
core network already has a fair level of security provided by a firewall.
In a peer network, the partner company also maintains some server groups and uses them
to route the call to the corresponding destination. It is generally more secure than an access
network, but there are still security issues facing it.
Some border issues are common to access, core, and peer networks, but most issues differ
from one network to another because each has a different policy, topology, management model,
and so on.
The next topic is the issues between access and core networks.
SBC Functionality
The primary function of an SBC is resolving the border issues that are listed in the section
“Border Issues” of this chapter. This section covers the concept of the function of SBCs,
the guidelines of functional design, and usage examples. Keep in mind that this content is
explained at a high level; the actual implementation and utilization of the function can vary
from company to company.
[Figure: An SBC located between two groups of endpoints, each consisting of a voice gateway, IP phone, and softphone.]
The external endpoints (Customer Premises Equipment [CPE]) cannot distinguish the mediation
device (SBC) from the VoIP servers, because the SBC passes the same protocol messages as
the servers; only the IP and port information differs. An SBC also typically has the capability
to rewrite all headers in the message so that the endpoints cannot see any other node’s
information; for example, an SBC can rewrite a SIP Via header by removing the history and adding
only its own IP information. That is, a user cannot see where the actual server is or how the call is routed.
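The Via rewriting described above can be sketched as follows. This is a simplified illustration under stated assumptions, not how any particular SBC product works: the function name, the sample message, the internal address 10.1.1.5, and the branch values are hypothetical, and a real B2BUA would also keep state so that it can route responses back along the original path.

```python
def hide_topology(sip_message: str, sbc_via: str) -> str:
    """Drop every Via header line and insert a single Via for the SBC."""
    head, sep, body = sip_message.partition("\r\n\r\n")
    start_line, *headers = head.split("\r\n")
    # "v:" is the compact form of the Via header name.
    kept = [h for h in headers
            if not h.lower().startswith(("via:", "v:"))]
    new_head = "\r\n".join([start_line, f"Via: {sbc_via}", *kept])
    return new_head + sep + body


# Hypothetical INVITE arriving from the core network with its routing history.
core_invite = (
    "INVITE sip:user@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 66.66.71.32:5060;branch=z9hG4bKcore1\r\n"
    "Via: SIP/2.0/UDP 10.1.1.5:5060;branch=z9hG4bKcore2\r\n"
    "From: <sip:caller@example.com>;tag=1\r\n"
    "To: <sip:user@example.com>\r\n"
    "Call-ID: abc@example.com\r\n"
    "\r\n"
)

hidden = hide_topology(core_invite,
                       "SIP/2.0/UDP 77.244.244.196:5060;branch=z9hG4bKsbc1")
```

After the rewrite, the endpoint sees only the SBC's access-side address in the Via header; the addresses of the core servers no longer appear.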
NOTE Because an SBC is a single entity and its IP address/port is known to anything connecting
to it, an SBC can become a point of attack, which is why an SBC has to have sophisticated
security features, high capacity, robustness, and high availability.
NOTE Back-to-Back User Agent (B2BUA), as defined in RFC 3261, is a logical entity that
receives a request and processes it as a User Agent Server (UAS). To determine how the
request should be answered, it acts as a User Agent Client (UAC) and generates requests.
Unlike a proxy server, it maintains dialog state and must participate in all requests sent on
the dialogs it has established. Because it is a concatenation of a UAS and UAC, no explicit
definitions are needed for its behavior.
The following example shows the simple call request from CPE to Softswitch (Proxy)
through an access SBC that provides two separate network interfaces for each counterpart.
Figure 8-4 illustrates the call flow with IP address change when going through SBC.
[Figure 8-4: Call flow with IP address change through the access SBC. On the CPE side, the INVITE and 200 OK carry 77.244.244.196 in the From (F), To (T), and Contact fields; on the Softswitch side, they carry 66.66.71.32.]
In Figure 8-4, the access SBC has two IP interfaces: 77.244.244.196 faces the access
network (CPE) and 77.244.244.236 faces the core network (Softswitch).
The CPE sends an INVITE with its domain (IP), but the access SBC replaces it with the other
domain (IP) in the core network. The 200 OK is converted in the same way. The point is
that CPE in the access network cannot see any core network information, even within
headers and bodies in response messages. The actual messages in this dialog are listed in
Example 8-1, which gives you the details of message conversion providing topology
hiding. The important area is shaded.
v=0
o=4084003020 26235624 26235644 IN IP4 66.67.74.188
c=IN IP4 66.67.74.188
t=0 0
m=audio 8000 RTP/AVP 0
a=rtpmap:0 pcmu/8000
v=0
o=4084003020 26235624 26235644 IN IP4 77.244.244.236
c=IN IP4 77.244.244.236
t=0 0
m=audio 45572 RTP/AVP 0
a=rtpmap:0 pcmu/8000
v=0
o=- 2385770530 2385770531 IN IP4 66.66.71.32
s=SIP Call
c=IN IP4 66.66.71.16
t=0 0
m=audio 19546 RTP/AVP 0
a=rtpmap:0 PCMU/8000
v=0
o=- 2385770530 2385770531 IN IP4 77.244.244.196
s=SIP Call
c=IN IP4 77.244.244.196
t=0 0
m=audio 45572 RTP/AVP 0
a=rtpmap:0 PCMU/8000
Note that SBC converts the Session Description Protocol (SDP) (“c=” line) as well so that
the media can be anchored (relayed) by SBC instead of end-to-end media, which is the most
common service architecture to secure the media path and control the bandwidth. If any
malicious endpoint floods media, SBC can detect it and demote the endpoint based on
the policy.
DoS Protection
The major need for an SBC came from the requirement of protecting VoIP servers from
flooded traffic regardless of whether it is malicious or not. A DoS attack typically
takes the form of flooded traffic. You can see examples in Chapter 6, “Analysis and Simulation of
Current Threats.”
In reality, most VoIP servers (for example, Softswitch, proxy, and gatekeeper) tend to
support only limited protection against flooded traffic, such as call admission control
(CAC). Of course, that limited function cannot prevent the variety of DoS attacks. Why do
those servers not fully support DoS protection, then? The main reason is the capability of VoIP
servers; supporting DoS prevention requires a large amount of system resources (CPU,
memory, bandwidth, and so on), as well as sophisticated software and hardware
architecture.
The method of implementing DoS protection functions in an SBC varies from company to
company, but the general concept of function design and usage can be summarized as in the
following section.
NOTE “Black or white list” is not an officially used term but is used here just to give you a better
understanding.
• White traffic—Secure signal or media from trusted endpoints. Generally, the VoIP
service provider deploys these endpoints and maintains the static IP information.
Some service providers manage even the access network as well and make sure all
traffic within their network is secure; this is not a common service model, though.
SBC should maintain White ACL and allow inbound and outbound traffic without
restriction.
• Black traffic—Insecure signal or media from untrusted endpoints. These endpoints
are either malicious or infected by attackers. Generally, this black traffic comes from
an unmanaged access network in which the VoIP service provider cannot control the
IP connectivity, which does not mean that all traffic from an unmanaged network
is Black.
SBC should maintain Black ACL and deny inbound and outbound traffic.
NOTE The reason for time expiration in the Black list is that even normal endpoints may have
malfunctions. The endpoints cannot have a service anymore even after recovery if there
is no expiration time in the Black list.
Meanwhile, each ACL should include all information for the endpoint as follows:
• Source IP address
• Source port
• Transport protocol (Transmission Control Protocol [TCP] or User Datagram
Protocol [UDP])
• Application Protocol (SIP, H.323, Media Gateway Control Protocol [MGCP],
Skinny Call Control Protocol [SCCP])
• Destination IP address (SBC’s)
• Destination port (SBC’s)
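The ACL fields listed above, together with the White and Black lists and the Black-list expiration time, can be sketched as a small data structure. This is an illustrative model only; the class names, the "unclassified" result for traffic on neither list, and the sample values are assumptions made for the example, not a description of any vendor's implementation.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class AclKey:
    """One ACL entry, using the fields listed in the text."""
    src_ip: str
    src_port: int
    transport: str       # "TCP" or "UDP"
    app_protocol: str    # "SIP", "H.323", "MGCP", or "SCCP"
    dst_ip: str          # the SBC's address
    dst_port: int        # the SBC's port


class AccessControl:
    def __init__(self) -> None:
        self.white: Set[AclKey] = set()
        self.black: Dict[AclKey, float] = {}  # entry -> expiry timestamp

    def allow(self, key: AclKey) -> None:
        self.white.add(key)

    def deny(self, key: AclKey, ttl_seconds: float,
             now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.black[key] = now + ttl_seconds

    def decide(self, key: AclKey, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        if key in self.white:
            return "allow"
        expiry = self.black.get(key)
        if expiry is not None:
            if now < expiry:
                return "deny"
            del self.black[key]  # entry expired: the endpoint gets service again
        return "unclassified"
```

The expiry on Black entries reflects the point made in the Note: an endpoint that recovers from a malfunction is not blocked forever.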
Hardware Architecture
It is a much better idea to design SBC-specific hardware in order to apply the DoS policies
effectively because software-only solutions have many limitations.
There are many possible architectures, and one of them is shown in Figure 8-5 from a high-
level perspective.
[Figure 8-5: Example SBC hardware architecture. Components: NICs, access control engine (dynamic and static ACL), policy engine, priority queue and non-priority queue, and control.]
Overload Prevention
The meaning of overload prevention in this context is that the SBC monitors regular traffic
from legitimate endpoints and controls it in order not to overwhelm VoIP servers, which is
somewhat different from DoS protection dealing with malicious or flooded traffic.
The typical method of preventing the overload is that an SBC reduces redundant or
unnecessary signals by controlling the frequency of messages (for example, periodical
registration or keepalive), or distributes the load to multiple targets based on policy. The
next section looks at the typical examples that an SBC can support.
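[Figure: Registration timer control. The IP phone sends REGISTER (Expires: 1800); the access SBC sends REGISTER (Expires: 3600) to the Softswitch and receives OK (Expires: 3600); the SBC returns OK (Expires: 60) to the IP phone. The IP phone then repeats REGISTER/OK with the SBC every 60 seconds.]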
As shown in the preceding example, the access SBC manipulates the “Expires” header, and
sets the timer to 60 seconds with the IP phone and 3600 seconds with the Softswitch, which
saves a significant amount of bandwidth and resources on the Softswitch.
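The effect of this timer manipulation can be sketched with a small model. It assumes, for illustration, that the SBC answers a REGISTER locally while its own registration with the Softswitch is still valid and forwards it only when that registration has run out; the class and method names are hypothetical, and a real SBC would normally refresh slightly before expiry.

```python
class RegistrationTimerControl:
    """Model of an access SBC shielding the Softswitch from frequent REGISTERs."""

    def __init__(self, access_expires: int = 60, core_expires: int = 3600) -> None:
        self.access_expires = access_expires   # timer given to the IP phone
        self.core_expires = core_expires       # timer used with the Softswitch
        self._core_valid_until = {}            # address-of-record -> time

    def on_register(self, aor: str, now: float):
        """Return (forward_to_core, expires_value_sent_to_phone)."""
        forward = now >= self._core_valid_until.get(aor, 0.0)
        if forward:
            self._core_valid_until[aor] = now + self.core_expires
        return forward, self.access_expires


# One phone re-registering every 60 seconds for one hour.
sbc = RegistrationTimerControl()
forwarded = sum(sbc.on_register("sip:phone1@example.com", t)[0]
                for t in range(0, 3600, 60))
```

In this model, 60 REGISTER messages from the phone produce a single REGISTER toward the Softswitch per hour.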
Ping Control
NOTE In this context, the term “ping” is not ICMP Ping in the network layer, but ping messages
in the application layer.
VoIP servers ping endpoints to check their availability and to keep the NAT pinhole open if
the endpoints use NATed IP addresses. The method of pinging depends on the protocol
in use. For example, a Softswitch may send SIP OPTIONS or MGCP AUEP to an endpoint
and validate the response.
A SIP OPTIONS request itself is a very light message in terms of bandwidth and resource
consumption. However, if the SIP server sends OPTIONS to a large number of endpoints
(say 50,000) very frequently, it becomes a significant issue.
An SBC can be the solution by sitting in the middle and controlling the number of
pings to save bandwidth and resources. An administrator needs to configure the SBC to
control the frequent OPTIONS messages to unmanaged endpoints so that SIP servers in the
core network do not have to handle the ping traffic.
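A rough calculation shows why the ping traffic matters. The 50,000 endpoints come from the text; the 30-second ping interval and the idea that the core server then pings only the SBC are assumptions made for this sketch.

```python
def ping_messages_per_second(endpoints: int, interval_seconds: float) -> float:
    """Requests plus responses when every endpoint is pinged once per interval."""
    requests_per_second = endpoints / interval_seconds
    return 2 * requests_per_second  # each OPTIONS request draws one response


# 50,000 endpoints is the figure from the text; the 30-second interval is assumed.
core_load_without_sbc = ping_messages_per_second(50_000, 30)

# Assumption: the SBC pings the endpoints itself, so the core server
# only needs to ping the SBC.
core_load_with_sbc = ping_messages_per_second(1, 30)
```

Under these assumptions, the core server handles more than 3,000 ping messages per second without an SBC and well under one per second with it.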
Load Balancing
It happens intermittently that VoIP servers are overloaded during a certain period of time
and users cannot make calls. There could be many reasons for this, but the typical one is a
failure of load balancing among multiple identical servers.
Most service providers deploy multiple VoIP servers with the same functions to divide the
traffic, after calculating expected traffic and capacity. However, this does not work well
when the method of distributing traffic is not efficient; some service providers “statically”
assign the server address (IP and port) to a CPE when initially deploying it. For example, a
certain group of CPE (Group A) always connects to a dedicated server (Proxy A), Group B
to Proxy B, Group C to Proxy C, and so on. The method of static assignment may work
reasonably well if each group generates less traffic all the time than what the corresponding
server can handle, but the reality does not quite work that way. Here are two main issues
with the static load balancing:
• When a certain server is heavily loaded, others that have light traffic cannot share the
overload. This happens commonly, especially in global VoIP service environments.
• The method of preventing overload (for example, deploying more servers to handle a
small group of CPE) can be wasteful and not cost-effective.
To avoid these issues, some service providers deploy a redirect server in a core network to
distribute the traffic, but it still has typical border issues (for example, flooding). Therefore,
an SBC can be a good solution to resolve both overloading and border issues. Figure 8-7
shows before and after deploying an SBC in terms of load balancing.
[Figure 8-7: Load balancing before and after deploying an SBC.]
The method of distributing traffic from an SBC depends on the strategy that an administra-
tor defines. Several methods are possible, and the following list describes some of them:
• Round Robin—Select each target server in turn and route a call request, which gives
equal load to all servers.
Round Robin is efficient when all servers have the same capacity.
• First Available—There is a priority selecting each server, and the highest one is
always selected first as long as it has not hit the threshold. If the highest is not
available or overloaded, the second is selected. The configuration comes with the
priority list.
First Available is efficient when a primary server exists along with backups.
• Random Select—Each server is selected randomly, which distributes the load almost
equally in the long run, like Round Robin.
Random Select is efficient when all servers have the same capacity.
• Least Busy—Select the target server having the least number of active sessions.
Least Busy is efficient when distributing calls based on real-time resource consumption
on the server. (Note that Round Robin or Random Select does not care about the
number of active sessions.)
• Proportional Distribution—Select the target server based on the defined proportion
of usage. For example, an administrator assigns 30 percent to Server A, 20 percent to
Server B, and 50 percent to Server C.
Proportional Distribution is efficient when each server or network has different
capacity.
These are typical methods of distributing traffic from SBC.
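The distribution methods listed above can be sketched as small selection functions. This is an illustrative model only; the Server class, its capacity threshold, and the server names are hypothetical, and Random Select is omitted because it is a single call to a random choice.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity: int      # threshold of active sessions
    active: int = 0    # current active sessions

    def available(self) -> bool:
        return self.active < self.capacity


def round_robin(servers):
    """Return an iterator that yields each server in turn, forever."""
    return itertools.cycle(servers)


def first_available(servers):
    """Servers are listed by priority; pick the first one under its threshold."""
    for server in servers:
        if server.available():
            return server
    return None


def least_busy(servers):
    """Pick the server with the fewest active sessions."""
    return min(servers, key=lambda s: s.active)


def proportional(servers, weights, rng):
    """Pick a server at random according to the configured proportions."""
    return rng.choices(servers, weights=weights, k=1)[0]


a = Server("A", capacity=100, active=100)   # primary has hit its threshold
b = Server("B", capacity=100, active=40)
c = Server("C", capacity=100, active=10)
pool = [a, b, c]
```

With this pool, First Available skips the overloaded Server A and selects Server B, whereas Least Busy selects Server C.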
In this section so far, you have learned about registration timer control, ping control, and
load balancing to prevent overload. The next topic is another important function of SBC,
NAT traversal.
NAT Traversal
How media traffic (voice and video) can traverse a NAT device is a very common
issue. Most CPE these days are located behind a NAT (or firewall) device and have RFC
1918 addresses (also known as “private IP”) that are routable only within the internal network.
Because the private IP is not routable in the public Internet, a NAT device maps it to its
public IP address whenever VoIP traffic goes through. However, a NAT
device converts only IP addresses in Layer 3 and does not know about IPs in the application
layer (for example, in SIP messages), which can cause the one-way or no-audio problem
that you may have encountered many times.
NOTE There are also VoIP-aware NATs that convert IP addresses in the application layer as well,
but these are not commonly deployed yet.
v=0
o=4084003020 26235624 26235644 IN IP4 192.168.10.10
c=IN IP4 192.168.10.10
t=0 0
m=audio 8000 RTP/AVP 0
a=rtpmap:0 pcmu/8000
The user agent sending the INVITE in Example 8-3 has a private IP 192.168.10.10 that is
written in “Via”, “Contact” header and SDP (“c” line). The private IP in the headers is not
an issue when making a call setup with a proxy server dealing with a NATed public IP.
However, the other user agent receiving the INVITE from the proxy looks at the SDP and
sends media to the address in the “c” line having a private IP, which means the media will
be dropped.
An SBC can be a solution to resolve this problem in the middle of the signaling path as a
proxy server or B2BUA. There are two ways of resolving it in an SBC:
1 Replacing a private IP with the SBC’s IP.
This is the most common way of traversing a NAT device. An SBC anchors (relays)
the media between user agents by replacing the IP (“c” line) and port (“m” line) with
its own IP/port.
It assumes that the endpoints behind a NAT device use symmetric Real-time Transport
Protocol (RTP)/Real-time Transport Control Protocol (RTCP) (see the following
Note).
The downside is that it affects the performance of SBC anchoring all media.
2 Replacing a private IP with the NATed IP.
Instead of anchoring the media, an SBC hands it off by replacing the private IP with
the NATed IP so that the other user agent may send the media directly to the originator
through the NAT device.
It assumes that a NAT device uses (maps) the same port number as an originator’s,
which is not a common feature. So, this method is not commonly used.
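The first method (media anchoring) can be sketched as a rewrite of the SDP body. This is a minimal illustration, assuming a single audio stream and IPv4 addresses; the function name is hypothetical, and the SBC address and port are sample values taken from the earlier example in this chapter.

```python
def anchor_media(sdp: str, sbc_ip: str, sbc_port: int) -> str:
    """Replace the endpoint's IP ("o=" and "c=" lines) and audio port ("m=" line)
    with the SBC's own, so that media is relayed through the SBC."""
    out = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            line = f"c=IN IP4 {sbc_ip}"
        elif line.startswith("o="):
            fields = line.split(" ")
            fields[-1] = sbc_ip          # last field of "o=" is the address
            line = " ".join(fields)
        elif line.startswith("m=audio "):
            fields = line.split(" ")
            fields[1] = str(sbc_port)    # second field of "m=" is the port
            line = " ".join(fields)
        out.append(line)
    return "\r\n".join(out) + "\r\n"


private_sdp = "\r\n".join([
    "v=0",
    "o=4084003020 26235624 26235644 IN IP4 192.168.10.10",
    "c=IN IP4 192.168.10.10",
    "t=0 0",
    "m=audio 8000 RTP/AVP 0",
    "a=rtpmap:0 pcmu/8000",
]) + "\r\n"

anchored = anchor_media(private_sdp, "77.244.244.236", 45572)
```

After the rewrite, the other user agent sends its media to the SBC's address and port instead of the unroutable private IP.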
NOTE Symmetric RTP/RTCP means that the IP address and port number used for outbound RTP/
RTCP are reused for the inbound RTP/RTCP. An SBC learns the outbound IP/port pair of
RTP/RTCP when receiving initial outbound media packets and passes inbound media to the
same IP/port pair. Most CPE supports this feature.
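The learning step described in this Note can be sketched as follows. It is a simplified model with hypothetical names and sample addresses; it latches onto the first outbound packet only and leaves out the checks a real SBC would apply before trusting a source address.

```python
from typing import Optional, Tuple

Address = Tuple[str, int]  # (IP address, port)


class MediaLatch:
    """Learn where an endpoint behind a NAT really sends media from."""

    def __init__(self) -> None:
        self.learned: Optional[Address] = None

    def on_outbound_packet(self, source: Address) -> None:
        # Learn the NATed IP/port pair from the initial outbound media packet.
        if self.learned is None:
            self.learned = source

    def inbound_destination(self) -> Optional[Address]:
        # Inbound media for this endpoint is sent to the learned pair.
        return self.learned


latch = MediaLatch()
before = latch.inbound_destination()
latch.on_outbound_packet(("203.0.113.7", 40000))
after = latch.inbound_destination()
```

Until the first outbound packet arrives, the SBC has no usable destination for inbound media, which is why this approach depends on the endpoint supporting symmetric RTP/RTCP.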
Because anchoring media requires high bandwidth and processing power, it is not recom-
mended for regular VoIP servers (for example, SIP B2BUA) to support this feature. A ded-
icated media server like an SBC is recommended.
The next topic is another important function of SBC, lawful interception, especially for
government security purposes.
Lawful Interception
Lawful Interception (LI) or Communications Assistance for Law Enforcement Act
(CALEA) is a VoIP service provider’s duty to intercept call data (for example, call setup
messages) or call contents (for example, voice), and forward them to a law enforcement
agency according to a warrant. The detailed analysis of LI and its implementation with an
SBC is described in Part III, “Lawful Interception (CALEA),” of this book.
The reason for utilizing an SBC for the interception is that it can see most of the signals and
media going back and forth among CPE and VoIP servers as an access device. Figure 8-8
shows the functional architecture of LI and the location of SBC, which is based on the
Telecommunications Industry Association (TIA) J-STD specification.
SBC should have the following functions as a part of Access Function (AF):
• Interface with Service Provider Administrative Function (SPAF) and receive the target
information such as a phone number, start time, or end time.
• Intercept the target call information in accordance with the request from SPAF.
• Interface with Delivery Function (DF) and forward the call data and/or content.
• Provide the transparency of interception; in other words, the target user is not
supposed to recognize any difference when being intercepted.
[Figure 8-8: Functional architecture of LI, showing the CPE, the Service Provider Administrative Function (SPAF), and the Service Provider Delivery Function (DF), which receives call data and call content.]
NOTE All LI functions use an initial capital letter for each function’s name followed by the
letter “F,” such as “DF” for “Delivery Function,” because these names are defined by LI
specifications and not as general terms.
For the details of LI, see Chapter 10, “Lawful Interception Fundamentals.”
In this section so far, you have learned about the major functions of SBC, especially from
the security perspective. The next section covers other important functions of SBC that you
need to know.
Other Functions
Besides the security-related functions described in previous sections, an SBC has many
other functions for resolving border issues. Other critical functions are introduced in this
section, which are related to secure service in a sense.
Protocol Conversion
An SBC is located at the border of a core network facing multiple different domains, such
as other service providers (peering partners). There is always an interoperability issue when
communicating with peering partners that use either a different protocol or the same
protocol but a different method of implementation. Here are some examples:
• The core network uses the SIP protocol, but a peering partner A uses H.323.
• The core network uses SIP INVITE with SDP (early media), but a peering partner B
does not support it.
• The core network uses SIP INFO for dual-tone multifrequency (DTMF) transmission,
but a peering partner C supports only RFC 2833.
• The core network uses fax relay (T.38), but a peering partner D supports only fax
pass-through.
The traditional way of resolving those issues is adding more code to VoIP servers in order
to support different protocols, which takes time and delays production service. An SBC can
be the solution, converting protocols at the border without changing core VoIP servers
(or making only minor changes). In fact, most current SBC products in the market support
these features.
Figure 8-9 shows an example of converting protocols between SIP and H.323, which is a
typical feature of SBC.
Converting protocols is not always as clear as the example in Figure 8-9 because each
protocol may have complicated call flows that make it very hard to match one to one.
Transcoding
The codec to be used for a call is negotiated during call setup. The method of
negotiating differs slightly depending on the protocol, but the
basic method is offer-and-answer: a call originator offers a list of codecs and a responder
picks one that it supports. Session Description Protocol (SDP) in particular uses this
offer-and-answer model as defined in RFC 3264.
For example, if the core network supports four codecs with priority (G.711u, G.711a, G.729,
and iLBC) and a peering partner supports three codecs (G.729, iLBC, and G.723.1), G.729
will be picked as a final codec.
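The example above can be expressed as a short sketch. It simplifies the offer-and-answer model to "pick the first offered codec that the responder supports"; the function name is hypothetical.

```python
def negotiate(offer, supported):
    """Return the first codec in the offer (highest priority first)
    that the responder also supports, or None if there is no match."""
    for codec in offer:
        if codec in supported:
            return codec
    return None


core_offer = ["G.711u", "G.711a", "G.729", "iLBC"]   # in priority order
partner_codecs = {"G.729", "iLBC", "G.723.1"}

chosen = negotiate(core_offer, partner_codecs)
```

G.711u and G.711a are skipped because the partner does not support them, so G.729 is the first common codec in the offer's priority order.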
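[Figure 8-9: Protocol conversion between SIP and H.323 by an SBC. SIP side: INVITE, 100 Trying, 183 Progress, 180 Ringing, 200 OK, ACK. H.323 side: ARQ, ACF, CALL SETUP, CALL PROCEEDING, ALERTING, CONNECT. RTP/RTCP media follows the call setup.]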
However, codec mismatching also may happen between different domains, which requires
a mediation device transcoding them. An SBC is in the right location and can provide this
feature.
Transcoding is required not only for the mismatching but also for bandwidth control.
Suppose that, for example, a certain access network has limited bandwidth and cannot
guarantee Quality of Service (QoS) when using a default codec, G.711 (64 Kbps). An SBC
can resolve this issue by reordering a codec list and forcing it to use a low bit-rate codec
like G.729 (8 Kbps).
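Reordering a codec list for bandwidth control can be sketched as a stable sort by bit rate. Only the two bit rates given in the text are used; the table and function names are hypothetical.

```python
# Nominal bit rates in Kbps, as given in the text.
NOMINAL_KBPS = {"G.711": 64, "G.729": 8}


def prefer_low_bitrate(codecs):
    """Reorder a codec list so that lower bit-rate codecs come first.
    sorted() is stable, so codecs with equal rates keep their relative order."""
    return sorted(codecs, key=lambda codec: NOMINAL_KBPS[codec])


offer = ["G.711", "G.729"]          # default priority from the endpoint
reordered = prefer_low_bitrate(offer)
```

An SBC that rewrites the offer this way steers the negotiation toward G.729 without removing G.711 as a fallback.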
Number Translation
Each peering partner may use a different format of phone number, which creates another
interoperability issue. For example, the core network uses U.S. standard dialing format (for
example, 14085556666 for domestic, 0118251864489 for international), but a peering
partner A requires E.164 format (for example, +14085556666, +8251864489) and a
peering partner B requires another format (for example, 4085556666, 8251864489). In
the SIP protocol, these numbers are located in the Request-URI, From, To headers as in the
following example. The other party not supporting this format will have a parsing error and
return an error message.
INVITE sip:[email protected] SIP/2.0
From: 14084003020 <sip:[email protected]>;tag=498560566
To: <sip: [email protected]>
It is possible for a VoIP server (for example, Softswitch) to apply different translation
rules to each peering partner when routing the call. However, it is more efficient for
VoIP servers to use a unified format in the core network and for an SBC to apply different rules
at the peer interface, which gives the following benefits:
• Reduces complexity of handling multiple formats in core network.
• Saves processing resources in a core routing engine.
• Updating the translation rule does not affect the core routing engine.
• Makes it easier to apply different rules because an SBC generally has a separate
interface (logically or physically) with each peering partner.
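The per-partner translation rules can be sketched with the numbers used in this section. The function names are hypothetical, and the rules cover only the two core-network formats shown (11-digit domestic numbers starting with 1 and international numbers starting with 011).

```python
def to_e164(core_number: str) -> str:
    """Core format to E.164, as required by peering partner A."""
    if core_number.startswith("011"):                 # international prefix
        return "+" + core_number[3:]
    if core_number.startswith("1") and len(core_number) == 11:
        return "+" + core_number                      # domestic
    raise ValueError(f"unexpected core format: {core_number}")


def to_partner_b(core_number: str) -> str:
    """Core format to the format required by peering partner B."""
    if core_number.startswith("011"):
        return core_number[3:]
    if core_number.startswith("1") and len(core_number) == 11:
        return core_number[1:]
    raise ValueError(f"unexpected core format: {core_number}")
```

For example, `to_e164("0118251864489")` returns `+8251864489`, and `to_partner_b("14085556666")` returns `4085556666`. Because the rules live at the peer interface, the core network keeps a single number format.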
QoS Marking
One of the well-known techniques for providing QoS in VoIP networks is marking the type of
service (ToS) byte in the IP header so that VoIP packets receive preferential treatment. The
six most significant bits of the ToS byte, called the Differentiated Services Code Point
(DSCP), can be set to indicate the priority of a packet. This is a simple and efficient method
of packet classification as long as the network nodes recognize the marking and act on it
(for example, by using a priority queue).
If an SBC at the border can mark the bits and send them to the access network, it is a big
benefit especially for enterprise customers who have their own voice and data network. It
is also beneficial for peer and core networks if their network nodes handle both real-time
and non-real-time (for example, email) traffic at the same time.
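Because the DSCP occupies the six most significant bits of the ToS byte, the byte value is the DSCP shifted left by two bits. The following sketch shows that arithmetic and one way an application could mark its own packets. It assumes a POSIX-like operating system that honors the IP_TOS socket option, and it uses Expedited Forwarding (DSCP 46), a value commonly chosen for voice media. The function names are illustrative.

```python
import socket

EF = 46  # Expedited Forwarding, a DSCP value commonly used for voice media


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in six bits (0-63)")
    return dscp << 2


def open_marked_udp_socket(dscp: int = EF) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Assumption: the platform supports and honors IP_TOS for this socket.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

With EF, the resulting ToS byte is 184 (0xB8). Marking helps only if the routers and switches along the path are configured to trust and prioritize that value.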
Now that you are aware of the key functions of SBC, the next section looks at how to design
secure service architecture with SBC.
• Network connectivity
• Service analysis
• Virtualization
• Optimization
NOTE Some SBC products may not have enough features or interfaces to implement all of the
guidelines that this section covers.
High Availability
In the early days of VoIP service, people believed that it was much less stable than legacy
PSTN service, which generally provides five 9s (99.999 percent availability), and they
hesitated to use it despite its many advantages. It might be unfair to compare VoIP with a
legacy system that has been stabilized for more than 100 years, but people are already used
to that system and expect the same high level of availability. Techniques for providing high
availability in VoIP have developed quickly, and some of them have recently demonstrated
carrier-grade high availability.
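To put five 9s in concrete terms, the allowed downtime per year can be computed directly from the availability percentage. This is a simple worked calculation, assuming a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


five_nines = allowed_downtime_minutes(99.999)   # about 5.26 minutes per year
three_nines = allowed_downtime_minutes(99.9)    # about 525.6 minutes per year
```

Five 9s therefore leaves a budget of roughly five minutes of outage per year, which is why automatic and fast failover is central to the capabilities that follow.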
SBC products providing carrier-grade high availability should have the following
capabilities:
• Detect critical failure in the primary system and switch over to the backup (or
secondary) system automatically
• Maintain current call sessions (for example, session timer, call state, and media state)
in the backup system, as well as not losing them in the event of a primary system
failure
• Preserve call detail records (CDR) even after the switchover
• Minimize the failover time (less than 60 ms)
• Should not drop new call requests while switching over
• Notify any failure of primary or backup system to a system administrator in real time
(for example, using SNMP trap)
• Recover the primary system failure automatically (for example, auto reboot) after
handing over its role to the backup system
• Should not affect network topology (for example, no change of IP address) while
switching over
There are two different models for high availability: Active-Standby and Active-Active.
Both have pros and cons, and can be chosen according to service model, policy, network
capacity plan or Service Level Agreement (SLA), and so on.
Active-Standby
Active-Standby is a common model in which the primary system is active and the backup
is standby; that is, only the primary handles all signals and media. The backup has to be
fully synchronized with the primary in real time and maintain all call information in order
to provide seamless backup service. Figure 8-10 illustrates the service model.
[Figure 8-10: Active-Standby model, showing an Active SBC and a Standby SBC connected by a Sync link]
Note that the failover is transparent: phones that have active sessions or are making new
calls cannot detect it.
In addition to the 1:1 (Active:Standby) mapping shown in Figure 8-10, n:1 mapping
(multiple active nodes and a single standby node) is possible, depending on the service model.
The Active-Standby model requires specific features and network connectivity between the
pair of SBCs. Here are the characteristics:
• The pair of SBCs should use the same IP address and VLAN.
This arrangement makes the IP connection transparent to endpoints. Using a shared
“virtual” IP address is recommended.
• The pair of SBCs should use the same Media Access Control (MAC) address.
It is possible to provide transparency even with different MAC addresses, but using
the same “virtual” MAC address can minimize the failover time.
• There should be a dedicated physical connection between the pair of SBCs.
This is not only for checking the heartbeat, but also for synchronizing current call
information.
• The pair of SBCs should have the same capacity (bandwidth and system resources).
• The pair of SBCs should have the same configuration.
The advantage of this model is, as mentioned, that it provides seamless and transparent
service even after failover. The disadvantage is that it wastes system resources: the
standby system just waits without taking any calls even when the active system is
overloaded; that is, the standby is not involved in load balancing. Also, real-time
synchronization and heartbeat monitoring require additional resources.
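The interplay of state synchronization and heartbeat detection in the Active-Standby model can be sketched as follows. This is a simplified, single-process illustration and not a vendor implementation. The class names and the threshold of three missed heartbeats are assumptions made for the example, and the takeover of the shared virtual IP and MAC addresses is represented only by swapping roles.

```python
from dataclasses import dataclass, field


@dataclass
class SbcNode:
    name: str
    sessions: dict = field(default_factory=dict)  # call ID -> call state
    active: bool = False


class ActiveStandbyPair:
    def __init__(self, primary: SbcNode, backup: SbcNode, max_missed: int = 3):
        self.active, self.standby = primary, backup
        primary.active = True
        self.max_missed = max_missed  # assumed threshold, not from the text
        self.missed = 0

    def sync(self, call_id: str, state: str) -> None:
        """Replicate call state to the standby as calls change (HA link)."""
        self.active.sessions[call_id] = state
        self.standby.sessions[call_id] = state

    def heartbeat(self, received: bool) -> bool:
        """Record one heartbeat interval; return True if a switchover occurred."""
        self.missed = 0 if received else self.missed + 1
        if self.missed >= self.max_missed:
            self._switch_over()
            return True
        return False

    def _switch_over(self) -> None:
        # The standby takes the active role and already holds the synced sessions.
        self.active.active = False
        self.active, self.standby = self.standby, self.active
        self.active.active = True
        self.missed = 0
```

Because every state change is replicated before a failure happens, the new active node can continue existing calls, which is what makes the failover transparent to the phones.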
Active-Active
Active-Active is another way of providing high availability by deploying multiple
(typically, two) SBCs in parallel. The SBCs in the group are always active and have the same
configuration with different network addresses, which means that endpoints are able to
access any one of them. Whenever a failure happens on an SBC, the others take over the
service. The endpoints choose the alternative based on their initial configuration. Figure 8-11
illustrates the service model.
The characteristics compared with the Active-Standby model are summarized as follows:
• It optimizes network and system resources because no standby node exists.
• The SBCs in the group do not have to have the same capacity.
• Current call sessions in the failed node will be dropped, and also new call attempts to
this node will be rejected.
• The endpoints decide the alternative SBC when the current connection fails.
• There is no synchronization of call state between SBC nodes.
Despite the benefit of optimizing resources, the Active-Active model is not commonly used
for large-scale VoIP service because a node failure drops a large number of calls.
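In the Active-Active model, the choice of alternative SBC is made by the endpoint. A minimal sketch of that endpoint-side logic follows, assuming the endpoint is provisioned with an ordered list of SBC addresses and has some way to probe reachability (for example, a SIP OPTIONS ping). The function name, the probe callback, and the addresses are hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_sbc(configured: Sequence[str],
             is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable SBC from the endpoint's configured list."""
    for address in configured:
        if is_reachable(address):
            return address
    return None  # no SBC in the group is reachable


# Documentation addresses used as placeholders.
configured = ["192.0.2.10", "192.0.2.20"]
failed_nodes = {"192.0.2.10"}
chosen = pick_sbc(configured, lambda addr: addr not in failed_nodes)
```

Nothing in this logic restores the calls that were in progress on the failed node, which reflects the lack of call state synchronization noted in the list above.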
The next section shows how to design network connectivity to provide secure VoIP service.
Network Connectivity
The question of
• A pair of SBCs is located at the border of the access and peer networks.
• The Active-Standby model is used for high availability.
• Each SBC has two physical network interfaces for VoIP service.
• Each SBC has the capability of setting up a VLAN on each interface.
• Each SBC has two additional interfaces; one for management and the other for
synchronization (or heartbeat).
• Layer 2 switching and Layer 3 routing infrastructure are ready to use.
Figure 8-12 illustrates network Layer 2 and 3 connectivity (with high availability [HA])
with a pair of SBCs, focusing on high availability and efficient traffic control.
[Figure 8-12: Layer 2 and Layer 3 connectivity with HA; labels: L3 Router (VRRP or HSRP redundancy), L2 Switch, HA Link, Management Network]
5 During the switchover, existing sessions are not interrupted because the MAC and IP
addresses are still alive to the upstream router.
6 The pair of Layer 3 routers provides a fault-tolerant default gateway by means of a
redundancy protocol, such as VRRP or Hot Standby Router Protocol (HSRP, Cisco
proprietary).
7 There is an HA link between the Active and Standby SBCs for synchronizing state and
detecting heartbeat.
8 There is another Ethernet port for management, which goes to a dedicated
management network for monitoring, troubleshooting, maintaining, and so on.
NOTE As defined in RFC 3768,2 Virtual Router Redundancy Protocol (VRRP) specifies an
election protocol that dynamically assigns responsibility for a virtual router to one of the
VRRP routers on a LAN. The VRRP router controlling the IP address associated with a
virtual router is called the Master, and forwards packets sent to these IP addresses. The
election process provides dynamic failover in the forwarding responsibility should the
Master become unavailable. This allows any of the virtual router IP addresses on the LAN
to be used as the default first-hop router by end-hosts. The advantage gained from using
VRRP is a higher-availability default path without requiring configuration of dynamic
routing or router discovery protocols on every end-host.
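The outcome of the election described in the Note can be sketched as follows. In VRRP, the router with the highest priority becomes Master, and a tie is broken in favor of the higher primary IP address. The real protocol reaches this result through periodic advertisements and timers among the routers; this sketch computes only the result from a known list, and the class and function names are illustrative.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VrrpRouter:
    primary_ip: str
    priority: int      # 1-254 for backup routers, 255 for the address owner
    alive: bool = True


def elect_master(routers: Iterable[VrrpRouter]) -> Optional[VrrpRouter]:
    """Pick the Master: highest priority wins, higher IP address breaks ties."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (r.priority, int(ipaddress.IPv4Address(r.primary_ip))),
    )
```

When the current Master stops advertising, running the same selection over the remaining routers yields the new Master, so end-hosts keep using the same virtual router address as their default gateway.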
Note that, in this design, there is no fixed active or standby node even though the name is
assigned in the figure for simplicity. Whichever takes current calls is an Active SBC and the
other is a Standby. It switches after a failover.
The next section provides
You can break down the policies in Table 8-1, and create a detailed policy group; for
example, each VoIP protocol can have its own signal policy.
Other policies, such as authentication, call admission control, recovery, license, number
translation, or protocol interworking, may be defined depending on the scope of service.
These policies are applied to each logical interface on an SBC with different attribute
values according to the target service plan. The logical interfaces are segregated by many
factors, such as the following examples:
• Type of network (core vs. access/peer network)
• Type of access network (managed vs. unmanaged)
• Type of peering network (IP handoff vs. PSTN termination)
• VoIP protocol (SIP vs. H.323)
• Network bandwidth (T1 vs. T3)
• Type of media (voice vs. video)
• Target QoS (low vs. high quality)
• Type of core network element (Softswitch vs. media gateway)
Here are examples of applying policies to the logical interface:
Interface A:
— Facing “unmanaged” access network with limited bandwidth
Policy A:
• Low trust level (the access network is relatively not secure)
Virtualization
Virtualizing an SBC is a technique of dividing a single SBC into multiple virtual (or logical)
SBCs, named Virtual SBCs (VSBC), in order to segregate traffic among the different services
used in a VoIP network. It simplifies SBC management and makes it more efficient to apply
different policies.
Each VSBC consists of two logical interfaces; one for the core network and the other for
access (or peer) network. Figure 8-13 shows the example of virtualization with multiple
VSBCs in an SBC. Note that this design approach may not be applicable to certain SBC
products.
[Figure 8-13: Virtualization with multiple VSBCs in an SBC. Softswitch 1 and Softswitch 2 are
in the core network; five VSBCs face the access network:
IP1: Managed, SIP (UDP)
IP2: Unmanaged, SIP (TLS)
IP3: Managed, H.323 (UDP)
IP4: Unmanaged, H.323 (TLS)
IP5: Managed, MGCP (UDP)]
In Figure 8-13, there are five different VSBCs with two interfaces for access and core
network, which looks as if there are five different SBCs. Each VSBC handles a different
protocol, different type of CPE, and different security option. For example, whenever a
CPE connects to IP1, the traffic (signal and media) goes to Softswitch1 through VSBC1,
after being manipulated based on the policy.
Protocol conversion is required if a VSBC has two different protocol interfaces; for example,
if a call goes to IP3 in Figure 8-13, it is converted between SIP and H.323.
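The VSBC arrangement in Figure 8-13 can be modeled as a lookup table from access-side address to virtual SBC. The sketch below is illustrative only. The IP1 to Softswitch1 mapping comes from the text, but the core targets of the other VSBCs and the use of SIP on the core side are assumptions made for the example, and IP1 through IP5 are the placeholder labels from the figure rather than real addresses.

```python
from dataclasses import dataclass

CORE_PROTOCOL = "SIP"  # assumption: the core side of every VSBC speaks SIP


@dataclass(frozen=True)
class Vsbc:
    name: str
    access_ip: str    # placeholder label from Figure 8-13
    protocol: str     # protocol on the access-facing interface
    transport: str
    managed: bool     # managed vs. unmanaged access network
    core_target: str


VSBCS = {v.access_ip: v for v in [
    Vsbc("VSBC1", "IP1", "SIP", "UDP", True, "Softswitch1"),
    Vsbc("VSBC2", "IP2", "SIP", "TLS", False, "Softswitch1"),    # target assumed
    Vsbc("VSBC3", "IP3", "H.323", "UDP", True, "Softswitch2"),   # target assumed
    Vsbc("VSBC4", "IP4", "H.323", "TLS", False, "Softswitch2"),  # target assumed
    Vsbc("VSBC5", "IP5", "MGCP", "UDP", True, "Softswitch2"),    # target assumed
]}


def route(access_ip: str) -> Vsbc:
    """Select the VSBC (and its policy context) by the address a CPE connected to."""
    return VSBCS[access_ip]


def needs_protocol_conversion(vsbc: Vsbc) -> bool:
    """True when the access-side and core-side protocols differ."""
    return vsbc.protocol != CORE_PROTOCOL
```

Keying everything on the access-side address is what lets each VSBC carry its own protocol, trust level, and security options without affecting the others.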
This design is applied to the peering network as well, as shown in Figure 8-14.
The method of virtualization makes it much easier for you to design the service architecture,
especially globalized VoIP service with multiprotocol, multivendors, and multiservers.
The next section shows how to optimize and secure traffic flow with SBC.
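[Figure 8-14: Virtualization toward the peer network. Softswitch 1 and Softswitch 2 are in the core network; VSBCs in the SBC face the peer network (Proxy, Gatekeeper) over SIP (UDP), SIP (TCP), and H.323 (TCP)]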
Deployment Location
First of all, think about the physical location of an SBC. In principle, the physical location
might not be significant because an SBC could be anywhere in the world as long as it has
stable IP connectivity. In general, however, latency grows with physical distance
because packets pass through more network nodes when the distance is
longer. Higher latency also tends to come with packet loss and jitter.
Figure 8-15 illustrates an example of a poor deployment that does not consider latency:
User A in Los Angeles makes a call to a local pizza store through an SBC located at a data
center in New York.
[Figure 8-15: Deployment location example; labels: User in LA, SBC in LA, SBC in NY]
Therefore, one of the following approaches is recommended to minimize latency:
• Configure each CPE to connect to the nearest SBC when provisioning it (static assignment).
• Have all CPE connect initially to a redirect server and receive the IP address and port of the
corresponding SBC (it is possible to recognize the physical location of a CPE based on
its IP subnet).
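The redirect-server approach can be sketched as a subnet lookup. This is a hedged example. The subnets are documentation address ranges, the host names and the default choice are hypothetical, and a production redirect server would answer with a SIP redirect response rather than a Python tuple.

```python
import ipaddress

# Hypothetical mapping of CPE subnets to the nearest SBC (host, port).
REGION_TABLE = [
    (ipaddress.ip_network("198.51.100.0/24"), ("sbc-la.example.net", 5060)),
    (ipaddress.ip_network("203.0.113.0/24"), ("sbc-ny.example.net", 5060)),
]
DEFAULT_SBC = ("sbc-ny.example.net", 5060)  # assumed fallback for unknown subnets


def nearest_sbc(cpe_ip: str) -> tuple:
    """Return the SBC assigned to the subnet that contains the CPE address."""
    addr = ipaddress.ip_address(cpe_ip)
    for network, sbc in REGION_TABLE:
        if addr in network:
            return sbc
    return DEFAULT_SBC
```

Compared with static assignment, the lookup keeps the location logic in one place, so moving a subnet to a different SBC does not require reprovisioning each CPE.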
Media Control
The signals always go through either access SBC or peer SBC providing topology hiding,
but the media is either anchored (relayed) or released depending on the optimization of
service architecture. It may waste network and system resources if an SBC always anchors
media (unconditionally). Figure 8-16 is an example of unnecessary media anchoring; User
A makes a call to User B in the same company through access SBC.
In Figure 8-16, because both parties are in the same company network, a direct
point-to-point media connection raises no issue; in fact, it gives better quality of service
without any security concerns. How does the SBC know that both are in the same network? It
can determine this from either the same IP subnet or the same group of phone numbers.
Peer SBC also has the same mechanism of releasing media. For example, if a call session
comes from and goes to the same peering network, peer SBC may release the media
depending on the service agreement with the peering partner.
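The decision to release media can be sketched with the two criteria named above, the same IP subnet or the same group of phone numbers. This is an illustration under stated assumptions: the /24 prefix length, the representation of a number group as a shared dialing prefix, and the function names are all choices made for the example.

```python
import ipaddress
from typing import Sequence


def same_subnet(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """True when both addresses fall within the same subnet of the given size."""
    network = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in network


def same_number_group(num_a: str, num_b: str,
                      group_prefixes: Sequence[str]) -> bool:
    """True when both numbers share one of the configured group prefixes."""
    return any(num_a.startswith(p) and num_b.startswith(p)
               for p in group_prefixes)


def should_release_media(ip_a: str, ip_b: str, num_a: str, num_b: str,
                         group_prefixes: Sequence[str]) -> bool:
    """Release media (let it flow end-to-end) when both parties are local to each other."""
    return (same_subnet(ip_a, ip_b)
            or same_number_group(num_a, num_b, group_prefixes))
```

When the function returns False, the SBC keeps anchoring the media so that topology hiding and media-level controls remain in place.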
[Figure 8-16: Unnecessary media anchoring; labels: Softswitch, Access SBC]
There can be four different models of media traversal with access and peer SBC:
• Media anchoring in both access and peer SBC—This is a typical service model
that provides high-security architecture with topology hiding. Both access and peer
network cannot see each other’s network, not to mention the core network. Figure 8-17
illustrates the model with simple signal (SIP) and media traversal.
Keep in mind that the access and peer SBCs are logical entities of an SBC, as
mentioned before, so they can be within the same physical SBC or a separate SBC.
Another benefit of this model is that it gives full control of signal and media so that
an SBC can manipulate them, such as transcoding, QoS marking, DTMF interworking
or RTCP control, and so on. Also, this model provides an easier interface for lawful
interception (CALEA).
However, the downside is that it consumes relatively high system resources (for
example, CPU, memory) and network bandwidth, especially in a core network. It
requires cautious routing policy and location design of both SBCs; otherwise, it may
cause a serious latency issue.
[Figure 8-17: Media anchoring in both access and peer SBC; signal and media paths across the Access Network, Access SBC, Core Network, Peer SBC, and Peer Network (MG, PSTN)]
• Media anchoring only in access SBC—This model is generally used where the
service provider has its own termination network (for example, managing time-
division multiplexing [TDM] media gateways), or the peering partner does not require
“strict routing.” The strict routing means that inbound/outbound traffic (signal and
media) has to come and go through a fixed IP address. In other words, the source
and destination IP address between peer and core network is already fixed, that is, the
service provider cannot terminate a call with a different source IP address. Some
peering providers require this strict routing for security and performance purposes.
Figure 8-18 illustrates the model. Stable and secure IP connectivity between core and
peer network is necessary in this model.
[Figure 8-18: Media anchoring only in access SBC; signal and media paths across the Access Network, Access SBC, Core Network, Peer SBC, and Peer Network (MG, PSTN)]
This model consumes fewer resources than the full anchoring model and provides the
same security level to the access network that is most vulnerable. It is easier to design
the routing architecture, especially to the peering network.
However, it exposes part of the core network topology to the peering partner. It can
be vulnerable to malicious attacks coming from the peering network, such as
malformed messages or DoS flooding. So, it is necessary to confirm that the peering
partner maintains a sufficiently secure network.
• Media anchoring only in peer SBC—This model is not commonly used, but could
be used where the service provider manages its own access network and controls CPE,
assuming that all traffic from them is secure enough. Figure 8-19 shows media
traversal only through peer SBC.
[Figure 8-19: Media anchoring only in peer SBC; signal and media paths across the Access Network, Access SBC, Core Network, Peer SBC, and Peer Network (MG, PSTN)]
This model reduces the usage of resources and provides high security protection to the
peering network.
However, it has a potential security issue because the topology of the core network is
exposed to end users. It is very complicated to provide the feature of lawful interception
(CALEA) in this model.
• No media anchoring—In this model, the media channel is opened end-to-end
without going through an SBC. It is not commonly used, but could be used where the
service provider manages both access and termination network, or carries only signals
because of limited bandwidth or system resource.
This model saves significant resources, but exposes a variety of security issues related
to media and also restricts QoS control.
In this section, you have learned how to design secure VoIP service architecture with SBC,
in terms of high availability, network connectivity, service policy analysis, virtualization,
and optimization of
Summary
An SBC is a controlling device located in a logical boundary of a VoIP network in order to
resolve border issues, such as DoS, call flooding, exposed network topology, traversing
NAT/Firewall, codec conflict, lawful interception, QoS, and so on.
VoIP service providers can deploy two logically separated SBCs—access and peer SBC—
in their service network depending on where the SBC is located.
The access SBC is located on the border between the service provider’s core network and
access network in order to deal with the border issues. Because the VoIP traffic comes from
an unmanaged public network, the access SBC should have a strict policy for DoS attacks,
flooded calls, malformed messages, and spoofed calls. Moreover, it should have the
capability to apply the policy to individual users/devices without affecting other devices.
The peer SBC is located on the border between the service provider’s core network and peer
network in order to deal with border issues. Because the traffic comes from a relatively safe
network, typically through a VoIP trunk, it does not require strict policy as much as access
SBCs do. It should have the capability to apply different policies to different peer networks.
The primary function of SBC is network topology hiding; it encapsulates the core network
and provides a single logical interface for external networks. The external endpoints can see
only the IP address and port of SBC rather than actual VoIP servers, and the SBC routes the
call to the corresponding server based on ToS, policy, protocol, and so on.
For DoS protection, SBC provides the function of policy-driven access control by
categorizing VoIP traffic (or endpoints) as white, black, or gray. The category is judged
based on authentication, number of messages per second, number of call
attempts per second, number of invalid (or malformed) messages, maximum bandwidth
consumption per call, and so on.
The typical way to prevent overload with SBC is reducing redundant or unnecessary signals
by controlling the frequency of messages, such as registration timer control or ping timer
control. SBC also can distribute the load to multiple targets (for example, VoIP servers)
based on the policy.
SBC also provides the function of NAT traversal by replacing the private IP address of an
endpoint with either the SBC’s IP address or the NATed IP address.
Lawful interception is another function of SBC because, as an access device, it sees most of
the signals and media going back and forth among endpoints and VoIP servers.
Other functions of SBC are protocol conversion, transcoding, number translation, QoS
marking, and so on.
When designing VoIP service architecture with an SBC, you should consider high availability,
network connectivity, service policy, virtualization, and deployment location for secure and
optimized service.
End Notes
1 RFC 3261, “SIP: Session Initiation Protocol,” J. Rosenberg, H. Schulzrinne,
G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler,
June 2002.
2 RFC 3768, “Virtual Router Redundancy Protocol (VRRP),” R. Hinden, ed.,
April 2004.
References
draft-ietf-sipping-sbc-funcs-05.txt, “Requirements from SIP Session Border Control
Deployments,” G. Camarillo, R. Penfield, A. Hawrylyshen, M. Bhatia, March 2008.
RFC 3264, “An Offer/Answer Model with the Session Description Protocol (SDP),”
J. Rosenberg, H. Schulzrinne, June 2002.
RFC 5128, “State of Peer-to-Peer (P2P) Communication Across Network Address
Translators,” P. Srisuresh, B. Ford, D. Kegel, March 2008.
This chapter covers the methodology of protection with the following network devices in
enterprise VoIP networks:
• Firewall
• Cisco Unified Communications Manager (Unified CM)
• Cisco Unified Communications Manager Express (Unified CME)
• Access device: IP phone
• Access device: Multilayer switch
CHAPTER 9
Protection with Enterprise Network Devices
Most network devices in an enterprise VoIP network have their own security features. Some
devices (for example, a firewall) have strong capability with sophisticated features, and
some devices (for example, an IP phone) have very limited capability.
As mentioned in previous chapters, there is no magic bullet—a single device or architecture
that can protect the whole VoIP service network securely. The best practice is analyzing
current vulnerability and applying a consolidated solution that includes all possible
network devices and architectures.
This chapter demonstrates how to protect the enterprise VoIP network with the following
devices. For practical information, the examples use products from Cisco Systems,
which are widely deployed around the world.
• Firewall
• Cisco Unified Communications Manager (Unified CM, formerly CallManager)
• Cisco Unified Communications Manager Express (Unified CME, formerly
CallManager Express)
• Access device: IP phone
• Access device: Multilayer switch
Rather than specifying every security feature of those products, the chapter focuses on key
features and their usage. The content refers to Cisco Unified Communications SRND.1
Firewall
There are two types of firewall in terms of capability of recognizing VoIP protocols: legacy
and VoIP-aware firewalls. The legacy firewall handles packets only in the network and
transport layers, and does not care what protocol is going through in the application layer.
However, the VoIP-aware firewall has additional capability to inspect and manipulate VoIP
packets in the application layer for secure service. This section describes the function and
usage of the VoIP-aware firewall (shortening the name to just “firewall”).
An Access Control List (ACL) is a primary method of a firewall to protect VoIP servers and
media gateways from external devices that are not supposed to communicate with them.
Using ACL for VoIP traffic is not simple because the media ports used by entities change
dynamically based on the call setup. You may use a static configuration, such as a certain
range being always opened or blocked, but it creates a potential vulnerability.
In general, an endpoint and a server use the client/server model for call setup signaling,
and the media channel between endpoints is established directly; that is, end-to-end.
If the call signaling does not go through the firewall, the media stream cannot pass through
it because the firewall does not know which media ports need to be opened.
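The reason the firewall must see the signaling can be made concrete: the media address and port are carried inside the SDP body of the call setup messages. The following sketch extracts them in the way a VoIP-aware firewall conceptually would before opening a temporary pinhole. It is a simplified parser that assumes IPv4 and well-formed SDP; the function name is illustrative and the code does not represent how any Cisco firewall is implemented.

```python
def media_pinholes(sdp: str) -> list:
    """Return the (ip, port) pairs that an SDP body advertises for media."""
    session_ip = None
    media = []  # each entry is [ip, port]
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            ip = line.split()[2]
            if media:
                media[-1][0] = ip   # a media-level c= line overrides the session level
            else:
                session_ip = ip     # session-level connection address
        elif line.startswith("m="):
            # Layout: m=<media> <port> <proto> <fmt> ...
            port = int(line.split()[1])
            media.append([session_ip, port])
    return [(ip, port) for ip, port in media]


sdp_body = (
    "v=0\r\n"
    "o=- 1 1 IN IP4 192.0.2.10\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0 18\r\n"
)
pinholes = media_pinholes(sdp_body)
```

A firewall that never sees this body has nothing to derive the ports from, which is why a static range would have to be left open instead.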
Because a firewall handles a large amount of traffic by nature, capabilities and performance
need to be taken into account. Performance includes the amount of latency, which the
firewall can increase if it is under high load or even under attack. The general rule in VoIP
deployment is to keep the CPU usage less than 60 percent for normal usage. If the CPU
usage goes up more than 60 percent, especially in sustained high usage, the quality of
service (QoS) will degrade and phones will start to unregister. When this happens, the
phones will attempt to reregister with a VoIP server, which increases the load on the firewall
even more.
There are many ways to deploy firewalls for secure networks. This section focuses on Cisco
ASA, PIX, and FWSM (see the following Note) in the Active-Standby mode in both routed
and transparent scenarios. Figure 9-1 illustrates the Active-Standby mode for redundancy
purposes.
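[Figure 9-1: Active-Standby firewall deployment. An Active Firewall and a Standby Firewall, connected by an HA link (state and failover cable), sit between the Outside Network (less protected) and the Inside Network (protected)]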
All of the Cisco firewalls can run in either multiple-context or single-context mode. In
single-context mode, the firewall is a single firewall that controls all traffic flowing through
it. In multiple-context mode, the firewalls can be turned into many virtual firewalls, which
is the same concept as a Virtual Session Border Controller (VSBC; see Chapter 8). Each
of these contexts or virtual firewalls has its own configurations and can be controlled by
different groups or administrators. Each time a new context is added to a firewall, it will
increase the load and memory requirements on the firewall.
Both the Cisco ASA and PIX firewalls operate in a different manner than the Cisco FWSM.
Within an ASA and PIX, as long as there is no ACL on a more trusted interface, all traffic
from that interface is trusted and allowed out to a less-trusted interface. When any ACL
is applied to the more trusted interface on an ASA/PIX, all other traffic is denied and the
firewall will then function very much like the FWSM.
Routed Mode
The ASA or PIX firewall in routed mode acts as a router between connected networks, and
each interface requires an IP address on a different subnet. In single-context mode, the
routed firewall supports Open Shortest Path First (OSPF) and Routing Information Protocol
(RIP) in passive mode. Multiple-context mode supports static routes only. ASA version 8.x
also supports Enhanced Interior Gateway Routing Protocol (EIGRP). Cisco recommends using the
advanced routing capabilities of the upstream and downstream routers instead of relying on
the security appliance for extensive routing needs.
The routed ASA or PIX supports QoS, Network Address Translation (NAT), and VPN
termination to the ASA, which are not supported in the transparent mode. Figure 9-1 shows
the logical placement of firewalls for both routed and transparent configurations in Active-
Standby mode. With the routed configuration, each interface on the ASA or PIX would have
an IP address.
Unlike with transparent mode, the device can be seen in the network and, because of that,
it can be a point of attack.
Placing a routed ASA or PIX in a network changes the network routing because some of
the routing can be done by the firewall. IP addresses must also be available for all the
interfaces on the firewall, so changing the IP addresses of the routers in the network might
also be required. If a routing protocol is to be allowed through the ASA or PIX firewall, an
ACL will have to be put on the inside (or most trusted) interface to allow that traffic to pass
to the outside (or less trusted) interface. That ACL must also define all other traffic that will
be allowed out of the most trusted interface.
Transparent Mode
The ASA or PIX firewall can be configured to be a Layer 2 firewall (also known as “bump
in the wire” or “stealth firewall”). In this configuration, the firewall does not have an IP
address (other than for management purposes), and all of the transactions are done at Layer 2
of the network.
Even though the firewall acts as a bridge, Layer 3 traffic cannot pass through the security
appliance unless you explicitly permit it with an extended access list. The only traffic
allowed without an access list is Address Resolution Protocol (ARP) traffic.
This configuration has the advantage that an attacker cannot see the firewall because it is
not doing any dynamic routing. Static routing is required to make the firewall work even in
transparent mode. This configuration also makes it easier to place the firewall into an
existing network because routing does not have to change for the firewall.
It also makes the firewall easier to manage and debug because it is not doing any routing
within the firewall. Because the firewall is not processing routing requests, the performance
of the firewall is usually somewhat higher with inspect commands and overall traffic than
the same firewall model and software that is doing routing.
However, you are unable to use NAT on the firewall. If you are going to pass data for
routing, you will also have to define the ACLs both inside and outside the firewall to allow
traffic, unlike with the same firewall in routed mode. Cisco Discovery Protocol (CDP)
traffic will not pass through the device even if it is defined. Each directly connected network
must be on the same subnet. You cannot share interfaces between contexts; if you plan on
running multiple-context mode, you will have to use additional interfaces. You must define
all non-IP traffic, such as routing protocols, with an ACL to allow that traffic through the
firewall. QoS is not supported in transparent mode. Multicast traffic can be allowed to go
through the firewall with an extended ACL, but it is not a multicast device. Also, the firewall
does not support VPN termination other than for the management interface.
If a routing protocol or RSVP is to be allowed through the ASA or PIX firewall, an ACL
will have to be put on the inside (or most trusted) interface to allow that traffic to pass to
the outside (or lesser trusted) interfaces. That ACL must also define all other traffic that will
be allowed out of the most trusted interface.
NOTE Application Layer Gateways (or Application Level Gateways) are, as defined in RFC
2663,2 application-specific translation agents that allow an application on a host in one
address realm to connect to its counterpart running on a host in a different realm
transparently. An ALG may interact with NAT to set up state, use NAT state information, modify
application-specific payload, and perform whatever else is necessary to get the application
running across disparate address realms.
ALGs may not always utilize NAT state information. They may glean application payload
and simply notify NAT to add additional state information in some cases. ALGs are similar
to proxies, in that both ALGs and proxies facilitate application-specific communication
between clients and servers. Proxies use a special protocol to communicate with proxy
clients and relay client data to servers and vice versa. Unlike proxies, ALGs do not use a
special protocol to communicate with application clients and do not require changes to
application clients.
When the ASA firewall is placed between an IP phone and the Unified CM to which it is
registered, the TLS proxy is inserted into the TLS session. A phone with encrypted
signaling uses TLS as a transport between itself and Unified CM. When the TLS proxy is
involved, there are two TLS sessions for each phone registration, one between the phone
and the ASA and the second between the ASA and Unified CM.
254 Chapter 9: Protection with Enterprise Network Devices
The ASA is the only firewall with an ALG that has a controlled method to allow a call with
encrypted signaling to work, because it is able to inspect that signaling.
When a VPN design is not the desired solution for securing remote phones, the ASA can
provide an alternative method of securing those devices.
The TLS proxy is added as a trusted entity to the Certificate Trust List (CTL) that is used
by the phones. The CTL file is allowed to contain 16 entries, which include all servers that
need to have a trust relationship with the phones. Therefore, the number of TLS proxies
configured to work with a given cluster is limited by the number of free entries in the CTL.
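The CTL limit described above comes down to simple arithmetic. The following Python sketch is an illustration only: the limit of 16 entries comes from the text, while the function name and the sample entry count are hypothetical.

```python
CTL_MAX_ENTRIES = 16  # limit stated in the text


def free_tls_proxy_slots(existing_entries: int) -> int:
    """Return how many more TLS proxies could still be added to the CTL."""
    if not 0 <= existing_entries <= CTL_MAX_ENTRIES:
        raise ValueError("entry count must be between 0 and 16")
    return CTL_MAX_ENTRIES - existing_entries


# Hypothetical cluster whose CTL already holds 11 trusted entries.
print(free_tls_proxy_slots(11))  # 5
```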
Configuration Example
Example 9-1 is a configuration example listing the ports and inspect commands that are
used to make the firewalls work with voice for ASA and PIX software Release 7.04. This
is an example only, and you should review the ports list from all the applications that are
used in your network before deploying any firewall. This configuration example shows only
the voice sections.
Example 9-1 Configuration Example of ASA and PIX
!
object-group service remote-access tcp
description remote access
!Windows terminal
port-object range 3389 3389
!VNC
port-object range 5800 5800
!VNC
port-object range 5900 5900
port-object range 8080 8080
port-object eq ssh
!SSH
port-object eq ftp-data
!FTP data transport
port-object eq www
!HTTP Access
port-object eq ftp
!FTP
port-object eq https
!HTTPS Access
object-group service voice-protocols-tcp tcp
description TCP voice protocols
!CTI/QBE
port-object range 2428 2428
!SIP communication
port-object eq ctiqbe
!SCCP
port-object range 2000 2000
!Secure SCCP
Firewall 255
Now that you are aware of ASA and PIX firewalls, the next section looks at another type of
firewall, FWSM.
FWSM Firewall
Cisco FWSM (Firewall Services Module) is a firewall module for Catalyst 6500 Series Switches
and 7600 Series Routers, as mentioned before. This section describes how FWSM is used in
routed and transparent mode and provides a configuration example.
Routed Mode
In routed mode, the FWSM is considered to be a router hop in the network. It performs NAT
between connected networks and can use OSPF or passive RIP (in single-context mode).
Routed mode supports up to 256 interfaces per context or in single mode, with a maximum
of 1,000 interfaces divided between all contexts.
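The two routed-mode limits can be checked against a planned design before deployment. The Python sketch below is a hypothetical planning helper, not a Cisco tool; the limits of 256 and 1,000 come from the text, and the context names are made up.

```python
MAX_PER_CONTEXT = 256   # per context (or single mode), from the text
MAX_TOTAL = 1000        # across all contexts, from the text


def check_interface_plan(interfaces_per_context: dict) -> list:
    """Return a list of human-readable problems; an empty list means the plan fits."""
    problems = []
    for name, count in interfaces_per_context.items():
        if count > MAX_PER_CONTEXT:
            problems.append(f"context {name}: {count} interfaces exceeds {MAX_PER_CONTEXT}")
    total = sum(interfaces_per_context.values())
    if total > MAX_TOTAL:
        problems.append(f"total of {total} interfaces exceeds {MAX_TOTAL}")
    return problems


# Hypothetical plan: the "data" context is over the per-context limit.
print(check_interface_plan({"voice": 200, "data": 300}))
```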
Unlike the transparent mode, the routed device is visible in the network and, because of
that, it can be a point of attack. To place the device in a network, IP addressing and routing
must be changed.
Transparent Mode
In transparent mode, the FWSM acts like a “bump in the wire,” or a “stealth firewall,” and
is not a router hop. The FWSM connects the same network on its inside and outside interfaces,
but each interface must be on a different VLAN. No dynamic routing protocols or NAT are
required. However, like routed mode, transparent mode also requires ACLs to allow traffic
to pass through. Transparent mode can also optionally use EtherType ACLs to allow non-
IP traffic. Transparent mode supports only two interfaces, an inside interface and an outside
interface.
You might use a transparent firewall to simplify your network configuration. Transparent
mode is also useful if you want the firewall to be invisible to attackers. You can also use a
transparent firewall for traffic that would otherwise be blocked in routed mode. For example,
a transparent firewall can allow multicast streams using an EtherType ACL.
To avoid loops when you use failover in transparent mode, you must use switch software
that supports Bridge Protocol Data Unit (BPDU) forwarding, and you must configure the
FWSM to allow BPDUs.
Transparent mode does not support NAT, dynamic routing, or a unicast Reverse Path
Forwarding (RPF) check.
Configuration Example
Example 9-2 is a configuration example listing the ports and inspect commands that are
used to make the firewall work with voice for FWSM software Release 2.3.x. This is only
an example, and you should review the ports list from all the applications that are used in
your network before deploying any firewall. This configuration example shows only the
voice sections.
Example 9-2 Configuration Example of FWSM
!
fixup protocol h323 H225 1720
!Enable fixup h323 h225
fixup protocol h323 ras 1718-1719
!Enable fixup h323 RAS
fixup protocol mgcp 2427
!Enable fixup mgcp
fixup protocol skinny 2000
!Enable fixup
fixup protocol tftp 69
!Enable fixup
object-group service VoiceProtocols tcp
description Unified CM Voice protocols
port-object eq ctiqbe
port-object eq 2000
port-object eq 3224
port-object eq 2443
port-object eq 2428
port-object eq h323
!Defining the ports for TCP voice
object-group service VoiceProtocolsUDP udp
description UDP based Voice Protocols
port-object range 2427 2427
port-object range 1719 1719
port-object eq tftp
!Defining the ports for UDP voice
object-group service RemoteAccess tcp
description Remote Access
port-object range 3389 3389
port-object range 5800 5809
port-object eq ssh
port-object range 5900 5900
port-object eq www
port-object eq https
!Defining remote access TCP ports
access-list inside_nat0_outbound extended permit ip any any
!
access-list phones_access_in extended permit tcp any any object-group RemoteAccess log notifications interval 2
In this section so far, you have learned about ASA, PIX, and FWSM firewalls in terms of
network architecture, usage, features, and configuration. The next section describes the
limitations of using those firewalls.
Limitations
Not all IP telephony application servers or applications are supported through a firewall.
Some applications that are not supported with firewalls or with an ALG in the firewall include
Cisco Unity voicemail servers, Attendant Console, Cisco Unified Contact Center Enterprise,
and Cisco Unified Contact Center Express. ACLs can be written for these applications to
allow media and signaling traffic to flow through a firewall.
Unified Communications Manager Express 259
Versions of Cisco FWSM prior to version 3.0 do not support SCCP fragmentation. If an
SCCP packet is fragmented from a phone, from Unified CM, or from a gateway to another
IP telephony device, the fragmented packet will not be allowed through the FWSM. In
cases where fragmentation occurs with an FWSM running version 2.x code, an ACL should
be used without the ALG feature of the firewall for the signaling traffic. This configuration
will allow the signaling traffic through the FWSM but will not do packet inspection as the
signaling goes through the firewall.
If there are other applications that use the same port as SCCP (TCP 2000), those applications
could be affected by the SCCP inspection. All traffic that is going to the SCCP TCP port
will be inspected to see if it is SCCP traffic. If it is not SCCP traffic, it will be dropped.
To determine whether the applications running on your network are supported with the
version of firewall in the network or if ACLs have to be written, refer to the appropriate
application documentation available at Cisco.com.
Access Control
Unified CME provides interfaces for system access locally and remotely. The system access
control is the fundamental way of making VoIP service secure. The recommendation and
command examples are listed as follows:
• Enable security and encrypt passwords
Use enable secret to encrypt the enable password:
enable secret <removed>
no enable password
The enable secret command takes precedence over the enable password command
if both are configured.
To increase access security, passwords can be encrypted to prevent unauthorized
users from viewing the passwords when packets are examined by protocol analyzers:
service password-encryption
The sample command log shows the information contained in a TACACS+ command
accounting record for privilege level 1.
Wed Jun 25 03:46:47 1997 172.16.25.15 fgeorge tty3 5622329430/4327528 stop
task_id=3 service=shell priv-lvl=1 cmd=show version <cr>
Wed Jun 25 03:46:58 1997 172.16.25.15 fgeorge tty3 5622329430/4327528 stop
task_id=4 service=shell priv-lvl=1 cmd=show interfaces Ethernet 0 <cr>
Wed Jun 25 03:47:03 1997 172.16.25.15 fgeorge tty3 5622329430/4327528 stop
task_id=5 service=shell priv-lvl=1 cmd=show ip route <cr>
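Accounting records like these are easier to audit once they are split into fields. The Python sketch below parses one record after its two wrapped lines have been joined. The field labels (nas, user, port, caller, type) are my own names for the positional columns, and the layout is assumed from the sample above rather than taken from a formal specification.

```python
import re
from datetime import datetime

# key=value attributes; a value runs until the next " key=" or the end of the line
ATTR_RE = re.compile(r"([\w-]+)=(.*?)(?=\s+[\w-]+=|$)")


def parse_accounting_record(line: str) -> dict:
    """Split one TACACS+ command accounting record into labeled fields."""
    tokens = line.split()
    record = {
        "timestamp": datetime.strptime(" ".join(tokens[:5]), "%a %b %d %H:%M:%S %Y"),
        "nas": tokens[5],
        "user": tokens[6],
        "port": tokens[7],
        "caller": tokens[8],
        "type": tokens[9],
    }
    attrs = " ".join(tokens[10:])
    record["attrs"] = {m.group(1): m.group(2) for m in ATTR_RE.finditer(attrs)}
    return record


sample = ("Wed Jun 25 03:46:47 1997 172.16.25.15 fgeorge tty3 5622329430/4327528 stop "
          "task_id=3 service=shell priv-lvl=1 cmd=show version <cr>")
rec = parse_accounting_record(sample)
print(rec["user"], rec["attrs"]["priv-lvl"], rec["attrs"]["cmd"])
```

A script along these lines could, for example, list every command a given user ran at a given privilege level.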
By default, the VTY transport is Telnet. The following commands disable Telnet
and allow only SSH on the VTY lines:
line vty 0 4
transport input ssh
Because “read” and “write” are two common community strings for read and write
access, respectively, change the community strings to different ones.
• Disable CDP Unless Needed
Because CDP automatically discovers the neighboring network devices supporting
CDP, disable CDP in an untrusted domain so that Unified CME routers will not show
in the CDP table of other devices.
no cdp run
ip source-address command, so that only locally attached IP phones will be able to register
and get telephony services. An example command is:
Unified CME(config-telephony)# ip source-address 1.1.1.1 port 2000 strict-match
Use the following access-list if you want to block port 2000 access from the WAN side to
prevent external SCCP phones from registering with Unified CME:
access-list 101 deny tcp any any eq 2000
NOTE Unknown phones or phones that are not configured in Unified CME are allowed to register
by default for ease of management, but they do not get dial tone until you configure them
by associating the buttons with ephone-dns or configuring auto assign dns under telephony
service.
Unified CME has the following syslog messages to generate and display all registration/
deregistration events:
%IPPHONE-6-REG_ALARM
%IPPHONE-6-REGISTER
%IPPHONE-6-REGISTER_NEW
%IPPHONE-6-UNREGISTER_ABNORMAL
%IPPHONE-6-REGISTER_NORMAL
The following message, for example, indicates that a phone has registered and is not part
of the explicit router configuration; that is, ephone configuration has not been created:
%IPPHONE-6-REGISTER_NEW: ephone-3:SEP003094C38724 IP:1.4.170.6
Socket:1 DeviceType:Phone has registered.
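A monitoring script can watch for this message to flag phones that registered without an ephone configuration. The Python sketch below assumes the message layout shown above, with the wrapped line joined into one. The second sample line is hypothetical, and other IPPHONE messages may not follow the same layout.

```python
import re

REG_RE = re.compile(
    r"%IPPHONE-6-(?P<event>[A-Z_]+):\s+"
    r"(?P<ephone>ephone-\d+):(?P<device>SEP[0-9A-Fa-f]{12})\s+"
    r"IP:(?P<ip>\S+)\s+Socket:(?P<socket>\d+)\s+DeviceType:(?P<type>\S+)"
)


def unconfigured_phones(lines):
    """Return (device name, IP) pairs for phones reported as REGISTER_NEW."""
    found = []
    for line in lines:
        m = REG_RE.search(line)
        if m and m.group("event") == "REGISTER_NEW":
            found.append((m.group("device"), m.group("ip")))
    return found


log = [
    "%IPPHONE-6-REGISTER_NEW: ephone-3:SEP003094C38724 IP:1.4.170.6 "
    "Socket:1 DeviceType:Phone has registered.",
    # Hypothetical line for an already configured phone.
    "%IPPHONE-6-REGISTER: ephone-1:SEP000000000001 IP:10.0.0.5 "
    "Socket:2 DeviceType:Phone has registered.",
]
print(unconfigured_phones(log))
```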
Unified CME also allows unconfigured IP phones to register in order to make provisioning
of the Unified CME system more convenient. By default, IP phones designated as “new”
are not assigned phone numbers and cannot make calls.
You can use the following configuration to enable syslogging to a router’s buffer/console
or a syslog server:
logging console
logging buffered
logging 172.19.153.129
!172.19.153.129 is the syslog server
The Cisco CallManager Express GUI provides call history table information so that a
network administrator can monitor the call history information for unknown callers and use
this information to disallow calling activities based on selected calling patterns. The call
history log should be configured to support forensics and accounting and to allow the
administrator to track down fraudulent calling:
dial-control-mib retain-timer 10080
dial-control-mib max-size 500
!
gw-accounting syslog
Use the following command to generate an RSA usage key pair with a length of 1024 bits
or greater:
crypto key generate rsa usage-keys modulus 1024
If you do not generate an RSA usage key pair manually, an RSA usage key pair with a
length of 768 bits will be generated automatically when you connect to the HTTPS server
for the first time. These automatically generated RSA keys are not saved to the startup
configuration; therefore, they will be lost when the device is rebooted unless you save the
configuration manually.
You should obtain an X.509 digital certificate with digital signature capabilities for the
device from a certification authority (CA). If you do not obtain a digital certificate in
advance, the device creates a self-signed digital certificate to authenticate itself.
If you change the device hostname after obtaining a device digital certificate, HTTPS
connections to the device fail because the hostname does not match the hostname specified
in the digital certificate. Obtain a new device digital certificate using the new hostname to
fix this problem.
The ip http secure-server command prevents passwords from crossing the network in clear
text when a Unified CME administrator logs in to the GUI. However, communication between
the phone and the router remains unsecured. A signed digital signature is required in the
phone load and Cisco IOS for a secure connection.
The following are the suggested best practices for interactive HTTP access to the
Unified CME router:
• Use the ip http access-class command to restrict IP packets connecting to Cisco
CallManager Express.
Class of Restriction
The Class of Restriction (COR) is used to prevent toll fraud or to restrict the permissions for
incoming and outgoing calls. The configuration defines two COR lists, user and superuser,
along with various permissions, such as local calling, long distance calling, 911 access,
and 411 access. Example 9-3 shows how “superuser” has access to everything and
“user” has access to all resources except toll “1900,” directory assistance
“411,” and international calling.
Example 9-3 COR Example
!
dial-peer cor custom
name 911
name 1800
name local-call
name ld-call
name 411
name int-call
name 1900
dial-peer cor list call911
member 911
!
dial-peer cor list call1800
member 1800
!
dial-peer cor list calllocal
member local-call
!
dial-peer cor list callint
member int-call
!
dial-peer cor list callld
member ld-call
!
dial-peer cor list call411
member 411
!
dial-peer cor list call1900
member 1900
dial-peer cor list user
member 911
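The effect of these COR lists can be modeled with sets. The Python sketch below assumes the usual Cisco IOS rule that a call is allowed when the incoming COR list contains every member of the outgoing dial peer's COR list. The membership of the "user" and "superuser" lists is reconstructed from the description above, not copied from the listing.

```python
ALL_MEMBERS = {"911", "1800", "local-call", "ld-call", "411", "int-call", "1900"}

# Incoming COR lists, reconstructed from the prose description (assumption).
INCOMING = {
    "superuser": set(ALL_MEMBERS),
    "user": ALL_MEMBERS - {"1900", "411", "int-call"},
}

# Outgoing COR lists, one member each, as in Example 9-3.
OUTGOING = {
    "call911": {"911"},
    "call1800": {"1800"},
    "calllocal": {"local-call"},
    "callld": {"ld-call"},
    "call411": {"411"},
    "callint": {"int-call"},
    "call1900": {"1900"},
}


def call_allowed(incoming_list: str, outgoing_list: str) -> bool:
    """Allow the call if the outgoing list is a subset of the incoming list."""
    return OUTGOING[outgoing_list] <= INCOMING[incoming_list]


print(call_allowed("user", "callld"))         # True
print(call_allowed("user", "call1900"))       # False
print(call_allowed("superuser", "call1900"))  # True
```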
telephony-service
after-hours block pattern 1 .1242
after-hours block pattern 2 .1264
after-hours block pattern 3 .1268
after-hours block pattern 4 .1246
after-hours block pattern 5 .1441
after-hours block pattern 6 .1284
after-hours block pattern 7 .1345
after-hours block pattern 8 .1767
after-hours block pattern 9 .1809
after-hours block pattern 10 .1473
after-hours block pattern 11 .1876
after-hours block pattern 12 .1664
after-hours block pattern 13 .1787
after-hours block pattern 14 .1869
after-hours block pattern 15 .1758
after-hours block pattern 16 .1900
after-hours block pattern 17 .1976
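To see which dialed numbers such patterns would catch, the following Python sketch models the matching under two assumptions that the text above does not confirm: a period stands for any single digit (for example, a trunk access code), and a pattern matches when it fits the start of the dialed number. The sample numbers are hypothetical.

```python
import re

# A subset of the block patterns listed above.
BLOCK_PATTERNS = [".1242", ".1264", ".1900"]


def to_regex(pattern: str):
    """Translate a block pattern into a regex; '.' becomes one digit (assumption)."""
    return re.compile("".join("[0-9]" if ch == "." else re.escape(ch) for ch in pattern))


def is_blocked(dialed: str, patterns=BLOCK_PATTERNS) -> bool:
    """Return True if any pattern matches the beginning of the dialed number."""
    return any(to_regex(p).match(dialed) for p in patterns)


print(is_blocked("912425551234"))  # True: 9 + 1242...
print(is_blocked("914085551234"))  # False
```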
Unified Communications Manager 267
In this section, you have learned how to make VoIP service secure with Unified CME in
terms of access control, phone registration, GUI management, class of restriction, and
after-hours call blocking. The next section covers security practices with Unified CM.
NOTE To avoid massive configuration snapshots, this section will approach the security
applications at a high level, focusing on authentication, integrity, and encryption. If you need to
see detailed configuration examples, refer to the administration guides at Cisco.com.
Media encryption keys derived by Unified CM get sent securely via encrypted signaling
paths to IP phones through TLS (or TCP for some phone models) and to gateways over
IPSec-protected links.
Table 9-1 shows the summary of security features that Unified CM can implement during
a Session Initiation Protocol (SIP) or SCCP call.
Table 9-1 Security Features on Unified CM
After root certificates are installed, certificates get added to the root trust stores to secure
connections between users and hosts, integrate application devices, and so on. For security
reasons, trusted certificate files typically get stored under an 8-digit hexadecimal name (such as
f7a74b2c.0), which is a hashed value of the certificate name.
Unified CM imports the following certificate types to its trust store:
• Cisco Unity server certificate—Cisco Unity uses this self-signed root certificate to
sign the Cisco Unity SCCP device certificates. The Cisco Unity Telephony Integration
Manager manages this certificate.
• Cisco Unity SCCP device certificates—Cisco Unity SCCP devices use this signed
certificate to establish a TLS connection with the Unified CM. Every Unity device (or
port) gets issued a certificate that is rooted at the Unity root certificate. The Unity
certificate name is a hash of the certificate’s subject name, which is based on the Unity
machine name.
• SIP Proxy server certificate—A SIP user agent that connects via a SIP trunk
authenticates to Unified CM if the CM trust store contains the SIP user agent
certificate and if the SIP user agent contains the CM certificate in its trust store.
Administrators have read-only access to certificates. Administrators can view the
fingerprint of server certificates, regenerate self-signed certificates, and delete trust certificates at
the Cisco IP Telephony Platform GUI.
Administrators can also regenerate and view self-signed certificates at the command-line
interface (CLI).
Image Authentication
This process prevents tampering with the binary image (that is, a firmware load) prior to
loading it on the phone. Tampering with the image causes the phone to fail the
authentication process and reject the image. Image authentication occurs through signed binary files
that are automatically installed when you install Unified CM. Likewise, firmware updates
that you download from the web also provide signed binary images.
Device Authentication
The process of device authentication validates the identity of the device and ensures that
the entity is who it claims to be. Device authentication occurs between Unified CM and
supported IP Phones, SIP trunks, or JTAPI/TAPI/CTI applications (when supported).
An authenticated connection occurs between these entities only when each entity accepts
the certificate of the other entity. This process of mutual certificate exchange is called
mutual authentication. Device authentication relies on the creation of the Cisco CTL file
for authenticating the Unified CM server node, and on the Certificate Authority Proxy Function
(CAPF) for authenticating phones and JTAPI/TAPI/CTI applications.
File Authentication
The process of file authentication validates digitally signed files that the phone downloads;
for example, the configuration, ring list, locale, and CTL files. The phone validates the
signature to verify that file tampering did not occur after the file creation.
The TFTP server does not sign any files if you configure the cluster for non-secure mode.
If you configure the cluster for secure mode, the TFTP server signs static files, such as ring
list, localized, default.cnf.xml, and ring list WAV files, in .sgn format. The TFTP server
signs files in <device name>.cnf.xml format every time the TFTP server verifies that a data
change occurred for the file.
The TFTP server writes the signed files to disk if caching is disabled. If the TFTP server
verifies that a saved file has changed, the TFTP server re-signs the file. The new file on the
disk overwrites the saved file, which gets deleted. Before the phone can download the new
file, the administrator must restart affected devices in Unified CM Administration.
After the phone receives the files from the TFTP server, the phone verifies the integrity of
the files by validating the signature on the file. For the phone to establish an authenticated
connection, ensure that the following criteria are met:
• The phone must have been provisioned with its own certificate.
• The CTL file must exist on the phone, and the Unified CM entry and certificate must
exist in the CTL file.
• The device must be configured for authentication or encryption.
Signaling Authentication
The process of signaling authentication, also known as signaling integrity, uses the TLS
protocol to validate that no tampering has occurred to signaling packets during transmission.
Signaling authentication relies on the creation of the CTL file.
Digest Authentication
The process of digest authentication for SIP trunks and phones allows Unified CM to
challenge the identity of a SIP user agent (UA) when the UA sends a request to Unified CM.
(A SIP user agent represents a device or application that originates a SIP message.)
Unified CM acts as a user agent server (UAS) for SIP calls originated by line-side phones
or devices reached through the SIP trunk, as a User Agent Client (UAC) for SIP calls that
it originates to the SIP trunk, or a back-to-back user agent (B2BUA) for line-to-line or
trunk-to-trunk connections. In most environments, Unified CM acts primarily as a B2BUA
connecting SCCP and SIP endpoints.
Unified CM can challenge SIP phones or SIP devices connecting through a SIP trunk (as a
UAS) and can respond to challenges received on its SIP trunk interface (as a UAC). When
digest authentication is enabled for a phone, Unified CM challenges all SIP phone requests
except keepalive messages. Note that you can only effectively challenge on a SIP trunk if
the UA belongs to your same realm and thus you know the valid username and password.
Unified CM defines a SIP call as having two or more separate call legs. For a standard
two-party call between two SIP devices, two separate call legs exist: one leg between the
originating SIP UA and Unified CM (the originating call leg) and the other leg between Unified
CM and destination SIP UA (the terminating call leg). Each call leg represents a separate
SIP dialog. Because digest authentication is a point-to-point process, digest authentication
on each call leg stays independent of the other call legs. SRTP capabilities can change for
each call leg, depending on the capabilities negotiated between the user agents.
Unified CM server uses a SIP 401 (Unauthorized) message to initiate a challenge, which
includes the nonce and the realm in the header. (The nonce specifies a random number that
gets used to calculate the Message Digest 5 [MD5] hash.) When a SIP user agent challenges
the identity of Unified CM, Unified CM responds to SIP 401 and SIP 407 (Proxy
Authentication Required) messages.
After you enable digest authentication for a SIP phone or trunk and configure digest
credentials, Unified CM calculates a credentials checksum that includes a hash of the
username, password, and realm. Unified CM encrypts the values and stores the username
and the checksum in the database. Each digest user can have one set of digest credentials
per realm.
When Unified CM challenges a user agent, Unified CM indicates the realm and nonce value
for which the user agent must present its credentials. After receiving a response, Unified
CM validates the checksum for the username that is stored in the database against the
credentials received in the response header from the UA. If the credentials match, digest
authentication succeeds, and Unified CM processes the SIP request.
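The challenge and validation steps described above can be sketched with the basic digest calculation from RFC 2617, without the qop extension. This Python example is an illustration under that assumption and does not describe how Unified CM stores or compares credentials internally. The username, realm, password, nonce, and URI are hypothetical.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def credentials_checksum(username: str, realm: str, password: str) -> str:
    """Hash of username, realm, and password (called HA1 in RFC 2617)."""
    return md5_hex(f"{username}:{realm}:{password}")


def digest_response(ha1: str, nonce: str, method: str, uri: str) -> str:
    """Response value a user agent sends back after a 401 or 407 challenge."""
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def verify(stored_ha1: str, nonce: str, method: str, uri: str, received: str) -> bool:
    """Server side: recompute the response from the stored checksum and compare."""
    expected = digest_response(stored_ha1, nonce, method, uri)
    return hmac.compare_digest(expected, received)


# Server stores only the checksum, never the clear-text password.
stored = credentials_checksum("alice", "example.com", "secret")
# Client answers the challenge nonce with its own computation.
answer = digest_response(credentials_checksum("alice", "example.com", "secret"),
                         "abc123", "REGISTER", "sip:example.com")
print(verify(stored, "abc123", "REGISTER", "sip:example.com", answer))  # True
```

Because the nonce is part of the hash, a captured response cannot simply be replayed against a later challenge that carries a different nonce.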
When responding to a challenge from a user agent that is connected through the SIP trunk,
Unified CM responds with the Unified CM username and password that are configured for
the realm, which is specified in the challenge message header. When Unified CM gets
challenged, the Unified CM looks up the username and encrypted password based on the
realm that the challenge message specifies. Unified CM decrypts the password, calculates
the digest, and presents it in the response message.
Administrators configure SIP digest credentials for a phone user or application user. For
applications, you specify digest credentials in the Application User Configuration window
in Unified CM Administration. For SIP phones, you specify the digest authentication
credentials, which are then applied to a phone, in the End User window in Unified CM
Administration.
To associate the credentials with the phone after you configure the user, you choose a Digest
User, an end user, in the Phone Configuration window. After you reset the phone, the
credentials exist in the phone configuration file that the TFTP server offers to the phone.
If you enable digest authentication for an end user but do not configure the digest
credentials, the phone will fail registration. If the cluster mode is nonsecure and you enable digest
authentication and configure digest credentials, the digest credentials get sent to the phone
and Unified CM still initiates challenges.
Administrators configure the SIP realm for challenges to the phone and for challenges that
are received through the SIP trunk. The SIP Realm GUI provides the trunk-side credentials
for UAC mode. You configure the SIP realm for phones with the service parameter SIP
Station Realm. You must configure a SIP realm and username and password in Unified CM
Administration for each SIP trunk user agent that can challenge Unified CM.
Administrators configure the minutes that the nonce value stays valid for the external device
before that value gets rejected and a new number gets generated by Unified CM.
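The nonce validity timer can be pictured as a small store that remembers when each nonce was issued. The Python sketch below is a generic illustration, not Unified CM code. The class name, the injectable clock, and the 10-minute value are all hypothetical.

```python
import secrets
import time


class NonceStore:
    """Issue nonces and reject them once the configured validity time has passed."""

    def __init__(self, validity_minutes: float, clock=time.monotonic):
        self._ttl = validity_minutes * 60   # validity in seconds
        self._clock = clock                 # injectable so the example can be tested
        self._issued = {}

    def issue(self) -> str:
        nonce = secrets.token_hex(16)
        self._issued[nonce] = self._clock()
        return nonce

    def is_valid(self, nonce: str) -> bool:
        issued_at = self._issued.get(nonce)
        if issued_at is None:
            return False                    # never issued, or already expired
        if self._clock() - issued_at > self._ttl:
            del self._issued[nonce]         # expired: a new challenge is needed
            return False
        return True


store = NonceStore(validity_minutes=10)
nonce = store.issue()
print(store.is_valid(nonce))  # True immediately after issue
```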
Authorization
Unified CM uses the authorization process to restrict certain categories of messages from
SIP phones, from SIP trunks, and from SIP application requests on SIP trunks.
For SIP INVITE messages and in-dialog messages, and for SIP phones, Unified CM
provides authorization through calling search spaces and partitions.
For SIP SUBSCRIBE requests from phones, Unified CM provides authorization for user
access to presence groups.
For SIP trunks, Unified CM provides authorization of presence subscriptions and certain
non-INVITE SIP messages; for example, out-of-dialog REFER messages, unsolicited
notifications, and any SIP request with the Replaces header. You specify authorization in the SIP
Trunk Security Profile window by checking the related check boxes in the window.
Authorization occurs for the SIP trunk first (as configured in the SIP Trunk Security Profile)
and then for the SIP application user agent on the SIP trunk (as configured in the
Application User Configuration), when application-level authorization is configured. For the trunk,
Unified CM downloads the trunk ACL information and caches it. The ACL information gets
applied to the incoming SIP request. If the ACL does not allow the SIP request, the call fails
with a 403 Forbidden message.
If the ACL allows the SIP request, Unified CM checks whether digest authentication is
enabled in the SIP Trunk Security Profile. If digest authentication is not enabled and
application-level authorization is not enabled, Unified CM processes the request. If digest
authentication is enabled, Unified CM verifies that the authentication header exists in the
incoming request and then uses digest authentication to identify the source application. If
the header does not exist, Unified CM challenges the device with a 401 message.
To enable SIP application authorization on the SIP trunk, you must check the Enable
Application Level Authorization check box in the SIP Trunk Security Profile window.
Before an application-level ACL gets applied, Unified CM authenticates the SIP trunk user
agent through digest authentication. Therefore, you must enable digest authentication in the
SIP Trunk Security Profile for application-level authorization to occur.
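The order of checks for an incoming SIP trunk request, as described above, can be written as a short decision function. This Python sketch reflects my reading of the text. The function name and return strings are hypothetical, and the actual validation of the digest response is not modeled.

```python
def handle_trunk_request(acl_allows: bool,
                         digest_enabled: bool,
                         app_authz_enabled: bool,
                         has_auth_header: bool) -> str:
    """Return the next action for an incoming SIP request on a trunk."""
    if not acl_allows:
        return "403 Forbidden"                 # trunk ACL rejects the request
    if not digest_enabled:
        # Application-level authorization depends on digest authentication,
        # so without digest the request is simply processed.
        return "process request"
    if not has_auth_header:
        return "401 challenge"                 # ask the sender to authenticate
    if app_authz_enabled:
        return "apply application-level ACL"   # after the sender is identified
    return "process request"


print(handle_trunk_request(True, True, True, False))  # 401 challenge
```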
Encryption
Unified CM supports three types of encryption: signaling, media, and configuration file
encryption.
Signaling Encryption
Signaling encryption ensures that all SIP and SCCP signaling messages that are sent
between the device and the Unified CM server are encrypted.
Signaling encryption ensures that the information that pertains to the parties, dual-tone
multifrequency (DTMF) digits that are entered by the parties, call status, media encryption
keys, and so on, are protected against unintended or unauthorized access.
Cisco does not support NAT with Unified CM if you configure the cluster for secure mode;
NAT does not work with signaling encryption because the encrypted signaling does not
work in conjunction with the NAT ALG.
Firewall ALGs also break with encrypted signaling. As a workaround, you can enable User
Datagram Protocol (UDP) ALG in the firewall to allow media stream firewall traversal.
Enabling the UDP ALG allows the media source on the trusted side of the firewall to open
a bidirectional media flow through the firewall by sending the media packet through the
firewall.
SIP trunks support signaling encryption but do not support media encryption.
Media Encryption
Media encryption, which uses SRTP, ensures that only the intended recipient can interpret
the media streams between supported devices. Support includes audio streams only. Media
encryption includes creating a media master key pair for the call, delivering the keys to the
endpoints, and securing the delivery of the keys while the keys are in transport.
If the devices support SRTP, the system uses an SRTP connection. If at least one device
does not support SRTP, the system uses an RTP connection. SRTP-to-RTP fallback may
occur for a variety of reasons: transfers from a secure device to a non-secure device,
conferencing, transcoding, music on hold, and so on.
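The SRTP-or-RTP decision reduces to one rule: every device in the call must support SRTP. The Python sketch below states that rule directly. The function name and the sample capability lists are hypothetical.

```python
def select_media_transport(devices_support_srtp) -> str:
    """Use SRTP only if every device in the call supports it; otherwise use RTP."""
    devices = list(devices_support_srtp)
    if not devices:
        raise ValueError("a call needs at least one device")
    return "SRTP" if all(devices) else "RTP"


print(select_media_transport([True, True]))   # SRTP
print(select_media_transport([True, False]))  # RTP, for example after a transfer
```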
For most security-supported devices, authentication and signaling encryption serve as the
minimum requirements for media encryption; that is, if the devices do not support signaling
encryption and authentication, media encryption cannot occur. Cisco IOS gateways and
trunks support media encryption without authentication. For Cisco IOS gateways and
trunks, you must configure IPSec when you enable the SRTP capability (media encryption).
Secure SIP trunks can support secure calls over TLS; be aware, though, that the trunk
supports signaling encryption but does not support media encryption (SRTP). Because the
trunk does not support media encryption, the shield icon may display on the phones during
the call, that is, if all devices in the call support authentication or signaling encryption.
The following example demonstrates media encryption for SCCP and MGCP calls:
1 Device A and Device B, which support media encryption and authentication, register
with Unified CM.
2 When Device A places a call to Device B, Unified CM generates two sets of media
session master values from the key manager function.
3 Both devices receive the two sets: one set for the media stream, Device A–Device B,
and the other set for the media stream, Device B–Device A.
4 Using the first set of master values, Device A derives the keys that encrypt and
authenticate the media stream, Device A–Device B.
5 Using the second set of master values, Device A derives the keys that authenticate and
decrypt the media stream, Device B–Device A.
7 After the devices receive the keys, the devices perform the required key derivation,
and the SRTP packet can be sent and received by both Device A and B.
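The pairing of master values with media directions in the steps above can be sketched as follows. This Python example only shows which set each device uses for sending and receiving. It does not perform the SRTP key derivation, the 16-byte length is an arbitrary choice, and Device B's mirrored use of the two sets is inferred from the statement that both devices receive both sets.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    tx_master: bytes  # master value for the stream this device sends
    rx_master: bytes  # master value for the stream this device receives


def distribute_master_values():
    """Create two sets of master values, one per media direction."""
    set_a_to_b = secrets.token_bytes(16)  # protects the stream Device A to Device B
    set_b_to_a = secrets.token_bytes(16)  # protects the stream Device B to Device A
    device_a = Endpoint("A", tx_master=set_a_to_b, rx_master=set_b_to_a)
    device_b = Endpoint("B", tx_master=set_b_to_a, rx_master=set_a_to_b)
    return device_a, device_b


a, b = distribute_master_values()
print(a.tx_master == b.rx_master, b.tx_master == a.rx_master)  # True True
```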
Configuration Guideline
The following list gives the recommended steps to implement integrity, authentication, and
encryption when configuring Unified CM based on version 5.
Note that you may have a different interface or options depending on the version and
specific features you are implementing.
Step 1 On each server in the cluster, activate the Cisco CTL Provider service in
Unified CM Serviceability. If you activated this service prior to a Unified
CM upgrade, you do not need to activate the service again. The service
automatically activates after the upgrade.
Step 2 On the first node, activate the Cisco Certificate Authority Proxy service
in Unified CM Serviceability to install, upgrade, troubleshoot, or delete
locally significant certificates.
Step 3 If you do not want to use the default port settings, configure ports for the
TLS connection. If you configured these settings prior to a Unified CM
upgrade, the settings migrate automatically during the upgrade.
Step 4 Obtain at least two security tokens and the passwords, hostnames/IP
addresses, and port numbers for the servers that you will configure for the
Cisco CTL client.
Step 5 Install the Cisco CTL client. You cannot use the Cisco CTL client that
was available with Unified CM 4.0. To update the Cisco CTL file after
an upgrade to Unified CM 5.0(1), you must install the plug-in that is
available in Unified CM Administration 5.0(1).
Step 6 Configure the Cisco CTL client. If you created the Cisco CTL file prior
to a Unified CM upgrade, the Cisco CTL file migrates automatically
during the upgrade.
Step 7 Configure the phone security profiles. Perform the following tasks when
you configure the profiles.
Step 8 Configure the device security mode (for SCCP and SIP phones).
Step 9 Configure CAPF settings (for some SCCP and SIP phones). Additional
CAPF settings display in the Phone Configuration window.
Step 10 If you plan to use digest authentication for SIP phones, check the Enable
Digest Authentication check box.
Step 11 Apply the phone security profiles to the phones.
Step 13 Verify that the locally significant certificates are installed on supported
Cisco IP Phones.
Step 14 Configure digest authentication for SIP phones.
Step 23 If you checked the Enable Application Level Authorization check box
in the SIP trunk security profile, configure the allowed SIP requests by
checking the authorization check boxes in the Application User
Configuration window.
Step 24 Reset all phones in the cluster.
In this section, you have learned how to make VoIP service secure with Unified CM, in
terms of authentication, authorization, and encryption. The next section covers the same
topic with access devices.
Access Devices
The access devices in this context are IP phones (for example, Cisco 7960s) and multilayer
switches (for example, Cisco Catalyst 6500s) that provide security features and interfaces
at the user’s network. These features can be enabled or disabled on a phone-by-phone or
service-by-service basis to increase the security of an IP telephony deployment.
Figure 9-2 illustrates the typical layout of an access network.
Figure 9-2 Access Network
To secure the access network, this section recommends methods that use the IP phone,
VLAN, switch port, and ACL features, with configuration examples.
IP Phone
IP phones (Cisco Unified IP Phones) have built-in features to increase the security on the
network even though the number of features is relatively small compared to other network
devices. The best practices of usage can be summarized as follows:
• Disable unused PC ports to prevent a device from plugging into the back of the phone
and getting network access through the phone itself. A phone in a common area such
as a lobby would typically have its port disabled.
• Enable the phone’s Gratuitous ARP (ARP announcement) protection, so that the phone
ignores unsolicited ARP replies, to prevent man-in-the-middle attacks on the phone.
• Isolate the voice VLAN so that no device attached to the PC port can access the voice
VLAN.
• Restrict access to the built-in web server so that an attacker cannot get any information
from the interface.
• Disable access to the network settings page on the phone so that an attacker cannot
obtain network information like IP addresses of TFTP, default gateway (GW), or
Unified CM.
• Integrate authentication and encryption with Unified CM and CME (refer to previous
sections).
• Keep in mind that enabling video capabilities, as it is designed, could possibly allow
communication to the phone from the PC connected to that phone.
Switch
The switch (Cisco Catalyst Switch) also provides many security features to protect the IP
telephony network. The following section presents the methodology of preventing typical
threats in the access network with the switch.
An attacker may use any flooding tool (for example, “macof”) to generate MAC flooding
from random source to random destination MAC addresses, which can fill up the CAM
table quickly and disrupt the functionality of the switch.
To prevent malicious MAC flooding, limit the number of MAC addresses allowed to access
individual ports based on the connectivity requirements for those ports. For example, with
a switch port with only a workstation attached to it, you would want to limit the number of
learned MAC addresses to one. In the case of a port with an IP phone and a workstation
behind it, you would want to set the number of learned MAC addresses to two.
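The effect of a per-port MAC address limit can be sketched with a small Python model. The class name `SwitchPort`, the port name, and the MAC addresses are hypothetical; a real switch also ages out entries and can shut the port down on violation, which this sketch omits.

```python
class SwitchPort:
    """Toy model of a switch port that limits learned source MAC addresses."""

    def __init__(self, name: str, max_macs: int):
        self.name = name
        self.max_macs = max_macs
        self.learned = set()
        self.violations = 0

    def receive(self, src_mac: str) -> bool:
        """Return True if the frame is accepted, False if it is dropped."""
        mac = src_mac.lower()
        if mac in self.learned:
            return True
        if len(self.learned) < self.max_macs:
            self.learned.add(mac)
            return True
        self.violations += 1  # limit reached: drop and count the violation
        return False


# A port with an IP phone and a workstation behind it: limit of two.
port = SwitchPort("Gi1/0/5", max_macs=2)
results = [port.receive(mac) for mac in (
    "00:11:22:33:44:55",  # phone (hypothetical address)
    "66:77:88:99:aa:bb",  # workstation (hypothetical address)
    "de:ad:be:ef:00:01",  # flooded address, rejected
    "00:11:22:33:44:55",  # phone again, still accepted
)]
```

With the limit set to two, the flooded third address never reaches the CAM table, while the phone and workstation keep working.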
If an untrusted DHCP-snooping port makes a DHCP server response, its response will be
dropped. However, legitimately attached DHCP servers or uplinks to legitimate servers
must be configured as trusted.
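The trusted and untrusted port rule can be expressed as a short Python sketch. The function name `snoop` and the port names are assumptions for illustration; the message types are the server-originated DHCP messages from RFC 2131.

```python
# DHCP messages that only a server should originate.
SERVER_MESSAGES = {"DHCPOFFER", "DHCPACK", "DHCPNAK"}


def snoop(port: str, message_type: str, trusted_ports: set) -> str:
    """Decide what to do with a DHCP message that arrives on a port."""
    if message_type in SERVER_MESSAGES and port not in trusted_ports:
        return "drop"      # server response from an untrusted port
    return "forward"       # client message, or response from a trusted port


# Hypothetical port names: only the uplink to the DHCP server is trusted.
trusted = {"Gi1/0/48"}
rogue = snoop("Gi1/0/7", "DHCPOFFER", trusted)
legitimate = snoop("Gi1/0/48", "DHCPOFFER", trusted)
client = snoop("Gi1/0/7", "DHCPDISCOVER", trusted)
```

Client messages such as DHCPDISCOVER pass on any port, so ordinary hosts are unaffected; only server responses are restricted to trusted ports.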
VLAN ACL
You can use VLAN ACLs to control VoIP traffic that flows in the access network. Cisco
multilayer switches have the capability of controlling Layers 2 to 4 within a VLAN ACL.
Depending on the types of switches in a network, VLAN ACLs can be used to block traffic
into and out of a particular VLAN. They can also be used to block intra-VLAN traffic to
control what happens inside the VLAN between devices.
If you plan to deploy a VLAN ACL, you should verify which ports are needed to allow the
phones to function with each application used in your VoIP network. Normally any VLAN
ACL would be applied to the VLAN that the phones use. This would allow control at the
access port, as close as possible to the devices that are plugged into that access port.
Example 9-8 represents a VLAN ACL that allows only the traffic for a Cisco 7960 IP Phone
to boot and function in a VLAN. The example uses the following IP address ranges:
• Phones are in the range 10.0.20.
• Servers are in the range 10.0.10.
Note that the ports do change when either the application is updated or the OS is updated.
This note applies to all the IP telephony devices in the network, including phones. To obtain
the latest list of ports used by a product, refer to the appropriate documentation for the
version of the product that is running on your network.
As this example of an ACL illustrates, the more well-defined the IP addresses are in a
network, the easier it is to write and deploy an ACL.
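How such an ACL is evaluated can be sketched in Python using the address ranges given above. The /24 prefix lengths and the two entries (TFTP on UDP 69 and SCCP on TCP 2000, both well-known defaults) are assumptions chosen for illustration; a production ACL needs the full port list from the product documentation.

```python
from ipaddress import ip_address, ip_network

PHONES = ip_network("10.0.20.0/24")   # assumption: the phone range is a /24
SERVERS = ip_network("10.0.10.0/24")  # assumption: the server range is a /24

# (action, protocol, source, destination, destination port)
ACL = [
    ("permit", "udp", PHONES, SERVERS, 69),    # TFTP, phone boot files
    ("permit", "tcp", PHONES, SERVERS, 2000),  # SCCP signaling
]


def evaluate(protocol: str, src: str, dst: str, dst_port: int) -> str:
    """Return the action of the first matching entry, else the implicit deny."""
    for action, proto, src_net, dst_net, port in ACL:
        if (proto == protocol
                and ip_address(src) in src_net
                and ip_address(dst) in dst_net
                and port == dst_port):
            return action
    return "deny"


boot = evaluate("udp", "10.0.20.15", "10.0.10.5", 69)
telnet = evaluate("tcp", "10.0.20.15", "10.0.10.5", 23)
outsider = evaluate("tcp", "10.0.30.9", "10.0.10.5", 2000)
```

The first-match rule with an implicit deny at the end is why tightly defined address ranges keep the list short: every permitted flow must be named, and everything else is dropped.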
Deployment Example
Example 9-9 illustrates one possible way to configure a phone and a network for use in an
area with low physical security, such as a lobby area. None of the features in this example
are required for a lobby phone, but if your security policy states more security is needed,
you could use the features listed in this example.
Because you would not want anyone to gain access to the network from the PC port on the
phone, you should disable the PC port on the back of the phone to limit network access. You
should also disable the settings page on the phone so that potential attackers cannot see the
IP addresses of the network to which the lobby phone is connected. The disadvantage of
not being able to change the settings on the phone usually will not matter for a lobby phone.
Because there is very little chance that a lobby phone will be moved, you could use a static
IP address for that phone. A static IP address would prevent an attacker from unplugging
the phone and then plugging into that phone port to get a new IP address. Also, if the phone
is unplugged, the port state will change and the phone will no longer be registered with
Unified CM. You can track this event in just the lobby phone ports to see if someone is
trying to attach to the network.
Using static port security for the phone and not allowing the MAC address to be learned
would mean that an attacker would have to change his MAC address to that of the phone,
if he were able to discover that address. Dynamic port security could be used with an
unlimited timer to learn the MAC address, so that it would not have to be added. Then the
switchport would not have to be changed to clear that MAC address unless the phone is
changed. The MAC address is listed in a label on the bottom of the phone. If listing the
MAC address is considered a security issue, the label can be removed and replaced with a
“Lobby Phone” label to identify the device.
A single VLAN could be used, and CDP could be disabled on the port so that attackers
would not be able to see any information from the Ethernet port about that port or switch
to which it is attached. In this case, the phone would not have a CDP entry in the switch for
E911 emergency calls, and each lobby phone would need either a label or an information
message to local security when an emergency number is dialed.
A static entry in the DHCP Snooping binding table could be made because there would be
no DHCP on the port. When the static entry is in the DHCP Snooping binding table, Dynamic
ARP Inspection could be enabled on the VLAN to keep the attacker from getting other
information about one of the Layer 2 neighbors on the network.
With a static entry in the DHCP Snooping binding table, IP Source Guard could be used. If
an attacker got the MAC address and the IP address and then started sending packets, only
packets with the correct IP address could be sent.
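The way a static binding drives both checks can be sketched in Python. The port name and MAC address are hypothetical, the IP address is the lobby phone address used in this example, and the two functions are simplified stand-ins for Dynamic ARP Inspection and IP Source Guard, not the switch implementation.

```python
# Static binding table: port -> (MAC address, IP address).
# The port name and MAC address are hypothetical.
BINDINGS = {"Fa0/10": ("00:11:22:33:44:55", "10.0.40.5")}


def arp_inspection_permits(port: str, sender_mac: str, sender_ip: str) -> bool:
    """Permit an ARP packet only if it matches the binding for the port."""
    return BINDINGS.get(port) == (sender_mac.lower(), sender_ip)


def source_guard_permits(port: str, src_ip: str) -> bool:
    """Permit an IP packet only if its source IP matches the binding."""
    binding = BINDINGS.get(port)
    return binding is not None and binding[1] == src_ip


spoofed_arp = arp_inspection_permits("Fa0/10", "00:11:22:33:44:55", "10.0.10.2")
valid_arp = arp_inspection_permits("Fa0/10", "00:11:22:33:44:55", "10.0.40.5")
spoofed_ip = source_guard_permits("Fa0/10", "10.0.20.7")
valid_ip = source_guard_permits("Fa0/10", "10.0.40.5")
```

Even an attacker who clones the phone's MAC address can only send traffic as 10.0.40.5 on that port, which the ACL can then restrict further.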
A VLAN ACL could be written to allow only the ports and IP addresses that are needed for
the phones to operate. Example 9-9 contains a very small ACL that can be applied to a port
at Layer 2 or at the first Layer 3 device to help control access into the network. This example
is based on a Cisco 7960 IP Phone being used in a lobby area, without music on hold to the
phone or HTTP access from the phone. It uses the following IP address ranges:
• The lobby phone has an IP address of 10.0.40.5.
• The Unified CM cluster uses the address range of 10.0.20.*
• The DNS server has an IP address of 10.0.30.2.
• The HSRP routers have IP addresses 10.0.10.2 and 10.0.10.3.
• Other phones in the network use IP addresses in the range 10.0.*.*
Example 9-9 ACL Example for Lobby Phone
In this section, you learned how to make VoIP service secure with IP phones and switches.
Many methods and configuration examples were demonstrated to prevent unauthorized
access, MAC CAM flooding, unauthorized network extensions, fraudulent DHCP server,
DoS attack, and so on.
Summary
This chapter demonstrates how to protect the enterprise VoIP network with five different
types of Cisco network devices: the VoIP-aware firewall, Unified CME, Unified CM, the IP
phone, and the multilayer switch.
The VoIP-aware firewalls (Cisco ASA, PIX, and FWSM) use an ACL as a primary method
of protecting VoIP servers and media gateways from external devices that are not supposed
to communicate with them. The ACL for VoIP traffic should be updated dynamically
because the ports that endpoints use change with each call setup. The Cisco firewalls run
in Active-Standby mode for high availability, and in routed or transparent mode for
operation. Each mode targets different services and has its own pros and cons.
Unified CME supports the latest security features for small or medium-size enterprise VoIP
networks to monitor and prevent malicious attacks or malfunctions of endpoints. To make
a network secure, you should utilize the features of system access control, phone registration
control, secure GUI management, class of restriction, and call blocking according to the
policy and service requirements.
Unified CM supports the latest security technology to establish and maintain authenticated
communications along with encryption as an enterprise-class IP telephony call processing
system providing voice, video, mobility, and presence service. To make a network secure,
utilize the features of device authentication, digest authentication, authorization, transport
security, message integrity, and signal/media encryption.
As access devices, IP phones and switches provide security features and interfaces at the
user end. With the IP phone, disable unused PC ports, enable Gratuitous ARP protection,
isolate the voice VLAN, restrict access to the built-in web server, and integrate
authentication/encryption with VoIP servers. With the switch, configure the security
features that prevent MAC CAM flooding, illegitimate port access, fraudulent DHCP
servers, DHCP DoS attacks, and ARP flooding.
End Notes
1 Cisco Unified Communications Solution Reference Network Design (SRND) based
on Cisco Unified Communications Manager Release 6.x, http://www.cisco.com/en/
US/products/sw/voicesw/ps556/products_implementation_design_guide_
book09186a008085eb0d.html.
2 RFC 2663, “NAT Terminology and Considerations,” P. Srisuresh, M. Holdrege,
August 1999.
3 Cisco Unified Communications Manager Express Security, http://www.cisco.com/
en/US/netsol/ns340/ns394/ns165/ns391/networking_solutions_design_
guidance09186a00801f8e30.html.
4 Cisco CallManager Security Guide Release 5.0, http://www.cisco.com/en/US/docs/
voice_ip_comm/cucm/security/5_0_1/sec50.html.
References
RFC 2617, “HTTP Authentication: Basic and Digest Access Authentication,” J. Franks,
P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen, L. Stewart, June 1999.
RFC 3261, “SIP (Session Initiation Protocol),” J. Rosenberg, H. Schulzrinne, G. Camarillo,
A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler, June 2002.
RFC 3711, “Secure Real-time Transport Protocol (SRTP),” M. Baugher, D. McGrew,
M. Naslund, E. Carrara, K. Norrman, March 2004.
RFC 4346, “Transport Layer Security (TLS) Protocol,” T. Dierks, E. Rescorla, April 2006.
Part III Lawful Interception (CALEA)
Chapter 10 Lawful Interception Fundamentals
NOTE The content in this chapter does not represent any legal requirements or obligations for a
certain country, but provides general guidelines based on common specifications so that
you may understand the concept and the service architecture at a high level. Readers who
are planning or implementing LI in their VoIP service need to look at country-specific
requirements and standards. For Europe and America, there is more information in the
following section.
TIP For more information on country-specific requirements and standards, go to your favorite
search engine and search for “Lawful Interception,” “Communications Assistance for Law
Enforcement Act (CALEA),” or “Wiretapping” with the country name.
This chapter focuses on common concepts and methodology based on the proposals in the
U.S., referring to the following specifications:
1 ATIS T1.678¹: Defines Lawfully Authorized Electronic Surveillance (LAES) for
voice-over-packet technologies in wireline telecommunications networks, written by
ATIS.
2 TIA ANSI/J-STD-025A²: Defines the interfaces between a TSP and an LEA to
assist the LEA in conducting lawfully authorized electronic surveillance. ATIS
T1.678 inherited this specification.
3 PacketCable PKT-SP-ESP1.5-I02-070412³: Defines the interface between a TSP
that provides telecommunications services to the public for hire using PacketCable
capabilities, and an LEA to assist the LEA in conducting LAES.
4 RFC 3924⁴: Describes Cisco’s architecture for supporting lawful intercept in IP
networks, and provides a general solution that has a minimum set of common
interfaces.
Now that you are aware of the basic concept, definition, and background of LI, the next
section shows what the requirements of LI are, as an initial step.
• If the information being intercepted is encrypted by the TSP and the TSP has access
to the keys, the information must be decrypted before delivery to the LEA or the
encryption keys should be passed to the LEA to allow them to decrypt the information.
• If the information being intercepted is encrypted by the target subscriber and its
remote party and the service provider has access to the keys, the service provider may
deliver the keys to the LEA.
• There is often a requirement for TSP to be able to do multiple simultaneous intercepts
on a target subscriber. The fact that there are multiple intercepts should be transparent
to the LEAs.
• There is often a requirement that the service provider should not deliver any unauthorized
information to the LEA.
Now that you are aware of common requirements, the next section takes a look at the basic
architecture of LI.
[Figure: LI reference model. Labels: Lawful Authorization; Telecommunication Service
Provider (Access Function, Delivery Function, Service Provider Administration Function)]
As shown, the access function, delivery function, and service provider administration
function are the responsibility of the TSP, and the collection function and law enforcement
administration function are the responsibility of the LEA. The use of these functions to
perform an interception is initiated by receipt of a specific lawful authorization. The
following sections present a brief description of each function.
NOTE All LI functions use an initial capital letter for each function’s name followed by the letter
“F,” such as “AF” for “Access Function” because these names are defined by LI
specifications and not as general terms.
AF (Access Function)
The AF, which consists of one or more intercept access points (IAPs), accesses and
intercepts the target subscriber’s call data and content confidentially.
An IAP may be an existing device that has intercept capability, or it could be a special
device that is provided for that purpose; for example, a Session Border Controller (SBC).
The AF typically includes the ability
• To intercept the target subscriber’s call data information unobtrusively and make the
information available to the DF (Delivery Function).
• To intercept target call content unobtrusively and make the call content available to
the DF.
• To protect (for example, prevent unauthorized access, manipulation, and disclosure)
intercept controls, intercepted call content, and call data consistent with TSP security
policies and practices.
The information intercepted by the AF is delivered to the next function, the Delivery Function.
DF (Delivery Function)
The DF delivers intercepted communications to one or more CFs (Collection Functions) in
the form of call content and data.
The DF typically includes the ability
• To accept call content for each target subscriber over one or more channels from
the AF(s).
• To deliver call content for each target subscriber over one or more call content
channels (CCCs; see the following Note) to a CF.
• To accept call data or packet-mode (see the following Note) content information
for each target subscriber over one or more channels and deliver that information
to the CF over one or more call data channels (CDCs; see the following Note).
• To ensure that the call data and content delivered to a CF is authorized for a
particular LEA.
• To duplicate and deliver authorized call data and content for the target subscriber
to one or more CFs (up to a total of five).
• To protect (for example, prevent unauthorized access, manipulation, and disclosure)
intercept controls, intercepted call content, and data consistent with TSP security
policies and practices.
NOTE Call content channel (CCC) is the logical link between the device performing an electronic
surveillance access function and the LEA that primarily carries the call content passed
between an intercept subject and one or more associates.
Call data channel (CDC) is the logical link between the device performing an electronic
surveillance access function and the LEA that primarily carries call-identifying
information.
NOTE Packet mode is a communication in which individual packets or virtual circuits of a
communication within a physical circuit are switched or routed by the accessing
telecommunication system. Each packet may take a different route through the intervening
network(s).
After receiving the intercepted information, the DF sends it to the next function, the
Collection Function.
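The duplication and authorization duties of the DF can be sketched in Python. The function name `deliver`, the record layout, and the `authorizations` mapping are hypothetical structures for illustration; the limit of five CFs comes from the list above.

```python
MAX_CFS = 5  # the DF abilities above give a total of five CFs per target


def deliver(record: dict, authorizations: dict) -> dict:
    """Return {cf_name: copy of record} for each CF authorized for the target.

    `authorizations` maps a target identifier to the CFs whose LEA holds
    a lawful authorization for that target (hypothetical structure).
    """
    cfs = authorizations.get(record["target"], [])
    if len(cfs) > MAX_CFS:
        raise ValueError("more than five CFs for one target")
    # Each CF receives its own copy, so one LEA cannot see or affect
    # what is delivered to another.
    return {cf: dict(record) for cf in cfs}


authorized = {"+15551230001": ["cf-a", "cf-b"]}  # hypothetical target and CFs
delivered = deliver({"target": "+15551230001", "kind": "call-data"}, authorized)
not_delivered = deliver({"target": "+15559990000", "kind": "call-data"}, authorized)
```

Information about a subscriber with no authorization on file is delivered to no one, which matches the requirement that the TSP not hand over unauthorized information.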
CF (Collection Function)
The CF is responsible for collecting lawfully authorized intercepted communications (call
content) and call data for an LEA. The CF is the responsibility of the LEA.
The CF typically includes the ability
• To receive and process call content information for each target subscriber.
• To receive and process information regarding each target subscriber (for example,
call associated or non-call associated).
NOTE A Mediation Device (MD) is maintained by TSP and is the center of the LI process. It sends
configuration commands to the various IAPs to enable intercepts, receives intercept
information (call data and content), and delivers this information to the LEA. If more than
one LEA is monitoring an intercept target, the mediation device duplicates the intercept
information for each LEA. The mediation device is sometimes called the delivery function.
In some cases, the mediation device performs additional filtering of the information. It is
also responsible for formatting the information to be compliant with the country or
technology-specific requirements for delivery to law enforcement.
[Figure 10-2: LI interfaces. Labels: Law Enforcement Agent (LEA) Domain; HI-1, HI-2,
HI-3; PI; Service Provider Administration Function; Mediation Device (MD); Access
Domain; demarcation points between the domains]
Note that there are demarcation points between LEA, Service Provider, and Access domain
in Figure 10-2, assuming that the service provider does not manage the subscriber’s access
network.
Table 10-1 provides a brief description of the interfaces in Figure 10-2, based on what
information is sent in which direction.
Interface  Description
HI-1    Handover Interface 1: The law enforcement agent provides intercept information
        to the service provider administration function. The information could be the phone
        number of the target subscriber, time of interception, message format, encryption key,
        and so on.
HI-2    Handover Interface 2: The interface between the mediation device and the law
        enforcement agent (for example, collection function) for delivering call data, such
        as call duration, time, call direction, message headers, and so on.
HI-3    Handover Interface 3: The interface between the mediation device and the law
        enforcement agent (for example, collection function) for delivering call content,
        such as voice or video.
PI      Provisioning Interface: The provisioning interface from the service provider
        administration function to a mediation device. The parameters include target
        identifier, duration of interception, type of interception, and so on.
INI-1A  Internal Network Interface 1A: A mediation device provides interception
        information (for example, target identifier, duration) to the intercept access point
        for call data.
INI-1B  Internal Network Interface 1B: A mediation device provides interception
        information (for example, target identifier, duration) to the intercept access point
        for call content.
INI-2   Internal Network Interface 2: The intercept access point sends call data
        information to the mediation device through this interface.
INI-3   Internal Network Interface 3: The intercept access point sends call content to the
        mediation device.
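The interface table can also be read as a small directed graph. The Python sketch below encodes each interface as a (sender, receiver, payload) tuple; the dictionary layout, the short payload names, and the helper `path` are assumptions for illustration, and SPAF abbreviates the service provider administration function.

```python
# interface -> (sender, receiver, payload)
INTERFACES = {
    "HI-1": ("LEA", "SPAF", "intercept request"),
    "HI-2": ("MD", "LEA", "call data"),
    "HI-3": ("MD", "LEA", "call content"),
    "PI": ("SPAF", "MD", "provisioning"),
    "INI-1A": ("MD", "IAP", "intercept request for call data"),
    "INI-1B": ("MD", "IAP", "intercept request for call content"),
    "INI-2": ("IAP", "MD", "call data"),
    "INI-3": ("IAP", "MD", "call content"),
}


def path(payload: str) -> list:
    """Interfaces that carry a payload from the IAP to the LEA, in hop order."""
    hops = [("IAP", "MD"), ("MD", "LEA")]
    return [name
            for hop in hops
            for name, (src, dst, what) in INTERFACES.items()
            if (src, dst) == hop and what == payload]


call_data_path = path("call data")
call_content_path = path("call content")
```

Requests flow toward the access network over HI-1, PI, and INI-1, while intercepted data flows back toward the LEA over INI-2 and HI-2 (call data) or INI-3 and HI-3 (call content).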
Step 9 IAPs duplicate and send the call data and/or content to MD through
INI-2 and/or INI-3.
Step 10 MD forwards the call data and/or content to LEA through HI-2 and/
or HI-3.
Now that you are aware of LI interfaces—that is, what information is sent in which
direction—the next section shows operational considerations when practicing LI.
Operational Considerations
There could be a variety of operational issues in LI, depending on the requirements and
unique circumstances country by country. In particular, security-related issues stand out
because of the sensitive nature of LI: sensitive data must be protected, and the identities
of law enforcement agencies and intercept targets must be concealed.
The following section describes typical issues that need to be considered when
implementing and operating LI. The content refers to the Cisco SII architecture.⁵
NOTE In a case where there is multihoming (two or more routers connected to provide access for
the CPE), intercept taps may have to be installed on more than one access router. If the CPE
is multihomed to multiple service providers, the interception will have to be installed on
each service provider separately and the LEA will have to correlate the data.
v=0
o=UserA 2890844526 2890844526 IN IP4 userAclient.example.com
s=-
c=IN IP4 192.168.100.10
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000
SIP/2.0 200 OK
Via: SIP/2.0/UDP userAclient.example.com:5060;branch=z9hG4bK74bf9
From: UserA <sip:[email protected]>;tag=9fxced76sl
To: UserB <sip:[email protected]>;tag=314159
Call-ID: [email protected]
CSeq: 1 INVITE
Contact: <sip:[email protected]>
Content-Type: application/sdp
Content-Length: 147
v=0
o=UserB 2890844527 2890844527 IN IP4 userBclient.example.com
s=-
c=IN IP4 172.23.10.10
t=0 0
m=audio 3456 RTP/AVP 0
a=rtpmap:0 PCMU/8000
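The media address and port that an intercept needs can be read from the `c=` and `m=` lines of an SDP body like the ones above. The Python sketch below is a minimal parser for that purpose; the function name `media_endpoint` is an assumption, and it handles only a single session-level connection line and the first media line.

```python
def media_endpoint(sdp: str):
    """Return (address, port) from the first c= and m= lines of an SDP body."""
    address = port = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c=") and address is None:
            address = line.split()[-1]   # c=IN IP4 <address>
        elif line.startswith("m=") and port is None:
            port = int(line.split()[1])  # m=<media> <port> <proto> <formats>
    return address, port


# SDP body copied from the 200 OK response above.
ANSWER_SDP = """v=0
o=UserB 2890844527 2890844527 IN IP4 userBclient.example.com
s=-
c=IN IP4 172.23.10.10
t=0 0
m=audio 3456 RTP/AVP 0
a=rtpmap:0 PCMU/8000
"""

endpoint = media_endpoint(ANSWER_SDP)
```

For the response shown, the media stream toward UserB terminates at 172.23.10.10 on port 3456, which is the kind of address information a call content intercept depends on.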
Content Encryption
If the call content is encrypted and the service provider has access to the encryption keys
(for example, receives keys in Session Description Protocol [SDP]), the keys can be sent
via the call data. It is, however, possible for end users to exchange keys by some other
means without any knowledge of the service provider, in which case the service provider
will not be able to provide the keys. This kind of encryption could make decryption at the
LEA impossible. This is why the original packets are provided on interface INI-3 rather
than attempting to convert them to some other format.
Capacity
Support for lawful intercept on a network element supporting customers consumes resources
on that equipment. Therefore, support for lawful intercept requires capacity planning and
engineering to ensure that revenue-producing services are not adversely affected.
The CPUs of the following devices will be impacted by LI:
• Edge router—Must be able to intercept and replicate all intercepted IP
communication on its section of the network.
• Trunking gateway—Must be able to intercept and replicate all intercepted calls
that are forwarded off-net.
• Mediation device—Must be able to support the required maximum number of
simultaneous intercepts.
The following interfaces must be engineered with sufficient bandwidth to support LI traffic:
• IAP (for call data) <-> mediation device
• IAP (for call content) <-> mediation device
• Mediation device <-> collection functions
You should also understand that three-way calls require twice the bandwidth of regular calls
because they require two pairs of transmit and receive channels.
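A rough sizing calculation makes the factor of two concrete. In the Python sketch below, the function name and the per-stream rate are assumptions: 80 kbps approximates one G.711 stream with IP, UDP, and RTP headers at 20 ms packetization, excluding Layer 2 overhead.

```python
def intercept_bandwidth_kbps(regular_calls: int, three_way_calls: int,
                             stream_kbps: float) -> float:
    """Estimate call content bandwidth for simultaneous intercepts.

    A regular call needs one pair of transmit and receive channels
    (2 streams); a three-way call needs two pairs (4 streams).
    """
    streams = regular_calls * 2 + three_way_calls * 4
    return streams * stream_kbps


G711_STREAM_KBPS = 80.0  # assumption: G.711 plus IP/UDP/RTP headers, 20 ms packets

regular = intercept_bandwidth_kbps(10, 0, G711_STREAM_KBPS)
three_way = intercept_bandwidth_kbps(0, 10, G711_STREAM_KBPS)
```

Ten intercepted regular calls need about 1600 kbps under these assumptions, and ten three-way calls need about 3200 kbps, on each of the links listed above that carry call content.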
You may also need to provision network management and support services for LI, such as
a Domain Name System (DNS) server, a DHCP server, Simple Network Management
Protocol (SNMP), a Network Time Protocol (NTP) server, and so on.
The various devices involved in LI have minimum software and memory requirements that must
be met. Because of the number of possible devices, these requirements are subject to change.
The several issues in this section should be considered before implementing and operating
LI service.
Summary
LI is the lawfully authorized interception of communications (call content) and call-
identifying information (call data) for a particular telecommunication subscriber (target
subscriber), requested by LEA.
The call content is voice or video. The call data is a dialed number, call direction, call
duration or signaling information, and so on. The target subscriber is identified generally
by a phone number.
Almost every country has its own LI requirements and has adopted global standards (or
proposals), fully or partially, developed by standards organizations. There are two groups
of leading organizations: ATIS, TIA, PacketCable, and Cisco in the U.S., and ETSI in the EU.
In America, the U.S. Congress passed CALEA in 1994 to make clear a telecommunications
carrier’s duty to cooperate in the interception of communications. Although Europe has a
single organization, America has multiple organizations with different proposals, even
though those are fairly similar at a high level.
As a precondition for TSP’s assistance, LEA should serve TSP with the necessary legal
authorization identifying the target subscriber, the communications and information to be
accessed, and service areas where the communications and information can be accessed.
After this authorization is obtained, the TSP shall perform access and delivery for
transmission to the LEA’s procured equipment, facilities, or services.
The requirements for TSP vary from country to country, but many requirements remain
common at a high level, such as: the LI must not be detectable by the target subscriber, must
have a capability of encryption, must provide a separate interface for call data/content, and
must be able to do multiple simultaneous intercepts on a target subscriber, and so on.
The functions needed to perform LI are broadly categorized as AF, DF, CF, SPAF, and
LEAF. The AF, DF, and SPAF are the responsibility of TSP, and the CF and LEAF are the
responsibility of LEA.
Interface INI-1 is used for MD to request IAP interception of call data and content. Interface
INI-2 is used for IAP to send call data to MD. Interface INI-3 is used for IAP to send call
content to MD. Interface HI-1, HI-2, and HI-3 are used between LEA and MD to send the
request and receive the response of interception.
There could be a variety of operational issues on LI, depending on the requirements and unique
circumstances country by country. The typical issues are detection by the target subscriber,
identifying location/address information for call content interception, content encryption,
unauthorized creation/detection, special call type (call forward/transfer), and capacity.
End Notes
1 ATIS T1.678, “Lawfully Authorized Electronic Surveillance (LAES) for Voice
over Packet Technologies in Wireline Telecommunications Networks,” Alliance
for Telecommunications Industry Solutions, http://www.atis.org.
2 TIA ANSI/J-STD-025A, “Lawfully Authorized Electronic Surveillance,”
Telecommunications Industry Association, http://www.tiaonline.org/.
3 PacketCable PKT-SP-ESP1.5-I02-070412, “Electronic Surveillance,” PacketCable,
http://www.packetcable.com.
4 RFC 3924, “Cisco Architecture for Lawful Intercept in IP Networks,” F. Baker,
B. Foster, C. Sharp, October 2004.
5 Cisco Service Independent Intercept Architecture, Cisco Systems, http://
www.cisco.com/en/US/technologies/tk583/tk799/technologies_design_
guide09186a0080826773.pdf.
This chapter covers the methodology of deploying Lawful Interception in the following
interfaces:
• Intercept Request Interface
• Call Data Connection Interface
• Call Content Connection Interface
Chapter 11 Lawful Interception Implementation
Chapter 10 covered the fundamentals of Lawful Interception, such as definition,
requirements, basic architecture, and so on. This chapter covers the next step, how to
implement each of the fundamentals in the VoIP service environment.
Even if the requirement from the Law Enforcement Agent (LEA) is fixed, the implementation
varies from one Telecommunications Service Provider (TSP) to another. The primary reason
for this variance is that no single standard governs the method of implementation for
the whole LI architecture. In fact, most standard specifications cover only part of the
implementation, namely the interfaces between functional modules.
To show you how to implement these fundamentals, this chapter focuses on the following
interfaces with corresponding specifications in the United States, in order to give you
guidelines:
• Intercept Request Interface (INI-1)
— SIP P-DCS
— Cisco SII (Service Independent Intercept)
• Call Data Connection Interface (INI-2, HI-2)
— PacketCable Electronic Surveillance Specification
• Call Content Connection Interface (INI-3, HI-3)
— PacketCable Electronic Surveillance Specification
NOTE The content in this chapter does not represent any legal requirements or obligations for a
certain country, but provides general guidelines based on common specifications. Readers
who are planning or implementing LI in their VoIP service network should look at country-
specific requirements and standards.
308 Chapter 11: Lawful Interception Implementation
[Figure: the three LI interfaces: Intercept Request (INI-1, dynamic triggering or
pre-provisioning), Call Data (INI-2), and Call Content (INI-3)]
There are two methods of requesting interception: dynamic triggering and pre-provisioning.
Dynamic triggering means that the MD has full control of VoIP signals and sends a request
message to the IAP when the target subscriber is detected in the signaling path. That is, the
MD keeps monitoring all signals between endpoints and detects the target, whereas the IAP
has no information about the target beforehand and just responds passively according to the
MD’s request. Some VoIP protocols provide this kind of interface; one example is the SIP
P-DCS-LAES header (RFC 3603).
Pre-provisioning means that the MD provisions the request to the IAP beforehand so that
the IAP may send back intercepted call data or content when the IAP detects the target
subscriber. That is, the IAP should know information about the target beforehand and
monitor the traffic passing by. This kind of interface is usually proprietary, and Cisco’s
Service Independent Intercept (SII) is an example.
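The difference between the two methods can be sketched in a few lines of Python. This is only an illustration of who holds the target list and who detects the target; the class names, method names, and subscriber identifiers are hypothetical and do not correspond to any product interface.

```python
from dataclasses import dataclass, field


@dataclass
class Iap:
    """Intercept access point (illustrative model only)."""
    name: str
    provisioned_targets: set = field(default_factory=set)
    active_intercepts: list = field(default_factory=list)

    def start_intercept(self, subscriber: str) -> None:
        self.active_intercepts.append(subscriber)

    def on_traffic(self, subscriber: str) -> None:
        # Pre-provisioning: the IAP already knows the target and detects it itself.
        if subscriber in self.provisioned_targets:
            self.start_intercept(subscriber)


@dataclass
class MediationDevice:
    targets: set

    def on_signaling(self, subscriber: str, iap: Iap) -> None:
        # Dynamic triggering: the MD watches the signaling and instructs the IAP.
        if subscriber in self.targets:
            iap.start_intercept(subscriber)

    def pre_provision(self, iap: Iap) -> None:
        # Pre-provisioning: the MD pushes the target list to the IAP beforehand.
        iap.provisioned_targets |= self.targets


md = MediationDevice(targets={"sip:alice@example.com"})

dynamic_iap = Iap("iap-dynamic")  # never learns the target list
md.on_signaling("sip:alice@example.com", dynamic_iap)
md.on_signaling("sip:bob@example.com", dynamic_iap)

static_iap = Iap("iap-preprovisioned")  # holds the target list itself
md.pre_provision(static_iap)
static_iap.on_traffic("sip:alice@example.com")
static_iap.on_traffic("sip:bob@example.com")
```

In both cases only the target's traffic is intercepted; what changes is whether the IAP ever holds target information.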
The following sections cover the details of SIP P-DCS headers and Cisco SII to give you a
better understanding of intercepting between the MD and the IAP.
NOTE Both SIP P-DCS headers and Cisco SII define only the requesting interface. The response
interface (for example, formatting call data or content) is defined by other organizations
such as PacketCable, as described in the section “Call Data and Content Connection
Interfaces” in this chapter.
Intercept Request Interface 309
[Figure: intercept process flow with SIP P-DCS headers; recoverable labels: 1. Call Request
(INVITE), 4. Response without P-DCS, 5. Media Stream, Target Subscriber]
Step 2 The IAP receives the INVITE and forwards it to an MD that has a proxy
function.
[Figure: intercept process flow with SIP P-DCS headers; recoverable labels: 2. Call Request
with P-DCS Header (INVITE), 3. Call Request without P-DCS Header (INVITE), 4. Response,
5. Response with P-DCS Header, 7. Media Stream, 8. Intercepted Call Data and/or Content,
Target Subscriber]
Step 4 The endpoint of the target subscriber responds to the call request.
Step 6 The MD sends the response to the originating proxy. If the IAP is unable
to perform the required surveillance, the MD should include a P-DCS-LAES
header in the first reliable non-100 response, requesting that the
originating proxy perform the surveillance. The P-DCS-LAES header
should include the address and port of the MD that is to receive a copy
of the call data and/or content.
Step 7 After call setup, media channels are opened, and the target subscriber
sends and receives media through an IAP, which could be different from
the IAP handling the signaling.
Step 8 The IAP sends intercepted (duplicated) call data and/or content to an MD,
which could be different from the MD handling the signaling.
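To make Step 6 more concrete, the following sketch builds a P-DCS-LAES header line that carries the MD's address and port. The parameter layout used here (a host:port for call data, followed by an optional `content=` host:port for call content) is an assumption that should be verified against RFC 3603 before use; the function name, host names, and port numbers are hypothetical.

```python
from typing import Optional


def build_p_dcs_laes(data_host: str, data_port: int,
                     content_host: Optional[str] = None,
                     content_port: Optional[int] = None) -> str:
    """Return a P-DCS-LAES header line.

    Assumed layout (verify against RFC 3603):
        P-DCS-LAES: host:port[;content=host:port]
    """
    value = f"{data_host}:{data_port}"
    if content_host is not None and content_port is not None:
        # Optional destination for the duplicated call content.
        value += f";content={content_host}:{content_port}"
    return f"P-DCS-LAES: {value}"


# Hypothetical MD address; call data and call content go to different ports.
header = build_p_dcs_laes("md.example.net", 5060, "md.example.net", 40000)
print(header)
```

A proxy that receives such a header would copy call data, and call content if requested, to the indicated addresses.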
In this section so far, you have learned about the intercept process flows for inbound and
outbound calls with SIP P-DCS headers, which is the method of dynamic triggering. The
next subsection covers the pre-provisioning method with Cisco SII.
Cisco SII
The SII architecture was developed by Cisco to provide compliance with LI legislation and
regulations. It is defined in RFC 3924² and the Cisco SII Architecture document,³ on which
this section is based.
It provides a common approach for intercepting IP communications using existing network
elements. The architecture addresses the key LI requirements and does so in a cost-effective
manner. Key features of the architecture include the following:
• Use of standard access list technology to provide the intercept.
• Encapsulation of the entire intercepted and replicated packet so that the original
source and destination addresses are available (important information for intercept
purposes).
• Use of a control plane for intercept that is different from call control, which prevents
network operations personnel from detecting the presence of active intercepts in the
network (see the following Note).
• An integrated approach that limits the intercept activity to the router or gateway that
is handling the target’s IP traffic and only activates an intercept when the target is
accessing the network.
• No LI-related command-line interface (CLI) commands that could allow for the
detection of intercept activity on a router or gateway.
• LI-related Management Information Bases (MIBs) and traps sent only to the (third-
party) equipment controlling the intercept.
• Support for multiple encapsulation and transport formats (for example, PacketCable
Electronic Surveillance Specification, described in the section “Call Data and Content
Connection Interfaces”).
NOTE A control plane defines the transport used for sending or receiving the messages that initiate
the LI. Because it is important that unauthorized network operations personnel not know
that intercepts are active on the network, it is important to hide or keep separate the active
intercept messages from those messages used for routine call setup. However, many TSPs
routinely monitor all messages for diagnostic purposes, so the personnel may be able to
learn of the interception.
Device Interfaces
Figure 11-4 illustrates the device interfaces in the context of the specific devices that are
used in a Cisco SII network. Note that Call Management Server (CMS), edge router, and
trunking gateway have the call data and call content IAP function in this picture. Access
server and authentication, authorization, and accounting (AAA) server are used for “data”
interception that is beyond the scope of this book.
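[Figure 11-4: device interfaces in a Cisco SII network; elements shown: LI Administration
Function, Collection Function, LEA Network, Mediation Device, Data Access Server, DNS Server,
AAA Server, CMS, PSTN, and Target Subscriber, connected by interfaces a, c, d1, d2, e1, e2,
and k]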
Interface Description
d1 This is the delivery interface. The call data IAP uses this interface to deliver call data
to the mediation device. For voice, this is according to the PacketCable EventMessages
Specification document. For data, this is Remote Authentication Dial-In User Service
(RADIUS) accounting messages.
For voice intercepts, the IAP is the call control entity (call agent, SIP proxy, or H.323
gateway). For data intercepts, the IAP is the AAA server (or a sniffer monitoring
RADIUS traffic).
d2 The call content IAP replicates call content and sends it to the mediation device. The
call content IAP encapsulates the packets with additional User Datagram Protocol
and IP headers and a 32-bit call content connection identifier (CCCID) header, based
on the PacketCable Electronic Surveillance Specification document. The CCCID is
used to associate the call content with the target.
The CCCID is included so that the mediation device can map intercepts to the
appropriate warrants. Usually, the mediation device will rewrite the CCCID before
forwarding intercept information to CFs.
The call content IAP is an edge router, trunking gateway, or access server.
e1 The mediation device uses Secure Shell (SSH) to provision an intercept on the call
data IAP.
e2 The mediation device uses Simple Network Management Protocol version 3
(SNMPv3) to instruct the call content IAP to replicate call content and send it to the
mediation device. The call content IAP can be either an edge router or a trunking
gateway for voice.
k The mediation device queries the Domain Name Service (DNS) server to determine
the fully qualified domain name (FQDN) of the call content IAP.
Figure 11-5 Standard Intercept Process Flow
[Sequence diagram; numbered messages in order: 1 Court Order, 2 Configuration,
3 Enable Intercept, 4 Incoming Call, 5 Signaling_Start, 6 Termination Attempt, 7 SDP,
8 QoS_Reserve, 9 Query, 10 SNMPv3 Command, 11 CCOpen, 12 Call, 13 Ring, 14 CC, 15 CC,
16 Call_Answer, 17 Answer, 18 Call_Release, 19 Release, 20 QoS_Stop, 21 SNMPv3 Destroy,
22 CCClose]
Step 14 The call is connected end-to-end, and the edge router intercepts and
replicates all voice packets and sends the packets to the mediation device.
Step 15 The mediation device delivers call content to the CF.
Step 17 The mediation device forwards this message as an Answer message to the
CF.
Step 18 When the parties hang up, the CMS sends a Call_Release message to the
mediation device.
Step 19 The mediation device forwards this message as a Release message to the
CF.
Step 20 The CMS sends a QoS_Stop message to the mediation device.
Step 21 When the mediation device receives the QoS_Stop message, it sends
SNMPv3 messages to the edge router instructing it to destroy the call
content monitoring sessions and the mediation device MIB. Three
destroy messages are sent: one for each of the two call content streams
and one for the mediation device MIB.
Step 22 The mediation device sends a CCClose message to the CF.
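The mediation device's role in this flow is essentially a translation table: each message from the CMS triggers a fixed set of follow-up actions toward the edge router and the CF. The sketch below summarizes that mapping using the message names from Figure 11-5. The function name and the wording of the action strings are illustrative only and do not come from any Cisco or PacketCable interface.

```python
def md_actions(cms_message: str) -> list:
    """Return the mediation device's follow-up actions for a CMS message."""
    table = {
        # Messages 8 to 11: locate the edge router, start the tap, tell the CF.
        "QoS_Reserve": [
            "Query DNS server for the edge router",
            "SNMPv3 Command to edge router",
            "CCOpen to CF",
        ],
        # Messages 16 and 17.
        "Call_Answer": ["Answer to CF"],
        # Messages 18 and 19.
        "Call_Release": ["Release to CF"],
        # Messages 20 to 22: two content streams plus the MD MIB entry.
        "QoS_Stop": [
            "SNMPv3 Destroy to edge router (content stream 1)",
            "SNMPv3 Destroy to edge router (content stream 2)",
            "SNMPv3 Destroy to edge router (mediation device MIB)",
            "CCClose to CF",
        ],
    }
    return table.get(cms_message, [])


for message in ("QoS_Reserve", "Call_Answer", "Call_Release", "QoS_Stop"):
    print(message, "->", md_actions(message))
```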
This is the intercept process flow for a standard call. The next topic is the process flow for
a forwarding call.
Figure 11-6 Standard Intercept Process Flow
[Sequence diagram; numbered messages in order: 1 Court Order, 2 Configuration,
3 Enable Intercept, 4 Call, 5 Signaling_Start, 6 Termination Attempt, 7 Service_Instance,
8 Service_Instance, 9 QoS_Reserve, 10 Query, 11 SNMPv3 Command, 12 CCOpen, 13 CC, 14 CC,
15 Call_Release, 16 Release, 17 QoS_Stop, 18 SNMPv3 Destroy, 19 CCClose]
This is the intercept process flow for a forwarding call. The next topic is the process flow
for a conference call.
[Figure: intercept process flow for a conference call. Sequence diagram; numbered messages
in order: 1 Court Order, 2 Configuration, 3 Enable Intercept, 4 Outgoing Call,
5 Signaling_Start, 6 Originating Attempt, 7 SDP, 8 QoS_Reserve, 9 Query, 10 SNMPv3 Command,
11 CCOpen, 12 Call, 13 Ring, 14 CC, 15 CC, 16 Call_Answer, 17 Answer, 18 Hook Flash,
19 Signaling_Start, 20 Origination Attempt, 21 SDP, 22 QoS_Reserve, 23 Query, 24 SNMPv3,
25 CCOpen, 26 Call, 27 Ring, 28 CC, 29 CC, 30 Call_Answer, 31 Answer, 32 Hook Flash,
33 Service_Instance Message, 34 2 Call_Release Messages, 35 2 Release Messages,
36 2 QoS_Stop Messages, 38 2 CCClose Messages]
Step 8 The CMS sends the SDP information to the mediation device in a
QoS_Reserve message.
Step 9 The mediation device queries the DNS server to determine the IP address
of the edge router (based on the IP address of the target gateway).
Step 10 The mediation device sends an SNMPv3 command to the edge router to
initiate the intercept.
Step 11 The mediation device sends a CCOpen message with the SDP to the CF.
Step 14 The call is connected end-to-end, and the edge router intercepts and
replicates all voice packets and sends the packets to the mediation device.
Step 15 The mediation device delivers call content to the CF.
Step 17 The mediation device forwards this message as an Answer message to the CF.
Step 18 The target hook-flashes to put nontarget subscriber 1 on hold
and initiate a second call.
Step 19 The CMS sends a Signaling_Start message to the mediation device.
Step 22 The CMS sends the SDP information to the mediation device in a
QoS_Reserve message.
Step 23 The mediation device queries the DNS server to determine the IP address
of the edge router (based on the IP address of the target gateway).
Step 24 The mediation device sends an SNMPv3 command to the edge router to
initiate the intercept.
Step 25 The mediation device sends a CCOpen message with the SDP to the CF.
Step 28 The call is connected end-to-end, and the edge router intercepts and
replicates all voice packets and sends the packets to the mediation device.
Step 29 The mediation device delivers call content to the CF.
Step 37 When the mediation device receives the QoS_Stop message, it sends
SNMPv3 messages to the terminating gateway instructing it to destroy
the call content monitoring sessions and the mediation device MIB. Six
destroy messages are sent: three for each part of the three-way call.
Step 38 The mediation device sends two CCClose messages to the CF.
This is the intercept process flow for a conference call. The next section covers what you
need to consider before implementing LI in your network.
Predesign Considerations
Before configuring your network for LI, you should establish or verify reliable end-to-end
IP connectivity on your existing network. The main concern when designing an LI network
is ensuring that the network has sufficient bandwidth and CPU capacity to handle the
anticipated load of intercepts. This section focuses on the following considerations as an
initial stage of designing an LI network:
• Bandwidth and processing power
• IP address provisioning
The CPUs of the following devices will be impacted by LI:
• Edge router
It should be able to intercept and replicate all intercepted IP communication on its
section of the network.
• Trunking gateway
It should be able to intercept and replicate all intercepted calls that are forwarded off-net.
• Mediation device
It should be able to support the required maximum number of simultaneous intercepts.
The following interfaces should be engineered with sufficient bandwidth to support
LI traffic:
• Between call data IAP and mediation device
• Between call content IAP and mediation device
• Between mediation device and CFs
You should also take into consideration that three-way calls require twice the bandwidth of
regular calls because they require two pairs of transmit and receive channels.
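As a rough sizing aid, the following sketch estimates the bandwidth needed on the link between the call content IAP and the mediation device. It rests on several assumptions that are not taken from this chapter: G.711 with 20 ms packetization (160-byte payload, 50 packets per second), IPv4 without options, and an encapsulation overhead of one additional IP header, one UDP header, and a 32-bit CCCID per packet. Layer 2 overhead is ignored, and the function names are illustrative.

```python
def stream_kbps(payload_bytes: int, packets_per_second: int, header_bytes: int) -> float:
    """IP-layer bit rate of one replicated media stream, in kbit/s."""
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000


# Assumed codec: G.711 with 20 ms packetization.
PAYLOAD_BYTES = 160
PACKETS_PER_SECOND = 50
ORIGINAL_HEADERS = 20 + 8 + 12  # IPv4 + UDP + RTP headers of the original packet
ENCAP_HEADERS = 20 + 8 + 4      # outer IPv4 + UDP + 32-bit CCCID added by the IAP


def intercept_kbps(regular_calls: int, three_way_calls: int) -> float:
    """Total IAP-to-MD bandwidth for the given number of intercepted calls."""
    per_stream = stream_kbps(PAYLOAD_BYTES, PACKETS_PER_SECOND,
                             ORIGINAL_HEADERS + ENCAP_HEADERS)
    # A regular call has one transmit/receive pair; a three-way call has two pairs.
    streams = 2 * regular_calls + 4 * three_way_calls
    return per_stream * streams


print(intercept_kbps(regular_calls=10, three_way_calls=0))
print(intercept_kbps(regular_calls=0, three_way_calls=10))
```

Under these assumptions, ten intercepted three-way calls need twice the bandwidth of ten regular calls, which matches the rule of thumb above.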
You should also provision a network management system to perform DNS and DHCP, such
as Cisco Network Registrar.
The use of SNMPv3 in SII requires that Network Time Protocol (NTP) is enabled and that
all network elements involved in LI are synchronized to a stable time source.
The various devices involved in LI have minimum software and memory requirements that
must be met.
In general, Cisco recommends that service providers do not use static IP addresses,
particularly for Customer Premises Equipment (CPE). Static provisioning of IP addresses is
time-consuming, expensive, and error-prone. On the IAPs, it can be helpful to use loopback
interfaces for the interface with the mediation device because the loopback interface
remains constant if physical interfaces go out of service or if the routing path changes.
Security Considerations
Given the sensitive nature of lawful intercept—both from the standpoint of the need to
protect sensitive data, and to conceal the identities of law enforcement agencies and the
intercept targets—the LI architecture must contain stringent security measures to combat
the following types of threats:
• Impersonation of LEAs and mediation devices
• Privacy and confidentiality breaches
• Message forgery
• Replay attacks
Because LI is expected to run on the wide-open Internet, very few assumptions should be
made about how well the networks of the LEAs and TSPs can be secured. Although this
section does not examine physical security or the hardening of operating systems and
applications on the principals of the LI architecture, these are clearly important
considerations.
In particular, both the MD and LEA servers must be considered prime targets for attacks by
hackers. Hardening measures commensurate with other highly vulnerable servers, such as
key distribution and AAA servers, must be considered in any design.
All interfaces must be able to provide strong cryptographic authentication to establish the
identity of the principals, and must correlate the identity of each principal with the action
it is attempting to perform. That is, authentication alone must not be taken to imply any
specific authorization.
Providing the ability to use strong crypto is not identical to requiring its use. Because many
Cisco devices do not have crypto accelerators, actual use of crypto accelerators is the choice
of the TSP, and is dependent on how the device is deployed and its relative exposure. For
devices placed in open, hostile environments (such as access routers), TSPs must consider
customer requirements for LI when making decisions about crypto acceleration hardware.
Because LI is an interesting target for attackers, all interfaces must perform some sort of
cryptographic message integrity checking (such as Hash-based Message Authentication
Code with Message Digest 5 [HMAC-MD5]). Message integrity checking must also counter
replay attacks. Because of privacy and confidentiality considerations, the architecture
should allow for the use of encryption. Although encryption is not necessarily a requirement,
it is highly recommended and may be a requirement in some LI deployments.
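The combination of message integrity checking and replay protection can be illustrated with Python's standard `hmac` module. This is a minimal sketch, not the mechanism used by any LI interface: it substitutes HMAC-SHA-256 for the HMAC-MD5 mentioned above, uses a monotonically increasing sequence number as the anti-replay value, and the key, message layout, and names are all hypothetical.

```python
import hashlib
import hmac
import struct

KEY = b"example-preshared-key"  # hypothetical preshared key
TAG_LEN = hashlib.sha256().digest_size


def protect(seq: int, body: bytes, key: bytes = KEY) -> bytes:
    """Prefix a 64-bit sequence number and append an HMAC over both."""
    header = struct.pack("!Q", seq)
    tag = hmac.new(key, header + body, hashlib.sha256).digest()
    return header + body + tag


class Receiver:
    """Accepts a message only if its tag verifies and its sequence number is new."""

    def __init__(self, key: bytes = KEY) -> None:
        self.key = key
        self.last_seq = -1

    def accept(self, message: bytes):
        if len(message) < 8 + TAG_LEN:
            return None
        header, body, tag = message[:8], message[8:-TAG_LEN], message[-TAG_LEN:]
        expected = hmac.new(self.key, header + body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # forged or corrupted message
        (seq,) = struct.unpack("!Q", header)
        if seq <= self.last_seq:
            return None  # replayed or stale message
        self.last_seq = seq
        return body


receiver = Receiver()
message = protect(1, b"start intercept")
print(receiver.accept(message))  # accepted the first time
print(receiver.accept(message))  # rejected as a replay
```

Because the sequence number is covered by the tag, an attacker cannot change it to make a captured message look new.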
• Interface between MD and call data IAP: Control—SSH is used for the control
interface between the MD and the call data IAP.
• Interface between MD and call content IAP: SNMPv3 Control—SNMPv3 View-
based Access Control Model (VACM) and User-Based Security Model (USM) are
used for the control interface between the MD and the call content IAP. The native
SNMPv3 security module mechanism must be used, and the minimum requirement is
that preshared keys must be supported. The additional requirement is that the IAP
must support the ability to protect the LI MIBs from disclosure or control by unauthorized
USM users. In general VACM should provide the necessary tools to limit the views to
particular USM users, but there are also special considerations given that USM and
VACM provide the ability to create arbitrary view/user mappings to authorized entities.
The security requirements of the Cisco Lawful Intercept Control MIB (CISCO-TAP-
MIB) with respect to SNMP require the following actions: The MIB must be accessed
(or accessible) only via SNMPv3. By default, no access must be granted to the MIB.
Access to the MIB must be granted only by an administrative authority with the
highest privileges: the CISCO-TAP-MIB can be added to a view only at privilege level
15 (the highest level), and including CISCO-TAP-MIB into a view on a router via the
SNMP-VACM-MIB will be disallowed.
SNMPv3 must be configured correctly to maintain security. The MD acts as a network
manager and the call content IAP acts as an agent.
• Interface Between MD and call data IAP: Data—The call data is delivered from
the call data IAP to the MD. This information is delivered in RADIUS format.
Currently, this information is not encrypted.
• Interface Between MD and call content IAP: Content—The call content informa-
tion is delivered from the call content IAP to the MD. IP security (IPSec via standard
router cryptographic features) is used for this interface.
These are security considerations when designing SII architecture. The next section shows
configuration examples with a few Cisco devices.
Configuration Example
Because it would be impossible to show all configuration examples of devices according to
each service scenario, the purpose of this section is to give a glimpse of configuration with
a few Cisco devices: aggregation router, Cisco BTS 10200, and Cisco PGW 2200. Note that
the example of a mediation device is not shown here. You may need to contact MD vendors,
such as SS8 or Acme Packet, for configuration examples with Cisco devices.
Aggregation Router
The following aggregation router platforms support version 1.0 of Cisco LI MIB:
• Cisco 7200 series routers
• Cisco 7301 router
• Cisco 7500 series routers
• Cisco 10000 Edge Services Router (ESR)
• Cisco 12000 Gigabit Switch Router (GSR)
• Cisco Universal Broadband Router (uBR) 7246
• Cisco uBR 10000
The following configuration enables Cisco SII on an aggregation router using version 1.0
of the Cisco LI MIB:
7200-egw(config)# snmp-server view tapView CTapMIB included
7200-egw(config)# snmp-server group tapGroup v3 auth read tapView write tapView notify tapView
7200-egw(config)# snmp-server user mduserid tapGroup v3 auth md5 mdpasswd
The following configuration synchronizes the router’s clock with the mediation device and
enables SNMP traps to be sent to the mediation device:
7200-egw(config)# snmp-server enable traps snmp authentication linkdown linkup coldstart warmstart
7200-egw(config)# snmp-server host 10.15.113.9 version 3 auth mduserid
7200-egw(config)# ntp server 10.15.113.9
The “mduserid” username and “mdpasswd” password must match the username and
password that are provisioned on the mediation device for this particular router. In this case,
the router’s clock is synchronized to the mediation device’s clock. A better option is to
synchronize all devices in the network to an NTP time server.
Cisco BTS 10200
Because the BTS 10200 call agent has no information about network topology and is not
aware of aggregation routers, no configuration is necessary for aggregation routers.
On the call agent’s profile for trunking gateways, local hairpinning (that is, sending a call
back in the direction that it came from) must be disabled. The following line in the trunking
gateway profile disables local hairpinning:
MGCP_HAIRPIN_SUPP=N
Cisco PGW 2200
Before adding an MD to the Cisco PGW 2200, you should verify that LI is enabled by
verifying that the “SysConnectDataAccess=true” and “LISupport=enable” parameters are
set as shown in the /opt/CiscoMGC/etc/XECfgParm.dat file.
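A check like this can be scripted. The following sketch scans configuration text for the two parameters named above. The exact line layout of XECfgParm.dat is not shown in this chapter, so the sample text and the tolerant `name = value` matching are assumptions; the function name is hypothetical.

```python
import re

# Parameter names and values taken from the paragraph above.
REQUIRED = {"SysConnectDataAccess": "true", "LISupport": "enable"}

# Hypothetical sample; the real file layout may differ.
SAMPLE = """\
*.SysConnectDataAccess = true
*.LISupport = enable
"""


def li_enabled(config_text: str) -> bool:
    """Return True if every required parameter is present with the expected value."""
    found = {}
    for line in config_text.splitlines():
        # Accept 'name=value' with optional spaces and an optional prefix.
        match = re.search(r"(\w+)\s*=\s*(\S+)", line)
        if match:
            found[match.group(1)] = match.group(2)
    return all(found.get(name) == value for name, value in REQUIRED.items())


print(li_enabled(SAMPLE))
```

On a real system, the text would be read from the file path given above rather than from a sample string.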
Following is an example of provisioning a mediation device using default RADIUS timeouts
and retries. The recommended RADIUS key of 16 zeros is automatically provisioned.
mml> prov-add:extnode:name="mdname",type="LIMD",desc="Mediation_Device"
mml> prov-add:lipath:name="md-path",desc="MD_Path",extnode="mdname"
mml> prov-add:iplnk:name="md-link",desc="MD_link",svc="md-path",ipaddr="IP_Addr2",port=14146,peeraddr="192.168.9.2",peerport=1813,pri=1
In the preceding example, the “ipaddr” value is selected from the /opt/CiscoMGC/etc/
XECfgParm.dat file and must match the physical interface that has connectivity to the
mediation device.
In this subsection, you have learned about Cisco SII interfaces, intercept flows for various
calls, and predesign and security considerations. The next section covers call data and
content connection interfaces.
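[Figure: call data (CD) and call content (CC) connections on either side of the demarcation
point; one side follows a PacketCable, TIA, or ATIS specification, and the other side is
proprietary or follows the PacketCable specification]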
The specification of the call data and content interface between the IAP and the MD varies
and is mostly proprietary, depending on the TSP’s preference. A TSP may deliver the original
call signaling and media (RTP packets) without any formatting, encrypt them with its own
method, or apply one of the known specifications, such as PacketCable Electronic Surveillance.
However, the specification of the call data and content interface between the MD and the LEA
is relatively fixed: Europe adopts ETSI (102 series), and America adopts PacketCable
(PKT-SP-ESP), ATIS (T1 series), or TIA (J-STD series), which are very similar at a high
level.
This section focuses on the connection interface between the MD and the LEA, and refers to
the PacketCable PKT-SP-ESP specification.⁵ The first topic is the Call Content Connection
Interface, as described in the following section.
NOTE All LI functions use an initial capital letter for each function’s name followed by the letter
“F,” such as “DF” for “Delivery Function,” because these names are defined by LI specifica-
tions and not as general terms.
The CCC datagram should contain a timestamp that allows Law Enforcement to identify
the time at which the corresponding information was detected by the DF. This timestamp
should have an accuracy of at least 200 milliseconds. The CCC datagram should be queued
at the DF for transmission to the CF within 8 seconds of detection of the corresponding
packet by the IAD 95 percent of the time.
The delivery of a particular CCC datagram to the CF depends on many factors not under
the control of the TSP, such as the bandwidth between the DF and CF. These factors may
affect the ability of the TSP to meet the transmission criterion, and this specification does
not require the TSP to take steps to counteract delays caused by such factors.
Call content should be delivered as a stream of User Datagram Protocol (UDP)/IP datagrams,
sent to the port number at the CF that was provided during provisioning of the interception.
The UDP/IP payload should adhere to the format shown in Figure 11-9.
Call Data and Content Connection Interfaces 331
The CCC-Identifier in Figure 11-9 is provided by the DF in the CCOpen message (begin-
ning of call content delivery in PacketCable specification). It is a 32-bit quantity and is used
to identify the intercept order to the LEA.
A conversation typically consists of two separate packet streams, each corresponding to a
direction of the communication. Both are delivered to the demarcation point with the same
CCC-Identifier. The party listening to the communication is identified by the combination
of Destination Address (from Original IP Header) and Destination Port (from Original
UDP Header). The Destination Address and Destination Port for both parties involved in
the communication are provided in the SDP information provided to the LEA as part of the
CCOpen message.
The DF should generate a CCC-Identifier that is different from all other CCC-Identifiers in
use between that DF and a particular LEA. That is, two streams of content delivered to a
single LEA must have different CCC-Identifiers, but a single stream of content delivered to
multiple LEAs may use a single CCC-Identifier, so long as no other stream being delivered
to one of the LEAs is using the same CCC-Identifier.
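The uniqueness rule can be expressed as a small allocator that tracks which identifiers are in use toward each LEA. This is an illustrative sketch only: the class and method names are hypothetical, and a real DF would also have to handle persistence and concurrency.

```python
class CccIdAllocator:
    """Hands out 32-bit CCC-Identifiers that are unique per LEA."""

    def __init__(self) -> None:
        self.in_use = {}  # LEA name -> set of identifiers in use toward that LEA

    def allocate(self, leas: list) -> int:
        """Return one identifier that is unused toward every LEA in the list."""
        for candidate in range(2**32):
            if all(candidate not in self.in_use.get(lea, set()) for lea in leas):
                for lea in leas:
                    self.in_use.setdefault(lea, set()).add(candidate)
                return candidate
        raise RuntimeError("no free CCC-Identifier")

    def release(self, ccc_id: int, leas: list) -> None:
        """Free the identifier when content delivery ends."""
        for lea in leas:
            self.in_use.get(lea, set()).discard(ccc_id)


allocator = CccIdAllocator()
shared = allocator.allocate(["lea-1", "lea-2"])  # one stream, two LEAs, one identifier
second = allocator.allocate(["lea-1"])           # another stream to lea-1 must differ
other = allocator.allocate(["lea-3"])            # a different LEA may reuse a value
print(shared, second, other)
```

The third allocation shows that the same value can be reused toward an LEA that is not receiving any stream with that identifier.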
The Timestamp in Figure 11-9 should adhere to the NTP time format: a 64-bit unsigned
fixed-point number, in seconds relative to 0000 on 1 January 1900. The integer (whole
seconds) part is in the first 32 bits and the fractional part (fractional seconds) is in the last
32 bits. The timestamp should be accurate to within 200 milliseconds of the time the DF
received the datagram.
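The following sketch packs a CCC datagram payload from these two fields. Figure 11-9 is not reproduced in this text, so the field order (CCC-Identifier first, then the timestamp, then the intercepted packet) is an assumption to be checked against the PacketCable specification. The function names are hypothetical; the offset of 2,208,988,800 seconds between the NTP epoch (1900) and the Unix epoch (1970) is standard.

```python
import struct
import time

NTP_UNIX_OFFSET = 2_208_988_800  # seconds from 1 January 1900 to 1 January 1970


def ntp_timestamp(unix_seconds: float) -> int:
    """Convert a non-negative Unix time to a 64-bit NTP fixed-point timestamp."""
    whole = int(unix_seconds)
    seconds = whole + NTP_UNIX_OFFSET               # upper 32 bits: whole seconds
    fraction = int((unix_seconds - whole) * 2**32)  # lower 32 bits: fractional seconds
    return (seconds << 32) | fraction


def build_ccc_payload(ccc_id: int, unix_seconds: float, intercepted: bytes) -> bytes:
    """Assumed layout: 32-bit CCC-Identifier, 64-bit timestamp, intercepted packet."""
    return struct.pack("!IQ", ccc_id, ntp_timestamp(unix_seconds)) + intercepted


payload = build_ccc_payload(0x01020304, time.time(), b"original IP packet bytes")
print(len(payload), payload[:12].hex())
```

The `!` prefix selects network byte order with no padding, so the header is exactly 12 bytes.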
Intercepted Real-time Transport Protocol (RTP) information will be of the format shown in
Figure 11-10.
Note that protocols other than RTP may be intercepted, such as for T.38 fax relay.
The brief description of each header in Figure 11-10 is as follows:
• Original IP header—The IP header sent by the endpoint. Contained in this IP header
are the IP Source Address (SA) and IP Destination Address (DA), which identify the
Internet addresses of the source and destination of the packet.
• Original UDP header—The UDP header sent by the endpoint. Contained in this
UDP header are the Source Port and Destination Port, both of which are 16-bit
quantities that identify the connection between the two endpoints.
• Original RTP header—The RTP header sent by the endpoint identified in the SA and
Source Port. This header contains the packet formation timestamp, packet sequence
number, and payload type value, as generated by the source endpoint.
• Original payload—The bit sequence as sent by the endpoint identified in the SA
and Source Port. The payload typically contains the voice samples, as encoded and
encrypted by the sending endpoint. Encryption of the payload is by use of a stream
cipher, or other method. Encoding of the voice may be done through use of one of the
Internet Engineering Task Force’s (IETF’s) defined coder-decoder (codec) algorithms
(for example, G.711 or G.729) or through a dynamic payload type defined in the SDP
(for example, dual-tone multifrequency [DTMF] with RFC 2833). See the following
Note for transcoding.
NOTE Transcoding occurs whenever a voice signal encounters an edge device without compatible
codec support. The transcoding of communications content between encoding algorithms
does not effectively alter the original content if the new encoding algorithm supports at least
the same capabilities (that is, encoded frequency range) as the original encoding algorithm.
Intercepted content may be transcoded into a different encoding format if the new encoding
format provides at least the same level of information as the original encoding format. For
example, the G.711 encoding algorithm is acceptable for use in transcoding content originally
encoded in the G.728 or G.729E algorithms. If G.711 is used for the intercepted call,
the DF may pass the original RTP packets, unaltered and unencrypted. The DF should support
the ability to disable transcoding on a per-intercept basis.
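The original headers listed for Figure 11-10 can be pulled apart with Python's `struct` module. This sketch assumes an IPv4 packet carrying UDP and RTP, reads only the fields named in the descriptions above, and ignores RTP header extensions; the function name and the sample addresses and ports are hypothetical.

```python
import socket
import struct


def parse_intercepted(packet: bytes) -> dict:
    """Split an intercepted IPv4/UDP/RTP packet into the fields described above."""
    ip_header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    source, destination = packet[12:16], packet[16:20]
    source_port, dest_port = struct.unpack("!HH", packet[ip_header_len:ip_header_len + 4])
    rtp = packet[ip_header_len + 8:]        # the UDP header is always 8 bytes
    first, second, sequence, timestamp, _ssrc = struct.unpack("!BBHII", rtp[:12])
    csrc_count = first & 0x0F               # each CSRC entry adds 4 bytes
    return {
        "source": socket.inet_ntoa(source),
        "destination": socket.inet_ntoa(destination),
        "source_port": source_port,
        "destination_port": dest_port,
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "payload": rtp[12 + 4 * csrc_count:],
    }


# Build a small sample packet: 20-byte IPv4 header, 8-byte UDP header, 12-byte RTP header.
ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 44, 0, 0, 64, 17, 0,
                 socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
udp = struct.pack("!HHHH", 4000, 5004, 24, 0)
rtp_packet = struct.pack("!BBHII", 0x80, 0, 7, 160, 0x1234) + b"\xff" * 4
fields = parse_intercepted(ip + udp + rtp_packet)
print(fields)
```

The destination address and destination port together identify the listening party, as described earlier for the CCOpen message.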
CDC Messages
The CDC messages report call-identifying information accessed by an IAP. These IAPs
provide expeditious access to the reasonably available call-identifying information for calls
made by a target subscriber. This includes abandoned and incomplete call attempts, if
known to an IAP.
The CDC messages in Table 11-2 have been defined to convey information to an LEA for
call-identifying events on a call that result from a user action or a signal. Only the events
that are available to PacketCable elements providing intercept access functionality will be
reported using the messages shown in Table 11-2.
Table 11-2 CDC Messages of PacketCable
CDC Message (Call Events) Description
Answer A two-way connection has been established for a call under
surveillance.
CCChange A change in the description of call content delivery for a call under
interception.
CCClose End of call content delivery for a call under interception.
CCOpen Beginning of call content delivery for a call under interception.
ConferencePartyChange A third party or more additional parties are added to an existing call to
form a conference call, or any party in a conference call is placed on
hold or retrieved from hold.
DialedDigitExtraction The target subscriber dialed or signaled digits after a call is connected.
MediaReport Exchange of SDP information for new or existing calls for which only
call-identifying information is being reported.
NetworkSignal The PC/TSP network requested the application of a signal toward the
target subscriber.
Origination The IAP detects that the target subscriber is attempting to originate
a call.
Redirection A call under surveillance is redirected (for example, via termination
special service processing or via a call transfer).
Release The resources for a call under surveillance have been released.
ServiceInstance The IAP detects that a defined service event has occurred.
SubjectSignal The target subscriber sends dialing or signaling information to the
PC/TSP network to control a feature or service.
TerminationAttempt The IAP detects a call attempt to a target subscriber.
The examples for a basic call are divided into two: originating from and terminating to a
target subscriber.
For completed calls originating from a target subscriber under a communication intercept
order, nine call-identifying messages are generated for delivery to the LEA: Origination,
CCOpen (downstream), CCOpen (upstream), Answer, CCChange (downstream),
CCChange (upstream), CCClose (downstream), CCClose (upstream), and Release.
For completed calls terminating to a target subscriber under a communication interception
order, nine call-identifying messages are generated for delivery to the LEA: TerminationAttempt,
CCOpen (downstream), CCOpen (upstream), Answer, CCChange (downstream),
CCChange (upstream), CCClose (downstream), CCClose (upstream), and Release.
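The two sequences differ only in their first message, which the following sketch makes explicit. The function name and the `direction` values are illustrative; the message names and their order are those listed in the two paragraphs above.

```python
def basic_call_messages(direction: str) -> list:
    """Return the nine call-identifying messages for a completed basic call.

    direction is "originating" (call from the target subscriber) or
    "terminating" (call to the target subscriber).
    """
    first = {
        "originating": "Origination",
        "terminating": "TerminationAttempt",
    }[direction]
    return [
        first,
        "CCOpen (downstream)",
        "CCOpen (upstream)",
        "Answer",
        "CCChange (downstream)",
        "CCChange (upstream)",
        "CCClose (downstream)",
        "CCClose (upstream)",
        "Release",
    ]


for direction in ("originating", "terminating"):
    print(direction, basic_call_messages(direction))
```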
In addition to the CDC messages described here, other CDC messages might be generated
depending on the events that occur during a basic call. As examples, the NetworkSignal
message might be generated for events such as the application of dial tone (originating call)
and ringing (terminating call) toward the target subscriber, and the SubjectSignal message
might be generated for an event such as fax tone detection.
3 Redirection (to identify the redirection event and the redirected-to party)
4 CCOpen (downstream, if communication interception order)
If the redirection is done after the termination attempt, but before the call is answered, the
following sequence of messages is an example of what will be sent to the LEA.
1 TerminationAttempt (for the original terminating call to the target subscriber)
2 CCOpen (downstream, for the original call, if communication interception order)
7 Redirection (to identify the redirection event and the redirected-to party)
A blind transfer occurs only on an active call; that is, one that has already generated an
Origination or TerminationAttempt message, an Answer message, and (under a communication
interception order) CCOpen (downstream), CCOpen (upstream), CCChange (downstream), and
CCChange (upstream) messages to the LEA. When performed by a target subscriber on an active
call, the blind transfer may result in the following call-identifying messages:
1 Redirection (to identify the redirection event and the redirected-to party)
2 CCClose (downstream, of the old connection, if communication interception order)
When a blind transfer of a call under surveillance is performed by a subscriber not under
surveillance, the following sequence of call-identifying messages is an example of what
may be sent to the LEA:
1 CCClose (downstream, of the old connection, if communication interception order)
2 CCClose (upstream, of the old connection, if communication interception order)
When the multimedia terminal adapter (MTA) performs the bridging function, and the
initiator disconnects, the following sequence of call-identifying messages is an example
of what may be sent to the LEA:
1 CCClose (downstream, of the call between A and B, if communication interception
order)
2 CCClose (upstream, of the call between A and B, if communication interception
order)
3 Release (of the call between A and B)
7 Redirection (of the call between C and bridge, redirected-from bridge, redirected-to B)
Procurement, engineering, and sizing of the physical facilities connecting the DF to the CF
are the responsibility of the LEA. Engineering and sizing of the CF are also the responsibility
of the LEA.
When the resources necessary for transmission of call content or call-identifying information,
as provided by an LEA, are insufficient, the information is not required to be queued
by the DF. In other words, intercepted information may be delayed or discarded by the DF
if insufficient transmission capacity is provided by the LEA to the LEA’s CF.
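The "delayed or discarded" behavior can be pictured as a bounded buffer in front of the LEA-provided link. The sketch below is only one possible policy (discard the newest packet when the buffer is full); the class name `DeliveryBuffer`, its capacity parameter, and the drop policy are assumptions made for illustration, since the text does not prescribe how a DF should behave beyond not being required to queue.

```python
from collections import deque
from typing import Deque, Optional


class DeliveryBuffer:
    """Bounded buffer between the DF and the LEA-provided link (illustrative)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._queue: Deque[bytes] = deque()
        self.discarded = 0  # number of packets dropped for lack of capacity

    def offer(self, packet: bytes) -> bool:
        """Queue a packet for delivery; discard it if the buffer is full."""
        if len(self._queue) >= self.capacity:
            self.discarded += 1
            return False
        self._queue.append(packet)
        return True

    def next_to_send(self) -> Optional[bytes]:
        """Return the oldest queued packet, or None if nothing is waiting."""
        return self._queue.popleft() if self._queue else None
```

Counting discarded packets, as done here, is a design choice that makes capacity shortfalls visible to whoever sizes the link.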
It is the responsibility of the TSP to deliver CCC and CDC information to a demarcation
point. The demarcation point shall consist of a physical interconnect adjacent to the DF. The
LEA is responsible for providing the equipment, facilities, and maintenance needed to
deliver this information from the demarcation point to the CF.
The TSP should ensure that only those packets that have been authorized to be examined
by the LEA are delivered to the LEA at the demarcation point. For example, if there is more
than one LEA doing surveillance on the TSP’s network at a given point in time, each LEA
should see only the data that it is authorized to receive.
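A minimal sketch of this per-LEA separation follows. It assumes each intercepted packet carries a case identifier (directly or after correlation) and that the TSP keeps a table mapping each case to the one LEA authorized for it. The names `InterceptPacket`, `packets_for_lea`, and the `authorizations` table are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class InterceptPacket(NamedTuple):
    case_id: str    # surveillance case the packet belongs to
    payload: bytes  # CDC or CCC information


def packets_for_lea(
    lea: str,
    packets: Iterable[InterceptPacket],
    authorizations: Dict[str, str],
) -> List[InterceptPacket]:
    """Keep only the packets whose case is authorized for the given LEA.

    authorizations maps case_id -> LEA name. Packets whose case is unknown
    are never delivered to anyone.
    """
    return [p for p in packets if authorizations.get(p.case_id) == lea]
```

Filtering on an explicit allow table means an unprovisioned or mistyped case identifier results in no delivery rather than delivery to the wrong agency.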
The requirements in each network layer can be summarized as follows:
• Network layer interface—The network layer protocol for delivery of both CDC and
CCC information should be the Internet Protocol (IP). The transport protocol for CDC
information is as specified in the section “Call Data Connection Interface,” whereas
transport of CCC information is as specified in the section “Call Content Connection
Interface.” Both CCC and CDC information may be provided over the same physical
interface. Information is available in the CCC and CDC information packets to
identify the type of packet (either CDC or CCC) and the particular case. The identification
is provided either directly by the packet containing the surveillance case identifier, or
indirectly by the packet containing an identifier that can be correlated with the case
identifier.
Contained in the IP header is the source IP address, which is the address of the DF, and the
destination IP address, which is the address of the CF provided during interception
provisioning. All transfer of packets other than those operationally required to
maintain the link should be from the DF to the CF only. At no time may the LEA send
unsolicited packets from the CF to the DF.
• Link-layer interface—The default link-layer protocol between the DF and CF should
be as defined by the Ethernet protocol. However, alternate link-layer protocols may be
used at the discretion of the TSP based on negotiated agreements with the LEA.
• Physical interface—The default type of physical interconnects provided by the TSP
at the demarcation point should be an RJ45 10/100BaseT connection. However,
alternate physical interconnects may be provided at the discretion of the TSP.
Encryption need not be supplied by the TSP on the connections between the DF and
the demarcation point. However, the LEA may choose to provide encryption from the
demarcation point to the CF by supplying the necessary equipment and facilities.
Summary
This chapter described methods of implementing Lawful Interception based on
standard specifications in the United States, focusing on three interfaces: the Intercept Request
Interface (INI-1), the Call Data Connection Interface (INI-2, HI-2), and the Call Content
Connection Interface (INI-3, HI-3).
The Intercept Request Interface supports two methods of requesting interception: dynamic
triggering and pre-provisioning.
Dynamic triggering means that the MD has full control of VoIP signaling and sends a request
message to the IAP when it detects the target subscriber in the signaling path. That is, the
MD monitors all signaling between endpoints and detects the target, whereas the IAP
has no information about the target beforehand and simply responds to
the MD’s request. Some VoIP protocols provide this kind of interface; one example is the SIP
P-DCS-LAES header.
Pre-provisioning means that the MD provisions the request to the IAP beforehand so that
the IAP can send back intercepted call data or content when the IAP detects the target
subscriber. That is, the IAP must know the target beforehand and
monitor the traffic passing through it. This kind of interface is usually proprietary; Cisco’s SII
is an example.
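The difference between the two methods is which element holds the target list and makes the detection decision. The toy model below illustrates only that division of labor; the classes `Iap` and `Md` and their method names are hypothetical and do not correspond to the SIP P-DCS-LAES header or to Cisco's SII interface.

```python
from typing import Iterable, List, Set


class Iap:
    """Intercept access point that records which calls it intercepts."""

    def __init__(self) -> None:
        self.provisioned_targets: Set[str] = set()
        self.intercepted: List[str] = []

    # Pre-provisioning: the MD pushes the target in advance, and the IAP
    # itself watches passing traffic for that target.
    def provision(self, target: str) -> None:
        self.provisioned_targets.add(target)

    def on_call(self, call_id: str, caller: str, callee: str) -> None:
        if {caller, callee} & self.provisioned_targets:
            self.intercepted.append(call_id)

    # Dynamic triggering: the IAP knows no targets and acts only when the
    # MD asks it to intercept a specific call.
    def intercept_request(self, call_id: str) -> None:
        self.intercepted.append(call_id)


class Md:
    """Mediation device that sits in the signaling path (dynamic triggering)."""

    def __init__(self, targets: Iterable[str]) -> None:
        self.targets = set(targets)

    def on_signaling(self, iap: Iap, call_id: str, caller: str, callee: str) -> None:
        # The MD detects the target and triggers the passive IAP.
        if {caller, callee} & self.targets:
            iap.intercept_request(call_id)
```

In the dynamic case the target list never leaves the `Md` object, whereas in the pre-provisioned case it is copied into the `Iap`.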
There are two interfaces for call data and content connection in the LI architecture: between
the IAP and the MD, and between the MD and the LEA. The specification of the call data and
content interface between the IAP and the MD varies and is mostly proprietary, depending
on the TSP’s preference, whereas the interface between the MD and the LEA is relatively fixed:
Europe adopts ETSI (102 series), and America adopts PacketCable (PKT-SP-ESP),
ATIS (T1 series), or TIA (J-STD series), which are similar at a high level.
The call content connection is established by the DF over UDP, and the content is
sent to the IP address and port number at the CF provided during provisioning of the interception.
Each call content packet includes an identifier for the intercept order, a
timestamp, and the intercepted information.
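As a rough illustration of such a packet, the sketch below prefixes the intercepted content with an identifier and a timestamp and sends it as a UDP datagram. The field widths (4-byte identifier, 8-byte timestamp), the nanosecond timestamp unit, and the function names are assumptions chosen for the example; the applicable specification defines the real layout.

```python
import socket
import struct
import time
from typing import Optional, Tuple

# Assumed header layout (NOT taken from the specification): a 4-byte
# intercept identifier followed by an 8-byte timestamp, network byte order.
_HEADER = struct.Struct("!IQ")


def encode_ccc(ccc_id: int, payload: bytes, timestamp: Optional[int] = None) -> bytes:
    """Prefix intercepted content with the identifier and a timestamp."""
    if timestamp is None:
        timestamp = time.time_ns()  # nanoseconds since the Unix epoch (assumed unit)
    return _HEADER.pack(ccc_id, timestamp) + payload


def decode_ccc(datagram: bytes) -> Tuple[int, int, bytes]:
    """Split a datagram into (identifier, timestamp, intercepted content)."""
    if len(datagram) < _HEADER.size:
        raise ValueError("datagram shorter than the assumed 12-byte header")
    ccc_id, timestamp = _HEADER.unpack_from(datagram)
    return ccc_id, timestamp, datagram[_HEADER.size:]


def send_ccc(datagram: bytes, cf_host: str, cf_port: int) -> None:
    """Send one datagram from the DF to the CF address given at provisioning."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (cf_host, cf_port))
```

Because UDP gives no delivery guarantee, the identifier and timestamp in every packet are what allow the CF to attribute and order content that arrives late or with gaps.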
The CDC is established by the DF as a TCP connection, and the data is delivered to the CF
designated by the LEA during surveillance provisioning. The TCP connection shall be capable
of transporting the call-identifying information for multiple surveillance cases to a single
LEA. The call data messages are defined to convey information to an LEA about call-identifying
events on a call that result from a user action or a signal.
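Carrying several surveillance cases over one TCP connection requires that each message identify its case and that the receiver can find message boundaries in the byte stream. The sketch below shows that idea with a length prefix and a JSON body; both are purely illustrative assumptions, since the specifications define their own message encoding, and the names `frame_cdc` and `split_by_case` are hypothetical.

```python
import json
import struct
from collections import defaultdict
from typing import Dict, List

# Assumed framing: a 4-byte big-endian length followed by the message body.
_LENGTH = struct.Struct("!I")


def frame_cdc(case_id: str, event: str) -> bytes:
    """Encode one call data message tagged with its surveillance case."""
    body = json.dumps({"case": case_id, "event": event}).encode("utf-8")
    return _LENGTH.pack(len(body)) + body


def split_by_case(stream: bytes) -> Dict[str, List[str]]:
    """Decode a complete byte stream and group the events by case identifier."""
    per_case: Dict[str, List[str]] = defaultdict(list)
    offset = 0
    while offset < len(stream):
        (length,) = _LENGTH.unpack_from(stream, offset)
        offset += _LENGTH.size
        message = json.loads(stream[offset:offset + length].decode("utf-8"))
        offset += length
        per_case[message["case"]].append(message["event"])
    return dict(per_case)
```

The decoder assumes it is handed whole messages; a real receiver reading from a socket would also have to buffer partial reads.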
It is the responsibility of the TSP to deliver call data and content information to a demarcation
point. The demarcation point shall consist of a physical interconnect adjacent to the
DF. The LEA is responsible for providing the equipment, facilities, and maintenance
needed to deliver this information from the demarcation point to the CF.
The TSP should ensure that only those packets that have been authorized to be examined
by the LEA are delivered to the LEA at the demarcation point. If there is more than one
LEA doing surveillance on the TSP’s network at a given point in time, each LEA should
only see the data that it is authorized to receive.
End Notes
1 RFC 3603, “Private Session Initiation Protocol (SIP) Proxy-to-Proxy Extensions for
Supporting the PacketCable Distributed Call Signaling Architecture,” W. Marshall,
F. Andreasen, October 2003.
2 RFC 3924, “Cisco Architecture for Lawful Intercept in IP Networks,” F. Baker,
B. Foster, C. Sharp, October 2004.
3 Cisco Service Independent Intercept Architecture,
http://www.cisco.com/en/US/products/ps6566/products_feature_guide09186a008060dece.html.
4 PacketCable Electronic Surveillance Specifications, PacketCable,
http://www.packetcable.com/specifications.
5 PacketCable PKT-SP-ESP1.5-I02-070412, Electronic Surveillance, PacketCable,
http://www.packetcable.com.
References
RFC 2833, “RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals,”
H. Schulzrinne, S. Petrack, May 2000.