Computer Science One
Copyleft (Copyright)
The entirety of this book is free and is released under a Creative Commons
Attribution-ShareAlike 4.0 International License (see
http://creativecommons.org/licenses/by-sa/4.0/ for details).
Draft Notice
This book is a draft that has been released for evaluation and comment. The draft
contains mostly complete chapters up through arrays for three languages (C, Java, and
PHP). Subsequent chapters are included as placeholders and indicators for the intended
scope of the final draft, but are intentionally left blank. The author encourages people
to send feedback including suggestions, corrections, and reviews to inform and influence
the final draft. Thank you in advance to anyone helping out or sending constructive
criticisms.
Preface
If you really want to understand something, the best way is to try and
explain it to someone else. That forces you to sort it out in your own mind...
that's really the essence of programming. By the time you've sorted out a
complicated idea into little steps that even a stupid machine can deal with,
you've certainly learned something about it yourself.
Douglas Adams, Dirk Gently's Holistic Detective Agency [8]
The world of A.D. 2014 will have few routine jobs that cannot be done better
by some machine than by any human being. Mankind will therefore have
become largely a race of machine tenders. Schools will have to be oriented in
this direction. All the high-school students will be taught the fundamentals
of computer technology, will become proficient in binary arithmetic and will
be trained to perfection in the use of the computer languages that will have
developed out of those like the contemporary Fortran.
Isaac Asimov, 1964
I've been teaching Computer Science since 2008 and was a Teaching Assistant long
before that. Before that I was a student. During that entire time I've been continually
disappointed in the value (note, not quality) of textbooks, particularly Computer Science
textbooks and especially introductory textbooks. Of primary concern are the costs
which have far outstripped inflation over the last 20 years while not providing any real
additional value. New editions with trivial changes are released on a regular basis in an
attempt to nullify the used book market. Publishers engage in questionable business
practices and unfortunately many institutions are complicit in this process.
In established fields such as mathematics and physics, new textbooks are especially
questionable as the material and topics don't undergo many changes. However, in
Computer Science, new languages and technologies are created and change at breakneck
speeds. Faculty and students are regularly trying to give away stacks of textbooks
("Learn Java 4!", "Introduction to Cold Fusion", etc.) that are only a few years old
and yet are completely obsolete and worthless. The problem is that such books have
built-in obsolescence by focusing too much on technological specifics and not enough on
concepts. There are dozens of introductory textbooks for Computer Science; add in the
fact that there are multiple languages and many gimmicks ("Learn Multimedia Java",
"Gaming with JavaScript", "Build a Robot with C!"), and it's a publisher's paradise: hundreds of
variations, a growing market, and customers with few alternatives.
That's why I like organizations like OpenStax (http://openstaxcollege.org/) that
attempt to provide free and open learning materials. Though they have textbooks for
a variety of disciplines, Computer Science is not one of them (currently, that is). This
might be due to the fact that there is already a huge number of resources available
online such as tutorials, videos, online open courses, and even interactive code learning
tools. With so many resources, why write this textbook then? Firstly,
lay off. Secondly, I don't really expect this book to have much impact beyond my own
courses or department. I wanted a resource that presented Computer Science how I teach
it in my courses and it wasn't available. However, if it does find its way into another
instructor's classes or into the hands of an aspiring student who wants to learn, then
great!
Several years ago our department revamped our introductory courses in a Renaissance
in Computing initiative in which we redeveloped several different flavors of Computer
Science I (one intended for Computer Science majors, one for Computer Engineering
majors, one for non-CE engineering majors, one for humanities majors, etc.). The courses
are intended to be equivalent in content but have a broader appeal to those in different
disciplines. The intent was to provide multiple entry points into Computer Science. Once
a student had a solid foundation, they could continue into Computer Science II and pick
up a second programming language with little difficulty.
This basic idea informed how I structured this book. There is a separation of concepts and
programming language syntax. The first part of this book uses pseudocode for examples
with a minimum of language-specific elements. Subsequent parts of the book recapitulate
these concepts but in the context of a specific programming language. This allows for a
plug-in style approach to Computer Science: the same book could theoretically be used
for multiple courses or the book could be extended by adding another part for a new
language with minimal effort.
Another inspiration for the structure of this book is the Computer Science I Honors course
that I developed. Usually Computer Science majors take CS1 using Java as the primary
language while CE students take CS1 using C. Since the honors course consists of both
majors (as well as some of the top students), I developed the Honors version to cover
both languages at the same time in parallel. This has led to many interesting teaching
moments: by covering two languages, it provides opportunities to highlight fundamental
differences and concepts in programming languages. It also keeps concepts as the focus of
the course emphasizing that syntax and idiosyncrasies of individual languages are only of
secondary concern. Finally, actively using multiple languages in the first class provides a
better opportunity to extend knowledge to other programming languages: once a student
has a solid foundation in one language, learning a new one should be relatively easy.
The exercises in this book are a variety of exercises I've used in my courses over the
years. They have been made as generic as possible so that they could be assigned using
any language. While some have emphasized the use of "real-world" exercises (whatever
that means), my exercises have focused more on solving problems of a mathematical
nature (most of my students have been Engineering students). Some of them are more
easily understood if students have had Calculus but it is not absolutely necessary.
It may be cliché, but the two quotes above exemplify what I believe a Computer Science
I course is about. The second is from Isaac Asimov, who was asked at the 1964 World's
Fair what he thought the world of 2014 would look like. His depiction isn't entirely true,
but I do believe we are on the verge of a fundamental social change that will be caused
by more and more automation. Like the industrial revolution, but on a much smaller
time scale and to a far greater extent, automation will fundamentally change how we
live and not work (I say "not work" because automation will very easily destroy the vast
majority of today's jobs; this is a huge economic and political issue that will need to be
addressed). The time is quickly approaching when being able to program and develop
software will be considered a fundamental skill as essential as arithmetic. I hope this
book plays some small role in helping students adjust to that coming world.
The first quote describes programming, or more fundamentally, Computer Science and
problem solving. Computers do not solve problems, humans do. Computers only make
automating solutions possible quickly and on a large scale. At the end of the day, the
human race is still responsible for tending the machines and will be for some time despite
what Star Trek and the most optimistic of AI advocates think.
I hope that people find this book useful. If value is a ratio of quality vs cost then this
book has already succeeded in having infinite value (or it might be undefined, or NaN,
or this book is Exceptional, depending on which language sections you read). If you have
suggestions on how to improve it, please feel free to contact me. If you end up using it
and finding it useful, please let me know that too!
Acknowledgements
I'd like to thank the Department of Computer Science & Engineering at the University
of Nebraska-Lincoln for their support during my writing and maintaining this book.
This book is dedicated to my family.
Contents
Copyleft (Copyright)
Draft Notice
Preface
Acknowledgements
1. Introduction
1.1. Problem Solving
1.2. Computing Basics
1.3. Basic Program Structure
1.4. Syntax Rules & Pseudocode
1.5. Documentation, Comments, and Coding Style
2. Basics
2.1. Control Flow
2.1.1. Flow Charts
2.2. Variables
2.2.1. Naming Rules & Conventions
2.2.2. Types
2.2.3. Declaring Variables: Dynamic vs. Static Typing
2.2.4. Scoping
2.3. Operators
2.3.1. Assignment Operators
2.3.2. Numerical Operators
2.3.3. String Concatenation
2.3.4. Order of Precedence
2.3.5. Common Numerical Errors
2.3.6. Other Operators
2.4. Basic Input/Output
2.4.1. Standard Input & Output
2.4.2. Graphical User Interfaces
2.4.3. Output Using printf-style Formatting
2.4.4. Command Line Input
2.5. Debugging
2.5.1. Types of Errors
2.5.2. Strategies
2.6. Examples
2.6.1. Temperature Conversion
2.6.2. Quadratic Roots
2.7. Exercises
3. Conditionals
3.1. Logical Operators
3.1.1. Comparison Operators
3.1.2. Negation
3.1.3. Logical And
3.1.4. Logical Or
3.1.5. Compound Statements
3.1.6. Short Circuiting
3.2. If Statement
3.3. If-Else Statement
3.4. If-Else-If Statement
3.5. Ternary If-Else Operator
3.6. Examples
3.6.1. Meal Discount
3.6.2. Look Before You Leap
3.6.3. Comparing Elements
3.6.4. Life & Taxes
3.7. Exercises
4. Loops
4.1. While Loops
4.1.1. Example
4.2. For Loops
4.2.1. Example
4.3. Do-While Loops
4.4. Foreach Loops
4.5. Other Issues
4.5.1. Nested Loops
4.5.2. Infinite Loops
4.5.3. Common Errors
4.5.4. Equivalency of Loops
4.6. Problem Solving With Loops
4.7. Examples
4.7.1. For vs While Loop
4.7.2. Primality Testing
4.7.3. Paying the Piper
4.8. Exercises
5. Functions
5.1. Defining & Using Functions
5.1.1. Function Signatures
5.1.2. Calling Functions
5.1.3. Organizing
5.2. How Functions Work
5.2.1. Call By Value
5.2.2. Call By Reference
5.3. Other Issues
5.3.1. Functions as Entities
5.3.2. Function Overloading
5.3.3. Variable Argument Functions
5.3.4. Optional Parameters & Default Values
5.4. Exercises
6. Error Handling
6.1. Error Handling
6.2. Error Handling Strategies
6.2.1. Defensive Programming
6.2.2. Exceptions
6.3. Exercises
8. Strings
8.1. Basic Operations
8.2. Comparisons
8.3. Tokenizing
8.4. Exercises
9. File Input/Output
9.1. Processing Files
9.1.1. Paths
9.1.2. Error Handling
9.1.3. Buffered and Unbuffered
9.1.4. Binary vs Text Files
9.2. Exercises
10. Encapsulation & Objects
10.1. Objects
10.1.1. Defining
10.1.2. Creating
10.1.3. Using Objects
10.2. Design Principles & Best Practices
10.3. Exercises
11. Recursion
11.1. Writing Recursive Functions
11.1.1. Tail Recursion
11.2. Avoiding Recursion
11.2.1. Memoization
11.3. Exercises
16. Conditionals
16.1. Logical Operators
16.1.1. Order of Precedence
16.1.2. Comparing Strings and Characters
16.2. If, If-Else, If-Else-If Statements
16.3. Examples
16.3.1. Computing a Logarithm
16.3.2. Life & Taxes
16.3.3. Quadratic Roots Revisited
17. Loops
17.1. While Loops
17.2. For Loops
17.3. Do-While Loops
17.4. Other Issues
17.5. Examples
17.5.1. Normalizing a Number
17.5.2. Summation
17.5.3. Nested Loops
17.5.4. Paying the Piper
18. Functions
18.1. Defining & Using Functions
18.1.1. Declaration: Prototypes
18.1.2. Void Functions
18.1.3. Organizing Functions
18.1.4. Calling Functions
18.2. Pointers
18.2.1. Passing By Reference
18.2.2. Function Pointers
18.3. Examples
18.3.1. Generalized Rounding
18.3.2. Quadratic Roots
19. Error Handling
19.1. Language Supported Error Codes
19.1.1. POSIX Error Codes
19.2. Error Handling By Design
19.3. Enumerated Types
19.4. Using Enumerated Types for Error Codes
20. Arrays
20.1. Basic Usage
20.2. Dynamic Memory
20.3. Using Arrays with Functions
20.4. Multidimensional Arrays
20.4.1. Contiguous 2-D Arrays
20.5. Dynamic Data Structures
21. Strings
21.1. Character Arrays
21.2. String Library
21.3. Arrays of Strings
21.4. Comparisons
21.5. Conversions
21.6. Tokenizing
22. File I/O
22.1. Opening Files
22.2. Reading & Writing
22.2.1. Plaintext Files
22.2.2. Binary Files
22.3. Closing Files
23. Structures
23.1. Defining Structures
23.1.1. Alternative Declarations
23.1.2. Nested Structures
23.2. Usage
23.2.1. Declaration & Initialization
23.2.2. Selection Operators
23.3. Arrays of Structures
23.4. Using Structures With Functions
23.4.1. Factory Functions
23.4.2. To String Functions
23.4.3. Passing Arrays of Structures
24. Recursion
to Elements
27. Conditionals
27.1. Logical Operators
27.1.1. Order of Precedence
27.1.2. Comparing Strings and Characters
27.2. If, If-Else, If-Else-If Statements
27.3. Examples
27.3.1. Computing a Logarithm
27.3.2. Life & Taxes
27.3.3. Quadratic Roots Revisited
28. Loops
28.1. While Loops
28.2. For Loops
28.3. Do-While Loops
28.4. Enhanced For Loops
28.5. Examples
28.5.1. Normalizing a Number
28.5.2. Summation
28.5.3. Nested Loops
28.5.4. Paying the Piper
29. Methods
29.1. Defining Methods
29.1.1. Void Methods
29.1.2. Using Methods
29.1.3. Passing By Reference
29.2. Examples
29.2.1. Generalized Rounding
31. Arrays
31.1. Basic Usage
31.2. Dynamic Memory
31.3. Using Arrays with Methods
31.4. Multidimensional Arrays
31.5. Dynamic Data Structures
32. Strings
32.1. Basics
32.2. String Methods
32.3. Arrays of Strings
32.4. Comparisons
32.5. Tokenizing
33. File I/O
33.1. File Input
33.2. File Output
34. Objects
34.1. Data Visibility
34.2. Methods
34.2.1. Accessor & Mutator Methods
34.3. Constructors
34.4. Usage
34.5. Common Methods
34.6. Composition
34.7. Example
35. Recursion
hashCode() Methods
Arguments
37.6. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
37.6.1. Converting Units . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
37.6.2. Computing Quadratic Roots . . . . . . . . . . . . . . . . . . . . . 493
38.Conditionals
38.1. Logical Operators . . . . . . . . .
38.1.1. Order of Precedence . . .
38.2. If, If-Else, If-Else-If Statements .
38.3. Examples . . . . . . . . . . . . .
38.3.1. Computing a Logarithm .
38.3.2. Life & Taxes . . . . . . . .
38.3.3. Quadratic Roots Revisited
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
497
497
499
499
501
501
502
504
39. Loops
39.1. While Loops
39.2. For Loops
39.3. Do-While Loops
39.4. Foreach Loops
39.5. Examples
39.5.1. Normalizing a Number
39.5.2. Summation
39.5.3. Nested Loops
39.5.4. Paying the Piper
40. Functions
40.1. Defining & Using Functions
40.1.1. Declaring Functions
40.1.2. Organizing Functions
40.1.3. Calling Functions
40.1.4. Passing By Reference
40.1.5. Function Pointers
40.2. Examples
40.2.1. Generalized Rounding
40.2.2. Quadratic Roots
42.2.2. Non-Contiguous Indices
42.2.3. Key-Value Initialization
42.3. Useful Functions
42.4. Iteration
42.5. Adding Elements
42.6. Removing Elements
42.7. Using Arrays in Functions
42.8. Multidimensional Arrays
43. Strings
43.1. Basics
43.2. String Functions
43.3. Arrays of Strings
43.4. Comparisons
43.5. Tokenizing
44. File I/O
44.1. Opening Files
44.2. Reading & Writing
44.2.1. Using URLs
44.2.2. Closing Files
45. Objects
45.1. Data Visibility
45.2. Methods
45.2.1. Accessor & Mutator Methods
45.3. Constructors
45.4. Usage
45.5. Common Methods
45.6. Composition
45.7. Example
46. Recursion
Acronyms
Index
References
List of Algorithms
1.1. An example of pseudocode: finding a minimum value
3.1. An if-statement
4.6. Counter-Controlled Do-While Loop
List of Figures
1.1. A Compiling Process
7.1. Example of an Array
7.2. Example returning a static array
7.3. Pitfalls of Returning Static Arrays
7.4. Depiction of Application Memory
7.5. Shallow vs. Deep Copies
9.3. A Word Search
9.4. A solved Sudoku puzzle
9.5. A DNA Sequence
9.6. Codon Table for RNA to Protein Translation
1. Introduction
Computers are awesome. The human race has seen more advancements in the last 50
years than in the preceding 10,000 years of human history. Technology has transformed the
way we live our daily lives and how we interact with each other, and it has changed the
course of our history. Today, everyone carries smartphones that have more computational
power than supercomputers from 20 years ago. Computing has become ubiquitous; the internet
of things will soon become a reality in which every device is interconnected
and data is collected and available about even the smallest of minutiae.
However, computers are also dumb. Despite the most fantastical of depictions in science
fiction and the hopes of Artificial Intelligence, computers can only do what they are told
to do. The fundamental art of Computer Science is problem solving. Computers are
not good at problem solving; you are the problem solver. It is still up to you, the user,
to approach a complex problem, study it, understand it, and develop a solution to it.
Computers are only good at automating solutions once you have solved the problem.
Computational sciences have become a fundamental tool of almost every discipline.
Scholars have used textual analysis and data mining techniques to analyze classical
literature and historic texts, providing new insights and opening new areas of study.
Astrophysicists have used computational analysis to detect dozens of new exoplanets.
Complex visualizations and models can predict astronomical collisions on a galactic scale.
Physicists have used big data analytics to push the boundaries of our understanding of
matter in the search for the Higgs boson and study of elementary particles. Chemists
simulate the interaction of millions of combinations of compounds without the need for
expensive and time-consuming physical experiments. Biologists use massively distributed
computing models to simulate protein folding and other complex processes. Meteorologists
can predict weather and climatic changes with ever greater accuracy.
Technology and data analytics have changed how political campaigns are run and how
products are marketed and even delivered. Social networks can be data mined to track
and predict the spread of flu epidemics. Computing and automation will only continue
to grow. The time is soon coming when basic computational thinking and the ability
to develop software will be considered a basic skill necessary to every discipline, a
requirement for many jobs, and an essential skill akin to arithmetic.
Computer Science is not programming. Programming is a necessary skill, but it is only
the beginning. This book is intended to get you started on your journey.
entities that make up a system first. Once these have been defined and implemented,
they are combined and interactions between them are defined to produce a more complex
system.
2^n     Number of bytes
2^10    1,024
2^20    1,048,576
2^30    1,073,741,824
2^40    1,099,511,627,776
2^50    1,125,899,906,842,624
2^60    1,152,921,504,606,846,976
2^70    1,180,591,620,717,411,303,424
2^80    1,208,925,819,614,629,174,706,176
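The byte counts in the table can be recomputed with a single bit shift. The following is only a sketch: the function name powerOfTwo is an assumption made for illustration, and because it uses a 64-bit integer it can reproduce the rows up to 2^60 only; the last two rows would need a wider type.

```c
#include <stdint.h>

/* Returns 2^exponent as a 64-bit unsigned integer.
   Only exponents 0 through 63 fit in 64 bits; for anything larger
   this sketch returns 0 as a (hypothetical) out-of-range signal. */
uint64_t powerOfTwo(unsigned int exponent) {
  if (exponent > 63) {
    return 0;
  }
  /* shifting 1 left by n positions multiplies it by 2, n times */
  return (uint64_t)1 << exponent;
}
```

Calling powerOfTwo(10) gives the first row of the table, 1,024, and powerOfTwo(20) gives the second, 1,048,576.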
Address (top to bottom): ...; 0x7fff58310b8f; 0x7fff58310b8b; then 0x7fff58310b8a down through 0x7fff58310b6e, one address per byte; ...
Contents (top to bottom): ...; 0x32; 0x3e; 0xcf; 0x23; 0x01; 0x32; 0x7c; 0xff; 3.14159265359; 32,321,231; 1,458,321; \0; o; l; l; e; H; 0xfa; 0xa8; 0xba; ...
Table 1.2.: Depiction of Computer Memory. Each address refers to a byte, but different
types of data (integers, floating-point numbers, characters) take different
amounts of memory. Memory addresses and some data are represented in
hexadecimal.
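The point that different types of data take different amounts of memory can be checked directly in C with the sizeof operator. The sketch below is illustrative: the function names stringBytes and printTypeSizes are assumptions, and apart from sizeof(char), which is always 1, the sizes printed depend on the machine and compiler.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bytes needed to store a C string, counting the terminating '\0'. */
size_t stringBytes(const char *s) {
  return strlen(s) + 1;
}

/* Prints how many bytes a few common types occupy on this machine.
   Only sizeof(char) == 1 is guaranteed; the others vary by platform. */
void printTypeSizes(void) {
  printf("char:    %zu byte(s)\n", sizeof(char));
  printf("int:     %zu byte(s)\n", sizeof(int));
  printf("double:  %zu byte(s)\n", sizeof(double));
  printf("\"Hello\": %zu byte(s)\n", stringBytes("Hello"));
}
```

The string Hello needs six bytes, one per character plus the terminating \0, which matches the six character cells shown in the table.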
[Figure: flowchart of a compiling process. Node labels: Text Editor or IDE; Source File; Compiler; Syntax Error(s); success; Object File; Other Object Files & Libraries; Linker; Executable File; Input; run; Results & Output]
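#include <stdlib.h>
#include <stdio.h>
#include <math.h>

/* Reads a number x from the command line and prints its square root. */
int main(int argc, char **argv) {

  /* exactly one command line argument is expected */
  if(argc != 2) {
    fprintf(stderr, "Usage: %s x\n", argv[0]);
    exit(1);
  }

  double x = atof(argv[1]);
  double result = sqrt(x);

  /* a negative input has no real square root */
  if(x < 0) {
    fprintf(stderr, "Cannot handle complex roots\n");
    exit(2);
  }

  printf("square root of %f = %f\n", x, result);

  return 0;
}
Code Sample 1.1: A simple program in C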
errors become runtime errors. A program may run fine until its first syntax error at
which point it fails.
There are other ways of compiling and running programs. Java, for example, represents a
compromise between compiled and interpreted languages. Java source code is compiled
into Java bytecode, which is not actually machine code that the operating system and
hardware can run directly. Instead, it is compiled code for a Java Virtual Machine (JVM).
This allows a developer to write highly portable code, compile it, and run it on
any JVM on any system (write-once, compile-once, run-anywhere).
In general, interpreted languages are slower than compiled languages because they are
being run through another program (the interpreter) instead of being executed directly
by the processor. Modern tools have been introduced to solve this problem. Just In Time
(JIT) compilers have been developed that take scripts that are not usually compiled
and compile them to a native machine code format, which has the potential to run much
faster than when interpreted. Modern web browsers typically do this for JavaScript code
(Chrome's V8 JavaScript engine, for example).
Transpilers are source-to-source compilers. They don't produce assembly or machine code;
instead they take one language and translate it to another language. This is sometimes
done to ensure that scripting languages like JavaScript are backwards compatible with
previous versions of the language. Transpilers can also be used to translate one language
into the same language but with different aspects (such as parallel or synchronized code)
automatically added. They can also be used to translate older languages such as Pascal
to more modern languages as a first step in updating a legacy system.
        .section        __TEXT,__text,regular,pure_instructions
        .globl  _main
        .align  4, 0x90
_main:                                  ## @main
        .cfi_startproc
## BB#0:
        pushq   %rbp
Ltmp2:
        .cfi_def_cfa_offset 16
Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
Ltmp4:
        .cfi_def_cfa_register %rbp
        subq    $48, %rsp
        movl    $0, -4(%rbp)
        movl    %edi, -8(%rbp)
        movq    %rsi, -16(%rbp)
        cmpl    $2, -8(%rbp)
        je      LBB0_2
## BB#1:
        leaq    L_.str(%rip), %rsi
        movq    ___stderrp@GOTPCREL(%rip), %rax
        movq    (%rax), %rdi
        movq    -16(%rbp), %rax
        movq    (%rax), %rdx
        movb    $0, %al
        callq   _fprintf
        movl    $1, %edi
        movl    %eax, -36(%rbp)         ## 4-byte Spill
        callq   _exit
LBB0_2:
        movq    -16(%rbp), %rax
        movq    8(%rax), %rdi
        callq   _atof
        xorps   %xmm1, %xmm1
        movsd   %xmm0, -24(%rbp)
        movsd   -24(%rbp), %xmm0
        sqrtsd  %xmm0, %xmm0
        movsd   %xmm0, -32(%rbp)
        ucomisd -24(%rbp), %xmm1
        jbe     LBB0_4
## BB#3:
        leaq    L_.str1(%rip), %rsi
        movq    ___stderrp@GOTPCREL(%rip), %rax
        movq    (%rax), %rdi
        movb    $0, %al
        callq   _fprintf
        movl    $2, %edi
        movl    %eax, -40(%rbp)         ## 4-byte Spill
        callq   _exit
LBB0_4:
        leaq    L_.str2(%rip), %rdi
        movsd   -24(%rbp), %xmm0
        movsd   -32(%rbp), %xmm1
        movb    $2, %al
        callq   _printf
        movl    $0, %ecx
        movl    %eax, -44(%rbp)         ## 4-byte Spill
        movl    %ecx, %eax
        addq    $48, %rsp
        popq    %rbp
        retq
        .cfi_endproc

        .section        __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
        .asciz  "Usage: %s x\n"

L_.str1:                                ## @.str1
        .asciz  "Cannot handle complex roots\n"

L_.str2:                                ## @.str2
        .asciz  "square root of %f = %f\n"

        .subsections_via_symbols
(offset and ASCII columns only; hexadecimal byte columns omitted)
00000e40  |UH..H..0.E......|
00000e50  |}.H.u..}.......,|
00000e60  |...H.5....H.....|
00000e70  |.H.8H.E.H.......|
00000e80  |........E......H|
00000e90  |.E.H.x..n....W..|
00000ea0  |..E....E...Q....|
00000eb0  |E.f..M...%...H.5|
00000ec0  |....H..E...H.8..|
00000ed0  |.A.........E....|
00000ee0  |..H.=.......E...|
00000ef0  |.M....".........|
00000f00  |E...H..0]..%....|
00000f10  |.%.....%.....%..|
00000f20  |....L......AS.%.|
00000f30  |....h.........h.|
00000f40  |........h.......|
00000f50  |..h........Usag|
00000f60  |e: %s x..Cannot |
00000f70  |handle complex r|
00000f80  |oots..square roo|
00000f90  |t of %f = %f....|
00000fa0  |................|
00000fb0  |............@...|
00000fc0  |4...4...........|
00000fd0  |4...............|
00000fe0  |................|
00000ff0  |.zR..x..........|
00001000  |................|
00001010  |........4.......|
00001020  |>.......H.......|
00001030  |R...............|
00001040  |................|
*
00002000  |.".T.....@___std|
00002010  |errp.Qr..@dyld_s|
00002020  |tub_binder......|
00002030  |........r..@_ato|
00002040  |f...r .@_exit...|
00002050  |r(.@_fprintf...r|
00002060  |0.@_printf......|
00002070  |.._...._mh_execu|
00002080  |te_header.!main.|
00002090  |%...............|
000020a0  |................|
000020b0  |................|
000020c0  |................|
000020d0  |........@.......|
000020e0  |................|
000020f0  |...............|
00002100  |-...............|
00002110  |3...............|
00002120  |<...............|
00002130  |D...............|
00002140  |................|
00002150  |.......@........|
00002160  |............ .__|
00002170  |mh_execute_heade|
00002180  |r._main.___stder|
00002190  |rp._atof._exit._|
000021a0  |fprintf._printf.|
000021b0  |dyld_stub_binder|
000021c0  |....|
000021c4
Code Sample 1.3: A simple program in C, resulting machine code formatted in hexadecimal (partial)
[Algorithm 1.1: An example of pseudocode: finding a minimum value. Final step: output min]
were not indented, it contained different spacing or different fonts, etc. Likewise, code
should be legible. Well-written code is consistent and makes good use of whitespace and
indentation. Code in the same code block should be indented at the same level. Nested
blocks should be further indented just like the outline of an essay or table of contents.
Code should be well-documented. The code itself should be clear enough that it tells the
user what the code does and how it does it. This is called self-documenting code. In
addition, well-written code should contain sufficient and clear comments. A comment
in a program is intended for a human user to read. A comment is ultimately ignored
and has no effect on the actual program. Good comments tell the user why the code was
written or why it was written the way it was. Comments provide a high-level description
of what a block of code, function, or program does. If the particular method or algorithm
is of interest, it should also be documented.
There are typically two ways to write comments. Single line comments usually begin
with two forward slashes, // . Everything after the slashes until the next line is ignored
by the program. Multiline comments begin with a /* and end with a */ ; everything
between them is ignored even if it spans multiple lines. This syntax is shared among
many languages including C, Java, PHP and others. Some examples:
// this is a single line comment

/*
  This is a multiline comment
  each line is ignored, but allows
  for better formatting
*/

/**
 * This is a doc-style comment, usually placed in
 * front of major portions of code such as a function
 * to provide documentation
 * It begins with a forward-slash-star-star
 */
The last example above is a doc-style comment. It originated with Java, but has since
been adopted by many other programming languages. Syntactically it is a normal
multiline comment, but begins with a /** . Asterisks are aligned together on each
line. Certain commenting systems allow you to place other marked up data inside these
comments, such as labels for parameters ( @param x ) or HTML code to provide links.
These doc-style comments are used to provide documentation for major parts of the
code, especially functions and data structures. Though not part of the language, other
documentation tools can be used to gather the information in doc-style comments to
produce documentation (such as web pages).
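As an illustration, a doc-style comment placed in front of a small C function might look like the following. This is only a sketch: the function celsiusToFahrenheit is a made-up example, and the @param and @return tags follow the style used by common documentation tools rather than anything required by the C language.

```c
/**
 * Converts a temperature from Celsius to Fahrenheit.
 *
 * @param celsius the temperature in degrees Celsius
 * @return the equivalent temperature in degrees Fahrenheit
 */
double celsiusToFahrenheit(double celsius) {
  return celsius * 9.0 / 5.0 + 32.0;
}
```

The compiler ignores the comment entirely; a documentation tool can read it to generate a description of the function, its parameter, and its return value.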
Comments should not be trivial: they should not explain something that should be
readily apparent to an experienced user or programmer. For example, if a piece of code
adds two numbers together and stores the result, there should not be a comment that
explains the process. It is a simple and common enough operation that is self-evident.
However, if a function uses a particular process or algorithm such as a Fourier Transform
to perform an operation, it would be appropriate to document it in a series of comments.
Comments can also detail how a function or piece of code should be used. This is
typically done when developing an Application Programmer Interface (API) for use
by other programmers. The API's available functions should be well-documented so
that users will know how and when to use a particular function. The documentation can describe the
function's expectations and behavior, such as how it handles bad input or error situations.
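A sketch of what such documentation might look like for one function of a small API follows. The function divide and its error codes are hypothetical, chosen only to show a comment that spells out what the function expects and what it does with bad input.

```c
#include <stddef.h>

/**
 * Divides numerator by denominator using integer division.
 *
 * @param numerator   the value to be divided
 * @param denominator the value to divide by; must not be zero
 * @param quotient    where the result is stored; left untouched on error
 * @return 0 on success, 1 if quotient is NULL, 2 if denominator is zero
 */
int divide(int numerator, int denominator, int *quotient) {
  if (quotient == NULL) {
    return 1;
  }
  if (denominator == 0) {
    return 2;
  }
  *quotient = numerator / denominator;
  return 0;
}
```

A caller who reads only the comment knows that a nonzero return value signals an error and that the output variable is not modified in that case.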
2. Basics
2.1. Control Flow
The flow of control (or simply control flow) is how a program processes its instructions.
Typically, programs operate in a linear or sequential flow of control. Executable statements
or instructions in a program are performed one after another. In source code, the order
in which instructions are written defines the order in which they are executed. Just like English, a program is read
top to bottom. Each statement may modify the state of a program. The state of a
program is the value of all its variables and other information/data stored in memory
at a given moment during its execution. Further, an executable statement may instead
invoke (or call or execute) another procedure (also called a subroutine, function, method,
etc.), which is another unit of code that has been encapsulated into one unit so that it
can be reused.
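A minimal sketch of sequential control flow in C is shown below. The functions square and sumOfSquares are invented for illustration; the comments track how each statement changes the state and where control passes to another procedure.

```c
/* A separate procedure: a unit of code that can be reused. */
int square(int x) {
  return x * x;
}

/* Statements run top to bottom; each one may change the program's state. */
int sumOfSquares(int a, int b) {
  int total = 0;              /* state: total is 0 */
  total = total + square(a);  /* control passes to square, then returns here */
  total = total + square(b);  /* the same procedure is reused */
  return total;
}
```

Calling sumOfSquares(3, 4) executes the three statements in order and returns 25.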
This type of control flow is usually associated with a procedural programming paradigm
(which is closely related to imperative or structured programming paradigms). Though
this text will mostly focus on languages that are procedural (or that have strong procedural
aspects), it is important to understand that there are other programming language
paradigms. Functional programming languages such as Scheme and Haskell achieve
computation through the evaluation of mathematical functions with little state or, in
pure functional languages, no state at all. Declarative languages such as those used in database languages
like SQL or in spreadsheets like Excel specify computation by expressing the logic of
computation rather than explicitly specifying control flow. For a more formal introduction
to programming language paradigms, a good resource is Seven Languages in Seven Weeks:
A Pragmatic Guide to Learning Programming Languages by Tate [32].
[Figure 2.1 node labels: Decision Node; Control to Perform; Action to Perform]
Figure 2.1.: Types of Flowchart Nodes. Control and action nodes are distinguished
by color. Control nodes are automated steps while action nodes are steps
performed as part of the algorithm being depicted.
decision. Decision boxes are usually depicted with a diamond shaped box.
Other boxes represent a process, operation, or action to be performed. Boxes representing
a process are usually rectangles. We will further distinguish two types of processes using
two different colorings: we'll use green to represent boxes that are steps directly related
to the algorithm being depicted. We'll use blue for actions that are necessary to the
control flow of the algorithm such as assigning a value to a variable or incrementing a
value as part of a loop. Figure 2.1 depicts the three types of boxes we'll use. Figure 2.2
depicts a simple ATM (Automated Teller Machine) process as an example.
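The decision nodes of such a flowchart correspond to conditional statements in code. The following sketch models the two decisions of the ATM process (is the PIN correct, and are there sufficient funds); the function atmWithdraw, its parameters, and the convention of returning 0 when nothing is dispensed are all assumptions made for illustration.

```c
#include <stdbool.h>

/* Returns the amount dispensed; 0 means nothing was dispensed.
   Ejecting the card happens in every case and is not modeled here. */
int atmWithdraw(int enteredPin, int correctPin, int balance, int amount) {
  /* decision: is the PIN correct? */
  bool pinIsCorrect = (enteredPin == correctPin);
  if (!pinIsCorrect) {
    return 0;
  }
  /* decision: are there sufficient funds? */
  bool sufficientFunds = (amount > 0 && amount <= balance);
  if (!sufficientFunds) {
    return 0;
  }
  /* action: dispense the requested amount */
  return amount;
}
```

Each if statement plays the role of a diamond in the flowchart, with the early return standing in for the "no" branch.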
2.2. Variables
In mathematics, variables are used as placeholders for values that aren't necessarily
known. For example, in the equation,
x = 3y + 5
the variables x and y represent numbers that can take on a number of different values.
Similarly, in a computer program, we also use variables to store values. A variable is
essentially a memory location in which a value can be stored. Typically, a variable is
referred to by a name or identifier (like x, y, z in mathematics). In mathematics variables
are usually used to hold numerical values. However, in programming, variables can
usually hold different types of values such as numbers, strings (a collection of characters),
Booleans (true or false values), or more complex types such as objects.
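In C, for instance, variables holding these kinds of values could be declared as in the sketch below. All names and values are arbitrary choices for illustration; evaluateLine mirrors the equation x = 3y + 5 from above.

```c
#include <stdbool.h>

/* Evaluates x = 3y + 5 for a given y. */
double evaluateLine(double y) {
  double x = 3 * y + 5;
  return x;
}

/* Variables of several types; the names and values are arbitrary. */
void variableExamples(void) {
  int numberOfStudents = 42;       /* an integer */
  double averageScore = 87.5;      /* a floating-point number */
  char firstInitial = 'C';         /* a single character */
  const char *greeting = "Hello";  /* a string (a collection of characters) */
  bool isEnrolled = true;          /* a Boolean (true or false) */

  /* the casts below only silence unused-variable warnings */
  (void)numberOfStudents;
  (void)averageScore;
  (void)firstInitial;
  (void)greeting;
  (void)isEnrolled;
}
```

Each declaration reserves a memory location, gives it a name, and stores an initial value in it.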
[Figure 2.2: flowchart of a simple ATM process. Node labels: User Input; Input PIN; Is PIN correct? (yes/no); Get amount of withdraw; amount; Sufficient Funds? (yes/no); Dispense amount; Eject Card]
discouraged: it is difficult to read, inconsistent, and just plain ugly.
Beyond the naming rules that languages may enforce, most languages have established
naming conventions; a set of guidelines and best-practices for choosing identifier names
for variables (as well as functions, methods, and class names). Conventions may be widely
adopted on a per-language basis or may be established within a certain library, framework
or by an organization. Naming conventions are intended to give source code consistency
which ultimately improves readability and makes it easier to understand. Following a
consistent convention can also greatly reduce the chance for errors and mistakes. Good
naming conventions also have an aesthetic appeal; code should be beautiful.
There are several general conventions when it comes to variables. An early convention
that is still in common use is underscore casing, in which variable names consisting of more
than one word have words separated by underscore characters with all other characters
being lower case. For example:
average_score, number_of_students, miles_per_hour
A variation on this convention is to use all uppercase letters such as MILES_PER_HOUR.
A more modern convention is to use lower camel casing (or just camel casing), in which
variable names with multiple words are written as one long word with the first letter in
each new word capitalized but with the first word's first letter lower case. For example:
averageScore, numberOfStudents, milesPerHour
The name refers to the capitalized letters resembling the humps of a camel. One
advantage that camel casing has over underscore casing is that you're not always straining
to type the underscore character. Yet another similar convention is upper camel casing,
also known as PascalCase, which is like camel casing, but the first letter in the first word
is also capitalized:
AverageScore, NumberOfStudents, MilesPerHour
Each of these conventions is used in various languages in different contexts, which we'll
explore more fully in subsequent sections (usually underscore lowercasing and camel
casing are used to denote variables and functions, PascalCase is used to denote user
defined types such as classes or structures, and underscore uppercasing is used to denote
static and constant variables). However, for our purposes, we'll use camel casing for
variables in our pseudocode.
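The four conventions can be seen side by side in a short C sketch. Every identifier below is invented for illustration, and mixing styles in one file as done here is only for comparison; real code would normally pick one convention per kind of identifier.

```c
/* underscore uppercasing: a constant */
#define KILOMETERS_PER_MILE 1.609344

/* PascalCase: a user defined type */
typedef struct {
  double miles;
  double hours;
} TripSummary;

/* camel casing: a function and a variable (hours is assumed nonzero) */
double milesPerHour(TripSummary trip) {
  double averageSpeed = trip.miles / trip.hours;
  return averageSpeed;
}

/* underscore casing: the same kind of identifiers in the older style */
double kilometers_per_hour(TripSummary trip) {
  double miles_per_hour = milesPerHour(trip);
  return miles_per_hour * KILOMETERS_PER_MILE;
}
```

For a trip of 120 miles in 2 hours, milesPerHour returns 60.0.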
There are exceptions and special cases to each of these conventions such as when a variable
name involves an acronym or a hyphenated word, etc. In such cases sensible extensions or
compromises are employed. For example, xmlString or priorityXMLParser (involving
the acronym Extensible Markup Language (XML)) may be used, which keep all letters in
the acronym consistent (all lower or all uppercase).
In addition to these conventions, there are several best-practice principles when deciding
on identifiers.
Be descriptive, but not verbose Use variable names that describe what the
variable represents. The examples above, averageScore , numberOfStudents ,
milesPerHour clearly indicate what the variable is intended to represent. Using
good, descriptive names makes your code self-documenting (a reader can make
sense of it without having to read extensive supplemental documentation).
Avoid meaningless variable names such as value , aVariable , or some cryptic
combination of v10 (its the 10th variable Ive used!). Ambiguous variables such
as name should also be avoided unless the context makes its clear what you are
referring to (as when used inside of a Person object).
Single character variables are commonly used, but used in a context in which their
meaning is clearly understood. For example, variable names such as x , y are okay
if they are used to refer to points in the Euclidean plane. Single character variables
such as i , j are often used as index variables when iterating over arrays. In this
case, terseness is valued over descriptiveness as the context is very well-understood.
As a general rule, the more a variable is used, the shorter it should be. For
example, the variable numStudents may be preferred over the full variable
numberOfStudents .
Avoid abbreviations (or at least use them sparingly) Youre not being charged by
the character in your code; you can afford to write out full words. Abbreviations can
help to write shorter variable names, but not all abbreviations are the same. The
word abbreviation itself could be abbreviated as abbr., abbrv. or abbrev.
for example. Abbreviations are not always universally understood by all users,
may be ambiguous, or non-standard. Moreover, modern IDEs provide automatic
code completion, relieving you of the need to type longer variable names. If the
abbreviation is well-known or understood from context, then it may make sense to
use it.
Avoid acronyms (or at least use them sparingly) Using acronyms in variable
names comes with many of the same problems as abbreviations. However, if it makes
sense in the context of your code and has little chance of being misunderstood or
mistaken, then go for it. For example, in the context of a financial application,
APR (Annual Percentage Rate) would be a well-understood acronym in which case
the variable apr may be preferred over the longer annualPercentageRate .
Avoid pluralizations, use singular forms English is not a very consistent language
when it comes to rules like pluralizations. For most cases you simply add "s";
for others you add "es" or change the "y" to "i" and add "es". Some words have
the same form for singular and plural such as "glasses".2 Other words have completely
different forms ("focus" becomes "foci"). Still yet there are instances in which
multiple words are acceptable: the plural of "person" can be "persons" or "people".
Avoiding plural forms keeps things simple and consistent: you don't need to be
a grammarian in order to easily read code. One potential exception to this is when
using a collection such as an array to hold more than one element or when the variable
represents a quantity that is pluralized (as with numberOfStudents above).
Though the guidelines above provide a good framework from which to write good variable
names, reasonable people can and do disagree on best practice because at some point as
you go from generalities to specifics, conventions become more of a matter of personal
preference and subjective aesthetics. Sometimes an organization may establish its own
coding standards that must be followed which of course trumps any of the guidelines
above.
In the end, a good balance must be struck between readability and consistency. Rules
and conventions should be followed, until they get in the way of good code that is.
2.2.2. Types
A variable's type (or data type) is the characterization of the data that it represents. As
mentioned before, a computer only "speaks" in 0s and 1s (binary). A variable is merely
a memory location in which a series of 0s and 1s is stored. That binary string could
represent a number (either an integer or a floating point number), a single alphanumeric
character or series of characters (string), a boolean type or some other, more complex
user-defined type.
The type of a variable is important because it affects how the raw binary data stored
at a memory location is interpreted. Moreover, some types take a different amount of
memory. For example, an integer type could take 32 bits while a floating point type
could take 64 bits.
Programming languages may support different types and may do so in different ways.
In the next few sections we'll describe some common types that are supported by many
languages.
2 These are called plurale tantum (nouns with no singular form) and singulare tantum (nouns with no
plural form) for you grammarians. Words like "sheep" are unchanging irregular plurals: words whose
singular and plural forms are the same.
Numeric Types
At their most basic, computers are number crunching machines. Thus, the most basic
type of variable that can be used in a computer program is a numeric type. There are
several numeric types that are supported by various programming languages. The most
simple is an integer type which can represent whole numbers 0, 1, 2, etc. and their
negations, $-1, -2, \ldots$. Floating point numeric types represent decimal numbers such as
0.5, 3.14, 4.0, etc. However, floating point numbers cannot represent every real number
possible since they use a finite number of bits to represent the number. We will examine
this in detail below. For now, let's understand how a computer represents both integers
and floating point numbers in memory.
As humans, we think in base-10 (decimal) because we have 10 fingers and 10 toes.3
When we write a number with multiple digits in base-10 we do so using places (ones
place, tens place, hundreds place, etc.). Mathematically, a number in base-10 can be
broken down into powers of ten; for example:
$$3{,}201 = 3 \times 10^3 + 2 \times 10^2 + 0 \times 10^1 + 1 \times 10^0$$
In binary, numbers are represented in the same way, but in base-2 in which we only have
0 and 1 as symbols. To illustrate, let's consider counting from 0: in base-10, we would
count 0, 1, 2, ..., 9 at which point we carry over a 1 to the tens spot and start over at
0 in the ones spot, giving us 10, 11, 12, ..., 19 and repeat the carry-over to 20.
With only two symbols, the carry-over occurs much more frequently: we count 0, 1 and
then carry over and have 10. It is important to understand that this is not ten: we are
counting in base-2, so 10 is actually equivalent to 2 in base-10. Continuing, we have 11
and again carry over, but we carry it over twice giving us 100 (just like we'd carry over
twice when going from 99 to 100 in base-10). A full count from 0 to 16 in binary can be
found in Table 2.1. In many programming languages, a prefix of 0b is used to denote a
number represented in binary. We use this convention in the table.
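As a fuller example, consider again the number 3,201. This can be represented in binary
as follows.
$$\begin{aligned}
0b110010000001 &= 1 \times 2^{11} + 1 \times 2^{10} + 0 \times 2^{9} + 0 \times 2^{8} \\
&\quad + 1 \times 2^{7} + 0 \times 2^{6} + 0 \times 2^{5} + 0 \times 2^{4} \\
&\quad + 0 \times 2^{3} + 0 \times 2^{2} + 0 \times 2^{1} + 1 \times 2^{0} \\
&= 2^{11} + 2^{10} + 2^{7} + 2^{0} \\
&= 2{,}048 + 1{,}024 + 128 + 1 \\
&= 3{,}201
\end{aligned}$$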
Representing negative numbers is a bit more complicated and is usually done using a scheme called
two's complement. We omit the details for now, but essentially the first bit in the representation
serves as a sign bit: zero indicates positive, while 1 indicates negative. Negative values are
represented as a complement with respect to $2^n$ (a complement is where 0s and 1s are flipped
to 1s and 0s). When represented using two's complement, binary numbers with $n$ bits can
represent numbers $x$ in the range
$$-2^{n-1} \le x \le 2^{n-1} - 1$$
Note that the upper bound follows from the fact that
$$\underbrace{0b11\ldots11}_{n \text{ bits}} = \sum_{i=0}^{n-1} 2^i = 2^n - 1$$
Applied to the $n-1$ bits that follow the sign bit, this gives the maximum value $2^{n-1} - 1$.
Base-10   Binary
0         0b0
1         0b1
2         0b10
3         0b11
4         0b100
5         0b101
6         0b110
7         0b111
8         0b1000
9         0b1001
10        0b1010
11        0b1011
12        0b1100
13        0b1101
14        0b1110
15        0b1111
16        0b10000
Table 2.1.: Counting from 0 to 16 in base-10 and binary.
Some programming languages allow you to define variables that are unsigned in which
the sign bit is not used to indicate positive/negative. With the extra bit we can represent
numbers twice as big; using $n$ bits we can represent numbers $x$ in the range
$$0 \le x \le 2^n - 1$$
Floating point numbers in binary are represented in a manner similar to scientific notation.
Recall that in scientific notation, a number is normalized by multiplying it by some power
of 10 so that its most significant digit is between 1 and 9. The resulting normalized
number is called the significand while the power of ten that the number was scaled by is
called the exponent (and since we are in base-10, 10 is the base). In general, a number in
scientific notation is represented as:
$$\text{significand} \times \text{base}^{\text{exponent}}$$
For example,
$$14326.123 = \underbrace{1.4326123}_{\text{significand}} \times \underbrace{10}_{\text{base}}{}^{\overbrace{4}^{\text{exponent}}}$$
Name        Bits   Exponent   Mantissa   Significant Digits   Approximate
                   Bits       Bits       of Precision         Range
Half        16     5          10         3.3                  $10^{-3}$ to $10^{4.5}$
Single      32     8          23         7.2                  $10^{-38}$ to $10^{38}$
Double      64     11         52         15.9                 $10^{-308}$ to $10^{308}$
Quadruple   128    15         112        34.0                 $10^{-4931}$ to $10^{4931}$
Table 2.3.: Summary of Floating-point Precisions in the IEEE 754 Standard. Half and
quadruple are not widely adopted.
Most modern programming languages implement floating point numbers according to the
Institute of Electrical and Electronics Engineers (IEEE) 754 Standard [20] (also called
the International Electrotechnical Commission (IEC) 60559 [19]). When represented
in binary, the available bits must be divided among the sign, the mantissa, and the
exponent. The standard defines several precisions that each use a fixed number of bits
with a resulting number of significant digits (base-10) of precision. Table 2.3 contains a
summary of a few of the most commonly implemented precisions.
Just as with integers, the finite precision of floating-point numbers results in several
limitations. First, irrational numbers such as $\pi = 3.14159\ldots$ can only be approximated out
to a certain number of digits. For example, in single precision $\pi \approx 3.1415927$, which is
accurate only to the 6th decimal place, and in double precision $\pi \approx 3.1415926535897931$,
which is accurate to only 15 decimal places. In fact, regardless of how many bits we allow in
our representation, an irrational number like $\pi$ (that never repeats and never terminates)
will only ever be an approximation. Real numbers like $\pi$ require an infinite precision,
but computers are only finite machines.
Even rational numbers such as $\frac{1}{3} = 0.333\ldots$, which can be written exactly as a ratio of
integers, are not represented exactly when using floating-point numbers. In double precision binary,
$$\frac{1}{3} = 0b1.0101010101010101010101010101010101010101010101010101 \times 2^{-2}$$
which when represented in scientific notation in decimal is
$$3.3333333333333330 \times 10^{-1}$$
That is, there are only 16 digits of precision, after which the remaining (infinite) sequence
of 3s gets cut off.
Programming languages usually only support the common single and double precisions
defined by the IEEE 754 standard as those are commonly supported by hardware.
However, there are languages that support arbitrary precision (also called multiprecision)
numbers and yet other languages that have many libraries to support "big number"
arithmetic. Arbitrary precision is still not infinite: instead, as more digits are needed,
more memory is allocated. If you want to compute 10 more digits of $\pi$, you can, but at a
cost: more memory must be allocated to hold the additional digits. Also, operations are
performed in software using many operations which can be much slower than performing
fixed-precision arithmetic directly in hardware. Still, there are many applications where
such accuracy or large numbers are absolutely essential.
Binary        Dec   Character
0b000 0000    0     \0 Null character
0b000 0001    1     Start of Header
0b000 0010    2     Start of Text
0b000 0011    3     End of Text
0b000 0100    4     End of Transmission
0b000 0101    5     Enquiry
0b000 0110    6     Acknowledgment
0b000 0111    7     \a Bell
0b000 1000    8     \b Backspace
0b000 1001    9     \t Horizontal Tab
0b000 1010    10    \n Line feed
0b000 1011    11    \v Vertical Tab
0b000 1100    12    \f Form feed
0b000 1101    13    \r Carriage return
0b000 1110    14    Shift Out
0b000 1111    15    Shift In
0b001 0000    16    Data Link Escape
0b001 0001    17    Device Control 1
0b001 0010    18    Device Control 2
0b001 0011    19    Device Control 3
0b001 0100    20    Device Control 4
0b001 0101    21    Negative Ack
0b001 0110    22    Synchronous idle
0b001 0111    23    End of Trans. Block
0b001 1000    24    Cancel
0b001 1001    25    End of Medium
0b001 1010    26    Substitute
0b001 1011    27    Escape
0b001 1100    28    File Separator
0b001 1101    29    Group Separator
0b001 1110    30    Record Separator
0b001 1111    31    Unit Separator
0b010 0000    32    (space)
0b010 0001    33    !
0b010 0010    34    "
0b010 0011    35    #
0b010 0100    36    $
0b010 0101    37    %
0b010 0110    38    &
0b010 0111    39    '
0b010 1000    40    (
0b010 1001    41    )
0b010 1010    42    *
0b010 1011    43    +
0b010 1100    44    ,
0b010 1101    45    -
0b010 1110    46    .
0b010 1111    47    /
0b011 0000    48    0
0b011 0001    49    1
0b011 0010    50    2
0b011 0011    51    3
0b011 0100    52    4
0b011 0101    53    5
0b011 0110    54    6
0b011 0111    55    7
0b011 1000    56    8
0b011 1001    57    9
0b011 1010    58    :
0b011 1011    59    ;
0b011 1100    60    <
0b011 1101    61    =
0b011 1110    62    >
0b011 1111    63    ?
0b100 0000    64    @
0b100 0001    65    A
0b100 0010    66    B
0b100 0011    67    C
0b100 0100    68    D
0b100 0101    69    E
0b100 0110    70    F
0b100 0111    71    G
0b100 1000    72    H
0b100 1001    73    I
0b100 1010    74    J
0b100 1011    75    K
0b100 1100    76    L
0b100 1101    77    M
0b100 1110    78    N
0b100 1111    79    O
0b101 0000    80    P
0b101 0001    81    Q
0b101 0010    82    R
0b101 0011    83    S
0b101 0100    84    T
0b101 0101    85    U
0b101 0110    86    V
0b101 0111    87    W
0b101 1000    88    X
0b101 1001    89    Y
0b101 1010    90    Z
0b101 1011    91    [
0b101 1100    92    \
0b101 1101    93    ]
0b101 1110    94    ^
0b101 1111    95    _
0b110 0000    96    `
0b110 0001    97    a
0b110 0010    98    b
0b110 0011    99    c
0b110 0100    100   d
0b110 0101    101   e
0b110 0110    102   f
0b110 0111    103   g
0b110 1000    104   h
0b110 1001    105   i
0b110 1010    106   j
0b110 1011    107   k
0b110 1100    108   l
0b110 1101    109   m
0b110 1110    110   n
0b110 1111    111   o
0b111 0000    112   p
0b111 0001    113   q
0b111 0010    114   r
0b111 0011    115   s
0b111 0100    116   t
0b111 0101    117   u
0b111 0110    118   v
0b111 0111    119   w
0b111 1000    120   x
0b111 1001    121   y
0b111 1010    122   z
0b111 1011    123   {
0b111 1100    124   |
0b111 1101    125   }
0b111 1110    126   ~
0b111 1111    127   Delete
Table 2.4.: ASCII Character Table. The first and second columns indicate the binary and
decimal representations respectively. The third column shows the resulting
character when possible. Characters 0 through 31 and 127 are control characters that
are not printable or print whitespace. The encoding is designed to impose a
lexicographic ordering: A through Z are in order, uppercase letters precede lowercase
letters, numbers precede letters and are also in order.
wanted to code those characters, you would need to specify them in some way other
than those keys (since typing those keys will affect what you are typing rather than
specifying a character). The standard way to escape characters is to use a backslash
along with another, single character. The three most common are the (horizontal) tab,
\t, the endline character, \n, and the null terminating character, \0. The tab and endline
character are used to specify their whitespace characters respectively. The null character
is used in some languages to denote the end of a string and is not printable.
ASCII is quite old, originally developed in the early sixties. President Johnson first
mandated that all computers purchased by the federal government support ASCII in 1968.
However, it is quite limited with only 128 possible characters. Since then, additional
extensions have been developed. The Extended ASCII character set adds support for
128 additional characters (numbered 128 through 255) by adding 1 more bit (8 total).
Included in the extension is support for common international characters
such as ü, ñ and £ (which are characters 129, 164, and 156 respectively).
Even 256 possible characters are not enough to represent the wide array of international
characters when you consider languages like Chinese, Japanese, and Korean (CJK for
short). Unicode was developed to solve this problem by establishing a standard encoding
that supports 1,112,064 possible characters, though only a fraction of these are actually
currently assigned.5 Unicode is backward compatible, so it works with plain ASCII
characters. In fact, the most common encoding for Unicode, UTF-8, uses a variable
number of bytes to encode characters. 1-byte encodings correspond to plain ASCII; there
are also 2, 3, and 4-byte encodings.
In most programming languages, string literals are defined by using either single or
double quotes to delimit where the string begins and ends. For example, one may be
able to define the string "Hello World": the double quotes are not part of the string,
but instead specify where the string begins and ends. Some languages allow you to use
either single or double quotes. PHP for example would allow you to also define the same
string as 'Hello World'. Yet other languages, such as C, distinguish the usage of single
and double quotes: single quotes are for single characters such as 'A' or '\n' while
double quotes are used for full strings such as "Hello World".
In any case, if you want a single or double quote to appear in your string you need to
escape it similar to how the tab and endline characters are escaped. For example, in C
'\'' would refer to the single quote character and "Dwayne \"The Rock\" Johnson"
would allow you to use double quotes within a string. In our pseudocode we'll use the
stylized double quotes, “Hello World”, in any strings that we define. We will examine
string types more fully in Chapter 8.
As of 2012, 110,182 are assigned to characters, 137,468 are reserved for private use (they are valid
characters, but not defined so that organizations can use them for their own purposes), with 2,048
surrogates and 66 non-character control codes. 864,348 are left unassigned meaning that we are
well-prepared for encoding alien languages when they finally get here.
Boolean Types
A Boolean is another type of variable that is used to hold a truth value, either true or
false, of a logical statement. Some programming languages explicitly support a built-in
Boolean type while others implicitly support them. For languages that have explicit
types, typically the keywords true and false are used, but logical expressions can also
be evaluated and assigned to Boolean variables.
Some languages do not have an explicit Boolean type and instead support Booleans
implicitly, sometimes by using numeric types. For example, in C, false is associated with
zero while any non-zero value is associated with true. In either case, Boolean values are
used to make decisions and control the flow of operations in a program (see Chapter 3).
In some languages, variables must be declared before they can be referred to or used.
When you declare a variable, you not only give it an identifier, but also define its type.
For example, you can declare a variable named numberOfStudents and define it to be
an integer. For the life of that variable, it will always be an integer type. You can only
give that variable integer values. Attempts to assign, say, a string type to an integer
variable may either result in a syntax error or a runtime error when the program is
executed or lead to unexpected or undefined behavior. A language that requires you to
declare a variable and its type is a statically typed language.
The declaration of a variable is typically achieved by writing a statement that includes
the variable's type (using a built-in keyword of the language) along with the variable
name. For example, in C-style languages, a line like
int x;
would create an integer variable associated with the identifier x .
In other languages, typically interpreted languages, you do not have to declare a variable
before using it. Such languages are generally referred to as dynamically typed languages.
Instead of declaring a variable to have a particular type, the type of a variable is
determined by the type of value that is assigned to it. If you assign an integer to a
variable it becomes an integer. If you assign a string to it, it becomes a string type.
Moreover, a variable's type can change during the execution of a program. If you reassign
a value to a variable, it dynamically changes its type to match the type of the value
assigned.
In PHP for example, a line like
$x = 10;
would create an integer variable associated with the identifier $x . In this example, we
did not declare that $x was an integer. Instead, it was inferred by the value that we
assigned to it (10).
At first glance it may seem that dynamically typed languages are better. Certainly
they are more flexible (and allow you to write less so-called boilerplate code), but
that flexibility comes at a cost. Dynamically typed variables are generally less efficient.
Moreover, dynamic typing opens the door to a lot of potential type mismatching errors.
For example, you may have a variable that is assumed to always be an integer. In a
dynamically typed language, no such assumption is valid as any reassignment can change
the variable's type. The language itself cannot enforce such a rule, so you may
need a lot of extra code to check a variable's type and deal with type safety issues.
The advantages and disadvantages of each continue to be debated.
2.2.4. Scoping
The scope of a variable is the section of code in which a variable is valid or known.
In a statically typed language, a variable must be declared before it can be used. The
code block in which the variable is declared is therefore its scope. Outside of this code
block, the variable is invalid. Attempts to reference or use a variable that is out-of-scope
typically result in a syntax error. An example using the C programming language is
depicted in Code Sample 2.1.
{
  int a;
  {
    //this is a new code block inside the outer block
    int b;
    //at this point in the code, both a and b are in-scope
  }
  //at this point, only a is in-scope, b is out-of-scope
}
Code Sample 2.1: Example of variable scoping in C
Scoping in a dynamically typed language is similar, but since you don't declare a variable,
the scope is usually defined by the block of code where you first use or reference the
variable. Moreover, in some languages using a variable may cause that variable to become
globally scoped.
A globally scoped variable is valid throughout the entirety of a program. A global variable
can be accessed and referenced on every line of code. Sometimes this is a good thing:
for example, we could define a variable to represent $\pi$ and then use it anywhere in our
program. We would then be assured that every computation involving $\pi$ would be using
the same definition of $\pi$ (rather than one line of code using the estimate 3.14 while
another uses 3.14159).
By the same token, however, global variables make the state and execution of a program
less predictable: if any piece of code can access a global variable, then potentially any
piece of code could change that variable. Imagine some questionable code changing the
value of our global $\pi$ variable to 3. For this reason, using global variables is generally
considered bad practice.6 Even if no code performs such an egregious operation, the
fact that anything can change the value means that when testing, you must test for
the potential that anything will change the value, greatly increasing the complexity of
software testing.
6
Coders often say globals are evil and indeed have often demonstrated that they have low moral
standards. Global variables that is. Coders are above reproach.
To capture the advantages of a global variable while avoiding the disadvantages, it is
common to only allow global constants; variables whose values cannot be changed once
set.
Another argument against globally scoped variables is that once the identifier has been
used, it cannot be reused or redefined for other purposes (a floating-point variable with
the identifier pi means we cannot use the identifier pi for any other purpose) as
it would lead to conflicts. Defining many globally scoped variables (or functions, or
other elements) starts to pollute the namespace by reserving more and more identifiers.
Problems arise when one attempts to use multiple libraries that have both used the same
identifiers for different variables or functions. Resolving the conflict can be difficult or
impossible if you have no control over the offending libraries.
2.3. Operators
Now that we have variables, we need a way to work with variables. That is, given two
variables we may wish to add them together. Or we may wish to take two strings and
combine them to form a new string. In programming languages this is accomplished
through operators which operate on one or more operands. An operator takes the values
of its operands and combines them or changes them in some way to produce a new value.
If an operator is applied to variable(s), then the values used in the operation are the
values stored in the variable at the time that the operator is evaluated.
Many common operators are binary in that they operate on two operands such as common
arithmetic operations like addition and multiplication. Some operators are unary in that
they only operate on one operand. The first operator that we look at is the assignment
operator, which allows us to assign values to variables.
It is important to realize that when this notation is used, it is not a declaration like it
would be in algebra: a = b for example is an algebraic assertion that the variables a
and b are equal. An assignment operator is different: it means place the value on the
right-hand-side into the variable on the left-hand-side. For that reason, writing something
like
10 = a;
is invalid syntax. The left-hand-side must be a variable.
The right-hand-side, however, may be a literal, another variable, or even a more complex
expression. In the example before,
a ← 10
the value 10 was acting as a numerical literal: a way of expressing a (human-readable)
value that the computer can then interpret as a binary value. In code, we can conveniently
write numbers in base-10; when compiled or interpreted, the numerical literals are
converted into binary data that the computer understands and placed in a memory
location corresponding to the variable. This entire process is automatic and transparent
to the user. Literals can also be strings or other values. For example:
message ← “hello world”
We can also copy values from one variable to another. Assuming that we've assigned
the value 10 to the variable a, we can then copy it to another variable b:
b ← a
This does not mean that a and b are the same variable. The value that is stored in the
variable a at the time that this statement is executed is copied into the variable b. There
are now two different variables with the same value. If we reassign the value in a, the
value in b is unaffected. This is illustrated in Algorithm 2.1
a ← 10
b ← a
//a and b both store the value 10 at this point
a ← 20
//now a has the value 20, but b still has the value 10
b ← 25
//a still stores a value of 20, b now has a value of 25
Algorithm 2.1: Assignment Operator Demonstration
The right-hand-side can also be a more complex expression, for example the result of
summing two numbers together.
a ← 10
b ← 20
c ← a + b
d ← a − b
//c has the value 30 while d has the value −10
c ← a + 10
d ← −d
//c now has the value 20 and d now has the value 10
Algorithm 2.2: Addition and Subtraction Demonstration
as $a \div b$ or $a/b$ or $\frac{a}{b}$. In our pseudocode, we'll generally use $a \cdot b$ and $\frac{a}{b}$, but in programming
languages it is difficult to write some of these symbols. Usually programming
languages use * for multiplication and / for division. Similar examples are provided in
Algorithm 2.3.
a ← 10
b ← 20
c ← a · b
d ← a / b
//c has the value 200 while d has the value 0.5
Algorithm 2.3: Multiplication and Division Demonstration
Careful! Some languages specify that the result of an arithmetic operation must have the
same type as its operands. That is, an integer plus an integer results in an integer. A
floating-point number divided by a floating-point number results in a floating-point number.
When we mix types, say an integer and a floating-point number, the result is generally a
floating-point number. For the most part this is straightforward. The one tricky case is
when we have an integer divided by another integer, 3/2 for example.
Since both operands are integers, the result must be an integer. Normally, 3/2 = 1.5,
but since the result must be an integer, the fractional part gets truncated (cut-off) and
only the integral part is kept for the final result. This can lead to weird results such as
1/3 = 0 and 99/100 = 0. The result is not rounded down or up; instead the fractional
part is completely thrown out. Care must be taken when dividing integer variables in a
statically typed language. Type casting can be used to force variables to change their
type for the purposes of certain operations so that the full answer is preserved. For
example, in C we can write
int a = 10;
int b = 20;
double c;
int d;
c = (double) a / (double) b;
d = a / b;
//the value in c is correctly 0.5 but the value in d is 0
Integer Division
Recall that in arithmetic, when you divide integers a/b, b might not go into a evenly, in
which case you get a remainder. For example, 13/5 = 2 with a remainder r = 3. More
generally we have that
a = qb + r
where a is the dividend, b is the divisor, q is the quotient (the result) and r is the
remainder. We can also perform integer division in most programming languages. In
particular, the remainder operator gives us the remainder of
the integer division a/b. In mathematics this is the modulo operator and is
denoted
a mod b
For example,
13 mod 5 = 3
It is possible that the remainder is zero, for example,
10 mod 5 = 0
Many programming languages support this operation using the percent sign. In C for
example,
c = a % b;
to add one more,
c ← b + 1
Mathematically we'd expect the result to be 2,147,483,648, but that is more than
the maximum representable integer. What happens is something called arithmetic
overflow. The actual number stored in binary in memory for 2,147,483,647 is
$$0b0\underbrace{11\ldots11}_{31\ 1\text{s}}$$
When we add 1 to this, it is carried over all the way to the 32nd bit, giving the
result
$$0b1\underbrace{00\ldots00}_{31\ 0\text{s}}$$
in binary. However, the 32nd bit is the sign bit, so this is a negative number.
In particular, if this is a two's complement integer, it has the decimal value
$-2{,}147{,}483{,}648$ which is obviously wrong. Another example would be if we have a
large number, say 2 billion, and attempt to double it (multiply by 2). We would
expect 4 billion as a result, but again overflow occurs and the result (using 32-bit
signed two's complement integers) is $-294{,}967{,}296$.
A similar phenomenon can happen with floating point numbers. If an operation
(say multiplying two small numbers together) results in a number that is smaller
than the smallest floating-point number that can be represented, the operation is said
to have resulted in underflow. The result can essentially be zero, or an error can
be raised to indicate that underflow has occurred. The consequences of underflow
can be very complex.
Floating-point operations can also result in a loss of precision even if no overflow
or underflow occurs. For example, when adding a very large number a and a very
small number b, the result might be no different from the value of a. This is because
(for example) double precision floating-point numbers only have about 16 significant
digits of precision with the least significant digits being cut off in order to preserve
the magnitude.
Increment Operators
Adding one to or subtracting one from a variable is a very common operation. So common, that
most programming languages define increment operators such as i++ and i-- which
add one to and subtract one from the variable they are applied to. The same effect could be achieved
by writing
i ← (i + 1) and i ← (i − 1)
but the increment operators provide a shorthand way of expressing the operation.
The operators i++ and i-- are postfix operators: the operator is written after (post)
the operand. Some languages define similar prefix increment operators, ++i and --i .
The effect is similar: each adds or subtracts one from the variable i . However, the
difference is when the operator is used in a larger expression. A postfix operator retains
the original value for the expression, a prefix operator takes on the new, incremented
value in the expression.
To illustrate, suppose the variable i has the value 10. In the following line of code, i is
incremented and used in an expression that adds 5 and stores the result in a variable x :
x = 5 + (i++);
The value of x after this code is 15 while the value of i is now 11. This is because
the postfix operator increments i , but i++ retains the value 10 in the expression. In
contrast, with the line
x = 5 + (++i);
the variable i again now has the value 11, but the value of x is 16 since ++i takes on
the new, incremented value of 11.
Appropriately using each can lead to some very concise code, but it is important to
remember the difference.
int a = 10;
a += 5; //adds 5 to a
a -= 3; //subtracts 3 from a
a *= 2; //multiplies a by 2
a /= 4; //divides a by 4
displayed differently on some systems (it is typeset in red in some consoles that support
color to indicate that the output is communicating an error).
As a program is executing, it may prompt a user to enter input. A program may wait
(called blocking) until a user has typed whatever input they want to provide. The user
typically hits the enter key to indicate their input is done and the program resumes,
reading the input provided via the standard input. The program may also produce
output which is displayed to the user.
The standard input and output are generally universal: almost any language and
operating system will support them, and they are the most basic types of input/output.
However, the type of input and output is somewhat limited (usually limited to text-based
I/O) and doesn't provide much in the way of input validation. As an example, suppose
that a program prompts a user to enter a number. Since the input device (keyboard)
does not really restrict the user, a more obstinate user may enter a non-numeric value,
say "hello". The program may crash or provide garbage output with such input.
Language    String Output
C           sprintf
Java        String.format
PHP         sprintf
Table 2.5.: printf -style Methods in Several Languages. Languages support formatting
directly to the Standard Output as well as to strings that can be further used
or manipulated. Most languages also support printf -style formatting to
other output mechanisms (streams, files, etc.).
printf("The value of a = %d, the value of b is %f\n", a, b);
Figure 2.3.: Elements of a printf statement in C: the format string (the first, quoted
argument), the placeholders ( %d and %f ), and the print list ( a, b ).
Such data formatting can be achieved through the use of a printf -style formatting
function. The ideas date back to the mid-60s, but the modern printf comes from
the C programming language. Numerous programming languages support this style of
formatted output ( printf stands for print formatted). Most support printing
the resulting formatted output to the standard output as well as to strings and other
output mechanisms (files, streams, etc.). Table 2.5 contains a small sampling of
printf -style functions supported in several languages. We'll illustrate this usage
using the C programming language for our examples, but the concepts are generally
universal across most languages.
The function works by providing it a number of arguments. The first argument is always
a string that specifies the formatting of the result using several placeholders (flags that
begin with a percent sign) which will be replaced with values stored in variables but in a
formatted manner. Subsequent arguments to the function are the list of variables to be
printed; each argument is delimited by a comma. Figure 2.3 gives an example of a
printf statement with two placeholders. The placeholders are ultimately replaced with
the values stored in the provided variables a, b. If a, b held the values 10 and 2.718281,
the code would end up printing
The value of a = 10, the value of b is 2.718281
Though there are dozens of placeholders that are supported, we will focus only on a few:
%d formats an integer variable or literal
%f formats a floating-point variable or literal
%c formats a single character variable or literal
%s formats a string variable or literal
Misuse of placeholders may result in garbage output. For example, if you use an integer
placeholder, %d , but provide a string argument, the output will not be correct since
strings cannot be (directly) converted to integers.
In addition to these placeholders, you can also add modifiers. A number n between
the percent sign and character ( %nd , %nf , %ns ) specifies that the result should be
formatted with a minimum of n columns. If the output takes fewer than n columns,
printf will pad out the result with spaces so that there are n columns. If the output
takes n or more columns, then the modifier will have no effect (it specifies a minimum,
not a maximum).
Floating-point numbers have a second modifier that allows you to specify the number of
digits of precision to be formatted. In particular, you can use the placeholder %n.mf in
which n has the same meaning, but m specifies the number of decimals to be displayed.
By default, 6 decimals of precision are displayed. If m is greater than the precision of the
number, zeros are usually used for subsequent digits; if m is smaller than the precision of
the number, rounding may occur. Note that the n modifier includes the decimal point
as a column. Both modifiers are optional.
Finally, each of these modifiers can be made negative (example: %-20d ) to left-justify
the result. By default, justification is to the right. Several examples are illustrated in
Code Sample 2.3 with the results in Code Sample 2.4.
int a = 4567;
double b = 3.14159265359;

printf("a=%d\n", a);
printf("a=%2d\n", a);
printf("a=%4d\n", a);
printf("a=%8d\n", a);
Within a program, command line arguments are usually referred to as an argument vector
(sometimes in a variable named argv ) and argument count (sometimes in a variable
named argc ). We explore how each language supports this in subsequent sections.
2.5. Debugging
Making mistakes in programming is inevitable. Even the most expert of software
developers make mistakes.7 Errors in computer programs are usually referred to as
bugs. The term was popularized by Grace Hopper in 1947 while working on a Mark
II Computer at a US Navy research lab. Literally, a moth stuck in the computer was
impeding its operation; removing it, or debugging the computer, fixed it. In this section
we will identify general types of errors and outline ways to address them.
7 A severe security bug in the popular Unix bash shell utility went undiscovered for 25 years before it
was finally fixed in September 2014, missed by thousands of experts and some of the best coders in
the world.
a=4567
a=4567
a=4567
a=    4567
b=3.141593
b= 3.141593
b=3.14
b= 3.142
b= 3.141592653590000
Code Sample 2.4: Result of Computation in Code Sample 2.3. Spaces are highlighted
for clarity.
Syntax Errors
Syntax errors are errors in the usage of a programming language itself. A syntax error
can be a failure to adhere to the rules of the language such as misspelling a keyword or
forgetting proper punctuation (such as missing an ending semicolon). When you have a
syntax error, you're essentially not speaking the same language: you wouldn't be very
comprehensible if you started injecting nonsense words or words from a different language
when speaking to someone in English. Similarly, a computer can't understand what
you're trying to say (or what directions you're trying to give it) if you're not speaking
the same language.
Typically syntax errors prevent you from even compiling a program, though syntax errors
can be a problem at runtime with interpreted languages. When it encounters a syntax
error, a compiler will fail to complete the compilation process and will generally quit.
Ideally, the compiler will give reasons for why it was unable to compile and will hopefully
identify the line number where the syntax error was encountered with a hint on what
was wrong. Unfortunately, many times a compiler's error message isn't too helpful or
may indicate a problem on one line where the root cause of the problem is earlier in
the program. One cannot expect too much from a compiler after all. If a compiler
were able to correctly interpret and fix our errors for us, we'd have natural language
programming where we could order the computer to execute our commands in plain
English. If we had this science fiction-level of computer interaction we wouldn't need
programming languages at all.
Fixing syntax errors involves reading and interpreting the compiler error messages,
reexamining the program and fixing any and all issues to conform to the syntax of
the programming language. Fixing one syntax error may enable the compiler to find
additional syntax errors that it had not before. Only once all syntax errors have been
resolved can a program actually compile. For interpreted languages, the program may
be able to run up to where it encounters a syntax error and then exits with a fatal error.
It may take several runs to resolve such errors as well.
Runtime Errors
Once a program is free of syntax errors it can compile and be run. However, that doesn't
mean that the program is completely free of bugs, just that it is free of the types of bugs
(syntax errors) that the compiler is able to detect. A compiler is not able to predict every
action or possible event that could occur when a program is actually run. A runtime
error is an error that occurs while a program is being executed. For example, a program
could attempt to access a file that does not exist, it could attempt to connect to a database
after the computer has lost its network connection, or a user could enter bad data that
results in an invalid arithmetic operation, etc.
A compiler cannot be expected to detect such errors because, by definition, the conditions
under which runtime errors occur arise at runtime, not at compile time. One run of
a program could execute successfully, while another subsequent run could fail because
the system conditions have changed. That doesn't mean that we shouldn't mitigate the
consequences of runtime errors.
As a programmer it is important to think about the potential problems and runtime
errors that could occur and make contingency plans accordingly. We can make reasonable
assumptions that certain kinds of errors may occur in the execution of our program and
add code to handle those errors if they occur. This is known as error handling (which we
discuss in detail in Chapter 6). For example, we could add code that checks if a user
enters bad input and then re-prompt them to enter good data. If a file is missing, we
could add code to create it as needed. By checking for these errors and preventing illegal,
potentially fatal operations, we practice defensive programming.
Logic Errors
Other errors may be a result of bad code or bad design. Computers do exactly as they
are told to do. Logic errors can occur if we tell the computer to do something that we
didn't intend for it to do. For example, if we tell the computer to execute command
A under condition X, but we meant to have the computer execute command B under
condition Y, we have caused a logical error. The computer will perform the first set of
instructions, not the second as we intended. The program may be free of syntax errors
and may execute without any problems, but we certainly don't get the results that we
expected.
Logic errors are generally only detected and addressed by rigorous software testing. When
developing software, we can also design a collection of test cases: a set of inputs along
with the correct outputs that we would expect the program or code to produce. We can
then test the program with these inputs to see if it produces the same outputs as in the
test cases. If it doesn't, then we've uncovered a logical error that needs to be addressed.
Rigorous testing can be just as complex as (or even more complex than) writing the program
itself. Testing alone cannot guarantee that a program is free of bugs (in general, the
number of possible inputs is infinite; it is impossible to test all possibilities). However,
the more test cases that we design and pass, the higher the confidence we have that the
program is correct.
Testing can also be very tedious. Modern software engineering techniques can help
streamline the process. Many testing frameworks have been developed and built that
attempt to automate the testing process. Test cases can be randomly generated, test suites
can be repeatedly run and verified throughout the development process. Frameworks
can perform regression testing to see if fixing one bug caused another, etc.
2.5.2. Strategies
A common beginner's way of debugging a program is to insert temporary print statements
throughout their program to see what values variables have at certain points in an attempt
to isolate where an error is occurring. This is an okay strategy for extremely simple
programs, but it's the poor man's way of debugging. As soon as you start writing
more complex programs you quickly realize that this strategy is slow, inefficient, and
can actually hide the real problems (the standard output is not guaranteed to work as
expected if an error has occurred, so print statements may actually mislead you into
thinking the problem occurs at one point in the program when it actually occurs in a
different part).
Instead, it is much better to use a proper debugging tool in order to isolate the problem.
A debugger is another program that allows you to run your program under its control.
You can set break points in your program on certain lines and the debugger
will execute your program up to those points. It then pauses and allows you to look at
the program's state: you can examine the contents of memory, look at the values stored
in certain variables, etc. Debuggers will also allow you to resume the execution of your
program to the next break point or allow you to step through your program line by
line. This allows you to examine the execution of a program at human speed in order to
diagnose the exact point in execution where the problem occurs. IDEs allow you to do
this visually with a graphical user interface and easy visualization of variables. However,
there are also command line debuggers such as GDB (the GNU Debugger; GNU stands
for GNU's Not Unix!) that you interact with using text commands.
In general, debugging strategies attempt to isolate a problem to as small a code
segment as possible. For this reason, it is good practice to design code in segments that
are as small as possible using good procedural abstraction and use of functions and
methods (see Chapter 5) and to create test cases and suites for these small pieces of code.
It can also help to diagnose a problem by looking at the nature of the failure. If some
test cases pass and others fail you can get a hint as to what's wrong by examining the
key differences between the test cases. If one value passes and another fails, you can
trace that value as it propagates through the execution of your program to see how it
affects other values.
In the end, good debugging skills, just like good coding skills, come from experience. A
seasoned expert may be able to look at an error message and immediately diagnose the
problem. Or, a bug can escape the detection of hundreds of the best developers and
software tools and end up costing millions of dollars and thousands of man-hours because
of a simple failure to convert from English units to metric (in 1999, the Mars Climate
Orbiter broke up in the atmosphere of Mars because one subsystem was computing force
using pound-seconds while everything else was computing in newton-seconds leading to
a miscalculation of orbital entry [1]).
2.6. Examples
Let's apply these concepts by developing several prompt-and-compute style programs.
That is, the programs will prompt the user for input, perform some calculations, and
then output a result.
To write these programs, we'll use pseudocode, an informal, abstract description of a
program/algorithm. Pseudocode does not use any language-specific syntax. Instead,
it describes processes at a high level, making use of plain English and mathematical
notation. This allows us to focus on the actual process/program rather than worrying
about the particular syntax of a specific language. Good pseudocode should be easily
translated into any programming language.
$$\frac{5}{9}(F - 32)$$
1. Read in a Fahrenheit value from the user
2. Compute a Celsius value using the formula above
3. Output the result to the user
This is actually pretty good pseudocode already, but let's be a little more specific using
some of the operators and notation we've established above. The full program can be
found in Algorithm 2.4.
$$\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
Following the same basic outline, we'll read in the coefficients from the user, compute
each of the roots, and output the results to the user. Here, however, we make two
computations, one for each of the roots which we label $r_1, r_2$. The full procedure is
presented in Algorithm 2.5.
2.7. Exercises
Exercise 2.1. Write a program that calculates mileage deduction for income tax using
the standard rate of $0.575 per mile. Your program will read in a beginning and ending
odometer reading and calculate the difference and total deduction. Take care that your
output is in whole cents. An example run of the program may look like the following.
$r_1 \leftarrow \dfrac{-b + \sqrt{b^2 - 4ac}}{2a}$
$r_2 \leftarrow \dfrac{-b - \sqrt{b^2 - 4ac}}{2a}$
Output "the roots of $ax^2 + bx + c$ are $r_1, r_2$"
r
12
(1 + r) y 1
Where y is the number of years (possibly fractional) the asset was held (and r is on the
scale [0, 1]).
Exercise 2.4. The annual percentage yield (APY) is a much more accurate measure of
the true cost of a loan or savings account that compounds interest on a monthly or daily
basis. For a large enough number of compounding periods, it can be calculated as:
$$APY = e^{i} - 1$$
where i is the nominal interest rate (6% = 0.06). Write a program that prompts the user
for the nominal interest rate and outputs the APY.
Exercise 2.5. Write a program that calculates the speed of sound (v, feet-per-second)
in the air of a given temperature T (in Fahrenheit). Use the formula,
$$v = 1086 \sqrt{\frac{5T + 297}{247}}$$
Be sure your program does not lose the fractional part of the quotient in the formula
shown and format the output to three decimal places.
Exercise 2.6. Write a program to convert from radians to degrees using the formula
$$\text{deg} = \frac{180 \cdot \text{rad}}{\pi}$$
However, radians are on the scale $[0, 2\pi)$. After reading input from the user be sure to
do some error checking and give an error message if their input is invalid.
Exercise 2.7. Write a program to compute the Euclidean Distance between two points,
$(x_1, y_1)$ and $(x_2, y_2)$ using the formula:
$$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
Exercise 2.8. Write a program that will compute the value of sin(x) using the first 4
terms of the Taylor series:
$$\sin(x) \approx x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!}$$
In addition, your program will compute the absolute difference between this calculation
and a standard implementation of the sine function supported in your language. Your
program should prompt the user for an input value x and display the appropriate output.
Your output should look something like the following.
Sine Approximation
===================
Enter x: 1.15
Sine approximation: 0.912754
Sine value:         0.912764
Difference:         0.000010
Exercise 2.9. Write a program to compute the roots of a quadratic equation:
$$ax^2 + bx + c = 0$$
using the well-known quadratic formula:
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
Your program will prompt the user for the values, a, b, c and output each real root.
However, for invalid input (a = 0 or values that would result in complex roots), the
program will instead output a message that informs the user why the inputs are
invalid (with a specific reason).
Exercise 2.10. One of Ohm's laws can be used to calculate the amount of power in
Watts (the rate of energy conversion; 1 joule per second) in terms of Amps (a measure of
current, 1 amp = $6.241 \times 10^{18}$ electrons per second) and Ohms (a measure of electrical
resistance). Specifically:
$$W = A^2 \cdot O$$
Develop a simple program to read in two of the terms from the user and output the third.
Exercise 2.11. Ohm's Law models the current through a conductor as follows:
$$I = \frac{V}{R}$$
where V is the voltage (in volts), R is the resistance (in Ohms) and I is the current (in
amps). Write a program that, given two of these values, computes the third using Ohm's
Law.
The program should work as follows: it prompts the user for units of the first value:
the user should be prompted to enter V, R, or I and should then be prompted for the
value. It should then prompt for the second unit (same options) and then the value. The
program will then output the third value depending on the input. An example run of
the program:
Current Calculator
==============
Enter the first unit type (V, R, I): V
Enter the voltage: 25.75
Enter the second unit type (V, R, I): I
Enter the current: 72
The corresponding resistance is 0.358 Ohms
Exercise 2.12. Consider the following linear system of equations in two unknowns:
ax + by = c
dx + ey = f
Write a program that prompts the user for the coefficients in such a system (prompt for
a, b, c, d, e, f ). Then output a solution to the system (the values for x, y). Take care to
handle situations in which the system is inconsistent.
Exercise 2.13. The surface area of a sphere of radius r is
$$4\pi r^2$$
and the volume of a sphere with radius r is
$$\frac{4}{3}\pi r^3$$
Write a program that prompts the user for a radius r and outputs the surface area
and volume of the corresponding sphere. If the radius entered is invalid, print an error
message and exit. Your output should look something like the following.
Sphere Statistics
=================
Enter radius r: 2.5
area: 78.539816
volume: 65.449847
Exercise 2.14. Write a program that prompts for the latitude and longitude of two
locations (an origin and a destination) on the globe. These numbers are in the range
$[-180, 180]$ (negative values correspond to the western and southern hemispheres). Your
program should then compute the air distance between the two points using the Spherical
Law of Cosines. In particular, the distance d is
$$d = \arccos\left(\sin(\varphi_1)\sin(\varphi_2) + \cos(\varphi_1)\cos(\varphi_2)\cos(\Delta)\right) \cdot R$$
$\varphi_1$ is the latitude of location A, $\varphi_2$ is the latitude of location B
$\Delta$ is the difference between location B's longitude and location A's longitude
R is the (average) radius of the earth, 6,371 kilometers
Note: the formula above assumes that latitude and longitude are measured in radians r,
$-\pi \leq r \leq \pi$. See Exercise 2.6 for how to convert between them. Your program output
should look something like the following.
City Distance
========================
Enter latitude of origin: 41.9483
Enter longitude of origin: -87.6556
Enter latitude of destination: 40.8206
Enter longitude of destination: -96.7056
Air distance is 764.990931
Exercise 2.15. Write a program that prompts the user to enter in a number of days.
Your program should then compute the number of years, weeks, and days that number
represents. For this exercise, ignore leap years (thus all years are 365 days). Your output
should look something like the following.
Day Converter
=============
Enter number of days: 1000
That is
2 years
38 weeks
4 days
Exercise 2.16. The derivative of a function f(x) can be estimated using the difference
function:
$$f'(x) \approx \frac{f(x + \Delta x) - f(x)}{\Delta x}$$
That is, this gives us an estimate of the slope of the tangent line at the point x. Write a
program that prompts the user for an x value and a $\Delta x$ value and outputs the value of
the difference function for all three of the following functions:
$f(x) = x^2$
$f(x) = \sin(x)$
$f(x) = \ln(x)$
Your output should look something like the following.
Derivative Approximation
===================
Enter x: 2
Enter delta-x: 0.1
(x^2) ~= 4.100000
sin(x) ~= -0.460881
ln(x) ~= 0.487902
In addition, your program should check for invalid inputs: $\Delta x$ cannot be zero, and ln(x)
is undefined for $x \leq 0$. If given invalid inputs, appropriate error message(s) should be
output instead.
Exercise 2.17. Write a program that prompts the user to enter two points in the plane,
$(x_1, y_1)$ and $(x_2, y_2)$ which define a line segment $\ell$. Your program should then compute and
output an equation for the perpendicular line intersecting the midpoint of $\ell$. You should
take care that invalid inputs (horizontal or vertical lines) are handled appropriately. An
example run of your program would look something like the following.
Perpendicular Line
====================
Enter x1: 2.5
Enter y1: 10
Enter x2: 3.5
Enter y2: 11
Original Line:
y = 1.0000 x + 7.5000
Perpendicular Line:
y = -1.0000 x + 13.5000
Exercise 2.18. Write a program that computes the total for a bill. The program should
prompt the user for a sub-total. It should then prompt whether or not the customer is
entitled to an employee discount (of 15%) by having them enter 1 for yes, 2 for no. It
should then compute the new sub-total and apply a 7.35% sales tax, and print the receipt
details along with the grand total. Take care that you properly round each operation.
An example run of the program should look something like the following.
Please enter a sub-total: 100
Apply employee discount (1=yes, 2=no)? 1
Receipt
========================
Sub-Total   $ 100.00
Discount    $  15.00
Taxes       $   6.25
Total       $  91.25
Exercise 2.19. The ROI (Return On Investment) is computed by the following formula:
$$\text{ROI} = \frac{\text{Gain from Investment} - \text{Cost of Investment}}{\text{Cost of Investment}}$$
Write a program that prompts the user to enter the cost and gain (how much it was sold
for) from an investment and computes and outputs the ROI. For example, if the user
enters $100,000 and $120,000 respectively, the output should look similar to the following.
ROI =
Beginning odometer reading
Ending odometer reading
Number of gallons it took to fill the tank
Cost of gas in dollars per gallon
For example, if the user enters 50,125, 50,430, 10 (gallons), and $3.25 (per gallon), then
your output should be something like the following.
Miles driven: 305
Miles per gallon: 30.50
Cost per mile: $0.11
Exercise 2.21. A bearing can be measured in degrees on the scale of [0, 360) with 0
being due north, 90 due east, etc. The (initial) directional bearing from location A to
location B can be computed using the following formula.
$$\theta = \operatorname{atan2}\left(\sin(\Delta)\cos(\varphi_2),\ \cos(\varphi_1)\sin(\varphi_2) - \sin(\varphi_1)\cos(\varphi_2)\cos(\Delta)\right)$$
Where
$\varphi_1$ is the latitude of location A
$\varphi_2$ is the latitude of location B
$\Delta$ is the difference between location B's longitude and location A's longitude
atan2 is the two-argument arctangent function
Note: the formula above assumes that latitude and longitude are measured in radians r,
$-\pi < r < \pi$. To convert from degrees d ($-180 < d < 180$) to radians r, you can use the
simple formula:
$$r = \frac{d \cdot \pi}{180}$$
Write a program to prompt a user for a latitude/longitude of two locations (an origin and
a destination) and compute the directional bearing (in degrees) from the origin to the
destination. For example, if the user enters: 40.8206, -96.7056 (40.8206° N, 96.7056° W)
and 41.9483, -87.6556 (41.9483° N, 87.6556° W), your program should output something
like the following.
From (40.8206, -96.7056) to (41.9483, -87.6556):
bearing 77.594671 degrees
Exercise 2.22. General relativity tells us that time is relative to your velocity. As you
approach the speed of light (c = 299,792 km/s), time slows down relative to objects
traveling at a slower velocity. This time dilation is quantified by the Lorentz equation
$$t' = \frac{t}{\sqrt{1 - \frac{v^2}{c^2}}}$$
Where t is the time duration on the traveling space ship and $t'$ is the time duration on
(say) Earth.
For example, if we were traveling at 50% the speed of light relative to Earth, one hour in
our space ship (t = 1) would correspond to
$$t' = \frac{1}{\sqrt{1 - (.5)^2}} = 1.1547$$
hours on Earth (about 1 hour, 9.28 minutes).
Write a program that prompts the user for a velocity which represents the percentage
p of the speed of light (that is, $p = \frac{v}{c}$) and a time duration t in hours and outputs the
relative time duration on Earth.
For example, if the user enters 0.5 and 1 respectively as in our example, it should output
something like the following:
Traveling at 1 hour(s) in your space ship at
50.00% the speed of light, your friends on
Earth would experience:
1 hour(s)
9.28 minute(s)
Your output should be able to handle years, weeks, days, hours, and minutes. So if the
user inputs something like 0.9999 and 168, your output should look something like:
Traveling at 168.00 hour(s) in your space ship at
99.99% the speed of light, your friends on
Earth would experience:
1 year(s)
18 week(s)
3 day(s)
17 hour(s)
41.46 minute(s)
Exercise 2.23. Radioactive isotopes decay into other isotopes at a rate that is measured
by a half-life, H. For example, Strontium-90 has a half-life of 28.79 years. If you started
with 10 kilograms of Strontium-90, 28.79 years later you would have only 5 kilograms
(with the remaining 5 kilograms being Yttrium-90 and Zirconium-90, Strontium-90's
decay products).
Given a mass m of an isotope with half-life H we can determine how much of the isotope
remains after y years using the formula,
$$r = m \left(\frac{1}{2}\right)^{(y/H)}$$
For example, if we have m = 10 kilograms of Strontium-90 with H = 28.79, after y = 2
years we would have
$$r = 10 \left(\frac{1}{2}\right)^{(2/28.79)} = 9.5298$$
kilograms of Strontium-90 left.
Write a program that prompts the user for an amount m (mass, in kilograms) of an
isotope and its half-life H as well as a number of years y and outputs the amount of
the isotope remaining after y years. For the example above your output should look
something like the following.
Starting with 10.00kg of an isotope with half-life
28.79 years, after 2.00 years you would have
9.5298 kilograms left.
3. Conditionals
When writing code, it's important to be able to distinguish between one or more situations.
Based on some condition being true or false, you may want to perform some action if
it's true, while performing another, different action if it is false. Alternatively, you may
simply want to perform one action if and only if the condition is true, and do nothing
(move forward in your program) if it is false.
Normally, the control flow of a program is sequential: each statement is executed
top-to-bottom one after the other. A conditional statement (sometimes called a selection
control structure) interrupts this normal control flow and executes statements only if some
specified condition holds. The usual way of achieving this in a programming language is
through the use of conditional statements such as the if statement, if-else statement, and
if-else-if statement.
By using conditional statements, we can design more expressive programs whose behavior
depends on their state: if the value of some variable is greater than some threshold, we
can perform action a, otherwise, we can perform action b. You do this on a daily basis as
you make decisions for yourself. At a cafe you may want to purchase the grande coffee
which costs $2. If you have $2 or more, then you'll buy it. Otherwise, if you have less
than $2, you can settle for the normal coffee which costs $1. Yet still, if you have less
than $1 you'll not be able to make a purchase. The value of your pocket book determines
the decision and subsequent actions that you take.
Similarly, our programs need to be able to make decisions based on various conditions
(they don't actually make decisions for themselves as computers are not really intelligent;
we are simply specifying what should occur based on the conditions). Conditions in a
program are specified by coding logical statements using logical operators.
$a > b$, $a \leq b$, $a \geq b$, $a = b$, $a \neq b$
$a > 10$, $a \leq 10$, $a \geq 10$, $a = 10$, $a \neq 10$
$10 < b$, $10 > b$, $10 \leq b$, $10 \geq b$, $10 = b$, $10 \neq b$
or
      Code  Meaning                   Type
<     <     less than                 relational
>     >     greater than              relational
≤     <=    less than or equal to     relational
≥     >=    greater than or equal to  relational
=     ==    equal to                  equality
≠     !=    not equal to              equality
$$\sqrt{b^2 - 4ac} < 0$$
which could commonly be expressed in code as
sqrt(b*b - 4*a*c) < 0
Observe that both operands could be constants, such as $5 \leq 10$, but there would be little
point. Since both are constants, the truth value of the expression is already determined
before the program runs. Such an expression could easily be replaced with a simple true
or false value. These are referred to as tautologies and contradictions respectively.
We'll examine them in more detail below.
Pitfalls
Sometimes you may want to check that a variable falls in a range. For example, we may
want to test that x lies in the interval [0, 10] (between 0 and 10 inclusive on both ends).
Mathematically we could express this as
$$0 \leq x \leq 10$$
and in code, we may try to do something like
0 <= x <= 10
However, when used in code, the operators <= are binary and must be applied to two
operands. In a language, the first inequality, 0 <= x , would be evaluated and would
result in either true or false. The result is then used in the second comparison which
results in a question such as $\text{true} \leq 10$ or $\text{false} \leq 10$.
Some languages would treat this as a syntax error and not allow such an expression to
be compiled since you cannot compare a boolean value to a numerical value. However,
other languages may allow this, typically representing true with some nonzero value such
as 1 and false with 0. In either case, the expression would evaluate to true since both
0 10 and 1 10. However, this is clearly wrong: if x had a value of 20 for example, the
63
3. Conditionals
first expression would evaluate to false, making the entire expression true, but 20 6 10.
The solution is to use logical operators to express the same logic using two comparison
operators (see Section 3.1.3).
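The pitfall above can be sketched in C, which is one of the languages that accepts the
chained form. This is only an illustration: the function names are hypothetical, and the
bounds are passed as parameters rather than fixed at 0 and 10. The first function is
written with explicit parentheses to show how C reads lo <= x <= hi.

```c
#include <stdbool.h>

/* How C reads lo <= x <= hi: the left comparison yields 0 or 1,
   and that 0 or 1 is then compared with hi. */
bool in_range_chained(int lo, int x, int hi) {
    return (lo <= x) <= hi;
}

/* The intended test: two comparisons joined with a logical And. */
bool in_range(int lo, int x, int hi) {
    return (lo <= x) && (x <= hi);
}
```

With lo = 0 and hi = 10, the chained version reports that 20 is in range while the second
version does not.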
Another common pitfall when programming is to confuse the assignment operator
(typically only one equals sign, = ) with the equality operator (typically two equals signs,
== ). As before, some languages will not allow it. The expression a = 10 would not have
a truth value associated with it. Attempts to use the expression in a logical statement
would be an error.
Still, other languages may permit the expression and would give it a truth value
equal to the value of the variable. For example, a = 10 would take on the value 10 and
be treated as true (nonzero value) while a = 0 would take on the value 0 and be treated
as false (zero). In either case, we probably do not get the result that we want. Take care
that you use the proper equality comparison operator.
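As a small sketch of this second pitfall, C is a language that permits an assignment
inside a condition. The function names below are hypothetical. The doubled parentheses
in the buggy version are only there to keep compilers from warning about the assignment.

```c
#include <stdbool.h>

/* Buggy: a = 0 assigns 0 to a, and the expression takes the value 0,
   which is treated as false no matter what a originally held. */
bool is_zero_buggy(int a) {
    if ((a = 0)) {
        return true;
    }
    return false;
}

/* Correct: == compares a with 0 and leaves a unchanged. */
bool is_zero(int a) {
    if (a == 0) {
        return true;
    }
    return false;
}
```

The buggy version answers false even when it is given 0.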
Other Considerations
The comparison operators that we've examined are generally used for comparing numerical
types. However, sometimes we wish to compare non-numeric types such as single
characters or strings. Some languages allow you to use numeric operators with these
types as well. We examine specific uses in subsequent chapters.
Some dynamically typed languages (PHP, JavaScript, etc.) have additional rules when
comparison operators are used with mixed types (that is, we compare a string with a
numeric type). They may even have additional strict comparison operators such as
(a === b) and (a !== b) which are true only if the values and types match. So, for
example, (10 == "10") may be true because the values match, but (10 === "10")
would be false since the types do not match (one is an integer, the other a string). We
discuss specifics in subsequent chapters as they pertain to specific languages.
3.1.2. Negation
The negation operator is an operator that flips the truth value of the expression that
it is applied to. It is very much like the numerical negation operator which when applied
to positive numbers results in their negation and vice versa. When the logical negation
operator is applied to a variable or statement, it negates its truth value. If the variable
or statement was true, its negation is false and vice versa.
Also like the numerical negation operator, the logical negation operator is a unary
operator as it applies to only one operand. In modern logic, the symbol ¬ is used to
denote the negation operator.
a       ¬a
true    false
false   true
¬(a > 10),
¬(a ≤ b)
We will adopt this notation in our pseudocode; however, most programming languages use
the exclamation mark, ! for the negation operator, similar to its usage in the inequality
comparison, != .
The negation operator applies to the variable or statement immediately following it, thus
¬(a ≤ b) and ¬a ≤ b
are not the same thing (indeed, the second expression may not even be valid depending
on the language). Further, when used with comparison operators, it is better to use the
opposite comparison. For example,
¬(a ≤ b) and (a > b)
are equivalent, but the second expression is preferred as it is simpler. Likewise,
¬(a = b) and (a ≠ b)
are equivalent, but the second expression is preferred.
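A brief C sketch of these points follows, using hypothetical function names. In C, the
negation operator ! applied to an integer gives 1 if the integer is 0 and gives 0
otherwise, so misplacing it changes the question being asked.

```c
#include <stdbool.h>

/* Negating the whole comparison. */
bool greater_by_negation(int a, int b) {
    return !(a <= b);
}

/* The preferred, simpler equivalent. */
bool greater_direct(int a, int b) {
    return a > b;
}

/* Without the outer parentheses the negation binds to a alone:
   !a is 0 or 1, and that 0 or 1 is then compared with b. */
bool negation_misplaced(int a, int b) {
    return (!a) <= b;
}
```

For a = 1 and b = 3 the first two functions agree (false), but the third is true.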
1 This notation was first used by Heyting, 1930 [16]; prior to that the tilde symbol was used (∼p for
example) by Peano [29] and Whitehead & Russell [33]. However, the tilde operator has been adopted
to mean bit-wise negation in programming languages.
2 In logic, the wedge symbol, p ∧ q, is used to denote the logical and. It was first used, again, by
Heyting, 1930 [16], but should not be confused with the keyboard caret, ^, symbol. Many programming
languages do use the caret as an operator, but it is usually the exclusive-or operator, which is true if
and only if exactly one of its operands is true.
a       b       a And b
false   false   false
false   true    false
true    false   false
true    true    true
3.1.4. Logical Or
The logical or operator is the binary operator that is true if at least one of its operands
is true. If both of its operands are false, then the logical or is false. This is in contrast to
what is usually meant colloquially. If someone says "you can have cake or ice cream,"
usually they implicitly also mean, "but not both." With the logical or operator, if both
operands are true, the result is still true.
Many programming languages use two vertical bars (also referred to as Sheffer strokes),
|| to denote the logical Or operator.³ However, for our pseudocode we will adopt the
notation Or, thus the logical or can be expressed as a Or b. Table 3.4 contains a truth
table representation of the logical Or operator.
As with the logical And, the logical Or is used to combine logical statements to make
more complex statements. For example,
(age ≥ 18) Or (year = "senior")
3 In logic, the vee symbol, p ∨ q, is used to denote the logical Or. It was first used by Russell, 1906
[31].
a       b       a Or b
false   false   false
false   true    true
true    false   true
true    true    true
Tautologies and Contradictions
Some logical statements have the same truth value regardless of the variables involved. For
example,
a Or ¬a
is always true regardless of the value of a. To see this, suppose that a is true, then the
statement becomes
a Or ¬a = true Or false
which is true. Now suppose that a is false, then the statement is
a Or ¬a = false Or true
which again is true. A statement that is always true regardless of the truth values of its
variables is a tautology.
Similarly, the statement
a And ¬a
is always false (at least one of the operands will always be false). A statement that is
always false regardless of the truth values of its variables is a contradiction.
In most cases, it is pointless to program a conditional statement with tautologies or
contradictions: if an if-statement is predicated on a tautology it will always be executed.
Likewise, an if-statement predicated on a contradiction will never be executed. In either
case, many compilers or code analysis tools may indicate and warn about these situations
and encourage you to modify the code or to remove dead code. Some languages may
not even allow you to write such statements.
There are always exceptions to the rule. Sometimes you may wish to intentionally write
an infinite loop (see Section 4.5.2), for example, in which case a statement similar to the
following may be written.
while true do
    //some computation
end
De Morgan's Laws
Another tool to simplify your logic is De Morgan's Laws. When a logical And statement
is negated, it is equivalent to a logical Or statement in which each operand is negated,
and vice versa. That is,
¬(a And b)
and
¬a Or ¬b
are equivalent to each other;
¬(a Or b)
and
¬a And ¬b
are also equivalent to each other. Though equivalent, it is generally preferable to write
the simpler statement. From one of our previous examples, we could write
¬((0 ≤ x) And (x ≤ 10))
or we could apply De Morgan's Law and simplify this to
(0 > x) Or (x > 10)
which is more concise and arguably more readable.

Order   Operator
1       ¬
2       And
3       Or
Table 3.5.: Logical Operator Order of Precedence
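Because a logical statement with two variables has only four possible inputs, De Morgan's
Laws can be checked exhaustively. The C sketch below does this and also writes the range
example both ways; all function names are hypothetical and the code is only an
illustration.

```c
#include <stdbool.h>

/* True if both of De Morgan's Laws hold for the given a and b. */
bool de_morgan_holds(bool a, bool b) {
    bool first  = (!(a && b)) == (!a || !b);
    bool second = (!(a || b)) == (!a && !b);
    return first && second;
}

/* Tries all four combinations of truth values. */
bool de_morgan_holds_for_all(void) {
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            if (!de_morgan_holds(a, b)) {
                return false;
            }
        }
    }
    return true;
}

/* The range example: the negated And statement ... */
bool outside_negated(int x) {
    return !((0 <= x) && (x <= 10));
}

/* ... and its simplified Or form. */
bool outside_simplified(int x) {
    return (0 > x) || (x > 10);
}
```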
Order of Precedence
Recall that numerical operators have a well-defined order of precedence that is taken from
mathematics (multiplication is performed before addition for example, see Section 2.3.4).
When working with logical operators, we also have an order of precedence that somewhat
mirrors those of numerical operators. In particular, negations are always applied first,
followed by And operators, and then lastly Or operators.
For example, the statement
a Or b And c
is somewhat ambiguous. We don't just read it left-to-right since the And operator has a
higher order of precedence (this is similar to the mathematical expression a + b · c where
the multiplication would be evaluated first). Instead, this statement would be evaluated
by evaluating the And operator first and then the result would be applied to the Or
operator. Equivalently,
a Or (b And c)
If we had meant that the Or operator should be evaluated first, then we should have
explicitly written parentheses around the operator and its operands like
(a Or b) And c
In fact, it's best practice to write parentheses even if they are not necessary. Writing
parentheses is often clearer and easier to read and, more importantly, communicates intent.
By writing
a Or (b And c)
the intent is clear: we want the And operator to be evaluated first. By not writing the
parentheses we leave our meaning somewhat ambiguous and force whoever is reading the
code to recall the rules for order of precedence. By explicitly writing parentheses, we
reduce the chance for error both in writing and in reading. Besides, it's not like we're
paying by the character.
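To see that the placement of parentheses matters, here is a minimal C sketch with
hypothetical function names. In C, && (And) has higher precedence than || (Or), so the
unparenthesized a || b && c is read the same way as the first function.

```c
#include <stdbool.h>

/* The And is evaluated first; this matches the default precedence. */
bool or_of_grouped_and(bool a, bool b, bool c) {
    return a || (b && c);
}

/* Parentheses force the Or to be evaluated first. */
bool grouped_or_then_and(bool a, bool b, bool c) {
    return (a || b) && c;
}
```

With a true and both b and c false, the first function is true and the second is false.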
Operators of the same precedence are evaluated left-to-right, thus
a Or b Or c is equivalent to
((a Or b) Or c)
and
a And b And c is equivalent to
((a And b) And c)
The first operand is checking to see if d is not zero and the second checks to see if its
reciprocal is greater than 1. With short-circuiting, if d = 0, then the second operand
will not be evaluated and the division by zero will be prevented. If d ≠ 0 then the
first operand is true and so the second operand will be evaluated as normal. Without
short-circuiting, both operands would be evaluated, leading to a division by zero.
There are many other common patterns that rely on short-circuiting to avoid invalid or
undefined operations. Examples include checking that a variable is valid (defined or not
Null) before using it to evaluate an expression, or checking that an index variable is
within the range of an array's size before accessing its value.
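The guard patterns just described can be sketched in C, where && short-circuits. The
function names and parameters are hypothetical and chosen only for illustration. In each
function the right-hand operand is evaluated only if everything to its left is true.

```c
#include <stdbool.h>
#include <stddef.h>

/* The division is skipped entirely when d is zero. */
bool reciprocal_exceeds_one(double d) {
    return (d != 0.0) && (1.0 / d > 1.0);
}

/* p is dereferenced only when it is not NULL. */
bool points_to_positive(const int *p) {
    return (p != NULL) && (*p > 0);
}

/* values[index] is accessed only when index is within 0..n-1. */
bool element_is_even(const int values[], int n, int index) {
    return (index >= 0) && (index < n) && (values[index] % 2 == 0);
}
```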
Historically, the short-circuited version of the And operator was known as McCarthy's
sequential conjunction operation, which was formally defined by John McCarthy (1962)
as "if p then q, else false", eliminating the evaluation of q if p is false [22].
Because of short-circuiting, the logical And is effectively not commutative. An operator
is commutative if the order of its operands is irrelevant. For example, addition and
multiplication are both commutative,
x + y = y + x        x · y = y · x
but subtraction and division are not,
x − y ≠ y − x        x/y ≠ y/x
In logic, the And and Or operators are commutative, but when used in most programming
languages they are not,
(a And b) ≠ (b And a)
and
(a Or b) ≠ (b Or a)
It is important to emphasize that they are still logically equivalent, but they are not
effectively equivalent: because of short-circuiting, each of these statements has a
potentially different effect.
The Or operator is also short-circuited: if the first operand is true, then the truth value
of the expression is already determined to be true and so the second operand will not be
evaluated. In the expression,
a Or b
if a evaluates to true, then b is not evaluated (since if either operand is true, the entire
expression is true).
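One way to observe that operand order matters is to give an operand a side effect. In the
C sketch below (all names hypothetical), noisy_true() counts how many times it is called.
Both Or expressions are logically equivalent, but they differ in whether the call happens.

```c
#include <stdbool.h>

static int calls = 0;   /* how many times noisy_true() has run */

/* Always returns true, but records that it was evaluated. */
bool noisy_true(void) {
    calls++;
    return true;
}

/* a is evaluated first; noisy_true() runs only if a is false. */
bool or_value_first(bool a) {
    return a || noisy_true();
}

/* noisy_true() always runs; since it is true, a is never examined. */
bool or_call_first(bool a) {
    return noisy_true() || a;
}

int call_count(void) {
    return calls;
}
```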
3.2. If Statement
Normally, the flow of control (or control flow) in a program is sequential. Each instruction
is executed, one after the other, top-to-bottom and in individual statements left-to-right
just as one reads in English. Moreover, in most programming languages, each statement
executes completely before the next statement begins. A visualization of this sequential
control flow can be found in the control flow diagram in Figure 3.1(a).
However, it is often necessary for a program to make decisions. Some segments of code
may need to be executed only if some condition is satisfied. The if statement is a control
structure that allows us to write a snippet of code predicated on a logical statement.
The code executes if the logical statement is true, and does not execute if the logical
statement is false. This control flow is featured in Figure 3.1(b).
An example of the syntax using pseudocode can be found in Algorithm 3.1. The use
of the keyword if is common to most programming languages. The logical statement
associated with the if-statement immediately follows the if keyword and is usually
surrounded by parentheses. The code block immediately following the if-statement is
bound to the if-statement.
if (⟨condition⟩) then
    Code Block
end
Algorithm 3.1: An if-statement
As in the flow chart, if the ⟨condition⟩ evaluates to true, then the code block bound to
the statement executes in its entirety. Otherwise, if the condition evaluates to false, the
code block bound to the statement is skipped in its entirety.
A simple if-statement can be viewed as a "do this if and only if the condition holds."
Alternatively, "if this condition holds do this, otherwise don't." In either case, once the
if-statement executes, the program returns to the normal sequential control flow.
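As a concrete illustration in C, the hypothetical function below computes an absolute
value with a single if-statement. The code block in braces is bound to the if-statement
and runs only when the condition is true.

```c
/* Returns the absolute value of x. */
int absolute_value(int x) {
    int result = x;
    if (x < 0) {
        /* executed only when the condition is true */
        result = -x;
    }
    /* sequential control flow resumes here in either case */
    return result;
}
```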
Figure 3.1.: Control flow diagrams for sequential control flow and an if-statement. In
sequential control, statements are executed one after the other as they are
written. In an if-statement, the normal flow of control is interrupted and a
Code Block is only executed if the given condition is true, otherwise it is
not. After the if-statement, normal sequential control flow resumes.
Figure 3.2.: An if-else Flow Chart
if (⟨condition⟩) then
    Code Block A
else
    Code Block B
end
Algorithm 3.2: An if-else Statement
statement.
Let's understand how this code works. First, the if and else keywords are used
just as in the two previous control structures, but we are now also using the else if
keyword combination to specify an additional condition. Each condition, starting with
the condition associated with the if-statement, is checked in order. If and when one of the
conditions is satisfied (evaluates to true), the code block associated with that condition
is executed and all other code blocks are ignored.
Remember, the code blocks in an if-else-if control structure are mutually exclusive.
When an else-statement is present, one and only one of the code blocks will execute.
Similar to the sequential control flow, the first condition that is satisfied is the one whose
code block executes. If none of the conditions is satisfied, then the code block associated
with the else-statement is the one that is executed.
In our example, we only identified three possibilities. You can generalize an if-else-if
statement to specify as many conditions as you like. This generalization is depicted in
Algorithm 3.4 and visualized in Figure 3.3. Similar to the if-statement, the else-statement
and subsequent code block are actually optional. If omitted, then it may be possible that
none of the code blocks is executed.
Figure 3.3.: Control Flow for an If-Else-If Statement. Each condition is evaluated in
sequence. The first condition that evaluates to true results in the
corresponding code block being executed. After executing, the program continues.
Thus, each code block is mutually exclusive: at most one of them is executed.
The design of if-else-if statements must be done with care to ensure that your statements
are each mutually exclusive and capture the logic you intend. Since the first condition
that evaluates to true is the one that is executed, the order of the conditions is important.
A poorly designed if-else-if statement can lead to bugs and logical errors.
As an example, consider describing the loudness of a sound by its decibel level in Algorithm
3.5.
if decibel ≤ 70 then
    comfort ← "intrusive"
else if decibel ≤ 50 then
    comfort ← "quiet"
else if decibel ≤ 90 then
    comfort ← "annoying"
else
    comfort ← "dangerous"
end
Algorithm 3.5: If-Else-If Statement With a Bug
Suppose that decibel = 20, which should be described as a quiet sound. However, in the
algorithm, the first condition, decibel ≤ 70, evaluates to true and the sound is categorized
as intrusive. The bug is that the second condition, decibel ≤ 50, should have come first
in order to capture all decibel levels less than or equal to 50.
Alternatively, we could have followed the example in Algorithm 3.3 and completely
specified both lower bounds and upper bounds in our condition. For example, the
condition for intrusive could have been
(decibel > 50) And (decibel ≤ 70)
However, doing this is unnecessary if we order our conditions appropriately, and we can
potentially write simpler conditions if we keep in mind the fact that the code blocks of an
if-else-if statement are mutually exclusive.
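A corrected ordering might look like the following C sketch. The function name is
hypothetical. The thresholds are the ones used in the algorithm above, but the conditions
are arranged from smallest to largest so that each one only needs an upper bound.

```c
/* Describes a sound; conditions are checked in increasing order, so
   each branch is reached only if all earlier conditions were false. */
const char *describe_sound(double decibel) {
    const char *comfort;
    if (decibel <= 50) {
        comfort = "quiet";
    } else if (decibel <= 70) {
        comfort = "intrusive";      /* here decibel > 50 is already known */
    } else if (decibel <= 90) {
        comfort = "annoying";       /* here decibel > 70 is already known */
    } else {
        comfort = "dangerous";
    }
    return comfort;
}
```

With this ordering, a level of 20 is described as quiet rather than intrusive.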
Here, E is a boolean expression. If E evaluates to true, the statement takes on the
value X, which does not need to be a boolean value: it can be anything (an integer,
string, etc.). If E evaluates to false, the statement takes on the value Y.
A simple usage of this expression is to find the minimum of two values:
min = ( (a < b) ? a : b );
If a < b is true, then min will take on the value a. Otherwise it will take on the value b
(in which case a ≥ b and so b is minimal).
Most programming languages support this special syntax as it provides a nice convenience
(yet another example of syntactic sugar).
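C supports this syntax directly. The sketch below wraps the minimum example in a
function and adds a second, hypothetical function to show that the two values need not be
numbers.

```c
/* Returns the smaller of a and b (b when they are equal). */
int minimum(int a, int b) {
    return (a < b) ? a : b;
}

/* The selected values can be strings as well. */
const char *parity(int n) {
    return (n % 2 == 0) ? "even" : "odd";
}
```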
3.6. Examples
Consider the problem of computing a receipt for a meal. Suppose we have the subtotal
cost of all items in the meal. Further, suppose that we want to compute a discount
(senior citizen discount, student discount, or employee discount, etc.). We can then apply
the discount, compute the sales tax, and sum a total, reporting each detail to the user.
To do this, we first prompt the user to enter a subtotal. We can then ask the user if
there is a discount to be computed. If the user answers yes, then we again prompt them
for an amount (to allow different types of discounts). Otherwise, the discount will be
zero. We can then proceed to calculate each of the amounts above. To do this we'll need
an if-statement. We could also use a conditional statement to check to see if the input
makes sense: we wouldn't want a discount amount that is greater than 100% after all.
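The calculation described above could be sketched in C as follows. This is only one
possible arrangement: the function names, the use of a percentage for the discount, the
choice to apply tax after the discount, and the use of -1.0 to signal invalid input are
all assumptions made for this illustration, and prompting the user is left out.

```c
#include <stdbool.h>

/* A discount only makes sense between 0% and 100% (assumed rule). */
bool valid_discount(double discount_percent) {
    return (discount_percent >= 0.0) && (discount_percent <= 100.0);
}

/* Returns the total, or -1.0 if a requested discount is invalid.
   tax_rate is a fraction, for example 0.07 for a 7% sales tax. */
double receipt_total(double subtotal, bool has_discount,
                     double discount_percent, double tax_rate) {
    double discount = 0.0;          /* zero unless a discount applies */
    if (has_discount) {
        if (!valid_discount(discount_percent)) {
            return -1.0;
        }
        discount = subtotal * discount_percent / 100.0;
    }
    double discounted = subtotal - discount;
    double tax = discounted * tax_rate;
    return discounted + tax;
}
```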
if y ≠ 0 then
    q ← x/y
end
Algorithm 3.7: Preventing Division By Zero Using an If Statement
could use an if-else statement to perform alternate operations or handle the situation
differently. Defensive programming is akin to looking before leaping: before taking
another, potentially dangerous step, you look to see if you are at the edge of a cliff, and
if so you don't take that dangerous step.
Federal tax margins and marginal rates for a married couple filing jointly based on the
Adjusted Gross Income (income after deductions).
AGI is over
0
$18,150
$73,800
$148,850
$225,850
$405,100
$457,600
Table 3.6.: 2014 Tax Brackets for Married Couples Filing Jointly
In addition, one of the tax credits (which offsets tax liability) taxpayers can take is the
child tax credit. The rules are as follows:
• If the AGI is $110,000 or more, they cannot claim a credit (the credit is $0)
• Each child is worth a $1,000 credit, however at most $3,000 can be claimed
• The credit is not refundable: if the credit results in a negative tax liability, the tax
  liability is simply $0
As an example: suppose that a couple has $35,000 AGI and has two children. Their tax
liability is
$1,815 + 0.15 × ($35,000 − $18,150) = $4,342.50
However, the two children represent a $2,000 credit, so their total tax liability would be
$2,342.50.
Let's first design some code that computes the tax liability based on the margins and
rates in Table 3.6. We'll assume that the AGI is stored in a variable named income.
Using a series of if-else-if statements as presented in Algorithm 3.9, the variable tax will
contain our initial tax liability.
We can then compute the amount of the tax credit and adjust the tax accordingly by using
similar if-else-if and if-else statements, as in Algorithm 3.10.
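The child tax credit rules listed earlier translate fairly directly into if-else-if and
if-else statements. The C sketch below is one hypothetical way to write them. The
function names are assumptions, the number of children is assumed to be non-negative,
and the tax liability is taken as an input rather than computed from the brackets.

```c
/* Credit amount under the stated rules. */
double child_tax_credit(double income, int children) {
    double credit;
    if (income >= 110000.0) {
        credit = 0.0;                   /* no credit at $110,000 or more */
    } else if (children >= 3) {
        credit = 3000.0;                /* at most $3,000 can be claimed */
    } else {
        credit = 1000.0 * children;     /* $1,000 per child */
    }
    return credit;
}

/* The credit is not refundable, so the result never drops below $0. */
double apply_credit(double tax, double credit) {
    double adjusted;
    if (credit > tax) {
        adjusted = 0.0;
    } else {
        adjusted = tax - credit;
    }
    return adjusted;
}
```

For the worked example, a tax of 4342.50 with two children and an income of 35000 gives
a credit of 2000 and an adjusted tax of 2342.50.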
3.7. Exercises
Exercise 3.1. Write a program that prompts the user for an x and a y coordinate in
the Cartesian plane and prints out a message indicating if the point (x, y) lies on an axis
(x or y axis, or both) or what quadrant it lies in (see Figure 3.4).
[Figure 3.4: the Cartesian plane divided into Quadrants I, II, III, and IV]
Exercise 3.4. Various substances have different boiling points. A selection of substances
and their boiling points can be found in Table 3.8. Write a program that prompts the
user for the observed boiling point of a substance in degrees Celsius and identifies the
substance if the observed boiling point is within 5% of the expected boiling point. If the
data input is more than 5% higher or lower than any of the boiling points in the table, it
should output "Unknown substance."
Substance
Methane
Butane
Water
Nonane
Mercury
Copper
Silver
Gold
Resistivity (nΩ·m)
16.78
26.50
35.6
72.0
96.10
Color    Wavelength range (nm)
Violet   380 - 450
Blue     450 - 475
Indigo   476 - 495
Green    495 - 570
Yellow   570 - 590
Orange   590 - 620
Red      620 - 750
Table 3.10.: Visible Light Spectrum Ranges
(i) Hardness must be greater than 50
(ii) Carbon content must be less than 0.7
(iii) Tensile strength must be greater than 5600
A grade of 5 through 10 is assigned to the steel according to the conditions in Table 3.11.
Write a program that will read in the hardness, carbon content, and tensile strength as
Grade   Conditions
10      All three conditions are met
9       Conditions (i) and (ii) are met
8       Conditions (ii) and (iii) are met
7       Conditions (i) and (iii) are met
6       If only 1 of the three conditions is met
5       If none of the conditions are met
Table 3.11.: Grades of Steel
$\mathrm{BMI} = \frac{m}{h^2} \times 703.069579$
where m is the person's mass (in lbs) and h is the person's height (in whole inches).
Write a program that reads in a person's mass and height as input and outputs a
characterization of the person's health with respect to the categories in Table 3.12.
Range               Category
BMI < 15            Very severely underweight
15 ≤ BMI < 16       Severely underweight
16 ≤ BMI < 18.5     Underweight
18.5 ≤ BMI < 25     Normal
25 ≤ BMI < 30       Overweight
30 ≤ BMI < 35       Obese Class I
35 ≤ BMI < 40       Obese Class II
BMI ≥ 40            Obese Class III
Figure 3.6.: Intersection of Two Rectangles (corner points (2, 1), (6, 7.5), (4, 5.5), and (8.5, 8.25))
should output "empty intersection". Your program should also be robust enough to
check that the input is valid (it should not accept empty or reversed rectangles).
Your program should read in x1, y1, x2, y2, x3, y3, x4, y4 from the user and perform the
computation above. As an example, the values 2, 1, 6, 7.5, 4, 5.5, 8.5, 8.25 would correspond
to the two rectangles in Figure 3.6.
The output for this instance should look something like the following.
Intersecting rectangle: (4, 5.5), (6, 7.5)
Area: 4.00
Exercise 3.11. Write an app to help people track their cell phone usage. Cell phone
plans for this particular company give you a certain number of minutes every 30 days
which must be used or they are lost (no rollover). We want to track the average number
of minutes used per day and inform the user if they are using too many minutes or can
afford to use more.
Write a program that prompts the user to enter the following pieces of data:
• Number of minutes in the plan per 30 day period, m
• The current day in the 30 day period, d
• The total number of minutes used so far, u
The program should then compute whether the user is over, under, or right on the average
daily usage under the plan. It should also inform them of how many minutes are left
and how many, on average, they can use per day for the rest of the month. Of course, if
they've run out of minutes, it should inform them of that too.
For example, if the user enters m = 250, d = 10, and u = 150, your program should print
out something similar to the following.
10 days used, 20 days remaining
Average daily use: 15 min/day
You are EXCEEDING your average daily use (8.33 min/day),
continuing this high usage, you'll exceed your minute plan by
200 minutes.
To stay below your minute plan, use no more than 5 min/day.
Of course, if the user is under their average daily use, a different message should be
presented. You are allowed/encouraged to compute any other stats for the user that you
feel would be useful.
Exercise 3.12. Write a program to help a floor tile company determine how many tiles
they need to send to a work site to tile a floor in a room. For simplicity, assume that all
rooms are perfectly rectangular with no obstructions; we will also omit any additional
measurements related to grouting.
Further, we will assume that all tile is laid in a grid pattern centered at the center of the
room. That is, four tiles will meet at their corners at the center of the room with tiles
laid out to the edge of the room. Thus, it may be the case that the final row and/or
column at the edge may need to be cut. Also note that if the cut is short enough, the
remaining tile can be used on the other end of the room (same goes for the corners).
The program will take the following input:
• w - the width of the room
• l - the length of the room
• t - width/length of the tile (all tiles are perfectly square)
If we can use whole tiles to perfectly fit the room, then we do so. For example, on
the input (10, 10, 1), we could perfectly tile a 10 × 10 room with 100 1 × 1 tiles. If the
tiles don't perfectly fit, then we have to consider the possibility of waste and/or reuse.
Consider the examples in Figure 3.7.
The first example is from the input (9.8, 100, 1). In this case, we lay the tiles from the
center of the room (8 full tile lengths) but are left with 0.9 on either side. If we cut a
tile to fit the left side, we are left with only .1 tile which is too short for the right side.
Therefore, we are forced to waste the 0.1 length and cut a full tile for the right side. In
[Figure 3.7: tiling examples, (a) Example 1 and (b) Example 2]
4. Loops
Computers are really good at automation. A key aspect of automation is that we be
able to repeat a process over and over on different pieces of data until some condition
is met. For example, if we have a collection of numbers and we want to find their sum
we would iterate over each number, adding it to a total, until we have examined every
number. Another example may include sending an email message to each student in a
course. To automate the process, we could iterate over each student record and for each
student we would generate and send the email.
Automated repetition is where loops come in handy. Computers are perfectly suited for
performing such repetitive tasks. We can write a single block of code that performs some
action or processes a single piece of data, then we can write a loop around that block of
code to execute it a number of times.
Loops provide a much better alternative than repeating (cut-paste-cut-paste) the same
code over and over with different variables. Indeed, we wouldn't even do this in real
life. Suppose that you took a 100 mile trip. How would you describe it? Likely, you
wouldn't say, "I drove a mile, then I drove a mile, then I drove a mile, . . ." repeated 100
times. Instead, you would simply state "I drove 100 miles" or maybe even, "I drove until
I reached my destination."
Loops allow us to write concise, repeatable code that can be applied to each element in
a collection or perform a task over and over again until some condition is met. When
writing a loop, there are three essential components:
• An initialization statement that specifies how the loop begins
• A continuation (or termination) condition that specifies whether the loop should
  continue to execute or terminate
• An iteration statement that makes progress toward the termination condition
The initialization statement is executed before the loop begins and serves as a way to set
the loop up. Typically, the initialization statement involves setting the initial value of
some variable.
The continuation condition is a logical statement (that evaluates to true or false) that
specifies if the loop should continue (if the value is true) or should terminate (if the value
is false). Upon termination, code returns to a sequential control flow and the program
continues.
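[Figure: flow chart of a while loop. Initialization: i ← 1. Continuation: i ≤ 10? If true,
the loop body and then the iteration i ← (i + 1) are executed and the condition is checked
again; if false, the remaining program continues.]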
i ← 1 //Initialization statement
while i ≤ 10 do
    Perform some action
    i ← (i + 1) //Iteration statement
end
executed, the continuation condition is checked. Since i = 1 ≤ 10, the condition evaluates
to true and the loop code block is executed. The last line of the code block is the iteration
statement, where i is incremented by 1 and now has a value of 2. The code returns to
the top of the loop and again evaluates the continuation condition (which is still true as
i = 2 ≤ 10).
On the 10th iteration of the loop when i = 10, the loop will execute for the last time. At
the end of the loop, i is incremented to 11. The loop still returns to the top and the
continuation condition is still checked one last time. However, since i = 11 ≰ 10, the
condition is now false and the loop terminates. Regular sequential control flow returns
and the program continues executing whatever code is specified after the loop.
4.1.1. Example
In the previous example we knew how many times we wanted the loop to execute. Though
you can use a while loop in counter-controlled situations, while loops are typically used
in scenarios when you may not know how many iterations you want the loop to execute
for. Instead of a straightforward iteration, the loop itself may update a variable in a
less-than-predictable manner.
As an example, consider the problem of normalizing a number as is typically done in
scientific notation. Given a number x (for simplicity, we'll consider x ≥ 0), we divide
it by 10 until its value is in the interval [0, 10), keeping track of how many times we've
divided by 10. For example, if we have the number x = 32,145.234, we would divide by
10 four times, resulting in 3.2145234 so that we could express it as
$3.2145234 \times 10^4$
A simple realization of this process is presented in Algorithm 4.2. The number of times
the loop executes depends on how large x is. For the example mentioned, it executes 4
times; for an input of x = 10,000,000 it would execute 7 times. A while loop allows us
to specify the repetition process without having to know up front how many times it will
execute.
Input : A number x, x ≥ 0
k ← 0
while x ≥ 10 do
    x ← (x/10)
    k ← (k + 1)
end
output x, k
Algorithm 4.2: Normalizing a Number With a While Loop
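A C version of this normalization might look like the sketch below. The function name and
the use of a pointer to return the normalized value are choices made for this illustration.
The loop continues while the value is at least 10 so that the result always lands in
[0, 10), which also gives the 7 divisions described for 10,000,000.

```c
/* Divides *x by 10 until it is less than 10 and returns the number
   of divisions performed. Assumes *x is not negative. */
int normalize(double *x) {
    int k = 0;
    while (*x >= 10.0) {
        *x = *x / 10.0;
        k = k + 1;
    }
    return k;
}
```

Calling normalize on a variable holding 32145.234 returns 4 and leaves approximately
3.2145234 in the variable.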
Note the additional syntax: in many programming languages, semicolons are used at the
end of executable statements. Semicolons are also used to delimit each of the three loop
components in a for-loop (otherwise there may be some ambiguity as to where each of
the components begins and ends). However, the semicolons are typically only placed
after the initialization statement and continuation condition and are omitted after the
iteration statement. A more concrete example is given in Algorithm 4.4 which represents
an equivalent code snippet as the counter-controlled while loop we examined earlier.
for i ← 1; i ≤ 10; i ← (i + 1) do
    Perform some action
end
Algorithm 4.4: Counter-Controlled For Loop
Though all three components are written on the same line, the initialization statement
is only ever executed once, at the beginning of the loop. The continuation condition is
checked prior to each and every execution of the loop. Only if it evaluates to true does
the loop body execute. The iteration statement is performed at the end of each loop
iteration.
4.2.1. Example
As a more concrete example, consider Algorithm 4.5 in which we do the same iteration
(i will take on the values 1, 2, 3, . . . , 10), but in each iteration we add the value of i for
that iteration to a running total, sum.
sum ← 0
for i ← 1; i ≤ 10; i ← (i + 1) do
    sum ← (sum + i)
end
Algorithm 4.5: Summation of Numbers in a For Loop
Again, the initialization of i = 1 is only performed once. On the first iteration of the loop,
i = 1 and so sum will be given the value sum + i = 0 + 1 = 1. At the end of the loop, i
will be incremented and will have a value of 2. The continuation condition is still satisfied, so
once again the loop body executes and sum will be given the value sum + i = 1 + 2 = 3.
On the 10th (last) iteration, sum will have a value 1 + 2 + 3 + ⋯ + 9 = 45 and i = 10.
Thus sum + i = 45 + 10 = 55, after which i will be incremented to 11. The continuation
condition is still checked, but since 11 ≰ 10, the loop body will not be executed and the
loop will terminate.
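The same summation can be sketched in Python. Python's for loop iterates over a range rather than spelling out the three components, so `range(1, 11)` plays the role of the initialization, continuation condition, and increment; the variable name `total` is an assumption (used because `sum` is a built-in name in Python).

```python
total = 0
for i in range(1, 11):  # i takes on the values 1, 2, ..., 10
    total = total + i

print(total)
```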
i ← 1
do
    Perform some action
    i ← (i + 1)
while i ≤ 10
Figure 4.2.: A Do-While Loop Flow Chart. The continuation condition is checked after
the loop body.
example is if the loop body performs an operation that may result in an error code
(or flag) that is either true (an error occurred) or false (no error occurred).
do
    Read some data
    isError ← result of reading
while isError
Algorithm 4.7: Flag-Controlled Do-While Loop
From this perspective, a do-while loop can also be seen as a do-until loop: perform a
task until some condition is no longer satisfied. The subtle wording difference implies
that we'll perform the action before checking to see if it should be performed again.
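Not every language has a do-while statement. Python, for example, does not, but the same "act first, then check" behavior can be emulated with a loop that exits from the bottom of its body. The sketch below is illustrative only: `read_until_success` and the list of simulated read results are assumptions standing in for a real read operation.

```python
def read_until_success(results):
    """Keep "reading" until a read reports no error; return the number of tries.

    results is an iterator of booleans, where True means an error occurred.
    """
    tries = 0
    while True:
        is_error = next(results)  # the body always runs at least once
        tries += 1
        if not is_error:          # the check happens after the body
            break
    return tries


tries = read_until_success(iter([True, True, False]))
print(tries)
```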
How the elements are stored in the collection and how they are iterated over is not our
(primary) concern. We simply want to apply the same block of code to each element;
the foreach loop handles the details of how each element is iterated over. The syntax
also provides us a way to refer to each element (the a variable in the algorithm). On
each iteration of the loop, the foreach loop updates the reference a to the next element
in the array. The loop terminates after it has iterated through each and every one of the
elements. In this way, a foreach loop simplifies the syntax: we don't have to specify any
of the three components ourselves. This is very convenient. As a more concrete example,
consider iterating over each student in a course roster. For each student, we wish to
compute their grade and then email them the results. The foreach loop allows us to do
this without worrying about the iteration details (see Algorithm 4.9).
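A sketch of the roster idea in Python, whose for loop is a foreach loop. The roster contents, the averaging used as a "grade," and the printed message standing in for an email are all assumptions made for illustration.

```python
# Hypothetical roster: each entry is a (name, list of scores) pair.
roster = [("Ada", [90, 85, 100]), ("Grace", [70, 80, 75])]

results = {}
for name, scores in roster:  # no index, condition, or increment to manage
    grade = sum(scores) / len(scores)
    results[name] = grade
    print(f"To {name}: your grade is {grade:.1f}")  # stand-in for sending an email
```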
n ← 10
m ← 20
for i ← 1; i ≤ m; i ← (i + 1) do
    for j ← 1; j ≤ n; j ← (j + 1) do
        output (i, j)
    end
end
termination/continuation condition. Such a loop is referred to as an infinite loop. As an
example, suppose we forgot the increment operation from a previous example.
sum ← 0
i ← 1
while i ≤ 10 do
    sum ← (sum + i)
end
Algorithm 4.11: Infinite Loop
In Algorithm 4.11 we never make progress toward the terminating condition! Thus, the
loop will execute forever: i will continue to have the value 1 and, since 1 ≤ 10, the loop
body will continue to execute. Care is needed in the design of your loops to ensure that
they make progress toward the termination condition.
Most of the time an infinite loop is not something you want and usually you must terminate
your buggy program externally (sometimes referred to as "killing" it). However, infinite
loops do have their uses. A poll loop is a loop that is intended to not terminate. At a
system level, for example, a computer may poll devices (such as input/output devices)
one-by-one to see if there is any active input/output request. Instead of terminating,
the poll loop simply repeats itself, returning back to the first device. As long as the
computer is in operation, we don't want this process to stop. This can be viewed as an
infinite loop as it doesn't have any termination condition.
Though proper testing and debugging should reduce the likelihood of such bugs, there
are several notable instances in which an infinite loop impacted real software. One
such instance was the Microsoft Zune bug. The Zune was a portable music player, a
competitor to the iPod. At about midnight on the night of December 31st, 2008, Zunes
everywhere failed to start up properly. A firmware clock driver designed by a third-party
company contained the following code.
2008 was a leap year, so the check on line 2 evaluated to true. However, though December
31st, 2008 was the 366th day of the year (days = 366), the comparison on line 3 evaluated
to false and the loop was repeated without any of the program state being updated. The
problem resolved itself 24 hours later when it was the 367th day and the comparison on
line 3 evaluated to true. The comparison on line 3 should have been days >= 366.
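The failure mode can be reproduced in a few lines. The following Python sketch is a reconstruction from the description above, not the driver's actual source: the names `is_leap_year` and `resolve_year`, the `use_fix` flag, and the `max_iterations` guard (added so the demonstration itself cannot hang) are all assumptions.

```python
def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def resolve_year(days, year, use_fix=False, max_iterations=1000):
    """Reduce a day count to a (year, days) pair.

    Returns (year, days, terminated). terminated is False if the loop had to
    be cut off by the guard because it was making no progress.
    """
    iterations = 0
    while days > 365:
        iterations += 1
        if iterations > max_iterations:
            return year, days, False
        if is_leap_year(year):
            # Buggy comparison: days > 366. Suggested fix: days >= 366.
            passes = (days >= 366) if use_fix else (days > 366)
            if passes:
                days -= 366
                year += 1
            # Otherwise nothing changes and the loop repeats with the same state.
        else:
            days -= 365
            year += 1
    return year, days, True


print(resolve_year(366, 2008))                # the guard has to cut the loop off
print(resolve_year(366, 2008, use_fix=True))  # terminates normally
```

A test on the single input days = 366 in a leap year is enough to expose the problem, which is exactly the corner case discussed next.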
The failure was that this code was never tested on the corner cases that it was designed
for. No one thought to test the driver to see if it worked on the last day of a leap year.
The code worked the vast majority of the time, but this illustrates the need for rigorous
testing.
Other errors can be avoided by using the proper types of variables. Recall that operations
involving floating-point numbers can have round-off and precision errors; for example,
$\frac{1}{3} + \frac{1}{3} + \frac{1}{3}$ may not be equal to one. It is best to avoid using
floating-point numbers or comparisons in the control of your loops. Boolean and integer
types are much less error prone.
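A small Python sketch of the hazard, using 0.1 rather than one-third because its round-off is easy to observe with standard double-precision arithmetic. The variable names are assumptions.

```python
# Adding 0.1 ten times does not produce exactly 1.0 in binary floating point.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)  # the comparison is False

# A loop such as "while total != 1.0" would therefore never terminate.
# Controlling the loop with an integer counter avoids the problem.
steps = 0
value = 0.0
for i in range(10):
    value += 0.1
    steps += 1
print(steps)
```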
Finally, you must always ensure that your loops are making progress toward the termination condition. A failure to properly increment a counter can lead to incorrect results or
even an infinite loop.
How do we make progress toward the termination condition? What variable(s)
need to be incremented and how?
4.7. Examples
4.7.1. For vs While Loop
Recall the classic geometric series,
$$\frac{1}{1 - x} = \sum_{k=0}^{\infty} x^k = 1 + x + x^2 + x^3 + \cdots$$
Obviously a computer cannot compute an infinite series as it is required to terminate in
a finite number of steps. Thus, we can approach this problem in a number of different
ways.
One way we could approximate the series is to compute it out to a fixed, say n, number
of terms. To do so, we could initialize a sum variable to zero, then iteratively compute
and add terms to the sum until we have computed n terms. To keep track of the terms,
we can define a counter variable, k as in the summation.
Following our strategy, we can identify the initialization: k should start at 0. The iteration
is also easy: k should be incremented by 1 each time. The continuation condition should
continue the loop until we have computed n terms. However, since k starts at 0, we
would want to continue while k < n. We would not want to continue the iteration when
k = n as that would make n + 1 iterations (again since k starts at 0). Further, since
we know the number of iterations we want to execute, a for loop is arguably the most
appropriate loop for this problem. Such a solution is presented in Algorithm 4.12.
Input : x, n ≥ 0
sum ← 0
for k ← 0; k < n; k ← (k + 1) do
    sum ← (sum + x^k)
end
output sum
Algorithm 4.12: Computing the Geometric Series Using a For Loop
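A Python sketch of the fixed-number-of-terms approach. The function name `geometric_sum` is an assumption; `range(n)` produces k = 0, 1, ..., n − 1, which is exactly n iterations.

```python
def geometric_sum(x, n):
    """Sum the first n terms of the geometric series: x^0 + x^1 + ... + x^(n-1)."""
    total = 0.0
    for k in range(n):
        total = total + x ** k
    return total


print(geometric_sum(0.5, 10))
```

For x = 0.5 the result approaches 1/(1 − 0.5) = 2 as n grows.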
only affect the summation less and less. That is, the current value represents a "good
enough" approximation. That way, if someone wanted an even better approximation,
they could specify a smaller ε.
This approach will be more straightforward with a while loop since the continuation
condition will be more along the lines of "while the estimation is not yet good enough,
continue the summation." This approach will also be easier if we keep track of both
a current and a previous value of the summation; computing and checking the
difference between them is then straightforward.
Input : x, ε > 0
sum_prev ← 0
sum_curr ← 1
k ← 1
while |sum_prev − sum_curr| ≥ ε do
    sum_prev ← sum_curr
    sum_curr ← (sum_curr + x^k)
    k ← (k + 1)
end
output sum_curr
Algorithm 4.13: Computing the Geometric Series Using a While Loop
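The tolerance-based version might be sketched in Python as follows. The function name `geometric_sum_eps` is an assumption, and the sketch assumes |x| < 1; for other values of x the terms do not shrink and the loop may never terminate.

```python
def geometric_sum_eps(x, epsilon):
    """Sum the geometric series until consecutive sums differ by less than epsilon.

    Assumes |x| < 1 so that the terms shrink toward zero.
    """
    sum_prev = 0.0
    sum_curr = 1.0  # the k = 0 term
    k = 1
    while abs(sum_prev - sum_curr) >= epsilon:
        sum_prev = sum_curr
        sum_curr = sum_curr + x ** k
        k = k + 1
    return sum_curr


print(geometric_sum_eps(0.5, 1e-6))
```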
Input : n > 1
for i ← 2; i ≤ √n; i ← (i + 1) do
    if i divides n then
        output composite
    end
end
output prime
Algorithm 4.14: Determining if a Number is Prime or Composite
prime numbers m there are. A key observation is that we've already solved part of
the problem: determining if a given number is prime in the previous exercise. To solve
this problem, we could reuse or adapt our previous solution. In particular, we could
surround the previous solution in an outer loop and iterate over integers from 2 up to m.
The inner loop would then determine if the integer is prime and instead of outputting a
result, could keep track of a counter of the number of primes. This solution is presented
in Algorithm 4.15.
Input : m > 1
numberOfPrimes ← 0
for j ← 2; j ≤ m; j ← (j + 1) do
    isPrime ← true
    for i ← 2; i ≤ √j; i ← (i + 1) do
        if i divides j then
            isPrime ← false
        end
    end
    if isPrime then
        numberOfPrimes ← (numberOfPrimes + 1)
    end
end
output numberOfPrimes
Algorithm 4.15: Counting the number of primes.
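A Python sketch of the nested-loop prime count. The function name `count_primes` is an assumption, and so is the choice to stop the inner loop at the integer square root of j (via `math.isqrt`): any divisor larger than √j must pair with one smaller than √j, so checking further is unnecessary.

```python
import math


def count_primes(m):
    """Count the primes in the range 2..m using trial division."""
    number_of_primes = 0
    for j in range(2, m + 1):
        is_prime = True
        for i in range(2, math.isqrt(j) + 1):
            if j % i == 0:  # i divides j, so j is composite
                is_prime = False
        if is_prime:
            number_of_primes += 1
    return number_of_primes


print(count_primes(100))
```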
Further, banks charge an amount of interest on a loan measured as an Annual Percentage
Rate (APR). Given these conditions, the borrower makes monthly payments determined
by the following formula.
$$\text{monthlyPayment} = \frac{iP}{1 - (1 + i)^{-n}}$$
where $i = \frac{apr}{12}$ is the monthly interest rate, and n is the number of terms (in months).
For simplicity, suppose we borrow P = $1,000 at 5% interest (apr = 0.05) to be paid
back over a term of 2 years (n = 24). Our monthly payment would thus be
$$\text{monthlyPayment} = \frac{\frac{0.05}{12} \cdot 1000}{1 - \left(1 + \frac{0.05}{12}\right)^{-24}} = \$43.87$$
When the borrower makes the first month's payment, some of it goes to interest, some of
it goes to paying down the balance. Specifically, one month's interest on $1,000 is
$$\$1{,}000 \times \frac{0.05}{12} = \$4.17$$
and so $43.87 − $4.17 = $39.70 goes to the balance, making the new balance $960.30.
The next month, this new balance is used to compute the new interest payment,
$$\$960.30 \times \frac{0.05}{12} = \$4.00$$
And so on until the balance is fully paid. This process is known as loan amortization.
Let's write a program that will calculate a loan amortization schedule given the inputs as
described above. To start, we'll need to compute the monthly payment using the formula
above and for that we'll need a monthly interest rate. The balance will be updated
month-to-month, so we'll use another variable to represent it.
Finally, we'll want to track the current month in the loan schedule process.
Once we have these variables set up, we can start a loop that will repeat once for each
month in the loan schedule. We could do this using either type of loop, but for this
exercise, let's use a while loop. Using our month variable, we'll start by initializing it to
1 and run the loop through the last month, n.
On each iteration we compute that month's interest and principal payments as above, update
the balance, and also update our month counter variable to ensure we're
making progress toward the termination condition. On each iteration we'll also output
each of these variables to the user. The full program can be found in Algorithm 4.16.
If we were to actually implement this we'd need to be more careful. This outlines the basic
process, but keep in mind that US dollars are only accurate to cents. A monthly payment
can't be $43.871. We'll need to take care to round properly. This introduces another
issue: because of rounding, the final month's payment may not match the expected monthly
payment (we may over or under pay in the final month). An actual implementation may
need to handle the final month's payment separately with different logic and operations
than are inside the loop.
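The process described above might be sketched in Python as follows. This is an illustration under stated assumptions rather than a finished implementation: the function name `amortization_schedule` and its return format are hypothetical, and, as just noted, the sketch does not round to cents or treat the final payment specially.

```python
def amortization_schedule(principal, apr, n):
    """Return the monthly payment and a list of (month, interest, principal_paid, balance)."""
    i = apr / 12  # monthly interest rate
    monthly_payment = (i * principal) / (1 - (1 + i) ** (-n))
    balance = principal
    month = 1
    rows = []
    while month <= n:
        interest = balance * i
        principal_paid = monthly_payment - interest
        balance = balance - principal_paid
        rows.append((month, interest, principal_paid, balance))
        month = month + 1  # progress toward the termination condition
    return monthly_payment, rows


payment, rows = amortization_schedule(1000, 0.05, 24)
print(f"Monthly payment: {payment:.2f}")
for month, interest, principal_paid, balance in rows:
    print(f"{month:2d} {interest:8.2f} {principal_paid:8.2f} {balance:10.2f}")
```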
4.8. Exercises
Exercise 4.1. Write a for-loop and a while-loop that accomplishes each of the following.
(a) Prints all integers 1 thru 100 on the same line delimited by a single space
(b) Prints all even integers 0 up to n in reverse order
(c) A list of integers divisible by 3 between a and b where a, b are parameters or inputs
(d) Prints all positive powers of two up to $2^{30}$: 1, 2, 4, . . . , 1073741824 one value per
line (try computing up to $2^{31}$ and $2^{32}$ and discern reasons for why it may fail)
(e) Prints all even integers 2 thru 200 on 10 different lines (10 numbers on each line)
in reverse order
(f) Prints the following pattern of numbers (hint: use two nested loops; the result can
be computed using some value of i + 10j)
11 21 31 41 51 61 71 81 91  101
12 22 32 42 52 62 72 82 92  102
13 23 33 43 53 63 73 83 93  103
14 24 34 44 54 64 74 84 94  104
15 25 35 45 55 65 75 85 95  105
16 26 36 46 56 66 76 86 96  106
17 27 37 47 57 67 77 87 97  107
18 28 38 48 58 68 78 88 98  108
19 29 39 49 59 69 79 89 99  109
20 30 40 50 60 70 80 90 100 110
Exercise 4.2. Civil engineers have come up with two different models on how a city's
population will grow over the next several years. The first projection assumes a 10%
annual growth rate while the second projection assumes a linear growth rate of 50,000
additional citizens per year. Write a program to project the population growth under
both models. Take, as input, the initial population of the city along with a number of
years to project the population.
In addition, compute how many years it would take to double the population under each
model.
Exercise 4.3. Write a loan program similar to the amortization schedule program we
developed in Section 4.7.3. However, give the user an option to specify an extra monthly
payment amount in order to pay off the loan early. Calculate how much quicker the loan
gets paid off and how much they save in interest.
Exercise 4.4. The rate of decay of a radioactive isotope is given in terms of its half-life
H, the time lapse required for the isotope to decay to one-half of its original mass.
For example, the isotope Strontium-90 ($^{90}$Sr) has a half-life of 28.9 years. If we start
with 10kg of Strontium-90 then 28.9 years later you would expect to have only 5kg of
Strontium-90 (and 5kg of Yttrium-90 and Zirconium-90, isotopes which Strontium-90
decays into).
Write a program that takes the following input:
Atomic Number (integer)
Element Name
Element Symbol
H (half-life of the element)
m, an initial mass in grams
Your program will then produce a table detailing the amount of the element that remains
after each year until less than 50% of the original amount remains. This amount can be
computed using the following formula:
$$r = m \times \left(\frac{1}{2}\right)^{(y/H)}$$
where y is the number of years elapsed, and H is the half-life of the isotope in years.
For example, using your program on Strontium-90 (symbol: Sr, atomic number: 38) with
a half-life of 28.9 years and an initial amount of 10 grams would produce a table something
like:
Strontium-90 (38-Sr)
Elapsed Years    Amount
-------------------------
-                10g
1                9.76g
2                9.53g
3                9.30g
...
28               5.11g
29               4.99g
Exercise 4.5. In this exercise, you will develop a program that assists people in saving
for a retirement using a tax-deferred 401k program.
Your application will allow a user to enter the following inputs:
An initial starting balance
A monthly contribution amount (we'll assume it's the same over the life of the
savings plan)
An (average) annual rate of return
An (average) annual rate of inflation
In addition, your program will allow a user to choose between two different scenarios:
The first will allow the user to input a number of years left until retirement. It
will then compute a monthly savings table which will be a projection out to that
many years.
The second will take a target dollar amount and compute a monthly savings table
until the account balance has reached this target dollar amount.
The monthly interest rate should be inflation-adjusted. The inflation-adjusted rate of
return can be computed with the following formula.
$$\frac{1 + \text{rate of return}}{1 + \text{inflation rate}} - 1$$
To get the monthly rate, simply divide by 12. Each month, interest is applied to the
balance at this rate along with the monthly contribution amount.
An example: if we start with $10,000 and contribute $500 monthly with a return rate of
9% and an inflation rate of 1.2%, the first few lines of our table would look something like the
following.
Payment   Interest Earned   Contribution   Balance
1         $  64.23          $ 500.00       $ 10564.23
2         $  67.85          $ 500.00       $ 11132.08
3         $  71.50          $ 500.00       $ 11703.58
4         $  75.17          $ 500.00       $ 12278.75
...
$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$
The variance,
$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$
Minimum: 2.71
Maximum: 42.00
Mean: 12.77
Variance: 228.77
Standard Deviation: 15.13
Exercise 4.7. The ancient Greek mathematician Euclid developed a method for finding
the greatest common divisor of two positive integers, a and b. His method is as follows:
1. If the remainder of a/b is 0 then b is the greatest common divisor.
2. If it is not 0, then find the remainder r of a/b and assign b to a and the remainder
r to b.
3. Return to step (1) and repeat the process.
Write a program that uses a function to perform this procedure. Display the two integers
and the greatest common divisor.
Exercise 4.8. Write a program to estimate the value of e ≈ 2.718281 . . . using the series:
$$e = \sum_{k=0}^{\infty} \frac{1}{k!}$$
Obviously, you will need to restrict the summation to a finite number of terms, n.
Exercise 4.9. The value of π can be expressed by the following infinite series:
$$\pi = 4\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \frac{1}{11} + \frac{1}{13} - \cdots\right)$$
An approximation can be made by taking the first n terms of the series. For n = 4, the
approximation is
$$\pi \approx 4\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7}\right) = 2.8952$$
Write a program that takes n as input and outputs an approximation of π according to
the series above.
Exercise 4.10. The sine function can be approximated using the following Taylor series.
$$\sin(x) = \sum_{i=0}^{\infty} \frac{(-1)^i}{(2i + 1)!} x^{2i+1} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$$
Write a function that takes x and n as inputs and approximates sin x by computing the
first n terms in the series above.
Exercise 4.11. One way to compute π is to use Machin's formula:
$$\frac{\pi}{4} = 4\arctan\frac{1}{5} - \arctan\frac{1}{239}$$
To compute the arctan function, you could use the following series:
$$\arctan x = \frac{x}{1} - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots = \sum_{i=0}^{\infty} \frac{(-1)^i}{2i + 1} x^{2i+1}$$
Write a program to estimate π using these formulas but allowing the user to specify how
many terms to use in the series to compute it. Compare the estimate with the built-in
definition of π in your language.
Exercise 4.12. The arithmetic-geometric mean of two numbers x, y, denoted M (x, y) (or
$$g_{n+1} = \sqrt{a_n g_n}$$
The two sequences will converge to the same number which is the arithmetic-geometric
mean of x, y. Obviously we cannot compute an infinite sequence, so we compute until
$|a_n - g_n| < \epsilon$ for some small number ε.
Exercise 4.13. The integral of a function is a measure of the area under its curve. One
numerical method for computing the integral of a function f(x) on an interval [a, b] is
the rectangle rule. Specifically, an interval [a, b] is split up into n equal subintervals of
size $h = \frac{b - a}{n}$. Then the integral is approximated by computing:
$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{n-1} f(a + ih) \cdot h$$
Write a program to approximate an integral using the rectangle method. For this
particular exercise you will integrate the function
$$f(x) = \frac{\sin x}{x}$$
For reference, the function is depicted in Figure 4.3. Write a program that will read the
end points a, b and the number of subintervals n and computes the integral of f using
the rectangle method. It should then output the approximation.
Exercise 4.14. Another way to compute an integral is to use a technique called Monte
Carlo Integration, a randomized numerical integration method.
Given the interval [a, b], we enclose the function in a region of interest with a rectangle
of a known area $A_r$. We then randomly select n points within the rectangle and count
the number of random points that are within the function's curve. If m of the n points
are within the curve, we can estimate the integral to be
$$\int_a^b f(x)\,dx \approx \frac{m}{n} A_r$$
Consider again the function $f(x) = \frac{\sin(x)}{x}$. Note that the global maximum and minimum
of this function are 1 and ≈ −0.2172 respectively. Therefore, we can also restrict the
rectangle along the y-axis from −0.25 to 1. That is, the lower left of the rectangle will be
(a, −0.25) and the upper right will be (b, 1) for a known area of
$$A_r = |a - b| \times 1.25$$
Figure 4.4 illustrates the rectangle for the interval [−5, 5].
Exercise 4.15. Consider a ball trapped in a 2-D box. Suppose that it has an initial
position (x, y) within the box (the box's dimensions are specified by its lower left $(x_\ell, y_\ell)$
and upper right $(x_r, y_r)$ points) along with an initial angle of travel θ in the range
[0, 2π). As the ball travels in this direction it will eventually collide with one of the sides
of the box and bounce off. For this model, we will assume no loss of velocity (it keeps
going) and its angle of reflection is perfect.
Write a program that takes as input x, y, θ, $x_\ell$, $y_\ell$, $x_r$, $y_r$, and an integer n and computes
the first n − 1 Euclidean points on the box's perimeter that the ball bounces off of in its
travel (include the initial point in your printout for a total of n points). You may assume
that the input will always be "good" (the ball will always begin somewhere inside the
box and the lower left and upper right points will not be reversed).
As an example, consider the inputs:
x = 1, y = 1, θ = 0.392699, $x_\ell$ = 0, $y_\ell$ = 0, $x_r$ = 4, $y_r$ = 3, n = 20
Starting at (1, 1), the ball travels up and to the right bouncing off the right wall. Figure
4.5 illustrates this and the subsequent bounces back and forth.
Figure 4.5.: Follow the bouncing ball
Your output should simply be the points and should look something like the following.
(1.000000, 1.000000)
(4.000000, 2.242640)
(2.171572, 3.000000)
(0.000000, 2.100506)
(4.000000, 0.443652)
(2.928929, 0.000000)
(0.000000, 1.213202)
(4.000000, 2.870056)
(3.686287, 3.000000)
(0.000000, 1.473090)
(3.556355, 0.000000)
(4.000000, 0.183764)
(0.000000, 1.840617)
(2.798998, 3.000000)
(4.000000, 2.502529)
(0.000000, 0.845675)
(2.041640, 0.000000)
(4.000000, 0.811179)
(0.000000, 2.468033)
(1.284282, 3.000000)
Exercise 4.16. An integer n ≥ 2 is prime if its only divisors are 1 and itself, n. For
example, 2, 3, 5, 7, 11, . . . are primes. Write a program that outputs all prime numbers 2
up to m where m is read as input.
Exercise 4.17. An integer n ≥ 2 is prime if the only integers that evenly divide it are 1
and n itself, otherwise it is composite. The prime factorization of an integer is a list of
its prime divisors along with their multiplicities. For example, the prime decomposition
of 188,760 is:
$$188{,}760 = 2 \cdot 2 \cdot 2 \cdot 3 \cdot 5 \cdot 11 \cdot 11 \cdot 13$$
Write a program that takes an integer n as input and outputs the prime factorization of
n. If n is invalid, an appropriate error message should be displayed instead. Your output
should look something like the following.
1001 = 7 * 11 * 13
Exercise 4.18. One way of estimating π is to randomly sample points within a 2 × 2
square centered at the origin. If the distance between the randomly chosen point (x, y)
and the origin is less than or equal to 1, then the point lies inside the unit circle centered
at the origin and we count it. If the point lies outside the circle then we can ignore it. If
we sample n points and m of them lie within the circle, then π can be estimated as
$$\pi \approx \frac{4m}{n}$$
from the origin and the first point should lie on the positive x-axis. Each subsequent
point should be at an angle θ equal to $\frac{2\pi}{n}$ from the previous point. Recall that given the
polar coordinates θ, r we can convert to cartesian coordinates (x, y) using the following.
$$x = r\cos\theta$$
$$y = r\sin\theta$$
Your program should be robust enough to check for invalid inputs. If invalid, an error
message should be printed and the program should exit.
For example, running your program with n = 5, r = 6 should produce the points of a
pentagon with radius 6. The output should look something like:
Regular 5-sided polygon with radius 6.0:
(6.0000, 0.0000)
(1.8541, 5.7063)
(-4.8541, 3.5267)
(-4.8541, -3.5267)
(1.8541, -5.7063)
Exercise 4.20. Let $p_1 = (x_1, y_1)$ and $p_2 = (x_2, y_2)$ be two points in the cartesian plane
which define a line segment. Suppose we travel along this line starting at $p_1$ taking n
steps that are an equal distance apart until we reach $p_2$. We wish to know which points
correspond to each of these steps and which step along this path is closest to another
point $p_3 = (x_3, y_3)$. Recall that the distance between two points can be computed using
the Euclidean distance formula:
$$\delta = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
Write a program that takes three points and an integer n as inputs and outputs a sequence
of points along the line defined by $p_1$, $p_2$ that are distance $\frac{\delta}{n}$ apart from each other. It
should also indicate which of these computed points is the closest to the third point.
For example, the execution of your program with inputs 0, 2, −5.5, 7.75, −2, 3, 10 should
produce output that looks something like:
(0.00, 2.00) to (-5.50, 7.75) distance: 7.9569
(0.00, 2.00)
(-0.55, 2.58)
(-1.10, 3.15)
(-1.65, 3.72) <-- Closest point to (-2, 3)
(-2.20, 4.30)
(-2.75, 4.88)
(-3.30, 5.45)
(-3.85, 6.02)
(-4.40, 6.60)
(-4.95, 7.17)
(-5.50, 7.75)
Exercise 4.21. The natural log of a number x is usually computed using some numerical
approximation method. One such method is to use the following Taylor series.
$$\ln x = (x - 1) - \frac{(x - 1)^2}{2} + \frac{(x - 1)^3}{3} - \frac{(x - 1)^4}{4} + \cdots$$
However, this only works for |x − 1| ≤ 1 (except for x = 0) and diverges otherwise. For
x such that |x| > 1, we can use the series
$$\ln\frac{y}{y - 1} = \frac{1}{y} + \frac{1}{2y^2} + \frac{1}{3y^3} + \cdots$$
where $y = \frac{x}{x - 1}$. Of course such an infinite computation cannot be performed by a
computer. Instead, we approximate ln x by computing the series out to a finite number
of terms, n. Your program should print an error message and exit for x ≤ 0; otherwise it
should use the first series for 0 < x ≤ 1 and the second for x > 1.
Another series that has better convergence properties and works for any range of x is as
follows
$$\ln x = \ln\frac{1 + y}{1 - y} = 2y\left(1 + \frac{1}{3}y^2 + \frac{1}{5}y^4 + \frac{1}{7}y^6 + \cdots\right)$$
where $y = \frac{(x - 1)}{(x + 1)}$.
You will write a program that approximates ln x using these two methods computed to n
terms. You will also compute the error of each method by comparing the approximated
value to the standard math library's log function.
Your program should accept x and n as inputs. It should be robust enough to reject
any invalid inputs (ln x is not defined for x = 0; you may also print an error for any
negative value; n must be at least one). It will then compute an approximation using
both methods and print the relative error of each method.
For example, the execution of your program with inputs 3.1415, 6 should produce output
that looks something like:
Taylor Series: ln(3.1415) ~= 1.11976
Error: 0.02494
Other Series: ln(3.1415) ~= 1.14466
Error: 0.00004
Exercise 4.22. There are many different numerical methods to compute the square root
of a number. In this exercise, you will implement several of these methods.
(a) The Exponential Identity Method involves the following identity:
$$\sqrt{x} = e^{\frac{1}{2}\ln(x)}$$
Which assumes the use of built-in (or math-library) functions for e and the natural
log, ln.
(b) The Babylonian Method involves iteratively computing the following recurrence:
$$a_i = \frac{1}{2}\left(a_{i-1} + \frac{x}{a_{i-1}}\right)$$
where $a_1 = 1.0$. Computation is repeated until $|a_i - a_{i-1}| \leq \epsilon$ where ε is some
small constant value.
(c) A method developed for one of the first electronic computers (EDSAC [27]) involves
the following iteration. Let $a_0 = x$, $c_0 = x - 1$. Then compute
$$a_{i+1} = a_i - \frac{a_i c_i}{2}$$
$$c_{i+1} = \frac{c_i^2 (c_i - 3)}{4}$$
The iteration is performed for as many iterations as specified (n), or until the
change in a is negligible. The resulting value for a is used as an approximation,
$\sqrt{x} \approx a$. However, this method only works for values of x such that 0 < x < 3. We
can easily overcome this by scaling x by some power of 4 so that the scaled value
of x satisfies $\frac{1}{2} \leq x < 2$. After applying the method we can then scale back up by
the appropriate power of 2 (since $\sqrt{4} = 2$). Algorithm 4.17 describes how to scale x.
Write a program to compute the square root of an input number using these methods
and compare your results.
power ← 0
while x < 1/2 do
    //Scale up
    x ← (x · 4)
    power ← (power − 1)
end
while x ≥ 2 do
    //Scale down
    x ← x/4
    power ← (power + 1)
end
$$\ln(x) \approx \frac{\pi}{2M(1, 4/s)} - m\ln(2)$$
Where M(a, b) is the arithmetic-geometric mean and $s = x2^m$. In this formula, m
is a parameter (a larger m provides more precision).
(b) The standard Taylor Series for the natural logarithm is:
$$\ln(x) = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{(x - 1)^n}{n}$$
As we cannot compute an infinite series, we will simply compute the series to the
first m terms. Also note that this series is not convergent for values x > 1.
(c) Borchardt's algorithm is an iterative method that works as follows. Let
$$a_0 = \frac{1 + x}{2}, \qquad b_0 = \sqrt{x}$$
Then repeat:
$$a_{k+1} = \frac{a_k + b_k}{2}$$
$$b_{k+1} = \sqrt{a_{k+1} b_k}$$
until the absolute difference between $a_k$, $b_k$ is small; that is $|a_k - b_k| < \epsilon$. Then the
logarithm is approximated as
$$\ln(x) \approx 2\,\frac{x - 1}{a_k + b_k}$$
Exercise 4.25. Write a program that takes an integer n and a subsequent list of integers
as command line arguments and determines which number(s) between 1 and n are
missing from the list. For example, if the following numbers are given to the program:
10 5 2 3 9 2 8 8 your output should look something like:
Missing numbers 1 thru 10:
1, 4, 6, 7, 10
Exercise 4.26. Write a program that takes a list of pairs of numbers representing
latitudes/longitudes on the scale [−180, 180] (negative values correspond to the southern
and western hemispheres). Then, starting with the first pair, calculate the intermediate
air distances between each location as well as a final total distance.
To compute air distance from location A to a location B, use the Spherical Law of
Cosines:
$$d = \arccos\left(\sin(\varphi_1)\sin(\varphi_2) + \cos(\varphi_1)\cos(\varphi_2)\cos(\Delta)\right) \cdot R$$
where
$\varphi_1$ is the latitude of location A, $\varphi_2$ is the latitude of location B
Δ is the difference between location B's longitude and location A's longitude
R is the (average) radius of the earth, 6,371 kilometers
Note: the formula above assumes that latitude and longitude are measured in radians r,
−π ≤ r ≤ π. To convert from degrees deg (−180 ≤ deg ≤ 180) to radians r, you can use
the simple formula:
$$r = \frac{deg}{180}\pi$$
For example, if the command line arguments were
40.8206 -96.756 41.8806 -87.6742 41.9483 -87.6556 28.0222 -81.7329
your output should look something like:
(40.8206, -96.7560) to (41.8806, -87.6742): 766.8053km
(41.8806, -87.6742) to (41.9483, -87.6556): 7.6836km
(41.9483, -87.6556) to (28.0222, -81.7329): 1638.7151km
Total Distance: 2413.2040
Exercise 4.27. A DNA sequence is made up of a sequence of four nucleotide bases, A,
C, G, T (adenine, cytosine, guanine, thymine). One particularly interesting statistic of a
DNA sequence is finding a CG island : a subsequence that contains the highest frequency
of guanine and cytosine.
For simplicity, we will be interested in subsequences of a particular length, n that will be
provided as part of the input.
Write a program that takes, as command line arguments, an integer n and a DNA
sequence. The program should then find all subsequences of the given DNA string of
length n with the maximal frequency of C and G in it. For example, if the DNA sequence
is
ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGC
and the window size that we're interested in is n = 5 then you would scan the sequence
and find every subsequence with the maximum number of C or G bases. Your output
should include all CG Islands (by indices) in the sequence similar to the following.
n = 5
highest frequency: 5 / 5 = 100.00%
CG Islands:
15 thru 20: CCCCC
16 thru 21: CCCCG
17 thru 22: CCCGG
18 thru 23: CCGGC
19 thru 24: CGGCC
42 thru 47: CCGGG
43 thru 48: CGGGG
44 thru 49: GGGGC
45 thru 50: GGGCC
Exercise 4.28. Write a program that will assist people in saving for retirement using a
tax-deferred 401k program.
Your program will read the following inputs as command line arguments.
An initial starting balance
A monthly contribution amount (we'll assume it's the same over the life of the
savings plan)
An (average) annual rate of return (on the scale [0, 1])
An (average) annual rate of inflation (on the scale [0, 1])
A number of years until retirement
Your program will then compute a monthly savings table detailing the (inflation-adjusted)
interest earned each month, contribution, and new balance. The inflation-adjusted rate
of return can be computed with the following formula.
$$\frac{1 + \text{rate of return}}{1 + \text{inflation rate}} - 1$$
To get the monthly rate, simply divide by 12. Each month, interest is applied to the
balance at this rate (prior to the monthly deposit) and the monthly contribution is added.
Thus, the earnings compound month to month.
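The rate computation and a single monthly step can be sketched in C as follows. The function names are hypothetical, and the sketch deliberately leaves out the rounding to cents, the input validation, and the contribution limit that the exercise asks for.

```c
/* Inflation-adjusted annual rate: (1 + rateOfReturn) / (1 + inflationRate) - 1 */
double adjustedAnnualRate(double rateOfReturn, double inflationRate) {
    return (1.0 + rateOfReturn) / (1.0 + inflationRate) - 1.0;
}

/* Applies one month of interest to the balance (before the deposit),
   then adds the monthly contribution. */
double nextBalance(double balance, double monthlyRate, double contribution) {
    double interest = balance * monthlyRate;
    return balance + interest + contribution;
}
```

Here the monthly rate would be adjustedAnnualRate(rateOfReturn, inflationRate) / 12.0, and nextBalance() would be applied once per month inside a loop.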
Be sure that your program handles bad inputs as well as it can. It should also round to
the nearest cent for every figure. Finally, as of 2014, annual 401k contributions cannot
exceed $17,500. If the user's proposed savings schedule violates this limit, display an
error message instead of the savings table.
For inputs 10000 500 0.09 0.012 10 your output should look something like the
following:
Month      Interest       Balance
    1  $     64.23  $   10564.23
    2  $     67.85  $   11132.08
    3  $     71.50  $   11703.58
    4  $     75.17  $   12278.75
    5  $     78.87  $   12857.62
    6  $     82.58  $   13440.20
    7  $     86.33  $   14026.53
    8  $     90.09  $   14616.62
    9  $     93.88  $   15210.50
  ...
  116  $    678.19  $  106767.24
  117  $    685.76  $  107953.00
  118  $    693.37  $  109146.37
  119  $    701.04  $  110347.41
  120  $    708.75  $  111556.16
Total Interest Earned: $ 41556.16
Total Nest Egg: $ 111556.16
Exercise 4.29. An affine cipher is an encryption scheme that encrypts messages using
the following function:
$$e_k(x) = (ax + b) \bmod n$$
where n is some integer and $0 \le a, b, x \le n - 1$. That is, we fix n, which will be used to
encode an alphabet as in Table 4.1.
Then we choose integers a, b to define the encryption function. Suppose a = 10, b = 13,
then
$$e_k(x) = (10x + 13) \bmod 29$$
So to encrypt HELLO! we would encode it as 8, 5, 12, 12, 15, 28, then encrypt them,
$$e_k(8) = (10 \cdot 8 + 13) \bmod 29 = 6$$
$$e_k(5) = (10 \cdot 5 + 13) \bmod 29 = 5$$
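The encryption function can be sketched in C as a one-line function. The name affineEncrypt and its parameter order are illustrative assumptions.

```c
/* Encrypts a single encoded character x with e_k(x) = (a*x + b) mod n.
   Assumes 0 <= a, b, x <= n - 1, so the operand of % is non-negative. */
int affineEncrypt(int x, int a, int b, int n) {
    return (a * x + b) % n;
}
```

With a = 10, b = 13 and n = 29, affineEncrypt(8, 10, 13, 29) gives 6 and affineEncrypt(5, 10, 13, 29) gives 5, matching the two values computed above.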
x     character
0     (space)
1     A
2     B
3     C
...   ...
25    Y
26    Z
27    .
28    !
5. Functions
In mathematics, a function is a mapping from a set of inputs to a set of outputs such
that each input is mapped to exactly one output. For example, the function
$$f(x) = x^2$$
maps numeric values to their squares. The input is a variable x. When we assign an
actual value to x and evaluate the function, then the function has a value, its output. For
example, setting x = 2 as input, the output would be $2^2 = 4$. Mathematical functions
can have multiple inputs,
$$f(x, y) = x^2 + y^2$$
$$f(x, y, z) = 2x + 3y - 4z$$
languages that provide functions to perform standard input/output or mathematical
functions. These standard libraries provide functions that are used by thousands of
different programs across multiple different platforms.
Functions also form an isolated unit of code. This allows for better and easier testing.
By isolating pieces of code, we can rigorously test those pieces of code by themselves
without worrying about the larger program or contexts.
Finally, functions facilitate procedural abstraction. Placing code into functions allows
you to abstract the details of how the function computes its answer. As an example,
consider a standard math library's square root function: it may use some interpolation
method, a Taylor series, or some other method entirely to compute the square root of a
given number. However, by putting this functionality into a function, we, as programmers,
do not need to concern ourselves about these details. Instead, we simply use the function,
allowing us to focus on the larger issues at hand in our program.
Function sum(a, b)
    x ← a + b
    return x
end
Figure 5.1.: A function declaration (prototype) in the C programming language with the
return type, identifier, and parameter list labeled.
Some languages only allow you to use one identifier for one function (like variables) while
other languages allow you to define multiple functions with the same identifier as long
as the parameter list is different (see Section 5.3.2 below). In general, like variables,
function names are case sensitive. Also similar to variables, modern lower camel casing
is used with function names.
When defining the parameters to a function (its input), you usually provide a comma
delimited list of variable names. In the case of statically typed languages, the types of
the variable parameters are also specified. This order is important as when you invoke
the function, the number of inputs must match the number of parameters in the function
declaration. The variable types may also need to match. In some dynamically typed
languages, you may be able to call functions with different types or you may be able to
omit some of the parameters (see Section 5.3.4 below).
Similarly, the return type of the function may need to be specified in statically typed
languages while with dynamic languages, functions may conditionally return different
types. We generally refer to the return value or return type because when a function
is done executing, it returns the control flow back to the line of code that invoked it,
returning its computed value.
You can also define functions that may not have any inputs or may not have any output.
Some languages use the keyword void to indicate no return value and such functions are
known as void functions. When a function doesn't have any input values, its parameter
list is usually empty.
The function signature may then be accompanied by the function body which contains
the actual code that specifies what the function does. Typically the function body is
demarcated with opening and closing curly brackets, { ... } . Within the function you
can generally write any valid code including declaring variables. When you declare a
variable inside a function, however, it is local to that function. That is, the variable's
scope is only defined within the function. A local variable cannot be accessed outside
the function; indeed, the local variable does not usually survive when the function ends
its execution and returns control back to the line of code that called it. Function parameters
are essentially locally scoped variables as well and can usually be treated as such.
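To make these pieces concrete, here is a minimal C sketch of the sum function: a prototype, followed by a definition whose body declares a local variable. The layout is one reasonable way to write it, not the only one.

```c
/* Function prototype (declaration): return type, identifier, parameter list. */
int sum(int a, int b);

/* Function definition: the signature followed by the function body. */
int sum(int a, int b) {
    int x = a + b;  /* x is local to sum(); so are the parameters a and b */
    return x;       /* control and the computed value return to the caller */
}
```

Outside of sum(), the names a, b, and x refer to nothing unless the surrounding code declares its own variables with those names.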
a ← 10
b ← 20
c ← sum(a, b)
5.1.3. Organizing
Functions provide code organization, but functions themselves should also be organized.
We've seen this with standard libraries. Functions that provide basic input/output are
all grouped together into one library. Functions that involve math functions are grouped
together into a math library.
Some languages allow you to define and import individual libraries which organize
similar functions together. Some languages do this by collecting functions into utility
classes or modules. Only when you import these modules do the functions come into
scope and can be used in your code. If you do not import these modules, then the
functions are out of scope and cannot be used.
In some languages, functions, once imported, are part of the global scope and can be
seen by any part of the code. This can cause conflicts: if you import modules from two
different libraries each with different functions that have the same name or signature,
then the two function definitions may be in conflict or it may make your code ambiguous
as to which function you intend to invoke. This is sometimes referred to as polluting the
namespace. There are several techniques that can avoid this situation. Some languages
allow you to place functions into a namespace to keep functions with the same name in
different spaces. Other languages allow you to place functions into different classes and
then invoke them by explicitly specifying which class's function you want to call. Yet
other languages don't have great support for function organization and it is the library
designer's responsibility to avoid naming conflicts, typically by adding a library-specific
prefix to every function.
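C is one language where the prefix convention is common. In the following sketch, two hypothetical libraries both want a function called "area"; the prefixes circle_ and rect_ are invented for illustration.

```c
/* From a hypothetical circle library, which prefixes its functions with "circle_" */
double circle_area(double radius) {
    return 3.14159265358979323846 * radius * radius;
}

/* From a hypothetical rectangle library, which prefixes its functions with "rect_" */
double rect_area(double width, double height) {
    return width * height;
}
```

Because the identifiers differ, both functions can be used in the same program without ambiguity.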
by the function, as well as a space for a return value and a return address. The return
address is a memory location that the program should return to after the execution of
the function. That way, when the function finishes its execution, the stack frame can be
removed (popped) and the lower stack frame of the calling function is preserved. This is
a very efficient way to keep track of the flow of control in a program. As one function
calls another function, which in turn calls another, each stack frame is preserved by pushing a new one on
top of the program stack.
Each time a function terminates execution and returns, the removal of the stack frame
means that all local variables go out of scope. Thus, variables that are local to a function
are not accessible outside the function.
To illustrate, consider the following snippet of C code. The main() function invokes
the average() function which in turn invokes the sum() function. Each invocation
creates a new stack frame on top of the last in the program stack which is depicted in
Figure 5.2.
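A minimal sketch of two such functions is given below. The local variable names x and y follow the labels in Figure 5.2; the bodies and the argument values mentioned afterward are assumptions chosen for illustration.

```c
/* Returns the sum of its two arguments; x is local to sum(). */
double sum(double a, double b) {
    double x = a + b;
    return x;
}

/* Returns the average of its two arguments by calling sum();
   y is local to average(). */
double average(double a, double b) {
    double y = sum(a, b) / 2.0;
    return y;
}
```

If main() were to call average(10.0, 15.0), a stack frame for average() would be pushed on top of the frame for main(), and the call to sum() inside it would push a third frame. Each frame holds its own a and b.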
Figure 5.2.: Program Stack. At the bottom we have the program's code, followed by
static content such as global variables. Each function call has its own stack
frame along with its own arguments and local variables. In particular, the
variable arguments a and b in two different stack frames are completely
different variables. Upon returning from the sum() function call, the topmost
stack frame would be popped and removed, returning to the code
for the average() function via the return address. The stack is depicted
bottom-up with high memory at the bottom and low memory at the top,
but this may differ depending on the architecture.
Recall that the arguments passed to a function are placed in a new stack frame for that
function. Thus, in reality copies of the values of the variables are passed to the function.
Any changes to the parameters inside the function have no effect on the original variables
that were passed to the function when it was invoked.
To illustrate, consider the following C code. We have a function sum that takes two
integer parameters a and b which are passed by value. Inside sum , we create another
variable x which is the sum of the two passed variables. We then change the value
of the first variable, a to 10 . Elsewhere in the code we call sum on two variables,
n , m with values 5 and 15 respectively. The invocation of the function sum means
that the two values, 5 and 15, stored in the variables are copied into a new stack frame.
Thus, changing the value of the first parameter changes the copy and has no effect on
the variable n . At the end of this code snippet n retains its original value of 5. The
program stack frames are depicted in Figure 5.3.
...
int n = 5;
int m = 15;
int k = sum(n, m);
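A complete sketch consistent with the description above could look like the following. The body of sum follows the prose; the helper callerValueAfterSum is a hypothetical stand-in for the code that calls sum "elsewhere".

```c
/* a and b receive copies of the caller's values (pass by value). */
int sum(int a, int b) {
    int x = a + b;
    a = 10;        /* changes only the local copy */
    return x;
}

/* Hypothetical caller: returns the value of n after the call to sum(). */
int callerValueAfterSum(void) {
    int n = 5;
    int m = 15;
    int k = sum(n, m);  /* k is 20 */
    (void) k;           /* k is not needed further in this sketch */
    return n;           /* still 5: sum() only modified its own copy */
}
```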
Figure 5.3.: Demonstration of Pass By Value. Passing variables by value means that
copies of the values stored in the variables are provided to the function.
Changes to parameter variables do not affect the original variables.
...
int n = 5;
int m = 15;
int k = sum(&n, m);
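The call sum(&n, m) passes the address of n rather than a copy of its value. A minimal sketch of a matching function, assuming its first parameter is a pointer and that it assigns 10 through that pointer (mirroring the pass-by-value example), could be:

```c
/* a receives the address of the caller's variable; b receives a copy. */
int sum(int *a, int b) {
    int x = *a + b;  /* dereference a to read the caller's value */
    *a = 10;         /* writes through the pointer: the caller's variable changes */
    return x;
}

/* Hypothetical caller: returns the value of n after the call to sum(). */
int callerValueAfterSum(void) {
    int n = 5;
    int m = 15;
    int k = sum(&n, m);  /* k is 20 */
    (void) k;            /* k is not needed further in this sketch */
    return n;            /* now 10: sum() modified n through its address */
}
```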
[Figure: program stack during the call sum(&n, m). The parameter a in the sum() stack frame holds the address 0x0010 of n in the calling function's stack frame, so the assignment made through a changes n from 5 to 10.]
receives the callback will execute or call back the passed function at some point.
Using callbacks enables us to program a generic function that provides some generalized
functionality. Then more specific behavior can be implemented in the callback function.
For example, we could create a generic sort function that sorts elements in a collection.
We could make the sort function generic so that it could sort any type of data: numbers,
strings, objects, etc. A callback would provide more specific behavior on how to order
individual elements in the sorted array.
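The C standard library's qsort() function works this way: it sorts an array of any element type and relies on a comparison callback to order two elements. In the sketch below, compareInts and sortInts are illustrative names; only qsort() itself comes from the standard library.

```c
#include <stdlib.h>

/* Callback: tells qsort() how to order two int elements.
   Returns a negative, zero, or positive value. */
int compareInts(const void *p, const void *q) {
    int a = *(const int *) p;
    int b = *(const int *) q;
    return (a > b) - (a < b);
}

/* Sorts n integers in ascending order by passing compareInts as a callback. */
void sortInts(int *arr, size_t n) {
    qsort(arr, n, sizeof(int), compareInts);
}
```

Sorting a different type, or sorting in a different order, would only require a different comparison function; the call to qsort() would otherwise stay the same.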
As another example, consider GUI Programming in which we want to design a user
interface. In particular, we may be able to create a button element in our interface. We
need to be able to specify what happens when the user clicks the button. This could be
achieved by passing in a function as a callback to register it with the click event.
A related issue is anonymous functions. Typically, we simply want to create a function
so that we can pass it as a callback to another function. We may have no intention of
actually calling this function directly as it may not be of much use other than passing
it as a callback. Some languages allow you to define a function inline without an
identifier so that it can be passed to another function. Since the function has no name
and cannot be invoked by other sections of the code (other than the function we passed
it to), it is known as an anonymous function.
for functions that perform the same operation but on different types.
5.4. Exercises
Exercise 5.1. Recall that the greatest common divisor (gcd) of two positive integers, a
and b is the largest positive integer that divides both a and b. Adapt the solution from
Exercise 4.7 into a function. If the language you use supports it, return the gcd via a
pass by reference variable.
Exercise 5.2. Write a function that scales an input x to its scientific notation scale
so that $1 \le x < 10$. If your language supports pass by reference, the amount that x is
shifted should be stored in a pass-by-reference parameter. For example, a call to this
function with x = 314.15 should return 3.1415 and the amount it is scaled by is n = 2.
Exercise 5.3. Write a function that returns the most significant digit of a floating point
number. The function should only return an integer in the range 1 through 9 (it should return
zero only if x = 0).
Exercise 5.4. Write a function that, given an integer x, sums the values of its digits.
That is, for x = 29423 the sum 2 + 9 + 4 + 2 + 3 = 20.
Exercise 5.5. Write a function to convert radians to degrees using the formula,
$$deg = \frac{180 \cdot rad}{\pi}$$
$$g_{n+1} = \sqrt{a_n g_n}$$
The two sequences will converge to the same number which is the arithmetic-geometric
mean of x, y. Obviously we cannot compute an infinite sequence, so we compute until
$|a_n - g_n| < \epsilon$ for some small number $\epsilon$.
Exercise 5.8. Write a function to compute the annual percentage yield (APY) given an
annual percentage rate (APR) using the formula
$$APY = e^{APR} - 1$$
Exercise 5.9. Write a function that will compute the air distance between two locations
given their latitudes and longitudes. Use the formula as in Exercise 2.14.
Exercise 5.10. Write a function to convert a color represented in the RGB (red-green-blue) color model (used in digital monitors) to the CMYK (cyan-magenta-yellow-key) model used
in printing. RGB values are integers in the range [0, 255] while CMYK are fractional
numbers in the range [0, 1]. To convert to CMYK, you first need to scale each integer
value to the range [0, 1] by simply computing
$$r' = \frac{r}{255}, \quad g' = \frac{g}{255}, \quad b' = \frac{b}{255}$$
and then using the following formulas:
$$K = 1 - \max\{r', g', b'\}$$
$$C = (1 - r' - K)/(1 - K)$$
$$M = (1 - g' - K)/(1 - K)$$
$$Y = (1 - b' - K)/(1 - K)$$
Exercise 5.11. Write a function to convert from CMYK to RGB using the following
formulas.
$$r = 255 \cdot (1 - C) \cdot (1 - K)$$
$$g = 255 \cdot (1 - M) \cdot (1 - K)$$
$$b = 255 \cdot (1 - Y) \cdot (1 - K)$$
Exercise 5.12. Write some functions to convert an RGB color to a gray scale, removing
the color values. An RGB color value is grayscale if all three components have the same
value. To transform a color value to grayscale, there are several possible techniques.
The average method simply sets all three values to the average:
$$\frac{r + g + b}{3}$$
The lightness method averages the most prominent and least prominent colors:
$$\frac{\max\{r, g, b\} + \min\{r, g, b\}}{2}$$
The luminosity technique uses a weighted average to account for a human perceptual
preference toward green:
$$0.21r + 0.72g + 0.07b$$
Exercise 5.13. Adapt the methods to compute a square root in Exercise 4.22 into
functions.
Exercise 5.14. Adapt the methods to compute the natural logarithm in Exercise 4.23
into functions.
Exercise 5.15. Weight (mass in the presence of gravity) can be measured in several
scales: kilogram force (kgf), pounds (lbs), ounces (oz), or Newtons (N). To convert
between these scales, you can use the following facts:
1 kgf is equal to 2.20462 pounds
There are 16 ounces in a pound
1 kgf is equal to 9.80665 Newtons
Write a collection of functions to convert between these scales.
Exercise 5.16. Length can be measured by several different units. We will concern
ourselves with the following scales: kilometer, mile, nautical mile, and furlong. A measure
in each one of these scales can be converted to another using the following facts.
One mile is equivalent to 1.609347219 kilometers
One nautical mile is equivalent to 1.15078 miles
A furlong is $\frac{1}{8}$-th of a mile
Write a collection of functions to convert between these scales.
Exercise 5.17. Temperature can be measured in several scales: Celsius, Kelvin, Fahrenheit, and Newton. To convert between these scales, you can use the following conversion
table.
From/To      Celsius                  Kelvin                        Fahrenheit                  Newton
Celsius      -                        $c + 273.15$                  $\frac{9}{5}c + 32$         $\frac{33}{100}c$
Kelvin       $k - 273.15$             -                             $\frac{9}{5}k - 459.67$     $0.33k - 90.1395$
Fahrenheit   $(f - 32)\frac{5}{9}$    $\frac{5}{9}f + 255.372$      -                           $\frac{11}{60}f - \frac{88}{15}$
Newton       $\frac{100}{33}n$        $\frac{100}{33}n + 273.15$    $\frac{60}{11}n + 32$       -
The torr, an absolute scale for pressure
To convert between these units, you can use the following formulas.
1 psi is equal to 6,894.75729 Pascals, 1 psi is equal to 0.06804596 atmospheres
1 atmosphere is equal to 101,325 Pascals
1 torr is equal to $\frac{1}{760}$ atmosphere and $\frac{101,325}{760}$ Pascals
6. Error Handling
Writing perfect code is difficult. The more complex a system or code base, the more
likely it is to have bugs. That is, flaws or mistakes in a program that result in incorrect
behavior or unintended consequences. The term bug has been used in engineering
for quite a while. The term was popularized in the context of computer systems by
Grace Hopper who, when working on the Naval Mark II computer in 1946, tracked a
malfunction to a literal bug, a moth, trapped in a relay [2].
Some of the biggest modern engineering failures can be tracked to simple software bugs.
For example, on September 26th, 1983 a newly installed Soviet early warning system
gave indication that nuclear missiles had been launched on the Soviet Union by the
United States. Stanislav Petrov, a lieutenant colonel in the Soviet Air Defense Forces and
duty officer at the time, did not trust the new system and did not report the incident to
superiors who may have ordered a counter strike. Petrov was correct as the false alarm
was caused by sunlight reflections off of high altitude clouds as well as other bugs in the
newly deployed system [26].
In September 1999 the Mars Climate Orbiter, a project intended to study the Martian
climate and atmosphere was lost after it entered into the upper atmosphere of Mars and
disintegrated. The error was due to a subsystem that measured the craft's momentum
in non-standard pound-force seconds when all other systems expected the standard
newton-second unit [1]. The loss of the project was calculated at over $125 million.
There are numerous other examples, ranging from bugs that have caused inconvenience to users (such
as the Zune bug mentioned in Section 4.5.2) to bugs in medical devices that have cost
dozens of lives to those resulting in the loss of millions of dollars [6].
In some sense, Software Engineering and programming are unique. If you build a bridge
and forget one bolt, it's likely not going to cause the bridge to collapse. If you draw
up plans for a development and the land survey is a few inches off overall, it's not a
catastrophic problem. However, if you forget one character or are off by one number in a
program, it can cause a complete system failure.
There are a variety of reasons for why bugs make it into systems. Bugs could be
the result of a fundamental misunderstanding of the problem or requirements. Poorly
managed projects and the pressure of time constraints to deliver a project may make
developers more careless. A lack of proper testing may mean many more bugs survive
the development process than otherwise should have. Even expert programmers can
overlook a simple mistake when writing thousands of lines of code.
Given the potential for error, it is important to have good software development methodologies that emphasize testing a system at all levels. Working in teams where regular
code reviews are held so that colleagues can examine, critique, and catch potential bugs
is essential for writing robust code.
Much of what we now consider Software Engineering was pioneered by people like
Margaret Hamilton who was the lead Apollo flight software designer at NASA. During
the Apollo 11 Moon landing (1969), an error in one system caused the lander's computer
to become overworked with data. However, because the system was designed with a
robust architecture, it could detect and handle such situations by prioritizing more
important tasks (those related to landing) over lower priority tasks. The resilience that
was built into the system is credited with its success [11].
Modern coding tools and techniques can also help to improve the robustness of code.
For example, debuggers are tools that help a developer debug (that is, find and fix the
cause of an error) a program. Debuggers generally allow you to simulate the execution of
a program statement by statement and view the current state of the program such as
variable values. You can step through the execution line by line to find where an error
occurs in order to localize and identify a bug.
Other tools allow you to perform static analysis on source code to search for potential
problems. That is, problems that are not syntax errors and are not necessarily bugs
that are causing problems, but instead are anti-patterns or code smells. Anti-patterns
are essentially common bad-habits that can be found in code. They are an attempted
solution to a commonly encountered problem but which don't actually solve the problem
or introduce new problems. Code smells are symptoms in source code that indicate
a possible deeper design or implementation flaw. Failure to adhere to good programming
principles such as properly initializing variables or failure to check for null values are
examples of smells. Static analysis tools automatically examine the code base for potential
issues like these. For example, a lint (or linter) is a tool that can examine source code
for suspicious or non-portable code or code that does not comply with generally accepted
standards or ways of doing things.
Even if code contains no bugs, it is still susceptible to errors. For example, a program
could connect to a remote database to pull and process data. However, if the network
connection is temporarily unavailable, the program will not be able to execute properly.
Because of the potential of such errors, it is important to write robust and resilient code.
We must anticipate possible error conditions and write code to detect, prevent, or recover
from such errors. Generally, this is referred to as error handling.
that it should be fatal and terminate the execution of the program.
Which is the right way to handle this error? It depends on your design requirements
really. This raises the question, though: who is responsible for making these decisions?
Suppose we're designing a function for a library that is for use not just by our project but
others as well (as is the case with the standard library functions). Further, the function
we're designing could have multiple different error conditions that it checks for. In this
scenario there are two entities that could handle the errors: the function itself and the
code that invokes the function.
Suppose that we decide to handle the errors inside the function. That is, as designers of
the function, we've made the decision to handle the errors for the user (the code that
invokes our function). Regardless of how we decide to handle the errors, this design
decision has essentially taken any decision making ability away from users. This is not
very flexible for someone using our code. If they have different design considerations or
requirements, they may need or want to handle the errors in a different way than we did.
Now suppose that we decide not to handle the errors inside our function. Defensive
programming may still be used to prevent the execution of code that results in an error.
However, we now need a way to communicate the error condition to the calling function
so that it can know what type of error happened and handle it appropriately.
Error Codes
One common pattern to communicate errors to a calling function is to use the return
type as an error code. Usually this is an integer type. By convention 0 is used to indicate
no error and various other non-zero values are used to indicate various types of errors.
Depending on the system and standard used, error codes may have a predefined value or
may be specific to an application or library.
One problem with using the return type to indicate errors is that functions are no longer
able to use the return type to return an actual computed value. If a language supports
pass by reference, then this is not generally a problem. However, even with such languages
there are situations where the return type must be used to return a value. In such cases,
the function can still communicate a general error message by returning some flag value
such as null.
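As a minimal sketch of this pattern in C, the following function returns an error code and communicates its computed value through a pass-by-reference (pointer) parameter. The function name, the error code names, and their values are illustrative assumptions rather than part of any standard.

```c
#include <stddef.h>

/* Error codes: by convention 0 means no error. The names are illustrative. */
enum { NO_ERROR = 0, NULL_POINTER_ERROR = 1, DIVIDE_BY_ZERO_ERROR = 2 };

/* Computes a / b. The return value is an error code; the quotient is
   communicated through the pass-by-reference parameter result. */
int safeDivide(double a, double b, double *result) {
    if (result == NULL) {
        return NULL_POINTER_ERROR;
    }
    if (b == 0.0) {
        return DIVIDE_BY_ZERO_ERROR;
    }
    *result = a / b;
    return NO_ERROR;
}
```

The function itself never decides whether an error is fatal. The calling code inspects the returned code and chooses how to respond.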
Alternatively, a language may support error codes by using a shared global variable that
can be set by a function to indicate an error. The calling function can then examine the
variable to see if an error occurred during the invocation of the function.
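In C, the standard library's errno plays this role. The sketch below wraps the library function strtol(), which sets errno to ERANGE when the parsed number does not fit in a long. The wrapper name parseLong and its return codes are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Parses a base-10 integer from s. Returns 0 on success and a non-zero
   value on failure; on success the parsed number is stored in *result. */
int parseLong(const char *s, long *result) {
    char *end = NULL;
    errno = 0;                      /* clear any previous error */
    long value = strtol(s, &end, 10);
    if (errno == ERANGE) {          /* strtol() reported an out-of-range value */
        return 1;
    }
    if (end == s) {                 /* no digits were consumed */
        return 2;
    }
    *result = value;
    return 0;
}
```

Note that errno must be reset before the call: library functions set it when an error occurs but do not clear it on success.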
A problem arises when an error condition is checked and does not hold. Then, later in
the execution, circumstances change and the error condition then holds. However, since
it was already checked for, the program remains under the assumption that the error
condition does not hold. For example, suppose that another process or program deletes
the file that we wish to process after its existence has been checked but before we start
processing it.
Because of the sequential nature of our program, this type of error checking is susceptible
to these issues.
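One common way to narrow this window, sketched here in C under the assumption that the operation is reading a file, is to skip the separate existence check and instead attempt the operation and examine its result. The function name countBytes and its return codes are hypothetical.

```c
#include <stdio.h>

/* Counts the bytes in the file at path. Returns 0 on success and stores the
   count in *count; returns 1 if the file could not be opened. */
int countBytes(const char *path, long *count) {
    /* No separate "does the file exist?" check: the attempt to open the
       file is itself the check, so nothing can change in between. */
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return 1;
    }
    long n = 0;
    while (fgetc(f) != EOF) {
        n++;
    }
    fclose(f);
    *count = n;
    return 0;
}
```

This does not remove every possible failure (a read can still fail partway through), but the check and the use of the file no longer happen at two different times.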
6.2.2. Exceptions
An exception is an event or occurrence of an anomalous, erroneous or exceptional
condition that requires special handling. Exceptions interrupt the normal flow of control
in a program by handing the flow of control over to exception handlers.
Languages usually support exception handling using a try-catch control structure such
as the following.
try {
//potentially dangerous code here
} catch(Exception e) {
//exception handling code here
}
The try is used to encapsulate potentially dangerous code, or simply code that would
fail if an error condition occurs. If an error occurs at some point within the try block,
control flow is immediately transferred to the catch block. The catch block is where
you specify how to handle the exception. If the code in the try block does not result in
an exception, then control flow will skip over the catch statement and resume normally
after.
It is important to understand how exceptions interrupt the normal control flow. For
example, consider the following pseudocode
try {
statement1;
statement2;
statement3;
} catch(Exception e) {
//exception handling code here
}
Suppose statement1 executes with no error but that when statement2 executes, it
results in an exception. Control flow is then transferred to the catch block, skipping
statement3 entirely. In general, there may not be a mechanism for your catch block to
recover and execute statement3 . Therefore, it may be necessary to make your try-catch
blocks fine-grained, perhaps having only a single statement within the try statement.
Some languages only support a generic Exception and the type of error may need
to be communicated through other means such as a string error message. Still other
languages may support many different types of exceptions and you may be able to provide
multiple catch statements to handle each one differently. In such languages, the order in
which you place your catch statements may be important because, as with an if-else-if
statement, the first one that matches will be the one that executes. Thus, it is best
practice to order your catch blocks from the most specific to the most general.
Some languages also support a third finally control statement as in the following
example.
try {
//potentially dangerous code here
} catch(Exception e) {
//exception handling code here
} finally {
//unconditionally executed code here
}
The try-catch block operates as previously described. However, the finally block
will execute regardless of whether or not an exception was raised. If no exception
was raised, then the try block will fully execute and the finally block will execute
immediately after. If an exception was raised, control flow will be transferred to the
catch block. After the catch block has executed, the finally block will execute.
finally blocks are generally used to handle resources that need to be cleaned up
whether or not an exception occurs. For example, consider opening a connection to a database
to retrieve and process data. Whether or not an exception occurs during this process,
the connection will need to be properly closed as it represents a substantial amount of
resources (a network connection, memory and processing time on both the server and
client machines, etc.). Failure to properly close the connection may result in wasted
resources. By placing the clean up code inside a finally statement, we can be assured
that it will execute regardless of an error or exception.
In addition to handling exceptions, a language may allow you to throw your own exceptions, usually by
using the keyword throw . In this way you can also practice defensive programming.
You could write a conditional statement to check for an error condition and then throw
an exception.
6.3. Exercises
Exercise 6.1. Rewrite the function to compute the GCD in Exercise 5.1 to handle
invalid inputs.
Exercise 6.2. Rewrite the function to compute statistics of a circle in Exercise 5.6 to
handle invalid input (negative radius).
Exercise 6.3. Rewrite the function to compute the annual percentage yield in Exercise
5.8 to handle invalid input.
Exercise 6.4. Rewrite the function to compute air distance in Exercise 5.9 to handle
invalid input (latitude/longitude values outside the range [-180, 180]).
Exercise 6.5. Rewrite the function to convert from RGB to CMYK in Exercise 5.10 to
handle invalid inputs (values outside the range [0, 255]).
Exercise 6.6. Rewrite the function to convert from CMYK to RGB in Exercise 5.11 to
handle invalid inputs.
Exercise 6.7. Rewrite the square root functions from Exercise 5.13 to handle invalid
inputs.
Exercise 6.8. Rewrite the natural logarithm functions from Exercise 5.14 to handle
invalid inputs.
Exercise 6.9. Rewrite the weight conversion functions from Exercise 5.15 to handle
invalid inputs.
Exercise 6.10. Rewrite the length conversion functions from Exercise 5.16 to handle
invalid inputs.
Exercise 6.11. Rewrite the temperature conversion functions from Exercise 5.17 to
handle invalid inputs.
Exercise 6.12. Rewrite the energy conversion functions from Exercise 5.18 to handle
invalid inputs.
Exercise 6.13. Rewrite the pressure conversion functions from Exercise 5.19 to handle
invalid inputs.
Figure 7.1.: An integer array of size 10. Using zero-indexing, the first element is at index
0, the last at index 9.
arrays (see Section 7.2 below for a detailed discussion). Static arrays are generally created
using the program stack space while dynamically allocated arrays are stored in the heap.
In either case you generally declare an array by specifying its size. In statically typed
languages, you must also declare the array's type (integer, floating-point, etc.).
Indexing Arrays
Once an array has been created you can use it by assigning values to it or by retrieving
values from it. Because there is more than one element, you must specify which element
you are assigning or retrieving. This is generally done through indexing. An index is
an integer that specifies an element in the array. The index is used in conjunction with
(usually) square brackets and the array's identifier. For example, if the array's identifier
is arr and the index is an integer value stored in the variable i , we would refer to the
i-th element using the syntax arr[i] . An example is presented in Figure 7.1.
For most programming languages, indices start at zero. This is known as zero-indexing.1
Thus, the first element is at arr[0] , the second at arr[1] etc. When an array is
stored in memory, each element is usually stored one after the other in one contiguous
space in memory. Further, each element is of a specific type which is represented using a
fixed number of bytes in memory. Thus the index actually acts as an offset in memory
from the beginning of the array. For example, if we have an array of integers which
take 4 bytes each, then the 5th element would be stored at index 4, which is at an offset
equal to 4 × 4 = 16 bytes away from the beginning of the array. The first element, being
at index 0, is thus at 4 × 0 = 0 bytes from the beginning of the array (that is, the first
element is at the beginning of the array).
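The offset arithmetic can be checked directly in C. The sketch below uses a hypothetical helper, byteOffset(), that measures the distance in bytes between the start of an array and one of its elements. It uses sizeof(int) rather than assuming that an integer takes 4 bytes.

```c
#include <stddef.h>

/*
 * Returns the distance, in bytes, between the beginning of the
 * array arr and its element at index i (i must be a valid index).
 */
size_t byteOffset(const int *arr, size_t i) {
  const char *start = (const char *) arr;
  const char *element = (const char *) &arr[i];
  return (size_t) (element - start);
}
```

For any valid index i the value returned equals i * sizeof(int), which is 16 for index 4 on a system where an int takes 4 bytes.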
Once an element has been indexed, it can essentially be treated as a regular variable,
assigning and retrieving values as you would regular variables. Care must be taken
so that you do not make a reference to an element that does not exist, for example by
using a negative index or an index i ≥ n in an array of n elements. Depending on the
language, indexing an array element that is out-of-bounds may result in undefined behavior,
an exception being thrown, or a corruption of memory.
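In a language such as C, which does not check indices for you, one defensive option is to validate the index before using it. The following sketch assumes a hypothetical helper, getElement(), that reports an invalid index through its return value instead of touching memory outside the array.

```c
#include <stddef.h>

/*
 * Copies arr[i] into the variable that value points to.
 * Returns 0 on success and 1 if either pointer is NULL or the
 * index i is outside the valid range 0 through n-1.
 */
int getElement(const int *arr, int n, int i, int *value) {
  if (arr == NULL || value == NULL) {
    return 1;
  }
  if (i < 0 || i >= n) {
    /* out-of-bounds: refuse to access the array */
    return 1;
  }
  *value = arr[i];
  return 0;
}
```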
1 Though some languages do use 1-indexing, there are very strong arguments in favor of zero-indexing [13].
Some languages build the size of the array into a property that can be accessed. Java,
for example, has an arr.length property. Other languages provide functions that you
can pass an array to in order to get its size. Still other languages (such as C) place
the burden of bookkeeping the size of an array on you, the programmer. Whenever
you pass an array to a function you need to also pass a size parameter that informs the
function of how many elements are in the array. Some functions may also require
that you tell them not only the size of the array, but also the size of each element in the array.
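Both situations can be sketched in C. The function names sum(), compareInts() and sortAscending() below are hypothetical; qsort() is the standard library sorting function, which requires the number of elements as well as the size of each element.

```c
#include <stdlib.h>

/* Returns the sum of the n elements in arr; n must be passed in
   because the array itself does not carry its size. */
long sum(const int *arr, size_t n) {
  long total = 0;
  for (size_t i = 0; i < n; i++) {
    total += arr[i];
  }
  return total;
}

/* Helper for qsort(): negative, zero, or positive depending on
   whether the first integer is less than, equal to, or greater
   than the second. */
static int compareInts(const void *a, const void *b) {
  int x = *(const int *) a;
  int y = *(const int *) b;
  return (x > y) - (x < y);
}

/* Sorts arr in ascending order. qsort() needs the number of
   elements and the size in bytes of a single element. */
void sortAscending(int *arr, size_t n) {
  qsort(arr, n, sizeof(int), compareInts);
}
```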
Some languages also support a basic foreach loop (cf. Section 4.4). A foreach loop is
syntactic sugar that allows you to iterate over the elements in an array (usually in order)
without the need for boilerplate code that creates and increments an index variable.
Demonstration in C
To make this concept a bit more clear, we'll use a concrete example in the C programming
language. Consider the program code in Figure 7.2. Here, we have a function foo()
that creates a static integer array of size 5, int b[5]; . This memory is allocated on
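#include <stdlib.h>
#include <stdio.h>

int * foo(int n) {
  int i;
  /* b is a static array that is local to foo() */
  int b[5];
  for(i=0; i<5; i++) {
    b[i] = n*(i+1);
    printf("b[%d] = %d\n", i, b[i]);
  }
  /* pitfall: b no longer exists once foo() returns */
  return b;
}

With n = 5, the loop in foo() prints:

b[0] = 5
b[1] = 10
b[2] = 15
b[3] = 20
b[4] = 25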
However, when the function foo() ends execution and returns control back to the
main() function (sometimes called unwinding), the contents of foo()'s stack frame
are altered as part of the process. Some of the contents are the same, but other elements
have been completely altered. Printing the returned contents of the array gives us garbage:
a[0] = 1564158624
a[1] = 32767
a[2] = 15
a[3] = 20
a[4] = -626679356
This is not an issue when returning primitive types as the return value is placed in a
special memory location available to the calling function. Even in our example, the
return value is properly communicated to the calling function: it's just that the returned
value is a pointer to the array's location (which happens to be a memory address in the
stale stack frame). The stack frames are depicted in Figure 7.3.
Stack Frame   Variable   Address      Content
              ...        ...          ...
foo           b[4]       0x5c44cb76   25
              b[3]       0x5c44cb72   20
              b[2]       0x5c44cb68   15
              b[1]       0x5c44cb64   10
              b[0]       0x5c44cb60   5
              i          0x5c44cb56   5
              n          0x5c44cb52   5
              ...        ...          ...
main          a          0x5c44cb34   NULL
              m          0x5c44cb30   5
              i          0x5c44cb26   0
(a) Program stack while foo() is executing.

Stack Frame   Variable   Address      Content
              ...        ...          ...
foo           b[4]       0x5c44cb76   -626679356
              b[3]       0x5c44cb72   20
              b[2]       0x5c44cb68   15
              b[1]       0x5c44cb64   32767
              b[0]       0x5c44cb60   1564158624
              i          0x5c44cb56   5
              n          0x5c44cb52   5
              ...        ...          ...
main          a          0x5c44cb34   0x5c44cb60
              m          0x5c44cb30   5
              i          0x5c44cb26   0
(b) Upon returning, the stack frame is no longer valid; the pointer
variable a points to a stack memory address but the frame and
its local variables are no longer valid. Some have been overwritten
with other values. Any subsequent use of the values in a is
undefined behavior.
Figure 7.3.: Illustration of the pitfalls of returning a static array in C. Static arrays
are locally scoped and exist only within the function/block in which they
are declared. The program stack frame in which the variables are stored
is invalid when the function returns control back to the calling function.
Depending on how the system/compiler/language handles this unwinding
process, values may be changed, unavailable, etc.
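One common way to avoid the pitfall shown in Figure 7.3 is to allocate the array dynamically so that it is stored in the heap rather than in the function's stack frame. The sketch below assumes a hypothetical variant of the function named fooDynamic(); it fills the array with multiples of n, and the calling function becomes responsible for eventually freeing the memory.

```c
#include <stdlib.h>

/*
 * Returns a dynamically allocated array of 5 integers holding the
 * first five multiples of n, or NULL if the allocation fails.
 * The array lives in the heap, so it remains valid after this
 * function returns; the caller must eventually call free() on it.
 */
int *fooDynamic(int n) {
  int *b = malloc(5 * sizeof(int));
  if (b == NULL) {
    return NULL;
  }
  for (int i = 0; i < 5; i++) {
    b[i] = n * (i + 1);
  }
  return b;
}
```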
Memory Management
If a program no longer needs a dynamically allocated memory space, it should clean
up after itself by deallocating or freeing the memory, releasing it back to the heap
space so that it can be reused either by the program or some other program or process
on the system. The process of allocating and deallocating memory is generally referred
to as memory management. If a program does not free up memory, it may eventually
run out and be forced to terminate. Even if it does not run out of available memory, its
performance may degrade.
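In C, this allocate, use, free life cycle might look like the following sketch. The function sumOfSquares() is a hypothetical (and deliberately simple) example whose only purpose is to show a temporary array being released once it is no longer needed.

```c
#include <stdlib.h>

/*
 * Computes 1*1 + 2*2 + ... + n*n using a temporary, dynamically
 * allocated array. Returns -1 if n is not positive or if the
 * allocation fails.
 */
long sumOfSquares(int n) {
  if (n <= 0) {
    return -1;
  }
  /* allocate: request memory from the heap */
  long *squares = malloc((size_t) n * sizeof(long));
  if (squares == NULL) {
    return -1;
  }
  /* use: fill the array, then add up its contents */
  long total = 0;
  for (int i = 0; i < n; i++) {
    squares[i] = (long) (i + 1) * (i + 1);
  }
  for (int i = 0; i < n; i++) {
    total += squares[i];
  }
  /* free: release the memory back to the heap */
  free(squares);
  return total;
}
```

Omitting the call to free() would not change the value returned, but every call would then lose a block of heap memory.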
If a program has poor memory management and fails to deallocate memory when it is
no longer needed, the memory leaks: the available memory is gradually lost because
it is not released back to the heap for reallocation. Programs with such poor memory
management are said to have a memory leak. Sometimes this is a consequence of a
dangling pointer: when a program dynamically allocates a chunk of memory but then
[Figure: regions of application memory, in the order shown: Allocated Heap, Available Heap, Available Stack, Allocated Stack, Static Content, Program Code]
Figure 7.4.: Depiction of Application Memory. The details of how application memory
is allocated and how the stack/heap grow may vary depending on the
architecture. The figure shows stack memory growing upward while heap
allocation grows downward. Allocation and deallocation may fragment
the heap space though.
B → A → 11 13 17 19 23 29
(a) A shallow copy. B refers to A which refers to the array. Thus, B implicitly refers to the
same array.
B → A → 11 13 -1 19 23 29
(b) When an element in a shallow copy is changed, A[i] = -1;, it is changed from the
perspective of both A and B.
A → 11 13 17 19 23 29
B → 11 13 17 19 23 29
(c) A deep copy. B refers to its own copy of the array distinct from A. Both are stored in
separate memory locations.
A → 11 13 -1 19 23 29
B → 11 13 17 19 23 29
(d) When an element in a deep copy is changed, A[i] = -1;, it is changed only in the
array A. The element in B is unaffected.
Figure 7.5.: Shallow copies are when two references refer to the same data in memory
(a) and (b). Changes to one affect the other. Deep copies (c) and (d) are
distinct data in memory; changes to one do not affect the other.
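The difference illustrated in Figure 7.5 can be sketched in C, where a shallow copy is simply a second pointer to the same memory. The function names shallowCopy() and deepCopy() are hypothetical; memcpy() is the standard library function for copying a block of bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Shallow copy: the returned pointer refers to the same array as a. */
int *shallowCopy(int *a) {
  return a;
}

/*
 * Deep copy: allocates a new array of n integers and copies the
 * contents of a into it. Returns NULL if the allocation fails.
 * The caller is responsible for freeing the copy.
 */
int *deepCopy(const int *a, size_t n) {
  int *b = malloc(n * sizeof(int));
  if (b == NULL) {
    return NULL;
  }
  memcpy(b, a, n * sizeof(int));
  return b;
}
```

After a change such as A[2] = -1;, the change is visible through the shallow copy but not through the deep copy.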
1 9 8
2.5 3 5
In mathematics, entries in a matrix are indexed via their row and column. For example,
$a_{i,j}$ would refer to the element in the i-th row and j-th column. Referring to the row
first and column second is referred to as row major ordering. If the number of rows and
the number of columns are the same, the matrix is referred to as a square matrix. For
example, the following is a square, 10 × 10 matrix.
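\[
\begin{bmatrix}
2 & 3 & 5 & 7 & 11 & 13 & 17 & 19 & 23 & 29 \\
68 & 86 & 7 & 35 & 55 & 20 & 72 & 59 & 73 & 48 \\
9 & 22 & 17 & 64 & 36 & 37 & 68 & 6 & 30 & 2 \\
44 & 42 & 12 & 69 & 5 & 74 & 26 & 26 & 80 & 39 \\
80 & 58 & 29 & 79 & 25 & 3 & 11 & 87 & 51 & 18 \\
79 & 24 & 56 & 56 & 6 & 53 & 6 & 18 & 14 & 21 \\
77 & 45 & 68 & 52 & 22 & 87 & 63 & 82 & 34 & 33 \\
59 & 39 & 14 & 77 & 25 & 70 & 70 & 27 & 67 & 28 \\
27 & 7 & 65 & 82 & 72 & 3 & 29 & 75 & 59 & 40 \\
2 & 47 & 3 & 85 & 37 & 78 & 16 & 19 & 58 & 34
\end{bmatrix}
\]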
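Row-first indexing can be sketched in C. The helper below is hypothetical and assumes that the matrix is stored row by row in a single one-dimensional array, which is only one of several ways that a matrix may be represented.

```c
#include <stddef.h>

/*
 * Returns the entry in row i and column j (both zero-indexed) of a
 * matrix with numCols columns whose rows are stored one after
 * another in the one-dimensional array matrix.
 */
int elementAt(const int *matrix, size_t numCols, size_t i, size_t j) {
  /* skip i complete rows, then move j entries into row i */
  return matrix[i * numCols + j];
}
```

For a matrix with 3 columns, the entry in row 1 and column 2 is found at position 1 × 3 + 2 = 5 of the underlying array.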
Aside from basic arrays, many languages have rich libraries of other dynamic collections.
Dynamic collections are not the same thing as dynamically allocated arrays. With a
normal array, once created, its size is fixed and cannot, in general, be changed. However,
dynamic collections can grow (and shrink) as needed when you add or remove elements
from them.
Lists are ordered collections that are essentially dynamic arrays. They
are usually zero-indexed just like arrays. Lists are generally objects and provide methods
that can be used to add, remove, and retrieve elements from the list. If you add an
element to a list, the list will automatically grow to accommodate it, so its size is not
fixed when created. Two common implementations of lists are array-based lists and
linked lists. Array-based lists use an array to store elements. When the array fills up,
the list allocates a new, larger array to hold more elements, copying the original contents
over to the new array with a larger capacity. Linked lists hold elements in nodes that are
linked together. Adding a new element simply involves creating a new node and linking
it to the last element in the list.
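The growth strategy of an array-based list can be sketched in C as follows. The IntList structure and its functions are hypothetical names, and doubling the capacity is an assumption of this sketch rather than a requirement. The standard realloc() function allocates the larger block and copies the existing contents over when it cannot extend the block in place.

```c
#include <stdlib.h>

/* A hypothetical array-based list of integers */
typedef struct {
  int *data;        /* the underlying array */
  size_t size;      /* number of elements currently stored */
  size_t capacity;  /* number of elements the array can hold */
} IntList;

/* Initializes an empty list. */
void initList(IntList *list) {
  list->data = NULL;
  list->size = 0;
  list->capacity = 0;
}

/* Adds value to the end of the list, growing the underlying array
   if it is full. Returns 0 on success, 1 if memory runs out. */
int append(IntList *list, int value) {
  if (list->size == list->capacity) {
    size_t newCapacity = (list->capacity == 0) ? 4 : 2 * list->capacity;
    int *bigger = realloc(list->data, newCapacity * sizeof(int));
    if (bigger == NULL) {
      return 1;
    }
    list->data = bigger;
    list->capacity = newCapacity;
  }
  list->data[list->size] = value;
  list->size++;
  return 0;
}

/* Releases the memory held by the list. */
void freeList(IntList *list) {
  free(list->data);
  initList(list);
}
```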
Some languages also define what are called sets. Sets allow you to store elements
dynamically just like lists, but sets are generally unordered. There is no concept of a
first, second, or last element in a set. Iterating over the elements in a set could result
in a different enumeration of the elements each time. Elements in sets are also usually
unique. For example, a set containing integers would only ever contain one instance of
each integer. The value 10, for example, would only ever appear once. If you added 10
to a set that already contained it, the operation would have no effect on the set.
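The uniqueness rule can be sketched with a small array-backed set in C. The IntSet structure, the fixed SET_CAPACITY, and the function names are hypothetical; library implementations of sets typically use more efficient data structures than the linear search used here.

```c
#include <stdbool.h>
#include <stddef.h>

#define SET_CAPACITY 100  /* arbitrary limit for this sketch */

/* A hypothetical set of integers backed by a fixed-size array */
typedef struct {
  int items[SET_CAPACITY];
  size_t size;  /* must be set to 0 before the set is first used */
} IntSet;

/* Returns true if value is an element of the set. */
bool setContains(const IntSet *set, int value) {
  for (size_t i = 0; i < set->size; i++) {
    if (set->items[i] == value) {
      return true;
    }
  }
  return false;
}

/* Adds value to the set. Returns true if the set changed and false
   if value was already present (or the set is full). */
bool setAdd(IntSet *set, int value) {
  if (setContains(set, value) || set->size == SET_CAPACITY) {
    return false;
  }
  set->items[set->size] = value;
  set->size++;
  return true;
}
```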
Another type of dynamic collection is the associative array (sometimes called a dictionary).
An associative array holds elements, but may not be restricted in how they are indexed.
In particular, a language that supports associative arrays may allow you to use integers
or strings as indices, or even any arbitrary object. Further, when using integers to index
elements, indices need not be fully defined nor contiguous. In an associative array you
could define an element at index 5 and then place the next element at index 10, skipping
index 6 through 9 which would remain undefined.
One way to look at associative arrays is as a map. A map is a data structure that
stores elements as key-value pairs. Both the keys and values could be any arbitrary
type (integers or strings) or object depending on the language. You could map account
numbers (stored as strings) to account objects, or vice versa. Using a smart data structure
like a map can make data manipulation a lot easier and more straightforward.
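C has no built-in map, but the key-value idea can be sketched with an array of pairs. In this hypothetical example the keys are account numbers stored as strings and the values are balances; a real map implementation would normally use hashing or a tree instead of the linear search shown here.

```c
#include <stddef.h>
#include <string.h>

/* A hypothetical key-value pair: account number and balance */
typedef struct {
  const char *key;
  double value;
} Entry;

/*
 * Searches the n entries for the given key. Returns a pointer to
 * the associated value, or NULL if the key is not present.
 */
const double *mapGet(const Entry *entries, size_t n, const char *key) {
  for (size_t i = 0; i < n; i++) {
    if (strcmp(entries[i].key, key) == 0) {
      return &entries[i].value;
    }
  }
  return NULL;
}
```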
7.5. Exercises
Exercise 7.1. Write a function to return the index of the maximum element in an array
of numbers.
Exercise 7.2. Write a function to return the index of the minimum element in an array
of numbers.
Exercise 7.3. Write a function to compute the mean (average) of an array of numbers.
Exercise 7.4. Write a function to compute the standard deviation of an array of numbers,
\[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} \]
where $\mu$ is the mean of the array of numbers.
Exercise 7.5. Write a function that takes two arrays of numbers that are sorted and
merges them into one array (returning a new array as a result).
Exercise 7.6. Write a function that takes an integer n and produces a new array of size
n filled with 1s.
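Exercise 7.7. Write a function that takes an array of numbers and returns
the median element. The median is defined as follows:
• If n is odd, the median is the $\frac{n+1}{2}$-th largest element
• If n is even, the median is the average of the $\frac{n}{2}$-th and $\left(\frac{n}{2}+1\right)$-th largest elements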
Exercise 7.8. The dot product of two arrays (or vectors) of the same dimension is
defined as the sum of the product of each of their entries. That is,
\[ \sum_{i=1}^{n} a_i b_i \]
Write a function to compute the dot product of two arrays (you may assume that they
are of the same dimension).
Exercise 7.9. The norm of an n-dimensional vector, $\vec{x} = (x_1, x_2, \ldots, x_n)$, captures the
notion of distance in a higher dimensional space and is defined as
\[ \|\vec{x}\| = \sqrt{x_1^2 + \cdots + x_n^2} \]
Write a function that takes an array of numbers that represents an n-dimensional vector
and computes its norm.
Exercise 7.10. Write a function that takes two arrays A, B and creates and returns a
new array that is the concatenation of the two. That is, the new array will contain all
elements in A followed by all elements in B.
Exercise 7.11. Write a function that takes an array of numbers A and an element x
and returns true/false if A contains x.
Exercise 7.12. Write a function that takes an array of numbers A, an element x and
two indices i, j and returns true/false if A contains x somewhere between index i and j.
Exercise 7.13. Write a function that takes an array of numbers A and an element x
and returns the multiplicity of x; that is the number of times x appears in A.
Exercise 7.14. Write a function to compute a sliding window mean. That is, it computes
the average of the first m numbers in the array. The next value is the average of the
values indexed from 1 to m, then 2 to m + 1, etc. The last window is the average of the
last m elements. Obviously, m ≤ n (for m = n, this is the usual mean). Since there is
more than one value, your function will return a (new) array of means of size n − m + 1.
Exercise 7.15. Write a function to compute the mode of an array of numbers. The
mode is the most common value that appears in the array. For example, if the array
contained the elements 2, 9, 3, 4, 2, 1, 8, 9, 2, the mode would be 2 as it appears more than
any other element. The mode may not be unique; multiple elements could appear the
same, maximal number of times. Your function should simply return a mode.
Exercise 7.16. Write a function to find all modes of an array. That is, it should find
all modes in an array and return a new array containing all the mode values.
Exercise 7.17. Write a function to filter out certain elements from an array. Specifically,
the function will create a new array containing only elements that are greater than or
equal to a certain threshold.
Exercise 7.18. Write a function that takes an array of numbers and creates a new
deep copy of the array. In addition, the function should take a new size parameter
which will be the size of the copy. If the new size is less than the original, then the new
array will be a truncated copy. If the new size is greater, then the copy will be padded
with zero values at the end.
Exercise 7.19. Write a function that takes an array A and two indices i, j and returns
a new array that is a subarray of A consisting of elements i through j.
Exercise 7.20. Write a function that takes two arrays A, B and creates and returns a
new array that represents the unique intersection of A and B. That is, an array that
contains elements that are in both A and B. However, elements should not be included
more than once.
Exercise 7.21. Write a function that takes two arrays A, B and creates and returns a
new array that represents the unique union of A and B. That is, an array that contains
elements that are either in A or B (or both). However, elements should not be included
more than once.
Exercise 7.30. We can multiply a matrix by a single scalar value x by simply multiplying
each entry in the matrix by x. Write a function that takes a matrix of numbers and an
element x and performs scalar multiplication.
Exercise 7.31. Write a function that takes two matrices and determines if they are
equal (all of their elements are the same).
Exercise 7.32. Write a function that takes a matrix and an index i and returns a new
array that contains the elements in the i-th row of the matrix.
Exercise 7.33. Write a function that takes a matrix and an index j and returns a new
array that contains the elements in the j-th column of the matrix.
Exercise 7.34. Iterated Matrix Multiplication is where you take a square matrix, A
and multiply it by itself k times,
\[ A^k = \underbrace{A \times A \times \cdots \times A}_{k \text{ times}} \]
\[ A = \begin{bmatrix} 8 & 2 & 4 & 1 \\ 10 & 4 & 2 & 3 \\ 12 & 42 & 1 & 0 \end{bmatrix} \]
then a call to this function with i = 1, j = 2, k = 2, ℓ = 3 should result in
\[ A = \begin{bmatrix} 2 & 3 \\ 1 & 0 \end{bmatrix} \]
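Exercise 7.46. The Kronecker product (http://en.wikipedia.org/wiki/Kronecker_product)
is a matrix operation on two matrices that produces a larger block matrix.
Specifically, if A is an m × n matrix and B is a p × q matrix, then the Kronecker product
A ⊗ B is the mp × nq block matrix:
\[
A \otimes B = \begin{bmatrix}
a_{11}B & \cdots & a_{1n}B \\
\vdots & \ddots & \vdots \\
a_{m1}B & \cdots & a_{mn}B
\end{bmatrix}
\]
more explicitly:
\[
A \otimes B = \begin{bmatrix}
a_{11}b_{11} & a_{11}b_{12} & \cdots & a_{11}b_{1q} & \cdots & \cdots & a_{1n}b_{11} & a_{1n}b_{12} & \cdots & a_{1n}b_{1q} \\
a_{11}b_{21} & a_{11}b_{22} & \cdots & a_{11}b_{2q} & \cdots & \cdots & a_{1n}b_{21} & a_{1n}b_{22} & \cdots & a_{1n}b_{2q} \\
\vdots & \vdots & \ddots & \vdots & & & \vdots & \vdots & \ddots & \vdots \\
a_{11}b_{p1} & a_{11}b_{p2} & \cdots & a_{11}b_{pq} & \cdots & \cdots & a_{1n}b_{p1} & a_{1n}b_{p2} & \cdots & a_{1n}b_{pq} \\
\vdots & \vdots & & \vdots & \ddots & & \vdots & \vdots & & \vdots \\
\vdots & \vdots & & \vdots & & \ddots & \vdots & \vdots & & \vdots \\
a_{m1}b_{11} & a_{m1}b_{12} & \cdots & a_{m1}b_{1q} & \cdots & \cdots & a_{mn}b_{11} & a_{mn}b_{12} & \cdots & a_{mn}b_{1q} \\
a_{m1}b_{21} & a_{m1}b_{22} & \cdots & a_{m1}b_{2q} & \cdots & \cdots & a_{mn}b_{21} & a_{mn}b_{22} & \cdots & a_{mn}b_{2q} \\
\vdots & \vdots & \ddots & \vdots & & & \vdots & \vdots & \ddots & \vdots \\
a_{m1}b_{p1} & a_{m1}b_{p2} & \cdots & a_{m1}b_{pq} & \cdots & \cdots & a_{mn}b_{p1} & a_{mn}b_{p2} & \cdots & a_{mn}b_{pq}
\end{bmatrix}
\]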
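Exercise 7.47. The Hadamard product is an entry-wise product of two matrices of
equal size. Let A, B be two n × m matrices, then the Hadamard product is defined as
follows.
\[
A \circ B =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{bmatrix}
\circ
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1m} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n1} & b_{n2} & \cdots & b_{nm}
\end{bmatrix}
=
\begin{bmatrix}
a_{11}b_{11} & a_{12}b_{12} & \cdots & a_{1m}b_{1m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1}b_{n1} & a_{n2}b_{n2} & \cdots & a_{nm}b_{nm}
\end{bmatrix}
\]
Write a function to compute the Hadamard product of two n × m matrices.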
8. Strings
A string is an ordered sequence of characters. We've previously seen string data types
as literals. Most languages allow you to define and use static string literals using the
double-quote syntax. We used strings to specify output formatting using printf() -style
functions for example. When reading input from a user, we read it as a string and
converted the input to numbers. We also described some basic operations on strings
including concatenation. We now examine strings in more depth.
Programming languages vary greatly in how they represent string data types. Some
languages have string types built into the language, others require that you use
arrays, and yet others treat strings as a special type of object.
One issue with string representations is determining where and how the string ends.
Some languages use a length prefix string representation. The length (that is, the number
of characters in the string) is stored in a special location at the beginning of a string. Then
the string characters are stored as an array. Other languages use a special character,
the null-terminating character, to denote the end of a string. Still other languages store
strings as arrays or dynamic arrays and the bookkeeping is done internally as part of
an object representation.
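C is an example of a language that uses the null-terminating character. The sketch below, with the hypothetical name stringLength(), shows how the end of such a string is found; the standard library function strlen() provides the same result.

```c
#include <stddef.h>

/*
 * Returns the number of characters in the null-terminated string s,
 * not counting the null-terminating character itself.
 */
size_t stringLength(const char *s) {
  size_t n = 0;
  /* scan forward until the null-terminating character is found */
  while (s[n] != '\0') {
    n++;
  }
  return n;
}
```

Because the length is not stored anywhere, every call has to scan the entire string.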
Other details vary as well. Most languages support the basic ASCII characters, others
have full Unicode support or support Unicode through a library. Most languages also
provide large libraries of functions and operations that make working with strings easier.
itself (usually called a property of the string) or again through a function call. We can
further use such functionality to iterate over the individual characters in a string using
an index-controlled for-loop.
More advanced operations on strings include concatenation, which is the operation of
combining two or more strings to create a new string. Concatenation simply appends
one string to the end of another string. Again, depending on the language this may be
accomplished with a built-in operator or it may require a function call.
Another common operation is to extract a substring from a string, that is create a
new string from a portion of another string. Commonly, this is done via some standard
function that may operate by specifying indices and/or the length of the desired substring.
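As an illustration, a substring operation that takes a starting index and a length might be sketched in C as follows. The function name substring() and its exact behavior for out-of-range arguments are assumptions of this sketch, not a standard library function.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Returns a new, dynamically allocated string containing up to
 * length characters of s, beginning at index start. Returns NULL
 * if start is beyond the end of s or if the allocation fails.
 * The caller is responsible for freeing the result.
 */
char *substring(const char *s, size_t start, size_t length) {
  size_t n = strlen(s);
  if (start > n) {
    return NULL;
  }
  if (length > n - start) {
    /* do not read past the end of the original string */
    length = n - start;
  }
  char *result = malloc(length + 1);
  if (result == NULL) {
    return NULL;
  }
  memcpy(result, s + start, length);
  result[length] = '\0';
  return result;
}
```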
Finally, it is also common to deal with collections of strings. Some languages allow you
to create arrays of strings or dynamic collections (lists or sets) of strings. For languages
in which strings are arrays of characters, an array of strings might be implemented with
a 2-dimensional array of characters.
8.2. Comparisons
When processing strings there are several other standard operations. In particular, we
often have need to make comparisons between two string variables or between a string
variable and a literal. Some languages allow you to use the same operators such as ==
or even < to make comparisons between strings. The implied behavior would compare
strings for equality (case sensitive) or for lexicographic order. For example, "Apple" <
"Banana" might evaluate to true because "Apple" precedes "Banana" in alphabetic order.
Many languages, however, require that you make string comparisons using a function.
Using the equality operator == may be correct syntactically, but is usually making a
pointer or reference comparison which evaluates to true if and only if the two variables
represent the same thing in memory. Even if two string variables have the same content,
the equality operator may evaluate to false if they are distinct (deep) copies of the same
string. Likewise, the inequality operators <, ≤, etc. may only be comparing memory
addresses, which is meaningless for comparing strings.
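C is one language in which == compares addresses rather than contents. The two hypothetical helper functions below make the difference explicit; strcmp() is the standard library function for comparing the contents of two strings.

```c
#include <stdbool.h>
#include <string.h>

/* True only if a and b refer to the same location in memory. */
bool sameAddress(const char *a, const char *b) {
  return a == b;
}

/* True if a and b contain the same sequence of characters. */
bool sameContent(const char *a, const char *b) {
  return strcmp(a, b) == 0;
}
```

Two separate character arrays that both hold "Apple" have the same content but different addresses, so only the second function reports them as equal.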
The solution that many languages provide is the use of a comparator, which is either a
function or an object that facilitates the comparison of strings (and more generally, any
object). Generally, a comparator function takes two arguments, a, b and compares them,
not just for equality, but for their relative order: does a come before b or does b come
before a, or are they equal. To distinguish between these three cases, a comparator
returns an integer value with the following general contract: it returns
• Something negative if a < b
• Zero if a = b
• Something positive if a > b
Using this contract we can determine the relative ordering of any two strings. In general
we cannot make any assumptions about the actual value that a comparator returns, only
that it returns something negative or positive. The actual magnitude of the returned
value need not be −1 or +1, and it may not even have any predefined meaning.
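A comparator that follows this contract can be sketched in C for use with the standard qsort() function. The names compareStrings() and sortStrings() are hypothetical; strcmp() already follows the same negative/zero/positive contract, so the comparator only needs to unpack its arguments.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Comparator for an array of strings. qsort() passes pointers to
 * the array elements, so each argument is a pointer to a string.
 * Returns something negative, zero, or something positive.
 */
int compareStrings(const void *a, const void *b) {
  const char *s = *(const char * const *) a;
  const char *t = *(const char * const *) b;
  return strcmp(s, t);
}

/* Sorts an array of n strings into lexicographic order. */
void sortStrings(const char **strings, size_t n) {
  qsort(strings, n, sizeof(const char *), compareStrings);
}
```

Code that uses the comparator should only test the sign of the result, for example compareStrings(&a, &b) < 0, and never compare it to a specific value.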
8.3. Tokenizing
It is common to store different pieces of data as a string such that each individual piece
of data is demarcated by some delimiter. For example, Comma Separated Values (CSV)
or Tab Separated Values (TSV) data use commas and tabs to delimit data. For example,
the string
Smith,Joe,12345678,1985-09-08
is a CSV string holding data on a particular person (last name, first name, ID, date of
birth). Often we need to process such strings to extract each individual piece of data.
Processing such strings is usually referred to as parsing. In particular, a string is split
into a collection of individual strings called tokens (thus the process is also sometimes
referred to as tokenizing). In the example above, the string would be processed into
4 individual strings, "Smith", "Joe", "12345678", and "1985-09-08". Each string could
further be tokenized if needed, such as parsing the date of birth to extract the year,
month, and day.
Most languages provide a function to facilitate tokenizing. Some do so by directly
returning an array or collection of the resulting tokens (usually with the delimiter
removed). Others have a more manual process that requires a loop structure to iterate
over each token.
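C follows the more manual approach. The sketch below wraps the standard strtok() function in a hypothetical helper named tokenize(). Note that strtok() modifies the string it is given (each delimiter is overwritten with a null-terminating character) and that it skips over empty fields.

```c
#include <string.h>

/*
 * Splits line in place at any of the given delimiter characters.
 * Pointers to the tokens are stored in tokens[], up to maxTokens
 * of them. Returns the number of tokens stored.
 */
int tokenize(char *line, const char *delimiters,
             char *tokens[], int maxTokens) {
  int count = 0;
  /* the first call receives the string to split */
  char *token = strtok(line, delimiters);
  while (token != NULL && count < maxTokens) {
    tokens[count] = token;
    count++;
    /* subsequent calls pass NULL to continue with the same string */
    token = strtok(NULL, delimiters);
  }
  return count;
}
```

Since the original string is altered, a copy should be tokenized if the unmodified string is still needed afterwards.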
8.4. Exercises
Exercise 8.1. Write functions to reverse a string. If appropriate, write versions to do
so by manipulating a given string and returning a new string that is a reversed copy.
Exercise 8.2. Write a function to replace all spaces in a string with two spaces.
Exercise 8.3. Write a program to take a phrase (International Business Machines) and
acronymize it by producing a string that is an upper-cased string of the first letter of
each word in the phrase (IBM).
Exercise 8.4. Write a function that takes a string containing a word and returns a
pluralized version according to the following rules.
1. If the noun ends in "y", remove the "y" and add "ies"
2. If the noun ends in "s", "ch", or "sh", add "es"
3. In all other cases, just add "s"
Exercise 8.5. Write a function that takes a string and determines if it is a palindrome
or not. A palindrome is a word that is spelled exactly the same when the letters are
reversed.
Exercise 8.6. Write a function to compute the longest common prefix of two strings.
For example, the longest common prefix of "global" and "glossary" is "glo". If two
strings have no common prefix, then the longest common prefix is the empty string.
Exercise 8.7. Write a function to remove any whitespace from a given string. For
example, if the string passed to the function contains "Hello World How are you? "
then it should result in the string "HelloWorldHowareyou?"
Exercise 8.8. Write a function that takes a string and flips the case of each alphabetic
character in it. For example, if the input string is "GNU Image Processing Tool-Kit"
then it should output "gnu iMAGE pROCESSING tOOL-kIT"
Exercise 8.9. Write a function to validate a variable name according to the rules that it
must begin with an alphabetic character, a-z or A-Z, but may contain any alphanumeric
character a-z, A-Z, 0-9, or underscores _. Your function should take a string with a
possible variable name and return true or false depending on whether or not it is valid.
Exercise 8.10. Write a function to convert a string that represents a variable name using
under_score_casing to lowerCamelCasing . That is, it should remove all underscores,
and replace the first letter of each word with an uppercase (except the first word).
Exercise 8.11. Write a function that takes a string and another character c and counts
the number of times that c appears in the string.
Exercise 8.12. Write a function that takes a string and another character c and removes
all instances of c from the string. For example, a call to this function on the string
"Hello World" with c being equal to "o" would result in the string "Hell Wrld".
Exercise 8.13. Write a function that takes a string and two characters, c and d and
replaces all instances of c with d.
Exercise 8.14. Write a function to determine if a given string s contains a substring t.
The function should return true if t appears anywhere inside s and false otherwise.
Exercise 8.15. Write a function that takes a string s and returns a new string that
contains the first character of each word in s capitalized. You may assume that words
are separated by a single space. For example, if we call this function with the string
"International Business Machines" it should return "IBM" . If we call it with the
string "Flint Lockwood Diatonic Super Mutating Dynamic Food Replicator" it
should return "FLDSMDFR"
Exercise 8.16. Write a function that trims leading and trailing white space from a
string. Inner whitespace should not be modified.
Exercise 8.17. Write a function that splits a string containing a unix path/file into its
three components: the directory path, the file base name and the file extension. For
example, if the input string is /usr/home/message.txt then the three components
would be /usr/home/ , message and txt respectively. For the purposes of this function,
you may assume that the path ends with the last forward slash (or is empty if none) and
that the extension is always after the last period. That is, you should be able to handle
inputs such as ../foo/bar/baz.old.txt .
Exercise 8.18. Write a function that (re)formats a string representing a telephone
number. Phone numbers can be written using a variety of formats, for example
1-402-555-1234 or +4025551234 or 402 555-1234 , etc. Assume that you will only
deal with 10 digit US phone numbers. Create a new string that uses the standard
format of (402) 555-1234 .
Exercise 8.19. Write a function that takes a string and splits it up to an array of
strings. The split will be length-based: the function will also take an integer n and will
split the given string up into strings of length n. It is possible that the last string will
not be of length n.
For example, if we pass "Hello World, how are you?" with n = 3 then it should
return an array of size 9 containing the strings "Hel" , "lo " , "Wor" , "ld," , " ho" ,
"w a" , "re " , "you" , "?"
Exercise 8.20. HyperText Markup Language (HTML)
is the primary document description language for the World Wide Web (WWW). Certain
characters are not rendered in browsers as they are special characters used in HTML; in
particular tags, which begin and end with the characters < and >.
To display such characters correctly they need to be escaped (similar to how you need
to escape tabs \t and endline \n characters). Properly escaping these characters
is not only important for proper rendering, but there are also security issues involved
(Cross-Site Scripting Attacks).
Write a function that takes a string and escapes the HTML characters in Table 8.1.
Replace the following    with this
&                        &amp;
<                        &lt;
>                        &gt;
"                        &quot;
9. File Input/Output
A file is a block of data used for storing information. Normally, we think of a file as
something that is stored on a hard drive (or memory stick or other disk media), but the
concept of a file is much more general. For example, when a file is loaded (read) by a
program it then exists in main memory. An executable program itself is a file (containing
instructions to be executed), both stored on the hard drive and run in memory.
In a typical unix-based system, everything is a file. Directories are files, executables are
files, running processes are files, etc. Even the familiar standard input and standard
output are buffers that are treated as files that can be read from or written to.
Files may be stored as binary data or as plaintext files. Granted, plaintext files are still
stored as binary data, but are stored as an encoding using the ASCII text values. Binary
files will also have structure, but it depends on the application that produced the file to
give meaning to the data. For example, an image file in a Joint Photographic Experts
Group (JPEG) format is essentially just binary data but it has a very specific format
that an image viewer would be able to process, but, say, a text editor would not. Further,
if the binary format is corrupted, even the image viewer might not be able to display the
image correctly.
Files provide persistence of data. Typical programs are short lived, anywhere from a few
milliseconds to maybe a user session. However, we often want data to be saved across
multiple runs, so we need to save it or persist it in some durable storage medium (disk).
A file may be processed line by line until the end of the file has been reached.
Languages usually support this by a special End Of File (EOF) flag or value to indicate
the end of a file has been reached.
We need to properly manage a resource like a file just as we would with, say, memory
and properly close it once we are finished processing it. Depending on the language and
other factors such as the operating system, failure to close a file may result in corrupted
data. Though a file may be closed automatically for us when the program terminates, it is
still best practice to properly close it.
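The open, read until end of file, then close pattern described above can be sketched in C. This is a minimal illustration, not a definitive implementation: the function name count_lines and the 1024-byte buffer size are assumptions made for the example. It relies on the standard library call fgets, which returns NULL once the end of the file has been reached.

```c
#include <stdio.h>
#include <string.h>

/*
 * Counts the lines in the file at the given path by reading it
 * line by line. Returns -1 if the file cannot be opened.
 * (count_lines is a hypothetical helper name used for illustration.)
 */
long count_lines(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    char buffer[1024];
    long lines = 0;
    int partial = 0; /* set when the last chunk read did not end in '\n' */
    /* fgets returns NULL when the end of the file is reached */
    while (fgets(buffer, sizeof(buffer), f) != NULL) {
        size_t len = strlen(buffer);
        if (len > 0 && buffer[len - 1] == '\n') {
            lines++;
            partial = 0;
        } else {
            /* a very long line, or a final line with no newline */
            partial = 1;
        }
    }
    if (partial) {
        lines++;
    }
    /* release the resource once we are finished with it */
    fclose(f);
    return lines;
}
```

A caller would pass a path such as data.txt, which is resolved relative to the current working directory, and check for the -1 return value before using the result.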
9.1.1. Paths
When opening a file on a file system, it is necessary to specify which file you want to open.
This is typically done by specifying at least the name of the file. Often files will have
extensions which indicate the type of file it is such as .txt for text files or .html for
HTML files. However, in general file extensions are only for organizational purposes and
have no real bearing on what data is stored in the file.
More important is the path of the file. Usually, if no path is specified, then implicitly
we are opening the file in the Current Working Directory (CWD). For example, if we
open the file data.txt then we are opening the file in the same directory in which our
program is executing.
When specifying a path we can either specify an absolute path or a relative path. An
absolute path specifies each and every subdirectory in the file system from the root to
the directory that the file is located in. The root directory is the top-most directory in
the file system. Each subdirectory is separated by some delimiter.
Windows systems usually use a backslash as a directory delimiter while the root directory
is specified using a volume name such as C:\. For example, an absolute path on a
Windows system may look something like:
C:\applications\users\data\data.txt
On a Unix-based system, a forward slash is used as a directory delimiter and the root
directory is simply a single forward slash. The same directory structure in a Unix-based
system would look like the following.
/applications/users/data/data.txt
A path may also be relative to the current working directory. In most systems (Windows
and Unix-based) the current directory is denoted using a single period (.). You can use
this to specify directories deeper in the directory tree from the current directory. For
example (in Unix),
./app/data/data.txt
/proc/self/
|-- attr
|-- cwd -> /proc
|-- fd
|   `-- 3 -> /proc/15589/fd
|-- fdinfo
|-- net
|   |-- dev_snmp6
|   |-- netfilter
|   |-- rpc
|   |   |-- auth.rpcsec.context
|   |   |-- auth.rpcsec.init
|   |   |-- auth.unix.gid
|   |   |-- auth.unix.ip
|   |   |-- nfs4.idtoname
|   |   |-- nfs4.nametoid
|   |   |-- nfsd.export
|   |   `-- nfsd.fh
|   `-- stat
|-- root -> /
`-- task
    `-- 15589
        |-- attr
        |-- cwd -> /proc
        |-- fd
        |   `-- 3 -> /proc/15589/task/15589/fd
        |-- fdinfo
        `-- root -> /
Figure 9.1.: The Linux file system (like most file systems) defines a tree directory
structure. Each file and directory is contained in subdirectories, all contained
within the root directory, /. This diagram was generated by the Tree
command, http://mama.indstate.edu/users/ice/tree/.
9.2. Exercises
Exercise 9.1. Write a function that takes a string representing a file name and opens
and processes the file, returning all of its contents as a single string.
Exercise 9.2. Consider an irregular, 2-D simple polygon with $n$ points,
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$
The area $A$ of the polygon can be computed as
$$A = \frac{1}{2} \sum_{i=0}^{n-1} (x_i y_{i+1} - x_{i+1} y_i)$$
Note that the initial and end points will be the same, $(x_0, y_0) = (x_n, y_n)$. An example
polygon for $n = 5$ can be found in Figure 9.2.
You may assume that each word is separated by some whitespace (you may assume that
there are no multi-line hyphenated words). However, you should ignore all punctuation
(periods, question marks, etc.).
Use a standard American dictionary provided on your unix system which stores words
one per line. Your output should include all misspelled or unrecognized words (words
not contained in the dictionary file).
Exercise 9.5. A standard word search consists of an $n \times n$ grid in which there are a
number of words hidden, some intersecting, with dummy letters filling in the blanks. An
example is provided in Figure 9.3.
Exercise 9.7. Bridge is a four player (2 team) game played with a standard 52-card
deck. Prior to play, a round of bidding is performed to determine which team is playing
for or against the contract, the trump suit, and at what level. Understanding the rules of
the game or the bidding conventions involved is not necessary for this exercise. Instead,
write a program to assist players in how they should bid based on the following point
system.
A standard 52-card deck is dealt evenly to 4 different hands (Players 1 thru 4, 13 cards
each). Each player's hand is worth a number of points based on the following rules:
• Each Ace in the hand is worth 4 points
• Each King is worth 3
• Each Queen is worth 2
• Each Jack is worth 1
• For each suit (Diamond, Spade, Club, Heart) in which the hand has only 2 cards
  (a doubleton) an additional point is added
• For each suit in which the hand has only 1 card (a singleton) two additional points
  are added
• For each suit in which the hand has no cards (a void) 3 additional points are added.
Write a program that reads in a text file containing a deal. The formatting is as
follows: the input file will have 4 lines, one for each player. Each line contains the cards
dealt to that player delimited by a single space. The cards are indicated by the rank
(A, K, Q, J, 10, 9, . . . , 2) and the suit (D, S, C, H). An example:
3C 3D 7S QD KC AS 6S AC JS 4S JD 7H 6D
5D 8C 7D AH 3H QC 8D JH 5H 9D 7C 9C 4D
2H 10D 8H KS QH 4C 10S 9S 6H 8S KD AD QS
2D 10C 6C 2C 10H 4H 2S 3S 5C 9H KH JC 5S
Your program should process the file and output the total number of points each hand
represents. You should not make any assumptions about the ordering of the input.
Hand 1 Points: 17
Hand 2 Points: 10
Hand 3 Points: 16
Hand 4 Points: 6
Exercise 9.8. The game of Sudoku is played on a $9 \times 9$ grid in which entries consist of
the numbers 1 thru 9. Initially, the board is presented with some values filled in and
others blank. The player has to fill in the remaining values until all grid boxes are filled
and the following constraints are satisfied.
• In each of the 9 rows, each number 1 through 9 must appear exactly once
• In each of the 9 columns, each number 1 through 9 must appear exactly once
• In each of the $3 \times 3$ sub-grids, each number 1 through 9 must appear exactly once
A full example is presented in Figure 9.4.
are enclosed in square brackets (putting them in an array). For each record, each value is
denoted with a key (the column name) and a value. For this exercise, treat all values as
strings even if they are numbers. For example, the input file above would be formatted
as follows.
[
  {
    "lastName": "Castro",
    "firstName": "Starlin",
    "NUID": "11223344",
    "GPA": "3.48"
  },
  {
    "lastName": "Rizzo",
    "firstName": "Anthony",
    "NUID": "55667788",
    "GPA": "3.95"
  },
  {
    "lastName": "Bryant",
    "firstName": "Chris",
    "NUID": "01234567",
    "GPA": "2.7"
  }
]
Exercise 9.10. Ranked voting elections are elections where each voter ranks each
candidate rather than just voting for a single candidate. If there are n candidates, then
each voter will rank them 1 (best) through n (worst). Usually, the winner of such an
election is determined by a Condorcet method (the candidate that would win by a
majority in all head-to-head contests). However, we'll use an alternative method, a Borda
count.
In a Borda count, points are awarded to each candidate for each ballot. For every number
1 ranking, a candidate receives $n$ points, for every 2 ranking, a candidate gets $n - 1$
points, and so on. For a rank of $n$, the candidate only receives 1 point. The candidates
are then ordered by their total points and the one with the highest point count wins
the election. Such a system usually leads to a consensus candidate rather than one
preferred by a majority.
Implement a Borda-count based ranked voting program. Your program will read in a file
in the following format. The first line will contain an ordered list of candidates delimited
by commas. Each line after that will represent a single ballot's ranking of the candidates
and will contain comma delimited integers 1 through n. The order of the rankings will
correspond to the order of the candidates on the first line.
Your program will take an input file name as a command line argument, open the file
and process it. It will then report the results including the point total for each candidate
(in order) as well as the overall winner. It will also report the total number of ballots.
You may assume each ballot is valid and all rankings are provided.
An example input:
Alice,Bob,Charlie,Deb
2,1,4,3
3,4,2,1
4,2,3,1
3,2,1,4
3,1,4,2
An example output:
Election Results
Number of ballots: 5
Candidate   Points
Bob         15
Deb         14
Charlie     11
Alice       10
Winner is Bob
Exercise 9.11. A DNA sequence is a sequence of some combination of the characters
A (adenine), C (cytosine), G (guanine), and T (thymine) which correspond to the
four nucleobases that make up DNA. Given a long DNA sequence, it's often useful to
compute the frequency of n-grams. An n-gram is a DNA subsequence of length $n$. Since
there are four bases, there are $4^n$ possible n-grams.
Write a program that processes a DNA sequence from a plaintext file and, given n,
computes the relative frequency of each n-gram as it appears in the sequence. As an
example, consider the sequence in Figure 9.5.
GGAAGTAGCAGGCCGCATGCTTGGAGGTAAAGTTCATGGTTCCCTGGCCC
Figure 9.5.: A DNA Sequence
To compute the frequency of all $n = 2$ n-grams, we would consider all 16 combinations
of length-two DNA sequences. We would then go through the sequence and count up the
number of times each 2-gram appears. We then compute the relative frequency (note:
if a sequence is length $L$, then the total number of n-grams in it is $L - (n - 1)$). The
relative frequency of each such 2-gram is calculated below.
AA   6.1224%
AC   0.0000%
AG  10.2041%
AT   4.0816%
CA   6.1224%
CC  10.2041%
CG   2.0408%
CT   4.0816%
GA   4.0816%
GC  10.2041%
GG  12.2449%
GT   8.1633%
TA   4.0816%
TC   4.0816%
TG   8.1633%
TT   6.1224%
Exercise 9.12. Given a long DNA sequence, it is often useful to compute the number
of instances of a certain subsequence. As an example, if we were to search for the
subsequence GTA in the DNA sequence in Figure 9.5, it appears twice. As another
example, in the sequence CCCC, the subsequence CC appears three times.
Write a program that processes a text file containing a DNA sequence and, given a
subsequence s, searches the DNA sequence and counts the number of times s appears.
Exercise 9.13. Protein sequencing in an organism consists of a two-step process. First
the DNA is translated into RNA by replacing each thymine (T) nucleotide with uracil
(U). Then, the RNA sequence is translated into a protein according to the following rules.
The RNA sequence is processed 3 bases at a time. Each trigram is translated into a single
amino acid according to known encoding rules. There are 20 such amino acids, each
represented by a single letter in (A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y).
The rules for translating trigrams are presented in Figure 9.6. Each triple defines a
protein, but we're only interested in the first letter of each protein. Moreover, the
trigrams UAA, UAG, and UGA are special markers that indicate a (premature) end to
the protein sequencing (there may be additional nucleotides left in the RNA sequence,
but they are ignored and the translation ends).
As an example, suppose we start with the DNA sequence AAATTCCGCGTACCC;
it would be encoded into RNA as AAAUUCCGCGUACCC; and into an amino acid
sequence KFRVP.
Write a program that processes a file containing a DNA sequence and outputs the
translated proteins (only the first letter of each protein) to an output file.
Exercise 9.14. Recently, researchers have successfully inserted two new artificial nucleases into simple bacteria that successfully reproduced the artificial bases through several
Amino Acid   Codon   Artificial Codon
Threonine    ACT     AYT
             ACC     AYY
             ACA     AYA
             ACG     AYX
Alanine      GCT     XYT
             GCC     XYY
             GCA     XYA
             GCG     XYX
Depending on the identification number, it may be more appropriately modeled with a string. Social
Security Numbers for example are not purely numeric: they include dashes and may begin with
zeros.
Last Name    ID   GPA
Baker        74   3.75
Eccleston     5   3.5
Tennant      10   4.0
Smith        29   3.2
Capaldi      13   2.9
10.1. Objects
Though languages will differ in how they support objects, they all have some commonalities. A language needs to provide ways to define objects, create instances of objects, and
to use them in code.
10.1.1. Defining
Most object oriented programming languages such as C++ and Java are class-based
languages, meaning that they allow you to define objects by declaring and defining a
class. A class is essentially a blueprint for what the object is and how it is defined.
Generally, a class declaration allows you to specify member variables and member
methods. Further, full encapsulation is achieved by using visibility keywords such as
public or private to either allow or restrict access to variables and methods from
code outside the object.
Some languages (such as C) do not support full encapsulation, rather they allow you to
define structures which allow for the grouping of data, but make it difficult or impossible
to achieve the other two aspects of encapsulation (the grouping of methods that act on
that data and the protection of data).
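As a sketch of the structure-based approach, the following C code groups a student's data into a single user-defined type. The field names, the function student_has_honors, and the 3.5 threshold are all hypothetical choices made for illustration. Note that nothing prevents outside code from changing a field such as gpa directly, which is the missing protection described above.

```c
/* A structure grouping the data that describes one student. */
typedef struct {
    char firstName[32];
    char lastName[32];
    int id;
    double gpa;
} Student;

/*
 * A free function that acts on a Student. In C it is not a member
 * of the structure, and every field is accessible to any code.
 * The 3.5 cutoff is an arbitrary value chosen for this example.
 */
int student_has_honors(const Student *s) {
    return s->gpa >= 3.5;
}
```

Once defined, Student can be used like any built-in type: variables, arrays, and function parameters can all be declared with it.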
In either case, a language usually allows you to define the member variables and to name
the class or structure so that instances can be referred to by that type. Built-in types such
as numbers or strings already have a type name defined by the language. However, an
object is a user-defined type that is not built in to the language. Once defined, however,
the class or structure can be referred to just like any built-in variable type.
It is common to create objects that are made of other objects. For example, a student
object may be defined by using two strings for its first and last name. In the language,
a string itself may be an object. As a more complex example, suppose that we wanted
an additional member variable to model a student's date of birth. A date may itself
be an object as it consists of several pieces of information (a year, month, and day
at least). When an object owns an instance of another object it is referred to as
composition as the object is composed of other objects. Further, an object may consist of
a collection of other objects (suppose that a student object owned an array of course
objects representing their schedule). This is a form of composition known as aggregation
(multiple objects have been aggregated by the object).
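Composition and aggregation can be illustrated with C structures. In this sketch the type names Date and Course, the fixed schedule capacity of 8, and the helper student_total_credits are assumptions introduced for the example.

```c
/* A date is itself a small structure made of several values. */
typedef struct {
    int year;
    int month;
    int day;
} Date;

typedef struct {
    char title[32];
    int credits;
} Course;

typedef struct {
    char firstName[32];
    char lastName[32];
    Date dateOfBirth;    /* composition: a Student owns a Date        */
    Course schedule[8];  /* aggregation: a Student owns many Courses  */
    int numCourses;      /* how many entries of schedule are in use   */
} Student;

/* Sums the credits of the courses in a student's schedule. */
int student_total_credits(const Student *s) {
    int total = 0;
    for (int i = 0; i < s->numCourses; i++) {
        total += s->schedule[i].credits;
    }
    return total;
}
```

Members of the nested structures are reached by chaining the access operator, for example s.dateOfBirth.year.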
10.1.2. Creating
Once a blueprint for an object (or structure) has been declared and defined, the second
element that a language usually provides for is a way to create instances of the object.
The concept of an object is general and abstract. It's like the idea of a student. Only
once we have created an actual instance that lives in memory do we have something
concrete to work with. Creating instances of an object is usually referred to as instantiation.
Languages may be able to automatically create instances of your object with default
values. After all, your object is likely composed of built-in types. The student example
above for example could be modeled with two strings, an integer, and a floating point
number. The language/compiler/interpreter knows how to deal with these built-in
types, so it can extend that knowledge to create instances of your object which are
essentially just collections of types that it already knows how to deal with.
Object oriented languages usually provide a special method for you to be able to give
more specifics about how an object is created. These are called constructor methods.
Sometimes you can define multiple constructor methods that take different number(s)
of arguments and/or have different behavior. Constructor methods typically have special
syntax or have the same name as the class.
In other languages that do not fully support object-oriented programming, you must
define utility functions that can be used to create instances of your object. Sometimes
these are referred to as factory functions as they are responsible for manufacturing
instances of your object.
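A factory function in C might look like the following sketch. The names create_student, free_student, and copy_string are hypothetical, and the choice to allocate each instance on the heap (so the caller must later free it) is one possible design among several.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *firstName;
    char *lastName;
    int id;
    double gpa;
} Student;

/* Returns a heap-allocated copy of s, or NULL on failure. */
static char *copy_string(const char *s) {
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL) {
        strcpy(copy, s);
    }
    return copy;
}

/* Factory function: builds a new Student instance, or NULL on failure. */
Student *create_student(const char *firstName, const char *lastName,
                        int id, double gpa) {
    Student *s = malloc(sizeof(Student));
    if (s == NULL) {
        return NULL;
    }
    s->firstName = copy_string(firstName);
    s->lastName = copy_string(lastName);
    if (s->firstName == NULL || s->lastName == NULL) {
        free(s->firstName);
        free(s->lastName);
        free(s);
        return NULL;
    }
    s->id = id;
    s->gpa = gpa;
    return s;
}

/* Releases everything that create_student allocated. */
void free_student(Student *s) {
    if (s != NULL) {
        free(s->firstName);
        free(s->lastName);
        free(s);
    }
}
```

Each call to create_student should eventually be paired with a call to free_student, in the same way that an opened file should eventually be closed.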
as member fields to your object, it is probably more appropriate to define an address
object, especially if such an object would be useful elsewhere in a program.
10.3. Exercises
Exercise 10.1. A complex number consists of two real numbers: a real component $a$
and an imaginary component $bi$ where $b$ is a real number and $i = \sqrt{-1}$. Define an object
or structure to model a complex number. Write functions to:
• Create a complex number
• Print a complex number
• Perform basic arithmetic operations on two complex numbers including addition,
  subtraction and multiplication as defined by:
$$c_1 + c_2 = (a_1 + b_1 i) + (a_2 + b_2 i) = (a_1 + a_2) + (b_1 + b_2) i$$
$$c_1 - c_2 = (a_1 + b_1 i) - (a_2 + b_2 i) = (a_1 - a_2) + (b_1 - b_2) i$$
$$c_1 \cdot c_2 = (a_1 + b_1 i) \cdot (a_2 + b_2 i) = (a_1 a_2 - b_1 b_2) + (a_1 b_2 + b_1 a_2) i$$
Exercise 10.2. Design an object (or structure) that models an album. Include at least
the album title, artist (or band) and release year. Include any other data that you think
is relevant and write functions to support your object.
Exercise 10.3. Design an object (or structure) that models a bank savings account.
Include at least the balance, APR, an account number and customer information
(which may be another object or structure). Include any other data that you think is
relevant and write functions to support your object.
Exercise 10.4. Design an object (or structure) that models a sports stadium. Include at
least the stadium name, the team that plays there, its city, state, and year built. Include
both latitude and longitude data. Include any other data that you think is relevant and
write functions to support your object. Write a parser to process a flat file data of all
stadiums (in your chosen sport) and build instances of all of them.
Exercise 10.5. Design an object (or structure) that models a network-connected electronic device. Include at least a unique ID, a human-readable name for the device, a
Media Access Control (MAC) address and Internet Protocol (IP) address as well as the
device's bandwidth in megabits per second. Include any other data that you think is
relevant and write functions to support your object.
Exercise 10.6. Design an object (or structure) that models an airport. Include at least
the name, FAA designation, its city, state, and latitude/longitude data. Include any
other data that you think is relevant and write functions to support your object.
11. Recursion
Suppose we wanted to write a simple program that counted down, printing 10, 9, 8, . . . ,
2, 1 and when it reached zero it printed a "Happy New Year" message. Likely our first
instinct would be to write a very simple for-loop using an increment variable. But what
if we lived in a world without the usual loop control structures that we are now familiar
with? How might we write such a program?
After thinking about it for a while, we might think: well, we don't have loops, but we
still have functions. In particular what if we had a function that took the current value
of our counter variable and decremented it, passing it to another function, which did the
same thing. For example, we could pass 10 to such a function, which would then subtract
1, passing 9 to another function and so on. A check could be made to see if the value was
zero, in which case we print our special message and no longer call any more functions.
In fact, we would not need to define 10 different functions to do so. Instead, we could
define one function that called itself. It might look something like Algorithm 11.1.
Input : An integer $n \geq 0$
Output : A countdown of integers $n, \ldots, 0$
if n = 0 then
    output "Happy New Year!!!"
else
    output n
    CountDown(n - 1)
end
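The countdown pseudocode translates almost line for line into C. In this sketch the function name count_down is an assumption, and the base case is written as n <= 0 rather than n = 0 so that a negative argument cannot cause unbounded recursion.

```c
#include <stdio.h>

/*
 * Recursively counts down from n, printing a message at zero.
 * The check n <= 0 also guards against negative input, which the
 * pseudocode excludes by requiring n >= 0.
 */
void count_down(int n) {
    if (n <= 0) {
        printf("Happy New Year!!!\n");
    } else {
        printf("%d\n", n);
        count_down(n - 1);
    }
}
```

Calling count_down(10) prints the integers 10 down to 1, one per line, followed by the message.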
This was not just a toy example. There are many programming languages in which
recursion is used as a matter of course. Functional programming languages tend to avoid
control structures like loops and even (mutable) variables. Instead, control flow is defined
by evaluating a series of functions, making recursion a fundamental operation.
Recursion is extensively used in mathematics. Recurrence relations or recursive functions
are common. The Fibonacci sequence is a common, if not overused,¹ example. It has a
simple definition: the next value in the sequence is simply the sum of the two previous
values. The sequence starts with two initial values of 1. The first few terms in the
sequence:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, . . .
The more formal mathematical definition can be stated as follows.
$$F_n = \begin{cases} 1 & \text{if } n = 0 \\ 1 & \text{if } n = 1 \\ F_{n-1} + F_{n-2} & \text{otherwise} \end{cases}$$
The Fibonacci sequence is the cliche example for recursion. We can define an algorithmic
function to compute the n-th Fibonacci number as follows.
Input : An integer $n \geq 0$
Output : The n-th Fibonacci number, $F_n$
if $n \leq 1$ then
    output 1
else
    output Fibonacci(n - 1) + Fibonacci(n - 2)
end
Algorithm 11.2: Recursive Fibonacci(n) Function
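A direct C rendering of Algorithm 11.2 is sketched below. The function name fibonacci and the long return type are choices made for this example. It follows the text's convention that the first two values, $F_0$ and $F_1$, are both 1.

```c
/*
 * Computes the n-th Fibonacci number using the definition directly.
 * Assumes n >= 0; any n <= 1 returns the base value 1.
 * The number of calls grows very quickly with n, so this version
 * is only practical for small inputs.
 */
long fibonacci(int n) {
    if (n <= 1) {
        return 1;
    }
    return fibonacci(n - 1) + fibonacci(n - 2);
}
```

For example, fibonacci(10) evaluates to 89, matching the eleventh term of the sequence listed earlier.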
Though hackneyed, it does provide a good example for how recursive functions work.
We'll also utilize it as an example of why you should avoid recursion in practice. We'll
also use it to illustrate how the problems with recursion can be mitigated or avoided
altogether.
¹ The Fibonacci sequence is nothing special; it's simply a second order linear homogeneous recurrence
relation with coefficients of 1. You can define many such relations. The near reverence that so many
people attribute to it borders on mysticism.
int foo(int x) {
...
return bar(x-1) + 1;
}
Here, foo() calls bar() but it is not the last operation before it returns. Instead, it
invokes bar(), takes the result and adds one, then returns to the calling function. Note
that decrementing x is performed before the invocation of bar(). In contrast, consider
the following modified code:
int foo(int x) {
...
return bar(x-1);
}
Here, the invocation of bar() is the last operation performed by foo(). Thus, this is
a tail call.
Tail calls have the advantage that a language or compiler can generally optimize the
function call with respect to the stack frame. Since the function foo() is essentially
done with its computation, its stack frame is no longer needed. The system, therefore, can
reuse the stack frame. Optimizing tail recursion is so important that some languages
require it or guarantee it in other ways.
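To make the distinction concrete, here is a sketch of the same computation (summing the integers 1 through n) written with and without a tail call. The function names sum_to and sum_to_acc are hypothetical. Note that the C standard does not require compilers to reuse the stack frame for tail calls, although many optimizing compilers do so.

```c
/*
 * Not tail recursive: after the recursive call returns, the
 * function still has to add n to the result.
 */
long sum_to(long n) {
    if (n <= 0) {
        return 0;
    }
    return n + sum_to(n - 1);
}

/*
 * Tail recursive: the running total is carried in the accumulator
 * acc, so the recursive call is the last operation performed.
 */
long sum_to_acc(long n, long acc) {
    if (n <= 0) {
        return acc;
    }
    return sum_to_acc(n - 1, acc + n);
}
```

The tail recursive version is started with an accumulator of zero, for example sum_to_acc(10, 0).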
$$\sum_{i=0}^{n-1} F_i$$
² Its overuse as an example of recursion is even less explicable as it solves a problem that no one cares
about.
[Figure: tree of recursive calls made by the Fibonacci function. The calls Fibonacci(3), Fibonacci(2), Fibonacci(1), and Fibonacci(0) each appear more than once.]
11.2.1. Memoization
The inefficiency in the example above comes from the fact that we make the same function
calls on the same values over and over. One way to avoid recomputing the same values is
to store them into a table (or tableau if you prefer being fancy). Then, when you need
to compute a value, you look at the table to see if it has already been computed. If
it has, we reuse the value stored in the table, otherwise we compute it by making the
appropriate recursive calls. Once computed, we place the value into the table so that it
can be looked up on subsequent function calls. This approach is usually referred to as
memoization.
The table in this case is very general: it can be achieved using a number of different
data structures including simple arrays, or even maps (mapping input value(s) to output
values). The table is essentially serving as a cache for the previously computed values. An
illustration of how this might work can be found in Algorithm 11.3. Here, the recursion
only occurs if the value Fn is not yet defined.
[Algorithm 11.3: Fibonacci function with memoization]
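One way the memoization idea might look in C is sketched below, using a simple array as the table. This is an illustration under stated assumptions rather than a transcription of Algorithm 11.3: the names fibonacci_memo, table, and MAX_N are hypothetical, and the limit of 90 is chosen so that every stored value fits in a 64-bit signed integer. A stored value of 0 marks an entry that has not yet been computed, which is safe because every Fibonacci number in this convention is at least 1.

```c
#define MAX_N 90

/* The cache: table[n] holds F_n once computed, or 0 if not yet computed. */
static long long table[MAX_N + 1];

/*
 * Computes F_n (with F_0 = F_1 = 1), reusing previously computed
 * values. Returns -1 if n is outside the supported range.
 */
long long fibonacci_memo(int n) {
    if (n < 0 || n > MAX_N) {
        return -1;
    }
    if (table[n] != 0) {
        /* already computed: reuse the cached value */
        return table[n];
    }
    long long value;
    if (n <= 1) {
        value = 1;
    } else {
        value = fibonacci_memo(n - 1) + fibonacci_memo(n - 2);
    }
    /* store the result so later calls can look it up */
    table[n] = value;
    return value;
}
```

With the table in place, each value is computed only once, so even fibonacci_memo(50) returns immediately, whereas the plain recursive version would make billions of calls.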
11.3. Exercises
Exercise 11.1. The binomial coefficients, $C(n, k)$ or $\binom{n}{k}$ ("n choose k"), are defined as
the number of ways you can select $k$ distinct items from a collection of $n$ items. A direct
combinatorial definition is
$$\binom{n}{k} = \frac{n!}{k!(n - k)!}$$
An alternative is Pascal's identity, which gives a recurrence to compute this value:
$$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$$
where $\binom{n}{0} = 1$ for any $n$ and, for all $k > n$, $\binom{n}{k} = 0$. Finally, $\binom{n}{1} = n$.
1. Write a recursive function using Pascal's identity to compute $\binom{n}{k}$. Benchmark its
performance.
2. Write a recursive version that uses memoization to avoid recomputing values.
3. Modify your functions to utilize an arbitrary precision numeric type so that you
can compute arbitrarily large values.
Exercise 11.2. The Jacobsthal sequence is very similar to the Fibonacci sequence in
that it is defined by its two previous terms. The difference is that the second term is
multiplied by two.
$$J_n = \begin{cases} 0 & \text{if } n = 0 \\ 1 & \text{if } n = 1 \\ J_{n-1} + 2 J_{n-2} & \text{otherwise} \end{cases}$$
12.1. Searching
Searching is a very basic operation. Given a collection of data, we wish to find a particular
element or elements that match certain criteria. More formally, we have the following.
Problem 1 (Searching).
Given: a collection of elements, $A = \{a_1, a_2, \ldots, a_n\}$ and a key element $e_k$
Output: The element $a_i$ in $A$ such that $a_i = e_k$
The equality or comparison in this problem statement is not explicitly specified. In
fact, this is a very general, abstract statement of the basic search problem. We didn't
specify that the collection was an array, a list, a set, or any other particular data
structure. Nor did we specify what type of elements were in the collection. They could
be numbers, they could be strings, they could be objects.
There are many variations of this general search problem that we could consider. For
example, we could generalize it to find the first or last such element if our collection is
ordered. We could find all elements that match our criteria. Some basic operations that
we've already considered such as finding the minimum or maximum (extremal elements),
or median element are variations on this search problem.
When designing a solution to any of these variations additional considerations must be
made. We may wish our search to be index-based (that is, output the index $i$ rather
than the element $a_i$). We may need to think about how to handle unsuccessful searches
(return null? A special flag value? Throw an exception? etc.).
When implementing a solution in a programming language, we of course will need to be
more specific about the type of collection being searched, the type of elements in the
[Figure 12.1: an example array of integers, shown as index and contents rows]
[Figure 12.2: $a_1, \ldots, a_{\frac{n}{2}-1}$ ($< m$) | $a_{\frac{n}{2}}$ ($= m$) | $a_{\frac{n}{2}+1}, \ldots, a_n$ ($> m$)]
Figure 12.2.: When an array is sorted, all elements in the left half are less than the
middle element m, all elements in the right half are greater than m.
To illustrate, consider the following example searches. Suppose we wish to search the
0-indexed array of integers in Figure 12.1.
A search for the key $e_k = 102$ would start at the first element. $42 \neq 102$ so the search
would continue; it would compare it against 4, then 9, then 5, and finally find 102 at
index $i = 4$, making a total of 5 comparisons (including the final comparison to the
matched element).
A search for the key $e_k = 42$ would get lucky. It would find it after only one comparison as
the first element is a match. A search for the element 20 would result in an unsuccessful
search with a total of 10 comparisons being made. Finally a search for $e_k = 4$ would only
require two comparisons as we find 4 at the second index. There is a duplicate element
at index 3, but the way we've defined linear search is to find the first such element.
Again, we could design any number of variations on this solution.
We give a more detailed analysis of this algorithm below.
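The linear search behavior walked through above can be sketched in C for an array of integers. This is one possible variation: the name linear_search is an assumption, the search is index-based, and -1 is used as the flag value for an unsuccessful search.

```c
/*
 * Searches the first n elements of a for key, examining elements
 * in order. Returns the index of the first match, or -1 if no
 * element matches.
 */
int linear_search(const int *a, int n, int key) {
    for (int i = 0; i < n; i++) {
        if (a[i] == key) {
            return i;
        }
    }
    return -1;
}
```

Because the loop stops at the first match, a duplicate of the key at a later index is never examined.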
If duplicate elements are in the array, then elements in the left/right half could be less than or equal
to and greater than or equal to m, but this will not affect how our algorithm works.
$m \leftarrow \lfloor \frac{l + r}{2} \rfloor$
if $a_m = e_k$ then
    output $a_m$
else if $a_m < e_k$ then
    BinarySearch($A$, $m + 1$, $r$, $e_k$)
else
    BinarySearch($A$, $l$, $m - 1$, $e_k$)
end
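A recursive binary search following the steps above might be written in C as follows. This sketch makes several assumptions: the array is sorted in ascending order, the function returns an index rather than the element, -1 signals an unsuccessful search, and the base case for an empty range (l > r) is supplied here. The midpoint is computed as l + (r - l) / 2, which gives the same value as the floor of (l + r) / 2 but avoids integer overflow for large indices.

```c
/*
 * Searches the sorted array a between indices l and r (inclusive)
 * for key. Returns the index of a matching element, or -1 if the
 * key is not present.
 */
int binary_search(const int *a, int l, int r, int key) {
    if (l > r) {
        /* empty range: unsuccessful search */
        return -1;
    }
    int m = l + (r - l) / 2;
    if (a[m] == key) {
        return m;
    } else if (a[m] < key) {
        /* key can only be in the right half */
        return binary_search(a, m + 1, r, key);
    } else {
        /* key can only be in the left half */
        return binary_search(a, l, m - 1, key);
    }
}
```

To search an entire array of n elements, the initial call uses l = 0 and r = n - 1. Both recursive calls are tail calls, so the function could also be rewritten as a loop.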
run of the algorithm is shown in Figure 12.3.
12.1.3. Analysis
When algorithms are implemented and run on a computer, they require a certain amount
of resources. In general, we could consider a lot of different resources such as computation
time and memory. Algorithm analysis involves quantifying how much of a resource an
algorithm requires to execute with respect to the size of the input it is run on.
When analyzing algorithms, we want to keep the analysis as abstract and general as
possible, independent of any particular language, framework or hardware. We could
always upgrade the hardware on which we run our implementation, but that does not
necessarily make the algorithm faster; it only means that more steps of the algorithm can
be run in less time. The number of operations that the algorithm performs remains the
same. In fact, the concept of an algorithm itself is a mathematical concept that predates
modern computers by thousands of years. One of the oldest algorithms, for example,
Euclid's GCD (greatest common divisor) algorithm, dates to 300 BCE. Whether
you're running it on a piece of papyrus 2,300 years ago or on a modern supercomputer,
the same number of divisions and subtractions are performed.
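For reference, a common modern formulation of Euclid's algorithm is sketched below in C. It uses the remainder operator in place of repeated subtraction, which is an assumption about presentation rather than a claim about the original formulation; the function name gcd is also chosen here. The number of loop iterations depends only on the two inputs, not on the machine running it.

```c
/*
 * Computes the greatest common divisor of a and b by repeatedly
 * replacing the pair (a, b) with (b, a mod b) until b is zero.
 * By convention gcd(a, 0) is a.
 */
unsigned int gcd(unsigned int a, unsigned int b) {
    while (b != 0) {
        unsigned int r = a % b;
        a = b;
        b = r;
    }
    return a;
}
```

For instance, gcd(48, 18) proceeds through the pairs (48, 18), (18, 12), (12, 6), (6, 0) and returns 6.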
To keep things abstract, we analyze an algorithm using pseudocode and identify an
elementary operation. This is generally the most common or most expensive operation
that the algorithm performs. Sometimes there may be more than one reasonable choice
for an elementary operation which may give different analysis results. However, we
generally do not consider basic operations that are necessary to the control flow of an
algorithm, for example, variable assignments or the iteration of index variables.
[Figure 12.3 panels: index and contents rows of the array at each step]
(b) Since 64 > 12, we update our left index variable l to m + 1, thus l = 6 and
we've eliminated the left half of the list from consideration.
(d) Since 64 < 102, we update the right index variable r to m - 1 = 7, eliminating
the right half of the subarray.
Figure 12.3.: The worst case scenario for binary search, resulting in an unsuccessful
search. This example is run on a 0-indexed array of integers of size 11.
Once we have identified an elementary operation, we can quantify the complexity of
an algorithm by analyzing the number of times the elementary operation is executed
with respect to the input size. For a collection, the input size is generally the number
of elements in the collection, n. We can then characterize the number of elementary
operations and thus the complexity of the algorithm itself as a function of the input size.
We illustrate this process by analyzing and comparing the two search algorithms.
Linear Search Analysis
When considering the linear search algorithm, the input size is clearly the number of
elements in the collection, n. The best candidate for the elementary operation is the
comparison (Line 2, Algorithm 12.1). To analyze this algorithm, we need to determine
how many comparisons are made with respect to the size of the collection, n.
As we saw in the examples, the number of comparisons made by linear search can vary
depending on the element we're searching for and the configuration of the collection being
searched. Because of this variability, we can analyze the algorithm in one of three ways:
by looking at the best case scenario, worst case scenario, and average case scenario.
The best case scenario is when the number of operations is minimized. For linear search,
the best case scenario happens when we get lucky and the first element that we examine
matches our criteria, requiring only a single comparison operation. In general, it is not
reasonable to assume that the best case scenario will be commonly encountered.
The worst case scenario is when the number of operations is maximized. This happens
when we get unlucky and have to search the entire collection finding a match at the
last element or not finding a match at all. In either case, we make n comparisons to
search the collection.
A formal average case analysis is not difficult, but is a bit beyond the scope of the
present analysis. However, informally, we could expect to make about $n/2$ comparisons for
successful searches if we assume that all elements have a uniform probability of being
searched for.
Both the worst-case and average-case are reasonable scenarios from which to analyze
the linear search algorithm. In the end, however, the only difference between the two
analyses is a constant factor. Both analyses result in two linear functions,
\[ f_1(n) = n \qquad f_2(n) = \frac{1}{2} n \]
The only difference is the constant factor $\frac{1}{2}$. In fact, this is why the algorithm is called
In fact, this is the basis of Big-O analysis, something that we will not discuss in detail here, but is of
prime importance when analyzing algorithms.
[Table 12.1 contents: at iteration k the array under consideration has size $n/2^k$ (that is, $n/2$, $n/4$, $n/8$, $n/16$, and so on down to 1) and one comparison is made per iteration, for a total of $\log_2(n) + 1$ comparisons.]
Table 12.1.: Number of comparisons and array size during the execution of binary search.
Comparative Analysis
Binary search presents a clear advantage over linear search. There is an exponential
difference between a linear function, $n/2$, and a logarithmic function, $\log_2(n)$.
To put this in perspective, consider searching a moderately large database of 1 trillion
($10^{12}$) records.3 Using linear search, even in the average-case scenario, would require
about
\[ \frac{10^{12}}{2} = 5 \times 10^{11} \]
or about 500 billion comparisons. However, using binary search would only require at
most
\[ \log_2(10^{12}) = 12 \log_2(10) < 40 \]
comparisons to search. This is a huge difference in performance.
As another comparison, let's consider how each algorithm's complexity grows as we
increase the size of the collection being searched. As observed earlier, if we double the
input size, $n \rightarrow 2n$, we would expect the number of comparisons performed by linear
search to also double. However, if we double the input size for binary search, we get the
following.
\[ \log_2(2n) = \log_2(2) + \log_2(n) = \log_2(n) + 1 \]
That is, only a single additional comparison is necessary to search an array of twice the
size.
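The growth rates described above can also be observed by counting comparisons directly. The following is a minimal sketch in C, not the text's own implementation: the function names are illustrative, the array is assumed to be sorted, and each loop iteration of binary search is counted as one (three-way) comparison.

```c
#include <stddef.h>

/* Counts the comparisons linear search makes when looking for key
   in arr (n elements); an unsuccessful search makes n comparisons. */
size_t linear_search_comparisons(const int *arr, size_t n, int key) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        count++;                      /* one comparison against arr[i] */
        if (arr[i] == key) {
            break;
        }
    }
    return count;
}

/* Counts the iterations (one three-way comparison each) that binary
   search makes on a sorted array of n elements. */
size_t binary_search_comparisons(const int *arr, size_t n, int key) {
    size_t count = 0;
    size_t lo = 0, hi = n;            /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        count++;                      /* compare key against arr[mid] */
        if (arr[mid] == key) {
            break;
        } else if (arr[mid] < key) {
            lo = mid + 1;             /* discard the left half */
        } else {
            hi = mid;                 /* discard the right half */
        }
    }
    return count;
}
```

For a key smaller than every element, going from 16 to 32 elements doubles the linear count (16 to 32) while the binary count grows by one (5 to 6).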
3
In the era of big data, 1 trillion records only qualifies as moderately large.
Figure 12.4.: Illustrative example of the benefit of ordered (indexed) elements, Windows
7
Though binary search presents a clear advantage over linear search, it only works if the
collection has been sorted.4 Thus, we now turn our attention to the problem of sorting a
collection.
12.2. Sorting
Sorting a collection of data is another fundamental data operation. It is conceptually
simple, but is ubiquitous. There are a large variety of algorithms, data structures and
applications built around the problem of sorting. As we've already seen, being able
to sort a collection provides a huge speed up when searching for a particular element.
Sorting provides a natural way to store and organize data.
Problem 2 (Sorting).
Given: a collection of orderable elements, $A = \{a_1, a_2, \ldots, a_n\}$
Output: A permuted list of elements $A' = \{a'_1, a'_2, \ldots, a'_n\}$ according to a specified order
The requirement that the collection be made of orderable elements can be a bit
technical5 , but essentially we need to be guaranteed that given two elements, a, b in the
collection, we can determine whether a < b, a = b or a > b. If such a determination
cannot be made, then sorting is impossible.
Again, we can consider variations on this problem. We may want our collection to be
4
Binary search also only works when searching an array with random access to its elements. The
performance of binary search cannot generally be realized with data structures such as linked lists or
unordered sets.
5
We require that A be a total order, a partially ordered binary relation such that all pairs are comparable.
sorted in ascending or descending order.6 We may also want the collection itself to be
permuted (that is, reordered) or we may instead want a copy of the collection to be
created and sorted so that the original is unchanged.
We will examine several standard sorting algorithms (though there are dozens of algorithms, we will only focus on a few of the more common ones). As with searching, we
can analyze a sorting algorithm based on the number of comparisons it makes in the
worst, best, or average cases. We may also look at alternative resources or operations:
how many swaps does the algorithm make? How much extra memory is required? Etc.
Though we will examine, analyze and compare several sorting algorithms, most programming languages provide standard functionality to sort a collection of elements.
It is generally preferable to use the functionality built into whatever language you're
using rather than reimplementing your own. Typically, these functions are well-designed,
well-tested, optimized and more efficient than any custom alternatives.
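As an illustration of such built-in functionality, the C standard library provides the qsort function in stdlib.h. The sketch below shows one way it might be used on an array of integers; the helper names compare_ints and sort_ints are our own and are not taken from the text.

```c
#include <stdlib.h>

/* Comparator for qsort: returns a negative value, zero, or a positive
   value when the first int is less than, equal to, or greater than
   the second. */
int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    if (x < y) {
        return -1;
    } else if (x > y) {
        return 1;
    }
    return 0;
}

/* Sorts arr (n elements) in non-decreasing order, in place, using the
   standard library's qsort function. */
void sort_ints(int *arr, size_t n) {
    qsort(arr, n, sizeof(int), compare_ints);
}
```

Note that qsort permutes the array itself; if the original order must be preserved, a copy would need to be made first.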
Technically, these are referred to as non-decreasing and non-increasing respectively. This is because the
collection could contain duplicate elements and not lead to a strictly increasing or strictly decreasing
ordering.
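1 for i = 1, ..., (n − 1) do
2     a_min ← a_i
3     for j = (i + 1), ..., n do
4         if a_j < a_min then
5             a_min ← a_j
6         end
7     end
8     swap a_min and a_i
9 end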
Algorithm 12.4: Selection Sort
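Selection Sort translates fairly directly into C. The following is a sketch under a few assumptions of our own: it sorts an int array in place, uses 0-based indices, tracks the index of the minimum rather than the element itself, and returns the number of element comparisons so the analysis can be checked experimentally.

```c
#include <stddef.h>

/* Selection Sort on an int array of n elements. Returns the number of
   element comparisons performed. */
size_t selection_sort(int *arr, size_t n) {
    size_t comparisons = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        size_t min = i;               /* index of the smallest element seen so far */
        for (size_t j = i + 1; j < n; j++) {
            comparisons++;            /* the elementary operation being counted */
            if (arr[j] < arr[min]) {
                min = j;
            }
        }
        int tmp = arr[i];             /* swap the minimum into position i */
        arr[i] = arr[min];
        arr[min] = tmp;
    }
    return comparisons;
}
```

On an array of 9 elements this sketch always performs 8 + 7 + ... + 1 = 36 comparisons, regardless of the initial order of the elements.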
Example
We illustrate the execution of Selection Sort in Figure 12.5.
Analysis
We now analyze Selection Sort to determine how complex it is. First, the elementary
operation is the comparison on line 4. We need to determine how many times this line is
executed with respect to the size of the input, n.
First, we make the observation that on the i-th iteration, the first $i - 1$ elements are
sorted and we need not make any comparisons among them. We start by assuming that
$a_i$ is the minimum element and compare it to the remaining $n - i$ elements, requiring
$n - i$ comparisons, to find the minimal element. For example, in the first iteration we
make $n - 1$ comparisons, in the second we make $n - 2$, etc. In the last iteration we make only
1 comparison. Totaling these all up gives us
\[ (n - 1) + (n - 2) + (n - 3) + \cdots + 3 + 2 + 1 \]
Rewritten this is
\[ \sum_{i=1}^{n-1} i = 1 + 2 + 3 + \cdots + (n - 2) + (n - 1) \]
(a) First iteration. We find the minimal element, 0, at the last index,
swapping it with the first element. At this point, the first element is sorted.
(b) Second iteration. Now starting with the second element, the minimal
element among the remaining is found at the second to last element. 4 and
2 are swapped. At this point, the first two elements are sorted.
(c) Third iteration. Since we are using the strictly-less than comparison, the
first 4 is the minimal element and swapped with 9.
(d) Fourth iteration. At this point, the first 3 elements are sorted. We find
the minimal element (the other 4) and swap it with 9. At the end of this
iteration, the first 4 elements are sorted.
(e) Fifth iteration. The 9 is swapped with 102, sorting the first 5 elements.
(g) Seventh iteration. 34 ends up being the minimal element and we essentially swap it with itself. Even though the current element was also the
minimal element, we still had to compare the current element with all other
elements; in this case we made 2 comparisons.
(h) Eighth iteration. This is the final iteration, we swap 42 and 102. After
this iteration, the final element, 102 is already where it needs to be.
\[ \sum_{i=1}^{n} i = 1 + 2 + 3 + \cdots + (n - 1) + n = \frac{n(n + 1)}{2} \]
In Selection Sort, the number of comparisons doesn't sum up to n, only $n - 1$. Substituting
this in gives us
\[ \sum_{i=1}^{n-1} i = \frac{n(n - 1)}{2} \]
Another way to analyze the code is to count the number of comparisons with respect
to the for-loop index variables. In particular, there is one comparison made on line 4.
Line 4 itself is executed once for each time the inner for-loop on line 3 executes which
executes for j running from $i + 1$ up to n. Line 3 and the entire inner for loop executes
once for each time the outer for loop executes, that is for i running from 1 up to $n - 1$.
This gives us the following summation.
\[ \underbrace{\sum_{i=1}^{n-1} \underbrace{\sum_{j=i+1}^{n} \underbrace{1}_{\text{line 4}}}_{\text{line 3}}}_{\text{line 1}} \]
\begin{align*}
\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 1 &= \sum_{i=1}^{n-1} (n - i) \\
 &= \sum_{i=1}^{n-1} n - \sum_{i=1}^{n-1} i \\
 &= n(n - 1) - \frac{n(n - 1)}{2} \\
 &= \frac{n(n - 1)}{2}
\end{align*}
That is, Selection Sort is a quadratic sorting algorithm, requiring roughly $n^2$ comparisons
to sort an array of n elements. We analyze this further below.
Example
(b) Second iteration. The first two elements are sorted, we insert 9 by making
two comparisons: to find that it is less than 42, but greater than 4. At the
end of the iteration, the first three elements are sorted.
(c) Third iteration. The first three elements are sorted, we insert the second
four by making 3 comparisons.
(d) Fourth iteration. Here, only one comparison is necessary to find that 102
is already where it needs to be.
Figure 12.6.: Example execution of Insertion Sort. Each iteration depicts the comparisons
to previous elements; the last comparison is dashed indicating a comparison
was made, but not a swap. The final iteration is omitted for space, but
would require 8 comparisons to insert 0 at the front of the collection.
Analysis
As we can see in the example run of Insertion Sort, not every iteration needs to make
comparisons with all elements in the sorted part of the collection. In iteration 4 we only
had to make one comparison and we were done. In other iterations such as the last two,
we had to make comparisons to every element in the sorted part of the collection. This
suggests that Insertion Sort is adaptive and may have a different complexity depending
on the structure of the collection it's sorting.
Let's consider the best case in which the number of comparisons is minimized. Suppose,
for example, we ran Insertion Sort on a collection that was already sorted. Each iteration
would only need one comparison to determine that the element was already where it
needed to be. As there are only $n - 1$ iterations, in the best case, Insertion Sort makes
$n - 1$ comparisons.
In contrast, the worst case would occur if the list was already sorted, but in reverse order.
The i-th iteration would require i comparisons to move the current element all the way
to the front of the collection. Again, this gives us a summation:
\[ \sum_{i=1}^{n-1} i = \frac{n(n - 1)}{2} \]
Moving the constant outside the summation and applying Gauss's Formula would give
us an expected number of comparisons to be
\[ \frac{n(n - 1)}{4} \]
This is still a quadratic function, but the constant involved and the fact that Insertion
Sort is more adaptive to the input makes Insertion Sort a much better algorithm in
practice than Selection Sort. In fact, Insertion Sort is very efficient on small arrays in
practice and is used in many hybrid algorithm implementations (see Section 12.2.5).
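The best and worst cases described above can be checked with a small experiment. The following C sketch is one possible implementation of Insertion Sort, written for illustration rather than taken from the text: it uses 0-based indices, shifts elements instead of swapping them, and returns the number of element comparisons.

```c
#include <stddef.h>

/* Insertion Sort on an int array of n elements. Returns the number of
   element comparisons performed. */
size_t insertion_sort(int *arr, size_t n) {
    size_t comparisons = 0;
    for (size_t i = 1; i < n; i++) {
        int value = arr[i];           /* element to insert into arr[0..i-1] */
        size_t j = i;
        while (j > 0) {
            comparisons++;
            if (arr[j - 1] <= value) {
                break;                /* value is already where it needs to be */
            }
            arr[j] = arr[j - 1];      /* shift the larger element to the right */
            j--;
        }
        arr[j] = value;
    }
    return comparisons;
}
```

On an already sorted array of 8 elements this makes 7 comparisons; on the same elements in reverse order it makes 1 + 2 + ... + 7 = 28.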
partitioning operations on subarrays as part of the recursion.
else
    swap pivot, a_{i−1}
    output (i − 1)
end
(a) First iteration. 42 is chosen as the pivot element. The index variable
i moves over to 102, the first element that is greater than 42 and on the
wrong side of the partition. The j index variable does not move as 0 is less
than the pivot element and on the wrong side; these are swapped.
(b) Second iteration. The index variable i moves all the way to the right
as all remaining elements are less than the pivot, 42. The index variable j
remains at 102 as i is now equal to j.
Figure 12.7.: Example execution of the Partition subroutine in Quick Sort. A total
of 8 comparisons are made, 9 if you count the last swap outside the while
loop. The partition returns s as the pivot position; the right-partition is
sorted as it only consists of 1 element.
Figure 12.8.: Example execution of the Partition subroutine in Quick Sort on the first
recursive call on the left partition. A total of 6 comparisons are made, 7 if
you count the last swap outside the while loop. The partition returns s as
the pivot position; in this case the left-partition is sorted as it only consists
of 1 element.
(a) First (and only) iteration. In this partitioning, 9 is the pivot. The index variable i
is incremented to 34 while j decrements to
match. 34 is swapped with itself. After this
iteration, the second condition applies and
the pivot is swapped with $a_{i-1} = 4$.
Figure 12.9.: Example execution of the Partition subroutine in Quick Sort on the
second recursive call on the right partition. A total of 5 comparisons are
made, 6 if you count the last swap outside the while loop. The partition
returns s as the pivot position.
Analysis
We can easily analyze how many comparisons are made by the Partition subroutine.
Suppose that we are given a (sub)array of n elements. Since the pivot element must be
compared to every other element in the (sub)array, it must make $n - 1$ comparisons to
partition the elements around the pivot. If we count the last comparison to determine
where to place the pivot, it would be n (but the difference is trivial).
The analysis of Quick Sort itself is a bit more involved and is highly dependent on how
well the Partition subroutine splits the array. In the worst case, our pivot choice
will always partition the array into one empty subarray and one subarray with $n - 1$
elements (we do not count the pivot element). One such example of this would be if our
collection is already sorted. In this case, one of the recursive calls would result in no
comparisons and the other would result in $n - 1$ comparisons. This lopsided recursion
would result in the following number of comparisons:
\[ n + (n - 1) + (n - 2) + \cdots + 3 + 2 + 1 \]
which is exactly Gauss's Formula, meaning that Quick Sort would perform
\[ \frac{n(n + 1)}{2} \]
comparisons in the worst case.
However, in the sorting scenario, we cannot always assume the worst case. If we are given
random data, then the likelihood that we will always have such an extremely lopsided
partitioning is extremely small. A more reasonable analysis would involve the average
case: where each partition is roughly an equal size, about $n/2$. This happens when our
pivot choice is the median (or close to the median) element.
For example, there does exist a median-finding algorithm that runs in linear time which would
guarantee the theoretically best running time, but the algorithm is recursive and has a large overhead,
making its use slow in practice.
8
Still, Quick Sort is more common in practice because it has lower memory requirements and the pivot
choice strategies can mitigate the risk of the worst-case scenario.
Merge Sort works by first dividing the list into two (roughly) equal partitions. It then
recursively sorts each partition. The recursion stops when the subarray is of size 1 just
as with Quick Sort. The difference, however, is what Merge Sort does after the recursion.
After having sorted the left partition, L and the right partition R, Merge Sort merges
the two sorted partitions into one.
The Merge subroutine works by maintaining two index variables, i, j, one for each
partition. Suppose that the index variables correspond to the elements $L_i$ and $R_j$ in
the left/right partition. If $L_i \le R_j$, we add $L_i$ to the end of a temporary array and
increment i. Otherwise we place $R_j$ into the temporary array and increment j. We
continue until we have examined every element in one or both partitions (if one partition
still has elements in it, we can simply copy the rest over in order). This merge operation
works because each subarray is sorted.
Merge Sort is presented as Algorithm 12.8 with the Merge subroutine as Algorithm
12.9.
Input: Two sorted collections, L of size n and R of size m
Output: A sorted collection A of size n + m containing the elements of L and R
i ← 1
j ← 1
k ← 1
while i ≤ n And j ≤ m do
    if L_i ≤ R_j then
        A_k ← L_i
        i ← (i + 1)
    else
        A_k ← R_j
        j ← (j + 1)
    end
    k ← (k + 1)
end
//At least one collection is empty, we can blindly copy the other
while i ≤ n do
    A_k ← L_i
    i ← (i + 1)
    k ← (k + 1)
end
while j ≤ m do
    A_k ← R_j
    j ← (j + 1)
    k ← (k + 1)
end
output A
Algorithm 12.9: Merge
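The Merge subroutine might be sketched in C as follows. This is an illustrative version only: it assumes int elements, 0-based indices, and a caller-provided output array (here called out) with room for n + m elements; these names and choices are ours.

```c
#include <stddef.h>

/* Merges the sorted arrays left (n elements) and right (m elements)
   into out, which must have room for n + m elements. */
void merge(const int *left, size_t n, const int *right, size_t m, int *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < n && j < m) {
        if (left[i] <= right[j]) {    /* <= takes equal elements from the left first */
            out[k++] = left[i++];
        } else {
            out[k++] = right[j++];
        }
    }
    /* At least one input is exhausted; copy whatever remains of the other */
    while (i < n) {
        out[k++] = left[i++];
    }
    while (j < m) {
        out[k++] = right[j++];
    }
}
```

Taking from the left partition on ties is what keeps equal elements in their original relative order.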
Example
We present an example run of Merge Sort in Figure 12.10 (here, we have made the
collection of size 8 to emphasize the even split). An example run of the Merge
subroutine on the last merge operation of this algorithm is presented in Figure 12.11.
Figure 12.11.: Demonstration of the merge operation in Merge Sort. Here we depict the
final Merge subroutine invocation from the previous example.
Analysis
Because Merge Sort divides the list first, an even split is guaranteed. After the recursion,
the Merge subroutine requires at most $n - 1$ comparisons to merge the two collections.
This leads to a recurrence relation similar to Quick Sort,
\[ C(n) = 2C\left(\frac{n}{2}\right) + (n - 1) \]
A similar analysis yields a complexity of $n \log(n)$.
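One way to see where the $n \log(n)$ bound comes from is to unroll the recurrence. The derivation below is a sketch under two simplifying assumptions of our own: that n is a power of two, $n = 2^k$, and that $C(1) = 0$ since a single element requires no comparisons.

```latex
\begin{align*}
C(n) &= 2C\left(\tfrac{n}{2}\right) + (n - 1) \\
     &= 4C\left(\tfrac{n}{4}\right) + (n - 2) + (n - 1) \\
     &= 8C\left(\tfrac{n}{8}\right) + (n - 4) + (n - 2) + (n - 1) \\
     &\;\;\vdots \\
     &= 2^k C(1) + \sum_{i=0}^{k-1} \left(n - 2^i\right) \\
     &= kn - (2^k - 1) \\
     &= n \log_2(n) - n + 1
\end{align*}
```

As a quick check, $C(2) = 1$ and $C(4) = 2 \cdot 1 + 3 = 5$, which agree with $2 \cdot 1 - 2 + 1 = 1$ and $4 \cdot 2 - 4 + 1 = 5$.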
Algorithm        Best      Average   Worst     Stable?
Selection Sort   n^2       n^2       n^2       No
Insertion Sort   n         n^2       n^2       Yes
Quick Sort       n log n   n log n   n^2       No
Merge Sort       n log n   n log n   n log n   Yes
Table 12.2.: Summary of Sorting Algorithms. See Section 12.3.6 for a discussion on
stability.
Some implementations of Quick Sort will do something similar when choosing the middle element
as a pivot.
if a < b then
    output -1
else if a = b then
    output 0
else
    output +1
end
For example, if a = 5, b = 10 then our comparator would output -1. If the values were
a = 10, b = 5 it would output +1, and if they were equal, a = b = 10 then it would
output zero. A common trick is to instead compute the difference between these values,
$a - b$. Observe:
a     b     a − b
5     10    -5
10    5     5
10    10    0
The sign of the difference in each of these examples matches the logic of our if-else-if
statement. The lazy programmer may be tempted to write a one liner, output $(a - b)$
instead of the if-else-if statement. However, this would fail for certain values of a and b.
Suppose both of them are 32-bit 2's complement integers. And suppose that $a = 2^{31} - 1$,
the maximum representable value, and $b = -10$. In the logic above, b would come before
a and so we would expect a positive result. However, our arithmetic trick would give the
following result:
\[ (2^{31} - 1) - (-10) = 2^{31} + 9 \]
which, mathematically, is positive, but exceeds the maximum representable value, leading
to overflow. In most systems, the result of this arithmetic is -2,147,483,639, a negative
value. There are many other input values that could cause arithmetic overflow. Only if
you are absolutely sure that no arithmetic overflow is possible should you even consider
using this trick.
Another issue with this trick is when comparing floating point values. GPAs for example:
suppose that a = 4.0 and b = 3.9. Their difference would be $a - b = 0.1$. However, a
comparator returns an integer value. In some languages this result would be cast to
an integer, truncating the fractional value, so that $0.1 \rightarrow 0$, meaning that a GPA of 3.9
would be treated as equivalent to a 4.0.
For these reasons, it is best to avoid this trick altogether.
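A comparator written with explicit comparisons avoids both problems. The following C sketch illustrates the idea for int and double values; the function names are ours and chosen only for this example.

```c
/* Three-way comparison of two ints using only relational operators,
   so no subtraction is performed and no overflow is possible. */
int compare_int_safe(int a, int b) {
    if (a < b) {
        return -1;
    } else if (a == b) {
        return 0;
    } else {
        return 1;
    }
}

/* Three-way comparison of two doubles; the result is an int chosen
   directly rather than a truncated difference a - b. */
int compare_double_safe(double a, double b) {
    if (a < b) {
        return -1;
    } else if (a > b) {
        return 1;
    }
    return 0;
}
```

With these, comparing the maximum 32-bit value against -10 gives a positive result as expected, and a GPA of 4.0 is ordered after 3.9 rather than being treated as equal.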
However, this logic is complex and does not provide a good solution if we want to then
add support for Pre-freshman or Graduate, etc.
Another solution would be to model the year using a data type that has a natural
ordering. For example, we could define an enumerated type and associate each of the
years with 0, 1, 2, 3; giving them a natural ordering.
Alternatively, we could use a data structure, such as a map, to model the artificial
ordering. For example, we could map Freshman to 0, Sophomore to 1, etc. Then,
when we wanted to order two elements, we could look up the natural value via the
map and use their natural ordering to order our elements.
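The enumerated type approach could look something like the following in C. This is a sketch only: the type name Year, the constant names, and the inclusion of Junior and Senior as the remaining two years are assumptions made for illustration.

```c
/* Enumerated type whose underlying integer values (0, 1, 2, 3) give
   the years a natural ordering. */
typedef enum {
    FRESHMAN,    /* 0 */
    SOPHOMORE,   /* 1 */
    JUNIOR,      /* 2 */
    SENIOR       /* 3 */
} Year;

/* Comparator that orders two years by their underlying integer values. */
int compare_year(Year a, Year b) {
    if (a < b) {
        return -1;
    } else if (a > b) {
        return 1;
    }
    return 0;
}
```

Supporting an additional year then only requires adding a constant at the appropriate position in the enumeration; the comparator itself does not change.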
12.4. Exercises
Exercise 12.1. Give an example input demonstrating that the Quick Sort algorithm, as presented, is unstable. Run through the algorithm to demonstrate how it
results in an unstable sort.
Artificial Ordering:
newaxdtjlt
nfrfrwkknj
laencfuesw
gkkmgwwwpa
gvtkwekfom
fzqrvgblov
wmcvmjmtet
vcawufotrb
vrsfqictqc
qetegyqelu
For simplicity, you can assume that all words will be lower case and no non-alphabetic
characters are used. However, you may not assume that all words will be the same length.
Words of a shorter length that are a prefix of another word should be ordered first. For
example, newax should come before newaxn in the ordering above.
Exercise 12.5. Write comparators for all the member fields of the album object in
Exercise 10.2.
Exercise 12.6. Write comparators for all the member fields of the savings account
object in Exercise 10.3.
Exercise 12.7. Write comparators for all the member fields of the stadium object in
Exercise 10.4.
Exercise 12.8. Write comparators for all the member fields of the network device object
in Exercise 10.5.
Exercise 12.9. Write comparators for all the member fields of the airport object in
Exercise 10.6.
Part I.
The C Programming Language
15. Basics
The C programming language is a relatively old language, but still widely used to this
day. It is nearly universal in that nearly every system, platform, and operating system
has a C compiler that produces machine code for that system. C is used extensively in
systems programming for operating system kernels, embedded systems, microcontrollers,
and supercomputers. It is generally an imperative language which is a paradigm that
characterizes computation in terms of executable statements and functions that change a
program's state (variable values). C has been highly influential in the design of other
languages including C++, Objective-C, C#, Java, PHP, Python, among many others.
Many languages have adopted the basic syntactic elements and structured programming
approach of the C language.
C was originally developed by Dennis Ritchie while at AT&T Bell Labs during 1969–1972. C
was born out of the need for a new language for the PDP-11 minicomputer that used
the Unix operating system (written by Ken Thompson). From its inception, C has had
a close relation to Unix; in fact the operating system was subsequently rewritten in C,
making it one of the first operating systems to be written in a language other than assembly. The language
was dubbed C as its predecessor was named B, a simplified version of BCPL (Basic
Combined Programming Language). The first formal specification was published as The
C Programming Language by Kernighan and Ritchie (1978) [24] often referred to as
The K&R Book which would later become the American National Standards Institute
(ANSI) C standard.
C gained in popularity and directly influenced object-oriented variations of it. Bjarne
Stroustrup developed C++ while at Bell Labs during 1979–1983. Brad Cox and Tom
Love developed Objective-C during 1981–1983 at their company Stepstone. Subsequent
standards of the C language have added and extended features. In 1990, the International
Organization for Standardization (ISO)/IEC 9899:1990 standard, referred to as C89
or C90, was adopted. About every 10 years since, a new standard has been adopted;
ISO/IEC 9899:1999 (referred to as C99) in 1999 and ISO/IEC 9899:2011 (C11) in 2011.
to be on the basic syntax of the language. It is also typically used to ensure that your
development environment, compiler, runtime environment, etc. are functioning properly
with a minimal example. The Hello World! program is generally attributed to Brian
Kernighan who used it as an example of programming in C in 1974 [23]. A basic Hello
World! program in C can be found in Code Sample 15.1.
#include<stdlib.h>
#include<stdio.h>

/**
 * Basic Hello World program in C
 * Prints "Hello World" to the standard output and exits
 */
int main(int argc, char **argv) {

  printf("Hello World\n");

  return 0;

}
Code Sample 15.1: Hello World Program in C
We will not focus on any particular development environment, code editor, or any
particular operating system, compiler, or ancillary standards in our presentation. However,
as a first step, you should be able to write, compile, and run the above program on
the environment you intend to use for the rest of this book. This may require that you
download and install a basic C compiler/development environment (such as GCC, the
GNU Compiler Collection on OSX/Unix/Linux, cygwin or MinGW for Windows) or a
full IDE (such as Xcode for OSX, or Code::Blocks, http://www.codeblocks.org/ for
Windows).
#include<stdlib.h>
#include<stdio.h>
provides in your program. Using the functions without including the library may result in
a compiler error. The .h in the library names stands for header; function declarations
are typically contained in a header file while their definitions are placed in a source file
of the same name. Well explore this convention in detail when we look at functions in C
(Chapter 18).
There are many other important standard libraries that well touch on as needed, but another one that may be of immediate interest is the standard mathematics library, math.h .
It includes many useful functions to compute common mathematical functions such as
the square root and natural logarithm. Table 15.1 highlights several of these functions.
To use them you'd include the math library in your source file #include<math.h> and
then call them by providing input and getting the output. For example:
double x = 1.5;
double y, z;
In both of the function calls above, the value of the variable x is passed to the math
function which computes and returns the result which then gets assigned to another
variable.
Macros
Another preprocessor directive establishes macros using the #define keyword. A macro
is a single instruction that specifies a more complex set of instructions. The macro can
be used to define constants to be used throughout your program. To illustrate, consider
the following example.
#define MILES_PER_KM 1.609
The macro defines an alias for the MILES_PER_KM identifier as the value 1.609. Essentially, the C preprocessor will go through the code and any instance of MILES_PER_KM
will be replaced with 1.609. The advantage of using a macro like this is that we can use
the identifier MILES_PER_KM throughout our program instead of mysterious numbers
whose meaning and intent may not be immediately clear. Moreover, if we want to change
the definition (say make it more precise, 1.60934) then we only need to change the macro
instead of making the same change throughout our program.
As a stylistic note: macro constants in C are usually associated with uppercase underscore
casing as in our example. Also, the math standard library
defines several macros for
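Function     Description
abs(x)       Absolute value for int variables, |x| (a)
fabs(x)      Absolute value for double variables
ceil(x)      Ceiling function
floor(x)     Floor function
cos(x)       Cosine function (b)
sin(x)       Sine function (b)
tan(x)       Tangent function (b)
exp(x)       Exponential function, e^x
log(x)       Natural logarithm (c)
log10(x)     Logarithm base 10 (c)
pow(x,y)     Power function, x^y
sqrt(x)      Square root function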
Table 15.1.: Several functions defined in the C standard math library. a The absolute value
function is actually in the standard library, stdlib.h . b all trigonometric
functions assume input is in radians, not degrees. c Input is assumed to be
positive, x > 0.
15.2.3. Comments
Comments can be written in a C program either as a single line using two forward
slashes, //comment or as a multiline comment using a combination of forward slash
and asterisk: /* comment */ . With a single line comment, everything on the line after
the forward slashes is ignored. With a multiline comment, everything in between the
forward slash/asterisk is ignored. Comments are ultimately ignored by the compiler so
the amount of comments do not have an effect on the final executable code. Consider
the following example.
/*
This is a comment that can
span multiple lines to format the comment
message more clearly
*/
double y;
Most code editors and IDEs will present comments in a special color or font to distinguish
them from the rest of the code (just as our example above does). Failure to close a
multiline comment will likely result in a compiler error but with color-coded comments
its easy to see the mistake visually.
arguments as strings. Well be able to understand the syntax later on, but for now we
can at least understand how we might convert these arguments to different types such as
integers and floating-point numbers.
First, recall that argv is the argument vector : it is an array (see Chapter 20) of the
command line arguments. To access them, you can index them starting at zero, the
first being argv[0] , the second argv[1] , etc. (the last one of course would be at
argv[argc-1] ). The first one is always the name of the executable file being run. The
remaining are the command line arguments provided by the user.
To convert them you can use two different functions, atoi and atof which are short
for alphanumeric to integer and floating-point number respectively. An example:
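The following sketch shows how atoi and atof might be applied to command line arguments. The helper function parse_args and its parameter names are hypothetical and introduced only for this illustration; it assumes the user provides an integer followed by a floating-point number.

```c
#include <stdlib.h>

/* Expects two command line arguments after the program name: an integer
   and a floating-point number. Returns 0 on success, or 1 if too few
   arguments were provided. */
int parse_args(int argc, char **argv, int *numUnits, double *costPerUnit) {
    if (argc < 3) {
        return 1;
    }
    *numUnits = atoi(argv[1]);      /* e.g. the string "42" becomes 42 */
    *costPerUnit = atof(argv[2]);   /* e.g. the string "32.5" becomes 32.5 */
    return 0;
}
```

A main function could call parse_args(argc, argv, &numUnits, &costPerUnit) and exit with an error message if it returns a nonzero value. Note that atoi and atof do not report errors themselves; a string that does not begin with a number simply converts to zero.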
15.3. Variables
As previewed, the three main primitive types supported in C are int , double , and
char which support integers, floating-point numbers, and single ASCII characters.
Integer ( int ) types are only guaranteed to be at least 16 bits by the C standard
but are usually 32-bit signed integers on most modern systems1 . With a 32-bit signed
int we can represent integers between -2,147,483,648 and 2,147,483,647.
Doubles ( double ) types are usually double-precision floating-point numbers as per the
IEEE 754 standard and provide about 16 digits of precision.
C does provide a float (single precision floating-point number) type and there
are various modifiers such as short , long , unsigned and signed that can be used,
but these are either system-dependent or rely on later versions of the C standard (such
as C99). We will restrict our focus to more portable, interoperable code and stick with
the basic two types in most of our code.
Finally, the char type is typically a single byte that represents a single ASCII character.
For all intents and purposes a char can be treated as an integer in the range 0 to 127
(or 255) as defined by the ASCII text table (see Table 2.4).
1
You may have to deal with 16-bit int types in legacy systems/compilers or in modern embedded
systems.
int numUnits;
double costPerUnit;
char firstInitial;
Each declaration specifies the variables type followed by the identifier and ending with
a semicolon. The identifier rules are fairly standard: a name can consist of lower and
uppercase alphabetic characters, numbers, and underscores but may not begin with a
numeric character. We adopt the modern camelCasing naming convention for variables
in our code.
The assignment operator is a single equal sign, = and is a right-to-left assignment. That
is, the variable that we wish to assign the value to appears on the left-hand-side while
the value (literal, variable or expression) is on the right-hand-side. Using our variables
from before, we can assign them values:
numUnits = 42;
costPerUnit = 32.79;
firstInitial = 'C';
An important thing to understand and to keep in mind is: if you declare a variable but do
not assign it a value, its value is undefined. That is, if we code something like int a; ,
the value of the variable a is not necessarily zero; depending on the system, it could
contain a special value that indicates uninitialized memory, it could contain garbage,
or it could have the value zero. The C standard does not specify default values for
local variables such as this one. The default value is highly system dependent: it depends
on the compiler, the libraries, and even the operating system. Do not make any assumptions
about the initial or default values of variables. If you need such assumptions, then values
must be assigned.
For brevity, C allows you to declare a variable and immediately assign it a value on the
same line. So these two code blocks could have been more compactly written as:
int numUnits = 42;
double costPerUnit = 32.79;
char firstInitial = 'C';
As another shorthand, we can declare multiple variables on the same line by delimiting
them with a comma. However, they must be of the same type. We can also use an
assignment with them.
Another convenient keyword is const, short for constant. We can apply it to any
variable to indicate that it is a read-only variable. Of course, we must assign it a value
at declaration. Any attempt to reassign the value of a const variable will result in a
compiler error.
15.4. Operators
C supports the standard arithmetic operators for addition, subtraction, multiplication,
and division using +, -, *, and / respectively. Each of these operators is a binary
operator that acts on two operands which can either be literals or other variables and
follow the usual rules of arithmetic when it comes to order of precedence (multiplication
and division before addition and subtraction).
int a = 10, b = 20, d;
double x, y = 3.4;

d = a + b;
d = a * b;
d = a / b; //See below
d = b + y; //See below
Special care must be taken when dealing with int types. For all four operators, if
both operands are integers, the result will be an integer. For addition, subtraction, and
multiplication this isn't a big deal, but for division it means that when we divide, say
10 / 20, the result is not 0.5 as expected. The number 0.5 is a floating-point number.
As such, the fractional part gets truncated (cut off and thrown out) leaving
only zero. In the code above, d = a / b; the variable d ends up getting the value
zero because of this.
Similarly, attempting to assign a floating-point number to an integer also results in
truncation because an int type cannot handle the fractional part. In the line d = b + y
above, b + y is correctly 20 + 3.4 = 23.4, but when assigned to the int variable d
the .4 gets truncated and d has the value 23.
Assigning an int value to a double variable is not a problem as the integer 2 becomes
the floating-point number 2.0.
A solution to this problem is to use explicit type casting to force at least one of the
operands in an integer division to become a double type. For example:
int a = 10, b = 20;
double x;

x = (double) a / b;
results in x getting the correct value of 0.5. This works because the (double) code
forces the int variable a to temporarily be treated as a double variable (in this case
10.0) for the purposes of division (so that truncation does not occur).
C also supports the integer remainder operator using the % symbol. This operator gives
the remainder of the result of dividing two integers. Examples:
int x;

x = 10 % 5; //x is 0
x = 10 % 3; //x is 1
x = 29 % 5; //x is 4
int a;
printf("Please enter a number: ");
scanf("%d", &a);
The printf statement prompts the user for an input. The scanf then executes and
the program waits for the user to enter input. The user is free to start typing. When
the user is done, they hit the enter key at which point the program resumes and reads
the input from the standard input buffer, converts the value entered by the user into
an integer and places the result in the variable a where it can now be used by the
remainder of the program.
A few points of interest. First, the same placeholder as printf was used, %d for int
values. However, when we passed the variable a to scanf we placed an ampersand,
& in front of it. This is passing the variable by reference and we'll explore that concept
further in Chapter 18, but for now just know that when using variables with scanf, an
ampersand is required. Failure to place an ampersand in front of a variable with scanf
will likely result in a segmentation fault (an illegal memory access).
You can use the same placeholder, %c with scanf to read in single characters as well.
However, for floating-point numbers, in particular double types, the placeholder %lf
must be used (which stands for long float, a double precision number). Failure to use
the correct placeholder may result in garbage results as the input will be interpreted
incorrectly. Another example:
double x;
printf("Please enter a fractional number: ");
scanf("%lf", &x);
Another potential problem is that scanf expects a certain format (thus its name).
If we prompt the user for a number but they just start mashing the keyboard giving
non-numerical input, we may get incorrect results. When the input does not match the
placeholder, scanf does not assign anything, so the variable keeps whatever value it had
before (zero if we initialized it to zero, garbage otherwise). It may be very difficult to
distinguish between the case of a user actually entering in zero as a legitimate input
versus bad input. In general, scanf is not a good mechanism for reading input (and in
fact can be very dangerous), but it is a good starting point.
15.6. Examples
15.6.1. Converting Units
Let's start with a simple task: let's write a program that will prompt the user to enter a
temperature in degrees Fahrenheit and convert it to degrees Celsius using the formula
$$C = (F - 32) \cdot \frac{5}{9}$$
We begin with the basic program outline which will include preprocessor directives to
bring in the standard library and the standard input/output library (after all, we'll need
to prompt for input and print the result as output to the user). Further, we want our
program to be executable, so we need to put our code into the main function. Finally,
we'll document our program to indicate its purpose.
#include<stdlib.h>
#include<stdio.h>

/**
 * This program converts Fahrenheit temperatures to
 * Celsius
 */
int main(int argc, char **argv) {

  //TODO: implement this

  return 0;

}
It is common for programmers to use a comment along with a TODO note to themselves
as a reminder of things that they still need to do with the program.
Let's first outline the basic steps that our program will go through:
1. We'll first prompt the user for input, asking them for a temperature in Fahrenheit
2. Next we'll read the user's input, likely into a floating-point number as degrees can
be fractional
3. Once we have the input, we can calculate the degrees Celsius by using the formula
above
4. Lastly, we will want to print the result to the user to inform them of the value
Sometimes it's helpful to write an outline of such a program directly in the code using
comments to provide a step-by-step process. For example:
#include<stdlib.h>
#include<stdio.h>

/**
 * This program converts Fahrenheit temperatures to
 * Celsius
 */
int main(int argc, char **argv) {

  //1. Prompt the user for input in Fahrenheit
  //2. Read the Fahrenheit value from the standard input
  //3. Compute the degrees Celsius
  //4. Print the result to the user

  return 0;

}
As we read each step it becomes apparent that we'll need a couple of variables: one to
hold the Fahrenheit (input) value and one for the Celsius (output) value. It also makes
sense that each of these should be double variables as we want to support fractional
values. So at the top of our main function, we'll add the variable declarations:
double fahrenheit, celsius;
Each of the steps is now straightforward; we'll use a printf statement in the first step
to prompt the user for input:
printf("Please enter degrees in Fahrenheit: ");
In the second step, we'll use the standard input to read the fahrenheit variable value
from the user. Recall that we use the placeholder %lf for reading double values and
use an ampersand when using scanf:
scanf("%lf", &fahrenheit);
We can now compute celsius using the formula provided:
celsius = (fahrenheit - 32) * (5 / 9);
Finally, we use printf again to output the result to the user:
printf("%f Fahrenheit is %f Celsius\n", fahrenheit, celsius);
Try typing and running the program as defined above and you'll find that you don't
get correct answers. In fact, you'll find that no matter what values you enter, you get
zero. This is because of the calculation using 5 / 9: recall what happens with integer
division: truncation! This will always end up being zero.
One way we could fix it would be to pull out our calculators and find that 5/9 = 0.55555...
and replace 5/9 with 0.555555. But, how many fives? It may be difficult to tell how
accurate we can make this floating-point number by hardcoding it ourselves. A much
better approach would be to let the compiler take care of the optimal computation for
us by making at least one of the numbers a double to prevent integer truncation. That
is, we should instead use 5.0 / 9.
The full program can be found in Code Sample 15.2.
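#include<stdlib.h>
#include<stdio.h>

/**
 * This program converts Fahrenheit temperatures to
 * Celsius
 */
int main(int argc, char **argv) {

  double fahrenheit, celsius;

  //1. Prompt the user for input in Fahrenheit
  printf("Please enter degrees in Fahrenheit: ");

  //2. Read the Fahrenheit value from the standard input
  scanf("%lf", &fahrenheit);

  //3. Compute the degrees Celsius
  celsius = (fahrenheit - 32) * (5.0 / 9);

  //4. Print the result to the user
  printf("%f Fahrenheit is %f Celsius\n", fahrenheit, celsius);

  return 0;

}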
Code Sample 15.2: Fahrenheit-to-Celsius Conversion Program in C
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
As before, we can create a basic program with a main function and start filling in
the details. In particular, we'll need to prompt for the input a, then read it in; then
prompt for b, read it in and repeat for c. We'll also need several variables: three for the
coefficients a, b, c and two more; one for each root. Thus, we have
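double a, b, c, root1, root2;

printf("Please enter a: ");
scanf("%lf", &a);
printf("Please enter b: ");
scanf("%lf", &b);
printf("Please enter c: ");
scanf("%lf", &c);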
Now to compute the roots: we need to take care that we correctly adapt the formula so
it accurately reflects the order of operations. We also need to use the standard math
library's square root function (unless you want to write your own!). Carefully adapting
the formula leads to
root1 = (-b + sqrt(b*b - 4*a*c) ) / (2*a);
root2 = (-b - sqrt(b*b - 4*a*c) ) / (2*a);
Finally, we print the output using printf. The full program can be found in Code
Sample 15.3.
This program was interactive. As an alternative, we could have read all three of the inputs
as command line arguments, taking care that we need to convert them to floating-point
numbers. Lines 12–17 in the program (the three prompts and reads) could have been changed to
a = atof(argv[1]);
b = atof(argv[2]);
c = atof(argv[3]);
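#include<stdlib.h>
#include<stdio.h>
#include<math.h>

/**
 * This program computes the roots to a quadratic equation
 * using the quadratic formula.
 */
int main(int argc, char **argv) {

  double a, b, c, root1, root2;

  printf("Please enter a: ");
  scanf("%lf", &a);
  printf("Please enter b: ");
  scanf("%lf", &b);
  printf("Please enter c: ");
  scanf("%lf", &c);

  root1 = (-b + sqrt(b*b - 4*a*c) ) / (2*a);
  root2 = (-b - sqrt(b*b - 4*a*c) ) / (2*a);

  printf("The roots are %f and %f\n", root1, root2);

  return 0;

}
Code Sample 15.3: Quadratic Roots Program in C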
16. Conditionals
C supports the basic if, if-else, and if-else-if conditional structures as well as switch
statements. Logical statements are built using the standard logical operators for numeric
comparisons as well as logical operators such as negations, And, and Or. However, there
are a few idiosyncrasies that need to be understood.
int a = 10;
int b = 20;
int c = 10;
int d = 0;
The six standard comparison operators are presented in Table 16.1 using these variables
as examples. The comparison operators work the same way with double types, and int
types and double types can be compared with each other without type casting.
The three basic logical operators are also supported as described in Table 16.2 using the
same code snippet variable values as examples.
Name                        Operator Syntax   Examples   Value
Equals                      ==                a == 10    true
                                              b == 10    false
                                              a == b     false
                                              a == c     true
Not Equals                  !=                a != 10    false
                                              b != 10    true
                                              a != b     true
                                              a != c     false
Less Than                   <                 a < 15     true
                                              a < 5      false
                                              a < b      true
                                              a < c      false
Less Than Or Equal To       <=                a <= 15    true
                                              a <= 5     false
                                              a <= b     true
                                              a <= c     true
Greater Than                >                 a > 15     false
                                              a > 5      true
                                              a > b      false
                                              a > c      false
Greater Than Or Equal To    >=                a >= 15    false
                                              a >= 5     true
                                              a >= b     false
                                              a >= c     true
Table 16.1.: Comparison Operators in C

Operator    Operator Syntax   Examples   Values
Negation    !                 !a         false
                              !d         true
And         &&                a && b     true
                              a && d     false
Or          ||                a || b     true
                              a || d     true
Table 16.2.: Logical Operators in C
Order     Operator(s)           Associativity   Notes
Highest   ++, --                left-to-right   increment operators
          -, !                  right-to-left   unary negation operator, logical not
          *, /, %               left-to-right   multiplication, division, remainder
          +, -                  left-to-right   addition, subtraction
          <, <=, >, >=          left-to-right   comparison
          ==, !=                left-to-right   equality, inequality
          &&                    left-to-right   logical And
          ||                    left-to-right   logical Or
Lowest    =, +=, -=, *=, /=     right-to-left   assignment and compound assignment operators
Table 16.3.: Operator Order of Precedence in C. Operators on the same level have
equivalent order and are performed in the associative order specified.
Since several of these operators could be used in one statement, for example,
(b*b < 4*a*c || a == 0 || argc != 4)
it is important to understand the order in which each one gets evaluated. Table 16.3
summarizes the order of precedence for the operators seen so far. This is not an exhaustive
list of C operators.
Numeric comparison operators cannot be used to compare strings in C. For example,
we could write ("aardvark" < "zebra") which would be valid C, and it would even
have a result. However, that result wouldn't necessarily reflect the alphabetic ordering
of the two strings. The reason for this is that strings in C are actually represented as
arrays, which in turn are represented as memory locations. We'll explore these issues in
greater depth later on, but for now understand that you can write this code, it will
compile, and it will even run. However, the results will not be as expected.
//example of an if statement:
if(x < 10) {
  printf("x is less than 10\n");
}
Some observations about the syntax: the statement, if(x < 10) does not have a
semicolon at the end. This is because it is a conditional statement that determines
the flow of control and not an executable statement. Therefore, no semicolon is used.
Suppose we made a mistake and did include a semicolon:
int x = 15;
if(x < 10); {
  printf("x is less than 10\n");
}
Some compilers may give a warning, but this is valid C; it will compile and it will run.
However, it will end up printing x is less than 10, even though x = 15! Recall that
a conditional statement binds to the executable statement or code block immediately
following it. In this case, we've provided an empty executable statement ended by the
semicolon. The code is essentially equivalent to
int x = 15;
if(x < 10) {
}
printf("x is less than 10\n");
Which is obviously not what we wanted. The conditional statement ended up binding to the
empty executable statement, and the code block containing the print statement immediately
followed, but was not bound to the conditional statement which is why the print statement
executed regardless of the value of x.
Another convention that we've used in our code is where we have placed the curly brackets.
First, if a conditional statement is bound to only one statement, the curly brackets are
not necessary. However, it is best practice to include them even if they are not necessary
and we'll follow this convention. Second, the opening curly bracket is on the same line as
the conditional statement while the closing curly bracket is indented to the same level
as the start of the conditional statement. Moreover, the code inside the code block is
indented. If there were more statements in the block, they would have all been at the
same indentation level.
16.3. Examples
16.3.1. Computing a Logarithm
The logarithm of x is the exponent to which some base must be raised to get x. The most
common logarithm is the natural logarithm, ln(x), which is base e = 2.71828.... But
logarithms can be in any base b > 1.¹ What if we wanted to compute log_2(x)? Or
log_π(x)? Let's write a program that will prompt the user for a number x and a base b
and computes log_b(x).
¹ Bases can also be 0 < b < 1, but we'll restrict our attention to increasing functions only.
Arbitrary bases can be computed using the change of base formula:
$$\log_b(x) = \frac{\log_a(x)}{\log_a(b)}$$
If we can compute logarithms in some base a, then we can compute them in any base b.
Fortunately we have such a solution. Recall that the standard library provides a function
to compute the natural logarithm, log(). This is one of the fundamentals of problem
solving: if a solution already exists, use it. In this case, a solution exists for a
different, but similar problem (computing the natural logarithm), but we can adapt the
solution using the change of base formula. In particular, if we have variables b (base)
and x, we can compute log_b(x) using
log(x) / log(b)
But wait: we have a problem similar to the examples in the previous section. The user
could enter invalid values such as b = -10 or x = -2.54 (logarithms are undefined
for non-positive values in any base). We want to ensure that b > 1 and x > 0. With
conditionals, we can now do this. Once we have read in the input from the user we can
make a check for good input using an if statement.
if(x <= 0 || b <= 1) {
  printf("Error: bad input!\n");
  exit(1);
}
This code has something new: exit(1) . The exit function immediately terminates
the program regardless of the rest of the code that may remain. The argument passed to
exit is an integer that represents an error code. The convention is that zero indicates
no error while non-zero values indicate some error. This is a simple way of performing
error handling: if the user provides bad input, we inform them and quit the program,
forcing them to run it again and provide good input. By prematurely terminating the
program we avoid any illegal operation that would give a bad result.
Alternatively, we could have split the conditions into two statements and given a more
descriptive error message. We use this design in the full program which can be found in
Code Sample 16.2. The program also takes the input as command line arguments. Now
that we have conditionals, we can actually check that the correct number of arguments
was provided by the user and quit in the event that they don't provide the correct
number.
the user. At the same time we can check for bad input (negative values) for both the
inputs.
Next, we can code a series of if-else-if statements for the income range. By placing the
ranges in increasing order, we only need to check the upper bounds just as in the original
example.
Next we compute the child tax credit, taking care that it does not exceed $3,000. A
conditional based on the number of children should suffice as at this point in the program
we already know it is zero or greater.
if(numChildren <= 3) {
  credit = numChildren * 1000;
} else {
  credit = 3000;
}
Finally, we need to ensure that the credit does not exceed the total tax liability (the
credit is non-refundable, so if the credit is greater, the tax should only be zero, not
negative).
if(baseTax >= credit) {
  totalTax = baseTax - credit;
} else {
  totalTax = 0;
}
#include<stdlib.h>
#include<stdio.h>
#include<math.h>

/**
 * This program computes the logarithm base b (b > 1)
 * of a given number x > 0
 */
int main(int argc, char **argv) {

  double b, x, result;
  if(argc != 3) {
    printf("Usage: %s b x \n", argv[0]);
    exit(1);
  }

  b = atof(argv[1]);
  x = atof(argv[2]);

  if(x <= 0) {
    printf("Error: x must be greater than zero\n");
    exit(1);
  }
  if(b <= 1) {
    printf("Error: base must be greater than one\n");
    exit(1);
  }

  result = log(x) / log(b);
  printf("log_(%f)(%f) = %f\n", b, x, result);
  return 0;
}
Code Sample 16.2: Logarithm Calculator Program in C
#include<stdlib.h>
#include<stdio.h>

if(numChildren <= 3) {
  credit = numChildren * 1000;
} else {
  credit = 3000;
}

printf("AGI:           $%10.2f\n", income);
printf("Tax:           $%10.2f\n", baseTax);
printf("Credit:        $%10.2f\n", credit);
printf("Tax Liability: $%10.2f\n", totalTax);

return 0;
#include<stdlib.h>
#include<stdio.h>
#include<math.h>

/**
 * This program computes the roots to a quadratic equation
 * using the quadratic formula.
 */
int main(int argc, char **argv) {

  double a, b, c, root1, root2;

  if(argc != 4) {
    printf("Usage: %s a b c\n", argv[0]);
    exit(1);
  }

  a = atof(argv[1]);
  b = atof(argv[2]);
  c = atof(argv[3]);

  if(a == 0) {
    printf("Error: a cannot be zero\n");
    exit(1);
  } else if(b*b < 4*a*c) {
    printf("Error: cannot handle complex roots\n");
    exit(1);
  } else if(b*b == 4*a*c) {
    root1 = -b / (2*a);
    printf("Only one distinct root: %f\n", root1);
  } else {
    root1 = (-b + sqrt(b*b - 4*a*c) ) / (2*a);
    root2 = (-b - sqrt(b*b - 4*a*c) ) / (2*a);
    printf("The roots are %f and %f\n", root1, root2);
  }

  return 0;
}
Code Sample 16.4: Quadratic Roots Program in C With Error Checking
17. Loops
C supports while loops, for loops, and do-while loops using the keywords while , for ,
and do (along with another while ). Continuation conditions for loops are enclosed
in parentheses, (...) and the blocks of code associated with the loop are enclosed in
curly brackets.
int i = 1; //Initialization
while(i <= 10) { //continuation condition
//perform some action
i++; //iteration
}
Code Sample 17.1: While Loop in C
In addition, the continuation condition does not contain a semicolon since it is not an
executable statement. Just as with an if-statement, if we had placed a semicolon it would
have led to unintended results. Consider the following:
while(i <= 10); { //continuation condition
  //perform some action
  i++; //iteration
}
A similar problem occurs: the while keyword and continuation condition bind to
the next executable statement or code block. As a consequence of the semicolon, the
executable statement that gets bound to the while loop is empty. What happens is
even worse: the program will enter an infinite loop. To see this, the code is essentially
equivalent to the following:
while(i <= 10) { //continuation condition
}
{
  //perform some action
  i++; //iteration
}
In the while loop, we never increment the counter variable i; the loop does nothing,
and so the computation will continue on forever! Some compilers will warn you about
this, others will not. It is valid C and it will compile and run, but obviously won't work
as intended. Avoid this problem by using proper syntax.
Another common use case for a while loop is a flag-controlled loop in which we use a
Boolean flag rather than an expression to determine if a loop should continue or not.
Recall that in C, zero is treated as false and any non-zero numeric value is treated as
true. We can thus create an implicit Boolean flag by using an integer variable and setting
it to 1 for true and 0 for false (when we want the loop to terminate). An example can be
found in Code Sample 17.2.
int i = 1;
int flag = 1;
while(flag) {
  //perform some action
  i++; //iteration
  if(i>10) {
    flag = 0;
  }
}
Code Sample 17.2: Flag-controlled While Loop in C
int i;
for(i=1; i<=10; i++) {
  //perform some action
}
Code Sample 17.3: For Loop in C
Again, note the syntax: semicolons are placed at the end of the initialization and
continuation condition, but not the iteration statement. Just as with while loops, the
opening curly bracket is placed on the same line as the for keyword. Code within the
loop body is indented, all at the same indentation level.
Another observation is that we declared the index variable i prior to the for loop.
Some languages allow you to declare the index variable in the initialization statement,
for example for(int i=1; i<=10; i++) . Doing so scopes the index variable to the
loop and so i would be out-of-scope before and after the loop body. This is a nice
convenience and is generally good practice. However, C89 and prior standards do not
allow you to do this; the variable must be declared prior to the loop structure. C99 and
newer standards do allow you to do this and some compilers will be somewhat forgiving
when you use the newer syntax (by supporting their own non-standard extensions to C).
For maximum portability, we'll follow the older convention.
int i = 1;
do {
  //perform some action
  i++;
} while(i<=10);
Code Sample 17.4: Do-While Loop in C
Note the syntax and style: the opening curly bracket is again on the same line as the
keyword do . The while keyword and continuation condition are on the same line as
the closing curly bracket. In a slight departure from consistent syntax, a semicolon does
appear at the end of the continuation condition even though it is not an executable
statement.
The initialization sets the variable x to the first element in the array, arr[0] . The
loop continues for as many elements as there are in the array, n . The iteration does two
things: it assigns x to the next element in the array while at the same time incrementing
the index variable using the prefix increment operator (see Section 2.3.6).
17.5. Examples
17.5.1. Normalizing a Number
Let's revisit the example from Section 4.1.1 in which we normalize a number by continually
dividing it by 10 until it is less than 10. The code in Code Sample 17.5 specifically refers
to the value 32145.234 but would work equally well with any value of x.
double x = 32145.234;
int k = 0;
while(x >= 10) {
  x = x / 10; //or: x /= 10;
  k++;
}
Code Sample 17.5: Normalizing a Number with a While Loop in C
17.5.2. Summation
Let's revisit the example from Section 4.2.1 in which we computed the sum of integers
1 + 2 + ... + 10. The code is presented in Code Sample 17.6.
int i;
int sum = 0;
for(i=1; i<=10; i++) {
  sum += i;
}
Code Sample 17.6: Summation of Numbers using a For Loop in C
Of course we could easily have generalized the code somewhat. Instead of computing a
sum up to a particular number, we could have written it to sum up to another variable
n , in which case the for loop would instead look like the following.
for(i=1; i<=n; i++) {
  sum += i;
}
int i, j;
int n = 10;
int m = 20;
for(i=0; i<n; i++) {
for(j=0; j<m; j++) {
printf("(i, j) = (%d, %d)\n", i, j);
}
}
Code Sample 17.7: Nested For Loops in C
The inner loop executes for j = 0, 1, 2, ..., 19 < m = 20 for a total of 20 times. However, it
executes 20 times for each iteration of the outer loop. Since the outer loop executes for i =
0, 1, 2, ..., 9 < n = 10, the total number of times the printf statement executes is 10 ×
20 = 200. In this example, the sequence (0, 0), (0, 1), (0, 2), ..., (0, 19), (1, 0), ..., (9, 19)
will be printed.
However, recall that we may have problems due to accuracy. The monthly payment
could come out to be a fraction of a cent, say $43.871. For accuracy, we need to ensure
that all of the figures for currency are rounded to the nearest cent. The standard math
library does have a round function, but it only rounds to the nearest whole number,
not the nearest 100th.
However, we can adapt the off-the-shelf solution to fit our needs. If we take the number,
multiply it by 100, we get (say) 4387.1 which we can now round to the nearest whole
number, giving us 4387. We can then divide by 100 to get a number that has been
rounded to the nearest 100th! In C, we could simply do the following.
monthlyPayment = round(monthlyPayment * 100.0) / 100.0;
We can use the same trick to round the monthly interest payment and any other number
expected to be whole cents. To output our numbers, we use printf and take care to
align our columns to make it look nice. To finish our adaptation, we handle the
final month separately to account for an over/under payment due to rounding. The full
solution can be found in Code Sample 17.8.
#include<stdio.h>
#include<stdlib.h>
#include<math.h>

if(argc != 4) {
  printf("Usage: %s principle apr terms\n", argv[0]);
  exit(1);
}

//monthly payment
double monthlyPayment = (monthlyInterestRate * principle) /
    (1 - pow( (1 + monthlyInterestRate), -n));
//round to the nearest cent
monthlyPayment = round(monthlyPayment * 100.0) / 100.0;

return 0;
18. Functions
Because C is a procedural-style language, functions are essential in C programming. As
we've already seen, C provides a large library of standard functions to perform basic
input/output, math, and many other tasks. C also provides the ability to define and use
your own functions.
When you define functions in C, careful thought must be made as to the naming of your
functions. This is because C does not support function overloading. When you name a
function, that is the only function that can have that name. Consequently, you cannot,
in general, use the same function names as defined in the standard libraries or any other
3rd party library that you would like to use in your programs.
C supports both call by value and call by reference using pointers (see Section 18.2). C
also supports vararg functions ( printf() being a prime example) and allows you to
define vararg functions, but we will not cover them in depth here. Finally, parameters
are not, in general, optional. For modern versions of C the omission of parameters is a
syntax error. For older versions, complex rules dictate what happens when arguments
are omitted, but doing so usually results in garbage.
repeated with the function definition. This is a principle known as Don't Repeat Yourself
(DRY). Consider the following examples. In these examples we use a commenting style
known as "doc comments". This style was originally developed for Java but has since
been adopted by many other languages.
/**
 * Computes the sum of the two arguments.
 */
int sum(int a, int b);

/**
 * Computes the Euclidean distance between the 2-D points,
 * (x1,y1) and (x2,y2).
 */
double getDistance(double x1, double y1, double x2, double y2);

/**
 * Computes a monthly payment for a loan with the given
 * principle at the given APR (annual percentage rate) which
 * is to be repaid over the given number of terms (usually
 * months).
 */
double getMonthlyPayment(double principle, double apr, int terms);
In each of these, the return type is the first thing specified. The function identifier
(name) is then specified. Function names must follow the same naming rules as variables:
they must begin with an alphabetic character and may contain alphanumeric characters
as well as underscores. However, using modern coding conventions we usually name
functions using lower camel casing.
Note again, each prototype ends with a semicolon. Further, prototypes do not specify
what the function does, they only specify its signature. Later in the program, we can
provide the actual definition of each function by using the following syntax. We repeat
the signature, but instead of using a semicolon, we provide a code block, enclosed using
opening/closing curly brackets, that specifies the function body. Here are the definitions
from the prototype examples above:
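int sum(int a, int b) {
  return (a + b);
}

double getDistance(double x1, double y1, double x2, double y2) {
  double xDiff = (x1 - x2);
  double yDiff = (y1 - y2);
  return sqrt(xDiff * xDiff + yDiff * yDiff);
}

double getMonthlyPayment(double principle, double apr, int terms) {
  double rate = (apr / 12.0);
  double payment = (principle * rate) / (1 - pow(1 + rate, -terms));
  return payment;
}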
The keyword return is used to specify the value that is returned to the calling function.
//prototype:
void printCopyright();

//definition:
void printCopyright() {
  printf("(c) Bourke 2015\n");
}
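In the example above, we've also illustrated how to define a function that has no inputs.
Some sources may include an explicit void keyword as a parameter to indicate the
function takes no parameters as in void printCopyright(void); . In standards prior to
C23, this explicit form is the more precise prototype, since an empty parameter list in a
declaration leaves the parameters unspecified rather than declaring that there are none.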
with the usual file extension, .c .
We've seen this before with the standard libraries: we use #include<math.h> to
include the math library's header file in our code. This essentially brings in the math
library function prototypes so that we can write calls to, say, sqrt() or sin(). Only
when we compile do we actually need to link our code to the function definitions.
When we separate prototypes into header files and definitions into source files we also
need to include the prototypes in our source file just as we would need to include them
in any other file in which we use one of the functions. Suppose our functions above have
their prototypes in a file named utils.h and their definitions in a file named utils.c .
In the utils.c source file we would typically use the following syntax to include the
header file:
#include "utils.h"
We use the double quote syntax with user-defined libraries while the usual less-than/greater-than syntax is used with standard libraries. With the less-than/greater-than syntax, the
compiler will usually look for the header file(s) only in the system include directories,
so it will fail to find the header file of a user-defined library.
Furthermore, other elements are usually included in header files such as preprocessor
directives and other declarations (such as enumerated types and structures which we
introduce later).
By default, all primitive types including int, double, and char are passed by
value. To be able to pass arguments by reference, we need to use pointers.
18.2. Pointers
Consider the following line of C code.
int a = 10;
This line creates an integer variable and sets it equal to 10. In more detail, this line
creates a spot in memory (typically 32 bits) and stores a binary representation of the
value 10 at that location. In many instances, we don't care where the variable is stored
in memory. However, we may need to communicate that memory location to other
functions. To do so, we can use pointers.
A pointer in C is a reference to a memory location. Because different types ( int ,
double , char ) take a different amount of memory, it is necessary to have a pointer for
each type. That is, a pointer that points to a memory location that stores an int or a
pointer that points to a memory location that stores a double , etc.
The syntax for declaring a pointer is to use an asterisk.
If ptrA represents a memory location, what values can it take on? At the end of the
day, a memory location is just a number, so you could do something like the following.
ptrA = 10;
Though syntactically this makes sense (and generally the compiler will let you do this
with perhaps at most a warning), it's not really what you want. This assigns to the
pointer variable ptrA the value 10, which will be interpreted as the memory address
10. This memory address may not belong to your program, or it may not even exist as
a valid memory address. Attempts to access the value stored at an arbitrary memory
location may be illegal and may result in the operating system killing the program with
a segmentation fault or similar error.
There are many reasons why a program should not be allowed access to arbitrary memory
locations, but one of the prime reasons is security. Imagine if the operating system
allowed a program access to any part of memory; in particular memory that contained
sensitive information such as passwords or secret Secure Sockets Layer (SSL) keys. To
prevent this, operating systems generally only allow a program to access its own memory.
Referencing Operator
ptrA = &a;
ptrB = &b;
The operation &a gets the memory address of the variable a and we can assign it to a
pointer variable. Again, note that the pointer type and variable type should match. Making
a double pointer point to an int variable such as
ptrB = &a;
is valid syntax, but since the two types use different amounts of memory you may get
garbage results.
There is a special value used in C called NULL, a (case-sensitive) macro defined in the
standard library headers (for example stddef.h and stdlib.h), which is used for an
uninitialized, undefined, empty or otherwise invalid or meaningless value. In the
context of memory locations, NULL points to nothing. As with regular variables, it's
best practice to initialize pointer values to NULL. For example,
int *ptrA = NULL;
Without an initialization, the pointer may point to a random memory address which may
be dangerous to attempt to access. You can also test whether or not a pointer points to
NULL using the usual equality operator.
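A minimal sketch of such a test follows. The function name pointsToNothing() is hypothetical and is introduced only for illustration; the comparison against NULL is the part that matters.

```c
#include <stddef.h>

/**
 * Returns 1 if the given pointer points to nothing (NULL)
 * and 0 otherwise. Hypothetical helper for illustration.
 */
int pointsToNothing(const int *ptr) {
  if(ptr == NULL) {
    //the pointer does not refer to any valid memory location
    return 1;
  } else {
    //the pointer holds some non-NULL address
    return 0;
  }
}
```

Calling pointsToNothing(NULL) gives 1, while passing the address of an existing variable, as in pointsToNothing(&a), gives 0.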
Dereferencing Operator
Once we have a valid pointer to a memory location, we may want to manipulate the
contents of the memory it references. To do this we use the inverse operator, the
dereferencing operator which again uses an asterisk. Given a pointer variable ptrA , we
apply an asterisk in front of it to turn it into a regular variable. Consider the following
example.
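As a hedged sketch, the following wraps the basic pointer operations in a small function so that the result can be checked. The function name pointerDemo() is an assumption made for illustration; the variable names a and ptrA follow the ones used in this section.

```c
#include <stddef.h>

/**
 * Walks through the basic pointer operations and returns
 * the final value stored in the local variable a.
 * (Hypothetical demonstration function.)
 */
int pointerDemo(void) {
  //a regular variable holding the value 10
  int a = 10;
  //a pointer that initially points to nothing
  int *ptrA = NULL;

  //referencing: ptrA now holds the memory address of a
  ptrA = &a;

  //dereferencing: this writes 20 into the memory that
  //ptrA points to, which is the variable a itself
  *ptrA = 20;

  return a;
}
```

Since the assignment through *ptrA changes a directly, the function returns 20 rather than the original 10.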
How each of these lines of code operates in memory is depicted in Figure 18.1.
Now that we have the ability to reference a memory location using pointers, we can write
functions that pass variables by reference. To do so, we use the same asterisk syntax
used with pointer variables.
Figure 18.1.: Pointer Operations. Pointers can be made to point to other variables'
memory locations. You can manipulate/access values of variables via their
pointers using dereferencing. (Panels: ptrA initially contains NULL; after
ptrA = &a; it contains the address of a, which stores 10; after *ptrA = 20;
the variable a stores 20.)
//prototypes
/**
* This function sums the first two variables (passed by
* value) and places the result into the third variable
* (passed by reference).
*/
void sum(int a, int b, int *c);
/**
* This function swaps the values stored in the
* two variables passed by reference.
*/
void swap(int *a, int *b);
In the function definitions, we can use the dereferencing operator to access or modify the
value stored in the variable pointed to by the pointer.
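One possible way to define the two functions declared above is sketched here. The signatures come from the prototypes; the bodies are a straightforward assumption about what such definitions would look like.

```c
/**
 * Sums a and b (passed by value) and places the result into
 * the variable that c points to (passed by reference).
 */
void sum(int a, int b, int *c) {
  //dereference c to store the result in the caller's variable
  *c = a + b;
}

/**
 * Swaps the values stored in the two variables that
 * a and b point to.
 */
void swap(int *a, int *b) {
  //save the first value before it is overwritten
  int temp = *a;
  *a = *b;
  *b = temp;
}
```

Neither function checks for NULL arguments; a more defensive version would test both pointers before dereferencing them.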
int x = 10;
int y = 20;
int c;
int *ptrC = &c;
sum(x, y, ptrC);
//at this point c contains the value 30
swap(&x, &y);
//at this point, the values in x and y have been swapped
// x contains 20 and y contains 10
This should look familiar. We saw this same syntax when we used scanf() to read input
from the standard input. We needed to place an ampersand in front of each variable in
order to pass the variable by reference so that scanf() could place the results into the
respective memory locations. If the variables had been passed by value, then scanf()
would not have been able to manipulate their values.
You can also specify functions to return pointers which we discuss in detail in Chapter
20.
Functions are just pieces of code that reside somewhere in memory just as variables do.
Since we can create pointers that point to variables, it makes sense to be able to create
variables that point to functions too! These are referred to as function pointers.
The syntax for declaring function pointers is similar to variable pointers. However, since
a function's signature involves a return type and parameter list, these need to be specified.
For example, suppose we wanted to create a function pointer that could point to the
math library's sqrt() function which takes a single double parameter and returns a
double value.
double (*ptrToSqrt)(double) = NULL;
The above line creates a function pointer that can point to any function that takes
a single double parameter and returns a double value. As is good practice, we've
initialized it to point to NULL. The function pointer itself is named ptrToSqrt. To
make it point to the sqrt() function we can use the following syntax.
ptrToSqrt = sqrt;
This is because a function's identifier acts as a pointer as well! Once we have a pointer
to a function, we can invoke the function via its pointer as we would any other function
call.
double x = ptrToSqrt(2.0);
Some more examples:
double x;
double (*ptr)(double) = NULL;
//we can make it point to sqrt:
ptr = sqrt;
x = ptr(2.0); //x contains 1.4142...
//or we can make it point to fabs
ptr = fabs;
x = ptr(-10.5); //x contains 10.5
You generally create and use function pointers when you need to pass a function as an
argument to another function (or return one from a function) so that it can be used as a
callback. We discuss this in further detail in Chapter 25.
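To make the callback idea concrete, here is a small sketch. All of the names in it (applyToAll(), doubleIt(), negate()) are hypothetical and chosen for illustration; it also uses an array parameter, which is covered in a later chapter.

```c
/**
 * Applies the function f to each of the n elements of arr,
 * replacing each element with the result. The parameter f is
 * a function pointer: any function that takes a double and
 * returns a double can be passed in as the callback.
 */
void applyToAll(double *arr, int n, double (*f)(double)) {
  int i;
  for(i=0; i<n; i++) {
    arr[i] = f(arr[i]);
  }
}

/* Example callback: returns twice its argument. */
double doubleIt(double x) {
  return 2.0 * x;
}

/* Example callback: returns the negation of its argument. */
double negate(double x) {
  return -x;
}
```

A call such as applyToAll(values, 3, doubleIt) passes the function by its identifier, exactly as ptr = sqrt does above. A standard library function with a matching signature, such as fabs, could be passed the same way.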
18.3. Examples
18.3.1. Generalized Rounding
Recall that the standard math library provides a round() function that rounds a number
to the nearest whole number. Often, we've had need to round to cents as well. We
now have the ability to write a function to do this for us. Before we do, however, let's
think more generally. What if we wanted to round to the nearest tenth? Or what if we
wanted to round to the nearest 10s or 100s place? Let's write a general purpose rounding
function that allows us to specify which decimal place to round.
The most natural input values would be to specify the place using an integer exponent.
That is, if we wanted to round to the nearest tenth, then we would pass it -1 as
$0.1 = 10^{-1}$, -2 if we wanted to round to the nearest 100th, etc. On the positive end,
passing in 0 would correspond to the usual round function, 1 to the nearest 10s spot,
and so on.
Moreover, we could demonstrate good code reuse (as well as procedural abstraction)
by scaling the input value and reusing the functionality already provided in the math
library's round() function. We could further define a roundToCents() function that
used our generalized round function.
Let's also think about organization. We could place the prototypes into a round.h
header file and the corresponding definitions in a round.c source file. The contents of
these two files are presented here:
/**
* Rounds to the nearest digit specified by the place
* argument. In particular to the (10^place)-th digit
*/
double roundToPlace(double x, int place);
/**
* Rounds to the nearest cent
*/
double roundToCents(double x);
#include<math.h>
#include "round.h"
double roundToCents(double x) {
return roundToPlace(x, -2);
}
Observe that neither of these files contains a main() function. By themselves they
cannot be compiled into an executable program. We've essentially built a
small library of rounding functions. We could, though, compile them into a binary object
file using gcc (something like gcc -c round.c). We could then link against the object file
when compiling an executable program that uses these functions.
using the quadratic formula,
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
Since there are two roots, we may have to write two functions, one for the plus root
and one for the minus root both of which take the coefficients, a, b, c as arguments.
However, if we wrote a single function that took the coefficients as parameters by value
as well as two other parameters by reference, we could place both root values, one in each
of the by-reference variables.
By using pass by reference variables, we avoid multiple functions. We also note that the
return value in this case is unused since we are returning the root values in the two
pass-by-reference variables. This frees up the return value to be used to communicate
errors to the calling function. Recall that there could be several bad inputs to this
function. The roots could be complex values, the coefficient a could be zero, etc. And
now that we are dealing with pointers, the pointers could be invalid (point to NULL ). In
the next chapter, we examine how we can use the return value to communicate different
errors to the calling function, letting it handle those errors.
All three of these are defined in the errno.h header file. Depending on the system,
additional error codes may also be defined and supported (see POSIX Error Codes
below).
When an error occurs, a function will set the global variable errno to one of these error
code values. Upon returning from a function, you can check for these error codes. Since
these error codes are represented as integers, you simply use the numerical comparison
operator, == . You can check for no error by making a comparison to zero.
In addition, the standard string library, declared in the header file string.h, provides a
function, char *strerror(int errnum), that can be used to map the value in errno to
a human-readable error message. We discuss strings in detail later on, but we can see
how to use this function in the following demonstration (see below). The output of this
program is as follows.
result: 1.4142, error: 0
result: -nan, error: 33
it was an EDOM error
Error Message: Numerical argument out of domain
result: -inf, error: 34
it was an ERANGE error
Error Message: Numerical result out of range
For this particular system, the EDOM and ERANGE error codes were associated with the
integer values 33 and 34 respectively. These numbers are not necessarily the same on all
systems so comparisons must be made against the macro names for portability.
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include<string.h>
#include<errno.h>

int main(int argc, char **argv) {

  double x;
  double a = -1.0, b = 2.0, c = 0.0;

  //okay
  x = sqrt(b);
  printf("result: %.4f, error: %d\n", x, errno);

  //EDOM error
  x = sqrt(a);
  printf("result: %.4f, error: %d\n", x, errno);

  //make a comparison
  if(errno == EDOM) {
    printf("it was an EDOM error\n");
  }
  printf("Error Message: %s\n", strerror(errno));

  //reset the error code before the next call
  errno = 0;

  //ERANGE error
  x = log(c);
  printf("result: %.4f, error: %d\n", x, errno);

  if (errno == ERANGE) {
    printf("it was an ERANGE error\n");
  }
  printf("Error Message: %s\n", strerror(errno));

  return 0;
}
In our own code we could communicate errors to calling functions by setting errno.
However, we may run into compatibility issues with the standard error codes or POSIX
error codes. Instead, it may be more appropriate to do error handling by utilizing the
return value of a function to communicate an error code.
That is, to do error handling we could design our functions to always return an int value
to indicate an error: 0 for no error and some non-zero value to indicate various different
types of errors. Of course, as a consequence any value that needs to be returned to the
calling function would need to be done so via a pass by reference variable.
As an example, let's revisit the quadratic roots example in Section 18.3.2. By returning
the two output values via pass by reference variables, we freed up the return value. We
can now modify our function to return an integer indicating an error code instead.
Previously we had identified several different types of errors: division by zero (if $a = 0$),
complex roots (if $b^2 - 4ac < 0$) and a NULL pointer error if the variables passed by
reference were NULL. We can now modify our function to check for these errors and return
an appropriate error code. We return zero in the event that no error was encountered.
Now when a function invokes our quadraticRoots() it can check to see what kind of
error code it returned and handle the error in whatever way it wants.
There is still an issue, however. The usage of the integers 1, 2, 3 to indicate the various
errors was arbitrary. These are essentially magic numbers that the calling function
would have to deal with by making comparisons with various integers. The numbers
themselves are meaningless and someone using them would need to constantly refer to
documentation to understand which integer corresponded to which error condition.
It would be much better if we could follow the strategy of errno and define human-readable
identifiers for each error code. We could accomplish this by defining macros,
but another solution is to use an enumerated type.
typedef enum {
SUNDAY,
MONDAY,
TUESDAY,
WEDNESDAY,
THURSDAY,
FRIDAY,
SATURDAY
} DayOfWeek;
In this example we've defined an enumeration of the days of the week. The name of the
type itself is DayOfWeek and we can now declare variables of this type. The possible
values it can take are SUNDAY, MONDAY, etc. and we can use these identifiers in our
program. For example,
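A brief sketch of how such a variable might be declared and compared is shown here. The enumeration is repeated so that the snippet stands on its own, and the function name isWeekend() is a hypothetical one used only for illustration.

```c
typedef enum {
  SUNDAY,
  MONDAY,
  TUESDAY,
  WEDNESDAY,
  THURSDAY,
  FRIDAY,
  SATURDAY
} DayOfWeek;

/**
 * Returns 1 if the given day falls on a weekend and 0
 * otherwise. Hypothetical helper for illustration.
 */
int isWeekend(DayOfWeek day) {
  //enumerated values are compared with the usual == operator
  if(day == SATURDAY || day == SUNDAY) {
    return 1;
  }
  return 0;
}
```

A variable is declared like any other, for instance DayOfWeek today = FRIDAY; and can then be passed to the function as isWeekend(today).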
Note the modern naming conventions: the type identifier uses upper camel casing while
the enumerated values follow an upper case underscore convention. Though our example
does not contain a value with multiple words, if it had, we would have used an underscore
to separate them. Furthermore, enumerated type declarations are usually placed in
separate header files along with function prototype declarations.
Care must be taken when using enumerated types, however. Internally, C simply associates
integers with the values. Thus, in our example, SUNDAY is actually 0, MONDAY is 1,
and SATURDAY is 6. When we do assignments or equality comparisons, we're actually
just comparing integers.
Consequently, a DayOfWeek variable may be assigned values that do not correspond to
our enumeration. For example,
DayOfWeek today = 1000;
is valid code and will not (in general) result in any compiler errors or warnings, even
though it is assigning an invalid value to the variable. Care must be taken to only assign
valid values to an enumerated type variable. Proper error checking should also be done.
Despite this limitation, using enumerated types in C provides an obvious advantage.
Without an enumerated type we'd be forced to use a collection of magic numbers to
indicate values. Even for something as simple as the days of the week we'd be constantly
trying to remember: which day is Wednesday again? I forget, does our week start with
typedef enum {
NO_ERROR,
DIV_BY_ZERO_ERROR,
COMPLEX_ROOT_ERROR,
NULL_POINTER_ERROR
} ErrorCode;
Now in the quadraticRoots() function, we can return the appropriate error code.
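The same return-an-error-code pattern can be sketched on a smaller function. The divide() function below is hypothetical and is not part of the quadratic roots example; it only illustrates how the ErrorCode values can be returned while the actual result travels through a pass-by-reference parameter.

```c
#include <stddef.h>

typedef enum {
  NO_ERROR,
  DIV_BY_ZERO_ERROR,
  COMPLEX_ROOT_ERROR,
  NULL_POINTER_ERROR
} ErrorCode;

/**
 * Divides a by b and places the quotient into the variable
 * that result points to. Returns an error code rather than
 * the quotient itself. (Hypothetical function.)
 */
ErrorCode divide(double a, double b, double *result) {
  //check the pointer before it is ever dereferenced
  if(result == NULL) {
    return NULL_POINTER_ERROR;
  }
  if(b == 0) {
    return DIV_BY_ZERO_ERROR;
  }
  *result = a / b;
  return NO_ERROR;
}
```

The calling function can then compare the returned value against the named codes, for example if(divide(x, y, &q) == DIV_BY_ZERO_ERROR), instead of comparing against bare integers.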
20. Arrays
C allows you to declare and use arrays. Since C is statically typed, arrays must also
be typed when they are declared and may only hold that particular type of element. C
supports the use of both static arrays and dynamic arrays through standard library calls.
int arr[5];
double values[10];
The two declarations above create arrays of size 5 and 10 respectively. The array types
are also defined using the usual keywords. In this case, arr can only hold int values
and values can only hold double values. Arrays follow the same naming rules and
conventions as regular variables. Many times, identifiers are made plural as an array
naturally holds more than one value.
C99 introduced Variable Length Arrays (VLAs, which are also supported in GNU C89)
which allow you to declare a static array whose size is determined by a variable. For
example,
int n = 5;
int arr[n];
or within a function,
void foo(int n) {
int arr[n];
...
}
In either case, care must be taken as static arrays are allocated on the stack. The stack
is generally small and allocating even a moderately large array on the stack may lead to
a stack overflow. In addition, VLAs are not supported in any C++ standard and should
be avoided if portable code is desired.
Another way to declare an array is to use the compound declaration and assignment
syntax whereby you can initialize an array to hold a certain list of values.
int a[] = { 2, 3, 5, 7, 11 };
Using this syntax we do not need to specify the size of the array as the compiler is smart
enough to count the number of elements we've provided. The elements themselves are
denoted inside curly brackets and delimited with commas.
Indexing
Once an array has been created, its elements can be accessed by indexing. C uses the
standard 0-indexing scheme so the first element is at index 0, the second at index 1, etc.
Indexing an element involves using the square bracket notation and providing an index.
Once indexed, an array element can be treated as a normal variable and can be used
with other operators such as the assignment operator or comparison operators.
arr[0] = 42;
if(arr[4] < 0) {
printf("negative!\n");
}
printf("arr[1] = %d\n", arr[1]);
Recall that an index is actually an offset. The compiler and system know exactly how
many bytes each int element takes, so an index i determines exactly how many
bytes from the first element the i-th element is located. Consequently it is possible
to index elements that are beyond the range of the array. For example, arr[-1] or
arr[5] would attempt to access an element immediately before the first element and
immediately after the last element. Obviously, these elements are not part of the array.
If these out-of-bound elements represent a memory space that does not belong to our
program, then it is likely that a segmentation fault will occur and terminate our program
as we've attempted to access or modify memory that does not belong to us. However,
it is also likely that the memory space surrounding our array belongs to our program.
Still, accessing those blocks of memory may not give us meaningful values and modifying
them could corrupt other variable values; in either case the behavior is undefined. It is
our responsibility as programmers to write code that does not go beyond the bounds of
an array.
Iteration
C provides no foreach loop to iterate over an array. Instead, the most natural way to
iterate over the elements is a normal for-loop that increments an index variable.
int i, n = 10;
int arr[n];
for(i=0; i<n; i++) {
arr[i] = 5 * i;
}
The for loop above initializes the variable i to zero, corresponding to the first element
in the array. The continuation condition specifies that the loop continues while i is
strictly less than the size of the array. This iteration for-loop is idiomatic when dealing
with arrays.
size. The sizeof operator can be used to determine the number
of bytes any type takes on a system. For example, sizeof(int) gives the number of
bytes an int takes while sizeof(double) gives the number of bytes for a double,
etc. Thus, if we want to allocate an array of 100 integers, we could call malloc() as
malloc(100 * sizeof(int));
Using sizeof() is actually preferable to hard-coding a byte count as some systems may
use a different number of bytes for various types.
Finally, note the return type of malloc(): it's a void pointer. The malloc() function
simply allocates chunks of memory. It doesn't care that you intend to use the memory to
store integers or floating-point numbers. Thus, malloc() returns a generic pointer,
simply an address in memory. Once we have that pointer we can treat it as an integer
pointer, int *, or a floating-point pointer, double *, depending on what we want to
store.
One way of doing this is to explicitly cast the void pointer as the pointer that we want.
Some examples:
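The following sketch shows two such explicit casts. The wrapper functions createIntArray() and createDoubleArray() are hypothetical names used for illustration; the malloc() calls and casts inside them are the point of the example.

```c
#include <stdlib.h>

/**
 * Allocates an array of n int values. Returns NULL if the
 * memory could not be allocated. (Hypothetical helper.)
 */
int *createIntArray(int n) {
  //cast the generic void pointer to an int pointer
  int *arr = (int *) malloc(sizeof(int) * n);
  return arr;
}

/**
 * Allocates an array of n double values. Returns NULL if
 * the memory could not be allocated. (Hypothetical helper.)
 */
double *createDoubleArray(int n) {
  //cast the generic void pointer to a double pointer
  double *values = (double *) malloc(sizeof(double) * n);
  return values;
}
```

In both cases the caller is responsible for checking the returned pointer against NULL and for eventually releasing the memory with free().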
The pointer cast is just like when we cast int types as double types so that we
could perform division without truncation. In this case, we convert the returned generic
void pointer into an int pointer and double pointer respectively.¹
Once created, a dynamic array can be used just like a static array. You use the array's
identifier as well as an index to access or modify each element. The same rules and
pitfalls apply, so care must be taken to not access elements outside the bounds of the
array. Finally, when using malloc() it is important to understand that the memory
that is allocated is uninitialized. Just as with variables you cannot make any assumptions
as to the contents of the memory space that is allocated. It may contain garbage values,
it may contain the original contents that occupied the memory last time it was used, etc.
¹ Performing an explicit pointer cast is actually not necessary in C as the type system will do an implicit
cast for us. Whether explicit or implicit type casts are better can evoke debates akin to nerd
holy wars. Though there are advantages and disadvantages to both [15], we perform explicit pointer
casts in this book as clarity and intent are more important than brevity. Even more important is that
explicit pointer casts are necessary in C++ so doing so in our C code makes our code more portable.
A full example:
int i, n = 100;
int *arr = NULL;
arr = (int *) malloc(sizeof(int) * n);
if(arr == NULL) {
fprintf(stderr, "unable to allocate memory!\n");
exit(1);
}
for(i=0; i<n; i++) {
arr[i] = (i+1) * 10;
printf("a[%d] = %d\n", i, arr[i]);
}
There are other functions that can be used to allocate memory in C. For example,
calloc() is similar to malloc() but initializes the contents to zero (null bytes).
realloc() can be used to resize an existing memory block; it may move the block to a
new location, and it returns NULL (leaving the original block untouched) if the request
cannot be satisfied.
Deallocation
Once dynamically allocated memory is no longer needed, we should release it so that
it can be reused by the program or the operating system. The free() function in the
standard library does this for us. All we need to do is provide the pointer to free()
and it deallocates the memory block.
free(arr);
Once freed, accessing old memory pointed to by the arr pointer is undefined behavior
and may lead to unexpected or fatal results.
/**
* This function computes the sum of elements in the
* given array which contains n elements
*/
int computeSum(int *arr, int n) {
int i;
int sum = 0;
for(i=0; i<n; i++) {
sum += arr[i];
}
return sum;
}
In this example we had no need to make changes to any of the elements in the array.
However, the array was still passed by reference, meaning we could have. When passing
arrays, we can use the keyword const (short for constant) to explicitly indicate that no
changes will be made to the array elements. For example,
int computeSum(const int *arr, int n)
This is enforced by the compiler: if we do attempt to make changes to a const array, it
will be a compiler error.
We can also create an array in a function and return it as a value. As previously discussed,
we will need to do so by creating a dynamically allocated array. For example, the following
function creates a deep copy of the given integer array. That is, a completely new array
that is a distinct copy of the old array. In contrast, a shallow copy would be if we simply
made one reference point to another reference.
/**
* This function creates a new copy of the given integer
* array which contains n elements and returns a pointer
* to the new copy.
*/
int * makeCopy(const int *a, int n) {
int *copy = (int *)malloc(sizeof(int) * n);
int i;
for(i=0; i<n; i++) {
copy[i] = a[i];
}
return copy;
}
The function returns an integer pointer. Here we have a similar problem with respect to
the size of the array. We only have one return value, which must be the pointer. In this
int i;
int **myMatrix = NULL;
myMatrix = (int **)malloc(n * sizeof(int*));
for(i=0; i<n; i++) {
myMatrix[i] = (int *)malloc(n * sizeof(int));
}
Line 3 invokes malloc() to create an array of n integer pointers, that is int * types.
We then go into a loop to iterate over each of these pointers and invoke malloc()
again to set up each array. This process is visualized in Figure 20.1. Note the syntax:
when we invoke malloc() to create an array of pointers, we use sizeof(int*) to
determine how many bytes each integer pointer takes. We also do an explicit type cast
of (int **) to match our pointer-to-pointer(s) variable myMatrix.
Once the allocation has completed, we can treat myMatrix as a normal 2-dimensional
array and index each element with two index variables.
int i, j;
for(i=0; i<n; i++) {
for(j =0; j<n; j++) {
myMatrix[i][j] = 0;
}
}
To deallocate and free multidimensional arrays, we need to work backwards. If we immediately freed the pointer-to-pointers, free(myMatrix); , we would lose all references to
Figure 20.1.:
(b) Initialization of the first pointer. On the first iteration of the for loop when i = 0, the
first row is initialized when malloc() is invoked.
(c) Initialization of the second pointer. On the second iteration of the for loop when i = 1,
the second row is initialized.
(d) After the termination of the for loop, each row has been initialized and the pointer-to-pointers can be treated like a 2-dimensional array.
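Working backwards when freeing a row-by-row allocated matrix can be sketched as follows. The function name freeMatrix() is an assumption made for illustration; the order of the free() calls is what matters.

```c
#include <stdlib.h>

/**
 * Frees an n-row matrix that was allocated one row at a
 * time. (Hypothetical helper; the name is not from the text.)
 */
void freeMatrix(int **matrix, int n) {
  int i;
  if(matrix == NULL) {
    return;
  }
  //first free each individual row while we can still
  //reach it through the array of row pointers...
  for(i=0; i<n; i++) {
    free(matrix[i]);
  }
  //...and only then free the array of row pointers itself
  free(matrix);
}
```

Freeing the rows first guarantees that no row becomes unreachable before it has been released.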
int n = 5, m = 3, i, j;
We're not done yet, however. We still need to initialize all of the other pointers. To do
so, we dereference the array (once) to get the address of the start of the memory block
and then compute an offset from this beginning to where the next row should be.
Since each row has m elements, this arithmetic is simple.
Now we can treat the array like we would any other two dimensional array by specifying
two indices.
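The contiguous allocation described above can be sketched as follows. The function names createContiguousMatrix() and freeContiguousMatrix() are hypothetical; the sketch assumes n rows of m int elements each, stored in one block.

```c
#include <stdlib.h>

/**
 * Allocates an n-by-m matrix whose elements all live in one
 * contiguous block of memory. Returns NULL on failure.
 * (Hypothetical helper for illustration.)
 */
int **createContiguousMatrix(int n, int m) {
  int i;
  //array of n row pointers
  int **arr = (int **) malloc(n * sizeof(int *));
  if(arr == NULL) {
    return NULL;
  }
  //one block large enough for all n * m elements
  arr[0] = (int *) malloc(n * m * sizeof(int));
  if(arr[0] == NULL) {
    free(arr);
    return NULL;
  }
  //each remaining row starts m elements after the previous;
  //*arr is the address of the start of the block
  for(i=1; i<n; i++) {
    arr[i] = (*arr + m * i);
  }
  return arr;
}

/**
 * Frees a matrix created by createContiguousMatrix():
 * the single element block first, then the row pointers.
 */
void freeContiguousMatrix(int **arr) {
  if(arr != NULL) {
    free(arr[0]);
    free(arr);
  }
}
```

With n = 5 and m = 3, assigning arr[i][j] = 10 * i + j for every valid i and j fills the matrix with a distinct value in each position.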
Which would result in an array that, conceptually, looks something like the following.
[  0   1   2 ]
[ 10  11  12 ]
[ 20  21  22 ]
[ 30  31  32 ]
[ 40  41  42 ]
21. Strings
C has no built-in string type. Instead, strings are represented as arrays of char elements.
They differ from, say, arrays of int or double types, however, in that they are
null-terminated arrays. The end of the string must always be denoted with a null-terminating
character, '\0' (the 0-valued character in the ASCII table). Failure to properly
null-terminate a string may lead to undefined behavior or fatal errors.
C provides a good variety of functions in its standard string library (included in the
string.h header). All of these functions operate under the assumption that the strings
passed to them are null-terminated, thus it's your responsibility to ensure that they are.
Moreover, some of the functions that operate on strings will insert the null-terminating
character for us, but others will not. Thus, it's important to understand the expectations
and guarantees of each function.
This syntax can only be used when creating static strings (they are allocated on the
stack and locally scoped). The compiler is able to scan the string literal and determine
how many characters are needed and even inserts the null-terminating character for us.
Thus, the lengths of the two strings in the example are 3 and 5 respectively, but the array
sizes created for them will be 4 and 6 respectively to accommodate the null-terminating
character.
Figure 21.1.: A string in C is achieved by using a char array. However, the string is
terminated by a null-terminating character, '\0'. Though an array may have
space for additional characters, they are irrelevant if the null terminator
precedes them.
Static strings have the same limitations as static arrays. Since they are allocated on the
stack, they cannot, for example, be returned from a function. Just as with int and
double types, however, we can dynamically allocate memory to hold char types using
malloc() . The important difference being that we need to always allocate at least one
more character to accommodate the null-terminating character.
This example creates a dynamically allocated char array that is able to hold 9 characters
(since one will be needed for the null-terminating character). However, once we have
created the array, we cannot simply assign an entire string to it using the usual assignment
operator.
//THIS IS *WRONG*:
fullName = "Tom Waits";
This will compile, but doesn't give us what we want. Recall that fullName is a character
pointer. Using the assignment operator simply makes it point to the static, literal string
"Tom Waits". In fact, we lose our reference to the dynamically allocated array that
was created with malloc(), giving us a memory leak.
Instead, we need to use a function in the standard library to copy a string. The function
in the standard library that allows you to copy strings is as follows.
char *strcpy(char *dest, const char *src);
The name, strcpy, is short for "string copy."1 Both arguments are character pointers.
In keeping with the use of the assignment operator, the second argument (the source)
is copied into the first (the destination), just as with an assignment operator the value
on the right-hand side is copied into the variable on the left-hand side. Moreover, the
second argument has been marked as const, indicating that it will not be changed.
The contents of the first argument will be changed since we are copying a string into
it, overwriting whatever contents it had prior. Finally, the function returns a pointer to the
dest argument, mostly so that it can be used in nested function calls (though we'll
avoid such confusingly terse tricks). From our example:
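A minimal sketch of the copy, assuming fullName is allocated with room for 10 characters as described above. The wrapper function makeFullName() is a hypothetical name used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: returns a dynamically allocated copy
   of the literal string, or NULL if allocation fails. */
char *makeFullName(void) {
  char *fullName = (char *) malloc(sizeof(char) * 10);
  if(fullName == NULL) {
    return NULL;
  }
  /* copies the 9 characters and the null terminator into the array */
  strcpy(fullName, "Tom Waits");
  return fullName;
}
```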
1 The abbreviations are mostly historic: in the 70s and 80s, when memory was measured in kilobytes,
saving a few characters made a significant difference.
In addition, we can access and modify individual characters in a string using the usual
indexing and assignment operator with char literals.
//printing:
printf("First Initial: %c\n", fullName[0]);

//modifying:
fullName[0] = 't';
fullName[4] = 'w';
//fullName is now "tom waits"
Length
As with regular arrays, we are responsible for ensuring that we do not access characters
outside the character array of a string. However, since a string is null-terminated, there
is a nice function provided by the string library to determine its length,
size_t strlen(const char *s);
Recall that size_t can essentially be treated as an integer, indicating the number of
bytes in the passed string. Since a character is a single byte, this function tells us how
many characters are in the given string, not including the null-terminating character. This
function's name, too, is an abbreviation, for "string length."
Using this function we can easily iterate over each character in a string.
int i;
for(i=0; i<strlen(fullName); i++) {
  printf("fullName[%d] = %c\n", i, fullName[i]);
}
Concatenation
A concatenation function is provided that allows you to append one string to the end of
another.
char *strcat(char *dest, const char *src);
Similar to the strcpy() function, strcat() appends the source ( src ) string to
the end of the destination ( dest ) string. Note that it is your responsibility to ensure
that the destination string is large enough to accommodate the source string. If it is not,
then it could lead to undefined behavior as strcat() overwrites memory after the end
of the destination string. Further, strcat() will copy the null-terminating character
for you so that the resulting string (now stored in dest ) is valid.
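As a minimal sketch of concatenation, the following builds a full name from two parts, sizing the destination so that strcat() never writes past its end. The helper joinNames() and its parameter names are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "first last" in a new dynamically
   allocated string, or returns NULL if the allocation fails. */
char *joinNames(const char *first, const char *last) {
  /* room for both strings, one space, and the null terminator */
  size_t n = strlen(first) + strlen(last) + 2;
  char *full = (char *) malloc(sizeof(char) * n);
  if(full == NULL) {
    return NULL;
  }
  strcpy(full, first);  /* full is now a valid, terminated string */
  strcat(full, " ");
  strcat(full, last);
  return full;
}
```

Note that the destination must already hold a valid string (here, the result of strcpy()) before strcat() is called on it.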
Byte-Limited Versions
C also provides several byte-limited versions of the copy and concatenation functions:
char *strncpy(char *dest, const char *src, size_t n);
char *strncat(char *dest, const char *src, size_t n);
They work similarly in that they copy/concatenate the source string, src, into the
destination string, dest. However, there is a third parameter, n, which specifies at
most how many bytes to copy/concatenate. If either of these functions encounters the
null-terminating character before n bytes have been copied/concatenated, they stop and
copy the null-terminating character for us. Be careful, however: if strncpy() copies n
bytes without encountering the null-terminating character, it does not append one, so
we must insert it ourselves; strncat(), in contrast, always appends one.
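A short sketch of a byte-limited copy that always terminates its result. The helper name copyPrefix() is hypothetical, and the caller is assumed to provide a destination with room for n + 1 characters.

```c
#include <string.h>

/* Hypothetical helper: copies at most n characters of src into dest
   and always terminates the result; dest must have room for n+1 chars. */
void copyPrefix(char *dest, const char *src, size_t n) {
  strncpy(dest, src, n);
  /* strncpy() adds no terminator if src has n or more characters */
  dest[n] = '\0';
}
```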
Computing a Substring
The byte-limited versions of the copy function can be used to compute a substring of
another string. The parameter n can be used to limit the length of the string, but how
might we specify where the substring should start?
For example, in the string "Thomas Alan Waits" we may want to get the substring
representing his middle name, "Alan". The length is 5 (four characters and the
null-terminating character) and we can specify where the copying should start by using an
index. In this case, we want the copying to start at index 7, the 8th character. If the
string is stored in an array named name, this would be name[7]. However, indexing an
array like this results in a single character and strncpy() expects a string (a character
pointer). Fortunately, we know how to do this: using the referencing operator, we can
turn the 8th character into a character pointer, &name[7]. A full example:
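A sketch of the technique, wrapped in a hypothetical substring() function (the name and parameters are assumptions). It assumes that start and n describe a range that lies entirely within the string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a new string holding the n characters
   of name that begin at index start, or NULL if allocation fails. */
char *substring(const char *name, int start, int n) {
  char *result = (char *) malloc(sizeof(char) * (n + 1));
  if(result == NULL) {
    return NULL;
  }
  /* &name[start] is a character pointer to where copying begins */
  strncpy(result, &name[start], n);
  result[n] = '\0';  /* strncpy() does not terminate the result here */
  return result;
}
```

With this sketch, substring("Thomas Alan Waits", 7, 4) produces the string "Alan".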
int main(int argc, char **argv)
which is represented as a double character pointer, char **. Conceptually, each row
is a string, char *. The first parameter, argc, indicates how many rows there are.
The main() function doesn't need to be told how big each row is since strings are
null-terminated.
We can create our own arrays of strings similar to how we created 2-dimensional arrays
of int and double types.
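One possible sketch of creating such an array, assuming every row should have the same capacity. The helper names createStringArray() and freeStringArray() are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: creates an array of n strings, each able to
   hold up to size-1 characters plus the null terminator. */
char **createStringArray(int n, int size) {
  int i;
  char **rows = (char **) malloc(sizeof(char *) * n);
  if(rows == NULL) {
    return NULL;
  }
  for(i=0; i<n; i++) {
    rows[i] = (char *) malloc(sizeof(char) * size);
    if(rows[i] != NULL) {
      rows[i][0] = '\0';  /* each row starts as the empty string */
    }
  }
  return rows;
}

/* Frees each row and then the array of pointers itself. */
void freeStringArray(char **rows, int n) {
  int i;
  for(i=0; i<n; i++) {
    free(rows[i]);
  }
  free(rows);
}
```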
21.4. Comparisons
When comparing strings in C, we cannot use the numerical comparison operators such
as == or <. Because strings are represented as character pointers, using these operators actually
compares the memory addresses stored in the variables, not the strings' contents.
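//both strings hold the same (illustrative) content
char *a = (char *) malloc(sizeof(char) * 6);
char *b = (char *) malloc(sizeof(char) * 6);
strcpy(a, "hello");
strcpy(b, "hello");

if(a == b) {
  printf("strings match!\n");
}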
The code above will not print anything even though the strings a and b have the
same content. This is because a == b compares the memory addresses that the two
variables point to. Since they point to different memory addresses (created by two separate calls
to malloc()) they are not equal.
The C string library provides a standard comparator function to compare strings based
on their content:
int strcmp(const char *a, const char *b);
The function takes two strings and returns an integer based on the lexicographic ordering
of a and b . If a precedes b , strcmp() returns something negative. It returns zero if
a and b have the same content. Otherwise it returns something positive if b precedes
a.
Some examples:
int x;
x = strcmp("apple", "banana"); //x is negative
x = strcmp("zelda", "mario"); //x is positive
x = strcmp("Hello", "Hello"); //x is zero

x = strcmp("Apple", "apple");
//x is negative
In the last example, "Apple" precedes "apple" since uppercase letters are ordered
before lowercase letters according to the ASCII table. We can also make comparisons
ignoring case, if we need to, using the following alternative (a POSIX function declared in strings.h):
int strcasecmp(const char *s1, const char *s2);
which is a case-insensitive version. Here, strcasecmp("Apple", "apple") will return
zero as the two strings are the same ignoring the cases.
The comparison functions also have byte-limited versions,
int strncmp(const char *s1, const char *s2, size_t n);
and
int strncasecmp(const char *s1, const char *s2, size_t n);
Both will only make comparisons in the first n bytes of the strings. Thus, the comparison,
strncmp("apple", "apples", 5) will result in zero as the two strings are equal in
the first 5 bytes.
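As a small illustration of a byte-limited comparison, the following sketch tests whether one string begins with another. The function name startsWith() is a hypothetical helper, not part of the standard library.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if s begins with prefix, 0 otherwise.
   Only the first strlen(prefix) bytes of the two strings are compared. */
int startsWith(const char *s, const char *prefix) {
  return strncmp(s, prefix, strlen(prefix)) == 0;
}
```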
21.5. Conversions
We've previously examined the functions atoi() and atof() that allow you to convert
strings that hold numeric values to int and double values respectively. Another
way to convert strings to numbers is to use a function similar to the familiar scanf()
function, which reads its input from the standard input. The sscanf() function reads
its input from a string instead of the standard input.
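A minimal sketch of parsing two numbers out of a string. The helper name parseIntAndDouble() and the space-separated input format are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: parses an integer followed by a floating point
   number from s; returns how many values were successfully stored. */
int parseIntAndDouble(const char *s, int *x, double *y) {
  /* %lf (not %f) is required when storing into a double */
  return sscanf(s, "%d %lf", x, y);
}
```

Checking the return value against the number of expected values is a simple way to detect malformed input.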
The sscanf() function differs in its first argument: the string that contains the value
you want to parse. Otherwise, the remaining arguments are as in scanf(): the format
(as a string) and the variable(s) that the results should be stored in (passed by reference).
Likewise, there is a companion function, sprintf(), which is similar to the printf() function,
but instead of printing to the standard output, it prints to a string. That is, the
result is placed in the string passed as its first argument.
int x = 10;
double y = 3.14;
char *s = (char *) malloc(sizeof(char) * 50);
sprintf(s, "The value of x is %d, y = %f.", x, y);
//s now contains "The value of x is 10, y = 3.140000."
21.6. Tokenizing
Recall that tokenizing is the process of splitting up a string along some delimiter. For
example, the comma delimited string, "Smith,Joe,12345678,1985-09-08" contains
four pieces of data delimited by a comma. Our aim is to split this string up into four
separate strings so that we can process each one.
The C string library provides a tokenizing function:
char *strtok(char *str, const char *delim);
which works as follows. The first argument is the string that you want to tokenize and
the second contains the delimiter that you want to split along. The second argument is
actually a string and allows you to specify more than one delimiter, but we'll restrict our
attention to single character delimiters. The function returns a pointer to the first token
in the string. To get the second and all subsequent tokens, we call strtok() again, but
we pass it NULL as the first argument to continue parsing the same string.
If we pass a new string as the first argument to strtok() the tokenization process
will start over on the new string. Note that the first argument does not have the
const keyword. This is because strtok() will make changes to the string during
the tokenization process. If the string needs to be preserved, tokenization should be
performed on a deep copy of the string.
When there are no more tokens in the string, strtok() returns NULL to indicate no
more tokens are in the string. This logic can be used to write a while loop to iterate over
each token. Consider the following example.
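A sketch of such a loop, wrapped in a hypothetical printTokens() function (the name and the returned count are assumptions). The string passed in must be modifiable, for example a character array such as char data[] = "Smith,Joe,12345678,1985-09-08"; rather than a string literal.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints each token of str on its own line and
   returns the number of tokens; note that strtok() modifies str. */
int printTokens(char *str, const char *delim) {
  int count = 0;
  char *token = strtok(str, delim);  /* first call: pass the string */
  while(token != NULL) {
    printf("%s\n", token);
    count++;
    token = strtok(NULL, delim);     /* later calls: pass NULL */
  }
  return count;
}
```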
Smith
Joe
12345678
1985-09-08
It should be noted that strtok() is not reentrant. It can only work on one string at a
time. In a multithreaded application, two threads cannot both use strtok() or they
would end up processing each other's strings. Even in a non-threaded application, we
need to be careful. For example, we cannot process a string using strtok() in one
function and, in the middle of doing so, call another function that does the same. POSIX
systems do provide a reentrant version, strtok_r(), that can be used.
C provides several functions to manipulate and process files. Like other I/O functions,
these are all defined in the standard input/output library, stdio.h . Writing binary or
plaintext data is determined by which functions you use.
In general whether or not a file input/output stream is buffered or unbuffered is determined
by the system configuration. There are some ways in which this can be changed, but we
will not cover them in detail.
Files are represented in C by a FILE pointer type defined in the standard input/output
library. As a pointer, it essentially points to the file stored in memory.
To open a file, you use the fopen() function (short for "file open") which takes two
arguments and returns a FILE pointer:
FILE *fopen(const char *path, const char *mode);
The first argument is the file path/name that you want to open for processing. The
second argument is a string representing the mode that you want to open the file in.
There are several supported modes, but the two we will be interested in are reading, in
which case you pass it "r", and writing, in which case you pass it "w". The path can
be an absolute path or a relative path; the directory portion may be omitted if the file
is in the current working directory.
//the file names used here are illustrative
FILE *input = fopen("input.txt", "r");
if(input == NULL) {
  fprintf(stderr, "Unable to open input file\n");
  exit(1);
}

FILE *output = fopen("output.txt", "w");
if(output == NULL) {
  fprintf(stderr, "Unable to open output file\n");
  exit(1);
}
The two checks above verify that each file opened successfully. If the file opening failed,
fopen() returns NULL . Opening a file can fail for a number of reasons. On POSIX
systems for example, additional information can be obtained by accessing the standard
error number, errno (see Section 19.1). Some errors that can result:
ENOENT No such file or directory
EACCES Permission denied
ENOMEM Insufficient storage space is available
among many other possibilities. These error codes can be used to implement more specific
error handling code if desired.
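A sketch of more specific error reporting on a POSIX system, using errno together with the standard strerror() function. The wrapper name openOrReport() and the wording of the messages are assumptions made for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: opens a file and, on failure, reports the
   reason to the standard error before returning NULL. */
FILE *openOrReport(const char *path, const char *mode) {
  FILE *f = fopen(path, mode);
  if(f == NULL) {
    int code = errno;  /* save it: later library calls may change errno */
    fprintf(stderr, "Unable to open %s: %s\n", path, strerror(code));
    if(code == ENOENT) {
      fprintf(stderr, "(check that the file exists)\n");
    }
  }
  return f;
}
```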
int x = 10;
double y = 3.14;
Using fscanf() for arbitrary string input is potentially dangerous as there is limited
bounds checking. We must store the input value from the file into a string (character
array), but if the file or line contains more characters than the array can accommodate
we may have a buffer overflow.
A better way to read input is to use fgets() which allows us to limit the number of
bytes that are read.
char *fgets(char *s, int size, FILE *stream);
The first argument is the string that the input data will be read into. The second
parameter is how we limit the number of characters that will be read. It actually reads
in one fewer character, size-1 to account for the null-terminating character which
fgets() automatically inserts for us. The last argument is the file pointer that we wish
to read from.
The behavior of fgets() is that it reads up to size-1 characters from the input
file and places the results into s . If fgets() encounters either an EOF symbol or
an endline character, \n it stops reading. In the case of an endline character, it is
included in the result and may need to be chomped out (that is, removed).
The fgets() function can be used to process a file line by line until the end of the
file is reached. Each line can be processed (perhaps tokenized) individually to extract
particular pieces of data. This is typically how a CSV or similar file may be processed.
To determine if the end of the file has been reached, you can use the return value of the
function: it returns NULL when the end of the file is reached before any characters have
been read (or when an error occurs).
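A sketch of line-by-line processing with fgets(). The helper names chomp() and printLines() and the buffer size of 100 are assumptions; the sketch further assumes that every line fits in the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: removes a trailing endline character, if any. */
void chomp(char *s) {
  size_t n = strlen(s);
  if(n > 0 && s[n-1] == '\n') {
    s[n-1] = '\0';
  }
}

/* Hypothetical helper: prints each line of the file and returns the
   number of lines read (assumes each line fits in the buffer). */
int printLines(FILE *f) {
  char buffer[100];
  int count = 0;
  while(fgets(buffer, 100, f) != NULL) {
    chomp(buffer);
    count++;
    printf("line %d: %s\n", count, buffer);
  }
  return count;
}
```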
A similar function, int fgetc(FILE *stream); allows us to get a single character from
the input file (returned as an int ) if we prefer to read character by character.
int x = 10;
23. Structures
Strictly speaking, C is not an object-oriented programming language; it is an imperative
(or, relatedly, a structured or procedural) programming language. This characterizes
the language as one that changes a program's state through statements and the use
of function calls. Though C does not have objects, it does support a weak form of
encapsulation through the use of structures. Structures are user-defined types that
collect multiple pieces of data together into one logical unit. Once defined, structures
can be used in a program just like any other variable type: variables can be declared,
passed to and returned from functions, pointers to structures can be used, etc.
Structures provide a weak form of encapsulation in that they only provide the grouping of
data. The protection of data through visibility keywords is not supported. The grouping
of functions that act on that data is also not readily supported.1 However, even this
weak form provides a very useful and convenient way to collect related pieces of data.
typedef struct {
  char *firstName;
  char *lastName;
  int id;
  double gpa;
} Student;
1 You can define a structure with function pointers as elements, so structures could technically include
methods, but this is not really what most people think of when considering object-oriented paradigms.
merated type which is a list, these elements do not constitute a list).
A structure may contain any number of elements of any type.
The name of the structure is provided at the end, ended with a semicolon.
We use a modern naming convention: each element is named using lower camel
casing while the name of the structure itself uses upper camel casing.
In addition, structure declarations are generally placed in a header file along with
any function prototypes that use the structure as a parameter or a return type. Once
you have defined a structure you can use it as you would a built-in variable type.
For example,
Student s;
declares a Student structure with the variable name s .
struct Student {
...
};
That is, without the keyword typedef and with the structure name at the beginning.
The difference is a bit technical (the original syntax creates an anonymous structure and
makes Student an ordinary type name via typedef, while this declaration places the
Student identifier in the separate namespace of structure tags), but one of the consequences of using
this type of declaration is that Student is not a type name on its own, so to
declare a variable of the type Student you need to further specify that it is a structure
using the following syntax.
struct Student s;
Further, you may also see another style,
typedef struct Student {
  ...
} Student;
which places the Student identifier in both the global space and in the structure
space. Which style of declaration you use depends on several factors, but for simplicity
typedef struct {
  int year;
  int month;
  int date;
} Date;
How might we include this in our Student structure? Once we have defined a structure
we can use it as we would a normal variable, so it makes sense that we could include it
in another structure.
typedef struct {
  char *firstName;
  char *lastName;
  int id;
  double gpa;
  Date dateOfBirth;
} Student;
Code Sample 23.1: A Student structure declaration
This is a good illustration of a form of composition where one structure may be composed
of other structures. Since the Student structure owns an instance of the Date
structure, it is necessary to ensure that the Date structure is declared before the
Student structure (just as we need to declare variables before we use them).
23.2. Usage
23.2.1. Declaration & Initialization
Once we have defined a structure (and included the header file it has been defined in),
we can create instances using the usual syntax.
Student s;
Student t;
These static declarations will allocate enough space on the stack to hold all of the data
associated with the structures (the two pointers, the integer, the double, and the nested
Date structure). However, the values stored in each of the structure's member variables
are undefined. With this style of declaration, C does not define default values.
Alternatively, another way to declare instances of our structures is to use the following
syntax.
Student s = {};
Student t = {
  "Grace",
  "Hopper",
  12345678,
  4.0,
  {
    1906,
    1,
    1
  }
};
The first declaration creates a Student structure with defaulted values (zero for any
numeric types, null for any pointers). Note that the empty braces are standard only as
of C23, though many compilers accept them; = {0} is the portable equivalent. The second
creates a Student structure
initialized with the values provided in the curly brackets. The order matters here and
must match the ordering of the original structure declaration. Since the dateOfBirth
is a structure itself, a nested set of values within curly brackets is necessary. One
drawback to this type of declaration is that the character pointers point to string literals.
The strings "Grace" and "Hopper" are typically stored in a read-only segment of memory,
and any attempt to change the contents of these strings is undefined behavior. The
pointers themselves, however, can be reassigned.
This static declaration also suffers the same limitations as static arrays: they are allocated
on the stack and cannot, therefore be returned from functions. In addition, since structures
consist of multiple pieces of data, their memory footprint is larger. Allocating larger and
larger structures on the stack runs the risk of running out of stack memory resulting in a
stack overflow. To solve this, we can instead use dynamically allocated structures.
To dynamically allocate a structure, we use a pointer to a structure and a call to
malloc() to allocate enough space for the structure. Even though our structure is
user-defined, we can still use the sizeof operator to determine the number of bytes
required by the structure. The compiler looks at the structure declaration and
determines the total number of bytes required (the sizes of its member variables plus
any padding between them).
Student *s = (Student *) malloc(sizeof(Student) * 1);
The multiplication by 1 in this example is not strictly necessary, but emphasizes the fact
that we are allocating space for one structure and not an array of structures. Initializing
a dynamically allocated structure like this does not initialize any of its variables as
there are no default values defined by C. Moreover, it does not initialize memory for any
pointer member variables. For example the firstName and lastName pointers need
to be manually initialized with additional malloc() calls.
Once we have a declared structure, we need to access its member variables and perhaps
update them. If we have a statically declared structure, such as
Student s;
we can use the direct component selector operator, which is simply just a period followed
by the member variable we wish to access. Commonly, this is referred to simply as the
dot operator.
Student s;

//set values:
s.id = 87654321;
s.gpa = 3.9;

//access values:
printf("Name: %s, %s\n", s.lastName, s.firstName);
printf("GPA: %.2f\n", s.gpa);
When structures are nested, we can use the dot operator multiple times to access member
variables of member variables.
s.dateOfBirth.year = 2525;
s.dateOfBirth.month = 12;
s.dateOfBirth.date = 25;
When we have a pointer to a structure, we cannot directly use the dot operator. Instead,
we have to change the pointer into a normal structure by dereferencing it, then we can
use the dot operator. However, the dot operator has a higher order of precedence than
the dereferencing operator, thus parentheses are required:
Student *s = ...;
(*s).id = 87654321;
This can be a bit unwieldy, so C provides a convenience operator, the indirect component
selector operator, or more commonly, the arrow operator, that allows us to select a member
variable with a single operator, ->, that resembles a right-pointing arrow.
//set values:
s->id = 87654321;
s->gpa = 3.9;

//access values:
printf("Name: %s, %s\n", s->lastName, s->firstName);
printf("GPA: %.2f\n", s->gpa);

s->dateOfBirth.year = 2525;
Note the last line: the dateOfBirth member variable is another structure, but not a
pointer to a structure, so we use the dot operator to access the year member variable.
...
As in the example, we can index each element in the array, roster . Once indexed, each
element is a regular structure and so we use the dot operator to access each of its member
variables. As with any other array, each element takes up a number of bytes, equal to
sizeof(Student) . We can swap and reassign each element just like any other variable,
for example, the following code swaps the first two elements using a temporary variable.
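A minimal sketch of such a swap. To keep the example self-contained it uses an abbreviated version of the Student structure, and the function name swapFirstTwo() is hypothetical.

```c
/* abbreviated version of the Student structure, for illustration */
typedef struct {
  int id;
  double gpa;
} Student;

/* Hypothetical helper: swaps the first two elements of the array;
   each assignment copies an entire structure. */
void swapFirstTwo(Student *roster) {
  Student temp = roster[0];
  roster[0] = roster[1];
  roster[1] = temp;
}
```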
Each of these operations copies over every byte that makes up the structure. For small
structures, this isn't that big of a deal. However, for larger structures, this may become
an issue, especially if we do this often or pass structures around to functions.
As an alternative, we could instead deal indirectly with structures by creating an array
of pointers to structures. Swapping elements then involves only copying pointer values
rather than every byte that makes up the structure. To do this, we use the familiar
pointer-to-pointer syntax.
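A sketch of an array of pointers to structures, again with an abbreviated Student structure. The helper names createRoster() and swapFirstTwo() are assumptions made for illustration.

```c
#include <stdlib.h>

/* abbreviated version of the Student structure, for illustration */
typedef struct {
  int id;
  double gpa;
} Student;

/* Hypothetical helper: creates an array of n pointers, each pointing
   to its own dynamically allocated structure. */
Student **createRoster(int n) {
  int i;
  Student **roster = (Student **) malloc(sizeof(Student *) * n);
  if(roster == NULL) {
    return NULL;
  }
  for(i=0; i<n; i++) {
    roster[i] = (Student *) malloc(sizeof(Student) * 1);
    if(roster[i] != NULL) {
      roster[i]->id = i;      /* arrow operator: roster[i] is a pointer */
      roster[i]->gpa = 0.0;
    }
  }
  return roster;
}

/* Swapping copies only two pointer values, not whole structures. */
void swapFirstTwo(Student **roster) {
  Student *temp = roster[0];
  roster[0] = roster[1];
  roster[1] = temp;
}
```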
As in the example above, if each element in the array is a pointer to a structure, then we
use the arrow operator to access each member variable.
In the first prototype, we would pass a Student structure by value to the function. As
In the first example, we've used the const keyword which prevents any changes to
the structure's values (we may omit this if we need to design a function that changes
its values). We will now consider several idiomatic examples of using structures with
functions.
Properly creating and initializing structure instances can be a complex and tedious
task. However, it is likely that we will need to repeat this operation over and over.
We can simplify our task if we write a function that creates a structure instance for us
while we provide the function the values we want it initialized to. Such functions are
sometimes referred to as factory functions as they can be used to manufacture as many
instances as we want (in object-oriented programming languages, such methods are called
constructors).
We will need to take care that we make deep copies of any dynamically allocated elements
such as strings. Shallow copies where references are shared may lead to unexpected
behavior as changes to one string may affect multiple structures.
/**
 * This function creates a new student structure with the
 * given values.
 */
Student * createStudent(const char *firstName,
                        const char *lastName,
                        int id,
                        double gpa) {
  Student *s = NULL;
  s = (Student *) malloc(sizeof(Student) * 1);

  //make deep copies of the strings
  s->firstName = (char *) malloc(sizeof(char) * (strlen(firstName) + 1));
  strcpy(s->firstName, firstName);
  s->lastName = (char *) malloc(sizeof(char) * (strlen(lastName) + 1));
  strcpy(s->lastName, lastName);

  s->id = id;
  s->gpa = gpa;

  return s;
}

/**
 * This function creates a new, deep copy of a Student
 * structure.
 */
Student * copyStudent(const Student *s) {
  return createStudent(s->firstName, s->lastName, s->id, s->gpa);
}
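/**
 * Returns a string representation of the given
 * Student structure.
 */
char * studentToString(const Student *s) {
  int n = strlen(s->firstName) +
          strlen(s->lastName) +
          8 + //id, assumed to always be at most 8 digits
          4 + //gpa, assumed to be 0.0 - 4.0
          19; //other formatting characters

  //the format used here is illustrative: it uses 18 formatting
  //characters plus the null-terminating character
  char *str = (char *) malloc(sizeof(char) * n);
  sprintf(str, "%s, %s (ID = %08d), GPA = %.2f",
          s->lastName, s->firstName, s->id, s->gpa);

  return str;
}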
Here, we've utilized a variation on the familiar printf() function, sprintf(), which
prints the result not to the standard output or a file, but to a string, specified as the
first argument.
/**
 * Computes the average GPA of the Student structures in
 * the given roster (which is of size n).
 */
double computeAverageGpa(const Student *roster, int n) {
  double sum = 0.0;
  int i;
  for(i=0; i<n; i++) {
    sum += roster[i].gpa;
  }
  return sum / n;
}
When we pass in the array of structures, it is not passed by value. That is, the total
number of bytes for each student is not copied onto the call stack. Nevertheless, as we've
previously seen, it is sometimes preferable to maintain an array of pointers to structures.
If we had such an array, we would instead need a function that looks something like this.
/**
 * Computes the average GPA of the Student structures in
 * the given roster (which is of size n).
 */
double computeAverageGpa(const Student **roster, int n) {
  double sum = 0.0;
  int i;
  for(i=0; i<n; i++) {
    sum += roster[i]->gpa;
  }
  return sum / n;
}
The only difference here is in how we access the gpa member variable using the arrow
operator instead of the dot operator.
24. Recursion
C supports recursion with no special syntax. However, in a structured, procedural
language such as C, recursion is generally expensive and iterative or other non-recursive
solutions are generally preferred. We present a few examples to demonstrate how to write
recursive functions in C.
The first example of a recursive function we gave was the toy count down example. In C
it could be implemented as follows.
void countDown(int n) {
  if(n==0) {
    printf("Happy New Year!\n");
  } else {
    printf("%d\n", n);
    countDown(n-1);
  }
}
As another example that actually does something useful, consider the following recursive
summation function that takes an array, its size and an index variable. The recursion
works as follows: if the index variable has reached the size of the array, it stops and returns
zero (the base case). Otherwise, it makes a recursive call to recSum() , incrementing
the index variable by 1. When the function returns, it adds its result to the i-th element
in the array. To invoke this function we would call it with an initial value of 0 for the
index variable: recSum(arr, n, 0) .
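The description above can be sketched as follows. The function name recSum() comes from the text; the parameter names are assumptions.

```c
/* Recursively sums the elements arr[i], arr[i+1], ..., arr[n-1]. */
int recSum(const int *arr, int n, int i) {
  if(i == n) {
    return 0;  /* base case: no elements left */
  } else {
    /* the addition happens after the recursive call returns */
    return recSum(arr, n, i+1) + arr[i];
  }
}
```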
This example was not tail-recursive as the recursive call was not the final operation (the
sum was the final operation). To make this function tail recursive, we can carry the
summation through to each function call ensuring that the summation is done prior to
the recursive function call.
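A sketch of a tail-recursive variant, assuming the running total is carried in an extra parameter. The name recSumTail() and its parameters are hypothetical. It would be invoked with initial values of 0 for both the index and the sum.

```c
/* Tail-recursive sum: the partial sum is computed before the
   recursive call, which is the final operation performed. */
int recSumTail(const int *arr, int n, int i, int sum) {
  if(i == n) {
    return sum;  /* base case: the total has been accumulated */
  } else {
    return recSumTail(arr, n, i+1, sum + arr[i]);
  }
}
```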
int fibonacci(int n) {
  if(n < 0) {
    return 0;
  } else if(n <= 1) {
    return 1;
  } else {
    return fibonacci(n-1) + fibonacci(n-2);
  }
}
It is the responsibility of the calling function to ensure that the table array is large
enough to accommodate all values (it should be at least of size (n + 1) to compute the
n-th Fibonacci number).
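One way such a table-based version might look is sketched below. The function name fibMemo(), the use of long values, and the convention that a zero entry means "not yet computed" are all assumptions; the base cases follow the fibonacci() function above.

```c
/* Hypothetical memoized version: table must have at least n+1 entries,
   all initialized to zero (meaning "not yet computed"). */
long fibMemo(int n, long *table) {
  if(n < 0) {
    return 0;
  } else if(n <= 1) {
    return 1;
  } else if(table[n] != 0) {
    return table[n];  /* already computed: reuse the stored value */
  }
  table[n] = fibMemo(n-1, table) + fibMemo(n-2, table);
  return table[n];
}
```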
But how do we implement this function? Obviously we want to compare the values stored
in the pointer variables, so we need to dereference them. However, the comparison,
(*a < *b) for example is not comparing integer values. The variables a and b are
void pointers, not int pointers. Thus, the first step is to make them int pointers by
doing an explicit type cast:
const int *x = (const int *)a;
const int *y = (const int *)b;
Now the variables x and y are int pointers which can be dereferenced and compared
(we preserve the keyword const to ensure we do not make changes to the variables).
Altogether, we have the full comparator function.
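A sketch of what the complete comparator might look like. The name cmpInt() appears in the text; returning exactly -1, 0, or 1 is an assumption (any negative, zero, or positive value would serve).

```c
/* Comparator that orders integers in ascending order. */
int cmpInt(const void *a, const void *b) {
  const int *x = (const int *)a;
  const int *y = (const int *)b;
  if(*x < *y) {
    return -1;
  } else if(*x > *y) {
    return 1;
  } else {
    return 0;
  }
}
```

Comparing explicitly, rather than returning the difference *x - *y, avoids integer overflow for values of large magnitude.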
What if we wanted to order integers in the opposite order? We could write another
comparator in which the comparisons or values are reversed. Even simpler, we could
reuse the comparator above and flip the sign by multiplying by -1 (that is, after all,
one of the purposes of writing functions: code reuse). Even simpler still, we could flip the
arguments we pass to cmpInt() to reverse the order.
int cmpIntDesc(const void *a, const void *b) {
  return cmpInt(b, a);
}
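/**
 * A comparator function to order Student structures by
 * last name/first name in alphabetic order
 */
int studentByNameCmp(const void *s1, const void *s2) {
  const Student *a = (const Student *)s1;
  const Student *b = (const Student *)s2;
  int result = strcmp(a->lastName, b->lastName);
  if(result == 0) {
    return strcmp(a->firstName, b->firstName);
  } else {
    return result;
  }
}

/**
 * A comparator function to order Student structures by
 * last name/first name in reverse alphabetic order
 */
int studentByNameCmpDesc(const void *s1, const void *s2) {
  return studentByNameCmp(s2, s1);
}

/**
 * A comparator function to order Student structures by
 * id in ascending numerical order
 */
int studentIdCmp(const void *s1, const void *s2) {
  const Student *a = (const Student *)s1;
  const Student *b = (const Student *)s2;
  if(a->id < b->id) {
    return -1;
  } else if(a->id > b->id) {
    return 1;
  } else {
    return 0;
  }
}

/**
 * A comparator function to order Student structures by
 * GPA in descending order
 */
int studentGpaCmp(const void *s1, const void *s2) {
  const Student *a = (const Student *)s1;
  const Student *b = (const Student *)s2;
  if(a->gpa > b->gpa) {
    return -1;
  } else if(a->gpa < b->gpa) {
    return 1;
  } else {
    return 0;
  }
}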
The main use for function pointers is so that references to functions can be passed as
parameters to other functions. The passed function is known as a callback. This gives us
the ability to write a more abstract and generic function.
For example, in GUI programming, we frequently need to associate a particular function
with a particular event. For example, suppose we create a button; we need to be able to
specify what happens when that button gets clicked. We do so by providing a function as
a callback to a registration function that associates the click event with the provided
function. Thus, whenever a user clicks the button, the callback is invoked (called back).
Let's illustrate some syntax usage with another example. Suppose we want to create a
getMax() function. We could write one function for arrays of integers, another for arrays
of double s, another for Student arrays that gets the student with the maximum GPA,
then another for the ID, then another for the name and so on. Or, we could program
a generic getMax() function that could be used for any type by taking a comparator
function as a callback. To illustrate, consider the following non-generic function for
integers.
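A sketch of what such a non-generic function might look like. Returning the index of the maximum (rather than its value) and the variable name max_index are assumptions, chosen to match the generic version discussed later; the array is assumed to contain at least one element.

```c
/* Returns the index of the largest value in arr, which has n elements
   (n is assumed to be at least 1). */
int getMax(const int *arr, int n) {
  int i, max_index = 0;
  for(i=1; i<n; i++) {
    /* the less-than operator works because the elements are ints */
    if(arr[max_index] < arr[i]) {
      max_index = i;
    }
  }
  return max_index;
}
```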
This simple function iterates through the array, keeping track of the maximum value
found so far and updating it when it finds something larger. Because we've specified
that arr is an array of int values, we can use the less-than comparison operator (line
4). Now let's make it more generic: rather than taking an array of int values, it will
now take a generic void array. Further, we will pass a function pointer to this function
that references a generic comparator function that can be used to replace the less-than
comparison operator.
There are a couple of issues that we have to deal with here. When working with
generic void * pointers in C and using arrays, you cannot simply index using the
usual 0, 1, 2, etc. indices. Recall that when elements are stored in an array, the index
represents an offset of a memory address. If the array is an array of integers or doubles
or some other built-in type, the compiler knows how large each one is and is able to
compute the appropriate offset given the usual 0, 1, 2, etc. indices.
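However, when dealing with void * elements, a function must be told how many bytes
each element takes, say size. Then each element can be located by multiplying its
index by the size. That is, the element that would ordinarily be written arr[i] begins
i * size bytes past the start of the array, so arr[0] is at offset 0, arr[1] at offset
size, arr[2] at offset 2 * size, and so on.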
Thus, we also need to provide this size to our getMax() function. Once we do, we can use
the comparator to determine which is the larger of two elements in the array (indexed
using the scheme above). To do this, we call the comparator on the maximum element
we've found so far and the i-th element in the loop. If it returns something negative,
then we know that the max element is less than the i-th element and so update our
max_index variable. Making these changes results in this final version.
In this example, the getMax() function would return the index 3 and print the maximum
value stored there, 10. The getMax() function returns the index corresponding to
the maximum element according to the comparator used. If instead we had used
cmpIntDesc() the maximum element would have been the least element, (2 in the
example above) because it would have been the element ordered last by the descending
comparator.
Consider another example with our Student structure.
int n = 10;
Student *roster = (Student *) malloc(sizeof(Student) * n);
...
#include<stdio.h>
#include<stdlib.h>
...
int i = 5;
double d = 3.14;
char c = 'Q';
...
//assignment
pt2Func01 = function01;
//or:
pt2Func01 = &function01;
pt2Func02 = &function02;
...
//With function pointers, you can now pass entire functions as arguments to another function!
printf("Calling runAFunction...\n");
runAFunction(pt2Func01);
//we should not pass in the second pointer as it would not match the signature:
//syntactically okay, compiler warning, undefined behavior
//runAFunction(pt2Func02);
...
return;
...
25.3.1. Searching
Linear Search
The C search library, search.h provides a linear search function to search arrays, named
lfind() (linear find). This function does not require that the array be sorted and
performs a linear search algorithm, returning a pointer to the first element such that
the comparator returns 0 (indicating equality). The full signature of the function is as
follows.
Binary Search
In the standard library ( stdlib.h ) there is an additional binary search function that
can be used to more efficiently search a sorted array. The prototype:
void *bsearch(const void *key,
const void *base,
size_t nmemb,
size_t size,
int (*compar)(const void *, const void *));
The parameters are as with lfind(), except that bsearch() takes the number of elements,
nmemb, by value rather than through a pointer. The behavior is also similar: it returns a
pointer to the first matching element that it finds (though first does not necessarily mean
the first in the order of the array) and NULL if no matching element is found.
It is an important requirement that the array be sorted with the same comparator that
is passed to bsearch(); otherwise NULL may be returned erroneously.
25.3.2. Sorting
The standard library ( stdlib.h ) also provides a generic sorting function, qsort() .
Though the name suggests a Quick Sort implementation, it does not necessarily have to
be (it was when the function was designed). Modern implementations of qsort() may
implement alternatives such as Merge Sort or non-recursive hybrid Quick Sort algorithms.
The prototype and parameters are similar to the search functions but do not include a
key. The array is also not const indicating that it will be changed (which is the whole
point of calling the function).
void qsort(void *base,
size_t nmemb,
size_t size,
int (*compar)(const void *, const void *));
The advantages to using qsort() (as well as lfind() and bsearch() ) should be
clear. There is no need to write a new function that reimplements the same algorithm for
every possible ordering of every possible user-defined structure. We need only to create a
comparator function and pass it to qsort() . There is less code, and less chance of bugs.
The qsort() function is well-designed, optimized, and most importantly well-tested
and proven.
These functions represent a sort of weak form of polymorphic behavior found in
more modern object-oriented programming languages and other languages that support
generic programming. Polymorphism is the characteristic that the same code can be
executed on different types, greatly reducing the need for duplicate code.
25.3.3. Examples
We illustrate the usage of these functions in Code Samples 25.2 and 25.3.
#include<stdio.h>
#include<stdlib.h>
#include<search.h>

#include "student.h"
...
int n = 0;
Student *roster = loadStudents("student.data", &n);
int i;
size_t numElems = n;
...
printf("Roster: \n");
printStudents(roster, n);
...
/* Searching */
Student *castro = NULL;
Student *castroKey = NULL;
Student *sandberg = NULL;
char *str = NULL;
...
return 0;
...
#include<stdio.h>
#include<stdlib.h>

#include "student.h"
...
int n = 0;
Student *roster = loadStudents("student.data", &n);
int i;
size_t numElems = n;
...
printf("Roster: \n");
printStudents(roster, n);
...
return 0;
...
}
Code Sample 25.3: C Sort Examples
Observe the behavior of this function: it uses the standard strcmp function, but makes
the proper explicit type casting before doing so. The *(char * const *) casts the
generic void pointers as pointers to strings (or pointers to pointers to characters), then
dereferences it to be compatible with strcmp .
Another case is when we wish to sort user-defined structures. The Student structure
presented earlier is small in that it only has a few fields. When structures are stored
in an array and sorted, there may be many swaps of individual elements which involves
a lot of memory copying. If the structures are small this is not too bad, but for larger
structures this could be potentially expensive. Instead, it may be preferred to have
an array of pointers to structures. Swapping elements involves only swapping pointers
instead of the entire structure. This is far cheaper as a memory address is likely to be
far smaller than the actual structure it points to. This is essentially equivalent to the
Another issue when sorting arrays of pointers is that we may now have to deal with NULL
elements. When sorting arrays of elements this is not an issue as a properly initialized
array will contain non-null elements (though elements could still be uninitialized, the
memory space is still valid).
How we handle NULL pointers is more of a design decision. We could ignore it and any
attempt to access a NULL structure will result in undefined behavior (or segmentation
faults, etc.). Or we could give NULL values an explicit ordering with respect to other
elements. That is, we could order all NULL pointers before non- NULL elements (and
consider all NULL pointers to be equal). An example with respect to our Student
structure is given in Code Snippet 25.6.
Part II.
The Java Programming Language
26. Basics
The Java programming language was developed in the early 1990s at Sun Microsystems
by James Gosling, Mike Sheridan, and Patrick Naughton. Its original intention was
to enable cable set-top boxes to be more interactive. By the mid-90s, Java was retargeted
toward the WWW. The first public release came on May 23, 1995, with the first Java
Development Kit (JDK), Java 1.0, following on January 23rd, 1996. A new, updated release has
come about every other year. As of 2014, Java 8 is the current stable version.
Today, Java is one of the most popular programming languages, consistently ranked
as one of the top 2 languages (see http://www.tiobe.com). It is now owned and
maintained by Oracle, but there are many open source tools, compilers and runtime
environments available. Java is used in everything from mobile devices (Android) and
desktop applications to enterprise-level servers.
From its inception, Java was designed with 5 basic principles:
1. Simple, Object-oriented, familiar
2. Robust and secure
3. Architecture-neutral and portable
4. High performance
5. Interpreted, threaded and dynamic
Java offers many key features that have made it popular. It is unusual in that it
is neither entirely compiled nor entirely interpreted. Instead, Java source code is compiled into
an intermediate form, called Java bytecode. This bytecode is not directly runnable
on a processor. Instead, a Java Virtual Machine (JVM), an application that was written and
compiled for a particular processor, interprets the bytecode and runs the application. This added layer
of abstraction means that Java source code can be written once (and compiled once)
and then run on any device that has a JVM. The added layer of abstraction
makes development easier, but does come at a cost in performance. However, the most
recent JVMs have offered performance that is comparable to native machine code in
many applications.
Another key feature is that Java has its own automated garbage collection. Some
languages require manual memory management, meaning that requesting, managing, and
freeing up memory is part of the code that you write as a developer. Failure to handle
memory management properly can lead to wasted resources (memory leaks), poor or
unstable performance, and even more serious security issues (buffer overflows). In Java,
there is no manual memory management. The JVM handles the allocation and clean up
of memory automatically.
In keeping with the five design principles, Java is similar in syntax to C (called C-style
syntax). Executable statements are terminated by semicolons, code blocks are defined
by opening/closing curly brackets, etc. Java is also fundamentally an Object-Oriented
Programming (OOP) language. With the exception of a few primitive types, in Java,
everything is an object or belongs to an object.
package unl.cse; //package declaration
...
/**
 * A basic hello world program in Java
 */
public class HelloWorld {
...
}
Code Sample 26.1: Hello World Program in Java
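Because only part of the listing is reproduced here, the following is a minimal sketch of what a complete hello world program might look like. The class name HelloWorldSketch and the helper method greeting() are illustrative assumptions rather than the book's listing; the package declaration and the public modifier are omitted so the file can be compiled from any directory under any file name.

```java
/**
 * A minimal hello world sketch (hypothetical class name).
 */
class HelloWorldSketch {

    // Hypothetical helper: builds the message so it can be reused
    static String greeting() {
        return "Hello World!";
    }

    // The main method is the entry point of an executable class
    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

Compiling this file with javac and then running the resulting class with java is the first step described in the text.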
We will not focus on any particular development environment, code editor, or any
particular operating system, compiler, or ancillary standards in our presentation. However,
as a first step, you should be able to write, compile, and run the above program
In general, whitespace between coding elements is ignored.
Though not a syntactic requirement, the proper use of whitespace is important for good,
readable code. Code inside a code block is indented at the same level. Nested
code blocks are indented further. Think of a typical table of contents or the outline of a
formal paper or essay. Sections and subsections or points and subpoints all follow proper
indentation with elements at the same level at the same indentation. This convention is
used to organize code and make it more readable.
Packages
Java code is organized into modules called packages. Packages are essentially directories
(or folders) which follow a directory tree structure which allows subdirectories and
separate directories at the same level. It all starts at the root directory called the
default package.
Within a source file, we declare which package the file belongs to using the keyword
package followed by a fully qualified package declaration which is essentially just the
names of the directories that the file is located in, separated by a period. The declaration
is terminated by a semicolon. For example, the package declaration,
package unl.cse;
Imports
An import statement essentially brings in another class so that its methods and
functionality can be used. For example, there is a class named Scanner (located in
the package java.util) that makes it easy to read input from the standard input. To
include it in our program so that we can use its functionality, we would need¹ to import
it:
import java.util.Scanner;
Classes in the package java.lang (such as String and Math ) are considered standard
and are imported by default without an explicit import statement.
You may see some code that uses a wildcard like import java.util.*; which makes
every class in that package available by its simple name. This is generally considered bad
practice. In general, code should be intentional and specific; importing every class even if
most are not used goes against this principle.
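As a small sketch of the difference between importing a class and using its fully qualified name, consider the following. The class name ImportSketch and both method names are hypothetical, and the Scanner here reads from a string rather than the standard input so that the example runs unattended.

```java
import java.util.Scanner;

class ImportSketch {

    // Uses the simple name Scanner, made available by the import above
    static int readWithImport(String input) {
        Scanner s = new Scanner(input);
        int value = s.nextInt();
        s.close();
        return value;
    }

    // Uses the fully qualified name, which works even without an import
    static int readFullyQualified(String input) {
        java.util.Scanner s = new java.util.Scanner(input);
        int value = s.nextInt();
        s.close();
        return value;
    }

    public static void main(String[] args) {
        System.out.println(readWithImport("42"));
        System.out.println(readFullyQualified("42"));
    }
}
```

Both methods behave identically; the import only saves typing the package name each time the class is mentioned.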
When naming packages, you must follow the general naming rules for identifiers (see below).
Package names cannot begin with a number, cannot contain whitespace, etc. Moreover, the
general convention for package names is to use lowercase underscore casing, here_is_an_example.
Moreover, packages and subpackages follow the same convention as directories: the topmost
directory is the most general and subdirectories are more and more specific.
In many of our examples we'll use unl.cse (UNL, University of Nebraska-Lincoln; CSE,
Department of Computer Science & Engineering) which illustrates this general-to-specific
organization.
¹ Strictly speaking, you can still use it without importing it, but you'd need to use a fully qualified
name at declaration/instantiation.
26.2.4. Comments
Comments can be written in a Java program either as a single line using two forward
slashes, //comment, or as a multiline comment using a combination of forward slash
and asterisk: /* comment */. With a single line comment, everything on the line after
the forward slashes is ignored. With a multiline comment, everything in between the
opening /* and the closing */ is ignored. Comments are ultimately ignored by the compiler so
the number of comments has no effect on the final executable code. Consider
the following example.
/*
This is a comment that can
span multiple lines to format the comment
message more clearly
*/
double y;
Most code editors and IDEs will present comments in a special color or font to distinguish
them from the rest of the code (just as our example above does). Failure to close a
multiline comment will likely result in a compiler error but with color-coded comments
it's easy to see the mistake visually.
Another common comment style convention is the Javadoc (Java Documentation) style
of comments. Javadoc style comments are multiline comments that begin with /**
(that is, two asterisks). The Javadoc framework allows you to mark up your comments
with tags and links so that documentation can be automatically generated and published.
We will sometimes use this style, but we will not cover the details. For documentation,
see Oracle's website, http://www.oracle.com/.
26.3. Variables
Java has 8 built-in primitive types supporting numbers (integers and floating-point
numbers), Booleans, and characters. Table 26.1 contains a complete description of these
types. Each of these primitive types also has a corresponding wrapper class defined in
the java.lang package. Wrapper classes provide object versions of each of these types.
The object versions have many utility methods that can be used in relation to their type.
For example, the aforementioned Integer.parseInt() method is part of the Integer
wrapper class.
The wrapper classes, however, are different. These are objects, so when a reference is
declared for them, by default, that reference refers to null. The keyword null is used
to indicate a special memory address that represents nothing. In fact, the default value
for any object type is null. Care must be taken when mixing primitive types and their
wrapper classes (see below) as null references may result in a NullPointerException.
Finally, instances of the wrapper classes are immutable. Once they are created, they
cannot be changed. References can be made to reference a different object, but the
object's value cannot be changed.
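The points about wrapper classes can be sketched as follows. The class name WrapperSketch, its field, and its method are hypothetical and chosen only for illustration.

```java
class WrapperSketch {

    // An instance variable of a wrapper type defaults to null
    Integer count;

    // Integer.parseInt is a utility method of the Integer wrapper class
    static int parse(String text) {
        return Integer.parseInt(text);
    }

    public static void main(String[] args) {
        WrapperSketch w = new WrapperSketch();
        // The reference does not refer to any object yet
        System.out.println(w.count == null);

        Integer a = 10;
        Integer b = a;   // a and b refer to the same Integer object
        a = a + 1;       // a now refers to a different Integer; b's object is unchanged
        System.out.println(a + " " + b);

        System.out.println(parse("123") + 1);
    }
}
```

Note that the line a = a + 1 does not modify the original object; it makes a refer to another object, which is what immutability means here.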
Type      Description                              Wrapper Class
byte      8-bit signed 2's complement integer      Byte
short     16-bit signed 2's complement integer     Short
int       32-bit signed 2's complement integer     Integer
long      64-bit signed 2's complement integer     Long
float     32-bit IEEE 754 floating point number    Float
double    64-bit floating point number             Double
boolean   may be set to true or false              Boolean
char      16-bit Unicode (UTF-16) character        Character
int numUnits;
double costPerUnit;
char firstInitial;
boolean isStudent;
Each declaration specifies the variable's type followed by the identifier and ending with
a semicolon. The identifier rules are fairly standard: a name can consist of lower and
uppercase alphabetic characters, numbers, and underscores but may not begin with a
numeric character. We adopt the modern camelCasing naming convention for variables
in our code.
In general, variables must be assigned a value before you can read them (say, by printing
them) or otherwise use them in an expression. You do not have to immediately assign
a value when you declare them (though it is good practice), but some value must be
assigned before they can be used or the compiler will issue an error.²
The assignment operator is a single equal sign, =, and is a right-to-left assignment. That
is, the variable that we wish to assign the value to appears on the left-hand side while
the value (literal, variable or expression) is on the right-hand side. Using our variables
from before, we can assign them values:
² Instance variables, that is, variables declared as part of an object, do have default values. For objects,
the default is null; for all numeric types, zero is the default value. For the boolean type, false
is the default, and the default char value is \0, the null character (zero in the ASCII
table).
numUnits = 42;
costPerUnit = 32.79;
firstInitial = 'C';
isStudent = true;
For brevity, Java allows you to declare a variable and immediately assign it a value on
the same line. So these two code blocks could have been more compactly written as:
int numUnits = 42;
double costPerUnit = 32.79;
char firstInitial = 'C';
boolean isStudent = true;
As another shorthand, we can declare multiple variables on the same line by delimiting
them with a comma. However, they must be of the same type. We can also use an
assignment with them.
Another convenient keyword is final . Though it has several uses, when applied to a
variable declaration, it makes it a read-only variable. After a value has been assigned to
a final variable, its value cannot be changed.
Any attempt to reassign the values of final variables will result in a compiler error.
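The declaration shorthands and the final keyword described above might be used as in the following sketch. The class name DeclarationSketch, the method names, and the tax rate value are assumptions made only for illustration.

```java
class DeclarationSketch {

    // Declares and initializes each variable in a single statement
    static double orderTotal() {
        int numUnits = 42;
        double costPerUnit = 32.79;
        return numUnits * costPerUnit;
    }

    // Declares several variables of the same type in one statement
    static int sum() {
        int a = 10, b = 20, c;   // c is declared but not yet assigned
        c = a + b;
        return c;
    }

    // Uses a read-only (final) variable
    static double withTax(double amount) {
        final double taxRate = 0.07;
        // taxRate = 0.08;       // uncommenting this line causes a compiler error
        return amount * (1 + taxRate);
    }

    public static void main(String[] args) {
        System.out.println(orderTotal());
        System.out.println(sum());
        System.out.println(withTax(100.0));
    }
}
```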
26.4. Operators
Java supports the standard arithmetic operators for addition, subtraction, multiplication,
and division using +, -, *, and / respectively. Each of these operators is a binary
operator that acts on two operands, which can be literals or variables, and
follows the usual rules of arithmetic when it comes to order of precedence (multiplication
and division before addition and subtraction).
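A brief sketch of how precedence affects a result follows. The class and method names are hypothetical.

```java
class ArithmeticSketch {

    // Multiplication is performed before addition
    static int withoutParentheses(int a, int b, int c) {
        return a + b * c;
    }

    // Parentheses force the addition to be performed first
    static int withParentheses(int a, int b, int c) {
        return (a + b) * c;
    }

    public static void main(String[] args) {
        System.out.println(withoutParentheses(2, 3, 4)); // 2 + 12 = 14
        System.out.println(withParentheses(2, 3, 4));    // 5 * 4 = 20
    }
}
```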
...
d = a / b;
...
d = (int) (b + y);
...
In addition, you can mix the wrapper classes with their primitive types. You must be
careful though. The wrapper classes are object references which can be null . If a null
reference is used in an arithmetic expression, it will result in a NullPointerException
which can be caught and handled. If not caught, it will end up being a fatal error. Some
examples:
int a = 10, c;
Integer b = 20;
...
double x = 3.14, z;
Double y = 2.71;
//double and Double can be mixed:
z = x + y;
...
//Be careful:
Integer d = null;
c = a + d; //NullPointerException
This works because of a mechanism called autoboxing (or auto-unboxing in this case). The
wrapper class is acting like a box: it's an object that stores the value of a primitive type.
When it gets used in an arithmetic expression, it gets unboxed and converted to a
primitive type so that the arithmetic operation is performed on compatible primitive types.
This is all done by the compiler and is completely transparent to us. However, that is the
reason that we may get a NullPointerException. Our code actually gets converted
from c = a + d; to c = a + d.intValue();. The intValue method returns
an int primitive value. However, if d is null, you can't call a method on it; thus
the NullPointerException.
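The unboxing behavior described above can be sketched as follows. The class name UnboxingSketch and its methods are hypothetical; the try-catch block is used here only to show that the exception can be caught.

```java
class UnboxingSketch {

    // Adds a primitive and a wrapper; the wrapper is unboxed automatically
    static int add(int a, Integer b) {
        return a + b;   // behaves like a + b.intValue()
    }

    // Returns true if the addition fails because the wrapper is null
    static boolean failsOnNull(int a, Integer b) {
        try {
            add(a, b);
            return false;
        } catch (NullPointerException e) {
            // Unboxing a null reference has no value to extract
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(add(10, 20));
        System.out.println(failsOnNull(10, null));
    }
}
```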
Special care must be taken when dealing with int types. For all four operators, if
both operands are integers, the result will be an integer. For addition, subtraction, and
multiplication this isn't an issue, but for division it means that when we divide, say
(10 / 20), the result is not 0.5 as expected. The number 0.5 is a floating-point number.
As such, the fractional part gets truncated (cut off and thrown out) leaving
only zero. In the code above, d = a / b; the variable d ends up getting the value
zero because of this.
A solution to this problem is to use explicit type casting to force at least one of the
operands in an integer division to become a double type. For example:
x = (double) a / b;
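To make the effect of the cast concrete, here is a small sketch contrasting three variations. The class and method names are hypothetical.

```java
class DivisionSketch {

    // Both operands are int, so the fractional part is truncated
    static int integerQuotient(int a, int b) {
        return a / b;
    }

    // Casting one operand first forces floating-point division
    static double realQuotient(int a, int b) {
        return (double) a / b;
    }

    // Casting after the division is too late: truncation already happened
    static double castTooLate(int a, int b) {
        return (double) (a / b);
    }

    public static void main(String[] args) {
        System.out.println(integerQuotient(10, 20));
        System.out.println(realQuotient(10, 20));
        System.out.println(castTooLate(10, 20));
    }
}
```

The third method shows why the position of the cast matters: the cast must apply to an operand, not to the result of the integer division.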
Assigning a floating-point number to an integer variable is not allowed in Java and attempting to
do so will be treated as a compiler error. This is because Java does not perform implicit
type casts that could lose information. However, you can do so if you provide an explicit type cast as in the code
above,
d = (int) (b + y);
In this code, b + y is correctly computed as 20 + 3.4 = 23.4, but the explicit type cast
(down to an integer) results in truncation. The .4 gets cut off and d gets the value 23.
Assigning an int value to a double variable is not a problem as the integer 2 becomes
the floating-point number 2.0.
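The two directions of conversion can be sketched as follows. The class and method names are hypothetical.

```java
class TruncationSketch {

    // An explicit cast from double to int discards the fractional part
    static int truncate(double value) {
        return (int) value;
    }

    // An int can be assigned to a double without any cast
    static double widen(int value) {
        double result = value;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(truncate(20 + 3.4));  // fractional part is dropped
        System.out.println(truncate(-23.4));     // truncation is toward zero
        System.out.println(widen(2));
    }
}
```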
Java also supports the integer remainder operator using the % symbol. This operator
gives the remainder of the result of dividing two integers. Examples:
int x;
x = 10 % 5; //x is 0
x = 10 % 3; //x is 1
x = 29 % 5; //x is 4
s.nextDouble() to get a double , etc. When these methods are called, the program
blocks until the user enters her input and presses the enter/return key. The conversion to
the type you requested is automatic. A full example is depicted in Code Sample 26.2.
One potential problem with using Scanner is that the methods cannot force a user
to enter good input. In the example above, if the user, instead of entering a number,
entered "Hello", the conversion to a number would fail. This would result in an
InputMismatchException.
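One possible way to observe this failure without crashing is sketched below. The class name InputSketch, the method parseOrDefault, and the fallback value are assumptions for illustration; the Scanner reads from a string so the example runs unattended, and the locale is fixed so that a period is treated as the decimal separator.

```java
import java.util.InputMismatchException;
import java.util.Locale;
import java.util.Scanner;

class InputSketch {

    // Tries to read a double from the given text; returns fallback on bad input
    static double parseOrDefault(String input, double fallback) {
        Scanner s = new Scanner(input).useLocale(Locale.US);
        try {
            return s.nextDouble();
        } catch (InputMismatchException e) {
            // The next token could not be converted to a double
            return fallback;
        } finally {
            s.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("98.6", -1.0));
        System.out.println(parseOrDefault("Hello", -1.0));
    }
}
```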
26.6. Examples
26.6.1. Converting Units
Let's start with a simple task: let's write a program that will prompt the user to enter a
temperature in degrees Fahrenheit and convert it to degrees Celsius using the formula
$$C = (F - 32) \cdot \frac{5}{9}$$
We begin with the basic program outline which will include a package and class declaration.
We'll also need to read from the standard input, so we'll import the Scanner class.
We'll want our class to be executable, so we need to put a main method in our
class. Finally, we'll document our program to indicate its purpose.
package unl.cse;

import java.util.Scanner;

/**
 * This program converts Fahrenheit temperatures to
 * Celsius
 */
public class TemperatureConverter {
...
It is common for programmers to use a comment along with a TODO note to themselves
as a reminder of things that they still need to do with the program.
Let's first outline the basic steps that our program will go through:
1. We'll first prompt the user for input, asking them for a temperature in Fahrenheit
2. Next we'll read the user's input, likely into a floating-point number as degrees can
be fractional
3. Once we have the input, we can calculate the degrees Celsius by using the formula
above
4. Lastly, we will want to print the result to the user to inform them of the value
Sometimes it's helpful to write an outline of such a program directly in the code using
comments to provide a step-by-step process. For example:
package unl.cse;

import java.util.Scanner;

/**
 * This program converts Fahrenheit temperatures to
 * Celsius
 */
public class TemperatureConverter {
...
//... input in Fahrenheit
//... value from the standard input
//... Celsius
//... the user
...
As we read each step it becomes apparent that we'll need a couple of variables: one to
hold the Fahrenheit (input) value and one for the Celsius (output) value. It also makes
sense that each of these should be double variables as we want to support fractional
values. So at the top of our main method, we'll add the variable declarations:
double fahrenheit, celsius;
We'll also need a scanner, initialized to read from the standard input:
Scanner s = new Scanner(System.in);
Each of the steps is now straightforward; we'll use a System.out.println statement in
the first step to prompt the user for input:
System.out.println("Please enter degrees in Fahrenheit: ");
In the second step, we'll use our Scanner to read in a value from the user for the
fahrenheit variable. Recall that we use the method s.nextDouble() to read a
double value from the user.
fahrenheit = s.nextDouble();
We can now compute celsius using the formula provided:
celsius = (fahrenheit - 32) * (5 / 9);
Finally, we use System.out.printf to output the result to the user:
System.out.printf("%f Fahrenheit is %f Celsius\n", fahrenheit, celsius);
Try typing and running the program as defined above and you'll find that you don't get
correct answers. In fact, you'll find that no matter what values you enter, you get zero.
This is because of the calculation using (5 / 9): recall what happens with integer
division: truncation! This will always end up being zero.
One way we could fix it would be to pull out our calculators and find that 5/9 = 0.55555...
and replace (5 / 9) with 0.555555. But, how many fives? It may be difficult to tell
how accurate we can make this floating-point number by hardcoding it ourselves. A
much better approach would be to let the compiler take care of the computation
for us by making at least one of the numbers a double to prevent integer truncation.
That is, we should instead use 5.0 / 9.
The full program can be found in Code Sample 26.3.
package unl.cse;

import java.util.Scanner;
...
}
Code Sample 26.3: Fahrenheit-to-Celsius Conversion Program in Java
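The conversion logic discussed above can be sketched on its own as follows. This is not the book's Code Sample 26.3: the class name TemperatureSketch and the two method names are hypothetical, and main uses a fixed sample value instead of reading from the standard input so that the sketch runs unattended.

```java
class TemperatureSketch {

    // Buggy version: (5 / 9) is integer division and evaluates to 0
    static double buggyToCelsius(double fahrenheit) {
        return (fahrenheit - 32) * (5 / 9);
    }

    // Corrected version: 5.0 / 9 forces floating-point division
    static double toCelsius(double fahrenheit) {
        return (fahrenheit - 32) * (5.0 / 9);
    }

    public static void main(String[] args) {
        double fahrenheit = 212.0;   // sample input, assumed for illustration
        System.out.printf("%f Fahrenheit is %f Celsius (buggy)\n",
                fahrenheit, buggyToCelsius(fahrenheit));
        System.out.printf("%f Fahrenheit is %f Celsius\n",
                fahrenheit, toCelsius(fahrenheit));
    }
}
```

Separating the computation into a method is a design choice made here so that the formula can be checked independently of the input and output code.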
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
As before, we can create a basic program with a main method and start filling in the
details. In particular, we'll need to prompt for the input a, then read it in; then prompt
for b, read it in, and repeat for c. We'll also need several variables: three for the coefficients
a, b, c and two more, one for each root. Thus, we have
double a, b, c, root1, root2;
Scanner s = new Scanner(System.in);
Now to compute the roots: we need to take care that we correctly adapt the formula so
it accurately reflects the order of operations. We also need to use the standard math
library's square root function (unless you want to write your own!). Carefully adapting
the formula leads to
root1 = (-b + Math.sqrt(b*b - 4*a*c) ) / (2*a);
root2 = (-b - Math.sqrt(b*b - 4*a*c) ) / (2*a);
Finally, we print the output using System.out.printf. The full program can be found
in Code Sample 26.4.
This program was interactive. As an alternative, we could have read all three of the inputs
as command line arguments, taking care that we need to convert them to floating-point
numbers. Lines 16-21 in the program could have been changed to
a = Double.parseDouble(args[0]);
b = Double.parseDouble(args[1]);
c = Double.parseDouble(args[2]);
Finally, think about the possible inputs a user could provide that may cause problems
for this program. For example:
• What if the user entered zero for a?
• What if the user entered some combination such that b^2 < 4ac?
• What if the user entered non-numeric values?
• For the command line argument version, what if the user provided fewer than three
  arguments? Or more?
How might we prevent the consequences of such bad inputs? That is, how might we
handle the event that a user enters those bad inputs and how do we communicate these
errors to the user? To do so we'll need conditionals.
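To see what the first two kinds of bad input actually produce, the following sketch applies the same root formulas to a few coefficient sets. The class name QuadraticSketch and the method names root1 and root2 are hypothetical; the formulas are the ones given earlier.

```java
class QuadraticSketch {

    // Root using the + branch of the quadratic formula
    static double root1(double a, double b, double c) {
        return (-b + Math.sqrt(b * b - 4 * a * c)) / (2 * a);
    }

    // Root using the - branch of the quadratic formula
    static double root2(double a, double b, double c) {
        return (-b - Math.sqrt(b * b - 4 * a * c)) / (2 * a);
    }

    public static void main(String[] args) {
        // x^2 - 3x + 2 = 0 has two real roots
        System.out.println(root1(1, -3, 2) + ", " + root2(1, -3, 2));

        // b^2 < 4ac: the square root of a negative number is NaN
        System.out.println(root1(1, 0, 1));

        // a = 0: floating-point division by zero gives NaN or an infinity,
        // not an exception
        System.out.println(root1(0, 2, 1) + ", " + root2(0, 2, 1));
    }
}
```

In both bad cases the program keeps running and silently produces values that are not roots, which is why explicit checks are needed.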
package unl.cse;

import java.util.Scanner;

/**
 * This program computes the roots to a quadratic equation
 * using the quadratic formula.
 */
public class QuadraticRoots {
...
}
Code Sample 26.4: Quadratic Roots Program in Java
27. Conditionals
Java supports the basic if, if-else, and if-else-if conditional structures as well as switch
statements. Java has Boolean types and logical statements are built using the standard
comparison operators for numeric types as well as logical operators such as negation,
And, and Or that can be used with Boolean types.
int a = 10;
boolean b = true;
boolean result = (a || b); //compilation error
The standard numeric comparison operators are also supported. Consider the following
code snippet:
int a = 10;
int b = 20;
int c = 10;
boolean x = true;
boolean y = false;
The six standard comparison operators are presented in Table 27.1 using these variables
as examples. The comparison operators are the same when used with double types,
and int types and double types can be compared with each other without type
casting.
Furthermore, because of autoboxing and unboxing, the wrapper classes for numeric types
can be compared with primitive values using the same operators. For example:
Name                       Operator Syntax   Examples   Value
Equals                     ==                a == 10    true
                                             b == 10    false
                                             a == b     false
                                             a == c     true
Not Equals                 !=                a != 10    false
                                             b != 10    true
                                             a != b     true
                                             a != c     false
Less Than                  <                 a < 15     true
                                             a < 5      false
                                             a < b      true
                                             a < c      false
Less Than or Equal To      <=                a <= 15    true
                                             a <= 5     false
                                             a <= b     true
                                             a <= c     true
Greater Than               >                 a > 15     false
                                             a > 5      true
                                             a > b      false
                                             a > c      false
Greater Than or Equal To   >=                a >= 15    false
                                             a >= 5     true
                                             a >= b     false
                                             a >= c     true
Name       Operator Syntax   Examples     Values
Negation   !                 !x           false
                             !y           true
And        &&                x && true    true
                             x && y       false
Or         ||                x || false   true
                             x || y       true
int a = 10;
Integer b = 20;
Double x = 3.14;
boolean r;
r = (a < b);
r = (a >= b);
r = (x == 2.71);
The three basic logical operators are also supported as described in Table 27.2 using the
same code snippet variable values as examples.
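A short sketch combining comparison and logical operators follows. The class name ComparisonSketch and its method names are hypothetical.

```java
class ComparisonSketch {

    // An int compared with an Integer: the wrapper is unboxed first
    static boolean lessThan(int a, Integer b) {
        return a < b;
    }

    // A Double compared with a double value: also unboxed before comparing
    static boolean equalsValue(Double x, double value) {
        return x == value;
    }

    // Two comparisons combined with logical And
    static boolean inRange(int value, int low, int high) {
        return low <= value && value <= high;
    }

    // Negation of the range test
    static boolean outOfRange(int value, int low, int high) {
        return !inRange(value, low, high);
    }

    public static void main(String[] args) {
        System.out.println(lessThan(10, 20));
        System.out.println(equalsValue(3.14, 2.71));
        System.out.println(inRange(10, 5, 15));
        System.out.println(outOfRange(10, 5, 15));
    }
}
```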
         Operator(s)             Associativity   Notes
Highest  ++, --                  left-to-right   postfix increment operators
         -, !                    right-to-left   unary negation operator, logical not
         *, /, %                 left-to-right
         +, -                    left-to-right   addition, subtraction
         <, <=, >, >=            left-to-right   comparison
         ==, !=                  left-to-right   equality, inequality
         &&                      left-to-right   logical And
         ||                      left-to-right   logical Or
Lowest   =, +=, -=, *=, /=       right-to-left   assignment and compound assignment operators
Table 27.3.: Operator Order of Precedence in Java. Operators on the same level have
equivalent order and are performed in the associative order specified.
Comparison Example   Result
('A' < 'a')          true
('A' == 'a')         false
('A' < 'Z')          true
('0' < '9')          true
('\n' < 'A')         true
(' ' < '\n')         false
Numeric comparison operators cannot be used to compare strings in Java. For example,
we could not code something like ("aardvark" < "zebra") . The Java compiler would
not allow you to do this because the comparison operator is for numeric types only.
However, the following code would compile and run:
String s = "aardvark";
String t = "zebra";
boolean b = (s == t);
but it wouldn't necessarily give you what you want. To understand why this is allowed,
recall that a String is an object; the s and t variables are references to objects
in memory. When we use the equality comparison, ==, we're asking if s and t refer to
the same memory address. In this case, likely they do not and so the result is false.
However, similar code in which s and t are distinct String objects holding the same
characters would also result in false because s and t represent different strings in memory,
even though they have the same sequence of characters. We'll explore how to properly
compare strings later. For now, avoid using the comparison operators with strings.
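The difference between comparing references and comparing contents can be seen in a small sketch. The class name StringCompareDemo and the use of new String(...) to force two distinct objects are illustrative assumptions rather than code from this chapter; equals() is the standard String method for comparing contents.

```java
class StringCompareDemo {

  // Compares two references: true only if both refer to the same object
  static boolean sameReference(String s, String t) {
    return s == t;
  }

  // Compares the contents of the two strings using String.equals()
  static boolean sameContents(String s, String t) {
    return s.equals(t);
  }

  public static void main(String args[]) {
    // new String(...) forces two distinct objects with identical characters
    String s = new String("aardvark");
    String t = new String("aardvark");
    System.out.println(sameReference(s, t)); // false
    System.out.println(sameContents(s, t));  // true
  }
}
```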
//example of an if statement:
if(x < 10) {
  System.out.println("x is less than 10");
}
Some observations about the syntax: the statement, if(x < 10) does not have a
semicolon at the end. This is because it is a conditional statement that determines
the flow of control and not an executable statement. Therefore, no semicolon is used.
Suppose we made a mistake and did include a semicolon:
int x = 15;
if(x < 10); {
  System.out.println("x is less than 10");
}
Some compilers may give a warning, but this is valid Java; it will compile and it will run.
However, it will end up printing "x is less than 10", even though x = 15! Recall that
a conditional statement binds to the executable statement or code block immediately
following it. In this case, we've provided an empty executable statement ended by the
semicolon. The code is essentially equivalent to
int x = 15;
if(x < 10) {
}
System.out.println("x is less than 10");
This is obviously not what we wanted. The conditional ended up binding to the empty
executable statement terminated by the semicolon. The code block containing the print
statement immediately followed, but was not bound to the conditional statement, which is
why the print statement executed regardless of the value of x.
Another convention that we've used in our code is where we have placed the curly brackets.
First, if a conditional statement is bound to only one statement, the curly brackets are
not necessary. However, it is best practice to include them even if they are not necessary
and we'll follow this convention. Second, the opening curly bracket is on the same line as
the conditional statement while the closing curly bracket is indented to the same level
as the start of the conditional statement. Moreover, the code inside the code block is
indented. If there were more statements in the block, they would have all been at the
same indentation level.
27.3. Examples
27.3.1. Computing a Logarithm
The logarithm of x is the exponent to which some base must be raised to get x. The most
common logarithm is the natural logarithm, ln(x), which is base e = 2.71828.... But
logarithms can be in any base b > 1 (bases can also be 0 < b < 1, but we'll restrict our
attention to increasing functions only). What if we wanted to compute log_2(x), or the
logarithm in some other base? Let's write a program that will prompt the user for a
number x and a base b and computes log_b(x).
Arbitrary bases can be computed using the change of base formula:
$$\log_b(x) = \frac{\log_a(x)}{\log_a(b)}$$
If we can compute logarithms in some base a, then we can compute them in any base b.
Fortunately we have such a solution. Recall that the standard library provides a function
to compute the natural logarithm, Math.log(). This is one of the fundamentals of problem
solving: if a solution already exists, use it. In this case, a solution exists for a
different, but similar problem (computing the natural logarithm), but we can adapt the
solution using the change of base formula. In particular, if we have variables b (base)
and x, we can compute log_b(x) using
Math.log(x) / Math.log(b)
But wait: we have a problem similar to the examples in the previous section. The user
could enter invalid values such as b = -10 or x = -2.54 (logarithms are undefined
for non-positive values in any base). We want to ensure that b > 1 and x > 0. With
conditionals, we can now do this. Once we have read in the input from the user we can
make a check for good input using an if statement.
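A check along these lines might look like the following sketch. The class name LogInputCheck, the helper isBadInput(), the error message, and the hard-coded sample values are assumptions made for illustration; only the conditions b > 1 and x > 0 come from the discussion above.

```java
class LogInputCheck {

  // True when the input violates the restrictions b > 1 and x > 0
  static boolean isBadInput(double b, double x) {
    return x <= 0 || b <= 1;
  }

  public static void main(String args[]) {
    double b = 2.0; // hypothetical user input
    double x = 8.0; // hypothetical user input
    if(isBadInput(b, x)) {
      System.out.println("Error: bad input!");
      System.exit(1);
    }
    System.out.println(Math.log(x) / Math.log(b));
  }
}
```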
This code has something new: System.exit(1) . The exit function immediately
terminates the program regardless of the rest of the code that may remain. The argument
passed to exit is an integer that represents an error code. The convention is that zero
indicates no error while non-zero values indicate some error. This is a simple way of
performing error handling: if the user provides bad input, we inform them and quit
the program, forcing them to run it again and provide good input. By prematurely
terminating the program we avoid any illegal operation that would give a bad result.
Alternatively, we could have split the conditions into two statements and given a more
descriptive error message. We use this design in the full program which can be found in
Code Sample 27.2. The program also takes the input as command line arguments. Now
that we have conditionals, we can actually check that the correct number of arguments
was provided by the user and quit in the event that they don't provide the correct
number.
them in from the user. At the same time we can check for bad input (negative values)
for both the inputs.
Next, we can code a series of if-else-if statements for the income range. By placing the
ranges in increasing order, we only need to check the upper bounds just as in the original
example.
Next we compute the child tax credit, taking care that it does not exceed $3,000. A
conditional based on the number of children should suffice as at this point in the program
we already know it is zero or greater.
if(numChildren <= 3) {
  credit = numChildren * 1000;
} else {
  credit = 3000;
}
Finally, we need to ensure that the credit does not exceed the total tax liability (the
credit is non-refundable, so if the credit is greater, the tax should only be zero, not
negative).
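The cap on the credit might be sketched as follows. The class name TaxCredit, the method applyCredit(), and the sample amounts are illustrative assumptions; the variable names baseTax, credit, and totalTax follow those used in the tax program.

```java
class TaxCredit {

  // Applies a non-refundable credit: the resulting tax is never negative
  static double applyCredit(double baseTax, double credit) {
    double totalTax;
    if(baseTax >= credit) {
      totalTax = baseTax - credit;
    } else {
      totalTax = 0;
    }
    return totalTax;
  }

  public static void main(String args[]) {
    System.out.println(applyCredit(5000.0, 3000.0)); // 2000.0
    System.out.println(applyCredit(1200.0, 3000.0)); // 0.0
  }
}
```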
/**
 * This program computes the logarithm base b (b > 1)
 * of a given number x > 0
 */
public class Logarithm {

  public static void main(String args[]) {

    double b, x, result;
    if(args.length != 2) {
      System.out.println("Usage: b x");
      System.exit(1);
    }

    b = Double.parseDouble(args[0]);
    x = Double.parseDouble(args[1]);

    if(x <= 0) {
      System.out.println("Error: x must be greater than zero");
      System.exit(1);
    }
    if(b <= 1) {
      System.out.println("Error: base must be greater than one");
      System.exit(1);
    }

    //change of base formula
    result = Math.log(x) / Math.log(b);
    //the output format below is an assumption
    System.out.printf("log_(%f)(%f) = %f\n", b, x, result);

  }

}
Code Sample 27.2: Logarithm Calculator Program in Java
import java.util.Scanner;

if(numChildren <= 3) {
  credit = numChildren * 1000;
} else {
  credit = 3000;
}

System.out.printf("AGI:           $%10.2f\n", income);
System.out.printf("Tax:           $%10.2f\n", baseTax);
System.out.printf("Credit:        $%10.2f\n", credit);
System.out.printf("Tax Liability: $%10.2f\n", totalTax);
Code Sample 27.3: Tax Program in Java
/**
 * This program computes the roots to a quadratic equation
 * using the quadratic formula.
 */
public class Roots {

  public static void main(String args[]) {

    double a, b, c, root1, root2;

    if(args.length != 3) {
      System.err.println("Usage: a b c\n");
      System.exit(1);
    }

    a = Double.parseDouble(args[0]);
    b = Double.parseDouble(args[1]);
    c = Double.parseDouble(args[2]);

    if(a == 0) {
      System.err.println("Error: a cannot be zero");
      System.exit(1);
    } else if(b*b < 4*a*c) {
      System.err.println("Error: cannot handle complex roots\n");
      System.exit(1);
    } else if(b*b == 4*a*c) {
      root1 = -b / (2*a);
      System.out.printf("Only one distinct root: %f\n", root1);
    } else {
      root1 = (-b + Math.sqrt(b*b - 4*a*c) ) / (2*a);
      root2 = (-b - Math.sqrt(b*b - 4*a*c) ) / (2*a);
      //the output format below is an assumption
      System.out.printf("Roots: %f, %f\n", root1, root2);
    }

  }

}
Code Sample 27.4: Quadratic Roots Program in Java With Error Checking
28. Loops
Java supports while loops, for loops, and do-while loops using the keywords while , for ,
and do (along with another while ). Continuation conditions for loops are enclosed
in parentheses, (...) and the blocks of code associated with the loop are enclosed in
curly brackets.
int i = 1; //Initialization
while(i <= 10) { //continuation condition
  //perform some action
  i++; //iteration
}
Code Sample 28.1: While Loop in Java
In addition, the continuation condition does not contain a semicolon since it is not an
executable statement. Just as with an if-statement, if we had placed a semicolon it would
have led to unintended results. Consider the following:
A similar problem occurs: the while keyword and continuation condition bind to
the next executable statement or code block. As a consequence of the semicolon, the
executable statement that gets bound to the while loop is empty. What happens is
even worse: the program will enter an infinite loop. To see this, the code is essentially
equivalent to the following:
In the while loop, we never increment the counter variable i; the loop does nothing,
and so the computation will continue on forever! Some compilers will warn you about
this, others will not. It is valid Java and it will compile and run, but obviously won't
work as intended. Avoid this problem by using proper syntax.
Another common use case for a while loop is a flag-controlled loop in which we use a
Boolean flag rather than an expression to determine if a loop should continue or not. An
example can be found in Code Sample 28.2.
int i = 1;
boolean flag = true;
while(flag) {
  //perform some action
  i++; //iteration
  if(i>10) {
    flag = false;
  }
}
Code Sample 28.2: Flag-controlled While Loop in Java
Again, note the syntax: semicolons are placed at the end of the initialization and
continuation condition, but not the iteration statement. Just as with while loops, the
opening curly bracket is placed on the same line as the for keyword. Code within the
loop body is indented, all at the same indentation level.
Another observation: the declaration of the counter variable i was done in the initialization statement. This scopes the variable to the loop itself. The variable i is valid inside
the loop body, but will be out-of-scope after the loop body. It is possible to declare the
variable prior to the loop, but the variable i would have a much larger scope. It is best
practice to limit the scope of variables only to where they are needed. Thus, we will
write our loops as above.
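The syntax and scoping points above can be illustrated with a short sketch. The class name ForLoopDemo and the method countIterations() are assumptions made for illustration and are not taken from the chapter's own code sample.

```java
class ForLoopDemo {

  // Counts the iterations of a for loop that runs i = 1, 2, ..., n
  static int countIterations(int n) {
    int count = 0;
    // i is declared in the initialization statement, so it is scoped to the loop
    for(int i=1; i<=n; i++) {
      count++;
    }
    // i is out-of-scope here; referring to it would be a compiler error
    return count;
  }

  public static void main(String args[]) {
    System.out.println(countIterations(10)); // 10
  }
}
```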
int i = 1; //a local variable must be initialized before use
do {
  //perform some action
  i++;
} while(i <= 10);
Code Sample 28.4: Do-While Loop in Java
Note the syntax and style: the opening curly bracket is again on the same line as the
keyword do . The while keyword and continuation condition are on the same line as
the closing curly bracket. In a slight departure from consistent syntax, a semicolon does
appear at the end of the continuation condition even though it is not an executable
statement.
The code (int a : arr) should be read as "for each integer element a in the collection
arr ...". Within the enhanced for loop, the variable a will be automatically updated for
you on each iteration. Outside the loop body, the variable a is out-of-scope.
Java allows you to use an enhanced for loop with any array or collection (technically,
anything that implements the Iterable interface). One example is a List , an ordered
collection of elements. Code Sample 28.6 contains an example.
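As a sketch of both uses, the enhanced for loop below iterates over an array and over a List. The class name EnhancedForDemo, the overloaded sum() methods, and the sample values are illustrative assumptions; Arrays.asList() is a standard library method used here only to build a small list.

```java
import java.util.Arrays;
import java.util.List;

class EnhancedForDemo {

  // Sums the elements of an array using an enhanced for loop
  static int sum(int arr[]) {
    int total = 0;
    for(int a : arr) {
      total += a;
    }
    return total;
  }

  // Sums the elements of a List in the same way
  static int sum(List<Integer> list) {
    int total = 0;
    for(Integer a : list) {
      total += a;
    }
    return total;
  }

  public static void main(String args[]) {
    int arr[] = { 2, 3, 5, 7, 11 };
    List<Integer> list = Arrays.asList(2, 3, 5, 7, 11);
    System.out.println(sum(arr));  // 28
    System.out.println(sum(list)); // 28
  }
}
```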
28.5. Examples
28.5.1. Normalizing a Number
Let's revisit the example from Section 4.1.1 in which we normalize a number by continually
dividing it by 10 until it is less than 10. The code in Code Sample 28.7 specifically refers
to the value 32145.234 but would work equally well with any value of x.
double x = 32145.234;
int k = 0;
while(x >= 10) {
  x = x / 10; //or: x /= 10;
  k++;
}
Code Sample 28.7: Normalizing a Number with a While Loop in Java
28.5.2. Summation
Let's revisit the example from Section 4.2.1 in which we computed the sum of integers
1 + 2 + ... + 10. The code is presented in Code Sample 28.8.
int i;
int sum = 0;
for(i=1; i<=10; i++) {
  sum += i;
}
Code Sample 28.8: Summation of Numbers using a For Loop in Java
Of course we could easily have generalized the code somewhat. Instead of computing a
sum up to a particular number, we could have written it to sum up to another variable
n , in which case the for loop would instead look like the following.
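A generalized version might be sketched as follows. Wrapping the loop in a method named sumTo() inside a class named Summation is an assumption made so that the example is self-contained; the loop itself only replaces the constant 10 with the variable n.

```java
class Summation {

  // Computes 1 + 2 + ... + n using a for loop
  static int sumTo(int n) {
    int sum = 0;
    for(int i=1; i<=n; i++) {
      sum += i;
    }
    return sum;
  }

  public static void main(String args[]) {
    System.out.println(sumTo(10)); // 55
  }
}
```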
int i, j;
int n = 10;
int m = 20;
for(i=0; i<n; i++) {
  for(j=0; j<m; j++) {
    System.out.printf("(i, j) = (%d, %d)\n", i, j);
  }
}
Code Sample 28.9: Nested For Loops in Java
The inner loop executes for j = 0, 1, 2, ..., 19 < m = 20 for a total of 20 times. However,
it executes 20 times for each iteration of the outer loop. Since the outer loop executes
for i = 0, 1, 2, ..., 9 < n = 10, the total number of times the System.out.printf
statement executes is 10 × 20 = 200. In this example, the sequence
(0, 0), (0, 1), (0, 2), ..., (0, 19), (1, 0), ..., (9, 19)
will be printed.
However, recall that we may have problems due to accuracy. The monthly payment
could come out to be a fraction of a cent, say $43.871. For accuracy, we need to ensure
that all of the figures for currency are rounded to the nearest cent. The standard math
library does have a Math.round() function, but it only rounds to the nearest whole
number, not the nearest 100th.
However, we can adapt the off-the-shelf solution to fit our needs. If we take the number,
multiply it by 100, we get (say) 4387.1 which we can now round to the nearest whole
number, giving us 4387. We can then divide by 100 to get a number that has been
rounded to the nearest 100th! In Java, we could simply do the following.
monthlyPayment = Math.round(monthlyPayment * 100.0) / 100.0;
We can use the same trick to round the monthly interest payment and any other number
expected to be whole cents. To output our numbers, we use System.out.printf and
take care to align our columns to make it look nice. To finish our adaptation, we
handle the final month separately to account for an over/under payment due to rounding.
The full solution can be found in Code Sample 28.10.
if(args.length != 3) {
  System.err.println("Usage: principle apr terms");
  System.exit(1);
}

//monthly payment
double monthlyPayment = (monthlyInterestRate * principle) /
  (1 - Math.pow( (1 + monthlyInterestRate), -n));
//round to the nearest cent
monthlyPayment = Math.round(monthlyPayment * 100.0) / 100.0;
29. Methods
Because Java is an object-oriented programming language, its functions are usually referred
to as methods and are essential to writing programs. The distinction is that a function is
usually a standalone element while methods are functions that are members of a class.
In Java, since everything is a class or belongs to a class, standalone functions cannot be
defined.
In Java you can define your own methods, but they need to be placed within a class.
Usually methods that act on data in the class (or instances of the class, see Chapter 34)
or have common functionality are placed into one class. For example, all the basic math
methods are all part of the java.lang.Math class. It is not uncommon to place similar
methods together into one utility class.
Java supports method overloading, so within the same class you can define multiple
methods with the same name as long as they differ in either the number or type of
parameters. For example, in the java.lang.Math class, there are several versions of the
absolute value method, abs(), including one that takes/returns an int, one that
takes/returns a double, and one for float types. Naming conflicts can easily be solved by
ensuring that you place your methods in a class/package that is unique to your application.
In Java, the 8 primitive types (int, double, char, boolean, etc.) are passed by
value. All object types, however, such as the wrapper classes Integer, Double as well
as String, etc. are passed by reference. That is, the memory address in the JVM is
passed to the method. (Strictly speaking, the reference itself is copied, so reassigning
a parameter inside a method does not affect the variable in the calling method.) This is
done for efficiency: for objects that are large it would be inefficient to copy the entire
object into the call stack in order to pass it to a method.
However, though object types are passed by reference, the method cannot necessarily
change them. Recall that the wrapper classes Integer, Double and the String class
are all immutable, meaning that once created they cannot be modified. Thus, even
though they are passed by reference, the method that receives them cannot change them.
There are many mutable objects in Java. The StringBuilder class for example is a
mutable object. If you pass a StringBuilder instance to a method, that method is
free to invoke mutator methods (any methods that change the object's state). Since it is
the same object as in the calling method, the calling method can see those changes.
As of Java 5, you can write and use vararg methods. The System.out.printf()
method is a prime example of this. However, we will not discuss in detail how to do this.
Instead, refer to standard Java documentation. Finally, parameters are not optional in
Java. Instead, Java supports method overloading: you can write multiple versions
of the same method that each take a different number of arguments. You can even design
them so that the more specific versions (with fewer arguments) invoke the more general
versions (with more arguments), passing in sensible defaults when doing so.
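The pattern of a specific version delegating to a general version might be sketched as follows. The class name Greeter, the method greet(), and the default greeting "Hello" are hypothetical names chosen for illustration.

```java
class Greeter {

  // General version: the caller supplies every argument
  static String greet(String name, String greeting) {
    return greeting + ", " + name + "!";
  }

  // More specific version: fewer arguments, passes in a sensible default
  static String greet(String name) {
    return greet(name, "Hello");
  }

  public static void main(String args[]) {
    System.out.println(greet("World"));          // Hello, World!
    System.out.println(greet("World", "Howdy")); // Howdy, World!
  }
}
```

Both versions share the name greet(); the compiler selects one based on the number of arguments at the call site.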
Defining methods is fairly straightforward. First you create a class to place them in. Then
you provide the method signature along with the body of the method. In addition, there
are several modifiers that you can place in the method signature to specify its visibility
and whether or not the method belongs to the class or to instances of the class. This
is a concept we'll explore in Chapter 34. For now, we'll only focus on what is needed to
get started.
Typically, the documentation for methods is included with the method definition using
Javadoc style comments. Consider the following examples.
/**
 * Computes the sum of the two arguments.
 * @param a
 * @param b
 * @return the sum, <code>a + b</code>
 */
public static int sum(int a, int b) {
  return (a + b);
}

/**
 * Computes the Euclidean distance between the 2-D points,
 * (x1,y1) and (x2,y2).
 * @param x1
 * @param y1
 * @param x2
 * @param y2
 * @return
 */
public static double getDistance(double x1, double y1,
                                 double x2, double y2) {
  double xDiff = (x1-x2);
  double yDiff = (y1-y2);
  return Math.sqrt( xDiff * xDiff + yDiff * yDiff);
}

/**
 * Computes a monthly payment for a loan with the given
 * principle at the given APR (annual percentage rate) which
 * is to be repaid over the given number of terms.
 * @param principle - the amount borrowed
 * @param apr - the annual percentage rate
 * @param terms - number of terms (usually months)
 * @return
 */
public static double getMonthlyPayment(double principle,
                                       double apr, int terms) {
  double rate = (apr / 12.0);
  double payment = (principle * rate) / (1-Math.pow(1+rate, -terms));
  return payment;
}
In each of the examples above, the first modifier keyword we used was public. This
makes the method visible to all other parts of the code base. Any other piece of code
can invoke the method and take advantage of the functionality it provides. Alternatively,
we could have used the keyword private to make it visible only to other methods in
the same class, the keyword protected, or we could have made it package protected by
omitting the modifier altogether. We'll discuss these in detail later on. We'll mostly want
our methods to be available, so we'll make most of them public.
The second modifier is static which makes it so that the method belongs to the class
itself rather than instances of the class. We'll discuss objects and instances in detail later
on. For now, we'll simply make all of our methods static.
After the modifiers, we provide the method signature including the return type, its
identifier (name), and its parameter list. Method names must follow the same naming rules
as variables: they must begin with an alphabetic character and may contain alphanumeric
characters as well as underscores. However, using modern coding conventions we usually
name methods using lower camel casing.
Immediately after the signature we provide a method body which contains the code
that will be run upon invocation of the method. The method body is enclosed using
opening/closing curly brackets.
In the example above, we've also illustrated how to define a method that has no inputs.
The Utils.methodName() syntax is used because the methods are static: they belong
to the class and so must be invoked through the class using the class's name. We've
previously seen this syntax when using System. or Math. with the standard JDK
library functions.
change(a, b);
To see this, observe the following output. When we return to the main method, the
original string s is unchanged (since it was immutable). However, the StringBuilder
has been changed by the method.
main: s = Hello
main: b = Hello
change: s = Hello world!
change: sb = Hello world!
main after: s = Hello
main after: b = Hello world!
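A program producing output of this shape might be sketched as follows. This is a reconstruction under assumptions, not the chapter's original listing: the class name PassingDemo, the variable names, and the body of change() are chosen only to match the labels in the output above.

```java
class PassingDemo {

  static void change(String s, StringBuilder sb) {
    // concatenation creates a new String; the caller's string is untouched
    s = s + " world!";
    // append() mutates the same object the caller refers to
    sb.append(" world!");
    System.out.println("change: s = " + s);
    System.out.println("change: sb = " + sb);
  }

  public static void main(String args[]) {
    String s = "Hello";
    StringBuilder b = new StringBuilder("Hello");
    System.out.println("main: s = " + s);
    System.out.println("main: b = " + b);
    change(s, b);
    System.out.println("main after: s = " + s);
    System.out.println("main after: b = " + b);
  }
}
```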
29.2. Examples
Recall that the standard math library provides a Math.round() method that rounds a
number to the nearest whole number. Often, we've had need to round to cents as well.
We now have the ability to write a method to do this for us. Before we do, however, let's
think more generally. What if we wanted to round to the nearest tenth? Or what if we
wanted to round to the nearest 10s or 100s place? Let's write a general purpose rounding
method that allows us to specify which decimal place to round to.
The most natural input values would be to specify the place using an integer exponent.
That is, if we wanted to round to the nearest tenth, then we would pass it -1 as
0.1 = 10^-1, -2 if we wanted to round to the nearest 100th, etc. On the positive end,
passing in 0 would correspond to the usual round function, 1 to the nearest 10s spot,
and so on.
Moreover, we could demonstrate good code reuse (as well as procedural abstraction)
by scaling the input value and reusing the functionality already provided in the math
library's Math.round() method. We could further define a roundToCents() method
that used our generalized round method. Finally, we could place all of these methods
into a RoundUtils Java class for good organization.
package unl.cse;

/**
 * A collection of rounding utilities
 *
 */
public class RoundUtils {

  /**
   * Rounds to the nearest digit specified by the place
   * argument. In particular to the (10^place)-th digit
   *
   * @param x the number to be rounded
   * @param place the place to be rounded to
   * @return
   */
  public static double roundToPlace(double x, int place) {
    double scale = Math.pow(10, -place);
    double rounded = Math.round(x * scale) / scale;
    return rounded;
  }

  /**
   * Rounds to the nearest cent (100th place)
   *
   * @param x
   * @return
   */
  public static double roundToCents(double x) {
    return RoundUtils.roundToPlace(x, -2);
  }

}
Observe that this class does not contain a main() method. That means that this class
is not executable itself. It only provides functionality to other classes in the code base.
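Another class could exercise this functionality along the following lines. To keep the sketch self-contained, it repeats the scaling logic of roundToPlace() in a hypothetical class named RoundingDemo instead of importing unl.cse.RoundUtils; the sample values are taken from earlier examples.

```java
class RoundingDemo {

  // Same scaling idea as roundToPlace() above, repeated here so that
  // this sketch compiles on its own
  static double roundToPlace(double x, int place) {
    double scale = Math.pow(10, -place);
    return Math.round(x * scale) / scale;
  }

  public static void main(String args[]) {
    System.out.println(roundToPlace(43.871, -2));   // nearest 100th
    System.out.println(roundToPlace(43.871, -1));   // nearest 10th
    System.out.println(roundToPlace(43.871, 0));    // nearest whole number
    System.out.println(roundToPlace(32145.234, 2)); // nearest 100s place
  }
}
```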
30.1. Exceptions
Java defines a base class named Throwable; an object of this type can be thrown
using the keyword throw. There are two major subtypes of Throwable: Error and
Exception. The Error class is used primarily for fatal errors such as the JVM running
out of memory or some other extreme case that your code cannot reasonably be expected
to recover from.
There are dozens of types of exceptions that are subclasses of the standard Java
Exception class defined by the JDK including IOException (and its subclasses such
as FileNotFoundException ) or SQLException (when working with Structured Query
Language (SQL) databases).
An important subclass of Exception is RuntimeException which represents unchecked
exceptions that do not need to be explicitly caught (see Section 30.1.4 below for further
details). We'll mostly focus on this type of exception.
try {
  String input = s.next();
  n = Integer.parseInt(input);
} catch (NumberFormatException nfe) {
  System.err.println("You entered invalid data!");
  System.exit(1);
}
In this example, we've simply displayed an error message to the standard error output
and exited the program. That is, we've made the design decision that this error should
be fatal. We could have chosen to handle this error differently in the catch block.
The code above could have resulted in other exceptions. For example if the Scanner
had failed to read the next token from the standard input, it would have thrown a
NoSuchElementException . We can add as many catch blocks as we want to handle
each exception differently.
try {
  String input = s.next();
  n = Integer.parseInt(input);
} catch (NumberFormatException nfe) {
  System.err.println("You entered invalid data!");
  System.exit(1);
} catch (NoSuchElementException nsee) {
  System.err.println("Input reading failed, using default...");
  n = 20; //a default value
} catch(Exception e) {
  System.err.println("A general exception occurred");
  e.printStackTrace();
  System.exit(1);
}
Each catch block catches a different type of exception. In this chain we have given the
variable that holds each exception a distinct, descriptive name (nfe, nsee, e), although
each variable is scoped only to its own catch block.
Note that the last catch block was written to catch a generic Exception. This last
block will essentially catch any other type of exception. Much like an if-else-if
statement, the first catch block whose type matches the thrown exception is the one that
is executed, and the blocks are all mutually exclusive. Thus, a "catch all" block like this
should always be the last catch block. The most specific types of exceptions should be
caught first and the most general types should be caught last.
public class ComplexRootException extends RuntimeException {

  /**
   * Constructor that takes an error message
   */
  public ComplexRootException(String errorMessage) {
    super(errorMessage);
  }

}
Now in our code we can catch and even throw this new type of exception.
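Throwing and catching such an exception might be sketched as follows. The class name ComplexRootDemo and the method firstRoot() are hypothetical, and ComplexRootException is repeated as a nested class only so that the sketch compiles as a single file.

```java
class ComplexRootDemo {

  // Stand-in for the ComplexRootException class, nested for self-containment
  static class ComplexRootException extends RuntimeException {
    ComplexRootException(String errorMessage) {
      super(errorMessage);
    }
  }

  // Returns the root (-b + sqrt(b^2 - 4ac)) / (2a) or throws an
  // exception when the roots would be complex
  static double firstRoot(double a, double b, double c) {
    double discriminant = b*b - 4*a*c;
    if(discriminant < 0) {
      throw new ComplexRootException("Cannot handle complex roots");
    }
    return (-b + Math.sqrt(discriminant)) / (2*a);
  }

  public static void main(String args[]) {
    try {
      // x^2 + 1 has no real roots, so this call throws
      System.out.println(firstRoot(1, 0, 1));
    } catch (ComplexRootException cre) {
      System.err.println("Error: " + cre.getMessage());
    }
  }
}
```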
or we would need to specify that the method processFile() explicitly throws the
exception:
Doing this, however, would force any code that called the processFile() method to
surround it in a try-catch block and explicitly handle it (or once again, throw it back
to the calling method).
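The effect of declaring a checked exception can be sketched as follows. The choice of FileNotFoundException, the line-counting body of processFile(), the class name ProcessFileDemo, and the file name are all assumptions made for illustration; only the method name processFile() comes from the discussion above.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class ProcessFileDemo {

  // Declares the checked exception instead of catching it, so every
  // caller must either catch it or declare it in turn
  static int processFile(String fileName) throws FileNotFoundException {
    int lines = 0;
    Scanner s = new Scanner(new File(fileName));
    while(s.hasNextLine()) {
      s.nextLine();
      lines++;
    }
    s.close();
    return lines;
  }

  public static void main(String args[]) {
    // without this try-catch block the call would not compile
    try {
      System.out.println(processFile("no-such-file.txt"));
    } catch (FileNotFoundException fnfe) {
      System.err.println("File not available: " + fnfe.getMessage());
    }
  }
}
```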
The point of a checked exception is to force code to deal with potential issues that can be
reasonably anticipated (such as the unavailability of a file). However, from another point
of view checked exceptions represent the exact opposite goal of error handling. Namely,
that a function or code block can and should inform the calling function that an error
has occurred, but not explicitly make a decision on how to handle the error. A checked
exception doesn't make the full decision for the calling function, but it does eliminate
ignoring the error as an option from the calling function.
Java also supports unchecked exceptions which do not need to be explicitly caught. For
example, NumberFormatException or NullPointerException are unchecked exceptions. If an unchecked exception is thrown and not caught, it bubbles up through the
call stack until some piece of code does catch it. If no code catches it, it results in a fatal
error and terminates the execution of the JVM.
The RuntimeException class and any of its subclasses are unchecked exceptions. In
our ComplexRootException example above, because we extended RuntimeException
we made it an unchecked exception, allowing the calling function to decide not only how
to handle it, but also whether or not to handle it at all. If we had instead decided to
extend Exception we would have made our exception a checked exception.
There is considerable debate as to whether or not checked exceptions are a good thing
(and as to whether or not unchecked exceptions are a good thing). Many feel (the author
included) that checked exceptions were a mistake and their usage should be avoided. The
rationale behind checked exceptions is summed up in the following quote from the Java
documentation [7].
"Here's the bottom line guideline: If a client can reasonably be expected to
recover from an exception, make it a checked exception. If a client cannot do
anything to recover from the exception, make it an unchecked exception."
The problem is that the JDK's own design violates this principle. For example,
FileNotFoundException is a checked exception; the reasoning being that a program could
reprompt the user for a different file. The problem is the assumption that the program we
are writing is always interactive. In fact most software is not interactive and is instead
designed to interact with other software. Reprompting is not an option in the vast
majority of cases.
As another example, consider Java's SQL library which allows you to programmatically
In the example, since the name of the enumeration is Day this declaration must be in a
source file named Day.java . We can now declare variables of this type. The possible
values it can take are restricted to SUNDAY , MONDAY , etc. and we can use these keywords
in our program. However these values belong to the class Day and must be accessed
statically. For example,
for(Day d : Day.values()) {
  System.out.println(d.name());
}
In the example above, we used another feature: each enum value has a name() method
that returns the value as a String . This example would end up printing the following.
SUNDAY
MONDAY
TUESDAY
WEDNESDAY
THURSDAY
FRIDAY
SATURDAY
Of course, we may want more human-oriented representations. To do this we could
override the class's toString() method to return a better string representation. For
example:
Because enum types are full classes in Java, many more tricks can be used that leverage
the power of classes including using additional state and constructors. We will cover
these topics later.
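The features discussed in this section can be combined in one sketch. Here Day is declared as a nested enum inside a hypothetical class named DayDemo so that the example fits in a single file; the capitalization scheme in toString() and the isWeekend() method are also assumptions made for illustration.

```java
class DayDemo {

  // Illustrative version of the Day enumeration
  enum Day {
    SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY;

    // Human-oriented representation, e.g. "Sunday" instead of "SUNDAY"
    @Override
    public String toString() {
      String n = name();
      return n.charAt(0) + n.substring(1).toLowerCase();
    }
  }

  static boolean isWeekend(Day d) {
    // enum values are accessed statically through the type name
    return d == Day.SATURDAY || d == Day.SUNDAY;
  }

  public static void main(String args[]) {
    for(Day d : Day.values()) {
      System.out.println(d + ", weekend: " + isWeekend(d));
    }
  }
}
```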
31. Arrays
Java allows you to declare and use arrays. Since Java is statically typed, arrays must
also be typed when they are declared and may only hold that particular type of element.
Since Java has automated garbage collection, memory management is greatly simplified.
Finally, in Java, only locally scoped primitives and references are allocated on the program
call stack. As such, there are no static arrays; all arrays are allocated in the heap space.
This example only declares 3 references that can refer to an array; it doesn't actually
create them as the references are all null. To create arrays of a particular size, we need
to initialize them using the keyword new.
Each of these initializations creates a new array (allocated on the heap) of the specified
size (10, 20, and 5 respectively). These arrays can only hold values of the specified type
( int , double , and String respectively).
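Declaration and initialization together might be sketched as follows. The names arr, values, and names and the sizes 10, 20, and 5 follow the surrounding text; the class name ArrayCreationDemo is an assumption.

```java
class ArrayCreationDemo {

  public static void main(String args[]) {
    // declarations only: each reference starts out as null
    int arr[] = null;
    double values[] = null;
    String names[] = null;

    // new allocates each array on the heap with the given size
    arr = new int[10];
    values = new double[20];
    names = new String[5];

    System.out.println(arr.length); // 10
    System.out.println(values[0]);  // 0.0
    System.out.println(names[0]);   // null
  }
}
```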
Note: you may see code examples that use the alternative notation int[] arr = null;. Both
are acceptable. Some would argue that this way is more correct because it keeps the
brackets with the type, avoiding the type-variable name-type separation in our example.
Those preferring the int arr[] = null; notation would argue that it is more natural
because that is how we ultimately index the array. Six of One, Half-Dozen of the Other.
As with regular variables, the default value for each element in these new arrays will be
zero for the numeric types and null for object types. Alternatively, you could use a
compound declaration/assignment syntax to initialize specific values:
int arr[] = { 2, 3, 5, 7, 11 };
Using this syntax we do not need to specify the size of the array, as the compiler is smart enough to count the number of elements we've provided. The elements themselves are denoted inside curly brackets and delimited with commas.
For both types of syntax, the actual array is always allocated on the heap, while the reference variables arr, values, and names are stored on the program call stack.
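The two initialization styles and the default element values can be checked with a short program. This is only a sketch: the class name ArrayInitDemo and its helper methods are invented for illustration.

```java
import java.util.Arrays;

class ArrayInitDemo {

    // Creates an int array of the given size; every element defaults to 0.
    static int[] zeros(int size) {
        return new int[size];
    }

    // Uses the compound declaration/initialization syntax.
    static int[] primes() {
        int arr[] = { 2, 3, 5, 7, 11 };
        return arr;
    }

    public static void main(String[] args) {
        double values[] = new double[20];
        String names[] = new String[5];
        System.out.println(Arrays.toString(zeros(3)));
        System.out.println(values[0]);   // default for numeric types
        System.out.println(names[0]);    // default for object types
        System.out.println(primes().length);
    }
}
```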
Indexing
Once an array has been created, its elements can be accessed by indexing. Java uses the
standard 0-indexing scheme so the first element is at index 0, the second at index 1, etc.
Indexing an element involves using the square bracket notation and providing an index.
Once indexed, an array element can be treated as a normal variable and can be used
with other operators such as the assignment operator or comparison operators.
arr[0] = 42;
if(arr[4] < 0) {
    System.out.println("negative!");
}
System.out.println("arr[1] = " + arr[1]);
Recall that an index is actually an offset. The compiler and system know exactly how many bytes each element takes, so an index i determines exactly how many bytes from the first element the i-th element is located. Consequently, it is possible to write code that indexes elements beyond the range of the array. For example, arr[-1] or arr[5] would attempt to access an element immediately before the first element or immediately after the last element. Obviously, these elements are not part of the array.
If you attempt to access an element outside the bounds of an array, the JVM will raise an ArrayIndexOutOfBoundsException (a subclass of IndexOutOfBoundsException), which is a RuntimeException that you can catch and handle if you choose. To prevent such an exception you can write code that does not exceed the bounds of the array. Java arrays have a special length property that gives you the size of the array. You can access the property using the dot operator, so arr.length would give the value 5.
Using the length property you can design a for-loop that increments an index variable to
iterate over the elements in an array.
The for loop above initializes the variable i to zero, corresponding to the first element in the array. The continuation condition specifies that the loop continues while i is strictly less than the size of the array, given by the arr.length property. This style of for-loop is idiomatic when dealing with arrays.
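As a minimal sketch of the idiom just described, the following program sums an array with an index-based loop and shows what happens when an index falls outside the bounds. The class and method names are invented for illustration.

```java
class ArrayLoopDemo {

    // Sums the elements using the idiomatic index-based for-loop.
    static int sum(int[] arr) {
        int total = 0;
        for (int i = 0; i < arr.length; i++) {
            total += arr[i];
        }
        return total;
    }

    // Returns true if indexing arr at the given index raises an
    // ArrayIndexOutOfBoundsException.
    static boolean isOutOfBounds(int[] arr, int index) {
        try {
            int ignored = arr[index];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int arr[] = { 2, 3, 5, 7, 11 };
        System.out.println(sum(arr));              // 28
        System.out.println(isOutOfBounds(arr, 5)); // true
        System.out.println(isOutOfBounds(arr, 4)); // false
    }
}
```

Because the loop condition uses arr.length rather than a hard-coded size, the same loop keeps working if the array is later created with a different size.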
In addition, Java (as of version 5) supports foreach loops that allow you to iterate over
the elements of an array without using an index variable. Java refers to these loops as
enhanced for-loops, but they are essentially foreach loops. For example:
for(int a : arr) {
    System.out.println(a);
}
The syntax still uses the keyword for , but instead of an index variable, it specifies the
type, the loop-scoped variable identifier followed by a colon and the array that you want
to iterate over. Each iteration of the loop updates the variable a to the next value in
the array.
int arr[];
//create a new array of size 10:
arr = new int[10];
//lose the reference by explicitly setting it to null:
arr = null;
//all references to the old memory are now lost and it is
//eligible for garbage collection
/**
 * This method computes the sum of the elements in the
 * given array
 */
public static int computeSum(int arr[]) {
    int sum = 0;
    for(int i=0; i<arr.length; i++) {
        sum += arr[i];
    }
    return sum;
}
In Java, arrays are always passed by reference (more precisely, a copy of the reference to the array is passed). Though we did not make any changes to the contents of the passed array in this particular example, in general we could have. Any such changes would be visible in the calling method. Unfortunately, there is no mechanism by which we can prevent changes to arrays when they are passed to methods.²
² The use of the keyword final only prevents us from changing the array reference, not modifying the contents.
We can also create an array in a method and return it as a value. For example, the following method creates a deep copy of the given integer array. That is, a completely new array that is a distinct copy of the old array. In contrast, a shallow copy would be if we simply made one reference point to the same array as another reference.
/**
 * This method creates a new copy of the given array
 * and returns it.
 */
public static int[] makeCopy(int a[]) {
    int copy[] = new int[a.length];
    for(int i=0; i<a.length; i++) {
        copy[i] = a[i];
    }
    return copy;
}
The method returns an integer array. In fact, this method seems so useful, that it is
already provided as part of the Java Software Development Kit (SDK). The class Arrays
contains dozens of utility methods that process arrays. In particular there is a copyOf()
method that allows you to create deep copies of arrays and even allows you to expand or
shrink their size.
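The following sketch ties these points together: a method that modifies the array it receives, a final reference whose contents still change, and Arrays.copyOf() used to make an independent copy and an expanded copy. The class name and the doubleAll() method are invented for illustration.

```java
import java.util.Arrays;

class ArrayCopyDemo {

    // Doubles every element in place; the caller sees the change
    // because the method receives a reference to the same array.
    static void doubleAll(int[] arr) {
        for (int i = 0; i < arr.length; i++) {
            arr[i] *= 2;
        }
    }

    public static void main(String[] args) {
        // final fixes the reference, not the contents
        final int original[] = { 1, 2, 3 };
        int copy[] = Arrays.copyOf(original, original.length);
        int longer[] = Arrays.copyOf(original, 5); // padded with zeros

        doubleAll(original);

        System.out.println(Arrays.toString(original)); // [2, 4, 6]
        System.out.println(Arrays.toString(copy));     // [1, 2, 3]
        System.out.println(Arrays.toString(longer));   // [1, 2, 3, 0, 0]
    }
}
```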
This creates a 2-dimensional array of integers with 10 rows and 20 columns. Once created,
we can index individual elements by specifying the row and column.
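A small sketch of creating and indexing such a 2-dimensional array follows. The names MatrixDemo and multiplicationTable() are hypothetical and chosen only for this illustration.

```java
class MatrixDemo {

    // Builds a rows-by-cols table where each entry is row * col.
    static int[][] multiplicationTable(int rows, int cols) {
        int table[][] = new int[rows][cols];
        for (int i = 0; i < table.length; i++) {
            for (int j = 0; j < table[i].length; j++) {
                table[i][j] = i * j;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int matrix[][] = multiplicationTable(10, 20);
        System.out.println(matrix.length);     // number of rows: 10
        System.out.println(matrix[0].length);  // number of columns: 20
        System.out.println(matrix[9][19]);     // 171
    }
}
```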
cover how to use some of these data structures, but we will not go into the details of how they are implemented nor the OOP concepts that underlie them.
The Java List is an interface that defines a dynamic list data structure. This data structure provides a dynamic collection that can grow and shrink automatically as you add and remove elements from it. It is an interface, so it doesn't actually provide an implementation, just a specification for the publicly available methods that you can use on it. Two common implementations are the ArrayList, which uses an array to hold elements, and the LinkedList, which stores elements in linked nodes.
To create an instance of either of these lists, you use the new keyword and the following
syntax.
The first line creates a new instance of an ArrayList that is parameterized to hold
Integer types. The second creates a new instance of a LinkedList that has been
parameterized to only hold String types. Because of this parameterization, it would be a compiler error to attempt to add anything other than Integers to the first list or anything other than Strings to the second.
Once these lists have been created, you can add and remove elements as follows.
a.add(42);
a.add(81);
a.add(17);

b.add("Hello");
b.add("World");
b.add("Computers!");
The order that you add elements is preserved, so in the first list, the first element would
be 42, the second 81, and the last 17. You can remove elements by specifying an index
of the element to remove. Like arrays, lists are 0-indexed.
a.remove(0);

b.remove(2);
b.remove(0);
As you remove elements, the indices are shifted down, so that after removing the first element in the list a, 81 becomes the new first element. Removing the last then the first element of b leaves "World" as its only element.
Any attempt to access an element that lies outside the bounds of the List will result in an IndexOutOfBoundsException, just as with arrays. To stay within bounds you can use the size() method to determine how many elements are in the collection. In this example, values.size() would return an integer value of 3.
Finally, most collections implement the Iterable interface, which allows you to iterate over the elements using an enhanced for-loop just as with arrays.
for(Double x : values) {
    System.out.println(x);
}
There are dozens of other methods that allow you to insert, remove, and retrieve elements
from a Java List ; refer to the documentation for details.
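The list operations discussed above can be collected into one runnable sketch. The class name ListDemo and its helper methods are assumptions made for this illustration; the List, ArrayList, and LinkedList types and their add(), remove(), get(), and size() methods are part of the standard library.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ListDemo {

    // Builds the list 42, 81, 17 and then removes the first element.
    static List<Integer> numbers() {
        List<Integer> a = new ArrayList<>();
        a.add(42);
        a.add(81);
        a.add(17);
        a.remove(0); // an int argument removes by index
        return a;
    }

    // Returns true if get(index) falls outside the bounds of the list.
    static boolean isOutOfBounds(List<?> list, int index) {
        try {
            list.get(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> b = new LinkedList<>();
        b.add("Hello");
        b.add("World");
        for (String s : b) {
            System.out.println(s);
        }
        List<Integer> a = numbers();
        System.out.println(a.get(0) + " " + a.size()); // 81 2
        System.out.println(isOutOfBounds(a, a.size())); // true
    }
}
```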
Another type of collection supported by the Collections library is a Set. A set differs from a list in that elements are not stored in any particular order; there is no concept of the first element or last element. Moreover, a set does not allow duplicate elements.³ A commonly used implementation of the Set interface is the HashSet.
names.add("Robin");
names.add("Little John");
names.add("Marian");
³ Duplicates are determined by how the equals() and possibly the hashCode() methods are implemented in your particular objects.
Since the elements in a Set are unordered, we cannot use an index-based get() method as we did with a list. Fortunately, we can still use an enhanced for-loop to iterate over the elements.
When this code executes we cannot expect any particular order of the three names. Any
permutation of the three may be printed. If we executed the loop more than once we
may even observe a different enumeration of the names!
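A brief sketch of these set properties, using an invented class name SetDemo, shows that a duplicate is ignored and that iteration works without an index.

```java
import java.util.HashSet;
import java.util.Set;

class SetDemo {

    // Adds three names plus one duplicate; the duplicate is ignored.
    static Set<String> names() {
        Set<String> names = new HashSet<>();
        names.add("Robin");
        names.add("Little John");
        names.add("Marian");
        names.add("Robin");
        return names;
    }

    public static void main(String[] args) {
        Set<String> names = names();
        System.out.println(names.size()); // 3
        // The iteration order of a HashSet is not specified.
        for (String name : names) {
            System.out.println(name);
        }
    }
}
```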
Finally, Java also supports a Map data structure which allows you to store key-value
pairs. The keys and values can be any object type and are specified by two parameters
when you create the map. A common implementation is the HashMap .
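As an illustration of key-value pairs, the sketch below counts word occurrences with a HashMap whose keys are String objects and whose values are Integer objects. The class MapDemo and the countWords() method are hypothetical names used only here.

```java
import java.util.HashMap;
import java.util.Map;

class MapDemo {

    // Counts how many times each word occurs in the given array.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            Integer current = counts.get(word); // null if the key is absent
            if (current == null) {
                counts.put(word, 1);
            } else {
                counts.put(word, current + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String words[] = { "apple", "pear", "apple" };
        Map<String, Integer> counts = countWords(words);
        System.out.println(counts.get("apple"));          // 2
        System.out.println(counts.containsKey("banana")); // false
    }
}
```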
The Collections library is much more extensive than what we've presented here. It includes stacks, queues, hash tables, balanced binary search trees, and many other dynamic data structure implementations.
32. Strings
As we've previously seen, Java has a String class in the standard JDK. Internally, Java strings are stored as arrays of characters. However, because of the String class, we never directly interact with this representation, making using strings much easier than in other languages. Java strings have also supported Unicode since version 1.
Moreover, Java provides many methods as part of the String class that can be used to
process and manipulate strings. These methods do not change the strings since strings
in Java are immutable. Instead, these methods operate by returning a new modified
string that can then be stored in a variable.
32.1. Basics
As we've previously seen, we can declare String variables and assign them values using the regular assignment operator.
Note that the reassignment in the last line in the example does not change the original
string. It just makes the variable firstName point to a new string. If there are no other
references to the old string, it becomes eligible for garbage collection and the JVM takes
care of it.
Strings can be copied using the String class's copy constructor:
String copy = new String(firstName);
However, since strings are immutable, there is rarely reason to create such a deep copy.
Though you can't change individual characters in a string, you can access them using the charAt() method and providing an index. Characters in a string are 0-indexed just as with elements in arrays.
if(fullName.charAt(8) == 's') {
    ...
}
Length
When accessing individual characters in a string, it is necessary that we know the length
of the string so that we do not access invalid characters. The length() method returns
an int that represents the number of characters in the string.
Using this method we can easily iterate over each character in a string.
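One way such an iteration might look is sketched below, combining length() and charAt() to count how often a character occurs. The class name and the count() method are invented for this example.

```java
class StringCharsDemo {

    // Counts the occurrences of the character c in the string s.
    static int count(String s, char c) {
        int total = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String fullName = "Grace Hopper";
        System.out.println(fullName.length());    // 12
        System.out.println(count(fullName, 'p')); // 2
    }
}
```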
int x = 10;
double y = 3.14;
When used with objects, the concatenation operator ends up invoking the object's toString() method. The plus operator as a concatenation operator is actually syntactic sugar. When the code is compiled, it is replaced with equivalent code; traditionally, compilers translated it into a series of calls to the append() method of the StringBuilder class (a mutable version of the String class). The first example above may actually be replaced with the following code that does not use the concatenation operator.
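A sketch of the equivalence follows. The strings being built are assumptions made for this illustration, since they only need to show that both approaches produce the same result.

```java
class ConcatDemo {

    // Builds a string with the + concatenation operator.
    static String withOperator(int x, double y) {
        return "x = " + x + ", y = " + y;
    }

    // Builds the same string with explicit StringBuilder calls.
    static String withBuilder(int x, double y) {
        StringBuilder sb = new StringBuilder();
        sb.append("x = ");
        sb.append(x);
        sb.append(", y = ");
        sb.append(y);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withOperator(10, 3.14));
        System.out.println(withBuilder(10, 3.14));
    }
}
```

Using StringBuilder directly tends to matter most when a string is assembled inside a loop, where repeated concatenation would create many intermediate strings.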
Computing a Substring
There are two methods that allow you to compute a substring of a string. The first allows you to specify a beginning index with the entire remainder of the string being included in the returned string. The second allows you to specify a beginning index as well as an ending index. In both cases, the beginning index is inclusive (that is, the resulting string includes the character at that index), but in the second, the ending index is exclusive (it is not included).
The result of the two argument substring() method will always have length equal to
endIndex - beginIndex .
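Both forms of substring() are used in the short sketch below. The helper names firstWord() and rest() are hypothetical, and both assume the string contains at least one space.

```java
class SubstringDemo {

    // Two-argument form: from index 0 up to (not including) the space.
    static String firstWord(String s) {
        return s.substring(0, s.indexOf(' '));
    }

    // One-argument form: everything after the first space.
    static String rest(String s) {
        return s.substring(s.indexOf(' ') + 1);
    }

    public static void main(String[] args) {
        String name = "Grace Hopper";
        System.out.println(firstWord(name)); // Grace
        System.out.println(rest(name));      // Hopper
        // length equals endIndex - beginIndex, here 5 - 0
        System.out.println(firstWord(name).length());
    }
}
```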
names[0] = "Margaret Hamilton";
names[1] = "Ada Lovelace";
names[2] = "Grace Hopper";
names[3] = "Marie Curie";
names[4] = "Hedy Lamarr";
Better yet, we can use dynamic collections such as a List or a Set of strings.
names.add("Margaret Hamilton");
names.add("Ada Lovelace");
names.add("Grace Hopper");
names.add("Marie Curie");
names.add("Hedy Lamarr");

System.out.println(names.get(2));
32.4. Comparisons
When comparing strings in Java, we cannot use the comparison operators such as == or <. Because strings are objects, the == operator compares the references (memory addresses) held by the variables rather than the contents of the strings, and the < operator is not defined for strings at all.
if(a == b) {
    System.out.println("strings match!");
}
The code above will not print anything even though the strings a and b have the same
content. This is because a == b is comparing the memory address of the two variables.
Since they point to different memory addresses (created by two separate calls to the
constructors) they are not equal.
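The difference between comparing references and comparing contents can be seen in a small sketch. It assumes, as the text describes, that both strings are created with separate constructor calls; the class and method names are invented.

```java
class StringEqualityDemo {

    // Compares references: true only if both refer to the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Compares contents character by character.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String a = new String("apple");
        String b = new String("apple");
        System.out.println(sameReference(a, b)); // false
        System.out.println(sameContent(a, b));   // true
    }
}
```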
Instead, there are several comparator methods that Java provides to compare strings. Each string has a compareTo() method¹ that takes another string and returns something negative, zero, or something positive depending on the relative lexicographic ordering of the strings.
¹ This method is part of the String class due to the fact that strings implement the Comparable interface which defines a lexicographic ordering.
int x;
String a = "apple";
String b = "zelda";
String c = "Hello";
x = a.compareTo("banana"); //x is negative
x = b.compareTo("mario"); //x is positive
x = c.compareTo("Hello"); //x is zero

String d = "Apple";
x = d.compareTo("apple");
//x is negative
In the last example, "Apple" precedes "apple" since uppercase letters are ordered before lowercase letters according to the ASCII table. If we need to ignore case, we can use the compareToIgnoreCase() method, which works the same way but is case-insensitive. Here, d.compareToIgnoreCase("apple") will return zero as the two strings are the same when case is ignored.
Note that the commonly used equals() method only returns true or false depending on whether or not two strings are the same or different. It cannot be used to provide a relative ordering of two strings.
String a = "apple";
String b = "apple";
String c = "Hello";

boolean result;
result = a.equals(b); //true
result = a.equals(c); //false
32.5. Tokenizing
Recall that tokenizing is the process of splitting up a string along some delimiter. For
example, the comma delimited string, "Smith,Joe,12345678,1985-09-08" contains
four pieces of data delimited by a comma. Our aim is to split this string up into four
separate strings so that we can process each one.
Java provides a very simple method to do this called split() that takes a string
delimiter and returns an array of strings containing the tokens. For example,
The delimiter this method uses is actually a regular expression: a sequence of characters that defines a search pattern in which special characters can be used to define complex patterns. For example, the complex expression,
^[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?$
will match any valid numerical value including scientific notation. We will not cover regular expressions in depth, but to demonstrate their usefulness, here's an example by which you can split a string along any and all whitespace:
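The sketch below shows both kinds of tokenizing, plus a check against the numeric pattern quoted above. The class and method names are invented for illustration; note that each backslash in the pattern must be doubled inside a Java string literal.

```java
import java.util.Arrays;

class SplitDemo {

    // Splits a comma-delimited record into its tokens.
    static String[] splitOnCommas(String record) {
        return record.split(",");
    }

    // Splits on any run of whitespace; "\\s+" is a regular expression.
    static String[] splitOnWhitespace(String text) {
        return text.trim().split("\\s+");
    }

    // Tests a string against the numeric pattern quoted above.
    static boolean isNumeric(String s) {
        return s.matches("^[+-]?(\\d+(\\.\\d+)?|\\.\\d+)([eE][+-]?\\d+)?$");
    }

    public static void main(String[] args) {
        String data = "Smith,Joe,12345678,1985-09-08";
        System.out.println(Arrays.toString(splitOnCommas(data)));
        System.out.println(Arrays.toString(splitOnWhitespace("a  b\tc\nd")));
        System.out.println(isNumeric("-1.5e10") + " " + isNumeric("abc"));
    }
}
```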
Scanner s = null;
try {
    s = new Scanner(new File("/user/apps/data.txt"));
} catch (FileNotFoundException e) {
    //handle the exception here
}

String line;
while(s.hasNextLine()) {
    line = s.nextLine();
    //process the line
}
Once we are done reading the file, we can close the Scanner to free up resources: s.close(); . We could have placed all this code within one large try-catch block with perhaps a finally block to close the Scanner once we were done to ensure that it would be closed regardless of any exceptions. However, Java 7 introduced a new construct, the try-with-resources statement.
The try-with-resources statement allows us to place the initialization of a closeable resource (defined by the AutoCloseable interface) into the try statement. The JVM will then automatically close this resource when the try block exits, whether it completes normally or throws an exception, and before any catch or finally block runs. We can still provide a finally block if we wish, but this relieves us of the need to explicitly close the resource in a finally block.
A full example:
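A minimal sketch of reading a file with a Scanner inside a try-with-resources statement is given here. It is not the original listing: the class ReadFileDemo and its methods are invented, and a temporary file stands in for a real data file so the program is self-contained.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

class ReadFileDemo {

    // Reads every line of the file; the Scanner is closed automatically.
    static List<String> readLines(File file) {
        List<String> lines = new ArrayList<>();
        try (Scanner s = new Scanner(file)) {
            while (s.hasNextLine()) {
                lines.add(s.nextLine());
            }
        } catch (FileNotFoundException e) {
            throw new RuntimeException(e);
        }
        return lines;
    }

    // Creates a temporary file holding the given lines (illustration only).
    static File sampleFile(String... lines) {
        try {
            File file = File.createTempFile("data", ".txt");
            file.deleteOnExit();
            Files.write(file.toPath(), Arrays.asList(lines));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File file = sampleFile("first", "second");
        System.out.println(readLines(file));
    }
}
```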
Using the Scanner class to do file input offers a more abstract interaction with a file. It also uses a buffered input stream for performance. The Scanner class is designed for text, however: its nextByte() method parses a text token as a byte value rather than reading raw binary data. For binary files, the better solution is to use a class that models and abstracts the underlying file. For example, if you are reading or writing an image file, you can use the javax.imageio.ImageIO class, which reads and writes BufferedImage instances. The JDK and other libraries offer a wide variety of classes to model all kinds of data.
Again, there are several ways to achieve file output, but we'll look at the two most recommended ways. First, we describe how to do convenient plaintext output using a buffered stream for performance. Unfortunately, to do this requires the nesting of several classes, the details of each we will not go into. Essentially, we create a new FileWriter specifying the path and file name, we then wrap that in a BufferedWriter for better performance. Finally, for convenience, we wrap that into a PrintWriter which offers many convenient methods for writing primitive and String types. It also offers a printf() style method for formatting.
int x = 10;
double pi = 3.14;
FileWriter fw = null;
try {
    fw = new FileWriter("data.txt");
} catch(IOException ioe) {
    throw new RuntimeException(ioe);
}
PrintWriter pw = new PrintWriter(new BufferedWriter(fw));

pw.println("Hello World!");
pw.println(x);
pw.printf("x = %d, pi = %f\n", x, pi);

pw.close();
The close() method will conveniently close all the underlying resources (the BufferedWriter and FileWriter) for us. In addition, PrintWriter implements AutoCloseable and so it can be used in a try-with-resources statement.
Another convenience of PrintWriter is that it swallows exceptions (just as the Scanner class did). That means we don't have to deal explicitly with the checked IOExceptions that the underlying classes throw as the PrintWriter silently catches them (though doesn't handle them). However, this can also be viewed as a disadvantage in that if we want to do error handling, we need to manually check if there was an error (using checkError()).
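The same output code can be sketched with a try-with-resources statement and an explicit checkError() call. The class WriteFileDemo and its methods are invented for illustration, and a temporary file is used so that the sketch does not overwrite any real file.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.List;

class WriteFileDemo {

    // Writes two lines to the file and returns true if no error occurred.
    static boolean writeReport(File file, int x, double pi) {
        try (PrintWriter pw = new PrintWriter(
                new BufferedWriter(new FileWriter(file)))) {
            pw.println("Hello World!");
            pw.printf("x = %d, pi = %.2f%n", x, pi);
            // checkError() flushes the stream and reports swallowed errors
            return !pw.checkError();
        } catch (IOException ioe) {
            // thrown by the FileWriter constructor if the file cannot be opened
            throw new UncheckedIOException(ioe);
        }
    }

    // Creates a temporary file, writes to it, and reads the lines back.
    static List<String> roundTrip() {
        try {
            File file = File.createTempFile("report", ".txt");
            file.deleteOnExit();
            writeReport(file, 10, 3.14);
            return Files.readAllLines(file.toPath());
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```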
The PrintWriter class is intended mostly for formatted output. It does not provide a way to write binary data to an output file. Just as with binary input, it is best to use a class that abstracts the file type and data so that we don't have to deal with the low-level details of the binary data.
34. Objects
Java is a class-based object oriented programming language, meaning that it facilitates the creation of objects through the use of classes. Classes are essentially blueprints for creating instances of objects. We've been implicitly using classes all along since everything in Java must be a class or belong to a class. Now, however, we will start using classes in more depth rather than simply using static methods.
An object is an entity that is characterized by identity, state and behavior. The identity of an object is an aspect that distinguishes it from other objects. The variables within an object, together with the values they take on, make up its state. Typically the variables that belong to an object are referred to as member variables. Finally, an object may also have functions that operate on the data of an object. In the context of object oriented programming, a function that belongs to an object is referred to as a (member) method.
As a class-based object oriented language, Java implements objects using classes. A class is essentially a blueprint for creating instances of the class. A class simply specifies the member variables and member methods that belong to instances of the class. We discuss how to create and use instances of a class below. However, to begin, let's define a class that models a student by defining member variables to support a first name, last name, a unique identifier, and GPA.
To declare a class, we use the class keyword. Inside the class (denoted by curly
brackets), we place any code that belongs to the class. To declare member variables
within a class, we use the normal variable declaration syntax, but we do so outside any
methods.
package unl.cse;

public class Student {

    //member variables:
    String firstName;
    String lastName;
    int id;
    double gpa;

}
Recall that a package declaration allows you to organize classes and code within a package
(directory) hierarchy. Moreover, source code for a class must be in a source file with the
same name (and is case sensitive) with the .java extension. Our Student class would
need to be in a file named Student.java and would be compiled to a class named
Student.class .
Subclasses are involved with inheritance, another object oriented programming concept that we will not discuss here.
Modifier        Class  Package  Subclass  World
public          Y      Y        Y         Y
protected       Y      Y        Y         N
none (default)  Y      Y        N         N
private         Y      N        N         N
package unl.cse;

public class Student {

    //member variables:
    private String firstName;
    private String lastName;
    private int id;
    private double gpa;

}
34.2. Methods
The third aspect of encapsulation involves the grouping of methods that act on an object's data. Within a class, we can declare member methods using the syntax we're already familiar with. We declare a member method by providing a signature and body. We can use the same visibility keywords as with member variables in order to allow or restrict access to the methods. With methods, visibility and access determine whether or not the method may be invoked.
In contrast to the methods we defined in Chapter 29, when defining a member method, we do not use the static keyword. Making a variable or a method static means that it belongs to the class and not to instances of the class. Thus, a static method would not be able to access the member variables or methods of an instance unless it also had a reference to that instance.
Again, we add to our example by providing two public methods that compute and return a result on the member variables. We also use javadoc style comments to document each member method.
package unl.cse;

public class Student {

    //member variables:
    private String firstName;
    private String lastName;
    private int id;
    private double gpa;

    /**
     * Returns a formatted String of the Student's
     * name as Last, First.
     */
    public String getFormattedName() {
        return lastName + ", " + firstName;
    }

    /**
     * Scales the GPA, which is assumed to be on a
     * 4.0 scale to a percentage.
     */
    public double getGpaAsPercentage() {
        return gpa / 4.0;
    }

}
In the setter example, there is a problem: the code has no effect. There are two variables named firstName: the instance's member variable and the variable in the method parameter. The scoping rules of Java mean that the parameter variable name takes precedence. This code has no effect because it's essentially setting the parameter variable to itself. It is essentially doing the following.
int a = 10;
a = a;
Setting a variable to itself has no effect. To solve this, we use something called open recursion. When an instance of a class is created, for example,
Student s = ...;
the reference variable s is how we can refer to it. This variable, however, exists outside the class. Inside the class, we need a way to refer to the instance itself. In Java we use the keyword this to refer to the instance inside the class. For example, the member variables of an instance can be accessed using the this keyword along with the dot operator (more below). In our example, this.firstName would refer to the instance's firstName and not to the parameter variable. Even when it is not necessary to use the this keyword (as in the getter example above) it is still best practice to do so. Our updated getter and setter methods would thus look like the following.
One advantage to using getters and setters (as opposed to naively making everything
public) is that you can have greater control over the values that your variables can take.
For example, we may want to do some data validation by rejecting null values or
invalid values. For example:
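A sketch of getters and setters that use the this keyword and validate their input is shown below. The class name StudentSetterDemo is invented so that it does not clash with the Student class, and the choices of IllegalArgumentException and of a 0.0 to 4.0 GPA range are assumptions made for this illustration.

```java
class StudentSetterDemo {

    private String firstName;
    private double gpa;

    public String getFirstName() {
        return this.firstName;
    }

    public double getGpa() {
        return this.gpa;
    }

    // this.firstName is the member variable; firstName is the parameter.
    public void setFirstName(String firstName) {
        if (firstName == null) {
            throw new IllegalArgumentException("first name cannot be null");
        }
        this.firstName = firstName;
    }

    // Rejects values outside an assumed 0.0 to 4.0 scale.
    public void setGpa(double gpa) {
        if (gpa < 0.0 || gpa > 4.0) {
            throw new IllegalArgumentException("invalid GPA: " + gpa);
        }
        this.gpa = gpa;
    }

    public static void main(String[] args) {
        StudentSetterDemo s = new StudentSetterDemo();
        s.setFirstName("Grace");
        s.setGpa(3.9);
        System.out.println(s.getFirstName() + " " + s.getGpa());
        try {
            s.setGpa(5.0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```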
Controlling access of member variables through getters and setters is good encapsulation.
Doing so makes your code more predictable and more testable. Making your member
variables public means that any piece of code can change their values. There is no way
to do validation or prevent bad values.
In fact, it is good practice to not even have setter methods. If the value of member variables cannot be changed, it makes the object immutable. We've seen this before with built-in classes such as Integer and String. Immutability is a nice property because it makes instances of the class thread-safe. That is, we can use instances of the class in a multithreaded program without having to worry about threads changing the values of the instance on one another. Immutable classes are also safer to use in certain collections such as Sets. Elements in a Set are unique; attempting to add a duplicate element will have no effect on the Set. However, if the elements we add are mutable, we could end up with duplicates. This is because uniqueness is tested only when the element is added to the set. We could add an element that is unique, then end up changing it so that it matches another element in the Set, violating the assumption of the collection.
34.3. Constructors
If we make the (good) design decision to make our class immutable, we still need a way
to initialize the values. This is where constructors come in. A constructor is a special
method that specifies how an object is constructed. With built-in primitive variables such
as an int , the Java language (compiler and JVM) know how to interpret and assign a
value to such a variable. However, with user-defined objects such as our Student class,
we need to specify how the object is created.
A constructor method has special syntax. Though it may still use one of the visibility keywords, it has no return type and its name is the same as the class. A constructor may take any number of parameters. For example, the following constructor allows someone to construct a Student instance and specify all four member variables.
public Student() {
    this.firstName = null;
    this.lastName = null;
    this.id = 0;
    this.gpa = 0.0;
}
In both of these examples, we repeated a lot of code. One shortcut is to make all your
constructors call the most general constructor. To invoke another constructor, we use
the this keyword as a method call. For example:
public Student() {
    this(null, null, 0, 0.0);
}
Another, very useful type of constructor is the copy constructor that allows you to make
a copy of an instance by passing it to a constructor. The following example copies each
of the member variables of s into this .
public Student(Student s) {
    this(s.firstName, s.lastName, s.id, s.gpa);
}
34.4. Usage
Once we have defined our class and its constructors, we can create and use instances of
it. Just as with regular variables, we need to declare instances of a class by providing
the type and a variable name. For example:
Student s = null;
Student t = null;
Both of these declarations are simply reference variables. They may refer to an instance of the class Student, but we have initialized them to null. To make these variables refer to valid instances, we invoke a constructor by using the new keyword and providing arguments to the constructor.
System.out.println(s.getFormattedName());
takes another object obj and returns true or false depending on whether or not
the instance is equal to obj . Recall that identity is one of the defining characteristics
of objects. This method is how Java achieves identity; it defines exactly what equality
means. By default, the behavior inherited from the Object class simply checks if the
object, this and the passed object, obj are located at the same memory address,
essentially (this == obj) .
However, conceptually we may want different behavior. For example, two Student objects may be the same if they have the same id value (it is, after all, supposed to be a unique identifier). Alternatively, we may consider two objects to be the same if every member variable holds the same value. Even with only four member variables, the logic can get quite complicated, especially if we have to account for null values. For this reason, many IDEs provide functionality to automatically generate such methods. The following example was generated by an IDE.
default behavior is to return a value associated with the memory address of the instance.
In general, however, the behavior should be overridden to be based on the entire state
of the object. A hash is simply a function that maps data to a small set of values.
In this context, we are mapping object instances to integers so that they can be used
in hash table-based data structures such as HashSet or HashMap . The hashCode()
method is used to map an instance to an integer so that it can be used as an index in
these data structures. Again, most IDEs will provide functionality to generate a good
implementation for the hashCode() method (as the example below is).
However you design the equals() and hashCode() methods, there is a requirement: if two instances are equal (that is, equals() returns true) then they must have the same hashCode() value. This requirement is necessary to ensure that hash table-based data structures operate properly. Note that it is okay if unequal objects have equal or unequal hash values. This rule only applies when the objects are equal.
34.6. Composition
Another important concept when designing classes is composition. Composition is a mechanism by which an object is made up of other objects. One object is said to own an instance of another object. We've already seen this with our Student example: the Student class owns two instances of the String class.
To illustrate the importance of composition, we could extend the design of our Student class to include a date of birth. However, a date of birth is also made up of multiple pieces of data (a year, a month, a date, and maybe even a time and/or locale). We could design our own date/time class to model this, but it's generally best to use what the language already provides. Java 8 introduced the java.time package in which there are many updated and improved classes for dealing with dates and times. The class LocalDate, for example, could be used to model a date of birth:
We can take this concept further and have our own user-defined classes own instances of each other. For example, we could define a Course class and then update our Student class to own a collection of Course objects representing a student's class schedule (this type of collection ownership is sometimes referred to as aggregation rather than strict composition).
Both of these updates raise the question: who is responsible for instantiating the instances of dateOfBirth and the schedule? Should we force the outside user of our Student class to build the LocalDate instance and pass it to a constructor? Should we allow the outside code to simply provide us a date of birth as a string? Both of these are design choices that have advantages and disadvantages that have to be considered.
What about the course schedule? We could require that a user provide the constructor
with a pre-computed Set of courses, but care must be taken. Consider the following
(partial) example.
This is the same pattern we described above: almost every data structure in the Java Collections library has a copy constructor that creates a new, independent collection (though the elements themselves are shared rather than copied).
Alternatively, we could make our design a bit more flexible by allowing the construction of
a Student instance without having to provide a course schedule. Instead, we could add
a method that allowed the outside code to add a course to the schedule. Something
like the following.
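One way such a method might look is sketched below. The method names addCourse() and getNumberOfCourses(), along with the minimal Course class, are assumptions made for the sake of a self-contained example.

```java
import java.util.HashSet;
import java.util.Set;

class AddCourseDemo {
    public static void main(String[] args) {
        Student s = new Student("Ada", "Lovelace");
        s.addCourse(new Course("CSCE 155"));
        s.addCourse(new Course("MATH 106"));
        System.out.println(s.getNumberOfCourses());
    }
}

// Hypothetical, minimal Course class
class Course {
    private final String name;

    Course(String name) {
        this.name = name;
    }
}

class Student {
    private final String firstName;
    private final String lastName;
    // The Student builds its own (initially empty) schedule
    private final Set<Course> schedule = new HashSet<>();

    Student(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    // Outside code can grow the schedule one course at a time
    void addCourse(Course c) {
        this.schedule.add(c);
    }

    int getNumberOfCourses() {
        return this.schedule.size();
    }
}
```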
This adds some flexibility to our object, but removes the immutability property. Design
is always a balance and compromise between competing considerations.
34.7. Example
We present the full and completed Student class in Code Sample 34.1.
    package unl.cse;

    /**
     * Models a student with a name, an ID, and a GPA.
     */
    public class Student {

        private String firstName;
        private String lastName;
        private int id;
        private double gpa;

        public Student(String firstName, String lastName, int id, double gpa) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.id = id;
            this.gpa = gpa;
        }

        public Student() {
            this(null, null, 0, 0.0);
        }

        public String getFirstName() {
            return firstName;
        }

        public String getLastName() {
            return lastName;
        }

        // The numeric getters return wrapper types so that
        // compareTo() can be called on the result
        public Integer getId() {
            return id;
        }

        public Double getGpa() {
            return gpa;
        }

        /**
         * Returns a formatted String of the Student's
         * name as Last, First.
         */
        public String getFormattedName() {
            return lastName + ", " + firstName;
        }

        /**
         * Scales the GPA, which is assumed to be on a
         * 4.0 scale to a percentage.
         */
        public double getGpaAsPercentage() {
            return gpa / 4.0;
        }

        @Override
        public String toString() {
            return String.format("%s, %s (ID = %d); %.2f",
                                 this.lastName,
                                 this.firstName,
                                 this.id,
                                 this.gpa);
        }

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((firstName == null) ? 0 : firstName.hashCode());
            long temp;
            temp = Double.doubleToLongBits(gpa);
            result = prime * result + (int) (temp ^ (temp >>> 32));
            result = prime * result + id;
            result = prime * result + ((lastName == null) ? 0 : lastName.hashCode());
            return result;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (obj == null) {
                return false;
            }
            if (!(obj instanceof Student)) {
                return false;
            }
            Student other = (Student) obj;
            if (firstName == null) {
                if (other.firstName != null) {
                    return false;
                }
            } else if (!firstName.equals(other.firstName)) {
                return false;
            }
            if (Double.doubleToLongBits(gpa) != Double.doubleToLongBits(other.gpa)) {
                return false;
            }
            if (id != other.id) {
                return false;
            }
            if (lastName == null) {
                if (other.lastName != null) {
                    return false;
                }
            } else if (!lastName.equals(other.lastName)) {
                return false;
            }
            return true;
        }
    }
35. Recursion
As another example that actually does something useful, consider the following recursive
summation method that takes an array, and an index variable. The recursion works as
follows: if the index variable has reached the size of the array, it stops and returns zero
(the base case). Otherwise, it makes a recursive call to recSum() , incrementing the
index variable by 1. When the method returns, it adds its result to the i-th element in
the array. To invoke this method we would call it with an initial value of 0 for the index
variable: recSum(arr, 0) .
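The method described above can be sketched as follows. The method name recSum() and its parameters come from the text; the enclosing class and the sample array are assumptions for illustration.

```java
class RecursiveSumDemo {

    /**
     * Recursively sums the elements of arr from index i
     * through the end of the array.
     */
    static int recSum(int[] arr, int i) {
        // Base case: the index has reached the size of the array
        if (i == arr.length) {
            return 0;
        }
        // Recurse on the rest of the array, then add the i-th element
        return recSum(arr, i + 1) + arr[i];
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3, 4};
        System.out.println(recSum(arr, 0));
    }
}
```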
This example was not tail-recursive as the recursive call was not the final operation (the
sum was the final operation). To make this method tail recursive, we can carry the
summation through to each method call ensuring that the summation is done prior to
the recursive method call.
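A tail-recursive variant might look like the following sketch, in which a third parameter carries the running sum. The name recSumTail() and the accumulator parameter are assumptions. Note also that standard Java compilers and the HotSpot JVM do not perform tail-call optimization, so the rewrite illustrates the form rather than saving stack space.

```java
class TailRecursiveSumDemo {

    /**
     * Sums the elements of arr from index i onward, carrying the
     * partial sum along in the sum parameter.
     */
    static int recSumTail(int[] arr, int i, int sum) {
        // Base case: nothing left to add, the carried sum is the answer
        if (i == arr.length) {
            return sum;
        }
        // The addition happens before the call, so the recursive
        // call is the final operation in the method
        return recSumTail(arr, i + 1, sum + arr[i]);
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3, 4};
        System.out.println(recSumTail(arr, 0, 0));
    }
}
```

The initial call passes 0 for both the index and the running sum: recSumTail(arr, 0, 0).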
As a final example, consider the following Java implementation of the naive recursive
Fibonacci sequence. An additional condition has been included to check for invalid
negative values of n for which an exception is thrown.
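A sketch of such an implementation follows. It assumes the indexing F(0) = 0 and F(1) = 1 and uses IllegalArgumentException for negative input; both choices are assumptions, since a different indexing or exception type would work equally well.

```java
class NaiveFibonacciDemo {

    /**
     * Naive recursive Fibonacci; makes an exponential number of
     * recursive calls. Assumes F(0) = 0 and F(1) = 1.
     */
    static int fibonacci(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("Undefined for negative values");
        }
        if (n <= 1) {
            return n;
        }
        return fibonacci(n - 1) + fibonacci(n - 2);
    }

    public static void main(String[] args) {
        System.out.println(fibonacci(10));
    }
}
```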
Java is not a language that provides implicit memoization. Instead, we need to explicitly
keep track of values using a table. In the following example, we use a Map data structure
to store previously computed values.
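A minimal sketch of explicit memoization is given below. It assumes the indexing F(0) = 0 and F(1) = 1, stores values as long, and passes the table as a parameter; these are illustrative choices rather than the only reasonable design.

```java
import java.util.HashMap;
import java.util.Map;

class MemoFibonacciDemo {

    /**
     * Computes the n-th Fibonacci number, consulting the table
     * before recursing and recording each newly computed value.
     */
    static long fibonacci(int n, Map<Integer, Long> table) {
        if (n < 0) {
            throw new IllegalArgumentException("Undefined for negative values");
        }
        if (n <= 1) {
            return n;
        }
        // Reuse a previously computed value if we have one
        Long cached = table.get(n);
        if (cached != null) {
            return cached;
        }
        long result = fibonacci(n - 1, table) + fibonacci(n - 2, table);
        table.put(n, result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fibonacci(50, new HashMap<Integer, Long>()));
    }
}
```

With the table in place each value is computed only once, so the number of recursive calls grows linearly in n rather than exponentially.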
Java provides an arbitrary precision data type, BigInteger, that can be used to compute
arbitrarily large integer values. Since Fibonacci numbers grow very quickly, using an
int we could only represent up to F45. Using BigInteger we can support much
36.1. Comparators
Let's consider a generic Quick Sort algorithm as was presented in Algorithm 12.6. The
algorithm itself specifies how to sort elements, but it doesn't specify how they are ordered.
The difference is subtle but important. Essentially, Quick Sort needs to know when two
elements, a, b are in order, out of order, or equivalent in order to decide which partition
each element goes in. However, it doesn't know anything about the elements a and b
themselves. They could be numbers, they could be strings, they could be user-defined
objects.
A sorting algorithm still needs to be able to determine the proper ordering in order to
sort. In Java this is achieved by using a Comparator object, which is responsible for
comparing two elements and determining their proper order. A Comparator<T> is an
interface in Java that specifies a single abstract method:
    public int compare(T a, T b)
• A Comparator<T> is parameterized to operate on any type T
• The method takes two instances, a and b, whose type matches the parameterized
  type T
• The method returns an integer indicating the relative ordering of the two elements:
  ◦ It returns something negative, < 0, if a comes before b (that is, a < b)
  ◦ It returns zero if a and b are equal (a = b)
  ◦ It returns something positive, > 0, if a comes after b (that is, a > b)
Note that there is no guarantee on the value's magnitude: it does not necessarily return
-1 or +1; it just returns something negative or positive. We've previously seen this
pattern when comparing strings and wrapper classes. Each of the standard types
implements something similar, the Comparable interface, that specifies a compareTo()
method with the same basic contract. Strings, for example, are ordered in lexicographic
This is new syntax. When we create a Comparator in Java we are creating a new instance
of a class. However, we didn't define a class: we didn't use public class... nor did
we place it into a .java source file. Instead, this is an anonymous class declaration.
The class we've created has no name; cmpInt is the variable's name, but the class
itself has no name, it's anonymous. This syntax allows us to define and instantiate a
class inline without having to create a separate class source file. This is an advantage
because we generally do not need multiple instances of a Comparator; they would all
have the same functionality anyway. An anonymous class allows us to create ad-hoc
classes with a one-time (or one purpose) use.
Another issue with this method is that it may not be able to handle null values. When
a and b get unboxed for the comparisons, if they are null , a NullPointerException
will be thrown. We discuss how to deal with this below in Section 36.3.2.
Another issue with this Comparator is the second case where we use the equality operator
== to compare values. With the less-than operator, the two integers get unboxed and
their values are compared. However, when we use the == operator on object instances,
it is comparing memory addresses in the JVM, not the actual values. We could solve
this issue by rearranging our cases so that the equality is our final case, avoiding the
use of the equality operator. Even better, however, we can exploit the built-in natural
ordering of the integers by using the compareTo() method.
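A sketch of a Comparator written this way is shown below. The variable name cmpInt follows the text; the enclosing demo class is an assumption added so that the example is self-contained.

```java
import java.util.Comparator;

class IntegerComparatorDemo {

    // An anonymous class that orders Integers in ascending order by
    // delegating to the natural ordering defined by compareTo()
    static final Comparator<Integer> cmpInt = new Comparator<Integer>() {
        @Override
        public int compare(Integer a, Integer b) {
            return a.compareTo(b);
        }
    };

    public static void main(String[] args) {
        System.out.println(cmpInt.compare(3, 10));
        System.out.println(cmpInt.compare(1000, 1000));
        System.out.println(cmpInt.compare(10, 3));
    }
}
```

Because compareTo() compares the wrapped values, this version never applies == to two Integer references.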
What if we wanted to order integers in the opposite, descending order? We could simply
write another Comparator that reversed the ordering:
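A descending Comparator can be sketched by swapping the roles of the two arguments. The variable name cmpIntDesc and the enclosing class are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

class DescendingComparatorDemo {

    // Reverses the natural ordering by comparing b to a instead of a to b
    static final Comparator<Integer> cmpIntDesc = new Comparator<Integer>() {
        @Override
        public int compare(Integer a, Integer b) {
            return b.compareTo(a);
        }
    };

    public static void main(String[] args) {
        Integer[] arr = {4, 1, 3, 2};
        Arrays.sort(arr, cmpIntDesc);
        System.out.println(Arrays.toString(arr));
    }
}
```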
We might be tempted to instead reuse the original comparator and write the body of the
compare() method above simply as
    return cmpInt.compare(b, a);
However, Java restricts how an anonymous class may reference an outside variable like
this. An anonymous class in Java is a sort of weak closure: a function
that has a scope in which variables are bound. In this case, the anonymous class has a
reference to the cmpInt variable, but it is not necessarily bound as the variable could
be reassigned outside the Comparator. If we want to do something like this, Java requires
that a local cmpInt variable be final (or, since Java 8, effectively final, that is, never
reassigned) so that it cannot change.
To illustrate some more examples, consider the Student object we defined in Code
Sample 34.1. The following Code Samples demonstrate various ways of ordering Student
instances based on one or more of their member variable values.
    /**
     * This Comparator orders Student objects by
     * last name/first name
     */
    Comparator<Student> byName = new Comparator<Student>() {
        @Override
        public int compare(Student a, Student b) {
            if(a.getLastName().equals(b.getLastName())) {
                return a.getFirstName().compareTo(b.getFirstName());
            } else {
                return a.getLastName().compareTo(b.getLastName());
            }
        }
    };

    /**
     * This Comparator orders Student objects by
     * last name/first name in descending (Z-to-A) order
     */
    Comparator<Student> byNameDesc = new Comparator<Student>() {
        @Override
        public int compare(Student a, Student b) {
            if(b.getLastName().equals(a.getLastName())) {
                return b.getFirstName().compareTo(a.getFirstName());
            } else {
                return b.getLastName().compareTo(a.getLastName());
            }
        }
    };

    /**
     * This Comparator orders Student objects by
     * id in ascending order
     */
    Comparator<Student> byId = new Comparator<Student>() {
        @Override
        public int compare(Student a, Student b) {
            return a.getId().compareTo(b.getId());
        }
    };

    /**
     * This Comparator orders Student objects by
     * GPA in descending order
     */
    Comparator<Student> byGpa = new Comparator<Student>() {
        @Override
        public int compare(Student a, Student b) {
            return b.getGpa().compareTo(a.getGpa());
        }
    };
36.2.1. Searching
Linear Search
Java Set and List collections provide several linear search methods. Both have a
public boolean contains(Object o) method that returns true or false depending on
whether or not the object is in the collection. The List collection has an
additional public int indexOf(Object o) method that returns the index of the first
matched element and -1 if no such element is found (Sets are unordered and have no
indices, so this method would not apply).
All of these functions are equality-based and they do not take a Comparator object.
Instead, the elements are compared using the object's equals() (and possibly
hashCode()) method. The first element x such that x.equals(o) returns true is
the element that is determined to match. For this reason, it is important to override
both of these methods when designing objects (for additional discussion, see Section
36.3.3 below).
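The following sketch exercises these methods on a small collection of strings. The class name and the sample data are assumptions chosen for illustration; String already overrides equals() and hashCode(), so the searches behave as described.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LinearSearchDemo {

    static final List<String> NAMES = Arrays.asList("Alice", "Bob", "Carol", "Bob");
    static final Set<String> NAME_SET = new HashSet<>(NAMES);

    public static void main(String[] args) {
        // contains() is available on both Lists and Sets
        System.out.println(NAMES.contains("Carol"));
        System.out.println(NAME_SET.contains("Dave"));
        // indexOf() is only available on Lists; it reports the first match
        System.out.println(NAMES.indexOf("Bob"));
        System.out.println(NAMES.indexOf("Dave"));
    }
}
```

Here "Bob" appears twice in the List, and indexOf() reports only the index of the first occurrence.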
Binary Search
The Arrays and Collections classes provide many variations on methods that implement
a binary search. All of these methods assume that the array or List being
searched is sorted appropriately (you cannot use binary search on a Set as it is an
unordered collection).
The Arrays class provides several binarySearch() methods, one for each primitive
type as well as variations that search within a specified range of elements. The most
generally useful version is as follows:
    public static <T> int Arrays.binarySearch(T[] a, T key, Comparator<T> c)
That is, it takes an array of elements as well as a key and a Comparator, all of the same
type T. It returns an integer representing the index at which it finds a matching
element (if several elements match, there is no guarantee which of them is found). If
no match is found, then the method returns something negative (more precisely, it
returns -(p) - 1 where p is the insertion point: the index at which the key would be
inserted to keep the elements in order). Another version has
the same behavior but can be called without a Comparator, relying instead on the
natural ordering of elements. For this variation, the type of elements must implement
the Comparable interface.
The Collections class provides a similar method,
    public static <T> int Collections.binarySearch(List<T> list, T key, Comparator<T> c)
The only difference is that the method takes a List instead of an array. Otherwise, the
behavior is the same. We present several examples in Code Sample 36.1.
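As a brief sketch of both methods (not a reproduction of Code Sample 36.1), the following searches a sorted array and a sorted List of integers. The class name and data are assumptions; the Comparator here is simply the natural ordering.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class BinarySearchDemo {

    static final Comparator<Integer> cmp = Comparator.naturalOrder();
    // Both collections must already be sorted according to cmp
    static final Integer[] sortedArr = {2, 3, 5, 7, 11};
    static final List<Integer> sortedList = Arrays.asList(sortedArr);

    public static void main(String[] args) {
        // Found: the index of the matching element is returned
        System.out.println(Arrays.binarySearch(sortedArr, 7, cmp));
        // Not found: 6 would be inserted at index 3, so -(3) - 1 is returned
        System.out.println(Arrays.binarySearch(sortedArr, 6, cmp));
        // The Collections version takes a List instead of an array
        System.out.println(Collections.binarySearch(sortedList, 11, cmp));
    }
}
```

Searching a collection that is not sorted according to the same Comparator gives undefined results, so the sortedness requirement is not optional.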
36.2.2. Sorting
As with searching, the Arrays and Collections classes provide parameterized sorting
methods to sort arrays and Lists. In Java 6 and prior, the implementation was a hybrid
merge/insertion sort. Java 7 switched the implementation to Tim Sort. Here too, there
are several variations that rely on the natural ordering or allow you to sort a subpart of
the collection.
The Arrays class provides the following method, which sorts arrays of a particular type
with a Comparator for that type.
static <T> void sort(T[] a, Comparator<T> c)
    //sort by name:
    Collections.sort(roster, byName);
    Arrays.sort(rosterArr, byName);

    //sort by GPA:
    Collections.sort(roster, byGpa);
    Arrays.sort(rosterArr, byGpa);
Code Sample 36.2: Using Java Collections Sort Method
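For a version that can be run on its own, the sketch below sorts a small roster by GPA in descending order using both methods. The pared-down Student class, the sample data, and the helper method namesByGpa() are assumptions for illustration and stand in for the full Student class of Code Sample 34.1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortDemo {

    // Orders students by GPA, highest first
    static final Comparator<Student> byGpa = new Comparator<Student>() {
        @Override
        public int compare(Student a, Student b) {
            return Double.compare(b.gpa, a.gpa);
        }
    };

    // Sorts a copy of the roster and returns the names in sorted order
    static List<String> namesByGpa(List<Student> roster) {
        List<Student> copy = new ArrayList<>(roster);
        Collections.sort(copy, byGpa);
        List<String> names = new ArrayList<>();
        for (Student s : copy) {
            names.add(s.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Student[] rosterArr = {
            new Student("Alice", 3.1), new Student("Bob", 3.9), new Student("Carol", 2.5)
        };
        System.out.println(namesByGpa(Arrays.asList(rosterArr)));
        // Arrays.sort() sorts the array itself, in place
        Arrays.sort(rosterArr, byGpa);
        System.out.println(rosterArr[0].name);
    }
}

// A pared-down Student used only for this illustration
class Student {
    final String name;
    final double gpa;

    Student(String name, double gpa) {
        this.name = name;
        this.gpa = gpa;
    }
}
```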
As the name implies, the implementation is a balanced binary search tree, in particular a red-black
tree.
When using a SortedSet it is important that the Comparator properly orders all
elements. A SortedSet is still a Set: it does not allow duplicate elements. If the
Comparator you use returns zero when comparing an element that you attempt to insert
with an element already in the set, it will not
be inserted. To demonstrate how this might fail, consider our byName Comparator for
Student objects. Suppose two students have the same first name and last name, John
Doe and John Doe, but have different IDs (as they are different people) and maybe
different GPAs. The Comparator would return 0 for these two objects because they
have the same first/last name, even though they are distinct people. Thus, only one of
these objects could exist in the SortedSet.
To solve this problem, it is important to define Comparators that break ties
appropriately. In this case, we would want to modify the Comparator to return a comparison
of IDs if the first and last name are the same.
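The difference can be sketched with a TreeSet, one implementation of SortedSet. The pared-down Student class, the Comparator name byNameThenId, and the helper countDistinct() are assumptions for illustration.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

class TieBreakDemo {

    // Orders by last name, then first name; two John Does compare as equal
    static final Comparator<Student> byName = new Comparator<Student>() {
        @Override
        public int compare(Student a, Student b) {
            int c = a.lastName.compareTo(b.lastName);
            return (c != 0) ? c : a.firstName.compareTo(b.firstName);
        }
    };

    // Same ordering, but ties on the name are broken by comparing IDs
    static final Comparator<Student> byNameThenId = new Comparator<Student>() {
        @Override
        public int compare(Student a, Student b) {
            int c = byName.compare(a, b);
            return (c != 0) ? c : Integer.compare(a.id, b.id);
        }
    };

    // Inserts two distinct John Does and reports how many were kept
    static int countDistinct(Comparator<Student> cmp) {
        SortedSet<Student> set = new TreeSet<>(cmp);
        set.add(new Student("John", "Doe", 1234));
        set.add(new Student("John", "Doe", 5678));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct(byName));
        System.out.println(countDistinct(byNameThenId));
    }
}

// A pared-down Student used only for this illustration
class Student {
    final String firstName;
    final String lastName;
    final int id;

    Student(String firstName, String lastName, int id) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.id = id;
    }
}
```

With byName the second John Doe is silently rejected; with the tie-breaking Comparator both students are kept.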
This solution only handles null Student values, not null values within the Student
object. If the getters used in the Comparator return null values, then we could still
have NullPointerException s.
When we instantiate the two Student instances, they are distinct as their IDs are
different. So when we add them to the set, their equals() method returns false and
they are both added to the Set . However, when we change the ID using the setter
on the last line, both objects are now identical, violating the no-duplicates property of
the Set . This is not a failing of the Set class, just one of the many consequences of
designing mutable objects.
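The scenario just described can be sketched as follows. The pared-down, mutable Student class (with a setId() setter and equals()/hashCode() based on all three fields) and the helper producesDuplicates() are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MutableSetDemo {

    // Returns true if the Set ends up holding two equal elements
    static boolean producesDuplicates() {
        Student a = new Student("John", "Doe", 1234);
        Student b = new Student("John", "Doe", 5678);
        Set<Student> set = new HashSet<>();
        set.add(a);
        set.add(b);
        // Mutating b after insertion makes it equal to a
        b.setId(1234);
        return set.size() == 2 && a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(producesDuplicates());
    }
}

// A pared-down, mutable Student used only for this illustration
class Student {
    private final String firstName;
    private final String lastName;
    private int id;

    Student(String firstName, String lastName, int id) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.id = id;
    }

    void setId(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Student)) {
            return false;
        }
        Student other = (Student) obj;
        return id == other.id
            && Objects.equals(firstName, other.firstName)
            && Objects.equals(lastName, other.lastName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(firstName, lastName, id);
    }
}
```

The Set has no way of noticing the change made through the setter, which is one argument in favor of immutable objects.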
The new syntax is the use of the arrow operator, ->, to form a lambda expression. In this case, it maps a pair, (a, b), to the value of the expression

    Comparator<Student> c =
        (a, b) -> a.getLastName().compareTo(b.getLastName());
    c = c.thenComparing( (a, b) ->
        a.getFirstName().compareTo(b.getFirstName()));
    c = c.thenComparing( (a, b) -> a.getGpa().compareTo(b.getGpa()));
We can make this even more terse using method references, another new feature in Java
8.
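A sketch of the same Comparator written with method references follows. It uses the static Comparator.comparing() method together with thenComparing(); the pared-down Student class and the sample data are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MethodReferenceDemo {

    // Last name, then first name, then GPA, each using natural ordering
    static final Comparator<Student> byNameThenGpa =
        Comparator.comparing(Student::getLastName)
                  .thenComparing(Student::getFirstName)
                  .thenComparing(Student::getGpa);

    public static void main(String[] args) {
        List<Student> roster = new ArrayList<>();
        roster.add(new Student("Grace", "Hopper", 3.9));
        roster.add(new Student("Alan", "Turing", 3.8));
        roster.add(new Student("Ada", "Hopper", 3.7));
        roster.sort(byNameThenGpa);
        for (Student s : roster) {
            System.out.println(s.getLastName() + ", " + s.getFirstName());
        }
    }
}

// A pared-down Student used only for this illustration
class Student {
    private final String firstName;
    private final String lastName;
    private final Double gpa;

    Student(String firstName, String lastName, Double gpa) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.gpa = gpa;
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
    Double getGpa() { return gpa; }
}
```

Each method reference such as Student::getLastName names the getter whose value should be compared, so no explicit compare logic needs to be written.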
There are several other convenient methods provided by the updated Comparator
interface. For example, the reversed() member method returns a new Comparator
that defines the reversed order. The static methods, nullsFirst() and nullsLast()
can be used to modify a Comparator to order null values.
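These methods can be sketched on a list of strings as follows. The class name, field names, and the helper sorted() are assumptions for illustration; reversed(), nullsFirst(), and naturalOrder() are the standard Comparator methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ComparatorUtilitiesDemo {

    static final Comparator<String> natural = Comparator.naturalOrder();
    // A new Comparator that defines the opposite order
    static final Comparator<String> descending = natural.reversed();
    // A new Comparator that places null values before all others
    static final Comparator<String> nullsFirst = Comparator.nullsFirst(natural);

    // Returns a sorted copy of the given list
    static List<String> sorted(List<String> items, Comparator<String> cmp) {
        List<String> copy = new ArrayList<>(items);
        copy.sort(cmp);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList("pear", "apple", "fig"), descending));
        System.out.println(sorted(Arrays.asList("pear", null, "apple"), nullsFirst));
    }
}
```

Sorting the second list with the plain natural ordering would instead throw a NullPointerException when the null element is compared.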
Part III.
The PHP Programming Language
37. Basics
Back in the mid-1990s the world-wide web was in its infancy but becoming more and more
popular. For the most part, web pages contained static content: articles and text that
was just-there. Web pages were far from the fully interactive and dynamic applications
that they've become. Rasmus Lerdorf had a home page containing his resume and he
wanted to track how many visitors were coming to his page. With purely static pages,
this was not possible. So, in 1994 he developed PHP/FI which stood for Personal Home
Page tools and Forms Interpreter. This was a series of binary tools written in C that
operated through a web server's Common Gateway Interface. When a user visited a
webpage, instead of just retrieving static content, a script was invoked: it could do a
number of operations and send back HTML formatted as a response. This made web
pages much more dynamic: by serving it through a script, a counter could be maintained
that tracked how many people had visited the site. Lerdorf eventually released his source
code in 1995 and it became widely used.
Today, PHP is used in a substantial number of pages on the web and is used in many
Content Management System (CMS) applications such as Drupal and WordPress. Because
of its history, much of the syntax and aspects of PHP (now referred to as PHP: Hypertext
Preprocessor) are influenced or inspired by C. In fact, many of the standard functions
available in the language are directly from the C language.
As a scripting language, PHP is not compiled: instead, PHP code is interpreted by a
PHP interpreter. Though there are several interpreters available, the de facto PHP
interpreter is the Zend Engine, a free and open source project that is widely available on
many platforms, operating systems, and web servers.
Because it was originally intended to serve web pages, PHP code can be interleaved with
static HTML tags. This allows PHP code to be embedded in web pages and dynamically
interpreted/rendered when a user requests a webpage through a web browser. Though
rendering web pages is its primary purpose, PHP can be used as a general scripting
language from the command line (which is how we'll present it here).
Sample 37.1. A version in which the PHP code is interleaved in HTML is presented in
Code Sample 37.2.
    <?php

    printf("Hello World\n");

    ?>
Code Sample 37.1: Hello World Program in PHP

    <html>
      <head>
        <title>Hello World PHP Page</title>
      </head>
      <body>
        <h1>A Simple PHP Script</h1>
        <?php
          printf("<p>Hello World</p>");
        ?>
      </body>
    </html>
Code Sample 37.2: Hello World Program in PHP with HTML
We will not focus on any particular development environment, code editor, or any
particular operating system, compiler, or ancillary standards in our presentation. However,
as a first step, you should be able to write and run the above program on the environment
you intend to use for the rest of this book. This may require that you download and
install a basic PHP interpreter/development environment (such as a standard LAMP,
WAMP or MAMP technology stack) or a full IDE (such as Eclipse, PhpStorm, etc.).
37.2.3. Libraries
PHP has many built-in functions that you can use. These standard libraries are loaded
and available to use without any special command to import or include them. Full
documentation on each of these functions is maintained in the PHP manual, available
online at http://php.net/manual/en/index.php. The manual's web pages also contain
many curated comments from PHP developers which contain further explanations, tips
& tricks, suggestions, and sample code to provide further assistance.
There are many useful functions that we'll mention as we progress through each topic.
One especially useful collection of functions is the math library. This library is directly
adapted from C's standard math library so all the function names are the same. It
provides many common mathematical functions such as the square root and natural
logarithm. Table 37.1 highlights several of these functions; full documentation can be
found in the PHP manual (http://php.net/manual/en/ref.math.php). To use these,
you simply call them by providing input and getting the output. For example:
    $x = 1.5;
In both of the function calls above, the value of the variable $x is passed to the math
function which computes and returns the result which then gets assigned to another
variable.
37.2.4. Comments
Comments can be written in PHP code either as a single line using two forward slashes,
//comment, or as a multiline comment using a combination of forward slash and asterisk:
/* comment */. With a single line comment, everything on the line after the forward
slashes is ignored. With a multiline comment, everything in between the opening /* and
the closing */ is ignored. Comments are ultimately ignored by the interpreter, so the
number of comments has no effect on how the code executes. Consider the
following example.
    /*
      This is a comment that can
      span multiple lines to format the comment
      message more clearly
    */
    $y = 3.14;
Most code editors and IDEs will present comments in a special color or font to distinguish
them from the rest of the code (just as our example above does). Failure to close a
multiline comment will likely result in a syntax error, but with color-coded comments
it's easy to see the mistake visually.

    Function      Description
    abs($x)       Absolute value, |x|
    ceil($x)      Ceiling function
    floor($x)     Floor function
    cos($x)       Cosine function (a)
    sin($x)       Sine function (a)
    tan($x)       Tangent function (a)
    exp($x)       Exponential function, e^x
    log($x)       Natural logarithm, ln(x) (b)
    log10($x)     Logarithm base 10 (b)
    pow($x,$y)    The power function, x^y (c)
    sqrt($x)      Square root function
Table 37.1.: Several functions defined in the PHP math library. (a) All trigonometric
functions assume input is in radians, not degrees. (b) Input is assumed to be
positive, x > 0. (c) Alternatively, PHP supports exponentiation by using
x ** y.
37.3. Variables
PHP is a dynamically typed language. As a consequence, you do not declare variables
before you start using them. If you need to store a value into a variable, you simply
name the variable and use an assignment operator to assign the value. Since you do
not declare variables, you also do not specify a variable's type. If you assign a string
to a variable, its type becomes a string. If you assign an integer to a variable, its type
becomes an integer. If you reassign the value of a variable to a value with a different
type, the variable's type also changes.
Internally, however, PHP does support several different types: Booleans, integers,
floating-point numbers, strings, arrays, and objects.
The way that integers are represented is platform dependent: they are signed
two's complement integers, 32 bits wide on 32-bit platforms (able to represent integers
between -2,147,483,648 and 2,147,483,647) and typically 64 bits wide on 64-bit platforms.
Floating-point numbers are also platform-dependent, but are usually 64-bit double
precision numbers as defined by the IEEE 754 standard, providing about 16 digits of
precision.
Strings and single characters are the same thing in PHP. Strings are represented as
sequences of characters from the extended ASCII text table (see Table 2.4) which
includes all characters in the range 0 to 255. PHP does not have native Unicode support
for international characters.
To use a variable in PHP, you simply need to assign a value to a named variable identifier
and the variable comes into scope. Variable names always begin with a single dollar sign,
$. The assignment operator is a single equal sign, =, and is a right-to-left assignment.
That is, the variable that we wish to assign the value to appears on the left-hand side
while the value (literal, variable or expression) is on the right-hand side. For example:
    $numUnits = 42;
    $costPerUnit = 32.79;
    $firstInitial = "C";
Each assignment also implicitly changes the variable's type. The variables
above become an integer, a floating-point number, and a string respectively. Assignment
statements are terminated by a semicolon like most executable statements in PHP. The
identifier rules are fairly standard: a variable's name can consist of lower and uppercase
alphabetic characters, numbers, and underscores. You can also use the extended ASCII
character set in variable names but it is not recommended (umlauts and other diacritics
can easily be confused). Variable names are case sensitive. As previously mentioned,
variable names must always begin with a dollar sign, $. Stylistically, we adopt the
modern camelCasing naming convention for variables in our code.
If you do not assign a value to a variable, that variable remains undefined or unset.
Undefined variables are treated as null in PHP. The concept of null refers to
uninitialized, undefined, empty, missing, or meaningless values. In PHP the keyword
null is used which is case insensitive (null, Null and NULL are all the same), but
for consistency, we'll use null. When null values are used in arithmetic expressions,
null is treated as zero. So, (10 + null) is equal to 10. When null is used in the
context of strings, it is treated as an empty string and ignored. When used in a Boolean
expression or conditional, null is treated as false.
PHP also allows you to define constants: values that cannot be changed once set. To
define a constant, you invoke a function named define and provide a name and value.
Examples:
    define("PI", 3.14159);
    define("INSTITUTION", "University of Nebraska-Lincoln");
    define("COST_PER_UNIT", 2.50);
Constant names are case sensitive. By convention, we use uppercase underscore casing.
An attempt to redefine a constant value will raise a script warning, but will ultimately
have no effect. When referring to constants later on in the script, you use the constant's
name. You do not treat it as a string, nor do you use a dollar sign. For example:
    $area = $r * $r * PI;
37.4. Operators
PHP supports the standard arithmetic operators for addition, subtraction, multiplication,
and division using +, -, *, and / respectively. Each of these operators is a binary
operator that acts on two operands which can either be literals or other variables and
follow the usual rules of arithmetic when it comes to order of precedence (multiplication
and division before addition and subtraction).
    $x = 1.5;
    $y = 3.4;
    $z = 10.5;
    $w = $x + 5.0;
    $w = $x + $y;
    $w = $x - $y;
    $w = $x + $y * $z;
    $w = $x * $y;
    $w = $x / $y;
PHP also supports the integer remainder operator using the % symbol. This operator
gives the remainder of the result of dividing two integers. Examples:
    $x = 10 % 5; //x is 0
    $x = 10 % 3; //x is 1
    $x = 29 % 5; //x is 4
PHP does allow you to add two arrays together which results in their union.
called type juggling. When juggled, an attempt is made to convert the string variable into
a numeric value by parsing it. The parsing goes over each numeric character, converting
the value to a numeric type (either an integer or floating-point number). The first time a
non-numeric character is encountered, the parsing stops and the value parsed so far is
the value used in the expression. (As of PHP 8 this behavior is stricter: arithmetic on a
string that merely begins with a number raises a warning, and arithmetic on a wholly
non-numeric string throws a TypeError.)
Consider the following examples in Code Sample 37.3. In the first block, $a is type
juggled to the value 10 when added to 5. In the second example, $a represents a
floating-point number, and is converted to 3.14; the result of adding to 5 is thus 8.14. In
the third example, the string does not contain any numerical values. In this case, the
parsing stops at the first character and what has been parsed so far is zero! Finally, in
the last example, the first two characters in $a are numeric, so the parsing ends at the
third character, and what has been parsed so far is 10.
    $a = "10";
    $b = 5 + $a;
    print $b; //b = 15

    $a = "3.14";
    $b = 5 + $a;
    print "b = $b"; //b = 8.14

    $a = "ten";
    $b = 5 + $a;
    print "b = $b"; //b = 5
Relying on type juggling to convert values can be ugly and error prone. You can write
much more intentional code by using the several conversion functions provided by PHP.
For example:
    $a = intval("10");
    $b = floatval("3.14");
    $c = intval("ten"); //c has the value zero
In all three of the examples above, the strings are converted just as they are when type
juggled. However, the variables are guaranteed to have the type indicated (integer or
floating-point number).
There are several utility functions that can be used to help determine the type of a variable.
The function is_numeric($x) returns true if $x is a numeric (integer or floating-point
number) or represents a pure numeric string. The functions is_int($x) and
is_float($x) each return true or false depending on whether or not $x is of that
type. For example:
    $a = 10;
    $b = "10";
    $c = 3.14;
    $d = "3.14";
    $e = "hello";
    $f = "10foo";

    is_numeric($a); //true
    is_numeric($b); //true
    is_numeric($c); //true
    is_numeric($d); //true
    is_numeric($e); //false
    is_numeric($f); //false

    is_int($a); //true
    is_int($b); //false
    is_int($c); //false
    is_int($d); //false
    is_int($e); //false
    is_int($f); //false

    is_float($a); //false
    is_float($b); //false
    is_float($c); //true
    is_float($d); //false
    is_float($e); //false
    is_float($f); //false
A more general way to determine the type of a variable is to use the function gettype($x)
which returns a string representation of the type of the variable $x . The string returned
by this function is one of the following depending on the type of $x : "boolean" ,
"integer" , "double" , "string" , "array" , "object" , "resource" , "NULL" , or
"unknown type" .
Other checker functions allow you to determine if a variable has been set, if it's null,
empty etc. For example, is_null($x) returns true if $x is not set or is set, but
has been set to null. The function isset($x) returns true only if $x is set and it
is not null. The function empty($x) returns true if $x represents an empty entity:
an empty string, false, an empty array, null, "0", 0, or an unset variable. Several

    isset($var)    empty($var)    is_null($var)
    bool(true)     bool(false)    bool(false)
    bool(true)     bool(true)     bool(false)
    bool(true)     bool(false)    bool(false)
    bool(true)     bool(true)     bool(false)
    bool(true)     bool(false)    bool(false)
    bool(true)     bool(true)     bool(false)
    bool(false)    bool(true)     bool(true)
    bool(true)     bool(true)     bool(false)
    bool(true)     bool(true)     bool(false)
    bool(true)     bool(true)     bool(false)
    bool(false)    bool(true)     bool(true)
    bool(true)     bool(false)    bool(false)

$s = "Hello";
$t = "World!";
$msg = $s . " " . $t; //msg contains "Hello World!"
Another way you can combine strings is by placing variable values directly in a string.
The variables inside the string are replaced with the variables' values. Example:
    $x = 13;
    $name = "Starlin";
    $msg = "Hello, $name, your number is $x";
    //msg contains the string "Hello, Starlin, your number is 13"
function to allow formatted output. Some examples:
    $a = 10;
    $b = 3.14;

    print $a;
    print "The value of a is $a\n";

    echo $a;
    echo "The value of a is $a\n";
There are also several ways to perform standard input, but the easiest is to use fgets
(short for file get string) using the keyword STDIN (Standard Input). This function
will return, as a string, everything the user enters up to and including the enter key
(interpreted as the endline character, \n). To remove the endline character, you can use
another function, trim, which removes leading and trailing whitespace from a string. A
full example:
37.6. Examples
37.6.1. Converting Units
Let's start with a simple task: let's write a program that will prompt the user to enter a
temperature in degrees Fahrenheit and convert it to degrees Celsius using the formula
$$C = (F - 32) \cdot \frac{5}{9}$$
We begin with the basic script shell with the opening and closing PHP tags and some
comments documenting the purpose of our script.
    <?php

    /**
     * This program converts Fahrenheit temperatures to
     * Celsius
     */

    ?>
It is common for programmers to use a comment along with a TODO note to themselves
as a reminder of things that they still need to do with the program.
Let's first outline the basic steps that our program will go through:
1. We'll first prompt the user for input, asking them for a temperature in Fahrenheit
2. Next we'll read the user's input, likely into a floating-point number as degrees can
be fractional
3. Once we have the input, we can calculate the degrees Celsius by using the formula
above
4. Lastly, we will want to print the result to the user to inform them of the value
Sometimes it's helpful to write an outline of such a program directly in the code using
comments to provide a step-by-step process. For example:
<?php

/**
 * This program converts Fahrenheit temperatures to
 * Celsius
 */

//1. Prompt the user for input in Fahrenheit
//2. Read the Fahrenheit value from the standard input
//3. Compute the degrees Celsius
//4. Print the result to the user

?>
As we read each step it becomes apparent that we'll need a couple of variables: one to
hold the Fahrenheit (input) value and one for the Celsius (output) value. We'll want to
ensure that these are floating-point numbers, which we can do by making an explicit
conversion.
We'll use a printf statement in the first step to prompt the user for input:
printf("Please enter degrees in Fahrenheit: ");
In the second step, we'll use the standard input to read the $fahrenheit variable value
from the user. Recall that we can use fgets to read from the standard input, but may
have to trim the trailing whitespace.
$fahrenheit = trim(fgets(STDIN));
If we want to ensure that the variable $fahrenheit is a floating-point value, we can
further use floatval() :
$fahrenheit = floatval($fahrenheit);
We can now compute $celsius using the formula provided:
$celsius = ($fahrenheit - 32) * (5 / 9);
Finally, we use printf again to output the result to the user:
printf("%f Fahrenheit is %f Celsius\n", $fahrenheit, $celsius);
The full program can be found in Code Sample 37.4.
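<?php

/**
 * This program converts Fahrenheit temperatures to
 * Celsius
 */

//1. Prompt the user for input in Fahrenheit
printf("Please enter degrees in Fahrenheit: ");

//2. Read the Fahrenheit value from the standard input
$fahrenheit = floatval(trim(fgets(STDIN)));

//3. Compute the degrees Celsius
$celsius = ($fahrenheit - 32) * (5 / 9);

//4. Print the result to the user
printf("%f Fahrenheit is %f Celsius\n", $fahrenheit, $celsius);

?>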
Code Sample 37.4: Fahrenheit-to-Celsius Conversion Program in PHP
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]
As before, we can create a basic program with PHP tags and start filling in the details.
In particular, we'll need to prompt for the input a, then read it in; then prompt for b,
read it in and repeat for c. Thus, we have
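A minimal sketch of these prompt-and-read steps is given below. The helper name readCoefficient() and the prompt wording are assumptions made for illustration, and an in-memory stream plays the role of the user so the sketch runs unattended; an interactive program would read from STDIN instead.

```php
<?php
/**
 * Prints a prompt, then reads one line from the given stream and
 * converts it to a floating-point number.
 * The name readCoefficient is a hypothetical helper.
 */
function readCoefficient($prompt, $stream) {
  printf("%s", $prompt);
  return floatval(trim(fgets($stream)));
}

//stand-in for the keyboard: the "user" enters 1, -3 and 2
$in = fopen("php://memory", "w+");
fwrite($in, "1\n-3\n2\n");
rewind($in);

$a = readCoefficient("Please enter a: ", $in);
$b = readCoefficient("Please enter b: ", $in);
$c = readCoefficient("Please enter c: ", $in);
fclose($in);

printf("\na = %f, b = %f, c = %f\n", $a, $b, $c);
```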
Now to compute the roots: we need to take care that we correctly adapt the formula
so it accurately reflects the order of operations. We also need to use the math library's
square root function (unless you want to write your own!). Carefully adapting the formula
leads to
$root1 = (-$b + sqrt($b*$b - 4*$a*$c) ) / (2*$a);
$root2 = (-$b - sqrt($b*$b - 4*$a*$c) ) / (2*$a);
Finally, we print the output using printf . The full program can be found in Code
Sample 37.5.
<?php

/**
 * This program computes the roots to a quadratic equation
 * using the quadratic formula.
 */

?>
Code Sample 37.5: Quadratic Roots Program in PHP
This program was interactive. As an alternative, we could have read all three of the inputs
as command line arguments, taking care that we need to convert them to floating-point
numbers. Lines 8-13 in the program could have been changed to
$a = floatval($argv[1]);
$b = floatval($argv[2]);
$c = floatval($argv[3]);
Finally, think about the possible inputs a user could provide that may cause problems
for this program. For example:
• What if the user entered zero for a?
• What if the user entered some combination such that b^2 < 4ac?
• What if the user entered non-numeric values?
• For the command line argument version, what if the user provided fewer than three
arguments? Or more?
How might we prevent the consequences of such bad inputs? That is, how might we
handle the event that a user enters those bad inputs, and how do we communicate these
errors to the user? To do so we'll need conditionals.
38. Conditionals
PHP supports the basic if, if-else, and if-else-if conditional structures as well as switch
statements. Logical statements are built using the standard logical operators for numeric
comparisons as well as logical operators such as negations, And, and Or.
$a = 10;
$b = 20;
$c = 20;

$r = ($a < $b);  //true
$r = ($a <= $b); //true
$r = ($b <= $c); //true
$r = ($a > $b);  //false
$r = ($a >= $b); //false
$r = ($b >= $c); //true
When these operators are used to compare strings to strings, the strings are compared
lexicographically according to the standard ASCII text table. Some examples follow, but
it is better to use a function (in particular strcmp which we discuss later) to do string
comparisons.
$s = "aardvark";
$t = "zebra";

$r = ($s < $t);  //true
$r = ($s <= $t); //true
$r = ($s >= $t); //false
$r = ($s > $t);  //false
However, when these operators are used to compare strings to numeric types, the strings
are converted to numbers using the same type juggling that happens when strings are
mixed with arithmetic operators. In the following example, $b gets converted to a
numeric type when compared to $a which give the results indicated in the comments.
$a = 10;
$b = "10";

$r = ($a <= $b); //true
$r = ($a < $b);  //false
$r = ($a >= $b); //true
$r = ($a > $b);  //false
With the equality operators, == and !=, something similar happens. When the types of
the two operands match, the expected comparison is made: when numbers are compared
to numbers their values are compared; when strings are compared to strings, their content
is compared (case sensitively). However, when the types are different, again, type juggling
happens and strings are converted to numbers for the purpose of comparison. Thus, a
comparison like (10 == "10") ends up being true! The operators == and != are
referred to as loose comparison operators because of this.
What if we want to ensure that we're comparing apples to apples? To rectify this, PHP
offers another set of comparison operators, strict comparison operators, === and !==
(the same, but with an extra equals sign, =). These operators will make a comparison
without type juggling either operand first. Now a similar comparison, (10 === "10"),
ends up evaluating to false. The operator === will only evaluate to true if both the
operands' types and values are the same.
$a = 10;
$b = "10";

$r = ($a == $b);  //true
$r = ($a != $b);  //false
$r = ($a === $b); //false
$r = ($a !== $b); //true
The three basic logical operators, not ! , And && , and Or || are also supported.
Order  Operators        Associativity  Notes
1      ++ --            left-to-right  increment operators
2      - !              right-to-left  unary negation operator, logical not
3      * / %            left-to-right  multiplication, division, modulus
4      + -              left-to-right  addition, subtraction
5      < <= > >=        left-to-right  comparison
6      == != === !==    left-to-right  equality, inequality
7      &&               left-to-right  logical And
8      ||               left-to-right  logical Or
9      = += -= *= /=    right-to-left  assignment and compound assignment operators
Table 38.1.: Operator Order of Precedence in PHP. Operators on the same level have
equivalent order and are performed in the associative order specified.
$x = 15;
if($x < 10); {
printf("x is less than 10\n");
}
This PHP code will run without error or warning. However, it will end up printing
x is less than 10, even though x = 15! Recall that a conditional statement binds
to the executable statement or code block immediately following it. In this case, we've
provided an empty executable statement ended by the semicolon. The code is essentially
equivalent to
$x = 15;
if($x < 10) {
}
printf("x is less than 10\n");
Which is obviously not what we wanted. The conditional ended up binding to the empty
executable statement, and the code block containing the print statement immediately
followed, but was not bound to the conditional statement, which is why the print statement
executed regardless of the value of x.
//example of an if statement:
if($x < 10) {
  printf("x is less than 10\n");
}
Another convention that we've used in our code is where we have placed the curly brackets.
First, if a conditional statement is bound to only one statement, the curly brackets are
not necessary. However, it is best practice to include them even if they are not necessary
and we'll follow this convention. Second, the opening curly bracket is on the same line as
the conditional statement while the closing curly bracket is indented to the same level
as the start of the conditional statement. Moreover, the code inside the code block is
indented. If there were more statements in the block, they would have all been at the
same indentation level.
38.3. Examples
38.3.1. Computing a Logarithm
The logarithm of x is the exponent that some base must be raised to in order to get x. The
most common logarithm is the natural logarithm, ln(x), which is base e = 2.71828.... But
logarithms can be in any base b > 1.¹ What if we wanted to compute log_2(x), or a
logarithm in some other base? Let's write a program that will prompt the user for a
number x and a base b and computes log_b(x).
Arbitrary bases can be computed using the change of base formula:
\[ \log_b(x) = \frac{\log_a(x)}{\log_a(b)} \]
If we can compute some base a, then we can compute any base b. Fortunately we have
such a solution. Recall that the standard library provides a function to compute the
natural logarithm, log(). This is one of the fundamentals of problem solving: if a
solution already exists, use it. In this case, a solution exists for a different, but similar
problem (computing the natural logarithm), but we can adapt the solution using the
change of base formula. In particular, if we have variables $b (base) and $x, we can
compute log_b(x) using
log($x) / log($b)
But wait: we have a problem similar to the examples in the previous section. The user
could enter invalid values such as b = -10 or x = -2.54 (logarithms are undefined
for non-positive values in any base). We want to ensure that b > 1 and x > 0. With
conditionals, we can now do this. Once we have read in the input from the user we can
make a check for good input using an if statement.
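A check of this kind might look like the following minimal sketch. The example values for $b and $x and the wording of the error message are assumptions chosen for illustration; in the real program the values would come from the user.

```php
<?php
$b = 2.0; //example base
$x = 8.0; //example value

//a single check that rejects any bad input
if($b <= 1 || $x <= 0) {
  printf("Error: the base must be greater than 1 and x must be positive\n");
  exit(1);
}

//only reached when the input is good
$result = log($x) / log($b);
printf("log base %f of %f is %f\n", $b, $x, $result);
```

Changing $x to a non-positive value, or $b to a value of at most 1, causes the script to print the error message and stop before the logarithm is ever computed.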
This code has something new: exit(1) . The exit function immediately terminates
the script regardless of the rest of the code that may remain. The argument passed to
exit is an integer that represents an error code. The convention is that zero indicates
no error while non-zero values indicate some error. This is a simple way of performing
error handling: if the user provides bad input, we inform them and quit the program,
forcing them to run it again and provide good input. By prematurely terminating the
program we avoid any illegal operation that would give a bad result.
Alternatively, we could have split the conditions into two statements and given a more
descriptive error message. We use this design in the full program which can be found in
Code Sample 38.2. The program also takes the input as command line arguments. Now
that we have conditionals, we can actually check that the correct number of arguments
was provided by the user and quit in the event that they don't provide the correct
number.
¹ Bases can also be 0 < b < 1, but we'll restrict our attention to increasing functions only.
<?php

/**
 * This program computes the logarithm base b (b > 1)
 * of a given number x > 0
 */

if($argc != 3) {
  printf("Usage: %s b x \n", $argv[0]);
  exit(1);
}

$b = floatval($argv[1]);
$x = floatval($argv[2]);

if($x <= 0) {
  printf("Error: x must be greater than zero\n");
  exit(1);
}
if($b <= 1) {
  printf("Error: base must be greater than one\n");
  exit(1);
}

?>
Code Sample 38.2: Logarithm Calculator Program in PHP
$income = floatval(trim(fgets(STDIN)));
Next, we can code a series of if-else-if statements for the income range. By placing the
ranges in increasing order, we only need to check the upper bounds just as in the original
example.
Next we compute the child tax credit, taking care that it does not exceed $3,000. A
conditional based on the number of children should suffice as at this point in the program
we already know it is zero or greater.
if($numChildren <= 3) {
  $credit = $numChildren * 1000;
} else {
  $credit = 3000;
}
Finally, we need to ensure that the credit does not exceed the total tax liability (the
credit is non-refundable, so if the credit is greater, the tax should only be zero, not
negative).
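One way to express this final step is sketched below. The variable names $baseTax, $credit, and $totalTax follow the output statements of the full program; the dollar amounts are made-up example values.

```php
<?php
$baseTax = 2500.00; //example tax before the credit is applied
$credit = 3000.00;  //example child tax credit

if($credit > $baseTax) {
  //the credit is non-refundable: the tax cannot go below zero
  $totalTax = 0;
} else {
  $totalTax = $baseTax - $credit;
}

printf("Tax Liability: $%10.2f\n", $totalTax);
```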
The full program is presented in Code Sample 38.3.
<?php
//prompt for income from the user
printf("Please enter your Adjusted Gross Income: ");

$income = floatval(trim(fgets(STDIN)));

if($numChildren <= 3) {
  $credit = $numChildren * 1000;
} else {
  $credit = 3000;
}

printf("AGI:           $%10.2f\n", $income);
printf("Tax:           $%10.2f\n", $baseTax);
printf("Credit:        $%10.2f\n", $credit);
printf("Tax Liability: $%10.2f\n", $totalTax);

?>
<?php

/**
 * This program computes the roots to a quadratic equation
 * using the quadratic formula.
 */

if($argc != 4) {
  printf("Usage: %s a b c\n", $argv[0]);
  exit(1);
}

$a = floatval($argv[1]);
$b = floatval($argv[2]);
$c = floatval($argv[3]);

//loose comparison: $a is a float, so ($a === 0) would never be true
if($a == 0) {
  printf("Error: a cannot be zero\n");
  exit(1);
} else if($b*$b < 4*$a*$c) {
  printf("Error: cannot handle complex roots\n");
  exit(1);
} else if($b*$b === 4*$a*$c) {
  $root1 = -$b / (2*$a);
  printf("Only one distinct root: %f\n", $root1);
} else {
  $root1 = (-$b + sqrt($b*$b - 4*$a*$c) ) / (2*$a);
  $root2 = (-$b - sqrt($b*$b - 4*$a*$c) ) / (2*$a);
  printf("Roots: %f, %f\n", $root1, $root2);
}

?>
Code Sample 38.4: Quadratic Roots Program in PHP With Error Checking
39. Loops
PHP supports while loops, for loops, and do-while loops using the keywords while , for ,
and do (along with another while ). Continuation conditions for loops are enclosed
in parentheses, (...) and the blocks of code associated with the loop are enclosed in
curly brackets.
$i = 1; //Initialization
while($i <= 10) { //continuation condition
  //perform some action
  $i++; //iteration
}
Code Sample 39.1: While Loop in PHP
In addition, the continuation condition does not contain a semicolon since it is not an
executable statement. Just as with an if-statement, if we had placed a semicolon it would
have led to unintended results. Consider the following:
while($i <= 10); { //continuation condition
  //perform some action
  $i++; //iteration
}
A similar problem occurs: the while keyword and continuation condition bind to
the next executable statement or code block. As a consequence of the semicolon, the
executable statement that gets bound to the while loop is empty. What happens is
even worse: the program will enter an infinite loop. To see this, the code is essentially
equivalent to the following:
while($i <= 10) {
}
{
  //perform some action
  $i++; //iteration
}
In the while loop, we never increment the counter variable $i, the loop does nothing,
and so the computation will continue on forever! Some development tools will warn you
about this, others will not. It is valid PHP and will run, but obviously won't work as
intended. Avoid this problem by using proper syntax.
Another common use case for a while loop is a flag-controlled loop in which we use a
Boolean flag rather than an expression to determine if a loop should continue or not.
Since PHP has built-in Boolean types, we can use a variable along with the keywords
true and false appropriately. An example can be found in Code Sample 39.2.
$i = 1;
$flag = true;
while($flag) {
  //perform some action
  $i++; //iteration
  if($i>10) {
    $flag = false;
  }
}
Code Sample 39.2: Flag-controlled While Loop in PHP
for($i=1; $i<=10; $i++) {
  //perform some action
}
Code Sample 39.3: For Loop in PHP
$i = 1;
do {
  //perform some action
  $i++;
} while($i <= 10);
Code Sample 39.4: Do-While Loop in PHP
Note the syntax and style: the opening curly bracket is again on the same line as the
keyword do . The while keyword and continuation condition are on the same line as
the closing curly bracket. In a slight departure from consistent syntax, a semicolon does
appear at the end of the continuation condition even though it is not an executable
statement.
1 Actually, PHP supports associative arrays, which are not the same thing as traditional arrays.
In the foreach syntax we specify the array we want to iterate over, $arr, and use the
keyword as. The last element in the statement is the variable name that we want to use
within the loop. This should be read as "foreach element $x in the array $arr ...".
Inside the loop, the variable $x will be automatically updated on each iteration to the
next element in $arr.
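The foreach syntax described here can be sketched as follows. The array contents and the running total are example choices made for illustration.

```php
<?php
$arr = array(10, 20, 30);
$sum = 0;

//read as: "foreach element $x in the array $arr ..."
foreach($arr as $x) {
  printf("%d\n", $x);
  $sum += $x;
}

printf("sum = %d\n", $sum);
```

No index variable is needed: on each iteration $x holds the next element of $arr.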
39.5. Examples
39.5.1. Normalizing a Number
Let's revisit the example from Section 4.1.1 in which we normalize a number by continually
dividing it by 10 until it is less than 10. The code in Code Sample 39.5 specifically refers
to the value 32145.234 but would work equally well with any value of $x.
$x = 32145.234;
$k = 0;
while($x >= 10) {
  $x = $x / 10;
  $k++;
}
Code Sample 39.5: Normalizing a Number with a While Loop in PHP
39.5.2. Summation
Let's revisit the example from Section 4.2.1 in which we computed the sum of integers
1 + 2 + ... + 10. The code is presented in Code Sample 39.6.
$sum = 0;
for($i=1; $i<=10; $i++) {
  $sum += $i;
}
Code Sample 39.6: Summation of Numbers using a For Loop in PHP
Of course we could easily have generalized the code somewhat. Instead of computing a
sum up to a particular number, we could have written it to sum up to another variable
$n , in which case the for loop would instead look like the following.
for($i=1; $i<=$n; $i++) {
  $sum += $i;
}
$n = 10;
$m = 20;
for($i=0; $i<$n; $i++) {
  for($j=0; $j<$m; $j++) {
    printf("(i, j) = (%d, %d)\n", $i, $j);
  }
}
Code Sample 39.7: Nested For Loops in PHP
The inner loop executes for j = 0, 1, 2, ..., 19 < m = 20 for a total of 20 times. However, it
executes 20 times for each iteration of the outer loop. Since the outer loop executes for i =
0, 1, 2, ..., 9 < n = 10, the total number of times the printf statement executes is 10 ×
20 = 200. In this example, the sequence (0, 0), (0, 1), (0, 2), ..., (0, 19), (1, 0), ..., (9, 19)
will be printed.
However, recall that we may have problems due to accuracy. The monthly payment
could come out to be a fraction of a cent, say $43.871. For accuracy, we need to ensure
that all of the figures for currency are rounded to the nearest cent. The standard math
library does have a round function, but by default it rounds to the nearest whole number,
not the nearest 100th.
However, we can adapt the off-the-shelf solution to fit our needs. If we take the number,
multiply it by 100, we get (say) 4387.1 which we can now round to the nearest whole
number, giving us 4387. We can then divide by 100 to get a number that has been
rounded to the nearest 100th! In PHP, we could simply do the following.
$monthlyPayment = round($monthlyPayment * 100) / 100;
We can use the same trick to round the monthly interest payment and any other number
expected to be whole cents. To output our numbers, we use printf and take care to
align our columns to make it look nice. To finish our adaptation, we handle the
final month separately to account for an over/under payment due to rounding. The full
solution can be found in Code Sample 39.8.
<?php

if($argc != 4) {
  printf("Usage: %s principle apr terms\n", $argv[0]);
  exit(1);
}

$principle = floatval($argv[1]);
$apr = floatval($argv[2]);
$n = intval($argv[3]);

$balance = $principle;
$monthlyInterestRate = $apr / 12;

//monthly payment
$monthlyPayment = ($monthlyInterestRate * $principle) /
  (1 - pow( (1 + $monthlyInterestRate), -$n));
//round to the nearest cent
$monthlyPayment = round($monthlyPayment * 100) / 100;

?>
40. Functions
Functions are essential in PHP programming. As we've already seen, PHP provides a
large library of standard functions to perform basic input/output, math, and many other
functions. PHP also provides the ability to define and use your own functions.
PHP does not support function overloading, so when you define a function and give it a
name, that name cannot be in conflict with any other function name in the standard
library or any other code that you might use. Therefore, careful thought should go into
the design of your functions.
PHP supports both call by value and call by reference. As of PHP 5.6, vararg functions
are also supported (though earlier versions supported some vararg-like functions such as
printf()). However, we will not go into detail here. Finally, another feature of PHP is
that function parameters are all optional. You may invoke a function with a subset of the
parameters; depending on your PHP setup, it may issue a warning that a parameter
was omitted (as of PHP 7.1, omitting a parameter that has no default value is an error
rather than a warning). However, PHP allows you to define default values for optional
parameters.
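A default value is given in the function signature by assigning it to the parameter. The following is a minimal sketch; the function name roundToDecimals() and its default of 2 decimal places are assumptions made for illustration.

```php
<?php
/**
 * Rounds $x to the given number of decimal places. If the caller
 * omits $places, the default value 2 is used.
 * The function name and the default are illustrative assumptions.
 */
function roundToDecimals($x, $places = 2) {
  $scale = pow(10, $places);
  return round($x * $scale) / $scale;
}

$a = roundToDecimals(43.871);    //uses the default, 2 places
$b = roundToDecimals(43.871, 1); //overrides the default
printf("%.2f %.1f\n", $a, $b);
```

Parameters with default values are normally placed last so that callers can simply leave them off.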
/**
 * Computes the sum of the two arguments.
 */
function sum($a, $b) {
  return ($a + $b);
}

/**
 * Computes the Euclidean distance between the 2-D points,
 * (x1,y1) and (x2,y2).
 */
function getDistance($x1, $y1, $x2, $y2) {
  $xDiff = ($x1-$x2);
  $yDiff = ($y1-$y2);
  return sqrt( $xDiff * $xDiff + $yDiff * $yDiff);
}

/**
 * Computes a monthly payment for a loan with the given
 * principle at the given APR (annual percentage rate) which
 * is to be repaid over the given number of terms (usually
 * months).
 */
function getMonthlyPayment($principle, $apr, $terms) {
  $rate = ($apr / 12.0);
  $payment = ($principle * $rate) / (1-pow(1+$rate, -$terms));
  return $payment;
}
Function identifiers (names) follow similar naming rules as variables, however they do
not begin with a dollar sign. Function names must begin with an alphabetic character
or an underscore and may contain alphanumeric characters as well as underscores. However,
using modern coding conventions we usually name functions using lower camel casing.
Another quirk of PHP is that function names are case insensitive. Though we declared a
function, getDistance(), above, it could be invoked with either getdistance(), GETDISTANCE(),
or any other combination of capital/lower case letters. However, good code will use
consistent naming and your function calls should match their declaration.
The keyword return is used to specify the value that is returned to the calling function.
Whatever value you end up returning is the return type of the function. Since you do not
specify variable or return types, functions are usually referred to as returning a mixed
type. You could design a function that, given one set of inputs, returns a number while
another set of inputs ends up returning a string.
You can use the syntax return; to return no value (you do not use the keyword void ).
In practice, however, the function ends up returning null when doing this.
<?php

include_once("utils.php");
The include_once function essentially loads and evaluates the given PHP source file
at the point in the code in which it is invoked. The "once" in the function name refers to
the fact that if the source file was already included in the script/code before, it will not be
included a second time. This allows you to include the same source file in multiple source
files without a conflict.
$a = 10;
$b = 20;
$c = sum($a, $b); //c contains the value 30
in front of it in the function signature.¹ No other syntax is necessary and when you
call the function, PHP automatically takes care of the referencing/dereferencing for you.
Consider the following examples.
<?php

$x = 10;
$y = 20;

?>
The first function, swap(), passes both variables by value. Swapping the values only
affects the copies of the parameters. The original variables $x and $y will be unaffected.
In the second function, swapByRef(), both variables are passed by reference as there are
ampersands in front of them. Swapping them inside the function swaps the original
variables. The output to this code is as follows.
x = 10, y = 20
x = 10, y = 20
x = 20, y = 10
Observe that when we invoked the function, swapByRef($x, $y); we used the same
syntax as the pass by value version. The only syntax needed to pass by reference is in
the function signature itself.
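The two functions discussed here can be sketched as follows. The function bodies and the printf format are a reconstruction from the description and the output shown, so treat them as an assumption rather than the exact original listing.

```php
<?php
//pass by value: only the local copies are swapped
function swap($a, $b) {
  $t = $a;
  $a = $b;
  $b = $t;
}

//pass by reference: the ampersands make $a and $b refer to
//the caller's variables, so the originals are swapped
function swapByRef(&$a, &$b) {
  $t = $a;
  $a = $b;
  $b = $t;
}

$x = 10;
$y = 20;

printf("x = %d, y = %d\n", $x, $y);
swap($x, $y);
printf("x = %d, y = %d\n", $x, $y);
swapByRef($x, $y);
printf("x = %d, y = %d\n", $x, $y);
```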
Those familiar with pointers in C will note that this is the exact opposite of the C operator.
$func = "swapByRef";

$func($x, $y);
In the example above, we assigned the function swapByRef() to the variable $func
by using its name as a string. The variable essentially holds a reference to the swapByRef()
function. Since it refers to a function, we can also invoke the function using the variable
as in the last line. This allows you to treat functions as callbacks to other functions. We
will revisit this concept in Chapter 47.
40.2. Examples
40.2.1. Generalized Rounding
Recall that the standard math library provides a round() function that rounds a number
to the nearest whole number. Often, we've had need to round to cents as well. We
now have the ability to write a function to do this for us. Before we do, however, let's
think more generally. What if we wanted to round to the nearest tenth? Or what if we
wanted to round to the nearest 10s or 100s place? Let's write a general purpose rounding
function that allows us to specify which decimal place to round.
The most natural input values would be to specify the place using an integer exponent.
That is, if we wanted to round to the nearest tenth, then we would pass it -1 as
0.1 = 10^-1, -2 if we wanted to round to the nearest 100th, etc. On the positive end,
passing in 0 would correspond to the usual round function, 1 to the nearest 10s spot,
and so on.
Moreover, we could demonstrate good code reuse (as well as procedural abstraction)
by scaling the input value and reusing the functionality already provided in the math
library's round() function. We could further define a roundToCents() function that
used our generalized round function. Consider the following.
Some would use a much more restrictive definition of first-class and would not consider them first-class
citizens in this sense
<?php

/**
 * Rounds to the nearest digit specified by the place
 * argument. In particular to the (10^place)-th digit
 */
function roundToPlace($x, $place) {
  $scale = pow(10, -$place);
  $rounded = round($x * $scale) / $scale;
  return $rounded;
}

/**
 * Rounds to the nearest cent
 */
function roundToCents($x) {
  return roundToPlace($x, -2);
}

?>
We could place these functions into a file named round.php and include them in another
PHP source file.
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]
Since there are two roots, we may have to write two functions, one for the plus root
and one for the minus root both of which take the coefficients, a, b, c as arguments.
However, if we wrote a single function that took the coefficients as parameters by value
as well as two other parameters by reference, we could place both root values, one in each
of the by-reference variables.
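A single function of this kind might be sketched as follows. The function name quadraticRoots() and the parameter names are hypothetical, and no error checking is done yet, so the inputs are assumed to give two real roots.

```php
<?php
/**
 * Computes both roots of ax^2 + bx + c = 0 and stores them in the
 * two by-reference parameters. No error checking is done, so the
 * inputs are assumed to give real roots (a != 0, b^2 >= 4ac).
 * The function name is hypothetical.
 */
function quadraticRoots($a, $b, $c, &$root1, &$root2) {
  $discriminant = $b*$b - 4*$a*$c;
  $root1 = (-$b + sqrt($discriminant)) / (2*$a);
  $root2 = (-$b - sqrt($discriminant)) / (2*$a);
}

//x^2 - 3x + 2 = 0 is an example equation chosen for illustration
quadraticRoots(1, -3, 2, $r1, $r2);
printf("roots: %f, %f\n", $r1, $r2);
```

The caller's variables $r1 and $r2 receive the two results even though the function itself returns nothing.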
By using pass by reference variables, we avoid multiple functions. Recall that there
could be several bad inputs to this function. The roots could be complex values, the
coefficient a could be zero, etc. In the next chapter, we examine how we can handle these
errors.
By using a generic Exception , we can only attach a message to the exception (which
can be printed by code that catches the exception). If we want more fine-grained control
over the type of exceptions, we need to define our own exceptions.
function readNumber() {
  $input = readline("Please enter a number: ");
  if( is_numeric($input) ) {
    $value = floatval($input);
  } else {
    throw new Exception("Invalid input!");
  }
  return $value;
}

try {
  readNumber();
} catch(Exception $e) {
  printf("Error: exception encountered: %s\n", $e->getMessage());
  exit(1);
}
In this example, we've simply displayed an error message and exited the program. That
is, we've made the design decision that this error should be fatal. We could have chosen
to handle this error differently in the catch block. The call $e->getMessage() returns
the message that the exception was created with. In this case, "Invalid input!".
/**
 * Defines a ComplexRoot exception class
 */
class ComplexRootException extends Exception
{
  public function __construct($message = "",
                              $code = 0,
                              Exception $previous = null) {
    // simply call the parent constructor
    parent::__construct($message, $code, $previous);
  }
}
Now in our code we can catch and even throw this new type of exception.
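A minimal sketch of catching more than one type of exception follows. The function realSquareRoot() and the messages are hypothetical, and a one-line stand-in for the ComplexRootException class is declared so that the sketch is self-contained.

```php
<?php
//one-line stand-in for the exception class (an assumption so
//that this sketch runs on its own)
class ComplexRootException extends Exception {}

//hypothetical function that can throw either type of exception
function realSquareRoot($x) {
  if(!is_numeric($x)) {
    throw new Exception("Invalid input!");
  }
  if($x < 0) {
    throw new ComplexRootException("Negative value: complex root");
  }
  return sqrt($x);
}

$handledBy = "none";
try {
  $r = realSquareRoot(-4);
  printf("root = %f\n", $r);
} catch(ComplexRootException $e) {
  //most specific type first
  $handledBy = "ComplexRootException";
  printf("Complex root: %s\n", $e->getMessage());
} catch(Exception $e) {
  //most general type last
  $handledBy = "Exception";
  printf("Error: %s\n", $e->getMessage());
}
```

If the two catch blocks were reversed, the generic block would also match a ComplexRootException and the specific block would never run.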
In the code above we had two catch blocks. Since we can have multiple types of
exceptions, we can also catch each different type and handle them differently if we choose.
Each catch block catches a different type of exception. The last catch block was
written to catch a generic Exception . This last block will essentially catch any other
type of exception. Much like an if-else-if statement, the first type of exception that
is caught is the block that will be executed and they are all mutually exclusive. Thus, a
catch all block like this should always be the last catch block. The most specific types
of exceptions should be caught first and the most general types should be caught last.
42. Arrays
PHP allows you to use arrays, but PHP arrays are actually associative arrays. Though
you can treat them as regular arrays and use contiguous integer indices, they are more
flexible. You can also use strings as indices for example. In addition, since PHP is
dynamically typed, PHP arrays allow mixed types. An array has no fixed type and you
can place different mixed types into the same array. Moreover, PHP arrays are dynamic,
so there is no memory management or allocation/deallocation of memory space. Arrays
will grow and shrink automatically as you add and remove elements.
42.2. Indexing
By default, when inserting elements, 0-indexing is used. In the three examples above,
each element would be located at indices 0, 1, and 2 respectively. The usual square
bracket syntax can be used to access and assign elements.
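The square bracket syntax for accessing and assigning elements can be sketched as follows. The array contents are example values chosen for illustration.

```php
<?php
$arr = array(10, 20, 30);

//access: elements are located at indices 0, 1, and 2
$first = $arr[0];

//assignment: overwrite existing elements
$arr[1] = 25;
$arr[2] = $arr[0] + $arr[1];

printf("first = %d, last = %d\n", $first, $arr[2]);
```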
Attempting to access an element at an invalid index does not actually result in an error
or an exception (though a warning may be issued depending on how PHP is set up).
Instead, if you attempt to access an invalid element, it will be treated as a null value.
$arr = array();
$arr[0] = 5;
$arr["foo"] = 10;
$arr["hello"] = "world";
Note that strings that contain integer values will be type-juggled into their numeric
values. For example, $arr["10"] = 3; will be equivalent to $arr[10] = 3; . However,
strings containing floating-point values will not be coerced but will remain as strings,
$arr["3.15"] = 7; for example.
$arr = array();
$arr[0] = 10;
$arr[5] = 20;
The values at indices 1 through 4 are undefined and the array contains some holes in
its indices.
$arr = array(
  "foo" => 5,
  4 => "bar",
  0 => 3.14,
  "baz" => "ten"
);
For convenience and debugging, a special function, print_r() allows you to print the
contents of an array in a human-readable format that resembles the key-value initialization
syntax above. For example,
$arr = array(
  "foo" => 5,
  4 => "bar",
  0 => 3.14,
  "baz" => "ten"
);
print_r($arr);
$keys = array_keys($arr);
$vals = array_values($arr);
print_r($keys);
print_r($vals);
would print
Array
(
    [0] => foo
    [1] => 4
    [2] => 0
    [3] => baz
)
Array
(
    [0] => 5
    [1] => bar
    [2] => 3.14
    [3] => ten
)
Finally, you can use the equality operators, == and === to compare arrays. The first is
the loose equality operator and evaluates to true if the two compared arrays have the
same key-value pairs while the second is the strict equality operator and is true only if
the arrays have the same key/value pairs in the same order and are of the same type.
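The difference between the two operators shows up when two arrays hold the same key-value pairs in a different order. The following minimal sketch uses made-up keys and values.

```php
<?php
$a = array("x" => 1, "y" => 2);
$b = array("y" => 2, "x" => 1); //same pairs, different order

$loose = ($a == $b);   //true: same key-value pairs
$strict = ($a === $b); //false: the order differs

var_dump($loose);
var_dump($strict);
```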
42.4. Iteration
If we have an array in PHP that we know is 0-indexed and all elements are contiguous, we
can use a normal for-loop to iterate over its elements by incrementing an index variable.
This fails, however, when we have an associative array that has a mix of integer and
string keys or holes in the indexing of integer keys. For this reason, it is more reliable
to use foreach loops. There are several ways that we can use a foreach loop. The
most general usage is to use the double arrow notation to iterate over each key-value
pair.
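As a sketch of the double arrow form, the loop below visits a mixed-key array in insertion order. The array and the $seen helper are assumptions used only to make the order visible.

```php
<?php
$arr = array("foo" => 5, 4 => "bar", 0 => 3.14);
$seen = array();
foreach ($arr as $key => $val) {
    print "$key maps to $val\n";
    $seen[] = $key;  // record the order in which keys are visited
}
```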
This syntax gives you access to both the key and the value for each element in the array
$arr . The keyword as is used to denote the variable names $key and $val that
will be changed on each iteration of the loop. You need not use the identifiers $key and
$val ; you can use any legal variable names for the key/value variables.
If you do not need the keys when iterating, you can use the following shorthand syntax.
By using the assignment operator but not specifying the index, the element will be added
to the next available integer index. Since there were already 3 elements in the array,
each subsequent element is inserted at index 3, 4, and finally 5. In general, the element
will be inserted at the maximum index value already used plus one. The example above
results in the following.
Array
(
    [0] => 10
    [1] => 20
    [2] => 30
    [3] => 5
    [4] => 15
    [5] => 25
)
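The appending behavior can be sketched as follows. The values mirror the output shown above; the $sparse array is an additional, hypothetical example of the "maximum index plus one" rule.

```php
<?php
$arr = array(10, 20, 30);
$arr[] = 5;    // stored at index 3
$arr[] = 15;   // stored at index 4
$arr[] = 25;   // stored at index 5
print_r($arr);

// With holes in the indices, the next index is the largest integer key plus one.
$sparse = array(0 => "a", 7 => "b");
$sparse[] = "c";  // stored at index 8
```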
function setFirst($a) {
  $a[0] = 5;
}

Array
(
    [0] => 10
    [1] => 20
    [2] => 30
)
Array
(
    [0] => 10
    [1] => 20
    [2] => 30
)
That is, the change to the first element does not affect the original array. However, if we
specify that the array is passed by reference, then the change is realized. For example,
function setFirst(&$a) {
  $a[0] = 5;
}

Array
(
    [0] => 10
    [1] => 20
    [2] => 30
)
Array
(
    [0] => 5
    [1] => 20
    [2] => 30
)
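The two behaviors can be compared side by side in one script. The function names below are hypothetical (renamed so both versions can coexist), and the array values follow the output above.

```php
<?php
// By value: the function works on its own copy of the array.
function setFirstByValue($a) {
    $a[0] = 5;
}

// By reference: the & makes $a an alias for the caller's array.
function setFirstByReference(&$a) {
    $a[0] = 5;
}

$arr = array(10, 20, 30);
setFirstByValue($arr);
$afterValue = $arr;          // still 10, 20, 30
setFirstByReference($arr);   // $arr is now 5, 20, 30
print_r($afterValue);
print_r($arr);
```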
PHP supports multidimensional arrays in the sense that elements in an array can be of
any type, including other arrays.
We can use all the same syntax and operations that we use for single-dimensional arrays.
For example, we can use the double arrow syntax and assign arrays as values to create a
2-dimensional array.
$mat = array(
  0 => array(10, 20, 30),
  1 => array(40, 50, 60),
  2 => array(70, 80, 90)
);
print_r($mat);
Alternatively, you can use two indices to get and set values from a 2-dimensional array.
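A short sketch of two-index access, reusing the 3 by 3 values from the example above; the particular cells read and written are arbitrary choices.

```php
<?php
$mat = array(
    array(10, 20, 30),
    array(40, 50, 60),
    array(70, 80, 90)
);

$x = $mat[1][2];   // row 1, column 2
$mat[0][0] = 15;   // assign a single cell

// Nested foreach loops visit every row and every cell in it.
foreach ($mat as $i => $row) {
    foreach ($row as $j => $val) {
        printf("mat[%d][%d] = %d\n", $i, $j, $val);
    }
}
```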
0 3 6 9
3 6 9 12
6 9 12 15
43. Strings
As we've previously seen, PHP has a built-in string type. Internally, PHP strings are
simply a sequence of bytes, but for our purposes we can treat them as 0-indexed character
arrays. PHP strings are mutable and can be changed, but it is considered best practice
to treat them as immutable and rely on the many functions PHP provides to manipulate
strings.
43.1. Basics
As we've previously seen, we can create strings by simply assigning a string literal value
to a variable as PHP is a dynamically typed language. Strings can be specified by either
single quotes or double quotes (there are no individual characters in PHP, only single
character strings), but we will mostly use the double quote syntax.
$firstName = "Thomas";
$lastName = "Waits";
The reassignment in the last line in the example effectively destroys the old string. The
assignment operator can also be used to make copies of strings.
$firstName = "Thomas";
$alias = $firstName;
It is important to understand that this assignment essentially makes a deep copy of the
string. Changes to the first do not affect the second one.
You can make changes to individual characters in a string by treating it like a zero-indexed
array.
$a = "hello";
$a[0] = "H";
$a[5] = "!";
//a is now "Hello!"
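Note that the last line extends the string by adding an additional character. You cannot,
however, remove a character by assigning the empty string to an index; since PHP 7.1 this
is an error. To remove characters, build a new string instead, for example with
substr_replace().
$a = "Apples!";
$a = substr_replace($a, "", 5, 1);
//a is now "Apple!"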
$s = "Hello World!";
$x = strlen($s); //x is 12
$s = "";
$x = strlen($s); //x is 0

//careful (PHP 8.1+ also reports a deprecation notice here):
$s = NULL;
$x = strlen($s); //x is 0
As demonstrated in the last example, strlen() will return 0 even for NULL strings.
Recall that to distinguish between these two situations we can use is_null() .
Using strlen() we can easily iterate over each individual character in a string.
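A minimal sketch of such a loop is given below. The string "Tom Waits" and the variable names are assumptions for illustration; each character is read with the bracket syntax.

```php
<?php
$fullName = "Tom Waits";
$chars = array();
for ($i = 0; $i < strlen($fullName); $i++) {
    // $fullName[$i] is the single-character string at index $i
    printf("fullName[%d] = %s\n", $i, $fullName[$i]);
    $chars[] = $fullName[$i];
}
```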
$firstName = "Tom";
$lastName = "Waits";
$x = 10;
$y = 3.14;
Computing a Substring
PHP provides a simple function, substr(), to compute a substring of a string. It takes
at least 2 arguments: the string to operate on and the starting index. There is a third,
optional parameter that allows you to specify the length of the resulting substring.
As in the final example, omitting the optional length parameter results in the entire
remainder of the string being returned as the substring.
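The following sketch shows both forms of the call. The sample string and variable names are assumptions; indices are 0-based.

```php
<?php
$name = "Thomas Alan Waits";

$first  = substr($name, 0, 6);   // 6 characters starting at index 0
$middle = substr($name, 7, 4);   // 4 characters starting at index 7
$last   = substr($name, 12);     // no length: the rest of the string

print "$first | $middle | $last\n";
```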
$names = array(
"Margaret Hamilton",
"Ada Lovelace",
"Grace Hopper",
"Marie Curie",
"Hedy Lamarr");
43.4. Comparisons
When comparing strings in PHP, we can use the usual numerical operators such as
=== , < , or <= which will compare the strings lexicographically. However, this is
generally discouraged because of type juggling issues and strict vs loose equality/inequality
comparisons.
Instead, there are several comparator functions that PHP provides to compare strings
based on their content. strcmp($a, $b) takes two strings and returns an integer based
on the lexicographic ordering of $a and $b . If $a precedes $b , strcmp() returns
something negative. It returns zero if $a and $b have the same content. Otherwise it
returns something positive if $b precedes $a .
Some examples:
$x = strcmp("Apple", "apple");
//x is negative
In the last example, "Apple" precedes "apple" since uppercase letters are ordered
before lowercase letters according to the ASCII table. We can also make comparisons
ignoring case if we need to using the alternative, strcasecmp($a, $b) , a case-insensitive
version. Here, strcasecmp("Apple", "apple") will return zero as the two strings are
the same ignoring the cases.
The comparison functions also have length-limited versions, strncmp($a, $b, $n)
and strncasecmp($a, $b, $n) . Both will only make comparisons in the first $n
characters of the strings. Thus, strncmp("apple", "apples", 5) will result in zero
as the two strings are equal in the first 5 characters.
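The comparison functions discussed in this section can be exercised together as in the sketch below. Only the sign of each result should be relied upon, since the exact integer returned may differ between PHP versions; the sample strings are arbitrary.

```php
<?php
$r1 = strcmp("apple", "banana");      // negative: "apple" comes first
$r2 = strcmp("banana", "apple");      // positive: "apple" comes first
$r3 = strcmp("apple", "apple");       // zero: same content
$r4 = strcasecmp("Apple", "apple");   // zero: equal when case is ignored
$r5 = strncmp("apple", "apples", 5);  // zero: equal in the first 5 characters

if ($r1 < 0) {
    print "apple precedes banana\n";
}
```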
43.5. Tokenizing
Recall that tokenizing is the process of splitting up a string along some delimiter. For
example, the comma delimited string, "Smith,Joe,12345678,1985-09-08" contains
four pieces of data delimited by a comma. Our aim is to split this string up into four
separate strings so that we can process each one.
PHP provides two functions to do this, explode() and preg_split() .
The simpler one, explode() , takes two arguments: the first one is a string delimiter and
the second is the string to be processed. It then returns an array of strings.
$data = "Smith,Joe,12345678,1985-09-08";
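Continuing with the same comma-delimited string, a minimal sketch of tokenizing it with explode() might look like the following; the loop that prints each token is an assumption added for illustration.

```php
<?php
$data = "Smith,Joe,12345678,1985-09-08";

// The delimiter comes first, the string to split second.
$tokens = explode(",", $data);

foreach ($tokens as $i => $token) {
    print "token $i: $token\n";
}
```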
The more sophisticated one, preg_split() , also takes two arguments, but instead of
a simple delimiter, it uses a regular expression: a sequence of characters that
defines a search pattern in which special characters can be used to define complex patterns.
For example, the complex expression ^[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?$
will match any valid numerical value including scientific notation. We will not cover
regular expressions in depth, but to demonstrate their usefulness, here's an example by
which you can split a string along any and all whitespace:
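A minimal sketch of whitespace tokenizing follows. The sample string is an assumption; the pattern \s+ matches one or more whitespace characters (spaces, tabs, newlines), and the slashes are the pattern delimiters that PHP's regular expression functions require.

```php
<?php
// Words separated by two spaces, a tab, and a newline.
$s = "Alpha  Beta\tGamma\nDelta";

// Single quotes keep the backslash in the pattern untouched.
$tokens = preg_split('/\s+/', $s);

print_r($tokens);
```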
if(!$input) {
printf("Unable to open input file\n");
exit(1);
}
if(!$output) {
printf("Unable to open output file\n");
exit(1);
}
The two checks above verify that each file opened successfully. If opening the file failed,
fopen() returns false (and the interpreter issues a warning).
$h = fopen("input.data", "r");
while(!feof($h)) {
  //read the next line:
  $line = fgets($h);
  //trim it:
  $line = trim($line);
  //process it, we'll just print it
  print $line;
}
$x = 10;
$y = 3.14;
$h = fopen("http://cse.unl.edu", "r");
$contents = "";
while(!feof($h)) {
  $contents .= fgets($h);
}

//or just:
$contents = file_get_contents("http://cse.unl.edu");
45. Objects
Object-oriented features have been continually added to PHP with each successive version.
Starting with version 5, PHP has had full, class-based object-oriented programming
support, meaning that it facilitates the creation of objects through the use of classes and
class declarations. Classes are essentially blueprints for creating instances of objects.
An object is an entity that is characterized by identity, state and behavior. The identity
of an object is an aspect that distinguishes it from other objects. The variables within
an object, and the values they take on, are its state. Typically the variables that
belong to an object are referred to as member variables. Finally, an object may also
have functions that operate on the data of an object. In the context of object-oriented
programming, a function that belongs to an object is referred to as a (member) method.
A class declaration simply specifies the member variables and member methods that
belong to instances of the class. We discuss how to create and use instances of a class
below. However, to begin, let's define a class that models a student by defining member
variables to support a first name, last name, a unique identifier, and GPA.
To declare a class, we use the class keyword. Inside the class (denoted by curly
brackets), we place any code that belongs to the class. To declare member variables
within a class, we specify the variable names and their visibility inside the class,
but outside any methods in the class.
class Student {
  //member variables:
  private $firstName;
  private $lastName;
  private $id;
  private $gpa;

}
To organize code, it is common practice to place class declarations in separate files with
the same name as the class. For example, this Student class declaration would be
placed in a file named Student.php and included in any other script files that utilized
the class.
Keyword     Class   Subclass   World
public      Y       Y          Y
protected   Y       Y          N
private     Y       N          N
45.2. Methods
The third aspect of encapsulation involves the grouping of methods that act on an object's
data. Within a class, we can declare member methods using the syntax we're already
familiar with. We declare a member method by using the keyword function and
providing a signature and body. We can use the same visibility keywords as with member
variables in order to allow or restrict access to the methods. With methods, visibility and
access determine whether or not the method may be invoked.
1
Subclasses are involved with inheritance, another object-oriented programming concept that we will
not discuss here.
Again, we add to our example by providing two public methods that compute and
return a result on the member variables. We also use javadoc-style comments to document
each member method.
class Student {
  //member variables:
  private $firstName;
  private $lastName;
  private $id;
  private $gpa;

  /**
   * Returns a formatted String of the Student's
   * name as Last, First.
   */
  public function getFormattedName() {
    return $this->lastName . ", " . $this->firstName;
  }

  /**
   * Scales the GPA, which is assumed to be on a
   * 4.0 scale to a percentage.
   */
  public function getGpaAsPercentage() {
    return $this->gpa / 4.0;
  }

}
There is some new syntax in the example above. In the member methods, we need a way
to refer to the instance's member variables. The keyword $this is used to refer to the
instance; this is known as open recursion.
When an instance of a class is created, for example,
$s = new Student();
the reference variable $s is how we can refer to it. This variable, however, exists outside
the class. Inside the class, we need a way to refer to the instance itself. Since we don't
have a variable inside the class to reference the instance itself, PHP provides the keyword
$this in order to do so. Then, to access the member variables we use the arrow operator
(more below) and reference the member variable via its identifier but with no dollar sign.
One advantage to using getters and setters (as opposed to naively making everything
public ) is that you can have greater control over the values that your variables can
take. For example, we may want to do some data validation by rejecting null values or
invalid values. For example:
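A minimal sketch of such a validating setter is shown here. The method names getGpa() and setGpa(), the accepted range, and the choice of InvalidArgumentException are assumptions made for illustration, not necessarily the design used elsewhere in this chapter.

```php
<?php
class Student {
    private $gpa = 0.0;

    public function getGpa() {
        return $this->gpa;
    }

    // Rejects null, non-numeric values, and values outside [0.0, 4.0].
    public function setGpa($gpa) {
        if (!is_numeric($gpa) || $gpa < 0.0 || $gpa > 4.0) {
            throw new InvalidArgumentException("GPA must be between 0.0 and 4.0");
        }
        $this->gpa = (float) $gpa;
    }
}

$s = new Student();
$s->setGpa(3.5);

$rejected = false;
try {
    $s->setGpa(-1);   // invalid, so the stored value is left unchanged
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```

Because the member variable is private, the setter is the only way outside code can change it, so the check cannot be bypassed.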
You can use the syntax $this->$foo but it will assume that $foo is a string that contains the
name of another variable, for example, if $foo = "firstName"; then $this->$foo would resolve
to the instance's $firstName variable. This is useful if your object has been dynamically created
by adding variables at runtime that were not part of the original class declaration.
Controlling access of member variables through getters and setters is good encapsulation.
Doing so makes your code more predictable and more testable. Making your member
variables public means that any piece of code can change their values. There is no way
to do validation or prevent bad values.
In fact, it is good practice to not even have setter methods. If the value of member
variables cannot be changed, it makes the object immutable. Immutability is a nice
property because it makes instances of the class thread-safe. That is, we can use instances
of the class in a multithreaded program without having to worry about threads changing
the values of the instance on one another.
45.3. Constructors
If we make the (good) design decision to make our class immutable, we still need a way
to initialize the values. This is where a constructor comes in. A constructor is a special
method that specifies how an object is constructed. With built-in variables such as
numbers or strings, PHP knows how to interpret and assign a value to such a variable.
However, with user-defined objects such as our Student class, we need to specify how
the object is created.
Just as with functions outside of classes, PHP does not support function overloading
inside classes. That is, you can have one and only one function with a given
identifier (name). Thus, there is only one possible constructor. Moreover, PHP reserves
the name __construct for the constructor method. The two underscores are a naming
convention used by PHP to denote Magic Methods that are reserved and have a special
purpose in the language. Further, most magic methods must be made public . Some magic
methods provide default behavior while others do not. For example, if you do not define
a constructor method, the default behavior will be to create an object whose member
variables all have null values.
The following constructor allows a user to construct an instance of our Student class
and specify all four member variables.
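A sketch of a four-parameter constructor for the Student class is given below. The parameter names and the sample values are assumptions; the member variables match those declared earlier.

```php
<?php
class Student {
    private $firstName;
    private $lastName;
    private $id;
    private $gpa;

    // Copies each argument into the corresponding member variable.
    public function __construct($firstName, $lastName, $id, $gpa) {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
        $this->id = $id;
        $this->gpa = $gpa;
    }

    public function getFormattedName() {
        return $this->lastName . ", " . $this->firstName;
    }
}

$s = new Student("Grace", "Hopper", 1234, 3.9);
print $s->getFormattedName() . "\n";
```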
Though we cannot define multiple constructors, we can use the default value feature
of PHP functions to allow a user to call our constructor with a different number of
parameters. For example,
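One way this might look is sketched below: the last two parameters are given default values, so callers may omit them. The particular defaults (0 and 0.0) and the getter names are assumptions for illustration.

```php
<?php
class Student {
    private $firstName;
    private $lastName;
    private $id;
    private $gpa;

    // $id and $gpa are optional; omitted arguments take the defaults.
    public function __construct($firstName, $lastName, $id = 0, $gpa = 0.0) {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
        $this->id = $id;
        $this->gpa = $gpa;
    }

    public function getId() {
        return $this->id;
    }

    public function getGpa() {
        return $this->gpa;
    }
}

$a = new Student("Ada", "Lovelace");            // uses both defaults
$b = new Student("Ada", "Lovelace", 42, 4.0);   // supplies every value
```

Parameters with defaults need to come after the required ones so that positional arguments remain unambiguous.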
45.4. Usage
Once we have defined our class and its constructors, we can create and use instances of it.
Just as with regular variables, we simply need to assign them to an instance of an object
and their type will dynamically change to match. To create new instances, we invoke a
constructor by using the new keyword and providing arguments to the constructor.
45.6. Composition
Another important concept when designing classes is composition. Composition is a
mechanism by which an object is made up of other objects. One object is said to own
an instance of another object.
To illustrate the importance of composition, we could extend the design of our Student
class to include a date of birth. However, a date of birth is also made up of multiple
pieces of data (a year, a month, a date, and maybe even a time and/or locale). We could
design our own date/time class to model this, but it's generally best to use what the
language already provides. PHP 5.2 introduced the DateTime object in which there is
a lot of functionality supporting the representation and comparison of dates and time.
We can take this concept further and have our own user-defined classes own instances of
each other. For example, we could define a Course class and then update our Student
class to own a collection of Course objects representing a student's class schedule
(this type of collection ownership is sometimes referred to as aggregation rather than
composition).
Both of these design updates raise the question: who is responsible for instantiating the
instances of $dateOfBirth and the $schedule ? Should we force the outside user of
our Student class to build a DateTime instance and pass it to a constructor? Should
we allow the outside code to simply provide us a date of birth as a string and make the
constructor responsible for creating the proper DateTime instance? Do we require that
a user create a complete array of Course instances and provide it to the constructor at
instantiation?
A more flexible approach might be to allow the construction of a Student instance
without having to provide a course schedule. Instead, we could add a method that
allowed the outside code to add a course to the student. For example,
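A minimal sketch of this idea follows. The Course class, its title field, and the method names addCourse() and getSchedule() are all hypothetical; only the $schedule member variable comes from the discussion above.

```php
<?php
// Hypothetical Course class with a single field, for illustration only.
class Course {
    private $title;

    public function __construct($title) {
        $this->title = $title;
    }

    public function getTitle() {
        return $this->title;
    }
}

class Student {
    private $schedule = array();

    // Appends a course to this student's schedule.
    public function addCourse(Course $c) {
        $this->schedule[] = $c;
    }

    public function getSchedule() {
        return $this->schedule;
    }
}

$s = new Student();
$s->addCourse(new Course("Computer Science One"));
$s->addCourse(new Course("Discrete Mathematics"));
$n = count($s->getSchedule());
```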
This adds some flexibility to our object, but removes the immutability property. Design
is always a balance and compromise between competing considerations.
45.7. Example
We present the full and completed Student class in Code Sample 45.1.
<?php
class Student {

  private $firstName;
  private $lastName;
  private $id;
  private $gpa;
  private $dateOfBirth;
  private $schedule;

  // ...

  /**
   * Returns a formatted String of the Student's
   * name as Last, First.
   */
  public function getFormattedName() {
    return $this->lastName . ", " . $this->firstName;
  }

  /**
   * Scales the GPA, which is assumed to be on a
   * 4.0 scale to a percentage.
   */
  public function getGpaAsPercentage() {
    return $this->gpa / 4.0;
  }

  // ...

  public function getLastName() {
    return $this->lastName;
  }

  // ...

}
?>
46. Recursion
PHP supports recursion with no special syntax. However, recursion is generally expensive
and iterative or other non-recursive solutions are generally preferred. We present a few
examples to demonstrate how to write recursive functions in PHP.
The first example of a recursive function we gave was the toy count down example. In
PHP it could be implemented as follows.
function countDown($n) {
  if($n===0) {
    printf("Happy New Year!\n");
  } else {
    printf("%d\n", $n);
    countDown($n-1);
  }
}
As another example that actually does something useful, consider the following recursive
summation function that takes an array and an index variable. The recursion
works as follows: if the index variable has reached the size of the array, it stops and returns
zero (the base case). Otherwise, it makes a recursive call to recSum() , incrementing
the index variable by 1. When the function returns, it adds its result to the i-th element
in the array. To invoke this function we would call it with an initial value of 0 for the
index variable: recSum($arr, 0) .
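The description above can be sketched as follows. Using count() to obtain the array's size inside the function is an assumption; the function name and call match the text.

```php
<?php
function recSum($arr, $i) {
    // Base case: the index has reached the size of the array.
    if ($i >= count($arr)) {
        return 0;
    }
    // Add the i-th element to the sum of the remaining elements.
    return $arr[$i] + recSum($arr, $i + 1);
}

$total = recSum(array(1, 2, 3, 4), 0);
printf("%d\n", $total);
```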
This example was not tail-recursive as the recursive call was not the final operation (the
sum was the final operation). To make this function tail recursive, we can carry the
summation through to each function call ensuring that the summation is done prior to
the recursive function call.
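A sketch of the tail-recursive variant is given below. The function name recSumTail() and the extra $sum parameter are assumptions; the running total is updated before the recursive call, which is then the last operation performed.

```php
<?php
function recSumTail($arr, $i, $sum) {
    // Base case: nothing left to add, so the carried total is the answer.
    if ($i >= count($arr)) {
        return $sum;
    }
    // The addition happens first; the recursive call is the final operation.
    return recSumTail($arr, $i + 1, $sum + $arr[$i]);
}

$total = recSumTail(array(1, 2, 3, 4), 0, 0);
printf("%d\n", $total);
```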
As a final example, consider the following PHP implementation of the naive recursive
Fibonacci sequence. An additional condition has been included to check for invalid
negative values of n for which we throw an exception.
function fibonacci($n) {
  if($n < 0) {
    throw new Exception("Undefined for n<0.");
  } else if($n <= 1) {
    return 1;
  } else {
    return fibonacci($n-1) + fibonacci($n-2);
  }
}
PHP is not a language that provides implicit memoization. Instead, we need to explicitly
keep track of values using a table. In the following example, the table is passed through
as an argument.
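A minimal sketch of a memoized version follows. The name fibonacciMemo() and passing the table by reference (so that stored values survive across calls) are assumptions; the base cases follow the naive version above, in which both of the first two values are 1.

```php
<?php
function fibonacciMemo($n, &$table) {
    if ($n < 0) {
        throw new Exception("Undefined for n<0.");
    }
    if ($n <= 1) {
        return 1;
    }
    // Reuse a previously computed value if the table already has it.
    if (isset($table[$n])) {
        return $table[$n];
    }
    $table[$n] = fibonacciMemo($n - 1, $table) + fibonacciMemo($n - 2, $table);
    return $table[$n];
}

$table = array();
$f = fibonacciMemo(10, $table);
printf("%d\n", $f);
```

Each value is computed only once, so the number of recursive calls grows linearly in n rather than exponentially.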
What if we wanted to order integers in the opposite order? We could write another
comparator in which the comparisons or values are reversed. Even simpler, we could
reuse the comparator above and flip the sign by multiplying by -1 (that is, after all, one
of the purposes of writing functions: code reuse). Even simpler still, we could just flip
the arguments we pass to cmpInt() to reverse the order.
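The argument-flipping idea can be sketched as follows. The body of cmpInt() here is a hypothetical stand-in for the comparator the text refers to, and cmpIntDesc() is an assumed name.

```php
<?php
// Hypothetical ascending comparator: negative, zero, or positive.
function cmpInt($a, $b) {
    if ($a < $b) {
        return -1;
    } elseif ($a > $b) {
        return 1;
    }
    return 0;
}

// Descending order: reuse cmpInt() with its arguments swapped.
function cmpIntDesc($a, $b) {
    return cmpInt($b, $a);
}

$nums = array(3, 1, 2);
usort($nums, "cmpIntDesc");
print_r($nums);
```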
To illustrate some more examples, consider the Student class we defined in Code Sample
45.1. The following Code Samples demonstrate various ways of ordering Student
instances based on one or more of their components.
/**
 * A comparator function to order Student instances by
 * last name/first name in alphabetic order
 */
function studentByNameCmp($a, $b) {
  ...
}

/**
 * A comparator function to order Student instances by
 * last name/first name in reverse alphabetic order
 */
function studentByNameCmpDesc($a, $b) {
  ...
}

/**
 * A comparator function to order Student instances by
 * id in ascending numerical order
 */
function studentIdCmp($a, $b) {
  ...
}

/**
 * A comparator function to order Student instances by
 * GPA in descending order
 */
function studentGpaCmp($a, $b) {
  ...
}
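One possible way to write these four comparators is sketched below. The cut-down Student class and its getter names (getFirstName(), getLastName(), getId(), getGpa()) are assumptions made so the example is self-contained, and the spaceship operator requires PHP 7 or later.

```php
<?php
// Cut-down, hypothetical Student class with only what the comparators need.
class Student {
    private $firstName;
    private $lastName;
    private $id;
    private $gpa;

    public function __construct($firstName, $lastName, $id, $gpa) {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
        $this->id = $id;
        $this->gpa = $gpa;
    }
    public function getFirstName() { return $this->firstName; }
    public function getLastName()  { return $this->lastName; }
    public function getId()        { return $this->id; }
    public function getGpa()       { return $this->gpa; }
}

// Last name first; ties are broken by first name.
function studentByNameCmp($a, $b) {
    $result = strcmp($a->getLastName(), $b->getLastName());
    if ($result === 0) {
        $result = strcmp($a->getFirstName(), $b->getFirstName());
    }
    return $result;
}

// Reverse order: swap the arguments.
function studentByNameCmpDesc($a, $b) {
    return studentByNameCmp($b, $a);
}

// Ascending numerical order of id.
function studentIdCmp($a, $b) {
    return $a->getId() <=> $b->getId();
}

// Descending GPA: the operands are swapped.
function studentGpaCmp($a, $b) {
    return $b->getGpa() <=> $a->getGpa();
}

$roster = array(
    new Student("Grace", "Hopper", 3, 3.9),
    new Student("Ada", "Lovelace", 1, 3.5),
    new Student("Margaret", "Hamilton", 2, 4.0)
);
usort($roster, "studentByNameCmp");
$firstByName = $roster[0]->getLastName();
usort($roster, "studentGpaCmp");
$firstByGpa = $roster[0]->getLastName();
```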
47.1.1. Searching
PHP provides a linear search function, array_search() , that can be used to search for
an element in an array. The search can be specified to use loose comparisons (the default) or
strict comparisons.
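The sketch below contrasts the two modes; the array contents are made up. Note that array_search() returns the key of the first match, which may be 0, so the result should be compared against false with the strict operator.

```php
<?php
$arr = array("10", 20, 30);

$i = array_search(10, $arr);        // loose: 10 == "10", so key 0 is returned
$j = array_search(10, $arr, true);  // strict: no integer 10 present, so false
$k = array_search(30, $arr, true);  // strict match at key 2

// Use !== false, since a valid key of 0 is also "falsy".
if ($i !== false) {
    print "found at index $i\n";
}
```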
PHP does not provide a standard binary search function. Though you can write your
own binary search implementation, likely the reason that PHP does not provide
one is because one is not needed. The purpose of binary search is to search a sorted
array efficiently. However, PHP arrays are not usual arrays: they are associative arrays,
essentially key-value maps. Retrieving an element via its key is essentially a constant-time
operation, even more efficient than binary search. A better solution may be to simply
store the elements using a proper key which can be used to retrieve the element later on.
Even this solution is not ideal as PHP associative array keys are limited to integers and
strings.
47.1.2. Sorting
PHP's sort() function can be used to sort elements in ascending order. This is useful
if you have arrays of numbers or strings, but doesn't work very well if you have an array
of mixed types or objects.
PHP provides a more versatile sorting function, usort() (user defined sort) that
accepts a comparator function that it uses to define the ordering of elements. To pass
a comparator function to the usort() function, we pass a string value containing the
name of the comparator function we wish to use. Recall that function names in PHP are
case insensitive, though it is still best practice to match the naming. Several examples of
the usage of this function are presented in Code Sample 47.1.
//sort by name:
usort($roster, "studentByNameCmp");

//sort by ID:
usort($roster, "studentIdCmp");

//sort by GPA:
usort($roster, "studentGpaCmp");
Code Sample 47.1: Using PHP's usort() Function
Glossary
abstraction a technique for managing complexity whereby levels of complexity are
established so that higher levels do not see or have to worry about details at lower
levels.
algorithm a process or method that consists of a specified step-by-step set of operations.
17
anonymous class a class that is defined inline without declaring a named class; typically created because the instance has a single use and there is no reason to create
multiple instances. 464
anonymous function a function that has no identifier or name, typically created so that
it can be passed as an argument to another function as a callback. 136
anti-pattern a common software pattern that is used as a solution to recurring problems
that is usually ineffective in solving the problem or introduces risks and other
problems; a technical term for common bad-habits that can be found in software.
144
array an ordered collection of pieces of data, usually of the same type.
assignment operator an operator that allows a user to assign a value to a variable. 33
backward compatible a program, code, library, or standard that is compatible with
previous versions so that current and older versions of it can coexist and successfully
operate without breaking anything. 29
bit the basic unit of information in a digital computer. A bit can be either 1 or 0
(alternatively, true/false, on/off, high voltage/low voltage, etc.). Originally a
portmanteau (mash up) of binary digit. 4, 22, 23
Boolean a data type that represents the truth value of a logical statement. Booleans
typically have only two values: true or false. 30
bug A flaw or mistake in a computer program that results in incorrect behavior that may
have unintended consequences such as errors or failure. The term predates modern computer
systems but was popularized by Grace Hopper who, when working with the Mark
II computer in 1947, traced a system failure to a moth stuck in a relay. 45, 78, 143
byte a unit of information in a digital computer consisting of 8 bits. 4
cache a component or data structure that stores data in an efficiently retrievable manner
so that future requests for the data are fast. 200
call by reference when a variable's memory address is passed as a parameter to a
function, enabling the function to manipulate the contents of the memory address
and change the original variable's value. 132
call by value when a copy of a variable's value is passed as a parameter to a function;
the function has no reference to the original variable and thus changes to the copy
inside the function have no effect on the original variable. 130
callback a function or executable unit of code that is passed as an argument to another
function with the intention that the function that it is passed to will execute or
call back the passed function at some point. 134, 351
case sensitive a language is case sensitive if it recognizes differences between lower and
upper case characters in identifier names. A language is case insensitive if it does
not. 19
chomp the operation of removing any endline characters from a string (especially when
read from a file); may also refer more generally to removing leading and trailing
whitespace from a string or trimming it. 325
closure a function with its own environment in which variables exist. 465
code smell a symptom or common pattern in source code that is usually indicative of
a deeper problem or design flaw; smells are usually not bugs and may not cause
problems in and of themselves, but instead indicate a pattern of carelessness or low
quality of software design or implementation. 144
comparator a function or object that allows you to pass in two elements a, b for comparison and returns an integer indicating their relative order: something negative,
zero, or something positive if a < b, a = b or a > b respectively. 172, 319, 435, 538
compile the process of translating code in a high-level programming language to a low
level language such as assembly or machine code.
computer engineering a discipline integrating electrical engineering and computer science that tends to focus on the development of hardware and its interaction with
software.
computer science the mathematical modeling and scientific study of computation.
concatenation the process of combining two (or more) strings to create a new string by
appending one of them to the end of the other. 172
constant a variable whose value cannot be changed once set.
contradiction a logical statement that is always false regardless of the truth values of
the statement's variables. 68
control flow the order in which individual statements in a program are executed or
evaluated. 71
cruft anything that is left over, redundant or getting in the way; in the context of code
cruft is code that is no longer needed, legacy or simply poorly written source code.
dangling pointer when a reference to dynamically allocated memory is lost and the
memory can no longer be deallocated, resulting in a memory leak. Alternatively,
when a reference points to memory that gets deallocated or reallocated but the
pointer remains unmodified, still referencing the deallocated memory. 158
dead code a code segment that has no effect on a program either because it is unused
or unreachable (the conditions involving the code will never be satisfied). 68
debug the process of analyzing a program to find a fault or error with the code that
leads to bad or unexpected results. 144
debugger a software tool that facilitates debugging; usually a debugger simulates the
execution of a program allowing a developer to view the contents of a program as
it executes and to walk through the execution step by step. 144
deep copy in contrast to a shallow copy, a deep copy is a copy of an array or other piece
of data that is distinct from the original. Changes to one copy do not affect the
other. 160, 308, 426, 431, 535
defensive programming an approach to programming in which error conditions are
checked and handled, preventing undefined or erroneous operations from happening
in a program. 38, 47, 80
dynamic programming a technique for solving problems that involves iteratively computing values to subproblems, storing them in a table so that they can be used to
solve larger versions of the problem. 200
dynamic typing a variable whose type can change during runtime based on the value it
is assigned. 31
encapsulation the grouping and protection of data together into one logical entity along
with the functionality (functions or methods) that act on that data. 30
enumerated type a data type (usually user defined) that consists of a list of named
values. 299, 420
exception an event or occurrence of an erroneous or exceptional condition that interrupts the normal flow of control in a program, handing control over to exception
handler(s). 147
expression a combination of values, constants, literals, variables, operators and possibly
function calls such that when evaluated, produce a resulting value. 34
file a resource on a computer stored in memory that holds data. 177
flowcharts a diagram that represents an algorithm or process, showing steps as boxes
connected by arrows which establish an order or flow. 17
function a sequence of program instructions that perform a specific task, packaged as a
unit, also known as a subroutine. 125
function overloading the ability to define multiple functions with the same name but
with a different number of or different types of parameters. 136
garbage collection automated memory management in which a garbage collector attempts to reclaim memory (garbage) that is no longer being used by a program so
that it can be reallocated for other purposes. 160, 367
global scope a variable, function, or other element in a program has global scope if it is
visible or has effect throughout the entire program. 32
grok slang, meaning to understand something.
hardware (computer hardware) the physical components that make up a computer
system such as the processor, motherboard, storage devices, input and output
devices, etc.
hexadecimal base-16 number system using the symbols 0, 1, . . . , 9, A, B, C, D, E, F;
usually denoted with a prefix 0x such as 0xff1321ab01 . 451
hoisting usually used in interpreted languages, hoisting involves processing code to find
variable or function declarations and processing them before actually executing the
code or script. 126
identifier a symbol, token, or label that is used to refer to a variable. Essentially, a
variable's name. 18
idiom in the context of programming, an idiom is a commonly used pattern, expression
or way of structuring code that is well-understood for users of the language. For
example, a for-loop structure that iterates over elements in an array. May also refer
to a programming design pattern. 305
immutable an object whose internal state cannot be changed once created, alternatively,
one whose internal state cannot be observably changed once created. 373, 407, 431,
448, 549
inheritance an object oriented programming principle that allows you to derive an object
from another object, usually to allow for more specificity.
input data or information that is provided to a computer program for processing. 41
interactive a program that is designed to interface with humans by prompting them for
input and displaying output directly to them. 41
prompt the act of a program asking a user to enter input and subsequently waiting
for the user to enter data. 41
keyword a word in a programming language with a special meaning in a particular
context. In contrast to a reserved word, a keyword may be used for an identifier
(variable or function name) but it is strongly discouraged to do so as the keyword
already has an intended meaning.
kilobyte a unit of information in a digital computer consisting of 1024 bytes (equivalently,
2^10 bytes), KB for short.
kludge a poorly designed or thrown-together solution; a design that is a collection of
ill-fitting parts that may be functional, but is fragile and not easily maintained.
lambda expression another term for anonymous functions. 472
lexicographic a generalization of the usual dictionary order as codified with the ASCII
character table. 172
linking the process of generating an executable file from (multiple) object files.
lint (or linter) a static code analysis tool that analyzes code for suspicious or error-prone
code that is likely to cause problems. 144
literal in a programming language, a literal is notation for specifying a value such as a
number or string that can be directly assigned to a variable. 29, 34
magic number a value used in a program with unexplained, undocumented, or ambiguous meaning, usually making the code less understandable. 299, 300, 421
mantissa the part of a floating-point number consisting of its significant digits (called a
significand in scientific notation). 25
map a data structure that allows you to store elements as key-value pairs with the key
mapping to a value.
memoization a technique which uses a table to store previously computed values of a
function so that they do not need to be recomputed; essentially, the table serves as
a cache. 200
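As an illustration of the idea, the following is a minimal sketch of memoization in C. The Fibonacci function, the names fib and memo, and the bound MAX_N are assumptions chosen for this example rather than code taken from the text.

```c
#include <stdint.h>

#define MAX_N 90   /* hypothetical upper bound; fib(90) still fits in 64 bits */

/* Table of previously computed values. An entry of 0 means "not yet
 * computed", which is safe because fib(n) > 0 for every n >= 1. */
static uint64_t memo[MAX_N + 1];

/* Returns the n-th Fibonacci number; assumes 0 <= n <= MAX_N. */
uint64_t fib(int n) {
    if (n <= 1) {
        return (uint64_t) n;          /* base cases: fib(0) = 0, fib(1) = 1 */
    }
    if (memo[n] != 0) {
        return memo[n];               /* cached: no recomputation needed */
    }
    memo[n] = fib(n - 1) + fib(n - 2);
    return memo[n];
}
```

Without the table, the same recursion would recompute the same subproblems an exponential number of times; with it, each value from 2 through n is computed once.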
memory leak the gradual loss of memory when a program fails to deallocate or free up
unused memory, degrading performance and possibly resulting in the termination
of the program when memory runs out. 158, 311, 314
naming convention a set of guidelines for choosing identifier names for variables, functions, etc. in a programming language. Conventions may be generally accepted
by all developers of a particular language or they may be established for use in a
particular library, framework, or organization. 20
network a collection of two or more computer systems linked together through a physical
connection over which data can be transmitted using some protocol.
octal base-8 number system using the symbols 0, 1, 2, ..., 6, 7; usually denoted with a
single leading zero such as 0123742.
open recursion a mechanism by which an object is able to refer to itself, usually using a
keyword such as this or self. 447, 547
operand an argument that an operator is applied to. 33
operator a symbol used to denote some transformation that combines or changes the
operands it is applied to in order to produce a new value. 33
order of precedence the order in which operators are evaluated; for example, multiplication is performed before addition. 38
output data or information that is produced as the result of the execution of a program.
41
overflow when an arithmetic operation results in a number that is larger than the
specified type can represent, overflow occurs, resulting in an invalid result. 39
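One way to guard against overflow is to test whether a sum would fit before computing it. The sketch below assumes the int type and uses a hypothetical helper named checked_add; it is one possible approach, not a method prescribed by the text.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *sum and returns true only when the result fits in an
 * int. The comparisons are arranged so that they cannot overflow
 * themselves. checked_add is a hypothetical helper name. */
bool checked_add(int a, int b, int *sum) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;   /* the true sum is not representable as an int */
    }
    *sum = a + b;
    return true;
}
```

The check is performed before the addition because, in C, overflowing a signed integer is undefined behavior and cannot be reliably detected afterwards.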
parse to process data to identify its individual components or elements. 173
persistence the characteristic of data that outlives the process or program that created
it; the saving of data across multiple runs of a program. 177
pointer a reference to a particular memory location in a computer. 30, 285
polymorphism an object oriented programming concept that allows you to treat a
variable, method, or object as different types.
primitive a basic data type that is defined and provided by a programming language.
Typically numeric and character types are primitive types in a language for example.
Generally, the user doesn't need to define the operations involving primitive types
as they are defined by the language. Primitive data types are used as the basic
building blocks in a program and used to compose more complex user-defined types.
30, 373
procedural abstraction the concept that a procedure or sequence of operations can be
encapsulated into one logical unit (function, subroutine, etc.) so that a user need
not concern themselves with the low-level details of how it operates. 126
program stack also referred to as a call stack, it is an area of memory where stack
frames are stored for each function call containing memory for arguments, local
variables and return values/addresses. 129
protocol a set of rules or procedures that define how communication takes place.
pseudocode an informal, abstract, high-level description of a process or algorithm. 13
queue a data structure that stores elements in a FIFO (First-In First-Out) manner;
elements can be added to the end of a queue by an enqueue operation and removed
from the start of a queue by a dequeue operation.
radix the base of a number system. Binary, octal, decimal, hexadecimal would be base
2, 8, 10, and 16 respectively.
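As a small sketch of how the radix determines the value of a string of digits, the C library function strtol accepts the base as its third argument. The wrapper name parse_in_radix is hypothetical and is used only for this illustration.

```c
#include <stdlib.h>

/* Interprets a string of digits written in the given radix (2 through 36)
 * and returns its value. parse_in_radix is a hypothetical helper name;
 * strtol is the standard library function that does the work. */
long parse_in_radix(const char *digits, int radix) {
    return strtol(digits, NULL, radix);
}
```

For instance, "11111111" in base 2, "377" in base 8, "255" in base 10, and "ff" in base 16 all denote the same number.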
reentrant a function that can be interrupted during its execution while another thread
can successfully invoke the function without the two functions interfering with the
data used in either function call. 321
refactor the process of modifying, updating or restructuring code without changing
its external behavior; refactoring may be done to make code more efficient, more
readable, more reliable, or simply to bring it into compliance with style or coding
conventions.
reference a reference in a computer program is a variable that refers to an object or
function in memory. 30
regular expression a sequence of characters in which special characters and directives
can be used to define a complex pattern that can be searched and matched in
another string or data. 437, 539
reserved word a word or identifier in a language that has a special meaning to the
syntax of the language and therefore cannot be used as an identifier in variables,
functions, etc. 19
scope the scope of a variable, method, or other entity in a program is the part of the
program in which the name or reference of the entity is bound. That is, the part of
the program that knows about the variable and in which the variable can be accessed,
changed, or used. 32, 247, 369
signature a function signature is how a function is uniquely identified. A signature includes
the name (identifier) of the function, its parameter list (and maybe types) and the
return type. 126
segmentation fault a fault or error that arises when a program attempts to access a
segment of memory that it is not allowed access to, usually resulting in the program
being terminated by the operating system. 285, 304
shallow copy in contrast to a deep copy, a shallow copy is merely a reference that refers
to the original array or piece of data. The two references point to the same data,
so if the data is modified, both references will reflect the change. 160, 308, 426
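The difference between the two kinds of copy can be sketched in C with pointers. The function names shallow_copy and deep_copy are hypothetical and chosen for this illustration only.

```c
#include <stdlib.h>
#include <string.h>

/* Shallow copy: returns the same address, so both names refer to one array. */
int *shallow_copy(int *source) {
    return source;
}

/* Deep copy: allocates new memory and duplicates the n elements.
 * Returns NULL if the allocation fails; the caller must free() the result. */
int *deep_copy(const int *source, size_t n) {
    int *copy = malloc(n * sizeof(int));
    if (copy != NULL) {
        memcpy(copy, source, n * sizeof(int));
    }
    return copy;
}
```

After a shallow copy, a change made through either pointer is visible through the other; after a deep copy, the two arrays can be modified independently.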
short circuiting the process by which the second operand in a logical statement is not
evaluated if the value of the expression is determined by the first operand. 70
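A common use of short circuiting is to guard an operation that would otherwise be unsafe. In the sketch below (the function name first_is_positive is hypothetical), the array element is only read when the earlier operands of && are true.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the array is non-empty and its first element is
 * positive. If values is NULL or n is 0, && short-circuits and
 * values[0] is never evaluated. */
bool first_is_positive(const int *values, size_t n) {
    return values != NULL && n > 0 && values[0] > 0;
}
```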
software any set of machine-readable instructions that can be executed in a computer
processor.
software engineering the study and application of engineering principles to the design,
development, and maintenance of complex software systems.
spaghetti code a negative term used for code that is overly complex, disorganized or
unstructured.
stack a data structure that stores elements in a LIFO (last-in first-out) manner; elements
can be added to a stack via a push operation which places the element on the top
of the stack; elements can be removed from the top of the stack via a pop operation.
129
stack overflow when a program runs out of stack space, it may result in a stack overflow
and the termination of the program. 197
static analysis the analysis of software that is performed on source (or object) code
without actually running or compiling a program usually by using an automated
tool that can detect actual or potential problems with the source code (other than
syntactic problems that could easily be found by a compiler). 144
static dispatch when function overloading is supported in a language, this is the mechanism by which the compiler determines which function should be called based on
the number and type of arguments passed to the function when it is called. 136
static typing a typing discipline in which a variable's type is specified when it is created
(declared) and does not change while the variable remains in scope. 31
string a data type that consists of a sequence of characters which are encoded under
some encoding standard such as ASCII or Unicode. 27, 171
string concatenation an operation by which a string and another data type are combined
to form a new string. 37
syntactic sugar syntax in a language or program that is not absolutely necessary (that
is, the same thing can be achieved using other syntax), but may be shorter, more
convenient, or easier to read/write. In general, such syntax makes the language
sweeter for the humans reading and writing it. 39, 79, 98, 153, 433
tautology a logical statement that is always true regardless of the truth values of the
statement's variables. 68
token when something (usually a string) is parsed, the individual components or elements
are referred to as tokens. 173
top-down design an approach to problem solving where a problem is broken down into
smaller parts. 3, 125
transpile to (automatically) translate code in one programming language into code
in another programming language, usually between two high-level programming
languages.
truncation removing the fractional part of a floating-point number to make it an integer.
Truncation is not a rounding operation. 36, 254, 377
two's complement a way of representing signed (positive and negative) integers using
the first bit as a sign bit (0 for positive, 1 for negative) and where negative numbers
are represented as the complement with respect to 2^n (the result of subtracting the
number from 2^n). 24
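For n = 8, the representation can be observed in C by converting a signed 8-bit value to an unsigned one. This is a minimal sketch; the function name twos_complement_bits is hypothetical.

```c
#include <stdint.h>

/* Returns the 8-bit two's complement bit pattern of value.
 * Conversion to uint8_t is defined as reduction modulo 2^8, so a
 * negative value v maps to 2^8 + v, that is, 2^8 minus its magnitude. */
uint8_t twos_complement_bits(int8_t value) {
    return (uint8_t) value;
}
```

For example, -5 is stored as 2^8 - 5 = 251, which is 0xFB or 11111011 in binary; the leading bit is 1, marking the value as negative.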
type a variable's type is the classification of the data it represents which could be
numeric, string, boolean, or a user defined type. 22
type casting converting a variable's type into another type, for example, converting an
integer into a more general floating-point number, or converting a floating-point
number into an integer, truncating and losing the fractional part. 36, 254, 377
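Both directions of conversion can be sketched in C with explicit casts. The function names truncate_to_int and exact_quotient are hypothetical and used only for this illustration.

```c
/* Converts a floating-point value to an int by discarding the fractional
 * part (truncation toward zero); this is not the same as rounding. */
int truncate_to_int(double x) {
    return (int) x;
}

/* Casts one operand to double so that the division is carried out in
 * floating-point rather than as integer division. Assumes b is nonzero. */
double exact_quotient(int a, int b) {
    return (double) a / b;
}
```

Note that truncation moves toward zero for negative values as well, so a value such as -3.99 becomes -3 rather than -4.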
underflow when an arithmetic operation involving floating-point numbers results in a
number that is smaller than the smallest representable number, underflow occurs,
resulting in an invalid result. 39
Unicode an international character encoding standard used in programming languages
and data formats. 431
validation the process of verifying that data is correct or conforms to certain expectations
such as formatting, type, range of values, or whether it represents a valid value. 42
variable a memory location which stores a value that may be set using an assignment
operator. Typically a variable is referred to using a name or identifier. 18
widget a generic term for a graphical user interface component such as a button or text
box.
Acronyms
ACM Association for Computing Machinery.
ALU Arithmetic and Logic Unit. 4
ANSI American National Standards Institute. 245
API Application Programmer Interface. 15, 177, 420
ASCII American Standard Code for Information Interchange. 27, 62, 177, 181, 251, 313,
319, 369, 374, 436, 538
CE Computer Engineering.
CLA Command Line Arguments.
CLI Command Line Interface. 44, 487
CMS Content Management System. 477
CMYK Cyan-Magenta-Yellow-Key.
CPU Central Processing Unit. 4, 70
CS Computer Science.
CSS Cascading Style Sheets.
CSV Comma Separated Values. 173, 325
CWD Current Working Directory. 178
CYA Cover Your Ass.
DRY Don't Repeat Yourself. 282
EB Exabyte.
ECMA European Computer Manufacturers Association.
EDI Electronic Data Interchange.
EOF End Of File. 178
FIFO First-In First-Out.
FOSS Free and Open Source Software.
GB Gigabyte.
GCC GNU Compiler Collection.
GDB GNU Debugger.
GIF Graphics Interchange Format.
GIMP GNU Image Manipulation Program.
GIS Geographic Information System.
GNU GNU's Not Unix! 48, 312
GUI Graphical User Interface. 42, 136, 351
HTML HyperText Markup Language. 175, 178, 477, 543
IDE Integrated Development Environment. 5, 21, 46, 48, 250, 369, 373, 452, 453, 480
IEC International Electrotechnical Commission. 26, 245
IEEE Institute of Electrical and Electronics Engineers. 26, 251, 298, 482
IP Internet Protocol. 193
ISO International Organization for Standardization. 245, 331
JDBC Java Database Connectivity.
JDK Java Development Kit. 367, 369, 411, 415, 419, 431, 440
JEE Java Enterprise Edition.
JIT Just In Time. 9
JPEG Joint Photographic Experts Group. 177
JRE Java Runtime Environment.
JSON JavaScript Object Notation. 179, 543
JVM Java Virtual Machine. 9, 367, 368, 407, 415, 419, 424, 425, 431, 448, 451, 464
KB Kilobyte. 154, 442
LIFO Last-In First-Out. 129
MAC Media Access Control. 193
MB Megabyte. 154
MP3 MPEG-2 Audio Layer III.
MPEG Moving Picture Experts Group.
NIST National Institute of Standards and Technology.
ODBC Open Database Connectivity.
OEM Original Equipment Manufacturer.
OOP Object-Oriented Programming. 368, 370, 417, 428, 479
PB Petabyte.
PHP PHP: Hypertext Preprocessor (a recursive backronym; used to stand for Personal
Home Page).
PNG Portable Network Graphics.
POJO Plain Old Java Object.
POSIX Portable Operating System Interface. 250, 298, 324, 331
RAM Random Access Memory. 5
REPL Read-Eval-Print Loop.
RGB Red-Green-Blue.
ROM Read-Only Memory.
RTFM Read The Freaking Manual.
RTM Read The Manual.
SDK Software Development Kit. 427
SE Software Engineering.
SEO Search Engine Optimization.
SQL Structured Query Language. 415, 420
SSL Secure Sockets Layer. 286
STEAM Science, Technology, Engineering, Art, and Math.
STEM Science, Technology, Engineering, and Math.
TB Terabyte.
TCP Transmission Control Protocol.
TDD Test-Driven Development.
TSV Tab Separated Values. 173
UI User Interface.
URL Uniform Resource Locator.
UTF-8 Universal (Character Set) Transformation Format, 8-bit.
UX User Experience.
VLSI Very Large Scale Integration. 4
W3C World Wide Web Consortium.
WWW World Wide Web. 175, 367
XML Extensible Markup Language. 21, 179, 543
Index
arrays, 151
  multidimensional, 160
associative arrays, 163
binary search, 207
case sensitive, 19
comparator, 232
composition, 193
conditionals, 61
  if statement, 71
  if-else statement, 72
  if-else-if statement, 75
conjunction, see logical operators, and
deep copy, 160
disjunction, see logical operators, or
do-while loop, 96
error handling, 143
  exception, 147
exception, 147
file I/O, 177
for loop, 95
foreach loop, 98, 153
functions, 125
if statement, 71
if-else statement, 72
if-else-if statement, 75
infinite loop, 93, 99
Insertion Sort, 219
logic, 69
objects, 191
open recursion, 194
operator, 33
  logical, 61
  order of precedence, 38
recursion, 197
search
  binary search, 207
  linear search, 206
searching, 205
Selection Sort, 215
sets, 163
shallow copy, 160
short circuiting, 70
sorting, 205
  Heap Sort, 231
  Insertion Sort, 219
  Merge Sort, 226
  Quick Sort, 221
  Selection Sort, 215
  stability, 237
  Tim Sort, 231
strings, 27, 171
  comparison, 172
  tokenizing, 173
Tim Sort, 231
type juggling, 487
variable, 18
  scope, 32
while loop, 93
Bibliography
[1] Mars climate orbiter. http://mars.jpl.nasa.gov/msp98/orbiter/, 1999. [Online;
accessed 17-March-2015].
[2] Moth in the machine: Debugging the origins of "bug". Computer World Magazine,
September 2011.
[3] errno.h: system error numbers - base definitions reference.
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html, 2013.
[Online; accessed 13-September-2015].
[4] ISO/IEC 9899 - Programming Languages - C.
http://www.open-std.org/JTC1/SC22/WG14/www/standards, 2013.
[5] Java platform standard edition 7. http://docs.oracle.com/javase/7/docs/api/,
2015. [Online; accessed 10-February-2015].
[6] List of software bugs. https://en.wikipedia.org/wiki/List_of_software_bugs,
2015. [Online; accessed 12-September-2015].
[7] Unchecked exceptions - the controversy.
https://docs.oracle.com/javase/tutorial/essential/exceptions/runtime.html, 2015.
[Online; accessed 15-September-2015].
[8] Douglas Adams. Dirk Gently's Holistic Detective Agency. Pocket Books, 1987.
[9] Jon L. Bentley and M. Douglas McIlroy. Engineering a sort function. Softw. Pract.
Exper., 23(11):1249-1265, November 1993.
[10] Joshua Bloch. Extra, extra - read all about it: Nearly all binary searches
and mergesorts are broken.
http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html, 2006.
[11] Michael Braukus. NASA honors apollo engineer. NASA News, September 2003.
[12] IEEE Computer Society, Portable Applications Standards Committee; The Open Group
(Reading, England); Institute of Electrical and Electronics Engineers; and IEEE-SA
Standards Board. Standard for Information Technology: Portable Operating
System Interface (POSIX): Base Specifications. Number Issue 7 in IEEE Std. 2008.
[13] Edsger W. Dijkstra. Why numbering should start at zero.
https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html, 1982.
[Online; accessed September 25, 2015].
[14] Bruce Eckel. Thinking in Java. Prentice Hall PTR, Upper Saddle River, NJ, USA,
4th edition, 2005.
[15] Internet Goons. Do i cast the result of malloc?
http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc.
[Online; accessed September 27, 2015].
[16] Arend Heyting. Die formalen Regeln der intuitionistischen Logik. Berlin, 1930. First
use of the notation ¬ as a negation operator.
[17] C. A. R. Hoare. Algorithm 64: Quicksort. Commun. ACM, 4(7):321, July 1961.
[18] C. A. R. Hoare. Quicksort. The Computer Journal, 5(1):10-16, 1962.
[19] IEC. IEC 60559 (1989-01): Binary floating-point arithmetic for microprocessor
systems. 1989. This Standard was formerly known as IEEE 754.
[20] IEEE Task P754. IEEE 754-2008, Standard for Floating-Point Arithmetic. August
2008.
[21] ISO. ISO 8601:1988. Data elements and interchange formats - Information interchange - Representation of dates and times. 1988. See also 1-page correction, ISO
8601:1988/Cor 1:1991.
[22] John McCarthy. Towards a mathematical science of computation. In IFIP
Congress, pages 21-28. North-Holland, 1962.
[23] Brian W. Kernighan. Programming in C - A Tutorial. Bell Laboratories, Murray
Hill, New Jersey, 1974.
[24] Brian W. Kernighan. The C Programming Language. Prentice Hall Professional
Technical Reference, 2nd edition, 1988.
[25] Donald E. Knuth. Von Neumann's first computer program. ACM Comput. Surv.,
2(4):247-260, December 1970.
[26] Tony Long. The man who saved the world by doing ... nothing. Wired, September
2007.
[27] M. V. Wilkes, D. J. Wheeler and S. Gill. The preparation of programs for an
electronic digital computer, with special reference to the EDSAC and the use of a
library of subroutines. Addison-Wesley Press, Cambridge, Mass., 1951.
[28] Richard E. Pattis. Textbook errors in binary searching. In Proceedings of the
Nineteenth SIGCSE Technical Symposium on Computer Science Education, SIGCSE
'88, pages 190-194, New York, NY, USA, 1988. ACM.
[29] Giuseppe Peano. Studii de Logica Matematica. In Atti della Reale Accademia delle
scienze di Torino, volume 32 of Classe di Scienze Fisiche Matematiche e Naturali,
pages 565-583. Accademia delle Scienze di Torino, Torino, April 1897.
[30] Tim Peters. [Python-Dev] Sorting. Python-Dev mailing list,
https://mail.python.org/pipermail/python-dev/2002-July/026837.html, July 2002.
[31] Bertrand Russell. The theory of implication. American Journal of Mathematics,
28:159-202, 1906.
[32] Bruce A. Tate. Seven Languages in Seven Weeks: A Pragmatic Guide to Learning
Programming Languages. Pragmatic Bookshelf, 1st edition, 2010.
[33] Alfred North Whitehead and Bertrand Arthur William Russell. Principia mathematica; vol. 1. Cambridge Univ. Press, Cambridge, 1910.
[34] J. W. J. Williams. Algorithm 232: Heapsort. Communications of the ACM,
7(6):347-348, 1964.