CMPE 220
Class 1
System Software
– or –
History of Computing
CMPE 220 1
System Software / History of Computing
Programming
• Machine language
• Loaders
• Assembly language
• Macro Processors
• Linkers
• Compilers
• Schedulers
• Memory Managers
• Servers
Architecture
• Microprogramming
• CISC
• RISC
• Distributed Systems
• Networked Systems
Rear Admiral Grace Murray Hopper
• December 9, 1906 – January 1, 1992
• PhD, Mathematics, Yale University, 1934
• One of the first programmers of
the Harvard Mark I computer, she was a
pioneer in computer programming who
invented one of the first linkers.
• She popularized the idea of machine-
independent programming languages,
which led to the development of COBOL,
an early high-level programming
language still in use today.
My Background
Robert Nicholson
• BS, Computer Science - 1975
CSU, Chico
• MS, Computer Engineering – 1978
Stanford University
• Engineering and Management:
Hewlett-Packard, Oracle, Silicon Graphics, Sun Microsystems,
Red Herring Communications, NASA, various startups
• While in college, started a company building kit computers for local
businesses
Housekeeping
• Class information in Canvas
• Email: [email protected]
Grading
• Assignments: 30%
• Midterm: 25%
• Final: 45%
Examples and Assignments
• Examples will be based on Unix/Linux/POSIX – and if you don’t know
what that means, I’m about to explain.
• Programming assignments will be short and can be done on any
system, but I strongly advise that you use a Unix/Linux/POSIX system,
for several reasons:
• It’s going to be easier for you
• I probably can’t help with questions or problems you encounter on other
systems
• It’s effectively an industry standard. From a career standpoint, you should be
proficient in it!
• So, let’s see why it’s so important…
How Software Became Portable
• Before the 1970s, every computer manufacturer distributed bundled
software with their systems – compilers, databases, editors, and so
on. Once you chose a computer vendor, you were “locked in” to their
systems.
• There was virtually no independent software industry.
• Three things changed that:
• The development of system-independent programming languages
• The development of system-independent databases (SQL)
• The proliferation of Unix/POSIX based systems
How “Unix” Became a De Facto
Standard
• Unix was developed in the 1970s at Bell Labs by Ken Thompson, Dennis
Ritchie, and others, but licensing was very restrictive
• An operating system based on Unix – the Berkeley Software Distribution (BSD) – was
developed at UC Berkeley beginning in 1977. macOS is derived in part from BSD.
• Another Unix-like system – Linux – was developed by Finnish software
engineer Linus Torvalds in the early 1990s. Many Linux variants (distributions)
were created by other developers.
• The IEEE Computer Society released a family of standards – the Portable
Operating System Interface (POSIX) – beginning in 1988.
• Over 70% of all web servers, and approximately 98% of the top 1 million
web servers, run POSIX compliant operating systems.
Machine Code – Common in 1940s
Instruction            Action
0101 1111 1111 0001    Load the value from the following address into the A register; advance the program counter by 2
0011 1110 1000 0101    (data address)
0111 1111 1111 0001    Subtract the following value from the A register; advance the program counter by 2
0000 0000 0000 1100    (value = 12)
0110 1111 1111 0001    Store the value from the A register into the following address; advance the program counter by 2
0011 1110 1000 0101    (data address)
0110 1010 1111 0001    Compare the following value to the A register; if A is less than or equal to the value, jump to the program address that follows it
0000 0000 0001 0100    (value = 20)
0100 1000 1000 0110    (program address)
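To make the table concrete, here is a minimal bash sketch of a fetch-decode-execute loop for exactly these four instructions. It assumes (hypothetically, nothing on the slide says so) that memory is a bash array of 16-bit words, that the program starts at address 0, and that the word 0x0000 means "halt".

```bash
#!/usr/bin/env bash
# Toy fetch-decode-execute loop for the four instructions in the table.
# Hypothetical assumptions: memory is a bash array of 16-bit words,
# the program starts at address 0, and the word 0x0000 means "halt".

declare -a MEM=()   # word-addressed memory
A=0                 # the A register
PC=0                # the program counter

run() {
  local op word
  while true; do
    printf -v op '%04X' "${MEM[PC]:-0}"   # fetch the opcode word, as hex
    word=${MEM[PC+1]:-0}                  # the word that follows it
    case $op in
      5FF1) A=${MEM[word]:-0}; PC=$((PC + 2)) ;;   # load A from address
      7FF1) A=$((A - word));   PC=$((PC + 2)) ;;   # subtract the value
      6FF1) MEM[word]=$A;      PC=$((PC + 2)) ;;   # store A at address
      6AF1) if (( A <= word )); then PC=${MEM[PC+2]}   # jump if A <= value
            else PC=$((PC + 3)); fi ;;
      0000) return 0 ;;                            # halt (assumed encoding)
      *) echo "illegal opcode $op at $PC" >&2; return 1 ;;
    esac
  done
}

# LDA 3E85 ; SBA 12 ; STA 3E85 ; CMPA 20, 4886 ; halt
MEM=( $((0x5FF1)) $((0x3E85))  $((0x7FF1)) 12  $((0x6FF1)) $((0x3E85)) \
      $((0x6AF1)) 20 $((0x4886))  0 )
MEM[0x3E85]=30    # inventory: a made-up starting value
MEM[0x4886]=0     # low_inventory: this sketch simply halts there
run
echo "A=$A inventory=${MEM[0x3E85]} PC=$PC"   # prints: A=18 inventory=18 PC=18566
```

Every address is a raw number, which is exactly what makes machine code hard to read and impossible to relocate without editing it.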
HP 2116 – circa 1966
Problems with Machine Code
Programming
• Really hard and error prone
• Really hard to “read” code
• Tied to machine architecture
• Need to remember what each address is used for
• Not relocatable – if you want to load things at a different place in
memory, addresses change
Assembly Language
Instruction Action
LDA inventory Load the value from the specified location into the A
register
SBA 12 Subtract a value (12) from the A register
STA inventory Store the value from the A register into the specified
location
CMPA 20, low_inventory Compare the A register to a value (20); if A <= 20, go
to the address “low_inventory”
•
•
•
low_inventory:
Assembly Language Coding Sheet
Assembly Language
• Easier to remember and code, less error prone
• Locations identified by name
• Identifiers can be mapped to different memory addresses (this is one
of the things a loader does)
• Problems: Still tied to machine architecture – you can’t take an
assembly language program and run it on a different machine
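The step from names to numbers can be sketched as a tiny two-pass assembler in bash: pass 1 builds a symbol table, pass 2 emits machine words with names replaced by addresses. The mnemonic-to-opcode mapping reuses the machine-code table; the `.data` directive, the `name:` label syntax, and `HLT` (0x0000) are hypothetical conventions made up for this sketch.

```bash
#!/usr/bin/env bash
# Two-pass assembler sketch for the toy mnemonics on these slides.
# Hypothetical assumptions: opcodes match the machine-code table,
# "name:" defines a label, ".data name address" places a variable,
# and HLT (0x0000) halts.

declare -A OPCODE=( [LDA]=5FF1 [SBA]=7FF1 [STA]=6FF1 [CMPA]=6AF1 [HLT]=0000 )
declare -A SIZE=(   [LDA]=2    [SBA]=2    [STA]=2    [CMPA]=3    [HLT]=1 )
declare -A SYM=()   # symbol table: name -> address

assemble() {        # source on STDIN; one hex word per line on STDOUT
  local -a lines
  local line op args name addr word loc=0
  mapfile -t lines
  # Pass 1: give every label and variable an address.
  for line in "${lines[@]}"; do
    read -r op args <<< "$line"
    case $op in
      "")    ;;
      .data) read -r name addr <<< "$args"; SYM[$name]=$((addr)) ;;
      *:)    SYM[${op%:}]=$loc ;;
      *)     loc=$((loc + ${SIZE[$op]})) ;;
    esac
  done
  # Pass 2: emit opcodes, replacing names with their addresses.
  for line in "${lines[@]}"; do
    read -r op args <<< "$line"
    [[ -z $op || $op == .data || $op == *: ]] && continue
    echo "${OPCODE[$op]}"
    for word in ${args//,/ }; do
      [[ -n ${SYM[$word]+set} ]] && word=${SYM[$word]}
      printf '%04X\n' "$word"
    done
  done
}

assemble <<'EOF'
.data inventory 0x3E85
LDA inventory
SBA 12
STA inventory
CMPA 20, low_inventory
HLT
low_inventory:
HLT
EOF
```

This prints 5FF1, 3E85, 7FF1, 000C, 6FF1, 3E85, 6AF1, 0014, 000A, 0000, 0000, one word per line. Moving `inventory` only requires changing its address in one place, but the output is still specific to this one instruction set.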
Higher Level (compiled) Language
Instruction Action
inventory = inventory - 12; Subtract 12 from the value in the specified location
if ( inventory <= 20 ) goto low_inventory; If the value of inventory is less than or equal to 20,
jump to the address “low_inventory”
•
•
•
low_inventory:
Pros
• Much easier and less error prone
• Not tied to a particular machine (Grace Hopper’s idea!)
Cons
• Early compilers generated inefficient code
Compiler Inefficiencies
Instruction Action
inventory = inventory - 12; LDA inventory
SBA 12
STA inventory
if ( inventory <= 20 ) goto low_inventory; LDA inventory
CMPA 20, low_inventory
•
•
•
low_inventory:
Inefficient Code
• Literal, line-by-line translation (smart, optimizing compilers weren’t developed until years later)
• In this example, we generate five instructions instead of four – 25% more instructions, and more memory to hold them
Linkers and Loaders
• Linkers combine machine code modules into a single program,
allowing programs to be written in pieces. Linkers also allow the
development of shared libraries.
• Loaders load machine code program files into memory, adjust
addresses, and initialize registers.
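The "adjust addresses" step can be sketched in a few lines of bash. The object format here is hypothetical: hex words assembled as if the program were loaded at address 0, plus a list of the word offsets that hold addresses. Only those words get the load address added; immediate values such as the `0001` operand of SBA must be left alone.

```bash
#!/usr/bin/env bash
# Sketch of the address-adjustment (relocation) step a loader performs.
# Hypothetical object format: hex words assembled as if loaded at address 0,
# plus a list of word offsets that hold addresses and must be relocated.

# relocate BASE "WORDS" "OFFSETS": add BASE to each word listed in OFFSETS
relocate() {
  local base=$1 off
  local -a words reloc
  read -r -a words <<< "$2"
  read -r -a reloc <<< "$3"
  for off in "${reloc[@]}"; do
    words[off]=$(printf '%04X' $(( 16#${words[off]} + base )))
  done
  echo "${words[*]}"
}

# LDA 0007 ; SBA 1 ; STA 0007 ; HLT ; data word 001E (inventory = 30)
obj="5FF1 0007 7FF1 0001 6FF1 0007 0000 001E"
relocate 0x2000 "$obj" "1 5"   # prints: 5FF1 2007 7FF1 0001 6FF1 2007 0000 001E
```

Linkers need the same information, because combining modules also shifts every module except the first to a new starting address.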
Building Software
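• High-level language source code (e.g. C++) → Compiler → assembly language source code
• Assembly language source code → Assembler → binary machine code
• Binary machine code → Linker (optional) → executable machine code
• Executable machine code → Loader → in-memory code → hardware execution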
Interpreters
• Interpreters implement programming languages.
• Like a compiler, an interpreter parses the language.
• Rather than emitting assembly language, the interpreter immediately
executes the language statements.
• Interpreters eliminate the need for assemblers, linkers, and loaders,
but the programs run much slower than compiled programs that have
been converted to machine code.
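A minimal interpreter sketch in bash: each statement is read, parsed, and executed on the spot, with no assembler, linker, or loader in between. The three-statement language (`let`, `sub`, `print`) is hypothetical and exists only for this illustration.

```bash
#!/usr/bin/env bash
# Minimal interpreter sketch: each statement is parsed and executed at once.
# The three-statement language (let / sub / print) is hypothetical.

declare -A VARS=()

interpret() {         # reads statements from STDIN
  local cmd name value
  while read -r cmd name value; do
    case $cmd in
      let)   VARS[$name]=$value ;;
      sub)   VARS[$name]=$(( ${VARS[$name]:-0} - value )) ;;
      print) echo "$name=${VARS[$name]:-0}" ;;
      "")    ;;                                   # skip blank lines
      *)     echo "syntax error: $cmd" >&2; return 1 ;;
    esac
  done
}

interpret <<'EOF'
let inventory 30
sub inventory 12
print inventory
EOF
# prints: inventory=18
```

The slowdown is visible in the structure: every statement is re-parsed each time it runs, whereas a compiler does that work once.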
Think About It
• Our company makes the LunaVac Model 1 computer. We’ve spent
years developing assemblers, linkers, loaders, compilers, and a lot of
application software… all written in C++.
• The hardware guys have just completed the new LunaVac Model 2
computer… but the architecture and instruction set are VERY
different!
• How do we get our software to run on the Model 2?
Step 1: Write Model 2 Assembler
• Devise an assembly language instruction set for the Model 2
• Using C++ (or another high level language), write an assembler for the
Model 2 instruction set that generates Model 2 machine code
• We can now write and assemble programs on a Model 1, and emit
machine code for the Model 2
Step 2: Write a Compiler for the
Model 2
• Write a compiler in C++ that compiles C++ into Model 2 assembly
language
• We now have a path to write C++ code, which we can then compile to
emit Model 2 assembly code, which we can then assemble to
generate Model 2 binary instructions
Step 3: Write (or port) Code for the
Model 2
• Write machine-specific code like system software using C++
• Some code – things like editors, application programs, and utility
programs – may not even need to be re-written. If they were
originally written in C++, they can simply be “cross compiled” to move
them to the new machine.
Course Goals and Requirements
• One of the important goals of this class is to understand how system
software components fit together and the problems that they solve,
so that you can use them to address real problems.
• This is not a programming-intensive class. Programming exercises will
be very short. But if you don’t have a good grasp of programming,
you will have a very hard time following the concepts, and keeping up
with the assignments.
Shell scripting
Shell Scripting
• A “shell” is a command-line interface to a computer system. The shell
implements a high-level interpreted programming language.
• On POSIX systems, the shell is directly accessible from the console or
any authorized terminal device.
• On macOS, the shell is accessed through the Terminal program.
• On Windows, the shell is accessed through the Command Prompt.
• Shell commands can be entered directly, or loaded from a file.
• The most commonly used shell is called bash (the Bourne Again SHell),
and that’s the one I’ll use for my examples.
History
• The first Unix shell, called sh, was developed
by Ken Thompson at Bell Labs in 1971.
Thompson, along with Dennis Ritchie,
invented and popularized the Unix operating
system. (Ritchie was also the inventor of the
C programming language.)
• Sh was completely rewritten by Stephen
Bourne in 1979, who created the Bourne
Shell. A number of spinoff and replacement
shells were subsequently developed,
including ksh, csh, tcsh, and bash.
Ken Thompson
Bash Basics
• Bash has a number of built-in commands that can be executed from the
command line or in a script file.
• If the command line input does not correspond to a built-in shell
command, the shell will look for an executable program or shell script file
of that name.
• The shell has a built-in variable called PATH. PATH is an ordered list of
directories (separated by “:”) that bash will search for executable
programs.
• Arguments can be passed to shell commands, or to programs launched
from the shell, by simply appending them on the command line (with
spaces as separators):
command arg1 arg2 arg3
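The PATH search described above can be sketched as a bash function. `find_in_path` is a hypothetical helper written for illustration; bash's own `type -P name` reports the same thing. It splits PATH on colons, tries each directory in order, and returns the first executable match.

```bash
#!/usr/bin/env bash
# Sketch of the lookup the shell performs for an unqualified command name:
# try each PATH directory, in order, and take the first executable match.

find_in_path() {
  local name=$1 dir
  local IFS=:                          # PATH entries are separated by colons
  for dir in $PATH; do
    if [[ -z $dir ]]; then dir=.; fi   # an empty entry means the current directory
    if [[ -f $dir/$name && -x $dir/$name ]]; then
      echo "$dir/$name"
      return 0
    fi
  done
  return 1                             # not found: bash reports "command not found"
}

find_in_path ls
```

Because the search stops at the first match, the order of directories in PATH decides which of two same-named programs runs.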
Bash Basics (cont)
• Many shell commands accept options, indicated by a leading “-”
character
ls -al directory1 directory2 directory3
• Multiple commands or programs can be executed from a single line,
using the “;” character as a separator:
command; program1 arg1 arg2; program2
• The “*” character may be used as a wildcard when matching
filenames in commands:
ls *.jpg
• A “\” character is used to escape special characters:
ls -al file\*name
The rc file
• When an interactive shell session is launched, it automatically looks
for and executes a shell startup script. In the case of bash, this file is
called .bashrc, in your home directory (login shells read ~/.bash_profile instead).
• This shell startup script may be used to initialize variables, print a
header message, perform cleanup activities, etc.
• One of the most common things you will want to do in your startup
script is set the PATH variable, which tells the shell where to look for
programs.
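A sketch of a .bashrc fragment that sets PATH. `path_prepend` is a hypothetical helper, and `$HOME/bin` is just an example directory. It adds a directory only if it is not already present, so re-reading the startup script does not keep growing PATH.

```bash
#!/usr/bin/env bash
# Example ~/.bashrc fragment (a sketch; $HOME/bin is just an example directory).
# path_prepend puts a directory at the front of PATH unless it is already there.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                       # already present: leave PATH alone
    *) PATH="$1${PATH:+:$PATH}" ;;     # prepend, adding ":" only if PATH was set
  esac
}

path_prepend "$HOME/bin"
export PATH

# Print a greeting only in interactive shells ($- contains "i").
if [[ $- == *i* ]]; then echo "Welcome, ${USER:-user}"; fi
```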
File References
• “name” refers to an unqualified program name. The shell will search
for the program in the directories in the path variable:
name
• “/” refers to the root of the file system. Subsequent / characters are
used to specify subfolders or files, e.g.:
/usr/bin/name
• “.” refers to the current working directory. To execute a local program
rather than search the path, you would enter:
./name
• “..” refers to the parent of the current working directory:
../siblingdirectory/name
POSIX File Permissions
• Permissions are granted to three classes of user:
• The file’s Owner
• The file’s Group
• World – which means everyone
• The possible permissions for each class are read, write, and execute.
• When applied to a directory, execute means that the directory can be
searched or traversed.
• Programs inherit the permissions of the user who launched the
program, unless the program has a SUID (Set User ID) flag set, in
which case it inherits the permissions of the owner of the executable
file.
POSIX File Permissions (cont)
• Permissions are displayed by commands such as ls as a nine-character
string, showing read/write/execute permissions for owner, group, and
world. A “-” character indicates that a permission is not granted:
rwxrw-r--
Owner: read, write, execute
Group: read, write
World: read
• Permissions can also be written in octal, one digit per class (owner, group,
world), with read = 4, write = 2, and execute = 1 added together. The string
above is 764; an optional leading digit holds special bits such as SUID (0 means none):
0764
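The mapping between the two notations can be sketched as a pair of bash functions (hypothetical helper names). Each class becomes one octal digit built from r = 4, w = 2, x = 1; these handle only the three permission digits, not the optional leading special-bits digit.

```bash
#!/usr/bin/env bash
# Convert between the rwx string shown by ls and the octal digits used by chmod.
# Each class (owner, group, world) becomes one octal digit: r=4, w=2, x=1.

perm_to_octal() {     # rwxrw-r-- -> 764
  local s=$1 out="" i digit
  for (( i = 0; i < 9; i += 3 )); do
    digit=0
    if [[ ${s:i:1}   == r ]]; then digit=$(( digit + 4 )); fi
    if [[ ${s:i+1:1} == w ]]; then digit=$(( digit + 2 )); fi
    if [[ ${s:i+2:1} == x ]]; then digit=$(( digit + 1 )); fi
    out+=$digit
  done
  echo "$out"
}

octal_to_perm() {     # 764 -> rwxrw-r--
  local oct=$1 out="" i d
  for (( i = 0; i < 3; i++ )); do
    d=${oct:i:1}
    if (( d & 4 )); then out+=r; else out+=-; fi
    if (( d & 2 )); then out+=w; else out+=-; fi
    if (( d & 1 )); then out+=x; else out+=-; fi
  done
  echo "$out"
}

perm_to_octal rwxrw-r--   # prints: 764
octal_to_perm 750         # prints: rwxr-x---
```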
Useful Shell Commands
• pwd – print the current working directory
• cd – change the current working directory:
cd ../../documents
• ls – list directory contents. By default, the command will list the
contents of the current working directory. Optionally, the command
accepts a directory name:
ls ../documents
• ls accepts many options, prefaced by a “-” character. Useful options include:
-a – list all files (by default, filenames starting with a “.” are not listed)
-l – list files in long format (show ownership, permissions, and date)
-R – list subdirectories recursively (lowercase -r reverses the sort order)
Useful Shell Commands (cont)
• chmod – change the permissions of files
chmod 750 file1 file2 file3
• cat – concatenate the specified files, and send the result to STDOUT
(the standard output)
cat file1 file2 file3
• grep – search the designated file and output lines matching the
specified regular expression
grep expression file
• history – list the history of commands executed in the current shell
session
Useful Shell Commands (cont)
• !! – repeat the previous command
• !(chars) – repeat the most recent command starting with the
designated characters:
!ls
• !(number) – repeat the specified command number. Bash keeps a
numerically indexed list of all commands executed in the current
session, starting with 1.
!173
• ^(pattern)^(replacement) – repeat the previous command, replacing
the first occurrence of pattern with the replacement string:
^Tuesday^Wednesday
Standard File Streams
• When POSIX systems launch a program, the system automatically
opens three file streams:
• STDIN – standard input (file descriptor 0)
• STDOUT – standard output (file descriptor 1)
• STDERR – standard error reporting stream (file descriptor 2)
• Bash allows you to assign these streams when programs are launched,
using the “<” and “>” characters.
• Execute a program and assign the contents of a file to STDIN
program <file
• Execute a program and direct STDOUT to a file (replacing the contents)
program >file
Standard File Streams (cont)
• Execute a program and direct STDOUT to a file (appending to the
contents):
program >> file
• Execute a program and direct STDERR to a file:
program 2> file
• Redirection can be combined with arguments:
program arg1 arg2 <myinput >myoutput
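The redirections above in one runnable bash sketch. `report` is a hypothetical stand-in for any program that writes to both output streams; the temporary directory keeps the example from touching real files.

```bash
#!/usr/bin/env bash
# report stands in for any program that writes to both output streams
# (a hypothetical example).
report() {
  echo "result: ok"                    # goes to STDOUT (descriptor 1)
  echo "warning: low inventory" >&2    # goes to STDERR (descriptor 2)
}

dir=$(mktemp -d)
report  >"$dir/out"  2>"$dir/err"   # STDOUT and STDERR to separate files
report >>"$dir/out"  2>/dev/null    # append STDOUT; discard STDERR
wc -l <"$dir/out"                   # wc reads STDIN from the file (2 lines)
cat "$dir/err"                      # prints: warning: low inventory
rm -r "$dir"
```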
Pipelines (aka Pipes)
• Multiple programs can be executed from a single line. The STDOUT
stream of each program can be attached to the STDIN stream of the
next using the “|” character:
program1 argument | program2 | program3
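A small pipeline, wrapped in a hypothetical `most_used` function, that finds the most frequent line on its input. Each stage reads the previous stage's STDOUT on its STDIN: sort groups identical lines, `uniq -c` counts them, `sort -rn` puts the largest count first, `head` keeps one line, and awk prints the name.

```bash
#!/usr/bin/env bash
# most_used: read one command name per line on STDIN and print the most
# frequent one. Each stage's STDOUT is connected to the next stage's STDIN.
most_used() {
  sort | uniq -c | sort -rn | head -n 1 | awk '{ print $2 }'
}

printf '%s\n' ls cd ls grep ls cd | most_used    # prints: ls
```

In an interactive shell, `history | awk '{ print $2 }' | most_used` would apply it to your own command history.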
Variables
• Bash allows the setting and retrieval of variables.
• To set a variable, simply append an equal sign after its name (no
spaces):
today=Wednesday
• To set a value including spaces, quote the string:
today="Wednesday, January 8, 2020"
• To reference a variable, prepend a dollar sign to its name:
echo $today
• To append to a variable, reference it and follow the reference with the
additional contents:
PATH=$PATH:/usr/local/bin
Conditionals
• By convention, programs return 0 for success and a nonzero value (usually 1) for failure.
• command1 && command2 – execute command2 if – and only if –
command1 succeeds
• command1 || command2 – execute command2 if – and only if –
command1 fails
• [[ condition ]] – The bracket characters are used to evaluate the
condition and return success or failure (spaces around the operator are required):
[[ $today == Wednesday ]]
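A short bash sketch tying exit status to `&&`, `||`, and `[[ ]]`. `is_wednesday` is a hypothetical helper whose exit status is simply that of its `[[ ]]` test. The last lines show why the spaces matter: without them the whole expression is one non-empty word, which `[[ ]]` treats as true.

```bash
#!/usr/bin/env bash
# is_wednesday succeeds (exit status 0) only if its argument is Wednesday.
is_wednesday() {
  [[ $1 == Wednesday ]]
}

is_wednesday Wednesday && echo "yes"              # prints: yes
is_wednesday Tuesday   || echo "no, status=$?"    # prints: no, status=1

# Pitfall: with no spaces, "$today==Wednesday" is a single non-empty word,
# and [[ word ]] is true for any non-empty word.
today=Tuesday
[[ $today==Wednesday ]] && echo "always true"     # prints: always true
```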
Conditional Expressions
Expression True if:
[[ string1 == string2 ]] string1 is equal to string2
[[ string1 != string2 ]] string1 is not equal to string2
[[ num1 -eq num2 ]] num1 is equal to num2
-ne, -lt, -le, -gt, -ge Not equal, less than, less than or equal, greater than,
greater than or equal
[[ -e file ]] File exists
[[ -r file ]] File is readable
[[ -w file ]] File is writeable
[[ -x file ]] File is executable
[[ file1 -nt file2 ]] file1 is newer (more recently modified) than file2
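A sketch that exercises several tests from the table. `classify_file` is a hypothetical helper. The last two lines show a common trap: `-gt` compares numbers, while `<` inside `[[ ]]` compares strings, so "10" sorts before "9".

```bash
#!/usr/bin/env bash
# classify_file: describe a path using the file tests from the table.
classify_file() {
  local f=$1
  if   [[ ! -e $f ]]; then echo "missing"
  elif [[ -x $f ]];   then echo "executable"
  elif [[ -w $f ]];   then echo "writable"
  elif [[ -r $f ]];   then echo "read-only"
  else                     echo "no access"
  fi
}

# Numeric operators compare values; == and < compare strings.
if [[ 10 -gt 9 ]]; then echo "numeric: 10 > 9"; fi
if [[ 10 < 9 ]];   then echo "string: \"10\" sorts before \"9\""; fi
```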
Conditional Blocks: if
Block statements are most convenient in scripts; on the command line, write them on one line with “;” separators, or continue them over several lines.
• if condition; then - Execute statements if the condition is true
    statements
else
    statements
fi
if [[ $today == "Wednesday" ]]; then
    echo "It's Wednesday"
else
    echo "It's not Wednesday"
fi
Loop Blocks: for
Block statements are most convenient in scripts; on the command line, write them on one line with “;” separators, or continue them over several lines.
• for variable in values; do
    statements
done – iterates through values
for day in Mon Tue Wed Thu Fri; do
    echo $day
done
• Values may be specified as a range:
for value in {1..100}; do
    echo $value
done
Loop Blocks: while
Block statements are most convenient in scripts; on the command line, write them on one line with “;” separators, or continue them over several lines.
• while [ condition ]; do
    statements
done – iterates while condition is true
i=0
while [ $i -lt 100 ]; do
    i=$((i + 1))
    echo $i
done
make
What is Make?
• Make is an application to organize and automate the process of
compiling programs.
• Make builds programs based on instructions in a file, called a
makefile.
• A makefile contains:
• A list of dependencies (e.g. a hierarchical list of files that go into building
a program)
• Instructions for building the program and its components
History
• Make was created by Stuart Feldman, at Bell Labs, in 1976
• Feldman received the 2003 ACM Software System Award for authoring
this widespread tool
• Ken Thompson and Dennis Ritchie were not the only people working
on Unix!
• Prior to make, most programmers created shell scripts to build
complex programs
• Make is now part of the POSIX standard, so makefiles are reasonably
portable
Dependencies: Example
• A program – prog.exe – is dependent on several object files
prog.exe depends on module1.o, module2.o, and module3.o
• Each object file depends on a source file and an include file.
module1.o depends on module1.c and module1.h
module2.o depends on module2.c and module2.h
module3.o depends on module3.c and module3.h
Instructions: Example
• There are instructions (command lines) for:
• Building each of the object files by compiling the source and include files
• Building the program file by linking the object files
Using Make
• When make is run it checks the modification dates of all
dependencies, and executes the necessary instructions to build the
program
• For example, if only module2.c has changed, then:
module2.o will be rebuilt by compiling module2.c
the program will be rebuilt by linking module1.o, module2.o, and
module3.o
• Note that module1.o and module3.o will not be rebuilt, because the
underlying source files on which they depend have not changed
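The timestamp comparison at the heart of make can be sketched with bash's `-nt` test. `needs_rebuild` is a hypothetical helper that checks a single rule; real make also walks the dependency tree recursively, rebuilding dependencies before it checks the target.

```bash
#!/usr/bin/env bash
# needs_rebuild TARGET DEP...: succeed if TARGET is missing or any DEP is
# newer than it (make's check for a single rule).
needs_rebuild() {
  local target=$1 dep
  shift
  if [[ ! -e $target ]]; then return 0; fi
  for dep in "$@"; do
    if [[ $dep -nt $target ]]; then return 0; fi
  done
  return 1
}

# Hypothetical files from the example above.
if needs_rebuild module2.o module2.c module2.h; then
  echo "would run: cc -c module2.c"
fi
```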
Makefile: Example of a Make Rule
• target: dependencies
	command 1
	command 2
	command 3
hellomake: hellomake.c hellofunc.c
	gcc -o hellomake hellomake.c hellofunc.c -I.
• The first line lists the dependencies for the program hellomake
• The second line lists the command line for building hellomake (the
target)
• Command lines must begin with a tab character (not spaces)
• You can have multiple rules in a makefile (conventionally separated by blank lines)
Invoking Make
• To build the program, type the make command on the command line:
make
• Make will look for a file called “makefile” (or “Makefile”) in the current working directory
• The first target in the file (the default goal) will be built
• That first rule may include other targets as dependencies, thus triggering
those rules
• You can include several independent programs in a makefile, in which
case you would need to specify which program to build:
make hellomake
Make Macros
• Make supports macros, which are similar to variables
CC=gcc
CFLAGS=-I.
hellomake: hellomake.o hellofunc.o
	$(CC) $(CFLAGS) -o hellomake hellomake.o hellofunc.o
• Note that we don’t list hellomake.c as a dependency for hellomake.o.
• We also don’t provide a command line for building hellomake.o.
• These rules are built into make. You may not be comfortable
depending on built-in rules; feel free to include explicit rules.
Predefined Macros
Program macros (macro – definition – default):
• AS – Assembler – as
• CC – C compiler – cc
• CXX – C++ compiler – c++
• CPP – C pre-processor – $(CC) -E
• FC – Fortran 77 compiler – f77
• GET – Extract a file from SCCS – get
• LINT – Run lint on source code – lint
• PC – Pascal compiler – pc
• RM – Remove a file – rm -f
Flag macros (default: none):
• ASFLAGS – Flags for assembler
• CFLAGS – Flags for C compiler
• CXXFLAGS – Flags for C++ compiler
• CPPFLAGS – Flags for C pre-processor
• FFLAGS – Flags for Fortran compiler
• GFLAGS – Flags for SCCS
• LINTFLAGS – Flags for lint
• PFLAGS – Flags for Pascal compiler
• LDFLAGS – Flags for C/C++ loader (linker)
Generic Rules
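• If you don’t want to rely on built-in make rules, you can create your own generic rules:
CC=gcc
CFLAGS=-I.
DEPS = hellomake.h
%.o: %.c $(DEPS)
	$(CC) -c -o $@ $< $(CFLAGS)
• %.o: %.c is a pattern rule (a GNU make feature): each .o file depends on the .c file of the same name and on $(DEPS); $@ is the target and $< is the first dependency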
Rules You Should Include
• These are conventions that experienced users expect in every
makefile
• all: useful if your makefile includes several top-level programs
all: program1 program2 program3
• clean: delete all intermediate and generated files (such as .o files, libraries, etc.), so that
the next make rebuilds the top-level target from scratch.
• This is useful when something just isn’t working right and you suspect you
may have a corrupted object file, incorrect modification dates, etc.
• install: install the targets in appropriate directories; set permissions;
etc.
Make for Everything
• Make is not limited to building programs!
• For example, you could use make to generate a report. The
commands might include doing several database queries, and
concatenating the output of each into a report file.
For Next Week
• Recommended: Read chapter 1 of the text
• Log in to Canvas and:
• Submit a copy of your transcript, with the prerequisites highlighted
• Download the Academic Integrity Pledge, sign it, and submit it
• Note that the Transcript and the Academic Integrity Pledge are
required by the department; if you don’t submit them, you will be
dropped from the class.
CMPE 220
Class 2
Computer Architecture
Binary Versus Decimal Arithmetic
Decimal Binary Binary Coded Decimal (BCD)
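The three representations can be compared with a short bash sketch (hypothetical helper names). Binary encodes the whole value in base 2; BCD encodes each decimal digit separately in 4 bits, which is why 255 needs 8 bits in binary but 12 in BCD.

```bash
#!/usr/bin/env bash
# Print a few values in decimal, binary, and BCD (one 4-bit group per digit).

to_binary() {          # non-negative integer -> binary digits
  local n=$1 bits=""
  if (( n == 0 )); then bits=0; fi
  while (( n > 0 )); do
    bits=$(( n % 2 ))$bits
    n=$(( n / 2 ))
  done
  echo "$bits"
}

to_bcd() {             # decimal digit string -> 4 bits per digit
  local digits=$1 out="" i d
  for (( i = 0; i < ${#digits}; i++ )); do
    d=${digits:i:1}
    out+="${out:+ }$(( d >> 3 & 1 ))$(( d >> 2 & 1 ))$(( d >> 1 & 1 ))$(( d & 1 ))"
  done
  echo "$out"
}

printf '%-8s %-10s %s\n' Decimal Binary BCD
for n in 7 42 99 255; do
  printf '%-8s %-10s %s\n' "$n" "$(to_binary "$n")" "$(to_bcd "$n")"
done
```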
Early History
• The first programmable, electronic, general-purpose
computer was the ENIAC (Electronic Numerical
Integrator and Computer), developed by John
Mauchly and J. Presper Eckert at the University
of Pennsylvania.
• It performed decimal arithmetic.
• It took 2,800 microseconds to multiply two 10-digit
decimal numbers… about 360 multiplications
per second. (Desktop computers today are at
least 1 billion times faster)
• It was delivered to the U.S. Army in 1945 and was
used to compute artillery trajectories.
Early History - Continued
• ENIAC was followed by the Univac
(UNIVersal Automatic Computer), the first
successful commercial computer, in 1951.
• The Univac also used decimal arithmetic.
• The Univac had 1000 words of 12
characters each. An integer was
represented as a + or – character, and 11
digit characters.
• These machines used vacuum tubes, which
frequently burned out. They were “down”
about 50% of the time.
Early History - Continued
• The ENIAC, UNIVAC, and other early computers used decimal
arithmetic. Rather than representing numbers in binary, they
represented numbers as a string of decimal digits, each digit encoded
in binary (BCD).
• They had no provision for fixed point decimal or floating point
numbers. If calculations involved non-integer numbers, the program
needed to keep track of the decimal point independently.
Early History – Why Decimal
Arithmetic?
• Comfort factor
• Numbers were entered as decimal; conversion was expensive
• Conversion caused rounding errors
BCD Arithmetic Today
• Decimal, or Binary Coded Decimal (BCD) arithmetic is still important:
• It is required for some languages, such as COBOL
• Ideal for financial computations, where conversion errors can be a problem.
$0.01 in binary, rounded to 19 fraction bits: 0.0000001010001111011 = $0.0100002288818359375
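The figure above can be reproduced with integer-only bash arithmetic. This hypothetical `binary_fraction` helper rounds NUM/DEN to BITS binary places and prints the exact decimal value of the result, using the identity n/2^bits = n·5^bits/10^bits. It assumes the intermediate products fit in bash's 64-bit integers, which holds for this example.

```bash
#!/usr/bin/env bash
# binary_fraction NUM DEN BITS: round NUM/DEN (a fraction below 1) to BITS
# binary places, then print those bits and the exact decimal value they encode.
binary_fraction() {
  local num=$1 den=$2 bits=$3 i
  local scale=$(( 1 << bits ))
  local n=$(( (2 * num * scale + den) / (2 * den) ))   # round to nearest
  local bin="" k=$n pow5=1
  for (( i = 0; i < bits; i++ )); do
    bin=$(( k & 1 ))$bin
    k=$(( k >> 1 ))
    pow5=$(( pow5 * 5 ))
  done
  # n / 2^bits == n * 5^bits / 10^bits, so the decimal digits are exact.
  printf "0.%s = 0.%0${bits}d\n" "$bin" $(( n * pow5 ))
}

binary_fraction 1 100 19   # prints: 0.0000001010001111011 = 0.0100002288818359375
```

No number of bits makes the error vanish, since 1/100 has no finite binary expansion; BCD stores the digits 0.01 exactly.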
Binary Computers
• The first binary computer, the Z1, was
developed in the 1930s, but was
unreliable.
• The first widely successful binary
computer was the IBM System/360,
released in 1965.
• It supported binary integer and binary
floating point arithmetic, as well as decimal
arithmetic.
• It used a 32-bit word, which could hold four
8-bit characters.
Mini- and Micro-Computers
• The 1960s and 1970s saw the development of mini-computers (really
just small, inexpensive computers) and micro-computers (based on
single-chip CPUs).
• Mini-computers typically used 16-bit words, consisting of two 8-bit
bytes. They supported multi-word formats for long integer and
floating point arithmetic.
• Micro-computers, which appeared in the mid-70s, often used 8-bit
words, but had the ability to do 16-bit arithmetic and 32-bit
arithmetic.
• Integers were typically 16-bit two’s complement values: 15 bits plus a sign bit,
giving a range of -32,768 to +32,767. Longer integers used double-word arithmetic.
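A bash sketch of both ideas, with hypothetical helper names. `to_int16` shows 16-bit two's complement wrap-around, and `add32` shows double-word arithmetic: a 32-bit sum computed from two 16-bit halves, with the carry out of the low word added into the high word.

```bash
#!/usr/bin/env bash
# to_int16: reduce an integer to 16 bits and read it as two's complement.
to_int16() {
  local v=$(( $1 & 0xFFFF ))
  if (( v >= 0x8000 )); then v=$(( v - 0x10000 )); fi
  echo "$v"
}

# add32 AHI ALO BHI BLO: add two 32-bit numbers stored as pairs of 16-bit
# words, carrying from the low word into the high word.
add32() {
  local lo=$(( $2 + $4 ))
  local carry=$(( lo >> 16 ))
  local hi=$(( ($1 + $3 + carry) & 0xFFFF ))
  printf '%04X %04X\n' "$hi" $(( lo & 0xFFFF ))
}

to_int16 32767                       # prints: 32767
to_int16 32768                       # prints: -32768 (overflow wraps around)
add32 0x0001 0xFFFF 0x0000 0x0001    # prints: 0002 0000
```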
Precision and Portability
• In order for programs to be portable across platforms (hardware
independent), arithmetic must work the same on any hardware.
• Language standards and test suites usually specify a minimum range
and precision for each data type… but some systems support greater
range and precision.
• A program may run correctly on a high-precision system, and fail
when run on a lower precision system – even though both systems
are standards-compliant. The programmers are inadvertently
depending on more precision than the standards guarantee.
• Some programs may even fail when running on higher-precision
hardware.
Floating Point Arithmetic
• IEEE 754: a set of standards for binary and decimal floating point
representation and arithmetic. The standard was established in 1985
and updated in 2008 and 2019.
• Each format specifies the radix, the precision (number of significand bits or
digits), and the exponent size.
• binary32: 1 sign bit, 8 exponent bits, and 23 stored fraction bits (24 bits of precision, counting the implicit leading 1)
Floating Point Arithmetic - continued
• A floating point format consists of:
• a base (also called radix) b, which is either 2 (binary) or 10 (decimal) in IEEE
754;
• a precision p;
• an exponent range from $e_{\min}$ to $e_{\max}$, with $e_{\min} = 1 - e_{\max}$ for all IEEE 754
formats
• Specified representations for +infinity, -infinity, and NaN (not a number)
• The IEEE 754 floating-point standard also includes rules for rounding
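To see the binary32 layout in practice, this bash sketch (hypothetical helper name) splits a 32-bit pattern into its fields. For a normal number the value is $(-1)^{s} \times (1 + f/2^{23}) \times 2^{e-127}$. An all-ones exponent marks infinity (fraction 0) or NaN, and an all-zero exponent marks zero or a subnormal.

```bash
#!/usr/bin/env bash
# decode_binary32 BITS: split a 32-bit IEEE 754 binary32 pattern into its
# sign, exponent (bias 127), and 23-bit fraction fields.
decode_binary32() {
  local bits=$(( $1 ))
  local sign=$(( bits >> 31 & 1 ))
  local exp=$(( bits >> 23 & 0xFF ))
  local frac=$(( bits & 0x7FFFFF ))
  local kind=normal
  if (( exp == 255 )); then
    if (( frac == 0 )); then kind=infinity; else kind=NaN; fi
  elif (( exp == 0 )); then
    if (( frac == 0 )); then kind=zero; else kind=subnormal; fi
  fi
  printf 'sign=%d exponent=%d (unbiased %d) fraction=0x%06X %s\n' \
    "$sign" "$exp" $(( exp - 127 )) "$frac" "$kind"
}

decode_binary32 0x40490FDB   # float nearest pi: sign=0 exponent=128 (unbiased 1) fraction=0x490FDB normal
decode_binary32 0xFF800000   # -infinity
```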
Character Sets and Portability
• A character set is an encoding scheme for strings of text glyphs.
• Character sets are generally not tied to processor hardware.
• Character sets are tied to peripherals.
• Processors may have instructions for character set conversions.
Character Sets – ASCII (/ˈæski/)
• ASCII (American Standard Code for Information Interchange)
• 7-bit ASCII (aka US ASCII): 0-127 - 1963
• First 32 characters were reserved for control functions
• Uppercase and lowercase English alphabet; 10 digits; punctuation and special
characters (! @ # $ % etc)
• Some programs used the 8th bit for other purposes!
• 8-bit (Extended) ASCII: 0-255
• Adds many special and international characters: € † ™ £ ¶ Á á ä
• Initially proprietary (system dependent)
• ISO 8859 (ISO-8) defined several standard variants: Latin-1 (Western
European), Latin-2 (Central European), Latin-3 (South European), Latin-4
(North European), Latin/Cyrillic, Latin/Arabic – 1987, 1999
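A small bash sketch for exploring 7-bit ASCII (hypothetical helper names). `ord` relies on printf's rule that an argument starting with a quote yields the character's code, and `chr` builds an octal escape. Upper- and lowercase letters differ only in the 0x20 bit.

```bash
#!/usr/bin/env bash
# ord/chr: convert between a character and its ASCII code (0-127 only).
ord() { printf '%d\n' "'$1"; }                  # leading quote: print the character's code
chr() { printf "\\$(printf '%03o' "$1")\n"; }   # build an octal escape such as \101

ord A                         # prints: 65
chr 97                        # prints: a
# Upper and lower case letters differ only in the 0x20 bit:
chr $(( $(ord g) ^ 0x20 ))    # prints: G
```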
7-Bit ASCII
Character Sets – EBCDIC (/ˈɛbsɪdɪk/)
• EBCDIC: Extended Binary Coded Decimal Interchange Code - the “other” 8-bit character set
• Devised in 1963 for IBM mainframe computers and peripherals
• Encompassed BCD decimal characters
• ASCII and EBCDIC each contain characters not found in the other – such as
“{“ and “}” - making translations ambiguous and language support difficult
• The letters in EBCDIC do not occupy one contiguous range of codes, leading to
potential portability issues:
for (c='A';c<='Z';++c)
• Steps through 26 characters if the character set is ASCII
• Steps through 41 characters if the character set is EBCDIC
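The 26-versus-41 difference can be checked with a bash sketch that runs the slide's loop over EBCDIC code values instead of characters. The letter positions used here (A–I at C1–C9, J–R at D1–D9, S–Z at E2–E9, as in the common US code page 037) are the only EBCDIC facts it relies on; `is_ebcdic_upper` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# The loop from the slide, run over EBCDIC code values instead of characters.
# EBCDIC letter codes: A-I = C1-C9, J-R = D1-D9, S-Z = E2-E9.
is_ebcdic_upper() {
  local c=$(( $1 ))
  (( (c >= 0xC1 && c <= 0xC9) || (c >= 0xD1 && c <= 0xD9) || (c >= 0xE2 && c <= 0xE9) ))
}

count=0 letters=0
for (( c = 0xC1; c <= 0xE9; c++ )); do     # like for (c = 'A'; c <= 'Z'; ++c)
  count=$(( count + 1 ))
  if is_ebcdic_upper "$c"; then letters=$(( letters + 1 )); fi
done
echo "EBCDIC: $count codes from A to Z, $letters of them letters"   # prints: EBCDIC: 41 codes from A to Z, 26 of them letters
echo "ASCII:  $(( 0x5A - 0x41 + 1 )) codes from A to Z"            # prints: ASCII:  26 codes from A to Z
```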
EBCDIC
Character Sets – Unicode
• Unicode: unique, unified, universal encoding. An attempt to encode
all of the world’s character sets.
• Unicode currently supports 137,994 glyphs and control characters.
• First draft standard was developed by Xerox and Apple in 1987.
• First complete standard was adopted in 1991.
• Actually a family of encoding standards: UTF-8, UTF-16, UTF-16BE,
UTF-16LE, UTF-32, etc.
• UTF-8: an 8-bit variable-width encoding which maximizes
compatibility with ASCII; used by 94% of all sites on the web
• One byte for characters 0-127; up to 4 bytes for extended characters
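The variable-width scheme can be sketched as a bash function (hypothetical name). The leading bits of the first byte give the sequence length, and every continuation byte starts with `10`. As a sketch, it does not reject surrogates or values above U+10FFFF.

```bash
#!/usr/bin/env bash
# utf8_bytes CODEPOINT: print the UTF-8 encoding of a Unicode code point in hex.
utf8_bytes() {
  local cp=$(( $1 ))
  if (( cp < 0x80 )); then              # 1 byte: 0xxxxxxx (same as ASCII)
    printf '%02X\n' "$cp"
  elif (( cp < 0x800 )); then           # 2 bytes: 110xxxxx 10xxxxxx
    printf '%02X %02X\n' $(( 0xC0 | cp >> 6 )) $(( 0x80 | (cp & 0x3F) ))
  elif (( cp < 0x10000 )); then         # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    printf '%02X %02X %02X\n' $(( 0xE0 | cp >> 12 )) \
      $(( 0x80 | (cp >> 6 & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
  else                                  # 4 bytes: 11110xxx then three 10xxxxxx
    printf '%02X %02X %02X %02X\n' $(( 0xF0 | cp >> 18 )) \
      $(( 0x80 | (cp >> 12 & 0x3F) )) $(( 0x80 | (cp >> 6 & 0x3F) )) \
      $(( 0x80 | (cp & 0x3F) ))
  fi
}

utf8_bytes 0x41      # A: prints 41
utf8_bytes 0xE9      # é: prints C3 A9
utf8_bytes 0x20AC    # €: prints E2 82 AC
utf8_bytes 0x1F600   # emoji: prints F0 9F 98 80
```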
Instruction Set Architecture
Memory Addressing
Memory
• Instructions and data are stored in memory
• Memory is external to the processor
• Accessing memory takes a significant amount of time
Registers
• Built into the processor
• Fast to access
• Limited in number
• Used directly by machine instructions:
SBA 12 – (subtract 12 from the A register)
Addressing Modes
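• Immediate – the operand value is part of the instruction, either in the opcode word itself or in the word that follows it:
SBA 12
• Register – the operand is in a register named by the instruction:
ADA B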
Addressing Modes - continued
• Direct – the instruction holds the memory address of the operand, either in the opcode word itself or in the word that follows it:
LDA inventory
Addressing Modes - continued
• Indirect – useful for arrays. The instruction holds the address of a memory location, and that location holds the memory address of the operand:
LDA index*
Addressing Modes - continued
• Register Indirect – the instruction names a register, and that register holds the memory address of the operand
Addressing Modes - continued
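• Register Offset – the instruction names a register and supplies an offset; the operand’s memory address is the offset plus the contents of the register:
LDA array,X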
Addressing Modes - continued
• Using Register Indirect or Register Offset with Auto-increment/Auto-decrement
• Easy in assembly language
• Point register to list or array
• Perform operations as register is incremented
• Difficult for compilers!
while (*from != '\0')
    *to++ = *from++;
Addressing Modes - continued
• Relative or PC (Program Counter) Relative – the instruction holds an offset, and the effective address is PC + offset; for a jump, that address is where the next opcode is fetched:
JMP loop
Addressing Modes - continued
• Stack Addressing – the operand is taken from the top of the stack, so the instruction needs only an opcode
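A bash sketch that shows how each addressing mode above locates its operand on a toy word-addressed memory. The memory layout, the register names B, X, SP, and PC, and all values are hypothetical. The final loop mimics register indirect with auto-increment, the same pattern as the C string copy.

```bash
#!/usr/bin/env bash
# How each addressing mode finds its operand, on a toy word-addressed memory.
# Hypothetical layout and register names; values are made up for illustration.
declare -a MEM=()
MEM[100]=7                            # "inventory" at address 100
MEM[200]=100                          # "index" holds a pointer to address 100
MEM[300]=11 MEM[301]=22 MEM[302]=33   # "array" starts at address 300
MEM[500]=99                           # top of a stack at address 500
B=100 X=2 SP=500 PC=40                # registers

operand() {            # operand MODE FIELD [REG]
  local mode=$1 field=${2:-} reg=${3:-}
  case $mode in
    immediate)         echo "$field" ;;                   # SBA 12
    register)          echo "${!field}" ;;                # ADA B: FIELD names a register
    direct)            echo "${MEM[field]}" ;;            # LDA inventory
    indirect)          echo "${MEM[MEM[field]]}" ;;       # LDA index*
    register_indirect) echo "${MEM[${!field}]}" ;;        # the register holds the address
    register_offset)   echo "${MEM[field + ${!reg}]}" ;;  # LDA array,X
    relative)          echo $(( PC + field )) ;;          # JMP: prints the target address
    stack)             echo "${MEM[SP]}" ;;               # operand on top of the stack
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

for m in "immediate 12" "register B" "direct 100" "indirect 200" \
         "register_indirect B" "register_offset 300 X" "relative 5" "stack"; do
  printf '%-22s -> %s\n' "$m" "$(operand $m)"
done

# Register indirect with auto-increment, like *from++ in C:
sum=0 P=300
while (( MEM[P] != 0 )); do
  sum=$(( sum + MEM[P] ))
  P=$(( P + 1 ))
done
echo "sum=$sum"      # prints: sum=66
```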
Addressing Limitations
• 16-bit addresses limit memory to a “flat” space of 65,536 (64K) words
or bytes of memory.
• Segmented or bank-switched memory.
• Memory is divided into 64K segments; a segment ID and an address are
resolved to a physical (or virtual) memory location by the Memory
Management Unit (MMU).
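Addressing Limitations – Extended Base Register
• Displacement or Base Register – the instruction names a base register and supplies an offset; the effective address is the contents of the base register plus the offset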
Addressing Limitations – Base Multiplier
• Displacement or Base Register, with the base register scaled:
effective base $= \text{base register} \times 2^{n}$
effective address $= \text{effective base} + \text{offset}$
• The base register is 16 bits, but the effective base and the effective address can be wider than 16 bits
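A one-function bash sketch of the base-multiplier calculation (the helper name is hypothetical). With $n = 4$ it matches the segment × 16 + offset scheme of the Intel 8086, which reaches 20-bit (1 MB) addresses using only 16-bit registers.

```bash
#!/usr/bin/env bash
# effective_address BASE OFFSET N: scale a 16-bit base register by 2^N and
# add a 16-bit offset.
effective_address() {
  printf '0x%X\n' $(( ($1 << $3) + $2 ))
}

effective_address 0x1234 0x0010 4    # prints: 0x12350
effective_address 0xFFFF 0xFFFF 4    # prints: 0x10FFEF (needs a 21st bit; the 8086 wraps it around)
```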
Impacts on Programming
• Multiple, complex addressing modes and memory management have
made assembly language programming more difficult
• Compilers, linkers and loaders need to be much smarter than their
early counterparts
The Natural Evolution of Processors
• Memory was very limited - ~ 1,000 words or less
• Most programmers were writing in machine code or assembly
language
• By adding instructions that did more, programs required fewer
instructions, and programmers didn’t have to write as much code
• This led to what we now call Complex Instruction Set Computers
(CISC)
• Over time, instruction sets grew more and more complex.
• Harder for hardware designers to implement
• Slower (more “cycles” required for complex instructions)
Microprogrammed Instruction Sets
• Complex instruction sets are hard to implement directly in logic
design.
• Microprogramming is a way of implementing instruction sets by
writing lower level instructions for a microcontroller… a processor
inside the processor.
• Instructions are written in a microassembly language.
• Multiple microinstructions implement one machine instruction
• Microinstructions are very low level: basic arithmetic and logic
operations; register, memory, and bus access.
Microprogramming History
• The first microprogrammed machine was the EDSAC 2, developed by
Maurice Wilkes at Cambridge in 1958.
• The instruction sets at the time did not require microprogramming
• There was no practical and cost-effective way to build a persistent
microcontrol store
• The first widely successful microprogrammed computer was the IBM
System/360, released in 1965.
• Microprograms may be stored in ROM, or in EPROM to allow
instruction set updates.
The Rise of RISC
In 1980, David Patterson at UC Berkeley
outlined an architecture for a Reduced
Instruction Set Computer, and coined the
term RISC.
• Fellow of the ACM
• Fellow of the IEEE
• Fellow of the Computer History Museum
• ACM Distinguished Service Award
• ACM-IEEE Eckert-Mauchly Award
RISC versus CISC
Reduced Instruction Set Computer (RISC)
• One clock cycle per instruction
• Effective pipelining
• Fewer addressing modes
• Requires more instructions per program (more RAM)
• Lower gate count
• Lower energy use
Complex Instruction Set Computer (CISC)
• Multiple / variable clock cycles per instruction
• More addressing modes
• Requires fewer instructions per program (less RAM)
• Higher gate count – more chip real estate
• Higher energy use
RISC Versus CISC Today
• No longer one versus the other
• RISC and CISC architectures have borrowed from one another
• Both are used
RISC
• MIPS, PowerPC, Atmel’s AVR, the Microchip PIC processors, Arm processors, RISC-V
• Often used in mobile devices and for embedded applications
CISC
• Motorola 68000 (68K), the DEC VAX, PDP-11, Intel x86
Instruction Pipelining (RISC computers)
• Each instruction is divided into several steps or stages – allowing
instruction execution to be overlapped.
Example: 5-stage pipeline
Clock  Inst 1          Inst 2          Inst 3          Inst 4          Inst 5          Inst 6          Inst 7
1      Fetch
2      Decode          Fetch
3      Execute         Decode          Fetch
4      Memory Write    Execute         Decode          Fetch
5      Register Write  Memory Write    Execute         Decode          Fetch
6                      Register Write  Memory Write    Execute         Decode          Fetch
7                                      Register Write  Memory Write    Execute         Decode          Fetch
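The diagonal pattern in the table follows from a simple rule: instruction i enters Fetch in cycle i and advances one stage per cycle. The sketch below assumes an ideal pipeline with no stalls, branches, or interrupts (the problems listed next), and uses hypothetical function names.

```c
#include <stddef.h>

/* Stages of the 5-stage example pipeline, in order. */
static const char *const stage_names[] = {
    "Fetch", "Decode", "Execute", "Memory Write", "Register Write"
};
enum { NUM_STAGES = 5 };

/* Stage occupied by instruction `inst` (1-based) in clock cycle `cycle`
   (1-based), or NULL if it has not started yet or has already finished. */
const char *stage_in_cycle(int inst, int cycle)
{
    int s = cycle - inst;              /* instruction i enters Fetch in cycle i */
    return (s >= 0 && s < NUM_STAGES) ? stage_names[s] : NULL;
}

/* Ideal pipeline: k cycles for the first instruction, then one per instruction. */
int pipelined_cycles(int k, int n) { return n > 0 ? k + n - 1 : 0; }

/* No overlap: every instruction takes all k cycles. */
int unpipelined_cycles(int k, int n) { return k * n; }
```

With k = 5 stages and n = 7 instructions, the ideal pipeline finishes in 5 + 7 - 1 = 11 cycles instead of 35.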
Instruction Pipelining - continued
Problems
• Branch instructions break the pipeline, and it must be reloaded
• Pipelines can be broken by interrupts
• Data dependencies occur when an instruction relies on the results of
a previous instruction, causing the pipeline to stall (block)
Advances
• More stages… faster clock time, shorter stalls
• Parallel pipelines for conditional branch instructions
New Directions
• Heterogeneous computing: including multiple different kinds of computing
elements (such as GPUs, FPGAs, and Application Specific Integrated Circuits,
or ASICs) in a single system.
• Graphics Processing Unit (GPU): common today
• Machine learning
• Image Processing
• Cryptography
• Video compression/decompression
• Field Programmable Gate Arrays (FPGAs)
New Directions - continued
• Near Memory Computing: reducing the need for time-consuming
fetch and store operations
• Rather than fetching small bits of data from memory to bring to the
processor for computations, researchers are flipping this idea around.
They are experimenting with building small processors directly into
the memory controllers on your RAM or SSD.
• By doing the computation closer to the memory, there is the potential
for huge energy and time savings since data doesn't need to be
transferred around as much.
• This idea is still in its infancy, but the results look promising.
Quantum Computing
• Commercial computing started with Binary Coded Decimal arithmetic.
• Binary arithmetic allowed problems to be solved in different ways. It
is faster than BCD, but has some problems.
• Quantum computing promises a significant new paradigm for
computation.
Quantum Computing - continued
• Quantum computing is based on qubits – elements that can hold the
value 0, 1, or some combination. This is called superposition.
• Another quantum property is entanglement, in which the value of one
element is tied to another.
• Finally, quantum interference allows elements to affect the value of
other elements – either positively or negatively. This leads to
“voting” type solutions.
Quantum Computing - continued
• Quantum computing offers the promise of simplifying certain classes
of problems, including:
• Modelling of natural systems
• Searching
• Machine learning
• Artificial intelligence
• It would be a mistake to say quantum computers are faster than
conventional computers. Rather, they will solve certain types of
problems much faster – and possibly solve problems that can’t be
solved at all today.
Quantum Computing Today
• The largest quantum computers are about 50 qubits.
• Two companies – Microsoft and IBM – have quantum simulators and
development toolkits available.
• Current computers are subject to quantum decoherence, which
causes loss of state information.
• Quantum computing algorithms need to be fault-tolerant
Quantum Computing Today - continued
• New programming languages and libraries are being developed, but
these languages are not yet standard. We are once again faced with
portability issues.
Next Lecture
• Simplified Instructional Computer
• A “typical” computer instruction set
CMPE 220
Class 3
Computer Architecture
Simplified Instructional Computer
Simplified Instructional Computer (SIC)
• A hypothetical computer that includes the hardware features most
often found on real machines.
• SIC (standard model)
• SIC/XE
• Upward compatible
• Programs for SIC can run on SIC/XE
• SIC is a good example of the basic architectural features required in a
computer.
• I’ll use SIC for many of the examples in the remainder of this class
SIC - Memory
• 8-bit bytes
• 3 consecutive bytes form a 24-bit word
• Words are addressed by the lowest-numbered byte
• 2^15 (32,768) bytes in the computer memory
SIC - Registers
• Five 24-bit registers
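Mnemonic  #  Use
A         0  Accumulator: used for arithmetic & logic operations
X         1  Index register: used for indexed addressing
L         2  Linkage: stores the return address for a subroutine jump. Only allows one
             level of return. SIC does not have a stack.
PC        8  Program counter: address of the next instruction to fetch
SW        9  Status word: includes the condition code (CC)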
SIC – Status Word (SW)
Field  Length (bits)  Bits   Use
mode   1              0      user mode (0) or supervisor mode (1)
state  1              1      process is in running state (0) or idle state (1)
id     4              2-5    process id (PID)
CC     2              6-7    condition code (set by comparisons and by TD)
mask   4              8-11   interrupt mask
X      4              12-15  (unused)
icode  8              16-23  interrupt code, used by the Interrupt Service Routine
SIC – Instruction Format
opcode m address
• 24 bits (1 word)
• opcode: machine instruction code (8 bits)
• m: address mode (1 bit)
• 0: direct
• 1: indexed (aka base register addressing)
• address: target address, or offset from X register (15 bits)
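Packing and decoding this 24-bit format takes only shifts and masks. In this sketch the opcode values used in the tests (0x00 for LDA, 0x0C for STA) come from the standard SIC opcode table; the function names are hypothetical, and the target address is not checked against the size of memory.

```c
#include <stdint.h>

/* Pack a standard SIC instruction word: opcode (8 bits), m (1 bit),
   address (15 bits), most significant field first. */
uint32_t sic_encode(uint8_t opcode, unsigned m, uint16_t address)
{
    return ((uint32_t)opcode << 16) | ((uint32_t)(m & 1u) << 15)
           | (address & 0x7FFFu);
}

/* Target address of an encoded instruction: the address field itself
   (direct, m = 0) or the address field plus the X register (indexed, m = 1). */
uint32_t sic_target_address(uint32_t instr, uint32_t x_reg)
{
    uint32_t address = instr & 0x7FFFu;
    unsigned m = (instr >> 15) & 1u;
    return m ? address + x_reg : address;
}
```

For example, `STA 0100,X` packs to 0x0C8100; with X = 3 it stores to address 0x0103.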
SIC – Addressing Modes
• Direct (m = 0): target address = address
• Indexed (m = 1): target address = address + (X)
SIC – Data Formats
• Characters: 8-bit ASCII
• Integers: 24-bit binary, two’s complement
• No decimal, no floating point
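The memory and data-format rules combine into two helper functions: one fetches a word from its lowest-numbered byte, the other reads it as a 24-bit two's complement integer. The byte order is an assumption here (most significant byte first, i.e. big-endian), and the names are hypothetical.

```c
#include <stdint.h>

#define SIC_MEMORY_SIZE (1u << 15)   /* 2^15 = 32,768 bytes */

/* Fetch the 24-bit word whose lowest-numbered byte is at `addr`,
   assuming the most significant byte is stored first. */
uint32_t sic_read_word(const uint8_t mem[], uint32_t addr)
{
    return ((uint32_t)mem[addr] << 16) | ((uint32_t)mem[addr + 1] << 8)
           | mem[addr + 2];
}

/* Interpret a 24-bit two's complement word as a signed integer. */
int32_t sic_word_to_int(uint32_t word)
{
    word &= 0xFFFFFFu;
    return (word & 0x800000u) ? (int32_t)word - 0x1000000 : (int32_t)word;
}
```

The representable range is therefore -8,388,608 to 8,388,607.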
SIC – Instructions (Load & Store)
• Transfer 1 word of data (24 bits) between the A, X, or L registers, and
a memory location
• LDA location • STA location
• LDL location • STL location
• LDX location • STX location
SIC – Instructions (Arithmetic & Logic)
• Arithmetic involves the A register and a memory location
• ADD location
• SUB location
• MUL location
• DIV location
• AND location
• OR location
• The contents of the A register are replaced by:
A (operation) (contents of location)
SIC – Instructions (Comparison)
• Comparison instructions set the CC flag to <, =, or >
• Compare the A register and the contents of a memory location
• COMP location
• Increment the X (index) register, and compare the result to the
contents of a memory location
• TIX location
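A C model of these comparisons shows why TIX is convenient for loops: one instruction both advances the index and sets the condition code for a following JLT. This is a sketch with hypothetical names; it ignores the 24-bit wraparound of the real registers.

```c
#include <stdint.h>

/* Condition code values produced by SIC comparisons. */
typedef enum { CC_LT, CC_EQ, CC_GT } sic_cc;

/* COMP-style comparison of a register value with a memory value. */
sic_cc sic_compare(int32_t reg, int32_t mem_value)
{
    return reg < mem_value ? CC_LT : (reg == mem_value ? CC_EQ : CC_GT);
}

/* TIX: increment X, then compare the new X with the memory operand. */
sic_cc sic_tix(int32_t *x_reg, int32_t mem_value)
{
    *x_reg += 1;
    return sic_compare(*x_reg, mem_value);
}
```

A loop body followed by `TIX limit` and `JLT loop` runs `limit` times when X starts at 0, which is what the `do ... while (sic_tix(...) == CC_LT)` pattern reproduces.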
SIC – Instructions (Jump)
• Unconditional Jump
• J location
• Conditional Jumps (based on value of CC flag):
• JEQ location
• JLT location
• JGT location
• Jump to subroutine; store return address in L register
• JSUB location
• Return from subroutine (jump to address in L register)
• RSUB
SIC – Input/Output
• Three I/O instructions
• TD: Test Device; returns status in CC flag of SW register. '<' means
ready, '=' means not ready.
• RD: Read one byte of data from the specified device into the lower 8
bits of the A register.
• WD: Write one byte of data from the lower 8 bits of the A register to
the specified device.
SIC – A Workable RISC Instruction Set
• Although lacking many “convenience” instructions, the SIC
architecture implements a fully functional, general purpose
instruction set, similar to common minicomputers of the 1960s.
• Its chief limitation is the 15-bit address space, allowing only 32,768
bytes of memory (addresses 0 through 32,767).
Non-Instruction Statements (SIC)
• So far, we’ve talked about statements that correspond one-to-one to
machine code instructions… but there is another statement type
required: the memory declaration.
• Reserve some memory, and assign a label:
inventory    RESW  5          (Reserve 5 words)
partnumbers  RESB  100        (Reserve 100 bytes)
• Reserve some memory, and assign a label and starting value:
inventory    WORD  100        (Reserve 1 word; value=100)
partname     BYTE  C'widget'  (Reserve 6 bytes; value='widget')
iochannel    BYTE  X'05'      (Reserve 1 byte; value=0x05)
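During the first pass, the assembler has to know how many bytes each of these declarations occupies so it can advance its location counter. A minimal sketch of that rule, assuming the directive and operand have already been separated from the label (the function name is hypothetical):

```c
#include <stdlib.h>
#include <string.h>

/* Bytes reserved by a SIC storage directive. Returns -1 for an operand
   this sketch does not understand. */
long directive_size(const char *directive, const char *operand)
{
    if (strcmp(directive, "RESW") == 0)
        return 3 * strtol(operand, NULL, 10);   /* n words of 3 bytes */
    if (strcmp(directive, "RESB") == 0)
        return strtol(operand, NULL, 10);       /* n bytes */
    if (strcmp(directive, "WORD") == 0)
        return 3;                               /* one word */
    if (strcmp(directive, "BYTE") == 0) {
        size_t len = strlen(operand);
        if (len < 3 || operand[1] != '\'' || operand[len - 1] != '\'')
            return -1;
        size_t inner = len - 3;                 /* characters between the quotes */
        if (operand[0] == 'C')
            return (long)inner;                 /* one byte per character */
        if (operand[0] == 'X' && inner % 2 == 0)
            return (long)(inner / 2);           /* two hex digits per byte */
    }
    return -1;
}
```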
Other Housekeeping Statements
• Indicate the starting address of the program:
programname START 1000
• Indicate the end of the program, and the location of the first statement to execute:
END starthere
SIC/XE – An Extended CISC Instruction Set
• The SIC/XE is fully backward compatible with the SIC. That is, it will
run all SIC instructions.
• It adds:
• A 20-bit addressing mode, supporting 1 MB of memory
• Floating point arithmetic
• Multiple new addressing modes
• Additional arithmetic and logic functions
SIC/XE – Additional Registers
Mnemonic  #  Use
B         3  Base: base register for addressing
S         4  S: general accumulator
T         5  T: general accumulator
F         6  Floating-point accumulator (48 bits)
SIC/XE – Additional Instruction Formats
• Format 1 (8 bits): opcode
• opcode: machine instruction code (8 bits)
• Format 2 (16 bits): opcode | r1 | r2
• opcode: machine instruction code (8 bits)
• r1, r2: register identifiers (4 bits, 4 bits)
SIC/XE – Additional Instruction Formats
• Format 3 (24 bits): opcode | flags | disp
• opcode: machine instruction code (6 bits)
• flags: n, i, x, b, p, e (6 bits)
• disp: 12-bit address displacement
• Format 4 (32 bits): opcode | flags | address
• opcode: machine instruction code (6 bits)
• flags: n, i, x, b, p, e (6 bits)
• address: 20-bit address
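The e flag tells a decoder whether it is looking at format 3 or format 4. This sketch decodes both from raw bytes; it does not handle formats 1 and 2, which a real disassembler would recognize from the opcode table, and the struct and function names are hypothetical. In the tests, bytes 17 20 2D are STL (opcode 0x14) with n = i = 1 and p = 1, and 4B 10 10 36 are JSUB (opcode 0x48) in format 4.

```c
#include <stdint.h>

/* Decoded fields of a SIC/XE format 3 or format 4 instruction. */
typedef struct {
    uint8_t opcode;              /* high 6 bits of the first byte, low 2 bits zero */
    unsigned n, i, x, b, p, e;   /* addressing flags */
    uint32_t target;             /* 12-bit disp (format 3) or 20-bit address (format 4) */
} xe_instr;

/* Decode the instruction starting at code[0]; returns its length in bytes. */
int xe_decode(const uint8_t code[], xe_instr *out)
{
    out->opcode = code[0] & 0xFC;
    out->n = (code[0] >> 1) & 1u;
    out->i = code[0] & 1u;
    out->x = (code[1] >> 7) & 1u;
    out->b = (code[1] >> 6) & 1u;
    out->p = (code[1] >> 5) & 1u;
    out->e = (code[1] >> 4) & 1u;
    if (out->e) {                /* format 4: 20-bit address */
        out->target = ((uint32_t)(code[1] & 0x0F) << 16)
                    | ((uint32_t)code[2] << 8) | code[3];
        return 4;
    }
    out->target = ((uint32_t)(code[1] & 0x0F) << 8) | code[2];  /* format 3 */
    return 3;
}
```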
SIC/XE – Instruction Flags
• n: Indirect addressing flag
• i: Immediate addressing flag
• x: Indexed addressing flag
• b: Base address-relative flag
• p: Program counter-relative flag
• e: Format 4 instruction flag
SIC/XE – Additional Addressing Modes
• Base Relative (b = 1, p = 0)
opcode | flags | disp
Target address = (B) + disp, where disp is an unsigned 12-bit value (0 to 4095)
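SIC/XE – Additional Addressing Modes
• Program-Counter Relative (b = 0, p = 1)
opcode | flags | disp
Target address = (PC) + disp, where disp is a signed 12-bit value (-2048 to 2047) and the PC holds the address of the next instruction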
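The two relative modes differ only in which register supplies the base and in whether the 12-bit displacement is signed. A sketch with hypothetical function names; the PC value passed in is assumed to already point at the next instruction.

```c
#include <stdint.h>

/* Sign-extend a 12-bit displacement to the range -2048..2047. */
int32_t sign_extend_12(uint32_t disp)
{
    disp &= 0xFFFu;
    return (disp & 0x800u) ? (int32_t)disp - 0x1000 : (int32_t)disp;
}

/* PC-relative: TA = (PC) + disp, with disp signed. */
uint32_t pc_relative_ta(uint32_t pc, uint32_t disp)
{
    return (uint32_t)((int32_t)pc + sign_extend_12(disp));
}

/* Base-relative: TA = (B) + disp, with disp unsigned (0..4095). */
uint32_t base_relative_ta(uint32_t b_reg, uint32_t disp)
{
    return b_reg + (disp & 0xFFFu);
}
```

A backward jump is the typical use of a negative displacement: with the PC at 0x001A, disp 0xFEC (-20) reaches 0x0006.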
SIC/XE – Additional Data Formats
• 48-bit floating point: s (1 bit) | exponent (11 bits) | fraction (36 bits)
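Reading this format back as a number takes one formula: with exponent e (an unsigned value from 0 to 2047) and fraction f (read as the binary fraction 0.f), the magnitude is f × 2^(e - 1024), and s gives the sign. This sketch assumes the 48 bits are held in the low end of a 64-bit integer and scales by repeated doubling or halving so it does not depend on the math library; the function name is hypothetical.

```c
#include <stdint.h>

/* Value of a 48-bit SIC/XE float stored in the low 48 bits of `bits`. */
double sicxe_float_value(uint64_t bits)
{
    unsigned sign = (unsigned)((bits >> 47) & 1u);
    int exponent = (int)((bits >> 36) & 0x7FFu);
    uint64_t fraction = bits & ((UINT64_C(1) << 36) - 1);
    double magnitude = (double)fraction / 68719476736.0;  /* 0.f, since 2^36 = 68719476736 */
    for (int k = exponent - 1024; k > 0; k--)
        magnitude *= 2.0;                                  /* multiply by 2^(e - 1024) */
    for (int k = exponent - 1024; k < 0; k++)
        magnitude *= 0.5;
    return sign ? -magnitude : magnitude;
}
```

For example, 0.5 is stored with e = 1024 and only the high-order fraction bit set; -3.0 is 0.75 × 2^2, so e = 1026 with the two high-order fraction bits set and s = 1.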
SIC/XE – Instructions (Load & Store)
• Transfer 1 word of data (24 bits) between the B, S, or T registers, and
a memory location
• LDB location • STB location
• LDS location • STS location
• LDT location • STT location
SIC/XE – Instructions (FP Arithmetic)
• Floating point arithmetic involves the F register and a memory
location
• ADDF location
• SUBF location
• MULF location
• DIVF location
• The contents of the F register are replaced by:
F (operation) (contents of location)
Precision and Portability (an aside)
Floating Point Arithmetic
• IEEE 754: a set of standards for binary and decimal floating point representation
and arithmetic. The standard was established in 1985 and updated in 2008 and
2019.
• Each format specifies its precision (the number of significant bits or digits) and its exponent size.
• binary32: 32 bits in total: 1 sign bit, 8 exponent bits, and 24 significant bits of precision (23 stored plus an implicit leading bit)
Floating Point Arithmetic - continued
• A floating point format consists of:
• a base (also called radix) b, which is either 2 (binary) or 10 (decimal) in IEEE
754;
• a precision p;
• an exponent range from emin to emax, with emin = 1 − emax for all IEEE 754
formats
• Specified representations for +infinity, -infinity, and NaN (not a number)
• The IEEE 754 floating-point standard also includes rules for rounding
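The C standard library exposes these parameters for the platform's float type in <float.h>, and the stored fields can be inspected directly. This sketch assumes float is IEEE 754 binary32, which is true on mainstream hardware but not required by the C standard. Note that <float.h> counts exponents for a significand in [0.5, 1), so FLT_MAX_EXP and FLT_MIN_EXP are each one larger than the IEEE emax and emin. The struct and function names are hypothetical.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Stored fields of a binary32 value. */
typedef struct { unsigned sign, exponent; uint32_t fraction; } f32_fields;

f32_fields f32_split(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* reinterpret the bits without aliasing issues */
    f32_fields f = { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
    return f;
}

/* IEEE emax and emin for float, derived from the <float.h> constants. */
int f32_emax(void) { return FLT_MAX_EXP - 1; }   /* 127 for binary32 */
int f32_emin(void) { return FLT_MIN_EXP - 1; }   /* -126 for binary32 */
```

On such a platform, FLT_MANT_DIG is 24 (the precision p), and emin = 1 - emax holds: -126 = 1 - 127. The stored exponent of 1.0f is 127 because binary32 stores the exponent with a bias of 127.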
SIC/XE – Instructions (Register Arithmetic)
• Perform an arithmetic operation on two specified registers:
• ADDR reg1, reg2: reg2 = reg2 + reg1
• SUBR reg1, reg2: reg2 = reg2 - reg1
• MULR reg1, reg2: reg2 = reg2 * reg1
• DIVR reg1, reg2: reg2 = reg2 / reg1
• Shift operations: shift the specified register 'n' bits
• SHIFTL r1,n: circular shift left
• SHIFTR r1,n: shift right, filling the vacated bits with copies of the leftmost (sign) bit
• RMO reg1, reg2: Copy the first specified register into the second
• CLEAR reg1: Clear the specified register (set the value to 0)
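The two shifts behave differently at the edges: SHIFTL wraps bits around, while SHIFTR copies the sign bit into the vacated positions. A sketch operating on 24-bit values held in a uint32_t (the function names are hypothetical):

```c
#include <stdint.h>

#define WORD_MASK 0xFFFFFFu   /* SIC/XE registers are 24 bits wide */

/* SHIFTL r1,n: circular left shift of a 24-bit value by n bits. */
uint32_t sicxe_shiftl(uint32_t reg, unsigned n)
{
    reg &= WORD_MASK;
    n %= 24;
    if (n == 0)
        return reg;
    return ((reg << n) | (reg >> (24 - n))) & WORD_MASK;
}

/* SHIFTR r1,n: right shift of a 24-bit value by n bits, filling the
   vacated positions with copies of the leftmost (sign) bit. */
uint32_t sicxe_shiftr(uint32_t reg, unsigned n)
{
    reg &= WORD_MASK;
    for (unsigned k = 0; k < n; k++)
        reg = (reg >> 1) | (reg & 0x800000u);   /* keep the sign bit in place */
    return reg;
}
```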
SIC/XE – Instructions (Conversion)
• Convert the integer in the A register to floating point, and store the
result in the F register
• FLOAT
• Convert the floating-point number in the F register to integer, and store the result in
the A register
• FIX
• Normalize the floating point number in the F register
• NORM
SIC/XE – Instructions (Comparison)
• Comparison instructions set the CC flag to <, =, or >
• Compare the F register and the 48-bit contents of a memory location
• COMPF location
• Compare the first specified register to the second specified register
• COMPR reg1,reg2
• Increment the X (index) register, and compare the result to the
contents of a specified register
• TIXR register
Assembly Example: SIC/XE
Line #  Label       Instruction  Argument        Address  Instruction Size*
1       Program     START        1000            1000     0
2                   LDA          Inventory       1000     3
3                   LDT          Sales           1003     3
4                   SUBR         T, A            1006     2
5                   J            DisplayRoutine  1008     3
6       Partnumber  BYTE         C'005740'       1011     6
7       Inventory   WORD         500             1017     3
8       Sales       WORD         27              1020     3
Note that the SIC/XE has variable length instructions. This is often true of CISC machines.
RISC machines have uniform length instructions.
Useful Resources
• Simplified Instructional Computer (SIC) Architecture
• https://www.geeksforgeeks.org/simplified-instructional-computer-sic/
• SIC/XE Architecture
• https://www.geeksforgeeks.org/sic-xe-architecture/
The SIC/XE Simulator
• There is a Simulator for the SIC/XE machine
• Integrated Development Environment
• Assembler
• Linker
• Simulator (executes SIC/XE instructions)
• Download at: http://jurem.github.io/SicTools/
• Written in Java
• Download the JAR (Java Archive) file
Assignment 1
Due in One Week (at start of class)
• Log in to Canvas and complete Assignment 1
• Recommended: Download and install the SIC/XE simulator
CMPE 220
CISC Computers are Often Microprogrammed
• Machine instruction set: CISC
• Micro-machine instruction set: RISC
• From the standpoint of a compiler, an assembler, or a programmer, I
would call this a CISC machine
Week 3: What is an Assembler?
• An assembler is a program that converts “assembly language” source
code into binary instructions (aka machine code or machine
instructions)
Machine Code – Common in 1940s
Instruction           Action
0101 1111 1111 0001   Load the value from the following address into the A register; advance the program counter by 2
0011 1110 1000 0101   (data address)
0111 1111 1111 0001   Subtract the following value from the A register; advance the program counter by 2
0000 0000 0000 1100   (value = 12)
0110 1111 1111 0001   Store the value from the A register into the following address; advance the program counter by 2
0011 1110 1000 0101   (data address)
0110 1010 1111 0001   Compare the following value to the A register; if A is less than or equal to the value, jump to address
0000 0000 0001 0100   (value = 20)
0100 1000 1000 0110   (program address)
Assembly Language
Instruction             Action
LDA inventory           Load the value from the specified location into the A register
SBA 12                  Subtract a value (12) from the A register
STA inventory           Store the value from the A register into the specified location
CMPA 20, low_inventory  Compare the A register to a value (20); if A <= 20, go to the address “low_inventory”
•••
low_inventory:
Assembly Language Coding Sheet
Why is it Called an “Assembler?”
• Because it “assembles” machine code instructions!
LDA inventory
011001 0101110011
opcode address
Building Software
High Level Language Source Code (e.g. C++) → Compiler (optional)
→ Assembly Language Source Code → Assembler → Binary Machine Code
→ Linker → Executable Code → Loader → In-Memory Code → Hardware Execution
Wait a Minute!
• When you build a program, you don’t go through all those steps!
gcc -o program program_source.c
./program
• Modern compiler commands “hide” many of the steps… but you still
have the option of breaking out the steps, as we saw in the makefile
examples:
%.o: %.c $(DEPS)
	$(CC) -c -o $@ $< $(CFLAGS)
• To compile a C/C++ source file to assembly language:
gcc -S -o my_asm_output.s helloworld.c
Types of Assemblers (nomenclature)
• Assembler: converts assembly language to binary machine code
• Macro Assembler: allows the programmer to define new instructions
that the assembler “expands” into the actual instruction set
This does not create new machine instructions
• High Level Assembler: an assembler that includes certain high-level
statements – such as IF/THEN/ELSE statements or loops – that don’t
correspond directly to machine instructions
• Cross Assembler: an assembler that runs on one machine, but generates
binary machine code for a different machine
• Micro Assembler: converts microassembly source code into microcode…
the low level code that implements the machine instruction set
A (small) Bit of History
• Kathleen Booth is credited with creating the first assembler in 1947,
while working on the ARC2 (Automatic Relay Calculator) computer at
the University of London.
• David Wheeler independently
developed an assembler for the
EDSAC (Electronic Delay Storage
Automatic Calculator) in 1948.
• The IEEE credits Wheeler with
creating the first assembler.
Requirements for the First Assemblers
• They needed a way to enter mnemonic instructions, so that
programmers didn’t need to remember binary opcodes and
instruction formats.
• They needed a way to associate mnemonic labels with memory
addresses.
• But: they needed to be very simple! The first assemblers were
written in binary machine code, and ran on computers with tiny
amounts of memory.
Requirements for an Assembler (continued)
• They could use characters to represent assembly language
instructions.
• Instructions could be entered with punched cards.
• Punched cards had been in use since the 1890s.
• Businesses were comfortable working with punched cards and keypunches.
• Card readers were easily adapted for use with computers
Assembly Language Instruction Formats
• The requirements led to a very simple format that is still in use today.
What an Assembler Does
• Convert mnemonic opcodes to machine language code
• Convert symbolic references to memory addresses
• Assemble machine code instructions
• Write a binary machine code file
Two-Pass Assembler
• 1st Pass
• Identify statements
• Determine memory layout
• Assign addresses to symbolic references and build “symbol table”
• 2nd Pass
• Assemble instructions
• Output binary machine code
• Print program listing & address assignments (Symbol Table)
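The first pass is essentially a location counter plus a table insert for every label. This sketch assumes each statement has already been tokenized and its size looked up; the struct and function names are hypothetical. The test reuses the example program that follows, so Inventory is assigned 1017 and Sales 1020. Pass 2 would call the lookup for each operand and report an error when it returns -1, as it would for Somewhereelse, which that fragment never defines.

```c
#include <string.h>

/* One parsed source statement (already tokenized). */
typedef struct {
    const char *label;     /* NULL if the line has no label */
    const char *op;
    const char *operand;
    int size;              /* bytes, from the instruction table or directive */
} stmt;

typedef struct { const char *name; int address; } symbol;

/* Pass 1: assign addresses with a location counter and record each
   label in the symbol table. Returns the number of symbols. */
int pass1(const stmt prog[], int n, int start, symbol symtab[])
{
    int loc = start, count = 0;
    for (int k = 0; k < n; k++) {
        if (prog[k].label != NULL) {
            symtab[count].name = prog[k].label;
            symtab[count].address = loc;
            count++;
        }
        loc += prog[k].size;
    }
    return count;
}

/* Pass 2 helper: resolve an operand to an address; -1 means undefined. */
int lookup_symbol(const symbol symtab[], int count, const char *name)
{
    for (int k = 0; k < count; k++)
        if (strcmp(symtab[k].name, name) == 0)
            return symtab[k].address;
    return -1;
}
```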
Assembly Example: SIC/XE
Line #  Label       Instruction  Argument        Address  Instruction Size*
1       Program     START        1000            1000     0
2                   LDA          Inventory       1000     3
3                   LDT          Sales           1003     3
4                   SUBR         T, A            1006     2
5                   J            Somewhereelse   1008     3
6       Partnumber  BYTE         C'005740'       1011     6
7       Inventory   WORD         500             1017     3
8       Sales       WORD         27              1020     3
Note that the SIC/XE has variable length instructions. This is often
true of CISC machines. RISC machines have uniform length
instructions.
1st Pass: Build Symbol Table
Line #  Label       Instruction  Argument        Address  Instruction Size
1       Program     START        1000            1000     0
2                   LDA          Inventory       1000     3
3                   LDT          Sales           1003     3
4                   SUBR         T, A            1006     2
5                   J            Somewhereelse   1008     3
2nd Pass: Assemble Machine Instructions
Line #  Label       Instruction  Argument        Address  Instruction Size
1       Program     START        1000            1000     0
2                   LDA          Inventory       1000     3
3                   LDT          Sales           1003     3
4                   SUBR         T, A            1006     2
5                   J            Somewhereelse   1008     3
Symbol      Address
Program     1000
Partnumber  1011
Inventory   1017
Sales       1020
LDA Inventory  →  011001000000  001111111001
                  opcode        address
Single Pass Assembler
• Uses two tables: a Symbol Table and a Reference Table
• When an undefined symbol is encountered, it’s added to the
Reference Table
• When the symbol is defined, its address is placed in Symbol Table, and
all locations that reference the symbol are updated.
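The Reference Table approach is often called backpatching. A minimal sketch, assuming each emitted instruction's address field can be patched in place by slot number; all names and table sizes are hypothetical, and bounds checks and duplicate-label detection are left out.

```c
#include <string.h>

enum { MAX_SYMS = 32, MAX_REFS = 64, CODE_SIZE = 256 };

/* Symbol Table entry; address -1 means "referenced but not yet defined". */
typedef struct { const char *name; int address; } sym_entry;
/* Reference Table entry: code slot `slot` is waiting for symbol `sym`. */
typedef struct { int sym; int slot; } ref_entry;

typedef struct {
    sym_entry syms[MAX_SYMS]; int nsyms;
    ref_entry refs[MAX_REFS]; int nrefs;
    int code[CODE_SIZE];      /* address field emitted for each code slot */
} assembler;

static int find_or_add(assembler *a, const char *name)
{
    for (int k = 0; k < a->nsyms; k++)
        if (strcmp(a->syms[k].name, name) == 0)
            return k;
    a->syms[a->nsyms].name = name;
    a->syms[a->nsyms].address = -1;
    return a->nsyms++;
}

/* An instruction in `slot` refers to `name`: emit the address if it is
   already known, otherwise emit 0 and record the slot for later patching. */
void use_symbol(assembler *a, int slot, const char *name)
{
    int s = find_or_add(a, name);
    a->code[slot] = a->syms[s].address >= 0 ? a->syms[s].address : 0;
    if (a->syms[s].address < 0) {
        a->refs[a->nrefs].sym = s;
        a->refs[a->nrefs].slot = slot;
        a->nrefs++;
    }
}

/* A label is defined: store its address and patch every earlier
   forward reference to it. */
void define_symbol(assembler *a, const char *name, int address)
{
    int s = find_or_add(a, name);
    a->syms[s].address = address;
    for (int k = 0; k < a->nrefs; k++)
        if (a->refs[k].sym == s)
            a->code[a->refs[k].slot] = address;
}
```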
Single Pass Example
Line #  Label       Instruction  Argument        Address  Instruction Size
1       Program     START        1000            1000     0
2                   LDA          Inventory       1000     3
3                   LDT          Sales           1003     3
4                   SUBR         T, A            1006     2
5                   J            Somewhereelse   1008     3
6       Partnumber  BYTE         C'005740'       1011     6
7       Inventory   WORD         500             1017     3
8       Sales       WORD         27              1020     3
SIC/XE – Special Hardware Cases
• Some Instructions (e.g. LDA) may be 3-byte or 4-byte
• 24-bit
• disp: 12-bit address displacement
• 32-bit
• address: 20-bit address
2nd Pass Must Update Instructions & Tables
Line #  Label       Instruction  Argument   Address  Instruction Size
1       Program     START        1000       1000     0
2                   LDA          Inventory  1000     3
3                   LDT          Sales      1003     3
4                   SUBR         T, A       1006     2
•••
6       Partnumber  BYTE         C'005740'  27011    6
7       Inventory   WORD         500        27017    3
8       Sales       WORD         27         27020    3
2nd Pass Must Update Instructions & Tables
Line #  Label       Instruction  Argument   Address  Instruction Size
1       Program     START        1000       1000     0
2                   LDA          Inventory  1000     4
3                   LDT          Sales      1004     3
4                   SUBR         T, A       1007     2
•••
6       Partnumber  BYTE         C'005740'  27012    6
7       Inventory   WORD         500        27018    3
8       Sales       WORD         27         27021    3
2nd Pass Must Update Instructions & Tables
Line #  Label       Instruction  Argument   Address  Instruction Size
1       Program     START        1000       1000     0
2                   LDA          Inventory  1000     4
3                   LDT          Sales      1004     4
4                   SUBR         T, A       1008     2
•••
6       Partnumber  BYTE         C'005740'  27013    6
7       Inventory   WORD         500        27019    3
8       Sales       WORD         27         27022    3
Programming an Assembler
• Symbol Table: built by 1st Pass; matches symbols to addresses
• Reference Table: built by 1st pass; tracks references to symbols
• Scanner / Tokenizer: scans each line looking for tokens delimited by
space characters:
label (optional) – instruction – address or register (optional) – comment (optional)
• Instruction lookup: searches a pre-defined table for the matching
instruction string. The table contains:
• Opcode
• Instruction length
• Instruction type
• Assembly routines: a subroutine to assemble each instruction format.
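The instruction lookup is a classic use for a sorted table and binary search. This sketch holds a small subset of the SIC/XE opcode table and searches it with the C library's bsearch(); the struct layout is an assumption, and format 4 (the '+' prefix) is not handled.

```c
#include <stdlib.h>
#include <string.h>

/* One entry of the instruction lookup table. */
typedef struct {
    const char *mnemonic;
    unsigned char opcode;
    int length;     /* bytes */
    int format;     /* 1, 2, or 3 (format 3/4 instructions stored as 3) */
} op_entry;

/* Subset of the SIC/XE opcode table, sorted by mnemonic for bsearch(). */
static const op_entry optab[] = {
    {"ADD",  0x18, 3, 3}, {"ADDR", 0x90, 2, 2}, {"CLEAR", 0xB4, 2, 2},
    {"COMP", 0x28, 3, 3}, {"FIX",  0xC4, 1, 1}, {"J",     0x3C, 3, 3},
    {"JEQ",  0x30, 3, 3}, {"JSUB", 0x48, 3, 3}, {"LDA",   0x00, 3, 3},
    {"LDT",  0x74, 3, 3}, {"RSUB", 0x4C, 3, 3}, {"STA",   0x0C, 3, 3},
    {"SUBR", 0x94, 2, 2}, {"TIX",  0x2C, 3, 3},
};

static int compare_mnemonic(const void *key, const void *entry)
{
    return strcmp((const char *)key, ((const op_entry *)entry)->mnemonic);
}

/* Returns the table entry, or NULL for an unrecognized opcode such as "LDQ". */
const op_entry *lookup_op(const char *mnemonic)
{
    return bsearch(mnemonic, optab, sizeof optab / sizeof optab[0],
                   sizeof optab[0], compare_mnemonic);
}
```

A NULL result is exactly the "Unrecognized Opcode" error condition listed on a later slide.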
A Bit More History: The First Assemblers
• In the 40s and 50s, memory was very limited. It wasn’t possible to
store a two-pass assembler, the source code of the program being
“assembled”, and the various data structures in memory.
History (continued)
• With the advent of magnetic disks in the late 50s, operators didn’t
need to load multiple card decks… the two “passes” of the assembler
were stored on disk, along with the source code of the program being
assembled.
• Memory was still very limited, and the same sequence of “overlay”
steps was still performed… although much faster!
Assembler Output: the Object File
• Not just a binary executable file!
• Specifics vary from system to system.
• Common elements:
• Header record:
• Start address
• Length
• Reference Table (for linking object files)
• Symbol Table (for linking object files)
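Object file layouts vary, but a concrete example helps. This sketch assumes the SIC object-program header record from Leland Beck's System Software textbook, where SIC was defined: 'H', a six-column program name, then the start address and program length as six hex digits each. The function name is hypothetical.

```c
#include <stdio.h>

/* Format a SIC-style header record into `out`, which needs at least
   20 bytes: 'H' + 6-column name + 6 hex digits start + 6 hex digits length. */
int format_header_record(char *out, size_t size, const char *name,
                         unsigned start, unsigned length)
{
    return snprintf(out, size, "H%-6.6s%06X%06X", name, start, length);
}
```

For example, a program named COPY starting at 0x1000 with length 0x107A produces "HCOPY  00100000107A".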
Assembler Output: the Listing
• Line numbers and source code (including comments)
• Binary (octal, hex) instruction codes
• Symbol Table
• Error messages!
Assembler Output: Sample Listing
Assembler Error Conditions
• Unrecognized Opcode:
LDQ
• Missing address:
LDA
• Missing register(s):
ADDR
ADDR S
• Invalid register(s):
ADDR S, G
Assembler Error Conditions (continued)
• Unknown Address: location referenced but not defined:
LDA inventroy
• Duplicate label definition:
inventory WORD 500
•••
inventory WORD 500
• Location defined but never referenced (warning):
inventofy WORD 500
What is Pseudocode?
• An informal high-level description of an algorithm
• Typically includes programming language constructs such as
IF/THEN/ELSE, DO/WHILE, FOR, etc.
• Typically includes use of variables
• Augmented by natural language (Descriptions)
• May be more or less “formal”
• May resemble a particular programming language
Pseudocode: Scan a Line
• label opcode argument ;comment
Pseudocode: Scan a Line (cont)
• Get next line from input file
• IF (first character is not a space) THEN
    • Copy characters until blank into $label
• Skip blanks
• IF (end-of-line) THEN
    • GOTO error
• Copy characters until blank into $opcode
• Skip blanks
• IF (not end-of-line) THEN
    • Copy characters until blank into $argument
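The same logic translates almost line for line into C. This is a sketch: the field sizes are arbitrary, over-long tokens are truncated, and, unlike the pseudocode, it stops at a ';' comment when looking for the argument (following the "label opcode argument ;comment" line format). The names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Fields of one assembly source line. */
typedef struct { char label[16], opcode[16], argument[32]; } source_line;

/* Copy characters up to the next blank (or end of line) into dst. */
static const char *copy_token(const char *p, char *dst, size_t size)
{
    size_t k = 0;
    while (*p != '\0' && !isspace((unsigned char)*p)) {
        if (k + 1 < size)
            dst[k++] = *p;          /* truncate over-long tokens */
        p++;
    }
    dst[k] = '\0';
    return p;
}

static const char *skip_blanks(const char *p)
{
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;
    return p;
}

/* Returns 0 on success, -1 when the opcode is missing. */
int scan_line(const char *line, source_line *out)
{
    const char *p = line;
    out->label[0] = out->opcode[0] = out->argument[0] = '\0';
    if (*p != '\0' && !isspace((unsigned char)*p))
        p = copy_token(p, out->label, sizeof out->label);
    p = skip_blanks(p);
    if (*p == '\0')
        return -1;                  /* error: no opcode */
    p = copy_token(p, out->opcode, sizeof out->opcode);
    p = skip_blanks(p);
    if (*p != '\0' && *p != ';')
        copy_token(p, out->argument, sizeof out->argument);
    return 0;
}
```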
CMPE 220
Using Pseudocode
Fibonacci Sequence
Output
Fibonacci Sequence
Pseudocode
PRINT "Fibonacci Sequence"
linenumber = 0;
prev_prev = -1;
prev = 1;
WHILE (linenumber <= 30)
{
    fib = prev_prev + prev;
    PRINT linenumber and fib
    IF (fib mod 5 is 0) PRINT "divisible by 5"
    PRINT new line
    linenumber = linenumber + 1;
    prev_prev = prev;
    prev = fib;
}
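For comparison, here is the pseudocode as C. The running prev_prev and prev variables in print_fibonacci mirror the pseudocode directly; fib_line is a hypothetical helper added so individual values can be checked.

```c
#include <stdio.h>

/* Print lines 0 through 30, flagging values divisible by 5. */
void print_fibonacci(FILE *out)
{
    long prev_prev = -1, prev = 1;    /* seeds chosen so line 0 prints 0 */
    fprintf(out, "Fibonacci Sequence\n");
    for (int linenumber = 0; linenumber <= 30; linenumber++) {
        long fib = prev_prev + prev;
        fprintf(out, "%d %ld", linenumber, fib);
        if (fib % 5 == 0)
            fprintf(out, " divisible by 5");
        fputc('\n', out);
        prev_prev = prev;
        prev = fib;
    }
}

/* Value printed on line `linenumber`, using the same recurrence. */
long fib_line(int linenumber)
{
    long prev_prev = -1, prev = 1, fib = 0;
    for (int k = 0; k <= linenumber; k++) {
        fib = prev_prev + prev;
        prev_prev = prev;
        prev = fib;
    }
    return fib;
}
```

Calling `print_fibonacci(stdout)` produces the program's full output; line 30 prints 832040.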
Compilers Emit Assembly Language Code
• Sometimes lengthy and complex instruction sequences are required
to do things that are not built into the machine architecture
EXAMPLE: Nesting Subroutines
A Subroutine Call Stack
• SIC/XE has JSUB and RSUB instructions, but only one L register
• Nested subroutine calls don’t work: a second JSUB overwrites the return address saved in L!
EXAMPLE: Nesting Subroutines
Storage Declarations
SSTACK RESW 10 Allow 10 nested subroutine calls
SINDEX WORD 0 Index
SMAX WORD 30 Stack Maximum
SAVEA RESW 1 Space to save A register (we’ll see why)
SAVEX RESW 1 Space to save X register (we’ll see why)
Subroutine Call
JSUB  *+3         Get address into L register
STA   SAVEA       Save the A register
STX   SAVEX       Save the X Register
LDA   SINDEX      Get the stack index in A
COMP  SMAX        See if we’re at the end of stack
JEQ   ERROR
RMO   L, A        Move return address into A register
ADD   #43         Point past this code: 43 bytes from STA SAVEA through J SUB (format 3 and format 2 sizes)
LDX   SINDEX      Get the stack index in X
STA   SSTACK,X    Use indexed addressing to store the adjusted return address on stack
LDA   #3          Increment stack index
ADDR  X, A
STA   SINDEX
LDA   SAVEA       Restore the A register
LDX   SAVEX       Restore the X register
J     SUB         Jump to the subroutine
•••   •••         This is where we will return
Subroutine Return
STA   SAVEA       Save the A register
STX   SAVEX       Save the X register
LDA   SINDEX      Decrement stack index
SUB   #3
STA   SINDEX
LDX   SINDEX
LDL   SSTACK,X    Indexed load to get the return address into L register
LDA   SAVEA       Restore the A register
LDX   SAVEX       Restore the X register
RSUB
Managing a Runtime Stack
• Many compiled languages use a runtime stack
• A “frame” is pushed onto the stack each time a subroutine is called
• The frame contains the subroutine parameters, and the local variables
• Calling sequence:
• Load A with the number of bytes required
• JSUB stkpush
• Returns the address of the stack frame in A
• If the stack is full, returns 0 in A
Memory Management
stack     RESW  1000    ; space for stack
endstack  WORD  0       ; end of stack
stacktop  WORD  stack   ; next unused word (initially the start of stack)
•••
stackptr  RESW  1       ; stack address pointer
savea     RESW  1       ; place to save A register
savex     RESW  1       ; place to save X register
Stack layout, starting at stack:
Stack frame 1
Stack frame 1 pointer
Stack frame 2
Stack frame 2 pointer
•••
Memory Management
stkpush  STX    savex       ; save the X register
         LDX    stacktop    ; get the current top of stack
         STX    stackptr    ; save it as next stack pointer
         ADD    stackptr    ; add space needed to stack pointer
         ADD    #3          ; add space for previous stack pointer
         COMP   #endstack   ; check for stack overflow
         JLT    stkOK
         CLEAR  A           ; return error condition if stack is full
         LDX    savex       ; restore X
         RSUB
stkOK    STA    stacktop    ; update top of stack
         SUB    #3          ; get address to store prev stack ptr
         RMO    A, X        ; move it into X
         LDA    stackptr    ; get stack pointer and put in current stack frame
         STA    0, X
         LDX    savex       ; restore X
         RSUB               ; return to caller
Memory Management
stkpop   STA    savea       ; save the A register
         STX    savex       ; save the X register
         LDA    stacktop    ; get the current top of stack
         SUB    #3          ; back up to the top frame's pointer word
         RMO    A, X        ; move it into X
         LDA    0, X        ; get the start address of the top stack frame
         STA    stacktop    ; make it the new top of stack
         LDA    savea       ; restore A
         LDX    savex       ; restore X
         RSUB               ; return to caller
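The stkpush/stkpop bookkeeping can be checked with a C model of the same layout: each frame is followed by a 3-byte word holding the frame's start address, and popping reloads stacktop from that word. This is a sketch; the base address 0x1000 and the byte array standing in for memory are assumptions, and, like the assembly, it does not guard against popping an empty stack.

```c
#include <stdint.h>

enum { STACK_BASE = 0x1000, STACK_BYTES = 3000 };   /* 1000 words; base is hypothetical */

static uint8_t stack_mem[STACK_BYTES];
static uint32_t stacktop = STACK_BASE;               /* next unused byte */

static void put_word(uint32_t addr, uint32_t value)  /* 3-byte word, MSB first */
{
    uint8_t *p = &stack_mem[addr - STACK_BASE];
    p[0] = (uint8_t)(value >> 16);
    p[1] = (uint8_t)(value >> 8);
    p[2] = (uint8_t)value;
}

static uint32_t get_word(uint32_t addr)
{
    const uint8_t *p = &stack_mem[addr - STACK_BASE];
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
}

/* Reserve `bytes` plus one word for the frame pointer; returns the frame
   address, or 0 if the stack would overflow, as the assembly does. */
uint32_t stkpush(uint32_t bytes)
{
    uint32_t frame = stacktop;
    uint32_t newtop = frame + bytes + 3;
    if (newtop >= STACK_BASE + STACK_BYTES)
        return 0;
    stacktop = newtop;
    put_word(newtop - 3, frame);    /* frame pointer sits at the end of the frame */
    return frame;
}

/* Make the start of the top frame the new top of stack. */
void stkpop(void)
{
    stacktop = get_word(stacktop - 3);
}
```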
Printing
• The compiler emits code to print; this is usually a call to a library
routine
• As an example, here is a very simple function to print a string
• Calling sequence:
Store the I/O device number in outdev
Load X with the address of a null-terminated string to print
JSUB pstr
Printing
outdev BYTE 5 ; Output device number
savea RESW 1 ; Space to save the A register
Assignment 2
• Short assignment in Canvas – due next Wednesday
• Pseudocode
• SIC/XE Assembly Language
CMPE 220
Building Software
High Level Language Source Code (e.g. C++) → Compiler (optional)
→ Assembly Language Source Code → Assembler → Object file (binary machine code and tables)
→ Linker → Executable file → Loader → In-Memory Code → Hardware Execution
System Dependent Components (Compilers not yet discussed)
Assembler
• Parses assembly language source code
• Assigns addresses
• Assembles instructions
• Generates object (*.o) files
• Generates listings
• Reports errors
Linker
• Links object files
• Relocates modules and adjusts addresses
• Generates executable file
• Reports errors
Loader
• Relocates executable file
• Loads executable file
• Inserts startup / terminate code
• Launches executable file
• Reports errors
Operating System
• Memory Management
• Process Management
Hardware
• Instruction set
• Addressing modes
• Memory address space
The Link Step
• Linkers combine multiple object (*.o) files into a single executable file
• Linkers connect symbols defined in one code module (EXTDEFS) with
symbol references in another module (EXTREFS)
• The linker may automatically search system libraries for external
routines (automatic linking)
• Most systems require a link step, even for a single object file
• The linker creates a properly formatted executable file
• Object file (*.o) format differs from executable file format
Historical Aside
• As programs grew larger, it became
desirable to break the development
into several parts… and even to
share parts of programs
• One of the earliest linkers was
developed by Grace Hopper in
1952
• She called it a “compiler” because
it compiled several parts into one
program
CMPE 220 5
Linker Relocation: (review)
Symbol Table
Symbol | Type | Address
LOOP | ADDR | 1017
INCR | ADDR | 1027
CONT | ADDR | 1055
COUNTER | ADDR | 1095
MAXINDEX | EQU | 300
Symbol Reference Table
Symbol | Type | Reference
EXPO | EXTREF | 1021
INCR | REF | 1026
LOOP | REF | 1030
LOOP | REF | 1048
COUNTER | REF | 1070
MAXINDEX | REF | 1076
CMPE 220 6
Module Relocation: Linker
External Symbol Table (Symbol | Type | Address)
Modification Table (Reference): 1021, 1026, 1030
External Symbol Reference Table
CMPE 220 7
Module Relocation: Linker
External Symbol Table (Symbol | Type | Address)
Modification Table: [Figure: the modification table drawn as a bit map, in which 1 bits mark the locations whose addresses must be adjusted]
External Symbol Reference Table
Symbol | Type | Reference
EXPO | EXTREF | 1021
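A modification table can be stored either as a list of addresses or as a bit map with one bit per location. This sketch assumes one bit per byte, starting at address 1000, and uses the reference addresses from the Symbol Reference Table above; it illustrates the idea rather than reproducing the exact layout of the figure.

```python
def to_bitmap(references, start, length):
    """Encode a modification table (a list of addresses) as a bit map with one
    bit per byte of the module, beginning at address `start` (an assumption)."""
    bits = [0] * length
    for address in references:
        bits[address - start] = 1          # 1 = this location must be relocated
    return bits


def from_bitmap(bits, start):
    """Recover the list of addresses to relocate from the bit map."""
    return [start + i for i, bit in enumerate(bits) if bit]


references = [1021, 1026, 1030, 1048, 1070, 1076]   # from the Symbol Reference Table
bitmap = to_bitmap(references, start=1000, length=80)
for row in range(0, len(bitmap), 16):               # print 16 bits per row
    print(" ".join(str(bit) for bit in bitmap[row:row + 16]))
```

A list costs one entry per reference, while a bit map costs one bit per location no matter how many references there are, so which is smaller depends on how densely the references are packed.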
CMPE 220 8
Dynamic Linking
• On some systems, shared libraries are not linked into the executable
file.
• Instead, the linker inserts a call to a special system function for
dynamic linking and loading (also called linking on demand).
• Supported, with variations, on most modern platforms:
• Windows: DLL (Dynamic Link Library) files
• Linux, Solaris, and other Unix-like (POSIX) systems: shared object (.so) files in ELF (Executable and Linkable Format)
• macOS: .dylib (Dynamic Library) files
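Linking on demand can be simulated with a stub that resolves its target the first time it is called. This is a purely illustrative Python model: `LazyStub` and the `shared_math` dictionary standing in for a shared library are hypothetical, and real dynamic linkers do the equivalent with tables patched in memory rather than Python objects.

```python
import math
from typing import Callable, Dict, List, Optional


class LazyStub:
    """Simulated 'link on demand' stub: the first call looks the routine up in
    the shared library; later calls go straight to it."""

    def __init__(self, name: str, library: Dict[str, Callable], log: List[str]):
        self.name = name
        self.library = library
        self.log = log
        self.target: Optional[Callable] = None     # unresolved until first call

    def __call__(self, *args):
        if self.target is None:
            self.log.append(f"resolving {self.name}")
            self.target = self.library[self.name]  # one-time symbol lookup
        return self.target(*args)


# A stand-in for a shared library: routine names mapped to code.
shared_math = {"sqrt": math.sqrt, "floor": math.floor}
log: List[str] = []
sqrt = LazyStub("sqrt", shared_math, log)
floor = LazyStub("floor", shared_math, log)   # never called, so never resolved
print(sqrt(16.0), sqrt(9.0), log)             # 4.0 3.0 ['resolving sqrt']
```

Passing a different dictionary, such as a version of the library with tracing code, changes which routine runs without relinking the caller.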
CMPE 220 9
Dynamic Linking Advantages
• Reduces the memory footprint of the executable
• Loads only routines that are actually used
• Allows run-time binding, so different versions of shared library
routines may be used
• E.g. a version with debugging or tracing code
• A single copy of a dynamic library may be shared among processes via
MMU mapping.
• An MMU, or Memory Management Unit, is a hardware device that maps (translates)
addresses in a process's address space to actual locations in physical memory
CMPE 220 10
Shared Dynamic Libraries
[Figure: the shared library pages in the Process 1 Address Space and the Process 2 Address Space both map to a single copy in Physical Memory]
CMPE 220 11
Shared Dynamic Libraries (Data Separation)
[Figure: the Process 1 Address Space and the Process 2 Address Space share the library code in Physical Memory, but each maps its own copy of the library data (Process 1 Data, Process 2 Data)]
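The sharing in the figures comes from page tables: two processes can point one virtual page at the same physical frame while keeping separate frames for data. This is a minimal sketch assuming 4096-byte pages and a hypothetical layout (virtual page 4 holds the library code, page 5 its data); in practice a library need not sit at the same virtual address in every process.

```python
PAGE_SIZE = 4096   # assumed page size


class PageFault(Exception):
    pass


def translate(page_table, virtual_address):
    """Map a virtual address to a physical address, as an MMU does:
    look up the page number, keep the offset within the page."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        raise PageFault(f"no mapping for address {virtual_address:#x}")
    return page_table[page] * PAGE_SIZE + offset


# Hypothetical layout: virtual page 4 = library code, virtual page 5 = library data.
LIBRARY_CODE_FRAME = 20                    # the single physical copy of the code
process1 = {4: LIBRARY_CODE_FRAME, 5: 31}  # private data frame 31
process2 = {4: LIBRARY_CODE_FRAME, 5: 42}  # private data frame 42

code_address = 4 * PAGE_SIZE + 0x10
data_address = 5 * PAGE_SIZE + 0x10
print(translate(process1, code_address) == translate(process2, code_address))  # True
print(translate(process1, data_address) == translate(process2, data_address))  # False
```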
CMPE 220 12
The Load Step
• Loads an executable file into memory
• May relocate the file
• Performs any required system initialization
• Launches the program (starts execution)
CMPE 220 13
Absolute Addressing
Predefined Addresses
[Figure: memory map with Program Main Module at 1000, Library 1 Module at 2000, and Library 2 Module at 3000]
• Very early systems
• Embedded systems
• Routines can be located at pre-designated addresses
• Separate assembly files
• No linker required
• Absolute Loader
CMPE 220 14
Absolute Addressing Problem: code growth
Predefined Addresses
[Figure: the same memory map (Program Main Module at 1000, Library 1 Module at 2000, Library 2 Module at 3000), with extra code added to Program Main that no longer fits in the space reserved for it]
• Very early systems
• Embedded systems
• Routines can be located at pre-designated addresses
• Separate assembly files
• Absolute Loader
• Memory management is the responsibility of the programmer
CMPE 220 15
Absolute Addressing – contiguous code
Predefined Starting Address – Contiguous Code
[Figure: Program Main Module starts at 1000, immediately followed by Library 1 Module and Library 2 Module]
• Very early systems
• Embedded systems
• Routines can be located at pre-designated addresses
• Ways to generate:
• Single assembly file (no linker)
• Multiple files (linker required)
• Absolute Loader
CMPE 220 16
Absolute Loader
• Loads an executable file, with a defined start address and size, into
memory
• “Code” format in the executable file may be binary, or character-encoded
(e.g. hex character codes)
• No relocation is needed at load time
• Executable file may be generated from a single assembly source file,
or may be linked.
• Linker relocates modules in order to combine them into a single address
space
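An absolute loader only has to copy bytes to the addresses the file names. This sketch assumes a simplified, caret-separated take on SIC-style Header / Text / End records with hex-encoded ("character-encoded") bytes; the exact record layout and the sample bytes are illustrative, not a specification.

```python
def absolute_load(records, memory):
    """Load hex-encoded records into memory at the addresses they name, with
    no relocation, and return the execution start address."""
    start = None
    for record in records:
        fields = record.split("^")
        kind = fields[0]
        if kind == "H":                                   # H^name^start^length
            _, name, first, length = fields
            if int(first, 16) + int(length, 16) > len(memory):
                raise ValueError(f"program {name} does not fit in memory")
        elif kind == "T":                                 # T^address^length^bytes
            _, address, length, hex_bytes = fields
            data = bytes.fromhex(hex_bytes)               # two hex characters per byte
            base = int(address, 16)
            if len(data) != int(length, 16) or base + len(data) > len(memory):
                raise ValueError(f"bad text record {record!r}")
            memory[base:base + len(data)] = data          # absolute: copied as-is
        elif kind == "E":                                 # E^start address
            start = int(fields[1], 16)
        else:
            raise ValueError(f"unknown record type {kind!r}")
    if start is None:
        raise ValueError("missing End record")
    return start


memory = bytearray(0x2000)
records = ["H^COPY^001000^000006", "T^001000^06^141033482039", "E^001000"]
entry = absolute_load(records, memory)
print(hex(entry), memory[0x1000:0x1006].hex())   # 0x1000 141033482039
```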
CMPE 220 17
Absolute Loader - Use Cases
• “Primitive” machine – fixed address layout
• Many embedded systems
• Bootstrap loader
• Load the initial software (typically the OS) when a computer is started
• A system that uses a hardware MMU – software doesn’t need to be
relocated.
CMPE 220 18
Relocating Loader
Linking / Loading with Relocation
[Figure: memory layout with Program Main Module at 1000, Library 1 Module starting at 1720, and Library 2 Module starting at 2176]
With START Instruction
• Assembler: set addresses based on START
• Linker or Loader: update addresses
• new address = old address - START + new module base address
CMPE 220 19
Relocating Loader
Linking / Loading with Relocation
[Figure: memory layout with Program Main Module at 0, Library 1 Module starting at 720, and Library 2 Module starting at 1176]
Without START Instruction
• Assembler: ignore START; set addresses based on 0
• Linker or Loader: update addresses
• new address = old address + new module base address
• The START instruction in assembly language is superfluous
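The two relocation rules are simple arithmetic. This sketch assumes, for illustration only, that Library 1 was assembled with START 1000 and contains a label 17 bytes in; it then applies the base addresses from the two layouts above (1720 with START, 720 without).

```python
def relocate_with_start(old_address: int, start: int, new_base: int) -> int:
    """Module assembled at START: new address = old address - START + new base."""
    return old_address - start + new_base


def relocate_from_zero(old_address: int, new_base: int) -> int:
    """Module assembled at 0: new address = old address + new base."""
    return old_address + new_base


# A label 17 bytes into Library 1 (hypothetical offset).
print(relocate_with_start(1017, start=1000, new_base=1720))  # 1737
print(relocate_from_zero(17, new_base=720))                   # 737
```

The two results differ by exactly 1000, the same distance that separates the two memory layouts, which is why assembling every module at 0 makes START superfluous.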
CMPE 220 20
Program Relocation: Loader
Modification Table (list of references): 1021, 1026, 1030, 1048, 1070, 1076
Modification Table (bit-map form): [Figure: the same table drawn as a bit map, in which 1 bits mark the locations to be relocated]
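When the loader relocates a program, it walks the modification table and adds the relocation offset to each listed field. This sketch simplifies by treating every reference as a full 3-byte (24-bit) word; it uses the two LOOP references from the symbol reference table (1030 and 1048, each holding LOOP's address 1017) and a hypothetical new load address of 5000.

```python
def relocate_image(image, references, start, new_base):
    """Apply a modification table: add (new_base - start) to the 3-byte word at
    each listed address. image[0] holds the byte assembled at address `start`."""
    delta = new_base - start
    for ref in references:
        offset = ref - start
        old = int.from_bytes(image[offset:offset + 3], "big")
        new = (old + delta) & 0xFFFFFF                   # keep 24 bits, like a SIC word
        image[offset:offset + 3] = new.to_bytes(3, "big")


# Module assembled with START 1000; the words at 1030 and 1048 refer to LOOP (1017).
image = bytearray(80)
image[30:33] = (1017).to_bytes(3, "big")
image[48:51] = (1017).to_bytes(3, "big")
relocate_image(image, [1030, 1048], start=1000, new_base=5000)
print(int.from_bytes(image[30:33], "big"))   # 5017
```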
CMPE 220 21
Terminology
• Linker: Links multiple object (*.o) files into a single executable file
• Linkage Editor: an old-fashioned term for linker.
• Dynamic Linker: allows links to be made to shared library routines at run
time (on demand)
• Linker/Loader, or Linking Loader: links multiple object (*.o) files into a
single executable file, which it then loads and runs
• Loader: loads and runs an executable file
• Absolute Loader: loads an executable file at its specified address in
memory
• Relocating Loader: relocates an executable file, allowing it to be loaded at
any address
CMPE 220 22
Integrated Development Environments (IDEs)
High Level Language Source Code (e.g. C++) → Compiler → Assembly Language Source Code → Assembler → Object file (binary machine code and tables) → Linker → Executable file → IDE / Debugger → In-Memory Code → Hardware Execution
(The compiler step is optional.)
CMPE 220 23
IDEs: a Development Framework
and Toolset
• In the first lecture, we talked about:
• Shells: used for a wide range of ad hoc and automated tasks
(chiefly on POSIX systems)
• Make: automates the program build process
CMPE 220 26
History
• The first modern, GUI-based IDE
was a commercial product,
Maestro I, released by Softlab
Munich in 1975
• It was a leased hardware/software
system, which included a custom
keyboard and display
• Turbo PASCAL – 1983
• Visual Basic – 1991
• Delphi - 1995
CMPE 220 27
Eclipse – a Popular Open Source IDE
CMPE 220 28
Most Popular IDEs
IDE | Developer
Visual Studio | Microsoft
Eclipse | Open Source
Xcode | Apple
Android Studio | Google (based on IntelliJ)
NetBeans | Oracle / Open Source
IntelliJ IDEA | Proprietary - JetBrains
PyCharm | Proprietary - JetBrains
Komodo | Proprietary - Komodo
Xojo (formerly RealBasic) | Proprietary - Xojo
Bash Shell | Open Source / POSIX
CMPE 220 29
IDE Portability Versus Hardware Dependency
Feature | Machine Dependent?
Smart Editor / Code Editor | NO
Program Build | NO
Version Control | NO
File Browser | NO
High Level Language Support | NO
Code Refactoring | NO
Assembly Language Support | YES
Visual Layout Editor | YES
Performance Profiling | YES
Debugging | YES
CMPE 220 30
Smart / Code Editing
• Multiple Languages
• Syntax checking
• Auto-complete
• Indentation and code
cleanup
• Color coding
(configurable)
• Block checking /
bracket matching /
paren matching
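Bracket and paren matching is commonly done with a stack of unclosed openers. This is a minimal sketch, assuming plain text input: it ignores brackets inside strings and comments, which a real code editor would skip by tokenizing first.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}


def find_unmatched(source: str) -> list:
    """Return the indexes of brackets that have no partner."""
    stack = []          # indexes of open brackets not yet closed
    unmatched = []
    for i, ch in enumerate(source):
        if ch in "([{":
            stack.append(i)
        elif ch in PAIRS:
            if stack and source[stack[-1]] == PAIRS[ch]:
                stack.pop()            # closes the most recent opener
            else:
                unmatched.append(i)    # closer with no matching opener
    return sorted(unmatched + stack)   # leftover openers are unmatched too


print(find_unmatched("if (x[0] == y) { z(); }"))   # []
print(find_unmatched("f(a[1)]"))                   # [1, 5]
```

An editor can highlight the returned positions as the user types.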
CMPE 220 31
Source Code Control / Version
Control
• Allows multiple people to collaborate on a project
• Files are “checked in” to the version control system, and “checked
out” for editing
• Checking in files typically requires a list of changes (comments),
allowing a kind of audit trail
• The version control system stores all versions of a file, allowing users
to revert to a previous version
• Some version control systems allow the project team to manage
“forks” and to intelligently merge forked versions.
• New version versus bug fixes
CMPE 220 32
Source Code Control: Checkpoints and Forks
• When a project is “released” all files are checkpointed – given a Release number
• Developers may later fork files to work on different versions independently
• The Source Code Control System provides assistance for merging forked versions
[Figure: Mylibrary.c Release 1.0 is forked into two copies of Mylibrary.c; the figure also shows Mylibrary.c Release 1.1, and an assisted merge that produces Mylibrary.c Release 2.0]
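"Assisted merge" usually means a three-way merge against the common ancestor: a line changed on only one fork is taken automatically, and a line changed differently on both forks is flagged for a person. This sketch assumes the versions have the same number of lines (edits only, no insertions or deletions); the `<<< ||| >>>` conflict marker is made up for the example.

```python
def merge3(base, ours, theirs):
    """Line-by-line three-way merge against the common ancestor (base)."""
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("this sketch only merges versions of equal length")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:                 # unchanged, or changed the same way on both forks
            merged.append(o)
        elif o == b:               # only the other fork changed this line
            merged.append(t)
        elif t == b:               # only our fork changed this line
            merged.append(o)
        else:                      # both forks changed it differently: ask a person
            merged.append(f"<<< {o} ||| {t} >>>")
            conflicts.append(i)
    return merged, conflicts


base = ["a = 1", "b = 2", "c = 3"]
ours = ["a = 10", "b = 2", "c = 3"]      # e.g. a bug fix
theirs = ["a = 1", "b = 2", "c = 30"]    # e.g. new-version work
merged, conflicts = merge3(base, ours, theirs)
print(merged, conflicts)   # ['a = 10', 'b = 2', 'c = 30'] []
```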
CMPE 220 33
Popular Version Control Systems
Version Control System | Model | Developer
Revision Control System (RCS) | Local | Open Source
Source Code Control System (SCCS) | Local | Open Source
Git | Distributed | Linus Torvalds / Open Source
Concurrent Versions System (CVS) | Client / Server | Open Source
Subversion (SVN) | Client / Server | Open Source
AWS CodeCommit (based on Git) | Distributed | Amazon
Team Foundation Server | Client / Server | Microsoft
Rational ClearCase | Distributed | IBM
Mercurial | Distributed | Open Source
CMPE 220 34
Visual Layout
• Drag-and-Drop
components
• Attach code to
components
CMPE 220 35
Debugging (Sample GUI)
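[Figure: sample debugger GUI showing the code listing (LDX #0, STX matches, STA index) beside a memory display with Label, Addr, Byte Value, and Word Value columns (matches = 0)]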
CMPE 220 36
Debugging
Label | Addr | Byte Value | Word Value
LOOP | 1017 | 74 | 5330938
CONT | 1042 | 107 | 7410384
matches | 1062 | 0 | 12
arraya | 1066 | 2 | 135017
Register | Byte Value | Word Value
A | 112 | 7810374
B | 44 | 2349534
X | 0 | 0
• Allows the user to view the address and content of memory locations by label
• May allow different display formats (decimal, binary, hex, character, etc.)
• Allows the user to alter the content of memory locations
• Allows users to view and alter register contents
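A debugger's memory view reads the byte at a label's address and the 3-byte SIC word that starts there, then formats both. This sketch uses the `matches` and `arraya` rows from the display above as sample data; `poke_word` and `show` are hypothetical helper names, and the column widths are arbitrary.

```python
def poke_word(memory, address, value):
    """Alter memory: store a 24-bit value as 3 big-endian bytes (a SIC word)."""
    memory[address:address + 3] = (value & 0xFFFFFF).to_bytes(3, "big")


def show(memory, labels, fmt="dec"):
    """One display row per label: name, address, the byte at that address,
    and the 3-byte word starting there, in decimal, hex, or binary."""
    pattern = {"dec": "{:d}", "hex": "{:#x}", "bin": "{:#b}"}[fmt]
    rows = []
    for label, address in labels.items():
        byte = memory[address]
        word = int.from_bytes(memory[address:address + 3], "big")
        rows.append(f"{label:<8} {address:>5} "
                    f"{pattern.format(byte):>10} {pattern.format(word):>26}")
    return rows


memory = bytearray(2048)
poke_word(memory, 1062, 12)        # matches
poke_word(memory, 1066, 135017)    # arraya
labels = {"matches": 1062, "arraya": 1066}
for row in show(memory, labels) + show(memory, labels, "hex"):
    print(row)
```

Because SIC words are big-endian, the byte value is always the high-order byte of the word (135017 is 0x020F69, so its first byte is 2).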
CMPE 220 37
Debugging (Breakpoints)
[Figure: animated breakpoint example in the code listing (STA index)]
CMPE 220 38
Debugging (Breakpoints)
[Figure: the sample debugger GUI with breakpoints in the code listing (LDX #0, STX matches, STA index) beside the memory display (matches = 0)]
CMPE 220 39
Setting a Breakpoint
[Figure: the debugger saves the instruction JEQ INCR (If they match, go incr count) and overwrites it with JSUB BREAK, giving control to the IDE; nearby code: ADD #3, STA index]
CMPE 220 40
Hitting a Breakpoint
Pseudocode
BREAK:
Save L Register (JSUB return address)
Save all Registers
Save CC (condition code)
Update Program Display (show position in code)
Update Memory Display (show values of memory locations)
CMPE 220 41
Continuing After a Breakpoint
When the User Clicks RUN
Restore saved registers
Restore CC (condition code)
Execute Saved Instruction
Jump to return address (L register)
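The save / overwrite / execute-saved-instruction cycle can be shown on a toy machine. This sketch invents a four-instruction machine (LDA, ADD, STA, HALT) and a `ToyDebugger` class purely for illustration; instead of a JSUB to a BREAK routine it uses a BREAK pseudo-instruction, but the bookkeeping matches the pseudocode above.

```python
class ToyDebugger:
    """Setting a breakpoint saves the original instruction and overwrites it
    with BREAK; continuing executes the saved instruction and resumes."""

    def __init__(self, program):
        self.program = list(program)      # debugger and program share memory
        self.saved = {}                   # address -> original instruction
        self.regs = {"A": 0, "PC": 0}
        self.memory = {}

    def set_breakpoint(self, address):
        self.saved[address] = self.program[address]
        self.program[address] = ("BREAK", None)

    def execute(self, instruction):
        op, arg = instruction
        if op == "LDA":
            self.regs["A"] = arg
        elif op == "ADD":
            self.regs["A"] += arg
        elif op == "STA":
            self.memory[arg] = self.regs["A"]
        else:
            raise ValueError(f"unknown instruction {op}")
        self.regs["PC"] += 1

    def run(self):
        """Run until HALT or a breakpoint; return why execution stopped."""
        while True:
            instruction = self.program[self.regs["PC"]]
            if instruction[0] == "HALT":
                return "halted"
            if instruction[0] == "BREAK":
                return f"break at {self.regs['PC']}"   # registers stay available
            self.execute(instruction)

    def resume(self):
        """User clicked RUN: execute the saved instruction, then keep going."""
        self.execute(self.saved[self.regs["PC"]])
        return self.run()


program = [("LDA", 5), ("ADD", 3), ("STA", "index"), ("HALT", None)]
debugger = ToyDebugger(program)
debugger.set_breakpoint(1)
print(debugger.run(), debugger.regs)       # break at 1 {'A': 5, 'PC': 1}
debugger.regs["A"] = 10                    # the user alters a register
print(debugger.resume(), debugger.memory)  # halted {'index': 13}
```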
CMPE 220 42
Impact on Memory and Addressing
[Figure: one block of memory holds both the IDE and Debugger and the Program being debugged]
• Debugger and program are a single process
• Debugger and program reside in the same address space
• Debugger needs to relocate program when loading it
• Debugger is able to read and write program memory
CMPE 220 43
Breakpoints Using Interrupts
• An interrupt is a hardware mechanism for interrupting the normal
flow of program execution.
• Causes an ‘immediate’ branch to an interrupt handler routine
• Interrupts may be triggered by
• A timer
• An I/O operation
• An instruction that triggers an interrupt
CMPE 220 44
Setting a Breakpoint
CMPE 220 45
Impact on Memory and Addressing
[Figure: the IDE and Debugger run in the Process 1 Address Space; the Program being debugged runs in the Process 2 Address Space]
Program listing excerpt:
1035          JEQ   INCR    If they match, go incr count
1038   CONT   LDA   index   Increment index by 3 bytes
1041          ADD   #3
Symbol table excerpt: arraya REF 1066; arrayb REF 1365
CMPE 220 47
Debugging (High Level Languages)
[Figure: debugger GUI for C source, with Insert Break, Delete Break, and Run buttons and a Label / Addr / Value display]
status = cgi_get(&common, field_buffer, 1000);
status = er_showerror(&common);
exit (0);
CMPE 220 48
Debugging (High Level Languages)
• The debugger must now be able to connect high-level source code
statements to machine code addresses, to allow breakpoints
• The debugger must understand and be able to display variables, arrays,
different data types, structures, etc.
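Connecting source statements to machine code needs a table of (address, source line) pairs. The table below is hypothetical; real debug formats such as DWARF, which gcc -g produces for GDB, store this information far more compactly. The sketch shows both lookups a debugger needs: line to address (for placing a breakpoint) and address to line (for showing where the program stopped).

```python
import bisect

# Hypothetical line table: (address of first instruction, source line),
# sorted by address. Line 13 produced no code (say, a comment).
LINE_TABLE = [(0x1000, 10), (0x1008, 11), (0x1014, 12), (0x1020, 14)]


def address_for_line(line):
    """Where to plant a breakpoint for a source line."""
    for address, source_line in LINE_TABLE:
        if source_line == line:
            return address
    raise ValueError(f"no code was generated for line {line}")


def line_for_address(pc):
    """Which source line contains the instruction at address pc."""
    addresses = [address for address, _ in LINE_TABLE]
    i = bisect.bisect_right(addresses, pc) - 1
    if i < 0:
        raise ValueError(f"address {pc:#x} is before the first line")
    return LINE_TABLE[i][1]


print(hex(address_for_line(12)), line_for_address(0x100C))   # 0x1014 11
```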
CMPE 220 49
Additional File Information
• The information required by debuggers adds a great deal of data to
the object and executable files
• Typically an option when compiling, assembling, and linking:
gcc -g
• gcc -g generates debug information to be used by the GDB debugger
CMPE 220 50
For Next Thursday
• Log in to Canvas and complete Assignment 3
• Read Chapter 4 – Macro Processors
CMPE 220 51
CMPE 220
CMPE 220 1
Building Software
High Level Language Source Code (e.g. C++) → Compiler → Assembly Language Source Code → Assembler → Binary Machine Code → Linker → Executable Code → Loader → In-Memory Code → Hardware Execution
(The compiler step is optional.)
CMPE 220 2
What is a “Macro?”
• A macro (which stands for "macroinstruction") is a programmable
pattern which translates a certain sequence of input into a preset
sequence of output.
• A macro definition defines an input sequence, and the corresponding output
sequence (expansion)
• Using a macro within an input stream is called an invocation
• Macros can be used to make programming (or other tasks) less
repetitive.
• Macros are another step in the direction of convenience
CMPE 220 3
Example
• Macro Definition (syntax will vary depending on macro language):
aliqua :: bananas are rich in potassium ;;
Input with Macro Invocations:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna %aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse %aliqua dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Output with Macro Expansions:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna bananas are rich in potassium.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse bananas are rich in potassium dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
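A macro processor for this sample syntax needs two steps: collect the definitions, then replace each invocation. This is a minimal sketch using regular expressions, assuming definitions look like `name :: body ;;` and invocations like `%name`, as in the example; unknown names are passed through unchanged.

```python
import re


def parse_definitions(text):
    """Collect definitions written as  name :: body ;;"""
    return {name: body.strip()
            for name, body in re.findall(r"(\w+)\s*::(.*?);;", text, re.DOTALL)}


def expand(text, macros):
    """Replace each %name invocation with its expansion."""
    return re.sub(r"%(\w+)", lambda m: macros.get(m.group(1), m.group(0)), text)


macros = parse_definitions("aliqua :: bananas are rich in potassium ;;")
print(expand("dolore magna %aliqua.", macros))   # dolore magna bananas are rich in potassium.
```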
CMPE 220 4
Example: Positional Arguments
• Macro Definition (syntax will vary depending on macro language):
fruit() :: %1 are rich in %2 and %3 ;;
Input with Macro Invocations:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna %fruit(bananas, potassium, Vitamin B6).
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse %fruit(oranges, Vitamin C, Thiamin) dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Output with Macro Expansions:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna bananas are rich in potassium and Vitamin B6.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse oranges are rich in Vitamin C and Thiamin dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
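Positional arguments add one step: split the argument list and substitute `%1`, `%2`, ... in the body. This sketch assumes arguments contain no commas or parentheses, which keeps the parsing to a single regular expression; a real macro language would need a proper argument scanner.

```python
import re


def expand_positional(text, macros):
    """Expand %name(arg1, arg2, ...) by replacing %1, %2, ... in the macro body."""
    def invoke(match):
        body = macros[match.group(1)]
        args = [arg.strip() for arg in match.group(2).split(",")]

        def argument(ref):
            n = int(ref.group(1))
            if not 1 <= n <= len(args):
                raise ValueError(f"{match.group(1)} needs argument %{n}")
            return args[n - 1]

        # %(\d+) reads whole numbers, so %10 is never mistaken for %1
        return re.sub(r"%(\d+)", argument, body)

    return re.sub(r"%(\w+)\(([^()]*)\)", invoke, text)


macros = {"fruit": "%1 are rich in %2 and %3"}
print(expand_positional("dolore magna %fruit(bananas, potassium, Vitamin B6).", macros))
# dolore magna bananas are rich in potassium and Vitamin B6.
```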
CMPE 220 5
Example: Named Arguments with
Defaults
• Macro Definition (syntax will vary depending on macro language):
fruit(%name, %nutrient1, %nutrient2=sugar) ::
%name are rich in %nutrient1 and %nutrient2 ;;
Input with Macro Invocations:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna %fruit(name=bananas, nutrient1=potassium, nutrient2=Vitamin B6).
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse %fruit(name= oranges, nutrient1= Vitamin C) dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Output with Macro Expansions:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna bananas are rich in potassium and Vitamin B6.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse oranges are rich in Vitamin C and sugar dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
CMPE 220 6
Example: Named Arguments with
Defaults
• Macro Definition (syntax will vary depending on macro language):
fruit(%name, %nutrient1, %nutrient2=sugar) ::
%name are rich in %nutrient1 and %nutrient2 ;;
• Omitting an argument that has a default is legal; the default value is used.
I’ve heard that %fruit(%name=bananas, %nutrient1=potassium).
• Omitting an argument that does not have a default is an error.
I’ve heard that %fruit(%name=bananas, %nutrient2=Vitamin C)
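Named arguments with defaults come down to binding each parameter to a supplied value, its default, or an error. This sketch represents the `fruit` definition as a parameter list with optional defaults (a representation chosen for the example) and accepts both `name=` and `%name=`, since the examples above use both spellings.

```python
import re
from typing import List, Optional, Tuple

Param = Tuple[str, Optional[str]]          # (parameter name, default or None)


def bind_arguments(params: List[Param], arg_text: str) -> dict:
    """Match name=value arguments to parameters, filling in defaults."""
    given = {}
    for item in arg_text.split(","):
        key, _, value = item.partition("=")
        given[key.strip().lstrip("%")] = value.strip()   # accept name= or %name=
    known = {name for name, _ in params}
    for key in given:
        if key not in known:
            raise ValueError(f"unknown argument {key!r}")
    values = {}
    for name, default in params:
        if name in given:
            values[name] = given[name]
        elif default is not None:
            values[name] = default                  # omitted, but it has a default
        else:
            raise ValueError(f"argument {name!r} has no default and was omitted")
    return values


def expand_named(body: str, params: List[Param], arg_text: str) -> str:
    values = bind_arguments(params, arg_text)
    return re.sub(r"%(\w+)", lambda m: values.get(m.group(1), m.group(0)), body)


params = [("name", None), ("nutrient1", None), ("nutrient2", "sugar")]
body = "%name are rich in %nutrient1 and %nutrient2"
print(expand_named(body, params, "name= oranges, nutrient1= Vitamin C"))
# oranges are rich in Vitamin C and sugar
```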
CMPE 220 7
Incorporating Macros
• A macro-processor may be a standalone program that processes text
before it is submitted to a compiler or assembler:
CMPE 220 8
Incorporating Macros
• Macro-processing can be built into a compiler or an assembler (i.e. a
macro assembler)
CMPE 220 9
Separating Definitions from Input
Files
• Macro definitions can reside in separate files, allowing shared
“libraries” of macro definitions
[Figure: a separate Macro Definitions File supplies definitions to the macro processor]
CMPE 220 10
Macros Are Not Limited to
Programming
• Suppose our company produces documents for using medical
equipment. Our attorneys want us to include some legal disclaimers,
but the disclaimer language sometimes changes.
Input with Macro Invocations
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore
magna.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse aliento dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
%disclaimer
CMPE 220 11
Macros Do Not Extend Instruction
Sets!
• Macros simply translate one text pattern into another.
• When used with assembly language – whether as a pre-processor, or
a macro assembler – they do not add new instructions to the
machine.
• Caution: A single macro may expand to many instructions. If the
macro is used frequently, it can cause code size to balloon.
• Caution: Macros may also have hidden side effects, such as changing
register values or the Condition Code (CC)
• Programmers who use macros need to keep these side effects in mind
CMPE 220 12
Macro Versus Subroutines
Macro | Subroutine
Code is duplicated each time the macro is invoked | Only one copy of the subroutine code
Labels can be a problem; if a label is used within a macro, it will be duplicated each time the macro is invoked, generating a duplicate label error | Labels can be freely used within the subroutine
Macros may allow sophisticated argument handling, e.g. changing order, default values, etc. | Argument handling is limited by the language. Most assemblers have no provisions for arguments
CMPE 220 13
Real-World Use Case
• I was once assigned to test a math library on a new computer
• SIN, COS, TAN, ASIN, ACOS, ATAN, LOG, etc
For each function, I needed to:
• Test if the function returned known correct values for a range of
arguments
• Test the “sensitivity” to see if return values changed when input
arguments were varied by the smallest machine precision
• Test error handling to ensure an error status was returned for invalid
arguments
Writing the test cases was horribly repetitive and tedious!
CMPE 220 14
Macros Reduced the Coding to 1
Line Per Test
• Sample Macro Definition (using pseudocode)
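ERRORTEST (%function, %value) ::
    IF (NOT ISNAN(%function(%value))) THEN
        print "Function %function failed to return a NaN for argument %value"
    ENDIF
;;
• A NaN compares unequal to every value, including another NaN, so a test written as NE NaN would always report a failure; ISNAN is the reliable check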
CMPE 220 15
Example: a Better JSUB
• The SIC instruction set has a subroutine call – JSUB
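• Because there is only one return address register (L), subroutines
cannot be nested unless each subroutine saves and restores L itself:
Main program        FIRST                 SECOND
...                 ...                   ...
JSUB FIRST   (1)    JSUB SECOND   (2)     ...
...                 ...                   RSUB   (3)
                    RSUB          (4)
• The call at (2) overwrites the return address saved by (1), so the RSUB at (3) correctly returns into FIRST, but the RSUB at (4) also returns to the instruction after JSUB SECOND instead of to the main program.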
CMPE 220 16
Macro Subroutine Call
• Our Macro Subroutine Call will use a stack to store the return address,
allowing subroutines to be nested
• We will need three macros:
• SINIT: Set up and initialize the subroutine return address stack
• SJMP: Subroutine jump
• SRET: Subroutine return
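The difference between one return-address register and a stack of return addresses shows up with a single nested call. This sketch models both mechanisms abstractly, recording return points as strings; the class names are hypothetical, and the steps are numbered 1 through 4 as in the nesting figure.

```python
class SingleRegister:
    """JSUB/RSUB with one return-address register, like SIC's L."""

    def __init__(self):
        self.L = None

    def call(self, return_to):
        self.L = return_to          # any earlier return address is overwritten

    def ret(self):
        return self.L


class ReturnStack:
    """The SJMP/SRET idea: push each return address so calls can nest."""

    def __init__(self):
        self.stack = []

    def call(self, return_to):
        self.stack.append(return_to)

    def ret(self):
        return self.stack.pop()


def nested_returns(mechanism):
    """MAIN calls FIRST, FIRST calls SECOND; report where each return goes."""
    mechanism.call("MAIN, after JSUB FIRST")      # (1)
    mechanism.call("FIRST, after JSUB SECOND")    # (2)
    from_second = mechanism.ret()                 # (3) SECOND returns
    from_first = mechanism.ret()                  # (4) FIRST returns
    return [from_second, from_first]


print(nested_returns(SingleRegister()))
# ['FIRST, after JSUB SECOND', 'FIRST, after JSUB SECOND']
print(nested_returns(ReturnStack()))
# ['FIRST, after JSUB SECOND', 'MAIN, after JSUB FIRST']
```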
CMPE 220 17
SINIT: Initialize Subroutine Stack
SINIT ::
SSTACK   RESW   10       Allow 10 nested subroutine calls
SPTR     WORD   SSTACK   Stack pointer – contains the address of the next free stack slot
SAVEA    RESW   1        Space to save A register (we’ll see why)
;;
CMPE 220 18
SJMP: Subroutine Jump
SJMP( %ADDR) ::
         JSUB   *+3      Get address into L register
         STA    SAVEA    Save the A register
         RMO    L, A     Move address into A register
         ADD    #26      Point past our macro (eight 3-byte instructions + the 2-byte RMO)
         STA    @SPTR    Use indirect addressing to store return address on stack
         LDA    SPTR     Increment stack pointer
         ADD    #3
         STA    SPTR
         LDA    SAVEA    Restore the A register
         J      %ADDR    Jump to the subroutine
;;
CMPE 220 19
SRET: Subroutine Return
SRET ::
         STA    SAVEA    Save the A register
         LDA    SPTR     Decrement stack pointer
         SUB    #3
         STA    SPTR
         LDA    SAVEA    Restore the A register
         LDL    @SPTR    Indirect load to get the return address into L register
         RSUB            Return
;;
CMPE 220 20
Downsides of the Macro Subroutine
Call
• Normally a SIC subroutine call costs two instructions
• JSUB: 1 instruction for each call
• RSUB: 1 instruction for the return from the subroutine
• Our Macro Subroutine call costs 17 instructions
• SJMP: 10 instructions for each call
• SRET: 7 instructions for the return from the subroutine
• There is a hidden risk of a stack overflow
• We could eliminate the hidden risk of a stack overflow by adding code in the
SJMP macro to check for more than 10 saved addresses, and jump to an error
routine – but this would add more code
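The SINIT/SJMP/SRET stack can be modeled at the byte level to make the pointer arithmetic concrete. This Python sketch stores 3-byte return addresses in a bytearray and moves SPTR by one word per call; the overflow and underflow checks are the optional extra code discussed above (the macros themselves do not have them), and the addresses in the usage are hypothetical.

```python
WORD = 3   # a SIC word is 3 bytes


class MacroCallStack:
    """SSTACK is `depth` words of memory; SPTR holds the address of the
    next free slot."""

    def __init__(self, memory: bytearray, sstack: int, depth: int = 10):
        self.memory = memory
        self.sstack = sstack                 # address of SSTACK
        self.limit = sstack + depth * WORD   # first address past the stack
        self.sptr = sstack                   # SPTR starts at SSTACK

    def sjmp(self, return_address: int) -> None:
        if self.sptr >= self.limit:
            raise OverflowError("more nested calls than the stack can hold")
        # STA @SPTR: store the return address where SPTR points
        self.memory[self.sptr:self.sptr + WORD] = return_address.to_bytes(WORD, "big")
        self.sptr += WORD                    # advance SPTR by one word

    def sret(self) -> int:
        if self.sptr <= self.sstack:
            raise IndexError("SRET without a matching SJMP")
        self.sptr -= WORD                    # back SPTR up by one word
        # LDL @SPTR: fetch the saved return address
        return int.from_bytes(self.memory[self.sptr:self.sptr + WORD], "big")


stack = MacroCallStack(bytearray(4096), sstack=0x200)
stack.sjmp(0x1026)       # MAIN calls FIRST
stack.sjmp(0x2040)       # FIRST calls SECOND
print(hex(stack.sret()), hex(stack.sret()))   # 0x2040 0x1026
```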
CMPE 220 21
Additional Features
CMPE 220 22
Escape Characters
• Macro languages usually make use of special characters
• Our sample language uses the ‘%’ character
• If the target language we are generating uses that special character,
we have a problem, because the macro processor will be confused
when it sees the character in the macro definition
• We use an escape character – usually ‘\’ – to tell the macroprocessor
to simply pass through the next character and not treat it as part of
the macro language
• \%
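An escape character is easiest to handle in a left-to-right scanner: a backslash copies the next character through untouched, and only an unescaped `%` starts an invocation. This is a minimal sketch of that rule using the sample `%name` syntax; the `rate` macro in the usage is made up.

```python
def expand_with_escapes(text, macros):
    """'\\x' emits x literally; '%name' expands a macro."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            out.append(text[i + 1])        # pass the next character through untouched
            i += 2
        elif ch == "%":
            j = i + 1
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            name = text[i + 1:j]
            out.append(macros.get(name, text[i:j]))   # unknown names left as-is
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(expand_with_escapes(r"Discount: %rate\%", {"rate": "10"}))   # Discount: 10%
```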
CMPE 220 23
Automatic Label Generation
• Labels can be included within macros without duplication
• The macro processor assigns a unique label each time the macro is
invoked – usually based on a simple counter
CMPE 220 24
Defining Automatic Labels
Original SJMP:
SJMP( %ADDR) ::
         JSUB   *+3
         STA    SAVEA
         RMO    L, A
         ADD    #26
         STA    @SPTR
         LDA    SPTR
         ADD    #3
         STA    SPTR
         LDA    SAVEA
         J      %ADDR
;;
SJMP with Automatic Labels:
SJMP( %ADDR) ::
         STA    SAVEA
         LDA    #END%%
         ADD    #3
         STA    @SPTR
         LDA    SPTR
         ADD    #3
         STA    SPTR
         LDA    SAVEA
END%%    J      %ADDR
;;
CMPE 220 25
Using Automatic Labels
First Time Macro is Invoked:
         STA    SAVEA
         LDA    #END01
         ADD    #3
         STA    @SPTR
         LDA    SPTR
         ADD    #3
         STA    SPTR
         LDA    SAVEA
END01    J      %ADDR
Second Time Macro is Invoked:
         STA    SAVEA
         LDA    #END02
         ADD    #3
         STA    @SPTR
         LDA    SPTR
         ADD    #3
         STA    SPTR
         LDA    SAVEA
END02    J      %ADDR
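Automatic labels need only a counter that advances once per invocation. This sketch replaces `%%` with a two-digit number, matching the END01 / END02 labels above; `LabelGenerator` is a hypothetical name.

```python
class LabelGenerator:
    """Replace %% in a macro body with a number that changes on every
    invocation (01, 02, ...), so each expansion gets its own labels."""

    def __init__(self):
        self.count = 0

    def expand(self, body):
        self.count += 1                             # one new number per invocation
        return body.replace("%%", f"{self.count:02d}")


body = "LDA #END%%\nEND%% J %ADDR"
labels = LabelGenerator()
first = labels.expand(body)
second = labels.expand(body)
print(first)
print(second)
```

Replacing `%%` before substituting arguments such as `%ADDR` keeps the two uses of `%` from being confused.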
CMPE 220 26
Conditional Macros
• Macro Processors may add “conditional” statements to generate
different code when the macro is invoked.
• The conditional statements are processed when the macro is
invoked… they are not part of the generated code
CMPE 220 27
Conditional Macro - Example
SJMP( %ADDR, %SAVE=YES) ::
%IF (%SAVE EQ YES)
         STA    SAVEA
%ENDIF
         LDA    #END%%
         ADD    #3
         STA    @SPTR
         LDA    SPTR
         ADD    #3
         STA    SPTR
%IF (%SAVE EQ YES)
         LDA    SAVEA
%ENDIF
END%%    J      %ADDR
;;
CMPE 220 28
Conditionally Generated Code
SJMP(%ADDR=MYLABEL, %SAVE=YES):
         STA    SAVEA
         LDA    #END01
         ADD    #3
         STA    @SPTR
         LDA    SPTR
         ADD    #3
         STA    SPTR
         LDA    SAVEA
END01    J      MYLABEL
SJMP(%ADDR=MYLABEL, %SAVE=NO):
         LDA    #END02
         ADD    #3
         STA    @SPTR
         LDA    SPTR
         ADD    #3
         STA    SPTR
END02    J      MYLABEL
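Conditional lines are evaluated while the macro is expanded, so the `%IF` and `%ENDIF` lines themselves never reach the output. This sketch handles only the `%IF (%NAME EQ VALUE)` form used above, without nesting, and uses a shortened version of the SJMP body; `END%%` is left for the label-generation step.

```python
import re

IF_LINE = re.compile(r"%IF\s*\(\s*%(\w+)\s+EQ\s+(\w+)\s*\)")


def expand_conditional(body_lines, args):
    """Emit only the lines selected by %IF ... %ENDIF, substituting %NAME args."""
    output = []
    including = True
    for line in body_lines:
        stripped = line.strip()
        condition = IF_LINE.fullmatch(stripped)
        if condition:
            including = args.get(condition.group(1)) == condition.group(2)
        elif stripped == "%ENDIF":
            including = True
        elif including:
            output.append(re.sub(r"%(\w+)", lambda m: args.get(m.group(1), m.group(0)), line))
    return output


body = ["%IF (%SAVE EQ YES)", "STA SAVEA", "%ENDIF", "LDA #END%%",
        "%IF (%SAVE EQ YES)", "LDA SAVEA", "%ENDIF", "END%% J %ADDR"]
print(expand_conditional(body, {"ADDR": "MYLABEL", "SAVE": "YES"}))
# ['STA SAVEA', 'LDA #END%%', 'LDA SAVEA', 'END%% J MYLABEL']
print(expand_conditional(body, {"ADDR": "MYLABEL", "SAVE": "NO"}))
# ['LDA #END%%', 'END%% J MYLABEL']
```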
CMPE 220 29
Macro Variables
• Allows us to define and use variables within the macro processor
• These variables have nothing to do with the target language. They
are visible only to the macro processor
CMPE 220 30
Macro Variables - Example
• We can define a variable and set its value
DEFINE %DEBUG = YES ;;
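Macro variables live in a table inside the macro processor; DEFINE lines update the table and produce no output. This sketch assumes the variables are consulted by `%IF` conditions, which is one plausible use; TRACE and COUNT are made-up labels.

```python
import re

DEFINE_LINE = re.compile(r"DEFINE\s+%(\w+)\s*=\s*(\S+)\s*;;")
IF_LINE = re.compile(r"%IF\s*\(\s*%(\w+)\s+EQ\s+(\w+)\s*\)")


def process(lines):
    """Keep DEFINE values in a variable table that only the macro processor sees."""
    variables = {}
    output = []
    including = True
    for line in lines:
        stripped = line.strip()
        condition = IF_LINE.fullmatch(stripped)
        definition = DEFINE_LINE.fullmatch(stripped)
        if condition:
            including = variables.get(condition.group(1)) == condition.group(2)
        elif stripped == "%ENDIF":
            including = True
        elif not including:
            continue                              # skipped by a false %IF
        elif definition:
            variables[definition.group(1)] = definition.group(2)   # never emitted
        else:
            output.append(line)
    return output, variables


source = ["DEFINE %DEBUG = YES ;;", "%IF (%DEBUG EQ YES)", "JSUB TRACE", "%ENDIF", "LDA COUNT"]
print(process(source))   # (['JSUB TRACE', 'LDA COUNT'], {'DEBUG': 'YES'})
```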
CMPE 220 31
Nested Macros
• With a simple, single-pass macroprocessor, a macro definition cannot
contain a macro invocation
• There are two ways to allow a macro definition to contain a macro
invocation:
• iteration
• recursion
CMPE 220 32
Single Pass Macro Processing
Pseudocode
read source file into input
status = macroprocessor(input, output)    Process the input, searching for macro invocations, and return the output with expansions
Write output to output file
CMPE 220 33
Macro Processing With Iteration
Pseudocode
read source file into input
WHILE (macroprocessor(input, output) EQ invocations_found)
    input = output
ENDWHILE
Write output to output file
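The iterative approach can be sketched as repeated single passes that stop when a pass finds no invocations. The `greeting`/`name` macros are made up; the pass limit is an added safeguard (not in the pseudocode) against a macro defined in terms of itself, which would otherwise loop forever.

```python
import re

INVOCATION = re.compile(r"%(\w+)")


def one_pass(text, macros):
    """Single pass: expand every known invocation once; report whether any were found."""
    found = False

    def replace(m):
        nonlocal found
        if m.group(1) in macros:
            found = True
            return macros[m.group(1)]
        return m.group(0)

    return INVOCATION.sub(replace, text), found


def expand_iteratively(text, macros, max_passes=100):
    """Repeat single passes until a pass finds no invocations."""
    for _ in range(max_passes):
        text, found = one_pass(text, macros)
        if not found:
            return text
    raise RuntimeError("expansion did not settle; is a macro defined in terms of itself?")


macros = {"greeting": "hello %name", "name": "world"}
print(expand_iteratively("%greeting!", macros))   # hello world!
```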
CMPE 220 34
Macro Processing with Recursion
What is Recursion?
• The process in which a function calls itself directly or indirectly is
called recursion, and the corresponding function is called a recursive
function.
• Recursion depends on a stack, which means that each time a function
is called, it has its own copy of local variables, and its own return
address
CMPE 220 35
Macro Processing With Recursion
Pseudocode
read source file into input
macroprocessor(input)
Write output to output file
FUNCTION macroprocessor(input)
    output = ""
    WHILE ( token = getnexttoken(input) )
        IF (token is a macro invocation)
            expansion = expandmacro(token)
            macroprocessor(expansion)
        ELSE
• Single pass, but depends on the ability of macroprocessor() to
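One way to complete the recursive structure: tokenize the input, and when a token is an invocation, run its expansion back through the same function before appending it to the output. This is a sketch, not the course's reference solution; the tokenizer regex and the depth limit are assumptions added so that a self-referencing macro fails cleanly.

```python
import re

# Tokens: an invocation (%name), a run of ordinary text, or a lone '%'.
TOKEN = re.compile(r"%\w+|[^%]+|%")


def macroprocessor(text, macros, depth=0):
    """Recursive expansion: each expansion is itself processed by macroprocessor()."""
    if depth > 50:
        raise RecursionError("macro nesting too deep")
    output = []
    for token in TOKEN.findall(text):
        name = token[1:]
        if token.startswith("%") and name in macros:
            output.append(macroprocessor(macros[name], macros, depth + 1))
        else:
            output.append(token)
    return "".join(output)


macros = {"greeting": "hello %name", "name": "world"}
print(macroprocessor("%greeting!", macros))   # hello world!
```

Each recursive call gets its own `output` list and its own place in the token stream, which is exactly the per-call state that the stack provides.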
CMPE 220 36
Built-In Macro Processing
• Many assemblers and high-level language compilers feature built-in
macro processing features
• Macro Pre-Processor
• Expands macros into a separate file which is then passed to the compiler or
assembler
• Built-In Macro Processor
• Typically processes each line of the input, and then passes the expanded lines
to the scanner
• The best-known example is the ANSI C compiler, which includes the ability to
define and expand macros
CMPE 220 37
Steps in Translating a Language
• Early assemblers used “brute force” techniques for scanning the input
stream.
• That was workable, because the structure of assembly language is very
simple:
label opcode operands comment
• Brute force techniques don’t work for high-level languages
• Macro Processors fall somewhere in between
• When we talk about compilers, we’ll break out the software
components that are used to break down complex languages
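The simple label / opcode / operands / comment structure is what made brute-force scanning workable. This sketch assumes three conventions for illustration: `;` starts a comment (as in the printing example earlier), a label begins in column 1, and fields are separated by whitespace. It would be fooled by a `;` inside a character constant, which is one reason the approach does not scale to richer languages.

```python
from typing import NamedTuple, Optional


class SourceLine(NamedTuple):
    label: Optional[str]
    opcode: str
    operands: Optional[str]
    comment: Optional[str]


def split_line(line: str) -> Optional[SourceLine]:
    """Brute-force split of  label opcode operands comment."""
    code, _, comment = line.partition(";")
    if not code.strip():
        return None                                  # blank or comment-only line
    has_label = not code[0].isspace()
    fields = code.split()
    label = fields.pop(0) if has_label else None
    if not fields:
        raise ValueError(f"missing opcode: {line!r}")
    opcode = fields.pop(0)
    operands = " ".join(fields) or None              # e.g. "L, A" for RMO
    return SourceLine(label, opcode, operands, comment.strip() or None)


print(split_line("LOOP  LDA  index ; load the index"))
# SourceLine(label='LOOP', opcode='LDA', operands='index', comment='load the index')
print(split_line("      RSUB ; return to caller"))
# SourceLine(label=None, opcode='RSUB', operands=None, comment='return to caller')
```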
CMPE 220 38
Writing a Macro Processor
Software Components
• A scanner to search the input for macro definitions and invocations
• A macro definition table & lookup routine
• A variable table & lookup routine
• A routine to emit the macro expansion
CMPE 220 39
For Next Week
• Log in to Canvas and complete Assignment 4 – Macros
CMPE 220 40
CMPE 220
CMPE 220 1
Building Software
High Level Language Source Code (e.g. C++) → Compiler → Assembly Language Source Code → Assembler → Object file (binary machine code and tables) → Linker → Executable file → Loader → In-Memory Code → Hardware Execution
(The compiler step is optional.)
CMPE 220 2
High Level Languages
• Compilers convert “high level languages” to assembly language
• The original goals of high level languages were two-fold:
• Make it easier to write and to read programs
• Write portable programs that could run on any hardware
• This latter goal meant that the language had to be independent of the
machine architecture… it had to provide a level of abstraction that hid
the details of registers, opcodes, and even memory architecture.
• Computers had to have memory and processing speed to spare!
• Early languages had very rigid formats and few “features” – but
languages evolved, experimenting with new ways of programming
(and making the task of writing compilers progressively harder).
CMPE 220 3
21 Important Programming
Languages
• My list – others may differ
• The purpose of reviewing these languages is to:
• get a sense of how languages evolved and continue to evolve
• understand some of the key features of programming languages today, and
how they affect compilers and code generation
• As we go through the list, keep track of:
• How many of these languages you have heard of
• How many of these languages you have used
CMPE 220 4
A Few Important Languages – the
Early Years
• A (or A-0) – 1952 – created by Grace Hopper, and often considered
the first high-level language
• FORTRAN (FORmula TRANslation) - 1957 – developed at IBM - focus
on math and science programming
• LISP (LISt Processor) – 1958 – built-in support for “list” data structures
– emphasis on mathematical programming
• ALGOL (ALGOrithmic Language) – 1958 – in many ways, ahead of its
time – a very “structured” language, used in teaching for decades
• RPG (Report Program Generator) – 1959 – a non-procedural language
that allowed users to describe/specify the desired output, and how it
was produced.
CMPE 220 5
More Important Languages
• COBOL (COmmon Business Oriented Language) – 1960 – focus on business
programming – user friendly syntax and words – the first portable language
• BASIC (Beginner’s All-Purpose Symbolic Instruction Code) – 1964 – created
as an easy-to-use teaching language, which became popular due to its
simplicity. Often implemented as an interpreter.
• B – 1969 – created at Bell Labs; an ALGOL-like language best known as the
precursor to C
• PASCAL – 1970 – created by Niklaus Wirth as a highly structured
instructional language; quickly replaced ALGOL as the teaching language of
choice
• C – 1972 – developed by Dennis Ritchie at Bell Labs, and popularized by the book he co-wrote with Brian Kernighan. A
powerful (and dangerous!) language frequently used for OS programming.
CMPE 220 6
More Important Languages
• Smalltalk – 1972 – a dynamically typed, object-oriented programming language,
developed at Xerox Palo Alto Research Center (PARC). Slow to catch on, Smalltalk
represented a paradigm shift in programming languages.
• SQL (Structured Query Language) – 1975 – developed at IBM, based on Edgar Codd’s
relational database model
• C++ - 1980 – added object-oriented programming constructs – and a lot of other things –
to C. Very powerful, but often derided by computer language experts as overly complex
and error prone.
• MATLAB (MATrix LABoratory) – 1984 – a multi-paradigm numerical computing language
and toolkit – an early specialized language
• Perl – 1987 – designed for text processing; still frequently used for scripting on POSIX
systems
• Python – 1991 – dynamically-typed, garbage-collected language with an emphasis on
readability
CMPE 220 7
More Important Languages – the
Web Years
• HTML (Hyper-Text Markup Language) – 1989 – a non-procedural text formatting
language that formed the basis of the World Wide Web.
• Ruby – 1995 - dynamically typed and uses garbage collection. Supports multiple
programming paradigms, including procedural, object-oriented, and functional
programming. Originally interpreted.
• Java – 1995 – a garbage-collected, object-oriented language, created by James
Gosling at Sun. Widely used in teaching, and in commercial software development.
• PHP (Personal Home Page) – 1995 – a dynamically typed language with built-in
support for structures. The most widely used language for web programming.
Usually interpreted, though PHP compilers may be used for high-demand
applications.
• JavaScript – 1995 – a lightweight, dynamically typed, interpreted language
embedded in web browsers (and other devices)
CMPE 220 8
A Timeline of Language Advances
Language | Date
A (or A-0) | 1952
FORTRAN (FORmula TRANslation) | 1957
LISP (LISt Processor) | 1958
ALGOL (ALGOrithmic Language) | 1958
RPG (Report Program Generator) | 1959
COBOL (COmmon Business Oriented Language) | 1960
BASIC (Beginner’s All-purpose Symbolic Instruction Code) | 1964
B | 1969
PASCAL | 1970
C | 1972
Smalltalk | 1972
SQL (SEQUEL) | 1975
C++ | 1980
MATLAB (MATrix LABoratory) | 1984
Perl | 1987
HTML | 1989
Python | 1991
Ruby | 1995
Java | 1995
PHP | 1995
JavaScript | 1995
CMPE 220 9
That’s 21 Influential Languages
• I believe these languages are important because:
• Many are still widely used today
• They introduced or popularized new ways of programming
• They influenced later languages
• There are a host of new languages since 2000, but none (in my
opinion) have broken out into the mainstream as truly influential
• How many have you heard of or used?
How Many Languages Have You Used?
• My totals:
• Heard of 21 (obviously, since it’s my list!)
• Used 13: FORTRAN, ALGOL, COBOL, BASIC, SQL, PASCAL, C, C++, HTML,
Python, Java, PHP, JavaScript (plus dozens of others)
Why Are There No Famous Assemblers?
• Assembly languages are, by their nature, tied to a specific machine
architecture
• No assembly language could become ubiquitous: it could not spread
beyond the users of a particular computer architecture
• Assembly languages change as processors evolve, so most don’t
“persist” more than a few years
History
• Kathleen Booth is credited with creating the first assembler in 1947,
while working on the ARC2 (Automatic Relay Calculator) computer at
the University of London.
• The first high level language compiler – for a primitive language called
A-0 – was developed by Grace Hopper in 1952. The A-0 system
combined features of what we would call a compiler, assembler, linker
and loader.
• Several versions followed: A-1, A-2, ARITH-MATIC, FLOW-MATIC
History - FORTRAN
• The first FORTRAN (FORmula TRANslation)
compiler was developed at IBM in 1957, by a
team led by John Backus
• Backus pioneered parsing techniques based on a
formal set of syntax rules (a grammar)
• FORTRAN was the first commercial compiler – it
was an extra cost add-on when you bought an
IBM computer
• It is sometimes called the first modern compiler,
because it employed formal grammar rules
History - COBOL
• The first COBOL (Common Business Oriented Language) compiler was
released in 1960.
• COBOL was designed by an industry-wide committee – proposed by
Mary Hawes, a Burroughs Corporation programmer
• It was the first “standardized” programming language
• It was heavily influenced by Grace Hopper, who advised the
committee
• It was the first language to run on multiple computers (the UNIVAC
II and the RCA 501) – thus achieving the goal of portability
New Challenges
• Assemblers are easy to write.
• No special techniques are needed
• Anyone with a few years of programming experience should be able to write
an assembler
• The earliest assemblers were written using “brute force” programming
• Compilers raised the bar
• It is almost impossible to write a compiler using “brute force” methods
• New programming concepts and techniques were needed – and that’s what
we’ll be talking about this week and next
Assemblers versus Compilers
Assembler
• Scanline(): look for label, opcode, operands, comments
• Processline(): identify the opcode, process operands, etc.
• Second pass: assemble the instructions and generate machine code into an object file
• Symbol Table: stores symbols and information about their type and definition
Compiler
• Scanner: also called a lexical analyzer or tokenizer. Breaks the input into a series of tokens
• Parser: a syntax analyzer that recognizes the structure of the program based on the token stream, and builds an internal representation of the program
• Code generator: processes the internal representation and generates assembly-language code for a specific machine
• Symbol Table: stores symbols and information about their type and definition
Compiler Components
Source Code (character stream) → Scanner (lexical analyzer) → Tokens (types & values) → Parser (syntax analyzer) → Parse Tree (an internal representation of the program)
Parse Tree → Code Generator → Assembly Language → Assembler → •••
Conceptual Design
• We can architect a compiler with three major parts: a scanner, a parser, and a code generator
Scanning
• A scanner converts the input stream into a series of tokens, or lexical
components that are part of the language:
label keyword ( ) + / ; { }
• Scanning typically uses a finite state machine or finite state automaton
driven by an underlying state table.
• The finite state machine is an efficient way to recognize a pre-defined
lexicon of tokens in an input stream – and to report an error when a
token cannot be recognized
Finite State Machines
• Finite State Machine (FSM): an abstract machine that can be in
exactly one of a finite number of states at any given time
• The FSM can change from one state to another in response to inputs
• The change from one state to another is called a transition
• The FSM may define end states which represent the completion of a
sequence of inputs
• We often use diagrams to represent a finite state machine, because
it’s easy to visualize and understand
Turnstile: a Real-World FSM
• Two states: locked and unlocked
A Simple FSM
Recognizes a valid sequence of inputs
A Simple State Transition Table
STATE   a   b   c
1       2   -   -
2       -   3   -
3       -   -   4
(4)     2   -   4
• (n) marks a valid end state; - means there is no valid transition
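To make the table concrete, here is a minimal Python sketch of an FSM driven directly by the table above: each state maps an input character to a next state, a missing entry plays the role of "-", and state 4 is the only end state. The dictionary layout and the function name `accepts` are illustrative choices, not part of the course material.

```python
# Sketch: a table-driven FSM for the simple state transition table.
# A missing entry means "no valid transition".
TRANSITIONS = {
    1: {"a": 2},
    2: {"b": 3},
    3: {"c": 4},
    4: {"a": 2, "c": 4},
}
END_STATES = {4}
START_STATE = 1


def accepts(inputs: str) -> bool:
    """Return True if the FSM ends in an end state after consuming every input."""
    state = START_STATE
    for symbol in inputs:
        next_state = TRANSITIONS[state].get(symbol)
        if next_state is None:  # no valid transition: reject immediately
            return False
        state = next_state
    return state in END_STATES


for text in ["abc", "abcc", "abcabc", "ab", "abca"]:
    print(f"{text!r}: {'accepted' if accepts(text) else 'rejected'}")
```

With this table the machine accepts "abc", "abcc" and "abcabc", but rejects "ab" (it stops in state 3, which is not an end state).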
Additional “Transition” Notation
• A-Z: any uppercase
character
• a-z: any lowercase
character
• 0-9: any digit
• A set of possible
characters:
*/+-
A More Complex (and useful) FSM
STATE   TYPE       A-Z   a-z   0-9   + - * /
1                  2     2     3     4
(2)     Symbol     2     2     2     -
(3)     Integer    -     -     3     -
(4)     Operator   -     -     -     -
• Note that we’ve added a token type to the State Transition Table
A Scanner / Lexical Analyzer / Tokenizer
• Input = character stream
• Output = token stream
• EXAMPLE: Inventory * 25
Symbol Operator Integer
• EXAMPLE: Inventory - Sales
Symbol Operator Symbol
• EXAMPLE: - / Sales Inventory
Operator Operator Symbol Symbol
The Scanner Is Called Repeatedly
• It is used to break the input stream into a sequence of tokens
• In most languages, ‘space’ characters (including tab and end-of-line) can
terminate a token, but are otherwise not part of the language lexicon
STATE   TYPE       space   A-Z   a-z   0-9   +-*/
1                  1       2     2     3     4
(2)     Symbol     -       2     2     2     -
(3)     Integer    -       -     -     3     -
(4)     Operator   -       -     -     -     -
FSM Function
State Transition Table (reminder: (n) indicates a valid end state)
STATE   TYPE       space   A-Z   a-z   0-9   +-*/
1                  1       2     2     3     4
(2)     Symbol     -       2     2     2     -
(3)     Integer    -       -     -     3     -
(4)     Operator   -       -     -     -     -
state = 1;
value = "";
error = none;
while ((inchar = getnextchar()) != EOF) {
    nextstate = STT_lookup(state, inchar);
    if (nextstate == '-') {              // no valid transition
        if (state is an endstate) {
            unget(inchar);               // save inchar for the next token
        }
        else {
            error = "invalid token";
        }
        break;
    }
    state = nextstate;
    if (state != 1)                      // skip leading spaces
        value .= inchar;                 // append inchar to value
}
if (error == none)
    return type[state], value;
else
    return error;
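A runnable Python sketch of the same scanner, assuming the character classes and token types from the state transition table above. Instead of calling `unget()`, it simply leaves the offending character unread so the next call starts there, and `tokenize` shows the scanner being called repeatedly. All function names are hypothetical.

```python
# Sketch of a table-driven scanner; names are illustrative.
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (token type, token value)

# STT[state][character class] -> next state; a missing entry is "-"
STT = {
    1: {"space": 1, "letter": 2, "digit": 3, "op": 4},
    2: {"letter": 2, "digit": 2},
    3: {"digit": 3},
    4: {},
}
TOKEN_TYPE = {2: "Symbol", 3: "Integer", 4: "Operator"}  # the end states


def char_class(ch: str) -> Optional[str]:
    """Map a character to a column of the state transition table."""
    if ch in " \t\n":
        return "space"
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        return "letter"
    if "0" <= ch <= "9":
        return "digit"
    if ch in "+-*/":
        return "op"
    return None  # not part of the lexicon


def next_token(text: str, pos: int) -> Optional[Tuple[Token, int]]:
    """Scan one token starting at pos; return (token, new position), or None at end."""
    state, value = 1, ""
    while pos < len(text):
        ch = text[pos]
        nextstate = STT[state].get(char_class(ch))
        if nextstate is None:  # no valid transition
            if state in TOKEN_TYPE:
                break  # leave ch unread: it starts the next token
            raise ValueError(f"invalid token at position {pos}: {ch!r}")
        state = nextstate
        if state != 1:  # skip leading spaces
            value += ch
        pos += 1
    if state == 1:
        return None  # nothing but spaces remained
    return (TOKEN_TYPE[state], value), pos


def tokenize(text: str) -> List[Token]:
    """Call the scanner repeatedly to turn a character stream into a token stream."""
    tokens: List[Token] = []
    pos = 0
    while (result := next_token(text, pos)) is not None:
        token, pos = result
        tokens.append(token)
    return tokens


print(tokenize("Inventory * 25"))
# [('Symbol', 'Inventory'), ('Operator', '*'), ('Integer', '25')]
```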
A Scanner is NOT a Syntax Analyzer
• The scanner recognizes the language lexicon, or vocabulary
• It will (happily) return sequences of tokens that are not valid syntax:
• + / Sales Inventory
Operator Operator Symbol Symbol
Token Use: Parsing Versus Code Generation
• In addition to the token type, the scanner returns the actual token values
• Inventory - Sales
Symbol (Inventory) Operator (-) Symbol (Sales)
• In order to determine if a string of tokens represents a valid sequence, the
parser needs to know the token types:
Symbol Operator Symbol - VALID
• In order to build code for execution, the code generator must have the
specific values of the tokens:
Inventory - Sales
• In general, the syntax analyzer (parser) is concerned with token type, while
the code generator is concerned with both token type and token value
Compiler Components
High Level Language Source Code (e.g. C++) → Scanner (lexical analyzer) → Tokens (types & values) → Parser (syntax analyzer) → Parse Tree (an internal representation of the program)
Parse Tree → Interpreter – OR – Parse Tree → Code Generator → Assembly Language Source File
Recognizing Syntax: Parsing
• Languages are made up of a lexicon or vocabulary of tokens, which are
combined using grammatical rules (syntax) to form meaningful statements
• The parser recognizes the syntax, or grammatical structure of the token
stream
• The parser returns an internal representation of the program being
compiled. The most common representation is a parse tree.
Scanner (lexical analyzer)
• Recognizes lexical elements in the input stream
• Returns token types and actual token values, e.g. integer (356)
Parser (syntax analyzer)
• Recognizes the grammatical structure of the program
• Returns an internal representation of the program, typically a “parse tree”
History: Describing a Grammar
• The first formal grammar definition dates back to the 4th – 6th
century BC
• An Indian scholar named Pāṇini developed grammatical rules for
Sanskrit
• Even today, grammar is taught by “parsing” a sentence and breaking
it down into a formal structure:
The green vegetables are always
disgusting, and I hate them
History: Describing a Grammar
• In 1959, John Backus (who led the FORTRAN team) had an insight:
grammar rules were useful for things besides torturing students.
• John Backus and Peter Naur created a notation for describing the
grammar, or syntax, of a computer language
• This notation is called BNF, which stands for Backus-Naur Form or
Backus Normal Form
• Many extensions and variants exist today, including Extended Backus–Naur
form (EBNF) and Augmented Backus–Naur form (ABNF).
Backus-Naur Form
• Every rule in Backus-Naur form has the following structure:
<name> ::= expansion
• Every non-terminal symbol is enclosed in angle brackets: <name>
• Symbols may be concatenated in the expansion, indicating a
sequence:
<expr> ::= <term> <operator> <expr>
• Alternative expansions are separated by a vertical bar: |
<expr> ::= <term> <operator> <expr> | <term>
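The recursive rule above maps naturally onto a recursive function. This Python sketch recognizes token lists that match `<expr>`; the definitions of `<term>` (a name or an unsigned integer) and `<operator>` (+ - * /) are assumptions, since the slide leaves them open.

```python
# Sketch: recognizing  <expr> ::= <term> <operator> <expr> | <term>
# Assumed: <term> is a name or an unsigned integer; <operator> is + - * /
import re
from typing import List

OPERATORS = {"+", "-", "*", "/"}


def is_term(token: str) -> bool:
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9]*|[0-9]+", token) is not None


def match_expr(tokens: List[str], pos: int) -> int:
    """Match <expr> starting at tokens[pos]; return the position after it, or -1."""
    if pos >= len(tokens) or not is_term(tokens[pos]):
        return -1
    pos += 1  # matched <term>
    if pos < len(tokens) and tokens[pos] in OPERATORS:
        # First alternative: <term> <operator> <expr> (the rule calls itself)
        return match_expr(tokens, pos + 1)
    # Second alternative: <term> on its own
    return pos


def is_expr(tokens: List[str]) -> bool:
    """True if the whole token list is a single <expr>."""
    return match_expr(tokens, 0) == len(tokens)


print(is_expr(["Inventory", "-", "Sales"]))       # True
print(is_expr(["-", "/", "Sales", "Inventory"]))  # False
```

Because the function commits to the first alternative whenever an operator follows a term, it is only meant for whole-input checks like `is_expr`; a general parser would need look-ahead or backtracking to choose between alternatives.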
BNF Example
<loop statement> ::= <while loop> | <for loop>
<for loop> ::= "for" "(" <expression> ";" <expression> ";" <expression> ")" <statement>
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
Formal Grammars Eliminate Ambiguity
• A = 2 + 3 * 5
Does A equal 25 or 17?
• 25 (all operators have equal precedence and are applied left to right):
<expression> ::= <term> | <expression> <operator> <term>
<term> ::= <variable> | <number>
<operator> ::= "+" | "-" | "*" | "/"
• 17 (multiplication and division bind more tightly than addition and subtraction):
<expression> ::= <product> | <expression> <addoperator> <product>
<product> ::= <term> | <product> <multoperator> <term>
<term> ::= <variable> | <number>
<addoperator> ::= "+" | "-"
<multoperator> ::= "*" | "/"
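One way to see how a grammar decides the answer is to evaluate the same token stream under two grammars: one where all four operators have equal precedence and are applied left to right, and one where * and / bind more tightly than + and -. In this Python sketch, terms are limited to numbers (an assumption), and left-recursive rules are written as loops, which is the usual way a recursive-descent parser handles left recursion.

```python
# Sketch only: terms are restricted to numbers for brevity.
import operator
from typing import List

APPLY = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def eval_flat(tokens: List[str]) -> float:
    """<expression> ::= <term> | <expression> <operator> <term>
    Every operator has the same precedence; apply them left to right."""
    result = float(tokens[0])
    for i in range(1, len(tokens), 2):
        result = APPLY[tokens[i]](result, float(tokens[i + 1]))
    return result


def eval_precedence(tokens: List[str]) -> float:
    """<expression> ::= <product> | <expression> <addoperator> <product>
    <product>    ::= <term> | <product> <multoperator> <term>"""
    pos = 0

    def product() -> float:
        nonlocal pos
        value = float(tokens[pos])
        pos += 1
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            value = APPLY[tokens[pos]](value, float(tokens[pos + 1]))
            pos += 2
        return value

    value = product()
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        value = APPLY[op](value, product())
    return value


tokens = ["2", "+", "3", "*", "5"]
print(eval_flat(tokens))        # 25.0
print(eval_precedence(tokens))  # 17.0
```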
Using a Grammar
• Once a grammar is defined:
• It can be used as a guide for hand-coding a parser, or
• It can be used by a variety of modern tools – such as YACC/Bison or ANTLR –
that will automatically build the code for a parser
• There are many “parser builders” – and each uses its own
specification language for defining the grammar
Grammar “Formalism”
• It is possible to create grammars that are not parseable
• Grammars are categorized based on certain characteristics, and specific
algorithms have been created to parse each class of grammar
• Parser builders are based on specific classes of grammars and their
associated algorithms.
• They will reject grammars that they cannot parse
LR Grammars
• In 1965, Donald Knuth – author of The Art
of Computer Programming - invented
the LR parser (Left to Right, Rightmost
derivation). LR grammars and parsers are
extremely memory intensive.
• In 1969, Frank DeRemer proposed a
simplified version of the LR parser, called
the Look-Ahead LR (LALR) – the most
widely used type of grammar today
• Look-ahead parsers rely on the ability
to see the next token without consuming it
LR versus LL
• Both process tokens left to right
• LL (left-to-right, leftmost derivation) grammars expand or derive the
leftmost non-terminal first
• Given a grammar tree, they attempt to expand the leftmost non-terminal first
• LR (left-to-right, rightmost derivation) grammars expand or derive the
rightmost non-terminal first
• Given a grammar tree, they attempt to expand the rightmost non-terminal
first
• The grammar type affects the specific languages that can be parsed
(and hence the design of programming languages), and the amount of
processing power and memory required to parse them
Parser Generators
• In the early 70s, Stephen Johnson at Bell Labs created the YACC (Yet
Another Compiler Compiler) parser generator, drawing heavily on Knuth’s
algorithms. Its companion, the LEX lexical analyzer generator, was written at
Bell Labs by Mike Lesk and Eric Schmidt.
• Bison, a YACC replacement, is included in most POSIX distributions.
• Currently popular parser generators include JavaCC (Java Compiler
Compiler) and ANTLR (Another Tool for Language Recognition)
• ANTLR generates top-down LL (Left-to-right, Leftmost derivation) parsers.
Grammar classes are written LL(k), where k is the number of tokens of
look-ahead required; ANTLR 4 uses an adaptive LL(*) strategy that looks ahead as far as necessary
• The classes of grammars are beyond the scope of this course, and
won’t appear on assignments or exams.
Example: A Simple Program in “SICTRAN”
PRINT("Fibonacci Sequence \n\n\n");
i = 0;
prev_prev = -1;
prev = 1;
fib = prev_prev + prev;
PRINT(i); PRINT(": "); PRINT(fib);
IF (fib%5 == 0)
    PRINT(" Divisible by 5!");
PRINT("\n");
Grammar for SICTRAN – Using ANTLR
program : stmtList ;
stmtList : stmt ( ';' stmt )* ;
stmt : assignmentStmt
     | ifStmt
     | whileStmt
     | printStmt
     | compoundStmt
     ;
Char(s)     Meaning
:           Separates rule name from expansion
;           End of a rule
|           Or
( … )*      Repeat contents 0 or more times
( … )+      Repeat contents 1 or more times
'c'         Input character c – returned by scanner
NAME        Uppercase NAME indicates a token type that is returned by the scanner
# name      Labels an alternative (ANTLR comments use // or /* */)
• The complete grammar is in the download files in Canvas
Grammar of SICTRAN - continued
assignmentStmt : variable '=' intExpr ;
ifStmt : IF '(' boolExpr ')' stmt ;
whileStmt : WHILE '(' boolExpr ')' stmt ;
compoundStmt : '{' stmtList '}' ;
printStmt : PRINT '(' printArg ')' ;
printArg : variable # printVar
         | STRING # printStr
         ;
variable : IDENTIFIER ;
intExpr : intExpr mulDivOp intExpr
        | intExpr addSubOp intExpr
        | number
        | signedNumber
        | variable
        | '(' intExpr ')'
        ;
• The full ANTLR grammar for SICTRAN has 39 rules
Grammar Diagram for SICTRAN
Using the Grammar: Writing a Parser
• Grammar Tree: Consider all of the rules in a grammar arranged into a
tree structure
• Parsers may walk the tree bottom up or top down, with variants of
each, based on the grammar class and the parser algorithm
• Bottom Up: start with the token in the bottom (or leaf) rule, and
attempt to walk upwards in the tree by matching the input token
stream against the tree.
Using the Grammar: Writing a Parser
• Top down: start with the top-most rule ( program: stmtList ; ) and
recursively walk down the tree looking for matches in the token
stream.
• Reminder: The process in which a function calls itself directly or
indirectly is called recursion and the corresponding function is called
a recursive function
• We will learn how to write a top-down, recursive-descent parser to
build a parse tree, which can then be passed to the code generator
What Does the Parser Do?
• The parser is the heart of the compiler
• Given a set of grammar rules, it analyzes the token stream and
converts it to a data structure that can be used to generate code…
typically a parse tree
• It calls on a scanner, or lexical analyzer, which accepts characters
from the input stream and returns tokens to the parser.
Compiler Components - reminder
High Level Language Source Code (e.g. C++) → Scanner (lexical analyzer) → Tokens (types & values) → Parser (syntax analyzer) → Parse Tree (an internal representation of the program)
Parse Tree → Interpreter – OR – Parse Tree → Code Generator → Assembly Language Source File
Aside: Data Structures and Algorithms
• As we get deeper into compilers, we will talk about a number of data
structures, such as stacks, tables, arrays, trees, linked lists, and so on.
• There are well established coding techniques to create, manipulate,
and traverse these structures. There are formal courses and books
that explore them:
• CMPE 126: Algorithms and Data Structure Design
• CMPE 180A: Data Structures and Algorithms in C++
• Don Knuth’s Art of Computer Programming, volumes 1-3
• The pioneers in computing had to invent everything from the ground
up, but modern software developers can draw on decades of
knowledge
Building a Parser by Hand
• A top-down, recursive-descent parser can be coded entirely by hand,
simply by writing a function for each rule.
• The “rule” functions may call functions for other rules
• Rule functions accept parse trees returned by the lower level
functions they call, and combine those trees… returning the resulting
tree to the caller.
• Let’s look at some concrete examples…
Top-Down, Recursive Descent Parser
program : stmtList ;
function parse_program {
if ((tree = parse_stmtList()) == error)
return error;
return tree;
}
Top-Down, Recursive Descent Parser
stmtList : stmt ( ';' stmt )* ;
function parse_stmtList {
if ((tree = parse_stmt()) == error)
return error;
while (parse_SEMICOLON() != error) {
if ((subtree = parse_stmt()) != error)
attach subtree to tree;
else {
ungetToken(); // the ability to “unget” a token is called a look-ahead rule
break;
}
}
return tree;
}
Top-Down, Recursive Descent Parser
SEMICOLON : ‘;’ ;
function parse_SEMICOLON {
nextToken = getNextToken(); // call to scanner
if (nextToken.type == SEMICOLON)
return nextToken;
else {
ungetToken(); // the ability to “unget” a token is called a look-ahead rule
return error;
}
}
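The pseudocode above can be turned into a small runnable Python sketch. To keep it short, `stmt` is simplified to a hypothetical `IDENTIFIER '=' INTEGER` rule instead of SICTRAN's full `stmt` rule, and the scanner is replaced by a ready-made list of `(type, value)` tokens; the token type names are assumptions. Each grammar rule becomes one method, and subtrees are returned as nested tuples.

```python
# Sketch only: token type names and the simplified stmt rule are assumptions.
#   program  : stmtList ;
#   stmtList : stmt ( ';' stmt )* ;
#   stmt     : IDENTIFIER '=' INTEGER ;   (stand-in for SICTRAN's stmt rule)
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (type, value) as returned by a scanner


class Parser:
    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def get_next_token(self) -> Optional[Token]:
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else None
        self.pos += 1
        return tok

    def unget_token(self) -> None:
        self.pos -= 1  # push the most recent token back (look-ahead)

    def expect(self, token_type: str) -> Optional[Token]:
        """Like parse_SEMICOLON: consume a token of this type, or unget it."""
        tok = self.get_next_token()
        if tok is not None and tok[0] == token_type:
            return tok
        self.unget_token()
        return None

    def parse_program(self):
        tree = self.parse_stmtList()
        if tree is None or self.pos != len(self.tokens):
            return None  # error: no statement, or tokens left over
        return ("program", tree)

    def parse_stmtList(self):
        first = self.parse_stmt()
        if first is None:
            return None
        children = [first]
        while self.expect("SEMICOLON") is not None:
            subtree = self.parse_stmt()
            if subtree is None:
                self.unget_token()  # give the ';' back
                break
            children.append(subtree)  # attach subtree to tree
        return ("stmtList", children)

    def parse_stmt(self):
        start = self.pos
        name = self.expect("IDENTIFIER")
        if name is not None and self.expect("EQUALS") is not None:
            value = self.expect("INTEGER")
            if value is not None:
                return ("assignment", name[1], int(value[1]))
        self.pos = start  # not a statement: restore the token stream
        return None


tokens = [("IDENTIFIER", "i"), ("EQUALS", "="), ("INTEGER", "0"), ("SEMICOLON", ";"),
          ("IDENTIFIER", "prev"), ("EQUALS", "="), ("INTEGER", "1")]
print(Parser(tokens).parse_program())
```

Because `;` separates statements rather than terminating them, a trailing `;` leaves an unconsumed token and `parse_program` reports an error.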
The Parse Tree – the Output of the Parser
• Sample Parse Tree (based on the example on the previous few slides)
program
stmtList
The Parse Tree – Additional Examples
• Inventory - Sales
Expression
The Parse Tree – Additional Examples
• Inventory = Inventory - Sales
Assignment
Variable Expression
Inventory
Hand Coding Versus Parser Builders
• It is certainly possible to write a parser entirely by hand, by writing a
function for each rule in the grammar
• A typical term project in compiler classes
• SICTRAN – a very simple, PASCAL-like language, has 39 rules
• Grammars for modern real-world programming languages may have
many thousands of rules
• Parser builders make it feasible to create parsers for large, complex
languages
• They are also useful for simple languages, to avoid the minutiae of
parser programming
Code Generation (hand waving)
• To generate code, we “walk” the tree and take appropriate action for
each statement type
Walking the Parse Tree
• In the previous example, why did we visit and process the tree nodes
in a rather strange order?
• That’s the responsibility of the functions written for the code
generator
• The code generator, like the parser, will have a separate function for
each node type.
• The code in that function will determine how – and in what order - it
processes child nodes
Fibonacci - A Simple Program in “SICTRAN”
PRINT("Fibonacci Sequence \n\n\n");
i = 0;
prev_prev = -1;
prev = 1;
fib = prev_prev + prev;
PRINT(i); PRINT(": "); PRINT(fib);
IF (fib%5 == 0)
    PRINT(" Divisible by 5!");
PRINT("\n");
0: 0 Divisible by 5!
1: 1
2: 1
3: 2
4: 3
5: 5 Divisible by 5!
6: 8
7: 13
8: 21
9: 34
...
Fibonacci Parse Tree – a 14-line program!
CMPE 220
Compiler Components
High Level Language Source Code (e.g. C++) → Scanner (lexical analyzer) → Tokens (types & values) → Parser (syntax analyzer) → Parse Tree (an internal representation of the program)
Parse Tree → Interpreter – OR – Parse Tree → Code Generator → Assembly Language Source File
Elements of a Compiler
• Scanner: deals with the lexicon, or vocabulary, of the source language
• Parser: deals with the syntax, or grammar, of the source language
• Code Generator: deals with the semantics, or meaning, of the language
The Parse Tree
• A data structure generated by the parser, and used as input by an
interpreter or code generator
• Inventory = Inventory - Sales
Assignment
Variable Expression
Inventory
Notation for Examples
• Because we won’t be discussing how a tree data structure is
implemented, I will simply assign a number to each node for purposes
of discussion. This is just for convenience during the lecture!
1 Assignment
2 Variable 3 Expression
Inventory
Code Generation
• The code generator “walks” the tree, visiting each node.
• It has a function corresponding to each node type, e.g.
function processAssignment(1)
1 Assignment
2 Variable 3 Expression
Inventory
Code Generation: expression
• The function processExpression(3) will generate code that will
subtract Sales from Inventory, and return the value.
• Should it generate an integer subtract, or a floating point subtract?
(We’ll answer that question in a few minutes)
1 Assignment
2 Variable 3 Expression
Inventory
Symbol Use: Compiler Versus Assembler
High Level Program:
AdjustInv   Inventory = Inventory - Sales;
Assembly Program:
AdjustInv   LDA   Inventory
            SUB   Sales
            STA   Inventory
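A minimal Python sketch of that idea: one function per node type walks the parse tree for `Inventory = Inventory - Sales` and emits the SIC-style instructions shown above. The tuple-based node layout and the function names are assumptions, and only variable operands with + and - are handled.

```python
# Sketch only: tuple-based tree nodes and function names are illustrative.
from typing import List

OPCODES = {"+": "ADD", "-": "SUB"}  # SIC accumulator arithmetic


def process_expression(node, code: List[str]) -> None:
    """Emit code that leaves the expression's value in register A."""
    _, left, op, right = node                 # ("expression", left, op, right)
    code.append(f"LDA {left[1]}")             # A <- left variable
    code.append(f"{OPCODES[op]} {right[1]}")  # A <- A op right variable


def process_assignment(node, code: List[str]) -> None:
    """Emit code for  target = expression."""
    _, target, expr = node                    # ("assignment", ("variable", name), expr)
    process_expression(expr, code)
    code.append(f"STA {target[1]}")           # store A into the target variable


tree = ("assignment", ("variable", "Inventory"),
        ("expression", ("variable", "Inventory"), "-", ("variable", "Sales")))
code: List[str] = []
process_assignment(tree, code)
print("\n".join(code))
```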
Symbol Table: Language Dependencies
• Languages may require all types, variables, and functions to be declared
before they are used
• Because declarations precede their use, the code generator can build
the symbol table and generate the code in a single pass
• If the language does not require declarations to precede use, then the
code generator will require two passes:
• Pass 1 builds symbol table
• Pass 2 generates code
Symbol Table: Language Dependencies
• The symbol table may need to store definitions for complex data
types: single- and multi-dimensional arrays, structures, and so on
• These type definitions are needed to generate the correct code both
to create the data structures, and to access them
• The details on accomplishing this are usually covered in an
intermediate compiler class
Generating a Symbol Table
• Two Pass code generation
• First pass walks the parse tree and builds the symbol table
• Second pass emits code
• Single Pass code generation
• Symbol table is built on the fly as symbols are encountered in the parse tree
• Much easier with languages that require symbols to be defined before use
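A Python sketch of two-pass generation, assuming a simplified statement list instead of a real parse tree: pass 1 records every declaration in the symbol table, so pass 2 can look up a variable that is used before it is declared. The statement shapes, the type check, and all names are illustrative.

```python
# Sketch only: statement shapes and names are illustrative assumptions.
#   ("declare", name, type)     declares a variable
#   ("assign", target, source)  copies one variable into another
from typing import Dict, List, Tuple

Stmt = Tuple[str, str, str]


def build_symbol_table(stmts: List[Stmt]) -> Dict[str, str]:
    """Pass 1: record every declared symbol and its type."""
    table: Dict[str, str] = {}
    for kind, name, type_name in stmts:
        if kind == "declare":
            if name in table:
                raise ValueError(f"duplicate declaration: {name}")
            table[name] = type_name
    return table


def generate(stmts: List[Stmt], table: Dict[str, str]) -> List[str]:
    """Pass 2: emit code, consulting the completed symbol table."""
    code: List[str] = []
    for kind, target, source in stmts:
        if kind != "assign":
            continue
        for name in (target, source):
            if name not in table:
                raise NameError(f"undeclared symbol: {name}")
        if table[target] != table[source]:
            raise TypeError(f"type mismatch in {target} = {source}")
        code += [f"LDA {source}", f"STA {target}"]
    return code


program = [("assign", "total", "count"),  # used before it is declared
           ("declare", "count", "int"),
           ("declare", "total", "int")]
print(generate(program, build_symbol_table(program)))  # ['LDA count', 'STA total']
```

A single-pass generator would reach `total = count` before either declaration and would have to reject it (or patch it later).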
Symbol Table: Generated Symbols
• The code generator may also need to create new symbols that don’t
appear in the source program
• Symbols pointing into complex data structures and allowing them to be
referenced
• Symbols as jump targets
• Address labels for literals
Generated Symbols – Data Structures
Symbols pointing into complex data structures and allowing them to be
referenced in assembly language
• struct product { int weight; double price; } apple;
• apple_weight
• apple_price
Generated Symbols – Jump Targets
• if ( var1 == var2)
{
some code;
}
LDA var1
COMP var2
JLT generated_label_1
JGT generated_label_1
some code
generated_label_1 next instruction
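A Python sketch of how a code generator might create the jump target for the comparison above. The counter-based naming scheme and the `LabelGenerator` / `gen_if_equal` names are hypothetical; the emitted sequence mirrors the SIC code on the slide, with the generated label attached to the instruction that follows the if-block.

```python
# Sketch only: label naming and helper names are hypothetical.
from typing import List


class LabelGenerator:
    """Hands out symbols that do not appear in the source program."""

    def __init__(self) -> None:
        self._count = 0

    def new_label(self) -> str:
        self._count += 1
        return f"generated_label_{self._count}"


def gen_if_equal(labels: LabelGenerator, var1: str, var2: str,
                 body: List[str], next_instruction: str) -> List[str]:
    """Emit SIC-style code for: if (var1 == var2) { body }"""
    skip = labels.new_label()
    code = [f"LDA {var1}",
            f"COMP {var2}",
            f"JLT {skip}",  # var1 < var2: jump past the body
            f"JGT {skip}"]  # var1 > var2: jump past the body
    code += body            # reached only when var1 == var2
    code.append(f"{skip} {next_instruction}")  # label the instruction after the block
    return code


labels = LabelGenerator()
for line in gen_if_equal(labels, "var1", "var2", ["some code"], "next instruction"):
    print(line)
```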
Generated Symbols - Literals
• myString = "the cow jumped over the moon";
...
LDA literal_01
STA myString
Code Generation: Pass 2
• We’ve built the symbol table in Pass 1, so we know the type of the
variables (integer). Let’s look at some code
3 Expression
case '-':
    subIntVariable(6);
    break;
3 Expression → 4 Variable (Inventory), 5 Operator (-), 6 Variable (Sales)
Code Generation: A Simple Program in “SICTRAN”
PRINT("Fibonacci Sequence \n\n\n");
i = 0;
prev_prev = -1;
prev = 1;
fib = prev_prev + prev;
PRINT(i); PRINT(": "); PRINT(fib);
IF (fib%5 == 0)
    PRINT(" Divisible by 5!");
PRINT("\n");
Generated Code
Code Generation: Local Data
This is a machine dependent compiler feature
• Data (local variables) may be stored with the subroutine code,
but it will be overwritten if the subroutine calls itself (recursion)
Figure: a subroutine’s local variables stored alongside the subroutine code
Code Generation: Local Data & Recursion
This is a machine dependent compiler feature
• Many languages save subroutine arguments,
local variable instances, and the return address
each time a subroutine is called
• Necessary for recursion
• The best mechanism for doing this is a stack
• Each time a subroutine is called, arguments are
placed on the stack, and space is allocated for
local variables
• Code needs to be emitted at the start and the end of
a subroutine to handle this
Code Generation: The Stack
• When contained in a subroutine, variable references are not fixed –
we cannot simply pass the labels through to the assembler
• Local variables need to be “flagged” in the symbol table
• Address references for local variables are relative to the stack pointer
• Accomplishing this depends on the addressing modes available on the
underlying machine
Code Optimizations: Register Allocation
Register allocation is a machine dependent code optimization
• Registers are faster than memory – so we want to make effective use
of them!
• Optimizing registers requires knowledge of the machine architecture
• The code generator must have a register allocation function to keep
track of which registers are in use as the parse tree is walked and
code is generated
Code Optimizations: Register Allocation
Reduce Memory Accesses by Eliminating Redundant Loads/Stores
OldInventory = Inventory;
Inventory = Inventory - Sales;
Unoptimized:
LDA Inventory
STA OldInventory
LDA Inventory
SUB Sales
STA Inventory
Optimized:
LDA Inventory
STA OldInventory
SUB Sales
STA Inventory
Code Optimizations: Register Allocation
• One Register versus Multiple Registers
if (expression1 == expression2)
One Register:
• Evaluate expression1 and return result in A
• Store A in a temporary memory location
• Evaluate expression2 and return result in A
• Compare A to the temporary memory location
Multiple Registers:
• Evaluate expression1 and return result in A
• Evaluate expression2 and return result in B
• Compare A to B
Code Optimizations: Register Allocation
Use Registers for Most Frequently Accessed Data
• Within each code block (single-entry, single-exit)
• Load data into registers when entering block, store when exiting block
• Determine most-referenced data locations
• Keep the values of those locations in registers
Code Optimizations: Register Allocation
Optimize Register Usage in Inner Loops
• Programs spend most of their time in “inner loops”
• The register allocation function should place a higher emphasis on
register usage as nesting depth increases
• Optimization doesn’t always make the right decisions!
• A deeply-nested inner loop that is executed three times is less critical than an
outer loop that is executed 1,000 times
Code Optimizations: Invariant Code
This is a machine independent code optimization
• Invariant Code Optimization: computations that do not change
should be removed from loops:
Code Optimizations: Invariant Code
• Invariant Code Optimization: the optimized assembly language
would be equivalent to:
stop = max-1;
for (index = 0; index < stop; index++ ) {
    if (array[index] > array[index+1]) {
temp = array[index];
array[index] = array[index+1];
array[index+1] = temp;
}
}
Compiler Output Options
High Level Language Source Code (e.g. C++) → Scanner (lexical analyzer) → Tokens (types & values) → Parser (syntax analyzer) → Parse Tree (an internal representation of the program), which then goes to one of:
• Assembly Code Generator → Assembly Language Source File
– OR –
• Abstract Code (P-code) Generator → P-code Language Source File
– OR –
• Interpreter
Compiler Output Options: Assembly
• The original – and still the primary – purpose of a high-level language
compiler is to convert the high-level code to assembly language for a
particular machine
• There are other options that can be considered for different purposes
Compiler Output Options: P-code
• P-code: precompiled code, or portable code, or Pascal code
• Assembly language code for an abstract (hypothetical) machine
• P-code may be interpreted
• P-code may be translated into an assembly language instruction set
for the current machine
• P-code may be “compiled” on the fly to machine code
• Since P-code interpreters and translators can be built for almost any machine,
writing a compiler that emits P-code is a quick way to achieve
language portability
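To illustrate why P-code is portable, here is a Python sketch of an interpreter for a tiny, hypothetical stack-based instruction set. The instruction names (PUSH, LOAD, STORE, ADD, SUB, PRINT) are invented for this example; real P-code sets such as UCSD Pascal's or JVM bytecode are much richer. The same instruction list runs unchanged wherever the interpreter runs.

```python
# Sketch only: a hypothetical stack-based P-code, not a real instruction set.
from typing import Dict, List, Tuple

Instr = Tuple[str, ...]


def run(program: List[Instr], memory: Dict[str, int]) -> List[int]:
    """Execute P-code instructions one by one; return everything PRINTed."""
    stack: List[int] = []
    output: List[int] = []
    for op, *args in program:
        if op == "PUSH":            # push a constant
            stack.append(int(args[0]))
        elif op == "LOAD":          # push a variable's value
            stack.append(memory[args[0]])
        elif op == "STORE":         # pop into a variable
            memory[args[0]] = stack.pop()
        elif op in ("ADD", "SUB"):  # pop two operands, push the result
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left - right)
        elif op == "PRINT":         # pop and output
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction {op}")
    return output


# Inventory = Inventory - Sales; PRINT(Inventory)
pcode = [("LOAD", "Inventory"), ("LOAD", "Sales"), ("SUB",),
         ("STORE", "Inventory"), ("LOAD", "Inventory"), ("PRINT",)]
print(run(pcode, {"Inventory": 100, "Sales": 22}))  # [78]
```

A translator or JIT compiler would walk the same list but emit machine instructions instead of executing each one.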
Building Software
• High Level Language Source Code (e.g. C++) → Compiler → P-Code Source Code
• P-Code Source Code → Translator → Assembly Language Source Code
• P-Code Source Code → Interpreter
• P-Code Source Code → JIT Compiler → Machine Code
Compiler Output Options: P-code Examples
• P-code: may also refer to Pascal code –
an early portable layer
• PASCAL – 1970 – created by Swiss
computer scientist Niklaus Wirth as a
highly structured instructional language;
quickly replaced ALGOL as the teaching
language of choice
• First P-code emitter - 1973
• UCSD P-Machine – 1977 – widely used in
academia, as well as commercially
Compiler Output Options: P-code Examples
• JVM (Java Virtual Machine): a virtual
machine built at Sun Microsystems by
James Gosling’s team – 1995 – specifically for
Java, but now widely used
• Virtual machine instruction set
• Manages memory and system resources
Interpreters
• Early interpreters (e.g. BASIC – 1964): Line-by-line scanning, parsing
and interpreting
Modern Approaches
• Scan and Parse entire program – generate parse tree
• Interpret parse tree rather than source code
• Compile on-the-fly – as each line or statement is encountered, scan it,
parse it, and generate code.
• Great for optimizing loop performance
Pure Interpreters Are Applications
• A traditional interpreter is not “system software”
• They don’t produce executable code
• They don’t have system dependencies
• They simply perform operations that are described to them in a high-
level programming language
Really? Just an Application?
• Write a calculator application that accepts keyboard input - any string
of numbers separated by operators – and prints the result:
15.3 / 12.77 + 0.8 * 87
• You could write that application. You might write a scanner to break
the input down into tokens
• Now let’s add the ability to group operations:
17.4 * (18.2 - 3.0) + (7.77 / 8.9)
• Maybe at this point you’ll write a parser to break down the elements
and build a tree structure that you can walk to actually perform the
operations
Yes. Just an Application.
• Now let’s add the idea of variables, so we can store and reuse values:
savethis = 17.4 * (18.2 - 3.0) + (7.77 / 8.9)
result = 0.15 * savethis
• You can parse this into a parse tree, and walk the tree to perform the
operations.
• You’ll need to create a data structure to save the contents of the
variables:
Variable Name   Contents
savethis        265.353
result          39.803
• But it’s still just an application!
• It doesn’t generate code
• It isn’t system dependent
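A compact Python sketch of the calculator described above: a scanner, a recursive-descent parser that builds a tree of tuples, and an executor that walks the tree and keeps variables in a dictionary. For brevity the scanner uses a regular expression rather than a hand-built FSM, and unary minus is not supported; all names are illustrative.

```python
# Calculator sketch (all names are illustrative):
#   statement  ::= NAME "=" expression | expression
#   expression ::= product (("+" | "-") product)*
#   product    ::= factor (("*" | "/") factor)*
#   factor     ::= NUMBER | NAME | "(" expression ")"
import operator
import re
from typing import Dict, List, Union

Tree = Union[float, str, tuple]  # number, variable name, or (op, left, right)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

# Scanner: a regular expression stands in for a hand-built FSM.
TOKEN_RE = re.compile(r"\s*(?:(\d+(?:\.\d*)?|\.\d+)|([A-Za-z_]\w*)|(\S))")


def tokenize(line: str) -> List[str]:
    tokens = []
    for number, name, symbol in TOKEN_RE.findall(line):
        if symbol and symbol not in "+-*/()=":
            raise SyntaxError(f"unexpected character {symbol!r}")
        tokens.append(number or name or symbol)
    return tokens


def is_name(tok: str) -> bool:
    return tok[0].isalpha() or tok[0] == "_"


class Parser:
    """Recursive-descent parser that builds a tree of nested tuples."""

    def __init__(self, tokens: List[str]):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def statement(self) -> Tree:
        if len(self.tokens) > 1 and is_name(self.tokens[0]) and self.tokens[1] == "=":
            name = self.take()
            self.take()  # skip "="
            tree = ("=", name, self.expression())
        else:
            tree = self.expression()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def binary(self, operand, ops) -> Tree:
        # Left-associative loop: operand (op operand)*
        tree = operand()
        while self.peek() in ops:
            op = self.take()
            tree = (op, tree, operand())
        return tree

    def expression(self) -> Tree:
        return self.binary(self.product, ("+", "-"))

    def product(self) -> Tree:
        return self.binary(self.factor, ("*", "/"))

    def factor(self) -> Tree:
        tok = self.take()
        if tok == "(":
            tree = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if is_name(tok):
            return tok  # variable reference
        if tok[0].isdigit() or tok[0] == ".":
            return float(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def execute(tree: Tree, variables: Dict[str, float]) -> float:
    """Executor: walk the tree and perform the operations."""
    if isinstance(tree, float):
        return tree
    if isinstance(tree, str):
        return variables[tree]  # KeyError if the variable was never assigned
    op, left, right = tree
    if op == "=":
        variables[left] = execute(right, variables)
        return variables[left]
    return OPS[op](execute(left, variables), execute(right, variables))


variables: Dict[str, float] = {}
for line in ["savethis = 17.4 * (18.2 - 3.0) + (7.77 / 8.9)", "result = 0.15 * savethis"]:
    print(line, "->", round(execute(Parser(tokenize(line)).statement(), variables), 3))
```

Nothing here generates machine code or depends on the host system, which is the point of the slide: the whole pipeline is ordinary application code.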
Interpreted Languages Can Be Very Rich
• We started with a simple calculator, added grouped expressions, and
then assignments and variables.
• We can continue to add features:
• Type declarations
• While loops
• If/then/else statements
• Functions and function calls
• The process of writing an interpreter is the same:
• scanner
• parser
• Executor (walks the tree and executes instructions)
Interpreting a Parse Tree
Variable values: Sales = 22
Interpreting a Parse Tree
function processExpression( node = 3 ) {
    leftValue = processVariable(4);
    rightValue = processVariable(6);
    switch ( getValue(5) ) {
        case '-':
            result = leftValue - rightValue;
            break;
    }
    return result;
}
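The same walk in runnable Python, using the lecture's node numbering (1 Assignment, 2/4/6 Variable, 3 Expression, 5 Operator). The dictionary layout, the starting value of Inventory (100), and the function names are assumptions; Sales = 22 comes from the slide.

```python
# Sketch only: node layout, function names and Inventory's starting value are assumptions.
from typing import Dict

NODES = {
    1: ("Assignment", 2, 3),
    2: ("Variable", "Inventory"),
    3: ("Expression", 4, 5, 6),
    4: ("Variable", "Inventory"),
    5: ("Operator", "-"),
    6: ("Variable", "Sales"),
}


def process_variable(node: int, memory: Dict[str, int]) -> int:
    return memory[NODES[node][1]]


def process_expression(node: int, memory: Dict[str, int]) -> int:
    _, left, op, right = NODES[node]
    left_value = process_variable(left, memory)
    right_value = process_variable(right, memory)
    symbol = NODES[op][1]
    if symbol == "-":
        return left_value - right_value
    if symbol == "+":
        return left_value + right_value
    raise ValueError(f"unsupported operator {symbol}")


def process_assignment(node: int, memory: Dict[str, int]) -> None:
    _, target, expr = NODES[node]
    memory[NODES[target][1]] = process_expression(expr, memory)


memory = {"Inventory": 100, "Sales": 22}
process_assignment(1, memory)
print(memory["Inventory"])  # 78
```

Compared with the code generator, the only difference is that each function computes a value instead of emitting instructions.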
Why Write an Interpreter?
Easy to Use / Great for Beginners
• BASIC (Beginner’s All-Purpose Symbolic Instruction Code) – 1964 –
created as an easy-to-use teaching language, which became popular
due to its simplicity
• Immediate syntax checking
• Immediate execution – doesn’t require a complicated build process
Why Write an Interpreter?
Portability!
• Allows languages to be run on any operating system and any
architecture without writing a compiler
• PHP – 1995 – now the most widely used server-side language for web
applications.
• JavaScript – 1995 – embedded in web browsers on a wide range of
platforms. Portability is integral to the very concept of JavaScript.
PHP and Facebook: Case Study
• Facebook was originally written in PHP
• To improve performance, Facebook adopted the HipHop translator in
2010, which translated the PHP code to C++ (which was then
compiled)
• The compiled C++ code improved performance by a factor of two
• Issues: some PHP language constructs did not translate well with HipHop
• In 2013, Facebook switched to the HipHop Virtual Machine (HHVM), a
p-code (bytecode) abstract machine.
• PHP code is compiled into the HHVM instruction set
• HHVM instructions are compiled on demand into machine code
• New code is being written in Hack, a PHP-like language with static
(gradual) type checking
PHP and Facebook: the Takeaway
• There is a range of technologies today that can be used for software
development and production operations:
• Macro Processors
• Compilers
• Cross-compilers
• Language Translators
• Assemblers
• Interpreters
• IDEs & Debuggers
• Just-In-Time Compilers
• Key software concepts provide a basis for all of these technologies:
• Lexical analysis (scanning)
• Syntax analysis (parsing)
• Formal grammars
• Semantic processing
• Code generation
• Interpreting
For Next Week
• Log in to Canvas and complete Assignment 5
CMPE 220
What is an Operating System?
Wikipedia: An operating system (OS) is system software that manages
computer hardware, software resources, and provides common
services for computer programs.
TechTerms: An operating system, or "OS," is software that
communicates with the hardware and allows other programs to run. It
is comprised of system software, or the fundamental files your
computer needs to boot up and function.
HowToGeek: An operating system is the primary software that
manages all the hardware and other software on a computer. The
operating system, also known as an “OS,” interfaces with the
computer’s hardware and provides services that applications can use.
What is an Operating System?
[Figure: layers of a computer system, top to bottom]
• Applications (business programs, scientific programs, utility programs) and System Software (compilers, assemblers, linkers, loaders, debuggers, databases)
• Operating System
• Computer Hardware (instruction set, I/O architecture, memory)
History
• The first computers (1940s) did not have an operating system.
• Computers had a primitive absolute loader – often stored in some
form of non-volatile memory
• Programs were loaded – typically from punched cards – and had
complete control over the hardware
• Programmers were responsible for memory management and I/O
• When the program was done (or crashed), the next user would take
over the computer and load their program
• Microcomputers in the 1970s followed the same path, except
programs were usually loaded from punched paper tape
History: Job Control Languages (JCL)
• The first step forward
(1950s) was the ability to
automatically load and
run a series of programs,
using a primitive “job
control language”
(JCL)
History: Job Control Programs
• This required a small “job control program” to remain resident in
memory.
• This was a tradeoff. It used (precious) memory, but it made more
efficient use of the computer
• Batch processing
• Moved on quickly when programs didn’t work
• Did not require the programmer to be present
• This began a slippery slope
• Functions were added to the resident job control program to reduce
programmers’ work. Programming became simpler, but the control program got
bigger.
A Typical Job
• A Fortran Compiler
• 3 ½ boxes of punched cards
• Each box = 2000 cards (about
ten pounds each)
• A 35 pound program!
Late 1950s: Recognizable Operating
Systems
• IBM 1401 (1959)
[Figure: IBM 1401 system components: 1402 Card Read Punch, 1407 Console, 1401 CPU, 729 Tape Drive, 1403 Line Printer]
The IBM 1401: the Model T of
Computing
• 1959-1971
• Inexpensive
• Decimal (BCD) Arithmetic
• High Sales Volume: over 12,000 sold
• By the mid-1960s, almost half the computers in the world were 1401s
• There is a fully restored and working IBM 1401 at the Computer
History Museum in Mountain View
Interested in Early IBM History?
• Light Blue: An Entry Level and Mid
Management Perspective of IBM's Evolution
Through the Golden Years
• Justin “Jud” McCarty
• June, 2020
Building Software
[Figure: building software]
High-Level Language Source Code (e.g. C++) → Compiler → Assembly Language Source Code → Assembler → Binary Machine Code → Linker (optional) → Executable Code → Loader → In-Memory Code → Hardware Execution
Building Software: 1950s
[Figure: building software in the 1950s]
Linker (optional) → Executable Code → Loader → In-Memory Code → Hardware Execution
What Does a Modern Operating
System Do?
1. Process Management
• Interprocess Communications
2. Input / Output (I/O) Management
3. Memory Management
4. File System Management
5. System Functions and Kernel Mode
6. User Interaction – (maybe)
(1) Process Management
• A modern computer runs many programs at the same time
• Each instance of a running program is called a process
• A program is just code
• A process is code, data, and state information running on a computer
• Each process has its own address space… in effect, it behaves as if it is
the only program running
• It’s important to understand that the operating system itself runs as
one or more of these processes
Processes – Mac OS: Activity
Monitor
Processes – POSIX: ps -axl command
How Processes Are Created
• A system function call is required to create a new process
• A system function call is required to invoke the loader, load a
program, and start execution
• There may be two separate calls, or a single call to do both
• In POSIX systems:
• The fork() system call duplicates the currently running process – leaving TWO
processes running the identical code, with identical states
• The exec() family of system calls (for example, execl() and execvp()) loads an executable into the current process, replacing its program
• For one program to “launch” another, it first calls fork(), and then the child
process calls exec()
Fork() and Exec()
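#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    pid_t pid;
    if ((pid = fork()) < 0) {   // After this call there should be 2 processes running
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {             // This is the child process
        execl("/users/robert/fibonnacci", "fibonnacci", (char *)NULL);  // Invoke the loader
        perror("exec");         // Only reached if execl() fails
        exit(EXIT_FAILURE);
    }
    // Parent process continues
    printf("Process ID of child is %d\n", (int)pid);
    return 0;
}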
Fork() and Exec()
[Figure: Process One runs Program A; fork() creates Process Two as a copy running Program A, and exec() then replaces Process Two's code with Program B's code]
How Processes are Terminated
• The program may call a system function, such as the POSIX exit()
function, that does some cleanup and deletes the process
• When a program terminates without calling exit(), the system
automatically executes some code that does the same thing
• Alluded to in earlier lectures
• With appropriate permissions, one process may force the deletion of
another process by calling a system function such as the POSIX kill()
function
• The system may also have security features that will automatically kill
processes that use too much CPU, memory, etc.
Interrupts
• A multiprocessing operating system relies on interrupts
• An interrupt is a signal sent to the processor that interrupts the
current process
• It may be generated by a hardware device or a software program
• Types of interrupts
• Timer
• I/O state change (such as operation complete)
• A signal from one process to another
What an Interrupt Does
[Figure: an interrupt transfers control from the currently executing program to the interrupt handler]
Saving the Process State
• In order to allow us to resume execution of the current process, the
interrupt handler must save the state (such as the registers) of the
currently executing program in a data structure called a Process
Control Block (PCB)
• There is a PCB for each process
Contents of a Process Control Block
(PCB)
Process Scheduling
• The operating system makes sure that every program gets some time
to run.
• Scheduler: a component of the operating system that determines
which process runs next
• Dispatcher: The task of switching control to another process is called
dispatching, and the code that accomplishes this is called the
dispatcher
• Time Slicing: The operating system uses a timer interrupt to
periodically regain control so that it can switch from one process to
another
Scheduling Algorithms
• Round Robin: All processes get an equal time-slice, in order
• Priority Scheduling: Some processes may be scheduled as higher
priority, and get more or longer time slices
• The Apollo Lunar Module’s primary guidance computer shared one processor among many tasks
• The process that adjusted the attitude control jets ran at a higher priority
than the process that updated the display
• Adaptive Scheduling: adjust scheduling based on process
performance
• A system may give less time to “CPU hogs”
• Many other algorithms are possible
Switching Processes
[Figure: the interrupt handler takes control from the currently executing process and then passes it to the next process]
Process Switching and I/O
• Typically, when a program starts an I/O operation, it can’t proceed
until the operation is complete
• In terms of processor speed, I/O operations take a really, really long
time
• The system scheduler will not give control to a process that is waiting
for I/O.
• We say that the process is blocked – or in an I/O wait state
What Happens When a Program
Initiates I/O
• The program calls an operating system routine to start the I/O
operation
• The operating system routine will:
• Save the process state in its PCB
• Set a flag in the process’s PCB to indicate that the program is in an I/O wait
state
• Initiate the I/O operation
• Call the scheduler to determine which process to execute next
• Call the dispatcher to give control to that process
When an I/O Operation Completes
• A completed I/O operation generates an I/O Interrupt
• The interrupt handler gains control, and:
• Saves the state of the currently executing process, whatever it may be
• Determines which process initiated the I/O operation that just completed
• Sets the status in that process’s PCB to indicate it is ready
• Calls the scheduler to determine which process to execute next
• Calls the dispatcher to give control to that process
• Note that the “next process” may or may not be the process that
initiated the I/O operation
• I/O completion simply makes that process ready, or eligible for
scheduling
Returning Control to the I/O Caller
• Eventually, the dispatcher will return control to the process that
initiated the I/O operation
• At that time, the system I/O function will return, passing back the
results of the I/O operation
I/O Processing
[Figure: the executing program calls the system I/O function, which:]
• Saves the process state in its PCB
• Sets the I/O wait state
• Initiates the I/O operation
Process Synchronization and
Communication
• A process can pause, waiting for the completion of another process
using the wait() function call
• A POSIX shell may launch a child process to run a program from the command
line, and wait for its completion
• Inter-Process Communication (IPC): A process may make a system
call to send a signal to another process, or to set up a signal handler
to receive signals
• Just as with I/O wait states, these waits are indicated by a special
blocked status in the PCB, preventing the process from being
scheduled until the condition is met
I/O Processing
[Figure: Process 1 calls the system wait function to wait for Process 2; the wait function:]
• Saves the process state in its PCB
• Sets the blocked state
(2) Input / Output (I/O)
Management
• On early computers, I/O was performed by the processor:
WAITING  TD    DEVICE      ; test whether the device is ready
         JEQ   WAITING     ; not ready yet, so keep polling
         RD    DEVICE      ; get a byte from device
         STCH  BUFFER,X    ; store byte in buffer
• Of course, today we don’t tie up the processor waiting for each byte!
Adding a Primitive I/O Subsystem
• System functions to READ and WRITE data
• READ( device, buffer, count );
• WRITE( device, buffer, count );
• The calling process is placed in an I/O wait state
• Functions place the parameters in an I/O Control Block (IOCB)
• An I/O Process
• The I/O process contains a loop: for each IOCB:
• Check to see if the device is ready
• If so, READ or WRITE the next byte
• If the I/O operation is complete, set the status to ready in the calling
process’s PCB
A Modern I/O Subsystem
• On modern systems, I/O is handled by hardware, rather than relying
on the processor for byte-by-byte transfers
• System functions to READ and WRITE data
• READ( device, buffer, count );
• WRITE( device, buffer, count );
• The calling process is placed in an I/O wait state
• Functions place the parameters in an I/O Control Block (IOCB)
• Functions initiate the I/O transfer using dedicated I/O controllers
A Modern I/O Subsystem - Continued
• We do not need an I/O process
• The I/O controllers will generate an interrupt when the transfer is
complete
• The interrupt handler will set the status to ready in the calling
process’s PCB
(3) Memory Management
Physical Addressing
• Early computers did not have any form of address translation
• Addresses used in the program exactly correspond to physical
memory addresses
• The size of physical memory was limited to the address range of the
instruction set
• To load multiple programs or code blocks into memory, the system
required a relocating loader
• Microcomputers in the 1970s and embedded computers today use
physical addressing
Partitioned Memory
Partitioned Addressing
• Partitioned Addressing requires hardware support – a Memory
Management Unit (MMU)
• A “base address” register that can be set when a process is
dispatched, and possibly start and end addresses to provide memory
protection
• Memory references are translated in hardware by adding the “base
address register” to determine the address in physical memory
• The amount of memory available to any process may be limited by
the instruction set, but physical memory may be much larger
Partitions
[Figure: Process 1 and Process 2 address spaces, each mapped to its own partition of physical memory]
Memory Fragmentation
[Figure: three stages (Starting State; Process 2 Ends; Process 4 Doesn’t Fit!), with Process 3 Address Space left in place and the note: “We have enough memory… but it’s not contiguous.”]
Memory Fragmentation – Dynamic
Relocation
[Figure: three stages (Starting State; Process 2 Ends; Process 4 Starts), with memory relocated so the free space is contiguous]
Memory Protection
• Memory Protection prevents one process from reading or writing
memory used by another process
• In addition to a base address, the MMU may support a start address
and end address.
• For each memory reference, the MMU adds the base address, and
determines whether the resulting physical address lies within the bounds
of the start and end addresses
• If not, an error interrupt is generated
Getting Fancy – Modern MMUs
• Separate base addresses for code, data, and shared memory
[Figure: Process 1 and Process 2 address spaces mapped into physical memory, with separate regions for Process 1 Data and Process 2 Data]
Virtual Memory
• Virtual Memory: a system of software and hardware that allows
portions of a program’s memory to be temporarily cached on disk, to
minimize the requirements for physical memory.
• Allows the operating system to load and execute programs with
memory requirements that total far more than the available physical
memory.
• Virtual memory systems divide the memory address space into
pages, each of which is separately mapped by the MMU.
• Not all pages reside in physical memory. Pages that are not loaded in
physical memory are cached on disk.
Virtual Memory Mapping
[Figure: pages of Process 1’s address space, some resident in physical memory and the rest stored on disk]
Virtual Memory Mapping
• Requires a separate base address for each page in the process’s
address space.
• The MMU maps each memory reference to the corresponding
location in physical memory by adding the base address.
• If the page does not reside in physical memory, the MMU will
generate a page fault interrupt.
• The process will be placed in an I/O wait state
• The requested memory page will be loaded from disk into physical memory
• This may require a page that is currently in physical memory to be moved to
disk (swapped out) in order to free up space
Choosing the Page to Swap Out
• If a page is already on disk, and the copy in memory has not been
modified, it doesn’t actually need to be written to disk – so it’s a good
candidate to free up
• Other algorithms include:
• FIFO – First In, First Out
• LRU – Least Recently Used
• LFU – Least Frequently Used
Why the Algorithm is Important
• Going to disk to “swap in” a page (and possibly “swap out” a page to
free up memory) dramatically slows program execution
• If a RAM access took one SECOND, a disk access would take one DAY
• A system that has insufficient physical memory needs to swap
frequently, resulting in thrashing, a condition in which the system is
essentially unable to perform useful work because of constant page
swaps.
CMPE 220
What Does a Modern Operating
System Do?
1. Process Management
• Interprocess Communications
2. Input / Output (I/O) Management
3. Memory Management
4. File System Management
5. System Functions and Kernel Mode
6. User Interaction – (maybe)
(4) File System Management
• The operating system is responsible for creating, deleting, opening,
closing, reading and writing files on disk
• It manages a “directory” to allow files to be accessed and located by
name
• It manages the organization and placement of file data on the disk
• The File System uses the underlying I/O System to perform the
physical I/O to and from the disk.
The File Catalog
• Disk files are managed by the operating system. The system manages
a catalog or directory of files, as well as file layout information.
• The catalog is simply an index that maps files by name to their
location on disk.
• On most modern systems, the catalog is hierarchical, allowing for
nested directories or folders.
File Layout on Disk
• Files are made up of fixed size blocks. A file is a series of blocks; these
blocks are mapped to corresponding physical blocks on the disk.
[Figure: logical blocks in a file mapped to physical blocks on disk, which need not be contiguous or in the same order]
File Layout on Disk
• When disk media develops errors, some blocks may be marked as
“bad” and not used.
[Figure: logical blocks in a file mapped to physical blocks on disk, with a bad block left unused]
Disk Fragmentation
• Just like physical memory allocation, the space on disk becomes
broken up into small chunks as files are added, deleted, expanded.
• The result dramatically degrades system performance, because
reading sequential data from disk becomes very slow.
• Disk transfer rates are pretty fast
• Disk rotation is slower… waiting for the disk to rotate to read the next
block is expensive
• Moving the disk head from track to track is slower still… so if the file
blocks are scattered, reading successive blocks of data can be very
expensive
Disk Fragmentation
Defragmentation
• To reduce the performance impact of disk fragmentation, utility
programs were developed to “defrag” the disk. This involved
rewriting each file as a contiguous series of blocks, and eliminating
unused space between files.
• Defragmenting a large disk could take several hours, and the disk
could not be used while the utility was running.
• Modern operating systems limit fragmentation automatically, through
smarter block allocation or background defragmentation, so in most cases it
is no longer necessary to run a separate utility.
Defragmentation
(5) System Functions and Kernel
Mode
• Modern CPUs support two modes of instruction execution: user
mode and kernel mode
• When a process is running in user mode, its capabilities are restricted:
• It cannot execute certain instructions, such as I/O instructions
• It cannot trigger arbitrary interrupts; it can only use the designated trap that requests kernel services
• It cannot access a different process’s memory partition
• The operating system needs to do all of those things, so system
processes may run in kernel mode
Calling System Functions
• User programs may call system functions that have been statically or
dynamically linked. But since the entire process is running in user
mode, those system functions still cannot perform restricted
operations, such as initiating I/O, or even updating system control
structures.
• System functions have a mechanism for invoking code in kernel mode.
• This mode switch is initiated with a special software interrupt,
sometimes called a trap.
Switching Execution Modes
[Figure: a user mode process calls System Function(), which switches to kernel code; the kernel code does the work and then returns to the caller]
Kernel Mode is Dangerous
• Modern operating systems are well-protected. It’s very difficult for a
program running in user mode to crash or penetrate the OS
• Kernel mode allows essentially free rein over system structures and
hardware
• Code that is executed in kernel mode must be very well written and
tested
• Kernel code that is invoked via a trap must carefully validate the
request
Traps Are Expensive
• On most systems, switching into and out of kernel mode is a relatively
expensive operation
• Calls to kernel mode code should be kept to a minimum
• A great deal of system optimization work goes into dividing the
operations that can be performed in user mode from the operations
that must be performed in kernel mode
Resource Conflicts
• System processes – or system calls from user processes – may often
access the same resource (for example, multiple processes may be
updating PCBs or IOCBs)
• This introduces the possibility of errors
• Process 1 is in the midst of making some updates to PCBs
• In the midst of the changes, process 2 gets control – but the changes process
2 tries to make may conflict with the changes process 1 is making. Or, process
2 may run into errors accessing the PCBs, because they are in an inconsistent
state.
• There are two ways to resolve this problem
Uninterruptible Code Segments
• Some systems allow small sections of code to be protected from
interrupts
• Process 1 can safely update the PCBs, knowing that it cannot lose
control until the changes are complete
• Risk: too many sections of uninterruptible code can effectively “lock
out” other processes
Resource Locks
• Critical resources can be locked so that other processes cannot access
the resource until the current process unlocks it
• To accomplish this, each critical shared resource is associated with a
semaphore or mutex (mutual exclusion flag). Testing and setting the flag is
a single atomic operation that cannot be interrupted.
• Once a process has the semaphore, it can safely use the resource until
it releases the semaphore
Resource Locking Can Lead to
“Deadlocks”
• Assume two processes, 1 and 2, each require the use of resources A
and B – but request them in the opposite order!
• Process 1 requests the resource A semaphore – and gets it.
• At that point, Process 1 loses control. Process 2 then gets control, and
requests the resource B semaphore – which it gets.
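• Process 2 then requests the resource A semaphore and must wait; when
Process 1 regains control and requests the resource B semaphore, it must
wait too. This is a deadlock – neither Process 1 nor Process 2 can
continue.
Process 1
• Has the Resource A semaphore
• Is waiting for the Resource B semaphore
Process 2
• Has the Resource B semaphore
• Is waiting for the Resource A semaphore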
Deadlock Detection and Prevention
• Very hard to debug
• Because there may be only tiny “windows” that can result in a
deadlock, they may occur rarely, and be very hard to locate and debug
• Deadlock detection and prevention is a major research area in
computer science
• Current detection algorithms tend to be very resource intensive and
have a large impact on system performance
(6) User Interaction (maybe)
• The early “proto operating systems” – Job Control Programs –
accepted a limited set of commands directly from the input device
(typically a card reader)
• By the late 1950s, most computers had a console where commands
could be entered and messages could be output
• Today, user interactions are not part of the operating system
• A shell accepts user commands, and runs programs or calls system functions to
execute them
• A windowing system interacts with the user through a Graphical User Interface
(GUI) and runs programs or calls system functions as required
What Does a Modern Operating
System Do?
1. Process Management
• Interprocess Communications
2. Input / Output (I/O) Management
3. Memory Management
4. File System Management
5. System Functions and Kernel Mode
6. User Interaction – (maybe)
Types of Operating
Systems
Embedded Systems
(Generalizations)
• May not support a hardware MMU
• Single address space
• Limited physical memory
• Might not support a disk or mass storage device
• Code is loaded from non-volatile memory
• Swapping is not possible – limited to physical memory
• May still support a simple multi-process software architecture
• Little or no ability for user to add/modify software
• Fewer security issues
Multiprocessor Architectures
• The goal of Multiprocessor Architectures is to provide greater system
throughput through parallelism
[Figure: multiprocessor architecture, with processors connected by a network]
Multiprocessor Operating Systems
• The core functions of the OS remain unchanged:
• Process management
• Scheduler, dispatcher
• Memory Management
• I/O Management
• File Management
• The resources that they manage are expanded. For example, the
scheduler can assign processes to multiple CPUs.
Multiprocessor OS: Master-Slave
• One copy of the OS runs on the Master CPU
• Slave processors make all OS calls to the Master CPU
Multiprocessor OS: Symmetric
• OS Processes can run on any CPU
• Requires memory “lock” on shared OS data structures
Breaking Out of the
Box
Operating Systems Across “Computer” Boundaries
Network vs Distributed Operating
Systems
• Both provide a means of sharing resources across a communications
network
• Network Operating System: access to remote resources is explicit
• Distributed Operating System: access to remote resources is implicit
– programs may not know about locality of references
• Cloud Operating system: manages the operation, execution and
processes of virtual machines, virtual servers and virtual
infrastructure, as well as the back-end hardware and software
resources.
Network Operating Systems
• Almost all modern operating systems are “networked”
• POSIX, Mac OS, Windows, iOS, Android: YES
• Embedded systems: MAYBE
• Remote resources are accessed via explicit communication protocols
• FTP, SFTP, RPC, telnet, SSH
• Client-Server applications are supported through standard
communication protocols (e.g., TCP/IP)
• The basic architecture of the Internet
• We’ll talk more about this architecture in the next class when we
discuss “server” software
Distributed Operating Systems
• Users are not aware that they are on a network
• Access to remote resources is similar to access to local resources
• Transparent access to remote files
Distributed OS: Access to Remote
Files
• Remote File Access: Each file access traverses the network
• File Migration / Data Migration: File is copied to local machine,
accessed, then copied back (requires a file locking mechanism)
• Program Migration: Program is executed on remote machine, where
it has direct access to the file
Distributed OS: Process Migration
• Execute an entire process, or parts of it, on a remote system
• Allows:
• Load balancing
• Access to special purpose hardware or software
Cloud Operating System
• A cloud OS virtualizes system resources.
• A cloud OS is a distributed OS that allows multiple operating systems
to coexist on a single “system,” or a single OS to span multiple
“systems.”
• A web OS is a specialized type of cloud OS. All resources are available
through a web browser
• Similar to a concept pioneered at Sun Microsystems ca. 2005
• “The network is the computer” – Sun Microsystems slogan, coined by John Gage in 1984
Object Oriented Operating Systems
• Extend the object oriented programming paradigm into the OS… the
operating system manages resources as objects
• An active research area from the mid-1980s through the mid-1990s
• There are no mainstream examples today
For Next Week
• Log in to Canvas and complete Assignment 6
CMPE 220
Week 11
Servers (and client/server applications)
The Client / Server Model
• A server is a program that runs on a computer, providing a specific
service to other program(s), called clients
• In the POSIX world, a server is often called a daemon (demon)
• a long-running background process that answers requests for services
• The server program may launch multiple processes, as needed
• The software that makes up a server may - or may not - have system
dependencies
• Examples:
• Database Management System
• FTP Server
• Mail Server
• Web Server
• Windowing System
Local or Remote
• The client and the server can run on the same machine, or they may
run on separate machines and communicate over a network.
• Network communication may be hidden by a function call library.
[Figure: a client on a local computer communicating with a server on a remote computer]
Accessing Network Services
• Client / Server applications require a protocol – a language and a set
of rules to allow the client to communicate with the server
• Languages do not need to be human readable (e.g., they could be binary)
• Early protocols were ad hoc – made up by the application developer
• FTP: File Transfer Protocol
• Developed in the early 1970s by Abhay Bhushan, a student at MIT
• Protocols Encapsulate:
• Authentication
• Requests
• Responses
• State & Session Management
Standardizing Protocols: OSI
• Open Systems Interconnection model (OSI model): a conceptual
model that characterizes and standardizes the communication
functions of a telecommunication or computing system
• Published as an ISO standard (ISO 7498) in 1984
• Seven Layers:
Application Layer
Presentation Layer
Session Layer
Transport Layer
Network Layer
Data Link Layer
Physical Layer
TCP/IP
• Transmission Control Protocol / Internet Protocol
• Based on research done by the Advanced Research Projects Agency (ARPA)
in the 1960s and 1970s
• Standardized by the US Department of Defense in 1982
• Administered by the Internet Engineering Task Force (IETF) since 1989
• Four Layers:
Application Layer
Transport Layer
Internet Layer
Network Interface Layer
OSI Versus TCP/IP Models
[Figure: the OSI Application, Presentation, and Session layers correspond to the single TCP/IP Application layer]
• The OSI model provides a clear distinction between application,
presentation, and session services.
• TCP/IP groups these as a single Application layer
Application-Specific Protocols
• Each type of server (web, database, FTP, etc.) has its own protocol
specific to that application.
• Web browsers and web servers use the Hypertext Transfer Protocol (HTTP)
to request and deliver web pages
• Service-specific protocols are the top layer of the TCP/IP network
model
Application Layer: service-specific protocols (e.g., HTTP)
Transport Layer: Transmission Control Protocol (TCP)
Internet Layer: Internet Protocol (IP)
Network Interface Layer
TCP/IP Enabled the Internet
• Ubiquitous: supported by virtually every operating system
• Essentially makes every system today a Network Operating System
• Every device on the Internet is identified by a 32-bit IP address, consisting
of four 8-bit numbers (2^32, about 4.3 billion addresses):
• 67.169.41.253
• IPv6: an extended protocol which supports 128-bit addresses, made
up of eight 16-bit numbers, expressed in hexadecimal
• FE80:CD00:0000:0CDE:1257:0000:211E:729C
• Drafted by the Internet Engineering Task Force (IETF) in 1998
• Finalized in 2017 – currently being deployed worldwide
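The arithmetic behind "four 8-bit numbers" can be sketched in JavaScript (Node.js): each dotted-decimal number is one byte, so an IPv4 address is a four-digit base-256 number. The function names are hypothetical, and the sketch handles IPv4 only.

```javascript
// Convert a dotted-decimal IPv4 address (four 8-bit numbers) to its 32-bit value.
function ipv4ToInt(address) {
  const parts = address.split(".");
  if (parts.length !== 4) throw new Error(`not an IPv4 address: ${address}`);
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) {
      throw new Error(`invalid 8-bit number: ${part}`);
    }
    // Shift the value left by 8 bits and add the next number. Multiplication is
    // used instead of << because JavaScript's bitwise operators are signed 32-bit.
    value = value * 256 + Number(part);
  }
  return value;
}

// Convert a 32-bit value back to dotted-decimal notation.
function intToIpv4(value) {
  const octets = [];
  for (let i = 0; i < 4; i++) {
    octets.unshift(value % 256);     // take the lowest 8 bits
    value = Math.floor(value / 256); // and drop them
  }
  return octets.join(".");
}

console.log(ipv4ToInt("67.169.41.253")); // 1135159805
console.log(2 ** 32);                    // 4294967296 possible IPv4 addresses
console.log(2n ** 128n);                 // possible IPv6 addresses (BigInt, too large for Number)
```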
TCP/IP Service Requests
• In addition to IP addresses, TCP/IP uses the notion of a
standard “port” to map requests to specific services
• For the Transmission Control Protocol and the User Datagram
Protocol, a port number is a 16-bit integer in the header of a message
• Reserved (“well-known”) port numbers, by convention:
• FTP: ports 20/21
• Telnet: port 23
• DNS: port 53
• HTTP: port 80
• POP3 (email): port 110
• HTTPS: port 443
• MySQL: port 3306
• Servers use system calls (in POSIX: socket, bind, listen, accept) to listen
on a network port and receive incoming client requests
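A small sketch (JavaScript, Node.js) of the 16-bit port range and the conventional mapping listed above; the table and function names are hypothetical. On a real server, the mapping is established when the server asks the operating system to listen on its port (in Node, net.createServer(handler).listen(port)); the OS then delivers each incoming connection for that port to that process.

```javascript
// Well-known port numbers from the list above (a convention, not enforced).
const WELL_KNOWN_PORTS = new Map([
  [20, "FTP (data)"],
  [21, "FTP (control)"],
  [23, "Telnet"],
  [53, "DNS"],
  [80, "HTTP"],
  [110, "POP3"],
  [443, "HTTPS"],
  [3306, "MySQL"],
]);

// A port number is a 16-bit unsigned integer, so it ranges from 0 to 65535.
function isValidPort(port) {
  return Number.isInteger(port) && port >= 0 && port <= 0xffff;
}

// Which service conventionally listens on this port?
function serviceForPort(port) {
  if (!isValidPort(port)) throw new RangeError(`not a 16-bit port number: ${port}`);
  return WELL_KNOWN_PORTS.get(port) ?? "not in this table";
}

console.log(serviceForPort(443));  // HTTPS
console.log(serviceForPort(8080)); // not in this table
```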
What Does a (modern) Operating System Do?
1. Process Management
• Interprocess Communications
2. Input / Output (I/O) Management
3. Memory Management
4. File System Management
5. System Functions and Kernel Mode
6. User Interaction – (maybe)
7. Network Services
File Transfer Protocol (FTP) - Abhay Bhushan
• One of the earliest client/server applications
(early 1970s by Abhay Bhushan)
• A graduate of the first class (1960–65) from
the Indian Institute of Technology Kanpur
• Master's in EE from MIT
• Drafted RFC 114 – FTP
• Contributed to the development of the
ARPANET and email protocols
File Transfer Protocol (FTP)
• Originally operated over dialup phone lines
• An FTP client connects to an FTP server
• Via the client, a user can:
• Authenticate (login)
• List files
• Request (get) files or upload (put) files
• Change directories
• Rename files
• Delete files
• FTP clients: FileZilla, Cyberduck, Transmit, WinSCP, and many others
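The user operations above map onto short text commands that the client sends to the server over a control connection. This sketch uses the command names from RFC 959, the later FTP standard; the action labels and function names are hypothetical, and the server's numbered replies are not modeled.

```javascript
// Translate user-level FTP operations into the commands an FTP client sends
// on its control connection (command names from RFC 959).
// The action names ("login", "get", ...) are hypothetical labels.
function ftpCommands(action, ...args) {
  switch (action) {
    case "login":  return [`USER ${args[0]}`, `PASS ${args[1]}`];
    case "list":   return ["LIST"];            // listing arrives on a separate data connection
    case "get":    return [`RETR ${args[0]}`]; // retrieve (download) a file
    case "put":    return [`STOR ${args[0]}`]; // store (upload) a file
    case "cd":     return [`CWD ${args[0]}`];  // change working directory
    case "rename": return [`RNFR ${args[0]}`, `RNTO ${args[1]}`]; // rename from / rename to
    case "delete": return [`DELE ${args[0]}`];
    default: throw new Error(`unsupported action: ${action}`);
  }
}

// Commands are sent as text lines terminated by CR LF.
function toWire(commands) {
  return commands.map((command) => `${command}\r\n`).join("");
}

console.log(ftpCommands("rename", "draft.txt", "final.txt")); // [ 'RNFR draft.txt', 'RNTO final.txt' ]
```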
FTP
• The FTP client program is started by the user only when needed
• The FTP server program is started by the system and continues to run
Database Management Systems
• A Database Management System (DBMS) allows information to be
organized in a structured way, and used by one or more application
programs.
A Single Computer DBMS
• The DBMS is simply a library
Application Program
Database Management System Library
File System
I/O System
Hardware
A Networked DBMS
• The DBMS consists of a library, and a server
Client side:  Application Program
              Database Management Library
                  <-- network communication -->
Server side:  Database Management Server
              File System
              I/O System
              Hardware
• The application and the database server may actually run on the same
computer, but still use a network protocol to communicate (localhost)
Database Clients
• Any application can be a client of a DBMS server
• Database systems usually include a front-end interface that allows
programmers to examine and update database contents
• This “front end” is simply another client
Is a DBMS System Software?
• NO:
• Sits on top of the file system
• Uses standard communication capabilities
• YES:
• The DBMS is optimized by taking advantage of specific knowledge of the I/O
system and the hardware architecture
• Database vendors work closely with system vendors
How Oracle Builds Software
(Diagram: the Engineering Organization, divided into Base Coding and Porting)
According to Larry Ellison, founder of Oracle
• “Oracle is not a database company. We are a portable software
solutions company.”
• Just as we saw with high-level programming languages, database
management systems allow portable applications, by providing a
standard interface – a database language – which is tuned to run on a
wide range of systems.
Relational Databases
• The most widely used “type” of database
• Oracle
• MySQL
• Accessed using Structured Query Language (SQL)
• Rigidly defines the data structure
Relational Databases - History
• The term "relational database" was invented
by Dr. Edgar (Ted) Codd at IBM in 1970.
• Codd introduced the term in his research
paper "A Relational Model of Data for Large
Shared Data Banks". In this paper and later
papers, he defined what he meant by
"relational".
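Relational DB: Entities and Attributes
• Entity: a person, place, thing, or event about which data is stored (e.g., a customer)
• Attribute: a property that describes an entity (e.g., a customer's name)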
Diagramming Entities and Attributes
• An entity relationship diagram, or ER diagram, shows
an entity with a rectangle and its attributes with ovals.
• Underline the unique attribute.
Source: Database Systems, by Jukić, Vrbsky, & Nestorov, Pearson 2014, ISBN 978-0-13-257567-6
Relationships
• Each entity in an ER diagram must be related
to at least one other entity.
Source: Database Systems, by Jukić, Vrbsky, & Nestorov, Pearson 2014, ISBN 978-0-13-257567-6
Types of Relationships
• One-to-one (1:1)
• One-to-many (1:M)
Source: Database Systems, by Jukić, Vrbsky, & Nestorov, Pearson 2014, ISBN 978-0-13-257567-6
Multiple Relationships
• Two entities can have multiple relationships with each
other.
Source: Database Systems, by Jukić, Vrbsky, & Nestorov, Pearson 2014, ISBN 978-0-13-257567-6
ER Diagram Example
(Figure: an example ER diagram)
Source: Database Systems, by Jukić, Vrbsky, & Nestorov, Pearson 2014, ISBN 978-0-13-257567-6
Database Design
• The ER diagram is the design of our database.
• Map the diagram to a set of database tables.
• Each table contains related data.
• Example: An employee table.
• A table is also called a relation.
• Each row of a table contains the data
of a single record.
• Example: One row per employee record.
• The table columns are the attributes.
• Examples: Employee ID, name, gender,
phone number, birthdate, etc.
Mapping Entities
(Figures: ER diagram of the CUSTOMER entity; the CUSTOMER table design; an example of a filled CUSTOMER table)
Source: Database Systems, by Jukić, Vrbsky, & Nestorov, Pearson 2014, ISBN 978-0-13-257567-6
Primary Key
• Each table must have a primary key.
• A column or set of columns whose value
uniquely identifies each row.
• Underline the primary key in each database table.
Source: Database Systems, by Jukić, Vrbsky, & Nestorov, Pearson 2014, ISBN 978-0-13-257567-6
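The primary-key rule can be checked mechanically. This hypothetical helper tests whether a set of columns uniquely identifies every row of some sample data, using a few rows of the Takes (enrollment) table from the SQL examples in this class, where only the pair (student_id, class_code) is unique.

```javascript
// Check whether a set of columns can serve as a primary key for a table:
// each row must have a value in every key column, and no two rows may
// share the same combination of key values.
function isUniqueKey(rows, columns) {
  const seen = new Set();
  for (const row of rows) {
    const values = columns.map((column) => row[column]);
    if (values.some((v) => v === undefined || v === null)) return false;
    const key = JSON.stringify(values); // one string per combination of values
    if (seen.has(key)) return false;
    seen.add(key);
  }
  return true;
}

// A few rows of the Takes table (student_id, class_code).
const takes = [
  { student_id: 1001, class_code: 926 },
  { student_id: 1001, class_code: 951 },
  { student_id: 1021, class_code: 926 },
];

console.log(isUniqueKey(takes, ["student_id"]));               // false: 1001 appears twice
console.log(isUniqueKey(takes, ["student_id", "class_code"])); // true: a composite key
```

This only checks the data present now; a DBMS enforces a declared primary key on every insert and update.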
Structured Query Language (SQL) Statements
• SQL (“Sequel”) is a language for accessing and manipulating relational
databases
• Statements can be entered on the command line of a database front-end (client)
• Statements can be sent by application programs to the database
server
SQL – History
• SQL was initially developed at IBM by Donald D. Chamberlin and
Raymond F. Boyce after learning about the relational model from
Edgar F. Codd in the early 1970s.
• In June 1979, Relational Software (now Oracle) introduced the first
commercially available implementation of SQL for VAX computers.
• Approved as an ISO standard in 1987.
SQL Query Examples – single table
• What is the class code of the
Java programming class?
Class
Code | Teacher_id | Subject | Room
908 | 7008 | Data structures | 114
926 | 7003 | Java programming | 101
931 | 7051 | Compilers | 222
951 | 7012 | Software engineering | 210
974 | 7012 | Operating systems | 109
SELECT code                          -- desired fields
FROM class                           -- source tables
WHERE subject = 'Java programming'   -- selection criteria
+------+
| code |
+------+
| 926  |
+------+
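To connect the three clauses to what they compute, here is a sketch (in JavaScript, not SQL) of the same query over an in-memory copy of the Class table: FROM picks the rows, WHERE filters them, and SELECT keeps the desired fields. It illustrates the meaning of the query, not how a DBMS executes it.

```javascript
// The Class table from the slide, as an array of row objects.
// (The variable is named classTable because "class" is a JavaScript keyword.)
const classTable = [
  { code: 908, teacher_id: 7008, subject: "Data structures", room: 114 },
  { code: 926, teacher_id: 7003, subject: "Java programming", room: 101 },
  { code: 931, teacher_id: 7051, subject: "Compilers", room: 222 },
  { code: 951, teacher_id: 7012, subject: "Software engineering", room: 210 },
  { code: 974, teacher_id: 7012, subject: "Operating systems", room: 109 },
];

// SELECT code FROM class WHERE subject = 'Java programming'
const result = classTable                              // FROM: the source table
  .filter((row) => row.subject === "Java programming") // WHERE: selection criteria
  .map((row) => ({ code: row.code }));                 // SELECT: desired fields

console.log(result); // [ { code: 926 } ]
```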
SQL Query Examples – multiple tables (joins)
• Who is teaching Java programming?
Teacher
Id | Last | First
7003 | Rogers | Tom
7008 | Thompson | Art
7012 | Lane | John
7051 | Flynn | Mabel
Class
Code | Teacher_id | Subject | Room
908 | 7008 | Data structures | 114
926 | 7003 | Java programming | 101
931 | 7051 | Compilers | 222
951 | 7012 | Software engineering | 210
974 | 7012 | Operating systems | 109
SELECT first, last
FROM teacher, class
WHERE subject = 'Java programming' AND id = teacher_id
+-------+--------+
| first | last   |
+-------+--------+
| Tom   | Rogers |
+-------+--------+
• Selecting from multiple tables is called a join.
SQL Query Examples - multiple tables
• What subjects does John Lane teach?
Teacher
Id | Last | First
7003 | Rogers | Tom
7008 | Thompson | Art
7012 | Lane | John
7051 | Flynn | Mabel
Class
Code | Teacher_id | Subject | Room
908 | 7008 | Data structures | 114
926 | 7003 | Java programming | 101
931 | 7051 | Compilers | 222
951 | 7012 | Software engineering | 210
974 | 7012 | Operating systems | 109
SELECT code, subject
FROM teacher, class
WHERE first = 'John' AND last = 'Lane' AND id = teacher_id
+------+----------------------+
| code | subject              |
+------+----------------------+
| 951  | Software engineering |
| 974  | Operating systems    |
+------+----------------------+
SQL Query Examples - multiple tables
• Who is taking Java programming?
Student
Id | Last | First
1001 | Doe | John
1005 | Novak | Tim
1009 | Klein | Leslie
1014 | Jane | Mary
1021 | Smith | Kim
Class
Code | Teacher_id | Subject | Room
908 | 7008 | Data structures | 114
926 | 7003 | Java programming | 101
931 | 7051 | Compilers | 222
951 | 7012 | Software engineering | 210
974 | 7012 | Operating systems | 109
Takes
Student_id | Class_code
1001 | 926
1001 | 951
1001 | 908
1005 | 974
1005 | 908
1014 | 931
1021 | 926
1021 | 974
1021 | 931
SELECT id, last, first
FROM student, class, takes
WHERE subject = 'Java programming'
AND code = class_code AND id = student_id
+------+-------+-------+
| id   | last  | first |
+------+-------+-------+
| 1001 | Doe   | John  |
| 1021 | Smith | Kim   |
+------+-------+-------+
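Conceptually, a join considers combinations of rows from the listed tables and keeps those that satisfy the WHERE clause. The sketch below evaluates the "Who is taking Java programming?" query that way, as a naive nested-loop join over in-memory copies of the three tables (JavaScript; the function name is hypothetical).

```javascript
const student = [
  { id: 1001, last: "Doe", first: "John" },
  { id: 1005, last: "Novak", first: "Tim" },
  { id: 1009, last: "Klein", first: "Leslie" },
  { id: 1014, last: "Jane", first: "Mary" },
  { id: 1021, last: "Smith", first: "Kim" },
];
// Named classTable because "class" is a JavaScript keyword.
const classTable = [
  { code: 908, teacher_id: 7008, subject: "Data structures", room: 114 },
  { code: 926, teacher_id: 7003, subject: "Java programming", room: 101 },
  { code: 931, teacher_id: 7051, subject: "Compilers", room: 222 },
  { code: 951, teacher_id: 7012, subject: "Software engineering", room: 210 },
  { code: 974, teacher_id: 7012, subject: "Operating systems", room: 109 },
];
const takes = [
  [1001, 926], [1001, 951], [1001, 908], [1005, 974], [1005, 908],
  [1014, 931], [1021, 926], [1021, 974], [1021, 931],
].map(([student_id, class_code]) => ({ student_id, class_code }));

// SELECT id, last, first FROM student, class, takes
// WHERE subject = <subject> AND code = class_code AND id = student_id
function whoIsTaking(subject) {
  const rows = [];
  for (const s of student) {
    for (const c of classTable) {
      for (const t of takes) {
        // Keep only combinations that satisfy every condition in the WHERE clause.
        if (c.subject === subject && c.code === t.class_code && s.id === t.student_id) {
          rows.push({ id: s.id, last: s.last, first: s.first }); // SELECT list
        }
      }
    }
  }
  return rows;
}

console.log(whoIsTaking("Java programming")); // rows for John Doe (1001) and Kim Smith (1021)
```

Checking every combination costs 5 × 5 × 9 = 225 comparisons here; a real database server uses indexes and a query optimizer to avoid that, which is part of why DBMS performance depends on knowledge of the underlying system.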
SQL Update Statement - example
• Fixing a typo (last row of table)
Class
Code | Teacher_id | Subject | Room
908 | 7008 | Data structures | 114
926 | 7003 | Java programming | 101
931 | 7051 | Compilers | 222
951 | 7012 | Software engineering | 210
974 | 7012 | Operating sytsems | 109
UPDATE class
SET subject = 'Operating systems'
WHERE code = 974
Web Servers
• The first web browser and server were
created by Sir Tim Berners-Lee in 1990
• Berners-Lee was knighted by Queen
Elizabeth II in 2004
Basic Web Protocols
Web Browsers use two basic services
• Domain Name Server (DNS Server)
• Looks up a domain name to get an IP address
• User Datagram Protocol (UDP) – a low latency protocol - on port 53
• Web Server
• Responds to web page requests
• HyperText Transfer Protocol (HTTP) – over TCP - on port 80
• HyperText Transfer Protocol - Secure (HTTPS) – over TCP - on port 443
Layered Protocols (TCP/IP Model)
Network Model Layer | Domain Name Lookup | Web Page Fetch
Application Layer | DNS request | HTTP or HTTPS
Transport Layer | UDP – User Datagram Protocol | TCP – Transmission Control Protocol
Internet Layer | IP – Internet Protocol | IP – Internet Protocol
Network Interface Layer | Ethernet, etc. | Ethernet, etc.
HTTPS (Secure HyperText Transfer Protocol)
HTTPS has two functions
• Encrypts traffic between the browser and the web server
• Allows transmission of sensitive information, such as bank account numbers
• Verifies the identity of the website by downloading a certificate issued
by a recognized Certificate Authority (CA)
Additional Browser Functions
• “Build” web pages
• Parse web pages looking for embedded content, such as images
• Send additional HTTP requests to fetch embedded content
• One web page may require many HTTP requests (hundreds)
• Process Cascading Style Sheets (CSS) to format pages
• Run embedded JavaScript code
• Maintain a data model – the Document Object Model (DOM) – which
allows JavaScript to access and manipulate page elements
• Store local data associated with a particular website (cookies) – used
for managing user sessions
Fetching and Building a Page
• The home page on my own website requires 48
separate fetches
https://tools.pingdom.com/
• Complex websites may require hundreds of
fetches for a single page
• May require fetches from several domains
• Ads
• Shared font files
• Services such as Google Analytics
• Connection handshakes (TCP, plus TLS for HTTPS) may take a significant
part of the overall load time
What Are Cookies?
• A cookie is a named data element, stored by the browser, and
associated with a website
• Application code on a website can set a cookie, such as SESSION_ID
• On subsequent requests to that website, the browser sends the cookie
back, so the web application code can read the values it set
• This is used to allow logins and sessions
• Cookies can only be retrieved by the site that set them
• …but there is an important loophole!
Embedded Content
(Figure: a web page containing an embedded ad served by Google)
How Cookies Are Used for Tracking
• Remember that a single webpage may be built from many web
requests
• If a website includes ads from an ad agency, such as Google, the ad
agency's web server can set cookies (associated with its own domain)
• If pages on other websites also include ads from the same agency,
that web server can read the same cookie value
• In this way, an ad agency can track the sites that you visit, and
display similar ads
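A toy model of the loophole, in JavaScript: the browser keeps cookies per domain and sends each server only its own cookies, yet an ad server embedded in two different sites receives the same cookie on both visits. The domains (news.example, shop.example, ads.example), cookie names, and values are hypothetical, and real browsers apply further rules (paths, expiry, SameSite) and increasingly block or partition third-party cookies.

```javascript
// A toy model of a browser's cookie storage: cookies are kept per domain,
// and a request carries only the cookies of the domain it is sent to.
class CookieJar {
  constructor() {
    this.byDomain = new Map(); // domain -> Map(cookie name -> value)
  }

  // The server at `domain` responded with a Set-Cookie header.
  set(domain, name, value) {
    if (!this.byDomain.has(domain)) this.byDomain.set(domain, new Map());
    this.byDomain.get(domain).set(name, value);
  }

  // The Cookie header the browser attaches to a request for `domain`.
  cookieHeaderFor(domain) {
    const cookies = this.byDomain.get(domain) ?? new Map();
    return [...cookies].map(([name, value]) => `${name}=${value}`).join("; ");
  }
}

const jar = new CookieJar();
const adServerLog = []; // what the ad server learns over time

// Load a page from `site` that embeds an ad fetched from `adDomain`.
function visit(site, adDomain) {
  // The ad request carries the ad domain's cookie, not the site's cookies.
  let cookie = jar.cookieHeaderFor(adDomain);
  if (cookie === "") {
    jar.set(adDomain, "AD_ID", "user-42"); // first contact: the ad server sets its own cookie
    cookie = jar.cookieHeaderFor(adDomain);
  }
  // The ad server also learns which site embedded the ad (e.g., via the Referer header).
  adServerLog.push(`${cookie} saw an ad on ${site}`);
}

jar.set("news.example", "SESSION_ID", "s1"); // news.example's own login cookie
visit("news.example", "ads.example");
visit("shop.example", "ads.example");

console.log(jar.cookieHeaderFor("shop.example")); // prints an empty line: shop.example has no cookies
for (const entry of adServerLog) console.log(entry);
// AD_ID=user-42 saw an ad on news.example
// AD_ID=user-42 saw an ad on shop.example
```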
How Cookies Are Used for Tracking
(Diagram: a Browser loading pages from the Website 1 Server and the Website 2 Server; each page embeds an AD fetched from the same Ad Server)
Additional Web Server Functions
• Run back-end programs that dynamically build pages and send them
to the browser
• PHP
• Java
• Microsoft Active Server Pages (ASP)
• Binary executable programs - Common Gateway Interface (CGI)
• Respond to additional protocols
• Cache commonly accessed pages to improve speed
Additional Web Protocols
• HTTP GET & HTTP POST
• Two ways for a browser to submit form data to a program on the web server,
which may then access databases and return appropriate page contents
• JavaScript Object Notation (JSON): a text data format commonly used to
download data from a web server, to be processed by JavaScript programs in the browser
• Simple Object Access Protocol (SOAP): a protocol for exchanging
structured data between a web client (browser) and a server
• Asynchronous JavaScript And XML (AJAX): a technique for
asynchronously retrieving web content to dynamically update web
pages
Web Forms (get & post)
• Designated with an HTML form tag
<form action="/process-contact-form.php" method="get">
• Action code processes request and returns a web page
• E.g. “Thank you for contacting us”
Get method
• Form fields are attached to the URL as a query string:
/process-contact-form.php?name=Robert&[email protected]
Post Method
• Form fields are sent in the body of the HTTP request
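A sketch of the GET encoding using the standard URLSearchParams class (available in browsers and in Node.js). The field names, message, and email address are hypothetical placeholders.

```javascript
// Encode form fields the way a browser does for method="get": name=value
// pairs joined by "&", appended to the form's action URL after a "?".
function buildGetUrl(action, fields) {
  const query = new URLSearchParams(fields).toString(); // percent-encodes special characters
  return `${action}?${query}`;
}

const url = buildGetUrl("/process-contact-form.php", {
  name: "Robert",
  email: "robert@example.com", // hypothetical address
  message: "Hello there",
});
console.log(url);
// /process-contact-form.php?name=Robert&email=robert%40example.com&message=Hello+there

// The server side reverses the encoding to recover the fields.
const fields = new URLSearchParams(url.slice(url.indexOf("?") + 1));
console.log(fields.get("message")); // Hello there
```

With method="post", the same encoded string is sent as the request body (Content-Type: application/x-www-form-urlencoded) instead of being appended to the URL.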
Web Services
• A piece of software that provides a function or service over the
Internet, using a standard (often XML- or JSON-based) interface
• Client applications make requests to the service, and get back
results
• A web service is a type of API, but not all APIs are web services
Traditional Web Services (ca. 2007)
• Traditional web services are described by a service contract written in
the Web Services Description Language (WSDL)
• The WSDL document is an XML document that provides a machine-readable
description of how the service can be called
• The WSDL document and the request and response messages are transmitted
over http or https
• https://www.w3.org/TR/2001/NOTE-wsdl-20010315
• W3 or W3C = World Wide Web Consortium
Traditional Web Services
• Messages use the Simple Object Access Protocol (SOAP)
• SOAP is also an XML-based format
• https://www.w3schools.com/xml/xml_soap.asp
• https://www.w3.org/TR/soap/
RESTful Web Services
• A simpler approach to web services is REST
• Representational State Transfer
• A software architecture style consisting of guidelines and best
practices for creating scalable web services
• REST is not an interface standard
• https://www.w3.org/2001/sw/wiki/REST
• https://www.codecademy.com/article/what-is-rest
The REST Architecture
• REST systems are stateless, meaning that the server does not
need to know anything about what state the client is in
and vice versa
• Client and server implementations are independent; they
interact only through messages
• In the REST architecture, clients send requests to retrieve or modify
resources, and servers send responses to these requests
REST Requests
• REST requires that a client make a request to the server in order to
retrieve or modify data on the server. A request generally consists of:
• an HTTP verb, which defines what kind of operation to perform (GET, POST,
PUT, DELETE)
• a header, which allows the client to pass along information about the
request
• a path to a resource
• an optional message body or payload containing data
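The four parts of a request are visible in the raw text a client sends. This hypothetical parser splits a request into method, path, headers, and body; the host name and the /comics resource are invented for illustration. Real servers (for example, Node's built-in http module) do this parsing for you and handle details such as Content-Length that this sketch ignores.

```javascript
// Split a raw HTTP request into the parts a REST request consists of:
// the verb (method), the path to the resource, the headers, and an optional body.
function parseRequest(raw) {
  const headerEnd = raw.indexOf("\r\n\r\n"); // a blank line separates headers from body
  const head = headerEnd === -1 ? raw : raw.slice(0, headerEnd);
  const body = headerEnd === -1 ? "" : raw.slice(headerEnd + 4);

  const [requestLine, ...headerLines] = head.split("\r\n");
  const [method, path] = requestLine.split(" "); // e.g. "PUT /comics/1 HTTP/1.1"

  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip malformed header lines in this sketch
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { method, path, headers, body: body === "" ? null : body };
}

const request = parseRequest(
  "PUT /comics/1 HTTP/1.1\r\n" +
  "Host: api.example.com\r\n" +
  "Content-Type: application/json\r\n" +
  "\r\n" +
  '{"title": "Action Comics #1", "grade": "VG"}'
);
console.log(request.method, request.path);  // PUT /comics/1
console.log(JSON.parse(request.body).grade); // VG
```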
Browser-Based Web Service Clients
• You can invoke many web services today from a web browser using
AJAX
• You download JavaScript code from the web service provider that
makes the AJAX calls.
• Therefore, you don’t worry about what protocol the web service provider
uses
Web APIs
• APIs (Application Program Interfaces) include JavaScript function calls
as well as web service interfaces
• Function calls may hide the details of web services
• A web service is an API, but not all APIs are web services
• Additional differences
• Web services are network based (by definition)
• APIs are protocol agnostic
Windowing Systems
• The window system (Graphic User Interface) on a desktop computer
displays content from many programs
• The window system is a server – sometimes called a display server
• Application programs that want to display content or interact with the
user are clients.
• GUI management and display features are typically implemented as
function calls, which use inter-process communication to make
requests to the window server
• Programs on another computer can access the GUI and devices
Windowing Systems
Windowing Systems
• Microsoft Windows was originally a distinct component built on top
of Microsoft DOS (Disk Operating System)
• In POSIX systems, Graphic User Interfaces have traditionally been based on
the X Window System (X-Windows) server, developed at MIT in 1984
• “The X Window System (X11) is an open source, cross platform, client-server
computer software system that provides a GUI in a distributed network
environment.”
• The Macintosh (macOS) windowing system, Quartz, is not derived from
X-Windows; it runs on Darwin, the open-source, POSIX-compliant core of macOS
System Software
• System Software is not necessarily part of the operating system, but it
includes specific knowledge of the underlying operating system and
hardware, including:
• Instruction set
• Memory architecture and management
• Process management
• Network architecture
• System software enables program portability, by isolating
programmers and users from underlying system details
System Software
• System software has grown in complexity
• Successfully writing system software requires:
• A knowledge of system specifics
• A knowledge of algorithms and data structures that have been developed
over decades
For Next Class
• Log in to Canvas and complete Assignment 10
CMPE 220
Class 13
Servers (and client/server applications)
The Client / Server Model
• A server is a program that runs on a computer, providing a specific
service to other program(s), called clients
• In the POSIX world, a server is often called a daemon (demon)
• a long-running background process that answers requests for services
• The server program may launch multiple processes, as needed
• The software that makes up a server may - or may not - have system
dependencies
• Examples:
• Web Server
• Database Management System
• Windowing System
• FTP Server
• Mail Server
Basic Web Protocols
Web Browsers use two basic services
• Domain Name Server (DNS Server)
• Looks up a domain name to get an IP address
• User Datagram Protocol (UDP) – a low latency protocol - on port 53
• Web Server
• Responds to web page requests
• HyperText Transfer Protocol (HTTP) – over TCP - on port 80
• HyperText Transfer Protocol - Secure (HTTPS) – over TCP - on port 443
Layered Protocols (TCP/IP Model)
Network Model Layer | Domain Name Lookup | Web Page Fetch
Application Layer | DNS request | HTTP or HTTPS
Transport Layer | UDP – User Datagram Protocol | TCP – Transmission Control Protocol
Internet Layer | IP – Internet Protocol | IP – Internet Protocol
Network Interface Layer | Ethernet, etc. | Ethernet, etc.
Leveraging Protocols
• Once protocols (such as http and https) are defined, they can be used
to expand services
• Additional protocols may be defined, either on top of existing
protocols, or as complements
• Protocols are the glue that allows client-server application systems to
be robust and extensible
• Protocols allow an ecosystem of services
Web Services and APIs
Web Services
• A piece of software that provides a function or service over the
Internet, using a standard (often XML- or JSON-based) interface
• Client applications make requests to the service, and get back
results
• A web service is a type of API, but not all APIs are web services
Traditional Web Services (ca. 2007)
• Traditional web services are described by a service contract written in
the Web Services Description Language (WSDL)
• The WSDL document is an XML document that provides a machine-readable
description of how the service can be called
• The WSDL document and the request and response messages are transmitted
over http or https
• https://www.w3.org/TR/2001/NOTE-wsdl-20010315
• W3 or W3C = World Wide Web Consortium
Traditional Web Services
• Messages use the Simple Object Access Protocol (SOAP)
• SOAP is an XML-based format
• https://www.w3schools.com/xml/xml_soap.asp
• https://www.w3.org/TR/soap/
• Runs on top of HTTP/HTTPS
• SOAP allows developers to invoke processes running on different
operating systems to authenticate, authorize, and communicate
using XML data
RESTful Web Services
• A simpler approach to web services is REST
• Representational State Transfer
• A software architecture style consisting of guidelines and best
practices for creating scalable web services
• REST is not an interface standard
• https://www.w3.org/2001/sw/wiki/REST
• https://www.codecademy.com/article/what-is-rest
The REST Architecture
• REST systems are stateless, meaning that the server does not
need to know anything about what state the client is in
and vice versa
• Client and server implementations are independent; they
interact only through messages
• In the REST architecture, clients send requests to retrieve or modify
resources, and servers send responses to these requests
REST Requests
• REST requires that a client make a request to the server in order to
retrieve or modify data on the server. A request generally consists of:
• an HTTP verb, which defines what kind of operation to perform (GET, POST,
PUT, DELETE)
• a header, which allows the client to pass along information about the
request
• a path to a resource
• an optional message body or payload containing data
REST Responses
• 200 (OK): This is the standard response for successful HTTP requests.
• 201 (CREATED): This is the standard response for an HTTP request that resulted in
an item being successfully created.
• 204 (NO CONTENT): This is the standard response for successful HTTP requests, where
nothing is being returned in the response body.
• 400 (BAD REQUEST): The request cannot be processed because of bad request syntax,
excessive size, or another client error.
• 403 (FORBIDDEN): The client does not have permission to access this resource.
• 404 (NOT FOUND): The resource could not be found at this time. It is possible it was
deleted, or does not exist yet.
• 500 (INTERNAL SERVER ERROR): The generic answer for an unexpected failure if there is
no more specific information available.
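A toy sketch of how a server might map REST verbs and paths to the status codes above, using an in-memory /comics collection. The resource, data, and function name are hypothetical; a real service would sit behind an HTTP server and a database. For unsupported verbs it returns 405 (Method Not Allowed), a standard code not listed in the table.

```javascript
// A toy, in-memory REST-style handler for a hypothetical /comics resource.
// Each call receives everything it needs in the request; the stored comics
// are server-side resource data, not client session state.
const comics = new Map([[1, { id: 1, title: "Action Comics #1" }]]);
let nextId = 2;

function handle({ method, path, body = null }) {
  const match = /^\/comics(?:\/(\d+))?$/.exec(path);
  if (!match) return { status: 404 }; // unknown resource
  const id = match[1] === undefined ? null : Number(match[1]);

  switch (method) {
    case "GET":
      if (id === null) return { status: 200, body: [...comics.values()] };
      return comics.has(id) ? { status: 200, body: comics.get(id) } : { status: 404 };
    case "POST": {
      if (id !== null || body === null) return { status: 400 }; // POST to the collection, with data
      const item = { ...body, id: nextId++ };
      comics.set(item.id, item);
      return { status: 201, body: item }; // created
    }
    case "PUT":
      if (id === null || body === null) return { status: 400 };
      if (!comics.has(id)) return { status: 404 };
      comics.set(id, { ...body, id });
      return { status: 200, body: comics.get(id) };
    case "DELETE":
      if (id === null) return { status: 400 };
      return comics.delete(id) ? { status: 204 } : { status: 404 }; // 204: nothing to return
    default:
      return { status: 405 }; // Method Not Allowed
  }
}

console.log(handle({ method: "POST", path: "/comics", body: { title: "Detective Comics #27" } }).status); // 201
console.log(handle({ method: "DELETE", path: "/comics/2" }).status); // 204
console.log(handle({ method: "GET", path: "/comics/2" }).status);    // 404
```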
Browser-Based Web Service Clients
• You can invoke many web services today from a web browser using
AJAX
• You download JavaScript code from the web service provider that
makes the AJAX calls.
• Therefore, you don’t worry about what protocol the web service provider
uses
Web APIs
• APIs (Application Program Interfaces) include JavaScript function calls
as well as web service interfaces
• Function calls may hide the details of web services
• A web service is an API, but not all APIs are web services
• Additional differences
• Web services are network based (by definition)
• APIs are protocol agnostic
A few FREE web services
• Map zip codes to city & state: https://www.zipcodeapi.com/API
• Look up movie info: https://www.omdbapi.com/
• Google Maps: https://developers.google.com/maps/
• Google Translate: https://cloud.google.com/translate/docs
• Weather forecast info: https://openweathermap.org/
• Dictionary lookup: https://dictionaryapi.com/
• https://www.freecodecamp.org/news/public-apis-for-developers/
Example Web API: Google Maps
• Google provides a large number of web services, such as
Google Maps.
• See https://developers.google.com/maps/documentation/javascript
• Load and incorporate the JavaScript API
Google Maps Example: html
<html>
<head>
<title>Example 01 - Google Map Demo</title>
<link rel="stylesheet" type="text/css" href="./Gmaps.css" />
<script type="module" src="./Gmaps.js"></script>
</head>
<body>
<h3>Google Maps Demo</h3>
<!--The div element for the map -->
<div id="map"></div>
<!-- Load the Maps JavaScript API: key= is the API key; callback= names the function to run once the API has loaded -->
<script src="https://maps.googleapis.com/maps/api/js?key=AIzaSyDrRNPlF1ug4lBA28qR4xP8NkLPAZrZQrk&callback=initMap&v=weekly" defer></script>
</body>
</html>
Gmaps.html
Google Maps Example: css
/* Set the size of the div element that contains the map */
#map {
height: 400px; /* The height is 400 pixels */
width: 100%; /* The width is the width of the web page */
}
Gmaps.css
Google Maps Example: js
// Initialize and add the map
function initMap() {
  // The location of Uluru (latitude & longitude)
  const uluru = { lat: -25.344, lng: 131.031 };
  // The map, centered at Uluru, Australia
  const map = new google.maps.Map(document.getElementById("map"), {
    zoom: 4, // initial zoom factor
    center: uluru,
  });
  // The marker, positioned at Uluru
  const marker = new google.maps.Marker({
    position: uluru,
    map: map,
  });
}
// Gmaps.js is loaded as a module, so initMap must be attached to window
// for the API's callback=initMap parameter to find it
window.initMap = initMap;
Gmaps.js
Examples
• Gmap example:
• http://cos-cs106.science.sjsu.edu/~012755158/Class-11/Gmaps.html
• https://robertnicholson.info/cs174/Class-11/Gmaps.html
• Animated map example:
• http://californiamissionguide.com/california-mission-guide/california-mission-map/
• Locations example:
• https://pavbhajihut.com/locations/
Mission Map Animation
// Assumes locations and location_count are globals declared elsewhere in mission_map.js (not shown)
function setlocations()
{
  location_count = 0;
  // Fields: mission number, name, street address, city/state/zip, date founded, latitude, longitude
  locations[location_count] = new Array("1", "San Diego de Alcalá", "10818 San Diego Mission Road",
    "San Diego, CA 92108", "July 16, 1769", 32.790738, -117.106018); location_count++;
  locations[location_count] = new Array("2", "San Carlos Borromeo de Carmelo", "3080 Rio Road",
    "Carmel, CA 93923", "June 3, 1770", 36.550741, -121.92009); location_count++;
  locations[location_count] = new Array("3", "San Antonio de Padua", "End of Mission Road",
    "Ft. Hunter-Liggett Reservation<br />Jolon, CA 93928", "July 14, 1771", 36.016615, -121.249666); location_count++;
  // ... the remaining missions follow the same pattern (excerpt)
}
mission_map.js
Getting Latitude & Longitude
• Enter a location in Google Maps to find out its latitude & longitude
• https://www.google.com/maps
Google API Keys
• Google requires you to get an API key in order to use their services
• You need to provide a credit card number
• Google will charge if you exceed a usage threshold
• If you don’t enable charges, they will simply block the service when you
exceed the usage threshold
• Go to the Google Cloud Console
Getting a Google API Key
• Go to the Google Cloud Console:
• https://console.cloud.google.com
• Create or select a project
• Click Continue to enable the API and any related services
• On the Credentials page, get an API key (and set the API key
restrictions). Note: If you have an existing unrestricted API key, or a
key with browser restrictions, you may use that key
How Do Server Apps Use Web Services?
• Web services are invoked from the browser
• JavaScript code can communicate with the server app to get
parameters that are then used to connect with the service
• EXAMPLE:
• A server application stores my comic book collection
• I can pull up a comic record from my collection into the browser
• JavaScript code can call a “Comic Price Guide” service to get the current
price
Email Protocols
Email Protocols
• Simple Mail Transfer Protocol (SMTP)
• Post Office Protocol (POP)
• Internet Message Access Protocol (IMAP).
• All three use TCP, and the last two are used for accessing electronic
mailboxes.
• Special records stored in DNS servers play a role as well, using UDP.
Mail and DNS
• DNS servers hold several record types
• A DNS 'mail exchange' (MX) record directs email to a
mail server
• Web servers (A records) can be independent from mail
servers (MX records)
Primary DNS Record Types
• A Record: The Address Mapping record (or DNS host record) stores a
hostname and its corresponding IPv4 address
• AAAA Record: The IP Version 6 Address record also stores a hostname
but points the domain to its corresponding IPv6 address
• DNS CNAME Record: The Canonical Name record can be used as a
hostname alias that points to another domain or subdomain but not
to an IP address
• MX Record: The Mail Exchanger record indicates an SMTP email
server for the domain
• TXT Record: A text (TXT) record can store any type of
descriptive information in text format. Often used for
authentication
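To show how the record types work together, here is a sketch of an in-memory zone for the hypothetical domain example.com, with a CNAME alias, two MX records, and an SPF policy in a TXT record. A real resolver sends queries to DNS servers over UDP port 53 (Node.js exposes this as dns.promises.resolveMx, among others); this sketch models only the lookup logic, and it omits the rule that mail falls back to the A record when no MX record exists.

```javascript
// A hypothetical DNS zone for example.com. The addresses come from the ranges
// reserved for documentation (192.0.2.0/24 and 2001:db8::/32).
const zone = [
  { name: "example.com", type: "A", value: "192.0.2.10" },
  { name: "example.com", type: "AAAA", value: "2001:db8::10" },
  { name: "www.example.com", type: "CNAME", value: "example.com" },
  { name: "example.com", type: "MX", preference: 20, value: "backup-mail.example.com" },
  { name: "example.com", type: "MX", preference: 10, value: "mail.example.com" },
  { name: "mail.example.com", type: "A", value: "192.0.2.25" },
  { name: "backup-mail.example.com", type: "A", value: "192.0.2.26" },
  { name: "example.com", type: "TXT", value: "v=spf1 mx -all" }, // SPF: only the MX hosts may send mail for example.com
];

function lookup(name, type) {
  return zone.filter((record) => record.name === name && record.type === type);
}

// Resolve a host name to IPv4 addresses, following CNAME aliases.
function resolveA(name, depth = 0) {
  if (depth > 8) throw new Error("CNAME chain too long");
  const [alias] = lookup(name, "CNAME");
  if (alias) return resolveA(alias.value, depth + 1);
  return lookup(name, "A").map((record) => record.value);
}

// Where should mail for someone@domain be delivered? Use the MX record with the
// lowest preference number, then look up that mail server's address.
function mailServerFor(domain) {
  const mx = lookup(domain, "MX").sort((a, b) => a.preference - b.preference);
  if (mx.length === 0) throw new Error(`no MX record for ${domain}`);
  return { host: mx[0].value, addresses: resolveA(mx[0].value) };
}

console.log(resolveA("www.example.com"));  // [ '192.0.2.10' ]
console.log(mailServerFor("example.com")); // { host: 'mail.example.com', addresses: [ '192.0.2.25' ] }
```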
Protocols and Data Types - MIME
• Although not a protocol, there is a series of
Multipurpose Internet Mail Extensions (just MIME, never
“MIMEs”) for various types of email attachments (not
just simple text).
• MIME types tell the receiver how to handle the
attachment
• MIME types are also used in web services
• https://www.w3docs.com/learn-html/mime-types.html
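A sketch of how a sender might choose a MIME type for an email attachment or a web response from the file extension. The table is a small, incomplete subset of registered types, and the function name is hypothetical.

```javascript
// A small subset of common MIME types, keyed by file extension.
const MIME_TYPES = new Map([
  [".html", "text/html"],
  [".txt", "text/plain"],
  [".css", "text/css"],
  [".js", "text/javascript"],
  [".json", "application/json"],
  [".pdf", "application/pdf"],
  [".png", "image/png"],
  [".jpg", "image/jpeg"],
  [".jpeg", "image/jpeg"],
]);

// Choose the MIME type for an attachment or an HTTP response body.
function mimeTypeFor(filename) {
  const dot = filename.lastIndexOf(".");
  const extension = dot === -1 ? "" : filename.slice(dot).toLowerCase();
  // Unknown types fall back to generic binary data.
  return MIME_TYPES.get(extension) ?? "application/octet-stream";
}

// The receiver decides how to handle the content from a header such as:
console.log(`Content-Type: ${mimeTypeFor("Gmaps.html")}`); // Content-Type: text/html
```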
SMTP
• SMTP stands for Simple Mail Transfer Protocol, and it is responsible for
sending email messages
• This protocol is used by email clients and mail servers to exchange emails
between computers
• A mail client and an SMTP server communicate over a TCP connection to a
designated port: port 25 is the traditional SMTP port (today used mainly for
server-to-server relay)
• Ports 587 (STARTTLS) and 465 (implicit TLS) are used for encrypted
submission from mail clients
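The commands a client sends during a simple SMTP session can be sketched as text (JavaScript; command names follow RFC 5321, and the host names and addresses are hypothetical). The server's numbered replies, such as "250 OK", are not modeled. On ports 587 and 465, the same dialogue runs inside an encrypted TLS connection (after a STARTTLS command on 587, from the start on 465).

```javascript
// Build the client side of a simple SMTP session that delivers one message.
function smtpSession({ clientHost, from, to, subject, text }) {
  const bodyLines = text
    .split("\n")
    // "Dot-stuffing": a line that begins with "." gets an extra "." so it
    // cannot be mistaken for the end-of-message marker.
    .map((line) => (line.startsWith(".") ? `.${line}` : line));

  const lines = [
    `EHLO ${clientHost}`,                           // identify the client
    `MAIL FROM:<${from}>`,                          // envelope sender
    ...to.map((address) => `RCPT TO:<${address}>`), // one command per recipient
    "DATA",                                         // the message itself follows
    `From: <${from}>`,                              // message headers...
    `To: ${to.map((address) => `<${address}>`).join(", ")}`,
    `Subject: ${subject}`,
    "",                                             // ...end with a blank line
    ...bodyLines,
    ".",                                            // a line with only "." ends the message
    "QUIT",
  ];
  return lines.map((line) => `${line}\r\n`).join(""); // SMTP lines end with CR LF
}

process.stdout.write(
  smtpSession({
    clientHost: "laptop.example.org",
    from: "alice@example.org",
    to: ["bob@example.com"],
    subject: "Hello",
    text: "Hi Bob,\n.see you at 10",
  })
);
```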
POP Deficiencies
• POP does not provide:
• The ability to mark a message as read on multiple devices
• The ability to synchronize sent items from multiple devices
• The ability for emails to be pushed to your device as they arrive
• The ability to create folders in your POP account
IMAP
• Internet Message Access Protocol
• A replacement for POP
• With IMAP accounts, messages are stored on a remote server
• Users can log in via multiple email clients on computers or mobile
devices and read the same messages
• All changes made in the mailbox are synced across multiple
devices, and messages are only removed from the server if the user
deletes the email
CMPE 220
Class 14
Embedded Systems
What is an Embedded System?
Wikipedia: An embedded system is a computer system—a
combination of a computer processor, computer memory,
and input/output peripheral devices—that has a dedicated function
within a larger mechanical or electrical system.
Embedded systems control many devices in common use today. Ninety-
eight percent of all microprocessors manufactured are used in
embedded systems.
(Figure: examples of embedded systems: cars; aircraft; boats & ships; drones; factory equipment & control systems; home security systems; home thermostats; computer peripherals; TVs & media players; toys)
Complex Circuits as Embedded Systems
• An “embedded system” may be a very complex circuit – typically
designed using a Hardware Description Language (HDL) – which is
capable of performing advanced algorithmic operations.
• For purposes of this class, that is not what we are discussing.
• By embedded system, we mean a recognizable “computer,” with a processor,
memory, and I/O capabilities, executing software that resides in memory.
Why Use Embedded Systems?
• Faster and cheaper to develop (standard computers & software,
rather than custom hardware).
• More flexible: problems can be fixed and features added with
software updates.
• May be cheaper to manufacture (mass-produced, off-the-shelf
computer chips rather than low-volume custom circuits).
History
• One of the earliest mass-produced
embedded systems was the Autonetics D-17
guidance computer for the Minuteman
missile, released in 1961.
• As with the ENIAC (1945), which was built for the US Army to
compute artillery trajectories, this significant advance in
computing was driven by military requirements.
History
• A major milestone in embedded systems: the Apollo Guidance Computer
(1965).
• Flew both the Apollo Command Module and the Lunar Excursion
Module (1969)
• The Apollo space program and
moon landing would not have
been possible without the
development of integrated
circuits (ICs) and embedded
systems.
IBM System/360
• First released 1964.
• Built with discrete transistors (IBM Solid Logic Technology
modules), not integrated circuits
• “Modern” operating system
Apollo Guidance Computer
• Integrated Circuits
• 16-bit words
• 72 KB of ROM for programs
• 4 KB of RAM
• Keypad control interface
• Functions:
• Displays System Status
• Navigates
• Flies the Apollo Command Module
• Lands the LEM on the moon
More History
• In 1968, the first embedded system for a car was released. The
Volkswagen 1600 used an electronic control unit to manage its fuel
injection system (not yet a microprocessor: the first commercial
microprocessor, the Intel 4004, appeared in 1971).
• The first microcontroller was developed by Texas Instruments in 1971.
The TMS 1000 series, which became commercially available in 1974,
contained a 4-bit processor.
• In 1987, the first embedded operating system, the real-time VxWorks,
was released by Wind River Systems.
Embedded Systems Today
• The Internet of Things (IoT) describes the network of physical objects
—“things”—that are embedded with sensors, software, and other
technologies for the purpose of connecting and exchanging data with
other devices and systems over the internet.
• These devices range from ordinary household objects to sophisticated
industrial tools.
• There are an estimated 10 billion devices connected to the Internet
today.
• This is a primary reason for IPv6: a 32-bit IPv4 address allows only
about 4.3 billion unique addresses
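The address-space arithmetic behind the IPv6 point can be checked directly. IPv4 addresses are 32 bits and IPv6 addresses are 128 bits; the device count below is the slide's estimate, not a measurement. (In practice, NAT also lets many devices share one IPv4 address, which is how IPv4 has stretched this far.)

```python
# Compare the IPv4 and IPv6 address spaces with the estimated device count.
IPV4_BITS = 32
IPV6_BITS = 128
ESTIMATED_DEVICES = 10_000_000_000   # the slide's estimate, not a measurement

ipv4_addresses = 2 ** IPV4_BITS      # 4,294,967,296
ipv6_addresses = 2 ** IPV6_BITS      # about 3.4 x 10^38

print(f"IPv4: {ipv4_addresses:,} addresses")
print(f"IPv6: {ipv6_addresses:.2e} addresses")
print("One IPv4 address per device possible?", ipv4_addresses >= ESTIMATED_DEVICES)
```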
Embedded System Differences
Different Architecture / System Characteristics
• Specialized Processors
• Cheap (mass produced, limited processing power)
• Small (simple architecture)
• Low power consumption, low heat radiation (slow)
• Limited memory
• No Memory Management Unit (MMU)
• No Disk
• OS and Application are loaded from ROM/EPROM (non-volatile memory)
• No “traditional” I/O devices
• No terminal, card reader, printer
• Specialized I/O devices: sensors & controllers (A-to-D and D-to-A)
• May have network capabilities (IoT)
Different Development Tools
Cross Development
• Cross-Compiler – Runs on a General Purpose (GP) computer;
generates assembly code for an embedded computer
• Cross-Assembler – Runs on a GP computer, assembles source code for
an embedded computer; emits binary object code for that computer
• Cross-Linker – Runs on a GP computer; links object files for an
embedded computer
• Absolute Loader – Runs on embedded computer; loads binary code
that was built on a GP computer
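An absolute loader does no relocation: every block of object code carries the fixed address where it must be placed, so the loader only copies bytes and then transfers control to the start address. The sketch below assumes a hypothetical object format of (address, bytes) records plus a start address, loosely in the spirit of the SIC header/text/end records; `TextRecord`, `ObjectProgram`, and `absolute_load` are illustrative names, not a real tool's format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextRecord:
    address: int  # absolute load address fixed at assembly/link time
    data: bytes   # machine-code bytes to place at that address


@dataclass
class ObjectProgram:
    records: List[TextRecord]
    start_address: int  # where execution begins


def absolute_load(program: ObjectProgram, memory: bytearray) -> int:
    """Copy each record into memory at its fixed address and return the entry point.

    No relocation is done: the addresses in the object program are used as-is,
    so the program must have been built for exactly this memory layout.
    """
    for rec in program.records:
        end = rec.address + len(rec.data)
        if rec.address < 0 or end > len(memory):
            raise ValueError(f"record at {rec.address:#06x} does not fit in memory")
        memory[rec.address:end] = rec.data
    return program.start_address  # a real loader would now jump to this address


# Example: a hypothetical 64-byte target memory and two records of arbitrary bytes.
memory = bytearray(64)
prog = ObjectProgram(
    records=[TextRecord(0x10, bytes.fromhex("001020")),
             TextRecord(0x13, bytes.fromhex("0c1023"))],
    start_address=0x10,
)
entry = absolute_load(prog, memory)
```

Because nothing is adjusted at load time, the loader can be tiny enough to live in the embedded system's ROM, which is why it pairs naturally with cross-development.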
Building Embedded System Software
High-Level Language Source Code (e.g. C++)
→ EmSys Cross-Compiler → EmSys Assembly Language Source Code
→ EmSys Cross-Assembler → Binary Machine Code
→ EmSys Linker (optional) → EmSys Executable Code
→ download → Embedded System Loader → In-Memory Code
→ Hardware Code Execution
Different Development Tools - Debugging
Expanded embedded system
• A version of the embedded system with expanded capabilities (more
memory, terminal interface, disk drive, etc.), used for software
development
• May support a full debugger, and even other development tools,
which run natively on the embedded system
Different Development Tools - Debugging
Remote Debugging
• Debugging tools run on GP computer.
• Debugger talks to a very simple “monitor” program on the embedded
computer in order to examine and set memory, and set breakpoints.
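To illustrate the embedded side of remote debugging, here is a minimal sketch of a monitor that accepts one-line commands from the host debugger to examine memory, set memory, and set breakpoints. The command syntax (`E addr`, `S addr value`, `B addr`) and the `Monitor` class are hypothetical; real monitors and debug protocols (for example, GDB's remote serial protocol) define their own formats.

```python
class Monitor:
    """Tiny debug monitor for a (simulated) embedded computer.

    Hypothetical command syntax, all numbers in hex:
      E addr        examine one byte of memory
      S addr value  set one byte of memory
      B addr        set a breakpoint
    """

    def __init__(self, memory_size: int = 256):
        self.memory = bytearray(memory_size)  # stands in for target RAM
        self.breakpoints = set()              # addresses where execution stops

    def _check(self, addr: int) -> int:
        """Reject addresses outside the target's memory."""
        if not 0 <= addr < len(self.memory):
            raise IndexError(addr)
        return addr

    def handle(self, command: str) -> str:
        """Process one command line from the host debugger; return the reply text."""
        parts = command.split()
        if not parts:
            return "ERR empty command"
        op = parts[0].upper()
        try:
            args = [int(a, 16) for a in parts[1:]]
            if op == "E" and len(args) == 1:
                return f"{self.memory[self._check(args[0])]:02X}"
            if op == "S" and len(args) == 2:
                self.memory[self._check(args[0])] = args[1] & 0xFF
                return "OK"
            if op == "B" and len(args) == 1:
                self.breakpoints.add(self._check(args[0]))
                return "OK"
        except (ValueError, IndexError):
            return "ERR bad argument"
        return "ERR unknown command"

    def should_stop(self, pc: int) -> bool:
        """Checked by the execution loop before each instruction."""
        return pc in self.breakpoints


mon = Monitor()
replies = [mon.handle("S 20 7F"), mon.handle("E 20"), mon.handle("B 40")]
```

All the heavy lifting (symbol tables, source display, stepping logic) stays on the GP computer; the monitor only needs a few primitive operations, which keeps it small enough for a memory-constrained target.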
Different Development Tools - Debugging
In-Circuit Emulator (ICE)
• A hardware interface connected
to a GP computer “plugs in” to a
circuit or device, completely
replacing the embedded system.
• The GP computer supports a full
development environment, and
emulates the embedded system
at a circuit level.
Different Development Tools - Debugging
Joint Test Action Group (JTAG)
• Interface capabilities built into the embedded processor support
external access to memory and control signals for purposes of
debugging and testing.
• Interfaces can be accessed and controlled by a GP computer.
• A standard codified by the IEEE in 1990 (IEEE 1149.1).
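Inside the chip, the JTAG port is driven by a small state machine, the Test Access Port (TAP) controller, which the host steps through by setting the TMS (Test Mode Select) signal on each rising edge of TCK (Test Clock). The table below encodes the 16 TAP states and their TMS=0 / TMS=1 transitions from IEEE 1149.1. One useful property is that holding TMS high for five clocks reaches Test-Logic-Reset from any state. This is a software model for illustration only, and `clock_tms` is a hypothetical helper, not a driver for real JTAG hardware.

```python
# TAP controller transitions: state -> (next state if TMS=0, next state if TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR",    "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR",      "Exit1-DR"),
    "Shift-DR":         ("Shift-DR",      "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR",      "Update-DR"),
    "Pause-DR":         ("Pause-DR",      "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR",      "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR",    "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR",      "Exit1-IR"),
    "Shift-IR":         ("Shift-IR",      "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR",      "Update-IR"),
    "Pause-IR":         ("Pause-IR",      "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR",      "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def clock_tms(state, tms_bits):
    """Apply one TMS value per TCK rising edge and return the final TAP state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][1 if tms else 0]
    return state


# From reset, TMS = 0,1,0,0 reaches Shift-DR, where data bits can be shifted in and out.
shift_dr = clock_tms("Test-Logic-Reset", [0, 1, 0, 0])
```

A host debugger uses exactly this kind of walk: it navigates to Shift-IR to load an instruction (for example, a debug-access command), then to Shift-DR to move the associated data through the chip.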
Different Development Tools - Debugging
Simulation
• A simulator running on a GP computer can execute programs written
for the embedded computer.
• Interprets the instruction set.
• For useful development of embedded systems, may need to simulate
I/O devices (sensors, controllers) as well.
• Typically used in very “big budget” organizations, such as car
companies, aircraft companies, etc.
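At its core, a simulator is a fetch-decode-execute loop over the target's instruction set, plus models of the devices the program talks to. The sketch below uses a hypothetical six-instruction accumulator machine (not SIC or any real processor) and a simulated temperature sensor, so a small fan-control program can be exercised on a GP computer without the hardware. The opcodes, port numbers, and `simulate` function are all illustrative assumptions.

```python
def simulate(program, read_port, max_steps=1000):
    """Interpret a program for a hypothetical accumulator machine.

    program   : list of (opcode, operand) pairs; the list index is the address
    read_port : function modelling input devices, read_port(port) -> int
    Returns the list of (port, value) writes, i.e. what the output
    devices (e.g. a fan controller) would have received.
    """
    acc, pc, writes = 0, 0, []
    for _ in range(max_steps):
        op, arg = program[pc]      # fetch
        pc += 1
        if op == "LOADI":          # decode and execute
            acc = arg
        elif op == "IN":
            acc = read_port(arg)
        elif op == "SUBI":
            acc -= arg
        elif op == "JNEG":
            if acc < 0:
                pc = arg
        elif op == "OUT":
            writes.append((arg, acc))
        elif op == "HALT":
            return writes
        else:
            raise ValueError(f"illegal opcode {op!r} at address {pc - 1}")
    raise RuntimeError("step limit reached (runaway program?)")


# Hypothetical ports: 0 = temperature sensor, 1 = fan (0 = off, 1 = on).
FAN_PROGRAM = [
    ("LOADI", 0), ("OUT", 1),   # 0-1: fan off by default
    ("IN", 0),                  # 2:   read temperature
    ("SUBI", 30),               # 3:   acc = temp - 30
    ("JNEG", 7),                # 4:   below 30, skip to HALT
    ("LOADI", 1), ("OUT", 1),   # 5-6: fan on
    ("HALT", None),             # 7
]

hot = simulate(FAN_PROGRAM, lambda port: 35)   # simulated sensor reads 35
cool = simulate(FAN_PROGRAM, lambda port: 25)  # simulated sensor reads 25
```

Simulating the I/O devices is what makes this useful: test inputs can be scripted (including rare or dangerous conditions) and every output write can be checked automatically.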
Languages for Embedded Systems
• Assembly Languages
• Traditional Languages: C, C++, Python, Ada (DoD)
• Specialized Languages: Rust, Go / Golang (Google)
Different Operating Systems
• Traditional / General Purpose: POSIX (Unix, Linux)
• Embedded: Embedded Linux
Is a Smartphone an Embedded System?
Low Power Consumption / Low Heat ✓
No Disk ✓
Cross-Development Tools ✓
Limited Memory / No MMU X
No Traditional I/O Devices X
Single Purpose X
What Does an (Embedded) OS Do?
1. Process Management: Maybe
2. Input / Output (I/O) Management: Yes
3. Memory Management: No
4. File System Management: No
5. System Functions and Kernel Mode: Limited
6. User Interaction: No
7. Network Communications: Maybe
Security
• Traditional: Since applications are “friendly,” there are fewer security
requirements. The system doesn’t need to protect itself against
malicious applications.
• Today: IoT opens up the possibility of remote hacking!
• In 2015, security researchers demonstrated
the ability to take control of a Jeep
Cherokee while driving on a highway.
• There is no agency that regulates the
security of embedded systems.
Real-Time Operating Systems (RTOS)
• A real-time operating system (RTOS) is an operating system (OS)
intended to serve real-time applications. Real-time applications must
respond to inputs within a specified time.
• Often a requirement for embedded systems
• GP operating systems usually cannot guarantee a response time, for a
number of reasons:
• Too many processes might be in the process queue
• Interrupt processing and process switching may be slow operations
• Important code may be “swapped out” to disk
• However, there are specialized real-time operating systems for traditional
general-purpose computers; such an operating system must address and
overcome each of these problems
Types of RTOS
• Asynchronous (Event Response)
Responds to asynchronous events. For example, if a proximity alert
occurs on a self-driving car, evasive action may be triggered.
• Powerful and flexible
• Difficult to prove & guarantee real-time response
• Continuous (Closed Loop)
Constantly monitors inputs and adjusts outputs. For example,
monitors temperature sensors and controls fans.
• Can prove maximum response time by computing longest code path
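A continuous (closed-loop) system is often written as a fixed-period loop: read the inputs, compute, write the outputs, repeat. Because each pass runs the same bounded code and never waits on an event, its worst-case time is the longest path through one pass, which is what makes the response time provable. The sketch below assumes hypothetical `read_temp` and `set_fan` hooks and an on/off threshold with a small hysteresis band; on real hardware these would be sensor and fan driver calls, and a periodic timer would pace the loop.

```python
def control_step(temp_c, fan_on, on_above=30.0, off_below=27.0):
    """One pass of the control computation: fan state from current temperature.

    Branch-only code (no loops, no waiting), so its worst-case
    execution time is easy to bound.
    """
    if temp_c > on_above:
        return True
    if temp_c < off_below:
        return False
    return fan_on  # within the band: keep the current state (hysteresis)


def run_loop(read_temp, set_fan, cycles):
    """Run a fixed number of control cycles.

    A real controller would loop forever, each pass triggered by a periodic
    timer, so the worst-case response is about one period plus one pass.
    """
    fan_on = False
    for _ in range(cycles):
        fan_on = control_step(read_temp(), fan_on)  # read input, compute
        set_fan(fan_on)                             # write output
    return fan_on


# Simulated sensor readings and a list standing in for the fan driver.
readings = iter([25, 31, 29, 26])
fan_log = []
run_loop(lambda: next(readings), fan_log.append, cycles=4)
```

By contrast, an asynchronous (event-response) design would run a handler only when an event arrives, which is more flexible but makes the worst case depend on how events can pile up.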
Further RTOS Classifications
How Rigid are the Real-Time Requirements?
• Hard: Guaranteed response time.
• Flight control systems, robots, drones.
• Firm: Range of response times acceptable. Failure to meet the
desired response time is undesirable, but not catastrophic.
• Assembly line automation.
• Soft: Failure to meet desired response times degrades system
performance, but consequences are minimal.
• Human-facing applications.
RTOS Adaptations
• Prioritized Scheduling
• Minimized Interrupt Latency
• No “blocking” code
• No User/Kernel mode switches
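Prioritized scheduling usually means the dispatcher always runs the highest-priority task that is ready, preempting lower-priority work. A minimal sketch of that selection rule follows. The `Task` fields and the convention that a larger number means higher priority are assumptions for illustration; some RTOSes (VxWorks among them) use the opposite convention, with 0 as the highest priority.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    priority: int       # larger number = more urgent (assumed convention)
    ready: bool = True  # False while waiting for an event or timer


def pick_next(tasks: List[Task]) -> Optional[Task]:
    """Return the highest-priority ready task, or None if nothing is ready.

    Ties go to the task listed first. Many RTOSes keep per-priority ready
    lists so this choice takes constant time instead of a linear scan.
    """
    best = None
    for t in tasks:
        if t.ready and (best is None or t.priority > best.priority):
            best = t
    return best


tasks = [Task("logger", 1), Task("fan_control", 5), Task("brake_monitor", 9, ready=False)]
first = pick_next(tasks).name    # "fan_control": brake_monitor is not ready yet
tasks[2].ready = True            # e.g. a sensor interrupt makes it ready
second = pick_next(tasks).name   # "brake_monitor" now preempts fan_control
```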
Different System Software Services
• Traditional general-purpose computers may run various servers: FTP,
email, database, http/web, etc.
• Embedded systems may run a “network OS” and support standard
network protocols
• Surprisingly, many embedded systems run a very lightweight web
server
• Limited capabilities
• Supports http/https protocols
• Used to provide a user interface to the system
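A minimal sketch of the idea follows, using Python's standard `http.server` module purely as a stand-in: it serves one read-only status page over HTTP, one request at a time. An actual device would typically use a small web server written in C and real sensor readings; `DEVICE_STATUS`, `render_status_page`, `serve`, and port 8080 are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical device state that the page reports.
DEVICE_STATUS = {"temperature_c": 24.5, "fan": "off", "uptime_s": 3600}


def render_status_page(status):
    """Build a tiny HTML page from a dictionary of device status values."""
    rows = "".join(f"<li>{key}: {value}</li>" for key, value in status.items())
    return f"<html><body><h1>Device Status</h1><ul>{rows}</ul></body></html>".encode("utf-8")


class StatusHandler(BaseHTTPRequestHandler):
    """Answers GET / with the status page; everything else is 404."""

    def do_GET(self):
        if self.path != "/":
            self.send_error(404)
            return
        body = render_status_page(DEVICE_STATUS)
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve the status page until interrupted (single-threaded, small footprint)."""
    HTTPServer(("", port), StatusHandler).serve_forever()
```

Calling `serve()` and pointing a browser at the device's address on port 8080 would show the page. Using the browser as the display means the device itself needs no screen or keyboard, which is why routers and similar appliances take this approach.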
Embedded System Example: Kiln Controller
• Allows the user to program a
firing cycle: when it starts, how
quickly the temperature rises,
the max temperature, etc.
• A specialized device made by one
company (Bartlett) that is found
on virtually every kiln sold in the
United States, from $1,500
hobbyist models to $200,000
commercial kilns.
• Local / manual interface
Embedded System Example: Cable Modem
• A cable modem & router uses an embedded computer system
• Most cable modems & routers run their own web server, so you can
manage and configure the router using a web browser
Embedded System Example: Pool Controller
• Uses a smartphone app or browser to
control pool functions remotely
Embedded Systems in My House
• TV(s)
• Printer(s)
• Cable Modem / Router
• Thermostat
• Air Purifier
• Roomba vacuum
• Oven
• Microwave
• Refrigerator
• Dishwasher
• Clothes Washer
• Clothes Dryer
• Bathroom Scale
• Pool Controller
• Kiln Controller
• Sprinkler System Controller
• Security System
• Video Cameras
Writing Embedded System Applications
• Roughly 3-5% of software developers work on embedded systems.
• The development process is more cumbersome.
• Libraries and system functions are limited.
• Programmers need to focus on performance and efficient use of
memory.
• Programmers may need a deep understanding of specific hardware
capabilities.
Arduino
• Arduino is a microcontroller on a board, used for educational
purposes and limited commercial development
• Developed in 2005 at the Interaction Design Institute Ivrea (IDII) in
Ivrea, Italy
• No independent OS; library functions are linked into the application
program
• Inexpensive; can be built into commercial products ($4-25)
Raspberry Pi
• Raspberry Pi is a computer on a board, often used in educational
environments
• Developed in 2012 by the Raspberry Pi Foundation and Broadcom
• Supports monitor, mouse, and keyboard
• Can be interfaced to external sensors and devices to teach some
basics of embedded systems programming
• A full, general purpose processor and operating system (Linux)
• Significant RAM (256MB to 8GB)
• No disk
• Too expensive to embed in many real-world applications ($35-75)
Alternatives
Many single-board microcontrollers & computers are available for
education, development, and embedded applications.
Following a Trend…
• Just as with every other technology we’ve looked at this semester,
embedded systems – and embedded system applications – are
becoming more powerful and more complex
• At the same time, the systems are becoming smaller, cheaper, and
more widespread – meaning more jobs for software developers
• Better tools!
For Next Week
• Log in to Canvas and complete Assignment 7
Midterm Next Monday
• Open Book, Open Notes
• We will not use the lockdown browser
• A mix of multiple choice, fill-in-the-blanks, and short-answer
questions
• Bring your laptops!
Next Week: Midterm (Intro)
• Historical figures: Grace Hopper, Kathleen Booth, David Wheeler,
Dennis Ritchie, Ken Thompson, Edsger Dijkstra, David Patterson
• Key dates: first assembler, punched cards, first binary computer, first
linker, first command-line shell, make, ASCII / UTF characters, first IDE,
structured programming, etc.
• What are Unix / Linux / POSIX systems?
Command Line Interfaces
• Basic shell commands (cat, ls, cd, pwd) and concepts (pipes, I/O
redirection)
• File permissions and ownership
• Makefiles and make rules
Midterm (Architecture)
• BCD arithmetic
• Character set representations (ASCII, EBCDIC, Unicode, UTF-8, UTF-16)
• Floating point representations
• Addressing modes (immediate, displacement, indirect, register, stack)
• RISC versus CISC
• Pipelining
• Microprogramming
• The SIC and SIC/XE instruction sets
• Short programs
Machine (Architectural Directions)
• Integrating functions onto processor chip
• Shifting function to MMU
• Quantum computers
Midterm (The Software Build Cycle)
High-Level Language Source Code (e.g. C++)
→ Compiler → Assembly Language Source Code
→ Assembler → Binary Machine Code
→ Linker (optional) → Executable Code
→ Loader → In-Memory Code
→ Hardware Execution
Midterm (The Software Development Cycle)
• Two-pass and single-pass assemblers
• Relocating linkers
• Dynamic libraries
• Absolute versus relocating loaders
• IDEs
• Smart editors
• Version control
• Debuggers & breakpoints
• Macro Languages and Macro-Processors
Software Development Concepts
• Structured Programming
• Sequence
• Selection
• Iteration
• Pseudocode
Compilers
• Scanning (lexical processing)
• Finite State Machines
• Parsing (syntactic processing)
• Code Generation
• Optimization
Operating Systems
1. Process Management
• Interprocess Communications
2. Input / Output (I/O) Management
3. Memory Management
4. File System Management
5. System Functions and Kernel Mode
6. User Interaction – (maybe)
7. Network Management
OS Details
1. Process Management
• Interprocess Communications
• Process Control Blocks (PCBs)
• Schedulers
• Dispatchers
2. Input / Output (I/O) Management
• I/O Control Blocks
• I/O Wait States
3. Memory Management
• Virtual Memory
• Memory Management Units (MMUs)
Operating Systems
4. File System Management
• Directories
• File system cleanup
5. System Functions and Kernel Mode
• User mode versus kernel mode
• Context switching
6. User Interaction – (maybe)
• Shells
• Windowing systems
Operating Systems
7. Network Management
• Protocols
• Ports
Types of Operating Systems
• Distributed Operating Systems
• Network Operating Systems
• Cloud Operating Systems
• Object Oriented Operating Systems
Client-Server Applications
• OSI versus TCP/IP Stack
• Protocols
• IP addressing
• IPv4 versus IPv6
• Ports
• UDP, TCP, HTTP, HTTPS
• Client-Server Applications
• FTP
• DNS lookup
• Web Servers
• Web Services and Applications
• Email