
UNIX System and Network

Performance Tuning

A tutorial introducing the gentle art of system performance tuning for UNIX
system managers and like-minded individuals, wherein we propose to explain
how, although systems grow faster with each passing year, they are never
fast enough.

Topics

❍ Introductory concepts
❍ CPU systems
❍ Memory systems
❍ Disk systems
❍ Windows, graphics and databases
❍ Networks
❍ Benchmarks
❍ Tuning up applications
❍ FINAL EXAM

Copyright (C) 1999, Marcus J. Ranum - all rights reserved
Introductory concepts

❍ What are the biases of this course?


❍ The fundamental laws of system performance
❍ The fundamental laws of performance tuning
❍ Overview of hardware and O/S interaction

Course Biases

❍ Much of system tuning is system and O/S dependent
❍ Examples based loosely on BSD UNIX
• Example text reformatted to fit on viewgraphs
❍ When you try to do this stuff yourselves you will
need to read the manual for your particular UNIX
flavor
• Pay close attention to the “see also” sections on performance
related manuals
• Some vendors provide better tools than others

Fundamental laws of system
performance

❍ You can’t have too much RAM


❍ The computer is never fast enough
❍ There is always one more bottleneck
❍ Every 2 years your computer is obsolete
❍ Every new software upgrade will cost performance
❍ Every new O/S upgrade will cost performance
❍ Performance is money
❍ Money can’t buy happiness but it can buy
performance

Fundamental laws of performance tuning

❍ Be a scientist
• FIRST: Measure
• SECOND: Hypothesize
• THIRD: Test/Verify/Fiddle with things
• GOTO FIRST
❍ Change only one thing at a time
❍ Note what you changed
❍ Save measurement data
❍ TANSTAAFL: There Ain’t No Such Thing As A
Free Lunch
❍ Don’t look for a panacea
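The "note what you changed, save measurement data" discipline can be sketched as a couple of shell functions. This is a hypothetical convenience wrapper, not anything shipped with UNIX; the log path and record format are made up for illustration.

```shell
#!/bin/sh
# Append a timestamped record of each tuning change, plus measurement
# snapshots, to one log file so changes can be correlated with results.
LOG=${TUNELOG:-/tmp/tuning.log}

note_change() {
    # $* is a free-form description of the SINGLE change just made
    echo "CHANGE $(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG"
}

snapshot() {
    # record one line of measurement data; real use would capture
    # uptime/vmstat output here instead of a caller-supplied string
    echo "DATA   $(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG"
}

note_change "raised buffer cache from 10% to 15% of RAM"
snapshot "load average: 0.34 0.11 0.01"
```

Grepping the log later shows exactly which change preceded which measurement.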
The uncertainty principle

❍ Quantum mechanics tells us that it is impossible
to simultaneously measure both a particle’s
location and its speed
❍ Heisenberg’s uncertainty principle indicates that
the act of measuring a system affects it - thereby
throwing the measurement off
❍ The same applies for performance tuning!
• Most system monitoring programs are resource pigs!
• Some fancy X-windows based monitoring tools can completely
crunch a system!
• You must be able to factor (roughly) the effect of measuring into
your measurements

Hardware and O/S interaction

❍ The key to understanding system performance is to
understand how your software uses the underlying
hardware
❍ Most performance problems are a result of one of:
• Unbalanced systems
• Incorrect algorithms
• System bugs
• Resource starvation

Hardware and O/S interaction:
unbalanced systems

❍ System capabilities must be on par to take
advantage of each other:
• If you have a super hot CPU connected via a 14.4K PPP link
your network will be slow even if you upgrade your CPU
• If you have a super hot CPU on a system with slow disk drives it will
spend most of its cycles waiting for the disk to spin
• If you have an incredibly fast disk connected to a slow controller
it cannot transfer data faster than the controller
❍ Balancing a system depends largely on what you
want to do with it

Hardware and O/S interaction: incorrect algorithms

❍ Sometimes system capabilities are ignored or even
defeated by using the wrong algorithms, either in
the kernel or in applications:
• One example is the System V filesystem, which (unwittingly) acted to
defeat disk track caches
• UFS (aka: “fast file system”) disk geometry logic makes no sense
on SCSI disks but is incredibly important on CDROMs
❍ What was a smart way to do things years ago may
be a bad idea today because of changing technology
❍ If you are on or near the cutting edge you will get
cut

Hardware and O/S interaction:
incorrect algorithms - example

[Diagram: System V layout - inodes at the front of the disk, data
blocks behind them, and the modern SCSI disk’s cached track at the
far edge]

• To write a file means updating inode and data block
• Requires a seek from inside to outside of disk for each write
• Seeks are slow
• Seeks also discard modern SCSI disk track caches

Hardware and O/S interaction: system bugs

❍ Bugs can manifest as performance problems


• Network problems may cause NFS errors and retransmits
which will appear to be a disk or server performance problem
• Multiprocessor lock contention or lost locks may appear to be
slow disks or unpredictable hangs
• Memory leaks may cause slow system degradation [A favorite
example is an X-Window server that slowly leaks memory and
grows to 20-30MB before crashing the system]
❍ Tracking vendor bug reports and keeping the O/S
upgraded regularly can be a cheap and easy form
of performance tuning!

Hardware and O/S interaction:
resource starvation

❍ Configuring the system is a zero-sum game:


• In order to give in one place you usually have to take from
another
• The usual currency of system performance tradeoffs is RAM
❍ Example:
• Marcus sets up a 16MB BSD-based system with
MAXUSERS=128
• MAXUSERS is used to calculate kernel process table sizes
giving 2048 process table slots on Marcus’ machine
• 2048 process table slots, plus associated stack pointers and other
allocated memory uses 8MB of RAM from the system!
• By trying to make the machine faster Marcus actually slows it
down and causes periodic crashes!!
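The MAXUSERS arithmetic above can be checked on the back of an envelope. The 20 + 16 * MAXUSERS formula is the traditional BSD sizing rule and the 4KB-per-slot figure is an assumed rough cost; exact constants vary by release, so treat this as illustrative arithmetic only.

```shell
#!/bin/sh
# Back-of-the-envelope check of the MAXUSERS=128 example.
MAXUSERS=128
NPROC=$((20 + 16 * MAXUSERS))    # traditional BSD process table sizing
PER_SLOT_KB=4                    # assumed kernel memory cost per slot
TOTAL_KB=$((NPROC * PER_SLOT_KB))
echo "$NPROC process table slots, ~${TOTAL_KB}KB of kernel memory"
```

With these assumed constants the kernel eats roughly 8MB, which on a 16MB machine is exactly the kind of self-inflicted starvation the slide describes.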


Hardware and O/S interaction: resource starvation

❍ Resource starvation situations are the hardest to
debug:
• Often the system tries desperately to reallocate resources to get
work done
• This means that a shortage of RAM may appear to the user as
slow disk I/O
• Shortage of network buffers caused by shortage of RAM may
appear as a slow network!
❍ Most system performance problems are the result
of resource starvation!

Topics

❍ Introductory concepts
❍ CPU systems
❍ Memory systems
❍ Disk systems
❍ Windows, graphics and databases
❍ Networks
❍ Benchmarks
❍ Tuning up applications
❍ FINAL EXAM


CPU systems

❍ The role of the CPU


❍ Measuring CPU performance
❍ Solving CPU performance problems

What does the CPU do?

❍ CPU’s role depends on your system


• Some systems have hardware assists for various operations
❍ Generally assume the CPU has at least some
involvement in everything that happens: it is the
central coordinator of all activity
❍ General rule of thumb: things that happen really
often will have a CPU impact
• Network interrupts
• Context switches
• System calls
• Bus chatter
• Other mysterious things caused by space aliens

What does the CPU do? The multiprocessor question

❍ The idea of multiple processors is to let more than
one thing happen at a time so more work gets done
in the same amount of time
❍ Some things can’t or shouldn’t happen at once or
the system will die a violent but creative death
❍ Therefore: something needs to control what can
happen at once and what cannot
...and it’s one hell of a bottleneck!

CPU performance measures: key
concepts

❍ Interactive response times


• Also known as “the natives are getting restless”
❍ Load average
❍ CPU idle/system/user times


CPU performance measures: key concepts: interactive response

❍ When your mind-bogglingly fast RISC system feels
slow to you, it probably is!
❍ Use the “time” command to get a rough idea how
long some basic things normally take
• Usually the same tasks should take approximately the same
amount of time each time, all things being equal
❍ Scientific studies* show that computer user
frustration rises sharply with delays in response
• More than 1/2 second delay produces visible stress
• Users begin repeatedly pressing <return> [3 times, on average]
• Don’t worry: if you’re the systems admin they will let you know

* Yes, they really don’t seem to have anything better to do
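The "get a rough idea how long basic things normally take" advice can be scripted. The shell keyword `time` gives finer resolution; the sketch below uses whole-second `date` arithmetic only because it behaves the same in any Bourne-style shell. The choice of `ls /usr` as the routine task is arbitrary.

```shell
#!/bin/sh
# Record a crude baseline for a routine task so you can compare
# when the system "feels" slow later. Resolution is whole seconds.
start=$(date +%s)
ls /usr > /dev/null 2>&1
end=$(date +%s)
elapsed=$((end - start))
echo "baseline: ls /usr took ${elapsed}s"
```

Save the output somewhere; if the same command takes visibly longer next month, the natives have a point.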

CPU performance measures: key
concepts: load average

❍ Load average prints the average number of
runnable processes during:
• The last minute
• The last 5 minutes
• The last 15 minutes
❍ A runnable process is one that is ready to do work
but cannot because the system is busy
❍ Load average is a very coarse approximation but
usually when it is high you have a problem
❍ Some systems compute load average incorrectly and get
very strange results
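The three load averages are easy to pull out of uptime output with awk, for logging or alerting. The sample line below is hard-coded so the parsing is reproducible; in practice you would pipe `uptime` in instead.

```shell
#!/bin/sh
# Extract the 1/5/15-minute load averages from an uptime line.
line='11:48pm up 1 day, 23:29, 49 users, load average: 0.34, 0.11, 0.01'
loads=$(echo "$line" | awk -F'load average: ' '{ print $2 }' | tr -d ',')
echo "$loads"
```

Appending that one line to a file from cron gives you the cheap historical record the "be a scientist" rule asks for.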


CPU performance measures: key concepts: CPU idle/system/user

❍ Many system diagnosis tools report CPU usage in
terms of percent of time spent in:
• Idle - The CPU is twiddling its little silicon thumbs
• User - The CPU is doing computation on behalf of an
application
• System - The CPU is busy doing some kind of internal
bookkeeping
❍ High values of system time usually indicate a
performance problem:
• I/O waits
• Paging
• System calls
• Network service routines
Measuring CPU: Uptime

. uptime
11:48pm up 1 day, 23:29, 49 users, load average: 0.34, 0.11, 0.01
. while true; do
> sleep 1
> uptime
> done
11:48pm up 1 day, 23:30, 49 users, load average: 0.21, 0.09, 0.00
11:48pm up 1 day, 23:30, 49 users, load average: 0.21, 0.09, 0.00
11:48pm up 1 day, 23:30, 49 users, load average: 0.21, 0.09, 0.00
11:48pm up 1 day, 23:30, 49 users, load average: 0.28, 0.10, 0.01
11:48pm up 1 day, 23:30, 49 users, load average: 0.28, 0.10, 0.01
11:48pm up 1 day, 23:30, 49 users, load average: 0.28, 0.10, 0.01
11:48pm up 1 day, 23:30, 49 users, load average: 0.28, 0.10, 0.01
^C.

[annotation: you can tell nobody is doing much]


Measuring CPU: vmstat

[annotations point at the last three columns: us = user time,
sy = system time, id = idle time]

. vmstat 5
procs memory page disk faults cpu
r b w avm fre re at pi po fr de sr s0 s1 s2 s3 in sy cs us sy id
0 0 0 0 2496 0 6 16 0 1 0 0 0 0 0 1 168 336 31 2 7 91
0 0 0 0 2504 0 4 0 0 0 0 0 0 0 0 0 86 231 22 1 2 97
0 0 0 0 2352 0 2 0 0 0 0 0 0 0 0 0 79 211 22 0 1 98
0 0 0 0 2296 0 0 0 0 0 0 0 0 0 0 0 61 205 20 1 2 97
^C.

Measuring CPU: mpstat

. mpstat 5
average cpu 0 cpu 1 cpu 2
us ni sy id us ni sy id us ni sy id us ni sy id
2 0 7 91 2 0 7 91 2 0 7 91 2 0 7 91
0 0 1 98 1 0 1 98 0 0 2 97 0 0 1 99
0 0 7 93 1 0 0 99 0 0 0100 0 0 21 79
1 0 3 96 2 0 3 95 0 0 3 97 0 0 3 97
1 0 4 95 2 0 5 94 1 0 3 95 0 0 3 96
0 0 2 97 1 0 4 95 0 0 3 97 0 0 1 99
^C.


Measuring CPU: Accounting

❍ Very system/version dependent


❍ Somewhat annoying to manage
• Uses disk space
• Some performance impact
• Badly documented
• Accounting reduction tools are awful. But that is what PERL is for
❍ Advantages:
• Measure exactly which processes on your machine are the biggest
CPU or disk I/O hogs
• Solve once and for all vital issues such as which web browser
is a bigger memory pig than the others
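In the spirit of "roll your own reduction tool", here is a sketch in awk rather than Perl: summing CPU seconds per command from lastcomm-style output. The sample lines are hard-coded so the arithmetic is reproducible; in real use you would pipe `lastcomm` in, and field positions vary between systems, so adjust the awk accordingly.

```shell
#!/bin/sh
# Sum CPU seconds per command name from accounting-style output
# (column 1 = command, column 4 = CPU seconds), biggest hogs first.
sample='cc      mjr  ttyp0  4.25 secs
cc      mjr  ttyp0  3.10 secs
ls      mjr  ttyp0  0.02 secs'
echo "$sample" | awk '{ cpu[$1] += $4 }
    END { for (c in cpu) printf "%s %.2f\n", c, cpu[c] }' | sort -k2 -rn
```

On the sample data the compiler floats to the top with 7.35 CPU seconds, which is the kind of answer accounting was originally meant to give.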

Measuring CPU: ps

[column annotations: TIME = CPU time, RSS = resident size,
SZ = size, STAT = status]
. ps -aux
F UID PID PPID CP PRI NI SZ RSS WCHAN STAT TT TIME COMMAND
80003 0 0 0 0 -25 0 0 0 runout D ? 0:00 swapper
88001 0 187 1 0 1 0 104 360 select I ? 0:00 rpc.lockd
28001 0 195 1 2 5 0 400 816 child I ? 0:00 /usr/local/b
88001 0 198 1 0 1 0 96 392 select I ? 0:13 /usr/local/e
2001 0 213 1 0 1 0 40 112 socket I ? 0:00 sh -c while
26001 0 215 213 5 1 5 104 536 select S N ? 0:00 /usr/local/w
20001 0 218 215 5 1 5 112 560 select S N ? 0:12 wplmd60 -T s
20001 0 220 1 0 5 0 32 112 child I ? 0:00 /bin/sh /usr
800001 13239 1 0 1 0 184 944 select I ? 0:26 /usr/local/e
88001 0 241 1 5 1 0 72 0 socket I ? 0:00 /bin/httpd_3

Measuring CPU: top

[column annotations: SIZE = size, RES = resident size,
STATE = status, CPU = CPU percentage]

last pid: 27681; load averages: 1.72, 1.47, 0.95 00:13:43
264 processes: 253 sleeping, 3 running, 8 stopped
Cpu states: 0.8% user, 0.0% nice, 32.9% system, 66.2% idle
Memory: 170M available, 124M in use, 46M free, 18M locked

PID USERNAME PRI NI SIZE RES STATE TIME WCPU CPU COMMAND
27600 root 78 0 72K 328K run/2 1:33 79.42% 79.30% du
27670 mjr 53 0 14148K 1856K run/1 0:12 13.58% 7.03% top.Series5
27681 root 29 0 120K 232K run/0 0:00 0.00% 0.00% elpd
257 root 1 0 120K 280K sleep 2:29 0.00% 0.00% elpd
163 root 1 0 48K 0K sleep 2:04 0.00% 0.00% vnfsd
Solving CPU problems

❍ If your system spends most of its time pegged in
user time you may have overreached your CPU
❍ Graphics or numeric processing loaded systems are
likely candidates for CPU problems
❍ Systems with highly loaded networks can spend so
much time shuffling packets that they slow down
❍ CPU overload is the easiest problem to diagnose


Solving CPU problems: Bigger iron

❍ Buy a bigger system


❍ Add cache memory or a faster CPU
❍ Upgrading the whole system is usually better than
just upgrading the CPU
• Complete system upgrades may include bus speed increases
which will help overall performance
❍ If at all possible get a loaner system and compare it
with your existing system for the same load
❍ Take into account whether upgrading your
hardware will require upgrading your software to
possibly slower revisions
Solving CPU problems: More
processors

❍ For compute-intensive applications
multiprocessors are a win
❍ Understand the “parallelism” of your application base
• If your system is CPU-bottlenecked running your single-process
database server it will be worse on a multiprocessor
system unless the database server is able to use more than one
processor efficiently [Beware vendor claims here]
❍ For general multi-user workload multiple
computers may outperform a multiprocessor box
• Multiple machines increase aggregate bus bandwidth
• More administrative hassles


Solving CPU problems: Less work

❍ Identify applications that can be offloaded to
dedicated server systems
• Example: buying a Pentium w/64MB of RAM as a dedicated
news server is much cheaper than upgrading a departmental
server
❍ Run processes at off-peak hours using cron or
queueing systems
❍ Identify applications that are problems and tune
them individually*

* This is what UNIX accounting was originally for
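The "run processes at off-peak hours" idea usually just means a couple of crontab lines. This is a hypothetical fragment: the commands, paths, and times are invented examples, installed via `crontab -e`.

```
# m   h  dom mon dow  command
15    3   *   *   *   /usr/local/bin/make-news-reports > /dev/null 2>&1
0     4   *   *   0   /usr/local/bin/weekly-accounting-summary
```

The first runs a heavy report at 3:15am daily; the second runs a weekly job early Sunday morning, both well clear of interactive users.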

Topics

❍ Introductory concepts
❍ CPU systems
❍ Memory systems
❍ Disk systems
❍ Windows, graphics and databases
❍ Networks
❍ Benchmarks
❍ Tuning up applications
❍ FINAL EXAM


Memory systems

❍ The role of memory: real and virtual


❍ Measuring memory performance
❍ Solving memory performance problems

Real and virtual memory

❍ System has a limited amount of real RAM


❍ Pretends to have more by playing a shell game
shuffling memory to and from disk
❍ System has a limited amount of virtual memory
based on swap space
❍ For a fast modern processor having to page
memory to and from disk has an impact on
performance equivalent to a freight train slamming
into the side of a cliff at 120MPH
❍ BUT some paging is normal and even good


Real and virtual memory: allocation strategies

❍ Some systems provide different views of virtual
memory:
• Total memory = sizeof(RAM) + sizeof(swap)
• Total memory = sizeof(swap)
❍ Swap should always be larger than physical
memory
• How much depends on how much swap you usually use
• Rule of thumb is 2-3 times sizeof(RAM)
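The rule of thumb is easy to turn into a sanity check. The RAM and swap figures below are invented example values, and the 2x threshold is the low end of the slide's 2-3x rule; substitute your own numbers (e.g. from `pstat -s` output).

```shell
#!/bin/sh
# Warn if configured swap is under 2x physical memory.
RAM_MB=64      # example value: physical memory
SWAP_MB=96     # example value: configured swap
if [ "$SWAP_MB" -lt $((2 * RAM_MB)) ]; then
    echo "warning: swap (${SWAP_MB}MB) is under 2x RAM (${RAM_MB}MB)"
else
    echo "swap looks comfortable"
fi
```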

Real and virtual memory:
allocation strategies

❍ Some recommend multiple swap partitions on
different controllers/disks
• Really mostly useful for keeping paging (which is disk activity)
off of disks that are usually active for filesystems
• Remember: going to disk is so incredibly slow what’s another
millisecond between friends anyhow?
❍ Having a separate swap disk is a nice luxury
• I like putting swap and /tmp on a small high speed disk
❍ Remember that paging contends for disk bandwidth
with filesystem traffic


Real and virtual memory: allocation strategies

❍ Most versions of UNIX perform all kinds of clever
tricks to try to page stuff out before they are forced
to, so that new requests for allocated memory can be
fulfilled immediately
• 2 hand paging algorithm
• Scan rate
• Minfree, Maxfree, and Desfree [the three evil brothers]
❍ Fiddling with these values is serious hard-core
black magic juju
• In fact it’s so black magic that some vendors forget to adjust the
values to take into account changes in technology and average
memory sizes

Swapping

❍ Processes get swapped for one of two reasons:
• They were asleep for a long time and it’s efficient for the system
to kick the whole process out of memory and into long-term
storage
• The system is absolutely desperate for memory and it can’t get
by just stealing a few pages here and there so it starts to swap
entire processes
❍ Some swapping activity is normal and good
• Helps the system keep a free pool of memory
• Nice place to store zombie processes
❍ Other types of swapping activity means your
system has died and gone to hell


Paging

❍ 80% of the time a process is only executing 20% of
its code
• Hugely bloated X-Windows applications probably spend 95% of
their time in 5% of their code
❍ Paging is a nice way of keeping just the important
parts of a process in memory so it can run without
hogging the system
• Otherwise you’d need 1GB of RAM to support 30 users running
Xmosaic or Netscape
❍ System uses version-dependent paging algorithms
to shuffle less-often used pages of a program’s
address space to disk
Paging and the birth of a process

❍ Many versions of UNIX start a process by loading
the beginning of the program and enough to get it
started
❍ The rest of the program’s text image is loaded into
memory by page faulting it in on demand when a
code segment gets activated
• This is why the first time you pull down your “Hotlist” menu in
Xmosaic it sits and “thinks” for a while - it is paging the code in
❍ Paging is most often visible when portions of user
interface code get paged out and activating a
seldom-used menu suddenly causes a lag


Paging and the birth of a process

❍ Page-ins occur as a normal part of the life of a


process
❍ Page-outs occur as a normal part of the life of a
process
❍ You only need to get concerned when the system is
slow and is paging a lot

Thrashing

❍ Thrashing is the rare state where a program is
trying to keep more things in memory than there is
physical memory to fit, and then tries to access
them all frequently
❍ This results in constant heavy paging activity
❍ Large old TinyMUD servers used to thrash systems
❍ Mis-configured database engines can thrash
systems in exciting ways!
• What if you have a box with 64MB of RAM and tell your
database engine that it should keep a 64MB cache?
• Remember there’s system overhead: it doesn’t really have 64MB
to work with!

How memory is used in the system

❍ Other system resources consume memory:


• Network buffers
• TTY and PTY character typeahead
• Process table slots
• Inode cache
• File system in-memory superblocks
• Shared memory pages and semaphores
• Mount table slots
• File descriptor tables
• Page table entries
❍ Increasing kernel values almost always consumes
memory
What happens when memory runs
low

❍ The system always tries to keep a small “slush
fund” of free memory to give out quickly when
asked for it (desfree)
❍ When system needs free pages it starts to scan the
page tables (2 hand scan)
❍ The more desperate it is the faster and more often
it will scan
❍ When it cannot find enough fast enough it starts to
swap
❍ When it starts to swap everything grinds to a halt


Measuring memory: pstat

. pstat -s
55448k allocated + 20944k reserved = 76392k used, 538600k available
.

[annotations: allocated = total swap, used = active swap,
available = remaining swap]

When out of swap entirely, processes start to die!

Measuring memory: ps

[sorted by resident size: the memory hogs float to the top]
. ps -alx | sort -r -n +8
UID PID PPID CP PRI NI SZ RSS WCHAN STAT TT TIME COMMAND
175 15893 15884 0 1 0 920 1848 select I r2 0:57 pine
59 9370 18173 18 1 0 584 1728 select I r1 0:05 gs -dQUIET -
174 23698 23191 3 1 0 624 1664 select I t1 0:52 xmh -geometr
59 18173 16842 0 1 0 328 1528 select I r1 0:23 ghostview 5m
261 8197 1 0 1 0 328 1440 select I p6 0:04 xterm -title
261 25712 1 0 1 0 320 1368 select I ? 0:04 xterm -title
164 15882 15876 2 1 0 432 1368 select I r0 0:33 xmh -geometr
261 25721 1 0 1 0 320 1336 select I ? 0:03 xterm -title
^C


Measuring memory: vmstat

[columns of interest: pi = page-ins, po = page-outs, sr = scan rate]

. vmstat 5
procs memory page disk faults cpu
r b w avm fre re at pi po fr de sr s0 s1 s2 s3 in sy cs us sy id
2 2 0 0 47600 0 6 16 0 1 0 0 0 0 0 1 168 342 31 2 7 91
4 0 0 0 47456 0 8 104 0 0 0 0 1 1 0 3 407 559 53 2 59 40
2 0 0 0 47312 0 1 104 0 0 0 0 0 0 0 0 338 1632 47 3 64 33
1 1 0 0 47280 0 0 112 0 0 0 0 0 0 0 0 279 979 40 3 56 41
^C.

Topics

❍ Introductory concepts
❍ CPU systems
❍ Memory systems
❍ Disk systems
❍ Windows, graphics and databases
❍ Networks
❍ Benchmarks
❍ Tuning up applications
❍ FINAL EXAM


Disk systems

❍ The role of disks


❍ The role of filesystems
❍ Measuring disk/filesystem performance
❍ Solving filesystem performance problems

Disk systems: the role of disks

❍ A disk is a place to store data you want to recover
reasonably quickly when you want it again
❍ Physically disks have many properties
• Many have on-disk caches of various sizes
• Some perform read/write re-ordering and optimization
• Some reveal their geometry
• Some have flexible or meaningless geometry
❍ Anything this page says about disks will be obsolete
by the time you read it
• Technology moving very fast
• Always getting smaller/faster/cheaper


Disk systems and filesystems

❍ The filesystem organizes data on disks (or similar
devices) so that the user and system can keep track
of it
❍ Filesystem design and database design are similar:
• Tradeoffs of speed versus space
• More efficient use of space seems to imply slower write
throughput
• Less efficient use of space usually implies faster write
throughput [tossing socks on the floor is faster than putting
them in your sock drawer but sooner or later it doesn’t scale
well and domestic harmony degrades]
• Tradeoffs of speed versus reliability
• Balance for read throughput or for write - never both

52

Page 26
Copyright(C) 1999, Marcus J. Ranum - all rights reserved 26
Disk caches and filesystem caches

❍ Most systems cache disk access


❍ Reads (and sometimes writes) actually go to
memory instead of to disk
❍ For multiuser systems a cache inverts the
throughput equation:
• 85% of I/O on average is reads
• 95% of reads are fulfilled by the cache
• Therefore the I/O mix becomes 5% reads and 95% writes!!!
❍ Cache write policy is very version dependent and
very arcane and extremely important
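The cache-inversion arithmetic above can be worked through directly. All figures are the slide's rough averages, so the result is illustrative, but it shows why the physical mix ends up dominated by writes.

```shell
#!/bin/sh
# Of 100 logical I/Os: 85 reads, 15 writes; 95% of reads hit the cache.
READS=85
WRITES=15
HIT=95
PHYS_READS=$((READS * (100 - HIT) / 100))   # reads that miss the cache
echo "physical I/O per 100 logical: $PHYS_READS reads, $WRITES writes"
```

Only about 4 reads per 100 logical I/Os actually reach the disk against 15 writes, so write policy, not read speed, governs disk behavior on a warm cache.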


File system performance

❍ Filesystem performance is largely determined by
layout of blocks and meta-data
• More robust filesystems write more meta-data which makes
them slower
• Faster filesystems store more information about layout in
memory which makes them faster at reading but makes cache
write policy more complex [BSD, for example, caches pathname
components]
• Writing big chunks at a time is faster than writing little chunks
• Writing little chunks means you can use space more efficiently
❍ Filesystems and databases share many common
design issues

File system performance: reading

❍ The more information the filesystem keeps in
memory about where data is on disk the faster it
can read
• But the more memory it takes from the rest of the system
❍ UNIX filesystems use predictable sized block
values to allow quick location of any single block in
a file
❍ File blocks are read into cache and passed to
application in smaller chunks
❍ Most versions of UNIX read-ahead by one block to
try to already have data in memory
• This is a gamble that usually pays off

File system performance: writing

❍ Writing data entails multiple I/Os to different
parts of the disk:
• Write the data block itself
• Write the file’s inode with new file size
• Write the disk free block map when blocks are allocated
❍ Writing data so the filesystem will survive a crash
entails writing data in synchronous mode:
• System waits until the disk announces the data is written*
• Inode and free block/superblock writes are synchronous
❍ This is why restoring a filesystem is so much slower
than dumping it
* And we all know disks never lie

File system performance: allocating

❍ Allocation strategy very system dependent


❍ Typically blocks are allocated to be as close to
other blocks of same file as possible
• Minimize head seeks when reading/writing same file
• Take advantage of disk internal caches if present
❍ UNIX file system maintains a free block bitmap
and allocates quickly by searching in-memory
structure
• Updating filesystem meta-data still very expensive!
❍ Old UNIX file system allocated blocks from a
linked list


File system performance: System V filesystem

[Diagram: inodes at the front of the disk, data blocks behind them;
the free block list head lives in the superblock (with cached next
entries), and each free block points to the next]

When the free list gets scrambled the filesystem is corrupted
File system performance: UFS
filesystem

[Diagram: inodes live on cylinder groups, with data between
cylinder groups; a free block bitmap (10101011011011111) covers
each group]

When allocating a block the system searches the in-memory bitmap
based on cylinder group summaries


File system performance: Log / journaling systems

[Diagram: an append-only log of numbered blocks (#233@321, #432@923,
#212@453 ...) with periodic index checkpoints; an updated copy of a
block supersedes the old copy]

When allocating a block the system searches an in-memory index
File system performance: silicon
disks

❍ Silicon disks have the advantage of very high speed
for writes
❍ Usually fairly small
❍ Usually fairly expensive
❍ Widely used in databases for storing transaction
logs and meta-data
• Some silicon disks used to boost NFS and filesystem
performance by absorbing filesystem meta-data writes to
battery-backed memory and later committing them to real disk
• NVRAM can boost NFS as much as 200% and regular UNIX
filesystems as much as 15%
❍ Can turn I/O bottleneck into a CPU bottleneck

File system performance: RAIDs

❍ RAID devices aggregate multiple disks into a single
virtual disk
• Different RAID “levels” represent different update policy
choices and different meta-data redundancy choices
• Just because the RAID “level” is high does not mean it is better
❍ RAIDs give performance boost by either striping
all I/O across the entire set of disks or by including
RAM caching and NVRAM caching
❍ Some in-kernel RAID software can turn a disk
bottleneck into a CPU bottleneck
❍ For all intents and purposes a RAID looks like a
single disk to the system
Network File Systems: NFS

❍ In an attempt to ensure integrity the NFS protocol spec
requires that all writes be synchronous
• Therefore all writes are generally painfully slow
• Some vendors support “asynchronous NFS” which is 200-700%
faster
• Asynch NFS should only be used for disposable data
❍ Generally NFS should be used to store shared non-
disposable data only
❍ Local disk bandwidth can give better application
performance for system executables, etc


Network File Systems: AFS

❍ Like NFS but with disk block caching to local hard
disk
❍ Gives significant performance improvements over
NFS for read-intensive applications

Measuring filesystems

❍ The main useful measurement of filesystem activity
is how many blocks of data are moving back and
forth
❍ Cross-index filesystem physical devices with mount
points to determine which directories are “hot
spots” and consider relocating them
❍ If one disk has significantly more I/O on it, it is
possible that 95% of the system may be waiting for
that one disk to move data
❍ If one filesystem is a “hot spot” and it can be
replicated, consider doing so as a cheap form of RAID
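The cross-index step is a one-liner: df output maps each filesystem device to its mount point, which you can then line up against per-device iostat figures. df's exact columns vary slightly between systems, so treat the field positions as an assumption.

```shell
#!/bin/sh
# Print "device mount-point" pairs: first and last fields of each
# df line, skipping the header row.
df 2>/dev/null | awk 'NR > 1 { print $1, $NF }'
```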


Measuring filesystems: iostat

[columns per disk: bps = blocks/second, tps = transfers/second,
msps = milliseconds/seek]

. iostat 5
tty sd0 sd1 sd2 sd3 cpu
tin tout bps tps msps bps tps msps bps tps msps bps tps msps
4 274 4 0 15.6 1 0 13.6 5 1 12.0 24 4 13.2
0 21 0 0 9.0 8 1 18.2 2 1 18.8 0 0 19.3
0 15 0 0 0.0 0 0 0.0 0 0 0.0 3 0 12.3
0 16 0 0 0.0 0 0 0.0 0 0 0.0 2 0 27.3
^C.

Measuring filesystems: iostat

[columns per disk: rps = reads/second, wps = writes/second,
util = percent utilized]

. iostat -D 5
sd0 sd1 sd3 sd7
rps wps util rps wps util rps wps util rps wps util
0 0 2.0 1 1 3.6 3 3 13.6 7 6 25.9
1 1 8.5 1 0 2.2 6 6 20.8 23 18 82.0
0 0 1.2 1 0 2.4 10 7 37.2 17 16 71.3
1 1 8.8 1 1 6.2 7 7 32.1 20 14 74.9
1 0 4.2 1 0 1.4 8 7 38.2 15 13 58.3
^C.


Measuring filesystems: nfsstat

. nfsstat -c

[annotations: badcalls = errors; read/write = reads and writes;
lookup/readdir = directory reads]

Client rpc:
calls badcalls retrans badxid timeout wait newcred timers
1113240 59 12656 12145 12701 0 0 72441
Client nfs:
calls badcalls nclget nclsleep
1113240 59 1113240 0
null getattr setattr root lookup readlink read
0 0% 136235 12% 2487 0% 0 0% 138468 12% 10152 0% 790623 71%
wrcache write create remove rename link symlink
0 0% 15615 1% 8208 0% 4094 0% 3550 0% 1 0% 0 0%
mkdir rmdir readdir fsstat
0 0% 0 0% 3717 0% 90 0%
.

Improving performance: buffer
cache

❍ Some versions of UNIX use a fixed size buffer
cache for filesystem I/O
❍ Increasing the buffer cache size will improve I/O
but may trigger paging or resource starvation
elsewhere in the system
❍ Before increasing buffer cache be careful to
measure the existing system’s virtual memory
usage


Improving performance: more RAM

❍ Most current versions of UNIX use the virtual
memory system as a disk cache as well
❍ I/O and memory bandwidth now entwined
❍ Adding more RAM will help both virtual memory
performance and disk I/O
❍ I/O intensive applications reduce the available
memory bandwidth for other applications
❍ Memory pig applications reduce the disk I/O
bandwidth

70

Improving performance: load
spreading

❍ Keep statistics to determine which disk devices are most heavily loaded
• Run iostat every 1/2 hour on each device for a week
• Summarize reads and writes - some sites make pretty plots with Excel or jgraph
• Map devices to mount points
• Examine statistics to see which devices are most busy and see if
you have more than one important filesystem on the same disk
❍ When installing more disk space spread heavily
loaded filesystems rather than simply mounting the
new disk as a set of empty filesystems
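The half-hourly collection can be as simple as a cron entry that timestamps each sample. A sketch only: the iostat path, flags, and log location are assumptions that vary by UNIX flavor, so check your own manual before installing it.

```shell
# crontab fragment: append a timestamped iostat sample every 30 minutes
# (illustrative paths and flags -- adjust for your system)
0,30 * * * * (date; /usr/bin/iostat -D 5 2) >> /var/adm/iostat.log 2>&1
```

A week of these logs feeds directly into the summarizing done above.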

71

Improving performance: NFS tricks

❍ Track NFS services on your fileserver using nfsstat to examine server statistics
❍ NFS server statistics don’t tell you which filesystem the accesses are against
• Guess :)
• Try to cross-index with iostat output
❍ If a filesystem is very busy and contains
“disposable” data consider moving copies of the
data to client local disk
• Remember: kicking Netscape off over an NFS-mounted filesystem triggers about 1MB of NFS I/O in no time at all

72

Improving performance: NFS
tricks

❍ Check for clients that are causing especially heavy loads and examine them
• Look out for cron jobs on clients that do a “find” [Some systems
ship this way!]
❍ Check the state of nfsds on the server
• Adding nfsds will allow the server to simultaneously service
more NFS client requests
• More nfsds means the server can work HARDER than before
• More nfsds does not make the server faster but may speed
things up for the client

73

Improving performance: NFS tricks

❍ Asynchronous NFS
• NFS writes are always supposed to be synchronous
• Makes NFS writes incredibly painfully slow
• Some vendors support asynchronous NFS in which the server
tells the client it has completed the write before it actually has
• If the server crashes the client will think the file was successfully
written but it wasn’t
• Asynchronous NFS is best for things like /tmp - but usually you
don’t want a dynamic filesystem like /tmp NFS-mounted in the
first place!
• Depending on the size of server memory, can provide up to a 700% write performance boost

74

Improving performance: NFS
tricks

❍ NVRAM cache
• System modified to write NFS write requests to NVRAM instead
of disk
• Client receives NFS write ack instantly
• Asynchronous process flushes NVRAM cache through normal
filesystem on server
• Cache preserved across boots
• Depending on cache size typically provides a performance boost
of up to 300%
• If the cache is on the motherboard make sure the cache is flushed when field service replaces the motherboard. Also make sure that new motherboard caches are cleared before installing. :)

75

Improving performance: NFS set up wrong

❍ The client NFS-mounts everything from the server:
• /home (directory with user’s files)
• /usr/local
• /usr/X11R5
• /usr/spool/news
• /usr/spool/mail
76

Improving performance: NFS set up right

❍ The client keeps system files local: /, /usr, /usr/local/bin, /usr/X11R5
❍ The client NFS-mounts only: /home, /usr/spool/mail
77

Improving performance: smarter applications

❍ Some applications do not use disk bandwidth wisely
• Shell scripts on NFS clients that generate large temporary files
instead of using pipes are very slow
• Using the standard I/O library matches buffered I/O to the system block size
• Applications that rely on NFS locking are very slow*
• Applications that use read/write may do much more I/O than
necessary if not designed carefully
❍ If you narrow your I/O problems down to a few
applications use application tuning techniques on
them
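The temporary-file point can be made concrete. A sketch of the same job done both ways (filenames are arbitrary): on an NFS client the first version pushes the intermediate data over the network twice, while the pipe never touches the disk.

```shell
#!/bin/sh
# Same pipeline, two ways.  The temp-file version writes and re-reads an
# intermediate file (painful over NFS); the pipe keeps data in memory.
printf 'beta\nalpha\nbeta\n' > /tmp/in.$$

# slow way: intermediate result lands on (possibly NFS-mounted) disk
sort /tmp/in.$$ > /tmp/mid.$$
uniq -c < /tmp/mid.$$ > /tmp/out1.$$

# better: no temporary file at all
sort /tmp/in.$$ | uniq -c > /tmp/out2.$$

cmp -s /tmp/out1.$$ /tmp/out2.$$ && echo "same answer, one less file"
rm -f /tmp/in.$$ /tmp/mid.$$ /tmp/out1.$$ /tmp/out2.$$
```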

* Also, they are crazy to rely on it. It’s usually crippled by bugs

78

Topics

❍ Introductory concepts
❍ CPU systems
❍ Memory systems
❍ Disk systems
❍ Windows, graphics and databases
❍ Networks
❍ Benchmarks
❍ Tuning up applications
❍ FINAL EXAM

79

Windows, terminals, and graphics

❍ Effects of windowing systems on performance


❍ Improving X-Window performance
❍ Effects of graphic accelerators
❍ Effects of hardwired terminals

80

Effects of windowing systems on
performance

❍ Windowing systems result in lots of context switches
• Mouse focus changes wake up and put applications to sleep
• For fun, on a workstation move your mouse around while
running vmstat and watch its effect
• For fun, use MOTIF on a workstation with 16MB of memory
and several Xterms open, compile something, and run vmstat,
then click on a menu. :)
❍ The fancier the window the more memory it takes
❍ Putting a nifty background on your screen eats up (screen height x screen width) bytes, usually about 1MB
• Put a nifty background up, run vmstat, and move windows
around on your screen
81

Improving X-Window performance

❍ Buy more memory
❍ Buy a faster machine
❍ *

* I’d love to say “don’t use X” but that isn’t an option these days

82

Effects of graphic accelerators

❍ Custom graphic accelerators often require custom X servers or drivers
• Limits your options for building local versions of X
[Sad But True story: one site buys a bunch of fancy 3D graphics
boards and then builds and runs a local version of X11R5
instead of the X11R4-based server from the vendor. It takes a
while for them to figure out why it is slow]
• Sometimes accelerators require more system memory
❍ If you are doing serious graphics expect to buy
serious memory

83

Hardwired terminals

❍ When using hardwired terminals the CPU gets an interrupt for every keystroke
❍ Many modern versions of UNIX are not optimized for lots of character I/O
❍ Offload keystroke processing to terminal servers if
possible
❍ If system has hardwired terminals connected and
has performance problems use iostat to see if it is
doing a lot of terminal I/O
• Sometimes a terminal wire goes bad and floods the operating system with a load of garbage
• Getty will usually complain in syslog
84

Databases

❍ A simple view of databases


❍ Resource effects of databases
❍ Measuring database impact on performance
❍ Solving database performance problems

85

Introduction to databases

❍ Most modern DBMS servers act like a miniature operating system
• Handle their own cache
• Do their own internal scheduling
• Manage locking, sometimes using O/S locking
• Manage network connections
• Use raw disk partitions to avoid having UNIX do I/O buffering
❍ Many operating system tuning concepts apply to DBMSs
❍ Using UNIX tools can help you tell what the DBMS
server is doing [sometimes]

86

Resource profile of databases

❍ Use lots of memory


❍ Large writes to raw disk partition
❍ Small synchronous writes to database journal
• Some DBMSs write journal files to the UNIX filesystem, others to a raw partition
❍ May use large amounts of CPU during query
optimization and index joins
❍ Usually I/O bound for writes
❍ Usually CPU bound for reads

87

Measuring database impact on system

❍ Use iostat to see disk read/write mix on database devices
❍ Use vmstat to see virtual memory use patterns
❍ Use netstat to estimate network traffic caused by
server
❍ Whenever possible, tune a database on a system by itself; otherwise, characterize and “subtract” out the user load

88

Solving database performance
problems

❍ Usually the vendor can help


❍ Sometimes the vendor doesn’t have a clue
• Get vendor technical support in contact with the O/S vendor
technical support
❍ Treat the DBMS as a normal application and use
normal system performance tuning techniques
❍ If you imagine it is an operating system within an
operating system you will approach the problem
from the right point of view

89

Topics

❍ Introductory concepts
❍ CPU systems
❍ Memory systems
❍ Disk systems
❍ Windows, graphics and databases
❍ Networks
❍ Benchmarks
❍ Tuning up applications
❍ FINAL EXAM

90

Networks

❍ The role of networks


❍ Measuring network performance
❍ Solving network performance problems

91

Overview of networks

❍ TCP
• Reliable end-to-end virtual circuit with sequenced delivery
• High throughput
• Congestion control moderates data rate
❍ UDP
• Unreliable “message” delivery without sequencing
• High throughput
• Can spew packets as fast as the machine can send them*
❍ Those who don’t understand TCP are doomed to
re-invent it
• The world is full of UDP-based code with loads of logic that does
what TCP does better [like NFS]
* Faster, actually. When it can’t send them it throws them away.

92

Measuring network performance

❍ Ping
• Measures packet round trip time
• Useful for measuring packet lossage between hosts or networks
❍ Traceroute
• Tries to determine route that packets are taking between hosts
or networks
• Useful for detecting weird routing situations [Sad But True
story: one organization was routing traffic between 2 machines
over an international link because the admin didn’t understand
the concept of subnets. Needless to say, this was slow.]
❍ Netstat
• Tons of useful information about how much data system has
sent and received

93

Measuring network performance

❍ Ttcp
• Measures TCP throughput between hosts
❍ NNstat
• Very configurable network statistics sniffing and gathering tool
• Cryptic and awkward to use
• Generates really nice reports
• Free [Do an Archie search]
❍ Etherman / Interman / Packetman
• Graphical load monitor
• X-based
• Running it on your system will create other performance
problems!
• Free [Do an Archie search]
94

Measuring networks: ping

. ping sol
PING sol: 56 data bytes
64 bytes from sol (192.33.112.100): icmp_seq=0. time=2. ms
64 bytes from sol (192.33.112.100): icmp_seq=1. time=5. ms
64 bytes from sol (192.33.112.100): icmp_seq=2. time=4. ms
[callout: errors]
64 bytes from sol (192.33.112.100): icmp_seq=3. time=6. ms
64 bytes from sol (192.33.112.100): icmp_seq=4. time=2. ms
64 bytes from sol (192.33.112.100): icmp_seq=5. time=7. ms
^C
----sol PING Statistics----
6 packets transmitted, 6 packets received, 0% packet loss
round-trip (ms) min/avg/max = 2/4/7
.

[callout: round-trip time]
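Ping transcripts like the one above are easy to summarize with awk (a sketch; the field positions match this output format):

```shell
#!/bin/sh
# Summarize a ping transcript: reply count and average round-trip time.
# $7 is the "time=N." field in the reply lines shown above.
awk '/icmp_seq/ { n++; t = $7; sub(/time=/, "", t); sum += t }
     END { printf "%d replies, average rtt %.1f ms\n", n, sum / n }' <<'EOF'
64 bytes from sol (192.33.112.100): icmp_seq=0. time=2. ms
64 bytes from sol (192.33.112.100): icmp_seq=1. time=5. ms
64 bytes from sol (192.33.112.100): icmp_seq=2. time=4. ms
64 bytes from sol (192.33.112.100): icmp_seq=3. time=6. ms
64 bytes from sol (192.33.112.100): icmp_seq=4. time=2. ms
64 bytes from sol (192.33.112.100): icmp_seq=5. time=7. ms
EOF
```

Run overnight against a flaky host, the same script (fed from `ping host`) gives you a loss and latency record to wave at the network people.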
95

Measuring networks: netstat

[column callouts: Ipkts = packets read, Ierrs = read errors, Opkts = packets sent]

. netstat -i
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis
le0 1500 192.33.112.0 illuminati 1323992 455 1037625 4 11715
lo0 1536 loopback localhost 31509 0 31509 0 0
.
[callout: Collis = collisions. A few collisions is OK; this is less than 2%!!]
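That “less than 2%” figure is just Collis divided by Opkts. Checking it by hand against the le0 line above (a sketch; the field positions follow the netstat -i layout shown):

```shell
#!/bin/sh
# Collision rate = collisions / output packets.  Fields follow the
# netstat -i layout above: Name Mtu Net/Dest Address Ipkts Ierrs Opkts
# Oerrs Collis.  A couple of percent is normal; sustained higher rates
# mean the segment is saturated.
awk '$1 != "lo0" { printf "%s: %.1f%% collisions\n", $1, 100 * $9 / $7 }' <<'EOF'
le0 1500 192.33.112.0 illuminati 1323992 455 1037625 4 11715
EOF
```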
96

Measuring networks: netstat

. netstat -m
192/320 mbufs in use:
1 mbufs allocated to data
1 mbufs allocated to packet headers
75 mbufs allocated to socket structures
99 mbufs allocated to protocol control blocks
3 mbufs allocated to routing table entries
11 mbufs allocated to socket names and addresses
2 mbufs allocated to interface addresses
0/28 cluster buffers in use
68 Kbytes allocated to network (35% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

[callout: “192/320 mbufs in use” shows how much of the system’s network buffers are left]
[callout: nonzero “denied”/“delayed” counts can indicate a system without enough network buffers or RAM]

97

Measuring networks: ttcp

[callouts: TCP load; thruput]

. ttcp -t -s illuminati
ttcp-t: nbuf=1024, buflen=1024, port=2000
ttcp-t: connect
ttcp-t: 0.0user 0.4sys 0:05real 9% 0i+43d 21maxrss 0+0pf 378+5csw
ttcp-t: 1048576 bytes processed
ttcp-r: 0.0user 0.2sys 0:05real 4% 0i+29d 19maxrss 0+1pf 773+774csw
ttcp-r: 1048576 bytes processed
ttcp-r: 0.26 CPU sec = 3938.46 KB/cpu sec, 31507.7 Kbits/cpu sec
ttcp-r: 5.98 real sec = 171.074 KB/real sec, 1368.59 Kbits/sec
.
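The throughput figures ttcp prints are plain arithmetic on bytes and elapsed time. Redoing the real-time number by hand (using the rounded 5.98 s shown, so the last digit can differ slightly from ttcp’s more precise internal timing):

```shell
#!/bin/sh
# Recompute ttcp's real-time throughput: bytes / seconds / 1024.
awk 'BEGIN { bytes = 1048576; secs = 5.98
             printf "%.1f KB/real sec\n", bytes / secs / 1024 }'
```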

98

Measuring networks: NNStat

❍ Daemon listener
• Accepts flexible and very powerful directions about what types
of things to look for and what kinds of statistics to maintain
about them
• Efficient listener doesn’t use much CPU
❍ Client stats-poller
• Periodically downloads from daemon to disk
• Can merge reports from multiple listening posts
❍ Interactive access
• Administrator can interactively query server for statistics
❍ This is a good tool if you want to summarize traffic
between hosts for a given service
99

Measuring networks: etherman, packetman, interman

❍ Graphical interface based on X


• Represents a LAN with nodes
• Plots traffic between nodes using bars of various thickness to
indicate level of traffic
❍ On busy networks can completely swamp the
workstation it is running on
❍ Very useful for a “one shot” check of network
traffic
❍ Pretty pictures are very useful for printing out
❍ “Poor man’s network management station”

100

Improving network performance

❍ 95% of network problems are either configuration errors or applications overloading the network
• Track configuration errors using system tools, etherman,
NNstat, etc
• Track applications using packetman, NNstat, etc
• Check for high levels of collisions or errors using netstat
❍ If networking on a given system is slow
• Check netstat -m to see if low on memory buffers
• Increase MAXUSERS on system to increase memory for
networking
• Don’t make MAXUSERS too big or you’ll actually waste
memory

101

Improving network performance

❍ The best way to improve network performance is to design your network carefully
❍ If local networks are swamped, put servers with high point-to-point loads on private networks with separate interfaces
❍ Use “smart” bridges or route traffic
❍ Move noisy workstations to segments behind
“smart” bridges

102

A simple network layout

[diagram: clients on a 10BaseT hub and a 100BaseT hub, both connected to an offsite router]

103

A simple network layout grows

[diagram: the same layout after growth: a second pair of 10BaseT/100BaseT hubs added behind a new insite router, alongside the offsite router]

104

Topics

❍ Introductory concepts
❍ CPU systems
❍ Memory systems
❍ Disk systems
❍ Windows, graphics and databases
❍ Networks
❍ Benchmarks
❍ Tuning up applications
❍ FINAL EXAM

105

Benchmarks

❍ How vendors cheat on benchmarks


❍ How to interpret a benchmark
❍ Benchmarks you can trust(?)
❍ How to do a benchmark

106

How vendors cheat on benchmarks

❍ The case of the growing SPECMARKs


• Within one operating system release a vendor improves
performance in floating point by a factor of 2
• Benchmarks for the same vendor’s older systems are no longer published, but their performance also doubles
• Conclude: Someone cracked a benchmark
• One year later all vendors reporting inflated floating point
• SPEC Cooperative improves benchmark in next version
❍ Moral: When a benchmark is published that is an
average of a number of test results check the entire
test range and look for tests that match your needs
• Also look for numbers that are seriously out of range

107

How vendors cheat on benchmarks

❍ The case of the “cache thrasher” benchmark


• Vendor ‘A’ discovers a side effect of how Vendor ‘B’s UNIX
implementation manages buffer cache
• Vendor ‘A’ designs and publishes a benchmark that their
system handles gracefully, which completely blows Vendor B’s
machine out of the water
• Next release of Vendor B’s system performs same as Vendor ‘A’
on the benchmark
❍ Moral: There is lots of cruft in vendor UNIX versions just for benchmark cooking
• When you run it on your machine you’ll discover that it causes
resource starvation that chokes the system elsewhere

108

How vendors cheat on benchmarks

❍ The case of the “custom library”


• Customer provided benchmark uses fgets() to read a big file
• Vendor runs the benchmark after first loading the file into the
buffer cache by doing a “cat file > /dev/null”
• Vendor compounds the performance increase by linking the
benchmark against the Cnews fast fgets() implementation that is
about 2X faster
• Vendor gains approximate 400% performance increase
❍ Moral: Applications tuning is almost always
applied to benchmarks

109

How vendors cheat on benchmarks

❍ The case of “I know the answer”


• A large computer magazine uses the Sieve of Eratosthenes for the first 1000 primes as a benchmark
• A compiler writer hand-codes a recognizer into the compiler to
identify the benchmark at compile time
• Compiler simply generates the correct answers on standard
output
• Benchmark runs in under a millisecond
❍ Moral: If the answer is too good to be true it
probably is
❍ In this case the compiler writer did it as a joke and
told the perplexed victim what he had done

110

How to interpret a benchmark

❍ Before you even look at a benchmark, characterize your system and have a good idea what it needs to do well
❍ Look for benchmark figures related to those
attributes only
• If your system will be running a DBMS, ignore “cumulative” values that average in unrelated tests like floating point
• Most “real” benchmarks include per-test-function measures
❍ Look for information about testing methodology
• Some benchmarks specify that the benchmark be run on a
machine using “as shipped” software and “as shipped”
configuration
111

Benchmarks you can trust(?)

❍ Generally computer trade magazines that do benchmarks do not cook them for specific vendors
❍ Some computer trade magazine benchmarks are really, really stupid [Sieve of Eratosthenes?]
❍ Don’t trust a benchmark that was run by the
vendor - ever
❍ Synthetic benchmarks (e.g. SPEC) are more useful than simulated workloads (e.g. AIM) only if all you do is number crunching
❍ The best benchmark is your application running
on the system after you set it up and run it yourself
112

How to write a benchmark

❍ Don’t
❍ Use your real application that you install the way
you would run it

113

Topics

❍ Introductory concepts
❍ CPU systems
❍ Memory systems
❍ Disk systems
❍ Windows, graphics and databases
❍ Networks
❍ Benchmarks
❍ Tuning up applications
❍ FINAL EXAM

114

Tuning applications

❍ The importance of application tuning


❍ Measuring application performance
❍ Solving application performance problems

115

Tuning Applications

❍ All the time your system spends in user time is a place where applications tuning may be applied
❍ Tuning applications is much cheaper (sometimes)
than buying a new machine
❍ The case of the Real Dumb Database:
• Site upgrades server every year for 3 years
• One day a programmer discovers that a critical part of the
application performs a linear search on a file
• Application is rewritten to use a dbm database
• New version of application running on desktop machine is now
much faster than old version running on a big server

116

Measuring applications:
accounting

❍ Accounting is a good way of seeing which applications your machine spends most of its time servicing
❍ Determine which are most CPU intensive and
which are most disk intensive
❍ Worry only about the 2 or 3 at the top of the list
❍ If 5 applications use 50% of your system’s compute
cycles and you can speed them up by a factor of 2
each you have just gained the equivalent of a
processor upgrade to next year’s technology
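That back-of-the-envelope claim is Amdahl’s law: speed up a fraction f of the work by a factor s and the whole system speeds up by 1 / ((1 - f) + f / s). For the example above, f = 0.5 and s = 2:

```shell
#!/bin/sh
# Amdahl's law for the example above: half the cycles sped up 2x.
awk 'BEGIN { f = 0.5; s = 2
             printf "overall speedup: %.2fx\n", 1 / ((1 - f) + f / s) }'
```

A 1.33x overall gain from tuning a handful of programs is in the same ballpark as a year’s processor improvement, at a fraction of the cost.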

117

Measuring applications: gprof

[callout: lousy code]
. cat > x.c
main()
{
int x;
while(read(0,&x,sizeof(x)) > 0)
write(1,&x,sizeof(x));
}
. cc -pg -o x x.c
.

[callout: compile w/profiling]

118

Measuring applications: gprof

[callouts: run normally; csh has a builtin “time” that gives I/O counts, etc.]

. time x < /etc/termcap > /dev/null
real 0m6.81s
user 0m0.70s
sys 0m6.10s

[callout: gmon.out is the profiling output]
. ls -l
total 40
-rw------- 1 mjr 7108 Apr 4 01:35 gmon.out
-rwx------ 1 mjr 32768 Apr 4 01:34 x
-rw------- 1 mjr 77 Apr 4 01:30 x.c
.

119

Measuring applications: gprof

[callouts: run gprof; Wow! it does a lot of I/O!!]
. gprof x | more
index %time self descendents called+self name index
[1] 99.5 0.00 5.35 start [1]
0.09 5.26 1/1 _main [2]
0.00 0.00 1/1 _on_exit [101]
0.00 0.00 1/1 _exit [95]
-----------------------------------------------
0.09 5.26 1/1 start [1]
[2] 99.5 0.09 5.26 1 _main [2]
3.47 0.00 33361/33361 _read [3]
1.79 0.00 33360/33360 _write [4]

120

Measuring applications: trace

. trace x < /etc/termcap > /dev/null

open ("/dev/zero", 0, 02) = 3
mmap (0, 17036, 0x3, 0x80000002, 3, 0) = 0xf77fa000
close (3) = 0
getpagesize () = 4096
brk (0x9678) = 0
brk (0xa678) = 0
read (0, "# --", 4) = 4
write (1, "# --", 4) = 4
read (0, "----", 4) = 4
write (1, "----", 4) = 4
read (0, "----", 4) = 4
write (1, "----", 4) = 4
read (0, "----", 4) = 4

[callout: system calls]
[callout: WHAT!?! Right about here is where you should notice it is reading the whole file 4 bytes at a time!!!!]

121
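You can see the 4-bytes-at-a-time problem without a tracer: copy the same file with a tiny and a sensible block size and compare how many read() calls each takes. A sketch using dd’s record counts (the 64KB file size is arbitrary):

```shell
#!/bin/sh
# Count read() calls implied by block size: dd does one read per record.
# bs=4 on a 64KB file needs 16384 reads; bs=8192 needs only 8.
f=/tmp/blkdemo.$$
dd if=/dev/zero of="$f" bs=1024 count=64 2>/dev/null    # 64KB test file

small=`dd if="$f" of=/dev/null bs=4    2>&1 | awk -F'+' 'NR == 1 { print $1 }'`
large=`dd if="$f" of=/dev/null bs=8192 2>&1 | awk -F'+' 'NR == 1 { print $1 }'`

echo "bs=4:    $small reads"
echo "bs=8192: $large reads"
rm -f "$f"
```

The fix for x.c itself is the same idea: use the standard I/O library, or at least a block-sized buffer, instead of 4-byte read()/write() calls.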

Topics

❍ Introductory concepts
❍ CPU systems
❍ Memory systems
❍ Disk systems
❍ Windows, graphics and databases
❍ Networks
❍ Benchmarks
❍ Tuning up applications
❍ FINAL EXAM

122

FINAL EXAM

123

FINAL EXAM: Question #1

124

FINAL EXAM: Question #2

125

FINAL EXAM: Question #3

126

FINAL EXAM: Question #4

127

FINAL EXAM: Question #5

128

