Design Issues of Concurrent and
Iterative Servers; I/O
Multiplexing; Socket Options
Concurrent vs. iterative
servers
Iterative server
Processes one request at a time
Easy to build
Imposes unnecessary delay on waiting clients
Concurrent server handles multiple requests
at one time.
More difficult to design and build
Better performance
Concurrent vs. iterative
servers
The term concurrent server refers to whether
the server handles multiple requests
concurrently, not to whether the underlying
implementation uses multiple concurrent
processes
Connection-Oriented vs
Connectionless Servers
Connection-Oriented Servers
Ease of Programming – because the transport
protocol handles packet loss and out-of-order
delivery problems automatically
Require a separate socket for each connection
For trivial applications, the overhead of the 3-way handshake makes TCP expensive compared to UDP
Each connection ties up resources; if clients crash repeatedly without closing their connections, the server will eventually run out of resources
Connection-Oriented vs
Connectionless Servers
Connectionless Servers
One side or the other must take the responsibility
for reliable delivery
Usually clients retransmit requests if no response
arrives
If the server needs to divide its response into multiple data packets, it may need to implement a retransmission mechanism as well
Achieving reliability through timeout and retransmission can be extremely difficult; in fact, it requires considerable expertise in protocol design
Why?
Connection-Oriented vs
Connectionless Servers
Connectionless servers
Adaptive Retransmission
Broadcast or Multicast operations
Four Basic Types of Servers
Iterative Connectionless
Iterative Connection-Oriented
Concurrent Connectionless
Concurrent Connection-Oriented
Request Processing Time
Server’s request processing time - the total time the server takes to handle a single isolated request
Client’s observed response time - the total delay between the time it sends a request and the time the server’s reply arrives
Observed response time increases in proportion to N
N denotes the average length of the request queue
Most servers restrict N to small values (e.g., 5) and expect
programmers to use concurrent servers in cases where a
small queue does not suffice
Iterative Server Algorithms
Create a socket and bind
Place the socket in passive mode
Accept the next connection request from the
socket and obtain a new socket for the
connection
Read request from the client and send reply
When finished with a particular client close
the connection and return to step 3
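A minimal sketch of this algorithm in C, in the Stevens wrapper style used elsewhere in these notes (SERV_PORT, LISTENQ and the application-specific doit() routine are assumed):
int listenfd, connfd;
struct sockaddr_in servaddr;

listenfd = Socket(AF_INET, SOCK_STREAM, 0);              // step 1: create socket
memset(&servaddr, 0, sizeof(servaddr));
servaddr.sin_family      = AF_INET;
servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
servaddr.sin_port        = htons(SERV_PORT);
Bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
Listen(listenfd, LISTENQ);                               // step 2: passive mode
while (1) {
    connfd = Accept(listenfd, NULL, NULL);               // step 3: next connection
    doit(connfd);                                        // step 4: read request, send reply
    Close(connfd);                                       // step 5: close, back to step 3
}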
Iterative Server Algorithms
[Figure: iterative echo transaction. The server calls open_listenfd (socket, bind, listen) and blocks in accept, awaiting a connection request from the next client; the client calls open_clientfd (socket, connect). During the client/server session the client sends requests with rio_writen and reads replies with rio_readlineb, while the server reads with rio_readlineb and replies with rio_writen. When the client closes, the server’s rio_readlineb sees EOF and the server closes the connection.]
Iterative Servers
Iterative servers process one request at a
time
[Figure: timeline with two clients. Client 1 calls connect, the server’s accept returns, client 1 writes and the server reads; both sides then close. Only after the server calls accept again does client 2’s pending connect complete and its request get read.]
Fundamental Flaw of Iterative Servers
[Figure: client 1 connects and the server accepts, but the user at client 1 “goes out to lunch”, so client 1 blocks in fgets waiting for the user to type a request and the server blocks in read waiting for data from client 1. Client 2’s connect therefore blocks, unable to complete its connection until after lunch!]
Solution: use concurrent servers instead
Concurrent servers use multiple concurrent flows to serve
multiple clients at the same time
Iterative, Connectionless
Server Algorithm
Create a socket and bind to the well-known
address for the service being offered
Repeatedly read the next request from a
client, formulate a response, and send a reply
back to the client according to the application
protocol
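A minimal sketch of the iterative, connectionless algorithm (SERV_PORT and MAXLINE are assumed constants; the reply formulation is application specific):
int s;
ssize_t n;
char buf[MAXLINE];
struct sockaddr_in servaddr, cliaddr;
socklen_t clilen;

s = Socket(AF_INET, SOCK_DGRAM, 0);                      // create and bind to the well-known port
memset(&servaddr, 0, sizeof(servaddr));
servaddr.sin_family      = AF_INET;
servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
servaddr.sin_port        = htons(SERV_PORT);
Bind(s, (struct sockaddr *)&servaddr, sizeof(servaddr));
while (1) {
    clilen = sizeof(cliaddr);
    n = Recvfrom(s, buf, MAXLINE, 0,
                 (struct sockaddr *)&cliaddr, &clilen);  // read next request
    // ... formulate the reply in buf ...
    Sendto(s, buf, n, 0,
           (struct sockaddr *)&cliaddr, clilen);         // send reply to that client
}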
Outline for typical concurrent server
int pid, s, conn;

s = Socket( ... );                  // create listening socket
// fill in server address
Bind(s, ... );
Listen(s, LISTENQ);
while (1) {
    conn = Accept(s, ... );         // wait for the next connection
    if ((pid = fork()) == 0) {      // child (slave) process
        close(s);                   // slave does not need the listening socket
        doit(conn);                 // service this client
        close(conn);
        exit(0);
    }                               // end of if
    close(conn);                    // master closes its copy of the connected socket
}                                   // end of while loop
Concurrent, Connectionless
Server Algorithm
Master: Create a socket and bind it. Leave
the socket unconnected
Master: Repeatedly call recvfrom to receive
the next request from a client and create a
new slave process to handle the response
Slave: Receive a specific request upon
creation as well as access to the socket
Slave: Form a reply and send it back to the
client using sendto
Exit
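A minimal sketch of this master/slave structure (s, buf and cliaddr are set up as in the iterative connectionless sketch above; child reaping is omitted):
while (1) {
    clilen = sizeof(cliaddr);
    n = Recvfrom(s, buf, MAXLINE, 0,
                 (struct sockaddr *)&cliaddr, &clilen);  // master: receive next request
    if (Fork() == 0) {                                   // slave: gets the request and the socket
        // ... formulate the reply in buf ...
        Sendto(s, buf, n, 0,
               (struct sockaddr *)&cliaddr, clilen);     // slave: send reply
        exit(0);                                         // slave: exit
    }
}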
Concurrent, Connectionless
Server Algorithm
Programmers should remember that creating a new process is expensive
In this case one must consider carefully
whether the cost of concurrency will be
greater than the gain in speed
Fact: Few connectionless servers have
concurrent implementations
Concurrent, Connection-
oriented Server Algorithm
M: Create a socket and bind. Leave the socket
unconnected
M: Place the socket in passive mode
M: Repeatedly call accept to receive the next
request from a client, and create a new slave
process to handle the response
S: Receive a connection request (i.e., socket for the
connection) upon creation
S: Interact with the client using the connection: read
requests and send response
Close the connection and exit.
Using separate programs as
slaves
For simple application protocols a single
server can contain all the code needed for
both the master and slave processes
In some cases it may be more convenient to
have the slave execute code from a program
that has been written and compiled
independently
UNIX can handle such cases easily because from
a slave process you can call execve…..
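A minimal sketch of such a slave: “slaveprog” and its path are purely illustrative, and s and conn are the listening and connected sockets from the concurrent server outline earlier.
if (Fork() == 0) {                          // slave process
    char *argv[] = { "slaveprog", NULL };
    char *envp[] = { NULL };
    close(s);                               // slave does not need the listening socket
    dup2(conn, STDIN_FILENO);               // connected socket becomes the slave's stdin
    dup2(conn, STDOUT_FILENO);              // ... and stdout
    execve("/usr/local/bin/slaveprog", argv, envp);
    exit(1);                                // reached only if execve fails
}
close(conn);                                // master closes its copy and waits for the next client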
Apparent Concurrency Using a
Single Process
The select() system call
Other Concurrent Server
Designs
For very high-load servers, the cost of creating a new child process (or even a thread) for each client imposes a significant burden
Web servers
Preforked and prethreaded
servers
The server precreates a fixed number of child
processes (or threads) immediately on
startup
These children constitute a server pool
Each child handles one client at a time, but instead of terminating after handling the client, the child fetches the next client to be serviced (see the sketch below)
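A minimal sketch of a preforked pool (NCHILDREN is an assumed pool size; s is the listening socket and doit() the per-client routine, as in the earlier outline; on most systems all children may block in accept on the shared listening socket):
int i, conn;
pid_t pid;

for (i = 0; i < NCHILDREN; i++) {
    if ((pid = Fork()) == 0) {              // child: member of the server pool
        while (1) {
            conn = Accept(s, NULL, NULL);   // each child blocks in accept
            doit(conn);                     // handle one client
            close(conn);                    // then fetch the next client
        }
    }
}
while (1)                                   // parent: the children do all the work
    pause();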
Server farms
DNS round robin load sharing
Challenges-
Good load balancing
Future requests from clients on a particular host
should bypass the round robin DNS server
Server load balancing
inetd Daemon
I/O Multiplexing: The select
and poll functions
Client is handling multiple descriptors
Client is handling multiple sockets – rare but
possible
TCP server handling both listening and
connected sockets
TCP server handling both TCP and UDP
TCP server handling multiple services and
multiple protocols
I/O multiplexing is not limited to networking
I/O Models
Five I/O models available in Unix
Blocking I/O model
Nonblocking I/O model
I/O multiplexing model
select() or poll()
Asynchronous I/O model
Signal-driven I/O model
I/O Models
Two distinct phases for an input operation:
Waiting for the data to be ready
Copying the data from the kernel to the process
Wait for data to arrive on network
When packet arrives it is copied into a buffer
within the kernel
Copy the data from kernel’s buffer to
application buffer
Blocking I/O Model
[Figure: blocking I/O. The application issues a recvfrom system call; the process blocks in the kernel until a datagram is ready and has been copied into the application’s buffer, then recvfrom returns OK and the application processes the datagram.]
Non-Blocking I/O Model
[Figure: nonblocking I/O. The application repeatedly issues recvfrom; while no datagram is ready each call returns EWOULDBLOCK immediately. When a datagram is finally ready, the kernel copies it into the application’s buffer, recvfrom returns OK, and the application processes the datagram.]
Non-Blocking I/O Model
When an application sits in a loop calling
recvfrom on a nonblocking descriptor like
this it is called polling
The application is continuously polling the
kernel to see if some operation is ready
This is often a waste of CPU time, but this
model is occasionally encountered, normally
on systems dedicated to one function
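A minimal sketch of such a polling loop (sockfd is assumed to have already been set nonblocking with fcntl, as shown later in these notes):
char buf[MAXLINE];
ssize_t n;

for ( ; ; ) {
    n = recvfrom(sockfd, buf, sizeof(buf), 0, NULL, NULL);
    if (n >= 0)
        break;                              // datagram ready: go process it
    if (errno != EWOULDBLOCK && errno != EAGAIN)
        break;                              // a real error occurred
    // no datagram yet: loop and ask the kernel again (burns CPU)
}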
I/O Multiplexing Model
[Figure: I/O multiplexing. The application blocks in a select system call until the kernel reports that a datagram is ready (select returns “readable”); it then issues recvfrom, blocking only while the datagram is copied into the application’s buffer, after which recvfrom returns OK and the application processes the datagram.]
I/O Multiplexing Model
We call select or poll and block in one of
these two system calls instead of blocking in
the actual I/O system call
Signal Driven I/O Model
[Figure: signal-driven I/O. The application establishes a SIGIO signal handler (this system call returns immediately) and the process continues executing. When a datagram is ready the kernel delivers SIGIO; the handler (or the main loop it notifies) then issues recvfrom, which blocks while the datagram is copied, returns OK, and the application processes the datagram.]
Signal Driven I/O Model
We first enable the socket for signal driven
I/O and install a handler using sigaction
When the datagram is ready the SIGIO signal
is generated for our process
We can either read the datagram from the signal handler by calling recvfrom and then notify the main loop, or notify the main loop and let it read the data
Asynchronous I/O Model
Introduced in POSIX.1 (realtime extensions)
We tell the kernel to start the operation and
notify us when the entire operation (including
copying of data from kernel to our buffer) is
complete
The select() system call
The select() system call blocks until one
or more of a set of file descriptors become
ready
The select() system call
#include <sys/time.h>     // for portability
#include <sys/select.h>

int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);
Returns number of ready file descriptors, 0 on
timeout, -1 on error
The select() system call
readfds, writefds and exceptfds are pointers
to file descriptor sets
readfds is the set of file descriptors to be
tested to see if input is possible
writefds is the set of file descriptors to be
tested to see if output is possible
exceptfds is the set of file descriptors to be tested to see if an exceptional condition has occurred
The select() system call:
The macros
#include <sys/select.h>
void FD_ZERO(fd_set *fdset);
void FD_SET(int fd, fd_set *fdset);
void FD_CLR(int fd, fd_set *fdset);
int FD_ISSET(int fd, fd_set *fdset);
The select() system call:
The timeout argument
timeout controls the blocking behavior of
select()
If NULL then select blocks indefinitely
Or it can be a pointer to a timeval structure
struct timeval {
time_t tv_sec; // Seconds
suseconds_t tv_usec; // Microseconds
};
The select() system call
When timeout is NULL or points to a structure
containing non-zero fields, select() blocks
until
At least one of the file descriptors specified in
readfds, writefd or exceptfds becomes ready
The call is interrupted by a signal handler
The amount of time specified by timeout has
passed
The select() system call:
Return values
-1 on error. Possible errors include EBADF and EINTR
0 means the call timed out before any file descriptor became ready. In this case the returned file descriptor sets will be empty
A positive value indicates the number of file descriptors that are ready
Each set must be examined using FD_ISSET()
If a file descriptor is ready for more than one event, it is counted multiple times
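Putting the calls and macros together, a minimal sketch that waits for input on either standard input or a connected socket sockfd (an assumed descriptor); the sets must be rebuilt before every call because select modifies them:
fd_set rset;
int maxfdp1, nready;

FD_ZERO(&rset);
FD_SET(STDIN_FILENO, &rset);
FD_SET(sockfd, &rset);
maxfdp1 = (sockfd > STDIN_FILENO ? sockfd : STDIN_FILENO) + 1;  // nfds = highest fd + 1

nready = select(maxfdp1, &rset, NULL, NULL, NULL);   // NULL timeout: block indefinitely
if (nready > 0) {
    if (FD_ISSET(sockfd, &rset))
        ;   // socket is readable: read/recv will not block
    if (FD_ISSET(STDIN_FILENO, &rset))
        ;   // terminal input is available
}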
Under what conditions is a
descriptor read ready?
Number of bytes of data in a socket receive buffer ≥
current size of low water mark for the socket receive
buffer. A read operation on that socket will not block
and will return a value greater than zero
Low water mark set using SO_RCVLOWAT socket option
Defaults to 1 for TCP/UDP sockets
The read half of the connection is closed (i.e., a
TCP connection that has received a FIN)
A read operation on that socket will not block and will return 0 (i.e., EOF)
Under what conditions is a
descriptor read ready?
The socket is a listening socket and the number of completed connections is nonzero. An accept on the listening socket will normally not block
A socket error is pending. A read operation on the socket will not block and will return an error (-1) with errno set to the specific error condition
Under what conditions is a
descriptor write ready?
Number of bytes of available space in the socket send buffer ≥ current size of the low-water mark for the socket send buffer
Use SO_SNDLOWAT
Default is 2048 bytes for TCP/UDP sockets
The write half of the connection is closed. A
write operation on the socket will generate
SIGPIPE
A socket error is pending
Under what conditions is a
descriptor exception ready?
If there exists OOB data for the socket or the
socket is still at the out-of-band mark
The poll() function
#include <poll.h>
int poll(struct pollfd fds[], nfds_t nfds, int timeout);
Returns number of ready file descriptors, 0 on
timeout and -1 on error
The poll() function
struct pollfd {
int fd; //File descriptor
short events; //Events of interest on fd
short revents; //Events occurred on fd
};
The poll() function: events
POLLIN       //Normal or priority-band data can be read
POLLRDNORM   //Normal data can be read
POLLRDBAND   //Priority-band data can be read
POLLPRI      //High-priority data can be read
POLLRDHUP    //Shutdown on peer socket
POLLOUT      //Normal data can be written
POLLWRNORM   //Same as POLLOUT
POLLWRBAND   //Priority-band data can be written
The poll() function: events
The following can occur only on revents
POLLERR // An error has occurred
POLLHUP // A hangup has occurred
POLLNVAL //File descriptor is not open
The poll() function: the
timeout argument
If timeout is -1, block until one of the file descriptors listed in the fds array is ready or a signal is caught
If timeout is 0, do not block; just check the descriptors and return immediately
If timeout > 0, block for up to timeout
milliseconds, until one of the file descriptors
in fds is ready, or until a signal is caught
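A minimal sketch of the equivalent poll() usage, watching the same two descriptors (sockfd is again an assumed connected socket):
struct pollfd fds[2];
int nready;

fds[0].fd = STDIN_FILENO;  fds[0].events = POLLIN;
fds[1].fd = sockfd;        fds[1].events = POLLIN;

nready = poll(fds, 2, -1);                  // -1: block until a descriptor is ready
if (nready > 0) {
    if (fds[1].revents & (POLLIN | POLLERR | POLLHUP))
        ;   // socket readable, or an error/hangup occurred
    if (fds[0].revents & POLLIN)
        ;   // standard input readable
}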
Socket Options
Various ways to get and set the options that
affect a socket
The getsockopt and setsockopt functions
The fcntl function
The ioctl function
getsockopt and setsockopt
#include <sys/socket.h>
int getsockopt(int sockfd, int level, int optname,
               void *optval, socklen_t *optlen);

int setsockopt(int sockfd, int level, int optname,
               const void *optval, socklen_t optlen);
getsockopt and setsockopt
level specifies the protocol to which the
socket option applies – e.g., TCP, IP etc.
At socket API level it is SOL_SOCKET
optname identifies the option whose value we
wish to set or retrieve
optval is a pointer to a buffer used to specify
or return the option value
Can be a pointer to an integer or structure,
depending on the option
getsockopt and setsockopt
optlen specifies the size (in bytes) of the buffer pointed to by optval
For setsockopt() this argument is passed by
value
For getsockopt(), optlen is a value-result
argument
getsockopt and setsockopt
int optval;
socklen_t optlen;
optlen = sizeof(optval);
getsockopt(sfd, SOL_SOCKET, SO_TYPE,
&optval, &optlen);
getsockopt and setsockopt
Write a program to check whether most of the
options defined are supported, and if so, print
their default values.
SO_BROADCAST
Enables or disables the ability of the process
to send broadcast messages
SO_DEBUG
Supported only by TCP
If enabled for a TCP socket, the kernel keeps track of detailed information about all the packets sent or received by TCP for the socket
These are kept in a circular buffer within the
kernel that can be examined with the trpt
program
SO_ERROR
When an error occurs on a socket, Berkeley-derived kernels set a variable named so_error for the socket to one of the standard Unix Exxx values
This is called the pending error for the socket
The process can be immediately notified of
the error in two ways
If the process is blocked in a call to select, select returns
If the process is using signal driven I/O, SIGIO is
generated for either the process or the process
group
SO_KEEPALIVE
When the keepalive option is set for a TCP
socket and no data has been exchanged
across the socket in either direction for 2
hours, TCP automatically sends a keepalive
probe to the peer. Three scenarios can occur
Peer responds with ACK. Application is not
notified (since everything is ok)
Peer responds with RST (peer host has crashed
or rebooted). Socket’s pending error is set to
ECONNRESET and the socket is closed
SO_KEEPALIVE
No response from the peer. Berkeley-derived kernels send eight additional probes, 75 secs apart; TCP gives up after 11 mins 15 secs. If there is no response, the socket’s pending error is set to ETIMEDOUT and the socket is closed. However if the
socket receives an ICMP error in response to one
of the keepalive probes, the corresponding error
is returned (the socket is still closed). A common
ICMP error is host unreachable, in which case the
pending error is set to EHOSTUNREACH
SO_LINGER
This option specifies how the close function operates for TCP
By default close returns immediately, but if
any data is still remaining in the socket send
buffer, the system will try to deliver the data
to the peer
SO_LINGER
The following structure is passed between
the user process and the kernel
struct linger {
int l_onoff; // 0=off; non-zero=on
int l_linger; // linger time in secs
};
SO_LINGER
If l_onoff is 0, the option is turned off
If l_onoff is non-zero and l_linger is 0, TCP aborts the connection when it is closed
TCP discards any remaining data in the socket send buffer and sends an RST to the peer
If both l_onoff and l_linger are non-zero, the kernel will linger when the socket is closed
If there is any data still remaining in the socket send buffer, the process is put to sleep until either
All data is sent and acknowledged by the peer TCP, or
The linger time expires
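A minimal sketch of the lingering-close case (sockfd is an assumed connected socket; the 5-second linger time is arbitrary):
struct linger ling;

ling.l_onoff  = 1;                          // turn lingering on
ling.l_linger = 5;                          // linger for at most 5 seconds
setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &ling, sizeof(ling));
close(sockfd);                              // may now block until the data is sent and ACKed,
                                            // or until the linger time expires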
SO_RCVBUF and SO_SNDBUF
TCP and UDP have receive buffers to hold
received data until read by application
With TCP, the available room in the socket
receive buffer is the window that TCP
advertises to the other end
Peer is not allowed to send data beyond the
advertised window (TCP flow control)
If peer still sends data beyond this window,
TCP discards it
With UDP if the incoming datagram does not
fit in the receive buffer, datagram is discarded
SO_RCVBUF and SO_SNDBUF
These two socket options let us change the
default sizes
Default value differs widely between
implementations
Older Berkeley-derived implementations would default the send and receive buffers to 4096 bytes
Newer systems use anywhere from 8192 to 61440 bytes
UDP send buffer often defaults to a value around
9000 bytes and the receive buffer around 40000
bytes
SO_RCVBUF and SO_SNDBUF
For a client SO_RCVBUF must be set before
or after connect() and why?
For a server SO_RCVBUF must be set when
and why?
Before or after listen()?
Before or after connect()?
SO_REUSEADDR and
SO_REUSEPORT
SO_REUSEADDR allows a listening server
to start and bind its well known port even if
previously established connections exist that
use this port
The listening server is restarted
The listening server terminates but the child
continues to service the client on the existing
connection
SO_REUSEADDR and
SO_REUSEPORT
SO_REUSEADDR allows multiple instances of the
same server to be started on the same port, as long
as each instance binds a different local IP address
SO_REUSEADDR allows a single process to bind
the same port to multiple sockets, as long as each
bind specifies a different local IP address
SO_REUSEADDR allows completely duplicate
bindings: a bind of an IP address and port, when
that same IP address and port are already bound to
another socket
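A minimal sketch of the common case, setting SO_REUSEADDR on the listening socket before bind (listenfd and servaddr are the server’s usual listening socket and address):
int on = 1;

setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
Bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr));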
SO_REUSEADDR and
SO_REUSEPORT
SO_REUSEPORT allows completely duplicate bindings, but only if each socket that wants to bind the same IP address and port specifies this socket option
SO_REUSEADDR is considered equivalent
to SO_REUSEPORT if the IP address being
bound is a ___________address (fill in the
blanks).
SO_TYPE
Returns the socket type
The integer value returned is a value such as
SOCK_STREAM or SOCK_DGRAM
Typically used by a process that inherits a
socket when it is started
SO_USELOOPBACK
Socket receives a copy of everything sent on
the socket
Only applies to socket in the routing domain
(AF_ROUTE)
fcntl revisited for sockets
Set the O_NONBLOCK file status flag using F_SETFL to make a socket nonblocking
Set the O_ASYNC file status flag using F_SETFL, which causes SIGIO to be generated when the status of the socket changes
F_SETOWN sets the socket owner (the process ID or process group ID) that receives the SIGIO and SIGURG signals
F_GETOWN returns the current owner of the socket
fcntl revisited for sockets
#include<fcntl.h>
int fcntl (int fd, int cmd, …./* int arg */ );
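A minimal sketch of the first use above, setting a socket nonblocking (the existing flags are fetched first so they are preserved):
int flags;

flags = fcntl(fd, F_GETFL, 0);              // get current file status flags
fcntl(fd, F_SETFL, flags | O_NONBLOCK);     // add O_NONBLOCK without clobbering the rest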
OOB
What is OOB?
Out of Bounds
Order of Battle
Out of Body
Out of Band Data
The idea is that something important occurs
at one end of the connection and that end
wants to tell its peer quickly
TCP does not have true out-of-band data
Logically independent transmission channel
associated with each pair of connected
stream sockets
Give some situations where out-of-band data
is used.
TCP Out of Band Data
TCP provides an urgent mode
To send OOB data use send, specifying the
MSG_OOB flag
send(fd, “a”, 1, MSG_OOB);
TCP Out of Band Data
TCP places the data in the next available
position in the socket send buffer and sets its
urgent pointer for this connection to be the
next available location
The next segment sent by TCP will have its
URG flag set in the TCP header
But this segment may or may not contain the byte
that we have labeled as OOB
TCP Out of Band Data
When TCP receives a segment with the URG
flag set, the urgent pointer is examined to see
whether this pointer refers to new out-of-band
data
The receiving process is notified when a new
urgent pointer arrives
SIGURG sent to the process when either fcntl
or ioctl has been called to establish an owner
for the socket and a signal handler has been
established
If process is blocked using select, then select
returns
TCP Out of Band Data
When the actual byte of data pointed to by the
urgent pointer arrives at the receiving TCP, the data
byte can be pulled out-of-band or left inline
By default the SO_OOBINLINE socket option is not
set for a socket so the single byte of data is not
placed into the socket receive buffer
Instead the data is placed into a separate 1-byte
OOB buffer for this connection
The only way to read from this special buffer is to
call recv, recvfrom or recvmsg and specify the
MSG_OOB flag
TCP Out of Band Data
If, however, the process sets the
SO_OOBINLINE socket option, then the
single byte of data is left in the normal socket
receive buffer
The process cannot specify the MSG_OOB
flag to read the data byte in this case
The process will know when it reaches this
byte of data by checking the out-of-band
mark for this connection
TCP Out of Band Data:
sockatmark()
Whenever OOB data is received there is an
associated out-of-band mark
This is the position in the normal stream of
data at the sender when the sending process
sent the OOB byte
The receiving process determines whether or
not it is at the OOB mark by calling
int sockatmark(int sockfd);
Returns 1 if at OOB mark, 0 if not, -1 on error
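A minimal sketch of a receive loop for inline OOB data (SO_OOBINLINE set on sockfd, an assumed connected socket):
char buf[100];
ssize_t n;

for ( ; ; ) {
    if (sockatmark(sockfd) == 1)
        printf("at OOB mark\n");            // next byte read is the out-of-band byte
    if ((n = read(sockfd, buf, sizeof(buf) - 1)) <= 0)
        break;                              // EOF or error
    buf[n] = '\0';
    printf("read %d bytes: %s\n", (int)n, buf);
}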
TCP Out of Band Data:
possible errors
If the process asks for OOB data using MSG_OOB
flag but the peer has not sent any, then EINVAL is
returned
If the process tries to read the same OOB data multiple times, EINVAL is returned
If SO_OOBINLINE is set and process tries to read
the OOB data using MSG_OOB flag, EINVAL is
returned
If the process has been notified that peer has sent
OOB byte (SIGURG or select) and the process tries
to read it, but that byte has not yet arrived,
EWOULDBLOCK is returned
References
For server design issues – Volume 3 of Comer’s book
For I/O multiplexing – Chapter 6 of Stevens’ book
For socket options – Chapter 7 of Stevens’ book
For OOB data – Chapter 21 of Stevens’ book