Unit-7 Socket Options
Source: UNIX Network Programming, Chapter 7, W. Richard Stevens, Bill Fenner, Andrew M. Rudoff
Overview
● Introduction
● getsockopt and setsockopt functions
● Socket states
● Generic socket options
● IPv4 socket options
● TCP socket options
● fcntl function
Introduction
There are various ways to get and set the options that affect a socket:
● getsockopt and setsockopt functions => general socket options, including the IPv4 and IPv6 multicasting options
● fcntl function => nonblocking I/O and signal-driven I/O; it is also used for standard file descriptor manipulations and locking.
● ioctl function => used when you need device-specific functionality.
getsockopt and setsockopt function
● The getsockopt and setsockopt system calls manipulate the options associated with a socket.
● getsockopt retrieves the current value of the option specified by the level and optname arguments for the socket specified by sockfd; setsockopt sets that value.
#include <sys/socket.h>
int getsockopt(int sockfd, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen);
/* Both return: 0 if OK, -1 on error */
Why do we need socket options?
● Socket options are used to configure and fine-tune the behavior of sockets in a network application.
● They provide additional control over how a socket operates and interacts with the underlying system and network.
● They enable developers to optimize performance, ensure reliability, and handle special requirements.
Without socket options, you would lose flexibility and control, and could face:
• Performance issues: limited buffer sizes and latency problems.
• Resource constraints: problems reusing ports or managing connections.
• Incompatibility: lack of support for advanced networking scenarios such as multicasting or QoS.
Contd..
Configuring Socket Behavior
Socket options allow you to control various aspects of how a socket functions. For example:
SO_REUSEADDR: Allows a server to bind its port even while earlier connections on that port are still in the TIME_WAIT state, avoiding the "Address already in use" error.
SO_KEEPALIVE: Sends periodic probes on an idle connection, which is useful for detecting dead or unresponsive peers.
SO_BROADCAST: Enables broadcasting for UDP sockets.
Example:
int opt = 1;          /* reuse ports: restart a server while old connections are in TIME_WAIT */
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
int keepalive = 1;    /* keep connections alive: detect broken connections */
setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &keepalive, sizeof(keepalive));
int bufsize = 65536;  /* tune performance: increase the size of the send buffer */
setsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));
Contd..
Enabling Advanced Features
● Some socket options enable advanced networking features:
IP_TOS: Sets the Type of Service (ToS) field in the IP header for Quality of Service (QoS) configurations.
SO_PRIORITY (Linux): Sets the protocol-defined priority for all packets sent on the socket, for traffic differentiation.
Multicast Options: Options like IP_ADD_MEMBERSHIP and IP_DROP_MEMBERSHIP control multicast
group memberships for efficient data distribution.
Contd..
Security and Reliability
● Socket options improve the robustness and security of network applications:
SO_LINGER: Ensures proper handling of pending data when closing a socket.
SO_RCVLOWAT and SO_SNDLOWAT: Specify how much data must be in the receive buffer, or how much free space must exist in the send buffer, before select reports the socket readable or writable.
Portability Across Platforms
● By using standard socket options (e.g., POSIX options), developers can write portable code
that behaves consistently across different operating systems.
Optimizing Network Performance
Socket options enable fine-tuning of performance parameters:
SO_RCVBUF and SO_SNDBUF: Adjust the size of the receive and send buffers to optimize data transfer
performance.
TCP_NODELAY: Disables Nagle's algorithm, reducing latency for applications that send small packets.
Getting and Setting Socket Options
● When manipulating socket options, both the level and optname arguments must be correctly specified.
● The "Datatype" column shows the data type of what the optval pointer must point to for each option. We use the notation of two braces to indicate a structure, as in linger{}.
● When calling getsockopt for the options marked as flags, *optval is an integer. The value returned in *optval is zero if the option is disabled, or nonzero if the option is enabled.
● setsockopt requires a nonzero *optval to turn the option on, and a zero value to turn the option off.
● If the "Flag" column does not contain a "•", then the option is used to pass a value of the specified data type between the user process and the system.
Types of options
Levels:
● Generic: SOL_SOCKET
● IP: IPPROTO_IP
● ICMPv6: IPPROTO_ICMPV6
● IPv6: IPPROTO_IPV6
● TCP: IPPROTO_TCP
Generic: SOL_SOCKET
optname        get  set  Description                                   Flag  Datatype
SO_BROADCAST   •    •    permit sending of broadcast datagrams         •     int
SO_DEBUG       •    •    enable debug tracing                          •     int
SO_DONTROUTE   •    •    bypass routing table lookup                   •     int
SO_ERROR       •         get pending error and clear                         int
SO_KEEPALIVE   •    •    periodically test if connection still alive   •     int
SO_LINGER      •    •    linger on close if data to send                     linger{}
Generic: SOL_SOCKET
● level: SOL_SOCKET
optname        get  set  Description                  Flag  Datatype
SO_SNDTIMEO    •    •    send timeout                       timeval{}
SO_REUSEADDR   •    •    allow local address reuse    •     int
Generic socket option
● SO_BROADCAST => enables or disables the ability of the process to send broadcast messages. Broadcasting is supported only for datagram sockets and only on networks that support broadcast messages (e.g., Ethernet, token ring).
You cannot broadcast on a point-to-point link or with a connection-based transport protocol.
● SO_DEBUG => When enabled for a TCP socket, the kernel keeps track of detailed information about all packets sent or received by TCP for the socket (supported only by TCP).
● SO_DONTROUTE => outgoing packets bypass the normal routing mechanisms of the underlying protocol and are sent directly to the appropriate local interface.
The equivalent of this option can also be applied to individual datagrams using the MSG_DONTROUTE flag with the send, sendto, or sendmsg function.
This option is used by routing daemons (e.g., routed and gated) to bypass the routing table.
SO_LINGER
● SO_LINGER => specifies how the close function operates for a connection-oriented protocol (by default, close returns immediately).
struct linger {
int l_onoff; /* 0 = off, nonzero = on */
int l_linger; /* linger time: seconds */
};
● l_onoff = 0: the option is turned off and l_linger is ignored; close returns immediately (the default).
● l_onoff nonzero and l_linger = 0: TCP aborts the connection when it is closed; any data remaining in the send buffer is discarded and an RST is sent to the peer.
● l_onoff nonzero and l_linger nonzero: close blocks until either all remaining data has been sent and acknowledged by the peer TCP, or the linger time expires. If the socket has been set nonblocking, close does not wait for the data to be sent, even if the linger time is nonzero.
[Figure: Close with SO_LINGER socket option set and l_linger a positive value (client writes data to server, then closes)]
Contd..
● One way to know that the peer application has read the data is to use an application-level acknowledgment, or application ACK:
client
char ack;
Write(sockfd, data, nbytes);                 /* data from client to server */
n = Read(sockfd, &ack, 1);                   /* wait for application-level ACK */
server
nbytes = Read(sockfd, buff, sizeof(buff));   /* data from client */
/* server verifies it received the correct amount of data from the client */
Write(sockfd, "", 1);                        /* server's ACK back to client */
(Read and Write are the book's error-checking wrappers around read and write.)
Buffer
● Every socket has a send buffer and receive buffer.
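● The receive buffers are used by TCP, UDP, and SCTP to hold received data until it is read by the application.
● TCP: the available room in the socket receive buffer limits the window that TCP advertises to the peer, so the peer cannot send more data than the buffer can hold.
● UDP: there is no flow control; a datagram that does not fit in the socket receive buffer is discarded.
● Two socket options let us change the default sizes. The default values differ widely between implementations.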
SO_RCVBUF , SO_SNDBUF
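● These options let us change the default send-buffer and receive-buffer sizes.
Default TCP send and receive buffer sizes:
4096 bytes (older systems)
8192-61440 bytes (newer systems)
Default UDP buffer sizes: often around 9,000 bytes for the send buffer (if the host supports NFS) and around 40,000 bytes for the receive buffer.
● For TCP, the receive buffer size must be set before connect (client) or listen (server), because it determines the window scale option exchanged in the SYN segments.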
SO_RCVLOWAT , SO_SNDLOWAT
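● Every socket has a receive low-water mark and a send low-water mark (used by the select function).
● Receive low-water mark:
The amount of data that must be in the socket receive buffer for select to return "readable".
Default receive low-water mark: 1 for TCP and UDP.
● Send low-water mark:
The amount of available space that must exist in the socket send buffer for select to return "writable".
Default send low-water mark: 2048 for TCP.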
SO_RCVTIMEO, SO_SNDTIMEO
● These two socket options allow us to place a timeout on socket receives and sends. The timeout is specified in a timeval structure.
● We disable a timeout by setting its value to 0 seconds and 0 microseconds; both timeouts are disabled by default.
● The receive timeout affects the five input functions: read, readv, recv, recvfrom, and recvmsg.
● The send timeout affects the five output functions: write, writev, send, sendto, and sendmsg.
● readv means "read vector".
● The term "vector" refers to the array of struct iovec structures that is passed to the system call, enabling the reading of data into multiple buffers in a single operation. This is why it is often associated with scatter I/O, where the data is "scattered" across multiple noncontiguous memory buffers.
SO_REUSEADDR, SO_REUSEPORT
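● Allows a listening server to start and bind its well-known port even if previously established connections exist that use this port as their local port.
● Allows multiple instances of the same server to be started on the same port, as long as each instance binds a different local IP address.
● Allows a single process to bind the same port to multiple sockets, as long as each bind specifies a different local IP address.
● Allows completely duplicate bindings (the same local IP address and port bound to more than one socket), normally only for UDP sockets; this is used for multicasting.
● SO_REUSEPORT (introduced with 4.4BSD) allows completely duplicate bindings, but only if every socket that binds the same IP address and port sets this option.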
SO_TYPE
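● Returns the socket type.
● The returned value is an integer such as SOCK_STREAM or SOCK_DGRAM.
● This option can only be fetched; it is typically used by a process that inherits a socket when it is started.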
SO_USELOOPBACK
● This option applies only to sockets in the routing domain (AF_ROUTE).
● For these sockets it defaults to on, and the socket receives a copy of everything sent on the socket.
IPv4 socket option
● Level => IPPROTO_IP
● IP_HDRINCL => If this option is set for a raw IP socket, we must build our own IP header for all the datagrams that we send on the raw socket.
IPv4 socket option
● IP_OPTIONS => allows us to set IP options in the IPv4 header.
● IP_RECVDSTADDR => causes the destination IP address of a received UDP datagram to be returned as ancillary data by recvmsg.
recvmsg()
● The recvmsg() API is similar to other socket APIs, such as recv() and read(), that allow an
application to receive data, but also provides the capability of receiving ancillary data.
● Ancillary data allows the TCP/IP protocol stack to return additional option data to the
application along with the normal data from the IP network.
IP_RECVIF
● Causes the index of the interface on which a UDP datagram is received to be returned as ancillary data by recvmsg.
IP_TOS
● Lets us set the type-of-service (TOS) field in the IP header for a TCP or UDP socket.
● If we call getsockopt for this option, the current value that would be placed into the TOS field in the IP header is returned.
IP_TTL
● Lets us set and fetch the default TTL that the system uses for unicast packets sent on a given socket.
TCP socket option
● There are five socket options for TCP, but three are new with POSIX.1g and not widely supported.
● Specify the level as IPPROTO_TCP.
IPPROTO_TCP
● level: IPPROTO_TCP
optname       get  set  Description                        Flag  Datatype
TCP_STDURG    •    •    interpretation of urgent pointer   •     int
TCP_KEEPALIVE
● This is new with POSIX.1g.
● It specifies the idle time in seconds for the connection before TCP starts sending keepalive probes.
● The default is 2 hours.
● This option is effective only when the SO_KEEPALIVE socket option is enabled.
● On Linux, the equivalent option is named TCP_KEEPIDLE.
TCP_MAXRT
● This is new with POSIX.1g.
● It specifies the amount of time in seconds before a connection is broken once TCP starts retransmitting data:
0: use the system default
-1: retransmit forever
positive value: may be rounded up to the implementation's next retransmission time
TCP_MAXSEG
● This allows us to fetch or set the maximum segment size (MSS) for a TCP connection.
TCP_NODELAY
● This option disables TCP's Nagle algorithm (by default, the algorithm is enabled).
● Purpose of the Nagle algorithm:
==> prevent a connection from having multiple small packets outstanding at any time.
● Small packet => any packet smaller than the MSS.
Nagle algorithm
● Enabled by default.
● Reduces the number of small packets on a WAN.
● If a given connection has outstanding data (data that has been sent but not yet acknowledged), then no small packets will be sent on the connection until the existing data is acknowledged.
Nagle Algorithm Disabled
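[Figure: the characters h, e, l, l, o, ! are typed at 0, 250, 500, 750, 1000, and 1250 ms and each is sent as its own segment; time axis 0 to 2000 ms]
Nagle Algorithm Enabled
[Figure: the same input with the Nagle algorithm enabled]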
Example
● Consider a client sending the message "Network Programming" to the server. The client sends one character every 250 ms, and the RTT is 600 ms. The server sends back the echo along with the ACK for each character sent by the client. Calculate and compare the total time taken for this communication with TCP_NODELAY enabled and disabled. Write the Nagle algorithm to show that it reduces the number of packets exchanged in the communication.
Contd..
Note: State clearly the total number of packets sent and received, and the total time, with TCP_NODELAY enabled and disabled.
fcntl function
● fcntl stands for "file control".
● The fcntl function can change the properties of a file that is already open.
● This function performs various descriptor control operations.
● For sockets, it provides the following features:
Nonblocking I/O
Signal-driven I/O
Setting the socket owner to receive the SIGIO signal
fcntl: Nonblocking I/O
● A blocking function can delay the execution of other tasks, especially independent ones:
In a server, other requests may get blocked.
In a worker consuming tasks from a queue, other independent tasks may get delayed.
● The fcntl (file control) API is used to retrieve or set the flags that are associated with a socket or stream file.
Nonblocking I/O using fcntl
int flags;
/* set a socket as nonblocking */
if ((flags = fcntl(fd, F_GETFL, 0)) < 0)
    err_sys("F_GETFL error");
flags |= O_NONBLOCK;
if (fcntl(fd, F_SETFL, flags) < 0)
    err_sys("F_SETFL error");
(err_sys is the book's error function: it prints a message and terminates the process.)
The only correct way to set one of the file status flags is to fetch the current flags, logically OR in the new flag, and then set the flags.
Each descriptor has a set of file flags that is fetched with the F_GETFL command and set with the F_SETFL command.
Misuse of fcntl
/* wrong way to set socket nonblocking */
if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0)
    err_sys("F_SETFL error");
This sets the nonblocking flag but also clears all the other file status flags.
Turn off the nonblocking flag
flags &= ~O_NONBLOCK;
if (fcntl(fd, F_SETFL, flags) < 0)
    err_sys("F_SETFL error");
F_SETOWN
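● F_SETOWN => sets the socket owner, the process or process group that receives the SIGIO and SIGURG signals for the socket. The integer arg value is either positive (a process ID) or negative (its absolute value is a process group ID).
● F_GETOWN => returns the socket owner as the return value of fcntl, either a process ID (positive) or a process group ID (negative).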
Name and Address Conversions
Elementary Name and Address Conversions
● Domain Name System(DNS)
● gethostbyname Function
● gethostbyname2 Function and IPv6 support
● gethostbyaddr Function
● uname and gethostname Functions
● getservbyname and getservbyport Functions
● Other networking information
Introduction DNS
1. What is the IP address of udel.edu? It is 128.175.13.92.
2. What is the host name of 128.175.13.74? It is strauss.udel.edu.
DNS Components
● Resolvers:
Client programs that extract information from Name Servers.
● Name Servers:
Server programs that hold information about the domain structure and the names in it.
Resolvers
A resolver maps a name to an address and vice versa.
[Figure: the resolver sends a query to a name server and receives a response.]
Iterative Resolution
[Figure: Iterative resolution. The client sends the iterative request "What is the IP address of www.google.com?" and follows a chain of iterative responses (referrals): "I don't know. Try a.root-servers.net.", "I don't know. Try a.gtld-servers.net.", "I don't know. Try a3.nstld.com.", "I don't know. Try ns1.google.com.", until the final iterative response "The IP address of www.google.com is 216.239.37.99." Servers shown: udel server, a.root server, a.gtld-server, a3.nstld.com, ns1.google.com.]
Recursive Resolution
[Figure: Recursive resolution. The client sends the recursive request "What is the IP address of www.google.com?" and the name servers (udel, root, edu, com, and google servers) resolve it on the client's behalf, returning the recursive response "The IP address of www.google.com is 216.239.37.99."]
Domain Name System
● Entries in DNS: resource records (RRs) for a host
A record: maps a hostname to a 32-bit IPv4 address
AAAA ("quad A") record: maps a hostname to a 128-bit IPv6 address
PTR record: maps an IP address to a hostname
MX record: specifies a host to act as a mail exchanger for the domain
CNAME record: assigns a canonical name, commonly used for services such as ftp and www
MX Record
● Three servers can receive email for the kerio.com email domain. The lowest number means the highest server preference.
● In this example, the primary MX server (the server with the highest preference) is mx1.kerio.com.
Contd..
● If you have to change the IP address, you only have to change it in one place, i.e., in the A record.
DNS: Application, Resolver, Name Servers
[Figure: The application code calls the resolver code (function call and return). Using the resolver configuration files, the resolver sends a UDP request to the local name server and receives a UDP reply; the local name server in turn queries other name servers.]
• resolver functions: gethostbyname/gethostbyaddr
• name server: BIND (Berkeley Internet Name Domain)
• static hosts file (DNS alternative): /etc/hosts
• resolver configuration file (specifies name server IPs): /etc/resolv.conf
Contd..
● The gethostbyname() function returns a structure of type hostent for the
given host name.
● gethostbyname performs a DNS query for an A record or an AAAA record.
#include <netdb.h>
struct hostent *gethostbyname(const char *hostname);
returns: nonnull pointer if OK, NULL on error with h_errno set
struct hostent {
char *h_name; /* official (canonical) name of host */
char **h_aliases; /* ptr to NULL-terminated array of ptrs to alias names */
int h_addrtype; /* host addr type: AF_INET or AF_INET6 */
int h_length; /* length of address: 4 or 16 */
char **h_addr_list; /* ptr to array of ptrs with IPv4/IPv6 addrs */
};
#define h_addr h_addr_list[0] /* first address in list */
hostent Structure Returned by gethostbyname
hostent{}
h_name       -> "canonical hostname\0"
h_aliases    -> { "alias #1\0", "alias #2\0", NULL }
h_addrtype   =  AF_INET or AF_INET6
h_length     =  4 or 16
h_addr_list  -> { IP addr #1, IP addr #2, IP addr #3, NULL }   (each an in_addr{} or in6_addr{})
RES_USE_INET6 Resolver Option
● Per-application: call res_init and set the option:
#include <resolv.h>
res_init();
_res.options |= RES_USE_INET6;
● For a host without an AAAA record, IPv4-mapped IPv6 addresses are returned.
gethostbyname2 Function and IPv6 Support
#include <netdb.h>
struct hostent *gethostbyname2(const char *hostname, int family);
returns: nonnull pointer if OK, NULL on error with h_errno set
                                  RES_USE_INET6 off    RES_USE_INET6 on
gethostbyname(host)               A record             AAAA record, or A record returning IPv4-mapped IPv6 addr
gethostbyname2(host, AF_INET)     A record             A record returning IPv4-mapped IPv6 addr
gethostbyname2(host, AF_INET6)    AAAA record          AAAA record
Note: The resolver's RES_USE_INET6 option, together with which function is called (gethostbyname or gethostbyname2), dictates the type of records that are searched for in the DNS (A or AAAA) and what type of addresses are returned (IPv4, IPv6, or IPv4-mapped IPv6).
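gethostbyaddr Function: binary IP address to hostent structure
#include <netdb.h>
struct hostent *gethostbyaddr(const char *addr, size_t len, int family);
returns: nonnull pointer if OK, NULL on error with h_errno set
● addr points to an in_addr or in6_addr structure holding the binary address, len is its size (4 or 16), and family is AF_INET or AF_INET6; the function performs a PTR query and returns the hostname in h_name.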
getservbyname and getservbyport Functions
● The getservbyname() function searches the database and finds an entry which matches the
specified service name and the specified protocol, opening a connection to the database if
necessary.
● The getservbyport() function searches the database and finds an entry which matches the
specified port number and the specified protocol, opening a connection to the database if
necessary.
getservbyname and getservbyport Functions
#include <netdb.h>
struct servent *getservbyname(const char *servname, const char *protoname);
returns: nonnull pointer if OK, NULL on error
struct servent *getservbyport(int port, const char *protoname);
returns: nonnull pointer if OK, NULL on error
The getservbyport function returns the service entry (including the service name) for a given port number and protocol; the port argument must be in network byte order.
struct servent {
char *s_name; /* official service name */
char **s_aliases; /* alias list */
int s_port; /* port number, network byte order */
char *s_proto; /* protocol to use: "tcp" or "udp" */
};
Mapping from service name to port number: /etc/services
Services that support multiple protocols often use the same TCP and UDP port number, but this is not always true:
shell 514/tcp
syslog 514/udp
Daytime Client using gethostbyname and getservbyname
● Call gethostbyname and getservbyname
● Try each server address
● Call connect
● Check for failure
● Read server’s reply
Other Networking Info
● Four types of info:
hosts (gethostbyname, gethostbyaddr)
through DNS or /etc/hosts, hostent structure
networks (getnetbyname, getnetbyaddr)
through DNS or /etc/networks, netent structure
protocols (getprotobyname, getprotobynumber)
through /etc/protocols, protoent structure
services (getservbyname, getservbyport)
through /etc/services, servent structure
CHAPTER 12
IPV4 AND IPV6 INTEROPERABILITY
Contents
● Introduction
● IPv4 Client, IPv6 Server
● IPv6 Client, IPv4 Server
● IPv6 Address Testing Macros
● IPV6_ADDRFORM Socket Option
● Source Code Portability
IPV6
● IPv6 can theoretically hold 2^128 IP addresses, which is a huge number:
● 2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456
● 340 undecillion, 282 decillion, 366 nonillion, 920 octillion, 938 septillion, 463 sextillion, 463 quintillion, 374 quadrillion, 607 trillion, 431 billion, 768 million, 211 thousand and 456
Introduction
● Server and client combinations:
IPv4 <=> IPv4 (most servers and clients)
IPv4 <=> IPv6
IPv6 <=> IPv4
IPv6 <=> IPv6
● How can IPv4 applications and IPv6 applications communicate with each other?
● Hosts run dual stacks: both an IPv4 protocol stack and an IPv6 protocol stack.
Dual stack
A station should run both IPv4 and IPv6 simultaneously until the whole Internet uses IPv6.
To determine which version to use, the source host queries the DNS and sends an IPv4 or IPv6 packet according to the type of address the DNS returns.
IPv4 Client , IPv6 Server
● An IPv6 server on a dual-stack host can handle both IPv4 and IPv6 clients.
● This is done using IPv4-mapped IPv6 addresses.
● The server creates an IPv6 listening socket that is bound to the IPv6 wildcard address.
IPv4 Mapped IPv6 Address
[Figure: An IPv6 server with an IPv6 listening socket bound to 0::0, port 9999, serves both an IPv4 client and an IPv6 client over TCP or UDP. For the IPv4 client, the address returned by accept or recvfrom is an IPv4-mapped IPv6 address.]
IPv6 client, IPv4 server
● An IPv4 server starts on an IPv4-only host and creates an IPv4 listening socket.
● The IPv6 client starts and calls gethostbyname; an IPv4-mapped IPv6 address is returned.
● The client and server then communicate using IPv4 datagrams.
Contd..
First consider an IPv6 TCP client running on a dual-stack host.
● An IPv4 server starts on an IPv4-only host and creates an IPv4 listening socket.
● The IPv6 client starts and calls getaddrinfo asking for only IPv6 addresses (it requests the
AF_INET6 address family and sets the AI_V4MAPPED flag in its hints structure). Since
the IPv4-only server host has only A records, an IPv4-mapped IPv6 address is returned to
the client.
● The IPv6 client calls connect with the IPv4-mapped IPv6 address in the IPv6 socket address
structure. The kernel detects the mapped address and automatically sends an IPv4 SYN to
the server.
● The server responds with an IPv4 SYN/ACK, and the connection is established using IPv4
datagrams.
[Figure: IPv4 sockets (AF_INET; SOCK_STREAM or SOCK_DGRAM; sockaddr_in) and IPv6 sockets (AF_INET6; SOCK_STREAM or SOCK_DGRAM; sockaddr_in6), both layered over TCP and UDP on top of IPv4 and IPv6.]
IPv6 Address Testing Macros
There is a small class of IPv6 applications that must know whether they are talking to an IPv4 peer.
● These applications need to know whether the peer's address is an IPv4-mapped IPv6 address.
● Twelve macros are defined for testing IPv6 addresses, for example:
● Eg: int IN6_IS_ADDR_LOOPBACK(const struct in6_addr * aptr ) ;
● Eg: int IN6_IS_ADDR_V4MAPPED(const struct in6_addr * aptr ) ;
IPV6_ADDRFORM Socket Option
● Changes a socket from one address family to another (IPv4 to IPv6, or IPv6 to IPv4), subject to the following restrictions:
An IPv4 socket can always be changed to an IPv6 socket. Any IPv4 addresses already associated with the socket are converted to IPv4-mapped IPv6 addresses.
An IPv6 socket can be changed to an IPv4 socket only if any addresses already associated with the socket are IPv4-mapped IPv6 addresses.
Converting an IPv4 socket to IPv6
int af;
socklen_t clilen;
struct sockaddr_in6 cli;   /* IPv6 socket address structure */
struct hostent *ptr;

/* descriptor 0 is the connected socket (e.g., passed by inetd);
   Setsockopt and Getpeername are the book's error-checking wrappers */
af = AF_INET6;
Setsockopt(STDIN_FILENO, IPPROTO_IPV6, IPV6_ADDRFORM, &af, sizeof(af));
clilen = sizeof(cli);
Getpeername(0, (struct sockaddr *) &cli, &clilen);
ptr = gethostbyaddr(&cli.sin6_addr, 16, AF_INET6);
Contd..
● setsockopt => changes the address format of the socket from IPv4 to IPv6. The option value is AF_INET or AF_INET6 (getsockopt returns the current one).
● getpeername => returns an IPv4-mapped IPv6 address.