Introduction To Socket Programming
Introduction to Socket Programming – Overview of TCP/IP Protocols – Introduction to Sockets – Socket address
Structures – Byte ordering functions – Address conversion functions – Elementary TCP sockets – socket, connect,
bind, listen, accept, read, write, close functions – Iterative Server – Concurrent server.
Most network applications can be divided into two pieces: a client and a server.
Networking protocols are used for communication between the client application and the server application.
Web clients and servers communicate using the TCP protocol. TCP, in turn, uses the IP protocol, and IP
communicates with a data link layer of some form.
A server can handle a single client or multiple clients at a time.
The server executes first and waits to receive the request from the client.
The client executes next and sends the first packet to the server.
After the initial contact, either the client or server is capable of sending and receiving data.
The client and server communicate using an application protocol; the transport layers
communicate using TCP, and so on.
The actual flow of information between the client and server goes down the protocol stack on
one side, across the network, and up the protocol stack on the other side.
The client and server are user processes, while the TCP and IP protocols are normally part of
the protocol stack within the kernel.
In the figure below, the client and server communicate through a single LAN.
In the following figure, we show the client and server on different LANs, with both LANs connected to
a WAN using routers.
Routers are the building blocks of WANs. The largest WAN today is the Internet.
Fig. Client and server on different LANs connected through a WAN
All of the above protocols are defined in RFCs (Requests For Comments), which serve as their formal
specifications.
TCP Connection Establishment and Termination
Three-Way handshake
The following scenario occurs when a TCP connection is established:
1. The server must be prepared to accept an incoming connection. This is normally done by calling socket, bind,
and listen and is called a passive open.
2. The client issues an active open by calling connect. This causes the client TCP to send a "synchronize" (SYN)
segment, which tells the server the client's initial sequence number for the data that the client will send on the
connection. Normally, there is no data sent with the SYN; it just contains an IP header, a TCP header, and
possibly TCP options.
3. The server must acknowledge (ACK) the client's SYN and the server must also send its own SYN containing
the initial sequence number for the data that the server will send on the connection. The server sends its SYN and
the ACK of the client's SYN in a single segment.
4. The client must acknowledge the server's SYN.
The minimum number of packets required for this exchange is three; hence, this is called TCP's three-way
handshake. In general, a handshake is the exchange of messages by which two computers on a network
establish a connection.
We show the client's initial sequence number as J and the server's initial sequence
number as K. The acknowledgment number in the ACK of each SYN is the initial sequence
number plus one.
TCP Options
Each SYN can contain TCP options. Commonly used options are
i. MSS option
The Maximum Segment Size (MSS) is announced when the TCP connection is established.
The MSS option can appear only in a SYN segment.
If one end does not receive an MSS option from the other end, a default of 536 bytes is assumed.
ii. Window scale option
TCP always tells its peer exactly how many bytes of data it is willing to accept from the peer. This is
called the advertised window.
The maximum window that can be advertised in the TCP header is 65,535 bytes, because the field is 16 bits wide.
The window scale option specifies a shift count by which the advertised window is scaled, which allows larger windows.
The scale factor can be set only during connection establishment (it is carried in the SYN); it never changes during communication.
iii. Timestamp option. This option is needed for high-speed connections to prevent possible data corruption
caused by old, delayed, or duplicated segments.
Types of Socket:
Stream Socket
Datagram Socket
Raw Socket
Stream Socket
These sockets are reliable: data arrives at the destination without errors and in the same order in which it was sent. They use
TCP, so a connection is established between the sockets before the data transfer occurs. Socket type constant:
SOCK_STREAM
Datagram Socket
These sockets use UDP. There is no need to open a connection in the case of datagram sockets. Socket type
constant: SOCK_DGRAM
Raw Socket
A raw socket provides access to ICMP. Raw sockets are not intended for the general user. They are provided mainly for
those interested in developing new communication protocols. Socket type constant: SOCK_RAW
struct in_addr
{
    in_addr_t s_addr;          /* 32-bit IPv4 address */
                               /* network byte ordered */
};
struct sockaddr_in
{
    uint8_t sin_len;           /* length of structure (16) */
    sa_family_t sin_family;    /* AF_INET */
    in_port_t sin_port;        /* 16-bit TCP or UDP port number */
                               /* network byte ordered */
    struct in_addr sin_addr;   /* 32-bit IPv4 address */
                               /* network byte ordered */
    char sin_zero[8];          /* unused */
};
sockaddr_in is the IPv4-specific structure that parallels the generic struct sockaddr.
sin_port contains the port number and must be in network byte order.
sin_family corresponds to sa_family and contains the address family (AF_INET for IPv4). It is never sent
on the network, so it is kept in host byte order.
sin_addr represents the Internet address (IPv4).
The sin_zero member is unused, but we always set it to all zeros using the bzero( ) or memset( ) functions.
Both the IPv4 address and the TCP or UDP port number are always stored in the structure in network byte order.
Socket address structures are used only on a given host: the structure itself is not communicated between different
hosts although certain fields (IP address and port) are used for communication.
IPv6 Socket address Structure
The IPv6 socket address is defined by including the <netinet/in.h> header. IPv6 socket address structure:
sockaddr_in6.
struct in6_addr
{
    uint8_t s6_addr[16];         /* 128-bit IPv6 address */
                                 /* network byte ordered */
};
struct sockaddr_in6
{
    uint8_t sin6_len;            /* length of this struct (28) */
    sa_family_t sin6_family;     /* AF_INET6 */
    in_port_t sin6_port;         /* transport layer port# */
                                 /* network byte ordered */
    uint32_t sin6_flowinfo;      /* priority, flow label */
                                 /* network byte ordered */
    struct in6_addr sin6_addr;   /* IPv6 address */
                                 /* network byte ordered */
    uint32_t sin6_scope_id;      /* set of interfaces for a scope */
};
The SIN6_LEN constant must be defined if the system supports the length member for socket address
structures.
The IPv6 family is AF_INET6, whereas the IPv4 family is AF_INET.
The members in this structure are ordered so that if the sockaddr_in6 structure is 64-bit aligned, so is the
128-bit sin6_addr member.
The sin6_flowinfo member is divided into three fields:
The low-order 24 bits are the flow label
The next 4 bits are the priority,
The next 4 bits are reserved
The socket functions are then defined as taking a pointer to the generic socket address structure, as shown here in
the function prototype for the bind function:
int bind(int, struct sockaddr *, socklen_t);
This requires that any calls to these functions must cast the pointer to the protocol-specific socket address
structure to be a pointer to a generic socket address structure. For example,
struct sockaddr_in serv; /* IPv4 socket address structure */
/* fill in serv{} */
bind(sockfd, (struct sockaddr *) &serv, sizeof(serv));
From an application programmer's point of view, the only use of these generic socket address structures is to cast
pointers to protocol-specific structures.
The four functions accept(), recvfrom(), getsockname() and getpeername() pass a socket address structure from the
kernel to the process, the reverse direction from the previous scenario. In this case the length is passed as a pointer
to an integer containing the size of the structure.
The reason that the size changes from an integer to a pointer to an integer is that the size is both a value
when the function is called (it tells the kernel the size of the structure so that the kernel does not write past the
end of the structure when filling it in) and a result when the function returns (it tells the process how much
information the kernel actually stored in the structure). This type of argument is called a value-result argument.
1.5 Byte ordering functions
Consider a 16-bit integer that is made up of 2 bytes. There are two ways to store the two bytes in memory: with the
low-order byte at the starting address, known as little-endian byte order, or with the high-order byte at the starting
address, known as big-endian byte order.
In this fig. we show increasing memory addresses going from right to left in the top, and from left to right in the
bottom. There is no standard between these two byte orderings and we encounter systems that use both formats.
The byte ordering used by a given system is known as host byte order. The byte ordering used by a given
protocol is known as Network byte order.
Normally, the address structure could be maintained in host byte order and converted into network byte order
as required. However, Posix.1g specifies that certain fields in the socket address structure be
maintained in network byte order. Therefore, there are functions that convert between these two byte orders.
#include <netinet/in.h>
uint16_t htons(uint16_t host16bitvalue);
uint32_t htonl(uint32_t host32bitvalue);
Both return: value in network byte order
uint16_t ntohs(uint16_t net16bitvalue);
uint32_t ntohl(uint32_t net32bitvalue);
Both return: value in host byte order
memset() sets the specified number of bytes in the destination to the value c. memcpy() is similar to bcopy(),
but the order of the two pointer arguments is swapped. bcopy() correctly handles overlapping fields, while the
behaviour of memcpy() is undefined if the source and destination overlap; memmove() can be used
when the fields overlap. memcmp() compares two arbitrary byte strings and returns 0 if they are identical; if not,
the return value is either greater than 0 or less than 0, depending on whether the first unequal byte pointed to by ptr1
is greater than or less than the corresponding byte pointed to by ptr2.
There are two groups of address conversion functions that convert Internet addresses between ASCII strings
(readable form) and network byte ordered binary values.
inet_aton( ), inet_addr( ), and inet_ntoa( ): convert an IPv4 address between a dotted-decimal string (e.g.
206.62.226.33) and its 32-bit network byte ordered binary value.
#include <arpa/inet.h>
The newer functions, inet_pton( ) and inet_ntop( ), handle both IPv4 and IPv6 addresses.
inet_aton function:
It converts the C character string pointed to by strptr into its 32-bit binary network byte ordered value,
which is stored through the pointer addrptr. If successful, 1 is returned; otherwise, 0.
#include <arpa/inet.h>
int inet_aton(const char *strptr, struct in_addr *addrptr);
Returns: 1 if string was valid, 0 on error
inet_ntoa function:
It converts a 32-bit binary network byte ordered IPv4 address into its corresponding dotted-decimal string. The
string pointed to by the return value of the function resides in static memory, so it is overwritten by the next call. This
function takes a structure as its argument, not a pointer to a structure. (This is rare.)
#include <arpa/inet.h>
char *inet_ntoa(struct in_addr inaddr);
Returns: pointer to dotted-decimal string
inet_addr function:
It does the same conversion as inet_aton(), returning the 32-bit binary network byte ordered value as the return
value. The problem is that, although all IP addresses (0.0.0.0 through 255.255.255.255) are valid addresses, the function returns
the constant INADDR_NONE (typically 32 one-bits) on an error, so the address 255.255.255.255 cannot be handled by this function.
#include <arpa/inet.h>
in_addr_t inet_addr(const char *strptr);
Returns: 32-bit binary network byte ordered IPv4 address; INADDR_NONE if error
inet_pton and inet_ntop functions:
These two functions are new with IPv6 and work with both IPv4 and IPv6 addresses.
The letters p and n stand for presentation and numeric. The presentation format for an address is often an ASCII
string, and the numeric format is the binary value that goes into a socket address structure.
#include <arpa/inet.h>
int inet_pton(int family, const char *strptr, void *addrptr);
Returns: 1 if OK, 0 if input not a valid presentation format, -1 on error
const char *inet_ntop(int family, const void *addrptr, char *strptr, size_t len);
Returns: pointer to result if OK, NULL on error
The family argument for both functions is either AF_INET or AF_INET6. If family is not supported,
both functions return an error (-1 from inet_pton, NULL from inet_ntop) with errno set to EAFNOSUPPORT.
The first function tries to convert the string pointed to by strptr, storing the binary result through the
pointer addrptr. If successful, the return value is 1. If the input string is not a valid presentation format for the
specified family, 0 is returned.
inet_ntop( ) does the reverse conversion, from numeric (addrptr) to presentation (strptr). The len
argument is the size of the destination, to prevent the function from overflowing the caller's buffer. To help
specify this size, two constants are defined by including the <netinet/in.h> header: INET_ADDRSTRLEN (16, for an
IPv4 dotted-decimal string) and INET6_ADDRSTRLEN (46, for an IPv6 hex string).
The following figure summarizes the five functions.
First the server is started; then, some time later, a client is started that connects to the server. The client sends a
request to the server, the server processes the request, and the server sends back a reply to the client. This
continues until the client closes its end of the connection, which sends an end-of-file notification to the server.
The server then closes its end of the connection and either terminates or waits for a new connection.
i. socket ( ) Function
To perform network I/O, the first thing a process must do is call the socket( ) function, specifying the type of
communication protocol desired.
#include <sys/socket.h>
int socket(int family, int type, int protocol);
On success, the socket( ) function returns a small non-negative integer value, which we call a socket descriptor or
sockfd. On error it returns -1.
family: specifies the protocol family (AF_INET for IPv4, AF_INET6 for IPv6)
type: indicates the communication semantics
SOCK_STREAM stream socket TCP
SOCK_DGRAM datagram socket UDP
SOCK_RAW raw socket
protocol: set to 0 except for raw sockets
Example: sd = socket(AF_INET, SOCK_STREAM, 0);
ii. connect Function: The connect function is used by a TCP client to establish an active connection with a remote
server. The arguments allow the client to specify the remote endpoint, which includes the remote machine's IP
address and protocol port number.
#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *servaddr, socklen_t addrlen);
Returns: 0 if OK, -1 on error
sockfd is the socket descriptor that was returned by the socket function. The second and third arguments are a
pointer to a socket address structure and its size.
In the case of a TCP socket, the connect() function initiates TCP's three-way handshake. The function
returns only when the connection is established or an error occurs. The different types of errors are:
1. If the client TCP receives no response to its SYN segment, ETIMEDOUT is returned. 4.4BSD, for example,
sends one SYN when connect is called, another 6 seconds later, and another 24 seconds later. If no
response is received after a total of 75 seconds, the error is returned.
2. If the server's response to the client's SYN is an RST, this indicates that no process is waiting for
connections on the server at the port specified. This is a hard error, and ECONNREFUSED is returned to the
client as soon as the RST is received. An RST is generated (a) when a SYN arrives for a port that has no
listening server, (b) when TCP wants to abort an existing connection, and (c) when TCP receives a segment
for a connection that does not exist.
3. If the client's SYN elicits an ICMP destination unreachable from some intermediate router, this is
considered a soft error. The client kernel saves the message but keeps sending SYNs for the same time
period of 75 seconds. If no response is received, the saved ICMP error is returned to the process as
either EHOSTUNREACH or ENETUNREACH.
In terms of the TCP state transition diagram, connect() moves from the CLOSED state to the SYN_SENT state
and then on success to the ESTABLISHED state. If the connect fails, the socket is no longer usable and must be
closed.
bind Function: When a socket is created, it does not have any notion of endpoint addresses. An application calls bind
to specify the local endpoint address for a socket. That is, the bind function assigns a local address and port to a
socket.
#include <sys/socket.h>
int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen);
Returns: 0 if OK, -1 on error
The second argument is a pointer to a protocol-specific address and the third argument is the size of this
address structure. Servers bind their well-known port when they start. (A TCP client normally does not bind an IP
address to its socket; the kernel chooses the source address and an ephemeral port when the socket is connected.)
listen Function:
The listen function is called only by a TCP server, and it performs the following two actions.
The listen function converts an unconnected socket into a passive socket, indicating that the kernel
should accept incoming connection requests directed to this socket. In terms of the TCP state transition diagram, the call
to listen moves the socket from the CLOSED state to the LISTEN state.
The second argument to this function specifies the maximum number of connections that the kernel should
queue for this socket.
#include <sys/socket.h>
int listen(int sockfd, int backlog);
Returns: 0 if OK, -1 on error
This function is normally called after both the socket and bind functions and must be called before calling the
accept function.
The kernel maintains two queues for a listening socket, and the backlog has traditionally limited the sum of the entries on both queues. These are:
An incomplete connection queue, which contains an entry for each SYN that has arrived from a client for
which the server is awaiting completion of the TCP three-way handshake. These sockets are in the
SYN_RCVD state.
A completed connection queue, which contains an entry for each client with whom the three-way handshake has
completed. These sockets are in the ESTABLISHED state.
Following figure depicts these two queues for a given listening socket.
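Fig. The two queues maintained by TCP for a listening socket: an arriving SYN creates an entry on the incomplete
connection queue; when the three-way handshake (3WHS) completes, the entry moves to the completed connection queue
(ESTABLISHED state), from which accept takes it. The sum of both queues cannot exceed backlog.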
When a SYN arrives from a client, TCP creates a new entry on the incomplete queue and then responds with the second
segment of the three-way handshake: the server's SYN with an ACK of the client's SYN. This entry remains on the
incomplete queue until the third segment of the three-way handshake arrives (the client's ACK of the server's SYN) or the
entry times out. If the three-way handshake completes normally, the entry moves from the incomplete queue to the
completed queue. When the process calls accept, the first entry on the completed queue is returned to the process or, if the
queue is empty, the process is put to sleep until an entry is placed onto the completed queue. If the queues are full when a
client SYN arrives, TCP ignores the arriving SYN; it does not send an RST. This is because the condition is considered temporary,
and the client TCP will retransmit its SYN in the hope of finding room in the queue.
accept Function: accept is called by a TCP server to return the next completed connection from the front of the completed
connection queue. If the completed queue is empty, the process is put to sleep.
#include <sys/socket.h>
int accept(int sockfd, struct sockaddr *cliaddr, socklen_t *addrlen);
Returns: non-negative descriptor if OK, -1 on error
The cliaddr and addrlen arguments are used to return the protocol address of the connected peer process (the
client). addrlen is a value-result argument: before the call, we set the integer value pointed to by addrlen to the size of
the socket address structure pointed to by cliaddr, and on return this integer value contains the actual number of bytes stored
by the kernel in the socket address structure. If accept is successful, its return value is a brand new descriptor that was
automatically created by the kernel. This new descriptor refers to the TCP connection with the client. When discussing
accept, we call the first argument to accept the listening socket and we call the return value from accept the connected socket.
fork function:
fork is the function that enables Unix to create a new process.
#include <unistd.h>
pid_t fork(void);
Returns: 0 in child, process ID of child in parent, -1 on error
There are two typical uses of the fork function:
1. A process makes a copy of itself so that one copy can handle one operation while the other copy does another
task. This is the normal way of working in network servers.
2. A process wants to execute another program. Since the only way to create a new process is by calling fork,
the process first calls fork to make a copy of itself, and then one of the copies (typically the child process) calls
an exec function to replace itself with the new program. This is typical for programs such as shells.
Although fork is called once, it returns twice. It returns once in the calling process (called the parent)
with a return value that is the process ID of the newly created process (the child). It also returns once in the child,
with a return value of 0. Hence the return value tells the process whether it is the parent or the child.
The reason fork returns 0 in the child, instead of the parent's process ID, is that a child has only one parent
and can always obtain the parent's process ID by calling getppid. A parent, on the other hand, can have any
number of children, and there is no way to obtain the process IDs of its children. If the parent wants to keep
track of the process IDs of all its children, it must record the return values from fork.
exec function:
The only way in which an executable program file on disk is executed by Unix is for an existing process to call one
of the six exec functions. exec replaces the current process image with the new program file, and this new program
normally starts at the main function. The process ID does not change. We refer to the process that calls exec as the
calling process and to the newly executed program as the new program.
#include <unistd.h>
int execl(const char *pathname, const char *arg0, ... /* (char *) 0 */);
int execv(const char *pathname, char *const argv[]);
int execle(const char *pathname, const char *arg0, ... /* (char *) 0, char *const envp[] */);
int execve(const char *pathname, char *const argv[], char *const envp[]);
int execlp(const char *filename, const char *arg0, ... /* (char *) 0 */);
int execvp(const char *filename, char *const argv[]);
These functions return to the caller only if an error occurs. Otherwise control passes to the start of the new
program, normally the main function.
read and write functions:
The write function transfers data from the application to a buffer in the kernel on the local machine. The read function
transfers data from a buffer in the kernel to the application.
close function:
This function is used to close a socket and terminate a TCP connection. It marks the socket as closed and returns
to the process immediately.
Iterative server program:
#include "unp.h"
#include <time.h>
int main()
{
    int listenfd, connfd;
    socklen_t len;
    struct sockaddr_in servaddr, cliaddr;
    char buff[MAXLINE];
    time_t ticks;

    listenfd = Socket(AF_INET, SOCK_STREAM, 0);
    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(13); /* daytime server */
    Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));
    Listen(listenfd, LISTENQ);
    for ( ; ; )
    {
        len = sizeof(cliaddr); /* value-result argument */
        connfd = Accept(listenfd, (SA *) &cliaddr, &len);
        printf("connection from %s, port %d\n",
               Inet_ntop(AF_INET, &cliaddr.sin_addr, buff, sizeof(buff)),
               ntohs(cliaddr.sin_port));
        ticks = time(NULL); /* current time */
        snprintf(buff, sizeof(buff), "%.24s\r\n", ctime(&ticks));
        Write(connfd, buff, strlen(buff)); /* send the time to the client */
        Close(connfd);
    }
}
Here we declare two new variables, len and cliaddr. cliaddr will contain the client's protocol
address.
We create a socket by using the socket( ) function. It returns a socket descriptor representing an endpoint.
After the socket descriptor is created, the bind( ) function assigns a local address (a unique name) to the socket.
The listen( ) function allows the server to accept incoming client connections.
The server uses the accept( ) function to accept an incoming connection request. The accept( ) call will block
indefinitely, waiting for an incoming connection to arrive.
The result is written to the client by using the write( ) function.
The server closes its connection with the client by calling the close( ) function.
When a connection is established, accept returns, the server calls fork, and the child process services the client (on
connfd, the connected socket) and the parent process waits for another connection (on listenfd, the listening
socket). The parent closes the connected socket since the child handles the new client.
#include "unp.h"
int main(int argc, char **argv)
{
    pid_t pid;
    int listenfd, connfd;
    socklen_t len;
    struct sockaddr_in servaddr, cliaddr;

    listenfd = Socket(AF_INET, SOCK_STREAM, 0);
    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(13);
    Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));
    Listen(listenfd, LISTENQ);
    for ( ; ; )
    {
        len = sizeof(cliaddr); /* value-result argument must be set before each call */
        connfd = Accept(listenfd, (SA *) &cliaddr, &len); /* probably blocks */
        if ( (pid = Fork()) == 0)
        {
            Close(listenfd); /* child closes listening socket */
            doit(connfd); /* process the request */
            Close(connfd); /* done with this client */
            exit(0); /* child terminates */
        }
        Close(connfd); /* parent closes connected socket */
    }
}
With the fork( ) function we create a separate process for each request.
fork( ) splits the current process into two processes: a parent and a child.
fork( ) returns 0 in the child, so the code must check the return value from fork( ) to tell which process it is running in.
From this point, the child processes the client's request and the parent can continue on to accept other
requests.
However, when a child finishes and exits, the parent must be informed that it has completed; the kernel does this by sending the parent the SIGCHLD signal.
The function doit does whatever is required to service the client.
When this function returns, we explicitly close the connected socket in the child.
This is not required, since the next statement calls exit, and part of process termination is that the kernel closes all open
descriptors.
We can also visualize the sockets and connection that occur in the above program as follows. First, following fig
shows the status of the client and server while the server is blocked in the call to accept and the connection
request arrives from the client.
Immediately after accept returns, the connection is accepted by the kernel and a new socket, connfd, is created.
This is a connected socket and data can now be read and written across the connection.
Fig. Status of client/server before the call to accept returns: the server is blocked in the call to accept() on
listenfd, and the connection request arrives from the client's connect().
Fig. Status of client/server after return from accept: the connection is accepted by the kernel and a new socket,
connfd, is created. This is a connected socket, and data can now be read and written across the connection.
Fig. Status of client/server after fork returns: both descriptors, listenfd and connfd, are shared between the server
(parent) and the server (child).
Fig. Status of client/server after parent and child close the appropriate sockets: the parent keeps listenfd, and the
child keeps connfd for the connection to the client.
Differences: