2. UNIX PROGRAMMING
- SUSHANT PAUDEL
OUTLINE
• Sockets Introduction
• Socket Address Structures
• Value-Result Arguments
• Byte Ordering and Manipulation Functions
• Fork and Exec Functions
• Concurrent Servers
• UNIX/Internet Domain Sockets
• Socket System Calls
WHAT IS UNIX PROGRAMMING?
UNIX
• Unix is an operating system whose development began at Bell Labs in
1969 and continued through the 1970s.
• It is known for its stability, security, and versatility.
• It is widely used in servers, workstations, and embedded
systems.
• Unix provides a command-line interface (CLI) where users
can interact with the system using commands and scripts.
KEY CONCEPTS AND COMPONENTS OF UNIX
PROGRAMMING
• Shell: The Unix shell is a command-line interpreter that allows users to interact with the operating
system. It provides a command prompt where users can enter commands to perform tasks, execute
programs, and manipulate files and directories.
• Programming Languages: Unix supports various programming languages, including C, C++, Python,
Perl, and Shell scripting languages like Bash. These languages are commonly used to develop
applications and scripts for Unix systems.
• Networking: Unix systems have built-in support for networking, allowing programmers to develop
applications that communicate over networks. This includes creating network sockets, establishing
connections, and exchanging data using protocols such as TCP/IP.
• Tools and Utilities: Unix provides a rich set of tools and utilities that aid in programming and
development. These include compilers, debuggers, text editors (such as vi or emacs), version control
systems (such as Git), and various command-line tools for tasks like text processing, file manipulation,
and system administration.
SOCKET ADDRESS STRUCTURE
● A socket address structure is a special structure that stores the
connection details of a socket.
● It mainly consists of fields like IP Address, Port Number and Protocol
family.
● The socket address family determines the format of the address structure:
○ Members of the AF_INET address family are IPv4 addresses.
○ Members of the AF_INET6 address family are IPv6 addresses.
○ Members of the AF_UNIX address family are names of Unix domain sockets.
○ Members of the AF_IPX address family are IPX addresses, and so on.
● The name of each socket address structure starts with sockaddr_ followed
by a suffix unique to its protocol family (for example, sockaddr_in for IPv4).
SOCKET
● Sockets allow communication between two different processes on the
same or different machines.
● A Unix Socket is used in a client-server application framework.
● A server is a process that performs some functions on request from a
client.
● Most application-level protocols, such as FTP, SMTP, and POP3, use
sockets to establish a connection between client and server and then to
exchange data.
IPV4 SOCKET ADDRESS STRUCTURE
● An IPv4 socket address structure, commonly called an "Internet socket address structure," is named sockaddr_in
and is defined by including the <netinet/in.h> header.
● POSIX requires only three members in the IPv4 socket address structure: sin_family, sin_addr, and sin_port.
● POSIX or “Portable Operating System Interface for uniX” is a collection of standards that define some of the
functionality that a (UNIX) operating system should support.
● Datatypes required by POSIX include sa_family_t and socklen_t (from <sys/socket.h>) and in_addr_t and
in_port_t (from <netinet/in.h>).
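As a minimal sketch of how these members are typically filled in, the helper below builds a sockaddr_in from a dotted-decimal string and a port. The function name make_ipv4_addr is hypothetical and chosen only for illustration.

```c
#include <arpa/inet.h>   /* htons, inet_pton */
#include <assert.h>
#include <netinet/in.h>  /* struct sockaddr_in, AF_INET */
#include <stdint.h>
#include <string.h>      /* memset */

/* Hypothetical helper: fill *sa from a dotted-decimal address and a port
 * given in host byte order. Returns 0 on success, -1 if ip is not a valid
 * IPv4 address string. */
int make_ipv4_addr(struct sockaddr_in *sa, const char *ip, uint16_t port)
{
    assert(sa != NULL && ip != NULL);
    memset(sa, 0, sizeof(*sa));   /* clear padding and any optional members */
    sa->sin_family = AF_INET;     /* address family: IPv4 */
    sa->sin_port = htons(port);   /* port is stored in network byte order */
    /* inet_pton stores the address in network byte order in sin_addr */
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}
```

Zeroing the structure first is a common habit because some systems add padding or a length member beyond the three POSIX members.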
GENERIC SOCKET ADDRESS STRUCTURE
● A socket address structure is always passed by reference when passed
as an argument to any socket function.
● But any socket function that takes one of these pointers as an argument
must deal with socket address structures from any of the supported
protocol families.
● A problem arises in how to declare the type of pointer that is passed.
● The solution is a generic socket address structure, struct sockaddr,
defined in the <sys/socket.h> header. Socket functions take a pointer to it,
and callers cast their protocol-specific pointer to struct sockaddr *.
IPV6 SOCKET ADDRESS STRUCTURE
● The IPv6 socket address is defined by including the <netinet/in.h> header
● The SIN6_LEN constant must be defined if the system supports the length member for socket
address structures.
● The IPv6 family is AF_INET6, whereas the IPv4 family is AF_INET.
● The members in this structure are ordered so that if the sockaddr_in6 structure is 64-bit aligned,
so is the 128-bit sin6_addr member. On some 64-bit processors, data accesses of 64-bit values are
optimized if stored on a 64-bit boundary.
● The sin6_flowinfo member is divided into two fields:
○ The low-order 20 bits are the flow label
○ The high-order 12 bits are reserved
● The sin6_scope_id identifies the scope zone in which a scoped address is meaningful, most
commonly an interface index for a link-local address
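A minimal sketch of filling in a sockaddr_in6 follows, assuming a global or loopback address so that the flow information and scope ID can stay zero. The function name make_ipv6_addr is hypothetical.

```c
#include <arpa/inet.h>   /* htons, inet_pton */
#include <assert.h>
#include <netinet/in.h>  /* struct sockaddr_in6, AF_INET6 */
#include <stdint.h>
#include <string.h>      /* memset */

/* Hypothetical helper: fill *sa from an IPv6 address string and a port
 * given in host byte order. Returns 0 on success, -1 on a bad address. */
int make_ipv6_addr(struct sockaddr_in6 *sa, const char *ip, uint16_t port)
{
    assert(sa != NULL && ip != NULL);
    memset(sa, 0, sizeof(*sa));   /* leaves sin6_flowinfo and sin6_scope_id 0 */
    sa->sin6_family = AF_INET6;   /* address family: IPv6 */
    sa->sin6_port = htons(port);  /* network byte order */
    /* A link-local address would also need sin6_scope_id set to an
     * interface index; that case is not handled in this sketch. */
    return inet_pton(AF_INET6, ip, &sa->sin6_addr) == 1 ? 0 : -1;
}
```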
NEW GENERIC SOCKET ADDRESS STRUCTURE
● A new generic socket address structure was defined as part of the IPv6
sockets API, to overcome some of the shortcomings of the existing struct
sockaddr.
● Unlike the struct sockaddr, the new struct sockaddr_storage is large
enough to hold any socket address type supported by the system.
● POSIX defines the sockaddr_storage structure in the <sys/socket.h>
header; on most systems, including <netinet/in.h> also makes it visible.
The sockaddr_storage type provides a generic socket address structure that is different from struct sockaddr in two ways:
1. If any socket address structures that the system supports have alignment requirements, the sockaddr_storage
provides the strictest alignment requirement.
○ An alignment requirement is an integer value of type size_t representing the number of bytes between
successive addresses at which objects of the type can be allocated.
2. The sockaddr_storage is large enough to contain any socket address structure that the system supports.
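The sketch below illustrates why a storage type that fits every family is useful: one function can accept an address of unknown family, inspect ss_family, and cast to the matching structure. The function name addr_to_text is hypothetical, and only IPv4 and IPv6 are handled.

```c
#include <arpa/inet.h>   /* inet_ntop */
#include <assert.h>
#include <netinet/in.h>  /* struct sockaddr_in, struct sockaddr_in6 */
#include <stddef.h>
#include <sys/socket.h>  /* struct sockaddr_storage, socklen_t */

/* Hypothetical helper: write the textual form of the address held in *ss
 * into buf. Returns buf on success, NULL for an unsupported family or if
 * buf is too small. */
const char *addr_to_text(const struct sockaddr_storage *ss,
                         char *buf, socklen_t buflen)
{
    assert(ss != NULL && buf != NULL);
    switch (ss->ss_family) {
    case AF_INET: {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)ss;
        return inet_ntop(AF_INET, &in4->sin_addr, buf, buflen);
    }
    case AF_INET6: {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)ss;
        return inet_ntop(AF_INET6, &in6->sin6_addr, buf, buflen);
    }
    default:
        return NULL;  /* family not handled in this sketch */
    }
}
```

A buffer of INET6_ADDRSTRLEN bytes is large enough for either family.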
VALUE-RESULT ARGUMENTS
● When a socket address structure is passed to any socket function, it is
always passed by reference. That is, a pointer to the structure is passed.
● The length of the structure is also passed as an argument.
● But the way in which the length is passed depends on which direction
the structure is being passed:
○ from the process to the kernel, or
○ vice versa.
VALUE-RESULT ARGUMENTS(PASS A SOCKET ADDRESS STRUCTURE FROM
THE PROCESS TO THE KERNEL. )
● Three functions, bind, connect, and sendto, pass a socket address structure from the
process to the kernel.
● One argument to these three functions is the pointer to the socket address structure
and another argument is the integer size of the structure
● Since the kernel is passed both the pointer and the size of what the pointer points
to, it knows exactly how much data to copy from the process into the kernel.
VALUE-RESULT ARGUMENTS(PASS A SOCKET ADDRESS
STRUCTURE FROM THE KERNEL TO THE PROCESS. )
● Four functions, accept, recvfrom, getsockname, and getpeername, pass a socket
address structure from the kernel to the process, the reverse direction from the
previous scenario.
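● Two of the arguments to these four functions are the pointer to the socket address
structure and a pointer to an integer containing the size of the structure.
● The size is a value-result argument: when the function is called, it tells the kernel
the size of the structure so that the kernel does not write past its end; when the
function returns, it tells the process how much information the kernel actually stored.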
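Both directions can be seen in one short sketch: bind passes the size by value, while getsockname takes a pointer to the size and may update it. The function name bind_loopback_any_port is hypothetical, and the choice of the loopback address with port 0 (kernel picks the port) is an assumption made to keep the example self-contained.

```c
#include <arpa/inet.h>   /* htonl, htons, ntohs */
#include <assert.h>
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset */
#include <sys/socket.h>  /* bind, getsockname */

/* Hypothetical helper: bind sockfd to 127.0.0.1 with a kernel-chosen port
 * and return that port in host byte order, or -1 on error. */
int bind_loopback_any_port(int sockfd)
{
    struct sockaddr_in addr;
    socklen_t len;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);          /* 0 asks the kernel to pick a port */

    /* Process to kernel: the size is passed by value. */
    if (bind(sockfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    /* Kernel to process: len is a value-result argument. On entry it holds
     * the buffer size; on return it holds the number of bytes stored. */
    len = sizeof(addr);
    if (getsockname(sockfd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    assert(len <= sizeof(addr));
    return ntohs(addr.sin_port);
}
```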
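BYTE ORDERING
● Example: the 16-bit value 0x1234 stored at increasing memory addresses:
○ Little-endian: 34 12 (low-order byte at the starting address)
○ Big-endian: 12 34 (high-order byte at the starting address)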
BYTE ORDERING FUNCTIONS
● We must deal with these byte ordering differences as network
programmers because networking protocols must specify a network
byte order.
● For example, a TCP segment carries 16-bit port numbers, and the IP
header carries 32-bit IPv4 addresses.
● The sending protocol stack and the receiving protocol stack must agree
on the order in which the bytes of these multibyte fields will be
transmitted. The Internet protocols use big-endian byte ordering for
these multibyte integers.
CONVERSION OF NETWORK TO HOST BYTE ORDER AND VICE VERSA
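The conversions are done with htons and htonl (host to network, 16-bit and 32-bit) and ntohs and ntohl (network to host). The sketch below is a small illustration; the helper names host_is_little_endian and first_wire_byte are hypothetical.

```c
#include <arpa/inet.h>   /* htons, ntohs */
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if this host stores the low-order byte
 * of a multibyte integer at the lowest address, 0 otherwise. */
int host_is_little_endian(void)
{
    union { uint16_t value; unsigned char bytes[2]; } probe;
    probe.value = 0x1234;
    return probe.bytes[0] == 0x34;
}

/* Hypothetical helper: returns the byte that would be transmitted first
 * for a 16-bit value after conversion to network byte order. */
unsigned char first_wire_byte(uint16_t host_value)
{
    uint16_t net = htons(host_value);   /* big-endian layout in memory */
    assert(ntohs(net) == host_value);   /* the conversion round-trips */
    return ((const unsigned char *)&net)[0];
}
```

On a big-endian host these functions are typically no-ops, which is why code should always call them rather than swap bytes by hand.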
FORK AND EXEC FUNCTIONS
● fork starts a new process that is a copy of the one that calls
it, while exec replaces the current process image with another
(different) one.
● After fork(), the parent and child processes run concurrently.
After a successful exec(), control never returns to the original
program; exec() returns only if there is an error.
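A minimal sketch of the usual fork-then-exec pattern follows. The function name run_in_child is hypothetical, and the use of /bin/sh as the program to execute is an assumption about the system.

```c
#include <assert.h>
#include <sys/types.h>   /* pid_t */
#include <sys/wait.h>    /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>      /* fork, execl, _exit */

/* Hypothetical helper: run "sh -c command" in a child process and return
 * its exit status, or -1 if the child could not be created or did not
 * exit normally. Assumes /bin/sh exists. */
int run_in_child(const char *command)
{
    pid_t pid;
    int status;

    assert(command != NULL);
    pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed; no child exists */
    if (pid == 0) {
        /* Child: replace this process image with the shell. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                     /* reached only if exec failed */
    }
    /* Parent: runs concurrently with the child, then waits for it. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A concurrent server uses the same idea: the parent returns to accept the next client while each child handles one connection.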
CONCURRENT SERVERS
bind
● The bind system call assigns a local address (a "name") to an unnamed socket.
#include <sys/types.h>
#include <sys/socket.h>
int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen);
● The first argument is the socket descriptor returned from the socket system call.
○ A socket descriptor is a small integer that identifies the socket within the calling process, just as a file
descriptor identifies an open file.
● The second argument is a pointer to a protocol-specific address.
● The third argument is the size of this address.
SOCKET SYSTEM CALLS
connect
● After the socket system call, a client process calls connect on the socket descriptor to establish a connection
with a server.
#include <sys/types.h>
#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *servaddr, socklen_t addrlen);
listen
● A connection-oriented server uses this system call to indicate that it is willing to receive connections.
#include <sys/types.h>
#include <sys/socket.h>
int listen(int sockfd, int backlog);
sendto and recvfrom
#include <sys/types.h>
#include <sys/socket.h>
ssize_t sendto(int sockfd, const void *buff, size_t nbytes, int flags,
const struct sockaddr *to, socklen_t addrlen);
ssize_t recvfrom(int sockfd, void *buff, size_t nbytes, int flags,
struct sockaddr *from, socklen_t *addrlen);
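As a rough sketch of how these two calls pair up, the function below sends one UDP datagram between two sockets on the loopback interface. The function name udp_loopback_roundtrip is hypothetical, and using 127.0.0.1 with a kernel-chosen port is an assumption made so the example needs no external server.

```c
#include <arpa/inet.h>   /* htonl */
#include <assert.h>
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset, strlen */
#include <sys/socket.h>  /* socket, bind, getsockname, sendto, recvfrom */
#include <unistd.h>      /* close */

/* Hypothetical helper: send msg over UDP to a receiver bound on 127.0.0.1
 * and read it back into out. Returns the bytes received, or -1 on error. */
ssize_t udp_loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr, from;
    socklen_t len = sizeof(addr), fromlen = sizeof(from);
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);   /* receiving socket */
    int tx = socket(AF_INET, SOCK_DGRAM, 0);   /* sending socket */

    assert(msg != NULL && out != NULL);
    if (rx < 0 || tx < 0)
        goto done;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                         /* kernel picks the port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &len) < 0)
        goto done;
    /* sendto names the destination explicitly; no connection is needed. */
    if (sendto(tx, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;
    /* recvfrom blocks until the datagram arrives and fills in the sender. */
    n = recvfrom(rx, out, outlen, 0, (struct sockaddr *)&from, &fromlen);
done:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return n;
}
```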
close
● The normal Unix close system call is also used to close a socket.
#include <unistd.h>
int close(int fd);
● Vnode Table
○ The kernel keeps yet another table, the Vnode table, which has an entry for
each open file or device.
○ Each entry, called a Vnode, contains information about the type of file and
pointers to functions that operate on the file.
○ Typically for files, the vnode also contains a copy of the inode for the file,
which has "physical" information about the file, e.g. where exactly on the disk
the file's data resides
PASSING FILE DESCRIPTORS
https://www.sobyte.net/post/2022-01/pass-fd-over-domain-socket/
https://flylib.com/books/en/3.224.1.265/1/
BLOCKING I/O MODEL
● The process calls recvfrom, and the system call does not
return until the datagram arrives and is copied into
our application buffer, or an error occurs.
● The process is blocked the entire time from when it calls
recvfrom until it returns.
● When recvfrom returns successfully, our application
processes the datagram.
NONBLOCKING I/O MODEL
● Nonblocking I/O allows a program to continue executing even if an I/O operation is not
yet complete.
● When a program performs a nonblocking I/O operation, it doesn't wait for the
operation to finish but continues with other tasks.
● It enables the program to perform other operations or check the status of multiple I/O
operations simultaneously.
● Nonblocking I/O requires additional code logic to handle the nonblocking nature and
repeatedly check for completion.
● When a socket is set to be nonblocking, we are telling the kernel "when an I/O operation that I request cannot be
completed without putting the process to sleep, do not put the process to sleep, but return an error instead"
● Suppose we call recvfrom four times in a loop. The first three times, there
is no data to return and the kernel immediately returns an error of
EWOULDBLOCK.
● The fourth time we call recvfrom, a datagram is ready, it is copied into
our application buffer, and recvfrom returns successfully. We then process
the data.
● When an application sits in a loop calling recvfrom on a nonblocking
descriptor like this, it is called polling.
● The application is continually polling the kernel to see if some operation is
ready.
● This is often a waste of CPU time, but this model is occasionally
encountered, normally on systems dedicated to one function.
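One polling step can be sketched as follows, assuming the socket is made nonblocking with fcntl and O_NONBLOCK. The function names set_nonblocking and poll_once are hypothetical.

```c
#include <assert.h>
#include <errno.h>       /* errno, EWOULDBLOCK, EAGAIN */
#include <fcntl.h>       /* fcntl, F_GETFL, F_SETFL, O_NONBLOCK */
#include <stddef.h>
#include <sys/socket.h>  /* recvfrom */

/* Hypothetical helper: set O_NONBLOCK on fd while keeping its other
 * file status flags. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: one polling step on a nonblocking socket.
 * Returns 1 if a datagram was read into buf, 0 if none was ready yet,
 * and -1 on any other error. */
int poll_once(int fd, char *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    n = recvfrom(fd, buf, len, 0, NULL, NULL);
    if (n >= 0)
        return 1;
    /* Some systems report EAGAIN instead of EWOULDBLOCK; check both. */
    return (errno == EWOULDBLOCK || errno == EAGAIN) ? 0 : -1;
}
```

Calling poll_once in a tight loop is the polling pattern described above, with the CPU cost that comes with it.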
I/O MULTIPLEXING MODEL
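With I/O multiplexing, the process blocks in select (or poll) rather than in the I/O call itself, and reads only once the descriptor is reported readable. The sketch below assumes select with a timeout; the function name wait_then_read and its return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>  /* select, fd_set, FD_ZERO, FD_SET, FD_ISSET */
#include <sys/time.h>    /* struct timeval */
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* read */

/* Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then read from it. Returns the bytes read (0 at end of file), -1 on
 * error, or -2 if the timeout expired first. */
ssize_t wait_then_read(int fd, char *buf, size_t len, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int ready;

    assert(buf != NULL && fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);               /* watch only this descriptor */
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* Block here, not in read: select returns when fd is readable. */
    ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0 || !FD_ISSET(fd, &readfds))
        return -2;                      /* timed out */
    return read(fd, buf, len);
}
```

The advantage over a plain blocking call appears when several descriptors are added to the set, since one select call can then wait on all of them.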
SIGNAL-DRIVEN I/O MODEL
● We first enable the socket for signal-driven I/O and install a signal handler using
the sigaction system call. The return from this system call is immediate and our
process continues; it is not blocked.
● When the datagram is ready to be read, the SIGIO signal is generated for our
process. We can either:
○ read the datagram from the signal handler by calling recvfrom and then notify the main
loop that the data is ready to be processed, or
○ notify the main loop and let it read the datagram.
ASYNCHRONOUS I/O MODEL
● Asynchronous I/O is a method where I/O operations are initiated and executed
independently of the program's execution flow.
● The POSIX aio_ functions (e.g., aio_read, aio_write) are used for asynchronous I/O operations.
● With asynchronous I/O, a program initiates an I/O operation and continues with other tasks
without waiting for the operation to complete.
● The program can then later check the status of the asynchronous operation or be notified
through a callback mechanism.
● The main difference between this model and the signal-driven I/O model is that with signal-
driven I/O, the kernel tells us when an I/O operation can be initiated, but with asynchronous
I/O, the kernel tells us when an I/O operation is complete.
● We call aio_read (the POSIX asynchronous I/O functions begin with aio_ or lio_) and
pass the kernel the following:
○ descriptor, buffer pointer, buffer size (the same three arguments for read),
○ file offset (similar to lseek),
○ and how to notify us when the entire operation is complete.
This system call returns immediately and our process is not blocked while waiting for the
I/O to complete.
● We assume in this example that we ask the kernel to generate some signal when the
operation is complete.
● This signal is not generated until the data has been copied into our application
buffer, which is different from the signal-driven I/O model.
SYNCHRONOUS I/O VERSUS ASYNCHRONOUS I/O
SOCKET OPTIONS
● There are various ways to get and set the options that
affect a socket:
○ The getsockopt and setsockopt functions.
○ The fcntl function, which is the POSIX way to set a socket
for nonblocking I/O, signal-driven I/O, and to set the owner
of a socket.
○ The ioctl function.
GETSOCKOPT AND SETSOCKOPT FUNCTIONS
● getsockopt() and setsockopt() manipulate options for the socket referred to by the file descriptor sockfd.
● Options may exist at multiple protocol levels; they are always present at the uppermost socket level.
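A minimal sketch of setting and then reading back one option at the socket level (SOL_SOCKET) follows. SO_REUSEADDR is chosen only as a familiar example, and the function name enable_reuseaddr is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>  /* setsockopt, getsockopt, SOL_SOCKET, SO_REUSEADDR */

/* Hypothetical helper: turn SO_REUSEADDR on for sockfd and read the option
 * back. Returns 1 if the option is now enabled, 0 if not, -1 on error. */
int enable_reuseaddr(int sockfd)
{
    int on = 1;                       /* nonzero enables a boolean option */
    int value = 0;
    socklen_t len = sizeof(value);    /* value-result argument */

    if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        return -1;
    if (getsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0)
        return -1;
    assert(len == sizeof(value));
    return value != 0;
}
```

Note that getsockopt takes the option length as a pointer, following the same value-result convention as the socket address functions.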
DAEMON PROCESSES
● Daemons are processes that are often started when the system is
bootstrapped and terminate only when the system is shut down.
● Because they don’t have a controlling terminal, they run in the
background.
● UNIX systems have numerous daemons that perform day-to-day
activities.
● Since a daemon does not have a controlling terminal, we need to see
how a daemon can report error conditions when something goes wrong.
DAEMON PROCESS: ERROR LOGGING
One problem a daemon has is how to handle error messages. It cannot simply
write to standard error, because it has no controlling terminal.
A central daemon error-logging facility is required; on most UNIX systems this is
syslog, and most daemons use it.
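A minimal sketch of reporting an error through syslog follows. The function name log_daemon_error and the identifier string "exampled" are hypothetical; where the message ends up depends on the system's logging configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <syslog.h>      /* openlog, syslog, closelog */

/* Hypothetical helper: report a failed operation through syslog instead of
 * standard error. Always returns 0, since syslog reports no status. */
int log_daemon_error(const char *what, int code)
{
    assert(what != NULL);
    /* "exampled" is a made-up daemon name; LOG_PID adds the process ID to
     * each message and LOG_DAEMON selects the daemon facility. */
    openlog("exampled", LOG_PID, LOG_DAEMON);
    syslog(LOG_ERR, "%s failed (code %d)", what, code);
    closelog();
    return 0;
}
```

A real daemon would normally call openlog once at startup rather than around every message.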
IOCTL SOCKET REQUESTS
Request      Description                                              Datatype
SIOCSPGRP    Set the process ID or process group ID of the socket     int
SIOCGPGRP    Get the process ID or process group ID of the socket     int
FIONREAD     Get the number of bytes of data in the receive buffer    int
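As an illustration of the FIONREAD request, the sketch below writes into one end of a Unix domain socket pair and asks how many bytes are waiting on the other end. The function names bytes_pending and pending_after_write are hypothetical, and the use of socketpair is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>   /* ioctl, FIONREAD */
#include <sys/socket.h>  /* socketpair, AF_UNIX, SOCK_STREAM */
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* write, close */

/* Hypothetical helper: number of bytes waiting in fd's receive buffer,
 * or -1 on error. The third ioctl argument points to an int. */
int bytes_pending(int fd)
{
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}

/* Hypothetical helper: write len bytes into one end of a Unix domain
 * stream socket pair and report how many are pending on the other end. */
int pending_after_write(const char *data, size_t len)
{
    int sv[2], n;

    assert(data != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    n = (write(sv[0], data, len) == (ssize_t)len) ? bytes_pending(sv[1]) : -1;
    close(sv[0]);
    close(sv[1]);
    return n;
}
```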
https://github.com/sushantpaudel/network-programming-examples
ASSIGNMENT
● https://aws.amazon.com/education/awseducate/