NP&ACN


ICT 3173 –

Network Programming and


Advanced Communication
Network
Dr. Ramakrishna M

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Socket Introduction



Introduction
• Socket Address Structure
• Value-Result Argument
• Byte Ordering Function
• Byte Manipulation Function
• Other important functions



Socket Address Structure
• Socket = IP address + TCP or UDP port number
• Passed to the socket functions as an argument (always as a pointer).
• Holds the IP address, the TCP or UDP port number, the length of the structure, and so on.
• Each protocol defines its own socket address structure (IPv4, IPv6, ...).



IPv4 Socket Address Structure
struct in_addr {
    in_addr_t s_addr;           /* 32-bit IPv4 address, network byte ordered */
};

struct sockaddr_in {
    uint8_t        sin_len;     /* length of structure (16) */
    sa_family_t    sin_family;  /* AF_INET */
    in_port_t      sin_port;    /* 16-bit TCP or UDP port number, network byte ordered */
    struct in_addr sin_addr;    /* 32-bit IPv4 address, network byte ordered */
    char           sin_zero[8]; /* unused */
};                              /* defined in <netinet/in.h> */
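As a minimal sketch of how such a structure is typically filled in, the helper below zeroes the structure and then sets the family, port and address. The name fill_ipv4 is hypothetical, and sin_len is deliberately not set because some systems (Linux among them) do not have that member.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *sa from a dotted-decimal string and a host-order port.
 * Returns 0 on success, -1 if the string is not a valid IPv4 address. */
int fill_ipv4(struct sockaddr_in *sa, const char *dotted, uint16_t port)
{
    memset(sa, 0, sizeof(*sa));        /* also clears sin_zero */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);        /* port must be in network byte order */
    if (inet_pton(AF_INET, dotted, &sa->sin_addr) != 1)
        return -1;                     /* not a valid presentation string */
    return 0;
}
```

Zeroing the whole structure first is the usual convention, since it leaves no uninitialized padding or unused members.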



Datatypes required by POSIX.1G
Datatype      Description                                             Header
int8_t        Signed 8-bit integer                                    <sys/types.h>
uint8_t       Unsigned 8-bit integer                                  <sys/types.h>
int16_t       Signed 16-bit integer                                   <sys/types.h>
uint16_t      Unsigned 16-bit integer                                 <sys/types.h>
int32_t       Signed 32-bit integer                                   <sys/types.h>
uint32_t      Unsigned 32-bit integer                                 <sys/types.h>
sa_family_t   Address family of socket address structure              <sys/socket.h>
socklen_t     Length of socket address structure, normally uint32_t   <sys/socket.h>
in_addr_t     IPv4 address, normally uint32_t                         <netinet/in.h>
in_port_t     TCP or UDP port, normally uint16_t                      <netinet/in.h>



Generic Socket address structure
• int bind(int , struct sockaddr * , socklen_t);
• Example:
struct sockaddr_in serv; /*IPv4 socket address structure*/
/* fill in serv{} */
bind(sockfd, (struct sockaddr *) &serv, sizeof(serv));
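• struct sockaddr is the generic socket address structure: a pointer to any protocol-specific structure is cast to struct sockaddr * so that one function can handle the structures of different protocols (note the cast in the call above)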



IPv6 Socket Address Structure
struct in6_addr {
    uint8_t s6_addr[16];          /* 128-bit IPv6 address, network byte ordered */
};

#define SIN6_LEN                  /* required for compile-time tests */

struct sockaddr_in6 {
    uint8_t         sin6_len;      /* length of structure (24) */
    sa_family_t     sin6_family;   /* AF_INET6 */
    in_port_t       sin6_port;     /* transport layer port number, network byte ordered */
    uint32_t        sin6_flowinfo; /* priority and flow label, network byte ordered */
    struct in6_addr sin6_addr;     /* IPv6 address, network byte ordered */
};                                 /* defined in <netinet/in.h> */



[Figure: comparison of socket address structures]


[Figure: IPv4 header]

[Figure: IPv6 header]

[Figure: IPv4 address]


Value-Result Argument
• Socket address structures passed to the socket functions are always passed by pointer (passed by reference).
• The length of the structure is passed as a separate argument.
• A socket address structure can go from the process to the kernel, or from the kernel to the process.
• How the length is passed depends on the direction: by value from process to kernel, and as a value-result argument (a pointer to the length) from kernel to process.


Value-Result Passing
• Process to kernel: bind, connect, sendto
• Kernel to process: accept, recvfrom, getsockname, getpeername

Value-Result Argument
Process to kernel:
struct sockaddr_in serv;
/* fill in serv{} */
connect(sockfd, (SA *)&serv, sizeof(serv));   /* SA is the unp.h shorthand for struct sockaddr */

Kernel to process:
struct sockaddr_un cli;   /* unix domain */
socklen_t len;
len = sizeof(cli);
getpeername(unixfd, (SA *)&cli, &len);
/* len may have changed. */
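The value-result behaviour can be observed directly. In the sketch below (local_addr_len is a hypothetical helper name), the length starts out as the size of a deliberately oversized buffer and comes back as the number of bytes the kernel actually stored.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: returns the address length reported by the kernel
 * for the socket's local address, or -1 on error. */
long local_addr_len(int fd)
{
    struct sockaddr_storage ss;        /* large enough for any address family */
    socklen_t len = sizeof(ss);        /* value: size of our buffer */

    if (getsockname(fd, (struct sockaddr *)&ss, &len) < 0)
        return -1;
    return (long)len;                  /* result: bytes actually stored */
}
```

For an IPv4 socket the returned length is expected to equal sizeof(struct sockaddr_in), which is smaller than the buffer that was offered.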
Byte Ordering Function
For a 16-bit value stored at address A (memory addresses increase from A to A+1):
• Little-endian byte order: the low-order byte (LSB) is at address A and the high-order byte (MSB) is at address A+1
• Big-endian byte order: the high-order byte (MSB) is at address A and the low-order byte (LSB) is at address A+1
Sample code to test byte ordering
#include "unp.h"

int main(int argc, char **argv)
{
    union {
        short s;
        char  c[sizeof(short)];
    } un;

    un.s = 0x0102;
    printf("%s: ", CPU_VENDOR_OS);
    if (sizeof(short) == 2) {
        if (un.c[0] == 1 && un.c[1] == 2)
            printf("big-endian\n");
        else if (un.c[0] == 2 && un.c[1] == 1)
            printf("little-endian\n");
        else
            printf("unknown\n");
    } else
        printf("sizeof(short) = %zu\n", sizeof(short));   /* sizeof yields size_t, so use %zu */
    exit(0);
}



Byte Ordering
• A network program must use a well-defined network byte order
• The Internet protocols use big-endian byte order
• When the host byte order differs from the network byte order, conversion is needed

uint16_t htons(uint16_t host16bitvalue);
uint32_t htonl(uint32_t host32bitvalue);
/* Both return: value in network byte order */

uint16_t ntohs(uint16_t net16bitvalue);
uint32_t ntohl(uint32_t net32bitvalue);
/* Both return: value in host byte order */

h: host   n: network   s: short (16-bit)   l: long (32-bit)
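A short sketch of the conversion in use: the hypothetical helpers put_port and get_port store a port number into a byte buffer in network byte order and read it back. Whatever the host byte order, the high-order byte ends up first in the buffer.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a host-order port into buf in network byte order. */
void put_port(unsigned char buf[2], uint16_t port)
{
    uint16_t net = htons(port);        /* host to network, 16 bits */
    memcpy(buf, &net, sizeof(net));
}

/* Hypothetical helper: read a network-order port from buf, return it in host order. */
uint16_t get_port(const unsigned char buf[2])
{
    uint16_t net;
    memcpy(&net, buf, sizeof(net));
    return ntohs(net);                 /* network to host, 16 bits */
}
```

On a big-endian host htons and ntohs change nothing; calling them anyway keeps the code portable.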



Byte Manipulation Function
#include <strings.h>

void bzero(void *dest, size_t nbytes);


/* Sets specified no. of bytes to 0 in the dest */

void bcopy(const void *src, void *dest, size_t nbytes);


/* Moves the specified no. of bytes from src to dest*/

int bcmp(const void *ptr1, const void *ptr2, size_t nbytes);


/* Returns 0 if the two byte strings are identical, nonzero if they differ */
Byte Manipulation Function
#include <string.h>

void *memset(void *dest, int c, size_t len);


/* Sets the specified no. of bytes to the value c in the dest*/

void *memcpy(void *dest, const void *src, size_t nbytes);


/* same as bcopy, order of the 2 pointer arguments swapped*/

int memcmp(const void *ptr1, const void *ptr2, size_t nbytes);


/* Returns: less than 0 if the bytes at ptr1 compare less than those at ptr2,
   greater than 0 if they compare greater,
   0 if the two byte strings are equal */


IPv4 Address Conversion Functions
• Convert Internet addresses between ASCII strings (e.g. “203.255.74.129”) and network byte ordered binary values
• The binary value is what is stored in the socket address structure
• These functions convert IPv4 addresses only


IPv4 Address Conversion Functions
#include<arpa/inet.h>

int inet_aton(const char *strptr, struct in_addr *addrptr);


/* return : 1 if string was valid,0 on error */

in_addr_t inet_addr(const char *strptr);


/* return : 32bit binary network byte ordered IPv4 address;
INADDR_NONE if error */

char *inet_ntoa(struct in_addr inaddr);


/*return pointer to dotted-decimal string*/
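A minimal sketch of the two directions, using a hypothetical helper named roundtrip_ipv4. One detail worth showing is that inet_ntoa returns a pointer to a static buffer, so the result is copied out immediately.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse dotted-decimal text, then format it back into out.
 * Returns 0 on success, -1 if text is not a valid IPv4 address. */
int roundtrip_ipv4(const char *text, char *out, size_t outlen)
{
    struct in_addr addr;

    if (inet_aton(text, &addr) == 0)   /* 0 means the string was invalid */
        return -1;
    /* inet_ntoa uses a static buffer that the next call overwrites */
    snprintf(out, outlen, "%s", inet_ntoa(addr));
    return 0;
}
```

inet_aton is preferable to inet_addr here because inet_addr signals an error with INADDR_NONE, which is also the binary value of the valid address 255.255.255.255.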



IPv6 Address Conversion Functions
• inet_pton and inet_ntop are the functions to use with IPv6
• They convert both IPv4 and IPv6 addresses
• p: presentation (string)   n: numeric (binary)


IPv6 Address Conversion Functions
#include<arpa/inet.h>

int inet_pton(int family, const char *strptr, void *addrptr);


/* return: 1 if OK, 0 if input is not a valid presentation format, -1 on error */
/* converts the string strptr to a binary value stored through addrptr */

const char *inet_ntop(int family, const void *addrptr, char *strptr, size_t len);
/* return: pointer to result if OK, NULL on error */
/* len: size of the destination buffer */
/* converts a binary value to a string */
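As a sketch of the pair working together, the hypothetical helper normalize_ipv6 parses an IPv6 string and prints it back in its shortest presentation form. Note that POSIX declares the last argument of inet_ntop as socklen_t, which is what the sketch uses.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: convert an IPv6 string to binary and back.
 * Returns 0 on success, -1 if text is invalid or out is too small. */
int normalize_ipv6(const char *text, char *out, socklen_t outlen)
{
    struct in6_addr addr;

    if (inet_pton(AF_INET6, text, &addr) != 1)       /* presentation to numeric */
        return -1;
    if (inet_ntop(AF_INET6, &addr, out, outlen) == NULL)  /* numeric to presentation */
        return -1;
    return 0;
}
```

A buffer of INET6_ADDRSTRLEN bytes (from <netinet/in.h>) is large enough for any IPv6 presentation string.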
Elementary Socket Functions
Dr. Ramakrishna M



Contents
• Socket function
• connect function
• bind function
• listen function
• accept function
• fork and exec function
• Concurrent server
• close function
• getsockname and getpeername function



Socket and Process Communication

[Figure: two hosts joined by the Internet. On each host a user process (application layer) sits above a socket; below the socket, the OS network stack contains the transport layer (TCP/UDP), the network layer (IP) and the link layer (e.g. Ethernet).]

The socket is the interface that the OS provides to its networking subsystem.


Delivering the Data: Division of Labor
• Network
• Deliver data packet to the destination host
• Based on the destination IP address
• Operating system
• Deliver data to the destination socket
• Based on the destination port number (e.g., 80)
• Application
• Read data from and write data to the socket
• Interpret the data (e.g., render a Web page)



Socket: End Point of Communication
• Sending message from one process to another
• Message must traverse the underlying network
• Process sends and receives through a “socket”
• In essence, the doorway leading in/out of the house
• Socket as an Application Programming Interface
• Supports the creation of network applications
[Figure: on each host, a user process attached to a socket provided by the operating system]
Two Types of Application Processes
Communication
• Datagram Socket (UDP)
• Collection of messages
• Best effort
• Connectionless

• Stream Socket (TCP)


• Stream of bytes
• Reliable
• Connection-oriented



User Datagram Protocol (UDP): Datagram Socket
UDP
• Single socket to receive messages
• No guarantee of delivery
• Not necessarily in-order delivery
• Datagram: independent packets
• Must address each packet

Postal Mail
• Single mailbox to receive letters
• Unreliable
• Not necessarily in-order delivery
• Letters sent independently
• Must address each mail

Example UDP applications: multimedia, voice over IP (Skype)
Transmission Control Protocol (TCP): Stream Socket
TCP
• Reliable: guaranteed delivery
• Byte stream: in-order delivery
• Connection-oriented: single socket per connection
• Setup connection followed by data transfer

Telephone Call
• Guaranteed delivery
• In-order delivery
• Connection-oriented
• Setup connection followed by conversation

Example TCP applications: Web, Email, Telnet
Socket Identification
• Communication Protocol
• TCP (Stream Socket): streaming, reliable
• UDP (Datagram Socket): packets, best effort
• Receiving host
• Destination address that uniquely identifies the host
• An IPv4 address is a 32-bit quantity
• Receiving socket
• Host may be running many different processes
• Destination port that uniquely identifies the socket
• A port number is a 16-bit quantity



Socket Identification (Cont.)

[Figure: processes A and B on one host use port X and port Y (port number); below them sit TCP/UDP (protocol), IP (host address) and the Ethernet adapter]


Clients and Servers
• Client program
  • Running on end host
  • Requests service
  • E.g., Web browser
• Server program
  • Running on end host
  • Provides service
  • E.g., Web server
• Example exchange: the client sends “GET /index.html” and the server replies “Site under construction”
Client-Server Communication
• Client is “sometimes on”
  • Initiates a request to the server when interested
  • E.g., Web browser on your laptop or cell phone
  • Doesn’t communicate directly with other clients
  • Needs to know server’s address
• Server is “always on”
  • Handles service requests from many client hosts
  • E.g., Web server for the www.cnn.com Web site
  • Doesn’t initiate contact with the clients
  • Needs fixed, known address


Client and Server Processes
• Client process
• process that initiates communication

• Server Process
• process that waits to be contacted



Knowing What Port Number To Use
• Popular applications have well-known ports
• E.g., port 80 for Web and port 25 for e-mail
• See http://www.iana.org/assignments/port-numbers
• Well-known vs. ephemeral ports
• Server has a well-known port (e.g., port 80)
• Between 0 and 1023 (requires root to use)
• Client picks an unused ephemeral (i.e., temporary) port
• Between 1024 and 65535
• Uniquely identifying traffic between the hosts
• Two IP addresses and two port numbers
• Underlying transport protocol (e.g., TCP or UDP)



Using Ports to Identify Services
[Figure: a server host at 128.2.194.242 runs a Web server on port 80 and an echo server on port 7. A client's service request for 128.2.194.242:80 is delivered to the Web server; a service request for 128.2.194.242:7 is delivered to the echo server.]
Client-Server Communication
Stream Sockets (TCP): Connection-oriented
Server
• Create a socket
• Bind the socket (what port am I on?)
• Listen for client (wait for incoming connections)
• Accept connection
• Receive request
• Send response

Client
• Create a socket
• Connect to server
• Send the request
• Receive response
Client-Server Communication
Stream Sockets (TCP): Connection-oriented
[Figure: the server calls SOCKET, BIND, LISTEN and ACCEPT; the client calls SOCKET and CONNECT; the TCP three-way handshake completes the connection; the two sides then SEND and RECEIVE, and finally CLOSE]


Client-Server Communication
Datagram Sockets (UDP): Connectionless
Server
• Create a socket
• Bind the socket
• Receive request
• Send response

Client
• Create a socket
• Bind the socket
• Send the request
• Receive response


Client-Server Communication
Datagram Sockets (UDP): Connectionless
[Figure: each side calls CREATE and BIND; datagrams are exchanged with SEND and RECEIVE in each direction, with no connection setup; each side finally calls CLOSE]


read, write Functions
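• A read or write on a stream socket may transfer fewer bytes than requested because of finite kernel buffers; the caller then calls the function again for the remaining bytes
• A short write is normally seen only if the socket is nonblocking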

#include "unp.h"
ssize_t readn(int filedes, void *buff, size_t nbytes);
ssize_t writen(int filedes, const void *buff, size_t nbytes);
ssize_t readline(int filedes, void *buff, size_t maxlen);
/*All return: number of bytes read or written, –1 on error*/



lib/readn.c
#include "unp.h"

ssize_t                         /* Read "n" bytes from a descriptor. */
readn(int fd, void *vptr, size_t n)
{
    size_t  nleft;
    ssize_t nread;
    char   *ptr;

    ptr = vptr;
    nleft = n;
    while (nleft > 0) {
        if ( (nread = read(fd, ptr, nleft)) < 0) {
            if (errno == EINTR)
                nread = 0;      /* and call read() again */
            else
                return (-1);
        } else if (nread == 0)
            break;              /* EOF */
        nleft -= nread;
        ptr += nread;
    }
    return (n - nleft);         /* return >= 0 */
}
lib/writen.c

#include "unp.h"

ssize_t                         /* Write "n" bytes to a descriptor. */
writen(int fd, const void *vptr, size_t n)
{
    size_t      nleft;
    ssize_t     nwritten;
    const char *ptr;

    ptr = vptr;
    nleft = n;
    while (nleft > 0) {
        if ( (nwritten = write(fd, ptr, nleft)) <= 0) {
            if (nwritten < 0 && errno == EINTR)
                nwritten = 0;   /* and call write() again */
            else
                return (-1);    /* error */
        }
        nleft -= nwritten;
        ptr += nwritten;
    }
    return (n);
}
#include "unp.h"

static int   read_cnt;
static char *read_ptr;
static char  read_buf[MAXLINE];

static ssize_t
my_read(int fd, char *ptr)
{
    if (read_cnt <= 0) {
again:
        if ( (read_cnt = read(fd, read_buf, sizeof(read_buf))) < 0) {
            if (errno == EINTR)
                goto again;
            return (-1);
        } else if (read_cnt == 0)
            return (0);
        read_ptr = read_buf;
    }
    read_cnt--;
    *ptr = *read_ptr++;
    return (1);
}

ssize_t
readline(int fd, void *vptr, size_t maxlen)
{
    ssize_t n, rc;
    char    c, *ptr;

    ptr = vptr;
    for (n = 1; n < maxlen; n++) {
        if ( (rc = my_read(fd, &c)) == 1) {
            *ptr++ = c;
            if (c == '\n')
                break;          /* newline is stored, like fgets() */
        } else if (rc == 0) {
            *ptr = 0;
            return (n - 1);     /* EOF, n - 1 bytes were read */
        } else
            return (-1);        /* error, errno set by read() */
    }
    *ptr = 0;                   /* null terminate like fgets() */
    return (n);
}


ssize_t
readlinebuf(void **vptrptr)
{
    if (read_cnt)
        *vptrptr = read_ptr;
    return (read_cnt);
}


isfdtype
• It checks whether the descriptor is a socket descriptor



Summary
• Socket Address Structures
• Value-Result Passing
• Basic socket and byte manipulation functions
• Reading and writing descriptors



bind()
• int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen);
• Assign a local protocol address (“name”) to a socket.
• sockfd is socket descriptor from socket()
• myaddr is a pointer to address struct with:
• port number and IP address
• if port is 0, then host will pick ephemeral port
• not usually for server (exception RPC port-map)
• IP address is usually INADDR_ANY (the wildcard), unless the host has multiple NICs and the server should accept connections on only one of them
• addrlen is length of structure
• returns 0 if ok, -1 on error
• EADDRINUSE (“Address already in use”)

IP address         Port       Result
INADDR_ANY         0          Kernel chooses IP address and port
INADDR_ANY         nonzero    Kernel chooses IP address, process specifies port
Local IP address   0          Process specifies IP address, kernel chooses port
Local IP address   nonzero    Process specifies IP address and port
listen()
• int listen(int sockfd, int backlog);
• Change socket state for TCP server.
• sockfd is socket descriptor from socket()
• backlog is maximum number of incomplete connections
• historically 5
• rarely above 15 on even moderate Web server!
• Sockets default to active (for a client)
• change to passive so OS will accept connection
• If the queues are full when a client SYN arrives, the TCP server ignores the SYN; it does not send an RST.


listen()
• Backlog argument to the listen function has historically specified the
maximum value for the sum of both queues
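The socket(), bind() and listen() calls described so far can be combined as in the following sketch. The helper name listen_loopback is hypothetical, and binding to the loopback address rather than INADDR_ANY is an assumption made so the example stays local to one machine.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on 127.0.0.1.
 * A port of 0 lets the kernel choose an ephemeral port.
 * Returns the listening descriptor, or -1 on error. */
int listen_loopback(uint16_t port, int backlog)
{
    struct sockaddr_in serv;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&serv, 0, sizeof(serv));
    serv.sin_family = AF_INET;
    serv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    serv.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&serv, sizeof(serv)) < 0 ||
        listen(fd, backlog) < 0) {     /* listen() makes the socket passive */
        close(fd);
        return -1;
    }
    return fd;
}
```

After a call such as listen_loopback(0, 5), getsockname() on the returned descriptor reveals which port the kernel picked.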





accept()
• int accept(int sockfd, struct sockaddr *cliaddr, socklen_t *addrlen);
• Return next completed connection.
• sockfd is socket descriptor from socket()
• cliaddr and addrlen return protocol address from client
• returns brand new descriptor, created by OS
• note, if create new process or thread, can create concurrent server



close()
• int close(int sockfd);
• Close socket for use.
• sockfd is socket descriptor from socket()
• closes socket for reading/writing
• returns (doesn’t block)
• attempts to send any unsent data
• socket option SO_LINGER
• block until data sent
• or discard any remaining data
• returns -1 if error



TCP Client-Server
Server
• socket()
• bind() to the “well-known” port
• listen()
• accept() (blocks until a connection arrives)
• recv() the request
• send() the reply
• close()

Client
• socket()
• connect() (the “handshake” with the listening server)
• send() the request
• recv() the reply
• close()
connect()
• int connect(int sockfd, const struct sockaddr *servaddr, socklen_t addrlen);
• Connect to the server
• sockfd is socket descriptor from socket()
• servaddr is a pointer to a structure with:
• port number and IP address
• must be specified (unlike bind())
• addrlen is length of structure
• client doesn’t need bind()
• OS will pick ephemeral port
• returns 0 if ok, -1 on error
• Error returns
• ETIMEDOUT: no response from server
• ECONNREFUSED: the server host answered the SYN with an RST because no process is listening on that port
• EHOSTUNREACH: the client’s SYN drew an ICMP “destination unreachable” from some intermediate router


Sending and Receiving
• ssize_t recv(int sockfd, void *buff, size_t nbytes, int flags);
• ssize_t send(int sockfd, const void *buff, size_t nbytes, int flags);
• Same as read() and write() but for flags
• MSG_DONTWAIT (make this one call non-blocking)
• MSG_OOB (out of band data, 1 byte sent ahead)
• MSG_PEEK (look, but don’t remove)
• MSG_WAITALL (don’t give me less than max)
• MSG_DONTROUTE (bypass routing table)
UDP Client-Server
Server
• socket()
• bind() to the “well-known” port
• recvfrom() (blocks until a datagram is received)
• sendto() the reply

Client
• socket()
• sendto() the request
• recvfrom() the reply
• close()

Notes
• No “handshake”
• No simultaneous close
• No fork()/spawn() for concurrent servers!
Sending and Receiving
• ssize_t recvfrom(int sockfd, void *buff, size_t nbytes, int flags, struct sockaddr *from, socklen_t *addrlen);
• ssize_t sendto(int sockfd, const void *buff, size_t nbytes, int flags, const struct sockaddr *to, socklen_t addrlen);
• Same as recv() and send() but for addr
• recvfrom fills in address of where packet came from
• sendto requires address of where sending packet to
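To make the two calls concrete, the sketch below sends one datagram between two UDP sockets inside a single process over the loopback interface. The function name udp_loopback and the single-process setup are assumptions for illustration; a real client and server would be separate programs.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: send msg from one UDP socket to another on 127.0.0.1.
 * Returns the number of bytes received into out, or -1 on error. */
ssize_t udp_loopback(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in serv, from;
    socklen_t len = sizeof(serv);
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);   /* "server" socket */
    int tx = socket(AF_INET, SOCK_DGRAM, 0);   /* "client" socket */

    if (rx < 0 || tx < 0)
        goto done;
    memset(&serv, 0, sizeof(serv));
    serv.sin_family = AF_INET;
    serv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    serv.sin_port = 0;                         /* kernel picks the port */
    if (bind(rx, (struct sockaddr *)&serv, sizeof(serv)) < 0 ||
        getsockname(rx, (struct sockaddr *)&serv, &len) < 0)
        goto done;
    /* sendto needs the destination address on every call */
    if (sendto(tx, msg, strlen(msg), 0,
               (struct sockaddr *)&serv, sizeof(serv)) < 0)
        goto done;
    len = sizeof(from);                        /* value-result argument */
    n = recvfrom(rx, out, outlen, 0, (struct sockaddr *)&from, &len);
done:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return n;
}
```

After recvfrom returns, from holds the sender's address and ephemeral port, which is what a server would pass to sendto for its reply.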



connect() with UDP
• Record address and port of peer
• datagrams to/from others are not allowed
• does not do three way handshake, or connection
• “connect” a misnomer, here. Should be setpeername()
• Use send() instead of sendto()
• Use recv() instead of recvfrom()
• Can change connect or unconnect by repeating connect() call
• (Can do similar with bind() on receiver)



Why use connected UDP?
• Send two datagrams unconnected (steps the kernel performs for two sendto calls):
  • connect the socket
  • output first dgram
  • unconnect the socket
  • connect the socket
  • output second dgram
  • unconnect the socket
• Send two datagrams connected:
  • connect the socket
  • output first dgram
  • output second dgram


Concurrent Servers
• Close sock in child, newsock in parent
• Reference count for socket descriptor
• Parent and child share the text segment below; after fork() each has its own copy of int sock and int newsock

sock = socket()
/* setup socket */
while (1) {
    newsock = accept(sock)
    fork()
    if child
        read(newsock)
        until exit
}
Client: Learning Server Address/Port
• Server typically known by name and service
• E.g., “www.cnn.com” and “http”
• Need to translate into IP address and port #
• E.g., “64.236.16.20” and “80”

• Get address info with given host name and service


• int getaddrinfo(const char *node, const char *service, const struct addrinfo *hints, struct addrinfo **result)
• node: host name (e.g., “www.cnn.com”) or IP address
• service: port number or service listed in /etc/services (e.g. ftp)
• hints: points to a struct addrinfo with known information
• result: set to a linked list of struct addrinfo entries, released with freeaddrinfo()
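A minimal sketch of a lookup follows. The helper resolve_ipv4 is a hypothetical name, and restricting the hints to IPv4 and TCP is an assumption made to keep the result a single struct sockaddr_in.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve node and service to the first IPv4/TCP address.
 * Returns 0 on success, otherwise the getaddrinfo error code
 * (gai_strerror() turns that code into a message). */
int resolve_ipv4(const char *node, const char *service, struct sockaddr_in *out)
{
    struct addrinfo hints, *res;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;         /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;   /* TCP */
    rc = getaddrinfo(node, service, &hints, &res);
    if (rc != 0)
        return rc;
    memcpy(out, res->ai_addr, sizeof(*out));   /* first entry of the list */
    freeaddrinfo(res);                 /* the list is allocated by getaddrinfo */
    return 0;
}
```

A client would typically walk the whole result list and try connect() on each entry until one succeeds; taking only the first entry keeps the sketch short.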



Server: Allowing Clients to Wait
• Many client requests may arrive
• Server cannot handle them all at the same time
• Server could reject the requests, or let them wait
• Define how many connections can be pending
• int listen(int sockfd, int backlog)
• Arguments: socket descriptor and acceptable backlog
• Returns a 0 on success, and -1 on error
• Listen is non-blocking: returns immediately
• What if too many clients arrive?
• Some requests don’t get through
• The Internet makes no promises…
• And the client can always try again



Server: Accepting Client Connection
• Now all the server can do is wait…
• Waits for connection request to arrive
• Blocking until the request arrives
• And then accepting the new request

• Accept a new connection from a client


• int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen)
• Arguments: sockfd, structure that will provide client address and port, and
length of the structure
• Returns descriptor of socket for this new connection



Client and Server: Cleaning House
• Once the connection is open
• Both sides can read and write
• Two unidirectional streams of data
• In practice, client writes first, and server reads
• … then server writes, and client reads, and so on
• Closing down the connection
• Either side can close the connection
• … using the int close(int sockfd)
• What about the data still “in flight”
• Data in flight still reaches the other end
• So, server can close() before client finishes reading



Server: One Request at a Time?
• Serializing requests is inefficient
• Server can process just one request at a time
• All other clients must wait until previous one is done
• What makes this inefficient?
• May need to time share the server machine
• Alternate between servicing different requests
• Do a little work on one request, then switch when you are waiting for some other
resource (e.g., reading file from disk)
• “Nonblocking I/O”
• Or, use a different process/thread for each request
• Allow OS to share the CPU(s) across processes
• Or, some hybrid of these two approaches



Handle Multiple Clients using fork()
• Steps to handle multiple clients
• Go to a loop and accept connections using accept()
• After a connection is established, call fork() to create a new child process to
handle it
• Go back to listen for another socket in the parent process
• close() when you are done.



fork and exec function
#include <unistd.h>
pid_t fork(void);
Returns: 0 in child, process ID of child in parent, -1 on error

#include <unistd.h>
int execl(const char *pathname, const char *arg0, ... /* (char *) 0 */);
int execv(const char *pathname, char *const argv[]);
int execle(const char *pathname, const char *arg0, ... /* (char *) 0, char *const envp[] */);
int execve(const char *pathname, char *const argv[], char *const envp[]);
int execlp(const char *filename, const char *arg0, ... /* (char *) 0 */);
int execvp(const char *filename, char *const argv[]);

All six return: -1 on error, no return on success

Concurrent server
pid_t pid;
int listenfd, connfd;

listenfd = socket(...);
/* fill the socket address with the server's well-known port */
bind(listenfd, ...);
listen(listenfd, ...);
for ( ; ; ) {
    connfd = accept(listenfd, ...);   /* blocking call */
    if ( (pid = fork()) == 0) {
        close(listenfd);              /* child closes listening socket */
        /* process the request using connfd */
        /* ................. */
        close(connfd);
        exit(0);                      /* child terminates */
    }
    close(connfd);                    /* parent closes connected socket */
}
getsockname and getpeername function
#include <sys/socket.h>
int getsockname(int sockfd, struct sockaddr *localaddr, socklen_t *addrlen);
int getpeername(int sockfd, struct sockaddr *peeraddr, socklen_t *addrlen);
/* both return: 0 if OK, -1 on error */
• getsockname: returns the local protocol address associated with a socket
• getpeername: returns the foreign protocol address associated with a socket
Socket Options
Dr. Ramakrishna M



Contents
• Introduction
• getsockopt and setsockopt function
• socket state
• Generic socket option
• IPv4 socket option
• ICMPv6 socket option
• IPv6 socket option
• TCP socket option
• fcntl function


Introduction
• Three ways to get and set the options that affect a socket
• getsockopt , setsockopt function=>IPv4 and IPv6 multicasting options
• fcntl function =>nonblocking I/O, signal driven I/O
• ioctl function =>chapter16



getsockopt and setsockopt function
#include <sys/socket.h>
int getsockopt(int sockfd, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen);
/* both return: 0 if OK, -1 on error */

sockfd => open socket descriptor
level => code in the system to interpret the option (generic, IPv4, IPv6, TCP)
optname => name of the option to get or set
optval => pointer to a variable from which the new value of the option is fetched
by setsockopt, or into which the current value of the option is stored
by getsockopt.
optlen => the size of the option variable (a value-result argument for getsockopt)
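The following sketch sets one option and reads it back, which exercises both functions. The helper name enable_keepalive is hypothetical, and SO_KEEPALIVE (a generic option at level SOL_SOCKET) is chosen only as a convenient integer-valued example.

```c
#include <sys/socket.h>

/* Hypothetical helper: turn SO_KEEPALIVE on and read the value back.
 * Returns 1 if the kernel reports the option as on, 0 if off, -1 on error. */
int enable_keepalive(int fd)
{
    int on = 1;                        /* nonzero enables a flag option */
    int val = 0;
    socklen_t len = sizeof(val);       /* value-result for getsockopt */

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

Comparing the returned value with 0, rather than with 1, is deliberate: implementations only promise a nonzero value for an enabled flag option.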
socket state
• Some socket options are inherited by a connected TCP socket from the listening socket.
• To have such an option set on the connected socket, we must set it on the listening socket, because the connected socket is not returned to the server by accept until the three-way handshake has been completed by the TCP layer.


Generic socket options
• SO_BROADCAST => enable or disable the ability of the process to send broadcast
messages (datagram sockets only, on networks that support broadcast: Ethernet, token ring, ...)
• SO_DEBUG =>kernel keep track of detailed information about all packets sent or
received by TCP(only supported by TCP)
• SO_DONTROUTE=>outgoing packets are to bypass the normal routing
mechanisms of the underlying protocol.
• SO_ERROR=>when error occurs on a socket, the protocol module in a Berkeley-
derived kernel sets a variable named so_error for that socket. Process can
obtain the value of so_error by fetching the SO_ERROR socket option



SO_KEEPALIVE

• SO_KEEPALIVE => if no data is exchanged across the socket in either direction for 2 hours, TCP automatically sends a keepalive probe to the peer.
• Peer response
• ACK (everything OK)
• RST (peer crashed and rebooted): ECONNRESET
• no response: ETIMEDOUT => socket closed
• example: Rlogin, Telnet…


SO_LINGER
• SO_LINGER =>specify how the close function operates for a connection-oriented
protocol(default:close returns immediately)
struct linger{
int l_onoff; /* 0 = off, nonzero = on */
int l_linger; /*linger time : second*/
};
• l_onoff = 0 : turn off , l_linger is ignored
• l_onoff = nonzero and l_linger is 0:TCP abort the connection, discard any
remaining data in send buffer.
• l_onoff = nonzero and l_linger is nonzero: close blocks until the remaining data has been sent,
or until the linger time expires. If the socket has been set nonblocking, it will not wait
for the close to complete, even if the linger time is nonzero.
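Setting the option amounts to filling in the structure and passing it to setsockopt, as in this sketch (set_linger is a hypothetical helper name).

```c
#include <sys/socket.h>

/* Hypothetical helper: enable lingering on close with the given timeout.
 * seconds == 0 requests an abortive close (remaining data is discarded).
 * Returns 0 on success, -1 on error. */
int set_linger(int fd, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;                    /* nonzero = on */
    lg.l_linger = seconds;             /* linger time in seconds */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

Because this is one of the options a connected socket inherits, a server would normally call such a helper on the listening socket before accept().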



shutdown()
#include <sys/socket.h>
int shutdown(int socket, int how);

socket: Specifies the file descriptor of the socket.


how: Specifies the type of shutdown. The values are as follows:
SHUT_RD: Disables further receive operations.
SHUT_WR: Disables further send operations.
SHUT_RDWR: Disables further send and receive operations.
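The effect of SHUT_WR can be seen in the sketch below: after the writer half-closes, the peer's read returns 0 (EOF) once the pending data has been consumed. The function name is hypothetical, and a connected Unix domain socketpair is used instead of a TCP connection purely so the example fits in one process.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: write msg on one end of a socket pair, half-close it,
 * and count what the other end reads until EOF.
 * Returns the total number of bytes read by the peer, or -1 on error. */
ssize_t send_then_half_close(const char *msg, size_t len)
{
    int sv[2];
    char buf[64];
    ssize_t n, total = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[0], msg, len) != (ssize_t)len ||
        shutdown(sv[0], SHUT_WR) < 0) {        /* no more sends from sv[0] */
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    while ( (n = read(sv[1], buf, sizeof(buf))) > 0)
        total += n;                            /* read returns 0 after the half-close */
    close(sv[0]);
    close(sv[1]);
    return n < 0 ? -1 : total;
}
```

Unlike close(), shutdown() acts on the connection regardless of how many descriptors refer to the socket, and sv[0] could still be read from after SHUT_WR.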



• A way to know that the peer application has read the data
• Use an application-level acknowledgment (application ACK)
• Client
char ack;
Write(sockfd, data, nbytes); // data from client to server
n=Read(sockfd, &ack, 1); // wait for application-level ack
• Server
nbytes=Read(sockfd, buff, sizeof(buff)); // data from client
// server verifies it received the correct amount of data from
// the client
Write(sockfd, "", 1); // server's ACK back to client

SO_RCVBUF, SO_SNDBUF
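• Let us change the default send-buffer and receive-buffer sizes.
• Default TCP send and receive buffer size :
• 4096 bytes (older Berkeley-derived implementations)
• 8192-61440 bytes (newer systems)
• Default UDP buffer size : about 9000 bytes (send), about 40000 bytes (receive)
• SO_RCVBUF must be set before the connection is established: before connect for a
client, and on the listening socket before listen for a server.
• TCP socket buffer sizes should be at least three times the MSS of the connection.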
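
Setting and reading back a buffer size might look like the sketch below. set_rcvbuf is a hypothetical helper name; the value reported afterwards is implementation-dependent (Linux, for example, doubles the requested size to leave room for bookkeeping).

```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of `bytes` bytes and
 * return the size the kernel reports afterwards, or -1 on error.
 * Call it before connect (client) or before listen (server). */
int set_rcvbuf(int sockfd, int bytes)
{
    int actual;
    socklen_t len = sizeof(actual);

    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```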



SO_RCVLOWAT, SO_SNDLOWAT
• Every socket has a receive low-water mark and a send low-water mark (used
by the select function).
• Receive low-water mark:
• the amount of data that must be in the socket receive buffer for select to return
“readable”.
• Default receive low-water mark : 1 for TCP and UDP
• Send low-water mark:
• the amount of available space that must exist in the socket send buffer for select to
return “writable”
• Default send low-water mark : 2048 for TCP
• For UDP, the available space in the send buffer never changes, because UDP does not
keep a copy of the datagrams it sends.



SO_RCVTIMEO, SO_SNDTIMEO
• Allow us to place a timeout on socket receives and sends.
• Default disabled
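
As a minimal sketch, a receive timeout is passed as a struct timeval; a value of 0 seconds and 0 microseconds disables it again. The helper name set_recv_timeout is an assumption for this example.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: place a timeout on receives for sockfd.
 * Once set, a blocked read/recv that times out returns -1 with
 * errno set to EWOULDBLOCK (EAGAIN).
 * Returns 0 if OK, -1 on error. */
int set_recv_timeout(int sockfd, long sec, long usec)
{
    struct timeval tv;

    tv.tv_sec = sec;
    tv.tv_usec = usec;
    return setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```

SO_SNDTIMEO is set the same way, with the option name swapped.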



SO_REUSEADDR, SO_REUSEPORT
• Allows a listening server to start and bind its well-known port even if
previously established connections exist that use this port as their local
port.
• Allows multiple instances of the same server to be started on the same
port, as long as each instance binds a different local IP address.
• Allows a single process to bind the same port to multiple sockets, as
long as each bind specifies a different local IP address.
• Allows completely duplicate bindings : used with UDP sockets for multicasting
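
The first case, a listening server that sets SO_REUSEADDR before bind, can be sketched as follows. tcp_listen_reuse is a hypothetical helper and the backlog of 5 is an arbitrary choice for the example.

```c
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: create a TCP listening socket on `port` (host
 * byte order) bound to the wildcard address, with SO_REUSEADDR set.
 * The option must be set between socket and bind.
 * Returns the listening descriptor, or -1 on error. */
int tcp_listen_reuse(unsigned short port)
{
    struct sockaddr_in addr;
    const int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```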



SO_TYPE
• Returns the socket type.
• The returned value is one of the SOCK_xxx values, such as SOCK_STREAM or SOCK_DGRAM.



SO_USELOOPBACK
• This option applies only to sockets in the routing domain(AF_ROUTE).
• The socket receives a copy of everything sent on the socket.



IPv4 socket option
• Level => IPPROTO_IP
• IP_HDRINCL => If this option is set for a raw IP socket, we must
build our IP header for all the datagrams that we send on the raw
socket.(chapter 26)



• IP_OPTIONS=>allows us to set IP option in IPv4 header.(chapter 24)
• IP_RECVDSTADDR=>This socket option causes the destination IP
address of a received UDP datagram to be returned as ancillary data
by recvmsg. (chapter20)



IP_RECVIF
• Cause the index of the interface on which a UDP datagram is received
to be returned as ancillary data by recvmsg.(chapter20)



IP_TOS
• lets us set the type-of-service(TOS) field in IP header for a TCP or UDP
socket.
• If we call getsockopt for this option, the current value that would
be placed into the TOS(type of service) field in the IP header is
returned.(figure A.1)



IP_TTL
• We can set and fetch the default TTL (time to live field, figure A.1).



TCP socket option
• There are five socket options for TCP, but three are new with Posix.1g
and not widely supported.
• Specify the level as IPPROTO_TCP.



TCP_KEEPALIVE
• This is new with Posix.1g
• It specifies the idle time in seconds for the connection before TCP
starts sending keepalive probes.
• Default 2hours
• this option is effective only when the SO_KEEPALIVE socket
option enabled.



TCP_MAXRT
• This is new with Posix.1g.
• It specifies the amount of time in seconds before a connection is
broken once TCP starts retransmitting data.
• 0 : use default
• -1:retransmit forever
• positive value: may be rounded up to the implementation's next retransmission time



TCP_MAXSEG
• This allows us to fetch or set the maximum segment size(MSS) for TCP
connection.



TCP_NODELAY
• This option disables TCP’s Nagle algorithm.
• (by default the algorithm is enabled)
• Purpose of the Nagle algorithm:
• prevent a connection from having multiple small packets
outstanding at any time.
• Small packet => any packet smaller than the MSS.
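
Disabling the Nagle algorithm is a single setsockopt call at level IPPROTO_TCP. The wrapper name disable_nagle below is hypothetical and only serves the illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP_NODELAY, which turns the Nagle
 * algorithm off for this connection.
 * Returns 0 if OK, -1 on error. */
int disable_nagle(int sockfd)
{
    const int on = 1;

    return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```

This is usually worth considering only for applications that send many small writes and cannot tolerate the added latency.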



Nagle algorithm
• Default enabled.
• Reduces the number of small packets on a WAN.
• If a given connection has outstanding (unacknowledged) data, no small packets
are sent on the connection until the existing data is acknowledged.



Nagle algorithm disabled



Nagle algorithm enabled



fcntl function
• File control
• This function performs various descriptor control operations.
• It provides the following features
• Nonblocking I/O(chapter 15)
• signal-driven I/O(chapter 22)
• set socket owner to receive SIGIO signal (chapter 21,22)



#include <fcntl.h>
int fcntl(int fd, int cmd, ... /* int arg */);
Returns: depends on cmd if OK, -1 on error
cmd:
F_GETFL: get flag
F_SETFL: set flag

Two flags that affect a socket:


O_NONBLOCK : nonblocking I/O
O_ASYNC : signal driven I/O notification



Nonblocking I/O using fcntl
int flags;
/* set socket nonblocking */
if((flags = fcntl(fd, F_GETFL, 0)) < 0)
err_sys("F_GETFL error");
flags |= O_NONBLOCK;
if(fcntl(fd, F_SETFL, flags) < 0)
err_sys("F_SETFL error");
Each descriptor has a set of file status flags that are fetched with the F_GETFL
command and set with the F_SETFL command.

Misuse of fcntl
/* wrong way to set socket nonblocking */
if(fcntl(fd, F_SETFL,O_NONBLOCK) < 0)
err_sys("F_SETFL error");

/* wrong because it also clears all the other file status flags */



To Set a Flag
• To set a flag : 1. fetch 2. OR (to turn on) or AND with the complement (to turn off) 3. set

int flags;
if ( (flags = fcntl(fd, F_GETFL, 0) ) < 0)
err_sys("F_GETFL error");

flags |= O_NONBLOCK; /* turn on */
or
flags &= ~O_NONBLOCK; /* turn off */

if (fcntl(fd, F_SETFL, flags) < 0)
err_sys("F_SETFL error");



Turn off the nonblocking flag
flags &= ~O_NONBLOCK;
if(fcntl(fd, F_SETFL, flags) < 0)
err_sys(“F_SETFL error”);



Set and Get Socket Owner
• The F_SETOWN command lets us set the socket owner (a process ID or
process group ID) that receives the SIGIO and SIGURG signals
• SIGIO is generated if signal-driven I/O is enabled for a socket
• SIGURG is generated when out-of-band data arrives for a socket.
• The F_GETOWN command gets the socket owner



F_SETOWN
• F_SETOWN => the integer arg is either positive (a process ID) or negative
(its absolute value is a process group ID) and identifies who receives the signals.
• F_GETOWN => fcntl returns the socket owner, either a process ID (positive
value) or a process group ID (negative value other than -1).

Name and Address Conversions
Dr. Ramakrishna M



Elementary Name and Address Conversions
• Domain name system
• gethostbyname Function
• RES_USE_INET6 resolver option
• gethostbyname2 Function and IPv6 support
• gethostbyaddr Function
• uname and gethostname Functions
• getservbyname and getservbyport Functions
• Other networking information



Domain Name System
• Entries in DNS: resource records (RRs) for a host
• A record: maps a hostname to a 32-bit IPv4 addr
• AAAA (quad A) record: maps to a 128-bit IPv6 addr
• PTR record: maps IP addr to hostname
• MX record: specifies a mail exchanger of the host
• CNAME record: assigns canonical name for common services
e.g. solaris IN A 206.62.226.33
IN AAAA 5f1b:df00:ce3e:e200:0020:0800:2078:e3e3
IN MX 5 solaris.kohala.com
IN MX 10 mailhost.kohala.com
IN PTR 33.226.62.206.in-addr.arpa
www IN CNAME bsdi.kohala.com

DNS: Application, Resolver, Name Servers
[Figure: the application code calls the resolver code (function call and return). The resolver reads its configuration files and exchanges UDP requests and replies with the local name server, which in turn queries other name servers over UDP.]
• resolver functions: gethostbyname/gethostbyaddr
• name server: BIND (Berkeley Internet Name Domain)
• static hosts files (DNS alternatives): /etc/hosts
• resolver configuration file (specifies name server IPs): /etc/resolv.conf

gethostbyname Function
performs a DNS query for an A record or a AAAA record

#include <netdb.h>
struct hostent *gethostbyname (const char *hostname);
returns: nonnull pointer if OK, NULL on error with h_errno set
struct hostent {
char *h_name; /* official (canonical) name of host */
char **h_aliases; /* ptr to array of ptrs to alias names */
int h_addrtype; /* host addr type: AF_INET or AF_INET6 */
int h_length; /* length of address: 4 or 16 */
char **h_addr_list; /* ptr to array of ptrs with IPv4/IPv6 addrs */
};
#define h_addr h_addr_list[0] /* first address in list */



hostent Structure Returned by gethostbyname
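[Figure: layout of the hostent structure. h_name points to the null-terminated canonical hostname; h_aliases points to a NULL-terminated array of pointers to the alias names (alias #1, alias #2); h_addrtype is AF_INET or AF_INET6; h_length is 4 or 16; h_addr_list points to a NULL-terminated array of pointers to in_addr or in6_addr structures (IP addr #1, #2, #3).]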



RES_USE_INET6 Resolver Option
• Per-application: call res_init
• #include <resolv.h>
• res_init ( );
• _res.options |= RES_USE_INET6;
• Per-user: set environment variable RES_OPTIONS
• export RES_OPTIONS=inet6
• Per-system: update resolver configuration file
• options inet6 (in /etc/resolv.conf)
• For a host without a AAAA record, IPv4-mapped IPv6 addresses are
returned.



gethostbyname2 Function and IPv6 Support
#include <netdb.h>
struct hostent *gethostbyname2 (const char *hostname, int family);
returns: nonnull pointer if OK, NULL on error with h_errno set

                                RES_USE_INET6 option
                                off            on
gethostbyname(host)             A record       AAAA record, or A record returning
                                               IPv4-mapped IPv6 addr
gethostbyname2(host, AF_INET)   A record       A record returning
                                               IPv4-mapped IPv6 addr
gethostbyname2(host, AF_INET6)  AAAA record    AAAA record



IPv4 and IPv6 Interoperability
Dr. Ramakrishna M



Contents
• Introduction
• IPv4 Client, IPv6 Server
• IPv6 Client, IPv4 Server
• IPv6 Address Testing Macros
• IPV6_ADDRFORM Socket Option
• Source Code Portability



Introduction
• Server and client combination
• IPv4 <=> IPv4(most server and client)
• IPv4 <=> IPv6
• IPv6 <=> IPv4
• IPv6 <=> IPv6
• How can an IPv4 application and an IPv6 application communicate with
each other?
• Assumption: hosts are running dual stacks, both an IPv4 protocol stack and an IPv6
protocol stack



IPv4 Client , IPv6 Server
• An IPv6 server on a dual-stack host can handle both IPv4 and IPv6 clients.
• This is done using IPv4-mapped IPv6 addresses
• The server creates an IPv6 listening socket that is bound to the IPv6
wildcard address



[Figure: an IPv6 server on a dual-stack host, with an IPv6 listening socket bound to 0::0, port 8888, serving an IPv4 client (206.62.226.42) and an IPv6 client (5f1b:df00:ce3e:e200:20:800:2b37:6426). The IPv4 client's TCP segment arrives in an IPv4 datagram (Ethernet frame type 0800, destination port 8888), and its address is presented to the server as an IPv4-mapped IPv6 address.]

[Figure: processing of received IPv4 and IPv6 datagrams by socket type. IPv4 sockets (AF_INET, SOCK_STREAM or SOCK_DGRAM, sockaddr_in) receive only IPv4 datagrams. IPv6 sockets (AF_INET6, SOCK_STREAM or SOCK_DGRAM, sockaddr_in6) receive IPv6 datagrams and also IPv4 datagrams, whose source address is returned by accept or recvfrom as an IPv4-mapped IPv6 address.]

IPv6 client, IPv4 server
• The IPv4 server starts on an IPv4-only host and creates an IPv4 listening
socket
• The IPv6 client starts and calls gethostbyname; an IPv4-mapped IPv6 address is
returned.
• The client's kernel detects the IPv4-mapped address and communicates using IPv4 datagrams



[Figure: processing of client requests by address type and socket type. An IPv4 socket (AF_INET, sockaddr_in) takes an IPv4 address for connect or sendto and sends IPv4 datagrams. An IPv6 socket (AF_INET6, sockaddr_in6) sends IPv6 datagrams for IPv6 addresses and IPv4 datagrams for IPv4-mapped IPv6 addresses.]

IPv6 Address Testing Macros
• There is a small class of IPv6 applications that must know whether they
are talking to an IPv4 peer.
• These applications need to know if the peer’s address is an IPv4-mapped
IPv6 address.
• Twelve macros are defined for testing IPv6 addresses, including IN6_IS_ADDR_V4MAPPED
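
As a sketch of how such a test might be used, the hypothetical helper below parses a textual address and applies IN6_IS_ADDR_V4MAPPED. The helper name and its return convention are assumptions for this example; in a real server the in6_addr would come from accept or getpeername rather than from text.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if `text` is an IPv4-mapped IPv6
 * address, 0 if it is some other IPv6 address, and -1 if it cannot
 * be parsed as an IPv6 address. */
int is_v4mapped(const char *text)
{
    struct in6_addr addr;

    if (inet_pton(AF_INET6, text, &addr) != 1)
        return -1;
    return IN6_IS_ADDR_V4MAPPED(&addr) ? 1 : 0;
}
```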



IPV6_ADDRFORM Socket Option
• Can change a socket from one type to another, with the following restrictions.
• An IPv4 socket can always be changed to an IPv6 socket. Any IPv4 addresses already
associated with the socket are converted to IPv4-mapped IPv6 addresses.
• An IPv6 socket can be changed to an IPv4 socket only if all addresses already
associated with the socket are IPv4-mapped IPv6 addresses.



Converting an IPv4 to IPv6

int af;
socklen_t clilen;
struct sockaddr_in6 cli; /* IPv6 struct */
struct hostent *ptr;

af = AF_INET6;
setsockopt(STDIN_FILENO, IPPROTO_IPV6, IPV6_ADDRFORM, &af, sizeof(af));

clilen = sizeof(cli);
getpeername(0, (struct sockaddr *) &cli, &clilen);
ptr = gethostbyaddr(&cli.sin6_addr, 16, AF_INET6);
• setsockopt => changes the address format of the socket from IPv4 to IPv6.
• getsockopt with IPV6_ADDRFORM returns the current format, AF_INET or AF_INET6
• getpeername => returns an IPv4-mapped IPv6 address



gethostbyaddr Function
binary IP address to hostent structure

#include <netdb.h>
struct hostent *gethostbyaddr (const char *addr, size_t len, int family);
returns: nonnull pointer if OK, NULL on error with h_errno set

addr argument: a pointer to an in_addr or in6_addr structure


h_name in hostent: canonical hostname
gethostbyaddr: queries a DNS name server for a PTR record
in the in-addr.arpa domain for IPv4 or a PTR record
in the ip6.int domain for IPv6 (later replaced by ip6.arpa).



getservbyname and getservbyport Functions
#include <netdb.h>
struct servent *getservbyname (const char *servname, const char *protoname);
returns: nonnull pointer if OK, NULL on error
struct servent *getservbyport (int port, const char *protoname);
returns: nonnull pointer if OK, NULL on error
struct servent {
char *s_name; /* official service name */
char **s_aliases; /* alias list */
int s_port; /* port number, network-byte order */
char *s_proto; /* protocol, TCP or UDP, to use */
};

Mapping from name to port number: in /etc/services


Services that support multiple protocols often use the same TCP and UDP
port number. But it’s not always true:
shell 514/tcp
syslog 514/udp
Other Networking Info
• Four types of info:
• hosts (gethostbyname, gethostbyaddr)
• through DNS or /etc/hosts, hostent structure
• networks (getnetbyname, getnetbyaddr)
• through DNS or /etc/networks, netent structure
• protocols (getprotobyname, getprotobynumber)
• through /etc/protocols, protoent structure
• services (getservbyname, getservbyport)
• through /etc/services, servent structure



Daemon Process
Dr. Ramakrishna M



Introduction
• A daemon is a process that runs in the background and is independent of
control from all terminals.
• There are numerous ways to start a daemon
1. the system initialization scripts ( /etc/rc )
2. the inetd superserver
3. cron daemon
4. the at command
5. from user terminals

• Since a daemon does not have a controlling terminal, it needs some way to
output message when something happens, either normal informational
messages, or emergency messages that need to be handled by an
administrator.



syslogd daemon
• The Berkeley-derived implementation of syslogd performs the following actions
upon startup.
1. The configuration file is read, specifying what to do with each type of log message
that the daemon can receive.
2. A Unix domain socket is created and bound to the pathname /var/run/log
( /dev/log on some system).
3. A UDP socket is created and bound to port 514
4. The pathname /dev/klog is opened. Any error messages from within the kernel
appear as input on this device.

• We could send log messages to the syslogd daemon from our daemons by
creating a Unix domain datagram socket and sending our messages to the
pathname that the daemon has bound, but an easier interface is the syslog
function.



syslog function

#include <syslog.h>
void syslog(int priority, const char *message, . . . );

• the priority argument is a combination of a level and a facility.


• The message is like a format string to printf, with the addition of a %m
specification, which is replaced with the error message corresponding
to the current value of errno.

• Ex) syslog(LOG_INFO|LOG_LOCAL2, "rename(%s, %s): %m", file1, file2);



syslog function
• Log messages have a level between 0 and 7.
level value description
LOG_EMERG 0 system is unusable ( highest priority )
LOG_ALERT 1 action must be taken immediately
LOG_CRIT 2 critical conditions
LOG_ERR 3 error conditions
LOG_WARNING 4 warning conditions
LOG_NOTICE 5 normal but significant condition (default)
LOG_INFO 6 informational
LOG_DEBUG 7 debug-level message ( lowest priority )
Figure 12.1 level of log message.



syslog function
• A facility identifies the type of process sending the message.
facility description
LOG_AUTH security / authorization messages
LOG_AUTHPRIV security / authorization messages (private)
LOG_CRON cron daemon
LOG_DAEMON system daemons
LOG_FTP FTP daemon
LOG_KERN kernel messages
LOG_LOCAL0 local use
LOG_LOCAL1 local use
LOG_LOCAL2 local use
LOG_LOCAL3 local use
LOG_LOCAL4 local use
LOG_LOCAL5 local use
LOG_LOCAL6 local use
LOG_LOCAL7 local use
LOG_LPR line printer system
LOG_MAIL mail system
LOG_NEWS network news system
LOG_SYSLOG messages generated internally by syslogd
LOG_USER random user-level messages (default)
LOG_UUCP UUCP system
Figure 12.2 facility of log messages.

syslog function
• openlog and closelog
• openlog can be called before the first call to syslog, and closelog can be called
when the application is finished sending log messages.
options Description
LOG_CONS Log to console if cannot send to syslogd daemon
LOG_NDELAY Do not delay open, create socket now
LOG_PERROR Log to standard error as well as sending to syslogd daemon
LOG_PID Log the process ID with each message
Figure 12.3 options for openlog

#include <syslog.h>
void openlog(const char *ident, int options, int facility);
void closelog(void);



daemon_init Function
#include "unp.h"
#include <syslog.h>
#define MAXFD 64
extern int daemon_proc; /* defined in error.c */
void daemon_init(const char *pname, int facility)
{
    int i;
    pid_t pid;

    if ( (pid = Fork()) != 0)
        exit(0); /* parent terminates */
    /* 1st child continues */
    setsid(); /* become session leader */
    Signal(SIGHUP, SIG_IGN);
    if ( (pid = Fork()) != 0)
        exit(0); /* 1st child terminates */
    /* 2nd child continues */
    daemon_proc = 1; /* for our err_XXX() functions */
    chdir("/"); /* change working directory */
    umask(0); /* clear our file mode creation mask */
    for (i = 0; i < MAXFD; i++)
        close(i);
    openlog(pname, LOG_PID, facility);
}

inetd Daemon

• A typical Unix system’s problems


• 1. All these daemons contained nearly identical startup code.
• 2. Each daemon took a slot in the process table, but each daemon was asleep most
of the time.

• inetd daemon fixes the two problems.


• 1. It simplifies writing daemon processes, since most of the startup details are
handled by inetd.
• 2. It allows a single process (inetd) to wait for incoming client requests for
multiple services, instead of one process for each service.



inetd daemon
1. For each service listed in the /etc/inetd.conf file: socket(), bind(), and listen() (if TCP socket)
2. select() for readability
3. accept() (if TCP socket)
4. fork()
• parent: close connected socket (if TCP), then go back to select()
• child: close all descriptors other than socket
• child: dup socket to descriptors 0, 1 and 2; close socket
• child: setgid() and setuid() (if user not root)
• child: exec() server



daemon_inetd Function
• Figure 12.11
#include "unp.h"
#include <syslog.h>
extern int daemon_proc; /* defined in error.c */
void
daemon_inetd(const char *pname, int facility)
{
    daemon_proc = 1; /* for our err_XXX() functions */
    openlog(pname, LOG_PID, facility);
}



daemon_inetd Function
• Figure 12.12
#include "unp.h"
#include <time.h>
int
main(int argc, char **argv)
{
    socklen_t len;
    struct sockaddr *cliaddr;
    char buff[MAXLINE];
    time_t ticks;

    daemon_inetd(argv[0], 0);
    cliaddr = Malloc(MAXSOCKADDR);
    len = MAXSOCKADDR;
    Getpeername(0, cliaddr, &len);
    err_msg("connection from %s", Sock_ntop(cliaddr, len));
    ticks = time(NULL);
    snprintf(buff, sizeof(buff), "%.24s\r\n", ctime(&ticks));
    Write(0, buff, strlen(buff));
    Close(0); /* close TCP connection */
    exit(0);
}



Multicast Sockets
Dr. Ramakrishna M



Contents
• Multicast address
• Multicasting versus broadcasting on a LAN
• Multicasting on a WAN
• Multicast socket option
• mcast_join and related functions
• dg_cli function using multicasting
• Receiving MBone session announcements
• Sending and receiving
• SNTP



• Unicast
• A single interface
• Broadcast
• Multiple interfaces
• LAN
• Multicast
• A set of interfaces
• LAN or WAN
• MBone
• Five socket options



multicast address
• IPv4 class D address
• 224.0.0.0 ~ 239.255.255.255
• (224.0.0.1: all-hosts group), (224.0.0.2: all-routers group)



Multicast Range
8 bits 4 bits 4 bits 112 bits
1111 1111 Flag Scope Group ID

• IPv6 multicast addresses have the prefix FF00::/8



IPv6 Multicast Addresses - Scope
8 bits 4 bits 4 bits 112 bits
1111 1111 Flag Scope Group ID

• Scope is a 4-bit field used to define the range of the multicast packet.
• Scope (partial list):
• 0 Reserved
• 1 Interface-Local scope
• 2 Link-Local scope
• 5 Site-Local scope
• 8 Organization-Local scope
• E Global scope



IPv6 Multicast Addresses - Flag
8 bits 4 bits 4 bits 112 bits
1111 1111 Flag Scope Group ID

• Flag
• 0 - Permanent, well-known multicast address assigned by IANA.
• Includes both assigned and solicited-node multicast addresses.
• 1 - Non-permanently-assigned, “dynamically" assigned multicast
address.
• An example might be FF18::CAFE:1234, used for a multicast
application with organizational scope.
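
The flag and scope fields can be read directly from the second byte of the address. The sketch below assumes a hypothetical helper name, mcast_flag_scope, and takes the address as text purely for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: split the second byte of an IPv6 multicast
 * address into its 4-bit flag and 4-bit scope fields.
 * Returns 0 if OK, -1 if `text` is not an IPv6 multicast address. */
int mcast_flag_scope(const char *text, int *flag, int *scope)
{
    struct in6_addr addr;

    if (inet_pton(AF_INET6, text, &addr) != 1 || addr.s6_addr[0] != 0xff)
        return -1;
    *flag = addr.s6_addr[1] >> 4;     /* high-order 4 bits */
    *scope = addr.s6_addr[1] & 0x0f;  /* low-order 4 bits */
    return 0;
}
```

For FF18::CAFE:1234 this yields flag 1 (dynamically assigned) and scope 8 (organization-local), matching the example in the list above.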



multicast address(2)
• IPv6 multicast address
• special IPv6 multicast addresses
• ff02::1 => all-nodes group (all multicast-capable hosts on the subnet must join
this group on all multicast-capable interfaces)
• ff02::2 => all-routers group (all multicast-capable routers on the subnet must join
this group on all multicast-capable interfaces)



Scope of multicast addresses
• IPv6 multicast address have an explicit 4-bit scope field that specifies
how far the multicast packet will travel.
• Node-local (1)
• link-local (2)
• site-local (5)
• organization-local (8)
• global (14)
• remaining values are unassigned or reserved



• An organization must configure its boundary routers not to forward organization-local
multicast packets outside its network.

Multicasting versus broadcasting on a LAN
• One host sends a multicast packet, and any interested host receives
the packet.
• Benefit
• reducing the load on all the hosts not interested in the multicast packets

multicasting on a WAN



A host sends the audio packets, and the multicast receivers are waiting to receive

MRP: Multicast Routing Protocol


multicast socket option

• The API support for multicasting requires only five new socket options.
• Figure 19.7



multicast socket option(2)
• IP_ADD_MEMBERSHIP, IPV6_ADD_MEMBERSHIP
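• join a multicast group on a specified local interface.

struct ip_mreq {
    struct in_addr imr_multiaddr; /* IPv4 class D multicast address */
    struct in_addr imr_interface; /* IPv4 address of local interface */
};

struct ipv6_mreq {
    struct in6_addr ipv6mr_multiaddr; /* IPv6 multicast address */
    unsigned int ipv6mr_interface; /* interface index, or 0 */
};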



multicast socket option(3)
• IP_DROP_MEMBERSHIP, IPV6_DROP_MEMBERSHIP
• leave a multicast group on a specified local interface.
• If the local interface is not specified, the first matching multicasting group
membership is dropped.



multicast socket option(4)
• IP_MULTICAST_IF, IPV6_MULTICAST_IF
• specify the interface for outgoing multicast datagrams sent on this socket.
• This interface is specified as either an in_addr structure for IPv4 or an
interface index for IPv6.



multicast socket option(5)
• IP_MULTICAST_TTL, IPV6_MULTICAST_HOPS
• set the IPv4 TTL or the IPv6 hop limit for outgoing multicast datagrams.
• If this is not specified, both default to 1, which restricts the datagram to the
local subnet.



multicast socket option(6)
• IP_MULTICAST_LOOP, IPV6_MULTICAST_LOOP
• enable or disable local loopback of multicast datagrams.
• Default loopback is enabled.
• This is similar to broadcasting.



mcast_join and related function

#include "unp.h"
int mcast_join(int sockfd, const struct sockaddr *addr, socklen_t salen,
const char *ifname, u_int ifindex);
int mcast_leave(int sockfd, const struct sockaddr *addr, socklen_t salen);
int mcast_set_if(int sockfd, const char *ifname, u_int ifindex);
int mcast_set_loop(int sockfd, int flag);
int mcast_set_ttl(int sockfd, int ttl);
All above return :0 if ok, -1 on error
int mcast_get_if(int sockfd);
return : nonnegative interface index if OK, -1 error
int mcast_get_loop(int sockfd);
return : current loopback flag if OK, -1 error
int mcast_get_ttl(int sockfd);
return : current TTL or hop limit if OK, -1 error



#include "unp.h"
#include <net/if.h>
int mcast_join(int sockfd, const SA *sa, socklen_t salen, const char *ifname, u_int ifindex)
{
switch (sa->sa_family) {
case AF_INET: {
struct ip_mreq mreq;
struct ifreq ifreq;
memcpy(&mreq.imr_multiaddr, &((struct sockaddr_in *) sa)->sin_addr,
sizeof(struct in_addr));
if (ifindex > 0) {
if (if_indextoname(ifindex, ifreq.ifr_name) == NULL) {
errno = ENXIO; /* i/f index not found */
return(-1);
}
goto doioctl;
} else if (ifname != NULL) {
strncpy(ifreq.ifr_name, ifname, IFNAMSIZ);
doioctl:
if (ioctl(sockfd, SIOCGIFADDR, &ifreq) < 0)
return(-1);
memcpy(&mreq.imr_interface,
&((struct sockaddr_in *) &ifreq.ifr_addr)->sin_addr,
sizeof(struct in_addr));
} else
mreq.imr_interface.s_addr = htonl(INADDR_ANY);
return(setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)));
}
#ifdef IPV6
case AF_INET6: {
    struct ipv6_mreq mreq6;
    memcpy(&mreq6.ipv6mr_multiaddr, &((struct sockaddr_in6 *) sa)->sin6_addr,
           sizeof(struct in6_addr));
    /* braces keep the final else paired with the outer if/else-if chain */
    if (ifindex > 0) {
        mreq6.ipv6mr_interface = ifindex;
    } else if (ifname != NULL) {
        if ( (mreq6.ipv6mr_interface = if_nametoindex(ifname)) == 0) {
            errno = ENXIO; /* i/f name not found */
            return(-1);
        }
    } else {
        mreq6.ipv6mr_interface = 0; /* let the kernel choose */
    }
    return(setsockopt(sockfd, IPPROTO_IPV6, IPV6_ADD_MEMBERSHIP,
                      &mreq6, sizeof(mreq6)));
}
#endif
default:
    errno = EPROTONOSUPPORT;
    return(-1);
}
}



#include "unp.h"
int mcast_set_loop(int sockfd, int onoff)
{
switch (sockfd_to_family(sockfd)) {
case AF_INET: {
u_char flag;
flag = onoff;
return(setsockopt(sockfd, IPPROTO_IP, IP_MULTICAST_LOOP,
&flag, sizeof(flag)));
}
#ifdef IPV6
case AF_INET6: {
u_int flag;
flag = onoff;
return(setsockopt(sockfd, IPPROTO_IPV6, IPV6_MULTICAST_LOOP, &flag, sizeof(flag)));
}
#endif
default:
errno = EPROTONOSUPPORT;
return(-1);
}
}



dg_cli function using multicasting



Receiving MBone session announcements

• To receive a multimedia conference on the MBone, a site needs to
know only the multicast address of the conference and the UDP ports
for the conference’s data streams (audio, video).
• SAP (Session Announcement Protocol)
• describes how the announcements are sent (packet headers and how often they are multicast)
• SDP (Session Description Protocol)
• describes the contents of these announcements



• A site wishing to announce a session on the MBone
• periodically sends a multicast packet containing a description of the session to a
well-known multicast group and UDP port

• Sites on the MBone
• run the sdr program, which
• receives these announcements
• provides an interactive user interface that displays the information
• lets the user send announcements

• A sample program
• only receives these session announcements to show an example of a simple
multicast receiving program
The program receiving the periodic SAP/SDP announcements

#include "unp.h"

#define SAP_NAME "sap.mcast.net" /* default group name and port */
#define SAP_PORT "9875"

void loop(int, socklen_t);

int main(int argc, char **argv)
{
    int sockfd;
    const int on = 1;
    socklen_t salen;
    struct sockaddr *sa;

    if (argc == 1)
        sockfd = Udp_client(SAP_NAME, SAP_PORT, (void **) &sa, &salen);
    else if (argc == 4)
        sockfd = Udp_client(argv[1], argv[2], (void **) &sa, &salen);
    else
        err_quit("usage: mysdr <mcast-addr> <port#> <interface-name>");

    Setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
    Bind(sockfd, sa, salen);
    Mcast_join(sockfd, sa, salen, (argc == 4) ? argv[3] : NULL, 0);
    loop(sockfd, salen); /* receive and print */
    exit(0);
}

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


#include "unp.h"

void loop(int sockfd, socklen_t salen)
{
    char buf[MAXLINE+1];
    socklen_t len;
    ssize_t n;
    struct sockaddr *sa;
    struct sap_packet {
        uint32_t sap_header;
        uint32_t sap_src;
        char sap_data[1];
    } *sapptr;

    sa = Malloc(salen);
    for ( ; ; ) {
        len = salen;
        n = Recvfrom(sockfd, buf, MAXLINE, 0, sa, &len);
        buf[n] = 0; /* null terminate */
        sapptr = (struct sap_packet *) buf;
        /* skip the two 32-bit SAP header words; the rest is the SDP text */
        if ( (n -= 2 * sizeof(uint32_t)) <= 0)
            err_quit("n = %d", (int) n);
        printf("From %s\n%s\n", Sock_ntop(sa, len), sapptr->sap_data);
    }
}

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


sending and receiving

• Sends and receives multicast datagrams


• First part
• sends a multicast datagram to a specific group every 5 seconds and the
datagram contains the sender’s hostname and process ID
• Second part
• an infinite loop that joins the multicast group to which the first part is sending
and prints every received datagram

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


#include "unp.h"

void recv_all(int, socklen_t);
void send_all(int, SA *, socklen_t);

int main(int argc, char **argv)
{
    int sendfd, recvfd;
    const int on = 1;
    socklen_t salen;
    struct sockaddr *sasend, *sarecv;

    if (argc != 3)
        err_quit("usage: sendrecv <IP-multicast-address> <port#>");

    sendfd = Udp_client(argv[1], argv[2], (void **) &sasend, &salen);
    recvfd = Socket(sasend->sa_family, SOCK_DGRAM, 0);
    Setsockopt(recvfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
    sarecv = Malloc(salen);
    memcpy(sarecv, sasend, salen);
    Bind(recvfd, sarecv, salen);
    Mcast_join(recvfd, sasend, salen, NULL, 0);
    Mcast_set_loop(sendfd, 0);

    if (Fork() == 0)
        recv_all(recvfd, salen); /* child -> receives */
    send_all(sendfd, sasend, salen); /* parent -> sends */
}

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


#include "unp.h"
#include <sys/utsname.h>

#define SENDRATE 5 /* send one datagram every 5 seconds */

void
send_all(int sendfd, SA *sadest, socklen_t salen)
{
    static char line[MAXLINE]; /* hostname and process ID */
    struct utsname myname;

    if (uname(&myname) < 0)
        err_sys("uname error");
    snprintf(line, sizeof(line), "%s, %d\n", myname.nodename, getpid());

    for ( ; ; ) {
        Sendto(sendfd, line, strlen(line), 0, sadest, salen);
        sleep(SENDRATE);
    }
}

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


#include "unp.h"

void
recv_all(int recvfd, socklen_t salen)
{
    int n;
    char line[MAXLINE+1];
    socklen_t len;
    struct sockaddr *safrom;

    safrom = Malloc(salen);

    for ( ; ; ) {
        len = salen;
        n = Recvfrom(recvfd, line, MAXLINE, 0, safrom, &len);

        line[n] = 0; /* null terminate */

        printf("from %s: %s", Sock_ntop(safrom, len), line);
    }
}

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


SNTP
• NTP is a sophisticated protocol
• SNTP is a simplified version of NTP
• for hosts that do not need the complexity of a complete NTP implementation
• An SNTP client listens for NTP broadcasts or multicasts on all attached networks and
then prints the time difference between the NTP packet and the host’s
current time of day

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Multicast Routing
Dr. Ramakrishna M

Courtesy: http://www.cs.wisc.edu/~pb/640/multicast.ppt

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


One to many communication
• Application level one to many communication

[Figure: multiple unicasts vs. IP multicast, each showing a source S and receivers R]

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Why Multicast
• When sending same data to multiple receivers
• Better bandwidth utilization
• Less host/router processing
• Quicker participation
• Application
• Video/audio broadcast (one sender)
• Video conferencing (many senders)
• Real time news distribution
• Interactive gaming

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


IP multicast service model
• Invented by Steve Deering (PhD. 1991)
• It’s a different way of routing datagrams
• RFC1112 : Host Extensions for IP Multicasting - 1989
• Senders transmit IP datagrams to a "host group"
• “Host group” identified by a class D IP address
• Members of host group could be present anywhere in the Internet
• Members join and leave the group and indicate this to the routers
• Senders and receivers are distinct: i.e., a sender need not be a member
• Routers listen to all multicast addresses and use multicast routing protocols
to manage groups

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


IP multicast group address
• Things are a little tricky in multicast since receivers can be anywhere
• Class D address space
• high-order four bits are 1110 (the three highest bits are set)
• 224.0.0.0 ~ 239.255.255.255
• Allocation is essentially random – any class D can be used
• Nothing prevents an app from sending to any multicast address
• Customers’ end hosts and ISPs are the ones who suffer
• Some well-known addresses have been designated
• RFC1700
• 224.0.0.0 ~ 224.0.0.255
• Standards are evolving
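
The class D test above can be sketched in C as follows. This is a minimal illustration, not part of the original slides; the function name is_class_d is an assumption made for the example. It masks the four high-order bits of the address and compares them with 1110.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical helper (not from the slides).
 * Returns 1 if the dotted-quad string is a class D (multicast) address,
 * 0 if it is a valid IPv4 address outside 224.0.0.0 ~ 239.255.255.255,
 * and -1 if the string cannot be parsed. */
int is_class_d(const char *dotted)
{
    struct in_addr a;
    uint32_t host;

    if (inet_pton(AF_INET, dotted, &a) != 1)
        return -1;
    host = ntohl(a.s_addr);                     /* compare in host byte order */
    return (host & 0xF0000000u) == 0xE0000000u; /* top four bits == 1110 */
}
```

For example, is_class_d("224.0.0.1") yields 1 and is_class_d("192.168.1.1") yields 0. The standard IN_MULTICAST() macro in <netinet/in.h> performs the same comparison.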

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Getting Packets to End Hosts
• We haven’t treated general methods for this yet, but the problem is
that a host has both a unicast and a multicast IP address
• Packets from remote sources are forwarded by IP routers
onto a local network only if the routers know there is at least one recipient
for that group on that network
• Internet Group Management Protocol (IGMP, RFC2236)
• Used by end hosts to signal that they want to join a specific multicast group
• Used by routers to discover what groups have interested member hosts on
each network to which they are attached.
• Implemented directly over IP

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


IGMP – Joining a group
Example: R joins group 224.2.0.1
• R sends an IGMP Membership-Report to 224.2.0.1
• DR receives it. DR will start forwarding packets for 224.2.0.1 to Network A
• DR periodically sends an IGMP Membership-Query to 224.0.0.1
(ALL-SYSTEMS.MCAST.NET)
• R answers with an IGMP Membership-Report to 224.2.0.1
[Figure: R on Network A sends an IGMP Membership-Report; DR connects Network A and Network B and forwards data to 224.2.0.1. R: Receiver, DR: Designated Router]
Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal
IGMP – Leaving a group
Example: R leaves group 224.2.0.1
• R sends an IGMP Leave-Group to 224.0.0.2 (ALL-ROUTERS.MCAST.NET)
• DR receives it.
• DR stops forwarding packets for 224.2.0.1 to Network A if there are no more
224.2.0.1 group members on Network A.
[Figure: R on Network A sends an IGMP Leave-Group; DR connects Network A and Network B, which carries data to 224.2.0.1. R: Receiver, DR: Designated Router]

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


IP multicast routing
• Purpose: share Group information among routers, to implement
better routing for data distribution
• Distribution tree structure
• Source tree vs shared tree
• Data distribution policy
• Opt in (ACK) type vs opt out (NACK) type
• Routing protocols are used in conjunction with IGMP

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Source distribution tree
Notation: (S, G)
S = Source
G = Group
[Figure: source tree rooted at Source S; routers A, B, C, D, E, F; Receiver 1 and Receiver 2]
Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal
Shared distribution tree
Notation: (*, G)
* = all sources
G = Group
[Figure: shared tree with a Shared Root; sources S1 and S2; routers A, B, C, D, E, F; Receiver 1 and Receiver 2]
Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal
Source tree characteristics
• Source tree
• More memory O (G x S ) in routers
• Optimal path from source to receiver, minimizes delay
• Good for
• Small number of senders, many receivers such as radio broadcasting
application

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Shared tree characteristics
• Shared tree
• Less memory O (G) in routers
• Sub-optimal path from source to receiver, may introduce extra delay (source
to root)
• May have duplicate data transfer (possible duplication of a path from source
to root and a path from root to receivers)
• Good for
• Environments where most of the shared tree is the same as the source tree
• Many senders with low bandwidth (e.g., shared whiteboard)

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Data distribution policy
• Opt out (NACK) type
• Start with “broadcasting” then prune branches with no receivers, to create a
distribution tree
• Lots of wasted traffic when there are only a few receivers and they are spread
over wide area
• Opt in (ACK) type
• Forward only to the hosts which explicitly joined to the group
• Latency of join propagation

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Protocol types
• Dense mode protocols
• Assumes dense group membership
• Source distribution tree and NACK type
• DVMRP (distance vector multicast routing protocol)
• PIM-DM (protocol independent multicast, dense mode)
• Example: company-wide announcement
• Sparse mode protocol
• Assumes sparse group membership
• Shared distribution tree and ACK type
• PIM-SM (protocol independent multicast, sparse mode)
• Examples: futurama or a shuttle launch

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


DVMRP
exchange distance vectors
• Each router maintains a ‘multicast routing table’ by exchanging distance
vector information among routers
• First multicast routing protocol ever deployed in the Internet
• Similar to RIP
• Constructs a source tree for each group using reverse path forwarding
• Tree provides a shortest path between source and each receiver
• There is a “designated forwarder” in each subnet
• Multiple routers on the same LAN select designated forwarder by lower metric or
lower IP address (discover when exchanging metric info.)
• Once tree is created, it is used to forward messages from source to
receivers
• If all routers in the network do not support DVMRP then unicast tunnels
are used to connect multicast enabled networks

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


DVMRP
broadcast & prune

• Flood multicast packets to all routers based on the RPF (reverse path forwarding) rule
• Leaf routers check for group members and send a prune message to the upstream router
when no group member is on their network
• An upstream router prunes an interface that has no dependent downstream
router.
• Graft message to create a new branch for late participants
• Restart forwarding after prune lifetime (standard: 2 hours)
• draft-ietf-idmr-dvmrp-v3-09.txt (September 1999)

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


RPF(reverse path forwarding)
• Simple algorithm developed to avoid duplicate packets on multi-
access links
• RPF algorithm takes advantage of the IP routing table to compute a
multicast tree for each source.
• RPF check
• When a multicast packet is received, note its source (S) and interface (I)
• If I belongs to the shortest path from S, forward to all interfaces except I
• Otherwise (I is not on the shortest path back to S), drop the packet
• Packet is never forwarded back out the RPF interface!
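
The RPF check can be sketched in C as below. This is an illustration under stated assumptions, not code from any router: the route structure, the toy longest-prefix lookup, and the interface bitmask are all hypothetical, and addresses are held in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unicast routing entry (host byte order). */
struct route {
    uint32_t prefix;   /* network prefix */
    uint32_t mask;     /* contiguous netmask */
    int      ifindex;  /* outgoing interface, 0..31 */
};

/* Longest-prefix match: interface used to reach `addr`, or -1 if none. */
int lookup_iface(const struct route *tbl, size_t n, uint32_t addr)
{
    int best = -1;
    uint32_t best_mask = 0;

    for (size_t i = 0; i < n; i++) {
        if ((addr & tbl[i].mask) == tbl[i].prefix &&
            (best < 0 || tbl[i].mask > best_mask)) {
            best = tbl[i].ifindex;
            best_mask = tbl[i].mask;
        }
    }
    return best;
}

/* RPF check for a packet from source `src` arriving on interface `in_if`.
 * `all_ifs` is a bitmask of the router's multicast interfaces.
 * Returns the bitmask of interfaces to forward on; 0 means drop. */
uint32_t rpf_forward_set(const struct route *tbl, size_t n,
                         uint32_t src, int in_if, uint32_t all_ifs)
{
    int rpf_if = lookup_iface(tbl, n, src);

    if (rpf_if < 0 || rpf_if != in_if)
        return 0;                      /* arrived off the shortest path */
    return all_ifs & ~(1u << in_if);   /* never back out the RPF interface */
}
```

The design point is that no separate multicast topology is needed for the check: the unicast table already says which interface leads back to the source.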

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


DVMRP (1)
Form a source tree by exchanging metrics
[Figure: source tree; labels: S (Source), DF, R1 (Receiver 1)]

DVMRP (2)
Broadcast
[Figure: source tree carrying datagrams; labels: S (Source), DF, R1 (Receiver 1)]

DVMRP (3)
Prune
[Figure: source tree; legend: datagram, IGMP, DVMRP-Prune; labels: S (Source), DF, R1 (Receiver 1)]

DVMRP (4)
X and Y pruned
[Figure: source tree; labels: S (Source), DF, X, Y, R1 (Receiver 1)]

DVMRP (5)
New member
[Figure: source tree; legend: datagram, IGMP, DVMRP-Graft; labels: S (Source), DF, X, Y, R1 (Receiver 1), R2 (Receiver 2)]

DVMRP (6)
New branch
[Figure: source tree; legend: datagram, IGMP, DVMRP-Graft; labels: S (Source), DF, X, Y, R1 (Receiver 1), R2 (Receiver 2)]
Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal
Internet Group Management Protocol
Objectives
Upon completion you will be able to:

• Know the purpose of IGMP
• Know the types of IGMP messages
• Understand how a member joins a group and leaves a group
• Understand membership monitoring
• Understand how an IGMP message is encapsulated
• Understand the interactions of the modules of an IGMP package
TCP/IP Protocol Suite 1
UNICASTING

TCP/IP Protocol Suite 2


MULTICASTING

TCP/IP Protocol Suite 3


Difference between multicasting and multiple unicasting
• Multicasting starts with one single packet from the source that is duplicated by the
routers. The destination address in each packet is the same for all duplicates.
• In multiple unicasting, several packets start from the source with different
destination addresses. For example, when a person sends an e-mail message to a
group of people, this is multiple unicasting. The e-mail software creates replicas of
the message, each with a different destination address, and sends them one by one.

TCP/IP Protocol Suite 4


Applications of Multicasting

• Teleconferencing
• Distance Learning
• Dissemination of News

TCP/IP Protocol Suite 5


Multicast Address
• A multicast address is a destination address for a group of
hosts that have joined a multicast group.
• A packet that uses a multicast address as a destination can
reach all members of the group unless there are some filtering
restrictions at the receiver.
• Multicast addresses used in IPv4: in classful addressing,
multicast addresses occupied a single block, class D.
In classless addressing the same block is used for this
purpose.
• In other words, the block assigned for multicasting is
224.0.0.0/4. This means that the block has 2^28 = 268,435,456
addresses (224.0.0.0 to 239.255.255.255).

TCP/IP Protocol Suite 6


Internet Group Management Protocol (IGMP)
• Multicast communication means that a sender sends a message to a group
of recipients that are members of the same group.
• Since one copy of the message is sent by the sender, but copied and
forwarded by routers, each multicast router needs to know the list of groups
that have at least one loyal member related to each interface.
• This means that the multicast routers need to collect information about
members and share it with other multicast routers.
• Collection of this type of information is done at two levels: locally and
globally.
• A multicast router connected to a network is responsible for collecting this type
of information locally; the information collected can be globally
propagated to other routers.
• The first task is done by the IGMP protocol; the second task is done by the
multicast routing protocols.
TCP/IP Protocol Suite 7
Position of IGMP in the network layer

TCP/IP Protocol Suite 8


GROUP MANAGEMENT
IGMP is a protocol that manages group membership. The IGMP
protocol gives the multicast routers information about the membership
status of hosts (and routers) connected to the network.

TCP/IP Protocol Suite 9


Note:

IGMP is a group management protocol. It helps a multicast router
create and update a list of loyal members related to each router
interface.

TCP/IP Protocol Suite 10


IGMP MESSAGES
IGMP has three types of messages: the query, the membership report,
and the leave report. There are two types of query messages, general and
special.

The topics discussed in this section include:

Message Format

TCP/IP Protocol Suite 11


IGMP message types

TCP/IP Protocol Suite 12


IGMP message format

TCP/IP Protocol Suite 13


IGMP type field

TCP/IP Protocol Suite 14


IGMP OPERATION
A multicast router connected to a network has a list of multicast
addresses of the groups with at least one loyal member in that network.
For each group, there is one router that has the duty of distributing the
multicast packets destined for that group.

The topics discussed in this section include:

Joining a Group
Leaving a Group
Monitoring Membership

TCP/IP Protocol Suite 15


IGMP operation

TCP/IP Protocol Suite 16


Membership report

TCP/IP Protocol Suite 17


Note:

In IGMP, a membership report is sent twice, one after the other.

TCP/IP Protocol Suite 18


Leave report

TCP/IP Protocol Suite 19


General query message

TCP/IP Protocol Suite 20


Note:

The general query message does not define a particular group.

TCP/IP Protocol Suite 21


Example 1

Imagine there are three hosts in a network as shown in Figure 10.8.

A query message was received at time 0; the random delay time
(in tenths of seconds) for each group is shown next to the
group address. Show the sequence of report messages.

See Next Slide

TCP/IP Protocol Suite 22


Figure 10.8 Example 1

TCP/IP Protocol Suite 23


Example 1 (Continued)

Solution
The events occur in this sequence:
a. Time 12: The timer for 228.42.0.0 in host A expires and a
membership report is sent, which is received by the router and every
host including host B, which cancels its timer for 228.42.0.0.

b. Time 30: The timer for 225.14.0.0 in host A expires and a
membership report is sent, which is received by the router and every
host including host C, which cancels its timer for 225.14.0.0.

c. Time 50: The timer for 238.71.0.0 in host B expires and a
membership report is sent, which is received by the router and every
host.

d. Time 70: The timer for 230.43.0.0 in host C expires and a
membership report is sent, which is received by the router and every
host including host A, which cancels its timer for 230.43.0.0.

Note that if each host had sent a report for every group in its
list, there would have been seven reports; with this strategy
only four reports are sent.
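
The delayed-response strategy in this example can be replayed with a small C sketch. It is a toy model, not an IGMP implementation: the timer structure and function name are hypothetical, and the three timer values marked "assumed" are not given in the text above (Figure 10.8 is not reproduced here), so they were chosen only to be later than the report that cancels them.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TIMERS 32

/* One pending report: host, group address, delay in tenths of seconds. */
struct igmp_timer {
    char        host;
    const char *group;
    int         expires;
};

/* Timers for Example 1; entries marked "assumed" are illustrative values. */
static const struct igmp_timer example1[] = {
    { 'A', "228.42.0.0", 12 },
    { 'A', "225.14.0.0", 30 },
    { 'A', "230.43.0.0", 80 },   /* assumed */
    { 'B', "228.42.0.0", 48 },   /* assumed */
    { 'B', "238.71.0.0", 50 },
    { 'C', "225.14.0.0", 62 },   /* assumed */
    { 'C', "230.43.0.0", 70 },
};

/* Fire timers in time order; a report for a group cancels every other
 * pending timer for that group. Returns the number of reports sent,
 * or -1 if n is out of range. */
int simulate_reports(const struct igmp_timer *t, int n, int verbose)
{
    int done[MAX_TIMERS] = { 0 };
    int sent = 0;

    if (n < 0 || n > MAX_TIMERS)
        return -1;
    for (;;) {
        int next = -1;

        for (int i = 0; i < n; i++)          /* earliest pending timer */
            if (!done[i] && (next < 0 || t[i].expires < t[next].expires))
                next = i;
        if (next < 0)
            break;
        sent++;
        if (verbose)
            printf("time %d: host %c reports %s\n",
                   t[next].expires, t[next].host, t[next].group);
        for (int i = 0; i < n; i++)          /* suppress same-group timers */
            if (strcmp(t[i].group, t[next].group) == 0)
                done[i] = 1;
    }
    return sent;
}
```

Calling simulate_reports(example1, 7, 1) walks through the four events a to d above and returns 4, compared with the seven reports that would be sent without suppression.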

TCP/IP Protocol Suite 25


Note:

The IP packet that carries an IGMP packet has a value of 2 in its
protocol field.

TCP/IP Protocol Suite 26


Note:

The IP packet that carries an IGMP packet has a value of 1 in its
TTL field.

TCP/IP Protocol Suite 27


Destination IP addresses

TCP/IP Protocol Suite 28


Protocol Independent Multicast
• PIM : Protocol Independent Multicast
• Independent of particular unicast routing protocol
• Just assumes one exists
• Pros: simple, less overhead
• Does not require computation of specific routing tables
• Cons: may cause more broadcast-and-prunes (in dense mode)
• Most popular multicast routing protocol today
• Main difference with DVMRP – independence from underlying unicast
routing mechanism
• PIM supports both dense (DM) and sparse (SM) mode operation
• You can locally use either or both modes

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


PIM DM overview(1)
• Assumes that you have lots of folks who want to be part of a group
• Based on broadcast and prune
• Ideal for dense group
• Source tree created on demand based on RPF rule
• If the source goes inactive, the tree is torn down
• Easy “plug-and-play” configuration
• Branches that don’t want data are pruned

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


PIM DM overview(2)
• Grafts used to join existing source tree
• Asserts used to determine the forwarder for multi-access LAN
• Non-RPF point-2-point links are pruned as a consequence of initial
flooding

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


PIM DM Forwarding
• PIM DM interfaces are placed on a “downstream” list for a multicast
group if:
• PIM neighbor is heard on interface
• Host on this interface has just joined the group
• Interface has been manually configured to join group
• Packets are flooded out all interfaces in “downstream” list
• If a PIM neighbor is present, DM assumes EVERYONE wants to receive the
group so it gets flooded to that link

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


PIM Assert Mechanism
• Routers receive packet on an interface in their “downstream” list
• Only one router should continue sending to avoid duplicate packets.
• Routers send “PIM assert” messages
• Compare distance and metric values
• Router with best route to source wins
• If metric & distance equal, highest IP addr wins
• Losing router stops sending (prunes interface)
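
The assert comparison can be written as a three-level tie-break. The sketch below is illustrative only: the structure and function names are hypothetical, and it assumes that "best route" means the lower distance (route preference) first and then the lower metric, with the higher IP address winning a complete tie as stated above.

```c
#include <stdint.h>

/* Hypothetical summary of one router's assert (IP in host byte order). */
struct assert_msg {
    uint32_t distance;  /* preference of the unicast route to the source */
    uint32_t metric;    /* metric of that route */
    uint32_t ip;        /* router's address on the shared LAN */
};

/* Returns >0 if a wins the assert, <0 if b wins, 0 if a and b are equal. */
int assert_compare(const struct assert_msg *a, const struct assert_msg *b)
{
    if (a->distance != b->distance)
        return a->distance < b->distance ? 1 : -1;  /* lower distance wins */
    if (a->metric != b->metric)
        return a->metric < b->metric ? 1 : -1;      /* lower metric wins */
    if (a->ip != b->ip)
        return a->ip > b->ip ? 1 : -1;              /* highest IP wins tie */
    return 0;
}
```

The router for which assert_compare() returns a negative value against its neighbour is the loser and prunes the interface.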

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


PIM DM State Maintenance
• State is maintained by the “flood and prune” behavior of Dense Mode.
• Received multicast packets reset (S,G) entry “expiration” timers.
• When (S,G) entry “expiration” timers count down to zero, the entry is deleted.
• Interface prune state times out causing periodic reflooding and
pruning
• could be as little as 210 seconds

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


PIM-DM(1)
Initial flood of data
[Figure: network with S (Source), routers A, B, C, D, E, F, G, I, and R1 (Receiver 1), R2 (Receiver 2)]

PIM-DM(2)
Prune non-RPF p2p link
[Figure: same network; legend: IGMP, PIM-Prune]

PIM-DM(3)
C and D Assert to determine the forwarder for the LAN; C wins
[Figure: same network; legend: IGMP, PIM-Assert with its own IP address]

PIM-DM(4)
I, E, G send Prune
H sends Join to override G’s Prune
[Figure: same network; legend: IGMP, PIM-Prune, PIM-Join]

PIM-DM(5)
I gets pruned
E’s Prune is ignored (since R1 is a receiver)
G’s Prune is overridden (due to new receiver R2)
[Figure: same network after pruning]

PIM-DM(6)
New receiver, I sends Graft
[Figure: same network with R3 (Receiver 3) added; legend: IGMP, PIM-Graft]

PIM-DM(7)
New branch
[Figure: same network with the grafted branch toward R3 (Receiver 3); legend: IGMP, PIM-Graft]
Raw Sockets
Dr. Ramakrishna M

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Other –
• readv( ) and writev( )
• Read into or write from multiple buffers with one call
• Connection-oriented.
• recvmsg( ) and sendmsg( )
• Most general form of send and receive. Supports multiple buffers and flags.
• Connectionless

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


readv( ) and writev( )
• ssize_t writev ( int filedes, const struct iovec *iov, int iovcnt);
• ssize_t readv ( int filedes, const struct iovec *iov, int iovcnt);
• filedes – socket identifier
• iov – pointer to an array of iovec structures
• iovcnt – number of iovec structures in the array (limited by IOV_MAX: at least 16 under POSIX, 1024 on Linux)
• Return value is # of bytes transferred, or -1 on error

struct iovec {
void * iov_base; // starting address of buffer
size_t iov_len; // size of buffer
};

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


writev( ) example:
• Write a packet that contains multiple separate data elements
• 2 character opcode
• 2 character block count
• 512 character data block

char opcode[2] = "03";
char blkcnt[2] = "17";
char data[512] = "This RFC specifies a standard …";
struct iovec iov[3];
iov[0].iov_base = opcode;
iov[0].iov_len = 2;
iov[1].iov_base = blkcnt;
iov[1].iov_len = 2;
iov[2].iov_base = data;
iov[2].iov_len = 512;
writev (sock, iov, 3); /* iov is already a pointer to the first element */
Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal
writev( ) example:

writev (sock, iov, 3);
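
To try the gather/scatter behavior without a network peer, the sketch below sends the same three fields through a pipe with writev( ) and splits them back into three buffers with readv( ). The pipe and the function name iov_roundtrip are assumptions made for the demonstration; the field sizes follow the example above.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical demo: gather opcode, block count and data into one
 * writev() on a pipe, then scatter them back with one readv().
 * Returns the number of bytes read back (516 expected) or -1 on error. */
ssize_t iov_roundtrip(char *opcode_out, char *blkcnt_out, char *data_out)
{
    char opcode[2] = { '0', '3' };
    char blkcnt[2] = { '1', '7' };
    char data[512];
    struct iovec out[3], in[3];
    int fds[2];
    ssize_t n;

    memset(data, 'x', sizeof(data));
    if (pipe(fds) < 0)
        return -1;

    out[0].iov_base = opcode; out[0].iov_len = sizeof(opcode);
    out[1].iov_base = blkcnt; out[1].iov_len = sizeof(blkcnt);
    out[2].iov_base = data;   out[2].iov_len = sizeof(data);
    n = writev(fds[1], out, 3);              /* one call, three buffers */

    if (n == (ssize_t)(sizeof(opcode) + sizeof(blkcnt) + sizeof(data))) {
        in[0].iov_base = opcode_out; in[0].iov_len = 2;
        in[1].iov_base = blkcnt_out; in[1].iov_len = 2;
        in[2].iov_base = data_out;   in[2].iov_len = 512;
        n = readv(fds[0], in, 3);            /* buffers filled in order */
    } else {
        n = -1;
    }
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

The caller supplies buffers of 2, 2 and 512 bytes. Because readv( ) fills the buffers in array order, the receiver gets the fields already separated, with no copying out of one large buffer.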

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


recvmsg ( ) and sendmsg ( )
• ssize_t recvmsg (int sock, struct msghdr *msg, int flags);
• ssize_t sendmsg (int sock, const struct msghdr *msg, int flags);
• sock – socket identifier
• msg – struct msghdr that includes message to be sent as well as address and
msg_flags
• flags – sockets level flags
• Return value is # of bytes transferred, or -1 on error

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Sockets level flags
Flag            Description                           RECV  SEND
MSG_DONTROUTE   Bypass routing table lookup                  *
MSG_DONTWAIT    Only this operation is nonblocking     *     *
MSG_PEEK        Peek at incoming message               *
MSG_WAITALL     Wait for all the data                  *
MSG_OOB         Send or receive out-of-band data       *     *

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Struct msghdr
struct msghdr {
void *msg_name; // protocol address
socklen_t msg_namelen; // size of protocol address
struct iovec *msg_iov; // scatter / gather array
int msg_iovlen; // # of elements in msg_iov
void *msg_control; // cmsghdr struct
socklen_t msg_controllen; // length of msg_control
int msg_flags; // flags returned by recvmsg
};
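
A minimal way to exercise these fields is to send one datagram over a local socket pair. The sketch below is an illustration, not part of the course code: it assumes an AF_UNIX datagram socketpair so that it runs without privileges or a network, and the function name msg_roundtrip is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical demo: send "HDR:" and "payload" as ONE datagram using a
 * two-element msg_iov, then receive it into `buf` with recvmsg().
 * Returns the number of bytes received or -1 on error; the flags set
 * by recvmsg() are stored in *flags_out when it is non-NULL. */
ssize_t msg_roundtrip(char *buf, size_t buflen, int *flags_out)
{
    char hdr[] = "HDR:";
    char body[] = "payload";
    struct iovec out[2], in;
    struct msghdr smsg, rmsg;
    int sv[2];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    out[0].iov_base = hdr;  out[0].iov_len = strlen(hdr);
    out[1].iov_base = body; out[1].iov_len = strlen(body);
    memset(&smsg, 0, sizeof(smsg));   /* no address, no control data */
    smsg.msg_iov = out;
    smsg.msg_iovlen = 2;
    if (sendmsg(sv[0], &smsg, 0) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    in.iov_base = buf;
    in.iov_len = buflen;
    memset(&rmsg, 0, sizeof(rmsg));
    rmsg.msg_iov = &in;
    rmsg.msg_iovlen = 1;
    n = recvmsg(sv[1], &rmsg, 0);
    if (n >= 0 && flags_out != NULL)
        *flags_out = rmsg.msg_flags;  /* e.g. MSG_TRUNC if buf was too small */

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With a 32-byte buffer the call returns 11 and the buffer holds "HDR:payload". On an unconnected UDP or raw socket, msg_name and msg_namelen would additionally carry the peer address.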

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Msg_flags
Flag               Returned by recvmsg in msg_flags
MSG_EOR            *
MSG_OOB            *
MSG_BCAST          *
MSG_MCAST          *
MSG_TRUNC          *
MSG_CTRUNC         *
MSG_NOTIFICATION   *

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


recvmsg example (ICMP)
char recvbuf[BUFSIZE];
char controlbuf[BUFSIZE];
struct msghdr msg;
struct iovec iov;
sockfd = socket (PF_INET, SOCK_RAW, pr->icmpproto);
iov.iov_base = recvbuf;
iov.iov_len = sizeof(recvbuf);
msg.msg_name = &sin; // sockaddr struct, for sender's IP & port
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = controlbuf;
for ( ; ; ) {
    msg.msg_namelen = sizeof(sin); // value-result fields: reset before each call
    msg.msg_controllen = sizeof(controlbuf);
    n = recvmsg(sockfd, &msg, 0);
    /* process the n bytes now in recvbuf */
}

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


What are Raw Sockets?
• A way to pass information to network protocols other than TCP or
UDP (e.g. ICMP and IGMP)
• A way to implement new IPv4 protocols
• A way to build our own packets (be careful here)

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Why Would We Use Them?
• Allow us to access packets sent over protocols other than TCP / UDP
• Allow us to process IPv4 protocols in user space
• Control, speed, troubleshooting
• Allow us to implement new IPv4 protocols
• Allow us to control the IP header
• Control option fields (beyond setsockopt() )
• Test / control packet fragmentation

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Limitations?
• Reliability Loss
• No Ports
• Nonstandard communication
• No Automatic ICMP
• Raw TCP / UDP unlikely
• Requires root / admin

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


OS Involvement in Sockets
User Space (Socket App)                          Kernel Space (Linux TCP/IP Stack)
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)        Identify socket type: TCP
socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)          Identify socket type: IP
socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP))     Identify socket type: Ethernet

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Normal Socket Operation (TCP)
• Create a socket
• s = socket (AF_INET, SOCK_STREAM, IPPROTO_TCP)
• Bind to a port (optional)
• Identify local IP and port desired and create data structure
• bind (s, (struct sockaddr *) &sin, sizeof(sin))
• Establish a connection to server
• Identify server IP and port
• connect (s, (struct sockaddr *) &sin, sizeof(sin))
• Send / Receive data
• Place data to be sent into buffer
• send (s, buf, strlen(buf), 0);
Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal
Normal Socket Operation (TCP)
User Space (Socket App)   Kernel Space (Linux)                                   Protocol
socket( )                 Create socket; OK                                      TCP
connect( )                Bind to local port, connect to remote port; OK         TCP, IP, Internet
send( )                   Pass data thru local stack to remote port; OK          TCP, IP, Internet

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Raw Sockets Operation (ICMP)
• Create a socket
• s = socket (AF_INET, SOCK_RAW, IPPROTO_ICMP)
• Since there is no port, there is no bind *
• There is no TCP, so no connection *
• Send / Receive data
• Place data to be sent into buffer
• sendto (s, buf, strlen(buf), 0, addr, len);

• * More later

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Raw Sockets Operation (ICMP)
User Space (Socket App)   Kernel Space (Linux)                                   Protocol
socket( )                 Create socket; OK                                      ICMP
sendto( )                 Pass data thru local stack to remote host; OK          IP, Internet

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Create a Raw Socket
• s = socket (AF_INET, SOCK_RAW, protocol)
• IPPROTO_ICMP, IPPROTO_IP, etc.
• Can create our own IP header if we wish
• const int on = 1;
• setsockopt (s, IPPROTO_IP, IP_HDRINCL, &on, sizeof (on));
• Can “bind”
• Since we have no port, the only effect is to associate a local IP address with the raw
socket. (useful if there are multiple local IP addrs and we want to use only 1).
• Can “connect”
• Again, since we have no TCP, we have no connection. The only effect is to associate a
remote IP address with this socket.
Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal
Raw Socket Output
• Normal output performed using sendto or sendmsg.
• Write or send can be used if the socket has been connected
• If IP_HDRINCL not set, starting addr of the data (buf) specifies the first
byte following the IP header that the kernel will build.
• Size only includes the data above the IP header.
• If IP_HDRINCL is set, the starting addr of the data identifies the first
byte of the IP header.
• Size includes the IP header
• Set IP id field to 0 (tells kernel to set this field)
• Kernel will calculate IP checksum
• Kernel can fragment raw packets exceeding outgoing MTU
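
The kernel fills in the IP header checksum, but a checksum inside the payload, such as the ICMP checksum, is left to the application. A minimal sketch of the standard Internet checksum (RFC 1071) is shown below; the function name in_cksum follows common usage, and the byte-wise summing is an implementation choice made here so the result does not depend on host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071): one's-complement of the one's-complement
 * sum of the data taken as big-endian 16-bit words. The result is a
 * numeric value; store it high byte first in the header. */
uint16_t in_cksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {                       /* add 16-bit words */
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)                           /* odd trailing byte, zero padded */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

To use it, zero the checksum field, compute in_cksum() over the whole ICMP message (or over the IP header when it is built by hand), and store the high byte followed by the low byte. Recomputing over data that already contains a correct checksum gives 0, which is how a receiver verifies it.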
Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal
Raw Socket Input
• Received TCP and UDP packets are NEVER passed to a raw socket.
• Most ICMP packets are passed to a raw socket
• (Some exceptions for Berkeley-derived implementations)
• All IGMP packets are passed to a raw socket
• All IP datagrams with a protocol field that the kernel does not
understand (process) are passed to a raw socket.
• If packet has been fragmented, packet is reassembled before being
passed to raw socket

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Conditions that include / exclude passing to
specific raw sockets
• If a nonzero protocol is specified when raw socket is created,
datagram protocol must match
• If raw socket is bound to a specific local IP, then destination IP must
match
• If raw socket is “connected” to a foreign IP address, then the source IP
address must match
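
These three tests can be expressed as a single predicate. The sketch below is a model of the rule, not kernel code: both structures and the function name are hypothetical, and the value 0 is used to mean "not specified" for the protocol, the bound address, and the connected address.

```c
#include <stdint.h>

/* Hypothetical description of a raw socket; 0 means "wildcard". */
struct raw_sock {
    int      protocol;      /* third argument to socket() */
    uint32_t bound_ip;      /* local address set by bind() */
    uint32_t connected_ip;  /* foreign address set by connect() */
};

/* Hypothetical summary of a received IP datagram. */
struct dgram {
    int      protocol;      /* IP header protocol field */
    uint32_t src_ip;
    uint32_t dst_ip;
};

/* Returns 1 if a copy of the datagram is delivered to this raw socket. */
int raw_sock_accepts(const struct raw_sock *s, const struct dgram *d)
{
    if (s->protocol != 0 && s->protocol != d->protocol)
        return 0;           /* protocol specified but different */
    if (s->bound_ip != 0 && s->bound_ip != d->dst_ip)
        return 0;           /* bound to another local address */
    if (s->connected_ip != 0 && s->connected_ip != d->src_ip)
        return 0;           /* connected to another peer */
    return 1;
}
```

A raw socket created with protocol 0 and neither bound nor connected therefore passes all three tests for every datagram the kernel hands to raw sockets.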

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Summary
• Raw Sockets allow access to Protocols other than the standard TCP
and UDP
• Performance and capabilities may be OS dependent.
• Some OSs block the ability to send packets that originate from raw sockets
(although reception may be permitted).
• Raw sockets remove the burden of the complex TCP/IP protocol stack,
but they also remove the safeguards and support that those protocols
provide

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Queueing Models
Dr. Ramakrishna M

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Agenda
• The Purpose
• Characteristics of Queueing Systems
• Queueing Notations
• Long-Run Measures
• Steady-State Behavior of Infinite-Population Markovian Models
• Steady-State Behavior of Finite-Population Markovian Models
• Final Summary

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Purpose
• Simulation is often used in the analysis of queueing models.
• A simple but typical queueing model

[Figure: calling population, waiting line, server]

• Queueing models provide the analyst with a powerful tool for designing and evaluating
the performance of queueing systems.
• Typical measures of system performance
• Server utilization, length of waiting lines, and delays of customers
• For relatively simple systems: compute mathematically
• For realistic models of complex systems: simulation is usually required

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Queueing System Examples
System Customers Server

Reception desk People Receptionist

Hospital Patients Nurses

Airport Airplanes Runway

Production line Cases Case-packer

Road network Cars Traffic light

Grocery Shoppers Checkout station

Computer Jobs CPU, disk, CD

Network Packets Router

Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal


Characteristics of Queueing Systems
• Key elements of queueing systems:
• Customer: refers to anything that arrives at a facility and requires service,
e.g., people, machines, trucks, emails.
• Server: refers to any resource that provides the requested service, e.g., repair
persons, retrieval machines, runways at airport.
[Figure: calling population (arrival process, customer arrivals), waiting line (system capacity, queue behavior, queue discipline), server(s) (service times, service mechanisms)]
Dr. Ramakrishna M, Dept. of I&CT, MIT, MAHE, Manipal
Calling Population
• Calling population: the population of potential customers; it may be assumed to be finite or infinite.
• Finite population model: the arrival rate depends on the number of customers being served and waiting, e.g., with a single corporate jet as the only customer, the arrival rate of repair requests becomes zero while the jet is being repaired.
• Infinite population model: the arrival rate is not affected by the number of customers being served and waiting, e.g., systems with a large population of potential customers.



System Capacity
• System Capacity: a limit on the number of customers that may be in
the waiting line or system.
• Limited capacity, e.g., an automatic car wash only has room for 10 cars to wait
in line to enter the mechanism.


• Unlimited capacity, e.g., concert ticket sales with no limit on the number of
people allowed to wait to purchase tickets.



Arrival Process
• For infinite-population models:
• In terms of interarrival times of successive customers.
• Arrival types:
• Random arrivals: interarrival times usually characterized by a probability
distribution.
• Most important model: Poisson arrival process (with rate λ), where A_n represents the interarrival time between customer n-1 and customer n and is exponentially distributed (with mean 1/λ).
• Scheduled arrivals: interarrival times can be constant or constant plus or minus a
small random amount to represent early or late arrivals.
• Example: patients to a physician or scheduled airline flight arrivals to an airport
• At least one customer is assumed to always be present, so the server is
never idle, e.g., sufficient raw material for a machine.
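The Poisson arrival model above can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the function name, the rate of 2.0 customers per time unit, and the seed are assumptions chosen for the example.

```python
import random

def poisson_arrival_times(rate, n, seed=0):
    """Return the first n arrival times of a Poisson process with the given rate.

    Each interarrival time A_n is drawn from an exponential distribution
    with mean 1/rate, as stated for the Poisson arrival model.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    t = 0.0
    times = []
    for _ in range(n):
        t += rng.expovariate(rate)  # exponential interarrival time
        times.append(t)
    return times

# Assumed example: rate of 2 customers per unit time, 10000 arrivals.
arrivals = poisson_arrival_times(rate=2.0, n=10000)
mean_gap = arrivals[-1] / len(arrivals)
print(f"observed mean interarrival time: {mean_gap:.3f} (model mean 1/rate = 0.500)")
```

The observed mean gap should come out close to 1/λ; a scheduled-arrival model would replace the exponential draw with a constant plus a small random offset.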



Arrival Process
• For finite-population models:
• Define a customer as “pending” when the customer is outside the queueing system, e.g., in the machine-repair problem a machine is “pending” when it is operating, and it becomes “not pending” the instant it demands service from the repairman.
• Define the “runtime” of a customer as the length of time from departure from the queueing system until that customer’s next arrival to the queue, e.g., in the machine-repair problem, machines are customers and a runtime is a time to failure.
• Let A1(i), A2(i), … be the successive runtimes of customer i, and S1(i), S2(i), … be the corresponding successive system times; that is, Sn(i) is the total time spent in the system by customer i during the nth visit.

Arrival Process
• Finite Population Models

• The total arrival process is the superposition of the arrival times of all customers.
• One important application of finite models is the machine-repair problem. Machines are the customers and runtime is time to failure. When a machine fails, it “arrives” at the queueing system and remains there until it is served. Time to failure has been characterized by the exponential, Weibull, and gamma distributions.



Queue Behavior and Queue Discipline
• Queue behavior: refers to the actions of customers while in a queue waiting for service to begin, for example:
• Balk: leave when they see that the line is too long (forced or unforced),
• Renege: leave after being in the line when it is moving too slowly,
• Jockey: move from one line to a shorter line.

• Queue discipline: refers to the logical ordering of customers in a queue that determines which customer is chosen for service when a server becomes free, for example:
• First-in-first-out (FIFO)
• Last-in-first-out (LIFO)
• Service in random order (SIRO)
• Shortest processing time first (SPT)
• Service according to priority (PR). (e.g., type, class, priority)



Service Times and Service Mechanism
• Service times of successive arrivals are denoted by S1, S2, S3, …
• May be constant or random.
• {S1, S2, S3, …} is usually characterized as a sequence of independent and
identically distributed random variables,
• e.g., exponential, Weibull, gamma, lognormal, and truncated normal distribution.
• Sometimes, service times are identically distributed for all customers of a given type, class, or priority, whereas customers of different types might have completely different service-time distributions.
• In some systems, service times depend upon the time of the day or upon the
length of waiting line (e.g., servers might work faster than usual if waiting
times are long, effectively reducing service times)



Service Times and Service Mechanism
• A queueing system consists of a number of service centers and
interconnected queues.
• Each service center consists of some number of servers, c, working in parallel
• Upon getting to the head of the line, a customer takes the first available server.
• Parallel service mechanisms are either single server (c = 1), multiple server (1 < c < ∞), or unlimited servers (c = ∞)
• A self-service facility is usually characterized by an unlimited number of servers



Queuing System: Example 1
• Example: consider a discount warehouse where customers may
• serve themselves before paying at the cashier (service center 1) or
• be served by a clerk (service center 2)



Queuing System: Example 1
• Wait for one of the three clerks:
• Batch service (a server serving several customers simultaneously), or
customer requires several servers simultaneously.



Queuing System: Example 1



Queuing System: Example 2
• Candy production line
• Three machines separated by buffers
• Buffers have capacity of 1000 candies

Assumption: there is always a sufficient supply of raw material.

Optical Links
Dr. Ramakrishna M

Ref: High Performance Communication Networks by Jean Walrand and Pravin Varaiya

Why Optical?
• Greater Bandwidth
• Reliable
• Long Distance Communication



The advantages of fiber optic over wire cable
• Thinner
• Higher carrying capacity
• Less signal degradation
• Light signal
• Low power
• Flexible
• Non-flammable
• Lightweight



Disadvantages over copper cable
• Optical fiber is more expensive per meter than copper
• Optical fiber cannot be joined as easily as copper cable. It requires training and expensive splicing and measurement equipment.



History of Fiber Optics
• John Tyndall demonstrated in 1870 that light can be guided by internal reflection
• Total internal reflection is the basic idea of fiber optics



Total internal reflection
• Optical fibers work on the principle of total internal reflection
• Each medium is characterized by its refractive index n
• The angle of refraction at the interface between two media is governed by Snell’s law:
$n_1 \sin\theta_1 = n_2 \sin\theta_2$
• Snell’s law describes the relationship between the angles of incidence and refraction when light or other waves pass through a boundary between two different isotropic media, such as water, glass, or air.
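As a hedged illustration of Snell's law and total internal reflection, the sketch below computes the refraction angle and the critical angle. The function names and the refractive indices 1.48 (core) and 1.46 (cladding) are assumed example values, not figures from the slides.

```python
import math

def refraction_angle(n1, n2, theta1_deg):
    """Refraction angle in degrees from n1*sin(theta1) = n2*sin(theta2).

    Returns None when no refracted ray exists, i.e. the light is
    totally internally reflected.
    """
    s = n1 * math.sin(math.radians(theta1_deg)) / n2
    if s > 1.0:
        return None  # total internal reflection
    return math.degrees(math.asin(s))

def critical_angle(n1, n2):
    """Smallest incidence angle (degrees) giving total internal reflection."""
    if n1 <= n2:
        raise ValueError("total internal reflection needs n1 > n2")
    return math.degrees(math.asin(n2 / n1))

# Assumed illustrative indices for a fiber core and its cladding.
N_CORE, N_CLAD = 1.48, 1.46
print("critical angle (deg):", round(critical_angle(N_CORE, N_CLAD), 2))
print("incidence 60 deg ->", refraction_angle(N_CORE, N_CLAD, 60.0))
print("incidence 85 deg ->", refraction_angle(N_CORE, N_CLAD, 85.0))
```

Rays that hit the core-cladding boundary at more than the critical angle (measured from the normal) stay in the core, which is the guiding mechanism the slide refers to.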
Optical Transmission
[Figure: electrical signal → optical fibre transmission system → optical signal → optical fibre transmission system → electrical signal]

Advantages of optical transmission:


1. Longer distance (noise resistance and less attenuation)
2. Higher data rate (more bandwidth)
3. Lower cost/bit



Optical Networks
• Passive Optical Network (PON)
• Fiber-to-the-home (FTTH)
• Fiber-to-the-curb (FTTC)
• Fiber-to-the-premises (FTTP)
• Metro Networks (SONET)
• Metro access networks
• Metro core networks
• Transport Networks (DWDM)
• Long-haul networks



Optical Network Architecture
[Figure: layered architecture with a DWDM long-haul transport network at the top, SONET metro networks below it, PON access networks below those, and CPE (customer premises) at the bottom]



All-Optical Networks
• Most optical networks today are EOE (electrical/optical/electrical)
• All-optical means no electrical components
• The goal is to transport and switch packets photonically.
• Transport: no problem, been doing that for years
• Label Switch
• Use wavelength to establish an on-demand end-to-end path
• Photonic switching: many patents, but how many products?



Optical Signal
• Wavelength (λ): the length of one wave, measured in nanometers (nm), 10^-9 m
• 400 nm (violet) to 700 nm (red) is visible light
• Fiber optics primarily use 850, 1310, and 1550 nm
• Frequency (f): measured in terahertz (THz), 10^12 Hz
• Speed of light = 3×10^8 m/s
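Wavelength and frequency are tied together by f = c/λ. The short sketch below applies that relation using the slide's value c = 3×10^8 m/s; the function name is an assumption made for the example.

```python
C_M_PER_S = 3.0e8  # speed of light as used on the slides

def wavelength_nm_to_ghz(wavelength_nm):
    """Convert a wavelength in nanometers to a frequency in GHz via f = c / wavelength."""
    wavelength_m = wavelength_nm * 1e-9
    return C_M_PER_S / wavelength_m / 1e9

for nm in (850, 1310, 1550):
    print(f"{nm} nm -> {wavelength_nm_to_ghz(nm):,.1f} GHz")

# Frequency covered by 1 nm of spectrum around 1550 nm.
ghz_per_nm = wavelength_nm_to_ghz(1550) - wavelength_nm_to_ghz(1551)
print(f"about {ghz_per_nm:.0f} GHz per nm near 1550 nm")
```

Because f is inversely proportional to λ, the GHz-per-nm figure is only valid near the wavelength where it is evaluated.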



Optical Spectrum

• Light
• Ultraviolet (UV)
• Visible
• Infrared (IR)
• Communication wavelengths
• 850, 1310, 1550 nm
• Low-loss wavelengths
• Bandwidth: 1550 nm corresponds to 193,548.4 GHz and 1551 nm to 193,423.6 GHz, so 1 nm is about 125 GHz near 1550 nm
[Figure: optical spectrum from UV through visible to IR, with 850 nm, 1310 nm, and 1550 nm marked]

Optical Fiber

• An optical fiber is made of three sections:
• The core carries the light signals
• The cladding keeps the light in the core
• The coating protects the glass
[Figure: fiber cross-section showing core, cladding, and coating]



Optical Fiber (cont.)
• Single-mode fiber
• Carries light pulses by laser along single path
• Multimode fiber
• Many pulses of light generated by LED travel at different angles

SM: core = 8.3 µm, cladding = 125 µm

MM: core = 50 or 62.5 µm, cladding = 125 µm



Bending of light ray



Propagation modes



Propagation modes



Fiber construction



Fiber-optic cable connectors



Fiber Installation
• Don’t squeeze support straps too tight.
• Pull cables by hand with even pressure; no jerking.
• Avoid splices.
• Make sure the fiber is dark when working with it.
• Broken pieces of fiber are VERY DANGEROUS! Do not ingest!



Optical Transmission Effects
[Figure: a transmitted data waveform compared with the same waveform after 1000 km, degraded by attenuation, dispersion and nonlinearity, and distortion]



Optical Transmission Effects
Attenuation:
Loss of transmission power due to long distance

Dispersion and Nonlinearities:
Erode clarity with distance and speed

Distortion:
Due to signal detection and recovery



Transmission Degradation
Comparing the ingress signal with the egress signal:

Degradation                                 Remedy
Loss of energy                              Optical amplifier
Shape distortion                            Dispersion Compensation Unit (DCU)
Phase variation, loss of timing (jitter)    Optical-Electrical-Optical (OEO) cross-connect



Communication Components
• Bandwidth of optical fiber is about 25,000 GHz
• Components of Optical Network
• Transmitter: input signal to optical signal
• OOK – On-Off Keying
• SCM – Subcarrier Multiplexing
• Fiber: transmission over large distance
• Receiver: optical signal to electrical signal
[Figure: electrical signal → modulator (driven by an optical source) → optical fiber → receiver → electrical signal]

• A link is characterized by (B, L) where
• B: bit rate (bps)
• L: maximum distance for which BER < 10^(-12)
• To build a communication system that can transmit BT bps over a distance of LT km using (B, L) optical links, we need BT/B parallel systems, each with LT/L links in series.
• Thus we need (BT × LT) / (B × L) optical links.
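The link count above can be worked through numerically. In the sketch below the function name and the example figures (2.5 Gbps links reaching 100 km, a 10 Gbps target over 400 km) are assumptions for illustration; rounding each ratio up with a ceiling is also an added assumption, since a fractional link cannot be built.

```python
import math

def optical_links_needed(b_link, l_link, b_target, l_target):
    """Number of (B, L) links for a target bit rate and distance.

    parallel = ceil(BT / B) systems side by side,
    series   = ceil(LT / L) links end to end in each system.
    """
    parallel = math.ceil(b_target / b_link)
    series = math.ceil(l_target / l_link)
    return parallel * series

# Assumed example: (B, L) = (2.5 Gbps, 100 km), target 10 Gbps over 400 km.
links = optical_links_needed(b_link=2.5, l_link=100.0, b_target=10.0, l_target=400.0)
print("links needed:", links)
```

When both ratios are whole numbers the result equals (BT × LT) / (B × L) exactly; otherwise the ceiling makes the count slightly larger.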



Optical Links



Multi-Protocol Label Switch
(MPLS)
Dr. Ramakrishna M

Ref: Communication Networks by Leon-Garcia (Chapter 10.5)



Outline
• Introduction
• MPLS Terminology
• MPLS Operation
• Label Encapsulation
• Label Distribution Protocol (LDP)
• Any Transport Service over MPLS
• MPLS Applications
• Traffic Engineering
• MPLS-based VPN
• MPLS and QoS
• Summary



Why MPLS?
• Growth and evolution of the Internet
• The need for network convergence to support both voice and data services
on both carrier and enterprise networks
• The need for advanced and guaranteed service over the Internet
• The need for virtual circuits, but without the complexity of provisioning and managing them.
• PVC: too much provisioning and management work
• SVC: [signaling] too complex to support and troubleshoot
• The need for an efficient transport mechanism
• routing: flexibility
• forwarding: price/performance
• Can we forward IP packets? Answer: MPLS
• MPLS offers the performance and service of Layer 2 with the management of Layer 3

Motivation for Carriers
• Network convergence
• Single network to support voice and data traffic
• Ease of network management
• to provision new services
• to support various Service Level Agreements (SLA)
• Ease of Traffic Engineering
• To reroute during node failures or network congestion



Motivation for Enterprises
• Network convergence
• Single network for voice and data
• A meshed topology (any-to-any) without the nightmare of cost and
management
• Confusion with too many Frame Relay PVCs
• Quality of Service (QoS) for intranet
• Ease of bandwidth management
• Flexibility of bandwidth provisioning



MPLS History
• IP over ATM
• IP Switching by Ipsilon (Networking company)
• Cell Switching Router (CSR) by Toshiba
• Tag switching by Cisco
• Aggregate Route-based IP Switching (IBM)
• IETF – MPLS
• http://www.ietf.org/html.charters/mpls-charter.html
• RFC3031 – MPLS Architecture
• RFC2702 – Requirements for TE over MPLS
• RFC3036 – LDP Specification
• over 113 RFCs related to MPLS



MPLS and OSI
(MPLS is a layer 2.5 protocol)
Protocol stack, top to bottom:
Applications
TCP / UDP
IP
MPLS
PPP / FR / ATM / Ethernet / DWDM
Physical

When a layer is added, no modification is needed on the existing layers.



MPLS and OSI
(MPLS is a layer 2.5 protocol)



Label Switching
(This is not new!)
• ATM: VPI/VCI
• Frame Relay: DLCI
• X.25: LCI (logical Channel Identifier)
• TDM: the time slot (Circuit Identification Code)
• Ethernet switching: MAC Address



Label Substitution (swapping)

Label-A1 Label-B1

Label-A2 Label-B2

Label-A3 Label-B3

Label-A4 Label-B4



MPLS
• A protocol to establish an end-to-end path from the source to the destination
• A hop-by-hop forwarding mechanism
• Uses labels to set up the path
• Requires a protocol to set up the labels along the path
• Supports multi-level label transport
• It builds a connection-oriented service on the IP network
• Note: ATM and Frame Relay also support connection-oriented services, but IP does not.



Terminology
• LSR – a router that supports MPLS is called a Label Switch Router
• LER – an LSR at the edge of the network is called a Label Edge Router (a.k.a. Edge LSR)
• Ingress LER is responsible for adding labels to unlabeled IP packets.
• Egress LER is responsible for removing the labels.
• Label Switched Path (LSP) – the path defined by the labels through LSRs between two LERs.
• Label Forwarding Information Base (LFIB) – a forwarding table that maps incoming labels to outgoing labels and interfaces.
• Forwarding Equivalence Class (FEC) – a group of IP packets that follow the same path on the MPLS network and receive the same treatment at each node.



How does it work?
[Figure: an unlabeled IP packet reaches the ingress LER, which adds label #L1 (IP routing); two LSRs swap the label, #L1 → #L2 → #L3 (label switching); the egress LER removes the label and forwards the packet by IP routing]



MPLS Operation

Label Path: R1 => R2 => R3 => R4



Label Forwarding Information Base (LFIB)
Router   Incoming   Incoming    Destination      Outgoing    Outgoing
         Label      Interface   Network (FEC)    Interface   Label
R1       ---        E0          172.16.1.0       S1          6
R2       6          S0          172.16.1.0       S2          11
R3       11         S0          172.16.1.0       S3          7
R4       7          S1          172.16.1.0       E0          ---
Note: the label switch path is unidirectional.
Q: create LFIB for R4 => R3 => R2 => R1
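The forward direction R1 => R2 => R3 => R4 can be traced with a small sketch. The dictionary layout and function name are assumptions made for illustration; the labels and interfaces are the ones in the LFIB table, and `None` stands for "no label" at the edges.

```python
# One LFIB entry per router for FEC 172.16.1.0, taken from the table.
LFIB = {
    "R1": {"in_label": None, "in_if": "E0", "out_if": "S1", "out_label": 6},
    "R2": {"in_label": 6, "in_if": "S0", "out_if": "S2", "out_label": 11},
    "R3": {"in_label": 11, "in_if": "S0", "out_if": "S3", "out_label": 7},
    "R4": {"in_label": 7, "in_if": "S1", "out_if": "E0", "out_label": None},
}
LABEL_PATH = ["R1", "R2", "R3", "R4"]

def trace_lsp(path, lfib):
    """Follow a packet along the path, swapping the label at every hop.

    Returns a list of (router, outgoing interface, outgoing label).
    Raises ValueError if a router receives a label it has no entry for.
    """
    label = None  # the packet enters the ingress LER unlabeled
    hops = []
    for router in path:
        entry = lfib[router]
        if entry["in_label"] != label:
            raise ValueError(f"{router}: unexpected incoming label {label}")
        label = entry["out_label"]  # push at ingress, swap in core, pop at egress
        hops.append((router, entry["out_if"], label))
    return hops

trace = trace_lsp(LABEL_PATH, LFIB)
for router, out_if, out_label in trace:
    print(router, "->", out_if, "label", out_label)
```

Because the label switched path is unidirectional, the reverse path asked for in the question needs its own set of entries; this table cannot simply be read backwards.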
Label Encapsulation
• Label information can be carried in a packet in a variety of ways:
• A small, shim label header inserted between the Layer 2 and network layer
headers.
• As part of the Layer 2 header, if the Layer 2 header provides adequate
semantics (such as ATM).
• As part of the network layer header (future, such as IPv6).

• In general, MPLS can be implemented over any media type, including point-
to-point, Ethernet, Frame Relay, and ATM links. The label-forwarding
component is independent of the network layer protocol.



Shim Header
• The label (shim header) is represented as a sequence of label stack entries
• Each label stack entry is 4 bytes (32 bits)
• 20 bits are reserved for the Label Identifier

Field:  Label Identifier   Exp      S       TTL
Size:   20 bits            3 bits   1 bit   8 bits

Label Identifier: label value (0 to 15 are reserved)
Exp: experimental use
S: bottom of stack (set to 1 for the last entry in the label stack)
TTL: Time To Live
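The 32-bit layout above (20-bit label, 3-bit Exp, 1-bit S, 8-bit TTL) can be packed and unpacked with plain bit shifts. This is a minimal sketch; the function names are assumptions, and network byte order is used for the 4 bytes.

```python
import struct

def pack_entry(label, exp, s, ttl):
    """Pack one label stack entry into 4 bytes (network byte order)."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)

def unpack_entry(data):
    """Return (label, exp, s, ttl) from a 4-byte label stack entry."""
    (word,) = struct.unpack("!I", data)
    return (word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF)

# Label 16 is the first non-reserved value; S=1 marks the bottom of the stack.
entry = pack_entry(label=16, exp=0, s=1, ttl=64)
print(entry.hex(), unpack_entry(entry))
```

A label stack is simply several such entries back to back, with S set to 1 only on the last one.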



MPLS and TTL
• TTL: Time to Live
• In IP, TTL is used to prevent packets from traveling indefinitely in the network.
• MPLS uses the same mechanism as IP.
• Why do we need TTL?
• MPLS may interwork with non-MPLS network.
• TTL is in the label header of PPP and Ethernet (shim header)
• Not supported in ATM.



Forwarding Equivalence Class (FEC) Classification
• When an unlabeled packet arrives at an ingress router, a label has to
be applied. A packet can be mapped to a particular FEC based on the
following criteria:
• destination IP address
• source IP address
• TCP/UDP port
• class of service (CoS) or type of service (ToS)
• application used
• any combination of the previous criteria.
Ingress Label   FEC               Egress Label
6               138.120.6.0/24    9
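Classification by destination IP address, the first criterion in the list, can be sketched as a longest-prefix match. The FEC 138.120.6.0/24 with egress label 9 comes from the table; the second entry (138.120.0.0/16, label 20) and the function name are hypothetical additions so the example has more than one candidate.

```python
import ipaddress

# (FEC prefix, label to apply). Only the first row is from the slide.
FEC_TABLE = [
    (ipaddress.ip_network("138.120.6.0/24"), 9),
    (ipaddress.ip_network("138.120.0.0/16"), 20),  # hypothetical extra FEC
]

def classify(dst_ip):
    """Return the label of the most specific FEC containing dst_ip, or None."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [(net, label) for net, label in FEC_TABLE if addr in net]
    if not matches:
        return None  # no FEC matches; the packet is not label-switched here
    best_net, best_label = max(matches, key=lambda m: m[0].prefixlen)
    return best_label

for ip in ("138.120.6.17", "138.120.7.1", "10.0.0.1"):
    print(ip, "->", classify(ip))
```

A real ingress router could also key on source address, ports, or class of service, as the criteria list says; this sketch covers the destination-only case.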



Transmitter - Light Sources

ILD (injection laser diode)


LED (Light emitting diode)



Transmitter
• Transmitter is modulated source of light.
• LASER diode is light source.
• Light amplification is achieved as photons move back and forth
between two parallel mirrors, triggering forced or stimulated
emission.
• Ideal laser light is formed when the photons in the group are coherent.
• Amplification and coherence create Laser’s highly directional beam.



Transmitter
• Intensity of light can be varied by modulation.
• Transmitter’s limitations are determined by
• Power of the light source (PT)
• Coherence
• Modulation bandwidth
• Laser diodes have an output power of 10 mW and a modulation bandwidth of 3 GHz.
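• Wavelength of the emitted photon is
$\lambda = \dfrac{hc}{E_g} \approx \dfrac{1.24}{E_g\,(\mathrm{eV})}\ \mu\mathrm{m}$, where $E_g$ is the bandgap energy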



Transmitter
• A fiber optic transmitter is a device which includes a LED or laser
source and signal conditioning electronics that is used to inject a
signal into fiber.
• Information is sent from a source to a transmitter by means of an electrical signal. The transmitter then takes that binary data and converts it to a light signal.
• A transceiver is a device which combines the functions of both the
transmitter and receiver.



Fiber
• An optical signal propagates over the fiber.
• Core: High Refractive Index
• Cladding: Lower RI
• Buffer: Protection
• Jacket: Covering

• It gets distorted due to
• Attenuation: reduction in the power of the optical signal
• Dispersion: spreading of the pulse of light.



Attenuation
• Expressed in dB/km.
• Optical power decays exponentially with fiber length.
• a(l) = e^(-α l), l > 0
• 10 log10 (Pt / P(L)) = A × L
where
a(l)   attenuation factor after l km
P(l)   power of the beam after travelling l km
Pt     power of the beam launched into the fiber
A      attenuation of the fiber in dB/km
L      length of the fiber in km
A depends on the fiber material and the wavelength of light.
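Rearranging 10 log10(Pt / P(L)) = A × L gives P(L) = Pt × 10^(-A L / 10). The sketch below evaluates it; the function names and the example figures (1 mW launched, 0.2 dB/km, 50 km) are assumptions chosen so the loss is a round 10 dB.

```python
import math

def power_after(pt_mw, a_db_per_km, length_km):
    """Power in mW left after length_km of fiber with attenuation a_db_per_km."""
    loss_db = a_db_per_km * length_km
    return pt_mw * 10 ** (-loss_db / 10.0)

def mw_to_dbm(p_mw):
    """Express a power in mW as dBm (decibels relative to 1 mW)."""
    return 10.0 * math.log10(p_mw)

# Assumed example: 1 mW into 50 km of 0.2 dB/km fiber, i.e. 10 dB of loss.
p_out = power_after(pt_mw=1.0, a_db_per_km=0.2, length_km=50.0)
print(f"P(L) = {p_out:.3f} mW = {mw_to_dbm(p_out):.1f} dBm")
```

Working in dB turns the exponential decay into a straight subtraction, which is why attenuation is quoted in dB/km.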



Attenuation
• Causes of attenuation
• Rayleigh Scattering
• Vibrational absorption

Consider that the low-loss window near 1.55 μm has a width of 200 nm.
The range of frequencies is c/(λ + 200 nm) to c/λ.
If λ ≈ 1.45 μm, the range of frequencies is 1.818 × 10^14 to 2.068 × 10^14 Hz,
a bandwidth of about 0.25 × 10^14 Hz = 25,000 GHz.



Attenuation



Maximum Usable Length of an optical fiber
• L = (10/A) log 10 (Pt/Pr)

• P (dBm) = 10 log 10 (P In Watts/ 1mW)

• L = (1/A) { Pt (dBm) – Pr (dBm) }



• Assume that Pt = 1 mW (0 dBm) and Pr = -45 dBm at a rate of 1 Gbps and BER = 10^(-12), with A = 0.2 dB/km.
• Calculate L
• L = (0 - (-45)) / 0.2 = 225 km => (B × L) = 225 Gbps × km
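The worked example can be checked with a few lines that apply L = (Pt(dBm) - Pr(dBm)) / A. The function names are assumptions; the input values are the ones given in the example.

```python
import math

def mw_to_dbm(p_mw):
    """Convert power in mW to dBm: 10 log10(P / 1 mW)."""
    return 10.0 * math.log10(p_mw)

def max_usable_length_km(pt_mw, pr_dbm, a_db_per_km):
    """Longest fiber for which the received power still reaches pr_dbm."""
    return (mw_to_dbm(pt_mw) - pr_dbm) / a_db_per_km

length_km = max_usable_length_km(pt_mw=1.0, pr_dbm=-45.0, a_db_per_km=0.2)
bit_rate_gbps = 1.0
print(f"L = {length_km:.0f} km, B x L = {bit_rate_gbps * length_km:.0f} Gbps x km")
```

The 45 dB difference between launched power and receiver sensitivity is the link's power budget; dividing it by the loss per km gives the reach.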



Dispersion
• Suppose transmitter transmits 1 for ‘T’ seconds and 0 for ‘T’ seconds.
• (T = 1/B is bit time and B is bit rate in bps)
• Receiver can see the 0 between two 1’s if the pulse spread is less than
T/4.
• If pulse spread is given as αL and if the condition is to be satisfied,
then
αL < 1/ (4B)
B x L < 1/ (4α)
• Hence dispersion limits the bandwidth-distance product B × L.
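The bound B × L < 1/(4α) can be turned into a quick calculator. In this sketch α is taken as pulse spread in seconds per km, and the value 1 ps/km and the 100 km span are assumed purely for illustration; the function names are also assumptions.

```python
def max_bit_rate_bps(alpha_s_per_km, length_km):
    """Upper bound on bit rate from alpha * L < T/4 = 1/(4B)."""
    return 1.0 / (4.0 * alpha_s_per_km * length_km)

def max_length_km(alpha_s_per_km, bit_rate_bps):
    """Upper bound on distance for a given bit rate, from the same inequality."""
    return 1.0 / (4.0 * alpha_s_per_km * bit_rate_bps)

ALPHA = 1e-12  # assumed pulse spread of 1 ps per km
print(f"B limit over 100 km: {max_bit_rate_bps(ALPHA, 100.0) / 1e9:.2f} Gbps")
print(f"L limit at 1 Gbps:   {max_length_km(ALPHA, 1e9):.0f} km")
```

Both results describe the same constraint: for a fixed α, doubling the distance halves the achievable bit rate.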



Receiver
• Operations involved to detect whether a 1 or 0 is transmitted
• Photodetection: a photodiode converts the optical signal to an electrical photocurrent
• Amplification: the photocurrent is converted into a voltage signal at a usable level
• Filtering: a low-pass filter reduces the noise introduced by the amplifier
• Decision: an equalizer restores the data pulse shape, and the signal is compared with a threshold to decide 1 or 0



Noise sources for decision
• Photodetector Shot Noise: Sum of sequence of impulses that
coincide with random arrival times of photons that constitute the
optical signal.

• Photodetector Dark Current: photocurrent produced even when no external light is impinging on the photodiode (1 to 5 nA).

• Amplifier Thermal Noise: It is white noise produced by amplifier.

• <i²>total = <i²>shot + <i²>dark + <i²>thermal


• Receiver performance is measured by Sensitivity: Minimum received
optical power needed to achieve BER of 10 ^ (-9) at specified bit rate
B.



Subcarrier Multiplexing
• N analog or digital baseband signals modulate different oscillators at
different RF subcarrier frequencies.

• The electrical signal obtained by adding the modulated subcarriers now modulates a single laser.

• At the receiver, direct detection is followed by down-conversion to an intermediate frequency.

• Combine cable TV, telephone, and data networks: fiber to the curb

WDM



WDM

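[Figure: transmitters (T) at wavelengths λ1 to λn feed a multiplexer (MUX); at the far end a demultiplexer (DEMUX) separates λ1 to λn and passes them to the receivers (R)]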



WDM
• WDM divides the window into N channels.

• Light of each wavelength is generated by a separate laser and modulated independently.

• They are then combined and transmitted over the same fiber.

• At the receiver, a filter selects the desired channel, and the signal is demodulated.

WDM
• WDM offers protocol transparency since each wavelength is
modulated independently.

• Hence one wavelength may carry Analog TV signals, other can carry IP
packets.

• Connecting WDM links without electrical signal conversion needs optical cross connects.



Advantages of WDM
• Fewer channels are needed to transmit and receive data.
• Immune to amplitude nonlinearity.



Optical cross connects (OXC)
• Also called frequency or wavelength selective switch.
• Each of N input fibers carries n WDM channels.
• After demultiplexing, the nN channels are fed to an nN × nN space-division switch.
• The switch permutes the channels
• The nN output channels are then re-multiplexed into N output fibers.



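[Figure: input fibers 1 to N are each demultiplexed into wavelengths λ1 to λn at receivers (R); a space-division switch connects them to transmitters (T), whose outputs λ1 to λn are re-multiplexed onto the N output fibers]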



• Some channels may terminate locally, and local channels may be
substituted.
• The switch may not be reconfigurable.
• So we need add-drop multiplexers (ADMs).
• Two light paths that share a common fiber link should not be assigned
the same wavelength.
• So we need Wavelength conversion. This may be based on
• Optical Gating
• Wave-mixing.



Add drop multiplexing

Single fiber connects adjacent multiplexers.



• Optical cross connects permit wavelength routing.

• A virtual light path must be created that spans several links joined by cross connects.

• A light path must carry the same wavelength on every link. This is called the wavelength continuity requirement.
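With wavelength continuity, each light path uses one wavelength on all of its links, so two light paths that share a fiber link must use different wavelengths. The sketch below checks an assignment for such clashes; the light path names, link numbers, and wavelengths are hypothetical and do not describe the network in the slides.

```python
from itertools import combinations

def wavelength_conflicts(lightpaths):
    """Return pairs of light paths that share a link and a wavelength.

    lightpaths maps a name to (list of link ids, wavelength). Each light path
    has a single wavelength, which models the continuity requirement.
    """
    clashes = []
    for (name_a, (links_a, wl_a)), (name_b, (links_b, wl_b)) in combinations(
        lightpaths.items(), 2
    ):
        if wl_a == wl_b and set(links_a) & set(links_b):
            clashes.append((name_a, name_b))
    return clashes

# Hypothetical assignment: all three light paths cross link 3.
LIGHTPATHS = {
    "A-C": ([1, 3, 5], "w1"),
    "B-D": ([2, 3, 4], "w1"),
    "A-E": ([1, 3, 6], "w2"),
}
print(wavelength_conflicts(LIGHTPATHS))
```

A clash like the one reported here is what wavelength conversion at a cross connect is meant to resolve, at the cost of relaxing the continuity requirement.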



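[Figure: example wavelength-routed network with nodes A to E attached to two optical cross connects (OXC1, OXC2) over links numbered 1 to 6, plus a link to the rest of the network]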



Three generations of Optical Fiber

Factor                 First Generation       Second Generation       Third Generation
Laser                  AlGaAs or LED
Pt                     1 mW                   1 mW                    1 mW
Wavelength             0.85 µm                1.3 µm                  1.55 µm
Fiber                  Multimode              Single mode             Single mode
Attenuation            2.5 dB/km              0.4 dB/km               0.25 dB/km
Photodiode             Silicon                InGaAs                  InGaAs
Receiver sensitivity   300 photons per bit    1000 photons per bit    Approx. 1000 photons per bit



Integrated and Differentiated
Services
Dr. Ramakrishna M

Ref: Communication Networks by Alberto Leon-Garcia (Chapter 10)



Quality of Service (QoS): the capability of a network to provide better service (high bandwidth, less delay, low jitter, and low loss probability) to a selected set of network traffic.
Multimedia applications: network audio and video (“continuous media”)

TCP/UDP/IP: “best-effort service”
• no guarantees on delay, loss



QoS Principles
• Traffic Shaping (amount of traffic users can inject into the network)
• Packet Scheduling (users get their share of bandwidth)
• Admission Control (to accept or reject a flow based on flow specifications)



Integrated Services in the Internet
• There is a need to have a model that reserves resources such as
buffers and bandwidth for a given data flow to ensure that the
application receives its requested QoS.

• Hence the solution is…



Integrated Services Router Model



Router Model
• Admission Control: determine whether the device has the necessary resources to accept a new flow
• Packet Classifier: classify the packets to identify QoS requirements
• Packet Scheduler: handle forwarding of different packet flows, ensuring QoS requirements are met



Flow Descriptor
• A flow descriptor is used to describe the traffic and QoS requirements of a flow. The flow descriptor consists of two parts:
• Filterspec: Filter Specification – information required by the packet
classifier to identify the packets that belong to the flow
• Flowspec: Flow Specification
• Tspec: Traffic Specification – traffic behavior – Token Bucket
• Rspec: Service Request Specification – requested QoS – bandwidth, packet
delay, or packet loss
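The Tspec describes traffic with a token bucket. The sketch below shows one common way such a check can work: tokens accumulate at a fixed rate up to a maximum depth, and a packet conforms if enough tokens are available. The class name, parameter names, and the numbers (1000 bytes/s, 1500-byte bucket) are assumptions for illustration, not values from a Tspec in the slides.

```python
class TokenBucket:
    """Token bucket with fill rate in bytes per second and depth in bytes."""

    def __init__(self, rate, depth):
        self.rate = rate
        self.depth = depth
        self.tokens = depth  # the bucket starts full
        self.last = 0.0      # time of the previous update, in seconds

    def conforms(self, now, size):
        """Return True and consume tokens if a packet of `size` bytes conforms."""
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # non-conforming: a policer drops it, a shaper delays it

bucket = TokenBucket(rate=1000.0, depth=1500.0)
results = [
    bucket.conforms(now=0.0, size=1500),  # burst allowed by the full bucket
    bucket.conforms(now=0.1, size=500),   # only about 100 tokens refilled
    bucket.conforms(now=1.0, size=500),   # enough tokens again
]
print(results)
```

The depth bounds the burst size and the rate bounds the long-term average, which is what lets routers reason about the resources a flow needs.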



Two services: Guaranteed and Controlled Load

Guaranteed
• For real-time applications.
• The guaranteed service class provides firm end-to-end delay guarantees.
• This service guarantees both delay and bandwidth.
• To support guaranteed service, each router must know the traffic characteristics of the flow.

Controlled Load
• For applications that are sensitive to overload conditions.
• These applications expect low packet loss and low queuing delay.
• The controlled load service does not make use of specific target values for control parameters such as delay or loss.



Difference

Guaranteed Service                       Controlled Load Service
Provides a bound on the end-to-end       Provides low packet loss and
packet delay for a flow                  low queuing delay
Provides a quantitative guarantee        Provides a qualitative guarantee
Implementation complexity is higher      Implementation complexity is lower



IntServ and ATM: similarities

• Both require signaling
• Both operate on a per-flow basis
• Both use admission control



IntServ and ATM: differences

• ATM is hard state
• connection-oriented nature provides confirmation of setup or denial
• QoS negotiable

• IntServ is soft state
• No sender control
• Guaranteed service determined from Tspec, Rspec; not negotiable for controlled load



What Do Integrated Services Offer?
• QoS guarantees on a per-flow basis
• Intermediate routers keep per-flow state
• Resource reservation protocol (RSVP) for end-to-end signaling
• Admission control: Check if the required resources can be provided
• Policing: check if traffic conforms to profile
• Shaping: modify traffic timing so that it conforms to profile
• Classification: identify packets that are to receive certain level of
service
• Scheduling: isolate flows and support minimum bandwidth



RSVP functions in brief
• RSVP is an IP signaling protocol to setup and maintain flow-specific
state in hosts and routers
• Receiver-oriented
• Receivers initiate and maintain resource reservations
• Multicast-oriented
• Performs resource reservations for point-to-multipoint applications
• Adapts to changing group membership & routes
• Unicast, a special case



• Different Reservation Styles
• Soft-state at intermediate routers
• Reservation valid for specified duration
• Released after timeout, unless first refreshed
• Simplex
• Requests resources from sender to receiver
• Bidirectional flows require separate reservations
• Supports IPv4 and IPv6
• Routers that are not RSVP-capable can forward the flow with best-effort service.
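The soft-state idea (a reservation is valid for a specified duration and is released after a timeout unless refreshed) can be sketched as a table of expiry times. The class, method names, flow identifiers, and the 90-second lifetime are assumptions for illustration; they are not RSVP message formats or timer values from the slides.

```python
class SoftStateTable:
    """Reservation state that disappears unless it is refreshed in time."""

    def __init__(self, lifetime):
        self.lifetime = lifetime  # seconds a reservation stays valid
        self.expiry = {}          # flow id -> time at which the state expires

    def refresh(self, flow_id, now):
        """Install a reservation or extend an existing one."""
        self.expiry[flow_id] = now + self.lifetime

    def expire(self, now):
        """Release and return every reservation whose timer has run out."""
        dead = sorted(f for f, t in self.expiry.items() if t <= now)
        for flow_id in dead:
            del self.expiry[flow_id]
        return dead

table = SoftStateTable(lifetime=90.0)
table.refresh("flow-A", now=0.0)
table.refresh("flow-B", now=0.0)
table.refresh("flow-A", now=60.0)   # only flow-A is refreshed
released = table.expire(now=100.0)  # flow-B timed out at t = 90
print("released:", released, "still reserved:", sorted(table.expiry))
```

No explicit teardown is required in this model: if a receiver leaves or a route changes, the stale state simply times out.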



RSVP Architecture



RSVP (Resource Reservation Protocol) reservation messages carry:
• Session
• Flow Descriptor: describes traffic and QoS requirements
• Filter Spec: information required to identify packets belonging to a flow
• Flow Spec
• Tspec: traffic behavior
• Rspec: QoS in terms of bandwidth, delay



• RSVP Session: a data flow identified by a destination IP address, an IP
protocol number and, optionally, a destination port number.
• A session can consist of multiple senders; each sender in a session is a
source of one or more data flows. Each data flow in a session has the same
multicast address. A specific data flow is identified by a flow identifier
field.
[Figure: senders S1 and S2 multicasting to receivers Rx1, Rx2 and Rx3]
• What RSVP does not do:
• It does not specify how the network provides the reserved bandwidth to the data flows.
• It is not a routing protocol.
• RSVP lays special emphasis on heterogeneous receivers



1. Receiver Initiated Reservation
• Receiver initiates resource reservation
• Main intention is to support multiparty conferencing with
heterogeneous receivers.
• Receiver does not directly know path taken by data packets.
• Sender sends Path Messages to receiver using existing routing
protocol.
• The purpose of this is to store Path state in each node along the path



• On receiving a Path message, the receiver sends Resv messages in a unicast fashion
toward the sender
• Each router records the state of the path
• Each router performs admission and policy control on the Resv message (it sends a
ResvErr message back toward the receiver if the request is rejected)
• When forwarding a Path message, each router stores the address of the previous RSVP
router (PHOP), inserts its own address in the last-hop field and forwards the message,
so that Resv messages can retrace the path in the reverse direction



Path Message

Field             Description
Phop              Address of the previous-hop RSVP-capable node that forwards the Path message
Sender Template   Sender IP address and, optionally, sender port
Sender Tspec      Sender's traffic characteristics
Adspec            Information used to advertise the end-to-end path to receivers



2. Reservation Merging
• Resources are shared among receivers up to point where paths to
different receivers diverge
• RSVP process at nodes will merge requests at node where sufficient
resources are already reserved



3. Reservation Style
• A reservation request specifies a reservation style that indicates
whether senders in the session have distinct or shared resource
reservations and whether senders are selected according to an
explicit list.

• Wildcard filter: single reservation for all senders in a session
• Fixed filter: distinct reservation for each sender
• Shared Explicit filter: single reservation for a specified set of senders
• Example setup: 3 senders and 3 receivers attached to a router



WildCard Filter
• Symbolically represented as WF(*{Q})
• *: wildcard sender selection
• Q: flowspec
• Flowspec is assumed to be one-dimensional, in multiples of a base resource quantity B
• Example application: audio conferencing



Fixed Filter Style
• Symbolically represented as FF(S1{Q1}, S2{Q2}, …)
• Si: selected sender
• Qi: resource request for sender i
• Total reservation on a link is the sum of all Qi's for a given session.
• Example application: video conferencing



Shared Explicit Style
• Symbolically represented as SE(S1,S2,…{Q})
• Si: selected sender
• Q: flowspec
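
The three reservation styles can be compared with a small sketch of how much is reserved on one link. It assumes, as the slides do, that a flowspec is a single number counted in multiples of a base quantity B. The function names and the sample requests are illustrative only, and merging several requests by taking the largest one is an assumption of this sketch.

```python
def wildcard_filter(requests):
    """WF(*{Q}): one reservation shared by all senders.

    requests: list of Q values asked for by downstream receivers.
    """
    return max(requests, default=0)


def fixed_filter(requests):
    """FF(S1{Q1}, S2{Q2}, ...): a distinct reservation per sender.

    requests: list of {sender: Q} dicts, one per downstream receiver.
    Requests for the same sender are merged (largest wins), then the
    per-sender reservations are summed to get the total on the link.
    """
    per_sender = {}
    for request in requests:
        for sender, q in request.items():
            per_sender[sender] = max(per_sender.get(sender, 0), q)
    return sum(per_sender.values())


def shared_explicit(requests):
    """SE(S1,S2,...{Q}): one reservation shared by a listed set of senders.

    requests: list of (senders, Q) pairs, one per downstream receiver.
    Returns the merged sender set and the single shared reservation.
    """
    senders, shared_q = set(), 0
    for sender_set, q in requests:
        senders |= set(sender_set)
        shared_q = max(shared_q, q)
    return senders, shared_q
```

With units of B, `wildcard_filter([4, 2, 3])` reserves 4B once for everybody, while a fixed filter reserves a separate amount for each selected sender and adds them up.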



4. Soft State
• Routers keep state about reservation.
• Periodic messages refresh state.
• Non-refreshed state times out automatically.
• Alternative: Hard state
• No periodic refresh messages.
• State is guaranteed to be there.
• State is kept till explicit removal.
• Properties of soft state:
• Adapts to changes in routes, sources, and receivers.
• Recovers from failures
• Cleans up state after receivers' dropouts



RSVP
• Refresh messages are transmitted once every R seconds (default R = 30).
• Loss is tolerated as long as at least one of K (default 3) consecutive messages
gets through.
• Each Path and Resv message carries a TIME_VALUES object containing the
refresh period R.
• Instead of waiting for state to time out, RSVP can remove it explicitly with teardown messages.
• PathTear: travels like a Path message and deletes path state and dependent
reservation state along the path.
• ResvTear: travels toward the senders, deleting reservation state along the way.
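
A minimal sketch of a soft-state table may help tie the refresh period, the timeout and the teardown messages together. It assumes the state lifetime bound L >= (K + 0.5) * 1.5 * R given in RFC 2205, so with R = 30 and K = 3 an entry survives 157.5 seconds without a refresh. The class and method names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class SoftStateTable:
    """Reservation or path state that disappears unless it is refreshed."""

    def __init__(self, refresh_period=30.0, k=3):
        # Lifetime bound from RFC 2205: tolerate K - 1 lost refreshes.
        self.lifetime = (k + 0.5) * 1.5 * refresh_period
        self._expires = {}

    def refresh(self, key, now):
        """Install state or extend its life (arrival of a Path/Resv refresh)."""
        self._expires[key] = now + self.lifetime

    def teardown(self, key):
        """Remove state at once (arrival of PathTear/ResvTear)."""
        self._expires.pop(key, None)

    def expire(self, now):
        """Drop every entry whose lifetime has run out; return the dropped keys."""
        dead = [key for key, deadline in self._expires.items() if deadline <= now]
        for key in dead:
            del self._expires[key]
        return dead

    def __contains__(self, key):
        return key in self._expires
```

A hard-state design would have only `refresh` (called once) and `teardown`; the `expire` step is what lets soft state clean up after a receiver that silently disappears.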



RSVP Message Format
• Has common header and a body consisting of variable number of
objects
• Version: 1
• Flags: not defined
• Message Type: seven types
• Checksum: 1's complement algorithm
• Send_TTL: IP TTL value with which the message was sent
• RSVP Length: total length of the message in octets, including the header
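
The common header fields listed above can be packed and parsed in a few lines. This sketch assumes the 8-octet layout of RFC 2205 (4-bit version, 4-bit flags, message type, checksum, Send_TTL, one reserved octet, length) and its seven message type numbers; the helper names are hypothetical.

```python
import struct

# Message type numbers as listed in RFC 2205.
MESSAGE_TYPES = {1: "Path", 2: "Resv", 3: "PathErr", 4: "ResvErr",
                 5: "PathTear", 6: "ResvTear", 7: "ResvConf"}

# vers/flags, msg type, checksum, Send_TTL, reserved, RSVP length
HEADER = struct.Struct("!BBHBBH")


def ones_complement_checksum(data):
    """16-bit one's complement of the one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_message(msg_type, send_ttl, body=b""):
    """Return header + body with version 1, no flags and a valid checksum."""
    first = (1 << 4) | 0
    length = HEADER.size + len(body)
    draft = HEADER.pack(first, msg_type, 0, send_ttl, 0, length) + body
    checksum = ones_complement_checksum(draft)
    return HEADER.pack(first, msg_type, checksum, send_ttl, 0, length) + body


def parse_header(packet):
    """Decode the common header of an RSVP message into a dict."""
    first, msg_type, checksum, send_ttl, _reserved, length = HEADER.unpack_from(packet)
    return {
        "version": first >> 4,
        "flags": first & 0x0F,
        "type": MESSAGE_TYPES.get(msg_type, "unknown"),
        "checksum": checksum,
        "send_ttl": send_ttl,
        "length": length,
    }
```

A receiver can verify a message by running `ones_complement_checksum` over the whole packet: a valid message yields 0.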



Object Format
• Length: total object length in octets
• Class-Num: identifies the object class
• C-Type: identifies the subclass of the object
• Object classes: NULL, SESSION, RSVP_HOP, TIME_VALUES, STYLE, FLOWSPEC, FILTER_SPEC,
SENDER_TEMPLATE, SENDER_TSPEC, ADSPEC, ERROR_SPEC, POLICY_DATA,
INTEGRITY, SCOPE, RESV_CONFIRM
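
Since every object starts with the same Length / Class-Num / C-Type header, the body of a message can be walked object by object. The sketch below assumes the RFC 2205 encoding (16-bit length, 8-bit Class-Num, 8-bit C-Type, total length a multiple of 4 octets); `iter_objects` is a hypothetical helper, and the sample object values used to exercise it are illustrative.

```python
import struct

OBJECT_HEADER = struct.Struct("!HBB")  # Length, Class-Num, C-Type


def iter_objects(body):
    """Yield (class_num, c_type, contents) for each object in a message body."""
    offset = 0
    while offset < len(body):
        if len(body) - offset < OBJECT_HEADER.size:
            raise ValueError("truncated object header")
        length, class_num, c_type = OBJECT_HEADER.unpack_from(body, offset)
        # Length covers the 4-octet header and must be a multiple of 4.
        if length < OBJECT_HEADER.size or length % 4 or offset + length > len(body):
            raise ValueError("malformed object length")
        yield class_num, c_type, body[offset + OBJECT_HEADER.size:offset + length]
        offset += length
```

For example, an 8-octet object whose contents are one 32-bit refresh period would be yielded as its Class-Num, its C-Type and the 4 octets holding the period.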



Drawback of IntServ



Differentiated Services
• Offers different levels of performance to different users.
• Scalability: per-flow handling is replaced by per-aggregate handling.
• Complexity: complex processing is moved to the edge of the network.



Differentiated Services
• The DS model aggregates a customer's entire QoS requirement.
• A customer wishing to receive differentiated services must have a
Service Level Agreement (SLA) with its service provider.
• SLA is a service contract between customer and service provider that
specifies the service that customer will receive.



Differentiated Services
• SLA includes a Traffic Conditioning Agreement (TCA) that gives
detailed service parameters such as traffic shaping.
• SLA can be static or dynamic.
• A dynamic SLA uses a “Bandwidth Broker” to effect SLA changes.
• To receive different service levels for different packets, the packets
are marked by assigning values in the DS field.



Differentiated Services
• Different values in the DS field correspond to different packet forwarding
treatments at each router, called per-hop behaviors (PHBs)
• The service provider performs traffic classification (IP address, port
number) and traffic conditioning (marking, shaping, dropping) by
using a traffic classifier and a traffic conditioner, respectively.



DS Field
• A DS capable node uses classifier to select packets based on the value
of DS field and uses buffer management and scheduling mechanisms
to deliver the specific PHB based on selection result.
• Six bits of the DS field are used as the Differentiated Services Code Point (DSCP);
the remaining two bits are unused.
• Default codepoint: 000000
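
As a small illustration of the DS field layout, the sketch below converts between the full DS octet and the 6-bit DSCP. It assumes the DSCP occupies the six most significant bits of the octet (the former IPv4 TOS byte); the function names are hypothetical, and the second constant is the Expedited Forwarding codepoint 101110.

```python
DEFAULT_DSCP = 0b000000  # best-effort default codepoint
EF_DSCP = 0b101110       # Expedited Forwarding codepoint


def dscp_from_ds_field(ds_octet):
    """Extract the 6-bit DSCP from the 8-bit DS field."""
    return (ds_octet >> 2) & 0x3F


def ds_field_from_dscp(dscp):
    """Build a DS octet from a DSCP, leaving the two low bits at zero."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in six bits")
    return dscp << 2
```

On platforms that expose `socket.IP_TOS`, an application could mark its own IPv4 traffic by passing `ds_field_from_dscp(EF_DSCP)` (0xB8) to `setsockopt`; whether the network honours the marking depends on the SLA.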



Per Hop Behavior
• Two PHBs are defined
• 1. Expedited Forwarding PHB (EF PHB): low loss, low latency, low
jitter, assured bandwidth, end-to-end service
• Behaves like a virtual leased line
• The aggregate arrival rate of packets at every node must be less than the
aggregate's minimum allowed departure rate.
• Codepoint: 101110
• Similar to CBR in ATM



Per Hop Behavior
• 2. Assured Forwarding PHB (AF PHB)
• Delivers aggregate traffic from a particular customer with high assurance
• Four independent classes are defined
• Within each AF class, packets are assigned one of three drop
precedence values. If there is congestion, this value determines the
relative importance of the packet.
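
The AF class and drop precedence map onto codepoints in a regular way. The sketch below assumes the recommended AF codepoints of RFC 2597, where AFxy (class x, drop precedence y) has the DSCP value 8x + 2y; the function name is hypothetical.

```python
def af_dscp(af_class, drop_precedence):
    """Return the DSCP for AF<class><drop precedence>, e.g. AF11 -> 0b001010."""
    if af_class not in (1, 2, 3, 4):
        raise ValueError("AF defines classes 1 to 4")
    if drop_precedence not in (1, 2, 3):
        raise ValueError("drop precedence is 1 (low) to 3 (high)")
    # Class goes in the top three bits, drop precedence in the next two.
    return (af_class << 3) | (drop_precedence << 1)


# All twelve AF codepoints as 6-bit binary strings, keyed by name.
AF_TABLE = {
    f"AF{c}{d}": format(af_dscp(c, d), "06b")
    for c in (1, 2, 3, 4)
    for d in (1, 2, 3)
}
```

Under congestion a router would discard AF13 packets before AF12 and AF12 before AF11, while the four classes themselves are forwarded independently of one another.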



Traffic Conditioner
• Elements include
• Meter: Measures traffic to check conformance to a traffic profile, such as
burst size
• Marker: Sets DSCP in packet header.
• Shaper: Delays packets so that they are compliant with the traffic profile
• Dropper: Discards traffic that violates its traffic profile.
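
A token bucket is one common way to implement the meter. The sketch below is an assumption about how conformance could be checked, not a description of any particular router: the profile is a rate in bytes per second plus a burst size in bytes, and a marker, shaper or dropper would act on the returned verdict. Class and parameter names are hypothetical.

```python
class TokenBucketMeter:
    """Checks packets against a (rate, burst) traffic profile."""

    def __init__(self, rate, burst):
        self.rate = rate      # token fill rate, bytes per second
        self.burst = burst    # bucket depth, bytes
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time of the previous packet, seconds

    def conforms(self, size, now):
        """Return True if a packet of `size` bytes arriving at `now` is in profile."""
        # Add the tokens earned since the last packet, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

A dropper would discard packets for which `conforms` returns False, a marker would rewrite their DSCP to a higher drop precedence, and a shaper would hold them until enough tokens have accumulated.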



Bandwidth Broker
• Tracks current allocation of traffic to various services and handles new
requests for service according to organizational policies and current
state of traffic allocation.
• Manages bandwidth
• Maintains policy database
• Authenticates each requester and decides whether there is sufficient
bandwidth
• Maintains bilateral agreements with neighboring domains.



Label Distribution Protocol (LDP)
• Labels are distributed between label edge routers (LERs) and label switching routers (LSRs) using LDP
• LSRs regularly exchange label and reachability information with each
other using standardized procedures
• Used to build a picture of the network that can be used to forward
packets
• Label Switched Paths (LSPs) are created by network operators – similar to PVC
and VPN



MPLS over ATM/Frame Relay/Ethernet
• A majority of MPLS examples are used to carry IP traffic over Ethernet
links
• But MPLS can also carry IP traffic over ATM and frame relay links



MPLS Applications
• Traffic Engineering
• Virtual Private Network (VPN)
• Quality of Service (QoS)



Traffic Engineering
• Traffic engineering allows a network administrator to select the path between two nodes
and bypass the normal routed hop-by-hop paths. An administrator may elect to explicitly
define the path between nodes to ensure QoS or have the traffic follow a specified path
to avoid traffic congestion at certain hops.
• The network administrator can reduce congestion by forcing the frame to travel around
the overloaded segments. Traffic engineering, then, enables an administrator to define a
policy for forwarding frames rather than depending upon dynamic routing protocols.
• Traffic engineering is similar to source-routing in that an explicit path is defined for the
frame to travel. However, unlike source-routing, the hop-by-hop definition is not carried
with every frame. Rather, the hops are configured in the LSRs ahead of time along with
the appropriate label values.
• The administrator could be a centrally located program.
• Traffic engineering is an important tool for network management. It is NOT a customer
service. (So you will not see it on a carrier’s web site.)



MPLS – Traffic Engineering

[Figure: MPLS traffic engineering. LER 1 attaches label L to an IP packet and forwards it along an explicit list of LSRs toward LER 4, steering it around the overloaded links.]
• End-to-end forwarding decision is determined by the ingress node.
• Enables traffic engineering.

MPLS-based VPN
• One of the most popular MPLS applications is the implementation of
VPNs.
• The basic concept is the same as ATM transparent LAN.
• Labels (instead of IP addresses) are used to interconnect multiple sites over
a carrier’s network. Each site has its own private IP address space.
• Different VPNs may use the same IP address space.



MPLS VPN - Example
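[Figure: sites 192.168.1.0, 192.168.2.0, 192.168.3.0 and 192.168.4.0 interconnected through three label-switching routers (left, middle, right), each with interfaces E1, E2, E3. Each table entry reads: in label, in interface -> out label, out interface; "--" means an unlabeled packet.]

Unidirectional LSPs, left to right:
Left              Middle            Right
-- E1 -> 10 E3    10 E1 -> 30 E2    30 E3 -> -- E1
-- E2 -> 20 E3    20 E1 -> 40 E2    40 E3 -> -- E2

Unidirectional LSPs, right to left:
Left              Middle            Right
10 E3 -> -- E1    30 E2 -> 10 E1    -- E1 -> 30 E3
20 E3 -> -- E1    40 E2 -> 20 E1    -- E2 -> 40 E3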
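
The label tables in this example can be exercised with a short label-swapping sketch. It assumes each table entry reads (in label, in interface) -> (out label, out interface), with no label meaning a plain IP packet, and it models only the left-to-right LSPs using labels 10/30 and 20/40. The router names "left", "middle", "right" and the link wiring are assumptions made for illustration.

```python
# (in_label, in_interface) -> (out_label, out_interface); None = unlabeled packet.
TABLES = {
    "left":   {(None, "E1"): (10, "E3"), (None, "E2"): (20, "E3")},   # push
    "middle": {(10, "E1"): (30, "E2"), (20, "E1"): (40, "E2")},       # swap
    "right":  {(30, "E3"): (None, "E1"), (40, "E3"): (None, "E2")},   # pop
}

# Assumed wiring: (router, out_interface) -> (next router, its in_interface).
LINKS = {("left", "E3"): ("middle", "E1"), ("middle", "E2"): ("right", "E3")}


def forward(router, in_interface, label=None):
    """Follow a packet hop by hop; return (router, in_label, out_label, out_if) per hop."""
    trace = []
    while True:
        out_label, out_interface = TABLES[router][(label, in_interface)]
        trace.append((router, label, out_label, out_interface))
        if (router, out_interface) not in LINKS:
            return trace  # packet leaves the MPLS network here
        router, in_interface = LINKS[(router, out_interface)]
        label = out_label
```

Note that no router after the ingress looks at the IP address: the exact-match lookup on the label and incoming interface is what lets different VPNs reuse the same private address space.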



MPLS VPN Connection Model
[Figure: sites of VPN_A and VPN_B attached to MPLS edge routers on either side of a shared MPLS core; the two VPNs reuse overlapping address ranges.]

VPN_A: 10.2.0.0/24, 11.5.0.0/24, 11.6.0.0/24, 10.1.0.0/24
VPN_B: 10.2.0.0/24, 10.1.0.0/24, 10.3.0.0/24
Q: For a meshed connection, how many label paths are needed?
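
One way to approach the question, under the assumption that LSPs are unidirectional and that a full mesh needs one LSP for every ordered pair of sites, is to count n(n - 1) paths for a VPN with n sites. The function name is hypothetical; the site lists are the ones given for VPN_A and VPN_B.

```python
def full_mesh_lsps(sites):
    """Unidirectional LSPs needed so every site can reach every other site."""
    n = len(sites)
    return n * (n - 1)


VPN_A = ["10.2.0.0/24", "11.5.0.0/24", "11.6.0.0/24", "10.1.0.0/24"]
VPN_B = ["10.2.0.0/24", "10.1.0.0/24", "10.3.0.0/24"]

lsps_a = full_mesh_lsps(VPN_A)  # 4 sites
lsps_b = full_mesh_lsps(VPN_B)  # 3 sites
```

Under this assumption VPN_A needs 12 label paths and VPN_B needs 6. A real deployment may need fewer if several sites share an edge router or if paths are aggregated in the core.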



Case Study (I)
AT&T MPLS Private Transport Network Services
• Features and Benefits
• Advanced Management options
• MPLS-based security
• Meshed topology for any-to-any connectivity
• Traffic prioritization: 4 Classes of Service (CoS)
• Service Level Agreements (SLAs)
• Web-based reporting

Case Study (II)
Verizon Private IP Service (MPLS)
• History:
• MCI (Verizon) adopted MPLS on a large scale in 1998 as a traffic engineering technology on its public
Internet backbone

• Features and Benefits:
• Exceptional Service. 24-hour monitoring, customer service, and service level agreements (SLAs).
• Any-to-Any Connectivity. Multiple locations are connected (meshed topology). You no longer need PVCs to
communicate between sites; rather
• Cost-Effective Solution. Private IP Service utilizes existing network infrastructure without building and
operating a private VPN.
• Intranets and Extranets. Private IP Service captures the enhanced networking efficiencies associated with an
IP-based WAN, bringing together all the elements to support e-business applications within the company or
between companies.
• MPLS Technology. Private IP Service provides varying Classes of Service (CoS) and flexible IP routing that
optimize the network’s performance.



High Speed LANs
Dr. Ramakrishna M

Ref: William Stallings, Data and Computer Communications, 7th Edition, Chapter 16



Introduction
• Range of technologies
• Fast and Gigabit Ethernet
• Fibre Channel
• High Speed Wireless LANs



Why High-Speed LANs?
• Office LANs used to provide basic connectivity
• Connecting PCs and terminals to mainframes and midrange systems that ran corporate
applications
• Providing workgroup connectivity at departmental level
• Traffic patterns light
• Emphasis on file transfer and electronic mail
• Speed and power of PCs have risen
• Graphics-intensive applications and GUIs
• MIS organizations recognize LANs as essential
• Began with client/server computing
• Now dominant architecture in business environment
• Intranetworks
• Frequent transfer of large volumes of data



Applications Requiring High Speed LANs
• Centralized server farms
• User needs to draw huge amounts of data from multiple centralized servers
• E.g. Color publishing
• Servers contain tens of gigabytes of image data
• Downloaded to imaging workstations
• Power workgroups
• Small number of cooperating users
• Draw massive data files across network
• E.g. Software development group testing new software version or computer-aided design (CAD)
running simulations
• High-speed local backbone
• Processing demand grows
• LANs proliferate at site
• High-speed interconnection is necessary

