Introduction To Socket Programming-NBV
- A Small Dose of Questions To Know You
- Little Briefing about Unix Internals
- Recapitulation of What is Internet
- Variety of Addresses involved
- Socket Concepts
- Related System Calls
- Simple TCP Client and Server in action
- Simple UDP Client and Server in action
- What is DNS

- What is the difference between data communications and computer networks?
- What is firmware?
- Why do we need to split a message?
- Why do we require so many levels of control? (Is the network reliable?)
- What are physical and logical addresses?
- What is the conceptual difference between the DLL and the NLL?

- What is fork()?
- What is a signal?
- What are a process and a thread?
- What is a device driver?
- What is a daemon?
- What is exec()?
- What are locks?
Internet[work]
- Router: a node that connects distinct networks
- Host: a network endpoint (computer, PDA, light switch, ...)
- Together: an independently administered entity (enterprise, ISP, etc.)
[Figure: an internetwork joining the CS, EE and ME networks over ATM, 802.3 and frame relay links]
Internet vs. internet
- The Internet: the interconnected set of networks of the Internet Service Providers (ISPs) and end networks, providing data communications services.
  - A network of internetworks, and more
  - About 17,000 different ISP networks make up the Internet
  - Many other end networks
  - 100,000,000s of hosts
A link connects two nodes. Links can be wired or wireless.
[Figure: a network of routers (R) with hosts (H) at the edges]
Packets
- Short bursts are absorbed by buffers.
- Maximum packet sizes vary from network to network, so fragmentation takes place.
- What if a buffer overflows? Packets are dropped, and the sender adjusts its rate until load matches the available resources: congestion control.
Problem: packet size
- On Ethernet, the maximum packet is 1.5 KB.
- A typical web page is 10 KB.
Solution: fragment the data across packets.
[Figure: the request "GET index.html" split into the pieces "GET", "inde", "x.ht" and "ml"]
[Figure: a human protocol analogy: a friendly greeting, a muttered reply, then "Destination?"]
The hardware and software of communicating parties are often not built by the same vendor. Yet they can communicate, because they use the same protocol:
- implementations may differ,
- but they must adhere to the same specification.
- Protocols/layers can be implemented and modified in isolation.
- Each layer offers a service to the layer above, using the services of the layer below.
- Peer layers on different systems communicate via a protocol.
- Higher-level protocols (e.g. TCP/IP, AppleTalk) can run on multiple lower layers.
- Multiple higher-level protocols can share a single physical network.
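[Figure: Internet protocol graph, with application protocols FTP, HTTP, NV and TFTP on top of the transport protocols (e.g. UDP)]
- Application protocols
- Two transport protocols: provide logical channels to applications
- Interconnection of network technologies into a single logical network
Note: there is no strict layering. Application writers can define applications that run on any lower-level protocol.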
The Hourglass Model
[Figure: many applications (FTP, HTTP, NV, TFTP, ...) above a narrow waist, with many data link and physical networks (NET1 ... NETn) below it]
The waist: minimal, carefully chosen functions. It facilitates interoperability and rapid evolution.
[Figure: User A sends "Get index.html" to User B; the message carries a header with a connection ID]
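[Figure: TCP running over IP across networks NET1, NET2, ..., NETn; the IP header carries fields such as V/HL (version/header length), TOS, ID, TTL and Prot. (protocol)]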
TCP
Telephone call:
- guaranteed delivery
- in-order delivery
- set up a connection, followed by conversation
TCP:
- reliable, guaranteed delivery
- in-order byte-stream delivery
- checksum for validity
- set up a connection, followed by data transfer
UDP
UDP:
- no guarantee of delivery
- not necessarily in-order delivery
- no validity guarantee beyond an optional checksum
- each independent packet must be addressed
Postal mail:
- unreliable
- not necessarily in-order delivery
- each reply must be addressed
Application              Bandwidth                                Time sensitive
file transfer            elastic                                  no
e-mail                   elastic                                  no
web documents            elastic                                  no
real-time audio/video    audio: 5 Kbps-1 Mbps; video: 10 Kbps-5 Mbps   yes, 100s of msec
stored audio/video       same as above                            yes, a few secs
interactive games        a few Kbps                               yes, 100s of msec
financial apps           elastic                                  yes and no
Byte Order
Different computers may use different internal representations of 16-bit and 32-bit integers (called host byte order). Examples:
- Big-endian byte order (e.g., used by the Motorola 68000) stores the most significant byte at the lowest address.
- Little-endian byte order (e.g., used by Intel x86) stores the least significant byte first.
TCP/IP specifies a network byte order, which is big-endian. Some WinSock functions require their arguments (i.e., the parameters passed to these functions) to be stored in network byte order. WinSock provides functions to convert between host byte order and network byte order.
Processes
A process has:
- text: machine instructions (may be shared by other processes)
- data
- stack
A process may execute either in user mode or in kernel mode. Process information is stored in two places: the process table and the user table.
Process Table
An entry in the process table holds the following information:
- process state:
  A. running in user mode or kernel mode
  B. ready in memory, or ready but swapped
  C. sleeping in memory, or sleeping and swapped
- PID: process id
- UID: user id
- scheduling information
- signals that have been sent to the process but not yet handled
- a pointer to the per-process region table
There is a single process table for the entire system.
[Figure: the process table entry of an active process points to its per-process region table; the text, data and stack regions and the u area may be resident or swappable]
[Figure: two active processes whose region tables share one text region (reference count = 2), while each has its own data and stack]
System Call
A process accesses system resources through system calls.
- Process control:
  - fork: create a new process
  - wait: allow a parent process to synchronize its execution with the exit of a child process
  - exec: invoke a new program
  - exit: terminate process execution
- File system:
  - file: open, read, write, lseek, close
  - inode: chdir, chown, chmod, stat, fstat
  - others: pipe, dup, mount, umount, link, unlink
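/* forkEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int fpid;

    printf("Before forking ...\n");
    fpid = fork();      /* 0 in the child, the child's PID in the parent */
    if (fpid == 0) {
        printf("Child Process fpid=%d\n", fpid);
    } else {
        printf("Parent Process fpid=%d\n", fpid);
    }
    printf("After forking fpid=%d\n", fpid);
    return 0;
}

$ cc forkEx1.c -o forkEx1
$ forkEx1
Before forking ...
Child Process fpid=0
After forking fpid=0
Parent Process fpid=14707
After forking fpid=14707
$

The order in which the parent's and the child's lines appear depends on scheduling and may differ between runs.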
/* forkEx2.c */
#include <stdio.h>
#include <stdlib.h>     /* system() */
#include <unistd.h>     /* fork() */

int main(void)
{
    int fpid;

    printf("Before forking ...\n");
    system("ps");
    fpid = fork();
    system("ps");       /* runs in both the parent and the child */
    printf("After forking fpid=%d\n", fpid);
    return 0;
}

$ forkEx2
Before forking ...
PID TTY TIME CMD
14759 pts/9 0:00 tcsh
14778 pts/9 0:00 sh
14777 pts/9 0:00 forkEx2
PID TTY TIME CMD
14781 pts/9 0:00 sh
14759 pts/9 0:00 tcsh
14782 pts/9 0:00 sh
14780 pts/9 0:00 forkEx2
14777 pts/9 0:00 forkEx2
After forking fpid=14780
$ PID TTY TIME CMD
14781 pts/9 0:00 sh
14759 pts/9 0:00 tcsh
14780 pts/9 0:00 forkEx2
After forking fpid=0
/* pid.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    printf("pid=%d ppid=%d\n", getpid(), getppid());
    return 0;
}

$ cc pid.c -o pid
$ pid
pid=14935 ppid=14759
$
/* forkEx3.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int fpid;

    printf("Before forking ...\n");
    if ((fpid = fork()) == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    }
    printf("After forking fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    return 0;
}
$ cc forkEx3.c -o forkEx3
$ forkEx3
Before forking ...
Parent Process fpid=14942 pid=14941 ppid=14759
After forking fpid=14942 pid=14941 ppid=14759
$ Child Process fpid=0 pid=14942 ppid=1
After forking fpid=0 pid=14942 ppid=1
$ ps
PID TTY TIME CMD
14759 pts/9 0:00 tcsh

Here the parent exited before the child ran. The orphaned child was adopted by init (PID 1), so its getppid() returned 1.
/* forkEx4.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* wait() */
#include <unistd.h>

int main(void)
{
    int fpid, status;

    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    }
    wait(&status);      /* the parent blocks until the child exits; in the child it returns -1 at once */
    printf("After forking fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    return 0;
}
$ cc forkEx4.c -o forkEx4
$ forkEx4
Before forking ...
Parent Process fpid=14980 pid=14979 ppid=14759
Child Process fpid=0 pid=14980 ppid=14979
After forking fpid=0 pid=14980 ppid=14979
After forking fpid=14980 pid=14979 ppid=14759
$
#include <unistd.h>

int execl(const char *path, const char *arg0, ... /*, (char *) NULL */);
int execv(const char *path, char *const argv[]);
int execle(const char *path, const char *arg0, ... /*, (char *) NULL, char *const envp[] */);
int execve(const char *path, char *const argv[], char *const envp[]);
int execlp(const char *file, const char *arg0, ... /*, (char *) NULL */);
int execvp(const char *file, char *const argv[]);

The "l" variants take the argument list inline, terminated by a null pointer. The "v" variants take an argv array. The "e" variants also pass an environment. The "p" variants search PATH for file.
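/* execEx1.c */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("Before execing ...\n");
    fflush(stdout);                             /* flush before the process image is replaced */
    execl("/bin/date", "date", (char *) NULL);  /* the list must end with a null pointer */
    printf("After exec\n");                     /* reached only if execl fails */
    return 0;
}

$ execEx1
Before execing ...
Sun May 9 16:39:17 CST 1999
$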
/* execEx2.c */
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int fpid;

    printf("Before execing ...\n");
    fflush(stdout);                 /* avoid duplicating buffered output in the child */
    fpid = fork();
    if (fpid == 0) {
        execl("/bin/date", "date", (char *) NULL);
        perror("execl");            /* only reached if exec fails */
        _exit(1);
    }
    printf("After exec and fpid=%d\n", fpid);
    return 0;
}

$ execEx2
Before execing ...
After exec and fpid=14903
$ Sun May 9 16:47:08 CST 1999
Handling Signal
A signal is a notification delivered to a process. Signals are sometimes called software interrupts, and they usually occur asynchronously. Signals can be sent:
A. by one process to another (or to itself)
B. by the kernel to a process.
Unix signals are content-free: the only thing that can be said about a signal is whether or not it has arrived.
Handling Signal
Most signals have predefined meanings:
A. SIGHUP (hangup): when a terminal is closed, the hangup signal is sent to the processes attached to that controlling terminal.
B. SIGINT (interrupt): politely asks a process to terminate.
C. SIGQUIT (quit): asks a process to terminate and produce a core dump.
D. SIGKILL (kill): forces a process to terminate.
See sigEx1.c.
/* sigEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fpid, status;

    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
        for (;;)
            ;           /* loop forever */
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    }
    wait(&status);      /* wait for the child; status must point to real storage */
    printf("After forking fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    return 0;
}
$ cc sigEx1.c -o sigEx1
$ sigEx1 &
Before forking ...
Parent Process fpid=14989 pid=14988 ppid=14759
Child Process fpid=0 pid=14989 ppid=14988
$ ps
PID TTY TIME CMD
14988 pts/9 0:00 sigEx1
14759 pts/9 0:01 tcsh
14989 pts/9 0:09 sigEx1
$ kill -9 14989
$ ps
...
Scheduling Processes
On a time-sharing system, the kernel allocates the CPU to a process for a period of time (a time slice or time quantum). When the time slice expires, it preempts the process and schedules another one, and it reschedules the preempted process to continue execution at a later time. The scheduler uses a round-robin with multilevel feedback algorithm to choose which process to execute:
A. The kernel allocates the CPU to a process for a time slice.
B. It preempts a process that exceeds its time slice.
C. It feeds that process back into one of several priority queues.
Process Priority
Priority levels, from highest to lowest:
- Kernel priorities: swapper; wait for disk IO; wait for buffer; wait for inode; ...; wait for child exit
- User priorities: user level 0; user level 1; ...; user level n
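Example: priority and CPU usage count of three processes at the start of each second

Time   Process A         Process B         Process C
       Priority  CPU     Priority  CPU     Priority  CPU
0      60        0       60        0       60        0
1      75        30      60        0       60        0
2      67        15      75        30      60        0
3      63        7       67        15      75        30
4      76        33      63        7       67        15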
Booting
When the computer is powered on or rebooted, a short built-in program (typically stored in ROM) reads the first block or two of the disk into memory. These blocks contain a loader program, which was placed on the disk when the disk was formatted. The loader is started. It searches the root directory for /unix or /root/unix and loads that file into memory. The kernel then starts to execute.
init process
The init process is a process dispatcher: it spawns processes and allows users to log in. Init reads /etc/inittab and spawns getty. getty takes the user through a login procedure, and when the user logs in successfully a login shell is exec'd. Init executes the wait system call, monitoring the death of its child processes and of processes orphaned by an exiting parent.
A. Init forks/execs a getty program to manage the line.
B. The login process prints the password prompt, reads the password, and then checks it.
C. The shell runs programs for the user until the user logs off.
D. When the shell dies, init wakes up and forks/execs a getty for the line.
File Subsystem
A file system is a collection of files and directories on a disk or tape in standard UNIX file system format. Each UNIX file system contains four major parts:
A. boot block
B. superblock
C. i-node table
D. data blocks: file storage
[Figure: file system layout on disk: boot block, superblock, i-node table (through block n), then data blocks holding files from block n+1 to the last block]
Boot Block
A boot block may contain several physical blocks. Note that a physical block contains 512 bytes (or 1 KB or 2 KB). The boot block contains a short loader program for booting; it is blank on file systems that are not used for booting.
Superblock
The superblock contains key information about a file system:
A. Size and status of the file system:
   - label: the name of this file system
   - size: the number of logical blocks
   - date: the last modification date of the superblock
B. Information about i-nodes:
   - the number of i-nodes
   - the number of free i-nodes
C. Information about data blocks: the free data blocks
The information in the superblock is loaded into memory.
I-nodes
- i-node: index node (information node)
- i-list: the list of i-nodes
- i-number: the index into the i-list
- The size of an i-node: 64 bytes
- i-node 0 is reserved.
- i-node 1 is the root directory.
[Figure: i-node structure: mode, owner, timestamp, size, reference count and block count, followed by direct blocks 0-9 that point to data blocks, and single, double and triple indirect blocks that lead to further data blocks]
I-node structure
- mode:
  A. type: file, directory, pipe, symbolic link
  B. access: read/write/execute (owner, group, others)
- owner: who owns this i-node (file, directory, ...)
- timestamps: i-node change, modification and access times
- size: the number of bytes
- block count: the number of data blocks
- direct blocks: pointers to the data
- single indirect: a pointer to a block of pointers to data blocks (128 data blocks)
- double indirect: 128*128 = 16,384 data blocks
- triple indirect: 128*128*128 = 2,097,152 data blocks
Data Block
A data block has 512 bytes; some file systems use 1 KB or 2 KB per block. A data block may contain the data of a file or the data of a directory.
- File: a stream of bytes.
- Directory: a sequence of entries with the format
  i-# | Next | size | File name | pad
[Figure: a directory tree under home (alex, jenny, john, with files such as Report.txt, notes, grep, find and a bin directory), and a directory data block holding entries (i-#, Next, size, name, pad) for Report.txt (size 10), bin and notes]
[Figure: the u area points to the in-core i-node of the current directory; directory data blocks map names such as notes and Report.txt to i-nodes, which locate the files' data blocks]
File table
The kernel has a global data structure, called the file table, that stores information about file access. Each entry in the file table contains:
A. a pointer to the in-core i-node table
B. the offset of the next read or write in the file
C. the access rights (r/w) allowed to the opening process
D. a reference count.
/* openEx1.c */
#include <stdio.h>
#include <fcntl.h>

int main(void)
{
    int fd1, fd2, fd3;

    printf("Before open ...\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("./openEx1.c", O_WRONLY);
    fd3 = open("/etc/passwd", O_RDONLY);
    printf("fd1=%d fd2=%d fd3=%d \n", fd1, fd2, fd3);
    return 0;
}
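[Figure: after openEx1 runs, descriptors 3, 4 and 5 in the u area point to three file-table entries (CNT=1 R, CNT=1 W, CNT=1 R); the two /etc/passwd entries share one in-core i-node (CNT=2), and the opened source file has its own in-core i-node (CNT=1)]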
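/* openEx2.c */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd1;
    char buf1[20], buf2[20];

    buf1[19] = '\0';
    buf2[19] = '\0';
    printf("=======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    read(fd1, buf1, 19);        /* reads bytes 0-18 */
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd1, buf2, 19);        /* the offset has advanced: reads bytes 19-37 */
    printf("fd1=%d buf2=%s \n", fd1, buf2);
    printf("=======\n");
    close(fd1);
    return 0;
}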
$ cc openEx2.c -o openEx2
$ openEx2
=======
fd1=3 buf1=root:x:0:1:Super-Us
fd1=3 buf2=er:/:/sbin/sh
daemo
=======
$
/* openEx3.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd1, fd2;
    char buf1[20], buf2[20];

    buf1[19] = '\0';
    buf2[19] = '\0';
    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("/etc/passwd", O_RDONLY);    /* a second, independent file-table entry */
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd2, buf2, 19);                    /* own offset: starts again at byte 0 */
    printf("fd2=%d buf2=%s \n", fd2, buf2);
    printf("======\n");
    return 0;
}

$ cc openEx3.c -o openEx3
$ openEx3
======
fd1=3 buf1=root:x:0:1:Super-Us
fd2=4 buf2=root:x:0:1:Super-Us
======
$
[Figure: after openEx3, descriptors 3 and 4 point to two separate file-table entries (each CNT=1, R, with its own offset) that share one in-core i-node for /etc/passwd (CNT=2)]
/* openEx4.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd1, fd2;
    char buf1[20], buf2[20];

    buf1[19] = '\0';
    buf2[19] = '\0';
    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = dup(fd1);                 /* same file-table entry: the offset is shared */
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd2, buf2, 19);            /* continues where fd1 stopped */
    printf("fd2=%d buf2=%s \n", fd2, buf2);
    printf("======\n");
    return 0;
}

$ cc openEx4.c -o openEx4
$ openEx4
======
fd1=3 buf1=root:x:0:1:Super-Us
fd2=4 buf2=er:/:/sbin/sh
daemo
======
$
[Figure: after openEx4, descriptors 3 and 4 point to the same file-table entry (CNT=2, R, one shared offset), which points to the in-core i-node for /etc/passwd (CNT=1)]
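/* creatEx1.c */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd1;
    char *buf1 = "I am a string\n";
    char *buf2 = "second line\n";

    printf("======\n");
    fd1 = creat("./testCreat.txt", 0666);   /* the 2nd argument is a mode, not O_WRONLY */
    write(fd1, buf1, strlen(buf1));         /* write only the bytes the strings hold */
    write(fd1, buf2, strlen(buf2));
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    close(fd1);
    chmod("./testCreat.txt", 0666);         /* override the umask */
    printf("======\n");
    return 0;
}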
$ cc creatEx1.c -o creatEx1
$ creatEx1
======
fd1=3 buf1=I am a string
======
$ ls -l testCreat.txt
-rw-rw-rw- 1 cheng staff ...
$ more testCreat.txt
...
/* statEx1.c */
#include <stdio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
    int fd1, fd2;
    struct stat bufStat1, bufStat2;

    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("./statEx1", O_RDONLY);
    fstat(fd1, &bufStat1);
    fstat(fd2, &bufStat2);
    /* st_ino, st_blksize and st_blocks are not plain ints: cast before printing */
    printf("fd1=%d inode no=%ld block size=%ld blocks=%ld\n", fd1,
           (long) bufStat1.st_ino, (long) bufStat1.st_blksize, (long) bufStat1.st_blocks);
    printf("fd2=%d inode no=%ld block size=%ld blocks=%ld\n", fd2,
           (long) bufStat2.st_ino, (long) bufStat2.st_blksize, (long) bufStat2.st_blocks);
    printf("======\n");
    return 0;
}
$ cc statEx1.c -o statEx1
$ statEx1
======
fd1=3 inode no=21954 block size=8192 blocks=6
fd2=4 inode no=190611 block size=8192 blocks=
======
...
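#include <stdio.h>
#include <stdlib.h>     /* system() */
#include <unistd.h>     /* chdir() */

int main(void)
{
    chdir("/usr/bin");  /* changes the current directory of this process only */
    system("ls -l");    /* the child shell inherits the new current directory */
    return 0;
}

The program's output is the same as that of:
$ ls -l /usr/bin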
pipe(int fd[2])
FILE *popen(char *command, char *mode)
pclose(FILE *)
mknod(char *path, S_IFIFO|0644, 0): create a named pipe (FIFO)
From the shell: mknod filename p, or mkfifo filename
Signal      Description
SIGABRT     Process abort signal.
SIGALRM     Alarm clock.
SIGFPE      Erroneous arithmetic operation.
SIGHUP      Hangup.
SIGILL      Illegal instruction.
SIGINT      Terminal interrupt signal.
SIGKILL     Kill (cannot be caught or ignored).
SIGPIPE     Write on a pipe with no one to read it.
SIGQUIT     Terminal quit signal.
SIGSEGV     Invalid memory reference.
SIGTERM     Termination signal.
SIGUSR1     User-defined signal 1.
SIGUSR2     User-defined signal 2.
SIGCHLD     Child process terminated or stopped.
SIGCONT     Continue executing, if stopped.
SIGSTOP     Stop executing (cannot be caught or ignored).
SIGTSTP     Terminal stop signal.
SIGTTIN     Background process attempting read.
SIGTTOU     Background process attempting write.
SIGBUS      Bus error.
SIGPOLL     Pollable event.
SIGPROF     Profiling timer expired.
SIGSYS      Bad system call.
SIGTRAP     Trace/breakpoint trap.
SIGURG      High bandwidth data is available at a socket.
SIGVTALRM   Virtual timer expired.
SIGXCPU     CPU time limit exceeded.
SIGXFSZ     File size limit exceeded.
#include <stdio.h>      /* standard I/O functions */
#include <unistd.h>     /* standard unix functions, like getpid(), write(), pause() */
#include <sys/types.h>  /* various type definitions, like pid_t */
#include <signal.h>     /* signal name macros, and the signal() prototype */

/* first, here is the signal handler */
void catch_int(int sig_num)
{
    static const char msg[] = "Don't do that\n";

    (void) sig_num;
    /* re-set the signal handler again to catch_int, for next time */
    signal(SIGINT, catch_int);
    /* print the message with write(): printf() is not async-signal-safe */
    write(STDOUT_FILENO, msg, sizeof msg - 1);
}

int main(void)
{
    /* set the INT (Ctrl-C) signal handler to 'catch_int' */
    signal(SIGINT, catch_int);
    /* now, let's get into an infinite loop of doing nothing */
    for (;;)
        pause();
}
Signal sets
Signal sets (type sigset_t) are data types used to represent multiple signals. POSIX provides a group of functions to manipulate them.

struct sigaction {      /* simplified: the real structure has further members */
    void     (*sa_handler)(int);  /* pointer to function, or SIG_DFL or SIG_IGN */
    sigset_t sa_mask;             /* additional signals to be blocked during execution of the handler */
    int      sa_flags;            /* special flags and options */
};
#include <stdio.h>
#include <stdlib.h>     /* exit() */
#include <string.h>     /* strcpy(), strlen() */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* our own message layout: a positive type followed by the text */
struct my_msgbuf {
    long mtype;
    char mtext[64];
};

int main(void)
{
    struct my_msgbuf msg, recv_msg;
    int rc;
    /* create a private message queue, with access only to the owner. */
    int queue_id = msgget(IPC_PRIVATE, 0600);

    if (queue_id == -1) {
        perror("main: msgget");
        exit(1);
    }
    printf("message queue created, queue id '%d'.\n", queue_id);

    msg.mtype = 1;
    strcpy(msg.mtext, "hello world");
    rc = msgsnd(queue_id, &msg, strlen(msg.mtext) + 1, 0);   /* the size excludes mtype */
    if (rc == -1) {
        perror("main: msgsnd");
        exit(1);
    }
    printf("message placed on the queue successfully.\n");

    rc = msgrcv(queue_id, &recv_msg, sizeof recv_msg.mtext, 0, 0);  /* type 0: first message */
    if (rc == -1) {
        perror("main: msgrcv");
        exit(1);
    }
    printf("msgrcv: received message: mtype '%ld'; mtext '%s'\n",
           recv_msg.mtype, recv_msg.mtext);

    msgctl(queue_id, IPC_RMID, NULL);   /* remove the queue */
    return 0;
}
[Figure: network 192.168.19.0 with hosts 192.168.19.1, 192.168.19.2 and 192.168.19.3, connected to the Internet (198.163.197.4); services are reached at ports such as [21], SMTP [25] and Telnet [23]]
Port numbers are used to identify entities on a host. Port numbers can be:
- well-known (ports 0-1023)
- dynamic or private (ports 1024-65535)
[Figure: clients reaching an NTP daemon on port 123 and a web server on port 80 on the same host]
Consider a railway station:
- Counter 0: platform tickets
- Counter 1: enquiries
- Counter 2: reservations
- ...
- Counter 8: current reservations
- Counter 9: cancellations
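FTP: the file transfer protocol
[Figure: a user at a host runs an FTP client that transfers files to and from a remote FTP server]
- Transfers files to/from a remote host
- Client/server model:
  - Client: the side that initiates the transfer (either to or from the remote host)
  - Server: the remote host
- ftp: RFC 959
- ftp server: port 21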
- The FTP client contacts the FTP server at port 21, specifying TCP as the transport protocol.
- Two parallel TCP connections are opened:
  - Control: exchanges commands and responses between client and server ("out of band" control)
  - Data: file data to/from the server
[Figure: an FTP client and an FTP server, each with an application layer on top of an Internet stack, joined across the Internet by the TCP control connection on port 21]
[Figure: a connection between the client socket 128.2.194.242:3479 and the server socket 208.216.181.15:80]
- Client host address: 128.2.194.242. Note: 3479 is an ephemeral port allocated by the kernel.
- Server host address: 208.216.181.15. Note: 80 is a well-known port associated with web servers.
[Figure: clients send a service request for 128.2.194.242:80 (i.e., the web server); the server host's kernel delivers it to the web server (port 80) rather than to the echo server (port 7)]
Each server waits for requests to arrive on a well-known port associated with a particular service:
- Port 7: echo server
- Port 23: telnet server
- Port 25: mail server
- Port 80: HTTP server
Other applications should choose ports between 1024 and 65535. See /etc/services for a comprehensive list of the services available on a Linux machine.
What is a socket?
- To the kernel, a socket is an endpoint of communication.
- To an application, a socket is a file descriptor that lets the application read from and write to the network.
  - Remember: all Unix I/O devices, including networks, are modeled as files.
- Clients and servers communicate with each other by reading from and writing to socket descriptors.
- The main distinction between regular file I/O and socket I/O is how the application opens the socket descriptors.
Endpoint Address
Generic endpoint address:
- The socket abstraction accommodates many protocol families.
- It supports many address families.
- It defines the following generic endpoint address: (address family, endpoint address in that family)
- Data type for the generic endpoint address:
TCP/IP endpoint address:
- For TCP/IP, an endpoint address is composed of the following items:
  - The address family is AF_INET (Address Family for InterNET).
  - The endpoint address in that family is composed of an IP address and a port number.
- The IP address identifies a particular computer, while the port number identifies a particular application running on that computer.
- The TCP/IP endpoint address is a special instance of the generic one:
Port Number
- A port number identifies an application running on a computer.
- When a client program is executed, WinSock automatically chooses an unused port number for it.
- Each server program must have a pre-specified port number, so that the client can contact the server.
- The port number is composed of 16 bits, and its possible values are used as follows:
  - 0-1023: for well-known server applications
  - 1024-49151: for user-defined server applications (the typical range used is 1024-5000)
  - 49152-65535: for client programs
- Port numbers for some well-known server applications:
  - WWW server using TCP: 80
  - Telnet server using TCP: 23
  - SMTP (email) server using TCP: 25
  - SNMP server using UDP: 161
[Figure: a process's descriptor table. Entries 0, 1 and 2 (standard input, standard output, standard error) point to the data structures for files 0, 1 and 2. Another entry points to a socket data structure with Family: PF_INET, Service: SOCK_STREAM, Local IP: 111.22.3.4, Remote IP: 123.45.6.78, Local Port: 2249, Remote Port: 3726]
- Fixed length: 32 bits
- Total IP address space: about 4 billion addresses
- Initial classful structure (1981):
  - Class A: 128 networks, 16M hosts
  - Class B: 16K networks, 64K hosts
  - Class C: 2M networks, 256 hosts
[Figure: classful address formats: a network ID followed by a host ID, split after bit 8 (class A), 16 (class B) or 24 (class C) of 32]
Large tables: 2 million class C networks
Original goal: the network part would uniquely identify a single physical network.
Result: inefficient address space usage:
- Class A and B networks are too big.
  - Also, very few LANs have close to 64K hosts.
  - It is easy for networks to (claim to) outgrow class C.
- Each physical network must have one network number.
- Routing table size is too high.
- We need a simple way to reduce the number of network numbers assigned.
- Subnetting: split up a single network address range.
  - This fixes the routing table size problem, partially.
[Figure: a subnet mask has 1s over the network and subnet bits and 0s over the host bits: 11111111 11111111 11111111 00000000]
Assume an organization was assigned the address 150.100 (class B), with fewer than 100 hosts per subnet (department). How many host bits do we need?
Seven: 2^7 = 128 addresses per subnet, which is enough for 100 hosts.
A host is configured with an IP address and a subnet mask:
  subnet number = IP address AND subnet mask
Each forwarding table entry: (subnet number, subnet mask, outgoing interface)
Forwarding algorithm:
  D = destination IP address
  for each forwarding table entry (SN, SM, OI):
      D1 = SM & D
      if (D1 == SN): deliver on OI
  if no entry matches: forward to the default router
[Figure: CIDR aggregation: a provider's address block is divided among customers holding 201.10.0.0/22, 201.10.4.0/24, 201.10.5.0/24 and 201.10.6.0/23]
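[Figure: datagram forwarding: a packet addressed to receiver R travels from the sender through routers R1, R2 and R3; each router's table maps destination R to an outgoing port]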
[Figure: source routing: the sender places the whole path (R1, R2, R3, R) in the packet, and each router removes its own entry before forwarding the packet to the next hop]
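Virtual circuits: the network picks a path, assigns VC numbers for the flow on each link, and populates the forwarding tables.
[Figure: a packet from the sender crosses R1, R2 and R3 to the receiver; each router's table maps (incoming port, VC number) pairs such as 1,7 to outgoing pairs such as 4,2]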
[Figure: hosts on LAN 1 send to destination 128.2.198.222 on the same LAN; router 128.2.254.36 connects LAN 1 to the WAN]
ARP packet format:
op | Sender MAC address | Sender IP address | Target MAC address | Target IP address
- op: operation (1: request, 2: reply)
- Sender: the host sending the ARP message
- Target: the intended receiver of the message
Low-level protocol:
- Operates only within the local network
- Determines the mapping from IP address to hardware (MAC) address
- The mapping is determined dynamically:
  - no need to statically configure tables
  - the only requirement is that each host know its own IP address
ARP request:
op | Sender MAC address | Sender IP address | Target MAC address | Target IP address
- op: operation 1 (request)
- Sender: the host that wants to determine the MAC address of another machine
- Target: the other machine
- Requestor: why include its own MAC address?
- Mapping: fills in the desired host's IP address as the target IP address
- Sending: sends to MAC address ff:ff:ff:ff:ff:ff (Ethernet broadcast)
ARP reply:
op | Sender MAC address | Sender IP address | Target MAC address | Target IP address
- op: operation 2 (reply)
- Sender: the host with the desired IP address
- Target: the original requestor
Destination     Gateway         Genmask           Iface
128.2.209.100   0.0.0.0         255.255.255.255   eth0
128.2.0.0       0.0.0.0         255.255.0.0       eth0
127.0.0.0       0.0.0.0         255.0.0.0         lo
0.0.0.0         128.2.254.36    0.0.0.0           eth0

- Host 128.2.209.100, when plugged into the CS Ethernet
- Dest 128.2.209.100: routing to the same machine
- Dest 128.2.0.0: other hosts on the same Ethernet
- Dest 127.0.0.0: special loopback address
- Dest 0.0.0.0: default route to the rest of the Internet
  - Main CS router: gigrouter.net.cs.cmu.edu (128.2.254.36)
Reverse ARP (RARP)
- Send a query, request-reverse [Ethernet address]; a server responds with the IP address.
  - Used primarily by diskless nodes, when they first initialize, to find their Internet address.
DHCP
- DHCPDISCOVER: broadcast
- DHCPOFFER carries:
  - IP addressing information
  - boot file/server information (for network booting)
  - DNS name servers
  - lots of other information: the protocol is extensible, and half of the options are reserved for local site definition and use
- Lease-based assignment: clients can renew. Servers really should preserve this information across client and server reboots.
- Use: generic configuration for desktops, dial-in, etc.
  - Assign IP addresses, etc., from a pool
Goal: allow a host to dynamically obtain its IP address from a network server when it joins the network.
- It can renew its lease on an address in use.
- It allows reuse of addresses (a host only holds an address while it is connected and on).
- It supports mobile users who want to join the network.
DHCP overview:
A. The host broadcasts a DHCP discover message [optional].
B. The DHCP server responds with a DHCP offer message [optional].
C. The host requests an IP address: DHCP request message.
D. The DHCP server sends the address: DHCP ack message.
Network Layer
[Figure: DHCP client-server scenario: an arriving client joins a network with subnets 223.1.1.x, 223.1.2.x and 223.1.3.x that is served by a DHCP server]
[Figure: message exchange over time between the arriving client and the DHCP server]
DHCP discover   src: 0.0.0.0, 68     dest: 255.255.255.255, 67   yiaddr: 0.0.0.0     transaction ID: 654
DHCP offer      src: 223.1.2.5, 67   dest: 255.255.255.255, 68   yiaddr: 223.1.2.4   transaction ID: 654   lifetime: 3600 secs
DHCP request    src: 0.0.0.0, 68     dest: 255.255.255.255, 67   yiaddr: 223.1.2.4   transaction ID: 655   lifetime: 3600 secs
DHCP ACK        src: 223.1.2.5, 67   dest: 255.255.255.255, 68   yiaddr: 223.1.2.4   transaction ID: 655   lifetime: 3600 secs
DHCP can also return the address of the first-hop router for the client, the name and IP address of the DNS server, and the network mask (indicating the network versus host portion of the address).
DHCP: example
[Figure: a DHCP request travels to the router (168.1.1.1) running the DHCP server; there, the Ethernet frame is demuxed to IP and on up to DHCP]
DHCP: example (continued)
- The DHCP ACK contains the client's IP address, the IP address of the first-hop router for the client, and the name and IP address of the DNS server.
- After encapsulation at the DHCP server, the frame is forwarded to the client and demultiplexed up to DHCP at the client.
- The client now knows its IP address, the name and IP address of its DNS server, and the IP address of its first-hop router.
Captured DHCP reply:
Message type: Boot Reply (2)
Hardware type: Ethernet
Hardware address length: 6
Hops: 0
Transaction ID: 0x6b3a11b7
Seconds elapsed: 0
Bootp flags: 0x0000 (Unicast)
Client IP address: 192.168.1.101 (192.168.1.101)
Your (client) IP address: 0.0.0.0 (0.0.0.0)
Next server IP address: 192.168.1.1 (192.168.1.1)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Server host name not given
Boot file name not given
Magic cookie: (OK)
Option: (t=53,l=1) DHCP Message Type = DHCP ACK
Option: (t=54,l=4) Server Identifier = 192.168.1.1
Option: (t=1,l=4) Subnet Mask = 255.255.255.0
Option: (t=3,l=4) Router = 192.168.1.1
Option: (6) Domain Name Server
  Length: 12; Value: 445747E2445749F244574092
  IP Address: 68.87.71.226
  IP Address: 68.87.73.242
  IP Address: 68.87.64.146
Option: (t=15,l=20) Domain Name = "hsd1.ma.comcast.net."
Link-local address: the prefix 1111 1110 10, then zeros, then a 64-bit interface ID (usually derived from the Ethernet address), i.e. an address in the fe80::/64 prefix
Uniqueness test (is anyone else using this address?)
Router contact (send a solicitation, or wait for an announcement); the router's announcement contains a globally unique prefix
Usually: concatenate this prefix with the local ID to get a globally unique IPv6 address
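To make "64-bit interface ID (usually from Ethernet addr)" concrete, here is a sketch of the classic modified EUI-64 construction (RFC 4291, Appendix A): insert ff:fe in the middle of the 48-bit MAC and flip the universal/local bit of its first byte. Many modern hosts use randomized or stable-privacy interface IDs instead, so treat this as one common scheme, not the only one. The function name link_local_from_mac is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Build an IPv6 link-local address (fe80::/64) from an Ethernet address
 * using the modified EUI-64 interface ID, and write its printable form
 * into out (at least INET6_ADDRSTRLEN bytes). Returns 0 or -1. */
int link_local_from_mac(const uint8_t mac[6], char *out, size_t outlen)
{
    struct in6_addr a;
    memset(&a, 0, sizeof a);
    a.s6_addr[0]  = 0xfe;            /* 1111 1110 10, then zero bits */
    a.s6_addr[1]  = 0x80;
    a.s6_addr[8]  = mac[0] ^ 0x02;   /* flip the universal/local bit */
    a.s6_addr[9]  = mac[1];
    a.s6_addr[10] = mac[2];
    a.s6_addr[11] = 0xff;            /* inserted ff:fe */
    a.s6_addr[12] = 0xfe;
    a.s6_addr[13] = mac[3];
    a.s6_addr[14] = mac[4];
    a.s6_addr[15] = mac[5];
    return inet_ntop(AF_INET6, &a, out, (socklen_t)outlen) ? 0 : -1;
}
```

For the client MAC in the DHCP trace (00:16:d3:23:68:8a) this yields fe80::216:d3ff:fe23:688a.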
DNS
Design
Today
Need naming to identify resources
Once identified, a resource must be located
How to name a resource?
Naming hierarchy
Lookup: a central DNS?
- Single point of failure
- Traffic volume
- Distant centralized database
- Single point of update
- Doesn't scale!
The count of hosts was increasing: from one machine per domain to one machine per user
- Many more downloads
- Many more updates
Don't need:
- Atomicity
- Strong consistency
Conceptually, programmers can view the DNS database as a collection of millions of host entry structures:
/* DNS host entry structure */
struct hostent {
    char  *h_name;       /* official domain name of host */
    char **h_aliases;    /* null-terminated array of domain names */
    int    h_addrtype;   /* host address type (AF_INET) */
    int    h_length;     /* length of an address, in bytes */
    char **h_addr_list;  /* null-terminated array of pointers to in_addr structs */
};
in_addr is a struct holding a 4-byte IPv4 address.
gethostbyname: the query key is a DNS host name.
gethostbyaddr: the query key is an IP address.
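A short sketch of walking a struct hostent returned by gethostbyname(), printing the official name, any aliases, and every address in the null-terminated h_addr_list. The helper name print_host_addrs is hypothetical. Note that gethostbyname() is marked obsolete in current POSIX, and new code normally uses getaddrinfo(); it is used here only because it matches the structure above.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Print the host entry for 'name' and return how many IPv4 addresses it
 * lists, or -1 if the lookup failed. */
int print_host_addrs(const char *name)
{
    struct hostent *hp = gethostbyname(name);
    int count = 0;

    if (hp == NULL || hp->h_addrtype != AF_INET)
        return -1;
    printf("official name: %s\n", hp->h_name);
    for (char **p = hp->h_aliases; p != NULL && *p != NULL; p++)
        printf("alias: %s\n", *p);
    for (char **p = hp->h_addr_list; *p != NULL; p++) {
        struct in_addr *a = (struct in_addr *)*p;  /* entry points to an in_addr */
        printf("address: %s\n", inet_ntoa(*a));
        count++;
    }
    return count;
}
```

Passing a dotted-decimal string such as "127.0.0.1" typically returns an entry with that single address without contacting a DNS server.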
DNS message format:
Header (12 bytes): identification, flags, and the number of questions, answer RRs, authority RRs, and additional RRs
Questions (variable number of questions): name and type fields for a query
Answers (variable number of resource records): RRs in response to the query
Authority (variable number of resource records): records for authoritative servers
Additional info (variable number of resource records): additional helpful info that may be used
Identification: used to match up request/response
Flags:
- 1 bit to mark query or response
- 1 bit to mark authoritative or not
- 1 bit to request recursive resolution
- 1 bit to indicate support for recursive resolution
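The four flag bits listed above live in the second 16-bit word of the 12-byte header (RFC 1035, section 4.1.1): QR is bit 15, AA bit 10, RD bit 8 and RA bit 7, and all fields are big-endian. A minimal decoding sketch follows; struct dns_header and dns_parse_header are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded form of the fixed 12-byte DNS header. */
struct dns_header {
    uint16_t id;                 /* identification */
    int qr, aa, rd, ra;          /* query/response, authoritative,
                                    recursion desired, recursion available */
    uint16_t qdcount, ancount, nscount, arcount;
};

static uint16_t get16(const uint8_t *p)  /* read a big-endian 16-bit value */
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the header at the start of msg; returns 0, or -1 if too short. */
int dns_parse_header(const uint8_t *msg, size_t n, struct dns_header *h)
{
    if (n < 12)
        return -1;
    uint16_t flags = get16(msg + 2);
    h->id = get16(msg);
    h->qr = (flags >> 15) & 1;   /* 0 = query, 1 = response */
    h->aa = (flags >> 10) & 1;   /* authoritative answer */
    h->rd = (flags >> 8) & 1;    /* recursion desired */
    h->ra = (flags >> 7) & 1;    /* recursion available */
    h->qdcount = get16(msg + 4);
    h->ancount = get16(msg + 6);
    h->nscount = get16(msg + 8);
    h->arcount = get16(msg + 10);
    return 0;
}
```

A typical response from a recursive resolver carries the flags word 0x8180: QR, RD and RA set, AA clear.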
RR format: (class,
For the IN class:
Type=A: name is a hostname; value is its IP address
Type=CNAME: name is an alias name for some canonical (the real) name; value is the canonical name
Type=NS: name is a domain (e.g. foo.com); value is the name of the authoritative name server for this domain
Type=MX: value is the hostname of the mail server associated with name
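A query's question section carries the name and type fields mentioned above. On the wire, the name is a sequence of length-prefixed labels ending in a zero byte, followed by a 16-bit QTYPE and QCLASS (RFC 1035); the type codes are A=1, NS=2, CNAME=5, MX=15 and class IN=1. This sketch encodes one question; dns_encode_question is a hypothetical helper and does not implement name compression.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DNS_TYPE_A = 1, DNS_TYPE_NS = 2, DNS_TYPE_CNAME = 5, DNS_TYPE_MX = 15,
       DNS_CLASS_IN = 1 };

/* Encode "name" + QTYPE + QCLASS(IN) into out. Returns the number of bytes
 * written, or -1 on an empty/oversized label or a too-small buffer. */
int dns_encode_question(const char *name, uint16_t qtype, uint8_t *out, size_t cap)
{
    size_t pos = 0;
    const char *p = name;

    while (*p != '\0') {
        const char *dot = strchr(p, '.');
        size_t len = dot ? (size_t)(dot - p) : strlen(p);
        if (len == 0 || len > 63 || pos + 1 + len >= cap)
            return -1;
        out[pos++] = (uint8_t)len;           /* label length */
        memcpy(out + pos, p, len);           /* label bytes */
        pos += len;
        p += len;
        if (*p == '.')
            p++;                             /* skip the separator */
    }
    if (pos + 5 > cap)
        return -1;
    out[pos++] = 0;                          /* root label ends the name */
    out[pos++] = (uint8_t)(qtype >> 8);      /* QTYPE, big-endian */
    out[pos++] = (uint8_t)(qtype & 0xff);
    out[pos++] = 0;                          /* QCLASS = IN, big-endian */
    out[pos++] = DNS_CLASS_IN;
    return (int)pos;
}
```

For www.cs.wisc.edu the name becomes 3www 2cs 4wisc 3edu 0 (17 bytes), and the whole question is 21 bytes.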
Each node in the hierarchy stores a list of names that end with the same suffix (suffix = path up the tree).
E.g., given this tree, where would the following be stored?
Fred.com, Fred.edu, Fred.wisc.edu, Fred.cs.wisc.edu, Fred.cs.cmu.edu
[Figure: DNS name tree, including the nodes bu and mit]
Example: CS.WISC.EDU is created by the WISC.EDU administrators. Who creates WISC.EDU or .EDU?
Local name servers contact root servers when they cannot resolve a name.
They are configured with the well-known root servers.
Name servers
Either responsible for some zone, or local servers:
- Do lookup of distant host names for local hosts
- Typically answer queries about the local zone
Recursive query:
- Server goes out and searches for more info (recursive)
- Only returns the final answer or "not found"
Iterative query:
- Server responds with as much as it knows (iterative)
- "I don't know this name, but ask this server"
Workload impact on choice?
- Local server typically does recursive
- Root/distant server does iterative
[Figure: requesting host surf.eurecom.fr resolves gaia.cs.umass.edu, reaching the authoritative name server dns.cs.umass.edu]
[Figure: client resolver and local DNS server resolving www.cs.wisc.edu and ftp.cs.wisc.edu]
Task
Given an IP address, find its name. When is this needed?
Method
Maintain a separate hierarchy based on IP names: write 128.2.194.242 as 242.194.2.128.in-addr.arpa
- Why is the address reversed?
Managing
The authority manages the IP addresses assigned to it, e.g., CMU manages the name space 2.128.in-addr.arpa
[Figure: name tree from the unnamed root: edu → cmu → cs → cmcl → kittyhawk, and arpa → in-addr → 128 → 2 → 194 → 242, both identifying 128.2.194.242]
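The reversal rule above can be sketched in a few lines: parse the dotted-decimal address, then print its four bytes in reverse order followed by .in-addr.arpa. The helper name reverse_dns_name is hypothetical; inet_pton() is used for parsing because it rejects malformed input explicitly.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Build the in-addr.arpa name for a dotted-decimal IPv4 address,
 * e.g. "128.2.194.242" -> "242.194.2.128.in-addr.arpa".
 * Returns 0 on success, -1 on a malformed address or a short buffer. */
int reverse_dns_name(const char *ip, char *out, size_t cap)
{
    struct in_addr a;
    if (inet_pton(AF_INET, ip, &a) != 1)
        return -1;
    /* s_addr is in network byte order: b[0] is the first (most significant) byte */
    const unsigned char *b = (const unsigned char *)&a.s_addr;
    int n = snprintf(out, cap, "%u.%u.%u.%u.in-addr.arpa", b[3], b[2], b[1], b[0]);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Reversing puts the most specific part of the address first, matching the way domain names grow more specific to the left; that is why delegation such as 2.128.in-addr.arpa to CMU works like any other zone.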
Name servers can add additional data to a response; this is typically used for prefetching.
CNAME/MX/NS records typically point to another host name, so responses include the address of the referred-to host in the additional section.
Generic Top Level Domains (gTLD): .com, .net, .org, etc.
Country Code Top Level Domains (ccTLD): .us, .ca, .fi, .uk, etc.
Root servers ({a-m}.root-servers.net) were also used to cover the gTLD domains.
Load on the root servers was growing quickly! Moving .com, .net, .org off the root servers was clearly necessary to reduce load; this was done in Aug 2000.
New gTLDs:
- .info: general info
- .biz: businesses
- .aero: air-transport industry
- .coop: business cooperatives
- .name: individuals
- .pro: accountants, lawyers, and physicians
- .museum: museums
Only new ones active so far: .info, .biz, .name
DNS hit rate = 1 - (#DNS lookups / #connections)
Lower TTLs for A records do not affect performance; DNS performance relies more on NS-record caching.
Goal: learn how to build client/server applications that communicate using sockets
Socket API
- introduced in BSD UNIX
- explicitly created, used, and released by apps
- client/server paradigm
- two types of transport service via the socket API: unreliable datagram; reliable, byte stream-oriented
socket: a host-local, application-created, OS-controlled interface (a "door") into which an application process can both send and receive messages to/from another application process
Server and client exchange messages over the network through a common socket API.
[Figure: layers labeled Socket API, kernel space, and hardware]
Socket: a door between the application process and the end-to-end transport protocol (UDP or TCP)
TCP service: reliable transfer of bytes from one process to another
[Figure: two hosts or servers communicating across the internet]
Client must contact server:
- server process must first be running
- server must have created a socket (door) that welcomes the client's contact
Client contacts server by:
- creating a client-local TCP socket
- specifying the IP address and port number of the server process
When the client creates its socket, the client TCP establishes a connection to the server TCP.
When contacted by a client, the server TCP creates a new socket for the server process to communicate with that client:
- this allows the server to talk with multiple clients
- source port numbers are used to distinguish clients (more in Chap 3)
Application viewpoint: TCP provides reliable, in-order transfer of bytes ("pipe") between client and server.
A stream is a sequence of characters that flows into or out of a process. An input stream is attached to some input source for the process, e.g., the keyboard or a socket. An output stream is attached to an output destination, e.g., the monitor or a socket.
[Figure: client process connected to the keyboard and monitor; input stream inFromUser, output stream outToServer, input stream inFromServer]
Client (TCP): create socket, connect to hostid, port=x: clientSocket = Socket(); send request using clientSocket
UDP: no "connection" between client and server
- no handshaking
- sender explicitly attaches the IP address and port of the destination to each packet
- server must extract the IP address and port of the sender from the received packet
UDP: transmitted data may be received out of order, or lost
Application viewpoint: UDP provides unreliable transfer of groups of bytes ("datagrams") between client and server
Client (UDP): create socket: clientSocket = DatagramSocket()
Server (UDP): read request from serverSocket; write reply to serverSocket, specifying the client host address and port number
[Figure: client process; its output stream sends, its input stream receives]
Socket address structure: contains the protocol-specific addressing information that is passed from the user process to the kernel and vice versa. Each protocol supported by a socket implementation has its own socket address structure, named sockaddr_suffix, where suffix represents the protocol family. Ex:
- sockaddr_in: Internet/IPv4 socket address structure
- sockaddr_ipx: IPX socket address structure
The generic socket address structure:
struct sockaddr {
    sa_family_t sa_family;     /* address family */
    char        sa_data[14];   /* protocol-specific data */
};
The Internet/IPv4 socket address structure:
struct sockaddr_in {
    sa_family_t    sin_family;   /* Internet address family (AF_INET) */
    in_port_t      sin_port;     /* transport-layer port number */
    struct in_addr sin_addr;     /* IP address */
    unsigned char  sin_zero[8];  /* padding */
};
- int8_t: signed 8-bit integer (<sys/types.h>)
- uint8_t: unsigned 8-bit integer (<sys/types.h>)
- int16_t: signed 16-bit integer (<sys/types.h>)
- uint16_t: unsigned 16-bit integer (<sys/types.h>)
- int32_t: signed 32-bit integer (<sys/types.h>)
- uint32_t: unsigned 32-bit integer (<sys/types.h>)
- sa_family_t: address family of socket address structure (<sys/socket.h>)
- socklen_t: length of socket address structure (<sys/socket.h>)
- in_addr_t: IPv4 address, normally uint32_t (<netinet/in.h>)
- in_port_t: TCP/UDP port, normally uint16_t (<netinet/in.h>)
Byte ordering
Network byte order (big-endian) vs. host byte order; convert with htons()/htonl() (host to network) and ntohs()/ntohl() (network to host).
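A small sketch of what the conversion functions guarantee: after htons()/htonl(), the bytes in memory are in big-endian order no matter what the host's native order is. The helper name put_port_and_addr is hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Copy a 16-bit port and a 32-bit IPv4 address into out[0..5] in network
 * byte order (big-endian), independent of the host's byte order. */
void put_port_and_addr(uint8_t out[6], uint16_t port, uint32_t addr)
{
    uint16_t nport = htons(port);   /* host -> network, 16 bits */
    uint32_t naddr = htonl(addr);   /* host -> network, 32 bits */
    memcpy(out, &nport, 2);
    memcpy(out + 2, &naddr, 4);
}
```

For port 80 and address 0x80022332 (128.2.35.50) the buffer holds 00 50 80 02 23 32 on both little- and big-endian hosts; on a big-endian host htons() and htonl() simply return their argument.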
sockfd = socket(domain, type, protocol)
- domain: the protocol/address family (AF_INET, AF_IPX, ...)
- type: the type of service (SOCK_DGRAM, SOCK_STREAM)
- protocol: the specific protocol supported by the protocol family given as the first parameter (0 selects the default)
- returns a fresh socket descriptor on success, -1 on error
status = close(sockfd)
- releases the descriptor; data still queued for sending is normally delivered before the connection is terminated
- returns -1 on error
struct sockaddr_in {
    unsigned short sin_family;   /* address family (always AF_INET) */
    unsigned short sin_port;     /* port num in network byte order */
    struct in_addr sin_addr;     /* IP addr in network byte order */
    unsigned char  sin_zero[8];  /* pad to sizeof(struct sockaddr) */
};
status = bind(sockfd, ptr_to_sockaddr, sockaddr_size)
- associates the sockaddr with sockfd
- the rules for successful binding depend on the protocol family of the socket (specified in the call to socket())
- necessary for receiving connections on a STREAM socket
status = listen(sockfd, backlog)
- announces the willingness to accept connections
- backlog: maximum number of established connections not yet handed to the user process (by calls to accept())
- on an unbound socket, an implicit bind is done with INADDR_ANY and an ephemeral port as the address and port, respectively
connfd = accept(sockfd, ptr_to_sockaddr, ptr_to_sockaddr_size)
- blocks until a connection is established on sockfd and returns a new file descriptor on which I/O can be performed with the remote entity
- fills the sockaddr and size parameters with the address information (and its size) of the connecting entity
- bind() and listen() must have been called on sockfd before accept()
status = connect(sockfd, ptr_to_sockaddr, sockaddr_size)
- on a STREAM socket, initiates a new connection with the entity addressed by sockaddr
- on a DGRAM socket, sets the default remote address for I/O
* All of the above calls return -1 on error
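The socket/bind/listen sequence above, combined into one sketch that also checks every -1 return. Two assumptions go beyond the slides: the socket is bound to the loopback address 127.0.0.1, and getsockname() is used to report which port the kernel chose when port 0 is requested. The helper name make_listener is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket, bind it to 127.0.0.1:port (0 = let the kernel pick an
 * ephemeral port) and listen on it. Returns the listening descriptor and the
 * bound port in *bound_port, or -1 on error. */
int make_listener(unsigned short port, int backlog, unsigned short *bound_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                    /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);                                  /* each call returns -1 on error */
        return -1;
    }
    *bound_port = ntohs(addr.sin_port);
    return fd;
}
```

A client can then connect() to that port, and accept() on the listening descriptor returns a second, distinct descriptor for the new connection.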
SEND: ssize_t send(int sockfd, const void *msg, size_t len, int flags);
- msg: message you want to send
- len: length of the message
- flags: 0
- returns the number of bytes actually sent (possibly fewer than len), or -1 on error
RECEIVE: ssize_t recv(int sockfd, void *buf, size_t len, int flags);
- buf: buffer to receive the message
- len: length of the buffer (recv never writes more than len bytes)
- flags: 0
- returns the number of bytes received, 0 if the peer has closed the connection, or -1 on error
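Because send() may report fewer bytes than len, stream-socket code usually wraps it in a loop until everything is written. A minimal sketch; the name send_all is a common idiom rather than a standard function, and retrying on EINTR is one reasonable policy among several.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until all len bytes are written.
 * Returns 0 on success, -1 on error (errno is set by send()). */
int send_all(int sockfd, const void *msg, size_t len)
{
    const char *p = msg;
    while (len > 0) {
        ssize_t n = send(sockfd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)     /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        p += n;                     /* advance past the bytes sent */
        len -= (size_t)n;
    }
    return 0;
}
```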
SEND (DGRAM-style): ssize_t sendto(int sockfd, const void *msg, size_t len, int flags, const struct sockaddr *to, socklen_t tolen);
- msg, len, flags: as for send()
- to: socket address of the remote process
- tolen: size of that address structure, e.g. sizeof(struct sockaddr_in)
- returns the number of bytes actually sent, or -1 on error
RECEIVE (DGRAM-style): ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *from, socklen_t *fromlen);
- buf, len, flags: as for recv()
- from: filled with the socket address of the process that sent the data
- fromlen: must be set to the size of *from before the call; on return it holds the size of the address actually stored
- returns the number of bytes received, or -1 on error
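A self-contained sketch of the datagram calls over the loopback interface: a receiver binds to 127.0.0.1 on a kernel-chosen port, an unbound sender uses sendto(), and recvfrom() returns both the data and the sender's address. Binding to port 0 and reading the port back with getsockname() are assumptions made so the example needs no fixed port; udp_loopback_roundtrip is a hypothetical name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send msg as one datagram to a receiver on 127.0.0.1 and read it back.
 * Stores the sender's address in *from. Returns bytes received, or -1. */
ssize_t udp_loopback_roundtrip(const char *msg, char *buf, size_t cap,
                               struct sockaddr_in *from)
{
    struct sockaddr_in rx;
    socklen_t len = sizeof rx;
    ssize_t n = -1;
    int rfd = socket(AF_INET, SOCK_DGRAM, 0);
    int sfd = socket(AF_INET, SOCK_DGRAM, 0);

    if (rfd < 0 || sfd < 0)
        goto out;
    memset(&rx, 0, sizeof rx);
    rx.sin_family = AF_INET;
    rx.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    rx.sin_port = 0;                          /* kernel picks a free port */
    if (bind(rfd, (struct sockaddr *)&rx, sizeof rx) < 0 ||
        getsockname(rfd, (struct sockaddr *)&rx, &len) < 0)
        goto out;

    /* The sender never calls bind(): a port is assigned on the first sendto(). */
    if (sendto(sfd, msg, strlen(msg), 0, (struct sockaddr *)&rx, sizeof rx) < 0)
        goto out;

    len = sizeof *from;                       /* value-result argument */
    n = recvfrom(rfd, buf, cap, 0, (struct sockaddr *)from, &len);
out:
    if (rfd >= 0) close(rfd);
    if (sfd >= 0) close(sfd);
    return n;
}
```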
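[Figure: SEND and RECEIVE paths between the processes]
Concurrent server
[Figure: concurrent server, starting with CREATE and BIND]
TCP Server
[Figure: a web server listening on port 80, above the TCP/IP stack and the Ethernet adapter]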
Since web traffic uses TCP, the web server must create a socket of type SOCK_STREAM:
int fd; /* socket descriptor */
if((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
    perror("socket");
    exit(1);
}
- socket returns an integer (socket descriptor); fd < 0 indicates that an error occurred
- AF_INET associates the socket with the Internet protocol family
- SOCK_STREAM selects the TCP protocol
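listen() marks fd as a passive socket that accepts connections; 5 is the backlog of pending connections.
/* 1) create the socket */
/* 2) bind the socket to a port */
if(listen(fd, 5) < 0) {
    perror("listen");
    exit(1);
}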
accept returns a new socket (newfd) with the same properties as the original socket (fd); newfd < 0 indicates that an error occurred.
int newfd;                        /* returned by accept() */
struct sockaddr_in cli;           /* used by accept() */
socklen_t cli_len = sizeof(cli);  /* used by accept() */
newfd = accept(fd, (struct sockaddr*) &cli, &cli_len);
if(newfd < 0) {
    perror("accept");
    exit(1);
}
Now the server can exchange data with the client by using read and write on the descriptor newfd. Why does accept need to return a new descriptor?
read can be used with a socket. read blocks waiting for data from the client but does not guarantee that sizeof(buf) bytes are read.
int fd;          /* socket descriptor */
char buf[512];   /* used by read() */
int nbytes;      /* used by read() */
/* 1) create the socket */
/* 2) bind the socket to a port */
/* 3) listen on the socket */
/* 4) accept the incoming connection (newfd) */
if((nbytes = read(newfd, buf, sizeof(buf))) < 0) {
    perror("read");
    exit(1);
}
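Since a single read() may return fewer bytes than requested, code that needs an exact number of bytes usually loops. The sketch below follows the well-known readn() pattern from Stevens' Unix network programming books; the name and the EINTR retry policy are assumptions.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly n bytes unless end-of-file comes first.
 * Returns the number of bytes read (< n only at EOF), or -1 on error. */
ssize_t readn(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t left = n;
    while (left > 0) {
        ssize_t r = read(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)     /* interrupted: try again */
                continue;
            return -1;
        }
        if (r == 0)                 /* end-of-file: peer closed */
            break;
        p += r;
        left -= (size_t)r;
    }
    return (ssize_t)(n - left);
}
```

The same function works on any descriptor; the check below uses a pipe so it runs without a network.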
[Figure: protocol stack: TCP over IP over the Ethernet adapter]
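IP addresses are commonly written as strings ("128.2.35.50"), but programs deal with IP addresses as integers. Converting strings to a numerical address:
struct sockaddr_in srv;
srv.sin_addr.s_addr = inet_addr("128.2.35.50");
if(srv.sin_addr.s_addr == (in_addr_t) -1) {
    fprintf(stderr, "inet_addr failed!\n");
    exit(1);
}
Note that (in_addr_t) -1 is also the encoding of the valid address 255.255.255.255, so inet_addr() cannot distinguish that address from an error; inet_pton() does not have this problem.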
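A sketch of the same conversion with inet_pton(), which returns 1 on success and 0 for an invalid string, so the result value is never overloaded as an error code. make_sockaddr is a hypothetical helper that also fills in the family and port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill an IPv4 socket address from a dotted-decimal string and a port.
 * Returns 0 on success, -1 if ip is not a valid IPv4 address. */
int make_sockaddr(const char *ip, unsigned short port, struct sockaddr_in *sa)
{
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);                    /* network byte order */
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}
```

With this helper, "255.255.255.255" converts successfully, while a truncated string such as "128.2.35" is rejected.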
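#include <netdb.h>
struct hostent *hp;            /* ptr to host info for remote */
struct sockaddr_in peeraddr;
char *name = "www.cs.cmu.edu";
peeraddr.sin_family = AF_INET;
hp = gethostbyname(name);
if (hp == NULL) {
    fprintf(stderr, "gethostbyname failed for %s\n", name);
    exit(1);
}
peeraddr.sin_addr.s_addr = ((struct in_addr*)(hp->h_addr))->s_addr;
(gethostbyname() is obsolete in current POSIX; new code uses getaddrinfo().)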
/* 1) create the socket */
/* 2) connect() to the server */
/* Example: a client could write a request to a server */
if((nbytes = write(fd, buf, sizeof(buf))) < 0) {
    perror("write");
    exit(1);
}
[Figure: TCP client/server timeline. Server: socket(), bind(), listen(), accept(). Client: socket(), connect() (connection establishment), write() (data request), read() (data reply), close() (end-of-file notification).]
Close connection:
close(clientSocket);
Example: C server (TCP)
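/* server.c */
#include <ctype.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad;   /* structure to hold the server's IP address */
    struct sockaddr_in cad;   /* structure to hold the client's IP address */
    socklen_t alen;           /* length of the client address */
    int welcomeSocket, connectionSocket; /* socket descriptors */
    char clientSentence[128];
    char capitalizedSentence[128];
    int port, i, n;

    port = atoi(argv[1]);
    /* Create welcoming socket at port & bind a local address */
    welcomeSocket = socket(PF_INET, SOCK_STREAM, 0);
    memset((char *)&sad, 0, sizeof(sad));       /* clear sockaddr structure */
    sad.sin_family = AF_INET;                    /* set family to Internet */
    sad.sin_addr.s_addr = htonl(INADDR_ANY);     /* set the local IP address */
    sad.sin_port = htons((unsigned short)port);  /* set the port number */
    bind(welcomeSocket, (struct sockaddr *)&sad, sizeof(sad));
    listen(welcomeSocket, 10);
    while (1) {
        alen = sizeof(cad);
        connectionSocket = accept(welcomeSocket, (struct sockaddr *)&cad, &alen);
        n = read(connectionSocket, clientSentence, sizeof(clientSentence) - 1);
        clientSentence[n > 0 ? n : 0] = '\0';
        for (i = 0; clientSentence[i] != '\0'; i++)
            capitalizedSentence[i] = toupper((unsigned char)clientSentence[i]);
        capitalizedSentence[i] = '\0';
        /* Write out the result to socket */
        n = write(connectionSocket, capitalizedSentence, strlen(capitalizedSentence) + 1);
        close(connectionSocket);
    } /* End of while loop: loop back and wait for another client connection */
}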
[Figure: status transition of the sockets (*after return from accept)]
[Figure: an NTP daemon on UDP port 123, above IP and the Ethernet adapter]
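int fd; /* socket descriptor */
if((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
    perror("socket");
    exit(1);
}
- socket returns an integer (socket descriptor); fd < 0 indicates that an error occurred
- AF_INET: associates the socket with the Internet protocol family
- SOCK_DGRAM: selects the UDP protocol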
read does not provide the client's address to the UDP server; recvfrom does:
int fd;                           /* socket descriptor */
struct sockaddr_in srv;           /* used by bind() */
struct sockaddr_in cli;           /* used by recvfrom() */
char buf[512];                    /* used by recvfrom() */
socklen_t cli_len = sizeof(cli);  /* used by recvfrom() */
int nbytes;                       /* used by recvfrom() */
/* 1) create the socket */
/* 2) bind to the socket */
nbytes = recvfrom(fd, buf, sizeof(buf), 0 /* flags */, (struct sockaddr*) &cli, &cli_len);
if(nbytes < 0) {
    perror("recvfrom");
    exit(1);
}
[Figure: two UDP clients, each above IP and the Ethernet adapter]
write() is not allowed on this unconnected UDP socket, because no destination address is associated with it; use sendto().
Notice that the UDP client does not bind a port number: a port number is dynamically assigned when the first sendto is called.
/* sendto: send data to IP address 128.2.35.50, port 80 */
srv.sin_family = AF_INET;
srv.sin_port = htons(80);
srv.sin_addr.s_addr = inet_addr("128.2.35.50");
nbytes = sendto(fd, buf, sizeof(buf), 0 /* flags */, (struct sockaddr*) &srv, sizeof(srv));
if(nbytes < 0) {
    perror("sendto");
    exit(1);
}
[Figure: UDP client/server timeline. Server: socket(), bind(). Client: socket(), sendto() (data request). The reply is sent back with sendto().]
printf("FROM SERVER: %s\n", modifiedSentence);
close(clientSocket);
Example: C server (UDP)
/* server.c */
int main(int argc, char *argv[])
{
    struct sockaddr_in sad;   /* structure to hold an IP address */
    struct sockaddr_in cad;   /* structure to hold the client's IP address */
    int serverSocket;         /* socket descriptor */
    char clientSentence[128];
    char capitalizedSentence[128];
    int port;

    port = atoi(argv[1]);
    /* Create welcoming socket at port & bind a local address */
    serverSocket = socket(PF_INET, SOCK_DGRAM, 0);
    memset((char *)&sad, 0, sizeof(sad));       /* clear sockaddr structure */
    sad.sin_family = AF_INET;                    /* set family to Internet */
    sad.sin_addr.s_addr = htonl(INADDR_ANY);     /* set the local IP address */
    sad.sin_port = htons((unsigned short)port);  /* set the port number */
    bind(serverSocket, (struct sockaddr *)&sad, sizeof(sad));
[Figure: protocol stack: UDP over IP over the Ethernet adapter]
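int s1;
int s2;
/* 1) create socket s1 */
/* 2) create socket s2 */
/* 3) bind s1 to port 2000 */
/* 4) bind s2 to port 3000 */
while(1) {
    recvfrom(s1, buf, sizeof(buf), ...);
    /* process buf */
    recvfrom(s2, buf, sizeof(buf), ...);
    /* process buf */
}
Problem: recvfrom() on s1 blocks until a datagram arrives on s1, so datagrams already waiting on s2 are not processed in the meantime (and vice versa).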
Server Flaw
[Figure: iterative server timeline with client 1, client 2, and the server]
Client 1 calls connect; the server returns from accept and calls read.
The user goes out to lunch: client 1 blocks waiting for the user to type in data, and the server blocks in read.
Client 2 calls connect and blocks waiting to complete its connection request until after lunch!
[Figure: concurrent server timeline: the server's read for client 1 does not block it; it calls accept again, so client 2's connect returns and client 2 writes, reads, and closes while client 1 still waits for its user]
while (1) {
    newsock = (int *)malloc(sizeof(int));
    fromlen = sizeof(from);                /* reset before every accept() */
    *newsock = accept(sock, (struct sockaddr *)&from, &fromlen);
    if (*newsock < 0) error("Accepting");
    printf("A connection has been accepted from %s\n", inet_ntoa(from.sin_addr));
    retval = pthread_create(&tid, NULL, ConnectionThread, (void *)newsock);
    if (retval != 0) {
        error("Error, could not create thread");
    } else {
        pthread_detach(tid);               /* no join: resources are reclaimed when the thread exits */
    }
}
/****** ConnectionThread **********/
void *ConnectionThread(void *arg)
{
    int sock, n, len;
    char buffer[BUFSIZE];
    const char *msg = "Got your message";
    sock = *(int *)arg;
    free(arg);                             /* allocated by the accept loop */
    len = strlen(msg);
    n = read(sock, buffer, BUFSIZE-1);
    while (n > 0) {
        buffer[n] = '\0';
        printf("Message is %s\n", buffer);
        n = write(sock, msg, len);
        if (n < len) error("Error writing");
        n = read(sock, buffer, BUFSIZE-1);
        if (n < 0) error("Error reading");
    }
    if (close(sock) < 0) error("closing");
    return NULL;
}
Concurrency
Threading
- Easier to understand
- Race conditions increase complexity
select()
- Explicit control flows, no race conditions
- Explicit control is more complicated
What is select()?
- Monitors multiple descriptors
How does it work?
- Set up sets of sockets to monitor
- select() blocks until something happens
Something could be:
- Incoming connection: accept()
- Clients sending data: read()
- Pending data to send: write()
- Timeout
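Concurrency Step 1
Allowing address reuse: without SO_REUSEADDR, restarting the server can fail in bind() while connections from the previous run are still in the TIME_WAIT state.
int sock, opts = 1;
sock = socket(...); // To give you an idea of where the new code goes
setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &opts, sizeof(opts));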
Concurrency Step 2
Monitor sockets with select()
int select(int maxfd, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
- maxfd: max file descriptor + 1
- timeout: how long to wait without activity before returning
The Server
// socket() call and non-blocking code is above this point
if (bind(sockfd, (struct sockaddr *) &saddr, sizeof(saddr)) < 0) { // bind!
    printf("Error binding\n");
    ...
}
if (listen(sockfd, 5) < 0) { // listen for incoming connections
    printf("Error listening\n");
    ...
}
// Setup pool.read_set with an FD_ZERO() and FD_SET() for
// your server socket file descriptor. (whatever socket() returned)
while (1) {
    pool.ready_set = pool.read_set; // Save the current state
    // note: select() also overwrites pool.write_set; keep a saved copy of it too if you rely on it
    pool.nready = select(pool.maxfd+1, &pool.ready_set, &pool.write_set, NULL, NULL);
    if (FD_ISSET(sockfd, &pool.ready_set)) { // Check if there is an incoming conn
        clen = sizeof(caddr); // reset before every accept()
        isock = accept(sockfd, (struct sockaddr *) &caddr, &clen); // accept it
        add_client(isock, &pool); // add the client by the incoming socket fd
    }
    check_clients(&pool); // check if any data needs to be sent/received from clients
}
...
close(sockfd);
What is pool?
typedef struct { /* represents a pool of connected descriptors */
    int maxfd;                   /* largest descriptor in read_set */
    fd_set read_set;             /* set of all active read descriptors */
    fd_set write_set;            /* set of all active write descriptors */
    fd_set ready_set;            /* subset of descriptors ready for reading */
    int nready;                  /* number of ready descriptors from select */
    int maxi;                    /* highwater index into client array */
    int clientfd[FD_SETSIZE];    /* set of active descriptors */
    rio_t clientrio[FD_SETSIZE]; /* set of active read buffers */
    ... // ADD WHAT WOULD BE HELPFUL FOR PROJECT1
} pool;
int select(int maxfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
FD_CLR(int fd, fd_set *fds);    /* clear the bit for fd in fds */
FD_ISSET(int fd, fd_set *fds);  /* is the bit for fd in fds? */
FD_SET(int fd, fd_set *fds);    /* turn on the bit for fd in fds */
FD_ZERO(fd_set *fds);           /* clear all bits in fds */
- maxfds: highest-numbered descriptor to test, plus 1
- readfds: on return, the set of fds ready to read
- writefds: on return, the set of fds ready to write
- exceptfds: on return, the set of fds with exception conditions
int select(int maxfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
struct timeval {
    long tv_sec;   /* seconds */
    long tv_usec;  /* microseconds */
};
timeout:
- if NULL, wait forever and return only when one of the descriptors is ready for I/O
- otherwise, wait up to a fixed amount of time specified by timeout
- if we don't want to wait at all, create a timeout structure with the timer value equal to 0
/* create and bind s1 and s2 */
int maxfd = (s1 > s2) ? s1 : s2;   /* select() needs the highest fd + 1 */
while(1) {
    FD_ZERO(&readfds);      /* initialize the fd set */
    FD_SET(s1, &readfds);   /* add s1 to the fd set */
    FD_SET(s2, &readfds);   /* add s2 to the fd set */
    if(select(maxfd+1, &readfds, 0, 0, 0) < 0) {
        perror("select");
        exit(1);
    }
    if(FD_ISSET(s1, &readfds)) {
        recvfrom(s1, buf, sizeof(buf), ...);
        /* process buf */
    }
    /* do the same for s2 */
}
[Figure: protocol stack: TCP over IP over the Ethernet adapter]
int fd, next = 0;   /* original socket */
int newfd[10];      /* new socket descriptors */
int maxfd, n;
char buf[512];
while(1) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    maxfd = fd;
    /* Now use FD_SET to initialize other newfds that have already been returned by accept() */
    for (n = 0; n < next; n++) {
        FD_SET(newfd[n], &readfds);
        if (newfd[n] > maxfd) maxfd = newfd[n];
    }
    select(maxfd+1, &readfds, 0, 0, 0);
    if(FD_ISSET(fd, &readfds) && next < 10) {
        newfd[next++] = accept(fd, ...);
    }
    /* do the following for each descriptor newfd[n] */
    for (n = 0; n < next; n++) {
        if(FD_ISSET(newfd[n], &readfds)) {
            read(newfd[n], buf, sizeof(buf));
            /* process data */
        }
    }
}
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Type                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Address                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type: 4-byte integer
Length: 2-byte integer
Checksum: 2-byte integer
Address: 4-byte IP address
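struct packet {        /* the header layout shown above */
    uint32_t type;
    uint16_t length;
    uint16_t checksum;
    uint32_t address;
};
/* ================================================== */
char buf[1024];
struct packet *pkt;
pkt = (struct packet*) buf;   /* buf must be suitably aligned for struct packet */
pkt->type = htonl(1);
pkt->length = htons(2);
pkt->checksum = htons(3);
pkt->address = htonl(4);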
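Casting a char buffer to struct packet * relies on the buffer being aligned and on the compiler laying the struct out without padding. A more portable sketch, offered as an alternative technique rather than the slide's method, copies each field into the buffer with memcpy after converting it to network byte order. packet_pack is a hypothetical name.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-side view of the 12-byte header. */
struct packet {
    uint32_t type;
    uint16_t length;
    uint16_t checksum;
    uint32_t address;
};

/* Serialize p into buf field by field, in network byte order, without
 * depending on struct padding or buffer alignment. Returns 12. */
size_t packet_pack(const struct packet *p, uint8_t buf[12])
{
    uint32_t t = htonl(p->type), a = htonl(p->address);
    uint16_t l = htons(p->length), c = htons(p->checksum);
    memcpy(buf, &t, 4);        /* Type: bytes 0-3 */
    memcpy(buf + 4, &l, 2);    /* Length: bytes 4-5 */
    memcpy(buf + 6, &c, 2);    /* Checksum: bytes 6-7 */
    memcpy(buf + 8, &a, 4);    /* Address: bytes 8-11 */
    return 12;
}
```

With type=1, length=2, checksum=3 and address=4, the bytes on the wire are 00 00 00 01 00 02 00 03 00 00 00 04.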
#include <stdio.h>      /* for printf() and fprintf() */
#include <sys/socket.h> /* for socket(), connect(), sendto(), and recvfrom() */
#include <arpa/inet.h>  /* for sockaddr_in and inet_addr() */
#include <stdlib.h>     /* for atoi() and exit() */
#include <string.h>     /* for memset() and strlen() */
#include <unistd.h>     /* for close() */
#define ECHOMAX 255     /* Longest string to echo */

int main(int argc, char *argv[])
{
    int sock;                               /* Socket descriptor */
    struct sockaddr_in echoServAddr;        /* Echo server address */
    struct sockaddr_in fromAddr;            /* Source address of echo */
    unsigned short echoServPort = 7;        /* Echo server port */
    socklen_t fromSize;                     /* Address size for recvfrom() */
    char *servIP = "172.24.23.4";           /* IP address of server */
    char *echoString = "I hope this works"; /* String to send to echo server */
    char echoBuffer[ECHOMAX+1];             /* Buffer for receiving echoed string */
    int echoStringLen;                      /* Length of string to echo */
    int respStringLen;                      /* Length of received response */

    echoStringLen = strlen(echoString);

    /* Create a datagram/UDP socket */
    if ((sock = socket(AF_INET, SOCK_DGRAM, 0)) < 0) { perror("socket"); exit(1); }

    /* Construct the server address structure */
    memset(&echoServAddr, 0, sizeof(echoServAddr));    /* Zero out structure */
    echoServAddr.sin_family = AF_INET;                 /* Internet addr family */
    echoServAddr.sin_addr.s_addr = inet_addr(servIP);  /* Server IP address (already in network byte order) */
    echoServAddr.sin_port = htons(echoServPort);       /* Server port */

    /* Send the string to the server */
    if (sendto(sock, echoString, echoStringLen, 0, (struct sockaddr *) &echoServAddr, sizeof(echoServAddr)) != echoStringLen) {
        perror("sendto"); exit(1);
    }

    /* Recv a response */
    fromSize = sizeof(fromAddr);
    respStringLen = recvfrom(sock, echoBuffer, ECHOMAX, 0, (struct sockaddr *) &fromAddr, &fromSize);
    if (respStringLen < 0) { perror("recvfrom"); exit(1); }
    /* Error checks like packet is received from the same server */
    echoBuffer[respStringLen] = '\0';      /* null-terminate the received data */
    printf("Received: %s\n", echoBuffer);  /* Print the echoed arg */
    close(sock);
    exit(0);
} /* end of main () */
int main(int argc, char *argv[])
{
    int sock;                          /* Socket */
    struct sockaddr_in echoServAddr;   /* Local address */
    struct sockaddr_in echoClntAddr;   /* Client address */
    socklen_t cliAddrLen;              /* Length of client address */
    char echoBuffer[ECHOMAX];          /* Buffer for echo string */
    unsigned short echoServPort = 7;   /* Server port (ports below 1024 usually require root) */
    int recvMsgSize;                   /* Size of received message */

    /* Create socket for sending/receiving datagrams */
    sock = socket(AF_INET, SOCK_DGRAM, 0);

    /* Construct local address structure */
    memset(&echoServAddr, 0, sizeof(echoServAddr));           /* Zero out structure */
    echoServAddr.sin_family = AF_INET;                        /* Internet address family */
    echoServAddr.sin_addr.s_addr = inet_addr("172.24.23.4");  /* or htonl(INADDR_ANY) for any local interface */
    echoServAddr.sin_port = htons(echoServPort);              /* Local port */

    /* Bind to the local address */
    bind(sock, (struct sockaddr *) &echoServAddr, sizeof(echoServAddr));

    for (;;) /* Run forever */
    {
        cliAddrLen = sizeof(echoClntAddr);
        /* Block until receive message from a client */
        recvMsgSize = recvfrom(sock, echoBuffer, ECHOMAX, 0, (struct sockaddr *) &echoClntAddr, &cliAddrLen);
        printf("Handling client %s\n", inet_ntoa(echoClntAddr.sin_addr));
        /* Send received datagram back to the client */
        sendto(sock, echoBuffer, recvMsgSize, 0, (struct sockaddr *) &echoClntAddr, sizeof(echoClntAddr));
    }
} /* end of main () */
Error handling is a must: check the return values of socket(), bind(), recvfrom(), and sendto() (omitted here for brevity).
The setsockopt() function manipulates options associated with a socket. Options can exist at multiple protocol levels. However, the options are always present at the uppermost socket level. Options affect socket operations, such as the routing of packets, out-of-band data transfer, and so on.
The level argument specifies the protocol level at which the option resides. To set options at the socket level, specify the level argument as SOL_SOCKET. To set options at other levels, supply the appropriate protocol number for the protocol controlling the option. For example, to indicate that an option is interpreted by TCP (Transmission Control Protocol), set level to the protocol number of TCP.
The following options are supported for setsockopt():
- SO_DEBUG: provides the ability to turn on recording of debugging information. This option takes an int value in the optval argument. This is a BOOL option.
- SO_BROADCAST: permits sending of broadcast messages, if this is supported by the protocol. This option takes an int value in the optval argument. This is a BOOL option.
- SO_REUSEADDR: specifies that the rules used in validating addresses supplied to bind() should allow reuse of local addresses, if this is supported by the protocol. This option takes an int value in the optval argument. This is a BOOL option.
- SO_KEEPALIVE: keeps connections active by enabling periodic transmission of messages, if this is supported by the protocol. If the connected socket fails to respond to these messages, the connection is broken and processes writing to that socket are notified with an ENETRESET errno. This option takes an int value in the optval argument. This is a BOOL option.
- SO_LINGER: specifies whether the socket lingers on close() if data is present. If SO_LINGER is set, the system blocks the process during close() until it can transmit the data or until the end of the interval indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified and close() is issued, the system handles the call in a way that allows the process to continue as quickly as possible. This option takes a linger structure in the optval argument.
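Unlike the BOOL options, SO_LINGER takes a struct linger with two fields: l_onoff (nonzero enables lingering) and l_linger (the interval, in seconds on Linux). The sketch below sets the option and reads it back with getsockopt(); set_linger is a hypothetical helper, and the seconds unit is an assumption that holds on Linux but may differ on other systems.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Enable SO_LINGER with the given interval and read the option back.
 * Returns the l_linger value reported by getsockopt(), or -1 on error. */
int set_linger(int fd, int seconds)
{
    struct linger lg;
    socklen_t len = sizeof lg;

    lg.l_onoff = 1;            /* nonzero: close() may block ... */
    lg.l_linger = seconds;     /* ... for at most this interval */
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, &len) < 0 || !lg.l_onoff)
        return -1;
    return lg.l_linger;
}
```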
- SO_OOBINLINE: specifies whether the socket leaves received out-of-band data (data marked urgent) in line. This option takes an int value in the optval argument. This is a BOOL option.
- SO_SNDBUF: sets send buffer size information. This option takes an int value in the optval argument.
- SO_RCVBUF: sets receive buffer size information. This option takes an int value in the optval argument.
- SO_DONTROUTE: specifies whether outgoing messages bypass the standard routing facilities. The destination must be on a directly-connected network, and messages are directed to the appropriate network interface according to the destination address. The effect, if any, of this option depends on what protocol is in use. This option takes an int value in the optval argument. This is a BOOL option.
- TCP_NODELAY (set at level IPPROTO_TCP, not SOL_SOCKET): specifies whether the Nagle algorithm used by TCP for send coalescing is to be disabled. This option takes an int value in the optval argument. This is a BOOL option.
For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the option is enabled.
RETURN VALUES
If successful, setsockopt() returns zero. If a failure occurs, it returns -1 and sets errno to one of the following values:
- EBADF: s is not a valid descriptor
- ENOTSOCK: s is not a socket descriptor
- ENOPROTOOPT: optname is unknown at the indicated level
- EFAULT: optval is an invalid pointer
Sample usage:
int skt, sndsize, err;
err = setsockopt(skt, SOL_SOCKET, SO_SNDBUF, (char *)&sndsize, (socklen_t)sizeof(sndsize));
or:
int skt, rcvsize, err;
err = setsockopt(skt, SOL_SOCKET, SO_RCVBUF, (char *)&rcvsize, (socklen_t)sizeof(rcvsize));
int optval;
socklen_t optlen = sizeof(optval);   /* must hold the buffer size before getsockopt() */
char *optval2;
// set SO_REUSEADDR on a socket to true (1):
optval = 1;
setsockopt(s1, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof optval);
// bind a socket to a device name (Linux-specific; might not work on all systems):
optval2 = "eth1"; // 4 bytes long, so 4, below:
setsockopt(s2, SOL_SOCKET, SO_BINDTODEVICE, optval2, 4);
// see if the SO_BROADCAST flag is set:
getsockopt(s3, SOL_SOCKET, SO_BROADCAST, &optval, &optlen);
if (optval != 0) {
    printf("SO_BROADCAST enabled on s3!\n");
}
DESCRIPTION
The getsockopt() function retrieves the current value of a socket option associated with a socket of any type, in any state, and stores the result in optval. Options may exist at multiple protocol levels, but they are always present at the uppermost "socket" level. Options affect socket operations, such as the routing of packets, out-of-band data transfer, and so on.
The level argument specifies the protocol level at which the option resides. To retrieve options at the socket level, specify the level argument as SOL_SOCKET. To retrieve options at other levels, supply the appropriate protocol number for the protocol controlling the option. For example, to indicate that an option is to be interpreted by TCP (Transmission Control Protocol), set level to the protocol number of TCP.
The value associated with the selected option is returned in the buffer optval. The integer pointed to by optlen should originally contain the size of this buffer; on return, it is set to the size of the value returned. For SO_LINGER, this is the size of a struct linger; for most other options it is the size of an integer.
The application is responsible for allocating any memory space pointed to directly or indirectly by any of the parameters it specified. If an option has not been set with setsockopt(), getsockopt() returns the default value for the option.
SO_DEBUG Reports whether debugging information is being recorded. This option stores an int value in the optval argument. This is a BOOL option.
SO_ACCEPTCONN Reports whether socket listening is enabled. This option stores an int value in the optval argument. This is a BOOL option.
SO_BROADCAST Reports whether transmission of broadcast messages is supported, if this is supported by the protocol. This option stores an int value in the optval argument. This is a BOOL option.
SO_REUSEADDR Reports whether the rules used in validating addresses supplied to bind() should allow reuse of local addresses, if this is supported by the protocol. This option stores an int value in the optval argument. This is a BOOL option.
SO_KEEPALIVE Reports whether connections are kept active with periodic transmission of messages, if this is supported by the protocol. If the connected socket fails to respond to these messages, the connection is broken and processes writing to that socket are notified with an ENETRESET errno. This option stores an int value in the optval argument. This is a BOOL option.
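Since every option in the list above is a BOOL stored in an int, one loop can query all of them. The sketch below assumes a POSIX system; the table bool_opts and the function report_bool_options are hypothetical names. Reading SO_DEBUG is normally unprivileged even though setting it may need extra privileges on Linux.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Boolean SOL_SOCKET options from the list above, with printable names. */
static const struct {
    int opt;
    const char *name;
} bool_opts[] = {
    { SO_DEBUG,      "SO_DEBUG" },
    { SO_ACCEPTCONN, "SO_ACCEPTCONN" },
    { SO_BROADCAST,  "SO_BROADCAST" },
    { SO_REUSEADDR,  "SO_REUSEADDR" },
    { SO_KEEPALIVE,  "SO_KEEPALIVE" },
};

/* Hypothetical helper: print each boolean option as enabled/disabled and
 * return how many are enabled, or -1 if any query fails. */
int report_bool_options(int fd)
{
    int enabled = 0;
    for (size_t i = 0; i < sizeof bool_opts / sizeof bool_opts[0]; i++) {
        int val = 0;
        socklen_t len = sizeof val; /* reset for every call */
        if (getsockopt(fd, SOL_SOCKET, bool_opts[i].opt, &val, &len) == -1) {
            perror(bool_opts[i].name);
            return -1;
        }
        /* zero means disabled, any non-zero value means enabled */
        printf("%-14s %s\n", bool_opts[i].name, val ? "enabled" : "disabled");
        if (val)
            enabled++;
    }
    return enabled;
}
```

On a freshly created TCP socket all five are expected to be disabled; enabling SO_KEEPALIVE with setsockopt() or calling listen() (which sets SO_ACCEPTCONN) changes the report.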
SO_LINGER Reports whether the socket lingers on close() if data is present. If SO_LINGER is set, the system blocks the process during close() until it can transmit the data or until the end of the interval indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified, and close() is issued, the system handles the call in a way that allows the process to continue as quickly as possible. This option stores a linger structure in the optval argument.
SO_OOBINLINE Reports whether the socket leaves received out-of-band data (data marked urgent) in line. This option stores an int value in the optval argument. This is a BOOL option.
SO_SNDBUF Reports send buffer size information. This option stores an int value in the optval argument.
SO_RCVBUF Reports receive buffer size information. This option stores an int value in the optval argument.
SO_ERROR Reports information about error status and clears it. This option stores an int value in the optval argument.
SO_TYPE Reports the socket type. This option stores an int value in the optval argument.
SO_DONTROUTE Reports whether outgoing messages bypass the standard routing facilities. The destination must be on a directly-connected network, and messages are directed to the appropriate network interface according to the destination address. The effect, if any, of this option depends on what protocol is in use. This option stores an int value in the optval argument. This is a BOOL option.
SO_MAX_MSG_SIZE Maximum size of a message for message-oriented socket types (for example, SOCK_DGRAM). Has no meaning for stream-oriented sockets. This option stores an int value in the optval argument. (This is a Winsock option; it is not defined on most Unix systems.)
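SO_LINGER is the one option in this list that uses a struct linger instead of an int, and SO_ERROR is read-and-clear. This hedged sketch, assuming a POSIX system, shows both. The helper names set_linger, get_linger and pending_error are hypothetical.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: enable SO_LINGER so that close() blocks until queued
 * data is sent or `seconds` elapse. Returns 0 on success, -1 on error. */
int set_linger(int fd, int seconds)
{
    struct linger lg;
    lg.l_onoff = 1;        /* turn lingering on */
    lg.l_linger = seconds; /* maximum time close() may block */
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == -1) {
        perror("setsockopt(SO_LINGER)");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: copy the current linger setting into *out.
 * Returns 0 on success, -1 on error. */
int get_linger(int fd, struct linger *out)
{
    socklen_t len = sizeof *out;
    if (getsockopt(fd, SOL_SOCKET, SO_LINGER, out, &len) == -1) {
        perror("getsockopt(SO_LINGER)");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: fetch and clear the pending socket error (SO_ERROR).
 * Returns the errno value (0 if none was pending), or -1 if the query fails. */
int pending_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;
    return err;
}
```

pending_error() is commonly used after a non-blocking connect() completes, to learn whether the connection succeeded.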
TCP_NODELAY Specifies whether the Nagle algorithm used by TCP for send coalescing is disabled. Unlike the options above, it resides at the TCP level, so level must be IPPROTO_TCP rather than SOL_SOCKET. This option stores an int value in the optval argument. This is a BOOL option.
For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the option is enabled.
RETURN VALUES
If successful, getsockopt() returns 0. If a failure occurs, it returns -1 and sets errno to one of the following values:
EBADF The parameter s is not a valid descriptor.
ENOPROTOOPT The option is unknown at the level indicated.
ENOTSOCK The parameter s refers to a file, not a socket.
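Because TCP_NODELAY lives at the TCP level, both the header and the level argument differ from the SOL_SOCKET examples. A minimal sketch assuming a POSIX system, where TCP_NODELAY comes from <netinet/tcp.h>. The names set_nodelay and get_nodelay are hypothetical. The result is normalised with != 0 because some systems may return a non-1 flag value.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: on != 0 disables Nagle's algorithm (sends small
 * segments immediately); on == 0 re-enables it. Returns 0 or -1. */
int set_nodelay(int fd, int on)
{
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on) == -1) {
        perror("setsockopt(TCP_NODELAY)");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: 1 if TCP_NODELAY is set, 0 if not, -1 on error. */
int get_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1) {
        perror("getsockopt(TCP_NODELAY)");
        return -1;
    }
    return val != 0;
}
```

Disabling Nagle suits latency-sensitive request/response traffic; for bulk transfers the default coalescing usually reduces the number of small packets.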
int sockbufsize = 0;
socklen_t size = sizeof(sockbufsize);
err = getsockopt(skt, SOL_SOCKET, SO_RCVBUF, (char *)&sockbufsize, &size);
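The sample above stores the result in err but never inspects it. This sketch, assuming a POSIX system, wraps the same SO_RCVBUF query and maps the errno values listed under RETURN VALUES to messages. The function name rcvbuf_size is hypothetical. errno is saved and restored because fprintf() is allowed to change it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: return the receive buffer size of fd, or -1 with
 * errno left as set by getsockopt(). */
int rcvbuf_size(int fd)
{
    int sockbufsize = 0;
    socklen_t size = sizeof sockbufsize;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &sockbufsize, &size) == -1) {
        int saved = errno; /* fprintf below may overwrite errno */
        switch (saved) {
        case EBADF:
            fprintf(stderr, "fd %d is not a valid descriptor\n", fd);
            break;
        case ENOTSOCK:
            fprintf(stderr, "fd %d is not a socket\n", fd);
            break;
        case ENOPROTOOPT:
            fprintf(stderr, "option unknown at this level\n");
            break;
        default:
            fprintf(stderr, "getsockopt: %s\n", strerror(saved));
            break;
        }
        errno = saved;
        return -1;
    }
    return sockbufsize;
}
```

Calling it on a descriptor that is not open should report EBADF, and calling it on a regular file's descriptor should report ENOTSOCK.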