Internetworking TCP/IP
DOUGLAS E. COMER
and
DAVID L. STEVENS
Contents
1 Introduction And Overview
1.1 Use Of TCP/IP
1.2 Designing Applications For A Distributed Environment
1.3 Standard And Nonstandard Application Protocols
1.4 An Example Of Standard Application Protocol Use
1.5 An Example Connection
1.6 Using TELNET To Access An Alternative Service
1.7 Application Protocols And Software Flexibility
1.8 Viewing Services From The Provider's Perspective
1.9 The Remainder Of This Text
1.10 Summary
2 The Client Server Model And Software Design
2.1 Introduction
2.2 Motivation
2.3 Terminology And Concepts
2.4 Summary
3 Concurrent Processing In Client-Server Software
3.1 Introduction
3.2 Concurrency In Networks
3.3 Concurrency In Servers
3.4 Terminology And Concepts
3.5 An Example Of Concurrent Process Creation
3.6 Executing New Code
3.7 Context Switching And Protocol Software Design
3.8 Concurrency And Asynchronous I/O
3.9 Summary
4 Program Interface To Protocols
4.1 Introduction
4.2 Loosely Specified Protocol Software Interface
4.3 Interface Functionality
4.4 Conceptual Interface Specification
4.5 System Calls
4.6 Two Basic Approaches To Network Communication
4.7 The Basic I/O Functions Available In UNIX
4.8 Using UNIX I/O With TCP/IP
4.9 Summary
5 The Socket Interface
5.1 Introduction
5.2 Berkeley Sockets
5.3 Specifying A Protocol Interface
5.4 The Socket Abstraction
5.4.2 System Data Structures For Sockets
5.5 Specifying An Endpoint Address
5.6 A Generic Address Structure
5.7 Major System Calls Used With Sockets
5.8 Utility Routines For Integer Conversion
5.9 Using Socket Calls In A Program
5.10 Symbolic Constants For Socket Call Parameters
5.11 Summary
6 Algorithms And Issues In Client Software Design
6.1 Introduction
6.2 Learning Algorithms Instead Of Details
6.3 Client Architecture
6.4 Identifying The Location Of A Server
6.5 Parsing An Address Argument
6.6 Looking Up A Domain Name
6.7 Looking Up A Well-Known Port By Name
6.8 Port Numbers And Network Byte Order
6.9 Looking Up A Protocol By Name
6.10 The TCP Client Algorithm
6.11 Allocating A Socket
6.12 Choosing A Local Protocol Port Number
6.13 A Fundamental Problem In Choosing A Local IP Address
6.14 Connecting A TCP Socket To A Server
6.15 Communicating With The Server Using TCP
6.16 Reading A Response From A TCP Connection
6.17 Closing A TCP Connection
6.17.1 The Need For Partial Close
6.17.2 A Partial Close Operation
6.18 Programming A UDP Client
6.19 Connected And Unconnected UDP Sockets
6.20 Using Connect With UDP
6.21 Communicating With A Server Using UDP
6.22 Closing A Socket That Uses UDP
6.23 Partial Close For UDP
6.24 A Warning About UDP Unreliability
6.25 Summary
7 Example Client Software
7.1 Introduction
7.2 The Importance Of Small Examples
7.3 Hiding Details
7.4 An Example Procedure Library For Client Programs
7.5 Implementation Of ConnectTCP
7.6 Implementation Of ConnectUDP
7.7 A Procedure That Forms Connections
7.8 Using The Example Library
7.9 The DAYTIME Service
7.10 Implementation Of A TCP Client For DAYTIME
7.11 Reading From A TCP Connection
7.12 The TIME Service
7.13 Accessing The TIME Service
7.14 Accurate Times And Network Delays
7.15 A UDP Client For The TIME Service
7.16 The ECHO Service
7.17 A TCP Client For The ECHO Service
7.18 A UDP Client For The ECHO Service
7.19 Summary
8 Algorithms And Issues In Server Software Design
8.1 Introduction
8.2 The Conceptual Server Algorithm
8.3 Concurrent Vs. Iterative Servers
8.4 Connection-Oriented Vs. Connectionless Access
8.5 Connection-Oriented Servers
8.6 Connectionless Servers
8.7 Failure, Reliability, And Statelessness
8.8 Optimizing Stateless Servers
8.9 Four Basic Types Of Servers
8.10 Request Processing Time
8.11 Iterative Server Algorithms
8.12 An Iterative, Connection-Oriented Server Algorithm
8.13 Binding To A Well-Known Address Using INADDR_ANY
8.14 Placing The Socket In Passive Mode
8.15 Accepting Connections And Using Them
8.16 An Iterative, Connectionless Server Algorithm
8.17 Forming A Reply Address In A Connectionless Server
8.18 Concurrent Server Algorithms
8.19 Master And Slave Processes
8.20 A Concurrent, Connectionless Server Algorithm
8.21 A Concurrent, Connection-Oriented Server Algorithm
8.22 Using Separate Programs As Slaves
8.23 Apparent Concurrency Using A Single Process
8.24 When To Use Each Server Type
8.25 A Summary of Server Types Iterative, Connectionless Server
8.26 The Important Problem Of Server Deadlock
8.27 Alternative Implementations
8.28 Summary
9 Iterative, Connectionless Servers (UDP)
9.1 Introduction
9.2 Creating A Passive Socket
9.3 Process Structure
9.4 An Example TIME Server
9.5 Summary
10 Iterative, Connection-Oriented Servers (TCP)
10.1 Introduction
10.2 Allocating A Passive TCP Socket
10.3 A Server For The DAYTIME Service
10.4 Process Structure
10.5 An Example DAYTIME Server
10.6 Closing Connections
10.7 Connection Termination And Server Vulnerability
10.8 Summary
11 Concurrent, Connection-Oriented Servers (TCP)
11.1 Introduction
11.2 Concurrent ECHO
11.3 Iterative Vs. Concurrent Implementations
11.4 Process Structure
11.5 An Example Concurrent ECHO Server
11.6 Cleaning Up Errant Processes
11.7 Summary
12 Single-Process, Concurrent Servers (TCP)
12.1 Introduction
12.2 Data-driven Processing In A Server
12.3 Data-Driven Processing With A Single Process
12.4 Process Structure Of A Single-Process Server
12.5 An Example Single-Process ECHO Server
12.6 Summary
for services like file transfer, remote login, and electronic mail. Thus, a programmer would use a standard protocol for such services.
where the argument machine denotes the domain name of the machine to which remote login access is desired. Thus, to form a TELNET connection to machine nic.ddn.mil a user types:
telnet nic.ddn.mil
From the user's point of view, running telnet converts the user's terminal into a terminal that connects directly to the remote system. If the user is running in a windowing environment, the window in which the telnet command has been executed will be connected to the remote machine. Once the connection has been established, the telnet application sends each character the user types to the remote machine, and displays each character the remote machine emits on the user's screen. After a user invokes telnet and connects to a remote system, the remote system displays a prompt that requests the user to type a login identifier and a password. The prompt a machine presents to a remote user is identical to the prompt it presents to users who log in on local terminals. Thus, TELNET provides each remote user with the illusion of being on a directly-connected terminal.
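The two-way character relay described above can be sketched in a few lines. The sketch below is only an illustration written in Python: the function name relay and its parameters are hypothetical, both endpoints are assumed to be connected sockets (a real telnet program reads a terminal rather than a socket), and the option negotiation that the TELNET protocol performs is omitted entirely.

```python
import selectors
import socket


def relay(user, remote, chunk=1024):
    """Copy bytes in both directions until either side closes.

    `user` and `remote` are connected sockets (an assumption made for
    this sketch; a real telnet client reads the user's terminal).
    """
    sel = selectors.DefaultSelector()
    # The data attached to each registration is the socket that should
    # receive whatever arrives on the registered socket.
    sel.register(user, selectors.EVENT_READ, remote)   # typed input goes to remote
    sel.register(remote, selectors.EVENT_READ, user)   # remote output goes to user
    try:
        while True:
            for key, _events in sel.select():
                data = key.fileobj.recv(chunk)
                if not data:        # an empty read means that side closed
                    return
                key.data.sendall(data)
    finally:
        sel.close()
```

Waiting on both sockets at once lets the relay forward whichever side has data ready, so neither the user nor the remote machine must wait for the other to speak first.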
login:
The initial output message, Trying... appears while the telnet program converts the machine name to an IP address and tries to make a valid TCP connection to that address. As soon as the connection has been established, telnet prints the second and third lines, telling the user that the connection attempt has succeeded and identifying a special character that the user can type to escape from the telnet application temporarily if needed (e.g., if a failure occurs and the user needs to abort the connection). The notation ^] means that the user must hold the CONTROL key while striking the right bracket key.
The last few lines of output come from the remote machine. They identify the operating system as SunOS, and provide a standard login prompt. The cursor stops after the login: message, waiting for the user to type a valid login identifier. The user must have an account on the remote machine for the TELNET session to continue. After the user types a valid login identifier, the remote machine prompts for a password, and only permits access if the login identifier and password are valid.
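The first step described above, in which the program converts a machine name to an IP address and then tries to form a TCP connection, can be illustrated with a short sketch. It is written in Python using the standard socket module; the function names resolve and connect_tcp, the timeout value, and the wording of the printed messages are assumptions chosen for illustration and are not taken from any telnet implementation.

```python
import socket


def resolve(host, port):
    """Return (family, sockaddr) candidates for a TCP connection to host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]


def connect_tcp(host, port, timeout=10.0):
    """Try each resolved address in turn; return the first connected socket."""
    last_error = None
    for family, sockaddr in resolve(host, port):
        print("Trying %s..." % sockaddr[0])
        sock = socket.socket(family, socket.SOCK_STREAM)
        sock.settimeout(timeout)          # bound the time spent on one address
        try:
            sock.connect(sockaddr)
        except OSError as err:
            last_error = err              # remember the failure, try the next address
            sock.close()
            continue
        sock.settimeout(None)             # return to blocking mode for later I/O
        print("Connected to %s." % host)
        return sock
    raise last_error or OSError("no usable address for %r" % host)
```

A caller would invoke connect_tcp with a machine name and a port number, for example the well-known TELNET port 23, and then read and write the returned socket. A name can resolve to several addresses, which is why the sketch loops over the candidates instead of giving up after the first failure.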
the telnet program will form a connection to protocol port number 185 at machine cnri.reston.va.us. The machine is owned by the Corporation For National Research Initiatives (CNRI). Port 185 on the machine at CNRI does not supply remote login service. Instead, it prints information about a recent change in the service offered, and then closes the connection.
telnet cnri.reston.va.us 185
Trying...
Connected to cnri.reston.va.us.
Escape character is '^]'.
******NOTICE******
The KIS client program has been moved from this machine to info.cnri.reston.va.us (132.151.1.15) on port 185.
******************
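The command form used in this example, a machine name followed by an optional port number, suggests how a client might interpret its arguments. The Python sketch below is a hypothetical illustration: the function parse_target and the constant TELNET_PORT are names invented here, and only numeric ports are handled, although real telnet programs may also accept service names.

```python
TELNET_PORT = 23  # well-known TCP port assigned to the TELNET service


def parse_target(args):
    """Interpret arguments in the form: machine [port].

    Returns a (host, port) pair, using the standard TELNET port when
    the caller does not name one.
    """
    if not 1 <= len(args) <= 2:
        raise ValueError("usage: telnet machine [port]")
    host = args[0]
    port = int(args[1]) if len(args) == 2 else TELNET_PORT
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return host, port
```

With this convention, omitting the second argument selects remote login service, while supplying 185 selects whatever service the remote machine offers on that port.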
Contacting port 185 on machine info.cnri.reston.va.us allows one to access the Knowbot Information Service. After a connection succeeds, the user receives information about the service followed by a prompt for Knowbot commands:
Trying...
Connected to info.cnri.reston.va.us.
Escape character is '^]'.
Knowbot Information Service
KIS Client (V2.0). Copyright CNRI 1990. All Rights Reserved.
KIS searches various Internet directory services to find someone's street address, email address and phone number.
Type 'man' at the prompt for a complete reference with examples.
Type 'help' for a quick reference to commands.
Type 'news' for information about recent changes.
Backspace characters are '^H' or DEL
Please enter your email address in our guest book...
(Your email address?) >
The first three lines are the same as in the example above because they come from the telnet program and not the remote service. The remaining lines differ, and clearly show that the service available on port 185 is not a remote login service. The greater-than symbol on the last line serves as the prompt for Knowbot commands. The Knowbot service searches well-known white pages directories to help a user find information about another user. For example, suppose one wanted to know the email address for David Clark, a researcher at MIT. Typing clark in response to the Knowbot prompt retrieves over 675 entries that each contain the name Clark. Most of the entries correspond to individuals with a first or last name of Clark, but some correspond to individuals with Clark in their affiliation (e.g., Clark College). Searching through the retrieved information reveals only one entry for a David Clark at MIT:
(617)253-6003
The TELNET protocol provides incredible flexibility because it only defines interactive communication and not the details of the service accessed. TELNET can be used as the communication mechanism for many interactive services besides remote login.
level of concurrency and consider whether their software will exhibit higher throughput if they increase or decrease the level of concurrency. This text helps application programmers understand the design, construction, and optimization of network application software that uses concurrent processing. It describes the fundamental algorithms for both sequential and concurrent implementations of application protocols and provides an example of each. It considers the tradeoffs and advantages of each design. Later chapters discuss the subtleties of concurrency management and review techniques that permit a programmer to optimize throughput automatically. To summarize:
Providing concurrent access to application services is important and difficult; many chapters of this text explain and discuss concurrent implementations of application protocol software.
1.10 Summary
Many programmers are building distributed applications that use TCP/IP as a transport mechanism. Before programmers can design and implement a distributed application, they need to understand the client-server model of computing, the operating system interface an application program uses to access protocol software, the fundamental algorithms used to implement client and server software, and alternatives to standard client-server interaction including the use of application gateways.
Most network services permit multiple users to access the service simultaneously. The technique of concurrent processing makes it possible to build an application program that can handle multiple requests at the same time. Much of this text focuses on techniques for the concurrent implementation of application protocols and on the problem of managing concurrency.
FOR FURTHER STUDY
The manuals that vendors supply with their operating systems contain information on how to invoke commands that access services like TELNET. Many sites augment the set of standard commands with locally-defined commands. Check with your site administrator to find out about locally-available commands.
EXERCISES
1.1 Use TELNET from your local machine to login to another machine. How much delay, if any, do you experience when the second machine connects to the same local area network? How much delay do you notice when connected to a remote machine?
1.2 Read the vendor's manual to find out whether your local version of the TELNET software permits connection to a port on the remote machine other than the standard port used for remote login.
1.3 Determine the set of TCP/IP services available on your local computer.
1.4 Use an FTP program to retrieve a file from a remote site. If the software does not provide statistics, estimate the transfer rate for a large file. Is the rate higher or lower than you expected?
1.5 Use the finger command to obtain information about users at a remote site.
2.2 Motivation
The fundamental motivation for the client-server paradigm arises from the problem of rendezvous. To understand the problem, imagine a human trying to start two programs on separate machines and have them communicate. Also remember that computers operate many orders of magnitude faster than humans. After the human initiates the first program, the program begins execution and sends a message to its peer. Within a few milliseconds, it determines that the peer does not yet exist, so it emits an error message and exits. Meanwhile, the human initiates the second program. Unfortunately, when the second program starts execution, it finds that the peer has already ceased execution. Even if the two programs retry to communicate continually, they can each execute so quickly that the probability of them sending messages to one another simultaneously is low. The client-server model solves the rendezvous problem by asserting that in any pair of communicating applications, one side must start execution and wait (indefinitely) for the other side to contact it. The solution is important because TCP/IP does not respond to incoming communication requests on its own. Because TCP/IP does not provide any mechanisms that automatically create running programs when a message arrives, a program must be waiting to accept communication before any requests arrive. Thus, to ensure that computers are ready to communicate, most system administrators arrange to have communication programs start automatically whenever the operating system boots. Each program runs forever, waiting for the next request to arrive for the service it offers.
The client-server paradigm uses the direction of initiation to categorize whether a program is a client or server. In general, an application that initiates peer-to-peer communication is called a client. End users usually invoke client software when they use a network service. Most client software consists of conventional application programs. Each time a client application executes, it contacts a server, sends a request, and awaits a response. When the response arrives, the client continues processing. Clients are often easier to build than servers, and usually require no special system privileges to operate.
By comparison, a server is any program¹ that waits for incoming communication requests from a client. The server receives a client's request, performs the necessary computation, and returns the result to the client.
2.3.2 Privilege And Complexity
Because servers often need to access data, computations, or protocol ports that the operating system protects, server software usually requires special system privileges. Because a server executes with special system privilege, care must be taken to ensure that it does not inadvertently pass privileges on to the clients that use it. For example, a file server that operates as a privileged program must contain code to check whether a given file can be accessed by a given client. The server cannot rely on the usual operating system checks because its privileged status overrides them.
Servers must contain code that handles the issues of:
Authentication - verifying the identity of the client
Authorization - determining whether a given client is permitted to access the service the server supplies
Data security - guaranteeing that data is not unintentionally revealed or compromised
Privacy - keeping information about an individual from unauthorized access
Protection - guaranteeing that network applications cannot abuse system resources
As we will see in later chapters, servers that perform intense computation or handle large volumes of data operate more efficiently if they handle requests concurrently. The combination of special privileges and concurrent operation usually makes servers more difficult to design and implement than clients. Later chapters provide many examples that illustrate the differences between clients and servers.
2.3.3 Standard Vs. Nonstandard Client Software
Chapter 1 describes two broad classes of client application programs: those that invoke standard TCP/IP services (e.g., electronic mail) and those that invoke services defined by the site (e.g., an institution's private database system). Standard application services consist of those services defined by TCP/IP and assigned well-known, universally recognized protocol port identifiers; we consider all others to be locally-defined application services or nonstandard application services.
The distinction between standard services and others is only important when communicating outside the local environment. Within a given environment, system administrators usually arrange to define service names in such a way that users cannot distinguish between local and standard services. Programmers who build network applications that will be used at other sites must understand the distinction, however, and must be careful to avoid depending on services that are only available locally.
Although TCP/IP defines many standard application protocols, most commercial computer vendors supply only a handful of standard application client programs with their TCP/IP software.
For example, TCP/IP software usually includes a remote terminal client that uses the standard TELNET protocol for remote login, an electronic mail client that uses the standard SMTP protocol to transfer electronic mail to a remote system, a file transfer client that uses the standard FTP protocol to transfer files between two machines, and a Web browser that uses the standard HTTP protocol to access Web documents. Of course, many organizations build customized applications that use TCP/IP to communicate. Customized, nonstandard applications range from simple to complex, and include such diverse services as image transmission and video teleconferencing,
1 Technically, a server is a program and not a piece of hardware. However, computer users frequently (mis)apply the term to the computer responsible for running a particular server program. For example, they might say, "That computer is our file server," when they mean, "That computer runs our file server program."
voice transmission, remote real-time data collection, hotel and other on-line reservation systems, distributed database access, weather data distribution, and remote control of ocean-based drilling platforms.
2.3.4 Parameterization Of Clients
Some client software provides more generality than others. In particular, some client software allows the user to specify both the remote machine on which a server operates and the protocol port number at which the server is listening. For example, Chapter 1 shows how standard application client software can use the TELNET protocol to access services other than the conventional TELNET remote terminal service, as long as the program allows the user to specify a destination protocol port as well as a remote machine. Conceptually, software that allows a user to specify a protocol port number has more input parameters than other software, so we use the term fully parameterized client to describe it.
Many TELNET client implementations interpret an optional second argument as a port number. To specify only a remote machine, the user supplies the name of the remote machine:
telnet machine-name
Given only a machine name, the telnet program uses the well-known port for the TELNET service. To specify both a remote machine and a port on that machine, the user specifies both the machine name and the port number:
telnet machine-name port
Not all vendors provide full parameterization for their client application software. Therefore, on some systems, it may be difficult or impossible to use any port other than the official TELNET port. In fact, it may be necessary to modify the vendor's TELNET client software or to write new TELNET client software that accepts a port argument and uses that port. Of course, when building client software, full parameterization is recommended.
When designing client application software, include parameters that allow the user to fully specify the destination machine and destination protocol port number.
Full parameterization is especially useful when testing a new client or server because it allows testing to proceed independent of the existing software already in use. For example, a programmer can build a TELNET client and server pair, invoke them using nonstandard protocol ports, and proceed to test the software without disturbing standard services. Other users can continue to access the old TELNET service without interference during the testing.
2.3.5 Connectionless Vs. Connection-Oriented Servers
When programmers design client-server software, they must choose between two types of interaction: a connectionless style or a connection-oriented style. The two styles of interaction correspond directly to the two major transport protocols that the TCP/IP protocol suite supplies. If the client and server communicate using UDP, the interaction is connectionless; if they use TCP, the interaction is connection-oriented.
From the application programmer's point of view, the distinction between connectionless and connection-oriented interactions is critical because it determines the level of reliability that the underlying system provides. TCP provides all the reliability needed to communicate across an internet. It verifies that data arrives, and automatically retransmits segments that do not. It computes a checksum over the data to guarantee that it is not corrupted during transmission. It uses sequence numbers to ensure that the data arrives in order, and automatically eliminates duplicate packets. It provides flow control to ensure that the
sender does not transmit data faster than the receiver can consume it. Finally, TCP informs both the client and server if the underlying network becomes inoperable for any reason.
By contrast, clients and servers that use UDP do not have any guarantees about reliable delivery. When a client sends a request, the request may be lost, duplicated, delayed, or delivered out of order. Similarly, a response the server sends back to a client may be lost, duplicated, delayed, or delivered out of order. The client and/or server application programs must take appropriate actions to detect and correct such errors.
UDP can be deceiving because it provides best-effort delivery. UDP does not introduce errors - it merely depends on the underlying IP internet to deliver packets. IP, in turn, depends on the underlying hardware networks and intermediate gateways. From a programmer's point of view, the consequence of using UDP is that it works well if the underlying internet works well. For example, UDP works well in a local environment because reliability errors seldom occur in a local environment. Errors usually arise only when communication spans a wide area internet.
Programmers sometimes make the mistake of choosing connectionless transport (i.e., UDP), building an application that uses it, and then testing the application software only on a local area network. Because a local area network seldom or never delays packets, drops them, or delivers them out of order, the application software appears to work well. However, if the same software is used across a wide area internet, it may fail or produce incorrect results.
Beginners, as well as most experienced professionals, prefer to use the connection-oriented style of interaction. A connection-oriented protocol makes programming simpler, and relieves the programmer of the responsibility to detect and correct errors.
In fact, adding reliability to a connectionless internet message protocol like UDP is a nontrivial undertaking that usually requires considerable experience with protocol design. Usually, application programs only use UDP if: (1) the application protocol specifies that UDP must be used (presumably, the application protocol has been designed to handle reliability and delivery errors), (2) the application protocol relies on hardware broadcast or multicast, or (3) the application cannot tolerate the computational overhead or delay required for TCP virtual circuits. We can summarize:
When designing client-server applications, beginners are strongly advised to use TCP because it provides reliable, connection-oriented communication. Programs only use UDP if the application protocol handles reliability, the application requires hardware broadcast or multicast, or the application cannot tolerate virtual circuit overhead.
2.3.6 Stateless Vs. Stateful Servers
Information that a server maintains about the status of ongoing interactions with clients is called state information. Servers that do not keep any state information are called stateless servers; others are called stateful servers.
The desire for efficiency motivates designers to keep state information in servers. Keeping a small amount of information in a server can reduce the size of messages that the client and server exchange, and can allow the server to respond to requests quickly. Essentially, state information allows a server to remember what the client requested previously and to compute an incremental response as each new request arrives. By contrast, the motivation for statelessness lies in protocol reliability: state information in a server can become incorrect if messages are lost, duplicated, or delivered out of order, or if the client computer crashes and reboots. If the server uses incorrect state information when computing a response, it may respond incorrectly.
2.3.7 A Stateful File Server Example
An example will help explain the distinction between stateless and stateful servers. Consider a file server that allows clients to remotely access information kept in the files on a local disk. The server operates as an application program. It waits for a client to contact it over the network. The client sends one of two request types. It either sends a request to extract data from a specified file or a request to store data in a specified file. The server performs the requested operation and replies to the client.
On one hand, if the file server is stateless, it maintains no information about the transactions. Each message from a client that requests the server to extract data from a file must specify the complete file name (the name could be quite lengthy), a position in the file from which the data should be extracted, and the number of bytes to extract. Similarly, each message that requests the server to store data in a file must specify the complete file name, a position in the file at which the data should be stored, and the data to store. On the other hand, if the file server maintains state information for its clients, it can eliminate the need to pass file names in each message. The server maintains a table that holds state information about the file currently being accessed. Figure 2.1 shows one possible arrangement of the state information.
When a client first opens a file, the server adds an entry to its state table that contains the name of the file, a handle (a small integer used to identify the file), and a current position in the file (initially zero). The server then sends the handle back to the client for use in subsequent requests. Whenever the client wants to extract additional data from the file, it sends a small message that includes the handle. The server uses the handle to look up the file name and current file position in its state table. The server increments the file position in the state table, so the next request from the client will extract new data. Thus, the client can send repeated requests to move through the entire file. When the client finishes using a file, it sends a message informing the server that the file will no longer be needed. In response, the server removes the stored state information. As long as all messages travel reliably between the client and server, a stateful design makes the interaction more efficient. The point is:
In an ideal world, where networks deliver all messages reliably and computers never crash, having a server maintain a small amount of state information for each ongoing interaction can make messages smaller and processing simpler.
Although state information can improve efficiency, it can also be difficult or impossible to maintain correctly if the underlying network duplicates, delays, or delivers messages out of order (e.g., if the client and server use UDP to communicate). Consider what happens to our file server example if the network duplicates a read request. Recall that the server maintains a notion of file position in its state information. Assume that the server updates its notion of file position each time a client extracts data from a file. If the network duplicates a read request, the server will receive two copies. When the first copy arrives, the server extracts data from the file, updates the file position in its state information, and returns the result to the client. When the second copy arrives, the server extracts additional data, updates the file position again, and returns the new data to the client. The client may view the second response as a duplicate and discard it, or it may report an error because it received two different responses to a single request. In either case, the state information at the server can become incorrect because it disagrees with the client's notion of the true state. When computers reboot, state information can also become incorrect. If a client crashes after performing an operation that creates additional state information, the server may never receive messages that allow it to discard the information. Eventually, the accumulated state information exhausts the server's memory. In our file server example, if a client opens 100 files and then crashes, the server will maintain 100 useless entries in its state table forever. A stateful server may also become confused (or respond incorrectly) if a new client begins operation after a reboot using the same protocol port numbers as the previous client that was operating when the system crashed. 
It may seem that this problem can be overcome easily by having the server erase previous information from a client whenever a new request for interaction
arrives. Remember, however, that the underlying internet may duplicate and delay messages, so any solution to the problem of new clients reusing protocol ports after a reboot must also handle the case where a client starts normally, but its first message to a server becomes duplicated and one copy is delayed. In general, the problems of maintaining correct state can only be solved with complex protocols that accommodate the problems of unreliable delivery and computer system restart. To summarize:
In a real internet, where machines crash and reboot, and messages can be lost, delayed, duplicated, or delivered out of order, stateful designs lead to complex application protocols that are difficult to design, understand, and program correctly.
2.3.8 Statelessness Is A Protocol Issue
Although we have discussed statelessness in the context of servers, the question of whether a server is stateless or stateful centers on the application protocol more than the implementation. If the application protocol specifies that the meaning of a particular message depends in some way on previous messages, it may be impossible to provide a stateless interaction.
In essence, the issue of statelessness focuses on whether the application protocol assumes the responsibility for reliable delivery. To avoid problems and make the interaction reliable, an application protocol designer must ensure that each message is completely unambiguous. That is, a message cannot depend on being delivered in order, nor can it depend on previous messages having been delivered. In essence, the protocol designer must build the interaction so the server gives the same response no matter when or how many times a request arrives. Mathematicians use the term idempotent to refer to a mathematical operation that produces the same result whether it is applied once or many times. We use the term to refer to protocols that arrange for a server to give the same response to a given message no matter how many times it arrives.
In an internet where the underlying network can duplicate, delay or deliver messages out of order or where computers running client applications can crash unexpectedly, the server should be stateless. The server can only be stateless if the application protocol is designed to make operations idempotent.
2.3.9 Servers As Clients
Programs do not always fit exactly into the definition of client or server. A server program may need to access network services that require it to act as a client. For example, suppose our file server program needs to obtain the time of day so it can stamp files with the time of access. Also suppose that the system on which it operates does not have a time-of-day clock. To obtain the time, the server acts as a client by sending a request to a time-of-day server as Figure 2.2 shows.
In a network environment that has many available servers, it is not unusual to find a server for one application acting as a client for another. Of course, designers must be careful to avoid circular dependencies among servers.
2.4 Summary
The client-server paradigm classifies a communicating application program as either a client or a server depending on whether it initiates communication. In addition to client and server software for standard applications, many TCP/IP users build client and server software for nonstandard applications that they define locally.
Beginners and most experienced programmers use TCP to transport messages between the client and server because it provides the reliability needed in an internet environment. Programmers only resort to UDP if TCP cannot solve the problem.
Keeping state information in the server can improve efficiency. However, if clients crash unexpectedly or the underlying transport system allows duplication, delay, or packet loss, state information can consume resources or become incorrect. Thus, most application protocol designers try to minimize state information. A stateless implementation may not be possible if the application protocol fails to make operations idempotent.
Programs cannot be divided easily into client and server categories because many programs perform both functions. A program that acts as a server for one service can act as a client to access other services.
FOR FURTHER STUDY
Stevens [1990] briefly describes the client-server model and gives UNIX examples. Other examples can be found by consulting applications that accompany various vendors' operating systems.
EXERCISES
2.1 Which of your local implementations of standard application clients are fully parameterized? Why is full parameterization needed?
2.2 Are standard application protocols like TELNET, FTP, SMTP, and NFS (Network File System) connectionless or connection-oriented?
2.3 What does TCP/IP specify should happen if no server exists when a client request arrives? (Hint: look at ICMP.) What happens on your local system?
2.4 Write down the data structures and message formats needed for a stateless file server. What happens if two or more clients access the same file? What happens if a client crashes before closing a file?
2.5 Write down the data structures and message formats needed for a stateful file server. Use the operations open, read, write, and close to access files. Arrange for open to return an integer used to access the file in read and write operations. How do you distinguish duplicate open requests from a client that sends an open, crashes, reboots, and sends an open again?
2.6 In the previous exercise, what happens in your design if two or more clients access the same file? What happens if a client crashes before closing a file?
2.7 Examine the NFS remote file access protocol carefully to identify which operations are idempotent. What errors can result if messages are lost, duplicated, or delayed?
In addition to concurrency among clients on a single machine, the set of all clients on a set of machines can execute concurrently. Figure 3.1 illustrates concurrency among client programs running on several machines. Client software does not usually require any special attention or effort on the part of the programmer to make it usable concurrently. The application programmer designs and constructs each client program without regard to concurrent execution; concurrency among multiple client programs occurs automatically because the operating system allows multiple users to each invoke a client concurrently. Thus, the individual clients operate much like any conventional program. To summarize:
Most client software achieves concurrent operation because the underlying operating system allows users to execute client programs concurrently or because users on many machines each execute client software simultaneously. An individual client program operates like any conventional program; it does not manage concurrency explicitly.
Chapter 8 discusses algorithms and design issues for concurrent servers, showing how they operate in principle. Chapters 9 through 13 each illustrate one of the algorithms, describing the design in more detail and showing code for a working server. The remainder of this chapter concentrates on terminology and basic concepts used throughout the text.
Some systems use the terms task, job, or thread instead of process.
Of course, on a uniprocessor architecture, the single CPU can only execute one process at any instant in time. The operating system makes the computer appear to perform more than one computation at a time by switching the CPU among all executing processes rapidly. From a human observer's point of view, many processes appear to proceed simultaneously. In fact, one process proceeds for a short time, then another process proceeds for a short time, and so on. We use the term concurrent execution to capture the idea. It means "apparently simultaneous execution." On a uniprocessor, the operating system handles concurrency, while on a multiprocessor, all CPUs can execute processes simultaneously. The important concept is:
Application programmers build programs for a concurrent environment without knowing whether the underlying hardware consists of a uniprocessor or a multiprocessor.
3.4.2 Programs vs. Processes
In a concurrent processing system, a conventional application program is merely a special case: it consists of a piece of code that is executed by exactly one process at a time. The notion of process differs from the conventional notion of program in other ways. For example, most application programmers think of the set of variables defined in the program as being associated with the code. However, if more than one process executes the code concurrently, it is essential that each process has its own copy of the variables. To understand why, consider the following segment of C code that prints the integers from 1 to 10:
    for (i=1 ; i<=10 ; i++)
        printf("%d\n", i);
The iteration uses an index variable, i. In a conventional program, the programmer thinks of storage for variable i as being allocated with the code. However, if two or more processes execute the code segment concurrently, one of them may be on the sixth iteration when the other starts the first iteration. Each must have a different value for i. Thus, each process must have its own copy of variable i or confusion will result. To summarize:
When multiple processes execute a piece of code concurrently, each process has its own, independent copy of the variables associated with the code.
3.4.3 Procedure Calls
In a procedure-oriented language, like Pascal or C, executed code can contain calls to subprograms (procedures or functions). Subprograms accept arguments, compute a result, and then return just after the point of the call. If multiple processes execute code concurrently, they can each be at a different point in the sequence of procedure calls. One process, A, can begin execution, call a procedure, and then call a second-level procedure before another process, B, begins. Process B may return from a first-level procedure call just as process A returns from a second-level call.
The run-time system for procedure-oriented programming languages uses a stack mechanism to handle procedure calls. The run-time system pushes a procedure activation record on the stack whenever it makes a procedure call. Among other things, the activation record stores information about the location in the code at which the procedure call occurs. When the procedure finishes execution, the run-time system pops the activation record from the top of the stack and returns to the procedure from which the call occurred. Analogous to the rule for variables, concurrent programming systems provide separation between procedure calls in executing processes:
When multiple processes execute a piece of code concurrently, each has its own run-time stack of procedure activation records.
    #include <stdlib.h>
    #include <stdio.h>

    int sum;                            /* sum is a global variable */

    int main(void) {
        int i;                          /* i is a local variable */

        sum = 0;
        for (i=1 ; i<=5 ; i++) {        /* iterate i from 1 to 5 */
            printf("The value of i is %d\n", i);
            fflush(stdout);             /* flush the buffer */
            sum += i;
        }
        printf("The sum is %d\n", sum);
        exit(0);                        /* terminate the program */
    }
    The value of i is 1
    The value of i is 2
    The value of i is 3
    The value of i is 4
    The value of i is 5
    The sum is 15
3.5.2 A Concurrent Version
To create a new process in UNIX, a program calls the system function fork². In essence, fork divides the running program into two (almost) identical processes, both executing at the same place in the same code. The two processes continue just as if two users had simultaneously started two copies of the application. For example, the following modified version of the above example calls fork to create a new process. (Note that although the introduction of concurrency changes the meaning of the program completely, the call to fork occupies only a single line of code.)
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int sum;

int main(void) {
    int i;

    sum = 0;
    fork();                     /* create a new process */
    for (i=1 ; i<=5 ; i++) {
        printf("The value of i is %d\n", i);
        fflush(stdout);
        sum += i;
    }
    printf("The sum is %d\n", sum);
    exit(0);
}
When a user executes the concurrent version of the program, the system begins with a single process executing the code. However, when the process reaches the call to fork, the system duplicates the process and allows both the original process and the newly created process to execute. Of course, each process has its own copy of the variables that the program uses. In fact, the easiest way to envision what happens is to imagine that the system makes a second copy of the entire running program. Then imagine that both copies run (just as if two users had both simultaneously executed the program). To summarize:
To understand the fork function, imagine that fork causes the operating system to make a copy of the executing program and allows both copies to run at the same time.
On one particular uniprocessor system, the execution of our example concurrent program produces twelve lines of output:
The value of i is 1
The value of i is 2
The value of i is 3
The value of i is 4
The value of i is 5
The sum is 15
The value of i is 1
The value of i is 2
The value of i is 3
The value of i is 4
The value of i is 5
The sum is 15
² To a programmer, the call to fork looks and acts like an ordinary function call in C. It is written fork(). At run-time, however, control passes to the operating system, which creates a new process.
On the hardware being used, the first process executed so rapidly that it was able to complete execution before the second process ran at all. Once the first process completed, the operating system switched the processor to the second process, which also ran to completion. The entire run took less than a second. The operating system overhead incurred in switching between processes and handling system calls, including the call to fork and the calls required to write the output, accounted for less than 20% of the total time.
3.5.3 Timeslicing
In the example program, each process performed a trivial amount of computation as it iterated through a loop five times. Therefore, once a process gained control of the CPU, it quickly ran to completion. If we examine concurrent processes that perform substantially more computation, an interesting phenomenon occurs: the operating system allocates the available CPU power to each one for a short time before moving on to the next.
We use the term timeslicing to describe systems that share the available CPU among several processes concurrently. For example, if a timeslicing system has only one CPU to allocate and a program divides into two processes, one of the processes will execute for a while, then the second will execute for a while, then the first will execute again, and so on. If the timeslicing system has many processes, it runs each for a short time before it runs the first one again.
A timeslicing mechanism attempts to allocate the available processing equally among all available processes. If only two processes are eligible to execute and the computer has a single processor, each receives approximately 50% of the CPU. If N processes are eligible on a computer with a single processor, each receives approximately 1/N of the CPU. Thus, all processes appear to proceed at an equal rate, no matter how many processes execute. With many processes executing, the rate is low; with few, the rate is high.
To see the effect of timeslicing, we need an example program in which each process executes longer than the allotted timeslice. Extending the concurrent program above to iterate 10,000 times instead of 5 times produces:
int main(void) {
    int i;

    sum = 0;
    fork();
    for (i=1 ; i <=10000 ; i++) {
        printf("The value of i is %d\n", i);
        fflush(stdout);
        sum += i;
    }
    printf("The sum is %d\n", sum);
    exit(0);
}
When the resulting concurrent program is executed on the same system as before, it emits 20,002 lines of output. However, instead of all output from the first process followed by all output from the second process, output from both processes is mixed together. In one run, the first process iterated 74 times before the second process executed at all. Then the second process iterated 63 times before the system switched back to the first process. On subsequent timeslices, the processes each received enough CPU service to iterate between 60 and 90 times. Of course, the two processes compete with all other processes executing on the computer, so the apparent rate of execution varies slightly depending on the mix of programs running.
3.5.4 Making Processes Diverge
So far, we have said that fork can be used to create a new process that executes exactly the same code as the original process. Creating a truly identical copy of a running program is neither interesting nor useful because it means that both copies perform exactly the same computation.
In practice, the process created by fork is not absolutely identical to the original process: it differs in one small detail. Fork is a function that returns a value to its caller. When the function call returns, the value returned to the original process differs from the value returned to the newly created process. In the newly created process, fork returns zero; in the original process, fork returns a small positive integer that identifies the newly created process. Technically, the value returned is called a process identifier or process id. Concurrent programs use the value returned by fork to decide how to proceed. In the most common case, the code contains a conditional statement that tests to see if the value returned is nonzero:
#include <stdlib.h>
In the example code, variable pid records the value returned by the call to fork. Remember that each process has its own copy of all variables, and that fork will either return zero (in the newly created process) or nonzero (in the original process). Following the call to fork, the if statement checks variable pid to see whether the original or the newly created process is executing. The two processes each print an identifying message and exit. When the program runs, two messages appear: one from the original process and one from the newly created process. To summarize:
The value returned by fork differs in the original and newly created processes; concurrent programs use the difference to allow the new process to execute different code than the original process.
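The test on the value returned by fork can be sketched as follows. This is a minimal illustration rather than the text's own listing: the function name run_two_ways, the check for a negative return (fork failed), and the call to waitpid that makes the original process wait for the new one are all assumptions added for this sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork once and let each process report which one it is.
 * Returns the new process's exit status to the original process,
 * or -1 on error. (run_two_ways is a hypothetical name.) */
int run_two_ways(void)
{
    pid_t pid;
    int status;

    fflush(stdout);             /* keep buffered output from being copied */
    pid = fork();
    if (pid < 0) {              /* no new process was created */
        perror("fork");
        return -1;
    }
    if (pid != 0) {             /* original process: pid identifies the new one */
        printf("The original process prints this.\n");
        fflush(stdout);
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
    /* newly created process: fork returned zero */
    printf("The new process prints this.\n");
    fflush(stdout);
    _exit(0);                   /* leave without returning to the caller */
}
```

Calling run_two_ways() from main produces two messages, one from each process; the order in which they appear depends on how the system schedules the two processes.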
available, the program will block. The user may type a command while the program is blocked waiting for input on the TCP connection. The problem is that the application cannot know whether input will arrive from the keyboard or the TCP connection first. To solve the dilemma, a UNIX program calls select. In doing so, it asks the operating system to let it know which source of input becomes available first. The call returns as soon as a source is ready, and the program reads from that source. For now, it is only important to understand the idea behind select; later chapters present the details and illustrate its use.
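The idea behind select can be sketched in a few lines. The function below is an illustration under stated assumptions, not the form used in later chapters: the name wait_for_input is hypothetical, only two descriptors are watched, and no timeout is given, so the call waits indefinitely.

```c
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Block until input is available on fd_a or fd_b.
 * Returns the descriptor that is ready (fd_a if both are),
 * or -1 on error. (wait_for_input is a hypothetical name.) */
int wait_for_input(int fd_a, int fd_b)
{
    fd_set readable;
    int maxfd = (fd_a > fd_b) ? fd_a : fd_b;

    FD_ZERO(&readable);
    FD_SET(fd_a, &readable);    /* e.g., the keyboard */
    FD_SET(fd_b, &readable);    /* e.g., a TCP connection */

    /* NULL timeout: do not return until some source is ready */
    if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;
    return FD_ISSET(fd_a, &readable) ? fd_a : fd_b;
}
```

After the call returns, the program reads from the descriptor that was reported, which will not block because input is already waiting there.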
3.9 Summary
Concurrency is fundamental to TCP/IP applications because it allows users to access services without waiting for one another. Concurrency in clients arises easily because multiple users can execute client application software at the same time. Concurrency in servers is much more difficult to achieve because server software must be programmed explicitly to handle requests concurrently.
In UNIX, a program creates an additional process using the fork system call. We imagine that the call to fork causes the operating system to duplicate the program, causing two copies to execute instead of one. Technically, fork is a function call because it returns a value. The only difference between the original process and a process created by fork lies in the value that the call returns. In the newly created process, the call returns zero; in the original process, it returns the small, positive integer process id of the newly created process. Concurrent programs use the returned value to make new processes execute a different part of the program than the original process. A process can call execve at any time to have the process execute code from a separately-compiled program.
Concurrency is not free. When an operating system switches context from one process to another, the system uses the CPU. Programmers who introduce concurrency into server designs must be sure that the benefits of a concurrent design outweigh the additional overhead introduced by context switching.
The select call permits a single process to manage concurrent I/O. A process uses select to find out which I/O device becomes ready first.
FOR FURTHER STUDY
Many texts on operating systems describe concurrent processing. Peterson and Silberschatz [1985] covers the general topic. Comer [1984] discusses the implementation of processes, message passing, and process coordination mechanisms. Leffler et al. [1989] describes 4.3 BSD UNIX.
EXERCISES
3.1 Run the example programs on your local computer system. Approximately how many iterations of the output loop can a process make in a single timeslice?
3.2 Write a concurrent program that starts five processes. Arrange for each process to print a few lines of output and then halt.
3.3 Find out how systems other than UNIX create concurrent processes.
3.4 Read more about the UNIX fork function. What information does the newly created process share with the original process?
3.5 Write a program that uses execve to change the code a process executes.
3.6 Write a program that uses select to read from two terminals (serial lines), and displays the results on a screen with labels that identify the source.
3.7 Rewrite the program in the previous exercise so it does not use select. Which version is easier to understand? more efficient? easier to terminate cleanly?
The TCP/IP standards do not specify the details of how application software interfaces with TCP/IP protocol software; they only suggest the required functionality, and allow system designers to choose the details.
4.2.1 Advantages And Disadvantages
Using a loose specification for the protocol interface has advantages and disadvantages. On the positive side, it provides flexibility and tolerance. It allows designers to implement TCP/IP using operating systems that range from the simplest systems available on personal computers to the sophisticated systems used on supercomputers. More important, it means designers can use either a procedural or message-passing interface style (whichever style the operating system supports).
On the negative side, a loose specification means that designers can make the interface details different for each operating system. As vendors add new interfaces that differ from existing interfaces, application programming becomes more difficult and applications become less portable across machines. Thus, while system designers favor a loose specification, application programmers desire a restricted specification because it means applications can be compiled for new machines without change.
In practice, only a few TCP/IP interfaces exist. The University of California at Berkeley defined an interface for the Berkeley UNIX operating system that has become known as the socket interface, or sockets. AT&T defined an interface for System V UNIX known by the acronym TLI. A few other interfaces have been defined, but none has gained wide acceptance yet.
Allocate local resources for communication
Specify local and remote communication endpoints
Initiate a connection (client side)
Wait for an incoming connection (server side)
Send or receive data
Determine when data arrives
Generate urgent data
Handle incoming urgent data
Terminate a connection gracefully
Handle connection termination from the remote site
Abort communication
Handle error conditions or a connection abort
Release local resources when communication finishes
The conceptual interface defined by the TCP/IP standards does not specify data representations or programming details; it merely provides an example of one possible interface that an operating system can offer to application programs that use TCP/IP.
Thus, the conceptual interface illustrates loosely how applications interact with TCP. Because it does not prescribe exact details, operating system designers are free to choose alternative procedure names or parameters as long as they offer equivalent functionality.
service from the operating system, the process executing the application climbs into the operating system, performs the necessary operation, and then climbs back out. As it passes through the system call interface, the process acquires privileges that allow it to read or modify data structures in the operating system. The operating system remains protected, however, because each system call branches to a procedure that the operating system designers have written.
The designer invents entirely new system calls that applications use to access TCP/IP.
The designer attempts to use conventional I/O calls to access TCP/IP.
In the first approach, the designer makes a list of all conceptual operations, invents names and parameters for each, and implements each as a system call. Because many designers consider it unwise to create new system calls unless absolutely necessary, this approach is seldom used. In the second approach, the designer uses conventional I/O primitives but overloads them so they work with network protocols as well as conventional I/O devices. Of course, many designers choose a hybrid approach that uses basic I/O functions whenever possible, but adds additional functions for those operations that cannot be expressed conveniently.
When an application program calls open to initiate input or output, the system returns a small integer called a file descriptor that the application uses in further I/O operations. The call to open takes three arguments: the name of a file or device to open, a set of bit flags that controls special cases such as whether to create the file if it does not exist, and an access mode that specifies read/write protections for newly created files. For example, the code segment:
int desc;

desc = open("filename", O_RDWR, 0);
opens an existing file, filename, with a mode that allows both reading and writing. After obtaining the integer descriptor, desc, the application uses it in further I/O operations on the file. For example, the statement:
read(desc, buffer, 128);
reads 128 bytes of data from the file into array buffer. Finally, when an application finishes using a file, it calls close to deallocate the descriptor and release associated resources (e.g., internal buffers):
close(desc);
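The open-read-close paradigm described above can be collected into one small function. The sketch below is illustrative only: the name read_prefix and the error checks are assumptions, and the flags mirror the description in the text (an existing file opened for both reading and writing).

```c
#include <fcntl.h>
#include <unistd.h>

/* Open an existing file, read up to len bytes from its start into
 * buffer, and close it. Returns the number of bytes read, or -1 on
 * error. (read_prefix is a hypothetical name.) */
long read_prefix(const char *filename, char *buffer, size_t len)
{
    int desc;
    ssize_t n;

    desc = open(filename, O_RDWR, 0);   /* small integer descriptor */
    if (desc < 0)
        return -1;                      /* file missing or not accessible */
    n = read(desc, buffer, len);        /* may return fewer than len bytes */
    close(desc);                        /* release the descriptor */
    return (long)n;
}
```

For instance, read_prefix("filename", buffer, 128) corresponds to the three statements shown in the text, with the return value reporting how many bytes were actually placed in buffer.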
if an application chooses to use UDP, it must be able to transfer UDP datagrams, not merely a stream of bytes. The designers of Berkeley UNIX added new system calls to UNIX to accommodate these special cases. The next chapter shows the details of the design.
4.9 Summary
Because TCP/IP is designed for a multi-vendor environment, the protocol standards loosely specify the interface that application programs use, allowing operating system designers freedom in choosing how to implement it. The standards do discuss a conceptual interface, but it is intended only as an illustrative example. Although the standards present the conceptual interface as a set of procedures, designers are free to choose different procedures or to use an entirely different style of interaction (e.g., message passing).
Operating systems often supply services through a mechanism known as the system call interface. When adding support for TCP/IP, designers attempt to minimize the number of new system calls by extending existing system calls where possible. However, because network communication requires operations that do not fit easily into conventional I/O procedures, most interfaces to TCP/IP require a few new system calls.
FOR FURTHER STUDY
Section 2 of the UNIX Programmer's Manual describes each of the socket calls in detail; section 4P describes protocols and network device interfaces in more detail. [AT&T 1989] defines AT&T's TLI interface, an alternative to sockets used in System V UNIX.
EXERCISES
4.1 Examine a message-passing operating system. How would you extend the application program interface to accommodate network communication?
4.2 Compare the socket interface from Berkeley UNIX with AT&T's TLI. What are the major differences? How are the two similar? What reasons could designers have for choosing one design over the other?
4.3 Some hardware architectures limit the number of possible system calls to a small number (e.g., 64 or 128). How many system calls have already been assigned in your local operating system?
4.4 Think about the hardware limit on system calls discussed in the previous exercise. How can a system designer add additional system calls without changing the hardware?
4.5 Find out how recent versions of the Korn shell use /dev/tcp to allow UNIX shell scripts to communicate with TCP. Write an example script.
The Berkeley socket interface provides generalized functions that support network communication using many possible protocols. Socket calls refer to all TCP/IP protocols as a single protocol family. The calls allow the programmer to specify the type of service required rather than the name of a specific protocol.
The overall design of sockets and the generality they provide have been debated since their inception. Some computer scientists argue that generality is unnecessary and merely makes application programs difficult to read. Others argue that having programmers specify the type of service instead of the specific protocol makes it easier to program because it frees the programmer from understanding the details of each protocol family. Finally, some commercial vendors of TCP/IP software have argued in favor of alternative interfaces because sockets cannot be added to an operating system unless the customer has the source code, which usually requires a special license agreement and additional expense.
The socket interface adds a new abstraction for network communication, the socket. Like files, each active socket is identified by a small integer called its socket descriptor. UNIX allocates socket descriptors in the same descriptor table as file descriptors. Thus, an application cannot have both a file descriptor and a socket descriptor with the same value. BSD UNIX contains a separate system function, socket, that applications call to create a socket; an application only uses open to create file descriptors. The general idea underlying sockets is that a single system call is sufficient to create any socket. Once the socket has been created, an application must make additional system calls to specify the details of its exact use. The paradigm will become clear after we examine the data structures the system maintains.
Although the internal data structure for a socket contains many fields, the system leaves most of them unfilled when it creates the socket. As we will see, the application that created the socket must make additional system calls to fill in information in the socket data structure before the socket can be used.
5.4.3 Using Sockets
Once a socket has been created, it can be used to wait for an incoming connection or to initiate a connection. A socket used by a server to wait for an incoming connection is called a passive socket, while a socket used by a client to initiate a connection is called an active socket. The only difference between active and passive sockets lies in how applications use them; the sockets are created the same way initially.
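Creating a socket takes a single call regardless of how the socket will be used later. The following sketch assumes the standard socket function with the PF_INET protocol family and the SOCK_STREAM and SOCK_DGRAM service types; the wrapper name make_socket is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Create a socket in the Internet protocol family.
 * use_tcp nonzero requests stream service (TCP); zero requests
 * datagram service (UDP). Returns a socket descriptor, or -1 on
 * error. (make_socket is a hypothetical name.) */
int make_socket(int use_tcp)
{
    int type = use_tcp ? SOCK_STREAM : SOCK_DGRAM;

    /* third argument 0: let the system pick the protocol for this type */
    return socket(PF_INET, type, 0);
}
```

The descriptor returned is neither active nor passive at this point; later calls determine whether a client or a server uses it.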
UNIX data structures are more complex than shown in Figure 5.1; the diagram illustrates the concept, not the details.
In practice, much confusion arises between the TCP/IP protocol family, denoted PF_INET, and the address family it uses, denoted AF_INET. The chief problem is that both symbolic constants have the same numeric value (2), so programs that inadvertently use one in place of the other operate correctly. Even the Berkeley UNIX source code contains examples of misuse. Programmers should observe the distinction, however, because it helps clarify the meaning of variables and makes programs more portable.
Unfortunately, not all address families define endpoints that fit into the sockaddr structure. For example, BSD UNIX defines the AF_UNIX address family to specify what UNIX programmers think of as a named pipe. Endpoint addresses in the AF_UNIX family consist of UNIX path names that can be much longer than 14 bytes. Therefore, application programs should not use sockaddr in variable declarations because a variable declared to be of type sockaddr is not large enough to hold all possible endpoint addresses.
Confusion often arises in practice because the sockaddr structure accommodates addresses in the AF_INET family. Thus, TCP/IP software works correctly even if the programmer declares variables to be of type sockaddr. However, to keep programs portable and maintainable, TCP/IP code should not use the sockaddr structure in declarations. Instead, sockaddr should be used only as an overlay, and code should reference only the sa_family field in it.
Each protocol family that uses sockets defines the exact representation of its endpoint addresses, and the socket software provides corresponding structure declarations. Each TCP/IP endpoint address consists of a 2-byte field that identifies the address type (it must contain AF_INET), a 2-byte port number field, a 4-byte IP address field, and an 8-byte field that remains unused. Predefined structure sockaddr_in specifies the format:
struct sockaddr_in {                /* struct to hold an address */
    u_char          sin_len;        /* total length */
    u_char          sin_family;     /* type of address */
    u_short         sin_port;       /* protocol port number */
    struct in_addr  sin_addr;       /* IP address */
    char            sin_zero[8];    /* unused (set to zero) */
};
This text describes the structure as defined in release 4.4 of the Berkeley software; older versions of the sockaddr structure do not include the sa_len field.
An application that uses TCP/IP protocols exclusively can use structure sockaddr_in exclusively; it never needs to use the sockaddr structure. Thus,
When representing a TCP/IP communication endpoint, an application program uses structure sockaddr_in, which contains both an IP address and a protocol port number. Programmers must be careful when writing programs that use a mixture of protocols because some non-TCP/IP endpoint addresses require a larger structure.
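Filling in a sockaddr_in structure might look like the sketch below. The function name fill_endpoint is hypothetical, and the use of inet_pton to convert a dotted decimal string is an assumption of this example rather than something the text prescribes.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fill *sin with a TCP/IP endpoint built from a dotted decimal IP
 * address and a protocol port number given in host byte order.
 * Returns 0 on success, -1 if the address string is malformed.
 * (fill_endpoint is a hypothetical name.) */
int fill_endpoint(struct sockaddr_in *sin, const char *dotted,
                  unsigned short port)
{
    memset(sin, 0, sizeof(*sin));       /* also zeroes the unused field */
    sin->sin_family = AF_INET;          /* address family, not PF_INET */
    sin->sin_port = htons(port);        /* stored in network byte order */
    if (inet_pton(AF_INET, dotted, &sin->sin_addr) != 1)
        return -1;
    return 0;
}
```

When the filled structure is passed to a socket function, the program casts its address, for example (struct sockaddr *)&sin, so that the generic sockaddr type serves only as an overlay.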
Structure sockaddr is used to cast (i.e., change the type of) pointers or the results of system functions to make programs pass the type checking in lint.
network. If the system buffers become full, the call to write may block temporarily until TCP can send data across the network and make space in the buffer for new data.
5.7.4 The Read Call
Both clients and servers use read to receive data from a TCP connection. Usually, after a connection has been established, the server uses read to receive a request that the client sends by calling write. After sending its request, the client uses read to receive a reply.
To read from a connection, an application calls read with three arguments. The first specifies the socket descriptor to use, the second specifies the address of a buffer, and the third specifies the length of the buffer. Read extracts data bytes that have arrived at that socket, and copies them to the user's buffer area. If no data has arrived, the call to read blocks until it does. If more data has arrived than fits into the buffer, read only extracts enough to fill the buffer. If less data has arrived than fits into the buffer, read extracts all the data and returns the number of bytes it found.
Clients and servers can also use read to receive messages from sockets that use UDP. As with the connection-oriented case, the caller supplies three arguments that identify a socket descriptor, the address of a buffer into which the data should be placed, and the size of the buffer. Each call to read extracts one incoming UDP message (i.e., one user datagram). If the buffer cannot hold the entire message, read fills the buffer and discards the remainder.
5.7.5 The Close Call
Once a client or server finishes using a socket, it calls close to deallocate it. If only one process is using the socket, close immediately terminates the connection and deallocates the socket. If several processes share a socket, close decrements a reference count and deallocates the socket when the reference count reaches zero.
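Because read on a TCP connection returns only the bytes that have arrived so far, a program that expects a fixed amount of data usually calls read repeatedly. The loop below is a minimal sketch of that idea; the name read_fully is hypothetical, and the sketch does not retry calls interrupted by signals.

```c
#include <unistd.h>

/* Read from a stream descriptor until want bytes have been received
 * or the other side closes the connection. Returns the number of
 * bytes placed in buf, or -1 on error.
 * (read_fully is a hypothetical name.) */
long read_fully(int sock, char *buf, size_t want)
{
    size_t got = 0;
    ssize_t n;

    while (got < want) {
        n = read(sock, buf + got, want - got);
        if (n < 0)
            return -1;          /* error reported by the system */
        if (n == 0)
            break;              /* end of data: connection closed */
        got += (size_t)n;       /* read may return less than requested */
    }
    return (long)got;
}
```

The loop is appropriate only for stream (TCP) sockets; with UDP each call to read extracts exactly one message, so no accumulation is needed.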
5.7.6 The Bind Call
When a socket is created, it does not have any notion of endpoint addresses (neither the local nor remote addresses are assigned). An application calls bind to specify the local endpoint address for a socket. The call takes arguments that specify a socket descriptor and an endpoint address. For TCP/IP protocols, the endpoint address uses the sockaddr_in structure, which includes both an IP address and a protocol port number. Primarily, servers use bind to specify the well-known port at which they will await connections.
5.7.7 The Listen Call
When a socket is created, the socket is neither active (i.e., ready for use by a client) nor passive (i.e., ready for use by a server) until the application takes further action. Connection-oriented servers call listen to place a socket in passive mode and make it ready to accept incoming connections.
Most servers consist of an infinite loop that accepts the next incoming connection, handles it, and then returns to accept the next connection. Even if handling a given connection takes only a few milliseconds, it may happen that a new connection request arrives during the time the server is busy handling an existing request. To ensure that no connection request is lost, a server must pass listen an argument that tells the operating system to enqueue connection requests for a socket. Thus, one argument to the listen call specifies a socket to be placed in passive mode, while the other specifies the size of the queue to be used for that socket.
5.7.8 The Accept Call
For TCP sockets, after a server calls socket to create a socket, bind to specify a local endpoint address, and listen to place it in passive mode, the server calls accept to extract the next incoming connection request. An argument to accept specifies the socket from which a connection should be accepted. Accept creates a new socket for each new connection request, and returns the descriptor of the new socket to its caller.
The server uses the new socket only for the new connection; it uses the original socket to accept additional connection requests. Once it has accepted a connection, the server can transfer data on the new socket. After it finishes using the new socket, the server closes it.
5.7.9 Summary Of Socket Calls Used With TCP
The table in Figure 5.3 provides a brief summary of the system functions related to sockets.
Software that uses TCP/IP calls functions htons, ntohs, htonl and ntohl to convert binary integers between the host's native byte order and network standard byte order. Doing so makes the source code portable to any machine, regardless of its native byte order.
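A short sketch shows the conversion functions at work. Network standard byte order places the most significant byte first; the helper names put_port and get_port, and the idea of a 2-byte message field, are assumptions made for this illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit port number into a 2-byte message field in network
 * byte order. (put_port is a hypothetical name.) */
void put_port(unsigned char field[2], uint16_t port)
{
    uint16_t net = htons(port);         /* host to network, short */

    memcpy(field, &net, sizeof net);
}

/* Recover a port number from a 2-byte field that arrived in network
 * byte order. (get_port is a hypothetical name.) */
uint16_t get_port(const unsigned char field[2])
{
    uint16_t net;

    memcpy(&net, field, sizeof net);
    return ntohs(net);                  /* network to host, short */
}
```

On any machine, put_port(field, 0x1234) leaves the byte 0x12 in field[0] and 0x34 in field[1], and get_port(field) returns 0x1234; htonl and ntohl play the same role for 32-bit values such as IP addresses.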
The client creates a socket, calls connect to connect to the server, and then interacts using write to send requests and read to receive replies. When it finishes using the connection, it calls close. A server uses bind to specify the local (well-known) protocol port it will use, calls listen to set the length of the connection queue, and then enters a loop. Inside the loop, the server calls accept to wait until the next connection request arrives, uses read and write to interact with the client, and finally uses close to terminate the connection. The server then returns to the accept call, where it waits for the next connection.
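The sequence of calls described above can be sketched with three small functions. Everything here is an illustrative assumption rather than the text's own code: the names passive_tcp, serve_one, and connect_tcp are hypothetical, both sides use the loopback address so the sketch runs on one machine, and the server handles a single request by echoing it instead of looping forever.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Fill *sin with the loopback address and the given port. */
static void loopback_endpoint(struct sockaddr_in *sin, unsigned short port)
{
    memset(sin, 0, sizeof(*sin));
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
    sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Server: socket, bind, listen. Port 0 lets the system choose. */
int passive_tcp(unsigned short port, int qlen)
{
    struct sockaddr_in sin;
    int s = socket(PF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
    loopback_endpoint(&sin, port);
    if (bind(s, (struct sockaddr *)&sin, sizeof sin) < 0 ||
        listen(s, qlen) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Server: accept one connection, echo one request, close it. */
int serve_one(int lsock)
{
    char buf[128];
    ssize_t n;
    int c = accept(lsock, NULL, NULL);  /* new socket for this client */

    if (c < 0)
        return -1;
    n = read(c, buf, sizeof buf);
    if (n > 0 && write(c, buf, (size_t)n) != n)
        n = -1;
    close(c);                           /* lsock stays open for more */
    return n < 0 ? -1 : 0;
}

/* Client: socket, connect. Returns a connected descriptor or -1. */
int connect_tcp(unsigned short port)
{
    struct sockaddr_in sin;
    int s = socket(PF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
    loopback_endpoint(&sin, port);
    if (connect(s, (struct sockaddr *)&sin, sizeof sin) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

A real server would call serve_one inside an infinite loop, returning to accept after each connection closes, while the client would follow connect_tcp with write, read, and close.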
We will assume throughout the remainder of this text that programs always begin with these statements, even if they are not shown explicitly in the examples. To summarize:
UNIX supplies predefined symbolic constants and data structure declarations used with the socket system calls. Programs that reference these constants must begin with C preprocessor include statements that reference the files in which the definitions appear.
5.11 Summary
BSD UNIX introduced the socket abstraction as a mechanism that allows application programs to interface with protocol software in the operating system. Because so many other vendors have adopted sockets, they have become a de facto standard.
A program calls socket to create a socket and obtain a descriptor for it. Arguments to the socket call specify the protocol family to be used and the type of service required. All TCP/IP protocols are part of the Internet family, specified with symbolic constant PF_INET. The system creates an internal data structure for the socket, fills in the protocol family, and uses the type of service argument to select a specific protocol (usually either UDP or TCP). Additional system calls allow the application to specify a local endpoint address (bind), to force the socket into passive mode for use by a server (listen), or to force the socket into active mode for use by a client (connect). Servers can make further calls to obtain incoming connection requests (accept), and both clients and servers can send or receive information (read and write). Finally, both clients and servers can deallocate a socket once they have finished using it (close).
The socket structure allows each protocol family to define one or more address representations. All TCP/IP protocols use the Internet address family, AF_INET, which specifies that an endpoint address contains both an IP address and a protocol port number. When an application specifies a communication endpoint to a socket function, it uses predefined structure sockaddr_in. If a client specifies that it needs an arbitrary, unused local protocol port, the TCP/IP software will select one.
Before an application program written in C can use the predefined structures and symbolic constants associated with sockets, it must include several files that define them. In particular, we assume that all source programs begin with statements that include files <sys/types.h> and <sys/socket.h>.
FOR FURTHER STUDY
Leffler et al. [1989] describes the Berkeley UNIX system in detail, and describes the internal data structures UNIX uses for sockets. Presotto and Ritchie [June 1990] describes an interface for TCP/IP protocols using the UNIX file system space. The UNIX Programmer's Manual contains specifications for the socket functions, including an exact description of arguments and return codes. The section entitled The IPC Tutorial is worth reading. Much of the information on socket calls can also be found in Appendix A.
EXERCISES
5.1 Look at the include file for sockets (usually /usr/include/sys/socket.h). What socket types are allowed? Which socket types do not make sense for TCP/IP protocols?
5.2 If your system has a clock with at least microsecond accuracy, measure how long it takes to execute each of the socket system calls. Why do some calls require orders of magnitude more time than others?
5.3 Read the BSD UNIX manual pages for connect carefully. What network traffic is generated if one calls connect on a socket of type SOCK_DGRAM?
5.4 Arrange to monitor your local network while an application executes connect for the first time on a socket of type SOCK_STREAM. How many packets do you see?
Although programmers need to understand the conceptual capabilities of the protocol interface, they should concentrate on learning about ways to structure communicating programs instead of memorizing the details of a particular interface.
Client software can use one of several methods to find a server's IP address and protocol port number. A client can: have the server's domain name or IP address specified as a constant when the program is compiled, require the user to identify the server when invoking the program, obtain information about the server from stable storage (e.g., from a file on a local disk), or use a separate protocol to find a server (e.g., multicast or broadcast a message to which all servers respond). Specifying the server's address as a constant makes the client software faster and less dependent on a particular local computing environment. However, it also means that the client must be recompiled if the server is moved. More important, it means that the client cannot be used with an alternative server, even temporarily for testing. As a compromise, some clients fix a machine name instead of an IP address. Fixing the name instead of an address delays the binding until run-time. It allows a site to choose a generic name for the server and add an alias to the domain name system for that name. Using aliases permits a site manager to change the location of a server without changing client software. To move the server, the manager needs to change only the alias. For example, it is possible to add an alias for mailhost in the local domain and to arrange for all clients to look up the string "mailhost" instead of a specific machine. Because all clients reference the generic name instead of a specific machine, the system manager can change the location of the mail host without recompiling client software. Storing the server's address in a file makes the client more flexible, but it means that the client program cannot execute unless the file is available. Thus, the client software cannot be transported to another machine easily. While using a broadcast protocol to find servers works in a small, local environment, it does not scale well to large internets. 
Furthermore, use of a dynamic search mechanism introduces additional complexity for both clients and servers, and adds additional broadcast traffic to the network. To avoid unnecessary complexity and dependence on the computing environment, most clients solve the problem of server specification in a simple manner: they require the user to supply an argument that identifies the server when invoking the client program. Building client software to accept the server address as an argument makes the client software general and eliminates dependency on the computing environment.
Allowing the user to specify a server address when invoking client software makes the client program more general and makes it possible to change server locations.
An important point to note is that using an argument to specify the server's address results in the most flexibility. A program that accepts an address argument can be combined with other programs that extract the server address from disk, find the address using a remote nameserver, or search for it with a broadcast protocol. Thus,
Building client software that accepts a server address as an argument makes it easy to build extended versions of the software that use other ways to find the server address (e.g., read the address from a file on disk).
Some services require an explicit server, while others can use any available server. For example, when a user invokes a remote login client, the user has a specific target machine in mind; logging into another machine usually does not make sense. However, if the user merely wants to find the current time of day, the user does not care which server responds. To accommodate such services, the designer can modify any of the server look up methods discussed above so they supply a set of server names instead of a single name. Clients must also be changed so they try each server in a set until they find one that responds.
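The idea of trying each server in a set can be sketched as follows. This is a minimal illustration, not code from the text: the host names are invented, and probe_server is a hypothetical stand-in for a real connection attempt (for example, a connect call followed by a short exchange).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical probe: stands in for a real connection attempt.
 * For illustration it pretends that only one (made-up) server answers.
 */
static int
probe_server(const char *server)
{
    return strcmp(server, "time2.example.com") == 0;
}

/*
 * Try each server in the set, in order, and return the name of the
 * first one that responds, or NULL if none of them does.
 */
const char *
first_responding(const char *const servers[], size_t nservers)
{
    size_t i;

    for (i = 0; i < nservers; i++)
        if (probe_server(servers[i]))
            return servers[i];
    return NULL;
}
```

A client for a service such as time of day could call first_responding with the full set and use whichever name comes back; a remote login client would pass a set containing only the machine the user named.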
struct hostent {
    char   *h_name;        /* official host name */
    char  **h_aliases;     /* other aliases      */
    int     h_addrtype;    /* address type       */
    int     h_length;      /* address length     */
    char  **h_addr_list;   /* list of addresses  */
};
#define h_addr h_addr_list[0]
Fields that contain names and addresses must be lists because hosts that have multiple interfaces also have multiple names and addresses. For compatibility with earlier versions, the file also defines the identifier h_addr to refer to the first location in the host address list. Thus, a program can use h_addr as if it were a field of the structure.

Consider a simple example of name conversion. Suppose a client has been passed the domain name merlin.cs.purdue.edu in string form and needs to obtain the IP address. The client can call gethostbyname as in:
struct hostent *hptr;
char *examplenam = "merlin.cs.purdue.edu";

if ( hptr = gethostbyname( examplenam ) ) {
    /* IP address is now in hptr->h_addr */
} else {
    /* error in name - handle it */
}
If the call is successful, gethostbyname returns a pointer to a valid hostent structure. If the name cannot be mapped into an IP address, the call returns zero. Thus, the client examines the value that gethostbyname returns to determine if an error occurred.
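Because the address field is a list, a client may want to examine every entry rather than only h_addr. The sketch below assumes IPv4 addresses and uses the standard inet_ntop routine for printing; the helper names count_addrs and addr_to_text are illustrative and do not come from the text.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Count the entries in a hostent's NULL-terminated address list. */
int
count_addrs(const struct hostent *hp)
{
    int n = 0;

    while (hp->h_addr_list[n] != NULL)
        n++;
    return n;
}

/*
 * Convert the i-th address in the list to dotted decimal text.
 * Returns out on success, or NULL if the entry is not an IPv4 address.
 */
const char *
addr_to_text(const struct hostent *hp, int i, char *out, socklen_t outlen)
{
    struct in_addr a;

    if (hp->h_addrtype != AF_INET || hp->h_length != (int)sizeof(a))
        return NULL;
    /* memcpy, not strcpy: an address may contain zero bytes */
    memcpy(&a, hp->h_addr_list[i], sizeof(a));
    return inet_ntop(AF_INET, &a, out, outlen);
}
```

After a successful call to gethostbyname, the returned pointer can be passed to both helpers; a buffer of INET_ADDRSTRLEN bytes is large enough for the text form.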
struct servent {
    char   *s_name;      /* official service name */
    char  **s_aliases;   /* other aliases         */
    int     s_port;      /* port for this service */
    char   *s_proto;     /* protocol to use       */
};
If a TCP client needs to look up the official protocol port number for SMTP, it calls getservbyname, as in the following example:
struct servent *sptr;

if (sptr = getservbyname( "smtp", "tcp" )) {
    /* port number is now in sptr->s_port */
} else {
    /* error occurred - handle it */
}
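A client that accepts either a service name or a decimal port number can wrap the lookup as sketched below. The helper name service_to_port is hypothetical; the point it illustrates is that getservbyname returns s_port already in network byte order, while a number typed by the user must be converted with htons.

```c
#include <stdlib.h>
#include <netdb.h>
#include <arpa/inet.h>

/*
 * Map a service name or decimal port string to a port number in
 * network byte order. Returns 0 if the service cannot be resolved.
 */
unsigned short
service_to_port(const char *service, const char *transport)
{
    struct servent *sptr = getservbyname(service, transport);
    char *end;
    long n;

    if (sptr != NULL)
        return (unsigned short)sptr->s_port;   /* already network order */

    /* Not a known name: try to interpret it as a decimal port number */
    n = strtol(service, &end, 10);
    if (end == service || *end != '\0' || n <= 0 || n > 65535)
        return 0;
    return htons((unsigned short)n);
}
```

The result can be stored directly in the sin_port field of a sockaddr_in structure without further conversion.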
struct protoent {
    char   *p_name;      /* official protocol name   */
    char  **p_aliases;   /* list of aliases allowed  */
    int     p_proto;     /* official protocol number */
};
If a client needs to look up the official protocol number for UDP, it calls getprotobyname, as in the following example:
struct protoent *pptr;

if (pptr = getprotobyname( "udp" )) {
    /* official protocol number is now in pptr->p_proto */
} else {
    /* error occurred - handle it */
}
int s;    /* socket descriptor */
In general, the difficulty in choosing an IP address arises because the correct choice depends on routing, and applications seldom have access to routing information. To understand why, imagine a computer with multiple network interfaces and, therefore, multiple IP addresses. Before an application can use TCP, it must have an endpoint address for the connection. When TCP communicates with a foreign destination, it encapsulates each TCP segment in an IP datagram and passes the datagram to the IP software. IP uses the remote destination address and its routing table to select a next-hop address and a network interface that it can use to reach the next hop.

Herein lies the problem: the IP source address in an outgoing datagram should match the IP address of the network interface over which IP routes the datagram. However, if an application chooses one of the machine's IP addresses at random, it might select an address that does not match that of the interface over which IP routes the traffic. In practice, a client may appear to work even if the programmer chooses an incorrect address because packets may travel back to the client by a different route than they travel to the server. However, using an incorrect address violates the specification, makes network management difficult and confusing, and makes the program less reliable.

To solve the problem, the socket calls make it possible for an application to leave the local IP address field unfilled and to allow TCP/IP software to choose a local IP address automatically at the time the client connects to a server.
Because choosing the correct local IP address requires the application to interact with IP routing software, TCP client software usually leaves the local endpoint address unfilled, and allows TCP/IP software to select the correct local IP address and an unused local protocol port number automatically.
where s is the descriptor for a socket, remaddr is the address of a structure of type sockaddr_in that specifies the remote endpoint to which a connection is desired, and remaddrlen is the length (in bytes) of the second argument. Connect performs four tasks. First, it tests to ensure that the specified socket is valid and that it has not already been connected. Second, it fills in the remote endpoint address in the socket from the second argument. Third, it chooses a local endpoint address for the connection (IP address and protocol port number) if the socket does not have one. Fourth, it initiates a TCP connection and returns a value to tell the caller whether the connection succeeded.
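The remote endpoint structure passed to connect can be prepared as in the following sketch. The helper name make_remote_endpoint is an assumption made for illustration, and inet_pton is used here as one standard way to convert dotted decimal text; the address and port in the usage note are examples only.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Fill in a sockaddr_in that names a remote endpoint.
 * dotted is an IPv4 address in dotted decimal; port is in host byte order.
 * Returns 0 on success, -1 if the address text is not valid.
 */
int
make_remote_endpoint(const char *dotted, unsigned short port,
                     struct sockaddr_in *remaddr)
{
    memset(remaddr, 0, sizeof(*remaddr));
    remaddr->sin_family = AF_INET;
    remaddr->sin_port = htons(port);          /* network byte order */
    if (inet_pton(AF_INET, dotted, &remaddr->sin_addr) != 1)
        return -1;
    return 0;
}
```

A client would then call connect(s, (struct sockaddr *)&remaddr, sizeof(remaddr)). Because the sketch never calls bind, the local endpoint stays unfilled and the protocol software chooses the local IP address and port during connect, as described above.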
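char *req = "request of some sort";
char  buf[BLEN];       /* buffer for answer           */
char *bptr;            /* pointer to buffer           */
int   n;               /* number of bytes read        */
int   buflen;          /* space left in buffer        */

bptr = buf;
buflen = BLEN;

/* send request */
(void) write(s, req, strlen(req));

/* read response (may come in many pieces) */
while ((n = read(s, bptr, buflen)) > 0) {
    bptr += n;
    buflen -= n;
}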
Because TCP does not preserve record boundaries, any program that reads from a TCP connection must be prepared to accept data a few bytes at a time. This rule holds even if the sending application writes data in large blocks.
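The rule can be packaged as a small helper that keeps calling read until the buffer is full or the sender signals end-of-file. This is a sketch under the assumption that the caller wants the whole response in one buffer; the name read_all is illustrative.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Read from fd until buflen bytes have arrived or end-of-file is seen.
 * Returns the total number of bytes stored in buf, or -1 on error.
 */
ssize_t
read_all(int fd, char *buf, size_t buflen)
{
    size_t total = 0;
    ssize_t n;

    while (total < buflen) {
        n = read(fd, buf + total, buflen - total);
        if (n < 0)
            return -1;          /* read error */
        if (n == 0)
            break;              /* end-of-file: sender has finished */
        total += (size_t)n;     /* data may arrive a few bytes at a time */
    }
    return (ssize_t)total;
}
```

The helper works on any stream descriptor, so the same loop applies whether the data arrives in one piece or in many.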
6.17 Closing A TCP Connection

6.17.1 The Need For Partial Close
When an application finishes using a connection completely, it can call close to terminate the connection gracefully and deallocate the socket. However, closing a connection is seldom simple because TCP allows two-way communication. Thus, closing a connection usually requires coordination between the client and server.

To understand the problem, consider a client and server that use the request-response interaction described above. The client software repeatedly issues requests to which the server responds. On one hand, the server cannot terminate the connection because it cannot know whether the client will send additional requests. On the other hand, while the client knows when it has no more requests to send, it may not know whether all data has arrived from the server. The latter is especially important for application protocols that transfer arbitrary amounts of data in response to a request (e.g., the response to a database query).
The direction argument is an integer. If it contains 0, no further input is allowed. If it contains 1, no further output is allowed. Finally, if the value is 2, the connection is shut down in both directions.

The advantage of a partial close should now be clear: when a client finishes sending requests, it can use shutdown to specify that it has no further data to send without deallocating the socket. The underlying protocol reports the shutdown to the remote machine, where the server application program receives an end-of-file signal. Once the server detects an end-of-file, it knows no more requests will arrive. After sending its last response, the server can close the connection. To summarize:
The partial close mechanism removes ambiguity for application protocols that transmit arbitrary amounts of information in response to a request. In such cases, the client issues a partial close after its last request; the server then closes the connection after its last response.
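The client side of a partial close can be sketched as follows. The helper name send_last_request is hypothetical, and the sketch assumes s is a connected stream socket. It uses the symbolic constant SHUT_WR from <sys/socket.h>, which names the "no further output" direction.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send the final request on connected socket s, then close only the
 * sending direction. The socket stays allocated, so the caller can
 * still read the server's response. Returns 0 on success, -1 on error.
 */
int
send_last_request(int s, const char *req)
{
    size_t len = strlen(req);

    if (write(s, req, len) != (ssize_t)len)
        return -1;
    /* no further output; the server will see end-of-file */
    return shutdown(s, SHUT_WR);
}
```

After the call, the client keeps reading from s until read returns 0, which indicates that the server has sent its last response and closed the connection.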
The first few steps of the UDP client algorithm are much like the corresponding steps of the TCP client algorithm. A UDP client obtains the server address and protocol port number, and then allocates a socket for communication.
UDP sockets can be connected, making it convenient to interact with a specific server, or they can be unconnected, making it necessary for the application to specify the server's address each time it sends a message.
Client software that uses UDP must implement reliability with techniques like packet sequencing, acknowledgements, timeouts, and retransmission. Designing protocols that are correct, reliable, and efficient for an internet environment requires considerable expertise.
6.25 Summary
Client programs are among the simplest network programs. The client must obtain the server's IP address and protocol port number before it can communicate; to increase flexibility, client programs often require the user to identify the server when invoking the client. The client then converts the server's address from dotted decimal notation into binary, or uses the domain name system to convert from a textual machine name into an IP address.

The TCP client algorithm is straightforward: a TCP client allocates a socket and connects it to a server. The client uses write to send requests to the server and read to receive replies. Once it finishes using a connection, either the client or server invokes close to terminate it.

Although a client must explicitly specify the endpoint address of the server with which it wishes to communicate, it can allow TCP/IP software to choose an unused protocol port number and to fill in the correct local IP address. Doing so avoids the problem that can arise on a router or multi-homed host when a client inadvertently chooses an IP address that differs from the IP address of the interface over which IP routes the traffic.

The client uses connect to specify a remote endpoint address for a socket. When used with TCP, connect initiates a 3-way handshake and ensures that communication is possible. When used with UDP, connect merely records the server's endpoint address for later use.

Connection shutdown can be difficult if neither the client nor the server knows exactly when communication has ended. To solve the problem, the socket interface supplies the shutdown primitive that causes a partial close and lets the other side know that no more data will arrive. A client uses shutdown to close the path leading to the server; the server receives an end-of-file signal on the connection that indicates the client has finished. After the server finishes sending its last response, it uses close to terminate the connection.
FOR FURTHER STUDY

Many RFCs that define protocols also suggest algorithms or implementation techniques for client code. Stevens [1990] also reviews client implementation.

EXERCISES
6.1 Read about the sendto and recvfrom socket calls. Do they work with sockets using TCP or sockets using UDP?
6.2 When the domain name system resolves a machine name, it returns a set of one or more IP addresses. Why?
6.3 Build client software that uses gethostbyname to look up machine names at your site and print all information returned. Which official names, if any, surprised you? Do you tend to use official machine names or aliases? Describe the circumstances, if any, when aliases may not work correctly.
6.4 Measure the time required to look up a machine name (gethostbyname) and a service entry (getservent). Repeat the test for both valid and invalid names. Does a look up for an invalid name take substantially longer than for a valid one? Explain any differences you observe.
6.5 Use a network monitor to watch the network traffic your computer generates when you look up a machine name using gethostbyname. Run the experiment more than one time for each machine name you resolve. Explain the differences in network traffic between look ups.
6.6 To test whether your machine's local byte order is the same as the network byte order, write a program that uses getservbyname to look up the ECHO service for UDP and then prints the resulting protocol port value. If the local byte order and network byte order agree, the value will be 7.
6.7 Write a program that allocates a local protocol port, closes the socket, delays a few seconds, and allocates another local port. Run the program on an idle machine and on a busy timesharing system. Which port values did your program receive on each system? If they are not the same, explain.
6.8 Under what circumstances can a client program use close instead of shutdown?
6.9 Should a client use the same protocol port number each time it begins? Why or why not?
int connectsock(const char *host, const char *service, const char *transport);
/*------------------------------------------------------------------------
 * connectTCP - connect to a specified TCP service on a specified host
 *------------------------------------------------------------------------
 */
int
connectTCP(const char *host, const char *service )
/*
 * Arguments:
 *      host    - name of host to which connection is desired
 *      service - service associated with the desired port
 */
{
    return connectsock( host, service, "tcp");
}
int connectsock(const char *host, const char *service, const char *transport);
/*------------------------------------------------------------------------
 * connectUDP - connect to a specified UDP service on a specified host
 *------------------------------------------------------------------------
 */
int
connectUDP(const char *host, const char *service )
/*
 * Arguments:
 *      host    - name of host to which connection is desired
 *      service - service associated with the desired port
 */
{
    return connectsock(host, service, "udp");
}
#define __USE_BSD
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>

int errexit(const char *format, ...);

/*------------------------------------------------------------------------
 * connectsock - allocate & connect a socket using TCP or UDP
 *------------------------------------------------------------------------
 */
int
connectsock(const char *host, const char *service, const char *transport )
/*
 * Arguments:
 *      host      - name of host to which connection is desired
 *      service   - service associated with the desired port
 *      transport - name of transport protocol to use ("tcp" or "udp")
 */
{
    struct hostent  *phe;      /* pointer to host information entry     */
    struct servent  *pse;      /* pointer to service information entry  */
    struct protoent *ppe;      /* pointer to protocol information entry */
    struct sockaddr_in sin;    /* an Internet endpoint address          */
    int s, type;               /* socket descriptor and socket type     */

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;

    /* Map service name to port number */
    if ( pse = getservbyname(service, transport) )
        sin.sin_port = pse->s_port;
    else if ( (sin.sin_port = htons((u_short)atoi(service))) == 0 )
        errexit("can't get \"%s\" service entry\n", service);

    /* Map host name to IP address, allowing for dotted decimal */
    if ( phe = gethostbyname(host) )
        memcpy(&sin.sin_addr, phe->h_addr, phe->h_length);
    else if ( (sin.sin_addr.s_addr = inet_addr(host)) == INADDR_NONE )
        errexit("can't get \"%s\" host entry\n", host);

    /* Map transport protocol name to protocol number */
    if ( (ppe = getprotobyname(transport)) == 0)
        errexit("can't get \"%s\" protocol entry\n", transport);

    /* Use protocol to choose a socket type */
    if (strcmp(transport, "udp") == 0)
        type = SOCK_DGRAM;
    else
        type = SOCK_STREAM;

    /* Allocate a socket */
    s = socket(PF_INET, type, ppe->p_proto);
    if (s < 0)
        errexit("can't create socket: %s\n", strerror(errno));

    /* Connect the socket */
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
        errexit("can't connect to %s.%s: %s\n", host, service,
            strerror(errno));
    return s;
}
Although most steps are straightforward, a few details make the code seem complicated. First, the C language permits complex expressions. As a result, the expressions in many of the condition statements contain a function call, an assignment, and a comparison, all on one line. For example, the call to getprotobyname appears in an expression that assigns the result to variable ppe, and then compares the result to 0. If the value returned is zero (i.e., an error occurred), the if statement executes a call to errexit. Otherwise, the procedure continues execution.

Second, the code uses two library procedures defined by ANSI C, memset and memcpy [1]. Procedure memset places bytes of a given value in a block of memory; it is the fastest way to zero a large structure or array. Procedure memcpy copies a block of bytes from one memory location to another, regardless of the contents [2]. Connectsock uses memset to fill the entire sockaddr_in structure with zeroes, and then uses memcpy to copy the bytes of the server's IP address into field sin_addr.

Finally, connectsock calls procedure connect to connect the socket. If an error occurs, it calls errexit.
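The reason a byte-oriented copy is needed for addresses can be seen in a short sketch. The address 10.0.0.1 is only an example, and both helper names are illustrative: a string copy would stop at the first zero byte, whereas memcpy copies all four bytes.

```c
#include <stddef.h>
#include <string.h>

/* Copy a 4-byte IPv4 address, including any zero bytes it contains. */
void
copy_ipv4(unsigned char dst[4], const unsigned char src[4])
{
    memcpy(dst, src, 4);
}

/*
 * Report how many bytes precede the first zero byte in an address;
 * a string-oriented copy would treat that zero as the end of the data.
 */
size_t
bytes_before_first_zero(const unsigned char *addr, size_t len)
{
    const unsigned char *z = memchr(addr, 0, len);

    return z != NULL ? (size_t)(z - addr) : len;
}
```

For 10.0.0.1, only one byte precedes the first zero, so a string copy would lose the remaining three bytes of the address.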
/* errexit.c - errexit */
[1] Early versions of UNIX used the names bzero and bcopy.
[2] Function strcpy cannot be used to copy an IP address because IP addresses can contain zero bytes, which strcpy interprets as end of string.
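#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*------------------------------------------------------------------------
 * errexit - print an error message and exit
 *------------------------------------------------------------------------
 */
/*VARARGS1*/
int
errexit(const char *format, ...)
{
    va_list args;

    va_start(args, format);
    vfprintf(stderr, format, args);    /* format as printf would */
    va_end(args);
    exit(1);
}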
Errexit takes a variable number of arguments, which it passes on to vfprintf for output. Errexit follows the printf conventions for formatted output. The first argument specifies how the output should be formatted; remaining arguments specify values to be printed according to the given format.
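The same variable-argument convention can be exercised without terminating the program by formatting into a buffer. The helper format_error below is hypothetical; it uses the standard vsnprintf routine in place of vfprintf so the result can be inspected.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a message into out using printf conventions.
 * Returns the value reported by vsnprintf (the length the complete
 * message requires, not counting the terminating null byte).
 */
int
format_error(char *out, size_t outlen, const char *format, ...)
{
    va_list args;
    int n;

    va_start(args, format);          /* arguments that follow format */
    n = vsnprintf(out, outlen, format, args);
    va_end(args);
    return n;
}
```

A call such as format_error(msg, sizeof(msg), "can't get \"%s\" host entry", host) produces the same text that errexit would print for the corresponding failure.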
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int TCPdaytime(const char *host, const char *service);
int errexit(const char *format, ...);
int connectTCP(const char *host, const char *service);

#define LINELEN 128

/*------------------------------------------------------------------------
 * main - TCP client for DAYTIME service
 *------------------------------------------------------------------------
 */
int
main(int argc, char *argv[])
{
    char *host = "localhost";      /* host to use if none supplied */
    char *service = "daytime";     /* default service port         */

    switch (argc) {
    case 1:
        host = "localhost";
        break;
    case 3:
        service = argv[2];
        /* FALL THROUGH */
    case 2:
        host = argv[1];
        break;
    default:
        fprintf(stderr, "usage: TCPdaytime [host [port]]\n");
        exit(1);
    }
    TCPdaytime(host, service);
    exit(0);
}

/*------------------------------------------------------------------------
 * TCPdaytime - invoke Daytime on specified host and print results
 *------------------------------------------------------------------------
 */
int
TCPdaytime(const char *host, const char *service)
{
    char buf[LINELEN+1];           /* buffer for one line of text  */
    int  s, n;                     /* socket, read count           */

    s = connectTCP(host, service);

    /* print whatever arrives until the server closes the connection */
    while ((n = read(s, buf, LINELEN)) > 0) {
        buf[n] = '\0';             /* ensure null-terminated       */
        (void) fputs(buf, stdout);
    }
    return 0;
}
Notice how using connectTCP simplifies the code. Once a connection has been established, DAYTIME merely reads input from the connection and prints it, iterating until it detects an end of file condition.
converts from its local time to universal time before sending a reply, and a client converts from universal time to its local time when the reply arrives. Unlike the DAYTIME service, which is intended for human users, the TIME service is intended for use by programs that store or manipulate times. The TIME protocol always specifies time in a 32-bit integer, representing the number of seconds since an epoch date. The TIME protocol uses midnight, January 1, 1900, as its epoch. Using an integer representation allows computers to transfer time from one machine to another quickly, without waiting to convert it into a text string and back into an integer. Thus, the TIME service makes it possible for one computer to set its time-of-day clock from the clock on another system.
#include <sys/types.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BUFSIZE   64
#define UNIXEPOCH 2208988800UL     /* seconds from Jan 1, 1900 to Jan 1, 1970 */
#define MSG       "what time is it?\n"

int connectUDP(const char *host, const char *service);
int errexit(const char *format, ...);

/*------------------------------------------------------------------------
 * main - UDP client for TIME service that prints the resulting time
 *------------------------------------------------------------------------
 */
int
main(int argc, char *argv[])
{
    char   *host = "localhost";    /* host to use if none supplied */
    char   *service = "time";      /* default service name         */
    time_t  now;                   /* time value from the server   */
    int     s, n;                  /* socket descriptor, read count */

    switch (argc) {
    case 1:
        host = "localhost";
        break;
    case 3:
        service = argv[2];
        /* FALL THROUGH */
    case 2:
        host = argv[1];
        break;
    default:
        fprintf(stderr, "usage: UDPtime [host [port]]\n");
        exit(1);
    }

    s = connectUDP(host, service);

    /* Send a datagram to prompt a reply */
    (void) write(s, MSG, strlen(MSG));

    /* Read the time */
    n = read(s, (char *)&now, sizeof(now));
    if (n < 0)
        errexit("read failed: %s\n", strerror(errno));
    now = ntohl((u_long)now);      /* put in host byte order       */
    now -= UNIXEPOCH;              /* convert to the UNIX epoch    */
    printf("%s", ctime(&now));
    exit(0);
}
The example code contacts the TIME service by sending a datagram. It then calls read to wait for a reply and extract the time value from it. Once UDPtime has obtained the time, it must convert the time into a form suitable for the local machine. First, it uses ntohl to convert the 32-bit value (a long in C) from network standard byte order into the local host byte order. Second, UDPtime must convert to the machine's local representation. The example code is designed for UNIX. Like the Internet protocols, UNIX represents time in a 32-bit integer and interprets the integer to be a count of seconds. Unlike the Internet, however, UNIX assumes an epoch date of January 1, 1970. Thus, to convert from the TIME protocol epoch to the UNIX epoch, the client must subtract the number of seconds between January 1, 1900 and January 1, 1970. The example code uses the conversion value 2208988800. Once the time has been converted to a representation compatible with that of the local machine, UDPtime can invoke the library procedure ctime, which converts the value into a human readable form for output.
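The two conversion steps can also be written against a fixed-width buffer. That is safer on systems where time_t is wider than 32 bits, an assumption that goes beyond the UNIX systems the text describes. In the sketch below, the helper name time_from_wire is illustrative; it assembles the four bytes of the reply in network byte order by hand and then shifts the epoch.

```c
#include <stdint.h>
#include <time.h>

#define UNIXEPOCH 2208988800UL   /* seconds from Jan 1, 1900 to Jan 1, 1970 */

/*
 * Convert the 4-byte TIME protocol reply (network byte order, seconds
 * since 1900) into a time_t that counts seconds since 1970.
 */
time_t
time_from_wire(const unsigned char wire[4])
{
    /* assemble the 32-bit value, most significant byte first */
    uint32_t secs = ((uint32_t)wire[0] << 24) | ((uint32_t)wire[1] << 16) |
                    ((uint32_t)wire[2] << 8)  |  (uint32_t)wire[3];

    /* move from the 1900 epoch to the 1970 epoch */
    return (time_t)((int64_t)secs - (int64_t)UNIXEPOCH);
}
```

A client would read exactly four bytes from the socket into an unsigned char array, pass the array to time_from_wire, and hand the result to ctime. The constant follows from 70 years containing 17 leap days: (70 x 365 + 17) x 86400 = 2208988800.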
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int TCPecho(const char *host, const char *service);
int errexit(const char *format, ...);
int connectTCP(const char *host, const char *service);

#define LINELEN 128

/*------------------------------------------------------------------------
 * main - TCP client for ECHO service
 *------------------------------------------------------------------------
 */
int
main(int argc, char *argv[])
{
    char *host = "localhost";      /* host to use if none supplied */
    char *service = "echo";        /* default service name         */

    switch (argc) {
    case 1:
        host = "localhost";
        break;
    case 3:
        service = argv[2];
        /* FALL THROUGH */
    case 2:
        host = argv[1];
        break;
    default:
        fprintf(stderr, "usage: TCPecho [host [port]]\n");
        exit(1);
    }
    TCPecho(host, service);
    exit(0);
}

/*------------------------------------------------------------------------
 * TCPecho - send input to ECHO service on specified host and print reply
 *------------------------------------------------------------------------
 */
int
TCPecho(const char *host, const char *service)
{
    char buf[LINELEN+1];           /* buffer for one line of text  */
    int  s, n;                     /* socket descriptor, read count */
    int  outchars, inchars;        /* characters sent and received */

    s = connectTCP(host, service);

    while (fgets(buf, sizeof(buf), stdin)) {
        buf[LINELEN] = '\0';       /* ensure line null-terminated  */
        outchars = strlen(buf);
        (void) write(s, buf, outchars);

        /* read it back */
        for (inchars = 0; inchars < outchars; inchars += n) {
            n = read(s, &buf[inchars], outchars - inchars);
            if (n < 0)
                errexit("socket read failed: %s\n",
                    strerror(errno));
        }
        fputs(buf, stdout);
    }
    return 0;
}
After opening a connection, TCPecho enters a loop that repeatedly reads one line of input, sends the line across the TCP connection to the ECHO server, reads it back again, and prints it. After all input lines have been sent to the server, received back, and printed successfully, the client exits.
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int UDPecho(const char *host, const char *service);
int errexit(const char *format, ...);
int connectUDP(const char *host, const char *service);

#define LINELEN 128

/*------------------------------------------------------------------------
 * main - UDP client for ECHO service
 *------------------------------------------------------------------------
 */
int
main(int argc, char *argv[])
{
    char *host = "localhost";      /* host to use if none supplied */
    char *service = "echo";        /* default service name         */

    switch (argc) {
    case 1:
        host = "localhost";
        break;
    case 3:
        service = argv[2];
        /* FALL THROUGH */
    case 2:
        host = argv[1];
        break;
    default:
        fprintf(stderr, "usage: UDPecho [host [port]]\n");
        exit(1);
    }
    UDPecho(host, service);
    exit(0);
}

/*------------------------------------------------------------------------
 * UDPecho - send input to ECHO service on specified host and print reply
 *------------------------------------------------------------------------
 */
int
UDPecho(const char *host, const char *service)
{
    char buf[LINELEN+1];           /* buffer for one line of text  */
    int  s, nchars;                /* socket descriptor, line length */

    s = connectUDP(host, service);

    while (fgets(buf, sizeof(buf), stdin)) {
        buf[LINELEN] = '\0';       /* ensure null-terminated       */
        nchars = strlen(buf);
        (void) write(s, buf, nchars);

        if (read(s, buf, nchars) < 0)
            errexit("socket read failed: %s\n",
                strerror(errno));
        fputs(buf, stdout);
    }
    return 0;
}
The example UDP ECHO client follows the same general algorithm as the TCP version. It repeatedly reads a line of input, sends it to the server, reads it back from the server, and prints it. The biggest difference between the UDP and TCP versions lies in how they treat data received from the server. Because UDP is datagram-oriented, the client treats an input line as a unit and places each in a single datagram. Similarly, the ECHO server receives and returns complete datagrams. Thus, while the TCP client reads incoming data as a stream of bytes, the UDP client either receives an entire line back from the server or receives none of it; each call to read returns the entire line unless an error has occurred.
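The datagram behavior described above can be observed without a network. The sketch below is an illustration rather than part of the ECHO client: it uses a UNIX-domain datagram socket pair as a stand-in for UDP (an assumption made so the example needs no server), sends two lines, and records how many bytes each read returns.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send two lines as separate datagrams and record the size that each
 * of the next two reads returns. Returns 0 on success, -1 on error.
 */
int
datagram_boundaries(ssize_t sizes[2])
{
    int  sv[2];                    /* connected pair of datagram sockets */
    char buf[128];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    (void) write(sv[0], "first line\n", 11);   /* one datagram      */
    (void) write(sv[0], "second\n", 7);        /* another datagram  */

    /* each read returns exactly one datagram, never a merged stream */
    sizes[0] = read(sv[1], buf, sizeof(buf));
    sizes[1] = read(sv[1], buf, sizeof(buf));

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

With a stream socket the first read could return both lines at once; with datagrams each read yields one complete message, which is why the UDP client needs no inner loop to reassemble a line.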
7.19 Summary
Programmers use the procedural abstraction to keep programs flexible and easy to maintain, to hide details, and to make it easy to port programs to new computers. Once a programmer writes and debugs a procedure, he or she places it in a library where it can be reused in many programs easily. A library of procedures is especially important for programs that use TCP/IP because they often operate on multiple computers.

This chapter presents an example library of procedures used to create client software. The primary procedures in our library, connectTCP and connectUDP, make it easy to allocate and connect a socket to a specified service on a specified host.

The chapter presents examples of a few client applications. Each example contains the code for a complete C program that implements a standard application protocol: DAYTIME (used to obtain and print the time of day in a human-readable format), TIME (used to obtain the time in 32-bit integer form), and ECHO (used to test network connectivity). The example code shows how a library of procedures hides many of the details associated with socket allocation and makes it easier to write client software.

FOR FURTHER STUDY

The application protocols described here are each part of the TCP/IP standard. Postel [RFC 867] contains the standard for the DAYTIME protocol, Postel and Harrenstien [RFC 868] contains the standard for the TIME protocol, and Postel [RFC 862] contains the standard for the ECHO protocol. Mills [RFC 1305] specifies version 3 of the Network Time Protocol, NTP.

EXERCISES
7.1 Use program TCPdaytime to contact servers on several machines. How does each format the time and date?
7.2 The Internet standard represents time in a 32-bit integer that gives seconds past the epoch, midnight January 1, 1900. UNIX systems also represent time in a 32-bit integer that measures seconds, but UNIX uses January 1, 1970 as its epoch. What is the maximum date and time that can be represented in each system?
7.3 Improve the TIME client so it checks the date received to verify that it is greater than January 1, 1996 (or some other date you know to be in the recent past).
7.4 Modify the TIME client so it computes E, the time that elapses between when it sends the request and when it receives a response. Add one-half E to the time the server sends.
7.5 Build a TIME client that contacts two TIME servers, and reports the differences between the times they return.
7.6 Explain how deadlock can occur if a programmer changes the line size in the TCP ECHO client to be arbitrarily large (e.g., 20,000).
7.7 The ECHO clients presented in this chapter do not verify that the text they receive back from the server matches the text they sent. Modify them to verify the data received.
7.8 The ECHO clients presented in this chapter do not count the characters sent or received. What happens if a server incorrectly sends one additional character back that the client did not send?
7.9 The example ECHO clients in this chapter do not use shutdown. Explain how the use of shutdown can improve client performance.
7.10 Rewrite the code in UDPecho.c so it tests reachability by generating a message, sending it, and timing the reply. If the reply does not arrive in S seconds, declare the destination host to be unreachable. Be sure to retransmit the request at least once in case the internet happens to lose a datagram.
7.11 Rewrite the code in UDPecho.c so it creates and sends a new message once per second, checks replies to be sure they match transmissions, and reports only the round trip time for each reply without printing the contents of the message itself.
7.12 Explain what happens to UDPecho when the underlying network: duplicates a request sent from the client to the server, duplicates a response sent from the server to the client, loses a request sent from the client to the server, or loses a response sent from the server to the client. Modify the code to handle each of these problems.
The socket interface does permit an application to connect a UDP socket to a remote endpoint, but practical servers do not do so, and UDP is not a connection-oriented protocol.
Although we apply the terminology to servers, it would be more accurate if we restricted it to application protocols, because the choice between connectionless and connection-oriented implementations depends on the application protocol. An application protocol designed to use a connection-oriented transport service may perform incorrectly or inefficiently when using a connectionless transport protocol. To summarize:

When considering the advantages and disadvantages of various server implementation strategies, the designer must remember that the application protocol used may restrict some or all of the choices.
must be connectionless. In practice, most sites try to avoid broadcasting whenever possible; none of the standard TCP/IP application protocols currently require multicast. However, future applications could depend more on multicast.
The programmer uses the client's IP address and protocol port number as an index into the table, and arranges for each table entry to contain a pointer to a large buffer of data from the file being read. When a client issues its first request, the server searches the table and finds that it has no record of the client. It allocates a large buffer to hold data from the file, allocates a new table entry to point to the buffer, opens the specified file, and reads data into the buffer. It then copies information out of the
buffer when forming a reply. The next time a request arrives from the same client, the server finds the matching entry in the table, follows the pointer to the buffer, and extracts data from it without opening the file. Once the client has read the entire file, the server deallocates the buffer and the table entry, making the resources available for use by another client. Of course, our clever programmer builds the software carefully so that it checks to make sure the requested data resides in the buffer and reads new data into the buffer from the file if necessary. The server also compares the file specified in a request with the file name in the table entry to verify that the client is still using the same file as the previous request. If the clients follow the assumptions listed above and the programmer is careful, adding large file buffers and a simple table to the server can improve its performance dramatically. Furthermore, under the assumptions given, the optimized version of the server will perform at least as fast as the original version because the server spends little time maintaining the data structures compared to the time required to read from a disk. Thus, the optimization seems to improve performance without any penalty. Adding the proposed table changes the server in a subtle way, however, because it introduces state information. Of course, state information chosen carelessly could introduce errors in the way the server responds. For example, if the server used the client's IP address and protocol port number to find the buffer without checking the file name or file offset in the request, duplicate or out-of-order requests could cause the server to return incorrect data. 
But remember we said that the programmer who designed the optimized version was clever and programmed the server to check the file name and offset in each request, just in case the network duplicates or drops a request or the client decides to read from a new file instead of reading sequentially from the old file. Thus, it may seem that the addition of state information does not change the way the server replies. In fact, if the programmer is careful, the protocol will remain correct. If so, what harm can the state information do?

Unfortunately, even a small amount of state information can cause a server to perform badly when machines, client programs, or networks fail. To understand why, consider what happens if one of the client programs fails (i.e., crashes) and must be restarted. Chances are high that the client will ask for an arbitrary protocol port number and UDP will assign a new protocol port number different from the one assigned for earlier requests. When the server receives a request from the client, it cannot know that the client has crashed and restarted, so it allocates a new buffer for the file and a new slot in the table. Consequently, it cannot know that the old table entry the client was using should be removed. If the server does not remove old entries, it will eventually run out of table slots.

It may seem that leaving an idle table entry around does not cause any problem as long as the server chooses an entry to delete when it needs a new one. For example, the server might choose to delete the least recently used (LRU) entry, much like the LRU page replacement strategy used in many virtual memory systems. However, in a network where multiple clients access a single server, frequent crashes can cause one client to dominate the table by filling it with entries that will never be reused. In the worst case, each request that arrives causes the server to delete an entry and reuse it.
If one client crashes and reboots frequently enough, it can cause the server to remove entries for legitimate clients. Thus, the server expends more effort managing the table and buffers than it does answering requests2. The important point here is that: A programmer must be extremely careful when optimizing a stateless server because managing small amounts of state information can consume resources if clients crash and reboot frequently or if the underlying network duplicates or delays messages.
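The effect of a restarting client on such a table can be sketched in a few lines of C. The code below is an illustration only, not the server discussed in the text: the names state_entry, state_lookup, and TABLE_SLOTS, the four-slot table, and the logical clock used to approximate LRU are all assumptions made for the example. It shows that a client arriving with a new port number always takes a fresh slot, and that once the table fills, each new arrival evicts the least recently used entry, which may belong to a legitimate client.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SLOTS 4                 /* hypothetical table size */

struct state_entry {
    int           in_use;             /* nonzero if the slot holds a client */
    uint32_t      addr;               /* client IP address                  */
    uint16_t      port;               /* client protocol port number        */
    unsigned long last_used;          /* logical time of the last request   */
};

static struct state_entry table[TABLE_SLOTS];
static unsigned long clock_ticks;     /* logical clock that orders requests */

/* Find the entry for (addr, port). If none exists, take a free slot or,
 * failing that, evict the least recently used entry. *evicted is set to 1
 * when a live entry had to be discarded to make room. */
struct state_entry *state_lookup(uint32_t addr, uint16_t port, int *evicted)
{
    struct state_entry *victim = &table[0];
    int i;

    assert(evicted != 0);
    *evicted = 0;
    for (i = 0; i < TABLE_SLOTS; i++) {
        struct state_entry *e = &table[i];

        if (e->in_use && e->addr == addr && e->port == port) {
            e->last_used = ++clock_ticks;
            return e;                 /* request from a known client */
        }
        /* Prefer a free slot; otherwise remember the oldest entry. */
        if (victim->in_use && (!e->in_use || e->last_used < victim->last_used))
            victim = e;
    }
    *evicted = victim->in_use;
    victim->in_use = 1;
    victim->addr = addr;
    victim->port = port;
    victim->last_used = ++clock_ticks;
    return victim;
}
```

In this sketch a client that restarts three times (ports 1000 through 1003) is enough to fill the table and begin evicting entries, even though only two distinct machines ever contacted the server.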
where s is an unconnected socket, message is the address of a buffer that contains the data to be sent, len specifies the number of bytes in the buffer, flags specifies debugging or control options, toaddr is a pointer to a sockaddr_in structure that contains the endpoint address to which the message should be sent, and toaddrlen is an integer that specifies the length of the address structure. The socket calls provide an easy way for connectionless servers to obtain the address of a client: the server obtains the address for a reply from the source address found in the request. In fact, the socket interface provides a call that servers can use to receive the sender's address along with the next datagram that arrives. The call, recvfrom, takes two arguments that specify two buffers. The system places the arriving datagram in one buffer and the sender's address in the second buffer. A call to recvfrom has the form:
retcode = recvfrom(s, buf, len, flags, from, fromlen);
where argument s specifies a socket to use, buf specifies a buffer into which the system will place the next datagram, len specifies the space available in the buffer, from specifies a second buffer into which the system will place the source address, and fromlen specifies the address of an integer. Initially, fromlen specifies the length of the from buffer. When the call returns, fromlen will contain the length of the source address the system placed in the buffer. To generate a reply, the server uses the address that recvfrom stored in the from buffer when the request arrived.
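The pairing of recvfrom and sendto can be illustrated with a short sketch. It is not taken from the text: the function name reply_to_sender and the 512-byte buffer are assumptions chosen for the example, and the reply simply returns the bytes that arrived. The point is only that the address recvfrom stores becomes the destination passed to sendto.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Receive one datagram on UDP socket s and send the same bytes back to
 * whoever sent it. Returns the number of bytes sent, or -1 on error. */
ssize_t reply_to_sender(int s)
{
    char buf[512];                        /* arbitrary size for the sketch  */
    struct sockaddr_in from;              /* filled in by recvfrom          */
    socklen_t fromlen = sizeof(from);     /* in: space available,
                                             out: length of address stored  */
    ssize_t n;

    assert(s >= 0);
    memset(&from, 0, sizeof(from));
    n = recvfrom(s, buf, sizeof(buf), 0, (struct sockaddr *)&from, &fromlen);
    if (n < 0)
        return -1;

    /* The source address of the request is the destination of the reply. */
    return sendto(s, buf, (size_t)n, 0, (struct sockaddr *)&from, fromlen);
}
```

Note that fromlen must be reset to the size of the address buffer before every call, because recvfrom overwrites it with the length of the address it stored.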
the processing time required varies dramatically among requests, or the server executes on a computer with multiple processors. In the first case, allowing the server to compute responses concurrently means that it can overlap use of the processor and peripheral devices, even if the machine has only one CPU. While the processor works to compute one response, the I/O devices can be transferring data into memory that will be needed for other responses. In the second case, timeslicing permits a single processor to handle requests that only require small amounts of processing without waiting for requests that take longer. In the third case, concurrent execution on a computer with multiple processors allows one processor to compute a response to one request while another processor computes a response to another. In fact, most concurrent servers adapt to the underlying hardware automatically - given more hardware resources (e.g., more processors), they perform better. Concurrent servers achieve high performance by overlapping processing and I/O. They are usually designed so performance improves automatically if the server is run on hardware that offers more resources.
Programmers should remember that although the exact cost of creating a process depends on the operating system and underlying architecture, the operation can be expensive. In the case of a connectionless protocol, one must consider carefully whether the cost of concurrency will be greater than the gain in speed. In fact: Because process creation is expensive, few connectionless servers have concurrent implementations.
Connection-oriented application protocols use a connection as the basic paradigm for communication. They allow a client to establish a connection to a server, communicate over that connection, and then discard it. In most cases, the connection between client and server handles more than a single request: the protocol allows a client to repeatedly send requests and receive responses without terminating the connection or creating a new one. Thus: Connection-oriented servers implement concurrency among connections rather than among individual requests. Algorithm 8.4 specifies the steps that a concurrent server uses for a connection-oriented protocol.
As in the connectionless case, the master server process never communicates with the client directly. As soon as a new connection arrives, the master creates a slave to handle that connection. While the slave interacts with the client, the master waits for other connections.
controls one window, sending requests that update the contents. Each client operates independently, and may wait many hours before changing the display or may update the display frequently. For example, an application that displays the time by drawing a picture of a clock might update its display every minute. Meanwhile, an application that displays the status of a user's electronic mail waits until new mail arrives before it changes the display. A server for the X window system integrates information it obtains from clients into a single, contiguous section of memory called the display buffer. Because data arriving from all clients contributes to a single, shared data structure and because BSD UNIX does not allow independent processes to share memory, the server cannot execute as separate UNIX processes. Thus, a conflict arises between a desire for concurrency among processes that share memory and a lack of support for such concurrency in UNIX. Although it may not be possible to achieve real concurrency among processes that share memory, it may be possible to achieve apparent concurrency if the total load of requests presented to the server does not exceed its capacity to handle them. To do so, the server operates as a single UNIX process that uses the select system call for asynchronous I/O. Algorithm 8.5 describes the steps a single-process server takes to handle multiple connections.
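The central step of a single-process design, waiting on several descriptors at once, might look like the following sketch. It is not the algorithm from the text: the function name first_ready and its interface are assumptions for illustration, and a real server would go on to read from every ready descriptor rather than report only the first.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Block until at least one descriptor in fds[0..n-1] has data to read,
 * then return the index of the first ready descriptor (-1 on error). */
int first_ready(const int *fds, int n)
{
    fd_set rfds;
    int i, maxfd = -1;

    assert(fds != 0 && n > 0);
    FD_ZERO(&rfds);
    for (i = 0; i < n; i++) {
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    /* The first argument is one more than the highest descriptor checked;
     * a null timeout means the call blocks until something is ready. */
    if (select(maxfd + 1, &rfds, (fd_set *)0, (fd_set *)0,
               (struct timeval *)0) < 0)
        return -1;

    for (i = 0; i < n; i++)
        if (FD_ISSET(fds[i], &rfds))
            return i;
    return -1;
}
```

Because the process blocks only in select, and reads only from descriptors that select reports as ready, a client that sends nothing cannot stall service to the others.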
3 The term deadlock refers to a condition in which a program or set of programs cannot proceed because they are blocked waiting for an event that will never happen. In the case of servers, deadlock means that the server ceases to answer requests.
block. If the central server process blocks, it cannot handle other connections. The important point is that any server using only one process can be subject to deadlock. A misbehaving client can cause deadlock in a single-process server if the server uses system functions that can block when communicating with the client. Deadlock is a serious liability in servers because it means the behavior of one client can prevent the server from handling other clients.
8.28 Summary
Conceptually, a server consists of a simple algorithm that iterates forever, waiting for the next request from a client, handling the request, and sending a reply. In practice, however, servers use a variety of implementations to achieve reliability, flexibility, and efficiency. Iterative implementations work well for services that require little computation. When using a connection-oriented transport, an iterative server handles one connection at a time; for connectionless transport, an iterative server handles one request at a time. To achieve efficiency, servers often provide concurrent service by handling multiple requests at the same time. A connection-oriented server provides for concurrency among connections by creating a process to handle each new connection. A connectionless server provides concurrency by creating a new process to handle each new request. Any server implemented with a single process that uses synchronous system functions like read or write can be subject to deadlock. Deadlock can arise in iterative servers as well as in concurrent servers that use a single-process implementation. Server deadlock is especially serious because it means a single misbehaving client can prevent the server from handling requests for other clients.

FOR FURTHER STUDY

Stevens [1990] describes some of the server algorithms covered in this chapter and shows implementation details. BSD UNIX contains examples of many server algorithms; programmers often consult the UNIX source code for programming techniques.

EXERCISES
8.1 Calculate how long an iterative server takes to transfer a 200 megabyte file if the internet has a throughput of 2.3 Kbytes per second.
8.2 If 20 clients each send 2 requests per second to an iterative server, what is the maximum time that the server can spend on each request?
8.3 How long does it take a concurrent, connection-oriented server to accept a new connection and create a new process to handle it on the computers to which you have access?
8.4 Write an algorithm for a concurrent, connectionless server that creates one new process for each request.
8.5 Modify the algorithm in the previous problem so the server creates one new process per client instead of one new process per request. How does your algorithm handle process termination?
8.6 Connection-oriented servers provide concurrency among connections. Does it make sense for a concurrent, connection-oriented server to increase concurrency even further by having the slave processes create additional processes for each request? Explain.
8.7 Rewrite the TCP echo client so it uses a single process to concurrently handle input from the keyboard, input from its TCP connection, and output to its TCP connection.
8.8 Can clients cause deadlock or disrupt service in concurrent servers? Why or why not?
8.9 Look carefully at the select system call. How can a single-process server use select to avoid deadlock?
8.10 The select call takes an argument that specifies how many I/O descriptors it should check. Explain how the argument makes a single-process server program portable across many UNIX systems.
1 See Section 8.16 for a description of Algorithm 8.2.
2 In UNIX, an application must execute as root (i.e., be the superuser) to have sufficient privilege to bind to a reserved port.
 *-----------------------------------------------------------------------*/
int
passiveUDP(const char *service)
/*
 * Arguments:
 *      service - service associated with the desired port
 */
{
        return passivesock(service, "udp", 0);
}
Procedure passivesock contains the socket allocation details, including the use of portbase. It takes three arguments. The first argument specifies the name of a service, the second specifies the name of the protocol, and the third (used only for TCP sockets) specifies the desired length of the connection request queue. Passivesock allocates either a datagram or stream socket, binds the socket to the well-known port for the service, and returns the socket descriptor to its caller. Recall that when a server binds a socket to a well-known port, it must specify the address using structure sockaddr_in, which includes an IP address as well as a protocol port number. Passivesock uses the constant INADDR_ANY instead of a specific local IP address, enabling it to work either on hosts that have a single IP address or on routers and multi-homed hosts that have multiple IP addresses. Using INADDR_ANY means that the server will receive communication addressed to its well-known port at any of the machine's IP addresses.
/* passivesock.c - passivesock */

#include <sys/types.h>
#include <sys/socket.h>

#include <netinet/in.h>

#include <errno.h>
#include <netdb.h>
#include <stdlib.h>
#include <string.h>

int     errexit(const char *format, ...);

u_short portbase = 0;           /* port base, for non-root servers      */

/*------------------------------------------------------------------------
 * passivesock - allocate & bind a server socket using TCP or UDP
 *------------------------------------------------------------------------
 */
int
passivesock(const char *service, const char *transport, int qlen)
/*
 * Arguments:
 *      service   - service associated with the desired port
 *      transport - transport protocol to use ("tcp" or "udp")
 *      qlen      - maximum server request queue length
 */
{
        struct servent  *pse;   /* pointer to service information entry */
        struct protoent *ppe;   /* pointer to protocol information entry*/
        struct sockaddr_in sin; /* an Internet endpoint address         */
        int     s, type;        /* socket descriptor and socket type    */

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = INADDR_ANY;

    /* Map service name to port number */
        if ( (pse = getservbyname(service, transport)) != NULL )
                sin.sin_port = htons(ntohs((u_short)pse->s_port)
                        + portbase);
        else if ( (sin.sin_port = htons((u_short)atoi(service))) == 0 )
                errexit("can't get \"%s\" service entry\n", service);

    /* Map protocol name to protocol number */
        if ( (ppe = getprotobyname(transport)) == 0)
                errexit("can't get \"%s\" protocol entry\n", transport);

    /* Use protocol to choose a socket type */
        if (strcmp(transport, "udp") == 0)
                type = SOCK_DGRAM;
        else
                type = SOCK_STREAM;

    /* Allocate a socket */
        s = socket(PF_INET, type, ppe->p_proto);
        if (s < 0)
                errexit("can't create socket: %s\n", strerror(errno));

    /* Bind the socket */
        if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
                errexit("can't bind to %s port: %s\n", service,
                        strerror(errno));
        if (type == SOCK_STREAM && listen(s, qlen) < 0)
                errexit("can't listen on %s port: %s\n", service,
                        strerror(errno));
        return s;
}
The single server process executes forever. It uses a single passive socket that has been bound to the well-known protocol port for the service it offers. The server obtains a request from the socket, computes a response, and sends a reply back to the client using the same socket. The server uses the source address in the request as the destination address in the reply.
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

int     passiveUDP(const char *service);
int     errexit(const char *format, ...);

#define UNIXEPOCH       2208988800UL    /* UNIX epoch, in Internet-epoch seconds */

/*------------------------------------------------------------------------
 * main - Iterative UDP server for TIME service
 *------------------------------------------------------------------------
 */
int
main(int argc, char *argv[])
{
        struct sockaddr_in fsin;        /* the from address of a client */
        char    *service = "time";      /* service name or port number  */
        char    buf[1];                 /* "input" buffer; any size > 0 */
        int     sock;                   /* server socket                */
        time_t  now;                    /* current time                 */
        int     alen;                   /* from-address length          */

        switch (argc) {
        case    1:
                break;
        case    2:
                service = argv[1];
                break;
        default:
                errexit("usage: UDPtimed [port]\n");
        }

        sock = passiveUDP(service);

        while (1) {
                alen = sizeof(fsin);
                if (recvfrom(sock, buf, sizeof(buf), 0,
                                (struct sockaddr *)&fsin, &alen) < 0)
                        errexit("recvfrom: %s\n", strerror(errno));
                (void) time(&now);
                now = htonl((u_long)(now + UNIXEPOCH));
                (void) sendto(sock, (char *)&now, sizeof(now), 0,
                        (struct sockaddr *)&fsin, sizeof(fsin));
        }
}
Like any server, the UDPtimed process must execute forever. Thus, the main body of code consists of an infinite loop that accepts a request, computes the current time, and sends a reply back to the client that sent the request.

The code contains several details. After parsing its arguments, UDPtimed calls passiveUDP to create a passive socket for the TIME service. It then enters the infinite loop. The TIME protocol specifies that a client can send an arbitrary datagram to trigger a reply. The datagram can be of any length and can contain any values because the server does not interpret its contents. The example implementation uses recvfrom to read the next datagram. Recvfrom places the incoming datagram in buffer buf, and places the endpoint address of the client that sent the datagram in structure fsin. Because it does not need to examine the data, the implementation uses a single-character buffer. If the datagram contains more than one byte of data, recvfrom discards all remaining bytes.

UDPtimed uses the UNIX system routine time to obtain the current time. Recall from Chapter 7 that UNIX uses a 32-bit integer to represent time, measuring from the epoch of midnight, January 1, 1970. After obtaining the time from UNIX, UDPtimed must convert it to a value measured from the Internet epoch and place the result in network byte order. To perform the conversion, it adds constant UNIXEPOCH, which is defined to have the value 2208988800, the difference in seconds between the Internet epoch and the UNIX epoch. It then calls function htonl to convert the result to network byte order. Finally, UDPtimed calls sendto to transmit the result back to the client. Sendto uses the endpoint address in structure fsin as the destination address (i.e., it uses the address of the client that sent the datagram).
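The epoch conversion can be isolated in a small helper, shown here as a sketch. The function name time_to_wire is an assumption, and the helper uses an explicit 32-bit type because time_t is wider than 32 bits on many current systems; neither detail comes from the listing above. The constant itself can be checked by hand: the 70 years from 1900 to 1970 contain 17 leap years, so the interval is 70 x 365 + 17 = 25567 days, and 25567 x 86400 = 2208988800 seconds.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <arpa/inet.h>

#define UNIXEPOCH 2208988800UL   /* seconds from 1 Jan 1900 to 1 Jan 1970 */

/* Convert a UNIX time value to the 32-bit quantity carried by the TIME
 * protocol: seconds since the Internet epoch, in network byte order. */
uint32_t time_to_wire(time_t unix_secs)
{
    assert(unix_secs >= 0);
    /* Add in 64 bits, then keep the low 32 bits that the protocol carries. */
    return htonl((uint32_t)((uint64_t)unix_secs + UNIXEPOCH));
}
```

A server built this way would pass the address of the returned 32-bit value to sendto with a length of 4, so exactly four bytes travel in the reply regardless of the size of time_t.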
9.5 Summary
For simple services, where a server does little computation for each request, an iterative implementation works well. This chapter presented an example of an iterative server for the TIME service that uses UDP for connectionless access. The example illustrates how procedures hide the details of socket allocation and make the server code simpler and easier to understand.

FOR FURTHER STUDY

Harrenstien [RFC 738] specifies the TIME protocol. Mills [RFC 1305] describes the Network Time Protocol (NTP); Mills [September 1991] summarizes issues related to using NTP in practical networks, and Mills [RFC 1361] discusses the use of NTP for clock synchronization. Marzullo and Owicki [July 1985] also discuss how to maintain clocks in a distributed environment.

EXERCISES
9.1 Instrument UDPtimed to determine how much time it expends processing each request. If you have access to a network analyzer, also measure the time that elapses between the request and response packets.
9.2 Suppose UDPtimed inadvertently clobbered the client's address between the time it received a request and sent a response (i.e., the server accidentally assigned fsin a random value before using it in the call to sendto). What would happen? Why?
9.3 Conduct an experiment to determine what happens if N clients all send requests to UDPtimed simultaneously. Vary both N, the number of senders, and S, the size of the datagrams they send. Explain why the server fails to respond to all requests. (Hint: look at the manual page for listen.)
9.4 The example code in UDPtimed.c specifies a buffer size of 1 when it calls recvfrom. What happens if it specifies a buffer size of 0?
9.5 Compute the difference between the UNIX time epoch and the Internet time epoch. Remember to account for leap years. Does the value you compute agree with the constant UNIXEPOCH defined in UDPtimed? If not, explain. (Hint: read about leap seconds.)
9.6 As a security check, the system manager asks you to modify UDPtimed so it keeps a written log of all clients who access the service. Modify the code to print a line on the console whenever a request arrives. Explain how logging can affect the service.
9.7 If you have access to a pair of machines connected by a wide-area internet, use the UDPtime client in Chapter 7 and the UDPtimed server in this chapter to see if your internet drops or duplicates packets.
/*------------------------------------------------------------------------
 * passiveTCP - create a passive socket for use in a TCP server
 *------------------------------------------------------------------------
 */
int
passiveTCP(const char *service, int qlen)
/*
 * Arguments:
 *      service - service associated with the desired port
 *      qlen    - maximum server request queue length
 */
{
        return passivesock(service, "tcp", qlen);
}
Chapter 7 shows how a client uses TCP to contact a DAYTIME server and to display the text that the server sends back. Because obtaining and formatting a date requires little processing and one expects little demand for the service, a DAYTIME server need not be optimized for speed. If additional clients attempt to make connection requests while the server is busy handling a request, the protocol software enqueues the additional requests. Thus, an iterative implementation suffices.
A server that uses connection-oriented transport iterates on connections: it waits at the well-known port for the next connection to arrive from a client, accepts the connection, handles it, closes the connection, and then waits again. The DAYTIME service makes the implementation especially simple because the server does not need to receive an explicit request from the client - it uses the presence of an incoming connection to trigger a response. Because the client does not send an explicit request, the server does not read data from the connection.
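#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int     errexit(const char *format, ...);
int     TCPdaytimed(int fd);
int     passiveTCP(const char *service, int qlen);

#define QLEN    5               /* maximum connection queue length      */

/*------------------------------------------------------------------------
 * main - Iterative TCP server for DAYTIME service
 *------------------------------------------------------------------------
 */
int
main(int argc, char *argv[])
{
        struct sockaddr_in fsin;        /* the from address of a client */
        char    *service = "daytime";   /* service name or port number  */
        int     msock, ssock;           /* master & slave sockets       */
        int     alen;                   /* from-address length          */

        switch (argc) {
        case    1:
                break;
        case    2:
                service = argv[1];
                break;
        default:
                errexit("usage: TCPdaytimed [port]\n");
        }

        msock = passiveTCP(service, QLEN);

        while (1) {
                alen = sizeof(fsin);    /* accept needs the buffer size */
                ssock = accept(msock, (struct sockaddr *)&fsin, &alen);
                if (ssock < 0)
                        errexit("accept failed: %s\n", strerror(errno));
                (void) TCPdaytimed(ssock);
                (void) close(ssock);
        }
}

/*------------------------------------------------------------------------
 * TCPdaytimed - do TCP DAYTIME protocol
 *------------------------------------------------------------------------
 */
int
TCPdaytimed(int fd)
{
        char    *pts;                   /* pointer to time string       */
        time_t  now;                    /* current time                 */

        (void) time(&now);
        pts = ctime(&now);
        (void) write(fd, pts, strlen(pts));
        return 0;
}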
Like the iterative, connectionless server described in the previous chapter, an iterative, connection-oriented server must run forever. After creating a socket that listens at the well-known port, the server enters an infinite loop in which it accepts and handles connections. The code for the server is fairly short because the call to passiveTCP hides the details of socket allocation and binding. The call to passiveTCP creates a master socket associated with the well-known port for the DAYTIME service. The second argument specifies that the master socket will have a request queue length of QLEN, allowing the system to enqueue connection requests that arrive from QLEN additional clients while the server is busy replying to a request from a given client. After creating the master socket, the server's main program enters an infinite loop. During each iteration of the loop, the server calls accept to obtain the next connection request from the master socket. To prevent the server from consuming resources while waiting for a connection from a client, the call to accept blocks the server process until a connection arrives. When a connection request arrives, the TCP protocol software engages in a 3-way handshake to establish a connection. Once the handshake completes and the system allocates a new socket for the incoming connection, the call to accept returns the descriptor of the new socket, allowing the server to continue execution. If no connection arrives, the server process remains blocked forever in the accept call. Each time a new connection arrives, the server calls procedure TCPdaytimed to handle it. The code in TCPdaytimed centers around calls to the UNIX functions time and ctime. Procedure time returns a 32-bit integer that gives the current time in seconds since the UNIX epoch. 
The UNIX library function ctime takes an integer argument that specifies a time in seconds since the UNIX epoch, and returns the address of an ASCII string that contains the time and date formatted so a human can understand it. Once the server obtains the time and date in an ASCII string, it calls write to send the string back to the client over the TCP connection. Once the call to TCPdaytimed returns, the main program continues executing the loop, and encounters the accept call again. The accept call blocks the server until another request arrives.
Of course, TCP's definition of graceful shutdown means that the call to close may not return instantly - the call will block until TCP on the server receives a reply from TCP on the client. Once the client acknowledges both the receipt of all data and the request to terminate the connection, the close call returns.
10.8 Summary
An iterative, connection-oriented server iterates once per connection. Until a connection request arrives from a client, the server remains blocked in a call to accept. Once the underlying protocol software establishes the new connection and creates a new socket, the call to accept returns the socket descriptor and allows the server to continue execution. Recall from Chapter 7 that the DAYTIME protocol uses the presence of a connection to trigger a response from the server. The client does not need to send a request because the server responds as soon as it detects a new connection. To form a response, the server obtains the current time from the operating system, formats the information into a string suitable for humans to read, and then sends the response back to the client. The example server closes the socket that corresponds to an individual connection after sending a response. The strategy of closing the connection immediately works because the DAYTIME service only allows one response per connection. Servers that allow multiple requests to arrive over a single connection must wait for the client to close the connection.

FOR FURTHER STUDY

Postel [RFC 867] describes the DAYTIME protocol used in this chapter.

EXERCISES
10.1 Does a process need special privilege to run a DAYTIME server on your local system? Does it need special privilege to run a DAYTIME client?
10.2 What is the chief advantage of using the presence of a connection to trigger a response from a server? The chief disadvantage?
10.3 Some DAYTIME servers terminate the line of text by a combination of two characters: carriage return (CR) and linefeed (LF). Modify the example server to send CR-LF at the end of the line instead of sending only LF. How does the standard specify lines should be terminated?
10.4 TCP software usually allocates a fixed-size queue for additional connection requests that arrive while a server is busy, and allows the server to change the queue size using listen. How large is the queue that your local TCP software provides? How large can the server make the queue with listen?
10.5 Modify the example server code in TCPdaytimed.c so it does not explicitly close the connection after writing a response. Does it still work correctly? Why or why not?
10.6 Compare a connection-oriented server that explicitly closes each connection after sending a response to one that allows the client to hold a connection arbitrarily long before closing the connection. What are the advantages and disadvantages of each approach?
10.7 Assume that TCP uses a connection timeout of 4 minutes (i.e., keeps information for 4 minutes after a connection closes). If a DAYTIME server runs on a system that has 100 slots for TCP connection information, what is the maximum rate at which the server can handle requests without running out of slots?
To avoid using CPU resources while it waits for connections, the master server uses a blocking call of accept to obtain the next connection from the well-known port. Thus, like the iterative server process in Chapter 10, the master server process in a concurrent server spends most of its time blocked in a call to accept. When a connection request arrives, the call to accept returns, allowing the master process to execute. The master creates a slave to handle the request, and reissues the call to accept. The call blocks the server again until another connection request arrives.
#include <sys/types.h>
#include <sys/signal.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <sys/errno.h>
#include <netinet/in.h>

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define QLEN               5    /* maximum connection queue length      */
#define BUFSIZE         4096

void    reaper(int);
int     TCPechod(int fd);
int     errexit(const char *format, ...);
int     passiveTCP(const char *service, int qlen);

/*------------------------------------------------------------------------
 * main - Concurrent TCP server for ECHO service
 *------------------------------------------------------------------------
 */
int
main(int argc, char *argv[])
{
        char    *service = "echo";      /* service name or port number  */
        struct  sockaddr_in fsin;       /* the address of a client      */
        int     alen;                   /* length of client's address   */
        int     msock;                  /* master server socket         */
        int     ssock;                  /* slave server socket          */

        switch (argc) {
        case    1:
                break;
        case    2:
                service = argv[1];
                break;
        default:
                errexit("usage: TCPechod [port]\n");
        }

        msock = passiveTCP(service, QLEN);

        (void) signal(SIGCHLD, reaper);

        while (1) {
                alen = sizeof(fsin);
                ssock = accept(msock, (struct sockaddr *)&fsin, &alen);
                if (ssock < 0) {
                        if (errno == EINTR)
                                continue;
                        errexit("accept: %s\n", strerror(errno));
                }
                switch (fork()) {
                case 0:         /* child */
                        (void) close(msock);
                        exit(TCPechod(ssock));
                default:        /* parent */
                        (void) close(ssock);
                        break;
                case -1:
                        errexit("fork: %s\n", strerror(errno));
                }
        }
}

/*------------------------------------------------------------------------
 * TCPechod - echo data until end of file
 *------------------------------------------------------------------------
 */
int
TCPechod(int fd)
{
        char    buf[BUFSIZ];
        int     cc;

        while ((cc = read(fd, buf, sizeof buf)) != 0) {
                if (cc < 0)
                        errexit("echo read: %s\n", strerror(errno));
                if (write(fd, buf, cc) < 0)
                        errexit("echo write: %s\n", strerror(errno));
        }
        return 0;
}

/*------------------------------------------------------------------------
 * reaper - clean up zombie children
 *------------------------------------------------------------------------
 */
/*ARGSUSED*/
void
reaper(int sig)
{
        int     status;

        /* collect every child that has already exited, without blocking */
        while (wait3(&status, WNOHANG, (struct rusage *)0) > 0)
                /* empty */;
}
As the example shows, the calls that control concurrency occupy only a small portion of the code. A master server process begins executing at main. After it checks its arguments, the master server calls passiveTCP to create a passive socket for the wellknown protocol port. It then enters an infinite loop. During each iteration of the loop, the master server calls accept to wait for a connection request from a client. As in the iterative server, the call blocks until a request arrives. After the underlying TCP protocol software receives a connection request, the system creates a socket for the new connection, and the call to accept returns the socket descriptor. After accept returns, the master server creates a slave process to handle the connection. To do so, the master process calls fork to divide itself into two processes2. The newly created child process first closes the master socket, and then calls procedure TCPechod to handle the connection. The parent process closes the socket that was created to handle the new connection, and continues executing the infinite loop. The next iteration of the loop will wait at the accept call for another new connection to arrive. Note that both the original and new processes have access to open sockets after the call to fork(), and that they both must close a socket before the system deallocates it. Thus, when the master process calls close for the new connection, the socket for that connection only disappears from the master process. Similarly, when the slave process calls close for the master socket, the socket only disappears from the slave process. The slave process continues to retain access to the socket for the new connection until the slave exits; the master server continues to retain access to the socket that corresponds to the well-known port. After the slave closes the master socket, it calls procedure TCPechod, which provides the ECHO service for one connection. 
Procedure TCPechod consists of a loop that repeatedly calls read to obtain data from the connection and then calls write to send the same data back over the connection. Normally, read returns the (positive) count of bytes read. It returns a value less than zero if an error occurs (e.g., the network connection between the client and server breaks) or zero if it encounters an end-of-file condition (i.e., no more data can be extracted from the socket). Similarly, write normally returns the count of characters written, but returns a value less than zero if an error occurs. The slave checks the return codes, and uses errexit to print a message if an error occurs.
TCPechod returns zero if it can echo all data without error. When TCPechod returns, the main program uses the returned value as the argument in a call to exit. UNIX interprets the exit call as a request to terminate the process, and uses the argument as a process exit code. By convention, a process uses exit code zero to denote normal termination. Thus, the slave process exits normally after performing the ECHO service. When the slave exits, the system automatically closes all open descriptors, including the descriptor for the TCP connection.
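The return-code conventions described above can be illustrated with a small echo loop. This is a sketch under stated assumptions rather than the text's TCPechod: the names echo_until_eof, write_all, and echo_demo are hypothetical, errors are reported with a return value of -1 instead of a call to errexit, and the loop also retries short writes and interrupted calls, which the description above does not mention.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes, retrying after short writes; 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Echo until end-of-file: returns 0 at end-of-file, -1 on a read or write error. */
int echo_until_eof(int fd)
{
    char buf[4096];
    ssize_t cc;

    for (;;) {
        cc = read(fd, buf, sizeof buf);
        if (cc == 0)
            return 0;               /* end-of-file: peer sent everything */
        if (cc < 0) {
            if (errno == EINTR)
                continue;
            return -1;              /* read error */
        }
        if (write_all(fd, buf, (size_t)cc) < 0)
            return -1;              /* write error */
    }
}

/* Experiment on a socket pair: returns 0 if the data comes back unchanged. */
int echo_demo(void)
{
    int sv[2];
    char reply[5];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[0], "hello", 5) != 5)
        return -1;
    (void) shutdown(sv[0], SHUT_WR);    /* signal end-of-file to the echo side */
    if (echo_until_eof(sv[1]) != 0)
        return -1;
    if (read(sv[0], reply, 5) != 5)
        return -1;
    (void) close(sv[0]);
    (void) close(sv[1]);
    return memcmp(reply, "hello", 5) == 0 ? 0 : -1;
}
```

A slave could pass the value returned by echo_until_eof to exit, mapping -1 to a nonzero exit code, so that the exit status still follows the convention that zero denotes normal termination.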
The call to signal in the main program informs the operating system that the master server process should execute function reaper whenever it receives a signal that a child process has exited (signal SIGCHLD). After that call, the system automatically invokes reaper each time the server process receives a SIGCHLD signal.
2 Recall from Chapter 3 that fork creates two processes, both executing the same code. The return value distinguishes between the original parent process and the newly created child.
Function reaper calls system function wait3 to complete termination for a child that exits. By default, wait3 blocks until one or more children exit (for any reason). It returns a value in its status argument that can be examined to find out about the process that exited. Because the program calls wait3 when a SIGCHLD signal arrives, the call will always occur after a child has exited. To ensure that an erroneous call does not deadlock the server, the program uses argument WNOHANG to specify that wait3 should not block waiting for a process to exit, but should return immediately, even if no process has exited.
11.7 Summary
Connection-oriented servers achieve concurrency by allowing multiple clients to communicate with the server. The straightforward implementation in this chapter uses the fork function to create a new slave process each time a connection arrives. The master process never interacts with any clients; it merely accepts connections and creates a slave to handle each of them. Each slave process begins execution in the main program immediately following the call to fork. The master process closes its copy of the descriptor for the new connection, and the slave closes its copy of the master descriptor. A connection to a client terminates after the slave exits because the operating system closes the slave's copy of the socket.
FOR FURTHER STUDY
Postel [RFC 862] defines the ECHO protocol used in the example TCP server.
EXERCISES
11.1 Instrument the server so it keeps a log of the time at which it creates each slave process and the time at which the slave terminates. How many clients must you start before you can find any overlap between the slave processes?
11.2 How many clients can access the example concurrent server simultaneously before any client must be denied service? How many can access the iterative server in Chapter 10 before any is denied service?
11.3 Build an iterative implementation of an ECHO server. Conduct an experiment to determine if a human can sense the difference in response time between the concurrent and iterative versions.
11.4 Modify the example server so procedure TCPechod explicitly closes the connection before it returns. Explain why an explicit call to close might make the code easier to maintain.
Because a single-process implementation requires less switching between process contexts, it may be able to handle a slightly higher load than an implementation that uses multiple processes. The key to programming a single-process concurrent server lies in the use of asynchronous I/O through the operating system primitive select. A server creates a socket for each of the connections it must manage, and then calls select to wait for data to arrive on any of them. In fact, because select can wait for I/O on all possible sockets, it can also wait for new connections at the same time. Algorithm 8.5 lists the detailed steps a single-process server uses.
In essence, a single-process server must perform the duties of both the master and slave processes. It maintains a set of sockets, with one socket in the set bound to the well-known port at which the master would accept connections. The other sockets in the set each correspond to a connection over which a slave would handle requests. The server passes the set of socket descriptors as an argument to select, and waits for activity on any of them. When select returns, it passes back a bit mask that specifies which of the descriptors in the set is ready. The server uses the order in which descriptors become ready to decide how to proceed.
To distinguish between master and slave operations, a single-process server uses the descriptor. If the descriptor that corresponds to the master socket becomes ready, the server performs the same operation the master would perform: it calls accept on the socket to obtain a new connection. If a descriptor that corresponds to a slave socket becomes ready, the server performs the operation a slave would perform: it calls read to obtain a request, and then answers it.
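Because select passes its result back in the same set it was given, a server has to keep its long-lived set of descriptors separately and hand select a fresh copy on each call. The following is a minimal sketch of that pattern, assuming pipes in place of sockets; the names wait_ready and select_demo are hypothetical.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait until at least one descriptor in *active is readable and report
 * the ready ones in *ready.  select overwrites the set it is handed, so
 * the long-lived active set is copied first.  Returns select's result.
 */
int wait_ready(const fd_set *active, int nfds, fd_set *ready)
{
    *ready = *active;               /* structure copy of the bit vector */
    return select(nfds, ready, (fd_set *)0, (fd_set *)0,
                  (struct timeval *)0);
}

/* Experiment with two pipes: returns 0 if only the pipe with data is ready. */
int select_demo(void)
{
    int a[2], b[2], nfds, n, ok;
    fd_set active, ready;

    if (pipe(a) < 0 || pipe(b) < 0)
        return -1;
    FD_ZERO(&active);
    FD_SET(a[0], &active);
    FD_SET(b[0], &active);
    nfds = (a[0] > b[0] ? a[0] : b[0]) + 1;

    if (write(b[1], "x", 1) != 1)   /* only pipe b has data to read */
        return -1;
    n = wait_ready(&active, nfds, &ready);

    ok = n == 1
        && FD_ISSET(b[0], &ready) && !FD_ISSET(a[0], &ready)    /* result */
        && FD_ISSET(a[0], &active) && FD_ISSET(b[0], &active);  /* intact */

    (void) close(a[0]);
    (void) close(a[1]);
    (void) close(b[0]);
    (void) close(b[1]);
    return ok ? 0 : -1;
}
```

In a server, the descriptor tested first after wait_ready returns would be the master socket; every other ready descriptor would be treated as a slave connection.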
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <sys/time.h>
#include <netinet/in.h>

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define QLEN       5    /* maximum connection queue length */
#define BUFSIZE 4096

int errexit(const char *format, ...);
int passiveTCP(const char *service, int qlen);
int echo(int fd);

/*------------------------------------------------------------------------
 * main - Concurrent TCP server for ECHO service
 *------------------------------------------------------------------------
 */
int
main(int argc, char *argv[])
{
    char *service = "echo";     /* service name or port number  */
    struct sockaddr_in fsin;    /* the from address of a client */
    int msock;                  /* master server socket         */
    fd_set rfds;                /* read file descriptor set     */
    fd_set afds;                /* active file descriptor set   */
    socklen_t alen;             /* from-address length          */
    int fd, nfds;

    switch (argc) {
    case 1:
        break;
    case 2:
        service = argv[1];
        break;
    default:
        errexit("usage: TCPmechod [port]\n");
    }

    msock = passiveTCP(service, QLEN);

    nfds = getdtablesize();
    FD_ZERO(&afds);
    FD_SET(msock, &afds);

    while (1) {
        /* select overwrites its argument, so start from a fresh copy */
        memcpy(&rfds, &afds, sizeof(rfds));

        if (select(nfds, &rfds, (fd_set *)0, (fd_set *)0,
                (struct timeval *)0) < 0)
            errexit("select: %s\n", strerror(errno));
        if (FD_ISSET(msock, &rfds)) {
            int ssock;

            alen = sizeof(fsin);
            ssock = accept(msock, (struct sockaddr *)&fsin, &alen);
            if (ssock < 0)
                errexit("accept: %s\n", strerror(errno));
            FD_SET(ssock, &afds);
        }
        for (fd=0; fd<nfds; ++fd)
            if (fd != msock && FD_ISSET(fd, &rfds))
                if (echo(fd) == 0) {
                    (void) close(fd);
                    FD_CLR(fd, &afds);
                }
    }
}

/*------------------------------------------------------------------------
 * echo - echo one buffer of data, returning byte count
 *------------------------------------------------------------------------
 */
int
echo(int fd)
{
    char buf[BUFSIZ];
    int cc;

    cc = read(fd, buf, sizeof buf);
    if (cc < 0)
        errexit("echo read: %s\n", strerror(errno));
    if (cc && write(fd, buf, cc) < 0)
        errexit("echo write: %s\n", strerror(errno));
    return cc;
}
The single-process server begins, like the master server in a multiple-process implementation, by opening a passive socket at the well-known port. It uses FD_ZERO and FD_SET to create a bit vector that corresponds to the socket descriptors that it wishes to test. The server then enters an infinite loop in which it calls select to wait for one or more of the descriptors to become ready. If the master descriptor becomes ready, the server calls accept to obtain a new connection. It adds the descriptor for the new connection to the set it manages, and continues to wait for more activity. If a slave descriptor becomes ready, the server calls procedure echo, which calls read to obtain data from the connection and write to send it back to the client. If one of the slave descriptors reports an end-of-file condition, the server closes the descriptor and uses macro FD_CLR to remove it from the set of descriptors select uses.
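The bookkeeping for the set of descriptors can also be sketched on its own. The example below is a variation, not the method used in the listing: instead of scanning every possible descriptor, it tracks the highest descriptor currently in the set, and it refuses descriptors that an fd_set cannot represent (those at or above FD_SETSIZE). The structure conn_set and the functions conn_init, conn_add, conn_drop, and conn_set_demo are hypothetical names.

```c
#include <sys/select.h>
#include <unistd.h>

struct conn_set {
    fd_set afds;    /* active descriptors, as in the server's afds */
    int    maxfd;   /* highest descriptor in afds, or -1 if empty  */
};

void conn_init(struct conn_set *cs)
{
    FD_ZERO(&cs->afds);
    cs->maxfd = -1;
}

/* Add fd to the set; returns -1 if an fd_set cannot represent it. */
int conn_add(struct conn_set *cs, int fd)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    FD_SET(fd, &cs->afds);
    if (fd > cs->maxfd)
        cs->maxfd = fd;
    return 0;
}

/* Close fd, remove it from the set, and lower maxfd if necessary. */
void conn_drop(struct conn_set *cs, int fd)
{
    if (fd < 0 || fd >= FD_SETSIZE || !FD_ISSET(fd, &cs->afds))
        return;
    (void) close(fd);
    FD_CLR(fd, &cs->afds);
    while (cs->maxfd >= 0 && !FD_ISSET(cs->maxfd, &cs->afds))
        cs->maxfd--;
}

/* Experiment with the two ends of a pipe: returns 0 on success. */
int conn_set_demo(void)
{
    struct conn_set cs;
    int p[2], hi, lo;

    if (pipe(p) < 0)
        return -1;
    hi = p[0] > p[1] ? p[0] : p[1];
    lo = p[0] > p[1] ? p[1] : p[0];

    conn_init(&cs);
    if (conn_add(&cs, p[0]) < 0 || conn_add(&cs, p[1]) < 0)
        return -1;
    if (conn_add(&cs, FD_SETSIZE) != -1)    /* out of range: refused */
        return -1;
    if (cs.maxfd != hi)
        return -1;
    conn_drop(&cs, hi);
    if (cs.maxfd != lo)
        return -1;
    conn_drop(&cs, lo);
    return cs.maxfd == -1 ? 0 : -1;
}
```

With this bookkeeping, a server would pass cs.maxfd + 1 as the first argument to select and would limit its scan of ready descriptors to the range 0 through cs.maxfd.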
12.6 Summary
Execution in concurrent servers is often driven by the arrival of data and not by the timeslicing mechanism in the underlying operating system. In cases where the service requires little processing, a single-process implementation can use asynchronous I/O to manage connections to multiple clients as effectively as an implementation that uses multiple processes. The single-process implementation performs the duties of the master and slave processes. When the master socket becomes ready, the server accepts a new connection. When any other socket becomes ready, the server reads a request and sends a reply. An example single-process server for the ECHO service illustrates the ideas and shows the programming details.
FOR FURTHER STUDY
A good protocol specification does not constrain the implementation. For example, the single-process server described in this chapter implements the ECHO protocol defined by Postel [RFC 862]. Chapter 11 shows an example of a multiple-process, concurrent server built from the same protocol specification.
EXERCISES
12.1 Conduct an experiment that proves the example ECHO server can handle connections concurrently.
12.2 Does it make sense to use the implementation discussed in this chapter for the DAYTIME service? Why or why not?
12.3 Read the UNIX Programmer's Manual to find out the exact representation of descriptors in the list passed to select. Write the FD_SET and FD_CLR macros.
12.4 Compare the performance of single-process and multiple-process server implementations on a computer with multiple processors. Under what circumstances will a single-process version perform better than (or equal to) a multiple-process version?
12.5 Suppose a large number of clients (e.g., 100) access the example server in this chapter at the same time. Explain what each client might observe.
12.6 Can a single-process server ever deprive one client of service while it repeatedly honors requests from another? Can a multiple-process implementation ever exhibit the same behavior? Explain.