Performance Issues in WWW Servers
(Extended Abstract)
Erich Nahum, Tsipora Barzilai, and Dilip Kandlur
IBM T.J. Watson Research Center
Hawthorne, NY 10532
{nahum,tsipora,kandlur}@watson.ibm.com
[Table 1 data not recoverable; column headers: File Size, Flash, Flash, Diff]
Table 1: HTTP Throughput in ops/sec (WebStone)

Configuration            SURGE (ops/sec)   Diff (%)
Flash-Poll                    437.72
+ send_file()                 418.05          -05
+ Mbuf Caching                519.83          +20
+ Checksum Offload            555.14          +06
+ FIN Piggyback               560.66          +01
+ Delayed Ack of FIN          571.60          +02
+ Delayed Ack of SYN          581.56          +02
Total Improvement:                            +25
Table 2: HTTP Throughput in ops/sec (SURGE)
2 Overview of Results

Space constraints prevent us from describing our results fully, thus we can only provide an overview of our findings. Table 1 shows how our optimizations improve server throughput across requests for different file sizes as measured by WebStone. Table 2 shows the aggregate increase in server throughput as our optimizations are incrementally added as measured by SURGE. Interested readers should consult the IBM research report [6] for more details. We summarize our findings as follows:

• new socket functions. We evaluate the proposed socket functions acceptex() and send_file(). We find little or no increase in performance using the acceptex() function, on either process-based or thread-based WWW servers. In addition, kernel profiling shows that servers spend a relatively small amount of time in the accept(), getsockname(), and read() system calls. A send_file() implementation that incurs a single copy provides no advantage over a combination of mmap() and writev().

• per-byte optimizations. Per-byte optimizations that we examine include eliminating a data copy on the fast path by caching mbufs within the kernel and offloading the TCP checksum to the adaptor. A send_file() implementation tied to an integrated I/O system which does not copy data provides substantially better performance. In our testbed, we observe an increase in throughput of up to 53 percent. We find that offloading the checksum to the network device can improve WWW server performance by up to 7 percent. Our mbuf cache mechanism can also be enhanced to allow caching of the checksum values in the mbufs, for network interfaces that do not support the checksum offload.

• per-connection optimizations. Our per-connection optimizations reduce overhead by eliminating redundant packets in the TCP connection setup and teardown. We show how the close option to send_file() provides the semantic support to enable piggybacking the FIN on the last data segment, eliminating one packet in small transfers and improving throughput by 6 percent in those cases. We also show how delaying acknowledgments for the FIN and SYN-ACK packets can eliminate 2 more packets, increasing performance an additional 14 percent for small transfers. In total, we reduce the packets in a small HTTP exchange from 9 to 6, reducing network utilization and raising server throughput by up to 20 percent in those scenarios, all without violating the TCP protocol specification. Our changes are more easily deployed incrementally since they do not require both hosts in a conversation to adopt them, unlike T/TCP or SACK.

• aggregate benefits. Using SURGE as a macrobenchmark, we show that the combination of techniques improves aggregate server performance by 25 percent.

While we have evaluated these optimizations in the context of a WWW server, they have utility for other programs as well. Reducing packet exchanges should help other TCP-based applications, and send_file() is a general function that can be used by other network servers, such as NFS, FTP, or SMB. As a consequence of our findings, IBM's AIX division has released these features in AIX 4.3.2.

References

[1] Jussara M. Almeida, Virgilio Almeida, and David J. Yates. Measuring the behavior of a world-wide web server. In Seventh IFIP Conference on High Performance Networking (HPN), White Plains, NY, April 1997.
[2] Martin F. Arlitt and Carey L. Williamson. Internet web servers: Workload characterization and performance implications. IEEE/ACM Transactions on Networking, 5(5):631-646, Oct 1997.
[3] Paul Barford and Mark Crovella. Generating representative web workloads for network and server performance evaluation. In Proceedings of the ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, Madison, WI, June 1998.
[4] James C. Hu, Irfan Pyarali, and Douglas C. Schmidt. Measuring the impact of event dispatching and concurrency models on web server performance over high-speed networks. In Proceedings of the 2nd Global Internet Conference (held as part of GLOBECOM '97), Phoenix, AZ, Nov 1997.
[5] Yiming Hu, Ashwini Nanda, and Qing Yang. Measurement, analysis, and performance improvement of the Apache web server. Technical Report 1097-0001, University of Rhode Island Department of Electrical and Computer Engineering, Oct 1997.
[6] Erich M. Nahum, Tsipora Barzilai, and Dilip Kandlur. Performance issues in WWW servers. IBM Research Report, May 1999.
[7] Vivek S. Pai, Peter Druschel, and Willy Zwaenepoel. I/O Lite: A copy-free UNIX I/O system. In 3rd USENIX Symposium on Operating Systems Design and Implementation, New Orleans, LA, February 1999.
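
As an illustration of the mmap() and writev() combination that Section 2 uses as the baseline for comparison with send_file(), the following is a minimal Python sketch on a POSIX system. The function names send_with_mmap_writev and _writev_all, and the choice of Python, are assumptions made for illustration; this is not the code evaluated in this work. The file is mapped read-only and the HTTP header and file body are handed to the kernel in one gather-write call.

```python
import mmap
import os


def _writev_all(fd, buffers):
    """Call os.writev() until every buffer has been written; return bytes sent."""
    views = [memoryview(b) for b in buffers]
    total = 0
    try:
        while views:
            written = os.writev(fd, views)
            total += written
            # Drop buffers that were written completely.
            while views and written >= len(views[0]):
                written -= len(views[0])
                views.pop(0).release()
            # Trim a partially written buffer so the next call resumes there.
            if views and written:
                head = views[0]
                views[0] = head[written:]
                head.release()
    finally:
        # Release any remaining views so the mapping can be closed.
        for view in views:
            view.release()
    return total


def send_with_mmap_writev(out_fd, header, path):
    """Send `header` followed by the contents of `path` to `out_fd`.

    Hypothetical helper: maps the file with mmap() and sends header and body
    together with writev(), avoiding a read() into a user-space buffer.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            # An empty file cannot be mapped; send only the header.
            return _writev_all(out_fd, [header])
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as body:
            return _writev_all(out_fd, [header, body])
```

A server would call send_with_mmap_writev(conn.fileno(), header_bytes, file_path) once per response. The sketch still leaves the copy from the mapped pages into the socket buffers to the kernel, which is consistent with the observation in Section 2 that a single-copy send_file() offers no advantage over this combination.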
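
The packet arithmetic behind the per-connection optimizations in Section 2 (9 packets reduced to 6) can be sketched as follows. The baseline trace below is an assumption: Section 2 gives only the totals and the number of packets each optimization removes, so the specific packets, their senders, and all names in the code are hypothetical and chosen only to be consistent with those totals.

```python
# Assumed baseline trace for a small HTTP transfer (illustrative only).
BASELINE = [
    ("client", "SYN"),
    ("server", "SYN-ACK"),
    ("client", "ACK of SYN-ACK"),
    ("client", "HTTP request"),
    ("server", "HTTP response data"),
    ("server", "FIN"),
    ("client", "ACK of FIN"),
    ("client", "FIN"),
    ("server", "ACK of FIN"),
]

# Each optimization folds one standalone packet into a neighboring packet.
# Which host sends the delayed ACK of the FIN is an assumption of this sketch.
OPTIMIZATIONS = {
    "fin_piggyback": ("server", "FIN"),                      # rides on last data segment
    "delayed_ack_of_syn_ack": ("client", "ACK of SYN-ACK"),  # rides on the request
    "delayed_ack_of_fin": ("client", "ACK of FIN"),          # rides on the peer's own FIN
}


def packets_on_wire(enabled):
    """Return the packets still sent separately with the given optimizations."""
    removed = {OPTIMIZATIONS[name] for name in enabled}
    return [packet for packet in BASELINE if packet not in removed]


print(len(packets_on_wire([])))                 # 9
print(len(packets_on_wire(["fin_piggyback"])))  # 8
print(len(packets_on_wire(OPTIMIZATIONS)))      # 6
```

The FIN piggyback removes one packet and the two delayed acknowledgments remove two more, matching the reduction from 9 to 6 packets stated in Section 2.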