Jean Tunis - Analyzing HTTP - A Practical Guide To Profiling and Troubleshooting Web Performance-Byrnes Publishing (2015)
Jean Tunis
Copyright © 2015 by Jean Tunis
ISBN: 978-1-6873-4191-4
Table of Contents
About This Book
Types of Requests
Types of Responses
Header Fields
Message Body
HTTP/1.1 Limitations
Workarounds
Next Version
I wrote this book to help save you some time before, during
and after you have tested your web application. Who doesn’t
want to save time?
Next, I discuss some more details about the protocol itself and
how it works.
One solution was to allow the client to specify to the server that
it wanted the connection to remain open. Another solution to
aid with this was to allow the client to send a request without
waiting for its response before sending the next request. This
allowed the client to pipeline multiple HTTP requests.
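To make pipelining concrete, here is a minimal sketch of what a
client would write to a single open connection. The host name and
paths are made up for illustration, and the helper names are my own.

```python
def build_get(path: str, host: str) -> bytes:
    """Build one HTTP/1.1 GET request that asks to keep the connection open."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: keep-alive\r\n"
        "\r\n"
    ).encode("ascii")


def pipeline(paths, host: str) -> bytes:
    """Concatenate several requests so they can be sent back to back,
    without waiting for each response (HTTP pipelining)."""
    return b"".join(build_get(path, host) for path in paths)


# Hypothetical host and resources, for illustration only.
wire_bytes = pipeline(["/index.html", "/style.css", "/app.js"], "www.example.com")
print(wire_bytes.decode("ascii"))
```

All three requests leave the client before the first response comes
back. The server still has to answer them in the order it received
them.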
Types of Requests
There are a number of types of HTTP requests, but the two
most used are HTTP GETs and HTTP POSTs. Less frequently
used request types include HEAD, PUT and DELETE.
Types of Responses
There are a number of HTTP response types from a web server
to a client. However, some of the most commonly known and
seen are 100, 200, 301, 404 and 500. Here is a quick
discussion on these response types.
HTTP 100
The HTTP 100 response from a server tells a client that it can
continue and send the rest of the request it wanted to send.
This often happens when a client does not want to send a large
request body that the server might reject. Instead of sending
the whole request at once, the client sends only the request
headers and waits for the server to tell it that it can
continue with the body. A client that makes this kind of
request specifies the following in its HTTP header: “Expect:
100-continue”.
HTTP 200
The 200 response is the most common and most expected
response. It signifies that the server has received the client's
request and that it can fulfill it. In your packet analysis tool,
you should see “200 OK”.
HTTP 301
The 301 response is used by the server to redirect client
requests to another URL, which may be on another server. This
code tells the client browser that the redirect is permanent,
and the resource it is looking for has permanently moved. A
similar 302 code specifies that the move is temporary. In
either case, redirects tend to be frowned upon because each one
adds a round trip and can adversely affect web performance.
HTTP 404
This is commonly seen when a particular requested resource
(such as a file) on a server is not found. This can happen when
changes were made to the application, but the code was not
updated to reflect those changes, and the browser continues to
request files no longer being used by the application. This is
the dreaded 404 error. Some companies today plan for this
error page and provide custom pages so that the user still finds
something useful, instead of nothing at all.
HTTP 500
This generic code is sent by the server when an internal server
error has been detected, but no additional information is
known about the reason for the error. This error usually means
that something has gone terribly wrong with your web or
application server.
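If you are scripting your analysis, the codes above can be looked
up with Python's standard http module. This is only a small sketch.
The describe helper and the class labels are my own, not something
your packet analyzer provides.

```python
from http import HTTPStatus

# The first digit of a status code identifies its class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase and its class."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({STATUS_CLASSES[code // 100]})"


for code in (100, 200, 301, 302, 404, 500):
    print(describe(code))
```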
Header Fields
The HTTP header contains information used by both client
and server to determine how to handle requests and responses.
There are many fields in the header, specifying what both
client and server should do or are doing with data that is
requested or received.
Message Body
The HTTP message body is the actual data being sent or
received. This part of an HTTP message follows the header
information above. The body is optional since there doesn’t
have to be any data sent.
The HTTP header contains all the information about the body
of the message, such as what type of message it is, so that the
receiver (either client or server) can understand how to, or
even when it can, render or receive the message.
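The relationship between header and body is easy to see in a few
lines of Python. The sketch below splits a raw message at the blank
line that separates the two. The sample response is made up, and the
parsing is deliberately simplified (for example, it does not handle
repeated header fields).

```python
def split_message(raw: bytes):
    """Split a raw HTTP message into start line, header fields and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Made-up response, for illustration only.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello!</p>"
)

start_line, headers, body = split_message(sample)
print(start_line)
print(headers["content-type"], headers["content-length"], len(body))
```

Here the Content-Type field tells the receiver what the body is, and
Content-Length tells it how many bytes to expect.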
Over the years, the Web has become more integrated into our
daily lives. We use the Internet for everything these days, from
buying goods and services to providing them ourselves.
This isn’t to say that the network is never at fault any more.
Things happen.
All these things still happen. However, when they do, in my
experience, it is not because the network is not robust
enough.
We have queuing delay. This delay is the time that the data
sent is waiting in a device’s queue along the path between the
client and the server before it can be transmitted.
All of the above delays, except one, are functions of things that
we can reduce with better technology.
If you are on one of those teams, below are some steps you can
follow to help you with profiling and/or troubleshooting a
website or web application.
Tools
The process of analyzing a website or web application involves,
first and foremost, having the appropriate knowledge about
how the HTTP protocol and the TCP/IP stack function. While I
provided an overview of the HTTP protocol, this book is not
intended to delve into the details of how TCP/IP works.
It is assumed that you already have this basic knowledge.
Pre-Capture Preparation
When preparing to analyze a website or web application, you
cannot just dive right in and start capturing data. That is a
recipe for disaster and potentially a waste of your time and
everybody else’s.
You must have a plan of attack. As they say, if you fail to
plan, you plan to fail.
Here are some questions to ask and get answers to before you
go off capturing data. These are questions you ask the
application team in many cases, but may also include the
involvement of existing users.
Troubleshooting Questions
The general questions above should be asked for any analysis
process. However, when you’re being asked to troubleshoot an
existing web application issue, there are some additional
questions you will need some answers to that will help you
Tiers Involved
In order to properly analyze an HTTP application, or any
application for that matter, you have to be able to capture the
appropriate data. To do this, you must identify the correct
tiers. As mentioned in the above Questions section, you should
have gotten either a diagram or some information about the
application architecture that would have helped you identify all
the tiers involved.
Once you have the information you need from all your
questions, you can submit any requests that are needed to get
the agents installed on the appropriate tiers. Don’t forget about
Change Management! An organization's Change Management
process can derail your analysis efforts if you’re not aware of
it.
Agent Installation
When installing your agent, there are some things to be
cognizant of to help make installation and capture go as
smoothly as possible. Each organization has its own sets of
rules, processes and procedures for getting software installed
on client and server machines. Be sure to follow whatever
process is required in a timely fashion to get the agents on the
machines well in advance of the testing.
Installation Tips
Below are some tips to help with the installation and testing of
your web application:
After identifying the reason for the analysis, you want to better
understand the application architecture. You want to confirm
that what you were told when you were asking the above
questions is what is actually happening. Any erroneous
information is often not the fault of the users or application
team. Sometimes, this information is either not known or
simply out of date. Organizations that have an application
testing process will more than likely (although not always)
have provided accurate information.
Second, you also want to confirm the protocols being used, the
amount of data exchanged, any retransmissions, and more.
You want to verify some typical metrics:
It is not very likely that you will be able to obtain the private
key to decrypt TLS, especially in a production environment. If
you are unable to obtain the private key, you will need to focus
on profiling or troubleshooting the application based on TCP
metrics only, such as retransmissions and window sizes.
There are some other ways to get around this, like getting
server logs and correlating data in those logs to when requests
are being made. So it’s clearly possible...just more time-
consuming and sometimes more interesting!
2. HTTP Version
HTTP/2 is the recently ratified version of the HTTP protocol.
However, few websites have implemented it. Most
organizations are still running HTTP/1.1, which was officially
defined in RFC 2068 in 1997. The original standard version,
HTTP/1.0, was defined in RFC 1945 in 1996, even though HTTP had
been in use since 1990 in some form or another. It quickly
became evident that HTTP/1.0 had shortcomings, and work on
HTTP/1.1, along with early support for it, was already
available in 1996.
3. User-Agent
There are many browsers available for accessing websites.
The browser makes requests on behalf of the user, and is
known as the HTTP user agent. At many organizations,
especially those with a longer history, the current browser
standard is Internet Explorer (IE).
The user agent is the same as the browser or HTTP client being
used to access web services. Sometimes, if you see an issue and
want to figure out what browser a client may be using, the way
to do so in the captured data is to look at the HTTP header in
the protocol decodes in your analyzer. You may even find out
that the client isn’t using a browser at all, but an HTTP client
that is using a user agent not commonly seen at your
organization.
You will find a header field called “User-Agent” when you look
at the request header. Somewhere in this field will be
information that clues you in on what browser and version is
being used.
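If you export request headers from your analyzer as text, a short
script can do this lookup for you. The sketch below is a rough
heuristic of my own, not a complete browser detector. The token list
is an assumption and the sample request is made up.

```python
def user_agent_of(request_head: str):
    """Return the User-Agent value from a raw request header, or None."""
    for line in request_head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "user-agent":
            return value.strip()
    return None


# Order matters: Chrome also advertises "Safari/", so check Chrome first.
KNOWN_TOKENS = [
    ("MSIE", "Internet Explorer"),
    ("Trident/", "Internet Explorer"),
    ("Firefox/", "Firefox"),
    ("Chrome/", "Chrome"),
    ("Safari/", "Safari"),
    ("curl/", "curl"),
]


def guess_client(user_agent):
    """Guess the client family from well-known tokens in the string."""
    if user_agent is None:
        return "unknown (no User-Agent header)"
    for token, name in KNOWN_TOKENS:
        if token in user_agent:
            return name
    return "other HTTP client"


# Made-up captured request, for illustration only.
captured = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko\r\n"
    "\r\n"
)
print(guess_client(user_agent_of(captured)))
```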
4. Gzip Compression
Something else to check for when analyzing an HTTP
application is whether compression is enabled. When this is
enabled, the web server compresses files before sending them
to the client. However, in order for this to occur, both sides
have to be able to support compression. It is quite common
these days to find that the client supports compression, but the
server either does not support it or it is not enabled.
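In the captured headers, the client advertises support in the
Accept-Encoding request field, and the server confirms that it
compressed the body in the Content-Encoding response field. Here is
a minimal sketch of that check. It assumes you have already parsed
the headers into dictionaries with lower-case field names, and the
sample body is made up.

```python
import gzip


def compression_status(request_headers: dict, response_headers: dict) -> str:
    """Compare what the client offered with what the server actually did."""
    offered = "gzip" in request_headers.get("accept-encoding", "").lower()
    used = "gzip" in response_headers.get("content-encoding", "").lower()
    if offered and used:
        return "compressed"
    if offered:
        return "client supports gzip, server did not use it"
    return "client did not offer gzip"


# Rough idea of what is being left on the table for a repetitive text body.
body = b"<div class='row'>item</div>\n" * 200
compressed = gzip.compress(body)
print(len(body), "bytes uncompressed,", len(compressed), "bytes with gzip")

print(compression_status(
    {"accept-encoding": "gzip, deflate"},
    {"content-type": "text/html"},
))
```

The second status, where the client offers gzip and the server does
not use it, is the one to look out for.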
5. Caching
Caching is widely known to benefit application
performance, particularly with HTTP applications. By default,
most browsers support client-side caching. However,
depending on the application, not everything is cacheable and
some servers do not allow it.
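One place to see whether a server allows caching is the
Cache-Control response header. The sketch below is a simplified
reading of that header only. It assumes headers parsed into a
dictionary with lower-case names, and it ignores other fields, such
as Expires, that also influence caching.

```python
def cache_lifetime(response_headers: dict):
    """Return how long (in seconds) a response may be reused without
    contacting the server again: 0 if the server forbids that, or None
    if Cache-Control says nothing about it."""
    raw = response_headers.get("cache-control", "")
    directives = [d.strip().lower() for d in raw.split(",") if d.strip()]
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        if directive.startswith("max-age="):
            try:
                return int(directive.split("=", 1)[1])
            except ValueError:
                return None  # malformed value, treat as unspecified
    return None


print(cache_lifetime({"cache-control": "public, max-age=3600"}))
print(cache_lifetime({"cache-control": "no-store"}))
print(cache_lifetime({}))
```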
6. Connection Keep-Alives
HTTP Keep-alive (or HTTP persistent connection) is a
capability that has been part of HTTP since HTTP/1.1. With
persistent connections in HTTP/1.1, it is this capability that
8. HTTP Chunking
HTTP chunking is a capability in HTTP that allows a server to
break up the data it needs to send to a client into chunks. As
mentioned above, this was added in HTTP/1.1 and lets client
browsers start processing received data sooner, so that it can
be displayed to the user faster. By default, many web and
application servers break up data into chunks of 8,192 bytes.
What this means is that if a user requests a 64KB file, the
server will send it in about eight 8KB chunks.
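On the wire, each chunk is preceded by its size in hexadecimal, and
a zero-length chunk marks the end of the body. The following sketch
shows that framing for the 64KB example. The 8,192-byte default is
taken from the text above, and the function names are my own.

```python
def encode_chunked(body: bytes, chunk_size: int = 8192) -> bytes:
    """Frame a body using HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for offset in range(0, len(body), chunk_size):
        chunk = body[offset:offset + chunk_size]
        out += f"{len(chunk):x}\r\n".encode("ascii")  # size line, in hex
        out += chunk + b"\r\n"
    out += b"0\r\n\r\n"  # zero-length chunk ends the body (no trailers)
    return bytes(out)


def decode_chunked(data: bytes) -> list:
    """Return the list of chunks found in a chunked body."""
    chunks = []
    pos = 0
    while True:
        end_of_size = data.index(b"\r\n", pos)
        # Ignore optional chunk extensions after ';'.
        size = int(data[pos:end_of_size].split(b";")[0], 16)
        if size == 0:
            break
        start = end_of_size + 2
        chunks.append(data[start:start + size])
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return chunks


file_body = bytes(64 * 1024)  # a 64KB body of zero bytes
chunks = decode_chunked(encode_chunked(file_body))
print(len(chunks), "chunks of", len(chunks[0]), "bytes")
```

In your analyzer, the 8,192-byte size shows up as the hex value 2000
on the line before each chunk.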
affects how much data TCP can send over the network,
regardless of how big its congestion window is.
There are also cases when the client makes direct requests to
the database with this same type of inefficiency. One of the
biggest reasons for this is that the SQL queries used request
data one database row at a time. This is typical of database
communication without explicit configuration changes.
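A quick back-of-the-envelope calculation shows why this hurts. The
sketch below counts only network round trips and ignores server
processing and transfer time. The row counts and the 50 ms round-trip
time are made-up numbers for illustration.

```python
import math


def fetch_time_seconds(rows: int, rows_per_round_trip: int, rtt_ms: float) -> float:
    """Estimate the time spent purely on round trips to fetch a result set."""
    round_trips = math.ceil(rows / rows_per_round_trip)
    return round_trips * rtt_ms / 1000.0


# Hypothetical query returning 10,000 rows over a 50 ms round-trip path.
print(fetch_time_seconds(10_000, 1, 50))    # one row per round trip
print(fetch_time_seconds(10_000, 500, 50))  # 500 rows per round trip
```

With one row per round trip, the user waits 500 seconds on round
trips alone. Fetching 500 rows at a time brings that down to one
second.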
There are times when the application team cannot or will not
accept any of the other recommendations to remedy a slow
application. This is most often the case because the application
is vendor-based, and changes are either not likely to occur or
may not occur for a long time. If that is the case, testing the
viability of remote desktop solutions should be considered.
For scaling to work, like other options in TCP and HTTP, both
sides must support it. To determine whether TCP scaling is
enabled, you can check the TCP options in the protocol
decodes in your packet analyzer. When looking at the decodes
for the TCP protocol, you should see a “Window scale:” field
with the value of the scaling factor.
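To turn that value into something useful, here is a small sketch. It
assumes the value carried in the TCP option is a shift count, which
is how the option is defined. Your analyzer may show the resulting
multiplier next to it. The option itself is only exchanged in the SYN
packets, so you need the start of the connection in your capture. The
window size and round-trip time below are made-up numbers.

```python
def effective_window(advertised_window: int, shift_count: int) -> int:
    """Scale the 16-bit window field by the negotiated shift count."""
    if not 0 <= shift_count <= 14:
        raise ValueError("window scale shift count must be between 0 and 14")
    return advertised_window << shift_count


def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1_000_000


# Hypothetical connection: window field of 65,535 bytes, 100 ms round trip.
for shift in (0, 7):
    window = effective_window(65_535, shift)
    print(shift, window, round(max_throughput_mbps(window, 100), 1))
```

Without scaling, a 65,535-byte window caps this connection at a
little over 5 Mbps, no matter how much bandwidth is available.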
HTTP/1.1 Limitations
Although HTTP/1.1 has a number of limitations, this has not
stopped the protocol from being probably the most popular
application layer protocol, and arguably the most widely used
protocol after TCP. However, as websites and web applications,
and now mobile websites and applications, make more use of the
protocol, its limitations are becoming more of a pain. In this
section, I want to discuss some of these limitations.
This means that when the server did receive a client request, it
still had to reply in the order the requests were received.
Unfortunately, if one of those multiple requests got lost en
route to the server, it could not respond to the other requests
until the lost request was re-sent. This is due to the nature of
how TCP works, and it causes what’s known as head-of-line
blocking.
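The effect of in-order replies can be modeled in a few lines. In the
sketch below, each number is the time at which the server could have
answered a pipelined request on its own. The values are made up.
Because replies must go out in request order, each one is held back
until everything ahead of it has been sent.

```python
from itertools import accumulate


def delivery_times(ready_times):
    """Given when each pipelined response is ready (in request order),
    return when each one can actually be delivered."""
    # A response cannot leave before every earlier response has left,
    # so its delivery time is the running maximum of the ready times.
    return list(accumulate(ready_times, max))


# Hypothetical case: the first request is delayed (for example, it had to
# be re-sent), while the next two could have been answered almost at once.
print(delivery_times([0.9, 0.1, 0.1]))
```

The two fast responses are ready after 0.1 seconds, but neither one
reaches the client before 0.9 seconds.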
Short-Lived Requests
As mentioned previously, HTTP requests can be for many
images or files. Because of this, many requests don’t last very
long. As a result, short-lived HTTP transfers always end up
being impacted by the TCP slow-start algorithm.
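A rough model makes the cost visible. The sketch below counts the
round trips needed to deliver a response on a brand-new connection.
It is idealized: it assumes no packet loss, a window that doubles
every round trip, a 1,460-byte segment size and an initial window of
10 segments. Those defaults are my assumptions, and real stacks and
paths will differ.

```python
import math


def round_trips_to_send(size_bytes: int, mss: int = 1460,
                        initial_cwnd_segments: int = 10) -> int:
    """Count round trips needed to send size_bytes under idealized slow start."""
    segments_needed = math.ceil(size_bytes / mss)
    cwnd = initial_cwnd_segments
    sent = 0
    trips = 0
    while sent < segments_needed:
        sent += cwnd  # send one full window this round trip
        cwnd *= 2     # slow start doubles the window each round trip
        trips += 1
    return trips


for size_kb in (10, 100, 1024):
    print(size_kb, "KB:", round_trips_to_send(size_kb * 1024), "round trips")
```

A 100KB response needs four round trips under these assumptions,
even on a path with plenty of bandwidth. On a long-lived connection
the window would already be open.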
Workarounds
As the popularity of the HTTP protocol grew, workarounds
had to be used to get around the HTTP/1.1 limitations.
The following are some workarounds that have come up over
the years.
Multiple Connections
As mentioned above, today’s browsers allow six concurrent
TCP connections per host. It wasn’t always this way. Previous
versions of Internet Explorer, for example, only allowed two
concurrent connections. As these limitations began to impact
web performance, browsers went up to six. This was done so that
multiple HTTP requests could be sent and to help avoid the
impact of TCP slow-start.
Domain Sharding
As mentioned above, there can be too many connections, and
that can adversely impact performance. The number of
concurrent connections in today’s browsers is what it is in
part because you do not want too many connections. However,
this limits a website that has numerous small images or files
that need to be sent to the browser.
Resource Inlining
As I mentioned previously, there can be too many connections.
Too many connections expose the user to any delay that
may be on the network. The best way to reduce this delay is to
avoid it. Resource inlining involves putting your scripting code
into the HTML so that the browser can execute it without
making a request to the server for the script file.
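As an illustration, here is a minimal sketch of inlining as a build
step. It uses a simple regular expression rather than a real HTML
parser, so it only recognizes the plain script tag form shown. The
file name and script content are made up.

```python
import re

# Matches only the simple form <script src="..."></script>.
SCRIPT_TAG = re.compile(r'<script\s+src="([^"]+)"\s*>\s*</script>')


def inline_scripts(html: str, sources: dict) -> str:
    """Replace external script references with the script code itself."""
    def replace(match):
        code = sources.get(match.group(1))
        if code is None:
            return match.group(0)  # unknown file: leave the tag untouched
        return f"<script>{code}</script>"
    return SCRIPT_TAG.sub(replace, html)


# Hypothetical page and script, for illustration only.
page = '<html><head><script src="app.js"></script></head></html>'
print(inline_scripts(page, {"app.js": "console.log('hi');"}))
```

The browser now gets the script with the page itself, at the cost of
a larger HTML file that cannot share a cached copy of the script
with other pages.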
Next Version
To help solve, not just work around, many of the HTTP/1.1
limitations, in 2009 Google released the first draft of what it
called the SPDY (pronounced “speedy”) protocol. The main goal
was to reduce the time it takes for web pages to load.
→ https://developers.google.com/speed/docs/insights/about:
Google page that discusses how to optimize HTTP.
→ http://developer.yahoo.com/performance/rules.html:
Yahoo page that discusses how to optimize HTTP.
Thank you for reading this book. If you found it valuable to your
work as an analyst, engineer or developer, let me know. I’d like to
learn what helped. Email me at [email protected].
And if you have a friend whom you believe this book can help in
their work, please share it with them.
Happy analyzing!
Jean Tunis