Web Infrastructure Design
Network basics
What is a protocol
What is an IP address
What is TCP/IP
What is an Internet Protocol (IP) port?
Server
Servers are located in data centers, buildings that host hundreds or even thousands of
computers (servers). You can think of a server as a computer without a keyboard, mouse, or screen
that is accessible only over a network. A server can be physical or virtual. A server runs an OS
(operating system).
The word service (noun) may refer to the abstract form of functionality, e.g. a Web service.
Alternatively, it may refer to a computer program that turns a computer into a server. Thus, any
general-purpose computer connected to a network can host servers. For example, if files on a
device are shared by some process, that process is a file server. Similarly, web server software
can run on any capable computer, so a laptop or a personal computer can host a web server.
Web server
The term web server can refer to hardware or software, or both working together.
1. On the hardware side, a web server is a computer that stores web server software and a website's
component files (for example, HTML documents, images, CSS stylesheets, and JavaScript files).
A web server connects to the Internet and supports physical data interchange with other devices
connected to the web.
2. On the software side, a web server includes several parts that control how web users access
hosted files. At a minimum, this is an HTTP server. An HTTP server is software that
understands URLs (web addresses) and HTTP (the protocol your browser uses to view
webpages). An HTTP server can be accessed through the domain names of the websites it stores,
and it delivers the content of these hosted websites to the end user's device.
At the most basic level, whenever a browser needs a file that is hosted on a web server, the browser
requests the file via HTTP. When the request reaches the correct (hardware) web server, the
(software) HTTP server accepts the request, finds the requested document, and sends it back to the
browser, also through HTTP. (If the server doesn't find the requested document, it returns a 404 response
instead.)
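To make this flow concrete, here is a minimal sketch of a static web server written with Python's standard library; the directory name public_html and port 8000 are illustrative choices, not anything mandated by HTTP.

# Minimal static web server sketch (Python standard library).
# It serves files from the "public_html" directory over HTTP and
# automatically answers 404 Not Found when a requested file is missing.
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

handler = partial(SimpleHTTPRequestHandler, directory="public_html")
server = HTTPServer(("0.0.0.0", 8000), handler)
print("Static web server listening on http://0.0.0.0:8000/")
server.serve_forever()

Running this and pointing a browser at http://localhost:8000/index.html exercises exactly the request/response exchange described above.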
A dynamic web server consists of a static web server plus extra software, most commonly
an application server and a database. We call it "dynamic" because the application server updates the
hosted files before sending content to your browser via the HTTP server.
For example, to produce the final webpages you see in the browser, the application server might fill an
HTML template with content from a database. Sites like MDN or Wikipedia have thousands of webpages.
Typically, these kinds of sites are composed of only a few HTML templates and a giant database, rather
than thousands of static HTML documents. This setup makes it easier to maintain and deliver the content.
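As a rough sketch of what the application server does, the snippet below fills one HTML template with content looked up from a data store; a Python dictionary stands in for the database, and the template and article data are made up for illustration.

# Sketch of dynamic content generation: one HTML template is filled with
# data fetched from a store before being handed to the HTTP server.
from string import Template

# A dictionary standing in for the site's database (illustrative data).
articles = {
    "getting-started": {"title": "Getting started", "body": "Welcome to the site."},
}

# One template shared by every article page.
page_template = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)

def render_article(slug):
    # Build the final HTML for one article, or return None if it does not exist
    # (the HTTP server would then answer 404).
    record = articles.get(slug)
    if record is None:
        return None
    return page_template.substitute(title=record["title"], body=record["body"])

print(render_article("getting-started"))

This is why only a few templates are needed: the same render function produces every article page from whatever the database holds.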
Hosting files
Dedicated web servers or your personal computer?
Website files should be hosted on a dedicated web server rather than on a personal computer, because
hosting providers offer better availability, reliable internet connectivity, a fixed IP address, and
ongoing maintenance.
Besides hosting files, a web server provides support for HTTP (Hypertext Transfer Protocol). As its name
implies, HTTP specifies how to transfer hypertext (linked web documents) between two
computers.
A protocol is a set of rules for communication between two computers. HTTP is a textual,
stateless protocol: neither the server nor the client remembers previous communications.
For example, relying on HTTP alone, a server can't remember a password you typed or your progress
on an incomplete transaction. You need an application server for tasks like that.
On a web server, the HTTP server is responsible for processing and answering incoming requests.
Upon receiving a request, an HTTP server checks if the requested URL matches an existing file.
If so, the web server sends the file content back to the browser. If not, the server will check if it
should generate a file dynamically for the request (see Static vs. dynamic content).
If neither of these options is possible, the web server returns an error message to the browser,
most commonly 404 Not Found. The 404 error is so common that some web designers devote
considerable time and effort to designing 404 error pages.
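That decision sequence can be sketched as follows; the document root, the dynamic route, and the error page are all hypothetical.

# Sketch of how an HTTP server decides what to answer for a requested path.
from pathlib import Path
from datetime import datetime

DOCUMENT_ROOT = Path("public_html")   # hypothetical document root
DYNAMIC_ROUTES = {"/now"}             # hypothetical dynamically generated page

def handle_request(url_path):
    # Return an (HTTP status, body) pair for the requested URL path.
    candidate = DOCUMENT_ROOT / url_path.lstrip("/")
    if candidate.is_file():
        # 1. The URL matches an existing file: send its content back.
        return 200, candidate.read_bytes()
    if url_path in DYNAMIC_ROUTES:
        # 2. No static file, but the server can generate this page dynamically.
        return 200, datetime.now().isoformat().encode()
    # 3. Neither option is possible: answer 404 Not Found, ideally with a
    #    carefully designed error page.
    return 404, b"<h1>404 Not Found</h1>"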
HTTP
Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transmitting hypermedia
documents, such as HTML. It was designed for communication between web browsers and web servers.
It is a client-server protocol, which means requests are initiated by the recipient, usually the
Web browser.
Clients and servers communicate by exchanging individual messages (as opposed to a stream of data).
The messages sent by the client, usually a Web browser, are called requests and the messages sent by the
server as an answer are called responses. Between the client and the server there are numerous entities,
collectively called proxies, which perform different operations and act as gateways or caches, for
example.
In reality, there are more computers between a browser and the server handling the request: there are
routers, modems, and more. Thanks to the layered design of the Web, these are hidden in the network and
transport layers. HTTP is on top, at the application layer. Although important for diagnosing network
problems, the underlying layers are mostly irrelevant to the description of HTTP.
Virtually, a server appears to be only a single machine, but it may actually be a collection of servers
sharing the load (load balancing), or other software (such as caches, a database server, or e-commerce
servers) totally or partially generating the document on demand. Conversely, a server is not necessarily
a single machine: several server software instances can be hosted on the same machine. With HTTP/1.1
and the Host header, they may even share the same IP address.
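The Host header is what lets several websites share one server and one IP address: the sketch below sends two requests to the same address that differ only in their Host header, so the server can pick the right site. The host names and the address are placeholders.

# Sketch: two name-based virtual hosts behind a single IP address.
# The server chooses which site's content to serve from the Host header.
import http.client

SHARED_IP = "203.0.113.10"   # placeholder address from the documentation range

for site in ("www.example.com", "blog.example.com"):   # placeholder host names
    conn = http.client.HTTPConnection(SHARED_IP, 80)
    # Same IP and port for both requests; only the Host header differs.
    conn.request("GET", "/", headers={"Host": site})
    print(site, conn.getresponse().status)
    conn.close()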
Machines that relay HTTP messages while operating at the application layer are generally called proxies.
These can be transparent, forwarding on the requests they receive without altering them in any way,
or non-transparent, in which case they will change the request in some way before passing it along to
the server. Proxies may perform numerous functions (a sketch of a simple forwarding proxy follows this list):
caching (the cache can be public or private, like the browser cache)
filtering (like an antivirus scan or parental controls)
load balancing (to allow multiple servers to serve different requests)
authentication (to control access to different resources)
logging (allowing the storage of historical information)
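As a sketch of the simplest of these roles, the snippet below is a minimal, non-caching forwarding proxy: it accepts a GET request carrying an absolute URL (the form browsers use when configured to talk to a proxy), fetches the resource from the origin server, and relays the answer. The port and the Via value are arbitrary, and a real proxy would also handle other methods, HTTPS tunneling, caching, and filtering.

# Minimal forwarding HTTP proxy sketch: relays GET requests to the origin
# server and passes the response back, adding a Via header for this hop.
import http.client
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlsplit(self.path)            # e.g. http://example.com/index.html
        conn = http.client.HTTPConnection(url.netloc)
        conn.request("GET", url.path or "/", headers={"Host": url.netloc})
        upstream = conn.getresponse()
        body = upstream.read()

        self.send_response(upstream.status)
        for name, value in upstream.getheaders():
            # Hop-by-hop and length-related headers are recomputed locally.
            if name.lower() not in ("connection", "transfer-encoding",
                                    "content-length", "date", "server"):
                self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Via", "1.1 example-proxy")   # arbitrary proxy name
        self.end_headers()
        self.wfile.write(body)
        conn.close()

HTTPServer(("127.0.0.1", 8888), ProxyHandler).serve_forever()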
HTTP is stateless, but not sessionless
HTTP is stateless: there is no link between two requests being successively carried out on the same
connection. This can be problematic for users attempting to interact with certain pages coherently,
for example, when using e-commerce shopping baskets. But while the core of HTTP itself is stateless,
HTTP cookies allow the use of stateful sessions. Using header extensibility, HTTP cookies are added
to the workflow, allowing session creation on each HTTP request to share the same context, or the
same state.
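A sketch of that mechanism: the server attaches a Set-Cookie header to its first response, the client then sends the value back in a Cookie header on every later request, and the server uses it to tie those requests to one session. The cookie name and value below are made up.

# Sketch of how cookies add sessions on top of stateless HTTP.
from http.cookies import SimpleCookie

# 1. First response from the server: hand out a session identifier.
response_cookie = SimpleCookie()
response_cookie["sessionid"] = "abc123"            # made-up session id
response_cookie["sessionid"]["httponly"] = True
print(response_cookie.output())                    # Set-Cookie: sessionid=abc123; HttpOnly

# 2. Later requests from the client carry the cookie back.
request_headers = {"Cookie": "sessionid=abc123"}

# 3. The server parses the header and recognizes the session.
incoming = SimpleCookie(request_headers["Cookie"])
print(incoming["sessionid"].value)                 # abc123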
A connection is controlled at the transport layer, and therefore fundamentally out of scope for HTTP.
HTTP doesn't require the underlying transport protocol to be connection-based; it only requires it to
be reliable, or not lose messages (at minimum, presenting an error in such cases). Among the two most
common transport protocols on the Internet, TCP is reliable and UDP isn't. HTTP therefore relies on the
TCP standard, which is connection-based.
Before a client and server can exchange an HTTP request/response pair, they must establish a TCP
connection, a process which requires several round-trips. The default behavior of HTTP/1.0 is to open a
separate TCP connection for each HTTP request/response pair. This is less efficient than sharing a single
TCP connection when multiple requests are sent in close succession.
In order to mitigate this flaw, HTTP/1.1 introduced pipelining (which proved difficult to implement)
and persistent connections: the underlying TCP connection can be partially controlled using
the Connection header. HTTP/2 went a step further by multiplexing messages over a single connection,
helping keep the connection warm and more efficient.
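The difference can be sketched with Python's http.client: a single HTTPConnection object keeps one TCP connection open and sends several requests over it (an HTTP/1.1 persistent connection), instead of paying the TCP handshake for every request. The host and paths are illustrative.

# Sketch of an HTTP/1.1 persistent connection: several request/response
# pairs reuse one underlying TCP connection instead of reconnecting each time.
import http.client

conn = http.client.HTTPConnection("example.com", 80)   # one TCP connection

for path in ("/", "/about", "/contact"):                # illustrative paths
    conn.request("GET", path)
    response = conn.getresponse()
    response.read()   # the body must be read before the connection can be reused
    print(path, response.status)

conn.close()          # the connection is closed once, at the end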
Caching: How documents are cached can be controlled by HTTP. The server can instruct proxies
and clients about what to cache and for how long. The client can instruct intermediate cache
proxies to ignore the stored document (a sketch of these headers follows this list).
Relaxing the origin constraint: To prevent snooping and other privacy invasions, Web browsers
enforce strict separation between websites. Only pages from the same origin can access all the
information of a Web page. Though such a constraint is a burden to the server, HTTP headers can
relax this strict separation on the server side, allowing a document to become a patchwork of
information sourced from different domains; there could even be security-related reasons to do
so.
Authentication: Some pages may be protected so that only specific users can access them. Basic
authentication may be provided by HTTP, either using the WWW-Authenticate and similar headers,
or by setting a specific session using HTTP cookies.
Proxy and tunneling: Servers or clients are often located on intranets and hide their true IP
address from other computers. HTTP requests then go through proxies to cross this network
barrier. Not all proxies are HTTP proxies. The SOCKS protocol, for example, operates at a lower
level. Other protocols, like ftp, can be handled by these proxies.
Sessions: Using HTTP cookies allows you to link requests with the state of the server. This
creates sessions, despite basic HTTP being a state-less protocol. This is useful not only for e-
commerce shopping baskets, but also for any site allowing user configuration of the output.
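Taking the caching item above as an example, a server declares its caching policy with a Cache-Control response header (plus a validator such as ETag), and a client can ask intermediate caches to revalidate rather than serve a stored copy. A sketch with made-up values:

# Sketch of HTTP cache control headers (values are illustrative).

# Headers a server might send with a response:
response_headers = {
    "Cache-Control": "public, max-age=3600",   # any cache may store this for one hour
    "ETag": '"v1.2.3"',                        # validator used for later revalidation
}

# Headers a client might send to bypass stored copies:
request_headers = {
    "Cache-Control": "no-cache",               # forces revalidation with the origin server
    "If-None-Match": '"v1.2.3"',               # lets the server answer 304 Not Modified
}

print("Server response headers:")
for name, value in response_headers.items():
    print(f"  {name}: {value}")
print("Client request headers:")
for name, value in request_headers.items():
    print(f"  {name}: {value}")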
HTTP flow
When a client wants to communicate with a server, either the final server or an intermediate
proxy, it performs the following steps:
1. Open a TCP connection: The TCP connection is used to send a request, or several, and receive an
answer. The client may open a new connection, reuse an existing connection, or open several TCP
connections to the servers.
2. Send an HTTP message: HTTP messages (before HTTP/2) are human-readable. With HTTP/2,
these simple messages are encapsulated in frames, making them impossible to read directly, but
the principle remains the same. For example:
GET / HTTP/1.1
Host: developer.mozilla.org
Accept-Language: fr
3. Read the response sent by the server, for example:
HTTP/1.1 200 OK
Date: Sat, 09 Oct 2010 14:28:02 GMT
Server: Apache
Last-Modified: Tue, 01 Dec 2009 20:18:22 GMT
ETag: "51142bc1-7449-479b075b2891b"
Accept-Ranges: bytes
Content-Length: 29769
Content-Type: text/html
<!DOCTYPE html>… (here come the 29769 bytes of the requested web page)
If HTTP pipelining is activated, several requests can be sent without waiting for the first
response to be fully received. HTTP pipelining has proven difficult to implement in existing
networks, where old pieces of software coexist with modern versions. HTTP pipelining has been
superseded in HTTP/2 by more robust multiplexing of requests within a frame.
HTTP Messages
HTTP messages, as defined in HTTP/1.1 and earlier, are human-readable. In HTTP/2, these
messages are embedded into a binary structure, a frame, allowing optimizations like compression
of headers and multiplexing. Even if only part of the original HTTP message is sent in this
version of HTTP, the semantics of each message is unchanged and the client reconstitutes
(virtually) the original HTTP/1.1 request. It is therefore useful to comprehend HTTP/2 messages
in the HTTP/1.1 format.
There are two types of HTTP messages, requests and responses, each with its own format.
Requests
Requests consist of the following elements:
An HTTP method, usually a verb like GET or POST, or a noun like OPTIONS or HEAD, that defines
the operation the client wants to perform. Typically, a client wants to fetch a resource (using GET)
or post the value of an HTML form (using POST), though more operations may be needed in
other cases.
The path of the resource to fetch; the URL of the resource stripped of elements that are obvious
from the context, for example without the protocol (http://),
the domain (here, developer.mozilla.org), or the TCP port (here, 80).
The version of the HTTP protocol.
Optional headers that convey additional information for the servers.
A body, for some methods like POST, similar to those in responses, which contains the resource
being sent.
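A sketch putting these elements together with Python's http.client: the method is POST, the path and form fields are made up, extra headers are passed explicitly, and the body carries the encoded form values.

# Sketch of building a request from its elements: method, path, headers, body.
import http.client
from urllib.parse import urlencode

body = urlencode({"name": "Ada", "comment": "Hello"})   # made-up form fields

conn = http.client.HTTPConnection("example.com", 80)
conn.request(
    "POST",                  # HTTP method
    "/contact",              # path of the resource (made up)
    body=body,               # body containing the submitted form values
    headers={                # optional headers with extra information for the server
        "Content-Type": "application/x-www-form-urlencoded",
    },
)
response = conn.getresponse()
print(response.status, response.reason)
conn.close()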
Responses
Responses consist of the following elements: the version of the HTTP protocol they follow, a status
code indicating whether or not the request was successful and why, a status message (a short,
non-authoritative description of the status code), HTTP headers (like those for requests), and,
optionally, a body containing the fetched resource. An example response is shown in the HTTP flow
section above.
APIs based on HTTP
One API built on top of HTTP, server-sent events, is a one-way service that allows a server to send
events to the client, using HTTP as a transport mechanism. Using the EventSource interface, the
client opens a connection and establishes event handlers. The client browser automatically converts
the messages that arrive on the HTTP stream into appropriate Event objects. It then delivers them to
the event handlers registered for the events' type, if known, or to the onmessage event handler if
no type-specific event handler was established.
Conclusion
HTTP is an extensible protocol that is easy to use. The client-server structure, combined with the
ability to add headers, allows HTTP to advance along with the extended capabilities of the Web.
Though HTTP/2 adds some complexity by embedding HTTP messages in frames to improve
performance, the basic structure of messages has stayed the same since HTTP/1.0. Session flow
remains simple, allowing it to be investigated and debugged with a simple HTTP message
monitor.
DNS
Load balancer
Monitoring
What is a database
What’s the difference between a web server and an app server?
DNS record types
Single point of failure
How to avoid downtime when deploying new code
High availability cluster (active-active/active-passive)
What is HTTPS
What is a firewall