Assignment 2
a. Use a packet sniffer (e.g., Wireshark) to capture the sequence of ASCII characters that are
sent and received by a web browser as a result of a request of your choice to a web server.
Provide a screen capture of these sequences and add carriage return and line feed
characters as needed to improve readability.
I used Wireshark to capture the sequence of ASCII characters sent and received by Google Chrome when requesting amazon.com.
Request:
Response:
b. Identify the complete URL of the document requested, the HTTP protocol version for both
the request and response, the operating system that the web browser is running on, and
the kind of web server that answered the request.
Complete URL of the document requested: https://www.amazon.com/
HTTP protocol version for both the request and response: 1.1
Operating System that the web browser is running on: Windows 10
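The fields in part b can be read directly off the captured text. Below is a minimal Python sketch, assuming a hypothetical request that mirrors the capture (the User-Agent string and its Chrome version number are made up for illustration). The request line gives the path and HTTP version, the Host header gives the server name, and the "Windows NT 10.0" token in User-Agent is how Windows 10 identifies itself. The scheme is not part of the request line; here it is assumed to be https from the connection.

```python
# Hypothetical request text for illustration; not the actual capture above.
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: www.amazon.com\r\n"
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)


def parse_request(raw: str) -> dict:
    """Split an HTTP/1.x request into its request-line parts and headers."""
    head, _, _body = raw.partition("\r\n\r\n")  # headers end at the blank line
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return {"method": method, "target": target, "version": version, "headers": headers}


req = parse_request(RAW_REQUEST)
url = "https://" + req["headers"]["host"] + req["target"]  # scheme assumed from the connection
print(url)             # https://www.amazon.com/
print(req["version"])  # HTTP/1.1
print("Windows NT 10.0" in req["headers"]["user-agent"])  # True, i.e. Windows 10
```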
d. Which web browser sent the request and why is it important for the server to know this
information?
The web browser that sent the request is Google Chrome. The browser identifies itself in the User-Agent header of the request. It is important for the server to know this information because in many cases it sends different versions of the same object to different types of browsers. Based on the browser, the web server sends the appropriate response back to the client, addressed to the client's IP address and port number.
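To illustrate how a server can use the User-Agent header, here is a minimal sketch with hypothetical variant file names and deliberately crude matching rules; real sites use far more detailed detection. The Vary: User-Agent response header is the standard HTTP way to tell caches that the body was chosen based on that request header.

```python
def choose_variant(user_agent: str) -> str:
    """Pick which (hypothetical) page file to serve for a given User-Agent."""
    ua = user_agent.lower()
    if "mobile" in ua:       # mobile browsers usually include the token "Mobile"
        return "index.mobile.html"
    if "chrome/" in ua:      # desktop Chrome
        return "index.chrome.html"
    return "index.html"      # generic fallback for any other client


def response_headers(user_agent: str) -> dict:
    """Headers for a response whose body was selected by User-Agent."""
    return {
        "Content-Type": "text/html; charset=utf-8",
        # Vary tells caches that the body depends on this request header
        "Vary": "User-Agent",
    }
```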
e. Was the request successful and, if so, what type of document was received by the server?
Yes, the request was successful, and the document received is of type text/html, as indicated by the Content-Type header in the response screenshot above.
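Success and document type both come from the response head: the status line carries the status code (2xx means success) and the Content-Type header carries the media type. The sketch below parses a hypothetical response head. The Server header, where the kind of web server asked about in part b is normally reported, is given a made-up value here.

```python
# Hypothetical response head for illustration; header values are made up.
RAW_RESPONSE_HEAD = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html;charset=UTF-8\r\n"
    "Server: ExampleServer\r\n"
    "\r\n"
)


def parse_status_and_type(raw: str):
    """Return (status code, reason phrase, media type, Server header)."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Drop parameters such as ";charset=UTF-8" to keep only the media type
    media_type = headers.get("content-type", "").split(";")[0].strip()
    return int(code), reason, media_type, headers.get("server")


code, reason, media_type, server = parse_status_and_type(RAW_RESPONSE_HEAD)
print(200 <= code < 300, media_type)  # True text/html
```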
Explain how Web architectures were developed and refined to increasingly support
applications with informational, interactive, transactional, and delivery requirements?
Please relate to specific architectures, their corresponding protocols, and describe the
improvements that were made over time.
The basic web architecture is two-tiered: a web client displays information content, and a web server transfers that information to the client. In this basic two-tier architecture, static web pages (documents) are transferred from information servers to browser clients world-wide. The architecture depends on three key standards: HTML for encoding document content, URLs for naming remote information objects in a global namespace, and HTTP for carrying out the transfer.
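As a rough illustration of how the URL and HTTP standards fit together, this sketch splits a URL with Python's urllib.parse.urlsplit and turns it into the HTTP/1.1 GET request a browser would send; the response body would then be the HTML document. The example URLs are hypothetical, and real browsers send many more headers.

```python
from urllib.parse import urlsplit


def build_get(url: str) -> bytes:
    """Turn a URL (the object's global name) into a minimal HTTP/1.1 GET request."""
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the root document
    if parts.query:
        path += "?" + parts.query
    default_port = 443 if parts.scheme == "https" else 80
    port = parts.port or default_port
    # The Host header carries the port only when it is not the scheme's default
    host = parts.hostname if port == default_port else f"{parts.hostname}:{port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return request.encode("ascii")


print(build_get("http://www.example.com/docs/index.html?lang=en").decode())
```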
This basic web architecture has been evolving rapidly to serve a wider variety of needs beyond static document access and browsing. The Common Gateway Interface (CGI) extends the architecture to three tiers by adding a back-end server that provides services to the web server on behalf of the web client, permitting dynamic composition of web pages. Helper applications/plug-ins and Java/JavaScript provide further extensions to the web architecture. [1]
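A CGI program is run by the web server for each request: request data arrives in environment variables such as QUERY_STRING (RFC 3875), and whatever the program writes to standard output (a header block, a blank line, then the body) becomes the response. A minimal sketch of such a dynamically composed page, with a hypothetical "name" query parameter:

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ: dict) -> str:
    """Build the CGI output (headers, blank line, body) for one request."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    # Escape user input before placing it into HTML
    body = f"<html><body><p>Hello, {escape(name)}!</p></body></html>"
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The web server sets the environment and reads this process's stdout
    sys.stdout.write(cgi_response(dict(os.environ)))
```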
List at least four mainstream real-time messaging applications. Document the protocols
they use (along with references to corresponding IETF RFCs) and explain in detail how
they differ. Please provide references and/or links to all documentation sources used to
answer this question.
Four of the most popular mainstream real-time messaging applications currently available are Facebook Messenger, WhatsApp, WeChat (used mostly in China) and Google Hangouts.
Facebook Messenger:
It is a stand-alone messaging app that uses the MQTT protocol.
MQTT's small footprint and low bandwidth use make it well suited to mobile messaging.
The protocol helps minimize both battery use and network traffic.
MQTT is not an IETF protocol, so it has no corresponding RFC; it was standardized at OASIS (MQTT 3.1.1 became an OASIS Standard in 2014). Before that, the protocol specification had been openly published with a royalty-free license for many years, and companies such as Eurotech (formerly known as Arcom) have implemented the protocol in their products.
MQTT was invented by Dr Andy Stanford-Clark of IBM, and Arlen Nipper of Arcom (now
Eurotech), in 1999.
MQTT stands for MQ Telemetry Transport. It is a publish/subscribe, extremely simple and
lightweight messaging protocol, designed for constrained devices and low-bandwidth, high-
latency or unreliable networks.
The design principles are to minimize network bandwidth and device resource requirements whilst also attempting to ensure reliability and some degree of assurance of delivery. These principles also make the protocol ideal for the emerging “machine-to-machine” (M2M) or “Internet of Things” world of connected devices, and for mobile applications where bandwidth and battery power are at a premium.
Source: https://www.facebook.com/notes/facebook-engineering/building-facebook-messenger/10150259350998920
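To make MQTT's small footprint concrete, this sketch encodes a QoS 0 PUBLISH packet following the MQTT 3.1.1 layout: a one-byte fixed header (packet type 3), a variable-length "remaining length" field, a length-prefixed topic name, and the payload. The topic name is hypothetical; the source does not say which topics Messenger uses.

```python
def encode_remaining_length(n: int) -> bytes:
    """MQTT 'Remaining Length': 7 bits per byte, high bit set if more bytes follow."""
    if not 0 <= n <= 268_435_455:          # largest value that fits in 4 bytes
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80                   # continuation bit
        out.append(byte)
        if n == 0:
            return bytes(out)


def build_publish_qos0(topic: str, payload: bytes) -> bytes:
    """PUBLISH with QoS 0: no packet identifier, DUP and RETAIN flags cleared."""
    topic_bytes = topic.encode("utf-8")
    variable_header = len(topic_bytes).to_bytes(2, "big") + topic_bytes
    body = variable_header + payload
    fixed_header = bytes([0x30]) + encode_remaining_length(len(body))  # type 3 << 4
    return fixed_header + body


packet = build_publish_qos0("chat/1", b"hi")  # hypothetical topic and message
```

A 2-byte message on a 6-character topic takes only 12 bytes on the wire, which is the kind of overhead that makes MQTT attractive on mobile networks.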
WhatsApp:
The WhatsApp server is almost completely implemented in Erlang.
The server systems that do the back-end message routing are written in Erlang.
Erlang is largely why the large number of active users is handled with a very small server footprint.
ejabberd, an open-source XMPP server written in Erlang, was originally chosen because it is open, was well reviewed by developers, was easy to start with, and promised Erlang's long-term suitability for large communication systems.
The next few years were spent rewriting and modifying many parts of ejabberd, including switching from XMPP to an internally developed protocol, restructuring the code base, redesigning some core components, and making many important modifications to the Erlang VM to optimize server performance.
A primary gauge of system health is message queue length. The message queue length of all the processes on a node is constantly monitored, and an alert is sent out if they accumulate a backlog beyond a preset threshold. An alert on one or more processes falling behind points to the next bottleneck to attack.
Multimedia messages are sent by uploading the image, audio or video to be sent to an
HTTP server and then sending a link to the content along with its Base64 encoded
thumbnail (if applicable).
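In the multimedia flow above, the media itself travels out of band. In this sketch the upload to the HTTP server is assumed to have already happened, and only the short message carrying the link and an optional Base64 thumbnail is built. The JSON layout, field names and URLs are hypothetical; WhatsApp's actual wire format is its own internal protocol.

```python
import base64
import json
from typing import Optional


def build_media_message(to: str, media_url: str, mime_type: str,
                        thumbnail: Optional[bytes] = None) -> str:
    """Message that references already-uploaded media instead of carrying it."""
    msg = {"to": to, "type": "media", "url": media_url, "mime": mime_type}
    if thumbnail is not None:
        # The small preview travels inline, Base64-encoded so it is plain text
        msg["thumb"] = base64.b64encode(thumbnail).decode("ascii")
    return json.dumps(msg)
```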
What protocol does the WhatsApp app use? Clients connect to the WhatsApp server pools over an SSL socket. All messages are queued on the server until the recipient's client reconnects to retrieve them. Successful retrieval of a message is reported back to the WhatsApp server, which forwards this status to the original sender (who sees it as a "checkmark" icon next to the message). Messages are wiped from the server memory as soon as the client has accepted them.
The XMPP protocol that WhatsApp originally used is specified in RFC 6120 (core) and RFC 6121 (instant messaging and presence).
Source: http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html
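The queue-until-reconnect, delete-on-acknowledgement behaviour described above can be modeled in a few lines. This is a toy in-memory sketch with hypothetical class and method names, not WhatsApp's Erlang implementation. Messages wait in a per-recipient queue, the recipient drains it on reconnect, and each acknowledgement deletes the message and records a receipt for the sender (the "checkmark").

```python
from collections import defaultdict, deque


class StoreAndForward:
    """Toy model: hold messages until the recipient fetches and acknowledges them."""

    def __init__(self) -> None:
        self.queues = defaultdict(deque)   # recipient -> pending (id, sender, text)
        self.receipts = defaultdict(list)  # sender -> ids confirmed as delivered

    def send(self, msg_id: int, sender: str, recipient: str, text: str) -> None:
        self.queues[recipient].append((msg_id, sender, text))

    def reconnect(self, recipient: str) -> list:
        """Recipient comes online and retrieves everything queued for it."""
        return list(self.queues[recipient])

    def ack(self, recipient: str, msg_id: int) -> bool:
        """Recipient accepted the message: wipe it and notify the sender."""
        queue = self.queues[recipient]
        for i, (mid, sender, _text) in enumerate(queue):
            if mid == msg_id:
                del queue[i]                        # removed from server memory
                self.receipts[sender].append(msg_id)  # shown as a checkmark
                return True
        return False
```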
WeChat:
The Remote Logging Protocol is a communication protocol used by WeChat to interact with the remote log server. It uses the syslog protocol, which conveys event notification messages.
WeChat also uses HTTP to communicate with the WeChat API. The RFC corresponding to the Remote Logging Protocol and the syslog protocol is RFC 5424. The RFCs corresponding to HTTP are RFC 7230, 7231, 7232 and 7234 (HTTP/1.1), RFC 2817 (upgrading to TLS within HTTP/1.1) and RFC 2818 (HTTP over TLS).
Source: https://github.com/tencent-wechat/phxsql/wiki/Architecture
http://ieeexplore.ieee.org/document/7404750/
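For reference, an RFC 5424 syslog message starts with a priority value PRI = facility * 8 + severity, followed by a version number, timestamp, hostname, application name, process ID, message ID, structured data and the free-form message. The sketch below formats one such message; the host and application names are hypothetical, and "-" is the RFC's nil value for omitted fields.

```python
from datetime import datetime, timezone
from typing import Optional


def syslog_5424(facility: int, severity: int, hostname: str, app_name: str,
                msg: str, procid: str = "-", msgid: str = "-",
                timestamp: Optional[datetime] = None) -> str:
    """Format one RFC 5424 message with no structured data."""
    pri = facility * 8 + severity  # e.g. local0 (16) + informational (6) = 134
    ts = (timestamp or datetime.now(timezone.utc)).isoformat()
    # <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
    return f"<{pri}>1 {ts} {hostname} {app_name} {procid} {msgid} - {msg}"
```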
Google Hangouts:
Hangouts, the instant messaging and audio/video chat product that replaced Google Talk, uses proprietary technology that does not support server federation via the XMPP (Extensible Messaging and Presence Protocol, RFC 6120 and RFC 6121) industry standard.
Any client that supports Jabber/XMPP and adheres to the requirements of the XMPP specifications can connect to the Google Talk service. The following details should be kept in mind:
o The service is hosted at talk.google.com on port 5222
o TLS is required
o The preferred authentication mechanism is OAuth 2.0
o SASL PLAIN is supported for legacy clients
Hangouts allows conversations between two or more users. The service can be accessed
online through the Gmail or Google+ websites, or through mobile apps available for
Android and iOS.
Source: https://developers.google.com/talk/open_communications
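The Google Talk requirements listed above correspond to a standard XMPP client handshake: open an XML stream to the server on port 5222, upgrade it with STARTTLS, open a new stream over TLS, then authenticate with SASL. This sketch only builds the stream header and a SASL PLAIN auth element (RFC 6120, with the PLAIN message format from RFC 4616); the domain and credentials are hypothetical, and the TLS step is not shown.

```python
import base64

STREAM_NS = "http://etherx.jabber.org/streams"
SASL_NS = "urn:ietf:params:xml:ns:xmpp-sasl"


def open_stream(domain: str) -> str:
    """Initial stream header a client sends (again after TLS is established)."""
    return (f"<?xml version='1.0'?><stream:stream to='{domain}' version='1.0' "
            f"xmlns='jabber:client' xmlns:stream='{STREAM_NS}'>")


def sasl_plain_auth(username: str, password: str) -> str:
    """SASL PLAIN: Base64 of [authzid] NUL authcid NUL passwd, authzid left empty."""
    message = b"\x00" + username.encode("utf-8") + b"\x00" + password.encode("utf-8")
    token = base64.b64encode(message).decode("ascii")
    return f"<auth xmlns='{SASL_NS}' mechanism='PLAIN'>{token}</auth>"
```

Because PLAIN sends the password in recoverable form, it is only acceptable over the TLS-protected stream, which is why TLS is required above.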
List at least five mainstream email applications. Document the protocols they use (along
with references to corresponding IETF RFCs) to send and receive emails and explain in
detail how they differ. Please provide references and/or links to all documentation
sources used to answer this question.
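Below are five of the most popular mainstream email applications and the protocols they use to send and receive emails. All of them send mail with SMTP (RFC 5321) and receive mail with IMAP (RFC 3501) and/or POP3 (RFC 1939); message content follows the Internet Message Format (RFC 5322).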
Google
Both IMAP and POP are supported for receiving emails
SSL or TLS are required
Port 995 is used for POP and port 993 is used for IMAP
Port 465 is used if SSL is used, otherwise port 587 is used. If SSL is used, it is only
possible to send emails to Gmail or Google applications users
Non-conformant to RFCs 5322/3696 since it does not support IPv4 address literals in
email addresses. It also does not support non-encrypted connections
Outlook
Both IMAP and POP are supported for receiving emails
SSL is required for incoming emails and TLS is required for outgoing emails
Uses port 995 for POP and 993 for IMAP
Only TLS can be used for outgoing connection and both ports 25 and 587 can be used
Mostly compliant with RFCs 5322/3696 but does not support non-encrypted connections
Yahoo
Both IMAP and POP are supported for receiving emails
SSL is required for incoming emails and TLS is required for outgoing emails
Port 995 is used for POP and port 993 is used for IMAP
Either port 465 (SSL) or port 587 (TLS) is used; mail can be sent either way
Mostly compliant with RFCs 5322/3696 but does not support non-encrypted connections
iCloud
IMAP is supported for receiving emails
SSL or TLS are required
Port 993 is used for IMAP
Port 587 is used for SMTP
Non-conformant to RFC 5322 and RFC 3696 because it does not support more than 21 characters in the local part of the address (the specification says applications have to support 64 octets). It also does not support non-encrypted connections
AOL
Both IMAP and POP are supported for receiving emails
Both non-encrypted and encrypted connections are supported for receiving emails and
only encrypted connections are supported for outgoing emails
Port 143 is used for regular IMAP and port 993 for SSL IMAP. Port 110 is used for regular POP connections and port 995 for SSL POP
TLS and port 587 are the only options for SMTP connections
Requires mail to comply with RFC 2821/2822 (these RFCs are obsolete, superseded by RFC 5321/5322). It does not support non-encrypted connections for outgoing mail
Source: https://en.wikipedia.org/wiki/Comparison_of_email_clients
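The port numbers above follow the well-known assignments for each protocol. This sketch maps them to Python's standard-library clients (imaplib, poplib, smtplib); connect() is defined but not called here, and the host name would be the provider's server. It also includes a check of the 64-octet local-part limit from RFC 5321, the limit the iCloud entry refers to.

```python
import imaplib
import poplib
import smtplib

# (protocol, implicit TLS?) -> (standard-library client class, well-known port)
CLIENTS = {
    ("imap", True): (imaplib.IMAP4_SSL, 993),
    ("imap", False): (imaplib.IMAP4, 143),
    ("pop3", True): (poplib.POP3_SSL, 995),
    ("pop3", False): (poplib.POP3, 110),
    ("smtp", True): (smtplib.SMTP_SSL, 465),  # submission over implicit TLS
    ("smtp", False): (smtplib.SMTP, 587),     # submission, upgraded with STARTTLS
}


def connect(host: str, protocol: str, implicit_tls: bool):
    """Open a client connection (not called in this sketch)."""
    cls, port = CLIENTS[(protocol, implicit_tls)]
    client = cls(host, port)
    if protocol == "smtp" and not implicit_tls:
        client.starttls()  # the providers above refuse unencrypted submission
    # Plain IMAP/POP connections would likewise need starttls()/stls()
    return client


def local_part_ok(address: str) -> bool:
    """True if the part before the last '@' is 1..64 octets (RFC 5321 limit)."""
    local, sep, _domain = address.rpartition("@")
    return bool(sep) and 0 < len(local.encode("utf-8")) <= 64
```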
Traceroute
Traceroute (tracert on Windows) is a network diagnostic command that shows the path (route) packets take across an Internet Protocol (IP) network from your computer to a destination you specify. It lists all the routers the packets pass through until they reach the destination, or fail to and are discarded, and it measures the transit delay to each hop along the way. When run, traceroute outputs the list of traversed routers in a simple text format, together with this timing information.
The syntax of the command is shown below:
Usage: tracert [-d] [-h maximum_hops] [-j host-list] [-w timeout]
[-R] [-S srcaddr] [-4] [-6] target_name
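Traceroute discovers the path by sending probes with increasing TTL (time-to-live) values: each router decrements the TTL, and the router where it reaches zero discards the probe and returns an ICMP Time Exceeded message, revealing its address. Windows tracert probes with ICMP Echo Requests, so the destination itself answers with an Echo Reply. The sketch below simulates this over a made-up list of router addresses rather than sending real packets, which would need raw sockets and administrator rights.

```python
from typing import List, Tuple


def send_probe(routers: List[str], destination: str, ttl: int) -> Tuple[str, str]:
    """Simulate one probe: return (who answered, ICMP message type)."""
    for router in routers:
        ttl -= 1                       # every router decrements the TTL
        if ttl == 0:                   # expired: router discards it and reports back
            return router, "time-exceeded"
    return destination, "echo-reply"   # probe survived all routers


def traceroute(routers: List[str], destination: str,
               max_hops: int = 30) -> List[Tuple[int, str]]:
    """Probe with TTL 1, 2, 3, ... (like tracert -h maximum_hops)."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, kind = send_probe(routers, destination, ttl)
        hops.append((ttl, responder))
        if kind == "echo-reply":       # reached the destination
            break
    return hops
```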
Ping:
The ping command helps to verify IP-level connectivity. When troubleshooting, it sends ICMP echo requests to a target host name or IP address and reports the echo replies that come back, with their round-trip times. It is used to verify that a host computer can connect to the TCP/IP network and to network resources.
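An ICMP Echo Request (RFC 792) is type 8, code 0, followed by a checksum, an identifier, a sequence number and optional data; the checksum is the 16-bit one's-complement sum described in RFC 1071. This sketch builds such a packet. The identifier, sequence number and payload are arbitrary, and actually sending it requires a raw socket with administrator privileges, so that part is omitted.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                 # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_request(identifier: int, sequence: int, payload: bytes = b"ping") -> bytes:
    """ICMP type 8 / code 0 packet with the checksum filled in."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)  # checksum = 0
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload
```

A receiver verifies the packet by running the same checksum over it, checksum field included; a valid packet yields 0.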
nslookup
nslookup is a network administration command-line tool for querying the Domain Name System to obtain a domain name or IP address mapping, or any other specific DNS record.
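Under the hood, a DNS lookup such as nslookup's is a query in the RFC 1035 wire format: a 12-byte header followed by a question made of length-prefixed labels, a query type and a class. This sketch builds a query for an A record with the recursion-desired bit set; the query ID and name are arbitrary, and sending it to a server over UDP port 53 is left out.

```python
import struct


def build_dns_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Standard query with one question; qtype 1 = A record, class 1 = IN."""
    flags = 0x0100  # only the RD (recursion desired) bit is set
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)
```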
Ipconfig
Ipconfig displays all current TCP/IP network configuration values and refreshes Dynamic
Host Configuration Protocol (DHCP) and Domain Name System (DNS) settings. Used
without parameters, ipconfig displays the IP address, subnet mask, and default gateway for
all adapters.
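The three values ipconfig prints are related: the IP address and subnet mask together define the local network, and the default gateway must be on that network to be reached directly. A small sketch using Python's ipaddress module, with hypothetical adapter values:

```python
import ipaddress


def describe_adapter(ip: str, mask: str, gateway: str) -> dict:
    """Derive the local network from address + mask and check the gateway is on it."""
    iface = ipaddress.IPv4Interface(f"{ip}/{mask}")  # dotted-quad masks are accepted
    gw = ipaddress.IPv4Address(gateway)
    return {
        "network": str(iface.network),
        "prefix_length": iface.network.prefixlen,
        "gateway_on_link": gw in iface.network,
    }
```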
Dig
dig (Domain Information Groper) is a network administration command-line tool for
querying Domain Name System (DNS) servers. It is useful for network troubleshooting and
for educational purposes. dig can operate in interactive command line mode or in batch
mode by reading requests from an operating system file. When a specific name server is not
specified in the command invocation, it will use the operating system's default resolver,
usually configured via the resolv.conf file. Without any arguments, it queries the DNS root
zone.
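On Unix-like systems, the default resolver that dig falls back to is taken from the nameserver lines of resolv.conf. This sketch extracts them in file order from a hypothetical file's contents; real resolvers also honour directives such as search and options, which are ignored here.

```python
from typing import List


def default_nameservers(resolv_conf_text: str) -> List[str]:
    """Return nameserver addresses in the order a stub resolver would try them."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Treat '#' and ';' as starting a comment
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers
```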
6. Problem 6 – Overlay Networks
a. How is peer churn managed in P2P applications such as file-sharing, conferencing, and
content distribution?
The dynamics of peer participation, or churn, are an inherent property of Peer-to-Peer (P2P) systems and critical for their design and evaluation. Accurately characterizing churn requires precise and unbiased information about the arrival and departure of peers, which in practice is difficult to acquire because of the large size and highly dynamic nature of these systems. Without a reliable model of churn, researchers end up making assumptions about the distribution of arrival times and session lengths that may be incorrect.
A first step is to identify the key challenges in characterizing churn that arise from factors such as measurement limitations, network conditions, and peer dynamics. Developing techniques to address these difficulties, or at least to bound the resulting error, yields measurements that are significantly more accurate and representative. Churn can be examined at two levels: group-level characteristics, which capture the behavior of all participating peers collectively, and peer-level characteristics, which capture the behavior of specific peers across multiple appearances in the system over time. For the results to be broadly applicable, churn needs to be understood in three types of widely deployed P2P systems:
Gnutella, an unstructured file-sharing system
Kad, a Distributed Hash Table (DHT)
BitTorrent, a content distribution system.
Examining multiple systems allows exploration of the similarities and differences in churn
behavior between different types of P2P systems. Results can be summarized as follows:
Group-level properties of churn exhibit similar behavior across all three applications, but per-peer properties in BitTorrent are significantly different.
Session lengths are fit by Weibull or log-normal distributions, but not by the exponential or Pareto.
Past session length is a good predictor of the next session length in Gnutella and Kad, but not in BitTorrent.
The availability of individual peers exhibits a strong correlation across consecutive days.
In BitTorrent, peers frequently remain in the system long after their downloads complete.
The main goal is to measure the arrival and departure time of peers so that we can
compute characteristics such as session lengths and inter-arrival intervals. Each system
provides slightly different hooks for measurement, each with advantages and
disadvantages. The two most important properties are the precision in measuring arrival
and departure times and the ability to capture a representative set of sessions. [2]
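Once arrival and departure times have been measured, the group-level and peer-level quantities described above are straightforward to compute. A minimal sketch over hypothetical observations (peer ID, arrival time, departure time, in seconds); it does not correct for sessions cut off at the start or end of the measurement window, which is one of the biases a real study must handle.

```python
from typing import Dict, List, Tuple

Session = Tuple[str, float, float]  # (peer_id, arrival_s, departure_s)


def session_lengths(sessions: List[Session]) -> List[float]:
    """Group level: how long each observed session lasted."""
    return [depart - arrive for _peer, arrive, depart in sessions]


def inter_arrival_intervals(sessions: List[Session]) -> List[float]:
    """Group level: gaps between consecutive arrivals of any peer."""
    arrivals = sorted(arrive for _peer, arrive, _depart in sessions)
    return [b - a for a, b in zip(arrivals, arrivals[1:])]


def sessions_per_peer(sessions: List[Session]) -> Dict[str, List[float]]:
    """Peer level: each peer's successive session lengths, in arrival order."""
    per_peer: Dict[str, List[float]] = {}
    for peer, arrive, depart in sorted(sessions, key=lambda s: s[1]):
        per_peer.setdefault(peer, []).append(depart - arrive)
    return per_peer
```

The peer-level view is what a claim such as "past session length predicts the next session length" would be tested against.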
b. Provide specific examples of P2P applications, explain how they specifically handle
churn, and estimate the performance improvements achieved in each case. Please
provide references and/or links to all documentation sources used to answer this
question.
For this question, we discuss Bamboo, an open-source DHT written in Java that builds on the algorithms of existing DHTs such as Pastry and Chord, modifying them to handle churn better. It maps a large identifier space onto the set of nodes in the system in a deterministic and distributed fashion, a function we alternately call routing or lookup. Bamboo achieves this goal through the following three features of its design:
1. Static resilience to failures: Static resilience provides routability after failure even before
recovery takes place, and so allows the DHT to use the power of its own routing
mechanism to enact that recovery.
2. Timely, accurate failure detection: Nodes should choose timeouts such that a late response indicates node failure rather than network congestion or processor load. Bamboo chooses such timeouts through two complementary techniques. First, it performs active probing through the user-level networking layer. When actual traffic (lookup requests, leaf set changes, etc.) is being sent to some neighbor, that traffic is used to maintain this timing information; in its absence, dummy requests are sent (every 4 seconds by default). Thus, a Bamboo node always has a recent estimate of the response time for each of its neighbors. Since a node has only O(log N) neighbors, the use of recursive routing allows a Bamboo node to communicate with only a small number of peers, and active probing is feasible.
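The text says each Bamboo node keeps a recent response-time estimate for every neighbor and sends dummy probes every 4 seconds when there is no real traffic. The sketch below assumes a TCP-style estimator (smoothed response time plus four times its mean deviation, as in RFC 6298) to turn that estimate into a timeout; the class and method names are hypothetical and are not Bamboo's API.

```python
from typing import Optional


class NeighborTimer:
    """Per-neighbor response-time estimate, timeout, and probe scheduling."""

    PROBE_INTERVAL_S = 4.0   # from the text: probe after 4 s without traffic
    INITIAL_TIMEOUT_S = 1.0  # assumption: RFC 6298's initial RTO before any sample

    def __init__(self) -> None:
        self.srtt: Optional[float] = None  # smoothed response time
        self.rttvar = 0.0                  # smoothed mean deviation
        self.last_sent = float("-inf")     # when anything was last sent

    def record_send(self, now: float) -> None:
        """Real traffic or a dummy probe was just sent to this neighbor."""
        self.last_sent = now

    def observe(self, rtt: float) -> None:
        """Fold one measured response time into the estimate."""
        if self.srtt is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            # Deviation is updated against the old average, then the average
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt

    def timeout(self) -> float:
        """A response later than this is treated as a failure."""
        if self.srtt is None:
            return self.INITIAL_TIMEOUT_S
        return self.srtt + 4 * self.rttvar

    def needs_probe(self, now: float) -> bool:
        """True when no traffic has gone out recently, so a dummy request is due."""
        return now - self.last_sent >= self.PROBE_INTERVAL_S
```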
References
[1] http://www.objs.com/survey/WebArch.htm
[2] http://conferences.sigcomm.org/imc/2006/papers/p19-stutzbach2.pdf
[3] https://www.facebook.com/notes/facebook-engineering/building-facebook-messenger/10150259350998920
[4] http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html
[5] https://github.com/tencent-wechat/phxsql/wiki/Architecture
[6] http://research.microsoft.com/en-us/um/people/padmanab/papers/msr-tr-2005-03.pdf
[7] https://en.wikipedia.org/wiki/Comparison_of_email_clients
[8] https://developers.google.com/talk/open_communications