Lab0: ‘Hypertext & Web protocols’ Lab

General Information.

Authorship: José M. Rubio Manso ([email protected]) with minor changes made by
Luis Sánchez Fernández ([email protected])

Teaching Staff: Luis Sánchez Fernández ([email protected])

Concepts: Hypertext, Web Browser, Web Page, Web Server, Web Site, URL, HTML,
XML, HTTP, MIME, CSS, Javascript, Cookies, PHP, Page tagging, Web
Analytics, Log Server analysis.

Location: 0.A.06

Date: 10/02/2023 (11h15-12h45)

Introduction.
In the beginning, the ‘World Wide Web’ (WWW) was designed basically as a mechanism to
share documents (‘Hypertext’), also known as ‘Web Pages’, which contain references
to other documents by means of links (‘Hyperlinks’). These ‘Web Pages’ are usually written
in a markup language (HTML, ‘Hypertext Markup Language’) and are shared and made
accessible through the Internet by a client-server protocol (HTTP, ‘Hypertext Transfer
Protocol’). The ‘Web Browser’ is the HTTP client application that asks, on behalf of
the user, for a ‘Web Page’ that is located by its ‘address’ (URL, ‘Uniform Resource Locator’).
If the ‘Web Page’ contains ‘hyperlinks’, the documents they point to can also be retrieved
through their corresponding URLs. Every ‘URL’ includes a reference that points to the ‘Web
Server’, the counterpart HTTP-service application that listens for clients’ requests. The server
maps the logical URL ‘address’ of the ‘Web Page’ to a physical file on the server and sends it
back to the ‘Web Browser’, which parses the received HTML document, fetches additional
content (like images) if necessary and, finally, renders it for the user.
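
The mapping from a URL to a server file described above can be sketched in a few lines of
Javascript (runnable with Node.js). This is only an illustration, not how Apache is
implemented: the document root and the small MIME table are assumptions chosen for the
example.

```javascript
// Assumed values for illustration only; a real server reads them from its configuration.
const DOCUMENT_ROOT = "/var/www/html";
const MIME_TYPES = {
  ".html": "text/html",
  ".css": "text/css",
  ".js": "text/javascript",
  ".png": "image/png",
  ".jpg": "image/jpeg",
};

// Split a URL into the server it points to and the file the server would look up.
function resolveUrl(rawUrl) {
  const url = new URL(rawUrl);
  // An empty port means the default for the scheme (80 for http, 443 for https).
  const port = url.port || (url.protocol === "https:" ? "443" : "80");
  // Map the logical path of the URL to a physical file under the document root.
  const file = DOCUMENT_ROOT + url.pathname;
  // Choose a MIME type from the file extension, with a generic fallback.
  const dot = url.pathname.lastIndexOf(".");
  const extension = dot === -1 ? "" : url.pathname.slice(dot);
  const mime = MIME_TYPES[extension] || "application/octet-stream";
  return { host: url.hostname, port, file, mime };
}

console.log(resolveUrl("http://127.0.0.1/images/header.png"));
```

The browser uses the host and port to open the connection, while the path and the MIME type
are the server's concern.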

The evolution of the ‘WWW’ has been towards the inclusion of new capabilities for ‘Web Pages’
in order to increase both transactional capability and user interactivity on the ‘Web’. The
former was driven by the boom of e-Commerce and the latter by the development of
‘rich-multimedia’ content and the emergence of ‘Social Networks’. To that end, a number of
standards and technologies have been developed and incorporated into the ‘Web’, such as
Javascript, CSS (Cascading Style Sheets), MIME (Multipurpose Internet Mail Extensions), XML
(eXtensible Markup Language), Web Services, etc. As HTTP is a ‘stateless’ protocol, it is
convenient to have some kind of ‘State Management Mechanism’ for building sessions between
clients and servers, and the ‘HTTP Cookie’ is the standard designed to achieve it. The
appearance of programming languages such as PHP or Java also led to the creation of ‘dynamic
Web Pages’, the original prototype of today’s popular ‘Web applications’, which can be
considered as the aggregation of static and dynamic ‘Web Pages’ and other Web resources
(images, audio, video, etc.). Likewise, a ‘Web Site’ can be considered as an aggregation of
‘static’ and ‘dynamic’ Web content.

In the context of ‘Web Analytics’, several techniques have been developed to track the activity
of users through their use of ‘Web Pages’. These techniques will be studied in the lectures on
Web Usage Mining. In this lab, we will learn the basic Web technologies that can be used to
track user activity: server logs, Javascript, and cookies.

Goals.
The goals of this lab are both to understand and to practice basic concepts, technologies, and
techniques related to the ‘World Wide Web’ (WWW) in the context of this subject. First of all,
to be able to set up the laboratory, we need some background knowledge of Web Architecture
and its protocols in order to understand how ‘Web applications’ work. This includes practicing
with concepts like Web Pages, Web Browsers, Web Servers, the HTTP protocol, URI-URL, and
MIME.

Secondly, it is essential to gain further knowledge of some Web design standards and
programming languages to be able to create both static and dynamic Web Pages. This will
include some of the main standards in Web Development such as HTML, CSS and Javascript.
Source code.
The source code for the laboratory can be downloaded from Aula Global (file sources.zip).

Set up
As we are going to practice with ‘Web Applications’, we need to be capable of setting up a
minimal infrastructure to work with such applications. Any ‘Web Application’ is composed of
two main components:
● A ‘Web Client’, that is, an HTTP client. In our case, we will use a ‘Web Browser’ and
you can use whichever you like (Firefox, Chrome, Internet Explorer, Microsoft Edge).
● A minimal ‘Internet Server infrastructure’ to deploy ‘Web Applications’, based on a
‘Web Server’ (we will use the well-known Apache), combined with some extra
components, such as a programming language and a ‘Relational Database Management
System’ (RDBMS) to store application data. One of the most famous packages that
implement this is LAMP (https://en.wikipedia.org/wiki/LAMP_(software_bundle)), which
integrates Linux, Apache, an RDBMS like MySQL, and several programming languages
(Perl, PHP, Python).
There are many other variants of LAMP (WAMP, MAMP, XAMPP), but we are going
to focus on one installation for the laboratory depending on the Operating System
(OS) that you have:
○ Windows users: you don’t need to install anything because we are going to use a
‘portable LAMP’ distribution (USBWebserver). You can download it from Aula Global
(file USBWebserver v10.zip). Please extract the ZIP file into any subfolder you like and
then just run the ‘.exe’ file (usbwebserver.exe) to start it.
○ Linux users: you only need to install Apache, PHP and MySQL. Please follow the
link that corresponds to your distribution:
    ○ (Ubuntu) https://ubuntu.com/server/docs/lamp-applications
    ○ (Debian) https://wiki.debian.org/LaMp
○ Mac users: please install MAMP from this link:
    ○ https://documentation.mamp.info/en/MAMP-Mac/Installation/

IMPORTANT NOTICE

USBWebserver should be placed on your local disk (and thus, it should not be placed in
OneDrive). Otherwise, it will not work.

For Windows users, it is advised that you first try to do this lab using USBWebserver.

As a backup solution for Windows users who are not able to run USBWebserver on their
computers, and for Mac users who lack the necessary computer skills, an Oracle VirtualBox
virtual machine is provided. This virtual machine runs an Ubuntu operating system and
includes a LAMP server already installed. The source files used in this lab are also already
included in the virtual machine.

You can install Oracle VirtualBox from the following URL:

https://www.virtualbox.org/

It is recommended that you create a shared folder between your computer and the virtual
machine (in the Configuration menu of the virtual machine in Oracle VirtualBox), to ease
copying files between your computer and the virtual machine.

After running the ‘usbwebserver.exe’ file, please take a look at the Main window.

This is the ‘General configuration’ tab. Please, click on the ‘Settings’ tab and the next window
should appear:

Here you can set up both the ‘Root directory’ where we will place the ‘Web Pages’ files (and
related Web resources) and the ‘port number’ on which the Apache Web Server will
listen for the clients’ HTTP requests. The following table also lists some other useful
Web server directories for the lab, so you should take them into account for the next exercises:

COMPONENT   DIRECTORY       DESCRIPTION

Apache      /apache2/conf   Directory where the main configuration file of the
                            Web Server is located:
                            - httpd.conf

            /apache2/logs   The directory of log files. There are two log files:
                            - access.log
                            - error.log

You can access these files also through the ‘Apache Settings’ Tab:

IMPORTANT STEP! Now, please extract the content of the source code ZIP file
(sources.zip) into the ‘root’ directory of the USBWebServer.

On the client's side, we will also need a tool to view, test and debug the HTML
code of a Web Page and the HTTP protocol, so we can understand the operation of Web
Applications. Nowadays, all ‘Web browsers’ integrate a set of debugging tools called ‘Web
development tools’ (https://en.wikipedia.org/wiki/Web_development_tools). Although
relatively unknown among general users, they are very useful and we will make great use of
them during the laboratory. They are generally built into Web Browsers and
are accessible through the ‘Web Development’ option under the ‘Configuration’ menu. There
is also a common keyboard shortcut to activate them (‘F12’). After pressing it, the main window
of ‘Web Development Tools’ is activated and looks like this (in the Firefox Web Browser):

Exercise 1: Web architecture and protocols.

CONCEPTS

Web concepts: Web Page, Web Browser, Web Server
Protocol concepts: HTTP, URI-URL, MIME

TASKS

Client activity: the Web Browser:
● Analyze the source code of a Web Page.
● Open the ‘debugger’ and try to understand it.
Server activity: the Apache Web Server:
● Install/open ‘USBWebserver’.
● Configure the Server: Port, Root Directory, etc.
● Open a URL of the Web Server and analyze it with the debugger.
● Watch the Apache Server’s log file.

NOTE: Please, first of all, check that the source code for the laboratory is already extracted at
the ‘root’ directory of the Apache Web Server (refer to the ‘Setup’ section).

Before doing anything, try to read and understand the following terms:

TERM          DEFINITION

Web Page      https://en.wikipedia.org/wiki/Web_page
Web Browser   https://en.wikipedia.org/wiki/Web_browser
HTTP          https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol
HTTPS         https://en.wikipedia.org/wiki/HTTPS
URL           https://en.wikipedia.org/wiki/URL
MIME          https://en.wikipedia.org/wiki/MIME

Ok, let’s start. First of all, check that the ‘USBWebserver’ Web Server is set up and running.
Open a ‘Web Browser’, press ‘F12’ to start ‘Web Development Tools’ and select the ‘Network’ tab
to trace all HTTP requests that will be made to the Web Server. Please type the following
URL in the Web Browser’s address bar: http://127.0.0.1/index.php. You should see the
following ‘Index Web Page’, that is, the entry point to the ‘Web Site’:

Please find the source file of the ‘Index Web Page’ (index.php) that has just been served from
the ‘root’ directory of the Apache ‘Web Server’ (hint: find the path in USBWebServer’s
Configuration Panel) and, after looking at the ‘debug’ panel, try to answer the following
questions about what you have seen so far:

● Is the ‘Index Web Page’ a ‘static’ or a ‘dynamic’ page? Why?
● How many ‘Web resources’ does the ‘Web Page’ have? What are their MIME types?
● Which ‘Web resource’ couldn’t be served by the ‘Web Server’? Why?
● What do the ‘Method’ and ‘Status’ fields mean in each HTTP request?

Please click on the first HTTP request and a new ‘Debug Tab’ will appear on the right where
you can see the full information about the client request (Headers, Cookies, Parameters,
Response and Times).

To see the source code of the ‘Index Web Page’, right-click on the page and choose the
‘View source code’ option of the context menu; a new window will open with
the ‘Web Page’s’ HTML source code:

What you can see is the general HTML code structure of a ‘Web Page’. Please analyze it and
try to answer the following questions (note: don’t panic right now, we will practice more
with HTML in the next exercise; this is just to take a ‘first look’ at the anatomy of a Web Page):

● What does the ‘<head>’ tag section mean? And the ‘<body>’ tag section?
● What do the <div>, <a>, <ul> and <li> tags mean?

Please look for Apache’s ‘log directory’ (hint: find it in USBWebServer’s Apache
Settings Panel) and open the ‘access.log’ file.

Note: in the virtual machine that is provided for this lab, Apache’s log files are placed in

/var/log/apache2

You should see some entries like these:

127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET / HTTP/1.1" 200 78184 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"
127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET /style.css HTTP/1.1" 304 - "http://127.0.0.1/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"
127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET /images/header.png HTTP/1.1" 304 - "http://127.0.0.1/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"
127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET /index.php?=PHPE9568F35-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 200 2146 "http://127.0.0.1/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"
127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET /index.php?=PHPE9568F34-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 200 2524 "http://127.0.0.1/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"
127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"
127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET /images/menuleft.jpg HTTP/1.1" 304 - "http://127.0.0.1/style.css" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"
127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET /images/background.jpg HTTP/1.1" 304 - "http://127.0.0.1/style.css" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"
127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET /images/menu.jpg HTTP/1.1" 304 - "http://127.0.0.1/style.css" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"
127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET /images/menuright.jpg HTTP/1.1" 304 - "http://127.0.0.1/style.css" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"
127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET /images/topcontent.jpg HTTP/1.1" 304 - "http://127.0.0.1/style.css" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"
127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET /images/content.jpg HTTP/1.1" 304 - "http://127.0.0.1/style.css" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"
127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET /images/banner.jpg HTTP/1.1" 304 - "http://127.0.0.1/style.css" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"

The general format of each entry in Apache’s log is specified in the ‘httpd.conf’ file and it is
based on the following sequence of fields (if you see neither the ‘Referer’ nor the
‘User Agent’ fields in the log file, please read the NOTE below):

%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"

NOTE: To activate the ‘HTTP-referer’ (https://en.wikipedia.org/wiki/HTTP_referer) and the
‘User Agent’ (https://en.wikipedia.org/wiki/User_agent) fields in the access log entries,
please set up the ‘combined’ log format in the ‘LogFormat’ section of the configuration file
(httpd.conf). You can open it by clicking on the ‘Settings’ link in Apache’s configuration
Tab:

Take a look at the next table, which explains the main fields of an Apache log entry:

LOG FIELD   DESCRIPTION

%h          The IP address of the client (remote host) which made the request
            to the server.

%t          The time that the request was received.

\"%r\"      The request line sent by the client. For example, in the request line
            "GET /apache_pb.gif HTTP/1.0" the method used by the client is GET,
            the requested resource is /apache_pb.gif and the protocol is HTTP/1.0.
            It is also possible to log one or more parts of the request line
            independently.

%>s         The status code that the server sends back to the client. This
            information is very valuable, because it reveals whether the request
            resulted in a successful response (codes beginning in 2), a redirection
            (codes beginning in 3), an error caused by the client (codes beginning
            in 4), or an error in the server (codes beginning in 5).

%b          The size of the object returned to the client (a ‘-’ is logged when
            no content is returned).
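
As a complement to the table, here is a minimal Javascript sketch (runnable with Node.js) that
splits one ‘combined’ log entry into the fields listed in the format string. The regular
expression and the function names are assumptions made for this example; it is not a complete
log parser (for instance, it does not handle escaped quotes inside a field).

```javascript
// One entry in the 'combined' format:
// %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
const COMBINED =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/;

// Classify a status code by its first digit, as in the %>s row of the table.
function describeStatus(status) {
  switch (Math.floor(status / 100)) {
    case 2: return "success";
    case 3: return "redirection";
    case 4: return "client error";
    case 5: return "server error";
    default: return "other";
  }
}

// Return the fields of one log entry, or null if the line has another format.
function parseLogLine(line) {
  const m = COMBINED.exec(line.trim());
  if (m === null) return null;
  // The request line (%r) is "METHOD resource PROTOCOL".
  const [method, resource, protocol] = m[5].split(" ");
  const status = Number(m[6]);
  return {
    host: m[1],
    time: m[4],
    method,
    resource,
    protocol,
    status,
    statusClass: describeStatus(status),
    bytes: m[7] === "-" ? 0 : Number(m[7]), // '-' means no content was returned
    referer: m[8] === "-" ? null : m[8],
    userAgent: m[9],
  };
}

const sample =
  '127.0.0.1 - - [12/Dec/2018:16:04:46 +0100] "GET /style.css HTTP/1.1" 304 - ' +
  '"http://127.0.0.1/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) ' +
  'Gecko/20100101 Firefox/63.0"';
console.log(parseLogLine(sample));
```

Applying the same function to every line of the access log gives one record per request, which
is the starting point for the questions below.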

Please analyze the previous Apache log entries and find out where the information that is
useful for tracking users is located in each entry. Finally, try to answer the following
questions after analyzing the log file:

● What is the ‘IP address’ of all the HTTP client requests?
● Did the Apache ‘Web Server’ serve the ‘favicon.ico’ image to the client?
● Why are the ‘HTTP-referer address’ and the ‘IP address’ the same?
● Which ‘Web Browser’ and O.S. (Operating System) have been used by the client?

Exercise 2. Web design.

CONCEPTS

HTML, CSS, Javascript, PHP

TASKS

Analyze a ‘static’ and a ‘dynamic’ web page.
● Analyze the HTML content of a Web page.
● Create a ‘welcome page’ and analyze it (head, body, tags, scripts, css).
Analyze your first Form.
● Create/view a form to send data to the Server.
● Watch the data that reaches the Server.
Analyze your first ‘dynamic’ Web page:
● Customization of a ‘welcome page’ with PHP.
● Form data validation.

‘Web pages’ have been evolving over time as more and more types of ‘Web content’
(multimedia, semantic data, graphics, etc.) needed to be integrated into them. This increased
the complexity of Web Pages, and HTML code became really unmanageable. To
address the problem, the W3C (https://w3.org) developed a number of standards to rationalize
the creation of ‘Web content’. There are basically three Web standards and technologies that
you need to know to create and design ‘Web Pages’:

WEB STANDARD   WEB ASPECT     DESCRIPTION

HTML5          Content        The structure of the content of a ‘Web Page’
(https://en.wikipedia.org/wiki/HTML5)

CSS            Presentation   The presentation of the content of a ‘Web Page’
(https://en.wikipedia.org/wiki/Cascading_Style_Sheets)

Javascript     Interaction    The interaction with and within the ‘Web Page’
(https://en.wikipedia.org/wiki/JavaScript)

On their own, these standards only allow building ‘static’ pages, that is, pages whose content
is served exactly as it is stored on the server. To make that content ‘changeable’ you need to
integrate some kind of server-side programming language into the ‘Web Page’, which then
becomes a ‘dynamic’ page. There are a lot of programming languages prepared for Web
Development that can be used to build ‘dynamic’ ‘Web Pages’, and they are essential for
building today’s ‘Web Applications’. Here are some of the most popular (we will use PHP in
this laboratory):

LANGUAGE   REFERENCE URL

Java       https://www.oracle.com/es/java/
Python     https://www.python.org/
PHP        http://php.net/

Finally, when dealing with ‘Web Development’, it’s quite convenient to have a basic reference
on HTML (and other Web technologies) to answer the questions that will arise in the future.
In that sense, it is worth remembering this ‘Web Site’:

● http://www.w3schools.com

This is one of the most popular ‘Reference Sites’ for ‘Web development’ on the Internet, not only
to learn ‘Web technologies’ but also as a reference tool. Please, before starting the next
exercise, read and try to understand all the previous terms, concepts and technologies.

Ok, let’s go. First of all, open in a Web Browser either of the following URLs:
http://127.0.0.1/Demo.html or http://localhost/Demo.html (NOTE: typing 127.0.0.1 is
equivalent to typing ‘localhost’). Notice that both point to the same ‘Web Page’. In that page,
you will see a selection of Web content that you can create with HTML, CSS and
Javascript to build ‘Web Pages’. Please select any piece of Web content on the page and
click on the ‘View source code’ menu option to see the HTML code of that piece of ‘Web content’.
As you will notice, HTML is able to aggregate many kinds of data, like text, graphics or
multimedia. These are the main characteristics common to all ‘Web Pages’:

● The ‘structure’ of the content of a ‘Web Page’ is defined by HTML <tags> that define the
type of the content, so that:
○ Any content should be surrounded by an HTML <tag>.
○ HTML <tags> can have attributes to extend their properties.
○ The <head> section is not visible; it is for ‘meta-data’. The <body> section is
the one that holds the visible content.
● The ‘presentation’ of the content is carried out by CSS rules placed inside the <style>
tag or within attributes (like ‘style’) of an HTML <tag>.
○ It is also usual to attach CSS code from an external file by using a ‘link’ to the
file that contains it.
● The ‘interaction’ of the user with the ‘Web Page’ and vice versa is achieved by the
use of Javascript scripts, which can be placed as content of the HTML <script>
tag or in an attribute of an HTML <tag> (as happens when a ‘Button’ is used).
○ It is also usual to attach Javascript code from an external file by using a ‘link’
to the file that contains it.

Let’s create our first ‘Web Page’. Create a New file (Welcome.html). It will be your first
‘Welcome Page’ and in it you are going to add the following content:
● A centered ‘heading’ with your Name.
● An image of you, aligned to the right.
● A table with this data: Age, Address, Phone Number, Email address.
● A title and a paragraph with your presentation.
● A title and a paragraph with your ‘hobbies’.

Please feel free to change the appearance of the page by adding some styles (colours, fonts,
sizes, etc.). Finally, place the file in the ‘root’ directory of the ‘Web Server’ and access it
through the corresponding URL in the ‘Web Browser’ to see the result.

Note: in the virtual machine that is provided for this lab, web pages should be placed in the
following folder:

/var/www/html

The main mechanism for ‘Web Pages’ to send data to the ‘Web Server’ is the HTML
<form> tag. ‘Forms’ can aggregate multiple data fields that collect the user’s
information. When all the data is ready to be sent, the user clicks on a ‘Submit’ button and the
Browser sends it to the Server. There are several examples of forms in the ‘Demo.html’ page
to practice with. Please also press ‘F12’ to open ‘Web Developer Tools’ so you can see how
the ‘Web Browser’ creates an HTTP request every time you click on the ‘Submit’ button of a
form. You will also be able to see what data reaches the ‘Web Server’ in each request by
looking at the result page. For example, go to the first ‘form’ of the Demo page (shown in
the next figure):

After clicking on the ‘Submit’ button, you will see the following result page:

As you will notice, you can see both the client HTTP request ‘parameters’ that have been sent
by the ‘Web Browser’ (firstname, lastname) and how they have been collected by the
‘Web Server’, as shown in the result page. Remember that ‘Demo.html’ is known as a ‘static’
‘Web Page’ (https://en.wikipedia.org/wiki/Static_web_page) because the server always delivers
its content as stored, without changing it. To generate changing content, you need a ‘Web
Server’ with programming capabilities. Such pages are known as ‘dynamic’ ‘Web Pages’ and
generally combine ‘static’ content with content generated by some kind of
programming language (https://en.wikipedia.org/wiki/Dynamic_web_page). In this case, PHP
is used to generate ‘dynamic’ pages such as the result page that shows the
parameters of the ‘form’. The name of the ‘dynamic’ page that collects the form data and
shows it back to the user is ‘action_page.php’. You can open it in an editor to see the code
and, as you will notice, the page combines ‘static’ code (HTML) with ‘dynamic’ code by
inserting PHP code within it.
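
To make the form parameters more concrete, the following Javascript sketch (runnable with
Node.js) shows how a pair of fields is encoded before travelling in the HTTP request and how
it is decoded again on arrival. The field names come from the demo form; the values, the
helper name and the use of the GET method are assumptions made for this example, so the real
Demo page may behave differently.

```javascript
// Field names taken from the demo form; the values are made up for the example.
const fields = { firstname: "Ada", lastname: "Lovelace Byron" };

// URLSearchParams applies the application/x-www-form-urlencoded encoding
// that browsers use when they submit a form.
const encoded = new URLSearchParams(fields).toString();
console.log(encoded);

// Assuming the form uses method="get", the pairs travel in the URL after the '?'.
// With method="post" the same string would travel in the request body instead.
const requestUrl = "http://127.0.0.1/action_page.php?" + encoded;
console.log(requestUrl);

// On arrival, the string is decoded back into named parameters
// (PHP exposes them to the page as $_GET or $_POST).
function decodeForm(queryString) {
  return Object.fromEntries(new URLSearchParams(queryString));
}
console.log(decodeForm(encoded));
```

Note how the space in the last name is encoded on the way out and restored on the way in;
this is what the ‘Network’ tab shows under the request parameters.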

Let’s practice a bit more with ‘dynamic’ pages. Let’s add a ‘Terms Of Service’ check to control
the login on a ‘Web Site’. The ‘login form’ we are now considering will look like this:

As you see, there is a new ‘field’ in the ‘form’, a ‘check box’ to assert a condition from the
user. When you click on the ‘Submit’ button to send the form data to the ‘Web Server’, the
page will read the state of the ‘check box’ in order to decide whether to grant access to the
user. If the ‘Terms of Service’ have not been accepted by the user, an ‘error page’ will notify
the user:

But, on the other hand, if the ‘check box’ has been selected by the user, then the following
‘success page’ should be shown after sending the ‘form’:

At the following URL: http://127.0.0.1/registration_form.php you can access and view the
source code of a ‘dynamic’ PHP page that builds a ‘Login form’ but without the ‘check box’
control. Think about the changes that are needed in a new page
(registration_form_check.php) so that it includes the ‘check box’ and the
corresponding ‘dynamic’ code to generate the new result page as
described previously. Save it in the ‘root’ directory of the ‘web server’ and test the page with
your ‘Web Browser’. You can see the solution by opening the ‘dynamic page’
(registration_form2.php) in the ‘root’ directory.
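
The decision the server has to take can be sketched independently of PHP. The following
Javascript function (runnable with Node.js) is only an illustration of the logic: the field
names ‘username’ and ‘tos’ and the messages are hypothetical and are not taken from the lab's
PHP files.

```javascript
// Hypothetical field names ('username', 'tos'); the lab's own pages may use others.
function checkRegistration(body) {
  const form = new URLSearchParams(body);
  // An unchecked check box is simply absent from the submitted data,
  // so testing for its presence is enough.
  if (!form.has("tos")) {
    return { ok: false, message: "You must accept the Terms of Service." };
  }
  return { ok: true, message: "Welcome, " + form.get("username") + "!" };
}

console.log(checkRegistration("username=alice"));        // check box not selected
console.log(checkRegistration("username=alice&tos=on")); // check box selected
```

A check box without a ‘value’ attribute is submitted with the value ‘on’, which is why the
second call uses ‘tos=on’.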

Exercise 3. ‘Web usage mining’ techniques.

CONCEPTS

Cookies

TASKS

Client Server Logging (‘Cookies’)
● Create a ‘client’ cookie from the client and from the server.
● Session ‘cookie’. Implement a page counter.
● ‘Third-party’ cookie. Integration with a ‘third-party tracking pixel’.

‘Web usage mining’ or ‘Web analytics’ is the process of analysing users’ behaviour when
visiting a ‘Web Site’, so it involves the collection of web data related to them. To do so, we
need mechanisms to aggregate the activity of the user during a ‘dialog’
(session) with the ‘Web Server’. But here a ‘small but great’ inconvenience appears
that makes things considerably harder. As you will probably remember, HTTP is a ‘stateless’
protocol, that is, the server keeps no memory of previous requests from a client and handles
every request independently of the others. This has advantages (less management
overhead), but it is a major drawback for tracking user activity because, as
the protocol manages no user-session data, another mechanism is needed to
do it. Apart from reading and analysing the ‘Web Server’s’ log files to track users (we have just
practiced it in the first exercise), there are other techniques that come to our aid, with the
additional benefit that, as they are executed on the ‘client side’, they can provide data with
improved accuracy. These techniques are based on the use of ‘Cookies’.

‘Cookies’ are ‘little pieces of information’ that are stored on the client's side, more precisely in
the Web Browser, while the user is browsing. They are the main mechanism to manage the
state of a web dialog with a ‘Web Server’. They are also sometimes a cause of ‘privacy’
concerns, as they can be used ‘without’ the user’s control.
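
In practice, a cookie is just a ‘name=value’ pair plus some optional attributes. The following
Javascript sketch (runnable with Node.js) builds the string that a page script would assign to
‘document.cookie’ and parses the ‘Cookie’ header that the browser later sends to the server.
The helper names and the chosen attributes are assumptions made for this example.

```javascript
// Build the string that a page script would assign to document.cookie.
function buildCookie(name, value, maxAgeSeconds) {
  let cookie = encodeURIComponent(name) + "=" + encodeURIComponent(value);
  // Without Max-Age (or Expires) the cookie lasts only until the browser is closed.
  if (maxAgeSeconds !== undefined) cookie += "; Max-Age=" + maxAgeSeconds;
  return cookie + "; Path=/";
}

// Parse the 'Cookie' request header that the browser sends back to the server.
function parseCookieHeader(header) {
  const cookies = {};
  for (const pair of header.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // skip fragments that are not name=value pairs
    const name = decodeURIComponent(pair.slice(0, eq).trim());
    cookies[name] = decodeURIComponent(pair.slice(eq + 1).trim());
  }
  return cookies;
}

console.log(buildCookie("cookie1", "value1", 3600));
// In a browser: document.cookie = buildCookie("cookie1", "value1", 3600);
console.log(parseCookieHeader("cookie1=value1; PHPSESSID=abc123"));
```

Only the name and the value travel back to the server; attributes such as ‘Max-Age’ or ‘Path’
stay in the browser and control when the cookie is sent.
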

Here are the main terms and concepts that we are going to practice in this exercise. Please
skim them before starting so you can understand the context of the lab.

TERM            DEFINITION

Web analytics   https://en.wikipedia.org/wiki/Web_analytics
Cookie          https://en.wikipedia.org/wiki/HTTP_cookie

Ok, let’s create our first ‘cookie’. Please open the following URL:
http://127.0.0.1/client_cookie.html, then press ‘F12’ to activate ‘Web Developer Tools’,
activate the ‘Application’ Tab (‘Storage’ in Firefox) and select ‘cookies’ as shown in the
next image (Firefox):

Please enter a pair of values (‘cookie1’, ‘value1’) in the corresponding
fields (‘name’, ‘value’) and then click on the ‘Set Cookie’ button. You will notice that a new
‘cookie’ has been set in the ‘Web Browser’ as follows:

Now, before sending the ‘cookie’, please activate the ‘Network’ Tab and then click on the ‘Set
Cookie’ button. Finally, click on the HTTP client request entry to visualize the related data
and open the ‘cookies’ Tab. You will be able to see how the cookie has been added to the
corresponding HTTP request header and that the ‘Web Server’ has received it, as shown in the
next figure:

This ‘cookie’ has been set in the context of the client’s Web Browser by using a Javascript
script, and this kind of ‘cookie’ is usually known as a ‘Javascript cookie’
(https://www.w3schools.com/js/js_cookies.asp), but there is also another way of creating
cookies, from the Web Server’s side. To do so, we will need to use a programming language that
supports them, like PHP. Now, open the next URL: http://127.0.0.1/server_cookie.php (please
don’t close the ‘Web Developer Tools’ window) and you will see the next page:

This ‘dynamic’ Web Page (server_cookie.php) sets a client ‘cookie’, but this time from the
‘Web Server’ side. Please notice that, as there is a previously created ‘cookie’ in the ‘Web
Browser’, it has also been sent within the HTTP request, while the new ‘server cookie’ has been
set on the Web Server’s side and sent back to the client in the HTTP response to be created in
the ‘Web Browser’. Now, if you reload the same ‘Web page’ (server_cookie.php) and then
select the ‘Storage’ Tab of ‘Web Developer Tools’, you will see that the new ‘server cookie’
has already been set up:

‘Cookies’ are also widely used to manage ‘HTTP sessions’
(https://en.wikipedia.org/wiki/Session_(computer_science)), in which case they are usually
called ‘HTTP session tokens’. Here are some basic characteristics of ‘sessions’ and ‘cookies’:

● A ‘session’ is stored on the server and is unique for each client.
● When a ‘session’ is created, a ‘session cookie’ with the unique id is stored in the client's
‘Web Browser’. If cookies are not supported, the ‘session Id’ is appended to the URLs of
subsequent requests instead (URL Rewriting).
● A ‘session’ can store data from the client in a server-side map array ($_SESSION).
Let’s see how to combine ‘cookies’ and ‘sessions’ to count the number of times a page has
been reloaded by an HTTP client. Please open the following URL:
http://127.0.0.1/page_count.php and don’t close the ‘Web Developer Tools’ window, to be able
to watch the ‘cookies’ Tab. You should see the following result:

This ‘dynamic’ PHP page (page_count.php) creates a new ‘session’ at the ‘Web Server’,
which leads to the creation of the corresponding ‘HTTP session’ token, that is, a ‘session-cookie’
(‘PHPSESSID’) that stores the value of a unique ID (‘qd8sba4bchd74cbpvt8ggggfa5’)
bound to the new ‘session’ that has been created in the server’s map. In that map-array, a
counter variable (‘page_count’) has been stored, and it is incremented every time
the user reloads the page. So, if you reload the page, the page-counter will increase by
one. To ‘reset’ the session-counter you can click on the ‘Destroy Session’ link; you will
notice that the ‘session-cookie’ has disappeared because the ‘session’ has been dropped. If
you return to the initial page you can check that a new ‘session-cookie’ has been created
with a different ‘Id’.
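
The behaviour just described can be sketched in a few lines of Python. This is not the source of page_count.php, which is not reproduced here; it is a minimal model under stated assumptions: the dictionary `SESSIONS` plays the role of the server-side map, and the function names `count_page_view` and `destroy_session` are hypothetical.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side session store: session id -> per-client data
# (it plays the role of $_SESSION in this sketch).
SESSIONS = {}
COOKIE_NAME = "PHPSESSID"


def _session_id_from(cookie_header):
    """Return the session id carried in a Cookie request header, or None."""
    morsel = SimpleCookie(cookie_header or "").get(COOKIE_NAME)
    return morsel.value if morsel else None


def count_page_view(cookie_header):
    """Handle one page request; return (response_headers, page_count)."""
    session_id = _session_id_from(cookie_header)
    headers = []
    if session_id not in SESSIONS:
        # Missing or unknown id: create a session and send its id as a cookie.
        session_id = secrets.token_hex(13)
        SESSIONS[session_id] = {"page_count": 0}
        headers.append(("Set-Cookie", f"{COOKIE_NAME}={session_id}; Path=/"))
    SESSIONS[session_id]["page_count"] += 1
    return headers, SESSIONS[session_id]["page_count"]


def destroy_session(cookie_header):
    """Drop the session and return a header that expires the cookie."""
    SESSIONS.pop(_session_id_from(cookie_header), None)
    return [("Set-Cookie", f"{COOKIE_NAME}=; Path=/; Max-Age=0")]
```

Calling `count_page_view("")` models the first visit: it returns a `Set-Cookie` header and a count of 1. Passing that cookie back on later calls increments the counter without issuing a new cookie, and after `destroy_session` the next request receives a fresh id.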

Until now, all the ‘cookies’ stored at the client’s Browser have come from the ‘Web
domain’ the user has visited. But, as any ‘Web Page’ can aggregate multiple contents and
those can be hosted at other foreign ‘Web domains’, these domains can also set and read their
own ‘cookies’. These are known as ‘third-party cookies’ and they are commonly used for
‘Web tracking’ of users. For instance, imagine you are an ‘Advertisement Provider’ that wants
to keep a record of how many times your own ‘ad-banner’ in a page has been viewed by web users.
This could be achieved by placing a ‘cookie’ whenever the page that contains the banner is
accessed. However, for security reasons a ‘Web Browser’ enforces the ‘Same-origin policy’
( https://en.wikipedia.org/wiki/Same-origin_policy ), which restricts how content from one
origin can interact with content from another, so the page of one host cannot simply read or
write the ‘cookies’ of a ‘third-party’ host. Embedded resources such as images, however, may be
loaded from any host, and an old, well-known technique called ‘Pixel Tag’ takes advantage of
this: a small ‘invisible’ image is added to a ‘Web Page’, its content is created ‘dynamically’ by
the ‘third-party’ Server, and the response to that request can be used to ‘track’ the user, for
example, by setting the corresponding ‘third-party cookie’.
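
To make the ‘Pixel Tag’ idea concrete, here is a minimal Python sketch of what a ‘third-party’ server could do when the invisible image is requested. It is an illustration under assumptions, not the script used in this lab: the cookie name `visitor_id`, the in-memory `VIEWS` counter and the function names are all hypothetical.

```python
import html
import uuid
from http.cookies import SimpleCookie

# A 42-byte transparent 1x1 GIF, written out as hex.
PIXEL_GIF = bytes.fromhex(
    "474946383961"          # "GIF89a" signature
    "01000100800000"        # 1x1 logical screen with a 2-colour table
    "000000ffffff"          # colour table: black, white
    "21f9040100000000"      # graphic control extension (index 0 transparent)
    "2c000000000100010000"  # image descriptor: 1x1 image at (0, 0)
    "020144003b"            # image data and trailer
)
COOKIE_NAME = "visitor_id"  # hypothetical third-party cookie name
VIEWS = {}                  # visitor id -> number of banner views


def pixel_tag(tracker_url):
    """Return the HTML that the main page embeds to call the tracker."""
    return f'<img src="{html.escape(tracker_url)}" width="1" height="1" alt="">'


def serve_pixel(cookie_header):
    """Return (headers, body) for one request for the tracking pixel."""
    morsel = SimpleCookie(cookie_header or "").get(COOKIE_NAME)
    visitor_id = morsel.value if morsel else uuid.uuid4().hex
    headers = [("Content-Type", "image/gif"), ("Cache-Control", "no-store")]
    if morsel is None:
        # First contact: hand the browser an identifier to send back later.
        headers.append(("Set-Cookie", f"{COOKIE_NAME}={visitor_id}; Path=/"))
    VIEWS[visitor_id] = VIEWS.get(visitor_id, 0) + 1
    return headers, PIXEL_GIF
```

The browser only ever sees an image, yet each request lets the ‘third-party’ server recognise the visitor and count one more view. Note that current browsers often restrict ‘third-party cookies’ by default, so the behaviour observed in practice depends on the browser and its settings.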

To practice with ‘third-party’ cookies we need to set up an alternative ‘Web
Server’ to simulate such a ‘third party’. The next diagram shows the components that we have
to deploy, as well as the ‘cookies’ (first-party and third-party) that will be created in each
context:

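[Diagram: the ‘Web Browser’ connects to the ‘Main-host’ Web Server, which serves the Web
Page, and to the ‘Ad-host’ Ad Server, which serves the Ad. Each server listens on its own port
and sets its own cookie in the browser.]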

For this exercise we will have an ‘Ad Server’ running on a remote computer. You will therefore
access two running ‘Web Servers’: the ‘Main-host’ (which hosts the main Web Page),
running on your computer, and the ‘Ad Server’ (which hosts the ‘Ad banner’), running on a
remote computer. Both of them will set ‘server’ cookies, but only the one created by the ‘Ad
Server’ is a ‘third-party cookie’.

Please, open a ‘Web Browser’, press ‘F12’ to run ‘Web Developer Tools’ and activate the
‘Network’ Tab. Next, open the following URL in the ‘Web Browser’:
http://127.0.0.1/3rd_cookie.php. This ‘Web page’ simulates a page that first sets a ‘cookie’
from the ‘Web server’ and then adds a ‘banner’ that the ‘Ad-server’ wants to track. To do so,
the banner also includes a snippet of code (‘pixel tag’) that calls the ‘3Pcookie.php’ ‘dynamic
page’, which creates the second, ‘third-party’ cookie at the client. In the ‘Network’ Tab you can
check the HTTP connection that was made to set the ‘third-party cookie’.
If you open the ‘cookie’ Tab, you will see both cookies (first-party and third-party):

Finally, please open the source of the ‘dynamic’ page (3rd_cookie.php) and find the
HTML code that inserts the ‘Pixel Tag’. Analyze the source URL of the image tag
(<img>) and you will find that it does not point to an image file but to a PHP file, which is the
script that sets the ‘third-party cookie’.
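
The same inspection can be done programmatically. The Python sketch below lists every <img> tag whose source does not end in a common image extension, which is how a ‘Pixel Tag’ pointing to a script would stand out. The class name, the function name and the extension list are choices made for this example, and the check is only a heuristic: a tracker could just as well be served under a URL ending in .gif.

```python
from html.parser import HTMLParser
from posixpath import splitext
from urllib.parse import urlsplit

# Extensions treated as "real" image files in this sketch.
IMAGE_EXTENSIONS = {".gif", ".png", ".jpg", ".jpeg", ".svg", ".webp"}


class PixelTagFinder(HTMLParser):
    """Collect the src of every <img> that does not look like an image file."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # Look only at the path, so query strings do not hide the extension.
        extension = splitext(urlsplit(src).path)[1].lower()
        if src and extension not in IMAGE_EXTENSIONS:
            self.suspects.append(src)


def find_pixel_tags(html_source):
    """Return the suspicious image sources found in an HTML document."""
    finder = PixelTagFinder()
    finder.feed(html_source)
    finder.close()
    return finder.suspects
```

Feeding the function the page source shown by the browser’s ‘View Page Source’ option should return the URL of the tracking script and leave ordinary images out.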
Assignment
Students must develop a small web site. The topic of the web site can be freely chosen.
Examples of possible topics include a particular sport or sports team, videogames, music,
historic buildings, literature, travel, hobbies, etc. Each web site must include a home page
from which visitors start browsing the site. Your web site must be composed of at
least 6 HTML web pages (including the home page). Some web pages must
contain links to more than one web page within your web site, and some web pages must be
reachable from several web pages. An example of a possible structure for the links in your web
site is shown in the following figure, where node A would be the home page.

[Figure: Example of a possible link structure for the assignment, with home page A and
pages B, C, D, E and F.]

The figure above is just an example; you do not need to follow it. It is not necessary to develop
web pages with tons of multimedia content and a professional web design, but the web pages
are expected to contain at least some text content and reasonable formatting, not just the
links (you can copy and paste the formatting of the Demo.html file). You can include images
and other multimedia files in the web pages. You can also include external links.

The developed web pages should be placed in a folder called assignment. Thus, the URL
of the home page should be

http://localhost/assignment/index.html

This folder is already included in the virtual machine that is provided for this lab.

Next, you have to perform 3 visits to your web site. Following the usual conventions in web
usage mining, at least 30 minutes should elapse between the last web page visited within a
visit to your web site and the first web page accessed in the next visit. All the visits must start
at the home page of your web site. Each visit must follow a different path within
your web site, so that every web page is visited at least once.

Then, you have to go to the web log and identify the three visits to your web site, as well as
the sequence of web pages visited within each visit. It is not a problem if your web log contains
other entries (for instance, those related to the exercises we have done in this lab).

Finally, you have to write a short report. In your report, include the name and NIA of the
members of the group (groups of one or two students are allowed), and describe the topic you
have chosen for your web site and its link structure. You must also indicate the lines in your
web log corresponding to each visit, as well as the sequence of web pages visited in each
visit to your web site.
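
The 30-minute convention can be applied to a log automatically. The Python sketch below is one possible way to do it, not a required part of the assignment. It assumes that the web log uses the Apache Common Log Format (or the Combined format, which starts with the same fields) and that requests to the site share the /assignment/ path prefix; the function name `split_into_visits` is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Leading fields of a Common Log Format entry (assumed log format).
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)
VISIT_GAP = timedelta(minutes=30)


def split_into_visits(log_lines, prefix="/assignment/"):
    """Group requests under `prefix` into visits, per client host.

    Returns a list of visits; each visit is a list of
    (log line number, requested path) pairs in log order.
    """
    visits = []
    open_visit = {}  # client host -> (time of last request, current visit)
    for number, line in enumerate(log_lines, start=1):
        match = LOG_LINE.match(line)
        if not match or not match["path"].startswith(prefix):
            continue  # malformed line or a request outside the site
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        previous = open_visit.get(match["host"])
        if previous is None or when - previous[0] >= VISIT_GAP:
            visit = []  # a gap of 30 minutes or more starts a new visit
            visits.append(visit)
        else:
            visit = previous[1]
        visit.append((number, match["path"]))
        open_visit[match["host"]] = (when, visit)
    return visits
```

Passing the lines of the access log, for example the result of `open(path).readlines()` for whatever path the log has on your system, yields one list per visit with the line numbers needed for the report. Requests for images or stylesheets under the same prefix are included too, so you may want to keep only the .html entries.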

You have to submit the content of your assignment folder, compressed in a zip file, the report
in PDF format, and the web log.
