Lab 0_ Hypertext _ Web protocols Lab
General Information.
Concepts: Hypertext, Web Browser, Web Page, Web Server, Web Site, URL, HTML,
XML, HTTP, MIME, CSS, Javascript, Cookies, PHP, Page tagging, Web
Analytics, Log Server analysis.
Location: 0.A.06
Introduction.
In the beginning, the ‘World Wide Web’ (WWW) was designed basically as a mechanism to
share documents (‘Hypertext’), also known as ‘Web Pages’, which contain references
to other documents through links (‘Hyperlinks’). These ‘Web Pages’ are usually written
in a markup language (HTML, ‘Hypertext Markup Language’) and are shared
and accessed through the Internet by a client-server protocol (HTTP, ‘Hypertext Transfer
Protocol’). The ‘Web Browser’ is the HTTP client application that asks, on behalf of
the user, for a ‘Web Page’ that is located by its ‘address’ (URL, ‘Uniform Resource Locator’).
If the ‘Web Page’ contains ‘hyperlinks’, the linked documents can also be retrieved through their
corresponding URLs. Any ‘URL’ includes a reference that points to the ‘Web Server’, the
counterpart HTTP server application that listens for clients’ requests. The server maps the logical URL
‘address’ of the ‘Web Page’ to a physical file on the server and sends it back to the ‘Web
Browser’, which parses the received HTML document, fetches additional content (like
images) if necessary, and finally renders it for the user.
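As a rough illustration of how a URL identifies both the ‘Web Server’ and the resource requested from it, the following Javascript sketch splits a URL into its parts with the standard `URL` class (available in browsers and in Node.js). The `describeUrl` helper and the `lang` parameter are assumptions made for this example; they are not part of the lab sources.

```javascript
// Sketch: split a URL into the parts a browser uses to contact a server.
// `describeUrl` is a hypothetical helper; it only handles http and https.
function describeUrl(address) {
  const url = new URL(address);
  return {
    scheme: url.protocol.replace(":", ""), // e.g. "http"
    host: url.hostname, // which server to contact
    // An empty `port` means the scheme's default (80 for http, 443 for https).
    port: url.port || (url.protocol === "https:" ? "443" : "80"),
    path: url.pathname, // logical address of the resource on that server
    query: Object.fromEntries(url.searchParams), // optional parameters
  };
}

console.log(describeUrl("http://127.0.0.1/index.php?lang=en"));
```

The `path` part is what the server maps to a file under its root directory; how that mapping is configured depends on the server.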
The ‘WWW’ has evolved by adding new capabilities to ‘Web Pages’
in order to increase both the transactional capabilities and the user interactivity of the
‘Web’. The former was driven by the boom of e-Commerce and the latter by the development
of ‘rich-multimedia’ content and the emergence of ‘Social Networks’. To that end, a number of
standards have been developed and incorporated into the ‘Web’, such as Javascript, CSS
(Cascading Style Sheets), MIME (Multipurpose Internet Mail Extensions), XML (eXtensible
Markup Language), Web Services, etc. As HTTP is a ‘stateless’ protocol, it is convenient to
have some kind of ‘State Management Mechanism’ for building sessions between clients and
servers, and the ‘HTTP Cookie’ is the standard designed to achieve it. The appearance of
programming languages such as PHP or Java also led to the creation of ‘dynamic Web
Pages’, the original prototype of today’s popular ‘Web applications’, which can be considered
an aggregation of static and dynamic ‘Web Pages’ and other Web ‘resources’ (images,
audio, video, etc.). A ‘Web Site’ can be considered an aggregation of ‘static’ and ‘dynamic’
Web content.
In the context of ‘Web Analytics’, several techniques have been developed to track the activity
of users through their use of ‘Web Pages’. These techniques will be studied in the lectures on
Web Usage Mining. In this lab, we will learn the basic Web technologies that can be used to
track user activity: server logs, Javascript, and cookies.
Goals.
The goals of this lab are both to understand and to practice basic concepts, technologies, and
techniques related to the ‘World Wide Web’ (WWW) in the context of this subject. First of all,
to be able to set up the laboratory, we need some background knowledge of Web Architecture
and its protocols in order to understand how ‘Web applications’ work. This includes practicing
with concepts like Web Pages, Web Browsers, Web Servers, the HTTP protocol, URI-URL, and
MIME.
Secondly, it is essential to gain further knowledge of some Web design standards and
programming languages to be able to create both static and dynamic Web Pages. This will
include some of the main standards in Web Development such as HTML, CSS and Javascript.
Source code.
The source code for the laboratory can be downloaded from Aula Global (file sources.zip).
Set up
As we are going to practice with ‘Web Applications’, we need to set up a
minimal infrastructure to work with such applications. Any ‘Web Application’ is composed of
two main components:
● A ‘Web Client’, that is, an HTTP client. In our case, we will use a ‘Web Browser’;
you can use whichever you like (Firefox, Chrome, Internet Explorer, Microsoft Edge).
● A minimal ‘Internet Server infrastructure’ to deploy ‘Web Applications’, based on a
‘Web Server’ (we will use the well-known Apache) combined with some extra
components, such as a programming language and a ‘Relational Database Management
System’ (RDBMS) to store application data. One of the best-known packages that
implement this is LAMP (https://en.wikipedia.org/wiki/LAMP_(software_bundle)), which
integrates Linux, Apache, an RDBMS like MySQL, and several programming languages
(Perl, PHP, Python).
There are many other variants of LAMP (WAMP, MAMP, XAMPP), but we are going
to focus on one of these installations for the laboratory, depending on the Operating System
(OS) that you have:
○ Windows users: you don’t need to install anything because we are going to use a
‘portable LAMP’ distribution (USBWebserver). You can download it from Aula Global
(file USBWebserver v10.zip). Please extract the ZIP file into any folder you like and
then just run the ‘.exe’ file (usbwebserver.exe) to start it.
○ Linux users: you only need to install Apache, PHP and MySQL. Please follow these
links depending on your distribution:
○ (Ubuntu) https://ubuntu.com/server/docs/lamp-applications
○ (Debian) https://wiki.debian.org/LaMp
IMPORTANT NOTICE
USBWebserver should be placed on your local disk (and thus, it should not be placed in
OneDrive). Otherwise, it will not work.
For Windows users, it is advised that you first try to do this lab using USBWebserver.
As a backup solution for Windows users who are not able to run USBWebserver on their
computers, and for Mac users who do not have the necessary computer skills, an Oracle
VirtualBox virtual machine is provided. This virtual machine runs the Ubuntu operating
system and includes a LAMP server already installed. The source files used in this lab are
also already included in the virtual machine.
https://www.virtualbox.org/
It is recommended that you create a shared folder between your computer and the virtual
machine (in the Configuration menu of the virtual machine in Oracle VirtualBox), to ease
copying files between your computer and the virtual machine.
After running the ‘usbwebserver.exe’ file, please take a look at the main window.
This is the ‘General configuration’ tab. Please click on the ‘Settings’ tab and the next window
should appear:
Here you can set up both the ‘Root directory’, where we will place the ‘Web Page’ files (and
related Web resources), and the ‘port number’ on which the Apache Web Server will
listen for the clients’ HTTP requests. The following table also lists some other useful
Web server directories for the lab; take them into account for the next exercises:
You can also access these files through the ‘Apache Settings’ tab:
IMPORTANT STEP: now please extract the contents of the source code ZIP file
(sources.zip) into the ‘root’ directory of USBWebserver.
On the client side, we will also need some tool to view, test and debug the HTML
code of a Web Page and the HTTP protocol so we can understand the operation of Web
Applications. Nowadays, all ‘Web browsers’ integrate a set of debugging tools called ‘Web
development tools’ (https://en.wikipedia.org/wiki/Web_development_tools). Although
relatively unknown among general users, they are very useful and we will make extensive use of
them during the laboratory. They are generally built into Web Browsers and
are accessible through the ‘Web Development’ option under the ‘Configuration’ menu. There
is also a common keyboard shortcut to activate them (‘F12’). After pressing it, the main window
of ‘Web Development Tools’ opens and looks like this (in the Firefox Web Browser):
Exercise 1: Web architecture and protocols.
CONCEPTS
Web concepts: Web Page, Web Browser, Web Server
Protocol concepts: HTTP, URI-URL, MIME
TASKS
Client activity: the Web Browser:
● Analyze the source code of a Web Page.
● Open the ‘debugger’ and try to understand it.
Server activity: the Apache Web Server:
● Install/open ‘USBWebserver’.
● Configure the Server: Port, Root Directory, etc.
● Open a URL of the Web Server and analyze it with the debugger.
● Watch the Apache Server’s Log File.
NOTE: Please, first of all, check that the source code for the laboratory is already extracted at
the ‘root’ directory of the Apache Web Server (refer to the ‘Setup’ section).
Before doing anything, try to read and understand the following terms:
TERM DEFINITION
HTTP https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol
HTTPS https://en.wikipedia.org/wiki/HTTPS
URL https://en.wikipedia.org/wiki/URL
MIME https://en.wikipedia.org/wiki/MIME
Ok, let’s start. First of all, check that the ‘USBWebserver’ Web Server is set up and running. Open
a ‘Web Browser’, press ‘F12’ to start ‘Web Development Tools’ and select the ‘Network’ tab
to trace all HTTP requests that will be made to the Web Server. Please type the following
URL in the Web Browser’s address bar: http://127.0.0.1/index.php. You should see the
following ‘Index Web Page’, that is, the entry point to the ‘Web Site’:
Please locate the ‘Index Web Page’ source file (index.php) that has just been served from
the Apache ‘Web Server’s’ ‘root’ directory (hint: find the path in USBWebserver’s
Configuration Panel) and, after looking at the ‘debug’ panel, try to answer the following
questions about what you have seen so far:
Please, click on the first HTTP-request and a new ‘Debug Tab’ will appear on the right where
you can see the full information about the client request (Headers, Cookies, Parameters,
Response and Times).
To see the source code of the ‘Index Web Page’, right-click on the page and choose the
‘View page source’ option of the context menu; a new window will open with
the ‘Web Page’s’ HTML source code:
What you can see is the general HTML code structure of a ‘Web Page’. Please analyze it and
try to answer the following questions (note: don’t panic, we will practice more
with HTML in the next exercise; this is just a ‘first look’ at the anatomy of a Web Page):
● What does the ‘<head>’ tag section mean? And the ‘<body>’ tag section?
● What do the <div>, <a>, <ul> and <li> tags mean?
Please look for Apache’s ‘log directory’ (hint: find it in USBWebserver’s Apache
Settings Panel) and open the ‘access-log’ file.
Note: in the virtual machine that is provided for this lab, Apache’s log files are placed in
/var/log/apache2
The general format of each Apache log entry is specified in the ‘httpd.conf’ file and it is
based on the following sequence of fields (if you see neither the ‘Referer’ nor the
‘User Agent’ field in the log file, please see the NOTE below):
Take a look at the next table, which explains every field of an Apache log entry:
Please analyze the previous Apache log entries and find out where the useful information
to track users is located in each entry. Finally, try to answer the following questions after
analyzing the log file:
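To make the field layout of a log entry more concrete, here is a minimal Javascript sketch that splits one access-log line into named fields. It assumes Apache's widely used ‘combined’ log format (host, identity, user, time, request line, status, bytes, referer, user agent); the format actually configured in your ‘httpd.conf’ may differ. The `parseLogLine` helper and the sample line are made up for illustration.

```javascript
// Sketch: parse one access-log line, ASSUMING Apache's "combined" format:
//   host ident user [time] "request" status bytes "referer" "user-agent"
// The last two fields are optional so "common" format lines also match.
const LOG_LINE =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)(?: "([^"]*)" "([^"]*)")?$/;

function parseLogLine(line) {
  const m = LOG_LINE.exec(line.trim());
  if (!m) return null; // the line does not follow the assumed format
  const [method, path, protocol] = m[5].split(" ");
  return {
    host: m[1], // client IP address or host name
    user: m[3], // authenticated user, "-" if none
    time: m[4], // timestamp as written by the server
    method,
    path,
    protocol,
    status: Number(m[6]),
    bytes: m[7] === "-" ? 0 : Number(m[7]),
    referer: m[8] ?? null, // page that linked to this resource
    userAgent: m[9] ?? null, // browser identification string
  };
}

// Made-up sample entry, not taken from the lab's log file.
const sample =
  '127.0.0.1 - - [10/Oct/2023:13:55:36 +0200] "GET /index.php HTTP/1.1" 200 2326 "-" "Mozilla/5.0"';
console.log(parseLogLine(sample));
```

For tracking purposes, the host, time, referer and user-agent fields are the ones that help tell visitors and visits apart.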
CONCEPTS
HTML, CSS, Javascript, PHP
TASKS
Analyze a ‘static’ and a ‘dynamic’ web page.
● Analyze the HTML content of a Web page.
● Create a ‘welcome page’ and analyze it (head, body, tags, scripts, css)
Analyze your first Form.
● Create/view a form to send data to the Server.
● Watch the data reached at Server.
Analyze your first ‘dynamic’ Web page:
● Customization of a ‘welcome page’ with PHP.
● Data form validation.
‘Web pages’ have evolved over time as more and more types of ‘Web content’
(multimedia, semantic data, graphics, etc.) needed to be integrated in them. This
increased the complexity of Web Pages, and HTML code became really unmanageable. To
address the problem, the W3C (https://w3.org) developed a set of standards to rationalize
the creation of ‘Web content’. There are basically three Web standards and technologies that
you need to know to create and design ‘Web Pages’:
Javascript    Interaction    The interaction with and within the ‘Web Page’ (https://en.wikipedia.org/wiki/JavaScript)
These standards alone only allow you to build ‘static’ pages, that is, pages whose web
content cannot change. To make such content ‘changeable’ you need to integrate some kind of
programming language in the ‘Web Page’, which then goes from being a ‘static’ to a ‘dynamic’
page. There are many ‘programming languages’ suited to Web Development that can be used
to build ‘dynamic’ ‘Web Pages’, and they are the basis of today’s ‘Web Applications’. Here
are some of the most popular (we will use PHP in this laboratory):
LANGUAGE REFERENCE URL
Java https://www.oracle.com/es/java/
Python https://www.python.org/
PHP http://php.net/
Finally, when dealing with ‘Web Development’, it is convenient to have a basic reference
tool on HTML (and other Web technologies) to answer the questions that will
arise in the future. In that sense, this ‘Web Site’ is worth remembering:
● http://www.w3schools.com
This is one of the most popular ‘Reference Sites’ for ‘Web development’ on the Internet, not only
to learn ‘Web technologies’ but also as a reference tool. Please, before starting the next
exercise, read and try to understand all the previous terms, concepts and technologies.
Ok, let’s go. First of all, open in a Web Browser either of the following ‘Web Pages’ located at
these URLs: http://127.0.0.1/Demo.html http://localhost/Demo.html (NOTE: typing 127.0.0.1 is
equivalent to typing ‘localhost’). Notice that both are the same ‘Web Page’. On that page,
you will see a selection of Web content that you can create with HTML, CSS and
Javascript to build ‘Web Pages’. Please select any piece of Web content on the page and
click on the ‘View source code’ menu option to see the HTML code of that piece of ‘Web content’.
As you will notice, HTML is able to aggregate many kinds of data, like text, graphics or
multimedia. These are the main characteristics common to all ‘Web Pages’:
● The ‘structure’ of a ‘Web Page’s’ content is defined by HTML <tags> that define the
type of the content, so that:
○ Any content should be surrounded by an HTML <tag>.
○ HTML <tags> can have attributes to extend their properties.
○ The <head> section is not visible; it is for ‘meta-data’. The <body> section is
the one that holds the visible content.
● The ‘presentation’ of the content is defined by CSS rules placed inside the <style> tag
or within attributes (like ‘style’) of an HTML <tag>.
○ It is also usual to attach CSS code from an external file by using a ‘link’ to the
file that contains it.
● The ‘interaction’ of the user with the ‘Web Page’ and vice versa is achieved through
Javascript scripts, which can be placed as the content of the HTML <script>
tag or in an attribute of an HTML <tag> (as happens when a ‘Button’ is used).
○ It is also usual to attach Javascript code from an external file by using a ‘link’
to the file that contains it.
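The three layers listed above can be seen side by side in a very small page. The following Javascript sketch simply assembles such a page as a string, so that it can be printed or saved to a file; the `buildPage` helper, the element id `msg` and the texts are hypothetical and are not taken from ‘Demo.html’. Note that the helper does not escape its arguments, so it is only suitable for trusted text.

```javascript
// Sketch: assemble a minimal page as a string to show where each layer lives.
// `buildPage` and its contents are hypothetical; they are not from Demo.html.
function buildPage(title, message) {
  return `<!DOCTYPE html>
<html>
  <head>
    <!-- not rendered: meta-data and styles -->
    <title>${title}</title>
    <style>
      h1 { text-align: center; color: navy; } /* presentation (CSS) */
    </style>
  </head>
  <body>
    <!-- rendered: the visible structure (HTML) -->
    <h1>${title}</h1>
    <p id="msg">${message}</p>
    <!-- interaction (Javascript) attached through an attribute -->
    <button onclick="document.getElementById('msg').textContent = 'Clicked!'">
      Click me
    </button>
  </body>
</html>`;
}

console.log(buildPage("Welcome", "Hello from a static page."));
```

Saving the printed text as an ‘.html’ file in the server's root directory and opening it in the browser should show the heading, the paragraph and a button that changes the paragraph when clicked.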
Let’s create our first ‘Web Page’. Create a New file (Welcome.html). It will be your first
‘Welcome Page’ and in it you are going to add the following content:
● A centered ‘heading’ with your Name.
● An image of you, aligned to the right.
● A table with this data: Age, Address, Phone Number, Email address.
● A title and a paragraph with your presentation.
● A title and a paragraph with your ‘hobbies’.
Please feel free to change the appearance of the page by adding some styles (colours, fonts,
sizes, etc.). Finally, place the file in the ‘root’ directory of the ‘Web Server’ and access it
through the corresponding URL in the ‘Web Browser’ to see the result.
Note: in the virtual machine that is provided for this lab, web pages should be placed in the
following folder:
/var/www/html
The main mechanism for ‘Web Pages’ to send data to the ‘Web Server’ is the HTML
<form> tag. ‘Forms’ can aggregate multiple data fields that collect the user’s
information. When all data is ready to be sent, the user clicks on a ‘Submit’ button and the
Browser sends it to the Server. There are several examples of forms in the ‘Demo.html’ page
to practice with. Please also press ‘F12’ to open ‘Web Developer Tools’ so you can see how
the ‘Web Browser’ creates an HTTP request every time you click on the ‘Submit’ button of a
form. You will also be able to see what data reaches the ‘Web Server’ on each request by
looking at the result page. For example, go to the first ‘form’ of the Demo page (shown in
the next figure):
After clicking on the ‘Submit’ button, you will see the following result page:
As you will notice, you can see both the client HTTP request ‘parameters’ that have been sent
by the ‘Web Browser’ (firstname, lastname) and how they have been collected by the
‘Web Server’, as shown on the result page. Remember that ‘Demo.html’ is known as a ‘static’
‘Web Page’ (https://en.wikipedia.org/wiki/Static_web_page) because its content cannot
change. To generate changing web content, you need a ‘Web Server’ with programming
capabilities. Pages generated this way are known as ‘dynamic’ ‘Web
Pages’, which generally combine ‘static’ content with content generated by some kind of
programming language (https://en.wikipedia.org/wiki/Dynamic_web_page). In this case, PHP
is used to generate ‘dynamic’ pages such as the result page that shows the
parameters of the ‘form’. The ‘dynamic’ page that collects the form data and
shows it back to the user is named ‘action_page.php’. You can open it in an editor to see the
code; as you will notice, the page combines ‘static’ HTML code with ‘dynamic’ code by
inserting PHP code within it.
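To visualize what travels in the HTTP request when a form is submitted, the sketch below serializes two fields the way a browser does for the default form encoding (application/x-www-form-urlencoded) and then decodes them again, as the receiving page would. The field names follow the form described above; the values and the helper names `encodeForm` and `decodeForm` are made up. Whether the lab's form uses the GET or the POST method is not assumed here.

```javascript
// Sketch: how form fields are serialized into request parameters.
// `encodeForm` / `decodeForm` are hypothetical helpers for this example.
function encodeForm(fields) {
  // URLSearchParams applies application/x-www-form-urlencoded encoding.
  return new URLSearchParams(fields).toString();
}

function decodeForm(encoded) {
  // The receiving side turns the text back into name/value pairs.
  return Object.fromEntries(new URLSearchParams(encoded));
}

const encoded = encodeForm({ firstname: "Ada", lastname: "Lovelace Byron" });
console.log(encoded); // firstname=Ada&lastname=Lovelace+Byron

// With method="get" this string is appended to the URL after a "?";
// with method="post" the same string travels in the request body.
console.log(decodeForm(encoded));
```

In the ‘Network’ tab of the developer tools, this is the text you should find either in the request URL or in the request body, depending on the form's method.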
Let’s practice a bit more with ‘dynamic’ pages. Let’s add a ‘Terms Of Service’ check to control
the login on a ‘Web Site’. The ‘login-form’ we are now considering will look like this:
As you see, there is a new ‘field’ in the ‘form’, a ‘check box’ with which the user accepts a
condition. When you click on the ‘Submit’ button to send the form data to the ‘Web Server’, the
page will read the state of the ‘check box’ in order to grant further access to the user. If the
‘Terms of Service’ box has not been checked by the user, the ‘error-page’ will notify the
user:
But, on the other hand, if the ‘check box’ has been selected by the user, then the following
‘success page’ should be shown after sending the ‘form’:
At the following URL: http://127.0.0.1/registration_form.php you can access and view the
source code of a ‘dynamic’ PHP page that builds a ‘Login form’ but without the ‘check box’
control. Think about the changes needed to create a new page
(registration_form_check.php) that adds the ‘check box’ to the form
and the corresponding ‘dynamic’ code to generate the new result page as
described previously. Save it in the ‘root’ directory of the ‘web server’ and test the page with
your ‘Web Browser’. You can see the solution by opening the ‘dynamic page’
(registration_form2.php) in the ‘root’ directory.
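The logic of the check can be sketched independently of PHP. The Javascript function below receives the submitted form data as text and decides which result page to show. It relies on one fact about HTML forms: an unchecked check box is simply omitted from the submitted data. The field name `tos`, the helper name and the messages are assumptions for illustration; the lab's solution file may use different names.

```javascript
// Sketch of the server-side check, written in Javascript instead of PHP.
// ASSUMPTION: the check box is named "tos"; the real field name may differ.
function checkRegistration(body) {
  const params = new URLSearchParams(body);
  // Browsers omit an unchecked check box from the submitted data,
  // so testing for the field's presence is enough.
  if (!params.has("tos")) {
    return { page: "error", message: "You must accept the Terms of Service." };
  }
  const name = params.get("firstname") ?? "user";
  return { page: "success", message: `Welcome, ${name}!` };
}

console.log(checkRegistration("firstname=Ada&tos=on")); // success page
console.log(checkRegistration("firstname=Ada")); // error page
```

In PHP, the equivalent test is usually written with `isset()` on the corresponding request parameter.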
Exercise 3: ‘Web usage mining’ techniques.
‘Web usage mining’ or ‘Web analytics’ is the process of analysing users’ behaviour when
visiting a ‘Web Site’, so it involves the collection of web data related to them. To do so, we
need mechanisms to aggregate the activity of a user during a ‘dialog’
(session) with the ‘Web Server’. But here a ‘small but great’ inconvenience appears
that makes things considerably harder. As you will probably remember, HTTP is a ‘stateless’
protocol, that is, the server keeps no memory of previous requests from a client and
treats every request as independent of the others. This has advantages (less
management overhead) but it is a major drawback for tracking user activity because, as
the protocol manages no user-session data, another mechanism is needed to
do it. Apart from reading and analysing the ‘Web Server’s’ log files to track users (as we have
just practiced in the first exercise), there are other techniques that can help, with the
additional benefit that, as they are executed on the ‘client side’, they can provide more
accurate data. These techniques are based on the use of ‘Cookies’.
‘Cookies’ are ‘little pieces of information’ that are stored on the client side, more precisely in
the Web Browser, while the user is browsing. They are the main mechanism to manage the
state of a web dialog with a ‘Web Server’. They are also sometimes a cause of
‘privacy’ concerns, as they can be used ‘without’ the user’s control.
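On the wire, a cookie is just text in HTTP headers: the server sends it in a `Set-Cookie` response header and the browser returns it in a `Cookie` request header on later requests. The following Javascript sketch shows both directions. The helper names and the cookie names and values are invented for the example, and edge cases (quoting, invalid characters, attributes such as `Secure` or `SameSite`) are deliberately left out.

```javascript
// Sketch: the text form of cookies as they travel in HTTP headers.
// Both helpers are hypothetical and skip edge cases (quoting, validation).

// Build the value of a Set-Cookie response header.
function formatSetCookie(name, value, maxAgeSeconds) {
  let header = `${name}=${encodeURIComponent(value)}; Path=/`;
  if (maxAgeSeconds !== undefined) header += `; Max-Age=${maxAgeSeconds}`;
  return header;
}

// Parse the Cookie request header the browser sends back ("a=1; b=2").
function parseCookieHeader(header) {
  const cookies = {};
  for (const pair of header.split(";")) {
    const index = pair.indexOf("=");
    if (index === -1) continue; // skip malformed fragments
    const name = pair.slice(0, index).trim();
    cookies[name] = decodeURIComponent(pair.slice(index + 1).trim());
  }
  return cookies;
}

console.log(formatSetCookie("user", "John Doe", 3600));
console.log(parseCookieHeader("user=John%20Doe; visits=3"));
```

Both headers can be inspected in the ‘Network’ tab of the developer tools, under the headers of each request and response.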
Here are the main terms and concepts that we are going to practice in this exercise; please
skim them before starting so you can understand the context of the lab.
TERM DEFINITION
Cookie https://en.wikipedia.org/wiki/HTTP_cookie
Ok, let’s create our first ‘cookie’. Please open the following URL:
http://127.0.0.1/client_cookie.html, then press ‘F12’ to activate ‘Web Developer Tools’,
open the ‘Storage’ tab (called ‘Application’ in Chrome) and select ‘cookies’ as shown in the
next image (Firefox):
This ‘cookie’ has been set in the client’s Web Browser context by a Javascript script;
cookies of this kind are usually known as ‘Javascript cookies’
(https://www.w3schools.com/js/js_cookies.asp), but there is also another way of creating
them, from the Web Server’s side. To do so, we need to use a programming language that
supports them, like PHP. Now, open the next URL: http://127.0.0.1/server_cookie.php (please
don’t close the ‘Web Developer Tools’ window) and you will see the next page:
This ‘dynamic’ Web Page (server_cookie.php) sets a client ‘cookie’, but this time from the
‘Web Server’ side. Please notice that, as there is a previously created ‘cookie’ in the ‘Web
Browser’, it has also been sent within the HTTP request, while the new ‘server-cookie’ has been
set on the Web Server’s side and sent back to the client in the HTTP response to be created in the
‘Web Browser’. Now, if you reload the same ‘Web page’ (server_cookie.php) and then
select the ‘Storage’ tab of ‘Web Developer Tools’, you will see that the new ‘server-cookie’
has already been set:
Let’s see how to combine ‘cookies’ and ‘sessions’ to count the number of times a page has
been reloaded by an HTTP client. Please open the following URL:
http://127.0.0.1/page_count.php and keep the ‘Web Developer Tools’ window open to be able
to watch the ‘cookies’ tab. You should see the following result:
This ‘dynamic’ PHP page (page_count.php) creates a new ‘session’ on the ‘Web Server’,
which leads to the creation of the corresponding ‘HTTP session’ token, that is, a ‘session-cookie’
(‘PHPSESSID’) that stores a unique ID (‘qd8sba4bchd74cbpvt8ggggfa5’)
bound to the new ‘session’ that has been created in the server’s map. In that map, a
counter variable (‘page_count’) is stored and incremented every time
the user reloads the page. So, if you reload the page, the page counter will be incremented by
one. To ‘reset’ the session counter you can click on the ‘Destroy Session’ link and you will
notice that the ‘session-cookie’ has disappeared, as the ‘session’ has been dropped. If
you return to the initial page you can check that a new ‘session-cookie’ has been created
with another ‘Id’.
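The mechanism described above can be imitated in a few lines of Javascript: the server keeps a map from session ID to session data, and the browser only carries the ID in a cookie. This is a conceptual sketch, not a description of how PHP implements sessions internally. The helper names are hypothetical, the ID generator is not secure, and whether the counter starts at 1 on the first visit is an assumption.

```javascript
// Sketch: a server-side session map keyed by the ID carried in the cookie.
// This mimics the idea behind PHP sessions; it is not how PHP implements them.
const sessions = new Map(); // session ID -> data stored on the server
let issued = 0;

function newSessionId() {
  // Illustrative only: real session IDs must come from a secure random source.
  issued += 1;
  return `s${issued}-${Math.random().toString(36).slice(2)}`;
}

// Handle one page request. `cookieId` is the session cookie value sent by the
// browser (undefined on the first visit). Returns the ID to send back.
function visitPage(cookieId) {
  let id = cookieId;
  if (!id || !sessions.has(id)) {
    id = newSessionId(); // unknown or missing ID: start a new session
    sessions.set(id, { page_count: 0 });
  }
  const data = sessions.get(id);
  data.page_count += 1;
  return { sessionId: id, pageCount: data.page_count };
}

function destroySession(id) {
  sessions.delete(id);
}

const first = visitPage(undefined); // first visit: no cookie yet
const second = visitPage(first.sessionId); // reload: same cookie sent back
console.log(second.pageCount); // 2
destroySession(first.sessionId);
console.log(visitPage(first.sessionId).sessionId !== first.sessionId); // true
```

The last line mirrors what happens after ‘Destroy Session’: the old ID is no longer known to the server, so a new session with a new ID is created.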
Until now, all ‘cookies’ that have been stored in the client’s Browser have come from the ‘Web
domain’ the user has visited. But, as any ‘Web Page’ can aggregate multiple contents and
those can be hosted on other, foreign ‘Web domains’, these domains can also set and read
their own ‘cookies’. These are known as ‘third-party cookies’ and they are commonly used for
‘Web tracking’ of users. For instance, imagine you are an ‘Advertisement Provider’ that wants
to keep a record of how many times your ‘ad-banner’ in a page has been viewed by web users.
This could be achieved by setting a ‘cookie’ when the page that contains the banner is
accessed. But, due to security concerns, a ‘Web Browser’ enforces the ‘Same origin
policy’ (https://en.wikipedia.org/wiki/Same-origin_policy), so it’s not so easy to download Web
content, especially cookies, from a ‘third-party’ host. Nevertheless, there is an old
trick to ‘bypass’ this policy, a well-known technique called ‘Pixel Tag’, that is, a
little ‘invisible’ image added to a ‘Web Page’ whose content is ‘dynamically’ created by
the ‘third-party’ Server and that can also be used to ‘track’ the user, for example, by setting a
corresponding ‘third-party cookie’.
To be able to practice with ‘third party’ cookies we will need to set up an alternative ‘Web
Server’ to simulate such a ‘third party’. The next diagram shows the components that we have
to deploy, as well as the ‘cookies’ (first-party, third-party) that are going to be created in each
context:
[Diagram: the ‘Web Browser’ loads the ‘Web Page’ from the ‘Main-host’ Web Server and the ‘Ad’ from the ‘Ad Server’; each server listens on its own port and sets its own cookie in the browser.]
For this exercise we will have an ‘Ad Server’ running on a remote computer. You will now
access two running ‘Web Servers’: the ‘Main-host’ (which hosts the main Web Page),
running on your computer, and the ‘Ad Server’ (which hosts the ‘Ad banner’), running on a
remote computer. Both of them will set ‘server’ cookies, but only the one created by the ‘Ad
Server’ is a ‘third-party cookie’.
Please open a ‘Web Browser’, press ‘F12’ to run ‘Web Developer Tools’ and activate the
‘Network’ tab. Next, open the following URL in the ‘Web Browser’:
http://127.0.0.1/3rd_cookie.php. This ‘Web page’ simulates a page that first sets a ‘cookie’
from the ‘Web server’ and also adds a ‘banner’ that the ‘Ad-server’ wants to track. To do so,
the banner also integrates a snippet of code (‘pixel tag’) that calls the ‘3Pcookie.php’ ‘dynamic
page’, which creates the second, ‘third-party’ cookie at the client. You can check the HTTP
connection that has been made to set the ‘third-party cookie’.
If you open the ‘cookie’ tab, you will see both cookies (first-party, third-party):
Finally, please open the source of the ‘dynamic’ page (3rd_cookie.php) and find
the HTML code that inserts the ‘Pixel Tag’. Analyze the URL in the source of the image tag
(<img>) and you will find that it does not point to an image file but to a PHP file, the
script that sets the ‘third-party cookie’.
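As a small illustration of what makes a request ‘third-party’, the Javascript sketch below compares the host of the page with the host of an embedded resource. It is a simplification: browsers compare registrable domains (‘sites’) rather than literal host names. The host name `ad-server.example` and the exact markup of the pixel tag are placeholders, not the values used in the lab's ‘3rd_cookie.php’.

```javascript
// Sketch: decide whether an embedded resource is "third-party" with respect
// to the page that embeds it. Simplification: hosts are compared literally,
// whereas browsers compare registrable domains ("sites").
function isThirdParty(pageUrl, resourceUrl) {
  const page = new URL(pageUrl);
  // Relative URLs such as "banner.png" resolve against the page itself.
  const resource = new URL(resourceUrl, pageUrl);
  return page.hostname !== resource.hostname;
}

// Hypothetical pixel tag: a 1x1 image whose "source" is a script on the
// ad server. `ad-server.example` is a placeholder host name.
const pixelTag =
  '<img src="http://ad-server.example/3Pcookie.php" width="1" height="1" alt="">';

const page = "http://127.0.0.1/3rd_cookie.php";
console.log(isThirdParty(page, "http://ad-server.example/3Pcookie.php")); // true
console.log(isThirdParty(page, "banner.png")); // false
```

Any cookie set by the response to the first request would be stored under the ad server's host, which is what makes it a ‘third-party cookie’ from the point of view of the page being visited.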
Assignment
Students must develop a small web site. The topic of the web site can be freely chosen.
Examples of possible topics include a particular sport or sport team, videogames, music,
historic buildings, literature, travel, hobbies, etc. Each web site must include a home page
from which the visitors of the web site must depart. Your web site must be composed of at
least 6 HTML web pages (including the home page). There should be some web pages that
contain links to more than one web page within your web site, and web pages that can be
reached from several web pages. An example of a possible structure for the links in your web
site is shown in the following figure, where node A would be the home page.
[Figure: graph of links between the pages A (home page), B, C, D, E and F]
Example of a possible link structure for the assignment
The figure above is just an example, you do not need to follow it. It is not necessary to develop
web pages with tons of multimedia content and a professional web design, but it is expected
that at least the web pages contain some text content and a reasonable format, not just the
links (you can copy and paste the formatting of the Demo.html file). You can include images
and other multimedia files in the web pages. You can also include external links.
The developed web pages should be placed in a folder called assignment. Thus, the URL
of the home page should be
http://localhost/assignment/index.html
This folder is already included in the virtual machine that is provided for this lab.
Next, you have to perform 3 visits to your web site. Following the usual conventions in web
usage mining, at least 30 minutes should elapse between the last web page visited within a
visit to your web site and the first web page accessed in the next visit. All the visits must start
at the home page of your web site. Each visit must follow a different path within
your web site, and together the visits must cover every web page at least once.
Then, you have to go to the web log and identify the three visits to your web site, as well as
the sequence of web pages visited within each visit. It is not a problem if your web log contains
other entries (for instance, those related to the exercises we have done in this lab).
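The 30-minute convention mentioned above can be expressed as a simple rule: a new visit starts whenever the gap since the previous request reaches the timeout. The Javascript sketch below applies that rule to a list of requests that are assumed to be already extracted from the web log and to belong to a single user; the `splitIntoVisits` helper, the page names and the timestamps are made up for illustration.

```javascript
// Sketch: group page requests into visits using a 30-minute inactivity gap.
// Each hit is { time: Date, path: string }; the data below is made up.
const TIMEOUT_MS = 30 * 60 * 1000;

function splitIntoVisits(hits) {
  const sorted = [...hits].sort((a, b) => a.time - b.time);
  const visits = [];
  for (const hit of sorted) {
    const current = visits[visits.length - 1];
    const last = current && current[current.length - 1];
    if (!last || hit.time - last.time >= TIMEOUT_MS) {
      visits.push([hit]); // gap reached the timeout: a new visit starts
    } else {
      current.push(hit); // still within the same visit
    }
  }
  // Keep only the sequence of pages of each visit.
  return visits.map((visit) => visit.map((hit) => hit.path));
}

const hits = [
  { time: new Date("2023-10-10T10:00:00Z"), path: "/assignment/index.html" },
  { time: new Date("2023-10-10T10:02:00Z"), path: "/assignment/b.html" },
  { time: new Date("2023-10-10T10:40:00Z"), path: "/assignment/index.html" },
  { time: new Date("2023-10-10T10:41:00Z"), path: "/assignment/c.html" },
];
console.log(splitIntoVisits(hits));
```

With this sample data the function returns two visits, each starting at the home page, because 38 minutes elapse between the second and the third request.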
Finally, you have to write a short report. In your report, include the name and NIA of the
members of the group (groups of one or two students are allowed), describe the topic you
have chosen for your web site and its link structure. Also, you must indicate the lines in your
web log corresponding to each visit, as well as the sequence of web pages visited on each
visit to your web site.
You have to submit the content of your assignment folder, compressed in a zip file, the report
in PDF format, and the web log.