Assignment 3
Assignment 3
Assignment 3
1 Lab Overview
Behavioral targeting is a type of online advertising where ads are displayed based on the users web-browsing
behavior. The user leaves a trail of digital foot prints moving from one website to the other. Behavioral
targeting anonymously monitors and tracks the sites visited by a user. When a user surfs internet, the pages
they visit, the searches they make, location of the user browsing from, device used for browsing and many
other inputs are used by the tracking sites to collect data. A user profile is created from the data and data-
mined for an online behavioral pattern of the user. As a result when users return to a specific site or a
network of sites, the created user profiles are helpful in reaching the targeted audience to advertise. The
targeted ads will fetch more user interest, the publisher (or seller) can charge a premium for these ads over
random advertising or ads based on the context of a site.
2 Lab Environment
You need to use our provided virtual machine image for this lab. The name of the VM image that supports
this lab is called SEEDUbuntu12.04.zip, which is built in June 2014. If you happen to have an older
version of our pre-built VM image, you need to download the most recent version, as the older version does
not support this lab. Go to our SEED web page (http://www.cis.syr.edu/wedu/seed/) to get
the VM image.
Starting the Apache Server. The Apache web server is also included in the pre-built Ubuntu image.
However, the web server is not started by default. You need to first start the web server using the following
command:
The Elgg Web Application. We use an open-source web application called Elgg in this lab. Elgg is a
web-based social-networking application. It is already set up in the pre-built Ubuntu VM image. We have
also created several user accounts on the Elgg server and the credentials are given below.
User UserName Password
Admin admin seedelgg
Alice alice seedalice
Boby boby seedboby
Charlie charlie seedcharlie
Samy samy seedsamy
Configuring DNS. We have configured the following URLs needed for this lab. To access the URLs , the
Apache server needs to be started first:
URL Description Directory
http://www.wtlabelgg.com Elgg web site /var/www/webtracking/elgg
http://www.wtcamerastore.com CameraStore /var/www/webtracking/CameraStore
http://www.wtmobilestore.com MobileStore /var/www/webtracking/MobileStore
http://www.wtelectronicsstore.com ElectronicStore /var/www/webtracking/ElectronicStore
http://www.wtshoestore.com ShoeStore /var/www/webtracking/ShoeStore
http://www.wtlabadserver.com ReviveAdserver /var/www/webtracking/adserver
The above URLs are only accessible from inside of the virtual machine, because we have modified
the /etc/hosts file to map the domain name of each URL to the virtual machines local IP address
(127.0.0.1). You may map any domain name to a particular IP address using /etc/hosts. For
example you can map http://www.example.com to the local IP address by appending the following
entry to /etc/hosts:
127.0.0.1 www.example.com
If your web server and browser are running on two different machines, you need to modify /etc/hosts
on the browsers machine accordingly to map these domain names to the web servers IP address, not to
127.0.0.1.
Configuring Apache Server. In the pre-built VM image, we use Apache server to host all the web sites
used in the lab. The name-based virtual hosting feature in Apache could be used to host several web sites (or
URLs) on the same machine. A configuration file named default in the directory "/etc/apache2/
sites-available" contains the necessary directives for the configuration:
1. The directive "NameVirtualHost *" instructs the web server to use all IP addresses in the ma-
chine (some machines may have multiple IP addresses).
2. Each web site has a VirtualHost block that specifies the URL for the web site and directory
in the file system that contains the sources for the web site. For example, to configure a web site
with URL http://www.example1.com with sources in directory /var/www/Example_1/,
and to configure a web site with URL http://www.example2.com with sources in directory
/var/www/Example_2/, we use the following blocks:
SEED Labs Web Tracking Lab 3
<VirtualHost *>
ServerName http://www.example1.com
DocumentRoot /var/www/Example_1/
</VirtualHost>
<VirtualHost *>
ServerName http://www.example2.com
DocumentRoot /var/www/Example_2/
</VirtualHost>
You may modify the web application by accessing the source in the mentioned directories. For example,
with the above configuration, the web application http://www.example1.com can be changed by
modifying the sources in the directory /var/www/Example_1/.
1. Open Firefox browser, select History from the top menu, and click on Clear Recent History
option from the menu (Figure 1). A window Clear All History pops up.
2. Select all the check boxes and Click the Clear Now button in the pop up window (Figure 2). Close
the Firefox browser, reopen and start browsing.
SEED Labs Web Tracking Lab 4
1. On the left desktop menu, right click on the Firefox icon, Select Open a New Private Window
option as shown in Figure 3.
2. New Private browsing Firefox window opens up, start browsing in that private browser.
3 Lab Tasks
For this assignment, you will submit a pdf file describing both your observations and requested screenshots
for the following tasks. Your observations should be short - please limit yourself to 50-100 characters
depending on the question (roughly one to two sentences). Your screenshots should be limited in size,
specifically less than 1 MB. The expected time of completion is 3-5 hours.
1. Open Firefox and open the Elgg website (http://www.wtlabelgg.com) without visiting any
other website.
2. Open Firefox and open the wtCameraStore.com, wtMobileStore.com,
wtElectronicStore.com and wtShoeStore.com websites.
3. Click on view details for any products in the websites.
4. Refresh the Elgg website in Firefox and describe your observation.
5. Close the browser, reopen it and browse the Elgg website. Describe your observation.
Note: If you want to repeat the observations for step 1, clear the browsing history and cookies from the
Firefox browser. Please follow the instructions to clear history and cookies in section 2.2
SEED Labs Web Tracking Lab 5
2. Click on view details for any product in the website and use the LiveHTTPHeader Firefox exten-
sion to capture the HTTP header traffic information. You can open the LiveHTTPHeader Firefox
extension by going to the Tools menu in the browser (Figure 2).
3. In LiveHTTPHeaders identify the HTTP request that set the third party cookies, and take the
screenshot.
4. Right click on the Product Detail page and select View Page Source. Find out how the request
for the tracking cookie is sent from the webpage. Take a screenshot with the relevant source code
highlighted, and describe your observation.
products viewed by you will be logged in the ad server database. Please follow the steps below and give
your observation.
2. Click on view details for any product on at least two different websites.
4. Answer this question: What information is tracked about a user and how is the information mapped
to a particular user?
2. Capture and observe the LiveHTTPHeader traffic of the Elgg website, identify the HTTP requests
which are from a different domain (third party).
3. Take a screenshot of an HTTP request to the third party. Be sure the cookies being sent back to the
server are included in the screenshot.
4. Take a moment to make sure you understand how the Elgg website displays the targeted ads to the
user. Hint: take note of the cookies sent in the previous step. The value for the UserTrackID cookie
should match what you saw in the table displayed in Task 3.
5. Close the InPrivate browser, reopen it and browse the Elgg website. Describe your observations.
6. Answer this question: Why does your final observation here differ from what you saw at the end of
Task 1?
http://dictionary.reference.com
http://www.amazon.com
http://www.careerbuilder.com
Open the websites and observe the HTTP request and response in LiveHTTPHeaders.
You dont have to submit your answers to this task. Its just a fun, eye-opening exercise.
1. Disable the third party cookies from the Firefox browser. You can find instructions on how to disable
third party cookies in the Firefox browser here: https://support.mozilla.org/en-US/
kb/disable-third-party-cookies.
2. After disabling the third party cookies, open the wtCameraStore.com, wtMobileStore.com,
wtElectronicStore.com, wtShoeStore.com websites and LiveHTTPHeaders.
4. In LiveHTTPHeaders, identify the HTTP request that set the third party cookies, and take the
screenshot.
6. Take the screenshot of the HTTP request in LiveHTTPHeaders sent to the ad server when viewing
the Elgg site. Compare it with the HTTP request sent to the ads server in Task 4 and describe the
difference.
SEED Labs Web Tracking Lab 8
There are other ways to mitigate the web tracking. To opt out of targeted advertisement you can add
browser extensions like RequestPolicy, NoScript and Ghostery that control the third party requests from the
web browser. Also you can keep cookies only for the browsing session, by setting a cookie policy of only
keep cookies until I close my browser, which will delete all the cookies after the browser window is closed.
Major web browsers provide an option of Do Not Track, which is a feature to let third party trackers
know your preference to opt out of third party tracking. It is done by sending a HTTP header for every web
request. This Do Not Track preference may or may not be adhered to by the third party trackers. Some
third party trackers provide users with an option to Opt Out of targeted advertisement. However, some of
them may interpret Opt Out to mean do not show me targeted ads, rather than do not track my behavior
online. You can check your tracked online profile created by Google in www.google.com/settings/
ads. You can also find the opt out option provided in the above Google URL.
4 Guidelines
The diagram in Figure 5 shows the high level architecture of the Web tracking. In this diagram we have three
major components, the E-Commerce websites, Ad server and the Elgg website to display the targeted
advertisements. Each of the e-commerce websites have web bugs or beacons to track user preferences.
They are implanted as 1px by 1px image tags in the websites.
5 Submission
You will be submitting your assignment through Sakai. In your PDF submission, provide the following
information:
2. Task 1: Step 5 - Briefly describe your observations after closing and repoing your web browser.
4. Task 2: Step 4 - Screenshot of product detail page source requesting the tracking cookie
SEED Labs Web Tracking Lab 9
5. Task 2: Step 4 - Describe how the code enables the tracking cookie and from which third party site
the cookie will be sent.
7. Task 3: Step 4 - What information is tracked about a user and how is the information mapped to a
particular user?
9. Task 5: Step 4 - What do you see after refreshing the Elgg website?
10. Task 5: Step 5 - What happens after you close and reopen the browser in Private/Incognito mode?
11. Task 5: Step 6 - Does this answer differ from what you saw in task 1? Why or why not?
12. Task 7: Step 6 - Submit a screenshot of the HTTP request to the ad server.
13. Task 7: Step 6 - What is the difference between the HTTP requests sent to the ad server in this task
and the ones sent in task 4?
References
[1] HTTP Cookie - Wikipedia. Available at the following URL:
http://en.wikipedia.org/wiki/HTTP_cookie.
[2] New Cookie Technologies : Harder to See and Remove, Widely Used to Track you
https://www.eff.org/deeplinks/2009/09/new-cookie-technologies-harder-see-and-remove-wide
[3] How Online Tracking companies know most of what you do online
https://www.eff.org/deeplinks/2009/09/online-trackers-and-social-networks.