Project Report
On
“WEBSITE ANALYSIS”
Submitted To
BARKATULLAH VISHWAVIDHYALAYA, BHOPAL (M.P)
IN PARTIAL FULFILLMENT
OF THE DEGREE OF
Bachelor of Commerce (Computer applications)
Session – 2019-2020
Submitted By
SUMIT RAI
R185150280149
Under the
Guidance of
MR. ANKUR SAXENA (Department of Commerce)
ANIL SAHU (External Guide)
VIDYAYANI INSTITUTE OF SCIENCE MANAGEMENT
AND TECHNOLOGY
DEPARTMENT OF COMMERCE
ACKNOWLEDGEMENT
I convey my sincere gratitude to Mrs. Sarita Chouhan (Head of
Commerce Department) for giving me the opportunity to prepare my
project work in “Website Analysis”. I express my sincere thanks to
all the staff members of the Bachelor of Commerce Department. I am
thankful to Mr. Ankur Saxena for his guidance during my
project work and for sparing his valuable time for the same.
I express my sincere obligation and thanks to the Principal, Mrs.
Sunita Sharma, and all faculties of the Department of Commerce
for providing me with guidance, help, motivation, and valuable
advice at every stage for completing the project work successfully.
Mr. SUMIT RAI
R185150280149
B.COM (C.A.)
FINAL YEAR
VIDYAYANI INSTITUTE OF SCIENCE MANAGEMENT
AND TECHNOLOGY
DEPARTMENT OF COMMERCE
DECLARATION
I do hereby declare that the project work entitled “Website
Analysis” submitted by me for the partial fulfillment of the
requirement for the award of B.Com (Computer Application), is an
authentic work completed by me. The report being submitted has
not been submitted earlier for the award of any degree or diploma to
any Institute or University.
Mr. SUMIT RAI
R185150280149
B.COM (C.A.)
FINAL YEAR
VIDYAYANI INSTITUTE OF SCIENCE MANAGEMENT
AND TECHNOLOGY
DEPARTMENT OF COMMERCE
CERTIFICATION OF ORIGINALITY
This is to certify that the project report entitled “Website Analysis”
submitted to Barkatullah University, Bhopal, in partial fulfillment of
the requirement for the award of the degree of Bachelor of
Commerce, is an original work carried out by Mr. Sumit Rai,
Enrollment No: R185150280149.
The matter embodied in this project is a genuine work done
by the student and has not been submitted whether to this
University or to any other University / Institute for the fulfillment
of the requirement of any course of study.
Signature of the guide
Mr. Ankur Saxena
Assistant Professor
Department of Commerce
PROJECT
ON
Website Analysis
Introduction
In Search Engine Optimization (SEO), website analysis is a
process that comes under on-page optimization. It is a
tool through which professionals analyze the entire
website and find out how well it is working or
performing on search engines. Professionals need to
go through this process to find out the weaknesses and
status of the website, which helps SEO
professionals set their strategy accordingly. It helps
optimizers focus primarily on the weaker areas of the
website that become a barrier to obtaining the optimum
profit and success from the website. The results of
website analysis enable the optimizers to work hard on
the SEO campaign, keeping in mind the weaker
elements of the website.
Web Analysis is the measurement, collection, analysis
and reporting of web data for purposes of understanding
and optimizing web usage.[1] However, Web Analysis is
not just a process for measuring web traffic but can be
used as a tool for business and market research, and to
assess and improve the effectiveness of a website. Web
Analysis applications can also help companies measure
the results of traditional print or broadcast advertising
campaigns. It helps one to estimate how traffic to a
website changes after the launch of a new advertising
campaign. Web Analysis provides information about
the number of visitors to a website and the number of
page views. It helps gauge traffic and popularity trends
which is useful for market research.
Table of Contents
PARTICULARS
1 Certificate from the Organization where project has been undertaken
2 Certification of Originality
3 Declaration
4 Acknowledgement
5 Introduction
Basic steps of the web Analysis process
Web Analysis technologies
Web Analysis data sources
Web server log file analysis
Page tagging
Logfile analysis vs page tagging
Advantages of logfile analysis
Advantages of page tagging
Economic factors
Hybrid methods
Geolocation of visitors
Click Analysis
Customer lifecycle Analysis
Other methods
On-site web Analysis - definitions
Off-site web Analysis
Common sources of confusion in web Analysis
The hotel problem
Web Analysis methods
Problems with cookies
Secure Analysis (metering) methods
See also
References
Bibliography
Basic Steps of the Web Analysis
Process
Most web Analysis processes come down to four essential
stages or steps, which are:
Collection of data: This stage is the collection of the basic,
elementary data. Usually, these data are counts of things.
The objective of this stage is to gather the data.
Processing of data into information: This stage usually takes
counts and makes them into ratios, although there still may be
some counts. The objective of this stage is to take the data
and conform it into information, specifically metrics.
Developing KPI: This stage focuses on using the ratios
(and counts) and infusing them with business strategies,
referred to as key performance indicators (KPI). Many
times, KPIs deal with conversion aspects, but not always. It
depends on the organization.
Formulating online strategy: This stage is concerned with
the online goals, objectives, and standards for the
organization or business. These strategies are usually
related to making money, saving money, or increasing
market share.
Another essential function developed by analysts for the
optimization of websites is experimentation.
Experiments and testing: A/B testing is a controlled
experiment with two variants, used in online settings such
as web development.
The goal of A/B testing is to identify and suggest changes to
web pages that increase or maximize the effect of a
statistically tested result of interest.
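As an illustration of how such a test might be evaluated, the sketch below compares the conversion rates of two page variants with a two-proportion z-test. The visit and conversion counts are made-up numbers used purely for demonstration.

```python
import math

def ab_test(conversions_a, visits_a, conversions_b, visits_b):
    """Two-proportion z-test: is variant B's conversion rate
    significantly different from variant A's?"""
    rate_a = conversions_a / visits_a
    rate_b = conversions_b / visits_b
    pooled = (conversions_a + conversions_b) / (visits_a + visits_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return rate_a, rate_b, z, p_value

# Hypothetical example: variant A converts 480 of 10,000 visits,
# variant B converts 560 of 10,000 visits.
rate_a, rate_b, z, p = ab_test(480, 10_000, 560, 10_000)
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  z = {z:.2f}  p = {p:.4f}")
# A small p-value (commonly below 0.05) suggests the difference is
# unlikely to be due to chance alone.
```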
Each stage impacts or can impact (i.e., drives) the stage
preceding or following it. So, sometimes the data that is
available for collection impacts the online strategy. Other
times, the online strategy affects the data collected.
Web Analysis technologies
There are at least two categories of web Analysis, off-
site and on-site web Analysis.
Off-site web Analysis refers to web measurement and
analysis regardless of whether you own or maintain a
website. It includes the measurement of a
website's potential audience (opportunity), share of voice
(visibility), and buzz (comments) that is happening on the
Internet as a whole.
On-site web Analysis, the more common of the two,
measures a visitor's behavior once on your website. This
includes its drivers and conversions; for example, the
degree to which different landing pages are associated with
online purchases. On-site web Analysis measures the
performance of your website in a commercial context. This
data is typically compared against key performance
indicators for performance and is used to improve a website
or marketing campaign's audience response. Google
Analysis and Adobe Analysis are the most widely used on-
site web Analysis services, although new tools are emerging
that provide additional layers of information, including heat
maps and session replay.
Historically, web Analysis has been used to refer to on-site
visitor measurement. However, this meaning has become
blurred, mainly because vendors are producing tools that span
both categories. Many different vendors provide on-site web
Analysis software and services. There are two main technical
ways of collecting the data. The first and traditional
method, server log file analysis, reads the log files in which
the web server records file requests by browsers. The second
method, page tagging, uses JavaScript embedded in the
webpage to make image requests to a third-party Analysis-
dedicated server, whenever a webpage is rendered by a web
browser or, if desired, when a mouse click occurs. Both
collect data that can be processed to produce web traffic
reports.
Web Analysis data sources
The fundamental goal of web Analysis is to collect and
analyze data related to web traffic and usage patterns. The
data mainly comes from four sources:
1. Direct HTTP request data: directly comes from HTTP
request messages (HTTP request headers).
2. Network level and server generated data associated with
HTTP requests: not part of an HTTP request, but it is
required for successful request transmissions - for
example, IP address of a requester.
3. Application level data sent with HTTP requests:
generated and processed by application level programs
(such as JavaScript, PHP, and ASP.Net), including
session and referrals. These are usually captured by
internal logs rather than public web Analysis services.
4. External data: can be combined with on-site data to help
augment the website behavior data described above and
interpret web usage. For example, IP addresses are
usually associated with Geographic regions and internet
service providers, e-mail open and click-through rates,
direct mail campaign data, sales and lead history, or
other data types as needed.
Web server log file analysis
Web servers record some of their transactions in a log file. It
was soon realized that these log files could be read by a
program to provide data on the popularity of the website.
Thus arose web log analysis software.
In the early 1990s, website statistics consisted primarily of
counting the number of client requests (or hits) made to the
web server. This was a reasonable method initially, since each
website often consisted of a single HTML file. However, with
the introduction of images in HTML, and websites that
spanned multiple HTML files, this count became less useful.
The first true commercial Log Analyzer was released by IPRO
in 1994.[4]
Two units of measure were introduced in the mid-1990s to
gauge more accurately the amount of human activity on web
servers. These were page views and visits (or sessions).
A page view was defined as a request made to the web server
for a page, as opposed to a graphic, while a visit was defined
as a sequence of requests from a uniquely identified client that
expired after a certain amount of inactivity, usually 30
minutes. The page views and visits are still commonly
displayed metrics, but are now considered rather
rudimentary.
The emergence of search engine spiders and robots in the late
1990s, along with web proxies and dynamically assigned IP
addresses for large companies and ISPs, made it more difficult
to identify unique human visitors to a website. Log analyzers
responded by tracking visits by cookies, and by ignoring
requests from known spiders.
The extensive use of web caches also presented a problem for
log file analysis. If a person revisits a page, the second request
will often be retrieved from the browser's cache, and so no
request will be received by the web server. This means that
the person's path through the site is lost. Caching can be
defeated by configuring the web server, but this can result in
degraded performance for the visitor and a bigger load on the
servers.
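As a rough illustration of the log-file approach described above, the sketch below parses web server log lines in the common "combined" format, skips requests from known spiders, and separates raw hits from page views. The sample log lines and the bot list are illustrative only.

```python
import re

# Apache/Nginx "combined" log format (simplified).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
KNOWN_BOTS = ("Googlebot", "bingbot")                    # illustrative list
ASSET_EXTENSIONS = (".css", ".js", ".png", ".gif", ".jpg", ".ico")

def summarize(log_lines):
    hits = page_views = 0
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        if any(bot in match["agent"] for bot in KNOWN_BOTS):
            continue                                     # ignore known spiders
        hits += 1                                        # every request is a hit
        if not match["path"].lower().endswith(ASSET_EXTENSIONS):
            page_views += 1                              # pages only, not images/scripts
    return hits, page_views

sample = [
    '203.0.113.7 - - [10/Mar/2020:10:01:02 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Mar/2020:10:01:03 +0000] "GET /logo.png HTTP/1.1" 200 2048 "/index.html" "Mozilla/5.0"',
    '198.51.100.4 - - [10/Mar/2020:10:02:00 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]
print(summarize(sample))  # -> (2, 1): two human hits, one page view
```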
Page tagging
Concerns about the accuracy of log file analysis in the
presence of caching, and the desire to be able to perform web
Analysis as an outsourced service, led to the second data
collection method, page tagging or 'Web bugs'.
In the mid-1990s, Web counters were commonly seen —
these were images included in a web page that showed the
number of times the image had been requested, which was an
estimate of the number of visits to that page. In the late 1990s,
this concept evolved to include a small invisible image instead
of a visible one, and, by using JavaScript, to pass along with
the image request certain information about the page and the
visitor. This information can then be processed remotely by a
web Analysis company, and extensive statistics generated.
The web Analysis service also manages the process of
assigning a cookie to the user, which can uniquely identify
them during their visit and in subsequent visits. Cookie
acceptance rates vary significantly between websites and may
affect the quality of data collected and reported.
Collecting website data using a third-party data collection
server (or even an in-house data collection server) requires an
additional DNS lookup by the user's computer to determine
the IP address of the collection server. On occasion, delays in
completing a successful or failed DNS lookup may result in
data not being collected.
With the increasing popularity of Ajax-based solutions, an
alternative to the use of an invisible image is to implement a
call back to the server from the rendered page. In this case,
when the page is rendered on the web browser, a piece of
Ajax code would call back to the server and pass information
about the client that can then be aggregated by a web Analysis
company. This is in some ways flawed by browser restrictions
on the servers which can be contacted with XMLHttpRequest
objects. Also, this method can lead to slightly lower
reported traffic levels, since the visitor may stop the page
from loading in mid-response before the Ajax call is made.
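To make the mechanism above concrete, here is a minimal, hypothetical sketch of the collection side of page tagging: a tiny HTTP endpoint that a JavaScript tag (or Ajax call) would request, passing page and visitor details in the query string, which the server simply appends to a log for later processing. The host, port, file name, and parameter names are assumptions made for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class CollectorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The page tag requests e.g. /collect?page=/pricing&visitor=abc123&ref=google
        params = parse_qs(urlparse(self.path).query)
        record = {
            "page": params.get("page", [""])[0],
            "visitor": params.get("visitor", [""])[0],
            "referrer": params.get("ref", [""])[0],
            "user_agent": self.headers.get("User-Agent", ""),
        }
        with open("tag_hits.log", "a") as log:
            log.write(f"{record}\n")            # append one raw hit per request
        # A production beacon would return a 1x1 image; 204 (no content) also works.
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), CollectorHandler).serve_forever()
```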
Logfile analysis vs. page tagging
Both logfile analysis programs and page tagging solutions are
readily available to companies that wish to perform web
Analysis. In some cases, the same web Analysis company will
offer both approaches. The question then arises of which
method a company should choose. There are advantages and
disadvantages to each approach.
Advantages of logfile analysis
The main advantages of log file analysis over page tagging are
as follows:
The web server normally already produces log files, so the
raw data is already available. No changes to the website are
required.
The data is on the company's own servers, and is in a
standard, rather than a proprietary, format. This makes it
easy for a company to switch programs later, use several
different programs, and analyze historical data with a new
program.
Logfiles contain information on visits from search engine
spiders, which generally are excluded from the Analysis
tools using JavaScript tagging. (Some search engines might
not even execute JavaScript on a page.) Although these
should not be reported as part of the human activity, it is
useful information for search engine optimization.
Logfiles require no additional DNS lookups or TCP slow
starts. Thus there are no external server calls which can
slow page load speeds, or result in uncounted page views.
The web server reliably records every transaction it makes,
e.g. serving PDF documents and content generated by
scripts, and does not rely on the visitors' browsers
cooperating.
Advantages of page tagging
The main advantages of page tagging over log file analysis are
as follows:
Counting is activated by opening the page (given that the
web client runs the tag scripts), not requesting it from the
server. If a page is cached, it will not be counted by server-
based log analysis. Cached pages can account for up to
one-third of all page views. Not counting cached pages
seriously skews many site metrics. It is for this reason
server-based log analysis is not considered suitable for
analysis of human activity on websites.
Data is gathered via a component ("tag") in the page,
usually written in JavaScript, though Java or Flash can also
be used. Ajax can also be used in conjunction with a server-
side scripting language (such as PHP) to manipulate and
(usually) store it in a database, basically enabling complete
control over how the data is represented.
The script may have access to additional information on the
web client or on the user, not sent in the query, such as
visitors' screen sizes and the price of the goods they
purchased.
Page tagging can report on events which do not involve a
request to the web server, such as interactions
within Flash movies, partial form completion, mouse
events such as onClick, onMouseOver, onFocus, onBlur
etc.
The page tagging service manages the process of assigning
cookies to visitors; with log file analysis, the server has to
be configured to do this.
Page tagging is available to companies who do not have
access to their own web servers.
Lately, page tagging has become a standard in web
Analysis.
Economic factors
Logfile analysis is almost always performed in-house. Page
tagging can be performed in-house, but it is more often
provided as a third-party service. The economic difference
between these two models can also be a consideration for a
company deciding which to purchase.
Logfile analysis typically involves a one-off software
purchase; however, some vendors are introducing
maximum annual page views with additional costs to
process additional information.
In addition to
commercial offerings, several open-source logfile analysis
tools are available free of charge.
For logfile analysis, the data must be stored and archived, and it
often grows large quickly. Although the cost of
hardware to do this is minimal, the overhead for an IT
department can be considerable.
For logfile analysis, the software needs to be maintained,
including updates and security patches.
Page tagging vendors charge a monthly fee based
on volume, i.e. the number of page views per month collected.
Which solution is cheaper to implement depends on the
amount of technical expertise within the company, the vendor
chosen, the amount of activity seen on the websites, the depth
and type of information sought, and the number of distinct
websites needing statistics.
Regardless of the vendor solution or data collection method
employed, the cost of web visitor analysis and interpretation
should also be included. That is, the cost of turning raw data
into actionable information. This can be from the use of third
party consultants, the hiring of an experienced web analyst, or
the training of a suitable in-house person. A cost-benefit
analysis can then be performed. For example, what revenue
increase or cost savings can be gained by analyzing the web
visitor data?
Hybrid methods
Some companies produce solutions that collect data through
both log-files and page tagging and can analyze both kinds.
By using a hybrid method, they aim to produce more accurate
statistics than either method on its own. An early hybrid
solution was produced in 1998 by Rufus Evison.
Geolocation of visitors
With IP geolocation, it is possible to track visitors' locations.
Using IP geolocation database or API, visitors can be
geolocated to city, region or country level.[8]
IP Intelligence, or Internet Protocol (IP) Intelligence, is a
technology that maps the Internet and categorizes IP addresses
by parameters such as geographic location (country, region,
state, city and postcode), connection type, Internet Service
Provider (ISP), proxy information, and more. The first
generation of IP Intelligence was referred to
as geotargeting or geolocation technology. This information is
used by businesses for online audience segmentation in
applications such as online advertising, behavioral targeting,
content localization (or website localization), digital rights
management, personalization, online fraud detection,
localized search, enhanced Analysis, global traffic
management, and content distribution.
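As an example of the kind of lookup described above, the sketch below uses the MaxMind geoip2 reader against a GeoLite2 City database to resolve a visitor's IP address to country, region, and city. The database path and the IP address are placeholders, and the geoip2 package plus a downloaded .mmdb file are assumed to be available.

```python
import geoip2.database

# Path to a locally downloaded GeoLite2/GeoIP2 City database (placeholder).
reader = geoip2.database.Reader("/path/to/GeoLite2-City.mmdb")

def locate(ip_address):
    """Return (country, region, city) for a visitor's IP address."""
    response = reader.city(ip_address)
    return (
        response.country.name,
        response.subdivisions.most_specific.name,
        response.city.name,
    )

print(locate("203.0.113.5"))   # replace with a real visitor IP address
```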
Click Analysis
Click Analysis is a special type of web Analysis that gives
special attention to clicks.
Commonly, click Analysis focuses on on-site Analysis. An
editor of a website uses click Analysis to determine the
performance of his or her particular site, with regards to where
the users of the site are clicking.
Also, click Analysis may happen real-time or "unreal"-time,
depending on the type of information sought. Typically, front-
page editors on high-traffic news media sites will want to
monitor their pages in real-time, to optimize the content.
Editors, designers or other types of stakeholders may analyze
clicks on a wider time frame to help them assess performance
of writers, design elements or advertisements etc.
Data about clicks may be gathered in at least two ways.
Ideally, a click is "logged" when it occurs, and this method
requires some functionality that picks up relevant information
when the event occurs. Alternatively, one may institute the
assumption that a page view is a result of a click, and
therefore log a simulated click that led to that page view.
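The second approach, logging a simulated click per page view, can be sketched as follows: given the ordered page views of one visit, each view after the first is treated as the result of a click from the previously viewed page. The sample path is made up for illustration.

```python
def simulated_clicks(page_views):
    """Treat each page view (after the first) as the result of a click
    from the previously viewed page."""
    return list(zip(page_views, page_views[1:]))

visit_path = ["/home", "/products", "/products/tea", "/checkout"]
for source, target in simulated_clicks(visit_path):
    print(f"simulated click: {source} -> {target}")
```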
Customer lifecycle Analysis
Customer lifecycle Analysis is a visitor-centric approach to
measuring that falls under the umbrella of lifecycle marketing.
Page views, clicks and other events (such as API calls, access
to third-party services, etc.) are all tied to an individual visitor
instead of being stored as separate data points. Customer
lifecycle Analysis attempts to connect all the data points into
a marketing funnel that can offer insights into visitor behavior
and website optimization.
Other methods
Other methods of data collection are sometimes used. Packet
sniffing collects data by sniffing the network traffic passing
between the web server and the outside world. Packet sniffing
involves no changes to the web pages or web servers.
Integrating web Analysis into the web server software itself is
also possible.[9] Both these methods claim to provide
better real-time data than other methods.
Web Analysis - definitions
There are no globally agreed definitions within web Analysis
as the industry bodies have been trying to agree on definitions
that are useful and definitive for some time. The main bodies
who have had input in this area have been the IAB (Interactive
Advertising Bureau), JICWEBS (The Joint Industry
Committee for Web Standards in the UK and Ireland), and
The DAA (Digital Analysis Association), formerly known as
the WAA (Web Analysis Association, US). However, many
terms are used in consistent ways from one major Analysis
tool to another, so the following list, based on those
conventions, can be a useful starting point:
Bounce Rate - The percentage of visits that are single-page
visits and without any other interactions (clicks) on that
page. In other words, a single click in a particular session is
called a bounce.
Click path - the chronological sequence of page views
within a visit or session.
Hit - A request for a file from the web server. Available
only in log analysis. The number of hits received by a
website is frequently cited to assert its popularity, but this
number is extremely misleading and dramatically
overestimates popularity. A single web-page typically
consists of multiple (often dozens) of discrete files, each of
which is counted as a hit as the page is downloaded, so the
number of hits is really an arbitrary number more reflective
of the complexity of individual pages on the website than
the website's actual popularity. The total number of visits or
page views provides a more realistic and accurate
assessment of popularity.
Page view - A request for a file, or sometimes an event
such as a mouse click, that is defined as a page in the setup
of the web Analysis tool. An occurrence of the script being
run in page tagging. In log analysis, a single page view may
generate multiple hits as all the resources required to view
the page (images, .js and .css files) are also requested from
the web server.
Visitor / Unique Visitor / Unique User - The uniquely
identified client that is generating page views or hits within
a defined time period (e.g. day, week or month). A
uniquely identified client is usually a combination of a
machine (one's desktop computer at work for example) and
a browser (Firefox on that machine). The identification is
usually via a persistent cookie that has been placed on the
computer by the site page code. An older method, used in
log file analysis, is the unique combination of the
computer's IP address and the User-Agent (browser)
information provided to the web server by the browser. It is
important to understand that the "Visitor" is not the same as
the human being sitting at the computer at the time of the
visit, since an individual human can use different
computers or, on the same computer, can use different
browsers, and will be seen as a different visitor in each
circumstance. Increasingly, but still, somewhat rarely,
visitors are uniquely identified by Flash LSO's (Local
Shared Object), which are less susceptible to privacy
enforcement.
Visit / Session - A visit or session is defined as a series of
page requests or, in the case of tags, image requests from
the same uniquely identified client. A unique client is
commonly identified by an IP address or a unique ID that is
placed in the browser cookie. A visit is considered ended
when no requests have been recorded in some number of
elapsed minutes. A 30-minute limit ("time out") is used by
many Analysis tools but can, in some tools (such as Google
Analysis), be changed to another number of minutes.
Analysis data collectors and analysis tools have no reliable
way of knowing if a visitor has looked at other sites
between page views; a visit is considered one visit as long
as the events (page views, clicks, whatever is being
recorded) are no more than 30 minutes apart. Note that a
visit can consist of a one-page view or thousands. A unique
visit's session can also be extended if the time between
page loads indicates that a visitor has been viewing the
pages continuously. (A sketch of this sessionization logic
appears after this list of definitions.)
Active Time / Engagement Time - Average amount of time
that visitors spend actually interacting with content on a
web page, based on mouse moves, clicks, hovers, and
scrolls. Unlike Session Duration and Page View Duration /
Time on Page, this metric can accurately measure the
length of engagement in the final page view, but it is not
available in many Analysis tools or data collection
methods.
Average Page Depth / Page Views per Average Session -
Page Depth is the approximate "size" of an average visit,
calculated by dividing the total number of page views by
the total number of visits.
Average Page View Duration - Average amount of time
that visitors spend on an average page of the site.
Click - "refers to a single instance of a user following a
hyperlink from one page in a site to another".
Event - A discrete action or class of actions that occur on a
website. A page view is a type of event. Events also
encapsulate clicks, form submissions, keypress events, and
other client-side user actions.
Exit Rate / % Exit - A statistic applied to an individual
page, not a web site. The percentage of visits seeing a page
where that page is the final page viewed in the visit.
First Visit / First Session - (also called 'Absolute Unique
Visitor' in some tools) A visit from a uniquely identified
client that has theoretically not made any previous visits.
Since the only way of knowing whether the uniquely
identified client has been to the site before is the presence
of a persistent cookie or via digital fingerprinting that had
been received on a previous visit, the First Visit label is not
reliable if the site's cookies have been deleted since their
previous visit.
Frequency / Session per Unique - Frequency measures how
often visitors come to a website in a given time period. It is
calculated by dividing the total number of sessions (or
visits) by the total number of unique visitors during a
specified time period, such as a month or year. Sometimes
it is used interchangeably with the term "loyalty."
Impression - The most common definition of "Impression"
is an instance of an advertisement appearing on a viewed
page. Note that an advertisement can be displayed on a
viewed page below the area actually displayed on the
screen, so most measures of impressions do not necessarily
mean an advertisement has been viewable.
New Visitor - A visitor that has not made any previous
visits. This definition creates a certain amount of confusion
(see common confusions below), and is sometimes
substituted with analysis of first visits.
Page Time Viewed / Page Visibility Time / Page View
Duration - The time a single page (or a blog, Ad Banner...)
is on the screen, measured as the calculated difference
between the time of the request for that page and the time
of the next recorded request. If there is no next recorded
request, then the viewing time of that instance of that page
is not included in reports.
Repeat Visitor - A visitor that has made at least one
previous visit. The period between the last and current visit
is called visitor recency and is measured in days.
Return Visitor - A Unique visitor with activity consisting of
a visit to a site during a reporting period and where the
Unique visitor visited the site prior to the reporting period.
The individual is counted only once during the reporting
period.
Session Duration / Visit Duration - Average amount of time
that visitors spend on the site each time they visit. It is
calculated as the sum total of the duration of all the
sessions divided by the total number of sessions. This
metric can be complicated by the fact that Analysis
programs cannot measure the length of the final page
view.
Single Page Visit / Singleton - A visit in which only a
single page is viewed (this is not a 'bounce').
Site Overlay is a report technique in which statistics
(clicks) or hot spots are superimposed, by physical location,
on a visual snapshot of the web page.
Click-through Rate is a ratio of users who click on a
specific link to the number of total users who view a page,
email, or advertisement. It is commonly used to measure
the success of an online advertising campaign for a
particular website as well as the effectiveness of email
campaigns.
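To tie several of the definitions above together, the sketch below takes a small, made-up list of page-view records (visitor ID and timestamp), groups them into visits using the common 30-minute timeout, and then derives visits, unique visitors, bounce rate, and average page depth.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)

# Hypothetical page-view records: (visitor_id, timestamp).
page_views = [
    ("v1", datetime(2020, 3, 10, 9, 0)),
    ("v1", datetime(2020, 3, 10, 9, 5)),
    ("v1", datetime(2020, 3, 10, 11, 0)),   # > 30 min gap: a new visit
    ("v2", datetime(2020, 3, 10, 9, 30)),   # single page view: a bounce
]

def sessionize(views):
    """Group page views into visits per visitor using a 30-minute timeout."""
    visits = []
    current = {}
    for visitor, ts in sorted(views, key=lambda v: (v[0], v[1])):
        if visitor in current and ts - current[visitor][-1] <= TIMEOUT:
            current[visitor].append(ts)          # same visit continues
        else:
            current[visitor] = [ts]              # a new visit starts
            visits.append((visitor, current[visitor]))
    return visits

visits = sessionize(page_views)
total_visits = len(visits)
unique_visitors = len({visitor for visitor, _ in visits})
bounces = sum(1 for _, views in visits if len(views) == 1)

print("Visits:", total_visits)                          # 3
print("Unique visitors:", unique_visitors)              # 2
print("Bounce rate:", f"{bounces / total_visits:.0%}")  # 2 of 3 visits
print("Average page depth:", len(page_views) / total_visits)
```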
Off-site web Analysis
Off-site web Analysis is based on open data analysis, social
media exploration, share of voice on web properties. It is
usually used to understand how to market a site by identifying
the keywords tagged to this site, either from social media or
from other websites.
By using the HTTP referrer, webpage owners are able to
trace which referring sites help bring traffic to
their own site.
Common sources of confusion in Web
Analysis
The hotel problem
The hotel problem is generally the first problem encountered
by a user of web Analysis. The problem is that the unique
visitors for each day in a month do not add up to the same
total as the unique visitors for that month. This appears to an
inexperienced user to be a problem in whatever Analysis
software they are using. In fact it is a simple property of the
metric definitions.
The way to picture the situation is by imagining a hotel. The
hotel has two rooms (Room A and Room B).
Room     Day 01   Day 02   Day 03   Total
Room A   John     John     Mark     2 unique users
Room B   Mark     Anne     Anne     2 unique users
Total    2        2        2        ?
As the table shows, the hotel has two unique users each day
over three days. The sum of the totals with respect to the days
is therefore six.
During the period each room has had two unique users. The
sum of the totals with respect to the rooms is therefore four.
Actually only three visitors have been in the hotel over this
period. The problem is that a person who stays in a room for
two nights will get counted twice if you count them once on
each day, but is only counted once if you are looking at the
total for the period. Any software for web Analysis will sum
these correctly for the chosen time period, thus leading to the
problem when a user tries to compare the totals.
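A few lines of code make the arithmetic explicit; the guests and days mirror the hotel table above.

```python
# Unique guests per day, mirroring the hotel table above.
guests_by_day = {
    "Day 01": {"John", "Mark"},
    "Day 02": {"John", "Anne"},
    "Day 03": {"Mark", "Anne"},
}

daily_sum = sum(len(guests) for guests in guests_by_day.values())
period_uniques = len(set.union(*guests_by_day.values()))

print("Sum of daily unique guests:", daily_sum)           # 6
print("Unique guests over the period:", period_uniques)   # 3 (John, Mark, Anne)
```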
Web Analysis methods
Problems with cookies
Historically, vendors of page-tagging Analysis solutions have
used third-party cookies sent from the vendor's domain
instead of the domain of the website being browsed. Third-
party cookies can handle visitors who cross multiple unrelated
domains within the company's site, since the cookie is always
handled by the vendor's servers.
However, third-party cookies in principle allow tracking an
individual user across the sites of different companies,
allowing the Analysis vendor to collate the user's activity on
sites where he provided personal information with his activity
on other sites where he thought he was anonymous. Although
web Analysis companies deny doing this, other companies
such as companies supplying banner ads have done
so. Privacy concerns about cookies have therefore led a
noticeable minority of users to block or delete third-party
cookies. In 2005, some reports showed that about 28% of
Internet users blocked third-party cookies and 22% deleted
them at least once a month.[10] Most vendors of page tagging
solutions have now moved to provide at least the option of
using first-party cookies (cookies assigned from the client
subdomain).
Another problem is cookie deletion. When web Analysis
depends on cookies to identify unique visitors, the statistics are
dependent on a persistent cookie to hold a unique visitor ID.
When users delete cookies, they usually delete both first- and
third-party cookies. If this is done between interactions with
the site, the user will appear as a first-time visitor at their next
interaction point. Without a persistent and unique visitor id,
conversions, click-stream analysis, and other metrics
dependent on the activities of a unique visitor over time,
cannot be accurate.
Cookies are used because IP addresses are not always unique
to users and may be shared by large groups or proxies. In
some cases, the IP address is combined with the user agent in
order to more accurately identify a visitor if cookies are not
available. However, this only partially solves the problem
because often users behind a proxy server have the same user
agent. Other methods of uniquely identifying a user are
technically challenging and would limit the tractable audience
or would be considered suspicious. Cookies are the selected
option because they reach the lowest common denominator
without using technologies regarded as spyware.
Secure Analysis (metering) methods
It may be good to be aware that the third-party information
gathering is subject to any network limitations and security
applied. Countries, Service Providers and Private Networks
can prevent site visit data from going to third parties. All the
methods described above (and some other methods not
mentioned here, like sampling) have the central problem of
being vulnerable to manipulation (both inflation and
deflation). This means these methods are imprecise and
insecure (in any reasonable model of security). This issue has
been addressed in a number of papers, but to date the
solutions suggested in these papers remain theoretical,
possibly due to lack of interest from the engineering
community, or because of financial gain the current situation
provides to the owners of big websites. For more details,
consult the aforementioned papers.
Web Analysis Overview
Web Analysis is the technology and method for the collection,
measurement, analysis and reporting of websites and web
applications usage data (Burby & Brown, 2007). Web
Analysis has been growing ever since the development of the
World Wide Web. It has grown from a simple function of
HTTP (Hypertext Transfer Protocol) traffic logging to a more
comprehensive suite of usage data tracking, analysis, and
reporting. The web Analysis industry and market are also
booming with a plethora of tools, platforms, jobs, and
businesses. The market was projected to reach $1 billion in
2014 with an annual growth rate of more than 15% (Lovett,
2009).
Web Analysis technologies are usually categorized into on-
site and off-site web Analysis. On-site web Analysis refers to
data collection on the current site (Kaushik, 2009). It is used
to effectively measure many aspects of direct user-website
interactions, including number of visits, time on site, click
path, etc. Off-site Analysis is usually offered by third party
companies such as Twitalyzer (http://twitalyzer.com) or
Sweetspot (http://www.sweetspotintelligence.com). It
includes data from other sources such as surveys, market
report, competitor comparison, public information, etc. This
chapter provides an overview of on-site web Analysis, with a
focus on categorizing and explaining data, sources, collection
methods, metrics and analysis methods.
BACKGROUND
Log files have been used to keep track of web requests since
the World Wide Web emerged and the first widely used browser,
Mosaic, was released in 1993. One of the pioneers of web log
analysis was WebTrends, a Portland, Oregon based company,
which conducted website Analysis using data collected from
web server logs. In the same year, WebTrends created the first
commercial website Analysis software. In 1995, Dr. Stephen
Turner created Analog, the first free log file analysis software.
In 1996, WebSideStory offered a hit counter as a service for
websites that would display a banner. Web server logs have
some limits in the types of data collected. For example, they
cannot provide information about visitors' screen sizes, user
interactions with page elements, mouse events such as
clicking and hovering, etc. The newer technique of page tagging
is able to overcome these limitations and has become more
popular recently.
The fundamental basis of web Analysis is collection and
analysis of website usage data. Today, web Analysis is used in
many industries for different purposes, including traffic
monitoring, e-commerce optimization, marketing/advertising,
web development, information architecture, website
performance improvement, web-based campaigns/programs,
etc. Some of the major web Analysis usages are:
1. Improving website/application design and user experience.
This includes optimizing website information architecture,
navigation, content presentation/layout, and user interaction.
It also helps to identify user interest/attention areas and
improve web application features. A particular example is a
heat map that highlights areas of a webpage with higher than
average click rate and helps determine if intended link/content
is in the right place.
2. Optimizing e-Commerce and improving e-CRM on
customer orientation, acquisition and retention. More and
more companies analyze website usage data in order to
understand customers' needs to increase traffic and ultimately
increase their revenue.
Web Analysis Basics
Web Analysis is the collection, reporting, and analysis
of website data. The focus is on identifying measures based
on your organizational and user goals and using the website
data to determine the success or failure of those goals and to
drive strategy and improve the user’s experience.
Measuring Content
Critical to developing relevant and effective web analysis is
creating objectives and calls-to-action from your
organizational and site visitors' goals, and identifying key
performance indicators (KPIs) to measure the success or
failure of those objectives and calls-to-action. Here are
some examples for building a measurement framework for an
informational website:
Goals. What is it? Your site's major goals should essentially
outline why you have a website. Example: to educate the
public about safe handling of food.

Objectives. What is it? Objectives help outline what it takes
to achieve your goals. Example: to reach as many online users
looking for information on food safety and convert them into
site visitors.

Calls-to-Action. What is it? Calls-to-action are tasks that
site visitors must complete as part of your site's goals and
objectives. Example: online users come to the website because
it was listed on search engines as a credible source for food
safety content.

Key Performance Indicators (KPIs). What is it? Key
performance indicators are metrics with which we can measure
each call-to-action. Examples: the website's search
clickthrough rate for keywords related to food safety; search
visits to the website for keywords related to food safety.

Targets. What is it? Targets are thresholds that determine
whether a KPI indicates success or failure. Examples: the
search clickthrough rate in queries for food safety should be
no less than 10%; search visits show an increasing trend.
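As a small, hypothetical illustration of checking one of the KPIs above against its target, the sketch below computes a search clickthrough rate and compares it with the 10% threshold; the impression and click counts are invented.

```python
# Hypothetical search-performance numbers for the food-safety keywords.
search_impressions = 18_000     # times the site appeared in search results
search_clicks = 2_250           # clicks on those results

clickthrough_rate = search_clicks / search_impressions
target = 0.10                   # "no less than 10%" from the framework above

print(f"Search clickthrough rate: {clickthrough_rate:.1%}")
print("Target met" if clickthrough_rate >= target else "Target missed")
```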
Actionable Insights from Using Multiple Tools
As you can see from the table above, measuring success
comes in many KPIs and will require multiple tools. While
the thought of managing more than one web Analysis tool can
be daunting, know that by simplifying and focusing on the
KPIs that you need to measure your organizational and user
goals, you can weed out other data to get to the right insights.
Data sources and the types of tools that measure them:
Clickstream: web Analysis tools; clickthrough, scroll-tracking,
and heatmap tools.
Experimentation & Testing: A/B and multivariate testing; user
testing.
Voice of Customer: customer satisfaction surveys; page-level
surveys.
Competitive Intelligence & Market Research: keyword research
tools; competitive analysis tools.
Web Analysis Best Practices
Web Analysis can strongly support the qualitative research
and testing finding. Some best practices to keep in mind
related to this field are:
Encourage a data-driven environment for decision
making. After collecting the relevant data to answer
whether you have met (or failed to meet) your goals, find out
what you can do to improve your KPIs. Is there high-
value content (based on user feedback to the website) that is
not getting any traffic? Find out why through user path
analysis or engagement analysis of top sources for that
page. Leverage the experimentation & testing tools to try
out different solutions and find the best placement that
generates the most engagement for that page.
Avoid only providing traffic reports. Reporting about
visits, pageviews, top sources, or top pages only skims the
surface. Large numbers can be misleading; just because
there is more traffic or time spent on site doesn’t mean that
there is success. Reporting these numbers is largely
tactical; after all, what do 7 million visits have to do with
the success of your program?
Always provide insights with the data. Reporting metrics to
your stakeholders with no insights or tie-ins to your
business or user goals misses the point. Make the data
relevant and meaningful by demonstrating how the website
data shows areas of success and of improvement on your
site.
Avoid being snapshot-focused in reporting. Focusing on
visits or looking only within a specific time period doesn’t
capture the richer and more complex web experiences that
are happening online now. Pan-session metrics, such as
visitors, user-lifetime value, and other values that provide
longer-term understanding of people and users, allow you
to evaluate how your website has been doing as it matures
and as it interacts with visitors, especially the returning
ones.
Communicate clearly with stakeholders. Be consistent in
the information you provide, know your audience, and
know the weaknesses of your system and disclose them to
your stakeholders.
Web Analysis Tools
My goal is to give you a list of tools that I use in my everyday
life as a practitioner (you'll see many of them implemented on
this blog). You are not going to use all of them all at the same
time (or with every client), but 1. it is good to know what is
out there and 2. to be awesome you are likely to use one from
each category.
[Disclosure:] I am the co-Founder of Market Motive Inc and
the Analysis Evangelist for Google. I do not have financial or
equity or any other stake in any company mentioned in this
blog post (except Google). None of these tools vendors have
any relationship with Market Motive either. They are on this
list because IMHO they provide value and are better than their
competition. [/Disclosure]
Before we jump into tools, a few key bits of context; after
all, context is queen!
First Bit Of Context. . . Web Analysis 2.0.
This blog post is about web Analysis 2.0. Not just clickstream
analysis.
As defined in my second book Web Analysis 2.0 is:
1. the analysis of qualitative and quantitative data from your
website and the competition,
2. to drive a continual improvement of the online experience
of your customers and prospects,
3. which translates into your desired outcomes (online and
offline)
An expansive view of what it means to use data online, both
from the type of data perspective and the kind of desired
impact perspective.
Second Bit Of Context. . . Multiplicity.
Given the definition above, I am a firm believer
in Multiplicity.
Every single company, regardless of size, will require
multiple tools to understand the performance of its website,
happiness of its customers and glean key context from
competitors and ecosystem evolution.
The quest for a "single source of the truth" on the web is
futile.
Actually let me rephrase that. . . the quest for a single
tool/source to answer all your questions will ensure that your
business will end up in a ditch, and additionally ensure that
your career (from the Analyst to the web CMO) will be short-
lived.
Sorry.
You should know upfront that you are going to fail, often
spectacularly, if you don't embrace the fact that you have
many complicated questions to answer, from many different
sources.
To be an Analysis Ninja, and part of a successful web
business, embrace Web Analysis 2.0 and embrace
Multiplicity. Use a clickstream source when you have to,
switch to testing to move beyond HiPPO's and inferences
from click data, invite customers on a regular basis to share
feedback with you using surveys and usability studies, and poke and
prod your competitors' and ecosystem performance to know
what to do more of and what to do less of and what you have
been blind to.
Do that. Work hard. Win big. Rinse, repeat.
Third Bit Of Context. . . Don't Be Scared: Prioritize.
Many people get really scared and run for the hills when they
first put Web Analysis 2.0 and Multiplicity together.
Don't be.
Depending on the size of your company (translation:
resources available and what's impactful and doable) here is
the priority order that I recommend for you to execute your
web Analysis tools strategy right. . .
Not everybody should do everything in the same order. In my
humble experience the above order works best for small,
medium and large sized companies.
The result of going in a specific order is that this does not
have to all be done overnight. You can take your time and
evolve over time.
For more on why I recommend this specific order please see
my second book, Web Analysis 2.0, which many of you
already have.
Fourth Bit Of Context. . . The 10/90 Rule!
I can't ever talk about tools without reminding you of
my 10/90 rule for magnificent success in web Analysis.
First presented at an eMetrics summit in 2005 the 10/90 rule
was borne out of my observations of why most companies fail
miserably at web Analysis.
Put simply it states:
If you have a budget of $100 to make smart decisions about
your websites… invest $10 in tools and vendor
implementation and spend $90 on Analysts with big brains.
Summary: It's the people.
You may not go with precisely 90, that is ok. But overinvest
in people and everything that is required to make those people
successful: invest in process, invest in their training, invest in
large monitors for them, invest in backing them up against
senior management, invest in involving them in key business
strategy meetings, invest in… you catch my drift.
The coolest tools, the really expensive tools, will deliver
diddly squat for your business. They'll simply puke data faster
and, if you implement them right, more efficiently.
It's your investment in the 90 that will deliver glory.
People matter.
With those minor caveats, and what it takes to be successful
refreshers, I am really excited to tell you all about tools!
:)
The Best Web Analysis 2.0 Tools For Maximum
Awesomeness!
Let us break this list into the components of Web Analysis 2.0
so you have some reference as to where each item fits (and
this will also make it easier for you to pick tools for the
priority order referenced in Context #3 above).
Clickstream Analysis Tools [The "What"]
To many people the clickstream world is all there is to the
web Analysis world. It is without a doubt the largest source of
data you'll access.
There are hundreds (I kid you not) of clickstream tools. I
recommend you keep your life on the straight and narrow and
pick one, just one (!), of these three tools:
~ Yahoo! Web Analysis
~ Google Analysis
~ Piwik
Yahoo! and Google provide world class web Analysis tools
for free.
Custom reporting, advanced segmentation, advanced rich
media tracking, auto-integration with search engine PPC
campaigns, advanced mathematical intelligence, algorithmic
data sorting options, complete ecommerce tracking, super
scalable sophisticated data capture methods such as custom
variables, open free and full API access to the data, loads and
loads and loads of developer applications to do cool data
visualizations, data transformations, external data integrations
and more. I am forgetting the other 25 features these tools
provide for free.
Additionally if you look at the massive progress these two
tools have made in the last 24 months there is hardly anything,
more like _nothing_, they can't do that other vendors, free or
paid, can do.
There would have to be an overwhelming preponderance of
evidence showing that your company is magnificently unique,
extremely special and with such incredibly uncommon needs
that you need to go with any other clickstream tool (including
paid clickstream tools from Omniture, CoreMetrics, Unica,
WebTrends or anyone else).
If you have never done web analysis, start with one of these
two.
If you have always done analysis and only use clickstream
tools like Site Catalyst or Coremetrics Analysis or WebTrends
Analysis then switch to one of these two tools and invest the
money in Analysts (and wait just a couple months for your
mind to be blown by valuable insights).
This is not to say paid web Analysis tools (that do more than
just clickstream analysis) don't provide value.
If after rigorous analysis you have determined that you have
evolved to a stage that you need a data warehouse then you
are out of luck with Yahoo! and Google, get a paid solution. If
you can show ROI on a DW it would be a good use of your
money to go with Omniture Discover, WebTrends Data Mart,
Coremetrics Explore.
If you have evolved to a stage that you need behavior
targeting then get Omniture Test and Target or Sitespect.
Good use of your money.
Etc etc.
Spending money on the base solutions from paid vendors is a
very poor use of your money.
IMPORTANT: Many people think it is hard to get the free
Yahoo! Web Analysis. Not true. There are three specific
ways to get Yahoo! Web Analysis. Read this: How do I get a
Yahoo! Web Analysis account?
If you are technically oriented, don't trust either Yahoo! or
Google and up for an adventure I highly recommend you
consider using Piwik.
It is a wonderful solution. It has been constantly updated in
the two years I have watched it. Piwik provides you plenty of
capability to explore your inner technical unicorn while
allowing you to answer business questions.
Three tools. Pick one. Move on with your analytical lives.
Move from a data collection obsession and develop a crush on
data analysis.
Business Objectives:
This is the answer to the question: "Why does your website
exist?"
Or: "What are you hoping to accomplish for your business by
being on the web?"
Or: "What are the three most important priorities for your
site?"
Or other questions like that.
Without a clearly defined list of business objectives you are
doomed, because if you don't know where you are going then
any road will take you there.
The objectives must be DUMB: Doable. Understandable.
Manageable. Beneficial.
90% of the failures in web Analysis, the reasons companies
are data rich and information poor, is because they don't have
DUMB objectives.
Or they have just one (DUMB) Macro Conversion defined
and completely ignore the Micro Conversions and Economic
Value.
Your company leadership (small business or fortune 100) will
help you identify business objectives for your online
existence. Beg, threaten, embarrass, sleep with someone, do
what you have to get them defined.
Point of confusion: People, like me, often also use the
term Desirable Outcomes to refer to business objectives. They
are one and the same thing.
[Full disclosure: Depending on the specificity of your
business objectives, my asking you for your "desirable
outcomes" could refer to "what are your goals". See below.]
Goals:
Goals are specific strategies you'll leverage to accomplish
your business objectives.
Business objectives can be quite strategic and high level. Sell
more stuff. Create happy customers. Improve marketing
effectiveness.
Goals are the next level drill down.
It goes something like this. . .
Sell more stuff really means we have to:
1. do x
2. improve y
3. reduce z
Improve marketing effectiveness might translate into these
goals because currently they are our priorities:
1. identify broken things in m
2. figure out how to do n
3. experiment with p type of campaigns
Get it?
The beauty of goals is that they reflect specific strategies.
They are really DUMB. They are priorities. They are actually
things almost everyone in the company will understand as
soon as you say them.
I would not have included the step of identifying Goals were
it not for the fact that almost every C-level executive, every
VP and SVP, gives very high-level, nearly impossible-to-pin-
down business objectives.
Point of confusion: Many web Analysis tools, like Google
Analysis, have a feature that encourages you to
measure Goals. Like so. . .
It is possible that some Analysis Tool Goals directly measure
your business objectives or goals. Usually though Analysis
Tool Goals do not rise to the strategic importance so as to
measure either your business objectives or your goals.
For example, only one of the above, Subscribers, is an actual
goal ("increase persistent reach") for me that lines up directly
with a business objective ("effective permission marketing").
Others are nice to know.
So to be clear: Just because you have Goals in your Analysis
tool defined is not a sure sign that you know what your
business objectives or goals are.
Before you touch the data make sure your business objectives
(usually 3, or 5 max) are clearly identified and you have
drilled down to really DUMB goals!
Metric:
A metric is a number.
That is the simplest way to think about it.
Technically a metric can be a Count (a total) or a Ratio (a
division of one number by another).
Examples of metrics that are a Count are Visits or Pageviews.
Examples of a Ratio are Conversion Rate (a quantitative
metric) or Task Completion Rate (a qualitative metric).
This is a crude way to think about it but. . . Metrics almost
always appear in columns in a report / excel spreadsheet.
This is what metrics look like in your web Analysis tool:
Metrics form the life blood of all the measurement we do.
They are the reason we call the web the most accountable
channel on the planet.
Key Performance Indicator:
Key performance indicators (KPI's) are metrics. But not
normal metrics. They are our BFF's.
Here is the definition of a KPI that is on Page 37 of Web
Analysis 2.0:
A key performance indicator (KPI) is a metric that helps you
understand how you are doing against your objectives.
That last word – objectives – is critical to something being
called a KPI, which is also why KPI's tend to be unique to
each company.
I run www.bestbuy.com. My business objective is to sell lots
of stuff. My web Analysis KPI is: Average Order Size.
Business objective: Sell Stuff. KPI: Average Order Size.
I might use other metrics in my reports, say Visits or # of
Videos Watched or whatever. But they won't be my KPI's.
Makes sense? No? Ok one more. . .
I run www.nytimes.com. My business objective is to make
money. One of my KPI's is: Visitor Loyalty (number of visits
to the site by the same person in a month) and another one is #
of clicks on banner ads.
So one thing should be pretty clear to you by now. . . if you
don't have business objectives (from your HiPPO's) clearly
defined, you can't identify what your KPI's are.
No matter how metrics rich you are. You'll be information
poor. Forever. So. Don't be.
Business Objectives -> Goals -> KPI's -> Metrics -> Magic.
Targets:
Targets are numerical values you have pre-determined as
indicators of success or failure.
It is rare, even with the best intentions, that you'll create
targets for all the metrics you'll report on.
Yet it is critical that you create targets for each web Analysis
key performance indicator.
I am still at Best Buy. My KPI is still Average Order Value.
But how do I know what's good or bad?
I'll consult with my finance team. I'll confab with my
Assistant Senior Vice President for American Online Sales.
I'll look over my historical performance.
Through this consultative process we'll create a 2010 AOV
target of $95.
Now when I do analysis of my performance (not just in
aggregate but segmented by geo and campaign and source
and…) I'll know if our results are good or bad or ugly.
I will do this for every single KPI whose responsibility is
thrust on me.
You can create targets for the quarter (Christmas!) or for the
year or to any drill down level of specificity. But at least have
one overall target for each KPI.
Business Objectives -> Goals -> KPIs -> Metrics -> Targets -
> Minor Orgasms.
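Continuing the Best Buy example in the author's own terms, here is a tiny sketch of checking the Average Order Value KPI against the $95 target; the order amounts are invented for illustration.

```python
# Hypothetical order revenues for the reporting period.
order_values = [120.00, 35.50, 89.99, 210.00, 47.25]

average_order_value = sum(order_values) / len(order_values)  # the KPI
target = 95.00                                               # the agreed target

print(f"AOV: ${average_order_value:.2f} (target ${target:.2f})")
print("On target" if average_order_value >= target else "Below target")
```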
Dimension:
A dimension is, typically, an attribute of the Visitor to your
website.
Here's a simplistic pictorial representation. . .
The source that someone came from (referring urls,
campaigns, countries etc) is a dimension in your web Analysis
data.
So is technical information like browsers or mobile phones or
(god save you if you are still doing daily reports on) screen
resolution or ISP used.
The activity a person performed such as the landing page
name, the subsequent pages they saw, videos they played,
searches they did on your website and the products they
purchased are all dimensions.
Finally the day they visited, the days since their last visit (if
returning visitor) the number of visits they made, the number
of pages they saw are all dimensions as well. I know, I know,
they sound like metrics. But they are, as the definition says up
top, attributes of the visitor and their activity on your website.
This is a crude way to think about it but… Dimensions almost
always appear in rows in a report / excel spreadsheet.
Here are the metrics and dimensions in one of my
favorite Yahoo! Web Analysis reports; it shows me how many
clicks it takes for visitors to get to content I consider
valuable.
Columns and rows. Get it?
Let's solidify this with another example of a report that shows
metrics and dimensions. This report might not come to your
mind most easily. I am looking at the internal site searches (on
this blog) and the continent from where the search is done and
a set of metrics to judge performance. . .
Dimensions allow you to group your data into different
buckets and they are most frequently used to slice and dice the
web Analysis data.
In your web Analysis tools you'll bump into dimensions when
you are either creating custom reports (love this!) or
doing advanced segmentation (worship this!). The chooser
thingys look like this. . .
In Yahoo! Web Analysis they are called "Groups" or "Group
Selection" but they are the same thing: Dimensions.
There are many long and complicated definitions of
dimensions. There are some nuances that I have simplified.
But I hope that this definition and explanation helps you
internalize this key concept in web Analysis.
Segments:
A segment contains a group of rows from one or more
dimensions.
In aggregate almost all data is useless (like # of Visits). The
best way to find insights is to segment the data using one or
more dimensions (like # of Visits from: USA, UK, India as a
% of All Visits).
You segment by dimensions and report by metrics.
Here are some examples of segments I use in my Google
Analysis account:
Check out the dimensions I am using to segment my website
traffic to understand performance better.
Analyzing people just from North Carolina (because there
was an ad campaign targeted just to NC)
People who spend more than one minute on the site
People who click on the link to go to Feed burner to sign
up for my RSS feed
People who come from images.google.com and smart
mobile phones
People who visit from one source, Wikipedia, AND only
one page on Wikipedia (the bounce rate article)
These are just a few of the 28 advanced segments I have
created in my Analysis profile.
And I am not even a real business.
Think of how many segments I would analyze to truly analyze
my Key Performance Indicators to understand causes of
success or failure of my Business Objectives!
The Analysis Ninja rallying cry: Segment or Die!
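A segment like "# of Visits from USA, UK, and India as a % of all Visits" boils down to simple arithmetic over one dimension; the visit counts below are invented for illustration.

```python
# Hypothetical visits broken down by the Country dimension.
visits_by_country = {"USA": 5200, "UK": 1800, "India": 1400, "Other": 3600}
total_visits = sum(visits_by_country.values())

# The segment: visits from USA, UK and India as a % of all visits.
for country in ("USA", "UK", "India"):
    share = visits_by_country[country] / total_visits
    print(f"{country}: {visits_by_country[country]:>5} visits  ({share:.1%} of all visits)")
```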
So now you know the seven most fundamental, yet critical,
things you need to know about online Analysis.
If you feel that you did not understand it all, please go back
and re-read it. You are very welcome to ask questions or for
clarification via comments. Whatever it takes, make sure you
are able to internalize this.
Let's move to the last step. . .
Web Analysis Measurement Framework
As promised I want to wrap up this post with a couple of
examples that pull this whole thing together. Let's say I am
responsible for the National Council of La Raza (a wonderful
organization I support). Here is how the measurement
framework could possibly look for me.
Importance of Website Analysis
Keywords: Keywords are the vital element that plays an
important role in bringing traffic to the website. In case the
keywords are not targeting the potential customers on the web, this
certainly has a bad impact on traffic generation.
Through website analysis, professionals can find which
keywords are working well and which are not. The
analysis report shows the actual results for the keywords, from
which professionals get an idea of whether to replace the keywords
with new ones or to keep working on the same ones with a new strategy.
Traffic Volume: Through analysis, professionals easily
find the volume of traffic. They can find where
the traffic is actually coming from and where it is not. This
enables them to strengthen their practices and SEO
activities to obtain quality traffic from all across the world.
Once the keywords start performing well, this certainly boosts
the traffic volume. Thus, to improve the traffic volume,
professionals focus on their keywords and various other
SEO activities.
Web Content: Apart from keywords and traffic, the
content of the website also matters a lot. It helps
potential visitors easily connect with the brand and product.
Quality website content helps professionals
maintain the brand identity and beat their competitors through
higher page rank and traffic generated from search engines.
Quality, appealing web content enables the brand to
profit greatly from SEO and empowers the brand to have a
strong presence on the web.
Traffic on website
Total visits in a day
Percentage of visitors from outside sources
New visitors coming every day
Ranking of the website
Performance on search engines