Search Engine Optimization
ROSS A. MALAGA
Management and Information Systems, School
of Business, Montclair State University, Montclair,
New Jersey, USA
Abstract
Today the first stop for many people looking for information or to make a
purchase online is one of the major search engines. So appearing toward the
top of the search results has become increasingly important. Search engine
optimization (SEO) is a process that manipulates Web site characteristics and
incoming links to improve a site’s ranking in the search engines for particular
search terms. This chapter provides a detailed discussion of the SEO process.
SEO methods that stay within the guidelines laid out by the major search engines
are generally termed ‘‘white hat,’’ while those that violate the guidelines are
called ‘‘black hat.’’ Black hat sites may be penalized or banned by the search
engines. However, many of the tools and techniques used by ‘‘black hat’’
optimizers may also be helpful in ‘‘white hat’’ SEO campaigns. Black hat
SEO approaches are examined and compared with white hat methods.
1. Introduction
2. Background
2.1. Search Engines History and Current Statistics
2.2. SEO Concepts
3. The SEO Process
3.1. Keyword Research
3.2. Indexing
3.3. On-Site Optimization
3.4. Link Building
1. Introduction
The past few years have seen a tremendous growth in the area of search engine
marketing (SEM). SEM includes paid search engine advertising and search engine
optimization (SEO). According to the Search Engine Marketing Professional
Organization (SEMPO), search engine marketers spent over $13.4 billion in 2008.
In addition, this figure is expected to grow to over $26 billion by 2013. Of the
$13.4 billion spent on SEM, about 10% ($1.4 billion) was spent on SEO [1].
Paid advertisements are the small, usually text-based, ads that appear alongside the
query results on search engine sites (see Fig. 1). Paid search engine advertising usually
works on a pay-per-click (PPC) basis. SEO is a process that seeks to achieve a high
ranking in the search engine results for certain search words or phrases. The main
difference between SEO and PPC is that with PPC, the merchant pays for every click.
With SEO each click is free (but the Web site owner may pay a considerable amount to
achieve the high ranking). In addition, recent research has shown that users trust the
SEO (called organic) results and are more likely to purchase from them [2].
Industry research indicates that most search engine users only click on sites that
appear on the first page of the search results—basically the top 10 results. Very
few users click beyond the third page of search results [3]. These results confirm
the research conducted by Granka et al. [4], who found that almost 80% of
the clicks on a search engine results page went to the sites listed in the first
three spots.
SEO has become a very big business. Some of the top optimizers and SEO firms
regularly charge $20,000 or more per month for ongoing optimization. It is not
uncommon for firms with large clients to charge them $150,000 or more on a
monthly basis [5].
Because of the importance of high search engine rankings and the profits
involved, search engine optimizers look for tools, methods, and techniques that
will help them achieve their goals. Some focus their efforts on methods aimed at
fooling the search engines. These optimizers are considered ‘‘black hat,’’ while
those that closely follow the search engine guidelines would be considered ‘‘white
hat.’’ There are two main reasons why it is important to understand the methods
employed by black hat optimizers. First, some black hats have proven successful in
achieving high rankings. When these rankings are achieved, it means that white hat
sites are pushed lower in the search results. However, in some cases these rankings
might prove fleeting and there are mechanisms in place to report such sites to the
search engines. Second, some of the tools and methods used by black hat optimizers
can actually be used by white hat optimizers. In many cases, it is just a matter of
scope and scale that separates black and white hat.
While there are some studies dealing with SEO, notably Refs. [6–9], academic
research in the area of SEO has been relatively scant given its importance in the
online marketing field. This chapter combines the academic work with the extensive
practitioner information. Much of that information comes in the form of blogs,
forum discussions, anecdotes, and Web sites.
The remainder of this chapter proceeds as follows. Section 2 provides a back-
ground on search engines in general and basic SEO concepts. After that a detailed
discussion on the SEO process including keyword research, indexing, on-site
factors, and linking ensues. The section that follows focuses on black hat SEO techni-
ques. Legal and ethical implications of SEO are then discussed. Finally, implications for
management, conclusions, and future research directions are detailed.
2. Background
A search engine is simply a database of Web pages, a method for finding Web
pages and indexing them, and a way to search the database. Search engines rely on
spiders—software that follows hyperlinks—to find new Web pages to index and
ensure that pages that have already been indexed are kept up to date.
Although more complex searches are possible, most Web users conduct simple
searches on a keyword or key phrase. Search engines return the results of a search
based on a number of factors. All of the major search engines consider the relevance
of the search term to sites in its database when returning search results. So, a search
for the word ‘‘car’’ would return Web pages that had something to do with auto-
mobiles. The exact algorithms used to determine relevance are constantly changing
and are a trade secret.
Google is currently the dominant search engine; the other two major search engines
in the United States are Yahoo and MSN. Combined, these three search engines are
responsible for over 91% of all searches [10]. It should be noted that at the time this
chapter was written, Microsoft had just released Bing.com as its main search engine.
The dominance of the three major search engines (and Google in particular)
combined with the research on user habits means that for any particular search term,
a site must appear in the top 30 spots on at least one of the search engines or it is
effectively invisible. So, for a given term, for example ‘‘toyota corolla,’’ there are
only 90 spots available overall. In addition, 30 of those spots (the top 10 in each search
engine) are highly coveted and the top 10 spots in Google are extremely important.
In general, the process of SEO can be broken into four main steps: (1) keyword
research, (2) indexing, (3) on-site optimization, and (4) off-site optimization.
3.1 Keyword Research

Suppose, for example, that a Toyota dealer in the New York City area wants to attract
buyers interested in the Prius. Possible search terms include:

- Prius
- Toyota Prius
- New Toyota Prius
- Toyota Prius New York City
- Toyota Prius NYC
- NYC Toyota Prius
It is easy to see that this list can keep going. In terms of SEO, which term or terms
should we try to optimize our site for?
Keyword research consists of building a large list of relevant search words and
phrases and then comparing them along three main dimensions. First, we need to
consider the number of people who are using the term in the search engines. After all,
why optimize for a term that nobody (or very few people) use? Fortunately, Google
now makes search volume data available via its external keyword tool (available at
https://adwords.google.com/select/KeywordToolExternal). Simply type the main
keywords and terms and click on Get Keyword Ideas. Google will generate a large
list of relevant terms and provide the approximate average search volume (see Fig. 2).
FIG. 2. Google external keyword tool search volume results for hybrid car.
Clearly, we are looking for terms with a comparatively high search volume.
So, for example, we can start building a keyword list with:
- Toyota Prius—388,000
- Hybrid car—165,000
- Hybrid vehicle—33,100
- Hybrid vehicles—60,500
- Hybrid autos—4400
Many search engine optimizers also consider simple misspellings. For example,
we can add the following to our list:
- Hybird—27,100
- Hybird cars—2900
- Pruis—9900
Once we have generated a large list of keywords and phrases (most optimizers
generate lists with thousands of terms), the second phase is to determine the level of
competition for each term. To determine the level of competition, simply type the
term into Google and see how many results are reported in the top right part of the
page (see Fig. 3).
To compare keyword competition, optimizers compute the results to search
(R/S) ratio. The R/S ratio is calculated by simply dividing the number of results
(competitors) by the number of searches over a given period of time. On this scale
lower numbers are better. So, we might end up with a list like that in Table I.
Comparing R/S ratios is more effective than just looking at how many people are
searching for a particular word or phrase as it incorporates the level of competition.
In general, optimizers want to target terms that are highly searched and have a low
level of competition. However, the R/S ratio can reveal terms that have a relatively
low level of searches, but also a very low level of competition. For instance, Table I
(Search Results and Competition) shows that the misspelled word ‘‘hybird’’ has a lower
search volume than many of the other terms. However, when the competition is also
considered via the R/S ratio, the misspelled word appears to be a good potential target
for SEO.
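The R/S comparison is easy to script. The following Python sketch uses the search
volumes quoted above, but the competing result counts are illustrative placeholders
rather than actual figures from the search engines:

# Compare keywords by their results-to-search (R/S) ratio.
# Lower values mean fewer competing pages per searcher.
keywords = {
    # term: (approximate monthly searches, competing results reported by the engine)
    "toyota prius": (388_000, 24_500_000),   # result counts are made-up placeholders
    "hybrid car": (165_000, 12_300_000),
    "hybird": (27_100, 150_000),
}

def rs_ratio(searches, results):
    return results / searches

for term, (searches, results) in sorted(keywords.items(),
                                        key=lambda item: rs_ratio(*item[1])):
    print(f"{term}: R/S = {rs_ratio(searches, results):,.1f}")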
The third factor to consider, at least in most cases, is the commercial viability of
the term. To determine commercial viability we must understand a bit about
consumer buying behavior. The traditional consumer purchase model consists of
five phases: (1) need recognition, (2) information search, (3) option evaluation,
(4) purchase decision, and (5) postpurchase behavior.
Once a potential consumer becomes aware of a need she begins to search for more
information about how to fulfill that need. For example, a person who becomes
aware of a need for a car would begin by gathering some general information.
For example, she might research SUVs, trucks, sedans, etc. At this point the
consumer does not even know what type of vehicle she wants. She might use search
terms like ‘‘how to buy a car’’ or ‘‘types of cars.’’ Since the consumer does not
know what type of car she wants at this point, these terms would be considered to
have low commercial viability.
In the next phase, the consumer begins to narrow down the choices and evaluate
the various options. Some exemplar search terms in this phase might include ‘‘car
review,’’ ‘‘SUV recommendation,’’ and ‘‘best cars.’’ These terms have a bit more
commercial viability, but would still not be considered highly viable.
During the fourth phase, consumers have made a choice and are now just looking
for where to purchase—comparing options like price, warranties, service, trust, etc.
At this point the search terms become much more specific. For example, the
consumer might use terms like ‘‘Toyota Prius 2009,’’ ‘‘prius best price,’’ and
‘‘new jersey Toyota dealer.’’ Since the consumer is ready to purchase, these terms
are considered to have high commercial viability.
A good optimizer will actually target multiple terms—some for the site’s home-
page and some for the internal pages of the site. For instance, the site for a Toyota
dealer in Montclair New Jersey might use ‘‘New Jersey Toyota Dealer’’ as the main
SEO target for the homepage. The same site might use ‘‘Toyota Prius 2009 Best
Price’’ for an internal page that lists the features of that car and the details of the
vehicles on the lot.
Clearly, determining commercial viability is a combination of art and science.
It requires the optimizer to think like a consumer. Microsoft researchers have
conducted research into this area [13]. They have broken search into three main
categories: navigational, informational, and transactional. In addition, queries are
also categorized as either commercial or noncommercial based on the nature of the
search term used, resulting in the 3 × 2 grid shown in Table II. For example, terms
that include words such as ‘‘buy,’’ ‘‘purchase,’’ or ‘‘price’’ would be considered
commercial in nature. The researchers determined the categorization of commercial
and noncommercial by asking human reviewers to rate various terms along those
dimensions. Obviously, this approach has serious limitations. However, Microsoft
has developed a Detecting Commercial Online Intent tool, which is available at
http://adlab.microsoft.com/Online-Commercial-Intention/. Many optimizers use
this site to gauge the commercial viability and search type of their keywords.
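The flavor of the commercial/noncommercial split can be captured with a simple
keyword heuristic, as in the Python sketch below. The marker list is purely
illustrative; it is not Microsoft's classifier, which relied on human ratings rather
than a fixed word list.

# A toy heuristic: treat a query as commercial if it contains a purchase-oriented marker.
COMMERCIAL_MARKERS = ("buy", "purchase", "price", "deal", "dealer", "discount")

def looks_commercial(query):
    q = query.lower()
    return any(marker in q for marker in COMMERCIAL_MARKERS)

print(looks_commercial("types of cars"))             # False
print(looks_commercial("toyota prius best price"))   # True
print(looks_commercial("new jersey toyota dealer"))  # True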
Finally, some optimizers have attempted to capture the consumer earlier in the
process—during the option evaluation phases in particular. This is typically accom-
plished by developing review and recommendation type sites. There are, to date, no
reliable data on how well these types of sites perform in terms of moving the visitor
from the information phase to the transactional phase.
3.2 Indexing
Indexing is the process of attracting the search engine spiders to a site, with the goal of
getting indexed (included in the search engine’s database) and hopefully ranked well by
the search engine quickly. All of the major search engines have a site submit form where
a user can submit a site for consideration. However, most SEO experts advise against
this approach. It appears that the major search engines prefer ‘‘discovering’’ a new site.
The search engines ‘‘discover’’ a new site when the spiders find a link to that site from
other sites. So the main approach to indexing involves getting links to a site from other
sites that are frequently visited by the spiders.
An increasingly popular approach to generating quick links to a new site is via
Web 2.0 properties. The term Web 2.0 seems to have been coined by Tim O’Reilly
whose O’Reilly Media sponsored a Web 2.0 conference in 2004 [14]. There does not
appear to be any standard definition of Web 2.0 as the concept is continually
evolving. However, Web 2.0 incorporates concepts such as user-developed content
(blogs, wikis, etc.), social bookmarking, and the use of really simple syndication
(RSS). For the purposes of indexing, user-developed content and social bookmark-
ing sites are key.
User-developed content sites enable users to quickly and easily publish written,
audio, and/or video content online. Many content sites employ a simple user
interface for text input that in many ways mimics traditional word processing
software. Most of these sites allow users to upload pictures and embed multimedia
content (audio and video). Among the most widely used Web 2.0 content sites are
blogs and wikis from multiple providers, and sites that allow users to create simple
Web sites (e.g., Squidoo and Hubpages).
A blog (short for Weblog) is simply a Web site where a user can post comments
and the comments are typically displayed in reverse chronological order. The
comments can range broadly from political commentary to product reviews to
simple online diaries. Modern blog software can also handle pictures, audio files,
and video files. The blog tracking site Technorati had recorded over 112 million
blogs in its system as of November 2007 [15].
A wiki is a type of software that allows users to create a Web site in a collabora-
tive manner. Wikis use a simple markup language for page creation. The most well-
known wiki site is Wikipedia (www.wikipedia.org)—an online encyclopedia to
which anyone can contribute. In fact, the site has over 9 million articles and more
than 75,000 contributors. Personal wikis are easy to create on a user’s own domain
or on free wiki sites, such as Wetpaint.com.
According to Wikipedia, ‘‘social bookmarking is a way for Internet users to store,
organize, share, and search bookmarks of web pages’’ [16]. These bookmarks are
typically public, meaning anyone can see them. Bookmarks are usually categorized
and also tagged. Tagging allows the user to associate a bookmark with any chosen
words or phrases. Tags are completely created by the user, not the social book-
marking site. Some of the more popular social bookmarking sites include Del.icio.us,
Digg, and StumbleUpon.
An optimizer who wants to get a new site indexed quickly can build one or more
Web 2.0 content sites and include links to the new site. The optimizer can also use
social bookmarking to bookmark the new site. Since the search engine spiders visit
these sites frequently, they will find and follow the links to the new site and index
them. This approach has an added benefit, in that the content site or social bookmark
itself might rank for the chosen term. If done correctly a good optimizer can
dominate the top positions in the search engines using a combination of their own
sites, Web 2.0 content sites, and social bookmarks.
Malaga [17] reports on using Web 2.0 SEO techniques to dominate the search
engine results pages. In one experiment the researcher was able to get sites indexed
in less than a day. In addition, after only a few days the researcher was able to obtain
two top 10 rankings on Google and five top 30 rankings. The results on Yahoo and
MSN were similar.
3.3 On-Site Optimization

The title tag is one of the most important on-site factors, and site owners often make
two mistakes with it. First, they do not include the targeted keyword term. Second,
they use the same title tag for every page on the site. If we consider the
example of the Toyota dealer the homepage might have a title tag of ‘‘Montclair
New Jersey Toyota Dealer.’’ However, we would want the internal pages to have
their own title tags. For example, the Prius page might use ‘‘Toyota Prius’’ for the
title and the Corolla page would use ‘‘Toyota Corolla.’’
The description meta tag is used to explain what the Web page is about. It is
usually a short sentence or two consisting of about 25–30 words. Again, the search
engines use the description tag in two ways. First, it appears that at least some of the
search engines use the description tag as part of their ranking algorithm. Second,
some of the search engines will display the contents of the description tag below the
site link in the search engine results page (see Fig. 4).
The keyword meta tag is simply a list of words or phrases that relate to the Web
page. For example, terms that might be used on the Toyota Prius page are ‘‘prius,’’
’’Toyota prius,’’ and ‘‘hybrid car.’’ The list of keywords can be quite long.
It appears that the major search engines do not give the keyword tag much or any
weight in determining a site’s rank. But some anecdotal evidence exists to support
the notion that Yahoo still uses this tag in its ranking algorithm.
The major search engines will index each Web page they find and will follow all of
the links on that page. There are some circumstances in which we might not want the
search engines to index a page or follow the links on it. For example, we might not
want a privacy policy or terms of use page to be indexed by the search engines. In this
case the optimizer can use a robot tag set to ‘‘noindex.’’ The major search engines will
still crawl the site, but most will not include the page in their database. There are some
cases where we might not want the search engines to follow the links on a particular
page. Including sponsored links on a page would be a good example. In this case the
optimizer would use a robot tag set to ‘‘nofollow.’’ These terms can be combined so
that a page is not indexed and the links on the page are also not followed. It should
be noted that while the major search engines currently abide by ‘‘noindex’’ and
‘‘nofollow’’ statements, they might change their policies at any time.
Optimizers and Web developers often handle robots restrictions at the domain
level. They can include a file named robots.txt in the root directory. That file lists
directories and files that should and should not be followed or indexed.
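For example, a minimal robots.txt for the hybrid car blog used in the examples below
might look like the following (the directory and file names are assumptions for
illustration):

User-agent: *
Disallow: /sponsors/
Disallow: /privacy-policy.html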
If we pull all meta tag information together, we might wind up with the following
example for a Prius review on a hybrid car blog that is targeting the term ‘‘prius car
review’’:
<head>
<title>Prius Car Review</title>
<meta name="description" content="Unbiased Prius Car Review
from the Hybrid Car Experts"/>
<meta name="keywords" content="prius, toyota prius, hybrid car
review, hybrid truck review, hybrid SUV review, hybrid cars"/>
</head>
If the same blog also contained a page with sponsored links that the author did not
want the search engines to index or follow the head might appear as follows:
<head>
<title>Hybrid Car Review--Sponsors</title>
<meta name="description" content="Links to interesting sites
about hybrid cars"/>
<meta name="keywords" content="hybrid car review, hybrid truck
review, hybrid SUV review, hybrid cars, hybrid car blog, hybrid car
links"/>
<meta name="robots" content="noindex, nofollow">
</head>
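The elements discussed above can also be checked programmatically. The following
Python sketch, which uses only the standard library, pulls the title and meta tags out
of a saved page; the file name is a placeholder:

from html.parser import HTMLParser

class HeadAudit(HTMLParser):
    # Collects the title tag and any named meta tags from an HTML document.
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"].lower()] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

audit = HeadAudit()
with open("page.html", encoding="utf-8") as f:   # placeholder file name
    audit.feed(f.read())
print("title:", audit.title)
print("description:", audit.meta.get("description", "(missing)"))
print("robots:", audit.meta.get("robots", "(default: index, follow)"))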
Determining what a Web page is actually about, and thus which queries it is relevant to, is a
complex task. A simple approach is to just match up the text on a Web site with the
query. The problem with this technique is that there are many ways to search for the
same thing. To overcome this problem, the major search engines now use latent
semantic indexing (LSI) to ‘‘understand’’ a Web page. Google’s Matt Cutts has
confirmed that Google uses LSI [19].
Latent semantic indexing (sometimes called latent semantic analysis) is a natural
language processing method that analyzes the pattern and distribution of words on a
page to develop a set of common concepts. Scott Deerwester, Susan Dumais, George
Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum, and Lynn Streeter
were granted a patent in 1989 (U.S. Patent 4,839,853) for latent semantic analysis.
A basic understanding of LSI is important in terms of writing for the search engines.
The technical details of LSI are beyond the scope of this chapter—the interested
reader is referred to Ref. [20]. However, a short example should aid in understanding
the overall concept. If we go back to our Toyota dealer we might expect its site to
include terms like car, truck, automobile, as well as names of cars (Corolla, Prius,
Highlander, etc.). To keep the example simple, let us assume that one page on our
Toyota dealer’s site might read ‘‘Toyota trucks are the most fuel efficient.’’ If a
search engine just used a simple matching approach then a query for ‘‘Toyota
Tacoma’’ would not return the page—since that exact term does not appear on the
page. However, using LSI the search engine is able to ‘‘understand’’ that the page is
about Toyota trucks and that a search for Toyota Tacoma should return the page.
It does this by comparing the site with all of the others that include terms such as
‘‘Toyota,’’ ‘‘truck,’’ and ‘‘Tacoma’’ and judges that it is likely that these terms are
related. The goal of LSI is to determine just how closely related the terms are. For
example, if the user searched on the term ‘‘Chevy Silverado’’ it should not return the
Toyota truck page. Although both pages are about ‘‘trucks,’’ using LSI the search
engine can determine that the relation among the terms is not very close.
It should be noted that Google probably does not actually compare pages with every
other page on the Web, as this would be extremely computationally intensive. However,
Google does attempt to determine the ‘‘theme’’ of individual pages and of sites.
Therefore, optimizers attempt to organize their sites based on the ‘‘themes’’ they are
attempting to optimize for. For example, if we were trying to optimize for ‘‘Toyota
trucks,’’ ‘‘Toyota sedans,’’ and ‘‘Toyota hybrids’’ each of these would become a page
on the dealer’s site. In addition, each page would be written around the ‘‘theme.’’
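The concept can be explored with off-the-shelf tools. The following Python sketch uses
scikit-learn to project a handful of made-up ‘‘pages’’ into a small concept space and
compare a query against them there; it illustrates the idea only and is not how Google
implements LSI.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

pages = [
    "toyota trucks are the most fuel efficient trucks",
    "the toyota tacoma is a popular toyota truck",
    "chevy silverado truck towing capacity",
    "toyota prius hybrid car fuel economy",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(pages)

svd = TruncatedSVD(n_components=2, random_state=0)   # the reduced "concept" space
page_vectors = svd.fit_transform(tfidf)

query_vector = svd.transform(vectorizer.transform(["toyota tacoma"]))

# Higher cosine similarity means the query and page are closer in the concept space
print(cosine_similarity(query_vector, page_vectors))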
However, some sites with no fresh content may still rank well. For example, Darren Rowse
writing on ProBlogger [22] discusses the case of a blog that still ranked third on
Google for the term ‘‘digital photography’’ 9 months after any new content was
added. In addition, this author had a site that ranked first on Google for ‘‘discount
ipod accessories’’ 2 years after any new content was added to the site. While the data
are scant and anecdotal, it appears that adding fresh content may help get a site
ranked well to begin, but does not appear to be necessary to maintain a high ranking.
3.3.2.3 Formatting. The content on the page is important for SEO, but
so is the formatting. Formatting gives the search engines clues as to what the Web
developer thinks are the important aspects of the page. Some of the formatting
components considered by the search engines include header tags (H1, H2, etc.),
bold, and emphasis.
Header tags are used to identify various sections of a page. Header 1 (H1) tags
delineate major sections on a page. Higher header numbers (H2, H3, etc.) identify
subsections. The major search engines consider text in an H1 HTML header to be of
increased importance [23]. Therefore, many optimizers place the main key terms
within an H1 tag structure. Lesser SEO terms might be placed in higher number
header tags (H2, H3, etc.). The bold and emphasis (em) tags are also used by
optimizers to indicate important text on a page.
Some early Web browsers, such as Lynx, were text based. That is, they could not
display any graphics. The <alt> tag was originally used to display information
about a Web-based image when displayed in a text-based browser. Today the
overwhelming majority of Web traffic is from graphical browsers (such as Internet
Explorer, Firefox, Opera, and Chrome). However, optimizers still use the <alt> tag
as another way of serving text content to the search engines. At one time search
engines appeared to make use of the text in an <alt> tag. According to Jerry West
[24], today the search engines not only do not use the <alt> tag in determining
rankings, but also may actually penalize sites that use, or overuse, the tag. The
<alt> tag remains important for people with disabilities, so it is good Web devel-
opment practice to include descriptive <alt> tags for all images.
3.4 Link Building

Yahoo and MSN use the number of backlinks in their algorithms. However,
Google places particular importance on backlinks. Google does not just consider
the number of links, but also the ‘‘quality’’ of those links. Google assigns each page
in its index a Page Rank (PR). PR is a logarithmic number scale from 0 to 10 (with
10 the best). Google places more weight on backlinks that come from higher
PR sites.
Exactly how PR is currently calculated is a closely held secret. However, we can
gain a basic understanding of the concept by examining the original PR formula
(U.S. Patent 6,285,999):
PR(A) = (1 - d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))

where PR is the Page Rank of a particular Web page, t1, ..., tn are the Web pages that link
to page A, C(ti) is the number of outgoing links on page ti, and d is a damping factor.
Clearly, determining PR is an iterative process. Also, each page provides a portion
of its PR to each page it links to. So, from an SEO perspective we want incoming links
from sites with a high PR. In addition, it is beneficial to obtain links from pages that
have fewer outgoing links. Consider the example in Fig. 5 (adapted from http://
www.whitelines.nl/html/google-page-rank.html).
In this example, the entire Web consists of only three pages (A, B, and C).
In addition, the only links are the ones indicated. Each page has its PR set initially
to 1.0. An initial PR needs to be assumed for each page. However, after 15 iterations
the actual PR emerges and can be seen in Fig. 6.
FIG. 5. Initial state: PR(A) = PR(B) = PR(C) = 1.0.
FIG. 6. After 15 iterations: PR(A) = 1.164, PR(B) = 0.644, PR(C) = 1.192.
FIG. 7. With the link from page B to page C set to NoFollow: PR(A) = 0.434, PR(B) = 0.335,
PR(C) = 0.335.

In addition to external sites, it must be noted that links from pages within a site
(internal links) also count toward PR. Therefore, optimizers must consider the
internal linking structure of the site. For example, many sites include links to privacy
policies, terms of service, about us, and other pages that do not allow visitors to take
commercial action. However, since links to these pages appear on every page in the
site they may obtain a high PR. Optimizers can attempt to manipulate PR by using
the Robots NoFollow meta tag.
For instance, assume that the above example represents pages on a site. If we use
NoFollow to essentially eliminate the link between page B and page C (note that the
link still exists, but does not count toward PR), we would wind up with the structure
shown in Fig. 7 after 20 iterations. In this example, adding the NoFollow leads to a
major change in PR for all of the pages on the site. In fact, it reduced the PR for all of
the pages—a very nonoptimal outcome!
Clearly, the result in this example is highly unlikely due to the circular link
structure on the site and lack of external links. But it does point out how an
inexperienced optimizer can run into problems when attempting to manipulate PR
on a site.
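The iterative calculation itself is straightforward to reproduce. The Python sketch
below assumes the commonly used damping factor of 0.85 and a link structure of A to B,
A to C, B to C, and C to A; the original figures are not reproduced here, so that
structure is an inference, although it matches the values reported above.

def pagerank(outlinks, d=0.85, iterations=20):
    # outlinks maps each page to the list of pages it links to (counted links only)
    pages = list(outlinks)
    pr = {p: 1.0 for p in pages}          # every page starts with PR = 1.0
    for _ in range(iterations):
        pr = {p: (1 - d) + d * sum(pr[t] / len(outlinks[t])
                                   for t in pages if p in outlinks[t])
              for p in pages}
    return pr

# Structure of Figs. 5 and 6: A links to B and C, B links to C, C links to A
print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))
# roughly A = 1.16, B = 0.64, C = 1.19

# With the B-to-C link set to NoFollow it no longer counts toward PR (Fig. 7)
print(pagerank({"A": ["B", "C"], "B": [], "C": ["A"]}))
# roughly A = 0.43, B = 0.33, C = 0.33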
Some of the basic tenets of black hat SEO include automated site creation by
using existing content and automated link building. While there is nothing wrong
with automation in general, black hat SEOs typically employ techniques which
violate the search engines’ Webmaster guidelines.
For example, both Google and Yahoo provide some guidance for Webmasters.
Since Google is the most widely used search engine and its guidance is the most
detailed, we will discuss its policies.
Google’s Webmaster guidelines (available at http://www.google.com/support/
webmasters/bin/answer.py?answer=35769) offer quality guidelines that are impor-
tant for SEO. Google states:
These quality guidelines cover the most common forms of deceptive or manipulative
behavior, but Google may respond negatively to other misleading practices not listed
here (e.g., tricking users by registering misspellings of well-known websites). It’s not
safe to assume that just because a specific deceptive technique isn’t included on this
page, Google approves of it. Webmasters who spend their energies upholding the spirit
of the basic principles will provide a much better user experience and subsequently
enjoy better ranking than those who spend their time looking for loopholes they can
exploit.
In general, Google would like Webmasters to develop pages primarily for users,
not search engines. Google suggests that when in doubt the Webmaster should ask
‘‘does this help my users?’’ and ‘‘would I do this if the search engines did not
exist?’’
The Google Quality Guidelines also outline a number of specific SEO tactics
which Google finds offensive. These tactics are discussed in detail in the sections
below.
The astute reader might now ask: what will Google (or any other search engine)
do if I violate its quality guidelines? Google has two levels of penalties for sites that
are in violation. Those sites that use the most egregious tactics are simply banned
from Google. For example, on February 7, 2006 Google banned BMW’s German
language site (www.bmw.de) for using a ‘‘doorway page.’’ This is a page that shows
different content to search engines and human visitors. Sites that use borderline
tactics may be penalized instead of banned. A penalty simply means that the site
loses ranking position.
We can use the penalty system to provide a working definition of black and white
hat SEO. In general, black hat SEO consists of methods that will most likely lead to
Google penalizing or banning the site at some point. White hat SEO consists of methods
that Google approves of and will therefore not lead to any penalty. There are some
techniques that are borderline and some that are generally white hat, but may be
overused. We might define optimizers that fall into this category as gray hat. Gray
hat techniques may lead to a penalty, but will not usually result in a ban.
If black hat strategies lead to a site ban, why do it? Black hats tend to fall into two
categories. First, as it typically takes a bit of time for Google to ban a site, there are
individuals who use this delay to temporarily achieve a top ranking and make a bit of
money from their site. As they use software to automate the site creation and ranking
process, they are able to churn out black hat sites.
The second category consists of SEO consulting firms that use black hat techni-
ques. These companies achieve a temporary high ranking for their clients, collect
their money, and move on. For example, according to Google employee Matt Cutts’
blog [25], the SEO consulting company Traffic Power was banned from the Google
index for using black hat strategies. In addition, Google also banned Traffic Power’s
clients.
Text that is heavily optimized for the search engines may not be conducive to good
site design or a high conversion rate (the rate at which site visitors perform a
monetizing action, such as making a purchase). For this reason, some black hat
techniques serve content to the search engines that human visitors never see. The three
main methods that fall into this category are keyword stuffing with hidden content,
cloaking, and doorway pages.
One approach is to place keyword-rich text in a division (div) whose visibility property
is set to ‘‘hidden.’’ None of the text in the division is then displayed to human visitors.
However, the text can be found and indexed by search engine spiders.
Small Division:
#hidetext
{
width:1px;
height:1px;
overflow:hidden;
}
<div id="hidetext">Toyota Prius</div>
In this example, the text in the division is theoretically visible to both humans and
search engines. However, the layer is so small (only 1 pixel in size) that it will likely
be completely overlooked by human visitors.
Positioning Content Off-Screen:
#hideleft {
position:absolute;
left:-1000px;
}
< div id¼"hideleft">Toyota Prius</div>
The content in this division is also theoretically visible to both humans and search
engines. In this case, however, the layer is positioned so far to the left (1000 pixels)
that it will not be seen by humans.
Hiding Layers Behind Other Layers:
< div style¼"position: absolute; width: 100px; height: 100px;
z-index: 1" id¼"hide">
This is the text we want to hide</div>
< div style¼"position: absolute; width: 100px; height: 100px;
z-index: 2; background-color: #FFFFFF" id¼"showthis">
This is the text we want to display
</div>
This code example uses the CSS z-index property to position one layer on
top of the other. The z-index provides a measure of three-dimensionality on a Web
page. A z-index of 1 is content that sits directly on the page. Z-index values greater than
1 refer to layers that are ‘‘coming out of the screen,’’ while z-index values less than 1 are
used for layers that sit further behind the screen. In the example above, the hidden text is
put in a layer with a z-index value of 1. The visible text appears in a layer with a z-index
value of 2. Due to the positioning, the second layer is aligned directly over the
first, which causes the text in the first layer to appear only to the search engines.
Google, for one, has begun removing content contained within hidden div tags
from its index. It has explicitly banned sites that use small divisions for keyword
stuffing purposes [26]. However, these policies may cause a problem for legitimate
Web site developers who use hidden divisions for design purposes. For example,
some developers use hidden CSS layers and z-index positioning to implement mouseover
multilevel menus (menus that expand when the user places their mouse over a
menu item) and other interactive effects on their sites.
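To see why such false positives are hard to avoid, consider the kind of crude pattern
matching a filter might apply to style sheets. The Python sketch below is a simplified
illustration only, not Google's actual detection logic.

import re

# Patterns associated with the hiding techniques shown above (illustrative only).
HIDING_PATTERNS = [
    r"visibility\s*:\s*hidden",
    r"width\s*:\s*1px",
    r"height\s*:\s*1px",
    r"left\s*:\s*-\d{3,}px",      # content pushed far off-screen
]

def suspicious_css(css):
    # Return the hiding patterns that appear in a CSS snippet.
    return [p for p in HIDING_PATTERNS if re.search(p, css, re.IGNORECASE)]

print(suspicious_css("#hideleft { position:absolute; left:-1000px; }"))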
The HTML NoScript tag is designed to provide alternative content for browsers that do
not run JavaScript. While NoScript has legitimate uses, it can also be abused by black hat optimizers.
Many use the NoScript tag for keyword stuffing. There is some good anecdotal
evidence to suggest that this tactic has helped some sites achieve a high ranking for
competitive terms [27].
4.3 Cloaking
As we have already seen, some of the tricks used in black hat SEO are not conducive to
a good visitor experience. Cloaking overcomes this problem. The main goal of cloaking
is to provide different content to the search engines and to human visitors. Since users
will not see a cloaked page, it can contain only optimized text—no design elements are
needed. So the black hat optimizer will set up a normal Web site and individual, text
only, pages for the search engines. The Internet protocol (IP) addresses of the search
engine spiders are well known. This allows the optimizer to include simple code on the
page that serves the appropriate content to either the spider or human (see Fig. 8).
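Conceptually, the server-side logic is simple, as the Python sketch below shows. The IP
prefix and file names are illustrative assumptions; real cloaking scripts maintain long
lists of known spider addresses. As discussed above, serving different content to spiders
violates the search engines' guidelines.

SPIDER_IP_PREFIXES = ("66.249.",)       # illustrative prefix only, not a complete list

def load_page(filename):
    with open(filename, encoding="utf-8") as f:
        return f.read()

def serve_page(visitor_ip):
    if visitor_ip.startswith(SPIDER_IP_PREFIXES):
        # text-only, keyword-optimized page for the spider
        return load_page("spider_version.html")
    # normally designed page for human visitors
    return load_page("human_version.html")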
Some black hat optimizers are taking the cloaking concept to the next level and
using it to optimize for each individual search engine. Since each search engine uses
a different algorithm, cloaking allows optimizers to serve specific content to each
different spider.
Since some types of cloaking actually may provide benefits to users, the concept
of cloaking and what is, and is not, acceptable by the search engines has evolved
FIG. 8. Cloaking: in response to an HTTP request, the Web server returns a keyword-
optimized page with no formatting to the search engine spiders and a normally formatted
Web page to human visitors.
over the past few years. One topic of much debate is the concept of geolocation.
Geolocation uses a visitor’s IP address to determine their physical location and
changes the site’s content accordingly. For instance, a site that sells baseball
memorabilia might use geolocation to direct people who live in the New York
City area to a Yankees page and those who live in the Boston area to a Red Sox page.
Clearly, geolocation allows site developers to provide more highly targeted content.
The main question is if the site serves different content to the search engines than to
most users, is it still considered cloaking? Maile Ohye [28] posting on the Google
Webmaster Central Blog chimed in on the controversy. According to Ohye as long as
the site treats the Google spider the same way as a visitor, by serving content that is
appropriate for the spider’s IP location, the site will not incur a penalty for cloaking.
FIG. 9. Doorway pages: several pages, each formatted for the search engine spiders and
optimized for a different keyword, funnel visitors to a single homepage formatted for
human visitors.

A doorway page is a keyword-optimized page created for the search engine spiders
rather than for people. Doorway pages traditionally used a fast meta
refresh to redirect users to the main page (see Fig. 9). A meta refresh is an HTML
command that automatically switches users to another page after a specified period
of time. Meta refresh is typically used on out of date Web pages—you often see
pages that state you will be taken to the new page after 5 s. A fast meta
refresh occurs almost instantly, so the user is not aware of it. All of the major search
engines now remove pages that contain meta refresh. Of course, the black hats have
fought back with a variety of other techniques, including the use of Javascript
for redirects. This is the specific technique that caused Google to ban bmw.de and
ricoh.de.
Blogs present another link building opportunity, and black hat optimizers take
advantage of them. Two ways that black hats use blogs to generate incoming links
are blog comment spamming and trackback spamming.
Blog comment spamming is similar to guest book spamming, in that the optimizer
leaves backlinks in the comments section of publicly available blogs. However,
many blogging systems have added features to handle comment spam. At the
simplest level, the blog owner can require that all comments receive his or her
approval before they appear on the site. Another simple technique is to use nofollow
tags on all comments.
Many blogging systems now have a simple plugin that will require the commenter
to complete a ‘‘Completely Automated Public Turing test to tell Computers and
Humans Apart’’ (CAPTCHA). A CAPTCHA system presents a visitor with an
obscured word, words, or phrase. The obscuring is usually achieved by warping
the words, distorting the background, or segmenting the word by adding lines. While
humans can see through the obscuring technique, computers cannot. Since compu-
ters cannot solve a CAPTCHA, systems that use it are usually not vulnerable to
automated spamming software. However, some CAPTCHA systems have been
hacked, allowing black hat optimizers to successfully bypass this countermeasure
on certain sites.
A trackback is a link between blogs that is created when one blog refers to or comments
on a post on another blog. When this occurs, the original blog will generate a link to the
blog that made the comment. For example, suppose a legitimate blog (blog A) publishes
a post and accepts trackbacks. A black hat optimizer ‘‘comments’’ on the post on his own
site, including his backlink, and then sends blog A a trackback ping (just a signal that
indicates that the optimizer had something to say about a post on blog A). Blog A will
then automatically display a summary of the comment provided by the black hat and a
link to the black hat’s site. Trackbacks bypass many of the methods that can be used
to handle comment spam. For this reason many blog systems no longer use
trackbacks.
Most serious black hat optimizers use scripts or software to find blogs that have
comment sections that do not require approval, do not use CAPTCHA, and do not
apply nofollow tags. Once the system builds a large list of the appropriate type of blogs it then
submits the backlink to all of them. Similar systems are available that can find blogs
that have trackbacks enabled and implement trackback spamming. Using blog
spamming systems a black hat optimizer can easily generate thousands, even tens
of thousands, of quick backlinks.
It should be noted that many white hat optimizers also use scripts or software to
automate their own blog comment campaigns. However, instead of taking a scatter
shot approach, as the black hats do, white hat optimizers submit useful comments to
relevant blogs. They use the automated tools to find those blogs and manage their
comment campaigns.
Referrer spam involves repeatedly requesting pages from other Web sites while passing
the URL of the site to be promoted as the HTTP referrer. If a site’s
statistics package publishes the referrer list the target site will appear as a backlink.
Like the other types of link spam discussed, there are scripts and software available
that will help the black hat find appropriate sites (those that publish referrer links)
and continuously request Web pages from them so that the referrer appears on the
top referrers list.
Clearly, a simple way around this type of link spam is to set stats packages so they
do not publish their results in a publicly accessible area. Site owners can also ban
specific IP addresses (or address ranges) or ignore requests from certain referrers.
However, these measures typically require at least some level of technical expertise,
which may be beyond most site owners.
Another way to build links quickly is simply to buy them. Since Google’s algorithm
gives links a great deal of weight, the company quickly caught on to these paid link
schemes and began to take action in 2005 [29]. Google appears to
have two ways it deals with paid links. First, its algorithm looks for paid links and will
penalize sites that sell them and those that purchase them. It attempts to find paid links in
a number of ways. It can, for example, look for links that follow text like ‘‘sponsored
links’’ or ‘‘paid links.’’ It also looks for links that appear out of place from a contextual
perspective. For example, a site about computers that contains links to casino sites
might be penalized. Second, Webmasters can now report sites they believe are involved
in buying or selling links. These sites then undergo a manual review.
The penalties for buying or selling links appear to vary. At the low end of the
penalty range, Google might simply discount or remove paid links when determin-
ing a site’s PR. Sites that sell links may lose their ability to flow PR to other sites.
Finally, Google can ban repeat or egregious violators from its index.
Of course, the companies that sell links have attempted to develop techniques
aimed at fooling Google. For example, some companies now sell links that appear
within the text of a site. These links appear natural to Google. In addition, some
companies specialize in placing full site or product reviews. Some bloggers, for
example, will provide a positive site or product review, with associated links, for a fee.
‘‘Google bowling’’ is the practice of using black hat techniques against competitors’
sites in an attempt to get them penalized or banned. If a black hat site is ranked third
for a key term, the optimizer who can get the top two sites banned will be ranked first.
There are a number of techniques that can be used for bowling. For instance, the
HTML injection approach discussed above can be used to change the content that
appears on a competitor’s site. If a black hat optimizer is targeting a site that sells
computers, for example, the HTML injected might be < H1>computer, computer,
computer. . . The extensive use of keywords over and over again is almost guaran-
teed to lead to a penalty or outright ban in all the major search engines.
A recent article in Forbes [30] discussed the tactic of getting thousands of quick
links to a site a black hat wants to bowl. Quickly piling up incoming links is viewed
in a negative light by most search engines.
Since this type of SEO is ethically questionable, most optimizers who conduct
such campaigns are required to sign nondisclosure agreements. Therefore, uncovering
actual cases where this approach has worked (produced a ban) is extremely difficult.
However, a competitor in a 2006 SEO competition believes he was inadvertently
bowled [31]. The goal of the competition was to achieve the best ranking for a made-
up term. The winner received $7000. One competitor offered to donate any winnings
to Celiac Disease Research. So many people began linking to the site that it quickly
started ranking well (typically in the top five on the major search engines). However,
over a period of a few months, as the number of incoming links kept increasing, the
site began to lose rank in Google (while maintaining rank on Yahoo and MSN). The
site was never completed bowled and eventually regained much of it rank.
According to the Forbes article, Google’s Matt Cutts states, ‘‘We try to be
mindful of when a technique can be abused and make our algorithm robust against
it. I won’t go out on a limb and say it’s impossible. But Google bowling is much
more inviting as an idea than it is in practice.’’
Obviously, the field of SEO raises important legal and ethical considerations. The
main legal concern is copyright infringement. Ethical considerations are more
complex as there are currently no standards or guidelines in the industry.
The use of content from another site without attribution is clearly a violation of
copyright law. However, on the Web copyright issues can become somewhat complex.
For example, Wikipedia (www.wikipedia.org) uses the GNU Free Documentation
License (GFDL), which explicitly states, ‘‘Wikipedia content can be copied, modified,
and redistributed if and only if the copied version is made available on the same terms to
others and acknowledgment of the authors of the Wikipedia article used is included
(a link back to the article is generally thought to satisfy the attribution requirement).’’ So
simply including content from Wikipedia on a site would not constitute a copyright
violation as long as a link back to the source article was included.
YouTube.com is another interesting example of the complexities of Web copy-
right. The site has very strict guidelines and enforcement mechanisms to prevent
users from uploading copyrighted material without appropriate permissions. How-
ever, when a user submits their own videos to YouTube, they ‘‘grant YouTube a
worldwide, nonexclusive, royalty-free, sublicenseable and transferable license to
use, reproduce, distribute, prepare derivative works of, display, and perform the
User Submissions in connection with the YouTube Web site. . . You also hereby
grant each user of the YouTube Web site a nonexclusive license to access your User
Submissions through the Web site, and to use, reproduce, distribute, display, and
perform such User Submissions as permitted through the functionality of the Web
site and under these Terms of Service (from http://www.youtube.com/t/terms?
hl=en_US).’’ Since YouTube makes it very easy to include videos on a Web site
(via the embed feature), it appears that doing so does not violate copyright.
So, at what point does the optimizer (either white or black hat) cross the line and
become a copyright violator? If, for example, you read this chapter and then write a
summary of it in your own words it is not a violation of copyright. Does this
assessment change if instead of writing the summary yourself, you write a computer
program that strips out the first two sentences of each paragraph in order to
automatically generate the summary? What if you take those sentences and mix
them with others extracted from a dozen other articles on SEO? This is essentially
what some of the content generators do.
From a practical perspective, black hats can actually use copyright law as a
weapon in their arsenal. All of the major search engines provide a mechanism for
making a complaint under the Digital Millennium Copyright Act (DMCA). Some of
the search engines will remove a site from its index as soon as a DMCA complaint is
received. In addition, sites that contain backlinks, such as YouTube, will also
remove the content when a complaint is received. All of these sites will attempt
to contact the ‘‘infringing’’ site and allow it to provide a counter-notification—
basically a defense against the complaint. There is at least one anecdotal report (see
http://www.ibrian.co.uk/26-06-2005/dmca-the-new-blackhat-for-yahoo-search/) that
this approach resulted in the temporary removal of a site from Yahoo.
It should be noted that submitting a false DMCA complaint is not without serious
risks and potential consequences. As stated on the Google DMCA site (http://www.
google.com/dmca.html), ‘‘Please note that you will be liable for damages (including
costs and attorneys’ fees) if you materially misrepresent that a product or activity is
infringing your copyrights. Indeed, in a recent case (please see http://www.
onlinepolicy.org/action/legpolicy/opg_v_diebold/ for more information), a com-
pany that sent an infringement notification seeking removal of online materials
that were protected by the fair use doctrine was ordered to pay such costs and
attorneys’ fees. The company agreed to pay over $100,000.’’
Can a search engine ethically and legally ‘‘sell’’ the top spots in its organic results
without labeling them as sponsored or ads? In fact, in the early days of PPC
advertising (prior to 2002)
most search engines did not readily distinguish between paid and nonpaid listings in
the search results. This led to a June 27, 2002 U.S. Federal Trade Commission (FTC)
recommendation to the search engines that ‘‘any paid ranking search results are
distinguished from nonpaid results with clear and conspicuous disclosures’’ [32].
The major search engines have complied with this recommendation and it is likely
that any search engine that ‘‘sells’’ organic results without disclosing they are paid
will incur an FTC investigation.
Finally, do the search engines have any legal or ethical obligations when it comes
to their users? Search engines have the ability to gather a tremendous amount of data
about a person based on his or her search patterns. If that user has an account on the
search engine (usually for ancillary services like email) then these data can become
personally identifiable. What the search engines and users must be aware of is that as
global entities, the data collected by search engines may fall under various jurisdic-
tions. For example, in 2004 Yahoo! Holdings (Hong Kong) in response to a
subpoena from Chinese authorities provided the IP address and other information
about dissident Shi Tao. Based on the information provided, Tao was sentenced to
10 years in prison for sending foreign Web sites the text of a message warning
journalists about certain actions pertaining to the anniversary of the Tiananmen
Square massacre [33]. While Yahoo! acted according to local laws, many would
claim that they acted unethically. In fact, Yahoo! executives were called before a
Congressional committee to testify about the ethics of its actions.
6. Conclusions
The growth in the number of Web searches, along with more online purchases,
and the ability to precisely track what site visitors are doing (and where they have
come from) has led to explosive growth in the SEM industry. This growth is
expected to continue at around a 13% annual rate over the next few years, compared
with a 4% growth rate for offline advertising [34]. Given this growth and the
profit incentives involved it is no wonder that some people and companies are
looking for tools and techniques to provide them with an advantage.
SEO is a constantly changing field. Not only are the major search engines
continually evolving their algorithms, but also others are entering (and frequently
leaving) the Web search space. For example, while writing this chapter Microsoft
launched its latest search engine—Bing. The press has also given much attention to
Wolfram Alpha and Twitter’s search capability. By the time you read this we should
have a better idea if these new initiatives have been a success or a flop. In either case
SEO practitioners and researchers need to keep abreast of the most recent develop-
ments in this field.
It is not clear exactly which SEO techniques, and at what scope, will result in a
penalty or outright ban in the search engines.
As the search engine industry evolves researchers should try to keep pace. For
example, Google has recently begun to provide integrated search results. As shown
in Fig. 12, instead of just showing a list of Web sites, Google now provides products,
video, and images that match the query. It is unclear exactly how these should be
handled from an SEO perspective.
Second, researchers can play a very important role in helping the search engines
improve their algorithms and develop other measures to deal with black hat SEO.
Much of the original research and development of many of the major search engines
comes directly from academia. For example, Google began as a Ph.D. research
project at Stanford University and the university provided the original servers for the
search engine. Ask.com is a search engine that started out as Teoma. The underlying
algorithm for Teoma was developed by professors at Rutgers University in
New Jersey.
While more work needs to be done, some researchers have already begun con-
ducting research into preventing black hat SEO methods. Krishnan and Raj [35] use
a seed set of black hat (spam) pages and then follow the links coming into those
pages. The underlying concept is that good pages are unlikely to link to spam pages.
Thus by following back from the spam pages, a process they term anti-trust rank, they
find pages that can be removed from or penalized by the search engines. Kimura
et al. [36] developed a technique based on network theory aimed at detecting black
hat trackback blog links.
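As a rough illustration of the anti-trust idea, the Python sketch below starts from known
spam pages and walks backward along incoming links, flagging pages that link into the
spam set. This is a simplification of Ref. [35], which propagates weighted scores rather
than a binary flag.

from collections import deque

def antitrust_flag(outlinks, spam_seeds):
    # Build the reverse graph: for each page, which pages link to it?
    inlinks = {page: [] for page in outlinks}
    for page, targets in outlinks.items():
        for target in targets:
            inlinks.setdefault(target, []).append(page)

    # Walk backward from the known spam pages.
    flagged = set(spam_seeds)
    queue = deque(spam_seeds)
    while queue:
        page = queue.popleft()
        for linker in inlinks.get(page, []):
            if linker not in flagged:
                flagged.add(linker)
                queue.append(linker)
    return flagged

web = {
    "good-blog": ["news-site"],
    "news-site": [],
    "link-farm": ["spam-page"],
    "spam-page": [],
}
print(antitrust_flag(web, {"spam-page"}))   # flags spam-page and link-farm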
Third, academic researchers can help in understanding how visitors and potential
customers use search engines. As mentioned above, Microsoft has taken a first step in
this direction with its research into Online Commercial Intention (OCI). However,
the current OCI research is based on human interpretation of search terms.
A promising area for future research would be to analyze what search engine visitors
actually do after performing certain queries.
Another interesting research area is understanding the demographic and behav-
ioral characteristics of the people who visit each search engine and how these impact
online purchases. There is some anecdotal evidence to suggest that the major search
engines do perform differently in terms of conversion rate—at least for paid search
[37]. Taking this concept a step further, an optimizer might decide to focus an SEO
campaign on a certain search engine based on a match between demographics, the
product or service offered, and the conversion rate.
References
[1] SEMPO, The State of Search Engine Marketing 2008, 2008. Retrieved May 1, 2009, from http://
www.sempo.org/learning_center/research/2008_execsummary.pdf.
[2] R. Sen, Optimal search engine marketing strategy, Int. J. Electron. Comm. 10 (1) (2005) 9–25.
[3] iProspect, iProspect Search Engine User Behavior Study, 2006. Retrieved June 15, 2009, from http://
www.iprospect.com/premiumPDFs/WhitePaper_2006_SearchEngineUserBehavior.pdf.
[4] L.A. Granka, T. Joachims, G. Gay, Eye-tracking analysis of user behavior in WWW search, in:
Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Develop-
ment in Information Retrieval, Sheffield, United Kingdom, July 25–29, 2004. Retrieved June 15,
2009, from http://www.cs.cornell.edu/People/tj/publications/granka_etal_04a.pdf.
[5] R. Bauer, SEO Services Comparison & Selection Guide, 2008. Retrieved June 10, 2009, from http://
www.scribd.com/doc/2405746/SEO-Pricing-Comparison-Guide.
[6] R. Malaga, The value of search engine optimization: an action research project at a new e-commerce
site, J. Electron. Comm. Organ. 5 (3) (2007) 68–82.
[7] R. Malaga, Worst practices in search engine optimization, Commun. ACM 51 (12) (2008) 147–150.
[8] M.S. Raisinghani, Future trends in search engines, J. Electron. Comm. Organ. 3 (3) (2005) i–vii.
[9] J. Zhang, A. Dimitroff, The impact of metadata implementation on webpage visibility in search
engine results (Part II), Inform. Process. Manage. 41 (2005) 691–715.
[10] E. Burns, U.S. Core Search Rankings, February 2008, 2008. Retrieved May 1, 2009, from http://
searchenginewatch.com/showPage.html?page=3628837.
[11] K. Curran, Tips for achieving high positioning in the results pages of the major search engines,
Inform. Technol. J. 3 (2) (2004) 202–205.
[12] D. Sullivan, How Search Engines Rank Web Pages, 2003. Retrieved June 15, 2009, from http://
searchenginewatch.com/webmasters/article.php/2167961.
[13] H.K. Dai, L. Zhao, Z. Nie, J. Wen, L. Wang, Y. Li, Detecting Online Commercial Intention (OCI),
in: Proceedings of the 15th International Conference on the World Wide Web, Edinburgh, Scotland,
2006, pp. 829–837.
[14] T. O’Reilly, What Is Web 2.0—Design Patterns and Business Models for the Next Generation of
Software, 2005. Retrieved May 1, 2009, from http://www.oreillynet.com/pub/a/oreilly/tim/news/
2005/09/30/what-is-web-20.html.
[15] Technorati, Welcome to Technorati, 2008. Retrieved May 1, 2009, from http://technorati.com/about/.
[16] Wikipedia, Social Bookmarking, 2008. Retrieved May 1, 2009, from http://en.wikipedia.org/wiki/
Social_bookmarking.
[17] R. Malaga, Web 2.0 techniques for search engine optimization—two case studies, Rev. Bus. Res.
9 (1) 2009.
[18] J. Zhang, A. Dimitroff, The impact of webpage content characteristics on webpage visibility in
search engine results (Part I), Inform. Process. Manage. 41 (2005) 665–690.
[19] A. Beal, SMX: Cutts on Themes and Latent Semantic Indexing, 2007. Retrieved June 10, 2009, from
http://www.webpronews.com/blogtalk/2007/06/11/smx-cutts-on-themes-and-latent-semantic-
indexing.
[20] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, R. Harshman, Indexing by latent semantic
analysis, J. Am. Soc. Inform. Sci. 41 (6) (1990) 391–407.
[21] Practical Ecommerce, Importance of New, Fresh Content for SEO, 2009. Retrieved June 10, 2009,
from http://www.practicalecommerce.com/podcasts/episode/803-Importance-Of-New-Fresh-Content-
For-SEO.
[22] D. Rowse, How Much Does Fresh Content Matter in SEO? 2007. Retrieved June 10, 2009, from
http://www.problogger.net/archives/2007/05/19/how-much-does-fresh-content-matter-in-seo/.
[23] A. K’necht, SEO and Your Web Site—Digital Web Magazine, 2004. Retrieved June 10, 2009, from
http://www.digital-web.com/articles/seo_and_your_web_site/.
[24] R. Nobles, How Important Is ALT Text in Search Engine Optimization? 2005. Retrieved June 10,
2009, from http://www.webpronews.com/topnews/2005/08/15/how-important-is-alt-text-in-search-
engine-optimization.
[25] M. Cutts, Confirming a Penalty, 2006. Retrieved June 10, 2009, from http://www.mattcutts.com/
blog/confirming-a-penalty/.
[26] M. Cutts, SEO Tip: Avoid Keyword Stuffing, 2007. Retrieved June 10, 2009, from http://www.
mattcutts.com/blog/avoid-keyword-stuffing/.
[27] S. Spencer, Bidvertiser SO Does Not Belong in Google’s Top 10 for ‘‘marketing’’, 2007. Retrieved
June 10, 2009, from http://www.stephanspencer.com/tag/noscript.
[28] M. Ohye, How Google Defines IP Delivery, Geolocation, and Cloaking, 2008. Retrieved June 10,
2009, from http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.
html.
[29] M. Cutts, How to Report Paid Links, 2007. Retrieved June 10, 2009, from http://www.mattcutts.
com/blog/how-to-report-paid-links/.
[30] A. Greenberg, The Saboteurs of Search, 2007. Retrieved June 10, 2009, from http://www.forbes.
com/2007/06/28/negative-search-google-tech-ebiz-cx_ag_0628seo.html.
[31] Anonymous, Google Bowling, 2006. Retrieved June 10, 2009, from http://www.watching-paint-dry.
com/v7ndotcom-elursrebmem/google-bowling/.
[32] H. Hippsley, Letter to Mr. Gary Ruskin, Executive Director, Commercial Alert, 2002. Retrieved
September 3, 2009, from http://www.ftc.gov/os/closings/staff/commercialalertletter.shtm.
[33] BBC, Yahoo ‘helped jail China writer’, 2007. Retrieved September 4, 2009, from http://news.bbc.
co.uk/2/hi/asia-pacific/4221538.stm.
[34] J. Kerstetter, Online Ad Spending Should Grow 20 Percent in 2008, 2008. Retrieved June 10, 2009,
from http://news.cnet.com/8301-1023_3-9980927-93.html.
[35] V. Krishnan, R. Raj, Web spam detection with anti-trust rank, in: 2nd Workshop on Adversarial
Information Retrieval on the Web, Seattle, WA, August 2006. Retrieved June 10, 2009, from http://i.
stanford.edu/~kvijay/krishnan-raj-airweb06.pdf.
[36] M. Kimura, K. Saito, K. Kazama, S. Sato, Detecting Search Engine Spam from a Trackback
Network in Blogspace, in: Lecture Notes in Computer Science, Springer, Berlin, 2005, p. 723.
[37] D.J. Kennedy, Google, Yahoo! or MSN—Who Has the Best Cost per Conversion—A Study, 2008.
Retrieved June 10, 2009, from http://risetothetop.techwyse.com/pay-per-click-marketing/google-
yahoo-or-msn-who-has-the-best-cost-per-conversion-a-study/.