Search Engine Optimization
ROSS A. MALAGA
Management and Information Systems, School
of Business, Montclair State University, Montclair,
New Jersey, USA
Abstract
Today the first stop for many people looking for information or to make a
purchase online is one of the major search engines. So appearing toward the
top of the search results has become increasingly important. Search engine
optimization (SEO) is a process that manipulates Web site characteristics and
incoming links to improve a site’s ranking in the search engines for particular
search terms. This chapter provides a detailed discussion of the SEO process.
SEO methods that stay within the guidelines laid out by the major search engines
are generally termed ‘‘white hat,’’ while those that violate the guidelines are
called ‘‘black hat.’’ Black hat sites may be penalized or banned by the search
engines. However, many of the tools and techniques used by ‘‘black hat’’
optimizers may also be helpful in ‘‘white hat’’ SEO campaigns. Black hat
SEO approaches are examined and compared with white hat methods.
1. Introduction
2. Background
2.1. Search Engines History and Current Statistics
2.2. SEO Concepts
3. The SEO Process
3.1. Keyword Research
3.2. Indexing
3.3. On-Site Optimization
3.4. Link Building
1. Introduction
The past few years have seen a tremendous growth in the area of search engine
marketing (SEM). SEM includes paid search engine advertising and search engine
optimization (SEO). According to the Search Engine Marketing Professional
Organization (SEMPO), search engine marketers spent over $13.4 billion in 2008.
In addition, this figure is expected to grow to over $26 billion by 2013. Of the
$13.4 billion spent on SEM, about 10% ($1.4 billion) was spent on SEO [1].
Paid advertisements are the small, usually text-based, ads that appear alongside the
query results on search engine sites (see Fig. 1). Paid search engine advertising usually
works on a pay-per-click (PPC) basis. SEO is a process that seeks to achieve a high
ranking in the search engine results for certain search words or phrases. The main
difference between SEO and PPC is that with PPC, the merchant pays for every click.
With SEO each click is free (but the Web site owner may pay a considerable amount to
achieve the high ranking). In addition, recent research has shown that users trust the
SEO (called organic) results and are more likely to purchase from them [2].
Industry research indicates that most search engine users only click on sites that
appear on the first page of the search results—basically the top 10 results. Very
few users click beyond the third page of search results [3]. These results confirm
the research conducted by Granka et al. [4], who found that almost 80% of
the clicks on a search engine results page went to the sites listed in the first
three spots.
SEO has become a very big business. Some of the top optimizers and SEO firms
regularly charge $20,000 or more per month for ongoing optimization. It is not
uncommon for firms with large clients to charge them $150,000 or more on a
monthly basis [5].
Because of the importance of high search engine rankings and the profits
involved, search engine optimizers look for tools, methods, and techniques that
will help them achieve their goals. Some focus their efforts on methods aimed at
fooling the search engines. These optimizers are considered ‘‘black hat,’’ while
those that closely follow the search engine guidelines would be considered ‘‘white
hat.’’ There are two main reasons why it is important to understand the methods
employed by black hat optimizers. First, some black hats have proven successful in
achieving high rankings. When these rankings are achieved, it means that white hat
sites are pushed lower in the search results. However, in some cases these rankings
might prove fleeting and there are mechanisms in place to report such sites to the
search engines. Second, some of the tools and methods used by black hat optimizers
can actually be used by white hat optimizers. In many cases, it is just a matter of
scope and scale that separates black and white hat.
While there are some studies dealing with SEO, notably Refs. [6–9], academic
research in the area of SEO has been relatively scant given its importance in the
online marketing field. This chapter combines the academic work with the extensive
practitioner information. Much of that information comes in the form of blogs,
forum discussions, anecdotes, and Web sites.
The remainder of this chapter proceeds as follows. Section 2 provides a back-
ground on search engines in general and basic SEO concepts. After that a detailed
discussion on the SEO process including keyword research, indexing, on-site
factors, and linking ensues. The section that follows focuses on black hat SEO techni-
ques. Legal and ethical implications of SEO are then discussed. Finally, implications for
management, conclusions, and future research directions are detailed.
2. Background
A search engine is simply a database of Web pages, a method for finding Web
pages and indexing them, and a way to search the database. Search engines rely on
spiders—software that follows hyperlinks—to find new Web pages to index and
ensure that pages that have already been indexed are kept up to date.
Although more complex searches are possible, most Web users conduct simple
searches on a keyword or key phrase. Search engines return the results of a search
based on a number of factors. All of the major search engines consider the relevance
of the search term to sites in its database when returning search results. So, a search
for the word ‘‘car’’ would return Web pages that had something to do with auto-
mobiles. The exact algorithms used to determine relevance are constantly changing
and are a trade secret.
Google is currently the dominant search engine; the other two major search engines
in the United States are Yahoo and MSN. Combined, these three search engines are
responsible for over 91% of all searches [10]. It should be noted that at the time this
chapter was written, Microsoft had just released Bing.com as its main search engine.
The dominance of the three major search engines (and Google in particular)
combined with the research on user habits means that for any particular search term,
a site must appear in the top 30 spots on at least one of the search engines or it is
effectively invisible. So, for a given term, for example ‘‘toyota corolla,’’ there are
only 90 spots available overall. In addition, 30 of those spots (the top 10 in each search
engine) are highly coveted and the top 10 spots in Google are extremely important.
In general, the process of SEO can be broken into four main steps: (1) keyword
research, (2) indexing, (3) on-site optimization, and (4) off-site optimization.
3.1 Keyword Research

Suppose, for example, that a Toyota dealer in the New York City area wants to attract
buyers interested in the Prius. Possible search terms include:

- Prius
- Toyota Prius
- New Toyota Prius
- Toyota Prius New York City
- Toyota Prius NYC
- NYC Toyota Prius
It is easy to see that this list can keep going. In terms of SEO, which term or terms
should we try to optimize our site for?
Keyword research consists of building a large list of relevant search words and
phrases and then comparing them along three main dimensions. First, we need to
consider the number of people who are using the term in the search engines. After all,
why optimize for a term that nobody (or very few people) use? Fortunately, Google
now makes search volume data available via its external keyword tool (available at
https://adwords.google.com/select/KeywordToolExternal). Simply type the main
keywords and terms and click on Get Keyword Ideas. Google will generate a large
list of relevant terms and provide the approximate average search volume (see Fig. 2).
FIG. 2. Google external keyword tool search volume results for hybrid car.
Clearly, we are looking for terms with a comparatively high search volume.
So, for example, we can start building a keyword list with:
- Toyota Prius—388,000
- Hybrid car—165,000
- Hybrid vehicle—33,100
- Hybrid vehicles—60,500
- Hybrid autos—4400
Many search engine optimizers also consider simple misspellings. For example,
we can add the following to our list:
- Hybird—27,100
- Hybird cars—2900
- Pruis—9900
Once we have generated a large list of keywords and phrases (most optimizers
generate lists with thousands of terms), the second phase is to determine the level of
competition for each term. To determine the level of competition, simply type the
term into Google and see how many results are reported in the top right part of the
page (see Fig. 3).
To compare keyword competition, optimizers compute the results to search
(R/S) ratio. The R/S ratio is calculated by simply dividing the number of results
(competitors) by the number of searches over a given period of time. On this scale
lower numbers are better. So, we might end up with a list like that in Table I.
Comparing R/S ratios is more effective than just looking at how many people are
searching for a particular word or phrase as it incorporates the level of competition.
In general, optimizers want to target terms that are highly searched and have a low
level of competition. However, the R/S ratio can reveal terms that have a relatively
low level of searches, but also a very low level of competition. For instance, Table I
(Search Results and Competition) shows that the misspelled word ‘‘hybird’’ has a lower
search volume than many of the other terms. However, when the competition is also
considered via the R/S ratio, the misspelled word appears to be a good potential target
for SEO.
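The R/S comparison is easy to script. The following Python sketch uses the search
volumes quoted above, but the competing result counts are illustrative placeholders
rather than actual figures from the search engines:

# Compare keywords by their results-to-search (R/S) ratio.
# Lower values mean fewer competing pages per searcher.
keywords = {
    # term: (approximate monthly searches, competing results reported by the engine)
    "toyota prius": (388_000, 24_500_000),   # result counts are made-up placeholders
    "hybrid car": (165_000, 12_300_000),
    "hybird": (27_100, 150_000),
}

def rs_ratio(searches, results):
    return results / searches

for term, (searches, results) in sorted(keywords.items(),
                                        key=lambda item: rs_ratio(*item[1])):
    print(f"{term}: R/S = {rs_ratio(searches, results):,.1f}")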
The third factor to consider, at least in most cases, is the commercial viability of
the term. To determine commercial viability we must understand a bit about
consumer buying behavior. The traditional consumer purchase model consists of
five phases: (1) need recognition, (2) information search, (3) option evaluation,
(4) purchase decision, and (5) postpurchase behavior.
Once a potential consumer becomes aware of a need she begins to search for more
information about how to fulfill that need. For example, a person who becomes
aware of a need for a car would begin by gathering some general information.
For example, she might research SUVs, trucks, sedans, etc. At this point the
consumer does not even know what type of vehicle she wants. She might use search
terms like ‘‘how to buy a car’’ or ‘‘types of cars.’’ Since the consumer does not
know what type of car she wants at this point, these terms would be considered to
have low commercial viability.
In the next phase, the consumer begins to narrow down the choices and evaluate
the various options. Some exemplar search terms in this phase might include ‘‘car
review,’’ ‘‘SUV recommendation,’’ and ‘‘best cars.’’ These terms have a bit more
commercial viability, but would still not be considered highly viable.
During the fourth phase, consumers have made a choice and are now just looking
for where to purchase—comparing options like price, warranties, service, trust, etc.
At this point the search terms become much more specific. For example, the
consumer might use terms like ‘‘Toyota Prius 2009,’’ ‘‘prius best price,’’ and
‘‘new jersey Toyota dealer.’’ Since the consumer is ready to purchase, these terms
are considered to have high commercial viability.
A good optimizer will actually target multiple terms—some for the site’s home-
page and some for the internal pages of the site. For instance, the site for a Toyota
dealer in Montclair New Jersey might use ‘‘New Jersey Toyota Dealer’’ as the main
SEO target for the homepage. The same site might use ‘‘Toyota Prius 2009 Best
Price’’ for an internal page that lists the features of that car and the details of the
vehicles on the lot.
Clearly, determining commercial viability is a combination of art and science.
It requires the optimizer to think like a consumer. Microsoft researchers have
conducted research into this area [13]. They have broken search into three main
categories: navigational, informational, and transactional. In addition, queries are
also categorized as either commercial or noncommercial based on the nature of the
search term used, resulting in the 3 × 2 grid shown in Table II. For example, terms
that include words such as ‘‘buy,’’ ‘‘purchase,’’ or ‘‘price’’ would be considered
commercial in nature. The researchers determined the categorization of commercial
and noncommercial by asking human reviewers to rate various terms along those
dimensions. Obviously, this approach has serious limitations. However, Microsoft
has developed a Detecting Commercial Online Intent tool, which is available at
http://adlab.microsoft.com/Online-Commercial-Intention/. Many optimizers use
this site to gauge the commercial viability and search type of their keywords.
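The flavor of the commercial/noncommercial split can be captured with a simple
keyword heuristic, as in the Python sketch below. The marker list is purely
illustrative; it is not Microsoft's classifier, which relied on human ratings rather
than a fixed word list.

# A toy heuristic: treat a query as commercial if it contains a purchase-oriented marker.
COMMERCIAL_MARKERS = ("buy", "purchase", "price", "deal", "dealer", "discount")

def looks_commercial(query):
    q = query.lower()
    return any(marker in q for marker in COMMERCIAL_MARKERS)

print(looks_commercial("types of cars"))             # False
print(looks_commercial("toyota prius best price"))   # True
print(looks_commercial("new jersey toyota dealer"))  # True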
Finally, some optimizers have attempted to capture the consumer earlier in the
process—during the option evaluation phases in particular. This is typically accom-
plished by developing review and recommendation type sites. There are, to date, no
reliable data on how well these types of sites perform in terms of moving the visitor
from the information phase to the transactional phase.
3.2 Indexing
Indexing is the process of attracting the search engine spiders to a site, with the goal of
getting indexed (included in the search engine’s database) and hopefully ranked well by
the search engine quickly. All of the major search engines have a site submit form where
a user can submit a site for consideration. However, most SEO experts advise against
this approach. It appears that the major search engines prefer ‘‘discovering’’ a new site.
The search engines ‘‘discover’’ a new site when the spiders find a link to that site from
other sites. So the main approach to indexing involves getting links to a site from other
sites that are frequently visited by the spiders.
An increasingly popular approach to generating quick links to a new site is via
Web 2.0 properties. The term Web 2.0 seems to have been coined by Tim O’Reilly
whose O’Reilly Media sponsored a Web 2.0 conference in 2004 [14]. There does not
appear to be any standard definition of Web 2.0 as the concept is continually
evolving. However, Web 2.0 incorporates concepts such as user-developed content
(blogs, wikis, etc.), social bookmarking, and the use of really simple syndication
(RSS). For the purposes of indexing, user-developed content and social bookmark-
ing sites are key.
User-developed content sites enable users to quickly and easily publish written,
audio, and/or video content online. Many content sites employ a simple user
interface for text input that in many ways mimics traditional word processing
software. Most of these sites allow users to upload pictures and embed multimedia
content (audio and video). Among the most widely used Web 2.0 content sites are
blogs and wikis from multiple providers, and sites that allow users to create simple
Web sites (e.g., Squidoo and Hubpages).
A blog (short for Weblog) is simply a Web site where a user can post comments
and the comments are typically displayed in reverse chronological order. The
comments can range broadly from political commentary to product reviews to
simple online diaries. Modern blog software can also handle pictures, audio files,
and video files. The blog tracking site Technorati had recorded over 112 million
blogs in its system as of November 2007 [15].
A wiki is a type of software that allows users to create a Web site in a collabora-
tive manner. Wikis use a simple markup language for page creation. The most well-
known wiki site is Wikipedia (www.wikipedia.org)—an online encyclopedia to
which anyone can contribute. In fact, the site has over 9 million articles and more
than 75,000 contributors. Personal wikis are easy to create on a user’s own domain
or on free wiki sites, such as Wetpaint.com.
According to Wikipedia, ‘‘social bookmarking is a way for Internet users to store,
organize, share, and search bookmarks of web pages’’ [16]. These bookmarks are
typically public, meaning anyone can see them. Bookmarks are usually categorized
and also tagged. Tagging allows the user to associate a bookmark with any chosen
words or phrases. Tags are completely created by the user, not the social book-
marking site. Some of the more popular social bookmarking sites include Del.icio.us,
Digg, and StumbleUpon.
An optimizer who wants to get a new site indexed quickly can build one or more
Web 2.0 content sites and include links to the new site. The optimizer can also use
social bookmarking to bookmark the new site. Since the search engine spiders visit
these sites frequently, they will find and follow the links to the new site and index
them. This approach has an added benefit, in that the content site or social bookmark
itself might rank for the chosen term. If done correctly a good optimizer can
dominate the top positions in the search engines using a combination of their own
sites, Web 2.0 content sites, and social bookmarks.
Malaga [17] reports on using Web 2.0 SEO techniques to dominate the search
engine results pages. In one experiment the researcher was able to get sites indexed
in less than a day. In addition, after only a few days the researcher was able to obtain
two top 10 rankings on Google and five top 30 rankings. The results on Yahoo and
MSN were similar.
3.3 On-Site Optimization

The title tag is one of the most important on-site factors, and site owners often make
two mistakes with it. First, they do not include the targeted keyword term. Second,
they use the same title tag for every page on the site. If we consider the
example of the Toyota dealer the homepage might have a title tag of ‘‘Montclair
New Jersey Toyota Dealer.’’ However, we would want the internal pages to have
their own title tags. For example, the Prius page might use ‘‘Toyota Prius’’ for the
title and the Corolla page would use ‘‘Toyota Corolla.’’
The description meta tag is used to explain what the Web page is about. It is
usually a short sentence or two consisting of about 25–30 words. Again, the search
engines use the description tag in two ways. First, it appears that at least some of the
search engines use the description tag as part of their ranking algorithm. Second,
some of the search engines will display the contents of the description tag below the
site link in the search engine results page (see Fig. 4).
The keyword meta tag is simply a list of words or phrases that relate to the Web
page. For example, terms that might be used on the Toyota Prius page are ‘‘prius,’’
’’Toyota prius,’’ and ‘‘hybrid car.’’ The list of keywords can be quite long.
It appears that the major search engines do not give the keyword tag much or any
weight in determining a site’s rank. But some anecdotal evidence exists to support
the notion that Yahoo still uses this tag in its ranking algorithm.
The major search engines will index each Web page they find and will follow all of
the links on that page. There are some circumstances in which we might not want the
search engines to index a page or follow the links on it. For example, we might not
want a privacy policy or terms of use page to be indexed by the search engines. In this
case the optimizer can use a robot tag set to ‘‘noindex.’’ The major search engines will
still crawl the site, but most will not include the page in their database. There are some
cases where we might not want the search engines to follow the links on a particular
page. Including sponsored links on a page would be a good example. In this case the
optimizer would use a robot tag set to ‘‘nofollow.’’ These terms can be combined so
that a page is not indexed and the links on the page are also not followed. It should
be noted that while the major search engines currently abide by ‘‘noindex’’ and
‘‘nofollow’’ statements, they might change their policies at any time.
Optimizers and Web developers often handle robots restrictions at the domain
level. They can include a file named robots.txt in the root directory. That file lists
directories and files that should and should not be followed or indexed.
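For example, a minimal robots.txt for the hybrid car blog used in the examples below
might look like the following (the directory and file names are assumptions for
illustration):

User-agent: *
Disallow: /sponsors/
Disallow: /privacy-policy.html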
If we pull all meta tag information together, we might wind up with the following
example for a Prius review on a hybrid car blog that is targeting the term ‘‘prius car
review’’:
<head>
<title>Prius Car Review</title>
<meta name="description" content="Unbiased Prius Car Review
from the Hybrid Car Experts"/>
<meta name="keywords" content="prius, toyota prius, hybrid car
review, hybrid truck review, hybrid SUV review, hybrid cars"/>
</head>
If the same blog also contained a page with sponsored links that the author did not
want the search engines to index or follow the head might appear as follows:
<head>
<title>Hybrid Car Review--Sponsors</title>
<meta name="description" content="Links to interesting sites
about hybrid cars"/>
<meta name="keywords" content="hybrid car review, hybrid truck
review, hybrid SUV review, hybrid cars, hybrid car blog, hybrid car
links"/>
<meta name="robots" content="noindex, nofollow">
</head>
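The elements discussed above can also be checked programmatically. The following
Python sketch, which uses only the standard library, pulls the title and meta tags out
of a saved page; the file name is a placeholder:

from html.parser import HTMLParser

class HeadAudit(HTMLParser):
    # Collects the title tag and any named meta tags from an HTML document.
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"].lower()] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

audit = HeadAudit()
with open("page.html", encoding="utf-8") as f:   # placeholder file name
    audit.feed(f.read())
print("title:", audit.title)
print("description:", audit.meta.get("description", "(missing)"))
print("robots:", audit.meta.get("robots", "(default: index, follow)"))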
Determining what a Web page is actually about, and thus which queries it is relevant to, is a
complex task. A simple approach is to just match up the text on a Web site with the
query. The problem with this technique is that there are many ways to search for the
same thing. To overcome this problem, the major search engines now use latent
semantic indexing (LSI) to ‘‘understand’’ a Web page. Google’s Matt Cutts has
confirmed that Google uses LSI [19].
Latent semantic indexing (sometimes called latent semantic analysis) is a natural
language processing method that analyzes the pattern and distribution of words on a
page to develop a set of common concepts. Scott Deerwester, Susan Dumais, George
Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum, and Lynn Streeter
were granted a patent in 1989 (U.S. Patent 4,839,853) for latent semantic analysis.
A basic understanding of LSI is important in terms of writing for the search engines.
The technical details of LSI are beyond the scope of this chapter—the interested
reader is referred to Ref. [20]. However, a short example should aid in understanding
the overall concept. If we go back to our Toyota dealer we might expect its site to
include terms like car, truck, automobile, as well as names of cars (Corolla, Prius,
Highlander, etc.). To keep the example simple, let us assume that one page on our
Toyota dealer’s site might read ‘‘Toyota trucks are the most fuel efficient.’’ If a
search engine just used a simple matching approach then a query for ‘‘Toyota
Tacoma’’ would not return the page—since that exact term does not appear on the
page. However, using LSI the search engine is able to ‘‘understand’’ that the page is
about Toyota trucks and that a search for Toyota Tacoma should return the page.
It does this by comparing the site with all of the others that include terms such as
‘‘Toyota,’’ ‘‘truck,’’ and ‘‘Tacoma’’ and judges that it is likely that these terms are
related. The goal of LSI is to determine just how closely related the terms are. For
example, if the user searched on the term ‘‘Chevy Silverado’’ it should not return the
Toyota truck page. Although both pages are about ‘‘trucks,’’ using LSI the search
engine can determine that the relation among the terms is not very close.
It should be noted that Google probably does not actually compare pages with every
other page on the Web, as this would be extremely computationally intensive. However,
Google does attempt to determine the ‘‘theme’’ of individual pages and of sites.
Therefore, optimizers attempt to organize their sites based on the ‘‘themes’’ they are
attempting to optimize for. For example, if we were trying to optimize for ‘‘Toyota
trucks,’’ ‘‘Toyota sedans,’’ and ‘‘Toyota hybrids’’ each of these would become a page
on the dealer’s site. In addition, each page would be written around the ‘‘theme.’’
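The concept can be explored with off-the-shelf tools. The following Python sketch uses
scikit-learn to project a handful of made-up ‘‘pages’’ into a small concept space and
compare a query against them there; it illustrates the idea only and is not how Google
implements LSI.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

pages = [
    "toyota trucks are the most fuel efficient trucks",
    "the toyota tacoma is a popular toyota truck",
    "chevy silverado truck towing capacity",
    "toyota prius hybrid car fuel economy",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(pages)

svd = TruncatedSVD(n_components=2, random_state=0)   # the reduced "concept" space
page_vectors = svd.fit_transform(tfidf)

query_vector = svd.transform(vectorizer.transform(["toyota tacoma"]))

# Higher cosine similarity means the query and page are closer in the concept space
print(cosine_similarity(query_vector, page_vectors))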
However, some sites with no fresh content may still rank well. For example, Darren Rowse
writing on ProBlogger [22] discusses the case of a blog that still ranked third on
Google for the term ‘‘digital photography’’ 9 months after any new content was
added. In addition, this author had a site that ranked first on Google for ‘‘discount
ipod accessories’’ 2 years after any new content was added to the site. While the data
are scant and anecdotal, it appears that adding fresh content may help get a site
ranked well to begin, but does not appear to be necessary to maintain a high ranking.
3.3.2.3 Formatting. The content on the page is important for SEO, but
so is the formatting. Formatting gives the search engines clues as to what the Web
developer thinks are the important aspects of the page. Some of the formatting
components considered by the search engines include header tags (H1, H2, etc.),
bold, and emphasis.
Header tags are used to identify various sections of a page. Header 1 (H1) tags
delineate major sections on a page. Higher header numbers (H2, H3, etc.) identify
subsections. The major search engines consider text in an H1 HTML header to be of
increased importance [23]. Therefore, many optimizers place the main key terms
within an H1 tag structure. Lesser SEO terms might be placed in higher number
header tags (H2, H3, etc.). The bold and emphasis (em) tags are also used by
optimizers to indicate important text on a page.
Some early Web browsers, such as Lynx, were text based. That is, they could not
display any graphics. The <alt> tag was originally used to display information
about a Web-based image when displayed in a text-based browser. Today the
overwhelming majority of Web traffic is from graphical browsers (such as Internet
Explorer, Firefox, Opera, and Chrome). However, optimizers still use the <alt> tag
as another way of serving text content to the search engines. At one time search
engines appeared to make use of the text in an <alt> tag. According to Jerry West
[24], today the search engines not only do not use the <alt> tag in determining
rankings, but also may actually penalize sites that use, or overuse, the tag. The
<alt> tag remains important for people with disabilities, so it is good Web devel-
opment practice to include descriptive <alt> tags for all images.
3.4 Link Building

Yahoo and MSN use the number of backlinks in their algorithms. However,
Google places particular importance on backlinks. Google does not just consider
the number of links, but also the ‘‘quality’’ of those links. Google assigns each page
in its index a Page Rank (PR). PR is a logarithmic number scale from 0 to 10 (with
10 the best). Google places more weight on backlinks that come from higher
PR sites.
Exactly how PR is currently calculated is a closely held secret. However, we can
gain a basic understanding of the concept by examining the original PR formula
(U.S. Patent 6,285,999):
PR(A) = (1 - d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))

where PR is the Page Rank of a particular Web page, t1, ..., tn are the Web pages that link
to page A, C(ti) is the number of outgoing links on page ti, and d is a damping factor.
Clearly, determining PR is an iterative process. Also, each page provides a portion
of its PR to each page it links to. So, from an SEO perspective we want incoming links
from sites with a high PR. In addition, it is beneficial to obtain links from pages that
have fewer outgoing links. Consider the example in Fig. 5 (adapted from http://
www.whitelines.nl/html/google-page-rank.html).
In this example, the entire Web consists of only three pages (A, B, and C).
In addition, the only links are the ones indicated. Each page has its PR set initially
to 1.0. An initial PR needs to be assumed for each page. However, after 15 iterations
the actual PR emerges and can be seen in Fig. 6.
FIG. 5. Initial state: PR(A) = PR(B) = PR(C) = 1.0.
FIG. 6. After 15 iterations: PR(A) = 1.164, PR(B) = 0.644, PR(C) = 1.192.
FIG. 7. With the link from page B to page C set to NoFollow: PR(A) = 0.434, PR(B) = 0.335,
PR(C) = 0.335.

In addition to external sites, it must be noted that links from pages within a site
(internal links) also count toward PR. Therefore, optimizers must consider the
internal linking structure of the site. For example, many sites include links to privacy
policies, terms of service, about us, and other pages that do not allow visitors to take
commercial action. However, since links to these pages appear on every page in the
site they may obtain a high PR. Optimizers can attempt to manipulate PR by using
the Robots NoFollow meta tag.
For instance, assume that the above example represents pages on a site. If we use
NoFollow to essentially eliminate the link between page B and page C (note that the
link still exists, but does not count toward PR), we would wind up with the structure
shown in Fig. 7 after 20 iterations. In this example, adding the NoFollow leads to a
major change in PR for all of the pages on the site. In fact, it reduced the PR for all of
the pages—a very nonoptimal outcome!
Clearly, the result in this example is highly unlikely due to the circular link
structure on the site and lack of external links. But it does point out how an
inexperienced optimizer can run into problems when attempting to manipulate PR
on a site.
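The iterative calculation itself is straightforward to reproduce. The Python sketch
below assumes the commonly used damping factor of 0.85 and a link structure of A to B,
A to C, B to C, and C to A; the original figures are not reproduced here, so that
structure is an inference, although it matches the values reported above.

def pagerank(outlinks, d=0.85, iterations=20):
    # outlinks maps each page to the list of pages it links to (counted links only)
    pages = list(outlinks)
    pr = {p: 1.0 for p in pages}          # every page starts with PR = 1.0
    for _ in range(iterations):
        pr = {p: (1 - d) + d * sum(pr[t] / len(outlinks[t])
                                   for t in pages if p in outlinks[t])
              for p in pages}
    return pr

# Structure of Figs. 5 and 6: A links to B and C, B links to C, C links to A
print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))
# roughly A = 1.16, B = 0.64, C = 1.19

# With the B-to-C link set to NoFollow it no longer counts toward PR (Fig. 7)
print(pagerank({"A": ["B", "C"], "B": [], "C": ["A"]}))
# roughly A = 0.43, B = 0.33, C = 0.33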
Some of the basic tenets of black hat SEO include automated site creation by
using existing content and automated link building. While there is nothing wrong
with automation in general, black hat SEOs typically employ techniques which
violate the search engines’ Webmaster guidelines.
For example, both Google and Yahoo provide some guidance for Webmasters.
Since Google is the most widely used search engine and its guidance is the most
detailed, we will discuss its policies.
Google’s Webmaster guidelines (available at http://www.google.com/support/
webmasters/bin/answer.py?answer=35769) offer quality guidelines that are impor-
tant for SEO. Google states:
These quality guidelines cover the most common forms of deceptive or manipulative
behavior, but Google may respond negatively to other misleading practices not listed
here (e.g., tricking users by registering misspellings of well-known websites). It’s not
safe to assume that just because a specific deceptive technique isn’t included on this
page, Google approves of it. Webmasters who spend their energies upholding the spirit
of the basic principles will provide a much better user experience and subsequently
enjoy better ranking than those who spend their time looking for loopholes they can
exploit.
In general, Google would like Webmasters to develop pages primarily for users,
not search engines. Google suggests that when in doubt the Webmaster should ask
‘‘does this help my users?’’ and ‘‘would I do this if the search engines did not
exist?’’
The Google Quality Guidelines also outline a number of specific SEO tactics
which Google finds offensive. These tactics are discussed in detail in the sections
below.
The astute reader might now ask: what will Google (or any other search engine)
do if I violate its quality guidelines? Google has two levels of penalties for sites that
are in violation. Those sites that use the most egregious tactics are simply banned
from Google. For example, on February 7, 2006 Google banned BMW’s German
language site (www.bmw.de) for using a ‘‘doorway page.’’ This is a page that shows
different content to search engines and human visitors. Sites that use borderline
tactics may be penalized instead of banned. A penalty simply means that the site
loses ranking position.
We can use the penalty system to provide a working definition of black and white
hat SEO. In general, black hat SEO consists of methods that will most likely lead to
Google penalizing or banning the site at some point. White hat SEO consists of methods
that Google approves of and will therefore not lead to any penalty. There are some
techniques that are borderline and some that are generally white hat, but may be
overused. We might define optimizers that fall into this category as gray hat. Gray
hat techniques may lead to a penalty, but will not usually result in a ban.
If black hat strategies lead to a site ban, why do it? Black hats tend to fall into two
categories. First, as it typically takes a bit of time for Google to ban a site, there are
individuals who use this delay to temporarily achieve a top ranking and make a bit of
money from their site. As they use software to automate the site creation and ranking
process, they are able to churn out black hat sites.
The second category consists of SEO consulting firms that use black hat techni-
ques. These companies achieve a temporary high ranking for their clients, collect
their money, and move on. For example, according to Google employee Matt Cutts’
blog [25], the SEO consulting company Traffic Power was banned from the Google
index for using black hat strategies. In addition, Google also banned Traffic Power’s
clients.
Text that is heavily optimized for the search engines may not be conducive to good
site design or a high conversion rate (the rate at which site visitors perform a
monetizing action, such as making a purchase). For this reason, some black hat
techniques serve content to the search engines that human visitors never see. The three
main methods that fall into this category are keyword stuffing with hidden content,
cloaking, and doorway pages.
One approach is to place keyword-rich text in a division (div) whose visibility property
is set to ‘‘hidden.’’ None of the text in the division is then displayed to human visitors.
However, the text can be found and indexed by search engine spiders.
Small Division:
#hidetext
{
width:1px;
height:1px;
overflow:hidden;
}
<div id="hidetext">Toyota Prius</div>
In this example, the text in the division is theoretically visible to both humans and
search engines. However, the layer is so small (only 1 pixel in size) that it will likely
be completely overlooked by human visitors.
Positioning Content Off-Screen:
#hideleft {
position:absolute;
left:-1000px;
}
< div id¼"hideleft">Toyota Prius</div>
The content in this division is also theoretically visible to both humans and search
engines. In this case, however, the layer is positioned so far to the left (1000 pixels)
that it will not be seen by humans.
Hiding Layers Behind Other Layers:
< div style¼"position: absolute; width: 100px; height: 100px;
z-index: 1" id¼"hide">
This is the text we want to hide</div>
< div style¼"position: absolute; width: 100px; height: 100px;
z-index: 2; background-color: #FFFFFF" id¼"showthis">
This is the text we want to display
</div>
This code example uses the CSS z-index property to position one layer on
top of the other. The z-index provides a measure of three-dimensionality on a Web
page. A z-index of 1 is content that sits directly on the page. Z-index values greater than
1 refer to layers that are ‘‘coming out of the screen,’’ while z-index values less than 1 are
used for layers that sit further behind the screen. In the example above, the hidden text is
put in a layer with a z-index value of 1. The visible text appears in a layer with a z-index
value of 2. Due to the positioning, the second layer is aligned directly over the
first, which causes the text in the first layer to appear only to the search engines.
Google, for one, has begun removing content contained within hidden div tags
from its index. It has explicitly banned sites that use small divisions for keyword
stuffing purposes [26]. However, these policies may cause a problem for legitimate
Web site developers who use hidden divisions for design purposes. For example,
some developers use hidden CSS layers and z-index positioning to implement mouseover
multilevel menus (menus that expand when the user places their mouse over a
menu item) and other interactive effects on their sites.
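To see why such false positives are hard to avoid, consider the kind of crude pattern
matching a filter might apply to style sheets. The Python sketch below is a simplified
illustration only, not Google's actual detection logic.

import re

# Patterns associated with the hiding techniques shown above (illustrative only).
HIDING_PATTERNS = [
    r"visibility\s*:\s*hidden",
    r"width\s*:\s*1px",
    r"height\s*:\s*1px",
    r"left\s*:\s*-\d{3,}px",      # content pushed far off-screen
]

def suspicious_css(css):
    # Return the hiding patterns that appear in a CSS snippet.
    return [p for p in HIDING_PATTERNS if re.search(p, css, re.IGNORECASE)]

print(suspicious_css("#hideleft { position:absolute; left:-1000px; }"))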
The HTML NoScript tag is designed to provide alternative content for browsers that do
not run JavaScript. While NoScript has legitimate uses, it can also be abused by black hat optimizers.
Many use the NoScript tag for keyword stuffing. There is some good anecdotal
evidence to suggest that this tactic has helped some sites achieve a high ranking for
competitive terms [27].
4.3 Cloaking
As we have already seen, some of the tricks used in black hat SEO are not conducive to
a good visitor experience. Cloaking overcomes this problem. The main goal of cloaking
is to provide different content to the search engines and to human visitors. Since users
will not see a cloaked page, it can contain only optimized text—no design elements are
needed. So the black hat optimizer will set up a normal Web site and individual, text
only, pages for the search engines. The Internet protocol (IP) addresses of the search
engine spiders are well known. This allows the optimizer to include simple code on the
page that serves the appropriate content to either the spider or human (see Fig. 8).
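Conceptually, the server-side logic is simple, as the Python sketch below shows. The IP
prefix and file names are illustrative assumptions; real cloaking scripts maintain long
lists of known spider addresses. As discussed above, serving different content to spiders
violates the search engines' guidelines.

SPIDER_IP_PREFIXES = ("66.249.",)       # illustrative prefix only, not a complete list

def load_page(filename):
    with open(filename, encoding="utf-8") as f:
        return f.read()

def serve_page(visitor_ip):
    if visitor_ip.startswith(SPIDER_IP_PREFIXES):
        # text-only, keyword-optimized page for the spider
        return load_page("spider_version.html")
    # normally designed page for human visitors
    return load_page("human_version.html")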
Some black hat optimizers are taking the cloaking concept to the next level and
using it to optimize for each individual search engine. Since each search engine uses
a different algorithm, cloaking allows optimizers to serve specific content to each
different spider.
Since some types of cloaking actually may provide benefits to users, the concept
of cloaking and what is, and is not, acceptable by the search engines has evolved
FIG. 8. Cloaking: in response to an HTTP request, the Web server returns a keyword-
optimized page with no formatting to the search engine spiders and a normally formatted
Web page to human visitors.
over the past few years. One topic of much debate is the concept of geolocation.
Geolocation uses a visitor’s IP address to determine their physical location and
changes the site’s content accordingly. For instance, a site that sells baseball
memorabilia might use geolocation to direct people who live in the New York
City area to a Yankees page and those who live in the Boston area to a Red Sox page.
Clearly, geolocation allows site developers to provide more highly targeted content.
The main question is if the site serves different content to the search engines than to
most users, is it still considered cloaking? Maile Ohye [28] posting on the Google
Webmaster Central Blog chimed in on the controversy. According to Ohye as long as
the site treats the Google spider the same way as a visitor, by serving content that is
appropriate for the spider’s IP location, the site will not incur a penalty for cloaking.
FIG. 9. Doorway pages: several pages, each formatted for the search engine spiders and
optimized for a different keyword, funnel visitors to a single homepage formatted for
human visitors.

A doorway page is a keyword-optimized page created for the search engine spiders
rather than for people. Doorway pages traditionally used a fast meta
refresh to redirect users to the main page (see Fig. 9). A meta refresh is an HTML
command that automatically switches users to another page after a specified period
of time. Meta refresh is typically used on out of date Web pages—you often see
pages that state you will be taken to the new page after 5 s. A fast meta
refresh occurs almost instantly, so the user is not aware of it. All of the major search
engines now remove pages that contain meta refresh. Of course, the black hats have
fought back with a variety of other techniques, including the use of Javascript
for redirects. This is the specific technique that caused Google to ban bmw.de and
ricoh.de.
Blogs present another link building opportunity, and black hat optimizers take
advantage of them. Two ways that black hats use blogs to generate incoming links
are blog comment spamming and trackback spamming.
Blog comment spamming is similar to guest book spamming, in that the optimizer
leaves backlinks in the comments section of publicly available blogs. However,
many blogging systems have added features to handle comment spam. At the
simplest level, the blog owner can require that all comments receive his or her
approval before they appear on the site. Another simple technique is to use nofollow
tags on all comments.
Many blogging systems now have a simple plugin that will require the commenter
to complete a ‘‘Completely Automated Public Turing test to tell Computers and
Humans Apart’’ (CAPTCHA). A CAPTCHA system presents a visitor with an
obscured word, words, or phrase. The obscuring is usually achieved by warping
the words, distorting the background, or segmenting the word by adding lines. While
humans can see through the obscuring technique, computers cannot. Since compu-
ters cannot solve a CAPTCHA, systems that use it are usually not vulnerable to
automated spamming software. However, some CAPTCHA systems have been
hacked, allowing black hat optimizers to successfully bypass this countermeasure
on certain sites.
A trackback is a link between blogs that is created when one blog refers to or comments
on a post on another blog. When this occurs, the original blog will generate a link to the
blog that made the comment. For example, suppose a legitimate blog (blog A) publishes
a post and accepts trackbacks. A black hat optimizer ‘‘comments’’ on the post on his own
site, including his backlink, and then sends blog A a trackback ping (just a signal that
indicates that the optimizer had something to say about a post on blog A). Blog A will
then automatically display a summary of the comment provided by the black hat and a
link to the black hat’s site. Trackbacks bypass many of the methods that can be used
to handle comment spam. For this reason many blog systems no longer use
trackbacks.
Most serious black hat optimizers use scripts or software to find blogs that have
comment sections that do not require approval, do not use CAPTCHA, and do not
apply nofollow tags. Once the system builds a large list of the appropriate type of blogs it then
submits the backlink to all of them. Similar systems are available that can find blogs
that have trackbacks enabled and implement trackback spamming. Using blog
spamming systems a black hat optimizer can easily generate thousands, even tens
of thousands, of quick backlinks.
It should be noted that many white hat optimizers also use scripts or software to
automate their own blog comment campaigns. However, instead of taking a scatter
shot approach, as the black hats do, white hat optimizers submit useful comments to
relevant blogs. They use the automated tools to find those blogs and manage their
comment campaigns.
Referrer spam involves repeatedly requesting pages from other Web sites while passing
the URL of the site to be promoted as the HTTP referrer. If a site’s
statistics package publishes the referrer list the target site will appear as a backlink.
Like the other types of link spam discussed, there are scripts and software available
that will help the black hat find appropriate sites (those that publish referrer links)
and continuously request Web pages from them so that the referrer appears on the
top referrers list.
Clearly, a simple way around this type of link spam is to set stats packages so they
do not publish their results in a publicly accessible area. Site owners can also ban
specific IP addresses (or address ranges) or ignore requests from certain referrers.
However, these measures typically require at least some level of technical expertise,
which may be beyond most site owners.
Another way to build links quickly is simply to buy them. Since Google’s algorithm
gives links a great deal of weight, the company quickly caught on to these paid link
schemes and began to take action in 2005 [29]. Google appears to
have two ways it deals with paid links. First, its algorithm looks for paid links and will
penalize sites that sell them and those that purchase them. It attempts to find paid links in
a number of ways. It can, for example, look for links that follow text like ‘‘sponsored
links’’ or ‘‘paid links.’’ It also looks for links that appear out of place from a contextual
perspective. For example, a site about computers that contains links to casino sites
might be penalized. Second, Webmasters can now report sites they believe are involved
in buying or selling links. These sites then undergo a manual review.
The penalties for buying or selling links appear to vary. At the low end of the
penalty range, Google might simply discount or remove paid links when determin-
ing a site’s PR. Sites that sell links may lose their ability to flow PR to other sites.
Finally, Google can ban repeat or egregious violators from its index.
Of course, the companies that sell links have attempted to develop techniques
aimed at fooling Google. For example, some companies now sell links that appear
within the text of a site. These links appear natural to Google. In addition, some
companies specialize in placing full site or product reviews. Some bloggers, for
example, will provide a positive site or product review, with associated links, for a fee.
‘‘Google bowling’’ is the practice of using black hat techniques against competitors’
sites in an attempt to get them penalized or banned. If a black hat site is ranked third
for a key term, the optimizer who can get the top two sites banned will be ranked first.
There are a number of techniques that can be used for bowling. For instance, the
HTML injection approach discussed above can be used to change the content that
appears on a competitor’s site. If a black hat optimizer is targeting a site that sells
computers, for example, the HTML injected might be < H1>computer, computer,
computer. . . The extensive use of keywords over and over again is almost guaran-
teed to lead to a penalty or outright ban in all the major search engines.
A recent article in Forbes [30] discussed the tactic of getting thousands of quick
links to a site a black hat wants to bowl. Quickly piling up incoming links is viewed
in a negative light by most search engines.
Since this type of SEO is ethically questionable, most optimizers who conduct
such campaigns are required to sign nondisclosure agreements. Therefore, uncovering
actual cases where this approach has worked (produced a ban) is extremely difficult.
However, a competitor in a 2006 SEO competition believes he was inadvertently
bowled [31]. The goal of the competition was to achieve the best ranking for a made-
up term. The winner received $7000. One competitor offered to donate any winnings
to Celiac Disease Research. So many people began linking to the site that it quickly
started ranking well (typically in the top five on the major search engines). However,
over a period of a few months, as the number of incoming links kept increasing, the
site began to lose rank in Google (while maintaining rank on Yahoo and MSN). The
site was never completed bowled and eventually regained much of it rank.
According to the Forbes article, Google’s Matt Cutts states, ‘‘We try to be
mindful of when a technique can be abused and make our algorithm robust against
it. I won’t go out on a limb and say it’s impossible. But Google bowling is much
more inviting as an idea than it is in practice.’’
Obviously, the field of SEO raises important legal and ethical considerations. The
main legal concern is copyright infringement. Ethical considerations are more
complex as there are currently no standards or guidelines in the industry.
The use of content from another site without attribution is clearly a violation of
copyright law. However, on the Web copyright issues can become somewhat complex.
For example, Wikipedia (www.wikipedia.org) uses the GNU Free Documentation
License (GFDL), which explicitly states, ‘‘Wikipedia content can be copied, modified,
and redistributed if and only if the copied version is made available on the same terms to
others and acknowledgment of the authors of the Wikipedia article used is included
(a link back to the article is generally thought to satisfy the attribution requirement).’’ So
simply including content from Wikipedia on a site would not constitute a copyright
violation as long as a link back to the source article was included.
YouTube.com is another interesting example of the complexities of Web copy-
right. The site has very strict guidelines and enforcement mechanisms to prevent
users from uploading copyrighted material without appropriate permissions. How-
ever, when a user submits their own videos to YouTube, they ‘‘grant YouTube a
worldwide, nonexclusive, royalty-free, sublicenseable and transferable license to
use, reproduce, distribute, prepare derivative works of, display, and perform the
User Submissions in connection with the YouTube Web site. . . You also hereby
grant each user of the YouTube Web site a nonexclusive license to access your User
Submissions through the Web site, and to use, reproduce, distribute, display, and
perform such User Submissions as permitted through the functionality of the Web
site and under these Terms of Service (from http://www.youtube.com/t/terms?
hl=en_US).’’ Since YouTube makes it very easy to include videos on a Web site
(via the embed feature), it appears that doing so does not violate copyright.
So, at what point does the optimizer (either white or black hat) cross the line and
become a copyright violator? If, for example, you read this chapter and then write a
summary of it in your own words it is not a violation of copyright. Does this
assessment change if instead of writing the summary yourself, you write a computer
program that strips out the first two sentences of each paragraph in order to
automatically generate the summary? What if you take those sentences and mix
them with others extracted from a dozen other articles on SEO? This is essentially
what some of the content generators do.
From a practical perspective, black hats can actually use copyright law as a
weapon in their arsenal. All of the major search engines provide a mechanism for
making a complaint under the Digital Millennium Copyright Act (DMCA). Some of
the search engines will remove a site from its index as soon as a DMCA complaint is
received. In addition, sites that contain backlinks, such as YouTube, will also
remove the content when a complaint is received. All of these sites will attempt
to contact the ‘‘infringing’’ site and allow it to provide a counter-notification—
basically a defense against the complaint. There is at least one anecdotal report (see
http://www.ibrian.co.uk/26-06-2005/dmca-the-new-blackhat-for-yahoo-search/) that
this approach resulted in the temporary removal of a site from Yahoo.
It should be noted that submitting a false DMCA complaint is not without serious
risks and potential consequences. As stated on the Google DMCA site (http://www.
google.com/dmca.html), ‘‘Please note that you will be liable for damages (including
costs and attorneys’ fees) if you materially misrepresent that a product or activity is
infringing your copyrights. Indeed, in a recent case (please see http://www.
onlinepolicy.org/action/legpolicy/opg_v_diebold/ for more information), a com-
pany that sent an infringement notification seeking removal of online materials
that were protected by the fair use doctrine was ordered to pay such costs and
attorneys’ fees. The company agreed to pay over $100,000.’’
Can a search engine ethically and legally ‘‘sell’’ the top spots in its organic results
without labeling them as sponsored or ads? In fact, in the early days of PPC
advertising (prior to 2002)
most search engines did not readily distinguish between paid and nonpaid listings in
the search results. This led to a June 27, 2002 U.S. Federal Trade Commission (FTC)
recommendation to the search engines that ‘‘any paid ranking search results are
distinguished from nonpaid results with clear and conspicuous disclosures’’ [32].
The major search engines have complied with this recommendation and it is likely
that any search engine that ‘‘sells’’ organic results without disclosing they are paid
will incur an FTC investigation.
Finally, do the search engines have any legal or ethical obligations when it comes
to their users? Search engines have the ability to gather a tremendous amount of data
about a person based on his or her search patterns. If that user has an account on the
search engine (usually for ancillary services like email) then these data can become
personally identifiable. What the search engines and users must be aware of is that as
global entities, the data collected by search engines may fall under various jurisdic-
tions. For example, in 2004 Yahoo! Holdings (Hong Kong) in response to a
subpoena from Chinese authorities provided the IP address and other information
about dissident Shi Tao. Based on the information provided, Tao was sentenced to
10 years in prison for sending foreign Web sites the text of a message warning
journalists about certain actions pertaining to the anniversary of the Tiananmen
Square massacre [33]. While Yahoo! acted according to local laws, many would
claim that they acted unethically. In fact, Yahoo! executives were called before a
Congressional committee to testify about the ethics of its actions.
6. Conclusions
The growth in the number of Web searches, along with more online purchases,
and the ability to precisely track what site visitors are doing (and where they have
come from) has led to explosive growth in the SEM industry. This growth is
expected to continue at around a 13% annual rate over the next few years, compared
with a 4% growth rate for offline advertising [34]. Given this growth and the
profit incentives involved it is no wonder that some people and companies are
looking for tools and techniques to provide them with an advantage.
SEO is a constantly changing field. Not only are the major search engines
continually evolving their algorithms, but also others are entering (and frequently
leaving) the Web search space. For example, while writing this chapter Microsoft
launched its latest search engine—Bing. The press has also given much attention to
Wolfram Alpha and Twitter’s search capability. By the time you read this we should
have a better idea if these new initiatives have been a success or a flop. In either case
SEO practitioners and researchers need to keep abreast of the most recent develop-
ments in this field.
It is not clear exactly which SEO techniques, and at what scope, will result in a
penalty or outright ban in the search engines.
As the search engine industry evolves researchers should try to keep pace. For
example, Google has recently begun to provide integrated search results. As shown
in Fig. 12, instead of just showing a list of Web sites, Google now provides products,
video, and images that match the query. It is unclear exactly how these should be
handled from an SEO perspective.
Second, researchers can play a very important role in helping the search engines
improve their algorithms and develop other measures to deal with black hat SEO.
Much of the original research and development of many of the major search engines
comes directly from academia. For example, Google began as a Ph.D. research
project at Stanford University and the university provided the original servers for the
search engine. Ask.com is a search engine that started out as Teoma. The underlying
algorithm for Teoma was developed by professors at Rutgers University in
New Jersey.
While more work needs to be done, some researchers have already begun con-
ducting research into preventing black hat SEO methods. Krishnan and Raj [35] use
a seed set of black hat (spam) pages and then follow the links coming into those
pages. The underlying concept is that good pages are unlikely to link to spam pages.
Thus by following back from the spam pages, a process they term anti-trust rank, they
find pages that can be removed from or penalized by the search engines. Kimura
et al. [36] developed a technique based on network theory aimed at detecting black
hat trackback blog links.
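As a rough illustration of the anti-trust idea, the Python sketch below starts from known
spam pages and walks backward along incoming links, flagging pages that link into the
spam set. This is a simplification of Ref. [35], which propagates weighted scores rather
than a binary flag.

from collections import deque

def antitrust_flag(outlinks, spam_seeds):
    # Build the reverse graph: for each page, which pages link to it?
    inlinks = {page: [] for page in outlinks}
    for page, targets in outlinks.items():
        for target in targets:
            inlinks.setdefault(target, []).append(page)

    # Walk backward from the known spam pages.
    flagged = set(spam_seeds)
    queue = deque(spam_seeds)
    while queue:
        page = queue.popleft()
        for linker in inlinks.get(page, []):
            if linker not in flagged:
                flagged.add(linker)
                queue.append(linker)
    return flagged

web = {
    "good-blog": ["news-site"],
    "news-site": [],
    "link-farm": ["spam-page"],
    "spam-page": [],
}
print(antitrust_flag(web, {"spam-page"}))   # flags spam-page and link-farm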
Third, academic researchers can help in understanding how visitors and potential
customers use search engines. As mentioned above, Microsoft has taken a first step in
this direction with its research into Online Commercial Intention (OCI). However,
the current OCI research is based on human interpretation of search terms.
A promising area for future research would be to analyze what search engine visitors
actually do after performing certain queries.
Another interesting research area is understanding the demographic and behav-
ioral characteristics of the people who visit each search engine and how these impact
online purchases. There is some anecdotal evidence to suggest that the major search
engines do perform differently in terms of conversion rate—at least for paid search
[37]. Taking this concept a step further, an optimizer might decide to focus an SEO
campaign on a certain search engine based on a match between demographics, the
product or service offered, and the conversion rate.
References
[1] SEMPO, The State of Search Engine Marketing 2008, 2008. Retrieved May 1, 2009, from http://
www.sempo.org/learning_center/research/2008_execsummary.pdf.
[2] R. Sen, Optimal search engine marketing strategy, Int. J. Electron. Comm. 10 (1) (2005) 9–25.
[3] iProspect, iProspect Search Engine User Behavior Study, 2006. Retrieved June 15, 2009, from http://
www.iprospect.com/premiumPDFs/WhitePaper_2006_SearchEngineUserBehavior.pdf.
[4] L.A. Granka, T. Joachims, G. Gay, Eye-tracking analysis of user behavior in WWW search, in:
Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Develop-
ment in Information Retrieval, Sheffield, United Kingdom, July 25–29, 2004. Retrieved June 15,
2009, from http://www.cs.cornell.edu/People/tj/publications/granka_etal_04a.pdf.
[5] R. Bauer, SEO Services Comparison & Selection Guide, 2008. Retrieved June 10, 2009, from http://
www.scribd.com/doc/2405746/SEO-Pricing-Comparison-Guide.
[6] R. Malaga, The value of search engine optimization: an action research project at a new e-commerce
site, J. Electron. Comm. Organ. 5 (3) (2007) 68–82.
[7] R. Malaga, Worst practices in search engine optimization, Commun. ACM 51 (12) (2008) 147–150.
[8] M.S. Raisinghani, Future trends in search engines, J. Electron. Comm. Organ. 3 (3) (2005) i–vii.
[9] J. Zhang, A. Dimitroff, The impact of metadata implementation on webpage visibility in search
engine results (Part II), Inform. Process. Manage. 41 (2005) 691–715.
[10] E. Burns, U.S. Core Search Rankings, February 2008, 2008. Retrieved May 1, 2009, from http://
searchenginewatch.com/showPage.html?page=3628837.
[11] K. Curran, Tips for achieving high positioning in the results pages of the major search engines,
Inform. Technol. J. 3 (2) (2004) 202–205.
[12] D. Sullivan, How Search Engines Rank Web Pages, 2003. Retrieved June 15, 2009, from http://
searchenginewatch.com/webmasters/article.php/2167961.
[13] H.K. Dai, L. Zhao, Z. Nie, J. Wen, L. Wang, Y. Li, Detecting Online Commercial Intention (OCI),
in: Proceedings of the 15th International Conference on the World Wide Web, Edinburgh, Scotland,
2006, pp. 829–837.
[14] T. O’Reilly, What Is Web 2.0—Design Patterns and Business Models for the Next Generation of
Software, 2005. Retrieved May 1, 2009, from http://www.oreillynet.com/pub/a/oreilly/tim/news/
2005/09/30/what-is-web-20.html.
[15] Technorati, Welcome to Technorati, 2008. Retrieved May 1, 2009, from http://technorati.com/about/.
[16] Wikipedia, Social Bookmarking, 2008. Retrieved May 1, 2009, from http://en.wikipedia.org/wiki/
Social_bookmarking.
[17] R. Malaga, Web 2.0 techniques for search engine optimization—two case studies, Rev. Bus. Res.
9 (1) 2009.
[18] J. Zhang, A. Dimitroff, The impact of webpage content characteristics on webpage visibility in
search engine results (Part I), Inform. Process. Manage. 41 (2005) 665–690.
[19] A. Beal, SMX: Cutts on Themes and Latent Semantic Indexing, 2007. Retrieved June 10, 2009, from
http://www.webpronews.com/blogtalk/2007/06/11/smx-cutts-on-themes-and-latent-semantic-
indexing.
[20] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, R. Harshman, Indexing by latent semantic
analysis, J. Am. Soc. Inform. Sci. 41 (6) (1990) 391–407.
[21] Practical Ecommerce, Importance of New, Fresh Content for SEO, 2009. Retrieved June 10, 2009,
from http://www.practicalecommerce.com/podcasts/episode/803-Importance-Of-New-Fresh-Content-
For-SEO.
[22] D. Rowse, How Much Does Fresh Content Matter in SEO? 2007. Retrieved June 10, 2009, from
http://www.problogger.net/archives/2007/05/19/how-much-does-fresh-content-matter-in-seo/.
[23] A. K’necht, SEO and Your Web Site—Digital Web Magazine, 2004. Retrieved June 10, 2009, from
http://www.digital-web.com/articles/seo_and_your_web_site/.
[24] R. Nobles, How Important Is ALT Text in Search Engine Optimization? 2005. Retrieved June 10,
2009, from http://www.webpronews.com/topnews/2005/08/15/how-important-is-alt-text-in-search-
engine-optimization.
[25] M. Cutts, Confirming a Penalty, 2006. Retrieved June 10, 2009, from http://www.mattcutts.com/
blog/confirming-a-penalty/.
[26] M. Cutts, SEO Tip: Avoid Keyword Stuffing, 2007. Retrieved June 10, 2009, from http://www.
mattcutts.com/blog/avoid-keyword-stuffing/.
[27] S. Spencer, Bidvertiser SO Does Not Belong in Google’s Top 10 for ‘‘marketing’’, 2007. Retrieved
June 10, 2009, from http://www.stephanspencer.com/tag/noscript.
[28] M. Ohye, How Google Defines IP Delivery, Geolocation, and Cloaking, 2008. Retrieved June 10,
2009, from http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.
html.
[29] M. Cutts, How to Report Paid Links, 2007. Retrieved June 10, 2009, from http://www.mattcutts.
com/blog/how-to-report-paid-links/.
[30] A. Greenberg, The Saboteurs of Search, 2007. Retrieved June 10, 2009, from http://www.forbes.
com/2007/06/28/negative-search-google-tech-ebiz-cx_ag_0628seo.html.
[31] Anonymous, Google Bowling, 2006. Retrieved June 10, 2009, from http://www.watching-paint-dry.
com/v7ndotcom-elursrebmem/google-bowling/.
[32] H. Hippsley, Letter to Mr. Gary Ruskin, Executive Director, Commercial Alert, 2002. Retrieved
September 3, 2009, from http://www.ftc.gov/os/closings/staff/commercialalertletter.shtm.
[33] BBC, Yahoo ‘helped jail China writer’, 2007. Retrieved September 4, 2009, from http://news.bbc.
co.uk/2/hi/asia-pacific/4221538.stm.
[34] J. Kerstetter, Online Ad Spending Should Grow 20 Percent in 2008, 2008. Retrieved June 10, 2009,
from http://news.cnet.com/8301-1023_3-9980927-93.html.
[35] V. Krishnan, R. Raj, Web spam detection with anti-trust rank, in: 2nd Workshop on Adversarial
Information Retrieval on the Web, Seattle, WA, August 2006. Retrieved June 10, 2009, from http://i.
stanford.edu/~kvijay/krishnan-raj-airweb06.pdf.
[36] M. Kimura, K. Saito, K. Kazama, S. Sato, Detecting Search Engine Spam from a Trackback
Network in Blogspace, in: Lecture Notes in Computer Science, Springer, Berlin, 2005, p. 723.
[37] D.J. Kennedy, Google, Yahoo! or MSN—Who Has the Best Cost per Conversion—A Study, 2008.
Retrieved June 10, 2009, from http://risetothetop.techwyse.com/pay-per-click-marketing/google-
yahoo-or-msn-who-has-the-best-cost-per-conversion-a-study/.