Wad Module3
Module 3: Web Search and Information Retrieval
Web Search and Retrieval: Search Engine Optimization -
Importance of SEO for web visibility and ranking,
Types of SEO: On-page, Off-page, Technical SEO.
Web Crawling and Indexing - Crawling Algorithms and Challenges,
Ranking Algorithms,
Web Traffic Models.
"Web Search & Retrieval" refers to the process of finding relevant information on the
World Wide Web by using a search engine.
It is the process of using an online search engine to locate documents, web pages,
videos, images, or any other type of digital content on the internet.
It applies "Information Retrieval" techniques to the vast collection of data available
online.
• The user enters a query, and the system returns a ranked list of web pages that best
match the search terms, based on algorithms that analyze content and link
structure.
• It involves components such as web crawlers, which discover web pages for indexing,
and ranking algorithms, which determine the most relevant results for a given query.
Advantages of web search
Web searching is fast, convenient, and comprehensive. It offers a number of advantages
over other methods of finding information, including the following:
• You can quickly search through thousands of websites containing data on virtually
any topic.
• It enables you to easily compare different sources of information to get the most
accurate and up-to-date facts.
• Your web search history is typically saved automatically, so you can return to your
search results whenever you want.
• It allows you to narrow down your search criteria so that you can find exactly what
you are looking for.
• Most search engines now offer a web search app, so you can access them with a
single tap on your phone.
Types of web search:
• Boolean search: This type of search allows the user to combine keywords and
phrases using Boolean operators such as "AND", "OR", and "NOT" in
order to narrow down results. This approach is best for finding specific information.
• Natural language search: This type allows the user to type in a phrase or question
the same way they would say it out loud. It is well suited to conversational,
question-style queries.
• Semantic search: This type takes into account the context of a query to provide
more precise results. Semantic search is well suited to finding information related to
specific topics or concepts.
Boolean search - e.g.
1. A typical example of a Boolean search in a web application is searching for
"laptops AND (Dell OR HP) NOT Chromebook" on an online shopping site. This
returns only laptops that are either Dell or HP and excludes
any Chromebooks, showing how the "AND", "OR", and "NOT" operators
narrow down the search results.
2. "Climate change OR global warming NOT politics": finds articles about climate
change or global warming, excluding articles focused on politics.
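The first Boolean query above can be sketched with set operations, where AND is an intersection, OR is a union, and NOT is a set difference. This is only a minimal illustration: the product titles and the helper name `matching` are made up for the example, and a real shopping site would use a proper search index rather than scanning titles.

```python
# Toy product catalogue; the titles are hypothetical.
PRODUCTS = [
    "Dell Inspiron laptops",
    "HP Pavilion laptops",
    "HP Chromebook laptops",
    "Lenovo ThinkPad laptops",
    "Dell desktop monitor",
]

def matching(term):
    """Return the positions of all products whose title contains the term."""
    return {
        i for i, title in enumerate(PRODUCTS)
        if term.lower() in title.lower().split()
    }

# laptops AND (Dell OR HP) NOT Chromebook
hits = (matching("laptops") & (matching("Dell") | matching("HP"))) - matching("Chromebook")
matches = [PRODUCTS[i] for i in sorted(hits)]
print(matches)
```

With this toy catalogue, only the Dell and HP laptops that are not Chromebooks remain in `matches`.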
Natural language search - e.g.
https://www.hindustantimes.com/world-news/volodymyr-zelensky-on-white-house-fight-with-donald-trump-not-good-for-both-sides-101740786722125.html
https://www.thehindu.com/news/international/world-reacts-to-trump-zelensky-oval-office-clash/article69278424.ece
Semantic search
• Semantic search is a set of search engine capabilities that includes
understanding the searcher's intent and the context of the search.
• It is intended to improve the quality of search results by interpreting natural language
more accurately and in context.
• Semantic search is a data searching technique that focuses on understanding the
contextual meaning and intent behind a user's search query, rather than only
matching keywords.
• Google's Knowledge Graph is one of the most well-known examples of semantic
search in action. It uses data from a variety of sources to provide users with
information about people, places, and things. The Knowledge Graph has been
growing steadily since it was first introduced in 2012.
• Search engines use semantic search to provide users with more accurate and relevant
search results. Companies use semantic search to boost market visibility, increase sales,
and more.
Key points about Web Search & Retrieval:
Information Retrieval (IR):
• The broader field that encompasses the theory and techniques for finding relevant
information in a collection of data. Web search is a specific application of IR.
Web Crawler:
• A program that automatically browses the web, discovering and downloading web
pages to be indexed by the search engine.
Index:
• A structured data store that allows quick lookups of relevant web pages based on
keywords and other metadata.
Query:
• The text a user enters into a search engine to specify what information they are
looking for.
Ranking Algorithm:
• The mathematical procedure that determines the order in which search results
are displayed, based on relevance to the query.
• Example:
When you search for "best restaurants near me" on Google, the search engine
looks up restaurant pages that its web crawler has already discovered and that were
indexed by keywords and location data, and then applies its ranking algorithm to
present the most likely top matches for your query.
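The idea of an index that supports quick keyword lookups can be sketched as an inverted index, a mapping from each word to the set of pages that contain it. This is a minimal sketch under stated assumptions: the page ids, page texts, and the function names `build_index` and `lookup` are hypothetical, and real search engines store far more metadata per entry.

```python
from collections import defaultdict

# Hypothetical crawled pages: page id -> page text.
PAGES = {
    "page1": "best pizza restaurants in town",
    "page2": "cheap flights and hotels",
    "page3": "best sushi restaurants near the station",
}

def build_index(pages):
    """Map every word to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def lookup(index, query):
    """Return the ids of pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(PAGES)
found = lookup(index, "best restaurants")
print(sorted(found))
```

Because the index is built once ahead of time, answering a query only needs a few set lookups instead of rescanning every page.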
Bookmark (or favourite): a saved shortcut (link) to a web page that is
stored in a web browser.
Saving bookmarks allows users to quickly navigate back to the websites they visit the
most.
Bookmark bar: a toolbar that displays bookmarks at the top of a
browser window (under the address bar).
Web search vs Information retrieval
• While both involve finding relevant information based on a user query, "web search" is
a specific application of "information retrieval" that focuses on searching the vast,
interconnected collection of data on the World Wide Web.
The main steps and components of a web search are:
• User Input: The user enters a keyword or phrase into the search interface.
• Query Processing: The search engine processes the query using algorithms to find
relevant documents in its database.
• Result Ranking: The engine ranks the results based on relevance and displays
them to the user.
• Web Crawlers: Web crawlers play a vital role in web search by systematically
browsing the internet to index content. They ensure that search engines have
up-to-date information to provide users with the most relevant results.
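The query processing and result ranking steps can be illustrated with a small TF-IDF scorer. TF-IDF is a classic information retrieval weighting and is used here only as an example; the notes do not say which scoring method any particular engine uses. The documents, the function names, and the `+ 1.0` smoothing term are assumptions made for this sketch.

```python
import math
from collections import Counter

# Hypothetical document collection: id -> text.
DOCS = {
    "a": "python web crawler tutorial",
    "b": "web search engine ranking",
    "c": "cooking pasta at home",
}

def tf_idf_scores(query, docs):
    """Score each document by summing TF-IDF weights of the query terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for w in tokenized.values() if term in w)
            if df == 0:
                continue
            tf = counts[term] / len(words)
            idf = math.log(n_docs / df) + 1.0  # smoothing choice, an assumption
            score += tf * idf
        scores[doc_id] = score
    return scores

def rank(query, docs):
    """Return ids of matching documents, most relevant first."""
    scores = tf_idf_scores(query, docs)
    matching = [d for d in scores if scores[d] > 0]
    return sorted(matching, key=lambda d: scores[d], reverse=True)

ranking = rank("web search", DOCS)
print(ranking)
```

Document "b" matches both query terms, so it ranks above "a", which matches only "web"; document "c" matches nothing and is left out.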
Search Engine Optimization
• Search engine optimization (SEO) is the process of improving the quality and quantity
of traffic to a website or a web page from search engines.
• SEO targets unpaid search traffic rather than direct traffic, referral traffic, social
media traffic, or paid traffic.
• SEO is an essential practice for any website looking to
improve its visibility and attract more organic traffic.
• It is the practice of orienting your website to rank higher on a search engine results
page (SERP) so that you receive more traffic. The aim is typically to rank on the first
page of Google results for the search terms that mean the most to your target audience.
The four types of SEO are on-page, off-page, technical, and local.
Each type has its own strategies and best practices to improve a website's search
engine ranking.
On-page SEO
• Also known as on-site SEO, this involves optimizing a website's body copy,
keywords, headers, meta titles, meta descriptions, and images.
Off-page SEO
• This involves activities carried out outside the website itself, such as earning
backlinks and promoting the website on third-party websites.
Technical SEO
• This involves optimizing how search engines crawl, index, and render the site,
including its website architecture.
Local SEO
• This involves optimizing a website for customers in a particular geographic area.
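As a small illustration of on-page SEO, the sketch below inspects a page for some of the elements listed above: the title, the meta description, headers, and image alt text. It uses Python's standard `html.parser` module. The class name `OnPageAudit`, the sample HTML, and the choice of checks are assumptions for the example, not an official SEO checklist.

```python
from html.parser import HTMLParser

class OnPageAudit(HTMLParser):
    """Collect a few on-page SEO signals from an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.h1_count = 0
        self.images_missing_alt = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content")
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "img" and not attrs.get("alt"):
            # An image without alt text is harder for search engines to interpret.
            self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Hypothetical page used only for this example.
SAMPLE_HTML = """
<html><head>
<title>Fresh Bread Bakery</title>
<meta name="description" content="Artisan bread baked daily.">
</head><body>
<h1>Our bakery</h1>
<img src="loaf.jpg" alt="Sourdough loaf">
<img src="shop.jpg">
</body></html>
"""

audit = OnPageAudit()
audit.feed(SAMPLE_HTML)
print(audit.title, audit.meta_description, audit.h1_count, audit.images_missing_alt)
```

In this sample the audit finds a title, a meta description, one `h1` header, and one image that lacks alt text.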
Other types of SEO
• International SEO: Optimizing a website for users in different countries and
speakers of different languages.
• Multilingual SEO: Optimizing website content for multiple languages to improve
visibility and ranking on search engines.
• White hat SEO: A safe optimization strategy that follows search engine guidelines
to rank websites higher on search engine results pages (SERPs).
• Black hat SEO: Tactics such as keyword stuffing and spam that abuse
Google's algorithms.
• Grey hat SEO: A technique that falls between white hat and black hat SEO.
To optimize your website for search engines:
• Conduct thorough keyword research.
• Create high-quality content relevant to those keywords.
• Optimize your website structure.
• Use relevant meta tags.
• Ensure mobile responsiveness.
• Monitor your website's performance using analytics tools like Google Search
Console to identify areas for improvement.
In essence, these steps make your site easily understood by, and valuable to, search
engines like Google, which can lead to higher rankings in search results.
"Website crawling and indexing" refers to the process where
• search engine bots, also called "crawlers" or "spiders", automatically discover and
explore web pages across the internet,
• then store and organize the content of those pages in a searchable database called
an "index",
• which allows search engines to display relevant results when users search for
information online.
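The discovery step of crawling can be sketched as a breadth-first traversal of links, starting from a seed URL. To keep the example self-contained and runnable offline, the "web" here is a hypothetical in-memory dictionary (`FAKE_WEB`) rather than real HTTP requests. A real crawler would also fetch and parse pages, respect robots.txt, and limit its request rate.

```python
from collections import deque

# Hypothetical link graph standing in for the web: URL -> links found on that page.
FAKE_WEB = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}

def crawl(seed, fetch_links, max_pages=100):
    """Visit pages breadth-first from the seed, never visiting a URL twice."""
    visited = []
    seen = {seed}
    frontier = deque([seed])
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)  # a real crawler would store the page content here
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

order = crawl("https://example.com/", lambda url: FAKE_WEB.get(url, []))
print(order)
```

The `seen` set prevents the crawler from looping forever on pages that link back to each other, and `max_pages` caps how much of the site is explored.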
Challenges in web crawling
• Page structure changes: Frequent changes to page structure can make it hard for web
crawlers to extract content reliably.
• Dynamic content: Websites that use dynamic content, like JavaScript, can be difficult
for crawlers to render and index.
• IP blocking: Websites can block IP addresses that make too many requests to the site.
• Scalability: As data needs grow, so does the complexity of managing, storing,
and processing it.
• Bandwidth: Web crawlers can consume a lot of server bandwidth when visiting
large websites.
• Crawling policies: Crawlers must adhere to website terms of use and to the rules
in robots.txt files.
• Ethical considerations: Web crawlers must comply with legal and ethical
guidelines.
• Managing large-scale data: Web crawlers must handle large amounts of data
efficiently.
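Honouring a crawling policy can be sketched with Python's standard `urllib.robotparser` module, which reads robots.txt rules and answers whether a given user agent may fetch a URL. The robots.txt content, the user agent name "MyCrawler", and the URLs below are hypothetical. The rules are parsed from a string so that the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether our (hypothetical) crawler may fetch each URL.
allowed = parser.can_fetch("MyCrawler", "https://example.com/index.html")
blocked = parser.can_fetch("MyCrawler", "https://example.com/private/data.html")
print(allowed, blocked)
```

A polite crawler would run a check like this before every request and skip any URL for which `can_fetch` returns `False`.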
Ranking algorithms
• Search engines use ranking algorithms to determine the order of search results for
a given query.
• Some examples of ranking algorithms include PageRank, HITS, and Tagrank.
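As an illustration of a link-based ranking algorithm, here is a minimal sketch of PageRank computed by repeated iteration. The four-page link graph, the damping factor of 0.85, and the fixed number of iterations are assumptions chosen for the example, and every link target is assumed to appear as a key in the graph. Production systems combine many more signals than link structure.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank scores for a graph given as page -> list of outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank equally among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: C is linked to by A, B, and D.
GRAPH = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(GRAPH)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph page C ends up with the highest score because the most pages link to it, while page D, which nothing links to, ends up with the lowest.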
Web traffic models
• Performance Analysis: They help assess the capacity and efficiency of networks.
• Forecasting: They can be used to predict future traffic trends, allowing for proactive
planning.
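Traffic forecasting can be illustrated with a very simple model: predicting the next value as the average of the most recent observations. The notes do not name a specific traffic model, so the moving average used here is only a stand-in. The visit counts and the function name `moving_average_forecast` are made up for the example.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window

# Hypothetical daily visit counts for a website.
daily_visits = [120, 135, 150, 160, 170]
forecast = moving_average_forecast(daily_visits, window=3)
print(forecast)
```

A larger window smooths out day-to-day noise but reacts more slowly to a genuine change in the trend.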
• The aim is to improve and optimise the website in every respect by
analysing the different factors that affect it.