Lecture 1 - On Internet

AcSIR Course: Computer Applications and Informatics

October 2021

Elizabeth Jacob
Chief Scientist
CSIR-NIIST
The Internet: Past to Present

• Short History
• The Web
• Web Information Retrieval by Search Engines
• Search Engine Optimization
• Case Study: Google
• About Other Search Engines
History of the Internet
• In 1962 Joseph Licklider of MIT proposed the earliest ideas of global networking. He had a vision of the Internet much as we see it today.

• The dream was realized in 1969: funded by the US Department of Defense, the first internet was called ARPANET (Advanced Research Projects Agency Network).
• Leonard Kleinrock developed the theory of packet switching. The host computer in his laboratory at UCLA became the first ARPANET node in September 1969.
• Packet switching is a method of grouping data that
is transmitted over a digital network as packets
made of a header and a payload. Data in the header
is used by networking hardware to direct the packet
to its destination, where the payload is extracted
and used by application software.
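The header/payload idea above can be sketched as a toy model in Python. This is not a real protocol: the header fields (`src`, `dst`, `seq`, `total`) and the 8-byte payload size are assumptions chosen for illustration; real IP and TCP headers carry many more fields.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    # Simplified header (assumed fields; real IP/TCP headers carry many more)
    src: str
    dst: str
    seq: int        # position of this packet within the message
    total: int      # number of packets in the whole message
    # Payload: the piece of application data this packet carries
    payload: bytes


def packetize(data: bytes, src: str, dst: str, size: int = 8) -> list[Packet]:
    """Sender side: split data into payloads of at most `size` bytes, each with a header."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [Packet(src, dst, seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets: list[Packet]) -> bytes:
    """Receiver side: use the header's sequence numbers to restore order, then join payloads."""
    if not packets:
        return b""
    ordered = sorted(packets, key=lambda p: p.seq)
    if [p.seq for p in ordered] != list(range(ordered[0].total)):
        raise ValueError("missing or duplicate packets")
    return b"".join(p.payload for p in ordered)


packets = packetize(b"LOGIN to SRI from UCLA", src="UCLA", dst="SRI")
# Packets may arrive out of order; reversing the list simulates that.
print(reassemble(list(reversed(packets))))
```

Reversing the list stands in for packets that travel by different routes and arrive out of order; the sequence number in each header is what lets the receiver rebuild the original message.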
History of Internet
• The University of California at Los Angeles, the University of California at Santa Barbara, the University of Utah and the Stanford Research Institute were linked together in the first truly wide-area network.
• On the first login attempt, the system crashed as they typed the G in LOGIN.
• The Internet consists of a complex network of computers connected by high-speed communication technologies (wired and wireless).
• The term "Internet" was formally defined in 1995 by the FNC (Federal Networking Council, USA).
• E-mail was adapted for ARPANET by Ray Tomlinson
in 1972. He picked the @ symbol to link the
username and the address.
• TELNET (TELetype NETwork) is the protocol used by terminal emulation programs that let you log in to a remote host computer. It sends plain text with no encryption.
• 1973 FTP (File Transfer Protocol) transfers files between computers over a network.
• 1983 ARPANET adopts TCP/IP, with 200 routers directing the traffic.
• 1984 NSF funds a TCP/IP-based backbone network. This backbone grows into the NSFNET, which becomes the successor of the ARPANET.

• In 1989 Tim Berners-Lee proposes a hypertext system that can run across the Internet independently of a computer's operating system. (This is the idea behind the browser and the WWW.)
• 1995 NSF stops funding NSFNET. The Internet becomes completely commercial.
– Search engines Yahoo and AltaVista appear
• 1996
– Microsoft releases Internet Explorer
– AOL Instant Messenger (AIM) released, changing
the way people communicate over the Internet
• 1998
– Google arrives, with a new kind of search
mechanism using ranking rather than categories.
• The Internet Society, through its Internet Engineering Task Force (IETF), oversees the rules, known as protocols, for communication over the Internet.
• As of January 2021 there were 4.66 billion active
internet users worldwide - 59.5 percent of the
global population. Of this total, 92.6 percent
(4.32 billion) accessed the internet via mobile
devices.
• In 2020, India had over 749 million internet users
across the country. This figure is projected to
grow to over 1.5 billion users by 2040.
The Present - Surge towards universal wireless access
• Travellers search for Wi-Fi hotspots to connect their gadgets. City-wide access, WiMAX, 4G and 5G will battle for dominance.
• Responsive web design: small devices such as smartphones, tablets and GPS devices want to tap into the web, so pages adapt their layout to each screen.
• The Internet of Things is adding devices such as refrigerators, personal robots, VR headsets and cameras.
Global/Galactic Information Infrastructure
• As the Internet grows and becomes accessible to non-technical communities, social networking services are boosting sites like Facebook, Twitter, LinkedIn, YouTube and Instagram.
• The Internet is driving businesses.
• Protecting privacy and preventing data breaches are major challenges for cybersecurity.
Internet and the World Wide Web
• The Internet is a huge network of computers all connected together.
• The World Wide Web is a global collection of documents and other resources, linked by hyperlinks and URIs.
• World Wide Web (WWW) is defined as a
system of interlinked hypertext documents
accessed via the internet.
• Anyone with an Internet connection can view web pages that combine multimedia such as text, images or videos. The proposal by Tim Berners-Lee in 1989, later developed with Robert Cailliau, was to use hypertext to integrate information into a web of nodes that users can browse.
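Hyperlinks are HTML `<a href="...">` tags whose targets are URIs. A minimal sketch using Python's standard `html.parser` and `urllib.parse.urljoin` shows how a program can list the links in a page; the page content and the base URL `https://example.org/web/index.html` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the target of every <a href="..."> hyperlink as an absolute URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative URIs such as "history.html" against the page's own URL
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page and URL, for illustration only
page = '<p>Read the <a href="history.html">history</a> or visit <a href="https://www.w3.org/">W3C</a>.</p>'
print(extract_links(page, "https://example.org/web/index.html"))
```

The relative link `history.html` is resolved against the page's own address, while the absolute W3C link is kept unchanged; this is how one document on the Web points to another.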
Web 1.0
• Called the "read-only" web by Tim Berners-Lee. It is the first generation of the WWW and lasted from 1989 to 2005. Internet users were only reading the information presented to them.
• The primary aim of the websites was to make
information public for anyone, and set up an
online presence.
• The focus was on content delivery rather than
interaction and production.
Web 2.0
Web 2.0 (2000-2010 and continuing) is described as the people-centric, participative, read-write web. Unlike Web 1.0, Web 2.0 gives users more control and is also called the social web. It facilitates interaction between web users and sites, which in turn allows users to communicate with other users.
Examples of Web 2.0 applications are Facebook, YouTube, Flickr and Twitter.
Web 3.0
• Web 3.0 was suggested by John Markoff in 2006 as a new kind of web. It is described as the semantic web and involves integration, automation, discovery and machine-based understanding of data.
• It encourages mobility and globalization. Web 3.0, also called the Semantic Web or intelligent web, is the era from 2010 onwards and refers to the future of the web. In this era computers aim to interpret information much as humans do, via Artificial Intelligence and Machine Learning.
• Examples of Web 3.0 are Apple's Siri and Wolfram Alpha.
Information Retrieval from the Web
The Search Process
Search Engines to Surf the Web
• YouTube is not simply a website; it is a search engine. YouTube's user-friendliness, combined with the soaring popularity of video content, makes it the second-largest search engine, with 3 billion searches per month.
• It aims to find the most relevant videos and
channels according to what people type in the
search box.
• Videos are ranked by how well the title, description and content match the query, and by which videos have had the most watch time.
• Facebook has acquired the messaging service WhatsApp and the photo-sharing app Instagram. Facebook allows search engines like Google to index your profile and other publicly available information.
• Twitter is a social networking and microblogging service. Search by words and hashtags to find what you're looking for; you can restrict a search to a date range to find old tweets.
How does a Search Engine work?
Information Retrieval by Search Engines
Web Crawlers
• Search engine crawlers, also called spiders (why?),
robots or just bots, are programs or scripts that
systematically and automatically browse pages
on the web.
• They gather information from across hundreds of billions of webpages and organize it in the search index.
• The number of Internet pages is extremely large;
even the largest crawlers fall short of making a
complete index.
The process of crawling
• Web pages are connected to each other by
hyperlinks (what are these links ?)
• The spider follows the hyperlinks from page to page until it has visited every page it can reach.
• The URL is fetched and parsed. Each time it visits
a page, it adds information about it to a database
called the Search Index.
• As the Web has over a billion websites, crawlers work in advance so that results can be returned quickly when we search.
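The crawl loop described above can be sketched as a breadth-first traversal. To keep the sketch self-contained, the "web" is a hypothetical in-memory dictionary (`WEB`, with made-up `.example` addresses) instead of real HTTP fetches, and the "search index" is just the list of visited URLs; a real crawler would fetch and parse each page over the network.

```python
from collections import deque

# Hypothetical miniature web: each URL maps to the URLs its page links to.
# A real crawler would fetch each page over HTTP and parse out its hyperlinks.
WEB = {
    "a.example/": ["a.example/news", "b.example/"],
    "a.example/news": ["a.example/"],
    "b.example/": ["c.example/", "a.example/news"],
    "c.example/": [],
    "d.example/": ["a.example/"],  # no page links here, so it cannot be discovered
}


def crawl(seeds: list[str], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from the seed URLs; returns pages in the order visited."""
    frontier = deque(seeds)   # URLs waiting to be fetched
    seen = set(seeds)         # URLs already queued, so no page is fetched twice
    index = []                # stand-in for the search index
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        index.append(url)     # "fetch and parse" the page, then record it
        for link in WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index


print(crawl(["a.example/"]))
```

The page `d.example/` has no inbound links, so the crawler never finds it: a small-scale version of why even the largest crawlers cannot build a complete index. The `max_pages` limit plays the role of a crawl budget.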
Indexing
• The crawler crawls the web, building the list of
documents, figuring out which words appear in
each page. Documents are then indexed.
• Indexing is the process by which search engines
organise information before a search to enable
instantaneous responses to queries.
• Search engines use an inverted index, also
known as a reverse index for fast fetching of
results.
Creating Index
• The inverted index is the list of words and, for each word, the documents in which it appears. The words are the keys of the index.
• Forward indexing from documents->to->words,
• Inverted indexing from words->to->documents.
• In web search, you provide a list of words (your search query), and the search engine (SE) produces the documents (search result links).
• Before indexing, tokenization breaks up the text of each document into words and sentences.
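Tokenization, forward indexing and inverted indexing can be sketched together, assuming a crude lowercase regex tokenizer and three made-up documents (`d1` to `d3`). The `search` function only finds the documents that contain every query word; ranking those documents is a separate step.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Crude tokenizer: lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(docs: dict[str, str]):
    # Forward index: document -> words it contains
    forward = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    # Inverted index: word -> set of documents containing that word
    inverted = defaultdict(set)
    for doc_id, words in forward.items():
        for word in words:
            inverted[word].add(doc_id)
    return forward, inverted


def search(inverted, query: str) -> set[str]:
    """Return the documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(inverted.get(words[0], set()))
    for word in words[1:]:
        results &= inverted.get(word, set())
    return results


docs = {
    "d1": "The Internet is a network of networks.",
    "d2": "The Web runs on the Internet.",
    "d3": "Search engines index the Web.",
}
forward, inverted = build_indexes(docs)
print(search(inverted, "internet web"))
```

Answering a query only needs a few dictionary look-ups and set intersections in the inverted index, rather than a scan of every document, which is why the index is built before any search happens.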
Case Study
• Google was started by two Stanford University students in 1998.
• 'Googol' in mathematics means 10^100.
Google is a play on the term 'Googol,' which
means a number of nearly incomprehensible
size.
• Google’s website index contains billions of
pages and 100,000,000 gigabytes of data
• From noun to verb: "to google" now means to search the web.
• Larry Page and Sergey Brin developed PageRank at Stanford University in 1998 as part of a search engine research project. Eponymous? PageRank is named after Larry Page.

• Three years earlier, in 1995, Bradley Love, then an undergraduate in Brown University's Cognitive and Linguistic Sciences program, and Steven Sloman published a centrality algorithm essentially identical to PageRank.
Page Rank Algorithm
• PageRank measures the importance of a page, as an estimate of how good a website is.
• The PageRank of a page is determined by the other pages that link to it.
• An inbound link to page B increases B's rank.
• An outbound link from B increases the rank of the page it points to, C.
Page Rank Algorithm
• We assume page A has pages T1…Tn which point to it (i.e.,
are citations). The parameter d is a damping factor which
can be set between 0 and 1. We usually set d to 0.85.
• Also C(Ti) is defined as the number of links going out of
page Ti. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

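• Iterate until the PRs converge.
• In this form the PageRanks sum to the number of pages N (provided every page has at least one outbound link); dividing each PR by N gives a probability distribution over web pages.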
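Worked example: d = 0.85, initial values PR(A) = PR(B) = PR(C) = 1
The equations imply the links A → B, A → C, B → C and C → A, so C(A) = 2, C(B) = 1 and C(C) = 1. Each update uses the newest value available: in iteration 1, PR(B) already uses the new PR(A), and PR(C) uses the new PR(A) and PR(B).

Iteration   PR(A)       PR(B)       PR(C)
0           1           1           1
1           1           0.575       1.06375
2           1.0541875   0.5980297   1.1063549

Iteration 1
PR(A) = (1-d) + d[PR(C)/C(C)] = (1-0.85) + 0.85[1/1] = 1
PR(B) = (1-d) + d[PR(A)/C(A)] = (1-0.85) + 0.85[1/2] = 0.575
PR(C) = (1-d) + d[PR(A)/C(A) + PR(B)/C(B)] = (1-0.85) + 0.85[1/2 + 0.575/1] = 1.06375

Iteration 2
PR(A) = (1-0.85) + 0.85[1.06375/1] = 1.0541875
PR(B) = (1-0.85) + 0.85[1.0541875/2] = 0.5980297
PR(C) = (1-0.85) + 0.85[1.0541875/2 + 0.5980297/1] = 1.1063549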
https://dnschecker.org/pagerank.php
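The iteration can be checked in code. This sketch implements the PR(A) formula given earlier with in-place updates, so within an iteration each page uses values already computed earlier in that same iteration, as the hand calculation does. The `links` dictionary is read off the worked example; the iteration cap and tolerance are arbitrary choices.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             max_iter: int = 100, tol: float = 1e-10):
    """PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over the pages T that link to A.

    Values are updated in place, page by page, so later pages in an iteration
    use the values already updated earlier in that same iteration.
    """
    pages = list(links)
    out_count = {p: len(links[p]) for p in pages}                 # C(T)
    inbound = {p: [q for q in pages if p in links[q]] for p in pages}
    pr = {p: 1.0 for p in pages}                                  # start at PR = 1
    history = [dict(pr)]
    for _ in range(max_iter):
        change = 0.0
        for p in pages:
            new = (1 - d) + d * sum(pr[q] / out_count[q] for q in inbound[p])
            change = max(change, abs(new - pr[p]))
            pr[p] = new
        history.append(dict(pr))
        if change < tol:   # stop once no value moves by more than tol
            break
    return pr, history


# Links implied by the worked example: A -> B and C, B -> C, C -> A
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pr, history = pagerank(links)
for it in (1, 2):
    print(it, {p: round(v, 7) for p, v in history[it].items()})
print("converged:", {p: round(v, 4) for p, v in pr.items()})
```

`history[1]` matches iteration 1 of the hand calculation (1, 0.575, 1.06375). For iteration 2 the code gives PR(A) = 1.0541875, PR(B) ≈ 0.5980297 and PR(C) ≈ 1.1063549. The values settle at about PR(A) ≈ 1.1634, PR(B) ≈ 0.6444 and PR(C) ≈ 1.1922, which sum to 3, the number of pages.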
Exercise to calculate PageRank [figure: example link graph of pages]
Refining Google Search
• Search for an "exact phrase" in quotes; good for finding references
• Boolean search: combine terms with the AND / OR operators, e.g. solar OR system
• To exclude a term from a search, prefix it with a minus sign: -term
• To search for a phrase with missing words, use * as a wildcard
• Reverse image search finds the origin of a particular image. Go to images.google.com and click the camera icon to upload an image from your computer, OR right-click an image hosted online, copy its URL and paste it into the search field to get similar matches.
Choose the Search by image option for an exact match.
• Search within a single website: term site:URL
e.g. RTI site:niist.res.in
• Search for similar websites: related:URL
e.g. related:myntra.com
• Search for a file type: term filetype:extension
e.g. big data filetype:pdf
• Do not put a space after the colon: site:bbc.com, not site: bbc.com
• Search with Startpage.com to get Google results while protecting privacy: no tracking of IP address, personal information or cookies, and SSL encryption.
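The operators above are plain text inside the query, so they can also be composed programmatically. A small sketch with a hypothetical `build_query` helper; the only real interface it relies on is that Google's results page reads the query from the `q` URL parameter.

```python
from urllib.parse import urlencode


def build_query(terms="", exact=None, exclude=(), site=None, filetype=None):
    """Hypothetical helper: combine search terms with Google-style operators."""
    parts = [terms] if terms else []
    if exact:
        parts.append(f'"{exact}"')                # exact-phrase match
    parts.extend(f"-{word}" for word in exclude)  # minus sign excludes a term
    if site:
        parts.append(f"site:{site}")              # note: no space after the colon
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query):
    """Google's results page reads the query from the "q" URL parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


print(build_query("RTI", site="niist.res.in"))
print(search_url(build_query("big data", filetype="pdf")))
```

`urlencode` escapes the space and colon (`+` and `%3A`), so the operators survive intact inside the URL.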
Search Engine Optimization
• SEO, or "Search Engine Optimization", is the process of improving your site to increase its visibility when people search for products, services or information related to your work.
• Organic search, also known as natural search, refers to unpaid search results. In contrast to paid search results (pay-per-click advertising), which are populated via an auction system, organic search results are based on relevance to the user's search query.
• Unlike paid search ads, you can't pay search engines to get higher organic search rankings.
7 Simple Steps for SEO
• Know Your Keywords.
• Write High Quality Content (Naturally)
• Use Keywords in Your Website Page URLs.
• Don't Overlook Page Titles.
• Review Every Page for Additional Keyword Placement.
• Improve User Experience.
• Hire an Expert.
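For "Use Keywords in Your Website Page URLs", a common approach is to derive a short, readable URL slug from the page title so that its keywords appear in the address. A hypothetical `slugify` helper using only the Python standard library; the eight-word cap and the `example.com` address are arbitrary illustrations.

```python
import re
import unicodedata


def slugify(title: str, max_words: int = 8) -> str:
    """Hypothetical helper: turn a page title into a keyword-bearing URL slug."""
    # Fold accented letters to plain ASCII where possible (e.g. "é" -> "e")
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Drop apostrophes so "Buyer's" becomes "buyers" rather than "buyer-s"
    text = text.replace("'", "").lower()
    # Keep runs of letters and digits; everything else becomes a separator
    words = re.findall(r"[a-z0-9]+", text)
    return "-".join(words[:max_words])


print("https://example.com/" + slugify("Solar Panels: A Buyer's Guide (2021)"))
```

Short, lowercase, hyphen-separated slugs keep the keywords readable to both users and search engines.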
SEO Tools: Google Search Console, Semrush, BuzzStream, DreamHost SEO Toolkit, Moz Pro, Linkody
Search Engines of 2020
• Google
• Microsoft Bing
• Yahoo
• Baidu
• Yandex
• DuckDuckGo
• Ask.com
• Ecosia
• Aol.com
• Internet Archive
Study and compare these SEs.
AI-powered Engines

• Semantic Scholar is an AI-backed search engine for academic publications, developed at the Allen Institute for AI and publicly released in November 2015. It uses advanced NLP to provide summaries of scholarly papers.
• It provides one-sentence summaries of scientific literature. Useful options narrow down a search by field of study, date range, journal, author and news, and sort results by recency, relevance, citation count and landmark papers.
• It covers about 200 million papers.
• MATSCHOLAR is an AI-based search engine for information extraction from materials science literature.
• The website uses NLP to power search. It was created as part of a research effort at Lawrence Berkeley National Laboratory.
• It provides one-sentence summaries, with options to filter a search by material, properties, applications, sample descriptors, synthesis method and characterization method.
https://matscholar.com
Vulnerability of the Internet
• Human error
• Hardware or software failure
• Communication disruption due to natural phenomena
Why did Facebook's services, including WhatsApp and Instagram, go down shortly after 9 pm IST on 4 October 2021?

Configuration changes on the backbone routers that coordinate network traffic between the company's data centers cut the data centers off from each other. Facebook's DNS servers then stopped advertising their network routes, so the rest of the Internet could no longer find Facebook's services.
