Unit 1
The World Wide Web, abbreviated as WWW and commonly known as the Web, is a system of
interlinked hypertext documents accessed via the Internet. With a web browser, one can view
web pages that may contain text, images, videos, and other multimedia and navigate between
them via hyperlinks. Using concepts from earlier hypertext systems, English engineer and
computer scientist Sir Tim Berners-Lee, now the Director of the World Wide Web Consortium,
wrote a proposal in March 1989 for what would eventually become the World Wide Web. At
CERN in Geneva, Switzerland, Berners-Lee and Belgian computer scientist Robert Cailliau
proposed in 1990 to use "Hypertext to link and access information of various kinds as a web of
nodes in which the user can browse at will", and publicly introduced the project in December.
"The World-Wide Web (W3) was developed to be a pool of human knowledge, and human
culture, which would allow collaborators in remote sites to share their ideas and all aspects of
a common project."
WWW
W — World (the ability to access information from around the world)
W — Wide (a large span of computers; vast information stored on a number of web servers)
W — Web (information — text and visual — linked to one another and stored at multiple locations)
Information on the Web is cross-platform:
1. Access information from any hardware (Intel, Apple)
2. Access information from any operating system:
   1. Windows
   2. UNIX
   3. LINUX
What is required to exchange and access information? A PROTOCOL:
1. HTTP (Hypertext Transfer Protocol)
2. FTP (File Transfer Protocol)
These protocols let users access and share information.
The Internet, not the Web, also carries e-mail (which relies on SMTP), Usenet news
groups, instant messaging and FTP. So the Web is just a portion of the Internet, albeit a large
portion; the two terms are not synonymous and should not be confused.
URL
In computing, a Uniform Resource Locator (URL) is a Uniform Resource Identifier (URI) that
specifies where an identified resource is available and the mechanism for retrieving it. In
popular usage and in many technical documents and verbal discussions it is often incorrectly
used as a synonym for URI. The best-known example of the use of URLs is for the addresses of
web pages on the World Wide Web, such as http://www.example.com/.
Every URL consists of some of the following: the scheme name (commonly called protocol),
followed by a colon, then, depending on scheme, a domain name (alternatively, IP address), a
port number, the path of the resource to be fetched or the program to be run, then, for
programs such as Common Gateway Interface (CGI) scripts, a query string, and an optional
fragment identifier.
Scheme
The scheme name defines the namespace, purpose, and the syntax of the remaining part of
the URL.
Software will try to process a URL according to its scheme and context. For example, a web
browser will usually dereference the URL http://example.org:80 by performing an HTTP
request to the host at example.org, using port number 80.
Other examples of scheme names include https, gopher, and ftp.
Secure Website
URLs with https as a scheme (such as https://example.com/) require that requests and
responses be made over a secure connection to the website.
Some schemes that require authentication allow a username and perhaps a password too, to
be embedded in the URL, for example ftp://[email protected]. Passwords embedded in
this way are not conducive to secure working, but the full possible syntax is
scheme://username:password@domain:port/path?query_string#fragment_id
Domain Name
The domain name or IP address gives the destination location for the URL.
http://www.selfseo.com/find_ip_address_of_a_website.php
Website           IP Address
www.google.com    173.194.70.19
www.gmail.com     173.194.70.19
www.rediff.com    92.123.68.170
www.youtube.com   208.65.153.238
The domain google.com, or its IP address 209.85.153.104, is the address of Google's website.
The domain name portion of a URL is not case sensitive since DNS ignores case:
http://en.example.org/ and HTTP://EN.EXAMPLE.ORG/ both open the same page.
Port Number
A port is associated with an IP address of the host, as well as the type of protocol used for
communication.
The port number is optional; if omitted, the default for the scheme is used.
For example, http://vnc.example.com:5800 connects to port 5800 of vnc.example.com, which
may be appropriate for a VNC remote control session.
If the port number is omitted for an http: URL, the browser will connect on port 80, the default
HTTP port.
Protocol   Port No
HTTP       80
HTTPS      443
SMTP       25
FTP        21
Path
The path is used to specify and perhaps find the resource requested. It is case-sensitive, though
it may be treated as case-insensitive by some servers, especially those based on Microsoft
Windows. If the server is case sensitive and http://en.example.org/wiki/URL is correct,
http://en.example.org/WIKI/URL/ or http://en.example.org/wiki/url/ will display an HTTP 404
error page, unless these URLs point to valid resources themselves.
Parts of URL
The first part is the protocol. In this case, we are requesting to view a file using hypertext transfer protocol. Another
popular protocol is ftp (file transfer protocol).
The protocol is followed by ://
The next part, wiki, is usually the name of the server (a subdomain) that stores the file you will
view through the browser. http://www.answers.com/ is not the same as http://wiki.answers.com/
even though you see answers.com in both URLs.
The next part, answers.com, is the domain. This is made up of two parts: the first is the host
name (in this case "answers") and the second is the top-level domain (here, .com). Other
top-level domains include .org and .mil.
Some urls include directories and files. For example, in the URL
http://www.answers.com/main/business.jsp there is a directory called main and in that
directory is a FILE called business.jsp.
Notice that the domain, directories, and files are separated by slashes, and the filename and the
file extension are separated by a period. There are endless types of files on the Web, including
.pdf, .php, .html and others.
You'll notice that many times you do not see a filename in a URL. In cases like that, you are
actually looking at a default file. Many sites default to a file called index.html. If you go to
http://wiki.answers.com/index.html and http://wiki.answers.com/ you are actually looking at
the same page. You just don't have to type the entire URL to see the main/default page.
The optional port number defines the port with which to connect to the server or service
(this is specified by the server and you can only connect to a special port if one exists on the
server). By default websites communicate over port 80 so when no port is specified port 80 is
assumed; however, other ports can be defined in the following format:
http://somesite.com:port/ (e.g. http://somesite.com:1010/).
The file path defines the path to the page or file to be viewed. When you load a website
without a file path, i.e. http://www.google.com/, you are directed to the root level of the
Public_Html or www folder and, if it exists, the file in that directory named index.htm,
index.html, index.asp or index.php. Defining a file path will take you to a different location
such as http://www.somesite.com/My_pet_photos.htm.
Finally, the optional query string defines any variables when the file path is a script, such as
PHP, ASP or CGI. Query strings can cause the script to react in different ways. For example:
http://www.somewiki.com/wiki/Page_Name?action=edit.
Query String
The query string contains data to be passed to software running on the server. It may contain
name/value pairs separated by ampersands, for example ?first_name=John&last_name=Doe.
The fragment identifier, if present, specifies a part or a position within the overall resource or
document. When used with HTTP, it usually specifies a section or location within the page, and
the browser may scroll to display that part of the page.
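In Python, the query string and fragment of a URL can be separated and decoded with the standard urllib.parse helpers (the URL below is a made-up example):

```python
from urllib.parse import urlsplit, parse_qs

url = "http://example.com/register?first_name=John&last_name=Doe#section2"
parts = urlsplit(url)

# parse_qs splits the query string on ampersands into name/value pairs
print(parse_qs(parts.query))  # {'first_name': ['John'], 'last_name': ['Doe']}
print(parts.fragment)         # section2
```

The values come back as lists because the same name may legally appear more than once in a query string.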
Domain
A domain is a group of computers that are part of a network and share a common directory
database.
A domain is registered as a unit with common rules and procedures. Each domain has a
unique name.
In simple terms, a domain name serves the same purpose as the home postal address of your
computer system: it routes packets from their source to your system over the internet. When
somebody sends you mail, it gives the internet routing protocols the unique information they
need to route packets to your desktop anywhere across the internet.
The IP address is the geographical description of the virtual world and the addresses of both
source and destination systems are stored in the header of every packet.
The address consists of 4 octets, each separated by a dot.
Domain Name System (DNS): This allows the IP address to be translated to words. It is much
easier for us to remember a word than a series of numbers. The same is true for email
addresses.
For example, it is much easier for you to remember a web address name such as
whatismyip.com than it is to remember 192.168.1.1 or in the case of email it is much easier to
remember [email protected] than [email protected]
Dynamic IP Address: An IP address that is not static and could change at any time. This IP
address is issued to you from a pool of IP addresses allocated by your ISP . This is for a large
number of customers that do not require the same IP Address all the time for a variety of
reasons.
Static IP Address: An IP address that is fixed and never changes. This is in contrast to a dynamic
IP address, which may change at any time. Most ISPs offer a single static IP or a block of static IPs.
.org: The .org domain extension stands for organization. It was established in January 1985, with the
purpose of being given to organizations that do not fulfill the requirements of other generic top-level
domains. Organizations all over the world can register for .org domain extension. It can also be
used by individuals. However individuals can also use domain extensions such as .name and .info.
There are no requirements for registration of the .org extension.
.edu: It stands for education and is widely used by the educational institutions across the United States.
Not all websites using the .edu extension are educational institutions. Some of them are museums
or research organizations linked with education.
.gov: It is a sponsored domain extension that is used by the government entities in the United States.
Federal agencies in the United States use the .fed.us domain extension. The Department of Defense
and its subordinate organizations use the .mil domain extension.
.net: One of the earliest top-level domains in use. Established in 1985, it is currently being managed by VeriSign.
Similar to the .org domain, the .net domain also has no requirements for registration. It ranks third
in the list of most popular top-level domains.
.info and .name: Along with other domain extensions like .aero, .biz and .pro, these are some of
the relatively new domains added to the list of generic top-level domains. They were developed and
began to be used in the period between 2000 and 2002. The .info extension is for general
information sites and .name is for personal use. The .aero domain stands for aeroplane and is used
by businesses associated with aviation. .biz is used by businesses; it was designed with the aim of
providing businesses with an option to the .com domain. The .pro generic top-level domain can be
used by qualified professionals.
.jobs, .mobi and .travel: More recent domains developed by the Internet Corporation for Assigned
Names and Numbers (ICANN). Company websites aimed at seeking employees and dealing with issues
related to jobs and employment use the .jobs domain extension. The .mobi domain extension is used
by mobile devices accessing the Internet. Supported by Google, Microsoft, the GSM Association and
many prominent telecom industries, .mobi is one of the very important domain extensions. The
.travel domain is meant to be used by travel agents and tourism agencies.
.pk: used by Pakistan
.us: used by the state and local governments in the United States
Protocols
The Internet Protocol (IP) is the method or protocol by which data is sent from one
computer to another on the Internet. Each computer (known as a host) on the
Internet has at least one IP address that uniquely identifies it from all other
computers on the Internet. When you send or receive data (for example, an e-mail
note or a Web page), the message gets divided into little chunks called packets.
Each of these packets contains both the sender's Internet address and the
receiver's address. Any packet is sent first to a gateway computer that understands
a small part of the Internet. The gateway computer reads the destination address
and forwards the packet to an adjacent gateway that in turn reads the destination
address, and so forth across the Internet until one gateway recognizes the packet
as belonging to a computer within its immediate neighborhood or domain.
Because a message is divided into a number of packets, each packet can, if necessary, be sent
by a different route across the Internet. Packets can arrive in a different order than the order
they were sent in. The Internet Protocol just delivers them. It's up to another protocol, the
Transmission Control Protocol (TCP) to put them back in the right order.
For this purpose the Internet Protocol defines an addressing system that has two functions:
identifying hosts and locating them on the network. Each packet is tagged with a header that
contains the metadata needed for its delivery. This process of tagging is also called
encapsulation.
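Encapsulation can be illustrated with a drastically simplified header that carries only the source address, destination address and a packet number (a real IP header has many more fields; the format here is invented for the sketch):

```python
import struct

# A toy "header": 4-byte source IP, 4-byte destination IP, 2-byte
# packet number -- a simplified stand-in for a real IP header.
HEADER = struct.Struct("!4s4sH")

def encapsulate(src, dst, number, payload):
    """Prefix the payload with a header containing its delivery metadata."""
    header = HEADER.pack(bytes(src), bytes(dst), number)
    return header + payload

packet = encapsulate([192, 0, 2, 1], [192, 0, 2, 2], 7, b"hello")

# The receiver strips the header off again to read the metadata
src, dst, number = HEADER.unpack(packet[:HEADER.size])
print(number)                # 7
print(packet[HEADER.size:])  # b'hello'
```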
TCP/IP
TCP (Transmission Control Protocol) and IP (Internet Protocol) are two different procedures
that are often linked together.
In fact, the term "TCP/IP" is normally used to refer to a whole suite of protocols, each with
different functions. This suite of protocols is what carries out the basic operations of the Web.
TCP/IP is also used on many local area networks.
When information is sent over the Internet, it is generally broken up into smaller pieces or
"packets". The use of packets facilitates speedy transmission since different parts of a message
can be sent by different routes and then reassembled at the destination. It is also a safety
measure to minimize the chances of losing information in the transmission process.
TCP is the means for creating the packets, putting them back together in the correct order at
the end, and checking to make sure that no packets got lost in transmission. If necessary, TCP
will request that a packet be resent.
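The reordering step TCP performs can be sketched as a sort on sequence numbers (the packet representation as (sequence, data) pairs is a simplification of real TCP segments):

```python
def reassemble(packets):
    """Put packets back in order by sequence number, as TCP does.

    `packets` is a list of (sequence_number, data) pairs that may have
    arrived in any order.
    """
    return b"".join(data for _, data in sorted(packets))

# Packets arriving out of order after travelling different routes:
arrived = [(2, b"lo, "), (1, b"Hel"), (3, b"world")]
print(reassemble(arrived))  # b'Hello, world'
```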
Internet Protocol (IP) is the method used to route information to the proper address.
Every computer on the Internet has to have its own unique address, known as the IP address.
Every packet sent will contain an IP address showing where it is supposed to go. A packet may
go through a number of computer routers before arriving at its final destination and IP controls
the process of getting everything to the designated computer. Note that IP does not make
physical connections between computers but relies on TCP for this function. IP is also used
in conjunction with other protocols that create connections.
Another protocol, UDP (User Datagram Protocol), is used together with IP instead of TCP when
small amounts of information are involved; because it does no connection tracking or error
recovery, it uses fewer system resources.
Web pages are constructed according to a standard method called Hypertext Markup
Language (HTML). An HTML page is transmitted over the Web in a standard way and format
known as Hypertext Transfer Protocol (HTTP). This protocol uses TCP/IP to manage the Web
transmission.
HTTP functions as a request-response protocol in the client-server computing model.
In HTTP, a web browser, for example, acts as a client, while an application running on a
computer hosting a web site functions as a server. The client submits an HTTP request message
to the server. The server, which stores content, or provides resources, such as HTML files, or
performs other functions on behalf of the client, returns a response message to the client. A
response contains completion status information about the request and may contain any
content requested by the client in its message body.
HTTP is an application layer network protocol built on top of TCP.
HTTP clients (such as Web browsers) and servers communicate via HTTP request and response
messages. The three main HTTP request methods are GET, POST, and HEAD.
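A minimal HTTP/1.1 request message, as a client would send it over the TCP connection, can be assembled as plain text (the helper function is illustrative; real clients add many more headers):

```python
def build_request(method, path, host):
    """Assemble a minimal HTTP/1.1 request message as raw text."""
    return (
        f"{method} {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"  # a blank line ends the headers
    )

request = build_request("GET", "/wiki/URL", "en.example.org")
print(request.splitlines()[0])  # GET /wiki/URL HTTP/1.1
```

The first line names the method, the path from the URL, and the protocol version; the Host header carries the domain name, which is why one server can host many sites.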
HTTP utilizes TCP port 80 by default, though other ports such as 8080 can alternatively be used.
A related protocol is Hypertext Transfer Protocol over Secure Sockets Layer (HTTPS), first
introduced by Netscape. It provides for transmission in encrypted form to protect sensitive
data. A Web page using this protocol will have https: at the front of its URL.
HTTP Methods
1. GET
GET is the simplest HTTP method. Its main job is to ask the server for a resource.
If the resource is available, it is given back to the user in the browser.
That resource may be an HTML page, a sound file, a picture file (JPEG), etc. We can say that
the GET method is for getting something from the server. That doesn't mean you can't send
parameters to the server, but the total amount of characters in a GET is really limited. In the
GET method the data we send is appended to the URL, so whatever you send can be seen
by other users; we can say that it is not secure.
2. POST
The POST method is a more powerful request. By using POST we can request as well as
send some data to the server. We use the POST method when we have to send a large
amount of data to the server, for example when we have to submit a long enquiry form.
3. HEAD
Head is the same as GET but returns only HTTP headers and no document body.
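The difference in where GET and POST carry their data can be shown with the standard urllib.parse.urlencode (the URLs and field names are made-up examples):

```python
from urllib.parse import urlencode

data = {"first_name": "John", "last_name": "Doe"}
encoded = urlencode(data)  # first_name=John&last_name=Doe

# GET: the data is appended to the URL, visible in the address bar
get_url = "http://example.com/form?" + encoded

# POST: the URL stays clean and the data travels in the message body
post_url = "http://example.com/form"
post_body = encoded

print(get_url)    # http://example.com/form?first_name=John&last_name=Doe
print(post_body)  # first_name=John&last_name=Doe
```

This is exactly why GET is unsuited to passwords or long forms: the encoded data ends up in the URL, where it is visible and length-limited, while a POST body is neither.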
Protocol Usage
HTTP: Display web pages & related files.
FTP: Transfer files. To transfer files with FTP, you use a program often called the
"client." The FTP client program initiates a connection to a remote computer running
FTP "server" software. After the connection is established, the client can choose to
send and/or receive copies of files, singly or in groups. To connect to an FTP server,
a client requires a username and password as set by the administrator of the server.
SMTP: Stands for "Simple Mail Transfer Protocol."
It's a set of communication guidelines that allow software to transmit email
over the Internet. Most email software is designed to use SMTP for
communication purposes when sending email, and it only works for outgoing
messages. When people set up their email programs, they will typically have
to give the address of their Internet service provider's SMTP server for
outgoing mail. There are two other protocols - POP3(Post Office Protocol)
and IMAP(Internet Message Access Protocol) - that are used for retrieving
and storing email.
Your e-mail client (such as Outlook Express, Eudora, or Mac OS X Mail) uses
SMTP to send a message to the mail server, and the mail server uses SMTP
to relay that message to the correct receiving mail server.
Intranet
From within a company, an intranet server may respond much more quickly
than a typical Web site. This is because the public Internet is at the mercy
of traffic spikes, server breakdowns and other problems that may slow the
network. Within a company, however, users have much more bandwidth
and network hardware may be more reliable. This makes it easier to serve
high bandwidth content, such as audio and video, over an
intranet.
Today I think of intranets, extranets, and the Web as collections of content. An intranet is a set
of content shared by a well-defined group within a single organization. An extranet is a set of
content shared by a well-defined group, but one that crosses enterprise boundaries.
A website represents an organization to the outside world, while a portal provides multiple user
roles with a common access point.
A website is also a portal if it broadcasts information from different independent resources, thus
offering a public-service function to visitors.
Web Portal
A web portal is a website or service that offers a broad array of resources and services such as
email, forums, search engines and online shopping malls. It is an organized gateway that helps to
configure access to information found on the internet. Web portal applications offer a consistent
look and feel, with access control and procedures for multiple applications and databases. Some
examples of web portals are AOL, iGoogle and Yahoo.
Websites
A website is a location on the internet: a collection of web pages, images and videos that are
addressed relative to a common Uniform Resource Locator (URL). It is essentially a domain name
hosted on a server which is accessible via a network such as the internet or a private local area
network. Owning a website has become essential for any business, and a company with no web
presence runs the risk of losing business opportunities.
Portal: It provides facility of Logging-In. Provides you with information based on who you are.
e.g. mail.yahoo.com, gmail.com, rediffmail.com
Website: No log-in.
e.g. www.yahoo.com
Personalization:
Portal: Limited, focused content. Eliminates the need to visit many different sites.
e.g. You type in your user name and password and see your yahoo mail only.
Website: Extensive, unfocused content written to accommodate anonymous users needs.
Customization:
Portal: You will select and organize the materials you want to access.
Website: Searchable, but not customizable. All content is there for every visitor.
e.g. you can navigate to Yahoo Mail, Yahoo Shopping, GeoCities, Yahoo Groups. If you wish to use
any of these services you will either have to authenticate yourself and see things personalized to
you, or you can simply visit sections that are for everyone, like Yahoo News, where if you are not
signed in the default sign-in is guest.
Web Server
A web server can be referred to as either the hardware (the computer) or the software (the computer
application) that helps to deliver content that can be accessed through the Internet. A web server is what makes
it possible to be able to access content like web pages or other data from anywhere as long as it is connected
to the internet. The hardware houses the content, while the software makes the content accessible through the
internet.
The most common use of web servers is to host websites but there are other uses like data storage or for
running enterprise applications.
There are also different ways to request content from a web server. The most common request is the Hypertext
Transfer Protocol (HTTP), but there are also other requests like the Internet Message Access Protocol (IMAP)
or the File Transfer Protocol (FTP).
A client, commonly a web browser or web crawler, initiates communication by making a request for a specific
resource using HTTP and the server responds with the content of that resource or an error message if unable
to do so. The resource is typically a real file on the server's secondary memory, but this is not necessarily the
case and depends on how the web server is implemented.
While the primary function is to serve content, a full implementation of HTTP also includes ways of receiving
content from clients. This feature is used for submitting web forms, including uploading of files.
Web servers are not always used for serving the World Wide Web. They can also be found embedded in
devices such as printers, routers, webcams and serving only a local network. The web server may then be
used as a part of a system for monitoring and/or administrating the device in question. This usually means that
no additional software has to be installed on the client computer; since only a web browser is required (which
now is included with most operating systems).
Common features
A Web server (program) has defined load limits: it can handle only a limited number of concurrent
client connections (usually between 2 and 80,000, by default between 500 and 1,000) per IP address
(and TCP port), and it can serve only a certain maximum number of requests per second, depending
on its own settings, the type of HTTP request, whether the content is static or dynamic, and the
hardware and software limits of the machine it runs on.
Overload causes
At any time web servers can be overloaded because of:
1. Too much legitimate web traffic. Thousands or even millions of clients connecting to the web site in a short
interval
2. Computer worms that sometimes cause abnormal traffic because of millions of infected computers (not
coordinated among them);
A computer worm is a self-replicating malware computer program, which uses a computer network to send
copies of itself to other nodes (computers on the network) and it may do so without any user intervention. This
is due to security shortcomings on the target computer. Unlike a computer virus, it does not need to attach itself
to an existing program. Worms almost always cause at least some harm to the network, even if only by
consuming bandwidth, whereas viruses almost always corrupt or modify files on a targeted computer.
3. XSS viruses can cause high traffic because of millions of infected browsers and/or Web servers;
4. Internet (network) slowdowns, so that client requests are served more slowly and the number of connections
increases so much that server limits are reached;
5. Web servers (computers) partial unavailability. This can happen because of required or urgent maintenance or
upgrade, hardware or software failures, back-end (e.g., database) failures, etc.; in these cases the remaining
web servers get too much traffic and become overloaded.
6. Internet bots. Traffic not filtered/limited on large web sites with very few resources (bandwidth, etc.);Internet
bots, also known as web robots, WWW robots or simply bots, are software applications that run automated
tasks over the Internet. Typically, bots perform tasks that are both simple and structurally repetitive, at a much
higher rate than would be possible for a human alone.
Symptoms of overload
The symptoms of an overloaded web server are: requests are served with (possibly long) delays;
the server returns HTTP error codes such as 500, 502, 503 or 504; TCP connections are refused or
reset before any content is returned; and, in the worst cases, only partial content is returned.
Anti-overload techniques
Managing network traffic, by using:
1. Firewalls to block unwanted traffic coming from bad IP sources or having bad patterns;
2. Bandwidth management and traffic shaping, in order to smooth down peaks in network usage;
3. Using many web servers (computers) that are grouped together so that they act or are seen as
one big web server.
Market structure
Below are the most recent statistics of the market share of the top web servers on the internet,
from the Netcraft survey of January 2012.
Internet History
The Internet was originally developed by DARPA, the Defense Advanced Research
Projects Agency, as a means to share information on defense research between
involved universities and defense research facilities. Originally it consisted of just email and
FTP sites, as well as Usenet, where scientists could question and answer each other.
It was originally called ARPANET (Advanced Research Projects Agency Network); development
started in the late 1960s, and since networking computers was new to begin with,
standards were being developed on the fly. Once the concept was proven, the
organizations involved started to lay out some ground rules for standardization.
Web Browsers
A web browser or Internet browser is a software application for retrieving, presenting,
and traversing information resources on the World Wide Web. An information resource
is identified by a Uniform Resource Identifier (URI) and may be a web page, image,
video, or other piece of content. Hyperlinks present in resources enable users to easily
navigate their browsers to related resources.
Although browsers are primarily intended to access the World Wide Web, they can
also be used to access information provided by Web servers in private networks or
files in file systems. Some browsers can also be used to save information resources
to file systems.
History
The first web browser was invented in 1990 by Tim
Berners-Lee. It was called WorldWideWeb (no spaces)
and was later renamed Nexus.
The history of the Web browser dates back to the
late 1980s, when a variety of technologies laid the
foundation for the first Web browser,
WorldWideWeb, by Tim Berners-Lee in 1991. That browser brought together a variety of
existing and new software and hardware technologies.
Microsoft responded with its browser Internet Explorer in 1995 (also heavily influenced by
Mosaic), initiating the industry's first browser war. Microsoft was able to leverage its
dominance in the operating system market to take over the Web browser market; Internet
Explorer usage share peaked at over 95% by 2002.
Opera first appeared in 1996. It has never achieved widespread use, with a browser
usage share that fluctuated between 2.2% and 2.4% throughout 2010.
In 1998, Netscape launched what was to become the Mozilla Foundation in an attempt to
produce a competitive browser using the open source software model. That browser would
eventually evolve into Firefox, which developed a respectable following while still in the beta
stage of development; shortly after the release of Firefox 1.0 in late 2004, Firefox (all versions)
accounted for 7.4% of browser use. The Firefox usage share has slowly declined in 2010, from
24.4% in January to 22.8% in December.
Apple's Safari had its first beta release in January 2003; it has a dominant share of Apple-based
Web browsing, having risen from 4.5% usage share in January 2010 to 5.9% in December 2010.
Its rendering engine, called WebKit, is also running in the standard browsers of several mobile
phone platforms, including Apple iOS, Google Android, Nokia S60 and Palm webOS.
The most recent major entrant to the browser market is Google's Chrome, first released in September 2008. Chrome's
take-up has increased significantly year on year, by doubling its usage share from 7.7 percent to 15.5 percent by August
2011.
This increase seems largely to be at the expense of Internet Explorer, whose share has tended to decrease from month
to month.
In December 2011 Google Chrome overtook Internet Explorer 8 as the most widely used web browser.
However, when all versions of Internet Explorer are put together, IE is still most popular.
Function
The primary purpose of a web browser is to bring information resources to the user. This process begins when the user
inputs a Uniform Resource Identifier (URI), for example http://en.wikipedia.org/, into the browser. The prefix of the
URI determines how the URI will be interpreted.
The most commonly used kind of URI starts with http:
and identifies a resource to be retrieved over the Hypertext Transfer Protocol (HTTP).
Many browsers also support a variety of other prefixes, such as https: for HTTPS, ftp: for the File Transfer Protocol, and
file: for local files. Prefixes that the web browser cannot directly handle are often handed off to another application
entirely. For example, mailto: URIs are usually passed to the user's default e-mail application and news: URIs are passed
to the user's default newsgroup reader.
1. All major web browsers allow the user to open multiple information resources at
the same time, either in different browser windows or in different tabs of the same
window.
2. Major browsers also include pop-up blockers to prevent unwanted windows from
"popping up" without the user's consent.
3. Most web browsers can display a list of web pages that the user has bookmarked
so that the user can quickly return to them. Bookmarks are also called "Favorites"
in Internet Explorer.
4. In addition, all major web browsers have some form of built-in web feed
aggregator.
5. In Mozilla Firefox, web feeds are formatted as "live bookmarks" and behave like
a folder of bookmarks corresponding to recent entries in the feed.
6. In Opera, a more traditional feed reader is included which stores and displays
the contents of the feed.
7. Furthermore, most browsers can be extended via plug-ins, downloadable
components that provide additional features.
User interface
1. Back and forward buttons to go back to the previous resource and forward
again.
2. A history list, showing resources previously visited in a list (typically, the list is
not visible all the time and has to be summoned)
3. A refresh or reload button to reload the current resource.
4. A stop button to cancel loading the resource. In some browsers, the stop button
is merged with the reload button.
5. A home button to return to the user's home page.
6. An address bar to input the Uniform Resource Identifier (URI) of the desired
resource and display it.
7. A search bar to input terms into a search engine.
8. A status bar to display progress in loading the resource and also the URI of
links when the cursor hovers over them, and page zooming capability.
Standards support
Early web browsers supported only a very simple version of HTML. The rapid
development of web browsers led to the development of non-standard dialects of
HTML, leading to problems with interoperability. Modern web browsers support a
combination of standards-based and de facto HTML and XHTML, which should be
rendered in the same way by all browsers.
Approximate worldwide usage share at the time: Firefox 25.0%, Safari 8.0%, Opera 2.7%, mobile browsers 6.7%.
While Microsoft Internet Explorer comes preinstalled on all PCs running the Windows operating system, many
consumers look to third-party browsers when the features better suit their needs. In the case of the Netscape Navigator
browser, several key differences set it apart from Internet Explorer.
Interface
1. Netscape Navigator's interface is more bare-bones, with its simple gray windows and minimal clutter. Internet
Explorer has a multi-faceted interface, ideal for some advanced users but unnecessarily complicated for
others.
Speed
2. Though Netscape takes a little longer to initialize than Internet Explorer, the Netscape browser is able to
offer quicker real-time browsing due to its smaller file sizes.
Support
3. Perhaps the most glaring shortcoming of Netscape Navigator is its complete lack of support and upgrades.
While Internet Explorer is a current, regularly upgraded product, the Associated Press reports that Netscape
Navigator was officially cancelled by AOL on Feb 1, 2009.
Security
4. According to Microsoft, Internet Explorer offers a "cross-site scripting filter" and a "SmartScreen Filter" to help
avoid security risks. Netscape employs security certificates, but it does not have active security updates.
Compatibility
5. Both browsers allow for basic DirectX, Java, and Flash compatibility. However, in terms of toolbars and other
browser add-ons, Internet Explorer is more widely compatible.
The first web browser, called WorldWideWeb, was written by Tim Berners-Lee in 1990 and released in
1991. It wasn't until the creation of NCSA Mosaic, the first widely popular graphical web browser, that the
internet began to see widespread use.
The leader of the Mosaic team then left to create Netscape Navigator in 1994, which went on
to become the most widely used internet browser in the world, at one point accounting for 90 percent of all web
use.
In 1995, Microsoft then went ahead and created its answer to Netscape Navigator: Internet
Explorer. This was the beginning of the internet's browser wars. Internet Explorer quickly took over
from Netscape and had a 95 percent market share by 2002.
In the past five years dozens of other internet browsers have come onto the market, each offering
bigger and better features than the last. Mozilla Firefox was released in 2004 and has now taken
over from Internet Explorer as the world's most used internet browser. Google Chrome was
released in September 2008 and is quickly becoming another popular choice.
We shall compare the five main internet browsers that make up the majority of the world's market
share: Internet Explorer, Mozilla Firefox, Safari, Opera and Google Chrome.
All of the top five internet browsers have several things in common: they are fast, lightweight,
provide internet security and are reliable. So, if that is all you are after from an internet browser, then
any of them will suit you fine. However, if you want additional features and add-ons, then there are
slight variations between the browsers that we shall look at below.
Internet Explorer
Internet Explorer has been the leading internet browser for many years, only recently being overtaken by Mozilla
Firefox. However, many people still use Internet Explorer, and it has many fantastic features.
There are two main benefits to using Internet Explorer: compatibility and security. Internet Explorer is compatible
with all websites, whereas other browsers may have difficulty opening some websites. It is also incredibly safe and
helps protect against phishing and malware attacks. It is fast, safe and easy to use; however, compared to its
biggest rival, Firefox, it doesn't have as many features or the ability to customize.
The latest Internet Explorer offers crash recovery, a fast start-up, and an address bar that provides auto-complete.
Safari (Apple)
Safari is the standard internet browser for Mac OS X users. It is a fast and reliable browser that has a sleek and easy-to-
use interface, like most of Apple's products.
The best thing about Safari is its speed, which Apple claims is the fastest of all web browsers. However, it lacks many
features that other browsers have, which is part of what allows it to be so fast and light. If you don't require flashy extras
then Safari is a great choice for you, but if you want something with more features then you may want to try a different
browser.
Google Chrome
Google Chrome is the latest competitor in the internet browser game. It was only released in September 2008 but
already has a huge fan base.
Chrome offers a sleek and simple interface that combines speed, simplicity and compatibility in one
package. Some of the extra features included are the ability to drag, drop and rearrange tabs, as well as an excellent
task manager feature.
The main downside of Chrome is that it has very few add-ons; however, it is expected that these will arrive in the
coming months as Chrome continues to grow in popularity.
Web search engine
A program that searches documents for specified keywords and returns a list of the
documents where the keywords were found. Although "search engine" is really a general
class of programs, the term is often used to specifically describe systems like
Google, AltaVista and Excite that enable users to search for documents on the
World Wide Web and USENET newsgroups.
Typically, a search engine works by sending out a spider to fetch as many documents
as possible. Another program, called an indexer, then reads these documents and
creates an index based on the words contained in each document. Each search engine
uses a proprietary algorithm to create its indices such that, ideally, only meaningful
results are returned for each query.
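The indexer step described above can be sketched as a tiny inverted index: a mapping from each word to the set of documents that contain it. The document names and text below are invented for illustration.

```python
# A minimal sketch of the indexer step: build an inverted index
# mapping each word to the set of documents that contain it.
def build_index(documents):
    index = {}
    for name, text in documents.items():
        # set() so a word repeated in one page is indexed once.
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(name)
    return index

# Illustrative "fetched" documents (what the spider would bring back).
docs = {
    "page1": "heavy rain expected today",
    "page2": "snow and rain in the forecast",
}
index = build_index(docs)
print(sorted(index["rain"]))  # ['page1', 'page2']
```

A real engine's proprietary algorithm would also store positions, weights, and ranking signals, but the word-to-documents mapping is the core structure being described.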
A web search engine is designed to search for information on the World Wide Web
and FTP servers. The search results are generally presented in a list and are
often called hits. The information may consist of web pages, images and
other types of files. Some search engines also mine data available
in databases or open directories. Unlike web directories, which are maintained by
human editors, search engines operate algorithmically or are a mixture of algorithmic
and human input.
1. A spider (also called a "crawler" or a "bot") that goes to every page or representative pages on every
Web site that wants to be searchable and reads it, using hypertext links on each page to discover and
read a site's other pages.
2. A program that creates a huge index (sometimes called a "catalog") from the pages that have been
read.
3. A program that receives your search request, compares it to the entries in the index, and returns results
to you.
An alternative to using a search engine is to explore a structured directory of topics. Yahoo, which also lets
you use its search engine, is the most widely-used directory on the Web. A number of Web portal sites offer
both the search engine and directory approaches to finding information.
Year Engine
1990 Archie The very first tool used for searching on the Internet was Archie. The program
downloaded the directory listings of all the files located on public anonymous FTP
(File Transfer Protocol) sites, creating a searchable database of file names;
however, Archie did not index the contents of these sites, since the amount of data
was so limited it could be readily searched manually.
1993 W3Catalog The web's first primitive search engine, released on September 2, 1993.
1993 Wandex the first web robot, the Perl-based World Wide Web Wanderer
1993 JumpStation Used a web robot to find web pages and to build its index, and used a web form as
the interface to its query program. It was thus the first WWW resource-discovery
tool to combine the three essential features of a web search engine (crawling,
indexing, and searching) as described below.
1994 WebCrawler One of the first "full text" crawler-based search engines. Unlike its predecessors, it
let users search for any word in any webpage, which has become the standard for
all major search engines since. It was also the first one to be widely known by the
public.
1994 Yahoo Was among the most popular ways for people to find web pages of interest, but its
search function operated on its web directory, rather than full-text copies of web
pages. Information seekers could also browse the directory instead of doing a
keyword-based search.
2000 Google The company achieved better results for many searches with an innovation
called PageRank. Google also maintained a minimalist interface to its search
engine. In contrast, many of its competitors embedded a search engine in a web
portal.
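The PageRank idea mentioned above can be sketched as a toy power iteration: a page's score depends on the scores of the pages linking to it. The three-page link graph and the parameter values below are invented purely for illustration, not Google's actual implementation.

```python
# Toy power-iteration sketch of the PageRank idea. A page splits its
# rank evenly among the pages it links to; the damping factor models
# a surfer occasionally jumping to a random page.
def pagerank(links, damping=0.85, iters=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outs in links.items():
            for target in outs:
                new[target] += damping * rank[page] / len(outs)
        rank = new
    return rank

# Invented link graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
# C is linked from both A and B, so it ends up with the highest rank.
```

Real PageRank runs over billions of pages with sparse-matrix techniques, but the fixed-point computation is the same shape.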
A search engine operates in three stages:
1. Web crawling
2. Indexing
3. Searching
There are basically three types of search engines:
1. Those that are powered by robots (called crawlers; ants or spiders)
2. Those that are powered by human submissions;
3. and those that are a hybrid of the two.
Web Crawler
Web search engines work by storing information about many web pages, which they
retrieve from the html itself. These pages are retrieved by a Web crawler (sometimes
also known as a spider) an automated Web browser which follows every link on the
site. Exclusions can be made by the use of robots.txt.
1. Specialized content search engines are selective about what part of the Web is crawled and indexed.
For example, TechTarget sites for products such as the AS/400 (http://www.search400.com) and CRM
applications (http://www.searchCRM.com) selectively index only the best sites about these products and
provide a shorter but more focused list of results.
2. Ask Jeeves (http://www.ask.com) provides a general search of the Web but allows you to enter a search
request in natural language, such as "What's the weather in Seattle today?"
3. Special tools and some major Web sites such as Yahoo let you use a number of search engines at the
same time and compile results for you in a single list.
4. Individual Web sites, especially larger corporate sites, may use a search engine to index and retrieve
the content of just their own site. Some of the major search engine companies license or sell their
search engines for use on individual sites.
Powered by Robots
A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner
or in a particular order. Other terms for Web crawlers are ants, automatic indexers, bots, Web spiders, Web
robots.
Crawler-based search engines are those that use automated software agents (called crawlers) that visit a Web
site, read the information on the actual site, read the site's meta tags, and also follow the links that the site
connects to, performing indexing on all linked Web sites as well.
The crawler returns all that information to a central repository, where the data is indexed. The crawler will
periodically return to the sites to check for any information that has changed. The frequency with which this
happens is determined by the administrators of the search engine.
This process is called Web crawling or spidering. Many sites, in particular search
engines, use spidering as a means of providing up-to-date data. Web crawlers are
mainly used to create a copy of all the visited pages for later processing by a search
engine that will index the downloaded pages to provide fast searches. Crawlers can
also be used for automating maintenance tasks on a Web site, such as checking links
or validating HTML code. Also, crawlers can be used to gather specific types of
information from Web pages, such as harvesting e-mail addresses (usually for spam).
A Web crawler is one type of bot, or software agent. In general, it starts with a list
of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all
the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl
frontier. URLs from the frontier are recursively visited according to a set of policies.
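The seed/frontier loop just described can be sketched in a few lines. The link graph below stands in for actually fetching pages and extracting their hyperlinks; the URLs are invented.

```python
from collections import deque

# Invented link graph standing in for "fetch page, extract hyperlinks".
LINKS = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": [],
}

def crawl(seeds):
    frontier = deque(seeds)   # URLs waiting to be visited (the crawl frontier)
    visited = set()
    while frontier:
        url = frontier.popleft()
        if url in visited:    # policy: never fetch the same page twice
            continue
        visited.add(url)
        # "Fetch" the page and add its hyperlinks to the frontier.
        frontier.extend(LINKS.get(url, []))
    return visited

print(len(crawl(["http://a.example/"])))  # 3
```

Real crawlers layer politeness delays, robots.txt checks, and re-visit scheduling on top of this loop, but the frontier structure is the same.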
Examples of Web Crawler
Yahoo! Slurp, Msnbot, FAST Crawler, Googlebot, World Wide Web Worm, WebFountain, WebCrawler
In both cases, when you query a search engine to locate information, you are actually searching through the
index that the search engine has created; you are not actually searching the Web. These indices are
giant databases of information that is collected, stored and subsequently searched. This explains why
sometimes a search on a commercial search engine, such as Yahoo! or Google, will return results that are, in
fact, dead links. Since the search results are based on the index, if the index hasn't been updated since a
Web page became invalid the search engine treats the page as still an active link even though it no longer is.
It will remain that way until the index is updated.
This tutorial is a how-to guide for creating AND, OR, NOT, phrase, and field searches on Web search engines.
We'll be using Google as an example. Keep in mind that the illustrated searches will work on most general search engines on
the Web.
Putting together a search is a three-step process.
TIP! There are also optional things you can do to focus a search. One useful option is known as field searching, and is
covered later on in this tutorial.
Boolean AND search
With AND logic, every word you type must appear in each result; most search engines treat a plain space
between words (for example, rain snow) as an AND. Notice how both words appear in the results. This is exactly what we wanted.
A variant of an AND search is the plus sign (+). In many search engines, the plus sign signals an AND
search. It guarantees that the words or phrases you include in your search will appear in your search results.
For example, +rain +snow. In most search engines, you don't need to use the plus sign because the
search engine will assume it.
Boolean OR search
What if we want results that include either the word rain or the word snow? This calls for Boolean OR logic. With OR logic,
we're asking for one word, or the other word, or both. An easy way to use OR logic is to use an advanced search page. Most
search engines have such an option and it's very useful.
And the results are in - all 551,000,000 of them! The search results
include pages with just the word rain or just the word snow, exactly as we wanted. Farther down in the
results will be documents containing both words - the overlap in the Venn diagram that you learned about
in Boolean Searching on the Internet.
Notice that Google has translated this search into its own syntax: rain OR snow. Google requires that
the word OR be typed in CAPITAL LETTERS. So do some other search engines. Since this may not be easy
to remember, it's best to go to the advanced search page and let the search engine do the rest.
An OR search is usually used to search for synonyms, for example, global warming OR climate
change.
Boolean NOT search
Sometimes you want to retrieve documents that do not contain a particular word. This can help when
associated words are not really relevant and can muddy the focus of your results. To do this, place a minus
sign (-) in front of the word you want to exclude.
Let's go back to our rain-snow example. In this case, we want documents that contain the word rain, but not
the word snow. So, we've placed the minus sign immediately in front of the word snow: rain -snow.
Phrase Search
Some words naturally appear in the context of a phrase, for example, freedom of the press. To search on
phrases in most search engines, simply enclose the phrase within double quotes: "freedom of the
press".
Phrases are especially important when there are STOP WORDS in your search. These are "little" words such as
a, and, the, in, it, etc. Most search engines tend to ignore these words. If you want to be sure they are
included in your search results, enclose them with the rest of your search within quotation marks. You can
also put a plus sign (+) in front of them. Yahoo! suggests a combination of quotation marks and the plus sign,
e.g., "+in thing".
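Under the hood, these AND, OR and NOT operators map naturally onto set operations over an inverted index (word to set of matching pages). A minimal sketch, with an invented index:

```python
# Invented inverted index: word -> set of pages containing it.
INDEX = {
    "rain": {"p1", "p2", "p3"},
    "snow": {"p2", "p4"},
}

and_hits = INDEX["rain"] & INDEX["snow"]   # rain AND snow: intersection
or_hits  = INDEX["rain"] | INDEX["snow"]   # rain OR snow: union
not_hits = INDEX["rain"] - INDEX["snow"]   # rain -snow: difference

print(sorted(and_hits))  # ['p2']
```

Phrase search ("freedom of the press") needs more than word sets, since the engine must also check that the words are adjacent and in order, which is why indices typically also record word positions.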
Common HTTP status codes:
100 Continue            The client should continue with its request; the server has received the request headers.
202 Accepted            The request has been accepted for processing, but the processing has not been completed.
304 Not Modified        The cached version of the requested file is the same as the file to be sent.
400 Bad Request         The request had bad syntax or was impossible to satisfy.
403 Forbidden           The request does not specify the file name, or the directory or the file does not have the
                        permission that allows the pages to be viewed from the web.
407 Proxy Authentication Required   The client must first authenticate itself with the proxy.
409 Conflict            The request could not be completed because of a conflict with the current state of the resource.
410 Gone                The requested resource is no longer available and no forwarding address is known.
501 Not Implemented     The server does not support the facility required.
504 Gateway Time-Out    The service did not respond within the time frame that the gateway was willing to wait.
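For reference, Python's standard library carries the official numeric codes and reason phrases, so responses like those tabulated above can be labeled without hand-maintaining a table. A small sketch:

```python
from http import HTTPStatus

# Look up the standard reason phrase for a numeric status code.
def describe(code):
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"

print(describe(404))  # 404 Not Found
print(describe(202))  # 202 Accepted
```

Passing a code the enum does not know (for example an experimental one) raises ValueError, so real servers usually fall back to a generic phrase in that case.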
Practical List:-
1. Basic Tags (Head, Body & Title)
7. Pre-formatted tags
1. Resume (With Table for Qualification)
2. Drawing like Hut , Teddy Bear
8. List Tag
1. Ordered List
1. Number 1,2,3
2. Number A,B,C
3. Number in Roman
2. Unordered List
1. With Disc
2. With circle
3. With square
Example
Specify a default font-color and font-size for text on page:
<head>
<basefont color="red" size="5" />
</head>
<body>
<h1>This is a header</h1>
<p>This is a paragraph</p>
</body>
Browser Support
The basefont tag is deprecated in HTML 4.01 and is not supported in HTML5; in practice only Internet Explorer rendered it, so CSS font properties should be used instead.
<p>This is the standard font size for this document.<br/>
<basefont size="4" />
And now the font is a bit larger.
<basefont size="3" />
And now the font is back to normal.</p>
Internet service provider
An Internet service provider (ISP), also sometimes referred to as an Internet access provider (IAP), is a
company that offers its customers access to the Internet. The ISP connects to its customers using a data
transmission technology appropriate for delivering Internet Protocol packets or frames, such as dial-
up, DSL, cable modem, wireless or dedicated high-speed interconnects.
Internet Service Provider, it refers to a company that provides Internet services, including personal and
business access to the Internet. For a monthly fee, the service provider usually provides a software
package, username, password and access phone number. Equipped with a modem, you can then log on to
the Internet and browse the World Wide Web and USENET, and send and receive e-mail.
For broadband access you typically receive the broadband modem hardware or pay a monthly fee for this
equipment that is added to your ISP account billing.
In addition to serving individuals, ISPs also serve large companies, providing a direct connection from the
company's networks to the Internet. ISPs themselves are connected to one another through Network Access
Points (NAPs). ISPs may also be called IAPs (Internet Access Providers).
ISPs may provide Internet e-mail accounts to users which allow them to communicate with one another by
sending and receiving electronic messages through their ISP's servers. ISPs may provide services such as
remotely storing data files on behalf of their customers, as well as other services unique to each particular ISP.
Typical home user connections:
1. Broadband wireless access
2. Cable Internet
3. Dial-up
4. Wi-Fi (the term describes a narrow range of connectivity technologies for wireless local area
networks (WLAN) based on the IEEE 802.11 standards)
Typical business connections:
1. Ethernet technologies
2. Leased line
Some Internet service providers in India: BSNL, CMC, RPG Infotech, Essel Shyam Communications, Primus,
RailTel Corporation of India, Reliance Telecommunications, ERNET India, Pacific Internet India, Bharti Infotel,
In2Cable (India), Reliance Engineering Associates, Swiftmail Communications, Estel Communications,
BG Broad India, Bharti Aquanet, Hathway Cable.
More recently, wireless Internet service providers or WISPs have emerged that offer Internet access through wireless
LAN or wireless broadband networks.
In addition to basic connectivity, many ISPs also offer related Internet services like email, Web hosting and access to
software tools.
A few companies also offer free ISP service to those who need occasional Internet connectivity. These free offerings
feature limited connect time and are often bundled with some other product or service.
When you are connected to the Internet through your service provider, communication between you and the
ISP is established using a simple protocol: PPP (Point-to-Point Protocol), a protocol that makes it possible for two
remote computers to communicate without one of them having a permanent IP address.
In fact, your computer does not have a permanent IP address of its own. However, an IP address is necessary to be able to go
onto the Internet, because the protocol used on the Internet is TCP/IP, which allows a very large number of
computers, each located by one of these addresses, to communicate.
So, communication between you and the service provider is established according to the PPP protocol which
is characterized by:
1. a telephone call
2. initialization of communication
Once you are "connected", the internet service provider lends you an IP address, which you keep for the whole
duration of your connection to the internet. However, this address is not fixed: at the time of the
next connection, the service provider gives you one of its free addresses (likely a different one, since depending
on its capacity the provider may have several hundred thousand addresses).
Your connection is therefore a proxy connection because it is your service provider who sends all the requests
you make and the service provider who receives all the pages that you request and who returns them to you.
It is for these reasons for example that when you have Internet access via an ISP, you must pick up your email
on each connection because generally it is the service provider that receives your email (it is stored on one of
its servers).
1. Coverage: some ISPs only offer coverage in large towns; others offer national coverage, i.e. a number which
is charged as a local call wherever you are calling from.
2. Bandwidth: this is the total speed that the ISP offers. This bandwidth is shared between the number of
subscribers, so the more the number of subscribers increases the smaller this becomes (the bandwidth
allocated to each subscriber must be greater than his transmission capacity in order to provide him
with a quality service).
3. Price: this depends on the ISP and the type of package chosen. Some ISPs now offer free access
4. Access (unlimited or metered): some ISPs offer a package where your connection time is taken into account, i.e.
you cannot exceed a number of hours of connection per month, beyond which the call charge is
subject to a price increase (additional minutes are very expensive). Some providers even offer tariffs
without subscription, i.e. only the communication is paid for (but it is obviously more expensive than a
local call!).
5. Technical service: this is a team responsible for responding to your technical problems (also called a
hotline or even customer service). ISPs generally charge for this type of service (sometimes 1.35 for
the call then 0.34/min)
6. Supplementary services:
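The bandwidth point above comes down to simple division: the ISP's total capacity is shared among its concurrently active subscribers. The figures below are invented for illustration, not real ISP data.

```python
# Back-of-envelope sketch of bandwidth sharing at an ISP.
# Both figures are illustrative assumptions.
total_mbps = 1000          # ISP's total upstream capacity, in Mbps
subscribers = 400          # subscribers active at the same time

per_user = total_mbps / subscribers
print(f"{per_user:.1f} Mbps each")  # 2.5 Mbps each
```

Doubling the subscriber count halves the share, which is why the text notes that the per-subscriber allocation must stay above each user's transmission capacity to keep service quality acceptable.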
These two major browsers are coming closer to each other regarding the DHTML effects possible towards
newer versions. However you will need to remember that IE is more flexible than Netscape and due to
small differences something that works really well in IE might not work at all in Netscape. So you need
to be really careful and alert when programming for both browsers. One hint you can follow in most
cases is that if you get it working in Netscape it should most probably work in IE.
Another major limitation of Netscape as compared to IE is that not all properties of a page can be
changed at any time. This is because when the web page is once written to the screen, only position,
visibility and clipping can be manipulated dynamically.
The good news is that from the web designing point of view you can now forget completely about
debugging all your websites for Netscape 4.x as a very small fraction of the Netscape community still use
it. Think of it this way, if you are bent on making the website work perfectly for version 4.x then you
cannot use some effects (especially javascript and CSS) that are easily supported by the latest versions
of all the major browsers.
To stop Dreamweaver from repeatedly throwing up Netscape 4 errors, set the browser check settings to target
Netscape 6 instead of the default 4.0. To do this, click on the Results panel, select the Target Browser
Check tab, click on the green arrow to show the list of options, select the Settings option and
set Netscape Navigator to version 6.0.