Web Technologies Unit-III
Directories: The first method of finding and organizing web information is the
directory approach. A directory offers a hierarchical representation of hyperlinks to
web pages and presentations, broken down into topics and subtopics.
1. Allows the user to submit a form containing a query that consists of a word or
phrase describing the specific information the user is trying to locate on the web.
2. Searches its database to try to match the query.
3. Collates and returns a list of clickable URLs containing presentations that
match the query; the list is usually ordered, with better matches appearing at
the top.
4. Permits the user to revise and resubmit a query.
A number of search engines also provide URLs for related or suggested topics. Like
directories, search engines can be classified as either general or specialty search
engines. A specialty search engine is also called a vertical search engine or a topic
search engine. Many people find that search engines are not as easy to use as
directories. To use a search engine, the user supplies a query by entering
information into a field on the screen. To be effective, the search engine should return a
small list of URLs on the user’s topic. To pose such queries, the user must learn the
query syntax of the search engine with which the user is working.
Many metasearch engines will collate the search results into one list, remove
duplicates and then rank the pages according to how well they match the query. The
advantage of a metasearch engine is that the user can access a number of different
search engines with a single query. The disadvantage is that the user will have a high
noise-to-signal ratio; that is, a lot of the matches will not be of interest to the user.
This means the user will need to spend more time evaluating the results and deciding
which hyperlinks to follow.
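The collate, de-duplicate and rank steps described above can be sketched as follows. This is only an illustration: the function name merge_results, the example URLs and the rank-based scoring rule are assumptions, not the method of any particular metasearch engine.

```python
def merge_results(result_lists):
    """Collate ranked URL lists from several engines into one ranked list."""
    scores = {}
    for results in result_lists:
        n = len(results)
        for rank, url in enumerate(results):
            # Assumed scoring rule: the higher a URL ranks in an engine's
            # list, the more points it earns from that engine.
            scores[url] = scores.get(url, 0) + (n - rank)
    # Each URL is a dictionary key, so duplicates are removed automatically.
    # Ties are broken alphabetically to keep the output deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


engine_a = ["http://a.example/", "http://b.example/", "http://c.example/"]
engine_b = ["http://b.example/", "http://d.example/"]
merged = merge_results([engine_a, engine_b])
print(merged)
```

A URL returned by both engines collects points from each list, so it tends to move toward the top of the merged list.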
Search Terminology:
1. Search Tool: Any mechanism for locating information on the Web; usually
refers to a search or metasearch engine or a directory.
2. Query: Information entered into a form on a search engine’s web page that
describes the information being sought.
3. Query Syntax: A set of rules describing what constitutes a legal query. On
some search engines, special symbols may be used in a query.
4. Query Semantics: A set of rules that defines the meaning of a query.
5. Hit: A URL that a search engine returns in response to a query.
6. Match: A synonym for hit.
7. Relevancy Score: A value that indicates how close a match a URL was to a
query; usually expressed as a value from 1 to 100 with the higher score
meaning more relevant.
Pattern Matching Query: The most basic type of query is a pattern matching
query. We formulate a pattern matching query using a keyword or a group of
keywords. The search engine returns the URL of any page that contains these keywords.
Boolean Queries: Boolean queries involve the Boolean operations AND, OR and
NOT. Most search engines allow the user to enter Boolean queries.
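As a rough illustration of how the three Boolean operations select pages, the sketch below evaluates them with Python set operations over a tiny hand-made collection. The page names, their words and the helper containing are hypothetical.

```python
# A toy collection: each (hypothetical) page is represented by its set of words.
pages = {
    "page1": {"web", "search", "engine"},
    "page2": {"web", "directory"},
    "page3": {"search", "telnet"},
}


def containing(word):
    """Return the set of pages that contain the given word."""
    return {url for url, words in pages.items() if word in words}


and_hits = containing("web") & containing("search")      # web AND search
or_hits = containing("web") | containing("telnet")       # web OR telnet
not_hits = containing("search") - containing("telnet")   # search AND NOT telnet

print(sorted(and_hits), sorted(or_hits), sorted(not_hits))
```

AND narrows the result set, OR widens it, and NOT excludes pages, which is why AND and NOT are the operators used when a query returns too many hits.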
Search Strategies: The user can begin by testing a number of different search engines,
trying to find one that meets the following conditions:
· Possesses a user-friendly interface.
· Has easy-to-understand, comprehensive documentation.
· Is convenient to access; that is, the user need not wait several minutes before
being able to submit a query.
· Contains a large database so that it knows a lot about the information for
which the user is searching.
· Does a good job in assigning relevance scores.
Once the user finds a search engine that meets most of the above criteria, the user should
concentrate on learning it well rather than learning a little bit about several
different search engines.
Search Generalization (Few Hits): If the query returns no hits or too few
hits, we need to generalize the search.
Search Specialization (Too Many Hits): If the query returns too many URLs,
the user needs to specialize the search:
· If the user started with a pattern matching query, the user may want to add
more keywords.
· If the user began with a Boolean query, the user needs to AND another keyword or
use the NOT operator to exclude some pages.
· If the user is still retrieving too many hits, try capitalizing proper nouns and
names.
· If nothing seems to work, try reviewing the first 20 URLs, since search engines list
the best matches near the top. If they don’t contain the information being sought,
the user can refine the search.
· If this fails, the user could resort to a directory and work down to the topic of
Search Engine Components: Based upon functionality, the search engine is split
into the following components.
1. User Interface: The screen in which the user types a query and which displays
the search results.
2. Searcher: The part that searches a database for information to match the query.
3. Evaluator: The function that assigns relevancy scores to the information
4. Gatherer: The component that traverses the web collecting information
about the web pages.
5. Indexer: The function that categorizes the data obtained by the gatherer and
creates the index.
User Interface: The user interface must provide a mechanism by which a user
can submit queries to the search engine. This is universally done using forms. In
addition, the user interface should be friendly and visually appealing. Hyperlinks
to help files should be displayed prominently and advertisements should not
hinder a reader’s use of the search engine. Finally, the user interface needs to
display the results of the search in a convenient way. The user should be
presented with a list of hits from the search, a relevancy score for each hit and a
summary of each page that was matched. This way, the user can make an
informed choice as to which hyperlinks to follow.
Searcher: The searcher is a program that uses the search engine’s index and
database to see if any matches can be found for the query. The query must first
be transformed into a syntax that the searcher can process. Since the database
associated with the search engine is extremely large, a highly efficient search
Evaluator: The searcher locates any URLs that match the query. The hits
retrieved by the query are called the result set of the search. Not all of the hits
will match the query equally well. The relevancy score is an indication of how well
a given page matched with the query. The relevancy score varies from search
engine to search engine. A number of different factors are involved and each one
contributes a different percentage. Some of the factors are:
· How many times the words in the query appear in the page.
· Whether or not the query words appear in the title.
· The proximity of the query words to the beginning of the page.
· Whether the query words appear in the CONTENT attribute of the META tag.
· How many of the query words appear in the document.
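A minimal sketch of how an evaluator might combine such factors into one score is shown below. The function relevancy_score, the choice of three factors and the weights 0.5, 0.3 and 0.2 are all assumptions made for illustration; real search engines use their own factors and percentages. The scale here runs from 0 to 100.

```python
def relevancy_score(query_words, title, body):
    """Combine three of the factors above into a score from 0 to 100."""
    query = [w.lower() for w in query_words]
    title_words = title.lower().split()
    body_words = body.lower().split()
    if not query or not body_words:
        return 0
    # Factor: how many of the query words appear in the document.
    coverage = sum(w in body_words for w in query) / len(query)
    # Factor: whether the query words appear in the title.
    in_title = sum(w in title_words for w in query) / len(query)
    # Factor: proximity of the first query word to the beginning of the page.
    positions = [body_words.index(w) for w in query if w in body_words]
    proximity = 1 - min(positions) / len(body_words) if positions else 0
    # Assumed weights; each factor contributes a different percentage.
    return round(100 * (0.5 * coverage + 0.3 * in_title + 0.2 * proximity))


good = relevancy_score(["telnet"], "Telnet guide", "telnet logs into a remote computer")
weak = relevancy_score(["telnet"], "Networking", "remote login tools include telnet")
none = relevancy_score(["ftp"], "Networking", "remote login tools")
print(good, weak, none)
```

The page that has the query word in its title and at the very start of its text scores highest, while a page with no query words scores 0.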
Breadth-First Search: A breadth-first search proceeds in levels "across" the pages.
The gatherer begins at a particular Web page and then explores all pages that it can
reach by using only one hyperlink from the original page. Once it has exhausted all
Web pages at that one level, it explores all of the Web pages that can be reached by
following only one hyperlink from any page that was discovered at the first level. In this
way, a second level, which usually contains many more web pages than the first
level, is explored. This process is repeated level by level until no new Web pages are
found. When no more pages can be located, the search may need to jump to a new
starting point.
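The level-by-level traversal described above can be sketched as follows. The link graph is a hypothetical in-memory dictionary rather than real Web pages, and the function name breadth_first_levels is an assumption for illustration.

```python
def breadth_first_levels(links, start):
    """Return the pages reachable from start, grouped by level.

    links maps each page to the list of pages it hyperlinks to.
    """
    seen = {start}
    levels = []
    frontier = [start]
    while frontier:
        levels.append(frontier)
        next_frontier = []
        for page in frontier:
            for target in links.get(page, []):
                # Only pages not discovered at an earlier level are new.
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return levels


links = {
    "home": ["a", "b"],
    "a": ["c", "home"],
    "b": ["c", "d"],
    "c": [],
    "d": ["a"],
}
levels = breadth_first_levels(links, "home")
print(levels)
```

The loop stops when a level yields no new pages, which corresponds to the point where the gatherer would need to jump to a new starting point.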
Indexer: Once the gatherer retrieves information about Web pages, the information
is placed into a database and indexed. The indexer functions create a set of keys (an
index) that organizes the data, so that high-speed searches can be conducted and
the desired information can be located and retrieved quickly. The
elements that should go into a Web page record include the URL, document title, and
descriptive keywords.
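One common way to organize such records for fast lookup is an inverted index that maps each keyword to the URLs whose records contain it. The sketch below assumes this structure; the record contents and the function name build_index are hypothetical.

```python
from collections import defaultdict

# Hypothetical Web page records with the three elements named above.
records = [
    {"url": "http://example.com/telnet", "title": "Using Telnet",
     "keywords": ["telnet", "remote", "login"]},
    {"url": "http://example.com/ftp", "title": "File Transfer",
     "keywords": ["ftp", "remote", "upload"]},
]


def build_index(records):
    """Map each keyword to the set of URLs whose record lists it."""
    index = defaultdict(set)
    for record in records:
        for keyword in record["keywords"]:
            index[keyword.lower()].add(record["url"])
    return index


index = build_index(records)
print(sorted(index["remote"]))
```

With the index in place, the searcher looks up a keyword directly instead of scanning every record.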
Telnet and Remote Login are two programs that allow the user to log into another
computer from an account into which the user is already logged. To do this, the user needs
a second computer that is accessible to the user. The second computer is usually at
a different physical location.
Telnet: The telnet command uses the Telnet protocol to log into a remote computer
on the Internet. The command is often called telnet, but there are also programs with
names like tn3270 (IBM 3270 Machine), WinQVT (Query/View/Transformation) and
QWS3270 (Quick Windows Sequencer).
There is a wide range of Telnet clients and many of them have a user-friendly
interface. On a desktop system, a Telnet client can usually be launched from
one of the system’s menus simply by selecting the Telnet option.
If no Telnet client appears on the desktop of a Windows operating
system, there is still a good chance that there is a Telnet client on the system. To
determine whether the system has Telnet or not, go to the Start menu and select
Find. Under Find, select the Files or Folders option. Now simply enter the word
“telnet” in the search area. The telnet.exe file is an executable telnet program.
In a Windows environment, selecting the Remote System option from the Connect
pull-down menu of the Telnet interface causes the Connect window to display within
the Telnet window. The form in the Connect window specifies the hostname, port and
On a UNIX system, we can type the command telnet at the operating system
prompt. We receive the following prompt.
We can type the open command followed by the hostname of the computer to
connect as follows:
The hostname is the domain name or the IP address of the machine. In
some cases, we also need to type the port number.
Typing help or ? at the Telnet prompt will usually display the Telnet documentation.
To end the Telnet session, we can type close or quit.
One of the most common uses of Telnet is to log into a personal machine to retrieve
email while traveling. Be warned that reading email in this fashion can
be very tedious from many countries. The connections are often so slow that sometimes
it is impossible to retrieve anything.
Remote Login: The rlogin command is similar to the telnet command, except that
it provides the remote computer with information about where we are logging in
from. If the machine that we are performing the remote login from is listed in the
remote machine’s file of hostnames, we need not enter any password.
On UNIX systems, the list of hostnames is given in a hidden file called
.rhosts. From the UNIX prompt, the syntax for the rlogin command is
%rlogin hostname
where hostname is the name of the machine to which we want to establish a
remote login connection. All the commands entered will run on the remote machine
until the remote session is terminated by using an exit command.
File Transfer is an application that allows the user to transfer files between two
computers on the Internet or on the same network. The two most important file transfer
functions are:
· Copying a file from another computer to the user’s computer.
· Sending a file from the user’s computer to another computer.
The process of transferring a file from the user’s computer to another computer is
called uploading. The process of getting a file from another computer to the user’s
computer is called downloading. After copying files, the user should run virus
detection software on them before using them. This helps safeguard against the
computer getting infected, but it is not a guarantee.
Graphical File Transfer Client: Graphical file transfer clients are the easiest to use.
These applications display the sending computer’s file system in one window and the
receiving computer’s file system in a second window.
In order to connect to a remote site using a graphical FTP client, the user should
first click on the Connect button. In the first line, we simply type in the hostname or the
IP address of the remote system to which we are connecting. In the third line, we enter the
user account name and in the fourth line, the password. Once we have typed all the
information, we can press the OK button. This will connect the user to the remote system.
Many features of a graphical FTP client are self-explanatory. For example, to
transfer a file from one system to another, we can drag and drop it to the other
system. Files can thus be exchanged in either direction.
One important point is the transfer mode setting. This can usually be specified
by clicking on a button. Most clients have a text (ASCII) transfer mode, a binary
transfer mode and an Auto mode. All file types can be transferred using binary mode, but not
all files can be transferred using text mode.
After completing an FTP session, it is good practice to close the session by
clicking on the Close button and then exit the FTP client by clicking on the Exit button.
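The reason text mode is unsafe for some files can be seen in a small simulation. The sketch below assumes a simplified model in which a text-mode transfer rewrites every line-feed byte as a carriage return plus line feed, as happens when line endings are translated between systems; the function names and sample data are hypothetical.

```python
def binary_transfer(data: bytes) -> bytes:
    """Binary mode: every byte arrives unchanged."""
    return data


def text_transfer(data: bytes) -> bytes:
    """Text mode (simplified assumption): line endings are translated,
    here from LF on the sender to CR LF on the receiver."""
    return data.replace(b"\n", b"\r\n")


text_file = b"line one\nline two\n"
# Arbitrary non-text bytes that happen to include the value 0x0A (LF).
image_file = bytes([0x89, 0x50, 0x0A, 0x00, 0xFF])

# The text file still contains the same lines after translation.
print(text_transfer(text_file).splitlines() == text_file.splitlines())
# The non-text file has been altered by the same translation.
print(text_transfer(image_file) == image_file)
print(binary_transfer(image_file) == image_file)
```

Text files survive the translation because only their line endings change, whereas any byte in an image or program that happens to equal a line feed is rewritten, which corrupts the file.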
The following steps are followed while transferring the file:
1. Locate the file to transfer.
2. Launch the FTP client on the PC.
3. Connect and login to the remote UNIX system.
4. Change to the appropriate directories on both the local and remote systems.
5. Select the appropriate transfer mode.
6. Select the file to transfer.
7. Transfer the file.
8. Close and exit FTP.
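The same steps can also be scripted. The following is a minimal sketch using Python's standard ftplib module, not a description of any graphical client; the function names upload_file and transfer and all argument values are assumptions for illustration.

```python
import os
from ftplib import FTP


def upload_file(ftp, local_path, remote_dir, binary=True):
    """Steps 4 to 7: change directory, pick a mode and transfer the file."""
    ftp.cwd(remote_dir)                          # change remote directory
    remote_name = os.path.basename(local_path)   # the file to transfer
    with open(local_path, "rb") as f:
        if binary:
            ftp.storbinary("STOR " + remote_name, f)   # binary mode
        else:
            ftp.storlines("STOR " + remote_name, f)    # text (ASCII) mode


def transfer(hostname, user, password, local_path, remote_dir):
    """Steps 3 and 8 around the upload: connect, log in, then close."""
    ftp = FTP(hostname)          # connect to the remote system
    ftp.login(user, password)    # log in with the account name and password
    try:
        upload_file(ftp, local_path, remote_dir)
    finally:
        ftp.quit()               # close the session
```

A call such as transfer("ftp.example.com", "userid", "password", "notes.txt", "/home/userid") would run all the steps in order; the hostname and account shown here are placeholders.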
Text-Based File Transfer Client: We can launch the UNIX file transfer client, called
File Transfer Protocol (FTP), by entering the command
%ftp hostname
Here hostname is the name of the computer with which we want to exchange files.
Once we have successfully initiated an FTP session by supplying a userid and
password, we get the following prompt
· bye - Terminate the session and exit the file transfer program.
· cd - Change directory.
· get - Copy a file from the remote computer.
· help - View the list of commands.
· ls - List the files in the current working directory.
· put - Send a copy of a file to the remote computer.
· pwd - Print the name of the current directory.
Anonymous File Transfer: On some systems, files are made available to anyone
who wants to retrieve them. If a file needs to be widely distributed, it may not be
feasible to assign accounts and passwords to everyone interested in receiving a copy
of the file. Anonymous file transfer was established to solve this problem.
Computer Viruses: Some of the programs that are downloaded from the Internet or
obtained as email attachments may threaten the security of the computer if they
contain viruses, Trojan horses or worms.
Virus: A virus can be thought of as a program that, when run, can replicate and then
embed itself within another program. Although there are harmless viruses, most are
intended to damage the host system. The damage can occur immediately, by filling
all the available space on the hard disk, or it may occur at some later time. The
effect on the computer might be something as innocent as a message being
displayed on the desktop. Before doing damage, the virus could infect the other
programs on the computer, as well as other computers if we send program files to
others. A specific event that causes the virus to become active, such as a particular
date, is called a trigger.
Trojan Horse: The name comes from Greek mythology. It is a legitimate program for
carrying out some useful function, but hidden within it is code that is activated by
some trigger. When the hidden code is executed, it might release a virus, permit
unauthorized access to the computer or destroy files and data.
Virus Avoidance and Precautions: UNIX viruses are rare, because of the strict
security measures on UNIX systems. Most viruses are designed to infect PCs or
Macs. A virus is usually targeted at one type or the other of such systems, since nearly
all viruses are operating-system specific. To protect against viruses, Trojan horses and
worms, we need to take the following precautions:
7. Differentiate between Semantic and Syntactic based style types.
Semantic-Based Style Types: These are also called Content-Based Style Types.
These tags are used to indicate the content of the text. The following is the list of
Semantic Style types:
1. Emphasis Tag: The emphasis tag <EM> with its corresponding </EM> ending
tag is used for highlighting text.
2. Strong Tag: The strong tag <STRONG> is used to indicate an even higher level
of emphasis.
3. Citation Tag: The citation tag <CITE> is used to specify a reference. A collection
of citations creates a bibliography. Using the citation tag facilitates that collection
since every reference is bracketed between <CITE> and </CITE>.
For example, let the variable name file1 represent any file name. In the on-line
documentation we are developing about file manipulation, we can specify how to
delete a file using the following code:
7. Code Tag: The code tag <CODE> is used for specifying program code.
8. Small Tag: The small tag <SMALL> is used to reduce the relative font size.
Syntactic-Based Style Tags: These are also called Physical-Based Style Tags.
These tags allow the programmer to tell the browser specifically how to display the
text on a web page. The following is the list of Syntactic Style types:
1. Bold Tag: The bold tag <B> is used to make text boldface. Most browsers
darken the text and widen the letters.
2. Italics Tag: To place a portion of text in italics, use the italics tag <I>.
3. Monospaced Typewriter Text: The typewriter text tag <TT> is used for placing text
in a monospaced typewriter font. This can be used to indicate that a certain phrase
needs to be typed in.
4. Strike Tag: The strike tag <STRIKE> may be used for crossing out a word or a
phrase by having a line drawn through it.
5. Subscript Tag: The subscript tag <SUB> is used to generate a subscript.
To get x1 + x2 = 0
7. Underline Tag: The underline tag <U> is used to underline text. Since
hyperlinks are depicted by underlining, the underline tag should be used sparingly
and only in situations where no confusion can result as to whether or not the
underlined item is a hyperlink.
8. Blink Tag: Flashing text is created using the blink tag <BLINK>.
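As a small illustration of how these paired tags wrap text, the sketch below builds the markup for the subscript example and for one semantic and one syntactic tag. The helper function tag is hypothetical and simply joins a start tag, the text and the matching end tag.

```python
def tag(name, text):
    """Wrap text between a start tag and its matching end tag."""
    return "<" + name + ">" + text + "</" + name + ">"


# Subscripts: x with subscript 1 plus x with subscript 2 equals 0.
equation = "x" + tag("SUB", "1") + " + x" + tag("SUB", "2") + " = 0"

# A semantic tag describes what the text is; a syntactic tag says how to show it.
semantic = tag("STRONG", "Warning")
syntactic = tag("B", "Warning")

print(equation)
print(semantic)
print(syntactic)
```

Most browsers render the last two lines similarly, but only the semantic version states that the text is strongly emphasized; the syntactic version only requests boldface.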
Headers: The beginning part of a rendered Webpage is called the header. The
header is the information contained at the top of a rendered web page, not at the
top of an HTML source file. The header is not an HTML tag. The header is not
formatted within the head tag, but in the body of a document. Most headers contain a
subset of the following information:
· The title of the page
· Last-updated information
· Signature of the page developer
· An icon or logo associated with the page.
· A counter of the number of visitors.
· An advertisement
The purpose of the header is to convey the most important information about the
page, introduce the page and set the tone for the page. In any collection of web
pages, it is a good idea to use consistent headers. This helps the reader to determine
the boundaries of the presentation. If a hyperlink leads to a different looking header,
readers realize they may have left the original presentation. Consistent headers help
tie the presentation together.
Footers: The bottom of many web pages contains similar information. The ending
part of a web page is called the footer. The footer is not an HTML element but rather web
page content appearing at the bottom of a page. Most footers contain a subset of the
following information.