PhoNET - A Voice Based Web Technology 2010-11

CHAPTER: 1

1.1 INTRODUCTION

Today's telecom business has seen recent growth, especially in bandwidth
infrastructure for long distance (LD) and data. The industry is currently experiencing
strong growth in the wireless segment as mobile devices prove to be very popular with
both consumers and business. An evolving market segment is "Internet anywhere" and
many companies are trying approaches to present viable products for this market. One
approach is Internet access over wireless devices such as cell phones with a screen.
However, this method has inherent limitations such as small screen size, lack of a
keyboard, the need for a special device (web-enabled phone), the need to rewrite and
maintain a special website, and severe bandwidth constraints using wireless data transfer
protocols.

Another approach that is becoming popular is voice-based limited Internet access,
which overcomes all of the limitations of the wireless data devices but one: such services
still limit access to the few sites that are re-engineered for voice. They typically deliver
content such as news, weather, horoscopes, and stock quotes over the phone. The companies
offering these services are called "Voice Portals." Voice portals were the first Web
applications that tried to integrate websites with voice, which gave rise to
enterprise-based PBX systems.

Other solutions, such as Personal Digital Assistants, phones with display screens
and other Internet appliances, are available, but have limitations. Users must have special
hardware with intelligence built in, and often must view the Web through small, difficult-
to-read screens. Such devices are often expensive, as well.

Our solution, which presents a third option, gives users all of the benefits of the
voice portals, yet offers complete access to the entire Internet without limitation. With our
Voice Internet technology, PhoNET, anyone can surf, search, send and receive email, and
conduct e-commerce transactions using their voice, from anywhere and from any phone,
with more freedom of movement than a standard Internet browser, which requires a PC
and an Internet connection.

PhoNET technology is faster and cheaper than existing alternatives. Today, only
the largest of companies are making their Web sites telephone-accessible because existing
technology requires a manual, costly and time-consuming re-write of each page. With the
voice internet technology-PhoNET, existing Web pages are used, allowing users to
leverage their Web investment. The software dynamically converts existing pages into
audio format, significantly lowering the up-front investment a business must make to
allow users to hear and interact with their Web site by phone.

Department of Electronics and Communication Engg. S.J.M.I.T, Chitradurga

1.2 MOTIVATION

The primary method of access today continues to be the computer, which has
certain advantages as well as some limitations. Computers offer a visual Internet
experience that is usually rich in content. Some basic computer skills and knowledge are
needed to access the Internet. But, computer-based access is proving insufficient for the
professional on the move. When in the car or away from the office or computer, accessing
the Web is difficult, if not impossible. And, an increasing number of people prefer an
interface that allows them to hear and speak rather than see and click or type.

The computer-based Internet experience also does not meet the needs of another
segment of the population – the visually impaired. Neither visual displays of information,
nor keyboard-based interactions naturally meet their needs, and this segment is often
unable to benefit from all that the Information Age has to offer.

Some existing Internet users have also identified problems with the visual Internet
experience. Pages are increasingly full of graphics, advertisement banners, etc., which
move, flash, and blink as they vie for attention. Some find this "information overload"
annoying, and lament the delays it creates by severely taxing the available bandwidth.

The "Digital Divide"

While computers and their use are on the rise, they're not ubiquitous yet. A large
segment of the population still doesn't have access, in the United States and in other parts
of the world. Thus, the Internet is limited to only a small fraction of the world population;
the majority is left out. This gap between those who can effectively use new
information from the Internet and those who cannot is known as the digital divide.
Bridging this digital divide is the key to ensuring that most people in the world have the
capability to access the Internet. Making computers ubiquitous is not an attractive or
feasible solution, at least in the near future, because of various barriers. One key barrier is
cost, although the price of a computer has come down significantly in recent years.
Insufficient infrastructure for the visual Internet is another barrier in many countries, and it
will take a while to build such infrastructure. Other consumers have a basic distaste for
complex technology, which prevents them from accessing Web-based information via a
computer. A more natural, less cumbersome way to interface with the Net would give
them an opportunity to experience the Internet as well, thus bridging the Digital Divide.

The "Language Divide"

Today, more than eighty percent of website content is written in English.
People in China, Japan and other countries in Asia, and in Europe and Latin America,
speak a language other than English as their native language. These people
are left out of a significant portion of the World Wide Web. For example, a reader who
knows only Japanese or Chinese cannot understand the content of CNN or The New York
Times. This gap, the lack of access to a major part of the Internet because of a language
barrier, is called "The Language Divide". Bridging this language divide is the key to
ensuring that most people in the world have the capability to access the major part of the
Internet. The demand for machine translation is growing rapidly as more people embrace
the Internet each day. A service that translates the accessed information into the desired
language would clearly add value for these users.

As the need for alternative access to the Internet becomes
more evident, several technology companies are pursuing solutions. Their products
include "smart" cell phones with visual displays, intelligence built into the handset, and
voice-activated Web sites. These products address different aspects of the problems
outlined above. While these alternative technologies are in the pipeline, few are ready for
market. But the very existence of a race to market by many companies is evidence of a
large potential market.

1.3 THE CHALLENGE


To integrate existing technologies, or develop new technologies, to make simple,
affordable, alternative Internet access possible.

As the need for an alternative access method to the Internet has become evident,
progress continues to be made by technologists to provide such solutions. One key area of
focus has been voice-based technology, which would allow a very natural interface for
most people, and address the limitations described earlier. A voice interface provides an
alternative to the visually based interface. A device such as the telephone provides a
readily accessible alternative to the computer.

Several technologies existing today are keys to the solution, but the problem lies in
successfully integrating these technologies into useful applications of greater value than
their individual components. These technologies include:

• Voice Extensible Markup Language (VXML, also written VoiceXML), an XML-based
markup language for voice dialogs that plays a role for audio similar to the one HTML,
the normal language in which Web pages are created, plays for visual display. This
technology adds voice capability to a Web site. A voice-enabled site can still be
displayed, as usual, on a computer, but its content can also be presented in audio
format with voice navigation.
• Speech Application Language Tags (SALT), a specification for supporting
multimodal communication from PCs, cell phones, PDAs and other handheld
devices. For example, input can be voice (such as asking for directions) and output
can be data (a map pops up). SALT is a lightweight set of extensions to existing
markup languages, allowing developers to embed speech enhancements in
existing HTML, XHTML and XML pages. As with VoiceXML, applications will
be portable, thanks to the separation from the underlying hardware and platform.
• Speech recognition (SR), which allows computers, through the use of software, to
recognize spoken language, eliminating the need for the computer keyboard as an
interface. The vocabulary recognized in products using this technology tends to be
limited.
• Text-to-speech (TTS), which allows text to be converted automatically to
synthetic speech. It allows communication between computers and humans
through a "natural" interface, such as speech.
• Telephone integration, which is the key to interfacing with computers from a remote
location. A protocol is needed to communicate with the computer from a
telephone using voice. This also includes multimedia integration (e.g. with .wav
files).
• Intelligent software agents, which are needed to automate communication between a
telephone and a computer and between a computer and a Web site, to interpret the
contents of a web page, to extract key information that makes sense in audio, to
efficiently navigate through web pages, and to manage access to the Internet.
• Language processing, which allows translation to other languages and the
understanding and interpreting of structured sentences. Natural language processing
allows computers to understand and interpret human languages.

The first technology listed – VoiceXML – is a very elegant solution that leverages
technology specifically developed for audio Internet access. However, it requires that Web
sites be customized, or VXML-enabled. This means rewriting the web pages in VXML.
According to analysts, today there are more than a billion web pages. Assuming that it
takes one hour and costs about $100 to rewrite one page, the cost to voice-enable all sites
would be about $100B. Clearly, it will take several years before the majority of popular
pages are VXML-enabled. Today, only a very small portion of the total Web pages is
voice-enabled using VXML.
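
The estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch that reuses the figures quoted in the text (one billion pages, one hour and $100 per page); the function and variable names are illustrative.

```python
# Back-of-the-envelope estimate using the figures quoted in the text.
PAGES = 1_000_000_000        # "more than a billion web pages"
HOURS_PER_PAGE = 1           # assumed rewrite effort per page
COST_PER_PAGE_USD = 100      # assumed rewrite cost per page


def rewrite_estimate(pages, hours_per_page, cost_per_page):
    """Return (total_hours, total_cost_usd) for rewriting every page."""
    return pages * hours_per_page, pages * cost_per_page


total_hours, total_cost = rewrite_estimate(PAGES, HOURS_PER_PAGE, COST_PER_PAGE_USD)
print(f"{total_hours:,} person-hours, ${total_cost / 1e9:.0f}B")
# prints: 1,000,000,000 person-hours, $100B
```

Under the same assumptions, the effort is also about one billion person-hours, which is why a manual rewrite of the whole Web is not practical.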

The second technology listed – SALT – is another elegant technology that allows
developers to embed speech enhancements in existing HTML, XHTML and XML pages.
However, like VoiceXML, it requires that websites be rewritten or enhanced with SALT,
and hence it will also take many years before the majority of popular pages are
SALT-enabled.

The paper presents a solution that successfully integrates the other technologies
listed into a useful, audio-based approach for accessing the Internet today. It is
independent of the timeline, interest and willingness of content providers to update their
pages to be VXML or SALT-enabled.

Another approach is to provide Internet access over wireless devices such as a Palm
Pilot or a cell phone with a screen. However, this method has inherent limitations such as
the small size of the screen and the need for a special phone. There is also a need to rewrite
the website in WML. Today's wireless Internet industry faces many challenges due to
limited bandwidth and small screens. The cost of cell-phone-based Internet access is
very high, and users do not want to pay a high service fee. Moreover, our eyes and fingers
are not changing, but the devices are getting smaller and smaller. Thus, existing visually
based access is going to be even more difficult in the future.


1.4 THE PhoNET SOLUTION

PhoNET is an audio Internet technology that allows users to listen to email, buy on-line, or
surf and hear any Web site using a simple and natural interface: an ordinary telephone.
No computer is needed.

Subscribers dial a toll-free number and start accessing the Internet using voice
commands. Speech recognition technology in the company's system allows users to give
simple commands, such as "go to Yahoo" or "read my email", to get to the Net-based
information they want, when they want it, whether they're out on an appointment, stuck in
traffic, sitting in an airport, or cooking dinner. They'll be able to quickly locate
information, such as late-breaking news, traffic reports, directions, or anything else they're
interested in on the World Wide Web. PhoNET has the capability to
automatically download Web content and filter out graphics, banners and images. It then
renders the extracted text into concise, meaningful text suited to audio
before using TTS to convert it into speech. PhoNET also converts the rendered text into
other languages in real time. It can also be easily integrated with any back-end application
such as CRM, SCM or ERP. Thus, PhoNET completely eliminates the need to rewrite
any content in VXML, SALT or WML, and we strongly believe that this automation-based
approach will be very successful. Using text-to-speech technology, an "intelligent agent"
reads the requested information out loud via a computerized voice and processes the
user's voice commands.

CHAPTER: 2

2.1 PhoNET TECHNOLOGY OVERVIEW

The idea of listening to the Internet may at first sound a bit like watching the radio.
How does a visual medium rich in icons, text, and images translate itself into an audible
format that is meaningful and pleasing to the ear? The answer lies in an innovative
integration of three distinct technologies that render visual content into short, precise,
easily navigable, and meaningful text that can be converted to audio.

The technologies and steps employed to accomplish this feat are:

Document Processing

1. Speech recognition
2. Text-to-speech translation

Document Rendering

3. Artificial intelligence


The PhoNET platform acts as an "Intelligent Agent" (IA) located between the user
and the Internet (Figure 1). The IA automates the process of rendering information from
the Internet to the user in an audio format that is meaningful, precise, easily navigable and
pleasant to listen to. Rendering is achieved by using Page Highlights (a method to find and
speak the key contents on a page), finding the right, and only the relevant, content on a
linked page, assembling that content, and providing easy navigation. These
key steps are done using the information available in the visual web page itself and
algorithms that use information such as text content, color, font size, links, paragraphs,
and amount of text. Artificial Intelligence techniques are used in this automated rendering
process. This is similar to how the human brain renders a visual page: selecting the
information of interest and then reading it.

The IA includes a language translation engine that dynamically translates web
content from one language into another in real time. Thus, a Chinese-speaking person can
ask, in Chinese, to surf an English website; the Intelligent Agent would access the English
website, extract its content, translate it on the fly into Chinese, and read it
back to the user in Chinese.

The platform incorporates the highest quality speech recognition and text to speech
engines from third party suppliers.

The PhoNET architecture is shown in Figure 2. The process starts with a telephone
call placed by a user. The user is prompted for a logon pass phrase. The user's pass phrase
establishes the first connection to the Web site associated with that phrase and loads the
first Web page. The HTML is parsed, as described later, separating text from other media
types, isolating URLs from HTML anchors, and isolating the associated anchor titles
(including ALT fields) for grammar generation.

Grammar generation computes combinations of the words in titles to produce a wide
range of alternative ways to say subsets of the title phrase. In this process simple function
words (e.g., "and," "or," "the") are not allowed to occur in isolation, where they would be
meaningless. Browser control commands such as "go back" and "go home" (similar to the
typical browser button commands) are mixed in to control typical browser operations.
Further automatic expansion of the language models for link titles, such as thesaurus
substitution of words, is to be implemented soon. The current processing allows the user
to say any keyword phrase from the title, with deletions allowed, to activate the associated
link.

Grammar processing continues with compilation and optimization into a finite-state
network where redundancies have been eliminated. The word vocabulary associated with
the grammar is further processed by a Text-To-Speech (TTS) pronunciation module that
generates phonetic transcriptions for each word of the grammar. Since the TTS engine uses
pronunciation rules, it is not limited to dictionary words. The grammar and vocabulary are
then loaded into the speech recognizer. This process typically takes about a second.

At the same time, the Web document is described to the user, as discussed later. The user
may then speak a navigation or browser command phrase to control browsing. Each user
navigation command takes the user to a new Web page. If the command is ambiguous, the
dialog manager collects the possible interpretations into a description list and asks the user
to choose one. Recognizing that a collection of Web pages is inherently a finite-state
network, it is easy to see that this mechanism provides the basis for a finite-state dialog
and Web application control system.
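
The grammar generation and ambiguity handling described above can be sketched in Python as follows. This is an illustrative sketch, not the PhoNET implementation: the function-word list, the sample links and the names title_phrases, build_grammar and resolve are all assumptions, and a plain dictionary stands in for the compiled finite-state network.

```python
from itertools import combinations

# Assumed stop list; the text names only "and", "or", "the" as examples.
FUNCTION_WORDS = {"and", "or", "the", "a", "an", "of", "to", "in"}


def title_phrases(title):
    """Yield every ordered subset of the title's words (deletions allowed),
    skipping phrases made up only of function words."""
    words = title.lower().split()
    seen = set()
    for size in range(1, len(words) + 1):
        for idx in combinations(range(len(words)), size):
            phrase = tuple(words[i] for i in idx)
            if all(w in FUNCTION_WORDS for w in phrase):
                continue
            if phrase not in seen:
                seen.add(phrase)
                yield " ".join(phrase)


def build_grammar(links):
    """Map each speakable phrase to the list of URLs it could activate."""
    grammar = {}
    for title, url in links:
        for phrase in title_phrases(title):
            urls = grammar.setdefault(phrase, [])
            if url not in urls:
                urls.append(url)
    return grammar


def resolve(grammar, utterance):
    """Return ('go', url), ('ask', urls) when ambiguous, or ('unknown', None)."""
    urls = grammar.get(utterance.lower().strip(), [])
    if len(urls) == 1:
        return ("go", urls[0])
    if urls:
        return ("ask", urls)
    return ("unknown", None)


# Hypothetical links taken from a parsed page.
links = [("World News", "/world"), ("Sports News", "/sports"),
         ("Weather and Traffic", "/weather")]
grammar = build_grammar(links)
print(resolve(grammar, "sports"))   # ('go', '/sports')
print(resolve(grammar, "news"))     # ('ask', ['/world', '/sports'])
print(resolve(grammar, "and"))      # ('unknown', None)
```

The number of phrases grows exponentially with title length, which is acceptable for short link titles; a real system would compile the alternatives into a compact network rather than enumerate them.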

2.2 DOCUMENT PROCESSING


Document analysis is performed in the HTML parser, grammar generator, and
Hyper Voice processor modules. The typical HTML Web page is first parsed into a list of
elements based mostly on the HTML tag structure. Some elements are aggregations
(tables, for instance), but the element list is not a full parse tree, which we found was not
needed and in some cases actually complicates processing. Images, tables, forms and most
text structure elements such as paragraphs are recognized and processed according to their
recognized type. Much of the effort in building a robust HTML processor goes into dealing
with malformed HTML, such as unclosed tag scopes and overlapping tag scopes.
Unfortunately, space does not allow for fully addressing this issue here. Commercial
browsers currently handle these issues in differing ways. Briefly, the handling of HTML
errors by Phone Browser mostly follows the style of Netscape Communicator.
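
As a rough illustration of parsing a page into a flat element list rather than a full parse tree, the sketch below uses Python's standard html.parser module. The class name FlatElementParser and the (kind, data) element format are assumptions made for this example; the error handling is whatever the standard library provides, not the Netscape-style recovery mentioned above.

```python
from html.parser import HTMLParser


class FlatElementParser(HTMLParser):
    """Collect a flat list of (kind, data) elements, not a parse tree."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._href = None          # set while inside <a href=...>
        self._anchor_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"]
            self._anchor_text = []
        elif tag == "img":
            alt = attrs.get("alt") or ""
            if self._href is not None and alt:
                self._anchor_text.append(alt)   # ALT text can name a link
            self.elements.append(("image", alt))

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join(self._anchor_text).strip()
            self.elements.append(("link", (title, self._href)))
            self._href = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._href is not None:
            self._anchor_text.append(text)
        else:
            self.elements.append(("text", text))


parser = FlatElementParser()
# The second anchor contains an unclosed <b>, a typical malformed construct.
parser.feed('<p>Top stories<a href="/w"><img src="w.gif" alt="Weather"></a>'
            '<a href="/n">Latest <b>news</a>')
parser.close()
for element in parser.elements:
    print(element)
```

The link titles collected this way (including ALT text) are the kind of input a grammar generator would need.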

Images often have ALT attribute tags that are used to derive the voice navigation
commands for these items. The location of each image is announced along with any
associated caption. This feature can be disabled on a site-by-site basis when the user does
not want to hear about images.

Tables are first classified according to purpose, either layout or content. Most
tables are actually used for page layout which can be recognized by the variety and types
of data contained in the table cells. Data tables are processed by a parser according to one
of a set of table model formats that Phone Browser recognizes. This provides primarily a
simple way of reading the table contents row by row, which is often not very satisfying.
Alternatively a transcoder can be used to reconstruct the table in sentential format. An
example of this is a Phone Browser stock quote service where the transcoder extracts data
from the Website table and builds a new Web page containing sentences describing the
company name, ticker symbol, last trade price, change, percent change and volume rather
than simply reading the numbers. The user can also ask for related news reports.
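
A transcoder of this kind can be sketched as a function that turns one table row into a sentence. The field names and the sample quote below are invented for illustration and do not come from any real service.

```python
def quote_sentence(row):
    """Turn one stock-quote table row (a dict) into a speakable sentence."""
    direction = "up" if row["change"] >= 0 else "down"
    spelled_symbol = " ".join(row["symbol"])   # read the ticker letter by letter
    return (
        f"{row['name']}, ticker symbol {spelled_symbol}, "
        f"last traded at {row['last']:.2f}, "
        f"{direction} {abs(row['change']):.2f}, "
        f"or {abs(row['pct_change']):.1f} percent, "
        f"on volume of {row['volume']:,} shares."
    )


# Made-up sample row, as it might be extracted from a quote table.
row = {"name": "Example Corp", "symbol": "EXC", "last": 42.5,
       "change": -1.25, "pct_change": -2.9, "volume": 1200000}
print(quote_sentence(row))
```

Reading the row as a sentence gives each number a spoken label, which is what makes it easier to follow than a row-by-row reading of bare figures.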

Forms with pop-up menus and radio buttons are handled by creating voice
command grammars from the choices described in the HTML. Each menu or button
choice is spoken to the user who can repeat that phrase or a phrase subset to activate that
choice. Open dialog forms (e.g. search engines) present a larger challenge. Since there is
nothing to define a grammar, the implication is a full language model. While large
vocabulary dictation speech systems are available, most require speaker training to
achieve sufficiently high accuracy for most applications. Phone Browser is intended to be
immediately usable without training, so dictation is not yet supported. This also implies
that creating arbitrary text for messaging is not yet supported. One additional type of
form input is an extension to HTML. A GSL (Grammar Specification Language) or JSGF
(Java Speech Grammar Format) specification can be inserted into an HTML anchor using
an attribute tag (currently LSPSGSL). Using this method an application can specify an
elaborate input grammar allowing many possible sentences to address the associated
hyperlink and construct a GET type form response where the QUERY_STRING element
is constructed by inserting the speech recognition text results. Grammar specifications
written this way may represent many thousands of possible sentence inputs giving the end
user great speaking flexibility.
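
The last step, building a GET-type form response from recognized speech, can be sketched with the standard urllib.parse module. The slot names, the action path and the recognizer output below are hypothetical; only the idea of inserting recognition results into the query string comes from the text.

```python
from urllib.parse import urlencode


def build_get_url(action, slots):
    """Build a GET request URL whose query string carries the recognized words."""
    return f"{action}?{urlencode(slots)}"


# Hypothetical recognizer output for "flights from Boston to New York".
recognized = {"from": "Boston", "to": "New York"}
print(build_get_url("/cgi-bin/flights", recognized))
# prints: /cgi-bin/flights?from=Boston&to=New+York
```

On the server side, the part after the question mark is what a CGI program would receive as QUERY_STRING.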

2.3 DOCUMENT RENDERING

Rendering-Definition

In information technology, the term 'rendering' refers to how information is
presented according to the medium: for example, graphically displayed on a screen,
audibly read using an audio device, or printed on a piece of paper.

In the context of voice/audio Internet, Web content rendering entails the translation
of information originally intended for visual presentation into a format more suitable to
audio. Conceptually this is quite a straightforward process, but in practice executing this
translation poses some daunting challenges. What are those challenges, and why are
they so difficult to overcome? These questions are explored in the next section of this
paper.

The Rendering Problem

Computers possess certain superhuman attributes, which far outstrip those of mortal
man; most notable are their computational capabilities. The common business
spreadsheet is a testament to this fact. Other seemingly more mundane tasks, however,
present quite a conundrum for even the most sophisticated of processors. Designing a
high-speed special purpose computer capable of defeating a grandmaster at chess took the
computing industry over 50 years to perfect. Employing strategic thinking is not a
computer's forte. That is because in all the logic embodied in their digitized ones and
zeroes, there is no inherent cognitive thought. This one powerful achievement of the brain,
along with our ability to feel and express emotion, separates the human mind from its
computerized equivalent, the central processing unit (CPU).

The relevance of cognitive thought to text rendering may not be immediately
obvious, but it is one of the major challenges faced when attempting to take information
designed for one medium and render it in another. This is because there are no hard and
fast objective rules to follow. Computers are very good at following instructions when
they can be reduced to very objective decision points. They are not so good when value
judgments are involved. A human being can readily distinguish a cat from a dog, or a
relevant news link on a Web page from a link for an advertisement. For a computer this
simple exercise is significantly more challenging than applying the Taylor expansion
formula to a set of polynomials, something a computer can do quite handily.

Solving the problem


To solve the rendering problem, some intelligent techniques must be applied. The
relevant data must be selected, navigated to its conclusion, and reassembled for
presentation by a different medium. All of this must be done for all web pages,
dynamically, in real-time and in an automated fashion. We have used an Intelligent Agent
(IA) that uses various intelligence techniques including “artificial intelligence”.

Using Visual Clues

Understanding the process that our brains go through in making qualitative choices
is key to developing an artificially intelligent solution. In the example of Web page
navigation, we know that our brains do not attempt to read and interpret an entire page of
data; rather, they take their cues from the visual clues implemented by the Web designer.
These clues include such things as placement of text, use of color, size of font, and density
of content.

From these clues a list of potential areas of interest can be developed and presented
as a list of candidates.
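
One simple way to turn such clues into a ranked list of candidates is a weighted score per text block. The sketch below is only an illustration: the Block fields, the weights and the sample blocks are assumptions, not the algorithm used by PhoNET.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    font_size: int      # in points
    position: int       # 0 = top of page
    is_link: bool


def highlight_score(block):
    """Heuristic score; the weights are illustrative assumptions only."""
    score = block.font_size * 2.0              # larger type suggests a headline
    score -= block.position * 0.5              # content near the top ranks higher
    score += min(len(block.text.split()), 20)  # reward some text, but cap it
    if block.is_link:
        score += 5.0                           # links lead to full stories
    return score


def page_highlights(blocks, top_n=3):
    """Return the top_n blocks, best candidate first."""
    return sorted(blocks, key=highlight_score, reverse=True)[:top_n]


# Made-up blocks standing in for a parsed page.
blocks = [
    Block("Storm closes coastal highway", 18, 1, True),
    Block("Click here for a free offer", 10, 0, True),
    Block("Copyright notice", 8, 40, False),
    Block("Markets rally after rate decision", 16, 3, True),
]
for block in page_highlights(blocks, top_n=2):
    print(block.text)
# prints the two headline blocks: the storm story, then the markets story
```

In practice the weights would have to be tuned, or learned, per site or per page layout.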

Upon selecting an item of interest, it is common to have to navigate to another Web
page to read all the data of interest (much as a newspaper story continues on another
page). To do so we click on a Web link. When following a page link, the problem of
continuity of thought is encountered because the newly linked page almost assuredly
contains data in addition to the thread of information we are attempting to follow. In order
to maintain continuity with the item from the previous page, a contextual correlation must
be made. Once again, this cognitive process poses a formidable challenge for the computer
and requires application of Intelligent Agent (or artificial intelligence) principles to solve.

Simplifying for speech

The first step involves dynamically removing all the programming constructs and
coding tags that make up the instructions to a Web browser on how to visually render the
data. HTML, CHTML, XML, and other languages are typically used for this purpose.
Because the data is now being translated or rendered to a different medium, these tags no
longer serve any purpose. It is doubtful that every single data item on a page will be read.
Just as when reading a newspaper, we read only items of interest and generally skip
advertisements completely. Thus, we need to automatically render the important
information on a page and then, when a topic is selected, present only the relevant
information from the linked page corresponding to the selected topic. Rendering is
achieved by using Page Highlights (a method to find and speak the key contents on a page),
finding the right, and only the relevant, content on a linked page, assembling that content,
and providing easy navigation.
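
The tag-removal step can be sketched with Python's standard html.parser module. The class and function names and the list of block-level tags are assumptions made for this example; the sketch only strips markup, scripts and style sheets and does not attempt to select relevant content.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep visible text; drop tags, scripts, and style sheets."""
    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "td",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append(" ")      # keep block boundaries as spaces

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def speakable_text(html):
    """Return the page text with markup removed and whitespace normalized."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


page = "<h1>Traffic</h1><script>var x=1;</script><p>Route 9 is <b>closed</b>.</p>"
print(speakable_text(page))
# prints: Traffic Route 9 is closed.
```

The resulting plain text is what would then be filtered for relevance and passed to a text-to-speech engine.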

Finding and Assembling Relevant Information

To find relevant information, the Intelligent Agent (IA) uses various deterministic
and non-deterministic algorithms that use contextual and non-contextual matches,
semantic analysis, and learning. This is again very similar to how we use our eyes and
brain to find the relevant content. To ensure real-time performance, algorithms are
simplified as needed while still producing very satisfactory results. Once the relevant
content is determined, it is assembled in an order that makes sense when listened to in
audio or viewed on a small screen.
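
A very simple form of contextual matching is keyword overlap between the selected link title and each paragraph of the linked page. The sketch below illustrates only that idea; the stop-word list, the threshold and the sample paragraphs are assumptions, and it does not cover the semantic analysis or learning mentioned above.

```python
import re

# Assumed stop list for this example.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "for", "is"}


def keywords(text):
    """Lower-cased content words of a text, as a set."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS}


def relevant_paragraphs(link_title, paragraphs, threshold=0.5):
    """Keep, in page order, paragraphs sharing enough keywords with the title."""
    wanted = keywords(link_title)
    kept = []
    for para in paragraphs:
        overlap = len(wanted & keywords(para)) / len(wanted) if wanted else 0.0
        if overlap >= threshold:
            kept.append(para)
    return kept


# Made-up paragraphs from a linked page.
paragraphs = [
    "Subscribe now for a special discount on all plans.",
    "The storm closed the coastal highway early on Monday.",
    "Crews expect the highway to reopen after the storm passes.",
]
for para in relevant_paragraphs("Storm closes coastal highway", paragraphs):
    print(para)
# prints the two storm paragraphs and skips the advertisement
```

Keeping the paragraphs in page order preserves the continuity of the story when it is read aloud.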

A content-rich page with a small number of links makes rendering and navigation
easy since there are only a few choices, and one can quickly select a particular topic or
section. If the site is rich in content, links and images/graphics, the problem is more
difficult, but a good solution still exists through careful use of a built-in feature called
"Page Highlights". The most difficult case is when a page is very rich in images/graphics
and links. In such cases, the main information is located several levels down from the home
page, so navigation becomes more difficult as one has to go through multiple levels.
Using multi-level Page Highlights and customized Highlights, the content can still be
rendered well, but it is not as easy to navigate as in the other two cases. Usually,
most Internet content falls under the first and second categories.

Rendering to a new medium

The two key media to render into are audio, using any phone, and visual, using a cell
phone screen or PDA. There is good synergy between these two modes from a rendering
standpoint: both need a small amount of meaningful information at a time that can be heard
or viewed at ease, with easy navigation. This is achieved by using the Page Highlights
mentioned above and finding relevant content one column at a time, as we do when we read a
newspaper or website. A column of text can be converted to audio that can be heard with
ease. The rate of hearing, i.e. of content delivery, can be controlled to suit the user’s
needs. The selection of a website, a Page Highlight, the speed of hearing, etc. can all be
done by Voice Commands. The result is Voice Internet, i.e. basically talking and listening
to the Internet. The same column of text can be displayed on a small screen, which can
easily show a column but not a whole page. The content is then scrolled automatically at a
selectable speed and hence can be viewed and absorbed at ease. The result is a MicroBrowser,
or “true” wireless Internet, that does not need any rewrite of the website and presents
content in a meaningful way.
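
The column-at-a-time idea can be sketched as follows. This is only an illustration under stated assumptions, not the PhoNET implementation: the function names render_column and seconds_per_screen, the column width, the number of rows per screen and the speaking rate of 150 words per minute are all hypothetical.

```python
import textwrap


def render_column(text, width=24, rows=4):
    """Wrap text into a narrow column and group the lines into small screens."""
    lines = textwrap.wrap(text, width=width)
    return [lines[i:i + rows] for i in range(0, len(lines), rows)]


def seconds_per_screen(screen, words_per_minute=150):
    """Estimate how long one screen takes to speak at the chosen rate."""
    words = sum(len(line.split()) for line in screen)
    return words * 60.0 / words_per_minute


if __name__ == "__main__":
    sample = "A column of text can be converted to audio that can be heard with ease."
    for screen in render_column(sample):
        print("\n".join(screen))
        print("-- pause %.1f s --" % seconds_per_screen(screen))
```

The same chunks could feed either mode: each screen is small enough to display on a phone, and changing words_per_minute stands in for the user-controlled rate of hearing or scrolling.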

CHAPTER: 3

3.1 ADVANTAGES

• Using PhoNET, visually impaired persons can use the Internet.
• There is no need for a computer; an ordinary phone is sufficient.
• There is no need to rewrite web pages in VXML.
• It can be used efficiently to bridge the Digital Divide and the Language Divide.

3.2 DISADVANTAGES

• Efficient implementation of this technology depends on developments in the fields of
speech recognition and artificial intelligence.
• The major challenge of this technology is its complexity, which comes with a high
price tag.
• People are more accustomed to using the Internet visually, so it may take some time
to switch from visual access to voice-based access.

CHAPTER: 4

4.1 APPLICATIONS

This technology can find applications for Service Providers, Businesses and
Governments in the following areas:

For Service Providers-

• Surf and browse the Web,
• Email (send, receive, compose, copy, forward, reply, delete and more),
• Search the Web,
• Voice Portal features such as News, Weather, Stock Quotes, Horoscopes and more.

For Businesses-

• Airline reservations and tracking,
• Package tracker,
• Reservations,
• eCommerce,
• Customer service,
• Alert service,
• Order confirmation,
• CRM applications.

For Governments-

• All key benefits for Businesses, plus:
• Easy accessibility to all Government content,
• More efficient communication between Government and citizens, between Government
and businesses, and between Government departments,
• Meeting the Government obligation to bridge the Digital and Language Divides
(with automatic translation of Internet content from English to any other
language and vice versa), and providing Internet access to elderly, visually
impaired and blind people in a very simple and cost-effective way.

4.2 FUTURE WORK

The implementation and results of this newly proposed technology depend on
developments in the fields of speech recognition and artificial intelligence. The major
challenge of this technology is its complexity, which comes with a high price tag. However,
this is not much greater than the complexity and cost involved in voice-enabling a web site
using present technologies. We believe that voice/speech-based interface options will
become an important part of the overall solution for accessing Internet content, and that
an automated approach to voice-enabling sites or creating Voice Portals will be more
practical and more common than rewriting web content in different languages and
maintaining multiple versions of the web sites.

4.3 CONCLUSION

The possibility of accessing the web through an ordinary phone is considered. It is
a new technology which provides a true audio Internet experience. Using an ordinary
telephone and simple voice commands, users will be able to surf and hear the entire
Internet for the information they desire. A computer is not needed. Any web page will be
accessible, including but not limited to sites written with the Wireless Application
Protocol and pages specially written in Voice Extensible Mark-up Language (VXML). This
report presents a detailed analysis of how technologies such as SR, TTS and AI are
integrated to develop an intelligent platform (PhoNET) for voice-based web access, along
with the major problems involved in document processing and document rendering and their
solutions.
