The document discusses voice recognition systems and how they work by converting speech to text or commands. It describes the technology used including VoiceXML, which is an XML language for building voice applications. Potential applications of web-based voice recognition systems are also outlined.
PHP Voice
Problem Statement: Web-Based Voice Recognition System
Introduction & Working of Voice Recognition System:
Today, when we call most large companies, a person doesn't usually answer the phone. Instead, an automated voice recording answers and instructs you to press buttons to move through option menus. Many companies have moved beyond requiring you to press buttons, though. Often you can just speak certain words (again, as instructed by a recording) to get what you need. The system that makes this possible is a type of speech recognition program: an automated phone system.
You can also use speech recognition software in homes and businesses. A range of software products allows users to dictate to their computer and have their words converted to text in a word processing or e-mail document. You can access function commands, such as opening files and accessing menus, with voice instructions. Some programs are for specific business settings, such as medical or legal transcription.
People with disabilities that prevent them from typing have also adopted speech recognition systems. For users who have lost the use of their hands, or for visually impaired users when it is not possible or convenient to use a Braille keyboard, the systems allow personal expression through dictation as well as control of many computer tasks. Some programs save users' speech data after every session, allowing people with progressive speech deterioration to continue to dictate to their computers.
Current programs fall into two categories:
Small-vocabulary/many-users: These systems are ideal for automated telephone answering. The users can speak with a great deal of variation in accent and speech patterns, and the system will still understand them most of the time. However, usage is limited to a small number of predetermined commands and inputs, such as basic menu options or numbers.
Large-vocabulary/limited-users: These systems work best in a business environment where a small number of users will work with the program.
While these systems work with a good degree of accuracy (85 percent or higher with an expert user) and have vocabularies in the tens of thousands, you must train them to work best with a small number of primary users. The accuracy rate will fall drastically with any other user.
Speech recognition systems made more than 10 years ago also faced a choice between discrete and continuous speech. It is much easier for the program to understand words when we speak them separately, with a distinct pause between each one. However, most users prefer to speak at a normal, conversational speed. Almost all modern systems are capable of understanding continuous speech.
Speech to Data
To convert speech to on-screen text or a computer command, a computer has to go through several complex steps. When you speak, you create vibrations in the air. The analog-to-digital converter (ADC) translates this analog wave into digital data that the computer can understand. To do this, it samples, or digitizes, the sound by taking precise measurements of the wave at frequent intervals. The system filters the digitized sound to remove unwanted noise, and sometimes to separate it into different bands of frequency (frequency is the number of sound-wave vibrations per second, heard by humans as differences in pitch). It also normalizes the sound, or adjusts it to a constant volume level. It may also have to be temporally aligned. People don't always speak at the same speed, so the sound must be adjusted to match the speed of the template sound samples already stored in the system's memory.
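The normalization and temporal alignment steps described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names `peak_normalize` and `stretch_to_length` are hypothetical, volume is adjusted by peak scaling, and alignment is done by plain linear resampling to the length of a stored template. Real recognizers may use more elaborate methods.

```python
def peak_normalize(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]


def stretch_to_length(samples, target_len):
    """Linearly resample samples so the result has target_len points.

    Assumption: the template's length stands in for its speaking speed.
    """
    if target_len <= 0 or not samples:
        return []
    if len(samples) == 1 or target_len == 1:
        return [samples[0]] * target_len
    step = (len(samples) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # Interpolate between the two nearest original samples.
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


quiet_fast_utterance = [0.1, -0.5, 0.25]
normalized = peak_normalize(quiet_fast_utterance)
aligned = stretch_to_length(normalized, 5)  # match a 5-sample template
```

Peak scaling keeps the example short; a system could equally normalize by average energy.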
An ADC translates the analog waves of your voice into digital data by sampling the sound. The higher the sampling and precision rates, the higher the quality.
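To make the effect of sampling rate and precision concrete, here is a minimal Python sketch of what an ADC does conceptually. The names `sample_and_quantize`, `max_error`, and `tone`, the 440 Hz test signal, and the chosen rates and bit depths are all assumptions made for illustration, not values from this document.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate_hz, bits):
    """Sample signal(t), clipped to [-1, 1], and round to 2**bits levels."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    levels = 2 ** bits
    step = 2.0 / (levels - 1)  # spacing between adjacent levels
    n = round(duration_s * sample_rate_hz)
    out = []
    for i in range(n):
        t = i / sample_rate_hz
        value = max(-1.0, min(1.0, signal(t)))
        code = round((value + 1.0) / step)  # integer code 0..levels-1
        out.append(code * step - 1.0)
    return out


def max_error(signal, duration_s, sample_rate_hz, bits):
    """Largest difference between the true signal and its quantized samples."""
    samples = sample_and_quantize(signal, duration_s, sample_rate_hz, bits)
    return max(abs(signal(i / sample_rate_hz) - s)
               for i, s in enumerate(samples))


def tone(t):
    """Hypothetical test input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)


coarse = max_error(tone, 0.01, 8000, 4)   # 16 levels
fine = max_error(tone, 0.01, 8000, 16)    # 65536 levels
print(coarse > fine)
```

More bits per sample shrink the rounding error of each measurement, while a higher sample rate captures faster changes in the wave.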
Next, the signal is divided into small segments as short as a few hundredths of a second, or even thousandths in the case of plosive consonant sounds (consonant stops produced by obstructing airflow in the vocal tract) like "p" or "t." The program then matches these segments to known phonemes in the appropriate language. A phoneme is the smallest element of a language: a representation of the sounds we make and put together to form meaningful expressions. There are roughly 40 phonemes in the English language (different linguists have different opinions on the exact number), while other languages have more or fewer phonemes.
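The segmentation step can be sketched as cutting the sample stream into short frames. The function name `split_into_frames`, the 20 ms frame length, and the 10 ms hop (which makes neighbouring frames overlap) are assumptions chosen for illustration; the text above only says that segments last a few hundredths of a second.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=20, hop_ms=10):
    """Cut samples into fixed-length frames that start every hop_ms."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    hop_len = int(sample_rate_hz * hop_ms / 1000)
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must cover at least one sample")
    frames = []
    # Stop once a full frame no longer fits in the remaining samples.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


one_second = list(range(8000))            # stand-in for 1 s of audio at 8 kHz
frames = split_into_frames(one_second, 8000)
print(len(frames), len(frames[0]))
```

Each frame would then be compared against known phonemes; that matching step is not shown here.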
The next step seems simple, but it is actually the most difficult to accomplish and is the focus of most speech recognition research. The program examines phonemes in the context of the other phonemes around them. It runs the contextual phoneme plot through a complex statistical model and compares it to a large library of known words, phrases and sentences. The program then determines what the user was probably saying and either outputs it as text or issues a computer command.
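As a rough illustration of comparing recognized phonemes against a library of known words, the toy Python sketch below scores each word by how common it is and how closely its phonemes match what was heard. It is a deliberately simplified stand-in, not the statistical model a real recognizer uses: the three-word `LEXICON`, its phoneme spellings and prior probabilities, the `error_penalty` value, and the function names are all hypothetical.

```python
import math

# Hypothetical mini-lexicon: word -> (phoneme sequence, prior probability).
LEXICON = {
    "yes": (["Y", "EH", "S"], 0.4),
    "no": (["N", "OW"], 0.4),
    "news": (["N", "UW", "Z"], 0.2),
}


def edit_distance(a, b):
    """Number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def best_word(observed, lexicon=LEXICON, error_penalty=2.0):
    """Pick the word with the best log-prior minus mismatch penalty."""
    def score(item):
        _word, (phonemes, prior) = item
        return math.log(prior) - error_penalty * edit_distance(observed, phonemes)
    return max(lexicon.items(), key=score)[0]


print(best_word(["Y", "EH", "Z"]))  # a slightly misheard "yes"
```

The point of the sketch is only that the decision combines evidence from the sound with prior knowledge about likely words.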
Technology Used:
PHP
MySQL
VoiceXML
Introduction
VoiceXML (VXML) is the W3C's standard XML format for specifying interactive voice dialogues between a human and a computer. It allows voice applications to be developed and deployed in an analogous way to HTML for visual applications. Just as HTML documents are interpreted by a visual web browser, VoiceXML documents are interpreted by a voice browser.
Many commercial VoiceXML applications have been deployed, processing millions of telephone calls per day. These applications include: order inquiry, package tracking, driving directions, emergency notification, wake-up, flight tracking, voice access to email, customer relationship management, prescription refilling, audio news magazines, voice dialing, real-estate information and national directory assistance applications.
VoiceXML has tags that instruct the voice browser to provide speech synthesis, automatic speech recognition, dialog management, and audio playback. The following is an example of a VoiceXML document:
    <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
      <form>
        <block>
          <prompt>
            Hello world!
          </prompt>
        </block>
      </form>
    </vxml>
When interpreted by a VoiceXML interpreter this will output "Hello world" with synthesized speech.
Typically, HTTP is used as the transport protocol for fetching VoiceXML pages. Some applications may use static VoiceXML pages, while others rely on dynamic VoiceXML page generation using an application server like Tomcat, WebLogic, IIS, or WebSphere.
Historically, VoiceXML platform vendors have implemented the standard in different ways, and added proprietary features. But the VoiceXML 2.0 standard, adopted as a W3C Recommendation on 16 March 2004, clarified most areas of difference. The VoiceXML Forum, an industry group promoting the use of the standard, provides a conformance testing process that certifies vendors' implementations as conformant.
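Dynamic VoiceXML page generation can be sketched with Python's standard library. This is a minimal illustration only, not the project's implementation: the function name `build_vxml_prompt` is hypothetical, and in a real deployment the returned string would be sent as the HTTP response body by whatever server-side technology is in use.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_vxml_prompt(message):
    """Return a VoiceXML 2.0 document that speaks one prompt."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = message  # special characters are escaped on serialization
    return ET.tostring(vxml, encoding="unicode")


print(build_vxml_prompt("Hello world!"))
```

Building the page through an XML library rather than string concatenation keeps the output well-formed even when the prompt text contains characters such as "&" or "<".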
The problem statement can be solved by the following strategy.
Applications:
1. Web-based IVRS.
2. The technique can be used to build a voice-based secure login system.
3. Voice-based search engines.
4. Voice-based online home automation systems over IP networks, etc.