Raspberry Pi
Piri
A Raspberry Pi Speech Recognition System
Made by:
Sujit Royal (201201216)
Tanya Shah (201201217)
Ajay Gaur (201201218)
Miten Shah (201201219)
Khyati Vaghamshi (201201220)
HARDWARE REQUIREMENTS
Raspberry Pi
SD Card
USB keyboard
USB mouse
USB headset with microphone
USB hub
Monitor
Breadboard
Power Supply
Cables
LEDs
Internet connectivity: LAN cable
SPECIFICATIONS
Raspberry Pi Model B
PROJECT IDEA
The user is given three options and may choose to:
1) Speak something and know the translation.
2) Ask a query and have a reply.
3) Give an order for the LEDs to glow.
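The three options above can be wired to a small dispatch table. The following is only a minimal sketch: the handler names (translate_speech, answer_query, control_leds) are hypothetical placeholders and do not come from the project code.

```python
# Hypothetical handlers; the real project would call its own routines here.
def translate_speech():
    return "translate"


def answer_query():
    return "query"


def control_leds():
    return "leds"


# Map each menu choice to the routine that serves it.
MENU = {
    "1": translate_speech,
    "2": answer_query,
    "3": control_leds,
}


def dispatch(choice):
    """Run the handler for a menu choice, or return None if it is unknown."""
    handler = MENU.get(choice.strip())
    if handler is None:
        return None
    return handler()
```

Keeping the choices in a dictionary means a fourth option can be added without touching the dispatch logic.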
APPLICATION DESCRIPTION
Libraries to be installed
python-pip
pycurl
mplayer
flac
python2.7
libcurl
wolframalpha
APIs to be accessed
Google Speech API
Microsoft Bing Translator API
Wolfram Alpha API
import os
import pycurl
import StringIO

# Send the file to the Google Speech API.
# url (the API endpoint) and filename (the recorded FLAC file) are
# assumed to be defined earlier in the script.
c = pycurl.Curl()
c.setopt(pycurl.VERBOSE, 0)
c.setopt(pycurl.URL, url)
fout = StringIO.StringIO()
c.setopt(pycurl.WRITEFUNCTION, fout.write)
c.setopt(pycurl.POST, 1)
c.setopt(pycurl.HTTPHEADER, ['Content-Type: audio/x-flac; rate=16000'])
filesize = os.path.getsize(filename)
c.setopt(pycurl.POSTFIELDSIZE, filesize)
fin = open(filename, 'rb')
c.setopt(pycurl.READFUNCTION, fin.read)
c.perform()

# Receive the text back from the Google Speech API
response_code = c.getinfo(pycurl.RESPONSE_CODE)
response_data = fout.getvalue()
start_loc = response_data.find("transcript")
tempstr = response_data[start_loc + 13:]  # skip past: transcript":"
end_loc = tempstr.find("\"")
final_result = tempstr[:end_loc]
c.close()
fin.close()

# Display the recognized text
print "You Said: " + final_result
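The string search above gives a wrong slice when the response contains no transcript, because find() returns -1. A slightly more defensive variant is sketched below. The function name extract_transcript is hypothetical, and the sample response in the usage note is made up for illustration; it is not a recorded API reply.

```python
def extract_transcript(response_data):
    """Return the text after the first "transcript" key, or None if absent.

    Uses the same plain string search as the project code, so escaped
    quotes inside the transcript are not handled.
    """
    marker = '"transcript":"'
    start = response_data.find(marker)
    if start == -1:
        return None  # no transcript in the response
    start += len(marker)
    end = response_data.find('"', start)
    if end == -1:
        return None  # the closing quote is missing
    return response_data[start:end]
```

For example, a response shaped like {"alternative":[{"transcript":"hello world"}]} would yield hello world, while an empty reply yields None so the caller can ask the user to repeat.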
import subprocess

# origin_language and destination_language are assumed to be set
# earlier in the script (language codes such as "en").

def speakOriginText(phrase):
    googleSpeechURL = ("http://translate.google.com/translate_tts?tl=" +
                       origin_language + "&q=" + phrase)
    subprocess.call(["mplayer", googleSpeechURL], shell=False,
                    stdout=subprocess.PIPE, stderr=subprocess.PIPE)

def speakDestinationText(phrase):
    googleSpeechURL = ("http://translate.google.com/translate_tts?tl=" +
                       destination_language + "&q=" + phrase)
    print googleSpeechURL
    subprocess.call(["mplayer", googleSpeechURL], shell=False,
                    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
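The two functions above paste the phrase into the URL unchanged, so a phrase containing spaces or an ampersand produces a malformed query string. A possible fix, sketched here with a hypothetical helper named build_tts_url, is to percent-encode the parameters first. The import fallback only exists so the snippet runs on both Python 2.7 and Python 3.

```python
try:
    from urllib import quote  # Python 2.7
except ImportError:
    from urllib.parse import quote  # Python 3

# Same translate_tts endpoint as the functions above.
TTS_BASE = "http://translate.google.com/translate_tts?tl="


def build_tts_url(language, phrase):
    """Build the text-to-speech URL with percent-encoded parameters."""
    return TTS_BASE + quote(language) + "&q=" + quote(phrase)
```

The result of build_tts_url can be handed to mplayer exactly as googleSpeechURL is in the functions above.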
import json
import urllib
import requests

# Request an OAuth access token for the Microsoft Translator API
args = {
    'client_id': 'ClientID',
    'client_secret': 'APIkey',
    'scope': 'http://api.microsofttranslator.com',
    'grant_type': 'client_credentials'
}
oauth_url = 'https://datamarket.accesscontrol.windows.net/v2/OAuth2-13'
oauth_junk = json.loads(
    requests.post(oauth_url, data=urllib.urlencode(args)).content)

# Translate the recognized text (text is assumed to hold it)
translation_args = {
    'text': text,
    'to': destination_language,
    'from': origin_language
}
headers = {'Authorization': 'Bearer ' + oauth_junk['access_token']}
translation_url = 'http://api.microsofttranslator.com/V2/Ajax.svc/Translate?'
translation_result = requests.get(
    translation_url + urllib.urlencode(translation_args), headers=headers)
# Drop the two leading characters and the closing quote of the reply
translation = translation_result.text[2:-1]

# Speak the original phrase, then its translation
speakOriginText('Translating ' + translation_args["text"])
speakDestinationText(translation)
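The fixed-offset slice on the translation reply depends on the exact wrapper characters around the translated text. The sketch below is a more tolerant alternative under an assumption that is not confirmed by the source: the reply body is a double-quoted string, possibly preceded by a byte-order mark. The helper name unwrap_quoted is hypothetical.

```python
def unwrap_quoted(body):
    """Strip an optional leading byte-order mark and surrounding quotes.

    Assumption: the reply looks like "Hola" with an optional BOM in
    front of it. Anything else is returned unchanged.
    """
    body = body.lstrip(u"\ufeff").strip()
    if len(body) >= 2 and body[0] == '"' and body[-1] == '"':
        return body[1:-1]
    return body
```

Unlike a hard-coded slice, this leaves the text intact if the service ever returns it without the wrapper.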
Query Processing
Using Wolfram Alpha: Wolfram Alpha is a computational knowledge engine
that answers natural-language queries directly. For example, it replies
with the current time when asked "What time is it?". Here we send the
user's query to Wolfram Alpha for processing and then display the reply.
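import sys
import wolframalpha

app_id = 'APIkey'
client = wolframalpha.Client(app_id)
query = ' '.join(sys.argv[1:])
res = client.query(query)
# pods[1] is read below, so at least two pods are required
if len(res.pods) > 1:
    texts = ""
    pod = res.pods[1]
    if pod.text:
        texts = pod.text
    else:
        texts = "I have no answer for that"
    # Drop any non-ASCII characters before printing
    texts = texts.encode('ascii', 'ignore')
    print texts
else:
    print "Sorry, I am not sure."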
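The code above reads only the second pod. If that pod has no plain text but a later one does, the answer is lost. The sketch below scans the remaining pods instead. It assumes, as the indexing above suggests, that the first pod merely echoes the interpreted input. Pod here is a stand-in namedtuple for illustration, not the wolframalpha library's own class, and first_answer is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for a result pod; only the attributes used below are modelled.
Pod = namedtuple("Pod", ["title", "text"])


def first_answer(pods, skip=1, fallback="I have no answer for that"):
    """Return the text of the first pod after `skip` that has any text."""
    for pod in pods[skip:]:
        if pod.text:
            return pod.text
    return fallback
```

With the real client, list(res.pods) could be passed in, provided each pod exposes a text attribute as it does in the code above.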
TEST RESULTS
Phase 1 - Testing Speech-to-text
PASSED.
Phase 2 - Testing Translation and Text-to-speech
1) To translate the text into a different language, we used the Microsoft
Bing Translator API, passing the origin and destination languages as
arguments.
2) After translation, the translated text and the original text obtained
from speech-to-text conversion were passed to the Google speech engine to
convert them into speech.
3) We used MPlayer to play both audio outputs.
PASSED.
Phase 3 - Testing the Query Processing
1) We used the Wolfram Alpha API because of its advanced query-handling
features.
2) We passed the translated text to it as the query.
3) It processes the query and returns the output in text form, which is
then converted to speech.
PASSED.
Phase 4 - Testing the Toggle of LEDs
1) The LEDs toggle according to the user's spoken order.
PASSED.
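The report does not show the LED code, so the following is a hypothetical sketch of how recognized text could be turned into an LED state. The function name parse_led_command and the keywords "on" and "off" are assumptions. Driving the pin itself (for instance with the RPi.GPIO library) is left out so the snippet runs without the hardware.

```python
def parse_led_command(text):
    """Map recognized speech to an LED state.

    Returns True for on, False for off, and None when the phrase
    contains neither keyword.
    """
    words = text.lower().split()
    if "on" in words:
        return True
    if "off" in words:
        return False
    return None
```

On the Raspberry Pi the returned value would then be written to the GPIO pin that drives the LED, and a None result would leave the LED unchanged.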
CONTRIBUTION
Sujith Royal
1) Documentation
2) Background Reading
Tanya Shah
1) Creation of Hardware Schematic
2) Application Study and generation of Requirements and Specification
Ajay Gaur
1) Coding
2) Background Reading
Miten Shah
1) Coding
2) Refining documentation
Khyati Vaghamshi
1) Testing of hardware and software
2) Creation of Hardware Schematic
REFERENCES
1) Add the power of speech, hearing and vision to your robot - MagPi, Issue 26, Aug 2014, pp. 18-21
2) Universal Translator - Dave Conroy,
http://makezine.com/projects/universal-translator/
3) Raspberry Voice Recognition System - Oscar Liang,
http://blog.oscarliang.net/raspberry-pi-voice-recognition-works-like-siri/
4) Jasper - Control anything with your voice, http://jasperproject.github.io/
5) eSpeak - http://espeak.sourceforge.net/