

Piri
A Raspberry Pi Speech Recognition System

Made by:
Sujit Royal (201201216)
Tanya Shah (201201217)
Ajay Gaur (201201218)
Miten Shah (201201219)
Khyati Vaghamshi (201201220)

HARDWARE REQUIREMENTS

Raspberry Pi
SD Card
USB keyboard
USB mouse
USB headset with microphone
USB hub
Monitor
Breadboard
Power Supply
Cables
LEDs
Internet connectivity: LAN cable

SPECIFICATIONS
Raspberry Pi Model B

Processor - Broadcom BCM2835
Processor core - ARM1176JZF-S CPU / VideoCore IV GPU
RAM - 512 MB
Storage - SD card
USB - 2 host ports
Video output - HDMI, composite RCA
Audio output - via HDMI, 3.5 mm audio jack
Power source - Micro USB, 5 V DC
GPIO - 8 pins
Ethernet - 10/100 Mbit/s
Clock speed - 700 MHz

RCA to HDMI cable

iBall i2025MV USB Multimedia Headphone With Mic

Headphone Driver Unit: 40 mm
Headphone Frequency Response: 20 Hz - 20,000 Hz
Headphone Sensitivity: 108 dB
Impedance: 32 ohms
Microphone Driver Unit: 9.7 mm
Microphone Sensitivity: -58 ± 2 dB
Output Power: 100 mW
Input/Output Plug: USB

APPLICATION HARDWARE SCHEMATIC

PROJECT IDEA
The user is given three options. The user may choose to:
1) Speak something and hear its translation.
2) Ask a query and receive a reply.
3) Give an order for the LEDs to glow.
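The three options above can be wired up as a simple menu dispatcher. This is an illustrative sketch, not the project's actual code; the function names are stand-ins for the modules described later in this document.

```python
# Illustrative stand-ins for the project's three modules
def handle_translation():
    return "translation"

def handle_query():
    return "query"

def handle_leds():
    return "leds"

# Map each menu choice to its module
MENU = {"1": handle_translation, "2": handle_query, "3": handle_leds}

def dispatch(choice):
    """Run the module matching the user's choice; None for unknown input."""
    action = MENU.get(choice.strip())
    return action() if action else None
```

On the Pi, a loop would read the choice from the keyboard and call dispatch() until the user quits.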

APPLICATION DESCRIPTION

Libraries to be installed

python-pip
pycurl
mplayer
flac
python2.7
libcurl
wolframalpha

APIs to be accessed
Google Speech API
Microsoft Bing Translator API
Wolfram Alpha API

Modules of the program


Speech-to-Text
Using the Google Speech API: record the speech into a FLAC file, upload it to the Google Speech engine, and receive the result back as text.
import os
import StringIO

import pycurl

filename = 'speech.flac'
key = 'APIkey'
url = ('https://www.google.com/speech-api/v2/recognize'
       '?output=json&lang=en-us&key=' + key)

# Send the FLAC file to the Google Speech API
c = pycurl.Curl()
c.setopt(pycurl.VERBOSE, 0)
c.setopt(pycurl.URL, url)
fout = StringIO.StringIO()
c.setopt(pycurl.WRITEFUNCTION, fout.write)

c.setopt(pycurl.POST, 1)
c.setopt(pycurl.HTTPHEADER, ['Content-Type: audio/x-flac; rate=16000'])

filesize = os.path.getsize(filename)
c.setopt(pycurl.POSTFIELDSIZE, filesize)
fin = open(filename, 'rb')
c.setopt(pycurl.READFUNCTION, fin.read)
c.perform()

# Receive the response back from the Google Speech API
response_code = c.getinfo(pycurl.RESPONSE_CODE)
response_data = fout.getvalue()

# Pull the transcript string out of the JSON response
start_loc = response_data.find("transcript")
tempstr = response_data[start_loc + 13:]
end_loc = tempstr.find("\"")
final_result = tempstr[:end_loc]

c.close()

# Display the recognized text
print "You said: " + final_result
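The snippet above assumes speech.flac already exists. One way to capture it on the Pi is to pipe arecord through the flac encoder; this is a sketch of one possible approach (the recording duration, sample format, and tool options are assumptions chosen to match the rate=16000 declared in the request header above), not the project's exact commands.

```python
import shutil
import subprocess

def build_record_commands(filename="speech.flac", seconds=5, rate=16000):
    """Build the arecord and flac command lines; the rate matches the
    rate=16000 declared in the Speech API request header."""
    record = ["arecord", "-d", str(seconds), "-f", "S16_LE",
              "-r", str(rate), "-t", "wav", "-"]
    encode = ["flac", "-", "-f", "-o", filename]
    return record, encode

def record_flac(filename="speech.flac", seconds=5):
    """Record from the default microphone into a FLAC file."""
    record, encode = build_record_commands(filename, seconds)
    if not (shutil.which("arecord") and shutil.which("flac")):
        raise RuntimeError("arecord and flac must be installed")
    # Pipe raw WAV from the microphone straight into the FLAC encoder
    rec = subprocess.Popen(record, stdout=subprocess.PIPE)
    subprocess.check_call(encode, stdin=rec.stdout)
    rec.stdout.close()
    rec.wait()
```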

Translation and Text-to-Speech

Using the Microsoft Bing Translator API and the Google Speech engine: the recognized text is uploaded to the Bing translator engine with the origin and destination languages passed as arguments. The original text and the translated text are then sent to the Google Speech engine to be converted into speech, and both sounds are played with the Mplayer libraries.
import json
import subprocess
import urllib

import requests

text = args.text_to_translate
origin_language = args.origin_language
destination_language = args.destination_language

def speakOriginText(phrase):
    googleSpeechURL = ("http://translate.google.com/translate_tts?tl=" +
                       origin_language + "&q=" + phrase)
    subprocess.call(["mplayer", googleSpeechURL], shell=False,
                    stdout=subprocess.PIPE, stderr=subprocess.PIPE)

def speakDestinationText(phrase):
    googleSpeechURL = ("http://translate.google.com/translate_tts?tl=" +
                       destination_language + "&q=" + phrase)
    print googleSpeechURL
    subprocess.call(["mplayer", googleSpeechURL], shell=False,
                    stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# Request an OAuth access token for the Bing Translator API
oauth_args = {
    'client_id': 'ClientID',
    'client_secret': 'APIkey',
    'scope': 'http://api.microsofttranslator.com',
    'grant_type': 'client_credentials'
}
oauth_url = 'https://datamarket.accesscontrol.windows.net/v2/OAuth2-13'
oauth_junk = json.loads(
    requests.post(oauth_url, data=urllib.urlencode(oauth_args)).content)

translation_args = {
    'text': text,
    'to': destination_language,
    'from': origin_language
}
headers = {'Authorization': 'Bearer ' + oauth_junk['access_token']}
translation_url = 'http://api.microsofttranslator.com/V2/Ajax.svc/Translate?'
translation_result = requests.get(
    translation_url + urllib.urlencode(translation_args), headers=headers)
# Strip the BOM and surrounding quotes from the returned string
translation = translation_result.text[2:-1]

speakOriginText('Translating ' + translation_args["text"])
speakDestinationText(translation)
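The translation snippet reads text_to_translate and the two language codes from an args object that the excerpt never defines. A plausible reconstruction with argparse is shown below; the positional-argument names come from the attribute accesses in the snippet, while everything else is an assumption.

```python
import argparse

def build_parser():
    # Hypothetical parser matching the attribute names used in the snippet
    parser = argparse.ArgumentParser(
        description="Speak a translation of the given text")
    parser.add_argument("text_to_translate")
    parser.add_argument("origin_language",
                        help="source language code, e.g. 'en'")
    parser.add_argument("destination_language",
                        help="target language code, e.g. 'fr'")
    return parser

# Example invocation: translate.py "hello world" en fr
args = build_parser().parse_args(["hello world", "en", "fr"])
```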

Query Processing
Using Wolfram Alpha: Wolfram Alpha is a popular computational engine that answers natural-language queries; for example, it replies with the current time when asked "What time is it?". Here we take a query, send it to Wolfram Alpha for processing, and display the reply.
import sys

import wolframalpha

app_id = 'APIkey'
client = wolframalpha.Client(app_id)

query = ' '.join(sys.argv[1:])
res = client.query(query)

if len(res.pods) > 0:
    pod = res.pods[1]
    if pod.text:
        texts = pod.text
    else:
        texts = "I have no answer for that"
    texts = texts.encode('ascii', 'ignore')
    print texts
else:
    print "Sorry, I am not sure."

Glowing LEDs
Based on the user's intention, we toggle the LEDs.
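A minimal sketch of how a spoken order could be mapped to a GPIO pin, assuming one LED per pin; the pin numbers and command words are illustrative assumptions, not the project's actual wiring. The parser is kept separate from the hardware call so it can be exercised without a Pi.

```python
# Illustrative BCM pin assignment for the LEDs (an assumption)
LED_PINS = {"red": 17, "green": 22}

def parse_led_command(text):
    """Map recognized speech such as 'turn on the red LED' to a
    (pin, state) pair, or None when no LED order is found."""
    words = text.lower().split()
    state = True if "on" in words else False if "off" in words else None
    if state is None:
        return None
    for colour, pin in LED_PINS.items():
        if colour in words:
            return pin, state
    return None

def apply_led_command(text):
    """Drive the pin on a real Pi; needs the RPi.GPIO library."""
    order = parse_led_command(text)
    if order is None:
        return False
    import RPi.GPIO as GPIO  # only importable on the Pi itself
    pin, state = order
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(pin, GPIO.OUT)
    GPIO.output(pin, GPIO.HIGH if state else GPIO.LOW)
    return True
```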


TEST RESULTS
Phase 1 - Testing Speech-to-Text
1) We successfully installed all the libraries and software needed.
2) We created the FLAC file to encode the speech.
3) Uploaded it to the Google Speech engine.
4) Checked whether the output text matched the speech.

PASSED.
Phase 2 - Testing Translation and Text-to-Speech
1) To translate the text into a different language we used the Microsoft Bing Translator, passing the origin and destination languages as arguments.
2) After translation, the original text obtained from speech-to-text conversion and the translated text were passed to the Google Speech engine to convert them into speech.
3) We used the Mplayer libraries to play both sounds.

PASSED.
Phase 3 - Testing Query Processing
1) We used the Wolfram Alpha API for its advanced query-handling features.
2) We passed the recognized text to it as a query.
3) It processes the query, returns the output in text form, and the reply is then converted to speech.

PASSED.
Phase 4 - Testing the Toggling of LEDs
1) Depending on the user's intention, the LEDs toggle.
PASSED.


CONTRIBUTION
Sujit Royal
1) Documentation
2) Background reading
Tanya Shah
1) Creation of hardware schematic
2) Application study and generation of requirements and specification
Ajay Gaur
1) Coding
2) Background reading
Miten Shah
1) Coding
2) Refining documentation
Khyati Vaghamshi
1) Testing of hardware and software
2) Creation of hardware schematic


REFERENCES
1) "Add the power of speech, hearing and vision to your robot", The MagPi, Issue 26, Aug 2014, pp. 18-21
2) Universal Translator - Dave Conroy, http://makezine.com/projects/universal-translator/
3) Raspberry Pi Voice Recognition - Oscar Liang, http://blog.oscarliang.net/raspberry-pi-voice-recognition-works-like-siri/
4) Jasper - Control anything with your voice, http://jasperproject.github.io/
5) eSpeak - http://espeak.sourceforge.net/
