
INTERNSHIP

(PGI20P01L)
On
Voice Assistant
At

Approtech R&D Solutions Pvt Ltd


By

BALARAMAN.S
RA2432242020019

Submitted to

DEPARTMENT OF COMPUTER SCIENCE AND APPLICATIONS (MCA)

Under the guidance of


DR.N.KRISHNAMOORTHY
Assistant Professor

MASTER OF COMPUTER APPLICATIONS

SRM INSTITUTE OF SCIENCE & TECHNOLOGY


Ramapuram Campus, Chennai.
NOVEMBER 2025
INDEX

S.NO  CONTENTS
1     Abstract
2     Details about the Training
3     Project Description
4     Hardware and Software Requirements
5     Frontend Design Screenshots
6     Backend Coding
7     Output Screenshots
8     Conclusion
9     Future Enhancement
10    References
ABSTRACT

Voice Assistant: An Overview of Conversational AI

Voice assistants have rapidly transformed how we interact with technology, moving beyond traditional interfaces to offer a more intuitive and natural user experience. At their core, these systems are a blend of speech recognition and text-to-speech (TTS) synthesis, enabling users to interact with devices using voice commands.

The journey begins when a user utters a command. This audio input is captured
and processed by the speech recognition module, which converts the spoken
words into textual data. This transcription allows the system to interpret the
command and decide what action to take.

After recognizing the spoken command, the assistant performs simple predefined tasks based on the identified keywords or phrases. For instance, if a user says "What's the time?" or "Open Google," the assistant can retrieve the current time or launch a web browser, respectively. This approach avoids complex language analysis and focuses on direct keyword-based execution.

Once the action is complete or information is retrieved, the result is sent to the
text-to-speech synthesis module, which converts the text response into
natural-sounding speech. The assistant then speaks the result back to the user,
completing the voice interaction loop.

This simplified structure makes voice assistants practical and efficient for basic
tasks, especially in lightweight applications where advanced natural language
understanding is not required.
DETAILS ABOUT TRAINING

ABOUT COMPANY

"APPROTECH R&D SOLUTIONS PRIVATE LIMITED" is a


relatively new company, incorporated on March 28, 2025, in India,
with its registered office in Tambaram, Tamil Nadu. It is classified as
a non-government private limited company with an authorized and
paid-up capital of ₹2.00 lakh. The company's directors are
Shanmugam Prabu and Anantharaj Mariyaselvam. This entity focuses
on professional, scientific, and technical activities, and has recently
posted job openings for roles like Full Stack Engineer and Java
Developer in Chennai.

Regarding training, one of the search results for "Approtech Solutions" (which may or may not be directly affiliated with "APPROTECH R&D SOLUTIONS PRIVATE LIMITED" but appears to operate in a similar domain) lists various training programs. These include "Implant Training," which provides exposure to industrial setups and processes, and "Seminar," which suggests academic or professional instruction. The company "Approtech Solutions" (from Tirunelveli) also offers training in areas such as Power Electronics IT Solution, Embedded Systems, DSP/DIP, Java, and Dotnet, and emphasizes continuous internal quality training sessions for its employees.
System Design

The system design of the Voice Assistant is centered around simplicity and ease
of use. It’s built to help users perform basic tasks—like checking the time,
opening a website, or getting a quick answer—just by speaking. Unlike complex
AI systems that rely heavily on Natural Language Processing (NLP), this
assistant focuses on direct voice command recognition using straightforward
keyword detection. The overall structure includes two main components: the
backend, which handles logic and processing, and the frontend, which
manages interaction and response.

Backend Design

The backend is developed in Python, using libraries like SpeechRecognition, pyttsx3, and others to process voice input and respond through speech.

● Voice Input (Speech Recognition): The assistant starts by listening through the microphone. The speech_recognition library captures the user's voice and converts it into text.

● Command Detection: Instead of interpreting natural language, the assistant uses simple keyword-based matching. For example:

  ○ If the command includes the word "time", it tells the current time.

  ○ If the command includes "open Google", it opens the browser.

  ○ If the word "weather" is detected, it gives a weather update.

● Text-to-Speech (TTS): Once a response is ready, the assistant uses the pyttsx3 library to speak the response out loud.

● Task Execution: Each recognized command is linked to a specific function, like opening a website, checking the system time, or exiting the assistant.

The backend is modular, making it easy to add new commands or change existing ones. It also includes error handling to manage unrecognized input gracefully.
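
To make that modularity concrete, here is a minimal sketch of a keyword-to-function registry; the helper names are illustrative, not the project's actual code (the full backend appears later in this report):

# A minimal sketch of keyword-based dispatch; function names here are
# illustrative, not the project's actual code.
import datetime
import webbrowser

def tell_time():
    # Report the current time in 12-hour format.
    now = datetime.datetime.now().strftime("%I:%M %p")
    print(f"The current time is {now}")

def open_google():
    webbrowser.open("https://google.com")

# Adding a new command is one new entry in this table.
COMMANDS = {
    "time": tell_time,
    "open google": open_google,
}

def dispatch(command):
    for keyword, action in COMMANDS.items():
        if keyword in command:
            action()
            return True
    return False  # caller can respond gracefully to unrecognized input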

Frontend Design

The frontend is voice-based and console-driven, offering a clean and minimal interface.

● Users speak directly into the microphone; there's no need to type.

● The assistant responds with spoken feedback, creating a hands-free experience.

● For debugging or visual confirmation, the console displays messages like "Listening..." or "You said: open YouTube."
While there’s no graphical interface for now, the design is clean and intuitive. A
GUI can be added later if needed for things like customizing commands or
viewing history.

Overall System Design

The voice assistant is designed to be:

● Simple – It avoids unnecessary complexity and focuses on what's essential.

● Fast and Responsive – Commands are recognized and executed quickly.

● Easy to Expand – Adding new features or commands only takes a few lines of code.

● Accessible – Voice-based interaction makes it convenient and hands-free.

This system is ideal for anyone who wants a basic personal assistant that just
works. It’s lightweight, easy to use, and a great starting point for building more
advanced features in the future.

Development Plan

The development of the Voice Assistant project is structured across a four-week timeline, with each week focused on specific objectives to ensure smooth progress and successful implementation. The goal is to build a voice-controlled system capable of responding to simple voice commands using speech recognition and text-to-speech technologies.

WEEK 1: PLANNING AND REQUIREMENTS GATHERING

In the first week, the focus is on clearly defining the purpose and functionality
of the voice assistant. This includes identifying supported features such as
fetching the time, Wikipedia search, and opening websites (Google, YouTube,
WhatsApp). The team will also finalize the tech stack, including:

● Python as the core language

● Libraries like speech_recognition, pyttsx3, sounddevice, wikipedia, and webbrowser

● Basic error handling and voice interaction flow

This week also involves setting up the development environment and gathering
initial requirements regarding user interaction style and supported commands.
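
As part of that setup, a short sanity check like the sketch below confirms every dependency imports cleanly; it assumes the third-party libraries were installed with pip (datetime and webbrowser ship with Python itself):

# Environment sanity check: a minimal sketch, assuming the third-party
# libraries have been installed (e.g., pip install SpeechRecognition
# pyttsx3 sounddevice wikipedia numpy).
import importlib

for name in ("speech_recognition", "pyttsx3", "sounddevice",
             "wikipedia", "numpy"):
    try:
        importlib.import_module(name)
        print(f"{name}: OK")
    except ImportError as exc:
        print(f"{name}: MISSING ({exc})")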

WEEK 2: BACKEND DEVELOPMENT

Week 2 is focused on implementing the core backend logic that powers the
assistant. This includes:

● Capturing microphone input using sounddevice

● Converting audio to text using Google Speech Recognition

● Processing commands (e.g., telling time, opening websites, Wikipedia lookup)

● Implementing logic to handle keywords like "exit" or "stop" for graceful shutdown

● Building the text-to-speech system using pyttsx3 for natural responses

By the end of the week, the assistant should be able to process voice input and
respond appropriately based on recognized commands.
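
The hand-off from sounddevice to the recognizer is the least obvious step in this pipeline, so here is a minimal sketch of just that conversion, mirroring the approach used in the Backend Coding section later in this report:

import numpy as np
import sounddevice as sd
import speech_recognition as sr

fs = 44100  # sample rate in Hz
# Record five seconds of 16-bit mono audio.
recording = sd.rec(int(5 * fs), samplerate=fs, channels=1, dtype='int16')
sd.wait()  # block until the recording finishes
raw = np.squeeze(recording).tobytes()
# Wrap the raw samples so speech_recognition can consume them;
# the final argument is the sample width in bytes (2 for int16).
audio = sr.AudioData(raw, fs, 2)
print(sr.Recognizer().recognize_google(audio))  # requires internet access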

WEEK 3: USER EXPERIENCE DESIGN & COMMAND STRUCTURE

This week focuses on refining the command structure and user interaction to
make the experience smooth and intuitive:

● Implementing a wake word system ("hey bro") for activation (a sketch of such a loop follows this list)

●​ Improving handling of invalid inputs or silence​

●​ Enhancing output clarity and tone with customized responses​

●​ Designing fallback mechanisms when speech isn’t recognized​
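
A minimal sketch of such a wake-word loop, assuming the listen(), speak(), and process_command() helpers shown in the Backend Coding section of this report; unlike the main script there, this variant prompts again on silence instead of closing:

# Wake-word loop sketch; listen(), speak(), and process_command() are
# assumed to be the helpers defined in the Backend Coding section.
WAKE_WORD = "hey bro"

def run_assistant():
    while True:
        heard = listen()  # returns "" on silence or recognition failure
        if WAKE_WORD in heard:
            speak("Yes, I am listening.")
            while True:
                command = listen()
                if not command:
                    speak("Sorry, I didn't catch that.")  # fallback response
                    continue
                if not process_command(command):
                    return  # "exit"/"stop" ends the session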


Optional improvements may include:

●​ Configurable command durations​

●​ Background listening capability​

●​ Logging of previous commands​

This week ensures that the assistant feels responsive and user-friendly, even in
less-than-perfect conditions.

WEEK 4: INTEGRATION, TESTING, AND REFINEMENT

The final week is dedicated to bringing everything together and preparing for
final delivery. Key tasks include:

● Testing all features in different environments (e.g., with varied accents or noise levels)

● Debugging command recognition mismatches and improving accuracy

● Collecting feedback from test users and refining the responses accordingly

● Optimizing performance for low-latency responses

If desired, documentation and packaging for deployment (e.g., as a script or executable) will also be completed this week.
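
Quick automated checks can also help catch regressions in the keyword matching. Below is a minimal sketch using Python's built-in unittest, assuming process_command() from the Backend Coding section; in practice the speak() helper would be stubbed out so the tests run silently:

# Minimal sketch of automated checks for the command logic, assuming
# process_command() from the Backend Coding section is importable.
import unittest

class TestCommands(unittest.TestCase):
    def test_exit_words_end_session(self):
        # process_command returns False when the session should end.
        self.assertFalse(process_command("okay stop"))

    def test_known_command_keeps_session_alive(self):
        self.assertTrue(process_command("what is the time"))

if __name__ == "__main__":
    unittest.main()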
PROJECT DESCRIPTION

The Voice Assistant is a Python-based application designed to offer a simple, voice-driven interface for executing basic computer tasks and retrieving information. It leverages speech recognition to understand user input, text-to-speech (TTS) for spoken responses, and integrates modules such as Wikipedia, web browser access, and system time functions. By enabling hands-free interaction with the system, the assistant improves accessibility and convenience, particularly for multitasking or screen-free use.

The assistant responds to a wake word ("hey bro") and executes commands such
as checking the time, searching Wikipedia, or opening popular websites like
Google, YouTube, and WhatsApp. Built using Python and libraries such as
speech_recognition, pyttsx3, and sounddevice, the system is
lightweight and easy to run on most machines without requiring a GUI.

The project follows a structured four-week timeline, covering requirement gathering, backend logic implementation, voice interaction design, integration, and final testing. It serves as a foundational model for further enhancements like weather support, chatbot integration, or smart home control.

Key Features

●​ Voice-controlled interface for hands-free operation.​

●​ Speech recognition to process user commands using natural voice.​

●​ Text-to-speech output for spoken feedback.​

●​ Support for Wikipedia search, time reporting, and web navigation.​


●​ Lightweight Python implementation suitable for local desktops.​

●​ Wake-word detection system for active listening.

Benefits

●​ Provides a hands-free alternative to basic computer interaction.​

●​ Simplifies information retrieval through voice commands.​

●​ Enhances accessibility for users with limited physical input capability.​

●​ Promotes productivity by reducing manual task switching.​

●​ Serves as an expandable base for future voice AI projects.​

●​ Built with open-source tools, making it easy to adapt, extend, and


integrate.

The Voice Assistant offers a functional and practical solution for users looking
to interact with their system through voice commands. With its intuitive
command structure, clear vocal responses, and essential feature set, it provides a
valuable starting point for developing more advanced conversational AI
systems. Whether used as a personal productivity tool or as a base for future
innovations, this assistant showcases how speech technologies can create
smarter and more natural user experiences.
PROJECT STRUCTURE

The Voice Assistant project is built using Python, leveraging various open-source libraries for speech recognition, text-to-speech synthesis, and web integration. The structure is designed to keep the core functionalities modular and easy to extend.

Environment Setup

●​ Programming Language: Python 3​

●​ Key Libraries:​

○​ speech_recognition for converting speech to text​

○​ pyttsx3 for text-to-speech output​

○​ wikipedia for information retrieval​

○​ datetime for time-based features​

○​ webbrowser for opening web links​

○ sounddevice and numpy for capturing and processing audio input
Core Modules

● Speech Input Module: Captures audio from the microphone using sounddevice and processes it to text with speech_recognition.

● Command Processor: Handles interpretation of commands such as checking time, searching Wikipedia, and opening websites.

● Speech Output Module: Converts text responses back into speech using pyttsx3.

● Wake Word Detection: Listens for a predefined wake phrase ("hey bro") before activating the assistant.

HARDWARE AND SOFTWARE COMPONENTS

OS Name: macOS Sequoia
Version: 15.5
OS Manufacturer: Apple Inc.
System Model: MacBook Air M2
System Type: ARM-based system-on-a-chip (SoC)
Processor: Apple M2
Installed RAM: 8GB
Storage: 512GB

SOFTWARE AND DEVICE REQUIREMENTS

Software Name: Jupyter Notebook 7.2.2
Python Version: Python 3.8 or higher
Key Libraries: speech_recognition, pyttsx3, wikipedia, numpy, sounddevice, datetime, webbrowser
Operating System: macOS Ventura (or later)
Internet Connectivity: Required for Google Speech Recognition API and Wikipedia search
Device Type: Apple MacBook with M2 chip
Processor: Apple M2 8-core CPU
RAM: 8GB (16GB recommended for smoother multitasking)
Storage: Minimum 256GB SSD (more recommended for data and projects)
Additional Requirements: Built-in microphone and speakers (or external mic/headphones)
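
To confirm the built-in microphone is actually visible to the audio stack, a quick check with sounddevice can be run first (a minimal sketch; device names vary by machine):

import sounddevice as sd

print(sd.query_devices())   # lists every input/output device
print(sd.default.device)    # default (input, output) device indices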

FRONTEND DESIGN SCREENSHOTS


Backend Coding
import speech_recognition as sr
import pyttsx3
import wikipedia
import datetime
import webbrowser
import sounddevice as sd
import numpy as np

# Initialize the text-to-speech engine and slow the speaking rate slightly.
engine = pyttsx3.init()
engine.setProperty('rate', 150)

def speak(text):
    """Print and speak the assistant's response."""
    print("Assistant:", text)
    engine.say(text)
    engine.runAndWait()

def listen(duration=7, fs=44100):
    """Record from the microphone and return the recognized text, lowercased."""
    print("Listening...")
    recording = sd.rec(int(duration * fs), samplerate=fs, channels=1, dtype='int16')
    sd.wait()
    audio_data = np.squeeze(recording)
    # Wrap the raw 16-bit samples so speech_recognition can consume them.
    audio = sr.AudioData(audio_data.tobytes(), fs, 2)

    recognizer = sr.Recognizer()
    try:
        command = recognizer.recognize_google(audio)
        print("You:", command)
        return command.lower().strip()
    except sr.UnknownValueError:
        # Speech was silent or unintelligible.
        return ""
    except sr.RequestError:
        speak("Speech recognition service is unavailable.")
        return ""

def process_command(command):
    """Execute one command; return False when the session should end."""
    if not command:
        speak("Closing.")
        return False

    if any(word in command for word in ["stop", "exit", "bye", "thank you"]):
        speak("Goodbye!")
        return False
    elif "time" in command:
        now = datetime.datetime.now().strftime("%I:%M %p")
        speak(f"The current time is {now}")
    elif "wikipedia" in command:
        topic = command.replace("wikipedia", "").strip()
        if topic:
            try:
                summary = wikipedia.summary(topic, sentences=2)
                speak(summary)
            except Exception:
                speak("Sorry, I couldn't find anything on Wikipedia.")
        else:
            speak("Please say a topic to search on Wikipedia.")
    elif "open youtube" in command:
        speak("Opening YouTube.")
        webbrowser.open("https://youtube.com")
    elif "open google" in command:
        speak("Opening Google.")
        webbrowser.open("https://google.com")
    elif "open whatsapp" in command:
        speak("Opening WhatsApp.")
        webbrowser.open("https://web.whatsapp.com")
    else:
        speak("Sorry, I didn't understand that.")

    return True

# Wait once for the wake phrase, then loop over commands until told to stop.
wake_word = listen()

if "hey bro" in wake_word:
    speak("Yes, I am listening.")
    while True:
        command = listen()
        if not process_command(command):
            break
else:
    speak("Closing.")
OUTPUT SCREENSHOTS
CONCLUSION

The internship provided an excellent opportunity to gain hands-on experience in developing intelligent voice-enabled applications using Python. Throughout the project, I worked on building a simple yet functional Voice Assistant that leverages key technologies such as speech recognition, text-to-speech synthesis, and Wikipedia integration to perform basic user commands.

By implementing this project, I deepened my understanding of how voice interfaces work and how audio data is captured, processed, and interpreted in real time. I also gained practical experience with Python libraries like speech_recognition, pyttsx3, wikipedia, and sounddevice, while learning to handle common issues such as unclear inputs, API errors, and system integration.

The project taught me the importance of clean code structure, exception handling, and user-centric interaction design. In addition, testing on real hardware (a MacBook with the M2 chip) gave insights into optimizing applications for cross-platform compatibility and hardware efficiency.

Overall, this internship has enhanced both my technical and problem-solving skills, and has provided a strong foundation for pursuing more advanced projects in Conversational AI and Voice User Interfaces (VUIs). It was a valuable step toward building intelligent, voice-driven systems that are becoming increasingly important in today's digital landscape.
FUTURE ENHANCEMENT

The Voice Assistant developed during this internship serves as a foundational prototype with essential features like time queries, Wikipedia searches, and web navigation. However, there are several opportunities for future enhancement that can transform it into a more intelligent and versatile system:

● Wake Word Integration with Continuous Listening: Implementing a real-time wake-word detection system to keep the assistant active without manual triggers, similar to commercial assistants like Siri or Alexa.

● Natural Language Understanding (NLU): Enhancing the assistant's ability to understand more complex or conversational queries by integrating Natural Language Processing frameworks such as spaCy or Rasa (a brief illustration follows this list).

● Task Automation: Adding features like voice-controlled file management, calendar events, reminders, or controlling smart home devices using APIs and IoT integration.

● Multilingual Support: Expanding support for multiple regional languages to make the assistant accessible to a broader audience.

● Mobile or Web Deployment: Converting the desktop-based prototype into a mobile app or web-based assistant using platforms like Flask for backend and React Native for cross-platform frontend.

● Emotion Detection: Integrating sentiment or emotion analysis based on voice tone to provide more empathetic responses.
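
As a brief illustration of the NLU direction, the hypothetical sketch below uses spaCy to pull a rough action/object structure out of an utterance. It assumes the en_core_web_sm model is installed and is not part of the current project:

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def extract_intent(utterance):
    doc = nlp(utterance)
    # Treat the first verb as the action and noun chunks as its objects;
    # a real NLU layer would be considerably more robust than this.
    action = next((tok.lemma_ for tok in doc if tok.pos_ == "VERB"), None)
    objects = [chunk.text for chunk in doc.noun_chunks]
    return action, objects

print(extract_intent("open the weather report for Chennai"))
# e.g., ('open', ['the weather report', 'Chennai'])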

These enhancements open up possibilities for building a full-fledged Conversational AI platform suitable for real-world applications in personal productivity, accessibility tools, and enterprise automation. The experience gained during this internship lays a strong foundation for exploring these advanced concepts in future projects or professional roles.
REFERENCES
1. SpeechRecognition Library Documentation: https://pypi.org/project/SpeechRecognition
2. pyttsx3 Text-to-Speech Library: https://pyttsx3.readthedocs.io/en/latest/
3. Wikipedia Python API Documentation: https://wikipedia.readthedocs.io/en/latest/
4. webbrowser Module – Python Standard Library: https://docs.python.org/3/library/webbrowser.html
5. NumPy for Audio Processing: https://numpy.org/doc/stable/
6. SoundDevice Library Documentation: https://python-sounddevice.readthedocs.io/
7. datetime Module – Python Standard Library: https://docs.python.org/3/library/datetime.html
8. Official Python Documentation: https://docs.python.org/3/
