INTERNSHIP
(PGI20P01L)
On
Voice Assistant
At
Approtech R&D Solutions Pvt Ltd
By
BALARAMAN.S
RA2432242020019
Submitted to
DEPARTMENT OF COMPUTER SCIENCE AND APPLICATIONS (MCA)
Under the guidance of
DR. N. KRISHNAMOORTHY
Assistant Professor
MASTER OF COMPUTER APPLICATIONS
SRM INSTITUTE OF SCIENCE & TECHNOLOGY
Ramapuram Campus, Chennai.
NOVEMBER 2025
INDEX
1. Abstract
2. Details about the Training
3. Project Description
4. Hardware and Software Requirements
5. Frontend Design Screenshots
6. Backend Coding
7. Output Screenshots
8. Conclusion
9. Future Enhancement
10. References
ABSTRACT
Voice Assistant: An Overview of Conversational AI
Voice assistants have rapidly transformed how we interact with technology,
moving beyond traditional interfaces to offer a more intuitive and natural user
experience. At their core, these systems are a blend of speech recognition and
text-to-speech (TTS) synthesis, enabling users to interact with devices using
voice commands.
The journey begins when a user utters a command. This audio input is captured
and processed by the speech recognition module, which converts the spoken
words into textual data. This transcription allows the system to interpret the
command and decide what action to take.
After recognizing the spoken command, the assistant performs simple
predefined tasks based on the identified keywords or phrases. For instance, if a
user says "What's the time?" or "Open Google," the assistant can retrieve the
current time or launch a web browser, respectively. This approach avoids
complex language analysis and focuses on direct keyword-based execution.
Once the action is complete or information is retrieved, the result is sent to the
text-to-speech synthesis module, which converts the text response into
natural-sounding speech. The assistant then speaks the result back to the user,
completing the voice interaction loop.
This simplified structure makes voice assistants practical and efficient for basic
tasks, especially in lightweight applications where advanced natural language
understanding is not required.
DETAILS ABOUT TRAINING
ABOUT COMPANY
"APPROTECH R&D SOLUTIONS PRIVATE LIMITED" is a
relatively new company, incorporated on March 28, 2025, in India,
with its registered office in Tambaram, Tamil Nadu. It is classified as
a non-government private limited company with an authorized and
paid-up capital of ₹2.00 lakh. The company's directors are
Shanmugam Prabu and Anantharaj Mariyaselvam. This entity focuses
on professional, scientific, and technical activities, and has recently
posted job openings for roles like Full Stack Engineer and Java
Developer in Chennai.
Regarding training, a firm operating under the name "Approtech
Solutions" (based in Tirunelveli, which may or may not be directly
affiliated with "APPROTECH R&D SOLUTIONS PRIVATE LIMITED"
but appears to operate in a similar domain) lists various training
programs. These include "Implant Training," which provides exposure
to industrial setups and processes, and "Seminar," which offers
academic and professional instruction. The same firm also offers
training in areas such as Power Electronics, IT Solutions, Embedded
Systems, DSP/DIP, Java, and Dotnet, and emphasizes continuous
internal quality training sessions for its employees.
System Design
The system design of the Voice Assistant is centered around simplicity and ease
of use. It’s built to help users perform basic tasks—like checking the time,
opening a website, or getting a quick answer—just by speaking. Unlike complex
AI systems that rely heavily on Natural Language Processing (NLP), this
assistant focuses on direct voice command recognition using straightforward
keyword detection. The overall structure includes two main components: the
backend, which handles logic and processing, and the frontend, which
manages interaction and response.
Backend Design
The backend is developed in Python, using libraries like
SpeechRecognition, pyttsx3, and others to process voice input and
respond through speech.
● Voice Input (Speech Recognition): The assistant starts by listening
through the microphone. The speech_recognition library captures
the user's voice and converts it into text.
● Command Detection: Instead of interpreting natural language, the
assistant uses simple keyword-based matching. For example:
○ If the command includes the word "time", it tells the current time.
○ If the command includes "open Google", it opens the browser.
○ If the word "weather" is detected, it gives a weather update.
● Text-to-Speech (TTS): Once a response is ready, the assistant uses the
pyttsx3 library to speak the response out loud.
● Task Execution: Each recognized command is linked to a specific
function—like opening a website, checking the system time, or exiting
the assistant.
The backend is modular, making it easy to add new commands or change
existing ones. It also includes error handling to manage unrecognized input
gracefully.
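To illustrate this modularity, here is a minimal sketch (not the project's exact code, which appears in the Backend Coding section) showing how a dispatch table can map keywords to handler functions so that registering a new command takes only one new entry; the handler names here are illustrative.

import datetime
import webbrowser

def tell_time():
    return datetime.datetime.now().strftime("The time is %I:%M %p")

def open_google():
    webbrowser.open("https://google.com")
    return "Opening Google."

# One keyword per handler; adding a command is a single new entry.
COMMANDS = {
    "time": tell_time,
    "open google": open_google,
}

def dispatch(command_text):
    for keyword, handler in COMMANDS.items():
        if keyword in command_text:
            return handler()
    return "Sorry, I didn't understand that."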
Frontend Design
The frontend is voice-based and console-driven, offering a clean and minimal
interface.
● Users speak directly into the microphone—there’s no need to type.
● The assistant responds with spoken feedback, creating a hands-free
experience.
● For debugging or visual confirmation, the console displays messages like
"Listening..." or "You said: open YouTube."
While there’s no graphical interface for now, the design is clean and intuitive. A
GUI can be added later if needed for things like customizing commands or
viewing history.
Overall System Design
The voice assistant is designed to be:
● Simple – It avoids unnecessary complexity and focuses on what’s
essential.
● Fast and Responsive – Commands are recognized and executed quickly.
● Easy to Expand – Adding new features or commands only takes a few
lines of code.
● Accessible – Voice-based interaction makes it convenient and hands-free.
This system is ideal for anyone who wants a basic personal assistant that just
works. It’s lightweight, easy to use, and a great starting point for building more
advanced features in the future.
Development Plan
The development of the Voice Assistant project is structured across a
four-week timeline, with each week focused on specific objectives to ensure
smooth progress and successful implementation. The goal is to build a
voice-controlled system capable of responding to simple voice commands using
speech recognition and text-to-speech technologies.
WEEK 1: PLANNING AND REQUIREMENTS GATHERING
In the first week, the focus is on clearly defining the purpose and functionality
of the voice assistant. This includes identifying supported features such as
fetching the time, Wikipedia search, and opening websites (Google, YouTube,
WhatsApp). The team will also finalize the tech stack, including:
● Python as the core language
● Libraries like speech_recognition, pyttsx3, sounddevice,
wikipedia, and webbrowser
● Basic error handling and voice interaction flow
This week also involves setting up the development environment and gathering
initial requirements regarding user interaction style and supported commands.
WEEK 2: BACKEND DEVELOPMENT
Week 2 is focused on implementing the core backend logic that powers the
assistant. This includes:
● Capturing microphone input using sounddevice
● Converting audio to text using Google Speech Recognition
● Processing commands (e.g., telling time, opening websites, Wikipedia
lookup)
● Implementing logic to handle keywords like "exit" or "stop" for
graceful shutdown
● Building the text-to-speech system using pyttsx3 for natural responses
By the end of the week, the assistant should be able to process voice input and
respond appropriately based on recognized commands.
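As a checkpoint for this milestone, a condensed sketch of the capture-and-transcribe step might look like the following (the complete version appears in the Backend Coding section; the 5-second recording window is illustrative).

import sounddevice as sd
import numpy as np
import speech_recognition as sr

def transcribe(duration=5, fs=44100):
    # Record `duration` seconds of mono 16-bit audio from the default mic.
    recording = sd.rec(int(duration * fs), samplerate=fs, channels=1, dtype='int16')
    sd.wait()  # block until recording finishes
    # Wrap the raw samples for the recognizer; 2 = bytes per 16-bit sample.
    audio = sr.AudioData(np.squeeze(recording).tobytes(), fs, 2)
    try:
        return sr.Recognizer().recognize_google(audio).lower()
    except sr.UnknownValueError:
        return ""  # speech was unintelligible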
WEEK 3: USER EXPERIENCE DESIGN & COMMAND
STRUCTURE
This week focuses on refining the command structure and user interaction to
make the experience smooth and intuitive:
● Implementing a wake word system ("hey bro") for activation
● Improving handling of invalid inputs or silence
● Enhancing output clarity and tone with customized responses
● Designing fallback mechanisms when speech isn’t recognized
Optional improvements may include:
● Configurable command durations
● Background listening capability
● Logging of previous commands
This week ensures that the assistant feels responsive and user-friendly, even in
less-than-perfect conditions.
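One possible fallback mechanism, sketched here assuming the listen() and speak() helpers defined in the Backend Coding section (the retry count and prompts are illustrative):

def listen_with_retries(max_attempts=3):
    # Re-prompt the user a few times before giving up on the input.
    for _ in range(max_attempts):
        command = listen()
        if command:
            return command
        speak("Sorry, I didn't catch that. Please try again.")
    speak("I'm having trouble hearing you.")
    return ""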
WEEK 4: INTEGRATION, TESTING, AND REFINEMENT
The final week is dedicated to bringing everything together and preparing for
final delivery. Key tasks include:
● Testing all features in different environments (e.g., with varied accents or
noise levels)
● Debugging command recognition mismatches and improving accuracy
● Collecting feedback from test users and refining the responses
accordingly
● Optimizing performance for low-latency responses
If desired, documentation and packaging for deployment (e.g., as a script or
executable) will also be completed this week.
PROJECT DESCRIPTION
The Voice Assistant is a Python-based application designed to offer a simple,
voice-driven interface for executing basic computer tasks and retrieving
information. It leverages speech recognition to understand user input,
text-to-speech (TTS) for spoken responses, and integrates modules such as
Wikipedia, web browser access, and system time functions. By enabling
hands-free interaction with the system, the assistant improves accessibility and
convenience, particularly for multitasking or screen-free use.
The assistant responds to a wake word ("hey bro") and executes commands such
as checking the time, searching Wikipedia, or opening popular websites like
Google, YouTube, and WhatsApp. Built using Python and libraries such as
speech_recognition, pyttsx3, and sounddevice, the system is
lightweight and easy to run on most machines without requiring a GUI.
The project follows a structured four-week timeline, covering requirement
gathering, backend logic implementation, voice interaction design, integration,
and final testing. It serves as a foundational model for further enhancements like
weather support, chatbot integration, or smart home control.
Key Features
● Voice-controlled interface for hands-free operation.
● Speech recognition to process user commands using natural voice.
● Text-to-speech output for spoken feedback.
● Support for Wikipedia search, time reporting, and web navigation.
● Lightweight Python implementation suitable for local desktops.
● Wake-word detection system for active listening.
Benefits
● Provides a hands-free alternative to basic computer interaction.
● Simplifies information retrieval through voice commands.
● Enhances accessibility for users with limited physical input capability.
● Promotes productivity by reducing manual task switching.
● Serves as an expandable base for future voice AI projects.
● Built with open-source tools, making it easy to adapt, extend, and
integrate.
The Voice Assistant offers a functional and practical solution for users looking
to interact with their system through voice commands. With its intuitive
command structure, clear vocal responses, and essential feature set, it provides a
valuable starting point for developing more advanced conversational AI
systems. Whether used as a personal productivity tool or as a base for future
innovations, this assistant showcases how speech technologies can create
smarter and more natural user experiences.
PROJECT STRUCTURE
The Voice Assistant project is built using Python, leveraging various
open-source libraries for speech recognition, text-to-speech synthesis, and web
integration. The structure is designed to keep the core functionalities modular
and easy to extend.
Environment Setup
● Programming Language: Python 3
● Key Libraries:
○ speech_recognition for converting speech to text
○ pyttsx3 for text-to-speech output
○ wikipedia for information retrieval
○ datetime for time-based features
○ webbrowser for opening web links
○ sounddevice and numpy for capturing and processing audio
input
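Assuming a standard Python 3 installation, the third-party libraries above can be installed from PyPI (datetime and webbrowser ship with Python):

pip install SpeechRecognition pyttsx3 wikipedia sounddevice numpy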
Core Modules
● Speech Input Module
Captures audio from the microphone using sounddevice and
processes it to text with speech_recognition.
● Command Processor
Handles interpretation of commands such as checking time, searching
Wikipedia, and opening websites.
● Speech Output Module
Converts text responses back into speech using pyttsx3.
● Wake Word Detection
Listens for a predefined wake phrase (“hey bro”) before activating the
assistant.
HARDWARE AND SOFTWARE COMPONENTS
OS Name: macOS Sequoia
Version: 15.5
OS Manufacturer: Apple Inc.
System Model: MacBook Air (M2)
System Type: ARM-based system-on-a-chip (SoC)
Processor: Apple M2
Installed RAM: 8 GB
Storage: 512 GB
SOFTWARE AND DEVICE REQUIREMENTS
Software Name: Jupyter Notebook 7.2.2
Python Version: Python 3.8 or higher
Key Libraries: speech_recognition, pyttsx3, wikipedia, numpy, sounddevice, datetime, webbrowser
Operating System: macOS Ventura (or later)
Internet Connectivity: Required for the Google Speech Recognition API and Wikipedia search
Device Type: Apple MacBook with M2 chip
Processor: Apple M2 8-core CPU
RAM: 8 GB (16 GB recommended for smoother multitasking)
Storage: Minimum 256 GB SSD (more recommended for data and projects)
Additional Requirements: Built-in microphone and speakers (or external mic/headphones)
FRONTEND DESIGN SCREENSHOTS
BACKEND CODING
import speech_recognition as sr
import pyttsx3
import wikipedia
import datetime
import webbrowser
import sounddevice as sd
import numpy as np

# Initialise the text-to-speech engine and set a comfortable speaking rate.
engine = pyttsx3.init()
engine.setProperty('rate', 150)

def speak(text):
    """Print and speak a response."""
    print("Assistant:", text)
    engine.say(text)
    engine.runAndWait()

def listen(duration=7, fs=44100):
    """Record from the microphone and return the recognised text in lowercase."""
    print("Listening...")
    recording = sd.rec(int(duration * fs), samplerate=fs, channels=1, dtype='int16')
    sd.wait()
    audio_data = np.squeeze(recording)
    # 2 = sample width in bytes (16-bit audio).
    audio = sr.AudioData(audio_data.tobytes(), fs, 2)
    recognizer = sr.Recognizer()
    try:
        command = recognizer.recognize_google(audio)
        print("You:", command)
        return command.lower().strip()
    except sr.UnknownValueError:
        return ""
    except sr.RequestError:
        speak("Speech recognition service is unavailable.")
        return ""

def process_command(command):
    """Execute one command; return False to stop the main loop."""
    if not command:
        # Silence or unrecognised speech ends the session.
        speak("Closing.")
        return False
    if any(word in command for word in ["stop", "exit", "bye", "thank you"]):
        speak("Goodbye!")
        return False
    elif "time" in command:
        now = datetime.datetime.now().strftime("%I:%M %p")
        speak(f"The current time is {now}")
    elif "wikipedia" in command:
        topic = command.replace("wikipedia", "").strip()
        if topic:
            try:
                summary = wikipedia.summary(topic, sentences=2)
                speak(summary)
            except Exception:
                speak("Sorry, I couldn't find anything on Wikipedia.")
        else:
            speak("Please say a topic to search on Wikipedia.")
    elif "open youtube" in command:
        speak("Opening YouTube.")
        webbrowser.open("https://youtube.com")
    elif "open google" in command:
        speak("Opening Google.")
        webbrowser.open("https://google.com")
    elif "open whatsapp" in command:
        speak("Opening WhatsApp.")
        webbrowser.open("https://web.whatsapp.com")
    else:
        speak("Sorry, I didn't understand that.")
    return True

# Wait for the wake word, then loop over commands until the user exits.
wake_word = listen()
if "hey bro" in wake_word:
    speak("Yes, I am listening.")
    while True:
        command = listen()
        if not process_command(command):
            break
else:
    speak("Closing.")
OUTPUT SCREENSHOTS
CONCLUSION
The internship provided an excellent opportunity to gain hands-on experience in
developing intelligent voice-enabled applications using Python. Throughout the
project, I worked on building a simple yet functional Voice Assistant that
leverages key technologies such as speech recognition, text-to-speech synthesis,
and Wikipedia integration to perform basic user commands.
By implementing this project, I deepened my understanding of how voice
interfaces work and how audio data is captured, processed, and interpreted in
real time. I also gained practical experience with Python libraries like
speech_recognition, pyttsx3, wikipedia, and sounddevice,
while learning to handle common issues such as unclear inputs, API errors, and
system integration.
The project taught me the importance of clean code structure, exception
handling, and user-centric interaction design. In addition, testing on real
hardware (MacBook with M2 chip) gave insights into optimizing applications
for cross-platform compatibility and hardware efficiency.
Overall, this internship has enhanced both my technical and problem-solving
skills, and has provided a strong foundation for pursuing more advanced
projects in Conversational AI and Voice User Interfaces (VUIs). It was a
valuable step toward building intelligent, voice-driven systems that are
becoming increasingly important in today’s digital landscape.
FUTURE ENHANCEMENT
The Voice Assistant developed during this internship serves as a foundational
prototype with essential features like time queries, Wikipedia searches, and web
navigation. However, there are several opportunities for future enhancement
that can transform it into a more intelligent and versatile system:
● Wake Word Integration with Continuous Listening: Implementing a
real-time wake-word detection system to keep the assistant active without
manual triggers, similar to commercial assistants like Siri or Alexa (a
minimal sketch of this idea follows this list).
● Natural Language Understanding (NLU): Enhancing the assistant’s
ability to understand more complex or conversational queries by
integrating Natural Language Processing frameworks such as spaCy or
Rasa.
● Task Automation: Adding features like voice-controlled file management,
calendar events, reminders, or controlling smart home devices using APIs
and IoT integration.
● Multilingual Support: Expanding support for multiple regional languages
to make the assistant accessible to a broader audience.
● Mobile or Web Deployment: Converting the desktop-based prototype into
a mobile app or web-based assistant using platforms like Flask for
backend and React Native for cross-platform frontend.
● Emotion Detection: Integrating sentiment or emotion analysis based on
voice tone to provide more empathetic responses.
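As a starting point for the continuous-listening idea above, the following minimal sketch reuses the listen(), speak(), and process_command() functions from the Backend Coding section; the 3-second idle window is an assumption, and a production system would use a dedicated wake-word engine rather than repeatedly polling the speech API.

def run_forever(wake_word="hey bro"):
    while True:
        heard = listen(duration=3)  # short listening window while idle
        if wake_word in heard:
            speak("Yes, I am listening.")
            while process_command(listen()):
                pass  # keep handling commands until an exit word is heard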
These enhancements open up possibilities for building a full-fledged
Conversational AI platform suitable for real-world applications in personal
productivity, accessibility tools, and enterprise automation. The experience
gained during this internship lays a strong foundation for exploring these
advanced concepts in future projects or professional roles.
REFERENCES
1. SpeechRecognition Library Documentation
https://pypi.org/project/SpeechRecognition
2. pyttsx3 Text-to-Speech Library
https://pyttsx3.readthedocs.io/en/latest/
3. Wikipedia Python API Documentation
https://wikipedia.readthedocs.io/en/latest/
4. webbrowser Module – Python Standard Library
https://docs.python.org/3/library/webbrowser.html
5. NumPy for Audio Processing
https://numpy.org/doc/stable/
6. SoundDevice Library Documentation
https://python-sounddevice.readthedocs.io/
7. datetime Module – Python Standard Library
https://docs.python.org/3/library/datetime.html
8. Official Python Documentation
https://docs.python.org/3/