Mini Project

The document outlines the development of Jarvis, an AI-based voice assistant designed for WhatsApp automation, focusing on enhancing user productivity through voice commands and facial recognition for security. It includes a literature survey on advancements in voice assistant technology and compares existing systems like Amazon Alexa and Google Assistant. The project aims to create a cost-effective, multi-functional tool that improves accessibility and efficiency in digital communication.

Rayat Shikshan Sanstha’s

Karmaveer Bhaurao Patil College of Engineering, Satara


Department: Computer Science & Engineering
Academic Year: 2024-25 Semester-V

Under the Guidance of


Prof. Anuja Jadhav

by
1. Arundhati Avinash Gadekar (22)
2. Awantika Satish Karpe (20)
3. Karan Vishnu Gite (52)
4. Suraj Sanjay Patil (53)
Motivation
The development of Jarvis is driven by the need for efficient, voice-controlled
systems that simplify tasks and boost productivity. By automating tasks like
opening apps, sending messages, and browsing the web using voice
commands, Jarvis saves time and effort. It also aims to be a cost-effective
alternative to commercial assistants like Siri or Alexa, using free APIs and
open-source tools. The inclusion of facial recognition adds an extra layer of
security, making it both user-friendly and secure.

Case Study

1. Personal Productivity: A software developer uses Jarvis to multitask by
opening apps, sending WhatsApp messages, and browsing the web through
voice commands, increasing efficiency.
2. Accessibility: For visually impaired users, Jarvis offers hands-free
interaction with devices using voice commands, making tasks like messaging
and browsing easier.
3. Customer Service Automation: A small business owner uses Jarvis to
handle customer inquiries and automate WhatsApp messages, improving
business efficiency without high costs.
Introduction
In the rapidly evolving digital communication landscape, automation tools are
becoming essential for enhancing user efficiency. This mini project focuses on
developing an AI-based voice assistant specifically for WhatsApp automation,
allowing users to perform various tasks through simple voice commands.

A standout feature of this voice assistant is the integration of facial recognition
technology for secure authentication. Additionally, the assistant utilizes
natural language processing powered by ChatGPT, enabling it to understand
and respond to user queries conversationally. This feature enhances user
experience by providing intuitive support and facilitating seamless
communication.

Overall, this project represents a significant innovation in automating and
securing WhatsApp interactions, making digital communication more efficient
and user-friendly.
Literature Survey
Voice assistants have evolved rapidly due to advancements in artificial
intelligence (AI) and natural language processing (NLP). Early developments
such as IBM’s Shoebox (1961) and Dragon Dictate (1990s) paved the way
for modern AI-based assistants like Siri (Apple, 2011), Google Assistant
(2016), and Amazon Alexa (2014). These systems rely on speech recognition
and natural language understanding to interpret and respond to user
commands.

Speech Recognition Technologies: Systems like Google’s Speech-to-Text
API and Microsoft’s Azure Speech Services have improved accuracy using
deep learning models like Deep Neural Networks (DNNs) and Recurrent
Neural Networks (RNNs). Research has shown that integrating such models
with language models like Transformer-based architectures (e.g., BERT,
GPT) enhances understanding and response quality.

Text-to-Speech (TTS) has evolved from early concatenative synthesis to
modern neural TTS models such as WaveNet, which delivers more
natural-sounding speech. These innovations help bridge human-computer
interaction by producing highly intelligible and natural voice output.

Face Recognition for authentication has gained traction in securing voice
assistants, using technologies like convolutional neural networks (CNNs)
and feature-matching techniques for robust user identification. This helps
personalize and secure interactions.
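
The feature-matching step mentioned above can be illustrated with a minimal sketch. It assumes a CNN has already turned each face image into a fixed-length embedding vector; the function names and the 0.6 threshold are hypothetical choices for illustration, not details taken from the project.

```python
import math

# Hypothetical threshold; a real system would tune it on validation data.
MATCH_THRESHOLD = 0.6


def euclidean_distance(a, b):
    """Return the Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_authorized(candidate, enrolled, threshold=MATCH_THRESHOLD):
    """True if the candidate embedding is close enough to any enrolled one."""
    return any(euclidean_distance(candidate, e) <= threshold for e in enrolled)
```

A smaller threshold rejects more impostors but also more genuine users, so the value is a trade-off rather than a constant.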

Recent studies focus on improving conversational AI through models like
GPT-3 and open-source alternatives such as DialoGPT, emphasizing their
role in making interactions more dynamic and human-like.
Existing Systems
Amazon Alexa
• Description: Amazon Alexa is a cloud-based voice service available on
Amazon Echo and other Alexa-enabled devices. It can perform tasks such as
controlling lights, adjusting thermostats, and managing entertainment
systems.
• Capabilities: Voice recognition, natural language understanding, and
integration with numerous third-party smart devices.

Google Assistant
• Description: Google Assistant is an AI-powered virtual assistant
developed by Google. It is available on smartphones, smart speakers, and
other connected devices.
• Capabilities: Advanced speech recognition, natural language processing,
and integration with Google services and third-party devices.

Apple Siri
• Description: Siri is Apple’s voice-activated assistant available on
Apple devices, including iPhones, iPads, and Macs.
• Capabilities: Voice command recognition, natural language
understanding, and seamless integration with Apple’s ecosystem.
Objectives

• Develop a functional AI-based voice assistant that can execute
commands, including converting text to speech, recognizing speech, and
performing actions like opening applications and websites.
• Implement WhatsApp automation to send messages programmatically based
on voice commands.
• Integrate a free ChatGPT API alternative to provide conversational AI
capabilities without incurring high costs.
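
As a rough illustration of the WhatsApp objective, the sketch below turns a transcribed command into a WhatsApp click-to-chat link (`https://wa.me/<number>?text=<message>`). The command phrasing, the `CONTACTS` table, and the function name are assumptions made for this example. A click-to-chat link only opens the chat with the message prefilled, so the project may well use a different mechanism to send messages.

```python
import re
from urllib.parse import quote

# Hypothetical contact book; the number is a placeholder in international format.
CONTACTS = {"alice": "15551234567"}

# Assumed phrasing: "send (a) message to <name> saying <text>".
COMMAND_PATTERN = re.compile(
    r"send (?:a )?message to (?P<name>\w+) saying (?P<text>.+)", re.IGNORECASE
)


def build_whatsapp_link(command, contacts=CONTACTS):
    """Turn a transcribed command into a wa.me click-to-chat URL, or None."""
    match = COMMAND_PATTERN.search(command)
    if match is None:
        return None  # not a messaging command
    number = contacts.get(match.group("name").lower())
    if number is None:
        return None  # unknown contact
    message = quote(match.group("text").strip())
    return f"https://wa.me/{number}?text={message}"
```

The returned URL could then be opened with the standard-library `webbrowser.open`, which hands it to the default browser or the installed WhatsApp client.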
Proposed System

Spoken Query
  -> Audio Classification (detect wake words)
  -> Speech-to-Text (transcribe query)
  -> Language Model (generate response)
  -> Text-to-Speech (synthesize speech)
  -> Spoken Answer

Legend: each stage runs either on device or on the cloud.
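
The flow from spoken query to spoken answer can be sketched as a chain of four pluggable stages. Every stage is passed in as a callable, so this example makes no claim about which speech or language libraries the project uses; `run_pipeline` and its parameter names are hypothetical.

```python
from typing import Callable, Optional


def run_pipeline(
    audio: bytes,
    detect_wake_word: Callable[[bytes], bool],
    transcribe: Callable[[bytes], str],
    generate_response: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Optional[bytes]:
    """Pass one audio clip through the four stages; None if no wake word."""
    if not detect_wake_word(audio):
        return None  # ignore audio that does not address the assistant
    query = transcribe(audio)          # speech-to-text
    answer = generate_response(query)  # language model
    return synthesize(answer)          # text-to-speech
```

Keeping the stages as separate callables makes it easy to run some on device and others on the cloud, as the legend suggests.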
[Figure: Frontend view of the voice assistant]

System Architecture
• User Interface (UI): voice and text inputs
• Speech Recognition Module: speech-to-text
• Command Processing Module: logic and task execution
• Text-to-Speech (TTS)
• Application and Website Control
• WhatsApp Automation
• Face Authentication
• ChatGPT API Alternative: conversational AI
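
One way the Command Processing Module could route a transcribed command to the other components is sketched below. The routing rules, the `SITES` table, and the action names are hypothetical; the slides do not specify how the module decides.

```python
# Hypothetical site table; extend as needed.
SITES = {
    "youtube": "https://www.youtube.com",
    "google": "https://www.google.com",
}


def process_command(text, authenticated, chat_fallback):
    """Route a transcribed command to an (action, payload) pair."""
    if not authenticated:
        # Face authentication gates every other action.
        return ("deny", "Face authentication required.")
    words = text.lower().split()
    if len(words) >= 2 and words[0] == "open":
        target = " ".join(words[1:])
        if target in SITES:
            return ("open_site", SITES[target])
        return ("open_app", target)
    if "message" in words:
        return ("whatsapp", text)
    # Anything else goes to the conversational model.
    return ("chat", chat_fallback(text))
```

Returning an action and payload instead of performing the action directly keeps the routing logic easy to test without a microphone, browser, or camera.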
Advantages

• Cost-effective and easy for non-technical users to operate
• Helps blind people send messages
• Multi-functional, with 24/7 availability
• Improves response time
Limitations and Future Scope

Limitations
• Dependency on devices
• Dependency on internet
• Language

Future Scope
• Integration with additional services
• Enhanced platform services
• User personalization
• Multiple language options
Project Implementation
