Voice Assistant Notepad
https://doi.org/10.22214/ijraset.2023.50278
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
Abstract: The notepad is built on real-time speech-to-text conversion, which transcribes spoken words into text as the user pronounces them. We developed a real-time speech recognition system and tested it in ordinary surroundings. The system has two parts: the first processes the acoustic signal captured by a microphone, and the second interprets the processed signal and translates it into words. Our aim was a voice recognition system that is reliable, inexpensive, and efficient. The system lets users take notes faster, which increases productivity and helps them maintain a good work-life balance. It helps people of all age groups: children who find writing difficult can take down notes, adults can note down paragraphs or important points more easily, and elderly people who find typing hard can make use of the technology. The software was created using an object-oriented analysis and design methodology, and it performs speech recognition by detecting and capturing audio through the device's microphone. Among other advantages, the proposed system reduced note-taking time by more than 50%, depending on the user's speed. We undertook the project to address the current problems with note taking.
I. INTRODUCTION
People record important points from day-to-day activities and observations, which makes notes an essential part of any data documentation. Most people now use technology to take notes more efficiently, and how convenient the software is to use determines how easy it is to take notes.
The earliest voice recognition systems focused on numbers rather than words. In 1952, Bell Laboratories created the "Audrey" system, which could recognise digits spoken aloud by a single voice. Several years later, IBM released the "Shoebox", which understood and responded to 16 English words. This work led to the development of speech-to-text technology, which identifies the words spoken. To extract the speech from raw microphone input, it uses speech processing techniques.
To convert raw input audio into words, such a system typically includes a microphone, a processor, and an application capable of advanced speech recognition. Using this technique, the words are extracted from the input and information about their features is collected to identify them; a monitor then displays the processed result.
Effective implementations of such systems have made note taking quicker and simpler, letting users record information at a much faster rate and supporting a productive workflow.
B. Audio Capture
Audio capture is the first stage: a microphone records the user's speech. The key steps in audio pre-processing are producing a basic version of the audio signal, removing noise, and enhancing the key features. Pre-processing is typically done by filtering the audio.
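The pre-processing steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes the captured audio is already available as a list of floating-point samples, reads "a basic version of the audio signal" as a zero-centred signal (DC offset removed), uses a simple moving-average low-pass filter for noise reduction, and uses pre-emphasis, y[n] = x[n] - alpha * x[n-1], to enhance higher-frequency features. The function names and the default values (window = 3, alpha = 0.97, a commonly used pre-emphasis coefficient) are hypothetical.

```python
from typing import List


def remove_dc_offset(samples: List[float]) -> List[float]:
    """Subtract the mean so the signal is centred on zero (assumed 'basic version')."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def moving_average(samples: List[float], window: int = 3) -> List[float]:
    """Low-pass filter: each output is the mean of up to the last `window` inputs."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[float] = []
    acc = 0.0
    for i, s in enumerate(samples):
        acc += s
        if i >= window:
            acc -= samples[i - window]  # drop the sample that left the window
        out.append(acc / min(i + 1, window))
    return out


def pre_emphasis(samples: List[float], alpha: float = 0.97) -> List[float]:
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1] for n in range(1, len(samples))]


def preprocess(samples: List[float]) -> List[float]:
    """Hypothetical pipeline: centre the signal, smooth out noise, then enhance features."""
    return pre_emphasis(moving_average(remove_dc_offset(samples)))
```

A moving average is used here only because it is the simplest filter to show; a real system would more likely use a band-pass filter tuned to the speech frequency range.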
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 1037
C. Feature Extraction
The feature extraction step comes next. It performs a number of tasks, including scaling the audio to a workable range and converting the speech into a set of feature objects.
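The scaling and "set of objects" described above can be illustrated with a short Python sketch. It is an assumption-laden example, not the project's code: scaling is modelled as peak normalisation, and each object is a hypothetical `Frame` holding two simple, widely used short-time features (energy and zero-crossing rate). The frame length of 400 samples and hop of 160 samples (25 ms and 10 ms at an assumed 16 kHz sampling rate) are illustrative choices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Hypothetical feature object for one short frame of audio."""
    start: int     # index of the first sample in the frame
    energy: float  # mean squared amplitude of the frame
    zcr: float     # zero-crossing rate (sign changes per sample)


def normalise(samples: List[float]) -> List[float]:
    """Scale the signal so its largest absolute sample is 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    return list(samples) if peak == 0 else [s / peak for s in samples]


def extract_features(samples: List[float], frame_len: int = 400, hop: int = 160) -> List[Frame]:
    """Split the normalised signal into overlapping frames and compute features per frame."""
    if frame_len < 1 or hop < 1:
        raise ValueError("frame_len and hop must be positive")
    x = normalise(samples)
    frames: List[Frame] = []
    for start in range(0, len(x) - frame_len + 1, hop):
        chunk = x[start:start + frame_len]
        energy = sum(s * s for s in chunk) / frame_len
        # Count adjacent sample pairs whose signs differ.
        crossings = sum(1 for a, b in zip(chunk, chunk[1:]) if (a >= 0) != (b >= 0))
        frames.append(Frame(start, energy, crossings / frame_len))
    return frames
```

Production recognisers usually extract richer features such as MFCCs; energy and zero-crossing rate are used here only because they fit in a few lines.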
D. Feature Segmentation
Feature segmentation separates the recognised stream into smaller units, such as phrases, words, and individual letters. Its goal is to break the audio, viewed as a string of letters, into smaller objects that each represent a single symbol.[6]
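The paper describes segmentation in terms of letters and symbols. For raw audio, a common and simple way to obtain smaller units is energy-based segmentation: speech is split wherever the short-time energy drops below a threshold (silence). The Python sketch below shows that swapped-in technique, not the authors' method. It assumes a list of per-frame energy values is already available; the `threshold` and `min_len` parameters are hypothetical and would need tuning.

```python
from typing import List, Optional, Tuple


def segment(energies: List[float], threshold: float, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where energy stays above threshold.

    Runs shorter than `min_len` frames are discarded as likely noise bursts.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, e in enumerate(energies):
        if e > threshold and start is None:
            start = i  # a voiced run begins
        elif e <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None  # back to silence
    # Close a run that continues to the last frame.
    if start is not None and len(energies) - start >= min_len:
        segments.append((start, len(energies)))
    return segments
```

For example, `segment([0.0, 5.0, 6.0, 0.0, 0.0, 7.0, 0.0, 8.0, 9.0], threshold=1.0, min_len=2)` keeps the two longer voiced runs and drops the single-frame burst at index 5.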
E. Feature Classification
Feature classification extracts letters from a given audio sample, identifies them, and converts them into readable text in a standard data representation or another machine-editable format. In other words, it assigns each extracted unit to one of an established set of letter classes.
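Assigning each unit to an established class can be illustrated with nearest-neighbour template matching, one of the simplest classifiers. The Python sketch below is hypothetical and is not the classifier the project uses (the paper does not name one): it assumes each segment has already been reduced to a fixed-length feature vector, and that a dictionary of labelled example vectors (`templates`) is available.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(features: Sequence[float], templates: Dict[str, List[Sequence[float]]]) -> str:
    """Return the label whose closest stored example is nearest to `features`."""
    best_label, best_dist = None, math.inf
    for label, examples in templates.items():
        for example in examples:
            d = euclidean(features, example)
            if d < best_dist:
                best_label, best_dist = label, d
    if best_label is None:
        raise ValueError("no templates supplied")
    return best_label


def transcribe(segment_features: List[Sequence[float]], templates: Dict[str, List[Sequence[float]]]) -> str:
    """Classify every segment in order and join the labels into readable text."""
    return "".join(classify(f, templates) for f in segment_features)
```

Modern speech recognisers replace template matching with statistical or neural models, but the input and output are the same: feature vectors in, text labels out.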
The flowchart shows how the speech recognition function used in this project works. Using it as a base, the system's front end was then developed both as a web application and as a mobile application. The audio is first acquired from the device's microphone, then converted step by step into editable text that is displayed to the user.
V. OVERVIEW OF TECHNOLOGIES
A. Hardware Technologies
1) Microphone: A microphone is used in the audio recording phase to capture the user's speech.
2) System: The physical processing device hosts the application and applies the filtering algorithms. In this project, both a computer and a smartphone are used to run the application.
B. Software Technologies
1) In Computer
a) Speech Recognition Software: The speech recognition component extracts the required audio sample from the raw audio input.
b) HTML: HTML (HyperText Markup Language) is the standard markup language for creating web pages. It describes the structure of a web page as a set of elements, which tell the browser how to display the content.
c) CSS: CSS (Cascading Style Sheets) specifies how HTML elements are displayed on screen, on paper, or in other media. It saves considerable effort because a single stylesheet can control the layout of many web pages at once. External stylesheets are stored in CSS files.
VI. IMPLEMENTATION
A. In Computer Application
The user first opens the application and selects the required language, as shown.
The user then grants microphone permission and clicks the start recording button to take notes by speech. If the text is not wanted, clicking Clear discards it; clicking the Download button saves the note as a text file in the desired location.
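The record, Clear, and Download behaviour described above can be modelled with a small note buffer. This Python sketch is hypothetical: the class and method names are illustrative and do not come from the project, and it writes the note to a local file, whereas the browser version would typically trigger a file download instead.

```python
from pathlib import Path
from typing import List, Union


class NoteBuffer:
    """Hypothetical in-memory note mirroring the record, Clear and Download actions."""

    def __init__(self) -> None:
        self._parts: List[str] = []

    def add_transcript(self, text: str) -> None:
        """Append one recognised chunk of speech, ignoring empty results."""
        text = text.strip()
        if text:
            self._parts.append(text)

    @property
    def text(self) -> str:
        """The full note as a single line of space-separated chunks."""
        return " ".join(self._parts)

    def clear(self) -> None:
        """Discard the current note (the Clear button)."""
        self._parts.clear()

    def save(self, directory: Union[str, Path], filename: str = "note.txt") -> Path:
        """Write the note as UTF-8 text (the Download button) and return its path."""
        path = Path(directory) / filename
        path.write_text(self.text + "\n", encoding="utf-8")
        return path
```

Keeping recognised chunks in a list rather than one string makes it easy to add an "undo last phrase" action later, if that were wanted.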
B. In Smartphone
In the Android application, the user taps the mic icon and speaks; the app listens and then displays the transcribed text, which the user can edit freely.
VII. RESULTS
To understand the problem better, we examined the note-taking procedure and measured how long it took to take notes while collecting the textual data.
After identifying the problems, we addressed them by greatly reducing the data collection time, as shown in the results table.
The time was shortened considerably, and the saving grows with the length of the input. The recognition accuracy was above 90%.
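The paper does not state how the time saving or the accuracy was computed. One common way to express both is sketched below in Python, as an assumption rather than the authors' procedure: time saving as the percentage reduction relative to manual note taking, and accuracy as word accuracy, 100 × (1 − WER), where the word error rate (WER) is the word-level edit distance between a reference transcript and the recognised text divided by the number of reference words.

```python
from typing import List


def time_reduction_pct(manual_seconds: float, voice_seconds: float) -> float:
    """Percentage of note-taking time saved relative to manual note taking."""
    if manual_seconds <= 0:
        raise ValueError("manual_seconds must be positive")
    return 100.0 * (manual_seconds - voice_seconds) / manual_seconds


def word_errors(reference: List[str], hypothesis: List[str]) -> int:
    """Minimum substitutions + deletions + insertions (word-level Levenshtein distance)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of an extra word
                          prev[j - 1] + cost)  # substitution, or match if cost is 0
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy in percent, 100 * (1 - WER); can go negative if there are many insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * (1 - word_errors(ref, hyp) / len(ref))
```

With hypothetical timings of 120 s to type a note and 50 s to dictate it, `time_reduction_pct(120, 50)` gives about 58.3%, and one misrecognised word in a five-word reference gives a word accuracy of 80%.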
VIII. CONCLUSION
This study was motivated by the difficulties of data entry in notepads. Its main objective was to create an Android application that uses an automatic voice assistant to manage notes.
We proposed a complete method to address the difficulties faced during this textual data collection procedure.
The completed system offers the following advantages:
1) Data collection is digital, which reduces the need to maintain physical notebooks or records.
2) Note taking is faster, which shortens the time required.
3) Note data is documented thoroughly.
4) Information is easy to back up and share.
5) Information is shared with the user in real time.
6) Recorded data is easier to examine.
REFERENCES
[1] Nikhil Jain, Manya Goyal, Agravi Gupta, Vivek Kumar, "Speech to text conversion for using sentiment analysis," v-3, June 2021.
[2] "Android Studio software development kit tutorial," Tutorialspoint.
[3] Pranab Das, "Voice Recognition System," ResearchGate, Nov. 2015.
[4] "JavaScript speech recognition," GeeksforGeeks (geeksforgeeks.com).
[5] Arbana Kadriu, "Automatic Speech Recognition Survey," 2020.
[6] "HTML, CSS, JS basics," w3schools.