TEXT - TO - SPEECH - CONVERSION - 22215a1211


ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING LAB Roll No: 22215A1211

Week No: 12
Title: CASE STUDY: TEXT-TO-SPEECH CONVERSION
Date: 06/02/2023

ABSTRACT:

Text-to-speech (TTS) is the generation of synthesized speech from text. The
technology is used to communicate with users when reading a screen is not possible
or is impractical. This not only opens up apps and information to new uses, but can
also make the world more accessible to people who cannot read text on a screen.
The Text-To-Speech (TTS) application converts text to speech either when the user
types the text into the text field provided, or when the user copies it from an
external document on the local machine and pastes it into that text field. It also
allows the user to browse the World Wide Web (WWW) from within the application.
TTS can read any portion of the web page the user browses: the user highlights the
portion to be read aloud and then clicks the “Play” button. TTS also lets the user
save the converted text to any location on the local machine in an audio format,
so that the audio file can be copied to any of the user's audio devices.

INTRODUCTION:

Technology has evolved so much that AI systems are being adopted in most fields
to automate tasks. Alongside this, the Python language is being used more and more
widely to build these AI systems. Python is a large language with a very large
number of functionalities, operations, and libraries. These functions and libraries
were developed so that they can be used in real-world applications.

Department of Information Technology



One such small application is the text-to-speech converter. The technology behind
TTS (text to speech) has evolved over the past few decades. Using “Natural
Language Processing”, it is now possible to produce very natural speech that
includes changes in pitch, speed, pronunciation, and inflection. The program
described here is a basic form of TTS that converts text into an audio file.
The text-to-speech (TTS) synthesis procedure consists of two main phases. The
first is text analysis, where the input text is transcribed into a phonetic or some
other linguistic representation, and the second is the generation of speech
waveforms, where the output is produced from this phonetic and prosodic
information. These two phases are usually called high-level and low-level
synthesis [1]. A simplified version of this procedure is presented in Figure 1.
The input text might be, for example, data from a word processor, standard ASCII
from e-mail, or text from a website. The character string is then pre-processed and
analyzed into a phonetic representation, which is usually a string of phonemes with
some additional information for correct intonation, duration, and stress. Finally,
the low-level synthesizer generates the speech sound from the information provided
by the high-level one. The artificial production of speech-like sounds has a long
history, with documented mechanical attempts dating to the eighteenth century.
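
The two phases can be illustrated with a toy sketch. It is not how a real
synthesizer works internally: the lexicon TOY_LEXICON, the fixed durations, and
the sine-tone "voice" are all hypothetical stand-ins, chosen only to show text
analysis feeding waveform generation.

```python
import math
import re

# Hypothetical toy lexicon: word -> phoneme symbols (not a real pronunciation dictionary).
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def high_level_synthesis(text):
    """Text analysis: turn raw text into (phoneme, duration_ms) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    phonemes = []
    for word in words:
        # Unknown words fall back to spelling out their letters.
        symbols = TOY_LEXICON.get(word, list(word.upper()))
        for symbol in symbols:
            phonemes.append((symbol, 80))  # fixed 80 ms per phoneme (assumption)
        phonemes.append(("_", 120))  # short pause after each word
    return phonemes


def low_level_synthesis(phonemes, sample_rate=8000):
    """Waveform generation: emit one placeholder sine tone per phoneme."""
    samples = []
    for symbol, duration_ms in phonemes:
        count = sample_rate * duration_ms // 1000
        if symbol == "_":
            samples.extend([0.0] * count)  # silence for the pause
            continue
        # Arbitrary pitch derived from the symbol, only so that tones differ.
        freq = 200 + 10 * (sum(map(ord, symbol)) % 40)
        samples.extend(
            math.sin(2 * math.pi * freq * n / sample_rate) for n in range(count)
        )
    return samples


phonemes = high_level_synthesis("Hello, world!")
waveform = low_level_synthesis(phonemes)
```

Here the list of (phoneme, duration) pairs plays the role of the phonetic and
prosodic information, and the list of samples plays the role of the speech
waveform.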

The libraries used in this program are:


• “Newspaper”: a module used for extracting and parsing newspaper articles.
• “NLTK”, or Natural Language Toolkit: a Python package that can be used
for Natural Language Processing (NLP).
• “gTTS”, or Google Text-to-Speech: a Python library and CLI
tool that interfaces with Google Translate’s text-to-speech API.
• “os”: a Python module that provides functions for interacting with the
operating system of the PC.

CODE:
➢ Before writing the actual code, we need to install the libraries mentioned
above. To install them, you must have a Python environment on your system,
e.g. VS Code, Anaconda, etc.


➢ In the command prompt, type “python --version”. It prints the Python version
if a Python environment is found.
➢ Now, to install the libraries, type:
“pip install <package name>”
E.g.:- “pip install opencv-python”
“pip install gTTS”
“pip3 install url3k”
“pip3 install newspaper3k”

(or)
➢ If you use Google Colab or Jupyter notebooks, you can put the install
statements directly in a cell, prefixed with “!” (for example
“!pip install gTTS”), and run the cell.
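
One way to confirm that the installation worked is to ask Python whether each
module can be found. This is a minimal sketch: find_missing is a hypothetical
helper, and the import names newspaper, nltk, and gtts are assumed to be the
names under which newspaper3k, NLTK, and gTTS are imported.

```python
import importlib.util


def find_missing(module_names):
    """Return the module names that Python cannot locate."""
    return [
        name for name in module_names if importlib.util.find_spec(name) is None
    ]


# Import names assumed for the libraries used in this case study.
required = ["newspaper", "nltk", "gtts", "os"]
missing = find_missing(required)
if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("All libraries are available.")
```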

➢ Step 1:
Import all the libraries required to convert text to speech:
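
A minimal sketch of the import cell, assuming the four libraries listed earlier.
The try/except wrapper and the LIBRARIES_READY flag are additions for
illustration; they report a missing library instead of stopping with a traceback.

```python
import os  # standard library; used later to play the audio file

try:
    import nltk                    # Natural Language Toolkit
    from newspaper import Article  # provided by the newspaper3k package
    from gtts import gTTS          # Google Text-to-Speech wrapper
    LIBRARIES_READY = True
except ImportError as error:
    LIBRARIES_READY = False
    print(f"Missing library: {error.name}. Try reinstalling it with pip.")
```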


Run the cell and check for errors. If any errors appear, try reinstalling or
updating the libraries.

➢ Step 2:
After importing all the libraries, we need to get the article from an
online source (a website), so that we can convert the text of that article
to speech.
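
A minimal sketch of this step, assuming the Article class of the newspaper3k
package. The names make_article and ARTICLE_URL are hypothetical, and the URL is
only a placeholder.

```python
def make_article(url):
    """Create a newspaper Article object for the given web address."""
    # Imported here so that a missing install fails only when the function is used.
    from newspaper import Article
    return Article(url)


# Placeholder address (assumption): replace it with the article you want to hear.
ARTICLE_URL = "https://example.com/some-news-article"
```

Calling article = make_article(ARTICLE_URL) only creates the object; nothing is
downloaded until the next step.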

➢ Step 3:
Now, let’s download and parse the article:
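
A minimal sketch, assuming article is the newspaper Article object created in
Step 2. The wrapper name download_and_parse is hypothetical; download() and
parse() are the Article methods it calls.

```python
def download_and_parse(article):
    """Fetch the page behind the article and extract its title and body text."""
    article.download()  # retrieves the raw HTML (needs an internet connection)
    article.parse()     # extracts the title, authors, and main text from the HTML
    return article
```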


➢ Step 4:
Now we need to download the “punkt” package so that we can apply NLP to the
article:
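
A minimal sketch, assuming article has already been downloaded and parsed as in
Step 3. The wrapper name apply_nlp is hypothetical. If NLTK reports a missing
resource under a different name, download the one named in the error message.

```python
def apply_nlp(article):
    """Download the punkt tokenizer data, then run newspaper's NLP step."""
    import nltk
    nltk.download("punkt")  # sentence tokenizer data used by article.nlp()
    article.nlp()           # fills in article.keywords and article.summary
    return article
```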

➢ Step 5:
Now define a variable to store the text of the article.

➢ Step 6:
Now choose the language of the speech, such as ‘en’ for English or ‘te’ for
Telugu.
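
Steps 5 and 6 together can be sketched as below, assuming article is the parsed
object from Step 3. The names prepare_input and mytext are hypothetical, and the
empty-text check is an addition for illustration.

```python
def prepare_input(article, language="en"):
    """Return the article text and the language code for the speech engine."""
    mytext = article.text.strip()  # full body text extracted during parsing
    if not mytext:
        raise ValueError("The article has no text to convert.")
    return mytext, language


# Usage: mytext, language = prepare_input(article, "te")  # 'te' selects Telugu
```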

➢ Step 7:
Now we need to pass the text and the language to the engine, which converts
the text to speech, and store the result in a variable. Set slow to ‘False’
to tell the engine that the converted audio should be read at normal speed
rather than slowly.
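
A minimal sketch of this step using the gTTS class, with its text, lang, and
slow parameters. The wrapper name build_speech is hypothetical.

```python
def build_speech(mytext, language="en"):
    """Create the gTTS object that will turn the text into speech."""
    from gtts import gTTS
    # slow=False asks for normal-speed speech; slow=True would read more slowly.
    return gTTS(text=mytext, lang=language, slow=False)
```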

RUNNING TEXT TO SPEECH:

➢ Step 8:
We have now converted the article text to speech on Windows. Next, we
have to save this speech to an mp3 file.
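
A minimal sketch, assuming speech is the gTTS object from Step 7. The wrapper
name save_speech is hypothetical; save() is the gTTS method that writes the file.

```python
def save_speech(speech, filename="read_article.mp3"):
    """Write the synthesized speech to an mp3 file and return its name."""
    # The audio is requested from Google's service at this point,
    # so an internet connection is required.
    speech.save(filename)
    return filename
```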


➢ Step 9:
Now let’s play the converted audio file on Windows, using the “start”
command followed by the name of the mp3 file.
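
A minimal sketch, assuming a Windows machine, since start is a Windows shell
command. The wrapper name play_audio is hypothetical, and the file name is
assumed to contain no spaces.

```python
import os


def play_audio(filename="read_article.mp3"):
    """Open the mp3 in the default Windows player via the `start` command."""
    # os.system returns the exit status reported by the shell.
    return os.system(f"start {filename}")
```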

OUTPUT:
The file “read_article.mp3” will be generated and saved on the system, in the
folder from which the program is run.

In Google Colab, the file is saved only temporarily, in the Files panel of the
session. We need to download it to listen to the file.
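
In Colab, the download can also be triggered from code. This is a minimal
sketch, assuming the notebook runs inside Google Colab, where the google.colab
package is available. The wrapper name download_from_colab is hypothetical.

```python
def download_from_colab(filename="read_article.mp3"):
    """Ask the browser to download a file from the Colab session."""
    from google.colab import files  # available only inside Google Colab
    files.download(filename)
    return filename
```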

CONCLUSION:
• TTS using Python is widely used in many basic-level applications. In many
AI systems, text held in memory is converted to speech using NLP. In many
apps and websites, the reverse is also done: speech is converted to text,
which makes searching for and retrieving information easier.
• Many video editors use TTS to convert written text to audio without
needing an actual person to narrate it.
• It can also be used to convert written text into different spoken
languages, so that the information can be understood by everyone
using it.

REFERENCES:
• I referred to this project on the website
“thecleverprogrammer.com”. The person who built the project is Aman
Kharwal, a data scientist and data science mentor who works on machine
learning in Python.
• The project website link is: “Text To Speech with Python | Aman Kharwal
(thecleverprogrammer.com)”.


• Note: The above project had some errors and running issues. I corrected
those mistakes myself, using other references to do so.
• Dutoit, T., 1993. High quality text-to-speech synthesis of the French
language. Doctoral dissertation, Faculte Polytechnique de Mons.
• Suendermann, D., Höge, H., and Black, A., 2010. Challenges in Speech
Synthesis. In: Chen, F., Jokinen, K. (eds.), Speech Technology, Springer
Science + Business Media LLC.
• Allen, J., Hunnicutt, M. S., and Klatt, D., 1987. From Text to Speech:
The MITalk system. Cambridge University Press.
• Rubin, P., Baer, T., and Mermelstein, P., 1981. An articulatory
synthesizer for perceptual research. Journal of the Acoustical Society of
America 70: 321–328.
• van Santen, J.P.H., Sproat, R. W., Olive, J.P., and Hirschberg, J., 1997.
Progress in Speech Synthesis. Springer.
• van Santen, J.P.H., 1994. Assignment of segmental duration in
text-to-speech synthesis. Computer Speech & Language, Volume 8, Issue 2,
pp. 95–128.
• Wasala, A., Weerasinghe, R., and Gamage, K., 2006. Sinhala
Grapheme-to-Phoneme Conversion and Rules for Schwa Epenthesis. Proceedings
of the COLING/ACL 2006 Main Conference Poster Sessions, Sydney, Australia,
pp. 890–897.
• Lamel, L.F., Gauvain, J.L., Prouts, B., Bouhier, C., and Boesch, R., 1993.
Generation and Synthesis of Broadcast Messages. Proceedings ESCA-NATO
Workshop and Applications of Speech Technology.
• van Truc, T., Le Quang, P., van Thuyen, V., Hieu, L.T., Tuan, N.M., and
Hung, P.D., 2013. Vietnamese Synthesis System. Capstone Project Document,
FPT University.
• Black, A.W., 2002. Perfect synthesis for all of the people all of the
time. IEEE TTS Workshop.
• Kominek, J., and Black, A.W., 2003. CMU ARCTIC databases for speech
synthesis. CMU-LTI-03-177. Language Technologies Institute, School of
Computer Science, Carnegie Mellon University.
• Zhang, J., 2004. Language Generation and Speech Synthesis in Dialogues for
Language Learning. Masters Dissertation, Massachusetts Institute of
Technology.
• Dutoit, T., Pagel, V., Pierret, N., Bataille, F., and van der Vrecken, O.,
1996. The MBROLA Project: Towards a set of high quality speech synthesizers
of use for non-commercial purposes. ICSLP Proceedings.
• Text-to-speech (TTS) Overview. In Voice RSS Website. Retrieved
February 21, 2014, from http://www.voicerss.org/tts/
• Text-to-speech technology. In Linguatec Language Technology Website.
Retrieved February 21, 2014, from
http://www.linguatec.net/products/tts/information/technology
• Dutoit, T., 1997. High-Quality Text-to-Speech Synthesis: An Overview.
Journal Of Electrical And Electronics Engineering and Electronics and
Communication Engineering.
