
Tilak Maharashtra Vidyapeeth, Pune

Department of Computer Science


------------------------------------------------------------------------------------------------------------------------

A PROJECT REPORT
ON
SPEECH SYNTHESIZER

------------------------------------------------------------------------------------------------------------------------

By
ADITYA GURAV

Towards The Partial Fulfillment of the


Bachelor of Computer Application

------------------------------------------------------------------------------------------------------------------------

Tilak Maharashtra Vidyapeeth, Pune


Department of Computer Science
[ 2020-2021 ]
------------------------------------------------------------------------------------------------------------------------


CERTIFICATE

This is to certify that the project

“Speech Synthesizer”

has been satisfactorily completed by

ADITYA GURAV

towards the partial fulfillment of the ‘Bachelor of Computer Application’


for the Academic Year 2020-2021 at
Tilak Maharashtra Vidyapeeth, Pune (Department of Computer Science),
and is approved.

------------------------------------------------------------------------------------------------------------------------

Project Guide Examiner Head of Department


[Pune]


ACKNOWLEDGEMENT

With immense pleasure I am presenting this Speech Synthesizer project report as part of


the curriculum of the ‘Bachelor of Computer Application’. I wish to thank all the people who
gave me unending support.

I express my profound thanks to our head of department, Mrs. Asmita Namjoshi, to my project
guide and project in-charge, Mr. Rakesh Patil, and to all those who have indirectly guided and
helped me in the preparation of this project.

ADITYA GURAV

------------------------------------------------------------------------------------------------------------------------


PROJECT SYNOPSIS
Speech synthesis can be described as the artificial production of human speech. A computer
system used for this purpose is called a speech synthesizer, and it can be implemented in software or
hardware. A speech synthesizer converts normal language text into speech. Synthesized
speech can be created by concatenating pieces of recorded speech that are stored in a database.
Systems differ in the size of the stored speech units: a system that stores phones or diphones
provides the largest output range, but may lack clarity. For specific usage domains, the storage of
entire words or sentences allows for high-quality output. Alternatively, a synthesizer can
incorporate a model of the vocal tract and other human voice characteristics to create a
completely "synthetic" voice output. The quality of a speech synthesizer is judged by its
similarity to the human voice and by its ability to be understood. An intelligible speech
synthesizer program allows people with visual impairments or reading disabilities to listen
to written works on a home computer. From the information now available, the system can produce
a speech signal. The structure of a speech synthesizer can be broken down into two major modules:

 Natural Language Processing (NLP) module: It produces a phonetic transcription of the text
read, together with prosody.
 Digital Signal Processing (DSP) module: It transforms the symbolic information it receives from
the NLP module into audible and intelligible speech.

The major operations of the NLP module are as follows:

 Text Analysis: First the text is segmented into tokens. The token-to-word conversion creates the
orthographic form of the token. For the token “Mr” the orthographic form “Mister” is formed by
expansion, the token “12” gets the orthographic form “twelve”, and “1997” is transformed to
“nineteen ninety seven”.
 Application of Pronunciation Rules: After the text analysis has been completed, pronunciation
rules can be applied. Letters cannot be transformed 1:1 into phonemes because the correspondence
is not always parallel. In certain environments, a single letter can correspond to no phoneme at all
or to several phonemes.
 Prosody Generation: After the pronunciation has been determined, the prosody is generated. The
degree of naturalness of a speech synthesizer depends on prosodic factors like intonation
modelling, amplitude modelling and duration modelling.


INDEX

Sr. No  Particulars


1. Introduction and Statement of the Problem
2. Overview of Speech Synthesizer
3. Domain-specific Synthesis
4. Unit Selection Synthesis
5. Diphone Synthesis & Structure of TTS Synthesizer
6. Implementation & Prerequisites
7. Pre-processing Text Analysis
8. Domain Knowledge & Advantages
9. Hardware and Software Requirements
10. Future Scope and Limitations or Boundaries
11. Conclusion and Developer's Comments
12. References & Bibliography


SPEECH SYNTHESIZER
------------------------------------------------------------------------------------------------------------------------

Introduction:-
The project Speech Synthesizer is developed as a part of the VI semester for the partial
fulfilment of the BCA degree. The Speech Synthesizer is a Python-based tool that converts text into
spoken words by analyzing and processing the text using Natural Language Processing (NLP) and
then using Digital Signal Processing (DSP) technology to convert this processed text into a
synthesized speech representation of the text. Here, I developed a useful speech synthesizer in the
form of a simple application that converts inputted text into synthesized speech and reads it out to
the user; the result can then be saved as an .mp3 file. The development of a speech synthesizer will
be of great help to people with visual impairment and will make working through large volumes of
text easier. Speech synthesis is the automatic conversion of a text into speech that resembles, as
closely as possible, a native speaker of the language reading that text. It is the technology
which lets a computer speak to you. The TTS system gets text as the input, and then a computer
algorithm called the TTS engine analyses the text, pre-processes it and synthesizes the
speech with some mathematical models. The TTS engine usually generates sound data in an audio
format as the output. The speech synthesis procedure consists of two main phases. The first is text
analysis, where the input text is transcribed into a phonetic or some other linguistic representation,
and the second is the generation of speech waveforms, where the output is produced from this
phonetic and prosodic information. These two phases are usually called high-level and low-level
synthesis. The input text might be, for example, data from a word processor, standard ASCII from
email, a mobile text message, or scanned text from a newspaper. The character string is then pre-
processed and analyzed into a phonetic representation, which is usually a string of phonemes with
some additional information for correct intonation, duration, and stress. Speech sound is finally
generated with the low-level synthesizer using the information from the high-level one. The
artificial production of speech-like sounds has a long history, with documented mechanical
attempts dating to the eighteenth century.

Statement Of The Problem:-


Many people do not enjoy reading books, or grow bored while reading them. A speech
synthesizer helps such users: they simply paste whatever text they want into the tool, and it reads
it aloud for them. It also helps blind people access any text they want by pasting it into the
synthesizer. In addition, the tool records the generated sound and saves it as an .mp3 file.


Overview of Speech Synthesizer:-


A text-to-speech system or library is composed of two parts: a front-
end and a back-end. The front-end has two major tasks. First, it converts raw text containing
symbols like numbers and abbreviations into the equivalent of written-out words. This process is
often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic
transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses,
and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme
or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together
make up the symbolic linguistic representation that is output by the front-end. The back-end, often
referred to as the synthesizer, then converts the symbolic linguistic representation into sound. In
certain systems, this part includes the computation of the target prosody, which is then imposed on
the output speech. There are different ways to perform speech synthesis. The choice depends on the
task at hand, but the most widely used method is concatenative synthesis, because it
generally produces the most natural-sounding synthesized speech. Concatenative synthesis is based
on the concatenation of segments of recorded speech. There are three major sub-types of
concatenative synthesis.

Google TTS Library Functional diagram (1)
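The dictionary-plus-rules idea behind grapheme-to-phoneme conversion, mentioned above, can be sketched in a few lines. This is only an illustration: the word entries, letter rules, and phoneme symbols below are invented for the example and are not from any real TTS lexicon.

```python
# Minimal sketch of grapheme-to-phoneme conversion: look a word up in a small
# exception dictionary first, and fall back to naive one-letter-per-phoneme
# rules. Entries and symbols are illustrative only.

EXCEPTIONS = {
    "one": ["W", "AH", "N"],   # irregular spelling: not derivable from letters
    "two": ["T", "UW"],
}

LETTER_RULES = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH",
    "g": "G", "i": "IH", "n": "N", "o": "AA", "p": "P",
    "s": "S", "t": "T",
}

def to_phonemes(word):
    word = word.lower()
    if word in EXCEPTIONS:                 # dictionary-based strategy
        return EXCEPTIONS[word]
    # rule-based fallback: one phoneme per letter (a deliberate simplification;
    # real systems handle digraphs, silent letters, and context)
    return [LETTER_RULES[ch] for ch in word if ch in LETTER_RULES]

print(to_phonemes("one"))   # ['W', 'AH', 'N'] from the exception dictionary
print(to_phonemes("pat"))   # ['P', 'AE', 'T'] from the letter rules
```

A real front-end would combine such a lexicon with context-sensitive rules, but the lookup-then-fallback structure is the same.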


Domain-specific Synthesis:-
Domain-specific synthesis concatenates pre-recorded words and
phrases to create complete utterances. It is used in applications where the variety of texts the system
will output is limited to a particular domain, like transit schedule announcements or weather reports.
The technology is very simple to implement, and has been in commercial use for a long time, in
devices like talking clocks and calculators. The level of naturalness of these systems can be very
high because the variety of sentence types is limited, and they closely match the prosody and
intonation of the original recordings. Because these systems are limited by the words and phrases in
their databases, they are not general-purpose and can only synthesize the combinations of words
and phrases with which they have been pre-programmed. The blending of words within naturally
spoken language, however, can still cause problems unless many variations are taken into account.
For example, in non-rhotic dialects of English the "r" in words like "clear" /ˈklɪə/ is usually only
pronounced when the following word has a vowel as its first letter (e.g. "clear out" is realized as
/ˌklɪəɾˈʌʊt/). Likewise in French, many final consonants are no longer silent if followed by a
word that begins with a vowel, an effect called liaison. This alternation cannot be reproduced by a
simple word-concatenation system, which would require additional complexity to be context-
sensitive. The approach involves recording the voice of a person speaking the desired words and
phrases. It is useful when only a restricted set of phrases and sentences is used and the variety of
texts the system will output is limited to a particular domain, e.g. a message in a train station,
weather reports, or checking a telephone subscriber’s account balance.


Unit Selection Synthesis:-


Unit selection synthesis uses large databases of recorded speech. During
database creation, each recorded utterance is segmented into some or all of the following: individual
phones, diphones, half-phones, syllables, morphemes, words, phrases, and sentences. Typically, the
division into segments is done using a specially modified speech recognizer set to a "forced
alignment" mode, with some manual correction afterward, using visual representations such as the
waveform and spectrogram. An index of the units in the speech database is then created based on
the segmentation and acoustic parameters like the fundamental frequency (pitch), duration, position
in the syllable, and neighboring phones. At runtime, the desired target utterance is created by
determining the best chain of candidate units from the database (unit selection). This process is
typically achieved using a specially weighted decision tree.

Unit selection provides the greatest naturalness, because it applies only a


small amount of digital signal processing (DSP) to the recorded speech. DSP often makes
recorded speech sound less natural, although some systems use a small amount of signal processing
at the point of concatenation to smooth the waveform. The output from the best unit-selection
systems is often indistinguishable from real human voices, especially in contexts for which the TTS
system has been tuned. However, maximum naturalness typically requires unit-selection speech
databases to be very large, in some systems ranging into the gigabytes of recorded data,
representing dozens of hours of speech. Also, unit selection algorithms have been known to select
segments from a place that results in less than ideal synthesis (e.g. minor words become unclear)
even when a better choice exists in the database.
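The "best chain of candidate units" idea above can be illustrated as a tiny dynamic-programming search. This is a toy: real systems combine many acoustic features with learned weights, whereas here each unit is reduced to a single invented pitch number, with a target cost (distance from the requested pitch) and a join cost (pitch jump between consecutive units).

```python
# Toy unit selection: for each target phone, pick one candidate unit from the
# database so that summed target cost + join cost is minimal. A miniature
# Viterbi-style search over positions; all numbers are invented.

def select_units(targets, candidates):
    # targets: desired pitch value per phone
    # candidates: list of lists of available unit pitches per phone
    best = [(abs(c - targets[0]), [c]) for c in candidates[0]]
    for t, cands in zip(targets[1:], candidates[1:]):
        nxt = []
        for c in cands:
            target_cost = abs(c - t)          # mismatch with requested pitch
            cost, path = min(
                (prev_cost + abs(c - path[-1]), path)   # join cost to previous
                for prev_cost, path in best
            )
            nxt.append((cost + target_cost, path + [c]))
        best = nxt
    return min(best)[1]

chain = select_units([100, 110], [[90, 101], [130, 111]])
print(chain)  # [101, 111]: closest to the targets with a smooth join
```

The join-cost term is what penalizes audible discontinuities at concatenation points; dropping it would reduce this to independent nearest-neighbour lookup.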


Diphone Synthesis:-
Diphone synthesis uses a minimal speech database containing all the diphones
(sound-to-sound transitions) occurring in a language. The number of diphones depends on the
phonotactics of the language: for example, Spanish has about 800 diphones, and German about
2500. In diphone synthesis, only one example of each diphone is contained in the speech database.
At runtime, the target prosody of a sentence is superimposed on these minimal units by means of
digital signal processing techniques such as linear predictive coding, PSOLA or MBROLA. The
quality of the resulting speech is generally worse than that of unit-selection systems, but more
natural-sounding than the output of formant synthesizers. Diphone synthesis suffers from the sonic
glitches of concatenative synthesis and the robotic-sounding nature of formant synthesis, and has
few of the advantages of either approach other than small size. As such, its use in commercial
applications is declining, although it continues to be used in research because there are a number
of freely available software implementations.
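The diphone inventory described above (sound-to-sound transitions) can be made concrete by listing which diphone units a given phoneme sequence would require. The phoneme symbols and the "_" silence marker are illustrative conventions, not from a specific system.

```python
# Diphone synthesis stores sound-to-sound transitions. This sketch maps a
# phoneme sequence onto the diphone units a database would need, padding the
# utterance edges with silence ("_"). Symbols are illustrative.

def diphones(phonemes):
    seq = ["_"] + phonemes + ["_"]          # pad with silence at both edges
    return [(a, b) for a, b in zip(seq, seq[1:])]

print(diphones(["k", "a", "t"]))
# [('_', 'k'), ('k', 'a'), ('a', 't'), ('t', '_')]
```

Because only one recorded example of each such pair is stored, the database stays small (the roughly 800 or 2500 units cited above for Spanish and German), at the cost of the prosody having to be imposed afterwards by signal processing.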

Structure of TTS Synthesizer:-


● Natural Language Processing (NLP) module: It produces a phonetic transcription of the text
read, together with prosody.
● Digital Signal Processing (DSP) module: It transforms the symbolic information it receives
from NLP into audible and intelligible speech. The major operations of the NLP module are as
follows:
• Text Analysis: First the text is segmented into tokens. The token-to-word conversion creates
the orthographic form of the token. For the token “Mr” the orthographic form “Mister” is
formed by expansion, the token “12” gets the orthographic form “twelve” and “1997” is
transformed to “nineteen ninety seven”.
• Application of Pronunciation Rules: After the text analysis has been completed,
pronunciation rules can be applied. Letters cannot be transformed 1:1 into phonemes
because the correspondence is not always parallel. In certain environments, a single letter can
correspond to no phoneme (for example, “gh” in “caught”) or to several phonemes (“x” in
“maximum”). In addition, several letters can correspond to a single phoneme (“ch” in
“rich”). There are two strategies to determine pronunciation:
• In the dictionary-based solution with morphological components, as many morphemes (words)
as possible are stored in a dictionary. Full forms are generated by means of inflection,
derivation and composition rules. Alternatively, a full-form dictionary is used in which all
possible word forms are stored. Pronunciation rules determine the pronunciation of words
not found in the dictionary.


• In a rule-based solution, pronunciation rules are generated from the phonological knowledge
of dictionaries. Only words whose pronunciation is a complete exception are included in the
dictionary. The two approaches differ significantly in the size of their dictionaries: the
dictionary-based solution's dictionary is many times larger than the rule-based solution's
dictionary of exceptions. However, dictionary-based solutions can be more exact than rule-
based solutions if they have a large enough phonetic dictionary available.

● Prosody Generation:
After the pronunciation has been determined, the prosody is generated.
The degree of naturalness of a TTS system depends on prosodic factors like intonation
modelling (phrasing and accentuation), amplitude modelling and duration modelling
(including the duration of sounds and the duration of pauses, which determines the length of
the syllables and the tempo of the speech). The output of the NLP module is passed to the
DSP module. This is where the actual synthesis of the speech signal happens. In
concatenative synthesis, the selection and linking of speech segments take place. For
individual sounds the best option (where several appropriate options are available) is
selected from a database and concatenated.

Operations of the natural Language processing module of a TTS module (2)
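The token-to-word conversion in the Text Analysis step above can be sketched as a small normalizer. Coverage here is deliberately tiny, and reading a four-digit number as two pairs ("nineteen ninety seven") is a simplification that suits years but not all numbers.

```python
# Token-to-word conversion as described above: abbreviations are expanded from
# a table, two-digit numbers are spelled out, and four-digit numbers are read
# as two pairs, year-style. Coverage is deliberately minimal.

ABBREV = {"Mr": "Mister", "Dr": "Doctor"}
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def two_digit(n):
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])

def normalize(token):
    if token in ABBREV:
        return ABBREV[token]                       # "Mr" -> "Mister"
    if token.isdigit() and len(token) == 4:        # read years as two pairs
        return two_digit(int(token[:2])) + " " + two_digit(int(token[2:]))
    if token.isdigit() and int(token) < 100:
        return two_digit(int(token))               # "12" -> "twelve"
    return token                                   # pass ordinary words through

print([normalize(t) for t in ["Mr", "12", "1997"]])
# ['Mister', 'twelve', 'nineteen ninety seven']
```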


IMPLEMENTATION / PREREQUISITES:-
Tkinter is the standard GUI library for Python. Python
combined with tkinter provides a fast and easy way to create GUI applications. This library
is a compelling choice for building GUI applications in Python, especially for
applications where a modern sheen is unnecessary and the top priority is to build something
functional and cross-platform quickly.

To create a tkinter application:


1. Import the module – tkinter
2. Create the main window (container)
3. Add any number of widgets to the main window
4. Apply the event trigger on the widgets

There are many libraries in Python; one of them is gTTS (Google Text-to-Speech), a Python library
and CLI tool to interface with Google Translate’s text-to-speech API. The workflow is: import the
libraries tkinter, gTTS, and playsound; initialize the window; write a function to convert text to
speech; write functions to exit and to reset; and define the buttons. Then we are done.

Screenshot Of Speech Synthesizer Interface (3)
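The steps above can be sketched as a minimal application. This is illustrative rather than the exact project code: it assumes the third-party gTTS and playsound packages are installed, gTTS needs network access at runtime, and the widget layout and function names are my own choices. The GUI only launches when the script is run directly.

```python
# Sketch of the tool described above: a tkinter window with an input bar and
# Play / Reset / Exit buttons. Assumes third-party packages gTTS and
# playsound (pip install gTTS playsound); gTTS requires network access.
from tkinter import Tk, Entry, Button

def speak(text, filename="speech.mp3"):
    # Imported lazily so the rest of the module works without the packages.
    from gtts import gTTS
    from playsound import playsound
    gTTS(text=text, lang="en").save(filename)   # synthesize and save as .mp3
    playsound(filename)                         # read the text out loud

def build_app():
    root = Tk()
    root.title("Speech Synthesizer")
    entry = Entry(root, width=40)               # input text bar
    entry.pack()
    Button(root, text="Play", command=lambda: speak(entry.get())).pack()
    Button(root, text="Reset", command=lambda: entry.delete(0, "end")).pack()
    Button(root, text="Exit", command=root.destroy).pack()
    return root

if __name__ == "__main__":
    build_app().mainloop()
```

Saving to an .mp3 file before playback is what also gives the user a recording to keep, as described in the statement of the problem.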


Speech Synthesizer Modules In Tool:

➢ Input Text - Bar


➢ Play Sound
➢ Exit Tool
➢ Reset Text Entered In Input Bar

Pre-processing Text Analysis:-


The first stage of speech synthesis systems is based on preprocessing, which is usually a
very complex task, depending on the language. At the stage of preprocessing, the input text is
converted into a sequence of words and symbols that will be processed by the rest of the system.
This is called text normalization. Although speech synthesis is an area where much attention is
paid to the normalization of text, dealing with real text is a problem that also appears in
other applications such as machine translation, speech recognition and detection of the topic of
conversation. The most beneficial situation for a speech synthesis system is one in which there is
an unambiguous relationship between spelling and pronunciation. But in real text there are many
unusual words: numbers, digit sequences, acronyms, abbreviations, and dates. The main problem of
the text normalization module is converting non-standard words into regular words. At this stage of
preprocessing, the system also identifies and makes decisions concerning punctuation, and
identifies and expands acronyms and numbers to their full forms. The main problem in sentence
segmentation, which is part of the preprocessing process, is the ambiguity of the period, which may
mark either a sentence boundary or an abbreviation. To determine the correct function of a period it
is necessary to identify acronyms and capital letters (proper names and the beginning of a
sentence). Difficulties arise from abbreviations which do not differ from normal sentence-final
words and from the fact that proper names may appear at the beginning of a sentence.


Domain Knowledge:-
Domain-specific synthesis concatenates pre-recorded words and phrases to
create complete utterances. It is used in applications where the variety of texts the system will
output is limited to a particular domain, like transit schedule announcements or weather reports.
The technology is very simple to implement, and has been in commercial use for a long time, in
devices like talking clocks and calculators. It is useful when only a restricted set of phrases and
sentences is used and the variety of texts the system will output is limited to a particular domain,
e.g. a message in a train station, weather reports, or checking a telephone subscriber’s account
balance.

Advantages:-

 It is a user-friendly tool for everyone.

 It helps users get the pronunciation of difficult words.

 People with learning disabilities who have difficulty reading large amounts of text
due to dyslexia or other problems really benefit from a speech synthesizer, which offers
them an easier option for experiencing website content.

 A speech synthesizer lets people listen while doing other things, and also provides an
option for content consumption on the go.

 A speech synthesizer offers many benefits for content owners and publishers as well.

 A speech synthesizer makes it easier in general for all people to access online
content on mobile devices, and strengthens corporate social responsibility by
ensuring that information is available in both written and audio format.


Hardware Requirement:-
 Computer / Laptop
 Keyboard
 Mouse

Software Requirement:-
 Python
 Web Browser


Future Scope:-
Accuracy will become better and better, and dictation speech recognition will gradually
become accepted. Greater use will be made of “intelligent systems” which will attempt to guess
what the speaker intended to say, rather than what was actually said, as people often misspeak and
make unintentional mistakes. Microphone and sound systems will be designed to adapt more
quickly to changing background noise levels and different environments, with better recognition of
extraneous material to be discarded. More and more elderly people benefit from voice interfaces.
They are also becoming more familiar with computer technology; however, they have problems
with understanding synthesized speech, particularly if they have hearing problems and when
they miss the contextual clues that compensate for weakened acoustic stimuli. Unfortunately, most
of the research investigating potential reasons for these problems has not been carried out on unit-
selection synthesis, but on formant synthesis. Formant synthesis lacks acoustic information in the
signal and exhibits incorrect prosody. Since concatenative approaches preserve far more of the
acoustic signal than formant synthesizers, lack of information should no longer be a problem.
Instead, there are problems with spectral mismatches between units, with spectral distortion due to
signal processing, and temporal distortion due to wrong durations.

Limitations or Boundaries:-
It can often be seen that online speech synthesizers do not recognize
special characters and symbols such as the dot ".", question mark "?", or hash "#". Their databases
usually contain only a few pre-recorded voices that are used for synthesis. Modern software often
leads to a different pronunciation of a particular text. What is more, there is a limit to the number of
words of input text that can be converted into speech. For certain languages synthetic
speech is easier to produce than for others. Also, the number of potential users and the size of
markets differ greatly between countries and languages, which affects how many resources are
available for developing speech synthesis. Most languages also have some special features which
can make the development process either much easier or considerably harder. Some languages, such
as Finnish, Italian, and Spanish, have very regular pronunciation; sometimes there is an almost
one-to-one correspondence between letters and sounds. At the other end is, for example, French,
with very irregular pronunciation. Many languages, such as French, German, Danish and
Portuguese, also contain lots of special stress markers and other non-ASCII characters (Oliveira et
al. 1992). In German, the sentential structure differs largely from other languages. For text analysis,
the use of capitalized letters with nouns may cause some problems because capitalized words are
usually analyzed differently than others.


Conclusion and Developer's Comments:- Speech synthesis is a rapidly growing aspect of


computer technology and is increasingly playing a more important role in the way we interact with
systems and interfaces across a variety of platforms. We have identified the various operations
and processes involved in text-to-speech synthesis. My tool interfaces with a text-to-speech engine
developed for American English. In future, my plan is to make efforts to create engines for localized
Nigerian languages so as to make text-to-speech technology more accessible to a wider range of
Nigerians. Another area of further work is the implementation of a speech synthesizer system on
other platforms, such as telephony systems, ATM machines, video games and any other platform
where text-to-speech technology would be an added advantage and increase functionality.
Synthesizing text is a high-technology advancement: the artificial production of speech from a
given text to be spoken. With a speech synthesizer, we can actually mediate and fill in the lacuna
created by not fully exploiting the capabilities of some handicapped individuals. It has never been
so easy to use a speech synthesizer program: with just one click, your computer will speak any text
aloud in a clear, natural-sounding voice. Therefore, there is a need to use information technology to
solve this problem. Before the use of the new system, proper training should be given to the users.
Despite the fact that speech synthesis constitutes a dynamically developing technology, there are
still some limitations in currently developed speech synthesis systems. We have examined
eleven weak points of speech synthesis systems implemented so far. The scope of issues to improve
in speech synthesis systems is very wide and includes: emotions, prosody, spontaneous speech,
preprocessing and text analysis, ambiguities, naturalness, adaptation of the system's utterances,
disadvantages related to different types of systems, disadvantages associated with not commonly
used languages, speech synthesis for older people, and finally some other limitations concerning
special characters and symbols. One of the most important features that needs to be improved
concerns the natural sound of a synthetic speech system. Although the quality of speech generated
by concatenative systems is very good, such systems fail if the required segments of speech
are not included in the primary database. This is due to the fact that even the largest corpora are not
able to cover all variants of contextual segments of speech. Concatenative speech synthesis systems
depend largely on the quality of the speech corpus used to construct them.


References:-

1) Lemmetty, S., 1999. Review of Speech Synthesis Technology. Master's Thesis, Helsinki
University of Technology.

2) Dutoit, T., 1993. High Quality Text-to-Speech Synthesis of the French Language. Doctoral
dissertation, Faculté Polytechnique de Mons.

3) Suendermann, D., Höge, H., and Black, A., 2010. Challenges in Speech Synthesis. In: Chen, F.,
Jokinen, K. (eds.), Speech Technology. Springer Science+Business Media LLC.

4) Allen, J., Hunnicutt, M. S., Klatt, D., 1987. From Text to Speech: The MITalk System.
Cambridge University Press.

Website link to access my project:

https://adityagurav.github.io/speech_synthesizer/index.html
