
Text To Speech

The document outlines a project for a Text-to-Speech (TTS) system that converts written text into natural-sounding speech using AI, targeting visually impaired users, audiobook creators, and AI assistants. It details the methodology, including text input, preprocessing, feature extraction, and audio generation, while highlighting the use of deep learning models like Tacotron2 and WaveGlow. The system aims to provide high-quality, customizable speech output in multiple languages at a low cost, addressing existing limitations in current TTS solutions.

Uploaded by

fardeentaseen469

TEXT TO SPEECH

MOHAMMED REHAN SAADI | SAZZAD ISLAM RAFEW | FARDEEN ABDULLAH TASEEN


PROJECT BRIEF

• Converts written text into natural-sounding speech using AI.


• Helps visually impaired users, audiobook creators, and AI-powered assistants.
• Uses Deep Learning models like Tacotron2 & WaveGlow to generate high-quality speech.
• Provides a realistic voice output with adjustable pitch and speed.
EXPECTED OUTCOME

• A system where users input text and receive clear, human-like speech
output.
• Supports multiple languages and voice customizations.
• Helps in accessibility, AI assistants, and content creation.
PROBLEM STATEMENT

• Existing TTS solutions are expensive, robotic-sounding, or language-limited.


• Visually impaired individuals struggle to access written content.
• Content creators need high-quality AI voices for audiobooks and podcasts.
• Solution: Our TTS system generates natural, expressive speech at low cost, making it
accessible and customizable.
METHODOLOGY OVERVIEW
The Text-to-Speech (TTS) system follows a structured process to convert text into natural-sounding speech
using deep learning techniques. Below are the key steps involved:
• Step 1: Text Input
  ▪ User provides text input via a UI.
  ▪ Text can be typed directly or loaded from a file.

• Step 2: Text Preprocessing
  ▪ Normalize text (convert numbers, abbreviations, and symbols into readable words).
  ▪ Remove unnecessary punctuation.
  ▪ Convert text into phonemes for accurate pronunciation.
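
The normalization part of Step 2 could look roughly like the sketch below. The abbreviation and symbol tables are hypothetical and deliberately tiny, numbers above 999 are read digit by digit, and phoneme conversion is left out because it needs a pronunciation lexicon or a grapheme-to-phoneme model that the slides do not name.

```python
import re

# Hypothetical lookup tables; a real system would need far larger ones.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister", "etc.": "et cetera"}
SYMBOLS = {"%": " percent", "&": " and "}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out a non-negative integer; values above 999 are read digit by digit."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    return " ".join(ONES[int(digit)] for digit in str(n))


def normalize_text(text: str) -> str:
    """Lowercase the text and expand abbreviations, symbols, and numbers into words."""
    text = text.lower()
    # Expand abbreviations first, because they end in a period.
    for abbr, full in ABBREVIATIONS.items():
        text = re.sub(r"\b" + re.escape(abbr), full, text)
    for symbol, spoken in SYMBOLS.items():
        text = text.replace(symbol, spoken)
    text = re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)
    # Keep letters, whitespace, and basic sentence punctuation; drop everything else.
    text = re.sub(r"[^a-z\s.,?!']", " ", text)
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("Dr. Smith paid 42% & left at 7."))
```

Sentence punctuation is kept in this sketch because it can carry pauses and intonation; only characters outside the small allowed set are removed.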
• Step 3: Feature Extraction
  ▪ Tokenize text and convert it into a phonetic representation.
  ▪ Extract linguistic and prosodic features.
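
As a rough illustration of the tokenization in Step 3, the sketch below maps text to a sequence of integer symbol IDs, the form a sequence-to-sequence model consumes. It assumes a character-level symbol table for simplicity; a phoneme inventory could be swapped in the same way. The symbol set, `text_to_sequence`, and `pad_batch` are illustrative names, not part of any library, and prosodic features are not covered here.

```python
PAD = "_"  # padding symbol, ID 0
EOS = "~"  # end-of-sequence symbol, ID 1

# Hypothetical symbol table: padding, end marker, letters, and basic punctuation.
SYMBOLS = [PAD, EOS] + list("abcdefghijklmnopqrstuvwxyz .,?!'")
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}


def text_to_sequence(text: str) -> list:
    """Convert text into symbol IDs, skipping unknown characters and appending EOS."""
    ids = [SYMBOL_TO_ID[ch] for ch in text.lower()
           if ch in SYMBOL_TO_ID and ch not in (PAD, EOS)]
    ids.append(SYMBOL_TO_ID[EOS])
    return ids


def pad_batch(sequences: list) -> list:
    """Right-pad every sequence with the PAD ID so the batch is rectangular."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [SYMBOL_TO_ID[PAD]] * (max_len - len(seq)) for seq in sequences]


batch = pad_batch([text_to_sequence("hello world."), text_to_sequence("hi")])
print(batch)
```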

• Step 4: Speech Synthesis Model
  ▪ Use Tacotron2 or FastSpeech for sequence-to-sequence text-to-speech conversion.
  ▪ Generate Mel spectrograms as an intermediate representation.
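
To show what makes a spectrogram a "Mel" spectrogram, the sketch below converts between Hz and the mel scale and computes the band edges of a mel filterbank. It uses the common formula m = 2595 · log10(1 + f / 700); audio libraries may use a slightly different variant. The 80 bands and the 0 to 8000 Hz range are assumed values for illustration, not settings taken from the slides.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Convert a value in mels back to Hz (inverse of hz_to_mel)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels: int = 80, f_min: float = 0.0, f_max: float = 8000.0) -> list:
    """Return n_mels + 2 frequencies in Hz that are evenly spaced on the mel scale."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_mels + 1)
    return [mel_to_hz(mel_min + i * step) for i in range(n_mels + 2)]


edges = mel_band_edges()
# Bands are narrow at low frequencies and wide at high frequencies.
print(round(edges[1] - edges[0], 1), round(edges[-1] - edges[-2], 1))
```

Because the bands widen with frequency, the representation keeps more detail in the low-frequency range where most speech information sits, which is one reason it works well as the intermediate target between the synthesis model and the vocoder.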

• Step 5: Audio Waveform Generation
  ▪ Use WaveGlow or HiFi-GAN to convert Mel spectrograms into audio waveforms.
  ▪ Apply post-processing for noise reduction and clarity.
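
The slides do not say which post-processing is applied in Step 5. As one possible minimal version, the sketch below trims near-silent samples from both ends of a waveform and scales it to a target peak level. It assumes the vocoder output is a list of floats in the range [-1, 1]; the threshold and target peak are illustrative values, and real noise reduction (for example, spectral methods) is outside this sketch.

```python
def trim_silence(samples: list, threshold: float = 0.01) -> list:
    """Drop leading and trailing samples whose magnitude is below the threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return list(samples[loud[0]:loud[-1] + 1])


def peak_normalize(samples: list, target_peak: float = 0.95) -> list:
    """Scale the waveform so that its largest magnitude equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


raw = [0.0, 0.001, 0.1, -0.5, 0.25, 0.0005, 0.0]
cleaned = peak_normalize(trim_silence(raw))
print(len(cleaned), max(abs(s) for s in cleaned))
```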
• Step 6: Output & Playback
  ▪ Play the generated speech audio.
  ▪ Allow customization of voice parameters (pitch, speed, tone).

• Tools & Technologies Used:
  ▪ gTTS, pyttsx3 (Basic TTS APIs)
  ▪ Tacotron2, FastSpeech (Deep Learning Models)
  ▪ WaveGlow, HiFi-GAN (Audio Waveform Generation)
  ▪ Flask/Streamlit (User Interface)
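
Playback with adjustable speed (Step 6) can be sketched with pyttsx3, one of the basic TTS APIs listed above. The preset names and multipliers are assumptions made for illustration; `rate_for` and `speak` are hypothetical helper names. pyttsx3 is imported inside `speak` so the rate calculation can be used without the package installed, and actually calling `speak` requires pyttsx3 plus a working system speech engine.

```python
# Hypothetical speed presets, expressed as multipliers of the engine's default rate.
SPEED_PRESETS = {"slow": 0.75, "normal": 1.0, "fast": 1.5}


def rate_for(speed: str, base_rate: int = 200) -> int:
    """Return the speaking rate in words per minute for a named preset."""
    return int(base_rate * SPEED_PRESETS[speed])


def speak(text: str, speed: str = "normal") -> None:
    """Speak the text aloud at the chosen speed using the system speech engine."""
    import pyttsx3  # lazy import: only needed when audio is actually played

    engine = pyttsx3.init()
    base_rate = engine.getProperty("rate")  # engine default, in words per minute
    engine.setProperty("rate", rate_for(speed, base_rate))
    engine.say(text)
    engine.runAndWait()  # blocks until playback finishes


print(rate_for("slow"), rate_for("normal"), rate_for("fast"))
```

A call such as `speak("Hello from the TTS project", speed="slow")` would play the sentence at three quarters of the engine's default rate.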
FEATURE LIST
The core features of our project are the following:
• Text-to-Speech Conversion
  ▪ Converts text to speech using deep learning models.
  ▪ Aims for accurate pronunciation, natural rhythm, and intonation.
  ▪ Uses open-source text-to-speech models like Tacotron2, WaveGlow, and FastSpeech.

• Multi-Language Support

  ▪ Supports languages other than English.
  ▪ Uses open-source datasets such as Common Voice (multilingual) and LJSpeech (English) for training speech synthesis.
  ▪ Users can select their preferred language for converting text to speech.

• Adjustable Voice and Speech Speed

  ▪ Offers a range of voices, such as male, female, and robotic.
  ▪ Allows speech speed control (slow, normal, or fast).
  ▪ Generates audio files in formats such as MP3.
  ▪ Includes a built-in audio player for playing back generated speech.
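
Language selection and MP3 output can be sketched with gTTS, which the project lists among its basic TTS APIs. The language menu below is a hypothetical example, and `language_code` and `synthesize_to_mp3` are illustrative helper names. gTTS sends the text to an online service, so `synthesize_to_mp3` needs the `gtts` package and an internet connection; it is imported lazily for that reason.

```python
# Hypothetical language menu mapping display names to language codes.
LANGUAGE_CODES = {"English": "en", "Bengali": "bn", "Hindi": "hi", "Spanish": "es"}


def language_code(name: str) -> str:
    """Look up the language code for a display name chosen in the UI."""
    try:
        return LANGUAGE_CODES[name]
    except KeyError:
        raise ValueError(f"Unsupported language: {name}") from None


def synthesize_to_mp3(text: str, language: str, path: str) -> str:
    """Generate speech for the text in the chosen language and save it as an MP3 file."""
    from gtts import gTTS  # lazy import: requires the gtts package and network access

    gTTS(text=text, lang=language_code(language)).save(path)
    return path


print(language_code("Bengali"))
```

For example, `synthesize_to_mp3("Hola, mundo", "Spanish", "hola.mp3")` would write a Spanish clip to `hola.mp3`, which the built-in player could then load.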
DATASET DETAILS
• Dataset Name: LJSpeech Dataset
• Source: Public-domain dataset of 13,100 short English audio clips from a single speaker
• Size: About 24 hours of recorded speech
• Features:
  ▪ Text – The sentence to be converted into speech
  ▪ Audio File – Corresponding recorded human speech
  ▪ Speaker ID – Identifies the speaker (only needed for multi-speaker datasets; LJSpeech has a single speaker)
  ▪ Duration – Length of the audio clip

Use Case: The model learns speech patterns from the paired text and audio so that it can convert new text into natural-sounding audio.
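
Loading the text and audio pairs could look like the sketch below. It relies on the LJSpeech layout, where `metadata.csv` has one pipe-separated row per clip (ID, transcription, normalized transcription) and the audio sits in `wavs/<ID>.wav`. The sample rows are made up for illustration and are not taken from the dataset. Duration is not stored in the metadata, so it is computed from the WAV file.

```python
import csv
import io
import wave


def read_ljspeech_metadata(lines) -> list:
    """Parse metadata rows of the form 'ID|transcription|normalized transcription'."""
    records = []
    for row in csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        clip_id, raw_text = row[0], row[1]
        # Prefer the normalized transcription when the row provides one.
        text = row[2] if len(row) > 2 and row[2] else raw_text
        records.append({"id": clip_id, "text": text,
                        "audio_path": f"wavs/{clip_id}.wav"})
    return records


def clip_duration_seconds(path: str) -> float:
    """Compute the length of a PCM WAV clip from its frame count and sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


# Made-up rows in the LJSpeech format, used only to exercise the parser.
SAMPLE = ("LJ999-0001|Chapter 2 begins here.|Chapter two begins here.\n"
          "LJ999-0002|A short sentence.|A short sentence.\n")

records = read_ljspeech_metadata(io.StringIO(SAMPLE))
print(records[0]["audio_path"], "->", records[0]["text"])
```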
TECHNOLOGY STACK
• Programming Language: Python
• Frameworks & Libraries:
  ▪ Tacotron2, WaveGlow – Deep Learning models for speech synthesis
  ▪ pyttsx3, gTTS – Simple text-to-speech conversion
  ▪ Librosa – Audio processing
  ▪ TensorFlow / PyTorch – Model training and optimization
• Web Framework (Optional): Flask / Streamlit (for the UI)
• Database (Optional): SQLite / Firebase (for storing user text inputs)
• Deployment: Google Cloud / AWS
TARGET MARKET

The target market for the Text-to-Speech system includes:

• Visually Impaired Individuals: Provides accessible reading options.
• Audiobook & Podcast Creators: Converts text into natural speech.
• Educational Institutions: Converts textbooks into audio for students.
• Elderly & Disabled Users: Assists with communication and reading.
THANK YOU!
