Natural Language Processing: Task4
Speech Recognition and Synthesis
• Some AI solutions need to accept vocal commands and provide spoken responses
• Example:
➢Asking Siri “Will it rain today?”
Speech
Speech-to-text API
• Perform real-time or batch transcription of audio into a text format
• Optimized for two scenarios: conversational and dictation
• Create custom models, including acoustic, language, and pronunciation models, if the pre-built models do not provide what you need
Speech-to-text API
Real-time transcription
➢Real-time
➢Transcribe text in audio streams
Batch transcription
➢Asynchronously (need to wait)
➢Transcribe multiple audio files
➢Scheduled on a best-effort basis
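The difference between the two modes can be sketched in a few lines of Python. This is a minimal illustration only and does not call the real speech service: `fake_transcribe`, `transcribe_stream`, `BatchJob`, and the status labels are hypothetical names chosen for this sketch, and the "audio" is plain bytes decoded as text.

```python
from typing import Callable, Iterable, Iterator, List


def fake_transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for a speech-to-text call: decodes bytes as text."""
    return audio.decode("utf-8")


def transcribe_stream(
    chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str] = fake_transcribe,
) -> Iterator[str]:
    """Real-time style: yield a result as soon as each audio chunk arrives."""
    for chunk in chunks:
        yield transcribe(chunk)


class BatchJob:
    """Batch style: files are submitted now, results are collected later."""

    def __init__(self, files: List[bytes]) -> None:
        self.files = files
        self.status = "NotStarted"  # illustrative status label (assumption)
        self.results: List[str] = []

    def run(self, transcribe: Callable[[bytes], str] = fake_transcribe) -> None:
        # In a real service the scheduler decides when this runs (best effort);
        # here the caller triggers it explicitly.
        self.status = "Running"
        self.results = [transcribe(f) for f in self.files]
        self.status = "Succeeded"


if __name__ == "__main__":
    # Real-time: results appear chunk by chunk.
    for text in transcribe_stream([b"will it", b" rain today"]):
        print(text)

    # Batch: nothing is available until the job has run.
    job = BatchJob([b"first file", b"second file"])
    print(job.status, job.results)
    job.run()
    print(job.status, job.results)
```

The generator returns text while audio is still arriving, whereas the batch job holds no results until it has been scheduled and completed, which is why the caller has to wait.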
Text-to-speech API
• Support multiple languages and regional pronunciation
• Include standard voices and neural voices that sound more natural
• Develop custom voices with the text-to-speech API
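Language and voice are commonly selected in a text-to-speech request through SSML (Speech Synthesis Markup Language, a W3C standard). The sketch below only builds such a document with the Python standard library and does not send it anywhere. `build_ssml` is a hypothetical helper, and the default voice name `en-US-JennyNeural` is an assumed example of a neural voice; check the voices available to you before relying on it.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"   # SSML namespace (W3C)
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Return an SSML string that asks for `text` to be spoken by `voice`.

    The voice name and language defaults are assumptions for illustration.
    """
    # Serialize SSML elements without a prefix (default namespace).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice_el = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice})
    voice_el.text = text  # special characters are escaped on serialization
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Will it rain today?"))
```

Changing `lang` and `voice` is how a regional pronunciation or a different standard, neural, or custom voice would be chosen in this kind of request.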
Question 1
For which two scenarios is the Universal Language Model used by the
speech-to-text API optimized?
• Acoustic
• Conversational
• Dictation
• Language
• Pronunciation
Question 2
What is the role of an acoustic model in speech recognition?