
Published in Better Programming

Andrew Hershy

Sep 29, 2022 · 4 min read

Build a Python Web Application That Turns Voice Into Text Into Image

Speaking images into existence using DALL-E mini and AssemblyAI

DALL-E 2 image. Source prompt: "steampunk iPhone 12"

Introduction

Speech, text, and images are the three ways humanity has transmitted information throughout history. In this project, we are going to build an application that listens to speech, turns that speech into text, then turns that text into images. All of this can be done in an afternoon. We live in a remarkable time!

[Diagram: speech → text → image]

Background knowledge needed:


DALL-E was created by OpenAI. It introduced the world to AI-generated images and took off in popularity about a year ago. OpenAI also has a free API that does all sorts of other fun AI-related functions.

DALL-E mini is an open-source alternative to DALL-E that tinkerers, like you and me, can play around with for free. This is the engine we'll be leveraging in this tutorial.

DALL-E Playground is an open-source application that does two things:

1. Uses Google Colab to create and run a backend DALL-E mini server, which provides the GPU processing needed to generate images.

2. Provides a front-end web interface, written in JavaScript, that users can interact with and view their images on. This interface is linked to the Google Colab server.
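
Concretely, the Playground backend exposes a single /dalle endpoint that the front end posts a prompt to, and it answers with a JSON list of base64-encoded images; this is exactly the exchange the dalle.py file later in this article relies on. Stripped of the Streamlit layer, a raw call looks roughly like the sketch below. The tunnel URL is a placeholder you'd swap for the one your own Colab session prints:

#minimal sketch of calling a DALL-E Playground backend directly
import base64
import requests

#placeholder: each Colab session generates its own trycloudflare.com URL
BACKEND_URL = "https://your-tunnel-subdomain.trycloudflare.com"

resp = requests.post(
    BACKEND_URL + "/dalle",
    headers={"Bypass-Tunnel-Reminder": "go"},
    json={"text": "steampunk iPhone 12", "num_images": 2},
)

#the backend responds with a JSON list of base64-encoded images
for i, encoded in enumerate(resp.json()):
    with open(f"image_{i}.png", "wb") as f:
        f.write(base64.b64decode(encoded))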

What this application does

1. Reengineers DALL-E Playground's front-end interface from JavaScript into Streamlit Python (because 1. the UI looks better, 2. it works more seamlessly with the speech-to-text API, and 3. Python is cooler).

2. Leverages AssemblyAI's transcription models to transcribe speech into text input that the DALL-E mini engine can work with.

3. Listens to speech and displays creative and interesting images.

Design

This project is broken up into two primary files: main.py and dalle.py.

If the summaries of the files below sound like gibberish to you, hang in there! Within the code itself there are many comments that break these concepts down more thoroughly.

The main script handles both the Streamlit web application and the voice-to-text API connection. It involves configuring the Streamlit session state, creating visual features such as buttons and sliders on the web app interface, opening a WebSocket connection, filling in all the parameters required for pyaudio, and creating asynchronous functions for sending and receiving the speech data concurrently between our application and AssemblyAI's server.

The dalle.py file connects the Streamlit web application to the Google Colab server running the DALL-E mini engine. This file has a few functions that serve the following purposes:

1. Establishes a connection to the backend server and verifies it's valid

2. Initiates a call to the server by sending text input for processing

3. Retrieves the image JSON data and decodes it using base64.b64decode()


Code
Please reference my GitHub here to see the full application. I tried to
include comments and a breakdown of what each chunk of code is doing as
I went along, so hopefully, it’s fairly intuitive. And please reference the
original project’s repository here for additional context.
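
If you're following along locally, the third-party packages used below are all on PyPI and can be installed with pip install streamlit pyaudio websockets requests. Note that PyAudio wraps the PortAudio library, which may need to be installed separately through your system's package manager.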

main file:

#create web apps in python using streamlit
import streamlit as st
#PyAudio provides Python bindings for PortAudio v19, the cross-platform audio I/O library
import pyaudio
#websockets is a python library for two-way interactive communication over the WebSocket protocol
import websockets
#asyncio is a library to write concurrent code using the async/await syntax
import asyncio
#this module provides functions for encoding binary data to printable ASCII characters
import base64
#Python has a built-in package called json, which can be used to work with JSON data
import json
#pulling in the api key for AssemblyAI
from configure import api_key
#pulling in function from other file
from dalle import create_and_show_images

#configuring session_state
if 'text' not in st.session_state:
    st.session_state['text'] = ''
    st.session_state['run'] = False

#creating webapp title
st.title("DALL-E Mini")

#function to begin session_state
def start_listening():
    st.session_state["run"] = True

#button to activate session_state function
st.button("Say something", on_click=start_listening)

#text box on the application
text = st.text_input("What should I create?", value=st.session_state["text"])

#slider visualization
num_images = st.slider("How many images?", 1, 6)

#variable for button
ok = st.button("GO!")

#if statement to determine when to call and retrieve from the dalle file
if ok and text:
    create_and_show_images(text, num_images)

#the AssemblyAI endpoint we're going to hit
URL = "wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000"

#setting up microphone parameters
#how many bytes of data per chunk of audio processed
FRAMES_PER_BUFFER = 3200
#PortAudio 16-bit integer input/output format; this is the default
FORMAT = pyaudio.paInt16
#mono, meaning we only need audio input from a single channel
CHANNELS = 1
#desired rate in Hz of incoming audio
RATE = 16000
p = pyaudio.PyAudio()

#starts recording, creating the stream variable and assigning parameters
stream = p.open(
    format=FORMAT,
    channels=CHANNELS,
    rate=RATE,
    input=True,
    frames_per_buffer=FRAMES_PER_BUFFER
)

#asynchronous function, so the app can keep sending and receiving the stream of speech data concurrently
async def send_receive():
    print(f'Connecting websocket to url {URL}')

    async with websockets.connect(
        URL,
        extra_headers=(("Authorization", api_key),),
        ping_interval=5,
        ping_timeout=20
    ) as _ws:

        r = await asyncio.sleep(0.1)
        print("Receiving Session begins ...")

        session_begins = await _ws.recv()

        async def send():
            while st.session_state['run']:
                try:
                    data = stream.read(FRAMES_PER_BUFFER)
                    data = base64.b64encode(data).decode("utf-8")
                    json_data = json.dumps({"audio_data": str(data)})
                    r = await _ws.send(json_data)
                except websockets.exceptions.ConnectionClosedError as e:
                    print(e)
                    assert e.code == 4008
                    break
                except Exception as e:
                    print(e)
                    assert False, "Not a websocket 4008 error"

                r = await asyncio.sleep(0.01)

        async def receive():
            while st.session_state['run']:
                try:
                    result_str = await _ws.recv()
                    result = json.loads(result_str)['text']

                    #only act on finalized transcripts, stripping sentence punctuation
                    if json.loads(result_str)['message_type'] == 'FinalTranscript':
                        result = result.replace('.', '')
                        result = result.replace('!', '')
                        st.session_state['text'] = result
                        st.session_state['run'] = False
                        st.experimental_rerun()
                except websockets.exceptions.ConnectionClosedError as e:
                    print(e)
                    assert e.code == 4008
                    break
                except Exception as e:
                    print(e)
                    assert False, "Not a websocket 4008 error"

        send_result, receive_result = await asyncio.gather(send(), receive())

asyncio.run(send_receive())

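One note on the configure import: main.py pulls the AssemblyAI key from a configure.py file that isn't shown above. Assuming the key is stored as a simple module-level string (which is how the import reads), a minimal version would be:

#configure.py
#holds the AssemblyAI API key imported by main.py; keep this file out of version control
api_key = "your-assemblyai-api-key-here"

Swap the placeholder for the key from your AssemblyAI dashboard.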


dalle file:

#Requests allows you to send HTTP/1.1 requests easily
import requests
#this module provides functions for encoding binary data to printable ASCII characters and back
import base64
#create web apps in python using streamlit
import streamlit as st

#this is the unique URL obtained by running the Google Colab notebook linked in the DALL-E Playground repo
URL = "https://sky-reservoir-fighting-sacrifice.trycloudflare.com"
headers = {'Bypass-Tunnel-Reminder': "go",
           'mode': 'no-cors'}

#establishes a connection to the backend server and verifies it's valid
def check_if_valid_backend(url):
    try:
        resp = requests.get(url, timeout=5, headers=headers)
        return resp.status_code == 200
    except requests.exceptions.Timeout:
        return False

#initiates a call to the server by sending text input for processing
def call_dalle(url, text, num_images=1):
    data = {"text": text, "num_images": num_images}
    resp = requests.post(url + "/dalle", headers=headers, json=data)
    if resp.status_code == 200:
        return resp

#retrieves the image JSON data and decodes it using base64.b64decode()
def create_and_show_images(text, num_images):
    valid = check_if_valid_backend(URL)
    if not valid:
        st.write("Backend service is not running")
    else:
        resp = call_dalle(URL, text, num_images)
        if resp is not None:
            for data in resp.json():
                img_data = base64.b64decode(data)
                st.image(img_data)

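A caveat on the URL constant: the trycloudflare.com address is a temporary tunnel generated by the DALL-E Playground Colab notebook, and a fresh one is produced every time the backend notebook starts, so paste your own session's URL into dalle.py before running. With the backend up and configure.py in place, launch the app with streamlit run main.py. As a quick sanity check that your tunnel is reachable, a small hypothetical helper script reusing the functions above could look like:

#check_backend.py, assuming dalle.py sits in the same directory
from dalle import check_if_valid_backend, URL

if check_if_valid_backend(URL):
    print("Backend is up and ready to generate images")
else:
    print("Backend not reachable; restart the Colab notebook and update URL in dalle.py")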


Conclusion
This project is a proof of concept for something I’d like to have in my house
one day. I’d like to have a screen on my wall in the middle of a decorative
frame. Let’s call it a smart picture frame. This screen will have a built-in
microphone that listens to all conversations spoken in proximity. Using
speech-to-text transcription and natural language processing, the frame will
filter and choose the most interesting assortment of words spoken every 30
seconds or so. From there, the text will be continually visualized to
dynamically add more depth to the atmosphere.

Imagine visual representations and themes of conversation being displayed on the wall during hangouts and family gatherings in real time. How many creative ideas can emerge from something similar to this? How can the mood of the house change and morph depending on the mood of the participants? The house will feel less like an inorganic structure and more like a participant itself. Very interesting to think about.

Alas, this project was a fun way to get our hands dirty and play around with these concepts. It's somewhat disappointing that DALL-E mini doesn't produce the same extremely high-quality images that engines like OpenAI's DALL-E 2 do. Nevertheless, I still enjoyed learning the process and principles behind the technology on this project. Most likely, in a few years, APIs for these high-resolution image-generating services will be easier to access and play around with anyway. Thanks to anyone who made it all the way through, and good luck on your journey of learning every day.

This project was influenced by a YouTube tutorial, so please check that out,
as I found it helpful and they deserve credit.

Check out some of my other articles if you found this one helpful/interesting:
Build an Alexa- or Siri-Equivalent Bot in Python Using OpenAI
How to find land when you’re at sea using python
I wrote a python script to play the lottery for me

