Turn Python Scripts Into Beautiful ML Tools
Introducing Streamlit, an app framework built for ML engineers
Adrien Treuille
Oct 1
Coding a semantic search engine with real-time neural-net inference in 300 lines of Python.
I saw this pattern first at Carnegie Mellon, then at Berkeley, Google X, and finally while
building autonomous robots at Zoox. These tools were often born as little Jupyter
notebooks: the sensor calibration tool, the simulation comparison app, the LIDAR
alignment app, the scenario replay tool, and so on.
When a tool became crucial, we called in the tools team. They wrote fluent Vue and
React. They blinged their laptops with stickers about declarative frameworks. They had
a design process:
The tools team’s clean-slate app-building flow.
Which was awesome. But these tools all needed new features, like weekly. And the
tools team was supporting ten other projects. They would say, “we’ll update your tool
again in two months.”
So we were back to building our own tools, deploying Flask apps, writing HTML, CSS,
and JavaScript, and trying to version control everything from notebooks to stylesheets.
So my old Google X friend, Thiago Teixeira, and I began thinking about the following
question: What if we could make building tools as easy as writing Python scripts?
Streamlit is our answer, and it rests on three principles.
#1: Embrace Python scripting. Streamlit apps are really just scripts that run from top
to bottom. There’s no hidden state. You can factor your code with function calls. If you
know how to write Python scripts, you can write Streamlit apps. For example, this is
how you write to the screen:
import streamlit as st
st.write('Hello, world!')
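And because an app is just a script, you can structure it with ordinary functions. A minimal sketch of my own (not from the original post) to illustrate:
import streamlit as st

def greet(name):
    # A plain Python function; Streamlit calls work anywhere in the script.
    st.write('Hello,', name)

greet('world')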
#2: Treat widgets as variables. There are no callbacks in Streamlit! Every interaction
simply reruns the script from top to bottom. This approach leads to really clean code:
import streamlit as st
x = st.slider('x')
st.write(x, 'squared is', x * x)
An interactive Streamlit app in three lines of code.
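The same widgets-as-variables idea applies to every input element. Here is a small sketch of my own using the standard text-input and checkbox widgets:
import streamlit as st

name = st.text_input('Your name')
if st.checkbox('Greet me'):
    # The checkbox is just a bool, recomputed on each rerun.
    st.write('Hello,', name)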
#3: Reuse data and computation. What if you download lots of data or perform an
expensive computation? The key is to safely reuse information across runs. Streamlit
introduces a cache primitive that behaves like a persistent, immutable-by-default data
store, letting Streamlit apps safely and effortlessly reuse information. For example,
this code downloads data only once from the Udacity self-driving car project,
yielding a simple, fast app:
import streamlit as st
import pandas as pd

# Reuse this data across runs!
read_and_cache_csv = st.cache(pd.read_csv)

BUCKET = "https://streamlit-self-driving.s3-us-west-2.amazonaws.com/"
data = read_and_cache_csv(BUCKET + "labels.csv.gz", nrows=1000)
desired_label = st.selectbox('Filter to:', ['car', 'truck'])
st.write(data[data.label == desired_label])
Using st.cache to persist data across Streamlit runs. To run this code, please follow these instructions.
User events trigger Streamlit to rerun the script from scratch. Only the cache persists across runs.
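Note that st.cache memoizes per set of input arguments, so only genuinely new work is recomputed. A tiny sketch of my own to make that concrete:
import streamlit as st
import time

@st.cache
def slow_square(x):
    time.sleep(2)  # Stand-in for an expensive computation.
    return x * x

x = st.slider('x', 0, 10)
# Slow only the first time each value of x is seen; instant afterwards.
st.write(x, 'squared is', slow_square(x))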
If this sounds intriguing, you can try it right now! Just run:
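pip install --upgrade streamlit
streamlit hello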
This will automatically pop open a web browser pointing to your local Streamlit app. If
not, just click the link.
To see more examples like this fractal animation, run streamlit hello from the command line.
. . .
Ok. Are you back from playing with fractals? Those can be mesmerizing.
The simplicity of these ideas does not prevent you from creating incredibly rich and
useful apps with Streamlit. During my time at Zoox and Google X, I watched as self-
driving car projects ballooned into gigabytes of visual data, which needed to be
searched and understood, including running models on images to compare
performance. Every self-driving car project I’ve seen has eventually had entire teams
working on this tooling.
Building such a tool in Streamlit is easy. This Streamlit demo lets you perform semantic
search across the entire Udacity self-driving car photo dataset, visualize human-
annotated ground truth labels, and run a complete neural net (YOLO) in real time
from within the app [1].
This 300-line Streamlit demo combines semantic visual search with interactive neural net inference.
The whole app is a completely self-contained, 300-line Python script, most of which is
machine learning code. In fact, there are only 23 Streamlit calls in the whole app. You
can run it yourself right now!
. . .
As we worked with machine learning teams on their own projects, we came to realize
that these simple ideas yield a number of important benefits:
Streamlit apps are pure Python files. So you can use your favorite editor and
debugger with Streamlit.
My favorite layout for writing Streamlit apps has VSCode on the left and Chrome on the right.
Pure Python scripts work seamlessly with Git and other source control software,
including commits, pull requests, issues, and comments. Because Streamlit’s
underlying language is pure Python, you get all the benefits of these amazing
collaboration tools for free 🎉.
Because Streamlit apps are just Python scripts, you can easily version control them with Git.
Streamlit provides an immediate-mode live coding environment. Just click Always
rerun when Streamlit detects a source file change.
Caching also simplifies setting up entire computation pipelines: chaining cached
functions together automatically forms an efficient pipeline.
import streamlit as st
import pandas as pd

@st.cache
def load_metadata():
    DATA_URL = "https://streamlit-self-driving.s3-us-west-2.amazonaws.com/labels.csv.gz"
    return pd.read_csv(DATA_URL, nrows=1000)

@st.cache
def create_summary(metadata, summary_type):
    one_hot_encoded = pd.get_dummies(metadata[["frame", "label"]], columns=["label"])
    return getattr(one_hot_encoded.groupby(["frame"]), summary_type)()

# Piping one st.cache function into another forms a computation DAG.
summary_type = st.selectbox("Type of summary:", ["sum", "any"])
metadata = load_metadata()
summary = create_summary(metadata, summary_type)
st.write('## Metadata', metadata, '## Summary', summary)
A simple computation pipeline in Streamlit. To run this code, please follow these instructions.
Basically, the pipeline is load_metadata → create_summary. Every time the script runs,
Streamlit only recomputes whatever subset of the pipeline is required to get the
right answer. Cool!
To make apps performant, Streamlit only recomputes whatever is necessary to update the UI.
Streamlit is built for GPUs. Streamlit allows direct access to machine-level primitives
like TensorFlow and PyTorch and complements these libraries. For example in this
demo, Streamlit’s cache stores the entire NVIDIA celebrity face GAN [2]. This approach
enables nearly instantaneous inference as the user updates sliders.
This Streamlit app demonstrates the NVIDIA celebrity face GAN [2] model using Shaobo Guan’s TL-GAN [3].
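A minimal sketch of that caching pattern, assuming PyTorch, a hypothetical model.pt checkpoint, and an st.cache that accepts allow_output_mutation (so the large, mutable model object is not re-hashed on every rerun):
import streamlit as st
import torch

@st.cache(allow_output_mutation=True)
def load_model():
    model = torch.load('model.pt')  # Hypothetical checkpoint path.
    model.eval()
    return model

model = load_model()  # Loaded once; reused on every slider change.
z = st.slider('Latent value', -3.0, 3.0, 0.0)
with torch.no_grad():
    # Hypothetical generator mapping a 1-D latent input to an output.
    output = model(torch.tensor([[z]]))
st.write(output)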
Streamlit is a free and open-source library rather than a proprietary web app. You
can serve Streamlit apps on-prem without contacting us. You can even run Streamlit
locally on a laptop without an Internet connection! Furthermore, existing projects can
adopt Streamlit incrementally.
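Serving an app locally, for instance, is a single command (assuming your script is saved as app.py):
streamlit run app.py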
. . .
This just scratches the surface of what you can do with Streamlit. One of the most
exciting aspects of Streamlit is how these primitives can be easily composed into
complex apps that look like scripts. There’s a lot more we could say about how our
architecture works and the features we have planned, but we’ll save that for future
posts.
Block diagram of Streamlit’s components. More coming soon!
We’re excited to finally share Streamlit with the community today and see what you all
build with it. We hope that you’ll find it easy and delightful to turn your Python scripts
into beautiful ML apps.
. . .
Thanks to Amanda Kelly, Thiago Teixeira, TC Ricks, Seth Weidman, Regan Carey, Beverly
Treuille, Geneviève Wachtell, and Barney Pell for their helpful input on this article.
References:
[1] J. Redmon and A. Farhadi, YOLOv3: An Incremental Improvement (2018), arXiv.
[2] T. Karras, T. Aila, S. Laine, and J. Lehtinen, Progressive Growing of GANs for
Improved Quality, Stability, and Variation (2018), ICLR.
[3] S. Guan, Controlled image synthesis and editing using a novel TL-GAN model (2018),
Insight Data Science Blog.