Google AI
Teachable Machine
UX insights on designing simple, accessible interfaces
for teaching computers
By Barron Webster
Machine learning increasingly affects our digital lives—from recommending music to
translating foreign phrases and curating photo albums. It’s important for everyone
using this technology to understand how it works, but doing so isn’t always easy.
Machine learning is defined by its ability to “learn” from data without being explicitly
programmed, and ML-driven products don’t typically let users peek behind the
curtain to see why the system is making the decisions it does. Machine learning can
help find your cat photos, but it won’t really tell you how it does so.
Last October, Google's Creative Lab released Teachable Machine, a free online
experiment that lets you play with machine learning in your web browser, using your
webcam as an input to train a machine learning model of your very own—no
programming experience required. The team—a collaborative effort by Creative Lab
and PAIR team members, plus friends from Støj and Use All Five—wanted people
to get a feel for how machine learning actually “learns,” and to make the process
itself more understandable.
OK, but what are inputs? Or models? An input is just some piece of data—a picture,
video, song, soundbite, or article—that you feed into a machine learning model in
order to teach it. A model is a small computer program that learns from the
information it’s given in order to evaluate other information. For example, a feature
that recognizes faces in the photos stored on your phone, probably uses a model
that was trained on inputs of many photos containing faces.
Let’s say you train a model by showing it a bunch of pictures of oranges (the inputs).
Later, you could show that model a new picture (of say, a kiwi). It would use the
pictures (of oranges) you showed it earlier to decide how likely it is that the new
picture contains an orange.
You can also teach the model about different classes, or categories. In the case of
fruit, you might have one class that’s really good at recognizing oranges, and another
that’s good at recognizing apples (or kiwi).
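The idea of classes, examples, and confidence can be sketched in a few lines of code. This is a conceptual toy, not Teachable Machine’s actual implementation (which runs in the browser on image data); the class `TinyClassifier`, the two-number “feature vectors,” and the distance-based scoring are all illustrative assumptions.

```python
class TinyClassifier:
    def __init__(self):
        # Each class name maps to the list of example feature vectors
        # recorded for it during training.
        self.examples = {}

    def add_example(self, class_name, features):
        self.examples.setdefault(class_name, []).append(features)

    def predict(self, features):
        # Score each class by its closest stored example, then normalize
        # the scores into confidence values that sum to 1.
        def distance(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

        scores = {
            name: 1.0 / (1.0 + min(distance(features, ex) for ex in exs))
            for name, exs in self.examples.items()
        }
        total = sum(scores.values())
        return {name: s / total for name, s in scores.items()}

model = TinyClassifier()
model.add_example("orange", [1.0, 0.9])  # stand-ins for features of orange photos
model.add_example("apple", [0.9, 0.1])
confidences = model.predict([1.0, 0.8])  # a new, unseen picture
```

Here the new picture sits closer to the “orange” examples, so that class gets the higher confidence—the same shape of answer a trained Teachable Machine model gives you.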
With Teachable Machine, we set out to let people easily train their own ML models—
and we hope to inspire more designers to build interfaces for experimenting with
machine learning. Just as we were inspired by the lessons of many fun, accessible
ML projects (like Rebecca Fiebrink’s Wekinator), we hope these lessons will help
other designers get through the process quickly and easily.
Training ≠ output
Instead of training a model to do something very narrow (for example, recognize
cats), we wanted Teachable Machine to feel as though you’re training a tool, like the
keyboard on your computer or phone. The model you train could be used for any
digital output. To emphasize the principle that input and output can be mixed and
matched, we broke the interface into three different blocks.
The “input” block comes first. This is where digital media enters the system (pictured
above, left). Then there’s the “learning” block (above, middle), where the model
learns and interprets the input. The “output” block (above, right) is where the model’s
interpretation does something like play a GIF or make a sound. Connections run
between each block, just like the wires leading from your keyboard to your computer
to your monitor. You can swap out your monitor for a new one, but your keyboard
and computer will still work, right? The same principle holds true for the outputs in
Teachable Machine—even when you switch them, the input and learning blocks still
work.
This means anyone can change the output of their model. After a few minutes of
training, you can use your model to show different GIFs, then switch to the speech
output. The model works just as well.
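The three-block structure can be expressed as plain function composition: the learning block is fixed once trained, and the output block is just a swappable function. All names here (`learning_block`, `gif_output`, `speech_output`) are illustrative stand-ins, not Teachable Machine’s API.

```python
def learning_block(frame):
    # Stand-in for a trained model: map one webcam frame to a class name.
    return "wave" if frame.get("motion", 0) > 0.5 else "idle"

def gif_output(class_name):
    # One possible output block: show a GIF for the detected class.
    return f"showing {class_name}.gif"

def speech_output(class_name):
    # Another output block: speak the detected class aloud.
    return f"saying '{class_name}'"

def run(frame, output_block):
    # Wire input -> learning -> output, like keyboard -> computer -> monitor.
    return output_block(learning_block(frame))

frame = {"motion": 0.9}                    # stand-in for one webcam frame
gif_result = run(frame, gif_output)        # GIF output
speech_result = run(frame, speech_output)  # same model, swapped output
```

Swapping `gif_output` for `speech_output` changes nothing upstream—exactly the monitor-swap analogy above.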
Train by example, not by rule-making
Another core principle of machine learning: the model is trained by
example, not by instruction. You teach it by showing it many examples—in
our case, individual image frames. When it’s running, or “inferring,” your model looks
at each new image frame, compares it to all the examples you’ve taught each class
(or category) and returns a confidence rating—a measure of how confident it is that
the new frame is similar to the classes you’ve trained before.
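One simple way to turn “compare the new frame to the stored examples” into a confidence rating is k-nearest-neighbor voting: a class’s confidence is the fraction of the new frame’s k closest training examples that belong to it. Teachable Machine used a nearest-neighbor approach in the browser, but the exact scoring below (and the tiny hand-made feature vectors) are an illustrative assumption.

```python
def knn_confidence(frame, examples, k=3):
    # examples: list of (class_name, feature_vector) pairs from training.
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # Find the k training examples closest to the new frame.
    nearest = sorted(examples, key=lambda e: distance(frame, e[1]))[:k]
    classes = {name for name, _ in examples}
    # Confidence per class = share of the k neighbors with that label.
    return {c: sum(1 for name, _ in nearest if name == c) / k for c in classes}

training = [
    ("class_1", [0.1, 0.1]), ("class_1", [0.2, 0.1]), ("class_1", [0.1, 0.2]),
    ("class_2", [0.9, 0.9]), ("class_2", [0.8, 0.9]),
]
confidence = knn_confidence([0.15, 0.12], training)
```

The new frame lands among the `class_1` examples, so all three of its nearest neighbors vote for `class_1` and that class gets full confidence.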
The interface displays each class’s examples individually as they’re recorded. This
also helps communicate an important distinction—that the model is not seeing
motion, but just the still images it captures every few milliseconds.