
Turn Data Into Models

This document discusses how machine learning models are trained using data rather than being handcrafted. It provides an example of using structured data from milk shopping trips to train a model to estimate trip duration. The model is trained by assigning initial weights to different inputs like time of day and weather, then allowing the computer to adjust the weights until estimates best match historical data. This creates a model that can then be tested to see if it can accurately predict trip times for new shopping trips. The example illustrates how different types of data require different machine learning techniques like supervised learning using structured data with clear inputs and outputs.

Uploaded by Gowtham Thalluri
Copyright © All Rights Reserved

The Shift from Crafting to Training


For decades, programmers have written code that takes an input, processes it using
a set of rules, and returns an output. For example, here’s how to find the average
of a set of numbers.

Input: 5, 8, 2, 9
Process: Add the values [5 + 8 + 2 + 9 = 24], then divide by the number of inputs [4]
Output: 6

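Written as code, that same hand-crafted rule might look like the minimal Python sketch below. The function name `average` is just an illustrative choice.

```python
from typing import Sequence


def average(values: Sequence[float]) -> float:
    """Hand-crafted rule: add the values, then divide by how many there are."""
    if not values:
        raise ValueError("average() needs at least one value")
    total = sum(values)         # 5 + 8 + 2 + 9 = 24
    return total / len(values)  # 24 / 4 = 6.0


print(average([5, 8, 2, 9]))    # prints 6.0
```

Every rule in this function was decided by a programmer in advance; nothing in it is learned from data.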
This simple set of rules for turning an input into an output is an example of an
algorithm. Algorithms have been written to perform some pretty sophisticated tasks.
But some tasks have so many rules (and exceptions) that it’s impossible to capture
them all in a hand-crafted algorithm. Swimming is a good example of a task that is
hard to encapsulate as a set of rules. You might get some advice before jumping in
the pool, but you only really figure out what works once you’re trying to keep your
head above water. Some things are learned best by experience.

What if we could train a computer in the same way? Not by tossing it into a pool, of
course, but by letting it figure out for itself what works to succeed at a task. But
just as learning to swim is very different from learning to speak a foreign language,
the right kind of training depends on the task.

Experience Required
Imagine that every time you went to the store to pick up milk, you tracked details
of the trip in a spreadsheet. It’s a little weird, but go with it. You set up the
following columns.

Is it the weekend?
Time of day
Is it raining or not?
Distance to store
Total minutes of trip

After several trips, you start getting a feel for how conditions affect how long
a trip will take. For instance, rain makes the drive longer, but it also means fewer
people are shopping. Your brain makes connections between the inputs (weekend [W],
time [T], raining [R], distance [D]) and the output (minutes [M]).

Diagram of inputs [W, T, R, D]
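One way to hold the spreadsheet in code is as a list of records, one per trip, where each record can hand a model its inputs [W, T, R, D] and its output [M]. This is a minimal Python sketch; the class name `MilkRun`, the field names, the units (hour of day, kilometres), and the two sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MilkRun:
    """One spreadsheet row. Field names and units are illustrative assumptions."""
    weekend: bool        # Is it the weekend?        -> W
    time_of_day: float   # Hour of the day, 0-23.99  -> T
    raining: bool        # Is it raining or not?     -> R
    distance_km: float   # Distance to store         -> D
    minutes: float       # Total minutes of trip     -> M (the output)

    def inputs(self) -> List[float]:
        """Return [W, T, R, D] as plain numbers (True/False become 1.0/0.0)."""
        return [float(self.weekend), self.time_of_day,
                float(self.raining), self.distance_km]


# Made-up rows, for illustration only.
log = [
    MilkRun(weekend=False, time_of_day=17.5, raining=True, distance_km=2.0, minutes=31),
    MilkRun(weekend=True, time_of_day=10.0, raining=False, distance_km=2.0, minutes=24),
]

for run in log:
    print(run.inputs(), "->", run.minutes)
```

Turning yes/no answers into 1 and 0 is one common way to let a model treat every column as a number it can weight.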

But how can we get a computer to notice trends in the data so it can estimate too?
One way is the guess-and-check method. Here’s how you do it.

Step 1: Assign all of your inputs a “weight.” This is a number that represents how
strongly an input should affect the output. It’s OK to start with the same weight
for everything.

Step 2: Use the weights with your existing data (and some clever math we won’t get
into here) to estimate the minutes for each milk run. Then compare the estimates to
the minutes you actually recorded. They’ll be way off, but that’s OK.

Step 3: Let the computer guess a new weight for each input, making some a little
more important than others. For example, the time of day might be more important
than whether or not it’s raining.

Step 4: Rerun the calculations to check whether the new weights produce better
estimates. If so, the new weights fit the data better, which means they are changing
in the right direction.

Step 5: Repeat steps 3 and 4, letting the computer tweak the weights until its
estimates stop getting better.
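The five steps can be sketched in Python as a simple random search, a basic form of hill climbing. Because the article deliberately skips the "clever math" in Step 2, this sketch assumes the estimate is just a weighted sum of the inputs and measures fit as the average gap in minutes. The trip history is synthetic, generated from made-up "hidden" weights, and the names (`estimate`, `mean_error`, `guess_and_check`) and settings (`step`, `patience`) are hypothetical.

```python
import random


def estimate(weights, inputs):
    """Assumed stand-in for Step 2's math: a weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def mean_error(weights, rows):
    """Average absolute gap, in minutes, between estimates and recorded trips."""
    return sum(abs(estimate(weights, x) - m) for x, m in rows) / len(rows)


def guess_and_check(rows, n_inputs, step=0.1, patience=300, seed=0):
    rng = random.Random(seed)
    weights = [1.0] * n_inputs            # Step 1: same starting weight for every input
    best = mean_error(weights, rows)      # Step 2: estimate, then compare with history
    misses = 0
    while misses < patience:              # Step 5: stop after `patience` failed guesses in a row
        # Step 3: guess new weights by nudging each one by a small random amount.
        candidate = [w + rng.uniform(-step, step) for w in weights]
        error = mean_error(candidate, rows)   # Step 4: rerun the calculations
        if error < best:                      # better fit: keep the new weights
            weights, best, misses = candidate, error, 0
        else:
            misses += 1
    return weights, best


# Synthetic history: ([W, T, R, D], M) pairs built from made-up hidden weights plus noise.
data_rng = random.Random(42)
rows = []
for _ in range(40):
    x = [float(data_rng.random() < 2 / 7),   # W: weekend about 2 days in 7
         data_rng.uniform(8, 21),            # T: hour of day
         float(data_rng.random() < 0.3),     # R: raining about 30% of the time
         data_rng.uniform(0.5, 5.0)]         # D: distance in km
    m = 2.0 * x[0] + 0.8 * x[1] + 0.3 * x[2] + 4.0 * x[3] + data_rng.gauss(0, 1)
    rows.append((x, m))

weights, error = guess_and_check(rows, n_inputs=4)
print("settled weights [W, T, R, D]:", [round(w, 2) for w in weights])
print("average error (minutes):", round(error, 2))
```

Because every guess is random, the weights it settles on are only roughly right. Practical machine learning libraries usually replace random guessing with gradient descent, which uses calculus to choose the direction of each tweak.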

At this point the computer has settled on a weight for each input. If you think of
a weight as how strongly an input is connected to the output, you can draw a diagram
that uses line thickness to represent the weight of each connection.

Diagram of input nodes connected to an output.

For this example it looks like the time of day has the strongest connection, but
apparently rain doesn’t make much of a difference.
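For a quick text version of that diagram, the sketch below draws each connection as a bar whose length is proportional to the size of its weight, strongest first. The weight values are hypothetical, picked to match the description above, and comparing weights this way only makes sense if the inputs were put on similar scales (for example, all rescaled to 0 to 1) before training.

```python
from typing import Dict


def connection_diagram(weights: Dict[str, float], width: int = 20) -> str:
    """Draw one bar per input; longer bars mean stronger connections to the output."""
    strongest = max(abs(w) for w in weights.values())
    lines = []
    # Sort so the strongest connection is listed first.
    for name, w in sorted(weights.items(), key=lambda item: -abs(item[1])):
        length = max(1, round(width * abs(w) / strongest))  # always draw at least one "="
        lines.append(f"{name:<12} {'=' * length}> minutes")
    return "\n".join(lines)


# Hypothetical settled weights (inputs assumed rescaled to 0-1 before training).
print(connection_diagram({"weekend": 0.4, "time of day": 1.0,
                          "raining": 0.05, "distance": 0.6}))
```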

This process of guess-and-check has created a model of our milk runs. And like a
model boat, we can take it to the pool to see if it floats, so to speak. That means
testing it in the real world. So for your next several milk runs, before you leave,
have the model estimate how long the trip will take. If its estimate is close to the
actual time enough times in a row, you can confidently let it do the estimating for
every future trip.
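That real-world test can be made concrete with a small check. In this sketch, "right" is assumed to mean within 5 minutes of the actual trip time, and "enough times in a row" is assumed to be 5 trips; both thresholds, the function name `ready_to_trust`, and the sample numbers are hypothetical.

```python
from typing import Sequence


def ready_to_trust(estimates: Sequence[float], actuals: Sequence[float],
                   tolerance: float = 5.0, streak_needed: int = 5) -> bool:
    """True once the model has been within `tolerance` minutes `streak_needed` trips in a row."""
    streak = 0
    for guess, actual in zip(estimates, actuals):
        if abs(guess - actual) <= tolerance:
            streak += 1
            if streak >= streak_needed:
                return True
        else:
            streak = 0  # one bad miss resets the count
    return False


# Made-up estimates (made before leaving) and actual trip times, in minutes.
estimates = [22, 30, 27, 25, 19, 24]
actuals = [24, 41, 26, 27, 20, 22]
print(ready_to_trust(estimates, actuals))                # False: the 11-minute miss resets the streak
print(ready_to_trust(estimates + [33], actuals + [31]))  # True: five close estimates in a row
```

The trips used for this check should be new ones the model never saw during training; that is what makes it a fair test.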

Use the Right Data for the Right Job


This is a very simple example of using training to make an AI model, but it touches
on some important ideas. First, it’s an example of machine learning (ML), which is
the process of using large amounts of data to train a model to make predictions,
instead of handcrafting an algorithm.

Second, not all data is the same. In our milk run example, the spreadsheet is what
we would call structured data. It is well organized, with labels on every column so
you know the significance of every cell. In contrast, unstructured data would be
something like a news article, or an unlabeled image file. The kind of data that
you have available will affect what kind of training you can do.
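The difference shows up as soon as you try to use the data in a program. In this sketch, with a made-up trip and note, the structured row answers "how long did it take?" by column name, while the same facts written as free text leave a simple pattern match unable to tell which number means what.

```python
import re

# Structured: every value sits under a labeled column, so its meaning is explicit.
trip = {"weekend": True, "time_of_day": 10.0, "raining": False,
        "distance_km": 2.0, "minutes": 24}
print(trip["minutes"])  # 24

# Unstructured: the same facts as free text, with no labels attached.
note = "Saturday morning milk run, dry out, 2 km away, took 24 minutes."
numbers = re.findall(r"\d+", note)
print(numbers)  # ['2', '24']: the regex alone can't tell which one is the trip time
```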

Third, the structured data from our spreadsheet lets computers do supervised
learning. It’s considered supervised because we can make sure every piece of input
data has a matching, expected output that we can verify. Conversely, unsupervised
learning uses data with no expected outputs, such as a pile of unlabeled news
articles or images. The AI tries to find connections in the data without really
knowing what it’s looking for.
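To make the contrast concrete: supervised learning needs an expected output for each input, while unsupervised learning only has the inputs. The sketch below uses k-means clustering (with k = 2, in one dimension), a standard unsupervised technique that the article itself doesn't name, to split a made-up list of unlabeled trip times into two groups without being told what the groups mean. The function name `two_means` is hypothetical.

```python
from typing import List, Sequence, Tuple


def two_means(values: Sequence[float], rounds: int = 10) -> Tuple[List[float], List[float]]:
    """1-D k-means with k = 2: split unlabeled numbers into a low group and a high group."""
    low, high = min(values), max(values)  # start the two group centers at the extremes
    if low == high:
        return list(values), []           # all values identical: nothing to split
    for _ in range(rounds):
        # Assign each value to whichever center is closer (ties go to the low group).
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the average of the values assigned to it.
        low = sum(low_group) / len(low_group)
        high = sum(high_group) / len(high_group)
    return low_group, high_group


# Made-up trip times in minutes, with no labels saying which trips were "quick".
quick, slow = two_means([12, 31, 14, 33, 13, 30])
print(quick, slow)
```

The algorithm finds the two groups on its own; deciding that they mean "quick trips" and "slow trips" is still up to you.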

Letting the computer figure out a single weight for each input is just one kind of
training regimen. But interconnected systems are often more complicated than one
weight per input can represent. Thankfully, as you’ll learn in the next unit, there
are other ways to train!
