
Ai Study Progress

The document provides an introduction to Artificial Intelligence (AI) and its various types, including Machine Learning and Deep Learning, along with their historical context and applications. It explains fundamental machine learning concepts such as features, labels, models, and the differences between regression and classification, while also detailing the process of training models using labeled examples. Additionally, it covers linear regression, loss functions, and the importance of minimizing loss during model training.
Copyright
© All Rights Reserved

Week 1: Introduction to AI

● Topics:
○ What is Artificial Intelligence (AI)?
○ Different types of AI (Machine Learning, Deep Learning)
○ History and Applications of AI (real-world examples)
● Learning Resources:
○ Online courses: Introduction to AI by [insert platform], Crash Course AI by
Google
○ Books: "Artificial Intelligence: A Modern Approach" by Stuart Russell and Peter
Norvig (advanced), "Superintelligence" by Nick Bostrom (broader view)
○ Documentaries: "AlphaGo" (on Netflix)

Google certification course

This course comprises 25 lessons.
Uses of AI:
● Customize products
● Complete seemingly
● Do things that humans can't code by hand, such as face recognition

Machine Learning Crash Course with TensorFlow APIs
Framing

What is (supervised) machine learning? Concisely put, it is the following:

● ML systems learn how to combine input to produce useful predictions on
never-before-seen data.

Let's explore fundamental machine learning terminology.

Labels
A label is the thing we're predicting—the y variable in simple linear regression. The label
could be the future price of wheat, the kind of animal shown in a picture, the meaning of
an audio clip, or just about anything.

Features
A feature is an input variable—the x variable in simple linear regression. A simple
machine learning project might use a single feature, while a more sophisticated
machine learning project could use millions of features, specified as:
$x_1, x_2, \ldots, x_N$

In the spam detector example, the features could include the following:

● words in the email text
● sender's address
● time of day the email was sent
● email contains the phrase "one weird trick."

Examples
An example is a particular instance of data, x. (We put x in boldface to indicate that it is
a vector.) We break examples into two categories:

● labeled examples
● unlabeled examples

A labeled example includes both feature(s) and the label. That is:

labeled examples: {features, label}: (x, y)

Use labeled examples to train the model. In our spam detector example, the labeled
examples would be individual emails that users have explicitly marked as "spam" or "not
spam."

For example, the following table shows 5 labeled examples from a data set containing
information about housing prices in California:

housingMedianAge   totalRooms   totalBedrooms   medianHouseValue
(feature)          (feature)    (feature)       (label)

15                 5612         1283            66900
19                 7650         1901            80100
17                 720          174             85700
14                 1501         337             73400
20                 1454         326             65500

An unlabeled example contains features but not the label. That is:

unlabeled examples: {features, ?}: (x, ?)

Here are 3 unlabeled examples from the same housing dataset, which exclude
medianHouseValue:

housingMedianAge   totalRooms   totalBedrooms
(feature)          (feature)    (feature)

42                 1686         361
34                 1226         180
33                 1077         271

Once we've trained our model with labeled examples, we use that model to predict the
label on unlabeled examples. In the spam detector, unlabeled examples are new emails
that humans haven't yet labeled.
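The distinction between labeled and unlabeled examples can be sketched as a small data structure. This is only an illustration: the `Example` class and its `is_labeled` property are hypothetical names, and the rows are copied from the housing tables above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Example:
    # Features in the order (housingMedianAge, totalRooms, totalBedrooms).
    features: Tuple[float, float, float]
    # medianHouseValue, or None when the example is unlabeled.
    label: Optional[float] = None

    @property
    def is_labeled(self) -> bool:
        return self.label is not None


examples = [
    Example((15, 5612, 1283), 66900),  # labeled: (x, y)
    Example((19, 7650, 1901), 80100),  # labeled: (x, y)
    Example((42, 1686, 361)),          # unlabeled: (x, ?)
    Example((34, 1226, 180)),          # unlabeled: (x, ?)
]

# Labeled examples are used for training; unlabeled ones are what we predict on.
training_set = [ex for ex in examples if ex.is_labeled]
to_predict = [ex for ex in examples if not ex.is_labeled]
print(len(training_set), "labeled,", len(to_predict), "unlabeled")
```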

Models
A model defines the relationship between features and label. For example, a spam
detection model might associate certain features strongly with "spam". Let's highlight
two phases of a model's life:

● Training means creating or learning the model. That is, you show the model
labeled examples and enable the model to gradually learn the relationships
between features and label.
● Inference means applying the trained model to unlabeled examples. That is, you
use the trained model to make useful predictions (y'). For example, during
inference, you can predict medianHouseValue for new unlabeled examples.
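The two phases can be sketched with a deliberately trivial "model" that learns only the mean label and predicts it for every input. This baseline is an assumption made for illustration, not the method the course uses; `train` and `infer` are hypothetical names, and the data comes from the housing tables above.

```python
def train(labeled_examples):
    """Training: learn the model's parameters from (features, label) pairs."""
    labels = [label for _features, label in labeled_examples]
    # The only "parameter" of this baseline is the average label.
    return {"mean_label": sum(labels) / len(labels)}


def infer(model, features):
    """Inference: apply the trained model to an unlabeled example."""
    # This baseline ignores the features; a real model would use them.
    return model["mean_label"]


labeled_examples = [
    ((15, 5612, 1283), 66900),
    ((19, 7650, 1901), 80100),
    ((17, 720, 174), 85700),
    ((14, 1501, 337), 73400),
    ((20, 1454, 326), 65500),
]

model = train(labeled_examples)
predicted_value = infer(model, (42, 1686, 361))  # y' for a new unlabeled example
print(predicted_value)
```

Even this crude baseline shows the separation: `train` only ever sees labeled examples, and `infer` only ever sees features.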

Regression vs. classification


A regression model predicts continuous values. For example, regression models make
predictions that answer questions like the following:

● What is the value of a house in California?
● What is the probability that a user will click on this ad?

A classification model predicts discrete values. For example, classification models
make predictions that answer questions like the following:

● Is a given email message spam or not spam?
● Is this an image of a dog, a cat, or a hamster?

Descending into ML: Linear Regression



It has long been known that crickets (an insect species) chirp more frequently on hotter
days than on cooler days. For decades, professional and amateur scientists have
cataloged data on chirps-per-minute and temperature. As a birthday gift, your Aunt Ruth
gives you her cricket database and asks you to learn a model to predict this relationship.
Using this data, you want to explore this relationship.

First, examine your data by plotting it:

Figure 1. Chirps per Minute vs. Temperature in Celsius.

As expected, the plot shows the temperature rising with the number of chirps. Is this
relationship between chirps and temperature linear? Yes, you could draw a single
straight line like the following to approximate this relationship:
Figure 2. A linear relationship.

True, the line doesn't pass through every dot, but the line does clearly show the
relationship between chirps and temperature. Using the equation for a line, you could
write down this relationship as follows:

$y = mx + b$

where:

● $y$ is the temperature in Celsius, the value we're trying to predict.
● $m$ is the slope of the line.
● $x$ is the number of chirps per minute, the value of our input feature.
● $b$ is the y-intercept.

By convention in machine learning, you'll write the equation for a model slightly
differently:

$y' = b + w_1 x_1$

where:

● $y'$ is the predicted label (a desired output).
● $b$ is the bias (the y-intercept), sometimes referred to as $w_0$.
● $w_1$ is the weight of feature 1. Weight is the same concept as the "slope" $m$ in the
traditional equation of a line.
● $x_1$ is a feature (a known input).

To infer (predict) the temperature $y'$ for a new chirps-per-minute value $x_1$, just
substitute the $x_1$ value into this model.
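As a minimal sketch, inference with a one-feature model of the form y' = b + w1 * x1 is a single multiply and add. The function name and the parameter values below are hypothetical, chosen only so the arithmetic is easy to follow. They are not fitted to any cricket data.

```python
def predict_temperature(chirps_per_minute: float, bias: float, weight: float) -> float:
    """Return y' = b + w1 * x1 for a single feature."""
    return bias + weight * chirps_per_minute


# Hypothetical parameters for illustration only.
b = 10.0
w1 = 0.5

print(predict_temperature(20, b, w1))  # 10.0 + 0.5 * 20 = 20.0
```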

Although this model uses only one feature, a more sophisticated model might rely on
multiple features, each having a separate weight ($w_1$, $w_2$, etc.). For example, a
model that relies on three features might look as follows:

$y' = b + w_1 x_1 + w_2 x_2 + w_3 x_3$
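The multi-feature form is a weighted sum of the features plus the bias. The sketch below assumes features and weights are given as equal-length sequences; `predict_multi` and the numbers are hypothetical.

```python
from typing import Sequence


def predict_multi(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Return y' = b + w1*x1 + w2*x2 + ... for any number of features."""
    if len(features) != len(weights):
        raise ValueError("each feature needs exactly one weight")
    return bias + sum(w * x for w, x in zip(weights, features))


# Three features, three weights (illustrative values only).
print(predict_multi([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 4.0))  # 4 + 0.5 - 2 + 6 = 8.5
```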

Imagine we have a small dataset of chirps per minute and temperature:

Chirps per Minute (x)   Temperature (Celsius) (y)

10                      18
15                      20
22                      24
30                      26
Goal: Predict the temperature based on the number of chirps per minute.

Steps:

1. Visually inspect the data:

Plot the chirps per minute (x) on the horizontal axis and temperature (y) on the
vertical axis. You should see a positive trend, with temperature increasing as chirps
increase.

2. Calculate the slope (w₁):

There are different formulas for calculating the slope in linear regression, but a
common one is the least-squares formula:

w₁ = Σ[(xᵢ - x̅) * (yᵢ - ȳ)] / Σ(xᵢ - x̅)²

where:

Σ represents summation (adding all the values)
xᵢ is the individual chirp value
x̅ is the average chirp value (sum of all chirp values divided by the number of data
points)
yᵢ is the individual temperature value
ȳ is the average temperature value (sum of all temperatures divided by the number of
data points)

Calculate the average chirp and temperature:

Average chirp (x̅) = (10 + 15 + 22 + 30) / 4 = 19.25
Average temperature (ȳ) = (18 + 20 + 24 + 26) / 4 = 22

Plug the values into the slope formula (replace Σ with actual summation):

w₁ = [(10 - 19.25) * (18 - 22) + (15 - 19.25) * (20 - 22) + (22 - 19.25) * (24 - 22)
+ (30 - 19.25) * (26 - 22)] /
[(10 - 19.25)² + (15 - 19.25)² + (22 - 19.25)² + (30 - 19.25)²]
w₁ = 94 / 226.75 ≈ 0.4146

3. Calculate the bias (b):

For a least-squares line, the bias follows from the average temperature (ȳ), the slope
(w₁) we just calculated, and the average chirp value (x̅), because the fitted line
always passes through the point (x̅, ȳ):

b = ȳ - w₁ * x̅
b = 22 - 0.4146 * 19.25 ≈ 14.02

4. The Model Equation:

Now we have the slope (w₁ ≈ 0.4146) and bias (b ≈ 14.02). We can express the linear
regression model as:

Predicted Temperature (y') = 14.02 + 0.4146 * Chirps per Minute (x)

5. Use the model to predict temperature:

For example, if you hear 25 chirps per minute, you can predict the temperature using
the model:

Predicted Temperature = 14.02 + 0.4146 * 25 ≈ 24.4 degrees Celsius
Remember:

● This is a simplified example with a small dataset. In real-world applications, you'd
have more data and potentially more complex models.
● The model assumes a linear relationship, which might not be perfectly accurate for
all data points.
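The hand calculation can be checked with a short script. This is a sketch of the standard closed-form least-squares fit for one feature, in which the slope is the ratio of the two sums and the bias is computed from the means as b = ȳ - w₁ * x̅. The function name `fit_line` is hypothetical, and the data is the four-point table from this example.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y' = b + w1 * x for one feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The fitted line passes through the point of means (x_mean, y_mean).
    bias = y_mean - slope * x_mean
    return slope, bias


chirps = [10, 15, 22, 30]   # chirps per minute (x)
temps = [18, 20, 24, 26]    # temperature in Celsius (y)

w1, b = fit_line(chirps, temps)
prediction = b + w1 * 25    # predicted temperature at 25 chirps per minute

print(f"slope w1 = {w1:.4f}")
print(f"bias b = {b:.4f}")
print(f"prediction at 25 chirps = {prediction:.2f}")
```

Running the script is a quick way to catch arithmetic slips in the manual steps, since a single rounding or sign error in the sums changes every later number.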

Training a model simply means learning (determining) good values for all the weights
and the bias from labeled examples. In supervised learning, a machine learning
algorithm builds a model by examining many examples and attempting to find a model
that minimizes loss; this process is called empirical risk minimization.

Loss is the penalty for a bad prediction. That is, loss is a number indicating how bad the
model's prediction was on a single example. If the model's prediction is perfect, the loss
is zero; otherwise, the loss is greater. The goal of training a model is to find a set of
weights and biases that have low loss, on average, across all examples. For example,
Figure 3 shows a high loss model on the left and a low loss model on the right. Note the
following about the figure:

● The arrows represent loss.
● The blue lines represent predictions.

Figure 3. High loss in the left model; low loss in the right model.

Notice that the arrows in the left plot are much longer than their counterparts in the
right plot. Clearly, the line in the right plot is a much better predictive model than the line
in the left plot.

You might be wondering whether you could create a mathematical function—a loss
function—that would aggregate the individual losses in a meaningful fashion.

Squared loss: a popular loss function


The linear regression models we'll examine here use a loss function called squared loss
(also known as L2 loss). The squared loss for a single example is as follows:

= the square of the difference between the label and the prediction

= $(observation - prediction(x))^2$

= $(y - y')^2$
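For a single example, squared loss is a one-line computation. The sketch below uses a hypothetical function name and made-up label and prediction values.

```python
def squared_loss(label: float, prediction: float) -> float:
    """Squared (L2) loss for one example: (y - y')^2."""
    return (label - prediction) ** 2


print(squared_loss(24.0, 22.0))  # (24 - 22)^2 = 4.0
print(squared_loss(24.0, 24.0))  # perfect prediction, so the loss is 0.0
```

Because the difference is squared, an error of 2 degrees costs four times as much as an error of 1 degree, and the sign of the error does not matter.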

Mean square error (MSE) is the average squared loss per example over the whole
dataset. To calculate MSE, sum up all the squared losses for individual examples and
then divide by the number of examples:

$MSE = \frac{1}{N} \sum_{(x,y) \in D} (y - prediction(x))^2$

where:

● $(x, y)$ is an example in which
○ $x$ is the set of features (for example, chirps/minute, age, gender) that the
model uses to make predictions.
○ $y$ is the example's label (for example, temperature).
● $prediction(x)$ is a function of the weights and bias in combination with the set of
features $x$.
● $D$ is a data set containing many labeled examples, which are $(x, y)$ pairs.
● $N$ is the number of examples in $D$.
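MSE can be used to compare candidate models on the same data set. The sketch below reuses the four chirp and temperature pairs from the worked example as D. The two candidate lines are hypothetical, picked by hand to contrast a reasonable fit with a poor one.

```python
def mse(dataset, prediction):
    """Mean squared error: average of (y - prediction(x))^2 over all (x, y) in D."""
    return sum((y - prediction(x)) ** 2 for x, y in dataset) / len(dataset)


# D: (chirps per minute, temperature in Celsius) pairs.
D = [(10, 18), (15, 20), (22, 24), (30, 26)]


def good_model(x):
    # Hypothetical line that roughly follows the upward trend.
    return 14 + 0.4 * x


def poor_model(x):
    # Hypothetical line with the wrong slope sign.
    return 30 - 0.2 * x


good_mse = mse(D, good_model)
poor_mse = mse(D, poor_model)
print(f"good model MSE: {good_mse:.2f}")  # about 0.36
print(f"poor model MSE: {poor_mse:.2f}")  # about 38.89
```

Training amounts to searching for the weights and bias that make this number as small as possible.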

Although MSE is commonly used in machine learning, it is neither the only practical loss
function nor the best loss function for all circumstances.