Material For Student CAIPC (V062021A) EN

Learning Objectives

• Understand the fundamentals of Artificial Intelligence and Machine Learning


• Describe the methods of machine learning: supervised and unsupervised
• Use data analysis for decision-making
• Understand the limits of algorithms
• Understand and apply Python programming, essential mathematics for AI, and basic programming methods

Who should attend this certification?


• Anyone interested in expanding their knowledge in Artificial Intelligence and Machine Learning
• Engineers, Analysts, Marketing Managers
• Data Analysts, Data Scientists, Data Stewards
• Anyone interested in Data Mining and Machine Learning techniques

Who is CertiProf®?
CertiProf® is a certifying entity founded in the United States in 2015, currently located in Sunrise,
Florida.

Our philosophy is based on creating knowledge in community, and for this our collaborative network is formed by:

• Our Lifelong Learners (LLLs) identify themselves as continuous learners, demonstrating their unwavering commitment to lifelong learning, which is vitally important in today's ever-changing and expanding digital world, whether they pass the exam or not.
• Universities, training centers, and facilitators around the world are part of our network of allies, the ATPs (Authorized Training Partners).
• The authors (co-creators) are industry experts or practitioners who, with their knowledge, develop content for new certifications that respond to industry needs.
• Internal Staff: Our distributed team, with operations in India, Brazil, Colombia, and the United States, is in charge of overcoming obstacles, finding solutions, and delivering exceptional results.

Our Accreditations and Affiliations

Agile Alliance

CertiProf® is a Corporate Member of the Agile
Alliance.

By joining the Agile Alliance corporate program, we continue empowering individuals by helping them reach their potential through education. Every day, we provide more tools and resources that allow our partners to train professionals looking to improve their professional development and skills.

https://www.agilealliance.org/organizations/
certiprof/

IT Certification Council - ITCC

CertiProf® is an active Member of ITCC.

The fundamental purpose of the ITCC is to support the industry and its member companies by marketing the value of certification, promoting exam security, encouraging innovation, and establishing and sharing industry best practices.

Credly

This alliance allows people and companies certified or accredited with CertiProf® to have a worldwide distinction through a digital badge.

Credly is leading the digital credential movement, and companies such as IBM, Microsoft, PMI, Nokia, and Stanford University, among others, issue their badges with Credly.
Presentation

Name

Company

Job title

Experience

Expectations

Badge

https://www.credly.com/org/certiprof/badge/artificial-intelligence-professional-certificate-ca
Lifelong Learning
Holders of this particular badge have
demonstrated their unwavering commitment
to lifelong learning, which is vitally important
in today's ever-changing and expanding digital
world. It also identifies the qualities of an
open, disciplined and constantly evolving mind,
capable of using and contributing its knowledge
to the development of a more egalitarian and
better world.

Acquisition Criteria:
• Be a candidate for a CertiProf certification
• Be a continuous and focused learner
• Identify with the concept of lifelong learning
• Really believe and identify with the concept
that knowledge and education can and should
change the world.
• Want to boost your professional growth
Agenda

Machine Learning Fundamentals
  I.1 Key Points
    Supervised Machine Learning
    Unsupervised Machine Learning
    Reinforcement Machine Learning
  I.2 Introduction to K-Nearest Neighbors
    Introduction
    Introduction to the Data
    K-nearest Neighbors
    Euclidean Distance
    Calculate Distance for All Observations
    Randomizing and Sorting
    Average Price
    Functions for Prediction
  I.3 Evaluating Model Performance
    Testing Quality of Predictions
    Error Metrics
    Mean Squared Error
    Training Another Model
    Root Mean Squared Error
    Comparing MAE and RMSE
  I.4 Multivariate K-Nearest Neighbors
    Recap
    Removing Features
    Handling Missing Values
    Normalize Columns
    Euclidean Distance for Multivariate Case
    Introduction to Scikit-learn
    Fitting a Model and Making Predictions
    Calculating MSE using Scikit-Learn
    Using More Features
    Using All Features
  I.5 Hyperparameter Optimization
    Recap
    Hyperparameter Optimization
    Expanding Grid Search
    Visualizing Hyperparameter Values
  I.6 Cross Validation
    Concept
    Holdout Validation
    K-Fold Cross Validation
  I.7 Guided Project: Predicting Car Prices
II Calculus For Machine Learning
  Understanding Linear and Nonlinear Functions
  Understanding Limits
  Finding Extreme Points
III Linear Algebra For Machine Learning
  Linear Systems
  Vectors
  Matrix Algebra
  Solution Sets
IV Linear Regression For Machine Learning
  The Linear Regression Model
  Feature Selection
  Gradient Descent
  Ordinary Least Squares
  Processing And Transforming Features
  Guided Project: Predicting House Sale Prices
V Machine Learning in Python
  Logistic Regression
  Introduction to Evaluating Binary Classifiers
  Multiclass Classification
  Overfitting
  Clustering Basics
  K-means Clustering
  Guided Project: Predicting the Stock Market
VI Decision Tree
  Why use Decision Trees?
  Decision Tree Terminologies
  How Does the Decision Tree Algorithm Work
  Pruning: Getting an Optimal Decision Tree
  Advantages of the Decision Tree
  Disadvantages of the Decision Tree
  Python Implementation of Decision Tree
  Guided Project: Predicting Bike Rentals
References and Bibliography
Machine Learning Fundamentals
In 1959, Arthur Samuel, a computer scientist who pioneered the study of artificial intelligence,
described machine learning as “the study that gives computers the ability to learn without being
explicitly programmed.” Alan Turing’s seminal paper (Turing, 1950) introduced a benchmark standard
for demonstrating machine intelligence, such that a machine has to be intelligent and responsive in a
manner that cannot be differentiated from that of a human being.

Machine Learning is an application of Artificial Intelligence where a computer/machine learns from past experiences (input data) and makes future predictions. The performance of such a system should be at least at human level.

In this material, we will focus on clustering problems for unsupervised machine learning with the K-Means algorithm. For supervised machine learning, we will describe the classification problem with a demonstration of the decision tree algorithm, and the regression problem with an example of linear regression. The following figure summarizes the types of machine learning, with some example algorithms:


Figure 1. Types of Machine Learning

I.1 Key Points
Supervised Machine Learning
Supervised learning is an approach to creating artificial intelligence (AI), where a computer algorithm
is trained on input data that has been labeled for a particular output. The model is trained until it
can detect the underlying patterns and relationships between the input data and the output labels,
enabling it to yield accurate labeling results when presented with never-before-seen data.

Classification

A classification algorithm aims to sort inputs into a given number of categories or classes, based on
the labeled data it was trained on. Classification algorithms can be used for binary classifications such
as filtering email into spam or non-spam and categorizing customer feedback as positive or negative.
Feature recognition, such as recognizing handwritten letters and numbers or classifying drugs into
many different categories, is another classification problem solved by supervised learning.

Regression

Regression analysis consists of a set of machine learning methods that allow us to predict a continuous
outcome variable (y) based on the value of one or multiple predictor variables (x). Briefly, the goal of
a regression model is to build a mathematical equation that defines y as a function of the x variables.

Unsupervised Machine Learning


Unsupervised learning is a machine learning technique in which the user does not need to supervise the model. Instead, it allows the model to work on its own to discover patterns and information that were previously undetected. It mainly deals with unlabeled data.

Clustering



This method of unsupervised classification brings together a set of learning algorithms whose goal is
to group together unlabeled data with similar properties. Isolating patterns or families in this way also
prepares the ground for the subsequent application of supervised learning algorithms (such as KNN).

Reinforcement Machine Learning


Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents
ought to take actions in an environment in order to maximize the notion of cumulative reward.
Reinforcement learning is one of three basic machine learning paradigms, alongside supervised
learning and unsupervised learning.

I.2 Introduction to K-Nearest Neighbors
Introduction
At its core, data science helps us make sense of the massive world of information all around us — a
world that's far too complex to study directly by ourselves. Data is the record of everything that's going
on and what we should learn from it. The real value of all this information is what it means. Machine
learning helps us discover patterns in data, which is where meaning lives. When we can see what the
data means, we can make predictions about the future. In this lesson, we'll explore machine learning
with a technique called "K-Nearest Neighbors." We'll use a dataset of AirBnB rental rates to identify
similar rates in one area for competing AirBnB units and make predictions for ideal rates to maximize
profit. You'll need to be comfortable programming in Python, and you'll need to be familiar with the
NumPy and pandas libraries. Here are a few takeaways you can expect from this lesson:

• The basics of the machine learning workflow
• How the K-Nearest Neighbors algorithm works
• The role of Euclidean distance in machine learning
Now, let's get to know our dataset.

Introduction to the Data


Although AirBnB does not publish any data on listings in its marketplace, an independent group called
Inside AirBnB has extracted data on a sample of listings in many of the website's major cities. In this
lesson we will make use of the extracted data for the city of Washington.
Each row in the dataset is a specific listing that was available for rent on AirBnB in the Washington, D.C.
area. The dataset is stored in a file with a csv (comma separated values) extension named dc_airbnb.csv.

You can click on it to download and view the file.


• host_response_rate: the response rate of the host
• host_acceptance_rate: number of requests to the host that convert to rentals
• host_listings_count: number of the host's other listings
• latitude: latitude of the geographic coordinates
• longitude: longitude of the geographic coordinates
• city: the city of the rental
• zipcode: the zip code of the rental
• state: the state the rental is in
• accommodates: the number of guests the rental can accommodate
• room_type: the type of rental (Private room, Shared room, or Entire home/apt)
• bedrooms: number of bedrooms included in the rental
• bathrooms: number of bathrooms included in the rental
• beds: number of beds included in the rental
• price: nightly price for the rental
• cleaning_fee: additional fee for cleaning the rental after the guest leaves
• security_deposit: refundable security deposit, in case of damages
• minimum_nights: minimum number of nights a guest can stay at the rental
• maximum_nights: maximum number of nights a guest can stay at the rental
• number_of_reviews: number of reviews that previous guests have left
The following is an exploratory data analysis (EDA), in which the data in each column can be visualized in detail and with graphs: the maximum and minimum of a numerical column, the most frequent value in a column, the number of null and non-null values, and so on. It is advisable to download the file from the link provided and open it on your computer, so you can interact with it from a web browser.

Example of visualization:
Let's read the dataset into Pandas and become more familiar with it.

Instructions

1. In the code editor on the right, write code that does the following:
• Read dc_airbnb.csv into a DataFrame named dc_listings
• Use the print function to display the first row in dc_listings

Solutions
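A minimal sketch of one possible solution (assuming dc_airbnb.csv is in the working directory):

import pandas as pd

# Read the listings into a DataFrame and inspect the first row
dc_listings = pd.read_csv('dc_airbnb.csv')
print(dc_listings.iloc[0])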

K-nearest Neighbors
Here's the strategy we wanted to use:
• Find a few similar listings
• Calculate the average nightly rental price of these listings
• Set the average price as the price for our listing
The k-nearest neighbors algorithm is similar to this strategy. Here's an overview:

Figure 2. K-Nearest Neighbors steps
There are two things we need to unpack in more detail:
• The similarity metric
• How to choose the k value
In this lesson, we'll define what similarity metric we're going to use. Then, we’ll implement the k-nearest
neighbors algorithm and use it to suggest a price for a new, unpriced listing. We'll use a k value of 5 in
this lesson.

Euclidean Distance
The similarity metric works by comparing a fixed set of numerical features (another word for attributes)
between two observations, or living spaces in our case. When trying to predict a continuous value,
like price, the main similarity metric is Euclidean distance. Here's the general formula for Euclidean
distance:
d = √((q1 − p1)² + (q2 − p2)² + … + (qn − pn)²)

Where q1 to qn represent the feature values for one observation and p1 to pn represent the feature values for the other observation. Here's a diagram that breaks down the Euclidean distance between the first two observations in the dataset using only the host_listings_count, accommodates, bedrooms, bathrooms, and beds columns.
In this lesson, we'll use just one feature to keep things simple as you become familiar with the machine learning workflow. Since we're only using one feature, this is the univariate case. The formula for the univariate case is:

d = √((q1 − p1)²)

The square root and the squared power cancel, and the formula simplifies to:

d = |q1 − p1|

The living space that we want to rent can accommodate three people. Let's first calculate the distance,
using just the accommodates feature, between the first living space in the dataset and our own.

Instructions

1. Calculate the Euclidean distance between our living space, which can accommodate three people,
and the first living space in the dc_listings DataFrame
2. Assign the result to first_distance and display the value using the print function

Solutions
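One way this could look in code, continuing from the dc_listings DataFrame loaded earlier (our listing accommodates 3 people):

import numpy as np

our_acc_value = 3
first_living_space_value = dc_listings.iloc[0]['accommodates']

# In the univariate case, Euclidean distance reduces to an absolute difference
first_distance = np.abs(first_living_space_value - our_acc_value)
print(first_distance)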

Calculate Distance for All Observations


The Euclidean distance between the first row in the dc_listings DataFrame and our own living space
is 1.

How do we know if this is high or low? If you look at the Euclidean distance equation itself, the lowest
value you can achieve is 0. This happens when the value for the feature is exactly the same for both
observations you're comparing. If p1=q1, then d=|q1−p1|, which results in d=0. The closer to 0 the
distance is, the more similar the living spaces are.

If we want to calculate the Euclidean distance between each living space in the dataset and a living
space that accommodates 8 people, here's a preview of what that would look like.

Then, we can rank the existing living spaces by ascending distance values, the proxy for similarity.

Instructions

1. Calculate the distance between each value in the accommodates column from dc_listings and the
value 3, which is the number of people our listing accommodates:
• Use the apply method to calculate the absolute value of the difference between each value in accommodates and 3, and return a new Series containing the distance values
2. Assign the distance values to the distance column
3. Use the Series method value_counts and the print function to display the unique value counts for
the distance column

Solutions
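A sketch of one possible solution using apply:

import numpy as np

new_listing = 3

# Absolute difference between each accommodates value and our listing's value
dc_listings['distance'] = dc_listings['accommodates'].apply(lambda x: np.abs(x - new_listing))
print(dc_listings['distance'].value_counts())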

Randomizing and Sorting
It looks like there are quite a few living spaces (461, to be precise) that can accommodate three people just like ours. This means the five "nearest neighbors" we select after sorting will all have a distance value of zero.

If we sort by the distance column and then select the first 5 living spaces, we would be biasing the result to the ordering of the dataset.

Instead, let's randomize the ordering of the dataset and then sort the DataFrame by the distance column. This way, all of the living spaces that accommodate the same number of people will still be at the top of the DataFrame, but they will be in random order across the first 461 rows.

We have set a random seed, so we can perform answer-checking on our end.

Instructions

1. Randomize the order of the rows in dc_listings:


• Use the np.random.permutation() function to return a NumPy array of shuffled index values
• Use the DataFrame method loc[] to return a new DataFrame containing the shuffled order
• Assign the new DataFrame back to dc_listings


2. After randomization, sort dc_listings by the distance column, and assign back to dc_listings
3. Display the first 10 values in the price column using the print function

Solutions
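A possible solution sketch (the seed value shown here is illustrative, not necessarily the one used for answer-checking):

import numpy as np

np.random.seed(1)  # illustrative seed

# Shuffle the rows, then sort by distance so ties are broken in random order
dc_listings = dc_listings.loc[np.random.permutation(len(dc_listings))]
dc_listings = dc_listings.sort_values('distance')
print(dc_listings['price'].head(10))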

Average Price
Before we can select the five most similar living spaces and compute the average price, we need to
clean the price column.

Right now, the price column contains comma characters (,) and dollar sign characters and is a text
column instead of a numeric column. We need to remove these values and convert the entire column
to the float datatype. Then, we can calculate the average price.

Instructions

1. Remove the commas (,) and dollar sign characters ($) from the price column:
• Use the str accessor so we can apply string methods to each value in the column, followed by the string method replace to replace all comma characters with the empty string: stripped_commas = dc_listings['price'].str.replace(',', '')
• Repeat to remove the dollar sign characters
2. Convert the new Series object containing the cleaned values to the float datatype and assign back
to the price column in dc_listings
3. Calculate the mean of the first five values in the price column and assign to mean_price
4. Use the print function or the variable inspector below to display mean_price

Solutions
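One way to implement the cleaning and the average (regex=False keeps the replacements literal):

# Strip commas and dollar signs, then convert to float
stripped_commas = dc_listings['price'].str.replace(',', '', regex=False)
stripped_dollars = stripped_commas.str.replace('$', '', regex=False)
dc_listings['price'] = stripped_dollars.astype('float')

# Average price of the five nearest neighbors (first five rows after sorting)
mean_price = dc_listings['price'].iloc[0:5].mean()
print(mean_price)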

Functions for Prediction


Congrats! You've just made your first prediction! Based on the average price of other listings that accommodate three people, we should charge 158.6 dollars per night for a guest to stay at our living space.

Let's write a more general function that can suggest the optimal price for other values of
the accommodates column.

The dc_listings DataFrame has information specific to our living space (e.g., the distance column).

To save time, we've reset the dc_listings DataFrame to a clean slate and only kept the data cleaning
and randomization we did since those weren't unique to the prediction we were making for our living
space.

Instructions

1. Write a function named predict_price that can use the k-nearest neighbors machine learning
technique to calculate the suggested price for any value for accommodates. This function should
do the following:
• Take in a single parameter, new_listing, that describes the number of people the listing accommodates
• (We've added code that assigns dc_listings to a new DataFrame named temp_df. We used the pandas.
DataFrame.copy() method, so the underlying DataFrame is assigned to temp_df, instead of just a
reference to dc_listings)
• Calculate the distance between each value in the accommodates column and the new_listing value
that was passed in. Assign the resulting Series object to the distance column in temp_df
• Sort temp_df by the distance column and select the first five values in the price column. Don't
randomize the ordering of temp_df
• Calculate the mean of these five values and use that as the return value for the entire predict_price function.
2. Use the predict_price function to suggest a price for a living space that does the following:
• If it accommodates 1 person, assign the suggested price to acc_one
• If it accommodates 2 people, assign the suggested price to acc_two
• If it accommodates 4 people, assign the suggested price to acc_four

Solutions
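A sketch of the predict_price function described above, assuming dc_listings has already been cleaned and shuffled:

import numpy as np

def predict_price(new_listing):
    temp_df = dc_listings.copy()
    # Distance from every listing to the value we want to price
    temp_df['distance'] = temp_df['accommodates'].apply(lambda x: np.abs(x - new_listing))
    temp_df = temp_df.sort_values('distance')
    # Average price of the five nearest neighbors
    return temp_df.iloc[0:5]['price'].mean()

acc_one = predict_price(1)
acc_two = predict_price(2)
acc_four = predict_price(4)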


• In this lesson, we explored the problem of predicting the optimal listing price for an AirBnB rental
based on the price of similar listings on the site. We worked through the entire machine learning
workflow, from selecting a feature to testing the model. To explore the basics of machine learning,
we limited ourselves to only using one feature (the univariate case) and a fixed k value of 5.
• In the next lesson, we'll learn how to evaluate a model's performance.

I.3 Evaluating Model Performance
Testing Quality of Predictions
We now have a function that can predict the price for any living space we want to list, as long as we know the number of people it can accommodate. The function we wrote represents a machine learning model, which means that it outputs a prediction based on the input we give it.

A simple way to test the quality of your model is to:


• Split the dataset into 2 partitions:
• The training set: contains the majority of the rows (75%)
• The test set: contains the remaining minority of the rows (25%)
• Use the rows in the training set to predict the price value for the rows in the test set
• Add new column named predicted_price to the test set
• Compare the predicted_price values with the actual price values in the test set to see how accurate
the predicted values were

This validation process, where we use the training set to make predictions and the test set to predict
values for, is known as train/test validation. Whenever you're performing machine learning, you
want to perform validation of some kind to ensure that your machine learning model can make
good predictions on new data. While train/test validation isn't perfect, we'll use it to understand the
validation process, to select an error metric, and then we'll dive into a more robust validation process
later in this course.

Let's modify the predict_price function to use only the rows in the training set, instead of the full
dataset, to find the nearest neighbors, average the price values for those rows, and return the predicted
price value. Then, we'll use this function to predict the price for just the rows in the test set. Once we
have the predicted price values, we can compare with the true price values and start to understand the
model's effectiveness in the next screen.



To start, we've gone ahead and assigned the first 75% of the rows in dc_listings to train_df and the last
25% of the rows to test_df. Here's a diagram explaining the split:

Figure 3. Diagram for train and test model decomposition

Instructions

• Within the predict_price function, change the Dataframe that temp_df is assigned to. Change it
from dc_listings to train_df, so only the training set is used
• Use the Series method apply to pass all of the values in the accommodates column from test_df through the predict_price function
• Assign the resulting Series object to the predicted_price column in test_df

Solutions
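A possible solution sketch; the 75/25 split is recreated here so the snippet is self-contained:

import numpy as np

split_point = int(len(dc_listings) * 0.75)
train_df = dc_listings.iloc[:split_point].copy()
test_df = dc_listings.iloc[split_point:].copy()

def predict_price(new_listing):
    temp_df = train_df.copy()  # only the training set is used to find neighbors
    temp_df['distance'] = temp_df['accommodates'].apply(lambda x: np.abs(x - new_listing))
    temp_df = temp_df.sort_values('distance')
    return temp_df.iloc[0:5]['price'].mean()

test_df['predicted_price'] = test_df['accommodates'].apply(predict_price)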

Error Metrics
We now need a metric that quantifies how good the predictions were on the test set. This class of
metrics is called an error metric. As the name suggests, an error metric quantifies how inaccurate
our predictions were compared to the actual values. In our case, the error metric tells us how off our
predicted_price values were from the actual price values for the living spaces in the test dataset.

We could start by calculating the difference between each predicted and actual value and then
averaging these differences. This is referred to as mean error but isn't an effective error metric for
most cases. Mean error treats a positive difference differently than a negative difference, but we're
really interested in how far off the prediction is in either the positive or negative direction. If the true
price was 200 dollars and the model predicted 210 or 190 it's off by 10 dollars either way.

We can instead use the mean absolute error, where we compute the absolute value of each error
before we average all the errors.

Instructions

• Use numpy.absolute() to calculate the mean absolute error between predicted_price and price
• Assign the MAE to mae

Solutions

Mean Squared Error


For many prediction tasks, we want to penalize predicted values that are further away from the actual
value far more than those closer to the actual value.

We can instead take the mean of the squared error values, which is called the mean squared error or
MSE for short. The MSE makes the gap between the predicted and actual values more clear. A prediction
that's off by 100 dollars will have an error (of 10,000) that's 100 times more than a prediction that's off
by only 10 dollars (which will have an error of 100).

Here's the formula for MSE:

MSE = (1/n) * Σ (predicted_i − actual_i)²

Where n represents the number of rows in the test set. Let's calculate the MSE value for the predictions we made on the test set.



Instructions

Calculate the MSE value between the predicted_price and price columns and assign to mse

Solutions

Training Another Model
The model we trained achieved a mean squared error of around 18646.5. Is this a high or a low mean
squared error value? What does this tell us about the quality of the predictions and the model? By
itself, the mean squared error value for a single model isn't all that useful.

The units of mean squared error in our case is dollars squared (not dollars), which makes it hard to
reason about intuitively as well. We can, however, train another model and then compare the mean
squared error values to see which model performs better on a relative basis. Recall that a low error
metric means that the gap between the predicted list price and actual list price values is low while a
high error metric means the gap is high.

Let's train another model, this time using the bathrooms column, and compare MSE values.

Instructions
• Modify the predict_price function to the right to use the bathrooms column instead of
the accommodates column to make predictions
• Apply the function to test_df and assign the resulting Series object containing the predicted_price
values to the predicted_price column in test_df
• Calculate the squared error between the price and predicted_price columns in test_df and assign
the resulting Series object to the squared_error column in test_df
• Calculate the mean of the squared_error column in test_df and assign to mse
• Use the print function or the variables inspector to display the MSE value

Solutions
Root Mean Squared Error
While comparing MSE values helps us identify which model performs better on a relative basis, it
doesn't help us understand if the performance is good enough in general. This is because the units of
the MSE metric are squared (in this case, dollars squared). An MSE value of 16377.5 dollars squared
doesn't give us an intuitive sense of how far off the model's predictions are from the true price value
in dollars.

Root mean squared error is an error metric whose units are the base unit (in our case, dollars). RMSE
for short, this error metric is calculated by taking the square root of the MSE value.

Since the RMSE value uses the same units as the target column, we can understand how far off in real
dollars we can expect the model to perform.

Let's calculate the RMSE value of the model we trained using the bathrooms column.

Instructions

• Calculate the RMSE value of the model we trained using the bathrooms column and assign it to rmse

Solutions

Comparing MAE and RMSE
The model achieved an RMSE value of approximately 135.6, which implies that we should expect for
the model to be off by 135.6 dollars on average for the predicted price values. Given that most of the
living spaces are listed at just a few hundred dollars, we need to reduce this error as much as possible
to improve the model's usefulness.

We discussed a few different error metrics we can use to understand a model's performance. As
we mentioned earlier, these individual error metrics are helpful for comparing models. To better
understand a specific model, we can compare multiple error metrics for the same model. This requires
a better understanding of the mathematical properties of the error metrics.

If you look at the equation for MAE:

MAE = (1/n) * Σ |predicted_i − actual_i|

You'll notice that the differences between predicted and actual values grow linearly. A prediction that's off by 10 dollars has a 10 times higher error than a prediction that's off by 1 dollar. If you look at the equation for RMSE, however:

RMSE = √((1/n) * Σ (predicted_i − actual_i)²)

You'll notice that each error is squared before the square root of the mean of all the errors is taken. This means that the individual errors grow quadratically and have a different effect on the final RMSE value.

Let's look at an example using different data entirely. We've created 2 Series objects containing 2 sets
of errors and assigned to errors_one and errors_two.

Instructions

• Calculate the MAE for errors_one and assign to mae_one


• Calculate the RMSE for errors_one and assign to rmse_one
• Calculate the MAE for errors_two and assign to mae_two
• Calculate the RMSE for errors_two and assign to rmse_two

Solutions
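A sketch of the comparison; the two error Series below are illustrative values chosen to reproduce the numbers quoted in the next paragraph:

import numpy as np
import pandas as pd

errors_one = pd.Series([5, 10, 5, 10, 5, 10, 5, 10, 5, 10, 5, 10, 5, 10, 5, 10, 5, 10])
errors_two = pd.Series([5, 10, 5, 10, 5, 10, 5, 10, 5, 10, 5, 10, 5, 10, 5, 10, 5, 1000])

# The error values are already absolute, so MAE is just their mean;
# RMSE is the square root of the mean of the squared errors
mae_one = errors_one.sum() / len(errors_one)
rmse_one = np.sqrt((errors_one ** 2).sum() / len(errors_one))
mae_two = errors_two.sum() / len(errors_two)
rmse_two = np.sqrt((errors_two ** 2).sum() / len(errors_two))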

While the MAE (7.5) to RMSE (7.9056941504209481) ratio was about 1:1 for the first list of errors,
the MAE (62.5) to RMSE (235.82302686548658) ratio was closer to 1:4 for the second list of errors.
In general, we should expect that the MAE value be much less than the RMSE value. The only
difference between the 2 sets of errors is the extreme 1000 value in errors_two instead of 10. When
we're working with larger data sets, we can't inspect each value to understand if there's one or some
outliers or if all of the errors are systematically higher. Looking at the ratio of MAE to RMSE can help
us understand if there are large but infrequent errors. You can read more about comparing MAE and
RMSE in this wonderful post.

In this mission, we learned how to test our machine learning models using simple train/test validation and different error metrics. In the next 2 missions, we'll explore how adding more features to the machine learning model and selecting a more optimal k value can help improve the model's performance.

I.4 Multivariate K-Nearest Neighbors
Recap
In the last mission, we explored how to use a simple k-nearest neighbors machine learning model
that used just one feature, or attribute, of the listing to predict the rent price. We first relied on
the accommodates column, which describes the number of people a living space can comfortably
accommodate. Then, we switched to the bathrooms column and observed an improvement in accuracy.
While these were good features to become familiar with the basics of machine learning, it's clear that
using just a single feature to compare listings doesn't reflect the reality of the market. An apartment that can accommodate 4 guests in a popular part of Washington, D.C. will rent for much higher than one that can accommodate 4 guests in a crime-ridden area.

There are 2 ways we can tweak the model to try to improve the accuracy (decrease the RMSE during
validation):
• Increase the number of attributes the model uses to calculate similarity when ranking the closest
neighbors
• Increase k, the number of nearby neighbors the model uses when computing the prediction

In this mission, we'll focus on increasing the number of attributes the model uses. When selecting
more attributes to use in the model, we need to watch out for columns that don't work well with the
distance equation. This includes columns containing:

• Non-numerical values (e.g. city or state)
  • The Euclidean distance equation expects numerical values
• Missing values
  • The distance equation expects a value for each observation and attribute
• Non-ordinal values (e.g. latitude or longitude)
  • Ranking by Euclidean distance doesn't make sense if all attributes aren't ordinal



In the following code screen, we've read the dc_airbnb.csv dataset from the last mission into pandas
and brought over the data cleaning changes we made. Let's first look at the first row's values to identify
any columns containing non-numerical or non-ordinal values. In the next screen, we'll drop those
columns and then look for missing values in each of the remaining columns.

Instructions

• Use the DataFrame.info() method to return the number of non-null values in each column

Solutions

Removing Features
The following columns contain non-numerical values:
• room_type: e.g. private_room
• city: e.g. Washington
• state: e.g. DC

While these columns contain numerical but non-ordinal values:


• latitude: e.g. 38.913458
• longitude: e.g. -77.031
• zipcode: e.g. 20009

Geographic values like these aren't ordinal, because a smaller numerical value doesn't directly
correspond to a smaller value in a meaningful way. For example, the zip code 20009 isn't smaller or
larger than the zip code 75023 and instead both are unique, identifier values. Latitude and longitude
value pairs describe a point on a geographic coordinate system and different equations are used in
those cases (e.g. haversine).

While we could convert the host_response_rate and host_acceptance_rate columns to be numerical


(right now they're object data types and contain the % sign), these columns describe the host and
not the living space itself. Since a host could have many living spaces and we don't have enough
information to uniquely group living spaces to the hosts themselves, let's avoid using any columns that
don't directly describe the living space or the listing itself:
• host_response_rate
• host_acceptance_rate
• host_listings_count
Let's remove these 9 columns from the Dataframe.

Instructions

• Remove the 9 columns we discussed above from dc_listings:


• 3 containing non-numerical values
• 3 containing numerical but non-ordinal values
• 3 describing the host instead of the living space itself

Solutions

Handling Missing Values
Of the remaining columns, 3 columns have a few missing values (less than 1% of the total number of
rows):
• bedrooms
• bathrooms
• beds

Since the number of rows containing missing values for one of these 3 columns is low, we can select
and remove those rows without losing much information. There are also 2 columns that have a large
number of missing values:
• cleaning_fee - 37.3% of the rows
• security_deposit - 61.7% of the rows

And we can't handle these easily. We can't just remove the rows containing missing values for these
2 columns because we'd miss out on the majority of the observations in the dataset. Instead, let's
remove these 2 columns entirely from consideration.

Instructions

• Drop the cleaning_fee and security_deposit columns from dc_listings


• Then, remove all rows that contain a missing value for the bedrooms, bathrooms, or beds column
from dc_listings
• You can accomplish this by using the Dataframe method dropna() and setting the axis parameter
to 0
• Since only the bedrooms, bathrooms, and beds columns contain any missing values, rows
containing missing values in these columns will be removed



• Display the null value counts for the updated dc_listings Dataframe to confirm that there are no
missing values left

Solutions
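A sketch of the cleanup steps:

# Drop the two columns with large numbers of missing values
dc_listings = dc_listings.drop(['cleaning_fee', 'security_deposit'], axis=1)

# Drop the remaining rows with any missing value (only bedrooms, bathrooms, beds have them)
dc_listings = dc_listings.dropna(axis=0)

print(dc_listings.isnull().sum())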

Normalize Columns
Here's how the dc_listings Dataframe looks after all the changes we made:

accommodates  bedrooms  bathrooms  beds  price  minimum_nights  maximum_nights  number_of_reviews
2             1.0       1.0        1.0   125.0  1               4               149
2             1.0       1.5        1.0   85.0   1               30              49
1             1.0       0.5        1.0   50.0   1               1125            1
2             1.0       1.0        1.0   209.0  4               730             2
12            5.0       2.0        5.0   215.0  2               1825            34

You may have noticed that while the accommodates, bedrooms, bathrooms, beds, and minimum_nights columns hover between 0 and 12 (at least in the first few rows), the values in the maximum_nights and number_of_reviews columns span much larger ranges. For example, the maximum_nights column has values as low as 4 and as high as 1825 in the first few rows alone. If we use these 2 columns as part of a k-nearest neighbors model, these attributes could end up having an outsized effect on the distance calculations because of the largeness of the values.

For example, 2 living spaces could be identical across every attribute but be vastly different just on
the maximum_nights column. If one listing had a maximum_nights value of 1825 and the other
a maximum_nights value of 4, because of the way Euclidean distance is calculated, these listings
would be considered very far apart because of the outsized effect the largeness of the values had on
the overall Euclidean distance. To prevent any single column from having too much of an impact on
the distance, we can normalize all of the columns to have a mean of 0 and a standard deviation of 1.

Normalizing the values in each column to the standard normal distribution (mean of 0, standard
deviation of 1) preserves the distribution of the values in each column while aligning the scales. To
normalize the values in a column to the standard normal distribution, you need to:
• From each value, subtract the mean of the column
• Divide each value by the standard deviation of the column

Here's the mathematical formula describing the transformation that needs to be applied for all values in a column:

z = (x − μ) / σ

Where x is a value in a specific column, μ is the mean of all the values in the column, and σ is the standard deviation of all the values in the column. Here's what the corresponding code, using pandas, looks like:
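A sketch of the single-column version, using maximum_nights as an illustrative example; the intermediate name first_transform is the one referenced in the next paragraph:

# Subtract the column mean, then divide by the column's standard deviation
first_transform = dc_listings['maximum_nights'] - dc_listings['maximum_nights'].mean()
normalized_col = first_transform / dc_listings['maximum_nights'].std()

# Equivalently, divide by the standard deviation of the shifted series itself
normalized_col = first_transform / first_transform.std()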

As shown in the sketch above, you can also divide first_transform by its own standard deviation and get the same answer. This is because first_transform merely shifts the mean of the distribution and has no effect on the shape or scaling of the distribution. In other words, the variance of the original column is the same as the variance of first_transform.

To apply this transformation across all of the columns in a Dataframe, you can use the corresponding Dataframe methods mean() and std(), as shown in the solution sketch below.

These methods were written with mass column transformation in mind and when you call mean() or std(),
the appropriate column means and column standard deviations are used for each value in the
Dataframe. Let's now normalize all of the feature columns in dc_listings.

Instructions

• Normalize all of the feature columns in dc_listings and assign the new Dataframe containing just the
normalized feature columns to normalized_listings
• Add the price column from dc_listings to normalized_listings
• Display the first 3 rows in normalized_listings

Solutions
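A possible solution sketch, normalizing every column and then restoring the original price target:

normalized_listings = (dc_listings - dc_listings.mean()) / dc_listings.std()
normalized_listings['price'] = dc_listings['price']  # keep the target on its original scale
print(normalized_listings.head(3))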



Euclidean Distance for Multivariate Case
In the last mission, we trained 2 univariate k-nearest neighbors models. The first one used the
accommodates attribute while the second one used the bathrooms attribute. Let's now train a model
that uses both attributes when determining how similar 2 living spaces are. Let's refer to the Euclidean distance equation again to see what the distance calculation using 2 attributes would look like:

d = √((q1 − p1)² + (q2 − p2)² + … + (qn − pn)²)

Since we're using 2 attributes, the distance calculation would look like:

d = √((accommodates1 − accommodates2)² + (bathrooms1 − bathrooms2)²)

To find the distance between 2 living spaces, we need to calculate the squared difference between both
accommodates values, the squared difference between both bathrooms values, add them together,
and then take the square root of the resulting sum. Here's what the Euclidean distance between the
first 2 rows in normalized_listings looks like:

So far, we've been calculating Euclidean distance ourselves by writing the logic for the equation ourselves.
We can instead use the distance.euclidean() function from scipy.spatial, which takes in 2 vectors as the
parameters and calculates the Euclidean distance between them. The euclidean() function expects:
• Both of the vectors to be represented using a list-like object (Python list, NumPy array, or pandas
Series)
• Both of the vectors must be 1-dimensional and have the same number of elements

Here’s a simple example:
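A minimal illustration with two arbitrary vectors (not taken from the dataset):

from scipy.spatial import distance

first_vector = [5, 10, 15]
second_vector = [3, 5, 7]
print(distance.euclidean(first_vector, second_vector))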



Let's use the euclidean() function to calculate the Euclidean distance between 2 rows in our dataset
to practice.

Instructions

• Calculate the Euclidean distance using only the accommodates and bathrooms features between
the first row and fifth row in normalized_listings using the distance.euclidean() function
• Assign the distance value to first_fifth_distance and display using the print function

Solutions
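One way to compute it for the instructions above:

from scipy.spatial import distance

first_listing = normalized_listings.iloc[0][['accommodates', 'bathrooms']]
fifth_listing = normalized_listings.iloc[4][['accommodates', 'bathrooms']]
first_fifth_distance = distance.euclidean(first_listing, fifth_listing)
print(first_fifth_distance)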

Introduction to Scikit-learn
So far, we've been writing functions from scratch to train the k-nearest neighbor models. While this is
helpful deliberate practice to understand how the mechanics work, you can be more productive and
iterate quicker by using a library that handles most of the implementation. In this screen, we'll learn
about the scikit-learn library, which is the most popular machine learning library in Python. Scikit-learn
contains functions for all of the major machine learning algorithms and a simple, unified workflow.
Both of these properties allow data scientists to be incredibly productive when training and testing
different models on a new dataset.

The scikit-learn workflow consists of 4 main steps:


• Instantiate the specific machine learning model you want to use
• Fit the model to the training data
• Use the model to make predictions
• Evaluate the accuracy of the predictions

We'll focus on the first 3 steps in this screen and the next screen. Each model in scikit-learn is
implemented as a separate class and the first step is to identify the class we want to create an instance
of. In our case, we want to use the KNeighborsRegressor class.

Any model that helps us predict numerical values, like listing price in our case, is known as
a regression model. The other main class of machine learning models is called classification, where we're
trying to predict a label from a fixed set of labels (e.g. blood type or gender). The word regressor from
the class name KNeighborsRegressor refers to the regression model class that we just discussed.

Scikit-learn uses a similar object-oriented style to Matplotlib, and you need to instantiate an empty model first by calling the constructor; a sketch is shown below.


If you refer to the documentation, you'll notice that by default:
• n_neighbors: the number of neighbors, is set to 5
• algorithm: for computing nearest neighbors, is set to auto
• p: set to 2, corresponding to Euclidean distance

Let's set the algorithm parameter to brute and leave the n_neighbors value as 5, which matches the
implementation we wrote in the last mission. If we leave the algorithm parameter set to the default
value of auto, scikit-learn will try to use tree-based optimizations to improve performance (which are
outside of the scope of this mission):
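A sketch of the two instantiations just described:

from sklearn.neighbors import KNeighborsRegressor

knn = KNeighborsRegressor()                   # defaults: n_neighbors=5, algorithm='auto', p=2
knn = KNeighborsRegressor(algorithm='brute')  # force brute-force neighbor search, keep n_neighbors=5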

Fitting a Model and Making Predictions
Now, we can fit the model to the data using the fit method. For all models, the fit method takes in 2
required parameters:
• Matrix-like object, containing the feature columns we want to use from the training set
• List-like object, containing correct target values

Matrix-like object means that the method is flexible in the input and either a Dataframe or a NumPy
2D array of values is accepted. This means you can select the columns you want to use from the
Dataframe and use that as the first parameter to the fit method.
If you recall from earlier in the mission, all of the following are acceptable list-like objects:
• NumPy array
• Python list
• Pandas Series object (e.g. when selecting a column)

You can select the target column from the Dataframe and use that as the second parameter to
the fit method:

When the fit() method is called, scikit-learn stores the training data we specified within the KNeighborsRegressor instance (knn). If you try passing in data containing missing values or non-numerical
values into the fit method, scikit-learn will return an error. Scikit-learn contains many such features
that help prevent us from making common mistakes.

Now that we specified the training data we want to use to make predictions, we can use the predict
method to make predictions on the test set. The predict method has only one required parameter:
• Matrix-like object, containing the feature columns from the dataset we want to make predictions on

The number of feature columns you use during both training and testing needs to match, or scikit-learn will return an error.

The predict() method returns a NumPy array containing the predicted price values for the test set. You
now have everything you need to practice the entire scikit-learn workflow.

Instructions

• Create an instance of the KNeighborsRegressor class with the following parameters:


• n_neighbors: 5
• algorithm: brute
• Use the fit method to specify the data we want the k-nearest neighbor model to use. Use the
following parameters:
• Training data, feature columns: just the accommodates and bathrooms columns, in that order,
from train_df
• Training data, target column: the price column from train_df
• Call the predict method to make predictions on:
• The accommodates and bathrooms columns from test_df
• Assign the resulting NumPy array of predicted price values to predictions

Solutions
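A possible solution sketch, assuming the train_df and test_df split from earlier:

from sklearn.neighbors import KNeighborsRegressor

knn = KNeighborsRegressor(n_neighbors=5, algorithm='brute')

train_features = train_df[['accommodates', 'bathrooms']]
train_target = train_df['price']
knn.fit(train_features, train_target)  # store the training data in the model

predictions = knn.predict(test_df[['accommodates', 'bathrooms']])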



Calculating MSE using Scikit-Learn
Earlier in this mission, we calculated the MSE and RMSE values using the pandas arithmetic operators to
compare each predicted value with the actual value from the price column of our test set. Alternatively,
we can instead use the sklearn.metrics.mean_squared_error() function. Once you become familiar with
the different machine learning concepts, unifying your workflow using scikit-learn helps save you a lot
of time and avoid mistakes.

The mean_squared_error() function takes in 2 inputs:


• List-like object, representing the true values
• List-like object, representing the predicted values using the model

For this function, we won't show any sample code and will leave it to you to understand the function from
the documentation itself to calculate the MSE and RMSE values for the predictions we just made.

Instructions

• Use the mean_squared_error function to calculate the MSE value for the predictions we made in
the previous screen
• Assign the MSE value to two_features_mse
• Calculate the RMSE value by taking the square root of the MSE value and assign to two_features_rmse
• Display both of these error scores using the print function

Solutions
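One way the error calculation could look, reusing the predictions array from the previous screen:

import numpy as np
from sklearn.metrics import mean_squared_error

two_features_mse = mean_squared_error(test_df['price'], predictions)
two_features_rmse = np.sqrt(two_features_mse)
print(two_features_mse)
print(two_features_rmse)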

Using More Features


Here's a table comparing the MSE and RMSE values for the 2 univariate models from the last mission and the multivariate model we just trained:

As you can tell, the model we trained using both features ended up performing better (lower error score) than either of the univariate models from the last mission. Let's now train a model using the following 4 features:
• accommodates
• bedrooms
• bathrooms
• number_of_reviews

Scikit-learn makes it incredibly easy to swap the columns used during training and testing. We're going to leave this for you as a challenge to train and test a k-nearest neighbors model using these columns instead. Use the code you wrote in the last screen as a guide.
Instructions

• Create a new instance of the KNeighborsRegressor class with the following parameters:
• n_neighbors: 5
• algorithm: brute
• Fit a model that uses the following columns from our training set (train_df):
• accommodates
• bedrooms
• bathrooms
• number_of_reviews
• Use the model to make predictions on the test set (test_df) using the same columns. Assign the
NumPy array of predictions to four_predictions
• Use the mean_squared_error() function to calculate the MSE value for these predictions by
comparing four_predictions with the price column from test_df. Assign the computed MSE value
to four_mse
• Calculate the RMSE value and assign to four_rmse
• Display four_mse and four_rmse using the print function

Solutions



Using All Features
So far so good! As we increased the number of features the model used, we observed lower MSE and RMSE values.

Let's take this to the extreme and use all of the potential features. We should expect the error scores
to decrease since so far adding more features has helped do so.

Instructions

• Use all of the columns, except for the price column, to train a k-nearest neighbors model using the
same parameters for the KNeighborsRegressor class as the ones from the last few screens
• Use the model to make predictions on the test set and assign the resulting NumPy array of predictions
to all_features_predictions
• Calculate the MSE and RMSE value and assign to all_features_mse and all_features_rmse accordingly
• Use the print function to display both error scores

Solutions
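A sketch of one solution, using every column except price as a feature:

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

features = train_df.columns.tolist()
features.remove('price')  # everything except the target

knn = KNeighborsRegressor(n_neighbors=5, algorithm='brute')
knn.fit(train_df[features], train_df['price'])
all_features_predictions = knn.predict(test_df[features])

all_features_mse = mean_squared_error(test_df['price'], all_features_predictions)
all_features_rmse = np.sqrt(all_features_mse)
print(all_features_mse, all_features_rmse)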

• Interestingly enough, the RMSE value actually increased to 125.1 when we used all of the features available to us. This means that selecting the right features is important and that using more features doesn't automatically improve prediction accuracy. We should re-phrase the lever we mentioned earlier from:
  • Increase the number of attributes the model uses to calculate similarity when ranking the closest neighbors

to:

  • Select the relevant attributes the model uses to calculate similarity when ranking the closest neighbors
• The process of selecting features to use in a model is known as feature selection.
• In this mission, we prepared the data to be able to use more features, trained a few models using multiple features, and evaluated the different performance tradeoffs. We explored how using more features doesn't always improve the accuracy of a k-nearest neighbors model. In the next mission, we'll explore another knob for tuning k-nearest neighbor models: the k value.

I.5 Hyperparameter Optimization
Recap
In the last mission, we focused on increasing the number of attributes the model uses. We saw how, in general, adding more attributes lowered the error of the model. This is because the model is able to do a better job identifying the living spaces from the training set that are the most similar to the ones from the test set. However, we also observed how using all of the available features didn't actually improve the model's accuracy automatically, and that some of the features were probably not relevant for similarity ranking. We learned that selecting relevant features was the right lever for improving a model's accuracy, not simply increasing the number of features used.

In this mission, we'll focus on the impact of increasing k, the number of nearby neighbors the model
uses to make predictions. We exported both the training (train_df) and test sets (test_df) from the last
missions to CSV files, dc_airbnb_train.csv and dc_airbnb_test.csv respectively. Let's read both of these CSVs into Dataframes.

Instructions

• Read dc_airbnb_train.csv into a Dataframe and assign to train_df


• Read dc_airbnb_test.csv into a Dataframe and assign to test_df

Hyperparameter Optimization
When we vary the features that are used in the model, we're affecting the data that the model uses.
On the other hand, varying the k value affects the behavior of the model independently of the actual
data that's used when making predictions. In other words, we're impacting how the model performs
without trying to change the data that's used.

Values that affect the behavior and performance of a model that are unrelated to the data that's used
are referred to as hyperparameters. The process of finding the optimal hyperparameter value is known
as hyperparameter optimization. A simple but common hyperparameter optimization technique is
known as grid search, which involves:

• Selecting a subset of the possible hyperparameter values


• Training a model using each of these hyperparameter values
• Evaluating each model's performance
• Selecting the hyperparameter value that resulted in the lowest error value

Grid search essentially boils down to evaluating the model performance at different k values and
selecting the k value that resulted in the lowest error. While grid search can take a long time when
working with large datasets, the data we're working with in this mission is small and this process is
relatively quick.

Let's confirm that grid search will work quickly for the dataset we're working with by first observing
how the model performance changes as we increase the k value from 1 to 5. If you recall, we set 5 as
the k value for the last 2 missions. Let's use the features from the last mission that resulted in the best
model accuracy:

• accommodates
• bedrooms
• bathrooms
• number_of_reviews

Instructions

• Create a list containing the integer values 1, 2, 3, 4, and 5, in that order, and assign to hyper_params
• Create an empty list and assign to mse_values
• Use a for loop to iterate over hyper_params and in each iteration:
• Instantiate a KNeighborsRegressor object with the following parameters:
• n_neighbors: the current value for the iterator variable
• algorithm: brute
• Fit the instantiated k-nearest neighbors model to the following columns from train_df:
• accommodates
• bedrooms
• bathrooms

• number_of_reviews
• Use the trained model to make predictions on the same columns from test_df and assign
to predictions
• Use the mean_squared_error function to calculate the MSE value between predictions and
the price column from test_df
• Append the MSE value to mse_values
• Display mse_values using the print() function

Solutions
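A minimal sketch of one way to follow these instructions, assuming train_df and test_df were read in as described above and that price is the target column:

from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

features = ['accommodates', 'bedrooms', 'bathrooms', 'number_of_reviews']
hyper_params = [1, 2, 3, 4, 5]
mse_values = []

for k in hyper_params:
    # Instantiate the model with the current k value and a brute-force neighbor search
    knn = KNeighborsRegressor(n_neighbors=k, algorithm='brute')
    # Fit on the selected feature columns from the training set
    knn.fit(train_df[features], train_df['price'])
    # Predict on the same columns from the test set
    predictions = knn.predict(test_df[features])
    mse_values.append(mean_squared_error(test_df['price'], predictions))

print(mse_values)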

Expanding Grid Search


Since our dataset is small and scikit-learn has been developed with performance in mind, the code
ran quickly. As we increased the k value from 1 to 5, the MSE value fell from approximately 26,364 to
approximately 14,090:

Let's expand grid search all the way to a k value of 20. While 20 may seem like an arbitrary ending
point for our grid search, we can always expand the values we try if we're unconvinced that the lowest
MSE value is associated with one of the hyperparameter values we tried so far.

Instructions

• Change the list of hyperparameter values, hyper_params, so it ranges from 1 to 20


• Create an empty list and assign to mse_values
• Use a for loop to iterate over hyper_params and in each iteration:
• Instantiate a KNeighborsRegressor object with the following parameters:
• n_neighbors: the current value for the iterator variable
• algorithm: brute
• Fit the instantiated k-nearest neighbors model to the following columns from train_df:
• accommodates
• bedrooms
• bathrooms
• number_of_reviews
• Use the trained model to make predictions on the same columns from test_df and assign
to predictions
• Use the mean_squared_error function to calculate the MSE value between predictions and
the price column from test_df
• Append the MSE value to mse_values
• Display mse_values using the print() function

Solutions



Visualizing Hyperparameter Values
As we increased the k value from 1 to 6, the MSE value decreased from approximately 26,364
to approximately 13,657. However, as we increased the k value from 7 to 20, the MSE value didn't
decrease further but instead hovered between approximately 14,288 and 14,870. This means that the
optimal k value is 6, since it resulted in the lowest MSE value.

This pattern is something you'll notice while performing grid search across other models as well. As
you increase k at first, the error rate decreases until a certain point, but then rebounds and increases
again. Let's confirm this behavior visually using a scatter plot.

Instructions

• Use the scatter() method from matplotlib.pyplot to generate a scatter plot with:
• hyper_params on the x-axis
• mse_values on the y-axis
• Use plt.show() to display the plot

Solutions
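A possible sketch, assuming hyper_params and mse_values were computed in the previous step:

import matplotlib.pyplot as plt

# Each point shows the MSE obtained for one k value
plt.scatter(hyper_params, mse_values)
plt.xlabel('k (n_neighbors)')
plt.ylabel('MSE')
plt.show()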

The first model, which used the accommodates and bathrooms columns, was able to achieve
an MSE value of approximately 14,790. The second model, which added the bedrooms column,
was able to achieve an MSE value of approximately 13,522.9, which is even lower than
the lowest MSE value we achieved using the best model from the last mission (which used
the accommodates, bedrooms, bathrooms, and number_of_reviews columns). Hopefully this
demonstrates that using just one lever to find the best model isn't enough and you really want to
use both levers in conjunction.

In this mission, we learned about hyperparameter optimization and the workflow of finding the optimal
model to make predictions. Next in this course is a challenge, where you'll practice the concepts you've
learned so far on a completely new dataset.

I.6 Cross Validation



Concept
In an earlier mission, we learned about train/test validation, a simple technique for testing a machine
learning model's accuracy on new data that the model wasn't trained on. In this mission, we'll focus on
more robust techniques.
To start, we'll focus on the holdout validation technique, which involves:

• Splitting the full dataset into 2 partitions:


• A training set
• A test set
• Training the model on the training set
• Using the trained model to predict labels on the test set
• Computing an error metric to understand the model's effectiveness
• Switch the training and test sets and repeat
• Average the errors

In holdout validation, we usually use a 50/50 split instead of the 75/25 split from train/test validation.
This way, we remove the number of observations as a potential source of variation in our model
performance.

Figure 4. Test and train decomposition


Let's start by splitting the data set into 2 nearly equivalent halves.

When splitting the data set, don't forget to make a copy of it using .copy() to ensure you don't get any
unexpected results later on. If you run the code locally in Jupyter Notebook or Jupyter Lab without
.copy(), you'll notice what is known as a SettingWithCopyWarning. This won't prevent your code from
running properly, but it's letting you know that the operation you're doing is trying to be set on
a copy of a slice from a dataframe. To make sure you don't see this warning, make sure to include
.copy() whenever you perform operations on a dataframe.

Instructions

• Use the numpy.random.permutation() function to shuffle the ordering of the rows in dc_listings
• Select the first 1862 rows and assign to split_one
• Select the remaining 1861 rows and assign to split_two
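A rough sketch of these steps, assuming dc_listings has already been read into a dataframe with 3,723 rows:

import numpy as np

# Shuffle the row ordering (the fixed seed is only an assumption, for reproducibility)
np.random.seed(1)
shuffled_index = np.random.permutation(dc_listings.index)
dc_listings = dc_listings.loc[shuffled_index]

# First half and second half of the shuffled dataset
split_one = dc_listings.iloc[0:1862].copy()
split_two = dc_listings.iloc[1862:].copy()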

Holdout Validation
Now that we've split our data set into 2 dataframes, let’s:
• Train a k-nearest neighbors model on the first half
• Test this model on the second half
• Train a k-nearest neighbors model on the second half
• Test this model on the first half

Instructions



• Train a k-nearest neighbors model using the default algorithm (auto) and the default number of
neighbors (5) that:
• Uses the accommodates column from train_one for training and
• Tests it on test_one.
• Assign the resulting RMSE value to iteration_one_rmse.
• Train a k-nearest neighbors model using the default algorithm (auto) and the default number of
neighbors (5) that:
• Uses the accommodates column from train_two for training
• Tests it on test_two
• Assign the resulting RMSE value to iteration_two_rmse
• Use numpy.mean() to calculate the average of the 2 RMSE values and assign to avg_rmse

Solutions
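A minimal sketch of the two iterations, assuming train_one/test_one map to split_one/split_two, train_two/test_two map to split_two/split_one, and price is the target column:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

train_one, test_one = split_one, split_two
train_two, test_two = split_two, split_one

def holdout_rmse(train, test):
    # Default algorithm ('auto') and default number of neighbors (5)
    knn = KNeighborsRegressor()
    knn.fit(train[['accommodates']], train['price'])
    predictions = knn.predict(test[['accommodates']])
    return np.sqrt(mean_squared_error(test['price'], predictions))

iteration_one_rmse = holdout_rmse(train_one, test_one)
iteration_two_rmse = holdout_rmse(train_two, test_two)
avg_rmse = np.mean([iteration_one_rmse, iteration_two_rmse])
print(avg_rmse)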

K-Fold Cross Validation


If we average the two RMSE values from the last step, we get an RMSE value of approximately 128.96.
Holdout validation is actually a specific example of a larger class of validation techniques called k-fold
cross-validation. While holdout validation is better than train/test validation because the model isn't
repeatedly biased towards a specific subset of the data, both models that are trained only use half the
available data. K-fold cross validation, on the other hand, takes advantage of a larger proportion of
the data during training while still rotating through different subsets of the data to avoid the issues of
train/test validation.

Here's the algorithm for k-fold cross validation:

• Splitting the full dataset into k equal length partitions



• Selecting k-1 partitions as the training set and


• Selecting the remaining partition as the test set
• Training the model on the training set
• Using the trained model to predict labels on the test fold
• Computing the test fold's error metric
• Repeating the training and evaluation steps k-1 more times, until each partition has been used as the test set for an
iteration
• Calculating the mean of the k error values

Holdout validation is essentially a version of k-fold cross validation when k is equal to 2.
Generally, 5 or 10 folds are used for k-fold cross-validation. Here's a diagram describing each iteration
of 5-fold cross validation:

Figure 5. Diagram describing each iteration of 5-fold cross validation


As you increase the number of folds, the number of observations in each fold decreases and the
variance of the fold-by-fold errors increases. Let's start by manually partitioning the data set into
5 folds. Instead of splitting into 5 dataframes, let's add a column that specifies which fold the row
belongs to. This way, we can easily select our training set and testing set.

Instructions

• Add a new column to dc_listings named fold that contains the fold number each row belongs to:
• Fold 1 should have rows from index 0 up to 745, not including 745
• Fold 2 should have rows from index 745 up to 1490, not including 1490
• Fold 3 should have rows from index 1490 up to 2234, not including 2234
• Fold 4 should have rows from index 2234 up to 2978, not including 2978



• Fold 5 should have rows from index 2978 up to 3723, not including 3723
• Display the unique value counts for the fold column to confirm that each fold has roughly the same
number of elements
• Display the number of missing values in the fold column to confirm we didn't miss any rows

Solutions
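One way to add the fold column, sketched under the assumption that dc_listings has already been shuffled and has 3,723 rows:

fold_boundaries = [0, 745, 1490, 2234, 2978, 3723]
dc_listings['fold'] = 0
for fold_number in range(1, 6):
    start = fold_boundaries[fold_number - 1]
    stop = fold_boundaries[fold_number]
    # Assign the fold number to rows `start` up to (not including) `stop`
    dc_listings.iloc[start:stop, dc_listings.columns.get_loc('fold')] = fold_number

print(dc_listings['fold'].value_counts())
print("Missing values in fold column:", dc_listings['fold'].isnull().sum())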

So far, we've been working under the assumption that a lower RMSE always means that a model
is more accurate. This isn't the complete picture, unfortunately. A model has two sources of
error, bias and variance.

Bias describes error that results from bad assumptions in the learning algorithm. For example,
assuming that only one feature, like a car's weight, relates to a car's fuel efficiency will lead you to fit
a simple, univariate regression model that will result in high bias. The error rate will be high since a
car's fuel efficiency is affected by many other factors besides just its weight.

Variance describes error that occurs because of the variability of a model's predicted values. If
we were given a dataset with 1,000 features on each car and used every single feature to train an
incredibly complicated multivariate regression model, we will have low bias but high variance. In an
ideal world, we want low bias and low variance but in reality, there's always a tradeoff.

The standard deviation of the RMSE values can be a proxy for a model's variance while the average
RMSE is a proxy for a model's bias. Bias and variance are the 2 observable sources of error in a model
that we can indirectly control.
I.7 Guided Project: Predicting Car Prices
Guided Project: Predicting Car Prices
In this course, we explored the fundamentals of
machine learning using the k-nearest neighbors
algorithm. In this guided project, you'll practice
the machine learning workflow you've learned
so far to predict a car's market price using its
attributes. The data set we will be working with
contains information on various cars. For each car
we have information about the technical aspects
of the vehicle such as the motor's displacement,
the weight of the car, the miles per gallon, how
fast the car accelerates, and more. You can read
more about the data set here and can download
it directly from here. Here's a preview of the data
set:

https://archive.ics.uci.edu/ml/datasets/automobile

Instructions

• Read imports-85.data into a dataframe named cars. If you read in the file using pandas.read_csv()
without specifying any additional parameter values, you'll notice that the column names don't
match the ones in the dataset's documentation. Why do you think this is and how can you fix this?
• Determine which columns are numeric and can be used as features and which column is the target
column

• Display the first few rows of the dataframe and make sure it looks like the data set preview

Solutions

You can find the solutions for this guided project here.
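As a rough sketch, the column-name mismatch happens because the raw file has no header row, so pandas treats the first data row as column names. One way to fix it is to pass the names explicitly; the list below is copied from the UCI documentation and should be double-checked against it:

import pandas as pd

# Column names as listed in the UCI documentation (verify against the source)
cols = ['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration',
        'num-of-doors', 'body-style', 'drive-wheels', 'engine-location',
        'wheel-base', 'length', 'width', 'height', 'curb-weight',
        'engine-type', 'num-of-cylinders', 'engine-size', 'fuel-system',
        'bore', 'stroke', 'compression-ratio', 'horsepower', 'peak-rpm',
        'city-mpg', 'highway-mpg', 'price']

# names= tells pandas the file has no header row and what to call each column
cars = pd.read_csv('imports-85.data', names=cols)
print(cars.head())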

II Calculus For Machine Learning
Calculus For Machine Learning
In the previous course, we explored the machine learning workflow using the k-nearest neighbors
algorithm. We chose the k-nearest neighbors algorithm because building the intuition for how the
algorithm works doesn't require any mathematics. While the algorithm is easy to grasp, we can't use
it for larger datasets because the model itself is represented using the entire training set. Each time
we want to make a prediction on a new observation, we need to calculate the distance between each
observation in our training set and our new observation, then rank by ascending distance. This is a
computationally intensive technique!

Moving forward, for most of the machine learning techniques we'll learn about next, the model is
represented as a mathematical function. This mathematical function approximates the underlying
function that describes how the features are related to the target attribute. Once we derive this
mathematical function using the training dataset, making predictions on the test dataset (or on a
future dataset) is computationally cheap. The following diagram shows 2 different linear regression
functions that approximate the dataset (note that the values in this dataset are random).

https://app.dataquest.io/course/calculus-for-machine-learning

Before we can dive into using linear regression models for machine learning, we'll need to understand
some key ideas from calculus. Calculus provides a framework for understanding how mathematical
functions behave. Calculus helps us:

• Understand the steepness at various points


• Find the extreme points in a function
• Determine the optimal function that best represents a dataset

Let's start by setting up a motivating problem, which we'll refer back to throughout this course. Let's
say we're given the following equation, which describes the trajectory of a ball after it's kicked by a
football player: y = −(x²) + 3x − 1

x is time in seconds while y is the vertical position of the ball. Naturally, we'd like to know the highest
position the ball reached and at what time that happened. While we can graph the equation and
estimate the result visually, if we want the precise time and vertical position we'll need to use calculus.
In this course, we'll explore the different calculus concepts necessary to build up to being able to find
this precise point.

Let's start by visualizing this function.

Instructions

• Use numpy.linspace() to generate a NumPy array containing 100 values from 0 to 3 and assign to x
• Transform x by applying the function y = −(x²) + 3x − 1. Assign the resulting array of transformed
values to y
• Use pyplot.plot() to generate a line plot with x on the x-axis and y on the y-axis
• Brainstorm how you would calculate the maximum height and find the exact time it occurred

Solutions
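A minimal sketch of these instructions:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3, 100)
y = -(x ** 2) + 3 * x - 1      # the ball's trajectory equation

plt.plot(x, y)
plt.show()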

Understanding Linear and Nonlinear Functions


Before we dive into analyzing the curve of a ball's height, we'll need to understand a few key ideas first.
We'll explore those concepts using simple, straight lines first then build up to applying those concepts
to curves. A simple, straight line is more clearly defined as a linear function. All linear functions can be
written in the following form:

y=mx+b

For a specific linear function, m and b are constant values, while x and y are variables. y=3x+1 and y=5 are
both examples of linear functions.

Let's focus on the function y=3x+1 for now. This function multiplies any x value we pass in by 3 then
adds 1 to it.

Let's start by gaining a geometric understanding
of linear functions. Below, you'll find an image
to help you understand how the line shifts or
changes when altering values of m and/or b.

• How does the line change when you keep m


fixed but vary b?
• How does the line change when you
keep b fixed but vary m?
• Which value controls the steepness of the line?
• What happens to the line when m is set to 0?

Figure 6. Example of a linear function

So far, we've been working with linear functions, where we can determine the slope of the function
from the equation itself. If we step back to our ball trajectory equation, however, you'll notice that it
doesn't match the form y=mx+b:

y = −(x²) + 3x − 1

This is because this function is a nonlinear function. Nonlinear functions don't represent straight
lines -- they represent curves like the one we plotted in the first step of this mission. The outputs y of
a nonlinear function are not proportional to the input values x. An increment in x doesn't result in a
constant increment in y.

Whenever x is raised to a power not equal to 1, we have a non-linear function. Here are some more
examples of nonlinear functions:

In the following image, observe how the slope changes with different values of x1 and x2.

Figure 7. Example of a nonlinear function

Understanding Limits
At the end of the last mission, we fixed a first point on our curve, drew a secant line between that first
point and a second point, and observed what happened when we moved the second point closer to
the first point along the curve. The larger the interval between the 2 points on the x-axis, the more
the steepness of the secant line diverged from the steepness of the curve. The closer the interval, the
more the secant line started to match the steepness at the first point on the curve.

In this mission, we'll formalize the idea of slope further and learn how to calculate the slope for nonlinear
equations at any given point. As you go through the rest of this course, we strongly recommend
following the math we present using pencil and paper. We'll start by introducing some mathematical
notation that formalizes the observation we made at the end of the last mission. If we try to state the
observation by plugging in values to the slope equation, m = (f(x2) − f(x1)) / (x2 − x1), we'll run into the
division-by-zero problem:

Even though the slope is undefined when x1 and x2 are equivalent, we still want to be able to state
and reason about what value the slope approaches as x2 approaches x1. To do that, we need to
reframe the problem as a limit. A limit describes the value a function approaches when the input
variable to the function approaches a specific value. In our case, the input variable is x2 and our
function is m = (f(x2) − f(x1)) / (x2 − x1). The following mathematical notation formalizes the statement
"As x2 approaches 3, the slope between x1 and x2 approaches −3" using a limit:

lim x2→3 is another way of saying "As x2 approaches 3". Because we fixed x1 to 3, we can
replace x1 with 3 in the function:



Finding Extreme Points
In the last mission, we learned how to use limits to calculate the point a function approaches when
the input value approaches a specific value. We applied this technique to calculate the slope of the
tangent line at a specific point on our nonlinear function:

If you recall from the first mission in this course, we're interested in determining the highest point on
this curve.

If you've ever hiked a mountain before, you'll be
familiar with how the trail slopes up until you
reach the peak. Once you're at the peak, however,
all of the paths back down slope downwards.
Understanding how the slope varies throughout
a curve provides a useful lens for determining the
maximum point on a curve.

We'll start by building some visual intuition for how a function's slope and its maximum point are
related. In the image below, we've generated two plots. As x varies, the plot on the left visualizes how
the tangent line for the curve changes, while the plot on the right visualizes how the slope of this
tangent line changes.

Figure 8. The tangent line for the curve changes
III Linear Algebra For Machine Learning
Linear Algebra For Machine Learning
Linear algebra is a pillar of machine learning. You cannot develop a deep understanding and application
of machine learning without it. Using clear explanations, standard Python libraries, and step-by-step
tutorial lessons, you will discover what linear algebra is, the importance of linear algebra to machine
learning, vector and matrix operations, and much more.
https://mml-book.github.io/book/mml-book.pdf

Linear Systems
In the last course, we explored the framework of calculus and used it to:
• Understand the slope of linear functions
• Understand the derivative (slope as a function) of nonlinear functions
• Find extreme values in nonlinear functions

While we learned the basics of slope through linear functions, we primarily focused on nonlinear
functions in the last course.

In this course, we'll focus on understanding linear functions. Specifically, we'll explore the framework
of linear algebra, which provides a way to represent and understand the solutions to systems of linear
equations. A system of linear equations consists of multiple, related functions with a common set of
variables. The word linear equation is often used interchangeably with linear function. Many real world
processes can be modeled using multiple, related linear equations. We'll start by exploring a concrete
example of a linear system, another word for system of linear equations, before we dive further into
linear algebra.

Optimal Salary Problem



Let's say we have to pick between 2 different job offers. The first job offer has a base weekly salary of
1000 dollars and pays 30 dollars an hour. We can represent this offer as y=1000+30x, where y represents
dollars earned that week and x represents hours worked that week. The second job offer has a base
weekly salary of 100 dollars and pays 50 dollars an hour. We can represent this offer as y=100+50x,
where y also represents dollars earned that week and x also represents hours worked that week.

We want to understand which job offer is better. If we know exactly the amount of money we'd like to
make each week (y), we can substitute that value into both equations and solve for x to identify which
job will require us to work fewer hours. If we know exactly the number of hours we want to work each
week (x), we can substitute that value into both equations and solve for y to identify which job will
make us more money for the same amount of hours worked. Instead, suppose we want to understand:

• At what number of hours worked can we expect to make the same amount of money at either job?
• How many hours do we have to work to make more money at the first job than the second job?

To answer the first question, we need to find the x value where both the y values are equivalent. Once
we know where they intersect, we can easily find out the answer to the second question.

Instructions

• Use numpy.linspace() to generate 1,000 evenly spaced values between 0 and 50 and assign to x
• Transform x using the equation y=30x+1000 and assign the result to y1
• Transform x using the equation y=50x+100 and assign the result to y2
• Generate 2 line plots on the same subplot:
• One with x on the x-axis and y1 on the y-axis. Set the line color to "orange"
• One with x on the x-axis and y2 on the y-axis. Set the line color to "blue"
• Skip selecting a value range for the x and y axes, and instead let matplotlib automatically select
based on the data

Solutions
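A possible sketch of these instructions:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 50, 1000)
y1 = 30 * x + 1000    # first job offer
y2 = 50 * x + 100     # second job offer

plt.plot(x, y1, c='orange')
plt.plot(x, y2, c='blue')
plt.show()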

Vectors
In the last mission, we learned how to use an augmented matrix and the row operations that preserve
the relationships in a system to solve a system of linear functions. At its core, a matrix is a way to
represent a table of numbers. All of the matrices we worked with in the last mission contained 2 rows
and 3 columns. Here's the first one that set up our linear system:

This is known as a 2x3 matrix (pronounced "two by three matrix"). The convention in linear algebra is
to specify the number of rows first (2) then the number of columns (3). Each of the rows and columns
in this matrix is represented as a list of numbers.

A list of numbers is known as a vector. A row from a matrix is known as a row vector, while a column
is known as a column vector. Here are the row vectors from the matrix:

Here are the column vectors from the matrix:

In this mission, we'll learn more about column vectors and their associated operations to help us
understand certain properties of linear systems. We'll end this mission by justifying the approach we
used in the last mission to solve the linear system by connecting a few key ideas from matrices and
vectors. We'll start by building some geometric intuition of vectors. Generally, the word vector refers
to the column vector (ordered list of elements in a single column) and we'll refer to the column vector
that way throughout the rest of this course.

Matrix Algebra
Like vectors, matrices have their own set of algebraic operations. In this mission, we'll learn the core
matrix operations and build up to using some of them to solve the matrix equation. Let's first start
with matrix addition and subtraction. If you recall from the previous mission, a matrix consists of one
or more column vectors. Because of that, the operations from vectors also carry over to matrices. We
could perform vector addition and subtraction between vectors with the same number of rows. We can
perform matrix addition and subtraction between matrices containing the same number of rows and columns.
As with vectors, matrix addition and subtraction works by distributing the operations across the
specific elements and combining them.

Lastly, we can also multiply a matrix by a scalar value, just like we can with a vector.
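These operations are easy to check with NumPy; the values below are purely illustrative, not the matrices from the mission:

import numpy as np

A = np.asarray([[1, 2],
                [3, 4]])
B = np.asarray([[10, 20],
                [30, 40]])

print(A + B)    # element-wise matrix addition
print(A - B)    # element-wise matrix subtraction
print(3 * A)    # multiplying a matrix by a scalar scales every element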

Solution Sets
In this course, we've explored two different ways to find the solution to Ax = b when b isn't a vector
containing all zeroes (b ≠ 0). The first way we explored was Gaussian elimination, which involves using
the row operations to transform the augmented representation of a linear system to echelon form and
then finally to reduced row echelon form.

The second way we explored was to compute the matrix inverse of A and left multiply both sides
of the equation to find x.

While we can use these techniques to solve most of the linear systems we'll encounter, we need to
learn what to do when:

• the solution set for a linear system doesn't exist
• the solution set for a linear system isn't just a single vector
• b is equal to 0 (the zero vector)

In this mission, we'll wrap up this course by exploring all three of these situations.

IV Linear Regression For Machine Learning
Linear Regression For Machine Learning
Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical
method that is used for predictive analysis. You will learn in this section about parametric machine
learning algorithms and the basics of the linear regression model.

The Linear Regression Model


In the first course in this step, Machine Learning Fundamentals, we walked through the full machine
learning workflow using the k-nearest neighbors algorithm. K-nearest neighbors works by finding
similar, labelled examples from the training set for each instance in the test set and uses them to
predict the label. K-nearest neighbors is known as an instance-based learning algorithm because it
relies completely on previous instances to make predictions. The k-nearest neighbors algorithm doesn't
try to understand or capture the relationship between the feature columns and the target column.

Because the entire training dataset is used to find a new instance's nearest neighbors to make label
predictions, this algorithm doesn't scale well to medium and larger datasets. If we have a million
instances in our training data set and we want to make predictions for a hundred thousand new
instances, we'd have to sort the million instances in the training set by Euclidean distance for each
instance. The following diagram provides an overview of the complexity of k-nearest neighbors:



Figure 9. Training Process
Figure 10. Testing Process - Step 1
Figure 11. Testing Process - Step 2
Figure 12. Testing Process - Step 3

We need to instead learn about parametric machine learning approaches, like linear regression and
logistic regression. Unlike the k-nearest neighbors algorithm, the result of the training process for
these machine learning algorithms is a mathematical function that best approximates the patterns in
the training set. In machine learning, this function is often referred to as a model.

In this course, we'll explore the most commonly used machine learning model -- the linear regression
model. Parametric machine learning approaches work by making assumptions about the relationship
between the features and the target column. In linear regression, the approximate relationship between
the feature columns and the target column is expressed as a linear regression equation:

ŷ = a0 + a1x1 + a2x2 + ... + anxn

The following diagram provides an overview of the machine learning process for linear regression.
For now, the goal isn't to understand the entire process but more to compare and contrast with
the nonparametric approach of k-nearest neighbors.

Figure 13. Machine learning process for linear regression - Training Process

Figure 14. Machine learning process for linear regression - Testing Process

In this mission, we'll provide an overview of how we use a linear regression model to make predictions.
We'll use scikit-learn for the model training process, so we can focus on gaining intuition for the
model-based learning approach to machine learning.

In later missions in this course, we'll dive into the math behind how a model is fit to the dataset, how
to select and transform features, and more.

Feature Selection
In the machine learning workflow, once we've selected the model we want to use, selecting the
appropriate features for that model is the next important step. In this mission, we'll explore how to
use correlation between features and the target column, correlation between features, and variance
of features to select features. We'll continue working with the same housing dataset from the last
mission.

We'll specifically focus on selecting from feature columns that don't have any missing values and don't
need to be transformed to be useful (e.g. columns like Year Built and Year Remod/Add). We'll explore
how to deal with both of these in a later mission in this course.

To start, let's look at which columns fall into either of these two categories.

Instructions

• Read AmesHousing.txt into a dataframe named data. Be sure to separate on the \t delimiter.
• Create a dataframe called train, which contains the first 1460 rows of data
• Create a dataframe called test, which contains the rest of the rows of data
• Select the integer and float columns from train and assign them to the variable numerical_train
• Drop the following columns from numerical_train:


• PID (place ID isn't useful for modeling)
• Year Built
• Year Remod/Add
• Garage Yr Blt
• Mo Sold
• Yr Sold
• Calculate the number of missing values from each column in numerical_train. Create a Series object
where the index is made up of column names and the associated values are the number of missing
values
• Assign this Series object to null_series. Select the subset of null_series to keep only the columns
with no missing values, and assign the resulting Series object to full_cols_series
• Display full_cols_series using the print() function

Solutions
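A minimal sketch of one way to follow these instructions:

import pandas as pd

data = pd.read_csv('AmesHousing.txt', delimiter='\t')
train = data[0:1460]
test = data[1460:]

numerical_train = train.select_dtypes(include=['int64', 'float64'])
numerical_train = numerical_train.drop(['PID', 'Year Built', 'Year Remod/Add',
                                        'Garage Yr Blt', 'Mo Sold', 'Yr Sold'], axis=1)

# Number of missing values per remaining numerical column
null_series = numerical_train.isnull().sum()
full_cols_series = null_series[null_series == 0]
print(full_cols_series)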

Gradient Descent
In the previous missions, we learned how the linear regression model estimates the relationship
between the feature columns and the target column and how we can use that for making predictions.
In this mission and the next, we'll discuss the two most common ways for finding the optimal parameter
values for a linear regression model. Each combination of unique parameter values forms a unique
linear regression model, and the process of finding these optimal values is known as model fitting.

In both approaches to model fitting, we'll aim to minimize the following function:

MSE = (1/n) * Σ (ŷi − yi)²

This function is the mean squared error between the predicted labels made using a given model and
the true labels. The problem of choosing a set of values that minimize or maximize another function
is known as an optimization problem. To build intuition for the optimization process, let's start with a
single parameter linear regression model:

Note that this is different from a simple linear regression model, which actually
has two parameters: a0 and a1.

Let's use the Gr Liv Area column for the single parameter:

Figure 15. Simple linear regression model for Sale Price

Ordinary Least Squares
In the last mission, we explored an iterative technique for model fitting named gradient descent. The
gradient descent algorithm requires multiple iterations to converge on the optimal parameter values
and the number of iterations is highly dependent on the initial parameter values and the learning rate
we select.

In this mission, we'll explore a technique called ordinary least squares estimation or OLS estimation
for short. Unlike gradient descent, OLS estimation provides a clear formula to directly calculate the
optimal parameter values that minimize the cost function. To understand OLS estimation, we need to
first frame our linear regression problem in the matrix form. We've mostly worked with the following
form of the linear regression model:

https://app.dataquest.io/login?target-url=%2Fm%2F238%2Fordinary-least-squares

While this form represents the relationship between the features (x1 to xn) and the target column
(y) well when there are just a few parameter values, it doesn't scale well when we have hundreds of
parameters. If you recall from the Linear Algebra for Machine Learning course, we explored how matrix
notation lets us better represent and reason about a linear system with many variables. With that in
mind, here's what the matrix form of our linear regression model looks like:

Where X is a matrix representing the columns from the training set our model uses, a is a vector
representing the parameter values, and ŷ is the vector of predictions. Here's a diagram with some
sample values for each:


Now that we've gained an understanding for the matrix representation of the linear regression model,
let's take a peek at the OLS estimation formula that results in the optimal vector a:

a = (XᵀX)⁻¹ Xᵀ y

Let's start by computing OLS estimation to find the best parameters for a model using the following
features:

In the following screens, we'll dive into the mathematical derivation of the OLS estimation technique.
It's important to note that you'll most likely never implement this technique in a data science role and
will instead use an existing, efficient implementation (scikit-learn uses OLS under the hood when you
call fit() on a LinearRegression instance).

Instructions

• Create a dataframe, X, where:


• Its number of rows is the same as that of train (defined in the display code)
• The first column is called bias and is populated with 1s throughout
• The following columns are the ones in features from train, in the same order
• Select the SalePrice column from the training set and assign to y
• Use the OLS estimation formula to return the optimal parameter values. Store the estimation to the
variable ols_estimation

Solutions
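A sketch of the OLS computation, assuming train and features are defined in the display code and SalePrice is the target:

import numpy as np

X = train[features].copy()
X.insert(0, 'bias', 1)          # first column of 1s for the intercept term
y = train['SalePrice']

# a = (X^T X)^(-1) X^T y
Xt = X.values.T
ols_estimation = np.dot(np.linalg.inv(np.dot(Xt, X.values)), np.dot(Xt, y.values))
print(ols_estimation)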

Processing And Transforming Features


To understand how linear regression works, we have so far stuck to using or dropping features from
the training dataset that contained no missing values and were already in a convenient numeric
representation. In this mission, we'll explore how to transform some of the remaining features so we
can use them in our model. Broadly, the process of processing and creating new features is known
as feature engineering. Feature engineering is a bit of an art and having knowledge in the specific
domain (in this case real estate) can help you create better features. In this mission, we'll focus on some
domain-independent strategies that work for all problems.

In the first half of this mission, we'll focus only on columns that contain no missing values but still
aren't in the proper format to use in a linear regression model. In the latter half of this mission, we'll
explore some ways to deal with missing values.

Amongst the columns that don't contain missing values, some of the common issues include:

• The column is not numerical (e.g. a zoning code represented using text)
• The column is numerical but not ordinal (e.g. zip code values)
• The column is numerical but isn't representative of the type of relationship with the target column
(e.g. year values)

Let's start by filtering the training set to just the columns containing no missing values.

Instructions

• Select just the columns from the train data frame that contain no missing values
• Assign the resulting data frame, that contains just these columns, to df_no_mv
• Use the variables display to become familiar with these columns

Solutions

Guided Project: Predicting House Sale Prices



In this course, we started by building intuition for
model based learning, explored how the linear
regression model worked, understood how the
two different approaches to model fitting worked,
and some techniques for cleaning, transforming,
and selecting features. In this guided project, you
can practice what you learned in this course by
exploring ways to improve the models we built.
You'll work with housing data for the city of Ames,
Iowa, United States from 2006 to 2010. You can
read more about why the data was collected here.
You can also read about the different columns in
the data here.

https://app.dataquest.io/login?target-url=%2Fm%2F240%2Fguided-project%253A-predicting-house-sale-prices

Figure 16. Pipeline of functions for linear regression

Instructions

• Import pandas, matplotlib, and numpy into the environment. Import the classes you need from
scikit-learn as well
• Read AmesHousing.tsv into a pandas data frame
• For the following functions, we recommend creating them in the first few cells in the notebook. This
way, you can add cells to the end of the notebook to do experiments and update the functions in
these cells
• Create a function named transform_features() that, for now, just returns the train data frame
• Create a function named select_features() that, for now, just returns the Gr Liv
Area and SalePrice columns from the train data frame
• Create a function named train_and_test() that, for now:
• Selects the first 1460 rows from data and assign to train
• Selects the remaining rows from data and assign to test
• Trains a model using all numerical columns except the SalePrice column (the target column)
from the data frame returned from select_features()
• Tests the model on the test set and returns the RMSE value

Solutions

You can find the solutions notebook for this guided project here.
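A compact sketch of the three starter functions described above (a rough starting point, not the finished project):

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def transform_features(df):
    # For now, return the data frame unchanged
    return df

def select_features(df):
    # For now, keep only the feature and target columns
    return df[['Gr Liv Area', 'SalePrice']]

def train_and_test(df):
    train = df[0:1460]
    test = df[1460:]
    numeric_train = train.select_dtypes(include=['number'])
    features = numeric_train.columns.drop('SalePrice')
    lr = LinearRegression()
    lr.fit(train[features], train['SalePrice'])
    predictions = lr.predict(test[features])
    return np.sqrt(mean_squared_error(test['SalePrice'], predictions))

data = pd.read_csv('AmesHousing.tsv', delimiter='\t')
rmse = train_and_test(select_features(transform_features(data)))
print(rmse)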
V Machine Learning in Python


Logistic Regression
Linear regression is a supervised machine learning technique that works well when the target column
we're trying to predict, the dependent variable, is ordered and continuous. If the target column instead
contains discrete values, then linear regression isn't appropriate.

In this mission, we'll explore how to build a predictive model for these types of problems, which are
known as classification problems. In classification, our target column has a limited set of possible
values which represent different categories for a row. We use integers to represent the different
categories so we can continue to use mathematical functions to describe how the independent variables
map to the dependent variable.

Here are a few examples of classification problems:

We'll focus on binary classification for now, where the only two options for values are:

• 0 for the False condition


• 1 for the True condition

Before we learn more about classification, let's gain an understanding of the data.

Introduction to Evaluating Binary Classifiers



In the previous lesson, we learned about classification, logistic regression, and how to use scikit-learn
to fit a logistic regression model to a dataset on graduate school admissions. We'll continue to work
with the dataset, which contains data on 644 applications with the following columns:

• GRE - Applicant's score on the Graduate Record Exam, a generalized test for prospective graduate students
• Score ranges from 200 to 800
• GPA - College grade point average
• Continuous between 0.0 and 4.0
• Admit - Binary value
• 0 or 1, where 1 means the applicant was admitted to the program and 0 means the applicant was rejected

Let's use the logistic regression model from the last mission to predict the class labels for each
observation in the dataset and add these labels to the dataframe in a separate column.

Instructions

• Use the LogisticRegression method predict to return the label for each observation in the
dataset, admissions. Assign the returned list to labels
• Add a new column to the admissions dataframe named predicted_label that contains the values
from labels
• Use the Series method value_counts and the print function to display the distribution of the values
in the predicted_label column
• Use the dataframe method head and the print function to display the first five rows in admissions

Solutions
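A possible sketch, assuming admissions is the dataframe and logistic_model is the LogisticRegression instance fit on the gpa column in the previous lesson (the variable and feature names here are assumptions):

# Predict a class label for every observation in the dataset
labels = logistic_model.predict(admissions[['gpa']])
admissions['predicted_label'] = labels

print(admissions['predicted_label'].value_counts())
print(admissions.head())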



Multiclass Classification
The dataset we will be working with contains information on various cars. For each car we have
information about the technical aspects of the vehicle such as the motor's displacement, the car's weight,
the miles per gallon, and the car's acceleration. This information can be used to predict the origin of the vehicle, either
North America, Europe, or Asia. Unlike our previous classification datasets, we have three categories
to choose from, making our task more challenging.

Here's a preview of the data:

The dataset is hosted by the University of California Irvine on their machine learning repository. The
UCI Machine Learning repository contains many small datasets which are useful when getting your
hands dirty with machine learning.

You'll notice that the Data Folder contains different files. We'll be working with auto-mpg.data, which
omits the 8 rows containing missing values for fuel efficiency (mpg column). We've converted this data
into a CSV file named auto.csv for you.

Here are the columns in the dataset:

• mpg -- Miles per gallon, Continuous


• cylinders -- Number of cylinders in the motor, Integer, Ordinal, and Categorical
• displacement -- Motor size, Continuous
• horsepower -- Horsepower produced, Continuous
• weight -- Car's weight, Continuous
• acceleration -- Acceleration, Continuous
• year -- Year the car was built, Integer and Categorical
• origin -- Integer and Categorical. 1: North America, 2: Europe, 3: Asia

Instructions

• Import the Pandas library and read auto.csv into a Dataframe named cars.
• Use the Series.unique() method to assign the unique elements in the column origin to unique_regions.
Then use the print function to display unique_regions

Solutions
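A minimal sketch of these instructions:

import pandas as pd

cars = pd.read_csv('auto.csv')
unique_regions = cars['origin'].unique()
print(unique_regions)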

Overfitting
While exploring regression, we've briefly mentioned overfitting and the problems it can cause. In this
lesson, we'll explore how to identify overfitting and what you can do to avoid it. To explore overfitting,
we'll use a dataset on cars which contains 7 numerical features that could have an effect on a car's
fuel efficiency:

• cylinders -- the number of cylinders in the engine.


• displacement -- the engine's displacement
• horsepower -- the engine's horsepower
• weight -- the car's weight
• acceleration -- the car's acceleration
• model year -- the year that car model was released (e.g. 70 corresponds to 1970)
• origin -- where the car was manufactured (0 if North America, 1 if Europe, 2 if Asia)

• The mpg column is our target column and it's what we want to predict using the other features
• The dataset is hosted by the University of California Irvine on their machine learning repository.
You'll notice that the Data Folder contains multiple files. We'll work with auto-mpg.data, which
omits the 8 rows containing missing values for fuel efficiency (mpg column)
• The starter code imports Pandas, reads the data into a dataframe, and cleans up some messy values.
Explore the dataset to become more familiar with it
• Reading the starter code, you might discover some different syntax. If you run the code locally in
Jupyter Notebook or Jupyter Lab, you'll notice a SettingWithCopyWarning. This won't prevent
your code from running properly, but notifies you that whatever operation you're doing is trying to
be set on a copy of a slice from a dataframe. To resolve this, it's considered good practice to include
.copy() whenever you perform operations on a dataframe

Clustering Basics
So far, we've learned about regression and classification. These are both types of supervised machine learning.
In supervised learning, you can train an algorithm to predict an unknown variable from known variables.
Another major type of machine learning is called unsupervised learning. In unsupervised learning, we
aren't trying to predict anything. Instead, we're trying to find patterns in data.

We'll use an algorithm called k-means clustering to split our data into clusters. k-means clustering uses
Euclidean distance to form clusters of similar Senators. We'll dive into the theory of k-means clustering
and build the algorithm from the ground up in a later lesson. For now, it's important to understand
clustering at a high level, so we'll leverage the scikit-learn library to train a k-means model.

The k-means algorithm groups Senators who vote similarly on bills in clusters. Each cluster is assigned a
center, and the Euclidean distance from each Senator to the center is computed. Senators are assigned
to clusters based on proximity. From our background knowledge, we think that Senators cluster along
party lines.

The k-means algorithm requires us to specify the number of clusters upfront. Because we suspect
that clusters will occur along party lines, and the vast majority of Senators are either Republicans or
Democrats, we'll pick 2 for our number of clusters.

We'll use the KMeans class from scikit-learn to perform the clustering. Because we aren't predicting
anything, there's no risk of overfitting, so we'll train our model on the whole dataset. After training,
we'll be able to determine cluster labels that indicate each Senator's cluster.

We can initialize the model like this:
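The code itself isn't reproduced here; a sketch of the initialization being described might look like:

from sklearn.cluster import KMeans

# Two clusters (one per major party) and a fixed random_state for reproducible results
kmeans_model = KMeans(n_clusters=2, random_state=1)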

The above code initializes the k-means model with 2 clusters and a random state of 1 to allow for the
same results to be reproduced whenever the algorithm runs.

We'll then be able to use the fit_transform() method to fit the model to votes and get the distance of
each Senator to each cluster. The result will look similar to the example below:
This is a NumPy array with two columns. The
first column is the Euclidean distance from each
Senator to the first cluster and the second column
is the Euclidean distance to the second
cluster. The values in the columns indicate how
"far" the Senator is from each cluster. The further
away from the cluster, the less the Senator's
voting history aligns with the voting history of
the cluster.

K-means Clustering
In NBA media coverage, sports reporters usually focus on a few players and create stories about the
uniqueness of these players' stats. As data scientists, we're likely to be skeptical about how unique
each player really is. In this lesson, we'll use data science to explore this idea by looking at a dataset of
player information from the 2013-2014 season. Here are the columns we'll work with:

• player — the player's name


• pos — the player's position
• g — the number of games played
• pts — the player's total points scored


• fg. — the field goal percentage
• ft. — the free throw percentage

Check out the glossary in Basketball Reference for an explanation of each column.

Guided Project: Predicting the Stock Market


If you haven't been through a guided project in this interface, here's a
quick introduction. You can use these tools online:
• Google colaboratory
• Cocalc Collaborative Calculation and Data Science

For a more thorough overview, we recommend that you complete the Working With Data
Downloads guided project. In this project, you'll work with data from the S&P500 Index. The S&P500
is a stock market index. Before we get into what an index is, we'll need to start with the basics of the
stock market.

Some companies are publicly traded, which means that anyone can buy and sell their shares on the
open market. A share entitles the owner to some control over the direction of the company and to a
percentage (or share) of the earnings of the company. When you buy or sell shares, it's commonly known
as trading a stock.

The price of a share is based on supply and demand for a given stock. For example, Apple stock has a
price of 120 dollars per share as of December 2015 -- http://www.nasdaq.com/symbol/aapl. A stock
that is in less demand, like Ford Motor Company, has a lower price -- http://finance.yahoo.com/q?s=F.
Stock price is also influenced by other factors, including the number of shares a company has issued.

Stocks are traded daily and the price can rise or fall from the beginning of a trading day to the end
based on demand. Stocks that are more in demand, such as Apple, are traded more often than
stocks of smaller companies.

Indexes aggregate the prices of multiple stocks together, and allow you to see how the market as a
whole performs. For example, the Dow Jones Industrial Average aggregates the stock prices of 30 large
American companies together. The S&P500 Index aggregates the stock prices of 500 large companies.
When an index fund goes up or down, you can say that the primary market or sector it represents is
doing the same. For example, if the Dow Jones Industrial Average price goes down one day, you can
say that American stocks overall went down (i.e., most American stocks went down in price).

You'll be using historical data on the price of the S&P500 Index to make predictions about future
prices. Predicting whether an index goes up or down helps forecast how the stock market as a whole
performs. Since stocks tend to correlate with how well the economy as a whole performs, it can also
help with economic forecasts.

There are thousands of traders who make money by buying and selling Exchange Traded Funds. ETFs
allow you to buy and sell indexes like stocks. This means that you could "buy" the S&P500 Index ETF
when the price is low and sell when it's high to make a profit. Creating a predictive model could allow
traders to make money on the stock market.

The columns of the dataset are:

• Date -- The date of the record


• Open -- The opening price of the day (when trading starts)
• High -- The highest trade price during the day
• Low -- The lowest trade price during the day
• Close -- The closing price for the day (when trading is finished)
• Volume -- The number of shares traded
• Adj Close -- The daily closing price, adjusted retroactively to include any corporate actions

You'll be using this dataset to develop a predictive model. You'll train the model with data from 1950-
2012 and try to make predictions from 2013-2015.

Note: You shouldn't make trades with any models developed in this mission. Trading stocks has risks
and nothing in this mission constitutes stock trading advice.

In this lesson, you'll be working with a csv file containing index prices. Each row in the file contains a
daily record of the price of the S&P500 Index from 1950 to 2015. The dataset is stored in sphist.csv.
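A rough sketch of reading the data and splitting it by the dates described above (the feature engineering itself is left to the project):

import pandas as pd

df = pd.read_csv('sphist.csv')
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values('Date')

# Train on 1950-2012, make predictions for 2013-2015
train = df[df['Date'] < '2013-01-01']
test = df[df['Date'] >= '2013-01-01']
print(train.shape, test.shape)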
VI Decision Tree

Decision Tree
• Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-
structured classifier, where internal nodes represent the features of a dataset, branches represent
the decision rules and each leaf node represents the outcome.
• In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision nodes
are used to make any decision and have multiple branches, whereas Leaf nodes are the output of
those decisions and do not contain any further branches.
• The decisions or the test are performed on the basis of features of the given dataset.
• It is a graphical representation for getting all the possible solutions to a problem/decision based on
given conditions.
• It is called a decision tree because, similar to a tree, it starts with the root node, which expands on
further branches and constructs a tree-like structure.
• In order to build a tree, we use the CART algorithm, which stands for Classification and Regression
Tree algorithm.
• A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the tree
into subtrees.
• The diagram below explains the general structure of a decision tree:

Figure 17. Decision tree diagram

Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.

Why use Decision Trees?
There are various algorithms in Machine learning, so choosing the best algorithm for the given dataset
and problem is the main point to remember while creating a machine learning model. Below are the
two reasons for using the Decision tree:

• Decision Trees usually mimic human thinking ability while making a decision, so it is easy to
understand
• The logic behind the decision tree can be easily understood because it shows a tree-like structure

Decision Tree Terminologies


• Root Node: Root node is from where the decision tree starts. It represents the entire dataset, which
further gets divided into two or more homogeneous sets
• Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated further after
getting a leaf node
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according
to the given conditions
• Branch/Sub Tree: A tree formed by splitting the tree
• Pruning: Pruning is the process of removing the unwanted branches from the tree
• Parent/Child node: The root node of the tree is called the parent node, and other nodes are called
the child nodes
How Does the Decision Tree Algorithm Work?
In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record (from the real dataset) and, based on the comparison, follows the branch and jumps to the next node.

For the next node, the algorithm again compares the attribute value with those of the sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:

• Step 1: Begin the tree with the root node, say S, which contains the complete dataset
• Step 2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM)
• Step 3: Divide S into subsets containing the possible values of the best attribute
• Step 4: Generate the decision tree node that contains the best attribute
• Step 5: Recursively make new decision trees using the subsets of the dataset created in Step 3. Continue this process until a stage is reached where the nodes cannot be classified further; each such final node is a leaf node (a toy sketch of this recursive procedure follows below)
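
To make these steps concrete, here is a toy, hedged sketch of the recursive splitting procedure for categorical features. The helper names (entropy, information_gain, build_tree) and the tiny job-offer-style dataset are illustrative assumptions, not any particular library's API:

import math
from collections import Counter

def entropy(labels):
    # Entropy of a list of class labels.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    # Entropy(S) minus the weighted entropy of each subset created by splitting on `feature`.
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[feature], []).append(label)
    weighted = sum(len(subset) / total * entropy(subset) for subset in subsets.values())
    return entropy(labels) - weighted

def build_tree(rows, labels, features):
    # Stop when the node is pure or no features remain: return a leaf label (Step 5).
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: pick the attribute with the highest information gain (the ASM).
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    tree = {best: {}}
    # Steps 3-5: split on each value of the best attribute and recurse on the subsets.
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        sub_rows = [rows[i] for i in idx]
        sub_labels = [labels[i] for i in idx]
        remaining = [f for f in features if f != best]
        tree[best][value] = build_tree(sub_rows, sub_labels, remaining)
    return tree

# Tiny illustrative dataset: should a candidate accept a job offer?
rows = [
    {"salary": "high", "distance": "near"},
    {"salary": "high", "distance": "far"},
    {"salary": "low", "distance": "near"},
    {"salary": "low", "distance": "far"},
]
labels = ["accept", "accept", "decline", "decline"]
print(build_tree(rows, labels, ["salary", "distance"]))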

Example: Suppose a candidate has a job offer and wants to decide whether or not to accept it. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits into the next decision node (Distance from the office) and one leaf node, based on the corresponding labels. The next decision node further splits into one decision node (Cab facility) and one leaf node. Finally, that decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the following diagram:

Figure 18. Decision tree for the job offer example

Attribute Selection Measures

While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes. To solve this problem, there is a technique called the Attribute Selection Measure (ASM). With this measure, we can easily select the best attribute for the nodes of the tree. The two most popular ASM techniques are:
• Information Gain
• Gini Index

1. Information Gain:
• Information gain is the measurement of the change in entropy after the segmentation of a dataset based on an attribute
• It calculates how much information a feature provides about a class
• According to the value of information gain, we split the node and build the decision tree
• A decision tree algorithm always tries to maximize the value of information gain, and the node/attribute with the highest information gain is split first. It can be calculated using the following formula:

Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]

Entropy: Entropy is a metric that measures the impurity of a given attribute; it specifies the randomness in the data. For a two-class (yes/no) problem it can be calculated as:

Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)

Where:
• S = the total number of samples
• P(yes) = the probability of yes
• P(no) = the probability of no

2. Gini Index:

• The Gini index is a measure of impurity or purity used while creating a decision tree with the CART (Classification and Regression Tree) algorithm
• An attribute with a low Gini index should be preferred over one with a high Gini index
• The CART algorithm only creates binary splits, and it uses the Gini index to choose them
• The Gini index can be calculated using the formula below (a short computational sketch of both impurity measures follows):

Gini Index = 1 − Σj (Pj)²
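
As a small hedged illustration of the two measures, the snippet below computes the entropy and the Gini index of the same node; the label counts are arbitrary example values:

import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum over classes of P(class) * log2(P(class))
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gini(labels):
    # Gini Index = 1 - sum over classes of P(class)^2
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

node = ["yes"] * 6 + ["no"] * 4   # a node with 6 "yes" and 4 "no" samples
print(round(entropy(node), 3))    # about 0.971
print(round(gini(node), 3))       # about 0.48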

Pruning: Getting an Optimal Decision Tree

Pruning is the process of deleting unnecessary nodes from a tree in order to obtain the optimal decision tree.

A tree that is too large increases the risk of overfitting, while a tree that is too small may not capture all the important features of the dataset. A technique that decreases the size of the learning tree without reducing accuracy is therefore known as pruning. There are two main types of tree pruning techniques:

• Cost Complexity Pruning (a scikit-learn sketch follows below)
• Reduced Error Pruning
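
As one hedged example, scikit-learn exposes cost complexity pruning through the ccp_alpha parameter of DecisionTreeClassifier; the synthetic dataset and the alpha value below are arbitrary choices for illustration, not a recommendation:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data just for illustration.
X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree grows until every leaf is pure and tends to overfit.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A positive ccp_alpha removes branches whose complexity is not worth their gain.
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

print("full:  ", full_tree.get_n_leaves(), "leaves, test accuracy", full_tree.score(X_test, y_test))
print("pruned:", pruned_tree.get_n_leaves(), "leaves, test accuracy", pruned_tree.score(X_test, y_test))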

Advantages of the Decision Tree

• It is simple to understand, as it follows the same process a human follows when making a decision in real life
• It can be very useful for solving decision-related problems
• It helps to think through all the possible outcomes of a problem
• It requires less data cleaning compared to other algorithms

Disadvantages of the Decision Tree

• A decision tree can contain many layers, which makes it complex
• It may have an overfitting issue, which can be mitigated using the Random Forest algorithm
• With more class labels, the computational complexity of the decision tree may increase

Python Implementation of Decision Tree

Now we will implement the decision tree using Python. For this, we will use the dataset "user_data.csv," which we have used in the previous classification models. By using the same dataset, we can compare the decision tree classifier with other classification models such as KNN, SVM, and Logistic Regression.

The steps will also remain the same as in the earlier lessons:

• Data pre-processing step
• Fitting a decision tree algorithm to the training set
• Predicting the test result
• Testing the accuracy of the result (creation of a confusion matrix)
• Visualizing the training set result
• Visualizing the test set result

1. Data Pre-Processing Step:
Below is the code for the pre-processing step:
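This is a minimal sketch, assuming (as in the earlier classification lessons) that user_data.csv contains Age and EstimatedSalary as features and Purchased as the target; the exact column selection may differ in your copy of the dataset:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset (column names are assumptions based on the earlier lessons).
dataset = pd.read_csv("user_data.csv")
x = dataset[["Age", "EstimatedSalary"]].values   # independent variables
y = dataset["Purchased"].values                  # dependent variable

# Split the data into training and test sets.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Feature scaling so both variables contribute on a comparable scale.
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)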
In the above code, we have pre-processed the data: the dataset is loaded, split into training and test sets, and the features are scaled.



2. Fitting a Decision-Tree Algorithm to the Training Set
Now we will fit the model to the training set. For this, we will import the DecisionTreeClassifier class from the sklearn.tree module. Below is the code for it:
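A hedged sketch of the fitting step, reusing the x_train and y_train arrays from the pre-processing sketch above:

from sklearn.tree import DecisionTreeClassifier

# Create the classifier and fit it to the (scaled) training data.
classifier = DecisionTreeClassifier(criterion="entropy", random_state=0)
classifier.fit(x_train, y_train)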

In the above code, we have created a classifier object and passed two main parameters:
• criterion='entropy': the criterion used to measure the quality of a split, calculated here through the information gain given by entropy
• random_state=0: fixes the random seed so that the results are reproducible

Running this code fits the classifier to the training data and returns the fitted DecisionTreeClassifier object.

3. Predicting the test result


Now we will predict the test set result. We will create a new prediction vector y_pred. Below is the
code for it:
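A minimal sketch of the prediction step, assuming the classifier and x_test from the previous steps:

# Predict the class (Purchased or not) for every row of the test set.
y_pred = classifier.predict(x_test)
print(y_pred)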

Output:
Comparing the prediction vector with the real test labels shows that some predicted values differ from the real values; these are prediction errors.
4. Test accuracy of the result (Creation of Confusion matrix)
In the above output, we have seen that there were some incorrect predictions, so if we want to know
the number of correct and incorrect predictions, we need to use the confusion matrix. Below is the
code for it:
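A minimal sketch of the confusion matrix step, assuming y_test and y_pred from the previous steps:

from sklearn.metrics import confusion_matrix

# Rows are the true classes, columns are the predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)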

Output:

The confusion matrix shows 6 + 3 = 9 incorrect predictions and 62 + 29 = 91 correct predictions. Therefore, we can say that, compared to the other classification models, the decision tree classifier made good predictions.

5. Visualizing the training set result:

Here we will visualize the training set result. To do so, we will plot a graph for the decision tree classifier. The classifier will predict Yes or No for the users who have either purchased or not purchased the SUV car, as we did in Logistic Regression. Below is the code for it:
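A hedged sketch of the plot, following the same meshgrid approach used in the earlier classification lessons; the colours and the grid step are arbitrary choices:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

x_set, y_set = x_train, y_train
colors = ("purple", "green")

# Grid covering the (scaled) Age / Estimated Salary plane.
x1, x2 = np.meshgrid(
    np.arange(x_set[:, 0].min() - 1, x_set[:, 0].max() + 1, 0.01),
    np.arange(x_set[:, 1].min() - 1, x_set[:, 1].max() + 1, 0.01),
)

# Colour every grid point with the class the tree predicts for it.
plt.contourf(
    x1, x2,
    classifier.predict(np.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
    alpha=0.75, cmap=ListedColormap(colors),
)
plt.xlim(x1.min(), x1.max())
plt.ylim(x2.min(), x2.max())

# Overlay the actual training points, coloured by their true class.
for i, label in enumerate(np.unique(y_set)):
    plt.scatter(x_set[y_set == label, 0], x_set[y_set == label, 1],
                color=colors[i], label=label)

plt.title("Decision Tree Algorithm (Training set)")
plt.xlabel("Age")
plt.ylabel("Estimated Salary")
plt.legend()
plt.show()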

Output:

The above output is completely different from that of the other classification models. It has both vertical and horizontal lines that split the dataset according to the age and estimated salary variables. As we can see, the tree is trying to capture every data point, which is a sign of overfitting.

6. Visualizing the test set result:

Visualizing the test set result is similar to visualizing the training set result, except that the training set is replaced with the test set.

Output:

In the output plot, there are some green data points within the purple region and vice versa. These are the incorrect predictions that we discussed with the confusion matrix.

Guided Project: Predicting Bike Rentals
Many U.S. cities have communal bike sharing stations where you can rent bicycles by the hour or day.
Washington, D.C. is one of these cities. The District collects detailed data on the number of bicycles
people rent by the hour and day.

Hadi Fanaee-T at the University of Porto compiled this data into a CSV file, which you'll work with
in this project. The file contains 17380 rows, with each row representing the number of bike rentals
for a single hour of a single day. You can download the data from the University of California, Irvine's
website. If you need help at any point, you can consult the solution notebook in our GitHub repository.


Here are the descriptions for the relevant columns:

• instant - A unique sequential ID number for each row
• dteday - The date of the rentals
• season - The season in which the rentals occurred
• yr - The year the rentals occurred
• mnth - The month the rentals occurred
• hr - The hour the rentals occurred
• holiday - Whether or not the day was a holiday
• weekday - The day of the week (as a number, 0 to 7)
• workingday - Whether or not the day was a working day
• weathersit - The weather (as a categorical variable)
• temp - The temperature, on a 0-1 scale
• atemp - The adjusted temperature
• hum - The humidity, on a 0-1 scale
• windspeed - The wind speed, on a 0-1 scale
• casual - The number of casual riders (people who hadn't previously signed up with the bike sharing program)
• registered - The number of registered riders (people who had already signed up)
• cnt - The total number of bike rentals (casual + registered)

In this project, you'll try to predict the total number of bikes people rented in a given hour. You'll
predict the cnt column using all of the other columns, except for casual and registered. To accomplish
this, you'll create a few different machine learning models and evaluate their performance.

Instructions

• Use the pandas library to read bike_rental_hour.csv into the dataframe bike_rentals
• Print out the first few rows of bike_rentals and take a look at the data
• Make a histogram of the cnt column of bike_rentals and take a look at the distribution of total rentals
• Use the corr method on the bike_rentals dataframe to explore how each column is correlated with cnt (a short exploration sketch follows below)
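
A hedged sketch of these first exploration steps, assuming bike_rental_hour.csv is in the working directory:

import pandas as pd
import matplotlib.pyplot as plt

# Read the hourly rental data into a dataframe.
bike_rentals = pd.read_csv("bike_rental_hour.csv")
print(bike_rentals.head())

# Distribution of total rentals per hour.
bike_rentals["cnt"].hist()
plt.xlabel("Total rentals (cnt)")
plt.ylabel("Frequency")
plt.show()

# Correlation of every numeric column with cnt.
print(bike_rentals.corr(numeric_only=True)["cnt"].sort_values(ascending=False))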

References and Bibliography

Papers

[1] M. Nasiri, B. Minaei, and Z. Sharifi, “Adjusting data sparsity problem using linear algebra and machine learning algorithm,” Appl. Soft Comput., vol. 61, pp. 1153–1159, Dec. 2017, doi: 10.1016/j.asoc.2017.05.042.

[2] G. Marzano and A. Novembre, “Machines that Dream: A New Challenge in Behavioral-Basic Robotics,” Procedia Comput. Sci., vol. 104, pp. 146–151, Jan. 2017, doi: 10.1016/j.procs.2017.01.089.

[3] Y. Ao, H. Li, L. Zhu, S. Ali, and Z. Yang, “The linear random forest algorithm and its advantages in machine learning assisted logging regression modeling,” J. Pet. Sci. Eng., vol. 174, pp. 776–789, Mar. 2019, doi: 10.1016/j.petrol.2018.11.067.

[4] D. A. Otchere, T. O. Arbi Ganat, R. Gholami, and S. Ridha, “Application of supervised machine learning paradigms in the prediction of petroleum reservoir properties: Comparative analysis of ANN and SVM models,” J. Pet. Sci. Eng., vol. 200, p. 108182, May 2021, doi: 10.1016/j.petrol.2020.108182.

[5] Y. Iwamoto et al., “Development and Validation of Machine Learning-Based Prediction for Dependence in the Activities of Daily Living after Stroke Inpatient Rehabilitation: A Decision-Tree Analysis,” J. Stroke Cerebrovasc. Dis., vol. 29, no. 12, p. 105332, Dec. 2020, doi: 10.1016/j.jstrokecerebrovasdis.2020.105332.

[6] K. Maheswari, A. Priya, A. Balamurugan, and S. Ramkumar, “Analyzing student performance factors using KNN algorithm,” Mater. Today Proc., Feb. 2021, doi: 10.1016/j.matpr.2020.12.1024.

[7] L. Liang et al., “Status evaluation method for arrays in large-scale photovoltaic power stations based on extreme learning machine and k-means,” Energy Rep., vol. 7, pp. 2484–2492, Nov. 2021, doi: 10.1016/j.egyr.2021.04.039.

[8] W. A. van Eeden et al., “Predicting the 9-year course of mood and anxiety disorders with automated machine learning: A comparison between auto-sklearn, naïve Bayes classifier, and traditional logistic regression,” Psychiatry Res., vol. 299, p. 113823, May 2021, doi: 10.1016/j.psychres.2021.113823.

[9] C. M. Yeşilkanat, “Spatio-temporal estimation of the daily cases of COVID-19 in worldwide using random forest machine learning algorithm,” Chaos Solitons Fractals, vol. 140, p. 110210, Nov. 2020, doi: 10.1016/j.chaos.2020.110210.

Books

[10] A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd edition. Sebastopol, CA: O’Reilly Media, Inc., USA, 2019.

[11] Foundations of Machine Learning. Cambridge, MA: The MIT Press, 2012.

Platforms for self-paced learning

https://www.dataquest.io
https://www.analyticsvidhya.com
https://cloudacademy.com
https://data-flair.training
https://learn.datacamp.com

www.certiprof.com
