Day 5
Welcome to Day 5 of our series! Today, we’ll explore Linear Regression, a foundational
algorithm in supervised learning. This method is widely used for predicting continuous
outcomes, and by the end of this blog, you will know how to apply it to predict housing
prices using a real-world dataset!
What is Regression?
In supervised learning, regression is a method used to predict continuous outcomes based on
input features. Unlike classification, where the goal is to categorize inputs into discrete labels,
regression predicts values, such as prices, temperatures, or sales. The most common type of
regression is Linear Regression, which assumes a linear relationship between input variables
(also called features or independent variables) and the target variable (or dependent
variable).
Key types of regression include:
Linear Regression: Fits a straight line that best represents the data.
Polynomial Regression: Fits a nonlinear curve to the data.
Ridge/Lasso Regression: Adds a penalty on the size of the coefficients to reduce overfitting, which is especially useful in models with many features.
Today, we’ll focus on Linear Regression using the California Housing Dataset.
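If you'd like to follow along in code, here is a minimal sketch of loading the dataset and selecting a couple of features. It assumes scikit-learn is being used (its built-in copy of the California Housing data), and the two illustrative features correspond to the median income and average rooms used as examples later; your own feature choice may differ.

```python
# A minimal sketch: load the California Housing Dataset with scikit-learn
from sklearn.datasets import fetch_california_housing

# Load the data as a pandas DataFrame for easier exploration
housing = fetch_california_housing(as_frame=True)
df = housing.frame

# Illustrative features (median income, average rooms) and the target (median house value)
X = df[["MedInc", "AveRooms"]]
y = df["MedHouseVal"]

print(df.head())
```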
Step 3: Splitting the Data
To evaluate the performance of our model, we need to split the data into training and testing
sets. The training set will be used to build the model, while the testing set will allow us to
check how well the model performs on unseen data.
A typical approach is to use around 80% of the data for training and 20% for testing. This
ensures that we have enough data to train the model while still keeping a portion to validate
its performance.
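Continuing the sketch above, the 80/20 split might look like this with scikit-learn's train_test_split (the random_state value is just an arbitrary choice to make the split reproducible):

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; the rest are used for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), "training rows,", len(X_test), "testing rows")
```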
Step 4: Training the Linear Regression Model
Now that the data is ready, it's time to build the model. Linear Regression finds the linear
relationship that best fits the data: a straight line when there is a single feature, and a plane
(or hyperplane) when there are several. This fitted relationship is then used to predict house
prices from the input features we’ve selected.
The model finds the best possible coefficients for the input features (e.g., median income and
average rooms) by minimizing the squared differences between the actual and predicted prices,
an approach known as ordinary least squares.
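A minimal sketch of this step, assuming scikit-learn and the training split from above:

```python
from sklearn.linear_model import LinearRegression

# Fit an ordinary least-squares linear regression on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# The learned coefficients (one per feature) and the intercept define the fitted line
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
```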
Step 5: Evaluating the Model
Once the model is trained, we can use the testing set to evaluate its performance. Two
important metrics for regression models are:
Mean Squared Error (MSE): This measures the average squared difference between
the actual and predicted values. A lower MSE indicates better model performance.
R-squared (R²): This tells us how well the input features explain the variation in the
target variable. An R² value close to 1 means the model explains most of the variation
in house prices.
After evaluating the model, you’ll have an idea of how well it can predict house prices based
on the features in the dataset.
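Here is a small sketch of computing both metrics on the test set, again assuming scikit-learn and the model and variable names used in the earlier snippets:

```python
from sklearn.metrics import mean_squared_error, r2_score

# Predict on the unseen test set and compare with the actual prices
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.3f}")
print(f"R-squared: {r2:.3f}")
```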
Takeaways
Linear Regression is a great starting point in supervised learning, helping us
understand how different features relate to a continuous target variable.
In real-world projects, the process of loading data, exploring it, splitting it into
training and testing sets, building the model, and evaluating performance is a common
workflow.
Understanding these steps will serve as a foundation for more advanced machine
learning models that we’ll cover in future days!
In the next few days, we will dive deeper into other techniques and introduce more complex
models. I’ll see you in the next blog. Until then, keep learning and stay healthy!
LinkedIn - Anubhav