
Course: 4KG3

Assignment 1 - Linear regression for prediction

1. Using data visualization for initial variable investigation and selection (20%)

a) What are the distributions for the dependent and independent variables?

There are 11 variables: 10 independent variables (Age, Mileage, Fuel Type, Horse Power, Metallic, Automatic, CC, Doors, QuartTax and Weight) and one dependent variable, Price.
By examining the distributions, it is observed that some continuous variables, such as Price, Mileage, and Weight, are left-skewed, while Age exhibits a right-skewed distribution. Most variables, including Price, Age, Mileage, Horse Power, CC, QuartTax and Weight, have outliers; the remaining variables are relatively concentrated.
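As an illustration of how this check could be done outside JMP, here is a minimal Python sketch (the file name UsedCars.csv and the exact column names are assumptions):

import pandas as pd

# Load the used-car data (file name and column names are assumptions).
df = pd.read_csv("UsedCars.csv")

numeric_cols = ["Price", "Age", "Mileage", "Horse Power", "CC", "QuartTax", "Weight"]

for col in numeric_cols:
    s = df[col]
    # Skewness sign: positive = right-skewed, negative = left-skewed.
    skew = s.skew()
    # Count outliers with the usual 1.5 * IQR rule.
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = ((s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)).sum()
    print(f"{col}: skewness={skew:.2f}, outliers={outliers}")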

b) How do they relate to each other?

Based on the Scatterplot Matrix analysis and "Fit Y by X" with Price, there is a negative relationship between Price and Age and between Price and Mileage: as the age or mileage of a used car increases, its price tends to decrease, and vice versa. In contrast, there is a positive relationship between Price and Weight, indicating that heavier vehicles are typically more expensive. Horse Power and QuartTax also rise with Price, while Fuel Type, Doors, Metallic, Automatic and CC have less clear effects on Price.
Scatterplot Matrix:

Fit Y by X by price:
c) What appears to be the three or four most important car specifications for predicting the used car price?

The most important specifications affecting the price of a used car are Age, Mileage, and Weight. From the Scatterplot Matrix and Fit Y by X results above, Price has a strong negative correlation with Age and Mileage, meaning that the older the car and the higher the mileage, the lower the price. Conversely, there is a strong positive correlation between Weight and Price, indicating that heavier vehicles tend to be more expensive.
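A quick way to quantify these relationships is to compute the correlation of each numeric variable with Price; a minimal sketch, under the same file and column-name assumptions as above:

import pandas as pd

df = pd.read_csv("UsedCars.csv")  # assumed file name
numeric_cols = ["Age", "Mileage", "Horse Power", "CC", "QuartTax", "Weight", "Price"]

# Correlation of each numeric predictor with Price, from most negative to most positive.
corr_with_price = df[numeric_cols].corr()["Price"].drop("Price")
print(corr_with_price.sort_values())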

2. Data preparation and partition for training, validation, and testing (10%)

a) Why do we need to convert fuel type to dummy variables?

Since the Linear Regression model only deals with numerical variables. Dummy variables transform categorical
(discrete) data into numerical data. Adding dummy variables to the analysis will help to create a better fit of the
model. To ensure that the model runs effectively and predicts accurately, we need to convert the Fuel Type to a
dummy variable.
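A minimal sketch of this encoding step in Python (the file name and the Fuel Type column label are assumptions; JMP performs the equivalent coding internally):

import pandas as pd

df = pd.read_csv("UsedCars.csv")  # assumed file name

# One-hot encode Fuel Type; drop_first keeps one category (e.g. Petrol)
# as the baseline and avoids perfect collinearity.
df = pd.get_dummies(df, columns=["Fuel Type"], drop_first=True)
print(df.filter(like="Fuel Type").head())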

b) Why do we need to do data partition?


In constructing the prediction model, the data were divided into Training, Validation and Test sets. The model is trained on the training set, its performance is evaluated on the validation set, and the test set provides a final check on how the model performs on data it has never seen. Without data partitioning, the model may overfit the noise in the training data, reducing its predictive ability and accuracy. Data partitioning reflects the performance of the model more objectively and realistically, mitigates overfitting and improves the performance of the prediction model.
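A minimal sketch of such a split outside JMP (the 60/20/20 proportions are an assumption; JMP's validation-column feature does the same job):

import pandas as pd

df = pd.read_csv("UsedCars.csv")  # assumed file name

# Shuffle, then split 60% / 20% / 20% into training / validation / test.
shuffled = df.sample(frac=1.0, random_state=42).reset_index(drop=True)
n = len(shuffled)
train = shuffled.iloc[: int(0.6 * n)]
valid = shuffled.iloc[int(0.6 * n): int(0.8 * n)]
test = shuffled.iloc[int(0.8 * n):]
print(len(train), len(valid), len(test))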

3. Run a linear regression with all available variables (30%)

a) What is the mathematical formula of the regression model obtained?
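The fitted coefficients come from the JMP prediction expression in the output, so only the general form can be written here; the Fuel Type dummy levels (CNG and Diesel, with Petrol as the baseline) are an assumption about how the dummies were coded:

\[
\begin{aligned}
\widehat{\text{Price}} ={}& b_0 + b_1\,\text{Age} + b_2\,\text{Mileage} + b_3\,\text{Horse Power} + b_4\,\text{Metallic} + b_5\,\text{Automatic} \\
&+ b_6\,\text{CC} + b_7\,\text{Doors} + b_8\,\text{QuartTax} + b_9\,\text{Weight} + b_{10}\,\text{Fuel Type[CNG]} + b_{11}\,\text{Fuel Type[Diesel]}
\end{aligned}
\]

where the coefficients \(b_0, \dots, b_{11}\) are the values estimated from the training data in the JMP prediction expression.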

b) How do you calculate the predicted price and the prediction error (residual) for each record?

We can use the prediction expression to calculate the predicted price: for each record, plug the values of the independent variables into the prediction expression to obtain the predicted price.

The residual is the difference between the actual value and the predicted value (residual = actual value - predicted value). The adjusted R-square does not give the individual residuals, but it does indicate their overall size: when the adjusted R-square is high, the model fits the data well and the predicted values deviate relatively little from the actual values, and vice versa.
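A minimal sketch of this calculation using scikit-learn instead of JMP (file and column names are assumptions):

import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("UsedCars.csv")  # assumed file name
df = pd.get_dummies(df, columns=["Fuel Type"], drop_first=True)

X = df.drop(columns=["Price"])
y = df["Price"]

model = LinearRegression().fit(X, y)

# Predicted price and residual (actual - predicted) for each record.
df["Predicted Price"] = model.predict(X)
df["Residual"] = df["Price"] - df["Predicted Price"]
print(df[["Price", "Predicted Price", "Residual"]].head())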

c) Show the error distributions for training, validation and testing. What are the differences between them?

Below are the error distributions for Training, Validation and Testing.

The residual mean of the Training set is close to 0, and the standard deviation is 1332.2942, indicating that the model fits the training data well. The Validation set has a residual mean of 74.991602 and a standard deviation of 1422.1056, slightly higher than the training set but still within acceptable limits. The Test set has a residual mean of -134.0457 and a standard deviation of 1332.3742, similar to the validation set, and the overall performance is stable.

The R-square values are 0.8711 for the Training set, 0.8571 for the Validation set, and 0.8781 for the Test set. These three values are close to each other. Overall, they indicate that the model performs well across the different datasets and there is no obvious overfitting problem; the model's performance is reliable and it predicts price consistently.
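To reproduce summary numbers of this kind outside JMP, one could fit the model on the training rows and then summarize the residuals by partition; a sketch, assuming a Partition column that labels each row:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Assumed file name and an assumed "Partition" column with values
# "Training", "Validation" or "Test" for each row.
df = pd.read_csv("UsedCars.csv")
df = pd.get_dummies(df, columns=["Fuel Type"], drop_first=True)
X = df.drop(columns=["Price", "Partition"])
y = df["Price"]

# Fit on the training rows only, then score every partition.
train_rows = df["Partition"] == "Training"
model = LinearRegression().fit(X[train_rows], y[train_rows])
df["Residual"] = y - model.predict(X)

for part in ["Training", "Validation", "Test"]:
    rows = df["Partition"] == part
    resid = df.loc[rows, "Residual"]
    r2 = r2_score(y[rows], model.predict(X[rows]))
    print(f"{part}: mean={resid.mean():.2f}, sd={resid.std():.2f}, R2={r2:.4f}")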
4. Automated variable selection (30%)

a) What methods have you used for variable selection?

Three methods were used for variable selection: Forward Selection, Backward Elimination and Mixed Stepwise.

Forward Selection: Max Validation R-Square is used as the stopping rule.

Backward Elimination: Max Validation R-Square is used as the stopping rule.

Mixed Stepwise: a p-value threshold is used as the stopping rule.


b) What is the best set of variables you will use? What are the criteria used for selection?

The best set of variables to select is Age, Weight, Mileage, Horse Power, QuartTax and Fuel Type (CNG).

Max Validation R-Square was used as the criterion for selection. In my analysis, the Backward Elimination method has a higher R-Square (0.8661) and a higher Adjusted R-Square (0.8627) than the Forward Selection method (R-Square 0.8653 and Adjusted R-Square 0.8605). In addition, Backward Elimination has lower BIC and AIC values, demonstrating better performance and simplicity. Therefore, Backward Elimination is the preferred method for selecting variables.

The selection procedure follows the Backward Elimination method: it starts with all predictors, removes the least relevant predictor one at a time, and stops when the Validation R-Square no longer improves when additional variables are removed. By doing so, Age, Weight, Mileage, Horse Power, QuartTax and Fuel Type (CNG) are selected as effective predictors in the model, as sketched below.
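The following is a simplified, greedy sketch of the backward-elimination idea with Validation R-Square as the criterion (scikit-learn instead of JMP; the file, column and partition names are assumptions):

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

df = pd.read_csv("UsedCars.csv")  # assumed file name
df = pd.get_dummies(df, columns=["Fuel Type"], drop_first=True)
train = df[df["Partition"] == "Training"]      # assumed partition column
valid = df[df["Partition"] == "Validation"]

def valid_r2(features):
    """Fit on the training rows, return R-Square on the validation rows."""
    m = LinearRegression().fit(train[features], train["Price"])
    return r2_score(valid["Price"], m.predict(valid[features]))

features = [c for c in df.columns if c not in ("Price", "Partition")]
best = valid_r2(features)

# Repeatedly drop a variable whose removal improves validation R-Square;
# stop when no removal improves it any further.
improved = True
while improved and len(features) > 1:
    improved = False
    for f in features:
        candidate = [c for c in features if c != f]
        score = valid_r2(candidate)
        if score > best:
            best, features, improved = score, candidate, True
            break  # restart the scan with the reduced variable set

print("Selected variables:", features, "Validation R-Square:", round(best, 4))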

5. Possible real-world applications (10%)

a) What is the potential business value for used car price prediction?

A used car price prediction model can greatly benefit dealerships and buyers. For dealerships, it helps in pricing cars
competitively, ensuring a fair profit margin. Buyers can make informed decisions, avoiding overpaying for a vehicle.
The model can also analyze trends, helping sellers understand market demands. Additionally, it streamlines the
buying and selling process, reducing the time spent on negotiations. By predicting accurate prices, it promotes
transparency and trust between buyers and sellers. Overall, this model enhances decision-making, improves
customer satisfaction, and boosts sales in the used car market.

b) Search the web and find out potential utilities of linear regression in other domains such as finance, healthcare
etc.

Linear regression has potential utility in finance and healthcare. Take finance as an example: linear regression is primarily used to identify relationships between different financial variables, allowing analysts to predict future values such as stock prices, forecast company performance, and assess the impact of various factors on investment returns. It does this by analyzing historical data and fitting a "best fit" line that shows how the variables move in relation to stock price or company value. For example, linear regression can be used to calculate a stock's "beta", which measures its volatility relative to the overall market.
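A minimal sketch of the beta calculation, using made-up placeholder return numbers purely to show the computation:

import numpy as np

# Placeholder weekly returns for a stock and the market index (assumed data).
stock_returns = np.array([0.012, -0.004, 0.021, 0.006, -0.015, 0.009])
market_returns = np.array([0.010, -0.002, 0.015, 0.004, -0.011, 0.007])

# Beta is the slope of the regression of stock returns on market returns,
# i.e. Cov(stock, market) / Var(market).
beta, alpha = np.polyfit(market_returns, stock_returns, deg=1)
print(f"beta = {beta:.2f}, alpha = {alpha:.4f}")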

6. Report how much time you have spent on this assignment, what problems you have faced and what you have learned.

This assignment took me about 10 hours to finish. It was the first time I used JMP, so it took some time to understand its functions. Through the professor's demonstrations and videos in class, I gradually became familiar with the JMP software and learned how to use it to run basic linear regression. By doing the assignment, I gained a more comprehensive understanding of data distributions, data partitioning and their importance in data analysis. I also learned how to use three different stepwise methods to select the most meaningful variables, and which criteria can be used to judge their performance so that we can build the most effective prediction model. One problem I met was understanding the partitioning and how it affects outcomes. At first, I did not understand why each partition generates a different prediction expression. Later, I learned that when we partition the data, the data set is divided into three subsets: training, validation and test. Each time we repartition, different subsets are created, so the prediction expression differs as well. I also learned that linear regression is a useful data analysis tool for building predictive models that can be widely applied in different sectors, such as finance, healthcare and marketing, to make informed decisions.
