Capstone Project

This project report outlines the development of a machine learning model for predicting car prices using the CarPrice dataset, which consists of 205 records and 26 features. The study employs Linear Regression and Random Forest Regression models, with Random Forest achieving better accuracy and robustness in predictions. Key findings highlight the influence of features like engine size and horsepower on car prices, and future work suggests incorporating deep learning techniques and external economic factors.

Project Report

on
Title

Submitted in partial fulfillment for completion of


AI-training

SUBMITTED TO
FOUNDATION FOR INNOVATION
AND TECHNOLOGY TRANSFER

भारतीय प्रौद्योगिकी संस्थान दिल्ली


INDIAN INSTITUTE OF TECHNOLOGY DELHI

Submitted By
PRASHANT DWIVEDI
[email protected]

COLLEGE NAME & ADDRESS


GOVT PMCOE COLLEGE SATNA (MP)
TABLE OF CONTENTS
1. GOAL
2. INTRODUCTION

2.1 PROBLEM FORMULATION

2.2 LIBRARY USED

2.3 TASK

3. LITERATURE SURVEY

3.1 CURRENT MODELS FOR PRICE PREDICTION

3.2 SELECTED MODEL THEORY

3.3 MODEL FUNCTIONING

4. DATASET

4.1 DATA COLLECTION

4.2 DATA PREPROCESSING

4.3 DATA VISUALIZATION

5. EDA (EXPLORATORY DATA ANALYSIS)

5.1 INTRODUCTION TO EDA

5.2 CAR PRICE DATASET ANALYSIS

6. MODEL SELECTION AND TRAINING

6.1 MODEL SELECTION FOR CAR PRICE PREDICTION

6.2 FEATURE SELECTION

6.3 TRAINING THE MODELS

6.4 RESULT SUMMARY


7. MODEL EVALUATION

7.1 EVALUATION METRICS

7.2 CROSS VALIDATION & TESTING

7.3 CONCLUSION

8. CONCLUSION

8.1 KEY ACHIEVEMENTS

8.2 IMPLICATIONS AND APPLICATION

8.3 FUTURE DIRECTION

8.4 FINAL THOUGHTS


1. GOAL
The overall objective of this research is to explore the CarPrice dataset and build a
predictive model that estimates car prices from vehicle features. The analysis also helps
identify the most influential factors on car prices and supports informed decisions in the
auto market. Car prices are determined by many parameters, such as brand, engine,
mileage, and vehicle type. With data analysis and machine learning algorithms, we aim to
develop an accurate price prediction model.

2. INTRODUCTION
2.1 Problem Formulation
Car price prediction is a vital part of the automobile industry that affects customers,
sellers, and manufacturers. A reliable car price forecasting system can help dealers
price their vehicles sensibly, help customers make better purchasing decisions,
and allow manufacturers to adjust prices reasonably. The conventional technique for
car pricing is manual comparison, which can be subjective and inaccurate.
By using data-driven approaches, we can automate car price estimation and improve
its accuracy.

2.2 Libraries Used


To perform data analysis and machine learning, we utilize the following Python libraries:

• Pandas: For data manipulation and analysis.
• NumPy: For numerical operations and handling arrays.
• Matplotlib & Seaborn: For data visualization and pattern recognition.
• Scikit-learn: For implementing machine learning models and evaluation
metrics.
2.3 Task
The task is divided into the following stages:

• Data Exploration: Getting to know the organization of the dataset and the most
critical features.
• Data Preprocessing: Handling missing values, encoding categorical variables, and
normalization.
• Exploratory Data Analysis (EDA): Identification of the most significant trends and
correlations within the dataset.
• Model Selection and Training: Choosing and training the most suitable machine
learning algorithms.
• Model Evaluation: Comparison of model performance using standard evaluation
metrics.
• Conclusion and Future Work: Summary of results and proposing improvements.

3. LITERATURE SURVEY
3.1 Current Models for Price Prediction

Machine learning models commonly used for predicting car prices include:

• Linear Regression: Models price as a linear function of the independent variables.
• Decision Trees: Split the data using hierarchical rules, which makes them easy to
interpret.
• Random Forest: An ensemble of decision trees that improves accuracy and reduces
overfitting.
• Support Vector Machines (SVM): Fit a hyperplane in feature space; the regression
variant (SVR) predicts continuous values.
• Neural Networks: Deep learning architectures that can identify complex patterns.

3.2 Selected Model Theory

For this study, we selected Multiple Linear Regression and Random Forest
Regression.

• Linear Regression: A simple, interpretable model that assumes a linear relationship
between attributes and car price.
• Random Forest: An ensemble algorithm that improves accuracy by aggregating
several decision trees, which makes it less susceptible to overfitting.

3.3 Model Functioning

• Linear Regression: The model learns a weight for each input feature and predicts
car price as a weighted sum of the predictors plus an intercept.
• Random Forest: Many decision trees are trained independently, each on a bootstrap
sample of the data, and their predictions are averaged to give a more stable result.
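
The two prediction rules described above can be checked directly with a small sketch. It is only an illustration under stated assumptions: the data is synthetic (not the CarPrice dataset), and the number of trees and random seeds are arbitrary choices. It recomputes each model's predictions by hand and compares them with what scikit-learn returns.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption): two features and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.1, size=200)

# Linear Regression: prediction = intercept + weighted sum of the features.
lin = LinearRegression().fit(X, y)
manual_lin = lin.intercept_ + X @ lin.coef_

# Random Forest: prediction = average of the individual trees' predictions.
forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)
manual_rf = np.mean([tree.predict(X) for tree in forest.estimators_], axis=0)

lin_matches = np.allclose(manual_lin, lin.predict(X))
rf_matches = np.allclose(manual_rf, forest.predict(X))
print(lin_matches, rf_matches)
```

Both flags should come out true, which is simply a restatement of how each model forms its prediction.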
4. DATASET
4.1 Data Collection

The CarPrice dataset has 205 rows and 26 columns, with prominent car features such as:

• Car details: Name of the car, type of fuel, type of body, type of drive.
• Engine specification: Size of the engine, horsepower, fuel system.
• Performance indicators: City/highway MPG mileage, curb weight, compression
ratio.
• Price: The target variable to be predicted.

4.2 Data Preprocessing

• Missing value handling: Checking that every row contains complete data.
• Encoding categorical variables: Converting car names, fuel types, and other
categorical attributes into numerical form.
• Feature scaling: Scaling numeric features to improve the stability of the model.

4.3 Data Visualization

In order to obtain insights from the dataset, we use:

• Histograms: To observe the distribution of numeric variables.
• Scatter plots: To observe relationships between car price and features like engine
size and horsepower.
• Correlation heatmap: To identify the features most strongly correlated with car price.
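
A minimal sketch of these three plots follows. It assumes synthetic data with column names borrowed from the dataset (enginesize, horsepower, price), and it uses plain Matplotlib rather than Seaborn to keep dependencies small; the figure layout and colors are illustrative choices, not the report's actual figures.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): price rises with engine size.
rng = np.random.default_rng(1)
enginesize = rng.uniform(60, 330, size=100)
df = pd.DataFrame({
    "enginesize": enginesize,
    "horsepower": 0.8 * enginesize + rng.normal(scale=10, size=100),
    "price": 150 * enginesize + rng.normal(scale=3000, size=100),
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram: distribution of a numeric variable.
axes[0].hist(df["price"], bins=20)
axes[0].set_title("Price distribution")

# Scatter plot: relationship between engine size and price.
axes[1].scatter(df["enginesize"], df["price"], s=10)
axes[1].set_xlabel("enginesize")
axes[1].set_ylabel("price")

# Correlation heatmap: pairwise Pearson correlations.
corr = df.corr()
im = axes[2].imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
axes[2].set_xticks(range(len(corr.columns)))
axes[2].set_xticklabels(corr.columns, rotation=45)
axes[2].set_yticks(range(len(corr.columns)))
axes[2].set_yticklabels(corr.columns)
fig.colorbar(im, ax=axes[2])
fig.tight_layout()
```

Calling `fig.savefig(...)` or, in an interactive session, `plt.show()` would render the figure.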

5. EXPLORATORY DATA ANALYSIS (EDA)


5.1 Introduction to EDA

Exploratory Data Analysis helps uncover hidden patterns, detect outliers, and identify
feature importance before building predictive models.

5.2 Car Price Dataset Analysis

Key findings from EDA:

• Strong correlation between engine size and car price.
• Luxury brands tend to have significantly higher prices.
• Fuel efficiency (MPG) negatively correlates with price; high-performance cars are
less fuel-efficient but more expensive.

6. MODEL SELECTION AND TRAINING


6.1 Model Selection for Car Price Prediction

We compare different models based on performance metrics such as:

• Root Mean Squared Error (RMSE)
• Mean Absolute Error (MAE)
• R² Score
6.2 Feature Selection

Identifying the most important features for prediction using:

• Correlation matrix
• Feature importance scores from Random Forest
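
Both ranking approaches can be sketched as follows. The data is synthetic and the `noise` column is a hypothetical irrelevant feature added only to show how a weak predictor ranks; the forest size is an arbitrary assumption.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): price depends mostly on enginesize.
rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({
    "enginesize": rng.uniform(60, 330, size=n),
    "horsepower": rng.uniform(50, 290, size=n),
    "noise": rng.normal(size=n),  # hypothetical irrelevant column
})
df["price"] = (120 * df["enginesize"] + 40 * df["horsepower"]
               + rng.normal(scale=1000, size=n))

# 1) Correlation matrix: rank features by absolute correlation with price.
corr_with_price = (df.corr()["price"].drop("price")
                   .abs().sort_values(ascending=False))

# 2) Random Forest: impurity-based feature importance scores (they sum to 1).
features = ["enginesize", "horsepower", "noise"]
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(df[features], df["price"])
importances = pd.Series(forest.feature_importances_,
                        index=features).sort_values(ascending=False)

print(corr_with_price)
print(importances)
```

Correlation only captures linear, one-at-a-time relationships, while forest importances also reflect non-linear effects, so the two rankings need not agree on real data.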

6.3 Training the Model

• Splitting the dataset into training (80%) and testing (20%) sets.
• Training Linear Regression and Random Forest models.
• Using hyperparameter tuning to optimize model performance.
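
The three training steps above might look like the sketch below. The data is synthetic (205 rows only to mirror the dataset size), and the hyperparameter grid, fold count, and random seeds are assumptions for illustration, not the values used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): 205 rows, 4 numeric features.
rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(205, 4))
y = 10 * X[:, 0] + 5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=205)

# Step 1: 80% training / 20% testing split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 2: train the Linear Regression baseline.
lin = LinearRegression().fit(X_train, y_train)

# Step 3: tune the Random Forest with a small, illustrative grid search.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)
forest = search.best_estimator_

# Score both models on the held-out test set (R²).
lin_r2 = lin.score(X_test, y_test)
rf_r2 = forest.score(X_test, y_test)
print(search.best_params_, round(lin_r2, 3), round(rf_r2, 3))
```

The grid search only ever sees the training split, so the test set remains untouched until the final scoring step.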
6.4 Results Summary
• Linear Regression: Provides interpretability but may not capture complex patterns.
• Random Forest: Offers better accuracy and robustness.

7. MODEL EVALUATION
7.1 Evaluation Metrics

• Mean Squared Error (MSE): Measures the average squared difference between
predicted and actual values. Its square root, RMSE, is expressed in the same units
as price.
• Mean Absolute Error (MAE): Measures the average absolute difference between
predicted and actual values.
• R² Score: Indicates the proportion of price variance explained by the model.
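
To make the definitions concrete, the metrics can be computed by hand with NumPy. The helper name `regression_metrics` and the four example prices are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)                       # average squared error
    mae = np.mean(np.abs(errors))                    # average absolute error
    ss_res = np.sum(errors ** 2)                     # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {"mse": mse, "rmse": np.sqrt(mse), "mae": mae,
            "r2": 1.0 - ss_res / ss_tot}

# Hypothetical prices, for illustration only: every prediction is off by 1000.
metrics = regression_metrics(
    [10000, 15000, 20000, 25000],
    [11000, 14000, 21000, 24000],
)
print(metrics)
```

Here MSE is 1,000,000, RMSE and MAE are both 1000, and R² is 1 - 4/125 = 0.968. In practice the equivalent functions in `sklearn.metrics` would normally be used instead of a hand-written helper.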

7.2 Cross-validation & Testing


To guard against overfitting, we use k-fold cross-validation and evaluate the final
model on held-out test data that it has not seen during training.
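
A minimal sketch of k-fold cross-validation with scikit-learn is shown below. It assumes k = 5, synthetic data, and a Linear Regression model; the report does not state which k was used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption): 205 rows, 3 features, linear target.
rng = np.random.default_rng(4)
X = rng.normal(size=(205, 3))
y = X @ np.array([4.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=205)

# Each of the 5 folds serves once as the validation set while the
# remaining 4 folds are used for fitting.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

mean_r2, std_r2 = scores.mean(), scores.std()
print(scores, mean_r2, std_r2)
```

A large spread between fold scores, or a cross-validated score well below the training score, would be a sign of overfitting.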

7.3 Conclusion of Model Performance

• Random Forest performs best, achieving the lowest RMSE and highest R² score.
• Linear Regression is useful for interpretability, though less accurate.

8. CONCLUSION
8.1 Key Achievements

• Developed and compared models for car price prediction.
• Identified key factors influencing car price.
• Improved prediction accuracy using Random Forest.

8.2 Implications and Applications

• Car Dealerships: Helps in setting competitive prices.
• Buyers: Assists in evaluating if a car is priced fairly.
• Manufacturers: Aids in pricing strategies based on market trends.

8.3 Future Directions

• Using deep learning techniques for better predictions.
• Incorporating external economic factors like inflation and market demand.
• Collecting a larger dataset for improved model generalization.

8.4 Final Thoughts

This study demonstrates the power of machine learning in price prediction. By leveraging
data-driven techniques, we can achieve accurate and reliable estimates, benefiting both
consumers and businesses in the automotive industry.
ABSTRACT
This project applies machine learning to car price prediction. Car prices vary with
many parameters, such as brand, engine size, horsepower, body style, and fuel type.
Human price estimates are often inexact and inconsistent. This project therefore
develops a model that estimates car prices from structured data on these
parameters.

We used a dataset that contains data about 205 cars with 26 features. The data were
cleaned and preprocessed by converting text data to numeric values and scaling the
features. We then analyzed which features most influence car prices. Engine size,
horsepower, and curb weight were the most influential features.

We employed two machine learning models: Linear Regression and Random Forest
Regression. Linear Regression gave a simple, interpretable model, while
Random Forest performed better because it can capture complex patterns
in the data. We compared the models using R² score, Mean Squared Error, and
Mean Absolute Error, and found that Random Forest performed much better,
with a reported accuracy of more than 92%.

This project illustrates how machine learning can be used to predict car prices
accurately and help buyers, sellers, and car dealers make better-informed decisions.
The work can be extended in the future with more data and a web application for
ease of use.

CHAPTER 1

INTRODUCTION
Forecasting the price of a car is a complex task that depends on many factors, such as
technical specifications, brand reputation, fuel type, body style, and powertrain. In the
conventional car market, price forecasting usually relies on experience or limited
market information.
However, with the growing capability of data analytics and machine learning, we can now
build systems that learn from past data and estimate car prices with high accuracy.
The fundamental objective of this project is to develop a machine learning model that
can predict the price of a car based on its features. Such a model is helpful to several
stakeholders across the automotive community:
• Consumers can judge how much a car is worth before they purchase it.
• Sellers and dealers can price competitively with the help of forecasting methods.
• Producers can see how various features affect cost and make informed design
decisions.
The dataset used in this project has 205 records of different car models, each with 26
attributes such as car brand, fuel type, engine size, horsepower, body style, and the price
of the car. The variety in this dataset enables us to analyze how various features affect
the price and to build a sound model.
The project follows these steps:
1. Data Preprocessing: Cleaning the dataset and converting it to a suitable format.
2. Exploratory Data Analysis (EDA): Identifying patterns, trends, and relationships within
the data.
3. Model Training and Selection: Selecting appropriate machine learning models and
training them on the available data.
4. Model Evaluation: Evaluating the performance of each model using the appropriate
metrics.
5. Conclusion and Future Scope: Condensing results and establishing how the model
can be extended or enhanced.
By the end of this project, we aim to show that machine learning can be a
useful tool in the car business for price estimation. Through careful analysis and
modeling, we can develop a system that is accurate and explains how
various features affect automobile prices.

CHAPTER 2
LITERATURE SURVEY
Existing Practices
• Linear Regression: A good baseline model, although it assumes a linear
relationship.
• Decision Trees: Can handle non-linear relationships but tend to overfit.
• Random Forest: An ensemble of trees that reduces variance
and gives more accurate predictions.
• Support Vector Regression (SVR): Works well with small or
high-dimensional data.
• Neural Networks: Capture complex patterns but require larger
datasets and more computation.

Models of Interest
• Linear Regression: For interpretability and as a baseline.
• Random Forest Regression: For fitting complex interactions among
features.

CHAPTER 3
DATA COLLECTION AND PREPROCESSING
Dataset Overview
• Total Records: 205
• Columns: 26
• Target Variable: price
• Data Types: 10 categorical, 8 integer, 8 float

Preprocessing Steps
• No missing values were found.
• Brand extraction: Extracted the brand from CarName by splitting on the space
and keeping the first token.
• Label Encoding: Used for categorical features such as fueltype, aspiration,
etc.
• Feature scaling: StandardScaler applied to numeric features for
standardization.
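
The preprocessing steps above can be sketched on a few hypothetical rows. The column names follow the report (CarName, fueltype, enginesize, horsepower), but the values are made up for illustration, and taking the first token of CarName as the brand is an assumption about how the split was used.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical rows in the shape of the CarPrice columns (illustrative values).
df = pd.DataFrame({
    "CarName": ["audi 100 ls", "bmw 320i", "toyota corolla", "audi 5000"],
    "fueltype": ["gas", "gas", "diesel", "gas"],
    "enginesize": [109, 108, 98, 136],
    "horsepower": [102, 101, 70, 110],
})

# Missing-value check: total number of missing cells.
missing = int(df.isnull().sum().sum())

# Brand extraction: split CarName on the space and keep the first token.
df["brand"] = df["CarName"].str.split(" ").str[0]

# Label encoding: map each category to an integer code (alphabetical order).
for col in ["brand", "fueltype"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Feature scaling: standardize numeric columns to zero mean, unit variance.
numeric_cols = ["enginesize", "horsepower"]
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])

print(missing)
print(df[["brand", "fueltype", "enginesize", "horsepower"]])
```

Label encoding assigns integer codes that imply an order among categories. Tree-based models are largely insensitive to this, but for Linear Regression one-hot encoding is a common alternative worth considering.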
CHAPTER 4
Exploratory Data Analysis (EDA)
EDA uncovered strong correlations between a number of features and car price:

• enginesize, horsepower, and curbweight all positively correlate with price
• Brand name has a significant impact on price
• Visualizations such as scatter plots, box plots, and heatmaps were employed to
detect trends and correlations

Key Observations:
• Sedan and hatchback vehicles are lower-priced
• Luxury brands exhibit considerably higher prices
• High multicollinearity detected between some numeric features

CHAPTER 5
MODEL SELECTION AND TRAINING
We chose Linear Regression as the starting model due to its interpretability and
its suitability for predicting a continuous target such as price.

Training Steps
• Data split: 80% for training, 20% for testing
• Used LinearRegression() from scikit-learn
• Trained on several predictors, both numerical and encoded categorical
variables
CHAPTER 6
MODEL EVALUATION
The Linear Regression model was evaluated with:

• R² Score: ~0.85 (the model explains about 85% of the variance in price)
• Mean Squared Error (MSE): Low, consistent with good prediction quality

The model predicts car prices well for a baseline. However, performance may be
improved through regularization or more advanced algorithms.

CHAPTER 7
CONCLUSION
The purpose of this project was to create a model that predicts car prices using
the Linear Regression algorithm. The model was built on the CarPrice dataset,
which holds technical, numerical, and categorical characteristics of several car
models. After data preprocessing, exploratory analysis, and model training, the
Linear Regression model performed well, with an R² score of about 0.85,
indicating strong predictive power.

https://colab.research.google.com/drive/1LG3g0_eGLnFy3zJF9x--5lv12iE7-9Dn?usp=sharing
