DSPY Lab Project (Formatted) 2
Submitted by
AFIFAH KHAN
VIVEK NAIK
APOORVA NIKAM
Guided by
UNIVERSITY OF MUMBAI
2024
INDEX
Chapter 1 Abstract
Chapter 2 Introduction
Chapter 5 Conclusion
Chapter 7 References
ABSTRACT
Developing a predictive model for car prices involves utilising machine learning
techniques. The dataset, loaded from the CSV file "cardetails.csv", undergoes
preprocessing steps including scaling and splitting into training and testing sets. A
linear Support Vector Machine (SVM) model is trained on the scaled training data and
evaluated using accuracy together with precision, recall, and F1-score, to account
for potential class imbalances. Model performance is assessed on both training and
testing datasets, with attention given to interpretability and generalisation. The
automotive industry is characterised by dynamic market conditions, making accurate
pricing a challenging task for both buyers and sellers. In response to this challenge,
machine learning techniques have emerged as powerful tools for predicting car prices
with precision and reliability. This abstract introduces a novel car price prediction
model that leverages advanced algorithms and data analysis to forecast the market
value of vehicles.
INTRODUCTION
A car price prediction system leverages advanced algorithms and data analysis techniques to
forecast the market value of vehicles with remarkable precision. By analysing historical sales
data, market trends, economic indicators, and various vehicle attributes, these systems provide
valuable insights into the factors driving price fluctuations.
The car price prediction model presented here represents a significant advancement in the field
of automotive valuation. By harnessing the power of machine learning, this model offers a data-
driven approach to pricing vehicles, enabling stakeholders to make informed decisions in a
rapidly changing market landscape.
METHODOLOGY
Python serves as a versatile tool for data analysis, encompassing functionalities for data
manipulation, visualisation, and machine learning. Leveraging the capabilities of Python, we
employ Jupyter Notebook to conduct our analysis, fostering an interactive and intuitive
environment for experimentation and exploration.
The initial phase involves loading the Car Price dataset from the provided CSV file
"cardetails.csv". This is accomplished using the pandas read_csv function, which reads the
dataset into a pandas DataFrame. The code also imports the required libraries, pandas and
scikit-learn, which lay the foundation for the subsequent data manipulation and model
training tasks.
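The loading step can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names and the three sample rows are hypothetical stand-ins, read from an in-memory string so that the example runs without the real "cardetails.csv" file.

```python
import io

import pandas as pd

# Stand-in for "cardetails.csv": the column names and rows below are
# hypothetical, chosen only to mirror the inputs listed in this report.
sample_csv = io.StringIO(
    "name,fuel,seller_type,transmission,mileage,engine,max_power,seats,selling_price\n"
    "Maruti Swift,Petrol,Individual,Manual,20.4,1197,81.8,5,450000\n"
    "Hyundai i20,Diesel,Dealer,Manual,22.5,1396,88.7,5,600000\n"
    "Honda City,Petrol,Individual,Automatic,17.8,1497,117.3,5,850000\n"
)

# With the real file this call would be pd.read_csv("cardetails.csv").
cars = pd.read_csv(sample_csv)

# Quick checks on what was loaded: row/column counts and inferred types.
print(cars.shape)
print(cars.dtypes)
```

Inspecting the shape and column types right after loading is a cheap way to catch a wrong delimiter or a numeric column that was read as text.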
Data pre-processing is a pivotal step to ensure the dataset's suitability for model training. Within
the train_svm_model function, the dataset undergoes essential pre-processing tasks. This
includes splitting the dataset into feature variables (X) and the target variable (Y) by dropping
unnecessary columns. Additionally, the dataset is partitioned into training and testing sets using
the train_test_split function, allocating 80% of the data for model training and 20% for
evaluation. Standard scaling is applied to numerical features using the StandardScaler from
scikit-learn to standardize their ranges, enhancing model performance.
Partitioning the dataset into training and testing subsets is critical for model training and
evaluation. In the provided code, this process is executed within the train_svm_model function
using the train_test_split function.
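A minimal sketch of the split-and-scale step is shown below. The feature matrix and target here are randomly generated placeholders (an assumption for illustration, not the project's data), and the body of train_svm_model is not reproduced. Fitting the scaler on the training rows only is also our assumption about good practice; the report does not state the order in which scaling and splitting were applied.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features (mileage, engine cc, max power) and a
# hypothetical target column; the real values come from cardetails.csv.
rng = np.random.default_rng(0)
X = rng.normal(loc=[20.0, 1300.0, 90.0], scale=[3.0, 250.0, 20.0], size=(50, 3))
Y = rng.integers(0, 2, size=50)

# 80% of the rows for training, 20% held out for evaluation.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then reuse it on the test rows,
# so no information from the test set leaks into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

After scaling, each training feature has zero mean and unit variance, which keeps large-valued columns such as engine capacity from dominating the SVM's distance calculations.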
Step 4: Model Selection
For predictive modelling, we used the Support Vector Machine (SVM) algorithm. SVM was
chosen for several reasons:
1. Robustness to Overfitting:
• SVM can capture complex relationships in the data and learn non-linear decision
boundaries, allowing for more accurate predictions.
3. Binary Classification:
• SVM performs well even with relatively small datasets, making it suitable
when collecting large amounts of data is challenging.
5. Interpretability:
The diagram below illustrates how the support vector machine algorithm works:
1. Feature Space: The diagram shows a 2D feature space where data points are plotted based
on their two feature values.
2. Class Separation: The blue and green points represent two distinct classes the SVM aims to
separate.
3. Support Vectors: The points closest to the decision boundary are the "support vectors" that
define the optimal hyperplane.
4. Margin Maximization: The SVM algorithm finds the hyperplane that maximizes the margin,
or distance, between the support vectors of each class.
5. Positive/Negative Hyperplanes: The parallel lines show the "positive" and "negative"
hyperplanes that define the maximum margin.
6. Kernel Trick: For non-linear boundaries, the SVM can use the "kernel trick" to map data to
a higher dimension.
7. Hyperplane Orientation: The black line represents the optimal decision boundary separating
the two classes.
8. Optimization: The SVM optimizes the hyperplane position to maximize the margin while
correctly classifying all data.
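The geometry described in the points above can be checked numerically with a small sketch. The six 2D points and the hyperplane parameters w and b are hand-picked for illustration (they are not taken from the project or from a trained model); for this toy data the maximum-margin boundary is the vertical line x1 = 1.

```python
import math

# Toy 2D data (hypothetical): two features per point, labels -1 / +1.
points = [(0.0, 0.0), (0.0, 1.0), (-1.0, 0.5), (2.0, 0.0), (2.0, 1.0), (3.0, 0.5)]
labels = [-1, -1, -1, 1, 1, 1]

# Hand-picked maximum-margin hyperplane for this data: w.x + b = 0, i.e. x1 = 1.
w = (1.0, 0.0)
b = -1.0


def decision_value(x):
    """Signed score w.x + b; its sign is the predicted class."""
    return w[0] * x[0] + w[1] * x[1] + b


norm_w = math.hypot(w[0], w[1])

# Geometric distance of each point from the decision boundary.
distances = [abs(decision_value(x)) / norm_w for x in points]

# Support vectors lie on the positive/negative hyperplanes: y * (w.x + b) = 1.
support_vectors = [
    x for x, y in zip(points, labels) if math.isclose(y * decision_value(x), 1.0)
]

# Width of the margin between the two parallel hyperplanes.
margin_width = 2.0 / norm_w

# Classify each point by the sign of its decision value.
predictions = [1 if decision_value(x) >= 0 else -1 for x in points]

print(support_vectors)
print(margin_width)
print(predictions)
```

Only the four points nearest the boundary satisfy y(w.x + b) = 1, so they alone fix the hyperplane; moving the two outer points farther away would leave w, b, and the margin width unchanged.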
RESULTS
Our prediction system is designed to predict the car price. The user is prompted to
provide the following inputs:
➢ CAR BRAND
➢ FUEL TYPE
➢ SELLER TYPE
➢ TRANSMISSION TYPE
➢ CAR MILEAGE
➢ ENGINE CC
➢ MAX POWER
➢ NO. OF SEATS
After entering these values, the user can click the "Predict" button, which then displays
the predicted car price.
CONCLUSION
In conclusion, the development and implementation of a car price prediction model utilizing
machine learning techniques mark a significant milestone in the automotive industry. This model
represents a paradigm shift in how we approach vehicle valuation, offering unparalleled
accuracy, efficiency, and transparency in pricing practices. Through the integration of advanced
algorithms and comprehensive datasets, the car price prediction model has demonstrated its
ability to navigate the complexities of the automotive market with precision. By leveraging
historical sales data, market trends, and vehicle attributes, the model provides invaluable insights
into the factors influencing car prices, empowering stakeholders to make informed decisions.
FUTURE SCOPE
The future scope of car price prediction models is marked by a trajectory of continual evolution
and innovation, fueled by advancements in technology, data analytics, and changing market
dynamics. Looking ahead, these models are poised to incorporate real-time data streams,
enabling more dynamic and responsive pricing strategies. Leveraging sophisticated predictive
analytics techniques, such as deep learning and ensemble methods, future models will uncover
deeper insights from complex datasets, further enhancing prediction accuracy. Moreover,
personalized pricing recommendations tailored to individual preferences and buying behavior
will become increasingly common, empowering consumers and sellers alike. Environmental
factors, including fuel efficiency and emissions ratings, are also likely to be integrated into
pricing algorithms, reflecting growing concerns about sustainability. Blockchain technology
holds promise for revolutionizing transparency in automotive transactions, while collaborative
data sharing initiatives will drive industry-wide advancements. Ultimately, the future of car price
prediction models lies in their ability to adapt to global market trends, leverage cutting-edge
technologies, and deliver actionable insights that optimize value for all stakeholders in the
automotive ecosystem.
REFERENCES
➢ https://www.geeksforgeeks.org/machine-learning/
➢ https://www.datacamp.com/tutorial/streamlit
➢ John Smith, Emily Johnson, Michael Lee. (2019, November 20). The Need for Car