
GOVERNMENT COLLEGE OF ENGINEERING, THANJAVUR

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

PHASE 5 PROJECT SUBMISSION

College Code: 8227


Technology: Artificial Intelligence (AI)
Total number of Students in the group: 5

Completed the project titled

PREDICTIVE MAINTENANCE

Submitted by

Lokesh R

Meganthan P

Krishna kumar R

Rathish GP

Dhayanithi S
Project Title: Project Development – Dynamic Pricing

Introduction:

 Dynamic pricing, also known as real-time pricing, is a pricing strategy where businesses
adjust the prices of their products or services in response to real-time supply and demand
conditions, market trends, competitor pricing, and other external factors.
 This approach contrasts with static pricing, where prices remain fixed over a period of time
regardless of changing circumstances.
 Dynamic pricing is widely adopted in various industries, including airlines, hospitality, e-
commerce, and ride-sharing, due to its potential to maximize revenue and optimize
resource utilization.
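The supply-and-demand adjustment described above can be sketched as a simple pricing rule. This is purely illustrative: the demand/supply ratio as a multiplier and the clamp bounds are invented for this example, not taken from the project.

```python
def dynamic_price(base_price, demand, supply, min_mult=0.8, max_mult=2.0):
    """Scale a base price by the demand/supply ratio, clamped to a band."""
    if supply <= 0:
        multiplier = max_mult  # no supply available: charge the cap
    else:
        multiplier = demand / supply
    multiplier = max(min_mult, min(max_mult, multiplier))
    return round(base_price * multiplier, 2)

# Peak demand (90 riders, 30 drivers): ratio 3.0, clamped to 2.0
print(dynamic_price(50.0, 90, 30))   # 100.0
# Low demand (20 riders, 40 drivers): ratio 0.5, clamped to 0.8
print(dynamic_price(50.0, 20, 40))   # 40.0
```

The clamp prevents extreme surges or collapses in price, which is one common safeguard in real deployments.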

Project Objective:

 Maximizing Revenue: Increase overall revenue by setting prices that reflect current
demand levels, ensuring higher prices during peak demand and competitive pricing during
low demand.
 Improving Profit Margins: Optimize profit margins by efficiently balancing supply and
demand, reducing the occurrence of overpricing or underpricing.
 Enhancing Market Responsiveness: Quickly adapt to changes in market conditions,
including competitor pricing, seasonal variations, and consumer behavior trends.
 Personalizing Customer Experience: Use customer data to offer personalized pricing,
improving customer satisfaction and loyalty.
 Optimizing Inventory Management: Adjust prices to manage inventory levels
effectively, preventing stockouts and overstock situations.
 Leveraging Technology: Implement advanced technologies such as machine learning
algorithms and big data analytics to predict demand and automate price adjustments.
 Maintaining Competitive Edge: Stay ahead of competitors by continually analyzing and
responding to their pricing strategies.
 Ensuring Transparency and Trust: Clearly communicate the reasons for price changes
to maintain customer trust and avoid perceptions of unfair pricing practices.

About the Dataset:


o Product ID: Unique identifier for each product or service (e.g., "P12345")
o Product Name: Descriptive name of the product (e.g., "Wireless Headphones")
o Category: Product category or type (e.g., "Electronics")
o Base Price: Initial or standard price before any dynamic adjustments (e.g., $50.00)
o Current Price: The dynamically adjusted price at a given time (e.g., $45.00)
o Demand Level: Indicator of current demand (e.g., "High", "Medium", "Low")
o Stock Level: Quantity of product currently available (e.g., 100 units)
o Competitor Prices: Prices offered by competitors for the same or similar products (e.g., $48.00)
o Time of Day: Specific time when the price was recorded or adjusted (e.g., "14:00")
o Day of Week: Day when the price was recorded or adjusted (e.g., "Monday")
o Date: Specific date when the price was recorded or adjusted (e.g., "2024-06-04")
o Season: Current season affecting pricing (e.g., "Summer", "Winter")
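For illustration, the fields above can be held as one row of a pandas DataFrame. The values are the example values from the list; the derived Discount % column is an invented addition for this sketch.

```python
import pandas as pd

# One illustrative record using the fields described above
sample = pd.DataFrame([{
    "Product ID": "P12345",
    "Product Name": "Wireless Headphones",
    "Category": "Electronics",
    "Base Price": 50.00,
    "Current Price": 45.00,
    "Demand Level": "High",
    "Stock Level": 100,
    "Competitor Prices": 48.00,
    "Time of Day": "14:00",
    "Day of Week": "Monday",
    "Date": "2024-06-04",
    "Season": "Summer",
}])

# Current discount relative to the base price (derived field)
sample["Discount %"] = (1 - sample["Current Price"] / sample["Base Price"]) * 100
print(sample[["Product ID", "Current Price", "Discount %"]])
```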

System Requirements:

 Data:
o Description of the dynamic pricing dataset used, including its source, size, and
attributes.
o Explanation of how the data was collected and preprocessed.
 Hardware:
o Specifications for the hardware required to run the dynamic pricing
system, such as computational resources.
 Software:
o List of software tools and libraries used for data preprocessing, model
development, and evaluation.
o Description of any specific software requirements for deploying the dynamic
pricing system.

Methodology:
 Overview of the methodology followed in the project, including the steps involved in:
o Data preprocessing: Cleaning the dataset, handling missing values, encoding
categorical variables, and feature scaling.
o Model development: Choosing appropriate algorithms (e.g.,
RandomForestRegressor) and hyperparameter tuning.
o Model evaluation: Splitting the data into training and testing sets, assessing
model performance with regression metrics (RMSE, R²), and visualizing
results to compare the models.
 Explanation of any additional steps taken, such as feature engineering or ensemble
techniques.

Data Preprocessing:
 Detailed explanation of the data preprocessing steps undertaken, including:
o Handling missing values: Imputation techniques used, if any.
o Encoding categorical variables: One-hot encoding or label encoding methods
applied.
o Feature scaling: Standardization or normalization of features to ensure
consistent scale across variables.
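The three preprocessing steps above can be combined into a single scikit-learn pipeline. The toy data, column names, and imputation strategies below are assumptions for illustration, not the project's actual pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame mimicking the ride data (values invented)
df = pd.DataFrame({
    "Expected_Ride_Duration": [30.0, np.nan, 90.0, 45.0],
    "Number_of_Riders": [60, 80, np.nan, 40],
    "Vehicle_Type": ["Economy", "Premium", "Economy", np.nan],
})

numeric = ["Expected_Ride_Duration", "Number_of_Riders"]
categorical = ["Vehicle_Type"]

preprocess = ColumnTransformer([
    # numeric columns: median imputation, then standardization
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    # categorical columns: most-frequent imputation, then one-hot encoding
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

X = preprocess.fit_transform(df)
print(X.shape)  # 4 rows: 2 scaled numeric columns + 2 one-hot columns
```

Bundling the steps this way ensures the same transformations learned on the training split are applied unchanged to the test split.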

Model Evaluation:
 Description of the model evaluation process, covering:
o Splitting the dataset into training and testing sets.
o Training the predictive model using appropriate algorithms and
hyperparameters.
o Evaluating the model's performance on the test set using relevant
regression metrics (RMSE, R²).
o Visualizing and tabulating evaluation results to gain insights into each
model's performance.
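A compact sketch of this split-train-evaluate loop. The synthetic data and all parameter choices here are invented; regression metrics (RMSE, R²) are shown because the implementation later in this report predicts a continuous fare.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in for ride features and fares (all values invented)
X = rng.uniform(10, 180, size=(500, 3))           # e.g. duration, riders, drivers
y = 3.5 * X[:, 0] + rng.normal(0, 20, size=500)   # fare roughly linear in duration

# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
r2 = float(r2_score(y_test, y_pred))
print(f"RMSE={rmse:.2f}  R^2={r2:.3f}")
```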

Existing Work:
 Review of existing literature, research papers, and projects related to dynamic
pricing strategies.
 Summary of methodologies, techniques, and findings from previous studies.
 Identification of gaps or limitations in existing approaches.

Proposed Work:
 Overview of the proposed methodology and objectives of the project.
 Explanation of how the proposed approach addresses the limitations or gaps identified
in existing work.
 Description of the dynamic pricing model for fare prediction and its
components.

Flow chart:
Implementation:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES
# TO THE CORRECT LOCATION (/kaggle/input) IN YOUR NOTEBOOK,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.

import os
import sys
from tempfile import NamedTemporaryFile
from urllib.request import urlopen
from urllib.parse import unquote, urlparse
from urllib.error import HTTPError
from zipfile import ZipFile
import tarfile
import shutil

CHUNK_SIZE = 40960
DATA_SOURCE_MAPPING = (
    'dynamic-pricing-dataset:'
    'https%3A%2F%2Fstorage.googleapis.com%2Fkaggle-data-sets'
    '%2F4365344%2F7496965%2Fbundle%2Farchive.zip'
    '%3FX-Goog-Algorithm%3DGOOG4-RSA-SHA256'
    '%26XGoog-Credential%3Dgcp-kaggle-com%2540kaggle-161607.iam.gserviceaccount.com'
    '%252F20240604%252Fauto%252Fstorage%252Fgoog4_request'
    '%26X-Goog-Date%3D20240604T141813Z'
    '%26X-Goog-Expires%3D259200'
    '%26X-Goog-SignedHeaders%3Dhost'
    '%26X-Goog-Signature%3D4b0752a7b69b9e7909ee9c3329842cd9ba61766b7eda89a24962a67551b95d3eaa7'
    'dea046d402fae2dfc42360f03a955d9127029345c2f54cc95910da9b072fd57efe3dc180'
    'ad7b4abede0315918728a5fd9f81fb9390eee7010e9dd76c0d32a76fb2f3f5173ba1'
    'e8c1429d53fbe7656a876b9c3d2823829bdbe6c65c163cd94d6f50800a16cb544b2c29'
    'd348abdf43465906677d00f2bb0ff96b69f9b537b503b539fc31ec85a8e78912fb4dff835'
    'bed78eda36be723f116855c4f47ef14db5c58d570f8efd7d9f5f98edc77ddff8b112'
    'aa83d61495d4bfcc042bd2f98774cbceaf3f68ffb954c43201a931cddd1f67afcb67a'
    'd966e30bb4606f119e24f1bda'
)

KAGGLE_INPUT_PATH='/kaggle/input'
KAGGLE_WORKING_PATH='/kaggle/working'
KAGGLE_SYMLINK='kaggle'

!umount /kaggle/input/ 2> /dev/null


shutil.rmtree('/kaggle/input', ignore_errors=True)
os.makedirs(KAGGLE_INPUT_PATH, 0o777, exist_ok=True)
os.makedirs(KAGGLE_WORKING_PATH, 0o777, exist_ok=True)

try:
    os.symlink(KAGGLE_INPUT_PATH, os.path.join("..", 'input'), target_is_directory=True)
except FileExistsError:
    pass
try:
    os.symlink(KAGGLE_WORKING_PATH, os.path.join("..", 'working'), target_is_directory=True)
except FileExistsError:
    pass

for data_source_mapping in DATA_SOURCE_MAPPING.split(','):
    directory, download_url_encoded = data_source_mapping.split(':')
    download_url = unquote(download_url_encoded)
    filename = urlparse(download_url).path
    destination_path = os.path.join(KAGGLE_INPUT_PATH, directory)
    try:
        with urlopen(download_url) as fileres, NamedTemporaryFile() as tfile:
            total_length = fileres.headers['content-length']
            print(f'Downloading {directory}, {total_length} bytes compressed')
            dl = 0
            data = fileres.read(CHUNK_SIZE)
            while len(data) > 0:
                dl += len(data)
                tfile.write(data)
                done = int(50 * dl / int(total_length))
                sys.stdout.write(f"\r[{'=' * done}{' ' * (50-done)}] {dl} bytes downloaded")
                sys.stdout.flush()
                data = fileres.read(CHUNK_SIZE)
            if filename.endswith('.zip'):
                with ZipFile(tfile) as zfile:
                    zfile.extractall(destination_path)
            else:
                # do not shadow the tarfile module with the archive handle
                with tarfile.open(tfile.name) as tar:
                    tar.extractall(destination_path)
            print(f'\nDownloaded and uncompressed: {directory}')
    except HTTPError:
        print(f'Failed to load (likely expired) {download_url} to path {destination_path}')
        continue
    except OSError:
        print(f'Failed to load {download_url} to path {destination_path}')
        continue

print('Data source import complete.')


Downloading dynamic-pricing-dataset, 22341 bytes compressed
[==================================================] 22341 bytes downloaded
Downloaded and uncompressed: dynamic-pricing-dataset
Data source import complete.
Introduction
This notebook explores a dataset provided by a ride-sharing company seeking to implement a dynamic
pricing strategy. Currently, the company sets fares based solely on ride duration. This project aims to
leverage data-driven techniques to develop a predictive model for dynamic pricing that adjusts fares in
response to real-time market conditions.
The provided dataset encompasses historical ride information, including features like the number of
riders, drivers, location categories, customer loyalty, past rides, average ratings, booking time, vehicle
type, expected ride duration, and historical costs.
Our objective here is to build a dynamic pricing model that utilizes these features to predict optimal fares
for rides in real-time, considering factors like demand patterns and driver availability.

import warnings

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

warnings.filterwarnings("ignore")

EDA
# Loading data
data = pd.read_csv("/kaggle/input/dynamic-pricing-dataset/dynamic_pricing.csv")
data.head()
[output: preview of the first five rows. 1000 rows, 10 columns: Number_of_Riders,
Number_of_Drivers, Location_Category, Customer_Loyalty_Status, Number_of_Past_Rides,
Average_Ratings, Time_of_Booking, Vehicle_Type, Expected_Ride_Duration, Historical_Cost_of_Ride]

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Number_of_Riders         1000 non-null   int64
 1   Number_of_Drivers        1000 non-null   int64
 2   Location_Category        1000 non-null   object
 3   Customer_Loyalty_Status  1000 non-null   object
 4   Number_of_Past_Rides     1000 non-null   int64
 5   Average_Ratings          1000 non-null   float64
 6   Time_of_Booking          1000 non-null   object
 7   Vehicle_Type             1000 non-null   object
 8   Expected_Ride_Duration   1000 non-null   int64
 9   Historical_Cost_of_Ride  1000 non-null   float64
dtypes: float64(2), int64(4), object(4)
memory usage: 78.2+ KB

data.describe()

[output: descriptive statistics (count, mean, std, min, quartiles, max) for the six numeric columns]
In the absence of a complete column description, we need to make some assumptions. For the further
analysis we assume the following:

Number_of_Riders: The number of riders available at the time of booking, reflecting the market situation.
Number_of_Drivers: The number of drivers available at the time of booking, reflecting the market situation.
Location_Category: The category representing the geographical location where the ride was booked, such as Urban, Suburban, or Rural.
Customer_Loyalty_Status: The loyalty status of the customer towards the ride-sharing company, indicating whether the customer is a regular user or enrolled in a loyalty program.
Number_of_Past_Rides: The number of past rides taken by the customer, indicating their experience and familiarity with the service.
Average_Ratings: The average rating given by the customer for past rides, reflecting customer satisfaction and feedback.
Time_of_Booking: The time of the day when the ride was booked, categorized into different time slots such as Morning, Afternoon, Evening, or Night.
Vehicle_Type: The type of vehicle used for the ride, such as Premium, Economy, or other classes.
Expected_Ride_Duration: The expected duration of the ride in minutes.
Historical_Cost_of_Ride: The historical cost of past rides, indicating pricing patterns and customer spending.
numerical_data = data[['Number_of_Riders', 'Number_of_Drivers',
'Number_of_Past_Rides',
'Average_Ratings', 'Expected_Ride_Duration',
'Historical_Cost_of_Ride']]

sns.pairplot(numerical_data, diag_kind='hist')
plt.show()
sns.regplot(x='Expected_Ride_Duration', y='Historical_Cost_of_Ride',
            data=data, scatter=True, color='cornflowerblue',
            line_kws={"color": "green"})

plt.title('Scatterplot of Expected Ride Duration vs. Historical Cost of Ride with Trendline')
plt.xlabel('Expected Ride Duration')
plt.ylabel('Historical Cost of Ride')
plt.show()
cat = ['Location_Category', 'Customer_Loyalty_Status',
'Time_of_Booking', 'Vehicle_Type']

# create subplots
plt.figure(figsize=(12,10))

for i, c in enumerate(cat, 1):
    plt.subplot(2, 2, i)
    sns.boxplot(y=data['Historical_Cost_of_Ride'], x=data[c], palette='GnBu')

plt.subplots_adjust(hspace=0.5, wspace=0.5)
plt.show()
plt.figure(figsize=(12, 10))

for i, c in enumerate(cat, 1):
    plt.subplot(2, 2, i)
    c_counts = data[c].value_counts()
    sns.barplot(x=c_counts.index, y=c_counts.values, palette='GnBu')

plt.subplots_adjust(hspace=0.5, wspace=0.5)
plt.show()
A look at the results:
• Urban rides are the cheapest
• Regular and Gold customers pay surprisingly near-identical prices
• Afternoon rides tend to be significantly more expensive
• There is a big difference between the vehicle types
correlation_matrix = numerical_data.corr()

plt.figure(figsize=(10,8))
sns.heatmap(correlation_matrix, annot=True, cmap='crest',
linewidths=0.5)
plt.title('Correlation Matrix')
plt.show()
Preprocessing
data = pd.get_dummies(data, columns= cat, dtype = int)
data.head()
[output: preview of the first five rows after encoding. The four categorical columns are replaced by
0/1 indicator columns such as Location_Category_Urban, Customer_Loyalty_Status_Gold,
Time_of_Booking_Night, and Vehicle_Type_Premium]
X = data.drop('Historical_Cost_of_Ride', axis = 1)
y = data['Historical_Cost_of_Ride']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size =
0.3, random_state = 42)
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.cluster import KMeans

from sklearn.metrics import r2_score, mean_squared_error

Modeling
Upon completing exploratory data analysis (EDA) and gaining insights into the dataset, the next step
is to develop predictive models for price estimation. In this phase, we will explore the performance of
various machine learning algorithms, namely K-Nearest Neighbors, Random Forest, Linear
Regression, and Gradient Boosting, in predicting prices based on the available features. Through
rigorous evaluation, we aim to identify the most effective model for accurate price prediction, thereby
facilitating informed decision-making and enhancing the overall efficiency of our system.
# Creating a dictionary to store the results
results = {}
# In this section, we'll use the Elbow Method to determine the
# optimal number of clusters for the KMeans algorithm.
from sklearn.cluster import KMeans

wcss = []
max_cluster = 10

for c in range(1, max_cluster + 1):
    kmeans = KMeans(n_clusters=c, random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

plt.plot(range(1, max_cluster+1), wcss, marker='o')


plt.title('Elbow Method for Optimal Number of Clusters')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS - Within-Cluster-Sum-of-Squares')
plt.show()

# Linear regression specifically targeting features that exhibit
# linear relationships, as observed during the EDA phase.
lr = LinearRegression()
lr.fit(X_train[['Expected_Ride_Duration']], y_train)
y_pred_lr = lr.predict(X_test[['Expected_Ride_Duration']])

RMSE = mean_squared_error(y_test, y_pred_lr, squared=False)
r2 = r2_score(y_test, y_pred_lr)

results['Linear Regression'] = {'RMSE': RMSE.round(3), 'r2': r2.round(3)}
In this phase, we assess the performance of several machine learning models in predicting prices
based on the available features. The models under consideration include K-Nearest Neighbor,
Random Forest, and Gradient Boosting algorithms.
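The fitting code for these three models is not shown in the report; a minimal sketch of how they might be trained and scored follows. The synthetic data and variable names (X_syn, sketch_results, and so on) are invented here, chosen to avoid clashing with the notebook's own results dictionary.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in for the encoded feature matrix and fares (invented)
X_syn = rng.uniform(0, 1, size=(400, 5))
y_syn = 100 * X_syn[:, 0] + 50 * X_syn[:, 1] + rng.normal(0, 5, size=400)
Xtr, Xte, ytr, yte = train_test_split(X_syn, y_syn, test_size=0.3, random_state=42)

models = {
    'K-Nearest-Neighbor': KNeighborsRegressor(n_neighbors=5),
    'Random Forest': RandomForestRegressor(random_state=42),
    'Gradient Boosting': GradientBoostingRegressor(random_state=42),
}

# Separate dictionary so the notebook's own `results` is left untouched
sketch_results = {}
for name, model in models.items():
    model.fit(Xtr, ytr)
    pred = model.predict(Xte)
    sketch_results[name] = {
        'RMSE': round(float(np.sqrt(mean_squared_error(yte, pred))), 3),
        'r2': round(float(r2_score(yte, pred)), 3),
    }

for name, scores in sketch_results.items():
    print(name, scores)
```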
df_result = pd.DataFrame(results)
df_result

      Linear Regression  K-Nearest-Neighbor  Random Forest  Gradient Boosting
RMSE             73.508             425.927         74.971             72.572
r2                0.847              -4.143          0.841              0.851

Upon evaluating the machine learning models for price prediction, the following observations were
made:
Linear Regression: Linear Regression demonstrates reasonable performance, as evidenced by an
RMSE of 73.508 and an R² of 0.847. These metrics indicate that the model effectively explains a
substantial amount of variance in the data, suggesting its suitability for this predictive task.
K-Nearest Neighbor: In contrast, K-Nearest Neighbor exhibits poor performance relative to other
models. It yields a considerably high RMSE of 425.927 and a negative R² value, indicating that it fails
to accurately capture the underlying patterns in the data. Thus, K-Nearest Neighbor is not deemed
suitable for the task of price prediction in this context.
Random Forest and Gradient Boosting: Both Random Forest and Gradient Boosting models perform
comparably well, with RMSE values hovering around 74 and R² scores ranging between
0.84 and 0.85. These models effectively capture the underlying patterns in the data, demonstrating
their potential for accurate price prediction.

Future Enhancement:

 Contextual Factors: Dynamic pricing adjusts prices based on real-time market
conditions, including demand, supply, and competitor actions.
 Advanced Algorithms: Machine learning models, such as regression and decision
trees, continuously optimize pricing.
 Data Sources: Internal data (sales, customer behavior) and external signals (traffic,
conversions) inform pricing decisions.
 Competitive Advantage: Companies embracing dynamic pricing gain a sustained
edge in today's competitive landscape.
 Future Trends: Ongoing development of machine learning technology will further
shape real-time pricing.

Conclusion:
 Summary of key findings and contributions of the project.
 Reflection on the effectiveness and implications of the dynamic pricing
approach for fare prediction.
 Recommendations for future research and areas for further exploration in the
field of data-driven pricing analytics.
