
Bike Sharing Prediction Project Structure

The document outlines a project for predicting bike-sharing demand based on environmental and temporal factors. It details the project structure, including data generation, preprocessing, modeling, and future advancements such as real-time data integration and advanced modeling techniques. The implementation includes code for data handling, model training, evaluation, and user input functionality.

1. Project Title:
• Bike Sharing Demand Prediction
Predicting the demand for bike-sharing services based on various environmental and temporal
factors.

2. Problem Statement:
• Objective:
Predict the number of bikes rented at different hours of the day based on environmental
factors like temperature, humidity, wind speed, and whether it’s a working day or holiday.
• Goal:
Help bike-sharing companies optimize their bike distribution, improve availability, and
ensure customer satisfaction by predicting demand.

3. Features of the Program:


• Data Generation and Preprocessing:
o Synthetic Data Creation: Generating a simulated dataset with various features such
as hour, temperature, humidity, windspeed, holiday, and working_day that influence
the number of bikes rented.
o Data Cleaning: Checking for missing values, outliers, and normalizing data where
necessary.
• Feature Engineering:
o Adding derived features like hour_of_day, day_of_week, etc., that might have an
impact on bike rental patterns.
o Conversion of categorical variables (holiday, working_day) into numerical values
(e.g., 0 for "no", 1 for "yes").
• Data Visualization:
o Exploratory Data Analysis (EDA): Visualizing the data using matplotlib or seaborn
to understand the relationships between features and bike rentals.
o Visualizations might include:
▪ Heatmap of correlations
▪ Line plots for time-based patterns (e.g., bike rentals throughout the day)
▪ Scatter plots to show relationships between variables like temperature and
number of bikes rented.
• Modeling:
o Regression Models:
▪ Linear regression, Random Forest Regressor, Gradient Boosting, etc., to
predict bike rental demand.
▪ Hyperparameter tuning to improve model performance.
o Evaluation Metrics:
▪ RMSE (Root Mean Squared Error)
▪ MAE (Mean Absolute Error)
▪ R² score for regression performance.
• Prediction & Deployment:
o Develop a model to make predictions on new, unseen data.
o Optional: Create a simple API or web interface to input features and get bike rental
predictions (using Flask/Django for deployment).
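As an illustration of the feature-engineering step, the sketch below adds a cyclical encoding of the hour so that 23:00 and 00:00 end up close together. It is an assumption-laden example, not part of the original project: the helper name add_cyclical_hour and the columns hour_sin and hour_cos are hypothetical. The angle 2π·hour/24 is the same quantity as the π·hours/12 term used in the synthetic rental formula later in this document.

```python
import numpy as np
import pandas as pd


def add_cyclical_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with sine/cosine encodings of the 'hour' column."""
    out = df.copy()
    angle = 2 * np.pi * out["hour"] / 24  # one full turn per 24 hours
    out["hour_sin"] = np.sin(angle)       # hypothetical column name
    out["hour_cos"] = np.cos(angle)       # hypothetical column name
    return out


sample = pd.DataFrame({"hour": [0, 6, 12, 23]})
encoded = add_cyclical_hour(sample)
print(encoded.round(3))
```

The function returns a copy, so the input frame is left unchanged and the encoding can be applied to training and prediction data in the same way.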

4. Advancement Features for the Project:


• Time Series Analysis:
Given that bike rentals have strong temporal dependencies, you could implement time-series
forecasting methods like ARIMA, LSTM (Long Short-Term Memory), or Prophet to predict
future bike rentals.
• Integration with Real-Time Data:
Instead of using synthetic data, consider integrating real-time weather data and holiday
information through APIs (e.g., OpenWeather API for weather data).
• Demand Prediction Algorithm:
Develop more sophisticated demand prediction models, such as ensemble methods combining
multiple models for better accuracy.
• Smart Bike Distribution Algorithm:
Build an algorithm that can optimize bike distribution across various stations based on the
predicted demand.
• User Behavior Analysis:
Use historical data of bike rentals to understand user behavior and offer personalized
recommendations (e.g., discounts, preferred stations).
• Data Visualization Dashboard:
Create an interactive dashboard using Dash or Streamlit to visualize predictions, weather
trends, and bike-sharing patterns in real-time.
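The ensemble idea mentioned above can be sketched with scikit-learn's VotingRegressor, which averages the predictions of several regressors. This is a minimal illustration under stated assumptions: the data is generated inline with a simplified version of the synthetic formula, and the choice of three base models and their settings is arbitrary rather than taken from the project.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Small synthetic dataset (assumption: a simplified version of the project's formula)
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.integers(0, 24, n),    # hour
    rng.uniform(5, 30, n),     # temperature
    rng.uniform(40, 100, n),   # humidity
])
y = 100 + 20 * np.sin(np.pi * X[:, 0] / 12) + 2 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 30, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Average the predictions of three different model families
ensemble = VotingRegressor([
    ("linear", LinearRegression()),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("boost", GradientBoostingRegressor(random_state=0)),
])
ensemble.fit(X_train, y_train)
ensemble_r2 = ensemble.score(X_test, y_test)
print(f"Ensemble R-squared on held-out data: {ensemble_r2:.3f}")
```

Whether averaging actually helps depends on the data; comparing the ensemble against each base model on the same test split is the simplest check.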

5. Code Implementation:
The following script generates the synthetic dataset and produces two exploratory plots:
import pandas as pd

import numpy as np
import matplotlib.pyplot as plt

# Set a random seed for reproducibility


np.random.seed(42)

# Number of samples in the dataset


num_samples = 1000

# Create synthetic features


hours = np.random.randint(0, 24, size=num_samples) # Hours of the day (0-23)
temperature = np.random.uniform(5, 30, size=num_samples) # Temperature between 5°C to 30°C
humidity = np.random.uniform(40, 100, size=num_samples) # Humidity between 40% to 100%
windspeed = np.random.uniform(0, 15, size=num_samples) # Windspeed between 0 and 15 m/s
holiday = np.random.choice([0, 1], size=num_samples) # 0 or 1, for holiday (binary)
working_day = np.random.choice([0, 1], size=num_samples) # 0 or 1, for working day (binary)

# Generate target variable (bike rentals) using a synthetic formula


bikes_rented = (
    100 + 20 * np.sin(np.pi * hours / 12) + 2 * temperature + 0.5 * humidity
    - 0.3 * windspeed + 50 * holiday - 30 * working_day + np.random.normal(0, 30, num_samples)
)

# Create the DataFrame


data = pd.DataFrame({
    'hour': hours,
    'temperature': temperature,
    'humidity': humidity,
    'windspeed': windspeed,
    'holiday': holiday,
    'working_day': working_day,
    'bikes_rented': bikes_rented
})

# Display first few rows of the dataset


print(data.head())

# Save to a CSV file for further analysis or model training


data.to_csv('bike_sharing_data.csv', index=False)

# Data Visualization
plt.figure(figsize=(10, 6))
plt.scatter(data['temperature'], data['bikes_rented'], alpha=0.6)
plt.title('Temperature vs Bikes Rented')
plt.xlabel('Temperature (°C)')
plt.ylabel('Number of Bikes Rented')
plt.show()

# Correlation heatmap (optional)


import seaborn as sns
corr = data.corr()
plt.figure(figsize=(10, 6))
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()

6. Future Improvements/Advancements:
• Hyperparameter Tuning & Model Evaluation:
Implement grid search or random search to fine-tune model parameters.
• Deep Learning:
Use deep learning models like neural networks or LSTMs for time series forecasting.
• Real-Time Data Updates:
Add real-time data fetching and updating functionality for dynamic predictions.

Output:
   hour  temperature   humidity  windspeed  holiday  working_day  bikes_rented
0     6     7.870921  62.052065  14.452489        0            0    192.811250
1    19    20.265501  44.090339   7.785484        1            0    186.177394
2    14    12.215764  41.548714  10.163134        0            0    229.449734
3    10    19.530956  48.109977   4.677958        0            0    240.951000
4     7     8.859068  97.786907  11.609871        0            1    147.003672
CODE IMPLEMENTATION:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from sklearn.model_selection import GridSearchCV

# Load the dataset


data = pd.read_csv('bike_sharing_data.csv')

# Feature columns (all columns except 'bikes_rented')


X = data.drop(columns=['bikes_rented'])

# Target variable (bikes rented)


y = data['bikes_rented']

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature Scaling: Standardize features using StandardScaler


scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a Random Forest Regressor model


model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)

# Predict on the test set


y_pred = model.predict(X_test_scaled)

# Calculate Mean Squared Error and R-squared


mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print the results


print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')

# Plot the predicted vs actual values


plt.scatter(y_test, y_pred)
plt.xlabel('Actual Bike Rentals')
plt.ylabel('Predicted Bike Rentals')
plt.title('Actual vs Predicted Bike Rentals')
plt.show()

# Hyperparameter tuning using GridSearchCV


param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5, 10]
}

# Set up GridSearchCV for hyperparameter tuning


grid_search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid, cv=5)
grid_search.fit(X_train_scaled, y_train)

# Print the best parameters from GridSearchCV


print("Best parameters found: ", grid_search.best_params_)

# Use the best model found by GridSearchCV for predictions


best_model = grid_search.best_estimator_
# Predict with the tuned model
y_pred_best = best_model.predict(X_test_scaled)

# Calculate Mean Squared Error and R-squared for the best model
mse_best = mean_squared_error(y_test, y_pred_best)
r2_best = r2_score(y_test, y_pred_best)

# Print the results of the best model


print(f'Mean Squared Error (Best Model): {mse_best}')
print(f'R-squared (Best Model): {r2_best}')

# Plot the predicted vs actual values for the best model


plt.scatter(y_test, y_pred_best)
plt.xlabel('Actual Bike Rentals')
plt.ylabel('Predicted Bike Rentals (Best Model)')
plt.title('Actual vs Predicted Bike Rentals (Best Model)')
plt.show()

# Optional: Save the model for future use (e.g., for deployment)
import joblib
joblib.dump(best_model, 'best_bike_sharing_model.pkl')

Output:
Explanation of the Code:
1. Imports:
o Libraries such as pandas, scikit-learn, matplotlib, and joblib are used to process data,
train models, evaluate, and visualize results.
2. Data Loading:
o The dataset is loaded from a CSV file, and the features (X) and target variable (y) are
extracted.
3. Data Preprocessing:
o The data is split into training and testing sets (80% for training and 20% for testing).
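o The features are standardized using StandardScaler. Tree-based models such as Random
Forest are insensitive to feature scaling, so this step mainly matters if the model is later
swapped for a scale-sensitive one (e.g., regularized linear regression).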
4. Model Training:
o A Random Forest Regressor model is trained on the scaled training data.
o Predictions are made on the test set.
5. Model Evaluation:
o The model’s performance is evaluated using Mean Squared Error (MSE) and R-
squared (R²), which help determine how well the model fits the data.
6. Visualization:
o A scatter plot visualizes the predicted values vs. the actual values, helping assess how
well the model performs.
7. Hyperparameter Tuning (GridSearchCV):
o GridSearchCV is used to find the best hyperparameters for the Random Forest model.
It searches through different combinations of hyperparameters like the number of
trees (n_estimators), maximum depth of trees (max_depth), and minimum samples to
split a node (min_samples_split).
o After the search, GridSearchCV refits the best configuration on the full training set (its
default behavior), and that model is evaluated again on the test set.
8. Saving the Model:
o The trained and tuned model is saved using joblib for future use, such as deployment
in a real-time prediction system.
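The script above reports MSE and R², while the feature list earlier also names RMSE and MAE. The sketch below shows how all four relate, computed with NumPy only. The helper name regression_report and the four sample values are illustrative assumptions, not output from the project.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = float(np.mean(errors ** 2))
    ss_res = float(np.sum(errors ** 2))                     # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # total sum of squares
    return {
        "mse": mse,
        "rmse": mse ** 0.5,                        # same units as the target
        "mae": float(np.mean(np.abs(errors))),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up actual and predicted rental counts, for illustration only
report = regression_report([100, 150, 200, 250], [110, 140, 190, 260])
print(report)
```

RMSE and MAE are expressed in bikes rented, which makes them easier to interpret than MSE when judging whether an error is operationally acceptable.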

Future Advancements:
1. Cross-Validation:
o Use cross_val_score to perform cross-validation to get a better estimate of model
performance.
2. Advanced Models:
o Consider gradient-boosting libraries such as XGBoost or LightGBM, which often
outperform Random Forests on tabular regression tasks.
3. Time Series Modeling:
o Incorporate time series forecasting techniques (e.g., ARIMA, LSTM) to predict bike
rentals based on time-related patterns.
4. Real-Time Data Integration:
o Integrate real-time data (e.g., weather, traffic conditions) into the model to provide
up-to-date predictions for bike-sharing demand.
5. Deployment:
o Deploy the trained model using a Flask or FastAPI web application for real-time
prediction.
6. Automated Retraining:
o Set up a pipeline for automated retraining of the model as new data becomes
available to keep the model's predictions up-to-date.
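The cross-validation suggestion can be sketched as follows. This is a minimal example under stated assumptions: the data is generated inline from a simplified two-feature version of the synthetic formula, and the fold count and model settings are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Small synthetic dataset (assumption: simplified two-feature formula)
rng = np.random.default_rng(42)
n = 300
temperature = rng.uniform(5, 30, n)
humidity = rng.uniform(40, 100, n)
X = np.column_stack([temperature, humidity])
y = 100 + 2 * temperature + 0.5 * humidity + rng.normal(0, 30, n)

model = RandomForestRegressor(n_estimators=50, random_state=42)

# This scorer returns negated RMSE (higher is better), so flip the sign
neg_rmse = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
rmse_per_fold = -neg_rmse

print("RMSE per fold:", rmse_per_fold.round(2))
print("Mean RMSE:", round(float(rmse_per_fold.mean()), 2))
```

The spread of the per-fold scores gives a rough sense of how much a single train/test split could mislead.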
CODE IMPLEMENTATION:

import pandas as pd
import numpy as np

# Define a function to get user input and save the data to CSV
def get_user_input():
    # Get input from the user
    hour = int(input("Enter the hour of the day (0-23): "))
    temperature = float(input("Enter the temperature (in Celsius): "))
    humidity = float(input("Enter the humidity (%): "))
    windspeed = float(input("Enter the windspeed (in m/s): "))
    holiday = int(input("Is it a holiday? (1 for Yes, 0 for No): "))
    working_day = int(input("Is it a working day? (1 for Yes, 0 for No): "))
    bikes_rented = float(input("Enter the number of bikes rented: "))

    # Create a new record for the entered data
    new_data = {
        'hour': hour,
        'temperature': temperature,
        'humidity': humidity,
        'windspeed': windspeed,
        'holiday': holiday,
        'working_day': working_day,
        'bikes_rented': bikes_rented
    }

    # Create a DataFrame for the new record
    new_df = pd.DataFrame([new_data])

    # Check if the CSV file already exists
    try:
        # If it exists, append the new data to it
        existing_df = pd.read_csv('bike_sharing_data.csv')
        updated_df = pd.concat([existing_df, new_df], ignore_index=True)
        updated_df.to_csv('bike_sharing_data.csv', index=False)
    except FileNotFoundError:
        # If the CSV doesn't exist, create a new one with the new data
        new_df.to_csv('bike_sharing_data.csv', index=False)

    print("Data has been saved to 'bike_sharing_data.csv'.")

# Call the function to get user input and save data


get_user_input()

Explanation of the Code:


1. Importing Libraries:
o pandas: Used for data manipulation and reading/writing CSV files.
o numpy: Although not used in this script, it's commonly used for numerical operations
and could be useful for future extensions.
2. Defining the get_user_input() Function:
o This function collects data from the user regarding bike rental conditions and saves it
to a CSV file.
3. Getting User Input:
o The program asks the user to input several details related to the bike rental conditions,
such as:
▪ hour: Hour of the day (0-23).
▪ temperature: Temperature in Celsius.
▪ humidity: Humidity in percentage.
▪ windspeed: Windspeed in meters per second.
▪ holiday: Whether it's a holiday (1 for Yes, 0 for No).
▪ working_day: Whether it's a working day (1 for Yes, 0 for No).
▪ bikes_rented: Number of bikes rented (this is the target variable).
4. Creating a Data Dictionary:
o A dictionary new_data is created where each key corresponds to a feature (hour,
temperature, humidity, etc.) and its corresponding value is the user's input.
5. Converting Dictionary to DataFrame:
o The dictionary is then converted into a pandas DataFrame new_df, making it easier to
manipulate and append to an existing dataset.
6. Saving Data to CSV:
o The program checks if the bike_sharing_data.csv file already exists:
▪ If the file exists, it reads the existing data into a DataFrame, appends the new
data (new_df), and writes it back to the CSV.
▪ If the file does not exist, it creates a new CSV file with the input data.
7. Confirmation Message:
o After saving the data, the program prints a message confirming that the data has been
saved to the bike_sharing_data.csv file.
8. Calling the Function:
o Finally, the get_user_input() function is called to execute the program and prompt the
user for inputs.
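The function described above converts each answer with int() or float() and stores it as-is, so an hour of 30 or a negative rental count would be written to the CSV. A minimal sketch of range checks is shown below. The helper name validate_record and the specific limits (for example humidity between 0 and 100) are assumptions chosen for illustration.

```python
def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    if not 0 <= record["hour"] <= 23:
        problems.append("hour must be between 0 and 23")
    if not 0 <= record["humidity"] <= 100:
        problems.append("humidity must be between 0 and 100")
    if record["windspeed"] < 0:
        problems.append("windspeed cannot be negative")
    for flag in ("holiday", "working_day"):
        if record[flag] not in (0, 1):
            problems.append(f"{flag} must be 0 or 1")
    if record["bikes_rented"] < 0:
        problems.append("bikes_rented cannot be negative")
    return problems


good = {"hour": 7, "temperature": 8.9, "humidity": 97.8, "windspeed": 11.6,
        "holiday": 0, "working_day": 1, "bikes_rented": 147.0}
bad = dict(good, hour=30, holiday=2)

print(validate_record(good))
print(validate_record(bad))
```

Returning all problems at once, rather than raising on the first, lets the calling code show the user everything that needs correcting before asking again.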

Future Advancements:
1. Data Validation:
o Implement checks to ensure the user inputs are within valid ranges, e.g., ensuring the
hour is between 0 and 23.
2. User Interface (UI):
o Consider creating a graphical user interface (GUI) using libraries like Tkinter or
PyQt for easier data input or even a web interface with Flask or Django.
3. Automated Data Aggregation:
o Add functionality to automatically aggregate data by time (e.g., average bike rentals
per day, per hour) for better data analysis.
4. Real-time Data Integration:
o Incorporate real-time data sources, such as weather APIs or bike-sharing systems, to
collect data automatically without manual input.
5. Data Preprocessing:
o Introduce preprocessing steps like normalization, encoding categorical features, or
handling missing values to prepare the dataset for predictive modeling.
6. Predictive Modeling:
o With enough data, you could apply machine learning algorithms (e.g., Random
Forest, Linear Regression, Time-series forecasting) to predict future bike rentals
based on input features.
7. Visualization:
o Visualize the data using plotting libraries like matplotlib or seaborn to identify
trends in bike rentals based on time, temperature, and other factors.
8. Data Storage and Security:
o For large datasets, consider moving from a CSV file to a more scalable database
solution (e.g., SQLite, PostgreSQL) for better performance and security.
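The move from a CSV file to a database can be sketched with Python's built-in sqlite3 module. This is an illustration only: the table name rentals, the in-memory database, and the CHECK constraint are assumptions, and the two rows are rounded from the sample output shown earlier.

```python
import sqlite3

# In-memory database for the demo; pass a file path instead to persist data
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS rentals (
        hour         INTEGER NOT NULL CHECK (hour BETWEEN 0 AND 23),
        temperature  REAL,
        humidity     REAL,
        windspeed    REAL,
        holiday      INTEGER,
        working_day  INTEGER,
        bikes_rented REAL
    )
    """
)

# Values rounded from the first two rows of the sample output
rows = [
    (6, 7.87, 62.05, 14.45, 0, 0, 192.81),
    (19, 20.27, 44.09, 7.79, 1, 0, 186.18),
]
# Parameterized inserts avoid building SQL strings by hand
conn.executemany("INSERT INTO rentals VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM rentals").fetchone()[0]
avg_by_hour = conn.execute(
    "SELECT hour, AVG(bikes_rented) FROM rentals GROUP BY hour ORDER BY hour"
).fetchall()
print(row_count, avg_by_hour)
```

The GROUP BY query also covers the aggregation idea from the list above, since averages per hour can be computed in the database rather than after loading the whole file.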

Code:
data = pd.read_csv('bike_sharing_data.csv')
data.groupby('hour')['bikes_rented'].mean().plot()

OUTPUT:
CODE IMPLEMENTATION:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Load the dataset


data = pd.read_csv('bike_sharing_data.csv')

# Extract relevant columns for the plot


hours = data['hour']
temperature = data['temperature']
humidity = data['humidity']
bikes_rented = data['bikes_rented']

# Create a 3D scatter plot


fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')

# Scatter plot with hour, temperature, and humidity as axes


# Color by the number of bikes rented (using a colormap)
scatter = ax.scatter(hours, temperature, humidity, c=bikes_rented, cmap='viridis', s=30)

# Labels and title


ax.set_xlabel('Hour of the Day')
ax.set_ylabel('Temperature (°C)')
ax.set_zlabel('Humidity (%)')
ax.set_title('3D Plot of Bike Rentals')

# Add a color bar to indicate bike rentals


cbar = plt.colorbar(scatter)
cbar.set_label('Bikes Rented')
# Show the plot


plt.show()

OUTPUT:
CODE IMPLEMENTATION:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set a random seed for reproducibility


np.random.seed(42)

# Number of samples in the dataset


num_samples = 1000

# Create synthetic features


hours = np.random.randint(0, 24, size=num_samples) # Hours of the day (0-23)
temperature = np.random.uniform(5, 30, size=num_samples) # Temperature between 5°C to 30°C
humidity = np.random.uniform(40, 100, size=num_samples) # Humidity between 40% to 100%
windspeed = np.random.uniform(0, 15, size=num_samples) # Windspeed between 0 and 15 m/s
holiday = np.random.choice([0, 1], size=num_samples) # 0 or 1, for holiday (binary)
working_day = np.random.choice([0, 1], size=num_samples) # 0 or 1, for working day (binary)

# Generate target variable (bike rentals)


# This is a synthetic formula for the number of bikes rented
bikes_rented = (
    100 + 20 * np.sin(np.pi * hours / 12) + 2 * temperature + 0.5 * humidity
    - 0.3 * windspeed + 50 * holiday - 30 * working_day + np.random.normal(0, 30, num_samples)
)

# Create the DataFrame


data = pd.DataFrame({
    'hour': hours,
    'temperature': temperature,
    'humidity': humidity,
    'windspeed': windspeed,
    'holiday': holiday,
    'working_day': working_day,
    'bikes_rented': bikes_rented
})

# Build a 'datetime' column by adding each row's hour to a fixed base date.
# Note: 2023-01-01 is a Sunday, and every row lands on that same day,
# so the 'weekday' column below is 6 (Sunday) for all rows.
base_date = pd.to_datetime('2023-01-01') # January 1st, 2023
data['datetime'] = base_date + pd.to_timedelta(data['hour'], unit='h')

# Add 'weekday' column


data['weekday'] = data['datetime'].dt.weekday # 0 = Monday, 6 = Sunday

# Add an 'epoch' column (number of seconds since the Unix epoch)


# Convert datetime to epoch time (seconds since 1970-01-01); assumes nanosecond-resolution datetimes
data['epoch'] = data['datetime'].astype(np.int64) // 10**9

# Display first few rows of the modified dataset


print(data.head())

# Save the updated dataset to a CSV file


data.to_csv('updated_bike_sharing_data.csv', index=False)
OUTPUT:
hour temperature humidity windspeed holiday working_day \
0 6 7.870921 62.052065 14.452489 0 0
1 19 20.265501 44.090339 7.785484 1 0
2 14 12.215764 41.548714 10.163134 0 0
3 10 19.530956 48.109977 4.677958 0 0
4 7 8.859068 97.786907 11.609871 0 1
bikes_rented datetime weekday epoch
0 192.811250 2023-01-01 06:00:00 6 1672552800
1 186.177394 2023-01-01 19:00:00 6 1672599600
2 229.449734 2023-01-01 14:00:00 6 1672581600
3 240.951000 2023-01-01 10:00:00 6 1672567200
4 147.003672 2023-01-01 07:00:00 6 1672556400

EXPLANATION:
Synthetic Data Generation: It creates synthetic bike rental data based on features such as hour,
temperature, humidity, windspeed, holiday status, and working day status.
Adding weekday Column: It adds a column for the day of the week (0 = Monday, 6 = Sunday).
Because every timestamp falls on the base date 2023-01-01, a Sunday, the value is 6 for all rows.
Adding epoch Column: It adds a column with the number of seconds since the Unix epoch (1970-
01-01), based on the datetime column.
Saving the Data: Finally, it saves the updated dataset to a CSV file
(updated_bike_sharing_data.csv).
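For the weekday column to carry information, the timestamps need to span more than one day. The sketch below is a hypothetical alternative, not the script's method: it assigns one row per consecutive hour over three days and derives hour, weekday, and epoch from the timestamp. The frame name and the 72-hour span are assumptions. The epoch is computed by subtracting the Unix epoch and dividing by one second, which does not depend on the datetime resolution.

```python
import numpy as np
import pandas as pd

base_date = pd.Timestamp("2023-01-01")               # a Sunday
offsets = pd.to_timedelta(np.arange(72), unit="h")   # 72 consecutive hours = 3 days

frame = pd.DataFrame({"datetime": base_date + offsets})
frame["hour"] = frame["datetime"].dt.hour
frame["weekday"] = frame["datetime"].dt.weekday       # 0 = Monday, 6 = Sunday
frame["epoch"] = (frame["datetime"] - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)

# First row of each of the three days
print(frame.iloc[[0, 24, 48]])
```

With this layout the weekday values are 6, 0, and 1 for the three days, and the epoch for 06:00 on the first day matches the 1672552800 shown in the output above.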
CODE IMPLEMENTATION:
import pandas as pd
import numpy as np
from datetime import timedelta

# Set a random seed for reproducibility


np.random.seed(42)

# Bike companies
bike_companies = ['Hero', 'Honda', 'Pulsar', 'Bajaj', 'TVS Scooty', 'Honda Activa', 'Elecfasion']

# Generate synthetic features for the dataset


num_samples = 1000
hours = np.random.randint(0, 24, size=num_samples)
temperature = np.random.uniform(5, 30, size=num_samples)
humidity = np.random.uniform(40, 100, size=num_samples)
windspeed = np.random.uniform(0, 15, size=num_samples)
holiday = np.random.choice([0, 1], size=num_samples)
working_day = np.random.choice([0, 1], size=num_samples)

# Generate target variable (bike rentals)


bikes_rented = (
    100 + 20 * np.sin(np.pi * hours / 12) + 2 * temperature + 0.5 * humidity
    - 0.3 * windspeed + 50 * holiday - 30 * working_day + np.random.normal(0, 30, num_samples)
)

# Create the DataFrame


data = pd.DataFrame({
    'hour': hours,
    'temperature': temperature,
    'humidity': humidity,
    'windspeed': windspeed,
    'holiday': holiday,
    'working_day': working_day,
    'bikes_rented': bikes_rented
})

# Add a 'datetime' column for date handling


base_date = pd.to_datetime('2023-01-01')
data['datetime'] = base_date + pd.to_timedelta(data['hour'], unit='h')

# Add 'weekday' column


data['weekday'] = data['datetime'].dt.weekday

# Add 'epoch' column (number of seconds since Unix epoch)


data['epoch'] = data['datetime'].astype(np.int64) // 10**9

# Add the bike-detail columns

# Bike name: build 100 names per company (e.g., 'Hero-1' ... 'Hero-100')
bike_names = []
for company in bike_companies:
    bike_names += [f'{company}-{i}' for i in range(1, 101)]

# Sample a name for each row (with replacement, so names can repeat)
data['bike_name'] = np.random.choice(bike_names, size=num_samples)

# Bike company: sampled independently of bike_name, so the company in this column
# need not match the name prefix (e.g., 'Hero-85' may be listed under 'TVS Scooty')
data['bike_company'] = np.random.choice(bike_companies, size=num_samples)

# Enrollment date (random date between 2020-01-01 and 2023-01-01)
enrollment_dates = pd.to_datetime(np.random.choice(
    pd.date_range('2020-01-01', '2023-01-01', freq='D'), size=num_samples))
data['enrollment_date'] = enrollment_dates

# Next servicing date: a random 90 to 119 days (roughly 3 months) after enrollment
data['next_servicing'] = data['enrollment_date'] + pd.to_timedelta(
    np.random.randint(90, 120, size=num_samples), unit='D')

# Owner Address (random synthetic addresses)


addresses = ['Address A', 'Address B', 'Address C', 'Address D', 'Address E']
data['owner_address'] = np.random.choice(addresses, size=num_samples)

# Display the first few rows of the updated dataset


print(data.head())

# Save the updated dataset to a CSV file


data.to_csv('updated_bike_sharing_with_bike_details.csv', index=False)

OUTPUT:
hour temperature humidity windspeed holiday working_day \
0 6 7.870921 62.052065 14.452489 0 0
1 19 20.265501 44.090339 7.785484 1 0
2 14 12.215764 41.548714 10.163134 0 0
3 10 19.530956 48.109977 4.677958 0 0
4 7 8.859068 97.786907 11.609871 0 1

bikes_rented datetime weekday epoch bike_name \


0 192.811250 2023-01-01 06:00:00 6 1672552800 Bajaj-93
1 186.177394 2023-01-01 19:00:00 6 1672599600 Hero-85
2 229.449734 2023-01-01 14:00:00 6 1672581600 Pulsar-67
3 240.951000 2023-01-01 10:00:00 6 1672567200 Hero-69
4 147.003672 2023-01-01 07:00:00 6 1672556400 Elecfasion-11
bike_company enrollment_date next_servicing owner_address


0 Honda Activa 2020-02-25 2020-06-14 Address D
1 TVS Scooty 2021-07-30 2021-11-03 Address A
2 TVS Scooty 2020-09-24 2021-01-16 Address D
3 Hero 2020-03-09 2020-07-05 Address B
4 Honda 2022-06-20 2022-10-16 Address C
CODE IMPLEMENTATION:
import pandas as pd

# Load the updated dataset


data = pd.read_csv('updated_bike_sharing_with_bike_details.csv')

# Group by bike company and calculate the average number of bikes rented
average_rentals = data.groupby('bike_company')['bikes_rented'].mean()

# Find the company with the maximum average bike rentals


best_company = average_rentals.idxmax()

# Get the average number of rentals for the best company


best_company_avg_rentals = average_rentals.max()

# Display the result


print(f"The best bike company based on the highest average bike rentals is: {best_company}")
print(f"With an average of {best_company_avg_rentals:.2f} bikes rented.")

OUTPUT:

The best bike company based on the highest average bike rentals is: Honda Activa
With an average of 187.26 bikes rented.

EXPLANATION:
Load the dataset: Reads the CSV file updated_bike_sharing_with_bike_details.csv that contains
the bike-sharing data.
Group by bike company: It groups the data by bike_company and calculates the mean number of
bikes rented for each company.
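Identify the best company: Finds the bike company with the highest average number of bikes
rented using idxmax() and max(). Because bike_company is assigned at random and does not enter
the bikes_rented formula, the differences between companies reflect sampling noise rather than a
real effect.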
Display results: Prints out the name of the best company and its average rentals, formatted to two
decimal places.
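In the generated dataset the bike_company column is sampled separately from bike_name, so a bike named Hero-85 can be listed under another company. If the grouping should follow the name prefix instead, the company can be derived from bike_name. The sketch below is a hypothetical variant: the demo frame is made up (rental values rounded from the sample output, and 'TVS Scooty-12' invented for illustration).

```python
import pandas as pd

# Small illustrative frame; not the project's dataset
demo = pd.DataFrame({
    "bike_name": ["Bajaj-93", "Hero-85", "TVS Scooty-12", "Hero-69"],
    "bikes_rented": [192.8, 186.2, 229.4, 241.0],
})

# Split on the last hyphen only, so multi-word names such as 'TVS Scooty' survive
demo["bike_company"] = demo["bike_name"].str.rsplit("-", n=1).str[0]

# Report the count next to the mean so small groups are easy to spot
summary = demo.groupby("bike_company")["bikes_rented"].agg(["mean", "count"])
print(summary)
```

Showing the row count beside each mean helps when judging whether a gap between two companies rests on enough data to be meaningful.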
Code implementation:
import matplotlib.pyplot as plt

# Plot the average bike rentals for each company


plt.figure(figsize=(10, 6))
average_rentals.sort_values().plot(kind='bar', color='skyblue')
plt.title('Average Bike Rentals by Company')
plt.xlabel('Bike Company')
plt.ylabel('Average Number of Bikes Rented')
plt.xticks(rotation=45)
plt.show()

Output:
CODE IMPLEMENTATION:
import os
import pandas as pd

# Load the dataset
data = pd.read_csv('updated_bike_sharing_with_bike_details.csv')

# Function to interact with the user and fetch the relevant bike rental information
def get_rental_info():
    # Ask the user to provide some criteria for filtering
    print("Available bike companies: Hero, Honda, Pulsar, Bajaj, TVS Scooty, Honda Activa, Elecfasion")
    company_name = input("Enter the bike company name to see rentals (or type 'exit' to quit): ").strip()

    if company_name.lower() == 'exit':
        return

    # Filter data by the entered company name (case-insensitive);
    # .copy() avoids SettingWithCopyWarning when columns are added below
    filtered_data = data[data['bike_company'].str.lower() == company_name.lower()].copy()

    if filtered_data.empty:
        print("No data found for this company. Please check the name and try again.")
        return

    print(f"\nDisplaying rental information for {company_name}:\n")
    print(filtered_data[['bike_name', 'hour', 'bikes_rented', 'temperature', 'humidity']].head())

    # Ask for feedback on the service
    feedback = input("\nPlease provide your feedback on the service (e.g., 'Good service', 'Needs improvement', etc.): ").strip()
    rating = input("Rate the service (1-5, with 5 being the best): ").strip()

    filtered_data['feedback'] = feedback
    filtered_data['rating'] = rating

    # Append the feedback to a CSV file, writing the header only when the file is first created
    feedback_file = 'bike_rental_feedback.csv'
    filtered_data[['bike_name', 'bike_company', 'hour', 'bikes_rented', 'feedback', 'rating']].to_csv(
        feedback_file, mode='a', header=not os.path.exists(feedback_file), index=False)

    print("\nThank you for your feedback! It has been recorded.")

# Loop to interact with the user until they want to exit
while True:
    get_rental_info()
    continue_input = input("\nDo you want to check feedback for another bike company? (yes/no): ").strip().lower()
    if continue_input != 'yes':
        print("Thank you! Have a great day.")
        break
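Output (sample session; the prompts shown come from a variant of the script that filters by bike name rather than by company):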
Available bike companies: Hero, Honda, Pulsar, Bajaj, TVS Scooty, Honda Activa,
Elecfasion
Enter the bike name (e.g., Hero-85, Bajaj-93, etc.): 85]
No data found for the bike: 85]. Please check the name and try again.

Do you want to check feedback for another bike? (yes/no): yes


Available bike companies: Hero, Honda, Pulsar, Bajaj, TVS Scooty, Honda Activa,
Elecfasion
Enter the bike name (e.g., Hero-85, Bajaj-93, etc.): Hero-85

Displaying rental information for bike Hero-85:

hour temperature humidity windspeed holiday working_day \


1 19 20.265501 44.090339 7.785484 1 0
523 13 22.012483 56.891272 0.029882 1 0

bikes_rented datetime weekday epoch bike_name \


1 186.177394 2023-01-01 19:00:00 6 1672599600 Hero-85
523 206.866187 2023-01-01 13:00:00 6 1672578000 Hero-85

bike_company enrollment_date next_servicing owner_address


1 TVS Scooty 2021-07-30 2021-11-03 Address A
523 Pulsar 2022-04-07 2022-07-23 Address B

Please provide your feedback on the service (e.g., 'Good service', 'Needs improvement'): 1
Rate the service (1-5, with 5 being the best): 5

Thank you for your feedback! It has been recorded.

