Bike Sharing Prediction Project Structure

The document outlines a project for predicting bike-sharing demand based on environmental and temporal factors. It details the project structure, including data generation, preprocessing, modeling, and future advancements such as real-time data integration and advanced modeling techniques. The implementation includes code for data handling, model training, evaluation, and user input functionality.

Uploaded by

shaikallabhakshu503


1. Project Title:
• Bike Sharing Demand Prediction
Predicting the demand for bike-sharing services based on various environmental and temporal
factors.

2. Problem Statement:
• Objective:
Predict the number of bikes rented at different hours of the day based on environmental
factors like temperature, humidity, wind speed, and whether it’s a working day or holiday.
• Goal:
Help bike-sharing companies optimize their bike distribution, improve availability, and
ensure customer satisfaction by predicting demand.

3. Features of the Program:

• Data Generation and Preprocessing:
o Synthetic Data Creation: Generating a simulated dataset with various features such
as hour, temperature, humidity, windspeed, holiday, and working_day that influence
the number of bikes rented.
o Data Cleaning: Checking for missing values and outliers, and normalizing data where
necessary.
• Feature Engineering:
o Adding derived features like hour_of_day, day_of_week, etc., that might have an
impact on bike rental patterns.
o Conversion of categorical variables (holiday, working_day) into numerical values
(e.g., 0 for "no", 1 for "yes").
• Data Visualization:
o Exploratory Data Analysis (EDA): Visualizing the data using matplotlib or seaborn
to understand the relationships between features and bike rentals.
o Visualizations might include:
▪ Heatmap of correlations
▪ Line plots for time-based patterns (e.g., bike rentals throughout the day)
▪ Scatter plots to show relationships between variables like temperature and
number of bikes rented.
• Modeling:
o Regression Models:
▪ Linear regression, Random Forest Regressor, Gradient Boosting, etc., to
predict bike rental demand.
▪ Hyperparameter tuning to improve model performance.
o Evaluation Metrics:
▪ RMSE (Root Mean Squared Error)
▪ MAE (Mean Absolute Error)
▪ R² score for regression performance.
• Prediction & Deployment:
o Use the trained model to make predictions on new, unseen data.
o Optional: Create a simple API or web interface to input features and get bike rental
predictions (using Flask/Django for deployment).
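
As a minimal sketch of the three evaluation metrics listed above, the following computes RMSE, MAE, and R² from scratch on a few made-up values. The function name regression_metrics and the sample numbers are illustrative assumptions, not part of the project code.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    # RMSE: square root of the mean squared error
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAE: mean of the absolute errors
    mae = sum(abs(e) for e in errors) / n
    # R²: 1 minus (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2

# Hypothetical actual and predicted rental counts
actual = [100, 150, 200, 250]
predicted = [110, 140, 210, 240]
rmse, mae, r2 = regression_metrics(actual, predicted)
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  R2={r2:.3f}")
```

RMSE and MAE are in the same unit as the target (bikes), which makes them easier to interpret than the raw mean squared error.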

4. Advancement Features for the Project:

• Time Series Analysis:
Given that bike rentals have strong temporal dependencies, you could implement time-series
forecasting methods like ARIMA, LSTM (Long Short-Term Memory), or Prophet to predict
future bike rentals.
• Integration with Real-Time Data:
Instead of using synthetic data, consider integrating real-time weather data and holiday
information through APIs (e.g., OpenWeather API for weather data).
• Demand Prediction Algorithm:
Develop more sophisticated demand prediction models, such as ensemble methods combining
multiple models for better accuracy.
• Smart Bike Distribution Algorithm:
Build an algorithm that can optimize bike distribution across various stations based on the
predicted demand.
• User Behavior Analysis:
Use historical data of bike rentals to understand user behavior and offer personalized
recommendations (e.g., discounts, preferred stations).
• Data Visualization Dashboard:
Create an interactive dashboard using Dash or Streamlit to visualize predictions, weather
trends, and bike-sharing patterns in real-time.
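
One simple way to approach the smart bike distribution idea above is to split the fleet across stations in proportion to predicted demand. The sketch below is only an illustration under that assumption; the function allocate_bikes, the station names, and the demand figures are all hypothetical.

```python
def allocate_bikes(predicted_demand, total_bikes):
    """Split total_bikes across stations in proportion to predicted demand.

    Uses the largest-remainder method so the integer counts sum exactly
    to total_bikes.
    """
    total_demand = sum(predicted_demand.values())
    if total_demand <= 0:
        raise ValueError("total predicted demand must be positive")
    # Ideal (fractional) share of the fleet for each station
    shares = {s: total_bikes * d / total_demand for s, d in predicted_demand.items()}
    # Start from the rounded-down share
    allocation = {s: int(share) for s, share in shares.items()}
    leftover = total_bikes - sum(allocation.values())
    # Hand the remaining bikes to the stations with the largest fractional parts
    by_remainder = sorted(shares, key=lambda s: shares[s] - allocation[s], reverse=True)
    for station in by_remainder[:leftover]:
        allocation[station] += 1
    return allocation

# Hypothetical predicted demand per station for the next hour
demand = {"Station A": 125.0, "Station B": 60.0, "Station C": 15.0}
plan = allocate_bikes(demand, total_bikes=50)
print(plan)
```

A real system would also have to account for station capacity and the cost of moving bikes, which this sketch ignores.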

5. Code Implementation:
The script below generates the synthetic dataset, saves it to a CSV file, and draws two exploratory plots:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set a random seed for reproducibility

np.random.seed(42)

# Number of samples in the dataset

num_samples = 1000

# Create synthetic features

hours = np.random.randint(0, 24, size=num_samples) # Hours of the day (0-23)
temperature = np.random.uniform(5, 30, size=num_samples) # Temperature between 5°C and 30°C
humidity = np.random.uniform(40, 100, size=num_samples) # Humidity between 40% and 100%
windspeed = np.random.uniform(0, 15, size=num_samples) # Windspeed between 0 and 15 m/s
holiday = np.random.choice([0, 1], size=num_samples) # 0 or 1, for holiday (binary)
working_day = np.random.choice([0, 1], size=num_samples) # 0 or 1, drawn independently of holiday, so both can be 1

# Generate target variable (bike rentals) using a synthetic formula

bikes_rented = (
    100 + 20 * np.sin(np.pi * hours / 12) + 2 * temperature + 0.5 * humidity
    - 0.3 * windspeed + 50 * holiday - 30 * working_day + np.random.normal(0, 30, num_samples)
)

# Create the DataFrame

data = pd.DataFrame({
    'hour': hours,
    'temperature': temperature,
    'humidity': humidity,
    'windspeed': windspeed,
    'holiday': holiday,
    'working_day': working_day,
    'bikes_rented': bikes_rented
})

# Display first few rows of the dataset

print(data.head())

# Save to a CSV file for further analysis or model training

data.to_csv('bike_sharing_data.csv', index=False)

# Data Visualization
plt.figure(figsize=(10, 6))
plt.scatter(data['temperature'], data['bikes_rented'], alpha=0.6)
plt.title('Temperature vs Bikes Rented')
plt.xlabel('Temperature (°C)')
plt.ylabel('Number of Bikes Rented')
plt.show()

# Correlation heatmap (optional)

import seaborn as sns
corr = data.corr()
plt.figure(figsize=(10, 6))
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()
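
The synthetic target depends on the hour through a sine term with a 24-hour period, which a model given only the raw hour value has to approximate. A common feature-engineering step is to encode the hour as a point on a circle, so that 23:00 and 00:00 end up close together. A minimal sketch, assuming only NumPy; the helper name encode_hour is hypothetical:

```python
import numpy as np

def encode_hour(hours):
    """Map hours (0-23) to (sin, cos) coordinates on a 24-hour circle."""
    angle = 2 * np.pi * np.asarray(hours) / 24
    return np.sin(angle), np.cos(angle)

hour_sin, hour_cos = encode_hour([0, 6, 12, 18, 23])

# Hours 23 and 0 are adjacent on the clock, and their encodings are close too;
# hours 12 and 0 are opposite each other, so they are as far apart as possible
dist_23_to_0 = np.hypot(hour_sin[4] - hour_sin[0], hour_cos[4] - hour_cos[0])
dist_12_to_0 = np.hypot(hour_sin[2] - hour_sin[0], hour_cos[2] - hour_cos[0])
print(dist_23_to_0, dist_12_to_0)
```

Since 2π · hour / 24 equals π · hour / 12, the sine column here matches the sine term used in the synthetic formula.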

6. Future Improvements/Advancements:
• Hyperparameter Tuning & Model Evaluation:
Implement grid search or random search to fine-tune model parameters.
• Deep Learning:
Use deep learning models like neural networks or LSTMs for time series forecasting.
• Real-Time Data Updates:
Add real-time data fetching and updating functionality for dynamic predictions.

Output:
   hour  temperature   humidity  windspeed  holiday  working_day  bikes_rented
0     6     7.870921  62.052065  14.452489        0            0    192.811250
1    19    20.265501  44.090339   7.785484        1            0    186.177394
2    14    12.215764  41.548714  10.163134        0            0    229.449734
3    10    19.530956  48.109977   4.677958        0            0    240.951000
4     7     8.859068  97.786907  11.609871        0            1    147.003672
CODE IMPLEMENTATION:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from sklearn.model_selection import GridSearchCV

# Load the dataset

data = pd.read_csv('bike_sharing_data.csv')

# Feature columns (all columns except 'bikes_rented')

X = data.drop(columns=['bikes_rented'])

# Target variable (bikes rented)

y = data['bikes_rented']

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature Scaling: Standardize features using StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a Random Forest Regressor model

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)

# Predict on the test set

y_pred = model.predict(X_test_scaled)

# Calculate Mean Squared Error and R-squared

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print the results

print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')

# Plot the predicted vs actual values

plt.scatter(y_test, y_pred)
plt.xlabel('Actual Bike Rentals')
plt.ylabel('Predicted Bike Rentals')
plt.title('Actual vs Predicted Bike Rentals')
plt.show()

# Hyperparameter tuning using GridSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5, 10]
}

# Set up GridSearchCV for hyperparameter tuning

grid_search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid, cv=5)
grid_search.fit(X_train_scaled, y_train)

# Print the best parameters from GridSearchCV

print("Best parameters found: ", grid_search.best_params_)

# Use the best model found by GridSearchCV for predictions

best_model = grid_search.best_estimator_
# Predict with the tuned model
y_pred_best = best_model.predict(X_test_scaled)

# Calculate Mean Squared Error and R-squared for the best model
mse_best = mean_squared_error(y_test, y_pred_best)
r2_best = r2_score(y_test, y_pred_best)

# Print the results of the best model

print(f'Mean Squared Error (Best Model): {mse_best}')
print(f'R-squared (Best Model): {r2_best}')

# Plot the predicted vs actual values for the best model

plt.scatter(y_test, y_pred_best)
plt.xlabel('Actual Bike Rentals')
plt.ylabel('Predicted Bike Rentals (Best Model)')
plt.title('Actual vs Predicted Bike Rentals (Best Model)')
plt.show()

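# Optional: Save the model and the fitted scaler for future use (e.g., for deployment).
# The model was trained on scaled features, so new inputs must pass through the same scaler.
import joblib
joblib.dump(best_model, 'best_bike_sharing_model.pkl')
joblib.dump(scaler, 'bike_sharing_scaler.pkl') # file name chosen here for illustration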

Explanation of the Code:
1. Imports:
o Libraries such as pandas, scikit-learn, matplotlib, and joblib are used to process data,
train models, evaluate, and visualize results.
2. Data Loading:
o The dataset is loaded from a CSV file, and the features (X) and target variable (y) are
extracted.
3. Data Preprocessing:
o The data is split into training and testing sets (80% for training and 20% for testing).
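o The features are standardized using StandardScaler. Tree-based models such as Random Forest
are insensitive to feature scale, so this step matters mainly if the model is later swapped
for a scale-sensitive one (e.g., a regularized linear model).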
4. Model Training:
o A Random Forest Regressor model is trained on the scaled training data.
o Predictions are made on the test set.
5. Model Evaluation:
o The model’s performance is evaluated using Mean Squared Error (MSE) and R-squared (R²).
MSE is the average squared prediction error; R² is the fraction of the target's variance
that the model explains.
6. Visualization:
o A scatter plot visualizes the predicted values vs. the actual values, helping assess how
well the model performs.
7. Hyperparameter Tuning (GridSearchCV):
o GridSearchCV is used to find the best hyperparameters for the Random Forest model.
It searches through different combinations of hyperparameters like the number of
trees (n_estimators), maximum depth of trees (max_depth), and minimum samples to
split a node (min_samples_split).
o GridSearchCV refits the best parameter combination on the full training set (its default
behavior), and that model's performance is then evaluated on the test set.
8. Saving the Model:
o The trained and tuned model is saved using joblib for future use, such as deployment
in a real-time prediction system.
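
The preprocessing step above fits the scaler on the training split only and reuses those statistics for the test split. What StandardScaler does to each column can be sketched with plain NumPy; the arrays below are made-up values for a single feature, not project data.

```python
import numpy as np

# Hypothetical training and test values for one feature (e.g., temperature)
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[25.0]])

# "fit": learn the column mean and population standard deviation from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)

# "transform": apply the training statistics to both splits
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.ravel(), test_scaled.ravel())
```

Using the training statistics for the test split keeps information about the test data from leaking into the model.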

Future Advancements:
1. Cross-Validation:
o Use cross_val_score to perform cross-validation to get a better estimate of model
performance.

2. Advanced Models:
o Consider using gradient-boosting libraries such as XGBoost or LightGBM, which often
outperform a default Random Forest on tabular regression tasks.
3. Time Series Modeling:
o Incorporate time series forecasting techniques (e.g., ARIMA, LSTM) to predict bike
rentals based on time-related patterns.
4. Real-Time Data Integration:
o Integrate real-time data (e.g., weather, traffic conditions) into the model to provide
up-to-date predictions for bike-sharing demand.
5. Deployment:
o Deploy the trained model using a Flask or FastAPI web application for real-time
prediction.
6. Automated Retraining:
o Set up a pipeline for automated retraining of the model as new data becomes
available to keep the model's predictions up-to-date.
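
The cross-validation idea from item 1 can be sketched without scikit-learn. The example below is an illustration with plain NumPy and an ordinary least-squares model rather than the Random Forest used in the project; the function k_fold_r2 and the toy data are assumptions made for the sketch.

```python
import numpy as np

def k_fold_r2(X, y, k=5, seed=42):
    """Return the R² score of a least-squares fit on each of k held-out folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    folds = np.array_split(indices, k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Add an intercept column and fit on the training folds only
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Score the fit on the held-out fold
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        pred = A_test @ coef
        ss_res = np.sum((y[test_idx] - pred) ** 2)
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        scores.append(1 - ss_res / ss_tot)
    return np.array(scores)

# Toy data: rentals rise linearly with temperature, plus noise
rng = np.random.default_rng(0)
temperature = rng.uniform(5, 30, size=200)
rentals = 100 + 2 * temperature + rng.normal(0, 5, size=200)

scores = k_fold_r2(temperature.reshape(-1, 1), rentals, k=5)
print(scores.mean(), scores.std())
```

The spread of the per-fold scores gives a sense of how much a single train/test split could over- or understate performance.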
CODE IMPLEMENTATION:

import pandas as pd
import numpy as np

# Define a function to get user input and save the data to CSV
def get_user_input():
    # Get input from the user
    hour = int(input("Enter the hour of the day (0-23): "))
    temperature = float(input("Enter the temperature (in Celsius): "))
    humidity = float(input("Enter the humidity (%): "))
    windspeed = float(input("Enter the windspeed (in m/s): "))
    holiday = int(input("Is it a holiday? (1 for Yes, 0 for No): "))
    working_day = int(input("Is it a working day? (1 for Yes, 0 for No): "))
    bikes_rented = float(input("Enter the number of bikes rented: "))

    # Create a new record for the entered data

    new_data = {
        'hour': hour,
        'temperature': temperature,
        'humidity': humidity,
        'windspeed': windspeed,
        'holiday': holiday,
        'working_day': working_day,
        'bikes_rented': bikes_rented
    }

    # Create a DataFrame for the new record

    new_df = pd.DataFrame([new_data])

    # Check if the CSV file already exists

    try:
        # If it exists, append the new data to it
        existing_df = pd.read_csv('bike_sharing_data.csv')
        updated_df = pd.concat([existing_df, new_df], ignore_index=True)
        updated_df.to_csv('bike_sharing_data.csv', index=False)
    except FileNotFoundError:
        # If the CSV doesn't exist, create a new one with the new data
        new_df.to_csv('bike_sharing_data.csv', index=False)

    print("Data has been saved to 'bike_sharing_data.csv'.")

# Call the function to get user input and save data

get_user_input()

Explanation of the Code:

1. Importing Libraries:
o pandas: Used for data manipulation and reading/writing CSV files.
o numpy: Although not used in this script, it's commonly used for numerical operations
and could be useful for future extensions.
2. Defining the get_user_input() Function:
o This function collects data from the user regarding bike rental conditions and saves it
to a CSV file.
3. Getting User Input:
o The program asks the user to input several details related to the bike rental conditions,
such as:
▪ hour: Hour of the day (0-23).
▪ temperature: Temperature in Celsius.
▪ humidity: Humidity in percentage.
▪ windspeed: Windspeed in meters per second.
▪ holiday: Whether it's a holiday (1 for Yes, 0 for No).
▪ working_day: Whether it's a working day (1 for Yes, 0 for No).
▪ bikes_rented: Number of bikes rented (this is the target variable).
4. Creating a Data Dictionary:
o A dictionary new_data is created where each key corresponds to a feature (hour,
temperature, humidity, etc.) and its corresponding value is the user's input.
5. Converting Dictionary to DataFrame:
o The dictionary is then converted into a pandas DataFrame new_df, making it easier to
manipulate and append to an existing dataset.
6. Saving Data to CSV:
o The program checks if the bike_sharing_data.csv file already exists:
▪ If the file exists, it reads the existing data into a DataFrame, appends the new
data (new_df), and writes it back to the CSV.
▪ If the file does not exist, it creates a new CSV file with the input data.
7. Confirmation Message:
o After saving the data, the program prints a message confirming that the data has been
saved to the bike_sharing_data.csv file.
8. Calling the Function:
o Finally, the get_user_input() function is called to execute the program and prompt the
user for inputs.
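
The function described above stores whatever the user types, as long as it parses as a number. A minimal sketch of range checks that could run before the record is saved is shown below; the helper validate_record and its specific limits are assumptions for illustration, not part of the original script.

```python
def validate_record(record):
    """Return a list of problems found in one rental record (empty if it is valid)."""
    problems = []
    if not 0 <= record["hour"] <= 23:
        problems.append("hour must be between 0 and 23")
    if not 0 <= record["humidity"] <= 100:
        problems.append("humidity must be between 0 and 100")
    if record["windspeed"] < 0:
        problems.append("windspeed cannot be negative")
    for flag in ("holiday", "working_day"):
        if record[flag] not in (0, 1):
            problems.append(f"{flag} must be 0 or 1")
    if record["bikes_rented"] < 0:
        problems.append("bikes_rented cannot be negative")
    return problems

# One valid record and one with two deliberate mistakes
good = {"hour": 8, "temperature": 18.5, "humidity": 60.0, "windspeed": 3.2,
        "holiday": 0, "working_day": 1, "bikes_rented": 140.0}
bad = dict(good, hour=25, holiday=2)

print(validate_record(good))
print(validate_record(bad))
```

Returning a list of messages, rather than raising on the first problem, lets the program report every issue to the user at once.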

Future Advancements:
1. Data Validation:
o Implement checks to ensure the user inputs are within valid ranges, e.g., ensuring the
hour is between 0 and 23.
2. User Interface (UI):
o Consider creating a graphical user interface (GUI) using libraries like Tkinter or
PyQt for easier data input or even a web interface with Flask or Django.
3. Automated Data Aggregation:
o Add functionality to automatically aggregate data by time (e.g., average bike rentals
per day, per hour) for better data analysis.
4. Real-time Data Integration:
o Incorporate real-time data sources, such as weather APIs or bike-sharing systems, to
collect data automatically without manual input.
5. Data Preprocessing:
o Introduce preprocessing steps like normalization, encoding categorical features, or
handling missing values to prepare the dataset for predictive modeling.
6. Predictive Modeling:
o With enough data, you could apply machine learning algorithms (e.g., Random
Forest, Linear Regression, Time-series forecasting) to predict future bike rentals
based on input features.

7. Visualization:
o Visualize the data using plotting libraries like matplotlib or seaborn to identify
trends in bike rentals based on time, temperature, and other factors.
8. Data Storage and Security:
o For large datasets, consider moving from a CSV file to a more scalable database
solution (e.g., SQLite, PostgreSQL) for better performance and security.
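
Items 3 and 8 above can be combined in a small sketch: storing records in SQLite and letting the database aggregate them by hour. This uses Python's built-in sqlite3 module; the table name rentals and the three sample rows (rounded, illustrative values) are assumptions.

```python
import sqlite3

# An in-memory database for the sketch; pass a file path for persistent storage
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rentals (
        hour INTEGER, temperature REAL, humidity REAL, windspeed REAL,
        holiday INTEGER, working_day INTEGER, bikes_rented REAL
    )
""")

# Illustrative rows in the same column order as the CSV file
rows = [
    (6, 7.9, 62.1, 14.5, 0, 0, 192.8),
    (19, 20.3, 44.1, 7.8, 1, 0, 186.2),
    (6, 8.9, 97.8, 11.6, 0, 1, 147.0),
]
conn.executemany("INSERT INTO rentals VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
conn.commit()

# Average rentals per hour, computed by the database
hourly = conn.execute(
    "SELECT hour, AVG(bikes_rented) FROM rentals GROUP BY hour ORDER BY hour"
).fetchall()
print(hourly)
conn.close()
```

Parameterized queries (the ? placeholders) also avoid building SQL strings from user input.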

Code (plots the average rentals per hour; assumes pandas is already imported as pd):
data = pd.read_csv('bike_sharing_data.csv')
data.groupby('hour')['bikes_rented'].mean().plot()

CODE IMPLEMENTATION:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Load the dataset

data = pd.read_csv('bike_sharing_data.csv')

# Extract relevant columns for the plot

hours = data['hour']
temperature = data['temperature']
humidity = data['humidity']
bikes_rented = data['bikes_rented']

# Create a 3D scatter plot

fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')

# Scatter plot with hour, temperature, and humidity as axes

# Color by the number of bikes rented (using a colormap)
scatter = ax.scatter(hours, temperature, humidity, c=bikes_rented, cmap='viridis', s=30)

# Labels and title

ax.set_xlabel('Hour of the Day')
ax.set_ylabel('Temperature (°C)')
ax.set_zlabel('Humidity (%)')
ax.set_title('3D Plot of Bike Rentals')

# Add a color bar to indicate bike rentals

cbar = plt.colorbar(scatter)
cbar.set_label('Bikes Rented')

# Show the plot

plt.show()

CODE IMPLEMENTATION:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set a random seed for reproducibility

np.random.seed(42)

# Number of samples in the dataset

num_samples = 1000

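# Create synthetic features (same ranges as in the first script)

hours = np.random.randint(0, 24, size=num_samples) # Hours of the day (0-23)
temperature = np.random.uniform(5, 30, size=num_samples) # Temperature between 5°C and 30°C
humidity = np.random.uniform(40, 100, size=num_samples) # Humidity between 40% and 100%
windspeed = np.random.uniform(0, 15, size=num_samples) # Windspeed between 0 and 15 m/s
holiday = np.random.choice([0, 1], size=num_samples) # 0 or 1, for holiday (binary)
working_day = np.random.choice([0, 1], size=num_samples) # 0 or 1, for working day (binary)

# Generate target variable (bike rentals)

# This is a synthetic formula for the number of bikes rented
bikes_rented = (
    100 + 20 * np.sin(np.pi * hours / 12) + 2 * temperature + 0.5 * humidity
    - 0.3 * windspeed + 50 * holiday - 30 * working_day + np.random.normal(0, 30, num_samples)
)

# Create the DataFrame

data = pd.DataFrame({
    'hour': hours,
    'temperature': temperature,
    'humidity': humidity,
    'windspeed': windspeed,
    'holiday': holiday,
    'working_day': working_day,
    'bikes_rented': bikes_rented
})

# Add a 'datetime' column derived from the 'hour' column

# Every timestamp falls on the base date, 2023-01-01, which is a Sunday
base_date = pd.to_datetime('2023-01-01') # Start from January 1st, 2023
data['datetime'] = base_date + pd.to_timedelta(data['hour'], unit='h')

# Add 'weekday' column

data['weekday'] = data['datetime'].dt.weekday # 0 = Monday, 6 = Sunday (always 6 here)

# Add an 'epoch' column (number of seconds since the Unix epoch, 1970-01-01)

# Dividing the offset from the epoch by one second works for any datetime resolution
data['epoch'] = (data['datetime'] - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)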

# Display first few rows of the modified dataset

print(data.head())

# Save the updated dataset to a CSV file

data.to_csv('updated_bike_sharing_data.csv', index=False)
OUTPUT:
hour temperature humidity windspeed holiday working_day \
0 6 7.870921 62.052065 14.452489 0 0
1 19 20.265501 44.090339 7.785484 1 0
2 14 12.215764 41.548714 10.163134 0 0
3 10 19.530956 48.109977 4.677958 0 0
4 7 8.859068 97.786907 11.609871 0 1
bikes_rented datetime weekday epoch
0 192.811250 2023-01-01 06:00:00 6 1672552800
1 186.177394 2023-01-01 19:00:00 6 1672599600
2 229.449734 2023-01-01 14:00:00 6 1672581600
3 240.951000 2023-01-01 10:00:00 6 1672567200
4 147.003672 2023-01-01 07:00:00 6 1672556400

EXPLANATION:
Synthetic Data Generation: It creates synthetic bike rental data based on features such as hour,
temperature, humidity, windspeed, holiday status, and working day status.
Adding weekday Column: It adds a column representing the day of the week (0 = Monday, 6 = Sunday).
Because every timestamp falls on the base date 2023-01-01, a Sunday, the value is 6 in every row.
Adding epoch Column: It adds a column with the number of seconds since the Unix epoch (1970-
01-01), based on the datetime column.
Saving the Data: Finally, it saves the updated dataset to a CSV file
(updated_bike_sharing_data.csv).
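
The weekday and epoch values in the output above can be checked with the standard library alone. The sketch below reproduces the first row (hour 6), assuming the timestamps are interpreted as UTC, which is how the epoch conversion treats timezone-naive values.

```python
from datetime import datetime, timedelta, timezone

# Same base date as in the script, made explicitly UTC for the epoch calculation
base_date = datetime(2023, 1, 1, tzinfo=timezone.utc)
timestamp = base_date + timedelta(hours=6)

weekday = timestamp.weekday()       # Monday = 0, ..., Sunday = 6
epoch = int(timestamp.timestamp())  # seconds since 1970-01-01 00:00:00 UTC

print(weekday, epoch)
```

Both values match the first output row: weekday 6 (Sunday) and epoch 1672552800.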
CODE IMPLEMENTATION:
import pandas as pd
import numpy as np
from datetime import timedelta

# Set a random seed for reproducibility

np.random.seed(42)

# Bike companies
bike_companies = ['Hero', 'Honda', 'Pulsar', 'Bajaj', 'TVS Scooty', 'Honda Activa', 'Elecfasion']

# Generate synthetic features for the dataset

num_samples = 1000
hours = np.random.randint(0, 24, size=num_samples)
temperature = np.random.uniform(5, 30, size=num_samples)
humidity = np.random.uniform(40, 100, size=num_samples)
windspeed = np.random.uniform(0, 15, size=num_samples)
holiday = np.random.choice([0, 1], size=num_samples)
working_day = np.random.choice([0, 1], size=num_samples)

# Generate target variable (bike rentals)

bikes_rented = (
    100 + 20 * np.sin(np.pi * hours / 12) + 2 * temperature + 0.5 * humidity
    - 0.3 * windspeed + 50 * holiday - 30 * working_day + np.random.normal(0, 30, num_samples)
)

# Create the DataFrame

data = pd.DataFrame({
    'hour': hours,
    'temperature': temperature,
    'humidity': humidity,
    'windspeed': windspeed,
    'holiday': holiday,
    'working_day': working_day,
    'bikes_rented': bikes_rented
})

# Add a 'datetime' column for date handling

base_date = pd.to_datetime('2023-01-01')
data['datetime'] = base_date + pd.to_timedelta(data['hour'], unit='h')

# Add 'weekday' column

data['weekday'] = data['datetime'].dt.weekday

# Add 'epoch' column (number of seconds since Unix epoch)

data['epoch'] = (data['datetime'] - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)

# Adding the new columns

# Bike Name - build 100 names per company, then sample from them
bike_names = []
for company in bike_companies:
    bike_names += [f'{company}-{i}' for i in range(1, 101)] # Creating 100 bikes per company

# np.random.choice samples with replacement, so the 700 names are reused across the 1000 rows

data['bike_name'] = np.random.choice(bike_names, size=num_samples)

# Bike Company - drawn independently of bike_name, so the two columns can disagree
# (e.g., bike 'Bajaj-93' may be assigned the company 'Honda Activa')

data['bike_company'] = np.random.choice(bike_companies, size=num_samples)

# Enrolling Date (random date between January 2020 and January 2023)
enrollment_dates = pd.to_datetime(np.random.choice(
    pd.date_range('2020-01-01', '2023-01-01', freq='D'), size=num_samples))
data['enrollment_date'] = enrollment_dates
# Next servicing date (a random 90 to 119 days after the enrollment date, roughly 3 months)
data['next_servicing'] = data['enrollment_date'] + pd.to_timedelta(
    np.random.randint(90, 120, size=num_samples), unit='D')

# Owner Address (random synthetic addresses)

addresses = ['Address A', 'Address B', 'Address C', 'Address D', 'Address E']
data['owner_address'] = np.random.choice(addresses, size=num_samples)

# Display the first few rows of the updated dataset

print(data.head())

# Save the updated dataset to a CSV file

data.to_csv('updated_bike_sharing_with_bike_details.csv', index=False)

OUTPUT:
hour temperature humidity windspeed holiday working_day \
0 6 7.870921 62.052065 14.452489 0 0
1 19 20.265501 44.090339 7.785484 1 0
2 14 12.215764 41.548714 10.163134 0 0
3 10 19.530956 48.109977 4.677958 0 0
4 7 8.859068 97.786907 11.609871 0 1

bikes_rented datetime weekday epoch bike_name \

0 192.811250 2023-01-01 06:00:00 6 1672552800 Bajaj-93
1 186.177394 2023-01-01 19:00:00 6 1672599600 Hero-85
2 229.449734 2023-01-01 14:00:00 6 1672581600 Pulsar-67
3 240.951000 2023-01-01 10:00:00 6 1672567200 Hero-69
4 147.003672 2023-01-01 07:00:00 6 1672556400 Elecfasion-11

bike_company enrollment_date next_servicing owner_address

0 Honda Activa 2020-02-25 2020-06-14 Address D
1 TVS Scooty 2021-07-30 2021-11-03 Address A
2 TVS Scooty 2020-09-24 2021-01-16 Address D
3 Hero 2020-03-09 2020-07-05 Address B
4 Honda 2022-06-20 2022-10-16 Address C
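
In the output above, bike_name and bike_company do not agree (for example, Bajaj-93 is listed under Honda Activa) because the script draws the two columns independently. If consistent columns are wanted, one option is to derive the company from the bike name. A minimal sketch with pandas, using a few made-up names:

```python
import pandas as pd

# Hypothetical bike names in the same '<company>-<number>' format as the script
bikes = pd.DataFrame({
    "bike_name": ["Bajaj-93", "Hero-85", "TVS Scooty-7", "Honda Activa-12"]
})

# Split on the last hyphen only, so multi-word company names stay intact
bikes["bike_company"] = bikes["bike_name"].str.rsplit("-", n=1).str[0]
print(bikes)
```

Applying this to the real script would change the generated dataset, so the outputs shown in this document would no longer match.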
CODE IMPLEMENTATION:
import pandas as pd

# Load the updated dataset

data = pd.read_csv('updated_bike_sharing_with_bike_details.csv')

# Group by bike company and calculate the average number of bikes rented
average_rentals = data.groupby('bike_company')['bikes_rented'].mean()

# Find the company with the maximum average bike rentals

best_company = average_rentals.idxmax()

# Get the average number of rentals for the best company

best_company_avg_rentals = average_rentals.max()

# Display the result

print(f"The best bike company based on the highest average bike rentals is: {best_company}")
print(f"With an average of {best_company_avg_rentals:.2f} bikes rented.")

OUTPUT:

The best bike company based on the highest average bike rentals is: Honda Activa
With an average of 187.26 bikes rented.

EXPLANATION:
Load the dataset: Reads the CSV file updated_bike_sharing_with_bike_details.csv that contains
the bike-sharing data.
Group by bike company: It groups the data by bike_company and calculates the mean number of
bikes rented for each company.
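Identify the best company: Finds the bike company with the highest average number of bikes
rented using idxmax() and max(). Because bike_company is assigned at random, independently of
bikes_rented, the differences between company averages here reflect sampling noise rather than
a real effect.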
Display results: Prints out the name of the best company and its average rentals, formatted to two
decimal places.
CODE IMPLEMENTATION:
import pandas as pd
import numpy as np
from datetime import timedelta

# Set a random seed for reproducibility

np.random.seed(42)

# Bike companies
bike_companies = ['Hero', 'Honda', 'Pulsar', 'Bajaj', 'TVS Scooty', 'Honda Activa', 'Elecfasion']

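# Generate synthetic features for the dataset (same as in the previous script)

num_samples = 1000
hours = np.random.randint(0, 24, size=num_samples)
temperature = np.random.uniform(5, 30, size=num_samples)
humidity = np.random.uniform(40, 100, size=num_samples)
windspeed = np.random.uniform(0, 15, size=num_samples)
holiday = np.random.choice([0, 1], size=num_samples)
working_day = np.random.choice([0, 1], size=num_samples)

# Generate target variable (bike rentals)

bikes_rented = (
    100 + 20 * np.sin(np.pi * hours / 12) + 2 * temperature + 0.5 * humidity
    - 0.3 * windspeed + 50 * holiday - 30 * working_day + np.random.normal(0, 30, num_samples)
)

# Create the DataFrame

data = pd.DataFrame({
    'hour': hours,
    'temperature': temperature,
    'humidity': humidity,
    'windspeed': windspeed,
    'holiday': holiday,
    'working_day': working_day,
    'bikes_rented': bikes_rented
})

# Add a 'datetime' column for date handling

base_date = pd.to_datetime('2023-01-01')
data['datetime'] = base_date + pd.to_timedelta(data['hour'], unit='h')

# Add 'weekday' column

data['weekday'] = data['datetime'].dt.weekday

# Add 'epoch' column (number of seconds since Unix epoch)

data['epoch'] = (data['datetime'] - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)

# Adding the new columns

# Bike Name - build 100 names per company, then sample from them (with replacement)
bike_names = []
for company in bike_companies:
    bike_names += [f'{company}-{i}' for i in range(1, 101)]

data['bike_name'] = np.random.choice(bike_names, size=num_samples)

# Bike Company - Random selection from the provided companies

data['bike_company'] = np.random.choice(bike_companies, size=num_samples)

# Enrolling Date and next servicing date (as in the previous script)
data['enrollment_date'] = pd.to_datetime(np.random.choice(
    pd.date_range('2020-01-01', '2023-01-01', freq='D'), size=num_samples))
data['next_servicing'] = data['enrollment_date'] + pd.to_timedelta(
    np.random.randint(90, 120, size=num_samples), unit='D')

# Owner Address (random synthetic addresses)

addresses = ['Address A', 'Address B', 'Address C', 'Address D', 'Address E']
data['owner_address'] = np.random.choice(addresses, size=num_samples)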

# Display the first few rows of the updated dataset

print(data.head())

# Save the updated dataset to a CSV file

data.to_csv('updated_bike_sharing_with_bike_details.csv', index=False)

Output:

hour temperature humidity windspeed holiday working_day \

0 6 7.870921 62.052065 14.452489 0 0
1 19 20.265501 44.090339 7.785484 1 0
2 14 12.215764 41.548714 10.163134 0 0
3 10 19.530956 48.109977 4.677958 0 0
4 7 8.859068 97.786907 11.609871 0 1

bikes_rented datetime weekday epoch bike_name \

bike_company enrollment_date next_servicing owner_address

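# Plot the average bike rentals for each company

import matplotlib.pyplot as plt

# Recompute the per-company averages so this snippet does not depend on an earlier script
average_rentals = data.groupby('bike_company')['bikes_rented'].mean()

plt.figure(figsize=(10, 6))
average_rentals.sort_values().plot(kind='bar', color='skyblue')
plt.title('Average Bike Rentals by Company')
plt.xlabel('Bike Company')
plt.ylabel('Average Number of Bikes Rented')
plt.xticks(rotation=45)
plt.show()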

CODE IMPLEMENTATION:
import os
import pandas as pd

# Load the dataset

data = pd.read_csv('updated_bike_sharing_with_bike_details.csv')

# Function to interact with the user and fetch the relevant bike rental information
def get_rental_info():
    # Ask the user to provide some criteria for filtering
    print("Available bike companies: Hero, Honda, Pulsar, Bajaj, TVS Scooty, Honda Activa, Elecfasion")
    company_name = input("Enter the bike company name to see rentals (or type 'exit' to quit): ").strip()

    if company_name.lower() == 'exit':
        return

    # Filter data by the entered company name; copy so new columns can be added safely

    filtered_data = data[data['bike_company'].str.lower() == company_name.lower()].copy()

    if filtered_data.empty:
        print("No data found for this company. Please check the name and try again.")
        return

    print(f"\nDisplaying rental information for {company_name}:\n")

    print(filtered_data[['bike_name', 'hour', 'bikes_rented', 'temperature', 'humidity']].head())

    # Ask for feedback on the service

    feedback = input("\nPlease provide your feedback on the service "
                     "(e.g., 'Good service', 'Needs improvement', etc.): ").strip()
    rating = input("Rate the service (1-5, with 5 being the best): ").strip()

    # Attach the feedback to the copied rows (no SettingWithCopyWarning on a copy)

    filtered_data['feedback'] = feedback
    filtered_data['rating'] = rating

    # Append the feedback to a CSV file, writing the header only when the file is new

    feedback_file = 'bike_rental_feedback.csv'
    filtered_data[['bike_name', 'bike_company', 'hour', 'bikes_rented', 'feedback', 'rating']].to_csv(
        feedback_file, mode='a', header=not os.path.exists(feedback_file), index=False)

    print("\nThank you for your feedback! It has been recorded.")

# Loop to interact with the user until they want to exit

while True:
    get_rental_info()
    continue_input = input("\nDo you want to check feedback for another bike company? (yes/no): ").strip().lower()
    if continue_input != 'yes':
        print("Thank you! Have a great day.")
        break
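Output (this transcript comes from a variant of the script that prompts for a bike name instead of a company name):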
Available bike companies: Hero, Honda, Pulsar, Bajaj, TVS Scooty, Honda Activa,
Elecfasion
Enter the bike name (e.g., Hero-85, Bajaj-93, etc.): 85]
No data found for the bike: 85]. Please check the name and try again.

Do you want to check feedback for another bike? (yes/no): yes

Available bike companies: Hero, Honda, Pulsar, Bajaj, TVS Scooty, Honda Activa,
Elecfasion
Enter the bike name (e.g., Hero-85, Bajaj-93, etc.): Hero-85

Displaying rental information for bike Hero-85:

hour temperature humidity windspeed holiday working_day \

1 19 20.265501 44.090339 7.785484 1 0
523 13 22.012483 56.891272 0.029882 1 0

bikes_rented datetime weekday epoch bike_name \

1 186.177394 2023-01-01 19:00:00 6 1672599600 Hero-85
523 206.866187 2023-01-01 13:00:00 6 1672578000 Hero-85

bike_company enrollment_date next_servicing owner_address

1 TVS Scooty 2021-07-30 2021-11-03 Address A
523 Pulsar 2022-04-07 2022-07-23 Address B

Please provide your feedback on the service (e.g., 'Good service', 'Needs improvement'): 1
Rate the service (1-5, with 5 being the best): 5

Thank you for your feedback! It has been recorded.

Experiment 8: Aim: Objective: Tools Used: Theory
No ratings yet
Experiment 8: Aim: Objective: Tools Used: Theory
10 pages
TD
No ratings yet
TD
4 pages
Project Template AICTE
No ratings yet
Project Template AICTE
11 pages
New Synopsis
No ratings yet
New Synopsis
3 pages
ML Week 15
No ratings yet
ML Week 15
6 pages
Assignment: Regression: Problem Statement
No ratings yet
Assignment: Regression: Problem Statement
3 pages
Aiml - 4351601
No ratings yet
Aiml - 4351601
60 pages
Your First Neural Network
No ratings yet
Your First Neural Network
15 pages
Bike Sharing Prediction
No ratings yet
Bike Sharing Prediction
19 pages
06 DT BikeShareData
No ratings yet
06 DT BikeShareData
2 pages
Lab 1. Boston House
No ratings yet
Lab 1. Boston House
7 pages
RESHMA Internship Summary Report 1
No ratings yet
RESHMA Internship Summary Report 1
22 pages
MLLABDSA
No ratings yet
MLLABDSA
16 pages
Flight Price Prediction Report
No ratings yet
Flight Price Prediction Report
18 pages
Bike Data
No ratings yet
Bike Data
1 page
Big Data Practical
No ratings yet
Big Data Practical
20 pages
BIke Sharing Dataset Assignment PDF
No ratings yet
BIke Sharing Dataset Assignment PDF
2 pages
Bike Rental R
No ratings yet
Bike Rental R
13 pages
Bike Rental
No ratings yet
Bike Rental
12 pages
Project - Template - MS AI
No ratings yet
Project - Template - MS AI
11 pages
Bike Rental (Project)
No ratings yet
Bike Rental (Project)
16 pages
A1 Siddhant's Resume
No ratings yet
A1 Siddhant's Resume
1 page
843 AI Projects Cookbook
No ratings yet
843 AI Projects Cookbook
43 pages
Main Report
No ratings yet
Main Report
94 pages
144-Statistical Analysis of Imbalanced Classification With Training Size Variation and Subsampling On Datasets of Research Papers in Biomedical Literature
No ratings yet
144-Statistical Analysis of Imbalanced Classification With Training Size Variation and Subsampling On Datasets of Research Papers in Biomedical Literature
26 pages
Unit1 (Complete)
No ratings yet
Unit1 (Complete)
111 pages
PR LabManual Anurag
No ratings yet
PR LabManual Anurag
21 pages
Drawing DeepSeek R1 Architecture and Training Process From Scratch - by Fareed Khan - Feb, 2025 - Level Up Coding
No ratings yet
Drawing DeepSeek R1 Architecture and Training Process From Scratch - by Fareed Khan - Feb, 2025 - Level Up Coding
39 pages
IE5005 Lecture 04
No ratings yet
IE5005 Lecture 04
57 pages
1232-Article Text-2726-2-10-20240615
No ratings yet
1232-Article Text-2726-2-10-20240615
22 pages
Association Rule Mining For Healthcare Data Analysis
No ratings yet
Association Rule Mining For Healthcare Data Analysis
16 pages
Interpretability and Algorithmic Fairness DSB 2023 Slides
No ratings yet
Interpretability and Algorithmic Fairness DSB 2023 Slides
273 pages
A Self-Organizing Deep Network Architecture Designed Based On LSTM Network Via Elitism-Driven Roulette-Wheel Selection For Time-Series Forecasting
No ratings yet
A Self-Organizing Deep Network Architecture Designed Based On LSTM Network Via Elitism-Driven Roulette-Wheel Selection For Time-Series Forecasting
17 pages
CS550 Lec7-ClassificationIntro
No ratings yet
CS550 Lec7-ClassificationIntro
49 pages
iWCE 2024 01
No ratings yet
iWCE 2024 01
10 pages
AI-Driven Risk Modeling in Life Insurance: Advanced Techniques For Mortality and Longevity Prediction
No ratings yet
AI-Driven Risk Modeling in Life Insurance: Advanced Techniques For Mortality and Longevity Prediction
31 pages
Csemp - 152 (Interim Review - 1 For Major Project Stage 2)
No ratings yet
Csemp - 152 (Interim Review - 1 For Major Project Stage 2)
11 pages
YOLO (You Only Look Once)
No ratings yet
YOLO (You Only Look Once)
4 pages
Project Report
No ratings yet
Project Report
30 pages
Fingerprinting Attack On Tor Anonymity U
No ratings yet
Fingerprinting Attack On Tor Anonymity U
6 pages
Training Verifiers To Solve Math Word Problems
No ratings yet
Training Verifiers To Solve Math Word Problems
22 pages
49 - Detection - of - Covid-19 - Cases - From - Chest - X-Ray - Image
No ratings yet
49 - Detection - of - Covid-19 - Cases - From - Chest - X-Ray - Image
9 pages
1 s2.0 S0301420722005529 Main
No ratings yet
1 s2.0 S0301420722005529 Main
15 pages
UNIT-1 Polynomial Regression
No ratings yet
UNIT-1 Polynomial Regression
7 pages
Assignment 2
No ratings yet
Assignment 2
3 pages
Iclr2022 Should We Replace Cnns With TR
No ratings yet
Iclr2022 Should We Replace Cnns With TR
15 pages
Conference Template
No ratings yet
Conference Template
5 pages
Immunocto - A Massive Immune Cell Database Auto-Generated For Histopathology
No ratings yet
Immunocto - A Massive Immune Cell Database Auto-Generated For Histopathology
9 pages
Seismic Data Interpolation Using Dual-Domain Conditional Generative Adversarial Networks
No ratings yet
Seismic Data Interpolation Using Dual-Domain Conditional Generative Adversarial Networks
6 pages
Bi 8
No ratings yet
Bi 8
3 pages
Classification in R
No ratings yet
Classification in R
5 pages
Essential n8n Playbook
From Everand
Essential n8n Playbook
Leandro Calado
No ratings yet
Data Science Programming In Python
From Everand
Data Science Programming In Python
Anita Raichand
No ratings yet