Group 1 ML Project.ipynb - Colab


The company proposed two goals. Both tasks, correctly carried out, will give you full points; the two tasks carry equal weight.

1. Identify 3 bike stations that could be removed or substantially reduced in size, and identify areas for 3 new bike stations. Simulate
the time-course for the new bike stations. Quantify your result: you should make a case that your solution may actually improve the user
experience. (A toy simulation sketch follows this list.)
2. Build an algorithm that can warn DublinBikes when a bike station needs to be refilled or emptied. This is not as simple as refilling all
stations that don't have bikes. Instead, the task involves anticipating whether a station is going to be empty in the near future (e.g., in
the next 30 minutes, or the next hour). This will require machine learning. You will need to define appropriate evaluation metrics and to
justify your reasoning. There are many angles from which to tackle this task. Each group is asked to propose one solution. (An illustrative
classification sketch appears after the tips below.)
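As a toy illustration of the simulation asked for in item 1 (not the method implemented in this notebook), a proposed station's hourly profile could be approximated from its neighbours. Everything in this sketch is an assumption for illustration: the helper name simulate_new_station, the inverse-distance weighting, and the crude planar distance in degrees.

# Toy sketch (assumptions: hypothetical helper name, inverse-distance weighting,
# planar distance in degrees - crude but adequate at city scale)
import numpy as np
import pandas as pd

def simulate_new_station(lat, lon, coords, profiles):
    # coords: DataFrame indexed by station ID with LATITUDE/LONGITUDE columns
    # profiles: DataFrame indexed by station ID, one column per hour of day
    d = np.hypot(coords['LATITUDE'] - lat, coords['LONGITUDE'] - lon)
    w = 1.0 / (d + 1e-6)            # inverse-distance weights; epsilon avoids division by zero
    w = w / w.sum()                 # normalise so the weights sum to 1
    return profiles.mul(w, axis=0).sum()  # simulated hourly profile of the new station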

You are free to use classification, regression, clustering, and any other methodology that was covered in class. You are free to use Gemini.
However, your report will have to make it clear that you understand what your code is doing. To do so, your report will need to include
high-level explanations of your methodology in plain English (e.g., "the dataset has XX features; since some of them were not numeric, we
converted them to numerical values; outliers were removed by applying a threshold", etc.).

A few tips: the manager suggested focusing on a subset of the city for simplicity (of course, you are free to do more than that, if you like).
If you plan on combining datasets from different quarters, make sure that the data for each bike station you use is available in all of them.
Missing or bad data points can be a problem, so identifying stations with good data will make your life easier (but feel free to make your
life more complicated if you like the challenge).
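To make item 2 concrete before any data work: it can be framed as binary classification. The sketch below is illustrative only, not this notebook's pipeline; the helper add_empty_soon_label, the EMPTY_SOON column, the feature set, and the random-forest choice are all assumptions. It also assumes readings arrive roughly every 30 minutes (as the timestamps later in the notebook suggest), so shifting one row per station looks about 30 minutes ahead.

# Illustrative Task 2 framing (hypothetical names; assumes ~30-minute sampling)
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

def add_empty_soon_label(df, steps_ahead=1):
    # Label a reading 1 if the station has zero bikes `steps_ahead` readings later
    future = df.groupby('STATION ID')['AVAILABLE_BIKES'].shift(-steps_ahead)
    df = df.assign(EMPTY_SOON=(future == 0).astype(int))
    return df[future.notna()]       # drop rows with no look-ahead available

# labelled = add_empty_soon_label(dataset)
# X = labelled[['STATION ID', 'BIKE_STANDS', 'AVAILABLE_BIKES']]
# y = labelled['EMPTY_SOON']
# X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False)  # keep time order
# clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
# print(classification_report(y_test, clf.predict(X_test)))
# Empty-soon events are rare, so precision and recall (not plain accuracy)
# are the metrics to watch.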

# Import of libraries that are going to be used


%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
from sklearn.model_selection import train_test_split
import seaborn as sns

# Libraries for the Heat Map


import folium
from folium.plugins import HeatMap
from branca.colormap import linear

# Dublin Bike Data of the whole 2023 import & print

from google.colab import drive


drive.mount('/content/drive')

# Load the twelve monthly 2023 CSVs (the printed line is truncated; reconstructed
# here assuming one file per month with the same naming pattern as the first)
files = [f'/content/drive/MyDrive/CSP7000/Group Project/dublinbike-historical-data-2023-{m:02d}.csv' for m in range(1, 13)]
dataset = pd.concat([pd.read_csv(f) for f in files])

print(dataset.columns)
print(len(dataset))
dataset

Mounted at /content/drive
Index(['STATION ID', 'TIME', 'LAST UPDATED', 'NAME', 'BIKE_STANDS',
'AVAILABLE_BIKE_STANDS', 'AVAILABLE_BIKES', 'STATUS', 'ADDRESS',
'LATITUDE', 'LONGITUDE'],
dtype='object')
1994400
   STATION ID                 TIME         LAST UPDATED                NAME  BIKE_STANDS  AVAILABLE_BIKE_STANDS  AVAILABLE_BIKES STATUS             ADDRESS  LATITUDE
0           1  2023-01-01 00:00:03  2022-12-31 23:59:39       CLARENDON ROW           31                     31                0   OPEN       Clarendon Row   53.3409
1           2  2023-01-01 00:00:03  2022-12-31 23:57:48  BLESSINGTON STREET           20                     18                2   OPEN  Blessington Street   53.3568
2           3  2023-01-01 00:00:03  2022-12-31 23:57:10       BOLTON STREET           20                      9               11   OPEN       Bolton Street   53.3512
3           4  2023-01-01 00:00:03  2022-12-31 23:51:39        GREEK STREET           20                      8               12   OPEN        Greek Street   53.3469
4           5  2023-01-01 00:00:03  2022-12-31 23:58:28    CHARLEMONT PLACE           40                     16               24   OPEN   Charlemont Street   53.3307
(LONGITUDE column cut off in the printout)

Task 1


# We start preparing the data: compute the absolute change in 'AVAILABLE_BIKES'
# between consecutive readings of each station. Averaging these changes later
# gives the average daily (weekday) activity profile of each station.
dataset['AVAILABLE_BIKES_CHANGE'] = abs(dataset.groupby('STATION ID')['AVAILABLE_BIKES'].diff())
dataset = dataset.dropna()  # the first reading of each station has no previous value and is dropped
dataset

     STATION ID                 TIME         LAST UPDATED                NAME  BIKE_STANDS  AVAILABLE_BIKE_STANDS  AVAILABLE_BIKES STATUS             ADDRESS  LATITUDE
113           1  2023-01-01 00:30:02  2023-01-01 00:21:23       CLARENDON ROW           31                     31                0   OPEN       Clarendon Row   53.3409
114           2  2023-01-01 00:30:02  2023-01-01 00:28:07  BLESSINGTON STREET           20                     18                2   OPEN  Blessington Street   53.3568
115           3  2023-01-01 00:30:02  2023-01-01 00:20:18       BOLTON STREET           20                      8               12   OPEN       Bolton Street   53.3512
116           4  2023-01-01 00:30:02  2023-01-01 00:21:55        GREEK STREET           20                      8               12   OPEN        Greek Street   53.3469
117           5  2023-01-01 00:30:02  2023-01-01 00:28:45    CHARLEMONT PLACE           40                     16               24   OPEN   Charlemont Street   53.3307
(LONGITUDE column cut off in the printout)

# We remove station 507: it is a TEST station (for maintenance purposes only)


dataset = dataset.loc[dataset['STATION ID']!=507]

# We select months after January to exclude the system update we detected in Tutorial 1
data = dataset.copy()  # create a new copy of the dataset
data['TIME'] = pd.to_datetime(data['TIME'], format="%Y-%m-%d %H:%M:%S")  # parse 'TIME' as datetime
dateMask = data['TIME'].dt.month > 1  # Mask: True for data points recorded after January, False otherwise
data = data[dateMask]  # apply the mask

# Remove columns with information that we don't need for the clustering
data = data.drop(columns=['STATUS', 'ADDRESS', 'LAST UPDATED'])

# Select days from Monday to Friday only
daysMask = data.TIME.dt.weekday < 5  # weekday 0-4 indicates Monday to Friday
weekdays_data = data[daysMask].copy()  # .copy() avoids SettingWithCopyWarning when adding columns

# Keep only the time of day in the 'TIME' column (drop the date)
weekdays_data['HOUR'] = weekdays_data['TIME'].dt.hour
weekdays_data = weekdays_data.assign(TIME = weekdays_data['TIME'].dt.hour + weekdays_data['TIME'].dt.minute/60)
weekdays_data

# Convert relevant columns to numeric, handling errors
numeric_cols = ['AVAILABLE_BIKES', 'BIKE_STANDS', 'AVAILABLE_BIKES_CHANGE']
for col in numeric_cols:
    weekdays_data[col] = pd.to_numeric(weekdays_data[col], errors='coerce')
    # errors='coerce' replaces non-numeric values with NaN

# Average the data by grouping on station ID and hour of the day.
# This produces one row per station and hour; fields such as 'AVAILABLE_BIKES'
# then hold the average across all data points (e.g., all days) for a given
# station and time of the day.
weekday_avg = weekdays_data.groupby(['STATION ID','NAME','HOUR']).agg('mean')

weekday_avg


                                        TIME  BIKE_STANDS  AVAILABLE_BIKE_STANDS  AVAILABLE_BIKES  LATITUDE  LONGITUDE  AVAILABLE_BIKES_CHANGE
STATION ID  NAME               HOUR
1           CLARENDON ROW      0     0.250000         31.0              23.565126         7.415966   53.3409   -6.26250                0.584034
                               1     1.250000         31.0              23.798319         7.180672   53.3409   -6.26250                0.012605
                               2     2.250000         31.0              23.794118         7.184874   53.3409   -6.26250                0.000000
                               3     3.249474         31.0              23.833684         7.147368   53.3409   -6.26250                0.000000
                               4     4.250000         31.0              23.794118         7.184874   53.3409   -6.26250                0.000000
...                                       ...          ...                    ...              ...       ...        ...                     ...
117         HANOVER QUAY EAST  19   19.250529         40.0              31.494715         8.488372   53.3437   -6.23175                1.215645
                               20   20.250529         40.0              31.955603         8.021142   53.3437   -6.23175                0.649
                               21   21.250000         40.0              32.357143         7.619748   53.3437   -6.23175                0.464
                               22   22.250526         40.0              32.640000         7.334737   53.3437   -6.23175                0.421
                               23   23.250000         40.0              32.804622         7.170168   53.3437   -6.23175                0.243
(trailing digits of the last column are cut off in the printout for some rows)

# Reset the index: 'STATION ID', 'NAME' and 'HOUR' become ordinary columns again.
# The index is now a counter (from 0 to the number of rows minus 1).
weekday_avg = weekday_avg.reset_index()
weekday_avg

      STATION ID               NAME  HOUR       TIME  BIKE_STANDS  AVAILABLE_BIKE_STANDS  AVAILABLE_BIKES  LATITUDE  LONGITUDE
0              1      CLARENDON ROW     0   0.250000         31.0              23.565126         7.415966   53.3409   -6.26250
1              1      CLARENDON ROW     1   1.250000         31.0              23.798319         7.180672   53.3409   -6.26250
2              1      CLARENDON ROW     2   2.250000         31.0              23.794118         7.184874   53.3409   -6.26250
3              1      CLARENDON ROW     3   3.249474         31.0              23.833684         7.147368   53.3409   -6.26250
4              1      CLARENDON ROW     4   4.250000         31.0              23.794118         7.184874   53.3409   -6.26250
...          ...                ...   ...        ...          ...                    ...              ...       ...        ...
2731         117  HANOVER QUAY EAST    19  19.250529         40.0              31.494715         8.488372   53.3437   -6.23175
2732         117  HANOVER QUAY EAST    20  20.250529         40.0              31.955603         8.021142   53.3437   -6.23175
(AVAILABLE_BIKES_CHANGE column cut off in the printout)

# We create a new column 'Percent_full'. Note: despite its name, it expresses the
# average *change* in available bikes as a percentage of station capacity (a
# turnover measure), not the fill level (which the commented-out line would give).
# weekday_norm = weekday_avg['AVAILABLE_BIKES'] / weekday_avg['BIKE_STANDS']
weekday_norm = weekday_avg['AVAILABLE_BIKES_CHANGE'] / weekday_avg['BIKE_STANDS']
weekday_avg = weekday_avg.assign(Percent_full = weekday_norm)
weekday_avg['Percent_full'] = weekday_avg['Percent_full'] * 100
weekday_avg


      STATION ID               NAME  HOUR       TIME  BIKE_STANDS  AVAILABLE_BIKE_STANDS  AVAILABLE_BIKES  LATITUDE  LONGITUDE
0              1      CLARENDON ROW     0   0.250000         31.0              23.565126         7.415966   53.3409   -6.26250
1              1      CLARENDON ROW     1   1.250000         31.0              23.798319         7.180672   53.3409   -6.26250
2              1      CLARENDON ROW     2   2.250000         31.0              23.794118         7.184874   53.3409   -6.26250
3              1      CLARENDON ROW     3   3.249474         31.0              23.833684         7.147368   53.3409   -6.26250
4              1      CLARENDON ROW     4   4.250000         31.0              23.794118         7.184874   53.3409   -6.26250
...          ...                ...   ...        ...          ...                    ...              ...       ...        ...
2731         117  HANOVER QUAY EAST    19  19.250529         40.0              31.494715         8.488372   53.3437   -6.23175
2732         117  HANOVER QUAY EAST    20  20.250529         40.0              31.955603         8.021142   53.3437   -6.23175
(AVAILABLE_BIKES_CHANGE and the new Percent_full column are cut off in the printout)
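As a quick arithmetic check of the 'Percent_full' definition against the printed values: Clarendon Row (station 1, 31 stands) shows an average change of about 0.584034 bikes at hour 0, which matches the 1.883979 shown for that station and hour in the pivot table of the next cell.

# Sanity check: change / capacity * 100 for station 1 at hour 0
print(0.584034 / 31 * 100)   # ~1.8840, matching the pivot value 1.883979 below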

# Reshape to get each time of the day in a column (features) and each station in a row (data-points) for Percent full
time_station_percent_full= weekday_avg.pivot(index='STATION ID' , columns='HOUR', values='Percent_full')
print(time_station_percent_full.shape)
time_station_percent_full

(114, 24)
HOUR               0         1         2         3         4         5         6          7          8          9  ...        14        15
STATION ID
1           1.883979  0.040661  0.000000  0.000000  0.000000  1.667118  4.964346   9.507997   6.986988   9.934942  ...  7.259762  5.426146
2           3.088235  0.493697  0.010504  0.010526  0.000000  0.651261  7.747368   9.926471   8.928571   9.065126  ...  6.389474  4.715789
3           3.119748  0.325630  0.063025  0.010526  0.010504  1.186975  4.294737   6.018908   7.163866  12.363445  ...  9.094737  6.842105
4           1.176471  0.094538  0.000000  0.000000  0.000000  0.199580  4.242105   5.892857   6.974790  13.025210  ...  5.694737  5.221053
5           1.953782  0.283613  0.015756  0.000000  0.000000  0.315126  5.952632  10.724790  27.657563  14.185924  ...  4.536842  3.868421
...              ...       ...       ...       ...       ...       ...       ...        ...        ...        ...  ...       ...       ...
113         0.231092  0.015756  0.000000  0.000000  0.000000  0.005252  0.494737   1.654412   2.032563   4.243697  ...  1.826316  1.389474
114         0.651261  0.057773  0.010504  0.000000  0.000000  0.446429  4.984211  13.602941  14.994748  12.988445  ...  3.710526  5.815789
115         1.729692  0.273109  0.000000  0.014035  0.000000  0.084034  2.982456   5.602241   9.593838  17.394958  ...  4.835088  4.294737
116         0.567227  0.070028  0.000000  0.000000  0.000000  0.098039  0.701754   1.582633   2.100840   1.939776  ...  1.305263  1.319298
117         0.477941  0.063025  0.000000  0.000000  0.000000  0.000000  0.652632   2.069328   4.858193  10.099790  ...  1.726316  1.742105

# Same for AVAILABLE_BIKES_CHANGE


time_station_available_bike_change= weekday_avg.pivot(index='STATION ID' , columns='HOUR', values='AVAILABLE_BIKES_CHANGE')
print(time_station_available_bike_change.shape)
time_station_available_bike_change


(114, 24)
HOUR               0         1         2         3         4         5         6         7          8         9  ...        14        15
STATION ID
1           0.584034  0.012605  0.000000  0.000000  0.000000  0.516807  1.538947  2.947479   2.165966  3.079832  ...  2.250526  1.682105
2           0.617647  0.098739  0.002101  0.002105  0.000000  0.130252  1.549474  1.985294   1.785714  1.813025  ...  1.277895  0.943158
3           0.623950  0.065126  0.012605  0.002105  0.002101  0.237395  0.858947  1.203782   1.432773  2.472689  ...  1.818947  1.368421
4           0.235294  0.018908  0.000000  0.000000  0.000000  0.039916  0.848421  1.178571   1.394958  2.605042  ...  1.138947  1.044211
5           0.781513  0.113445  0.006303  0.000000  0.000000  0.126050  2.381053  4.289916  11.063025  5.674370  ...  1.814737  1.547368
...              ...       ...       ...       ...       ...       ...       ...       ...        ...       ...  ...       ...       ...
113         0.092437  0.006303  0.000000  0.000000  0.000000  0.002101  0.197895  0.661765   0.813025  1.697479  ...  0.730526  0.555789
114         0.260504  0.023109  0.004202  0.000000  0.000000  0.178571  1.993684  5.441176   5.997899  5.195378  ...  1.484211  2.326316
115         0.518908  0.081933  0.000000  0.004211  0.000000  0.025210  0.894737  1.680672   2.878151  5.218487  ...  1.450526  1.288421
116         0.170168  0.021008  0.000000  0.000000  0.000000  0.029412  0.210526  0.474790   0.630252  0.581933  ...  0.391579  0.395789
117         0.191176  0.025210  0.000000  0.000000  0.000000  0.000000  0.261053  0.827731   1.943277  4.039916  ...  0.690526  0.696842

# We have decided to look at peak hours in the morning and in the evening to capture the usage of bikes for work-related travel
time_station_percent_full = time_station_percent_full[[7, 8, 9, 10, 17, 18, 19]]
time_station_percent_full

HOUR 7 8 9 10 17 18 19

STATION ID

1 9.507997 6.986988 9.934942 7.232598 6.166983 7.241051 5.469549

2 9.926471 8.928571 9.065126 5.389474 7.878151 7.879747 7.325581

3 6.018908 7.163866 12.363445 5.778947 8.581933 7.985232 7.283298

4 5.892857 6.974790 13.025210 7.936842 7.626050 6.793249 4.376321

5 10.724790 27.657563 14.185924 12.700000 12.421218 15.374473 7.468288

... ... ... ... ... ... ... ...

113 1.654412 2.032563 4.243697 2.857895 2.221639 2.969409 1.871036

114 13.602941 14.994748 12.988445 6.289474 13.356092 7.753165 2.647992

115 5.602241 9.593838 17.394958 3.719298 6.449580 9.831224 8.421424

116 1.582633 2.100840 1.939776 0.568421 1.785714 2.616034 2.332629

117 2.069328 4.858193 10.099790 3.368421 7.074580 7.067511 3.039112

114 rows × 7 columns
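Earlier, columns were dropped "for the clustering". As a minimal sketch of how these station-by-peak-hour profiles could be clustered (assuming K-means with an illustrative k=4; the choice of k would need justification, e.g., via the elbow method):

# Minimal clustering sketch (assumption: K-means, illustrative k=4)
from sklearn.cluster import KMeans

profiles = time_station_percent_full.fillna(0)   # 114 stations x 7 peak hours
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(profiles)
station_cluster = pd.Series(kmeans.labels_, index=profiles.index, name='CLUSTER')
print(station_cluster.value_counts())            # stations per usage-pattern cluster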

# Heat map of percent full for our peak hours (all together)
# Get unique station names, latitudes, and longitudes
stations, mask_stations = np.unique(dataset.NAME, return_index=True)
lats = dataset.LATITUDE.iloc[mask_stations]
longs = dataset.LONGITUDE.iloc[mask_stations]

# Get 'Percent_full' for the unique stations using the correct DataFrame: time_station_percent_full
# Use .loc to access values by station ID, ensuring to use .values to get numeric values
percent_full = time_station_percent_full.loc[dataset['STATION ID'].iloc[mask_stations]].mean(axis=1)
#average the percent full across different hours for each station.

# Create a list of [latitude, longitude, percent_full] for each station


heatmap_data = [[lats.iloc[i], longs.iloc[i], percent_full.iloc[i]] for i in range(len(stations))]

mp = folium.Map(location=[53.34, -6.2603], zoom_start=14, tiles='cartodbpositron')

# Add the heatmap layer


HeatMap(heatmap_data, radius=15).add_to(mp)

# Create a colormap
colormap = linear.YlOrRd_09.scale(
percent_full.min(), percent_full.max()
)

# Add the heatmap layer with colormap


HeatMap(heatmap_data, radius=10, gradient=colormap.to_dict()).add_to(mp)

mp.save('heatmap.html')
mp

[Interactive folium heat map rendered here: average peak-hour 'Percent_full' per station]

# Heat map of percent full for our peak hours separated into morning and evening
# Filter data for peak hours
peak_hours = [7, 8, 9, 10, 17, 18, 19]
peak_hour_data = weekday_avg[weekday_avg['HOUR'].isin(peak_hours)]

# Separate morning and evening data


morning = peak_hour_data[peak_hour_data['HOUR'].isin([7, 8, 9, 10])]
evening = peak_hour_data[peak_hour_data['HOUR'].isin([17, 18, 19])]

def get_heatmap_data(data):
    # Get unique station names, latitudes, and longitudes
    stations, mask_stations = np.unique(dataset.NAME, return_index=True)
    lats = dataset.LATITUDE.iloc[mask_stations]
    longs = dataset.LONGITUDE.iloc[mask_stations]

    # Average 'Percent_full' per station over the hours contained in the
    # passed subset (morning or evening)
    percent_full = data.groupby('STATION ID')['Percent_full'].mean()
    # Align with the unique-station order above
    percent_full = percent_full.loc[dataset['STATION ID'].iloc[mask_stations]]

    # Create a list of [latitude, longitude, percent_full] for each station
    heatmap_data = [[lats.iloc[i], longs.iloc[i], percent_full.iloc[i]] for i in range(len(stations))]

    return heatmap_data

# Heat map of percent full for our morning peak hours

# Create the morning heatmap


morning_heatmap_data = get_heatmap_data(morning)
morning_map = folium.Map(location=[53.34, -6.2603], zoom_start=10, tiles='cartodbpositron')
colormap_morning = linear.YlOrRd_09.scale(min(data[2] for data in morning_heatmap_data), max(data[2] for data in morning_heatmap_data))
HeatMap(morning_heatmap_data, radius=15).add_to(morning_map)

display(morning_map)

[Interactive folium heat map rendered here: morning peak-hour 'Percent_full' per station]

# Heat map of percent full for our evening peak hours


# Create the evening heatmap

evening_heatmap_data = get_heatmap_data(evening)
evening_map = folium.Map(location=[53.34, -6.2603], zoom_start=10, tiles='cartodbpositron')
colormap_evening = linear.YlOrRd_09.scale(min(data[2] for data in evening_heatmap_data), max(data[2] for data in evening_heatmap_data))
HeatMap(evening_heatmap_data, radius=15).add_to(evening_map)

display(evening_map)

[Interactive folium heat map rendered here: evening peak-hour 'Percent_full' per station]

time_station_available_bike_change = time_station_available_bike_change[[7, 8, 9, 10, 17, 18, 19]]
time_station_available_bike_change


HOUR 7 8 9 10 17 18 19

STATION ID

1 2.947479 2.165966 3.079832 2.242105 1.911765 2.244726 1.695560

2 1.985294 1.785714 1.813025 1.077895 1.575630 1.575949 1.465116

3 1.203782 1.432773 2.472689 1.155789 1.716387 1.597046 1.456660

4 1.178571 1.394958 2.605042 1.587368 1.525210 1.358650 0.875264

5 4.289916 11.063025 5.674370 5.080000 4.968487 6.149789 2.987315

... ... ... ... ... ... ... ...

113 0.661765 0.813025 1.697479 1.143158 0.888655 1.187764 0.748414

114 5.441176 5.997899 5.195378 2.515789 5.342437 3.101266 1.059197

115 1.680672 2.878151 5.218487 1.115789 1.934874 2.949367 2.526427

116 0.474790 0.630252 0.581933 0.170526 0.535714 0.784810 0.699789

117 0.827731 1.943277 4.039916 1.347368 2.829832 2.827004 1.215645

114 rows × 7 columns

# Filter data for peak hours


peak_hours = [7, 8, 9, 10, 17, 18, 19]
peak_hour_data = weekday_avg[weekday_avg['HOUR'].isin(peak_hours)]

# Separate morning and evening data


morning = peak_hour_data[peak_hour_data['HOUR'].isin([7, 8, 9, 10])]
evening = peak_hour_data[peak_hour_data['HOUR'].isin([17, 18, 19])]

# Heat map of percent full for our peak hours


# Get unique station names, latitudes, and longitudes
stations, mask_stations = np.unique(dataset.NAME, return_index=True)
lats = dataset.LATITUDE.iloc[mask_stations]
longs = dataset.LONGITUDE.iloc[mask_stations]

# Get 'AVAILABLE_BIKES_CHANGE' for the unique stations using the correct DataFrame: time_station_available_bike_change
# Use .loc to access rows by station ID
change = time_station_available_bike_change.loc[dataset['STATION ID'].iloc[mask_stations]].mean(axis=1)
# average the bike change across the peak hours for each station

# Create a list of [latitude, longitude, change] for each station
heatmap_data = [[lats.iloc[i], longs.iloc[i], change.iloc[i]] for i in range(len(stations))]

mp = folium.Map(location=[53.34, -6.2603], zoom_start=14, tiles='cartodbpositron')

# Add the heatmap layer


HeatMap(heatmap_data, radius=15).add_to(mp)

# Create a colormap
colormap = linear.YlOrRd_09.scale(
change.min(), change.max()
)

# Add the heatmap layer with colormap


HeatMap(heatmap_data, radius=10, gradient=colormap.to_dict()).add_to(mp)

mp.save('heatmap.html')
mp


[Interactive folium heat map rendered here: average peak-hour bike change per station]

# Heat map of bike change for our peak hours separated into morning and evening
# Filter data for peak hours
peak_hours = [7, 8, 9, 10, 17, 18, 19]
peak_hour_data = weekday_avg[weekday_avg['HOUR'].isin(peak_hours)]

# Separate morning and evening data


morning = peak_hour_data[peak_hour_data['HOUR'].isin([7, 8, 9, 10])]
evening = peak_hour_data[peak_hour_data['HOUR'].isin([17, 18, 19])]

def get_heatmap_data(data):
    # Get unique station names, latitudes, and longitudes
    stations, mask_stations = np.unique(dataset.NAME, return_index=True)
    lats = dataset.LATITUDE.iloc[mask_stations]
    longs = dataset.LONGITUDE.iloc[mask_stations]

    # Average 'AVAILABLE_BIKES_CHANGE' per station over the hours contained
    # in the passed subset (morning or evening)
    change = data.groupby('STATION ID')['AVAILABLE_BIKES_CHANGE'].mean()
    # Align with the unique-station order above
    change = change.loc[dataset['STATION ID'].iloc[mask_stations]]

    # Create a list of [latitude, longitude, change] for each station
    heatmap_data = [[lats.iloc[i], longs.iloc[i], change.iloc[i]] for i in range(len(stations))]

    return heatmap_data

# Heat map of bike change for our morning peak hours

# Create the morning heatmap


morning_heatmap_data = get_heatmap_data(morning)
morning_map = folium.Map(location=[53.34, -6.2603], zoom_start=10, tiles='cartodbpositron')
colormap_morning = linear.YlOrRd_09.scale(min(data[2] for data in morning_heatmap_data), max(data[2] for data in morning_heatmap_data))
HeatMap(morning_heatmap_data, radius=15).add_to(morning_map)

display(morning_map)

[Interactive folium heat map rendered here: morning peak-hour bike change per station]

# Heat map of bike change for our evening peak hours


# Create the evening heatmap
evening_heatmap_data = get_heatmap_data(evening)
evening_map = folium.Map(location=[53.34, -6.2603], zoom_start=10, tiles='cartodbpositron')
colormap_evening = linear.YlOrRd_09.scale(min(data[2] for data in evening_heatmap_data), max(data[2] for data in evening_heatmap_data))

HeatMap(evening_heatmap_data, radius=15).add_to(evening_map)

display(evening_map)

[Interactive folium heat map rendered here: evening peak-hour bike change per station]

# Filter data for peak hours


peak_hours = [7, 8, 9, 10, 17, 18, 19]
peak_hour_data = weekday_avg[weekday_avg['HOUR'].isin(peak_hours)]

# Separate morning and evening data


morning = peak_hour_data[peak_hour_data['HOUR'].isin([7, 8, 9, 10])]
evening = peak_hour_data[peak_hour_data['HOUR'].isin([17, 18, 19])]

# Create subplots for morning and evening


fig, axes = plt.subplots(1, 2, figsize=(11, 6)) # 1 row, 2 columns

# Function to plot data on an axis
def plot_data(ax, data, title):
    hourly_change = data.groupby('HOUR')['AVAILABLE_BIKES_CHANGE'].mean()
    hourly_percent_full = data.groupby('HOUR')['Percent_full'].mean()

    sns.lineplot(x=hourly_change.index, y=hourly_change.values, ax=ax, color='blue', label='Available Bike Change')
    # The printout is cut off here; presumably hourly_percent_full is plotted on
    # the same axis and the title is applied, e.g.:
    # sns.lineplot(x=hourly_percent_full.index, y=hourly_percent_full.values, ax=ax, color='orange', label='Percent Full')
    # ax.set_title(title)

