Python Capstone Implementation

The document outlines a Python capstone project using Google Colab to analyze a dataset related to water leak detection, consisting of 1000 rows and 7 columns. It includes data loading, summary statistics, and visualizations such as correlation matrices and histograms. The dataset contains information on timestamps, sensor IDs, pressure, flow rate, temperature, and leak/burst statuses, with no missing values.

Uploaded by

kanipakureddy

7/6/25, 3:03 PM Python Capstone Implementation - Colab

from google.colab import files


uploaded = files.upload()

water_leak_detection_1000_rows.xlsx(application/vnd.openxmlformats-officedocument.spreadsheetml.sheet) - 78295 bytes, last modified: 7/6/2025 - 100%
done
Saving water_leak_detection_1000_rows.xlsx to water_leak_detection_1000_rows (1).xlsx

import pandas as pd
data = pd.read_excel("water_leak_detection_1000_rows.xlsx")
data.head()

             Timestamp  Sensor_ID  Pressure (bar)  Flow Rate (L/s)  Temperature (°C)  Leak Status  Burst Status
0  2024-01-01 00:00:00       S007        3.694814        77.515218         21.695365            0             0
1  2024-01-01 00:05:00       S007        2.587125       179.926422         19.016725            0             0
2  2024-01-01 00:10:00       S002        2.448965       210.130823         10.011681            1             0
3  2024-01-01 00:15:00       S009        2.936844       141.777934         12.092408            0             0
4  2024-01-01 00:20:00       S003        3.073693       197.484633         17.001443            0             0


data.shape

(1000, 7)

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Timestamp 1000 non-null datetime64[ns]
1 Sensor_ID 1000 non-null object
2 Pressure (bar) 1000 non-null float64
3 Flow Rate (L/s) 1000 non-null float64
4 Temperature (°C) 1000 non-null float64
5 Leak Status 1000 non-null int64
6 Burst Status 1000 non-null int64
dtypes: datetime64[ns](1), float64(3), int64(2), object(1)
memory usage: 54.8+ KB

data.isnull().sum()

Timestamp           0
Sensor_ID           0
Pressure (bar)      0
Flow Rate (L/s)     0
Temperature (°C)    0
Leak Status         0
Burst Status        0
dtype: int64

data.describe()


                 Timestamp  Pressure (bar)  Flow Rate (L/s)  Temperature (°C)  Leak Status  Burst Status
count                 1000     1000.000000      1000.000000       1000.000000  1000.000000   1000.000000
mean   2024-01-02 17:37:30        3.220696       125.038082         17.434794     0.019000      0.010000
min    2024-01-01 00:00:00        0.910977        50.654490         10.002020     0.000000      0.000000
25%    2024-01-01 20:48:45        2.859332        87.946866         13.715323     0.000000      0.000000
50%    2024-01-02 17:37:30        3.265711       124.106896         17.330067     0.000000      0.000000
75%    2024-01-03 14:26:15        3.607196       162.086708         20.922839     0.000000      0.000000
max    2024-01-04 11:15:00        3.995364       331.754081         24.966107     1.000000      1.000000
std                    NaN        0.488997        44.121419          4.288908     0.136593      0.099549
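
The Timestamp column of this summary is consistent with one reading every five minutes. The sketch below checks that arithmetic using only the standard library. The even 5-minute spacing is an assumption inferred from the first five rows; the notebook itself does not verify it.

```python
from datetime import datetime, timedelta

# Assumption: 1000 readings, evenly spaced 5 minutes apart, starting at the
# first timestamp shown by data.head().
start = datetime(2024, 1, 1, 0, 0, 0)
step = timedelta(minutes=5)
n_rows = 1000

last = start + (n_rows - 1) * step      # timestamp of the final reading
midpoint = start + (last - start) / 2   # mean of an evenly spaced series

print(last)      # 2024-01-04 11:15:00
print(midpoint)  # 2024-01-02 17:37:30
```

Both values match the max and mean that data.describe() reports, which supports (but does not prove) a gap-free 5-minute series. On the real frame, data['Timestamp'].diff().value_counts() would check the spacing directly.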

import seaborn as sns
import matplotlib.pyplot as plt

data_numeric = data.select_dtypes(include=['number'])
corr_matrix = data_numeric.corr()
plt.figure(figsize=(10, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
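
select_dtypes(include=['number']) keeps only the integer and float columns, so Timestamp and Sensor_ID never reach the correlation matrix. A minimal sketch of that filtering step follows, on a three-row stand-in frame whose values are copied from data.head(). The name sample is illustrative and is not part of the notebook.

```python
import pandas as pd

# Stand-in frame with the same column names and dtypes as the notebook's data;
# values are the first three rows shown by data.head().
sample = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2024-01-01 00:00:00", "2024-01-01 00:05:00", "2024-01-01 00:10:00",
    ]),
    "Sensor_ID": ["S007", "S007", "S002"],
    "Pressure (bar)": [3.694814, 2.587125, 2.448965],
    "Flow Rate (L/s)": [77.515218, 179.926422, 210.130823],
    "Temperature (°C)": [21.695365, 19.016725, 10.011681],
    "Leak Status": [0, 0, 1],
    "Burst Status": [0, 0, 0],
})

# Only int64/float64 columns survive; datetime64 and object columns are dropped.
numeric_cols = sample.select_dtypes(include=["number"]).columns.tolist()
print(numeric_cols)
# ['Pressure (bar)', 'Flow Rate (L/s)', 'Temperature (°C)', 'Leak Status', 'Burst Status']
```

The two status columns are 0/1 integers, so they are included in the heatmap alongside the three continuous measurements.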

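# Histogram of flow rate with a kernel density estimate overlaid
plt.figure(figsize=(10, 6))
sns.histplot(data['Flow Rate (L/s)'], kde=True, bins=30, color='skyblue')
plt.title('Distribution of Flow Rate (L/s)')
plt.xlabel('Flow Rate (L/s)')
plt.ylabel('Count')  # histplot's default statistic is the count per bin
plt.grid(True)
plt.show()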
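
The summary statistics already hint at what this histogram's tails look like: the flow-rate maximum (331.75) sits far above the 75th percentile (162.09). The sketch below applies the conventional 1.5 × IQR rule to the quartiles copied from data.describe(). The rule and the helper name tukey_fences are choices made for this illustration; the notebook does not use them.

```python
# Quartiles and extremes copied from the data.describe() output.
stats = {
    "Pressure (bar)":  {"min": 0.910977, "q1": 2.859332, "q3": 3.607196, "max": 3.995364},
    "Flow Rate (L/s)": {"min": 50.654490, "q1": 87.946866, "q3": 162.086708, "max": 331.754081},
}

def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k times the interquartile range."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

for name, s in stats.items():
    lower, upper = tukey_fences(s["q1"], s["q3"])
    print(
        f"{name}: fences = ({lower:.3f}, {upper:.3f}), "
        f"min below lower = {s['min'] < lower}, "
        f"max above upper = {s['max'] > upper}"
    )
```

With these figures the flow-rate maximum lies above its upper fence (about 273.3 L/s) and the pressure minimum lies below its lower fence (about 1.74 bar), so each column has at least one reading that the rule would flag. Whether those readings coincide with the leak or burst rows cannot be told from the summary alone.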

sns.pairplot(data[['Pressure (bar)', 'Temperature (°C)', 'Flow Rate (L/s)']])
plt.suptitle('Pair Plots', y=1.02)
plt.show()


import pandas as pd

data = pd.read_excel("water_leak_detection_1000_rows.xlsx")

print("First 5 rows:\n", data.head(), "\n")

print("Column names:\n", data.columns.tolist(), "\n")

print("Data types:\n", data.dtypes, "\n")

print("Null values:\n", data.isnull().sum(), "\n")


print("Summary statistics:\n", data.describe(), "\n")

print("Correlation matrix:\n", data.select_dtypes(include=['number']).corr())

First 5 rows:
             Timestamp  Sensor_ID  Pressure (bar)  Flow Rate (L/s)  \
0  2024-01-01 00:00:00       S007        3.694814        77.515218
1  2024-01-01 00:05:00       S007        2.587125       179.926422
2  2024-01-01 00:10:00       S002        2.448965       210.130823
3  2024-01-01 00:15:00       S009        2.936844       141.777934
4  2024-01-01 00:20:00       S003        3.073693       197.484633

   Temperature (°C)  Leak Status  Burst Status
0         21.695365            0             0
1         19.016725            0             0
2         10.011681            1             0
3         12.092408            0             0
4         17.001443            0             0

Column names:
['Timestamp', 'Sensor_ID', 'Pressure (bar)', 'Flow Rate (L/s)', 'Temperature (°C)', 'Leak Status', 'Burst Status']

Data types:
Timestamp datetime64[ns]
Sensor_ID object
Pressure (bar) float64
Flow Rate (L/s) float64
Temperature (°C) float64
Leak Status int64
Burst Status int64
dtype: object

Null values:
Timestamp 0
Sensor_ID 0
Pressure (bar) 0
Flow Rate (L/s) 0
Temperature (°C) 0
Leak Status 0
Burst Status 0
dtype: int64

Summary statistics:
                 Timestamp  Pressure (bar)  Flow Rate (L/s)  \
count                 1000     1000.000000      1000.000000
mean   2024-01-02 17:37:30        3.220696       125.038082
min    2024-01-01 00:00:00        0.910977        50.654490
25%    2024-01-01 20:48:45        2.859332        87.946866
50%    2024-01-02 17:37:30        3.265711       124.106896
75%    2024-01-03 14:26:15        3.607196       162.086708
max    2024-01-04 11:15:00        3.995364       331.754081
std                    NaN        0.488997        44.121419

       Temperature (°C)  Leak Status  Burst Status
count       1000.000000  1000.000000   1000.000000
mean          17.434794     0.019000      0.010000
min           10.002020     0.000000      0.000000
25%           13.715323     0.000000      0.000000
50%           17.330067     0.000000      0.000000
75%           20.922839     0.000000      0.000000
max           24.966107     1.000000      1.000000
std            4.288908     0.136593      0.099549
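
Because Leak Status and Burst Status are integer columns with minimum 0 and maximum 1, their means are the fraction of positive rows. The short sketch below turns the reported means back into row counts. It uses only the figures printed above, and the variable names are illustrative.

```python
n_rows = 1000
# Means of the 0/1 status columns, copied from data.describe().
positive_rate = {"Leak Status": 0.019, "Burst Status": 0.010}

for column, rate in positive_rate.items():
    positives = round(rate * n_rows)   # mean of a 0/1 column times the row count
    negatives = n_rows - positives
    print(f"{column}: {positives} positive, {negatives} negative "
          f"({rate:.1%} positive)")
```

That works out to 19 leak rows and 10 burst rows out of 1000. On the real frame, data['Leak Status'].value_counts() would give the same counts directly. With so few positive rows, accuracy alone is likely to be a poor yardstick for any detector built on this data.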
