Practical Labs Guide
Learning
Day 1
Lab: Setting Up a Python Environment for Machine Learning
Objective:
Understand how to create and configure a Python environment for machine learning
projects, including the installation of essential libraries such as NumPy, Pandas,
Matplotlib, Seaborn, Scikit-learn, TensorFlow, and Keras.
Step 1: Installing Python
1. Ensure that Python is installed on your system by running the following command:
python --version
(On Linux/macOS the command is usually python3 --version.)
If Python is not installed, download and install it from the official Python website (python.org).
Step 2: Creating a Virtual Environment
A virtual environment is a self-contained directory that holds its own Python interpreter and
installed libraries, so each project can keep its own dependencies without affecting other projects.
1. Navigate to your project folder where you'd like to create the virtual environment:
cd /path/to/your/project
2. Create a virtual environment named ml_env:
● For Linux/macOS
python3 -m venv ml_env
● For Windows users
python -m venv ml_env
This command will create a folder named ml_env that contains the environment.
Step 3: Activating the Virtual Environment
1. To activate the virtual environment:
● For Linux/macOS
source ml_env/bin/activate
● For Windows users
ml_env\Scripts\activate
2. After activation, your terminal prompt should be prefixed with the environment name,
(ml_env), indicating that it is active.
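To double-check activation from inside Python, you can compare sys.prefix with sys.base_prefix: in an interpreter started from a virtual environment such as ml_env, sys.prefix points at the environment folder, while sys.base_prefix still points at the base Python installation. A minimal sketch; the function name in_virtualenv is illustrative only.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix is the environment folder (e.g. ml_env),
    # while sys.base_prefix still refers to the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter prefix:", sys.prefix)
```

Run it with the python command from the activated environment; it should report True.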
Step 4: Installing Required Libraries
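With the virtual environment activated, install the libraries needed for
machine learning:
pip install numpy pandas matplotlib seaborn scikit-learn tensorflow
Keras is installed together with TensorFlow, so it does not need a separate command.
Step 5: Verifying the Installation
1. Start the Python interpreter:
python
2. Import each library:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import tensorflow as tf
from tensorflow import keras
3. If no errors are raised, your environment is set up successfully.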
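The import list above stops at the first library that fails. As an alternative, the sketch below tries each library in turn and reports its version, or that it is missing. It relies on importlib.import_module and on the __version__ attribute that most of these packages expose; the helper name check_libraries is hypothetical.

```python
import importlib

# Import names of the libraries used in this lab (scikit-learn imports as "sklearn")
LIBRARIES = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn", "tensorflow"]


def check_libraries(names):
    """Return {import name: version string, or None if the import fails}."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            # Most of these packages define __version__; fall back if one does not
            results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    for name, version in check_libraries(LIBRARIES).items():
        print(f"{name:<12} {version if version is not None else 'NOT INSTALLED'}")
```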
Step 6: Deactivating the Virtual Environment
Once you're done working in the environment, you can deactivate it by typing:
deactivate
Practical Exercise 1: Test Installation
Objective:
Generate and visualize some random data using NumPy, Pandas, and Matplotlib.
Step 1: Import Required Libraries
Before generating data or plotting, you'll need to import the necessary libraries: NumPy
(for random number generation), Pandas (for organizing data), and Matplotlib (for
plotting).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
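The remaining steps (generating random numbers with NumPy, collecting them in a Pandas DataFrame, and plotting them with Matplotlib) can be sketched as follows. The sample size, the seed, and the linear relationship between x and y are illustrative choices, and the helper names make_random_data and plot_random_data are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def make_random_data(n_points=100, seed=0):
    """Return a DataFrame with n_points random (x, y) pairs."""
    rng = np.random.default_rng(seed)  # seeded so every run gives the same numbers
    x = rng.normal(loc=0.0, scale=1.0, size=n_points)
    # y follows x with added noise, so the scatter plot shows a visible trend
    y = 2.0 * x + rng.normal(loc=0.0, scale=0.5, size=n_points)
    return pd.DataFrame({"x": x, "y": y})


def plot_random_data(df):
    """Draw a scatter plot of the x and y columns and return the figure."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(df["x"], df["y"], alpha=0.7)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title("Random data")
    return fig


if __name__ == "__main__":
    data = make_random_data()
    print(data.head())       # first five generated rows
    print(data.describe())   # summary statistics of the generated values
    plot_random_data(data)
    plt.show()
```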
Dataset Columns:
● NAME (Location)
● HUMIDITY, LIGHT, NO_MAX, NO_MIN, NO2_MAX, NO2_MIN, etc.
● SOUND, TEMPRATURE_MAX, TEMPRATURE_MIN, UV_MAX, UV_MIN, AIR_PRESSURE, Lattitude, Longitude (spelled as in the dataset)
● LASTUPDATEDATETIME (Timestamp)
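The dataset file is not named here, so this sketch of the inspection steps (first rows, data types and non-null counts, missing-value counts) builds a five-row stand-in from values printed in the Exercise 3 output, using only a subset of the columns. With the real data you would load the file with pd.read_csv instead; the path in the comment is a placeholder. Note that DataFrame.info() prints its report and returns None, which is why the outputs show a "None" line when info() is wrapped in print().

```python
import numpy as np
import pandas as pd

# Five-row stand-in built from the first rows printed in the Exercise 3 output
# (subset of the 28 columns). With the real file, load it instead, e.g.
# df = pd.read_csv("<path to the dataset>")   # placeholder path
df = pd.DataFrame({
    "NAME": ["BopadiSquare_65", "Karve Statue Square_5", "Lullanagar_Square_14",
             "Hadapsar_Gadital_01", "PMPML_Bus_Depot_Deccan_15"],
    "HUMIDITY": [19.995, 20.730, 17.387, 18.725, 20.622],
    "LIGHT": [3762.914, 529.245, 693.375, 723.631, 816.476],
    "UV_MIN": [0.2, 0.1, 0.2, 0.1, np.nan],
    "AIR_PRESSURE": [0.933, 0.930, 0.926, 0.930, 0.932],
    "LASTUPDATEDATETIME": ["13/05/19 12:16"] * 5,
    "Lattitude": [18.559427, 18.501727, 18.487306, 18.501834, 18.451716],
    "Longitude": [73.828656, 73.813595, 73.885650, 73.941478, 73.856170],
})

# Inspect the first few rows
print(df.head())

# Check the data types and non-null values.
# info() prints its report itself and returns None.
df.info()

# Check for missing values (number of NaN entries per column)
missing = df.isnull().sum()
print(missing)
```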
# df is the air-quality DataFrame loaded earlier in the lab
# Fill missing values with the mean for numeric columns only
numeric_cols = df.select_dtypes(include=['number']).columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())
# Descriptive statistics
print(df.describe())
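The outputs further down show two more transformations: LASTUPDATEDATETIME becomes datetime64[ns], and the numeric columns are standardized while Lattitude and Longitude keep their raw values. A sketch of that pipeline follows. Using scikit-learn's StandardScaler is an assumption, though it is consistent with the standard deviation of about 1.000033 that describe() reports: the scaler divides by the population standard deviation, whereas pandas computes std with n - 1 in the denominator, and sqrt(14999 / 14998) ≈ 1.000033. The function name clean_and_scale is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def clean_and_scale(df, keep_raw=("Lattitude", "Longitude")):
    """Return a cleaned, standardized copy of the air-quality DataFrame."""
    df = df.copy()  # leave the caller's DataFrame untouched
    numeric_cols = df.select_dtypes(include=["number"]).columns

    # Fill missing values with the mean of each numeric column
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

    # Raw timestamps look like "13/05/19 12:16" (day/month/two-digit year)
    df["LASTUPDATEDATETIME"] = pd.to_datetime(
        df["LASTUPDATEDATETIME"], format="%d/%m/%y %H:%M"
    )

    # Standardize every numeric column except the coordinates:
    # z = (value - column mean) / column standard deviation
    scale_cols = [col for col in numeric_cols if col not in keep_raw]
    df[scale_cols] = StandardScaler().fit_transform(df[scale_cols])
    return df


if __name__ == "__main__":
    raw = pd.DataFrame({
        "HUMIDITY": [19.995, np.nan, 17.387],
        "NO_MAX": [0, 0, 0],
        "LASTUPDATEDATETIME": ["13/05/19 12:16"] * 3,
        "Lattitude": [18.559427, 18.501727, 18.487306],
    })
    print(clean_and_scale(raw))
```

A constant column such as NO_MAX becomes 0.0 after scaling, which matches the transformed output.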
Outputs
##############################################
Output of Exercise 3
##############################################
##############################################
Inspect the first few rows
##############################################
                        NAME  HUMIDITY     LIGHT  NO_MAX  NO_MIN  ...  UV_MIN  AIR_PRESSURE  LASTUPDATEDATETIME  Lattitude  Longitude
0            BopadiSquare_65    19.995  3762.914       0       0  ...     0.2         0.933      13/05/19 12:16  18.559427  73.828656
1      Karve Statue Square_5    20.730   529.245       0       0  ...     0.1         0.930      13/05/19 12:16  18.501727  73.813595
2       Lullanagar_Square_14    17.387   693.375       0       0  ...     0.2         0.926      13/05/19 12:16  18.487306  73.885650
3        Hadapsar_Gadital_01    18.725   723.631       0       0  ...     0.1         0.930      13/05/19 12:16  18.501834  73.941478
4  PMPML_Bus_Depot_Deccan_15    20.622   816.476       0       0  ...     NaN         0.932      13/05/19 12:16  18.451716  73.856170
[5 rows x 28 columns]
##############################################
Check the data types and non-null values
##############################################
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14999 entries, 0 to 14998
Data columns (total 28 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 NAME 14999 non-null object
1 HUMIDITY 14805 non-null float64
2 LIGHT 14326 non-null float64
3 NO_MAX 14999 non-null int64
4 NO_MIN 14999 non-null int64
5 NO2_MAX 14973 non-null float64
6 NO2_MIN 14973 non-null float64
7 OZONE_MAX 14973 non-null float64
8 OZONE_MIN 14973 non-null float64
9 PM10_MAX 14646 non-null float64
10 PM10_MIN 14646 non-null float64
11 PM2_MAX 14646 non-null float64
12 PM2_MIN 14646 non-null float64
13 SO2_MAX 14973 non-null float64
14 SO2_MIN 14973 non-null float64
15 CO_MAX 14973 non-null float64
16 CO_MIN 14973 non-null float64
17 CO2_MAX 14963 non-null float64
18 CO2_MIN 14963 non-null float64
19 SOUND 14805 non-null float64
20 TEMPRATURE_MAX 14963 non-null float64
21 TEMPRATURE_MIN 14963 non-null float64
22 UV_MAX 14116 non-null float64
23 UV_MIN 14116 non-null float64
24 AIR_PRESSURE 14804 non-null float64
25 LASTUPDATEDATETIME 14999 non-null object
26 Lattitude 14999 non-null float64
27 Longitude 14999 non-null float64
dtypes: float64(24), int64(2), object(2)
memory usage: 3.2+ MB
None
##############################################
Output of Exercise 4
##############################################
##############################################
Check for missing values
##############################################
NAME 0
HUMIDITY 194
LIGHT 673
NO_MAX 0
NO_MIN 0
NO2_MAX 26
NO2_MIN 26
OZONE_MAX 26
OZONE_MIN 26
PM10_MAX 353
PM10_MIN 353
PM2_MAX 353
PM2_MIN 353
SO2_MAX 26
SO2_MIN 26
CO_MAX 26
CO_MIN 26
CO2_MAX 36
CO2_MIN 36
SOUND 194
TEMPRATURE_MAX 36
TEMPRATURE_MIN 36
UV_MAX 883
UV_MIN 883
AIR_PRESSURE 195
LASTUPDATEDATETIME 0
Lattitude 0
Longitude 0
dtype: int64
##############################################
Confirm data cleaning
##############################################
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14999 entries, 0 to 14998
Data columns (total 28 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 NAME 14999 non-null object
1 HUMIDITY 14999 non-null float64
2 LIGHT 14999 non-null float64
3 NO_MAX 14999 non-null int64
4 NO_MIN 14999 non-null int64
5 NO2_MAX 14999 non-null float64
6 NO2_MIN 14999 non-null float64
7 OZONE_MAX 14999 non-null float64
8 OZONE_MIN 14999 non-null float64
9 PM10_MAX 14999 non-null float64
10 PM10_MIN 14999 non-null float64
11 PM2_MAX 14999 non-null float64
12 PM2_MIN 14999 non-null float64
13 SO2_MAX 14999 non-null float64
14 SO2_MIN 14999 non-null float64
15 CO_MAX 14999 non-null float64
16 CO_MIN 14999 non-null float64
17 CO2_MAX 14999 non-null float64
18 CO2_MIN 14999 non-null float64
19 SOUND 14999 non-null float64
20 TEMPRATURE_MAX 14999 non-null float64
21 TEMPRATURE_MIN 14999 non-null float64
22 UV_MAX 14999 non-null float64
23 UV_MIN 14999 non-null float64
24 AIR_PRESSURE 14999 non-null float64
25 LASTUPDATEDATETIME 14999 non-null datetime64[ns]
26 Lattitude 14999 non-null float64
27 Longitude 14999 non-null float64
dtypes: datetime64[ns](1), float64(24), int64(2), object(1)
memory usage: 3.2+ MB
None
##############################################
Output of Exercise 5
##############################################
##############################################
Check the transformation
##############################################
                        NAME  HUMIDITY     LIGHT  NO_MAX  NO_MIN  ...    UV_MIN  AIR_PRESSURE   LASTUPDATEDATETIME  Lattitude  Longitude
0            BopadiSquare_65 -1.085440  0.185820     0.0     0.0  ...  0.984757      0.072996  2019-05-13 12:16:00  18.559427  73.828656
1      Karve Statue Square_5 -1.039402 -0.312201     0.0     0.0  ...  0.055806     -0.302701  2019-05-13 12:16:00  18.501727  73.813595
2       Lullanagar_Square_14 -1.248798 -0.286923     0.0     0.0  ...  0.984757     -0.803631  2019-05-13 12:16:00  18.487306  73.885650
3        Hadapsar_Gadital_01 -1.164989 -0.282264     0.0     0.0  ...  0.055806     -0.302701  2019-05-13 12:16:00  18.501834  73.941478
4  PMPML_Bus_Depot_Deccan_15 -1.046166 -0.267964     0.0     0.0  ...  0.000000     -0.052237  2019-05-13 12:16:00  18.451716  73.856170
[5 rows x 28 columns]
##############################################
Output of Exercise 6
##############################################
##############################################
Descriptive statistics
##############################################
            HUMIDITY         LIGHT   NO_MAX   NO_MIN  ...  AIR_PRESSURE             LASTUPDATEDATETIME     Lattitude     Longitude
count   1.499900e+04  1.499900e+04  14999.0  14999.0  ...  1.499900e+04                          14999  14999.000000  14999.000000
mean    1.136944e-16  5.684721e-18      0.0      0.0  ... -6.897461e-15  2019-04-18 03:16:54.379625472     18.504770     73.849372
min    -1.701101e+00 -3.936832e-01      0.0      0.0  ... -1.194931e+01            2019-04-08 00:01:00     18.451716     73.792927
25%    -9.244623e-01 -3.932329e-01      0.0      0.0  ... -1.774689e-01            2019-04-12 20:21:30     18.487306     73.824393
50%     4.450639e-16 -3.364439e-01      0.0      0.0  ...  7.299581e-02            2019-04-17 22:20:00     18.501834     73.828656
75%     7.363311e-01  0.000000e+00      0.0      0.0  ...  3.234605e-01            2019-04-23 03:58:00     18.525066     73.858092
max     2.709809e+00  8.622138e+00      0.0      0.0  ...  9.496222e-01            2019-05-13 12:46:00     18.559427     73.941478
std     1.000033e+00  1.000033e+00      0.0      0.0  ...  1.000033e+00                            NaN      0.028060      0.042748
[8 rows x 27 columns]
##############################################
Correlation between numerical columns
##############################################
                 HUMIDITY     LIGHT  NO_MAX  NO_MIN   NO2_MAX  ...    UV_MAX    UV_MIN AIR_PRESSURE Lattitude Longitude
HUMIDITY         1.000000 -0.197010     NaN     NaN -0.044097  ... -0.455358 -0.448803     0.022563 -0.023711 -0.036286
LIGHT           -0.197010  1.000000     NaN     NaN -0.178748  ...  0.183274  0.201387     0.028497  0.101492 -0.048619
NO_MAX                NaN       NaN     NaN     NaN       NaN  ...       NaN       NaN          NaN       NaN       NaN
NO_MIN                NaN       NaN     NaN     NaN       NaN  ...       NaN       NaN          NaN       NaN       NaN
NO2_MAX         -0.044097 -0.178748     NaN     NaN  1.000000  ... -0.293246 -0.049395     0.018118 -0.209901  0.472119
NO2_MIN         -0.066931 -0.169840     NaN     NaN  0.845805  ... -0.235142 -0.010271     0.031517 -0.229492  0.470624
OZONE_MAX       -0.014712  0.088718     NaN     NaN -0.288305  ...  0.085393 -0.053782     0.056310 -0.032154 -0.080527
OZONE_MIN        0.016655  0.023345     NaN     NaN -0.069185  ... -0.021095  0.100634     0.018579  0.159597 -0.036567
PM10_MAX        -0.288862  0.038256     NaN     NaN  0.254617  ... -0.072332  0.033999    -0.003582  0.115276  0.027580
PM10_MIN        -0.256055  0.001034     NaN     NaN  0.229997  ... -0.076591  0.077383    -0.036099  0.009867  0.178765
PM2_MAX         -0.271402  0.042085     NaN     NaN  0.255919  ... -0.082036  0.038399    -0.018413  0.158809 -0.021888
PM2_MIN         -0.249854 -0.001211     NaN     NaN  0.225540  ... -0.072932  0.078671    -0.054172  0.018211  0.169836
SO2_MAX          0.042583 -0.016404     NaN     NaN  0.063862  ...  0.027795 -0.071879     0.164260 -0.064666  0.152573
SO2_MIN          0.010488 -0.019259     NaN     NaN  0.032783  ...  0.048649 -0.054404     0.119508 -0.193233  0.316796
CO_MAX           0.033494  0.032465     NaN     NaN  0.412400  ... -0.188511  0.004618     0.094195 -0.039327  0.228064
CO_MIN           0.042300 -0.022054     NaN     NaN  0.549969  ... -0.236444  0.012838     0.125869 -0.028176  0.390238
CO2_MAX         -0.033887  0.000595     NaN     NaN -0.026485  ...  0.025683  0.010616     0.009534  0.023452 -0.000432
CO2_MIN         -0.081031  0.028052     NaN     NaN  0.123964  ... -0.031739  0.017659     0.132243  0.190245 -0.049888
SOUND           -0.101544  0.118385     NaN     NaN  0.135084  ...  0.301812  0.034389     0.101464 -0.081280  0.120192
TEMPRATURE_MAX  -0.014381  0.047581     NaN     NaN  0.028936  ... -0.016208 -0.006466    -0.054032  0.208727 -0.254933
TEMPRATURE_MIN  -0.072410 -0.060941     NaN     NaN  0.321034  ... -0.193122  0.015924    -0.119544  0.034650  0.095099
UV_MAX          -0.455358  0.183274     NaN     NaN -0.293246  ...  1.000000  0.354292     0.049916 -0.006787 -0.253838
UV_MIN          -0.448803  0.201387     NaN     NaN -0.049395  ...  0.354292  1.000000     0.027530  0.060500  0.033563
AIR_PRESSURE     0.022563  0.028497     NaN     NaN  0.018118  ...  0.049916  0.027530     1.000000  0.162451 -0.170239
Lattitude       -0.023711  0.101492     NaN     NaN -0.209901  ... -0.006787  0.060500     0.162451  1.000000 -0.400559
Longitude       -0.036286 -0.048619     NaN     NaN  0.472119  ... -0.253838  0.033563    -0.170239 -0.400559  1.000000
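NO_MAX and NO_MIN are NaN throughout the correlation matrix because those columns are 0 in every row (describe() reports a standard deviation of 0.0 for them), and a correlation with a zero-variance column is undefined. A sketch that computes the matrix over numeric columns while dropping constant ones first; the helper name numeric_correlation is hypothetical, and df.corr(numeric_only=True) is the pandas option (pandas 1.5 and later) for skipping non-numeric columns such as NAME.

```python
import pandas as pd


def numeric_correlation(df):
    """Pearson correlation between numeric columns, skipping constant ones."""
    numeric = df.select_dtypes(include=["number"])
    # A zero-variance column has an undefined correlation (pandas reports NaN),
    # so keep only columns with more than one distinct value.
    varying = numeric.loc[:, numeric.nunique() > 1]
    return varying.corr()


if __name__ == "__main__":
    demo = pd.DataFrame({
        "NAME": ["BopadiSquare_65", "Karve Statue Square_5",
                 "Lullanagar_Square_14", "Hadapsar_Gadital_01"],
        "HUMIDITY": [19.995, 20.730, 17.387, 18.725],
        "LIGHT": [3762.914, 529.245, 693.375, 723.631],
        "NO_MAX": [0, 0, 0, 0],
    })
    print(demo.corr(numeric_only=True))  # NO_MAX row and column are NaN
    print(numeric_correlation(demo))     # NO_MAX is dropped before correlating
```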
# Print coefficients of the fitted regression model (regressor)
print("Coefficients:", regressor.coef_)
Output
##############################################
Evaluate the model using Mean Squared Error (MSE)
##############################################
Mean Squared Error: 0.5558915986952425
##############################################
Print coefficients
##############################################
Coefficients: [ 4.48674910e-01  9.72425752e-03 -1.23323343e-01  7.83144907e-01
 -2.02962058e-06 -3.52631849e-03 -4.19792487e-01 -4.33708065e-01]
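The eight coefficients correspond to eight input features. A sketch of the same workflow (train/test split, fit, predict, Mean Squared Error, coef_) with scikit-learn's LinearRegression, using synthetic data whose true coefficients are known so the fitted coef_ can be checked against them. The data generator and the helper names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def make_synthetic_data(n_samples=500, seed=0):
    """Hypothetical data: y = X @ true_coef + 3.0 + small Gaussian noise."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, 3))
    true_coef = np.array([1.5, -2.0, 0.5])
    y = X @ true_coef + 3.0 + rng.normal(scale=0.1, size=n_samples)
    return X, y, true_coef


def fit_linear_regression(X, y, test_size=0.2, random_state=42):
    """Fit on a training split and return the model and its test-set MSE."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)
    y_pred = regressor.predict(X_test)
    return regressor, mean_squared_error(y_test, y_pred)


if __name__ == "__main__":
    X, y, true_coef = make_synthetic_data()
    regressor, mse = fit_linear_regression(X, y)
    print("Mean Squared Error:", mse)
    print("Coefficients:", regressor.coef_)  # should be close to true_coef
    print("True coefficients:", true_coef)
```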
Output
Accuracy: 1.0
Output
Accuracy: 1.0
Output
Accuracy: 1.0
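Each Accuracy line is the fraction of test samples whose predicted label matches the true label. A minimal sketch of computing it with scikit-learn's accuracy_score; the Iris dataset, logistic regression, and the 80/20 split are assumptions, since the models behind the scores above are not named, and iris_accuracy is a hypothetical helper. A score of 1.0 on a small test split is common for Iris but does not guarantee perfect performance on new data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def iris_accuracy(random_state=42):
    """Train a classifier on 80% of Iris and return its accuracy on the other 20%."""
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    # accuracy = number of correct predictions / number of test samples
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print("Accuracy:", iris_accuracy())
```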
Output
Practical Exercise 14: Hierarchical Clustering
Objective:
Hierarchical clustering builds a tree-like structure (a dendrogram) of nested clusters. There
are two main approaches:
● Agglomerative: Start with each data point as its own cluster, then iteratively
merge the closest clusters.
● Divisive: Start with one cluster and recursively split it into smaller clusters.
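The agglomerative approach can be sketched with SciPy's hierarchical clustering tools: linkage() performs the successive merges and returns one row per merge, and fcluster() cuts the resulting tree into a fixed number of flat clusters. The two synthetic blobs and Ward linkage are illustrative choices and the helper names are hypothetical; scipy.cluster.hierarchy.dendrogram can draw the tree, and scikit-learn's AgglomerativeClustering is an alternative API.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def make_two_blobs(n_per_blob=10, seed=0):
    """Illustrative data: two well-separated groups of 2-D points."""
    rng = np.random.default_rng(seed)
    blob_a = rng.normal(loc=0.0, scale=0.5, size=(n_per_blob, 2))
    blob_b = rng.normal(loc=5.0, scale=0.5, size=(n_per_blob, 2))
    return np.vstack([blob_a, blob_b])


def agglomerative_labels(X, n_clusters, method="ward"):
    """Merge points bottom-up, then cut the tree into n_clusters flat clusters."""
    # Each row of Z records one merge: the two cluster ids, the distance
    # between them, and the number of points in the new cluster.
    Z = linkage(X, method=method)
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return Z, labels


if __name__ == "__main__":
    X = make_two_blobs()
    Z, labels = agglomerative_labels(X, n_clusters=2)
    print("Number of merges:", len(Z))  # one fewer than the number of points
    print("Cluster labels:", labels)
```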
Output
Practical Exercise 15: Dimensionality Reduction
Objective:
Many real-world datasets have a large number of features, which makes
analysis complex. Dimensionality reduction techniques reduce the number of
features while retaining as much of the important information as possible.
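Principal Component Analysis (PCA) is one common dimensionality reduction technique. A minimal sketch with scikit-learn, assuming the Iris dataset (four features reduced to two); explained_variance_ratio_ reports the share of the variance each retained component keeps. Standardizing first is a design choice so that features on larger scales do not dominate the components, and reduce_dimensions is a hypothetical helper name.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def reduce_dimensions(X, n_components=2):
    """Standardize X and project it onto its first n_components principal components."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components)
    X_reduced = pca.fit_transform(X_scaled)
    return X_reduced, pca.explained_variance_ratio_


if __name__ == "__main__":
    X, _ = load_iris(return_X_y=True)  # 150 samples, 4 features
    X_reduced, ratios = reduce_dimensions(X, n_components=2)
    print("Original shape:", X.shape)
    print("Reduced shape:", X_reduced.shape)
    print("Explained variance ratio:", ratios)
    print("Total variance retained: {:.2f}".format(ratios.sum()))
```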
# Best hyperparameters found by the grid search (grid_search is a fitted GridSearchCV)
print("Best Parameters: ", grid_search.best_params_)
print("Best Cross-Validation Score: {:.2f}".format(grid_search.best_score_))
# Best hyperparameters found by the randomized search (random_search is a fitted RandomizedSearchCV)
print("Best Parameters: ", random_search.best_params_)
print("Best Cross-Validation Score: {:.2f}".format(random_search.best_score_))
Output
##############################################
Best hyperparameters
##############################################
Best Parameters: {'n_estimators': 200, 'max_depth': 6, 'criterion': 'entropy', 'bootstrap': True}
Best Cross-Validation Score: 0.94
##############################################
Test the best model on the test set
##############################################
Test Accuracy: 1.00
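The printed snippets use grid_search and random_search without showing how they are built. A sketch of how both could be created and fitted for a RandomForestClassifier, using the same hyperparameter names that appear in the Best Parameters output. The Iris dataset, the value ranges, cv=3, and n_iter=5 are assumptions chosen to keep the run short, so the printed results will differ from the output above.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Grid search: every combination (2 * 2 * 2 * 2 = 16) is scored with 3-fold CV
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, 6],
    "criterion": ["gini", "entropy"],
    "bootstrap": [True, False],
}
grid_search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
grid_search.fit(X_train, y_train)
print("Best Parameters: ", grid_search.best_params_)
print("Best Cross-Validation Score: {:.2f}".format(grid_search.best_score_))

# Randomized search: samples n_iter combinations instead of trying them all;
# n_estimators is drawn uniformly from the integers 20..100
param_dist = {
    "n_estimators": randint(20, 101),
    "max_depth": [3, 6, None],
    "criterion": ["gini", "entropy"],
    "bootstrap": [True, False],
}
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42), param_dist,
    n_iter=5, cv=3, random_state=42,
)
random_search.fit(X_train, y_train)
print("Best Parameters: ", random_search.best_params_)
print("Best Cross-Validation Score: {:.2f}".format(random_search.best_score_))

# Test the best model on the held-out test set
best_model = random_search.best_estimator_
test_accuracy = accuracy_score(y_test, best_model.predict(X_test))
print("Test Accuracy: {:.2f}".format(test_accuracy))
```

Grid search is exhaustive and grows multiplicatively with each added value, while randomized search caps the cost at n_iter candidates, which is why it is often preferred for larger search spaces.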
Key Concepts:
# Best hyperparameters
print("Best Parameters: ", random_search.best_params_)
print("Best Cross-Validation Score: {:.2f}".format(random_search.best_score_))
Output
Test Accuracy: 1.00