MLCyber Lab
Laboratory Exercises
Kuldeep J. Purohit
November 25, 2024
Contents
1 Data Manipulation and Statistical Analysis
1 Data Manipulation and Statistical Analysis
Objective
Apply Python programming skills to manipulate data using lists, dictionaries, and the
pandas library. Perform statistical operations such as mean, median, mode, standard
deviation, and variance.
Tasks
1. Create a dictionary representing a dataset of students with their names, ages, and
scores. Convert it into a pandas DataFrame and display the data.
2. Calculate the following statistical measures for the Score column:
• Mean
• Median
• Mode
• Standard deviation
• Variance
3. Plot a histogram showing the distribution of scores.
Solution
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Data dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [23, 22, 24, 23, 22],
    'Score': [88, 92, 85, 91, 89]
}

# Convert to DataFrame
df = pd.DataFrame(data)
print(df)

# Calculate statistical measures
mean_score = np.mean(df['Score'])
median_score = np.median(df['Score'])
mode_score = df['Score'].mode()[0]
std_dev_score = np.std(df['Score'])   # population standard deviation (ddof=0)
variance_score = np.var(df['Score'])  # population variance (ddof=0)

# Print statistics
print(f"Mean: {mean_score}")
print(f"Median: {median_score}")
print(f"Mode: {mode_score}")
print(f"Standard Deviation: {std_dev_score}")
print(f"Variance: {variance_score}")

# Plot histogram
plt.hist(df['Score'], bins=5, color='blue', alpha=0.7)
plt.title('Distribution of Scores')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
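Note that np.std and np.var default to the population convention (ddof=0), whereas pandas' Series.std and Series.var default to the sample convention (ddof=1). A short sketch contrasting the two on the same column:

# Population (ddof=0) vs. sample (ddof=1) statistics on the Score column
print(f"Population std: {df['Score'].std(ddof=0)}, sample std: {df['Score'].std()}")
print(f"Population var: {df['Score'].var(ddof=0)}, sample var: {df['Score'].var()}")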
Tasks
1. Represent the system of equations 2x + 3y = 5 and x − y = 1 as a matrix equation
Ax = b.
2. Solve the system for x and y using NumPy.
3. Verify the solution by substituting x and y back into the original equations.
Solution
import numpy as np

# Coefficients matrix A
A = np.array([[2, 3], [1, -1]])

# Constants vector b
b = np.array([5, 1])

# Solve for x and y
solution = np.linalg.solve(A, b)
print(f"Solution: x = {solution[0]}, y = {solution[1]}")

# Verify the solution: A @ solution should reproduce b
check = np.dot(A, solution)
print(f"Verification: {check}")
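Because the computed solution is floating point, an exact equality check against b can fail; np.allclose compares within tolerance. For example:

# Tolerance-aware verification that A @ solution reproduces b
print(f"Verified: {np.allclose(A @ solution, b)}")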
Tasks
1. Create vectors and perform:
• Dot product
• Element-wise addition
• Cross product
2. Create matrices and perform:
• Matrix multiplication
• Transpose
• Inverse (if invertible)
3. Compute eigenvalues and eigenvectors of a random 3 × 3 matrix.
Solution
import numpy as np

# Vector operations
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])

dot_product = np.dot(v1, v2)
elementwise_addition = v1 + v2
cross_product = np.cross(v1, v2)

print(f"Dot product: {dot_product}")
print(f"Element-wise addition: {elementwise_addition}")
print(f"Cross product: {cross_product}")
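The extracted solution stops after the vector operations. A minimal sketch covering tasks 2 and 3 follows; the matrix values are illustrative, not taken from the original:

# Matrix operations (task 2)
M1 = np.array([[1, 2], [3, 4]])
M2 = np.array([[5, 6], [7, 8]])

matrix_product = M1 @ M2        # matrix multiplication
transpose = M1.T                # transpose
inverse = np.linalg.inv(M1)     # inverse (M1 is invertible: det = -2)

print(f"Product:\n{matrix_product}")
print(f"Transpose:\n{transpose}")
print(f"Inverse:\n{inverse}")

# Eigenvalues and eigenvectors of a random 3x3 matrix (task 3)
R = np.random.rand(3, 3)
eigenvalues, eigenvectors = np.linalg.eig(R)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")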
Tasks
1. Generate synthetic data for y = 2x + 1 with random noise.
2. Visualize the data using matplotlib.
3. Implement the linear regression (normal equation) formula:
θ = (XᵀX)⁻¹ Xᵀ y
4. Evaluate the fit using Mean Squared Error (MSE).
Solution
# Generate synthetic data for y = 2x + 1 with noise
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = 2 * X + 1 + np.random.randn(100, 1)

# Visualize the data
plt.scatter(X, y, color='blue')
plt.title('Generated Data')
plt.xlabel('X')
plt.ylabel('y')
plt.show()

# Linear regression via the normal equation
X_bias = np.c_[np.ones((X.shape[0], 1)), X]  # prepend a bias column of ones
theta = np.linalg.inv(X_bias.T.dot(X_bias)).dot(X_bias.T).dot(y)

# Predictions
y_pred = X_bias.dot(theta)

# MSE
mse = np.mean((y - y_pred) ** 2)
print(f"Mean Squared Error: {mse}")
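Explicitly inverting XᵀX can be numerically fragile when features are correlated; np.linalg.lstsq solves the same least-squares problem more stably and serves as a sanity check on θ:

# Cross-check the normal-equation result with a stable least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X_bias, y, rcond=None)
print(f"Normal equation: {theta.ravel()}")
print(f"lstsq:           {theta_lstsq.ravel()}")  # should agree closely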
Tasks
1. Research and Present AI Applications: List at least 5 applications of AI and
ML in different domains (e.g., healthcare, finance, transportation). Write a
brief explanation of how AI/ML is used in each application.
2. Classify Types of Learning: Describe and compare supervised learning, unsupervised
learning, and reinforcement learning. For each type of learning, provide a
real-world example. Create a table summarizing the types of learning.
3. Hands-on Task: Load a simple dataset (e.g., the Iris dataset) using scikit-learn
and visualize the features.
from sklearn import datasets
import matplotlib.pyplot as plt

# Load the Iris dataset and plot the first two features
iris = datasets.load_iris()
X, y = iris.data, iris.target
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.show()
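To view all four features at once, a pair plot is a common next step; a sketch using seaborn (assumed installed):

import seaborn as sns
import pandas as pd

# Pairwise feature scatter plots, colored by species
iris_df = pd.DataFrame(X, columns=iris.feature_names)
iris_df['species'] = [iris.target_names[i] for i in y]
sns.pairplot(iris_df, hue='species')
plt.show()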
Tasks
1. Data Cleaning: Load a dataset (e.g., Titanic dataset from Kaggle). Check for
missing values and apply methods to handle them (e.g., fill with mean or drop
rows).
import pandas as pd

# Load Titanic dataset
df = pd.read_csv('titanic.csv')

# Check for missing values
print(df.isnull().sum())

# Fill missing 'Age' values with the mean
# (assignment avoids the chained-assignment pitfalls of inplace=True on a column)
df['Age'] = df['Age'].fillna(df['Age'].mean())
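The task also mentions dropping rows as an alternative to imputation; both directions, plus a mode fill for a categorical column, might look like this (column names follow the standard Kaggle file):

# Alternative: drop rows where 'Age' is missing instead of imputing
df_dropped = df.dropna(subset=['Age'])

# For a categorical column such as 'Embarked', the mode is a common fill value
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])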
3. Feature Selection: Use correlation analysis or feature importance (e.g., decision
trees) to select relevant features.
import seaborn as sns
import matplotlib.pyplot as plt

# Correlation heatmap over the numeric columns
corr = df.select_dtypes(include='number').corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()
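For the tree-based route the task mentions, a minimal sketch ranking the numeric Titanic columns by importance (again assuming the standard Kaggle column names):

from sklearn.ensemble import RandomForestClassifier

# Rank numeric features by importance for predicting survival
features = df[['Pclass', 'Age', 'SibSp', 'Parch', 'Fare']].fillna(0)
target = df['Survived']
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(features, target)
for name, imp in sorted(zip(features.columns, rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")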
Tasks
1. Regression with Linear Regression: Use the Boston Housing dataset from
scikit-learn to perform linear regression and predict house prices. Evaluate the
model using Mean Squared Error (MSE).
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2;
# on newer versions see the alternative sketch below
boston = load_boston()
X = boston.data
y = boston.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
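Since load_boston is unavailable in scikit-learn 1.2 and later, the same exercise runs unchanged on the California Housing dataset; a drop-in sketch:

from sklearn.datasets import fetch_california_housing

# Substitute dataset for scikit-learn 1.2+ (downloads on first use)
housing = fetch_california_housing()
X, y = housing.data, housing.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print(f"MSE: {mean_squared_error(y_test, model.predict(X_test))}")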
2. Classification with Logistic Regression: Use the Iris dataset for classification
with Logistic Regression. Evaluate the model using accuracy and confusion matrix.
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Load dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(f"Confusion Matrix:\n{confusion_matrix(y_test, y_pred)}")
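Beyond overall accuracy, per-class precision and recall are worth inspecting; a short follow-up:

from sklearn.metrics import classification_report

# Per-class precision, recall, and F1 for the Iris predictions
print(classification_report(y_test, y_pred, target_names=iris.target_names))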
Tasks
1. Clustering with K-Means: Apply K-Means clustering on the Iris dataset and
visualize the clusters.
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Apply KMeans clustering (X is the Iris feature matrix from above)
kmeans = KMeans(n_clusters=3, random_state=42)
y_kmeans = kmeans.fit_predict(X)

# Visualize the clusters on the first two features
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=200, c='red', marker='x')
plt.title("K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
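A quick quantitative check on cluster quality is the silhouette score, for example:

from sklearn.metrics import silhouette_score

# Mean silhouette coefficient: closer to 1 means tighter, better-separated clusters
print(f"Silhouette score: {silhouette_score(X, y_kmeans):.3f}")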
2. Dimensionality Reduction with PCA: Reduce the Iris data to two principal
components and visualize the result.

from sklearn.decomposition import PCA

# Reduce the Iris features to two principal components
# (the opening lines of this block were lost in extraction; this is a
# minimal reconstruction consistent with the surviving code)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Visualize the reduced data
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis')
plt.title("PCA - Iris Data")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.show()
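It is also worth reporting how much of the original variance the two components retain, for instance:

# Fraction of total variance captured by each principal component
print(f"Explained variance ratio: {pca.explained_variance_ratio_}")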
Tasks
1. Model Evaluation with Cross-Validation: Apply cross-validation to evaluate
the performance of a classification model (e.g., SVM or Random Forest).
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Apply 5-fold cross-validation (X, y are the Iris data from above)
rf = RandomForestClassifier(n_estimators=100)
scores = cross_val_score(rf, X, y, cv=5)
print(f"Cross-validated accuracy: {scores.mean()}")
2. Hyperparameter Tuning with GridSearchCV: Use grid search to tune the
hyperparameters of an SVM classifier.

from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# The opening lines of this block were lost in extraction; the model and
# grid below are a plausible reconstruction with illustrative values
svm = SVC()
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Apply GridSearchCV
grid_search = GridSearchCV(svm, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Display best hyperparameters
print(f"Best Hyperparameters: {grid_search.best_params_}")