Machine Learning Lab
Vision
To develop competent professionals with strong fundamentals in Information
Science and Engineering, interdisciplinary research and ethical values for the
betterment of society.
Mission
M1- To establish a transformational learning ambience with good infrastructure
facilities to impart knowledge and the necessary skill set to produce competent
professionals.
M2- To create a new generation of engineers who excel in their careers with
leadership and entrepreneurial qualities.
Syllabus
Course: Machine Learning Lab              Semester: 6
Course Code: BCSL606                      CIE Marks: 50
Teaching Hours/Week (L:T:P:S): 0:0:2:0    SEE Marks: 50
Credits: 01                               Total Marks: 100
Examination nature (SEE): Practical
Course objectives:
1. To become familiar with data and visualize univariate, bivariate, and multivariate data using statistical
techniques and dimensionality reduction.
2. To understand various machine learning algorithms such as similarity-based learning, regression, decision
trees, and clustering.
3. To become familiar with learning theories and probability-based models, and to develop the skills required for
decision-making in dynamic environments.
SETTING UP BASIC COMMANDS:
Department of Information Science & Engineering, Atria Institute of Technology
Machine Learning Lab BCSL606
CONTENTS
1 Program No.1
2 Program No.2
3 Program No.3
4 Program No.4
5 Program No.5
6 Program No.6
7 Program No.7
8 Program No.8
9 Program No.9
10 Program No.10
12 VIVA QUESTIONS
Program 1: Develop a program to create histograms for all numerical features and
analyze the distribution of each feature. Generate box plots for all numerical features
and identify any outliers. Use California Housing dataset.
Code:
# Import necessary libraries
import numpy as np # For numerical computations
import pandas as pd # For handling tabular data
import matplotlib.pyplot as plt # For data visualization
import seaborn as sns # For enhanced statistical plots
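The listing above shows only the imports. As a hedged sketch of the outlier step, the function below applies the 1.5 x IQR rule, which is the default whisker rule used when box plots are drawn. The name iqr_outliers and the sample values are hypothetical; they stand in for one numerical column and are not taken from the California Housing dataset.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep only the points beyond either fence
    return values[(values < lower) | (values > upper)]

# Hypothetical sample standing in for one numerical feature
sample = [2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.2, 3.3, 15.0]
print(iqr_outliers(sample))
```

Applying the same function to every numerical column gives a per-feature outlier count that can be compared with what the box plots show.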
OUTPUT:
Code:
# Import necessary libraries
import pandas as pd # For handling tabular data efficiently
import seaborn as sns # For visualization, including correlation heatmaps and pair plots
import matplotlib.pyplot as plt # For handling plots and customizing visualizations
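The import comments above mention correlation heatmaps. As a hedged sketch of the quantity such a heatmap displays, the snippet below computes a Pearson correlation matrix with NumPy. The array data is hypothetical; in the lab the matrix would be computed from the dataset's numerical columns.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are three features
data = np.array([
    [1.0, 2.0, 9.0],
    [2.0, 4.1, 7.0],
    [3.0, 6.2, 6.5],
    [4.0, 7.9, 4.0],
    [5.0, 10.1, 2.0],
])

# rowvar=False treats each column as one variable
corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 2))
```

The diagonal is always 1 and the matrix is symmetric, so a heatmap carries its information in one triangle only.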
OUTPUT:
Code:
# Import necessary libraries
import numpy as np # For numerical operations
import pandas as pd # For handling tabular data
import matplotlib.pyplot as plt # For plotting
import seaborn as sns # For better visualizations
from sklearn.decomposition import PCA # PCA algorithm
from sklearn.preprocessing import StandardScaler # Standardization
# Step 7: Print explained variance ratio (How much information each principal component retains)
print("\nExplained Variance Ratio of PCA Components:", pca.explained_variance_ratio_)
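The steps between the imports and Step 7 are not shown in the listing. As a hedged sketch of what the explained variance ratio measures, the function below standardizes the data, takes the eigenvalues of its covariance matrix, and divides each by their sum. The function name and the synthetic X_demo are assumptions made for illustration.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    X = np.asarray(X, dtype=float)
    # Standardize each feature to zero mean and unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Covariance matrix of the standardized features
    cov = np.cov(Z, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigenvalues = np.linalg.eigh(cov)[0][::-1]
    return eigenvalues / eigenvalues.sum()

# Synthetic data: two strongly related features plus one independent feature
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X_demo = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])
ratios = explained_variance_ratio(X_demo)
print(ratios)
```

Because two of the three features are nearly redundant, most of the variance falls on the first two components and the last ratio is close to zero.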
OUTPUT:
Program 4: For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Find-S algorithm to output a description of the set of all hypotheses
consistent with the training examples.
Code:
# Import necessary libraries
import pandas as pd # For data handling
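The listing above shows only the import. A minimal sketch of Find-S follows, assuming each training example is a tuple of attribute values with a label and that the label "Yes" marks a positive example. The training rows are hypothetical and are not the contents of the lab's CSV file.

```python
def find_s(examples):
    """Return the most specific hypothesis consistent with the positive examples.

    Each example is (attribute_tuple, label); the label "Yes" marks a positive example.
    """
    hypothesis = None
    for attributes, label in examples:
        if label != "Yes":
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # start from the first positive example
        else:
            # Generalize every attribute that disagrees to "?"
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

# Hypothetical training data
training = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(training))
```

With rows loaded through pandas, the same function could be fed tuples built from the feature columns and the target column.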
OUTPUT:
Code:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
# Step 2: Label the first 50 points and assign labels to the entire dataset
y = np.where(X[:50] <= 0.5, 1, 2) # First 50 points: Class 1 if x <= 0.5, otherwise Class 2
y = np.concatenate([y, np.where(X[50:] <= 0.5, 1, 2)]) # Remaining 50 points: same rule
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k) # Initialize KNN model
    knn.fit(X_train, y_train) # Train the model
    y_pred = knn.predict(X_test) # Predict on test data
    predictions[k] = y_pred # Store predictions
for k, pred in predictions.items():
    print(f"\nPredictions for k={k}:")
    print(pred)
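To show what the classifier does for each test point, here is a hedged NumPy sketch of k-nearest-neighbour voting on 1-D data. The function knn_predict and the demo arrays are hypothetical, and the tie-breaking rule (the smaller label wins) is an assumption of this sketch rather than a statement about scikit-learn.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Classify 1-D points by majority vote among the k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for x in np.asarray(X_test, dtype=float):
        distances = np.abs(X_train - x)                     # distance to every training point
        nearest = np.argsort(distances, kind="stable")[:k]  # indices of the k closest
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])       # ties go to the smaller label
    return np.array(predictions)

# Hypothetical points labelled with the same rule as above (Class 1 if x <= 0.5)
X_demo = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
y_demo = np.where(X_demo <= 0.5, 1, 2)
print(knn_predict(X_demo, y_demo, [0.15, 0.85, 0.45], k=3))
```

Small values of k follow individual neighbours closely, while larger values smooth the decision near the 0.5 boundary.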
OUTPUT:
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Select the features for regression (for simplicity, let's use 'age' as feature and 'trestbps' as target)
X = df['age'].values.reshape(-1, 1) # Feature: 'age'
y = df['trestbps'].values # Target: 'trestbps'
return theta
# Predictions function
def predict(X, theta):
    """
    Predict the target using the trained model coefficients.
    """
    X = np.hstack((np.ones((X.shape[0], 1)), X)) # Add bias term (intercept)
    return X @ theta
# Generate predictions
y_pred = predict(X, theta)
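The routine that produces theta is not shown in the listing. As a hedged sketch, the function below fits locally weighted least squares around a single query point, the technique the viva questions ask about; whether the original program used exactly this form is an assumption. The names locally_weighted_theta, tau and the demo arrays are hypothetical.

```python
import numpy as np

def locally_weighted_theta(X, y, query, tau=1.0):
    """Fit weighted least squares around one query point and return theta."""
    X_b = np.hstack((np.ones((X.shape[0], 1)), X))  # bias column, as in predict()
    # Gaussian kernel: training points near the query get weights close to 1
    weights = np.exp(-np.sum((X - query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(weights)
    # Weighted normal equation: theta = (X^T W X)^+ X^T W y
    return np.linalg.pinv(X_b.T @ W @ X_b) @ X_b.T @ W @ y

# Hypothetical data following y = 3 + 2x exactly
X_demo = np.arange(5, dtype=float).reshape(-1, 1)
y_demo = 3.0 + 2.0 * X_demo.ravel()
theta_demo = locally_weighted_theta(X_demo, y_demo, query=np.array([2.0]), tau=1.0)
print(theta_demo)
```

A small tau makes the fit depend almost entirely on points near the query, while a very large tau gives every point nearly equal weight and approaches ordinary least squares.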
OUTPUT:
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.impute import SimpleImputer # For handling missing values
# Load datasets
boston_data_path = "/content/drive/MyDrive/dataset/6TH SEM/BostonHousing.csv"
auto_mpg_data_path = "/content/drive/MyDrive/dataset/6TH SEM/auto-mpg.csv"
poly_reg = LinearRegression()
poly_reg.fit(X_train_poly, y_train_auto)
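The lines above fit a linear model on polynomial features. As a hedged illustration of that idea without scikit-learn, the sketch below builds the feature columns by hand and solves the least-squares problem with NumPy. The helper names and the synthetic quadratic data are assumptions; they do not come from the Boston Housing or Auto MPG files.

```python
import numpy as np

def polynomial_features(x, degree):
    """Columns [1, x, x^2, ..., x^degree] for a 1-D input."""
    return np.vander(np.asarray(x, dtype=float), N=degree + 1, increasing=True)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data following y = 1 - 2x + 0.5x^2
x_demo = np.linspace(-2, 2, 9)
y_demo = 1.0 - 2.0 * x_demo + 0.5 * x_demo ** 2

# Degree-1 fit (a straight line) versus degree-2 fit
coef_lin, *_ = np.linalg.lstsq(polynomial_features(x_demo, 1), y_demo, rcond=None)
coef_quad, *_ = np.linalg.lstsq(polynomial_features(x_demo, 2), y_demo, rcond=None)

r2_lin = r2(y_demo, polynomial_features(x_demo, 1) @ coef_lin)
r2_quad = r2(y_demo, polynomial_features(x_demo, 2) @ coef_quad)
print(coef_quad, r2_lin, r2_quad)
```

The model stays linear in its coefficients; only the inputs are expanded, which is why an ordinary linear solver can fit a curve.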
OUTPUT:
Code:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Data Preprocessing
X = dataset.drop(['diagnosis'], axis=1) # Features (excluding the target column 'diagnosis')
y = dataset['diagnosis'] # Target variable: 'diagnosis'
# Data Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Make Predictions
y_pred = clf.predict(X_test_scaled)
# Model Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
# Example: Classify a new sample (this matches the structure of the original dataset)
new_sample = pd.DataFrame({
    'radius_mean': [15.0], # Example values for features
    'texture_mean': [20.0],
    'perimeter_mean': [100.0],
    'area_mean': [500.0],
    'smoothness_mean': [0.1],
    'compactness_mean': [0.2],
    'concavity_mean': [0.3],
    'concave points_mean': [0.15],
    'symmetry_mean': [0.3],
    'fractal_dimension_mean': [0.05],
    'radius_se': [0.1],
    'texture_se': [0.2],
    'perimeter_se': [0.05],
    'area_se': [200.0],
    'smoothness_se': [0.02],
    'compactness_se': [0.05],
    'concavity_se': [0.1],
    'concave points_se': [0.03],
    'symmetry_se': [0.06],
    'fractal_dimension_se': [0.02],
    'radius_worst': [25.0],
    'texture_worst': [25.0],
    'perimeter_worst': [150.0],
    'area_worst': [1200.0],
    'smoothness_worst': [0.1],
    'compactness_worst': [0.5],
    'concavity_worst': [0.4],
    'concave points_worst': [0.3],
    'symmetry_worst': [0.5],
    'fractal_dimension_worst': [0.1]
})
# Scale the new sample using the same scaler (fit during training)
new_sample_scaled = scaler.transform(new_sample)
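The step above reuses the scaler fitted on the training data instead of fitting a new one. A minimal sketch of why that matters follows, assuming min-max scaling as suggested by the MinMaxScaler import. The class SimpleMinMaxScaler is a hypothetical stand-in written for illustration, and the numbers are made up.

```python
import numpy as np

class SimpleMinMaxScaler:
    """Hypothetical stand-in for min-max scaling to [0, 1] with fit/transform."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.data_min_ = X.min(axis=0)
        self.data_range_ = X.max(axis=0) - self.data_min_
        self.data_range_[self.data_range_ == 0] = 1.0  # constant columns: avoid dividing by zero
        return self

    def transform(self, X):
        # Uses the minimum and range learned from the training data only
        return (np.asarray(X, dtype=float) - self.data_min_) / self.data_range_

train = np.array([[10.0, 100.0], [20.0, 300.0], [30.0, 500.0]])
new = np.array([[15.0, 600.0]])

scaler_demo = SimpleMinMaxScaler().fit(train)
scaled_train = scaler_demo.transform(train)
scaled_new = scaler_demo.transform(new)
print(scaled_new)
```

A new sample is mapped with the training minimum and range, so a value beyond the training maximum lands above 1. Fitting a fresh scaler on the single new row would discard that reference frame.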
OUTPUT:
Code:
import scipy.io
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Predictions
y_pred = nb_classifier.predict(X_test)
# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Naive Bayes Classifier Accuracy: {accuracy * 100:.2f}%\n")
for i in range(10):
    axes[i].imshow(X_test[i].reshape(64, 64), cmap='gray')
    axes[i].set_title(f"Pred: {y_pred[i]}\nActual: {y_test[i]}")
    axes[i].axis('off')
plt.tight_layout()
plt.show()
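The classifier imported above is Gaussian Naive Bayes. As a hedged sketch of the computation behind it, the class below estimates a prior, a mean and a variance per class and feature, then picks the class with the highest log-posterior. TinyGaussianNB and the synthetic two-class data are hypothetical, and the 1e-9 variance floor is an assumption of this sketch.

```python
import numpy as np

class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes for illustration only."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant keeps every variance strictly positive
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict(self, X):
        log_posteriors = []
        for prior, mean, var in zip(self.priors_, self.means_, self.vars_):
            # Features are treated as independent, so log-likelihoods add up
            log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)
            log_posteriors.append(np.log(prior) + log_lik)
        return self.classes_[np.argmax(np.array(log_posteriors), axis=0)]

# Synthetic, well-separated classes
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 2)),
    rng.normal(loc=5.0, scale=1.0, size=(50, 2)),
])
y_demo = np.array([0] * 50 + [1] * 50)

model = TinyGaussianNB().fit(X_demo, y_demo)
pred = model.predict(np.array([[0.2, -0.1], [4.8, 5.3]]))
print(pred)
```

For 64 x 64 images each pixel is one feature, so the model stores 4096 means and 4096 variances per class.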
OUTPUT:
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
# Load dataset
dataset_path = '/content/drive/MyDrive/dataset/breastcancer_modified.csv'
df = pd.read_csv(dataset_path)
df['PCA1'] = X_pca[:, 0]
df['PCA2'] = X_pca[:, 1]
plt.figure(figsize=(8, 6))
sns.scatterplot(x=df['PCA1'], y=df['PCA2'], hue=df['Cluster'], palette='viridis', s=80)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], c='red', marker='X', s=200,
            label="Centroids")
plt.xlabel('PCA Component 1')
plt.ylabel('PCA Component 2')
plt.title('K-Means Clustering Visualization')
plt.legend()
plt.show()
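The clustering step itself is not shown in the listing. A minimal NumPy sketch of the k-means loop follows, alternating an assignment step and a centroid update until the centroids stop moving. The function kmeans, its parameters and the six demo points are hypothetical, and random initialization from the data points is an assumption of this sketch.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Return (labels, centroids) after running the basic k-means loop."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Two small, well-separated groups of points
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels_demo, centroids_demo = kmeans(X_demo, k=2)
print(labels_demo)
print(centroids_demo)
```

Centroids live in whatever space the algorithm was fitted in. If the clustering is fitted on the scaled original features, the centroids would need to be projected with the same PCA before being drawn on PCA axes.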
OUTPUT:
VIVA QUESTIONS:
1. What is a histogram, and how does it help in data analysis?
2. How do box plots help in identifying outliers in a dataset?
3. What is the significance of the correlation matrix in data analysis?
4. How does a heatmap help in visualizing the correlation matrix?
5. What is a pair plot, and how does it help in feature analysis?
6. What is Principal Component Analysis (PCA), and why is it used?
7. How does PCA reduce dimensionality while preserving variance?
8. What are eigenvalues and eigenvectors in PCA?
9. Explain the Find-S algorithm and its working principle.
10. What are the assumptions of the Find-S algorithm?
11. What is k-Nearest Neighbors (KNN), and how does it work?
12. How does the choice of k affect the performance of the KNN algorithm?
13. What is the difference between parametric and non-parametric models?
14. How does Locally Weighted Regression differ from standard linear regression?
15. What are the advantages of Locally Weighted Regression?
16. What is the difference between linear regression and polynomial regression?
17. How do you evaluate the performance of a regression model?
18. What are the main components of a decision tree?
19. How does a decision tree split data at each node?
20. What is entropy in the context of decision tree algorithms?
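As a worked aid for questions 19 and 20, the hedged sketch below computes entropy and the information gain of a candidate split. The function names and the toy labels ("M" and "B") are hypothetical. Note that scikit-learn's DecisionTreeClassifier uses Gini impurity unless criterion="entropy" is requested.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder

# Toy node with two classes in equal proportion
parent = ["M", "M", "B", "B"]
perfect_split = [["M", "M"], ["B", "B"]]
useless_split = [["M", "B"], ["M", "B"]]

print(entropy(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A split that separates the classes completely recovers all of the parent's entropy, while a split that leaves each child as mixed as the parent gains nothing.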