Machine Learning Lab Manual-1
[email protected] www.btibangalore.org
Phone: 7090404050
PREPARED BY:
Mrs. Dhivya C
VISION
To impart the best in academia that empowers the students of Computer Science and
Engineering to contribute their best to society.
MISSION
To mold the students into responsible computing professionals and citizens by providing an
excellent soft-skill learning environment.
To equip the students with wisdom, both theoretical and practical, in the discipline of
computing and the ability to apply knowledge for the benefit of society.
To inculcate technical capabilities, ethical values, and leadership abilities for meeting the
current and future demands of industry and society.
Design and develop dynamic, interactive, and responsive websites using HTML, CSS,
JavaScript, and modern front-end frameworks like React or Angular.
Develop server-side applications using technologies like Node.js, PHP, or Python, and
integrate them with databases (SQL/NoSQL) for data storage and management.
MACHINE LEARNING LABORATORY
SEMESTER – VI
4. For a given set of training data examples stored in a .CSV file, implement and
   demonstrate the Find-S algorithm to output a description of the set of all hypotheses
   consistent with the training examples.
5. Develop a program to implement k-Nearest Neighbour algorithm to classify the randomly
   generated 100 values of x in the range of [0,1]. Perform the following based on dataset
   generated.
   a. Label the first 50 points {x1,……,x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1,
      else xi ∊ Class2
   b. Classify the remaining points, x51,……,x100 using KNN. Perform this for
      k=1,2,3,4,5,20,30
6. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
   points. Select appropriate data set for your experiment and draw graphs.
7. Develop a program to demonstrate the working of Linear Regression and Polynomial
   Regression. Use Boston Housing Dataset for Linear Regression and Auto MPG Dataset (for
   vehicle fuel efficiency prediction) for Polynomial Regression.
8. Develop a program to demonstrate the working of the decision tree algorithm. Use Breast
   Cancer Data set for building the decision tree and apply this knowledge to classify a new
   sample.
9. Develop a program to implement the Naive Bayesian classifier considering Olivetti Face Data
   set for training. Compute the accuracy of the classifier, considering a few test data sets.
10. Develop a program to implement k-means clustering using Wisconsin Breast Cancer data set
    and visualize the clustering result.
● Illustrate the principles of multivariate data and apply dimensionality reduction techniques.
● Demonstrate similarity-based learning methods and perform regression analysis.
● Develop decision trees for classification and regression problems, and Bayesian models for
probabilistic learning.
● Implement the clustering algorithms to share computing resources.
Conduct of Practical Examination:
● Experiment distribution
o For laboratories having only one part: Students are allowed to pick one experiment from
the lot with equal opportunity.
o For laboratories having PART A and PART B: Students are allowed to pick one
experiment from PART A and one experiment from PART B, with equal opportunity.
● Change of experiment is allowed only once, and the marks allotted for the procedure of the
changed part only are made zero.
● Marks Distribution (Need to change in accordance with university regulations)
a) For laboratories having only one part – Procedure + Execution + Viva-Voce: 15+70+15 =
100 Marks
b) For laboratories having PART A and PART B
i. Part A – Procedure + Execution + Viva = 6 + 28 + 6 = 40 Marks
ii. Part B – Procedure + Execution + Viva = 9 + 42 + 9 = 60 Marks
INDEX
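import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset as a DataFrame
data = fetch_california_housing(as_frame=True)
housing_df = data.frame
numerical_features = housing_df.select_dtypes(include=[np.number]).columns

# Plot histograms
plt.figure(figsize=(15, 10))
for i, feature in enumerate(numerical_features):
    plt.subplot(3, 3, i + 1)
    sns.histplot(housing_df[feature], kde=True, bins=30, color='blue')
    plt.title(f'Distribution of {feature}')
plt.tight_layout()
plt.show()

# Plot box plots
plt.figure(figsize=(15, 10))
for i, feature in enumerate(numerical_features):
    plt.subplot(3, 3, i + 1)
    sns.boxplot(x=housing_df[feature], color='orange')
    plt.title(f'Box Plot of {feature}')
plt.tight_layout()
plt.show()

# Count outliers per feature with the 1.5 * IQR rule
print("Outliers Detection:")
outliers_summary = {}
for feature in numerical_features:
    Q1 = housing_df[feature].quantile(0.25)
    Q3 = housing_df[feature].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    outliers = housing_df[(housing_df[feature] < lower_bound) | (housing_df[feature] > upper_bound)]
    outliers_summary[feature] = len(outliers)
    print(f"{feature}: {outliers_summary[feature]} outliers")

print("\nDataset Summary:")
print(housing_df.describe())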
Output:
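The 1.5 × IQR outlier rule used in the program above can be checked by hand on a small list. The sketch below is only an illustration: the function name iqr_outliers and the sample values are assumptions and are not part of the manual. It uses the standard library instead of pandas; method="inclusive" interpolates linearly between data points, which is also the default behaviour of the pandas quantile() call.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return the values lying outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    # n=4 gives the three quartile cut points; only Q1 and Q3 are needed
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


sample = [1, 2, 3, 4, 100]   # hypothetical data, not from the housing dataset
print(iqr_outliers(sample))  # prints [100]
```

For the sample list, Q1 = 2 and Q3 = 4, so the bounds are -1 and 7 and only the value 100 is reported.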
2. Develop a program to compute the correlation matrix to understand the
relationships between pairs of features. Visualize the correlation matrix using a
heatmap to know which variables have strong positive/negative correlations. Create a
pair plot to visualize pairwise relationships between features. Use California Housing
dataset.
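import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

california_data = fetch_california_housing(as_frame=True)
data = california_data.frame
correlation_matrix = data.corr()  # pairwise Pearson correlations
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix of California Housing Features')
plt.show()
sns.pairplot(data, diag_kind='kde')  # scatter plot for every pair of features
plt.show()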
Output:
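Each cell of the correlation matrix above is a Pearson coefficient (the default of DataFrame.corr()). As a rough illustration of what that number measures, the sketch below computes it by hand for two short lists. The function name pearson and the sample values are assumptions made for this example only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of co-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rooms = [3, 4, 5, 6]           # hypothetical values
price = [1.0, 1.5, 2.0, 2.5]   # rises in step with rooms
age = [40, 30, 20, 10]         # falls in step with rooms
print(pearson(rooms, price), pearson(rooms, age))
```

A value close to +1 means the two features rise together, a value close to -1 means one falls as the other rises, and a value near 0 means no linear relationship.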
3. Develop a program to implement Principal Component Analysis (PCA) for reducing
the dimensionality of the Iris dataset from 4 features to 2
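import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load the Iris dataset (4 features per sample)
iris = load_iris()
data = iris.data
labels = iris.target
label_names = iris.target_names

# Project the 4 features onto the first 2 principal components
pca = PCA(n_components=2)
data_reduced = pca.fit_transform(data)
reduced_df = pd.DataFrame(data_reduced, columns=['Principal Component 1', 'Principal Component 2'])
reduced_df['Label'] = labels

# Plot each class in its own color
plt.figure(figsize=(8, 6))
colors = ['r', 'g', 'b']
for i, label in enumerate(np.unique(labels)):
    subset = reduced_df[reduced_df['Label'] == label]
    plt.scatter(
        subset['Principal Component 1'],
        subset['Principal Component 2'],
        label=label_names[label],
        color=colors[i]
    )
plt.title('PCA on Iris Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.grid()
plt.show()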
Output:
4. For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Find-S algorithm to output a description of the set of all hypotheses
consistent with the training examples.
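import pandas as pd

def find_s_algorithm(file_path):
    data = pd.read_csv(file_path)
    print("Training data:")
    print(data)

    attributes = data.columns[:-1]
    class_label = data.columns[-1]

    # None marks an attribute that no positive example has set yet
    hypothesis = [None] * len(attributes)
    for _, row in data.iterrows():
        if row[class_label] == 'Yes':  # Find-S uses positive examples only
            for i, value in enumerate(row[attributes]):
                if hypothesis[i] is None or hypothesis[i] == value:
                    hypothesis[i] = value
                else:
                    hypothesis[i] = '?'  # generalize on mismatch
    return hypothesis

file_path = 'training_data.csv'
hypothesis = find_s_algorithm(file_path)
print("\nThe final hypothesis is:", hypothesis)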
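The program above needs a training_data.csv file on disk. To trace Find-S without any file, the following sketch runs the same idea on an in-memory table. The function name find_s and the sample rows (the classic EnjoySport example) are assumptions for illustration and are not taken from the manual.

```python
def find_s(examples, positive_label="Yes"):
    """Find-S: most specific hypothesis that covers every positive example.

    Each example is a tuple whose last element is the class label.
    """
    hypothesis = None
    for *attributes, label in examples:
        if label != positive_label:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example, copied as is
        else:
            # Keep matching values, generalize mismatches to '?'
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis


# Assumed sample data, not from the manual
training = [
    ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "Yes"),
    ("Sunny", "Warm", "High", "Strong", "Warm", "Same", "Yes"),
    ("Rainy", "Cold", "High", "Strong", "Warm", "Change", "No"),
    ("Sunny", "Warm", "High", "Strong", "Cool", "Change", "Yes"),
]
print(find_s(training))
# prints ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

The third row is negative and is skipped; the second and fourth rows each force one or more attributes to be generalized to '?'.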
5. Develop a program to implement k-Nearest Neighbour algorithm to classify the
randomly generated 100 values of x in the range of [0,1]. Perform the following based
on dataset generated.
a. Label the first 50 points {x1,……,x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1, else xi
∊ Class2
b. Classify the remaining points, x51,……,x100 using KNN. Perform this for
k=1,2,3,4,5,20,30
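import numpy as np
import matplotlib.pyplot as plt
from collections import Counter

# Generate 100 random values in [0, 1]
data = np.random.rand(100)
# Label the first 50 points: x <= 0.5 -> Class1, x > 0.5 -> Class2
labels = ["Class1" if x <= 0.5 else "Class2" for x in data[:50]]

def knn_classifier(train_data, train_labels, test_point, k):
    # In one dimension the Euclidean distance is the absolute difference
    distances = [(abs(test_point - train_data[i]), train_labels[i])
                 for i in range(len(train_data))]
    distances.sort(key=lambda x: x[0])
    k_nearest_neighbors = distances[:k]
    k_nearest_labels = [label for _, label in k_nearest_neighbors]
    return Counter(k_nearest_labels).most_common(1)[0][0]

train_data = data[:50]
train_labels = labels
test_data = data[50:]
k_values = [1, 2, 3, 4, 5, 20, 30]

print("Training dataset: First 50 points labeled based on the rule (x <= 0.5 -> Class1, x > 0.5 -> Class2)")
results = {}
for k in k_values:
    print(f"Results for k = {k}:")
    classified_labels = [knn_classifier(train_data, train_labels, x, k) for x in test_data]
    results[k] = classified_labels
    for i, label in enumerate(classified_labels, start=51):
        print(f"Point x{i} (value: {test_data[i - 51]:.4f}) is classified as {label}")
    print("\n")
print("Classification complete.\n")

# One plot per k: training points at level 0, classified test points at level 1
for k in k_values:
    classified_labels = results[k]
    class1_points = [x for x, label in zip(test_data, classified_labels) if label == "Class1"]
    class2_points = [x for x, label in zip(test_data, classified_labels) if label == "Class2"]
    plt.figure(figsize=(10, 6))
    plt.scatter(train_data, [0] * len(train_data),
                c=["blue" if label == "Class1" else "red" for label in train_labels],
                label="Training Data", marker="o")
    plt.scatter(class1_points, [1] * len(class1_points), c="blue", label="Class1 (Test)", marker="x")
    plt.scatter(class2_points, [1] * len(class2_points), c="red", label="Class2 (Test)", marker="x")
    plt.title(f"k-NN Classification Results for k = {k}")
    plt.xlabel("Data Points")
    plt.ylabel("Classification Level")
    plt.legend()
    plt.grid(True)
    plt.show()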
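Because the program above draws random data, its output changes on every run. The sketch below repeats the voting step on a small fixed training set so that the result can be traced by hand. The function name knn_predict and the sample points are assumptions for illustration only.

```python
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Classify scalar x by majority vote among its k nearest training points."""
    # Sort (value, label) pairs by distance to x and keep the k closest
    neighbours = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training points labelled with the rule x <= 0.5 -> Class1
train_x = [0.1, 0.3, 0.45, 0.6, 0.8, 0.9]
train_y = ["Class1" if x <= 0.5 else "Class2" for x in train_x]

for k in (1, 3, 5):
    print(k, knn_predict(train_x, train_y, 0.58, k))
```

For the query 0.58 the neighbours in order of distance are 0.6, 0.45, 0.8, 0.3 and 0.9, so Class2 wins the vote for k = 1, 3 and 5. With an even k the vote can tie; Counter.most_common then returns the label that was counted first, which here is the label of the nearer neighbour.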
6. Implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points. Select appropriate data set for your experiment and draw graphs
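import numpy as np
import matplotlib.pyplot as plt

def gaussian_kernel(x, xi, tau):
    # Weight decays with squared distance; tau is the bandwidth
    return np.exp(-np.sum((x - xi) ** 2) / (2 * tau ** 2))

def locally_weighted_regression(x, X, y, tau):
    m = X.shape[0]
    weights = np.array([gaussian_kernel(x, X[i], tau) for i in range(m)])
    W = np.diag(weights)
    X_transpose_W = X.T @ W
    theta = np.linalg.pinv(X_transpose_W @ X) @ X_transpose_W @ y
    return x @ theta

np.random.seed(42)
# Synthetic data set chosen for this experiment: a noisy sine curve
X = np.linspace(0, 2 * np.pi, 100)
y = np.sin(X) + 0.1 * np.random.randn(100)
X_bias = np.c_[np.ones(X.shape), X]
x_test = np.linspace(0, 2 * np.pi, 200)
x_test_bias = np.c_[np.ones(x_test.shape), x_test]
tau = 0.5
y_pred = np.array([locally_weighted_regression(xi, X_bias, y, tau) for xi in x_test_bias])

plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='red', label='Training Data', alpha=0.7)
plt.plot(x_test, y_pred, color='blue', label=f'LWR Fit (tau={tau})', linewidth=2)
plt.xlabel('X', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.title('Locally Weighted Regression', fontsize=14)
plt.legend(fontsize=10)
plt.grid(alpha=0.3)
plt.show()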
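For one-dimensional inputs, the weighted least-squares step of Locally Weighted Regression has a simple closed form, which makes the method easy to check without matrix algebra. The sketch below is one such illustration under that assumption. The function name lwr_predict and the sample data are hypothetical and are not part of the manual.

```python
from math import exp


def lwr_predict(x0, xs, ys, tau):
    """Locally weighted linear regression prediction at x0 for 1-D data."""
    # Gaussian kernel: points near x0 get weights close to 1, distant ones near 0
    w = [exp(-((x - x0) ** 2) / (2 * tau ** 2)) for x in xs]
    total = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, xs)) / total
    mean_y = sum(wi * y for wi, y in zip(w, ys)) / total
    # Closed-form weighted least squares for the line y = intercept + slope * x
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, xs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x0


xs = [i / 10 for i in range(11)]   # hypothetical inputs 0.0 .. 1.0
ys = [2 * x + 1 for x in xs]       # exactly linear targets, y = 2x + 1
print(lwr_predict(0.5, xs, ys, tau=0.5))
```

On exactly linear data every local fit recovers the same line, so the prediction at 0.5 is 2.0 up to floating-point rounding. A smaller tau makes the fit follow the nearest points more closely, and a larger tau moves it toward ordinary linear regression.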
7. Develop a program to demonstrate the working of Linear Regression and Polynomial
Regression. Use Boston Housing Dataset for Linear Regression and Auto MPG Dataset
(for vehicle fuel efficiency prediction) for Polynomial Regression
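import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, r2_score

def linear_regression_california():
    # The California Housing dataset is used here; the Boston Housing
    # loader was removed from scikit-learn in version 1.2.
    housing = fetch_california_housing(as_frame=True)
    X = housing.data[["AveRooms"]]
    y = housing.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    plt.scatter(X_test, y_test, color="blue", label="Actual")
    plt.plot(X_test, y_pred, color="red", label="Predicted")
    plt.xlabel("Average number of rooms (AveRooms)")
    plt.ylabel("Median value of homes ($100,000)")
    plt.title("Linear Regression - California Housing Dataset")
    plt.legend()
    plt.show()
    print("MSE:", mean_squared_error(y_test, y_pred), "R^2:", r2_score(y_test, y_pred))

def polynomial_regression_auto_mpg():
    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
    # The file has no header row; missing horsepower values are stored as "?"
    column_names = ["mpg", "cylinders", "displacement", "horsepower", "weight",
                    "acceleration", "model_year", "origin", "car_name"]
    data = pd.read_csv(url, sep=r"\s+", names=column_names, na_values="?")
    data = data.dropna()
    X = data["displacement"].values.reshape(-1, 1)
    y = data["mpg"].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Degree-2 polynomial features followed by ordinary least squares
    poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    poly_model.fit(X_train, y_train)
    y_pred = poly_model.predict(X_test)

    plt.scatter(X_test, y_test, color="blue", label="Actual")
    plt.scatter(X_test, y_pred, color="red", label="Predicted")
    plt.xlabel("Displacement")
    plt.ylabel("Miles per gallon (mpg)")
    plt.title("Polynomial Regression - Auto MPG Dataset")
    plt.legend()
    plt.show()
    print("MSE:", mean_squared_error(y_test, y_pred), "R^2:", r2_score(y_test, y_pred))

linear_regression_california()
polynomial_regression_auto_mpg()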
8. Develop a program to demonstrate the working of the decision tree algorithm. Use
Breast Cancer Data set for building the decision tree and apply this knowledge to
classify a new sample.
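import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(f"Model Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")

# Classify a new sample (here, the first test sample)
new_sample = np.array([X_test[0]])
prediction = clf.predict(new_sample)
print("Predicted class for the new sample:", data.target_names[prediction[0]])

plt.figure(figsize=(12,8))
plot_tree(clf, filled=True, feature_names=data.feature_names, class_names=data.target_names)
plt.title("Decision Tree - Breast Cancer Dataset")
plt.show()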
9. Develop a program to implement the Naive Bayesian classifier considering Olivetti
Face Data set for training. Compute the accuracy of the classifier, considering a few test
data sets.
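import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

data = fetch_olivetti_faces(shuffle=True, random_state=42)
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, zero_division=1))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
cross_val_accuracy = cross_val_score(gnb, X, y, cv=5, scoring='accuracy')
print(f"\nCross-validation accuracy: {cross_val_accuracy.mean() * 100:.2f}%")

# Show a few test faces (64 x 64 pixels) with true and predicted labels
fig, axes = plt.subplots(3, 5, figsize=(12, 8))
for ax, image, label, prediction in zip(axes.ravel(), X_test, y_test, y_pred):
    ax.imshow(image.reshape(64, 64), cmap=plt.cm.gray)
    ax.set_title(f"True: {label}, Pred: {prediction}")
    ax.axis('off')
plt.show()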
10. Develop a program to implement k-means clustering using Wisconsin Breast Cancer
data set and visualize the clustering result.
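import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, classification_report

data = load_breast_cancer()
X = data.data
y = data.target

# Standardize features so that no single feature dominates the distance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
y_kmeans = kmeans.fit_predict(X_scaled)

# Cluster ids are arbitrary, so they may be swapped relative to the true labels
print("Confusion Matrix:")
print(confusion_matrix(y, y_kmeans))
print("\nClassification Report:")
print(classification_report(y, y_kmeans))

# Project to 2 principal components for plotting
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
df = pd.DataFrame(X_pca, columns=['PC1', 'PC2'])
df['Cluster'] = y_kmeans
df['True Label'] = y

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1', s=100, edgecolor='black', alpha=0.7)
plt.title('K-Means Clustering of Breast Cancer Dataset')
plt.legend(title="Cluster")
plt.show()

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='True Label', palette='coolwarm', s=100, edgecolor='black', alpha=0.7)
plt.title('True Labels of Breast Cancer Dataset')
plt.legend(title="True Label")
plt.show()

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1', s=100, edgecolor='black', alpha=0.7)
centers = pca.transform(kmeans.cluster_centers_)
plt.scatter(centers[:, 0], centers[:, 1], s=200, c='red', marker='X', label='Centroids')
plt.title('K-Means Clustering with Centroids')
plt.legend(title="Cluster")
plt.show()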