Machine Learning Laboratory (BCSL606)
2025
Faculty Incharge
Bengaluru - 560032
1. Develop a program to create histograms for all numerical features and analyze the distribution of each feature. Generate box plots for all numerical features and identify any outliers. Use the California Housing dataset.
DATASET:
California Housing dataset

from sklearn.datasets import fetch_california_housing

california_housing = fetch_california_housing(as_frame=True)
california_housing.frame.head()
print(california_housing.DESCR)
.. _california_housing_dataset:
:Attribute Information:
- MedInc median income in block group
- HouseAge median house age in block group
- AveRooms average number of rooms per household
- AveBedrms average number of bedrooms per household
- Population block group population
The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).
This dataset was derived from the 1990 U.S. census, using one row per census
block group. A block group is the smallest geographical unit for which the U.S.
Census Bureau publishes sample data (a block group typically has a population
of 600 to 3,000 people).
PROGRAM 1
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing

def load_data():
    # Load the California Housing dataset as a pandas DataFrame
    df = fetch_california_housing(as_frame=True).frame
    return df

def main():
    df = load_data()
    df.hist(bins=30, figsize=(12, 8))        # histograms of all numerical features
    plt.tight_layout(); plt.show()
    df.plot(kind="box", subplots=True, layout=(3, 3), figsize=(12, 8))  # box plots to identify outliers
    plt.tight_layout(); plt.show()

if __name__ == "__main__":
    main()
OUTPUT:
2. Develop a program to compute the correlation matrix to understand the relationships between pairs of features. Visualize the correlation matrix using a heatmap to identify which variables have strong positive/negative correlations. Create a pair plot to visualize pairwise relationships between features. Use the California Housing dataset.
PROGRAM 2
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

def load_data():
    # Load the California Housing dataset as a pandas DataFrame
    return fetch_california_housing(as_frame=True).frame

def plot_correlation_heatmap(df):
    # Correlation matrix visualized as a heatmap
    plt.figure(figsize=(10, 8))
    sns.heatmap(df.corr(), annot=True, cmap="coolwarm", fmt=".2f")
    plt.title("Correlation Matrix of California Housing Features")
    plt.show()

def create_pairplot(df):
    # Pair plot of pairwise relationships between features
    sns.pairplot(df, diag_kind="kde", plot_kws={"s": 10})
    plt.show()

def main():
    # Load dataset
    df = load_data()
    print("Dataset loaded successfully!")
    # Correlation matrix and heatmap
    print("\nCreating correlation heatmap...")
    plot_correlation_heatmap(df)
    # Pair plot
    print("\nCreating pair plot for numerical features...")
    create_pairplot(df)

if __name__ == "__main__":
    main()
OUTPUT:
3. Develop a program to implement Principal Component Analysis (PCA) for reducing the
dimensionality of the Iris dataset from 4 features to 2.
PROGRAM 3
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def perform_pca(features, n_components=2):
    # Standardize the features, then apply PCA
    standardized_features = StandardScaler().fit_transform(features)
    pca = PCA(n_components=n_components)
    principal_components = pca.fit_transform(standardized_features)
    return principal_components, pca

def main():
    iris = load_iris(as_frame=True)
    data = iris.data
    # Perform PCA: reduce the 4 Iris features to 2 principal components
    pca_data, pca = perform_pca(data, n_components=2)
    explained_variance = pca.explained_variance_ratio_
    print("Explained variance ratio:", explained_variance)
    # Scatter plot of the two principal components, coloured by species
    pca_df = pd.DataFrame(pca_data, columns=["PC1", "PC2"])
    pca_df["species"] = iris.target_names[iris.target]
    sns.scatterplot(data=pca_df, x="PC1", y="PC2", hue="species")
    plt.title("PCA of the Iris Dataset (4 features to 2 components)")
    plt.show()

if __name__ == "__main__":
    main()
OUTPUT:
4. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Find-S algorithm to output the most specific hypothesis consistent with the training examples.
DATASET:
training_data.csv
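The contents of training_data.csv are not reproduced in this copy. The final hypothesis shown in the output below matches the classic EnjoySport training set from Mitchell's Machine Learning (1997), so a CSV along the following lines can be assumed (the column names are illustrative):

Sky,AirTemp,Humidity,Wind,Water,Forecast,EnjoySport
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes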
PROGRAM 4
import pandas as pd

def find_s(data):
    # Find-S: generalise over the positive examples only.
    # Assumes the last column is the class label ('yes' = positive example).
    hypothesis = None
    for _, row in data.iterrows():
        if str(row.iloc[-1]).strip().lower() == 'yes':
            attrs = list(row.iloc[:-1])
            hypothesis = attrs if hypothesis is None else [
                h if h == a else '?' for h, a in zip(hypothesis, attrs)]
    return hypothesis

try:
    data = pd.read_csv(r'C:\Users\USER\Desktop\training_data.csv')
    print("Training Data Loaded Successfully!\n")
    print(data, "\n")
    print("Final Hypothesis Consistent with Positive Examples:", find_s(data))
except Exception as e:
    print(f"Error loading data: {e}")
OUTPUT:
Final Hypothesis Consistent with Positive Examples: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
5. Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly generated values of x in the range [0,1]. Perform the following on the generated dataset: a. Label the first 50 points {x1, ……, x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1, else xi ∊ Class2. b. Classify the remaining points, x51, ……, x100, using KNN. Perform this for k = 1, 2, 3, 4, 5, 20, 30.
PROGRAM 5
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# Generate 100 random values of x in the range [0, 1]
np.random.seed(42)
x = np.random.rand(100)

# Label the first 50 points: Class1 if x <= 0.5, else Class2
y = np.array(["Class1" if xi <= 0.5 else "Class2" for xi in x[:50]])

# Classify the remaining 50 points with KNN for each value of k
for k in [1, 2, 3, 4, 5, 20, 30]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x[:50].reshape(-1, 1), y)
    y_pred = knn.predict(x[50:].reshape(-1, 1))

    # Visualization
    plt.figure(figsize=(8, 5))
    plt.scatter(x[:50], [0] * 50, c=['red' if label == "Class1" else 'blue' for label in y],
                label="Training Data (Class1=Red, Class2=Blue)")
    plt.scatter(x[50:], [0] * 50, c=['red' if label == "Class1" else 'blue' for label in y_pred],
                marker='x', label="Test Data Predictions")
    plt.axvline(x=0.5, color='gray', linestyle='--', label="Decision Boundary (x=0.5)")
    plt.title(f"KNN Classification with k={k}")
    plt.xlabel("x values")
    plt.yticks([])
    plt.legend()
    plt.grid(alpha=0.4)
    plt.tight_layout()
    plt.show()
OUTPUT:
6. Implement the non-parametric Locally Weighted Regression algorithm to fit data points. Select an appropriate data set for your experiment and draw graphs.
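For reference (a standard formulation, not taken from the original listing): at a query point x, each training point x_i receives a Gaussian weight, and the local model parameters come from the weighted normal equation,

$$ w_i = \exp\!\left(-\frac{(x_i - x)^2}{2\tau^2}\right), \qquad \theta = (X^\top W X)^{-1} X^\top W y, \qquad \hat{y}(x) = [\,1 \;\; x\,]\,\theta, $$

where X is the training design matrix with an intercept column, W = diag(w_1, …, w_n), and the bandwidth τ controls how quickly the weights decay with distance. The listing below computes exactly this, using the pseudo-inverse for numerical stability.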
PROGRAM 6
import numpy as np
import matplotlib.pyplot as plt

def locally_weighted_regression(x_train, y_train, x_test, tau):
    """
    Parameters:
    x_train: np.array, shape (n,)
        Training data features.
    y_train: np.array, shape (n,)
        Training data labels.
    x_test: np.array, shape (m,)
        Test data features.
    tau: float
        Bandwidth parameter (controls the weight decay).

    Returns:
    y_pred: np.array, shape (m,)
        Predicted values for x_test.
    """
    m = len(x_test)
    y_pred = np.zeros(m)
    for i in range(m):
        weights = np.exp(-np.square(x_train - x_test[i]) / (2 * tau**2))  # Gaussian weights
        W = np.diag(weights)                                    # Diagonal weight matrix
        X = np.c_[np.ones(len(x_train)), x_train]               # Add intercept term
        theta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y_train  # Weighted normal equation
        y_pred[i] = np.array([1, x_test[i]]) @ theta            # Predict y for x_test[i]
    return y_pred

# Example dataset: a noisy sine curve (any one-dimensional dataset can be substituted)
np.random.seed(42)
x_train = np.linspace(0, 2 * np.pi, 100)
y_train = np.sin(x_train) + 0.1 * np.random.randn(100)
x_test = np.linspace(0, 2 * np.pi, 200)

y_pred = locally_weighted_regression(x_train, y_train, x_test, tau=0.5)

plt.scatter(x_train, y_train, s=10, alpha=0.6, label="Training data")
plt.plot(x_test, y_pred, color="red", label="LWR fit (tau=0.5)")
plt.title("Locally Weighted Regression")
plt.legend()
plt.tight_layout()
plt.show()
OUTPUT:
====================== RESTART: C:/Users/USER/Sixth.py ======================
7. Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use the Boston Housing dataset for Linear Regression and the Auto MPG dataset (for vehicle fuel efficiency prediction) for Polynomial Regression.
PROGRAM 7
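The original listing for this experiment is not included in this copy. The sketch below is one minimal way to cover both requirements; note two assumptions: load_boston() has been removed from recent scikit-learn releases, so the Boston Housing data is fetched from OpenML here, and the Auto MPG data is loaded through seaborn's bundled copy of the UCI dataset, with horsepower alone as the predictor for the polynomial fit.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# ---------- Linear Regression on the Boston Housing dataset ----------
# load_boston() is no longer shipped with scikit-learn, so the data is fetched from OpenML.
boston = fetch_openml(name="boston", version=1, as_frame=True)
X = boston.data.select_dtypes(include="number")   # keep the numeric predictors
y = boston.target.astype(float)                   # median house value (MEDV)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
lin_reg = LinearRegression().fit(X_train, y_train)
y_pred = lin_reg.predict(X_test)
print("Linear Regression  -> MSE:", mean_squared_error(y_test, y_pred),
      " R2:", r2_score(y_test, y_pred))

# ---------- Polynomial Regression on the Auto MPG dataset ----------
# Loaded via seaborn's bundled copy of the UCI Auto MPG data.
mpg = sns.load_dataset("mpg").dropna()
Xp = mpg[["horsepower"]].values
yp = mpg["mpg"].values
Xp_train, Xp_test, yp_train, yp_test = train_test_split(Xp, yp, test_size=0.2, random_state=42)
poly_reg = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(Xp_train, yp_train)
yp_pred = poly_reg.predict(Xp_test)
print("Polynomial Regression -> MSE:", mean_squared_error(yp_test, yp_pred),
      " R2:", r2_score(yp_test, yp_pred))

# Plot the degree-2 fit against the raw data
grid = np.linspace(Xp.min(), Xp.max(), 200).reshape(-1, 1)
plt.scatter(Xp, yp, s=10, alpha=0.5, label="Auto MPG data")
plt.plot(grid, poly_reg.predict(grid), color="red", label="Degree-2 polynomial fit")
plt.xlabel("horsepower")
plt.ylabel("mpg")
plt.title("Polynomial Regression on Auto MPG")
plt.legend()
plt.show()

A quadratic in horsepower is enough to show the curvature in the mpg relationship; a higher degree can be swapped into PolynomialFeatures if desired.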
OUTPUT:
==================== RESTART: C:/Users/USER/Seventh.py =====================
8. Develop a program to demonstrate the working of the decision tree algorithm. Use the Breast Cancer data set for building the decision tree and apply this knowledge to classify a new sample.
PROGRAM 8
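The listing for this experiment is also missing from this copy. A minimal sketch using scikit-learn's bundled Breast Cancer (Wisconsin Diagnostic) dataset is shown below; treating the first held-out test instance as the "new sample" and capping the tree at depth 4 are illustrative choices, not part of the original program.

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Breast Cancer (Wisconsin Diagnostic) dataset bundled with scikit-learn
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# Train a depth-limited decision tree and report its accuracy on the held-out set
clf = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Classify a "new" sample (here, the first held-out test instance stands in for a new record)
new_sample = X_test[0].reshape(1, -1)
print("Predicted class for new sample:", data.target_names[clf.predict(new_sample)[0]])

# Visualize the upper levels of the learned tree
plt.figure(figsize=(14, 7))
plot_tree(clf, feature_names=data.feature_names, class_names=data.target_names,
          filled=True, max_depth=2, fontsize=8)
plt.show()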
OUTPUT:
9. Develop a program to implement the Naive Bayesian classifier, using the Olivetti Faces dataset for training. Compute the accuracy of the classifier on a few held-out test samples.
PROGRAM 9
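No listing survives for this experiment either. The sketch below trains a Gaussian Naive Bayes classifier on the Olivetti Faces data (downloaded by scikit-learn on first use); the 25% stratified test split is an assumed choice.

from sklearn.datasets import fetch_olivetti_faces
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Fetch the Olivetti Faces dataset (400 images of 40 subjects; downloaded on first use)
faces = fetch_olivetti_faces(shuffle=True, random_state=42)
X, y = faces.data, faces.target          # each image is flattened to 4096 pixel features

# Hold out 25% of the images as test data, keeping all 40 subjects represented
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Train a Gaussian Naive Bayes classifier and measure its accuracy on the test images
model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Naive Bayes accuracy on Olivetti faces:", accuracy_score(y_test, y_pred))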
OUTPUT:
10. Develop a program to implement k-means clustering using the Wisconsin Breast Cancer data set and visualize the clustering result.
PROGRAM 10
import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from sklearn.metrics import silhouette_score
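Only the import block of this listing survives in this copy. Continuing from those imports, a minimal sketch of the missing clustering and visualization steps is given below; k = 2 clusters and the 2-D PCA projection are assumed choices, and the printed silhouette score depends on them.

# Load the Wisconsin Breast Cancer data and standardize the features
data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)

# k-means with 2 clusters (the data has benign/malignant structure)
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
labels = kmeans.fit_predict(X)
print(f"Silhouette Score: {silhouette_score(X, labels):.3f}")

# Project the data and the cluster centroids to 2-D with PCA for visualization
pca = PCA(n_components=2)
points_2d = pca.fit_transform(X)
centers_2d = pca.transform(kmeans.cluster_centers_)
plt.scatter(points_2d[:, 0], points_2d[:, 1], c=labels, cmap="coolwarm", s=15)
plt.scatter(centers_2d[:, 0], centers_2d[:, 1], c="black", marker="X", s=120, label="Centroids")
plt.title("k-means Clustering of the Wisconsin Breast Cancer Data (PCA projection)")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.show()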
OUTPUT:
===================== RESTART: C:/Users/USER/tenth.py ======================
Silhouette Score: 0.345