21CSC305P ML - Lab Programs 1 - 9
LAB MANUAL
PROGRAM 1:
AIM: To implement a program to load and view the dataset
ALGORITHM:
1. Download a dataset from Kaggle (Most Streamed Spotify Songs 2024).
2. Load the CSV file into a pandas DataFrame.
3. Select the required columns into a new DataFrame.
4. Display the first few rows to view the dataset.
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset into a pandas DataFrame
af = pd.read_csv("Most streamed spotify songs 2024.csv", encoding='unicode_escape')
# Work on a copy restricted to the columns of interest
new_af = af.copy()
new_af = new_af[["Artist", "release date", "All time rank", "spotify playlist reach", "youtube views"]]
# View the first ten rows
new_af.head(10)
OUTPUT:
RESULT: Thus the program to load and view the dataset of most streamed songs has been executed successfully.
PROGRAM 2
ALGORITHM:
1. Use the DataFrame created in Program 1.
2. Call describe() to compute summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) of the numeric columns.
3. Display the result.
PROGRAM:
new_af.describe()
OUTPUT:
Median is 67.0
RESULT: Thus the program to display the summary statistics of the dataset has been executed and verified successfully.
PROGRAM 3
ALGORITHM:
1. Initialize Parameters: Start by initializing the parameters like coefficients (slope and
intercept).
2. Input Data: Gather the dataset containing the independent variable (X) and dependent
variable (Y).
3. Feature Scaling (Optional): Normalize or standardize the input data if necessary to ensure
better convergence.
4. Split Data: Divide the dataset into training and testing sets to evaluate the model.
5. Model Training: Implement a method to optimize the parameters (coefficients) based on the
training data. This can be done using techniques like Gradient Descent, Normal Equations, or
using libraries like scikit-learn.
6. Prediction: Use the learned parameters to predict outcomes for new data points or the test
set.
7. Evaluation: Measure the performance of the model using evaluation metrics like Mean
Squared Error (MSE), R-squared, etc.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt

# Sample dataset
X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 3, 4, 5, 6])

def linear_regression_ols(X, Y):
    # Add a bias (intercept) column of ones to X
    X_b = np.c_[np.ones((len(X), 1)), X]
    # Calculate theta using the Normal Equation: theta = (X^T * X)^(-1) * X^T * Y
    theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(Y)
    return theta

def predict(X_new, theta):
    X_new_b = np.c_[np.ones((len(X_new), 1)), X_new]
    Y_pred = X_new_b.dot(theta)
    return Y_pred

theta = linear_regression_ols(X, Y)
# Make predictions
X_new = np.array([6, 7])
predictions = predict(X_new, theta)
# Plot the training data and the predictions
plt.scatter(X, Y, label='Training data')
plt.plot(X_new, predictions, 'r--', marker='o', label='Predictions')
plt.ylabel('Y')
plt.legend()
plt.show()
RESULT: Thus the implementation of linear regression to perform prediction has been successfully
executed.
PROGRAM 4.1
ALGORITHM:
1. Initialize Parameters: Start with prior distributions for the model parameters (e.g., coefficients, intercept) and likelihood distributions based on the data.
2. Input Data: Gather the dataset containing features (X) and corresponding binary labels (Y).
3. Posterior Calculation: Use Bayesian inference techniques such as Markov Chain Monte Carlo (MCMC) or Variational Inference to compute the posterior distribution over the parameters given the data.
4. Prediction: Use the posterior distribution to predict the probability of classes for new data points.
5. Evaluation: Measure the performance of the model using metrics such as accuracy, precision, recall, and F1-score.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
for i in range(num_samples):
# Generate proposal from Gaussian distribution
proposal = theta_current + np.random.randn(n)
# Calculate likelihoods
likelihood_current = np.sum(Y * X.dot(theta_current) - np.log(1 + np.exp(X.dot(theta_current))))
likelihood_proposal = np.sum(Y * X.dot(proposal) - np.log(1 + np.exp(X.dot(proposal))))
if accept:
theta_current = proposal
acceptance_count += 1
trace[i] = theta_current
# Calculate accuracy
accuracy = accuracy_score(Y_test, Y_pred)
print(f'Accuracy: {accuracy}')
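The listing above is incomplete in this copy. A minimal, self-contained sketch of the same approach (Bayesian logistic regression fitted with a Metropolis-Hastings sampler) is given below; the make_classification dataset, the standard normal prior, the proposal step size, the number of samples and the burn-in length are illustrative assumptions, not values recovered from the original program.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def log_likelihood(theta, X, Y):
    # Bernoulli log-likelihood of a logistic regression model
    z = X.dot(theta)
    return np.sum(Y * z - np.log(1 + np.exp(z)))

def log_prior(theta):
    # Standard normal prior on the coefficients (an assumed choice)
    return -0.5 * np.sum(theta ** 2)

def metropolis_hastings(X, Y, num_samples=5000, step_size=0.1):
    # step_size and num_samples are illustrative assumptions
    n = X.shape[1]
    theta_current = np.zeros(n)
    trace = np.zeros((num_samples, n))
    acceptance_count = 0
    for i in range(num_samples):
        # Generate proposal from a Gaussian random walk
        proposal = theta_current + step_size * np.random.randn(n)
        # Acceptance ratio from the log posterior (likelihood + prior)
        log_alpha = (log_likelihood(proposal, X, Y) + log_prior(proposal)
                     - log_likelihood(theta_current, X, Y) - log_prior(theta_current))
        accept = np.log(np.random.rand()) < log_alpha
        if accept:
            theta_current = proposal
            acceptance_count += 1
        trace[i] = theta_current
    print(f'Acceptance rate: {acceptance_count / num_samples}')
    return trace

# Synthetic binary classification data (assumed), with a bias column added
X, Y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X = np.hstack([np.ones((X.shape[0], 1)), X])
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

trace = metropolis_hastings(X_train, Y_train)
# Posterior mean of the parameters after discarding 1000 burn-in samples
theta_post = trace[1000:].mean(axis=0)
# Predict class 1 when the posterior-mean probability exceeds 0.5
Y_pred = (1 / (1 + np.exp(-X_test.dot(theta_post))) > 0.5).astype(int)

# Calculate accuracy
accuracy = accuracy_score(Y_test, Y_pred)
print(f'Accuracy: {accuracy}')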
OUTPUT:
Acceptance rate: 0.005
Accuracy: 0.69
RESULT: Thus the program to implement Bayesian logistic regression for classification has been executed successfully.
PROGRAM 4.2
ALGORITHM:
i. Initialize Parameters: Start with setting parameters such as kernel type (linear,
polynomial, radial basis function (RBF)), regularization parameter (C), and kernel
coefficients (gamma).
ii. Input Data: Gather the dataset containing features (X) and corresponding binary
labels (Y).
iii. Model Training: Use the training data to fit the SVM model, adjusting the parameters
to maximize the margin between classes.
iv. Prediction: Use the learned SVM model to predict classes for new data points.
v. Evaluation: Measure the performance of the model using metrics such as accuracy,
precision, recall, and F1-score.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
# Calculate accuracy
accuracy = accuracy_score(Y_test, Y_pred)
print(f'Accuracy: {accuracy}')
# Classification report
print(classification_report(Y_test, Y_pred))
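The listing above omits the data generation and model fitting in this copy. A minimal, self-contained sketch following the same algorithm is given below; the make_classification parameters, the RBF kernel, C = 1.0 and gamma = 'scale' are assumed settings.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# Synthetic binary classification dataset (parameters are assumed)
X, Y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Train an SVM classifier (kernel, C and gamma are assumed settings)
svm_clf = SVC(kernel='rbf', C=1.0, gamma='scale')
svm_clf.fit(X_train, Y_train)

# Predict on the test set
Y_pred = svm_clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(Y_test, Y_pred)
print(f'Accuracy: {accuracy}')

# Classification report
print(classification_report(Y_test, Y_pred))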
OUTPUT:
Accuracy: 0.845
RESULT: Thus the given program for SVM classification has been executed and verified successfully.
PROGRAM 5.1
ALGORITHM:
1. Initialize: choose k data points at random as the initial centroids.
2. Assignment: assign every data point to its nearest centroid.
3. Update: recompute each centroid as the mean of the points assigned to it.
4. Repeat the assignment and update steps until the centroids stop changing or a maximum number of iterations is reached.
PROGRAM:
import numpy as np
np.random.seed(random_seed)
return X
# K-Means Clustering
for _ in range(max_iters):
if np.all(centroids == new_centroids):
break
centroids = new_centroids
X = generate_data()
plt.figure(figsize=(10, 6))
plt.title('K-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
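The listing above is incomplete in this copy. A minimal, self-contained K-Means sketch matching the algorithm steps is given below; the three-blob synthetic data, k = 3 and the random seeds are assumptions made for illustration.

import numpy as np
import matplotlib.pyplot as plt

def generate_data(n_samples=300, random_seed=42):
    # Three Gaussian blobs as assumed sample data
    np.random.seed(random_seed)
    X = np.vstack([np.random.randn(n_samples // 3, 2) + [0, 0],
                   np.random.randn(n_samples // 3, 2) + [5, 5],
                   np.random.randn(n_samples // 3, 2) + [0, 5]])
    return X

def kmeans(X, k=3, max_iters=100):
    # Pick k random data points as the initial centroids
    rng = np.random.default_rng(42)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(max_iters):
        # Assignment step: nearest centroid for every point
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        # Update step: centroid = mean of the points assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.all(centroids == new_centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = generate_data()
labels, centroids = kmeans(X)
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='x', s=100)
plt.title('K-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()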
OUTPUT:
RESULT: Thus the program for K-means clustering has been executed successfully.
PROGRAM 5.2
ALGORITHM:
1. Initialize: Choose initial parameters for the Gaussian components (means, covariances, and
mixing coefficients).
2. Expectation (E-step): Calculate the probability of each data point belonging to each Gaussian
component.
3. Maximization (M-step): Update the parameters of the Gaussian components using the
probabilities computed in the E-step.
4. Repeat: Repeat the E-step and M-step until convergence.
PROGRAM:
import numpy as np
np.random.seed(random_seed)
return X
# Initialize parameters
np.random.seed(42)
weights = np.ones(k) / k
n = x.shape[0]
diff = x - mean
for i in range(n_samples):
for j in range(k):
responsibilities[i] /= np.sum(responsibilities[i])
return responsibilities
covariances = []
for j in range(k):
diff = X - means[j]
covariances.append(cov)
for _ in range(max_iters):
plt.figure(figsize=(10, 6))
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
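The listing above is incomplete in this copy. A minimal, self-contained sketch of the EM algorithm for a Gaussian Mixture Model is given below; the two-blob synthetic data, k = 2, 50 EM iterations and the random seed are assumptions made for illustration.

import numpy as np
import matplotlib.pyplot as plt

def generate_data(n_samples=300, random_seed=42):
    # Two Gaussian blobs as assumed sample data
    np.random.seed(random_seed)
    X = np.vstack([np.random.randn(n_samples // 2, 2) + [0, 0],
                   np.random.randn(n_samples // 2, 2) + [4, 4]])
    return X

def gaussian_pdf(x, mean, cov):
    # Multivariate normal density for a single point x
    n = x.shape[0]
    diff = x - mean
    norm_const = 1.0 / np.sqrt(((2 * np.pi) ** n) * np.linalg.det(cov))
    return norm_const * np.exp(-0.5 * diff.dot(np.linalg.inv(cov)).dot(diff))

def e_step(X, means, covariances, weights):
    # Responsibility of each component for each point
    n_samples, k = X.shape[0], len(weights)
    responsibilities = np.zeros((n_samples, k))
    for i in range(n_samples):
        for j in range(k):
            responsibilities[i, j] = weights[j] * gaussian_pdf(X[i], means[j], covariances[j])
        responsibilities[i] /= np.sum(responsibilities[i])
    return responsibilities

def m_step(X, responsibilities):
    # Update weights, means and covariances from the responsibilities
    n_samples, k = responsibilities.shape
    nk = responsibilities.sum(axis=0)
    weights = nk / n_samples
    means = responsibilities.T.dot(X) / nk[:, None]
    covariances = []
    for j in range(k):
        diff = X - means[j]
        cov = (responsibilities[:, j][:, None] * diff).T.dot(diff) / nk[j]
        covariances.append(cov)
    return means, covariances, weights

# Initialize parameters (k, iteration count and seed are assumed values)
X = generate_data()
k, max_iters = 2, 50
np.random.seed(42)
means = X[np.random.choice(len(X), k, replace=False)]
covariances = [np.eye(2) for _ in range(k)]
weights = np.ones(k) / k
for _ in range(max_iters):
    responsibilities = e_step(X, means, covariances, weights)
    means, covariances, weights = m_step(X, responsibilities)

labels = np.argmax(responsibilities, axis=1)
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.title('Gaussian Mixture Model Clustering (EM)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()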
OUTPUT:
RESULT: Thus the program for clustering with a Gaussian Mixture Model using the EM algorithm has been executed successfully.
PROGRAM 5.3
ALGORITHM:
1. Compute the pairwise distance matrix of all data points.
2. Start with every data point as its own cluster.
3. Find the two closest clusters and merge them.
4. Repeat the merge step until the desired number of clusters remains.
5. Assign a cluster label to every data point.
PROGRAM:
import numpy as np
np.random.seed(random_seed)
return X
def compute_distance_matrix(X):
n_samples = X.shape[0]
for i in range(n_samples):
distances[j, i] = distances[i, j]
return distances
# Hierarchical Clustering
def hierarchical_clustering(X):
distances = compute_distance_matrix(X)
n_samples = len(X)
# Initialize clusters
min_dist = float('inf')
for i in range(len(clusters)):
if d < min_dist:
min_dist = d
to_merge = (i, j)
clusters[i].extend(clusters[j])
del clusters[j]
return clusters
labels = np.zeros(n_samples)
labels[index] = cluster_id
return labels
X = generate_data()
final_clusters = hierarchical_clustering(X)
plt.figure(figsize=(10, 6))
plt.title('Hierarchical Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
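The listing above is incomplete in this copy. A minimal, self-contained sketch of agglomerative hierarchical clustering is given below; single linkage, the stopping point of two clusters, the two-blob synthetic data and the random seed are assumptions made for illustration.

import numpy as np
import matplotlib.pyplot as plt

def generate_data(n_samples=60, random_seed=42):
    # Two Gaussian blobs as assumed sample data (kept small, since the
    # naive merge search below is quadratic in the number of clusters)
    np.random.seed(random_seed)
    X = np.vstack([np.random.randn(n_samples // 2, 2) + [0, 0],
                   np.random.randn(n_samples // 2, 2) + [5, 5]])
    return X

def compute_distance_matrix(X):
    n_samples = X.shape[0]
    distances = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            distances[i, j] = np.linalg.norm(X[i] - X[j])
            distances[j, i] = distances[i, j]
    return distances

# Agglomerative hierarchical clustering with single linkage (assumed linkage rule)
def hierarchical_clustering(X, n_clusters=2):
    distances = compute_distance_matrix(X)
    # Start with every point in its own cluster
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        min_dist, to_merge = float('inf'), None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: smallest pairwise distance between the two clusters
                d = min(distances[p, q] for p in clusters[i] for q in clusters[j])
                if d < min_dist:
                    min_dist = d
                    to_merge = (i, j)
        i, j = to_merge
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

def clusters_to_labels(clusters, n_samples):
    labels = np.zeros(n_samples, dtype=int)
    for cluster_id, cluster in enumerate(clusters):
        for index in cluster:
            labels[index] = cluster_id
    return labels

X = generate_data()
final_clusters = hierarchical_clustering(X)
labels = clusters_to_labels(final_clusters, len(X))
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.title('Hierarchical Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()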
OUTPUT:
RESULT: Thus the program for hierarchical clustering has been executed successfully.
PROGRAM 6
ALGORITHM:
1. Standardize the Data: Center the data by subtracting the mean of each feature from the
data. Optionally, scale each feature to unit variance.
2. Compute the Covariance Matrix: Calculate the covariance matrix of the centered data.
3. Calculate Eigenvalues and Eigenvectors: Find the eigenvalues and eigenvectors of the
covariance matrix. The eigenvectors determine the directions of the new feature space, and
the eigenvalues determine their magnitude.
4. Sort Eigenvalues and Eigenvectors: Sort the eigenvectors by decreasing eigenvalues and
select the top k eigenvectors to form a matrix (principal components).
5. Transform the Data: Project the original data onto the new feature space using the matrix of
principal components.
PROGRAM:
import numpy as np

def pca(X, num_components):
    # 1. Standardize: subtract the mean of each feature
    X_meaned = X - np.mean(X, axis=0)
    # 2. Covariance matrix of the centered data
    cov_matrix = np.cov(X_meaned, rowvar=False)
    # 3. Eigenvalues and eigenvectors of the covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
    # 4. Sort eigenvectors by decreasing eigenvalue
    sorted_index = np.argsort(eigenvalues)[::-1]
    sorted_eigenvectors = eigenvectors[:, sorted_index]
    sorted_eigenvalues = eigenvalues[sorted_index]
    # 5. Project the centered data onto the top k principal components
    components = sorted_eigenvectors[:, :num_components]
    X_reduced = X_meaned.dot(components)
    return X_reduced, sorted_eigenvalues, sorted_eigenvectors

# Example usage
if __name__ == "__main__":
    # Example data
    X = np.array([[2.5, 2.4],
                  [0.5, 0.7],
                  [2.2, 2.9],
                  [1.9, 2.2],
                  [3.1, 3.0],
                  [2.3, 2.7],
                  [2, 1.6],
                  [1, 1.1],
                  [1.5, 1.6],
                  [1.1, 0.9]])
    # Perform PCA
    X_reduced, eigenvalues, eigenvectors = pca(X, num_components=2)
    print("Reduced Data:\n", X_reduced)
    print("Eigenvalues:\n", eigenvalues)
    print("Eigenvectors:\n", eigenvectors)
OUTPUT:
Reduced Data:
[[ 0.82797019 -0.17511531]
[-1.77758033 0.14285723]
[ 0.99219749 0.38437499]
[ 0.27421042 0.13041721]
[ 1.67580142 -0.20949846]
[ 0.9129491 0.17528244]
[-0.09910944 -0.3498247 ]
[-1.14457216 0.04641726]
[-0.43804614 0.01776463]
[-1.22382056 -0.16267529]]
Eigenvalues:
[1.28402771 0.0490834 ]
Eigenvectors:
[[ 0.6778734 -0.73517866]
[ 0.73517866 0.6778734 ]]
RESULT: Thus the program for dimensionality reduction using PCA has been executed successfully.
PROGRAM 7
ALGORITHM:
1. Define the HMM by its states, observation symbols, start probabilities, transition probabilities and emission probabilities.
2. Initialization: fill the first column of the Viterbi table using the start probabilities and the emission probability of the first observation.
3. Recursion: for each later observation and each state, take the maximum over the previous states of the previous probability times the transition probability times the emission probability, and record a backpointer.
4. Termination: choose the state with the highest probability in the last column.
5. Backtracking: follow the backpointers to recover the most likely state sequence.
PROGRAM:
import numpy as np
class SimpleHMM:
self.states = states
self.observations = observations
self.start_prob = start_prob
self.trans_prob = trans_prob
self.emit_prob = emit_prob
n_states = len(self.states)
n_obs = len(obs_sequence)
# Initialization step
first_obs = obs_sequence[0]
for s in range(n_states):
# Recursion step
for s in range(n_states):
viterbi_table[s, t] = np.max(probabilities)
backpointer_table[s, t] = np.argmax(probabilities)
# Termination step
# Path backtracking
best_path[-1] = best_last_state
# Example usage
if __name__ == "__main__":
transition_probability = np.array([
])
emission_probability = np.array([
])
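The listing above is incomplete in this copy. A minimal, self-contained sketch of the Viterbi decoding implemented by SimpleHMM is given below; the Rainy/Sunny weather example and its start, transition and emission probabilities are assumed for illustration and are not the values from the original program.

import numpy as np

class SimpleHMM:
    def __init__(self, states, observations, start_prob, trans_prob, emit_prob):
        self.states = states
        self.observations = observations
        self.start_prob = start_prob
        self.trans_prob = trans_prob
        self.emit_prob = emit_prob

    def viterbi(self, obs_sequence):
        n_states = len(self.states)
        n_obs = len(obs_sequence)
        viterbi_table = np.zeros((n_states, n_obs))
        backpointer_table = np.zeros((n_states, n_obs), dtype=int)

        # Initialization step
        first_obs = obs_sequence[0]
        for s in range(n_states):
            viterbi_table[s, 0] = self.start_prob[s] * self.emit_prob[s, first_obs]

        # Recursion step
        for t in range(1, n_obs):
            for s in range(n_states):
                probabilities = (viterbi_table[:, t - 1] * self.trans_prob[:, s]
                                 * self.emit_prob[s, obs_sequence[t]])
                viterbi_table[s, t] = np.max(probabilities)
                backpointer_table[s, t] = np.argmax(probabilities)

        # Termination step
        best_last_state = np.argmax(viterbi_table[:, -1])

        # Path backtracking
        best_path = np.zeros(n_obs, dtype=int)
        best_path[-1] = best_last_state
        for t in range(n_obs - 2, -1, -1):
            best_path[t] = backpointer_table[best_path[t + 1], t + 1]
        return [self.states[s] for s in best_path]

# Example usage (states, observations and probabilities are assumed values)
if __name__ == "__main__":
    states = ['Rainy', 'Sunny']
    observations = ['walk', 'shop', 'clean']
    start_probability = np.array([0.6, 0.4])
    transition_probability = np.array([
        [0.7, 0.3],
        [0.4, 0.6],
    ])
    emission_probability = np.array([
        [0.1, 0.4, 0.5],
        [0.6, 0.3, 0.1],
    ])
    hmm = SimpleHMM(states, observations, start_probability,
                    transition_probability, emission_probability)
    obs_sequence = [0, 1, 2]  # indices into observations: walk, shop, clean
    print('Most likely state sequence:', hmm.viterbi(obs_sequence))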
OUTPUT:
RESULT: Thus the program to implement HMM has been executed successfully.
PROGRAM 8
ALGORITHM:
1. Load the Iris dataset and split it into training and testing sets.
2. Train a CART model (DecisionTreeClassifier) on the training data.
3. Predict the classes of the test set.
4. Evaluate the predictions using accuracy.
5. Visualize the learned decision tree.
PROGRAM:
data = load_iris()
X, y = data.data, data.target
clf.fit(X_train, y_train)
# Make predictions on the test set
y_pred = clf.predict(X_test)
print(f'Accuracy: {accuracy:.2f}')
plt.figure(figsize=(12, 8))
plt.show()
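The listing above omits the imports, the train/test split and the classifier definition in this copy. A minimal, self-contained sketch using scikit-learn's DecisionTreeClassifier (its implementation of CART) is given below; the 80/20 split and the random seed are assumptions.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score

# Load the Iris dataset and split it (split ratio and seed are assumed)
data = load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# CART classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Make predictions on the test set and evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Visualize the learned decision tree
plt.figure(figsize=(12, 8))
plot_tree(clf, feature_names=data.feature_names, class_names=list(data.target_names), filled=True)
plt.show()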
OUTPUT:
Accuracy: 1.0
RESULT: Thus the given program implementing the CART learning algorithm to perform categorization has been executed successfully.
PROGRAM 9
ALGORITHM:
1. Bagging: create multiple subsets of the training data by sampling with replacement (bootstrap sampling) and train a base model on each subset.
2. Boosting: train base models sequentially, with each model fitted to training data weighted according to the errors of the previous models.
3. Aggregate the predictions of all models by taking a majority vote (for classification) or averaging (for regression).
PROGRAM:
import numpy as np
X, y = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_train_2d = pca.fit_transform(X_train)
X_test_2d = pca.transform(X_test)
rf_clf = RandomForestClassifier(random_state=42)
rf_clf.fit(X_train_2d, y_train)
ada_clf = AdaBoostClassifier(random_state=42)
ada_clf.fit(X_train_2d, y_train)
Z = Z.reshape(xx.shape)
ax.set_title(title)
# Create plots
plt.show()
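The listing above is incomplete in this copy. A minimal, self-contained sketch that trains a bagging-style ensemble (Random Forest) and a boosting ensemble (AdaBoost) on a 2-D PCA projection of the Iris data and plots their decision boundaries is given below; the split ratio, random seeds and grid step are assumptions.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score

# Load the Iris data and project it to 2-D with PCA so the decision
# boundaries can be plotted (split ratio and seed are assumed)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
pca = PCA(n_components=2)
X_train_2d = pca.fit_transform(X_train)
X_test_2d = pca.transform(X_test)

# Bagging-style ensemble: Random Forest
rf_clf = RandomForestClassifier(random_state=42)
rf_clf.fit(X_train_2d, y_train)

# Boosting ensemble: AdaBoost
ada_clf = AdaBoostClassifier(random_state=42)
ada_clf.fit(X_train_2d, y_train)

def plot_decision_boundary(clf, X, y, ax, title):
    # Evaluate the classifier on a grid covering the 2-D feature space
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                         np.arange(y_min, y_max, 0.02))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k')
    ax.set_title(title)

# Create plots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
plot_decision_boundary(rf_clf, X_test_2d, y_test, axes[0], 'Random Forest')
plot_decision_boundary(ada_clf, X_test_2d, y_test, axes[1], 'AdaBoost')
plt.show()

for name, clf in [('Random Forest', rf_clf), ('AdaBoost', ada_clf)]:
    print(f'{name} accuracy: {accuracy_score(y_test, clf.predict(X_test_2d)):.2f}')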
OUTPUTS:
RESULT: Thus the program implementing ensemble learning models to perform classification has been executed and verified successfully.