Amlnew
ALGORITHM:
Step 1: Load and read all text documents from the specified folder.
Step 2: Extract labels from the filenames and store the content of the documents in a list.
Step 3: Check for consistency between the number of documents and labels.
Step 4: Split the data into training and testing sets (80% training, 20% testing).
Step 5: Apply the TF-IDF vectorizer to convert text documents into a numerical format.
Step 6: Train a Naïve Bayesian Classifier (MultinomialNB) using the training data.
Step 7: Predict the labels for the test set and compute accuracy, precision, and recall.
Step 8: Print the accuracy, precision, recall, and a customized classification report.
PROGRAM:
import os
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report
        # Add the document content and the label to their respective lists
        documents.append(content)
        labels.append(file_label)
    except Exception as e:
        print(f"Error processing file {filename}: {e}")
        continue
# Split data into training and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(documents, labels, test_size=0.2, random_state=42)
# Function to print the first and last few lines of the classification report
def print_classification_report(report):
    lines = report.split('\n')
    # Print ellipsis
    print('...')
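The listing above survives only in fragments. As a minimal, self-contained sketch of Steps 3 to 8, assuming a small in-memory corpus in place of the document folder (the texts, the `sports`/`tech` labels, and the `stratify=labels` argument are illustrative assumptions, not the original data or code):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus standing in for the documents read from disk
documents = [
    "the team won the football match", "a late goal decided the match",
    "the striker scored a goal", "the coach praised the team",
    "the football season starts soon", "the new laptop has a fast processor",
    "the software update fixed a bug", "the processor runs the software",
    "a bug crashed the laptop", "the update improves the software",
]
labels = ["sports"] * 5 + ["tech"] * 5

# Step 3: every document must have exactly one label
assert len(documents) == len(labels)

# Step 4: 80/20 split (stratified here so both classes reach the tiny test set)
X_train, X_test, y_train, y_test = train_test_split(
    documents, labels, test_size=0.2, random_state=42, stratify=labels)

# Step 5: fit TF-IDF on the training text only, then reuse it for the test text
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Steps 6 and 7: train the classifier and score its predictions
clf = MultinomialNB()
clf.fit(X_train_tfidf, y_train)
y_pred = clf.predict(X_test_tfidf)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="weighted", zero_division=0)
recall = recall_score(y_test, y_pred, average="weighted", zero_division=0)

# Step 8: report the metrics
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision (weighted): {precision:.4f}")
print(f"Recall (weighted): {recall:.4f}")
```

Weighted recall is the support-weighted mean of the per-class recalls, which always equals the accuracy; that is why the two figures coincide in the output below.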
OUTPUT:
Accuracy: 0.8750
Precision (weighted): 0.8800
Recall (weighted): 0.8750
Classification Report:
              precision    recall  f1-score   support
    accuracy                           0.88        40
   macro avg       0.88      0.88      0.88        40
weighted avg       0.88      0.88      0.88        40
...
    accuracy                           0.88        40
   macro avg       0.88      0.88      0.88        40
weighted avg       0.88      0.88      0.88        40
RESULT:
Thus, the given program to implement a Naïve Bayesian Classifier to classify a set of documents is
executed and the output is verified successfully.
CS4514- Advanced Machine Learning Department of CSE Reg No:312422104069
AIM:
To implement a Naïve Bayesian Classifier to classify a set of 190 documents based on their content,
and to evaluate its performance using accuracy, precision, and recall metrics.
ALGORITHM:
Step 1: Load and read all text documents from the specified folder.
Step 2: Extract labels from the filenames and store the content of the documents in a list.
Step 3: Check for consistency between the number of documents and labels.
Step 4: Split the data into training and testing sets (80% training, 20% testing).
Step 5: Apply the TF-IDF vectorizer to convert text documents into a numerical format.
Step 6: Train a Naïve Bayesian Classifier (MultinomialNB) using the training data.
Step 7: Predict the labels for the test set and compute accuracy, precision, and recall.
Step 8: Print the accuracy, precision, recall, and a customized classification report.
PROGRAM:
# Import necessary libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset (you can replace this with any dataset of your choice)
iris = load_iris()
X = iris.data # Features
y = iris.target # Labels
# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
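The listing stops after the split. A minimal sketch of the remaining steps, assuming the imported `GaussianNB` is fitted with its default settings (the variable names `model` and `y_pred` are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a Gaussian Naive Bayes model on the training portion
model = GaussianNB()
model.fit(X_train, y_train)

# Predict the held-out samples and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```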
OUTPUT:
Accuracy: 100.00%
Classification Report:
              precision    recall  f1-score   support
    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
RESULT:
Thus, the given program to implement a Naïve Bayesian Classifier and evaluate its accuracy is executed
and the output is verified successfully.
Ex No: 7
Date:
COMPARE EM ALGORITHM AND K-MEANS ALGORITHM
AIM:
To apply the Expectation-Maximization (EM) algorithm and the k-Means clustering algorithm to a
dataset stored in a CSV file, and compare the clustering results of these two algorithms.
ALGORITHM:
Step 1: Load the dataset from the CSV file using the Pandas library.
Step 2: Preprocess the data by scaling the features using StandardScaler to standardize the data.
Step 3: Apply the Expectation-Maximization (EM) algorithm using the Gaussian Mixture model to cluster
the data.
Step 4: Predict and store the cluster labels from the EM algorithm.
Step 5: Apply the k-Means algorithm to the same dataset with a specified number of clusters.
Step 6: Predict and store the cluster labels from the k-Means algorithm.
Step 7: Add the resulting cluster labels from both algorithms to the original dataset for comparison.
Step 8: Print the dataset with the EM and k-Means cluster labels for comparison of the clustering results.
PROGRAM:
# Install necessary libraries: pip install pandas scikit-learn
import pandas as pd
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Apply EM algorithm
em = GaussianMixture(n_components=3, random_state=42)
em.fit(data_scaled)
em_labels = em.predict(data_scaled)
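The listing above omits the data loading, scaling, and k-Means steps. A minimal end-to-end sketch, assuming synthetic data in place of the CSV file (the generated values and the `adjusted_rand_score` comparison are additions for illustration; the column names follow the output shown below):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Hypothetical data: three separated groups (not the original CSV)
rng = np.random.default_rng(42)
centers = np.array([[40.0, 0.4], [50.0, 0.7], [60.0, 0.9]])
points = np.vstack([c + rng.normal(0, [1.0, 0.03], size=(30, 2)) for c in centers])
data = pd.DataFrame(points, columns=["Match_Score", "Performance"])

# Step 2: standardize the features
data_scaled = StandardScaler().fit_transform(data)

# Steps 3 and 4: EM via a Gaussian mixture model
em = GaussianMixture(n_components=3, random_state=42)
em_labels = em.fit_predict(data_scaled)

# Steps 5 and 6: k-Means on the same scaled data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(data_scaled)

# Steps 7 and 8: attach both label columns and print them side by side
data["EM_Cluster"] = em_labels
data["KMeans_Cluster"] = kmeans_labels
print(data.head())

# Label-permutation-invariant agreement between the two clusterings
ari = adjusted_rand_score(em_labels, kmeans_labels)
print(f"Adjusted Rand index: {ari:.2f}")
```

Cluster numbers are arbitrary, so EM cluster 0 and k-Means cluster 2 can denote the same group; the adjusted Rand index compares the groupings themselves rather than the label values.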
OUTPUT:
   Player  Match_Score  Performance  EM_Cluster  KMeans_Cluster
0       A         50.0          0.8           0               2
1       B         40.5          0.6           1               0
2       C         60.3          0.9           2               1
3       D         55.1          0.7           0               2
4       E         45.2          0.4           1               0
...
RESULT:
Thus, the given program to apply the Expectation-Maximization (EM) algorithm and the k-Means
clustering algorithm is executed and the output is verified successfully.
Ex No: 8
Date:
K-NEAREST NEIGHBOUR ALGORITHM
AIM:
To implement the k-Nearest Neighbour (k-NN) algorithm to classify the Iris dataset and display both
correct and wrong predictions.
ALGORITHM:
Step 1: Load the Iris dataset using the load_iris() function from the sklearn library.
Step 2: Split the dataset into training and testing sets using train_test_split().
Step 3: Initialize the k-NN classifier with k=3 neighbors.
Step 4: Train the k-NN model using the training data.
Step 5: Predict the labels for the test data using the trained k-NN model.
Step 6: Generate and print the classification report to evaluate the model's performance.
Step 7: Identify and display 5 correct predictions where predicted and actual labels match.
Step 8: Identify and display 5 wrong predictions where predicted and actual labels differ.
PROGRAM:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
# Make predictions
y_pred = knn.predict(X_test)
for i in range(len(y_test)):
    if y_test[i] == y_pred[i] and correct_count < 5:
        print(f"Correct: Predicted = {y_pred[i]}, Actual = {y_test[i]}")
        correct_count += 1
    elif y_test[i] != y_pred[i] and wrong_count < 5:
        print(f"Wrong: Predicted = {y_pred[i]}, Actual = {y_test[i]}")
        wrong_count += 1
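The fragment above relies on `knn`, `correct_count`, and `wrong_count`, whose definitions are not shown. A minimal self-contained sketch of all eight steps, assuming `test_size=0.3` (chosen because it yields the 45 test samples reported in the output) and `random_state=42`:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# test_size and random_state are assumptions, not taken from the listing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# k-NN with k=3 neighbours, as stated in Step 3
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

print("Classification Report:")
print(classification_report(y_test, y_pred))

# Show at most 5 correct and 5 wrong predictions
correct_count = 0
wrong_count = 0
for predicted, actual in zip(y_pred, y_test):
    if predicted == actual and correct_count < 5:
        print(f"Correct: Predicted = {predicted}, Actual = {actual}")
        correct_count += 1
    elif predicted != actual and wrong_count < 5:
        print(f"Wrong: Predicted = {predicted}, Actual = {actual}")
        wrong_count += 1
```

If the classifier makes fewer than five mistakes on the test set, fewer than five "Wrong" lines are printed.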
OUTPUT:
Classification Report:
              precision    recall  f1-score   support
    accuracy                           0.98        45
   macro avg       0.98      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45
RESULT:
Thus, the given program to implement the k-Nearest Neighbour (k-NN) algorithm and display both
correct and wrong predictions is executed and the output is verified successfully.
AIM:
To implement the non-parametric Locally Weighted Regression (LWR) algorithm to fit data points and
visualize the predicted results compared to the actual dataset.
ALGORITHM:
Step 1: Generate a synthetic dataset with input values and noisy target values.
Step 2: Define a weight function based on the distance between training data points and the query point.
Step 3: Formulate the Locally Weighted Regression model by solving for theta using the normal equation
and weighted matrix.
Step 4: Add a bias term to the input features to accommodate the intercept in the linear model.
Step 5: Select a suitable bandwidth parameter (tau) that controls the extent of locality for the weights.
Step 6: For each test point, compute the weights and predict the corresponding output using the LWR
model.
Step 7: Store the predicted output for each test point by applying the LWR model to the entire dataset.
Step 8: Plot the original dataset and the predicted curve on the same graph to compare the model fit with
the actual data.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
X = np.linspace(0, 15, 150)
y = np.sin(X) + np.random.normal(0, 0.3, X.shape)
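The listing ends after the data generation. A minimal sketch of Steps 2 to 7, assuming a Gaussian kernel for the weights and a bandwidth of `tau = 0.5` (both are assumptions; the function names are illustrative, and plotting is left out so the block runs without a display):

```python
import numpy as np

np.random.seed(42)
X = np.linspace(0, 15, 150)
y = np.sin(X) + np.random.normal(0, 0.3, X.shape)

def gaussian_weights(X, x_query, tau):
    """Weight of every training point relative to one query point."""
    return np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))

def lwr_predict(X, y, x_query, tau):
    """Predict y at x_query from a locally weighted linear fit."""
    A = np.column_stack([np.ones_like(X), X])  # bias column plus the feature
    W = np.diag(gaussian_weights(X, x_query, tau))
    # Weighted normal equation: theta = (A^T W A)^-1 A^T W y
    theta = np.linalg.pinv(A.T @ W @ A) @ A.T @ W @ y
    return np.array([1.0, x_query]) @ theta

tau = 0.5  # bandwidth: smaller values follow the data more closely
y_pred = np.array([lwr_predict(X, y, x0, tau) for x0 in X])

# Compare the fitted curve with the noise-free sine used to generate the data
rmse = np.sqrt(np.mean((y_pred - np.sin(X)) ** 2))
print(f"RMSE against the noise-free curve: {rmse:.3f}")
```

For Step 8, `plt.scatter(X, y)` followed by `plt.plot(X, y_pred)` draws the data and the fitted curve on the same axes.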
OUTPUT:
RESULT:
Thus, the given program to implement the non-parametric Locally Weighted Regression (LWR)
algorithm to fit data points is executed and the output is verified successfully.
Ex No: 10
Date:
FIND-S ALGORITHM
AIM:
To write a program to implement the Find-S algorithm for the given data.
ALGORITHM:
Step 1: Initialize the hypothesis with 'Φ' for each feature.
Step 2: Iterate through each example in the dataset.
Step 3: If the target value of the example is 'yes', update the hypothesis based on the example's features.
Step 4: If a feature value in the hypothesis does not match the example's feature value, set it to '?'.
Step 5: Return the updated hypothesis.
Step 6: Print the final hypothesis.
PROGRAM:
def find_s(examples):
    hypothesis = ['Φ', 'Φ', 'Φ']
    for example in examples:
        if example[-1] == 'yes':
            for i in range(len(hypothesis)):
                if hypothesis[i] == 'Φ':
                    hypothesis[i] = example[i]
                elif hypothesis[i] != example[i]:
                    hypothesis[i] = '?'
    return hypothesis

data = [
    [1, 'apple', 'red', 'fruit', 'yes'],
    [2, 'mango', 'yellow', 'fruit', 'no'],
    [3, 'jackfruit', 'green', 'fruit', 'yes'],
    [4, 'BlueBerry', 'purple', 'fruit', 'yes']
]
hypothesis = find_s(data)
print("Final Hypothesis:", hypothesis)
OUTPUT:
Final Hypothesis: ['?', '?', '?']
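Every attribute is generalised to '?' because the three-element hypothesis is compared with the first three columns of each row (serial number, name, and colour), so the 'fruit' column is never examined. A variant sketch, assuming the first column is an identifier rather than a feature (the `skip_first_column` parameter is hypothetical), sizes the hypothesis from the data instead:

```python
def find_s(examples, skip_first_column=True):
    """Find-S over rows shaped [id, feature, ..., label]."""
    start = 1 if skip_first_column else 0
    n_features = len(examples[0]) - 1 - start
    hypothesis = ['Φ'] * n_features  # most specific hypothesis
    for example in examples:
        if example[-1] != 'yes':
            continue  # Find-S ignores negative examples
        for i, value in enumerate(example[start:-1]):
            if hypothesis[i] == 'Φ':
                hypothesis[i] = value   # first positive example: copy it
            elif hypothesis[i] != value:
                hypothesis[i] = '?'     # mismatch: generalise the attribute
    return hypothesis

data = [
    [1, 'apple', 'red', 'fruit', 'yes'],
    [2, 'mango', 'yellow', 'fruit', 'no'],
    [3, 'jackfruit', 'green', 'fruit', 'yes'],
    [4, 'BlueBerry', 'purple', 'fruit', 'yes'],
]
hypothesis = find_s(data)
print("Final Hypothesis:", hypothesis)  # → ['?', '?', 'fruit']
```

With the identifier skipped, the attribute shared by all positive examples ('fruit') is retained while name and colour are generalised.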
RESULT:
Thus, the given program to implement the Find-S algorithm for the given data is executed and the output is
verified successfully.
AIM:
To construct a Bayesian Network using medical data and demonstrate the diagnosis of heart patients using
a standard heart disease dataset. The model will be trained using Maximum Likelihood Estimation and will
be used for inference.
ALGORITHM:
Step 1: Load the heart disease dataset using Pandas and optimize memory usage by converting data types
to appropriate types.
Step 2: Handle any missing values in the dataset by filling in missing values with the column mean.
Step 3: Split the dataset into training and testing sets using the train_test_split() function.
Step 4: Define the Bayesian Network structure by specifying the relationships between features and the
target variable.
Step 5: Train the Bayesian Network model using the Maximum Likelihood Estimation (MLE) method.
Step 6: Use Variable Elimination for inference to diagnose the likelihood of heart disease based on certain
features.
Step 7: Perform diagnosis on new patient data using the trained model to predict the presence of heart
disease.
Step 8: Evaluate the model’s accuracy by testing the predictions on the test dataset and calculate the
overall accuracy.
PROGRAM:
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
from sklearn.model_selection import train_test_split
import numpy as np
import logging
# Reduce memory usage by converting data types to the most appropriate types
data = data.astype({
    'age': 'int8',
    'sex': 'int8',
    'cp': 'int8',
    'trestbps': 'int16',
    'chol': 'int16',
    'fbs': 'int8',
    'restecg': 'int8',
    'thalach': 'int16',
    'exang': 'int8',
    'oldpeak': 'float32',
    'slope': 'int8',
    'ca': 'int8',
    'thal': 'int8',
    'target': 'int8'
})
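# Network structure: each listed feature is a parent of 'target'
# (the variable name `model` is an assumption)
model = BayesianNetwork([
    ('age', 'target'),
    ('cp', 'target'),
    ('thalach', 'target'),
    ('exang', 'target')
])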
    if predicted == actual:
        correct_predictions += 1
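For a discrete node, the Maximum Likelihood Estimation in Step 5 reduces to relative frequencies: each conditional probability is the count of a child value divided by the count of its parent configuration. A minimal sketch in plain Python, assuming a single parent `exang` and made-up toy records (it uses neither pgmpy nor the heart disease dataset, and `mle_cpt` is a hypothetical helper):

```python
from collections import Counter, defaultdict

# Hypothetical (exang, target) records, invented for illustration
records = [(0, 0), (0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 1), (1, 0)]

def mle_cpt(pairs):
    """Estimate P(child | parent) as normalised co-occurrence counts."""
    counts = defaultdict(Counter)
    for parent, child in pairs:
        counts[parent][child] += 1
    return {
        parent: {child: n / sum(ctr.values()) for child, n in ctr.items()}
        for parent, ctr in counts.items()
    }

cpt = mle_cpt(records)
print(cpt[1][1])  # estimated P(target=1 | exang=1) → 0.75
print(cpt[0][1])  # estimated P(target=1 | exang=0) → 0.25
```

Inference then combines such tables; with several parents, the counts are taken per joint parent configuration instead of per single value.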
RESULT:
Thus, the given program to construct a Bayesian Network using medical data, train it with Maximum
Likelihood Estimation, and use it for inference is executed and the output is verified successfully.
Ex No: 12
Date:
LOGISTIC REGRESSION
AIM:
To implement a Logistic Regression model to classify the given dataset and evaluate the model's
performance using accuracy, confusion matrix, and classification report.
ALGORITHM:
Step 1: Load the heart disease dataset using Pandas and display a preview of the data.
Step 2: Define the feature matrix (X) and the target vector (y) by separating the target column from the
dataset.
Step 3: Split the dataset into training and testing sets (80% training, 20% testing) using train_test_split().
Step 4: Initialize a Logistic Regression model, specifying any necessary parameters such as max_iter.
Step 5: Train the Logistic Regression model using the training dataset.
Step 6: Make predictions on the test data using the trained model.
Step 7: Evaluate the model by calculating the accuracy score on the test data.
Step 8: Print the confusion matrix and classification report to analyze the performance of the model further.
PROGRAM:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
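The listing shows only the imports and the split. A minimal sketch of Steps 3 to 8, assuming synthetic binary data from `make_classification` in place of the heart disease CSV (the 250 samples and 13 features are chosen only to mirror the 50 test rows and 13 feature columns in the output; `max_iter=1000` is likewise an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix X and target vector y
X, y = make_classification(n_samples=250, n_features=13, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A higher max_iter gives the solver room to converge
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Confusion Matrix:")
print(cm)
print("Classification Report:")
print(classification_report(y_test, y_pred))
```

The diagonal of the confusion matrix counts the correct predictions, so its trace divided by the number of test samples equals the accuracy.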
OUTPUT:
Dataset preview:
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target
0   63    1   3       145   233    1        0      150      0      2.3      3   0     6       1
1   67    1   2       160   286    0        1      108      1      1.5      1   0     3       1
2   67    1   2       160   286    0        1      108      1      1.5      1   0     3       1
3   37    1   2       130   250    0        1      187      0      3.5      0   0     3       1
4   41    0   1       130   204    0        1      172      0      1.4      1   0     3       1
    accuracy                           0.92        50
   macro avg       0.92      0.92      0.92        50
weighted avg       0.92      0.92      0.92        50
RESULT:
Thus, the given program to implement a Logistic Regression model to classify the given dataset is
executed and the output is verified successfully.