ML Full For Print New 1

The document outlines a series of exercises focused on implementing various machine learning algorithms, including Linear Regression, binary classification with k-NN, and K-Means clustering, using real datasets. Each exercise includes an aim, algorithm steps, and program code, demonstrating how to preprocess data, train models, evaluate performance, and visualize results. The final exercise emphasizes analyzing model performance to detect overfitting.

EX.NO:1 Implement a Linear Regression with a Real Dataset
        (https://www.kaggle.com/harrywang/housing). Experiment with different
        features in building a model. Tune the model's hyperparameters.
DATE:

Aim:
To implement Linear Regression with a real dataset, experimenting with different features in building
a model and tuning the model's hyperparameters.

Algorithm:

1. Import necessary libraries.
2. Load the housing dataset from a URL.
3. Check and handle missing values.
4. Convert categorical data to numeric using one-hot encoding.
5. Define features (X) and target (y).
6. Split the data into training and testing sets.
7. Train a Linear Regression model.
8. Predict house prices on the test set.
9. Evaluate the model using MAE, MSE, RMSE, and R² score (defined below).
10. Visualize actual vs predicted values and residuals.
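For reference, the evaluation metrics in step 9 use their standard definitions (n test samples, actual
values y_i, predictions ŷ_i, mean of actual values ȳ):

MAE  = (1/n) Σ |y_i − ŷ_i|
MSE  = (1/n) Σ (y_i − ŷ_i)²
RMSE = √MSE
R²   = 1 − Σ (y_i − ŷ_i)² / Σ (y_i − ȳ)²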

Program:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Load the California housing dataset
dataset_url = "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.csv"
df = pd.read_csv(dataset_url)

# Display first few rows of the dataset
print(df.head())

# Check for missing values
print("Missing values per column:\n", df.isnull().sum())

# Convert categorical column 'ocean_proximity' to numerical using one-hot encoding
df = pd.get_dummies(df, columns=['ocean_proximity'], drop_first=True)

# Fill missing values with the median of each column
df.fillna(df.median(), inplace=True)

# Define independent (X) and dependent (y) variables
X = df.drop(columns=['median_house_value'])  # Features
y = df['median_house_value']                 # Target variable

# Split dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Calculate performance metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print("\nModel Performance Metrics:")
print(f"Mean Absolute Error (MAE): {mae}")
print(f"Mean Squared Error (MSE): {mse}")
print(f"Root Mean Squared Error (RMSE): {rmse}")
print(f"R² Score: {r2}")

# Plot Actual vs Predicted Prices
plt.figure(figsize=(8, 6))
sns.scatterplot(x=y_test, y=y_pred, alpha=0.5)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted House Prices")
plt.show()

# Plot Residual Errors
residuals = y_test - y_pred
plt.figure(figsize=(8, 6))
sns.histplot(residuals, bins=50, kde=True)
plt.xlabel("Residual Error")
plt.title("Distribution of Residuals")
plt.show()
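The exercise also asks for hyperparameter tuning. Ordinary LinearRegression exposes almost no
hyperparameters, so one way to demonstrate tuning is to swap in its regularized variant, Ridge, and
search over its alpha parameter with cross-validation. The following is an illustrative sketch only,
reusing the X_train/X_test split above; the alpha grid is an arbitrary choice:

from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Search a small, arbitrary grid of regularization strengths with 5-fold CV
param_grid = {'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}
grid = GridSearchCV(Ridge(), param_grid, cv=5, scoring='r2')
grid.fit(X_train, y_train)

print("Best alpha:", grid.best_params_['alpha'])
print("Best cross-validated R²:", grid.best_score_)
print("Test-set R² of tuned model:", grid.best_estimator_.score(X_test, y_test))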

Sample input output:

Result:
Thus, Linear Regression with a real dataset, experimenting with different features in building a model,
was implemented and executed successfully.

Ex.no:2 Implement a binary classification model, i.e., one that answers a binary
        question such as "Are houses in this neighbourhood above a certain
        price?" (use data from exercise 1). Modify the classification threshold and
        determine how that modification influences the model. Experiment with
        different classification metrics to determine your model's effectiveness.
Date:

Aim:
To implement a binary classification model, modify the classification threshold to determine how that
modification influences the model, and experiment with different classification metrics to determine the
model's effectiveness.

Algorithm:

 Import required libraries.
 Load the diabetes dataset from a URL.
 Split the data into features (X) and target (y).
 Split the dataset into training and testing sets.
 Train k-NN models with different values of k (1 to 8).
 Plot training and testing accuracy vs k to find the best k.
 Train a k-NN model using the best k (e.g., 7).
 Predict and evaluate using:
o Confusion matrix
o Classification report
o ROC curve and AUC score
 Vary the classification threshold on the predicted probabilities and observe its effect on the
metrics (see the sketch after the program).
 Perform hyperparameter tuning using GridSearchCV to find the best k.
 Display the best parameters and the corresponding score.

Program:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, roc_auc_score

# Load dataset from an alternative online source
url = "https://raw.githubusercontent.com/plotly/datasets/master/diabetes.csv"
df = pd.read_csv(url)
print(df.head())

# Define features and target variable
X = df.drop('Outcome', axis=1).values
y = df['Outcome'].values

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42, stratify=y)

# Initialize variables
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

# Loop through different k values to determine the best k
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)

# Plot accuracy vs number of neighbors
plt.figure()
plt.title('k-NN Varying Number of Neighbors')
plt.plot(neighbors, test_accuracy, label='Testing Accuracy')
plt.plot(neighbors, train_accuracy, label='Training Accuracy')
plt.legend()
plt.xlabel('Number of Neighbors')
plt.ylabel('Accuracy')
plt.show()

# Train the model with optimal k
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# Evaluate the model
y_pred = knn.predict(X_test)
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Compute ROC curve
y_pred_proba = knn.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

plt.figure()
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='k-NN (n_neighbors=7)')
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.title('ROC Curve')
plt.legend()
plt.show()

# Compute AUC score
auc_score = roc_auc_score(y_test, y_pred_proba)
print(f"ROC AUC Score: {auc_score:.4f}")

# Perform hyperparameter tuning
param_grid = {'n_neighbors': np.arange(1, 50)}
knn_cv = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
knn_cv.fit(X, y)

# Print best parameters and score
print(f"Best Score: {knn_cv.best_score_:.4f}")
print(f"Best Parameters: {knn_cv.best_params_}")

Sample input output:

Result:
Thus, a binary classification model was implemented; the classification threshold was modified to
determine how that modification influences the model, and different classification metrics were used
to determine the model's effectiveness. The program was executed successfully.

EX.NO:3 CLASSIFICATION WITH NEAREST NEIGHBORS. IN THIS QUESTION
        YOU WILL USE SCIKIT-LEARN'S KNN CLASSIFIER TO CLASSIFY
        REAL VS FAKE NEWS HEADLINES. THE AIM OF THIS QUESTION IS
        FOR YOU TO READ THE SCIKIT-LEARN API AND GET
        COMFORTABLE WITH TRAINING/VALIDATION SPLITS. USE THE
        CALIFORNIA HOUSING DATASET.
DATE:

AIM:

To implement a program for classification using Scikit-learn's KNN classifier (demonstrated here on
the scikit-learn wine dataset) and evaluate the model's performance with training/validation splits and
metrics.

ALGORITHM:

1. Start the program.
2. Import the necessary libraries and dataset.
3. Preprocess the dataset if needed.
4. Split the dataset into training and testing sets.
5. Train the K-Nearest Neighbors (KNN) model with different values of K.
6. Plot the accuracy scores for different values of K.
7. Choose the best K and evaluate the model.
8. Print accuracy, confusion matrix, and classification report.
9. Plot the confusion matrix.
10. End the program.

PROGRAM:

# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Load Dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Try different K values
neighbors = np.arange(1, 10)
train_accuracy = []
test_accuracy = []

for k in neighbors:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy.append(knn.score(X_train, y_train))
    test_accuracy.append(knn.score(X_test, y_test))

# Plot K vs Accuracy
plt.figure(figsize=(8, 5))
plt.plot(neighbors, train_accuracy, label="Train Accuracy", marker='o')
plt.plot(neighbors, test_accuracy, label="Test Accuracy", marker='s')
plt.xlabel("Number of Neighbors (K)")
plt.ylabel("Accuracy")
plt.title("KNN Accuracy for Different K Values")
plt.legend()
plt.grid(True)
plt.show()

# Final Model with Best K
best_k = 5
knn = KNeighborsClassifier(n_neighbors=best_k)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Evaluation
acc = accuracy_score(y_test, y_pred)
print(f"\nFinal Accuracy (K={best_k}): {acc:.4f}")

cm = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:")
print(cm)

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=wine.target_names))

# Plot Confusion Matrix
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=wine.target_names,
            yticklabels=wine.target_names)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
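Since k-NN is distance-based and the wine features have very different magnitudes, feature scaling
usually matters. A small illustrative sketch, reusing the split and best_k defined above, that wraps the
classifier in a pipeline with StandardScaler:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize features so no single feature dominates the distance metric
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
scaled_knn.fit(X_train, y_train)
print(f"Accuracy with scaling (K={best_k}): {scaled_knn.score(X_test, y_test):.4f}")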
SAMPLE INPUT OUTPUT:

RESULT:

Thus, the program for classification using Scikit-learn's KNN classifier, with evaluation of the model's
performance using training/validation splits and metrics, was executed successfully.

EX NO: 4 ANALYZE DELTAS BETWEEN TRAINING SET
         AND VALIDATION SET RESULTS TO DETERMINE WHETHER THE
         MODEL IS OVERFITTING
DATE:

AIM:

To analyze the difference in performance between training and validation sets to determine if the model is
overfitting, and to visualize the results to detect and address this issue.

ALGORITHM:

1. Start the program.
2. Import necessary libraries and generate or load a classification dataset.
3. Split the dataset into training and testing sets.
4. Train a Decision Tree classifier with different depths.
5. Evaluate model accuracy on training and testing sets for each depth.
6. Record and compare the results.
7. Plot accuracy vs model complexity (tree depth).
8. Identify the presence of overfitting.
9. End the program.

PROGRAM:

# Step 1: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Step 2: Generate synthetic classification dataset
X, y = make_classification(n_samples=10000, n_features=20,
                           n_informative=5, n_redundant=15, random_state=1)

# Step 3: Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Train and test models with different max_depth values
train_scores = []
test_scores = []
depth_range = range(1, 21)

for depth in depth_range:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    train_yhat = model.predict(X_train)
    test_yhat = model.predict(X_test)
    train_acc = accuracy_score(y_train, train_yhat)
    test_acc = accuracy_score(y_test, test_yhat)
    train_scores.append(train_acc)
    test_scores.append(test_acc)
    print(f"Depth={depth}, Train Acc={train_acc:.3f}, Test Acc={test_acc:.3f}")

# Step 5: Plot Training vs Testing Accuracy
plt.figure(figsize=(10, 6))
plt.plot(depth_range, train_scores, '-o', label='Training Accuracy')
plt.plot(depth_range, test_scores, '-o', label='Testing Accuracy')
plt.xlabel('Tree Depth')
plt.ylabel('Accuracy')
plt.title('Overfitting Detection: Training vs Testing Accuracy')
plt.legend()
plt.grid(True)
plt.show()
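To make the "deltas" of the exercise title explicit, the gap between training and testing accuracy can
be computed per depth. A short sketch reusing train_scores, test_scores, and depth_range from
above; the 0.05 cutoff is an arbitrary illustrative value, not a standard threshold:

# A widening train-test gap as depth grows is the overfitting signal
deltas = np.array(train_scores) - np.array(test_scores)
for depth, delta in zip(depth_range, deltas):
    flag = "  <-- large gap, likely overfitting" if delta > 0.05 else ""
    print(f"Depth={depth}, Train-Test Delta={delta:.3f}{flag}")

best_depth = list(depth_range)[int(np.argmax(test_scores))]
print(f"Depth with best test accuracy: {best_depth}")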

SAMPLE INPUT OUTPUT:

RESULT:

Thus, the analysis of the difference in performance between training and validation sets to determine
whether the model is overfitting, with visualization of the results to detect and address this issue, was
executed successfully.

EX NO: 5 Implement the K-Means algorithm using the given
dataset.
DATE:

Aim:

To implement the K-Means Clustering Algorithm on the given biological dataset and group the data points
(species) based on their codon usage frequencies.

Algorithm: K-Means Clustering

1. Start
2. Import the required libraries (pandas, sklearn, matplotlib, etc.).
3. Load the dataset containing codon usage frequencies and other features.
4. Select the relevant numerical features (e.g., UUU, UUC, UUA, UUG).
5. Normalize the features using StandardScaler for better clustering results.
6. Choose the number of clusters (e.g., k = 3).
7. Apply the K-Means clustering algorithm:
■ Initialize centroids randomly.
■ Assign each point to the nearest centroid.
■ Update centroids as the mean of assigned points.
■ Repeat steps until centroids do not change or max iterations are reached.
8. Assign the cluster labels to the dataset.
9. Display the final clustered data with the cluster number.
10. Optionally, visualize clusters using scatter plots.

PROGRAM:

# Step 1: Import required libraries
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns

# Step 2: Create sample data
data = {
    'Kingdom': ['vrl']*5 + ['pri']*2,
    'DNAType': [0]*7,
    'SpeciesID': [100217, 100220, 100755, 100880, 100887, 9601, 9606],
    'Ncodons': [1995, 1474, 4862, 1915, 22831, 1097, 40662582],
    'SpeciesName': [
        'Epizootic haemotopoietic necrosis virus',
        'Bohle iridovirus',
        'Sweet potato leaf curl virus',
        'Northern cereal mosaic virus',
        'Soil-borne cereal mosaic virus',
        'Pongo pygmaeus abelii',
        'Homo sapiens'
    ],
    'UUU': [0.01654, 0.02714, 0.01974, 0.01775, 0.02816, 0.02552, 0.01757],
    'UUC': [0.01203, 0.01357, 0.0218, 0.02245, 0.01371, 0.03555, 0.02028],
    'UUA': [0.0005, 0.00068, 0.01357, 0.01619, 0.00767, 0.00547, 0.00767],
    'UUG': [0.00351, 0.00678, 0.01543, 0.00992, 0.03679, 0.01367, 0.01293]
}
df = pd.DataFrame(data)

# Step 3: Extract codon usage features
features = df[['UUU', 'UUC', 'UUA', 'UUG']]

# Step 4: Normalize codon frequencies
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

# Step 5: Apply KMeans clustering
kmeans = KMeans(n_clusters=3, random_state=0)
df['Cluster'] = kmeans.fit_predict(scaled_features)

# Step 6: Display final output
pd.set_option('display.max_columns', None)
print("Final Output:\n")
print(df[['Kingdom', 'DNAType', 'SpeciesID', 'Ncodons', 'SpeciesName', 'UUU', 'UUC', 'UUA', 'UUG',
          'Cluster']])

# Step 7: Optional visualization
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='UUU', y='UUC', hue='Cluster', palette='Set1', s=100)
plt.title("Codon Usage Clustering using K-Means")
plt.xlabel("UUU Frequency")
plt.ylabel("UUC Frequency")
plt.grid(True)
plt.show()
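Step 6 of the algorithm fixes k = 3; a common heuristic for choosing k is the elbow method. A brief
illustrative sketch reusing scaled_features from above (with only 7 samples in this toy dataset, k is
kept small):

# Plot within-cluster sum of squares (inertia) for several k values;
# the "elbow" in the curve suggests a reasonable cluster count.
inertias = []
k_values = range(1, 7)
for k in k_values:
    km = KMeans(n_clusters=k, random_state=0, n_init=10)
    km.fit(scaled_features)
    inertias.append(km.inertia_)

plt.plot(list(k_values), inertias, marker='o')
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia (within-cluster SSE)")
plt.title("Elbow Method for Choosing k")
plt.show()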

SAMPLE INPUT OUTPUT:

RESULT:

Thus, the K-Means clustering algorithm was implemented on the given biological dataset, grouping
the data points (species) based on their codon usage frequencies, and was completed successfully.

EX.NO:6 IMPLEMENT THE NAÏVE BAYES CLASSIFIER USING THE GIVEN
DATASET
DATE:

Aim:

To implement the Naïve Bayes classifier using the given dataset for predicting the results.

Algorithm:

STEP 1: Start the algorithm and implement the program.

STEP 2: Open the Jupyter notebook and activate the environment.

STEP 3: Create a new environment and rename it.

STEP 4: Install the necessary packages and libraries in the created environment in the Jupyter
notebook.

STEP 5: Perform the data pre-processing step.

STEP 6: Fit Naive Bayes to the training set.

STEP 7: Predict the test result.

STEP 8: Test the accuracy of the result (creation of the confusion matrix).

STEP 9: Visualize the test set result.

STEP 10: Finish.

Program:

# Step 1: Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Step 2: Create a sample dataset similar to what's used in the experiment
data = {
    'User ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [19, 35, 26, 27, 45, 46, 48, 50, 29, 31],
    'EstimatedSalary': [19000, 20000, 43000, 57000, 26000, 28000, 30000, 87000, 80000, 150000],
    'Purchased': [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]
}
dataset = pd.DataFrame(data)

# Step 3: Select features (Age, EstimatedSalary) and target (Purchased)
X = dataset.iloc[:, [2, 3]].values  # Age and EstimatedSalary
y = dataset.iloc[:, 4].values       # Purchased

# Step 4: Split the dataset into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 5: Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Step 6: Train the Naive Bayes model
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Step 7: Predict the test set results
y_pred = classifier.predict(X_test)

# Step 8: Evaluate the model
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)
print("Confusion Matrix:\n", cm)
print("Accuracy Score:", ac)

# Step 9: Visualization function
def plot_decision_boundary(X_set, y_set, title):
    X1, X2 = np.meshgrid(
        np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
        np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
    plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
                 alpha=0.75, cmap=ListedColormap(('purple', 'green')))
    plt.xlim(X1.min(), X1.max())
    plt.ylim(X2.min(), X2.max())
    for i, j in enumerate(np.unique(y_set)):
        plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                    c=ListedColormap(('purple', 'green'))(i), label=j)
    plt.title(title)
    plt.xlabel('Age')
    plt.ylabel('Estimated Salary')
    plt.legend()
    plt.show()

# Step 10: Plot results
plot_decision_boundary(X_train, y_train, 'Naive Bayes (Training set)')
plot_decision_boundary(X_test, y_test, 'Naive Bayes (Test set)')
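As a quick usage check, the trained classifier can score a new, hypothetical customer. The age and
salary below are made-up values for illustration only, and the input must be scaled with the same
StandardScaler fitted above:

# Hypothetical new customer: Age 40, EstimatedSalary 60000 (illustrative values)
new_customer = sc.transform([[40, 60000]])
print("Predicted class:", classifier.predict(new_customer)[0])
print("Class probabilities [P(0), P(1)]:", classifier.predict_proba(new_customer)[0])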

SAMPLE INPUT OUTPUT:

RESULT:

Thus, the Naïve Bayes classifier was implemented using the given dataset and completed
successfully.

EX.NO: 7 PROJECT
         BRAIN TUMOUR DETECTION USING A MACHINE LEARNING
         ALGORITHM
DATE:

Abstract

The development of aberrant brain cells, some of which may turn cancerous, is known as a brain
tumour. Magnetic Resonance Imaging (MRI) scans are the most common technique for finding brain
tumours, as information about aberrant tissue growth in the brain is discernible from them. In
numerous research papers, machine learning and deep learning algorithms are used to detect brain
tumours. When these algorithms are applied to MRI images, predicting a brain tumour takes very
little time, and the higher accuracy makes it easier to treat patients. These predictions also allow the
radiologist to make speedy decisions. In the proposed work, a self-defined Artificial Neural Network
(ANN) and a Convolutional Neural Network (CNN) are used to detect the presence of a brain
tumour, and their performance is analyzed.
Keywords: Convolutional Neural Network, Machine Learning, Brain tumour, Algorithms

Introduction

The brain is one of the most crucial parts of the human body, since it regulates the operation of all
other organs and aids in decision-making. It is essentially the central nervous system's command
post and is in charge of carrying out the body's regular voluntary and involuntary functions. A
tumour is an uncontrolled proliferation of undesirable tissue that forms a fibrous mesh inside the
brain. A brain tumour is identified in roughly 3,540 children under the age of 15 each year. To
effectively prevent and treat the condition, it is crucial to have a thorough grasp of brain tumours
and their stages. ANN and CNN are used in the classification of normal and tumour brains.

An ANN (Artificial Neural Network) works like the human brain's nervous system: on this basis, a
digital computer is built with a large number of interconnections and networks, which lets the
neural network be trained using simple processing units applied to the training set, storing
experiential knowledge. It has different layers of neurons connected together. The neural network
acquires knowledge through a learning process applied to the data set. There is one input and one
output layer, whereas there may be any number of hidden layers. In the learning process, a weight
and bias are added to the neurons of each layer, depending on the input features and on the
previous layers (for the hidden layers and output layer). A model is trained based on the activation
function applied to the input features and the hidden layers, where most of the learning happens, to
achieve the expected output.
Existing Methodology

Brain tumours are detected using image-processing techniques. Various algorithms are used to
arrive at the best results; for example, probabilistic neural networks have been used for better
productivity together with SVM and KNN techniques. Segmentation plays a major role in detecting
brain tumours.

Pre-Processing

Pre-processing techniques aim at enhancing the image without changing its information content.
The primary driver of image flaws is low image quality.

Segmentation

Region growing is a basic region-based image segmentation method. It is also classified as a
pixel-based image segmentation method, since it involves the selection of initial seed points. This
approach to segmentation examines the neighbouring pixels of the initial seed points and
determines whether those neighbours should be added to the region. The process is iterated, in the
same manner as general data-clustering algorithms. A general sketch of the region-growing
algorithm is given below.
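A minimal, illustrative sketch of region growing (not the paper's implementation; it assumes a
grayscale image array, a single seed pixel, and a fixed intensity tolerance):

import numpy as np
from collections import deque

def region_grow(img, seed, tol=10):
    # Grow a region from the seed pixel, adding 4-connected neighbours
    # whose intensity is within tol of the seed intensity.
    h, w = img.shape
    seed_val = float(img[seed])
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] \
               and abs(float(img[nr, nc]) - seed_val) <= tol:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask

# Toy usage: a synthetic 6x6 image with one bright square region
img = np.zeros((6, 6), dtype=np.uint8)
img[2:5, 2:5] = 200
print(region_grow(img, seed=(3, 3), tol=10).astype(int))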

Convolutional Neural Network

Convolutional Neural Networks (CNNs) are easier to train and less liable to overfitting. As
mentioned earlier in the report, we use a patch-based segmentation approach. The convolutional
architecture and implementation are carried out using CAFFE. CNNs are a continuation of the
multi-layer perceptron. In an MLP, a unit performs a simple computation by taking the weighted
sum of all the units that serve as its input. The network is organized into layers, with each unit
connected to units in the previous layer. The essence of CNNs is the convolutions.

The main trick by which Convolutional Neural Networks avoid the problem of too many
parameters is shared, sparse connections: each unit is not connected to every unit in the
previous layer.

Proposed Methodology

The two techniques, ANN and CNN, are applied to the brain tumour dataset, and their
performance in classifying the images is analyzed. The steps followed in applying ANN to the brain
tumour dataset are:

1. Import the needed packages.
2. Import the data folder.
3. Read the images, provide the labels for each image (set an image having a brain tumour as 1
   and an image not having a brain tumour as 0), and store them in the DataFrame.
4. Resize the images to 256x256 by reading them one by one.
5. Normalize the images.
6. Split the data set into train, validation, and test sets.
7. Create the model.
8. Compile the model.
9. Apply the model to the train set.
10. Evaluate the model by applying it to the test set.

The ANN model used here has seven layers. The first layer is a flatten layer, which converts the
256x256x3 images into a one-dimensional array. The next five layers are dense layers with the ReLU
activation function and 128, 256, 512, 256, and 128 neurons respectively; these five layers act as the
hidden layers. The last dense layer, with the sigmoid activation function and 1 neuron representing
the two classes, is the output layer.

The model is compiled with the Adam optimization technique and the binary cross-entropy loss
function. The model is generated and trained by providing the training images and the validation
images; once the model is trained, it is tested using the test image set. A sketch of this model is
shown below.
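A minimal Keras sketch of the ANN described above (an illustrative reconstruction from the text, not
the authors' exact code; it assumes TensorFlow/Keras, and the layer sizes, activations, and compile
settings follow the description):

from tensorflow import keras
from tensorflow.keras import layers

ann = keras.Sequential([
    keras.Input(shape=(256, 256, 3)),
    layers.Flatten(),                       # 256x256x3 image -> one-dimensional array
    layers.Dense(128, activation='relu'),   # five hidden dense layers
    layers.Dense(256, activation='relu'),
    layers.Dense(512, activation='relu'),
    layers.Dense(256, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(1, activation='sigmoid')   # output layer: 1 neuron for the two classes
])
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
ann.summary()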
Next, the same dataset is given to the CNN technique. The steps followed in applying CNN to the
brain tumour dataset are:

1. Import the needed packages.
2. Import the data folder (Yes and No).
3. Set the class labels for the images (1 for brain tumour and 0 for no brain tumour).
4. Convert the images into shape (256x256).
5. Normalize the images.
6. Split the images into train, validation, and test set images.
7. Create the sequential model.
8. Compile the model.
9. Apply it to the train dataset (use the validation set to evaluate the training performance).
10. Evaluate the model using the test images.
11. Plot the graph comparing the training and validation accuracy.
12. Draw the confusion matrix for the actual output against the predicted output.

The CNN sequential model is generated by stacking different layers. The input image is reshaped
to 256x256. The convolution layers are applied to the input image with ReLU as the activation
function and padding set to "same", which means the output image has the same spatial size as the
input; the numbers of filters are 32, 32, 64, 128, and 256 for the successive convolution layers. Max
pooling is applied with a 2x2 window size, and the dropout function is called with a 20% dropout
rate. A flatten layer is applied to convert the features into a one-dimensional array. The fully
connected layer is created by calling the dense method with 256 units and ReLU as the activation
function. The output layer has 1 unit to represent the two classes, with sigmoid as the activation
function. The architecture of the CNN model is summarized in the figure below. The implementation
is done in the Python language and executed in Google Colab.
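A minimal Keras sketch of this CNN (again an illustrative reconstruction from the description, not the
authors' exact code; TensorFlow/Keras is an assumption):

from tensorflow import keras
from tensorflow.keras import layers

cnn = keras.Sequential()
cnn.add(keras.Input(shape=(256, 256, 3)))
# Five convolution blocks: Conv(3x3, relu, padding='same') -> MaxPooling(2x2) -> Dropout(0.2)
for filters in (32, 32, 64, 128, 256):
    cnn.add(layers.Conv2D(filters, (3, 3), activation='relu', padding='same'))
    cnn.add(layers.MaxPooling2D((2, 2)))
    cnn.add(layers.Dropout(0.2))
cnn.add(layers.Flatten())
cnn.add(layers.Dense(256, activation='relu'))    # fully connected layer
cnn.add(layers.Dense(1, activation='sigmoid'))   # output layer for the two classes
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
cnn.summary()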

The model is trained for 200 epochs with the training and the validation datasets. The history of
execution is stored and plotted to understand the models generated.

[Figure: Architecture of the CNN model — a Sequential() model with five
Convolve(3x3, "relu", padding=same) layers of 32, 32, 64, 128, and 256 filters, each followed by
MaxPooling(2x2) and Dropout(0.2), then Flatten(), a fully connected layer, and the output layer;
optimizer = "adam".]

Dataset

The dataset is taken from a GitHub repository. It contains MRI images of brain tumours; the figure
shows sample normal and brain tumour images. Out of 1672 training images, 877 are tumour
images and 795 are non-tumour images. Of the 186 validation images, 92 are tumour and 94 are
non-tumour images. Among the 207 testing images, 116 are tumour images and 91 are non-tumour
images.

Experimental Result Analysis

[Figure: Comparing training/validation accuracy and loss of the ANN model]

[Figure: Comparing training/validation accuracy and loss of the CNN model]

Conclusion:

CNN is considered one of the best techniques for analyzing image datasets. The CNN makes its
predictions by reducing the size of the image without losing the information needed for making
predictions. The ANN model generated here produces 65.21% testing accuracy, and this can be
increased by providing more image data; the same can be done by applying image augmentation
techniques and then analyzing the performance of the ANN and CNN again. The model developed
here was generated by trial and error. In future, optimization techniques can be applied to decide
the number of layers and filters to be used in a model. As of now, for the given dataset, the CNN
proves to be the better technique for predicting the presence of a brain tumour.

