ML Full For Print New 1

The document outlines a series of exercises focused on implementing various machine learning algorithms, including Linear Regression, binary classification with k-NN, and K-Means clustering, using real datasets. Each exercise includes an aim, algorithm steps, and program code, demonstrating how to preprocess data, train models, evaluate performance, and visualize results. The final exercise emphasizes analyzing model performance to detect overfitting.

EX.NO:1 Implement a Linear Regression with a Real Dataset
        (https://www.kaggle.com/harrywang/housing). Experiment with different
        features in building a model. Tune the model's hyperparameters.
DATE:

Aim:
To implement Linear Regression with a real dataset, experimenting with different features in building
a model and tuning the model's hyperparameters.

Algorithm:

1. Import necessary libraries.
2. Load the housing dataset from a URL.
3. Check and handle missing values.
4. Convert categorical data to numeric using one-hot encoding.
5. Define features (X) and target (y).
6. Split the data into training and testing sets.
7. Train a Linear Regression model.
8. Predict house prices on the test set.
9. Evaluate the model using MAE, MSE, RMSE, and R² score (defined below).
10. Visualize actual vs predicted values and residuals.
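For reference, the evaluation metrics in step 9 use their standard definitions (n test samples, actual
values y_i, predictions ŷ_i, mean of actual values ȳ):

MAE  = (1/n) Σ |y_i − ŷ_i|
MSE  = (1/n) Σ (y_i − ŷ_i)²
RMSE = √MSE
R²   = 1 − Σ (y_i − ŷ_i)² / Σ (y_i − ȳ)²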

Program:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Load the California housing dataset
dataset_url = "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.csv"
df = pd.read_csv(dataset_url)

# Display first few rows of the dataset
print(df.head())

# Check for missing values
print("Missing values per column:\n", df.isnull().sum())

# Convert categorical column 'ocean_proximity' to numerical using one-hot encoding
df = pd.get_dummies(df, columns=['ocean_proximity'], drop_first=True)

# Fill missing values with the median of each column
df.fillna(df.median(), inplace=True)

# Define independent (X) and dependent (y) variables
X = df.drop(columns=['median_house_value'])  # Features
y = df['median_house_value']                 # Target variable

# Split dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Calculate performance metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print("\nModel Performance Metrics:")
print(f"Mean Absolute Error (MAE): {mae}")
print(f"Mean Squared Error (MSE): {mse}")
print(f"Root Mean Squared Error (RMSE): {rmse}")
print(f"R² Score: {r2}")

# Plot Actual vs Predicted Prices
plt.figure(figsize=(8, 6))
sns.scatterplot(x=y_test, y=y_pred, alpha=0.5)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted House Prices")
plt.show()

# Plot Residual Errors
residuals = y_test - y_pred
plt.figure(figsize=(8, 6))
sns.histplot(residuals, bins=50, kde=True)
plt.xlabel("Residual Error")
plt.title("Distribution of Residuals")
plt.show()
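The exercise also asks for hyperparameter tuning. Ordinary LinearRegression exposes almost no
hyperparameters, so one way to demonstrate tuning is to swap in its regularized variant, Ridge, and
search over its alpha parameter with cross-validation. The following is an illustrative sketch only,
reusing the X_train/X_test split above; the alpha grid is an arbitrary choice:

from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Search a small, arbitrary grid of regularization strengths with 5-fold CV
param_grid = {'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}
grid = GridSearchCV(Ridge(), param_grid, cv=5, scoring='r2')
grid.fit(X_train, y_train)

print("Best alpha:", grid.best_params_['alpha'])
print("Best cross-validated R²:", grid.best_score_)
print("Test-set R² of tuned model:", grid.best_estimator_.score(X_test, y_test))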

Sample input output:

Result:
Thus, Linear Regression with a real dataset, experimenting with different features in building a model,
was implemented and executed successfully.

Ex.no:2 Implement a binary classification model, i.e., one that answers a binary
        question such as "Are houses in this neighbourhood above a certain
        price?" (use data from exercise 1). Modify the classification threshold and
        determine how that modification influences the model. Experiment with
        different classification metrics to determine your model's effectiveness.
Date:

Aim:
To implement a binary classification model, modify the classification threshold to determine how that
modification influences the model, and experiment with different classification metrics to determine the
model's effectiveness.

Algorithm:

 Import required libraries.
 Load the diabetes dataset from a URL.
 Split the data into features (X) and target (y).
 Split the dataset into training and testing sets.
 Train k-NN models with different values of k (1 to 8).
 Plot training and testing accuracy vs k to find the best k.
 Train a k-NN model using the best k (e.g., 7).
 Predict and evaluate using:
o Confusion matrix
o Classification report
o ROC curve and AUC score
 Vary the classification threshold on the predicted probabilities and observe its effect on the
metrics (see the sketch after the program).
 Perform hyperparameter tuning using GridSearchCV to find the best k.
 Display the best parameters and the corresponding score.

Program:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, roc_auc_score

# Load dataset from an alternative online source
url = "https://raw.githubusercontent.com/plotly/datasets/master/diabetes.csv"
df = pd.read_csv(url)
print(df.head())

# Define features and target variable
X = df.drop('Outcome', axis=1).values
y = df['Outcome'].values

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42, stratify=y)

# Initialize variables
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

# Loop through different k values to determine the best k
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)

# Plot accuracy vs number of neighbors
plt.figure()
plt.title('k-NN Varying Number of Neighbors')
plt.plot(neighbors, test_accuracy, label='Testing Accuracy')
plt.plot(neighbors, train_accuracy, label='Training Accuracy')
plt.legend()
plt.xlabel('Number of Neighbors')
plt.ylabel('Accuracy')
plt.show()

# Train the model with optimal k
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# Evaluate the model
y_pred = knn.predict(X_test)
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Compute ROC curve
y_pred_proba = knn.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

plt.figure()
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='k-NN (n_neighbors=7)')
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.title('ROC Curve')
plt.legend()
plt.show()

# Compute AUC score
auc_score = roc_auc_score(y_test, y_pred_proba)
print(f"ROC AUC Score: {auc_score:.4f}")

# Perform hyperparameter tuning
param_grid = {'n_neighbors': np.arange(1, 50)}
knn_cv = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
knn_cv.fit(X, y)

# Print best parameters and score
print(f"Best Score: {knn_cv.best_score_:.4f}")
print(f"Best Parameters: {knn_cv.best_params_}")

Sample input output:

Result:
Thus, a binary classification model was implemented; the classification threshold was modified to
determine how that modification influences the model, and different classification metrics were used
to determine the model's effectiveness. The program was executed successfully.

EX.NO:3 CLASSIFICATION WITH NEAREST NEIGHBORS. IN THIS QUESTION
        YOU WILL USE SCIKIT-LEARN'S KNN CLASSIFIER TO CLASSIFY
        REAL VS FAKE NEWS HEADLINES. THE AIM OF THIS QUESTION IS
        FOR YOU TO READ THE SCIKIT-LEARN API AND GET
        COMFORTABLE WITH TRAINING/VALIDATION SPLITS. USE THE
        CALIFORNIA HOUSING DATASET.
DATE:

AIM:

To implement a program for classification using Scikit-learn's KNN classifier (demonstrated here on
the scikit-learn wine dataset) and evaluate the model's performance with training/validation splits and
metrics.

ALGORITHM:

1. Start the program.
2. Import the necessary libraries and dataset.
3. Preprocess the dataset if needed.
4. Split the dataset into training and testing sets.
5. Train the K-Nearest Neighbors (KNN) model with different values of K.
6. Plot the accuracy scores for different values of K.
7. Choose the best K and evaluate the model.
8. Print accuracy, confusion matrix, and classification report.
9. Plot the confusion matrix.
10. End the program.

PROGRAM:

# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Load Dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Try different K values
neighbors = np.arange(1, 10)
train_accuracy = []
test_accuracy = []

for k in neighbors:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy.append(knn.score(X_train, y_train))
    test_accuracy.append(knn.score(X_test, y_test))

# Plot K vs Accuracy
plt.figure(figsize=(8, 5))
plt.plot(neighbors, train_accuracy, label="Train Accuracy", marker='o')
plt.plot(neighbors, test_accuracy, label="Test Accuracy", marker='s')
plt.xlabel("Number of Neighbors (K)")
plt.ylabel("Accuracy")
plt.title("KNN Accuracy for Different K Values")
plt.legend()
plt.grid(True)
plt.show()

# Final Model with Best K
best_k = 5
knn = KNeighborsClassifier(n_neighbors=best_k)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Evaluation
acc = accuracy_score(y_test, y_pred)
print(f"\nFinal Accuracy (K={best_k}): {acc:.4f}")

cm = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:")
print(cm)

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=wine.target_names))

# Plot Confusion Matrix
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=wine.target_names,
            yticklabels=wine.target_names)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
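Since k-NN is distance-based and the wine features have very different magnitudes, feature scaling
usually matters. A small illustrative sketch, reusing the split and best_k defined above, that wraps the
classifier in a pipeline with StandardScaler:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize features so no single feature dominates the distance metric
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
scaled_knn.fit(X_train, y_train)
print(f"Accuracy with scaling (K={best_k}): {scaled_knn.score(X_test, y_test):.4f}")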
SAMPLE INPUT OUTPUT:

RESULT:

Thus, the program for classification using Scikit-learn's KNN classifier, with evaluation of the model's
performance using training/validation splits and metrics, was executed successfully.

EX NO: 4 ANALYZE DELTAS BETWEEN TRAINING SET
         AND VALIDATION SET RESULTS TO DETERMINE WHETHER THE
         MODEL IS OVERFITTING
DATE:

AIM:

To analyze the difference in performance between training and validation sets to determine if the model is
overfitting, and to visualize the results to detect and address this issue.

ALGORITHM:

1. Start the program.
2. Import necessary libraries and generate or load a classification dataset.
3. Split the dataset into training and testing sets.
4. Train a Decision Tree classifier with different depths.
5. Evaluate model accuracy on training and testing sets for each depth.
6. Record and compare the results.
7. Plot accuracy vs model complexity (tree depth).
8. Identify the presence of overfitting.
9. End the program.

PROGRAM:

# Step 1: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Step 2: Generate synthetic classification dataset
X, y = make_classification(n_samples=10000, n_features=20,
                           n_informative=5, n_redundant=15, random_state=1)

# Step 3: Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Train and test models with different max_depth values
train_scores = []
test_scores = []
depth_range = range(1, 21)

for depth in depth_range:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    train_yhat = model.predict(X_train)
    test_yhat = model.predict(X_test)
    train_acc = accuracy_score(y_train, train_yhat)
    test_acc = accuracy_score(y_test, test_yhat)
    train_scores.append(train_acc)
    test_scores.append(test_acc)
    print(f"Depth={depth}, Train Acc={train_acc:.3f}, Test Acc={test_acc:.3f}")

# Step 5: Plot Training vs Testing Accuracy
plt.figure(figsize=(10, 6))
plt.plot(depth_range, train_scores, '-o', label='Training Accuracy')
plt.plot(depth_range, test_scores, '-o', label='Testing Accuracy')
plt.xlabel('Tree Depth')
plt.ylabel('Accuracy')
plt.title('Overfitting Detection: Training vs Testing Accuracy')
plt.legend()
plt.grid(True)
plt.show()
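To make the "deltas" of the exercise title explicit, the gap between training and testing accuracy can
be computed per depth. A short sketch reusing train_scores, test_scores, and depth_range from
above; the 0.05 cutoff is an arbitrary illustrative value, not a standard threshold:

# A widening train-test gap as depth grows is the overfitting signal
deltas = np.array(train_scores) - np.array(test_scores)
for depth, delta in zip(depth_range, deltas):
    flag = "  <-- large gap, likely overfitting" if delta > 0.05 else ""
    print(f"Depth={depth}, Train-Test Delta={delta:.3f}{flag}")

best_depth = list(depth_range)[int(np.argmax(test_scores))]
print(f"Depth with best test accuracy: {best_depth}")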

SAMPLE INPUT OUTPUT:

RESULT:

Thus, the analysis of the difference in performance between training and validation sets to determine
whether the model is overfitting, with visualization of the results to detect and address this issue, was
executed successfully.

EX NO: 5 Implement the K-Means algorithm using the given
dataset.
DATE:

Aim:

To implement the K-Means Clustering Algorithm on the given biological dataset and group the data points
(species) based on their codon usage frequencies.

Algorithm: K-Means Clustering

1. Start
2. Import the required libraries (pandas, sklearn, matplotlib, etc.).
3. Load the dataset containing codon usage frequencies and other features.
4. Select the relevant numerical features (e.g., UUU, UUC, UUA, UUG).
5. Normalize the features using StandardScaler for better clustering results.
6. Choose the number of clusters (e.g., k = 3).
7. Apply the K-Means clustering algorithm:
■ Initialize centroids randomly.
■ Assign each point to the nearest centroid.
■ Update centroids as the mean of assigned points.
■ Repeat steps until centroids do not change or max iterations are reached.
8. Assign the cluster labels to the dataset.
9. Display the final clustered data with the cluster number.
10. Optionally, visualize clusters using scatter plots.

PROGRAM:

# Step 1: Import required libraries
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns

# Step 2: Create sample data
data = {
    'Kingdom': ['vrl']*5 + ['pri']*2,
    'DNAType': [0]*7,
    'SpeciesID': [100217, 100220, 100755, 100880, 100887, 9601, 9606],
    'Ncodons': [1995, 1474, 4862, 1915, 22831, 1097, 40662582],
    'SpeciesName': [
        'Epizootic haemotopoietic necrosis virus',
        'Bohle iridovirus',
        'Sweet potato leaf curl virus',
        'Northern cereal mosaic virus',
        'Soil-borne cereal mosaic virus',
        'Pongo pygmaeus abelii',
        'Homo sapiens'
    ],
    'UUU': [0.01654, 0.02714, 0.01974, 0.01775, 0.02816, 0.02552, 0.01757],
    'UUC': [0.01203, 0.01357, 0.0218, 0.02245, 0.01371, 0.03555, 0.02028],
    'UUA': [0.0005, 0.00068, 0.01357, 0.01619, 0.00767, 0.00547, 0.00767],
    'UUG': [0.00351, 0.00678, 0.01543, 0.00992, 0.03679, 0.01367, 0.01293]
}
df = pd.DataFrame(data)

# Step 3: Extract codon usage features
features = df[['UUU', 'UUC', 'UUA', 'UUG']]

# Step 4: Normalize codon frequencies
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

# Step 5: Apply KMeans clustering
kmeans = KMeans(n_clusters=3, random_state=0)
df['Cluster'] = kmeans.fit_predict(scaled_features)

# Step 6: Display final output
pd.set_option('display.max_columns', None)
print("Final Output:\n")
print(df[['Kingdom', 'DNAType', 'SpeciesID', 'Ncodons', 'SpeciesName', 'UUU', 'UUC', 'UUA', 'UUG',
          'Cluster']])

# Step 7: Optional visualization
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='UUU', y='UUC', hue='Cluster', palette='Set1', s=100)
plt.title("Codon Usage Clustering using K-Means")
plt.xlabel("UUU Frequency")
plt.ylabel("UUC Frequency")
plt.grid(True)
plt.show()
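Step 6 of the algorithm fixes k = 3; a common heuristic for choosing k is the elbow method. A brief
illustrative sketch reusing scaled_features from above (with only 7 samples in this toy dataset, k is
kept small):

# Plot within-cluster sum of squares (inertia) for several k values;
# the "elbow" in the curve suggests a reasonable cluster count.
inertias = []
k_values = range(1, 7)
for k in k_values:
    km = KMeans(n_clusters=k, random_state=0, n_init=10)
    km.fit(scaled_features)
    inertias.append(km.inertia_)

plt.plot(list(k_values), inertias, marker='o')
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia (within-cluster SSE)")
plt.title("Elbow Method for Choosing k")
plt.show()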

SAMPLE INPUT OUTPUT:

RESULT:

Thus, the K-Means clustering algorithm was implemented on the given biological dataset, grouping
the data points (species) based on their codon usage frequencies, and was completed successfully.

EX.NO:6 IMPLEMENT THE NAÏVE BAYES CLASSIFIER USING THE GIVEN
DATASET
DATE:

Aim:

To implement the Naïve Bayes classifier using the given dataset for predicting the results.

Algorithm:

STEP 1: Start the algorithm and implement the program.

STEP 2: Open the Jupyter notebook and activate the environment.

STEP 3: Create a new environment and rename it.

STEP 4: Install the necessary packages and libraries in the created environment in the Jupyter
notebook.

STEP 5: Perform the data pre-processing step.

STEP 6: Fit Naive Bayes to the training set.

STEP 7: Predict the test result.

STEP 8: Test the accuracy of the result (creation of the confusion matrix).

STEP 9: Visualize the test set result.

STEP 10: Finish.

Program:

# Step 1: Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Step 2: Create a sample dataset similar to what's used in the experiment
data = {
    'User ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [19, 35, 26, 27, 45, 46, 48, 50, 29, 31],
    'EstimatedSalary': [19000, 20000, 43000, 57000, 26000, 28000, 30000, 87000, 80000, 150000],
    'Purchased': [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]
}
dataset = pd.DataFrame(data)

# Step 3: Select features (Age, EstimatedSalary) and target (Purchased)
X = dataset.iloc[:, [2, 3]].values  # Age and EstimatedSalary
y = dataset.iloc[:, 4].values       # Purchased

# Step 4: Split the dataset into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 5: Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Step 6: Train the Naive Bayes model
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Step 7: Predict the test set results
y_pred = classifier.predict(X_test)

# Step 8: Evaluate the model
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)
print("Confusion Matrix:\n", cm)
print("Accuracy Score:", ac)

# Step 9: Visualization function
def plot_decision_boundary(X_set, y_set, title):
    X1, X2 = np.meshgrid(
        np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
        np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
    plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
                 alpha=0.75, cmap=ListedColormap(('purple', 'green')))
    plt.xlim(X1.min(), X1.max())
    plt.ylim(X2.min(), X2.max())
    for i, j in enumerate(np.unique(y_set)):
        plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                    c=ListedColormap(('purple', 'green'))(i), label=j)
    plt.title(title)
    plt.xlabel('Age')
    plt.ylabel('Estimated Salary')
    plt.legend()
    plt.show()

# Step 10: Plot results
plot_decision_boundary(X_train, y_train, 'Naive Bayes (Training set)')
plot_decision_boundary(X_test, y_test, 'Naive Bayes (Test set)')
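As a quick usage check, the trained classifier can score a new, hypothetical customer. The age and
salary below are made-up values for illustration only, and the input must be scaled with the same
StandardScaler fitted above:

# Hypothetical new customer: Age 40, EstimatedSalary 60000 (illustrative values)
new_customer = sc.transform([[40, 60000]])
print("Predicted class:", classifier.predict(new_customer)[0])
print("Class probabilities [P(0), P(1)]:", classifier.predict_proba(new_customer)[0])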

SAMPLE INPUT OUTPUT:

RESULT:

Thus, the Naïve Bayes classifier was implemented using the given dataset and completed
successfully.

EX.NO: 7 PROJECT
         BRAIN TUMOUR DETECTION USING A MACHINE LEARNING
         ALGORITHM
DATE:

Abstract

The development of aberrant brain cells, some of which may turn cancerous, is known as a brain
tumour. Magnetic Resonance Imaging (MRI) scans are the most common technique for finding brain
tumours, as information about aberrant tissue growth in the brain is discernible from them. In
numerous research papers, machine learning and deep learning algorithms are used to detect brain
tumours. When these algorithms are applied to MRI images, predicting a brain tumour takes very
little time, and the higher accuracy makes it easier to treat patients. These predictions also allow the
radiologist to make speedy decisions. In the proposed work, a self-defined Artificial Neural Network
(ANN) and a Convolutional Neural Network (CNN) are used to detect the presence of a brain
tumour, and their performance is analyzed.
Keywords: Convolutional Neural Network, Machine Learning, Brain tumour, Algorithms

Introduction

The brain is one of the most crucial parts of the human body, since it regulates the operation of all
other organs and aids in decision-making. It is essentially the central nervous system's command
post and is in charge of carrying out the body's regular voluntary and involuntary functions. A
tumour is an uncontrolled proliferation of undesirable tissue that forms a fibrous mesh inside the
brain. A brain tumour is identified in roughly 3,540 children under the age of 15 each year. To
effectively prevent and treat the condition, it is crucial to have a thorough grasp of brain tumours
and their stages. ANN and CNN are used in the classification of normal and tumour brains.

An ANN (Artificial Neural Network) works like the human brain's nervous system: on this basis, a
digital computer is built with a large number of interconnections and networks, which lets the
neural network be trained using simple processing units applied to the training set, storing
experiential knowledge. It has different layers of neurons connected together. The neural network
acquires knowledge through a learning process applied to the data set. There is one input and one
output layer, whereas there may be any number of hidden layers. In the learning process, a weight
and bias are added to the neurons of each layer, depending on the input features and on the
previous layers (for the hidden layers and output layer). A model is trained based on the activation
function applied to the input features and the hidden layers, where most of the learning happens, to
achieve the expected output.
Existing Methodology

Brain tumours are detected using image-processing techniques. Various algorithms are used to
arrive at the best results; for example, probabilistic neural networks have been used for better
productivity together with SVM and KNN techniques. Segmentation plays a major role in detecting
brain tumours.

Pre-Processing

Pre-processing techniques aim at enhancing the image without changing its information content.
The primary driver of image flaws is low image quality.

Segmentation

Region growing is a basic region-based image segmentation method. It is also classified as a
pixel-based image segmentation method, since it involves the selection of initial seed points. This
approach to segmentation examines the neighbouring pixels of the initial seed points and
determines whether those neighbours should be added to the region. The process is iterated, in the
same manner as general data-clustering algorithms. A general sketch of the region-growing
algorithm is given below.
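A minimal, illustrative sketch of region growing (not the paper's implementation; it assumes a
grayscale image array, a single seed pixel, and a fixed intensity tolerance):

import numpy as np
from collections import deque

def region_grow(img, seed, tol=10):
    # Grow a region from the seed pixel, adding 4-connected neighbours
    # whose intensity is within tol of the seed intensity.
    h, w = img.shape
    seed_val = float(img[seed])
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] \
               and abs(float(img[nr, nc]) - seed_val) <= tol:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask

# Toy usage: a synthetic 6x6 image with one bright square region
img = np.zeros((6, 6), dtype=np.uint8)
img[2:5, 2:5] = 200
print(region_grow(img, seed=(3, 3), tol=10).astype(int))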

Convolutional Neural Network

Convolutional Neural Networks (CNNs) are easier to train and less liable to overfitting. As
mentioned earlier in the report, we use a patch-based segmentation approach. The convolutional
architecture and implementation are carried out using CAFFE. CNNs are a continuation of the
multi-layer perceptron. In an MLP, a unit performs a simple computation by taking the weighted
sum of all the units that serve as its input. The network is organized into layers, with each unit
connected to units in the previous layer. The essence of CNNs is the convolutions.

The main trick by which Convolutional Neural Networks avoid the problem of too many
parameters is shared, sparse connections: each unit is not connected to every unit in the
previous layer.

Proposed Methodology

The two techniques, ANN and CNN, are applied to the brain tumour dataset, and their
performance in classifying the images is analyzed. The steps followed in applying ANN to the brain
tumour dataset are:

1. Import the needed packages.
2. Import the data folder.
3. Read the images, provide the labels for each image (set an image having a brain tumour as 1
   and an image not having a brain tumour as 0), and store them in the DataFrame.
4. Resize the images to 256x256 by reading them one by one.
5. Normalize the images.
6. Split the data set into train, validation, and test sets.
7. Create the model.
8. Compile the model.
9. Apply the model to the train set.
10. Evaluate the model by applying it to the test set.

The ANN model used here has seven layers. The first layer is a flatten layer, which converts the
256x256x3 images into a one-dimensional array. The next five layers are dense layers with the ReLU
activation function and 128, 256, 512, 256, and 128 neurons respectively; these five layers act as the
hidden layers. The last dense layer, with the sigmoid activation function and 1 neuron representing
the two classes, is the output layer.

The model is compiled with the Adam optimization technique and the binary cross-entropy loss
function. The model is generated and trained by providing the training images and the validation
images; once the model is trained, it is tested using the test image set. A sketch of this model is
shown below.
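A minimal Keras sketch of the ANN described above (an illustrative reconstruction from the text, not
the authors' exact code; it assumes TensorFlow/Keras, and the layer sizes, activations, and compile
settings follow the description):

from tensorflow import keras
from tensorflow.keras import layers

ann = keras.Sequential([
    keras.Input(shape=(256, 256, 3)),
    layers.Flatten(),                       # 256x256x3 image -> one-dimensional array
    layers.Dense(128, activation='relu'),   # five hidden dense layers
    layers.Dense(256, activation='relu'),
    layers.Dense(512, activation='relu'),
    layers.Dense(256, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(1, activation='sigmoid')   # output layer: 1 neuron for the two classes
])
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
ann.summary()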
Next, the same dataset is given to the CNN technique. The steps followed in applying CNN to the
brain tumour dataset are:

1. Import the needed packages.
2. Import the data folder (Yes and No).
3. Set the class labels for the images (1 for brain tumour and 0 for no brain tumour).
4. Convert the images into shape (256x256).
5. Normalize the images.
6. Split the images into train, validation, and test set images.
7. Create the sequential model.
8. Compile the model.
9. Apply it to the train dataset (use the validation set to evaluate the training performance).
10. Evaluate the model using the test images.
11. Plot the graph comparing the training and validation accuracy.
12. Draw the confusion matrix for the actual output against the predicted output.

The CNN sequential model is generated by stacking different layers. The input image is reshaped
to 256x256. The convolution layers are applied to the input image with ReLU as the activation
function and padding set to "same", which means the output image has the same spatial size as the
input; the numbers of filters are 32, 32, 64, 128, and 256 for the successive convolution layers. Max
pooling is applied with a 2x2 window size, and the dropout function is called with a 20% dropout
rate. A flatten layer is applied to convert the features into a one-dimensional array. The fully
connected layer is created by calling the dense method with 256 units and ReLU as the activation
function. The output layer has 1 unit to represent the two classes, with sigmoid as the activation
function. The architecture of the CNN model is summarized in the figure below. The implementation
is done in the Python language and executed in Google Colab.
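A minimal Keras sketch of this CNN (again an illustrative reconstruction from the description, not the
authors' exact code; TensorFlow/Keras is an assumption):

from tensorflow import keras
from tensorflow.keras import layers

cnn = keras.Sequential()
cnn.add(keras.Input(shape=(256, 256, 3)))
# Five convolution blocks: Conv(3x3, relu, padding='same') -> MaxPooling(2x2) -> Dropout(0.2)
for filters in (32, 32, 64, 128, 256):
    cnn.add(layers.Conv2D(filters, (3, 3), activation='relu', padding='same'))
    cnn.add(layers.MaxPooling2D((2, 2)))
    cnn.add(layers.Dropout(0.2))
cnn.add(layers.Flatten())
cnn.add(layers.Dense(256, activation='relu'))    # fully connected layer
cnn.add(layers.Dense(1, activation='sigmoid'))   # output layer for the two classes
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
cnn.summary()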

The model is trained for 200 epochs with the training and the validation datasets. The history of
execution is stored and plotted to understand the models generated.

[Figure: Architecture of the CNN model — a Sequential() model with five
Convolve(3x3, "relu", padding=same) layers of 32, 32, 64, 128, and 256 filters, each followed by
MaxPooling(2x2) and Dropout(0.2), then Flatten(), a fully connected layer, and the output layer;
optimizer = "adam".]

Dataset

The dataset is taken from a GitHub repository. It contains MRI images of brain tumours; the figure
shows sample normal and brain tumour images. Out of 1672 training images, 877 are tumour
images and 795 are non-tumour images. Of the 186 validation images, 92 are tumour and 94 are
non-tumour images. Among the 207 testing images, 116 are tumour images and 91 are non-tumour
images.

Experimental Result Analysis

[Figure: Comparing training/validation accuracy and loss of the ANN model]

[Figure: Comparing training/validation accuracy and loss of the CNN model]

Conclusion:

CNN is considered one of the best techniques for analyzing image datasets. The CNN makes its
predictions by reducing the size of the image without losing the information needed for making
predictions. The ANN model generated here produces 65.21% testing accuracy, and this can be
increased by providing more image data; the same can be done by applying image augmentation
techniques and then analyzing the performance of the ANN and CNN again. The model developed
here was generated by trial and error. In future, optimization techniques can be applied to decide
the number of layers and filters to be used in a model. As of now, for the given dataset, the CNN
proves to be the better technique for predicting the presence of a brain tumour.

