Minor Project
Spinal Abnormalities using ML
Introduction:
Problem Statement
Lower back pain is one of the most common
afflictions, and our project attempts to find a
solution to reduce it. The reasons why this is a
major issue to be tackled:
➔ Occurs for a large variety of reasons in a large
range of age groups.
➔ Ailment is painful and long-lasting.
➔ Difficult to cure quickly and efficiently, if not
impossible.
➔ Preventative measures are superior to
curative measures.
Introduction:
Our Solution
As we have established,
prevention is better than
cure. Hence, we will
perform accurate
mathematical predictions.
We will utilize a variety of Machine
Learning Algorithms in order to predict
abnormal spine problems. This feature
will be made available in an
application.
● A histogram of each feature is plotted using the given dataset. It shows the range
of the values of each feature. It is plotted with the help of the hist() function in
matplotlib.pyplot, which falls under the matplotlib library.
● Distribution and box plots are also plotted using the seaborn library and matplotlib
library respectively.
● All the features are plotted to find the bivariate relationships between the
combinations of variables using a scatter plot matrix.
● Finally, a heatmap is plotted to show the correlation among the features of the
dataset.
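The plotting steps above can be sketched as follows. This is a minimal sketch: the column names and the synthetic random data stand in for the real spine dataset, which is not reproduced here.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Synthetic stand-in for the spine dataset; column names are assumptions
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)),
                  columns=["pelvic_incidence", "pelvic_tilt",
                           "lumbar_lordosis_angle", "sacral_slope"])

# Histogram of each feature (matplotlib's hist() via the pandas wrapper)
df.hist(bins=20)
plt.tight_layout()
plt.savefig("histograms.png")

# Correlation heatmap of the features (plain matplotlib; the deck uses seaborn)
fig, ax = plt.subplots()
im = ax.imshow(df.corr(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(df.columns)))
ax.set_xticklabels(df.columns, rotation=45, ha="right")
ax.set_yticks(range(len(df.columns)))
ax.set_yticklabels(df.columns)
fig.colorbar(im)
fig.tight_layout()
fig.savefig("heatmap.png")
```

With the real dataset, `df = pd.read_csv(...)` would replace the synthetic frame and the same calls apply unchanged.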
Data Visualization: Histogram
Data Visualization: Heatmap
Extraction of values and
splitting
We use the iloc[] function of the Pandas library to extract the
selected rows and columns from the dataset.
.iloc[] is primarily integer position based (from 0 to length-1 of the
axis), but may also be used with a boolean array.
The training set denotes the subset of a dataset that is used for
training the machine learning model.
A testing set is the subset of the dataset that is used for testing
the machine learning model.
Here, we split the dataset into a 66.67% section for training and
a 33.33% section for testing.
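The extraction and 2:1 split can be sketched like this. A synthetic frame with a hypothetical label column stands in for the real dataset, and the assumption is that the label occupies the last column.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the spine dataset; "class" as label is an assumption
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, 3)), columns=["f1", "f2", "f3"])
df["class"] = rng.integers(0, 2, size=300)

# .iloc[] selects by integer position: all rows, feature vs. label columns
X = df.iloc[:, :-1].values   # every column except the last (features)
y = df.iloc[:, -1].values    # the last column only (label)

# 66.67% train / 33.33% test, matching the split described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42)
print(len(X_train), len(X_test))  # → 200 100
```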
Model Selection and
Discussions
Decision Tree Classifier
➔ Decision Trees are algorithms where the data is continuously split according to a
certain parameter.
➔ The tree is explained by two entities called decision nodes and leaves. The leaves
are the decisions or the final outcomes, and the decision nodes are where the data is
split.
➔ It partitions the data in a recursive manner, called recursive partitioning. This
flowchart-like structure helps us in decision making.
➔ The time complexity of decision trees is a function of the number of records and
number of attributes in the given data. The decision tree is a distribution-free or non-
parametric method, which does not depend upon probability distribution
assumptions. Decision trees can handle high dimensional data with good accuracy.
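A minimal decision-tree sketch with scikit-learn; synthetic data from `make_classification` stands in for the spine dataset, and `max_depth=4` is an illustrative choice, not the project's tuned parameter.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the spine dataset
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42)

# Recursive partitioning: each decision node splits on one feature threshold
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
accuracy = tree.score(X_test, y_test)
```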
Random Forest Classifier
➔ There are two stages in the RF algorithm: one is random forest creation, and the
other is to make a prediction from the random forest classifier created in the first stage.
➔ We take the test features and use the rules of each randomly created decision tree to
predict the outcome, and store the predicted outcome (target).
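The two stages above can be sketched as follows; the synthetic data and `n_estimators=100` are assumptions standing in for the project's actual dataset and settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42)

# Stage 1: create the forest of randomly grown decision trees
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Stage 2: each tree votes on the test features; the majority vote is stored
predictions = forest.predict(X_test)
```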
K - nearest Neighbours (KNN)
Classifier
➔ K nearest neighbors is a simple algorithm that stores all available cases and
classifies new cases based on a similarity measure (e.g., distance functions).
➔ The data is classified by a majority vote of its neighbors, with the case being
assigned to the class most common amongst its K nearest neighbors measured by a
distance function.
➔ If k = 1, then the case is simply assigned to the class of its nearest neighbor.
➔ We also implement the weighted KNN algorithm in order to predict and classify the
data into normal and abnormal spines in this research.
➔ Choosing the number of nearest neighbors, i.e., determining the value of k,
plays a crucial role in the efficacy of the model. A high k-value has the
advantage of reducing the variance due to noisy data.
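Both the plain majority-vote KNN and the distance-weighted variant mentioned above are available in scikit-learn; in this sketch, `k=5` and the synthetic data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42)

# Plain KNN: each of the k nearest neighbors gets one vote
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Weighted KNN: closer neighbors get larger votes (inverse-distance weights)
wknn = KNeighborsClassifier(n_neighbors=5,
                            weights="distance").fit(X_train, y_train)

knn_acc, wknn_acc = knn.score(X_test, y_test), wknn.score(X_test, y_test)
```

With `k = 1`, both variants coincide: the case is simply assigned the class of its single nearest neighbor.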
Deep Neural Network Classifier
➔ TensorFlow is an open-source deep learning library with tools for building almost any
type of neural network (NN) architecture. It builds a feedforward multilayer neural
network that is trained with a set of labeled data in order to perform classification on
similar, unlabeled data.
➔ Recall that the DNNClassifier builds a feedforward multilayer neural network; hence,
when we call the function, we need to indicate how many hidden layers we want and
how many nodes there should be in each layer.
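The deck uses TensorFlow's DNNClassifier; as a lightweight stand-in, the same kind of feedforward multilayer network can be sketched with scikit-learn's MLPClassifier, where `hidden_layer_sizes` plays the role of specifying the number of hidden layers and the nodes in each. The two hidden layers of 10 and 5 nodes and the synthetic data are assumptions, not the project's actual architecture.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42)

# Feedforward network: two hidden layers of 10 and 5 nodes (illustrative sizes)
dnn = MLPClassifier(hidden_layer_sizes=(10, 5), max_iter=2000, random_state=0)
dnn.fit(X_train, y_train)
dnn_acc = dnn.score(X_test, y_test)
```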
Model Evaluation and Selection
● After exhaustive training on different classification algorithms, we
chose a small DNN for this task.
● We first compared the models based on accuracy on the test set.
If some models had been comparable, we would have evaluated
them further on F1 score and Area Under the Curve.
● Fortunately, the small DNN outperformed the other models by a
huge margin, so we directly chose it as the final model.
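The evaluation criteria above (accuracy first, then F1 and AUC as tie-breakers) can be computed with scikit-learn's metrics; here a decision tree on synthetic data stands in for any of the candidate models.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

pred = model.predict(X_test)                 # hard labels for accuracy/F1
proba = model.predict_proba(X_test)[:, 1]    # class-1 scores for ROC AUC

acc = accuracy_score(y_test, pred)
f1 = f1_score(y_test, pred)
auc = roc_auc_score(y_test, proba)
```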
Graphical User Interface &
Deployment
How to make the Application
The following future scope and uses are possible for this project:
➔ Can be integrated into an application using an API, thus increasing platform reach.
References
https://pandas.pydata.org/
https://matplotlib.org/3.1.1/tutorials/introductory/pyplot.html
https://www.tensorflow.org/
https://deepai.org/machine-learning-glossary-and-terms/
Thank You