Minor Project

Mini project

Uploaded by

Samiksha
Predicting Spinal Abnormalities using ML
Introduction:
Problem Statement
Lower back pain is one of the most common afflictions, and our project attempts to find a solution to reduce it. It is a major issue to tackle for several reasons:
➔ It occurs for a large variety of reasons across a wide range of age groups.
➔ The ailment is painful and long-lasting.
➔ It is difficult, if not impossible, to cure quickly and efficiently.
➔ Preventative measures are superior to curative measures.
Introduction:
Our Solution
As we have established, prevention is better than cure. Hence we aim to make accurate mathematical predictions.
We use a variety of machine learning algorithms to predict abnormal spine problems. This feature is made available in an application.

This way, anyone using the application can take preventative steps.
Workflow of Project
We followed a standard machine learning methodology:

1. Data Extraction - Data to be worked with is retrieved, obtained or collected.

2. Data Cleaning - Corrupt and inaccurate data is checked for and dealt with.

3. Feature Extraction - The data is altered into more manageable groups.

4. Classification - Determine the type of ML model that is needed.

5. Supervised/Unsupervised ML - Fit the model to the data.

6. Validation - Check if the obtained data matches


The Data Set
For the project we will be using a data set consisting of the biometric data of 310 patients: 210 abnormal cases and 100 normal cases.
Biometric Data Present:
Pelvic Incidence, Pelvic Tilt,
Lumbar Lordosis Angle,
Pelvic Radius, etc.
Identifying and handling the missing values
➔ In data preprocessing, it is essential to identify and correctly handle missing values. Failing to do so can lead to inaccurate conclusions and inferences from the data.

➔ To make sure that there are no missing values, we use the isnull() method of the pandas DataFrame class, which checks every entry for NaN and returns a table of boolean values (True where a value is missing, False otherwise). We then use the sum() method to count the NaN (null) values in each column.
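The check described above can be sketched as follows. This is a minimal illustration, not the project's code: the tiny table and its column names are made up for the example, with two values deliberately left missing.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature table; column names and values are illustrative only.
df = pd.DataFrame({
    "pelvic_incidence": [63.0, 39.1, np.nan, 69.3],
    "pelvic_tilt": [22.6, 10.1, 22.2, np.nan],
    "pelvic_radius": [98.7, 114.4, 106.0, 101.9],
})

# isnull() returns a DataFrame of booleans with the same shape as df.
null_mask = df.isnull()

# sum() counts the True entries in each column.
missing_per_column = null_mask.sum()
print(missing_per_column)

# Summing again gives the total number of missing cells in the table.
total_missing = int(missing_per_column.sum())
print("total missing:", total_missing)
```

If every count is zero, the table has no missing values and no imputation or row removal is needed.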
Visualization of Data

● A histogram of each feature is plotted from the given dataset. It shows the range of the values of each feature. It is plotted with the hist() function in matplotlib.pyplot, which is part of the matplotlib library.

● Distribution plots and box plots are also plotted, using the seaborn and matplotlib libraries respectively.

● All the features are plotted in a scatter plot matrix to find the bivariate relationships between the combinations of variables.

● Finally, a heatmap is plotted to show the correlation among the features of the dataset.
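A rough sketch of these plots, under stated assumptions: the data here is random and the column names are hypothetical, and the heatmap is drawn with matplotlib's imshow() rather than seaborn to keep the example to one plotting library. The non-interactive "Agg" backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data; the real project uses the patient measurements.
rng = np.random.default_rng(0)
columns = ["pelvic_incidence", "pelvic_tilt", "pelvic_radius"]  # hypothetical names
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=columns)

# One histogram per feature.
hist_fig, axes = plt.subplots(1, len(columns), figsize=(9, 3))
for ax, column in zip(axes, columns):
    ax.hist(df[column], bins=20)
    ax.set_title(column)

# Scatter plot matrix of every pair of features.
pd.plotting.scatter_matrix(df, figsize=(6, 6))

# Correlation heatmap: pairwise Pearson correlations shown as colours.
corr = df.corr()
heat_fig, heat_ax = plt.subplots()
image = heat_ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
heat_ax.set_xticks(range(len(columns)))
heat_ax.set_xticklabels(columns, rotation=45)
heat_ax.set_yticks(range(len(columns)))
heat_ax.set_yticklabels(columns)
heat_fig.colorbar(image)

plt.close("all")  # release the figures once they are no longer needed
```

To keep a figure, call its savefig() method with a file path before closing it.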
Data Visualization: Histogram
Data Visualization: Heatmap
Extraction of values and splitting
We use the iloc[] function of the Pandas library to extract the selected rows and columns from the dataset.
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
Training set denotes the subset of a dataset that is used for training the machine learning model.
A testing set is the subset of the dataset that is used for testing the machine learning model.
Here, we split the dataset into a 66.67% section for training and a 33.33% section for test.
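The extraction and split can be sketched like this. The table below is random stand-in data with the same class counts as the data set (210 abnormal, 100 normal); the column names, the use of scikit-learn's train_test_split, the fixed random seed and the stratified split are all assumptions of this example rather than details given in the slides.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Random stand-in table: four feature columns followed by a label column.
rng = np.random.default_rng(0)
features = ["pelvic_incidence", "pelvic_tilt",
            "lumbar_lordosis_angle", "pelvic_radius"]  # hypothetical names
df = pd.DataFrame(rng.normal(size=(310, 4)), columns=features)
df["label"] = np.r_[np.ones(210, dtype=int), np.zeros(100, dtype=int)]

# iloc selects by integer position: all rows, every column except the last.
X = df.iloc[:, :-1].to_numpy()
# All rows, last column only.
y = df.iloc[:, -1].to_numpy()

# Hold out one third of the rows for testing; stratify keeps the
# abnormal/normal ratio similar in both subsets (an assumption here).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0, stratify=y
)
print(X_train.shape, X_test.shape)
```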
Model Selection and Discussions
Decision Tree Classifier

➔ Decision trees are algorithms in which the data is repeatedly split according to a certain parameter.

➔ A tree consists of two kinds of entities: decision nodes and leaves. The decision nodes are where the data is split, and the leaves are the decisions or final outcomes.

➔ The tree is built by splitting the data recursively, a process called recursive partitioning. The resulting flowchart-like structure helps us in decision making.

➔ The time complexity of decision trees is a function of the number of records and the number of attributes in the given data. The decision tree is a distribution-free, or non-parametric, method: it does not depend on probability distribution assumptions. Decision trees can handle high-dimensional data with good accuracy.
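As an illustration of decision nodes and leaves, the sketch below fits a shallow tree with scikit-learn and prints its rules. The synthetic data from make_classification, the depth limit of 3 and the use of scikit-learn itself are assumptions of this example; the slides do not state which implementation or settings the project used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic two-class data standing in for the patient measurements.
X, y = make_classification(n_samples=310, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0
)

# A shallow tree keeps the printed rules short enough to read.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Each "feature_i <= threshold" line is a decision node;
# each "class: ..." line is a leaf.
print(export_text(tree))

tree_accuracy = tree.score(X_test, y_test)
print(f"test accuracy: {tree_accuracy:.3f}")
```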
Random Forest Classifier

➔ Random forests, or random decision forests, operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes predicted by the individual trees (classification) or their mean prediction (regression).

➔ There are two stages in the RF algorithm: the first is random forest creation, and the second is making a prediction with the random forest classifier created in the first stage.

➔ To predict, we take the test features, use the rules of each randomly created decision tree to predict the outcome, and store each predicted outcome (target).
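The two stages can be sketched by hand with bootstrapped decision trees and a majority vote. This is a simplified illustration on synthetic data, not the project's code; the number of trees and the feature subsampling setting are assumptions. Library implementations differ in detail: scikit-learn's RandomForestClassifier, for instance, averages the trees' predicted class probabilities rather than counting hard votes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=310, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0
)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a two-class vote cannot tie

# Stage 1: forest creation. Each tree sees a bootstrap sample of the rows
# and considers a random subset of features at every split.
forest = []
for seed in range(n_trees):
    rows = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=seed)
    forest.append(tree.fit(X_train[rows], y_train[rows]))

# Stage 2: prediction. Store every tree's predicted outcome, then take
# the mode (the majority class, since labels are 0 or 1).
votes = np.stack([tree.predict(X_test) for tree in forest])
forest_pred = (votes.sum(axis=0) > n_trees / 2).astype(int)

forest_accuracy = float((forest_pred == y_test).mean())
print(f"test accuracy: {forest_accuracy:.3f}")
```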
K-Nearest Neighbours (KNN) Classifier
➔ K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions).

➔ A case is classified by a majority vote of its neighbors: it is assigned to the class most common among its K nearest neighbors, as measured by a distance function.

➔ If k = 1, then the case is simply assigned to the class of its nearest neighbor.

➔ We also implement the weighted KNN algorithm in order to classify the data into normal and abnormal spines.

➔ Choosing the number of nearest neighbors, that is, the value of k, plays a crucial role in determining the efficacy of the model. A high value of k has the advantage of reducing the variance caused by noisy data.
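The effect of k can be explored with a short sketch like the one below. It assumes scikit-learn, synthetic data, and distance weighting via weights="distance" as one common form of weighted KNN; the slides do not say which weighting scheme or which values of k the project used. Feature scaling is added here because KNN depends on distances, and is likewise an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=310, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0
)

knn_scores = {}
for k in (1, 5, 15):
    # weights="distance" lets closer neighbours count more in the vote.
    model = make_pipeline(
        StandardScaler(),
        KNeighborsClassifier(n_neighbors=k, weights="distance"),
    )
    model.fit(X_train, y_train)
    knn_scores[k] = model.score(X_test, y_test)

for k, score in knn_scores.items():
    print(f"k={k:2d}  test accuracy={score:.3f}")
```

Comparing the scores across several values of k on held-out data is a simple way to choose k.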
Deep Neural Network Classifier

➔ TensorFlow is an open-source deep learning library with tools for building almost any type of neural network (NN) architecture. Its DNNClassifier builds a feedforward multilayer neural network that is trained with a set of labeled data in order to perform classification on similar, unlabeled data.

➔ When we create a DNNClassifier, we need to specify the feature columns (input layer), the architecture of the neural network (hidden layers) and the number of classes (output layer).

➔ Because the DNNClassifier builds a feedforward multilayer neural network, when we create it we need to indicate how many hidden layers we want and how many nodes there should be in each layer.
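To show what "number of hidden layers and nodes per layer" means in practice, the sketch below uses scikit-learn's MLPClassifier as a stand-in. This is deliberately not TensorFlow's DNNClassifier: it is a different library with a different API, chosen only to keep the example small. The layer sizes (16 and 8), the synthetic data and the scaling step are assumptions of the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=310, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0
)

# Two hidden layers with 16 and 8 nodes. The input layer size (6 features)
# and the output layer size (2 classes) are inferred from the data.
mlp = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model = make_pipeline(StandardScaler(), mlp)
model.fit(X_train, y_train)

# One weight matrix per connection between consecutive layers.
for i, weights in enumerate(mlp.coefs_):
    print(f"layer {i} -> layer {i + 1}: weight matrix shape {weights.shape}")

mlp_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {mlp_accuracy:.3f}")
```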
Model Evaluation and Selection
● After exhaustive training of the different classification algorithms, we chose a small DNN for this task.
● We first compared the models based on accuracy on the test set. If some models had turned out to be comparable, we would have evaluated them further using the F1 score and the Area Under the Curve.
● Fortunately, the small DNN outperformed the other models by a huge margin, so we chose it directly as the final model.
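The comparison procedure can be sketched as follows. The models, their settings and the synthetic data are assumptions of this example, and the scores it prints say nothing about the project's actual results; it only shows how accuracy, F1 score and Area Under the Curve can be computed side by side with scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=310, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0
)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # AUC needs a score per sample: the predicted probability of class 1.
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "auc": roc_auc_score(y_test, proba),
    }

# Rank by accuracy first; F1 and AUC are there to separate close calls.
best = max(results, key=lambda name: results[name]["accuracy"])
for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
print("highest accuracy:", best)
```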
Graphical User Interface & Deployment
How to make the Application

● We made a simple Windows application through which a person can use our model to get predictions.
● For the front end we used the Python library Tkinter.
● It is a basic GUI library with which we can create a simple and clean interface.
● We also took care of some basic input checks; for example, you cannot submit the form with all of its fields empty.
● The deep learning model is used to make the predictions.
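A minimal sketch of such a form, assuming a layout of one text entry per measurement. The field names, the function names parse_form and build_window, and the error messages are all hypothetical, and the predict argument stands in for whatever function wraps the trained model. The input check is kept separate from the Tkinter code so that it can be tested without opening a window.

```python
# Hypothetical field names, one text entry per measurement.
FIELDS = ("pelvic_incidence", "pelvic_tilt",
          "lumbar_lordosis_angle", "pelvic_radius")


def parse_form(raw):
    """Turn the raw entry texts into floats.

    Returns (values, error); exactly one of the two is None.
    """
    if all(not text.strip() for text in raw.values()):
        return None, "Please fill in the form before submitting."
    values = []
    for name in FIELDS:
        text = raw.get(name, "").strip()
        try:
            values.append(float(text))
        except ValueError:
            return None, f"'{name}' must be a number."
    return values, None


def build_window(predict):
    """Build the form; predict takes a list of floats and returns a label."""
    import tkinter as tk  # imported here so parse_form works without a display
    from tkinter import messagebox

    root = tk.Tk()
    root.title("Spine check")
    entries = {}
    for row, name in enumerate(FIELDS):
        tk.Label(root, text=name).grid(row=row, column=0, sticky="w")
        entries[name] = tk.Entry(root)
        entries[name].grid(row=row, column=1)

    def on_submit():
        raw = {name: entry.get() for name, entry in entries.items()}
        values, error = parse_form(raw)
        if error:
            messagebox.showerror("Invalid input", error)
        else:
            messagebox.showinfo("Prediction", str(predict(values)))

    tk.Button(root, text="Predict", command=on_submit).grid(
        row=len(FIELDS), column=0, columnspan=2
    )
    return root
```

To open the window, call build_window(predict).mainloop() with a predict function of your own.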
GUI
Conclusion and Future Scope
We have acquired an optimized model that gives 96.45% accuracy as well as a simple GUI
that can allow a user to interact with our application.

The following future scope and uses are possible for this project:

➔ Usable by any individual who has undergone a preliminary health check-up.

➔ Can be integrated into another application through an API, thus increasing platform reach.

➔ Can be utilized directly in the medical and health-care industries for classification.

➔ Can be integrated into a larger-scope personalized health-care application (e.g., a mobile app).


References
https://numpy.org/

https://pandas.pydata.org/

https://matplotlib.org/3.1.1/tutorials/introductory/pyplot.html

https://www.tensorflow.org/

https://deepai.org/machine-learning-glossary-and-terms/

Thank You
