Presentation 1
Health Data
Supervised by: Shahidul Islam Khan
Submitted by: Md Sohel Mahmud Avon (C151009), Md Azizul Hakim (C151025)
Problem Definition
◦ In developed countries, mental health data are preserved and used in research.
By storing and analyzing these data, researchers extract hidden, useful
knowledge and improve medical services.
◦ We are lagging behind in this respect. Our doctors do not get enough help from
modern techniques such as data mining. By preserving mental health data
properly and applying data mining techniques to them for knowledge
discovery, we can help our doctors and improve our medical services.
Objective
◦ Collect mental health data, digitize them, and use them for knowledge discovery.
◦ Select the attributes required for the duration-of-stay model using a feature
selection technique.
◦ Develop a model to predict the duration of hospital stay of patients with a
mental disorder.
◦ Select the attributes that act as triggering factors for suicide attempts using a
feature selection technique.
◦ Predict suicide attempts using classifiers.
◦ Identify, among all columns/features, the attributes most influential for suicide attempts.
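The slides do not name the feature selection technique that was used. As one possible illustration of ranking attributes by their influence on a binary outcome, the sketch below scores categorical features by information gain. The choice of information gain, the function names, and the toy records (with made-up attributes `family_history` and `gender`) are all assumptions for illustration, not the project's actual method or data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels within each feature value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(records, feature_names, target):
    """Return (feature, score) pairs sorted from most to least informative."""
    labels = [r[target] for r in records]
    scores = {
        name: information_gain([r[name] for r in records], labels)
        for name in feature_names
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy records; the attribute names are illustrative only.
records = [
    {"family_history": "yes", "gender": "m", "attempt": 1},
    {"family_history": "yes", "gender": "f", "attempt": 1},
    {"family_history": "no", "gender": "m", "attempt": 0},
    {"family_history": "no", "gender": "f", "attempt": 0},
]
ranking = rank_features(records, ["gender", "family_history"], "attempt")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data the made-up `family_history` attribute separates the two classes perfectly and ranks first, while `gender` carries no information about the label.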
Motivation
◦ According to the National Mental Health Survey of 2003-2005, about 16.05% of the country's adult
population suffers from mental disorders [2].
◦ Suicide attempts associated with these mental disorders are also rising in our country. Statistics indicate
that about 11,000 people die by suicide in Bangladesh every year, an average of about 172 people
per district [1].
Related Work
◦ The paper “Supporting the Treatment of Mental Diseases using Data Mining” [3] analyzed a dataset of 466 mental
health patients to find relations between diagnoses and attributes. The authors applied three machine learning
techniques (random forest, SVM, and k-nearest neighbors) and compared their performance on several measures of
accuracy in diagnosing mental health problems.
◦ Bhakta I. and Sau A. [4] developed a model for predicting depression among senior citizens in India
using machine learning classifiers. The data used in the study were collected from a slum in Kolkata. The study
used Naïve Bayes (NB), logistic regression (LR), multi-layer perceptron (MLP), support vector machine (SVM), and
decision tree (DT) classifiers.
◦ The paper “Predicting Generalized Anxiety Disorder among Women Using Random Forest Approach” [5] worked
with generalized anxiety disorder (GAD) data. The authors used the random forest algorithm to build a prediction
model for GAD.
Research Gap
Contribution
◦ We collected real-life raw data for 747 patients, including 59 suicide attempt cases, from NIMH.
◦ We used a regression model to predict patients' duration of stay in a mental hospital
based on their attributes.
◦ We used a feature selection technique to select the triggering factors/attributes related to
suicide attempts.
◦ We used different classifiers to predict suicide attempt tendency among patients in a
mental hospital based on their attributes.
Methodology
Flow Chart
Data Collection
◦ We collected data on 747 patients from NIMH.
◦ Each patient had, on average, 14 pages of
handwritten information and diagnosis data.
◦ We found a total of 59 patients who had attempted
suicide.
• A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on
a set of test data for which the true values are known.
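To make the definition above concrete, here is a minimal sketch of a binary confusion matrix and the metrics usually read from it. The labels are invented toy values (1 = attempt, 0 = no attempt), and the function names are hypothetical; this is not the project's evaluation code or its results.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def summarize(cm):
    """Derive accuracy, precision and recall from the four counts."""
    total = cm["tp"] + cm["fp"] + cm["fn"] + cm["tn"]
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total if total else 0.0,
        "precision": cm["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": cm["tp"] / actual_pos if actual_pos else 0.0,
    }


# Invented toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)
metrics = summarize(cm)
print(cm)
print(metrics)
```

If a classifier were evaluated on all 747 records, of which 59 are attempt cases, a model that always predicts "no attempt" would already reach 688/747 ≈ 92.1% accuracy. That is one reason to read precision and recall from the matrix rather than accuracy alone.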
Test Result
• Duration of stay: the number of days a patient stayed in
hospital (date difference).
After evaluation, the RMSE was
36.99.
• The suicide attempt prediction result is
shown in the figure.
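The duration-of-stay target and the RMSE metric above can be sketched as follows. The slides say only "date difference", so taking it between an admission date and a discharge date is an assumption, as are the function names and the toy dates and predictions. Because RMSE is the square root of the mean squared error, it is expressed in the same unit as the target.

```python
import math
from datetime import date


def length_of_stay(admitted, discharged):
    """Number of days between two dates (assumed: admission and discharge)."""
    return (discharged - admitted).days


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented toy dates and predictions, for illustration only.
stays = [
    length_of_stay(date(2018, 1, 1), date(2018, 1, 31)),  # 30 days
    length_of_stay(date(2018, 2, 1), date(2018, 3, 1)),   # 28 days
]
predicted = [33, 25]
error = rmse(stays, predicted)
print(stays, error)
```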