Week 10

Uploaded by

Jamila Hamdi
Copyright
© © All Rights Reserved

Week 10 – Healthcare Project

Group Name: Data Scientists Team

Team Members Details:

Name                    | Email                     | Country    | College / Company                                      | Specialization
Yousef Elbayoumi        | yousefxelbayomi@gmail.com | Palestine  | Bahçeşehir University                                  | Data Science
Mukhammadjon Kholmirzev | kmukhammadjon@gmail.com   | Uzbekistan | Ulsan National Institute of Science and Technology     | Data Science
Jamila Hamdi            | jamila.hamdi90@gmail.com  | Tunisia    | Trainee                                                | Data Science
H. Melis Tekin Akcin    | [email protected]         | UK         | Hacettepe University                                   | Data Science

Problem Description
One of the challenges for pharmaceutical companies is
understanding the persistency of a drug relative to the physician's
prescription. Poor persistency negatively affects pharmacies and
everyone involved: patients, physicians, and administration. The
data science team will analyze the dataset to detect the factors
that influence the primary outcome, "persistency". By building a
classification machine learning model, we will be able to classify
the records and identify the variables that affect the target
variable, "Persistency Flag".
EDA performed on data

Dataset:

In total, the dataset has 3424 observations and 69 features.

The screenshots below show information on some columns.


Feature types:
Null values:

We checked the data and did not find any null values.

Unknown Values:
On the other hand, we found many “Unknown” values. We treated
them as null values and decided to remove them because they can
affect the results of our ML models.
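
The handling of “Unknown” values described above could be sketched roughly as follows. This is only an illustration on a toy frame: the column names and values are made up, and dropping whole rows is an assumption, since the report does not say whether rows or columns were removed.

```python
import pandas as pd

# Toy frame; column names and values are illustrative, not the project's data.
df = pd.DataFrame({
    "Ethnicity": ["Hispanic", "Unknown", "Not Hispanic", "Not Hispanic"],
    "Region": ["West", "South", "Unknown", "Midwest"],
    "Persistency_Flag": ["Persistent", "Not Persistent",
                         "Persistent", "Not Persistent"],
})

# Count "Unknown" entries per column before deciding how to treat them.
unknown_counts = (df == "Unknown").sum()
print(unknown_counts)

# Treat "Unknown" as missing (NaN), then drop rows with any missing value.
# Dropping rows is an assumption; dropping sparse columns is another option.
cleaned = df.mask(df == "Unknown").dropna()
print(cleaned)
```

Counting first makes it easier to judge how much data a removal would cost before committing to it.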

Outliers:
We have 460 outliers in the “Dexa_Freq_During_Rx” variable.

We have 8 outliers in the “Count_Of_Risks” variable.
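
The report does not state how the outliers were counted. One common choice is the 1.5 × IQR rule, sketched below as an assumption; the helper name `iqr_outliers` and the sample values are hypothetical.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask]

# Made-up stand-in for a count column such as Dexa_Freq_During_Rx.
dexa_freq = pd.Series([0, 0, 0, 1, 1, 2, 2, 3, 4, 40])
outliers = iqr_outliers(dexa_freq)
print(len(outliers), list(outliers))
```

A different rule (for example a z-score threshold) would give different counts, so the numbers above should be read together with whichever rule the team used.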

Skewed Data:

As seen in the plot, the tail is on the right side, so the
“Dexa_Freq_During_Rx” variable has a right-skewed distribution.
For such a distribution, the mean is typically greater than the
mode.
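
The relationship between skewness, mean, and mode can be checked numerically. The sketch below uses made-up counts rather than the project's data and compares the mean, median, and mode of a right-skewed sample.

```python
import pandas as pd

# Made-up right-skewed counts: many small values and a few large ones.
counts = pd.Series([0, 0, 0, 0, 1, 1, 2, 3, 6, 17])

mean = counts.mean()
median = counts.median()
mode = counts.mode().iloc[0]   # first mode if there are several
skewness = counts.skew()       # positive for a right-skewed sample

print(f"mean={mean}, median={median}, mode={mode}, skew={skewness:.2f}")
```

In this sample the long right tail pulls the mean above the median and the mode, which matches the reading of the plot.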

Demographics analysis:
Ethnicity:

“Non-Hispanic” patients far outnumber “Hispanic” patients, and
there are also unknown values.
Age:

The age group “>55” appears to be associated with persistency to the drug.


Race:

Caucasian patients outnumber all other races.

Gender:

There are more female patients than male patients.


Gender-wise analysis:

The graph shows a large imbalance between the genders.
NTM Speciality analysis:
General Practitioner, Rheumatology, Endocrinology, and
Oncology specialists prescribed the NTM Rx most often.

Clinical Factors analysis:

Risk Segment:

We compared the risk segments prior to NTM and during NTM
and examined how they change.

Fragility: we obtained the following cross-table:
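
A cross-table of this kind can be produced with `pandas.crosstab`. The sketch below is illustrative only: the column names `Fragility_Fracture` and `Persistency_Flag` and all values are assumptions, not taken from the dataset.

```python
import pandas as pd

# Hypothetical column names and values, used only for illustration.
df = pd.DataFrame({
    "Fragility_Fracture": ["Y", "N", "N", "Y", "N", "N"],
    "Persistency_Flag": ["Persistent", "Not Persistent", "Not Persistent",
                         "Persistent", "Persistent", "Not Persistent"],
})

# Raw counts for each combination of the two categorical variables.
table = pd.crosstab(df["Fragility_Fracture"], df["Persistency_Flag"])
print(table)

# Row-wise shares make groups of different sizes comparable.
row_share = pd.crosstab(
    df["Fragility_Fracture"], df["Persistency_Flag"], normalize="index"
)
print(row_share)
```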


T-scores prior to NTM:

T-scores during RX:


Statistical analysis:

Statistics for numerical features:

Statistics for categorical features:


Ratio of the target variable:
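
The class ratio of the target can be computed with `value_counts`. This is a minimal sketch on made-up labels; the series name `Persistency_Flag` and the label strings are assumptions.

```python
import pandas as pd

# Made-up target labels standing in for the real "Persistency Flag" column.
flags = pd.Series(
    ["Persistent", "Not Persistent", "Not Persistent",
     "Persistent", "Not Persistent"],
    name="Persistency_Flag",
)

counts = flags.value_counts()               # absolute class counts
ratio = flags.value_counts(normalize=True)  # class shares that sum to 1
print(counts)
print(ratio)
```

Checking this ratio early helps decide whether accuracy alone is a reasonable metric or whether class imbalance needs to be handled.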

Number of DEXA scans by each region:


Numerical Values:

Only two columns hold numerical values. These diagrams show
the relationship between these columns.

Final recommendations:

• Cleaning: the data is largely clean.
• Region: across regions, most records are “Not Persistent”.
• Correlations: the data does not show strong correlations,
which we attribute to the encoding (we replaced Y and N
with 0 and 1).
• Statistics: as with the correlations, we cannot comment
much here for the same reason.
• This is a classification problem, so the team is
considering several ML models to build and test, such as
KNN, MLP, Decision Tree, and Random Forest.
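
As a rough sketch of how the candidate models listed above might be compared, the example below encodes Y/N features as integers and scores each model with 5-fold cross-validation in scikit-learn. Everything here is an assumption for illustration: the data is synthetic, the feature names are hypothetical, the mapping Y to 1 and N to 0 is one possible direction, and the hyperparameters are library defaults or arbitrary picks.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200

# Synthetic Y/N features with hypothetical names.
raw = pd.DataFrame({
    "feature_a": rng.choice(["Y", "N"], size=n),
    "feature_b": rng.choice(["Y", "N"], size=n),
    "feature_c": rng.choice(["Y", "N"], size=n),
})

# Synthetic target: follows feature_a, with about 10% of labels flipped.
noise = rng.random(n) < 0.1
y = ((raw["feature_a"] == "Y") ^ noise).astype(int)

# Encode Y as 1 and N as 0 (the direction of the mapping is an assumption).
X = (raw == "Y").astype(int)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy per model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Cross-validation is used here instead of a single train/test split so that the comparison is less sensitive to how the rows happen to be divided.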

GitHub Repo Link: https://github.com/jamilaHa/Healthcare---Persistency-of-a-drug/tree/main/Week_10
