
Analysis Process Data Analysis and Visualisation Using Python

The document outlines an analysis process of a cleaned student depression dataset, which contains 27,857 entries and 18 columns related to demographics, academic pressures, and mental health indicators. It includes data visualizations to explore the distribution of depression among individuals, gender differences, family history of mental illness, sleep duration, and dietary habits. The analysis aims to provide insights into the factors associated with depression among students.

Uploaded by

Aditya Agarwal


analysis_process

May 20, 2025

[36]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings  # used to suppress user warnings
#---------------------------------------------------------------------------------------------
# suppresses pandas' "Boolean Series key will be reindexed" user warnings
warnings.filterwarnings(action='ignore', category=UserWarning,
                        message=r"Boolean Series.*")
#---------------------------------------------------------------------------------------------
# import the cleaned dataset
df = pd.read_csv(r"cleaned_student_depression_dataset.csv", index_col=0, header=0)

[37]: df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 27857 entries, 0 to 27901
Data columns (total 18 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 27857 non-null int64
1 Gender 27857 non-null object
2 Age 27857 non-null int64
3 City 27857 non-null object
4 Profession 27857 non-null object
5 Academic Pressure 27857 non-null int64
6 Work Pressure 27857 non-null int64
7 CGPA 27857 non-null float64
8 Study Satisfaction 27857 non-null int64
9 Job Satisfaction 27857 non-null int64
10 Sleep Duration 27857 non-null object
11 Dietary Habits 27857 non-null object
12 Degree 27857 non-null object
13 Have you ever had suicidal thoughts ? 27857 non-null object
14 Work/Study Hours 27857 non-null int64
15 Financial Stress 27857 non-null int64

16 Family History of Mental Illness 27857 non-null object
17 Depression 27857 non-null bool
dtypes: bool(1), float64(1), int64(8), object(8)
memory usage: 3.9+ MB

[38]: df.head()

[38]: id Gender Age City Profession Academic Pressure \

Serial Number
0 1 Male 19 Delhi Student 4
1 2 Male 33 Visakhapatnam Student 5
2 8 Female 24 Bangalore Student 2
3 26 Male 31 Srinagar Student 3
4 30 Female 28 Varanasi Student 3

Work Pressure CGPA Study Satisfaction Job Satisfaction \

Serial Number
0 0 6.00 3 0
1 0 8.97 2 0
2 0 5.90 5 0
3 0 7.03 5 0
4 0 5.59 2 0

Sleep Duration Dietary Habits Degree \

Serial Number
0 '6-7 hours' Moderate B.Com
1 '5-6 hours' Healthy B.Pharm
2 '5-6 hours' Moderate BSc
3 'Less than 5 hours' Healthy BA
4 '7-8 hours' Moderate BCA

Have you ever had suicidal thoughts ? Work/Study Hours \

Serial Number
0 Yes 8
1 Yes 3
2 No 3
3 No 9
4 Yes 4

Financial Stress Family History of Mental Illness Depression

Serial Number
0 4 No True
1 1 No True
2 2 Yes False
3 1 Yes False
4 5 Yes True

[39]: df.describe() #description of cleaned database

[39]: id Age Academic Pressure Work Pressure \
count 27857.000000 27857.000000 27857.000000 27857.000000
mean 70443.316725 25.820835 3.141580 0.000431
std 40648.631003 4.906158 1.381802 0.044027
min 1.000000 18.000000 0.000000 0.000000
25% 35039.000000 21.000000 2.000000 0.000000
50% 70694.000000 25.000000 3.000000 0.000000
75% 105827.000000 30.000000 4.000000 0.000000
max 140699.000000 59.000000 5.000000 5.000000

CGPA Study Satisfaction Job Satisfaction Work/Study Hours \

count 27857.000000 27857.000000 27857.000000 27857.000000
mean 7.655911 2.944395 0.000682 7.157196
std 1.470837 1.360876 0.044429 3.707066
min 0.000000 0.000000 0.000000 0.000000
25% 6.280000 2.000000 0.000000 4.000000
50% 7.770000 3.000000 0.000000 8.000000
75% 8.920000 4.000000 0.000000 10.000000
max 10.000000 5.000000 4.000000 12.000000

Financial Stress
count 27857.000000
mean 3.140467
std 1.437145
min 1.000000
25% 2.000000
50% 3.000000
75% 4.000000
max 5.000000

[40]: # Inference 1
#---------------------------------------------------------------------------------------------
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 6))
colors = ['#e87d5d', '#62d997']
# pin the order to [True, False] so the labels below always match the bars and wedges
depression_counts = df['Depression'].value_counts().reindex([True, False])
#---------------------------------------------------------------------------------------------
labels1 = ['Depressed', 'Non-depressed']
axes[0].bar(labels1, depression_counts, width=0.4, color=colors)
axes[0].set_xticks(labels1, labels1, rotation=0, ha='center')
axes[0].tick_params(axis='x', labelsize=10)
axes[0].set_title('Distribution of depression\n in individuals', size=15)
axes[0].set_ylabel('Individuals (count)', size=12)
#---------------------------------------------------------------------------------------------
explode = (0.05, 0.05)
axes[1].pie(depression_counts, labels=labels1,
            autopct='%1.0f%%', colors=colors, explode=explode,
            shadow=True, startangle=30)
axes[1].set_title('Distribution of depression\n in individuals', size=15)
plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1, wspace=0.1,
                    hspace=0.4)
#---------------------------------------------------------------------------------------------
plt.show()
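`value_counts()` sorts categories by frequency, so a positional label list such as `['Depressed', 'Non-depressed']` is only correct while `True` happens to be the majority. A minimal sketch of pinning the order with `reindex`, using a hypothetical toy Series in place of `df['Depression']`:

```python
import pandas as pd

# Hypothetical toy values standing in for df['Depression'].
depression = pd.Series([False, False, False, True, True])

# value_counts() lists the most frequent value first; here that is False.
by_frequency = depression.value_counts()

# reindex() fixes the order to [True, False] regardless of which value is more common;
# fill_value=0 keeps a category that happens to be absent from the data.
labels = ['Depressed', 'Non-depressed']
counts = depression.value_counts().reindex([True, False], fill_value=0)

for label, count in zip(labels, counts):
    print(f"{label}: {count}")
```

With the explicit order, the same `labels` list stays correct for both the bar chart and the pie chart whatever the class balance is.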

[41]: # Inference 2
#---------------------------------------------------------------------------------------------
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 6))
colors = ['#62add9', '#d57bdb']
#---------------------------------------------------------------------------------------------
labels1 = df[df['Depression'] == True]['Gender'].value_counts().index
axes[0].bar(labels1, df[df['Depression'] == True]['Gender'].value_counts(),
            width=0.4, color=colors)
axes[0].set_xticks(labels1, labels1, rotation=0, ha='center')
axes[0].tick_params(axis='x', labelsize=10)
axes[0].set_title('Distribution of gender\n in depressed individuals', size=15)
axes[0].set_ylabel('Individuals (count)', size=12)
axes[0].set_xlabel('Gender', size=12)
#---------------------------------------------------------------------------------------------
explode = (0.05, 0.05)
axes[1].pie(df[df['Depression'] == True]['Gender'].value_counts(), labels=labels1,
            autopct='%1.0f%%', colors=colors, explode=explode,
            shadow=True, startangle=30)
axes[1].set_title('Distribution of gender\n in depressed individuals', size=15)
plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1, wspace=0.1,
                    hspace=0.4)
#---------------------------------------------------------------------------------------------
plt.show()
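The charts above split the depressed individuals by gender, so the counts also reflect how many men and women are in the dataset overall. A small sketch of the complementary view, the depression rate *within* each gender, using `pd.crosstab` with `normalize='index'` on hypothetical toy rows (in the notebook, the cleaned `df` would be passed instead):

```python
import pandas as pd

# Hypothetical toy rows; not taken from the dataset.
toy = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female'],
    'Depression': [True, True, False, True, False],
})

# Readable column labels instead of raw booleans.
status = toy['Depression'].map({True: 'Depressed', False: 'Non-depressed'})

# normalize='index' divides each row by its total, so each gender's shares sum to 1.
rates = pd.crosstab(toy['Gender'], status, normalize='index')
print(rates)
```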

[42]: # Inference 3
#---------------------------------------------------------------------------------------------
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 6))
colors = ['#e87d5d', '#62d997']
labels1 = ['Depressed', 'Non-depressed']
# depression counts among individuals with a family history of mental illness,
# pinned to [True, False] so they line up with labels1
family_counts = (df[df['Family History of Mental Illness'] == 'Yes']['Depression']
                 .value_counts().reindex([True, False]))
#---------------------------------------------------------------------------------------------
axes[0].bar(labels1, family_counts, width=0.4, color=colors)
axes[0].set_xticks(labels1, labels1, rotation=0, ha='center')
axes[0].tick_params(axis='x', labelsize=10)
axes[0].set_title('Distribution of individuals whose\n family had a history of mental illness',
                  size=15)
axes[0].set_ylabel('Individuals (count)', size=12)
#---------------------------------------------------------------------------------------------
explode = (0.05, 0.05)
axes[1].pie(family_counts, labels=labels1, autopct='%1.0f%%', colors=colors,
            explode=explode, shadow=True, startangle=30)
axes[1].set_title('Distribution of individuals whose\n family had a history of mental illness',
                  size=15)
plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1, wspace=0.1,
                    hspace=0.4)
#---------------------------------------------------------------------------------------------
plt.show()

[43]: # Inference 4
#---------------------------------------------------------------------------------------------
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(15, 10))
colors = (0.2, 0.4, 0.2, 0.6)  # a single RGBA color for every bar
#---------------------------------------------------------------------------------------------
labels1 = df['Sleep Duration'].value_counts().index
axes[0,0].bar(labels1, df['Sleep Duration'].value_counts(), width=0.4, color=colors)
axes[0,0].set_xticks(labels1, labels1, rotation=25, ha='center')
axes[0,0].tick_params(axis='x', labelsize=10)
axes[0,0].set_title('Sleep duration of all the individuals', size=15)
axes[0,0].set_xlabel('Sleeping hours', size=12)
axes[0,0].set_ylabel('Individuals (count)', size=12)
#---------------------------------------------------------------------------------------------
labels2 = df[df['Depression'] == True]['Sleep Duration'].value_counts().index
axes[0,1].bar(labels2, df[df['Depression'] == True]['Sleep Duration'].value_counts(),
              width=0.4, color=colors)
axes[0,1].set_xticks(labels2, labels2, rotation=30, ha='center')
axes[0,1].tick_params(axis='x', labelsize=10)
axes[0,1].set_title('Sleep duration of depressed individuals', size=15)
axes[0,1].set_xlabel('Sleeping hours', size=12)
axes[0,1].set_ylabel('Individuals (count)', size=12)
#---------------------------------------------------------------------------------------------
labels3 = df[df['Depression'] == False]['Sleep Duration'].value_counts().index
axes[1,0].bar(labels3, df[df['Depression'] == False]['Sleep Duration'].value_counts(),
              width=0.4, color=colors)
axes[1,0].set_xticks(labels3, labels3, rotation=30, ha='center')
axes[1,0].tick_params(axis='x', labelsize=10)
axes[1,0].set_title('Sleep duration of non-depressed individuals', size=15)
axes[1,0].set_xlabel('Sleeping hours', size=12)
axes[1,0].set_ylabel('Individuals (count)', size=12)
#---------------------------------------------------------------------------------------------
explode = (0.05, 0.05, 0.05, 0.05, 0.05)  # one entry per sleep-duration category
axes[1,1].pie(df['Sleep Duration'].value_counts(), autopct='%1.3f%%',
              shadow=True, startangle=30, labels=labels1, explode=explode)
axes[1,1].set_title('Sleep duration of all individuals', size=15)
plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1, wspace=0.17,
                    hspace=0.53)
#---------------------------------------------------------------------------------------------
plt.show()
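Because the bars above are ordered by `value_counts()`, the sleep categories appear by frequency rather than from shortest to longest sleep. A sketch of listing them in an explicit ordinal order instead; the category strings (including their literal quote characters) are the ones visible in `df.head()` and are an assumption about the full column, so any category not in the list is simply appended at the end:

```python
import pandas as pd

# Assumed ordinal order, taken from the values visible in df.head(); it may not cover
# every category in the real 'Sleep Duration' column.
SLEEP_ORDER = ["'Less than 5 hours'", "'5-6 hours'", "'6-7 hours'", "'7-8 hours'"]

def ordered_counts(series, order):
    """Count values, listing known categories in `order` first and any others after them."""
    counts = series.value_counts()
    known = [c for c in order if c in counts.index]
    extra = [c for c in counts.index if c not in order]
    return counts.reindex(known + extra)

# Hypothetical toy column.
toy = pd.Series(["'7-8 hours'", "'5-6 hours'", "'5-6 hours'", 'Others', "'Less than 5 hours'"])
result = ordered_counts(toy, SLEEP_ORDER)
print(result)
```

Passing `result.index` and `result` to `axes[...].bar` would then keep the x-axis ordinal in all three bar panels.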

[44]: # Inference 5
#---------------------------------------------------------------------------------------------
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(15, 10))
colors = (0.2, 0.4, 0.2, 0.6)  # a single RGBA color for every bar
#---------------------------------------------------------------------------------------------
labels1 = df['Dietary Habits'].value_counts().index
axes[0,0].bar(labels1, df['Dietary Habits'].value_counts(), width=0.4, color=colors)
axes[0,0].set_xticks(labels1, labels1, ha='center')
axes[0,0].tick_params(axis='x', labelsize=10)
axes[0,0].set_title('Dietary habits of all individuals', size=15)
axes[0,0].set_ylabel('Individuals (count)', size=12)
axes[0,0].set_xlabel('Dietary Habits', size=12)
#---------------------------------------------------------------------------------------------
labels2 = df[df['Depression'] == True]['Dietary Habits'].value_counts().index
axes[0,1].bar(labels2, df[df['Depression'] == True]['Dietary Habits'].value_counts(),
              width=0.4, color=colors)
axes[0,1].set_xticks(labels2, labels2, ha='center')
axes[0,1].tick_params(axis='x', labelsize=10)
axes[0,1].set_title('Dietary habits of depressed individuals', size=15)
axes[0,1].set_ylabel('Individuals (count)', size=12)
axes[0,1].set_xlabel('Dietary Habits', size=12)
#---------------------------------------------------------------------------------------------
labels3 = df[df['Depression'] == False]['Dietary Habits'].value_counts().index
axes[1,0].bar(labels3, df[df['Depression'] == False]['Dietary Habits'].value_counts(),
              width=0.4, color=colors)
axes[1,0].set_xticks(labels3, labels3, ha='center')
axes[1,0].tick_params(axis='x', labelsize=10)
axes[1,0].set_title('Dietary habits of non-depressed individuals', size=15)
axes[1,0].set_ylabel('Individuals (count)', size=12)
axes[1,0].set_xlabel('Dietary Habits', size=12)
#---------------------------------------------------------------------------------------------
explode = (0.05, 0.05, 0.05, 0.05)  # one entry per dietary-habit category
axes[1,1].pie(df['Dietary Habits'].value_counts(), autopct='%1.2f%%',
              shadow=True, startangle=30, labels=labels1, explode=explode)
axes[1,1].set_title('Dietary habits of all individuals', size=15)
#---------------------------------------------------------------------------------------------
plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1, wspace=0.17,
                    hspace=0.4)

plt.show()

[45]: # Inference 6
#---------------------------------------------------------------------------------------------
labels = ['age of all \nindividuals', 'age of depressed \nindividuals',
          'age of non-depressed\n individuals']
colors = ['#995757', '#b0ae54', '#4bad6f']

fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(5, 7))

axes.set_title('Age of individuals in different categories', size=15)
axes.set_ylabel('Age', size=12)

bplot = axes.boxplot([df['Age'],
                      df[df['Depression'] == True]['Age'],
                      df[df['Depression'] == False]['Age']],
                     widths=0.80,
                     patch_artist=True,  # allows the boxes to be filled with color
                     tick_labels=labels)  # 'tick_labels' needs Matplotlib >= 3.9
# fill the boxes with colors
for patch, color in zip(bplot['boxes'], colors):
    patch.set_facecolor(color)

for median in bplot['medians']:
    median.set_color('black')

axes.tick_params(axis='x', labelrotation=25)
axes.set_yticks(range(16, 64, 2))
axes.set_yticklabels(range(16, 64, 2))

plt.show()
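The box plots summarize each group visually; the same quartiles can be read off numerically with `groupby`. A minimal sketch (the helper name `group_summary` is hypothetical, and the toy rows stand in for the cleaned `df`); it uses pandas' default linear interpolation for quantiles, which is also what Matplotlib's box plot uses for the box edges:

```python
import pandas as pd

def group_summary(frame, value_col, group_col='Depression'):
    """First quartile, median, third quartile and size of `value_col` within each group."""
    grouped = frame.groupby(group_col)[value_col]
    return pd.DataFrame({
        'q1': grouped.quantile(0.25),
        'median': grouped.median(),
        'q3': grouped.quantile(0.75),
        'count': grouped.size(),
    })

# Hypothetical toy rows.
toy = pd.DataFrame({'Depression': [True, True, True, False, False],
                    'Age': [19, 21, 23, 30, 34]})
summary = group_summary(toy, 'Age')
print(summary)
```

The same call with `'Academic Pressure'` or `'CGPA'` would back up Inferences 7 and 8 with numbers.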

[46]: # Inference 7
#---------------------------------------------------------------------------------------------
labels = ['Academic Pressure \nof all individuals', 'Academic Pressure of \ndepressed individuals',
          'Academic Pressure of \nnon-depressed individuals']
colors = ['#536cb8', '#20465c', '#905799']

fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(5, 7))
axes.set_title('Academic pressure of individuals in different categories', size=15)
axes.set_ylabel('Academic Pressure (on a scale of 0 to 5)', size=12)

bplot = axes.boxplot([df['Academic Pressure'],
                      df[df['Depression'] == True]['Academic Pressure'],
                      df[df['Depression'] == False]['Academic Pressure']],
                     widths=0.80,
                     patch_artist=True,
                     tick_labels=labels)

for patch, color in zip(bplot['boxes'], colors):
    patch.set_facecolor(color)

for median in bplot['medians']:
    median.set_color('black')

axes.tick_params(axis='x', labelrotation=25)
axes.set_yticks(np.arange(0, 5.5, 0.5))
axes.set_yticklabels(np.arange(0, 5.5, 0.5))

plt.show()

[47]: # Inference 8
#---------------------------------------------------------------------------------------------
labels = ['CGPA \nof all individuals', 'CGPA of \ndepressed individuals',
          'CGPA of \nnon-depressed individuals']
colors = ['peachpuff', '#32a8a2', '#32a852']

fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(5, 7))

axes.set_title('CGPA of individuals in different categories', size=15)
axes.set_ylabel('CGPA', size=12)
bplot = axes.boxplot([df['CGPA'],
                      df[df['Depression'] == True]['CGPA'],
                      df[df['Depression'] == False]['CGPA']], widths=0.80,
                     patch_artist=True,
                     tick_labels=labels)

for patch, color in zip(bplot['boxes'], colors):
    patch.set_facecolor(color)

for median in bplot['medians']:
    median.set_color('black')

axes.tick_params(axis='x', labelrotation=25)
axes.set_yticks(np.arange(0, 10.5, 0.5))
axes.set_yticklabels(np.arange(0, 10.5, 0.5))

plt.show()

[48]: # Inference 9
#---------------------------------------------------------------------------------------------
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(20, 10))
labels = df['Degree'].unique()
collection = []
for x in df['Degree'].unique():
    collection.append(df[df['Degree'] == x]['Work/Study Hours'])

bplot = axes.boxplot(collection, widths=0.80,
                     patch_artist=True,
                     tick_labels=labels)

# positions (in df['Degree'].unique() order) of the degrees to highlight
to_highlight = [1, 3, 6, 10, 17, 27]
for x in to_highlight:
    bplot['boxes'][x].set_facecolor('#cf9d7c')

for median in bplot['medians']:
    median.set_color('black')

axes.tick_params(axis='x', labelrotation=90)
axes.set_title('Study hours of different courses', size=15)
axes.set_xlabel('Degrees', size=12)
axes.set_ylabel('Study hours', size=12)
axes.set_yticks(np.arange(0, 14, 1))
axes.set_yticklabels(np.arange(0, 14, 1))

plt.show()
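The highlighted boxes are chosen by position in `df['Degree'].unique()`, which silently changes if the row order of the CSV changes. A sketch of building the same per-group data and deriving the highlight positions from group names instead; both helper names are hypothetical, the toy rows are not from the dataset, and the notebook does not say which degrees or cities were highlighted or why:

```python
import pandas as pd

def groups_for_boxplot(frame, group_col, value_col):
    """Labels (first-appearance order, as .unique() gives) and the values of each group."""
    labels = list(frame[group_col].unique())
    data = [frame.loc[frame[group_col] == label, value_col] for label in labels]
    return labels, data

def highlight_positions(labels, names):
    """Box positions of the named groups; names that are not present are skipped."""
    return [labels.index(name) for name in names if name in labels]

# Hypothetical toy rows.
toy = pd.DataFrame({'Degree': ['BSc', 'BA', 'BSc', 'MBA'],
                    'Work/Study Hours': [8, 3, 10, 12]})
labels, data = groups_for_boxplot(toy, 'Degree', 'Work/Study Hours')
print(labels, highlight_positions(labels, ['MBA', 'BA']))
```

The same two helpers apply unchanged to the city plot in Inference 10.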

[49]: # Inference 10
#---------------------------------------------------------------------------------------------
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(20, 10))
labels = df['City'].unique()
collection = []
for x in df['City'].unique():
    collection.append(df[df['City'] == x]['Academic Pressure'])

bplot = axes.boxplot(collection, widths=0.80,
                     patch_artist=True,
                     tick_labels=labels)

# positions (in df['City'].unique() order) of the cities to highlight
to_highlight = [0, 4, 8, 10, 13, 14, 17, 22, 24, 28]
for x in to_highlight:
    bplot['boxes'][x].set_facecolor('#cf9d7c')

for median in bplot['medians']:
    median.set_color('black')

axes.tick_params(axis='x', labelrotation=90)
axes.set_title('Academic pressure of individuals in different cities', size=15)
axes.set_xlabel('Cities', size=12)
axes.set_ylabel('Academic pressure (on a scale of 0 to 5)', size=12)
axes.set_yticks(np.arange(0, 6, 0.5))
axes.set_yticklabels(np.arange(0, 6, 0.5))

plt.show()

[50]: # Inference 11
#---------------------------------------------------------------------------------------------
labels = ['Male', 'Female']
colors = ['#707ccc', '#cc708d']
# combined masks avoid chained Boolean indexing (and its reindexing UserWarning)
is_male = df['Gender'] == 'Male'
is_female = df['Gender'] == 'Female'
depressed = df['Depression'] == True

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(17, 8.5))
#---------------------------------------------------------------------------------------------
axes[0].set_title('Academic pressure of all \nindividuals of both genders', size=15)
axes[0].set_xlabel('Gender', size=12)
axes[0].set_ylabel('Academic pressure (on a scale of 0 to 5)', size=12)
bplot0 = axes[0].boxplot([df[is_male]['Academic Pressure'],
                          df[is_female]['Academic Pressure']], widths=0.5,
                         patch_artist=True,
                         tick_labels=labels)

for patch, color in zip(bplot0['boxes'], colors):
    patch.set_facecolor(color)

for median in bplot0['medians']:
    median.set_color('black')

axes[0].tick_params(axis='x', labelrotation=25)
axes[0].set_yticks(np.arange(0, 5.5, 0.5))
axes[0].set_yticklabels(np.arange(0, 5.5, 0.5))
axes[0].set_aspect(0.5)
#---------------------------------------------------------------------------------------------
axes[1].set_title('Academic pressure of depressed \nindividuals of both genders', size=15)
axes[1].set_xlabel('Gender', size=12)
axes[1].set_ylabel('Academic pressure (on a scale of 0 to 5)', size=12)
bplot1 = axes[1].boxplot([df[depressed & is_male]['Academic Pressure'],
                          df[depressed & is_female]['Academic Pressure']], widths=0.5,
                         patch_artist=True,
                         tick_labels=labels)

for patch, color in zip(bplot1['boxes'], colors):
    patch.set_facecolor(color)

for median in bplot1['medians']:
    median.set_color('black')

axes[1].tick_params(axis='x', labelrotation=25)
axes[1].set_yticks(np.arange(0, 5.5, 0.5))
axes[1].set_yticklabels(np.arange(0, 5.5, 0.5))
axes[1].set_aspect(0.5)
#---------------------------------------------------------------------------------------------
axes[2].set_title('Academic pressure of non-depressed\n individuals of both genders', size=15)
axes[2].set_xlabel('Gender', size=12)
axes[2].set_ylabel('Academic pressure (on a scale of 0 to 5)', size=12)
bplot2 = axes[2].boxplot([df[~depressed & is_male]['Academic Pressure'],
                          df[~depressed & is_female]['Academic Pressure']], widths=0.5,
                         patch_artist=True,
                         tick_labels=labels)

for patch, color in zip(bplot2['boxes'], colors):
    patch.set_facecolor(color)

for median in bplot2['medians']:
    median.set_color('black')

axes[2].tick_params(axis='x', labelrotation=25)
axes[2].set_yticks(np.arange(0, 5.5, 0.5))
axes[2].set_yticklabels(np.arange(0, 5.5, 0.5))
axes[2].set_aspect(0.5)
#---------------------------------------------------------------------------------------------
plt.show()
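The warning filter in the first cell exists because the original chained form `df[df['Depression']==True][df['Gender'] == 'Male']` indexes an already-filtered frame with a full-length mask, which pandas has to reindex. A toy sketch contrasting that with a single combined mask; both return the same rows here, but only the combined form builds its masks from the same frame, so nothing needs reindexing:

```python
import warnings
import pandas as pd

# Hypothetical toy rows.
toy = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Depression': [True, True, False, False],
    'Academic Pressure': [5, 4, 2, 1],
})

# Chained form: the second mask has 4 rows but indexes a 2-row frame,
# so pandas reindexes it (and warns about doing so).
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    chained = toy[toy['Depression'] == True][toy['Gender'] == 'Male']['Academic Pressure']

# Combined form: one mask, evaluated once against the full frame.
mask = (toy['Depression'] == True) & (toy['Gender'] == 'Male')
combined = toy.loc[mask, 'Academic Pressure']

print(chained.tolist(), combined.tolist())
```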

[51]: # Inference 12
#---------------------------------------------------------------------------------------------
labels = ['Male', 'Female']
colors = ['#707ccc', '#cc708d']
# combined masks avoid chained Boolean indexing (and its reindexing UserWarning)
is_male = df['Gender'] == 'Male'
is_female = df['Gender'] == 'Female'
depressed = df['Depression'] == True

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 6.5))
#---------------------------------------------------------------------------------------------
axes[0].set_title('CGPA of all \nindividuals of both genders', size=15)
axes[0].set_xlabel('Gender', size=12)
axes[0].set_ylabel('CGPA', size=12)
bplot0 = axes[0].boxplot([df[is_male]['CGPA'],
                          df[is_female]['CGPA']], widths=0.5,
                         patch_artist=True,
                         tick_labels=labels)

for patch, color in zip(bplot0['boxes'], colors):
    patch.set_facecolor(color)

for median in bplot0['medians']:
    median.set_color('black')

axes[0].set_yticks(np.arange(0, 10.5, 0.5))
axes[0].set_yticklabels(np.arange(0, 10.5, 0.5))
axes[0].set_aspect(0.35)
#---------------------------------------------------------------------------------------------
axes[1].set_title('CGPA of depressed\n individuals of both genders', size=15)
axes[1].set_xlabel('Gender', size=12)
axes[1].set_ylabel('CGPA', size=12)
bplot1 = axes[1].boxplot([df[depressed & is_male]['CGPA'],
                          df[depressed & is_female]['CGPA']], widths=0.5,
                         patch_artist=True,
                         tick_labels=labels)

for patch, color in zip(bplot1['boxes'], colors):
    patch.set_facecolor(color)

for median in bplot1['medians']:
    median.set_color('black')

axes[1].set_yticks(np.arange(0, 10.5, 0.5))
axes[1].set_yticklabels(np.arange(0, 10.5, 0.5))
axes[1].set_aspect(0.35)
#---------------------------------------------------------------------------------------------
axes[2].set_title('CGPA of non-depressed\n individuals of both genders', size=15)
axes[2].set_xlabel('Gender', size=12)
axes[2].set_ylabel('CGPA', size=12)
bplot2 = axes[2].boxplot([df[~depressed & is_male]['CGPA'],
                          df[~depressed & is_female]['CGPA']], widths=0.5,
                         patch_artist=True,
                         tick_labels=labels)

for patch, color in zip(bplot2['boxes'], colors):
    patch.set_facecolor(color)

for median in bplot2['medians']:
    median.set_color('black')

axes[2].set_yticks(np.arange(0, 10.5, 0.5))
axes[2].set_yticklabels(np.arange(0, 10.5, 0.5))
axes[2].set_aspect(0.35)
#---------------------------------------------------------------------------------------------
plt.show()

[52]: # Inference 13
#---------------------------------------------------------------------------------------------
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(6, 7.5))
labels = df['Sleep Duration'].unique()
collection = []
colors = ['#48db5e', '#0390fc', '#d93261', '#07f57e', '#17e3d5']
for x in df['Sleep Duration'].unique():
    collection.append(df[df['Sleep Duration'] == x]['Age'])

bplot = axes.boxplot(collection, widths=0.80,
                     patch_artist=True,
                     tick_labels=labels)

for patch, color in zip(bplot['boxes'], colors):
    patch.set_facecolor(color)

for median in bplot['medians']:
    median.set_color('black')

axes.tick_params(axis='x', labelrotation=45)
axes.set_title('How age affects sleeping hours', size=15)
axes.set_xlabel('Sleeping hours', size=12)
axes.set_ylabel('Age', size=12)
axes.set_yticks(np.arange(14, 60, 2))
axes.set_yticklabels(np.arange(14, 60, 2))

plt.show()

[53]: # Inference 14
#---------------------------------------------------------------------------------------------
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 6))
#---------------------------------------------------------------------------------------------
# pin the order to ['Yes', 'No'] so the hard-coded labels below match the bars and wedges
labels1 = ['Yes', 'No']
suicidal_counts = df['Have you ever had suicidal thoughts ?'].value_counts().reindex(labels1)
colors = ['#e87d5d', '#62add9']

axes[0].bar(labels1, suicidal_counts, width=0.4, color=colors)
axes[0].set_xticks(labels1, ['Have had suicidal \nthoughts', 'Do not have \nsuicidal thoughts'],
                   rotation=0, ha='center')
axes[0].tick_params(axis='x', labelsize=10)
axes[0].set_title('Individuals with/without\n suicidal thoughts', size=15)
axes[0].set_ylabel('Individuals (count)', size=12)
#---------------------------------------------------------------------------------------------
explode = (0.05, 0.05)
axes[1].pie(suicidal_counts, autopct='%1.3f%%', shadow=True, startangle=30,
            labels=['Have had \nsuicidal \nthoughts', 'Do not \nhave suicidal\nthoughts'],
            explode=explode, colors=colors)

axes[1].set_title('Individuals with/without\n suicidal thoughts', size=15)

plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1, wspace=0.1,
                    hspace=0.4)
#---------------------------------------------------------------------------------------------
plt.show()

[54]: # Inference 15
#---------------------------------------------------------------------------------------------
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(6, 7.5))
labels = sorted(df['Financial Stress'].unique())
collection = []
colors = ['#48db5e', '#0390fc', '#b4db48', '#07f57e', '#17e3d5']

for x in sorted(df['Financial Stress'].unique()):
    collection.append(df[df['Financial Stress'] == x]['CGPA'])

bplot = axes.boxplot(collection, widths=0.80,
                     patch_artist=True,
                     tick_labels=labels)

for patch, color in zip(bplot['boxes'], colors):
    patch.set_facecolor(color)

for median in bplot['medians']:
    median.set_color('black')

axes.set_title('How financial stress affects CGPA', size=15)
axes.tick_params(axis='x', labelrotation=0)
axes.set_ylabel('CGPA', size=12)
axes.set_xlabel('Financial stress (on a scale of 1 to 5)', size=12)
axes.set_yticks(np.arange(0, 10.5, 0.5))
axes.set_yticklabels(np.arange(0, 10.5, 0.5))

plt.show()

[55]: # Inference 16
#---------------------------------------------------------------------------------------------
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(6, 4))

# 50 random rows; without random_state the sample (and the fitted line) changes on every run
random_rows = df.sample(n=50, axis='rows')

x = random_rows['Work/Study Hours']
y = random_rows['Age']
z = np.polyfit(x, y, 1)  # least-squares line: z = [slope, intercept]
p = np.poly1d(z)

axes.scatter(x, y, s=170, c='#42424270')
axes.plot(x, p(x), 'r--')
axes.set_xticks(np.arange(0, 13, 1))
axes.set_yticks(np.arange(16, 61, 3))
axes.set_title('Relation between age and work/study hours', size=15)
axes.set_ylabel('Age', size=12)
axes.set_xlabel('Work/study hours', size=12)

plt.show()

[56]: # Inference 17
#---------------------------------------------------------------------------------------------
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(6, 4))

# 50 random rows; without random_state the sample (and the fitted line) changes on every run
random_rows = df.sample(n=50, axis='rows')

x = random_rows['Work/Study Hours']
y = random_rows['CGPA']
z = np.polyfit(x, y, 1)  # least-squares line: z = [slope, intercept]
p = np.poly1d(z)

axes.scatter(x, y, s=170, c='#42424270')
axes.plot(x, p(x), 'r--')
axes.set_xticks(np.arange(0, 13, 1))
axes.set_yticks(np.arange(0, 10.5, 1))
axes.set_title('Relation between work/study hours and CGPA', size=15)
axes.set_ylabel('CGPA', size=12)
axes.set_xlabel('Work/study hours', size=12)

plt.show()
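Inferences 16 and 17 draw the trend through a random 50-row sample, so the slope can differ from run to run. A sketch of quoting the trend numerically instead: `np.polyfit(x, y, 1)` returns `[slope, intercept]`, and `scipy.stats.pearsonr` (an addition, not used in the notebook) gives the correlation coefficient. The helper name is hypothetical and the data below is synthetic; in the notebook it could be run on the full columns, e.g. `df['Work/Study Hours']` and `df['CGPA']`:

```python
import numpy as np
from scipy import stats

def linear_trend(x, y):
    """Return (slope, intercept, pearson_r) for a degree-1 least-squares fit."""
    slope, intercept = np.polyfit(x, y, 1)
    r, _p_value = stats.pearsonr(x, y)
    return slope, intercept, r

# Synthetic, perfectly linear data (not from the dataset).
hours = np.array([0, 2, 4, 6, 8, 10, 12], dtype=float)
cgpa = 0.1 * hours + 6.0
slope, intercept, r = linear_trend(hours, cgpa)
print(slope, intercept, r)
```

Fitting on all rows, or passing `random_state` to `df.sample`, makes the plotted line reproducible.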

[57]: # Inference 18
#---------------------------------------------------------------------------------------------
g = sns.pairplot(df[['Age', 'Academic Pressure', 'CGPA', 'Study Satisfaction',
                     'Work/Study Hours', 'Financial Stress']])

g.figure.suptitle("Pair plot to show pairwise bivariate distribution", y=1)

plt.show()
[58]: # Inference 19
#---------------------------------------------------------------------------------------------
sns.kdeplot(data=df, x ='Academic Pressure')
plt.title('Density distribution for Academic Pressure')
plt.xlabel('Academic Pressure (on a scale of 0 to 5)')
plt.show()

[59]: df[df['Academic Pressure'] < 3]['Depression'].value_counts()

[59]: Depression
False 6475
True 2497
Name: count, dtype: int64

[60]: df[df['Academic Pressure'] > 3]['Depression'].value_counts()

[60]: Depression
True 9335
False 2103
Name: count, dtype: int64
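Converting the two sets of counts above into rates makes the contrast easier to read. A short worked example using only the numbers printed by cells [59] and [60] (individuals with an academic pressure of exactly 3 are in neither group):

```python
def depression_rate(true_count, false_count):
    """Fraction of a group flagged as depressed."""
    return true_count / (true_count + false_count)

# Counts copied from the value_counts() outputs above.
low_pressure = depression_rate(2497, 6475)   # Academic Pressure < 3
high_pressure = depression_rate(9335, 2103)  # Academic Pressure > 3
print(f"{low_pressure:.1%} vs {high_pressure:.1%}")  # prints: 27.8% vs 81.6%
```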

[61]: # Inference 20
#---------------------------------------------------------------------------------------------
sns.kdeplot(data=df, x = 'Study Satisfaction')
plt.title('Density distribution for Study Satisfaction')
plt.xlabel('Study Satisfaction (on a scale of 0 to 5)')
plt.show()

[62]: df[df['Study Satisfaction'] < 4]['Depression'].value_counts()

[62]: Depression
True 10969
False 6123
Name: count, dtype: int64

[63]: df[df['Study Satisfaction'] > 3]['Depression'].value_counts()

[63]: Depression
False 5421
True 5344
Name: count, dtype: int64
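Because `Depression` is a Boolean column, the mean of any slice of it is directly the share of depressed individuals in that slice, which avoids reading two counts off `value_counts()`. A toy sketch of the idiom (hypothetical rows; in the notebook, `df` would be used with the same thresholds):

```python
import pandas as pd

# Hypothetical toy rows.
toy = pd.DataFrame({
    'Study Satisfaction': [1, 2, 3, 4, 5, 5],
    'Depression': [True, True, False, True, False, False],
})

# The mean of a Boolean column is the fraction of True values, i.e. the depression rate.
low = toy.loc[toy['Study Satisfaction'] < 4, 'Depression'].mean()
high = toy.loc[toy['Study Satisfaction'] > 3, 'Depression'].mean()
print(low, high)
```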

[64]: # Inference 21
#---------------------------------------------------------------------------------------------
sns.kdeplot(data=df, x = 'Financial Stress')
plt.title('Density distribution for Financial Stress')
plt.xlabel('Financial Stress (on a scale of 1 to 5)')
plt.show()

[65]: df[df['Financial Stress'] > 4]['Depression'].value_counts()

[65]: Depression
True 5448
False 1257
Name: count, dtype: int64

[66]: # Inference 22
#---------------------------------------------------------------------------------------------
features = ['Age', 'Academic Pressure', 'CGPA', 'Study Satisfaction', 'Work/Study Hours',
            'Financial Stress']
matrix = df[features]

values = pd.DataFrame(columns=['mean', 'mode', 'median', 'standard deviation',
                               'confidence interval at 95%', 'standard error'],
                      index=features)

for x in matrix.columns:
    values.loc[x, 'mean'] = matrix[x].mean()
    values.loc[x, 'mode'] = matrix[x].mode()[0]
    values.loc[x, 'median'] = matrix[x].median()
    values.loc[x, 'standard deviation'] = matrix[x].std()
    values.loc[x, 'standard error'] = matrix[x].sem()
    # half-width of a two-sided 95% t interval: t(0.975, n - 1) * standard error
    interval = values.loc[x, 'standard error'] * stats.t.ppf((1 + 0.95) / 2, len(matrix[x]) - 1)
    values.loc[x, 'confidence interval at 95%'] = (values.loc[x, 'mean'] - interval,
                                                   values.loc[x, 'mean'] + interval)

values.head()

[66]: mean mode median standard deviation \

Age 25.820835 24 25.0 4.906158
Academic Pressure 3.14158 3 3.0 1.381802
CGPA 7.655911 8.04 7.77 1.470837
Study Satisfaction 2.944395 4 3.0 1.360876
Work/Study Hours 7.157196 10 8.0 3.707066

confidence interval at 95% standard error

Age (25.76321921075781, 25.878450746524024) 0.029395
Academic Pressure (3.125352936553525, 3.1578074899102004) 0.008279
CGPA (7.638638485585483, 7.673184216069394) 0.008812
Study Satisfaction (2.9284130546380416, 2.96037611863977) 0.008154
Work/Study Hours (7.113661520434242, 7.200729835418866) 0.022211
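The interval in cell [66] is mean ± t(0.975, n − 1) × standard error. With n = 27,857 the t quantile is close to 1.96, so for Age 25.820835 ± 1.96 × 0.029395 ≈ (25.7632, 25.8785), matching the table. A standalone sketch of the same computation as a function (the name `mean_ci` is hypothetical), checked against `scipy.stats.t.interval` on a toy sample:

```python
import numpy as np
from scipy import stats

def mean_ci(values, confidence=0.95):
    """Two-sided t confidence interval for the mean: mean +/- t_crit * SEM."""
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(n)               # same definition as pandas Series.sem()
    t_crit = stats.t.ppf((1 + confidence) / 2, n - 1)   # the 0.975 quantile for 95%
    return mean - t_crit * sem, mean + t_crit * sem

# Hypothetical toy sample, not taken from the dataset.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
low, high = mean_ci(sample)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```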

[67]: # Inference 23
#---------------------------------------------------------------------------------------------
corr_matrix = df[['Age', 'Academic Pressure', 'CGPA', 'Study Satisfaction', 'Work/Study Hours',
                  'Financial Stress']].corr()

plt.imshow(corr_matrix, cmap='autumn_r', interpolation='nearest')

plt.colorbar()
plt.grid(False)
plt.title('Heat map of correlation coefficients\n between each pair of features')

tick_marks = np.arange(len(corr_matrix.columns))
plt.xticks(tick_marks, corr_matrix.columns, rotation=80)
plt.yticks(tick_marks, corr_matrix.index)

# annotate every cell with its correlation coefficient
for i in range(len(corr_matrix.index)):
    for j in range(len(corr_matrix.columns)):
        plt.text(j, i, round(corr_matrix.iloc[i, j], 3), ha="center",
                 va="center", color="black")

plt.show()
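A heat map shows every coefficient twice and includes the diagonal of 1s. A sketch of listing the strongest pairs once, ranked by absolute value so strong negative correlations are not missed; the helper name is hypothetical and the toy frame stands in for `corr_matrix`:

```python
import numpy as np
import pandas as pd

def strongest_pairs(corr, top=3):
    """Return the `top` feature pairs with the largest absolute correlation."""
    # Keep only the strict upper triangle: each pair once, diagonal dropped.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    pairs = upper.stack().dropna()
    # Rank by magnitude so strong negative correlations rank alongside positive ones.
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(top)

# Hypothetical toy features; in the notebook, pass corr_matrix instead.
toy = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 4, 5, 9], 'c': [4, 1, 3, 2]})
result = strongest_pairs(toy.corr(), top=2)
print(result)
```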

[68]: # for the summary, refer to the thesis/documentation

No ratings yet
Anemia Code
33 pages
Python Analysis
No ratings yet
Python Analysis
30 pages
1 Edition Competition Guide Rio2022boccia
No ratings yet
1 Edition Competition Guide Rio2022boccia
36 pages
Similarity - Mental Health Analysis Dashboards in Power BI
No ratings yet
Similarity - Mental Health Analysis Dashboards in Power BI
18 pages
Sleep Disorder 1689050852
No ratings yet
Sleep Disorder 1689050852
41 pages
Oriflame Catalogue AUGUST 2021 HD
No ratings yet
Oriflame Catalogue AUGUST 2021 HD
108 pages
Logistic Regression
No ratings yet
Logistic Regression
28 pages
Stroke Prediction
No ratings yet
Stroke Prediction
14 pages
Student Mental Health Vs CGPA - EDA - Colab
No ratings yet
Student Mental Health Vs CGPA - EDA - Colab
18 pages
Ist Year Assignment July 2019 PDF
No ratings yet
Ist Year Assignment July 2019 PDF
16 pages
Model2.ipynb - Colab
No ratings yet
Model2.ipynb - Colab
11 pages
Python Solution
No ratings yet
Python Solution
30 pages
Depression
No ratings yet
Depression
96 pages
Final Group Project
No ratings yet
Final Group Project
26 pages
Natural Language Understanding
No ratings yet
Natural Language Understanding
14 pages
# Load Packages: Pandas Pandas PD PD Numpy Numpy NP NP
No ratings yet
# Load Packages: Pandas Pandas PD PD Numpy Numpy NP NP
17 pages
LDA Code
No ratings yet
LDA Code
19 pages
DS CP
No ratings yet
DS CP
18 pages
Preprocessing1.ipynb - Colab
No ratings yet
Preprocessing1.ipynb - Colab
13 pages
Rimjhim
No ratings yet
Rimjhim
21 pages
CHC Report
No ratings yet
CHC Report
17 pages
Dsbda 5
No ratings yet
Dsbda 5
12 pages
ML Expt 1 Description
No ratings yet
ML Expt 1 Description
15 pages
Mental Health Tracker
No ratings yet
Mental Health Tracker
12 pages
Heart Disease Indicator Prediction Model
No ratings yet
Heart Disease Indicator Prediction Model
17 pages
Datascience 2 PDF
No ratings yet
Datascience 2 PDF
24 pages
Gym Mngment System CH.1
No ratings yet
Gym Mngment System CH.1
7 pages
Assignment 1 - LP1
No ratings yet
Assignment 1 - LP1
14 pages
Final Paper in Methods of Research
No ratings yet
Final Paper in Methods of Research
18 pages
Biomechanical Determinants of The Modified And.11
No ratings yet
Biomechanical Determinants of The Modified And.11
12 pages
Data Cleaning and Pre Processing 1
No ratings yet
Data Cleaning and Pre Processing 1
12 pages
Heart - Disease - 1.ipynb - Colaboratory
No ratings yet
Heart - Disease - 1.ipynb - Colaboratory
9 pages
Data Analyzer
No ratings yet
Data Analyzer
10 pages
Data Visualization
No ratings yet
Data Visualization
13 pages
Suicide Analysis
No ratings yet
Suicide Analysis
18 pages
Stroke Prediction
No ratings yet
Stroke Prediction
10 pages
Itm 617 Capstone Code - Colaboratory
No ratings yet
Itm 617 Capstone Code - Colaboratory
13 pages
Sample Data For The Past 5 Years
No ratings yet
Sample Data For The Past 5 Years
8 pages
Numpy
No ratings yet
Numpy
9 pages
Latihan Fizikal
No ratings yet
Latihan Fizikal
55 pages
Science 4 Q4 Week 6 DLL
No ratings yet
Science 4 Q4 Week 6 DLL
8 pages
Machine Learning Techniques For Prediction of Mental Health
No ratings yet
Machine Learning Techniques For Prediction of Mental Health
8 pages
Assignment 2 - Jupyter Notebook
No ratings yet
Assignment 2 - Jupyter Notebook
8 pages
Logistic Regression
No ratings yet
Logistic Regression
12 pages
The Dentists of The Family Health Strategy Study On The Taste Related To Their Practices
No ratings yet
The Dentists of The Family Health Strategy Study On The Taste Related To Their Practices
6 pages
Business Plan
No ratings yet
Business Plan
8 pages
Baseline - Ipynb - Colab
No ratings yet
Baseline - Ipynb - Colab
5 pages
Indian J Community Med 2005 Ses
No ratings yet
Indian J Community Med 2005 Ses
5 pages
Sav
No ratings yet
Sav
5 pages
Analyzing Depression in Students
No ratings yet
Analyzing Depression in Students
5 pages
ML 7
No ratings yet
ML 7
6 pages
Data Science Assignment Submission
No ratings yet
Data Science Assignment Submission
12 pages
Project Code Health Sleep Lifestyle
No ratings yet
Project Code Health Sleep Lifestyle
4 pages
Amma Sadaram
No ratings yet
Amma Sadaram
1 page
Jamainternal Budhathoki 2019 Oi 190058
No ratings yet
Jamainternal Budhathoki 2019 Oi 190058
10 pages
Submitted By: Parth Saraogi (18scse1010348) QUES 46
No ratings yet
Submitted By: Parth Saraogi (18scse1010348) QUES 46
9 pages
Conflicts in Organizations Causes and Consequences
No ratings yet
Conflicts in Organizations Causes and Consequences
6 pages
Health Dataset
No ratings yet
Health Dataset
2 pages
Swing Grinding Jha
No ratings yet
Swing Grinding Jha
6 pages
Guidelines Rodent Blood Collection
No ratings yet
Guidelines Rodent Blood Collection
2 pages
Microblading Guide
No ratings yet
Microblading Guide
12 pages
Fact Sheet PediatricAPTA EvidenceBasedPractice - 2007
No ratings yet
Fact Sheet PediatricAPTA EvidenceBasedPractice - 2007
3 pages
Terror Casualty Attack
No ratings yet
Terror Casualty Attack
6 pages
Resume Ronel Gamuchirai Murungweni
No ratings yet
Resume Ronel Gamuchirai Murungweni
2 pages
IT Bio F4 Topical Test 4 (BL)
No ratings yet
IT Bio F4 Topical Test 4 (BL)
8 pages
Attiana: Contact Me
No ratings yet
Attiana: Contact Me
1 page
List of International Days-2605j
No ratings yet
List of International Days-2605j
12 pages