Aids

- The document analyzes data from the AIDS Clinical Trials Group Study 175, which tested the effectiveness of antiretroviral treatment regimens for HIV/AIDS.
- The data is read into a Pandas dataframe with 2,139 rows and 25 columns containing patient information such as treatment assignment, demographics, and CD4 cell counts over time.
- The dataframe is explored through methods like .info() and .describe(), and its features are classified as categorical, discrete, or continuous. Feature values are also printed.

Uploaded by

Carlos Huerta Salas

AIDS_Clinical_Trials_Group_Study_175 Analysis

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv("aids.csv")

df.head()

   Unnamed: 0  time  trt  age     wtkg  hemo  homo  drugs  karnof  oprior  \
0           0   948    2   48  89.8128     0     0      0     100       0
1           1  1002    3   61  49.4424     0     0      0      90       0
2           2   961    3   45  88.4520     0     1      1      90       0
3           3  1166    3   47  85.2768     0     1      0     100       0
4           4  1090    0   43  66.6792     0     1      0     100       0

   ...  str2  strat  symptom  treat  offtrt  cd40  cd420  cd80  cd820  cid
0  ...     0      1        0      1       0   422    477   566    324    0
1  ...     1      3        0      1       0   162    218   392    564    1
2  ...     1      3        0      1       1   326    274  2063   1893    0
3  ...     1      3        0      1       0   287    394  1590    966    0
4  ...     1      3        0      0       0   504    353   870    782    0

[5 rows x 25 columns]

df.tail()

      Unnamed: 0  time  trt  age      wtkg  hemo  homo  drugs  karnof  oprior  \
2134        2134  1091    3   21   53.2980     1     0      0     100       0
2135        2135   395    0   17  102.9672     1     0      0     100       0
2136        2136  1104    2   53   69.8544     1     1      0      90       0
2137        2137   465    0   14   60.0000     1     0      0     100       0
2138        2138  1045    3   45   77.3000     1     0      0     100       0

      ...  str2  strat  symptom  treat  offtrt  cd40  cd420  cd80  cd820  cid
2134  ...     1      3        0      1       1   152    109   561    720    0
2135  ...     1      3        0      0       1   373    218  1759   1030    0
2136  ...     1      3        0      1       0   419    364  1391   1041    0
2137  ...     0      1        0      0       0   166    169   999   1838    1
2138  ...     0      1        0      1       0   911    930   885    526    0

[5 rows x 25 columns]

df.shape

(2139, 25)

df.columns

Index(['Unnamed: 0', 'time', 'trt', 'age', 'wtkg', 'hemo', 'homo', 'drugs',
       'karnof', 'oprior', 'z30', 'zprior', 'preanti', 'race', 'gender',
       'str2', 'strat', 'symptom', 'treat', 'offtrt', 'cd40', 'cd420', 'cd80',
       'cd820', 'cid'],
      dtype='object')

df.duplicated().sum()

df.isnull().sum()

Unnamed: 0 0
time 0
trt 0
age 0
wtkg 0
hemo 0
homo 0
drugs 0
karnof 0
oprior 0
z30 0
zprior 0
preanti 0
race 0
gender 0
str2 0
strat 0
symptom 0
treat 0
offtrt 0
cd40 0
cd420 0
cd80 0
cd820 0
cid 0
dtype: int64

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2139 entries, 0 to 2138
Data columns (total 25 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Unnamed: 0  2139 non-null   int64
 1   time        2139 non-null   int64
 2   trt         2139 non-null   int64
 3   age         2139 non-null   int64
 4   wtkg        2139 non-null   float64
 5   hemo        2139 non-null   int64
 6   homo        2139 non-null   int64
 7   drugs       2139 non-null   int64
 8   karnof      2139 non-null   int64
 9   oprior      2139 non-null   int64
 10  z30         2139 non-null   int64
 11  zprior      2139 non-null   int64
 12  preanti     2139 non-null   int64
 13  race        2139 non-null   int64
 14  gender      2139 non-null   int64
 15  str2        2139 non-null   int64
 16  strat       2139 non-null   int64
 17  symptom     2139 non-null   int64
 18  treat       2139 non-null   int64
 19  offtrt      2139 non-null   int64
 20  cd40        2139 non-null   int64
 21  cd420       2139 non-null   int64
 22  cd80        2139 non-null   int64
 23  cd820       2139 non-null   int64
 24  cid         2139 non-null   int64
dtypes: float64(1), int64(24)
memory usage: 417.9 KB

df.describe()

        Unnamed: 0         time          trt          age         wtkg  \
count  2139.000000  2139.000000  2139.000000  2139.000000  2139.000000
mean   1069.000000   879.098177     1.520804    35.248247    75.125311
std     617.620434   292.274324     1.127890     8.709026    13.263164
min       0.000000    14.000000     0.000000    12.000000    31.000000
25%     534.500000   727.000000     1.000000    29.000000    66.679200
50%    1069.000000   997.000000     2.000000    34.000000    74.390400
75%    1603.500000  1091.000000     3.000000    40.000000    82.555200
max    2138.000000  1231.000000     3.000000    70.000000   159.939360

              hemo         homo        drugs       karnof       oprior  ...  \
count  2139.000000  2139.000000  2139.000000  2139.000000  2139.000000  ...
mean      0.084151     0.661057     0.131370    95.446470     0.021973  ...
std       0.277680     0.473461     0.337883     5.900985     0.146629  ...
min       0.000000     0.000000     0.000000    70.000000     0.000000  ...
25%       0.000000     0.000000     0.000000    90.000000     0.000000  ...
50%       0.000000     1.000000     0.000000   100.000000     0.000000  ...
75%       0.000000     1.000000     0.000000   100.000000     0.000000  ...
max       1.000000     1.000000     1.000000   100.000000     1.000000  ...

              str2        strat      symptom        treat       offtrt  \
count  2139.000000  2139.000000  2139.000000  2139.000000  2139.000000
mean      0.585788     1.979897     0.172978     0.751286     0.362786
std       0.492701     0.899053     0.378317     0.432369     0.480916
min       0.000000     1.000000     0.000000     0.000000     0.000000
25%       0.000000     1.000000     0.000000     1.000000     0.000000
50%       1.000000     2.000000     0.000000     1.000000     0.000000
75%       1.000000     3.000000     0.000000     1.000000     1.000000
max       1.000000     3.000000     1.000000     1.000000     1.000000

              cd40        cd420         cd80        cd820          cid
count  2139.000000  2139.000000  2139.000000  2139.000000  2139.000000
mean    350.501169   371.307153   986.627396   935.369799     0.243572
std     118.573863   144.634909   480.197750   444.976051     0.429338
min       0.000000    49.000000    40.000000   124.000000     0.000000
25%     263.500000   269.000000   654.000000   631.500000     0.000000
50%     340.000000   353.000000   893.000000   865.000000     0.000000
75%     423.000000   460.000000  1207.000000  1146.500000     0.000000
max    1199.000000  1119.000000  5011.000000  6035.000000     1.000000

[8 rows x 25 columns]

df = df.drop(['Unnamed: 0', 'cid'], axis = 1)

object_columns = df.select_dtypes(include=['object', 'bool']).columns

print("Object type columns:")
print(object_columns)

numerical_columns = df.select_dtypes(include=['int64',
'float64']).columns
print("\nNumerical type columns:")
print(numerical_columns)

Object type columns:
Index([], dtype='object')

Numerical type columns:
Index(['time', 'trt', 'age', 'wtkg', 'hemo', 'homo', 'drugs', 'karnof',
       'oprior', 'z30', 'zprior', 'preanti', 'race', 'gender', 'str2', 'strat',
       'symptom', 'treat', 'offtrt', 'cd40', 'cd420', 'cd80', 'cd820'],
      dtype='object')

def classify_features(df):
    # Buckets for the four feature types
    categorical_features = []
    non_categorical_features = []
    discrete_features = []
    continuous_features = []

    for column in df.columns:
        if df[column].dtype in ['object', 'bool']:
            # Text/boolean column: few distinct values means categorical
            if df[column].nunique() < 15:
                categorical_features.append(column)
            else:
                non_categorical_features.append(column)
        elif df[column].dtype in ['int64', 'float64']:
            # Numeric column: few distinct values means discrete
            if df[column].nunique() < 10:
                discrete_features.append(column)
            else:
                continuous_features.append(column)

    return (categorical_features, non_categorical_features,
            discrete_features, continuous_features)

categorical, non_categorical, discrete, continuous = classify_features(df)
print("Categorical Features:", categorical)
print("Non-Categorical Features:", non_categorical)
print("Discrete Features:", discrete)
print("Continuous Features:", continuous)

Categorical Features: []
Non-Categorical Features: []
Discrete Features: ['trt', 'hemo', 'homo', 'drugs', 'karnof',
'oprior', 'z30', 'zprior', 'race', 'gender', 'str2', 'strat',
'symptom', 'treat', 'offtrt']
Continuous Features: ['time', 'age', 'wtkg', 'preanti', 'cd40',
'cd420', 'cd80', 'cd820']
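
The numeric branch of classify_features amounts to a cardinality cutoff: a column with fewer than 10 distinct values is treated as discrete, anything else as continuous. A minimal sketch of the same rule computed in one pass with DataFrame.nunique() is shown here; the helper name split_by_cardinality and the toy frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def split_by_cardinality(frame, max_levels=10):
    """Hypothetical helper: split columns by number of distinct values."""
    counts = frame.nunique()  # distinct values per column
    discrete_cols = counts[counts < max_levels].index.tolist()
    continuous_cols = counts[counts >= max_levels].index.tolist()
    return discrete_cols, continuous_cols

# Toy stand-in for the trial data (values are made up for illustration)
toy = pd.DataFrame({
    "trt": [0, 1, 2, 3] * 5,        # 4 distinct values -> discrete
    "age": list(range(20, 40)),     # 20 distinct values -> continuous
})
discrete_cols, continuous_cols = split_by_cardinality(toy)
print("Discrete:", discrete_cols)
print("Continuous:", continuous_cols)
```

The cutoff is a heuristic: an integer count such as preanti ends up in the continuous group simply because it takes many distinct values.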

for i in discrete:
    print(i, ':')
    print(df[i].unique())
    print()

trt :
[2 3 0 1]

hemo :
[0 1]

homo :
[0 1]

drugs :
[0 1]

karnof :
[100 90 80 70]

oprior :
[0 1]

z30 :
[0 1]

zprior :
[1]

race :
[0 1]

gender :
[0 1]

str2 :
[0 1]

strat :
[1 3 2]

symptom :
[0 1]

treat :
[1 0]

offtrt :
[0 1]

for i in discrete:
    print(i, ':')
    print(df[i].value_counts())
    print()

trt :
3 561
0 532
2 524
1 522
Name: trt, dtype: int64

hemo :
0 1959
1 180
Name: hemo, dtype: int64

homo :
1 1414
0 725
Name: homo, dtype: int64

drugs :
0 1858
1 281
Name: drugs, dtype: int64

karnof :
100 1263
90 787
80 80
70 9
Name: karnof, dtype: int64

oprior :
0 2092
1 47
Name: oprior, dtype: int64

z30 :
1 1177
0 962
Name: z30, dtype: int64

zprior :
1 2139
Name: zprior, dtype: int64

race :
0 1522
1 617
Name: race, dtype: int64

gender :
1 1771
0 368
Name: gender, dtype: int64

str2 :
1 1253
0 886
Name: str2, dtype: int64

strat :
1 886
3 843
2 410
Name: strat, dtype: int64

symptom :
0 1769
1 370
Name: symptom, dtype: int64

treat :
1 1607
0 532
Name: treat, dtype: int64

offtrt :
0 1363
1 776
Name: offtrt, dtype: int64

# Count plot for each discrete feature
for i in discrete:
    plt.figure(figsize=(15, 6))
    sns.countplot(x=i, data=df, palette='hls')
    plt.show()

# Pie chart of value shares for each discrete feature
for i in discrete:
    plt.figure(figsize=(20, 10))
    plt.pie(df[i].value_counts(), labels=df[i].value_counts().index,
            autopct='%1.1f%%', textprops={'fontsize': 15,
                                          'color': 'black',
                                          'weight': 'bold',
                                          'family': 'serif'})
    hfont = {'fontname': 'serif', 'weight': 'bold'}
    plt.title(i, size=20, **hfont)
    plt.show()

# Histogram with a KDE overlay for each continuous feature
# (palette has no effect without a hue variable, so it is omitted)
for i in continuous:
    plt.figure(figsize=(15, 6))
    sns.histplot(df[i], bins=20, kde=True)
    plt.xticks(rotation=90)
    plt.show()

# Same view via distplot, which seaborn has deprecated in favor of histplot/displot
for i in continuous:
    plt.figure(figsize=(15, 6))
    sns.distplot(df[i], bins=20, kde=True)
    plt.xticks(rotation=90)
    plt.show()

# Box plot for each continuous feature (column passed by keyword)
for i in continuous:
    plt.figure(figsize=(15, 6))
    sns.boxplot(x=i, data=df, palette='hls')
    plt.xticks(rotation=90)
    plt.show()

# Violin plot for each continuous feature
for i in continuous:
    plt.figure(figsize=(15, 6))
    sns.violinplot(x=i, data=df, palette='hls')
    plt.xticks(rotation=90)
    plt.show()

# Scatter plot for every ordered pair of continuous features
for i in continuous:
    for j in continuous:
        if i != j:
            plt.figure(figsize=(15, 6))
            sns.scatterplot(x=i, y=j, data=df)
            plt.xticks(rotation=90)
            plt.show()

# Line plot for every ordered pair of continuous features
# (ci=None turns off the confidence band; newer seaborn spells this errorbar=None)
for i in continuous:
    for j in continuous:
        if i != j:
            plt.figure(figsize=(15, 6))
            sns.lineplot(x=i, y=j, data=df, ci=None)
            plt.xticks(rotation=90)
            plt.show()

correlation_matrix = df.corr()

correlation_matrix

             time       trt       age      wtkg      hemo      homo     drugs  \
time     1.000000  0.101482  0.026544  0.009225 -0.017501  0.043430 -0.021856
trt      0.101482  1.000000 -0.001931 -0.031685  0.012329  0.025035  0.005712
age      0.026544 -0.001931  1.000000  0.132858 -0.231257  0.158917  0.077446
wtkg     0.009225 -0.031685  0.132858  1.000000 -0.075791  0.155909  0.002343
hemo    -0.017501  0.012329 -0.231257 -0.075791  1.000000 -0.391307 -0.092957
homo     0.043430  0.025035  0.158917  0.155909 -0.391307  1.000000 -0.206876
drugs   -0.021856  0.005712  0.077446  0.002343 -0.092957 -0.206876  1.000000
karnof   0.094417 -0.014573 -0.100041  0.034271  0.068403 -0.042072 -0.084558
oprior  -0.016116 -0.026805  0.056161  0.009607  0.034978  0.019743 -0.029968
z30      0.012898 -0.001656  0.061178 -0.073841  0.111554 -0.049760  0.014961
zprior        NaN       NaN       NaN       NaN       NaN       NaN       NaN
preanti  0.007249  0.006710  0.113220 -0.079292  0.113892  0.014132 -0.029981
race    -0.051276  0.017080 -0.097678 -0.081452 -0.070333 -0.307108  0.082311
gender   0.020810  0.022691  0.048705  0.240013  0.115867  0.607820 -0.141748
str2     0.010098 -0.003003  0.068230 -0.078885  0.124983 -0.036700  0.001106
strat    0.022033 -0.003508  0.089884 -0.080458  0.141674 -0.022608 -0.011319
symptom -0.104611 -0.000765  0.032814  0.003942 -0.076296  0.118575  0.027052
treat    0.153314  0.775990  0.001499 -0.040638  0.010786  0.024407  0.022055
offtrt  -0.475795 -0.043239 -0.057695 -0.003159  0.005949 -0.045151  0.098031
cd40     0.191436 -0.012770 -0.040302  0.036401 -0.022533  0.000511 -0.003360
cd420    0.350611  0.064448 -0.044294  0.020980 -0.065838  0.019915  0.013109
cd80    -0.017425 -0.015665  0.046874  0.090075 -0.037273  0.086028  0.014900
cd820    0.032480 -0.004595  0.037458  0.085447 -0.058392  0.082284  0.025728

           karnof    oprior       z30  ...    gender      str2     strat  \
time     0.094417 -0.016116  0.012898  ...  0.020810  0.010098  0.022033
trt     -0.014573 -0.026805 -0.001656  ...  0.022691 -0.003003 -0.003508
age     -0.100041  0.056161  0.061178  ...  0.048705  0.068230  0.089884
wtkg     0.034271  0.009607 -0.073841  ...  0.240013 -0.078885 -0.080458
hemo     0.068403  0.034978  0.111554  ...  0.115867  0.124983  0.141674
homo    -0.042072  0.019743 -0.049760  ...  0.607820 -0.036700 -0.022608
drugs   -0.084558 -0.029968  0.014961  ... -0.141748  0.001106 -0.011319
karnof   1.000000 -0.057291 -0.074947  ... -0.011695 -0.085975 -0.055172
oprior  -0.057291  1.000000 -0.037580  ...  0.042976  0.126040  0.134629
z30     -0.074947 -0.037580  1.000000  ... -0.036119  0.903417  0.848624
zprior        NaN       NaN       NaN  ...       NaN       NaN       NaN
preanti -0.023189  0.067082  0.655054  ...  0.032099  0.680354  0.833213
race     0.026155 -0.003923 -0.073658  ... -0.292146 -0.080510 -0.106307
gender  -0.011695  0.042976 -0.036119  ...  1.000000 -0.031258  0.003586
str2    -0.085975  0.126040  0.903417  ... -0.031258  1.000000  0.916723
strat   -0.055172  0.134629  0.848624  ...  0.003586  0.916723  1.000000
symptom -0.107940  0.024199  0.020883  ...  0.064373  0.030760  0.041857
treat    0.001379 -0.031801  0.003776  ...  0.024280  0.005794 -0.000836
offtrt  -0.103251  0.019561 -0.029318  ... -0.019309 -0.026789 -0.051276
cd40     0.077730 -0.059199 -0.121282  ... -0.030423 -0.124566 -0.121317
cd420    0.098463 -0.109643 -0.200149  ... -0.023369 -0.216457 -0.206306
cd80    -0.008567 -0.019247  0.029346  ...  0.087233  0.009576  0.032360
cd820   -0.003981 -0.036577  0.018454  ...  0.087572  0.012055  0.021257

          symptom     treat    offtrt      cd40     cd420      cd80     cd820
time    -0.104611  0.153314 -0.475795  0.191436  0.350611 -0.017425  0.032480
trt     -0.000765  0.775990 -0.043239 -0.012770  0.064448 -0.015665 -0.004595
age      0.032814  0.001499 -0.057695 -0.040302 -0.044294  0.046874  0.037458
wtkg     0.003942 -0.040638 -0.003159  0.036401  0.020980  0.090075  0.085447
hemo    -0.076296  0.010786  0.005949 -0.022533 -0.065838 -0.037273 -0.058392
homo     0.118575  0.024407 -0.045151  0.000511  0.019915  0.086028  0.082284
drugs    0.027052  0.022055  0.098031 -0.003360  0.013109  0.014900  0.025728
karnof  -0.107940  0.001379 -0.103251  0.077730  0.098463 -0.008567 -0.003981
oprior   0.024199 -0.031801  0.019561 -0.059199 -0.109643 -0.019247 -0.036577
z30      0.020883  0.003776 -0.029318 -0.121282 -0.200149  0.029346  0.018454
zprior        NaN       NaN       NaN       NaN       NaN       NaN       NaN
preanti  0.012304  0.005682 -0.042379 -0.067495 -0.132213  0.037500  0.023221
race    -0.078378 -0.006071  0.004638 -0.001290 -0.035935  0.006930  0.009981
gender   0.064373  0.024280 -0.019309 -0.030423 -0.023369  0.087233  0.087572
str2     0.030760  0.005794 -0.026789 -0.124566 -0.216457  0.009576  0.012055
strat    0.041857 -0.000836 -0.051276 -0.121317 -0.206306  0.032360  0.021257
symptom  1.000000  0.008648  0.071388 -0.131006 -0.124883  0.035311  0.049254
treat    0.008648  1.000000 -0.051731 -0.013123  0.139934 -0.000746  0.009255
offtrt   0.071388 -0.051731  1.000000 -0.145311 -0.196474 -0.033651 -0.024180
cd40    -0.131006 -0.013123 -0.145311  1.000000  0.583578  0.214274  0.073039
cd420   -0.124883  0.139934 -0.196474  0.583578  1.000000  0.054165  0.216472
cd80     0.035311 -0.000746 -0.033651  0.214274  0.054165  1.000000  0.756218
cd820    0.049254  0.009255 -0.024180  0.073039  0.216472  0.756218  1.000000

[23 rows x 23 columns]
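
The all-NaN zprior row follows from the data rather than from a bug: the value counts show zprior equal to 1 for all 2,139 patients, so its standard deviation is zero and the Pearson correlation is undefined. A minimal sketch, on a made-up toy frame that only borrows the column names, of spotting constant columns before calling corr():

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative values); zprior is constant as in the trial data
toy = pd.DataFrame({
    "zprior": [1, 1, 1, 1, 1],
    "cd40": [422, 162, 326, 287, 504],
    "cd420": [477, 218, 274, 394, 353],
})

# A column with a single distinct value has zero variance
constant_cols = [c for c in toy.columns if toy[c].nunique() <= 1]

corr = toy.corr()                                      # zprior entries are NaN
cleaned_corr = toy.drop(columns=constant_cols).corr()  # no NaN left

print("Constant columns:", constant_cols)
print(cleaned_corr)
```

Whether to drop such a column outright is a modelling choice; the sketch only removes it from the correlation view.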

plt.figure(figsize=(20, 10))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.show()

threshold = 0.75
correlation_pairs = set()

# Collect every feature pair whose absolute correlation exceeds the threshold
for i in range(len(correlation_matrix.columns)):
    for j in range(i):
        if abs(correlation_matrix.iloc[i, j]) > threshold:
            colname_i = correlation_matrix.columns[i]
            colname_j = correlation_matrix.columns[j]
            correlation_pairs.add((colname_i, colname_j))

# From each pair, drop the feature with the smaller variance;
# pairs that already lost a member are skipped
for pair in correlation_pairs:
    feature1, feature2 = pair
    if feature1 in df.columns and feature2 in df.columns:
        if df[feature1].var() > df[feature2].var():
            df.drop(feature2, axis=1, inplace=True)
        else:
            df.drop(feature1, axis=1, inplace=True)

correlation_matrix_after_drop = df.corr()

plt.figure(figsize=(12, 10))
sns.heatmap(correlation_matrix_after_drop, annot=True,
            cmap='coolwarm', fmt=".2f")
plt.title('Feature Correlation Matrix After Dropping Highly Correlated Features')
plt.show()

print("Remaining Features:")
print(df.columns)

Remaining Features:
Index(['time', 'trt', 'age', 'wtkg', 'hemo', 'homo', 'drugs', 'karnof',
       'oprior', 'zprior', 'preanti', 'race', 'gender', 'symptom', 'offtrt',
       'cd40', 'cd420', 'cd80'],
dtype='object')
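
Because the pairs are stored in a set, the order in which they are visited is not guaranteed, and overlapping pairs (z30, str2, strat and preanti all correlate with each other above 0.75) can leave different survivors depending on that order. A minimal sketch of a deterministic variant, assuming the same lower-variance rule, that visits pairs from strongest to weakest correlation; drop_correlated and the toy data are hypothetical and not the notebook's method.

```python
import pandas as pd

def drop_correlated(frame, threshold=0.75):
    """Hypothetical helper: deterministic version of the pairwise drop."""
    corr = frame.corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if r > threshold:          # NaN comparisons are False, so NaN pairs are skipped
                pairs.append((r, a, b))
    pairs.sort(key=lambda p: -p[0])    # strongest pairs first; stable for ties
    variances = frame.var()
    dropped = []
    for _, a, b in pairs:
        if a in dropped or b in dropped:
            continue                   # pair already resolved by an earlier drop
        dropped.append(a if variances[a] < variances[b] else b)
    return frame.drop(columns=dropped), dropped

# Toy data: y is roughly 2*x, z is only weakly related to either
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6, 7, 8],
    "y": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2],
    "z": [5, 1, 4, 2, 8, 3, 7, 6],
})
reduced, dropped = drop_correlated(toy)
print("Dropped:", dropped)
print("Remaining:", list(reduced.columns))
```

Sorting by correlation strength is one possible tie-breaking policy; any fixed ordering would make the result reproducible.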

features = ['age', 'hemo', 'cd40']

target = 'time'

fig = plt.figure(figsize=(10, 8))

ax = fig.add_subplot(111, projection='3d')
ax.scatter(df[features[0]], df[features[1]], df[features[2]],
c=df[target], cmap='viridis', marker='o')
ax.set_xlabel(features[0])
ax.set_ylabel(features[1])
ax.set_zlabel(features[2])
ax.set_title(f'3D Scatter Plot: {features[0]}, {features[1]}, '
             f'{features[2]} vs. {target}')

plt.show()

scatter_features = ['age', 'cd40', 'wtkg']

sns.pairplot(df, x_vars=scatter_features, y_vars='time', height=4)

plt.suptitle('Scatter Plots of Features vs. Time', y=1.02)
plt.show()

surface_features = ['age', 'hemo', 'cd40']

fig = plt.figure(figsize=(10, 8))

ax = fig.add_subplot(111, projection='3d')
ax.plot_trisurf(df[surface_features[0]], df[surface_features[1]],
df[surface_features[2]], cmap='viridis', linewidth=0.2)
ax.set_xlabel(surface_features[0])
ax.set_ylabel(surface_features[1])
ax.set_zlabel(surface_features[2])
# The surface shows only the three listed features; time is not plotted here
ax.set_title(f'3D Surface Plot: {surface_features[0]}, '
             f'{surface_features[1]}, {surface_features[2]}')

plt.show()

Thanks!
