Preprocessing1.ipynb - Colab
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler, MinMaxScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, precision_recall_curve, roc_auc_score, classification_report, confusion_matrix
import os
train_path = "/content/train.csv"
test_path = "/content/test.csv"
print(os.path.exists(train_path))
print(os.path.exists(test_path))
True
True
if not os.path.isfile(train_path):
    raise FileNotFoundError(f"Train file not found at {train_path}")
if not os.path.isfile(test_path):
    raise FileNotFoundError(f"Test file not found at {test_path}")
train_df = pd.read_csv(train_path)
test_df = pd.read_csv(test_path)
smoking_status stroke
count 2555 2554
unique 4 4
top never smoked 0
freq 945 2429
mean NaN NaN
std NaN NaN
min NaN NaN
25% NaN NaN
50% NaN NaN
75% NaN NaN
max NaN NaN
Residence_type 0
avg_glucose_level 0
bmi 0
smoking_status 0
stroke 1
dtype: int64
Since there are no negative values in the data, code to handle negative values would be redundant.
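A quick sanity check confirms this; a minimal sketch (the column names are taken from this dataset):

# Confirm there are no negative values in the numeric columns
for col in ["age", "avg_glucose_level", "bmi"]:
    print(col, (train_df[col] < 0).sum())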
Handling Outliers
For age, I applied the Interquartile Range (IQR) method to remove extreme values. The rationale is that extremely high ages
might be biologically unrealistic and could distort the model’s learning process. The IQR method effectively identifies and
removes such extreme values while preserving most of the data distribution.
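The filtering cell itself is not visible in this export; a minimal sketch of the approach, assuming the conventional 1.5×IQR cutoff:

# IQR-based removal of extreme ages (the 1.5*IQR multiplier is the conventional choice, assumed here)
q1, q3 = train_df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
train_df = train_df[train_df["age"].between(lower, upper)]  # rows dropped from train only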
For BMI, I used Winsorization (capping at the 1st and 99th percentiles) instead of removing outliers. Since BMI naturally varies
among individuals, especially in medical datasets, completely removing high or low values could lead to loss of important
information. Instead, capping prevents extreme values from dominating the model while retaining valuable patterns.
For average glucose level, I applied a log transformation to address the right-skewed distribution observed in the data. This
transformation helps stabilize variance and ensures that large glucose values do not disproportionately affect the model. Unlike
outright removal, log transformation allows the model to learn from high glucose levels while mitigating their impact.
These methods ensure that we retain critical medical data while improving model robustness and preventing outliers from
biasing the predictions.
# Capping extreme BMI values using Winsorization (1st and 99th percentile)
bmi_lower_cap = train_df['bmi'].quantile(0.01)
bmi_upper_cap = train_df['bmi'].quantile(0.99)
train_df['bmi'] = np.clip(train_df['bmi'], bmi_lower_cap, bmi_upper_cap)
test_df['bmi'] = np.clip(test_df['bmi'], bmi_lower_cap, bmi_upper_cap)
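The glucose transformation cell is likewise not shown; a minimal sketch, assuming np.log1p (log(1 + x), which is well defined at zero) as the variant:

# Log-transform avg_glucose_level to reduce right skew; the exact transform used is an assumption
train_df["avg_glucose_level"] = np.log1p(train_df["avg_glucose_level"])
test_df["avg_glucose_level"] = np.log1p(test_df["avg_glucose_level"])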
I chose to keep the low-age children in the dataset because they represent a valid demographic group that could still be at risk
of stroke, especially due to congenital conditions. Instead of removing them, I verified their presence and ensured that extremely
small, likely erroneous values were corrected, preserving the integrity of medically relevant data.
Handling Abnormal Categorical Values: I standardized the gender column by converting all values to lowercase to prevent duplicate categories (e.g., "Male" and "male" being treated separately). Additionally, I replaced "other" with the most frequent gender in the dataset. This ensures consistency and prevents issues during encoding while maintaining the integrity of the data.
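The cleaning cell does not appear in the export; a minimal sketch of the two steps described above:

# Lowercase gender to merge case variants, then map the rare "other" value to the modal gender
train_df["gender"] = train_df["gender"].str.lower()
test_df["gender"] = test_df["gender"].str.lower()
most_frequent = train_df["gender"].mode()[0]  # mode computed on train only
train_df["gender"] = train_df["gender"].replace("other", most_frequent)
test_df["gender"] = test_df["gender"].replace("other", most_frequent)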
# Checking Unique Values and Their Frequencies for Each Categorical Column
for feature in cat_features:
    print(f"{feature} unique values count: {train_df[feature].nunique()}")
    print(f"{feature} unique values: {list(train_df[feature].unique())}")
    print(f"{feature} value counts:\n{train_df[feature].value_counts()}\n")
gender value counts:
gender
female 1491
male 1060
Name: count, dtype: int64
I chose to retain the "Unknown" category in smoking_status because it represents a significant proportion of the dataset (759
instances) and removing or replacing it could introduce bias. By keeping it as a separate category, the model can learn patterns
from individuals with missing smoking data rather than making incorrect assumptions about their smoking habits.
The stroke rates for males (5.00%) and females (4.76%) are very close, indicating that there is no significant gender bias in
stroke occurrence based on this dataset.
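Those rates can be reproduced with a one-line groupby (a sketch; it assumes stroke is numeric at this point, as it is cast to int later in the notebook):

# Stroke rate per gender, in percent
print(train_df.groupby("gender")["stroke"].mean().mul(100).round(2))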
Since there are no duplicate rows, code to remove duplicates would be redundant.
plt.figure(figsize=(10,6))
sns.heatmap(numeric_cols.corr(), annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Feature Correlation Matrix")
plt.show()
[Figure: Feature Correlation Matrix heatmap]
# Evaluate model ('model' is the classifier fitted in an earlier cell not visible in this export)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
Feature Importance
1  hypertension       0.654611
3  avg_glucose_level  0.379540
2  heart_disease      0.220947
0  age                0.095052
4  bmi                0.004861
I scaled the two numerical features, age and BMI, because they have different magnitudes, and logistic regression performs better when features are on a similar scale. Standardizing these variables ensures that no single feature dominates the model, improving optimization and stability during training.
scaler = StandardScaler()
scaled_train = pd.DataFrame(scaler.fit_transform(train_df[num_features]), columns=num_features)
scaled_test = pd.DataFrame(scaler.transform(test_df[num_features]), columns=num_features)
smoking_status_smokes
0 1.0
1 0.0
2 0.0
3 0.0
4 0.0
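The encoding cell that produced these columns is not visible in the export; a minimal sketch using pd.get_dummies (OneHotEncoder from the imports would work equally well), with cat_features as defined earlier:

# One-hot encode the categorical columns; "Unknown" stays as its own smoking_status dummy
train_encoded = pd.get_dummies(train_df[cat_features], dtype=float)
test_encoded = pd.get_dummies(test_df[cat_features], dtype=float)
# Align test to the train columns so both frames share the same dummy set
test_encoded = test_encoded.reindex(columns=train_encoded.columns, fill_value=0.0)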
y_train_final = train_df["stroke"].astype(int)
# Train-Test Split
X_train, X_val, y_train, y_val = train_test_split(
    X_train_final, y_train_final, test_size=0.2, random_state=42, stratify=y_train_final
)
print("Preprocessing complete.")
Preprocessing complete.
smoking_status stroke
count 2551 2551.000000
unique 4 NaN
top never smoked NaN
freq 944 NaN
mean NaN 0.048608
std NaN 0.215090
min NaN 0.000000
25% NaN 0.000000
50% NaN 0.000000
75% NaN 0.000000
max NaN 1.000000
# Predictions
y_pred = model.predict(X_val)
y_probs = model.predict_proba(X_val)[:, 1]
# Evaluation Metrics
auc_score = roc_auc_score(y_val, y_probs)
f_beta = fbeta_score(y_val, y_pred, beta=10)
class_report = classification_report(y_val, y_pred)
conf_matrix = confusion_matrix(y_val, y_pred)
# Display Metrics
print(f"AUC Score: {auc_score}")
print(f"F-beta Score (β=10): {f_beta}")
print("Classification Report:")
print(class_report)
print("Confusion Matrix:")
print(conf_matrix)
AUC Score: 0.8042798353909465
F-beta Score (β=10): 0.688115064345193
Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.74      0.85       486
           1       0.13      0.72      0.22        25

    accuracy                           0.74       511
   macro avg       0.55      0.73      0.53       511
weighted avg       0.94      0.74      0.82       511
Confusion Matrix:
[[362 124]
[ 7 18]]
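The reported F-beta checks out against the confusion matrix. With β = 10 the score weights recall β² = 100 times more than precision, which suits stroke screening, where false negatives are the costly error:

# F-beta from the confusion matrix: precision = 18/142, recall = 18/25
p, r, beta = 18 / 142, 18 / 25, 10
print((1 + beta**2) * p * r / (beta**2 * p + r))  # ~0.6881, matching the score above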