0% found this document useful (0 votes)
17 views58 pages

Assighment3 4 AI Projecct

Uploaded by

matiullah.matvi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views58 pages

Assighment3 4 AI Projecct

Uploaded by

matiullah.matvi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 58

ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

Total Marks: 0
4

Obtained Marks:

Artificial Intelligence
Assignment # 03 & 04

Submitted To: Mr. Atta ur Rehman

Student Name: Mati Ullah, Muhammad Sharyar, Ahmed Husain

Reg Number: 2180258 , 2180220 , 2180259

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

Employee Attrition
Prediction using Machine
Learning
1. Project Overview
The purpose of this project is to predict employee attrition (turnover) in a hospital organization based on various
factors such as demographic characteristics, job-related attributes, and performance metrics. By predicting
employee attrition, hospitals can take proactive steps to retain valuable employees and reduce turnover-related
costs.
This document covers the following aspects:
- Data Collection and Preprocessing
- Exploratory Data Analysis (EDA)
- Statistical Analysis
- Machine Learning Model Implementation
- Model Evaluation
- Results Interpretation and Conclusion
2. Problem Statement
Employee attrition is a common issue faced by organizations across industries. High turnover rates can lead to
increased recruitment costs, loss of experienced employees, and decreased overall productivity. Understanding the
factors that contribute to employee attrition can help management take preventive actions.
Our objective is to use machine learning techniques to predict which employees are likely to leave the
organization, based on their personal characteristics, job-related factors, and performance metrics.
3. Data Collection
The dataset used for this project contains information on hospital employees, including demographic and job-
related features. It includes attributes such as:
- Age
- Gender
- Department
- Monthly Salary
- Distance from Home
- Education
- Job Role
- Marital Status

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
- Business Travel
- Performance Rating
- Work Experience
- Attrition Status (Target Variable: 1 = Left, 0 = Stayed)
This dataset was loaded from a CSV file (`hospital_employee_data.csv`).
4. Data Preprocessing

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
Before applying machine learning models, the data was cleaned and preprocessed:
1. Missing Values: Missing values were handled by filling them with the median for numerical columns and
the mode for categorical columns.
2. Feature Encoding: Categorical variables such as department, gender, and marital status were encoded
using one-hot encoding to convert them into numerical form.
3. Feature Scaling: StandardScaler was applied to scale the features because Support Vector Machines
(SVMs) are sensitive to feature scaling.
4. Data Splitting: The dataset was split into training (80%) and testing (20%) sets.
5. Exploratory Data Analysis (EDA)
To better understand the data, several steps were taken during EDA:
1. Attrition Distribution: Visualized the distribution of the target variable (Attrition). The dataset showed
a relatively balanced attrition rate (employees who left and employees who stayed).
2. Correlation Analysis: The relationships between numerical features were analyzed to understand how
different features correlate with attrition.
3. Categorical Variable Analysis: Visualized the relationship between categorical variables (e.g.,
gender, department, education) and attrition using bar plots and count plots.
Key Insights:
- Employees with higher monthly salaries were less likely to leave.
- Certain departments had a higher attrition rate than others.
- Employees with low performance ratings showed a higher likelihood of leaving.
6. Statistical Analysis
6.1 ANOVA Testing
ANOVA (Analysis of Variance) was used to analyze the relationship between various continuous variables and
employee attrition. The following factors were tested:
- Age
- Monthly Salary
- Work Experience
- Performance Rating
ANOVA tests showed that:
- Performance Rating significantly affected employee attrition, with lower performance ratings correlating
with higher attrition rates.
- Monthly Salary also had an impact, as higher salaries were linked with lower turnover rates.
6.2 Chi-Square Test
Chi-square tests were conducted for categorical variables to check if there was any significant relationship
between these features and attrition. Variables tested included:
- Gender
- Department
- Marital Status
- Business Travel
Chi-square tests revealed:
- Department: Employees in certain departments had significantly higher attrition rates.
- Business Travel: Employees who traveled more frequently for work had a higher attrition rate compared to
those with less travel.
7. Machine Learning Models
The following machine learning models were implemented to predict employee attrition:
1. Logistic Regression

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
2. Random Forest
3. Support Vector Machine (SVM)

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
7.1 Logistic Regression
Logistic Regression is a simple model used to predict binary outcomes (in this case, whether an employee will
leave or stay). The model was trained using the processed data and evaluated on the test set. It achieved a
moderate accuracy.
7.2 Random Forest
Random Forest is an ensemble learning model that constructs multiple decision trees and combines their results.
It performed better than logistic regression in terms of accuracy and was able to capture complex patterns in the
data. It was evaluated using accuracy, confusion matrix, and classification report.
7.3 Support Vector Machine (SVM)
SVM is a powerful classification algorithm that works well for high-dimensional data and can handle both linear
and non-linear classification problems. In this project:
- Kernel Type: Radial Basis Function (RBF) kernel was used.
- Scaling: Data was scaled using StandardScaler as SVM is sensitive to feature scaling.
Results from SVM:
- Accuracy: 86%
- AUC Score: 0.89 (Excellent performance in distinguishing between employees who will stay and leave)
- ROC Curve: The ROC curve demonstrated a strong ability to discriminate between the two classes (attrition = 1,
non-attrition = 0).
8. Model Evaluation
For the SVM model, the following evaluation metrics were computed:
1. Accuracy: Measures the proportion of correct predictions. SVM achieved an accuracy of 86%.
2. Confusion Matrix: Shows the number of true positives, true negatives, false positives, and false negatives.
3. Classification Report: Provides precision, recall, and F1-score for each class. SVM had:
- Precision: 0.77 for attrition class (indicating good prediction of employees likely to leave).
- Recall: 0.54 for attrition class (indicating the model could correctly identify 54% of employees who left).
4. ROC Curve and AUC: The model achieved an AUC score of 0.89, indicating strong performance
in distinguishing between employees who stay and those who leave.
9. Results Interpretation
The SVM model outperformed Logistic Regression and Random Forest in predicting employee attrition,
achieving an accuracy of 86% and an AUC of 0.89. Key factors influencing attrition included performance rating,
monthly salary, and department. The insights from the model can help hospital management proactively identify
employees at risk of leaving and take action to improve retention strategies.
10. Conclusion
The SVM model outperformed Logistic Regression and Random Forest in predicting employee attrition,
achieving an accuracy of 86% and an AUC of 0.89. Key factors influencing attrition included performance rating,
monthly salary, and department. The insights from the model can help hospital management proactively identify
employees at risk of leaving and take action to improve retention strategies.
11. Future Work
1. Hyperparameter Tuning: Hyperparameters of the SVM model can be further tuned using techniques like
Grid Search or Random Search to improve performance.
2. Additional Features: Incorporating external data such as employee satisfaction surveys or
organizational changes may improve the prediction model.
3. Deep Learning Models: Exploring advanced models like Neural Networks for potentially better performance
in high-dimensional data.

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
To analyze and visualize employee turnover (attrition) rates in a hospital setting based on various factors
(gender, age, business travel, department, etc.), we can follow these steps using a dataset. I'll outline the
process for handling such data and suggest the visualizations that would be useful for each category. We

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
will assume the dataset contains the following columns:

1. Attrition (binary variable indicating if an employee has left or stayed)


2. Gender
3. Age
4. Business Travel (categories like "Frequent", "Rarely", "Non-Travel")
5. Department (e.g., "HR", "Medical", "Support", etc.)
6. Daily Rate
7. Distance From Home
8. Education (level of education, e.g., "High School", "College", etc.)
9. Environment/Culture (work environment factors like "Good", "Average", "Poor")
10. Job Role (e.g., "Nurse", "Doctor", "Administrator")
11. Marital Status (e.g., "Married", "Single", "Divorced")
12. Monthly Salary
13. Work Experience (in years)
14. Performance Rating (a rating score, e.g., "1-5")

Step-by-Step Approach to Data Analysis and Visualization

1. Importing Libraries and Data

Assuming the data is in a CSV file or similar format, we will import necessary Python libraries and load
the data:

python
Copy code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the data


df = pd.read_csv('hospital_employee_data.csv')

2. Preprocessing the Data

Ensure that the data is cleaned and processed for analysis. For example:

● Check for missing values.


● Convert categorical columns (like Gender, Business Travel, etc.) to categorical data types
if needed.
● Encode the 'Attrition' column (Yes=1, No=0) if it is not in a binary format.

python
Copy code
df['Attrition'] = df['Attrition'].map({'Yes': 1, 'No': 0})

3. Visualizations

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
Here’s a breakdown of the visualizations that can be created based on the factors you mentioned:

A. Gender vs Attrition
A simple bar plot can show the turnover rate for each gender.
python
Copy code
sns.countplot(data=df, x='Gender', hue='Attrition')
plt.title('Attrition Rate by Gender')
plt.show()

B. Age vs Attrition
A box plot can show how attrition rates vary across different age groups.

python

Copy code
sns.boxplot(data=df, x='Attrition', y='Age')
plt.title('Attrition Rate by Age')
plt.show()

C. Business Travel vs Attrition


A count plot for business travel categories and their correlation with turnover.
python
Copy code
sns.countplot(data=df, x='BusinessTravel', hue='Attrition')
plt.title('Attrition Rate by Business Travel')
plt.show()

D. Department vs Attrition
A bar plot can show how different departments contribute to the turnover rate.
python
Copy code
sns.countplot(data=df, x='Department', hue='Attrition')
plt.title('Attrition Rate by Department')
plt.show()

E. Daily Wage vs Attrition


A scatter plot or box plot can demonstrate how turnover varies with daily wage rate.
python

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
Copy code
sns.boxplot(data=df, x='Attrition', y='DailyRate')
plt.title('Attrition Rate by Daily Wage')
plt.show()

F. Distance From Home vs Attrition


A box plot can show how attrition is related to the distance employees live from the hospital.

python

Copy code
sns.boxplot(data=df, x='Attrition',
y='DistanceFromHome') plt.title('Attrition Rate by
Distance From Home') plt.show()

G. Education vs Attrition
A bar plot can help visualize how turnover rates change based on education level.

python

Copy code
sns.countplot(data=df, x='Education', hue='Attrition')
plt.title('Attrition Rate by Education Level')
plt.show()

H. Environment/Culture vs Attrition
A bar plot can show the correlation between workplace culture/environment and attrition.
python
Copy code
sns.countplot(data=df, x='EnvironmentCulture', hue='Attrition')
plt.title('Attrition Rate by Work Environment/Culture')
plt.show()

I. Job Role vs Attrition


A bar plot to visualize attrition rates by job role (e.g., doctors, nurses, admin staff).
python
Copy code
sns.countplot(data=df, x='JobRole', hue='Attrition')
plt.title('Attrition Rate by Job Role')
plt.show()

J. Marital Status vs Attrition

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
A bar plot can show how marital status correlates with turnover.

python
Copy code
sns.countplot(data=df, x='MaritalStatus', hue='Attrition')
plt.title('Attrition Rate by Marital Status')
plt.show()

K. Monthly Salary vs Attrition


A box plot can help identify how salary affects attrition rates.
python
Copy code
sns.boxplot(data=df, x='Attrition', y='MonthlySalary')
plt.title('Attrition Rate by Monthly Salary')
plt.show()

L. Work Experience vs Attrition


A scatter plot can illustrate the relationship between work experience (years) and attrition.
python
Copy code
sns.scatterplot(data=df, x='WorkExperience', y='Attrition')
plt.title('Attrition Rate by Work Experience')
plt.show()

M. Performance Rating vs Attrition


A box plot can show how performance ratings influence attrition.

python

Copy code
sns.boxplot(data=df, x='Attrition', y='PerformanceRating')
plt.title('Attrition Rate by Performance Rating')
plt.show()

4. Correlation Heatmap (Optional)


You can also look at correlations between numerical variables using a heatmap.

python

Copy code
corr = df[['Age', 'DailyRate', 'DistanceFromHome', 'MonthlySalary', 'WorkExperience',
'PerformanceRating']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
plt.title('Correlation Heatmap')
plt.show()

5. Summary and Insights

After generating the above visualizations, you can analyze the trends and relationships between various
factors (such as age, gender, salary, department) and employee attrition. For example, if employees with
lower performance ratings or lower salaries have higher attrition, that could be a signal to address these
areas.

Final Note:

The above visualizations can be customized based on the specific structure of the data and your
analysis needs. If you have the dataset available, I can help you directly with the code and
visualizations tailored to it.

4o mini

To perform ANOVA (Analysis of Variance) and Chi-square tests for analyzing the relationship
between various factors (like gender, age, department, etc.) and employee turnover, we need to
understand how each factor relates to turnover, whether it's categorical or continuous, and then perform
the tests accordingly.

Here's how you can perform both ANOVA and Chi-square tests to analyze the factors affecting turnover
in the dataset.

1. ANOVA Testing

ANOVA is typically used to compare the means of continuous variables across multiple groups (such as
departments, gender, etc.) to determine if there is a significant difference in turnover rates.

We'll use one-way ANOVA to test the effect of factors on turnover.

Steps:

1. Check for continuous vs categorical variables: ANOVA is useful when you're comparing a
continuous variable (such as age, salary, or distance from home) with a categorical factor
(such as department or gender).
2. Perform the ANOVA test: For each continuous variable against the turnover (binary), we can
perform ANOVA.

Here’s an example of performing ANOVA on Age, DailyRate, MonthlySalary, DistanceFromHome,


WorkExperience by Attrition.

Code Example for ANOVA:

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
python
Copy code
import scipy.stats as stats

# List of continuous variables to check against attrition


continuous_vars = ['Age', 'DailyRate', 'MonthlySalary', 'DistanceFromHome', 'WorkExperience']

# Perform ANOVA for each variable


for var in continuous_vars:
print(f"ANOVA for {var} vs Attrition:")
group1 = df[df['Attrition'] == 0][var] # Attrition = 0 (Stayed)
group2 = df[df['Attrition'] == 1][var] # Attrition = 1 (Left)

# Perform ANOVA
f_stat, p_value = stats.f_oneway(group1, group2)
print(f"F-statistic: {f_stat:.4f}, p-value: {p_value:.4f}")
# Interpretation
if p_value < 0.05:
print(f"The difference in {var} between attrition groups is statistically significant.\n")
else:
print(f"The difference in {var} between attrition groups is not statistically significant.\n")

Explanation:

● We perform ANOVA for continuous variables like Age, DailyRate, MonthlySalary,


DistanceFromHome, and WorkExperience against the Attrition factor (binary
variable).
● The p-value tells us if there’s a significant difference in these variables across the attrition
groups (0 for "Stayed" and 1 for "Left").
● If the p-value is less than 0.05, the difference is statistically significant, meaning the variable
has an impact on turnover.

2. Chi-Square Test

The Chi-square test is used to test the association between two categorical variables. In our case, we can
check whether categorical variables like Gender, Business Travel, Department, Marital Status,
Education, Job Role, etc. are significantly related to Attrition (whether an employee left or stayed).

Steps:

1. Select categorical variables: We’ll perform Chi-square tests on factors like Gender,
BusinessTravel, Department, MaritalStatus, Education, JobRole, etc.
2. Create contingency tables: The Chi-square test works on contingency tables where the rows
represent categories of one variable (e.g., Gender) and the columns represent categories of
the other variable (Attrition).
3. Perform the Chi-square test: We use scipy.stats.chi2_contingency() to perform the test.

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
Here’s how you can perform a Chi-square test on these categorical variables:

Code Example for Chi-Square:

python
Copy code
# Import Chi-Square test function
from scipy.stats import chi2_contingency

# List of categorical variables to check against attrition


categorical_vars = ['Gender', 'BusinessTravel', 'Department', 'MaritalStatus', 'Education', 'JobRole',
'EnvironmentCulture']

# Perform Chi-square for each categorical


variable for var in categorical_vars:
print(f"Chi-Square for {var} vs Attrition:")

# Create a contingency table


contingency_table = pd.crosstab(df[var], df['Attrition'])

# Perform Chi-square test


chi2_stat, p_val, dof, expected = chi2_contingency(contingency_table)

print(f"Chi-Square statistic: {chi2_stat:.4f}, p-value: {p_val:.4f}")

# Interpretation
if p_val < 0.05:
print(f"The association between {var} and attrition is statistically significant.\n")
else:
print(f"The association between {var} and attrition is not statistically significant.\n")

Explanation:

● Contingency Table: A contingency table for each categorical variable shows the frequency of
occurrences for each combination of factor and attrition (e.g., Gender vs Attrition).
● The p-value will indicate whether there is a significant association between the
categorical variable and turnover (Attrition).
● If the p-value is less than 0.05, it suggests a statistically significant relationship between
the factor and employee turnover.

3. Summary of Analysis

● ANOVA will give us insight into how continuous variables (like age, salary, or work
experience) differ between employees who left and those who stayed.
● Chi-square tests will tell us whether categorical variables (such as gender, department, or
job role) have a significant association with employee turnover.

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
Example Output:

After running the above tests, the output might look something like:

ANOVA Test Example Output:

yaml
Copy code
ANOVA for Age vs Attrition:
F-statistic: 2.6587, p-value: 0.0105
The difference in Age between attrition groups is statistically significant.

ANOVA for DailyRate vs Attrition:


F-statistic: 5.2294, p-value: 0.0221
The difference in DailyRate between attrition groups is statistically significant.

... (Other variables)

Chi-Square Test Example Output:

yaml
Copy code
Chi-Square for Gender vs Attrition:
Chi-Square statistic: 4.1253, p-value: 0.0423
The association between Gender and attrition is statistically significant.

Chi-Square for Department vs Attrition:


Chi-Square statistic: 3.5877, p-value: 0.1112
The association between Department and attrition is not statistically significant.

... (Other variables)

Conclusion:

● If the p-value is less than 0.05, we can conclude that there is a statistically significant
relationship between that factor (whether categorical or continuous) and the employee
attrition rate.

4o mini

Data Modeling: Logistic Regression for Employee Attrition Prediction

Logistic Regression is a commonly used machine learning model for binary classification tasks, such as

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
predicting employee attrition (whether an employee will stay or leave). We will walk through the steps
of preparing the data, splitting it into training and testing sets, applying Logistic Regression, and
evaluating its performance.

Steps to Apply Logistic Regression:

1. Data Preprocessing

Before applying the Logistic Regression model, we need to preprocess the data. This includes handling
missing values, encoding categorical variables, and scaling numerical features.

● Handle missing values: Check for missing values and handle them appropriately (either by
filling them or removing rows/columns).
● Encode categorical variables: Convert categorical variables (like Gender, Department, etc.)
to numerical representations using techniques like one-hot encoding or label encoding.
● Feature scaling: Logistic Regression performs better when numerical features are scaled
to similar ranges. We can use StandardScaler or MinMaxScaler for this purpose.
● Split the data: Split the dataset into a training set and a testing set (typically 80% training,
20% testing).

2. Prepare the Features and Target Variable

● Features: These are the variables that we use to predict attrition (e.g., Age, Gender,
MonthlySalary, etc.).
● Target variable: The variable to predict, in this case, Attrition (1 for leaving, 0 for staying).

3. Train the Logistic Regression Model

We'll use LogisticRegression from sklearn to train the model.

4. Evaluate the Model

Evaluate the model's performance using metrics like accuracy, precision, recall, F1 score, and the
confusion matrix. We can also plot the ROC curve to evaluate the model’s ability to distinguish between
the two classes.

Let's break this down step by step in Python code.

Step-by-Step Code:

1. Import Libraries

python
Copy code
# Import necessary libraries
import pandas as pd
import numpy as np

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score,
roc_curve
import seaborn as sns
import matplotlib.pyplot as plt

2. Data Preprocessing
● Load the dataset: Assuming you have the dataset loaded in df.

python

Copy code
# Load the dataset
df = pd.read_csv('hospital_employee_data.csv')
● Check for missing values:
python
Copy code
# Check for missing values
df.isnull().sum()

● Handle missing values: You can fill missing values with mean, median, or use more
advanced imputation techniques based on the type of data.

python
Copy code
# Example: Fill missing values with the median (for numerical columns)
df.fillna(df.median(), inplace=True)

● Encode categorical variables: Use one-hot encoding or label encoding for categorical
variables like Gender, BusinessTravel, Department, etc.

python
Copy code
# Example: One-hot encode categorical variables
df = pd.get_dummies(df, drop_first=True)

● Separate features and target variable: The target variable is Attrition, and the remaining
columns are the features.

python
Copy code
# Separate features (X) and target (y)
X = df.drop('Attrition', axis=1)

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
y = df['Attrition']
● Scale the features: We will use StandardScaler to standardize the feature
columns. python
Copy code
scaler = StandardScaler()
X_scaled =
scaler.fit_transform(X)

3. Split the Data into Training and Testing Sets

python
Copy code
# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

4. Train the Logistic Regression Model

python
Copy code
# Initialize the Logistic Regression model
logreg = LogisticRegression()

# Train the model


logreg.fit(X_train, y_train)

5. Evaluate the Model


● Predictions: Use the model to predict the target variable on the test
set. python
Copy code
# Make predictions on the test set
y_pred = logreg.predict(X_test)
● Accuracy: Measure the model’s accuracy (the proportion of correct predictions).

python

Copy code
# Accuracy score
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
● Confusion Matrix: A confusion matrix shows the number of correct and incorrect
predictions. python
Copy code

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=["Stayed", "Left"],
yticklabels=["Stayed", "Left"])
plt.title('Confusion Matrix')
plt.show()
● Classification Report: This report shows precision, recall, F1-score, and support.
python
Copy code
# Classification report
print(classification_report(y_test, y_pred))

● ROC Curve and AUC: The ROC curve is useful for evaluating the model's performance in
distinguishing between the two classes (0 and 1). AUC (Area Under the Curve) provides a single
number to evaluate model performance.

python
Copy code
# Calculate the AUC score
y_prob = logreg.predict_proba(X_test)[:, 1]
auc_score = roc_auc_score(y_test, y_prob)
print(f"AUC Score: {auc_score:.4f}")

# Plot the ROC curve


fpr, tpr, thresholds = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr, color='b', label=f'ROC curve (AUC = {auc_score:.4f})')
plt.plot([0, 1], [0, 1], color='r', linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()

Example Output:

plaintext
Copy code
Accuracy: 0.85
Confusion Matrix:
[[ 312 23]
[ 41 54]]

Classification Report:
precision recall f1-score support

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
0 0.88 0.93 0.90 335
1 0.70 0.57 0.63 95

accuracy 0.85

430
macro avg 0.79 0.75 0.76 43
0
weighted avg 0.84 0.85 0.84 4
3
0

AUC Score: 0.88

Interpretation of the Results:

● Accuracy: The model predicts correctly 85% of the time, which is a good indicator of
performance.
● Confusion Matrix:
o True Positives (TP): 54 employees who left and were correctly predicted to leave.
o True Negatives (TN): 312 employees who stayed and were correctly predicted to stay.
o False Positives (FP): 23 employees who stayed but were predicted to leave.
o False Negatives (FN): 41 employees who left but were predicted to stay.
● AUC Score: The AUC of 0.88 suggests that the model is fairly good at distinguishing between
employees who stay and those who leave.

Data Modeling: Random Forest for Employee Attrition Prediction

Random Forest is an ensemble machine learning algorithm that combines multiple decision trees to
create a stronger and more accurate model. It works well for both classification and regression tasks, and
in this case, we will use it to predict employee attrition (whether an employee will leave or stay).

Random Forest can handle both categorical and continuous data, and it is particularly good at
managing non-linear relationships and large datasets.

Steps to Apply Random Forest:

1. Data Preprocessing:
o Handle missing values.
o Encode categorical variables.
o Scale features (optional but recommended for better performance).
o Split the data into training and testing sets.
2. Train the Random Forest Model:
o We will use the RandomForestClassifier from sklearn to build the model.

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
3. Evaluate the Model:
o We will evaluate the model’s performance using metrics like accuracy, confusion
matrix, classification report, and ROC AUC.

Code Implementation:

1. Import Libraries

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
python
Copy code
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score,
roc_curve
from sklearn.preprocessing import StandardScaler
import seaborn as sns
import matplotlib.pyplot as plt

2. Load and Preprocess Data

● Load the dataset: Assuming the data is in a CSV file.


● Handle missing values: Fill or drop missing values.
● Encode categorical variables: Use one-hot encoding for categorical columns (e.g.,
Gender, BusinessTravel, etc.).
● Feature scaling: Optional but beneficial for algorithms like Random Forest.

python
Copy code
# Load the dataset
df = pd.read_csv('hospital_employee_data.csv')

# Check for missing values


df.isnull().sum()

# Handle missing values (filling with median or mean for simplicity)


df.fillna(df.median(), inplace=True)

# One-hot encode categorical variables


df = pd.get_dummies(df, drop_first=True)

# Separate features (X) and target (y)


X = df.drop('Attrition', axis=1)
y = df['Attrition']

# Scale the features using StandardScaler (optional, especially for algorithms like SVM or Logistic
Regression)
scaler = StandardScaler()
X_scaled =
scaler.fit_transform(X)

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
3. Train the Random Forest Model

python
Copy code
# Initialize the Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model


rf_classifier.fit(X_train, y_train)

Here, we use 100 trees (n_estimators=100), but you can adjust this based on the size of your data and
performance considerations.

4. Evaluate the Model

● Predictions: Use the model to make predictions on the test data.


● Accuracy: Calculate the accuracy score.
● Confusion Matrix: A confusion matrix will show how many instances were classified
correctly and incorrectly.
● Classification Report: Includes precision, recall, and F1-score.
● ROC Curve and AUC: Evaluate the model’s performance in distinguishing between employees
who stay and those who leave.

python
Copy code
# Make predictions on the test set
y_pred = rf_classifier.predict(X_test)

# Accuracy score
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=["Stayed", "Left"],
yticklabels=["Stayed", "Left"])
plt.title('Confusion Matrix')
plt.show()

# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred))

# Calculate the AUC score


y_prob = rf_classifier.predict_proba(X_test)[:, 1]
auc_score = roc_auc_score(y_test, y_prob)
print(f"AUC Score: {auc_score:.4f}")

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
# Plot the ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr, color='b', label=f'ROC curve (AUC = {auc_score:.4f})')
plt.plot([0, 1], [0, 1], color='r', linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()

Explanation of the Evaluation Metrics:


1. Accuracy: This is the proportion of correct predictions made by the model.

python

Copy code
Accuracy = (True Positives + True Negatives) / Total

2. Confusion Matrix: This shows the number of:


o True Positives (TP): Employees who left and were predicted to leave.
o True Negatives (TN): Employees who stayed and were predicted to stay.
o False Positives (FP): Employees who stayed but were predicted to leave.
o False Negatives (FN): Employees who left but were predicted to stay.

The confusion matrix will be visualized using a heatmap.

3. Classification Report: This includes:


o Precision: How many of the predicted positives are actual positives.
o Recall: How many actual positives were correctly predicted.
o F1-Score: The harmonic mean of precision and recall.
o Support: The number of occurrences of each class in the dataset.
4. AUC and ROC Curve:
o AUC (Area Under the Curve): Represents the likelihood of the model distinguishing
between positive and negative classes. A higher AUC indicates a better model.
o ROC Curve: Plots the True Positive Rate (TPR) against the False Positive Rate (FPR).
The curve helps visualize the performance across all classification thresholds.

Example Output:

plaintext
Copy code
Accuracy: 0.88

Confusion Matrix:
[[ 310 25]
ArtificialIntelligen BSSE–6A SZABIST-
ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
[ 32 63]]

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
Classification Report:
precision recall f1-score support

0 0.91 0.93 0.92 335


1 0.72 0.66 0.69 95

accuracy 0.88

430
macro avg 0.82 0.80 0.80 43
0
weighted avg 0.87 0.88 0.87 4
3
0

AUC Score: 0.91

Interpretation of the Results:

● Accuracy (88%): The Random Forest model predicts the employee attrition correctly 88%
of the time, which is quite good.
● Confusion Matrix:
o True Positives (63): 63 employees who left were correctly predicted.
o True Negatives (310): 310 employees who stayed were correctly predicted.
o False Positives (25): 25 employees who stayed were incorrectly predicted to leave.
o False Negatives (32): 32 employees who left were incorrectly predicted to stay.
● Classification Report:
o Precision, recall, and F1-score values indicate that the model has a reasonably good
performance in predicting both classes (staying and leaving).
● AUC Score (0.91): The model has a high AUC score, meaning it is good at distinguishing
between employees who will stay and those who will leave.
● ROC Curve: The curve above the diagonal line shows that the model has good discriminative
ability.

Conclusion:

● The Random Forest model provides a strong performance for predicting employee attrition.
● Key metrics like accuracy, classification report, and AUC indicate that the model can
reliably predict whether an employee will leave or stay.
● Tuning: If you want to improve the model further, you can tune the hyperparameters (e.g.,
n_estimators, max_depth, min_samples_split, etc.) using techniques like GridSearchCV or
RandomizedSearchCV to find the best set of parameters.

This concludes the Random Forest model implementation for employee attrition prediction.

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
Data Modeling: Support Vector Machine (SVM) for Employee Attrition Prediction

Support Vector Machine (SVM) is a powerful classification algorithm, particularly effective for high-
dimensional spaces. SVM finds a hyperplane that best separates the classes, and it is capable of handling
both linear and non-linear classification problems using kernel trick (e.g., linear, polynomial, or radial
basis function (RBF) kernels).

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
In this example, we will apply SVM to predict employee attrition, based on various factors like age,
gender, monthly salary, department, etc.

Steps to Apply Support Vector Machine (SVM):

1. Data Preprocessing:
o Handle missing values.
o Encode categorical variables.
o Scale features (SVM performs better with scaled data).
o Split the dataset into training and testing sets.
2. Train the Support Vector Machine Model:
o Use SVC from sklearn to train the model with the appropriate kernel.
3. Evaluate the Model:
o Evaluate the model's performance using accuracy, confusion matrix, classification
report, and ROC AUC score.

Code Implementation:

1. Import Libraries

python
Copy code
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score,
roc_curve
from sklearn.preprocessing import StandardScaler
import seaborn as sns
import matplotlib.pyplot as plt

2. Load and Preprocess Data

● Load the dataset: Assume the data is in a CSV file.


● Handle missing values: Fill or drop missing values.
● Encode categorical variables: Use one-hot encoding or label encoding.
● Feature scaling: Scale the features because SVM is sensitive to feature scaling.
● Split the dataset: Separate features and target variable, and split data into training and
testing sets.

python
Copy code
# Load the dataset
df = pd.read_csv('hospital_employee_data.csv')

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
# Check for missing values
df.isnull().sum()

# Handle missing values (filling with median for simplicity)


df.fillna(df.median(), inplace=True)

# One-hot encode categorical variables (convert them to numerical form)


df = pd.get_dummies(df, drop_first=True)

# Separate features (X) and target (y)


X = df.drop('Attrition', axis=1)
y = df['Attrition']

# Scale the features using StandardScaler


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

3. Train the Support Vector Machine (SVM) Model

python
Copy code
# Initialize the Support Vector Machine model (using the Radial Basis Function kernel)
svm_classifier = SVC(kernel='rbf', random_state=42)

# Train the model


svm_classifier.fit(X_train, y_train)

● Kernel Choice: The Radial Basis Function (RBF) kernel is commonly used as it maps data into
higher-dimensional space, allowing it to capture non-linear patterns. You can also experiment
with other kernels like linear or polynomial depending on the dataset.

4. Evaluate the Model

● Make Predictions: Predict the target variable on the test set.


● Accuracy: Measure the model's accuracy.
● Confusion Matrix: Evaluate the true positives, true negatives, false positives, and
false negatives.
● Classification Report: Precision, recall, and F1-score for each class.
● ROC Curve and AUC: Evaluate the ability to distinguish between the classes (employees
who stay and leave).

python
Copy code
# Make predictions on the test set

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
y_pred = svm_classifier.predict(X_test)

# Accuracy score
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=["Stayed", "Left"],
yticklabels=["Stayed", "Left"])
plt.title('Confusion Matrix')
plt.show()

# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred))

# Calculate the AUC score (using probabilities for AUC)


y_prob = svm_classifier.decision_function(X_test)
y_prob = (y_prob - y_prob.min()) / (y_prob.max() - y_prob.min()) # Normalize decision function
output
auc_score = roc_auc_score(y_test, y_prob)
print(f"AUC Score: {auc_score:.4f}")

# Plot the ROC curve


fpr, tpr, thresholds = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr, color='b', label=f'ROC curve (AUC = {auc_score:.4f})')
plt.plot([0, 1], [0, 1], color='r', linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()

Explanation of Evaluation Metrics:

1. Accuracy:
o The proportion of correct predictions made by the model. It is a simple and widely
used metric to evaluate the model's performance.
2. Confusion Matrix:
o A table used to describe the performance of a classification model by comparing the
predicted and actual values:
▪ True Positives (TP): The model predicted attrition (1) and the employee actually
left.
▪ True Negatives (TN): The model predicted no attrition (0) and the employee
stayed.
▪ False Positives (FP): The model predicted attrition (1), but the employee actually
stayed.

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
▪False Negatives (FN): The model predicted no attrition (0), but the
employee actually left.
3. Classification Report:
o Provides precision, recall, F1-score, and support for each class.
▪ Precision: Of all the predicted attritions, how many were actual attritions.
▪ Recall: Of all the actual attritions, how many were predicted correctly.
▪ F1-score: The harmonic mean of precision and recall, balancing both metrics.
4. AUC and ROC Curve:
o AUC (Area Under the Curve): Measures the ability of the model to distinguish between
the classes (1 = attrition, 0 = staying). A value closer to 1.0 indicates better performance.
o ROC Curve: Plots the True Positive Rate (TPR) vs. False Positive Rate (FPR) at various
thresholds, helping visualize the model's performance.

Example Output:

plaintext
Copy code
Accuracy: 0.86

Confusion Matrix:
[[ 320 15]
[ 44 51]]

Classification Report:
precision recall f1-score support

0 0.88 0.95 0.91 335


1 0.77 0.54 0.63 95

accuracy 0.86

430
macro avg 0.83 0.75 0.77 43
0
weighted avg 0.85 0.86 0.85 4
3
0

AUC Score: 0.89

Interpretation of the Results:

● Accuracy (86%): The model correctly predicts employee attrition (whether they leave or stay)
86% of the time.
● Confusion Matrix:
o True Positives (51): 51 employees who left were correctly predicted to leave.

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
o True Negatives (320): 320 employees who stayed were correctly predicted to stay.
o False Positives (15): 15 employees who stayed were incorrectly predicted to leave.
o False Negatives (44): 44 employees who left were incorrectly predicted to stay.
● Classification Report:
o The precision for attrition (class 1) is 0.77, meaning that 77% of the employees predicted
to leave actually left.

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
o The recall for attrition (class 1) is 0.54, meaning that 54% of the actual employees who
left were correctly predicted by the model.
● AUC Score (0.89): The model has a strong discriminative ability, as it scores 0.89,
indicating that it performs well in distinguishing between employees who will stay and those
who will leave.
● ROC Curve: The ROC curve shows that the model performs well, as the curve is
significantly above the diagonal line, indicating a good true positive rate compared to the false
positive rate.

Conclusion:

● The SVM model has shown strong performance in predicting employee attrition with a good
balance of precision and recall.
● The AUC score of 0.89 suggests that the model is effective at distinguishing between
employees who stay and those who leave.
● To improve the model, you can fine-tune the hyperparameters of the SVM model (e.g., kernel, C,
gamma) using GridSearchCV or RandomizedSearchCV to optimize its performance.

This concludes the implementation of SVM for employee attrition prediction.

ROC Curve Diagram for Support Vector Machine (SVM)

The ROC Curve (Receiver Operating Characteristic Curve) is a graphical representation of a


classifier's performance across all classification thresholds. It plots the True Positive Rate (TPR)
against the False Positive Rate (FPR). The AUC (Area Under the Curve) is a single value that
summarizes the overall performance of the model — the higher the AUC, the better the model is at
distinguishing between the classes.

Here, we will generate the ROC curve for the Support Vector Machine (SVM) model, which will help
us visualize the classifier's performance.

Steps to Plot ROC Curve:

1. Calculate True Positive Rate (TPR): Also called sensitivity or recall. It is the proportion of
actual positives (attrition = 1) that are correctly identified by the model.

TPR=True PositivesTrue Positives+False Negatives\text{TPR} = \frac{\text{True Positives}}{\


text{True Positives} + \text{False
Negatives}}TPR=True Positives+False NegativesTrue Positives

2. Calculate False Positive Rate (FPR): It is the proportion of actual negatives (non-attrition = 0)
that are incorrectly identified as positives by the model.

FPR=False PositivesFalse Positives+True Negatives\text{FPR} = \frac{\text{False Positives}}{\


text{False Positives} + \text{True
Negatives}}FPR=False Positives+True NegativesFalse Positives

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
3. Plot the ROC curve: We will plot TPR vs FPR at various classification thresholds to show how
the model's performance varies with changes in the threshold.

Code to Plot ROC Curve:

python
Copy code
# Import necessary libraries for plotting
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Get predicted probabilities for the positive class (attrition = 1)


y_prob = svm_classifier.decision_function(X_test)

# Normalize the decision function values to range between 0 and 1


y_prob = (y_prob - y_prob.min()) / (y_prob.max() - y_prob.min())

# Calculate FPR, TPR, and thresholds for ROC curve


fpr, tpr, thresholds = roc_curve(y_test, y_prob)

# Calculate the AUC (Area Under the Curve)


auc_score = roc_auc_score(y_test, y_prob)
print(f"AUC Score: {auc_score:.4f}")

# Plot the ROC curve


plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='b', label=f'ROC curve (AUC = {auc_score:.4f})')
plt.plot([0, 1], [0, 1], color='r', linestyle='--') # Diagonal line (Random classifier)
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.grid(True)
plt.show()

Explanation:

● decision_function(X_test): This method gives the decision function score, which is the
distance of each sample from the decision boundary. We use this to calculate the predicted
probabilities.
● Normalization: We normalize the decision function output to lie between 0 and 1, which
is necessary for the ROC curve plot.
● roc_curve(): This function calculates the FPR, TPR, and thresholds at various points.
● roc_auc_score(): This function calculates the AUC score, which summarizes the overall
performance of the classifier.

Example Output:

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology
plaintext
Copy code
AUC Score: 0.89

Interpretation of the ROC Curve:

● True Positive Rate (TPR): The y-axis of the ROC curve represents the proportion of true
positives, i.e., the fraction of actual attrition events that were correctly predicted by the
model.
● False Positive Rate (FPR): The x-axis of the ROC curve represents the proportion of false
positives, i.e., the fraction of employees who stayed but were incorrectly predicted to
leave.
● Diagonal line: The red dashed line represents a random classifier (no better than chance).
A good model should perform better than the diagonal line.
● Area Under the Curve (AUC): The larger the AUC, the better the model is at distinguishing
between employees who stay and those who leave. An AUC of 1 indicates perfect classification,
while an AUC of 0.5 indicates no better than random guessing.

Visualization:

● If the AUC score is 0.89, this indicates the model does a great job distinguishing between
the two classes (employees who stay and employees who leave).
● The ROC curve should be close to the top-left corner of the plot, which would indicate high
sensitivity (TPR) and low false positive rate (FPR).

Example ROC Curve Plot:

The plot will look like this:

● The blue curve will show the performance of your SVM model.
● The red dashed line is the diagonal, which represents a random model.

Conclusion:

● The ROC Curve helps visualize how the model performs across all classification thresholds.
● The AUC Score provides an aggregate measure of the model's ability to distinguish between
classes (attrition vs non-attrition).
● A high AUC score and a curve that is close to the top-left corner indicate a good model.

By plotting the ROC curve and calculating the AUC score, we can assess how well the Support Vector
Machine (SVM) classifier is predicting employee attrition.

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

1. Coding

Hospital Dataset

Load and Display the dataset

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

Check the structure of the dataset


And
Summary statistics for numerical columns

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

Check for missing values

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

Check unique values for categorical

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

Anova testing

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

Chi square

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

Data modeling

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

Logistic Regression

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

Random Forest

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

Support Vector Machine

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

And Rock Curve Diagram

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB
ShaheedZulfikarAliBhuttoInstituteofScience&Technology

COMPUTERSCIENCE DEPARTMENT

ArtificialIntelligen BSSE–6A SZABIST-


ce ISB

You might also like