Credit Risk Project
The goal of this project is to practice what we have discussed so far. We will define a new credit card
strategy (based on a new ML model) and compare it with the existing strategy.
Business needs a Default Risk model. The model will be used in Credit Approval Decisioning; i.e. to
decide whether to approve an application for a Credit Product.
The Modeling team will build the model, and the Strategy team will use the model's output to design a
Credit Approval strategy. In this project, we will do both: build the model and design the strategy.
Chapter 11 discussed the following steps for a modeling project. The red steps are already done in this
project, so we first need to understand what was done there. Make sure you clearly understand these
steps; you will receive questions on them.
1. Model Design
1.1. Target Definition
1.2. Sample Definition
2. Data Collection
3. Data Cleaning
3.1. Feature Exclusion
3.2. Observation Exclusion
4. Data Processing
4.1. One-Hot Encoding
4.2. Outlier Treatment
4.3. Feature Scaling
4.4. Missing Value Imputation
5. Feature Reduction
6. Model Training
6.1. Grid Search (Hyper-parameter Tuning)
6.2. Bias/Variance Analysis and Finalizing the Model
The first step is to define the target variable. The target is 0/1, with 0 indicating no default and 1
indicating default. How is default defined? We don't have that information. We just know that
"train_labels.csv" shows the target variable (default / no default) for some of the bank's customers as of
April 2018. For example, if default is defined as "missing 2 payments in the next 1 year", train_labels
shows whether the applicant defaulted in the next one year, i.e. May 2018 to April 2019.
The second step is to define the modeling sample. The Modeling team will use this sample to develop
the model; i.e. the sample will be split into Train and Test sample(s). This step is also done. The Modeling
team has decided to use "April 2018 Originations" to build the model. These are the customers who
received a loan in April 2018 and for whom we have enough historical data to calculate the target
variable. For example, if, as above, default is defined as "missing 2 payments in the next 1 year", then we
should have one year of data for these customers, so we can calculate whether each customer defaulted
in the next year or not. In other words, we need data for these customers from May 2018 to April 2019.
So these are customers who have been with us for at least one year.
Why do we use only April 2018? Why not other cohorts (say, May 2018 originations, or a whole period,
such as 2021)? That is a decision made in the design phase, and as mentioned, it is already done.
Q. What criteria should we consider when defining the development sample? Answer: the Quality and
Quantity of the data. Read the very important "Chapter 7 - I am Data… Bias/Variance and Sample Bias".
1. Target Definition
2. Sample Definition (we didn’t discuss Test/Train split, will discuss it later)
The next steps are data collection and data cleaning. These time-consuming steps are also already done,
but be ready for them; they will be among the first tasks assigned to you as a data guy.
We have data on the target variable; now we need data for the independent features. "train_data.csv"
shows the data available for these customers as of April 2018. The data runs from April 2017 to April
2018, i.e. 13 months. So when a customer applied for a loan in April 2018, we had this information about
the customer (13 months of historical data from April 2017 to April 2018). The Modeling team has
decided to use this data to define features.
Note that we don't have 13 months of data for all customers. For some we have fewer months of
information, for whatever reason.
We may have more than 13 months of data, so why do we use only 13 months to define features?
This question is similar to how we defined the Target: the decision was made in the design phase.
Maybe the modelers think data older than 13 months is stale, and that 13 months of data is the best
window to predict default in the next 12 months (just like how we defined the target variable).
As mentioned, the feature exclusion and observation exclusion steps are also already done. Some
observations may have been removed, for example, to mitigate sample bias.
Exam/Project Question. Think of possible sources of sample bias for this project. You can come up with
a story for the model's application and think of some sources of sample bias.
Feature exclusion is also already done, and the data is clean. Even several steps of data processing have
been done: all features are scaled, and probably, before that, outliers were removed. I also think missing
values have been imputed, but I am not sure! So you may need to do that part.
Now that we understand what has been done, let's start the project.
2. The data might be too large, and you may get a memory error while doing the project, so we will
use only 20% of observations. Randomly choose 20% of observations from
"train_labels.csv". Merge this sample with "train_data.csv" to get the features for these
applicants. This will be our development sample. Save this data so that in the future you don't
have to read the original large file again.
3. Explore the data. Data Size, data type of features, a snapshot of data, …
4. Perform One-Hot encoding on categorical variables.
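A quick sketch of steps 3 and 4, continuing from the previous snippet; treating every `object` column as categorical is an assumption about the data:

```python
import pandas as pd

# Step 3: explore the data.
print(dev.shape)    # data size
print(dev.dtypes)   # data type of each feature
print(dev.head())   # snapshot of the data

# Step 4: one-hot encode the categorical variables. dummy_na=True keeps
# missing values as their own 0/1 category.
cat_cols = [c for c in dev.select_dtypes(include="object").columns
            if c != "customer_ID"]
dev = pd.get_dummies(dev, columns=cat_cols, dummy_na=True)
```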
5. Next we want to define some features. As mentioned, we have up to 13 months of historical data
for each applicant (fewer for some). We need to aggregate this monthly history into
applicant-level features (a sketch follows at the end of this step).
For Numerical features, aggregation can be done by Average, Sum, Min, Max, … I also suggest
you include the feature's value as of April 2018, which is the most recent value.
Here are some examples for some aggregated features based on feature X_1:
X_1_Ave_6: Average X_1 in the last 6 months
X_1_Ave_12: Average X_1 in the last 12 months
X_1_Min_6: Minimum X_1 in the last 6 months
X_1_Max_9: Maximum X_1 in the last 9 months
X_1_Sum_3: You know (Sum of X_1 in the last 3 months)
X_1_Apr_2018: Value of X_1 as of April 2018 (the most recent value)
You name it: (X_1_Apr_2018 – X_1_Apr_2017) / X_1_Apr_2017
…
As you can see, you can define many, many features. Do that; the model will choose, for you, the
ones that have real predictive power. Try to come up with a feature that adds to the model.
Note: For some observations you have fewer months of data, so the above features may be
calculated over fewer months. For example, for an application with 4 months of data, X_1_Ave_6
will be calculated as the average of X_1 over the last 4 months.
Sometimes people make special decisions for these cases. For example, you may decide that if
there are fewer than 2 months of data for an observation, then X_1_Ave_6 is recorded as
missing. I don't suggest that.
For Categorical features, some examples of aggregation are as follows. Note that you have
already done one-hot encoding, so your categorical features are binary (0/1); in fact, each such
feature is one category of a categorical feature, one-hot encoded.
X_1_Response_Rate_6: Percentage of times X_1 equals 1 in the last 6 months.
X_1_Ever_Response_12: Whether X_1 equals 1 at least once in the last 12 months
X_1_April_2018
…
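A minimal sketch of the aggregation, assuming one row per customer per month, sorted oldest to newest within each customer (sort first if not), a "customer_ID" key, and a hypothetical feature X_1:

```python
import pandas as pd

def agg_numeric(df, col, window):
    # Keep each customer's last `window` monthly rows (customers with
    # fewer months automatically use whatever they have), then aggregate.
    last_n = df.groupby("customer_ID").tail(window)
    g = last_n.groupby("customer_ID")[col]
    out = g.agg(["mean", "min", "max", "sum", "last"])
    out.columns = [f"{col}_Ave_{window}", f"{col}_Min_{window}",
                   f"{col}_Max_{window}", f"{col}_Sum_{window}",
                   f"{col}_Last_{window}"]   # "last" = most recent value
    return out

def agg_binary(df, col, window):
    # For one-hot encoded (0/1) features.
    g = df.groupby("customer_ID").tail(window).groupby("customer_ID")[col]
    return pd.DataFrame({
        f"{col}_Response_Rate_{window}": g.mean(),  # % of months equal to 1
        f"{col}_Ever_Response_{window}": g.max(),   # 1 if ever 1 in the window
    })

# Example: 3/6/12-month aggregates of X_1, one row per customer_ID.
features = pd.concat([agg_numeric(dev, "X_1", w) for w in (3, 6, 12)], axis=1)
```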
6. Split data into 70% as Train sample, 15% as Test1, and 15% as Test2.
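One way to do the 70/15/15 split with scikit-learn; `model_data` stands for the applicant-level table of step 5 features plus the target (the column name "target" is an assumption). Stratifying keeps the default rate comparable across the three samples:

```python
from sklearn.model_selection import train_test_split

# 70% train; then cut the remaining 30% in half for Test1 and Test2.
train, rest = train_test_split(model_data, test_size=0.30,
                               stratify=model_data["target"], random_state=42)
test1, test2 = train_test_split(rest, test_size=0.50,
                                stratify=rest["target"], random_state=42)
```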
7. Next we want to reduce the number of features and keep only features that have high predictive
power. To do so, we build XGBoost models and keep features with a Feature Importance higher
than 0.5%.
Make sure all missing values are stored as NaN, so XGBoost can work with them.
8. Run an XGBoost model on the train sample, with default parameters. Don’t forget to drop
unnecessary columns if any. Calculate feature importance and save the feature importance as a
CSV file.
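A minimal sketch of step 8; the ID and target column names are assumptions, and XGBoost handles NaN natively so missing values can stay as NaN:

```python
import pandas as pd
from xgboost import XGBClassifier

# Drop non-feature columns before fitting.
X_tr = train.drop(columns=["customer_ID", "target"], errors="ignore")
y_tr = train["target"]

xgb_default = XGBClassifier()   # all default parameters
xgb_default.fit(X_tr, y_tr)

fi_default = pd.DataFrame({"feature": X_tr.columns,
                           "importance": xgb_default.feature_importances_})
fi_default.sort_values("importance", ascending=False) \
          .to_csv("fi_xgb_default.csv", index=False)
```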
9. Run another XGBoost model with 300 trees, a learning rate of 0.5, a maximum tree depth of 4,
50% of observations sampled to build each tree, 50% of features sampled to build each tree, and
a weight of 5 assigned to default observations. Save the feature importance as a CSV file.
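Step 9's parameters map onto XGBoost's scikit-learn API roughly as follows (a sketch, continuing the previous snippet):

```python
from xgboost import XGBClassifier
import pandas as pd

xgb_tuned = XGBClassifier(
    n_estimators=300,       # 300 trees
    learning_rate=0.5,
    max_depth=4,
    subsample=0.5,          # 50% of observations per tree
    colsample_bytree=0.5,   # 50% of features per tree
    scale_pos_weight=5,     # weight of 5 on default (positive) observations
)
xgb_tuned.fit(X_tr, y_tr)

fi_tuned = pd.DataFrame({"feature": X_tr.columns,
                         "importance": xgb_tuned.feature_importances_})
fi_tuned.sort_values("importance", ascending=False) \
        .to_csv("fi_xgb_tuned.csv", index=False)
```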
10. Keep features that have a feature importance higher than 0.5% in either of the two models. We
will use only these features from this point on.
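Continuing the sketch, step 10 is a simple union of the two importance lists:

```python
# Keep features above 0.5% importance in either model.
keep = set(fi_default.loc[fi_default["importance"] > 0.005, "feature"]) \
     | set(fi_tuned.loc[fi_tuned["importance"] > 0.005, "feature"])
selected_features = sorted(keep)
print(len(selected_features), "features kept")
```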
11. Next we run Grid Search for the XGBoost model (using only features we chose in step 10). Use
the following combinations in the grid search:
Number of trees: 50, 100, and 300
Learning Rate: 0.01, 0.1
Percentage of observations used in each tree: 50%, 80%
Percentage of features used in each tree: 50%, 100%
Weight of default observations: 1, 5, 10
Create the following table. Update the table after each iteration of the grid search and save it, so
that if you get a memory error or any other issue, you don't need to re-run the finished part of
the grid.
# Trees | LR | Subsample | % Features | Weight of Default | AUC Train | AUC Test 1 | AUC Test 2
50 | 0.01 | 50% | 50% | 1 | … | … | …
… | … | … | … | … | … | … | …
Note: the optimum would be to use all features in the grid search, and also to test all possible
parameter combinations! But we sacrifice a little in model performance to gain a lot in
computational efficiency. In the end, the sacrifice has a minor impact on the model's performance,
and an even smaller impact on the strategy and business results.
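One possible shape for the loop (3 × 2 × 2 × 2 × 3 = 72 models), crash-friendly because the results table is saved after every fit; `selected_features`, `train`, `test1`, `test2` come from the earlier sketches:

```python
import itertools
import pandas as pd
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

cols = ["n_trees", "lr", "subsample", "pct_features", "weight",
        "auc_train", "auc_test1", "auc_test2"]
rows = []
for n, lr, sub, col, w in itertools.product(
        [50, 100, 300], [0.01, 0.1], [0.5, 0.8], [0.5, 1.0], [1, 5, 10]):
    m = XGBClassifier(n_estimators=n, learning_rate=lr, subsample=sub,
                      colsample_bytree=col, scale_pos_weight=w)
    m.fit(train[selected_features], train["target"])
    aucs = [roc_auc_score(s["target"],
                          m.predict_proba(s[selected_features])[:, 1])
            for s in (train, test1, test2)]
    rows.append([n, lr, sub, col, w] + aucs)
    # Save after each iteration: a crash never costs finished work.
    pd.DataFrame(rows, columns=cols).to_csv("xgb_grid.csv", index=False)
```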
12. Choose the best model, based on bias and variance. Re-run the model with optimum
parameters, and save the final XGB model.
13. Next, grid search for the Neural Network. We first need to process the data. We have already
done one-hot encoding; we still need to do Missing Value Imputation, Outlier Treatment, and
Normalization. We will use only the features that we chose in step 10. As mentioned, there is
probably no need for outlier treatment and feature scaling; but to practice, cap and floor
observations at the 1st and 99th percentiles. Use StandardScaler for normalization
(standardization). Replace missing values with 0.
As you know, you should compute the 1st and 99th percentile values, as well as the Mean and
Standard Deviation values for scaling, based on the Train sample only. Later you should apply
the same values to the Test samples (or any other sample). In other words, for each observation
in a test sample (or any other sample), you should first do outlier treatment based on the 1st
and 99th percentiles of the train sample, and then standardize it based on the Mean and
Standard Deviation from the (capped and floored) train sample.
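A sketch of this processing, with everything fitted on the Train sample only. sklearn's StandardScaler disregards NaNs when fitting and passes them through transform, so the 0-imputation can come last (0 is the post-scaling mean):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

X_base = train[selected_features]
p01, p99 = X_base.quantile(0.01), X_base.quantile(0.99)   # train percentiles
scaler = StandardScaler().fit(X_base.clip(p01, p99, axis=1))

def prepare(df):
    # Cap/floor at the *train* percentiles, standardize with the *train*
    # mean/std, then impute remaining NaNs with 0.
    x = df[selected_features].clip(p01, p99, axis=1)
    x = pd.DataFrame(scaler.transform(x),
                     columns=selected_features, index=df.index)
    return x.fillna(0.0)

X_train_nn, X_test1_nn, X_test2_nn = prepare(train), prepare(test1), prepare(test2)
```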
14. Next we run Grid Search for the Neural Network model. Use the following combinations in the
grid search:
Number of hidden layers: 2, 4
# nodes in each hidden layer: 4, 6
Activation function for hidden layers: ReLu, Tanh
Dropout regularization for hidden layers: 50%, 100% (no dropout)
Batch size: 100, 10000
Use Adam as the optimizer, Cross Entropy as the loss function, and 20 epochs. For everything
else, use default parameters.
Note that you would need to run separate For loops for different numbers of hidden layers (or
use a builder function, as in the sketch below).
Create the following table. Update the table after each iteration of the grid search and save it, so
that if you get a memory error or any other issue, you don't need to re-run the finished part of
the grid.
# HL | # Nodes | Activation Function | Dropout | Batch Size | AUC Train | AUC Test 1 | AUC Test 2
2 | 4 | ReLU | 50% | 100 | … | … | …
… | … | … | … | … | … | … | …
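A hedged sketch using Keras (one possible toolkit; yours may differ). It interprets "50% dropout" as Dropout(0.5) and "100%" as no dropout layer, uses binary cross entropy as the loss for the 0/1 target, and replaces separate For loops with a small builder function; `X_train_nn` etc. come from the step 13 sketch:

```python
import itertools
import pandas as pd
import tensorflow as tf
from sklearn.metrics import roc_auc_score

def build_nn(n_layers, n_nodes, act, drop):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(len(selected_features),)))
    for _ in range(n_layers):
        model.add(tf.keras.layers.Dense(n_nodes, activation=act))
        if drop > 0:                 # drop == 0.0 means "100%", i.e. no dropout
            model.add(tf.keras.layers.Dropout(drop))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

cols = ["hidden_layers", "nodes", "activation", "dropout", "batch_size",
        "auc_train", "auc_test1", "auc_test2"]
rows = []
for n_layers, n_nodes, act, drop, batch in itertools.product(
        [2, 4], [4, 6], ["relu", "tanh"], [0.5, 0.0], [100, 10000]):
    nn = build_nn(n_layers, n_nodes, act, drop)
    nn.fit(X_train_nn, train["target"], epochs=20, batch_size=batch, verbose=0)
    aucs = [roc_auc_score(y, nn.predict(x, verbose=0).ravel())
            for x, y in ((X_train_nn, train["target"]),
                         (X_test1_nn, test1["target"]),
                         (X_test2_nn, test2["target"]))]
    rows.append([n_layers, n_nodes, act, drop, batch] + aucs)
    # Save after each iteration, as with the XGBoost grid.
    pd.DataFrame(rows, columns=cols).to_csv("nn_grid.csv", index=False)
```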
15. Choose the best model, based on bias and variance. Re-run the model with optimum
parameters, and save the final NN model.
16. Choose the best model among the NN and XGB (the final models of steps 12 and 15).
Strategy:
Next, you want to define two strategies: a conservative one and an aggressive one. For each strategy, you
define a threshold to accept/reject applicants based on the model's output. Applicants with a probability
of default (the model's output) lower than the threshold will be accepted, and those with a PD higher
than the threshold will be rejected. The conservative strategy has a lower threshold than the aggressive
one, and hence accepts fewer applicants.
We will estimate the Portfolio's Default Rate and Revenue under each strategy, show these to
management, and let them decide which strategy is better.
Estimate the Portfolio's Default Rate: You already know how to calculate the default rate for a strategy;
you just need to calculate the default rate among applications that will be accepted under the strategy,
i.e. those with a PD less than the threshold.
Estimate the Portfolio's Revenue: Revenue on a credit card depends on two factors: how much the
customer spends, and how much of the monthly balance the customer does not pay (rolls over to the
next month). Credit card companies charge a small amount for each dollar you spend. They also charge
an interest rate on the remaining monthly balance that you do not pay (the revolving balance).
For example, assume a CC charges 0.1% on each dollar spent and charges 24% (annually) on balances. If
a customer spends $1,000 in a month, the company's revenue from this customer's spend will be
1000×0.001=$1. If the customer pays back $200 of the $1,000, the company will charge 2% monthly
interest (24%/12) on the remaining $800, which means $16 of interest revenue in that month.
Note: As you know interest rates on CC balances are very high, so don’t be manipulated by banks; i.e.
don’t spend too much. You are most attractive with a cheap, healthy life, with a lot of exercise. Also
Never Default on your Debt; i.e. never miss the minimum monthly payment.
So, to estimate revenue, you need a measure of Spend and Balance in the next few months. In other
words, just like for default, where we checked payments in, say, the 12 months after origination, we
need information on spend and balance in, say, the 12 months after origination. If we had that
information, we might be able to build ML models for spend and balance. For example, a Spend model
in this case would estimate the "Expected Spend in the next 12 months, Conditional on the Independent
Variables."
However, in this data we have no information on spend and balance after origination. Note that the only
information we have about the after-origination period is the 0/1 indicator in train_labels.csv. Since we
don't have spend and balance data, we use each customer's historical data on balance and spend.
Basically, we are assuming that historical spend and balance are good predictors of future spend and
balance.
In the data, features that start with S_ are spend variables, and features that start with B_ are balance
variables. Choose one spend and one balance feature (any feature of your choice). Calculate the average
of these two features over the last 6 months (i.e. November 2017 to April 2018). If we denote these two
averages by S_Ave and B_Ave, monthly revenue for a customer would be calculated as (using the
example rates above: 0.1% of spend, plus 2% monthly interest on balance):
Monthly Revenue = 0.001 × S_Ave + 0.02 × B_Ave
And Expected Revenue in the next 12 months would be 12 multiplied by the above value:
Expected Revenue = 12 × (0.001 × S_Ave + 0.02 × B_Ave)
To estimate the portfolio's expected revenue under a strategy, calculate the sum of the above revenue
over the customers who are accepted under the strategy. Assume a revenue of 0 for those who
default.
Note: To estimate Balance and Spend, you have in fact built a model. It is a very simple model: just the
historical average.
17. Write a function that calculates the default rate and revenue for a given threshold (a sketch
follows the list). The function gets six inputs:
Data with four columns: Target Variable (Default indicator), Default model’s output (PD),
Estimated Monthly Balance, Estimated Monthly Spend
Name of Target Variable (as a text/string)
Name of default model’s output (as a text/string)
Name of Estimated Monthly Balance variable (as a text/string)
Name of Estimated Monthly Spend variable (as a text/string)
Threshold (a number between 0 and 1)
And will return two outputs: portfolio’s default rate, and portfolio’s expected revenue.
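A minimal sketch of such a function, assuming the example pricing above (0.1% of spend plus 2% monthly interest on balance) and zero revenue from defaulters:

```python
def strategy_metrics(df, target_col, pd_col, balance_col, spend_col, threshold):
    # Accept applicants whose predicted PD is below the threshold.
    accepted = df[df[pd_col] < threshold]

    # Portfolio default rate among the accepted applications.
    default_rate = accepted[target_col].mean()

    # Expected 12-month revenue, using the example pricing above;
    # defaulters contribute zero revenue.
    good = accepted[accepted[target_col] == 0]
    monthly = 0.001 * good[spend_col] + 0.02 * good[balance_col]
    return default_rate, 12 * monthly.sum()
```

Usage might look like the following, where all column names are hypothetical and should match your scored data:

```python
dr, rev = strategy_metrics(scored_train, "target", "pd_score", "B_Ave", "S_Ave", 0.3)
```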
Use only the train sample to try a few thresholds, and choose one conservative and one aggressive
strategy. How to choose the thresholds is up to you. Imagine you want to present this to senior
management and impress them with your work and results. The only constraint is that the
company does not want the default rate to be higher than 10%.
General Guidelines:
1. Create pretty slides
2. Don’t use any background
3. Format numbers: use 1,000 separators; show decimal numbers with 2 decimal places (3 decimal
places for very small numbers)
4. Don’t use small fonts that cannot be seen
5. Don’t put too much material in a slide
6. Each slide should be self-explanatory. While you don’t want to put too much material on a slide,
put enough material to explain what the slide shows
7. Format tables. Assign appropriate titles to tables and figures
8. Don’t copy paste from your code
9. Have a good story to tell
10. Format everything. Standard fonts …
11. Use colors, but don’t overuse
In general, remember a presentation is like presenting a product. Both packaging and functionality
matter. You need to wrap your good model in a pretty package.
Note 1: In the following steps, feel free to change the format of tables to make the slides easier to
follow and understand.
Note 2: I have proposed the minimum items to be included in the slides. Feel free to add additional
explanations, …
Note 3: Due to computational constraints, you may need to simplify the project: read fewer rows and
observations, … Adjust the following tables based on your final sample.
Slide #2. Data. Explain your data (data of step 3). Explain why you chose April 2018 originations (come
up with a story). Include the following table, explain why you decided to use this data, explain your
target variable (you can generate a story for what default means), …
Category | # Observations | Default Rate
All Applications | |
Applications with 13 months of historical data | |
Applications with 12 months of historical data | |
Applications with 11 months of historical data | |
Applications with 10 months of historical data | |
Applications with 9 months of historical data | |
Applications with 8 months of historical data | |
Applications with 7 months of historical data | |
Applications with 6 months of historical data | |
Applications with 5 months of historical data | |
Applications with 4 months of historical data | |
Applications with 3 months of historical data | |
Applications with 2 months of historical data | |
Applications with 1 month of historical data | |
Slide #3. Features. Talk about categories of independent variables used in the development process
(data of step 3). Use raw features; i.e. features as they are in the raw data, and before defining new
features in step 5.
Category | # of features
Slide #4. Feature Engineering. Talk about the types of features you have created (step 5). You can talk
about categories, such as Average, Median, Min, Max, …
Add a table like the table in slide 3, this time not for the raw features, but for the features you have
defined based on the raw features.
Category | # of features
Also show summary statistics for the top 5 features with highest SHAP values in the best XGBoost model
(Step 12). Note that at this point you don’t need to talk about the XGBoost model and SHAP. You can
just mention that based on your analyses these are among the most important attributes.
Feature | Min | 1st Percentile | 5th Percentile | Median | 95th Percentile | 99th Percentile | Max | Mean | % Missing
Slide #5. Data Processing / One-Hot Encoding. Show the categorical variables, and show how you
treated them. Show the results after One-Hot Encoding. Include your code to do one-hot encoding.
Slide #6. Feature Selection. Add a graph that explains your feature selection process (steps 7 to 10).
Create a pretty graph. Attach an Excel file with the feature importance results for the two models (steps
8 and 9). Add a column to the table of slide 4 that shows the # of features selected from each category
to be used in the grid search.
Category | # of features | # selected
Slide #7. XGBoost - Grid Search. Include your grid search code. Explain why you chose these parameters
(don't just say "because you told us to"). Talk about your experience with grid search, how many models
you trained, any lessons learned, …
Slide #8. XGBoost - Grid Search. In this slide, we create scatter plots for models of grid search, and will
choose the best model based on the scatter plot. For each of the models of grid search, calculate
average and standard deviation of AUC across three samples (train and tests). Then include 2 scatter
plots in the slide:
In the first one, the X-axis shows the Average AUC, and the Y-axis shows the Standard Deviation of AUC.
In the second one, the X-axis is the AUC of the train sample and the Y-axis is the AUC of the Test 2 sample.
Explain which model you would choose based on each scatter plot.
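One way to build the two plots with matplotlib, assuming the grid results were saved as "xgb_grid.csv" with the column names from the step 11 sketch:

```python
import pandas as pd
import matplotlib.pyplot as plt

g = pd.read_csv("xgb_grid.csv")
auc_cols = ["auc_train", "auc_test1", "auc_test2"]
g["auc_mean"] = g[auc_cols].mean(axis=1)
g["auc_std"] = g[auc_cols].std(axis=1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(g["auc_mean"], g["auc_std"])      # want: far right, low down
ax1.set_xlabel("Average AUC"); ax1.set_ylabel("Std. Dev. of AUC")
ax2.scatter(g["auc_train"], g["auc_test2"])   # want: high but near the 45-degree line
ax2.set_xlabel("AUC Train"); ax2.set_ylabel("AUC Test 2")
plt.tight_layout()
plt.savefig("xgb_grid_scatter.png", dpi=200)
```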
Slide #9. XGBoost – Final Model. Show the parameters of the final model, as well as the model's AUC on
each sample. Also show how the model Rank Orders on each of the three samples; check the last part of
the XGBoost sample code for rank ordering. Note that you need to define score bins based on the train
sample and apply the same thresholds to the test samples. Show the rank orderings in a bar chart, where
each sample is one series, the X-axis shows the score bins (intervals), and the Y-axis shows the default
rate in each bin.
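A hedged sketch of the rank-ordering chart (the course's XGBoost sample code is the reference; `final_xgb` is a placeholder for the model saved in step 12). Bin edges are deciles of the train scores, applied unchanged to the test samples:

```python
import numpy as np
import pandas as pd

# Decile edges from the *train* scores; widen the ends so every test
# score falls into a bin, and drop any duplicate edges.
train_scores = final_xgb.predict_proba(train[selected_features])[:, 1]
edges = np.quantile(train_scores, np.linspace(0, 1, 11))
edges[0], edges[-1] = 0.0, 1.0
edges = np.unique(edges)

def rank_order(sample):
    scores = final_xgb.predict_proba(sample[selected_features])[:, 1]
    bins = pd.cut(scores, bins=edges, include_lowest=True)
    return sample["target"].groupby(bins).mean()   # default rate per score bin

ro = pd.DataFrame({"Train": rank_order(train),
                   "Test 1": rank_order(test1),
                   "Test 2": rank_order(test2)})
ro.plot(kind="bar", xlabel="Score bin", ylabel="Default rate",
        title="Rank Ordering")
```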
Slide #10. XGBoost – SHAP Analysis. Show Beeswarm Graph for the final model, based on Test 2
sample. Add some explanation of your choice. You can talk about ranking of attributes, correlation
between attribute and the output, …
Slide #11. XGBoost – SHAP Analysis. Show Waterfall Graph for the final model, based on one
observation in Test 2 sample. Add some explanation of your choice. You can talk about which attributes
are driving the score, how to improve the score, …
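A minimal sketch covering both SHAP plots (slides 10 and 11), assuming the final model is the tree-based XGBoost saved in step 12 (`final_xgb` is a placeholder name):

```python
import shap

# Explain the Test 2 sample with a tree explainer.
explainer = shap.TreeExplainer(final_xgb)
sv = explainer(test2[selected_features])

# Slide 10: global view, feature ranking and direction of effects.
shap.plots.beeswarm(sv)

# Slide 11: local view, what drives one applicant's score.
shap.plots.waterfall(sv[0])
```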
Slide #12. Neural Network – Data Processing. Explain your data processing for Neural Network. Feel
free to add code, tables, … Format this slide, so it is easy to follow and understand.
Slide #13. Neural Network - Grid Search. Include your grid search code. Explain why you chose these
parameters (don't just say "because you told us to"). Talk about your experience with grid search, how
many models you trained, any lessons learned, …
Slide #14. Neural Network - Grid Search. In this slide, we create scatter plots for models of grid search,
and will choose the best model based on the scatter plot. For each of the models of grid search,
calculate average and standard deviation of AUC across three samples (train and tests). Then include 2
scatter plots in the slide:
In the first one, the X-axis shows the Average AUC, and the Y-axis shows the Standard Deviation of AUC.
In the second one, the X-axis is the AUC of the train sample and the Y-axis is the AUC of the Test 2 sample.
Explain which model you would choose based on each scatter plot.
Slide #15. Neural Network – Final Model. Show the parameters of the final model, as well as the model's
AUC on each sample. Also show how the model Rank Orders on each of the three samples; check the
last part of the XGBoost sample code for rank ordering. Note that you need to define score bins based on
the train sample and apply the same thresholds to the test samples. Show the rank orderings in a bar
chart, where each sample is one series, the X-axis shows the score bins (intervals), and the Y-axis shows
the default rate in each bin.
Slide #16. Final Model. Talk about the final model (XGBoost or Neural Net), and why you chose this one.
Add tables or graphs from previous steps to support your reasoning …
Slide #17. Strategy. Include the function you have written in step 17. Also include the following table.
Explain what thresholds you chose for conservative and aggressive strategy, and explain your rationale.
Threshold | Train: #Total | Train: Default Rate | Train: Revenue | Test 1: #Total | Test 1: Default Rate | Test 1: Revenue | Test 2: #Total | Test 2: Default Rate | Test 2: Revenue
0.1 | | | | | | | | |
0.2 | | | | | | | | |
0.3 | | | | | | | | |
0.4 | | | | | | | | |
0.5 | | | | | | | | |
0.6 | | | | | | | | |
0.7 | | | | | | | | |
0.8 | | | | | | | | |
0.9 | | | | | | | | |
1.0 | | | | | | | | |