Task 5
Load the dataset creditcard_fraud_subsample.csv from STiNE. The dataset contains transactions made by credit cards of European cardholders over two days in September 2013. It contains only numerical input variables, which are the result of a transformation (due to confidentiality issues, no further background on the original features is available). The only features that have not been transformed are Time and Amount: Time gives the seconds elapsed between each transaction and the first transaction in the dataset, and Amount gives the transaction amount. The feature Class is the response variable; it takes the value 1 in case of fraud and 0 otherwise.
1. The first task is to preprocess the dataset. Remove the feature ‘Time’ from the dataset and standardize the feature ‘Amount’ by subtracting the mean and scaling to unit variance.
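A minimal preprocessing sketch, assuming the CSV has been downloaded from STiNE into the working directory (using scikit-learn's StandardScaler is a choice, not prescribed by the task):

import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("creditcard_fraud_subsample.csv")

# Remove the feature 'Time'.
df = df.drop(columns=["Time"])

# Standardize 'Amount': subtract the mean, scale to unit variance.
df[["Amount"]] = StandardScaler().fit_transform(df[["Amount"]])

X = df.drop(columns=["Class"])
y = df["Class"]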
2. Split the data into a training and a test set to be able to assess the out-of-sample performance (use 40 percent of the data for the test set).
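For example, with scikit-learn's train_test_split (the fixed seed and the stratification on the class label are choices made here for reproducibility and to keep the rare fraud cases represented in both sets):

from sklearn.model_selection import train_test_split

# 40 percent of the data goes to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)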
3. Do some research on the classification method called “Gradient Boosting”, for example
using the book “Elements of Statistical Learning”. Explain the main idea and describe
how the algorithm proceeds.
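As an illustration of the main idea (not a substitute for your written explanation), here is a toy from-scratch sketch for squared-error regression: each round fits a small tree to the negative gradient of the loss, which for squared error is simply the residual, and adds its scaled prediction to the current model.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_trees=100, learning_rate=0.1, max_depth=2):
    """Toy gradient boosting with squared-error loss (illustration only)."""
    prediction = np.full(len(y), np.mean(y))   # start from a constant model
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction             # negative gradient of (y - f)^2 / 2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        prediction = prediction + learning_rate * tree.predict(X)
        trees.append(tree)
    return trees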
4. We will use a version of gradient boosting that is called extreme gradient boosting. A popular implementation is XGBoost, see https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier. Hint: You have to install the XGBoost package using pip install xgboost, more info available at https://xgboost.readthedocs.io/en/stable/install.html#python. If you have problems with installing XGBoost, you can also use scikit-learn's GradientBoostingClassifier, see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html.
5. Use the XGBClassifier to predict the value of Class based on the available features. Compute the accuracy. Compare the predictive performance to that of a logistic regression. Also compare the accuracy with a naive classifier that simply ‘predicts’ “no fraud” in every possible case. Why is the accuracy of this naive classifier still very close to one?
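A sketch of all three classifiers on the split from step 2 (XGBClassifier follows the scikit-learn fit/predict interface; max_iter=1000 for the logistic regression is an assumption made here to avoid convergence warnings):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

xgb = XGBClassifier().fit(X_train, y_train)
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred_xgb = xgb.predict(X_test)
y_pred_logreg = logreg.predict(X_test)
y_pred_naive = np.zeros(len(y_test), dtype=int)  # always 'no fraud'

for name, y_pred in [("XGBoost", y_pred_xgb),
                     ("Logistic regression", y_pred_logreg),
                     ("Naive", y_pred_naive)]:
    print(name, accuracy_score(y_test, y_pred))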
6. To be able to evaluate the performance, we define the confusion matrix (binary classification, class 1 $\hat{=}$ positive)

$$C = \begin{pmatrix} C_{0,0} & C_{0,1} \\ C_{1,0} & C_{1,1} \end{pmatrix},$$

where $C_{i,j}$ contains the number of observations of the test sample that are in class $i$ and classified as class $j$. Write a function which takes two vectors (where each entry is either zero or one) as input and calculates the confusion matrix. Use your function to calculate the confusion matrix on the test set for both classifiers.
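A minimal sketch of such a function (the name confusion_matrix_ is chosen here to avoid shadowing scikit-learn's confusion_matrix):

import numpy as np

def confusion_matrix_(y_true, y_pred):
    """C[i, j] counts observations that are in class i and classified as class j."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    C = np.zeros((2, 2), dtype=int)
    for i in (0, 1):
        for j in (0, 1):
            C[i, j] = np.sum((y_true == i) & (y_pred == j))
    return C

For example, confusion_matrix_(y_test, y_pred_xgb) and confusion_matrix_(y_test, y_pred_logreg).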
7. We define

$$\text{precision} := \frac{C_{1,1}}{C_{1,1} + C_{0,1}} \qquad \text{and} \qquad \text{recall} := \frac{C_{1,1}}{C_{1,1} + C_{1,0}}.$$
What do precision and recall measure? Write functions which take two vectors as input
and calculate precision and recall. Use your functions to evaluate your classifier (here
nan is considered a valid result if you divide by zero).
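Possible implementations on top of the confusion-matrix function from step 6, returning nan on division by zero as the task allows:

def precision(y_true, y_pred):
    C = confusion_matrix_(y_true, y_pred)
    denom = C[1, 1] + C[0, 1]   # everything classified as positive
    return C[1, 1] / denom if denom > 0 else float("nan")

def recall(y_true, y_pred):
    C = confusion_matrix_(y_true, y_pred)
    denom = C[1, 1] + C[1, 0]   # everything that is actually positive
    return C[1, 1] / denom if denom > 0 else float("nan")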
8. Finally, we combine precision and recall into the $F_1$-score

$$F_1 := 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}.$$
Write a function which takes two vectors as input and calculates the $F_1$-score. Try to train new classifiers (e.g. Random Forest, SVM, Logistic Regression) with different tuning parameters and maximize the $F_1$-score on the test sample.
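The $F_1$-score in terms of the two functions above (nan again propagates when precision or recall is undefined):

def f1_score_(y_true, y_pred):
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if (p + r) > 0 else float("nan")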
9. (Bonus): Explain the role of the parameters n_estimators and learning_rate in XGBoost. Set the learning_rate to a value of 0.1. We want to find the optimal value of n_estimators, i.e., the one that gives the highest $F_1$-score. Provide your own implementation of cross-validation from scratch in order to find the best possible value for n_estimators based on the cross-validated $F_1$-score.
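A from-scratch $k$-fold cross-validation sketch; the number of folds, the seed and the candidate grid are assumptions, and f1_score_ is the function from step 8:

import numpy as np
from xgboost import XGBClassifier

def cv_f1(X, y, n_estimators, n_folds=5, seed=0):
    """Mean cross-validated F1-score of XGBoost with learning_rate=0.1."""
    X, y = np.asarray(X), np.asarray(y)
    # Shuffle the indices once, then split them into k disjoint folds.
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    scores = []
    for k in range(n_folds):
        train_idx = np.concatenate([folds[m] for m in range(n_folds) if m != k])
        model = XGBClassifier(n_estimators=n_estimators, learning_rate=0.1)
        model.fit(X[train_idx], y[train_idx])
        scores.append(f1_score_(y[folds[k]], model.predict(X[folds[k]])))
    return np.mean(scores)

# Assumed candidate grid; pick the value with the highest cross-validated F1.
candidates = [50, 100, 200, 400]
best_n = max(candidates, key=lambda n: cv_f1(X_train, y_train, n))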