
Fraud Detection in Python Chapter2

The document discusses various machine learning classification methods that are commonly used for fraud detection, including logistic regression, neural networks, decision trees, and random forests. It covers evaluating model performance using metrics like precision, recall, F1 score, and confusion matrices. Finally, it discusses techniques like adjusting class weights, hyperparameter tuning, and ensemble methods like stacking and voting classifiers that can be used to improve fraud detection models.

Uploaded by

Fgpeqw

DataCamp Fraud Detection in Python

FRAUD DETECTION IN PYTHON

Review of classification methods for fraud detection
Charlotte Werger
Data Scientist

What is classification?

Goal of classification: use known fraud cases to train a model to recognise new fraud cases

Examples:

Email: spam / not spam
Online transaction: fraudulent yes / no
Tumor: malignant / benign

Variable to predict: y ∈ {0, 1}

0: Negative class ("majority" normal cases)

1: Positive class ("minority" fraud cases)


Classification methods commonly used for fraud detection

Logistic Regression
Neural Network
Decision trees
Random Forests

Decision Trees and Random Forests


Random forests are a collection of decision trees, each built on a random subset of the features

Random Forests for fraud detection


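from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier

# Train a random forest on the training set
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict on the held-out test set and report accuracy
predicted = model.predict(X_test)
print(metrics.accuracy_score(y_test, predicted))

0.991324200913242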


Let's practice!

Measuring fraud detection performance

Charlotte Werger
Data Scientist

Accuracy isn't everything

Throw accuracy out of the window when working on fraud detection problems: fraud cases are so rare that a model which never flags fraud still scores a very high accuracy.
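
To make the point concrete, here is a minimal sketch that uses the class counts of this chapter's test set (2,099 normal cases and 91 fraud cases, as listed in the classification report further on). The "model" is a hypothetical baseline, not part of the course material, that labels every transaction as normal.

```python
# Class counts taken from the test set used in this chapter
n_normal, n_fraud = 2099, 91

y_true = [0] * n_normal + [1] * n_fraud
# Hypothetical baseline: always predict "not fraud"
y_pred = [0] * len(y_true)

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Fraud cases the baseline actually catches (none, by construction)
fraud_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = fraud_caught / n_fraud

print(f"accuracy: {accuracy:.3f}")  # accuracy: 0.958
print(f"recall:   {recall:.3f}")    # recall:   0.000
```

An accuracy of almost 96% with zero fraud caught is why the rest of this section relies on precision, recall and related metrics instead.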

False positives, false negatives and actual fraud caught



Precision Recall trade-off
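
One way to see the trade-off is to vary the probability threshold at which a case is flagged as fraud. The sketch below is an illustration only: it assumes synthetic data from `make_classification` (not the course dataset) and a logistic regression chosen for convenience.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (roughly 5% positives); not the course data
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.95],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
fraud_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class

results = {}
for threshold in (0.1, 0.3, 0.5, 0.7):
    # Flag a case as fraud only if its probability reaches the threshold
    predicted = (fraud_prob >= threshold).astype(int)
    results[threshold] = (
        precision_score(y_test, predicted, zero_division=0),
        recall_score(y_test, predicted),
    )
    print(threshold, results[threshold])
```

Raising the threshold can only keep recall the same or lower it, because fewer cases are flagged; precision usually rises in exchange, although that is not guaranteed.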



Obtaining performance metrics


# Import the packages
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score

# Calculate average precision
# (both functions also accept predicted probabilities, e.g.
# model.predict_proba(X_test)[:, 1], which yields a full curve
# rather than a single operating point)
average_precision = average_precision_score(y_test, predicted)

# Obtain precision and recall
precision, recall, _ = precision_recall_curve(y_test, predicted)
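
The following sketch contrasts the two kinds of input. It assumes synthetic data and a small random forest purely for illustration; the names `fraud_prob`, `precision_lbl` and `recall_lbl` are hypothetical and not from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data; not the course dataset
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.95],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

predicted = model.predict(X_test)               # hard 0/1 labels
fraud_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1

# Curve built from probabilities: one point per distinct score
precision, recall, thresholds = precision_recall_curve(y_test, fraud_prob)
average_precision = average_precision_score(y_test, fraud_prob)

# Curve built from hard labels: only a handful of points
precision_lbl, recall_lbl, _ = precision_recall_curve(y_test, predicted)

print(len(precision), len(precision_lbl))
print(round(average_precision, 3))
```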

Precision-Recall Curve

ROC curve to compare algorithms
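
As a rough sketch of such a comparison, the code below computes ROC curves and the area under them (AUC) for two of the classifiers mentioned in this chapter. The data is synthetic and the model settings are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data; not the course dataset
X, y = make_classification(n_samples=4000, n_features=10, weights=[0.95],
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=1),
}

aucs, curves = {}, {}
for name, clf in candidates.items():
    # ROC curves need scores or probabilities, not hard labels
    scores = clf.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    curves[name] = (fpr, tpr)
    aucs[name] = roc_auc_score(y_test, scores)
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

Plotting each `(fpr, tpr)` pair on the same axes gives the usual side-by-side ROC comparison.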



Confusion matrix and classification report


from sklearn.metrics import classification_report, confusion_matrix

# Obtain predictions
predicted = model.predict(X_test)

# Print classification report using predictions
print(classification_report(y_test, predicted))

             precision    recall  f1-score   support

        0.0       0.99      1.00      1.00      2099
        1.0       0.96      0.80      0.87        91

avg / total       0.99      0.99      0.99      2190

# Print confusion matrix using predictions
print(confusion_matrix(y_test, predicted))

[[2096    3]
 [  18   73]]
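
The fraud-class row of the report can be recomputed by hand from the confusion matrix. The short check below uses scikit-learn's layout, in which rows are actual classes and columns are predicted classes.

```python
# Confusion matrix from the slide: rows = actual class, columns = predicted class
tn, fp = 2096, 3   # actual normal: correctly cleared / wrongly flagged
fn, tp = 18, 73    # actual fraud:  missed / caught

precision = tp / (tp + fp)   # share of flagged cases that really are fraud
recall = tp / (tp + fn)      # share of actual fraud cases that were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.96 recall=0.80 f1=0.87
```

These match the 1.0 row of the report: 18 of the 91 fraud cases were missed, and 3 normal cases were flagged by mistake.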

Let's practice!

Adjusting your algorithms for fraud detection
Charlotte Werger
Data Scientist

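Balanced weights

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

model = RandomForestClassifier(class_weight='balanced')

model = RandomForestClassifier(class_weight='balanced_subsample')

model = LogisticRegression(class_weight='balanced')

model = SVC(kernel='linear', class_weight='balanced', probability=True)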
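
To show what `class_weight='balanced'` amounts to, the sketch below reproduces the weights with scikit-learn's `compute_class_weight` helper and with the formula `n_samples / (n_classes * count_per_class)`. The label vector is an assumption built from the class counts of this chapter's test set.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Labels with the same class counts as the test set in this chapter
y = np.array([0] * 2099 + [1] * 91)

weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)

# The same weights computed by hand
manual = len(y) / (2 * np.bincount(y))

print(f"normal: {weights[0]:.3f}, fraud: {weights[1]:.3f}")
# normal: 0.522, fraud: 12.033
```

Each fraud case therefore counts about 23 times as much as a normal case during training, which offsets the imbalance.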



Hyperparameter tuning for fraud detection


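model = RandomForestClassifier(class_weight={0:1,1:4}, random_state=1)

model = LogisticRegression(class_weight={0:1,1:4}, random_state=1)

# Other random forest hyperparameters that can be tuned
# (older scikit-learn releases accepted max_features='auto',
# which meant 'sqrt' for classifiers)
model = RandomForestClassifier(n_estimators=10,
                               criterion='gini',
                               max_depth=None,
                               min_samples_split=2,
                               min_samples_leaf=1,
                               max_features='sqrt',
                               n_jobs=-1, class_weight=None)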

Using GridSearchCV

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Create the parameter grid
param_grid = {
    'max_depth': [80, 90, 100, 110],
    'max_features': [2, 3],
    'min_samples_leaf': [3, 4, 5],
    'min_samples_split': [8, 10, 12],
    'n_estimators': [100, 200, 300, 1000]
}

# Define which model to use (a classifier, because the search is scored with F1)
model = RandomForestClassifier()

# Instantiate the grid search model
grid_search_model = GridSearchCV(estimator=model,
                                 param_grid=param_grid, cv=5,
                                 n_jobs=-1, scoring='f1')

Finding the best model with GridSearchCV


# Fit the grid search to the data
grid_search_model.fit(X_train, y_train)

# Get the optimal parameters


grid_search_model.best_params_

{'bootstrap': True,
'max_depth': 80,
'max_features': 3,
'min_samples_leaf': 5,
'min_samples_split': 12,
'n_estimators': 100}

# Get the best_estimator results
grid_search_model.best_estimator_
grid_search_model.best_score_
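
For a version that runs quickly from start to finish, the sketch below repeats the same steps on synthetic data with a deliberately small grid. The data, the grid values and the `class_weight` setting are assumptions made for the example, not the course's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in training data; not the course dataset
X_train, y_train = make_classification(n_samples=1500, n_features=10,
                                       weights=[0.9], random_state=0)

# A small grid keeps the search fast: 2 x 2 candidates, 3 folds each
param_grid = {"n_estimators": [20, 50], "max_depth": [4, 8]}

grid_search_model = GridSearchCV(
    estimator=RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid=param_grid, cv=3, scoring="f1")
grid_search_model.fit(X_train, y_train)

print(grid_search_model.best_params_)            # best combination found
print(round(grid_search_model.best_score_, 3))   # its mean cross-validated F1
```

Note that `best_params_` contains only the keys listed in `param_grid`, and that `best_estimator_` is already refitted on the full training data, so it can be used for prediction directly.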

Let's practice!

Using ensemble methods to improve fraud detection
Charlotte Werger
Data Scientist

What are Ensemble Methods: Bagging versus Stacking



Stacking Ensemble Methods
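
The slides introduce stacking conceptually. One possible way to try it in code is scikit-learn's `StackingClassifier`, which trains a final model on the cross-validated predictions of several base models. The sketch below is an assumption about how that could look, using synthetic data and an arbitrary choice of base models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic, imbalanced stand-in data; not the course dataset
X, y = make_classification(n_samples=3000, n_features=10, weights=[0.9],
                           random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=2)

# Base models whose predictions become the inputs of the final model
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=2)),
    ("gnb", GaussianNB()),
]

stacked_model = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5)  # base-model predictions are produced out-of-fold
stacked_model.fit(X_train, y_train)
predicted = stacked_model.predict(X_test)
```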



Why use ensemble methods for fraud detection

Ensemble methods:

Are robust
Can help you avoid overfitting
Can typically improve prediction performance
Are a winning formula at prestigious Kaggle competitions

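Voting Classifier

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()

# Hard voting: each model casts one vote and the majority class wins
ensemble_model = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)],
                                  voting='hard')

ensemble_model.fit(X_train, y_train)
ensemble_model.predict(X_test)

# Soft voting: average the predicted probabilities, here with double weight on the logistic regression
VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)],
                 voting='soft', weights=[2, 1, 1])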
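
The two voting modes can be compared end to end with the sketch below. It assumes synthetic data rather than the course dataset, and the helper name `make_estimators` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic, imbalanced stand-in data; not the course dataset
X, y = make_classification(n_samples=3000, n_features=10, weights=[0.9],
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

def make_estimators():
    # Fresh, unfitted copies of the three models for each ensemble
    return [("lr", LogisticRegression(max_iter=1000, random_state=1)),
            ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
            ("gnb", GaussianNB())]

hard_model = VotingClassifier(estimators=make_estimators(), voting="hard")
soft_model = VotingClassifier(estimators=make_estimators(), voting="soft",
                              weights=[2, 1, 1])

predictions = {}
for name, ensemble in (("hard", hard_model), ("soft", soft_model)):
    predictions[name] = ensemble.fit(X_train, y_train).predict(X_test)
    print(name, "recall:", round(recall_score(y_test, predictions[name]), 3))
```

Soft voting requires every member to implement `predict_proba`, which is why the earlier `SVC` example sets `probability=True`.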

Reliable labels for fraud detection



Let's practice
