Fraud Detection in Python: Chapter 2
Review of classification methods for fraud detection
Charlotte Werger
Data Scientist
DataCamp Fraud Detection in Python
What is classification?
Examples:
Variable to predict: y ∈ {0, 1} (0 = legitimate, 1 = fraud)
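from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
predicted = model.predict(X_test)
print(accuracy_score(y_test, predicted))  # share of correct test predictions
0.991324200913242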
Let's practice!
Measuring fraud detection performance
Precision-Recall Curve
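A precision-recall curve is built from predicted probabilities rather than hard labels. The sketch below is one way to compute the points of such a curve with scikit-learn. The data is a synthetic, imbalanced stand-in generated with make_classification (an assumption for illustration, not the course data set), and the model settings are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (roughly 5% positives); not the course data.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Probability of the positive (fraud) class for each test sample.
probs = model.predict_proba(X_test)[:, 1]

# One (precision, recall) pair per candidate threshold, plus a final end point.
precision, recall, thresholds = precision_recall_curve(y_test, probs)

# Single-number summary of the curve.
average_precision = average_precision_score(y_test, probs)
print(f"Average precision: {average_precision:.3f}")
```

Plotting recall against precision (for example with matplotlib) gives the curve itself. Each point corresponds to one probability threshold.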
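from sklearn.metrics import confusion_matrix
# Obtain predictions
predicted = model.predict(X_test)
# Print the confusion matrix (rows: actual class, columns: predicted class)
print(confusion_matrix(y_test, predicted))
[[2096    3]
 [  18   73]]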
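The four counts in the confusion matrix are enough to work out precision, recall, and F1 by hand. The sketch below does that arithmetic, assuming scikit-learn's layout of [[TN, FP], [FN, TP]]. It also computes the accuracy of a naive baseline that labels every transaction as legitimate, which shows why accuracy alone says little on imbalanced data.

```python
# Counts copied from the confusion matrix above.
# scikit-learn's layout is [[TN, FP], [FN, TP]].
tn, fp = 2096, 3
fn, tp = 18, 73

total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # of the flagged cases, how many were fraud
recall = tp / (tp + fn)      # of the fraud cases, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

# Baseline: predict "not fraud" for every transaction.
baseline_accuracy = (tn + fp) / total

print(f"accuracy:          {accuracy:.4f}")
print(f"precision:         {precision:.4f}")
print(f"recall:            {recall:.4f}")
print(f"F1:                {f1:.4f}")
print(f"baseline accuracy: {baseline_accuracy:.4f}")
```

The baseline already reaches about 96% accuracy without catching a single fraud case, while recall shows that this model misses 18 of the 91 fraud cases.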
Let's practice!
Adjusting your algorithms for fraud detection
Balanced weights
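from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
# Three alternatives; each weights classes inversely to their frequency
model = RandomForestClassifier(class_weight='balanced')
model = RandomForestClassifier(class_weight='balanced_subsample')
model = LogisticRegression(class_weight='balanced')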
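To make the 'balanced' option concrete, the sketch below reproduces the formula that the scikit-learn documentation gives for it: n_samples / (n_classes * count of each class). The label list is a hypothetical toy example, not course data.

```python
from collections import Counter

# Hypothetical toy labels: 95 legitimate (0) and 5 fraudulent (1) transactions.
y_train = [0] * 95 + [1] * 5

counts = Counter(y_train)
n_samples = len(y_train)
n_classes = len(counts)

# Formula documented for class_weight='balanced':
# weight_c = n_samples / (n_classes * count_c)
weights = {label: n_samples / (n_classes * count)
           for label, count in counts.items()}

for label, weight in sorted(weights.items()):
    print(f"class {label}: weight {weight:.4f}")
```

The rare class ends up with a weight about 19 times larger than the common one, so a misclassified fraud case costs the model correspondingly more during training.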
# Commonly tuned random forest settings; 'sqrt' is what older releases called 'auto'
model = RandomForestClassifier(n_estimators=10,
                               criterion='gini',
                               max_depth=None,
                               min_samples_split=2,
                               min_samples_leaf=1,
                               max_features='sqrt',
                               n_jobs=-1, class_weight=None)
Using GridSearchCV
from sklearn.model_selection import GridSearchCV
{'bootstrap': True,
'max_depth': 80,
'max_features': 3,
'min_samples_leaf': 5,
'min_samples_split': 12,
'n_estimators': 100}
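The dictionary above has the shape of what GridSearchCV exposes as best_params_ after fitting. The sketch below shows one way such a search could be set up. The parameter grid, the f1 scoring choice, and the synthetic data from make_classification are all assumptions made to keep the example small and runnable; they are not the grid that produced the values above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in data (roughly 10% positives); not the course data.
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# Hypothetical, deliberately small grid so the search finishes quickly.
param_grid = {
    "n_estimators": [10, 30],
    "max_depth": [4, 8],
    "min_samples_leaf": [1, 5],
}

model = RandomForestClassifier(class_weight="balanced", random_state=42)

# Try every combination with 3-fold cross-validation, scoring each on F1
# instead of accuracy because the classes are imbalanced.
grid_search = GridSearchCV(estimator=model, param_grid=param_grid,
                           cv=3, scoring="f1")
grid_search.fit(X, y)

print(grid_search.best_params_)
print(round(grid_search.best_score_, 3))
```

The number of fits grows with the product of the grid sizes times the number of folds (here 2 x 2 x 2 x 3 = 24), so larger grids can become slow.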
Let's practice!
Using ensemble methods to improve fraud detection
Ensemble methods:
Are robust
Can help you avoid overfitting
Can typically improve prediction performance
Are a winning formula at prestigious Kaggle competitions
Voting Classifier
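from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()
# Combine the three classifiers; 'hard' voting takes the majority label
ensemble_model = VotingClassifier(
    estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)], voting='hard')
ensemble_model.fit(X_train, y_train)
ensemble_model.predict(X_test)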
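VotingClassifier uses hard voting by default: each classifier predicts a label, and the most common label wins. The sketch below imitates that rule in plain Python. The three prediction lists are hypothetical values chosen for illustration, not output from real models.

```python
from collections import Counter

# Hypothetical predictions from three fitted classifiers for five transactions.
predictions = {
    "lr":  [0, 1, 0, 0, 1],
    "rf":  [0, 1, 1, 0, 0],
    "gnb": [1, 1, 0, 0, 0],
}


def majority_vote(labels):
    """Return the label predicted by most classifiers."""
    return Counter(labels).most_common(1)[0][0]


# zip(*...) groups the three votes for each transaction together.
ensemble_prediction = [majority_vote(votes)
                       for votes in zip(*predictions.values())]
print(ensemble_prediction)  # [0, 1, 0, 0, 0]
```

Only the second transaction is flagged, because it is the only one where at least two of the three classifiers predict fraud. With an odd number of classifiers and two classes, ties cannot occur.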
Let's practice!