
Artificial intelligence and business analytics

Three stages of business analytics

 Descriptive analytics
 Predictive analytics
 Prescriptive analytics
Descriptive analytics

Image from ICONS Library


Descriptive analytics and data visualization

 Descriptive analytics
 Analyzes past data to describe it and reveal clear patterns and trends
 Data visualization
 Tells an interesting, insightful story
 Is intuitive to understand
 Tells the truth: the intuitive interpretation is accurate, not misleading
 It does NOT apply the tricks of "How to Lie with Statistics"

Image from ICONS Library


AI-powered descriptive analytics
with Microsoft Excel

Excel AI - data analysis made easy - YouTube


US National Medical Expenditure Survey (NMES) dataset
Variables Description
hospital_stays Number of hospital stays. (Prediction target)
health Self-perceived health status, levels are "poor", "average" (reference category), "excellent".
chronic_illnesses Number of chronic conditions.
active Whether the individual has a condition that limits activities of daily living ("limited") or not ("normal").
region Region; levels are northeast, midwest, west, other (reference category).
age Age in years.
black Is the individual African-American?
gender Female or male.
married Is the individual married?
years_education Number of years of education.
income Family income in USD.
employed Is the individual employed?
private_insurance Is the individual covered by private insurance?
medicaid Is the individual covered by Medicaid?

4,406 patients; 14 variables
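As a quick illustration of descriptive analytics on this dataset, the pandas sketch below computes a few simple summaries. The file name nmes.csv and the CSV export are assumptions made for illustration; only the column names come from the table above.

```python
# A minimal descriptive-analytics sketch for the NMES data with pandas.
# "nmes.csv" is a hypothetical export of the dataset described above.
import pandas as pd

df = pd.read_csv("nmes.csv")

print(df.shape)                                       # expect (4406, 14)
print(df["hospital_stays"].describe())                # summary of the prediction target
print(df.groupby("health")["hospital_stays"].mean())  # average stays by health status
```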


Predictive analytics

Image source: ICONS Library


Predictive analytics

 Predictive analytics: analyze the past to predict the future, assuming that the
future will continue to resemble the past
 Primary focus: high-accuracy estimation or prediction of the target outcome
 Input factors that lead to the outcome are incidental
 Legitimate uses for predictive analytics
 Prioritize: managers can understand how things work and then choose priorities
 Anticipate: for outcomes over which managers have very little control
 Time series forecasting: genuine time-based prediction
 Benchmark: for outcomes over which managers have some control
 Establish a benchmark to beat for prescriptive analytics if nothing changes
 Automated machine learning (AutoML) usually gives the best possible results
Image source: ICONS Library
Predictive analytics

Eric Siegel - Predictive Analytics - 7-minute keynote sample - YouTube


Prescriptive analytics

Image source: ICONS Library


Prescriptive analytics

 Prescriptive analytics: analyze the past for guidance on how to intervene to ensure that the future will be better than the past
 Primary focus: input features (independent variables)
 What factors result in the target outcome?
 A good estimation model with high interpretability is more valuable than
an excellent estimation model with low interpretability
 Goal is to intervene to improve the target outcomes, not to reproduce them
 Interpretation of input feature influence to suggest priorities for intervention
 Optimize and simulate outcomes based on input features to precisely suggest
actions
 Benchmark predictive models can be used to simulate the effects of prescribed
actions

Image source: ICONS Library


Accurate predictions: important but limited

 Accurate predictions are inherently valuable


 Target variables are important in themselves
 Achieved through state-of-the-art machine learning techniques
 Simulations
 Can effectively estimate the effects of various values on the target variable
 Simulation of the full range of possible values can provide guidance for action in
multiple scenarios
 Disadvantages
 Managerial confidence is limited, leading to low buy-in
 No human learning happens through the process
Meaningful explanations for prescriptive analytics

 Managerial action is guided by an understanding of how and why various variables are related to the target variable
 Without explanation, it is hard to know when and why predictive rules change
 Meaningful explanations are achieved through explainable AI (XAI) approaches
 Also called “interpretable machine learning (IML)”
 Simulations can also be helpful
 But not all explanations are equally actionable
 Even if you understand what causes the outcome, what can you do about it?
 It is crucial to distinguish the analytical “importance” of variables from their managerial
actionability
Explainable AI (XAI)
Explainable AI (XAI)

What is Explainable AI? - YouTube


Stakeholders of XAI

 Managers or product owners
 Care about the fit of a model with business strategy and purpose, ensuring product efficiency and developing new functionalities
 Users
 Affected by the model decisions and consumers who require transparency
about how decisions are taken and how they might be potentially affected by
those systems
 Developers of AI
 Developers, data scientists, model risk analysts, and other AI specialists who
benefit when building a model or who look for ways to improve performance
 Regulators
 Inspect the reliability and legal compliance of a model, as well as the impact of
its decisions on customers and users
Various goals of the stakeholders of XAI

 Transparency
 Ethical responsibility
 Realism
 Actionability
XAI for Transparency

 Transparency refers to XAI that explains in human-understandable terms how the AI arrives at its results.

 Feature importance analysis is a useful global explainability technique.
 Transparency helps debugging and refinement of models.
 Evaluating XAI based on system functionality is more objective and
less costly than human experiments.
 Explainable AI techniques should be preferred when possible.
XAI for Ethical responsibility

 Ethical responsibility refers to XAI that provides information to assure that the AI
respects human values of fairness, free will, liberty, privacy, and so on.

 Industries and jurisdictions with strong regulatory requirements governing the treatment of consumers implicitly or explicitly require documentation of how AI complies with such requirements.
 Determining the appropriate level of AI explainability requires a cost-benefit analysis of how much benefit an opaque AI might offer compared to the costs of explainability.
 The appropriate level of XAI depends on contextual factors, technical solutions, and the explanation output choices.
 An action plan should be implemented specifically to ensure that the explainability of black-box models is not overlooked.

Image source: Start Your Business Magazine


XAI for Realism

 Realism refers to XAI that explains how the AI reflects a realistic representation of the real-world scenario that the AI models.

 A good explanation helps to build a mental model of the AI that corresponds with the needs in the actual scenario.
 For scientific knowledge learning from ML, explanations are needed beyond simply accurate models.
 XAI for realism should prioritize human-grounded evaluations that test actual human perceptions of explainability.

Image source: pxfuel


XAI for Actionability

 Actionability refers to XAI that explains how the results of the AI carry implications for recommended actions.

 Understanding the causal relationships between input and target variables is the foundation of XAI for actionability.
 Incorporating XAI for actionability can be very helpful in the AI development process to ensure more effective AI systems, including protecting against data vulnerabilities and adversarial attacks.
 Increasing XAI for actionability can increase buy-in for AI adoption.

Image source: ICONS Library


Actionable explanation
Process of actionable explanation

Step 1: Gather all concepts available
Step 2: Classify concepts according to their ultimate importance and time frame
Step 3: Conduct analysis according to various appropriate methodologies
Step 4: After analysis, classify concepts according to actionable explanation types
Key attributes of concepts in actionable explanation

 Relevance
 Ultimate
 Relevant
 Not relevant
 Control
 High
 Low
 No
Relevance of concepts:
Ultimate

 Every project must have at least one distinct Ultimate concept
 Usually only one Ultimate, but there might be more than one
 Target or label in supervised learning
 Either highly desirable or highly undesirable
 Numeric: either maximize or minimize it
 Categorical: certain categories are preferred while others are to be avoided
 Managerial implications:
 Explain the effects of antecedent concepts
Relevance of concepts:
Relevant
 Studied and confirmed to be relevant in affecting the Ultimate
 Managerial implications:
 If controllable, managers should take action to shape the
concept in their favour
 If not controllable, managers should observe, anticipate and
react to the changes in the concept
Relevance of concepts:
Not relevant
 Studied and confirmed to be irrelevant in affecting the Ultimate
 Very valuable to be able to confidently designate a concept as
Not Relevant
 Managers can deemphasize such concepts so that they do not waste
resources (time, personnel, etc.) on their observation or control (at
least, as far as the current project is concerned)
 But what is Not Relevant today may become Relevant tomorrow
 Whenever the model is reassessed, concepts previously categorized
as Not Relevant should always be reverified
 Frequency of reverification depends on the cost of data collection
 Managerial implications:
 Do not waste resources on their observation or intervention
 Periodically verify if they continue to be Not Relevant
Controllability of concepts

 The extent to which managers can intervene to take action to shape the concept in their favour
 The nature of the relationships determines the nature of the
influence
 Might interact with other concepts in the model
 High control
 Managers have very much or total influence on the values of
the concept
 Low control
 Managers have some influence on the values of the concept,
but much of it depends on factors beyond the managers’
actions
 No control
 Managers have no influence at all on the values of the concept
Managerial implications of controllability of concepts

 High control
 It is managers’ fundamental responsibility to shape the concept
such that the Ultimate changes in the desired direction
 Low or no control
 Although managers cannot directly control the values of the
concept, they should measure and observe its values
 Thus, managers should understand its effects and anticipate
them proactively
 Managers should identify effective responses to take based on
changes in observed values
Introducing the subject: a short video of an AI that failed
AI Camera Ruins Soccer Game For Fans After Mistaking Referee's Bald Head For Ball | IFLScience
AI Camera Ruins Soccer Game For Fans After Mistaking
Referee's Bald Head For Ball - YouTube
Performance Metrics in Machine Learning

1. What is Machine Learning?
2. What are the different kinds of algorithms in Machine Learning? Examples of use.
3. Supervised Machine Learning Methodology
4. Underfitting and Overfitting – Bias and Variance
5. Performance Metrics for Regression
6. Performance Metrics for Binary Classification
7. Fairness Metrics
8. Conclusion
Machine learning algorithms build a model based on sample data, known
as training data, in order to make predictions or decisions without being
explicitly programmed to do so.

Machine Learning Vs Traditional Programming – Avenga


Example: Escape the Maze

Traditional programming: Data + Program (a hand-coded heuristic) → Output
(Escape the Maze - Python Coding - Day 6 - Southernfolks Designs)

Machine learning: Data + Output → Program (a learned model)
(qmaze (samyzaf.com))
Simplified history of AI, ML and DL
(Figure: nested fields, from AI down to Generative AI)
https://towardsdatascience.com/notes-on-artificial-intelligence-ai-machine-learning-ml-and-deep-learning-dl-for-56e51a2071c2
When do you need Machine Learning?

Hilarious Data Science Jokes on the Web to Lighten Your Mood (analyticsinsight.net)
Applications of Machine Learning
What are the different kinds of algorithms in Machine Learning?

Machine Learning Algorithms In Layman’s Terms, Part 1 | by Audrey Lorberfeld | Towards Data Science
Supervised Machine Learning: Regression

 Predicting a continuous-valued attribute associated with an object.
A few examples of use
1. Stock market prediction Machine learning regression algorithms can analyse historical stock data, market trends, and various factors influencing
stock prices to forecast future values. This information can help investors make informed decisions about buying or selling stocks.
2. Sales forecasting Regression models can be used to predict future sales based on historical sales data, advertising expenditures, pricing strategies,
and other relevant factors. This information is valuable for businesses in planning inventory, managing resources, and setting sales targets.
3. Demand forecasting Regression models can analyse historical data on product demand, weather patterns, economic indicators, and other
variables to forecast future demand. This information is useful for supply chain management, production planning, and optimizing inventory levels.
4. Real estate price prediction Regression models can consider factors such as location, property size, number of bedrooms, nearby amenities, and
historical sales data to predict property prices. This information is valuable for real estate agents, buyers, and sellers to assess property values
accurately.
5. Energy consumption prediction Regression models can analyse historical energy consumption patterns, weather conditions, and other relevant
variables to predict future energy consumption. This information can assist energy providers in optimizing resource allocation, managing peak loads,
and improving energy efficiency.
6. Risk assessment in insurance Regression models can be used to assess the risk associated with an insurance policyholder based on various factors
such as age, health status, driving records, and previous claims. This information helps insurance companies set appropriate premiums and
determine policy terms.
7. Credit scoring Regression models can analyse a person's credit history, income, employment status, and other relevant factors to predict their
creditworthiness. This information is crucial for banks and financial institutions when evaluating loan applications and determining interest rates.
8. Demand for ride-sharing services Regression models can predict the demand for ride-sharing services based on historical trip data, time of day,
weather conditions, and other factors. This information helps ride-sharing companies optimize driver allocation, estimate fare prices, and reduce
passenger waiting times.
(Figure: linear separation vs. non-linear separation)

Supervised Machine Learning: Classification

 Identifying which category an object belongs to.
A few examples of use
1. Email spam filtering Classification models can be trained to analyse email content and classify incoming emails as either spam or legitimate. The
model learns from labelled examples and can accurately predict whether an email is spam, helping in managing email inboxes effectively.
2. Sentiment analysis Classification algorithms can be used to analyse text data, such as customer reviews or social media posts, and classify them
into positive, negative, or neutral sentiment categories. This information is valuable for understanding public opinion, brand monitoring, and
customer feedback analysis.
3. Fraud detection Classification models can be trained on historical data to identify patterns and indicators of fraudulent activities in credit card
transactions, insurance claims, or online transactions. This helps in detecting and preventing fraudulent behaviour, saving businesses from financial
losses.
4. Disease diagnosis Classification algorithms can assist in medical diagnosis by analysing patient data, symptoms, and test results to classify
individuals into different disease categories. This information can aid healthcare professionals in making accurate diagnoses and providing
appropriate treatments.
5. Image recognition Classification models can be trained on labelled image datasets to classify images into various categories or objects. This is
useful in applications such as facial recognition, object detection, autonomous driving, and quality control in manufacturing.
6. Customer segmentation Classification techniques can group customers based on their demographics, purchasing behavior, or other relevant
features. This segmentation helps businesses tailor their marketing strategies, personalize recommendations, and optimize customer experiences.
7. Document categorization Classification models can automatically classify documents into different categories, such as news articles, legal
documents, or research papers. This enables efficient organization and retrieval of large document collections.
Dataset
 Individuals are represented by rows of features and a target value
 Features X: a matrix of values for individuals
 Target y: a vector of values for individuals

Regression model
 The model predicts a target from the features
 The target is a continuous variable: e.g., physical value, price, score…

Binary classification model
 The model predicts a target from the features
 The target is one of two classes: e.g., healthy/diseased, won't churn/will churn, not fraud/fraud…
Supervised learning methodology
 ❶ Randomly split the dataset (100%) into a training set (~75%) and a test set (~25%)
 ❷ Train the model using the features X and target y of the training set
 ❸ Use the model to predict ŷ from the test-set features X: ŷ = model(X)
 ❹ Evaluate the model by comparing y and ŷ on the test set
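A minimal sketch of steps ❶ to ❹ with scikit-learn; the synthetic dataset and the choice of a linear model are illustrative assumptions, not part of the slides.

```python
# Steps 1-4 of the supervised learning methodology, sketched with scikit-learn.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)

# Step 1: randomly split into ~75% training and ~25% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 2: train the model using X and y of the training set
model = LinearRegression().fit(X_train, y_train)

# Step 3: use the model to predict y_hat on the test set
y_hat = model.predict(X_test)

# Step 4: evaluate the model by comparing y and y_hat
print(mean_squared_error(y_test, y_hat))
```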
Example: Real Estate dataset for regression
Features X and target y (Real estate price prediction | Kaggle)
 ❷ Train the regression model using X and y
 ❸ Use the model to predict ŷ from X: ŷ = model(X)
 ❹ Measure performance by comparing the numbers y and ŷ
Example of failure in real estate:
The $500mm+ Debacle at Zillow Offers – What Went Wrong with the AI Models? - insideBIGDATA
Example: Titanic dataset for classification
Features X and target y (Titanic - Machine Learning from Disaster | Kaggle)
 ❷ Train the classification model using X and y
 ❸ Use the model to predict ŷ from X: ŷ = model(X)
 ❹ Measure performance by comparing the classes y and ŷ
Global methodology: CRISP-DM
• CRoss-Industry
• Standard Process
• for Data Mining
What data scientists spend the most time doing:
Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says (forbes.com)
Data Cleaning
• Garbage In
• Garbage Out
What are the advantages of pre-processing and cleaning data before modelling?

1. Data Quality High-quality data leads to better model performance. Data cleaning helps identify and rectify errors, outliers, and missing values,
ensuring the data is accurate and reliable.
2. Model Performance Cleaned data leads to more robust and accurate models. Outliers and noisy data can mislead machine learning
algorithms, resulting in subpar predictions.
3. Reduced Bias Biased data can lead to biased models. Data cleaning helps in mitigating biases, ensuring that the model provides fair and
unbiased predictions.
4. Feature Engineering Data cleaning often involves feature engineering, where irrelevant or redundant features are removed, and relevant ones
are transformed or created. This enhances the model's ability to extract meaningful patterns.
5. Computational Efficiency Cleaning data reduces its complexity, making it easier and faster for algorithms to process. This is especially
important when dealing with large datasets.
6. Improved Interpretability Cleaned data is more interpretable. When stakeholders understand the data, they can better trust and use the
model's output.
7. Generalization Data cleaning improves a model's ability to generalize from the training data to new, unseen data. This ensures the model
performs well in real-world scenarios.
8. Error Reduction By addressing data quality issues beforehand, the likelihood of errors in predictions is significantly reduced, leading to more
reliable results.
This presentation uses
Underfitting and Overfitting in Regression
 Underfitting: the model is too simple to be able to learn anything from the data.
 Good fit: the model can generalize properly from the data.
 Overfitting: the model is too complex and was able to learn the noise in the data.

Overfitting and Underfitting – The Correlation


Underfitting and Overfitting in Classification
 Underfitting: the model is too simple to be able to learn anything from the data.
 Good fit: the model can generalize properly from the data.
 Overfitting: the model is too complex and was able to learn the noise in the data.

Underfitting and Overfitting in Machine Learning – C#, Python, Data Structure , Algorithm and ML (wordpress.com)
Definitions

Bias
• The bias measures how well the average prediction of a model matches the true value.
• It quantifies the systematic error of the model.
• Formula (the expectation is taken over models trained on different training sets): $\mathrm{Bias} = \mathbb{E}[\hat{y}] - y$

Variance
• The variance measures the variability of the model's predictions for different training sets.
• It quantifies the model's sensitivity to changes in the training data.
• Formula: $\mathrm{Variance} = \mathbb{E}\big[(\hat{y} - \mathbb{E}[\hat{y}])^2\big]$
Geometric perspective
(Figure: bias and variance illustrated geometrically. By Bernhard Thiery - Bias–variance tradeoff - Wikipedia)
Model perspective and performance

 Low Bias, High Variance: the model is accurate but inconsistent → Overfitting (good performance on the training set, poor performance on the test set)
 High Bias, High Variance: the model is inaccurate and inconsistent → Worst model
 Low Bias, Low Variance: the model is accurate and consistent → Appropriate fitting (good performance on both training and test sets)
 High Bias, Low Variance: the model is consistent but inaccurate → Underfitting (poor performance on both training and test sets)
Bias-Variance Trade-off
A good model should be neither too simple nor too complex to provide good results on both training and test sets.
(Figure: as model complexity increases, the error on the training set keeps decreasing, while the error on the test set first decreases and then increases again; the left region is underfitting, the right region is overfitting.)

Decoding Bias Variance Tradeoff. Bias — An error that on avergae tells… | by Saurav Agrawal | Jun, 2023 | Medium
How to fix Underfitting (reducing bias)
 more data (dataset +)
 feature engineering
 less regularization (-)
 increase model complexity
 cross-validate hyperparameters
 train longer
 …

How to fix Overfitting (reducing variance)
 more data for training (train set +)
 feature selection
 more regularization (+)
 decrease model complexity
 cross-validate hyperparameters
 early stop
 …
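As a sketch of two of these levers (model complexity and regularization), the code below fits the same high-degree polynomial with and without a Ridge penalty; the data and hyperparameters are made up for illustration.

```python
# Overfitting demo: a degree-15 polynomial without regularization vs. with Ridge.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "degree 15, no regularization": make_pipeline(PolynomialFeatures(15), LinearRegression()),
    "degree 15, ridge (alpha=1.0)": make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name,
          "| train MSE:", round(mean_squared_error(y_tr, model.predict(X_tr)), 3),
          "| test MSE:", round(mean_squared_error(y_te, model.predict(X_te)), 3))
```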
To evaluate a regression model, we need to compute the error due to the gap between y and ŷ

It's important to note that the choice of metric depends on the specific problem, the nature of the data, and
the context in which the model will be used. It is recommended to consider the characteristics of your data
and the objectives of your analysis to select the most appropriate metric for your regression task.

We are going to deal with 3 metrics:


- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
Mean Squared Error
(MSE)
MSE calculates the average of the squared
differences between the predicted and actual
values. It gives higher weights to larger errors due
to the squaring operation. MSE is widely used as a
general-purpose metric for regression and is
sensitive to outliers. It is suitable when large
errors are particularly important and need to be
penalized heavily.
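For reference, the standard formula (the slide's formula image is not reproduced here): $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$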
Root Mean Squared
Error (RMSE)
RMSE is the square root of the MSE and provides
an interpretable measure in the same units as the
target variable. It is commonly used when you
want to report the average error in the original
scale of the data and want to penalize large
errors.
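For reference: $\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$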
Mean Absolute Error
(MAE)
MAE calculates the average of the absolute
differences between the predicted and actual
values. Unlike MSE, it does not involve squaring,
which makes it less sensitive to outliers. MAE is
useful when you want a metric that is more robust
to extreme values and when you want to focus on
the magnitude of errors rather than their
direction.
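For reference: $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert$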
Example: Real Estate dataset

sample:
 y      ŷ
 43.2   48.0
 40.8   41.4
 27.7   30.6
 34.2   30.3
 …      …

metrics   train   test 1
MSE       66.98   108.53
RMSE      8.18    10.42
MAE       6.08    6.31
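The sketch below recomputes the three metrics with scikit-learn on the four sample pairs shown above; since the table's values were computed on the full train and test sets, the numbers will not match.

```python
# Computing MSE, RMSE and MAE on the small sample above.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

y     = np.array([43.2, 40.8, 27.7, 34.2])
y_hat = np.array([48.0, 41.4, 30.6, 30.3])

mse = mean_squared_error(y, y_hat)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y, y_hat)
print(f"MSE={mse:.2f}  RMSE={rmse:.2f}  MAE={mae:.2f}")
```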
Example: Real Estate dataset with an outlier

sample:
 y       ŷ
 43.2    48.0
 40.8    41.4
 27.7    30.6
 34.2    30.3
 …       …
 117.5   41.4   ← outlier

What is an outlier in real estate?
Example: Real Estate dataset without the outlier

sample (outlier removed):
 y      ŷ
 43.2   48.0
 40.8   41.4
 27.7   30.6
 34.2   30.3
 …      …

metrics   train   test 1   test 2
MSE       66.98   108.53   53.41
RMSE      8.18    10.42    7.31
MAE       6.08    6.31     5.63
A few other metrics in regression

- R-squared (R²) measures the proportion of the variance in the target variable that can be explained by the regression model. It ranges from 0 to 1, where 1 indicates a perfect fit. R-squared is often used to assess how well the model captures the variability of the data. However, it does not provide information about the magnitude or direction of errors.
- Mean Percentage Error calculates the average percentage difference
between the predicted and actual values. It is commonly used in
forecasting problems and provides a measure of relative error. MPE is
useful when you want to assess the average percentage deviation of the
predictions from the actual values.
- Mean Absolute Percentage Error is like MPE but takes the absolute value
of the percentage differences. It provides a measure of the average
relative error and is commonly used when the scale of the data varies
widely across different samples.
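For reference, the standard definitions of these three metrics: $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, $\mathrm{MPE} = \frac{100\%}{n}\sum_{i=1}^{n}\frac{y_i - \hat{y}_i}{y_i}$, $\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert$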
It is recommended to use a combination of metrics to capture a comprehensive understanding of the model's strengths and weaknesses. The most commonly used are the three presented above: MSE, RMSE and MAE.

MSE – Mean Squared Error
 + penalizes large errors
 - sensitive to outliers
 - not in the original unit

RMSE – Root Mean Squared Error
 + penalizes large errors
 + in the original unit
 - sensitive to outliers

MAE – Mean Absolute Error
 + less sensitive to outliers
 + in the original unit
 - treats positive and negative errors equally

R² – R-squared
 + shows how well the model fits the data
 - does not capture the accuracy of individual predictions
 - can give false information when overfitting

MPE – Mean Percentage Error
 + computes the average magnitude of errors relative to the true values
 - sensitive to outliers
 - sensitive to near-zero actual values

MAPE – Mean Absolute Percentage Error
 + easy to interpret
 + used in forecasting and time series analysis
 - sensitive to near-zero actual values
Example of other metrics with the real estate dataset:

metrics   train   test 1   test 2
MSE       66.98   108.53   53.41
RMSE      8.18    10.42    7.31
MAE       6.08    6.31     5.63
R²        0.60    0.53     0.68
MPE       -0.05   -0.05    -0.05
MAPE      0.19    0.19     0.19
Perfect classifier
A perfect classifier separates the dataset exactly into its Positive and Negative instances.

Standard classifier
A standard classifier splits the dataset into four regions: True Positives, False Positives, False Negatives, and True Negatives.
Contingency table

                        Observation
                        Positive         Negative
Prediction  Positive    True Positive    False Positive
            Negative    False Negative   True Negative

False Positive vs False Negative

                        Observation
                        Positive                    Negative
Prediction  Positive    True Positive               False Positive (noise)
            Negative    False Negative (silence)    True Negative

A False Positive is noise: it appears in the selection and is visible to users. A False Negative is silence: it is missing from the selection and invisible to users.
Accuracy
Accuracy is a common performance metric used to
evaluate the performance of classification models. It
measures the proportion of correctly predicted
instances out of the total number of instances in a
dataset. Accuracy is particularly useful when the classes
in the dataset are balanced, meaning they have roughly
equal representation.

$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
Precision
Precision is a performance metric used to evaluate the
quality of a classification model, particularly in binary
classification problems. Precision measures the
proportion of correctly predicted positive instances
(true positives) out of the total instances predicted as
positive (true positives plus false positives). It provides
an indication of how well the model predicts positive
instances and how likely it is to correctly identify them.

$\mathrm{Precision} = \frac{TP}{TP + FP}$
Recall
Recall, also known as sensitivity or true positive rate, is
a performance metric used to evaluate the
effectiveness of a classification model, particularly in
binary classification problems. Recall measures the
proportion of correctly predicted positive instances
(true positives) out of the total actual positive instances
(true positives plus false negatives). It quantifies the
ability of the model to identify all positive instances
correctly.

$\mathrm{Recall} = \frac{TP}{TP + FN}$
Example: Titanic dataset with a very simple classifier
Let us implement a very simple classifier based on the feature gender: female → survived, male → not survived.
Let us compute the metrics on the whole dataset, just to get a sense of the numbers.

                                      Observation
                                      Survived   Not survived
Prediction  Survived (female)         233        81
            Not survived (male)       109        468

Metrics     Score 1
Accuracy    0.787
Precision   0.742
Recall      0.681
Example: Titanic dataset with another simple classifier
Let us implement another simple classifier based on the features gender (as above) and infancy (below 6 years old): female or infant → survived, male and not infant → not survived.
Let us compute the metrics again on the whole dataset.

                                                Observation
                                                Survived    Not survived
Prediction  Survived (female or infant)         248 (+15)   89 (+8)
            Not survived (male and not infant)  94 (-15)    460 (-8)

Metrics     Score 1   Score 2
Accuracy    0.787     0.795
Precision   0.742     0.736
Recall      0.681     0.725
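The scores in both tables can be reproduced directly from the confusion-matrix counts; the small sketch below does exactly that (counts taken from the tables above).

```python
# Reproduce accuracy, precision and recall from the confusion-matrix counts
# of the two simple Titanic classifiers.
def scores(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

classifiers = {"gender only": (233, 81, 109, 468),
               "gender + infancy": (248, 89, 94, 460)}
for name, counts in classifiers.items():
    acc, prec, rec = scores(*counts)
    print(f"{name}: accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```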
(Figure: densities of the two classes under random selection. Source: A Survey of Online Failure Prediction Methods (researchgate.net))
A few other metrics in binary classification

- F1-Score is the harmonic mean of precision and recall, combining both metrics into a single value. It provides a balanced measure that considers both false positives and false negatives. The F1 score ranges from 0 to 1, where 1 indicates perfect precision and recall.
- Specificity, also known as true negative rate, measures the proportion of
correctly predicted negative instances (true negatives) out of the total
actual negative instances (true negatives plus false positives). It quantifies
the ability of the model to correctly identify negative instances.
- Area Under the ROC Curve (AUC-ROC) is a performance metric that
measures the area under the Receiver Operating Characteristic (ROC)
curve. The ROC curve plots the true positive rate (sensitivity) against the
false positive rate (1 - specificity) at various classification thresholds. A
higher AUC-ROC indicates better discrimination ability of the model.
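For reference, the standard formulas behind the first two of these metrics: $F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ and $\mathrm{Specificity} = \frac{TN}{TN + FP}$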
It is recommended to use a combination of metrics to capture a comprehensive understanding of the
model’s performance and trade-offs. It is crucial to consider the consequences of misclassifications.

Accuracy
 + simple to understand and interpret
 - sensitive to unbalanced classes

Precision
 + focuses on the correctness of positive predictions
 - does not consider false negatives

Recall
 + focuses on the correctness of positive instances
 - does not consider false positives

F1-Score
 + balances the trade-off between precision and recall
 - implies equal weight on false positives and false negatives

Specificity (True Negative Rate)
 + focuses on the correctness of negative instances
 - does not consider false negatives

AUC (Area Under the ROC Curve)
 + aggregated measure of the model's performance
 + focuses on the model's ability to rank instances correctly
 - less interpretable
 - sensitive to unbalanced classes
Example of other metrics with the Titanic dataset:

Metrics       Score 1   Score 2
Accuracy      0.787     0.795
Precision     0.742     0.736
Recall        0.681     0.725
F1 Score      0.710     0.730
Specificity   0.852     0.838
AUC-ROC       0.770     0.780


High-precision algorithms minimize false positives, which are visible to users in the selection.
(Figure: a high-precision classifier keeps False Positives (noise) to a minimum.)
High Precision Algorithms

1. Fraud Detection ML models can be trained to identify fraudulent activities, such as credit card fraud or online transaction fraud, by analyzing
patterns and anomalies in the data. High precision is important in this case to minimize false positives and ensure that genuine transactions are
not mistakenly flagged as fraudulent.
2. Medical Diagnosis ML models can assist in diagnosing medical conditions based on patient data, including symptoms, medical history, and test
results. Achieving high precision is essential to avoid misdiagnosis and ensure accurate identification of diseases or conditions.
3. Image Classification ML models can classify images into different categories, such as recognizing objects, identifying faces, or detecting specific
features. High precision is crucial to minimize misclassifications and ensure accurate categorization of images.
4. Sentiment Analysis ML models can analyze text data, such as customer reviews or social media posts, to determine the sentiment expressed
(e.g., positive, negative, or neutral). High precision is important to accurately classify the sentiment and provide reliable insights for businesses.
5. Spam Filtering ML models can be trained to filter out spam emails or messages from legitimate ones. High precision is necessary to avoid
mistakenly classifying important messages as spam, ensuring that the filtering process is accurate and reliable.
6. Autonomous Vehicles ML algorithms are employed in self-driving cars to interpret sensor data and make decisions while driving. High precision
is critical to ensure the correct identification of objects, accurate lane detection, and precise decision-making to maintain safety on the road.
7. Quality Control ML models can be used to detect defects or anomalies in manufacturing processes, such as identifying faulty products on an
assembly line. High precision is vital to minimize false positives and accurately identify defective items, improving overall quality control.
High-recall algorithms minimize false negatives, which are not visible to users in the selection.
(Figure: a high-recall classifier keeps False Negatives (silence) to a minimum.)
High Recall Algorithms
1. Disease Screening ML models can be utilized to screen individuals for various diseases or medical conditions based on symptoms, risk factors, or
diagnostic tests. High recall is crucial in this scenario to ensure that as many affected individuals as possible are correctly identified, minimizing false
negatives and preventing missed diagnoses.
2. Anomaly Detection ML algorithms can be employed to detect unusual patterns or anomalies in data, such as identifying fraudulent activities, network
intrusions, or system failures. High recall is essential to minimize false negatives and ensure that potential threats or abnormalities are not overlooked.
3. Cancer Detection ML models can aid in the early detection of cancer by analyzing medical images, such as mammograms or CT scans, to identify
suspicious lesions or tumors. High recall is critical in this case to minimize false negatives and ensure that cancer cases are not missed, enabling early
intervention and improved treatment outcomes.
4. Customer Churn Prediction ML algorithms can predict customer churn in various industries, such as telecommunications or subscription-based services.
High recall is important to identify as many potential churners as possible, enabling proactive retention strategies and reducing customer attrition.
5. Security Threat Detection ML models can analyze network traffic, log files, or user behavior to detect potential security threats, such as malware attacks
or data breaches. High recall is vital to minimize false negatives and ensure that suspicious activities or indicators of compromise are not missed, enhancing
overall system security.
6. Rare Event Prediction ML algorithms can be employed to predict rare events, such as earthquakes, stock market crashes, or equipment failures. High
recall is crucial in this context to detect as many instances of the rare event as possible, allowing for appropriate preventive measures or timely responses.
7. Emergency Response Systems ML models can aid in emergency response systems by analyzing various data sources, such as social media feeds or sensor
data, to detect and respond to incidents like natural disasters or public safety threats. High recall is important to ensure that relevant incidents are not
missed, enabling swift and effective response efforts.
Information Retrieval refers to the process of efficiently and effectively retrieving relevant textual resources.

 Density = |topic| / |documents|
 Precision = |selection ∧ topic| / |selection|
 Recall = |selection ∧ topic| / |topic|

False positives in the selection are noise (visible to the user); false negatives outside the selection are silence (invisible to the user).
The trade-off between precision and recall depends on the polymorphism of entities and the polysemy of words.

 Searching for the full name "Caisse des Dépôts et Consignations": P = 100% – R = 20%. The polymorphism of the entity produces silence: documents that use other forms of the name are missed.
 Searching for the acronym "CDC": P = 40% – R = 60%. The polysemy of the acronym produces noise: documents about other meanings of "CDC" are retrieved.
A classifier with pretty good accuracy, precision and recall: is it actually good?

metrics    value
accuracy   91.7%
precision  92.5%
recall     92.5%
F1-score   92.5%

(Figure: the classifier's TP/FP/FN/TN regions on the dataset.)
A sensitive subgroup refers to a specific group of individuals or a demographic category that is designated as
requiring protection against unfair or discriminatory treatment. Such a subgroup is defined based on certain
attributes that are considered legally or socially protected from discrimination.

Examples of protected attributes: gender, age, origin, opinion, condition, family, occupation.


Suppose that we have such a sensitive subgroup, marked in orange in the dataset.
(Figure: the same classifier, with members of the sensitive subgroup highlighted among the TP/FP/FN/TN regions.)
For individuals not belonging to the sensitive subgroup, the performance might get even better…

metrics    value
accuracy   95.0%
precision  94.1%
recall     97.0%
F1-score   95.5%
… but for the sensitive subgroup the performance might get much worse.

metrics    value
accuracy   75.0%
precision  83.3%
recall     71.4%
F1-score   76.9%
We do have such a sensitive subgroup in the AI-camera example: bald referees or players (condition).
Fairness Metrics
(Figure: fairness metrics built on per-group rates such as Recall (TPR), Specificity (TNR), and Precision.)
[2102.08453] Towards the Right Kind of Fairness in AI (arxiv.org)
Comparing fairness metrics between the two subgroups

group = 0:
                  Observation
                  y=1    y=0
 ŷ=1              33     2      FDR=5.7%    PR=58.3%
 ŷ=0              1      24     FOR=4.0%    NR=41.7%
                  TPR=97.1%     TNR=92.3%
                  (Recall)      (Specificity)
 BR=56.7%

group = 1:
                  Observation
                  y=1    y=0
 ŷ=1              5      1      FDR=16.7%   PR=50.0%
 ŷ=0              2      4      FOR=33.3%   NR=50.0%
                  TPR=71.4%     TNR=80.0%
                  (Recall)      (Specificity)
 BR=58.3%

The base rates are similar: the percentages of both groups in the two classes are similar.
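All of the per-group rates in the two tables can be recomputed from the raw counts; the sketch below does so (counts taken from the tables above).

```python
# Recompute the per-group fairness rates from the confusion counts above.
def rates(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return {
        "TPR (recall)": tp / (tp + fn),
        "TNR (specificity)": tn / (tn + fp),
        "FDR": fp / (tp + fp),
        "FOR": fn / (fn + tn),
        "PR (positive rate)": (tp + fp) / total,
        "BR (base rate)": (tp + fn) / total,
    }

for group, counts in {"group = 0": (33, 2, 1, 24), "group = 1": (5, 1, 2, 4)}.items():
    print(group, {k: f"{v:.1%}" for k, v in rates(*counts).items()})
```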
Base Rate is a fairness metric whose goal is to ensure that the ratio of positive examples in a dataset is independent of membership in a sensitive group.

• What it compares: the proportion of (observed) positives between different groups
• Reason to use: if the input data are known to contain biases, the base rate may be appropriate to measure fairness
• Caveats: by only using the observed values, the information is partial

Common fairness metrics — Fairlearn 0.10.0.dev0 documentation
Demographic Parity (or Statistical Parity) is a fairness metric whose goal is to ensure that a machine learning model's predictions are independent of membership in a sensitive group.

• What it compares: predictions (positive and negative rates) between different groups
• Reason to use: if the input data are known to contain biases, demographic parity may be appropriate to measure fairness
• Caveats: by only using the predicted values, information is thrown away

Common fairness metrics — Fairlearn 0.10.0.dev0 documentation
Equalized Odds is a fairness metric whose goal is to ensure that a machine learning model's predictions are independent of membership in a sensitive group, conditional on the true label.

• What it compares: true and false positive rates between different groups
• Reason to use: if the historical data do not contain measurement bias or historical bias that we need to take into account, and true and false positives are considered to be of the same importance, equalized odds may be useful
• Caveats: if there are historical biases in the data, then the original labels may hold little value

Common fairness metrics — Fairlearn 0.10.0.dev0 documentation
Equal Opportunity is a relaxed version of equalized odds that only considers conditional expectations with respect to positive labels.

• What it compares: true positive rates between different groups
• Reason to use: if the historical data are useful, and extra false positives are much less likely to cause harm than missed true positives, equal opportunity may be useful
• Caveats: if there are historical biases in the data, then the original labels may hold little value

Common fairness metrics — Fairlearn 0.10.0.dev0 documentation
Which fairness metrics to use depends on the business case: for example, equal representation vs. equal errors, or punitive vs. assistive applications.
Metrics in Machine Learning are Crucial from a Business Perspective

 Performance Evaluation by providing a clear and objective evaluation of a model
 Benchmarking by comparing models' performances
Model Selection for choosing the best model among various alternatives
Model Monitoring for detecting drift in model performance across time
Continuous Improvement by providing feedback on models' behaviour
ROI Assessment by measuring how well a model achieves its intended goals
Regulatory Compliance for ensuring that model’s outcomes meet business standards
ARTIFICIAL INTELLIGENCE
AND BEYOND FOR FINANCE
PROF. DAVIDE LA TORRE, PHD, HDR

FULL PROFESSOR OF APPLIED MATHEMATICS AND COMPUTER SCIENCE


SKEMA BUSINESS SCHOOL - UNIVERSITÉ CÔTE D'AZUR, SOPHIA ANTIPOLIS CAMPUS, FRANCE
