Descriptive analytics
Predictive analytics
Prescriptive analytics
Descriptive analytics
Analyze past data to describe it and reveal clear patterns and trends
Data visualization
Tells an interesting, insightful story
Is intuitive to understand
Tells the truth: the intuitive interpretation is accurate, not misleading
It does NOT apply the tricks of How to Lie with Statistics
Predictive analytics: analyze the past to predict the future, assuming that the
future will continue to resemble the past
Primary focus: high-accuracy estimation or prediction of the target outcome
Input factors that lead to the outcome are incidental
Legitimate uses for predictive analytics
Prioritize: managers can understand how things work and then choose priorities
Anticipate: for outcomes over which managers have very little control
Time series forecasting: genuine time-based prediction
Benchmark: for outcomes over which managers have some control
Establish a benchmark to beat for prescriptive analytics if nothing changes
Automated machine learning (AutoML) usually comes close to the best achievable results
Image source: ICONS Library
Predictive analytics
Transparency
Ethical responsibility
Realism
Actionability
XAI for Transparency
Ethical responsibility refers to XAI that provides information to ensure that the AI
respects human values of fairness, free will, liberty, privacy, and so on.
Actionability refers to XAI that explains how the results of the AI carry
implications for recommended actions.
Step 1: Gather all concepts available
Step 2: Classify concepts according to their ultimate importance and time frame
Step 3: Conduct analysis according to various appropriate methodologies
Step 4: After analysis, classify concepts according to actionable explanation types
Key attributes of concepts in actionable explanation
• Relevance: Ultimate / Relevant / Not relevant
• Control: High / Low / No
Relevance of concepts: Ultimate
• High control: It is managers' fundamental responsibility to shape the concept such that the Ultimate changes in the desired direction.
• Low or no control: Although managers cannot directly control the values of the concept, they should measure and observe its values. Thus, managers should understand its effects, anticipate them proactively, and identify effective responses to take based on changes in observed values.
Introducing the subject: a short video on an AI which failed
AI Camera Ruins Soccer Game For Fans After Mistaking Referee's Bald Head For Ball | IFLScience
AI Camera Ruins Soccer Game For Fans After Mistaking Referee's Bald Head For Ball - YouTube
Performance Metrics in Machine Learning
[Figure: heuristic approach (data + program) illustrated with the "Escape the Maze" Python coding example and its output — sources: Escape the Maze - Python Coding - Day 6 - Southernfolks Designs; qmaze (samyzaf.com)]
Simplified history of AI, ML and DL
https://towardsdatascience.com/notes-on-artificial-intelligence-ai-machine-learning-ml-and-deep-learning-dl-for-56e51a2071c2
When do you need Machine Learning?
Hilarious Data Science Jokes on the Web to Lighten Your Mood (analyticsinsight.net)
Applications of Machine Learning
What are the different kinds of algorithms in Machine Learning?
Machine Learning Algorithms In Layman’s Terms, Part 1 | by Audrey Lorberfeld | Towards Data Science
Supervised Machine Learning
Regression
• Features X: a matrix of values for individuals (one row per individual)
• Target y: a vector of values for individuals
• The model predicts the target from the features
• The target is a continuous variable: e.g., physical value, price, score…
❸ Use the regression model to predict ŷ from the features X: ŷ = model(X)
❹ Compare the numbers y and ŷ to measure performance
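A minimal sketch of steps ❸ and ❹ (assuming scikit-learn is available; the data here are synthetic, not a course dataset):

```python
# Regression sketch: train a model, predict ŷ, compare with y
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                   # features: matrix of values
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200)   # continuous target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)   # train the regression model
y_hat = model.predict(X_test)                      # ❸ use the model: ŷ = model(X)
print("MSE:", mean_squared_error(y_test, y_hat))   # ❹ compare the numbers y and ŷ
```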
Example of Failure in Real Estate
The $500mm+ Debacle at Zillow Offers – What Went Wrong with the AI Models? - insideBIGDATA
Example: Titanic dataset for classification
features X target y
❸ Use the classification model to predict ŷ from the features X: ŷ = model(X)
❹ Compare the classes y and ŷ to measure performance
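The same workflow for classification, as a minimal sketch (scikit-learn assumed; synthetic data for illustration):

```python
# Classification sketch: train a classifier, predict classes ŷ, compare with y
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                         # features
y = (X[:, 0] + X[:, 1] > 0).astype(int)               # binary target: class 0 or 1

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_hat = model.predict(X_test)                         # ❸ use the model: ŷ = model(X)
print("Accuracy:", accuracy_score(y_test, y_hat))     # ❹ compare the classes y and ŷ
```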
Global Methodology
CRISP-DM Methodology
• CRoss-Industry
• Standard Process
• for Data Mining
Global Methodology
What data scientists spend the most time doing
Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says (forbes.com)
Data Cleaning
• Garbage In
• Garbage Out
What are the advantages of pre-processing and cleaning data before modelling?
1. Data Quality: High-quality data leads to better model performance. Data cleaning helps identify and rectify errors, outliers, and missing values, ensuring the data is accurate and reliable.
2. Model Performance: Cleaned data leads to more robust and accurate models. Outliers and noisy data can mislead machine learning algorithms, resulting in subpar predictions.
3. Reduced Bias: Biased data can lead to biased models. Data cleaning helps in mitigating biases, ensuring that the model provides fair and unbiased predictions.
4. Feature Engineering: Data cleaning often involves feature engineering, where irrelevant or redundant features are removed, and relevant ones are transformed or created. This enhances the model's ability to extract meaningful patterns.
5. Computational Efficiency: Cleaning data reduces its complexity, making it easier and faster for algorithms to process. This is especially important when dealing with large datasets.
6. Improved Interpretability: Cleaned data is more interpretable. When stakeholders understand the data, they can better trust and use the model's output.
7. Generalization: Data cleaning improves a model's ability to generalize from the training data to new, unseen data. This ensures the model performs well in real-world scenarios.
8. Error Reduction: By addressing data quality issues beforehand, the likelihood of errors in predictions is significantly reduced, leading to more reliable results.
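A minimal cleaning sketch along these lines (pandas assumed; the file name and the 'price' and 'rooms' columns are hypothetical):

```python
# Data-cleaning sketch: missing values, duplicates and outliers
import pandas as pd

df = pd.read_csv("data.csv")                        # hypothetical file

# Missing values: drop rows missing the target, impute a numeric feature
df = df.dropna(subset=["price"])
df["rooms"] = df["rooms"].fillna(df["rooms"].median())

# Duplicates
df = df.drop_duplicates()

# Outliers: keep target values within 1.5 * IQR of the quartiles
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
```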
Underfitting and Overfitting in Regression
• Underfitting: the model is too simple to be able to learn anything from the data.
• Good fit: the model can generalize properly from the data.
• Overfitting: the model is too complex and was able to learn the noise in the data.
Underfitting and Overfitting in Machine Learning – C#, Python, Data Structure , Algorithm and ML (wordpress.com)
Definitions
Bias
• The bias measures how well the average prediction of a model matches the true value.
• It quantifies the systematic error of the model.
• Formula: Bias(x) = E[f̂(x)] − f(x), where the expectation is taken over different training sets
Variance
• The variance measures the variability of the model's predictions for different training sets.
• It quantifies the model's sensitivity to changes in the training data.
• Formula: Var(x) = E[(f̂(x) − E[f̂(x)])²]
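A rough way to see both formulas numerically (a sketch; the true function, noise level and polynomial model are assumptions for illustration):

```python
# Empirical bias and variance of a model at a single point x0,
# estimated by repeating the training over many resampled training sets
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)
f = np.sin                                        # "true" function
x0, degree, n_repeats = 1.0, 3, 500
preds = []
for _ in range(n_repeats):
    x = rng.uniform(0, 2 * np.pi, size=30)        # a fresh training set
    y = f(x) + rng.normal(scale=0.3, size=30)     # noisy observations
    coefs = P.polyfit(x, y, degree)               # fit a degree-3 polynomial
    preds.append(P.polyval(x0, coefs))            # prediction f̂(x0)

preds = np.array(preds)
bias = preds.mean() - f(x0)                       # E[f̂(x0)] − f(x0)
variance = preds.var()                            # E[(f̂(x0) − E[f̂(x0)])²]
print(f"bias = {bias:.3f}, variance = {variance:.3f}")
```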
Geometric perspective
[Figure: geometric illustration of bias and variance — by Bernhard Thiery, Bias–variance tradeoff, Wikipedia]
Model perspective and performance
[Figure: bias and variance from the model-performance perspective]
Bias-Variance Trade-off
A good model should be neither too simple nor too complex to provide good results on both training and test sets.
[Figure: model performance from underfitting to overfitting]
Decoding Bias Variance Tradeoff. Bias — An error that on avergae tells… | by Saurav Agrawal | Jun, 2023 | Medium
Underfitting → reducing bias:
• more data (dataset +)
• less regularization
• increase model complexity
• cross-validation of hyperparameters
• train longer
• …
Overfitting → reducing variance:
• more data for training (train set +)
• more regularization
• decrease model complexity
• cross-validation of hyperparameters
• early stop
• …
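For instance, the regularization remedy above can be sketched as follows (scikit-learn and synthetic data assumed; Ridge regression is only one possible choice):

```python
# Effect of regularization strength on a deliberately complex model
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for alpha in [1e-6, 0.01, 1.0, 100.0]:             # weak → strong regularization
    model = make_pipeline(PolynomialFeatures(degree=12), StandardScaler(), Ridge(alpha=alpha))
    model.fit(X_tr, y_tr)
    print(f"alpha={alpha:>7}: "
          f"train MSE={mean_squared_error(y_tr, model.predict(X_tr)):.3f}  "
          f"test MSE={mean_squared_error(y_te, model.predict(X_te)):.3f}")
```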
To evaluate a regression model, we need to compute the error due to the gap between y and ŷ.
The choice of metric depends on the specific problem, the nature of the data, and the context in which the model will be used. Consider the characteristics of your data and the objectives of your analysis to select the most appropriate metric for your regression task.
What is an outlier in real estate?
Example: Real Estate dataset without the outlier
sample (without the outlier):
  y      ŷ
 43.2   48.0
 40.8   41.4
 27.7   30.6
 34.2   30.3
 …      …
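Using the four sample rows above, one can check how RMSE reacts more strongly than MAE when a large gap appears (the extra outlier row below is hypothetical):

```python
# MAE vs RMSE on the sample above, with and without one hypothetical outlier
import numpy as np

y     = np.array([43.2, 40.8, 27.7, 34.2])
y_hat = np.array([48.0, 41.4, 30.6, 30.3])

mae  = np.mean(np.abs(y - y_hat))                 # ≈ 3.05
rmse = np.sqrt(np.mean((y - y_hat) ** 2))         # ≈ 3.43
print(f"without outlier: MAE={mae:.2f}, RMSE={rmse:.2f}")

# add one hypothetical outlier row: RMSE inflates much faster than MAE
y2, y2_hat = np.append(y, 120.0), np.append(y_hat, 35.0)
print(f"with outlier:    MAE={np.mean(np.abs(y2 - y2_hat)):.2f}, "
      f"RMSE={np.sqrt(np.mean((y2 - y2_hat) ** 2)):.2f}")
```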
[Figure: a dataset split into Positive and Negative classes; a perfect classifier vs a standard classifier, with predictions compared against observations (the missed positives labelled "silence")]
Accuracy
Accuracy is a common metric used to evaluate the performance of classification models. It measures the proportion of correctly predicted instances out of the total number of instances in a dataset. Accuracy is particularly useful when the classes in the dataset are balanced, meaning they have roughly equal representation.
Confusion matrix:
TP  FP
FN  TN
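In terms of the confusion-matrix counts, this is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)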
Precision
Precision is a performance metric used to evaluate the
quality of a classification model, particularly in binary
classification problems. Precision measures the
proportion of correctly predicted positive instances
(true positives) out of the total instances predicted as
positive (true positives plus false positives). It provides
an indication of how well the model predicts positive
instances and how likely it is to correctly identify them.
Confusion matrix:
TP  FP
FN  TN
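In terms of the counts above:

Precision = TP / (TP + FP)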
Recall
Recall, also known as sensitivity or true positive rate, is
a performance metric used to evaluate the
effectiveness of a classification model, particularly in
binary classification problems. Recall measures the
proportion of correctly predicted positive instances
(true positives) out of the total actual positive instances
(true positives plus false negatives). It quantifies the
ability of the model to identify all positive instances
correctly.
Confusion matrix:
TP  FP
FN  TN
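In terms of the counts above:

Recall = TP / (TP + FN)

The F1-score used later combines the two as their harmonic mean: F1 = 2 · Precision · Recall / (Precision + Recall).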
Example: Titanic dataset with a very simple classifier
Let us implement a very simple classifier based on the feature gender: female → survived, male → not survived
Let us compute the metrics on the whole dataset, just to figure out the numbers.
Confusion matrix on the whole dataset: for the "not survived" prediction (males), 109 passengers actually survived (false negatives) and 468 did not survive (true negatives).
Accuracy 0.787 — Precision 0.742 — Recall 0.681
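A minimal sketch of this baseline (the file name and column names are assumptions based on the standard Titanic CSV):

```python
# The "female → survived" baseline and its metrics on the whole dataset
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score

df = pd.read_csv("titanic.csv")                     # hypothetical path
y = df["Survived"]                                  # observed outcome (0/1)
y_hat = (df["Sex"] == "female").astype(int)         # predicted: female → 1, male → 0

print("Accuracy :", round(accuracy_score(y, y_hat), 3))
print("Precision:", round(precision_score(y, y_hat), 3))
print("Recall   :", round(recall_score(y, y_hat), 3))
```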
Example: Titanic dataset with another simple classifier
Let us implement another simple classifier based on the features gender (same as above) and infancy (below 6 years old)
Let us compute again the metrics on the whole dataset, just to figure out the numbers.
Accuracy: 0.787 → 0.795
Accuracy:
+ simple to understand and interpret
− sensitive to unbalanced classes
F1-Score:
+ balances the trade-off between precision and recall
− implies equal weight on false positives and false negatives
1. Fraud Detection: ML models can be trained to identify fraudulent activities, such as credit card fraud or online transaction fraud, by analyzing patterns and anomalies in the data. High precision is important in this case to minimize false positives and ensure that genuine transactions are not mistakenly flagged as fraudulent.
2. Medical Diagnosis: ML models can assist in diagnosing medical conditions based on patient data, including symptoms, medical history, and test results. Achieving high precision is essential to avoid misdiagnosis and ensure accurate identification of diseases or conditions.
3. Image Classification: ML models can classify images into different categories, such as recognizing objects, identifying faces, or detecting specific features. High precision is crucial to minimize misclassifications and ensure accurate categorization of images.
4. Sentiment Analysis: ML models can analyze text data, such as customer reviews or social media posts, to determine the sentiment expressed (e.g., positive, negative, or neutral). High precision is important to accurately classify the sentiment and provide reliable insights for businesses.
5. Spam Filtering: ML models can be trained to filter out spam emails or messages from legitimate ones. High precision is necessary to avoid mistakenly classifying important messages as spam, ensuring that the filtering process is accurate and reliable.
6. Autonomous Vehicles: ML algorithms are employed in self-driving cars to interpret sensor data and make decisions while driving. High precision is critical to ensure the correct identification of objects, accurate lane detection, and precise decision-making to maintain safety on the road.
7. Quality Control: ML models can be used to detect defects or anomalies in manufacturing processes, such as identifying faulty products on an assembly line. High precision is vital to minimize false positives and accurately identify defective items, improving overall quality control.
High-recall algorithms minimize false negatives, which are not visible to users in the selection
[Figure: the set of documents, with a selection and a topic (the relevant documents)]
Density = topic / documents
Precision = (selection ∧ topic) / selection
The silence (relevant documents missed by the selection) is invisible to the user.
The trade-off between precision and recall depends on the polymorphism of entities and the polysemy of words.
[Figure: example query "CDC" — polysemy produces noise, polymorphism produces silence; P = 40%, R = 60%]
A classifier with pretty good accuracy, precision and recall: is it actually good?
Example metrics (three cases):
• accuracy 91.7%, precision 92.5%, recall 92.5%, F1-score 92.5%
• accuracy 95.0%, precision 94.1%, recall 97.0%, F1-score 95.5%
• accuracy 75.0%, precision 83.3%, recall 71.4%, F1-score 76.9%
Fairness Metrics
Predictions broken down by group (the ŷ = 1 row of each confusion matrix), with the false discovery rate (FDR, the complement of precision) and the positive rate (PR):
• group = 0: ŷ = 1 for 33 observed positives and 2 observed negatives → FDR = 5.7%, PR = 58.3%
• group = 1: ŷ = 1 for 5 observed positives and 1 observed negative → FDR = 16.7%, PR = 50.0%
Demographic Parity
• What it compares: predictions (positive and negative rates) between different groups
• Reason to use: if the input data are known to contain biases, demographic parity may be appropriate to measure fairness
• Caveats: by only using the predicted values, information is thrown away
Common fairness metrics — Fairlearn 0.10.0.dev0 documentation
Equalized Odds is a fairness metric whose goal is to ensure a machine learning model’s predictions
are independent of membership in a sensitive group
• What it compares: true and false positive rates between different groups
• Reason to use: if historical data does not contain measurement bias or historical bias that we need to take into account, and true and false positives are considered to be of the same importance, equalized odds may be useful
• Caveats: if there are historical biases in the data, then the original labels may hold little value
Common fairness metrics — Fairlearn 0.10.0.dev0 documentation
Equal Opportunity is a relaxed version of equalized odds that only considers conditional expectations with respect to positive labels.
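A minimal sketch of how these per-group quantities might be checked (plain NumPy; the labels, predictions and sensitive attribute below are placeholder arrays):

```python
# Per-group selection rate (demographic parity), TPR and FPR (equalized odds);
# equal opportunity looks only at the TPR
import numpy as np

y     = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # true labels
y_hat = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])   # model predictions
group = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])   # sensitive attribute

for g in np.unique(group):
    m = group == g
    selection_rate = y_hat[m].mean()                 # P(ŷ=1 | group=g)
    tpr = y_hat[m][y[m] == 1].mean()                 # P(ŷ=1 | y=1, group=g)
    fpr = y_hat[m][y[m] == 0].mean()                 # P(ŷ=1 | y=0, group=g)
    print(f"group {g}: selection rate={selection_rate:.2f}, TPR={tpr:.2f}, FPR={fpr:.2f}")
```

The Fairlearn documentation cited above provides ready-made metrics for these comparisons.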
ARTIFICIAL INTELLIGENCE
AND BEYOND FOR FINANCE
PROF. DAVIDE LA TORRE, PHD, HDR