
Audit Practitioner’s Guide to Machine Learning, Part 1: Technology

CONTENTS

Why Is AI/ML Application Auditing Important?
Understanding Artificial Intelligence, Machine Learning, Deep Learning
/ Definitions of AI, ML, DL
/ Machine Learning Categories
/ Supervised Learning
/ Unsupervised Learning
/ Reinforcement Learning
Technology Risk in ML Applications
/ Data Governance
/ Data Engineering
/ Feature Engineering
/ Model Training
/ Model Evaluation
/ Supervised Learning Model Evaluation
/ Model Deployment and Prediction
/ Popular Machine Learning Libraries
Conclusion
Appendix: Recommended Resources
Acknowledgments

© 2022 ISACA. All Rights Reserved.



ABSTRACT

As artificial intelligence (AI) and machine learning (ML) continue to be rapidly adopted by companies and governments around the world, existing auditing frameworks and information technology controls must be better tailored to address the unique and evolving risk AI and ML pose. Risk arises from the choice and design of the ML model and the development cycle. Risk is also a factor in compliance with regulations, such as the EU’s General Data Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA) in the US.

This white paper series, consisting of two parts, provides a systematic auditing guideline that identifies technology and compliance risk as the two primary focal points for auditors.1 Part 1 of this white paper series addresses technology risk in ML auditing by dissecting the typical software development life cycle of AI/ML algorithms and highlighting the key areas that IT auditors should investigate. Part 2 addresses compliance risk by identifying actionable steps that IT auditors can take to promote compliance with applicable laws, regulations and industry standards.

1 Ahmed, H.; “Auditing Guidelines for Artificial Intelligence,” @ISACA, 21 December 2020, www.isaca.org/resources/news-and-trends/newsletters/atisaca/2020/volume-26/auditing-guidelines-for-artificial-intelligence


Why Is AI/ML Application Auditing Important?

Adoption of artificial intelligence (AI) and its subset, machine learning (ML), skyrocketed during the COVID-19 pandemic, as a recent Harvard Business Review article2 and a PwC study show. Fifty-two percent of companies accelerated their AI adoption plans, centering AI in their businesses to boost productivity and efficiency and to drive innovation through new products and services. In a recent Deloitte survey, respondents from 83% of the companies polled believed AI has made or will make a practical and visible impact. As adoption accelerates AI’s transition from an emerging technology to one characterized by ubiquitous mainstream use, the opportunities it provides are accompanied by challenges.

In the past decade, public concern about technological risk has generally focused on the expanding collection and use of personal data. But as companies embed more AI in products and processes, proliferating algorithms increase the potential for inaccurate or biased decisions—in particular, these complex models may affect the daily lives, health and rights of individuals seeking approvals for loan applications, making investment decisions or planning treatment for medical conditions such as cancer. Many governments and nonprofit organizations around the world deem the risk of inaccurate or biased decisions by algorithms as tangible and potentially significant. As a result, regulations to protect citizens have been proposed, and guidance has been published. Examples include the EU’s proposed Artificial Intelligence Act,3,4 “Model Risk Management” from the Office of the Comptroller of the Currency in the US5 and Environment, Social and Governance (ESG) guidance.6

The main challenges businesses need to consider when adopting AI and ML can be summarized as follows:

• Data privacy—AI and ML models are tested and trained on significant amounts of data, often personal data, which are protected by laws and regulations around the world. Use of data in these models may become problematic when dealing with sensitive data. Efforts to address these challenges are often in the form of legislation. For example, the use of personally identifiable information (PII) and protected health information (PHI) is governed by the Health Insurance Portability and Accountability Act of 1996 (HIPAA), and other elements of consumer data are addressed in the GDPR, CCPA and similar regulations.

• Fairness—Fairness relates to the emerging field of data ethics and requires evaluating the impact of AI-based outcomes on people’s rights, whether the model makes the decision on its own or in concert with a human agent. The Boston housing price data set, which has an explicit parameter to identify a locality as a Black or White area and how those racial demographics affect housing prices, is a classic example of the data fairness problem.7

2 McKendrick, J.; “AI Adoption Skyrocketed Over the Last 18 Months,” Harvard Business Review, 27 September 2021, https://hbr.org/2021/09/ai-adoption-skyrocketed-over-the-last-18-months
3 European Institute of Public Administration (EIPA), “The Artificial Intelligence Act Proposal and its Implications for Member States,” September 2021, www.eipa.eu/publications/briefing/the-artificial-intelligence-act-proposal-and-its-implications-for-member-states/
4 Daws, R.; “EU regulation sets fines of €20M or up to 4% of turnover for AI misuse,” AI News, 14 April 2021, https://artificialintelligence-news.com/2021/04/14/eu-regulation-fines-20m-up-4-turnover-ai-misuse/
5 Office of the Comptroller of the Currency (OCC), “Model Risk Management,” August 2021, www.occ.treas.gov/publications-and-resources/publications/comptrollers-handbook/files/model-risk-management/pub-ch-model-risk.pdf
6 Anand, R.; B. Greenstein; “Six AI business predictions for 2022,” PwC AI and Analytics, 30 November 2021, https://www.pwc.com/us/en/tech-effect/ai-analytics/six-ai-predictions.html


• Brand reputation and trust—Twenty-two percent of enterprises have already faced customer backlash in the last two to three years due to decisions reached via their AI systems.8 One prominent area where customer concerns have surfaced is facial recognition systems, as potential misidentification can have a significant impact in sensitive circumstances, such as airport security monitoring or exam proctoring.

• Model transparency—For simple ML models such as linear regression, it is fairly easy to notify users of the system’s logic and decision-making parameters. However, for state-of-the-art black box ML models such as deep learning, there is currently no simple way to disentangle and weigh the many parameters in the hidden layers. These transparency differences have become especially important in recent years as transparency and explainability have become core principles for AI use, as seen in new laws, regulations and industry standards.

• Model systematic errors—The ML model can be overfitting (i.e., the model is complex and tends to “memorize” noise in a large data set, while failing to capture the overall trend) or underfitting (i.e., the model is too simple to model the complex data). If the model is not trained in a scientific way, overfitting or underfitting can occur, and the resulting systematic errors will introduce bias in the product. For example, using the same data to train and to test will lead to overfitting models.

• Learning and adapting algorithms—While algorithms that learn and adapt may be more accurate, they can evolve in a discriminatory way. It is critical to implement a continuous monitoring procedure and enforce it in the development and ML operations processes.

Challenges associated with privacy, fairness, trust, discrimination, prediction and trending cut across categories of technology and compliance risk. For example, model systematic errors can be addressed with a thorough technology risk audit. For the purposes of this white paper series, audits of AI and ML applications should consider the following:

• Technology risk auditing—Assess risk related to ML algorithms and life cycle, data science, and cybersecurity. Practitioners are advised to implement auditing procedures based on the six key stages in a typical AI and ML application development cycle:
• Data governance
• Data engineering
• Feature engineering
• Model training
• Model testing (evaluation)
• Model deployment

• Compliance risk auditing—Assess risk related to the rights and freedoms of individuals affected by the AI/ML application, which may vary by industry and region. Examples of data compliance regulations include:
• Financial industry—Office of the Comptroller of the Currency’s “Model Risk Management: New Comptroller’s Handbook Booklet”
• Healthcare industry—HIPAA (Health Insurance Portability and Accountability Act)
• Consumer data privacy—CCPA (California Consumer Privacy Act)
• European region regulation—GDPR (General Data Protection Regulation)

Part 1 of the white paper series covers technology elements of auditing ML. Part 2 addresses compliance considerations.

7 Delve, “The Boston Housing Dataset,” 10 October 1996, www.cs.toronto.edu/~delve/data/boston/bostonDetail.html
8 Thieullent, A.-L., et al.; “AI and the Ethical Conundrum: How organizations can build ethically robust AI systems and gain trust,” Capgemini Research Institute, 2020, https://www.capgemini.com/wp-content/uploads/2020/10/AI-and-the-Ethical-Conundrum-Report.pdf


Understanding Artificial Intelligence, Machine Learning and Deep Learning

This section will demystify and disentangle common terminology associated with AI. In particular, while AI is an umbrella term with broad application, ML and deep learning (DL) are subsets of AI with important distinctions for the purposes of this white paper.

Definitions of AI, ML, DL

Artificial intelligence is “the theory and development of computer systems able to perform tasks that normally require human intelligence.”9 Practically speaking, AI consists of algorithms that specialize in “learning” from data. Given that, AI is a broad term that encompasses everything from conditional “if-else” rule engines to modern applications such as self-driving cars and sophisticated computer programs such as AlphaGo.

Machine learning, a subcategory of AI, is “the process of computers changing the way they carry out tasks by learning from new data, without a human being to give instructions in the form of a program.”10 The key word is “data.” Currently, most of the applications classified as AI are actually ML tools that leverage statistical models that “summarize” the patterns from large data sets and can later be used to make predictions on new data.

Deep learning is a type of machine learning based on artificial neural networks in which multiple layers of processing are used to extract progressively higher-level features from data.

AI and ML innovation began decades ago. However, these technologies have attracted significant industry attention recently, thanks to advancements in DL that outperform traditional ML—such as logistic regression, support vector machines and random forest classification—in the application of computer vision, language translation, speech recognition and more. AI, ML and DL have a hierarchical relationship that is depicted in figure 1.

FIGURE 1: Hierarchical Relationship Among AI, ML and DL

• Artificial intelligence (AI)—Describes computer systems that perform tasks associated with human intelligence
• Machine learning (ML)—Describes processes of self-correction by computers during execution of successive or iterative tasks, wherein output of prior tasks (failures, false positives, false negatives, etc.) is received as input to later tasks, resulting in process correction without human intervention (e.g., software modifications)
• Deep learning (DL)—Describes a category of machine learning based on artificial neural networks in which multiple layers of processing extract progressively higher-level features from data

9 Lexico, “Artificial Intelligence,” www.lexico.com/en/definition/artificial_intelligence
10 Cambridge Dictionary, “machine learning,” https://dictionary.cambridge.org/us/dictionary/english/machine-learning


Based on this hierarchy, this white paper focuses on machine learning because it is the most common family of algorithms encountered when auditing practical applications.

Machine Learning Categories

ML can be classified into three main categories: supervised learning, unsupervised learning and reinforcement learning.

Supervised Learning

The term supervised learning refers to applications in which the training data comprise examples of the input vectors, along with their corresponding target vectors.11 Because this learning involves training (i.e., teaching) the computer using labeled data (i.e., examples), the learning is supervised.

Key categories of supervised learning include:
• Classification—Supervised learning that involves predicting a nominal class label. For example, does a chest X-ray image indicate “Has Cancer” or “Not Cancer”? Typical algorithms are logistic regression, support vector machine and random forest.
• Regression—Supervised learning that involves predicting a numerical label. For example, “this three-bedroom house in the 94301 postal code area may be worth $2.1 million.” Typical algorithms are linear regression and support vector regression.
• Hybrid—Supervised learning that can learn from partially labeled data, such as semi-supervised learning.

Knowledge Check: Predicting Property Location
A real estate mortgage team is trying to build an ML model to predict the ZIP code of a property based on house size and other characteristics. For example, a 3,000-square-foot lot size with a three-bedroom house will be predicted to be in the 94301 postal code area, while an 8,000-square-foot lot size with a five-bedroom house will be predicted to be in 75218.
Question: Is this ML model classification or regression?13

Unsupervised Learning

Unsupervised learning studies how systems can infer a function to describe a hidden structure from unlabeled data. The system does not predict the right output. Instead, it explores the data and draws inferences from data sets to describe hidden structures in unlabeled data.12

Key categories of unsupervised learning include:
• Clustering—To automatically identify groups in data, such as grouping animals by number of legs or size. Typical algorithm: K-means clustering
• Dimensionality reduction—To compress the data by projecting them from higher to lower dimensions while preserving certain characteristics, such as reducing the dimensions of city, state and postal code to location. Typical algorithm: principal component analysis (PCA)
• Anomaly detection—To identify outliers that stand out from the crowd, such as a credit card account that made 157 purchases in one day, which may indicate fraudulent activity. Typical algorithms: median, z-score statistics, local outlier factor (LOF)
• Association rule—To discover rules for correlation patterns, such as people who buy X also tending to buy Y
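The clustering category above can be made concrete with a minimal K-means sketch in plain Python. The two-dimensional points are hypothetical stand-ins for customer attributes:

```python
import random

random.seed(1)

# Two well-separated synthetic groups of 2-D points.
points = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(30)] + \
         [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(30)]

def kmeans(points, k, iters=10):
    centroids = random.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters

centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))
```

No labels are supplied anywhere; the algorithm recovers the groups purely from the structure of the data, which is the defining property of unsupervised learning.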

11 Bishop, C.; Pattern Recognition and Machine Learning, Springer, USA, 2006, https://link.springer.com/book/9780387310732
12 Loukas, S.; “What is Machine Learning: Supervised, Unsupervised, Semi-Supervised and Reinforcement learning methods,” Towards Data Science, 10 June 2020, https://towardsdatascience.com/what-is-machine-learning-a-short-note-on-supervised-unsupervised-semi-supervised-and-aed1573ae9bb
13 Answer: Classification. Although a ZIP code is composed of numbers, in this case it is a categorical variable to represent a geolocation.


Reinforcement Learning

Reinforcement learning is learning what to do—how to map situations to actions—to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them.14

Reinforcement learning is quite different from supervised learning, in the sense that the algorithm learns by interacting with the environment, not with the training data set. For example, with a chess-playing algorithm, reinforcement learning involves agents playing the games millions of times to figure out strategies, whereas supervised learning feeds agents past chess-playing training data. Typical reinforcement learning algorithms include Q-learning and deep reinforcement learning. Their applications include self-driving cars, robot control and Google’s AlphaGo program (which defeated the best human Go player).

Example: DeepMind Technology’s AlphaGo and Deep Reinforcement Learning
AlphaGo was the first computer program to defeat a professional human Go player and was the first to defeat a Go world champion in 2016. It is arguably the strongest Go player in history. Go has 10^170 possible board configurations—more than the number of atoms in the known universe—making it far more complex than chess. AlphaGo combines advanced search trees with deep neural networks such as policy and value networks. The DeepMind team introduced AlphaGo to numerous amateur gamers to help the program develop an understanding of reasonable human play. Then it played against different versions of itself thousands of times, each time learning from its mistakes. Over time, AlphaGo improved and became increasingly stronger and better at learning and decision-making. This process is known as deep reinforcement learning.15

Example: Bank Direct Marketing Campaign
A banking institution runs direct marketing campaigns on over 45,000 customers. Customers in the marketing campaigns represent dozens of demographics, which makes manual analysis quite complex. Demographics represented include the following:
• Age—Numeric
• Job—Categorical by type (e.g., admin, blue collar, entrepreneur, housemaid, management, retired, self-employed, services, student, technician, unemployed, unknown)
• Marital status—Categorical (e.g., married, single, unknown). Note: single means never married, divorced or widowed.
• Education—Categorical (e.g., basic.4y, basic.6y, basic.9y, high.school, illiterate, professional.course, university.degree, unknown)
• Default—Categorical (has credit in default? [no | yes | unknown])
• Housing—Categorical (has a housing loan? [no | yes | unknown])

The data scientists perform a quick K-modes algorithm (similar to K-means), dividing the data set into two clusters, which identifies the following interesting patterns:
• Cluster 1—Captures demographics such as admin/technician/management jobs, university degrees, homeownership
• Cluster 2—Captures attributes such as blue-collar jobs, high school degrees, lack of homeownership

With these insights from the clustering ML algorithm, the chief marketing officer and marketing team can design the next campaign to target each segment.16
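Q-learning, named above as a typical reinforcement learning algorithm, can be illustrated with a minimal sketch. The five-state corridor, the reward placement and the hyperparameters (alpha, gamma, epsilon, episode count) are illustrative assumptions, not from the source:

```python
import random

random.seed(0)

# Tiny corridor environment: states 0..4, reward 1.0 only on reaching state 4.
N_STATES = 5
ACTIONS = (-1, +1)  # step left, step right
alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):  # episodes of trial and error
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy: mostly exploit current Q, occasionally explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-learning update: learn from the reward signal, not labeled examples.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the greedy policy steps right (+1) in every state.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)
```

No training labels appear anywhere: the agent discovers the rewarding behavior by interacting with the environment, which is exactly the distinction from supervised learning drawn above.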

14 Sutton, R.; A. Barto; Reinforcement Learning: An Introduction, Second Edition, MIT Press, USA, 2018, https://mitpress.mit.edu/books/reinforcement-learning-second-edition
15 DeepMind, “AlphaGo,” https://deepmind.com/research/case-studies/alphago-the-story-so-far
16 Moro, S.; P. Cortez; P. Rita; “A Data-Driven Approach to Predict the Success of Bank Telemarketing,” Decision Support Systems 62:22-31, June 2014, Elsevier; UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/bank+marketing


Technology Risk in ML Applications

Building machine learning applications follows a typical data science and software development life cycle. This section highlights the main phases to provide a roadmap for the audit discussion and to highlight the key risk factors:
• Data governance
• Data engineering
• Feature engineering
• Model training
• Model evaluation
• Model deployment and prediction

Typically, these stages are not linear, especially when model retraining introduces an iterative process. (This section assumes that the business problem and success criteria for the ML application have been identified—e.g., to increase house price prediction accuracy to 95%.)

Data Governance

Data governance focuses on data strategies to manage the data an enterprise already owns or that it plans to obtain or procure. A data governance framework establishes sound data governance, standardizes data management practices and enhances trust in the integrity of an AI application.

When auditing a data governance framework, the auditor’s considerations should include but are not limited to:
• Accountability—Existence of a documented data governance framework with clearly defined assignment of roles and responsibilities, including senior management, based on the IIA’s three lines model. Oversight and monitoring are key.
• Data quality—Assurance that data used are of good quality and that any data quality issues are escalated in an appropriate and timely way to responsible parties for rectification.
• Data security—Existence of adequate data protection measures against unauthorized access and cyberthreats, such as data poisoning, AI attacks and hacking. Remaining alert to emerging security threats is key.
• Data licenses—For open-source data sets, awareness of any restrictions on commercial applications. Understanding of the implications of popular open-source data licenses, such as Creative Commons Share-Alike, is necessary.

Data Engineering

Data engineering focuses on practical applications of data collection and analysis. It is the first step in building an ML application to address a specific business problem. ML and AI application models currently rely heavily on training data; hence, auditing the data sources is the critical first step of ML auditing. In this stage, auditors should focus on data compliance and governance. It is critical to examine whether the data sources comply with the GDPR, CCPA and other relevant regulations.

Audit practitioners should examine the following key areas in the data engineering stage:
• Data collection—To acquire and ingest data into the target analytic environment, with attention to:
• Credibility of data source
• Accuracy and completeness of data provided
• Whether the receiving enterprise has the appropriate processes to check the accuracy and completeness of incoming data
• Data sources, such as Web API, other databases, existing data pipeline
• Different formats, such as CSV, JSON, log files, raw text, images, audio


• Data exploration—To explore which data fields may be useful for building ML predictive models. Note that raw data sources are usually noisy, and only a few data fields may be helpful to the business problem.
• Data cleansing—To cleanse raw data, especially unstructured data. For example, some bank records may use all capital letters for customers’ names, or area codes may or may not include parentheses, with the same phone number written as either (650)-123-4567 or a bare digit string. Without normalizing such data, the ML model may suffer from data inaccuracy and missing values, especially when data sources need to be joined.
• Data anonymizing—To comply with HIPAA, GDPR, CCPA and other legal and regulatory requirements, PHI and PII data fields (e.g., social security number) may be anonymized for data analytics.
• Data schema—To formally describe the database schema in language supported by the database management system (DBMS). Most data applications use databases as storage, for better access and analytics.
• ETL data pipeline—To extract the right data fields, apply transform functions and load them into the database (ETL).
• Data fairness—To determine whether any data fields could introduce unfairness that might impact people’s lives. For example, the inclusion of race in a data field might lead banks to unfairly reject loan applications from minorities.

Knowledge Check: Predicting Credit Limits for Borrowers
A bank is building an ML application to predict credit limits for loan applicants. The data scientists are requesting access to these data sources:
1. Customer’s full name
2. Date of birth
3. Government-issued ID number
4. Country of residence
5. Current home address
6. Race
7. Gender
8. Previous year’s income
9. Number of years of relationship with the bank
Question: Which of these data sources contain PII that may be restricted by GDPR requirements?17

Advanced Topic: Synthetic Data for AI/ML Applications
While access to sensitive customer data such as PHI is becoming a bigger hurdle for data scientists building ML applications, some enterprises have begun to generate or procure synthetic data that reflect the important statistical properties of the underlying real-world data. Collecting and using sensitive data can raise privacy concerns and leave businesses vulnerable to data breaches; hence, privacy regulations such as the GDPR and CCPA restrict the collection and use of personal data and impose fines on enterprises that violate those restrictions.
Synthetic data are inexpensive compared to the cost of collecting large data sets, and they can support AI/ML development or software testing without compromising customer privacy. The use of synthetic data shows promise for training ML models. However, enterprises’ C-suite, risk and legal teams should evaluate the potential benefits of creating a program and ensure that AI experts with highly complex skills manage its execution.18
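Two of the data engineering steps above, normalizing inconsistently formatted fields before joins and anonymizing PII fields, can be sketched as follows. The salted-hash approach shown is pseudonymization rather than full anonymization, and the sample values and field choices are hypothetical:

```python
import hashlib
import re

def normalize_phone(raw: str) -> str:
    # Keep digits only, so differently formatted records join cleanly.
    return re.sub(r"\D", "", raw)

def pseudonymize(value: str, salt: str) -> str:
    # Salted hash usable as a stable join key without storing the raw SSN.
    # Under GDPR this is pseudonymization, not anonymization, so the salt
    # must be protected like any other secret.
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

print(normalize_phone("(650)-123-4567") == normalize_phone("650 123 4567"))  # True
key1 = pseudonymize("123-45-6789", salt="s3cret")
key2 = pseudonymize("123-45-6789", salt="s3cret")
print(key1 == key2)  # True: the same input always yields the same join key
```

An auditor reviewing such a pipeline would confirm that the raw identifiers are not retained alongside the derived keys and that the salt is managed as a secret.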

17 Answer: 1, 3 and 5. University of Pittsburgh, “Guide to Identifying Personally Identifiable Information (PII),” www.technology.pitt.edu/help-desk/how-to-documents/guide-identifying-personally-identifiable-information-pii
18 Lucini, F.; “The Real Deal About Synthetic Data,” MIT Sloan Management Review, 20 October 2021, https://sloanreview.mit.edu/article/the-real-deal-about-synthetic-data/


Feature Engineering

Feature in machine learning terminology means an individual measurable property or characteristic of a phenomenon derived from raw data. Feature engineering refers to the process of using domain knowledge to select and transform the most relevant variables from raw data when creating a predictive model using ML or statistical modeling. A good feature can make ML prediction simpler and much more accurate. In practice, feature engineering is probably more critical in machine learning than choosing algorithms.19

The following feature engineering techniques are critical areas for auditors to examine:
• Data imputation—Missing values are one of the most common problems when preparing training data. Values might be missing due to human errors, interruptions in the data flow or privacy concerns. Whatever the reason, missing values negatively affect the performance of ML models.
• Handling outliers—In statistics, an outlier is a data point that is X standard deviations away from the population mean. One common way to remove such outliers is the z-score method in statistics. Auditors need to examine outlier handling because it may introduce systematic errors into the ML model.
• Derivation—Some raw data fields might be hard to use directly in the ML model, such as a time stamp. However, a wide range of features can be “derived” from a time stamp, such as a year, month, day, hour, minute or day of the week.
• Binning and data transformation—When dealing with numeric values that have a huge range but are in a small sample set, binning and log transformation may help the model focus on the order of magnitude of the feature. For example, rather than asking for annual income, which could vary from 0 to millions, asking for tax bracket might be preferable, since there are fewer possibilities.

Example: Predicting Home Prices
A real estate marketing team is building an ML model to predict home prices. The data scientists propose a list of features that would be predictive:
• Number of square feet
• Number of bedrooms
• Number of bathrooms
• House type (single-family home, townhouse or condo)
• Postal code
• School district rating
• Regional crime rate
Note that some features are intrinsic (such as number of square feet), but others are contextual (such as school district ratings), and the latter may need to be correlated from different data sources.

Example: Newborn Customers?
A marketing team’s web survey collects ages of bank clients. Their front-end web form defaults age to zero, which introduces significant outliers in ML model training data. In this case, age values of zero can clearly be excluded in the model training. However, in other contexts, outliers may be more subtle. What if an infant-formula company is collecting the ages of babies? Legitimate customer data may reflect age zero and could be corrupted by substituting NULL values for all ages. Here, engineers might have to redesign the data collection front end. Hence, it is critical for the auditor to pay special attention to outlier issues in the feature engineering stage.

Example: Encoding the Context in Feature Engineering
When building an ML model, the source data time stamp January 3, 2022, is not as insightful as “Monday,” the weekday feature derived from the date. Furthermore, the context can be encoded as “the first workday of the year.” The ML model can predict customer behavior more accurately based on derived features rather than raw time stamps, simply because more contextual information is encoded in the feature engineering step.
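The derivation and binning techniques above can be sketched in a few lines of Python. This is a minimal illustration using pandas; the column names, dates and bin edges are invented for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: a time stamp and an annual income per record.
df = pd.DataFrame({
    "signup_time": pd.to_datetime(["2022-01-03 09:15", "2022-07-04 18:40"]),
    "annual_income": [48_000, 1_250_000],
})

# Derivation: extract calendar features from the raw time stamp.
df["year"] = df["signup_time"].dt.year
df["weekday"] = df["signup_time"].dt.day_name()

# Binning and log transformation: keep the order of magnitude, not the raw value.
df["income_log10"] = np.log10(df["annual_income"])
df["income_bracket"] = pd.cut(
    df["annual_income"],
    bins=[0, 50_000, 100_000, float("inf")],
    labels=["low", "mid", "high"],
)
```

Note that January 3, 2022 derives to the weekday “Monday,” matching the encoding example above.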

19 Rençberoğlu, E.; “Fundamental Techniques of Feature Engineering for Machine Learning,” Towards Data Science, 1 April 2019, https://towardsdatascience.com/feature-engineering-for-machine-learning-3a5e293a5114


• Scaling—Different raw data may have different data ranges, such as age and income. In ML, how can these two columns be compared? Normalization and standardization can render the multivariate features into a similar scale. For example, both age and income can be scaled to between 0 and 1, where 0 is the minimum and 1 is the maximum in the original data.
• Fairness—Fairness in ML is a recent area of study that aims to ensure that biases in the data and inaccuracies in the model do not result in models that treat individuals unfavorably based on characteristics such as race, gender, disability, and sexual or political orientation.20 Auditors should perform due diligence with regard to whether a feature is introducing discrimination, bias or other unfairness.
• Privacy—Feature engineering can introduce privacy concerns, especially when correlating the original data with external data sources. The auditor should examine whether additional privacy issues are introduced in this step.

With the rise of DL algorithms, features can be automatically derived from raw data through the use of complex hierarchical neural networks. Hence, in DL models, the feature engineering step is not as essential. However, this raises another important issue: how to determine what features are encoded in the black box ML model in the deep neural network’s hidden layers. The following section addresses this topic.

Advanced Topic: White Box and Black Box ML Models and Model Interpretability

Auditors often encounter two kinds of machine learning models:
• White box models—For example, linear regression models can be easily interpreted because they usually have only a few parameters. However, this kind of model is less powerful (due to underfitting) when predicting complicated data problems such as detecting human faces in Internet photos.
• Black box models—For example, deep learning models usually have millions of parameters and complex neural network architectures, which makes them more capable of predicting complicated problems. However, it is almost impossible to explain what hidden neural network layers actually learn.24 Recent research (e.g., on the LIME and XAI algorithms) aims to understand the interpretability of DL models.25
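The min-max scaling described earlier, mapping each column to [0, 1], can be sketched as follows. The age and income values are invented for illustration:

```python
import numpy as np

# Hypothetical columns: age in years, annual income in dollars.
X = np.array([[25,  40_000],
              [40,  90_000],
              [65, 250_000]], dtype=float)

# Min-max scaling: per column, the minimum maps to 0 and the maximum to 1,
# so age and income become directly comparable on the same scale.
col_min, col_max = X.min(axis=0), X.max(axis=0)
X_scaled = (X - col_min) / (col_max - col_min)
```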

Case Study: Netflix’s Million-Dollar Prize and the Data Privacy Lesson
Data privacy in the machine learning era might be more subtle than immediately apparent. In 2009, Netflix had to cancel its million-dollar prize that invited ML researchers to build a better recommendation system to predict movie ratings, based on a training data set from 480,000 Netflix customers. Although the data sets were constructed to preserve customer privacy, privacy advocates criticized the prize. In 2007, two researchers from the University of Texas at Austin were able to identify individual users by matching the data sets with film ratings on the Internet Movie Database (IMDb).21 On December 17, 2009, four Netflix users filed a class-action lawsuit against Netflix,22 alleging that Netflix had violated US fair trade laws and the Video Privacy Protection Act by releasing the data sets. There was public debate about privacy for research participants. On March 19, 2010, Netflix reached a settlement with the plaintiffs, after which they voluntarily dismissed the lawsuit.23

For ML auditors, data privacy in ML can sometimes be subtle, especially when feature engineering correlates one data source with external data sources in a way that may breach data privacy requirements.

20 Oneto, L.; S. Chiappa; “Fairness in Machine Learning: Recent Trends in Learning From Data,” Studies in Computational Intelligence, Springer, USA, 2020
21 Narayanan, A.; V. Shmatikov; “Robust De-anonymization of Large Sparse Datasets,” University of Texas at Austin, www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf
22 Jane Doe, et al., v. Netflix, Class Action Complaint, US District Court for the Northern District of California, 17 December 2009, www.wired.com/images_blogs/threatlevel/2009/12/doe-v-netflix.pdf
23 Singel, R.; “NetFlix Cancels Recommendation Contest After Privacy Lawsuit,” Wired, 12 March 2010, www.wired.com/2010/03/netflix-cancels-contest/
24 Gunning, D.; M. Stefik; J. Choi; T. Miller; S. Stump; G. Yang; “XAI—Explainable artificial intelligence,” Science Robotics, 18 December 2019, https://openaccess.city.ac.uk/id/eprint/23405/
25 Ribeiro, M.; S. Singh; C. Guestrin; “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier,” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2016, https://dl.acm.org/doi/10.1145/2939672.2939778


Model Training

Model training is the heart of building an ML model. The process of training a model involves selecting the optimal hyperparameters that fit the training data well, based on the business justification, use case and success criteria. The training objective differs by model type:
• Supervised learning—To minimize errors between the predicted values and target values
• Deep learning—To minimize fitting errors in each epoch iteration
• Unsupervised learning—To maximize cluster tightness metrics

When auditing the model training stage, the two key areas to examine are training data and model methodology.
• Training data—The quality of the training data determines the ML model quality. The auditor should ask the following questions regarding training data:
  • Where are the training data collected? Specify the sources and access rights.
  • What are the statistics for the training data (e.g., percentage of positive and negative labels)?
  • How are the training data prepared? What is the split ratio between training, testing and validation?
  • How are unbalanced data treated? For example, the training data may contain 10 fraud cases but 10,000 normal cases, because fraud is a small-probability incident for the population. Is up-sampling or down-sampling used to balance the training data?
• Model methodology—The ML model’s conceptual soundness is vital to success. ML model training is about finding the optimal model between overfitting and underfitting and understanding the assumptions made. The auditor should ask the following questions about model methodology:
  • What analytical assumptions are made?
  • What machine learning algorithms are used?
  • What are the model selection criteria? In other words, why use this algorithm instead of alternatives?
  • What are the limitations of this ML model?
  • Transparency—Is it a white box or black box model? How is the machine learning model output explained?

Knowledge Check: ML Model to Detect Online Fraud
An antifraud team is building an ML model to detect online fraud. They have collected 100 past fraud cases from the legal department. Each fraud case has its own unique characteristics, and the data scientists want to make sure the ML model covers all of them. So, the data scientists use all the data cases to train a complex decision tree model with a depth of 50, and they achieve results with 98% accuracy on all 100 cases.

Question: Is this a reliable ML model?26

Knowledge Check: ML Prototypes for Predicting Home Prices
A bank’s data science team has developed two ML prototypes for predicting home prices. Model 1 is a simple linear regression model with 5 parameters—a white box ML model. Model 2 is a 10-layer convolutional neural network (deep learning)—a black box ML model. During cross-validation, both models perform well in all metrics.

Question: Which ML model is preferable?27

26 Answer: No. This is a typical overfitting model. Without proper cross-validation, this model ends up memorizing all the cases but is unable to generalize to new examples.
27 Answer: Model 1, because it performs just as well but is simpler.
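The overfitting pitfall in the first knowledge check can be demonstrated with a held-out validation split. This is a synthetic sketch: the data are invented, and a 1-nearest-neighbor model stands in for the memorizing depth-50 decision tree:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a noisy linear relationship between one input and the target.
x = rng.uniform(0.0, 1.0, 200)
y = 3.0 * x + rng.normal(0.0, 0.5, 200)
train, val = np.arange(150), np.arange(150, 200)

# Model A: simple linear fit (does not memorize the training data).
a, b = np.polyfit(x[train], y[train], 1)
mse_linear = float(np.mean((a * x[val] + b - y[val]) ** 2))

# Model B: 1-nearest-neighbor, which memorizes every training case,
# much like the depth-50 decision tree in the knowledge check.
def knn1(q):
    return y[train][np.abs(x[train] - q).argmin()]

pred = np.array([knn1(q) for q in x[val]])
mse_memorize = float(np.mean((pred - y[val]) ** 2))

# Model B is perfect on its own training data yet noticeably worse on the
# held-out validation split, because it has fit the noise.
```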


Model Evaluation

Model evaluation rigorously measures an ML model’s performance using different metrics. Auditing ML models is probably the most critical step for the audit function, since it involves quantitative processes that are universal to all machine learning models.

As explained earlier, ML models are categorized into supervised learning, deep learning and unsupervised learning, and their evaluation metrics differ. In practice, it is more straightforward to measure supervised learning models because they use ground-truth data labeling.

Supervised Learning Model Evaluation
A common pitfall in auditing the ML model is relying exclusively on one metric. This can generate misleading results that are inconsistent with the model’s actual performance in the real world, as illustrated by the following knowledge check.

Knowledge Check: Is 99% Precision Good Enough?
A bank’s antifraud data science team claimed its first ML prototype achieved a remarkable 99% precision in fraud detection. That is, the new ML model was able to detect 99 out of 100 frauds.

Question: Was this ML model good enough?28

Finding the best ML model is a battle between underfitting and overfitting.
• Underfitting—The model is too simple for the complex data.
• Overfitting—The model is complex and tends to “memorize” noise in a large data set, while failing to capture the overall trend.

Figure 2 shows an example of a linear regression model to predict housing prices.

FIGURE 2: ML Models That Are Underfit, Optimal and Overfit
[Three price-vs.-size plots of a regression model for housing prices: an underfit linear model (θ0 + θ1x), an optimal quadratic model (θ0 + θ1x + θ2x²) and an overfit quartic model (θ0 + θ1x + θ2x² + θ3x³ + θ4x⁴).]
Source: Ng, A.; “Machine Learning,” offered by Stanford University through Coursera, www.coursera.org/learn/machine-learning

28 Answer: Not necessarily. An experienced auditor would ask about the false positive rate and the specificity. That is, of the transactions flagged as fraud, how many were actually normal (i.e., false positives)? Note that detecting 99 out of 100 frauds is really the model’s recall: a model that simply flags every transaction as fraud would detect 100% of frauds, yet it would clearly be ineffective, because it would create an excessive number of false positives and would not be practical for production.


To get a holistic assessment of the ML model, it is key to adopt scientific experiment settings.

Three Data Sets for ML Model Training
• Training data set—Sample of data used to fit the model.29
• Validation data set—Sample of data used to provide an unbiased evaluation of a model fit on the training data set while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation data set is incorporated into the model configuration.
• Test data set—Sample of data used to provide an unbiased evaluation of a final model fit on the training data set.

K-fold Cross-Validation
A more rigorous assessment, cross-validation uses a resampling procedure to evaluate ML models on a limited data sample. It shuffles and splits the data into k groups, keeping one group as a test data set and running the training procedure on the remaining k-1 folds, and repeats this k times. The use of k-fold cross-validation will result in a lower-bias ML model.

Confusion Matrix and Its Derived Metrics
Confusion matrices are structured as tables to help users visualize the overall performance of an ML model. Various metrics can be derived from them (figure 3), making them the most insightful tools for ML model evaluation.
• Accuracy—Proportion of the total number of predictions that are correct
• Precision—Proportion of predicted positive cases that are correctly identified
• Negative predictive value—Proportion of predicted negative cases that are correctly identified
• Sensitivity or recall—Proportion of actual positive cases that are correctly identified
• Specificity—Proportion of actual negative cases that are correctly identified

FIGURE 3: Confusion Matrix

                          Actual positive (P)     Actual negative (N)
Predicted positive (PP)   True positive (TP)      False positive (FP)    Precision = TP/PP
Predicted negative (PN)   False negative (FN)     True negative (TN)
                          Sensitivity (recall)    Specificity            Accuracy
                          = TP/P                  = TN/N                 = (TP + TN) / (P + N)
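The metrics in figure 3 can be computed directly from the four cell counts. The counts below are invented for illustration:

```python
# Hypothetical confusion-matrix counts for a fraud classifier.
TP, FN = 90, 10      # actual positives: caught vs. missed
FP, TN = 5, 895      # actual negatives: false alarms vs. correctly cleared

P, N = TP + FN, FP + TN      # actual positives / actual negatives
PP, PN = TP + FP, FN + TN    # predicted positives / predicted negatives

accuracy = (TP + TN) / (P + N)
precision = TP / PP                  # of flagged cases, how many are real
recall_sensitivity = TP / P          # of real positives, how many are caught
specificity = TN / N                 # of real negatives, how many are cleared
negative_predictive_value = TN / PN  # of cleared cases, how many are truly clean
```

With these counts, accuracy alone (98.5%) would hide the fact that 10% of frauds slip through, which is exactly why figure 3 derives several metrics rather than one.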

ROC Curve and AUC Score
The receiver operating characteristic (ROC) curve (figure 4) is created by plotting the true-positive rate (TPR) against the false-positive rate (FPR) at various threshold settings. ROC analysis provides tools to select possibly optimal models. In practice, when the ROC curve approaches the top-left point (the perfect classifier), an AUC (area under the ROC curve) score close to 1 indicates a good ML model.30

FIGURE 4: ROC Curve
[Plot of true positive rate (0.0–1.0) against false positive rate (0.0–1.0): curves bowing toward the top-left “perfect classifier” corner are better; the diagonal corresponds to a random classifier; curves below it are worse.]

29 Shah, T.; “About Train, Validation and Test Sets in Machine Learning,” Towards Data Science, 6 December 2017, https://towardsdatascience.com/train-validation-and-test-sets-72cb40cba9e7
30 Wikipedia, “Receiver operating characteristic,” https://en.wikipedia.org/wiki/Receiver_operating_characteristic
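The AUC score can be approximated from a handful of (FPR, TPR) operating points with the trapezoidal rule. The points below are invented; in practice they come from sweeping the model's decision threshold:

```python
# Illustrative ROC operating points, sorted by false-positive rate.
fpr = [0.0, 0.1, 0.3, 1.0]
tpr = [0.0, 0.7, 0.9, 1.0]

# Area under the ROC curve via the trapezoidal rule.
auc = sum(
    (fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2
    for i in range(len(fpr) - 1)
)

# A random classifier (the diagonal in figure 4) scores 0.5; this curve
# hugs the top-left corner and scores much closer to 1.
```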


Unsupervised Learning Model Evaluation
Evaluating the performance of unsupervised learning models, such as clustering, is not as simple as evaluating a supervised classification algorithm, due to the lack of labeled data. Results can be subjective if not measured quantitatively.

Intuitively, a good clustering model will create dense clusters (small intracluster distance), while keeping each cluster away from others (large intercluster distance). Most of the clustering model metrics are based on such heuristics.

Silhouette Score
The silhouette score31 measures the separation distance between clusters. It displays a measure of how close each point in a cluster is to points in neighboring clusters. With a range of [-1, 1], this measure is a great tool to visually inspect the similarities within clusters and differences across clusters.

The silhouette score is calculated using the mean intracluster distance (i) and the mean nearest-cluster distance (n) for each sample. The silhouette coefficient for a sample is (n - i) / max(i, n), where n is the mean distance between the sample and the points of the nearest cluster the sample is not a part of, and i is the mean distance within the sample’s own cluster.

Other Unsupervised Learning Metrics:
• Rand Index32
• Adjusted Rand Index
• Mutual information
• Calinski-Harabasz Index
• Davies-Bouldin Index

Model Deployment and Prediction

Once an ML model is trained and tested, the next step is to deploy the model in production and make predictions on new live data streams. For example, a random forest model deployed as a web service can predict whether a payment transaction is fraudulent.

There is a paradigm shift in the modern ML industry. The ML model is not static; it requires regular updating of model parameters and input data, depending on use cases. Auditors need to enforce continuous monitoring.

Understanding service-level agreements (SLAs) is critical to the eventual success of an ML application. An SLA for model deployment and prediction typically includes:
• Hardware requirements—What hardware (chipset) specifications are required to meet the business needs?
  • Cloud-based or edge device-based?
  • Central processing unit (CPU) vs. graphics processing unit (GPU)?
  • Specific chipset-level support required?
  • Other hardware specifications (e.g., RAM, I/O, network bandwidth) required?
  • Alignment with business requirements? (Cloud-based ML web applications inherit the cloud vendor’s SLA terms applicable to availability and network latency in certain regions.)
• Software requirements—The software environment will impact the ML model production as well.
  • Operating system—Linux, Unix, Windows, macOS, iOS, Android or other OS?
  • Software library versions—Not all software libraries will be backward-compatible, especially open-source libraries, and upgrades may break previously working applications. Hence, auditors need to examine the versions.

31 Zuccarelli, E.; “Performance Metrics in Machine Learning — Part 3: Clustering,” Towards Data Science, 31 January 2021, https://towardsdatascience.com/performance-metrics-in-machine-learning-part-3-clustering-d69550662dc6
32 Scikit-Learn, “Clustering,” https://scikit-learn.org/stable/modules/clustering.html#rand-index
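The per-sample silhouette coefficient defined earlier, s = (n - i) / max(i, n), can be sketched directly. This is a toy illustration with invented 2-D points, not a library API:

```python
from math import dist  # Euclidean distance (Python 3.8+)

def silhouette_coefficient(sample, own_cluster, other_clusters):
    """s = (n - i) / max(i, n) for one sample, per the definition above."""
    # i: mean distance from the sample to the points of its own cluster.
    i = sum(dist(sample, p) for p in own_cluster) / len(own_cluster)
    # n: mean distance to the nearest cluster the sample is not a part of.
    n = min(
        sum(dist(sample, p) for p in cluster) / len(cluster)
        for cluster in other_clusters
    )
    return (n - i) / max(i, n)

# A point close to its own cluster and far from the other scores near +1.
s = silhouette_coefficient(
    sample=(0.0, 0.0),
    own_cluster=[(0.0, 1.0), (1.0, 0.0)],
    other_clusters=[[(10.0, 10.0), (10.0, 11.0)]],
)
```

A sample equidistant from its own cluster and the nearest other cluster scores 0, and a sample closer to another cluster than to its own scores below 0, which is why the range is [-1, 1].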


• Latency and throughput—Latency measures how fast and throughput measures how many predictions are delivered simultaneously. The auditor’s considerations include:
  • To evaluate latency, how many milliseconds does each prediction take? In some financial scenarios, such as credit card payments, transaction fraud detection may require real-time prediction capability (i.e., subsecond) to avoid a negative impact on customer satisfaction.
  • To evaluate throughput, how many predictions at peak does the ML application need to handle? Is batch mode supported?

Continuous monitoring is crucial for optimal ML model performance, especially on data streams that may introduce new instances the ML model cannot interpret accurately. This is a common problem in adversarial environments such as fraud detection and cybersecurity intrusion detection, where adversaries change their attack tactics to bypass existing detection methods. The audit practitioner should examine whether the ML model production has implemented continuous monitoring to adapt to the new data landscape and determine the best time to retrain the ML model.

Example: Deep Learning and GPU
Most DL algorithms rely primarily on GPUs for optimal speed and perform poorly on CPU-based architectures. In a recent TensorFlow benchmark, algorithms with GPU convolutional neural nets (CNN) were more than 600% faster than with a CPU only. The auditor should examine whether a certain processing hardware environment is assumed.33

Popular Machine Learning Libraries

As part of the technical risk assessment process, IT auditors should also examine the ML libraries being adopted. Popular programming languages for ML and DL include Python, Julia, R and Java, but Python may be emerging as the preferred language of machine learning. The availability of libraries and open-source tools makes Python an ideal choice for developing ML models.34

Example: Apple M1 Chipset for Machine Learning
Some ML applications will be deployed on mobile devices, and it is important for auditors to examine whether certain chipset features are required for expected performance. Apple’s M1 system chip with the Apple Neural Engine for ML models delivers up to 15 times faster ML performance.35

Example: Is a 10-Millisecond ML Prediction Fast Enough?
It takes a payment fraud ML model only 10 milliseconds to predict whether a transaction could be fraudulent, which appears to be a relatively fast computation. However, context is important. A global payment network such as Visa handles 24,000 transactions per second. In this case, on one web application server, the ML system would take 240 seconds to predict 1 second’s worth of payment throughput, which is unacceptable. Hence, it is important to understand how an ML application can be scaled to handle high throughput.

33 DATAmadness, “TensorFlow 2 - CPU vs GPU Performance Comparison,” 27 October 2019, https://datamadness.github.io/TensorFlow2-CPU-vs-GPU
34 Costa, C.; “Best Python Libraries for Machine Learning and Deep Learning,” Towards Data Science, 24 March 2020, https://towardsdatascience.com/best-python-libraries-for-machine-learning-and-deep-learning-b0bd40c7e8c
35 Apple, “Apple Unleashes M1,” 10 November 2020, www.apple.com/newsroom/2020/11/apple-unleashes-m1/
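The capacity arithmetic in the 10-millisecond example can be checked in a few lines. The figures are taken from the example; the single-server model assumes strictly sequential predictions:

```python
latency_s = 0.010     # 10 ms per prediction
peak_tps = 24_000     # Visa-scale transactions per second

per_server_tps = 1 / latency_s               # 100 sequential predictions per second
compute_per_second = peak_tps * latency_s    # 240 s of compute per 1 s of traffic
servers_needed = peak_tps / per_server_tps   # 240 parallel workers to keep up
```

The same arithmetic gives the auditor a quick sizing question: how does the deployment reach roughly 240-fold parallelism (more servers, batching or faster hardware)?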


Popular Open-Source ML Libraries
Widely used open-source ML libraries include the following:
• TensorFlow—TensorFlow is one of the best libraries available for working with machine learning in Python. Offered by Google, TensorFlow makes ML model building easy for beginners and professionals alike.
• PyTorch—Developed by Facebook, PyTorch is one of the leading machine learning libraries for Python. Apart from Python, PyTorch also supports C++ through its C++ interface. It is considered among the top contenders for best machine learning and deep learning framework.
• Scikit-learn—Scikit-learn is an actively used ML library for Python. It integrates easily with other ML programming libraries such as NumPy and Pandas.
• Pandas—Pandas is a Python data analysis library used primarily for data manipulation and analysis before the data set is prepared for training. Pandas makes working with time series and structured multidimensional data effortless for machine learning programmers.
• Spark MLlib—For large-scale machine learning, Apache Spark is a popular choice. Apache Spark MLlib is an ML library that enables easy scaling of computations. It is simple to use, quick to set up, and integrates smoothly with other tools.

Commercially Available ML Libraries
Widely used commercial ML libraries include the following:
• SAS
• H2O
• RapidMiner

The auditor should ask the following questions regarding machine learning libraries:
• What software license does this ML library have?
• Is this ML library the latest stable version?
• Does this ML library contain known vulnerabilities identified in databases such as CVE?36 For example, certain versions of NumPy (Python) have reported major vulnerabilities such as a DoS overflow.
• Are there any known compatibility or dependency issues? For example, Pickle, a popular Python persistence library, has reported compatibility issues with Pandas’ data frame in certain versions.37
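A version inventory of the kind these questions call for can be scripted with the standard library. This is a minimal sketch: importlib.metadata reads installed package metadata, and the package names below are placeholders:

```python
from importlib import metadata

def library_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions

# An auditor can diff this inventory against release notes, license terms
# and CVE records for each library in scope.
inventory = library_versions(["numpy", "surely-not-an-installed-package"])
```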

Conclusion
The elements of the roadmap identified in this white paper—data governance, data engineering, feature engineering, model training, model evaluation, and model deployment and prediction—are key risk factors in ML applications. These elements provide specific stages in ML where practitioners can identify audit considerations. For auditors becoming familiar with ML, it is important to understand the training and testing of models prior to deployment. Once ML models are deployed, auditors should examine data outliers and acknowledge the need for continuous monitoring, as these models are not static. This white paper provides the foundation for auditors to gain an understanding of these and other aspects of auditing ML. It also identifies the ML pre-implementation, post-implementation and continuous monitoring elements that auditors need to evaluate and communicate, to identify where ML opportunities have been leveraged as expected and where there are opportunities for improvement.

36 MITRE Corporation, “CVE Details: The Ultimate Security Vulnerability Data Source,” www.cvedetails.com/vulnerability-list/vendor_id-16835/product_id-39445/Numpy-Numpy.html
37 GitHub, Inc., “Pickle incompatibility between 0.25 and 1.0 when saving a MultiIndex dataframe #34535,” https://github.com/pandas-dev/pandas/issues/34535


Appendix: Recommended Resources

Machine learning has been among the most active computer science academic research areas for decades, and efforts from researchers and statisticians have enabled the field to progress from abstraction to maturity. Although practitioners are not required to understand the comprehensive theoretical aspects of machine learning algorithms, the following books and online resources are recommended for those who wish to develop a deeper understanding of ML.

Books
Bishop, C.; Pattern Recognition and Machine Learning, Springer, USA, 2006, https://link.springer.com/book/9780387310732

Domingos, P.; The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World, Basic Books, USA, 2015, https://www.basicbooks.com/titles/pedro-domingos/the-master-algorithm/9780465061921/

Goodfellow, I.; Y. Bengio; A. Courville; Deep Learning, MIT Press, USA, 2016, https://mitpress.mit.edu/books/deep-learning

Hastie, T.; R. Tibshirani; J. Friedman; The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, USA, 2016, https://link.springer.com/book/10.1007/978-0-387-84858-7

Course
Ng, A.; “Machine Learning,” offered by Stanford University through Coursera, www.coursera.org/learn/machine-learning


Acknowledgments
ISACA would like to acknowledge:

Lead Developer

Victor Fang, Ph.D.
CEO, AnChain.AI Inc., USA

Expert Reviewers

Ibrahim Sulaiman Alnamlah
CISA, COBIT 2019, ITLv4, ISO/IEC 27001 LA
Saudi Arabia

Chetan Anand
CDPSE, CCIO, CPISI, ISF IRAM2, ISO 9001 LA, ISO 22301 LA, ISO 27001 LA, ISO 27701, ISO 31000, SQAM, Lean Six Sigma Green Belt, Agile Scrum Master, Fellow of Privacy Technology, NLSIU Privacy and Data Protection Laws
India

Shigeto Fukuda
CISA, CDPSE
Japan

Kevin Fumai
CDPSE, CCSK, CEET, CIPM, CIPP/US/E, CIPT, FIP, PLS
USA

Shamik Kacker
CISM, CRISC, CDPSE, CCSK, CCSP, CISSP, ITIL Expert, TOGAF 9
USA

Harmendra Nitish Koladoo
CISA, CDPSE, CEH, CHFI, ISO 27001 LA
Hong Kong SAR, China

Frank Oelker
CISA
Germany

Christian Nyanor Ohene
CISA, CEH
Ghana

Jian Qin, Ph.D.
CISA, CISSP, PMP, ODSC Machine Learning Certification
USA

Veronica Rose
CISA, CDPSE
Senior Information Systems Auditor–Advisory Consulting, KPMG Uganda; Founder, Encrypt Africa, Kenya

Xitij U. Shukla, Ph.D.
CISA
India

Ioannis Vittas
CISA, CISM, COBIT 2019 Foundation, BCCLA
Greece

Wickey Wang
CISA, Six Sigma Green Belt
USA

Yap Kai Yeow
CISA, CISM, CRISC, CAMS
Singapore

ISACA Board of Directors

Pamela Nigro, Chair
CISA, CGEIT, CRISC, CDPSE, CRMA
Vice President, Security, Medecision, USA

John De Santis, Vice-Chair
Former Chairman and Chief Executive Officer, HyTrust, Inc., USA

Niel Harper
CISA, CRISC, CDPSE, CISSP
Chief Information Security Officer, Data Privacy Officer, Doodle GmbH, Germany

Gabriela Hernandez-Cardoso
Independent Board Member, Mexico

Maureen O’Connell
NACD-DC
Board Chair, Acacia Research (NASDAQ); Former Chief Financial Officer and Chief Administration Officer, Scholastic, Inc., USA

David Samuelson
Chief Executive Officer, ISACA, USA

Gerrard Schmid
Former President and Chief Executive Officer, Diebold Nixdorf, USA

Bjorn R. Watne
CISA, CISM, CGEIT, CRISC, CDPSE, CISSP-ISSMP
Senior Vice President and Chief Security Officer, Telenor Group, Norway

Asaf Weisberg
CISA, CISM, CGEIT, CRISC, CDPSE, CSX-P
Chief Executive Officer, introSight Ltd., Israel

Gregory Touhill
CISM, CISSP
ISACA Board Chair, 2021-2022
Director, CERT Center, Carnegie Mellon University, USA

Tracey Dedrick
ISACA Board Chair, 2020-2021
Former Chief Risk Officer, Hudson City Bancorp, USA

Brennan P. Baybeck
CISA, CISM, CRISC, CISSP
ISACA Board Chair, 2019-2020
Vice President and Chief Information Security Officer for Customer Services, Oracle Corporation, USA

Rob Clyde
CISM, NACD-DC
ISACA Board Chair, 2018-2019
Independent Director, Titus; Executive Chair, White Cloud Security; Managing Director, Clyde Consulting LLC, USA


About ISACA

For more than 50 years, ISACA® (www.isaca.org) has advanced the best talent, expertise and learning in technology. ISACA equips individuals with knowledge, credentials, education and community to progress their careers and transform their organizations, and enables enterprises to train and build quality teams that effectively drive IT audit, risk management and security priorities forward. ISACA is a global professional association and learning organization that leverages the expertise of more than 150,000 members who work in information security, governance, assurance, risk and privacy to drive innovation through technology. It has a presence in 188 countries, including more than 220 chapters worldwide. In 2020, ISACA launched One In Tech, a philanthropic foundation that supports IT education and career pathways for under-resourced, under-represented populations.

1700 E. Golf Road, Suite 400
Schaumburg, IL 60173, USA
Phone: +1.847.660.5505
Fax: +1.847.253.1755
Support: support.isaca.org
Website: www.isaca.org

Provide Feedback: www.isaca.org/audit-practitioners-guide-to-ML-part-1
Participate in the ISACA Online Forums: https://engage.isaca.org/onlineforums
Twitter: www.twitter.com/ISACANews
LinkedIn: www.linkedin.com/company/isaca
Facebook: www.facebook.com/ISACAGlobal
Instagram: www.instagram.com/isacanews/

DISCLAIMER

ISACA has designed and created Audit Practitioner’s Guide to Machine Learning, Part 1: Technology (the “Work”) primarily as an educational resource for professionals. ISACA makes no claim that use of any of the Work will assure a successful outcome. The Work should not be considered inclusive of all proper information, procedures and tests or exclusive of other information, procedures and tests that are reasonably directed to obtaining the same results. In determining the propriety of any specific information, procedure or test, professionals should apply their own professional judgment to the specific circumstances presented by the particular systems or information technology environment.

RESERVATION OF RIGHTS

© 2022 ISACA. All rights reserved.

Audit Practitioner’s Guide to Machine Learning, Part 1: Technology
