Beyond Algorithms: Delivering AI for Business
“Much has been written about the mechanics of modern AI – what it is, what it isn’t, and
how to build and train one – but this fascinating book is among the few that explain what
it means for the enterprise – where it fits into business needs, how it can be integrated into
existing systems, and how it can create opportunities for new value.”
—Grady Booch, IBM Fellow, Chief Scientist for Software Engineering
“Books about AI too often tend towards the extremes – either praising or condemning
unconditionally. Beyond Algorithms offers a more nuanced, nontechnical introduction from
experienced practitioners. Filled with guiding principles and real-world examples, this book
offers a balanced view of AI’s strengths and limitations and provides practical advice for
problem selection, project management, and expectation setting for AI initiatives.”
—Emily Riederer, Senior Analytics Manager at Capital One
“Beyond Algorithms is one of those wonderful story-driven IT books that distils decades’ worth of real-world experience into intoxicating wisdom that is refreshingly easy to
consume. The book plows through and around the hype of the ‘all powerful’ deep neural
nets to provide an engineering approach for the field. It won’t tell you how to establish
the Singularity, but its ‘A to I checklist’, accuracy and monitoring advice, and ‘doability
method’ could make you an AI delivery hero.”
—Richard Hopkins, FREng FIET CEng, Former President of IBM
Academy of Technology
“Important and timely. A practical guide for realizing business value through AI capabilities!
Rather than focusing on hard-to-access details of the mathematics and algorithms behind
modern AI, this book provides a roadmap for a diversity of stakeholders to realize AI business
solutions that are reliable, responsible, and sustainable.”
—Matthew Gaston, AI Engineering Evangelist
Beyond Algorithms
Delivering AI for Business
James Luke
David Porter
Padmanabhan Santhanam
First Edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to
trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to
publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know
so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including
photocopying, microfilming, and recording, or in any information storage or retrieval system, without written
permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the
Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are
not available on CCC please contact [email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for
identification and explanation without intent to infringe.
DOI: 10.1201/9781003108498
Authors
Dr. James Luke For as long as I can remember, I have been dreaming of Thinking Machines.
As a child, I was fascinated by the prospect of creating intelligent machines and desperate
to understand how such machines could be created. As an adult, I have focused my entire
career, first in IBM and now at Roke, on building real artificial intelligence (AI) systems
that solve real problems.
It may surprise some to hear that for most of my career there was little interest, from
any quarter, in AI. In fact, I remember being told by at least one senior executive, in the late
1990s, that if I wanted to be successful in my career, I had to stop going on about AI. I still
meet him every so often … but he’s even more senior now … and I’m ashamed to admit
that I haven’t had the courage to remind him of his terrible career advice.
The recent explosion of interest in AI is both exciting and terrifying for me. Exciting as,
for the first time in my career, there is genuine interest in the mainstream adoption of AI.
Terrifying as, with so much expectation, there is a real risk that society will become disil-
lusioned if projects fail to deliver.
In this book, I want to share my experience with you. Quite simply, I want to spare you
the terrible sense of pain that comes with a failed project by sharing the experience that I
myself and others have gained over many years. To use a simple analogy, I often feel that
many newcomers to the domain of AI are learning to fly … by crashing. I’d like to teach
you to fly safely and without the need to experience a crash.
David Porter I started my degree at the University of Greenwich with a vague idea I wanted
to do something in IT, somewhere the magic happened, and I graduated in 1995 with a
burning passion for all things data and analytics. I have been lucky enough to work in
Data Science, ever since. I have held senior consultancy roles at SAS Software, Detica/BAE
Systems and now IBM. Early on in my consultancy career, I chose to focus on counter-fraud and law enforcement systems, and I have never regretted that choice. My specialisation has allowed me to work with governments and organisations all over the world; it turns out financial crime is both universal and relentless. I didn’t know it then, but it has turned into one of those rare things, a job for life. Career highlights include co-inventing the NetReveal graph analytics software, which in turn enabled the UK’s first Insurance Fraud Bureau to launch, and designing the conceptual architecture for what was to become the UK’s groundbreaking tax compliance system, Connect. Job security hasn’t been the only upside: pitting my analytic wits against sometimes brilliant (but mostly lucky) criminals has sustained my insatiable curiosity and fascination for every new AI development, in the hope I can use it to sock it to the bad guys. As I often say, to anyone who will listen, it’s like being paid to play chess. I joined IBM in 2016, enticed by the Watson story: could AI be used to catch crooks? It took a while for me to convert my maths brain to understand Natural Language Processing (NLP), but it was totally worth the effort; who knew words could be used to predict things?
I’ve been putting NLP to good use ever since. NLP unlocks the value buried in the written
data that every organisation collects, but no one has time to read. We truly are at the start of
a new data gold rush. My ambition in this book is to help you get at that gold.
Dr. Padmanabhan Santhanam After graduate school and a brief career in low-temperature
physics, I joined a team in IBM to work in software engineering research some 28 years ago.
The goal was to improve the quality of commercial software produced and the productivity of
software engineers who created it. I realised that software development was a combination
of computer science, some engineering, sociology and culture. I spent two decades working on
leading-edge tools that covered a wide range of topics such as software defect analysis, project
metrics, model-based testing and human–computer interaction. One of my proud accom-
plishments was a tool that automated ways to assess the maturity of a software project by
analyzing various project artefacts. It was deployed across hundreds of software development
teams in IBM with much success. Almost 15 years ago, we leveraged the NLP technology (that
was created for the IBM Watson Jeopardy Project) to analyse textual use cases (in English and
Japanese) in software projects. We were able to extract expected software behaviour and create
functional test cases and other key artefacts automatically. Since 2014, I have worked on vari-
ous aspects of AI technical strategy for IBM Research and our clients.
My personal interest is both in the use of AI for engineering traditional software systems
and the emerging field of AI Engineering, i.e. how to engineer trustworthy AI systems. In
this book, I want to bring some practical perspectives on AI Engineering.
Acknowledgements
This book is truly a labour of love over the past 2 years! James Luke had sensed the
need for this book based on his experiences with clients and had even started writing
drafts of some chapters. With the usual demands at work, he needed partners to make it
a reality. The collaboration on the book started on 14 September 2019 when James sent an
email to the other two authors asking for interest in the project. From the very first meeting,
we all agreed on the need for the book, what the book should be about and who the target
audience must be. Thus started the long journey. To keep the consistency of the narrative,
we decided that all of us would contribute to all the chapters, instead of divvying up the
chapters among us. Given the broad audience we had in mind, the concept of using
vignettes as side narratives for different purposes was also very appealing.
To give the book the authenticity of different voices and experiences, we recruited a few
of our IBM colleagues to write vignettes on various related topics.
We are indebted to them for their enthusiasm and contributions to enrich the contents
of this book.
We discovered our publisher, Randi Cohen, at Taylor & Francis in early 2020. She has
been patient with us over the months, and has provided feedback and steady encourage-
ment at every opportunity.
We thank Annemarie McLaren for bringing her fresh eyes to review the contents of
this book and giving us detailed suggestions for improvement. Francesca Rossi helped to
clarify the articulation of trustworthy AI and the role of ethics in developing AI systems.
In addition, PS would like to thank Zach Lemnios and Glenn Holmes for their enthu-
siastic management support for his contribution to this book and Hema Santhanam for
many useful discussions on various sections of the book while being homebound during
the COVID pandemic.
In the end, we hope that we have provided some clarity on what it takes to select and run a successful AI project in business with the current state of the technology, and we look forward to hearing from you on how we did in helping you overcome the common pitfalls in adopting AI for business.
James Luke
David Porter
Padmanabhan Santhanam
November 2021
Prologue
The CEO has declared that our new business will be the lead in Artificial
Intelligence within 3 months. He read it in the newspaper, everyone is doing it, it
has got to be good; we must do it!
Looks like we may need some data to make it work. No problem, we have loads of data! Between the data in our legacy systems and the internet, which is free, right? There should be plenty. Let us start with what we can find to build a prototype in the next month. We can work out the GDPR issues later.
I know you are going to say the data format we have isn’t quite right! And
some documents are in foreign languages that the team doesn’t speak, but that
shouldn’t be a problem because they’re easy to translate online, right? Sure, some
of our corporate data is not really digital … our documents are mainly typed with
handwritten notes; and so what if a few of the documents were burnt (only at the
edges) in a fire, that’s OK, we shouldn’t worry, AI can deal with all that. Did you see there was an article in the paper last week about how AI is already working on damaged documents in the National Museum? The CEO says, if it’s good enough for them…
With all that data, we just need some good Machine Learning algorithms to
make sense of it. There has to be gold in there somewhere, the AI will find it
for us. We had some interns put together a Machine Learning app in just a few
days, so our programmers should have no trouble putting a production system
together.
Our CTO thinks our current infrastructure and DevOps processes should be
more than enough to create the first version of the AI application in three months.
We will deal with any other issues later, OK?
Oh, just a reminder, do not bother our business subject matter experts yet.
They’re busy with our clients. We can always get them to talk to the programmers
later when there is something to see. Besides, AI should be able to figure this stuff
out on its own. I hear accuracy of these algorithms can be 80% without too much
trouble and that should be good enough for our business.
See you in 3 months for the big launch!
If any part of the above resonates with your current Artificial Intelligence project, then you
need to read this book!
If all the above resonates, then you need to find a new job and STILL read this book.
If none of the above resonates, and you are successfully rolling out new Artificial
Intelligence applications without a problem, then please get in touch so that you can help
us co-author the next edition of this book!
Chapter 1

Why This Book?
In the years before the Second World War, officials in the British Air Ministry became
increasingly concerned about the risks to aircraft of a death ray. In addition to offer-
ing a £1,000 prize to anyone who could demonstrate an effective death ray, they also
consulted the scientist Robert Watson-Watt. Watt, after collaborating with his col-
league Skip Wilkins, determined that a death ray was not feasible. However, Watt and
Wilkins did propose bouncing radio waves off aircraft as a means of detecting and
tracking them. The search for a death ray inadvertently led to the invention of radar.
AI IS EVERYWHERE
The moral of this story [1] is simple. When it comes to any form of new technology, the
destination you arrive at may not be the one you were planning for. As a society, we are
embarking on arguably the most exciting technological journey: the quest to improve
the way we live and, especially, the way we work through the practical use of Artificial
Intelligence (AI). Reading the daily media reports and the recent AI Index report [2], many
would argue that we are already using AI extensively in our personal and professional lives.
One reason that AI is seen as ubiquitous is that there are many high-profile and success-
ful applications that do have a major impact on our daily lives. Many of these applications
are driven by large web companies such as Google, Amazon and Facebook and enable
impressive applications such as search, question answering, and image classification. In
our online shopping, we experience AI-enabled profiling and targeted advertising continu-
ally and can be forgiven for thinking that the AI knows us better than we know ourselves.
We no longer need to learn a foreign language as online translation tools allow us to read
foreign language text or order a drink anywhere in the world. When we call our bank or
book a holiday, we can count on dealing with virtual agents that seem to understand our
conversation as if they were human. So, what’s the problem?
ENTERPRISE APPLICATIONS
Enterprise applications are different from these popular consumer web applications, and
they really do run the world in both the commercial and public sectors. In the commercial
sector, enterprise applications are used in retail sales, insurance, finance, telecommunica-
tions, transportation, manufacturing, and many other industries. In the public sector, gov-
ernment agencies deploy applications to support areas including Law Enforcement, Social
Security Administration, Internal Revenue Service, Health Services and National Defense.
With so much AI already available and working for our benefit, you may be surprised to
know that not only are actual enterprise applications of AI still in their infancy, but that
the majority of enterprise AI projects fail. As evidence, recent analyst reports [3–6] estimate at most a 20%–40% success rate for the adoption of AI to create business value. This
supports the assertion that moving AI from a proof-of-concept to a real business solution
is not a trivial exercise.
While the consumer web applications mentioned above are extensive in their reach and
impact, they represent just a tiny fraction of real Information Technology (IT) applica-
tions around the world. Hidden under the covers of the organisations that we rely on for
our day-to-day lives are tens of thousands of applications. Individually, these applications
may appear much smaller in reach and impact than web search or online shopping, but, in
reality, they are really critical to our lives. These applications perform all the critical func-
tions of the modern world from managing our prescriptions to evaluating life insurance
risks, controlling city traffic, managing bank accounts and scheduling the maintenance
of trains and buses. There is a vast number of such enterprise applications, and many
could benefit from the application of AI. However, many well-intended AI projects underestimate the extra complexity of delivery in an enterprise setting and, often even after stellar early success, still fail to deliver actual business benefit. In creating these
enterprise applications, we need to recognise that they are very different from consumer
web applications and that delivering AI in the enterprise is different from delivering AI
in a web company.
Consider, for example, a consumer web application such as one of the personal assis-
tants that we all now have in our homes and use in our everyday activities. These assis-
tants are designed to answer the most frequently asked questions such as, “is it going to
rain today?” To answer these questions, the developers provide specific services and then
capture data from millions of users to discover all the possible ways the question could
be asked.
But what about answering general knowledge questions? While we all love to be
impressed by the power of our online assistants, answering general knowledge ques-
tions is considerably easier than answering enterprise questions. “Who was the British
Prime Minister during the Suez Crisis?” requires a factual response. To find the answer
to such a transactional question, the technology can exploit the massive levels of infor-
mation redundancy on the internet. There are tens, if not hundreds of thousands of
documents online about the Suez Crisis. This information redundancy means that it
is possible to use simple correlation algorithms to identify the correlation between the
terms “Suez crisis”, “Prime Minister” and “Anthony Eden”. All the web AI has to do is
take the statistically strongest answer. The internet also includes many trusted sources
of data, including news agencies, educational organisations and encyclopaedic sites
that aim to provide validated and trustworthy information. In tuning their algorithms,
the web companies can use the feedback from millions of users to spot and correct
errors.
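To make the idea concrete, here is a minimal sketch in Python of such a correlation algorithm, with a handful of invented snippets standing in for the web’s redundancy. The candidate extraction is deliberately crude; the point is simply that, given enough repetition, taking the statistically strongest answer works:

```python
import re
from collections import Counter

# Toy corpus standing in for web-scale redundancy (invented snippets).
documents = [
    "During the Suez Crisis, the British Prime Minister was Anthony Eden.",
    "Anthony Eden, Prime Minister at the time of the Suez Crisis, resigned in 1957.",
    "The Suez Crisis ended the career of Prime Minister Anthony Eden.",
    "Harold Macmillan succeeded Anthony Eden as Prime Minister in 1957.",
]

def strongest_answer(docs, required_terms):
    """Collect candidate names from documents mentioning all the
    question terms, then return the most frequent candidate."""
    votes = Counter()
    for doc in docs:
        if all(term.lower() in doc.lower() for term in required_terms):
            # Crude candidate extraction: pairs of capitalised words.
            for name in re.findall(r"[A-Z][a-z]+ [A-Z][a-z]+", doc):
                if name not in ("Suez Crisis", "Prime Minister"):
                    votes[name] += 1
    return votes.most_common(1)

print(strongest_answer(documents, ["Suez Crisis", "Prime Minister"]))
# [('Anthony Eden', 3)] - noisy candidates appear once; redundancy
# makes the right answer dominate the vote.
```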
In other words, consumer web applications tend to focus on quite simple, common
tasks that can be performed with huge volumes of information, from many (often trusted)
sources, while using feedback from tens of millions of users to tune the algorithms. In the
enterprise, the questions asked are rarely that simple and the volumes of information are
much, much lower. Often information is contradictory and may take a skilled user to assess
and really understand. Alternatively, the required information may not exist at all. As for
the number of people involved, in an enterprise, far fewer people ask far more complex
questions and the differences between questions may be subtle, but important. Our ability
to capture high volume feedback is limited.
Finally, there is one further and very significant difference between applying AI in an
enterprise setting and applying AI for consumer web applications. We are much more for-
giving of errors in web AI; it is mostly of low consequence if a web AI brings back a wrong
answer. If we ask, “Alexa, play my favorite tune” and the response is, “THE MONTH OF
JUNE HAS 30 DAYS”, it’s another thing to laugh about at dinner parties. However, we will
be far less forgiving if a police application leads to the arrest of the wrong person or a medi-
cal AI leads to a misdiagnosis.
The excitement generated by AI web applications does still add value to enterprise appli-
cations. First, it changes the way we might think about enterprise applications by placing
greater emphasis on ease of access and simplicity. Graduates joining modern enterprises
expect web technology and web style user experiences in their life at work. Second, it drives
innovation and helps push forward the business cases for enterprise applications of AI.
Of course, such an endeavour will succeed only if the AI can deliver on the expectations
assumed in the business case. In this respect, we must recognise that the domain of AI has
a track record of failed delivery.
AI WINTERS
There have been two periods in the history of AI where dashed expectations have led the
industry to lose all confidence in AI. The perception of AI was so poor that these peri-
ods were called ‘AI Winters’. During these periods, funding for new AI endeavours all
but disappeared and there was widespread disillusionment, some would say cynicism,
about AI.
The Beginning
The modern formulation of AI started with Alan Turing’s famous paper, “Computing
Machinery and Intelligence” [1], where he discussed the concept of computing machines
emulating human intelligence in some context, giving rise to the term “Turing Test” for
machines. The next big event was the 1956 Dartmouth Summer Workshop [2] organised
by John McCarthy, Marvin Minsky, Nathaniel Rochester and Claude Shannon when the
term ‘Artificial Intelligence’ was introduced as an interdisciplinary area of research. Two
key people who could not attend the event were Alan Turing, who had died in 1954, and
the computing visionary, John von Neumann who was already seriously ill. Even though
the workshop did not produce anything specific, it gave the participants the motivation
to approach AI from their different perspectives over the following decades.
What Is AI?
John McCarthy defined [3] AI as, “the science and engineering of making intelligent
machines” and defined “intelligence” as “the computational part of the ability to achieve
goals in the world.” Marvin Minsky offered [4] a similar definition, “the science of making
machines do things that would require intelligence if done by men”. The Encyclopedia
Britannica currently defines AI as, “the ability of a digital computer or computer-controlled
robot to perform tasks commonly associated with intelligent beings”.
These definitions all share a common theme in that they refer to performing tasks
that would normally require or be associated with human intelligence. There is clearly a
paradox in this definition. Pamela McCorduck [5] describes this: “Practical AI successes,
computational programs that actually achieved intelligent behavior, were soon assimi-
lated into whatever application domain they were found to be useful, and became silent
partners alongside other problem-solving approaches, which left AI researchers to deal
only with the “failures,” the tough nuts that couldn’t yet be cracked.” There is actually a
name for this! It is called the “AI Effect” and summarised in Larry Tesler’s theorem [6],
“Intelligence is whatever machines haven’t done yet”. This is because society wants
to associate intelligence only with humans and does not want to admit that human tasks
can indeed be performed by machines!
REFERENCES
1. A.M. Turing, “Computing machinery and intelligence,” Mind, New Series, 59(236), pp. 433–460
(October 1950).
2. “Dartmouth summer research project on Artificial Intelligence,” http://jmc.stanford.edu/
articles/dartmouth/dartmouth.pdf.
3. J. McCarthy, “What is AI”, http://jmc.stanford.edu/articles/whatisai/whatisai.pdf.
4. M. Minsky, Semantic Information Processing, (Cambridge: MIT Press, 2003).
5. P. McCorduck, Machines Who Think, (London, UK: Routledge, 2004).
6. D. Hofstadter, quoted Larry Tesler differently as “AI is whatever hasn’t been done yet” in
“Gödel, Escher, Bach: An eternal golden braid,” (1980).
The first AI Winter occurred between 1974 and 1980 following the publication of a
report by Sir James Lighthill [7] criticising the failure of AI research to meet its objectives
and challenging the ability of many algorithms to work on real problems. Funding for AI
research was cut across UK universities and across the world.
In fact, AI’s fortunes seemed to be on the up again in the mid-1980s when investment
banks found that neural networks and genetic algorithms seemed to be able to predict
stock prices better than humans. A stampede of activity took place as banks competed to
get the upper hand with better more sophisticated automated trading algorithms. What
could possibly go wrong? In the rush to get rich, IT architects failed to acknowledge the
critical weakness of neural nets: they need historical precedent in their training data to make predictions, and they can be unpredictable when applied to previously unseen situations.
The stock price boom, made possible by AI, was (you guessed it) unprecedented. By this point, everybody trusted the algorithms: when they said “Sell!”, the bank sold; when they said “Buy!”, it bought, no questions asked. The global financial crash of 1987, aka “Black Monday”, was enabled by a chain reaction of AI trading algorithms going off the rails [8].
Unsurprisingly, this was a major factor in triggering the Second AI Winter (1987–1993)
with increasing disillusionment in expert systems and the collapse of funding in special-
ist AI hardware. On either side of these major AI Winters, there were multiple periods of less
significant disillusionment when AI failed to meet over-hyped expectations.
Today, AI technologies have mastered an impressive range of tasks (speech recognition, image classification, machine translation, etc.) and can perform at accuracies comparable to or better than humans. There is no reason that businesses cannot leverage these advancements to solve the data deluge problem they face.
DELIVERING AI SOLUTIONS
As it stands today, there is way too much focus on algorithms, and not enough atten-
tion paid to engineering real solutions. Millions of researchers in Academia and Industry
around the world with deep expertise in AI are working on algorithms and the science
behind them. There are far fewer engineers who have specialised in how to define and build
complete reliable end-to-end solutions. To adopt an automotive analogy, we have a huge
investment in the science of fuels and brilliant researchers coming up with more and more
efficient fuel compounds. However, we lack the overall systems engineering skills to build
useful and enjoyable automobiles. While algorithms are critical to advancing AI, we also
need to embrace other disciplines to build complete solutions. We need Data Scientists,
Business Analysts, Ethicists, Performance Engineers, Systems Integrators, Test Specialists
and User Experience Designers, all working towards a common goal under an overall
engineering method. Such a framework does not exist yet.
Everyone involved in a project needs to understand how AI really works and how to work with it, not just a small group of brilliant scientists. This
is different from the past, when users did not really need to understand how the technol-
ogy behind the application worked. As a society, a huge cross section of people will need to
understand AI in a way that they haven’t had to previously. With that broad audience in mind, this book takes the following approach:
• We will explain the core concepts of AI using real practical examples, with a focus on
the systems engineering and the business change aspects of building solutions.
• For the technically inclined, there are many technical deep dives scattered through-
out the book. These can be skipped over by the less technical reader.
• For the philosophers, we’ve included the odd thought experiment designed to chal-
lenge your brain and destroy your weekends.
• Finally, there are case studies of real AI systems and projects that will hopefully help
you understand what works … and what doesn’t.
Chapter 2 gives a broad overview of examples of AI applications and discusses key aspects of
building and sustaining them. Chapter 3 introduces various approaches to algorithms and
their applicability in practice. Chapter 4 helps you to select the right AI project and avoid
the bad ones. Chapters 5 and 6 aim to define and measure the business value of AI. Chapters 7
and 8 address the doability aspect of projects with detailed discussion of the data usage and
specific challenges in real projects. Chapter 9 helps to evaluate your business idea in terms
of specific questions related to value and doability and to take action accordingly. Chapter 10
deals with the not-so-boring engineering aspects of creating and maintaining AI systems.
Chapter 11 is a view of the future of AI in business to get you prepared for what is coming.
So, let’s get started on this exciting journey and make AI real.
REFERENCES
1. “Robert Watson-Watt and the triumph of radar,” https://blog.sciencemuseum.org.uk/
robert-watson-watt-and-the-triumph-of-radar/.
2. 2021 AI index report: https://aiindex.stanford.edu/wp-content/uploads/2021/11/2021-AI-
Index-Report_Master.pdf
3. KPMG 2019 Report: “AI transforming the enterprise”.
4. O’Reilly 2019 Report: “AI adoption in the enterprise”.
5. Databricks 2018 Report: “Enterprise AI adoption”.
6. MIT Sloan-BCG Research Report: “Winning with AI”.
7. J. Lighthill (1973): “Artificial intelligence: a general survey,” Artificial Intelligence: a Paper
Symposium, Science Research Council, London.
8. F. Norris, “A computer lesson still unlearned,” The New York Times, (October 18, 2012):
https://www.nytimes.com/2012/10/19/business/a-computer-lesson-from-1987-still-
unlearned-by-wall-street.html.
9. G. Marcus and E. Davis, Rebooting AI, Pantheon Books, New York (2019).
10. N.C. Thompson, et al., “Deep learning’s diminishing returns,” IEEE Spectrum, 58(10),
pp. 50–55 (October 2021).
11. C.Q. Choi, “7 revealing ways AIs fail,” IEEE Spectrum, 58(10), pp. 42–47 (October 2021).
Chapter 2
Building Applications
No way! Why would we pay for a translator when we can use a free online service?
The Requirements Document was complex and, to make matters worse, it was written
in Arabic which no one on the team spoke. We had just two weeks to respond and it was
a must win deal. Our manager point blank refused to pay for a human translator, so we
resorted to the online translation tool of a well-known web search company.
I watched as one of the team copied the first requirement from the “Administration
and Management” section and pasted it into the translation tool. The translated
question brought an instant smile to my face! It said simply, “what is the point of
management?”
James Luke
In many respects, Artificial Intelligence (AI) applications are no different from any other
type of complex systems. But in some respects, they’re very different. Later in this book,
we’ll highlight the similarities and differences in the system development activities of AI
and typical information technology (IT) systems. In this chapter, we’ll explain what makes
AI applications very different from traditional business applications. Then we’ll review
some of the most famous applications in the history of AI and discuss what we can learn
from them. We will discuss some existing examples of business applications that incorpo-
rate AI and the underlying architectural considerations that you may want to think about
in building your AI application.
In traditional IT applications, the functional behaviour is defined by the code, and data is simply input to the program. In AI applications, data is encapsulated together with the code in a way
that actually determines the functional behaviour.
One aspect of data that needs to be considered upfront is the relationship between the
training data and the production data. Almost always, we see the need for extensive data
cleansing (see Chapter 7) in order to create effective training and test data. Remember that
any data cleansing needed in producing training and test data will also be required, con-
sistently each and every day, in the production environment. If a crucial stage of the data
cleansing for training data involves a human manually and laboriously correcting some
fields, then that person is going to be needed in the production environment and they will
need to move a lot faster if you are planning on building a sub-second process! Seriously, if
you have to make lots of manual interventions at the training data stage, you are going to
need to find a way to automate that data cleansing routine before you can go live with an
operational application.
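To make the point concrete, here is a minimal Python sketch (using pandas; the fields, the toy CSV and the trivially simple stand-in “model” are all invented for illustration) of the discipline being described: one cleansing routine, written once and run automatically in both training and production:

```python
import io
import pandas as pd

def cleanse(df: pd.DataFrame) -> pd.DataFrame:
    """One shared cleansing routine: every rule used to prepare the
    training data must also run, automated, in production."""
    df = df.copy()
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # repair bad numerics
    df["country"] = df["country"].str.strip().str.upper()        # normalise codes
    return df.dropna(subset=["amount"])                          # drop unusable rows

# Training time: cleanse once, offline (toy CSV stands in for real data).
raw = io.StringIO("amount,country\n100,uk \nbad,US\n250, fr\n")
train = cleanse(pd.read_csv(raw))
threshold = train["amount"].mean()   # a trivially simple stand-in "model"

# Production time: the SAME function runs on every incoming record,
# automatically - no human correcting fields in a sub-second pipeline.
def score(raw_record: dict) -> bool:
    clean = cleanse(pd.DataFrame([raw_record]))
    return bool((clean["amount"] > threshold).iloc[0])

print(score({"amount": "300", "country": " de"}))  # True
```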
Beyond the data, there are many other critical considerations that must be understood.
While these next considerations may not apply to all applications, many will apply in most
cases.
Making Mistakes
As AI applications are designed to perform tasks normally requiring humans, by defi-
nition, there are distinctly human aspects to their functionality, such as making judgement
calls in decisions. You will not only need to accept that they will not always make the right
decision but will also need to accept that it may not be possible to determine what the right
decision is. This challenge will be compounded if you choose to use AI technologies that
are not able to explain their reasoning.
Accepting that AI applications will make mistakes is not easy for human beings to do!
Consider, for example, the use of AI in enabling driverless vehicles. At the time of writing,
just a small number of fatalities were reported in accidents involving driverless vehicles,
but each of them evoked a huge media reaction. Yet in 2019, nearly 40,000 people were
killed by cars in the US alone without any significant media reaction. It seems that the
public is far more accepting of human error than machine error.
Ability to Generalise
In Machine Learning systems, we deliberately develop systems that achieve a percentage
accuracy against test data. There is a continuous contention within the Machine Learning
community regarding accuracy versus generalisation. Our aim in Machine Learning is to
create a system that can generalise in the same way humans do. Continuing with our driv-
ing theme, in the UK, people are taught to drive right-hand-drive vehicles on the left-hand side of the road. They learn a particular set of road signs and road layouts. Once qualified, they are able to take their driving license to Europe or the US and drive left-hand-drive vehicles reasonably safely on the opposite side of the road with different types of signs,
intersections and road layouts.
Our ability to generalise is extremely powerful and is perhaps a true test of human intel-
ligence. It is important because it allows the system, either human or machine, to operate
in an unfamiliar environment. Generalisation enables a pilot to adapt to manage a serious
technical fault or a fire chief to develop a plan for a disaster that has never been previously
encountered. Generalisation enables a judge to make a decision on a case that represents a
serious legal and ethical dilemma. Generalisation is critical in delivering the ability to deal
with a previously unseen scenario.
So, in developing Machine Learning systems, we are constantly trying to understand
whether the system is actually generalising or whether it is simply memorising all the
training cases; the latter is often referred to as either “over-fitting” or “learning the deck”.
If the system has “learnt the deck”, then it will not work well in previously unseen situ-
ations. One of the great indicators that the system has learnt the deck is a near-perfect
performance against training and test data. For this reason, Machine Learning algorithms
are designed and tuned not to achieve perfect performance against training and test data.
In other words, we deliberately design systems that fail some test cases, and we therefore deploy into operational use systems that are expected to make mistakes.
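As an illustration, here is a small Python sketch (assuming scikit-learn and NumPy are available; the data is synthetic) of the tell-tale signature of “learning the deck”: an unconstrained model scoring near-perfectly on its training data while doing far worse on held-out data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + rng.normal(scale=2.0, size=500) > 0).astype(int)  # noisy labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier()   # unconstrained: free to memorise the deck
model.fit(X_tr, y_tr)

train_acc = accuracy_score(y_tr, model.predict(X_tr))  # ~1.00 on training data
test_acc = accuracy_score(y_te, model.predict(X_te))   # far lower on unseen data
print(f"train={train_acc:.2f} test={test_acc:.2f} gap={train_acc - test_acc:.2f}")
```

A large gap between the two scores is the warning sign: the model has memorised its training cases rather than learnt something that generalises.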
AI Ethics
When dealing with such judgement calls, especially those involving human life, ethical
and trust issues become paramount. We are accustomed to seeing ethical debates in the
media about AI applications. Often these debates focus on whether it is ethical to develop
a particular application. From an architectural and development perspective, the ethical
considerations run deeper. Given the ambiguity inherent in AI systems, resulting from the
accuracy considerations and the fact that many decisions will be judgement calls, how do
we ensure that an AI application is competent to perform its task? Note the word compe-
tent is being used in place of the more scientific “fit for purpose” engineering term. “Fit for
purpose” implies a formal test process as applied in conventional engineering. With AI
applications, we have already explained why such testing is not effective. Instead, we need
to think about “competence” in the same way that we would refer to the competence of a
professional person.
AI Accountability
Let’s consider a common question asked about AI applications … who is responsible when
it goes wrong? If a driverless vehicle makes a mistake that results in a human death, who
is responsible? Can the AI be deemed responsible … and perhaps in a future of sentient
AI droids would it be deactivated? Is the owner of the AI responsible or, more likely, is it
the developer? To a certain extent, this is not a new problem, and society has pretty much
figured this stuff out. If an aircraft crashes, there is an investigation that looks at all aspects
of the process, from the training of the pilots to the design of the aircraft to the standards
mandated in the industry and beyond. We suspect it will be the same with AI! Given that
it may not be possible to exhaustively test all aspects of an AI application, there will be an
ethical responsibility on the developer to ensure best practice was adopted in the develop-
ment process. For example, was the right amount of training data used and was the level of
testing reasonable for the complexity and risk associated with the application? The devel-
oper may even need to preserve the actual training data so that retrospective investigations
can be run. For example, “did the driverless vehicle training data include sufficient examples of dogs running across the road, or of snow-covered lane markers?”
Determinism in AI
In considering best practice for AI, as opposed to conventional software engineering, the
subject of determinism is particularly interesting. Determinism is considered an essential
element of conventional software engineering and ensures that a system always functions
in exactly the same way for a given set of input data. Quite simply, if you put the same
input data into the application, you should get the same output! In AI systems, it may
not be quite that simple. For example, the AI model generated by some machine learning
algorithms depends on the order in which the training data is presented to the algorithm.
If you change the order, then you get a different model and data that was previously classi-
fied correctly may now be classified incorrectly and vice versa. As these algorithms require
considerable computational power and take a long time to run, we use parallel processing
to speed up the training process. Depending on the memory and number of processing
cores available, the order in which the data is presented may change and, therefore, subtly
different models may be generated for the same data.
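To see this in miniature, the following Python sketch (using scikit-learn’s SGDClassifier as a stand-in for any sequentially trained model; the data is invented) trains on the same examples in two different orders and compares the resulting models:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

def train_in_order(order):
    model = SGDClassifier(random_state=0)  # identical settings each time
    for i in order:                        # present one example at a time
        model.partial_fit(X[i:i+1], y[i:i+1], classes=[0, 1])
    return model

m1 = train_in_order(np.arange(200))        # original order
m2 = train_in_order(np.arange(200)[::-1])  # reversed order

# The learned weights differ, so borderline cases may flip class.
print(np.abs(m1.coef_ - m2.coef_).max())   # nonzero weight difference
print((m1.predict(X) != m2.predict(X)).sum(), "predictions disagree")
```

The same data, the same algorithm and the same settings, yet two subtly different models: exactly the non-determinism that parallel training hardware can introduce.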
Feedback Loops

Some AI applications directly shape the world they observe. Consider automated trading systems that directly impact the environment in which they operate. AI applica-
tions buy or sell shares that cause changes in the share price that cause the AI to buy or
sell shares. It should not be a surprise therefore that we experience unexpected feedback
loops and maelstroms when these scenarios cascade out of control. Whilst these risks exist
in any automated system, with or without AI, the use of AI means they are harder to test
and prevent.
From the discussion above, can we all agree that AI Applications are sufficiently differ-
ent from conventional applications? Good … then let’s continue our journey by looking at
the spirits of AI applications: past, present and future.
Board Games
This shouldn’t really be a surprise! Game playing naturally provides a rich environment in
which to develop and test AI capabilities. The reasons for this are simple. First, the scope
of the problem is naturally limited to the domain of the game, with clear rules on what is
allowed and what constitutes a win. Second, assessing performance is easy, since the games
fit the human paradigm of winners and losers. Third, it is easy to explain what is actually
accomplished by the machine to the population at large. Finally, game playing is extremely
low risk in comparison with the real world. All these factors make playing board games a
very attractive proposition for AI researchers.
We just want to highlight the key technology advancement that resulted from these
gaming applications. Arthur Samuel [1] introduced the phrase “Machine Learning” in the
1950s to explain how his Checkers-playing program learnt the game. It took a while, until the early 1990s, when Gerald Tesauro created TD-Gammon [2], which learnt by playing
against itself (using reinforcement learning) to reach championship level in Backgammon.
Then came Deep Blue [3] from IBM in 1997, which captured the imagination of both AI
researchers and Chess fans by beating world chess champion Garry Kasparov in a regu-
lation match. This system used both custom hardware and AI learning from historical
games and player styles. In 2016, the AlphaGo system [4] from Google DeepMind, which uses deep neural networks (DNNs), convincingly beat the world champion Lee Sedol in the Chinese board game
Go. The current version of this system, called AlphaGo Zero, is even better, with no need for human input beyond the game rules. The low risk in game playing gives those developing
AI for games a massive advantage over those developing real-world AI at a time when the virtual and real worlds are converging. As simulation games become more and more
realistic, we should not be surprised to see AI developed initially in games being applied
in the real world.
BOARD GAMES
AI playing board games with humans has been an exciting aspect of demonstrating the
sophistication of the AI to the community at large. Here is a short summary of some
famous AI projects playing board games in the last seven decades.
• AlphaGo (2016)
AlphaGo, from Google DeepMind, evaluated board positions and selected moves using deep neural networks (DNNs). These neural
networks were trained by supervised learning from human expert moves and by
reinforcement learning from playing against itself. After the success of AlphaGo,
there is a more recent version of the program called AlphaGo Zero, which was developed with NO human input beyond the game rules. AlphaGo Zero won 100–0, playing against the AlphaGo program, the previous winner.
REFERENCES
1. A. L. Samuel, “Some studies in machine learning using the game of checkers,” IBM Journal of
Research and Development, pp. 211–229 (1959).
2. G. Tesauro, “Temporal difference learning and TD-gammon,” Communications of the ACM,
38, pp. 58–68 (1995).
3. M. Campbell, “Knowledge discovery in deep blue,” Communications of the ACM, 42, pp.
65–67 (1999); https://www.youtube.com/watch?v=KF6sLCeBj0s.
4. D. Silver et al., “Mastering the game of go without human knowledge,” Nature, 550, pp.
354–359 (2017).
In its celebrated 2019 debate against the champion debater Harish Natarajan, Project Debater lost the audience vote but, on the day, it won the points for enriching the knowledge of the audience by a large mar-
gin. Project Debater’s knowledge base consisted of around 10 billion sentences,
taken from newspapers, journals and archival sources. The system had to iden-
tify the contents relevant for the topic of the debate, both pro and con, and pull
together a coherent argument supporting its position using precise language in a
clear and persuasive way. A good rebuttal is always the hardest part of a debate.
Project Debater applies many techniques to anticipate and identify the opponent’s
arguments and then responds with claims and evidence that counter these argu-
ments. The major accomplishment of Project Debater was the compilation of dif-
ferent ideas from its corpus on both sides of the resolution and creating a coherent
presentation in real time. This clearly demonstrated the possibilities of human and
machine working together to perform better than either of them.
REFERENCES
1. J. Weizenbaum, “ELIZA: a computer program for the study of natural language communica-
tion between man and machine,” Communications of the ACM, 9, pp. 36–45 (1966).
2. T. Winograd, Understanding Natural Language, Academic Press (1972).
3. D. A. Ferrucci, “This is Watson,” IBM Journal of Research and Development, 56 (2012) and
the rest of the Issue.
4. N. Slonim, et al. “An autonomous debating system,” Nature, 591, pp. 379–384 (2021).
Expert Systems
From the 1970s to the early 1990s, building systems to emulate human knowledge was a
big endeavour in various domains. This required definition of domain specific rules by
humans who are experts in the domain, hence the name, Expert Systems. MYCIN [10]
was meant to be an assistant to a physician to diagnose and treat patients with bacterial
infections. DENDRAL project [11] dealt with the problem of determining the molecular
structure from the results of mass spectrometry experiments. AARON: The Robotic Artist
[12] was created to emulate a painter who produced physical, representational artwork.
These and many other applications of that time suffered from the same basic problem: the creation of the systems needed experts and vast amounts of domain knowledge, and they could not scale to other domains. The manual effort needed to create them, and the practical difficulty of maintaining them as knowledge evolved in any domain, were the other reasons the expert systems approach went out of favour as the primary method of building AI systems. Fundamentally, these systems tried to do things decades ago that would be of significant value to society. Clearly, medicine is an area of huge potential value; how-
ever, it is also a domain where the challenges of successful safe applications are massive.
Creation of “Deep Fake” images of today may be the current incarnation of what AARON
tried to do all those years ago.
EXPERT SYSTEMS
In our zest to make AI systems emulate humans, considerable effort over at least three
decades was devoted to transferring human knowledge to machines. Here are three
famous examples of these ‘Expert Systems’.
• MYCIN (1972–1979)
MYCIN was a rule-based expert system [1] developed at Stanford University,
designed to assist physicians in the diagnosis and treatment for patients with bac-
terial infections. In addition to the consultation aspects, MYCIN also contained
an explanation system which answered simple questions to justify its advice or
educate the user. The system’s knowledge was encoded in the form of about
350 rules that captured the clinical decision criteria of infectious disease experts.
The architecture of MYCIN allowed the system to dissect its own reasoning and
facilitated easy modification of the knowledge base. Even though the system was
competitive when compared to human experts, it was never used in real clinical
settings.
• The DENDRAL Project (1965–1993)
DENDRAL was a collection of programs [2] developed at Stanford University
that addressed the important problem in organic chemistry of determining the
organisation of the chemical atoms in specific molecules. In order to do this, they
collected rules that chemists use to describe the results of mass spectrometry.
They later added specialised postprocessing knowledge and procedures to deal
with data beyond mass spectra. The program was assessed by experts by testing
the predictions on structures not in the training set. It succeeded in rediscovering
known rules of spectrometry that had already been published, as well as discov-
ering new rules.
• AARON: The Robotic Artist (1975–1995)
Harold Cohen at the University of California, San Diego, created AARON, a
robotic action painter [3] that could produce physical, representational artwork. It
used a model for synthesising a set of objects (e.g. people and plants) and chose
its subjects, composition and palette entirely autonomously. While the paintings
have some elements of randomness, the style and aesthetics of the paintings were
designed and programmed by humans.
REFERENCES
1. W. Van Melle, “MYCIN: a knowledge-based consultation program for infectious disease
diagnosis,” International Journal of Man-Machine Studies, 10, pp. 313–322 (1978).
2. R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum and J. Lederberg, Applications of Artificial
Intelligence for Organic Chemistry: The DENDRAL Project McGraw-Hill, New York, (1980).
3. C. Garcia, “Harold Cohen and AARON—A 40-year collaboration,” (2016). https://computerhistory.org/blog/harold-cohen-and-aaron-a-40-year-collaboration/.
Autonomy
It is hard to believe that the first instance of a physical robot that could react to its environment without explicit human guidance was demonstrated in 1949 by William Grey Walter
[13]. Over many decades since then, the field of robotics has evolved significantly. You
can visit the vignette on “Advances in Robotics” in this chapter for a summary of the
uses of robots in practice today. The quest for autonomous vehicles had also gone on for decades before the DARPA Grand Challenge of 2004 [14], which posed a challenging mili-
tary scenario. Even though no one really won the competition, the event brought various
stakeholders to a common goal that has since proven instrumental in the evolution of the
driverless vehicle technology of today.
AUTONOMY
The human desire for AI to have a physical form and be independent of human control has persisted for many decades. Here are two projects that capture the essence of autonomy across the decades. Please visit our vignette on ‘Robotics’ in this chapter
for various uses of robots today.
REFERENCES
1. W. G. Walter, “An imitation of life,” Scientific American, 182, pp. 42–45 (1950).
2. R. Behringer, “The DARPA grand challenge – autonomous ground vehicles in the desert,”
IFAC Proceedings, 37, pp. 904–909 (2004).
ADVANCES IN ROBOTICS
In 1917, the Czech author Joseph Capek introduced the term ‘automat’ to refer to an arti-
ficial worker in one of his short stories. In 1921, his brother Karel Capek wrote the play
“RUR” (Rossum’s Universal Robots) [1], which was derived from the Czech word ‘robota’
for ‘forced labor’. Thus, the Capek brothers introduced ‘robots’ to popular language.
Isaac Asimov captured the imagination of so many of us through short stories in science
fiction [2] and with his three laws for robots:
1. A robot may not injure a human being or, through inaction, allow a human to come to harm.
2. A robot must obey orders given to it by humans except when doing so conflicts
with the first law.
3. A robot must protect its own existence as long as this does not conflict with the first
or second law.
In 1958, General Motors brought robots to reality when it introduced the Unimate [3] to
assist in automobile production. With its use on the assembly line in 1961, the application
of robotics in industry began. Over decades, there has been widespread use of robots for
various purposes. In all cases, robotics has a small number of common goals.
To perform any complex function, robots need appropriate sensors (visual, infrared, etc.)
depending on the intended purpose. Mobility is another consideration. Fixed robots
have a clear reference coordinate system for their operation. Mobile robots, however, are
expected to move around and perform tasks in a more open environment that may not
be precisely known and hence need to depend on their sensors to compute their loca-
tion and orientation. Due to the significantly different mechanisms for navigation, robot
designs for the three operating environments, i.e. aquatic, terrestrial and aerial, are very
different. The table below shows some examples of popular uses of robots, grouped into
two broad categories [4]: (i) Industrial Robots that work in well-defined environments (ii)
Service Robots that support humans in their tasks in various domains.
REFERENCES
1. K. Capek, R.U.R. (Rossum’s Universal Robots), Penguin Group, New York, (2004).
2. I. Asimov, “Runaround,” In: Astounding Science Fiction, Street & Smith, New York (March
1942).
3. http://www.robothalloffame.org/inductees/03inductees/unimate.html.
4. M. Ben-Ari and F. Mondada, Elements of Robotics, Springer, Cham (2018). https://doi.org/10.1007/978-3-319-62533-1_1.
AI OR NO AI?
Having looked at some of the most famous historical applications of AI, let’s now consider
where AI is being applied today and how it’s impacting our lives. The first point to note is
that the use of the term AI in a product name doesn’t really mean the product uses AI. If
we are to believe the media and advertising, every single product from a vacuum cleaner to
a clothes peg is using AI and on the verge of demonstrating sentience.
Whilst we should be cautious about believing that AI is ubiquitous, there is an ele-
ment of emergent AI behaviour that is creeping into society as a result of the Internet
of Things. As we fill our lives and daily activities with more and more devices, we are
generating more and more data that can both enable and be exploited by AI. Within this
environment, even the simplest of algorithms can start to generate an appearance of
intelligence and even sentience. For example, after commuting to work, you may get off
a train and return to your car parked in the station car park. As you climb into your car,
your smart phone flashes up a message saying, “7 minutes drive to home with light traffic
on the High street”.
The algorithms that analyse your movements to understand the location of your home,
where you parked your car and the fact that you are likely to drive home (because you do so
every day at this time) are really quite simple. These algorithms are enabled by the mobile
device you carry that tracks your every movement. Traffic sensors allow us to determine
in real time the density of traffic and route planners allow the rapid calculation of opti-
mum routes. Bringing all of these capabilities together creates a really simple but effective
and useful application with the appearance of some form of intelligence. Purists would of
course argue that this is not AI because the capability is being delivered by a machine and
does not, therefore, require human intelligence. This is a classic example of what is known
as the AI effect: a problem is only considered an AI problem until it has been solved, at which point it is no longer considered to be AI.
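To underline how simple such “intelligent” behaviour can be, here is a toy Python sketch (the location pings are invented, and the routing call at the end is hypothetical) of the kind of frequency count that infers where “home” is:

```python
from collections import Counter

# (hour_of_day, rounded_lat, rounded_lon) pings from a phone - invented data.
pings = [
    (2, 51.28, 1.08), (3, 51.28, 1.08), (23, 51.28, 1.08),  # nights at one spot
    (9, 51.50, -0.12), (14, 51.50, -0.12),                  # days at the office
    (1, 51.28, 1.08), (4, 51.28, 1.08),
]

# "Home" is simply where the phone most often sits overnight.
overnight = [(lat, lon) for hour, lat, lon in pings if hour < 6 or hour > 22]
home = Counter(overnight).most_common(1)[0][0]
print("inferred home:", home)   # (51.28, 1.08)

# A routing service (hypothetical call) would then turn this into the
# familiar "7 minutes drive to home with light traffic" notification.
```

There is no learning here at all, just counting, yet combined with live traffic data and a route planner it produces behaviour that feels uncannily intelligent.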
Let’s take this a step further. In addition to this simple tracking application, a com-
pletely different developer may put together a completely separate application. This second
application analyzes your shopping patterns and fuses this data with a sensor in your bin
to determine that you’re running low on milk. You only bought two pints at the weekend
and one of the empty cartons was thrown away 2 days ago. As you have to drive home past
a garage and you need petrol, the application reminds you to buy milk at the same time.
Again, it’s a really simple algorithm, enabled by location services together with some new
sensors and the Internet of Things; nothing intelligent about that application so no AI.
The point is, however, that if we keep adding sensors and devices and very simple algo-
rithms, then more and more of these simple but cool functions are going to emerge and
the Internet of Things is going to appear to be more and more intelligent … even though
the underlying algorithms are really quite simple and not in any way considered, at least
by purists, as real AI.
As consumers, we hand the big web companies our usage data, and sometimes personal data, without much thought.
based on advertising or referral fees) rely on their ability to monetise the vast amounts
of collected data and AI is the only viable option for them. So, it is no coincidence that
these companies are investing heavily in AI skills and infrastructure. In addition, due to
the nature of these applications and their licensing terms, these companies have complete
control of their contents, underlying data and deployment infrastructure. Intrinsically,
the data collected across the global population is considered by them to be public domain
knowledge. The web companies are then able to license the use of this acquired public
domain knowledge through various services to enterprise customers. Governmental poli-
cies and regulations have been slow to catch up on these practices due to their relative
novelty, this may well change as the public become more conscious of their digital privacy.
However, when it comes to AI, we need to remember that many of the most successful
applications are actually quite basic. Consider, for example, simple question answering.
One very successful technique in simple question answering is to correlate the co-existence
of terms in a large text corpus. They don’t come much larger than the World Wide Web.
Let’s go back to the use case we touched on briefly in Chapter 1; a question answering appli-
cation being asked the question, “when was Neil Armstrong born?”
If you take the billions (trillions?) of pages of content on the web, there are going to be a
very large number of documents about the first man on the moon. A significant number of
these documents will include phrases such as:
• “Neil Armstrong (born 5th August 1930) was the first man to walk on the moon”,
• “The first man to walk on the moon was Neil Armstrong who was born in the US on
the 5th August 1930”,
• “Born on the 5th August 1930, Neil Armstrong trained as a pilot …”.
The sheer volume of content on the web means that there will be hundreds, if not thou-
sands, of pages that include statements similar to those above. By writing a relatively simple algorithm that correlates dates with names and the word “born”, it is straightforward to identify dates of birth for famous people. By generalising the algorithm to look for any
verb, the correlation algorithm could identify when famous people died, graduated, mar-
ried, divorced and a host of other life events.
This simple correlation algorithm works for two reasons. Firstly, the question we’re ask-
ing is simple! We’re asking for a single fact, which is a lot simpler than asking a com-
puter to explain the causes of the Civil War or the reasons behind the economic crash of
2009. Secondly, there is a massive amount of redundancy in web data (especially about
common knowledge and famous people). Neil Armstrong’s date of birth exists in thou-
sands of places expressed in many different ways. That’s why the correlation algorithm
works!
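To make this concrete, here is a minimal sketch of such a correlation algorithm in Python. It is our illustration, not any search engine’s implementation; the snippet list is a hypothetical stand-in for pages retrieved from the web.

import re
from collections import Counter

# Hypothetical snippets standing in for pages retrieved from the web.
SNIPPETS = [
    "Neil Armstrong (born 5th August 1930) was the first man to walk on the moon",
    "The first man to walk on the moon was Neil Armstrong who was born on the 5th August 1930",
    "Born on the 5th August 1930, Neil Armstrong trained as a pilot",
    "Neil Armstrong was born sometime in 1931, I think",  # a noisy page
]

DATE = re.compile(r"\b(\d{1,2}(?:st|nd|rd|th)?\s+\w+\s+\d{4}|\d{4})\b")

def birth_date(name, snippets):
    votes = Counter()
    for text in snippets:
        # Only count snippets where the name and the verb "born" co-occur.
        if name.lower() in text.lower() and "born" in text.lower():
            votes.update(DATE.findall(text))
    # Redundancy does the work: the most frequent co-occurring date wins.
    date, _ = votes.most_common(1)[0]
    return date

print(birth_date("Neil Armstrong", SNIPPETS))  # -> 5th August 1930

The noisy page is simply outvoted; that is the redundancy argument in miniature.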
Correlation algorithms are far less successful when applied to more sparse data using
more complex language. For example, in a police intelligence database, there could be a
single sentence stating, “John Edgar cannot be the brother of Steve Edgar because he was
born on 12th July 1978, which was 2 years before the latter’s parents first met”. If such a
sentence is the only sentence in the entire database that mentions the Edgar brothers (or
non-brothers), there is no way a correlation algorithm can work!
In addition, the web companies are able to leverage vast human resources to provide
feedback on their AI decisions. A web company wanting to develop an image classifier can
leverage their vast user base to score the performance of the systems to rapidly develop a
huge set of training data.
In developing these amazing AI web services, the web companies benefit to a large
extent in the same way as those early game playing researchers. The risk and cost of an
error is relatively low … perhaps even non-existent! That’s possibly a bit unfair as there
is clearly a massive commercial return on getting things right … there must therefore be
a loss if you get things wrong. However, the failure to generate the optimum amount of
advertising revenue does not equate to the cost of making a mistake in other domains such
as medicine or defence or arresting the wrong Edgar.
In their core business, the web companies are using a very specific set of techniques to
answer simple problems using massive information redundancy with feedback from huge
numbers of users. That is very different, almost opposite in fact, to the challenge faced in
many enterprises. As we grow beyond these commodity web applications and move into
the enterprise, we need to understand that the risks, consequences and technical chal-
lenges are far greater, and the tools and approach taken by a web company may not always
work out of the box for your business problem!
Let us summarise some of the prominent consumer-facing AI applications heavily influ-
enced by the web companies. This helps to compare their functions and usage with typical
enterprise applications to run a business.
As you will see below, these are very different attributes from those of an enterprise AI
application that is supporting a business function in a specific domain (e.g. retail, banking,
insurance, manufacturing, etc.) with a direct connection to the business impact.
In a typical enterprise, the business processes and the supporting resources already exist, and therefore, any investment in an AI has to be thoroughly evaluated for business value in terms of quality and efficiency. Since customer satisfaction is paramount, there is a clear need for correctness and consistency in their applications. The consequences to the enterprise of following the recommendation from an AI system can be severe (e.g. processing bank loans, identifying potential criminals, recommending medical treatments, etc.) compared to getting movie or
purchase recommendations on the internet or social media. In addition, there may be
government regulations and auditability requirements on their business practices (e.g.
Sarbanes–Oxley Act in banking, HIPAA in healthcare, GDPR in EU, etc.). The applica-
tion requirements can be unique and industry and/or organisation specific, requiring a
good understanding of the various data sources, not controlled by common data models
or governance. Due to concerns about competitive advantage or privacy regulations,
these enterprises may not want to share their data with third party vendors to build AI
models for their purpose. All these considerations point to much higher complexity in
building enterprise applications compared to the popular applications from the web
technology companies.
In most businesses, we are trying to answer very complex (and mission critical) ques-
tions, requiring the fusion of complex data from multiple sources, using multiple AI
components with much less feedback from users. It’s not just the questions that are more
complex! It’s also the fact that it’s harder to know when the answer is right or wrong. If
an online search engine answers a question about the Beatles incorrectly, it will be easily
recognisable as incorrect and, if we’re honest, it doesn’t really matter. In the enterprise, it
does matter and it’s harder to know when the system is wrong.
An Insurance Example
Consider, for example, an insurance company that wants to automate the analysis of
medical records in order to assess life insurance risk. One such solution, developed in
2012, required three different AI components integrated into an intelligent business
process. The first stage used Optical Character Recognition (OCR) to transform hard
copy documents into machine-readable text. The text was then processed by an entity
extraction tool that identified key medical events and terms such as “myocardial infarc-
tion” and “patient smokes more than 20 cigarettes a day”. Finally, a risk model used the
output of the entity extractor to make the final risk assessment for review by the medical underwriter.
This solution required the integration of these three different AI components. The
data was incredibly complex with records going back more than 50 years in some cases.
The historical nature of the data caused new challenges as medical knowledge continu-
ally changes. Fibromyalgia, a condition which is said to affect 3%–6% of the population, was only first defined in 1976. Prior to 1976, it was described under many headings
including fibrositis. Finally, feedback on accuracy came from a small number of medical
underwriters.
This is just one single example that demonstrates the challenge of applying AI in the
enterprise. However, in terms of complexity, this underwriting solution is still relatively
simple.
A Mismatch of Expectations
Over the next few years, we will be delivering AI solutions of increasing sophistication
and we’re going to need to develop new tools and methods to do so. Take a moment to
think about conventional engineering and how we build incredibly complicated machines.
Airliners, nuclear submarines, satellites and many other amazing feats of engineering are
designed and built by teams of engineers using sophisticated tools and processes.
Just consider for a moment the Computer-Aided Design (CAD) tools available to civil,
or nuclear, or aeronautical engineers. Then consider the sophistication of the processes for
selecting and managing components in large engineering projects.
Now, think about the size of a typical AI project and the limitations of the tooling avail-
able. Most AI projects are very small; in our experience, project teams can com-
prise only a small number of AI specialists supported by programmers to help with data
access, integration and such like. Whilst tooling is improving with the growth in data sci-
ence as a discipline, we still rely heavily on spreadsheets and simple scripting. Delivering
thinking machines has to be one of the greatest engineering challenges ever attempted, yet
we are many years behind conventional engineering in terms of our ambition for tackling
complexity and the tooling and processes we need to do so.
Before you drop this book and run away in fear … it’s not all bad news. We may have a
long way to go in developing our tooling and processes; however, we have already started.
There are things you can do to bring a professional engineering approach to your proj-
ect … so please read on!
• Business Relevance: Will the investment in the AI support your business and make more than it costs? “If you build it, they will come” applications or novelty AI built just for the sake of it can still have value for a web company: such an application often showcases the art of the possible, or is simply fun and will attract consumers to browse more – both valid business outcomes. It is not so easy for a typical business; corporate enterprises with a duty to shareholders must tie any investment to specific measurable business outcomes before they spend out on an IT project.
• Stakeholder Agreement: Will the key stakeholders see the AI as a good thing? A web
company and a traditional business or government department have very different
stakeholders with different perspectives of what is good for the business. Would you, as a customer stakeholder of Google, stop using it as a search engine if it occasionally returned poor search results? Would you feel the same if your bank’s AI occasionally gave you someone else’s financial data?
• Application Complexity: Can you make a profit from a simple AI or will you need a moonshot project to find success? There is a lot of hype around AI, especially around how easy it is. Luckily it is, for the most part, easy to use, but that doesn’t mean it will be a good fit to every business problem you’d dearly like to solve. AI fits some problems well and others really badly, and AI applications are always data dependent. This book will help you spot the viable and valuable opportunities. Businesses contemplating their first AI often underestimate the complexity that will be required to meet their “simple” requirement.
• Correctness & Consistency: Do you need to guarantee that the AI is giving you the best answer every time or is some ambiguity/error acceptable? You can either pick a business problem where users can easily recognise and skip erroneous results, such as a web search for a new golf club bringing back adverts for Volkswagen cars, or prepare to invest to chase down those edge cases that could ruin your plans.
• Decision Consequence: What will you do when (not if) the AI makes the wrong decision? Put simply, the stakes are often higher for an AI in a business setting than for a web demonstrator system that is not trying to solve a specific business problem but rather to indicate what might be possible.
FIGURE 2.1 Figure on the left describes a simple rubric of eight metrics to assess an AI project.
Each metric is rated from very low to very high. Figure on the right compares the metrics for an
application in web technology companies (in red) to a business solution for a typical enterprise (in
blue).
• Data Strategy & Governance: Can you guarantee a steady supply of source data? Will you need to archive it in case something goes wrong and you need to rebuild your system? Do you have a person/team ready to take on the care of the AI? Hint: AI projects need skilled maintenance throughout their lifecycle.
• Skills & Resources: Do you have enough AI skilled staff, both for the development
and the ongoing maintenance?
• Ethics & Regulations: Are you breaking any laws or will your AI cause a Public
Relations issue?
Figure 2.1 provides a radar chart showing these features and comparing a typical enter-
prise application with those of internet technology companies.
As you can see, enterprise and web AI applications are very different. The hard part is
understanding and accepting that point. Once it is accepted, then good management and
engineering practice can be applied to ensure the challenges of the enterprise are properly
addressed. The challenges are all manageable once the decision has been taken to manage
them.
FIGURE 2.2 AI application with one AI component. Shaded boxes are non-AI components for
dealing with inputs and outputs in general.
Business Results
Since November 2016, the AI service has been deployed to several million small businesses in the US, UK, Australia, Canada, India and France, involving billions of financial transactions each year. The AI service produces 56% fewer uncategorised transactions and 28% fewer errors, resulting in a substantial reduction in the manual work needed.
Defect Finder for Weld Nuts in a Manufacturing Company (Frontec Co, Ltd.)
Business Problem
The factory produces critical weld nuts for use by companies in the manufacturing industry [17]. Quality inspection of the weld nuts was manual, subject to individual judgement and employee state, such as fatigue. Since even an apparent defect resulted in expensive investigations, it became necessary to automate the inspection. The speed of quality inspection had to be less than 0.2 seconds per weld nut with at least 95% defect classification accuracy.
Business Results
The resulting embedded classification system was integrated with an existing vision
inspector environment. Since the deployment in November 2018, the expected monetary benefit is US$20,000 per month, a combination of saved labour costs and failure costs from the prior manually performed quality inspections.
Business Results
A proof-of-concept prototype was developed in 2017, and a refined version was rolled out
in 2019. The annual cost savings from the true positive contracts spotted so far are esti-
mated to be 7 million Singapore dollars.
The next set of application examples involves two AI components that can be connected
in different ways depending on the application architecture. The components can also rep-
resent different types of underlying algorithms, as will be evident below.
FIGURE 2.3 Two AI components in series. Shaded boxes are non-AI components.
Business Results
IBM Scenario Planning Advisor is currently deployed to support 70 IBM finance teams
and an external client. A careful user study shows that the tool can generate a model and
ten scenarios in about 11 hours compared to about 3800 person-hours for people to achieve
the same.
The Life Score Calculator takes each applicant’s profile as input and produces a single, standardised life score for each individual. In the end, the AI underwriter had a life score and risk class for each applicant.
Business Results
The new approach resulted in a 6% reduction in claims in the healthiest pool of applicants.
Over a 2-year period, this system reduced the time to issue policies by >25% and increased
customer acceptance by >30% for offers made with only light manual review. Overall sav-
ings were millions of dollars in operational efficiency behind tens of billions of dollars of
policy holder benefits.
Next let us consider applications that have three or more AI components. These compo-
nents can be used in different ways in the architecture based on the application flow.
Chatbots
The best examples are popular Chatbots such as Apple/Siri, Amazon/Alexa, Google/Home
or Microsoft/Cortana used in the simple question answering mode. Figure 2.5 represents
the high-level view of a speech-based Chatbot. When the user speaks “Who is the president
of the United States?”, it gets transcribed by the “Speech-to-Text” service into the cor-
responding text. The text gets passed as input to the next AI task, “Question-Answering”, which is another service offered by these companies and which returns the answer “Joe Biden”. The answer gets passed to the “Text-to-Speech” service, which ren-
ders the answer to the user. For simplicity, we have represented ‘Question-Answering’ as
one AI component in Figure 2.5, but in reality, it will consist of many subcomponents [21].
There is a user input analysis component that extracts user intent (“Name”) and entity
identification (“President of the United States”). It may also perform additional tasks such
as user sentiment analysis. There is a dialogue management component that manages
ambiguities, errors & information retrieval and a response generation component which
prepares the textual content to be delivered to the user.
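The chaining of the three AI components can be sketched in a few lines. The function bodies below are hypothetical placeholders (a real system would call speech-recognition, question-answering and speech-synthesis services); only the composition is the point.

def speech_to_text(audio: bytes) -> str:
    # Placeholder: a real system would call a speech-recognition service.
    return audio.decode("utf-8")

FACTS = {"who is the president of the united states?": "Joe Biden"}

def answer_question(question: str) -> str:
    # Placeholder for the question-answering component and its
    # subcomponents (intent extraction, retrieval, response generation).
    return FACTS.get(question.lower().strip(), "I don't know")

def text_to_speech(text: str) -> bytes:
    # Placeholder: a real system would synthesise audio here.
    return text.encode("utf-8")

def chatbot(audio_in: bytes) -> bytes:
    # The three AI components run in series; an error in any stage
    # propagates into the next.
    return text_to_speech(answer_question(speech_to_text(audio_in)))

print(chatbot(b"Who is the president of the United States?"))  # b'Joe Biden'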
Multi-Skill Agents
This next example is an agent that can answer questions in multiple domains such as
Travel, Weather and Food [22]. Bot designers call these domains ‘skills’. The basic idea
FIGURE 2.6 Multi-skill bot with parallel AI components for three different skills.
is that each bot specialises in one skill (i.e. answering questions in one domain). In this
example (see Figure 2.6), the input comes as text and the agent bot determines the intent of
the input message from user and directs the flow to the specific bot with sufficient details
and collects the response of the bot as output to the user. Clearly, this pattern can be com-
bined with the Chatbot design above to create a voice-driven application.
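The routing logic can be sketched similarly. The keyword matcher below is a crude hypothetical stand-in for a trained intent classifier, and the three lambdas stand in for the skill bots.

SKILL_BOTS = {
    "travel":  lambda msg: "[travel bot] handling: " + msg,
    "weather": lambda msg: "[weather bot] handling: " + msg,
    "food":    lambda msg: "[food bot] handling: " + msg,
}

def detect_intent(message: str) -> str:
    # Crude stand-in for an intent classifier.
    text = message.lower()
    if any(w in text for w in ("flight", "hotel", "trip")):
        return "travel"
    if any(w in text for w in ("rain", "forecast", "temperature")):
        return "weather"
    return "food"

def agent_bot(message: str) -> str:
    # The agent bot determines the intent and delegates to the skill bot.
    return SKILL_BOTS[detect_intent(message)](message)

print(agent_bot("Will it rain in London tomorrow?"))  # routed to the weather bot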
Business Problem
In a typical sports tournament (e.g. Masters Golf, Wimbledon Tennis, US Open Tennis,
etc.) there are numerous matches played between many dozens of players over many days.
The production of a sports highlights summary capturing the most exciting moments of the
tournament is an essential task for broadcast media. For even 1 day of the tournament,
this process may require manual viewing of hundreds of cumulative hours of video and
selecting a few minutes of highlights appealing to a large audience, in time for the evening
summary report. This is obviously very labour intensive. Can AI do better?
FIGURE 2.7 AI components in IBM’s High-Five system to select video highlights from the video
inputs of sports tournaments. The system processes visual, audio and text data in parallel to iden-
tify the video segments of interest. The rectangles with rounded corners are AI components, ovals
are evaluation steps and rectangles are summary information for the output.
IBM’s High-Five system combines excitement measures extracted from the visual, audio and text data along with game analytics to determine the most interesting moments of a game. High-Five also accurately identifies the start and end of the video clips with additional metadata, such as the player’s name and game context, or analysts’ input, allowing personalised content summarisation and retrieval. Figure 2.7 gives a high-level architecture of the High-Five system.
Business Results
The system was demonstrated at the 2017 Masters Golf tournament and at the 2017
Wimbledon and US Open tennis tournaments through the course of the sports events.
For the 2017 Masters, 54% of the clips selected by the system overlapped with the official
highlights reels. User studies showed that 90% of the non-overlapping ones were of the same quality as the official clips for the 2017 Masters. The automatic selection of highlights
of 2017 Wimbledon and 2017 US Open agreed with human preferences 80% and 84.2% of
the time, respectively.
FIGURE 2.8 Confidence in the AI to do a task (low to high) plotted against the consequence of the AI task (low to high). Tasks with high confidence and low consequence are suitable for AI, tasks with low confidence and high consequence are best done by humans, and AI augmenting humans fills the middle of the spectrum.
You need to decide where to introduce AI in order to meet your business objectives, and to think about where you can benefit the most from the introduction of AI and its potential to please or displease the user. As an
example, if you have live customer support only during a certain time window during a day
and only for the weekdays, you can consider introducing a customer service bot that can
provide some support during the off hours. You do not even have to match the full func-
tionality of your live agents, that could come over time, but customers will see it as a bonus
and are likely to put up with its shortcomings as it provides an out of hours service. If you
were to introduce this limited service as a replacement for your live agents the customers
would rightly see it as a degrade in service, it might have a negative effect on your business.
Once you start having some experience with the performance of the bot, you can consider
introducing the bot to take care of simple informational tasks so that you can free your
employees to attend to more demanding tasks in customer support.
Figure 2.8 gives a broader guidance to this concept by considering the confidence in the
AI to do the task versus the consequence of the AI task. When the consequence of the AI
task is low (e.g. restaurant recommendation) and you have high confidence in AI doing the
task, this is the best situation to introduce AI. When the confidence in the AI is low and the
consequence of the task is high (e.g. risk of human life, significant revenue loss, etc.), the
choice is clearly a human. For tasks in the middle of this spectrum, augmenting humans
with AI is the best approach.
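Read as a decision rule, Figure 2.8 can be encoded in a few lines; the numeric thresholds below are purely illustrative assumptions, not values from the figure.

def deployment_advice(confidence: float, consequence: float) -> str:
    # Both inputs scored 0 (very low) to 1 (very high); cut-offs are assumed.
    if confidence >= 0.8 and consequence <= 0.3:
        return "suitable for AI"          # e.g. restaurant recommendation
    if confidence <= 0.3 and consequence >= 0.7:
        return "done by humans"           # e.g. risk to human life
    return "AI augmenting humans"         # the broad middle of the spectrum

print(deployment_advice(confidence=0.9, consequence=0.1))  # suitable for AI
print(deployment_advice(confidence=0.2, consequence=0.9))  # done by humans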
FIGURE 2.9 Activity diagram. An overview of the various tasks to build AI applications
grouped by the Activity labels on the left. An AI application will need one or more tasks from
each activity layer.
FIGURE 2.10 The activity diagram of Figure 2.9 with the activities needed to build a simple text-based conversational assistant highlighted.
The bottom three layers A, B and C have to do with the choice of the data sources,
data types and the data workflow that needs to be supported during the AI application
lifecycle. These are discussed in Chapter 7. AI model management, layers D & E, includes
all aspects of model creation, validation and sustenance through the application deploy-
ment (Chapters 2 and 10). Human “Knowledge” tasks in layer F refer to the ability of the
application creators to understand and validate the knowledge captured in the AI and its
limitations to be able to add suitable insights (e.g. guardrails in terms of rules for allowed
application behaviour). The ethical layer G is to make sure that the application meets the
trust expectations of the users of the application in terms of explainability, reasoning and
AI governance (Chapter 5). User interaction models, layer H, include the consideration of
various aspects of how users will interact with the application in terms of input and output
modes (text, speech, gestures, etc.) and fault-tolerant designs to recover from potential fail-
ures in the user-AI communication. The top layer just recognises the fact that the users of
the AI application can be one or more individuals or for that matter, another machine. To
be considered complete, an enterprise application has to be designed with consideration of
at least one component (represented by the blue shaded boxes) from every row.
An Example of Using the Activity Diagram
Figure 2.10 highlights the specific activities in Figure 2.9 related to building a simple text
based conversational assistant tool for a typical business enterprise.
COMPLEXITY OF AI APPLICATIONS
Given the various options for implementing enterprise AI applications, it is worthwhile
discussing what makes an application complex. Some of the attributes are unique to AI
while others are generic attributes that contribute to any application complexity.
Number of AI Components
Generally speaking, the more AI components in the system, the more complicated it will be. Most AI applications that include AI components operate with some form of data flow where data enters at one end and an answer is produced at the other.
Interdependence of AI Components
If your AI application does rely on multiple AI components, then you need to think about the interdependence between them. In a simple system, where data flows out of one AI component into another AI component, what happens to any errors? Generally, errors tend to be compounded as they flow through systems: chain two components that are each 90% accurate and, naively, only about 0.9 × 0.9 ≈ 81% of items survive both stages unscathed. So it is really important that you understand how the different sub-components will interact. It’s only by doing so that you can ensure the correct level of accuracy at each stage.
Event-Driven Systems
However, in some cases, our AI Applications are required to interact in real time with their
environment. In such cases, the AI Components may be integrated using more of an event-
driven architecture. Clearly, such architectures are significantly more complex to develop
and test than the simpler data processing integrations.
Continuous Learning
One area that really excites AI Junkies, Algorithm Addicts and Nerds in general is the
thought of continuous, interactive learning. The idea that the AI is continually learning
and improving really excites those who dream of thinking machines. Sadly, it introduces a whole new level of complexity and risk: as a system owner, how will you know it’s learn-
ing the right things? The most notable example of this to hit the press was of course the
Microsoft Virtual Agent Tay [24] that was taught highly inappropriate language by its user
community.
Federation of AI Applications
The final factor that introduces complexity to an AI Application is when we decide to adopt
a system of systems where decision-making is federated out to multiple AI Applications.
One example of this would be to build a super search engine that took a user’s search term
and sent it to the top three search providers before fusing the results into a single result set.
Such approaches are also used in safety critical systems such as auto-pilots where there are
perhaps three separate collision detection systems and one over-arching Application that
arbitrates their separate decisions.
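A minimal sketch of the federated pattern, with three hypothetical detectors and a majority-vote arbitrator; a real safety-critical arbitrator would also weigh confidence and provenance, not just count votes.

from collections import Counter

def arbitrate(decisions):
    # Majority vote across the federated systems.
    vote, _ = Counter(decisions).most_common(1)[0]
    return vote

# Hypothetical stand-ins for three separately developed detection systems.
detectors = [
    lambda scene: "collision",   # detector A
    lambda scene: "clear",       # detector B
    lambda scene: "collision",   # detector C
]

scene = {"range_m": 12.0, "closing_speed_mps": 9.0}
print(arbitrate(d(scene) for d in detectors))  # -> collision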
Note that the complicating factors we talk about above are not a hierarchy. They are not
mutually exclusive. So, it is possible to have just about any permutation of these features in
your AI Application.
The day-to-day operations of every enterprise involve many different business processes
and many of those processes could benefit from the application of AI. This raises the very
serious challenge of how we scale our ability to apply AI when there are so many opportu-
nities. Our ability to meet this challenge is limited by the complexity of these solutions and
the skills required to deliver them, hence, the need to rapidly grow skills in the application
of AI to real-world enterprise problems.
FIGURE 2.11 On the left is the conventional ‘Static Integration’ where one AI Component is
retained for several years. On the right is the future proof ‘Dynamic Integration’, where multiple
services are continually being evaluated with the optimum configuration in use at any given time.
[Figure: the AI application lifecycle, spanning development (data and data preparation, AI application development, training and testing) and sustainment (deployment to production, live processing, performance statistic collection, and monitoring with alerts and contingency/fail-over).]
Education
The first and perhaps the most important is Education. That’s why you’re reading this
book so at this point we must, very sincerely and with straight faces, ask you to recom-
mend this book to your colleagues, friends, family, fellow commuters and anyone you
happen to walk near or past in the street. It’s your solemn duty to do so for the greater
good of humanity.
In all seriousness though, education is particularly important with reference to Domain
Understanding. One hard-earned lesson from delivering real AI solutions is that domain
knowledge is a major critical success factor. No IT consultant or engineer can understand
your business as well as you do! It’s therefore really important for you to acquire sufficient
AI knowledge amongst the domain experts in your enterprise to be able to form joint
teams with AI suppliers in order to ensure success for your AI projects.
Measurement
Beyond education, you need to understand the value that AI brings to the business. This
can be measured directly, in terms of overall business impact, or indirectly in terms of
enabling business impact. To clarify the difference between the two, an AI that enables a
lawyer to find relevant case information more quickly has a measurable impact in terms
of the time saved and the improvement in efficiency. However, an AI that automatically
translates foreign language documents into a native language as an input to the aforemen-
tioned search tool has an indirect benefit. It may not be possible to quantify this benefit in
terms of overall business impact; however, it is a critical enabler for a solution where the
impact can be directly measured. Measurability and value are massive subjects and will be
discussed in detail in Chapters 5 and 6 (Business Value & Impact and Ensuring It Works &
How Do You Know).
Method
Now that you’re educating your team and measuring business value, you can start to
understand where to apply AI for optimum benefit. Selecting the right project is another
critical success factor and will be discussed in the following chapter (imaginatively called
Know Where to Start – Select the Right Project). One anecdotal observation from business
consultants about AI is that the domain seems to comprise a never-ending series of Proofs
of Concept (POC). This has rather amusingly been referred to by one of our colleagues as
“perpetual beta”. Selecting the right project is essential in escaping “perpetual beta”.
Factory
Finally, you need to put in place a development organisation, which we refer to as a Factory,
aimed at reducing the cost of evaluating and delivering real AI Applications. The AI Factory
brings together people (skills), assets and tools from multiple suppliers with a process to
enable the rapid development and deployment of operational capability (including ongoing
operational support). The AI Factory is intended to enable the delivery of capabilities from
an ecosystem of suppliers such that the optimum capability is available to the authority.
The aim of the AI Factory is to reduce the time, cost and effort to evaluate and deploy new
AI capabilities. When a requirement is identified, the AI Factory first looks to existing
capabilities (e.g. developed for other missions with the same, or similar, requirements) and
makes these services available. In the event that the existing capability is not suitable, the
next question is whether an existing capability can be adapted. If not, then a new capability
can be developed and evaluated. Even when this is required, the capability development
should be more rapid due to the existing infrastructure, services, test and
automation tooling.
One of the major inhibitors to the successful delivery of AI projects is the upfront cost
of evaluating possible projects.
Invariably, once a possible project has been identified, the first question that needs to
be asked is whether the AI can do the job. At that point, a proof of concept is proposed
and a whole series of logistics need to be considered. What data is needed for the POC?
What technologies will be evaluated? Are there multiple suppliers that need to be com-
pared? What hardware will the evaluation run on? Who is going to label any training data
(remember not all AI projects are Machine Learning)? How will we actually compare the
output of the AI with any test data?
In short, conducting a POC is not a small task and it’s not uncommon for what appeared
to be a simple evaluation to grow into a project lasting several months and costing tens,
or even hundreds, of thousands of pounds. The costs of running such POCs can soon
become an inhibitor … especially when the first attempt is unsuccessful. Beyond the POC,
remember that this is not a static situation. The initial POC may prove the feasibility of
applying AI; however, maintaining and optimising the application is a continual process.
Imagine developing an AI application and identifying three potential technology suppli-
ers. Each supplier is continually developing and enhancing the service. The service that
performs best in today’s evaluation actually performs worst in 3 months’ time. If we are
to realise the full potential of AI, applications need to be continually maintained with an
ongoing evaluation of technology options. That requires skills, infrastructure and tooling.
In deciding whether or not to adopt an AI Factory model, organisations need to give
serious consideration to the scale and form of their AI ambitions. When brainstorming
potential AI applications, it is not unusual to identify 20 or more ideas. The cost-to-POC
issue causes organisations to focus on just one or two. That’s a serious mismatch and dem-
onstrates the braking effect of the cost-to-POC issue. There is a chance that your organisa-
tion only needs to develop one single AI application and, therefore, the cost of developing
an AI factory cannot be justified. In that case, you are the exception. In most cases, there
is a need to develop and maintain large numbers of applications and the initial investment
in the Factory will result in both considerable savings and an acceleration in your ability to
transform your business through AI.
It’s exciting … but we’re only just getting started. We will develop various concepts in the
upcoming chapters to understand the unique challenges of real enterprise AI applications
and how to address them in practice.
REFERENCES
1. A. L. Samuel, “Some studies in machine learning using the game of checkers,” IBM Journal of
Research and Development, pp. 211–229 (1959).
2. G. Tesauro, “Temporal difference learning and TD-gammon,” Communications of the ACM,
38, pp. 58–68 (1995).
3. M. Campbell, “Knowledge discovery in deep blue,” Communications of the ACM, 42,
pp. 65–67 (1999); https://www.youtube.com/watch?v=KF6sLCeBj0s.
4. D. Silver et al., “Mastering the game of go without human knowledge,” Nature, 550, pp. 354–
359 (2017).
5. J. Weizenbaum, “ELIZA: a computer program for the study of natural language communica-
tion between man and machine,” Communications of the ACM, 9, pp. 36–45 (1966).
6. T. Winograd, Understanding Natural Language, Academic Press (1972).
7. D. A. Ferrucci, “This is Watson,” IBM Journal of Research and Development, 56 (2012) and the
rest of the Issue.
8. N. Slonim, et al. “An autonomous debating system,” Nature, 591, pp. 379–384 (2021).
9. T. Young, et al. “Recent trends in deep learning based natural language processing,” IEEE
Computational Intelligence Magazine, 13(3), pp. 55–75 (2018).
10. W. Van Melle, “MYCIN: a knowledge-based consultation program for infectious disease
diagnosis,” International Journal of Man-Machine Studies, 10, pp. 313–322 (1978).
11. R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum and J. Lederberg, Applications of
Artificial Intelligence for Organic Chemistry: The DENDRAL Project, McGraw-Hill, New York (1980).
12. C. Garcia, “Harold Cohen and AARON—A 40-year collaboration,” (2016). https://computerhistory.org/blog/harold-cohen-and-aaron-a-40-year-collaboration/.
13. W. G. Walter, “An imitation of life,” Scientific American, 182, pp. 42–45 (1950).
14. R. Behringer, “The DARPA grand challenge – autonomous ground vehicles in the desert,”
IFAC Proceedings, 37, pp. 904–909 (2004).
15. R. G. Smith and J. Eckroth, “Building AI applications: yesterday, today, and tomorrow,” AI
Magazine, 38(1), pp. 6–22 (2017).
16. C. Lesner, et al., “Large scale personalized categorization of financial transactions,” The
Thirty-First AAAI Conference on Innovative Applications of Artificial Intelligence (IAAI–19),
pp. 9365–9372.
17. K. J. Lee, et al., “Embedding convolution neural network-based defect finder for deployed
vision inspector in manufacturing company Frontec,” The Thirty-Second Innovative
Applications of Artificial Intelligence Conference (IAAI–20), pp. 13164–13171.
18. E. Shaham, et al., “Using unsupervised learning for data-driven procurement demand aggre-
gation,” The Thirty-Third Innovative Applications of Artificial Intelligence Conference (IAAI–
21), Vol. 2., 2021.
19. M. Feblowitz, et al., “IBM scenario planning advisor: a neuro-symbolic ERM solution,”
Proceedings of the Demonstration Track at the 35th Conference on Artificial Intelligence (AAAI-
21), 2021.
20. M. Maier, et al., “Transforming underwriting in the life insurance industry,” The Thirty-
First AAAI Conference on Innovative Applications of Artificial Intelligence (IAAI–19),
pp. 9373–9380.
It’s Not Just the Algorithms, Really!
For those with less technical background, we will explain what algorithms are and posi-
tion their use within a broader AI application. We will examine some important algo-
rithms in the domain of AI and use some simple problems to demonstrate how different
algorithms can be applied to solve the same problem. The objective being to demonstrate
that there is no single magic algorithm, they are relatively interchangeable with each algo-
rithm having pros and cons. Algorithms are only as good as the data they operate against
and so we will introduce the concept of feature extraction and explain its importance
within AI applications. Finally, we will pull out some fundamental principles for the safe
use of algorithms that will enable you to successfully deliver AI applications.
INTRODUCING ALGORITHMS
Algorithm is a great word! Not long ago, most people would never have even heard of the
word. Yet, now it is the subject of dinner party conversations. The media and press are full
of articles about the brilliance of AI Algorithms in understanding who we are, what we do,
what we want and what we plan to do next. What is an algorithm anyway?
An algorithm is just a coherent set of steps to do a task. The purpose is to take some
input and create an output that meets the objective of the task. In some sense, a cooking
recipe with the necessary ingredients and the steps to cook the perfect filet mignon can be
thought of as an algorithm. Computing algorithms have a rich history [1]. The accompany-
ing vignette gives the details of three examples of algorithms created over a period of more
than 3000 years of human history, including the fact that Babylonians knew Pythagoras’
theorem more than 1000 years before he gave us his version.
Before the invention of computers, these algorithms were performed by humans by hand. With the use of computer programs, it becomes more natural to specify the required operations to a computer. In that context, there are generally five key properties algorithms must have [2]: finiteness (the procedure must terminate), definiteness (each step must be precisely specified), input, output and effectiveness (each step must be basic enough to be carried out exactly).
In the algorithm examples described in the vignette, the steps are simple. As the com-
plexity of an algorithm increases, sometimes it is not clear if it can actually do the
intended task in time with the available computing resources or not. In fact, there is
an established discipline in computer science to assess the efficiency and feasibility of
algorithms [3].
ALGORITHMS IN AI
As we discussed above, the purpose of an algorithm is to execute a task. In his 1996 Turing Award lecture, Raj Reddy [4] discussed the distinctive nature of AI algorithms.
AI algorithms tend to be rich in content and complex in design. He also observed that
it was less important if AI had more or less intelligence compared to humans; but it was
more important to understand that “AI is about creating artefacts that enhance the mental
capabilities of the human being”. This brings us to the ideas of “Intelligent Automation”
or “Augmented Intelligence”, the vision of humans and machines working together to do
better than either of them separately. Humans are better at self-directed goals, common
sense reasoning, value judgement, etc., whereas machines are better at large-scale math-
ematics, pattern discovery, statistical reasoning, etc. Can we work together to do some
amazing things, such as to find the cure for cancer or stop climate change?
In the last decade, the use of an AI component in the business context has typically served one of three goals.
The scope of the AI tasks has been narrow, giving rise to the term ‘Narrow AI’. Tasks in
Narrow AI deal with one type of data such as text, or image, or speech, with the need for
large amounts of training data and result in ‘opaque’ (aka black box) models that cannot
explain their outputs. Recently, the industry has started efforts in ‘Broad AI’ which bring
together knowledge from a variety of sources and formats and perform more complex
ALGORITHMS
The goal of this vignette is to describe three interesting examples of algorithms.
One Babylonian procedure (c. 1800 BC) computes the diagonal D of a rectangle given its area A and the sum L + W + D of its length, width and diagonal:

D = ((L + W + D)² − 2A) / (2(L + W + D))

This works because A = LW and the Pythagoras theorem (L² + W² = D²) give (L + W + D)² − 2A = 2D(L + W + D); so the procedure uses the theorem implicitly, more than 1000 years before the time of Pythagoras! As an aside, Babylonians used a 60-based number (sexagesimal) system, where 30 in sexagesimal is 30, 1 can also stand for 60, 2 for 120, and (2,24) means 2 × 60 + 24 = 144.

The second example is Euclid’s algorithm (c. 300 BC) for finding the greatest common divisor (GCD) of two numbers [2], illustrated here for 12 and 56:
• Step 1: Subtract multiples of the smaller number from the larger number until the
remainder is less than the smaller number and keep the remainder and the smaller
number. Four multiples of 12 subtracted from 56 give the remainder 8. Twelve and
8 are the new numbers to compare.
• Step 2: Repeat the process. One multiple of 8 subtracted from 12 gives a remainder
of 4. Eight and 4 are the new numbers to compare.
• Step 3: Repeat the process. Two multiples of 4 subtracted from 8 gives a remainder
of zero. The algorithm ends. The GCD for 12 and 56 is 4.
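The steps above translate directly into code. Here is a minimal Python sketch of the subtraction form (a modern implementation would simply use the remainder operator %).

def gcd(a: int, b: int) -> int:
    large, small = max(a, b), min(a, b)
    while small != 0:
        while large >= small:           # subtract multiples of the smaller
            large -= small              # number from the larger number
        large, small = small, large     # keep the remainder and the
    return large                        # previous smaller number

print(gcd(56, 12))  # -> 4, matching the worked example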
The third example is sorting a list of numbers into ascending order by repeatedly comparing and exchanging adjacent items. In the worked example, the sorting is accomplished in the first three passes, and the fourth pass just confirms the result. There are many different algorithms [4] to do the sorting task (Quicksort, Merge sort, etc.). The specific algorithm is selected for a task based on the size of the input list and the number of operations needed to complete the algorithm.
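The pass-by-pass behaviour described is that of a simple exchange (“bubble”) sort, sketched below on an illustrative list of our own, since the vignette’s example numbers are not reproduced here.

def bubble_sort(items):
    items = list(items)                  # work on a copy
    for _ in range(1, len(items)):
        swapped = False
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:  # swap adjacent out-of-order items
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:                  # a pass with no swaps
            break                        # confirms the result
    return items

print(bubble_sort([42, 7, 19, 3, 25]))  # -> [3, 7, 19, 25, 42]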
REFERENCES
1. D. E. Knuth, “Ancient Babylonian algorithms,” Communications of the ACM, 15,
pp. 671–677 (1972).
2. Euclidean algorithm: https://en.wikipedia.org/wiki/Euclidean_algorithm.
3. E. Friend, “Sorting on electronic computer systems,” Journal of the ACM, 3, pp. 134–168 (1956).
4. Sorting Algorithms: https://en.wikipedia.org/wiki/Sorting_algorithm.
tasks with lower data requirements and some explanations of outputs. It is likely that Broad
AI will take many more years to evolve and prove itself. We believe that the age of General
AI, when a machine can learn and do any intellectual task that a human being can (i.e. the
Skynet and the robots of the Terminator movies), is many decades away, if at all [5].
ALGORITHM ADDICTION
When things go wrong with AI applications, it’s often the algorithm that is blamed. One
example of this is in the domain of AI bias; we hear reports of algorithms being biased.
After some investigation of the origins of the bias, one quickly finds out that the data used
to train or configure the algorithm was the real culprit. This is the case in most instances. This is because the algorithms are just finding the patterns in the data, which
may not have been obvious before AI came along. Unfortunately, truth hurts! We use bias
as an example of our tendency to blame the algorithm. However, there are many other
types of AI failure where the algorithm is invariably held accountable.
The nature of our business contributes to our obsession with algorithms. Despite mas-
sive investment, a lot of AI research still happens in relatively small teams with individual
engineers desperate to solve the problem of creating intelligence. Many engineers dream
of creating a beautiful and elegant algorithm that will, somehow, evolve intelligence when
presented with enough data. Algorithms are fascinating, exciting and essential, but it takes
much more than an algorithm to successfully deliver an AI solution. This isn’t easy to
accept because many of us, who work in this domain, need to confess that we are … quite
simply … Algorithm Addicts. We love algorithms and we spend our lives thinking of them,
trying to understand them and dreaming up new ones.
As Algorithm Addicts, we must come to terms with our addiction and warn others of the
risks. The single greatest threat to the advancement of AI is believing that there is a magic
algorithm that will solve the challenge of creating AI. The failure of AI projects is often
attributed to the need for a more intelligent algorithm. In our experience, AI projects are
more likely to fail because of a fundamental breakdown in systems engineering. Often this
breakdown starts with the problem selection and project definition, both of which will be
discussed in Chapter 4. However, before looking at problem selection, it’s useful to under-
stand some of the fundamentals of the algorithms that underpin current AI. Understanding
the fundamentals will enable you to understand that there is no magic in the algorithm.
There may be beauty, elegance, wonder and amazing mathematics, but there is no magic.
THE AI DISCIPLINE
Topics in AI have evolved over the years. The classic AI textbook by Russell and Norvig
[1] gives the full range of the technology in scope. As is evident, this is very interdisci-
plinary in nature, requiring concepts in cognitive science, neuroscience, mathematics,
computer science, engineering and ethics. Some of the key topics are
• Perception: Deriving information from sensory inputs (text, vision, speech, etc.) to
build the relevant knowledge.
• Knowledge Representation: Accumulating and storing the semantic knowledge as
ontologies, graphs, etc., in a knowledge base (a world model) in a specific domain
for practical use.
• Learning: Ability to learn from data and human inputs to update the knowledge base.
• Reasoning: Various process steps that utilise the knowledge base for practical use.
• Problem Solving by Search: An agent that helps to find answers to specific tasks by
leveraging the knowledge base.
• Common Sense: Incorporating various elements in the world model that are
assumed to be evident to humans, without explicit training.
• Rule-Based Systems: Creation of real-world systems using a collection of rules in a domain, based either on expert knowledge or on the use of relevant data.
• Planning: The task of designing a sequence of possible actions to achieve a goal.
[Figure: the nested scope of the discipline – Artificial Intelligence (perception, learning, knowledge representation, reasoning, search, common sense, rule-based systems, planning, etc.) contains Machine Learning (supervised learning, e.g. logistic regression, decision trees; unsupervised learning, e.g. clustering; reinforcement learning, etc.), which in turn contains Artificial Neural Nets/Deep Learning (multi-layer perceptrons, convolutional nets, recurrent nets, autoencoders, etc.).]
Machine Learning (ML) is a part of the learning activities in an AI system that can be
performed with or without human supervision or intervention. An Artificial Neural
Network (ANN) represents a particular architecture that is derived from the structure
of the brain consisting of neurons, dendrites, etc. [2]. ANNs are made of a network of
artificial neurons with an input layer, an output layer and some intermediate (‘hidden’)
layers in the middle and connections across the layers. Deep Neural Networks (DNNs)
are ANNs with a large number of hidden layers.
REFERENCES
1. S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Pearson, 4th Edn (2020).
2. W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity”,
Bulletin of Mathematical Biophysics, v. 5, pp.115–137 (1943).
diagnosis component. We’ll discuss these different approaches later in this chapter. However,
remember that there are thousands of different algorithms to select from. We’ve chosen those
three just to give you a feel for how different algorithms can work in an application.
The second point to understand is that algorithms are, in almost every case, generic!
Algorithms can be applied to many different problems with the only constraint being the
type of data fed into the algorithm and the type of output expected. For example, some
algorithms can only work on numeric data whilst others are designed to process lists.
Similarly, some algorithms can only output classifications (e.g. Good/Bad/Ugly) whilst
other algorithms generate numerical outputs. As long as an algorithm can handle the types
of input and output expected, it can be applied to most problems.
Algorithms are generally implemented using a chosen programming language (e.g. Java, Python, etc.) as generic tools. In the brave new world of Cloud-based services, most of the Cloud providers provide implementations of the most popular AI algorithms that you can
provision as services. So, how do you make a generic algorithm work for your particular
application? The answer is that you give it an AI Model. An AI Model is a configuration of
an AI Algorithm to perform a specific task. The form of the AI Model depends on the AI
Algorithm. In a rules-based system, the Model will be a list of rules. In a neural network,
the Model will describe the configuration of the network in terms of the number of neu-
rons and the connectivity between them. For example, if we use a Neural Network to per-
form the X-Ray classification, the AI Model will comprise the weights and configuration of
the neurons required to perform that task.
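A toy illustration of our own makes the split concrete: the rules-engine algorithm below is completely generic, and only the model (here, a list of rules) makes it perform a specific task.

def rules_engine(record: dict, model: list) -> str:
    # Generic algorithm: return the verdict of the first matching rule.
    for condition, verdict in model:
        if condition(record):
            return verdict
    return "unknown"

# Two different models configure the same algorithm for two different tasks.
loan_model = [
    (lambda r: r["debt_ratio"] > 0.6, "decline"),
    (lambda r: r["debt_ratio"] <= 0.6, "approve"),
]
triage_model = [
    (lambda r: r["lesion_mm"] > 10, "refer"),
    (lambda r: r["lesion_mm"] <= 10, "routine"),
]

print(rules_engine({"debt_ratio": 0.7}, loan_model))   # -> decline
print(rules_engine({"lesion_mm": 4}, triage_model))    # -> routine

Swap the model and the same algorithm does a different job; that is all “configuring an algorithm with a model” means.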
AI Models can be developed in many different ways that generally fall into one of three
categories. Firstly, Models can be defined by the developer. A human being can sit down and
manually create a Model using their human expertise. Secondly, the AI can be trained to
develop its own Model using training data that has been created by a human expert. In our
X-Ray classifier example, this would require a radiographer to manually examine several
hundred X-Rays and label each one as either malignant or benign. This form of training is
known as ‘Supervised Machine Learning’ as the human expert has prepared the training data
with output labels. Thirdly, the AI can train itself using ‘Unsupervised Machine Learning’. In
this case, there is no direct expert input (i.e. labels), and the AI examines data automatically
searching for interesting patterns. There is a vignette on an example of Unsupervised ML
later in this chapter. Much of the discussion in this chapter is on Supervised ML.
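As a minimal sketch of the two training styles, using scikit-learn and made-up feature vectors (our illustration, not the book’s tooling):

from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X = [[2.1, 0.3], [8.4, 0.9], [1.7, 0.2], [9.0, 0.8]]   # feature vectors
y = ["benign", "malignant", "benign", "malignant"]      # expert labels

# Supervised: the expert's labels steer what is learned.
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[7.5, 0.7]]))       # -> ['malignant']

# Unsupervised: no labels; the algorithm looks for structure on its own.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                      # two discovered groupings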
What does it mean therefore when you hear, for example, about searching the web for
images using neural networks? Quite simply, this means that there is an AI Application
for image searching that is using an AI Algorithm. The algorithm will be a neural network
which we’ll discuss in more detail later. The neural network algorithm will be imple-
mented somewhere in a conventional computer program. The algorithm will be custom-
ised to perform the image search task using an AI Model that was developed using ML
with a neural network.
It’s important to understand that, generally, there are two parts to the algorithms used in
any AI Application: the algorithm that uses a Model to perform a task and a different algo-
rithm that has been used to develop that Model. In some cases, these two algorithms are
developed using the same technology or approach whereas in other cases it is possible to
combine different approaches. For example, a model using a neural network will invariably
be developed using ML algorithms, whereas the model for a Rules-based Decision Engine
could be either manually specified or developed using ML.
The key to successfully developing an AI Application is understanding which algorithm
to apply and what it takes to make that algorithm work. That also means understanding
when the algorithmic approach can’t work and not trying to force the impossible. That’s
not as easy as it sounds in a world of Algorithm Addicts.
Now that we understand the basic anatomy of an AI Application, let’s take a deeper look
at what happens under the hood! There is a vast number of books that will explain the mul-
titude of different AI algorithms in detail. Whilst the sheer volume of algorithms may seem
daunting, it’s really not that bad! In fact, once you understand the fundamental operations
at the core of all algorithms, you will find it easier to see through the complexity. In turn,
that will allow you to see through the hype and make good decisions about which approach
is best for your projects.
AI Solution
(Making Dinner)
It is evening and you are hungry, which is your (business) problem. You could go to a
restaurant (which would be outsourcing), but you decide to make some dinner instead,
since (i) you are a good cook, (ii) have a kitchen and (iii) your refrigerator has the neces-
sary raw ingredients in it. These represent skills and resources to create the AI Solution.
“Making Dinner” is the AI Solution to your hunger. The menu consists of Grilled Steak,
Potato Salad and Carrot Cake, which are the three AI Applications in the solution. We
will focus on one application, viz., the Potato Salad in detail.
To make the potato salad, you need the raw ingredients, potatoes, mayonnaise, mus-
tard, etc., which can be thought of as data. Then, you have to follow a recipe from wash-
ing the potatoes, cooking the potatoes, adding mayonnaise, … etc., which constitutes
the AI application. Note that only one step actually involves cooking the potato, which
is our AI Component. Similarly, real AI applications include other processes as non-AI
components. The use of one or more AI components is what makes it an AI application.
Within this recipe, you have a choice of how you cook the potato. For example, you
could use a pot of boiling water over a stove, a microwave oven or a conventional oven.
You can think of these three methods as alternative AI Algorithms. Not all AI algorithms are interchangeable with one another; you probably won’t want to grill your steak in the microwave.
Depending on which cooking device you’ve chosen for your potato, you are going
to need some settings: power and timer settings for the microwave, or temperature
and timer settings for the conventional oven, etc. These specific settings, optimised to
cook the potato to perfection, are the parameters chosen for the specific algorithm and
the data. Change the parameters and the outputs will change – your potato may burn!
The AI Algorithm and the related parameters together make up the AI Model.
Most metaphors can be stretched too far, but this one has one more gift to help
us understand AI. Let us consider the role of the humble potato (the “data”) in this
allegorical story. Could you have succeeded in any way in making your own potato
salad if you had no potatoes? After all, you’re a great cook, you have all the right kit,
the perfect recipe and you really, really want a potato salad. The answer, of course,
is no. If you were short of some of the optional ingredients, say capers, or if you had
limited kitchen equipment, you could probably still make a less delicious potato salad,
but not without potatoes. This seems obvious, but many an AI project, blessed with a
clear business problem, equipped with all the latest gear, skilled technicians and a really
strong desire to build an AI to meet their demand, has ended in exasperation because
the team forgot that the importance of the data completely outweighs all these other
efforts – this is the opposite of a classic IT project. Having the right data, and enough
of it, is the immutable core of all AI; there is no way around it.
Imagine that we employ a group of minions to gather objects of various types and masses and
drop them from different heights in order to see if they break or not. After a day of drop-
ping stuff, the minions return with the data shown in Table 3.1. We know this is a more
expensive option, but remember, we are imagining here! Yet the realities of data collection
(see Chapter 7) to build AI models for business are not that different.
If we have this data, can we predict if a particular item of specific type and mass, when
dropped from a particular height, will break or not?
Using the breaking thresholds Fb (Fragile) = 750 Newtons, Fb (Medium) = 3000 Newtons
and Fb (Hard) = 4000 Newtons, ‘Broken’ or ‘Unbroken’ labels are created for each record.
We created 1000 records using this approach to serve as a labelled training dataset to
create the Supervised ML models. The resulting data consisted of 330 Fragile Objects,
168 Medium Objects and 502 Hard Objects.
In generating simulated data, we are of course revealing the fact that we understand
the physics of this problem well enough to be able to define a proper mathematical model.
As such, we should ask ourselves why we need to use ML. The answer in this case is simply
for educational purposes, and later in this chapter, we will compare conventional, math-
ematical analysis with the use of ML. As a general rule, however, always be cautious in
using ML if it is possible to solve the problem using more conventional scientific means.
FIGURE 3.2 A parallel coordinate plot to visualise the training data for the object dropping
problem. Each record is represented by a line connecting the three values on the three axes.
The grey box in the middle gives the data behind the record shown as a black line representing
a hard object of mass 17 kg dropped from 5 m that did not break.
Lookup Table
The simplest form of AI would be to create a huge lookup table with every possible com-
bination of inputs to the system together with the desired outputs. The application task
will then be to simply take any set of input data and select the corresponding output in the
table. The lookup table would be the algorithm used by the AI task.
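To make this concrete, here is a toy sketch in Python of what such a lookup table would look like for our object dropping problem (the specific entries are illustrative):

    # A toy lookup-table 'AI': every possible (height, mass, type)
    # combination is mapped directly to an outcome.
    lookup = {
        (1, 2, "FRAGILE"): "UNBROKEN",
        (1, 32, "FRAGILE"): "BROKEN",
        (3, 16, "HARD"): "UNBROKEN",
        # ... one entry for every possible input combination
    }

    def predict(height, mass, obj_type):
        # Fails (KeyError) on any combination not already in the table.
        return lookup[(height, mass, obj_type)]

The weakness is visible immediately: the table can only answer questions it has literally seen before.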
In practice, generating a lookup table is not a sensible approach for two fundamental
reasons. First, we don’t always have example data for every possible combination of inputs.
Second, even if we did have every possible combination of inputs, the table would be mas-
sive. Imagine creating such a lookup table for a driverless vehicle and trying to define,
obtain and manage every possible combination of input data covering different weather
conditions, road configurations and the actions of other drivers, pedestrians and animals.
Human beings do not operate using lookup tables! We learn from our experiences and
develop generalised Models, which we then apply to infer future decisions under new cir-
cumstances. Similarly, in AI, we need to develop general Models that perform the specific
AI task and make the right decisions even on data they have not seen before. We’re
going to start by looking at two different ML approaches: how a Neural Network and a
rules system can each be generated using ML algorithms.
FIGURE A: Examples of neuron activation functions, left to right: Step, Linear and Sigmoid.
When input data is presented to the neural network, each neuron calculates the weighted
sum of the inputs, applies the activation function and generates an output. This output
is then fed forward into neurons in the subsequent layer. In our example, a ‘broken’ or
‘unbroken’ classification occurs when the output of the ‘broken’ or ‘unbroken’ node
exceeds 0.5. There is of course the possibility that both nodes could exceed 0.5 and, for
complex problems, a strategy to handle such scenarios may be required. Such a strategy
could be as simple as choosing the classification of the neuron that fired with the highest
value. For our simple problem, we didn’t have to worry about this as it never happened.
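As a minimal sketch, a single artificial neuron of the kind described here can be written in a few lines of Python (we assume the sigmoid activation):

    import math

    def neuron(inputs, weights, bias):
        # Weighted sum of the inputs, plus the bias ...
        z = sum(x * w for x, w in zip(inputs, weights)) + bias
        # ... passed through an activation function (sigmoid here).
        return 1.0 / (1.0 + math.exp(-z))

A network is nothing more than many such neurons, with the outputs of one layer feeding the inputs of the next.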
At the start of the training process, the weight and bias values are randomised. In
order to train the network, each record in the training set is presented to the network
in sequence. The output is calculated for each input record and the error between the
actual output and the desired output is calculated. This error is often referred to as the loss.
An ML algorithm back propagates the error through the network and adjusts the weight
and bias values for each neuron.
There are many different strategies for calculating the error and many different ML
algorithms. There are further parameters that can be used to define the behaviour of
the learning; for example, the proportion by which weights and biases are changed. For
experts in the field: we calculated the loss using a binary cross entropy function with a
batch size of 10 and we used stochastic gradient descent for the ML. We trained our net-
work with 800 records and used four-fold cross validation. Once trained, we evaluated the
model using the test set of 200 records.
The performance of the network is shown in Table 1 below.
TABLE 1 Comparison of neural net model performance for training and test data sets.
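For readers who want to see what such a setup looks like in code, here is a minimal sketch assuming the Keras library; the hidden layer size, learning rate and placeholder data are our illustrative assumptions, not the exact configuration behind the results in Table 1:

    import numpy as np
    import tensorflow as tf

    # Placeholders for the 800 training records: columns are height, mass
    # and the one-hot object type (fragile, medium, hard).
    X = np.random.rand(800, 5)
    y = np.random.randint(0, 2, size=(800, 2))   # 'broken'/'unbroken' labels

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(3, activation="sigmoid", input_shape=(5,)),  # hidden layer (size assumed)
        tf.keras.layers.Dense(2, activation="sigmoid"),                    # 'broken' and 'unbroken' outputs
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),    # stochastic gradient descent
                  loss="binary_crossentropy",                              # binary cross entropy loss
                  metrics=["accuracy"])
    model.fit(X, y, batch_size=10, epochs=50)    # batch size of 10, as in the text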
Kai Mumford
Kai Mumford was born and raised in the south of England. He is an IT Degree
Apprentice at the IBM Client Innovation Centre at Hursley Park in the UK and is
currently finishing the final year of his bachelor’s degree at De Montfort University,
with plans for graduate school.
Because the training process has the ‘ground truth’ (i.e. the right answers) in the labelled
data, it can calculate the error (called ‘loss’) in the model output. One of the key steps
in the training process, called ‘back propagation’, helps to minimise the error systematically
by adjusting the underlying parameters of the model in an efficient manner. The key word
here is ‘minimise’: the model may still produce errors in the output, depending on the
quality of the data, the level of tuning of the neural network structure and the parameters
chosen in the end.
We refer to the ‘Neural Network’ vignette for more details on the model for the object
dropping problem. The model was trained with 800 records and evaluated using 200 ‘hold
out’ records that were not used in training. The overall accuracy was 98.4% for the training
data and 96% for the test data. This is very impressive for such a simple model and shows
the predictive power of the technique.
FIGURE 3.3 Description of key concepts for our Artificial Neural Network model: the inputs
(Height, Mass and the one-hot object type Fragile/Medium/Hard) feed a layer of neurons,
whose outputs feed the ‘Broken’ and ‘Unbroken’ output nodes; each connection has a weight.
FIGURE 3.4 Illustration of the neural network response to the inputs height = 3 m,
mass = 16 kg and type = hard, producing the output ‘unbroken’.
Overfitting is a common concern with neural network models, occurring when too many
model parameters are fitted to too little training data. It is the consequence of the algo-
rithm finding a way to memorise all the inputs and the corresponding outputs present in
the training data (i.e. ‘learning the deck’), rather than creating a generalised model that
can respond to input data not previously seen during training. In this scenario, models
will yield high accuracy during the training phase, leading to a false sense of victory, only
to fail badly during the ‘test’ phase or actual deployment.
In this very simple explanation, the Model comprises the configuration of the network
(e.g. how many neurons in each layer, number of layers, etc.) and the model parameters
(e.g. values of the weights connecting the neurons). Obviously, the model we have described
is very simple for the object dropping problem. Many neural networks for image recogni-
tion or Natural Language Processing (NLP) use dozens or hundreds of layers with many
neurons in each, resulting in millions of parameters. All these properties represent the
Model and need to be configured and tuned for optimal performance.
Discipline-Specific Model
In general, if the problem we want to solve belongs to a specific discipline, such as phys-
ics, chemistry, economics, operations research, control theory or electrical engineering,
C5 RULE-BASED MODEL
C5 is a ML system that generates rules from training data. The algorithm was developed
by Ross Quinlan [1] and offers the major advantage of generating explainable ML models.
At a very simplistic level, C5 operates by generating a decision tree and then converting
that decision tree into a set of rules. The decision tree is constructed by examining each
potential parametric test for each of the input features. In our case study, the algorithm
would iterate through all the possible tests for height (e.g. height > 1, height > 2, etc.) and
then all the possible tests for mass and type. For each possible test, the Information Gain
is calculated and the test with the highest Information Gain is selected to divide the train-
ing set into subsets. The algorithm is then applied recursively to the subsets in order to
construct a complete decision tree.
Once a decision tree has been constructed, the C5 algorithm converts the decision
tree into a set of rules. In converting the decision tree into rules, there is a small element
of generalisation as a pruning algorithm is used to decide if aspects of the tree are too
specific. The resulting rule set can then be applied to previously unseen data. A further
point to note is that the rules are applied in sequence and the rule that fires first in the
sequence is taken as the output decision.
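C5 itself is a commercial product, but the same idea can be sketched with open-source tools. Here scikit-learn’s DecisionTreeClassifier (CART with the entropy criterion) stands in for C5, and the four training records are placeholders for the 800 used in the text:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Columns: height (m), mass (kg), fragile, medium, hard (one-hot object type).
    X = np.array([[1, 2, 1, 0, 0],
                  [5, 20, 1, 0, 0],
                  [3, 16, 0, 0, 1],
                  [5, 35, 0, 1, 0]])
    y = np.array(["UNBROKEN", "BROKEN", "UNBROKEN", "BROKEN"])

    tree = DecisionTreeClassifier(criterion="entropy")  # splits chosen by Information Gain
    tree.fit(X, y)
    # Print the learnt tree as human-readable rules.
    print(export_text(tree, feature_names=["height", "mass", "fragile", "medium", "hard"]))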
Taking the same 800 record training data used for the neural network deep dive, we
used C5 to generate the following rules.
We then tested these rules using the same 200 record test set that was used in
developing the neural network model. The performance of the C5 rules against both the
training and test sets is shown below.
TABLE 1 Comparison of C5 Rule-Based Model for the Training and Test Data Sets.
- Roy Hepper
Roy Hepper has worked in information technology for over 30 years on a wide variety of
projects – including using formal methods, knowledge representation techniques, statistical
learning and ML/AI. He continues to work on information architecture projects, applying
appropriate mixtures of conventional software architecture, design and programming,
augmented with advanced analytics and other ML/AI techniques.
REFERENCE
1. J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Series in Machine
Learning (1992).
invariably, the solution involves a good understanding of the discipline to develop the
insightful models and do the relevant mathematics. Obviously, our object dropping prob-
lem belongs to physics. The key idea is that knowledge of the discipline helps to formu-
late the problem better and leads to a deeper understanding of the problem and the solution
than blindly doing data science to find relationships between elements in the data.
To emphasise this point and having seen how ML can be used to solve a problem, let’s
go back to basics and try a conventional approach to solving the same problem. Let’s try …
don’t be scared … please keep reading … we’ll look after you … we promise … some good
old-fashioned mathematics and physics.
Why do we need to do this? Well … it is to help you understand what you are asking ML
to do. Such an understanding will help you define problems that are solvable, rather than
just blundering into a project and hoping for the best. That is a useful skill to develop as it
will help you assess the feasibility of using ML in other potential applications.
So, how would an old-fashioned engineer solve this problem?
First, an engineer would use their extensive knowledge of physics (because all engi-
neers love physics) to understand what was happening. Basic physics tells us that the
higher you drop an item from, the faster it will be travelling when it hits the ground. The
velocity and the mass determine the impulse experienced when an item hits a hard
surface. The impulse and the time for the object to come to rest on the ground determine
the force with which the object hits the ground (Newton’s second law). Heavy objects
INFORMATION GAIN
Information Gain is used to determine if dividing a set of data into subsets increases our
ability to predict the class of an item in each subset.
Imagine a set of data comprising two output classes referred to as Class A and Class B;
for simplicity, let’s assume the data set contains 50 items of class A and 50 items of class B.
Now consider two parametric tests that each divide the data into two subsets.
The composition of the subsets resulting from each of the parametric tests is shown below:
TABLE 1 Examples of Partitioning Data. Parametric Test 2 Has More Information than
Parametric Test 1.
As you can see, Parametric Test 1 has taken a set of data with a 50:50 split in the classes
and created two subsets that each still has a 50:50 split. Conversely, Parametric Test 2 has
resulted in two subsets comprising a 90:10 split.
In this simple example, the Parametric Test 2 has clearly provided an Information Gain
as, by applying the test, we can more easily predict the class of an item in the resulting
subsets.
Information Gain can be mathematically calculated for much more complex data sets
comprising many more classes. It is a highly valuable tool in comparing different para-
metric tests to understand which is more effective in dividing a data set.
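As a concrete check of the example above (assuming Parametric Test 2 splits the 100 items into two 50-item subsets with 45:5 and 5:45 class mixes), the gain can be computed directly:

    from math import log2

    def entropy(counts):
        total = sum(counts)
        return -sum(c / total * log2(c / total) for c in counts if c)

    parent = entropy([50, 50])                                 # 1.0 bit
    test1 = 0.5 * entropy([25, 25]) + 0.5 * entropy([25, 25])  # still 1.0 bit
    test2 = 0.5 * entropy([45, 5]) + 0.5 * entropy([5, 45])    # about 0.47 bits
    print(parent - test1)   # Information Gain of Test 1: 0.0
    print(parent - test2)   # Information Gain of Test 2: about 0.53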
hit the ground much harder than light objects at the same velocity. Newton’s third law
explains that the force exerted by the object hitting the ground is equal and opposite to
the force exerted by the ground on the object. Finally, we know that the greater the force
applied to an object, the more likely it is to break. We can use some structural mechanics
and material science to figure out if the observed force is enough to break the object. In its
absence, we can use the experimental data to determine the thresholds at which Fragile,
Medium and Hard objects break. Then, we have the whole story. For those who can’t get
enough of the physics, we’ve shown the details in the “Physics Model Behind the Object
Dropping Problem” vignette.
Another example of a discipline-specific model is described in the explanation of rise
and fall of tides in Chapter 7, which involves the knowledge of astronomy behind the
motions of the sun, the moon and the earth.
Discipline-specific modelling is superior to the other options in terms of the overall
assessment, while requiring deep expertise in the discipline.
Background
We are dropping an object of mass, M (in kilograms) from rest at a height, d (in meters)
from the ground. The goal is to calculate the force (F) exerted on the object when it
hits the ground and relate that to the observation whether or not the object breaks. The
composition of the object will decide if the force is adequate to break the object or not,
a detailed understanding of which will take us into the disciplines of structural mechanics
and material science. For now, we will assume that there are specific breaking thresholds
of force Fb for each of object, when exceeded the object will break. In our example, we
have assumed Fb (Fragile) = 750 N, Fb (Medium) = 3000 N and Fb (Hard) = 4000 Newtons.
We also assume for simplicity that all the objects make contact with the ground for the
same time interval, Δt = 0.1 seconds before they come to complete rest. In general,
the calculation of Δt also needs a detailed understanding of the momentum transfer
between the object and the ground. The acceleration due to gravity at the earth’s surface
is generally indicated by the letter g and is decided by the mass and radius of the earth
and the universal gravitational constant G. Using the parameters for earth, g = 9.8 m/s².
FIGURE A: The mathematics of the object dropping problem: the velocity grows linearly
with time at slope g, from Vi = 0 to Vf. Details are explained in the text.
To complete the story, we need two additional concepts. From Newton’s second law, the
change in momentum of the object due to the impact is the impulse, I = MVf = FΔt. From
Newton’s third law, the force exerted by the object on the ground is equal to the force
exerted by the ground on the object, which unfortunately leads to the breakage. If F > Fb,
the object will break; if F < Fb, it will not.
It is worth noting that there are only five parameters in the physics model: acceleration due
to gravity, three force thresholds for object breakage and the time for the object to stop after
hitting the ground.
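To make the calculation concrete: from the kinematics in Figure A, Vf = g·t and d = ½g·t², so Vf = √(2gd); combining this with Newton’s second law gives F = MVf/Δt = M√(2gd)/Δt. For the hard object of mass 16 kg dropped from 3 m in Figure 3.4, Vf = √(2 × 9.8 × 3) ≈ 7.7 m/s and F ≈ 16 × 7.7/0.1 ≈ 1230 Newtons, well below Fb (Hard) = 4000 Newtons, so the object does not break.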
Clearly, since this model is based on first principles, it can explain observations any-
where in the universe and more than 330 years of empirical validation of the underlying
physics confirms the validity of the model.
There are other forms of explainability that should be considered, and these are discussed
in more detail in Chapter 5.
To repeat the same old message … there are many different AI Algorithms and it’s
important to select the right one for the right job. To do that, you need to avoid Algorithm
Addiction and look at the broader aspects of an AI project. You need to consider how accu-
rate the AI needs to be. You need to understand how you will evaluate its accuracy. You
need to consider the qualitative issues such as explainability and how important they are to
your business. You will usually benefit from testing more than one algorithm against your
data, as idiosyncrasies in your data may well favour one algorithm over its competitors.
Most importantly, you need to avoid being blinded by the belief that one algorithm alone
can magically make sense of your data!
James Luke
A sample of the simulated training data, with examples of rules that can be read off it
(all drops shown are from a height of 1 m, so the speed on impact and deceleration are
the same in every row; values rounded):

Height (m)  Mass (kg)  Type     Speed on Impact (m/s)  Deceleration (m/s²)  Force (N)  Outcome
1           1          FRAGILE  4.43                   44.3                 44         UNBROKEN
1           2          FRAGILE  4.43                   44.3                 88         UNBROKEN
1           5          FRAGILE  4.43                   44.3                 221        UNBROKEN
1           5          MEDIUM   4.43                   44.3                 221        UNBROKEN
1           5          MEDIUM   4.43                   44.3                 221        UNBROKEN
1           7          MEDIUM   4.43                   44.3                 310        UNBROKEN
1           10         FRAGILE  4.43                   44.3                 442        UNBROKEN
1           11         HARD     4.43                   44.3                 487        UNBROKEN
1           12         MEDIUM   4.43                   44.3                 531        UNBROKEN
1           13         HARD     4.43                   44.3                 575        UNBROKEN
1           14         FRAGILE  4.43                   44.3                 620        UNBROKEN
1           17         MEDIUM   4.43                   44.3                 753        UNBROKEN
1           19         HARD     4.43                   44.3                 841        UNBROKEN
1           23         HARD     4.43                   44.3                 1018       UNBROKEN
1           26         MEDIUM   4.43                   44.3                 1151       UNBROKEN
1           28         HARD     4.43                   44.3                 1240       UNBROKEN
1           30         MEDIUM   4.43                   44.3                 1328       UNBROKEN
1           32         FRAGILE  4.43                   44.3                 1417       BROKEN
1           34         FRAGILE  4.43                   44.3                 1506       BROKEN
…           …          …        …                      …                    …          …

Example rules (shown alongside the relevant rows in the original figure):
IF Height=1 AND Mass<5 THEN UNBROKEN
IF Height=1 AND Mass<7 AND Type=MEDIUM THEN UNBROKEN
IF Mass>31 AND Type=FRAGILE THEN BROKEN
A  Cluster                                    1            2            3                 4
B  Type (cluster size/total of type in data)  Fragile      Medium       Hard              Hard
                                              (330/330)    (168/168)    (212/502)         (290/502)
C  Mean height (m)                            3.03         3.08         4.5               2.0
D  Mean mass (kg)                             24.10        24.83        26.42             26.58
E  Our description of the cluster             All fragile  All medium   Hard objects at   Hard objects at
                                              objects      objects      long distances    shorter distances
F  Breakage percentage from the labelled data 77.9%        17.3%        14.2%             0%
Row A identifies the cluster numbers and the next three rows (B, C, D) represent the properties of
each of the clusters. Row E is our “human” interpretation of the cluster based on these properties.
Row F is the breakage percentage for each cluster calculated from labeled test data not used in
the clustering model creation process. Columns 3 & 4 (row B) show that out of the 502 Hard
objects in the data 212 ended up in cluster 3 and 290 in cluster 4.
Our cluster descriptions in Row E attempt to give a short summary of the data repre-
sented in the four clusters. In practice, this means using domain experts to examine the
data in each cluster and determine what each seems to represent. This is not always
straightforward, which is why you need domain experts on your team, especially if the
algorithm identifies a previously unknown, novel sub-group.
In our example, the SPSS Modeler also outputs a number between 0 and 1 to indi-
cate the Prediction Importance of the input variables in the formation of the clusters.
The prediction importance numbers are 1.0, 0.33 and 0.01 for Type, Height and Mass,
respectively, indicating that the Type of the object played the dominant role in creating
the four clusters, followed by the Height variable, with Mass playing practically no role.
It is easy to see how the clusters are related to the breakage percentages in Row F,
supporting the intuition observed in the data visualisation discussion in Chapter 3.
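The same kind of clustering can be sketched with open-source tools. Here scikit-learn’s KMeans stands in for the SPSS Modeler algorithm (which additionally estimates predictor importance), and the data array is a placeholder:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Columns: height (m), mass (kg), fragile, medium, hard (one-hot type).
    X = np.random.rand(1000, 5)                   # placeholder for the simulated records
    X_scaled = StandardScaler().fit_transform(X)  # put features on comparable scales

    kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)
    print(kmeans.labels_[:10])      # cluster assignment of the first ten records
    print(kmeans.cluster_centers_)  # per-cluster feature means, for domain experts to interpret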
FIGURE 3.6 Imagining what the neural network may be doing (does a hidden layer represent
physical concepts like Acceleration, Force, etc.?); we really do not know.
It is conceivable that a Neural Network may have learnt to emulate the mathematical model
that we derived manually (Figure 3.6).
So, which of these possibilities has happened in the Neural Network that we trained?
The simple answer is that we don’t know and there isn’t really, at present, a way of know-
ing. We do know that parts of neural networks can learn to emulate complex mathemati-
cal functions. Unfortunately, the fact that a neural network could theoretically learn to
calculate Acceleration and Force doesn’t mean it actually has! We just don’t know. This is
one of the big disadvantages of Neural Networks. We can’t, with current technology, fully
understand what a neural network has actually learnt.
We can ensure that the neural network is safe to use in an operational scenario by ensur-
ing it is properly tested with data that covers all possible scenarios. Rigorous testing will
give us the confidence to use the application operationally.
In our simple example, it is intuitive that any AI Algorithm may stand a better chance of
success if the input data contained Force instead of Height. Quite simply, the AI will have
less to learn because it won’t need to figure out that it needs to calculate this new feature
called Force using the original feature of Height. The less the AI Algorithm has to learn, the
better the chance of success.
But what about more complex AI Applications? What features are important when pre-
dicting the stock market or performing facial recognition?
If we have 50 years of stock market price data, we can Define and Extract a massive range
of features. We can work out the 1-day average, the 7-day average, the 1-year average. We
can calculate trends in relative performance between stocks. The list is endless and there
are a great number of mathematicians, economists and scientists who invest an awful lot
of time analysing stock market data in search of the magic formula that will earn them a
fortune.
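As a small illustration (pandas assumed, with a synthetic price series standing in for real market data), such features are easy to generate, which is exactly why the list is endless:

    import numpy as np
    import pandas as pd

    prices = pd.Series(100 + np.random.randn(1000).cumsum())  # synthetic daily closing prices

    features = pd.DataFrame({
        "avg_1d": prices,                      # today's price
        "avg_7d": prices.rolling(7).mean(),    # 7-day moving average
        "avg_1y": prices.rolling(365).mean(),  # 1-year moving average
        "trend_7d": prices.pct_change(7),      # week-on-week relative change
    })
    print(features.tail())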
Hot tip … if you see a book entitled How to Predict the Stock Market, it’s highly unlikely
the author was successful. The odds are that they’ve written the book in the hope of recov-
ering their losses.
Face recognition is a really interesting AI challenge because it is a task that humans per-
form with apparent ease yet it has proved very challenging for AI. In the 1980s and 1990s,
face recognition research focused on the use of (conventional) neural networks. The key
question was what data to feed into those neural networks. Researchers looked at the fea-
tures that could be extracted from bitmap images and then evaluated which of those fea-
tures were most useful in enabling a neural network to learn to recognise a face (Figure
3.7). Typical features included measuring the distance between the subject’s eyes, the height
of a person’s ear lobes relative to their eyes and the angle of a person’s nose relative to the
line between their eyes. These features were manually defined using conventional image
analysis techniques and then fed into neural networks. This effort needed skills and was
FIGURE 3.7 Examples of manually defined facial features (e.g. Feature 1: the distance
between the pupils).
labour intensive. Today, we tend to solve this problem differently using DNNs (see below).
In cases when you do not have enough data, manual feature extraction still has a role to play.
FIGURE 3.8 Comparison of a conventional neural network and a deep neural network.
For example, the landmark AlexNet image classification network has eight layers,
650,000 neurons and 60 million parameters to optimise [18]! This sheer number of
parameters that need to be configured means that we need significant processing power
and a massive volume of training data.
Whilst the size of DNNs causes challenges in terms of processing power and train-
ing data volumes, they are considered to have one major advantage: there is less need to
undertake manual feature extraction of the type described above and shown in Figure 3.7.
In certain applications, the first few hidden layers appear to perform the feature extraction
automatically. The extent to which this happens may depend on the type of feature extrac-
tion required. Analysis of DNN behaviour suggests that for applications such as facial rec-
ognition, the DNN is learning to identify the type of visual features described in Figure 3.7.
However, does this mean that is always the case? If we were to apply DNN to a complex
radar data challenge, would the DNN learn to perform the type of sophisticated signal
processing currently designed by radar experts and embedded into the system design? The
simple answer is that we just don’t know at present. However, understanding what DNNs
are capable of learning and how they can be taught to learn those features with much less
data is the subject of considerable research right now.
It is asking a lot of an ML algorithm to discover the exact weights, and other configuration
parameters, required to emulate the mathematics without access to vast amounts of
accurate training data. The same could be said of advanced signal processing or control
theory problems. Human beings do not learn this level of mathematics just by example!
We are taught mathematics at school and, even then, relatively few human beings are able
to solve this type of problem. Is it realistic to expect an ML algorithm to construct such a
mathematical model just by observation?
Which brings us to the second factor, the volume of data. In cases where there are mas-
sive volumes of data, then the chances of a network learning the real underlying model are
greatly enhanced. The smaller the amount of available example data for training, the less
likely the algorithm is to learn the model.
Finally, it depends on the efficiency of the deep learning algorithm. Given enough data,
there is no reason why the AI should not learn a real-world model if you test every possible
combination of weights and configuration parameters. The ML algorithm is effectively
conducting a guided search for the right combination of parameters. Theoretically, with
enough processing power and time, it would be possible to simply try every combination
of weights and configuration parameters; after all, we do joke that if we take an infinite
number of monkeys and give each one a typewriter, one of them will type the complete
works of Shakespeare. Clearly, trying every combination isn’t an efficient strategy, so DL
algorithms are designed to perform a guided search. Even so, they are still massively
intensive in terms of computational requirements and, of course, the “guided” aspect of the
search may mean that the optimum solution is not found.
The reason the search is so challenging is that the search space of a DNN is massive. It
is no surprise therefore that the greatest advocates of DNNs and DL are web companies
with massive amounts of processing power and access to massive volumes of training data.
Applications such as image classification and speech recognition are perfect applications
for the use of DL.
In situations where the volumes of data are much lower, these techniques may be more
limited. In situations where there is already extensive scientific knowledge, such as signals
processing and radar theory, that knowledge should be used. It doesn’t make sense to rely
on the hope that the ML may discover what we already know.
The object breakage problem we have described above is an incredibly simple prob-
lem and the mathematical analysis, whilst frightening for most, should be trivial for most
engineers. An engineer, already familiar with gravity and the likely relationship between
higher velocities and the propensity to break, needs few examples to create a useful model.
The human has an amazing head start over the algorithm, which has literally no prior
knowledge and can only learn by examples (lots of examples).
Other factors need to be taken into account besides the accuracy of the actual solu-
tion. If explainability is an issue, then you may wish to consider a rules-based approach …
remember rules can be developed using ML as well!
From a strategic perspective, it is also important to ensure that you design your solu-
tion in a way that enables continual evaluation of different algorithms and products such
that you can swap in and out as appropriate. That means that your programme needs to
develop and maintain appropriate test data and the tools to automatically evaluate differ-
ent capabilities.
The key message is that there is no single best algorithm! At any point in time one par-
ticular approach may prove more effective than others; however, it’s a continuously chang-
ing domain. Solutions should be architected so that the AI algorithms are interchangeable.
Above all, it is important not to focus on a single algorithm … fight your algorithm addic-
tion … and ensure you are using the right tool for the job!
TRANSFER LEARNING
As discussed earlier, the best scenario for ML is when there is plenty of labelled training
data and adequate computing power. In many cases, collecting sufficient training data is
expensive or time consuming or simply not possible. This is where Transfer Learning can
help. It is a technique for knowledge reuse across related domains. Can we use the data
and the models developed for doing tasks in one (i.e. source) domain to help with tasks in
another (i.e. target) domain? For example, a photo analysis company could take the image
classification model from a large web search company and then use Transfer Learning to
develop its own customised version.
Among the factors that determine its success are the nature of the tasks in each domain,
the availability of data in each domain and the choice of the algorithm.
Discussion of the technical details behind Transfer Learning and the various approaches
is beyond the scope of this book. We refer to excellent survey papers [21–23] on the topic.
We want to discuss a practical example of Transfer Learning to demonstrate the nature of
the ‘related’ domains and the contribution of Transfer Learning to the task at hand.
Consider the problem of classifying customer sentiments as positive or negative based on
the customer reviews of products in four domains (Books, Kitchen Appliances, Electronics
and DVDs) on Amazon [24]. Each customer review consists of some textual input and an
associated rating of 1–5 stars. More than three stars was taken as positive sentiment and
less than three stars was taken as negative sentiment. These provide the natural labels/
target variables for the classification of the text. Zhuang et al. [23] showed the comparison
of Transfer Learning results from every domain to every other domain; that is 4 × 3 = 12
Transfer Learning experiments, each using ten different algorithms. The baseline was the
in-domain classifier. Most algorithms achieved better accuracy than the baseline, when
the source domain was electronics or kitchen appliances. This indicates that these two
domains may contain more transferable information than books or DVDs. Also, five algo-
rithms performed well in all the 12 tasks while three were relatively unstable. The better
performing algorithms were feature-based, which is currently the popular approach to
Transfer Learning.
Pretrained Models
Let us get back to our music analogy and pick the piano as our first instrument to
learn. The piano is a unique musical instrument: it covers a wider range of notes on
the musical scale than any other instrument and gives the broadest exposure to music.
Besides, it requires the coordination of the brain, hands and feet. After learning to play
the piano, learning other instruments is much easier. The analogue of this in ML is
to use generic data in a domain to pretrain a model and then fine-tune it to a specific
application domain. This is particularly effective when there is not enough data in the
application domain.
Here is an example of this approach in medicine. Nanni et al. [25] studied the clas-
sification of patients in the various stages of developing Alzheimer’s disease (AD) from
3D-MRI images. There are four groups: (i) cognitively normal patients (CN), (ii) patients
with AD, (iii) patients with mild cognitive impairment who will convert to AD (MCIc) and
(iv) patients with mild cognitive impairment who will not convert to AD (MCInc). They
had labelled MRI data for a total of 773 patients in these groups, and this was not enough
to train large traditional image classification neural networks. The authors demonstrated
very good classification performance using pretrained standard 2D image recognition
architectures (trained on non-medical images) when supplemented by an effective method
to decompose 3D MRIs into 2D structures. This suggests that the features learnt from non-
medical images in pretrained networks are effectively transferred to medical images.
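A minimal sketch of this pretrain-and-fine-tune pattern, assuming Keras and an ImageNet-pretrained ResNet50 (the four output classes mirror the CN/AD/MCIc/MCInc groups above; the medical image preprocessing is omitted):

    import tensorflow as tf

    # Load a network pretrained on generic (non-medical) images, without its classifier head.
    base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False, pooling="avg")
    base.trainable = False  # freeze the pretrained feature extractor

    # Add a small new head for the four patient groups and fine-tune only that.
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(4, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_images, train_labels, epochs=10)  # train_images/train_labels assumed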
Another area where pretrained models have become popular is in natural language
processing [26]. Compared to the days when NLP methods relied on discrete hand-crafted
features, modern pretrained language models learn general language representations from
large text corpora and can then be fine-tuned for specific tasks [27].
REINFORCEMENT LEARNING
There are scenarios where the AI must decide on a set of sequential actions leading to an
optimal outcome. Obvious examples are games (i.e. Chess, Go, Atari, etc.) where indi-
vidual moves/actions lead to a win or loss. As a business example, think of the problem
of optimising the supply chain of a widget business for most profit, considering
widget pricing, consumer purchase behaviour, manufacturing cost, widget shelf time, etc.
To address this type of problem, a Reinforcement Learning model [28] trained using
historical (or simulated) data may be the best place to start; depending on its performance,
it can then be integrated into the live environment. Figure 3.9 describes the interaction
between the environment (i.e. system) and the AI agent. In each step, the environment
provides the system state, the AI agent takes an action that changes the state, and the
environment provides a reward (positive or negative) to nudge the AI agent in the desired
direction. While Reinforcement Learning does not need labels for outputs, setting up an
AI agent to respond to the appropriate rewards for each likely action and letting the sys-
tem figure out the optimal solution is the hard part, and it can take considerable effort and
compute time.
FIGURE 3.9 Interaction model between the environment and the AI agent in Reinforcement Learning.
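For the technically curious, here is a minimal tabular Q-learning sketch of the loop in Figure 3.9; the 'env' object, with simplified reset()/step() methods and a list of actions, is an assumption for illustration:

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        Q = defaultdict(float)  # Q[(state, action)] = learnt value of that action in that state
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Mostly exploit the best known action; occasionally explore at random.
                if random.random() < epsilon:
                    action = random.choice(env.actions)
                else:
                    action = max(env.actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)  # env returns (state, reward, done)
                # Nudge the estimate towards the reward plus discounted future value.
                best_next = max(Q[(next_state, a)] for a in env.actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q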
Some enthusiasts imagine a relatively simple, small and elegant computer program that
somehow replicates itself and naturally evolves into a form that is capable of achieving
human levels of intelligence … or greater! Before being seduced into believing that we are
on the verge of this seismic event, how about taking a step back and comparing an
artificial neuron with a real neuron?
Artificial neurons are relatively simple and can be described in just a few sentences. They
take a set of inputs, multiply each input by a weight and then apply some form of activa-
tion function to generate the output. Networks of these very simple neurons are connected
together and the output of each neuron feeds forward into other neurons to generate the
final network output. When implemented in a computer program, everything operates in
a controlled sequence starting with the first neuron in the first layer and finishing with the
last neuron in the final layer.
Sejnowski [29] gives an excellent comparison of today’s DNNs to the structure and
performance of the human brain. Evolved over 200 million years, the human neocortex
(called the grey matter) is about 30 cm in diameter and 5 mm thick when flattened out.
There are about 30 billion cortical neurons in the human brain, forming six highly inter-
connected layers. Each cubic millimetre of the cerebral cortex (about the size of a rice
grain) contains a billion synapses. The largest DL networks today are reaching a billion
weights. The cortex has the equivalent power of hundreds of thousands of deep learning
networks, each specialised for specific problems! There is also work [30] in high-performance
computing with the Blue Gene/P supercomputer, which required 147,456 processors and
144 TB of main memory to simulate the brain of a cat with approximately 800 million neurons.
Real neurons are far more complex with chemical processes enabling a single neuron
to recognise and respond to hundreds of different input patterns. The simplicity of the
weighted connections between artificial neurons bears no resemblance to the complexity
of the synapses connecting real neurons. Each individual synapse comprises a complex
system of dendrites, axons and neurotransmitters. Each neuron is connected to other neu-
rons through thousands of synapses. The neurons are not arranged in neat layers that are
triggered in a formal sequence, and there are feedback loops far beyond anything we see in
any man-made engineering system.
While the idea that ANNs are based on the human brain is attractive and even seduc-
tive, it isn’t really a fair representation of the truth.
There’s a long way to go before an ANN of 86 billion neurons is equivalent to a human
brain with the same number of neurons!
Furthermore, the human brain is not just a huge collection of neurons. The brain has
an architecture that has evolved over hundreds of thousands of years and contains subsys-
tems responsible for different functions. Even with the huge number of neurons and the
sophisticated architecture that we, mere mortals, are only just starting to understand, the
human brain does not teach itself even the simplest concepts. Human beings require
2 years of 24-hour care followed by several years of primary education and even more years
of secondary education to acquire relatively basic skills of mathematics and literacy.
Neural networks are an exciting and stimulating branch of AI … but they should be
used in the right way for the right problem with the right understanding of their capabili-
ties and their limitations [31–33].
If you are ever asked to evaluate an AI approach, remember and apply these fundamental
principles.
• An algorithm transforms inputs to outputs in a finite number of steps, and there are
many different types of algorithms to build models (neural networks, rule-based,
using knowledge in a discipline, etc.).
• Obsession with any particular set of algorithms can lead a project astray.
For those who are interested in really digging deep into this subject, we recommend the
book Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig [6].
It’s an essential resident of every Algorithm Addict’s bookshelf. Please take the time to
understand the multitude of different algorithms in this fascinating domain … but, what-
ever happens, beware of Algorithm Addiction!
REFERENCES
1. D. E. Knuth, “Ancient Babylonian algorithms,” Communications of the ACM, 15, pp. 671–677
(1972).
2. D. E. Knuth, The Art of Computer Programming, Volume 1: Fundamental Algorithms, Addison-
Wesley, 3rd Edn (1997).
3. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of
NP-Completeness, W. H. Freeman (1979).
4. R. Reddy, “To dream the possible dream,” Communications of the ACM, 39, pp. 105–112
(1996).
5. R. Fjelland, “Why general artificial intelligence will not be realized,” Humanities and Social
Sciences Communications, 7, Article 10 (2020).
6. S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Pearson, 4th Edn (2020).
7. P. Jackson, Introduction to Expert Systems, Addison-Wesley, 3rd Edn (1998).
8. A. Ng, “Machine Learning Yearning,” https://www.deeplearning.ai/
machine-learning-yearning/.
9. A. Geron, Hands-On Machine Learning with Scikit-Learn & TensorFlow, O’Reilly (2017).
10. I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press (2016).
11. J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Series in Machine
Learning (1992).
12. L. Breiman et al., Classification and Regression Trees, Chapman and Hall/CRC, 1st Edn (1984).
13. M. Williams, “How strong is gravity on other planets?” https://phys.org/news/2016-01-strong-
gravity-planets.html.
14. J. Navratil et al., “Accelerating physics-based simulations using end-to-end neural network
proxies: an application in oil reservoir modeling,” Frontiers in Big Data, 2, Article 33 (2019).
15. W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,”
Bulletin of Mathematical Biophysics, 5, pp. 115–137 (1943).
16. D. Rumelhart, G. Hinton and R. Williams, “Learning representations by back-propagating
errors,” Nature 323, pp. 533–536 (1986).
17. Y. Le Cun, et al. “Handwritten digit recognition with a back-propagation network,” Neural
Information Processing Systems Conference, pp. 396–404 (1989).
18. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolu-
tional neural networks,” Advances in Neural Information Processing Systems, 25, pp. 1097–
1105 (2012).
19. B. Zhou et al., “Interpreting deep visual representations via network dissection,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, 41, pp. 2131–2145 (2019).
20. D. Yu et al., “Feature learning in deep neural networks - studies on speech recognition tasks,”
arXiv:1301.3605.
21. S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and
Data Engineering, 22(10), pp. 1345–1359 (2010).
22. K. Weiss et al., “A survey of transfer learning,” Journal of Big Data, 3, p. 9 (2016).
23. F. Zhuang, et al. “A comprehensive survey on transfer learning,” Proceedings of the IEEE, 109,
pp. 43–76 (2021).
24. J. Blitzer, M. Dredze, and F. Pereira, “Biographies, bollywood, boom-boxes and blenders:
Domain adaptation for sentiment classification,” Proceedings of the 45th Annual Meeting of the
Association for Computational Linguistics, Prague, Czech Republic, (Jun. 2007), pp. 440–447.
25. L. Nanni et al., “Comparison of transfer learning and conventional machine learning Applied
to structural brain MRI for the early diagnosis and prognosis of Alzheimer’s disease,”
Frontiers in Neurology, www.frontiersin.org, 11, Article 576194 (2020).
26. T. Young et al., “Recent trends in deep learning based natural language processing,” IEEE
Computational Intelligence Magazine, 13(3), pp. 55–75 (2018, August).
27. X. Qiu, et al., “Pre-trained models for natural language processing: a survey,” Science China
Technological Sciences 63, 1872–1897 (2020).
28. V. Francois-Lavet, et al., “An introduction to deep reinforcement learning,” arXiv:1811.12560.
29. T. J. Sejnowski, “The unreasonable effectiveness of deep learning in artificial intelligence,”
Proceedings of the National Academy of Sciences, Jan 2020, p. 201907373.
30. D. Modha, et al., “Cognitive computing,” Communications of the ACM, 54, pp. 62–71 (2011).
31. A. Darwiche, “Human-level intelligence or animal-like abilities?” Communications of the
ACM, 61, pp. 56–67 (2018).
32. J. Pearl, “The seven tools of causal inference, with reflections on machine learning,”
Communications of the ACM, 62, pp. 54–60 (2019).
33. G. Marcus, “Deep learning: a critical appraisal,” arXiv:1801.00631.
Chapter 4
Know Where to
Start – Select the Right Project
It was 1994 and I had just given up a secure career with a guaranteed pension! I needed
a job and fast. More importantly, I wanted to make my fortune so that I never had to
work again.
I wrote to one of the top Formula 1 teams in the history of the sport and told them that
I knew how to use AI to revolutionise motor racing. The team responded by inviting me
to work with them to test out my ideas. In our initial brainstorm we came up with some
great ideas and two seemed to be particularly interesting.
One idea was to use AI to understand how to set up racing cars for races. It was a mas-
sively complex problem that needed a really intelligent system and there was a load of
data to work with. The other idea was to use AI to determine race strategy. This problem
seemed to be less important, a bit mundane and we didn’t really have the right data.
I started work immediately on the car setup problem and soon discovered that the data,
whilst plentiful, was still insufficient. The problem itself was highly complex and the
business process I was trying to align with was led by highly skilled individuals who
were using years of experience to make sensitive judgement calls. Conversely, the race
strategy problem could be tested easily without the need to modify a proven business
process. The lack of data was easily solvable with simulation. By the time I realised that
I had chosen the wrong problem, it was too late! The project was shut down and I failed
to make my millions.
By selecting the wrong problem, I failed to deliver a working solution and instead of
living a life of glamour in Formula 1, I now have to write a book!
James Luke
DOI: 10.1201/9781003108498-5
In the previous two chapters, we introduced the key ideas behind building artificial
intelligence (AI) applications for the enterprise and explained that it takes more than algo-
rithms to deliver those applications. Given that a large percentage of AI projects in the
enterprise end as proofs of concept (even when successful) and do not move to
production [1–4], we want to address the important question: How do we select AI projects
that are most likely to succeed in the enterprise? We call our answer “The Doability Method”,
and it is a key contribution of this book. This method was developed in IBM over many years,
and we found it to be very useful in prioritising AI projects in an enterprise portfolio and in
managing an AI project during execution. Let us start.
Step 1: Assess candidate business ideas and determine which ones are suitable for the
current state of practice of AI technology.
Step 2: Evaluate the ones that are suitable for AI in terms of business value and techni-
cal feasibility using five themes: Business Problem, Stakeholders, Trust, Data and AI
Expectation.
In this chapter, we present the basic ideas behind the first step of the Doability Method
ahead of deeper discussions about AI in the subsequent chapters. The second step of the
method is in Chapter 9. The intervening chapters provide the rationale to get you ready for
the second step. Applying the Doability Method requires an understanding of the practical
application of AI to real problems. Whilst the chapters that follow this one aim to provide
the understanding required to support the application of the Doability Method, we hope
and believe that the knowledge of AI shared in those chapters will be of value all by itself.
Before getting into the detail of the Doability Method, it’s worth taking a few minutes
to talk about Innovation and Emerging Technologies. Even after 70 years (or more), AI is
still very much an emerging technology and, as such, must be approached differently from
conventional projects.
A PORTFOLIO-BASED APPROACH
Adopting a portfolio-based approach creates the flexibility and agility required to optimise
the chances of success. A typical portfolio may be initiated with 10–20 ideas out of which
two or three may be immediately deliverable. A handful of projects will be dropped due to
the lack of achievable business value or technical feasibility. The remaining projects may
require further evaluation, or technical development.
A key element of this approach is not closing out options too early: you will need to
accept that your best project ideas might need to be put on ice if blockers are encountered
and brought back to life when those blockers are removed. For example, it may be that
the application of AI in a control system is not practical due to the lack of a sensor with
the accuracy required for the solution to work. Rather than pursue an unachievable goal,
it may be wise to shelve the project and focus on another idea. However, it’s critical that
when projects are pushed onto the back burner, they are not considered failures. In the
case of the control sensor, good practice would be to maintain a ‘technology watch’ such
that if an appropriate sensor becomes available, the project can be re-started and the
benefits realised.
An undervalued aspect of recognising that a project should stop is that you have prob-
ably learnt something profound about your business. It may indicate another opportunity
and/or something that you need to better protect. In our sensor example, presumably the
current system is working because of the hidden, and presumably undervalued, expertise
of some of your workers. Maybe special care should be taken to retain those workers.
• Supervised ML is, by far, the most proven AI technology across a wide range of problems.
• Supervised ML is best (easiest to build, test and maintain) when applied to problems
of narrow scope (e.g. image or text classification). Therefore, it is better to break any
complicated task into a set of simpler tasks if you can.
• It is necessary that the use of AI fits the appropriate business process to minimise the
risk. So, it is critical that typical project team members (with no deep expertise in ML
algorithms) can understand and evaluate the task.
• This picture reflects the maturity of the current state of practice in AI technology for
real business use. It doesn’t preclude the use of ML techniques such as Reinforcement
Learning (see Chapter 3), which relies on interaction with the application environ-
ment (e.g. user feedback) for learning instead of labelled data.
• If other ways emerge for machines to learn without relying on large quantities of
labelled data, this decision diagram may look different in the next edition of this
book.
FIGURE 4.1 Decision diagram for Step 1 of the Doability Method. Starting from a candidate
task, the decision nodes ask: Q1: Can humans perform the task at some scope and speed?
Q2: Can they explain the reasoning? Q3: Is the reasoning behind the task practical to encode?
Q4: Can the task be broken into smaller tasks? Q5: Can humans evaluate the task, if done by
AI? Q6: Is it feasible to get sufficient labelled or simulated data? The Yes/No answers route
each idea to one of three recommendations.
We now discuss the diagram in terms of the questions posed at the decision nodes (i.e.
diamonds).
Q1. Can Humans Perform the Task at Some Scope and Speed?
We start with the question of whether human analysts or specialists can already perform
the task, albeit at lower speed and perhaps with limited scope.
This question has three important implications:
i. Ideally, the best scenario for AI use is when there is an existing business process
where a specific task could benefit through automation to improve the productivity
and timeliness. In this scenario, clearly someone is already performing the task.
ii. Creating the AI needs business domain knowledge, to make sure that the right factors
are considered.
iii. If there is someone who knows how to do the task, it increases the chance of success
of creating a viable solution, whether it is AI or not. Remember it’s not just any old
human doing the task; it’s a human that your solution development team has access
to (e.g. don’t try to build a medical diagnosis AI without a medical professional on
your team).
An experienced Oncologist may notice the way a person walks or paleness in their
appearance. The Oncologist will ask questions about lifestyle and form a judgement based
on a very broad set of data. Typically, an Oncologist will see a relatively small number of
cases and, unfortunately, we’re not able to go back to historical data and test what would
have happened with a treatment plan. As you can see, this is a far more challenging task
to encode.
Q5. Can Humans Evaluate the Task, if Done by AI?
This is another very important question with strong implications. It is well known in the
field of computational complexity [5] that an algorithm is not useful unless we can validate
its correctness. The expectation is that validating the algorithm may be easier than
creating it. If the AI comes up with an algorithm to solve a business task, but we have no
way of verifying the results, that is a significant business risk. AlphaFold, from
Google Deepmind, is an example of an AI that can solve a problem not (practically) solvable
by humans but where humans can evaluate the answer; AlphaFold’s neural nets exhaus-
tively try millions of combinations of protein folds until a shape is reached that obeys the
tens of thousands of folding rules per protein. It is not practical to do this manually, but
human researchers can validate the correctness of AlphaFold’s candidate predictive shapes
using the actual protein’s chemical properties.
In terms of labelling data, there are strategies that can be adopted to make the task more
practical. For example, crowd sourcing can be a very good way of generating very large sets
of training data. An important aspect of crowd sourcing data labelling is of course to send
the same data to multiple annotators and then normalise the results to ensure consistency.
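One simple normalisation strategy is a majority vote across annotators, sketched below (the input format is our assumption for illustration):

    from collections import Counter

    def consolidate(votes_per_item):
        # votes_per_item: {item_id: ["cat", "cat", "dog", ...]} from multiple annotators.
        return {item: Counter(votes).most_common(1)[0][0]
                for item, votes in votes_per_item.items()}

    print(consolidate({"img1": ["cat", "cat", "dog"], "img2": ["dog", "dog", "dog"]}))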
Another approach is to use the ML as part of the labelling process. This technique, often
referred to as Active Learning, involves interactive analysis of the data. The human annota-
tor is presented with a set of data for manual labelling. As the human labels the data, the ML
is being trained in the background and additional algorithms are working to identify which
data to present next. The Active Learning algorithms may, in one strategy, pick records
similar to those which are achieving the poorest performance in the background training.
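A sketch of one common selection strategy, uncertainty sampling (a close cousin of the strategy just described), assuming a scikit-learn-style model and numpy arrays:

    import numpy as np

    def pick_next_batch(model, unlabelled, batch_size=10):
        # Confidence = probability of the predicted class; low means the model is unsure.
        confidence = model.predict_proba(unlabelled).max(axis=1)
        # Ask the human annotator about the records the model is least sure of.
        return unlabelled[np.argsort(confidence)[:batch_size]]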
Simulating data is often seen as a panacea in AI projects. Why waste all that expensive
human time manually labelling outputs (e.g. pictures of cats, dogs, etc.) when you can
just write an algorithm to do it? Please be careful when it comes to simulated, or synthetic,
data. We discuss this in greater detail in Chapter 7. The obvious challenge is that if you can
write an algorithm to do the job perfectly, then why bother training an AI to do the
same thing? If your algorithm gets it right most of the time, your AI will at best do as well,
but most likely worse, than the original algorithm. That being said, it's not quite that simple,
and there are cases where the use of synthetic data can work: for example, a
decision support system where the AI makes decisions about the environment rather than
classifying data within it. In such cases, synthetic data can work because a simulator
simply generates all possible permutations and the AI learns what works and what doesn't.
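As a toy illustration of that simulator idea, the sketch below enumerates every state of a small invented environment and lets the learner discover the best action for each one. The states, actions and reward rule are all made up for the example; the point is only that when the environment is fully enumerable, synthetic permutations can stand in for labelled data.

```python
from itertools import product

# Every state of a tiny invented environment: road surface, time of day, traffic.
states = list(product(["dry", "wet"], ["day", "night"], ["light", "heavy"]))
actions = ["normal_speed", "slow_down"]

def reward(state, action):
    # Invented ground-truth rule that the learner has to discover.
    risky = state[0] == "wet" or state[2] == "heavy"
    if risky:
        return 1 if action == "slow_down" else -1
    return 1 if action == "normal_speed" else 0

# Because the state space is fully enumerable, the learner can simply try
# every (state, action) pair and keep the best action for each state.
policy = {s: max(actions, key=lambda a: reward(s, a)) for s in states}
print(policy[("wet", "night", "heavy")])   # -> slow_down
```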
Ultimately, using ML is all about the data, and you should consider the practicalities of
how you are going to obtain and manage high-quality data. Knowing whether you have
sufficient data is the billion-dollar question, and there is no simple way of answering it (e.g.
you may have millions of rows of training data, but does it cover all possible eventualities?).
However, you should have data science experts on your project to at least make an initial
statistical evaluation.
iii. AI may be a good option, but you must carry out Step 2 of the Doability Method to
assess the Business Value and the Technical Feasibility of this venture.
To demonstrate its utility, let us walk through the first step of the Doability Method with
three examples. First, though, it is worth recalling three trends that make AI attractive to
enterprises today:
i. With the internet becoming the primary platform for business applications and the
increasing digitisation of business transactions, there is an explosion of data
captured by an enterprise [7].
ii. Increasing use of public and private clouds is making it possible to acquire 'elastic'
computing resources as you need them, compared to a traditional IT infrastructure
with upfront capital equipment costs.
iii. The field of AI has made phenomenal advances in ML algorithms in the last decade,
particularly in supervised ML.
With this in mind, let us evaluate three ideas using Doability Method Step 1 to see where
we land in the three recommendations in Figure 4.1.
• Q1: Yes. World champions are people (apparently) and they become world champion
by beating other people. So, yes … a person can perform the task.
• Q2: Yes. While a Grand Master cannot sit down and write out an end-to-end process
for winning at chess due to its sheer complexity, he or she may be able to explain their
reasoning for any specific board position. Since chess is a game made up of moves, it
is possible to anticipate and evaluate some number of moves on either side in terms
of a path to a win (or loss), loss of pieces on either side, the freedom to manoeuvre
individual pieces, etc.
• Q3: No. Whilst a Grand Master could explain some number of moves, it would not
be practical to write a procedure for every possible "what if" outcome/board position.
There are just too many!
• Q6: Yes. Indeed, there were many thousands of historical Grand Master games. Yet,
they still DID NOT cover all possible positions, move sequences and game outcomes.
BUT, you could construct a simulation – you know how each piece moves, the board
is finite and there is a definite goal/end game.
In 1997, the ML algorithms and the infrastructure were not mature enough. The best you
could do was to:
i. Create a system that could simulate many moves ahead, sometimes more than 20,
evaluating millions of combinations on the fly.
ii. Encode a set of programmatic rules that could use the simulation to search for the
optimum choice to win the game at every move, based on game knowledge and
history. By doing so, you would have built a system that defeated a World Chess
Champion … at least that is how IBM’s Deep Blue defeated Garry Kasparov.
An interesting point to note here is that Deep Blue was simply simulating a subset of the
possible permutations of the game, evaluating 200 million moves per second. By doing so,
Deep Blue was able to perform a task that an ordinary person can perform, BUT at a
massive scale. This gave Deep Blue its "intelligence", i.e. AI. It is important to note that
the technology behind Deep Blue was very different from the current focus on ML as the
primary technique in AI.
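For a feel of what "simulating moves ahead" means in code, here is a toy depth-limited minimax search. To keep it runnable it plays the simple pile game Nim rather than chess, and it is only in the spirit of the Deep Blue approach; the real system combined far deeper search, special-purpose hardware and an elaborate evaluation function.

```python
def minimax(pile, depth, maximising):
    """Return (score, move): +1 when the maximising player can force a win."""
    if pile == 0:
        # The previous player took the last counter and won the game.
        return (-1, None) if maximising else (1, None)
    if depth == 0:
        return 0, None                     # depth limit hit: neutral heuristic score
    best_score, best_move = (-2, None) if maximising else (2, None)
    for move in (1, 2):                    # legal moves: take one or two counters
        if move > pile:
            continue
        score, _ = minimax(pile - move, depth - 1, not maximising)
        if (maximising and score > best_score) or (not maximising and score < best_score):
            best_score, best_move = score, move
    return best_score, best_move

print(minimax(pile=7, depth=10, maximising=True))   # (1, 1): take one, forcing a win
```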
However, Google DeepMind's AlphaZero program can play championship-level
chess today with NO human input, just using the game rules augmented by self-
play with reinforcement learning [10]. Given the advantage demonstrated recently by
AlphaZero over Stockfish (the best current chess program in the Deep Blue genre) [11],
if we were to create a chess application today, we would be foolish not to consider ML-
based approaches.
• Q1: Yes. A human analyst can read through a sequence of financial transactions
involving a person or an organisation and identify potential evidence of money
laundering. He or she may be able to do a few cases in a day; but considering the large
number of financial transactions in any given day, most of them perfectly legal,
this scenario is begging for an AI application.
• Q2: Yes. Typically, this takes the form of enumerating rules that capture the human
understood patterns of money laundering. While this approach may not be complete,
it will definitely help to automate the task.
In the 2020s, there are a number of alternatives to building a program to play chess or a
similar type of board game.
1. Classical Approach: Following the style of Deep Blue and its predecessors, this
approach would rely more heavily on automated tuning of the evaluation function
parameters and use modern search tree pruning methods that are highly efficient.
Most older versions of Stockfish [5], a popular open source chess program, are
based on this method.
2. Deep Reinforcement Learning Approach: AlphaGo [6] demonstrated that training
a Go program with self play, using deep neural networks, reinforcement learning
and Monte Carlo tree search [7] could produce very high-level play. AlphaZero
[8] showed how an improved version of this approach could train even stronger
systems for three different games, including chess.
3. Hybrid Approach: Combining the classical approach with neural networks has
also resulted in a very high-performance chess program (Stockfish 12 and later
versions [9]).
From the point of view of a game developer, either Approach 2 or 3 is likely to be an
effective route for developing a game-playing program. Approach 1, the classical approach, while
quite effective, requires significant domain knowledge about the game and would likely
require a much longer development time. Approach 2, as implemented in AlphaZero
and recent open source variants (e.g. Leela Chess [10]), has a relatively large and com-
plex evaluation function, which requires significant computation for training. In addition,
the relatively slow execution time of the neural network forces a smaller scale search
(based on Monte Carlo tree search). Approach 3 has a much simpler and faster neural
network evaluation, which enables a very fast alpha-beta-based tree search that is close
to the speed of the classical versions of Stockfish.
Murray Campbell
Murray Campbell is a Distinguished Research Scientist at the IBM T. J. Watson Research
Center and a Fellow of the Association for the Advancement of Artificial Intelligence
(AAAI). He is best known for his work on Deep Blue, the IBM computer that was the
first to defeat the human world chess champion in a match.
REFERENCES
1. M. Campbell, A. J. Hoane Jr and F. H. Hsu, "Deep Blue," Artificial Intelligence,
134(1–2), pp. 57–83 (2002).
2. G. Tesauro, “TD-Gammon, a self-teaching backgammon program, achieves mas-
ter-level play,” Neural Computation, 6(2), pp. 215–219 (1994).
3. D. J. Slate and L. R. Atkin, "Chess 4.5—The Northwestern University chess program,"
in Chess Skill in Man and Machine, Springer, New York, NY, pp. 82–118 (1983).
4. T. Anantharaman, M. S. Campbell and F. H. Hsu, “Singular extensions: adding
selectivity to brute-force searching.” Artificial Intelligence, 43(1), pp. 99–109 (1990).
5. https://stockfishchess.org/.
6. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche … D.
Hassabis, “Mastering the game of Go with deep neural networks and tree search,”
Nature, 529(7587), pp. 484–489 (2016).
7. R. Coulom, "Efficient selectivity and backup operators in Monte-Carlo tree search,"
in International Conference on Computers and Games, Springer, Berlin, Heidelberg,
pp. 72–83 (2006).
8. D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez …D. Hassabis,
“A general reinforcement learning algorithm that masters chess, shogi, and Go
through self-play,” Science, 362(6419), pp. 1140–1144 (2018).
9. https://www.chessprogramming.org/Stockfish_NNUE.
10. https://lczero.org/.
• Q3: No. The reason for this answer is simply that human-created rules cannot
keep up with the new ways of money laundering invented every day by other crimi-
nally minded clever humans.
• Q6: Yes. With the digitisation of financial transactions in the last decade, it is possible
to get the training data with human annotated labels on normal and money launder-
ing scenarios.
Clearly, human analysts cannot scale to perform tens of millions of such tasks every sec-
ond. By exploiting the scalability of AI, we can deliver solutions that far outperform
human capabilities … even though the task they are performing is simple. The key point is
that AI is not being used to solve a difficult problem that no human can comprehend! AI is
solving a very simple problem that any human could easily do. However, the AI can do this
task millions and millions of times per second to deliver a capability far greater than that
of a human. We refer to the paper by Chen et al. [12] on the evolution of ML techniques for
anti-money laundering applications.
• Q1: Yes. The task is currently performed by radiographers and other medical imaging
specialists. It is a skilled task and does require extensive education, but it is performed
by very large numbers of medical professionals around the world.
• Q2: No. The answer to this probably depends on the exact use case, but the general
answer is that they can partially explain their reasoning. They can look at an image
and point out features that draw their attention to a particular part of the image.
However, there will be variance and exception cases so their ability to fully explain
their reasoning is limited. Ultimately, they will be using their experience to identify
features that “look like” points of interest. So, we’re going to say no to this question.
They can't fully explain how they do it.
• Q5: Yes. A trained healthcare professional will be able to evaluate the perfor-
mance of an AI.
• Q6: Yes. Millions and millions of medical images are captured and analysed all
around the world every day. These images are labelled using a standard medical ter-
minology and could form a very valuable set of training data for ML.
Our overall decision therefore is that analysing medical images is potentially a very sound
application of ML. In fact, Wu et al. [13] compared the performance of ML algorithms vs.
Radiology Residents and concluded that “…it is possible to build AI algorithms that reach
and exceed the mean level of performance of third-year radiology residents for full-fledged
preliminary read of anteroposterior frontal chest radiographs”.
A caveat here is that in the real world, a medical doctor may interpret the image in
conjunction with other facts about the patient (e.g. demeanour, skin colour, shortness of
breath, difficulty of movement etc.) for a proper diagnosis. Clearly, these are not present in
the training data consisting of medical images alone.
• Business Problem: a key part of any complex project. Clearly defining the scope is
important; however, in an AI project we also need to consider how the impact of the
AI will be measured, the skills you will require to deliver, and how the new capability
will be integrated into the business process.
• Stakeholders: as mentioned in the introductory chapter, AI solutions will impact
society to a much greater extent than previously. In addition to managing internal
Stakeholders within an enterprise, the values and beliefs of a whole range of external
Stakeholders, from regulators to customers, need to be managed.
• Trust: for AI to be successful it really does need to be trusted … unless you are
a James Bond villain of course. Trust is not something you can specify; it’s up to
your consumers to decide whether to trust the AI. However, you can aim to develop
Trustworthy AI by considering important factors including accuracy, ethics, bias
mitigation, explainability, robustness and transparency.
• Data: it’s all about the data, so any project evaluation will need to consider privacy,
availability, adequacy, operations and access to domain expertise.
• AI Expectation: what is the real necessity driving the project and has this type of
thing been done before? What is the true scope (again) of the application and is it
really feasible (a more thorough version of the Step 1 evaluation outlined above)?
Finally, what other complexity factors exist and what are your hopes regarding
reusability?
As we have said, having just one AI project in mind could be considered inefficient. AI
projects can and do fail, most often when the data doesn’t quite live up to your expecta-
tions. It pays early in your project to have a portfolio of AI ideas, preferably achievable with
the same data. Step 2 of the Doability Method in Chapter 9 is designed to compare these
ideas and help you to prioritise the ones most likely to succeed. Before you jump straight to
Chapter 9 (we know you want to) or, preferably, read the intervening chapters first, why don't
you cycle back to the decision flow diagram (Figure 4.1) and run a couple of alternatives to
your main idea through it? It may surprise you which one comes out on top when you put
them through Step 2 of the Doability Method.
If you are eager to test your top ideas in Step 2, then feel free to skip ahead to Chapter 9
and dive straight into the questions. If you’re interested in learning more about the rea-
soning behind the questions, then we recommend that you read the following Chapters.
Chapters 5 & 6 will focus on Value, and Chapters 7 & 8 on Doability.
• The Doability Method allows you to evaluate if your business idea can be supported
by the current AI technology.
• The method consists of a Decision Diagram (Step 1) and a Doability Matrix (Step 2).
• The Decision Diagram helps to validate the suitability of your business idea to the
most proven AI technology (i.e. Supervised Machine Learning).
The Doability Method Step 2 is covered in full detail in Chapter 9 following a series of
Chapters that discuss the finer points of Doability and Value.
REFERENCES
1. KPMG 2019 Report: “AI transforming the enterprise.”
2. O’Reilly 2019 Report: “AI adoption in the enterprise.”
3. Databricks 2018 Report: “Enterprise AI adoption.”
4. MIT Sloan-BCG 2019 Research Report “Winning with AI.”
5. NP-Completeness, https://en.wikipedia.org/wiki/NP-completeness.
6. “DeepMind says it will release the structure of every protein known to science,” MIT
Technology Review: https://www.technologyreview.com/2021/07/22/1029973/deepmind-
alphafold-protein-folding-biology-disease-drugs-proteome.
7. D. Reinsel, J. Gantz and J. Rydning, “The digitization of the world from edge to core,” An IDC
White Paper – #US44413318, sponsored by Seagate, November 2018.
8. M. Campbell, A.J. Hoane Jr and F. H. Hsu, “Deep blue,” Artificial Intelligence, 134(1–2),
pp. 57–83 (2002).
9. N. Silver, see the chapter "Rage Against the Machines" in The Signal and the Noise, Penguin
Press, New York (2012); "The man vs. the machine," https://fivethirtyeight.com/features/
rage-against-the-machines/.
10. D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez …D. Hassabis, “A general
reinforcement learning algorithm that masters chess, shogi, and Go through self-play,”
Science, 362(6419), pp. 1140–1144 (2018).
11. AlphaZero Crushes Stockfish in New 1,000-Game Match https://www.chess.com/news/view/
updated-alphazero-crushes-stockfish-in-new-1-000-game-match.
12. Z. Chen et al., “Machine learning techniques for anti-money laundering (AML) solutions
in suspicious transaction detection: a review,” Knowledge and Information Systems, 57,
pp. 245–285 (2018).
13. J. T. Wu et al., "Comparison of chest radiograph interpretations by artificial intelligence
algorithm vs radiology residents," JAMA Network Open, 3(10), p. e2022779 (2020).
Chapter 5
Business Value and Impact
The story of the Allied code breakers in World War 2 represents one of the greatest engi-
neering achievements in human history. On the eve of war, the British Government
established a code breaking group in a country house, called Bletchley Park, 50 miles
northwest of London. The team was tasked with de-coding intercepted German messages
and providing the Government with intelligence of exceptional value.
Initially a small group of code breakers including mathematicians, linguists and those
who had learned their trade in World War 1, gathered at the establishment. However,
within a very short time, the group started growing rapidly as the original code breakers
were joined by more mathematicians, scientists and engineers. Collectively, they were
doing something which many, including the enemy, thought was impossible. They were
breaking the enemy’s codes and reading their messages.
Whilst code breaking had been practised for centuries, Bletchley Park represented
a massive step change! The group developed the methods, technologies and pro-
cesses required to transform coded messages into plain text on an industrial scale.
In doing so, they gave the Allies a massive military advantage that some argue
shortened the war by at least two years and laid the foundations of the computer
industry. One of the leading mathematicians at Bletchley Park was Alan Turing, one of
the founding fathers of both AI and Computing, whose story was captured in the 2014
movie "The Imitation Game".
One fascinating aspect of the Bletchley Park story is that, at the start of the war, there
was no way of knowing that the codebreakers would be successful.
Try to imagine writing a modern-day business case for Bletchley Park! The executive
summary would be, "We would like to take several hundred theoretical academics and
put them in a country house where we will give them unlimited resources to invent a com-
pletely new technology aimed at breaking an unbreakable code".
Would that business case carry much weight with a twenty first century CEO?
Fortunately, the British Government was willing to take a leap of faith and make a massive
investment in the industrialisation of code breaking. There was no upfront business case
for Bletchley Park … just total belief in the importance of the enterprise and trust in those
empowered to deliver. This was demonstrated very clearly in October 1941 when three of
the leading code breakers, including Turing, felt that they needed more resources. They
wrote a letter to the Prime Minister, Winston Churchill, expressing their concern about
the lack of resources. On reading the letter, Churchill wrote on it a very direct and clear
instruction, “Action this day! Make sure they have all that they want on extreme priority
and report to me that this has been done” [1].
• Since AI applications are going to make decisions and judgement calls that were tra-
ditionally left to humans, topics such as ethics, explainability and transparency are not
peripheral topics. They are at the core of an AI project. We are in the infancy of dealing
with these 'human' issues.
• Stakeholder issues are more significant due to the social and moral aspects. For exam-
ple, you may have to worry about what the media would say or how the employees of
the enterprise would react to the choice of the AI being deployed.
• AI projects need more experimentation (e.g. choice of models) compared to tradi-
tional IT projects. This introduces uncertainty in the quality of the outcome and the
schedule of the project.
• The availability of data of sufficient quality and quantity is critical. Most often, this
is not easy to assess in the beginning of a project since some level of iteration with
model building is necessary to get a reliable assessment. This adds to the uncertainty.
• There are significant differences in the engineering of machine learning (ML) sys-
tems compared to traditional IT systems in terms of resources (i.e. people, engineer-
ing environment, etc.) and processes (e.g. DevOps, testing, persistence of training
data, etc.). These have to be in place for a successful execution. Also, the engineering
environment has to persist for as long as the application itself is deployed.
You should think about the business processes you have and assess where introducing AI
will give you the best business value, while being practically doable. If the AI is going to
require new business processes, you should be aware of potential challenges lying ahead
due to their introduction. Let us talk about factors contributing to building business cases
for AI projects and see how they are different.
Examples of AI Business Cases
Let us consider an IT portfolio of applications supporting the various business processes
in an enterprise. A typical business case for a new AI project in this context will belong to
one of three basic buckets: (i) efficiency, (ii) enhancements and (iii) new capabilities. The
benefit to the enterprise is measured typically in terms of increased revenue, reduced cost,
increased customer satisfaction, improved market share and, for publicly held companies,
increased share value and market capitalisation.
Efficiency
AI is going to automate a task currently being done by one or more humans. The expected
benefit will be lower expenses, a shorter time to do the task and potentially much more
consistent output, particularly if multiple humans are currently doing the same task. The
details of the business case will revolve around net cost savings due to the use of the AI
instead of the humans
or the financial value of the saved time. An example of this is in Customer Relationship
Management where incoming email or text from clients needs to be classified into different
bins depending on the type of follow-up action needed, a tedious job for a human, great for
AI. A second example of efficiency gains would be the use of “chatbots” in customer support
to answer simple questions from the customers. In either case, the primary evaluation
criterion has to be an objective comparison of AI versus humans in performing the task.
Enhancements
This can have two flavours.
i. Your current technology for doing a task could be improved by using AI. An
example of this can be seen in computer vision where the older technologies relied on
human efforts to define distinguishing features in the image, whereas the more recent
deep neural networks are able to extract these features automatically using the training
samples with labels, and also perform the image recognition task at a higher accuracy.
ii. AI can do certain narrow tasks better than humans in terms of speed and accuracy.
Examples of such tasks include speech to text conversion, text to speech conversion
and text translation. Given the advancements in speech processing and text process-
ing in the past decade, this can be done quite well by AI. The business case will involve
an assessment of the value of the use of AI for these tasks vs. the current approach in
terms of performance quality and effort.
New Capabilities
These are tasks simply not possible for humans to do or complete in the time allotted. For
example, recommender systems are really ideal for machines because they can pick out the
patterns in the behaviours of individual users and user communities very easily, but this
is something that a human or a group of humans cannot do. Reading thousands of pages
in a few seconds to look for specific concepts or entities of interest is another example of
something that would be impossible for a human to achieve at the required speed but easy
for an AI. The business case will contain the new revenue opportunity for the company
due to these new AI capabilities that do not exist in the company portfolio today, or
perhaps anywhere in the market.
STAKEHOLDERS
Your AI project will have a lot of stakeholders, more than for your typical IT application
project, and they won’t all be obvious. We are accustomed to thinking of the Head of
Marketing and the Finance Director as stakeholders. In AI projects, we generally need to
consider many more groups and individuals: in our examples we've underlined a dozen,
and your project may well have more.
The first scenario (Figure 5.1) is when the enterprise AI application is directly interact-
ing with the end consumers. Examples are internet applications in banking, insurance,
e-commerce, etc. There are three groups of stakeholders in this scenario:
• The Enterprise: This group includes various people involved in the creation of the
AI. Investors with the financial stake in the project; Business sponsors who support
the specific AI project and assign resources; Domain Specialists whose knowledge you
need to build and verify the AI's efficacy; Engineers who build the AI application and
any suppliers of data used by the AI and computing environment (e.g. cloud provider).
[FIGURE 5.1 Scenario where the AI application directly interacts with end consumers.
The diagram shows three groups of AI Project Stakeholders: The Enterprise (Investors,
Project Sponsors, Domain Specialists, Engineers, Data Suppliers, Compute Suppliers
and the Workers AI is replacing); the End Consumers (internet banking, e-commerce,
social media, etc.); and Society (Regulators, General Public and Media), which
influences the End Consumers.]
In addition, the introduction of AI either in the internal enterprise processes (e.g. auto-
mated email classification) or in the consumer facing aspects (e.g. bots in customer
support) may affect the employment of some workers in the enterprise while their par-
ticipation is critically necessary for the creation of the AI. The overall quality of the
AI application critically depends on this group. Your employees as a whole will have a
view on what you are doing. They may be excited that you are moving into this brave
new world or they may be worried that this is the beginning of the end as the machines
take over.
• End Consumers/Customers: These are the intended beneficiaries of the AI outputs.
The purpose of the AI is to provide benefits (e.g. a more pleasant user experience) to
these end consumers in their interactions with the enterprise. More engagement of the
end consumers with the application is generally a positive impact on the enterprise,
particularly amplified by social media. Your customers will have a view of AI and will
want to know how it impacts their service and what it means for their personal data.
• Society: The impact of the AI on the society is really judged by three different sources:
The regulators who have the job of making sure that the AI conforms to the defined
standards, where they exist; the general public, whose impression of AI matters for its
acceptance at large; and the media, who have an insatiable appetite for sensational
stories either good or bad. The public perception of the AI can have significant impact
either positively or negatively on the business success of the enterprise.
In the second scenario (Figure 5.2) where the purpose of the AI application is to
help professionals (e.g. doctors, loan officers, radiologists, etc.) with their decisions
affecting their clients (i.e. patients, loan applicants, etc.), you need to add them as
“Operators/Decision Makers” to the list of stakeholders.
[FIGURE 5.2 Scenario where the goal of the AI application is to help in a professional
task. The diagram inserts Operators/Decision Makers (doctors, loan officers, etc.)
between the Enterprise and the End Consumers (patients, loan applicants, etc.): the AI
assists these humans and reduces AI risk, they deliver an improved service to the End
Consumers, and Society (Regulators, General Public and Media) influences, and is
affected by, the impact on the business.]
• Operators/Decision Makers: You need to make sure that the AI functions are really
making them more productive and more effective from their clients' perspective.
As discussed in Chapter 2, they also provide a way to reduce the risk
of the AI recommending inappropriate decisions to the end consumers of the AI. They
play a critical role by adding their professional judgement when applying the AI output
for the benefit of their clients (i.e. the end consumers of the AI).
In addition to managing the training data, they were able to take on additional responsi-
bilities for analysing more complex data. In effect, we had developed an augmented system
where AI and Analysts worked together to achieve far more than the Analysts had been able
to achieve previously.
It would be nice to think that this success was our intention from the outset, but the
reality is that we were lucky! Had we simply approached the project with a view to replac-
ing the Analysts, there is a good chance that they would have been hostile to the project.
Instead of helping cleanse the data and teaching us about the challenges, they would have
focused on defending their personal value. As it was, they loved their inclusion in the proj-
ect and they loved the technology even more. The fact that we stumbled into this success
should not prevent us from learning from it. The key learning point is that AI is
most successful when not approached as a replacement for existing resources. AI will be
most effective in augmenting existing resources and growing their roles to be more effective.
A further factor that contributed to the success of the classifier project was the ease with
which it was possible to demonstrate value to the Stakeholders. On the very first day of the
project, the Analysts were able to produce a spreadsheet of example data manually labelled
as part of the existing business process. The data was in a form that could be immediately
used to train and test the classifier. The ease with which this initial evaluation could be
undertaken meant that we were able to prove, and establish trust in, the capability right at
the outset of the project.
Dave Porter
the scope of the project and quick-to-deploy "out-of-the-box" solutions are proposed that
perform end-to-end analytics in a proprietary manner. Any intermediate data transforma-
tion may even be locked away in a proprietary format such that it is not available for use by
other potential downstream analytics.
In cases where an open, generic approach is maintained, the initial project is required to
cover the cost of the intermediate data transformation. The data transformation may have
the potential to support multiple downstream applications; however, the business case is
evaluated solely against the first project. At this point, the business case can fail to demon-
strate value. The challenge of developing AI capabilities to perform this type of intermedi-
ate service is significant. We often refer to it as the “telephone exchange challenge”. Every
project requires a telephone, but none of the projects can afford, or are willing, to fund the
procurement of the exchange.
So, when it comes to managing the Stakeholders of AI projects, it’s important to remem-
ber three key points. Firstly, the nature of AI projects is such that the range and number
of Stakeholders are far greater than for conventional projects. Secondly, it is essential to
bring on board the key employee Stakeholders whose roles will be directly impacted by
the new capability; they are your greatest asset. Thirdly, give serious consideration to the
“telephone exchange challenge” and develop a strategy that will enable the delivery of high-
value intermediate capabilities.
• Co-creation: Sitting side by side, the bank and IBM teams jointly co-created the
Virtual Agent. Contact centre staff felt bought into the experience and could see
the potential value right from the start.
• Team Member: The Virtual Agent was intentionally personified as a member of
the bank's team and even has her own personality. She was given a name to
humanise the solution and identify her as a colleague.
• Bespoke: The Virtual Agent was created and customised to the way the bank would
get the most value from the platform.
• Impact to Job: Using the platform meant that the contact centre staff could provide
better customer service, without having to spend a long time finding information or
getting concerned that the information being provided was correct. They were able
to trust and rely on the Virtual Agent.
Since implementing the Virtual Agent, the bank has not only seen a 20% improvement in
customer feedback but importantly a large upturn in staff engagement. They are engaged
in the process and do not see the Virtual Agent as a threat, but rather as a colleague they can't
do without.
Richard Hairsine
Richard Hairsine is an Associate Partner and AI & Automation Lead in Financial Services at
IBM Consulting in the UK.
return was massive but also that the core enterprise was both understandable and measur-
able. The idea of taking thousands of encrypted enemy messages and transforming them
into thousands of plain text ones is so easily understandable that justifying an investment
becomes easy. From a measurability perspective, if you take 1000 messages and decrypt
800 of them, you are doing okay. If you don’t decrypt any, then you have a problem.
An interesting aspect of measurability at Bletchley Park is that the code breakers knew
if they were being successful. If you successfully decode enemy messages, you can read
the plain text whereas, if you fail, it’s still gibberish. Knowing when you are performing a
task correctly, being able to quantify the number of individual tasks performed and having
a clear operational return on each task all add up to a rock-solid business case.
Today’s Data Scientists will look at Bletchley Park with envy! No upfront business case and
a completely measurable return on investment throughout the entire lifecycle of the project.
Most projects have a detailed upfront business case, with milestones and must-haves firmly
in place, often before anyone has seen the data, and no measurability once the project starts.
Modern business processes and their supporting IT are complex. It can be hard to deter-
mine where to invest in order to generate any form of return. Should AI be applied at the
point of data collection to improve the quality of data available or should it be applied in
the downstream analytics to make better operational decisions?
Often, we need to consider direct versus indirect return on investment. Improving
the quality of data at the point of capture may have no initial impact. However, it may
be an essential enabler for future projects that subsequently deliver a massive return on
investment. This challenge of understanding indirect and enabling capability versus direct
impact is significant in modern management, as we tend to focus on near-term and direct
business benefit.
Right now, organisations are collecting vast quantities of data and the potential of AI to
exploit this data is enticing, seductive and exciting. So, at a strategic level, we hear business
leaders talking about the potential of AI to transform their operations. But at a detail level,
working out exactly where the value is and how to enable it is not always that simple.
Public sector organisations face the same challenge of trying to figure out where to
invest. Few government departments have the resources, or at least the commitment, to
invest everywhere! Most are less ambitious and want a more focused strategy that delivers
a more specific return in a defined time period.
Knowing where to invest requires an understanding of your business processes, either
existing or proposed, together with the accuracy required of any AI within those processes.
This can only be achieved if you have a measurement strategy for business processes and
the ability to measure the impact of AI on the effectiveness of those processes. Sometimes
this will mean identifying and measuring distant upstream processes. The next chapter of
this book will look in detail at Accuracy and Measurability.
In textbook exercises, the ability to forecast return on investment is relatively simple. In
the real world, any forecasts are invariably based on a catalogue of assumptions and judge-
ment calls. When it comes to AI, any forecast needs to take into account a range of addi-
tional factors beyond the financial return. AI offers huge potential benefits that can only be
realised if the technology is trusted and applied in an ethical manner. It is also critical that
consideration is given to the impact of AI on the key stakeholders, especially those whose
assistance is needed in order to develop the capability.
Imagine a health application that fuses together data from multiple sources to allow the
identification of new treatments and best practice. Imagine if the application improved
the quality of life for tens of thousands of people and, at the same time, reduced the cost
of their care such that there was a measurable financial return to the treasury. Such an
application would clearly offer massive value to society … but what if, for the application to
work, the privacy of individuals needed to be compromised?
What if, for the application to work, we need to collect data about peoples’ lifestyles? What
if the application needs to track exactly how much alcohol an individual consumes, how
much exercise they actually take and how many calories they consume? What if that data
will be held indefinitely? What if a future government decides that they will take into account
that lifestyle data in determining entitlement to medical care and social welfare benefits?
The point here is simple! In determining value, we need to take into account many fac-
tors beyond the direct operational or financial return. Often these knock-on effects create
issues that will negate any immediate benefit if not properly considered and managed.
Having considered our Stakeholders’ view of the project in terms of measurability and
understandability, it’s time to consider the topic of AI Ethics from a business point of view
in more detail.
the conditions were safe and it wasn’t above 40 mph, she might escape a conviction of
her own. The corollary to this, in a high-consequence AI, is that you may need to store your
training data for future legal cross-examination!
Thirdly, like human beings, AI Applications may be unfairly influenced by their experi-
ence. If the training data contains bias, much as a rookie cop may have been badly taught
to apply all the rules rigidly, then they may show bias in their future decisions. Time spent
analysing your training data to spot and fix bias, e.g. making sure any gender or race is
not under- or over-represented, is time well spent.
Fourthly, if a decision is questioned by others, then AI Applications will be expected
to explain their decision-making. ML models that are used to classify images can be
exhaustively tested, using partial or faint images, to reveal the boundary features between
classifications. This can help a lot when explaining their decisions; NLP-based AIs do not
have this same luxury. In practice, many useful AI methods are poor at explaining their
decision-making. AI solutions that need safe decisions, such as driverless cars, are often
built to incorporate multiple overlapping decision models (using different techniques) and
make the actual decision by triangulation (all models agree equals good; models disagree
equals escalate to a human), as sketched below. Overlapping model systems cost more to
build and maintain, but they do offer better safety/surety for AI decisions and can help
pinpoint the logic used by the AI. In addition, the cases where the overlapping models
disagree can be used to find potential faults and troubleshoot areas of the AI solution that
may need model improvement or extra training.
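A minimal sketch of that triangulation pattern might look as follows; the three "models" here are trivial stand-ins for the genuinely different techniques a real safety-critical system would combine.

```python
def triangulate(models, observation):
    """Unanimous agreement is acted on; anything else goes to a human."""
    votes = [m(observation) for m in models]
    if len(set(votes)) == 1:
        return votes[0], "auto"            # all models agree: act on the decision
    # Disagreements are escalated, and also worth logging: they pinpoint
    # inputs where one of the models may need improvement or extra training.
    return None, "escalate_to_human"

# Trivial stand-in models; a real system would use genuinely different techniques.
models = [lambda obs: "stop", lambda obs: "stop", lambda obs: "go"]
print(triangulate(models, {"siren": True, "speed": 30}))   # (None, 'escalate_to_human')
```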
Autonomous Judgements by AI
Perhaps the most significant difference is in situations where we expect AI Applications
to make decisions that normally would require human judgement. What do we mean by
judgement and how does it differ from normal IT decision-making? Typically, when we
think of decisions in business applications (e.g. whether a person qualifies for payment of
insurance claims), we consider various variables and the relevant criteria and decide in a
deterministic fashion. This is different from cases when we need human judgement.
Think of a robocop of the future (our archetype for a role that needs judgement) who
stops a speeding human driver on the road. The creators of the robocop have used a ML
system to ‘decide’ if the driver should get a ticket or not, based on historical data. How
hard can it be? The key observation here is that historically any two human police officers
looking at the same speeding situation may decide differently, based on their individual
judgements guided by contextual data (bad weather, emergency visit to the hospital,
pizza delivery, etc.). Our AI designers could not have captured every aspect of every
speeding incident. In reality, all they could expect to have for training data is driver
speed versus speed limit, with no records of when police officers ignored the speeders
because they knew it would be a marginal call. Many factors outside the basic informa-
tion (i.e. posted speed limit and speed of the driver) influence the judgement. In short,
our robocop will not be capable of contextual judgement, because the relevant informa-
tion often involves considerations and societal understanding beyond what is contained
in the training data. No doubt, the decisions of our robocop will be scrutinised more
than any other IT system, or indeed humans, for proper social behaviour. In addition,
the inherent biases in the training data are adopted by the AI and hence need careful
evaluation. AI builders have to be sensitive to the underlying ethical questions and use
the technology appropriately. Please visit the Decision vs. Judgement vignette for more
discussion.
Data Concerns
AI needs training data for building models (more on this in Chapter 7). This immediately
brings concerns about data privacy, governance and inherent bias.
Explainability
As we discussed in Chapter 3, ML models learn complex relationships between inputs and
outputs. From the perspective of the users of the models, they are opaque (i.e. black box).
Consequently, it becomes critical to explain to the users how the model derives its output
from the inputs and provide transparency to the modelling process.
Making Mistakes
AI algorithms can use statistical models which inherently make some percentage of mis-
takes, even if that percentage can be designed to be small. This raises two practical questions
immediately: (i) Who decides what level of mistakes is acceptable? (ii) Who is accountable for the
mistakes when they happen?
Misuse
Just like any other technology, AI is susceptible to misuse by individuals, organisations or
governments. The sheer availability of the technology and its increasing number of appli-
cations also make it inevitable that there will be large negative impacts.
Social Change
In contrast to other technology revolutions in history, AI is evolving at a very fast rate. This
can result in a rapid transformation in the job markets without the necessary period for
social adjustment in public education and training.
The discussion above shows the importance of proper ethics in building AI systems, and
in the following sections, we discuss the key factors to do just that.
DELIVERING TRUSTWORTHY AI
Of the topics mentioned above, technology misuse and social change can only be addressed
by careful human societal judgement. There are many technology examples (e.g. social
media) where misuse resulted in significant negative social impact (e.g. fake news). We
cannot blame AI for that. In the following sections, we will discuss three topics that are
ethical consequences of ML, viz., the need for Fairness, Explainability and Transparency
in AI systems. We will also discuss two other topics that expose the weakness of the ML
technology: its propensity to make mistakes and its vulnerability to adversarial attacks.
BIAS IN DEVELOPMENT OF AI
You have just walked into a shop and are reaching to get your credit card out when you
realise it's gone. You search your coat, bag and pockets, finally coming to the conclusion you've
been pickpocketed. You contact the bank, cancel the cards and then try to find out if you
can claim any of the lost cash back through your insurance.
Luckily for you the insurance company has developed a chatbot to answer standard
customer queries using natural language processing. You open the dialogue box to speak
with the chatbot and ask “I’ve lost my wallet, how do I claim for its contents?” to which
the chatbot directs you to a link in order to begin processing a new claim. Fantastic. Easy.
Job done.
Let’s try again. Same scenario, same problem, same question “Hey Chatbot, I’ve lost
my purse, how do I claim for its contents?” to which the chatbot responds “I’m not sure
how to deal with your request. Please see FAQs for lost or stolen items”. Why? We asked
the same question?
Here is a clear example of the subtle nuances between male and female entity values.
The use of "wallet" and "purse" elicited different responses because the chatbot
was unable to recognise "purse": this variable was unknowingly overlooked in the
conversational development process.
The example above is not unusual; bias towards males and majority groups is per-
vasive throughout numerous applications of AI, from Amazon's sexist hiring algorithm
[1], to Google disproportionately displaying high-paying job ads to men [2] and fintechs
assigning extra mortgage interest to borrowers who are members of protected classes [3].
But these systems were not ‘born biased’. Bias, by nature, is a human construct. We
use cognitive biases to make mental shortcuts (heuristics). Since there are over 180
different types of bias defined and classified by the Cognitive Bias Codex [4], how can
we as humans, developing systems to 'mimic the human brain', not expect some of our
biases to become embedded within these systems unless we put significant governance
measures in place? In the example above, it was clear that the chatbot development team
was mostly, if not entirely, male, lacking diverse and representative variables from wider
demographics. Therefore, a subtle nuance like purse and wallet is easily missed.
As a community, we need to make an active effort to ensure diversity in our devel-
opment teams and, thus, diversity in our thinking. Why? So, we don’t unintentionally
exclude vast amounts of the population from having access to our AI systems. If we con-
tinue on this trajectory, it then begs a great question posed by IBM Chief Watson Scientist
Grady Booch: “whose values are we using?” since the “AI community at large has a self-
selecting bias simply because the people who are building such systems are still largely
white, young and male” [5].
Here is a reference to a maturity framework based on ethical principles and best prac-
tices, which can be used to evaluate an organisation’s capability to govern bias [6].
Daphne Coates
Daphne Coates is a Senior Intelligent Connected Operations Consultant at IBM. She is
the UK & Ireland AI Ethics Community Lead driving new research, Intellectual Property
and offerings whilst also becoming the UK & Ireland Subject Matter Expert for fair and
explainable AI through delivering IBM’s first UKI Explainable AI project for public sector.
REFERENCES
1. Business Insider, “Why it’s totally unsurprising that Amazon’s recruitment AI was biased
against women,” https://www.businessinsider.com/amazon-ai-biased-against-
women-no-surprise-sandra-wachter-2018-10.
2. The Guardian, “Women less likely to be shown ads for high-paid jobs on
Google, study shows,” https://www.theguardian.com/technology/2015/jul/08/
women-less-likely-ads-high-paid-jobs-google-study.
3. Harvard Business Review, “AI can make bank loans more fair,” https://hbr.
org/2020/11/ai-can-make-bank-loans-more-fair.
4. J. Manoogian and B. Benson, "Cognitive bias codex," (2017). https://betterhumans.
coach.me/cognitive-bias-cheat-sheet-55a472476b18
5. IBM, “Building trust in AI,” https://www.ibm.com/watson/advantage-reports/
future-of-artificial-intelligence/building-trust-in-ai.html.
6. D. L. Coates and A. Martin, “An instrument to evaluate the maturity of bias gov-
ernance capability in artificial intelligence projects,” IBM Journal of Research and
Development, 63(4/5), pp. 7:1–7:15, (1 July-Sept. 2019).
In the media, the term bias is now being used in the context of the very serious, and
more sinister, concern that AI systems will actually exhibit bias. This is a very legitimate
concern and we need to ensure that AI is a part of the solution to historical unfairness.
As with other aspects of AI, bias is one of those subjects where it’s important to dig through
the hype and the headlines in order to understand the reality.
In this section, we discuss the business value of building fair AI systems. We intro-
duce some famous bias examples, relevant definitions and discuss the various factors
that can contribute to bias in AI and how to anticipate these during the various stages
of a project.
• Microsoft's Tay chatbot was taught racist language by its social media users. This is
a case where the AI designers did not anticipate the type of data provided by the actual
users of the application [6].
• Amazon stopped the use of an AI recruiting tool after the system taught itself that male
candidates were preferable and penalised women candidates [7].
• Google searches involving black-sounding names were more likely to serve up ads
suggestive of a criminal record than white-sounding names [8].
Some Definitions
Let us imagine building an AI to recommend loan approvals in a bank. Based on a set of attri-
butes about each applicant (e.g. annual income, FICO score, prevailing debts, assets owned,
etc.), the bank decides to approve or reject the loan application. Fairness is the desired behav-
iour dictated by an underlying standard that can be statistical, social, moral, etc. A protected
attribute (such as gender, race, etc.) is used to separate the population into groups for fair-
ness assessment. In our example, the 'fairness' criterion can be that the percentage of the loans
approved by the bank should be the same for males and females. Bias represents the deviation
from the standard. If only 40% of the females are approved, whereas 70% of the males are
approved, then we potentially have a bias concern. Here we are evaluating the fairness at the
group level (i.e. gender). Group fairness expects that groups defined by the protected attri-
butes receive similar outcomes. A fairness metric quantifies the unwanted bias in the specific
AI application such as the approval percentage above. There are many dozens of metrics for
evaluating fairness [13]. A bias mitigation algorithm is a procedure for reducing unwanted
bias in the application. Depending on how much control the enterprise has over the applica-
tion development, bias mitigation can happen in the preprocessing stage (i.e. by modifying the
training data), in-processing stage (i.e. by changing the learning algorithm) or the postpro-
cessing stage (i.e. by detecting the bias during deployment and changing the predictions) [13].
• Training Data Bias: Since the statistical AI models learn from training data, if the
training data is biased, not surprisingly the model will also be biased. In our loan
approval example, if the data from the last 10 years contains implicit bias in favour of
males, which may not even be known to the bank, AI will learn that. Now that we are
looking for bias in the AI, we cannot escape from the biases in the past practices. It
is like looking at ourselves in the mirror and realising we have been biased all along.
Once we are aware, this problem can be addressed systematically.
• Biased Algorithm: Different groups are likely to have different distributions over the
attributes, and those attributes have different relationships to the decision we are try-
ing to predict. Since most algorithms are designed to minimise average error, and there
is more data corresponding to the majority group (by definition), the algorithm gives
more weight to the majority group. This effect can be partially addressed by careful
data gathering. Another reason for a biased algorithm may be the deliberate choice of
a simpler model due to the limited data available, while the actual problem may need
a more sophisticated model.
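One simple preprocessing-style counterweight to this "average error favours the majority" effect is to weight each training record by the inverse frequency of its group, so that both groups contribute equally to the loss. The sketch below shows only the mechanics, using scikit-learn on invented data; real bias mitigation needs much more care (see [13]).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))                                   # invented features
group = rng.choice(["majority", "minority"], size=1000, p=[0.9, 0.1])
y = (X[:, 0] > 0).astype(int)                                    # invented outcome

# Weight each record by the inverse frequency of its group, so the
# minority group carries as much total weight as the majority group.
counts = {g: int((group == g).sum()) for g in np.unique(group)}
sample_weight = np.array([len(group) / counts[g] for g in group])

model = LogisticRegression()
model.fit(X, y, sample_weight=sample_weight)
```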
To understand how bias in data can occur, and more importantly what to do about it, let’s
consider three different examples.
Recidivism Prediction
Our first scenario is based on the real-world recidivism incident mentioned above [11].
In this scenario, data was collected regarding decisions taken by human operatives
over a period of many years. These historical decisions are used to train an AI system.
Unfortunately, because the decisions have been made by ordinary people, they include
the types of prejudice that humans have, sadly, demonstrated throughout history. This
prejudice is learnt by an ML-based AI and, consequently, the AI exhibits the historical
prejudice. In this scenario, it is imperative that the data is cleaned up! The prejudice
embedded in the historical training data must be identified and removed. There is no
place for it. This goes beyond protecting obvious fields such as "gender" from being used
in the model; you need to consider other data markers that might identify a candidate's
ethnicity or gender, such as “zip code”, “preferred language” or a Social Security Number
that is not consistent with the date of birth, possibly indicating an immigrant status.
EXPLAINABILITY
Until the recent popularity of ML, we did not have a reason to invoke the word ‘Explainability’
in a real conversation or in an article. So, what is this all about? As we have explained in previ-
ous chapters, the problem with statistical ML systems (e.g. neural networks) is that they map
inputs to expected outputs using complex algorithms, sometimes with millions of param-
eters, with no explicit form that you can see. As a result, you do not know why the model gives
you the output it does, because there is no real explanation coming with the output [17,18].
Let us take the simplest of examples, where a social media tool recommends a friend or a
connection for you. It is common for the tool to identify the common friends you and the
recommended person have, in order to justify the suggestion. That was an easy one. Every
now and then, you scratch your head: "Why this person?" The same goes for movie
selections; since you like science fiction movies, here is another science fiction recommen-
dation for you! These are easy recommendations that you could accept or ignore.
i. Engineers who build, deploy and support the system need to know the technical
details of how it works for the purpose of debugging and identifying improvements.
ii. Operators of the AI system (i.e. Professional Decision Makers) will need explanations
in the domain of relevance (medicine, law, etc.).
iii. Government agencies and Regulatory bodies that monitor or audit the relevant busi-
ness processes want to make sure that proper guidelines on data privacy, fairness,
etc., are followed. The explanation has to be in the context of the relevant regulations.
iv. End consumers need to hear explanations that make sense to their situations and help
with exploring options in the outcomes. Clearly, they will have the lowest threshold
for complexity and domain information.
Ultimately, this situation could have been avoided if OFQUAL had asked the right
question. Instead of ‘Can we build a model to assign grades’, the question should have
been ‘Should we’. And given the situation, the available data and the consequences, the
answer should have been an emphatic ‘No’.
Michael Nicholson
Michael Nicholson is IBM's Chief Data Scientist for Energy, Environment and Utilities. In
this role he helps clients drive improved performance from their data, be that through
energy efficiency, leakage reduction, pollution prevention or otherwise. He holds a
First Class degree in Physics from Imperial College, and before focussing on water and
utilities he developed real-time analytics applications for England Rugby Union.
REFERENCES
1. S. Coughlan, “A-levels and GCSEs: Boris Johnson blames ‘mutant algorithm’ for exam fiasco,”
https://www.bbc.co.uk/news/education-53923279.
2. T. S. F. Haines, “A-levels: the model is not the student,” http://thaines.com/post/alevels2020.
3. Financial Times, “Were this year’s A-level results fair?” https://www.ft.com/
content/425a4112-a14c-4b88-8e0c-e049b4b6e099.
4. S. Hubble and P. Bolton, “A level results in England and the impact on university admissions
in 2020–21,” UK House of Commons Briefing Paper 8989, (September 2, 2020).
In each case, acceptable explainability is dictated by the audience, and a single application
may need to explain itself multiple times in different ways.
The approach [17–19] to provide the appropriate explanations depends on the specific
context. Some relevant considerations are:
• One-shot static output from the system or an interactive explanation with the user.
• Explanation of a single output instance or a global behaviour of the model.
• Explanation is from the actual model or a surrogate model that tries to mimic the
actual model.
• Explanation is based on specific samples or features (i.e. attributes) or data
distributions.
There are examples of open-source packages [20] for implementing explainability in business applications.
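One of the options listed above is a surrogate model. As a minimal sketch (in Python with scikit-learn; the synthetic data and model choices are illustrative assumptions, not a recipe from the cited toolkits), an interpretable decision tree can be trained to mimic an opaque model:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Synthetic stand-in for an opaque production model.
    X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
    black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    # The surrogate is trained on the black box's *predictions*, not the
    # original labels, so it approximates the model rather than the task.
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
    surrogate.fit(X, black_box.predict(X))

    # "Fidelity": how often the surrogate agrees with the black box.
    fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
    print(f"surrogate fidelity: {fidelity:.0%}")
    print(export_text(surrogate))  # a human-readable, global explanation

The fidelity score indicates how faithfully the simple tree mimics the opaque model; if fidelity is low, the printed rules should not be trusted as an explanation.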
I made the decision to pull over to the side of the road because the noise classifier
detected a siren approaching from behind, the vision system identified an emergency
vehicle approaching from behind, the road was straight and it was a safe place to
stop without causing an obstruction.
Within this explanation, there are “classifications” that may have come from an AI subsystem that cannot fully explain its reasoning. For example, the vision system may not be able to explain how it identified an emergency vehicle approaching. Whilst we may feel that is unacceptable, it is important to accept that there are many situations when we as human beings cannot explain our own image or sound recognition. However, even when the classifier cannot explain its reasoning, it should be possible to design an ML system so that it retrieves the training data that influenced its classification decisions. The vision system may not be able to explain its classification, but it should be able to say, “here are the closest matching images in my training data”.
It may one day be a legal requirement that all critical system AIs retain their original
training data, so that a forensic examination is possible should the AI go wrong.
So, whilst certain algorithms cannot explain their decision-making, it doesn’t mean an overall system cannot be engineered to be explainable. ML systems should be able to support their decisions by showing the training evidence that most closely resembles their decision. In any system, we should be able to trace the functionality of the AI and reproduce decisions in a test environment. We must also ensure that the AI has been developed using best practices and in a manner that would be deemed competent in any investigation. Make sure you allocate resources in your AI project plan/business case to cover these system requirements.
TRANSPARENCY
Transparency, in the development and operation of AI systems, is intrinsically linked to
trust! We can’t build trust without transparency.
If we are expecting AI applications to make important decisions, then the Stakeholders of
such systems are going to want to understand how the AI was developed and how it works.
This means being clear on which algorithms are used in the application and what data was
used in the development (either directly as training data or as use case and test data).
Once again, this is an area where AI needs to build on and extend the best practice of conventional complex systems. As with all software projects, it is imperative to ensure proper configuration control. It is also important to ensure the software implementation is properly tested. However, in cases where ML is applied, the data aspect of AI is again very significant. In developing an application, we need to be able to demonstrate that the training data was correctly managed and that the system was trained and tested in a competent manner.
This includes traceability of the development process. For example, when an AI application makes a decision, we must be able to trace the configuration of the AI, including
understanding which data was used for training and how the training process was configured. This may sound obvious; however, be careful! In conventional software development,
we may have released a new version of the code every few weeks. With ML, we could be
developing and deploying new AI Models on a daily basis.
The concept of FactSheets [21] is an appealing idea, much like a food label, to collect the
various pieces of information related to the AI development. This information can then be
used to satisfy the needs of the various stakeholders. There are ways [22] to automate the
capture of most of the information from the development artefacts to make it less painful.
One particular area to think about is the impact of privacy regulations on traceability. If our ML is trained using real data, there is the possibility that we could be asked to delete some of the data records. An individual could, for example, make a request under GDPR for their personal records to be deleted. If we delete data records from the training data, then hypothetically, we may also be required to delete or retrain the AI Model associated with those redacted records. Once the data has been deleted, we may be unable to reproduce the behaviour of the system if subsequently challenged to validate, or explain, a historical decision. This is currently a topic of debate.
Making Mistakes
As described in Chapter 3, the underlying concept behind most of the current ML techniques is called ‘Statistical Learning’ [23]. Just as it sounds, the model is not created by explicit programming, but ‘learnt’ from the training data. While it is very powerful in finding patterns in data not easily found by humans, it suffers from the known vagaries of data. The first observation is that, due to the statistical nature of the data, the mapping from the inputs to the output may not be unique and may have some scatter. The other is the complexity of the statistical models. The more complex the model, the more data you need. Simpler models fail to explain the detailed differences in the data. Complex models are so optimised for the training data that they have trouble ‘generalising’ to new data previously not ‘seen’ by the model. So, any model that does well on a broad set of data will inherently contain some errors. A further complication comes from the observation that an ML model can often fail on specific instances, despite being very confident in its prediction [24]. This leads to the need for quantifying uncertainties in the model prediction in addition to the model output, so that the user can use the information accordingly [25].
The question is how to deal with these errors, however small they may be, in business/mission-critical systems. While 95% accuracy is amazing for most tasks of no serious consequence, a 5% error rate in consequential systems can lead to significant loss of life or value. This is the conundrum we are in. The complicating factor is that in many of these situations, we do not know the right answer. Think of the use of AI in a doctor’s office, where the AI recommends treatment option A for a patient, when the doctor prefers option B. If the patient got option B and did not recover, what can we say? Maybe option A would have been better; we do not know!
The practical business consequence of these inevitable errors is the need to decide what
percentage of errors is tolerable for the specific business application and how to manage the
consequence of errors when they do happen.
• Whether they become ‘trusted’ or not is left to the assessment of the users of the applications.
• ‘Trust’ has elements beyond direct revenue to the enterprise.
• It is up to the enterprise to make the best choice on where to invest in the AI technol-
ogy for long-term business value and social benefit.
REFERENCES
1. Action This Day, International Churchill Society, https://winstonchurchill.org/the-life-of-
churchill/life/man-of-action/action-this-day-27/.
2. S. Russell, Human Compatible: Artificial Intelligence and the Problem of Control, Penguin
(2020).
3. G. Marcus and E. Davis, Rebooting AI: Building Artificial Intelligence We Can Trust, Pantheon
(2019).
4. M. Liao, Ethics of Artificial Intelligence, Oxford University Press (2020).
5. M. Coeckelbergh, AI Ethics, MIT Press (2020).
6. G. Neff and P. Nagy, “Talking to bots: symbiotic agency and the case of Tay,” International
Journal of Communication, 10, pp. 4915–4931 (2016).
7. J. Dastin, “Amazon scraps secret AI recruiting tool that showed bias against women” (October
2018) https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-
scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G.
8. L. Sweeney, “Discrimination in online ad delivery,” (January 28, 2013). Available at SSRN:
https://ssrn.com/abstract=2208240 or http://dx.doi.org/10.2139/ssrn.2208240.
9. O. Papakyriakopoulos, et al., “Bias in word embeddings,” Proceedings of the 2020 Conference
on Fairness, Accountability, and Transparency, FAT*’20, pp. 446–457 (2020).
10. M. Zhang, “Google photos tags two African-Americans as gorillas through facial recogni-
tion software,” https://www.forbes.com/sites/mzhang/2015/07/01/google-photos-tags-two-
african-americans-as-gorillas-through-facial-recognition-software/.
11. J. Angwin, et al., “Machine bias: there’s software used across the country to predict future
criminals. And it’s biased against blacks,” ProPublica, (23 May 2016). www.propublica.org/
article/machine-bias-risk-assessments-in-criminal-sentencing.
12. J. Buolamwini and T. Gebru, “Gender shades: intersectional accuracy disparities in commer-
cial gender classification,” Proceedings of the 1st Conference on Fairness, Accountability and
Transparency, PMLR 81:77–91 (2018).
13. R. K. E. Bellamy et al., “AI Fairness 360: an extensible toolkit for detecting and mitigating
algorithmic bias,” IBM Journal of Research and Development, 63(4/5) (2019).
14. D. Danks and A. J. London, “Algorithmic bias in autonomous systems,” Proceedings of the
26th International Joint Conference on Artificial Intelligence (IJCAI-17).
15. A. Chouldechova and A. Roth, “The frontiers of fairness in machine learning,” arXiv:1810.08810.
16. P. Jenkins, “Why gender-based car insurance is preferable to a black box spy,” https://www.
ft.com/content/0e54a5da-8148-11e8-bc55-50daf11b720d (July 9, 2018).
17. M. Hind, “Explaining explainable AI,” XRDS: Crossroads, The ACM Magazine for Students,
25(3), pp. 16–19 (2019).
18. R. Guidotti, et al., “A survey of methods for explaining black box models,” ACM Computing
Surveys, 51, pp. 1–42 (2018). Article no. 93.
19. V. Arya et al., “AI explainability 360: an extensible toolkit for understanding data and machine
learning models,” Journal of Machine Learning Research, 21 (2020).
20. AI 360 Explainability Toolkit: https://aix360.mybluemix.net/.
21. M. Arnold, et al., “FactSheets: increasing trust in AI services through supplier’s declarations
of conformity,” IBM Journal of Research and Development, 63(4/5) (2019).
22. AI Lifecycle Governance, https://aifs360.mybluemix.net/governance.
23. T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer (2017).
24. E. Hüllermeier and W. Waegeman, “Aleatoric and epistemic uncertainty in machine learning:
an introduction to concepts and methods,” Machine Learning, 110, pp. 457–506 (2021).
25. U. Bhatt, et al., “Uncertainty as a form of transparency: measuring, communicating, and
using uncertainty,” Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society,
AIES’21, pp. 401–413 (2021).
26. H. Xu, et al. “Adversarial attacks and defenses in images, graphs and text: a review,” The
International Journal of Automation and Computing, 17, pp. 151–178 (2020).
27. K. Eykholt, et al. “Robust physical-world attacks on deep learning visual classification,”
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.
1625–1634 (2018).
28. I. J. Goodfellow, J. Shlens and C. Szegedy, “Explaining and harnessing adversarial examples,”
arXiv preprint arXiv:1412.6572 (2014).
29. Adversarial Robustness Toolkit, http://art360.mybluemix.net/.
Chapter 6
Ensuring It Works –
How Do You Know?
If one of your children arrives home with a report card stating that he or she has
achieved 95% in their mathematics exam, you should be justifiably proud.
Our perception of accuracy and what it means is perhaps heavily influenced by our
childhood experience of tests and exams. When we hear that an AI has achieved an
accuracy of 95%, our immediate reaction is that the performance is excellent.
Now, let’s consider a different perspective … imagine you board a flight and the Captain
announces, “Don’t worry folks, I only crash one in twenty!” Will you still stay on the
plane? That is 95% accuracy, but it doesn’t seem so good now.
The concept of accuracy is something that all those involved in artificial intelligence (AI)
projects need to understand. It may at first appear relatively simple. However, as our simple
example above demonstrates, our perception and understanding of accuracy may need to
be challenged if we are to successfully deliver an AI application. Furthermore, accuracy
is just one of the metrics that should be considered in measuring the quality of an AI
application. In this chapter, we will look at how the performance of an AI application can
be defined and measured. We start with a short description of how application quality is
managed in traditional software development and compare it to what is needed for AI
applications using Machine Learning (ML).
IS THE AI WORKING?
If we are to trust AI, we need to be sure that it’s working!
In conventional systems, we do that through extensive testing that covers relevant input scenarios and key edge cases. We ensure the system is developed “to specification”. However, in AI systems, we are expecting the AI to make decisions much more akin to those of a human specialist.
In conventional systems, we ask one simple question …
We answer that question by testing, testing and testing again! We perform white box
test, black box test, factory acceptance test, user acceptance test, site acceptance test,
security test and performance test (to name just a few)!
What about AI applications … they’re just software, so surely we just test them in the same way? Well, unfortunately, it’s not quite that simple. As mentioned previously, AI applications are expected to perform tasks normally requiring human intelligence. As such, they are often expected to exercise judgement and make decisions in circumstances that are not always anticipated. We don’t expect human beings to behave like robots … we just want our robots to behave like human beings.
When it comes to assessing the performance of a human decision maker, we use a
much broader set of criteria. With AI applications, we will need to do the same. We
will also need to ask …
… could the application realistically be expected to cope with the situation it was
presented with?
… was the application deployed with appropriate governance and the right safeguards to ensure it was making good decisions?
So, can we test AI systems in the way we used to test conventional systems or should we
think about AI in a different way and ask questions such as …
… were the correct development processes used?
… could the developers have anticipated the circumstances in which the AI was
applied?
is put into production. Once deployed, there is a defined process to collect incident reports
during operations and provide a mechanism for support teams to diagnose and resolve
them expediently. This may involve code fixes by the original engineers who created the
relevant code. There are two key points to remember here:
We spend the next few sections explaining accuracy in its various flavours. Readers with prior experience of statistical accuracy may wish to skip over the next few sections. For those who haven’t had the pleasure, we aim to provide a basic introduction to the art!
STATISTICAL ACCURACY
For a start, it is very rare that the effectiveness of an AI application can actually be reduced to a single number. Understanding what accuracy is, how to measure it and how to define what is required are essential aspects of any AI programme.
So, what do we mean by accuracy? Based on the discussion above, there are two common types of AI predictions: (i) numerical prediction by regression and (ii) prediction by classification. The metrics for estimating accuracy are different in these two cases.
• Mean Square Error (MSE) is the error in the model, captured as the sum of the squares of the differences between the predictions and the actual values in the training data, divided by the number of predictions. The goal of the modelling process is to find specific values of the model parameters that minimise this error. Obviously, a smaller MSE is better.
• A related metric is Root Mean Square Error (RMSE), which is just the square root of the MSE. A simple interpretation of the RMSE for the linear regression model is that the prediction of the model is within ±17.839 kg of the actual value for 68% of the data, when the error is normally distributed.
• R² (R-Square) represents the percentage of variation in the output that is explained by the input variable(s). It ranges between 0 and 1. In our example, R² = 0.64 means that 64% of the variation in the weight is explained by the height. During the model creation activity, this metric also helps with feature engineering, by identifying input variables (i.e. features) that do not contribute much to the quality of the model and hence can be removed from the analysis.
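As a minimal illustration of these three metrics, the following Python sketch computes them for a handful of made-up height/weight predictions (not the CDC data used in the vignette):

    import numpy as np
    from sklearn.metrics import mean_squared_error, r2_score

    # Made-up actual weights (kg) and model predictions for five people.
    actual    = np.array([52.0, 61.5, 70.2, 80.1, 95.3])
    predicted = np.array([55.0, 60.0, 72.5, 78.0, 90.0])

    mse  = mean_squared_error(actual, predicted)
    rmse = np.sqrt(mse)
    r2   = r2_score(actual, predicted)
    print(f"MSE = {mse:.2f}, RMSE = {rmse:.2f} kg, R2 = {r2:.2f}")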
The regression metrics are very close for the two models, and the Regression Tree model appears to be only slightly better than the Linear Regression model. But when you look at the predicted values in the figures (see the accompanying vignette), you can see that for lower heights (i.e. for children), the predictions from the Linear Regression model are basically useless (including negative weights!). While Linear Regression is simpler to understand, the Regression Tree model makes better predictions across the entire population and is therefore the better model of the two. This is just an example of the assessment needed before any model is put into practice.
Now we move on to discuss the accuracy metrics in classification problems.
TABLE 6.1 Regression Metrics for the Two Machine Learning Models from the Vignette, “Predicting Weight from Height”
Model                MSE       RMSE      R²
Linear regression    318.24    17.839    0.63
Regression tree      311.29    17.643    0.64
FIGURE A: A Linear Regression Model finds a straight line that best fits the training data and uses
that to predict. The blue points are the training data and the orange line represents the predictions
for the hold-out data set.
FIGURE B: A Regression Tree Model [2] partitions the input data into bins (called leaves) to minimise the error and estimates the output values based on the bin averages. The blue points are the training data, the orange points are the predictions for the hold-out data set.
The relevant metrics and relative performances of the two models are discussed in
Chapter 6.
REFERENCES
1. Body Measures Data, Center for Disease Control, https://wwwn.cdc.gov/Nchs/
Nhanes/2007-2008/BMX_E.XPT.
2. L. Breiman, et al., Classification and Regression Trees Chapman and Hall/CRC,
1st Edn (1984).
Prediction by Classification
Let’s start with a simple binary classification example where the AI has just two choices.
Many of us now use biometrics to logon to our laptops and phones. When you present your
thumb to your device, the classifier can either correctly identify you or fail to do so. What if someone else picks up your device and tries to access it? Does the classifier correctly identify the attempted intrusion and block access? So, we have two separate metrics to consider: the number of times a person who should have access is granted access, and the number of times a person who should not have access is blocked. In this case, we can have a simple definition of accuracy as the percentage of times the classifier correctly identifies the user as valid or not, out of the total attempts.
    Accuracy = Correct Responses / Total Attempts
Obviously, we need to consider the opposite scenarios: the number of times a person who
should have access is blocked and the number of times a person who should not have access
is allowed in.
In the world of AI, we have a wonderful terminology that describes these four statistics.
You will hear accuracy described in terms of True Positives, False Positives, True Negatives and False Negatives. This basic four-box model covers all possible scenarios when considering the accuracy of a simple, binary classifier. What does all of this really mean? Well, quite simply, it means that even a simple binary decision requires four separate metrics to fully describe its accuracy. In practical terms, most applications can probably just rely on a single metric to describe performance. For example, if we were writing a review of commodity biometric classifiers for a consumer magazine, we could simply use Accuracy as defined above. The Correct Response is the sum of True Positives and True Negatives. So, the ratio of the Correct Response to the Total Attempts is a single simple measure of the product effectiveness.
However, if we are formally evaluating a classifier for a more serious application (e.g. a biometric security system for a prison), then we need to conduct a more thorough evaluation that looks at all four metrics. We should also be very clear on what is required in terms of performance for each metric.
Figure 6.1 explains the relevant ideas. The box illustration in Figure 6.1a is called a “confusion matrix” (ironic, huh!) in the literature, where the textual descriptions in the four boxes are for our biometric authentication application. An example of an actual confusion matrix is shown in Figure 6.1b, based on the results of a test run of 100 attempts with 83 legitimate users and 17 fraudulent users. Eighty results were true positives, seven were false positives, three were false negatives and ten were true negatives.
It would be nice if some of these metrics could be combined, and the good news is that they can! Ratios of these numbers are used to describe the model’s efficacy. The two terms you need to listen out for are Precision and Recall.
Team A: True positives = 26, False positives = 6, False negatives = 10, True negatives = 60; F-measure = 0.76
Team B: True positives = 26, False positives = 10, False negatives = 6, True negatives = 60; F-measure = 0.76
The first point to note in examining these results is that the F-Measure for both teams is
the same. However, they achieve this result in quite different ways with Team A having a
higher precision but lower recall and vice versa.
Which is best?
Now, let’s add a couple of complicating factors.
Anyone identified by the AI as having the cancer must undergo a long and expensive
medical procedure. The procedure is uncomfortable and invasive; in short, it’s not a
pleasant treatment.
What does this mean for the application of our classifiers?
If we selected Team A’s classifier, for every 100 people classified, six will unnecessarily undergo this difficult treatment. Conversely, using Team B’s classifier would result in ten people unnecessarily receiving a painful treatment. Clearly, therefore, we should go with Team A.
However, what if the cancer is terminal?
Team A’s classifier misses ten people who require treatment and may therefore die.
The classifier developed by Team B only misses six cases that require treatment.
Now, which is best? Team B, of course.
This type of dilemma has existed throughout history and, as a society, we are accustomed to hearing cost-benefit and moral arguments about similar situations.
The key point to understand is that it’s simply not possible to reduce the assessment of two competing AI applications to a single metric. In this example, both applications achieved the same F-Measure but in different ways and with different implications for those affected. In real-world applications, you may need to factor in the consequence and/or liability cost of the error before you choose.
FIGURE 6.1 (a) gives the description of the confusion matrix for a binary decision of grant access or block access. (b) shows the confusion matrix filled in for the example described in the text.
Precision is the fraction of the decisions by the AI to grant access that are correct. It is defined in general as:

    Precision = True Positives / (True Positives + False Positives)
Recall is the fraction of the legitimate users that are correctly identified by the AI. Recall is defined in general as:

    Recall = True Positives / (True Positives + False Negatives)
These metrics can in turn be combined into a single metric. One particular metric, referred to as the F-Score (or the F1 score, or just plain F), is the harmonic mean of Precision and Recall and is defined mathematically as:

    F = 2 × (Precision × Recall) / (Precision + Recall)
A perfect model will have all three metrics (Precision, Recall and F-Score) equal to 1. In our example of the algorithm granting network access to people, Accuracy = 0.90; Precision = 80/(80 + 7) = 0.92; Recall = 80/(80 + 3) = 0.96 and F = 0.94. That is very good.
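A minimal sketch in Python that reproduces the numbers just quoted from the confusion matrix in Figure 6.1b:

    # TP=80, FP=7, FN=3, TN=10, as in Figure 6.1b.
    tp, fp, fn, tn = 80, 7, 3, 10

    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f_score   = 2 * precision * recall / (precision + recall)

    print(f"Accuracy = {accuracy:.2f}, Precision = {precision:.2f}, "
          f"Recall = {recall:.2f}, F = {f_score:.2f}")
    # -> Accuracy = 0.90, Precision = 0.92, Recall = 0.96, F = 0.94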
There are other metrics available [5] to describe the accuracy of AI models in specific contexts, such as the Area Under the ROC Curve for binary classification problems and Cross-Entropy Loss (or log loss) for evaluating predicted probabilities of membership of a given prediction class.
Combining raw accuracy data together into a single metric, or a small group of metrics, is useful when performing a high-level evaluation. These metrics are useful when conducting a scientific evaluation of different algorithms or when, for example, selecting a general search tool such as an intranet search engine. However, these combined metrics should be used cautiously when looking at more sophisticated and mission-critical applications.
COST FUNCTIONS
In many enterprise applications, the cost of a mistake varies with the type of entity being classified. Consider a fraud detection solution such as the one used by your bank. The solution monitors transactions and blocks your attempt to pay for skiing lessons in the Alps because the spending pattern has deviated from your normal behaviour.
In such a solution, the cost of incorrectly blocking a transaction is very low. The worst-
case scenario is that you call the number on the card, verify the transaction and continue
your holiday. You may feel a bit irritated about the inconvenience, but you recognise and
respect the fact that your bank is being cautious.
In the same solution, the cost of missing a genuinely fraudulent transaction can be very
high. A company could lose thousands of pounds by allowing the transaction to proceed.
As you can see, the cost of a False Negative is far higher than the cost of a False Positive. It is therefore understandable that banks err on the side of caution and tune their systems to be quite sensitive; they tend to shout “fraud” at the first hint of trouble.
It’s particularly important in this scenario to look at the performance of the AI against each possible outcome and not to rely on a single, overall accuracy figure. If the bank normally experiences fraud in 5% of transactions, then an overall accuracy of 95% can be achieved just by classifying all transactions as not fraudulent. Clearly, that is a very misleading figure, and it demonstrates the need to drill down into the actual figures for each possible outcome (Figure 6.2).
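The trap is easy to demonstrate. In this minimal sketch (with synthetic labels, assuming a 5% fraud rate), a “model” that never predicts fraud scores about 95% accuracy while catching no fraud at all:

    import numpy as np
    from sklearn.metrics import accuracy_score, recall_score

    rng = np.random.default_rng(0)
    y_true = (rng.random(10_000) < 0.05).astype(int)  # 1 = fraud, ~5% of cases
    y_pred = np.zeros_like(y_true)                    # always predict 'not fraud'

    print(f"accuracy:     {accuracy_score(y_true, y_pred):.1%}")  # ~95%
    print(f"fraud recall: {recall_score(y_true, y_pred):.1%}")    # 0%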
Now let’s consider a different and far more serious scenario. Let’s imagine our system is a medical application that examines medical images to decide whether a lump is cancerous. The cost of classifying a malignant lump as benign is huge … possibly a human life! Just as with the fraud solution, the safe thing to do would be to tune the solution so that it was skewed towards False Positives.
What if, however, classifying a lump as cancerous meant that the patient had to undergo a very expensive and, more importantly, a very unpleasant and painful procedure? In such a scenario, there is an ethical responsibility to minimise unnecessary interventions and a financial incentive not to incur unnecessary cost.
FIGURE 6.2 Description of the confusion matrix for the fraudulent transaction case: a legitimate transaction allowed or a fraudulent transaction blocked are the correct outcomes; a legitimate transaction blocked is a low-cost error, while a fraudulent transaction allowed is a high-cost error.
FIGURE 6.3 Confusion matrix for the three decision example of classifying products as PASS, FIX
or FAIL. The size of the matrix gives 3 × 3 = 9 metrics.
MULTIPLE OUTCOMES
As always, enterprise applications are almost certainly going to be a little more complicated. Imagine your enterprise wishes to apply AI to perform a quality assessment on products. As products roll off the production line, they will pass through a booth where photographs are taken and then passed to an AI to determine if there are any faults. The simplest possible scenario is that the AI has to make a pass-fail decision for each item that passes through the booth.
To make matters more complex, quality assessment is rarely a simple binary decision.
Products can have different types of faults and some of them may be fixable, whilst others
may be more terminal. If we require our Classifier to make a PASS-FIX-FAIL decision, the
number of metrics we need just increased from 4 to 9 (see Figure 6.3). Ideally, we would
like the red (off-diagonal) cells in the matrix to be zeroes, implying perfect classification.
The decisions in the three red cells on the left (below the green diagonal) are wasteful since
good and fixable products are being rejected as “FAIL” or unnecessary additional work is
recommended on good products. In contrast, the three red cells on the right (above the
green diagonal) are potential sources of business risk by passing defective products for use
or unproductive ‘fix’ on defective products.
As the number of decisions (n) increases, the number of confusion matrix cells grows as n², of which n² − n are off-diagonal elements that need careful evaluation.
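As a minimal sketch of the PASS-FIX-FAIL case, the following Python snippet builds the 3 × 3 matrix from a handful of hypothetical labels; each off-diagonal cell is one of the n² − n error types just described:

    from sklearn.metrics import confusion_matrix, classification_report

    labels = ["PASS", "FIX", "FAIL"]
    y_true = ["PASS", "PASS", "FIX", "FAIL", "FIX", "PASS", "FAIL", "PASS"]
    y_pred = ["PASS", "FIX",  "FIX", "FAIL", "FAIL", "PASS", "PASS", "PASS"]

    # Rows are the actual class, columns the predicted class; the six
    # off-diagonal cells are the distinct error types.
    print(confusion_matrix(y_true, y_pred, labels=labels))
    print(classification_report(y_true, y_pred, labels=labels))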
Search/Information Retrieval
You are very familiar with doing searches on your favourite search engine looking for
something. You already know that as you type the words for your query, the search engine
can suggest potential search terms you can use. One measure of quality of the search engine
is how well it can anticipate your intended search term based on what it learnt from you
and other users like you. The second measure of quality is when you get the search results
back, whether the recommended ranked sources of information meet your need. If the top recommendation is what you wanted, that is pretty awesome. If you get the best information in the top five recommendations, you may still be happy. If you have to do a lot more work than that, you will be unhappy with the search engine.
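One simple way to quantify that intuition is precision-at-k: the fraction of the top k results that are actually relevant. A minimal sketch, with hypothetical document identifiers and relevance judgements:

    def precision_at_k(ranked_ids, relevant_ids, k=5):
        # Fraction of the first k returned documents that are relevant.
        top = ranked_ids[:k]
        return sum(1 for doc in top if doc in relevant_ids) / max(len(top), 1)

    results  = ["d7", "d2", "d9", "d4", "d1", "d3"]   # engine's ranking
    relevant = {"d2", "d4", "d5"}                     # human judgements
    print(precision_at_k(results, relevant, k=5))     # 2 of top 5 -> 0.4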
Information Extraction
Unstructured text documents are often packed full of valuable information that needs to be transformed into a structured form. This process is known as entity and relationship extraction. Extracting this information into a structured form is of huge value to downstream analytics.
Let us consider the use of NLU in law enforcement (Figure 6.4). A typical police report or witness statement may describe a whole raft of different entities. There could be references to names, addresses, vehicle registration numbers, vehicle makes, vehicle models, vehicle colours, phone numbers, email addresses, passport numbers, driving licence numbers, immigration visas, credit card numbers, cash, drugs, guns, knives, cigarettes, property of all sorts of types, organisation names, dates of death, dates of birth, dates of arrest, dates of immigration entry, dates of other events and just about anything else you can imagine!
If the police department applied NLU to their entire content, they would have an incredibly valuable resource against which they could perform data mining, time-series, predictive, geospatial and many other forms of analysis. The effectiveness of these downstream analytics will be directly related to the accuracy of the entity and relationship extraction … but what do we mean by accuracy?
As explained earlier, it is possible to use a single metric such as the F-Measure to describe the overall accuracy of the AI. However, to really understand the effectiveness of the AI, we will need to look into the detail.
FIGURE 6.4 Entity and relationship extraction applied to a sample police report. The source text reads: “PC 143 (Hunter), 15 June 2018 23:47. Suspect identified himself as John Smith. Matched description given by night club doorman (IC1, Male, Age 22–24 yrs, blue Everton shirt). Stopped while driving White Ford Mondeo, AU18JCT. Address given as 22 East Dene Ridge, Copdock, Ipswich. Searched at the scene and found in possession of 1 oz Cannabis Resin and lockable pocketknife.” The extracted entities and values are: Arresting_Officer = PC 143; Arrest_Date_Time = 15/06/2018 23:47; Suspect_Forename = John; Suspect_Surname = Smith; Suspect_VRN = AU18JCT; Suspect_Vehicle_Colour = White; Suspect_Vehicle_Make = Ford Mondeo; Suspect_Address_Street = 22 East Dene Ridge; Suspect_Address_Town = Ipswich; Evidence_1_Description = 1 oz Cannabis Resin; Evidence_2_Description = Lockable Pocketknife.
First, some entities may be of higher value than others, and we need to take that into account. Examples are the name of the suspect and the current address. For those key variables you may have to build extra tests (sanity checks). Nothing is perfect, but you can often use simple rules to isolate possible anomalies. For example, you could flag for manual checking any document that does not seem to have a suspect name; or extract the key suspect names out as a list and compare them with known criminals, staff names or the report’s author name (just in case the AI has misidentified the author of a report, or their fellow police officers, as suspects). Extracted addresses can be matched using geocoding software, which in turn will (typically) give you a clean version of the address, including zip/post code, plus a latitude and longitude for geolocation. If you can get geolocation data, you can quickly compute the distance from an expected location (e.g. your police station or the centre of your operating area) to the suspect’s address; bad guys often commit their crimes close to home, so you can flag any seemingly long-distance behaviours. If the typical distance is small, your address extraction and geo-matching are probably correct; if lots of suspects seem to live thousands of miles away, chances are your extraction process is conflating street name with city name, or you really do have some travelling bandits. (A minimal sketch of this distance check follows these points.)
Second, it is important to know which entities are most problematic for the AI so that
we can focus any optimisation efforts appropriately.
Finally, we need to consider the relative numbers of different entity types across the
corpus during training and in production. Our test corpus may contain a large number of
phone numbers and a small number of credit card numbers. If the AI is great at identifying
phone numbers and useless at credit card numbers, then the relative distribution during
testing will make the AI look better than it is for real use.
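Here is the distance sanity check sketched in Python. It assumes the extracted addresses have already been geocoded to latitude/longitude pairs; the station coordinates, record identifiers and 100 km threshold are all illustrative assumptions:

    import math

    def haversine_km(a, b):
        # Great-circle distance in km between two (lat, lon) points.
        lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
        h = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371 * math.asin(math.sqrt(h))

    station = (52.056, 1.148)  # hypothetical police station (Ipswich area)
    suspect_locations = {"SUS-001": (52.05, 1.16), "SUS-002": (40.71, -74.01)}

    for suspect, loc in suspect_locations.items():
        d = haversine_km(station, loc)
        if d > 100:  # threshold chosen to taste
            print(f"{suspect}: {d:,.0f} km away - check extraction/geocoding")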
Interactive ChatBots
They are everywhere! Typically, ChatBots are deployed in customer support roles special-
ising in business-relevant tasks in specific domains (called Skills). In contrast to one-shot
Question-Answering, the ChatBot tasks need multiple turns with the user in a specific
context. Examples are travel reservations, banking transactions, answering COVID-19-
related questions [6], etc. The user interaction can be via speech or text. Speech technology
involves acoustic models and thus introduces further complications due to accents, dia-
lects, background noise, etc. Once the user speech is transformed into text (typically using
a speech-to-text AI component), NLU processes take over and then the problem becomes
the same as for textual inputs.
There are different architectures to support the specific goals of ChatBots [7]. The role of NLU is to understand the user intent from the textual input, map it to the available knowledge and come up with an appropriate response, while keeping the context of the user and the topic. Other embellishments like greetings are added to the ChatBot response to make the user experience closer to a human–human interaction. Kuligowska [8] introduced quality metrics to evaluate commercial ChatBots, addressing various aspects such as user interface, general and special knowledge content, personalisation and conversational ability. Sandbank et al. [9] studied distinct attributes of an “egregious” conversation
between the user and a ChatBot. These included repetition of responses by the ChatBot and
its inability to understand the user intent, rephrasing by the users, emotional state of the
user, user request to talk to a human, etc.
This discussion shows that creating a ChatBot for business requires a careful assessment of its task against the NLU technology needed to support the specific skill (e.g. product support) from the institutional knowledge, and a check that the user experience contributes positively to the business goals. Training data for the ChatBot must come from actual user interactions to increase the success of its deployment.
Importance of Tooling
To use some historical analogies, the Industrial Revolution would not have been possible without the lathe, called the “mother of machine tools” [10]. When combined with the power of steam, creating all types of heavy machinery became much more doable. Similarly, it was the building of a wind tunnel [11] by the Wright Brothers that allowed them to design the aerodynamics for their plane. As we stated in Chapter 3, AI is not just about the algorithms. The importance of tooling to create and sustain AI applications cannot be overstated. Based on their experience of deploying production systems at Google, Sculley et al. [12] have pointed out the substantial hidden technical debt in ML systems and observed, “Only a small fraction of real-world ML systems is composed of the ML code … The required surrounding infrastructure is vast and complex.” We shall discuss some general tooling needs in this section, with more details in Chapter 10.
Provision of training and test data is always a challenge for AI projects. To train an ML
system effectively, we require a large volume of consistently labelled data and a significant
proportion of that data is going to be needed for testing. This test data needs to be provided
in a form that can be fed into the AI, and we are going to need tools to compare the output
of the AI with the ground truth data (data where we already know the correct answer).
Let us return to the entity and relationship extraction application in NLU to illustrate some specific needs. Since our data sources are unstructured text, we need a tool that allows a domain expert to annotate (label) the unstructured text data. This is a human-intensive effort, and crowdsourcing is a popular approach to help with this massive labelling job. Part of the tool’s function is to resolve any inconsistencies among the labellers to minimise the noise in the data. Recently, due to rapid advances in NLU technology [13], pretrained language models (called transformers) have been built using large corpora in the public domain. Any NLU project can leverage these models as the first step and then customise the tooling to support the specific NLU task. Despite such advances, adapting public-domain language models to specialised business domains (e.g. manufacturing, finance, etc.) remains a challenge [14] and requires human experts in the domain to create the appropriate labels.
Once we create a large enough corpus of ‘labelled’ data, we need tools to divide it into training and test datasets, making sure that their feature contents are statistically equivalent between the two AND with the operational data. In addition to maintaining a reproducible data pipeline, tooling is also needed to create the model and test the model for various properties (i.e. accuracy, bias, etc.). We also need monitoring tools during production (see below) to assess that the application is behaving properly. For the purposes of debugging and auditing the model behaviour at any time during development or deployment, we need versioning of the models and the related data. We refer to Baylor et al. [15] for an example of an end-to-end tools platform.
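As a minimal sketch of the first of these needs, scikit-learn’s stratified split keeps the label distribution statistically comparable between the training and test sets, with a fixed seed for a reproducible pipeline (synthetic data for illustration):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # Synthetic labelled data with a 90/10 class imbalance.
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y,
        test_size=0.2,
        stratify=y,       # keep the label distribution comparable in both sets
        random_state=42,  # fixed seed for a reproducible data pipeline
    )
    print(y_train.mean(), y_test.mean())  # both close to 0.1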
These requirements apply to all ML applications (not just NLU), so they should be factored into your AI development programme. One option to consider is whether you are able to capture labelled data through your existing business processes. If that’s possible, it may significantly simplify your task and is one of the reasons it is preferable to first apply AI within existing business processes. If you do this, it’s important to ensure the captured data is consistent. When multiple people label data, they don’t always generate the same labels for each situation. It’s important to have crossover (e.g. the same data labelled by multiple Analysts) to enable normalisation of the data into a consistent set.
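Crossover labelling also lets you measure agreement. A minimal sketch using Cohen’s kappa, which corrects raw agreement for chance (the labels here are hypothetical):

    from sklearn.metrics import cohen_kappa_score

    # Two labellers annotating the same six items.
    labeller_a = ["spam", "ham", "spam", "spam", "ham", "spam"]
    labeller_b = ["spam", "ham", "ham",  "spam", "ham", "spam"]

    kappa = cohen_kappa_score(labeller_a, labeller_b)
    print(f"Cohen's kappa = {kappa:.2f}")  # 1.0 = perfect agreement beyond chance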
Rule-Based Techniques are suitable for use cases with a small number of variations (dates, email addresses), or when it is difficult to create labelled data (PII information such as social security or bank account numbers); see the sketch after this list. They are less suitable for complex tasks with lots of variation or needing information from context (e.g. identifying person names).
Classic Statistical ML is suitable in use cases that require fast training and inference
time, and where you have sufficient training data to achieve result quality that is
reasonable for your application. Incorporating word embeddings as features can be
extremely powerful and alleviate the need for human-engineered features.
Deep Learning (DL) is suitable in use cases with sufficient training data where you want high quality of result as well as reasonable inference times; it is especially attractive where GPU acceleration is available, since training and inference are compute-intensive.
Transformers are suitable in a few different use cases:
• High-Value Use Cases where the quality of result is of utmost importance and
worth the trade-off in compute resources.
• Use Cases Where Labelled Data is Scarce and classic ML/DL techniques do
not achieve the quality necessary by the business application.
• Multilingual Use Cases where training data exists in multiple languages and
you want to simplify operational costs by deploying a single multilingual model,
instead of many monolingual models.
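As a rough illustration of the rule-based option at the top of this list, a couple of hand-written patterns go a long way for entities with few surface forms; the patterns and text below are illustrative only:

    import re

    PATTERNS = {
        "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
        "date":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    }

    def extract(text):
        # Return every match for each named pattern.
        return {name: pat.findall(text) for name, pat in PATTERNS.items()}

    print(extract("Reported on 15/06/2018 by [email protected]."))
    # {'email': ['[email protected]'], 'date': ['15/06/2018']}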
Laura Chiticariu
Laura Chiticariu is a Distinguished Engineer and Chief Architect for Watson Core Language at IBM. She has more than 10 years’ experience developing and productizing Natural Language Processing systems and algorithms, and she now leads the development of core Natural Language Processing algorithms that power many IBM products.
REFERENCES
1. T. Young et al., “Recent trends in deep learning based natural language processing,” IEEE
Computational Intelligence Magazine, 13, pp. 55–75 (August, 2018).
2. X. Qiu, et al., “Pre-trained models for natural language processing: A survey,” Science China
Technological Sciences, 63, pp. 1872–1897 (2020).
3. E. Brill and R. J. Mooney, “An overview of empirical natural language processing,” AI
Magazine, 18(4), pp. 13–24 (1997).
As part of this strategy, you should remember that AI capabilities are continually improving. Therefore, it is necessary to review available AI components on a regular basis and ensure your application is using the most effective component available. This is particularly true if you are delivering cloud-based applications where the AI components are available as Cloud services.
Another concern with historical data is that it may also include prejudice! We have
already touched on the topic of AI Bias in Chapter 5 and will discuss data aspects of bias in
more detail in Chapter 7. Human-labelled data can often reflect the prejudices, conscious
or unconscious, of those who labelled it. This is potentially one of the greatest benefits of
AI in that we can systemise these decisions, identify bias and manage it out of the system
to deliver ethical and fair decisions.
What this means, of course, is that you are going to need more tooling! Even if you have
a huge corpus of labelled data ready for immediate use, you will still need to check the
data to ensure it’s acceptable to use and doesn’t contain any thorny issues such as personal
information that should have been redacted under GDPR. You are going to need tools and
processes to validate the labels and continuously improve the quality of the training and
test data.
What about when you do not have existing data? Then, quite simply, you need to create it. This is one of the reasons why, in evaluating the Doability of an AI project, we ask whether the AI will fit into an existing business process. If it does not, then you are going to need to create training and test data from scratch, and you need to ensure it is accurate. One method of doing this is to crowdsource: take a large corpus of data and ask your entire workforce, or other volunteers, to each label some of the data. Many hands do make light work; however, it is critical that you ensure the consistency of the data. Once again, you are going to need the tools and processes to ensure accurate and consistent data.
Much of what we are discussing here will be revisited in the following chapter. The key
message is that, if you are to measure accuracy, you need accurate training and test data
and that means you are going to need tools to develop and maintain this data.
One point to note, however, is that this is not necessarily an upfront process. It may be perfectly reasonable to develop an AI with the aim of starting with whatever data is available and then continually improving its quality. By deploying an AI early, even when its accuracy is dubious, you may be able to gather feedback and identify conflicts such that you can improve the quality of your training and test data more rapidly. Of course, there is a caveat! If the initial performance is too poor, the users may lose faith in the approach and disengage. Remember that careful Stakeholder management is another essential part of making AI work in the real world.
processing time by a certain amount? We know what it costs to insure an applicant and we
can quantify the risk of approving an application that should have been declined. It may
be a complicated task, but assessing and quantifying risk is what insurance companies do!
As such, the company should be able to determine what they need from the AI in terms of
accuracy to allow them to reap the benefits of augmenting human underwriters with AI
tools. There is an example of a real insurance underwriting solution in Chapter 2.
Now let’s consider driverless vehicles. Clearly, 95% would not be acceptable … or would it? Well, firstly we need to be clear on what the 95% accuracy is describing! Driverless vehicles are complicated beasts, made up of many subsystems. This raises the question of how the accuracy of an individual AI component contributes to the accuracy of an overall system.
When AI components interact, they can impact the overall system performance in different ways depending on the configuration and architecture. One pattern touched on in Chapter 2 is to federate decisions to multiple AI services and perform some form of arbitration or fusion on the results. In a scenario where we used three separate Classifiers, each with an accuracy of 95%, a simple voting system could improve the overall accuracy. A fusion strategy could be to select a classification if two or more of the Classifiers generate the same result. If all three Classifiers generate a different result, then the classification generated by the Classifier with the highest confidence is chosen. This type of configuration can, depending on the scenario, achieve a far higher accuracy than any of the Classifiers individually.
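A quick back-of-envelope check of that claim, assuming the three classifiers make independent errors (a strong assumption in practice):

    # Majority vote over three independent classifiers, each 95% accurate:
    # correct when all three are right, or exactly two of the three are.
    p = 0.95
    majority = p**3 + 3 * p**2 * (1 - p)
    print(f"{majority:.4f}")  # ~0.9928, i.e. roughly 99.3% accurate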
Now let’s consider a pattern where multiple AI components are integrated in a single sequential architecture (Figure 6.5). In this pattern, errors are normally compounded. An error in one of the earlier components causes inaccurate data to flow into a later component, and the errors can grow as a consequence.
FIGURE 6.5 Multiple AI components in a sequential architecture: AI Component 1 (Entity Extraction) feeds AI Component 2 (Entity Resolution), which feeds AI Component 3 (Graph Analytics).
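The mirror-image calculation for the sequential pattern, again assuming independent errors, shows how quickly a chain degrades:

    # Three chained components, each 95% accurate: the pipeline is only as
    # good as the product of its stages.
    p = 0.95
    print(f"{p**3:.4f}")  # ~0.8574 - noticeably worse than any single stage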
sensitive and qualitative factors. If the driverless vehicle is to replace human drivers in a delivery business, we can calculate both the business case in terms of cost and the safety case in terms of reduced accidents. However, it will be more difficult to quantify the public perception of safety. We should expect the public to require a far higher standard of safety than is currently delivered by human operators.
As always in engineering, there are few right and wrong answers. The key point, however, is that in order to successfully develop an AI application, you need to understand the accuracy required for the application to be both operationally valuable and ethically acceptable.
FIGURE 6.6 Monitoring inputs and outputs of an AI component in production for potential deviations from the expected behaviour; a comparative deviation analysis raises alerts.
i. The challenger models can be used to ratify the deployed model’s performance.
ii. Once the relative performance among the models is established and understood, if all
the models begin to deviate, there is a good chance that something has changed with
the input data assumptions.
iii. Keeping a stable of challenger models (including the ones that are being retrained
with the latest data) means that it can be easier to swap out an errant model without
major business impact.
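A minimal sketch of this comparative deviation analysis: score recent traffic with the deployed model and a challenger, and alert when their disagreement rate drifts above an agreed threshold (the predictions and the 15% threshold are hypothetical):

    def disagreement_rate(deployed_preds, challenger_preds):
        # Fraction of recent cases where the two models disagree.
        pairs = list(zip(deployed_preds, challenger_preds))
        return sum(a != b for a, b in pairs) / len(pairs)

    deployed   = [1, 0, 0, 1, 1, 0, 1, 0]   # hypothetical recent outputs
    challenger = [1, 0, 1, 1, 1, 0, 0, 0]

    rate = disagreement_rate(deployed, challenger)
    if rate > 0.15:  # threshold agreed with the business
        print(f"ALERT: models disagree on {rate:.0%} of recent traffic")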
FIGURE 6.7 Use of alternative models (a challenger AI model and a challenger non-AI model, compared through deviation analysis that raises alerts) to validate the behaviour of the deployed model and provide options in case it fails.
If there is any ethical concern (e.g. bias) about your deployed model, keeping one of the challenger models as a traditional non-AI model can help you swap them quickly and limit the negative business impact. Even if less accurate, this gives the option of a “safer and more explainable” model while you repair your AI.
FIGURE 6.8 Manual decision to supplement AI automation, with comparative deviation analysis across the two routes. The system has high confidence in 90% of the cases and low confidence in 10% of the cases. Five percent of the high-confidence cases are deliberately routed to manual decision for validation, so roughly 15% of cases in all receive a manual/human decision.
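A minimal sketch of the routing logic in Figure 6.8, assuming a 0.9 confidence threshold and a 5% audit fraction to echo the proportions in the figure (these numbers are illustrative, not recommendations):

    import random

    def route(confidence, threshold=0.9, audit_fraction=0.05):
        if confidence < threshold:
            return "human"             # low confidence: always decided manually
        if random.random() < audit_fraction:
            return "human (audit)"     # spot-check of high-confidence cases
        return "automated"

    random.seed(1)
    for c in (0.97, 0.55, 0.92, 0.99):
        print(c, "->", route(c))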
Fairness/Bias
Does your AI application carry implicit biases? Typically, biases stem from imbalances in training data, and it is important to check the data distributions to identify biases even before the AI model is built. As mentioned earlier, blind use of historical data for training is a natural place for biases to creep in. There must also be monitoring during deployment to identify any unexpected biases in the AI application in production.
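A minimal sketch of such a pre-build check: compare favourable-outcome rates across groups in the training data (the column names and records are hypothetical; toolkits such as AI Fairness 360, cited in Chapter 5, offer far more thorough checks):

    import pandas as pd

    df = pd.DataFrame({
        "gender":   ["F", "F", "M", "M", "M", "F", "M", "M"],
        "approved": [0,   1,   1,   1,   0,   0,   1,   1],
    })

    rates = df.groupby("gender")["approved"].mean()
    print(rates)
    print("disparate impact ratio:", round(rates.min() / rates.max(), 2))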
Explainability
Since ML models are black boxes (i.e. opaque models) created using complex statistical algorithms operating on the training data, there is no simple way to explain the output result from the input data. The various stakeholders (development teams, consumers of the application, auditors, etc.) need appropriate levels of explanation for the model to be trusted. This means explanations for the relevant stakeholders must be built into the development activities and the application user interfaces.
Robustness
It is well established that ML models are prone to adversarial attacks. Small manipulations of an image or text by an adversary can lead to unexpected results. If the adversary has access to the training data and the model development process, more virulent attacks are possible. Application owners must design techniques to withstand adversarial attacks as well as to recover from them when they happen. The consequence of adversarial attacks can be severe in mission/life-critical scenarios.
Uncertainty
Due to the underlying statistical learning techniques in ML models, every output from a model has some uncertainty in it. Model owners need to provide an estimate of the uncertainty in model outputs and make it visible to the users, in addition to the usual confidence levels provided by typical ML algorithms. This will allow the user to opt for human decisions when the model uncertainties are not acceptable in practice.
Transparency/Governance
Due to the extreme sensitivity of AI model behaviour to training data and the AI ethical guidelines from governmental agencies on the use of data (e.g. the European Commission’s GDPR), there has to be a governance mechanism for built-in transparency in the creation of AI components. Even a technically sound and very useful application can fail in society if proper governance is not practised by the enterprise.
REFERENCES
1. F. P. Brooks, The Mythical Man-Month: Essays on Software Engineering, Anniversary edn.
Addison-Wesley Longman, Reading (1995).
2. S. McConnell, Code Complete: A Practical Handbook of Software Construction, 2nd Edn.
Microsoft Press, Redmond (2004).
3. B. Hailpern and P. Santhanam, “Software debugging, testing and verification,” The IBM
Systems Journal, 41, pp. 4–12 (2002).
4. Body Measures Data, Center for Disease Control, https://wwwn.cdc.gov/Nchs/
Nhanes/2007-2008/BMX_E.XPT.
5. J. Brownlee, BLOG: metrics to evaluate machine learning algorithms in python. https://
machinelearningmastery.com/metrics-evaluate-machine-learning-algorithms-python/.
6. M. McKillop, et al., “Leveraging conversational technology to answer common COVID-19
questions,” Journal of the American Medical Informatics Association, 28(4), pp. 850–855 (2021).
7. E. Adamopoulou and L. Moussiades, “Chatbots: history, technology, and applications,”
Machine Learning with Applications, 2, Elsevier (2020).
8. K. Kuligowska, “Commercial chatbot: performance evaluation, usability metrics and
quality standards of embodied conversational agents,” Professionals Center for Business
Research 02, (2015).
9. T. Sandbank, et al., “Detecting egregious conversations between customers and virtual
agents,” Proceedings of the Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, 1 (2018).
“More data … I need more data!” If you are the spouse, parent, partner, colleague or friend of an Analysis Junkie, you’ll have heard them saying that at least once (a day). AI Engineers are both Algorithm Addicts and Analysis Junkies, so they’re going to need to feed their habit and will be continuously demanding more data. The reason Algorithm Addicts and Analysis Junkies have such a craving for more data is that it really is the key to success. In fact, it could be argued that the availability of data is the most significant factor driving the current interest in AI. As a society, we are collecting, or rather, generating more and more data.
DATA TSUNAMI
Let us understand the sources that contribute to the massive data tsunami. Figure 7.1 shows
a snapshot of activities on the internet in 1 minute in 2021. Not surprisingly, the activi-
ties relate to online searches, shopping, social media, entertainment, mobile applications,
etc. Almost everything we do on the internet is recorded. Our financial transactions are
increasingly electronic. We also cannot forget the real-time data from a plethora of sources
such as stock market trades and Internet of Things (IoT) devices including cell phones,
automobiles, refrigerators, Closed-Circuit TV (CCTV) in parking lots, Automatic Number
Plate Recognition (ANPR) at the toll booths, etc., that are continually recording all types of
data from weather to traffic to power consumption.
FIGURE 7.1 What happens in an internet minute in 2021 [1]. (Reproduced with permission.)
Beyond the usual suspects on the internet mentioned above, there are digital news
media (e.g. CNN, BBC, etc.) and the digital versions of print newspapers (e.g. The New
York Times, The Wall Street Journal, The Guardian, etc.). To add to this deluge of archival
data, most recent books, magazines and journals are currently available in the digital por-
table document format (pdf). Even publications that were previously published on paper
are being rapidly digitised for easier access to the readers, facilitating machine analysis,
It’s All about the Data ◾ 165
even if requiring Optical Character Recognition (OCR). Then, there are web sources such
as Wikipedia or Data.gov and an unlimited number of blogs on almost any topic imaginable.
In addition to the raw transactions, the metadata collected during the transactions (e.g.
name, email addresses, IP address, location, etc.) have commercial value to the compa-
nies that are collecting them. While the data from these sources may be publicly visible,
using the data for AI requires proper licensing agreements with the owning companies. In
addition, companies (e.g. banks, retail stores, telecommunications, manufacturing, etc.)
have their own business-relevant proprietary data such as customer profiles, credit card
transactions, business processes, product engineering information, personnel records and
email. With so much data, it is no surprise that organisations of all shapes and sizes see the
opportunity to exploit this data … if they can make sense of it.
A recent IDC study [2] estimates the trend of worldwide data growth (they call it the
"Global Datasphere"). It is shown in Figure 7.2, reaching 180 zettabytes (ZB) in 2025. In case
you are wondering, one zettabyte is 10²¹ bytes. The study also projects the data stored
in enterprise data centres to grow from 46 ZB in 2021 to 122 ZB in 2025. That is a lot of
data! With 5G services expanding across the world through communication service
providers, we can only anticipate that the variety, volume and velocity of the data will
increase even more.
With so much data available, it’s all too easy for the Algorithm Addict to fall off the
wagon! Surely, all you need to do is gather as much data as possible and the algorithm will
do the rest? No! It’s time to get back into rehab and remember that there is no magic algo-
rithm and just throwing algorithms at data never delivers value. To successfully deliver AI
solutions, you are going to need to learn a little more about data and how to deal with it.
It really is all about the data, and in an AI project, you can expect to invest up to 80% of the
available resources just on accessing, understanding, cleansing, preparing and managing
the data. Remember, once your model is trained, you are going to have to include most
or all of the data transformations you put your training data through into your live data
application before you can use it. Spoiler alert: if your data transformation code needed to
look up massive tables of reference data to clean/augment the transaction data, so will your
live model. If you need the application that model sits in to have sub-second responses, you
may want to rethink your business case.
FIGURE 7.2 Estimated trend of world-wide data growth [2]. (Reproduced with permission.)
In this chapter, our aim is to help you understand the fundamentals of data science so that
you can make good decisions when it comes to selecting, scoping and managing projects.
Before digging deeper, it’s worth taking a few minutes to remember that enterprise AI is dif-
ferent from consumer AI on the internet. We talked about this earlier in Chapter 2 and the
fact that so much of the current interest in AI is driven from the perspective of the big inter-
net companies such as Google, Amazon and Facebook. We make two key observations here:
i. These companies are accessing, generating and storing massive volumes of data that
require AI to extract business value.
ii. In turn, they naturally drive advances in AI to meet their own needs, which are not
the same as the needs of a typical enterprise.
In addition, as we discussed above, the content the web companies generate is actually quite
narrow in scope, and their platforms inherently standardise data collection so that it can
be easily analysed. Consequently, the internet companies do not share the same concerns
about AI applications as typical enterprises, in terms of data volume, types or quality.
DATA TYPES
Structured Data
The format of the data collected can vary widely depending on the source and the intended
purpose. Structured data refers to data whose format facilitates queries against it in a
pre-designed and engineered way. There are three mainstream structured data representations:
i. Traditionally, enterprise data used for business analytics has been in the form of
tables (e.g. Microsoft Excel spreadsheets) with columns representing attributes (or
features) and rows representing observations. Relational Databases (e.g. IBM DB2,
Oracle, etc.) that have schemas and linking tables are used for complex large-scale
data and high-performance needs. Structured Query Language (SQL) allows queries
against relational databases.
ii. Semantic Web is a knowledge representation promoted by the World Wide Web
Consortium and Tim Berners-Lee to make internet data machine-readable. It
consists of a Resource Description Framework (RDF), Web Ontology Language
(OWL) and Extensible Markup Language (XML). SPARQL is the language to per-
form ‘semantic’ queries.
iii. Recently, graph databases (e.g. Neo4j, JanusGraph, etc.) that use nodes and edges to
represent relationships between entities have gained wider acceptance. Cypher and
Gremlin are examples of query languages for graph databases.
It’s All about the Data ◾ 167
Unstructured Data
Not surprisingly, most human-created data is unstructured. Natural language
(e.g. English, Spanish, Hindi) is the most pervasive form of communication among
humans and the most natural way to capture knowledge. Language can be in spoken or
written textual form. In either case, the sheer number of languages and their dialects poses
a major challenge to a machine. For written text, the syntax of the grammar and the
symbols representing the language have to be learnt by a machine. In addition to the gram-
mar, spoken language has many inherent attributes (e.g. slang, accents or emotion)
that are difficult to process. In spite of major advances in Natural Language
Understanding (NLU) in narrow contexts, as evidenced by our daily interactions with
Alexa or Siri, mastering language in general remains a significant AI challenge.
With the ubiquity of cell phones, data captured as a photograph or a video has become
extremely common. Images can be in different formats and/or resolutions. Videos bring
in the complexity of actions captured as time sequence of image frames and the fusion of
audio & visual information. Computer vision has been an established field for many decades.
But, as discussed in Chapter 3, the handcrafted feature extractions of the past are being
completely overrun by the capabilities of today’s Deep Neural Network (DNN) algorithms.
When assembling training data for an AI application, a few considerations stand out:
• The choice of the training data set depends on the domain of interest. As an example,
let us imagine we want to use AI to identify the missiles from aerial images during
the Cuban Missile crisis in October 1962. If the reconnaissance flights can happen
any time of the day, we need to have training data consisting of aerial images of
potential missiles, hopefully taken during day AND night.
• As we discussed in Chapter 3, supervised ML needs labelled data sets. If not already
labelled, manual labelling of the data has to be included in the project planning.
• Licensing implications and privacy constraints need careful consideration, for both the
immediate and long-term use of the data in the business application.
There are many public data sets available to support the training of AI models, in both
unstructured and structured formats [3]. The data sets include images for various pur-
poses (e.g. facial recognition, object recognition, handwriting & character recognition,
etc.), videos for action recognition, text data (news articles, messages, tweets, etc.) and
structured multivariate data in a tabular form in various domains (financial, weather, cen-
sus, etc.). The ability of an AI algorithm to extract useful knowledge out of data critically
depends on the ability of the data scientist to prepare the data for the purpose. Structured
and unstructured data present different challenges in that context. For example, noise in
a collection of images manifests itself very differently from noise in a relational database;
hence, different techniques are required in order to process the different data types. The
processing of unstructured data requires critical steps for extracting 'structured' features
from it. An example in natural language text is 'topic modelling', which captures statistical co-
occurrences of certain words in a document.
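As a flavour of what such feature extraction looks like in practice, here is a minimal topic-modelling sketch in Python with scikit-learn (the three documents are illustrative):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["rates rise as banks tighten lending",
        "the team won the cup final",
        "central bank holds interest rates"]      # illustrative documents

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts)   # per-document topic mixture: 'structured' features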
ENTERPRISE REALITY
A typical enterprise (e.g. banks, insurance, retail stores, manufacturing, etc.) faces a different
reality from the web technology companies when it comes to data. To keep this discussion
simple, we will use the language of structured data captured in a tabular form to illustrate the
basic ideas, but analogous examples in the unstructured data domain are easy to imagine. Data
may have been collected over many years and is often stored in siloed systems. Just accessing
the data can be a challenge due to security, privacy and regulatory concerns. Once accessed,
data from separate sources needs to be fused together; not easy when there may be no com-
mon point of reference between records. Fields may be incomplete, and all too often we see
data that has been partially or wholly duplicated across multiple systems. In many cases,
the duplication is inconsistent with some records being amended such that it is impossible
to determine which data is actually accurate. These are just some of the issues that you will
encounter and, in a nutshell, are the reason so much effort needs to be focused on the data.
AI projects really are all about the data! So … what do you need to know? We’re going to
start out by considering some key differences in the way human beings and AI algorithms
use data when learning and then when making decisions.
It’s All about the Data ◾ 169
DATA WRANGLING
Data wrangling is simply the process of making raw data useful for AI model building,
much like taming wild cattle to be farm-friendly. The objective is to create a coher-
ent useful view of the data so that we can train the AI to learn the underlying patterns
easily and efficiently. The process may require manipulating large volumes of data from
disparate sources by summarisation, categorisation, aggregation, etc. For example, let us
say that you own three stores in the local marketplace, selling fruits, bicycles and fast food,
respectively. You record the sales at each store daily using some point-of-sale software. You
could choose to keep the three data sets separate or aggregate them by fruit sales versus
bicycles versus fast food, or food (fast food plus fruit) versus bicycles, or sum all to create
a daily market sales figure. You might want to merge in weather data summarised by day
showing the average temperature, rainfall, etc. Depending on what you want to predict,
you will need the appropriate data. Your task is to create an input data record that is con-
sistent, i.e. one that can be replicated in the real world, and that carries enough detail to give
your model a chance to spot a useful pattern. With the data we have collected so far, here are
some examples of questions you can ask and expect an answer from the AI models:
• Should you keep the bicycle store open for longer hours on weekends and nice
weather days?
• Do you sell more fast food on rainy days?
• Do you have the same customers coming to all three stores? Perhaps, you can give
them special incentives?
Answering each of these questions involves looking for a pattern in the buying behaviour of your
customers, in each data set or across the combined data set from the three stores.
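As a sketch of the wrangling involved (Python with pandas; the file names and column names are illustrative, not from any real system):

import pandas as pd

# Illustrative point-of-sale extracts, one per store, with columns: date, amount
fruit = pd.read_csv("fruit_sales.csv", parse_dates=["date"])
bikes = pd.read_csv("bicycle_sales.csv", parse_dates=["date"])
fastfood = pd.read_csv("fastfood_sales.csv", parse_dates=["date"])
weather = pd.read_csv("daily_weather.csv", parse_dates=["date"])

# One row per day, one column per store
daily = (
    pd.concat([fruit.assign(store="fruit"),
               bikes.assign(store="bicycles"),
               fastfood.assign(store="fast_food")])
      .groupby(["date", "store"])["amount"].sum()
      .unstack("store")
      .reset_index()
)
daily["food"] = daily["fruit"] + daily["fast_food"]   # amalgamated view
daily["market_total"] = daily[["fruit", "bicycles", "fast_food"]].sum(axis=1)

# Merge in the daily weather summary
daily = daily.merge(weather, on="date", how="left")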
Data wrangling is data science's dirty little secret; it can be more art form than sci-
ence, i.e. it is a creative activity with few formal rules. Remember to keep a focus on what
is practical for the environment you are trying to model: the variables and summaries that
you pick for your training HAVE to be available, in the same time frame, to your live AI system. A
common error (more common than you’d think) is to inadvertently bake ‘future’ data into
your training data. For example, when aggregating historical data sets, you may find a field
containing actual average temperature per day. You thought it would be useful and chose
to merge it by date with your other data. But, if you are hoping to use your model to predict
today’s sales, what temperature value are you going to use? Today hasn’t happened yet and
so you can’t use today’s actual average temperature! Here are some choices:
• Use a predicted average temperature. This is doable in this simple example but pos-
sibly inaccurate and only as good as the temperature prediction.
• Rebuild your model with the historical “predicted” not actual temperatures (if they exist).
• Rebuild the model with the previous day’s average temperature, etc.
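The third option is the simplest to implement; continuing the pandas sketch above (the avg_temperature column is an assumption):

# Avoid baking 'future' data in: replace the same-day actual temperature
# with the previous day's value, which is always available at prediction time.
daily = daily.sort_values("date")
daily["temp_prev_day"] = daily["avg_temperature"].shift(1)
daily = daily.drop(columns=["avg_temperature"])   # remove the leaky feature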
It’s All about the Data ◾ 171
Choosing which variables (aka features) to include or create (by data wrangling) is always
messy; do not be frightened to experiment and try multiple approaches. We're now going
to dig deep into the question everyone asks … “how much data do I need?”
h_M = A_M \cos\left(\frac{2\pi t}{T_M} + \phi_M\right)

where T_M = 12.421 hours is the Principal Semidiurnal Lunar (M2) period, A_M is the ampli-
tude and \phi_M is the phase that depends on the details of the underlying physics. These
are the three parameters we need to determine using the measurements discussed in
Figure 7.3. If we have this theoretical knowledge, then all we really need is sufficient
data to estimate these three parameters at a given location. The generalisation of the
model is achieved by the equation itself. We need to be careful at this point, because a
lot depends on how we are collecting the data. If we are able to sit on the shoreline, we
would only need to record the minimum height and the maximum height of tide together
with the times at which they occurred. If the measurements are perfect, we can rely
on only three data points because we need as many independent observations as the
number of unknown parameters in the model. Due to inevitable measurement errors, we
probably need many more data points (say, 30) to build a valid model with appropriate
error estimates.
FIGURE A: An illustration of the effect of the sun and the moon on the tides on earth.
Not to scale.
It’s All about the Data ◾ 173
h_S = A_S \cos\left(\frac{2\pi t}{T_S} + \phi_S\right)

where T_S = 12 hours is the Principal Semidiurnal Solar Period (S2), A_S is the amplitude and
\phi_S is the phase that depends on the details of the underlying physics.
Combined Tidal Impact of Both Moon (M2) and the Sun (S2).
The combined effect of the Moon and the Sun is an amplitude-modulated wave, whose fast
(carrier) and slow (modulation) periods are given by

T_c = \frac{2 T_M T_S}{T_M + T_S}; \qquad T_m = \frac{2 T_M T_S}{T_M - T_S}

resulting in T_c = 12.2 hours and T_m = 29.5 days. There are five parameters (T_c, T_m, a_c, a_m and \phi)
in this equation, giving rise to the shape of Figure 7.4. Here, the amplitude of the higher-
frequency (i.e. shorter period, 12.2 hours) wave is modulated by a lower-frequency (i.e.
longer period, 29.5 days) wave. The larger amplitude (Spring tide) occurs when the Moon
and the Sun reinforce each other, while the smaller amplitude (Neap tide) occurs when they
oppose each other.
One mathematician believed it was possible to build a machine to calculate and print these tables. His name was
Charles Babbage! He built a small version of a 'Difference Engine' in 1822, but the work on
a larger engine was suspended in 1833. It could be argued that the need to understand tides
was a key factor in the development of early mechanical computers.
What would happen, though, if we knew nothing about the physics of tides [8,9] and
we wanted to use AI to predict tidal height? How much data would we need? The simple
answer is that we don't know how much data we would need. All we could do is start col-
lecting data. What happens if we collect data at different sampling rates (see Figure 7.3)?
FIGURE 7.3 An example of different amounts of tidal data collected (blue dots) during a
24-hour period: (a) every 12 hours, three data points; (b) every 6 hours, five data points; (c) every 3
hours, nine data points.
As you can see in the diagrams, the 12-hour sampling doesn’t really give us any insight
into what’s happening. The 6-hour sampling suggests that there is some form of periodic
behaviour and the 3-hour sampling strongly indicates a sinusoidal form. Obviously, the
more data we collect, the clearer the picture will become. If we had access to an expert,
then we would know that tides rise and fall sinusoidally with an average period of 12 hours
and 24 minutes in most locations in the UK.
With that knowledge, we could then construct a theoretical model of a sine wave for
the tidal height that has three parameters: amplitude, period and phase (please see the
vignette for more physics). We need a minimum of three independent observations for a per-
fect fit. In reality, due to random errors in observations, we may need more obser-
vations (say, 30) to estimate the three parameters and the associated errors more accurately.
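To make this concrete, here is a minimal sketch (Python with NumPy/SciPy) of fitting the three-parameter sine model to noisy observations; the observation times and heights are simulated purely for illustration:

import numpy as np
from scipy.optimize import curve_fit

def tide(t, amplitude, period, phase):
    return amplitude * np.cos(2 * np.pi * t / period + phase)

# Illustrative: ~30 noisy height readings over two days
t_obs = np.linspace(0, 48, 30)
h_obs = tide(t_obs, 2.0, 12.4, 0.5) + np.random.normal(0, 0.1, t_obs.size)

p0 = [1.0, 12.0, 0.0]                    # rough initial guesses
params, cov = curve_fit(tide, t_obs, h_obs, p0=p0)
errors = np.sqrt(np.diag(cov))           # standard errors on the parameters

The covariance matrix returned by the fit is where the error estimates on the three parameters come from.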
Without that item of knowledge, all we can do is collect more data until we are confident
that we understand what is going on.
FIGURE 7.4 The tidal amplitude over a month from [10], reproduced with permission. This com-
plex behaviour is due to the combination of the principal semi-diurnal lunar harmonic
(M2) and the principal semi-diurnal solar harmonic (S2). The envelope has a period of 29.5 days,
whereas the daily waves have a period of 12 hours and 24 minutes [8,9]. Please refer to the vignette,
"Why tides rise and fall".
Unfortunately, if we only collect tidal data over a 24-hour period, we are missing some
critical observations. The maximum and minimum height of tide varies over the course
of a month … but remember we don’t know that! If we monitored the height of tide every
hour over a period of one lunar month, we would observe something similar to Figure 7.4.
Again, the more data we collect, the clearer the picture will become. If we had access
to an expert, she would explain that this behaviour arises from the interaction between
two different wave forms, one due to the influence of the moon and one due to the sun, resulting
in this 'amplitude modulation'. The underlying theoretical model will have a specific equa-
tion for the tidal height with five parameters related to the two waves (see the vignette 'Why
tides rise and fall' for more physics). With that knowledge, we would only need to collect
a minimum of five independent observations in order to construct a theoretical model, even
though, as before, we may need a few dozen more for estimating the five parameters with
It’s All about the Data ◾ 175
the corresponding errors. Without that knowledge, all we can do is collect more data until
we are confident that we understand what is going on (have we said that before?).
Unfortunately, if we only collect data every 3 hours, we might be missing some critical
observations! The challenge is that in some locations, other factors impact the height of
tide. Geographic features may cause back eddies and all sorts of strange water flows. For
example, there are no significant tides in the Mediterranean due to the narrow entrance
at Gibraltar; the Straits act as a bottleneck and prevent the seawater flowing in and out of
the Atlantic. Similarly, if there is a large river, the height of tide may be impacted by high
rainfall. Figure 7.5 shows the tidal behaviour in the Solent, the stretch of water between the
Isle of Wight and Southampton in the UK.
FIGURE 7.5 Tidal wave over a day at the Solent in the UK. The wave form is more complicated than
a simple sine wave due to the local geographical features [11]. Reproduced with permission.
As you can see, this profile is very different to the sinusoid we have been talking about
so far. The only way to discover that is to collect more data! The more data you collect, the
more confident you will be that you understand what is going on (have we said that before
… twice?).
This raises an interesting conundrum in the use of ML. The main reason for using ML
is that we don’t understand the underlying model and we need to create a model that can
generalise using the data available. The more data we collect, the less we need to generalise
and predict because we could, theoretically, just look up the data. ML will hopefully allow
us to generalise and predict … but only if we have sufficient data. We’re using ML because
we don’t understand the underlying model and we can’t know if we have sufficient data
unless we understand the underlying model.
So, how do we address this conundrum? First, ensure that whatever expertise does exist
is tapped into! Whilst the experts may not fully understand what is going on, they will have
knowledge that will help validate that sufficient data has been obtained. Second, ensure
that an effective test strategy is adopted to ensure the behaviour of the AI in the production
environment meets the business objectives.
For this simple tidal example, let’s finish by considering the three factors we mentioned
earlier.
Firstly, the complexity of the underlying model. If the tidal behaviour we are modelling
follows a simple sinusoid, then we only need a handful of data points to build an accurate model.
If we are modelling the tides in Southampton, then we’re going to need a lot more data.
Secondly, our theoretical knowledge of the situation. If we have a theoretical knowledge
of tides, we can use this knowledge to inform our data collection strategy.
Thirdly, the accuracy required of our model. If we are sailing a yacht between the Solent
and the Isle of Wight, then we can probably get away with a sinusoidal model. Yacht sailors
generally don’t need tidal data to be too accurate as they allow contingency and use their
echo sounders. However, if we’re manoeuvring an aircraft carrier, then we probably need a
more accurate model and therefore we’re going to need more data.
corresponding disease/ailment. Is your new “Diagnozer” ready for real use? Does it know
enough? Have you trained it on enough data?
It might seem so; however, a simple discussion with a medical doctor would inform you
that whilst the encyclopedia in question did indeed hold great information on the illnesses
that patients exhibit 95% of the time, it only held information on a small fraction of all
the illnesses you need to know about to safely diagnose. This is because whilst the com-
mon illnesses represent 95% of all cases, these common illnesses represent only 20% of all
possible medical complaints. The remaining 5% of cases represent the remaining 80% of
possibilities, but more importantly, many of the patients in that 5% have rare and deadly illnesses
with symptoms easily confused with lesser ailments. Your Diagnozer could lead a critically
ill patient to believe they had a benign condition. Your AI could kill!
So your “Diagnozer” is dangerous and worthless … right? Wrong!
Business value is very much in the eye of the beholder. Your AI tool is not suitable to be
given to the general public for self-diagnosis; that would clearly be a lawsuit waiting to hap-
pen. But consider your AI Diagnozer as a tool to augment a General Practice Medical Doctor,
and you have a massive timesaver/quality improver. Your tool would allow the doctor to
instantly get to the most probable cause of a patient’s illness, freeing plenty of time in the con-
sultation to check for the less likely candidates. In the hands of an amateur, any tool can be
dangerous; the same tool in the hands of a professional, much less so. It is true of the cheapest
medical scalpel; we should not be surprised that it's also true of the most expensive medical AI.
Because we have discrete values, it is possible to calculate the exact volume of the Feature
Space. In this case, there are theoretically 576 possible permutations in the Feature Space.
In the real world of course, numerical values are rarely discrete so the volume of the Feature
Space is technically … possibly … potentially … infinite … but you get the idea!
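To make the arithmetic concrete, here is a sketch in Python; the features and their level counts below are pure assumptions, chosen only so that their product reproduces the 576 figure:

from math import prod

# Hypothetical discrete features and their level counts
feature_levels = {
    "bedrooms": 6,             # 1..6
    "bathrooms": 4,            # 1..4
    "distance_to_school": 4,   # four distance bands
    "garden_size": 6,          # six size bands
}
print(prod(feature_levels.values()))   # 576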
The volume of data we require to understand this Feature Space depends on what hap-
pens at every possible point in the Feature Space. In particular, it depends on the probabil-
ity of each possible output across the Feature Space.
Let’s imagine for a moment that we decided to collect 576 examples, one record repre-
senting each of the cells in the Feature Space (we will ignore the fact that, in the real world,
few, if any, six-bedroom houses exist with one bathroom). For one particular cell, we find that
the house price is LOW as shown below.
However, if we were to collect ten records for each of the cells, then we might collect the
data below.
As you can see, when we collect more data, we find that for this particular set of input
data there are five high-priced houses and five low-priced. Knowing this particular set of
data gives us a 50-50 chance of predicting the correct output value. If we had only used
a single record, clearly, we would not understand the 50-50 distribution. A single record
does not give us enough data on which to understand what really happens at that point in
the Feature Space. In an ideal world, every time we collected data for an individual point
in the Feature Space, we would observe the same output. If, as in the example above, the
data reveals a 50-50 situation, then we can’t really make a useful prediction. Guess what?
We need more data!
It’s All about the Data ◾ 179
If we’re lucky, we could collect a thousand data points and discover that the distribu-
tion is actually 995 low to 5 high. That would give us a clear and useful correlation. If we’re
unlucky … and in this game, we usually are … collecting 1000 samples will still give us a
50-50 distribution. What does that mean? We need to collect more features as the features
we are collecting are, as in the subsidence example above, insufficient to pull the data apart.
Adding more features increases the volume of the Feature Space so … wait for it … we’re
going to need to collect more data.
So, how many records do we need to collect for each point in the Feature Space? It
depends! It depends on the complexity of the underlying model and, generally, when using
ML, we don’t fully understand that model!
FIGURE 7.6 House price against two features, Number of Bedrooms and Distance to School,
ideal case.
Unfortunately, the world isn’t perfect. In reality, we will see a picture that looks more
like that shown in Figure 7.7.
FIGURE 7.7 House prices against the same two features in Figure 7.6, reality.
It shouldn’t be a surprise to anyone to see that we have a couple of cheap houses that are
close to good schools and have a large number of bedrooms. Clearly, there are other factors
which impact the price of a house so, at this point, we need to engage our Analysis Junkie
mantra and collectively shout, “More data please!” (Figure 7.8).
FIGURE 7.8 Explaining perceived anomalies with additional dimensions. (a) Low prices of the
two houses are due to the risk of flooding. (b) High prices of two houses are due to the number of
bathrooms.
The Data Gods hear our call for help and graciously provide us with more data and
instantly we see that two of the unexpectedly cheap houses have a problem. They are the
victims of flooding and are therefore much cheaper than other properties of the same size
in similar locations. The two high-priced houses that are further away from the school hap-
pened to have a large number of bathrooms.
While the new features allow us to explain what is happening, they introduce a new chal-
lenge that we refer to as the Curse of Dimensionality. As we add more features, the size of
the Feature Space increases. We therefore need more data in order to fully understand what
It’s All about the Data ◾ 181
is happening. In the case of a house price valuation, there are many different factors (See
Figure 7.9) that need to be taken into account. Flooding is one; subsidence could be another
factor that could justify a reduction in the value of the property. Alternatively, the
house may have been constructed using dangerous materials such as asbestos or it may
have been built in a location close to a busy road or a railway line or a noisy night club. The
question is, how do you present this information to the AI in the most effective manner?
Theoretically, you could just keep adding features. However, that isn’t really an effective
strategy for two reasons. Firstly, every time you add a new feature you are increasing the
size of the feature space and you are going to need more and more data to ensure you cover
the whole feature space. Secondly, no matter how hard you think about the problem in
advance, once operational, you will always encounter new scenarios that you hadn’t previ-
ously considered. It simply isn’t practical to keep adding extra features.
One approach is to derive a new amalgamated feature that can be used for all inputs that
can have a negative impact on the price of a property (Figure 7.10).
The key here (a theme we will return to) is to ask your experts what factors they take
into consideration when they make a decision. Start by mimicking the information your
experts use/need and build from there. Hint: you may need to add more features than the
expert initially identifies; experts are often good at their job because they have internalised
their knowledge, i.e. they don't have to think about how they make decisions, they just do
it based on experience. It is only after you circle back to the experts with a model that seems
to do naive things that they remember, "oh yeah, I forgot to say I always check to see if the
house is on a registered flood plain before I give a valuation…"
The creation of this new amalgamated data feature requires data science and engineering.
By applying this engineering effort, we achieve two positive results. Firstly, we reduce
the amount of data required to build an effective AI Model and, secondly, we ensure that
the model is future proof. It is possible to add new features or remove old features without
needing to re-train the AI Model.
This simple example highlights the importance of Feature Definition in determining the
effectiveness of the solution. Like all computer systems, AI solutions are only as effective as
the data they are provided with. In our house price solution, the ability to accurately pre-
dict the price of a property is directly constrained by the features available. If the solution
has no access to flooding or subsidence data, then it simply cannot predict the lower price
of a flood risked, or subsided house. This may seem obvious, but it’s a point that is often
missed in more complex case studies.
Within Feature Definition, we often talk about two specific techniques: Feature Reduction
and Feature Expansion. Feature Reduction aims to take a large number of Features and
reduce the number, either by fusing them together, as in the above example, or remov-
ing them altogether if they don't seem to add value. Feature Expansion aims to add new
features in situations where we have insufficient data to perform the task. There are many
different techniques used by data scientists to perform these tasks. Quite often, data scien-
tists will use correlation techniques to understand the relationships between features. If,
for example, you are measuring frequency and wavelength, those two features are directly
related … they are effectively the same thing … so there is no point including both in the
Feature Space. Data scientists also use a technique called Principal Component Analysis
(PCA) to fuse multiple features into new features that are significant in making decisions.
PCA is useful when you have a large, sparsely populated feature space and you can't be
certain which features to fuse or safely remove: PCA automatically starts to combine the
features into new, statistically correlated, merged features, so your 100 sparsely populated
actual features might come out as 10 PCA-condensed pseudo-features! Perfect for your ML
training, but not if "explainability" is part of your 'must haves'. Picture the scene… "Yes, Mr
Smith, we need to amputate your left leg. Why? It's because the PCA3 variable, which may
or may not be based on a mathematical combination of seven symptoms that you may or
may not have, says so…" So PCA is great to prove a pattern exists and when there is
no sensible way of manually reducing the feature set, but it is less useful if you really need
to understand why your model makes a particular decision. Please refer to the vignette for
conceptual details behind how it works.
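A minimal PCA sketch in Python with scikit-learn, assuming a hypothetical numeric feature table X with 100 columns:

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(X)   # PCA is scale-sensitive

pca = PCA(n_components=10)                     # condense to 10 pseudo-features
X_condensed = pca.fit_transform(X_scaled)

# How much of the original variation the 10 components retain
print(pca.explained_variance_ratio_.sum())

Each component is a linear mix of the original inputs, which is exactly why the explainability cost described above arises.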
It’s All about the Data ◾ 183
The answer, once again, is to have a well-defined test strategy to evaluate the effective-
ness of the AI. The good news is that, at this point, we can once again rely on the wonders
of science and mathematics. Whilst we can’t predict how much data we will need, once we
have collected data, we can determine if we have enough. We refer to Chapter 6 for more
details on some useful techniques.
The high-level messages are simple. AI is data greedy and, in everything you do, you
should be ready to chant the Analysis Junkie’s mantra and demand “more data”. The
amount and the distribution of the data you need will depend on the system you are trying
to model. Always remember that just having a huge amount of data doesn’t mean you have
enough data! The amount of data you need depends on the complexity of the problem and
the system you are trying to model.
As in all things to do with AI, it’s important to apply the best scientific and engineering
practices. AI is not engineering free so never throw data at algorithms without doing the
science and engineering needed to understand exactly what is going on.
technique to reduce noise in images. An advantage of Autoencoders over PCA is their
ability to represent non-linear relationships among the attributes, whereas PCA assumes
linear dependency. However, their internal representations are difficult for humans to
understand.
The interpretation of the data recorded in the notes will also change significantly with
time as medical knowledge has developed. For example, the condition known as Systemic
Lupus was unknown in the 1950s, and patients suffering from that condition would prob-
ably have been incorrectly diagnosed with Rheumatoid Arthritis.
The next problem to think about is inaccessible data. Quite often you will be told that
there is a huge amount of data only to find that it is not in a form that is usable. It may be
that the data is in the form of handwritten notes, or scanned image data, which cannot yet
be transformed into an electronic or usable form. Alternatively, the data may exist in an
electronic form but be missing the key index or reference number that enables it to be cross
referenced with another key source.
The real world is continually changing and that is reflected in the continual creation
of new data sources and new features. This raises an interesting issue relating to the
scope of an AI Application and, in particular, whether to adopt a Fixed or an Expanding
Feature Space. In the most basic AI Applications, the features used are fixed. For exam-
ple, an image classifier will be designed to process a specific size of image with no vari-
ance in the input data. However, in more sophisticated solutions, there may be a desire
to add new data sources and features to the capability as they become available. In that
situation, the Feature Space will be continually expanding. If missing features are an
extreme form of missing data, then an expanding feature space is an extreme form of
missing features.
Then we come to the problem of missing data. Data collection is rarely perfect for vari-
ous reasons. People who are asked to fill in forms often fail to complete every field. Sensors
and communication links fail so data that should have been collected is lost. Systems crash
and hard drives are corrupted. When building data-marts, data-lakes and data-warehouses,
data is often merged from many historical legacy data stores. However, when merging data,
the data models are rarely an identical match so not all fields are populated.
An extreme form of missing data is missing features. In our house price example above,
we realised during the analysis that we needed to collect subsidence data. Remember that
if you subsequently add a feature like subsidence, because it helps the model, then you will
need to answer that question for all the houses already in your set; if not, how will the model
interpret the houses with missing values? If you are using historical data, data from a survey
Hard times, indeed! A giant quake struck New Madrid, Missouri, on Feb. 7, 1812,
the day this author struck England.
The answer turns out to be ‘Charles Dickens’, since he was born on that day!
So, what was the big deal with Jeopardy and, in particular, what made the applica-
tion different to conventional search or question answering? Firstly, the questions could
not be answered just by keyword search. In most search engines, it is possible to simply
extract keywords from the question and find a passage of text that is likely to include
the answer.
Secondly, in conventional search applications, a massive proportion of the questions
asked relate to a relatively small amount of the content. In Deep Question Answering,
the application needs to answer questions that are very rarely asked and require the full
breadth of content to answer. See Figure A for a comparison of traditional Search vs.
Jeopardy.
FIGURE B: An extensible approach used in the Jeopardy project to deal with the ever-
expanding feature space.
It’s All about the Data ◾ 187
In developing the Watson solution, the IBM Research team had to identify and ingest
content from a massive range of sources. Each of these sources required processing to
extract features that were specific to each content source and the types of question that
could be asked of each source.
With development taking several years, this meant that every time a new content
source was identified and added, new features were also added. In effect, the feature
space was continually expanding.
Given the scale of the Jeopardy challenge, it would have been completely impracti-
cal to consolidate and normalise the feature space every time a new content source was
added. Instead, the team developed a new architecture whereby new content sources
and features were added cumulatively to the application without the need to change any
of the existing features.
ML was then applied as the very final stage of the process to learn which features were
important for each type of question.
that didn’t collect that information at the time, then you will get less value from subsidence
as a model feature.
It may be possible to collect an entirely new feature going forward, but retrospectively
finding such data is not always possible.
Data consistency is a huge challenge; especially with very large data sets and multiple
annotators. If you ask a team of ten medical underwriters to assess the same 100 insur-
ance applications, it’s unlikely that they will all assess the same level of risk for each of the
applications. When multiple people interpret data, they rarely produce identical results.
Finally, the answer isn’t always in the data! That was an incredibly hard sentence for
an Analysis Junkie to write. Unfortunately, it’s true and it’s important that we approach AI
projects with a realistic expectation of what can be achieved. In some cases, the data simply
does not exist to answer the question being asked. Criminal investigations are very signifi-
cant in this respect. It’s very common for large IT companies to volunteer their technology
to law enforcement agencies in the hope of solving a major crime. Our intentions in doing
this are genuinely good and our hope is that we can do something good for society. Quite
often, however, you realise that the key information needed to solve the crime just does not
exist in the data.
Data Imputation
If you have a percentage of records that lack information in certain fields, you can try to
statistically impute the values. This of course depends on how many missing values you
have versus complete, e.g. 10% missing is much better than 20%. You also need to check whether
the missing values are evenly distributed amongst your record types. If one type of
record never has a value filled in, it can be problematic to impute it. It is also worth check-
ing if some missing values actually mean zero or vice versa (in our house buying example
an apartment record might leave the field for garden size blank rather than put a zero,
whereas an auction property might have a blank for garden as the detailed information has
yet to arrive).
In our house buying example, if many of the historical records lack information on
flooding, then we might consider it a candidate for Imputation, as it would be impractical
to visit homes we had sold in the past to ask if they had this issue. Imputation is a statistical
process that uses the complete data records to predict the missing values in incomplete
records, e.g. that house is too cheap given those features, so it probably has flooding or sub-
sidence, etc. This may be a chicken-and-egg scenario for what you are trying to achieve; it
obviously works best when the ratio of complete to incomplete data is high.
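A minimal imputation sketch in Python with scikit-learn, assuming a hypothetical numeric feature table X with NaN for the missing values:

from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Each missing value is predicted from the values present in the other
# columns, i.e. complete records inform the gaps in incomplete ones.
imputer = IterativeImputer(random_state=0)
X_filled = imputer.fit_transform(X)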
It’s All about the Data ◾ 189
Proxy Data
If you cannot get the data you want, get creative and look around: is there some other data
source you could get that might work as a substitute? Look for data values that might imply
the feature you are after. For example, in our house price example, could you source envi-
ronmental data that provides a map-referenced data set of areas with flood risk? That's a good
start, but now you need to link house data to the map data to give each house a flood score.
You will now need to enter the fun world of data wrangling!
In the real world for our housing example, we could use a geocoder service (software that
calculates a map grid reference from an address) to establish the map references for each
historical house sale record. We could then merge our historical house records with the
environmental data to create a flood risk score per record. We could then run our models
again using this data to see if the added score increased the accuracy of our model. Note that,
from this point on, your housing price predictor would have to know or calculate the map
reference for each new house and obtain the relevant flood risk score.
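A sketch of that linkage in Python with pandas; the file names and columns are illustrative, and geocode() is a stand-in for whatever geocoder service you license:

import pandas as pd

houses = pd.read_csv("historical_sales.csv")     # illustrative: includes 'address'
flood = pd.read_csv("flood_risk_by_grid.csv")    # illustrative: grid_ref, flood_score

def geocode(address):
    # Stand-in for a licensed geocoder that returns a map grid
    # reference for an address
    return "SU4210"   # dummy value for illustration

houses["grid_ref"] = houses["address"].map(geocode)
houses = houses.merge(flood, on="grid_ref", how="left")
# Every sale record now carries a flood_score the model can use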
than simply dismiss the approach. This is important because, whilst using real data may be
best practice, it is by no means perfect!
Clearly, from an ethical and trusted AI perspective, there would be huge advantages in
the use of Synthetic Data. Firstly, if done properly, Synthetic Data would not be subject to
the same privacy concerns. Secondly, the explainability and transparency issue discussed
above would be avoided.
Of the Synthetic Data generation tools on the market (2021), many show promising technical
developments, especially for structured data, but synthetic unstructured data such as text
narratives or images is still in its infancy.
We need to build the tools and methods to enable the generation and maintenance of
Synthetic Data sets for use in configuring and training AI systems. We should not under-
estimate the significance of these tools as they will need to address the concerns about the
maintenance of relationships. At present, in the case of unstructured content, our tooling
is limited to text editors with search and replace functions. It should therefore not be a
surprise that any Synthetic Data generated this way is not fit for purpose.
At this point, we need to acknowledge a significant argument against the feasibility of
developing tools to produce Synthetic Data. The argument is that if you have the knowledge
and automation required to generate Synthetic Data, then surely there is no need to develop
an AI Model … just use the data generation model instead! This is a very valid argument
but, again, we must be careful about applying general arguments to specific cases.
Let’s consider free text content. There is huge value in the provision of Synthetic Content
for use in the development of entity and relationship extraction tools. The sensitivity of
medical reports, police witness statements, customs documents and other free text content
is such that sharing the content is difficult. Vast resources are available in academia and
commercial organisations that could develop AI capabilities if provided with Synthetic
Data that was perfectly representative of the real-world data.
In an extreme case, it would be possible for a human analyst to take a large corpus of
content and manually obfuscate entity and relationship data in a way that was both consis-
tent and representative of the original real-world content. This process is possible without
developing a complete understanding of the underlying natural language model (because
the human race has not yet developed a full understanding of how natural language really
works). With improved tooling, this manual process could be made more efficient. It would
still be labour intensive, but the rewards would be significant.
A further example exists in the world of simulation. Consider a very complex situation
such as a motor race or traffic management or weather forecasting. In such situations, it
is possible to build a simulator where the behaviour of individual entities is reasonably
well understood, but the combined behaviour of all entities is too complex to understand.
Having built a simulator to generate data, it is possible to use this Synthetic Data to develop
AI decision makers. A key point to note here is that the AI decision makers are not being
trained to learn the underlying model of the complex system. They are learning how to
make the correct decisions to manage the underlying complex system.
In simulating complex systems, it is possible to run simulations with a much broader
range of input parameters than would be experienced in the real world. For example, it
would be possible to simulate a motor race where the fastest car is twice as fast as the
slowest car. This ability to simulate extreme scenarios addresses the completeness issue
mentioned above.
Obviously, the effectiveness of a simulator led approach is dependent on the quality of
the simulation. Initially, a simulator may not be very effective in representing the real-
world system. It is therefore important to invest the effort and resources required to con-
verge the simulated environment with the real world.
It’s All about the Data ◾ 193
The use of Synthetic Data in developing AI systems is a contentious issue. Many experts
believe that Synthetic Data can never be sufficiently representative and, even if it could
be, it would be impractical to generate the data. However, it’s clear that there is a need
for training data that is compliant with privacy regulations and complete from an engi-
neering perspective. Developing the methods and tools to produce such data is going to
be challenging. However, if we look at other fields of engineering, that’s exactly what has
happened. AI is not engineering free, so we should expect to invest considerably in tools,
methods and data required.
FIGURE 7.11 Data workflow for enterprise AI projects. (a) The cycle of all the activities in a logical
sequence. In practice, there are many smaller cycles depending on the context. For example, (b) is a
very common scenario of building and validating models. (c) The situation where a deployed model
may be drifting due to previously unseen data or may need additional labelling. The details of these
activities are described in the text.
Requirements (Define)
Data requirements stem from business requirements. The clearer the business objectives,
the better the definition of data needed and the demands on the workflow shown above.
Examples of business requirements and their data implications are:
• Business requirement: The application has to be deployed in countries in the European Union.
  Data implication: Data governance has to support GDPR guidelines and associated IT responsibilities.
• Business requirement: The medical AI application has to work in the international context.
  Data implication: The data corpus for training has to include international standards of practice, not just from one country.
• Business requirement: The support bot for users has to work in 10 major languages.
  Data implication: The training data (speech or text) has to contain the actual data from the users in the original languages, not a version translated from English. The original version will contain slang, dialects, accents and other variations that have to be understood by the AI.
• Business requirement: The mortgage assistant for the bank officer is to serve a diverse demographic population.
  Data implication: Need to check for systemic biases in the training data.
• Business requirement: The image recognition software has to work with images from older devices with low picture resolutions.
  Data implication: Training and test data have to include low-resolution images.
Acquisition (Get)
Once you know what data you need, the next task is figuring out how to get it. This will
depend critically on whether your company already owns the data or you have to go out-
side your company. Let us first consider the case where you own the data. If you own a retail
department store chain (say, Walmart), you may already have the data from your past that
can be used. It may need some data wrangling to make it usable, but that is within your
control. If you do not have the data you need, but it is obtainable within your normal busi-
ness processes, with some careful design, you can start collecting the right type of data
over a period of time, before creating the AI application. If you are in a specialised domain
It’s All about the Data ◾ 195
(e.g. banking, insurance, manufacturing, legal, law enforcement, healthcare, etc.), you may
not have a choice of an external data source for competitive or regulatory reasons. The only
options may be the generic industry-specific sources (e.g. international banking standards
such as BASEL III, HIPAA standards for healthcare, etc.) that can provide a basis for the
domain knowledge but not specific to your business.
Now, if you do not own the data or you have to augment your data from other
sources, the challenge gets a lot more complicated. If the sources are not for profit (e.g.
Wikipedia, Data.gov), they will allow you to use their data for free, but for restricted
purposes under a license, with proper governance. Most commercial data sources have
strict license terms, usage policies and duration of use that need to be carefully consid-
ered, in addition to cost, before committing to them. Such sources are more likely to be
in consumer areas such as retail, e-commerce and social media. A good understanding
of the formats and the storage needs of the licensed data is also critical, before the deci-
sion to license the data is made. Getting a sample of the data ahead of time will help
immensely.
Ingestion (Load)
Ok, you have the rights to the data, one way or the other! The next step is to actually get the
data into your infrastructure in proper formats, volume, etc. One major option to consider
is data virtualisation. This allows access to the data from multiple sources, without hav-
ing to copy and replicate data, while making the most current data available for analysis
at any time and reducing cost. The benefit of this approach can be significant if no sub-
sequent data wrangling is needed. But, if snapshots of the data are needed to capture the
data versions corresponding to the versions of AI models, some automation is needed to
make the process more tolerable for day-to-day operations. If there is a real-time aspect
to it (e.g. stock trades, IoT device outputs, etc.), a careful consideration of the ingestion
rates, frequency of updates, etc., is also needed. In most cases, it is quite likely that you are
going to need the data for the life of your application and so you need to make sure that the
infrastructure is extensible for your future needs. If your business is successful, this can be
a very long time!
Preparation (Clean-Up)
This is the step where you decide what part of the data you are actually going to use. This
task requires a good understanding of the AI application you are building and the skills to
do data exploration. Since more data invariably means more data wrangling, it is good to
select sources and contents that fit the needs of the AI application the best. The clean-up
requires various tricks, and the choices made in the preparation process will have impli-
cations when the model predictions are made later in the process. Let us say we want to
predict the number of pandemic infections in a particular area over the coming months.
The daily data may be too noisy due to statistical fluctuations or systemic issues (e.g. no
reporting on Sundays!). The smoothing of the data can be done in many ways: moving
averages over a time window (e.g. a week) and/or picking a larger granularity (e.g. grouping
by county instead of town).
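For instance, a seven-day moving average flattens the no-reporting-on-Sundays artefact; a sketch in Python with pandas (file and column names illustrative):

import pandas as pd

cases = pd.read_csv("daily_cases.csv", parse_dates=["date"])
cases = cases.sort_values("date")

# 7-day moving average smooths day-of-week reporting artefacts
cases["cases_smoothed"] = cases["new_cases"].rolling(window=7).mean()

# Coarser granularity: aggregate towns up to county level
by_county = cases.groupby(["county", "date"])["new_cases"].sum().reset_index()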
What if we have the test results for the individuals but do not have the demographic
data (e.g. patient race or sex) from the same source? This is the next big challenge of
missing features. In projects that use unstructured data, this may manifest as having not
enough training examples of a particular type, such as lack of non-Caucasian pictures
in developing AI applications for facial recognition [13]. Missing or inconsistent values
in the data fields are the next challenge for clean-up. We have discussed some additional
ideas on addressing the data preparation problems earlier in the “What Happens in the
Real World” section.
Merging (Combine)
When data from different sources are combined, various problems arise with data
merging. The first task is to figure out the relationships between the data sets to get the basic
story straight. Examples are:
• Joining: Linking rows across tables with unique identifiers (called ‘key columns’)
that make it possible to expand the number of features in a rigorous fashion. Without
the key columns, this is simply not possible. An example is a patient ID in hospital records shared across different processes such as billing, physicians' records and appointment scheduling.
• Normalisation: The same entity may be called by different names in different
data tables. For example, the column 'Company' may contain three names: International Business Machines, IBM and IBM Corporation, all for the same company. The merging process will typically create a new column called "Normalized Company" to hold the normalised name, maybe just 'IBM'. We will have a similar problem when we have textual descriptions from different sources involving the same entity using different names; this is called "Entity Resolution" in NLU. Both joining and normalisation are sketched below.
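A minimal sketch of both steps, assuming pandas and illustrative tables:

```python
import pandas as pd

billing = pd.DataFrame({"patient_id": [101, 102], "amount": [250.0, 80.0]})
visits = pd.DataFrame({"patient_id": [101, 102], "physician": ["Dr. A", "Dr. B"]})

# Joining: the shared key column lets us expand features rigorously.
merged = billing.merge(visits, on="patient_id", how="inner")

# Normalisation: map name variants of the same entity to one canonical form.
aliases = {"International Business Machines": "IBM", "IBM Corporation": "IBM"}
orders = pd.DataFrame({"Company": ["IBM", "IBM Corporation",
                                   "International Business Machines"]})
orders["Normalized Company"] = orders["Company"].replace(aliases)
```

Real entity resolution is rarely a simple lookup table; the sketch only shows where the normalised column fits.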
Augmentation (Enhance)
Sometimes the available data needs to be enhanced to meet the needs of the AI algorithm.
In supervised ML (see Chapter 3), in the training data we need the inputs as well as the
corresponding output labels. An example is identifying objects in images. So, we need to
label the object that goes with each image for the algorithm to use. This is typically done by
humans to start with, but once the algorithm learns how to do this with enough accuracy,
labelling can be fully automated or semi-automated with partial human validation. A similar
need exists in NLU where we need to identify the various entities and their relationships
from reading unstructured textual documents. Initially these had to be manually anno-
tated by humans for the machines to learn from. Recent advances in Natural Language
Processing [14] have made it possible to preprocess the text to do multi-level automatic
feature representation learning. In business domains where the language models are dif-
ferent from the popular language corpuses, domain-specific manual annotations may still
be needed. If the annotations are done by a group of individuals (e.g. crowd-sourcing via
Amazon Mechanical Turk), there needs to be enough checks to make sure that the labels
by different annotators are consistent and correct. It may also be necessary to add synthetic
data where it makes sense (see earlier discussion in this chapter).
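One simple consistency check is inter-annotator agreement. A sketch, assuming scikit-learn and two illustrative annotators labelling the same handful of items:

```python
from sklearn.metrics import cohen_kappa_score

annotator_a = ["cat", "dog", "cat", "bird", "dog", "cat"]
annotator_b = ["cat", "dog", "dog", "bird", "dog", "cat"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Inter-annotator agreement (kappa): {kappa:.2f}")  # 1.0 = perfect agreement
```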
Modeling (Use)
The goal of the modelling task is to create the best model with the available data. You have
to be careful to make sure that the training data represents the operational data in its feature dis-
tributions. There are two aspects to modelling:
i. Try different models with the available data and pick the one that gives the best per-
formance in terms of metrics of relevance.
ii. Identify if the model performance can be improved by either additional quantity or
improved quality of data. This needs careful debugging of the model performance and a judgement call on asking for additional data.
Cross validation is a technique that uses different subsets of the training data to build and
validate models during the modelling process to give a better view of the quality of the
model created.
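As a sketch of cross validation, assuming scikit-learn and a stand-in dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Five different train/validate splits give a distribution of scores,
# a better view of model quality than any single split.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```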
Testing (Check)
The primary goal of testing the AI system is to gain confidence that the model generalises to inputs not seen during model building. This is best accomplished by using
a ‘hold-out’ test data set that represents the operational data well and evaluating the model
performance. To be deployed in practice, when tested with feature values beyond what is in
the training set, the application should recover gracefully, even if the model fails to generalise.
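A minimal sketch of the hold-out evaluation plus one graceful-recovery guard; scikit-learn and a stand-in dataset are assumed, and the guard is only one illustrative policy:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)        # hold out 20% for final checks
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Hold-out accuracy:", model.score(X_test, y_test))

lo, hi = X_train.min(axis=0), X_train.max(axis=0)

def predict_with_guard(x):
    # Recover gracefully when a feature value lies outside anything seen in training.
    if np.any((x < lo) | (x > hi)):
        return None                               # defer to a fallback path or a human
    return model.predict(x.reshape(1, -1))[0]
```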
DevOps (Deploy)
From the data governance perspective, DevOps activity needs to make sure that the ver-
sions of the model and training data are tracked and documented, in case the model per-
formance in the operations requires a diagnosis of the modelling process. If the model
performance degrades during operations, it may be necessary to regress to an earlier model
version with confidence. Lack of a clear definition of a ‘defect’ in AI systems due to their
statistical nature challenges traditional notions of success and failure in integration testing
and deployment. Running automated tests prior to deployment has to be complemented
with some debugging activity.
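The bookkeeping can be as simple as an append-only registry recording, for every deployment, which model and data versions produced it. A sketch with illustrative field names:

```python
import json
import time

entry = {
    "model_version": "claims-clf-1.4.2",                      # illustrative names
    "training_data_snapshot": "snapshots/claims_1.4.2.parquet",
    "previous_version": "claims-clf-1.4.1",                   # rollback target
    "deployed_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
}
with open("model_registry.jsonl", "a") as f:
    f.write(json.dumps(entry) + "\n")                         # append-only audit trail
```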
Operations (Evolve)
In AI systems, monitoring is not an option, but a required activity. This is because the AI
model behaviour can drift with time due to changes in the input data and the data dis-
tributions in the deployment. The specific details on what to monitor and how often will
depend on the specific use cases and business goals. One common challenge with auto-
mated model evolution during deployment is that the data collected during deployment is
not ‘labelled’ (i.e. the correct output is unknown) and hence not directly usable for model
retraining. Techniques such as active learning [15] can be used to get users to provide the
labels directly; semi-automated approaches combine automated labelling and human vali-
dation. As the Microsoft Tay bot example demonstrated [16], continuous model learning
has its own challenges in validation.
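One common label-free check is to compare the live distribution of a feature with its training distribution. A sketch, assuming SciPy and synthetic stand-in data:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(0.0, 1.0, 5000)   # distribution seen at training time
live_feature = rng.normal(0.4, 1.0, 500)        # recent inputs, slightly shifted

stat, p_value = ks_2samp(training_feature, live_feature)
if p_value < 0.01:
    print("Possible input drift: flag for review and potential retraining")
```

The threshold, the features monitored and the response are all business decisions, not statistical ones.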
Data Governance
As should be obvious from the descriptions of the data workflow activities above, data governance is a critical activity in an AI project. It has to begin even before a project starts
and last as long as the AI application is deployed. It covers business aspects (data regulations,
licensing constraints, etc.) as well as the technical aspects (the data infrastructure, tools, test-
ing and development practices, etc.). Here are some critical areas of data governance:
• Data provenance: the quality of the AI applications critically depends on the veracity
and quality of the data used in the modelling. Data scientists typically use data from
wherever they can get it and do not question its trustworthiness. In the heady pursuit
of a performant model, they can often make ad hoc assumptions and data transfor-
mations in the process, which are not recorded for reproducibility.
• Trust: it is now well known that business data contains implicit and explicit biases.
AI applications bring them to the foreground. There needs to be a clear recognition
upfront on the business priorities with respect to the importance of bias and fairness
in decisions and the requirements on the explainability of the AI outputs. The ability
to check data quality across the entire data workflow is essential to meet these objectives (a minimal check is sketched below).
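As one minimal, hand-rolled example, here is a disparate impact check computed directly with pandas on illustrative data (no particular fairness toolkit is assumed):

```python
import pandas as pd

decisions = pd.DataFrame({"group": ["A", "A", "A", "B", "B", "B"],
                          "approved": [1, 1, 0, 1, 0, 0]})

rates = decisions.groupby("group")["approved"].mean()  # favourable-outcome rate per group
impact_ratio = rates.min() / rates.max()               # values well below 1 warrant scrutiny
print(rates.to_dict(), "impact ratio:", round(impact_ratio, 2))
```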
Data tools for AI are still in their infancy, yet they are critical to help with the data workflow tasks shown in Figure 7.11. Gradual progress is being made in this area.
Third, it’s important to fix the problem at the source as well! Often data cleansing is seen
as a one-off exercise when, in reality, it never ends. It’s always frightening to see require-
ments for new data repositories because they invariably include some sort of vision state-
ment that the new repository will bring together all the historical and legacy data into
one single new, all singing, all dancing repository. Unfortunately, if the user input at the
source isn’t simultaneously fixed, when the data in the new repository goes live, it too
becomes a legacy repository. When cleansing existing data, it is imperative that the sources
of that data are understood and actions put in place to improve the ongoing quality of
data received from those sources. For example, if source data is often incomplete or inac-
curate then perhaps the user experience of the source data system needs to be reviewed to
improve the quality of data capture.
REFERENCES
1. “What happens in an internet minute?” https://lorilewismedia.com/.
2. IDC Global DataSphere, Forecast: 2021–2025 The World Keeps Creating More Data - Now,
What Do We Do With It All? (IDC #US46410201, March 2021).
3. List of datasets for machine-learning research: https://en.wikipedia.org/wiki/List_of_
datasets_for_machine-learning_research.
It’s amazing … we put a small team of grads on this and in just a few days they achieved
so much. They took some data, used a Cloud based AI and built an image classifier that
could really transform our business.
The Client was right to be impressed because the graduates had done a great job and the
solution that they had demonstrated was impressive.
Three years later, the capability that the grads built still isn’t being used operationally.
James Luke
DO WE NEED AN INVENTION?
For many years, engineers have joked about boxes in architectures labelled “clever stuff
happens here”. By definition, an AI project is going to need a box in which clever stuff hap-
pens. However, there is a massive difference between “clever stuff” and “impossible stuff”.
When evaluating potential AI projects, it is important to understand whether the “clever
stuff” required already exists or whether it needs to be invented.
We are most definitely not telling you never to attempt a solution that requires an
invention. Quite the opposite! The Apollo mission, Bletchley Park and the Manhattan
Project were all massive programmes that required inventions in order to achieve their
objectives. In today’s context, the development of driverless vehicles also fits into this
category. Work on driverless vehicle technology has been underway for at least two
decades and has resulted in many inventions. We think projects on this scale are incred-
ible and should definitely be attempted … as long as you can meet two critical criteria.
First, you need to appreciate the scale of your undertaking and be willing to resource
accordingly. You need to understand that the project may take 30 years and may cost hun-
dreds of millions. Second, you need a top-notch leadership team to work with you and you
should consider the authors of this book!
Seriously though, the scale of the invention required to deliver a solution will naturally
determine its likelihood of success. This is often a judgement call so it is important to bring
in experienced technical experts to assess the level of invention required and to ensure you
have the right people, resources and organisation in place to maximise your chances of
success. Given the seven decades of research and development, let us explore the current
state of AI in the next section.
CURRENT STATE OF AI
Neural versus Symbolic
Much of the excitement in the last decade in AI is due to the advances in statistical learning
from data using Artificial Neural Networks (ANNs), particularly Deep Neural Networks
(DNNs). The idea that machines can ‘learn’ on their own from data without human inter-
vention has been proved beyond doubt. Due to their statistical nature, these systems
learn from large amounts of data and perform narrow tasks very well. The knowledge in
these ‘neural’ systems is captured in internal representations of the statistical models and
not easily understandable by humans. This is in contrast with the expert systems of the
last century (discussed in Chapters 2 and 3) that were ‘taught’ explicit human-created,
machine-readable rules and knowledge in any specific domain. These are called ‘symbolic’
systems. Since these rules and knowledge representation were created by humans, it was
easy for humans to define and understand the behaviour of such systems. Logical reason-
ing using human-friendly concepts came naturally with symbolic systems.
The reality we face today in AI is that we have centuries of knowledge captured in
human-friendly ‘symbolic’ form (i.e. languages, documents, etc.) and we do not know how
to integrate this knowledge with the ‘neural’ models that learn statistically from data. Key
point on the ‘neural’ learning is that it is based on statistical properties of data and not
based on human-level abstractions. For example, if you type ‘bathroom’ in a search engine,
it will suggest various options, ‘bathroom vanities’, ‘bathroom ideas’, ‘bathroom remodel’,
etc., based on probabilities derived from the data from millions of people who have used
those combinations in their searches. Believe it or not, the AI behind the search engine actually does not know anything about a bathroom: that it is one of the rooms in a house where people take baths, or that it has a sink, tub, shower, toilet, cabinet, etc. These are the attributes that would be captured in a symbolic knowledge representation of a bathroom. So, the challenge
today is how to combine ‘neural’ aspects of statistical learning with the ‘symbolic’ aspects
of human knowledge and reasoning. Such a ‘neuro-symbolic’ AI [2] will be able to do a lot
more than what is possible now.
Examples of AI Challenges
However, AI has a long way to go before it can come close to emulating true human intel-
ligence. As in many other fields, there are still some fundamental problems that have not yet
been solved! It is fair to say that AI has fallen between two different approaches: the 'Symbolic AI' of prior decades, which supports compositionality and the construction of cognitive models but is inadequate at learning, and the 'Neural AI' of the past decade, which is very good at learning but poor at compositionality, abstraction and building cognitive models of the world.
Our current AI technology works well in carefully scoped environments on constrained
data and feature spaces. Here are some examples where the current AI techniques have
trouble. For a more detailed discussion, we recommend the book Rebooting AI by Gary
Marcus and Ernest Davis [1].
• Data Needed: As we have emphasised in various places in this book, due to their
statistical basis, Deep Learning (DL) techniques need data in sufficient quantity and
quality to perform well. Most often, you discover the data inadequacy only after sufficient experimentation in a project.
• Long-Tail Problem: Algorithms do well with common items for which there is a lot
of data but have difficulty with the large number of rarer items that have very little data.
This is the typical scenario with enterprise AI.
• Proclivity to Errors: DL systems are based on the concept of statistical learning to
create the underlying models. Inherently, even the best optimised models will inevi-
tably make mistakes on some fraction of outputs. The enterprise has to decide what percentage of errors is tolerable to the business and how to manage the risk when, not if, the errors occur.
• Objects in Unusual Poses or Surroundings: The training data for images of objects
typically come from their natural poses and surroundings. In testing or actual use,
algorithms have trouble identifying the same objects, if they are placed in unusual
surroundings or in uncommon orientations (e.g. a school bus lying on its side or a
diagonal view of a cube).
• Synthesis of Concepts: Even the most advanced natural language applications are
looking to match exact sets of words or synonyms and their 'statistical closeness' based on correlations. They cannot synthesise information from many documents that do not use the exact same combination of words or synonyms. An interesting example is the difficulty of listing the seven Horcruxes from just reading the Harry Potter books, where the seven never appear together in the same passage, nor are the items explicitly identified as Horcruxes in sufficient proximity.
• Common Sense: Machine learning (ML) techniques do not have the common sense
understanding of relationships between real-world objects and so they cannot follow
a chain of inferences that are implicit. The example from [1] is: "Elsie tried to reach her aunt on the phone, but she didn't answer." There are many challenges for statistical algorithms here: (i) 'reach' has many different usages in practice beyond communication;
(ii) the aunt is not physically 'on' the phone, which is the common usage of 'on' to specify the relative positioning of objects; and (iii) 'she' refers to the aunt, because Elsie cannot call herself. There is really no model of the relevant world in DL.
Information Extraction
This is an essential task that is required before implementing many of the other applica-
tions. Given a corpus of unstructured text (e.g. PDF documents, Wikipedia, etc.), how
do we extract explicit or implicit information from it? Commonly extracted information
includes named entities and their relationships (e.g. companies & their CEOs), events
and their participants (e.g. Wimbledon and Roger Federer) and temporal information
(e.g. World War II events, dates and relative order). The outputs are typically stored in
relational databases for efficient retrieval and analysis.
Search
The purpose is to help the user find the information most relevant to the input words
as quickly as possible and rank the retrieved items by relevance. You can think of your
favourite search engines.
Interactive Bots
A user inquiry may need an extended dialogue and context for the system to provide the right response. Since questions in an open domain (i.e. any arbitrary topic) are difficult to answer,
commercial bots are trained in specific domains/skills (e.g. banking) and/or specific tasks
(e.g. money transfer).
Text Classification
This is a popular application of NLP in business where the AI classifies the input free-text
documents into predefined categories. Imagine a news agency classifying the various
streams of incoming news items into Politics, Sports, Culture, Weather, etc.
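As a sketch, assuming scikit-learn and four illustrative headlines; real classifiers need far more training data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["Parliament passes the new budget", "Team wins the championship final",
         "Gallery opens a new exhibition", "Heavy storms expected this weekend"]
labels = ["Politics", "Sports", "Culture", "Weather"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["The championship team celebrates"]))  # overlapping words drive the call
```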
Text Generation
Here are four examples of NLP tasks that require the generation of human-like language.
• Given non-textual data (e.g. images), the goal is to create text (e.g. captions) that
represents the contents of the data.
• Extractive Summarisation that finds key elements in textual documents and pro-
duces a summary of the most important content by selecting and rearranging text
taken directly from the documents.
IBM’s Project Debater [2] is a recent example of advances in NLU for a live debate with
a human.
REFERENCES
1. D. W. Otter, J. R. Medina, and J. K. Kalita, “A survey of the usages of deep learning for natural
language processing,” IEEE Transactions on Neural Networks and Learning Systems, 32(2),
pp. 604–624 (2021).
2. N. Slonim, et al. “An autonomous debating system,” Nature 591, pp. 379–384 (2021).
• Dealing with Negation: ML techniques have trouble with understanding the general
usage of 'not'. A search for 'a restaurant that is not McDonalds' brings back a bunch of
references to local McDonalds!
• Composition: ML techniques do not have the concept of composition, i.e. bringing
various parts together to make a whole; you compose the image of a cat from a head, feet, body and tail, and the head is composed of eyes, ears, mouth, etc. The current DL systems get rattled by changes to a few pixels of an image and mislabel an object, even though to a human it looks just fine [6].
• Causal Reasoning: ML cannot do causal reasoning [7]. As a human, you can look at
a situation and decide that it is dangerous. You know that certain objects are heavy or
hard and that they will cause harm if they come into contact with humans or animals
with a high level of force. You know that certain situations cause objects to fall, or
to be catapulted, at high speed. You know that other circumstances make floors and
walkways slippery. You are able to look at a structure, such as a ladder leaning against
a wall, and decide that the setup does not look very secure. You can fuse together all
these different scenarios and decide that what is happening is dangerous and that it
would be safer to move away from the scene. The current breed of AI technology is
not able to perform that type of prediction.
• Ambiguity in Language: One of the reasons it is so hard for machines to understand
human language is that it is packed with ambiguity. Even something as simple as
a word can have multiple different meanings depending on the context. The word
‘bat’ could refer to either a flying mammal or a wooden instrument used in play-
ing baseball or cricket. As human beings, we have an incredible ability to resolve
these ambiguities by taking into account the context and using our extensive human
knowledge (and perhaps some common sense) to figure out what was really meant.
Even then, a phrase such as “Mary was watching the Yankees game when she was
unexpectedly hit by a flying bat” could be interpreted either way. The recent advances
with pretrained models from massive data corpuses in the public domain [8,9] can often give people the impression that AI is very close to 'understanding' natural language. Unfortunately, there is a long way to go before that happens, if at all.
If an application that normally rejects 25% of claims suddenly starts rejecting
40% of claims, then perhaps something has changed. It may be that there has been a change
in one of the data sources or something in the environment is causing a higher number of
claims. A monitoring framework is essential in identifying unexpected changes in behav-
iour so that the behaviour of the application can be assessed and, if necessary, corrected.
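A monitor for the claims example above can start very simply; the baseline and tolerance here are illustrative and would come from your own operational history:

```python
def check_rejection_rate(decisions, baseline=0.25, tolerance=0.05):
    rate = sum(d == "reject" for d in decisions) / len(decisions)
    if abs(rate - baseline) > tolerance:
        print(f"ALERT: rejection rate {rate:.0%} vs expected {baseline:.0%}")
    return rate

check_rejection_rate(["reject"] * 40 + ["accept"] * 60)   # 40% -> raises an alert
```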
If a completely new business process is being defined, then the standard business change
considerations apply. However, in the case of an AI application, it is critical to ensure the
AI is accurate enough for the business process to be effective. This requires careful defini-
tion of the AI evaluation and, in particular, it’s important to ensure that the data used for
the evaluation is representative of the data that will be encountered in the new business
process. AI applications are massively data sensitive and even small changes in data can
have a dramatic impact on accuracy.
In Chapter 5, we also talked about the business motivations and the basic requirements
behind creating Trustworthy AI. Implementation of these requirements needs adoption of
appropriate ethical principles and careful evaluation and selection of the relevant technol-
ogy approaches. Here is a table summarising the popular trust requirements and useful
references for help with implementations.
A key, but often overlooked, task once you have put an AI into production (and remembered to put monitoring infrastructure in place) is to establish what you will do if your AI starts to underperform. Knowing your AI is failing is only half the story; you will need a plan "B". You could keep an army of experts, just in case, to take over with manual routines, but in reality that's not practical. The best strategy is to keep a small team of data scientists employed not just to monitor your production AI but to keep alternate models in training, evolving with the data, ready to deploy if the main algorithm starts to wane. A doubly good tip is to keep at least one of those models in a form that is explainable, even if it is sub-optimal. If your production AI is pulled due to a potential ethical bias, you will not be able to replace it quickly with another model that also cannot prove it isn't biased.
Similarly, in considering transparency and explainability, any application needs to include
appropriate tools to understand, and demonstrate to Stakeholders, why decisions were made.
These tools should include the DevOps capability to identify which model was in use when a
specific decision was made. For ML models, we should also be able to trace the training data
used to generate the model and the test data used to validate it. In cases where a decision
has been challenged, we should be able to identify all cases in the original test and training
data that are similar to the data on which the contentious decision was based.
AI IS SOFTWARE
In the enterprise context, the purpose of the AI is to do a specific task to support a business
application. Therefore, it is natural to expect that the invocation of the AI is through some
traditional application program interface (e.g. REST-based microservices). In short, it is just
another software component, albeit with some special attributes. Thus, from the system or
software management point of view, it has all the same expectations as any other software
component. Figure 8.1 shows the recommended system and software quality models and
attributes from the ISO/IEC 25010 process standard [11]. Even though the specific inter-
pretation may have to be refined for the purpose of AI components, the utility of the basic
structure is immediately evident. The quality-in-use attributes (in the left column), i.e. effectiveness, efficiency, satisfaction, risk and context coverage, do represent the relevant dimensions for consideration. The inclusion of 'Trust' under the 'Satisfaction' category is fortuitous
in hindsight, since it has taken a more profound meaning for AI components. The product
quality attributes on the right are essential for product owners. Notably, the common met-
ric used by AI algorithm owners is accuracy, which relates to 'Functional Correctness' in
Figure 8.1, and it is only one of the more than two dozen attributes in the ISO standard. It is
important to evaluate an AI component against these attributes to understand the requirements they place on its use in a software system.
FIGURE 8.1 ISO/IEC 25010 system and software quality models [11].
When the maintenance of an application is undertaken by a new set of developers, there is a good chance that they will feel it is easier
to start from scratch. Let us face it, no one wants to debug someone else’s code!
Despite the challenges of reuse, the software industry has progressed a great deal in
recent years. Whereas systems developed in the late 1990s were largely bespoke, more
modern applications are increasingly built using Commercial-Off-The-Shelf (COTS) prod-
ucts such as databases, application servers and message brokers. The advent of Cloud-based
services encompasses a huge level of reuse based on publicly available application program
interfaces (APIs). The growth of Open Source Software (OSS) has also introduced a new
twist. Since the development of OSS is visible to the community, the utility and quality of
the resulting software can be judged directly. There are numerous examples of successful
OSS projects in the last two decades: the Linux operating system, the Mozilla Firefox browser and the Apache Tomcat web server, to list a few. OSS licenses allow full-scale commercial use of
software at low costs, and hence, they are very popular in the industry. The reuse of open-
source components and libraries for specific purposes has also become very common.
Reuse in AI Applications
Due to significant differences in the processes for developing AI applications versus tra-
ditional software applications (to be discussed in more detail in Chapter 10), reuse in AI
deserves some discussion. Whilst AI software components may be generic, AI applications
are highly specific. It is important to remember that enterprises want to solve business
problems and not just invest in AI technology. When buying or building an AI applica-
tion, an enterprise may ask for a fraud detection component … or an image classifier …
or a speech-to-text transcription tool … or a language translation service … or any of the
other thousands of potential AI components. When procuring such components, it is only
natural for Stakeholders to expect reuse of existing capabilities. Someone working for a
pharmaceutical company may see a new report about a tool being used to translate chem-
istry papers for a university. It is only logical, in such circumstances, to expect the tool
that works on chemistry papers to work on the pharmaceutical content. Unfortunately, AI
technologies are very, very sensitive to data, and even small changes in the data can have a
big impact on accuracy.
Figure 8.2 shows a high-level decomposition of AI application development from a reuse
point of view. The boxes in grey represent traditional software components that may also
be a part of an AI application and their reuse opportunities are guided by the general soft-
ware discussion we had above. The orange boxes represent AI-specific topics we discuss
below in more detail.
AI Applications
Application needs are generally unique within an industry (e.g. banking, insurance, man-
ufacturing, etc.), and even companies within the same industry typically strive for differ-
entiation. There are still some AI applications that can be reused within a given industry or
even across industries with some minimum customisation of data and workflow. The obvi-
ous example for these is a Cognitive Assistant or Service Bot. AI agents to help clients with
questions on information about specific product offerings, Frequently Asked Questions
(FAQs), ordering, active online support, etc., have become very common [12].
AI Models
There are two scenarios for reusing AI models: (i) reusing a model within the enterprise and
(ii) reusing a model from an external source such as open source or a commercial vendor.
Within the enterprise, ideally, it should be possible to use an AI Model from one applica-
tion directly in another similar application. In reality, even if the applications are similar,
it is highly unlikely that an AI Model will be directly reusable unless the data being pro-
cessed share at least some features and their distributions. One possible solution to this is
the use of Transfer Learning which we described in Chapter 3. Transfer Learning is a pow-
erful capability; however, the key point still applies. It is incredibly rare for an AI Model to
be re-usable in different, albeit similar, applications without some form of revision.
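As a sketch of what Transfer Learning looks like in code, assuming TensorFlow/Keras; the choice of pretrained model, input size and binary task are illustrative:

```python
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False                       # keep the generic learned features frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # new head for the new task
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(your_images, your_labels, epochs=5)       # still needs your own labelled data
```

Only the small new head is trained here, yet the point above stands: even this kind of reuse needs task-specific data and validation.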
For the scenario of using an external source, many AI capabilities are generally avail-
able as generic components, typically as cloud services. Many commercial vendors such as
Google, IBM, Microsoft and Amazon provide access to commercial services (e.g. speech-
to-text, text-to-speech, sentiment analysis, image recognition, etc.) supporting AI applica-
tion development.
Data for AI
Given the importance of data in an AI project, it is reasonable to expect to reuse data as
much as possible. Consider an insurance company that offers life, home and car insur-
ance. An AI model developed for some aspect of life insurance underwriting will almost
certainly not be re-usable in a car insurance process. However, the customer data may be
directly transferable. This core data will, of course, need to be complemented with other
data such as health data for life insurance or vehicle data for car insurance.
As we had mentioned in Chapter 7, public data sources available in unstructured and
structured formats [13] include images for various purposes, videos for action recogni-
tion, text data (news articles, messages, tweets, etc.) and structured multivariate data in a
tabular form in various domains (financial, weather, census, etc.). However, some caution
is needed to avoid potential poisoning of the publicly available data by adversaries.
AI Lifecycle Tools
The lifecycle of an AI application is strongly affected by the demands of the ML model(s)
being used. Due to the significant differences from traditional software components, the
creation and sustenance of ML components require a wide range of tools across the life-
cycle. We refer to [14,15,16] for current approaches to manage AI model lifecycle. The tool-
set covers the various phases in an AI project, i.e. Data Preparation, Model Building &
Validation, Testing & Verification, Imbuing Trust, Deployment and Monitoring. We will
discuss more details of the engineering lifecycle in Chapter 10.
It is well documented that ML applications carry significant technical debt [17]. Due to the lack of clear functional boundaries,
the behaviour of a ML component can be summarised as “Change Anything – Changes
Everything”. Data dependencies are more costly than code dependencies. Systems can also
become overly complex due to glue code needed to support various models, data process-
ing and ubiquitous experimentation. Maintenance is expensive due to inevitable changes
in data and models with time. Amershi et al. [18] reported on a study of software teams at
Microsoft developing AI-based applications. They had three key observations:
i. Managing the AI data lifecycle is harder than in other types of software engineering;
ii. Model customisation and reuse require very different skills that are not typically found in software development teams; and
iii. AI components are more difficult to handle than traditional software components
due to the difficulty of isolating error behaviour.
Another study at Microsoft by Kim et al. [19] reported on the technical and cultural chal-
lenges of integrating data scientists into software development teams.
From the description above, it should be obvious that while bringing some exciting new
capabilities, using ML in AI applications also brings significant risks. An enterprise needs
to have sufficient knowledge and experience to develop AI applications. Sharing lessons
learnt and the practical aspects of deploying tools and practices across AI project teams
can add considerable value to the enterprise.
THE AI FACTORY
One approach to maximise reuse opportunities is to see AI as a cross-department issue
and take a “factory” approach. The AI Factory is a combination of skills (otherwise known
as people), assets, tools and processes aimed at improving the efficiency of AI delivery. An
AI factory will help ensure you have a consistent approach and rigour across all your AI projects in your business; this will dramatically reduce your maintenance burden and risk. An AI
factory needs a blended team of data scientists, traditional IT folk and most importantly
business domain experts.
Getting started on an AI journey can be challenging for many enterprises simply
because of the resources required and the time taken to demonstrate value.
In order to evaluate the feasibility of an application, some form of evaluation environ-
ment is required. This may involve provisioning hardware or virtual cloud infrastructure.
Evaluations are complex and require both skilled evaluators and tooling if they are to be
conducted properly.
The AI Factory aims to build a core team of skilled resources supported by ready-provisioned infrastructure and tools to allow rapid evaluations: a team of people that can both build new AIs and maintain your organisation's growing stack of operationally deployed AI. The overall aim of this is to reduce the cost and time taken to evaluate ideas and
capabilities such that an enterprise can fail fast and deliver the benefits expected of an AI
programme.
An AI Factory requires several elements:
• Choosing the hardest problem for AI may not be the best strategy for success.
• Understanding the current state of AI technology is important to achieve higher
probability of success.
• There are some unsolved (hard) problems in AI … understand when your solution
requires you to solve one of them with a breakthrough invention.
• Domain specialists are critical for successful delivery.
• Understanding the impact of AI on the business process is important.
• Instrumenting trustworthiness requires careful implementation.
• Since AI is software, you also need to manage the traditional expectations of software
(e.g. maintainability, reliability, etc.)
• An AI solution needs a monitoring framework to make sure that it is operating in a
trustworthy fashion.
• Extensive tooling support is needed to manage the AI model lifecycle.
• Reuse of assets is possible, but it comes with a set of caveats.
• Transfer learning is a realistic alternative, but it depends on the closeness of applica-
tion domains.
• Any significant AI programme should be supported by an AI Factory.
REFERENCES
1. G. Marcus and E. Davis, Rebooting AI, Pantheon Books, New York (2019).
2. H. Kautz, “The third AI summer,” Robert S. Engelmore Memorial Lecture, AAAI 2020 https://
www.youtube.com/watch?v=_cQITY0SPiw.
3. T. R. Besold et al., “Neural-symbolic learning and reasoning: a survey and interpretation,”
arXiv preprint, arXiv:1711.03902 (2017).
4. A. d. Garcez and L. C. Lamb, “Neurosymbolic AI: the 3rd wave,” arXiv preprint,
arXiv:2012.05876 (2020).
5. S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Pearson (2020).
6. H. Xu, et al. “Adversarial attacks and defenses in images, graphs and text: a review,” The
International Journal of Automation and Computing, 17, pp. 151–178 (2020).
7. J. Pearl, “The seven tools of causal inference, with reflections on machine learning,”
Communications of the ACM, 62(3), pp. 54–60 (2019).
8. T. Young et al., “Recent trends in deep learning based natural language processing,” IEEE
Computational Intelligence Magazine, 13(3), pp. 55–75 (August, 2018).
9. X. Qiu, et al., “Pre-trained models for natural language processing: a survey,” Science China
Technological Sciences, 63, pp. 1872–1897 (2020).
10. R. K. E. Bellamy et al., “Think your artificial intelligence software is fair? Think again,” IEEE
Software, 36(4), pp. 76–80 (2019).
11. ISO/IEC 25010, “Systems and software engineering - systems and software quality require-
ments and evaluation (SQuaRE) - System and software quality models,” (2011) https://www.
iso.org/obp/ui/#iso:std:iso-iec:25010:ed-1:v1:en.
Many years ago, I was running a workshop for a major airline. These sessions can be
very dull, so I like to lighten the mood by telling the odd joke or teasing the audience. On
this occasion, I chose the latter! To get the audience fired up, I suggested that no matter
how funny or exciting or glamorous pilots were, deep down they were really quite bor-
ing people. In fact, I said that the average airline pilot was the sort of person who would,
even if they had driven over a hundred miles from their home, immediately rush back
home if they suddenly thought they’d left the oven on.
At that point, a pilot in the audience stole the show! He didn’t get upset nor did he send
any abuse back in my direction. He just calmly said, “James, I would know that the
oven wasn’t left on because, before leaving the house, I complete my checklist”.
James Luke
This chapter represents the culmination of the contents presented in the previous chapters.
We expect you to have already gone through Doability Method Step 1 in Chapter 4 and
passed the Artificial Intelligence (AI) "AI/Not AI" decision diagram (Figure 4.1). Here, we
introduce “Doability Method Step 2” also called “Doability Matrix”. The goal is to help
your AI project get off the ground on a successful path and keep it there. In case you chose
to come here directly from Chapter 4, we believe you will still benefit from the methodol-
ogy described below. You can always go back to the other chapters for clarifications and
more details on business value and doability. We recommend that you read this chapter
carefully since this can stop your project from crashing and help you build AI solutions that
people will want to use.
Whilst the evaluation of projects in terms of Value and Doability may be normal for
complex engineering projects (apart from the use of the word Doability), there are many
nuances introduced by the use of AI that need to be understood. An effective approach
is to conduct a Doability workshop with introductory AI education for all participants,
especially the business stakeholders, before you settle on one particular project. Use this
chapter and Chapter 4 to guide your workshop content.
Value Questions
Doability Questions
Each question below is answered Yes, No or Maybe.
Trust: 1. Once the system is operational, will you be able to prove that the AI is
working correctly?
Will it be technically possible to measure the accuracy (without the
ground truth of training data)? This is a real gotcha problem. It is so easy
to ignore this issue until too late in a project’s lifecycle, often a month or
two after “go live” and after all the developers have left the project.
Sometimes this problem is impossible to fix, even if you consider it early
enough! For example, if your AI recommends a drug to a patient, and the
patient dies, does that mean the AI failed or would the patient have died
anyway?
Data: 2. Can you guarantee the supply of the data into the future (e.g. do you
already own or license it)?
Everybody now knows that data is the new precious resource. If your AI
runs using somebody else’s data, the data owner may want some
compensation. Just because you found your data on the internet doesn’t
mean it is free. If someone owns the data that makes/made your profitable
AI work, they may feel they have some legal right to your income. If the
owner of the data stops supplying it, or changes its content, or regulators
stop you using it, your AI is unlikely to survive. Secure your data sources
before you go live.
3. Is your data labelled?
Having loads of data is great, but you still have work to do – this work can
eat into your projects budget, 80% of your development time could easily
go to data wrangling – make sure it’s in your plan. If your data lacks a
clear target variable (the thing you’d like your AI to provide), or if you
want your AI to interpret meaning from unstructured data (text, pictures,
audio, etc.), then most likely you are going to have to engage human
experts to start labelling the data so it can be used for training the AI.
Spoiler alert: this can be time-consuming and expensive.
4. Is the training data representative of the operational data?
Be worried, for your future in AI, if you find yourself reading this
question and asking why? Do not train your speech recogniser on PBS
newscasts and then be surprised when it cannot decipher high-school
playground chatter.
5. Is your internal development/testing environment representative of
your target operational environment?
Whatever data conditions you created to train your model (cleaning the
data, aggregating it, cross-referencing it to list data, etc.), you will need to
recreate that for each and every record at run time. This can be a problem
if you have developed your lab prototype on a supercomputer and expect
it to operate on a cell phone.
6. Does the training data for the AI have all the information that a
human would use to do this task?
This bear trap is not always set; there will be exceptions (typically when
you have millions of rows of data with limited variation), but those are
very rare. For all your other AI projects, if the decision task needs
information that is not in the data (e.g. the side-effects of a medicine not
just its name), you are going to have to bake enough of that data into your
training and operational data to help the AI differentiate in the same way.
This will mean merging the reference data with the individual training
data records. Remember (as per question 5) you are going to do that same
data merge in the production environment as well.
AI Expectations: 7. Is the task for the AI simple enough that it does not need the skills of a
very experienced human?
It is not that you shouldn’t go ahead with your project. Just be aware it is
unlikely to be cheap or quick. Using AI to replace a boring, repetitive task
is a great idea; using AI to replace a complex task, one that takes a human many years to master, is rarely easy. This is especially the case if the human
has to deal with rare anomalies (sparse training data examples) and/or
needs common sense/external context (no training data).
8. Are you confident that your idea does not require an invention?
Inventions take time to create and test and they may not work. They rarely
conform to business application delivery schedule. You want to build an
AI to find a cure for cancer, fix that thing your best engineers couldn’t do,
or decode alien radio signals, or maybe select perfect soulmates … How
are you going to know the AI is fully trained and ready for deployment,
when you do not know what or if the answer exists? The only fallback you
have is to exhaustively test the results, which will be expensive, assuming
there is a test you can devise. Even then, how will you know it's complete
and will work for all cases… Expecting AI to do magic is not good
business; try to stick to fixing problems that can be measured.
9. Can the system be implemented without it needing any complex
situational awareness?
This is less of a bear trap and more of a money pit for your project. If your
AI needs to understand context or common sense for the given task or
requires situational awareness involving multiple data streams, it will need
a more complex architecture possibly including multiple AI components.
Managing one AI component is already difficult, managing multiple AI
components and the interactions between them is even more difficult and
therefore expensive.
10. Have you confirmed that the deployed AI will not feedback and
corrupt the data source you are using for training?
If you are using today’s share prices to predict tomorrow’s share prices,
you need to know your fun probably won’t last (unless you put a lot of
effort into re-training new models every day). The simple fact is that by
financially investing in those AI recommended shares, you will affect
tomorrow’s market price in a novel way, which your model will not have
been trained to recognise, and it will probably start to fail. You could
always build a model and not use it to invest, but what would be the
point?
11. Can the AI be tested without major changes to the existing business
processes and systems (e.g. run in parallel with existing systems)?
You have just created an AI to recommend online adverts to your website
customers. The problem is you have never sold these products before nor
do you have a way to see if the adverts are influencing the customer
purchases. Are they buying because of the adverts, or are they buying less
because the adverts are poorly targeted? Are they using the adverts info to
investigate cheaper deals elsewhere? The bottom line is, “how are you
going to justify your investment in the AI, if you can’t test its
performance”?
[Figure 9.1 plots Value (vertical axis, 0-10) against Doability (horizontal axis, 0-10). The labelled regions are the "Sweet Spot" (top right), "Ambitious Initiative" (top left), "Ideal for Training" (bottom right) and "Most Real Projects" (the broad middle).]
FIGURE 9.1 Doability matrix to place project ideas for comparison and assessment.
If you answer YES to all the questions in the checklist, your project is on solid ground
and has a high chance of success – Congratulations!
If your project elicits even a single unavoidable “NO”, then the score for that dimension
(i.e. Value or Doability) is reduced to ZERO (sorry, but it’s for your own good, no amount of
YESs can fix a NO). If you are familiar with the children’s game “Snakes & Ladders”, you
can think about the NO answers as snakes that always take you back to zero.
You can add some depth to your project evaluation by including “Maybe” as a third
response. This typically means you have more work (and thinking) to do before your project
can be operationalised. More Maybes mean more effort; the project with the fewest Maybes is better. We will return to specific nuances of "No" and "Maybe" responses later in this chapter.
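The scoring rule can be made concrete in a few lines. The mapping to a 0-10 scale is our illustrative choice; the zero-out rule is exactly as described:

```python
def dimension_score(answers):
    """answers: list of 'yes' / 'no' / 'maybe' for one dimension."""
    if "no" in answers:                    # a snake: any NO sends you back to zero
        return 0.0, answers.count("maybe")
    score = 10 * answers.count("yes") / len(answers)
    return round(score, 1), answers.count("maybe")

value = dimension_score(["yes"] * 9 + ["maybe"])     # (9.0, 1)
doability = dimension_score(["yes"] * 10 + ["no"])   # (0.0, 0)
print("Value:", value, "Doability:", doability)
```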
To give you a perspective of the broad possibilities of AI projects, we have identified
specific areas in Figure 9.1 in different colours with special names to indicate their essential
attributes. You can use Figure 9.1 to evaluate an individual project or compare competing
project ideas. The box you end up in will help identify the winners and losers; do not be sur-
prised when the Cinderella project wins!
Sweet Spot
Projects that are evaluated to be high Value and high Doability are in the top right-hand
box in green. These are rare. Enabling the scaling of a human task is a great place to start
an AI project. The best opportunities for AI are in scaling tasks that humans can already
perform. In this respect, consider two different types of scale.
The first type of scale is when there is a human-led task such as classifying images or
textual contents. If we can leverage the human expertise to train an AI, then we can do
the task on a massive scale with an AI. An example of this would be in the field of radiog-
raphy. A human being can evaluate an X-Ray and identify potentially malignant growths.
However, evaluating each X-Ray takes several minutes and so it’s only possible for a human
being to evaluate a relatively small number of X-Rays in a day, not to mention the potential
fatigue that could lead to errors. In short, the human can complete the task in a reasonable time but can't complete the required number of tasks.
SWEET SPOT
A great example of a “Sweet Spot” project was the classifier built for a military Client and
mentioned briefly in Chapter 5.
The Client organisation had a team of four Analysts who each spent their days clas-
sifying rows of data received via a live feed. Even though the dimensionality was high,
the rows of data were easily presentable in a spreadsheet, and there was a vast history
of labelled data produced by the Analysts. As a result, it was a relatively straightforward
task to extract training and test sets from the mass of available data. After we had built a simple
Machine Learning model using a set of training data, we were able to apply the model to
a set of unlabelled test data. It was a trivial task to evaluate the accuracy of the classifica-
tion, and the Analysts were able to confirm the effectiveness of the model.
We conducted this exercise just as the business was preparing to implement a major
change programme. The change programme would massively increase the volumes of
data received via the live feed. Our ability to prove the effectiveness of the Artificial
Intelligence convinced the leadership team that Artificial Intelligence was the key enabler
for the delivery of the change programme.
One of the biggest reasons that this project was so successful was that the Analysts
bought into the vision from day one. They saw immediately that the AI we were building
would NOT replace the Analyst. In fact, the Analysts would become the Data Scientists
responsible for building and maintaining the Machine Learning models. Rather than
replacing the Analyst, the Artificial Intelligence enabled them to scale and made their
role more interesting in the process.
Other factors also contributed to the success of the project. There was an existing
business process in place; we didn’t need to collect any additional data and/or face any
integration challenges. We weren’t asking the Client to change any existing processes or
systems and the business case was easy to demonstrate. We were able to test the AI very
easily using a simple extract of the data and with little additional effort from the Client.
The fact we could test and prove the new capability with so little effort and no immediate
operational impact placed this project well and truly in the “Sweet Spot”.
James Luke
The second type of scale is where a human can describe how to perform a function, but
actually completing a real-world task requires the chaining of that process many thou-
sands or millions of times. The most obvious example of this can be found in the earliest
applications of computing to code breaking. A human being can take an encoded message,
test a specific key and evaluate whether that key is the correct key to decode the message.
What a human being cannot do is repeat this process for all permutations of keys. Early
forms of AI exploited the ability of computers to scale simple functions to deliver AI appli-
cations in fields ranging from code breaking to scheduling algorithms. However, even with
brute force computation many of these applications were only made possible by detailed
algorithmic work to reduce the size of the search space and therefore the level of scaling
required. It’s easy to dismiss these applications as “not really AI”; however, in creating AI,
we need to leverage every advantage at our disposal. Whilst we should strive to avoid brute
force algorithms that cannot scale, it would also be unwise for engineers to ignore the
fact that computers are really good at doing simple tasks repeatedly. Murray Campbell’s
vignette on ‘Chess Programs, Then & Now’ in Chapter 4 on the future of chess playing
presents an interesting perspective on this point.
A classic feature of a “Sweet Spot” project is when you are evolving and enhancing
an existing business process. This is important because it means you will have easy
access to labelled data. Then you can extract training data from the operations, train
and evaluate machine learning models very easily. This will enable you to demonstrate
the operational value of the AI and have that value confirmed by a team of supportive
Analysts.
Unfortunately, it’s very rare that a project is blessed with a clearly defined business prob-
lem complemented with a large set of labelled historical data that can easily be used for
model development and evaluation. Sweet Spot scenarios are rare… especially when AI is
proposed to enable a completely new business process.
Deceptively Seductive
The opposite of the “Sweet Spot” is in the bottom left of Figure 9.1, … this part of the
matrix (in red) is referred to as “Deceptively Seductive”, representing those difficult proj-
ects of questionable value. The ideas for these come for two main reasons. (i) From the
perception that AI needs to be smarter than your average human and perform some bril-
liant task way beyond human capability. Quite often, the proponents will want to select
a problem to test the sophistication of the AI, rather than support a relevant business
task. (ii) The influencers in an organisation want to ‘do’ AI as soon as possible because
everyone else on the planet is ‘doing’ it. This forces the creation of AI projects without
due consideration of Business Value or Doability. These factors naturally pull you into
the “Deceptively Seductive” region. You have been warned; at all costs, stay out of the
red region.
Ambitious Initiative
The top left box (in gold) covers projects that are extremely hard to do, but of high busi-
ness value. This could be a research project that is hard to do because we don’t yet have the
technology or a project that requires a whole new infrastructure. We tend to think about
this box as the Manhattan Project or the Apollo Programme box. In the domain of AI, the
quest for driverless vehicles falls into this region.
These are not necessarily projects to be avoided as long as you have the vision, com-
mitment and resources to go for it. They are the types of projects that are undertaken by
a visionary with the conviction that the business return or societal impact is worth the
investment. To be clear, these projects are still worth consideration if you do two things …
first, invest properly to ensure success and, second, call us to join the project because big,
ambitious projects can be very exciting!
DECEPTIVELY SEDUCTIVE
Example 1: No Existing Business Process to Support AI
When a major bank decided to give online investment advice using a virtual assistant,
they were developing a completely new business process. Their existing business pro-
cess relied on human advisors visiting customers in their homes and having one-on-one
conversations. These conversations were not recorded and the advisors were threatened
by the prospect of their jobs being taken over by an AI. Without an existing business pro-
cess, there was a lack of data with which to evaluate the AI. The business case required
proof that the AI capability would work before the new process could be implemented;
however, without the business process there was no data with which to prove that the AI
could do the job. A lack of data and unsupportive Stakeholders meant that this was not
the right place to start an AI programme!
James Luke
Developing a project in this “Ideal for Training” region of Figure 9.1 may not revolutio-
nise your business, but it will give you the confidence to attempt more challenging projects
with greater expertise.
Be prepared to modify your business goals and project ambitions. It is our experience
that the projects that change early, to reflect the reality of the data, are the only ones that
ever succeed.
• The AI Project Assessment Checklist has 10 Value questions and 11 Doability ques-
tions covering five themes: Business Problem, Stakeholders, Trust, Data and AI
Expectations.
• The Doability Matrix can be used for assessing and managing risk in one AI project
or for prioritisation across a portfolio of ideas based on business value and underly-
ing technical risk.
• Chapters 5 and 6 addressed the topic of Business Value and Chapters 7 and 8 addressed
the questions on Doability.
• Any organisation dealing with a portfolio of business ideas for exploiting AI should
use Chapters 4 and 9 to prioritise and manage the risk and outcome.
Chapter 10
Some (Not So) Boring Stuff
Science is fascinating! Engineers love applying science to solve real problems and change
the world for the better. The science that is being done in research labs around the world
is essential in furthering our knowledge of AI. However, to deliver real AI that has a
real impact on the scale that society expects, we need to put a lot more effort into the
engineering.
In the previous chapters, we have already discussed how AI projects are different in many other aspects, ranging from ethics to stakeholder management. In this chapter, we
consider the fundamental differences in the engineering of AI applications compared to
typical enterprise software projects.
TRADITIONAL ENGINEERING
For many decades, engineers have been building complex systems such as automobiles,
trains, ships, planes and even spacecraft with amazing success. Each of these systems con-
sists of many subsystems, e.g. electrical, mechanical, engine, communications, sensors, etc.
Over the recent decades, the software content in these systems has been growing steadily
with increasing levels of functionality. Table 10.1 gives some examples of well-known sys-
tems/applications with the size of software in them [1].
This table gives a sense of the increasing importance and complexity of software systems
over a few decades. These increasingly complex software systems have been delivered using
traditional systems and software engineering practices [3-6] that have also evolved over
decades.
Figure 10.1 shows an example of a typical traditional software application lifecycle.
High-level business or system requirements (top left) are decomposed into functional (sub-
system) components and their constituent code modules that individual programmers
write. These program units must have clear boundaries and known expected behaviours
(i.e. specification of input vs. output) at the design time. Special attention is paid to the
appropriate user interfaces. Testing at various levels (unit, function, and application)
of integration is meant to expose any unexpected behaviour. The enterprise IT infra-
structure may include on-premise systems, private or public cloud options, which can
also support external calls to services from other sources or vendors. Suitable DevOps
(Development/Operations) processes facilitate a smooth deployment of the application into production.
[Figure 10.1 shows a lifecycle loop: a new project starts at Business Requirements, then proceeds through Functional Decomposition, Application Design, User Interaction, Functional Implementation (including coding modules and unit testing), Functional Testing, Application Integration & Testing, DevOps, Operations, Optional Monitoring and Maintenance, with Project Management at the centre and the cloud infrastructure supporting external services.]
FIGURE 10.1 Traditional Application lifecycle. A new project release cycle starts with Business
Requirements and ends with Maintenance, only to start the next cycle for the next release of the
application. Use of agile or iterative processes within the application lifecycle is not shown for sim-
plicity. The underlying infrastructure must support the needs of the in-house development activi-
ties and application deployment. Details are described in the text.
Optional monitoring tools can help with diagnosing performance issues in the running system. However, depending on the maturity, many systems can function quite adequately without such active monitoring. In short, monitoring is optional. The primary need for application maintenance is to address
the tickets raised by customers on the deployed application. Depending on their severity,
tickets are resolved as soon as possible or scheduled for fixes in the next planned release.
The frequency of software releases is managed by a process (i.e. agile, waterfall, etc.)
to meet the expectations captured by business value, pending customer requirements,
planned new functions, bug fixes, etc.
Any deviation from the expected software behaviour is the definition of a defect (bug), at the heart of any software quality management programme. Detection, diagnosis and resolution of software bugs are key activities across the software lifecycle. Data serves only as input to programs; programs and data are managed separately. The software behaviour is deterministic (i.e. we know the expected output for a given input) and the development team follows good design practices such as modularity, encapsulation and separation of concerns. There are tools to support various activities (e.g. code analysis, debugging, data flow, change management, bug tracking, test harnesses, DevOps, etc.). Even though there are still many challenges in executing complex software projects successfully [7,8], it is fair to say that there is ample evidence that we know how to build complex software systems with adequate quality for real use. You will see below why that may look ‘boring’ when compared to what we face in projects that include AI components.
• Project Estimation Is Harder: If you thought there was uncertainty in software effort estimation before [18], it just got worse! The safest approach is not to commit to a grand project until you have really proved the AI will work.
• AI Task Selection: The selection of the task to be performed by AI has to be done carefully, based on business requirements such as risk, required accuracy and availability of pertinent data (see Chapter 4). Data can be fickle: it doesn’t always hold the answers that you want, but it always holds something. Be prepared to alter your business plans based on what the data can actually do, not on what you wish it could do.
• No Specifications: Since machine learning (ML) algorithms learn from training data containing inputs and corresponding outputs, there is no need for a specification document, i.e. “Data is the new specification”. At first glance, this may sound like a good thing! The problem it creates is one of practicality. Since we have not written down the specification of the expected functional behaviour in terms of inputs and the expected outputs, we are completely at the mercy of ‘statistical learning’ for how the AI is going to behave. While this may not feel like good engineering, it can be empowering, if you are flexible in your business planning.
• No Simple Way to Define a Bug! Since ML models are statistically learnt, there is implicit uncertainty in the model outputs and, hence, they are not deterministic. That means you can get different outputs for the same input. Without a predefined behaviour captured in a specification, there is no clear definition of a software bug. There’s no magic way out of this one; you just need to make sure your application has monitoring and contingency built in from the start.
• Debugging Is Complicated! When there is an unexpected behaviour in the output (e.g. a wrong classification), figuring out the cause involves understanding the model performance and the training data. To make matters worse, if you are re-using a prebuilt AI model (e.g. via “transfer learning”), you may not have access to the original training data, which makes understanding why a model is failing practically impossible. It is no longer a matter of finding the relevant lines of program code responsible for the observed behaviour, since such code does not exist! Keep your original training data somewhere safe; you may need to go back to it to fathom a bug.
• Traditional Testing Will Not Work: Since there is no specification, there is no expected behaviour. Consequently, traditional testing approaches that rely on verifying expected outputs for specific inputs do not work. Making sure your training data is an accurate reflection (noisy, dirty, etc.) of the data your AI is going to meet in production is the best defence.
• Fault-Tolerant User Interaction: In applications such as Support Bots (aka AI Assistants) that require more natural, AI-mediated user interaction (e.g. text, speech or gesture interfaces), the interface needs to be tolerant of potential user errors and input variations (spelling mistakes, unknown accents, etc.). For any serious AI/human interaction application, you are going to have to factor in a parallel process with the ability to hand over to a human assistant seamlessly when the AI cannot cope. By analysing the failed interactions, you can improve the AI performance quickly, and the need for a human to take over will diminish.
• AI Models Can Drift over Time: Once deployed, AI model behaviour can drift over time due to previously unseen data. All AI applications need to be monitored closely after deployment to understand whether they are behaving appropriately, and there needs to be a plan for what to do if they’re not! In addition, they will increasingly need to stand up to the scrutiny of audit and be able to explain their actions. In the event of continuous learning during deployment, new patterns of relationships in the data can emerge unknown to the model owners. You can think of an AI deployed in a fast-changing business environment as a soccer player in a tough match with you as the manager; you are going to need a bench full of substitute players ready to take over if they become injured or start to underperform.
• Dealing with Bias: If the AI model learns from existing historical data, there is a
distinct possibility that any inherent bias in the training data or in the algorithm
will become visible in the model. Systematic approaches to detect and mitigate any
unwanted bias are necessary for the application to be trusted (see Chapter 5).
• Need for Explanations: Since the ML models are complex functions that are not
available to the users, they are just ‘black boxes’. Consequently, there is a critical need
for explaining the output to various stakeholders (see Chapter 5).
• Robustness of AI Models: There is overwhelming evidence that the outputs of AI models are susceptible to various types of adversarial attacks. A whole new discipline of how to attack and protect AI models is emerging. We discussed AI robustness to adversarial attacks in Chapter 5.
• DevOps Is Complicated: Deploying model changes from development to production requires versioning of both the models and the associated data. Instead of the traditional automated regression testing used to validate application behaviour before deployment, validation of the AI model is statistical, more complex and therefore more involved.
• Frequent Application Maintenance: In addition to the model drift mentioned above,
dependency on cloud-based AI services and continually improving AI capabilities
may warrant more frequent refresh of the underlying technology components for AI
applications. This can be difficult to manage in regulated industries (e.g. banking).
So, there you have it … as you can see, so many aspects of traditional application develop-
ment activities are affected by the use of ML components. In this chapter, we will describe
in some detail the various AI lifecycle activities and best practices.
Prototyping
If successful with the PoC, the next question is whether you can build an end-to-end application.
This phase will enable evaluation of all the broader issues that need to be considered. These
range from integration of the data pipelines for repeated use to the definition of the user
interface (if there is one). A prototype will allow you to test the feasibility of the solution in
an environment that is as close to the operational environment as possible. Often the pro-
totyping phase is timeboxed and designed to identify potential issues ahead of implemen-
tation, rather than actually address them. Whilst this helps to limit the duration and effort
of the prototyping phase, this can be a false economy. Deferring the resolution of issues to
the implementation phase is, in reality, kicking the can (and the risk) down the road.
Implementation
The goal of this phase is to build the actual application and deploy it in an environment
akin to the actual production environment. Uncertainty at this stage is often introduced
by scale and performance against realistic workloads. Whilst the PoC and Prototyping
phases were meant to reduce this risk, practical limitations (e.g. lack of resources, aggres-
sive schedule, etc.) may mean that it is not always possible. Typically, the risk manifests in this phase as the realisation that the real scope is substantially different from that assumed in the earlier phases; for example, the number of entities for extraction or the number of classes for classification turns out to be orders of magnitude larger! If the ‘trustworthy’ aspects of AI (e.g. bias, explanation, robustness, etc.) were not considered in the earlier phases, you will have a rude awakening here.
Monitoring
This is the phase when the operational application is monitored to ensure it is working
within its expected parameters and behaving as expected. Changes in the environment
and, most importantly, in the data being processed can have a massive impact on perfor-
mance. In the event that these changes break the AI Model, it will be necessary to spot
what’s happening, intervene and address the issue. This topic is discussed in more detail
later in this chapter.
In the following section, we will consider in some detail the tasks that are undertaken
in each of these phases.
[Figure 10.2 shows two linked loops: on the left, the AI model lifecycle (Acquire Data, Prepare Data, Build Model, Train Model, Test/Debug Model, Enable Trustworthiness, MLOps); in the middle, Integration & Deployment joining it to the application lifecycle on the right (Business Requirements, Functional Decomposition, Application Design, User Interaction, Non-AI Implementation & Testing, Application Integration & Testing, DevOps, Operations, Mandatory Monitoring, Sustenance); cloud infrastructure and external services above; and Project Management, Auditability/Explainability and Security as bars spanning all activities.]
FIGURE 10.2 AI Application Development with one AI component. Left side of the figure repre-
sents the AI model lifecycle and right side represents the AI application lifecycle. Inclusion of busi-
ness requirements in both lifecycles is key to the success of the AI project. Grey box in the middle
represents the integration of the two processes and deployment of the application for production
use. Three horizontal bars at the bottom, Project Management, Auditability & Explainability and
Security, touch all activities. Use of agile or iterative processes within the lifecycle is not shown for
simplicity. Details are described in the text.
The ML model lifecycle may go through many iterations before delivering a usable
model to support the specific AI task for the business application. Project planning must
include these false starts when deciding on resources and schedules. Also, in reality, any ML component may be a relatively small piece of the overall application. There are many
other non-AI functions needed to create the business application (e.g. user interface, access
management, customer relationship management, network connections, data retrieval,
traditional analytics, visualisation, etc.).
We will discuss the activities in Figure 10.2 in the following sections:
• AI Model Lifecycle
• Application Lifecycle
• Integration & Deployment
• Project Management
• Auditability & Explainability
• Security
AI MODEL LIFECYCLE
Acquire/Prepare Data
We refer to Chapter 7 for a detailed discussion of data related topics. Raw data acquisition
may involve licensing, security, privacy issues as well as proper data governance after the
acquisition. ML modelling currently requires large amounts of labelled data that may have
to be acquired from live user inputs or domain experts or crowd sourcing. For AI appli-
cations, proper preparation is needed to avoid bias and ensure fairness & trust. Feature
extraction is a critical task in this process, which can help remove redundant data dimen-
sions, unwanted noise and other properties that degrade model performance.
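To make this step concrete, here is a minimal sketch of one common preparation task, dropping near-constant features that carry no signal. It assumes scikit-learn and pandas are available; the column names and threshold are our own illustrative choices, not from any specific project.

```python
# Illustrative sketch: remove zero-variance features before model building.
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

df = pd.DataFrame({
    "income": [42_000, 58_000, 61_000, 39_000],
    "age": [34, 51, 29, 46],
    "branch_code": [7, 7, 7, 7],  # constant column: no signal for the model
})

selector = VarianceThreshold(threshold=0.0)  # drop features with zero variance
reduced = selector.fit_transform(df)
print(df.columns[selector.get_support()].tolist())  # ['income', 'age']
```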
Build/Train/Test/Debug Model
This step aims to produce the best model that meets the business requirements with the
available data. We refer to Chapter 3 for a broader discussion on the choice of algorithms.
In practice, various AI frameworks (e.g. TensorFlow, PyTorch, Scikit-Learn, etc.) are used
to create the model code. These frameworks typically provide some tool support for the
coding process. However, as should be evident, even if the model code does not have any
errors in it, that does not mean that the model is good for the business purpose. Another
important step in building ML models is the separation of training data and validation
data so that the model’s ability to generalise can be evaluated accurately [19-20]. k-fold
cross-validation is a standard practice, which partitions the available data randomly into
k subgroups, and each subgroup is validated against the model trained with the other
remaining data. This helps to tune model parameters, select data features and tweak the
learning algorithm. The data needs to be drawn from the same distribution for the train-
ing and validation sets. Unfortunately, debugging of the ML models is complex [21,22,23]
since ML behaviour can be the result of the model-inferred code and the underlying train-
ing data. Breck et al. [24] discussed 28 specific tests and monitoring needs, based on expe-
rience with a wide range of production ML systems at Google.
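As a concrete sketch of the cross-validation practice just described (using scikit-learn and one of its bundled datasets purely for illustration), the whole procedure can be a few lines:

```python
# Illustrative sketch of k-fold cross-validation with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# k=5: partition the data into five subgroups; each subgroup is validated
# against a model trained on the remaining data.
scores = cross_val_score(model, X, y, cv=5)
print(f"fold accuracies: {scores.round(3)}, mean: {scores.mean():.3f}")
```

In practice, the same loop is repeated while tuning model parameters, selecting data features and tweaking the learning algorithm.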
Enable Trustworthiness
As we discussed in Chapter 5, the black box nature of ML models invokes questions about the trustworthiness of ML model outputs. These concerns must be addressed during the model building activity, since the preparation of the data or the choice of algorithm may have to change significantly as a consequence. There is often a reluctance to tackle trustworthiness early in the project, since it may appear extraneous to the main function. But this is AI. Doing this whilst the training data is fresh is much easier and cheaper than finding out too late that you have lost the opportunity to bake in trustworthiness, which is the difference between people using your application or not. We emphasise four specific aspects of trust: explainability [25–27], fairness [28,29], robustness to adversarial attacks [30,31] and transparency.
There are techniques (e.g. FactSheets [32]) to help with defining what information to collect.
MLOps
Like the DevOps process for software code and its configurations, MLOps represents the
step of moving the validated AI model from development to operational use and managing
it in a production environment. We refer to [33,34] for detailed descriptions and underly-
ing practices. MLOps is a complicated extension to DevOps in that the ML models are
easily changed, yet have a significant impact on business decisions and must therefore be
carefully managed.
To minimise the coupling between the model lifecycle and application lifecycle, models
are typically made available as microservices in the enterprise operational IT environ-
ment. The application integration process can invoke the model services with hold-out test
data sets and make sure that the performance of the model is adequate for the business
purpose before integrating with the rest of the application components. If the model per-
formance is found lacking, the AI model lifecycle starts again. MLOps activity also needs
to keep the model and training data versions in sync so that any changes to the model in
the future can be adequately tracked and audited.
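A minimal sketch of that gating step is shown below; the function name and the 0.92 accuracy threshold are our own illustrative assumptions, not part of any particular MLOps tool:

```python
# Illustrative sketch: gate a candidate model service on hold-out performance.
from sklearn.metrics import accuracy_score

def promote_if_adequate(predict_fn, X_holdout, y_holdout, threshold=0.92):
    """Promote the model only if hold-out accuracy meets the business bar."""
    accuracy = accuracy_score(y_holdout, predict_fn(X_holdout))
    if accuracy < threshold:
        return False  # performance lacking: the AI model lifecycle starts again
    return True  # adequate: integrate with the rest of the application
```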
Consider a situation where an ML model is trained using a set of training data and
deployed onto a production environment. The AI then makes business decisions such as
offering discounts to particular customers or estimating insurance premiums. At some
stage, new data is used to train a new model that is then deployed into production. The
new model will almost certainly behave differently, so some customers may no longer be
offered a discount or insurance premiums may be higher for one group of customers. From
an operational perspective, the enterprise needs to understand which model was in use at
any point in time and what training data was used to develop that model.
In some ways, this could be considered just standard configuration management best practice, so DevOps is certainly the best place to start when considering MLOps. However, it’s also important to think about some of the subtle-
ties that ML brings to this problem. For example, what if a customer makes a request under
GDPR for their personal data to be deleted? If that personal data exists in a training set, it
needs to be deleted in accordance with the regulatory requirement. At that point, there is an
immediate question about the ML model. The model is technically a summary of the train-
ing data. Hence, is there a requirement for the ML model also to be deleted and rebuilt? If
the model does need to be deleted, then the behaviour of the system with that model is no
longer re-producible. Consequently, we have an operational system that made operational
decisions and we cannot re-create the conditions under which those operational decisions
were made. If there are any historical concerns about those decisions, we can’t understand
exactly how and why they were made! These are important questions that the application
owners have to consider and make business decisions that make sense legally and ethically.
APPLICATION LIFECYCLE
Now you have mastered the AI component, it is time to integrate it into your enterprise
application. Since we have already discussed traditional application development at the
beginning of this chapter, in this section we only focus on how the software development
activities are influenced by the inclusion of the ML component in the application. We use
the right side of Figure 10.2 and start at the bottom right with Business Requirements and
go counterclockwise.
Business Requirements
This is a critical activity in the application lifecycle since it matches the application need to the
capabilities of the AI component. The requirements should reflect the business goals (e.g. sup-
port multiple languages or countries), use cases (e.g. end user vs. domain expert) and system
performance expectations (e.g. accuracy, runtime performance, etc.). It should also include
any specific concerns about robustness, security, bias, ethics, human-level explanation and
system transparency explicitly. These requirements must be considered during the PoC and
Prototyping stages of the AI Model Lifecycle before embarking on the AI application.
To add to the complexity, there are situations when the requirements are uncertain. A
practical example is when the system needs to work in the ‘open world’ (e.g. self-driving
cars, again). It is simply not possible to anticipate all the different variables and their impact
on the system performance. One way out of this is to think about defining properties (e.g.
safety, security, etc.) of the system important for the business/mission and make sure we
can characterise and address them. Chechik [35] proposes a three-phase framework for managing such uncertain requirements.
Functional Decomposition
This activity requires careful consideration of which task in the application can be reli-
ably executed by an ML component. If the output of the ML task has a high consequence
(e.g. human life) and the confidence in the ML output is low, then the task is not suitable
for AI. Conversely, if the output has low consequence and high confidence, then it is an
ideal task for AI. A critical requirement is the availability of data of adequate quality and
quantity before an ML model building process is attempted. Any assessments and/or deci-
sions about reusing existing AI components either from internal projects or from exter-
nal sources are also needed. These considerations have significant impact on the PoC and
Prototyping stages of the AI Model lifecycle.
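The consequence/confidence screen above can be caricatured in a few lines; the thresholds here are illustrative assumptions only:

```python
# Toy sketch of the consequence/confidence screen for AI task selection.
def suitable_for_ai(consequence: str, confidence: float) -> bool:
    """Only low-consequence, high-confidence tasks are ideal for AI."""
    return consequence == "low" and confidence >= 0.80

print(suitable_for_ai("low", 0.90))   # True: ideal task for AI
print(suitable_for_ai("high", 0.70))  # False: keep a human in charge
```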
Application Design
This activity must address all the components needed to build the application, not just the
ML component. As discussed in Chapter 2, key contributors to the application complex-
ity such as the number of AI components, interdependency of AI components and event-
driven or context aware behaviour need to be assessed and understood. The application
architecture must support the use of other mechanisms such as rule-based checkers as
guardrails to make sure that the ML components are meeting the business/mission-critical
objectives. The demand for computing resources at run time (i.e. when the users are actu-
ally using the application) has to be sensitive to form factors of edge devices (i.e. cell phones,
tablets, etc.) and other networking limitations.
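As a minimal sketch of the rule-based guardrail idea (the discount scenario and limits are our own illustrative assumptions), the ML output is simply passed through hard business rules before it is acted on:

```python
# Illustrative sketch: wrap an ML suggestion in rule-based guardrails.
def guarded_discount(ml_suggested_discount: float, customer_tenure_years: int) -> float:
    """Clamp an ML-suggested discount to hard business limits."""
    discount = max(0.0, min(ml_suggested_discount, 0.30))  # never exceed 30%
    if customer_tenure_years < 1:
        discount = min(discount, 0.05)  # new customers: at most 5%
    return discount

print(guarded_discount(0.55, 3))  # 0.30: the rule vetoes an extreme ML output
```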
User Interaction
With the proliferation of smart phones and their role as a popular application delivery channel, applications need to support productive human-machine collaboration and a better
user experience. Beyond the traditional graphic user interface, the opportunity to leverage
other user interaction modes that use AI (i.e. unstructured text or speech) is very attractive.
As an example, many companies allow speech as the interface to interact with an automo-
bile for specific tasks. There is no reason for today’s business applications not to exploit this
technology. This may even help with age-old accessibility challenges in technology. The
decision on which interaction paradigm to use depends on the quality of the AI component
available as well as on the ability for a graceful recovery in the event of its potential failures
during the interaction, such as spelling mistakes and unfamiliar accents. If a decision is made to include an AI component (e.g. speech-to-text or text-to-speech) to support user interaction, it should be treated as an additional ML component in the AI application lifecycle and put through a proper AI development and evaluation process using realistic input data.
Application DevOps
Even though DevOps for traditional software development is generally well understood
[36,37], the integration with the MLOps from the AI lifecycle needs some care. In the
iterative or agile model of development, the full capability of the application may not be
realised in the early stages of development, but only after many iterations/sprints. It will be necessary to keep track of the versions of the relevant non-AI software components, to complement the versions of the AI models and data, and so provide a view of the evolving application capability over many iterations.
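A minimal sketch of the bookkeeping this implies (the field names are illustrative assumptions) is a release manifest that records the AI and non-AI versions side by side:

```python
# Illustrative sketch: one release manifest tying together the versions of
# the non-AI software components, the AI models and their training data.
release_manifest = {
    "application_version": "2.4.1",
    "non_ai_components": {"ui": "1.8.0", "data_pipeline": "3.2.5"},
    "ai_models": [
        {"name": "churn_classifier",
         "model_version": "0.9.3",
         "training_data_version": "2021-06-snapshot"},
    ],
}
```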
AI Application Testing
Testing AI applications is still an open research area, and it requires many innovations to
be practically useful. In this section, we point out some common practical approaches and
specific challenges.
Consequence of Personalisation
Some AI applications (e.g. product recommenders) require customisation of the AI out-
put to match the user profile. The correctness of such an output cannot be validated by
a generic ‘user acceptance test’, but only by explicit or implicit user feedback. This can
be captured by the acceptance of the recommendation by the users or by their disregard.
Therefore, the validation of the AI model can only be done as part of the application monitoring activity after deployment. This is an interesting use of monitoring, discussed in more detail below.
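A minimal sketch of this implicit-feedback validation (the event structure is an illustrative assumption) is simply to track the acceptance rate of recommendations during monitoring:

```python
# Illustrative sketch: validate a personalised recommender via user feedback.
events = [
    {"recommended": "item_a", "accepted": True},
    {"recommended": "item_b", "accepted": False},
    {"recommended": "item_c", "accepted": True},
]
acceptance_rate = sum(e["accepted"] for e in events) / len(events)
print(f"acceptance rate: {acceptance_rate:.0%}")  # 67%
```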
Operations
There are three areas in which the inclusion of an ML model in the AI application affects
the traditional operations activity.
• The need for parallel management of the evolution of the ML model versus the rest of the application. The model needs close monitoring and more frequent updates compared to the rest of the application.
• Depending on the nature of the application, it may be difficult to judge if the observed
application behaviour is correct or not. This is because we may not know the right
answer at that time.
• Debugging needed to resolve problems with the behaviour of ML models is intrinsi-
cally more complicated (involving the details behind the training data and the ML
algorithms) compared to looking for specific lines of offending code or configuration
parameters.
Mandatory Monitoring
In AI systems monitoring is not an option, but a required activity. We refer to [40,41] for
more details on the need for monitoring and the various practical approaches. Here we just
summarise some key ideas.
• During model creation, the training data was not chosen carefully. Distribution
of the features in the training data does not match the distribution seen in the
production.
• A feature that was used as input to build the model is not available in production.
The only option here is to remove the feature in the training data or use an alternate
feature that is already available in the production data or can be derived from the
production data, with additional feature engineering.
• Data from different sources do not match. Data used to train the model came from a
different source from the one used during production. Due to differences in feature
engineering in the data pipelines, even the same features in the testing and live data
may have different values.
• Dependencies on other systems. It is not uncommon for data scientists to use the
data from other systems not directly under their supervision for model training.
If these systems change the way they produce their data, unknown to the data sci-
entists, there will be a negative ripple effect in the data use. For example, if there
are changes to the government guidelines on benefits eligibility (e.g. age for social
security changed), the data before and after the change will have inconsistencies
that have to be reconciled.
Beyond these practical data and modelling issues, there are other reasons models may get
stale or behave badly.
• Changes in Business Context: The historical data used to train the models does not
represent the current behaviour of the population. For example, the business driv-
ers have changed (e.g. increasing use of the internet for schools, work, etc.) since the
eruption of the COVID-19 pandemic.
• Changes in Social and Cultural Behaviour: Over a period of time, behavioural pat-
terns change in various domains (e.g. politics, fashion, music, entertainment, etc.).
These are particularly significant in the behaviour of consumer-facing recommender
systems.
• Adversarial Attacks: As described in Chapter 5, various types of attacks by adversarial actors can make your model behave badly. The impact of these attacks can be severe, depending on how much familiarity the attackers have with the modelling and data environment.
• Continuous Learning During Production without Guardrails: Recall Microsoft’s social bot Tay, which had to be shut down for unacceptable behaviour after learning directly from hateful user input data. We refer to the vignette on “Continuous Learning” for more discussion of this topic.
CONTINUOUS LEARNING
For many people, AI is all about ML. Most non-AI specialists assume that all AI systems
must include ML and quite often believe that learning in AI is a continuous process just
as it is in human intelligence. There is a natural assumption that any AI system must be
continually learning and continually improving its decision-making.
In reality, the need for predictable and assured behaviours in AI applications makes
continuous learning impractical for most applications. There are three fundamental rea-
sons for this.
Firstly, we need to think about the model itself. Being able to verify, analyse and
explain historical decisions is important in delivering trustworthy AI, and to do this, we
need to have access to the model used at the time a decision was made. That means
either maintaining a continuous set of backups or implementing a mechanism to be
able to re-create the model for any point of time. Both approaches require considerable
resources and sophisticated engineering.
Secondly, there is the issue of testing and assurance. Normally, any model would be
tested before being deployed to ensure that it operates as expected. If we update a model
continuously, how in practice do we undertake the required level of testing in the time
available?
Thirdly, AI applications can (and do) learn bad behaviour. In previous chapters, we
have talked about bias and the risk of ML applications learning a bias that exists in histori-
cal data. If an AI is learning continuously, then there is a risk of learning biases that exist in
society. Sometimes, this can be the result of a deliberately malicious act as experienced
in 2016 when Microsoft’s Tay Bot acquired the prejudices of Users.
Whilst continuous learning is attractive from the visionary AI perspective of truly emu-
lating human intelligence, its use should be focussed in the right areas. Those AI applica-
tions that require fixed, version-controlled models that have undertaken some form of
testing and assurance are clearly unsuitable for continuous learning.
Other applications, in particular recommender systems such as those used by com-
panies such as Netflix and Amazon, are more suitable. Applications that are suitable are
generally those where the decisions made are not critical enough to warrant repeatable
and explainable decisions. In addition, it is helpful if the User is either constrained in
their ability to influence the ML or invested in its success. For example, in shopping
recommender systems, the ML is leveraging the human’s buying decisions so there are
limited opportunities for the User to mislead the system. In other applications, the rec-
ommendations are made to professionals such as doctors who are strongly invested in
the proper use of the system.
James Luke
What to Monitor
Ideally, we want to know the accuracy of an AI model running in production for every instance of its use. Only in some simple scenarios (such as in recommender systems) do we know the validity of the AI output, if there is live user feedback. In most other cases, we do not know the accuracy of a model immediately; examples are fraud detection, healthcare treatment recommendation, predicting future housing prices, etc. Given these constraints, we can only exploit what we can observe in the live system to form an indirect assessment of the quality of the AI models being served, e.g. the distributions of the input features, the distribution and confidence of the model outputs, and any user feedback that is available.
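As one concrete sketch of such an indirect signal (assuming NumPy and SciPy are available; the alert level of 0.01 is an illustrative choice), a two-sample Kolmogorov-Smirnov test can flag when the live distribution of an input feature has drifted away from the training distribution:

```python
# Illustrative sketch: detect input-feature drift with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=50, scale=10, size=5_000)
live_feature = rng.normal(loc=55, scale=10, size=5_000)  # drifted in production

statistic, p_value = ks_2samp(training_feature, live_feature)
if p_value < 0.01:
    print(f"possible data drift (KS statistic = {statistic:.3f}); investigate")
```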
Sustenance
As mentioned in Chapter 2, we call the upkeep of the AI application ‘Sustenance’ instead
of the traditional ‘Maintenance’ for software applications. There is an ‘optional’ nature
to maintenance of traditional applications; application owners decide what changes go in
during operations (e.g. hot fixes), what goes in the next release (e.g. bug fixes, enhance-
ments, new functions, etc.) and how often to put out bug fixes or new releases. When it
comes to AI applications, the ‘Sustenance’ starts immediately after the deployment of the
AI application and continues till the application is withdrawn from operations. The con-
stant monitoring, ongoing assessments and immediate actions that are needed to support
AI applications should remind you of a 2-year-old mischievous child who never grows up.
There are further complicating factors to AI sustenance. Since the AI technology is
changing rapidly, it is necessary to evaluate the available technology on a regular basis and
update the components to reflect the state of the art. As discussed in Chapter 2, applica-
tions must be architected in such a way that switching of components is painless. If the
enterprise is using external technology providers to support the AI application, it may be
necessary to change technology providers more frequently based on their current capabili-
ties. All these changes must be reconciled with the testing challenges mentioned above.
PROJECT MANAGEMENT
In Figure 10.2, Project Management covers all activities in the AI application develop-
ment. Typical project management involves definition, assignment and management of
resources; tasks; schedules and stakeholders to meet the business goals. Tracking progress,
managing the risk & outcome through project execution are key elements. The introduc-
tion of an AI component creates some new twists to this otherwise ‘established’ process.
• Two Lifecycles to Manage: Application lifecycle versus AI Model lifecycle. While the AI Model lifecycle is critical to the project, it also bears the most uncertainty on the quality and timeliness of the delivered AI model(s).
• Data Strategy & Governance: Availability of data in adequate quality and quantity
to build models, data provenance throughout the life of the application (which can be
many years), data storage & retrieval, relevant data privacy and security are all new
aspects introduced by the AI component.
• Business Goals versus AI Performance: Typical data scientists claim success based on the accuracy of the AI models they create. Business goals (see Chapter 5) revolve around the value of the technology in tangible terms, such as increased revenue or reduced cost.
• Managing Trust Expectations: Understanding the trust requirements (fairness,
explanations, etc.) of an AI application at the beginning of the project is critical for
its success. This is something completely new for the AI applications.
• Persistent IT Infrastructure: Due to the demands for reproducibility and regulatory
auditing, the data pipelines, model versions and the supporting infrastructure have
to be maintained for the life of the application.
AUDITABILITY & EXPLAINABILITY
ML training is typically not exactly repeatable: random choices in the algorithms, the order in which training data is presented and different hardware configurations can all change the resulting model. Are you able to re-run the training process and recreate the model that was responsible for a contentious decision? Are you able to recreate the operational environment in which that decision was made?
In setting up your AI project, it’s critical that you consider these questions and ensure
you have the correct auditing capabilities to be able to explain why your application
behaved the way it did.
SECURITY
Security is another of those (not so) boring considerations that must be considered through-
out the delivery of any project. In terms of security, AI is just like many other emerging
technologies; it creates a whole new set of risks.
For anyone enthralled by AI, understanding emerging security risks adds a whole new
dimension to this fascinating subject. A detailed study of the sub-field is beyond the scope
of this book. For those who are interested in a deep dive, we recommend [45].
To whet your appetite, here are some simple examples of things you may wish to con-
sider when running your AI projects:
• AI Models are, in many cases, a summary of the training data used to create them.
Techniques are emerging that allow an attacker to re-construct training data using
only the model. This means that you must be careful about who is able to access the
Model and the data used to construct it. If you develop a Model using highly sensitive
data and then deploy the Model in a public environment, you may be giving away
sensitive data.
• Data is everything in AI (we may have mentioned that before) and data actually
determines the behaviour of many AI applications. If your source data is not secure,
an adversary may be able to tamper with the data and, as a result, change the func-
tionality of your application.
• Whilst an adversary may not have direct access to your development environment,
they may actually be the supplier of your data. This effectively gives an adversary
indirect access to your development environment. Consider a military system that
operates with an ML-based classifier. An adversary may deliberately behave in a cer-
tain way in peace time, for example, transmitting on certain radio frequencies, to
ensure the classifier behaves in a certain way. In war time, the adversary may then change their behaviour in order to fool the classifier.
• People are often the weak link in IT security. AI projects are heavily dependent on
people to label data. This creates an opportunity for an adversary to have an unwanted
influence on the effectiveness and behaviour of your application.
As in conventional application development, security is critical yet may feel like an expensive investment with little return; the only ‘return’ is that nothing bad happens. Nevertheless, it is absolutely critical: don’t underestimate the importance of ensuring your data and models are protected.
REFERENCES
1. J. Desjardins, “How many millions of lines of code does it take?” (February, 2017) https://
www.visualcapitalist.com/millions-lines-of-code/.
2. V. Antinyan, “Revealing the complexity of automotive software,” Proceedings of the 28th
ACM Joint Meeting on European Software Engineering Conference and Symposium on the
Foundations of Software Engineering, pp. 1525–1528 (2020).
3. INCOSE Systems Engineering Handbook-A Guide for System Life cycle Processes and Activities,
4th Edn. Wiley (2015).
4. C. Shamieh, “Systems engineering for dummies,” IBM Limited Edition, John Wiley & Sons,
Hoboken (2012).
5. F. P. Brooks, The Mythical Man-Month: Essays on Software Engineering, Anniversary Edn.
Addison-Wesley Longman, Reading (1995).
6. S. McConnell, Code Complete: A Practical Handbook of Software Construction, 2nd Edn.
Microsoft Press, Redmond (2004).
7. N. Cerpa and J. M. Verner, “Why did your project fail?” Communications of the ACM, 52(12),
pp. 130–134 (2009).
8. L. Northrop, et al., Ultra-Large-Scale Systems: The Software Challenge of the Future,
Software Engineering Institute (2006). https://resources.sei.cmu.edu/asset_files/
Book/2006_014_001_635801.pdf.
9. D. Sculley, et al., “Machine learning: the high-interest credit card of technical debt,” Software Engineering for Machine Learning Workshop, NIPS (2014).
10. R. Akkiraju, et al., “Characterizing machine learning process: A maturity framework,” https://
arxiv.org/abs/1811.04871 (2018).
11. A. Arpteg, et al., “Software engineering challenges of deep learning,” Proceedings of the 44th
Euromicro Conference on Software Engineering and Advanced Applications (SE-AA), pp. 50–59
(2018).
12. S. Amershi, et al., “Software engineering for machine learning: a case study,” ICSE-SEIP’19: Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice, pp. 291–300 (2019).
13. A. Horneman, A. Mellinger, and I. Ozkaya, AI Engineering: 11 Foundational Practices, CMU
Software Engineering Institute (2019).
14. P. Santhanam, “Quality management of machine learning systems,” International Workshop on Engineering Dependable and Secure Machine Learning Systems, pp. 1–13, Springer, Cham (2020).
15. I. Ozkaya, “What is really different in engineering AI-enabled systems?” IEEE Software, 37(4),
pp. 3–6 (July-August, 2020).
16. J. Bosch, I. Crnkovic, and H. H. Olsson, “Engineering AI systems: a research agenda,”
arXiv:2001.07522 (2020).
17. W. C. Benton, “Machine learning systems and intelligent applications,” IEEE Software, 37(4),
pp. 43–49 (July-August, 2020).
18. P. Aroonvatanaporn et al., “Reducing estimation uncertainty with continuous assessment:
tracking the ‘cone of uncertainty’,” ASE’10: Proceedings of the IEEE/ACM International
Conference on Automated Software Engineering, pp. 337–340 (2010).
19. A. Ng, “Machine learning yearning,” https://www.deeplearning.ai/programs/.
20. M. Zinkevich, “Rules of machine learning: best practices for ML engineering,” Google Blog:
https://developers.google.com/machine-learning/rules-of-ml/.
21. A. Chakarov, et al., “Debugging machine learning tasks,” arXiv:1603.07292.
22. R. Lourenço, J. Freire and D. Shasha, “Debugging machine learning pipelines,” DEEM’19:
Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine
Learning, pp. 1–10 (2019).
23. F. Hohman, et al., “Visual analytics in deep learning: an interrogative survey for the next
frontiers,” IEEE Transactions on Visualization and Computer Graphics, 25(8), pp. 2674–2693
(2019).
24. E. Breck, et al., “The ML test score: a rubric for ml production readiness and technical debt
reduction,” IEEE International Conference on Big Data (Big Data), 1, pp. 1123–1132 (2017).
25. M. Hind, “Explaining Explainable AI,” XRDS: Crossroads, The ACM Magazine for Students,
25(3), pp. 16–19 (2019).
26. R. Guidotti, et al., “A survey of methods for explaining black box models,” ACM Computing
Surveys, 51, pp. 1–42 Article no. 93 (2018).
27. V. Arya et al., “AI explainability 360: an extensible toolkit for understanding data and machine
learning models,” Journal of Machine Learning Research, 21, pp. 1–6 (2020).
28. S. Verma and J Rubin, “Fairness definitions explained,” IEEE/ACM International Workshop
on Software Fairness (FairWare) (2018).
29. R. K. E. Bellamy, et al., “AI fairness 360: an extensible toolkit for detecting, understand-
ing, and mitigating unwanted algorithmic bias,” IBM Journal of Research and Development,
63(4/5) (2019).
30. H. Xu, et al., “Adversarial attacks and defenses in images, graphs and text: a review,”
arXiv:1909.08072 (2019).
31. IBM Research Blog “The adversarial robustness toolbox: securing AI against adversarial
threats,” https://www.ibm.com/blogs/research/2018/04/aiadversarial-robustness-toolbox/.
32. M. Arnold, et al., “FactSheets: increasing trust in AI services through supplier’s declarations
of conformity,” IBM Journal of Research and Development, 63(4/5) (2019).
33. MLOps: Continuous delivery and automation pipelines in machine learning https://
cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-
machine-learning.
34. D. Sato, A. Wider and C. Windheuser, “Continuous delivery for machine learning,” https://
martinfowler.com/articles/cd4ml.html.
Chapter 11
The Future
How do you write a single chapter on the future of AI? Futurologists, science fiction writers, brilliant academics and commentators in all different shapes and sizes have written volumes on the subject. Some of the greatest thinkers in science and technology from
Kurzweil to Hawking and Musk have made bold predictions about the future of AI, including claims about the advent of the Singularity. For those not familiar with the concept
of the Singularity, the theory is that there will be a point in time when AI becomes intel-
ligent enough to create better AI. At that point, AI will just keep creating more and more
intelligent versions of itself achieving levels of intelligence beyond human comprehension.
The question, of course, is what happens to us when AI takes over?
Given so many fascinating, amazing and quite terrifying predictions, what can we pos-
sibly add that others haven’t already covered? Perhaps, a sense of perspective and a dose of
reality of AI in the Business Enterprise as opposed to the Starship Enterprise.
For a start, AI as we know it is a long way off being “intelligent”. Sure, we can teach it complex things, and those actions can look intelligent, but the more you understand it, the more you realise how far from real intelligence it actually is. Society seems to have no problem in accepting an airplane’s autopilot as unintelligent, even though it can take off, fly and land a 747 jet better than any human who is not a pilot. AI is what we teach it, nothing more and nothing less. AI is an incredibly useful asset for business, and we haven’t yet scratched the surface of all the complex processes we could be automating with it, but it is not about to achieve consciousness and enslave humanity.
Throughout this book, we have attempted to focus on the practical application of AI to
real-world problems. Continuing in that spirit, we will finish this book by reflecting on a
few significant aspects of the future of AI in the Enterprise.
Getting Data
The value of data has clearly been recognised by the large web companies who aim to cap-
ture as much data about their users as possible. The nature of their business is such that
capturing data at a massive scale is challenging but achievable. The web companies focus
huge resources on very specific problems such as enterprise search. They have tens of mil-
lions of Users who return on a daily basis and use technology that enables tracking of their
behaviour. Much of the data captured and required by the web companies is ‘new data’; it
is generated by and then used by the web ecosystem in which the web company’s applica-
tions exist.
For the business enterprise, the first challenge is to obtain data. It is not uncommon for
an enterprise AI application to require data that is not at present available to the enterprise.
A simple example of this is weather data. A retail organisation may wish to develop an ice
cream sales prediction algorithm that requires up-to-date weather data. This data will need
to be either collected from scratch or more likely purchased, and it should be no surprise
that there is a growing market for data and data services [3]. We should expect a massive
growth in this business as companies increasingly buy and sell data from one another.
Enterprise data, in contrast, tends to be highly specialised and stored in disparate systems. It is often very old and has been captured over many decades. It should therefore be no surprise that the growing discipline of data science is delivering new tools, methods and algorithms specifically for cleansing and managing enterprise data [4].
Whilst data scientists have made a great start in developing new tooling, there is still a
long way to go. We are only just starting to address the challenges of managing enterprise
data. Two particular areas of concern will be privacy and scale: scale in terms of computation (see the next section for how compute-hungry AI is becoming) and in terms of the number of AI projects. Small AI programs, each making small autonomous decisions, will proliferate in the enterprise, and this will take some looking after. As the number of AI
applications grows within the enterprise, the sophistication of the tooling to manage the
supporting data sets will need to increase.
From a privacy perspective, increasing public awareness and regulatory pressures will
mean that enterprises need to be able to respond to regulatory requests to explain decisions
or redact data. As already explained, redacting data will impact training data sets and
the ability to explain historical decisions. Tooling is going to be required to manage these
complex issues.
Synthetic Data
Ultimately, the best way to address the privacy issues is to use data that does not include
any personal information. This can only be achieved by either anonymising real-world
data or creating completely new synthetic data. In Chapter 7, we discussed the pros and
cons of using synthetic data. One thing that we can predict with confidence is that argu-
ments about the use of synthetic data will continue long into the future. The fundamental
argument will continue to be that the generation of representative synthetic data is only
possible if you have sufficient knowledge about the underlying system; if the knowledge
exists to generate the representative data, then there is no need to use AI. Unfortunately,
this argument assumes that AI is only used to model systems rather than to make decisions
about those systems. If we look at other forms of engineering, the use of simulators to test
and evaluate systems is both widespread and essential. We do believe that AI engineering
will increasingly follow a similar strategy and there will be an increase in the use of syn-
thetic data.
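As a minimal sketch of that simulator-style strategy (the columns and distribution parameters are illustrative assumptions), synthetic records can be sampled from simple distributions fitted to non-personal statistics of the real data:

```python
# Illustrative sketch: generate fully synthetic records from fitted distributions.
import numpy as np

rng = np.random.default_rng(42)
n = 1_000
synthetic = {
    "age": rng.normal(loc=41.0, scale=12.0, size=n).clip(18, 90).round(),
    "monthly_spend": rng.lognormal(mean=5.5, sigma=0.6, size=n).round(2),
}
# No generated record corresponds to a real person, so no personal data is exposed.
```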
FIGURE 11.1 A plot of logarithm of PetaFlops/s-days versus calendar year for training popular AI
applications. Prior to 2012, the AI computation was following Moore’s law of doubling every 2 years.
Since 2012, the computation in the largest AI training runs has been doubling every 3.4 months,
thus increasing by more than 300,000x in that period. 1 PetaFlop/s-day = 10¹⁵ neural net operations per second for a day. (From reference [8], reproduced with permission.)
Training today’s largest AI models demands enormous computing resources in data centres (i.e. special hardware, extensive cloud computing, etc.) with associated substantial energy consumption [9,10].
To meet this challenge, recent hardware developments for deep learning (DL) show a
migration from a general-purpose design to more specialised hardware to improve com-
pute efficiency using accelerators and new architectures. Here are four examples of recent
advances in hardware technology that can contribute to more computing efficiency for AI
workloads and/or lower power consumption.
• Reducing the numerical precision of data and computation is extremely effective for
accelerating DL training workloads, while saving significant computing time and
power. Sun et al. [11] proposed a hybrid 8-bit Floating Point (HFP8) format and end-
to-end deep neural network (DNN) distributed training procedure. Using HFP8,
they demonstrated successful training of DL models (e.g. Image Classification, Object
Detection, etc.) without any loss of accuracy. This paves the way for a new generation
of 8-bit hardware to support robust neural network models.
• Esser et al. [12] used a brain-inspired neuromorphic computing architecture based
on spiking neurons, low-precision synapses and a scalable communication to achieve
unprecedented energy efficiency. They implemented deep convolution neural net-
works that approach the state-of-the-art classification accuracy across eight standard
datasets in vision and speech, while preserving underlying energy efficiency and high
throughput. This approach makes it possible to combine powerful DL algorithms
with the energy efficiency of neuromorphic processors.
• Neural network training built on traditional von Neumann computing architecture
can be slow and energy intensive, due to the need for moving large volumes of data
between memory and processor (the ‘von Neumann bottleneck’). While the use of
analogue non-volatile memory to perform the calculations in situ can accelerate the
training by avoiding the data movement, the resulting model accuracies are gener-
ally less than those of software-based training. Ambrogio et al. [13] demonstrated
mixed hardware–software neural network implementations that achieve accuracies
equivalent to those of software-based training on various commonly used datasets.
The estimated computational energy efficiency and throughput for their implementa-
tion exceed those of today’s GPUs by two orders of magnitude, leading to a future of
hardware accelerators that are both fast and energy efficient.
• When the feature space becomes large, traditional ML classification algorithms (e.g. Support Vector Machines) become computationally expensive. Quantum algorithms exploit an exponentially large quantum state space through controllable entanglement and interference. Havlíček et al. [14] demonstrated two novel methods on a superconducting quantum processor to speed up the processing, heralding a new class of tools that apply noisy intermediate-scale quantum computers to ML.
These are just examples of work going on in IBM Research to explore alternatives to the
current computing paradigms to support ML workloads. It is quite possible that these
The Future ◾ 263
approaches can provide a faster and more energy-efficient alternative to the GPU-intensive
calculations of today in the near future.
In the enterprise, adequate training data may be hard to come by for several reasons:
• It may be expensive to get large amounts of labelled data due to the business cost. This is particularly true where there is no existing business process to capture the data naturally. You are left with the option of having skilled business experts invest their time to do the labelling when the business value of the AI is not yet clear.
• Emergent behaviour in the business can create new artefacts over time that may not be known at training time. Think of an AI application designed to identify a dozen different financial fraud patterns in traditional banking, only to find that there are new types of fraud, using digital currencies such as Bitcoin, that it did not anticipate.
• Some events may be so rare that adequate data cannot be captured. With the proliferation of internet-connected devices (we mean refrigerators, dishwashers, laundry washing machines, etc., really), it is attractive to build an AI application for preventive maintenance that predicts which machines are going to fail in the next month and what parts are needed to fix them. If some of these machines turn out to be very reliable (like in the old days), there may not be adequate failure data to train the AI models. Another practical example of this is the challenge in creating Natural Language Processing (NLP) technology for the more than 7000 languages in the world [15], when most business applications can support only a very small fraction of them in practice.
• Privacy and regulatory constraints may prevent access to the required data. A powerful example exists in healthcare, where massive volumes of patient data are already available; however, for privacy reasons it is not possible to consolidate this data into a single store and make it easily accessible to AI engineers.
Fortunately, there are emerging algorithmic approaches that address this challenge and may help to alleviate data-related problems in AI projects. To make the discussion more concrete, we use examples of real applications.
Now, let us say our Londoner also wants to visit Paris and needs help with English-French translation. This means we need to provide examples of training data and build a model for English-French translation. Let us say we did that. Now, the Londoner has a Parisian friend who wants to visit Madrid. Do we need to train a new model with labelled data for French-Spanish translation? As the number of languages increases, the number of pair-wise translation models grows combinatorially: n languages need n(n-1)/2 models, i.e. six models for four languages, ten models for five languages, and so on.
The most straightforward way to translate between languages for which no direct training data is available is to use a 'bridge' language. To translate between languages A <−> B, use language C as the bridge: first translate between A <−> C and then between C <−> B. The bridge language is often English, since the data is more readily available. Two potential concerns about this approach are that the total translation time doubles, because two translations are needed, and that the bridge language can introduce a loss of quality.
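As a sketch of the idea, the toy Python below pivots French-to-Spanish translation through English. The word-for-word 'translators' and their tiny dictionaries are hypothetical stand-ins for real machine-translation models; the point is the shape of the bridging logic, including why the latency doubles.

# Toy word-for-word "translators" standing in for real MT models;
# the dictionaries and language codes here are hypothetical.
DIRECT = {
    ("fr", "en"): {"bonjour": "hello", "monde": "world"},
    ("en", "es"): {"hello": "hola", "world": "mundo"},
}

def translate(text, src, tgt):
    table = DIRECT[(src, tgt)]              # fails if no direct model exists
    return " ".join(table.get(w, w) for w in text.split())

def bridge_translate(text, src, tgt, bridge="en"):
    # No direct fr->es model is available, so pivot through the bridge
    # language: two calls, hence roughly double the latency, and any
    # error in the first hop is carried into the second.
    return translate(translate(text, src, bridge), bridge, tgt)

print(bridge_translate("bonjour monde", "fr", "es"))   # -> hola mundo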
The algorithms in Google's Multilingual Neural Machine Translation System [19] make zero-shot translation possible between two languages for which no direct labelled training data is available. The key idea is the creation of a 'universal' language representation that helps to mediate between two languages. Naturally, the underlying model can do zero-shot translation only between languages it has individually seen as source and target languages at some point during training, not between entirely new languages. We can think of this approach as an efficient form of transfer learning. Google Translate [20] uses this technology to translate between more than 100 languages.
This is an example of the use of Zero-Shot Learning (ZSL) with real business impact. The technology obviously required substantial investment by Google to develop and deploy, and it is available as a web service. How such algorithms can be created in other application areas is a topic of active research [21].
One study of Few-Shot Learning (FSL) in histopathology [24] used two datasets:
i. Source domain dataset consisting of digital images of tissue samples obtained from primary tumours of colorectal cancer, divided into eight different types/classes of textures, with 625 image samples each, for a total of 5000 samples
ii. Target domain dataset consisting of a total of 1755 digital images of healthy and
tumoural samples of tissues from colon, breast and lung
The goal of the FSL was to extend the base model of eight classes to five new classes: one new type of colon tumour, two types of breast tumour and two types of lung tumour. They varied the number of target image samples for each of the new classes to see the impact on classification accuracy. Their results showed that their few-shot approach was able to reach an accuracy of 90% with just 60 training images, even for the lung and breast tissues that were not present in the source training dataset.
FSL can also be thought of as another transfer learning technique, leveraging the information contained in the source domain to complement the few training samples in the target domain. When the source and target domains are very different, FSL becomes less useful.
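One common few-shot recipe is to reuse a feature extractor trained on the source domain and fit only a very small classifier on the handful of labelled target samples. The sketch below uses nearest-class-prototype classification in a frozen embedding space; the random-projection 'backbone' is a hypothetical stand-in for a network pretrained on the source domain, and this is not the specific method used in the histopathology study above.

import numpy as np

rng = np.random.default_rng(1)

# Frozen "backbone": a fixed random projection standing in for a feature
# extractor pretrained on the large source domain.
W_backbone = rng.normal(size=(784, 64)).astype(np.float32)

def embed(x):
    return np.maximum(x @ W_backbone, 0.0)    # features are never retrained

def fit_prototypes(x_support, y_support):
    # One prototype (mean embedding) per new class, built from only a
    # handful of labelled target-domain samples.
    return {c: embed(x_support[y_support == c]).mean(axis=0)
            for c in np.unique(y_support)}

def predict(prototypes, x):
    classes = list(prototypes)
    dists = np.stack([np.linalg.norm(embed(x) - prototypes[c], axis=1)
                      for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

# Five labelled examples per new class are enough to try this out.
x_support = rng.normal(size=(10, 784)).astype(np.float32)
y_support = np.array([0] * 5 + [1] * 5)
prototypes = fit_prototypes(x_support, y_support)
print(predict(prototypes, x_support))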
Consider two examples of organisations whose data cannot simply be pooled:
• A multinational corporation (e.g. a bank) with operations that are loosely coordinated across national boundaries, subject to local governance (e.g. the General Data Protection Regulation in the European Union) on data collection and usage
• A large hospital system with patient data in multiple sites, subject to regulations (e.g. HIPAA) that restrict access to and movement of data
The clue is in the word 'Federated'. With Federated Machine Learning (FML), instead of bringing all the data to a single location to build/train your AI model, you leave the data where it is and bring the algorithm to it. Once the local models are trained with local data, you exchange the models, which are much smaller in size and do not have the same sensitivities as the raw data. Preservation of privacy in building models is a key requirement for its success. This will be huge for business in many ways, not just in network and data storage costs but in security as well.
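Here is a minimal sketch of the federated-averaging idea (in the spirit of the well-known FedAvg algorithm, not of any particular tool kit): each site trains on data that never leaves it, and only the model weights travel to a coordinator, which averages them, weighted by each site's data size.

import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    # One site's training on data that never leaves it (toy linear model).
    w = w.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def federated_round(w_global, clients):
    # Only model weights travel between the sites and the coordinator;
    # the average is weighted by each site's data size.
    updates = [local_update(w_global, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    return np.average(np.stack(updates), axis=0, weights=sizes)

# Two hypothetical sites holding private data.
rng = np.random.default_rng(2)
w_true = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(2):
    X = rng.normal(size=(100, 3))
    clients.append((X, X @ w_true))
w = np.zeros(3)
for _ in range(10):
    w = federated_round(w, clients)
print(w)   # approaches w_true without any raw data being pooled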
For domains such as healthcare, this will be a game changer. It will mean that researchers can build analytics on patient data, potentially held in multiple hospitals, without ever needing to see the raw data. The patients' privacy is retained because their data never leaves the hospital's firewall; only the anonymised model parameters do. If that raw patient data also
happens to be large (e.g. CT scans, which can run to many gigabytes each), the cost savings in not having to move thousands of CT scans across a network or the internet each time you need to test an ML model will also be significant.
This is not just a pie-in-the-sky vision. Roth et al. [26] investigated the use of FML to build a model for breast density classification based on the Breast Imaging, Reporting & Data System (BIRADS) in a real-world collaborative setting with seven clinical institutions across the world. Despite substantial differences in the datasets (mammography system, class distribution and size of data), they demonstrated that models trained with FML perform on average 6.3% better than their counterparts trained on an institution's local data alone. In addition, there was a 45.8% relative improvement in the models' generalisability when evaluated on the other participating sites' testing data.
Tool kits that support FML are already available [27]. Who knows what new advances will be made when the use of FML becomes everyday practice across institutions and application domains?
Edge AI
‘Edge’ is the brave new world of Internet of Things (IoT) devices, such as mobile phones, wearable devices and automobiles, which create massive amounts of data every day. Edge computing refers to computation done locally, say on a mobile phone, rather than sending the raw data across a network for a remote server to perform the calculations. This is possible primarily due to the availability of significant processing power, memory and storage in IoT devices. The goals of Edge AI are low latency (i.e. short response times), improved privacy, improved robustness and efficient use of network bandwidth and connectivity. It can also be thought of as an extension of FML applied to the edge, with additional challenges in communication, heterogeneity of edge system implementations and disparate data collections [28].
If you have a smart speaker at home, you can see this in action when you ask it a common question like “What’s the weather like today?” It will interpret your speech on the device itself and send the interpreted command to a server on the cloud, which replies with a response such as “It is 75° (in Fahrenheit, of course) and sunny”. The key point is that the raw speech did not have to travel across the network to the cloud for processing.
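Here is a toy sketch of that split, with hypothetical stand-ins for both the on-device speech model and the cloud service; the point is how little data crosses the network once interpretation happens at the edge.

import json

def on_device_asr(audio_bytes):
    # Stand-in for an on-device speech model (hypothetical): turns raw
    # audio into a compact, structured command without leaving the device.
    return {"intent": "get_weather", "location": "default"}

def cloud_weather_service(command):
    # Stand-in for the remote service; it never sees the raw audio.
    assert command["intent"] == "get_weather"
    return "It is 75F and sunny"

audio = bytes(16000 * 2)              # one second of 16-bit, 16 kHz audio
command = on_device_asr(audio)
payload = json.dumps(command).encode()
print(len(audio), "audio bytes stay on the device;",
      len(payload), "bytes cross the network")
print(cloud_weather_service(command))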
Here is a business example of applying Edge AI: providing accurate dietary assessment to users to improve the effectiveness of weight-loss interventions. Liu et al. [29] implemented a “Deep Food on the Edge” system, which performs preprocessing and segmentation of food images on a mobile device and uses a DNN on the cloud to do the food recognition. Their results showed that the system outperformed existing work in food recognition accuracy and minimised response time, while lowering computing energy consumption.
It is very exciting to watch the evolution of Edge AI in different business domains. We have no doubt that it will usher in a new generation of cheaper, “smarter” decision-making consumer products.
Neuro-Symbolic AI
Moving beyond the training data challenge, there is a need to bring together two different disciplines within the field of AI: learning from data (the ‘neuro’ part) and reasoning over explicitly represented knowledge (the ‘symbolic’ part).
Neuro-symbolic algorithms are generally quite complex, since they need to find a knowledge representation that is suitable for both learning and reasoning. To illustrate their application, we use the example of a Neuro-Symbolic Cognitive Agent (NSCA) [32] that evaluates student drivers as they perform specific driving tasks on a training simulator employed in real-world scenarios. On the symbolic side, there is a set of rules that capture the criteria for evaluation, expressed in temporal logic with uncertainties. An example of such a rule is:

If the trainee’s car, approaching an intersection, arrives at the intersection and stops when traffic is coming from the right, then the trainee gets a good evaluation.
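One possible rendering of this rule in linear temporal logic, using atomic propositions of our own choosing (approaching, at_intersection, traffic_right, stopped) rather than the exact notation of [32], is:

G( (approaching ∧ F(at_intersection ∧ traffic_right ∧ stopped)) → good_evaluation )

where G is the temporal operator ‘always’ and F is ‘eventually’; the rules in [32] additionally carry uncertainties.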
Note that, beyond the sequence of events implied by the rule, there is no specific mention of the car's distance from the intersection. In the study, five students participated in the driving test, consisting of five test scenarios each. For each scenario, the relevant data from the simulator (i.e. relative positions and orientations of all traffic, speed, etc.) was used in the 'neural' training part. Assessment scores on several driving skills (i.e. vehicle control, traffic flow, safe driving, etc.) were provided by three human driving instructors in real time during the simulation attempts, which were simultaneously observed by the NSCA. The results showed that the NSCA was able to learn from these observations and assess the student drivers similarly to the driving instructors, and that the underlying knowledge could be extracted in the form of temporal logic rules.
As should be obvious from the discussion above, the neuro-symbolic algorithms of today are very specific to the use case under consideration. It will take some time for the technology to mature, for the patterns of use cases to be identified, and for a common framework to be established for business use.
To summarise, AI applications differ from traditional software applications in several important ways:
• Training data defines the functionality of ML applications. Both the quality and the quantity of training data matter.
• An AI system can behave unpredictably when presented with previously unseen situations. Traditional testing approaches, based on upfront functional specifications and expected deterministic behaviour, do not work.
• Building trustworthy AI requires careful attention to a full range of stakeholders (both inside the development organisation and in society as a whole).
• The AI lifecycle includes the model lifecycle for the duration of the application's deployment, in addition to the traditional software application lifecycle; these all add to project cost, complexity and risk.
• Detecting drift in model performance during deployment and updating the models are compulsory activities (see the sketch following this list).
• The supporting IT infrastructure needs persistent versions of data, models and code for the life of the application (maybe even after), for auditing purposes.
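To make the drift-detection bullet concrete, here is a minimal sketch of one common technique, the Population Stability Index (PSI), which compares the distribution of a model input seen in live traffic against the training-time distribution. The 0.2 alert threshold in the comments is a widely used rule of thumb, not a universal constant.

import numpy as np

def psi(train_col, live_col, bins=10):
    # Population Stability Index between a training-time feature column
    # and the same feature observed in production; > 0.2 is a common
    # rule of thumb for "investigate and consider retraining".
    edges = np.histogram_bin_edges(train_col, bins=bins)
    expected, _ = np.histogram(train_col, bins=edges)
    actual, _ = np.histogram(live_col, bins=edges)
    p = np.clip(expected / expected.sum(), 1e-6, None)
    q = np.clip(actual / actual.sum(), 1e-6, None)
    return float(np.sum((q - p) * np.log(q / p)))

rng = np.random.default_rng(3)
train = rng.normal(0.0, 1.0, 10_000)
live = rng.normal(0.5, 1.0, 10_000)   # the live population has shifted
print(psi(train, train[:5000]))        # small: no drift
print(psi(train, live))                # large: investigate and retrain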
Given these differences, traditional systems engineering processes and tools must change to meet these needs. We call this new discipline ‘AI Engineering’. Luckily, numerous authors have recently discussed these very issues [33–40]. To highlight its importance, the United States Office of the Director of National Intelligence sponsored the Software Engineering Institute at Carnegie Mellon University to lead the AI Engineering Initiative for the US government [41]. With so much interest and excitement in delivering real value through AI across the world, and awareness of the need for a new discipline, we should expect AI Engineering to mature significantly in the next 5 years.
Here are at least two fundamental objectives for the new AI Engineering discipline to
address:
• How do we design and build Trustworthy AI systems for a broad set of stakeholders?
• How do we specify and manage a system for mission/business-critical uses where
some decision-making components can be wrong for some fraction of the time, say
20%?
As a practical matter, given the current state and maturity of AI technology, we need to develop techniques that improve the odds of success of an AI project in the enterprise. We hope our Doability Method contributes to this cause. Some of the more specific needs for managing the development and deployment of AI applications are:
• Managing Data: Tools and processes will be required to perform a whole range of
functions. Obvious requirements exist in areas such as data cleansing and labelling.
However, the tooling needs to go much further and enable functions for managing
data provenance, redaction and anonymisation. It is important to understand exactly
what data was used to develop and configure a model along with understanding the
impact on the model of redacting some of that data. The critical aspect here is sus-
tained management of data artefacts for the life of an AI application and beyond.
• Governance of AI Models: For the successful use or reuse of AI components within an organisation or from an external source, the context behind the creation of the AI models is critical. A good example of emerging capability in this area is the concept of AI FactSheets [42] (a sketch follows at the end of this list).
• Managing Complexity: If an AI application requires multiple AI components that interact to produce the final output (e.g. processing various sensors in a self-driving car to decide whether or not to stop), sophisticated tools will be needed to enable end-to-end tracing of decisions.
• Monitoring & Managing Change: Continuous monitoring is critical to improve
the performance of a deployed application. The aircraft industry provides a strong
role model in this respect. Over a 100-year history, the aircraft industry has continu-
ously improved safety through the rigorous investigation of every accident and the
implementation of any resulting safety recommendations. Similarly, in AI, we need
processes to investigate unacceptable decisions and continually improve the deci-
sion-making of the application.
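As an illustration of the kind of governance record these needs imply, here is a minimal, hypothetical ‘fact sheet’ for a deployed model, loosely in the spirit of AI FactSheets [42]. Every field name and value below is invented for illustration; real schemas are considerably richer.

from dataclasses import dataclass

@dataclass
class ModelFactSheet:
    # A hypothetical governance record for one deployed model version.
    model_name: str
    version: str
    training_data_sources: list   # provenance: exactly which data was used
    redacted_fields: list         # what was removed or anonymised
    intended_use: str
    known_limitations: str
    test_accuracy: float
    approved_by: str

sheet = ModelFactSheet(
    model_name="claims-triage",
    version="1.4.2",
    training_data_sources=["claims_2019_2021_v3"],
    redacted_fields=["customer_name", "postcode"],
    intended_use="Prioritise incoming claims for human review",
    known_limitations="Not validated on commercial policies",
    test_accuracy=0.91,
    approved_by="model-governance-board",
)
print(sheet.model_name, sheet.version, sheet.training_data_sources)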
HUMAN–MACHINE TEAMING
In this section, we want to outline the evolution of the interaction between humans and
machines since the birth of general-purpose computing and explore where this can go as
the use of AI becomes more prevalent in the enterprise.
Two changes stand out:
i. The number of people regularly interacting with a computer has grown from a few thousand to billions. The volume of daily activity on social media attests to this fact.
ii. We have moved from a transactional era to a relationship era with computing.
In other words, in years past we used computers when we had to do something specific; now we live with them in our pockets. The last thing we see before we go to sleep is the phone, and the first thing we see in the morning is the phone. Even if we spend a few minutes without access to the phone (e.g. when the battery needs charging), we feel helpless. Using Alexa or Siri for daily chores is a common occurrence. This simply means that our relationship with computers is becoming more symbiotic, enriching our lives and making us more productive. While all these advances have happened in people’s personal lives, human–machine interaction in the enterprise setting has not changed very much in the last two decades. Now, enter AI.
Augmented Intelligence
Throughout the history of AI, there has been considerable discussion on the prospect of
machines replacing humans. In science fiction, it is not uncommon to be presented with an
apocalyptic vision of the future where humans are battling AI for survival.
What these many different narratives seem to assume is that AI and humans will evolve
separately and that problems will either be solved by humans or solved by machines. This is
a huge over-simplification and almost certainly untrue. Instead, we believe that augmented
human–machine systems will emerge where AI and human decision makers work in part-
nership to deliver more effective results.
As early as in 1960, Licklider [43] articulated the foundations of a human–machine
partnership as follows:
“Man-computer symbiosis is an expected development in cooperative interaction
between men and electronic computers. It will involve very close coupling between
the human and the electronic members of the partnership. The main aims are (1) to
let computers facilitate formulative thinking as they now facilitate the solution of
formulated problems, and (2) to enable men and computers to cooperate in mak-
ing decisions and controlling complex situations without inflexible dependence
on predetermined programs. In the anticipated symbiotic partnership, men will
set the goals, formulate the hypotheses, determine the criteria, and perform the
evaluations. Computing machines will do the routinizable work that must be done
to prepare the way for insights and decisions in technical and scientific thinking.”
This is an amazing prognostication, from 60 years ago, of how we see AI in the enterprise today! The term used to describe this vision of the future was symbiotic computing. Already we are seeing AI applications where humans and machines work together. In such applications, the AI may perform the straightforward classification tasks whilst referring more complex and confusing cases to a human operator. However, it is not simply the case
that the AI performs the simple and the human the complex. When referring the complex
cases to the human, the AI may also provide an initial assessment together with supporting
evidence and possibly even counter examples. The human may use other AI tools to inform
his or her decision, and when the decision is made, it is fed back into the ML algorithm to
improve future performance. The AI and the human work together in an augmented and
symbiotic manner.
To give a further example of a physical implementation of such a partnership between
humans and machines in business, we refer to the description of a Cognitive Environments
Lab (CEL) by Farrell et al. [44]. CEL was equipped with motion sensors, microphones, cam-
eras, speakers and displays. Humans interacted with the environment via speech, gestures,
wands, etc. Their architecture made it possible to run cloud-based services, manipulate
data, run analytics and generate spoken and visual outputs as one contiguous experience.
They demonstrated an illustrative application for strategic decision-making with a business use case of mergers and acquisitions [45]. A more recent incarnation of this technology as an embodied cognitive agent [46] helps scientists to visualise and analyse exoplanet data. This includes the ability to program itself, using AI planning algorithms, to derive
certain physical quantities and provide human-friendly explanations of the steps used in the derivation. It is quite possible to imagine such environments being used in places that need the accumulation and assimilation of large volumes of data for decision support, such as corporate board rooms and military command centres.
This interaction between the human and the AI needs to be considered as part of an AI Systems Engineering approach [47] to delivering continually improving AI.

WHAT’S IN A NAME?
If I could just invent an Arnold Schwarzenegger look-alike robot to go back to 1956, I’d find John McCarthy (the person who coined the phrase “Artificial Intelligence”) and give him a bit of a telling off. What was he thinking? Why couldn’t he just have called it “Probabilistic Learning” or “Coding by Example”? Anything would be better than invoking the Dr Frankenstein spectre of “Artificial Intelligence”. I doubt he had any idea how many spurious media apocalyptic scare stories or hyperbole-ridden product launches those two words have caused. But there again, would we as computer scientists have striven so hard and with such passion to try and live up to the challenge laid down in those two words?
David Porter
On the positive side, there are numerous advances in assistive technologies, ranging from robotic arms and AI-enabled prosthetic limbs to AI route planning for the visually impaired, which enrich the experiences of the disabled. For at least three decades, laws have existed [49,50] that set requirements for information technology to support the needs of people with disabilities. Due to competing business priorities, it is fair to say that actual implementations of IT systems have lagged behind in providing seamless accessibility to the disabled; accessibility is usually an afterthought. With the increasing availability of multimodal user interfaces enabled by AI (e.g. text to speech, speech to text, gesture recognition, etc.), it is possible to imagine a future where an AI system will pick the appropriate mode to interact with a disabled person based on his or her profile, either explicitly given or learnt through interactions. We can think of these as parameters in an Application Programming Interface between the human and the AI system. This is another practical aspect of realising augmented intelligence.
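As a toy illustration of treating a user’s profile as parameters of such an interface, consider the hypothetical routine below. The profile fields and mode names are invented for the example; a real system would learn these preferences through interactions rather than hard-code them.

# Hypothetical routine choosing an interaction mode from a user's
# accessibility profile; field names and modes are invented here.
def choose_output_mode(profile):
    if profile.get("low_vision"):
        return "speech"
    if profile.get("hearing_impaired"):
        return "text_with_captions"
    if profile.get("limited_motor_input"):
        return "voice_input"
    return "default_gui"

print(choose_output_mode({"low_vision": True}))         # -> speech
print(choose_output_mode({"hearing_impaired": True}))   # -> text_with_captions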
Now, let us discuss the negative side. Even before the adoption of AI, and despite all the good intentions, there is clear evidence of a lack of consideration for the concerns and welfare of the disabled in the general psyche of society [51]. Introducing AI to automate decisions in these contexts runs the risk of simply codifying and replicating this bias. Trewin et al. [52] described potential opportunities and underlying risks across four emerging AI application areas: employment, education, public safety and healthcare. Reference [53] provides six steps for designing AI applications that treat people with disabilities fairly.
In this section, we have described various aspects of the impact of AI technology on
the interactions between humans and machines. Human–Machine Teaming is no longer
an esoteric topic for a few experts to discuss in annual conferences but rather a critical
consideration for mainstream business applications. We expect major advances to happen
here in the near future.
• In keeping with the messages in the rest of the book, data is everything. It’s critical
to the success of any enterprise AI project and it’s critical to the future of your busi-
ness. Think very hard about your data strategy and how you are going to feed the AI
machine.
• Algorithms are going to keep developing, and there is a lot of fascinating work under-
way to address the data challenge. Neuro-symbolic approaches should bring humans
and ML algorithms a little bit closer.
• To support AI workloads more efficiently and at lower power consumption, new com-
puting paradigms are being developed.
• The new discipline of AI Engineering must mature soon to deliver the real value of
AI. A key element of the discipline will be to support the creation of trustworthy
systems that will facilitate the collaboration between humans and machines to solve
complex problems.
For those of you who are worried about the prospect of sentient AI enslaving humanity
… please rest assured that the Singularity is some way off … you can relax and sleep well.
Even if we are wrong in that prediction, you still don’t need to worry because Steve
Wozniak’s prediction [54] for the future of AI is that intelligent robots will keep us as pets
and, as all readers with pets know, that’s a wonderful way to live!
REFERENCES
1. L. Zhou, et al., “Machine learning on big data: opportunities and challenges,” Neurocomputing,
237, pp. 350–361 (2017).
2. A. L’Heureux, et al., “Machine learning with big data: challenges and approaches,” IEEE
Access, 5, pp. 7776–7797 (2017).
3. IDC Global DataSphere, Forecast: 2021-2025 The World Keeps Creating More Data - Now,
What Do We Do With It All? (IDC #US46410201, March 2021).
4. N. Fishman and C. Stryker, Smarter Data Science: Succeeding with Enterprise-Grade Data and
AI Projects, Wiley (2020).
5. G. E. Moore, “Cramming more components onto integrated circuits,” Electronics, 38(8),
pp. 114–117 (April 19, 1965).
6. M. Bohr, “A 30 year retrospective on Dennard’s MOSFET scaling paper,” IEEE SSCS Newsletter,
pp. 11–13, Winter 2007.
7. Von Neumann Architecture, https://en.wikipedia.org/wiki/Von_Neumann_architecture.
8. D. Amodei and D. Hernandez, “AI and compute,” (May 16, 2018) https://openai.com/blog/
ai-and-compute/.
9. E. Strubell et al. “Energy and policy considerations for deep learning in NLP,” 57th Annual
Meeting of the Association for Computational Linguistics (ACL), arXiv:1906.02243 (2019).
10. W. Knight, “AI can do great things—if it doesn’t burn the planet,” Wired Magazine, (January
21, 2020) https://www.wired.com/story/ai-great-things-burn-planet/.
11. X. Sun et al., “Hybrid 8-bit floating point (HFP8) training and inference for deep neural net-
works,” (NeurIPS 2019) 33rd Conference on Neural Information Processing Systems.
12. S. K. Esser, et al., “Convolutional networks for fast, energy-efficient neuromorphic comput-
ing,” Proceedings of the National Academy of Sciences, 113(41), pp. 11441–11446 (2016).
13. S. Ambrogio, et al., “Equivalent-accuracy accelerated neural-network training using analogue
memory,” Nature, 558, pp. 60–67 (2018).
14. V. Havlíček, et al. “Supervised learning with quantum-enhanced feature spaces,” Nature, 567,
pp. 209–212 (2019).
15. Ethnologue, “Languages of the world,” https://www.ethnologue.com/.
16. T. Young, et al., “Recent trends in deep learning based natural language processing,” IEEE
Computational Intelligence Magazine, 13(3), pp. 55–75 (August, 2018).
17. D. W. Otter, “A survey of the usages of deep learning for natural language processing,” IEEE
Transactions on Neural Networks and Learning Systems, 32(2), pp. 604–624 (2021).
18. X. Liu et al., “Using language models to pre-train features for optimizing information tech-
nology operations management tasks,” H. Hacid et al. (eds.) ICSOC 2020 Workshops, Springer
LNCS 12632, pp. 150–161 (2021), https://www.youtube.com/watch?v=niD0UwIi-YY.
19. M. Johnson, et al., “Google’s multilingual neural machine translation system: enabling zero-
shot translation,” Transactions of the Association for Computational Linguistics, 5, pp. 339–
351, (2017).
20. Google Translate: https://translate.google.com/.
21. W. Wang, et al., “A survey of zero-shot learning: settings, methods, and applications,” ACM
Transactions on Intelligent Systems and Technology, 10(2), Article 13 (2019).
22. Y. Wang, et al., “Generalizing from a few examples: a survey on few-shot learning,” ACM
Computing Surveys, 53(3), Article No: 63, pp. 1–34 (2020).
23. G. Litjens, et al., “A survey on deep learning in medical image analysis,” Medical Image
Analysis, 42, pp. 60–88 (2017).
24. A. Medela, et al., “Few shot learning in histopathological images: reducing the need of labelled
data on biological datasets,” IEEE 16th International Symposium on Biomedical Imaging,
pp. 1860–1864 (2019).
25. D. Verma, G. White and G. de Mel, “Federated AI for the enterprise: a web services based
implementation,” IEEE International Conference on Web Services (ICWS), pp. 20–27 (2019).
26. H. R. Roth, et al., “Federated learning for breast density classification: a real-world implemen-
tation,” In: Albarqouni S. et al. (eds.) Domain Adaptation and Representation Transfer, and
Distributed and Collaborative Learning. Lecture Notes in Computer Science, 12444, Springer,
Cham (2020).
27. IBM Federated Learning: https://ibmfl.mybluemix.net/.
28. T. Li, et al., “Federated learning: challenges, methods, and future directions,” IEEE Signal
Processing Magazine, 37(3), pp. 50–60, (May 2020).
29. C. Liu et al., “A new deep learning-based food recognition system for dietary assessment on
an edge computing service infrastructure,” IEEE Transactions on Services Computing, 11(2),
pp. 249–261 (2018).
30. L. G. Valiant, “Three problems in computer science,” Journal of the ACM, 50(1), pp. 96–99
(2003).
31. Md K. Sarker, et al., “Neuro-symbolic artificial intelligence: current trends,” arXiv:2105.05330 (2021).
32. H. L. H. de Penning, et al., “A neural-symbolic cognitive agent for online learning and reason-
ing,” Twenty-Second International Joint Conference on Artificial Intelligence (2011); the rule
cited is slightly modified from the version in the paper for clarity.
33. D. Sculley, et al., “Machine learning: the high-interest credit card of technical debt,” Software Engineering for Machine Learning Workshop, NIPS (2014).
34. R. Akkiraju, et al., “Characterizing machine learning process: a maturity framework,” https://
arxiv.org/abs/1811.04871 (2018).
35. A. Arpteg, et al., “Software engineering challenges of deep learning,” Proceedings of the 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 50–59 (2018).
36. S. Amershi, et al., “Software engineering for machine learning: a case study,” ICSE-SEIP ’19: Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice, pp. 291–300 (2019).
37. A. Horneman, A. Mellinger, and I. Ozkaya, AI Engineering: 11 Foundational Practices, CMU
Software Engineering Institute (2019).
38. P. Santhanam, “Quality management of machine learning systems,” In: O. Shehory, E. Farchi and G. Barash (eds.) Engineering Dependable and Secure Machine Learning Systems. EDSMLS 2020. Communications in Computer and Information Science, 1272, pp. 1–13, Springer, Cham (2020).
39. I. Ozkaya, “What is really different in engineering AI-enabled systems?” IEEE Software, 37(4), pp. 3–6 (July–August 2020).
40. J. Bosch, I. Crnkovic, and H. H. Olsson, “Engineering AI systems: a research agenda,”
arXiv:2001.07522 (2020).
Epilogue
Data is going to be critical … in fact, it’s all about the data! We need to develop
our Data Science skills and really look at the quality of our data. We also need to
look hard at the external data we rely on and ensure it’s available in the future.
Whilst this will help our AI projects, it’s actually a good thing to do anyway.
Whilst our competitors are making a big noise about AI, the CEO isn’t sure
they’re really using it properly. She’s seen some very high-profile demos … the
sort of things a Graduate would throw together in a few hours … but nothing that
suggests they’ve done this properly. We’re going to do it properly, with proper infrastructure, DevOps and a real focus on building a trustworthy capability that our customers and the public will want to be part of.
Ultimately, this is all about skills … AI is not engineering free and we need to
put the engineering skills in place to build proper applications. Remember applica-
tions are much more than just algorithms. There’s so much more to this!
So, I think the brief is quite simple … let’s get a team together … not just techies
… we need the business and domain specialists working alongside engineers who
really understand this stuff. By engineers, I mean people who build systems and
understand what that means. They need to be educated in AI and systems engi-
neering. With the right skills and the right understanding, we can start developing
a proper strategy so that we can be the leaders in this field.