The Deep Learning Architect's Handbook: Build and deploy production-ready DL solutions leveraging the latest Python techniques

About this ebook

Deep learning enables previously unattainable feats in automation, but extracting real-world business value from it is a daunting task. This book will teach you how to build complex deep learning models and gain intuition for structuring your data to accomplish your deep learning objectives.
This deep learning book explores every aspect of the deep learning life cycle, from planning and data preparation to model deployment and governance, using real-world scenarios that will take you through creating, deploying, and managing advanced solutions. You’ll also learn how to work with image, audio, text, and video data using deep learning architectures, as well as optimize and evaluate your deep learning models objectively to address issues such as bias, fairness, adversarial attacks, and model transparency.
As you progress, you’ll harness the power of AI platforms to streamline the deep learning life cycle and leverage Python libraries and frameworks such as PyTorch, ONNX, Catalyst, MLflow, Captum, NVIDIA Triton, Prometheus, and Grafana to execute efficient deep learning architectures, optimize model performance, and streamline the deployment processes. You’ll also discover the transformative potential of large language models (LLMs) for a wide array of applications.
By the end of this book, you'll have mastered deep learning techniques to unlock its full potential for your endeavors.

Language: English
Release date: Dec 29, 2023
ISBN: 9781803235349


    The Deep Learning Architect’s Handbook

    Copyright © 2023 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Group Product Manager: Ali Abidi

    Book Project Manager: Shambhavi Mishra

    Senior Editor: Rohit Singh

    Technical Editor: Devanshi Ayare

    Copy Editor: Safis Editing

    Proofreader: Safis Editing

    Indexer: Subalakshmi Govindhan

    Production Designer: Ponraj Dhandapani

    DevRel Marketing Executive: Vinishka Kalra

    First published: December 2023

    Production reference: 1301123

    Published by Packt Publishing Ltd.

    Grosvenor House

    11 St Paul’s Square

    Birmingham

    B3 1RB, UK.

    ISBN 978-1-80324-379-5

    www.packtpub.com

    To my wife, Nina, the constant source of inspiration, support, and encouragement in my life. Without you, this book would have remained a dream.

    – Ee Kin Chin

    Contributors

    About the author

    Ee Kin Chin is a senior deep learning engineer at DataRobot. He led teams to develop advanced AI tools used by numerous organizations from diverse industries and provided consultation on many customer AI use cases. Previously, he worked on deep learning (DL) computer vision projects for smart vehicles and human sensing applications at Panasonic and offered AI solutions using edge cameras at a tech solutions provider. He was also a DL mentor for an online course. Holding a Bachelor of Engineering (honors) degree in electronics, with a major in telecommunications, and a proven track record of successful application of AI, Ee Kin's expertise includes embedded applications, practical deep learning, data science, and classical machine learning.

    A huge shout-out to my fantastic friends, mentors, colleagues, family, book reviewers, and the open source community who’ve supported and motivated me during my professional career. Your shared knowledge, insights, and wisdom have been invaluable.

    About the reviewers

    Shivani Modi is a data scientist with expertise in machine learning, deep learning, and NLP, holding a master’s degree from Columbia University. Her five years of professional experience spans IBM, SAP, and C3 AI, where she has excelled in deploying scalable AI models across various sectors. At Konko AI, Shivani spearheaded the development of tools to optimize LLM selection and deployment. Shivani’s dedication to mentoring and talent development, coupled with her hands-on experience in leading complex projects, underscores her status as a thought leader in AI innovation. Her upcoming project aims to revolutionize how developers utilize LLMs, ensuring their secure and efficient implementation.

    Ved Upadhyay is a seasoned data science and AI professional, bringing over seven years of hands-on experience in addressing enterprise-level challenges in deep learning. His expertise spans diverse industries, including retail, e-commerce, pharmaceuticals, agro-tech, and socio-tech, where he has successfully implemented AI solutions. Ved is currently working as a senior data scientist at Walmart, where he leads multiple initiatives focused on customer propensity and responsible AI. He earned his master’s degree in data science from the University of Illinois Urbana-Champaign and has contributed as a deep learning researcher at IIIT Hyderabad.

    Table of Contents

    Preface

    Part 1 – Foundational Methods

    1

    Deep Learning Life Cycle

    Technical requirements

    Understanding the machine learning life cycle

    Strategizing the construction of a deep learning system

    Starting the journey

    Evaluating deep learning’s worthiness

    Defining success

    Planning resources

    Preparing data

    Deep learning problem types

    Acquiring data

    Making sense of data through exploratory data analysis (EDA)

    Data pre-processing

    Developing deep learning models

    Deep learning model families

    The model development strategy

    Delivering model insights

    Managing risks

    Ethical and regulatory risks

    Business context mismatch

    Data collection and annotation risks

    Data security risk

    Summary

    Further reading

    2

    Designing Deep Learning Architectures

    Technical requirements

    Exploring the foundations of neural networks using an MLP

    Understanding neural network gradients

    Understanding gradient descent

    Implementing an MLP from scratch

    Implementing MLP using deep learning frameworks

    Regularization

    Designing an MLP

    Summary

    3

    Understanding Convolutional Neural Networks

    Technical requirements

    Understanding the convolutional neural network layer

    Understanding the pooling layer

    Building a CNN architecture

    Designing a CNN architecture for practical usage

    Exploring the CNN architecture families

    Understanding the ResNet model family

    Understanding the DenseNet architecture family

    Understanding the EfficientNet architecture family

    Understanding small and fast CNN architecture families for small-scale edge devices

    Understanding SqueezeNet

    Understanding MobileNet

    Understanding MicroNet, the current state-of-the-art architecture for the edge

    Summary

    4

    Understanding Recurrent Neural Networks

    Technical requirements

    Understanding LSTM

    Decoding the forget mechanism of LSTMs

    Decoding the learn mechanism of LSTMs

    Decoding the remember mechanism of LSTMs

    Decoding the information-using mechanism of LSTMs

    Building a full LSTM network

    Understanding GRU

    Decoding the reset gate of GRU

    Decoding the update gate of GRU

    Understanding advancements over the standard GRU and LSTM layers

    Decoding bidirectional RNN

    Adding peepholes to LSTMs

    Adding working memory to exceed the peephole connection limitations for LSTM

    Summary

    5

    Understanding Autoencoders

    Technical requirements

    Decoding the standard autoencoder

    Exploring autoencoder variations

    Building a CNN autoencoder

    Summary

    6

    Understanding Neural Network Transformers

    Exploring neural network transformers

    Decoding the original transformer architecture holistically

    Uncovering transformer improvements using only the encoder

    Improving the encoder-only pre-training tasks and objectives

    Improving the encoder-only transformer’s architectural compactness and efficiency

    Improving the encoder-only transformers’ core functional architecture

    Uncovering encoder-only transformers’ adaptations to other data modalities

    Uncovering transformer improvements using only the decoder

    Diving into the GPT model family

    Diving into the XLNet model

    Discussing additional advancements for a decoder-only transformer model

    Summary

    7

    Deep Neural Architecture Search

    Technical requirements

    Understanding the big picture of NAS

    Understanding general hyperparameter search-based NAS

    Searching neural architectures by using successive halving

    Searching neural architectures by using Hyperband

    Searching neural architectures by using Bayesian hyperparameter optimization

    Understanding RL-based NAS

    Understanding founding NAS based on RL

    Understanding ENAS

    Understanding MNAS

    Summarizing NAS with RL methods

    Understanding non-RL-based NAS

    Understanding path elimination-based NAS

    Understanding progressive growth-based NAS

    Summary

    8

    Exploring Supervised Deep Learning

    Technical requirements

    Exploring supervised use cases and problem types

    Implementing neural network layers for foundational problem types

    Implementing the binary classification layer

    Implementing the multiclass classification layer

    Implementing a regression layer

    Implementing representation layers

    Training supervised deep learning models effectively

    Preparing the data for DL training

    Configuring and tuning DL hyperparameters

    Executing, visualizing, tracking, and comparing experiments

    Exploring model-building tips

    Exploring general techniques to realize and improve supervised deep learning-based solutions

    Breaking down the multitask paradigm in supervised deep learning

    Multitask pipelines

    TL

    Multiple objective learning

    Multimodal NN training

    Summary

    9

    Exploring Unsupervised Deep Learning

    Technical requirements

    Exploring unsupervised deep learning applications

    Creating pretrained network weights for downstream tasks

    Creating general representations through unsupervised deep learning

    Exploring zero-shot learning

    Exploring the dimensionality reduction component of unsupervised deep learning

    Detecting anomalies in external data

    Summary

    Part 2 – Multimodal Model Insights

    10

    Exploring Model Evaluation Methods

    Technical requirements

    Exploring the different model evaluation methods

    Engineering the base model evaluation metric

    Exploring custom metrics and their applications

    Exploring statistical tests for comparing model metrics

    Relating the evaluation metric to success

    Directly optimizing the metric

    Summary

    11

    Explaining Neural Network Predictions

    Technical requirements

    Exploring the value of prediction explanations

    Demystifying prediction explanation techniques

    Exploring gradient-based prediction explanations

    Trusting and understanding integrated gradients

    Using integrated gradients to aid in understanding predictions

    Explaining prediction explanations automatically

    Exploring common pitfalls in prediction explanations and how to avoid them

    Summary

    Further reading

    12

    Interpreting Neural Networks

    Technical requirements

    Interpreting neurons

    Finding neurons to interpret

    Interpreting learned image patterns

    Explaining predictions with image input data and integrated gradients

    Practically visualizing neurons with image input data

    Discovering the counterfactual explanation strategy

    Summary

    13

    Exploring Bias and Fairness

    Technical requirements

    Exploring the types of bias

    Understanding the source of AI bias

    Discovering bias and fairness evaluation methods

    Evaluating the bias and fairness of a deep learning model

    Tailoring bias and fairness measures across use cases

    Mitigating AI bias

    Summary

    14

    Analyzing Adversarial Performance

    Technical requirements

    Using data augmentations for adversarial analysis

    Analyzing adversarial performance for audio-based models

    Executing adversarial performance analysis for speech recognition models

    Analyzing adversarial performance for image-based models

    Executing adversarial performance analysis for a face recognition model

    Exploring adversarial analysis for text-based models

    Summary

    Part 3 – DLOps

    15

    Deploying Deep Learning Models to Production

    Technical requirements

    Exploring the crucial components for DL model deployment

    Identifying key DL model deployment requirements

    Choosing the right DL model deployment options

    Architectural choices

    Computing hardware choices

    Model packaging and frameworks

    Communication protocols to use

    User interfaces

    Exploring deployment decisions based on practical use cases

    Exploring deployment decisions for a sentiment analysis application

    Exploring deployment decisions for a face detection and recognition system for security cameras

    Discovering general recommendations for DL deployment

    Model safety, trust, and reliability assurance

    Optimizing model latency

    Tools that abstract deployment

    Deploying a language model with ONNX, TensorRT, and NVIDIA Triton Server

    Practically deploying a DL model with the single pipeline approach

    Summary

    16

    Governing Deep Learning Models

    Technical requirements

    Governing deep learning model utilization

    Governing a deep learning model through monitoring

    Monitoring a deployed deep learning model with NVIDIA Triton Server, Prometheus, and Grafana

    Governing a deep learning model through maintenance

    Exploring limitations and risks of using automated tasks triggered by model monitoring alerts

    Summary

    17

    Managing Drift Effectively in a Dynamic Environment

    Technical requirements

    Exploring the issues of drift

    Exploring the types of drift

    Exploring data drift types

    Exploring concept drift

    Exploring model drift

    Exploring strategies to handle drift

    Exploring drift detection strategies

    Analyzing the impact of drift

    Exploring strategies to mitigate drift

    Detecting drift programmatically

    Detecting concept drift programmatically

    Detecting data drift programmatically

    Implementing programmatic data distribution drift detection using Evidently

    Comparing and contrasting the Evidently and Alibi-Detect libraries for drift detection

    Summary

    18

    Exploring the DataRobot AI Platform

    Technical requirements

    A high-level look into what the DataRobot AI platform provides

    Preparing data with DataRobot

    Ingesting data for deep learning model development

    Exploratory analysis of the data

    Wrangling data for deep learning model development

    Executing modeling experiments with DataRobot

    Deep learning modeling

    Gathering model and prediction insights

    Making batch predictions

    Deploying a deep learning blueprint

    Practically deploying a blueprint in DataRobot

    Governing a deployed deep learning blueprint

    Governing through model utilization in DataRobot

    Governing through model monitoring in DataRobot

    Governing through model maintenance in DataRobot

    Exploring some customer success stories

    Summary

    19

    Architecting LLM Solutions

    Overview of LLM solutions

    Handling knowledge for LLM solutions

    Exploring chunking methods

    Exploring embedding models

    Exploring the knowledge base index types

    Exploring orchestrator tools for LLM solutions

    Evaluating LLM solutions

    Evaluating LLM solutions through quantitative metrics

    Evaluating LLM solutions through qualitative evaluation methods

    Identifying challenges with LLM solutions

    Tackling challenges with LLM solutions

    Tackling the output and input limitation challenge

    Tackling the knowledge- and information-related challenge

    Tackling the challenges of accuracy and reliability

    Tackling the runtime performance challenge

    Tackling the challenge of ethical implications and societal impacts

    Tackling the overarching challenge of LLM solution adoption

    Leveraging LLM to build autonomous agents

    Exploring LLM solution use cases

    Summary

    Further reading

    Index

    Other Books You May Enjoy

    Preface

    As a deep learning practitioner and enthusiast, I have spent years working on various projects and learning from diverse sources such as Kaggle, GitHub, colleagues, and real-life use cases. I've realized that there is a significant gap in the availability of cohesive, end-to-end deep learning resources. Traditional Massive Open Online Courses (MOOCs), while helpful, often lack the practical knowledge and real-world insights that can only be gained through hands-on experience.

    To bridge this gap, I've created The Deep Learning Architect's Handbook, a comprehensive and practical guide that combines my unique experiences and insights. This book will help you navigate the complex landscape of deep learning, providing you with the knowledge and insights that would typically take years of hands-on experience to acquire, condensed into a resource that can be consumed in just days or weeks.

    This book delves into various stages of the deep learning life cycle, from planning and data preparation to model deployment and governance. Throughout this journey, you'll encounter both foundational and advanced deep learning architectures, such as Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), autoencoders, transformers, and cutting-edge methods, such as Neural Architecture Search (NAS). Divided into three parts, this book covers foundational methods, model insights, and DLOps, exploring advanced topics such as NAS, adversarial performance, and Large Language Model (LLM) solutions. By the end of this book, you will be well-prepared to design, develop, and deploy effective deep learning solutions, unlocking their full potential and driving innovation across various applications.

    I hope that this book will serve as a way for me to give back to the community, by sparking conversations, challenging assumptions, and inspiring new ideas and approaches in the field of deep learning. I invite you to join me on this journey, and I look forward to hearing your thoughts and feedback as we explore the captivating world of deep learning together. Please feel free to reach out to me via LinkedIn through www.linkedin.com/in/chineekin, Kaggle through https://www.kaggle.com/dicksonchin93, or other channels listed on my LinkedIn profile. Your unique experiences and perspectives will undoubtedly contribute to the ongoing evolution of this book and the deep learning community as a whole.

    Who this book is for

    This book is best suited for deep learning practitioners, data scientists, and machine learning developers who want to explore deep learning architectures to solve complex business problems. It is aimed at professionals in the deep learning and AI space who will apply this knowledge to their business use cases. A working knowledge of Python programming and a basic understanding of deep learning techniques are needed to get the most out of this book.

    What this book covers

    Chapter 1, Deep Learning Life Cycle, introduces the key stages of a deep learning project, focusing on planning and data preparation, and sets the stage for a comprehensive exploration of the deep learning life cycle throughout the book.

    Chapter 2, Designing Deep Learning Architectures, dives into the foundational aspects of deep learning architectures, including MLPs, and discusses their role in advanced neural networks, as well as the importance of backpropagation and regularization.

    Chapter 3, Understanding Convolutional Neural Networks, provides an in-depth look at CNNs, their applications in image processing, and various model families within the CNN domain.

    Chapter 4, Understanding Recurrent Neural Networks, explores the structure and variations of RNNs and their ability to process sequential data effectively.

    Chapter 5, Understanding Autoencoders, examines the fundamentals of autoencoders as a method for representation learning and their applications across different data modalities.

    Chapter 6, Understanding Neural Network Transformers, delves into the versatile nature of transformers, capable of handling diverse data modalities without explicit data-specific biases, and their potential applications in various tasks and domains.

    Chapter 7, Deep Neural Architecture Search, introduces the concept of NAS as a way to automate the design of advanced neural networks and discusses its applications and limitations in different scenarios.

    Chapter 8, Exploring Supervised Deep Learning, covers various supervised learning problem types, techniques for implementing and training deep learning models, and practical implementations using popular deep learning frameworks.

    Chapter 9, Exploring Unsupervised Deep Learning, discusses the contributions of deep learning to unsupervised learning, particularly highlighting the unsupervised pre-training method. Harnessing the vast amounts of freely available data on the internet, this approach improves model performance for downstream supervised tasks and paves the way toward general Artificial Intelligence (AI).

    Chapter 10, Exploring Model Evaluation Methods, provides an overview of model evaluation techniques, metric engineering, and strategies for optimizing against evaluation metrics.

    Chapter 11, Explaining Neural Network Predictions, delves into the prediction explanation landscape, focusing on the integrated gradients technique and its practical applications for understanding neural network predictions.

    Chapter 12, Interpreting Neural Networks, delves into the nuances of model understanding and showcases techniques for uncovering patterns detected by neurons. By exploring real images and generating images through optimization to activate specific neurons, you will gain valuable insights into the neural network’s decision-making process.

    Chapter 13, Exploring Bias and Fairness, addresses the critical issue of bias and fairness in machine learning models, discussing various types, metrics, and programmatic methods for detecting and mitigating bias.

    Chapter 14, Analyzing Adversarial Performance, examines the importance of adversarial performance analysis in identifying vulnerabilities and weaknesses in machine learning models, along with practical examples and techniques for analysis.

    Chapter 15, Deploying Deep Learning Models to Production, focuses on key components, requirements, and strategies for deploying deep learning models in production environments, including architectural choices, hardware infrastructure, and model packaging.

    Chapter 16, Governing Deep Learning Models, explores the fundamental pillars of model governance, including model utilization, model monitoring, and model maintenance, while providing practical steps for monitoring deep learning models.

    Chapter 17, Managing Drift Effectively in a Dynamic Environment, discusses the concept of drift and its impact on model performance, along with strategies for detecting, quantifying, and mitigating drift in deep learning models.

    Chapter 18, Exploring the DataRobot AI Platform, showcases the benefits of AI platforms, specifically DataRobot, in streamlining and accelerating the deep learning life cycle, and highlights various features and capabilities of the platform.

    Chapter 19, Architecting LLM Solutions, delves into LLMs and their potential applications, challenges, and strategies for creating effective, contextually aware solutions using LLMs.

    To get the most out of this book

    The code provided in the chapters has been tested on a computer with Python 3.10, Ubuntu 20.04 LTS 64-bit OS, 32 GB RAM, and an RTX 2080 Ti GPU for running deep learning models. Although the code has been tested on this specific setup, it may also work on other configurations; however, compatibility and performance are not guaranteed. Python dependencies are included in the requirements.txt file in each chapter’s respective GitHub folder for easy installation. Additionally, some non-Python software might be required; installation instructions will be mentioned at the beginning of each relevant tutorial, and you will need to refer to external manuals or guides to install that software. Do keep in mind the potential differences in system configurations as you carry out the practical code sections in this book.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/The-Deep-Learning-Architect-Handbook. If there’s an update to the code, it will be updated in the GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: We will be using pandas for data manipulation and structuring, matplotlib and seaborn for plotting graphs, tqdm for visualizing iteration progress, and lingua for text language detection.

    A block of code is set as follows:

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from tqdm import tqdm
    from lingua import Language, LanguageDetectorBuilder
    tqdm.pandas()

    Any command-line input or output is written as follows:

    sudo systemctl start node_exporter
    sudo systemctl start prometheus

    Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: We can set up the Prometheus link now by clicking on the three-line button on the top-left tab and clicking on the Data Sources tab under the Administration dropdown.

    Tips or important notes

    Appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.


    Share Your Thoughts

    Once you’ve read The Deep Learning Architect’s Handbook, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

    Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

    Download a free PDF copy of this book

    Thanks for purchasing this book!

    Do you like to read on the go but are unable to carry your print books everywhere?

    Is your eBook purchase not compatible with the device of your choice?

    Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

    Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

    The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

    Follow these simple steps to get the benefits:

    Scan the QR code or visit the link below

    https://packt.link/free-ebook/9781803243795

    Submit your proof of purchase

    That’s it! We’ll send your free PDF and other benefits to your email directly.

    Part 1 – Foundational Methods

    In this part of the book, you will gain a comprehensive understanding of the foundational methods and techniques in deep learning architectures. Starting with the deep learning life cycle, you will explore various stages at a high level, from planning and data preparation to model development, insights, deployment, and governance. You will then dive into the intricacies of designing deep learning architectures such as MLPs, CNNs, RNNs, autoencoders, and transformers. Additionally, you will learn about the emerging method of neural architecture search and its impact on the field of deep learning.

    Throughout this part, you will also delve into the practical aspects of supervised and unsupervised deep learning, covering topics such as binary classification, multiclass classification, regression, and multitask learning, as well as unsupervised pre-training and representation learning. With a focus on real-world applications, this part provides valuable insights into the implementation of deep learning models using popular frameworks and programming languages.

    By the end of this part, you will have a solid foundation in deep learning architectures, methods, and life cycles, which will enable you to continue your journey to face other challenges involved in crafting deep learning solutions.

    This part contains the following chapters:

    Chapter 1, Deep Learning Life Cycle

    Chapter 2, Designing Deep Learning Architectures

    Chapter 3, Understanding Convolutional Neural Networks

    Chapter 4, Understanding Recurrent Neural Networks

    Chapter 5, Understanding Autoencoders

    Chapter 6, Understanding Neural Network Transformers

    Chapter 7, Deep Neural Architecture Search

    Chapter 8, Exploring Supervised Deep Learning

    Chapter 9, Exploring Unsupervised Deep Learning

    1

    Deep Learning Life Cycle

    In this chapter, we will explore the intricacies of the deep learning life cycle. Sharing many characteristics with the machine learning life cycle, the deep learning life cycle is a framework as much as it is a methodology, one that allows a deep learning project idea to become wildly successful or to be scrapped entirely when that is the appropriate call. We will grasp the reasons why the process is cyclical and understand some of the life cycle’s initial processes on a deeper level. Additionally, we will go through some high-level sneak peeks of the later processes of the life cycle that will be explored at a deeper level in future chapters.

    Comprehensively, this chapter will help you do the following:

    Understand the similarities and differences between the deep learning life cycle and its machine learning life cycle counterpart

    Understand where domain knowledge fits in a deep learning project

    Understand the few key steps in planning a deep learning project to make sure it can tangibly create real-world value

    Grasp some deep learning model development details at a high level

    Grasp the importance of model interpretation and the variety of deep learning interpretation techniques at a high level

    Explore high-level concepts of model deployments and their governance

    Learn to choose the necessary tools to carry out the processes in the deep learning life cycle

    We’ll cover this material in the following sections:

    Machine learning life cycle

    The construction strategy of a deep learning life cycle

    The data preparation stage

    Deep learning model development

    Delivering model insights

    Managing risks

    Technical requirements

    This chapter includes some practical implementations in the Python programming language. To complete it, you need to have a computer with the following libraries installed:

    pandas

    matplotlib

    seaborn

    tqdm

    lingua

    The code files are available on GitHub: https://github.com/PacktPublishing/The-Deep-Learning-Architect-Handbook/tree/main/CHAPTER_1.

    Understanding the machine learning life cycle

    Deep learning is a subset of the wider machine learning category. The main characteristic that sets it apart from other machine learning algorithms is its foundational building block, the neural network. As deep learning has advanced tremendously since the early 2000s, it has made possible many feats that were previously unachievable with its machine learning counterparts. Specifically, deep learning has made breakthroughs in recognizing the complex patterns that exist in complex, unstructured data such as text, images, videos, and audio. Some of the successful applications of deep learning today are face recognition with images, speech recognition from audio data, and language translation with textual data.

    Machine learning, on the other hand, is a subset of the wider artificial intelligence category. Its algorithms, such as tree-based models and linear models, which are not considered to be deep learning models, still serve a wide range of use cases involving tabular data, which is the bulk of the data that’s stored by small and big organizations alike. This tabular data may exist in multiple structured databases and can span from 1 to 10 years’ worth of historical data that has the potential to be used for building predictive machine learning models. Some of the notable predictive applications for machine learning algorithms are fraud detection in the finance industry, product recommendations in e-commerce, and predictive maintenance in the manufacturing industry. Figure 1.1 shows the relationships between deep learning, machine learning, and artificial intelligence for a clearer visual distinction between them:

    Figure 1.1 – Artificial intelligence relationships

    Now that we know what deep learning and machine learning are in a nutshell, we are ready for a glimpse of the machine learning life cycle, as shown in Figure 1.2:

    Figure 1.2 – Deep learning/machine learning life cycle

    As advanced and complex as deep learning algorithms are compared to other machine learning algorithms, the guiding methodologies needed to ensure success in both domains are unequivocally the same. The machine learning life cycle involves six stages that interact with each other in different ways:

    Planning

    Data Preparation

    Model Development

    Deliver Model Insights

    Model Deployment

    Model Governance

    Figure 1.2 shows these six stages and the possible stage transitions, depicted with arrows. Typically, a machine learning project will iterate between stages, depending on the business requirements. In a deep learning project, most innovative predictive use cases require manual data collection and data annotation, a process that lies in the realm of the Data Preparation stage. As this process is generally time-consuming, especially when the data itself is not readily available, a go-to solution is to start with an acceptable initial amount of data, transition into the Model Development stage and, subsequently, the Deliver Model Insights stage to make sure the results of the ideas are sane.

    After the initial validation process, depending again on business requirements, practitioners would then decide to transition back into the Data Preparation stage and continue to iterate through these stages cyclically at different data size milestones until results are satisfactory for both the model development and business metrics. Once it gets approval from the necessary stakeholders, the project then goes into the Model Deployment stage, where the built machine learning model will be served so that its predictions can be consumed. The final stage is Model Governance, where practitioners carry out tasks that manage the risk, performance, and reliability of the deployed machine learning model. Model deployment and model governance both deserve more in-depth discussion and will be introduced in separate chapters closer to the end of this book. Whenever any of the key metrics fails to hold at a predetermined confidence level, the project falls back into the Data Preparation stage of the cycle and repeats the same flow all over again.

    The ideal machine learning project flows through the stages cyclically for as long as the business application needs it. However, machine learning projects are typically susceptible to a high probability of failure. According to a survey conducted by Dimensional Research and Alegion, covering around 300 machine learning practitioners from 20 different business industries, 78% of machine learning projects get held back or delayed at some point before deployment. Additionally, Gartner predicted that 85% of machine learning projects will fail (https://venturebeat.com/2021/06/28/why-most-ai-implementations-fail-and-what-enterprises-can-do-to-beat-the-odds/). By expecting the unexpected, and anticipating failures before they happen, practitioners can likely circumvent potential failure factors early on, in the planning stage. This also brings us to the trash icon shown in Figure 1.2. Proper projects with a good plan typically get discarded only at the Deliver Model Insights stage, when it’s clear that the proposed model and project can’t deliver satisfactory results.

    Now that we’ve covered an overview of the machine learning life cycle, let’s dive into each of the stages individually, broken down into sections, to help you discover the key tips and techniques needed to complete each stage successfully. These stages will be discussed in an abstract format and are not a concrete depiction of what you should ultimately be doing for your project, since all projects are unique and strategies should always be evaluated on a case-by-case basis.

    Strategizing the construction of a deep learning system

    A deep learning model can only realize real-world value by being part of a system that performs some sort of operation. Bringing deep learning models from research papers to actual real-world usage is not an easy task. Thus, performing proper planning before conducting any project is a more reliable and structured way to achieve the desired goals. This section will discuss some considerations and strategies that will be beneficial when you start to plan your deep learning project toward success.

    Starting the journey

    Today, deep learning practitioners tend to focus a lot on the algorithmic model-building part of the process. It takes a considerable amount of mental strength not to get hooked on the hype of state-of-the-art (SOTA) research-focused techniques. With crazy techniques such as pix2pix, which is capable of generating high-resolution, realistic color images from just sketches or image masks, and natural language processing (NLP) techniques such as GPT-3, a 175-billion-parameter text generation model from OpenAI, and GPT-4, a multimodal successor to GPT-3 and its sub-models that is capable of generating practically anything you ask of it in text form, ranging from text summarization to code, why wouldn’t they?!

    Jokes aside, to become a true deep learning architect, we need to come to a consensus that any successful machine learning or deep learning project starts with the business problem, not with the shiny new research paper you just read online, complete with a public GitHub repository. The planning stage often involves many business executives who are not savvy about the details of machine learning algorithms, and often the same people would not care about them at all. These algorithms are daunting for business-focused stakeholders to understand, and when that is added on top of the already tough mental barriers to adopting artificial intelligence technologies, it does not make the project any more likely to be adopted.

    Evaluating deep learning’s worthiness

    Deep learning shines the most in handling unstructured data. This includes image data, text data, audio data, and video data. This is largely due to the model’s ability to automatically learn and extract complex, high-level features from the raw data. In the case of images and videos, deep learning models can capture spatial and temporal patterns, recognizing objects, scenes, and activities. With audio data, deep learning can understand the nuances of speech, noise, and various sound elements, making it possible to build applications such as speech recognition, voice assistants, and audio classification systems. For text data, deep learning models can capture the context, semantics, and syntax, enabling NLP tasks such as sentiment analysis, machine translation, and text summarization.

    This means that if this data exists and is utilized by your company in its business processes, there may be an opportunity to solve a problem with the help of deep learning. However, never overcomplicate problems just so you can solve them with deep learning. Equating this to something more relatable, you wouldn’t use a huge sledgehammer to get a nail into wood. It could work and you might get away with it, but you’d risk bending the nail or injuring yourself while using it.

    Once a problem has been identified, evaluate the business value of solving it. Not all problems are born the same and they can be ranked based on their business impact, value, complexity, risks, costs, and suitability for deep learning. Generally, you’d be looking for high impact, high value, low complexity, low risks, low cost, and high suitability to deep learning. Trade-offs between these metrics are expected but simply put, make sure the problem you’ve discovered is worth solving at all with deep learning. A general rule of thumb is to always resort to a simpler solution for a problem, even if it ends up abandoning the usage of deep learning technologies. Simple approaches tend to be more reliable, less costly, less prone to risks, and faster to fruition.

    Consider a problem where a solution is needed to remove background scenes in a video feed and leave only humans or necessary objects untouched so that a more suitable background scene can be overlaid as a background instead. This is a common problem in the professional filmmaking industry in all film genres today.

    Semantic segmentation, which is the task of assigning a label to every pixel of an image in the width and height dimensions, is a method that is needed to solve such a problem. In this case, the task needs to assign labels that can help identify which pixels need to be removed. With the advent of many publicly available semantic segmentation datasets, deep learning has been able to advance considerably in the semantic segmentation field, allowing itself to achieve a very satisfactory fine-grained understanding of the world, enough so that it can be applied in the industry of autonomous driving and robot navigation most prominently. However, deep learning is not known to be 100% error-free and almost always has some error, even in the controlled evaluation dataset. In the case of human segmentation, for example, the model would likely result in the most errors in the fine hair areas. Most filmmakers aim for perfect depictions of their films and require that every single pixel gets removed appropriately without fail since a lot of money is spent on the time of the actors hired for the film. Additionally, a lot of time and money would be wasted in manually removing objects that could be otherwise simply removed if the scene had been shot with a green screen. This is an example of a case where we should not overcomplicate the problem. A green screen is all you need to solve the problem described: specifically, the rare chromakey green color. When green screens are prepped properly in the areas where the desired imagery will be overlaid digitally, image processing techniques alone can remove the pixels that are considered to be in the small light intensity range centered on the chromakey green color and achieve semantic segmentation effectively with a rule-based solution. The green screen is a simpler solution that is cost-effective, foolproof, and fast to set up.
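
    To make the rule-based alternative concrete, here is a minimal sketch of chroma key removal using OpenCV and NumPy (neither library is used elsewhere in this chapter, and the HSV bounds and filenames are illustrative assumptions you would tune and replace for your own footage):

    import cv2
    import numpy as np

    # Hypothetical frame shot against a green screen
    frame = cv2.imread("studio_frame.png")
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # Keep only pixels within a narrow band around chroma key green (illustrative bounds)
    lower_green = np.array([50, 100, 100])
    upper_green = np.array([70, 255, 255])
    background_mask = cv2.inRange(hsv, lower_green, upper_green)

    # Everything that is not green is treated as foreground (actors, props)
    foreground_mask = cv2.bitwise_not(background_mask)
    foreground = cv2.bitwise_and(frame, frame, mask=foreground_mask)
    cv2.imwrite("foreground_only.png", foreground)

    A production pipeline would add edge refinement around fine details such as hair, but the core of the rule-based approach really is just this color threshold.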

    That was a mouthful! Now, let’s go through a simpler problem. Consider a problem where we want to automatically and digitally identify when it rains. In this use case, it is important to understand the actual requirements and goals of identifying the rain: is it sufficient to detect rain exactly when it happens? Or do we need to identify whether rain will happen in the near future? What will we use the information of rain events for? These questions will guide whether deep learning is required or not. We, as humans, know that rain can be predicted by visual input by either looking at the presence of raindrops falling or looking at cloud conditions. However, if the use case is sufficient to detect rain when it happens, and the goal of detecting rain is to determine when to water the plants, a simpler approach would be to use an electronic sensor to detect the presence of water or humidity. Only when you want to estimate whether it will rain in the future, let’s say in 15 minutes, does deep learning make more sense to be applied as there are a lot of interactions between meteorological factors that can affect rainfall. Only by brainstorming each use case and analyzing all potential solutions, even outside of deep learning, can you make sure deep learning brings tangible business value compared to other solutions. Do not just apply deep learning because you want to.

    At times, when value isn’t clear when you’re directly considering a use case, or when value is clear but you have no idea how to execute it, consider finding reference projects from companies in the same industry. Companies in the same industry have a high chance of wanting to optimize the same processes or solve the same pain points. Similar reference projects can serve as a guide to designing a deep learning system and can serve as proof that the use case being considered is worthy of the involvement of deep learning technologies. Of course, not everybody has access to details like this, but you’d be surprised what Google can tell you these days. Even if there isn’t a similar project being carried out for direct reference, you would likely be able to pivot upon the other machine learning project references that already have a track record of bringing value to the same industry.

    Admittedly, rejecting deep learning at times would be a hard pill to swallow considering that most practitioners get paid to implement deep learning solutions. However, dismissing it earlier will allow you to focus your time on more valuable problems that would be more useful to solve with deep learning and prevent the risk of undermining the potential of deep learning in cases where simpler solutions can outperform deep learning. Criteria for deep learning worthiness should be evaluated on a case-by-case basis and as a practitioner, the best advice to follow is to simply practice common sense. Spend a good amount of time going through the problem exploration and the worthiness evaluation process. The last thing you want is to spend a painstaking amount of time preparing data, building a deep learning model, and delivering very convincing model insights only to find out that the label you are trying to predict does not provide enough value for the business to invest further.

    Defining success

    Ever heard sentences like "My deep learning model just got 99% accuracy on my validation dataset!"? Data scientists often make the mistake of judging the success of a machine learning project solely by the validation metrics they use to evaluate their machine learning models during the model development process. Model-building metrics such as accuracy, precision, or recall are important to consider in a machine learning project, but unless they add business value and connect to the business objectives in some way, they rarely mean anything. A project can achieve a good accuracy score but still fail to achieve the desired business goals. This can happen when no proper success metrics have been defined early on, subsequently causing the wrong label to be used in the data preparation and model development stages. Furthermore, even when the model metric directly and positively impacts business processes, there is a chance that the achievement won’t be communicated effectively to business stakeholders and, in the worst case, won’t be considered a success when reported as-is.

    Success metrics, when defined early, act as the machine learning project’s guardrails and ensure that the project goals are aligned with the business goals. One such guardrail is that a success metric can help guide the choice of a proper label that can, at inference time, tangibly improve business processes or otherwise create value for the business. First, let’s make sure we are aligned on what a label means: it is the value that you want the machine learning model to predict. The purpose of a machine learning model is to assign these labels automatically given some form of input data, and thus, during the data preparation and model development stages, a label needs to be chosen to serve that purpose. Choosing the wrong label can be catastrophic to a deep learning project because sometimes, when data is not readily available, it means the project has to start all over again from the data preparation stage. Labels should always be directly or indirectly attributed to the success metric.

    Success metrics, as the name suggests, can be plural, and range from time-based success definitions or milestones to the overall project success, and from intangible to tangible. It’s good practice to generally brainstorm and document all the possible success criteria from a low level to a high level. Another best practice is to make sure to always define tangible success metrics alongside intangible metrics. Intangible metrics generate awareness, but tangible metrics make sure things are measurable and thus make them that much more attainable. A few examples of intangible and hard-to-measure metrics are as follows:

    Increasing customer satisfaction

    Increasing employee performance

    Improving shareholder outlook

    Metrics are ways to measure something and are tied to goals to seal the deal. Goals themselves can be intangible, similar to the few examples listed previously, but so long as it is tied to tangible metrics, the project is off to a good start. When you have a clear goal, ask yourself in what way the goal can be proven to be achieved, demonstrated, or measured. A few examples of tangible success metrics for machine learning projects that could align with business goals are as follows:

    Increase the time customers spend, which can be a proxy for customer delight

    Increase company revenue, which can be a proxy for employee performance

    Increase the click-through rate (CTR), which can be a proxy for the effectiveness of targeted marketing campaigns

    Increase the customer lifetime value (CLTV), which can be a proxy for long-term customer satisfaction and loyalty

    Increase conversion rate, which can be a proxy for the success of promotional campaigns and website user experience

    This concept is neither new nor limited to machine learning projects; just about any project carried out for a company needs to be aligned with the business goals. Many foundational project management techniques apply similarly to machine learning projects, and spending time gaining some project management skills outside of the machine learning field would be beneficial and transferable to machine learning projects. Additionally, as machine learning is considered a software-based technology, software project management methodologies also apply.

    A final concluding thought to take away is that machine learning systems are not about how advanced your machine learning models are, but instead about how humans and machine intelligence can work together to achieve a greater good and create value.

    Planning resources

    Deep learning often involves neural network architectures with a large set of parameters, otherwise called weights. These architectures can range from holding just a few parameters up to holding hundreds of billions of parameters. For example, OpenAI’s GPT-3 text generation model holds 175 billion neural network parameters, which amounts to around 350 GB in computer storage size. This means that to run GPT-3, you need a machine with a random access memory (RAM) size of at least 350 GB!
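
    As a quick sanity check on that number, the 350 GB figure follows from assuming roughly 2 bytes per parameter (16-bit weights); the arithmetic in Python looks like this:

    # Back-of-the-envelope memory estimate for storing model weights
    num_parameters = 175_000_000_000   # GPT-3's parameter count
    bytes_per_parameter = 2            # assuming 16-bit (half-precision) weights
    total_gb = num_parameters * bytes_per_parameter / 1e9
    print(f"~{total_gb:.0f} GB just to hold the weights")  # ~350 GB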

    Deep learning model frameworks such as PyTorch and TensorFlow have been built to work with devices called graphics processing units (GPUs), which offer tremendous neural network model training and inference speedups. Off-the-shelf GPU devices commonly have 12 GB of GPU RAM and are nowhere near the requirements needed to load a GPT-3 model in GPU mode. However, there are methods to partition big models across multiple GPUs and run the model on GPUs. Additionally, some methods allow for distributed GPU model training and inference to support larger data batch sizes at any one usage point. GPUs are not cheap devices and can cost anywhere from a few hundred to hundreds of thousands of dollars from the most widely used GPU brand, Nvidia. With the rise of cryptocurrency technologies, the availability of GPUs has also been reduced significantly as people buy them up as soon as they are in stock. All of this emphasizes the need to plan computing resources for training and inferencing deep learning models beforehand.
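
    As a rough planning aid, you can compare a candidate model’s weight footprint against the memory of the GPU you intend to use. Here is a minimal PyTorch sketch that uses a toy model purely as a stand-in for whatever architecture you are sizing up:

    import torch
    import torch.nn as nn

    # Toy stand-in for the architecture being considered
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1000))

    num_params = sum(p.numel() for p in model.parameters())
    weight_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    print(f"{num_params:,} parameters, ~{weight_bytes / 1e6:.1f} MB of weights")

    if torch.cuda.is_available():
        gpu_bytes = torch.cuda.get_device_properties(0).total_memory
        print(f"GPU 0 memory: {gpu_bytes / 1e9:.1f} GB")

    Keep in mind that training needs several times the raw weight size for gradients, optimizer state, and activations, so leave generous headroom beyond this estimate.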

    It is important to align your model development and deployment needs with your computing resource allocation early in the project. Start by gauging the range of sizes of deep learning architectures that are suitable for the task at hand, either by browsing research papers or websites that provide a good summary of techniques, and then set aside computing resources for the model development process.

    Tip

    paperswithcode.com provides summaries of a wide variety of techniques, grouped by a wide variety of tasks!

    When computing resources are not readily available, make sure you make purchase plans early, especially if they involve GPUs. But what if a physical machine is not desired? An alternative is to use paid cloud computing providers that you can access online easily from anywhere in the world. During the model development stage, one of the benefits of having more GPUs with more RAM allocated is that it allows you to train models faster, either by using a larger data batch size during training or by training multiple models at any one time. It is generally fine to use CPU-only deep learning model training as well, but the model training time will inevitably be much longer.
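
    The usual PyTorch idiom for falling back to CPU-only training when no GPU is available is shown in the following minimal, self-contained sketch, which uses a toy model and random data purely as placeholders:

    import torch
    import torch.nn as nn

    # Train on a GPU if one is available, otherwise fall back to the CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = nn.Linear(10, 1).to(device)          # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    # Placeholder batch; the batch size you can afford is limited by device RAM
    inputs = torch.randn(32, 10, device=device)
    targets = torch.randn(32, 1, device=device)

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()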

    The GPU- and CPU-based computing resources required during training are often considered overkill for inference time, once models are deployed. Different applications have different deployment computing requirements, and the decision on what resource specification to allocate can be gauged by asking yourself the following three questions:

    How often are the inference requests made?

    Many inference requests in a short period might signal the need to have more than one inference service up in multiple computing devices in parallel

    What is the average amount of samples that are requested for a prediction at any one time?

    Device RAM requirements should match batch size expectations

    How fast do you need a reply?

    GPUs are needed if
