The Deep Learning Architect's Handbook: Build and deploy production-ready DL solutions leveraging the latest Python techniques
By Ee Kin Chin
About this ebook
Deep learning enables previously unattainable feats in automation, but extracting real-world business value from it is a daunting task. This book will teach you how to build complex deep learning models and gain intuition for structuring your data to accomplish your deep learning objectives.
This deep learning book explores every aspect of the deep learning life cycle, from planning and data preparation to model deployment and governance, using real-world scenarios that will take you through creating, deploying, and managing advanced solutions. You’ll also learn how to work with image, audio, text, and video data using deep learning architectures, as well as optimize and evaluate your deep learning models objectively to address issues such as bias, fairness, adversarial attacks, and model transparency.
As you progress, you’ll harness the power of AI platforms to streamline the deep learning life cycle and leverage Python libraries and frameworks such as PyTorch, ONNX, Catalyst, MLflow, Captum, NVIDIA Triton, Prometheus, and Grafana to execute efficient deep learning architectures, optimize model performance, and streamline the deployment processes. You’ll also discover the transformative potential of large language models (LLMs) for a wide array of applications.
By the end of this book, you'll have mastered deep learning techniques to unlock its full potential for your endeavors.
The Deep Learning Architect’s Handbook
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Ali Abidi
Book Project Manager: Shambhavi Mishra
Senior Editor: Rohit Singh
Technical Editor: Devanshi Ayare
Copy Editor: Safis Editing
Proofreader: Safis Editing
Indexer: Subalakshmi Govindhan
Production Designer: Ponraj Dhandapani
DevRel Marketing Executive: Vinishka Kalra
First published: December 2023
Production reference: 1301123
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-80324-379-5
www.packtpub.com
To my wife, Nina, the constant source of inspiration, support, and encouragement in my life. Without you, this book would have remained a dream.
– Ee Kin Chin
Contributors
About the author
Ee Kin Chin is a senior deep learning engineer at DataRobot. He led teams to develop advanced AI tools used by numerous organizations from diverse industries and provided consultation on many customer AI use cases. Previously, he worked on deep learning (DL) computer vision projects for smart vehicles and human sensing applications at Panasonic and offered AI solutions using edge cameras at a tech solutions provider. He was also a DL mentor for an online course. Holding a Bachelor of Engineering (honors) degree in electronics, with a major in telecommunications, and a proven track record of successful application of AI, Ee Kin's expertise includes embedded applications, practical deep learning, data science, and classical machine learning.
A huge shout-out to my fantastic friends, mentors, colleagues, family, book reviewers, and the open source community who’ve supported and motivated me during my professional career. Your shared knowledge, insights, and wisdom have been invaluable.
About the reviewers
Shivani Modi is a data scientist with expertise in machine learning, deep learning, and NLP, holding a master’s degree from Columbia University. Her five years of professional experience spans IBM, SAP, and C3 AI, where she has excelled in deploying scalable AI models across various sectors. At Konko AI, Shivani spearheaded the development of tools to optimize LLM selection and deployment. Shivani’s dedication to mentoring and talent development, coupled with her hands-on experience in leading complex projects, underscores her status as a thought leader in AI innovation. Her upcoming project aims to revolutionize how developers utilize LLMs, ensuring their secure and efficient implementation.
Ved Upadhyay is a seasoned data science and AI professional, bringing over seven years of hands-on experience in addressing enterprise-level challenges in deep learning. His expertise spans diverse industries, including retail, e-commerce, pharmaceuticals, agro-tech, and socio-tech, where he has successfully implemented AI solutions. Ved is currently working as a senior data scientist at Walmart, where he leads multiple initiatives focused on customer propensity and responsible AI. He earned his master’s degree in data science from the University of Illinois Urbana-Champaign and has contributed as a deep learning researcher at IIIT Hyderabad.
Table of Contents
Preface
Part 1 – Foundational Methods
1
Deep Learning Life Cycle
Technical requirements
Understanding the machine learning life cycle
Strategizing the construction of a deep learning system
Starting the journey
Evaluating deep learning’s worthiness
Defining success
Planning resources
Preparing data
Deep learning problem types
Acquiring data
Making sense of data through exploratory data analysis (EDA)
Data pre-processing
Developing deep learning models
Deep learning model families
The model development strategy
Delivering model insights
Managing risks
Ethical and regulatory risks
Business context mismatch
Data collection and annotation risks
Data security risk
Summary
Further reading
2
Designing Deep Learning Architectures
Technical requirements
Exploring the foundations of neural networks using an MLP
Understanding neural network gradients
Understanding gradient descent
Implementing an MLP from scratch
Implementing MLP using deep learning frameworks
Regularization
Designing an MLP
Summary
3
Understanding Convolutional Neural Networks
Technical requirements
Understanding the convolutional neural network layer
Understanding the pooling layer
Building a CNN architecture
Designing a CNN architecture for practical usage
Exploring the CNN architecture families
Understanding the ResNet model family
Understanding the DenseNet architecture family
Understanding the EfficientNet architecture family
Understanding small and fast CNN architecture families for small-scale edge devices
Understanding SqueezeNet
Understanding MobileNet
Understanding MicroNet, the current state-of-the-art architecture for the edge
Summary
4
Understanding Recurrent Neural Networks
Technical requirements
Understanding LSTM
Decoding the forget mechanism of LSTMs
Decoding the learn mechanism of LSTMs
Decoding the remember mechanism of LSTMs
Decoding the information-using mechanism of LSTMs
Building a full LSTM network
Understanding GRU
Decoding the reset gate of GRU
Decoding the update gate of GRU
Understanding advancements over the standard GRU and LSTM layers
Decoding bidirectional RNN
Adding peepholes to LSTMs
Adding working memory to exceed the peephole connection limitations for LSTM
Summary
5
Understanding Autoencoders
Technical requirements
Decoding the standard autoencoder
Exploring autoencoder variations
Building a CNN autoencoder
Summary
6
Understanding Neural Network Transformers
Exploring neural network transformers
Decoding the original transformer architecture holistically
Uncovering transformer improvements using only the encoder
Improving the encoder-only pre-training tasks and objectives
Improving the encoder-only transformer’s architectural compactness and efficiency
Improving the encoder-only transformers’ core functional architecture
Uncovering encoder-only transformers’ adaptations to other data modalities
Uncovering transformer improvements using only the decoder
Diving into the GPT model family
Diving into the XLNet model
Discussing additional advancements for a decoder-only transformer model
Summary
7
Deep Neural Architecture Search
Technical requirements
Understanding the big picture of NAS
Understanding general hyperparameter search-based NAS
Searching neural architectures by using successive halving
Searching neural architectures by using Hyperband
Searching neural architectures by using Bayesian hyperparameter optimization
Understanding RL-based NAS
Understanding founding NAS based on RL
Understanding ENAS
Understanding MNAS
Summarizing NAS with RL methods
Understanding non-RL-based NAS
Understanding path elimination-based NAS
Understanding progressive growth-based NAS
Summary
8
Exploring Supervised Deep Learning
Technical requirements
Exploring supervised use cases and problem types
Implementing neural network layers for foundational problem types
Implementing the binary classification layer
Implementing the multiclass classification layer
Implementing a regression layer
Implementing representation layers
Training supervised deep learning models effectively
Preparing the data for DL training
Configuring and tuning DL hyperparameters
Executing, visualizing, tracking, and comparing experiments
Exploring model-building tips
Exploring general techniques to realize and improve supervised deep learning-based solutions
Breaking down the multitask paradigm in supervised deep learning
Multitask pipelines
TL
Multiple objective learning
Multimodal NN training
Summary
9
Exploring Unsupervised Deep Learning
Technical requirements
Exploring unsupervised deep learning applications
Creating pretrained network weights for downstream tasks
Creating general representations through unsupervised deep learning
Exploring zero-shot learning
Exploring the dimensionality reduction component of unsupervised deep learning
Detecting anomalies in external data
Summary
Part 2 – Multimodal Model Insights
10
Exploring Model Evaluation Methods
Technical requirements
Exploring the different model evaluation methods
Engineering the base model evaluation metric
Exploring custom metrics and their applications
Exploring statistical tests for comparing model metrics
Relating the evaluation metric to success
Directly optimizing the metric
Summary
11
Explaining Neural Network Predictions
Technical requirements
Exploring the value of prediction explanations
Demystifying prediction explanation techniques
Exploring gradient-based prediction explanations
Trusting and understanding integrated gradients
Using integrated gradients to aid in understanding predictions
Explaining prediction explanations automatically
Exploring common pitfalls in prediction explanations and how to avoid them
Summary
Further reading
12
Interpreting Neural Networks
Technical requirements
Interpreting neurons
Finding neurons to interpret
Interpreting learned image patterns
Explaining predictions with image input data and integrated gradients
Practically visualizing neurons with image input data
Discovering the counterfactual explanation strategy
Summary
13
Exploring Bias and Fairness
Technical requirements
Exploring the types of bias
Understanding the source of AI bias
Discovering bias and fairness evaluation methods
Evaluating the bias and fairness of a deep learning model
Tailoring bias and fairness measures across use cases
Mitigating AI bias
Summary
14
Analyzing Adversarial Performance
Technical requirements
Using data augmentations for adversarial analysis
Analyzing adversarial performance for audio-based models
Executing adversarial performance analysis for speech recognition models
Analyzing adversarial performance for image-based models
Executing adversarial performance analysis for a face recognition model
Exploring adversarial analysis for text-based models
Summary
Part 3 – DLOps
15
Deploying Deep Learning Models to Production
Technical requirements
Exploring the crucial components for DL model deployment
Identifying key DL model deployment requirements
Choosing the right DL model deployment options
Architectural choices
Computing hardware choices
Model packaging and frameworks
Communication protocols to use
User interfaces
Exploring deployment decisions based on practical use cases
Exploring deployment decisions for a sentiment analysis application
Exploring deployment decisions for a face detection and recognition system for security cameras
Discovering general recommendations for DL deployment
Model safety, trust, and reliability assurance
Optimizing model latency
Tools that abstract deployment
Deploying a language model with ONNX, TensorRT, and NVIDIA Triton Server
Practically deploying a DL model with the single pipeline approach
Summary
16
Governing Deep Learning Models
Technical requirements
Governing deep learning model utilization
Governing a deep learning model through monitoring
Monitoring a deployed deep learning model with NVIDIA Triton Server, Prometheus, and Grafana
Governing a deep learning model through maintenance
Exploring limitations and risks of using automated tasks triggered by model monitoring alerts
Summary
17
Managing Drift Effectively in a Dynamic Environment
Technical requirements
Exploring the issues of drift
Exploring the types of drift
Exploring data drift types
Exploring concept drift
Exploring model drift
Exploring strategies to handle drift
Exploring drift detection strategies
Analyzing the impact of drift
Exploring strategies to mitigate drift
Detecting drift programmatically
Detecting concept drift programmatically
Detecting data drift programmatically
Implementing programmatic data distribution drift detection using Evidently
Comparing and contrasting the Evidently and Alibi-Detect libraries for drift detection
Summary
18
Exploring the DataRobot AI Platform
Technical requirements
A high-level look into what the DataRobot AI platform provides
Preparing data with DataRobot
Ingesting data for deep learning model development
Exploratory analysis of the data
Wrangling data for deep learning model development
Executing modeling experiments with DataRobot
Deep learning modeling
Gathering model and prediction insights
Making batch predictions
Deploying a deep learning blueprint
Practically deploying a blueprint in DataRobot
Governing a deployed deep learning blueprint
Governing through model utilization in DataRobot
Governing through model monitoring in DataRobot
Governing through model maintenance in DataRobot
Exploring some customer success stories
Summary
19
Architecting LLM Solutions
Overview of LLM solutions
Handling knowledge for LLM solutions
Exploring chunking methods
Exploring embedding models
Exploring the knowledge base index types
Exploring orchestrator tools for LLM solutions
Evaluating LLM solutions
Evaluating LLM solutions through quantitative metrics
Evaluating LLM solutions through qualitative evaluation methods
Identifying challenges with LLM solutions
Tackling challenges with LLM solutions
Tackling the output and input limitation challenge
Tackling the knowledge- and information-related challenge
Tackling the challenges of accuracy and reliability
Tackling the runtime performance challenge
Tackling the challenge of ethical implications and societal impacts
Tackling the overarching challenge of LLM solution adoption
Leveraging LLM to build autonomous agents
Exploring LLM solution use cases
Summary
Further reading
Index
Other Books You May Enjoy
Preface
As a deep learning practitioner and enthusiast, I have spent years working on various projects and learning from diverse sources such as Kaggle, GitHub, colleagues, and real-life use cases. I've realized that there is a significant gap in the availability of cohesive, end-to-end deep learning resources. Traditional Massive Open Online Courses (MOOCs), while helpful, often lack the practical knowledge and real-world insights that can only be gained through hands-on experience.
To bridge this gap, I've created The Deep Learning Architect's Handbook, a comprehensive and practical guide that combines my unique experiences and insights. This book will help you navigate the complex landscape of deep learning, providing you with the knowledge and insights that would typically take years of hands-on experience to acquire, condensed into a resource that can be consumed in just days or weeks.
This book delves into various stages of the deep learning life cycle, from planning and data preparation to model deployment and governance. Throughout this journey, you'll encounter both foundational and advanced deep learning architectures, such as Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), autoencoders, transformers, and cutting-edge methods, such as Neural Architecture Search (NAS). Divided into three parts, this book covers foundational methods, model insights, and DLOps, exploring advanced topics such as NAS, adversarial performance, and Large Language Model (LLM) solutions. By the end of this book, you will be well-prepared to design, develop, and deploy effective deep learning solutions, unlocking their full potential and driving innovation across various applications.
I hope that this book will serve as a way for me to give back to the community, by sparking conversations, challenging assumptions, and inspiring new ideas and approaches in the field of deep learning. I invite you to join me on this journey, and I look forward to hearing your thoughts and feedback as we explore the captivating world of deep learning together. Please feel free to reach out to me via LinkedIn through www.linkedin.com/in/chineekin, Kaggle through https://www.kaggle.com/dicksonchin93, or other channels listed on my LinkedIn profile. Your unique experiences and perspectives will undoubtedly contribute to the ongoing evolution of this book and the deep learning community as a whole.
Who this book is for
This book is best suited for deep learning practitioners, data scientists, and machine learning developers who want to explore deep learning architectures to solve complex business problems. It is written for professionals in the deep learning and AI space who will apply this knowledge to their business use cases. A working knowledge of Python programming and a basic understanding of deep learning techniques are needed to get the most out of this book.
What this book covers
Chapter 1, Deep Learning Life Cycle, introduces the key stages of a deep learning project, focusing on planning and data preparation, and sets the stage for a comprehensive exploration of the deep learning life cycle throughout the book.
Chapter 2, Designing Deep Learning Architectures, dives into the foundational aspects of deep learning architectures, including MLPs, and discusses their role in advanced neural networks, as well as the importance of backpropagation and regularization.
Chapter 3, Understanding Convolutional Neural Networks, provides an in-depth look at CNNs, their applications in image processing, and various model families within the CNN domain.
Chapter 4, Understanding Recurrent Neural Networks, explores the structure and variations of RNNs and their ability to process sequential data effectively.
Chapter 5, Understanding Autoencoders, examines the fundamentals of autoencoders as a method for representation learning and their applications across different data modalities.
Chapter 6, Understanding Neural Network Transformers, delves into the versatile nature of transformers, capable of handling diverse data modalities without explicit data-specific biases, and their potential applications in various tasks and domains.
Chapter 7, Deep Neural Architecture Search, introduces the concept of NAS as a way to automate the design of advanced neural networks and discusses its applications and limitations in different scenarios.
Chapter 8, Exploring Supervised Deep Learning, covers various supervised learning problem types, techniques for implementing and training deep learning models, and practical implementations using popular deep learning frameworks.
Chapter 9, Exploring Unsupervised Deep Learning, discusses the contributions of deep learning to unsupervised learning, particularly highlighting the unsupervised pre-training method. Harnessing the vast amounts of freely available data on the internet, this approach improves model performance for downstream supervised tasks and paves the way toward general Artificial Intelligence (AI).
Chapter 10, Exploring Model Evaluation Methods, provides an overview of model evaluation techniques, metric engineering, and strategies for optimizing against evaluation metrics.
Chapter 11, Explaining Neural Network Predictions, delves into the prediction explanation landscape, focusing on the integrated gradients technique and its practical applications for understanding neural network predictions.
Chapter 12, Interpreting Neural Networks, delves into the nuances of model understanding and showcases techniques for uncovering patterns detected by neurons. By exploring real images and generating images through optimization to activate specific neurons, you will gain valuable insights into the neural network’s decision-making process.
Chapter 13, Exploring Bias and Fairness, addresses the critical issue of bias and fairness in machine learning models, discussing various types, metrics, and programmatic methods for detecting and mitigating bias.
Chapter 14, Analyzing Adversarial Performance, examines the importance of adversarial performance analysis in identifying vulnerabilities and weaknesses in machine learning models, along with practical examples and techniques for analysis.
Chapter 15, Deploying Deep Learning Models to Production, focuses on key components, requirements, and strategies for deploying deep learning models in production environments, including architectural choices, hardware infrastructure, and model packaging.
Chapter 16, Governing Deep Learning Models, explores the fundamental pillars of model governance, including model utilization, model monitoring, and model maintenance, while providing practical steps for monitoring deep learning models.
Chapter 17, Managing Drift Effectively in a Dynamic Environment, discusses the concept of drift and its impact on model performance, along with strategies for detecting, quantifying, and mitigating drift in deep learning models.
Chapter 18, Exploring the DataRobot AI Platform, showcases the benefits of AI platforms, specifically DataRobot, in streamlining and accelerating the deep learning life cycle, and highlights various features and capabilities of the platform.
Chapter 19, Architecting LLM Solutions, delves into LLMs and their potential applications, challenges, and strategies for creating effective, contextually aware solutions using LLMs.
To get the most out of this book
The code provided in the chapters has been tested on a computer with Python 3.10, Ubuntu 20.04 LTS 64-bit OS, 32 GB RAM, and an RTX 2080 Ti GPU for running deep learning models. Although the code has been tested on this specific setup, it may also work on other configurations; however, compatibility and performance are not guaranteed. Python dependencies are included in the requirements.txt file for easy installation in each chapter’s respective GitHub folders. Additionally, some non-Python software might be required; their installation instructions will be mentioned at the beginning of each relevant tutorial. For these software installations, you need to refer to external manuals or guides to install them. Do keep in mind the potential differences in system configurations as you carry out the practical code sections in this book.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Download the example code files
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/The-Deep-Learning-Architect-Handbook. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: We will be using pandas for data manipulation and structuring, matplotlib and seaborn for plotting graphs, tqdm for visualizing iteration progress, and lingua for text language detection.
A block of code is set as follows:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
from lingua import Language, LanguageDetectorBuilder
tqdm.pandas()
Any command-line input or output is written as follows:
sudo systemctl start node_exporter
sudo systemctl start prometheus
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: We can set up the Prometheus link now by clicking on the three-line button on the top-left tab and clicking on the Data Sources tab under the Administration dropdown.
Tips or important notes
Appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Share Your Thoughts
Once you’ve read The Deep Learning Architect’s Handbook, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Download a free PDF copy of this book
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
Scan the QR code or visit the link below
https://packt.link/free-ebook/9781803243795
Submit your proof of purchase
That’s it! We’ll send your free PDF and other benefits to your email directly
Part 1 – Foundational Methods
In this part of the book, you will gain a comprehensive understanding of the foundational methods and techniques in deep learning architectures. Starting with the deep learning life cycle, you will explore various stages at a high level, from planning and data preparation to model development, insights, deployment, and governance. You will then dive into the intricacies of designing deep learning architectures such as MLPs, CNNs, RNNs, autoencoders, and transformers. Additionally, you will learn about the emerging method of neural architecture search and its impact on the field of deep learning.
Throughout this part, you will also delve into the practical aspects of supervised and unsupervised deep learning, covering topics such as binary classification, multiclass classification, regression, and multitask learning, as well as unsupervised pre-training and representation learning. With a focus on real-world applications, this part provides valuable insights into the implementation of deep learning models using popular frameworks and programming languages.
By the end of this part, you will have a solid foundation in deep learning architectures, methods, and life cycles, which will enable you to continue your journey to face other challenges involved in crafting deep learning solutions.
This part contains the following chapters:
Chapter 1, Deep Learning Life Cycle
Chapter 2, Designing Deep Learning Architectures
Chapter 3, Understanding Convolutional Neural Networks
Chapter 4, Understanding Recurrent Neural Networks
Chapter 5, Understanding Autoencoders
Chapter 6, Understanding Neural Network Transformers
Chapter 7, Deep Neural Architecture Search
Chapter 8, Exploring Supervised Deep Learning
Chapter 9, Exploring Unsupervised Deep Learning
1
Deep Learning Life Cycle
In this chapter, we will explore the intricacies of the deep learning life cycle. Sharing similar characteristics to the machine learning life cycle, the deep learning life cycle is a framework as much as it is a methodology that will allow a deep learning project idea to be insanely successful or to be completely scrapped when it is appropriate. We will grasp the reasons why the process is cyclical and understand some of the life cycle’s initial processes on a deeper level. Additionally, we will go through some high-level sneak peeks of the later processes of the life cycle that will be explored at a deeper level in future chapters.
Comprehensively, this chapter will help you do the following:
Understand the similarities and differences between the deep learning life cycle and its machine learning life cycle counterpart
Understand where domain knowledge fits in a deep learning project
Understand the few key steps in planning a deep learning project to make sure it can tangibly create real-world value
Grasp some deep learning model development details at a high level
Grasp the importance of model interpretation and the variety of deep learning interpretation techniques at a high level
Explore high-level concepts of model deployments and their governance
Learn to choose the necessary tools to carry out the processes in the deep learning life cycle
We’ll cover this material in the following sections:
Machine learning life cycle
The construction strategy of a deep learning life cycle
The data preparation stage
Deep learning model development
Delivering model insights
Managing risks
Technical requirements
This chapter includes some practical implementations in the Python programming language. To complete it, you need to have a computer with the following libraries installed:
pandas
matplotlib
seaborn
tqdm
lingua
The code files are available on GitHub: https://github.com/PacktPublishing/The-Deep-Learning-Architect-Handbook/tree/main/CHAPTER_1.
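If you prefer installing the dependencies directly rather than through the chapter’s requirements.txt file, the following pip command is a reasonable starting point; note that the package names are assumptions (in particular, the lingua import used in this book is commonly provided by the lingua-language-detector package on PyPI), so defer to requirements.txt if they differ:
pip install pandas matplotlib seaborn tqdm lingua-language-detector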
Understanding the machine learning life cycle
Deep learning is a subset of the wider machine learning category. The main characteristic that sets it apart from other machine learning algorithms is its foundational building block, the neural network. As deep learning has advanced tremendously since the early 2000s, it has made possible many feats that were previously unachievable with its machine learning counterparts. Specifically, deep learning has made breakthroughs in recognizing complex patterns that exist in complex and unstructured data such as text, images, videos, and audio. Some of the successful applications of deep learning today are face recognition with images, speech recognition from audio data, and language translation with textual data.
Machine learning, on the other hand, is a subset of the wider artificial intelligence category. Its algorithms, such as tree-based models and linear models, which are not considered to be deep learning models, still serve a wide range of use cases involving tabular data, which is the bulk of the data that’s stored by small and big organizations alike. This tabular data may exist in multiple structured databases and can span from 1 to 10 years’ worth of historical data that has the potential to be used for building predictive machine learning models. Some of the notable predictive applications for machine learning algorithms are fraud detection in the finance industry, product recommendations in e-commerce, and predictive maintenance in the manufacturing industry. Figure 1.1 shows the relationships between deep learning, machine learning, and artificial intelligence for a clearer visual distinction between them:
Figure 1.1 – Artificial intelligence relationships
Now that we know what deep learning and machine learning are in a nutshell, we are ready for a glimpse of the machine learning life cycle, as shown in Figure 1.2:
Figure 1.2 – Deep learning/machine learning life cycle
As advanced and complex as deep learning algorithms are compared to other machine learning algorithms, the guiding methodologies needed to ensure success in both domains are unequivocally the same. The machine learning life cycle involves six stages that interact with each other in different ways:
Planning
Data Preparation
Model Development
Deliver Model Insights
Model Deployment
Model Governance
Figure 1.2 shows these six stages and the possible stage transitions depicted with arrows. Typically, a machine learning project will iterate between stages, depending on the business requirements. In a deep learning project, most of the innovative predictive use cases require manual data collection and data annotation, a process that lies in the realm of the Data Preparation stage. As this process is generally time-consuming, especially when the data itself is not readily available, a go-to solution would be to start with an acceptable initial amount of data and transition into the Model Development stage and, subsequently, to the Deliver Model Insights stage to make sure the results from the ideas are sane.
After the initial validation process, depending again on business requirements, practitioners would then decide to transition back into the Data Preparation stage and continue to iterate through these stages cyclically at different data size milestones until the results are satisfactory against both the model development and business metrics. Once it gets approval from the necessary stakeholders, the project then goes into the Model Deployment stage, where the built machine learning model will be served to allow its predictions to be consumed. The final stage is Model Governance, where practitioners carry out tasks that manage the risk, performance, and reliability of the deployed machine learning model. Model deployment and model governance both deserve more in-depth discussion and will be introduced in separate chapters closer to the end of this book. Whenever any of the key metrics fail to maintain themselves at a certain determined confidence level, the project will fall back into the Data Preparation stage of the cycle and repeat the same flow all over again.
The ideal machine learning project flows through the stages cyclically for as long as the business application needs it. However, machine learning projects are typically susceptible to a high probability of failure. According to a survey conducted by Dimensional Research and Alegion, covering around 300 machine learning practitioners from 20 different business industries, 78% of machine learning projects get held back or delayed at some point before deployment. Additionally, Gartner predicted that 85% of machine learning projects will fail (https://venturebeat.com/2021/06/28/why-most-ai-implementations-fail-and-what-enterprises-can-do-to-beat-the-odds/). By expecting the unexpected, and anticipating failures before they happen, practitioners can likely circumvent potential failure factors early down the line in the planning stage. This also brings us to the trash icon bundled together in Figure 1.2. Proper projects with a good plan typically get discarded only at the Deliver Model Insights stage, when it’s clear that the proposed model and project can’t deliver satisfactory results.
Now that we’ve covered an overview of the machine learning life cycle, let’s dive into each of the stages individually, broken down into sections, to help you discover the key tips and techniques needed to complete each stage successfully. These stages will be discussed in an abstract format and are not a concrete depiction of what you should ultimately be doing for your project, since all projects are unique and strategies should always be evaluated on a case-by-case basis.
Strategizing the construction of a deep learning system
A deep learning model can only realize real-world value by being part of a system that performs some sort of operation. Bringing deep learning models from research papers to actual real-world usage is not an easy task. Thus, performing proper planning before conducting any project is a more reliable and structured way to achieve the desired goals. This section will discuss some considerations and strategies that will be beneficial when you start to plan your deep learning project toward success.
Starting the journey
Today, deep learning practitioners tend to focus a lot on the algorithmic model-building part of the process. It takes a considerable amount of mental strength to not get hooked on the hype of state-of-the-art (SOTA) research-focused techniques. With crazy techniques such as pix2pix, which is capable of generating high-resolution, realistic color images from just sketches or image masks, and natural language processing (NLP) techniques such as GPT-3, a 175-billion-parameter text generation model from OpenAI, and GPT-4, its multimodal successor, which are capable of generating practically anything you ask of them in text format, ranging from text summarization to generating code, why wouldn’t they?!
Jokes aside, to become a true deep learning architect, we need to come to a consensus that any successful machine learning or deep learning project starts with the business problem and not with the shiny new research paper you just read online, complete with a public GitHub repository. The planning stage often involves many business executives who are not savvy about the details of machine learning algorithms, and often the same set of people wouldn’t care about them at all. These algorithms are daunting for business-focused stakeholders to understand and, when added on top of the already tough mental barriers to adopting artificial intelligence technologies, they don’t make the project any more likely to be adopted.
Evaluating deep learning’s worthiness
Deep learning shines the most in handling unstructured data. This includes image data, text data, audio data, and video data. This is largely due to the model’s ability to automatically learn and extract complex, high-level features from the raw data. In the case of images and videos, deep learning models can capture spatial and temporal patterns, recognizing objects, scenes, and activities. With audio data, deep learning can understand the nuances of speech, noise, and various sound elements, making it possible to build applications such as speech recognition, voice assistants, and audio classification systems. For text data, deep learning models can capture the context, semantics, and syntax, enabling NLP tasks such as sentiment analysis, machine translation, and text summarization.
This means that if this data exists and is utilized by your company in its business processes, there may be an opportunity to solve a problem with the help of deep learning. However, never overcomplicate problems just so you can solve them with deep learning. To put this in more relatable terms, you wouldn’t use a huge sledgehammer to get a nail into wood. It could work and you might get away with it, but you’d risk bending the nail or injuring yourself while using it.
Once a problem has been identified, evaluate the business value of solving it. Not all problems are born the same and they can be ranked based on their business impact, value, complexity, risks, costs, and suitability for deep learning. Generally, you’d be looking for high impact, high value, low complexity, low risks, low cost, and high suitability to deep learning. Trade-offs between these metrics are expected but simply put, make sure the problem you’ve discovered is worth solving at all with deep learning. A general rule of thumb is to always resort to a simpler solution for a problem, even if it ends up abandoning the usage of deep learning technologies. Simple approaches tend to be more reliable, less costly, less prone to risks, and faster to fruition.
Consider a problem where a solution is needed to remove background scenes in a video feed and leave only humans or necessary objects untouched so that a more suitable background scene can be overlaid as a background instead. This is a common problem in the professional filmmaking industry in all film genres today.
Semantic segmentation, which is the task of assigning a label to every pixel of an image in the width and height dimensions, is a method that is needed to solve such a problem. In this case, the task needs to assign labels that can help identify which pixels need to be removed. With the advent of many publicly available semantic segmentation datasets, deep learning has been able to advance considerably in the semantic segmentation field, achieving a fine-grained understanding of the world that is satisfactory enough to be applied most prominently in autonomous driving and robot navigation. However, deep learning is not known to be 100% error-free and almost always has some error, even on the controlled evaluation dataset. In the case of human segmentation, for example, the model would likely make the most errors in the fine hair areas. Most filmmakers aim for perfect depictions of their films and require that every single pixel gets removed appropriately without fail, since a lot of money is spent on the time of the actors hired for the film. Additionally, a lot of time and money would be wasted in manually removing objects that could otherwise be simply removed if the scene had been shot with a green screen.
This is an example of a case where we should not overcomplicate the problem. A green screen is all you need to solve the problem described: specifically, the rare chroma key green color. When green screens are prepped properly in the areas where the desired imagery will be overlaid digitally, image processing techniques alone can remove the pixels that fall within the small light intensity range centered on the chroma key green color and achieve semantic segmentation effectively with a rule-based solution. The green screen is a simpler solution that is cost-effective, foolproof, and fast to set up.
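To make the rule-based alternative concrete, here is a minimal sketch of chroma key removal using OpenCV and NumPy. The HSV bounds and file names are assumptions that would need tuning for a real studio’s lighting:
import cv2
import numpy as np
# Load a frame shot against a green screen (hypothetical file name)
frame = cv2.imread("studio_frame.png")
# Work in HSV space, where chroma key green occupies a narrow hue band
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
# Assumed bounds around chroma key green; tune for your lighting setup
lower_green = np.array([40, 80, 80])
upper_green = np.array([80, 255, 255])
# Pixels inside the band belong to the background
background_mask = cv2.inRange(hsv, lower_green, upper_green)
# Keep only the foreground (actors and props)
foreground = cv2.bitwise_and(frame, frame, mask=cv2.bitwise_not(background_mask))
cv2.imwrite("foreground_only.png", foreground)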
That was a mouthful! Now, let’s go through a simpler problem. Consider a problem where we want to automatically and digitally identify when it rains. In this use case, it is important to understand the actual requirements and goals of identifying the rain: is it sufficient to detect rain exactly when it happens? Or do we need to identify whether rain will happen in the near future? What will we use the information about rain events for? These questions will guide whether deep learning is required or not. We, as humans, know that rain can be predicted from visual input by either looking at the presence of raindrops falling or looking at cloud conditions. However, if detecting rain as it happens is sufficient, and the goal of detecting rain is to determine when to water the plants, a simpler approach would be to use an electronic sensor to detect the presence of water or humidity, as sketched below. Only when you want to estimate whether it will rain in the future, let’s say in 15 minutes, does applying deep learning make more sense, as there are many interactions between meteorological factors that can affect rainfall. Only by brainstorming each use case and analyzing all potential solutions, even outside of deep learning, can you make sure deep learning brings tangible business value compared to other solutions. Do not just apply deep learning because you want to.
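A minimal rule-based sketch of that sensor approach, where the sensor readings and the 95% humidity threshold are hypothetical values to calibrate per deployment:
# Rain is happening if the sensor plate is wet or humidity is near saturation
def detect_rain(humidity_percent: float, water_detected: bool) -> bool:
    return water_detected or humidity_percent >= 95.0
# Water the plants only when it is not currently raining
def should_water_plants(humidity_percent: float, water_detected: bool) -> bool:
    return not detect_rain(humidity_percent, water_detected)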
At times, when value isn’t clear when you’re directly considering a use case, or when value is clear but you have no idea how to execute it, consider finding reference projects from companies in the same industry. Companies in the same industry have a high chance of wanting to optimize the same processes or solve the same pain points. Similar reference projects can serve as a guide to designing a deep learning system and can serve as proof that the use case being considered is worthy of the involvement of deep learning technologies. Of course, not everybody has access to details like this, but you’d be surprised what Google can tell you these days. Even if there isn’t a similar project being carried out for direct reference, you would likely be able to pivot upon the other machine learning project references that already have a track record of bringing value to the same industry.
Admittedly, rejecting deep learning at times would be a hard pill to swallow considering that most practitioners get paid to implement deep learning solutions. However, dismissing it earlier will allow you to focus your time on more valuable problems that would be more useful to solve with deep learning and prevent the risk of undermining the potential of deep learning in cases where simpler solutions can outperform deep learning. Criteria for deep learning worthiness should be evaluated on a case-by-case basis and as a practitioner, the best advice to follow is to simply practice common sense. Spend a good amount of time going through the problem exploration and the worthiness evaluation process. The last thing you want is to spend a painstaking amount of time preparing data, building a deep learning model, and delivering very convincing model insights only to find out that the label you are trying to predict does not provide enough value for the business to invest further.
Defining success
Ever heard sentences like "My deep learning model just got 99% accuracy on my validation dataset!"? Data scientists often make the mistake of determining the success of a machine learning project just by using the validation metrics they use to evaluate their machine learning models during the model development process. Model-building metrics such as accuracy, precision, or recall are important metrics to consider in a machine learning project, but unless they add business value and connect to the business objectives in some way, they rarely mean anything. A project can achieve a good accuracy score but still fail to achieve the desired business goals. This can happen when no proper success metrics have been defined early, which subsequently causes the wrong label to be used in the data preparation and model development stages. Furthermore, even when the model metric positively impacts business processes directly, there is a chance that the achievement won’t be communicated effectively to business stakeholders and, in the worst case, won’t be considered successful when reported as-is.
Success metrics, when defined early, act as the machine learning project’s guardrails and ensure that the project goals are aligned with the business goals. One of these guardrails is that a success metric can help guide the choice of a proper label that can, at inference time, tangibly improve the business processes or otherwise create value in the business. First, let’s make sure we are aligned on what a label means: it is the value that you want the machine learning model to predict. The purpose of a machine learning model is to assign these labels automatically given some form of input data, and thus, during the data preparation and model development stages, a label needs to be chosen to serve that purpose. Choosing the wrong label can be catastrophic to a deep learning project as sometimes, when data is not readily available, it means the project has to start all over again from the data preparation stage. Labels should always be indirectly or directly attributed to the success metric.
Success metrics, as the name suggests, can be plural, and range from time-based success definitions or milestones to the overall project success, and from intangible to tangible. It’s good practice to generally brainstorm and document all the possible success criteria from a low level to a high level. Another best practice is to make sure to always define tangible success metrics alongside intangible metrics. Intangible metrics generate awareness, but tangible metrics make sure things are measurable and thus make them that much more attainable. A few examples of intangible and hard-to-measure metrics are as follows:
Increasing customer satisfaction
Increasing employee performance
Improving shareholder outlook
Metrics are ways to measure something and are tied to goals to seal the deal. Goals themselves can be intangible, similar to the few examples listed previously, but so long as they are tied to tangible metrics, the project is off to a good start. When you have a clear goal, ask yourself in what way the goal can be proven to be achieved, demonstrated, or measured. A few examples of tangible success metrics for machine learning projects that could align with business goals are as follows:
Increase the time customers spend, which can be a proxy for customer delight
Increase company revenue, which can be a proxy for employee performance
Increase the click-through rate (CTR), which can be a proxy for the effectiveness of targeted marketing campaigns
Increase the customer lifetime value (CLTV), which can be a proxy for long-term customer satisfaction and loyalty
Increase conversion rate, which can be a proxy for the success of promotional campaigns and website user experience
This concept is neither new nor limited to machine learning projects – just about any project carried out for a company needs to be aligned with a business goal. Many foundational project management techniques can be applied similarly to machine learning projects, and spending time gaining some project management skills outside of the machine learning field would be beneficial and transferable to machine learning projects. Additionally, as machine learning is considered to be a software-based technology, software project management methodologies also apply.
A final concluding thought to take away is that machine learning systems are not about how advanced your machine learning models are, but instead about how humans and machine intelligence can work together to achieve a greater good and create value.
Planning resources
Deep learning often involves neural network architectures with a large set of parameters, otherwise called weights. These architectures can range in size from a few parameters up to hundreds of billions of parameters. For example, OpenAI’s GPT-3 text generation model holds 175 billion neural network parameters, which amounts to around 350 GB in computer storage size. This means that to run GPT-3, you need a machine with a random access memory (RAM) size of at least 350 GB!
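As a quick sanity check on that figure, the storage estimate is just the parameter count multiplied by the bytes per parameter; here is a minimal sketch, assuming 16-bit floating-point weights:
# Back-of-the-envelope model memory estimate, assuming fp16 (2 bytes) weights
num_parameters = 175e9       # GPT-3 parameter count
bytes_per_parameter = 2      # 16-bit floating point
total_bytes = num_parameters * bytes_per_parameter
print(f"{total_bytes / 1e9:.0f} GB")  # ~350 GB, matching the figure above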
Deep learning model frameworks such as PyTorch and TensorFlow have been built to work with devices called graphics processing units (GPUs), which offer tremendous neural network model training and inference speedups. Off-the-shelf GPU devices commonly have 12 GB of GPU RAM and are nowhere near the requirements needed to load a GPT-3 model in GPU mode. However, there are still methods to partition big models across multiple GPUs and run the model on GPUs. Additionally, some methods allow for distributed GPU model training and inference to support larger data batch sizes at any one usage point. GPUs are not considered cheap devices and can cost anywhere from a few hundred dollars to hundreds of thousands from the most widely used GPU brand, Nvidia. With the rise of cryptocurrency technologies, the availability of GPUs has also been reduced significantly due to people buying them immediately when they are in stock. All of this emphasizes the need to plan computing resources for training and inferencing deep learning models beforehand.
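Before committing to a model size, it helps to confirm how much GPU memory is actually available on your machine; here is a minimal PyTorch sketch for checking this (the reported capacity will vary by device):
import torch
if torch.cuda.is_available():
    # Total memory of the first CUDA device, in GB
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1e9:.1f} GB of GPU RAM")
else:
    print("No GPU found; training will fall back to the (much slower) CPU")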
It is important to align your model development and deployment needs to your computing resource allocation early in the project. Start by gauging the range of sizes of deep learning architectures that are suitable for the task at hand either by browsing research papers or websites that provide a good summary of techniques, and setting aside computing resources for the model development process.
Tip
paperswithcode.com provides summaries of a wide variety of techniques grouped by a wide variety of tasks!
When computing resources are not readily available, make sure you always make purchase plans early, especially if GPUs are involved. But what if a physical machine is not desired? An alternative is to use paid cloud computing providers that you can access online easily from anywhere in the world. During the model development stage, one of the benefits of having more GPUs with more RAM allocated is that it can allow you to train models faster, either by using a larger data batch size during training or by allowing multiple models to be trained at any one time. It is generally fine to use CPU-only deep learning model training as well, but the model training time would inevitably be much longer.
The GPU and CPU-based computing resources that are required during training are often considered overkill to be used during inference time when they are deployed. Different applications have different deployment computing requirements and the decision on what resource specification to allocate can be gauged by asking yourself the following three questions:
How often are the inference requests made?
Many inference requests in a short period might signal the need to have more than one inference service up in multiple computing devices in parallel
What is the average amount of samples that are requested for a prediction at any one time?
Device RAM requirements should match batch size expectations
How fast do you need a reply?
GPUs are needed if