Deep Learning and Machine Learning: Advancing Big Data Analytics and Management
Tianyang Wang
Xi’an Jiaotong-Liverpool University
[email protected]
* Equal contribution
† Corresponding author
Contents
II Getting Started 13
for Loop: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Specifying Start and Stop Values . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Using a Step Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Using Negative Step Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
while Loop: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Breaking and Continuing in Loops: . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
break: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
continue: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Combining break and continue: . . . . . . . . . . . . . . . . . . . . . . . . 61
5.10 Functions in Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.10.1 def Keyword and Function Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Basic Example: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Functions with Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Functions with Multiple Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Returning Values from Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Default Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.10.2 Recursion in Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Basic Example of Recursion: Factorial Function . . . . . . . . . . . . . . . . . . . 64
Recursive Fibonacci Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Base Case and Recursive Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Caution with Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.11 Simple Sorting Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.11.1 Bubble Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.11.2 Insertion Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.11.3 Python’s Built-in sort() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.11.4 Comparing the Time of Bubble Sort, Insertion Sort, and Python’s Built-in sort() . 68
Concrete Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2.2 Unsupervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Working Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Steps in Unsupervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Common Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Unsupervised Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Application Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Applications of Data Analysis Techniques . . . . . . . . . . . . . . . . . . . . . . 84
Advantages and Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Concrete Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.2.3 Semi-supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Working Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Semi-Supervised Learning Process . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Common Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Semi-Supervised Learning Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Application Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Advantages and Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Concrete Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.2.4 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Working Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Agent Interaction Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Key Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Key Concepts in Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . 88
Common Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Reinforcement Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Application Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Advantages and Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Concrete Example: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Game Setup: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Environment Definition: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Q-table Initialization: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Learning Parameters: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Training Process (Q-learning algorithm): . . . . . . . . . . . . . . . . . . . 90
Evaluation: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Results Analysis: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Visualization: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Further Improvements: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.3 Key Concepts and Terminologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.3.1 Features and Labels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.3.2 Training Data and Test Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.3.3 Model, Algorithm, and Hyperparameters . . . . . . . . . . . . . . . . . . . . . . . 92
7.4 Machine Learning Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.4.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.4.2 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.4.3 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.4.4 Training and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.4.5 Model Deployment and Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.5 Common Algorithms in Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.5.1 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.5.2 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.5.3 Decision Trees and Random Forests . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.5.4 Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.5.5 Clustering Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7.6 Challenges in Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7.6.1 Overfitting and Underfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7.6.2 Bias-Variance Tradeoff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7.6.3 Data Quality and Quantity Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.6.4 Ethical Concerns and Bias in AI . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.7 Machine Learning Tools and Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.7.1 Overview of Popular Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.7.2 Choosing the Right Tool for Your Task . . . . . . . . . . . . . . . . . . . . . . . . 98
7.7.3 Integration with Data Processing Tools . . . . . . . . . . . . . . . . . . . . . . . . 99
7.8 Practical Examples and Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.8.1 Example Projects Using Supervised Learning . . . . . . . . . . . . . . . . . . . . 99
7.8.2 Unsupervised Learning in Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.8.3 Real-world Applications of Reinforcement Learning . . . . . . . . . . . . . . . . . 100
7.9 Summary and Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.9.1 Recap of Key Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.9.2 Emerging Trends in Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . 100
7.9.3 Further Reading and Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Part II: Getting Started

Chapter 1

A Journey into Artificial Intelligence
Welcome to the world of deep learning and machine learning! In this era full of infinite possibilities,
artificial intelligence (AI) is no longer just a concept from science fiction but is becoming an integral
part of our daily lives. From voice assistants in smartphones to self-driving cars, AI is redefining the
world around us. If you’re a beginner looking for a gateway into this vast field, this chapter will open a
window to the history, applications, and future trends of AI, giving you an exciting overview of what’s
to come.
• Smartphones and Daily Technology: Facial recognition on your phone, as well as voice assistants that answer your questions, rely on powerful AI-driven image recognition and natural language processing technologies.
• Autonomous Driving and Transportation: Self-driving cars are becoming a reality, thanks to breakthroughs in machine learning, which enable them to perceive their environment, make decisions, and plan routes.
• Healthcare Revolution: AI is helping doctors analyze medical images, predict disease trends, and develop personalized treatment plans, elevating healthcare to new heights.
• Financial Services: In the financial sector, deep learning is used for fraud detection, market prediction, and providing personalized investment advice.
learning model extracts features from images or text. Visualization not only aids learning but is also
an essential tool for debugging and optimizing models.
1. Solidify Your Math Foundation: Concepts like linear algebra, calculus, and probability are the bedrock of machine learning. You don't need to be a mathematician, but mastering these tools will significantly enhance your understanding.
2. Learn to Code: Python is the most popular language in machine learning, and with libraries like TensorFlow and PyTorch, you can easily build deep learning models.
3. Start with Simple Models: Classical algorithms like linear regression and decision trees are great starting points. Once you understand the basics, you can move on to more complex deep learning models.
4. Hands-On Projects: Theory is important, but hands-on experience is where real learning happens. Start with publicly available datasets and gradually build and optimize your own AI models.
1.6 Conclusion
Deep learning and machine learning are reshaping the world, and you have the opportunity to be part
of this transformation. As you embark on this learning journey, you’ll not only master cutting-edge
technologies but also discover how to apply them creatively and meaningfully. Whether your goal
is to solve real-world problems with AI or explore the frontiers of research, the future holds endless
possibilities. Let’s step into this intelligent era together and embrace a brighter future!
Chapter 2

Making the Best Use of Tools
In today's digital age, artificial intelligence (AI) tools have become widely adopted. These tools significantly boost productivity and help us achieve more efficient outcomes in big data management and analysis, machine learning, and deep learning. In particular, advancements in natural language processing (NLP) have made multimodal AI tools incredibly powerful. These tools can understand and generate natural language and analyze multimodal inputs (such as text, images, and code) to assist users in solving complex tasks.
The utility of these AI tools goes beyond just coding. They can be used to ask questions, explore
ideas, confirm research directions, and solve technical issues. In big data management and machine
learning projects, AI tools are invaluable for quickly exploring data, designing and optimizing models,
and increasing the overall efficiency of research and development. This chapter will introduce several
of the most popular AI tools today and explain how to make the best use of them to enhance your
work in big data and machine learning.
2.1 ChatGPT
ChatGPT [73], developed by OpenAI, is a powerful conversational AI tool [16]. It can not only understand
and generate natural language but also assist developers in data analysis, model design, and code
generation [21]. ChatGPT excels in providing detailed suggestions based on context and continuously
refines its responses through ongoing dialogue with the user. This makes ChatGPT a valuable tool for
solving complex problems and determining research directions.
In the process of big data analysis, ChatGPT can assist users in understanding data distributions,
selecting features, and performing data cleaning tasks. Users can inquire about specific analysis
methods, the details of algorithmic models, or how to select the best model for processing their data.
ChatGPT can also help users narrow down their scope of analysis and develop effective strategies by
discussing and answering specific questions.
ChatGPT is highly effective in the fields of machine learning and deep learning. It can help users understand the workings of different algorithms and provide suggestions on adjusting model parameters to improve performance. During model design, ChatGPT can offer advice on selecting activation functions, optimizers, and loss functions, ensuring that the constructed model is well-tuned and logical.
Beyond direction setting and model design, ChatGPT can also generate code snippets based on user requirements. For example, if a user describes a machine learning algorithm, ChatGPT can provide the corresponding code implementation. Supporting multiple programming languages, it helps developers quickly prototype algorithms. Additionally, in terms of code optimization, ChatGPT can help simplify complex code structures and improve execution efficiency.
2.2 Claude
Claude, developed by Anthropic [6], is another powerful conversational AI tool. It is particularly well-suited for projects that require a high level of security and robustness, making it a strong fit for big data analysis and machine learning tasks. Claude not only helps with code generation but is also adept at performing data processing, analysis, and security review in model design.
In big data management and analysis, data security and accuracy are paramount [27]. Claude can
analyze the characteristics of datasets and help developers design more robust algorithms, ensuring
privacy protection during data processing [18]. Developers can use Claude to design distributed data
processing pipelines, identify key features, and detect potential anomalies in the data.
Claude stands out in model optimization and tuning, especially when aiming for robustness. It helps users understand how different algorithms apply to large-scale data and offers suggestions for parameter tuning, ensuring the generated models generalize well [60]. When working with private or sensitive data, Claude's security insights are invaluable in ensuring the model is compliant and secure.
2.3 Gemini
Gemini, developed by Google DeepMind, is a multimodal AI tool that combines powerful language
understanding with the ability to process multiple input types [25]. Gemini not only assists with code
generation and optimization but also helps users perform big data analysis and multimodal reasoning,
making it an ideal tool for complex data analysis and model design [9].
Gemini's greatest strength lies in its multimodal reasoning capability. In big data analysis and machine learning projects, users can input textual descriptions, code snippets, or images, and Gemini will combine this information for deeper reasoning and analysis. This multimodal capability is particularly advantageous when working with diverse data or when designing models for complex, cross-disciplinary tasks.
For deep learning models, especially in image processing and natural language processing, Gemini assists users in designing and optimizing neural network structures. Users can seek advice on model architectures (such as convolutional neural networks, recurrent neural networks, or transformer models), and Gemini will provide suggestions for suitable architectures based on the input. It also helps with hyperparameter tuning to improve model performance.
2.4 Other Multimodal AI Tools

In addition to ChatGPT, Claude, and Gemini, there are many other multimodal conversational AI tools available that can also assist developers and data scientists in big data management, machine learning, and deep learning.
2.4.1 CodeWhisperer
CodeWhisperer, developed by Amazon, is an AI code assistant integrated into IDEs such as Visual
Studio Code. It is particularly useful for cloud computing and machine learning tasks on big data [4].
CodeWhisperer not only offers contextual code completions but is also seamlessly integrated with
AWS services, making it an ideal choice for developers working on cloud-based machine learning and
data processing tasks.
2.4.2 Copilot
GitHub Copilot, powered by OpenAI Codex, is an AI code assistant that can help write and optimize
code, especially in complex machine learning projects [31]. For data scientists, Copilot simplifies many
time-consuming tasks, such as writing model code, by offering real-time code suggestions tailored to
the user’s coding habits.
2.4.3 Replit Ghostwriter

Replit Ghostwriter is an AI code assistant on the Replit platform, offering online programming and real-time data analysis [84]. It is particularly useful for interactive machine learning experiments and coding within a browser. Ghostwriter provides instant feedback, helping users optimize model code and visualize data analysis on the fly.
2.5 Conclusion
The use of AI tools has become an essential skill for modern data scientists and developers. By ef-
fectively leveraging tools such as ChatGPT, Claude, Gemini, and others, developers can significantly
enhance their efficiency in big data management, machine learning, and deep learning. From setting
research directions to optimizing algorithmic models, these tools offer robust support. As AI tech-
nology continues to evolve, we can expect these tools to become even more intelligent and easier to
use.
Chapter 3

Choosing Computer Hardware for Programming and ML
In this chapter, we will explain the importance of selecting the right computer hardware for various
tasks, from general programming to more demanding machine learning and deep learning workloads.
The hardware you choose can significantly affect both your productivity and the performance of your
applications.
3.1 Example Configurations

1. Processor (CPU): AMD Ryzen 9 5950X (16 cores, 32 threads)

Why it suits deep learning: Deep learning workflows lean on the CPU for data loading, preprocessing, and task distribution, which are CPU-intensive. 32 threads allow parallel processing, particularly in multi-task and high-load environments, such as training multiple models or handling different datasets simultaneously.
2. Graphics Card (GPU): NVIDIA GeForce RTX 3090
• Memory: 24 GB GDDR6X
• CUDA Cores: 10,496
• Tensor Cores: 328
• Memory Bandwidth: 936.2 GB/s
• Supports 8K resolution, ray tracing (RTX)
Why it suits deep learning: The NVIDIA RTX 3090 is an excellent choice for deep learning:
• Large VRAM: 24 GB of VRAM allows for loading and processing large neural networks and
datasets without frequent memory swapping, which is crucial for deep learning tasks such as
image processing and NLP.
• CUDA and Tensor Cores: The 10,496 CUDA cores and 328 Tensor cores significantly enhance the parallel computing capability, accelerating matrix operations (such as matrix multiplication in deep learning) and convolution operations. Tensor cores further accelerate FP16/FP32 calculations, improving performance during training and inference.
• High Memory Bandwidth: The 936.2 GB/s memory bandwidth ensures rapid data transfer between memory and processing units, reducing latency and enhancing training efficiency.
3. Memory (RAM): 32 GB x 4 DDR4 3200 MHz (128 GB total)
• Type: DDR4
• Frequency: 3200 MHz
• CAS latency: CL16 or lower
Why it suits deep learning: Deep learning requires handling large datasets, especially in tasks involving computer vision and natural language processing. The 128 GB memory capacity allows loading
large datasets into memory at once, reducing the need for frequent data swapping between memory
and storage. This large memory capacity speeds up data preprocessing, batch loading, and in-memory
computations, minimizing memory bottlenecks. Additionally, the 3200 MHz frequency ensures fast
data transfer, improving system responsiveness.
4. Motherboard: B550/X570 Chipset
• Supports PCIe 4.0 (GPU slot)
• Supports overclocking
• Supports high-speed M.2 SSDs
Why it suits deep learning: The B550 chipset provides PCIe 4.0 support, which is critical for maximizing the performance of the RTX 3090 and NVMe SSD. While more affordable than the X570 chipset, the B550 still offers essential high-end features, including support for overclocking, high-speed memory, and fast storage, making it a cost-effective choice for deep learning tasks without sacrificing
critical functionality.
5. Storage (SSD): 2TB NVMe SSD
• Read speed: ∼7000 MB/s (typical for PCIe 4.0 NVMe drives)
• Write speed: ∼5000 MB/s
• Interface: PCIe 4.0 (dependent on motherboard)
Why it suits deep learning: Deep learning involves frequent access to large datasets, and a high-speed NVMe SSD significantly accelerates data loading and storage operations. The speed of an NVMe SSD is crucial for tasks such as loading datasets, writing model weights, and saving checkpoints during training. The 2TB capacity is sufficient to store multiple projects, datasets, and model files, reducing the need to rely on external storage.
6. Storage (HDD): 16TB SATA HDD
• 7200 RPM, 256 MB cache
• Large capacity for data storage
Why it suits deep learning: The 16TB HDD is ideal for storing large amounts of archival data, backups, and massive datasets that do not require frequent access. Although the read/write speed is
slower compared to SSDs, HDDs offer a much larger capacity, which is essential for long-term storage
in deep learning projects. This combination of SSD for performance and HDD for capacity provides
an efficient solution for handling both hot and cold data.
7. Power Supply: 1000W 80 PLUS Gold or Platinum Certified PSU
• Provides sufficient power for high-end GPU and multi-core CPU
• Power headroom for overclocking
Why it suits deep learning: The 1000W PSU ensures stable power delivery to high-performance
components such as the RTX 3090 and Ryzen 9 5950X, which are power-hungry during deep learning
workloads. The 80 PLUS Gold or Platinum certification guarantees high energy efficiency, which is
crucial for maintaining system stability and reliability during long-duration, high-load tasks. Additionally, the extra power headroom allows for potential overclocking, ensuring future-proofing for more
demanding workloads.
8. Case: Full Tower Case
• Excellent thermal design and airflow management
• Supports multiple fans and liquid cooling solutions
Why it suits deep learning: Deep learning tasks often involve long periods of heavy GPU and CPU
usage, generating significant heat. A full tower case provides ample space for effective cooling solutions, such as multiple fans or liquid cooling systems, ensuring optimal temperature control. Keeping components cool under heavy load helps maintain system performance and longevity, preventing thermal throttling and potential hardware damage.
ThreadRipper processors are ideal for high-end workstations or tasks requiring extreme multi-threading.
Here are the three processors you can find on eBay, along with their price trends:
• 3995WX: 64 cores and 128 threads, suitable for massively parallel computing or virtualization
environments. Originally priced at around $5,500, it can now be found on eBay for around $1,000
to $1,500, offering excellent value.
• 5975WX: 32 cores and 64 threads, ideal for applications requiring high parallel processing, such
as video rendering and 3D modeling. Originally priced around $3,200, current eBay prices are approximately $1,200.
• 5995WX: 64 cores and 128 threads, the most powerful in the ThreadRipper series, designed for
extreme workloads. Originally priced at around $6,500, it can now be found for about $3,000 on
eBay.
It's important to note that socket compatibility differs across the lineup: non-PRO ThreadRipper 3000 series processors (e.g., the 3960X, 3970X, and 3990X) use TRX40 chipset motherboards, while the ThreadRipper PRO WX parts, including the 3995WX, 5975WX, and 5995WX, require WRX80 chipset motherboards. Be sure to choose the correct motherboard based on your processor.
When purchasing on eBay, be aware of vendor locks. For example, some 3995WX processors may
have a Lenovo lock, meaning they will only work on specific Lenovo-branded motherboards. Avoid
purchasing locked processors if you do not have a compatible motherboard.
Motherboard Selection

For the 3995WX, 5975WX, and 5995WX, you should choose a motherboard with the WRX80 chipset; TRX40 boards support only the non-PRO ThreadRipper 3000 series. Here are some recommended options:
• WRX80 Motherboards (for the 3995WX, 5975WX, 5995WX, and other ThreadRipper PRO processors):
• TRX40 Motherboards (for non-PRO ThreadRipper 3000 series processors):
When searching for these models on eBay, ensure the product is free of vendor locks and comes
with all necessary accessories (e.g., I/O shields, M.2 screws, etc.).
Cooling Solution
ThreadRipper processors generate significant heat, and do not come with a stock cooler, so you will
need to choose a suitable cooling solution. The type of cooler you select will significantly affect the
system’s noise level. Below are common cooling options:
• Liquid Coolers (e.g., Corsair H150i RGB Pro XT): Liquid coolers can deliver efficient cooling while
maintaining low noise levels, making them ideal for users requiring a quiet work environment.
Liquid cooling systems typically produce minimal noise, perfect for home or office use.
• High-Efficiency Air Coolers (e.g., Noctua NH-U14S TR4-SP3): Air coolers also provide good
cooling but may generate more noise at higher fan speeds. Noctua, for example, offers high-end
air coolers known for their low noise levels, suitable for users sensitive to noise.
If using large, high-power fans (often referred to as "aggressive fans"), they can offer extreme cooling efficiency but typically produce a significant amount of noise. This setup is often used for server-grade equipment or specialized high-performance workstations. If you plan to use such a cooling solution, we recommend installing your system in a soundproof server room or a dedicated rack to mitigate noise interference in your work environment.
When installing the cooler, ensure it comes with the TR4 or SP3 mounting brackets, as traditional
coolers might not be compatible with the larger processor size.
Memory Selection
The ThreadRipper platform supports multi-channel memory, so we recommend starting with 32GB per
DIMM and using at least 128GB (4 x 32GB) or more to fully utilize the multi-channel capability. Here
are some recommended memory options:
• Corsair Vengeance LPX DDR4 3200MHz 32GB: A cost-effective option with good stability.
• G.Skill Ripjaws V DDR4 3200MHz 32GB: High-frequency memory, ideal for multi-tasking and
rendering workloads.
You can choose a 4-channel or 8-channel memory configuration to maximize memory bandwidth
and performance, depending on the motherboard’s maximum supported memory channels.
AMD’s EPYC series processors are designed for server and data center environments. Here are two
recommended processors, along with their price trends:
• EPYC 7763: 64 cores and 128 threads, delivering excellent multi-threading performance, ideal for
highly concurrent workloads. Originally priced at around $7,800, it can now be found for about
$700 on eBay.
• EPYC 7773X: 64 cores and 128 threads, leveraging 3D V-Cache technology to provide larger
cache sizes, making it ideal for workloads requiring huge data caches, such as database processing. Originally priced at $8,500, it can now be found for around $1,500 on eBay.
Be aware that these processors may have vendor locks (e.g., HP, Dell, or Lenovo lock), meaning
they will only work on specific branded motherboards. Make sure the processor is compatible with
your motherboard before purchasing.
Motherboard Selection
EPYC processors require motherboards with the SP3 socket. Here are some common motherboard
options:
• Supermicro H12SSL-i
• Gigabyte MZ72-HB0
• ASUS KRPA-U16
When searching for these motherboards on eBay, ensure that the product includes all necessary
accessories, particularly the I/O shield and cooling components.
Cooling Solution
EPYC processors also do not come with stock coolers, so you will need to purchase a cooler that
supports the SP3 socket. Below are recommended coolers:
• Noctua NH-U14S TR4-SP3: An air cooler offering excellent cooling performance for the SP3
socket.
• Dynatron A39 Server Cooler: A small, efficient cooler suitable for server setups.
Ensure that the cooler comes with the correct SP3 mounting hardware to guarantee proper instal-
lation.
Memory Selection
EPYC processors support DDR4 RDIMM or LRDIMM. We recommend starting with 32GB per DIMM
and using an 8-channel configuration. The minimum recommended memory configuration is 256GB
(8 x 32GB).

Make sure that the memory type is compatible with your motherboard and processor, and we recommend using multi-channel memory configurations to maximize overall performance.
Other Considerations
For a server configuration, especially one based on the EPYC 7763 or 7773X processors, you should
also consider the following hardware:
• Power Supply: A high-wattage power supply is necessary for server systems. We recommend
at least 1200W with 80 Plus Platinum certification.
• Chassis: Choose a chassis that supports multi-point cooling and large-sized motherboards. You
can consider rackmount chassis from brands like Supermicro or Dell.
• Cooling Systems: If using high-power fans, we recommend installing the server in a dedicated
soundproof server room or cabinet to avoid noise disruptions in your work environment.
By following these steps, you can easily complete your high-end ThreadRipper or EPYC server hardware purchases on eBay.
2. Graphics Card (GPU): NVIDIA GeForce RTX 4090

Why it suits high-end performance: The RTX 4090 delivers excellent performance in games and visual simulations and is ideal for anyone requiring top-tier GPU performance for AI, creative, or gaming workloads.
3. Memory (RAM): 96 GB DDR5
• Type: DDR5
• Frequency: 5200 MHz or slightly higher (up to 6000 MHz)
• Configuration: 2 x 48 GB modules
Why it suits high-end performance: DDR5 offers higher bandwidth and improved efficiency over DDR4, although early DDR5 kits often run at looser timings, so latency gains are modest during this initial adoption phase. The increased memory capacity of 96 GB more than compensates for this, allowing users to work with large datasets and multitask efficiently. This configuration is ideal for deep learning, large-scale simulations, and video editing, where vast amounts of data must be processed simultaneously. The 96 GB capacity is future-proof, offering ample headroom for memory-intensive applications.
4. Motherboard: B650/X670 Chipset
• Supports PCIe 5.0 (GPU slot)
• Supports overclocking
• Supports high-speed M.2 SSDs
Why it suits deep learning: The B650 and X670 chipsets provide PCIe 5.0 support, which is essential for maximizing the performance of next-generation GPUs and NVMe SSDs used in deep learning. These motherboards offer enhanced memory speed, overclocking capabilities, and support for high-bandwidth components, ensuring optimal performance for large datasets and compute-intensive tasks. The B650 offers a more affordable option with essential features, while the X670 provides additional PCIe lanes and connectivity options, making it a more robust choice for demanding deep learning environments.
5. Storage (SSD): 4TB NVMe SSD
• Read speed: ∼ 7000 MB/s
• Write speed: ∼ 6800 MB/s
• Interface: PCIe 4.0
Why it suits high-end performance: The 4TB NVMe SSD offers fast read and write speeds, significantly enhancing system responsiveness and reducing load times for large files and applications. This
is especially important in deep learning and video editing tasks where large datasets and files need to
be accessed and processed frequently. PCIe 4.0 support allows even faster data transfer rates, further
reducing bottlenecks in data-heavy workflows.
6. Storage (HDD): 20TB SATA HDD
• 7200 RPM, 256 MB cache
• Massive storage capacity
Why it suits high-end performance: The 20TB HDD provides an enormous amount of storage for
archival data, backups, and large datasets that do not require frequent access. While not as fast as
SSDs, the large capacity of the HDD is essential for long-term data storage in applications like video
editing, deep learning, and content creation, where raw data, project files, and backup images can
consume significant space.
7. Power Supply: 1200W 80 PLUS Platinum Certified PSU
• Provides ample power for high-end GPU and CPU
• Efficiency ensures stable performance under heavy loads
Why it suits high-end performance: A 1200W power supply ensures sufficient and stable power delivery for the power-hungry RTX 4090 and Ryzen 9 7950X3D, especially during overclocking or peak loads. The 80 PLUS Platinum certification guarantees high efficiency, reducing power waste and ensuring reliability during prolonged high-load operations, which is critical for tasks like deep learning
and rendering that require long periods of uninterrupted performance.
8. Case: Full Tower Case
• Supports excellent airflow and multiple cooling solutions
• Ample space for large components
Why it suits high-end performance: A full tower case is essential for housing large components
like the RTX 4090, and it provides space for advanced cooling systems such as liquid cooling. Given
the high power consumption and heat generation of this configuration, efficient cooling is crucial to
prevent thermal throttling and ensure long-term stability during extended workloads.
3.2 Hardware Considerations for Programming

When working on general programming tasks, the hardware requirements are relatively modest compared to machine learning or deep learning workloads. However, ensuring that you have appropriate resources is still crucial for efficient development.
3.2.1 Processor (CPU)

For most programming tasks, a modern multi-core processor is sufficient. Compiling code and running applications will benefit from higher clock speeds and more cores, especially in multi-threaded programming environments. However, a balance should be struck between power and cost based on the complexity of your development needs.
3.2.2 Memory (RAM)

RAM is another critical factor for programming. While typical tasks may not be extremely memory-intensive, having at least 16 GB of RAM is recommended to avoid slowdowns due to memory paging, especially when running multiple applications or virtual machines.
3.2.3 Storage
Fast storage is essential for reducing load times when working with large files or compiling code.
Solid-state drives (SSDs) are highly recommended over traditional hard disk drives (HDDs) to ensure
faster data access and overall system responsiveness.
3.3 Hardware for Machine Learning and Deep Learning

Machine learning and deep learning tasks require considerably more powerful hardware due to the scale and complexity of the operations involved.
3.3.1 GPU Options

NVIDIA GPUs
NVIDIA continues to be a dominant player in deep learning, with their CUDA framework and GPUs
such as the RTX series for consumer use and the A100 for enterprise-level tasks. These GPUs excel in
handling large datasets and complex model training. Models like the RTX 4090, with 24 GB of memory,
or enterprise-grade cards with even more VRAM, are ideal for heavy workloads.
Apple Metal (MPS)

For users with modern Apple Silicon Macs, Apple's Metal Performance Shaders (MPS) offer an integrated solution for GPU acceleration in deep learning. MPS support is built into frameworks such as PyTorch, enabling accelerated model training directly on Apple's M1, M2, and M3 chips. The unified memory architecture in Apple Silicon allows Apple GPUs to access the entire system's memory, making it an attractive option for local development without the need for external GPUs.
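As a quick illustration, the following minimal sketch shows how a PyTorch script can select whichever accelerator is present and fall back to the CPU:

import torch

# Prefer CUDA, then Apple's MPS backend, then plain CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

x = torch.randn(4, 4, device=device)  # Tensor allocated on the chosen device
print(device, x.sum().item())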
AMD ROCm
AMD's ROCm (Radeon Open Compute) platform is gaining traction as a strong competitor to CUDA, particularly with support for PyTorch, TensorFlow, and JAX. AMD's recent GPUs, including the Radeon RX 7900 XTX and the MI250 series, are optimized for deep learning tasks. ROCm enables high-performance computing and offers alternatives for users looking for non-NVIDIA options, especially in Linux environments.
Memory Considerations
Regardless of the GPU brand, selecting a GPU with enough VRAM is critical. GPUs with 16 GB to 24
GB of memory, such as NVIDIA RTX cards and AMD’s Radeon Pro models, or Apple Silicon Macs with
more than 32 GB of RAM are recommended to prevent memory bottlenecks during training.
Each of these platforms has its strengths, allowing users to choose based on their budget, hardware preferences, and compatibility with their development environments.
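Before training, it is worth confirming how much VRAM the driver actually reports. A small sketch, assuming an NVIDIA GPU and a CUDA-enabled PyTorch build:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # total_memory is reported in bytes; convert to GiB for readability
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")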
3.3.3 Memory (RAM)

Generous system memory is recommended for machine learning tasks, with 64 GB or more being preferable for larger datasets and more complex model architectures.
3.3.4 Storage
Deep learning tasks can involve large datasets that require fast access times. NVMe SSDs are ideal
for this, providing high-speed data retrieval necessary for fast iteration and model training.
Chapter 4

Setting Up Your Development Environment

Before embarking on any task, having the proper tools is crucial for success. In the realm of software development, setting up a proper development environment is the first critical step. With the right setup, developers can streamline their workflow, reduce errors, and focus on producing high-quality code. This chapter will guide you through the necessary steps to prepare your development environment, ensuring you are equipped to work efficiently with Python.
• Windows: Run the installer and ensure the option "Add Python to PATH" is checked before continuing with the installation. This will allow you to run Python from the command line.
• macOS: macOS usually comes with Python pre-installed, but it may be an older version. To install
the latest version, you can use a package manager like Homebrew (brew install python) or
download it from the Python website.
• Linux: Most Linux distributions include Python by default, but if it’s not present, you can install it
manually using your package manager, such as apt for Ubuntu (sudo apt install python3) or
dnf for Fedora.
After installing Python, it is crucial to verify the installation. Open a terminal or command prompt and
type the following command:
python --version
If the installation is successful, the terminal will display the installed Python version.
Python 3.11.3
Additionally, you can check whether the Python package installer, pip, is installed by typing:
pip --version
pip 23.3.2 from C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip (python 3.11)

Note: If you are installing a newer version of Python 3.x, pip is usually installed automatically along with it.

Once both versions are displayed, your Python installation is complete, and you are ready to begin coding!
In this section, we will explore the significance of virtual environments in Python development and
cover two popular ways to create and manage virtual environments: using venv and conda. Virtual
environments allow you to isolate project dependencies and avoid conflicts between different versions
of libraries.
A virtual environment is an isolated environment that contains its own installation directories and libraries, separate from the global Python installation. This is essential when working on multiple projects that require different versions of Python packages. Virtual environments enable you to keep each project's dependencies separate, avoid version conflicts, and reproduce setups reliably.
To create and manage a virtual environment with conda:

1. Open a terminal (or the Anaconda Prompt on Windows).

2. Run the following command to create a new virtual environment with the desired Python version (the name myenv and the version shown are examples):

conda create -n myenv python=3.11

3. Activate the environment before working in it:

conda activate myenv

When you are done, you can leave the environment with:

conda deactivate

Note: After activating the virtual environment, your terminal prompt will display the environment name in parentheses, e.g., (myenv), indicating that the environment is active.
For example, to install numpy in a specific conda environment, follow these steps:

1. First, activate the environment where you want to install numpy by running:

conda activate myenv

2. Once the environment is activated, install numpy within that environment by running:

conda install numpy
• scikit-learn: A machine learning library for data mining and analysis [78].
• tqdm: A fast, extensible progress bar for loops and command-line applications.
Note: If you do not fully understand what CUDA is or do not need GPU acceleration at the moment, you can skip this section and simply install the CPU version of PyTorch using the following command:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
PyTorch is a deep learning framework widely used in artificial intelligence and machine learning tasks [76]. PyTorch supports both CPU and GPU computations, leveraging CUDA for GPU acceleration. To utilize the GPU for faster training and inference, it is essential to install the appropriate version of PyTorch that is compatible with CUDA. Below is an introduction to CUDA and the steps to install a CUDA-enabled PyTorch version.
Introduction to CUDA
CUDA (Compute Unified Device Architecture) is a parallel computing platform and API developed by
NVIDIA that enables developers to use GPUs for high-performance computations. For tasks such as
deep learning, where large-scale computations are required, GPU acceleration can significantly speed
up model training and inference. To allow PyTorch to take advantage of CUDA, the correct version of
the CUDA toolkit and a compatible version of PyTorch must be installed.
1. Ensure that your system has the NVIDIA GPU driver and the appropriate version of the CUDA
toolkit installed. You can check the CUDA version on your system using the command nvidia-smi.
2. Visit the official PyTorch website and use the installation selector to choose your operating system, package manager, Python version, and CUDA version.

3. For instance, if you want to install PyTorch with CUDA 11.8 on Windows using pip, you can follow the selection process and copy the generated command below into your Conda environment for execution.
Figure 4.1: PyTorch installation guide for Windows using Pip with CUDA 11.8, showing the command
to install PyTorch, torchvision, and torchaudio.
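At the time of writing, the command the selector generates for this combination is:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118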
4. If the CUDA toolkit is not yet installed on your system, follow the instructions on the NVIDIA
website to download and install the correct version.
After installation, you can verify whether PyTorch detects CUDA support by running the following
Python command in your conda environment:
1 import torch
2 print(torch.cuda.is_available()) # Returns True if CUDA is available
By installing PyTorch with CUDA support, you can take full advantage of the computational power
of GPUs, dramatically improving the performance of deep learning models, especially when dealing
with large datasets and complex architectures.
To use TensorFlow [1] with GPU acceleration, you will need to install the correct version of TensorFlow that is compatible with your system's CUDA version. TensorFlow leverages NVIDIA's CUDA framework for GPU computation, which can greatly accelerate the performance of deep learning models.
Here are the steps to install TensorFlow with CUDA support:
• First, ensure that you have a compatible version of CUDA and cuDNN installed. TensorFlow
supports specific versions of CUDA and cuDNN, so check TensorFlow’s compatibility chart to
match the correct versions.
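• Then install TensorFlow itself in your environment; with recent releases, pip install tensorflow is sufficient, and on Linux pip install tensorflow[and-cuda] also pulls in matching NVIDIA libraries (check the current TensorFlow install guide, as packaging details change between releases).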
• After installation, verify that TensorFlow can access the GPU by running the following Python
script:
1 import tensorflow as tf
2 print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
If the output shows the number of available GPUs, the installation was successful and TensorFlow is ready to use the GPU for computations.
Make sure you also have the necessary NVIDIA drivers installed on your system for CUDA to function properly. Additionally, it's recommended to use a virtual environment to manage package dependencies effectively.
1. Using a requirements.txt file: List the required packages and versions in this file.
# Basic version
numpy
tqdm
opencv-python
Once you’ve listed your required packages in the requirements.txt file, you can install all the
packages at once with the following command:
pip install -r requirements.txt
2. Specifying version numbers: Ensure correct package versions in the requirements.txt file to avoid issues from updates (see the example after this list).
3. Using virtual environments: Isolate your project’s dependencies from the global environment.
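As an example of point 2, a requirements.txt with pinned versions might look like this (the version numbers are purely illustrative):

numpy==1.26.4
tqdm==4.66.1
opencv-python==4.9.0.80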
• Text Editors: Text editors are generally simpler and lighter, focusing on writing and editing text,
with fewer built-in features. Some popular text editors include:
– Vim: A highly customizable, keyboard-driven text editor favored by developers for its efficiency.
– Notepad++: A free text editor for Windows that supports syntax highlighting and other basic
features but lacks integrated debugging or project management.
– VSCode (Visual Studio Code): A highly extensible text editor that, with the right extensions,
can function similarly to an IDE. It supports syntax highlighting, debugging, version control,
and more.
• IDEs: Integrated Development Environments come with a full set of tools for coding, debugging,
testing, and managing projects all in one place.
– PyCharm: A powerful IDE specifically designed for Python development. PyCharm provides tools like code completion, debugging, refactoring, and testing built into one package, making it ideal for larger, more complex projects.
• Setting Up VSCode:
1. Download and install Visual Studio Code from the official website.
2. Install the Python extension from the Extensions marketplace.
3. Select the Python interpreter for your project (e.g., the virtual environment created earlier).
4. Install additional extensions like Pylint and Black for linting and code formatting.
• Setting Up PyCharm:
Proper configuration of your IDE or text editor can significantly improve your productivity and help
you focus on writing better code.
• JSON Crack: A handy tool for visualizing JSON structures. This extension allows you to explore
JSON data in a graphical format, making it easier to understand complex nested structures.
• Rainbow CSV: This extension highlights columns in CSV files, making it easier to recognize and
differentiate columns when working with data. It’s particularly helpful when dealing with large
datasets or unformatted CSV files.
• Copilot: A powerful AI-driven extension developed by GitHub. Copilot assists by providing intelligent code suggestions, auto-completion, and even writing entire blocks of code based on your prompts, greatly improving coding speed and accuracy.
• CodeSnap: A simple but effective extension that allows you to take beautiful screenshots of
your code snippets directly from VSCode. It includes customization options like background
color and padding to enhance readability and aesthetics.
Chapter 5

Introduction to Python Programming
In this chapter, we will start by writing a simple Python program to print a message, “Hello, World!”
This is the first step in learning any programming language and will help you understand the basics of
Python syntax.
1 print("Hello, World!")
When you run this code, the output on your screen will be:
Hello, World!
This program is very simple. It uses the “print” function to display the text inside the parentheses
on the screen. This is your first Python program.
1 def hello_world():
2 print("Hello, World!")
Now, if we want to display “Hello, World!” we just need to call this function:
1 hello_world()
When you run this code, the output on your screen will be:
Hello, World!
1 def hello_world():
2 print("Hello, World!")
3
4 hello_world()
1 def hello_world():
2 print("Hello, World!")
3
4 if __name__ == "__main__":
5 hello_world()
1 import hw
2
3 hw.hello_world()
When you run test.py, the output on your screen will be:
Hello, World!
However, if you do not want the code in hw.py to run automatically when it is imported into another file, but only when it is run directly, then if __name__ == "__main__" is very useful. This ensures that hello_world() is only executed when you run hw.py directly; if you import hw into test.py, the code will not run unless you explicitly call it in test.py.
This is the purpose of if __name__ == "__main__" — it prevents code from running when it is not
supposed to, giving you better control over the behavior of your program.
When debugging a program that consists of multiple source files, the if __name__ == "__main__" statement can help you debug more effectively. Specifically, when your project includes multiple modules or script files, each file may have its own functionality, logic, and interfaces. When you want to debug a specific module independently, you can add if __name__ == "__main__" to that module and write some modular test code or invoke functions within that module underneath it. This not only helps in verifying the module's independence but also ensures that each part can function correctly on its own when integrated with other modules, thereby improving debugging efficiency and program reliability.
On Windows, if you want to use multiple processes, you must also wrap the main program in the if __name__ == "__main__" construct. This is because, on Windows, the multiprocessing module starts child processes by re-importing the main program. Without this safeguard, the main program might be executed multiple times, leading to the creation of new processes in a loop, which could cause infinite recursion or other unexpected behavior. (Threads created with the threading module do not re-import the main module, but the guard remains good practice.) Therefore, when writing multi-process programs, especially on Windows, it's crucial to place the main program inside the if __name__ == "__main__" block to avoid such issues.
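A minimal sketch of this pattern:

import multiprocessing

def worker():
    print("worker running")

if __name__ == "__main__":
    # Without this guard, the child process re-imports this file on Windows
    # and would try to spawn another process, recursing indefinitely.
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()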
5.2 Data Types in Python

5.2.1 Numbers
Numbers in Python include integers, floats (decimal numbers), and complex numbers. Python supports standard arithmetic operations on numbers such as addition, subtraction, multiplication, and division.
1 x = 10 # integer
2 y = 3.14 # float
3 z = 1 + 2j # complex number
Python automatically performs type conversion (implicit casting) when you perform operations between different types of numbers. For example, if you add an integer and a float, Python will automatically convert the integer to a float, ensuring that the result is also a float. Similarly, operations between floats and complex numbers will result in a complex number.
When you add an integer and a float, the result is a float because Python promotes the integer to a
float to avoid losing precision.
1 a = 10 # integer
2 b = 2.5 # float
3 result = a + b
4 print(result)
5 print(type(result))
Output:
12.5
<class 'float'>
When adding a float and a complex number, Python automatically converts the float to a complex
number, preserving the real and imaginary parts.
1 c = 3.14 # float
2 d = 2 + 3j # complex number
3 result = c + d
4 print(result)
5 print(type(result))
Output:
(5.14+3j)
<class 'complex'>
Even when dividing two integers, the result will always be a float in Python 3.x. This ensures that division does not silently discard the fractional part.
1 e = 10
2 f = 3
3 result = e / f
4 print(result)
5 print(type(result))
Output:
3.3333333333333335
<class 'float'>
If you want to perform integer division (where the result is an integer and truncates the decimal part), you can use the // operator.
1 g = 10
2 h = 3
3 result = g // h
4 print(result)
5 print(type(result))
Output:
3
<class 'int'>
1 i = 10
2 j = 3
3
4 print(i % j) # Modulus: remainder of 10 divided by 3
5 print(i ** j) # Exponentiation: 10 to the power of 3

Output:

1
1000
5.2.2 Strings
Strings are sequences of characters. They can be created by enclosing text in either single or double
quotes.
1 greeting = "Hello, Python!"
2 name = 'Alice'
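String methods transform text without changing the original string. One snippet consistent with the output shown below (the exact strings used here are assumptions):

print("hello world".upper())  # Convert to uppercase
print("hello Java".replace("Java", "Python"))  # Replace a substring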
Output:
HELLO WORLD
hello Python
String Formatting
You can format strings using various methods, including f-strings, the format method, or using the old
% operator.
1 name = "Alice"
2 age = 25
3 print(f"My name is {name} and I am {age} years old.") # f-string
4 print("My name is {} and I am {} years old.".format(name, age)) # .format()
5 print("My name is %s and I am %d years old." % (name, age)) # % operator
6 print("My name is " + str(name) + "and I am " + str(age) + "years old.") # string concatenation
Output:
My name is Alice and I am 25 years old.
My name is Alice and I am 25 years old.
My name is Alice and I am 25 years old.
My name is Alice and I am 25 years old.
5.3 Lists
Lists are ordered collections of items in Python. They are mutable, which means you can change their
content without creating a new list. Lists can hold items of different data types, including other lists.
1 # Creating a list
2 fruits = ["apple", "banana", "cherry"]
3 print(fruits[0])
4
5 # Accessing other elements by index
6 print(fruits[1])
7 print(fruits[2])
8
9 # Slicing lists
10 print(fruits[1:3])
11
12 # Unpack lists
13 print(*fruits)
Output:
apple
banana
cherry
['banana', 'cherry']
banana cherry
1 # Creating a list
2 fruits = ["apple", "banana", "cherry"]
3 print("Original list:", fruits)
4
5 fruits.append("orange") # Add an item to the end
6 print("After append:", fruits)
7 fruits.insert(1, "blueberry") # Insert an item at index 1
8 print("After insert:", fruits)
9 fruits.remove("banana") # Remove an item by value
10 print("After remove:", fruits)
11 del fruits[0] # Delete an item by index
12 print("After delete:", fruits)
13 fruits.sort() # Sort the list alphabetically
14 print("After sort:", fruits)
15 fruits.reverse() # Reverse the list in place
16 print("After reverse:", fruits)
17 copied_fruits = fruits.copy() # Make a shallow copy
18 print("Copied list:", copied_fruits)
Output:
Original list: ['apple', 'banana', 'cherry']
After append: ['apple', 'banana', 'cherry', 'orange']
After insert: ['apple', 'blueberry', 'banana', 'cherry', 'orange']
After remove: ['apple', 'blueberry', 'cherry', 'orange']
After delete: ['blueberry', 'cherry', 'orange']
After sort: ['blueberry', 'cherry', 'orange']
After reverse: ['orange', 'cherry', 'blueberry']
Copied list: ['orange', 'cherry', 'blueberry']
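Indexing and slicing work the same way on lists of numbers; one snippet consistent with the output below (the variable name is an assumption):

numbers = [1, 2, 3, 4, 5, 6]
print(numbers[1])   # Second element; indexing starts at 0
print(numbers[3:])  # Slice from index 3 to the end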
Output:
2
[4, 5, 6]
5.4 Tuples
Tuples are immutable sequences in Python, meaning once you create a tuple, you cannot modify its
elements. They are often used to group related data together and can be of mixed data types. Tuples
are defined using parentheses ().
1 # Creating a tuple
2 coordinates = (10, 20)
3 print(coordinates[0]) # Prints the first element
4 print(coordinates[1]) # Prints the second element
5
6 # Tuples can hold mixed data types
7 mixed_tuple = (1, "apple", 3.14, [1, 2, 3])
8 print(mixed_tuple) # Prints the whole tuple
9
10 # Nested tuples
11 nested_tuple = (1, (2, 3), (4, 5))
12 print(nested_tuple[1]) # Prints the nested tuple at index 1
13 print(nested_tuple[1][0]) # Prints the first element of the nested tuple at index 1
14
15 # Tuple unpacking
16 a, b, c = (10, 20, 30)
17 print(a) # Prints the value of a
18 print(b) # Prints the value of b
19 print(c) # Prints the value of c
20
21 # Concatenating tuples
22 tuple1 = (1, 2, 3)
23 tuple2 = (4, 5, 6)
24 combined = tuple1 + tuple2
25 print(combined) # Prints the concatenated tuple
26
27 # Repeating tuples
28 repeated = tuple1 * 3
29 print(repeated) # Prints the repeated tuple
30
31 # Checking membership
32 print(2 in tuple1) # Checks if 2 is in tuple1
33 print(7 in tuple1) # Checks if 7 is in tuple1
Output:
10
20
(1, 'apple', 3.14, [1, 2, 3])
(2, 3)
2
10
20
30
(1, 2, 3, 4, 5, 6)
(1, 2, 3, 1, 2, 3, 1, 2, 3)
True
False
5.5 Dictionaries
Dictionaries are collections of key-value pairs. They allow you to store data that is associated with a
unique key. Dictionaries are defined using curly braces ‘{}‘.
1 # Creating a dictionary
2 person = {"name": "Alice", "age": 25}
3 print(person["name"]) # Prints the value associated with the key 'name'
4 print(person["age"]) # Prints the value associated with the key 'age'
5
6 # Adding a new key-value pair
7 person["email"] = "[email protected]"
8 print(person)
9
10 # Updating a value
11 person["age"] = 26
12 print(person)
13
14 # Removing a key-value pair
15 del person["email"]
16 print(person)
17
18 # Getting all keys and values
19 print(person.keys())
20 print(person.values())
21
22 # Checking key membership
23 print("name" in person)
24 print("email" in person)
25
26 # Safe access with get() and a default value
27 print(person.get("name", "Not Available"))
28 print(person.get("email", "Not Available"))
Output:
Alice
25
{'name': 'Alice', 'age': 25, 'email': '[email protected]'}
{'name': 'Alice', 'age': 26, 'email': '[email protected]'}
{'name': 'Alice', 'age': 26}
dict_keys(['name', 'age'])
dict_values(['Alice', 26])
True
False
Alice
Not Available
5.6 Sets
Sets are unordered collections of unique elements. They are useful when you want to store unique
values and perform operations like union, intersection, and difference. Sets are defined using curly
braces ‘{}‘.
1 # Creating a set
2 my_set = {1, 2, 3, 3, 4}
3 print(my_set) # Prints the set, automatically removing duplicates
4
5 # Adding an element
6 my_set.add(5)
7 print(my_set)
8
9 # Removing an element
10 my_set.remove(2)
11 print(my_set)
12
13 # Checking membership
14 print(3 in my_set) # Checks if 3 is in the set
15 print(2 in my_set) # Checks if 2 is in the set
16
17 # Set operations
18 set1 = {1, 2, 3}
19 set2 = {3, 4, 5}
20
21 # Union of sets
22 union_set = set1 | set2
23 print(union_set) # Prints the union of set1 and set2
24
25 # Intersection of sets
26 intersection_set = set1 & set2
27 print(intersection_set) # Prints the intersection of set1 and set2
28
29 # Difference of sets
30 difference_set = set1 - set2
31 print(difference_set) # Prints the difference between set1 and set2
32
33 # Symmetric difference of sets
34 symmetric_diff = set1 ^ set2
35 print(symmetric_diff) # Prints elements in either set, but not both
Output:
{1, 2, 3, 4}
{1, 2, 3, 4, 5}
{1, 3, 4, 5}
True
False
{1, 2, 3, 4, 5}
{3}
{1, 2}
{1, 2, 4, 5}
5.7 Variables in Python
Variables defined inside a function have local scope and are accessible only within that function, while variables defined at the top level of a script have global scope. In the example below, a local variable shadows a global one:
1 x = "global" # Global variable
2
3 def example():
4 x = "local" # Local variable
5 print(x)
6
7 example()
8 print(x)
Output:
local
global
Mutable Objects
Mutable objects can be changed after their creation. Lists, dictionaries, and sets are examples of
mutable objects.
1 def modify_list(lst):
2     lst.append(4) # Modifies the original list
3
4 original_list = [1, 2, 3]
5 modify_list(original_list)
6 print(original_list)
Output:
[1, 2, 3, 4]
Immutable Objects
Immutable objects cannot be changed after their creation. Integers, floats, strings, and tuples are
examples of immutable objects.
1 def modify_tuple(t):
2     t += (4,) # Creates a new tuple, does not modify the original tuple
3
4 original_tuple = (1, 2, 3)
5 modify_tuple(original_tuple)
6 print(original_tuple)
Output:
(1, 2, 3)
1 import numpy as np
2
3 # np.arange accumulates the step value, exposing floating-point error
4 for value in np.arange(0.1, 0.5, 0.1):
5     print(value)
Output:
0.1
0.2
0.30000000000000004
0.4
As seen in the example above, the accumulation of floating-point precision errors causes the result
to deviate from what you might expect, such as 0.3 being displayed as 0.30000000000000004.
To avoid this, you can use the ‘decimal‘ module, which performs exact decimal arithmetic:
1 from decimal import Decimal
2
3 a = Decimal('0.1')
4 b = Decimal('0.2')
5 print(a + b)
Output:
0.3
Another subtlety is that Python's built-in round() uses "banker's rounding" (round half to even):
1 print(round(2.5))
2 print(round(3.5))
Output:
2
4
If you need traditional rounding where 0.5 always rounds up to the next integer, you can use the
‘decimal‘ module with the ‘ROUND_HALF_UP‘ method:
1 import decimal
2
3 decimal.getcontext().rounding = decimal.ROUND_HALF_UP
4
5 print(decimal.Decimal('2.5').to_integral_value())
6 print(decimal.Decimal('3.5').to_integral_value())
Output:
3
4
Logical operators in Python allow you to combine or invert conditions in your conditional statements.
The three main logical operators are and, or, and not.
and Operator:
The and operator is used to combine two or more conditions. It returns True only if all conditions are
true. If any condition is false, the entire expression evaluates to False.
1 age = 25          # Illustrative values
2 income = 40000
3
4 if age > 18 and income > 30000:
5     print("You are eligible for a loan.")
6 else:
7     print("You are not eligible for a loan.")
Output:
You are eligible for a loan.
In this example, both conditions (age > 18 and income > 30000) are true, so the program prints
"You are eligible for a loan."
or Operator:
The or operator is used to combine two or more conditions. It returns True if at least one of the
conditions is true. If all conditions are false, the expression evaluates to False.
1 age = 16                     # Illustrative values
2 has_parental_consent = True
3
4 if age >= 18 or has_parental_consent:
5     print("You can apply for a driver's license.")
6 else:
7     print("You cannot apply for a driver's license.")
Output:
You can apply for a driver's license.
In this case, even though the age is less than 18, the second condition (has_parental_consent) is
true, so the program prints "You can apply for a driver’s license."
not Operator:
The not operator is used to invert the result of a condition. If the condition is True, not makes it False,
and if the condition is False, not makes it True.
1 # Example of not
2 is_raining = False
3
4 if not is_raining:
5     print("You don't need an umbrella")
6 else:
7     print("You need an umbrella")
Output:
You don't need an umbrella
In this example, since is_raining is False, the not operator inverts it to True, so the program prints
"You don’t need an umbrella."
1 # if-elif-else example
2 x = 10
3
4 if x > 15:
5     print("x is greater than 15")
6 elif x == 10:
7     print("x is equal to 10")
8 else:
9     print("x is less than 10")
Output:
x is equal to 10
In this example, the program checks if x is greater than 15, then checks if it is equal to 10. Since x
equals 10, the second block is executed, and the message "x is equal to 10" is printed.
if statements can also be nested, allowing for more complex logic:
1 # Nested if statements
2 y = 20
3
4 if y > 10:
5     if y < 30:
6         print("y is between 10 and 30")
7     else:
8         print("y is greater than or equal to 30")
9 else:
10     print("y is less than or equal to 10")
Output:
y is between 10 and 30
In this nested if statement example, Python first checks if y is greater than 10, and then within that
block, it checks whether y is less than 30.
Note: It is important to avoid using too many layers of nested if statements as it can make the code
difficult to read and maintain. In such cases, it is better to refactor the code by using functions or logical
operators like and and or.
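For instance, the nested check above can be flattened with a chained comparison, which expresses the same logic in a single condition (a small sketch of the refactoring the note suggests):
1 # Equivalent logic using a chained comparison instead of nesting
2 y = 20
3
4 if 10 < y < 30:
5     print("y is between 10 and 30")
6 elif y >= 30:
7     print("y is greater than or equal to 30")
8 else:
9     print("y is less than or equal to 10")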
for Loop:
The for loop is used to iterate over a sequence (such as a list, tuple, dictionary, set, or string). It will
execute the block of code for each item in the sequence.
1 # Example of a for loop over a list of strings
2 fruits = ["apple", "banana", "cherry"]
3
4 for fruit in fruits:
5     print(fruit)
Output:
apple
banana
cherry
In this example, the for loop iterates over the list fruits, printing each item in the list.
You can also use the range() function to loop through a sequence of numbers. The range() func-
tion generates a sequence of numbers, starting from 0 by default, and stops before a specified number.
You can also specify the start, stop, and step values to control the sequence.
1 # Using range() in a for loop
2 for i in range(5):
3     print(i)
Output:
0
1
2
3
4
In this example, range(5) generates a sequence from 0 to 4, and the loop iterates through each
value.
You can also specify a starting point and an ending point for the range() function. The loop will start
from the specified start value and stop right before the specified end value.
1 # Using range() with start and stop values
2 for i in range(3, 8):
3     print(i)
Output:
3
4
5
6
7
The range() function also allows you to specify a step value, which determines the increment between
each number in the sequence.
1 # Using range() with a step value
2 for i in range(0, 10, 2):
3     print(i)
Output:
0
2
4
6
8
You can also use a negative step value to count downwards:
1 # Using range() with a negative step value
2 for i in range(10, 0, -2):
3     print(i)
Output:
10
8
6
4
2
while Loop:
The while loop continues to execute a block of code as long as a given condition is True. Once the
condition becomes False, the loop stops.
1 # Example of a while loop
2 count = 0
3
4 while count < 5:
5     print(count)
6     count += 1
Output:
0
1
2
3
4
In this example, the while loop keeps executing as long as count is less than 5. The variable count
increments by 1 with each iteration until the condition is no longer true.
The flow of loops in Python can be controlled using two important keywords: break and continue.
break: The break statement is used to exit the loop completely. Once the break statement is encoun-
tered, the program immediately stops executing the current loop and moves on to the next section of
code after the loop. This is useful when you want to terminate a loop early, based on a specific condi-
tion.
1 # Example of break
2 for i in range(10):
3     if i == 5:
4         break
5     print(i)
Output:
0
1
2
3
4
In this example, the loop prints the numbers from 0 to 4. When i equals 5, the break statement is
executed, and the loop is exited, preventing the numbers 5 through 9 from being printed.
continue: The continue statement is used to skip the current iteration of the loop and move directly
to the next iteration. Unlike break, continue does not stop the loop entirely; it only skips over the
remaining code in the current iteration and continues looping.
1 # Example of continue
2 for i in range(10):
3     if i % 2 == 0:
4         continue
5     print(i)
Output:
1
3
5
7
9
In this example, the continue statement is used to skip even numbers. When i is even, the program
jumps to the next iteration, and thus only odd numbers are printed.
Combining break and continue: You can use break and continue together in a loop to control the
flow in different ways. Here’s an example that demonstrates both statements in the same loop.
1 # Example of break and continue together
2 for i in range(10):
3     if i % 2 == 0:
4         continue  # Skip even numbers
5     if i == 7:
6         break     # Stop the loop entirely
7     print(i)
Output:
1
3
5
In this example, the loop skips over even numbers using continue, but when i equals 7, the break
statement is triggered, stopping the loop entirely. Therefore, only the odd numbers 1, 3, and 5 are
printed before the loop exits.
5.10 Functions in Python
5.10.1 def Keyword and Function Basics
Functions are reusable blocks of code that perform a specific task. In Python, they are defined with the def keyword.
Basic Example:
A simple function that takes no arguments and returns a greeting message can be defined as follows:
1 def greet():
2     return "Hello, World!"
To call the function and see the result, you can write:
1 # Call the function
2 print(greet())
Output:
Hello, World!
In this example, the function greet() is defined with no parameters and returns the string "Hello,
World!". The function is called using its name followed by parentheses, and the result is printed.
Functions can also take parameters, which allow you to pass data into the function. These parameters
are specified inside the parentheses when the function is defined.
1 # Function with a parameter
2 def greet(name):
3 return f"Hello, {name}!"
Output:
Hello, Alice!
In this case, the function greet() takes one parameter, name, and returns a personalized greeting.
When the function is called with "Alice" as the argument, it returns "Hello, Alice!".
Functions can accept multiple parameters by separating them with commas. Here is an example:
1 # Function with multiple parameters
2 def add_numbers(a, b):
3     return a + b
4
5 print(add_numbers(3, 5))
Output:
8
In this example, the function add_numbers() takes two parameters, a and b, and returns their sum.
When called with the arguments 3 and 5, the function returns 8.
A function can return a value using the return statement. The returned value can be stored in a variable
or used directly. Here’s an example:
1 # Function with a return value
2 def square(x):
3     return x * x
4
5 result = square(4)
6 print(result)
Output:
16
In this example, the function square() returns the square of the argument x. When called with the
argument 4, the function returns 16, which is then stored in the variable result and printed.
Default Parameters
Python allows you to define functions with default parameter values. If an argument is not provided
when calling the function, the default value is used.
1 # Function with a default parameter
2 def greet(name="Guest"):
3     return f"Hello, {name}!"
If no argument is passed, the default value "Guest" is used:
4 print(greet())
5 print(greet("Alice"))
Output:
Hello, Guest!
Hello, Alice!
In this case, the function greet() has a default parameter value of "Guest". If no argument is
provided, the function returns "Hello, Guest!". If an argument is provided, it overrides the default
value.
5.10.2 Recursion in Python
Recursion is a technique in programming where a function calls itself to solve a problem. Recursive
functions break a problem down into smaller subproblems, and the function continues to call itself
with these smaller subproblems until it reaches a base case, which is a condition that stops the re-
cursion. Recursion is useful for tasks that can be naturally divided into similar, smaller tasks, such as
mathematical problems or traversing tree-like structures.
To write a recursive function in Python, you define the function using def, and within the function,
call the function itself with a modified argument. You also need to define a base case to prevent the
recursion from running indefinitely.
A classic example of recursion is calculating the factorial of a number. The factorial of a number n is
defined as n × (n − 1) × (n − 2) × · · · × 1, and can be expressed recursively as:
n! = n × (n − 1)!
1 # Recursive factorial function
2 def factorial(n):
3     if n == 0:  # Base case
4         return 1
5     return n * factorial(n - 1)  # Recursive case
6
7 print(factorial(5))
Output:
120
In this example, the function factorial() calls itself with n − 1 until n = 0, at which point the base
case is reached, and the recursion stops.
Another common example of recursion is calculating numbers in the Fibonacci sequence. In the Fi-
bonacci sequence, each number is the sum of the two preceding ones, starting from 0 and 1. Mathe-
matically, it can be expressed as:
F(n) = F(n − 1) + F(n − 2), with base cases F(0) = 0 and F(1) = 1.
1 # Recursive Fibonacci function
2 def fibonacci(n):
3     if n == 0:  # Base case
4         return 0
5     if n == 1:  # Base case
6         return 1
7     return fibonacci(n - 1) + fibonacci(n - 2)  # Recursive case
8
9 print(fibonacci(6))
Output:
8
In this example, the function fibonacci() recursively calls itself to compute the Fibonacci number
for n − 1 and n − 2, continuing until the base cases n = 0 or n = 1 are reached.
• Base case: This is the condition that stops the recursion. Without a base case, the function
would call itself infinitely.
• Recursive case: This is the part of the function that reduces the problem into smaller instances
and continues the recursion.
In the factorial example, the base case is when n == 0, and in the Fibonacci example, the base
cases are when n == 0 and n == 1.
Recursion can be an elegant solution to certain problems, but it comes with some trade-offs. Recursive
functions consume memory with each function call, and Python has a limit on the depth of recursion
to prevent stack overflow. If recursion goes too deep, you may encounter a RecursionError. To avoid
this, make sure your recursive algorithm reaches a base case for all possible inputs.
By default, Python sets the recursion limit to 1000. If necessary, you can increase the recursion
limit, but it’s generally better to refactor the recursive algorithm if it becomes too deep.
5.11 Simple Sorting Algorithms
5.11.1 Bubble Sort
Bubble Sort works by repeatedly stepping through the list and comparing adjacent elements:
1. Start at the beginning of the list.
2. Compare each pair of adjacent elements and swap them if they are in the wrong order.
3. After each pass, the largest remaining unsorted element "bubbles up" to its final position.
4. Repeat the process for the entire list until it is sorted, regardless of whether swaps were made or not during a pass.
Here is a Python implementation of Bubble Sort in which the algorithm always runs all passes, without an early exit:
1 # Bubble Sort algorithm in Python (without early exit)
2 def bubble_sort(arr):
3     n = len(arr)
4     for i in range(n):
5         # Traverse the unsorted part and compare adjacent elements
6         for j in range(0, n - i - 1):
7             if arr[j] > arr[j + 1]:
8                 # Swap if they are in the wrong order
9                 arr[j], arr[j + 1] = arr[j + 1], arr[j]
10
11 # Example usage
12 arr = [64, 34, 25, 12, 22, 11, 90]
13 bubble_sort(arr)
14 print("Sorted array:", arr)
Output:
Sorted array: [11, 12, 22, 25, 34, 64, 90]
Bubble Sort is easy to understand but is not very efficient for large datasets due to its time complexity of O(n²). It is best suited for small datasets or educational purposes.
5.11.2 Insertion Sort
Insertion Sort builds the sorted portion of the list one element at a time, inserting each new element into its correct position among the elements already sorted:
1 # Insertion Sort algorithm in Python
2 def insertion_sort(arr):
3     for i in range(1, len(arr)):
4         key = arr[i]
5         j = i - 1
6         # Shift elements greater than key one position to the right
7         while j >= 0 and arr[j] > key:
8             arr[j + 1] = arr[j]
9             j -= 1
10         arr[j + 1] = key
11
12 # Example usage
13 arr = [64, 34, 25, 12, 22, 11, 90]
14 insertion_sort(arr)
15 print("Sorted array:", arr)
Output:
Sorted array: [11, 12, 22, 25, 34, 64, 90]
Insertion Sort has a time complexity of O(n²), similar to Bubble Sort, but it is more efficient when the dataset is already partially sorted. It performs well for small datasets and is often used in practice for such cases.
Here’s how you can use the built-in sort() method in Python:
1 # Using Python's built-in sort() method
2 arr = [64, 34, 25, 12, 22, 11, 90]
3 arr.sort()
4 print("Sorted array:", arr)
Output:
Sorted array: [11, 12, 22, 25, 34, 64, 90]
This built-in sort() method works efficiently with large datasets and offers various customization
options, such as specifying a key function or sorting in reverse order. Additionally, Python’s built-in sort
is stable, meaning it preserves the relative order of elements with equal values, which can be useful in
certain applications.
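For example, a key function lets you sort by a derived value, such as string length (a brief sketch; the word list is illustrative):
1 # Sorting strings by length using a key function
2 words = ["banana", "fig", "cherry", "date"]
3 words.sort(key=len)
4 print(words)
Output:
['fig', 'date', 'banana', 'cherry']
Note how 'banana' and 'cherry' (both six letters) keep their original relative order, illustrating the stability mentioned above.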
To sort a list in descending order, you can pass the reverse=True argument:
1 # Sorting in descending order
2 arr.sort(reverse=True)
3 print("Sorted array in descending order:", arr)
Output:
Sorted array in descending order: [90, 64, 34, 25, 22, 12, 11]
Overall, Python’s built-in sort() is highly optimized and should be preferred for most sorting tasks,
especially when dealing with large datasets. It offers a significant performance improvement com-
pared to simpler sorting algorithms like Bubble Sort and Insertion Sort, making it the go-to choice for
efficient sorting in Python.
5.11.4 Comparing the Time of Bubble Sort, Insertion Sort, and Python’s Built-in
sort()
To compare the efficiency of Bubble Sort, Insertion Sort, and Python’s built-in sort() method, we can
measure the time each algorithm takes to sort the same list. Below is a Python example that uses the
time module to measure the time for each sorting algorithm.
1 import time
2 import random
3
4 # The same random list is copied for each algorithm (reusing the
5 # bubble_sort and insertion_sort functions defined above)
6 data = [random.randint(0, 10000) for _ in range(1000)]
7
8 # Time Bubble Sort
9 arr1 = data.copy()
10 start = time.time()
11 bubble_sort(arr1)
12 print(f"Bubble Sort: {(time.time() - start) * 1000:.2f} ms")
13
14 # Time Insertion Sort
15 arr2 = data.copy()
16 start = time.time()
17 insertion_sort(arr2)
18 print(f"Insertion Sort: {(time.time() - start) * 1000:.2f} ms")
19
20 # Time Python's built-in sort()
21 arr3 = data.copy()
22 start = time.time()
23 arr3.sort()
24 print(f"Built-in sort(): {(time.time() - start) * 1000:.2f} ms")
This code compares the time taken by Bubble Sort, Insertion Sort, and Python’s built-in sort()
function to sort a list of 1000 random integers. The results are printed in milliseconds, and as expected,
Python’s built-in sort() is significantly faster due to its highly optimized Timsort algorithm. While
Bubble Sort and Insertion Sort are easy to implement and understand, they are inefficient for large
datasets. The built-in sort() should be the preferred choice when performance is critical.
Chapter 6
Python for Data Science
6.1 Commonly Used Datasets
Several benchmark datasets have become standard in machine learning education and research:
• Iris: The Iris dataset is one of the most classic datasets in machine learning, commonly used
for teaching and research in classification algorithms. The dataset contains 150 records, each
with 4 features (sepal length, sepal width, petal length, and petal width) and a target label (Iris
setosa, Iris versicolor, Iris virginica) [29]. Although the dataset is small, it is significant due to its
simplicity and historical importance in education.
• MNIST: The MNIST (Modified National Institute of Standards and Technology) dataset consists
of 70,000 images of handwritten digits (0-9), with 60,000 used for training and 10,000 for test-
ing [57]. Each image has a resolution of 28x28 pixels in grayscale. The MNIST dataset is one of
the benchmark datasets for image classification tasks, widely used to evaluate the performance
of image processing and machine learning algorithms.
• CIFAR-10/CIFAR-100: CIFAR-10 and CIFAR-100 are two widely used datasets for image classifi-
cation [53]. CIFAR-10 consists of 60,000 32x32 pixel color images, divided into 10 classes with
6,000 images per class. CIFAR-100 has a similar structure but includes 100 classes with 600
images per class. These datasets are well-known for evaluating convolutional neural networks
(CNNs) and other image-processing algorithms.
• ImageNet: The ImageNet dataset is a large-scale visual database containing over 14 million
labeled images, categorized into 1,000 classes [26]. This dataset is a significant benchmark in
the field of computer vision, especially for tasks like image classification, object detection, and
image segmentation. The annual ImageNet challenge (ImageNet Large Scale Visual Recognition
Challenge, ILSVRC) has been a key driver in the development of deep learning and convolutional
neural networks.
These datasets have become indispensable resources in machine learning and deep learning re-
search due to their wide application and influence. In the following sections, we will explore these
datasets in more detail, discussing their characteristics and applications.
The Iris dataset contains 150 samples, divided evenly among three species:
• Classes:
– Iris-setosa
– Iris-versicolor
– Iris-virginica
• Features:
– Sepal Length
– Sepal Width
– Petal Length
– Petal Width
Each sample in the dataset is described by these four morphological features, and the goal is to
classify the samples into the three species using machine learning algorithms.
Uses of the dataset:
• Classification problem: Since the iris species are clearly labeled, this dataset is frequently used
to test and compare different classification algorithms.
• Visualization: Due to its small size and intuitive features, the Iris dataset is often used for data
visualization, especially for beginners.
• Model validation: The dataset is commonly used to validate the performance of machine learn-
ing models, particularly in supervised learning classification tasks.
The Iris dataset is widely used in machine learning education and research and is included as a
built-in dataset in many open-source machine learning libraries, such as scikit-learn.
The Iris dataset is relatively small, so the entire dataset is provided here for readers to study. To
facilitate its use in a Python environment, the code below demonstrates how to import the Iris dataset
using the “sklearn” library, print the dataset’s dimensions, and display them.
1 from sklearn.datasets import load_iris
2
3 # Load the Iris dataset
4 iris = load_iris()
5
6 # Print the dimensions of the data and the target labels
7 print("Dataset shape:", iris.data.shape)
8 print("Target shape:", iris.target.shape)
After executing the code above, you will see the dimensions of the dataset and the target labels
displayed as follows:
Dataset shape: (150, 4)
Target shape: (150,)
Listing 6.2: Iris Dataset and Target Dimensions
Table 6.1: Iris Dataset - Sepal and Petal Measurements for Different Varieties (each row lists two records side by side)
Sepal Length | Sepal Width | Petal Length | Petal Width | Variety | Sepal Length | Sepal Width | Petal Length | Petal Width | Variety
5.1 3.5 1.4 0.2 Setosa 4.9 3.0 1.4 0.2 Setosa
4.7 3.2 1.3 0.2 Setosa 4.6 3.1 1.5 0.2 Setosa
5.0 3.6 1.4 0.2 Setosa 5.4 3.9 1.7 0.4 Setosa
4.6 3.4 1.4 0.3 Setosa 5.0 3.4 1.5 0.2 Setosa
4.4 2.9 1.4 0.2 Setosa 4.9 3.1 1.5 0.1 Setosa
5.4 3.7 1.5 0.2 Setosa 4.8 3.4 1.6 0.2 Setosa
4.8 3.0 1.4 0.1 Setosa 4.3 3.0 1.1 0.1 Setosa
5.8 4.0 1.2 0.2 Setosa 5.7 4.4 1.5 0.4 Setosa
5.4 3.9 1.3 0.4 Setosa 5.1 3.5 1.4 0.3 Setosa
5.7 3.8 1.7 0.3 Setosa 5.1 3.8 1.5 0.3 Setosa
5.4 3.4 1.7 0.2 Setosa 5.1 3.7 1.5 0.4 Setosa
4.6 3.6 1.0 0.2 Setosa 5.1 3.3 1.7 0.5 Setosa
4.8 3.4 1.9 0.2 Setosa 5.0 3.0 1.6 0.2 Setosa
5.0 3.4 1.6 0.4 Setosa 5.2 3.5 1.5 0.2 Setosa
5.2 3.4 1.4 0.2 Setosa 4.7 3.2 1.6 0.2 Setosa
4.8 3.1 1.6 0.2 Setosa 5.4 3.4 1.5 0.4 Setosa
Sepal Length | Sepal Width | Petal Length | Petal Width | Variety | Sepal Length | Sepal Width | Petal Length | Petal Width | Variety
5.2 4.1 1.5 0.1 Setosa 5.5 4.2 1.4 0.2 Setosa
4.9 3.1 1.5 0.2 Setosa 5.0 3.2 1.2 0.2 Setosa
5.5 3.5 1.3 0.2 Setosa 4.9 3.6 1.4 0.1 Setosa
4.4 3.0 1.3 0.2 Setosa 5.1 3.4 1.5 0.2 Setosa
5.0 3.5 1.3 0.3 Setosa 4.5 2.3 1.3 0.3 Setosa
4.4 3.2 1.3 0.2 Setosa 5.0 3.5 1.6 0.6 Setosa
5.1 3.8 1.9 0.4 Setosa 4.8 3.0 1.4 0.3 Setosa
5.1 3.8 1.6 0.2 Setosa 4.6 3.2 1.4 0.2 Setosa
5.3 3.7 1.5 0.2 Setosa 5.0 3.3 1.4 0.2 Setosa
7.0 3.2 4.7 1.4 Versicolor 6.4 3.2 4.5 1.5 Versicolor
6.9 3.1 4.9 1.5 Versicolor 5.5 2.3 4.0 1.3 Versicolor
6.5 2.8 4.6 1.5 Versicolor 5.7 2.8 4.5 1.3 Versicolor
6.3 3.3 4.7 1.6 Versicolor 4.9 2.4 3.3 1.0 Versicolor
6.6 2.9 4.6 1.3 Versicolor 5.2 2.7 3.9 1.4 Versicolor
5.0 2.0 3.5 1.0 Versicolor 5.9 3.0 4.2 1.5 Versicolor
6.0 2.2 4.0 1.0 Versicolor 6.1 2.9 4.7 1.4 Versicolor
5.6 2.9 3.6 1.3 Versicolor 6.7 3.1 4.4 1.4 Versicolor
5.6 3.0 4.5 1.5 Versicolor 5.8 2.7 4.1 1.0 Versicolor
6.2 2.2 4.5 1.5 Versicolor 5.6 2.5 3.9 1.1 Versicolor
5.9 3.2 4.8 1.8 Versicolor 6.1 2.8 4.0 1.3 Versicolor
6.3 2.5 4.9 1.5 Versicolor 6.1 2.8 4.7 1.2 Versicolor
6.4 2.9 4.3 1.3 Versicolor 6.6 3.0 4.4 1.4 Versicolor
6.8 2.8 4.8 1.4 Versicolor 6.7 3.0 5.0 1.7 Versicolor
6.0 2.9 4.5 1.5 Versicolor 5.7 2.6 3.5 1.0 Versicolor
5.5 2.4 3.8 1.1 Versicolor 5.5 2.4 3.7 1.0 Versicolor
5.8 2.7 3.9 1.2 Versicolor 6.0 2.7 5.1 1.6 Versicolor
5.4 3.0 4.5 1.5 Versicolor 6.0 3.4 4.5 1.6 Versicolor
6.7 3.1 4.7 1.5 Versicolor 6.3 2.3 4.4 1.3 Versicolor
5.6 3.0 4.1 1.3 Versicolor 5.5 2.5 4.0 1.3 Versicolor
5.5 2.6 4.4 1.2 Versicolor 6.1 3.0 4.6 1.4 Versicolor
5.8 2.6 4.0 1.2 Versicolor 5.0 2.3 3.3 1.0 Versicolor
5.6 2.7 4.2 1.3 Versicolor 5.7 3.0 4.2 1.2 Versicolor
5.7 2.9 4.2 1.3 Versicolor 6.2 2.9 4.3 1.3 Versicolor
5.1 2.5 3.0 1.1 Versicolor 5.7 2.8 4.1 1.3 Versicolor
6.3 3.3 6.0 2.5 Virginica 5.8 2.7 5.1 1.9 Virginica
7.1 3.0 5.9 2.1 Virginica 6.3 2.9 5.6 1.8 Virginica
6.5 3.0 5.8 2.2 Virginica 7.6 3.0 6.6 2.1 Virginica
4.9 2.5 4.5 1.7 Virginica 7.3 2.9 6.3 1.8 Virginica
6.7 2.5 5.8 1.8 Virginica 7.2 3.6 6.1 2.5 Virginica
6.5 3.2 5.1 2.0 Virginica 6.4 2.7 5.3 1.9 Virginica
Sepal Length | Sepal Width | Petal Length | Petal Width | Variety | Sepal Length | Sepal Width | Petal Length | Petal Width | Variety
6.8 3.0 5.5 2.1 Virginica 5.7 2.5 5.0 2.0 Virginica
5.8 2.8 5.1 2.4 Virginica 6.4 3.2 5.3 2.3 Virginica
6.5 3.0 5.5 1.8 Virginica 7.7 3.8 6.7 2.2 Virginica
7.7 2.6 6.9 2.3 Virginica 6.0 2.2 5.0 1.5 Virginica
6.9 3.2 5.7 2.3 Virginica 5.6 2.8 4.9 2.0 Virginica
7.7 2.8 6.7 2.0 Virginica 6.3 2.7 4.9 1.8 Virginica
6.7 3.3 5.7 2.1 Virginica 7.2 3.2 6.0 1.8 Virginica
6.2 2.8 4.8 1.8 Virginica 6.1 3.0 4.9 1.8 Virginica
6.4 2.8 5.6 2.1 Virginica 7.2 3.0 5.8 1.6 Virginica
7.4 2.8 6.1 1.9 Virginica 7.9 3.8 6.4 2.0 Virginica
6.4 2.8 5.6 2.2 Virginica 6.3 2.8 5.1 1.5 Virginica
6.1 2.6 5.6 1.4 Virginica 7.7 3.0 6.1 2.3 Virginica
6.3 3.4 5.6 2.4 Virginica 6.4 3.1 5.5 1.8 Virginica
6.0 3.0 4.8 1.8 Virginica 6.9 3.1 5.4 2.1 Virginica
6.7 3.1 5.6 2.4 Virginica 6.9 3.1 5.1 2.3 Virginica
5.8 2.7 5.1 1.9 Virginica 6.8 3.2 5.9 2.3 Virginica
6.7 3.3 5.7 2.5 Virginica 6.7 3.0 5.2 2.3 Virginica
6.3 2.5 5.0 1.9 Virginica 6.5 3.0 5.2 2.0 Virginica
6.2 3.4 5.4 2.3 Virginica 5.9 3.0 5.1 1.8 Virginica
Chapter 7
Introduction to Machine Learning
This chapter will cover the fundamentals of machine learning, providing a comprehensive overview of
various types, methodologies, and practical applications.
A few recurring terms describe the components of a machine learning system:
• Parameters: The factors considered by the model (often adjusted during training).
• Learner: The algorithm that adjusts the parameters and improves the model based on its perfor-
mance on training data.
Machine learning models are often described as either supervised or unsupervised, which refers
to whether the model is trained with human supervision (i.e., the data is labeled) or without it.
Supervised Learning: This approach involves training a model on a labeled dataset, which means
that each example in the training set is tagged with the correct answer (the label) [13]. The learning
algorithm gets a sample of data and then makes adjustments to the model to minimize errors. After
numerous iterations, the model aims for the smallest possible number of errors when predicting labels
on new, unseen data.
Unsupervised Learning: In contrast, unsupervised learning involves training a model on data that
does not have labeled responses [37]. Here, the goal is to infer the natural structure present within
a set of data points. It includes clustering [43] and association algorithms [3] that group objects of
similar kinds into respective categories and discover interesting patterns in data.
Comparing to Traditional Programming: Traditional computational approaches typically require a
clear set of instructions and do not change unless explicitly updated by a human. In contrast, machine
learning algorithms are designed to learn from data and update themselves in response to that data.
This capability allows them to adapt to new trends or unknown variables without human intervention.
Why is Machine Learning Important?
1. Adaptability: ML can adapt to new trends as it learns from the data. This makes it particularly
useful for applications where it is impractical or impossible to program explicit, rule-based in-
structions.
2. Scale: ML algorithms can handle vast amounts of data and variable interactions that are far too complex for human analysts to manage.
3. Automation and Decision-making: ML can automate routine processes and make decisions in
real time based on data analysis, which can significantly enhance the efficiency and effective-
ness of systems across different industries.
Challenges in Machine Learning: While machine learning offers significant advantages, it also
comes with its own set of challenges, such as ensuring the quality of data, dealing with imbalanced
and unstructured data, interpreting model results, and maintaining privacy and security.
In conclusion, machine learning represents a significant shift in how computers can learn and make
decisions. It bridges the gap between human programming capabilities and computational speed,
making it a key driver of innovation and efficiency in the modern digital era.
Early Setbacks: The field experienced its first "AI winter" in the mid-1970s, a period marked by reduced funding and waning interest in AI research. A second AI winter occurred in the late 1980s, following overly optimistic predictions that failed to materialize.
The Revival and Growth of Neural Networks: The 1980s saw a revival of interest in neural networks
thanks to the backpropagation algorithm, which allowed networks to adjust the weights of their hidden layers whenever the output did not match the target value, effectively enabling deeper
learning. The invention of Convolutional Neural Networks (CNNs) by Yann LeCun [56] in 1989 further
advanced the field, particularly in image and video processing applications.
The Rise of Big Data and Advanced Algorithms: The 21st century has seen an explosion in data
generation and the computational power necessary to process it. This era has been marked by signif-
icant advancements in algorithms and an increase in the use and sophistication of machine learning.
Notable developments include the introduction of Support Vector Machines, ensemble methods like
Random Forests, and boosting techniques which have substantially increased the accuracy and ap-
plicability of predictive models.
Deep Learning Breakthroughs: In 2012, a landmark event in the history of machine learning oc-
curred when a deep learning model designed by Geoffrey Hinton and his team won the ImageNet [26]
competition by a large margin. This victory underscored the potential of deep learning, leading to its
widespread adoption across various sectors, including speech recognition, autonomous vehicles, and
medical diagnostics.
Current Trends and Future Directions: Today, machine learning is ubiquitous, powering search
engines, recommender systems, speech recognition, and numerous other applications. The focus has
shifted towards making AI more accessible and ethical, improving model interpretability, and moving
towards unsupervised learning techniques that require less human supervision.
The field of machine learning continues to evolve, driven by a community of researchers and prac-
titioners dedicated to pushing the boundaries of what machines can learn. As we look to the future,
the ongoing integration of AI with other technologies promises to transform industries and societies
in profound ways.
In conclusion, the history of machine learning is a testament to the collaborative, interdisciplinary
efforts that have driven advances in this field, highlighting an exciting trajectory from theoretical ex-
ploration to practical, transformative technologies.
Understanding the differences between machine learning (ML) and traditional programming is essen-
tial to appreciate the unique advantages that ML brings to various computational tasks. Traditional
programming and machine learning fundamentally differ in their approach to problem-solving, adapt-
ability, and application complexity.
Fundamental Approach:
• Traditional Programming: In traditional programming, a developer writes explicit, step-by-step rules that the computer follows; the program's behavior is fully determined by the logic the programmer encodes.
• Machine Learning: Conversely, machine learning algorithms infer rules from provided data. In-
stead of being explicitly programmed for each step, an ML model is trained using a large amount
of data, learning the patterns or statistical representations required to perform a task. This ap-
proach is particularly advantageous for complex problems where defining explicit rules is im-
practical or impossible.
Adaptability:
• Traditional Programming: A traditional program behaves the same way until a human explicitly updates its code; it cannot adapt to new data on its own.
• Machine Learning: ML models excel in environments where they continuously learn and adapt
from new data, improving their accuracy and efficiency over time without human intervention.
This makes them highly suitable for dynamic and evolving tasks such as personalized recom-
mendations, real-time fraud detection, and predictive maintenance.
Handling Complexity and Data:
• Traditional Programming: Traditional methods struggle with large datasets or complex patterns
because they require the programmer to foresee and code for every possible scenario. The
complexity and volume of data can quickly become unmanageable, leading to rigid and brittle
systems.
• Machine Learning: Machine learning algorithms are designed to handle and analyze vast amounts
of data and detect complex patterns that may not be immediately apparent or predictable to hu-
man programmers. This capability is underpinned by ML’s ability to perform feature extraction
and dimensionality reduction, simplifying inputs without losing essential information.
Accuracy and Error Handling:
• Traditional Programming: The precision and accuracy of traditional programs are limited by the
initial logic and algorithms coded by the programmer. Any errors in the code or oversight in the
logic can propagate and magnify throughout the application, leading to incorrect results.
• Machine Learning: ML models minimize errors through iterative learning. As more data be-
comes available and the model is trained over numerous cycles, it fine-tunes its algorithms to
improve accuracy and reduce errors, often surpassing human-level performance in tasks like
image recognition and language translation.
Scalability:
• Traditional Programming: Scaling a rule-based system typically requires a human to extend or rewrite the hand-coded logic.
• Machine Learning: ML models are inherently scalable, designed to improve as data volume
grows. This scalability makes them ideal for applications like social media trend analysis and
large-scale financial systems where data grows exponentially.
7.2 Types of Machine Learning
7.2.1 Supervised Learning
Concept
The core idea of supervised learning is to learn a function through known input-output pairs (called
training sets) that can map new unseen inputs to correct outputs. The "supervision" here refers to the
correct output labels included in the training data, which the algorithm can constantly refer to during
the learning process to adjust its predictions.
Working Principle
1. Data Preparation: Collect labeled training data, usually including input features and correspond-
ing output labels.
2. Model Selection: Choose an appropriate algorithm model based on the nature of the problem,
such as linear regression, decision trees, neural networks, etc.
3. Model Training: Use training data to optimize model parameters, minimizing the error between
the model’s predicted output and actual labels.
4. Model Evaluation: Use test data not involved in training to evaluate the model’s generalization
ability.
5. Prediction: Use the trained model to make predictions on new unlabeled data.
Common Algorithms
• Linear Regression: A simple and effective algorithm for predicting continuous value outputs.
• Logistic Regression: Despite having "regression" in its name, it’s an algorithm for binary classifi-
cation problems.
• Decision Trees: A tree-based algorithm for classification and regression, easy to understand and
interpret.
• Random Forests: An algorithm that ensembles multiple decision trees, usually performing better
than a single decision tree.
• Support Vector Machines (SVM): A powerful classification algorithm, particularly suitable for
handling high-dimensional data.
• Neural Networks: A class of algorithms inspired by biological neural systems, capable of learning
complex non-linear relationships.
Application Scenarios
• Image Classification: Recognizing objects in images, such as face recognition, handwritten digit
recognition, etc.
• Natural Language Processing: Text classification, sentiment analysis, machine translation, etc.
• Medical Diagnosis: Predicting disease risks or diagnosing diseases based on patient data.
Advantages
• Predictive performance is straightforward to evaluate, since ground-truth labels are available for comparison.
Disadvantages
• Requires a large amount of labeled training data, which can be costly to obtain.
Detailed Explanation
Supervised learning is one of the most commonly used methods in machine learning. Its core idea is
to learn a mapping function from input to output through labeled data. This process can be analogous
to having a "teacher" (i.e., the labeled data) guiding the learning process.
In supervised learning, we have a training dataset D = {(x1, y1), (x2, y2), ..., (xn, yn)}, where xi is the input feature and yi is the corresponding label or target value. The learning objective is to find a function f such that, for a new input x, f(x) can accurately predict the corresponding y.
Supervised learning can be further divided into two main types of problems:
1. Classification Problems: When the output variable y is a discrete category, such as determining
whether an email is spam or not.
2. Regression Problems: When the output variable y is a continuous value, such as predicting
house prices.
Concrete Example
Let’s use a house price prediction problem as an example to illustrate the supervised learning process
in detail:
1. Data Collection: Collect historical house sale data, including the following features:
– Area (square meters)
– Number of bedrooms
– Number of bathrooms
– Age of the house (years)
– Location
– Sale price (target variable)
2. Data Preprocessing:
– Handle missing values: For example, interpolate missing area data.
– Feature encoding: Convert categorical features like "location" into numerical form, such as one-hot encoding.
– Feature scaling: Normalize all features to the same scale, such as using Min-Max scaling.
3. Model Selection: For a regression problem like house price prediction, we can choose a linear regression model:
y = w0 + w1x1 + w2x2 + ... + wnxn
where y is the predicted house price, xi are the various features, and wi are the corresponding weights.
4. Model Training: Use optimization algorithms like gradient descent to find the optimal set of
weights that minimize prediction error. For example, we can use Mean Squared Error (MSE) as the
loss function:
MSE = (1/n) Σ_{i=1}^{n} (yi − ŷi)²
where yi is the actual house price and ŷi is the model’s predicted price.
5. Model Evaluation: Use techniques like cross-validation to evaluate the model’s performance.
For example, we can use the R² score to measure the model’s fit:
R² = 1 − [Σ_{i=1}^{n} (yi − ŷi)²] / [Σ_{i=1}^{n} (yi − ȳ)²]
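In practice, both metrics can be computed with scikit-learn's metrics module. The sketch below uses made-up prices purely for illustration (mean_squared_error and r2_score are the relevant scikit-learn functions):
1 from sklearn.metrics import mean_squared_error, r2_score
2
3 # Hypothetical actual prices and model predictions (placeholder values)
4 y_true = [300000, 450000, 250000, 500000]
5 y_pred = [310000, 440000, 260000, 480000]
6
7 print("MSE:", mean_squared_error(y_true, y_pred))
8 print("R^2:", r2_score(y_true, y_pred))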
7.2.2 Unsupervised Learning
Concept
Unsupervised learning is a branch of machine learning that deals with unlabeled data. This learning
method attempts to discover hidden structures or patterns from the data itself, without relying on
predefined outputs.
The main tasks of unsupervised learning include:
1. Clustering: Grouping similar data points.
2. Dimensionality Reduction: Reducing the number of features in the data while preserving the main information.
3. Association Rule Learning: Discovering relationships between data items.
4. Anomaly Detection: Identifying abnormal or rare data points.
The challenge in unsupervised learning is that, due to the lack of explicit target outputs, it’s difficult
to objectively evaluate the algorithm’s performance. Often, the interpretation and validation of results
require the involvement of domain experts.
Working Principle
1. Data Preparation: Collect unlabeled data and preprocess it (cleaning, scaling, and so on).
2. Algorithm Selection: Choose a suitable unsupervised learning algorithm based on task objec-
tives.
3. Apply Algorithm: Run the chosen algorithm on the data to produce clusters, components, or rules.
4. Interpret Results: Analyze algorithm output to discover patterns or structures in the data.
5. Validation: Use domain knowledge or other methods to verify if the discovered patterns are
meaningful.
Common Algorithms
• K-means Clustering: Groups data points into a predetermined number of clusters [59].
• Principal Component Analysis (PCA): Used for dimensionality reduction and feature extraction [35].
• Independent Component Analysis (ICA): Decomposes multivariate signals into independent sub-
components [42].
• Self-Organizing Maps (SOM): A neural network method for data visualization [51].
• Gaussian Mixture Models: A probabilistic model assuming data is composed of multiple Gaus-
sian distributions [13].
• Association Rule Learning: Discovers relationships between items in large databases [3].
Application Scenarios
• Market Segmentation: Dividing customers into different groups based on their behavior [101].
• Anomaly Detection: Identifying anomalies or outliers in datasets, such as fraud detection [19].
• Feature Learning: Automatically learning useful feature representations from raw data [12].
Advantages
• Does not require labeled data and can handle large amounts of unlabeled data.
Disadvantages
• Results can be difficult to evaluate objectively and interpret, since no ground-truth labels are available.
Concrete Example
Let’s use customer segmentation as an example to illustrate the application process of unsupervised
learning in detail:
Suppose an e-commerce company wants to divide its customers into different groups based on
their purchasing behavior to develop targeted marketing strategies.
1. Data Collection: Collect the following information about customers:
– Annual purchase amount
– Purchase frequency
– Time since last purchase
– Product categories browsed
– Customer age
– Customer registration duration
2. Data Preprocessing:
– Handle missing values: For example, interpolate missing age data.
– Feature scaling: Normalize all features, such as using Z-score standardization.
– Handle outliers: Remove or adjust extreme values.
3. Algorithm Selection: For customer segmentation, we can use the K-means clustering algorithm.
4. Apply Algorithm: Steps of the K-means algorithm:
a) Choose K initial center points (assume K=3)
b) Assign each data point to the nearest center point
c) Recalculate the center point of each cluster
d) Repeat steps b and c until the center points no longer change significantly
5. Result Analysis: Suppose we get three customer groups:
– Group A: High spending, high-frequency purchases
– Group B: Medium spending, occasional purchases
– Group C: Low spending, rare purchases
6. Result Application: Based on these groups, the company can develop different marketing strategies:
– Offer VIP services and exclusive discounts to Group A
– Provide promotional activities to Group B to encourage more frequent purchases
– Offer entry-level products and discounts to Group C to attract them to increase purchases
7. Continuous Optimization:
– Regularly rerun the clustering algorithm, as customer behavior may change over time
– Try different K values to find the optimal number of groups
– Combine with other algorithms, such as Principal Component Analysis (PCA), for feature dimensionality reduction
This example demonstrates how unsupervised learning can help businesses understand customer
structure and develop more effective business strategies. By discovering hidden patterns in the data,
unsupervised learning provides valuable insights for decision-making.
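A minimal sketch of this segmentation workflow with scikit-learn follows; the customer feature values are invented for illustration:
1 import numpy as np
2 from sklearn.preprocessing import StandardScaler
3 from sklearn.cluster import KMeans
4
5 # Illustrative features: [annual spend, purchase frequency, days since last purchase]
6 customers = np.array([
7     [5000, 40, 5],
8     [4800, 35, 10],
9     [1200, 10, 60],
10     [150, 2, 300],
11     [200, 1, 280],
12 ])
13
14 # Z-score standardization, as described in the preprocessing step
15 features = StandardScaler().fit_transform(customers)
16
17 # Cluster the customers into K=3 groups
18 kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
19 labels = kmeans.fit_predict(features)
20 print(labels)  # Cluster assignment for each customer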
7.2.3 Semi-supervised Learning
Concept
The core idea of semi-supervised learning [75] is to leverage the abundant unlabeled data to improve
the performance of models trained on limited labeled data. This approach is particularly useful in
scenarios where obtaining labeled data is expensive or time-consuming, but unlabeled data is plentiful.
Working Principle
1. Data Preparation: Collect a small set of labeled data and a large set of unlabeled data.
2. Initial Training: Train an initial model on the small labeled dataset.
3. Pseudo-labeling: Use the initial model to generate pseudo-labels for part of the unlabeled data.
4. Model Update: Retrain the model using both the original labeled data and the pseudo-labeled
data.
5. Iteration: Repeat the pseudo-labeling and model update steps until convergence or a set number
of iterations.
Common Algorithms
• Self-training: The model uses its predictions to generate labels for unlabeled data [105].
• Co-training: Multiple models learn from each other, each focusing on different aspects of the
data [14].
• Generative Models: Use generative models to model the data distribution [50].
• Graph-based Methods: Construct graphs based on relationships between data points, then prop-
agate labels on the graph [109].
Application Scenarios
1. Text Classification: Improving classification performance using large amounts of unlabeled text
data [72].
2. Image Recognition: Training models with few labeled images and many unlabeled images [81].
3. Speech Recognition: Enhancing recognition accuracy using large amounts of untranscribed
speech data [106].
4. Bioinformatics: Predicting gene functions using few genes with known functions and many with
unknown functions [103].
5. Web Page Classification: Automatically classifying web pages using a few manually classified
pages and many unclassified ones [14].
Advantages
• Often performs better than supervised learning using only labeled data.
Disadvantages
• Theoretical foundations are relatively weak, and effectiveness depends on specific problems
and data distributions.
Concrete Example
Let’s use a text classification problem to illustrate the application of semi-supervised learning:
Suppose we’re building a news article classification system to categorize articles into four classes:
"Politics", "Sports", "Technology", and "Entertainment". We have a small number of manually labeled
articles and a large number of unlabeled articles.
1. Data Preparation:
– Labeled data: 1,000 annotated news articles
– Unlabeled data: 100,000 unannotated news articles
2. Feature Extraction:
– Use TF-IDF (Term Frequency-Inverse Document Frequency) to convert text into numerical features
– Features include: word frequency, article length, and presence of specific keywords
3. Initial Model Training: Train an initial Support Vector Machine (SVM) classifier using the 1,000 labeled articles.
4. Self-training Process:
a) Use the initial SVM to predict labels for the 100,000 unlabeled articles
b) Select the 10,000 articles with the highest prediction confidence
c) Add these high-confidence predictions to the training set
d) Retrain the SVM using the augmented training set (now 11,000 articles)
e) Repeat steps a-d for 5 iterations or until performance stabilizes
5. Model Evaluation:
– Use 5-fold cross-validation to evaluate model performance
– Compare the performance of the SVM trained only on labeled data vs. the semi-supervised model
6. Results Analysis: Suppose we observe:
– Initial SVM (1,000 labeled articles): 75% accuracy
– Semi-supervised SVM (after 5 iterations): 85% accuracy
– "Politics" and "Sports" categories show the highest improvement
– "Technology" and "Entertainment" still have some confusion
7. Practical Application:
– Deploy the final model to classify incoming news articles
– Implement a feedback loop where human editors occasionally verify and correct classifications, providing newly labeled data
8. Continuous Improvement:
– Periodically retrain the model with newly acquired labeled data
– Experiment with other semi-supervised methods like graph-based label propagation
– Enhance feature extraction by incorporating word embeddings
This example demonstrates how semi-supervised learning can significantly improve classification
performance by leveraging a large amount of unlabeled data, which is often readily available in real-
world scenarios.
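The self-training loop itself can be condensed into a short function. The sketch below follows the steps above using scikit-learn's SVC; the function signature and variable names are ours, not from the original text:
1 import numpy as np
2 from sklearn.svm import SVC
3
4 def self_train(X_labeled, y_labeled, X_unlabeled, iterations=5, top_k=10000):
5     """Iteratively pseudo-label the most confident unlabeled samples."""
6     model = SVC(probability=True)
7     for _ in range(iterations):
8         model.fit(X_labeled, y_labeled)
9         if len(X_unlabeled) == 0:
10             break
11         preds = model.predict(X_unlabeled)
12         confidence = model.predict_proba(X_unlabeled).max(axis=1)
13         # Select the most confident predictions and move them to the labeled pool
14         idx = np.argsort(confidence)[-top_k:]
15         X_labeled = np.vstack([X_labeled, X_unlabeled[idx]])
16         y_labeled = np.concatenate([y_labeled, preds[idx]])
17         X_unlabeled = np.delete(X_unlabeled, idx, axis=0)
18     return model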
7.2.4 Reinforcement Learning
Concept
In reinforcement learning, an intelligent agent learns by taking actions in an environment and observing
the results. Each action receives feedback from the environment, usually in the form of rewards or
penalties. The agent’s goal is to learn a policy that maximizes long-term cumulative rewards.
Working Principle
1. State Perception: The agent observes the current state of the environment.
2. Action Selection: Based on the current policy, the agent chooses an action.
3. Action Execution: The agent performs the selected action in the environment.
4. Feedback Reception: The agent receives a reward or penalty from the environment.
5. Experience Accumulation: The agent records the observed state, action, and reward.
6. Policy Update: The action policy is updated based on the experience gained.
7. Repetition: The above steps are repeated, continuously improving the policy.
Key Concepts
• Markov Decision Process (MDP): A mathematical framework for describing reinforcement learn-
ing problems [11].
• Value Function: Estimates the expected future rewards from a given state [96].
• Policy: A rule that determines what action to take in a given state [95].
• Exploration vs. Exploitation: Balancing between trying new actions (exploration) and choosing
known good actions (exploitation) [46].
• Temporal Difference Learning: Updating value estimates based on current estimates and ob-
served rewards.
Common Algorithms
• Q-learning: A value-based method that learns the expected cumulative reward of each state-action pair.
• SARSA: Similar to Q-learning but considers the actual next action taken [87].
• Policy Gradient Methods: Directly optimize the policy without using a value function [97].
• Actor-Critic Methods: Combine policy gradient methods with value function estimation [52].
Application Scenarios
• Game playing, robotics, autonomous vehicles, and other sequential decision-making problems.
Advantages
• Learns directly from interaction with the environment, without requiring labeled examples.
Disadvantages
• The training process can be unstable and requires a lot of trial and error.
Concrete Example: Let’s illustrate reinforcement learning with an example of training an AI to play a
simple maze game:
Game Setup: The agent must navigate a 10x10 grid maze (100 possible states) from a start cell to a goal cell.
Environment Definition:
• States: the agent's current cell in the grid (100 states in total)
• Actions: move up, down, left, or right (4 actions)
• Rewards: feedback from the environment, such as a positive reward for reaching the goal and penalties for wasted or invalid moves
Q-table Initialization: Create a 100x4 table (100 states, 4 actions), initialize all values to 0.
Learning Parameters:
• ϵ for ϵ-greedy strategy: initial value 0.9, decaying by 0.005 each episode
Evaluation:
Results Analysis:
• Suppose we observe:
– First 100 episodes: 20% success rate, average 80 steps when successful
– Last 100 episodes: 95% success rate, average 20 steps
– Final test run: 98% success rate, average 18 steps
Visualization:
Further Improvements:
In a typical workflow, the available data is split into a training set, used to fit the model, and a test set, held out for evaluation. The test data, which the model has never seen during training, is used to evaluate its generalization
capabilities. This split ensures that the model is assessed on data it has not encountered before,
indicating how well it will perform on real-world, unseen datasets. Typically, an 80/20 or 70/30 split
between training and test data is used, although cross-validation techniques can provide a more robust
evaluation.
In addition to training and test data, a validation set is often used to tune hyperparameters and eval-
uate model performance during the training process. It helps prevent overfitting by providing feedback
on how well the model generalizes before it is exposed to the test data.
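One common way to produce such a three-way split is two successive calls to scikit-learn's train_test_split (a sketch assuming X and y already hold the features and labels; the 60/20/20 proportions are one reasonable choice):
1 from sklearn.model_selection import train_test_split
2
3 # First hold out 20% as the test set, then take 25% of the remainder
4 # as validation (0.25 of 80% = 20% of the original data)
5 X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
6 X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=42)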
Key considerations during data collection include ensuring the data is relevant, representative, and
free of bias. In real-world projects, data collection is often one of the most time-consuming phases
due to the need for data cleaning and formatting before use.
• Data Cleaning: Handling missing data (e.g., by imputing values), removing duplicates, and cor-
recting inconsistencies.
• Normalization and Standardization: Scaling numerical features so that they have a uniform
range or distribution, which improves the performance of many machine learning algorithms.
• Encoding Categorical Data: Converting categorical features into numerical values using tech-
niques like one-hot encoding or label encoding.
• Splitting the Data: Dividing the dataset into training, validation, and test sets to ensure that the
model generalizes well to unseen data.
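A brief sketch of these preprocessing steps with pandas and scikit-learn (the column names 'age', 'city', and 'label' are hypothetical):
1 import pandas as pd
2 from sklearn.preprocessing import StandardScaler
3
4 # Hypothetical dataset with a numeric, a categorical, and a label column
5 df = pd.DataFrame({
6     "age": [25, None, 30, 22],
7     "city": ["NY", "LA", "NY", "SF"],
8     "label": [0, 1, 0, 1],
9 })
10
11 # Data cleaning: impute missing values and drop duplicates
12 df["age"] = df["age"].fillna(df["age"].mean())
13 df = df.drop_duplicates()
14
15 # Encoding categorical data: one-hot encode the 'city' column
16 df = pd.get_dummies(df, columns=["city"])
17
18 # Standardization: scale the numeric feature
19 df["age"] = StandardScaler().fit_transform(df[["age"]]).ravel()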
Choosing an algorithm depends on several factors:
• Problem Type: Classification, regression, clustering, etc. For instance, decision trees, logistic
regression, and neural networks are common for classification tasks.
• Model Complexity: Simple models (like linear regression) are easier to interpret but may underfit
complex data, while more complex models (like deep neural networks) are powerful but prone
to overfitting.
• Data Size: Algorithms like k-nearest neighbors (KNN) can become computationally expensive
with large datasets, while models like stochastic gradient descent (SGD) can scale well.
• Training Time: Some algorithms (e.g., support vector machines) take more time to train, so
resource constraints must be considered.
After training, the model is tuned and evaluated:
• Hyperparameter Tuning: Adjusting non-learnable parameters (e.g., learning rate, batch size) to
improve model performance. This is often done using grid search or randomized search.
• Cross-validation: A technique to assess the model’s performance and ensure that it generalizes
well. K-fold cross-validation is commonly used, where the data is split into multiple folds, and
the model is trained and tested on each fold.
• Evaluation Metrics: Different metrics are used to measure the model’s effectiveness, depending
on the problem. For classification tasks, accuracy, precision, recall, and F1-score are popular,
while for regression tasks, mean squared error (MSE) or R-squared is typically used.
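For instance, hyperparameter tuning and k-fold cross-validation can be combined with scikit-learn's GridSearchCV (a sketch assuming X and y hold the training features and labels):
1 from sklearn.model_selection import GridSearchCV
2 from sklearn.svm import SVC
3
4 # Search over two hyperparameters with 5-fold cross-validation
5 param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
6 search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
7 search.fit(X, y)
8 print(search.best_params_, search.best_score_)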
Deploying a model to production involves several additional concerns:
• Model Serving: Setting up infrastructure to expose the model as a service, often through APIs.
This allows other systems or applications to send new data to the model and receive predictions
in real time.
• Monitoring: Once deployed, continuous monitoring of the model is critical to ensure that it main-
tains its accuracy over time. Model drift (where the model’s performance deteriorates due to
changes in data patterns) is a common issue in production environments, requiring retraining or
adjustments to the model.
• Scalability: Ensuring that the model can handle increasing amounts of data or more requests
in real-time scenarios. Techniques like distributed computing or cloud-based deployment can
help.
7.5 Common Algorithms in Machine Learning
Linear Regression: Linear regression models the relationship between a dependent variable and an independent variable as a straight line:
y = β0 + β1 x + ϵ
where y is the dependent variable, x is the independent variable, β0 is the intercept, β1 is the coefficient (slope), and ϵ is the error term.
To implement linear regression in Python using PyTorch, you can follow this example:
1 import torch
2 import torch.nn as nn
3 import torch.optim as optim
4
5 # Sample data
6 x_train = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
7 y_train = torch.tensor([[2.0], [4.0], [6.0], [8.0]])  # Targets (values assumed here to follow y = 2x)
8
9 # Model definition
10 class LinearRegressionModel(nn.Module):
11     def __init__(self):
12         super(LinearRegressionModel, self).__init__()
13         self.linear = nn.Linear(1, 1)
14
15     def forward(self, x):
16         return self.linear(x)
17
18 model = LinearRegressionModel()
19
20 # Loss function and optimizer
21 criterion = nn.MSELoss()
22 optimizer = optim.SGD(model.parameters(), lr=0.01)  # Learning rate chosen for illustration
23
24 # Training loop
25 for epoch in range(1000):
26     model.train()
27     optimizer.zero_grad()
28     outputs = model(x_train)
29     loss = criterion(outputs, y_train)
30     loss.backward()
31     optimizer.step()
32
33     if epoch % 100 == 0:
34         print(f'Epoch {epoch}, Loss: {loss.item()}')
Output:
Epoch 0, Loss: 44.164981842041016
Epoch 100, Loss: 0.10525545477867126
Epoch 200, Loss: 0.05778397619724274
Epoch 300, Loss: 0.03172262758016586
Epoch 400, Loss: 0.017415346577763557
Epoch 500, Loss: 0.009560829028487206
Epoch 600, Loss: 0.0052487668581306934
Epoch 700, Loss: 0.0028815295081585646
Epoch 800, Loss: 0.0015819233376532793
Epoch 900, Loss: 0.000868460105266422
This code defines a simple linear regression model using PyTorch, where we create a model, define
a loss function, and use gradient descent to optimize the model’s weights.
Logistic Regression: Logistic regression is a classification algorithm that passes a linear combination of the inputs through the sigmoid function to produce a probability between 0 and 1.
In PyTorch, a logistic regression model can be implemented similarly to linear regression but using a
sigmoid activation function for the output:
1 import torch.nn.functional as F
2
3 class LogisticRegressionModel(nn.Module):
4     def __init__(self):
5         super(LogisticRegressionModel, self).__init__()
6         self.linear = nn.Linear(1, 1)
7
8     def forward(self, x):
9         return torch.sigmoid(self.linear(x))
10
11 model = LogisticRegressionModel()
Here, the logistic function sigmoid is used to map outputs to probabilities between 0 and 1, suitable
for binary classification.
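To train this model, it can be paired with binary cross-entropy loss, mirroring the linear regression loop above (a sketch reusing the model defined here; the sample data is made up):
1 # Illustrative binary-classification data
2 x_train = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
3 y_train = torch.tensor([[0.0], [0.0], [1.0], [1.0]])
4
5 criterion = nn.BCELoss()  # Binary cross-entropy for probability outputs
6 optimizer = optim.SGD(model.parameters(), lr=0.1)
7
8 for epoch in range(1000):
9     optimizer.zero_grad()
10     outputs = model(x_train)
11     loss = criterion(outputs, y_train)
12     loss.backward()
13     optimizer.step()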
Random Forests: A random forest ensembles many decision trees and can be used directly from scikit-learn:
1 from sklearn.ensemble import RandomForestClassifier
2
3 # Sample data
4 X_train = [[1, 2], [2, 3], [3, 4], [4, 5]]
5 y_train = [0, 1, 0, 1]
6
7 clf = RandomForestClassifier(n_estimators=10)
8 clf.fit(X_train, y_train)
9 print(clf.predict([[3, 3]]))
Support Vector Machines: An SVM classifier with a linear kernel can be built just as easily:
1 from sklearn import svm
2
3 # Sample data
4 X_train = [[1, 2], [2, 3], [3, 4], [4, 5]]
5 y_train = [0, 1, 0, 1]
6
7 clf = svm.SVC(kernel='linear')
8 clf.fit(X_train, y_train)
9 print(clf.predict([[3, 3]]))
7.6 Challenges in Machine Learning
Overfitting and Underfitting: Overfitting occurs when a model fits the training data too closely, noise included, and therefore generalizes poorly to new data; common remedies include regularization, simpler models, or more training data.
For underfitting, the solution usually involves increasing model complexity, adding more features, or
training for longer periods.
Data quality is another major challenge. Issues such as missing values and duplicates can be handled with pandas (a small sketch with illustrative values):
1 import pandas as pd
2
3 # Illustrative data with a missing value and a duplicate row
4 df = pd.DataFrame({"age": [25, None, 30, 30], "income": [40000, 52000, 61000, 61000]})
5
6 df["age"] = df["age"].fillna(df["age"].mean())  # Impute missing values
7 df = df.drop_duplicates()                       # Remove duplicate rows
7.7 Machine Learning Libraries
The most widely used Python libraries for machine learning can be imported as follows:
1 import sklearn
2 import tensorflow as tf
3 import torch
When choosing among them:
• For classical machine learning algorithms, Scikit-learn is typically the best choice due to its simplicity.
• For deep learning models, PyTorch or TensorFlow are preferred.
• TensorFlow is often chosen for large-scale production systems, while PyTorch is commonly used in research and development.
A typical supervised learning case study begins by loading and splitting the data (a sketch assuming a CSV file with a column named 'target'):
1 import pandas as pd
2 from sklearn.model_selection import train_test_split
3
4 # Loading data
5 df = pd.read_csv('data.csv')
6
7 # Separate features and labels, then split into training and test sets
8 X = df.drop('target', axis=1)  # 'target' is an assumed column name
9 y = df['target']
10 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
A linear regression model can then be trained and used for prediction:
1 from sklearn.linear_model import LinearRegression
2
3 # Train the model and predict on the held-out test set
4 model = LinearRegression()
5 model.fit(X_train, y_train)
6 predictions = model.predict(X_test)
For an unsupervised learning case study, K-means clustering groups unlabeled samples:
1 from sklearn.cluster import KMeans
2
3 kmeans = KMeans(n_clusters=3)
4 kmeans.fit(data)             # data: a numeric feature matrix, assumed already prepared
5 labels = kmeans.predict(data)
Reinforcement learning is used in scenarios where an agent learns to take actions in an environment
to maximize cumulative reward. It is commonly applied in areas like robotics, game-playing, and au-
tonomous vehicles.
An example of reinforcement learning is Google’s AlphaGo, which uses this technique to play and
defeat human players in the game of Go. Reinforcement learning models like Q-learning are imple-
mented using PyTorch or TensorFlow, allowing the agent to learn optimal policies through trial and
error.
Here's the basic structure of a Q-learning algorithm (a minimal tabular sketch; the state/action counts and hyperparameters are illustrative):
1 import numpy as np
2
3 n_states, n_actions = 100, 4
4 Q = np.zeros((n_states, n_actions))    # Q-table
5 alpha, gamma, epsilon = 0.1, 0.9, 0.1  # Learning rate, discount factor, exploration rate
6
7 def choose_action(state):
8     # Epsilon-greedy action selection
9     if np.random.rand() < epsilon:
10         return np.random.randint(n_actions)  # Explore
11     return int(np.argmax(Q[state]))          # Exploit
12
13 def update(state, action, reward, next_state):
14     # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
15     best_next = np.max(Q[next_state])
16     Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])
7.9 Summary and Future Directions
In this chapter:
• We explored common algorithms in machine learning, such as linear regression, logistic regres-
sion, decision trees, random forests, support vector machines, and clustering algorithms.
• We discussed the key challenges faced in machine learning, including overfitting, underfitting,
bias-variance tradeoff, data quality, and ethical concerns related to bias in AI.
• We reviewed popular machine learning libraries such as Scikit-learn, TensorFlow, and PyTorch,
emphasizing how to choose the right tool for specific tasks.
• Practical case studies in supervised, unsupervised, and reinforcement learning were examined
to highlight real-world applications of these techniques.
Understanding these core topics prepares you for more advanced machine learning topics, where
these foundational algorithms and tools will be applied in increasingly complex scenarios.
Machine learning is a rapidly evolving field, and new trends are constantly emerging. Some of the key
trends shaping the future of machine learning include:
AutoML: Automating machine learning processes has been a significant focus in recent years.
AutoML [28] tools allow non-experts to build machine learning models with minimal effort, automating
tasks such as feature engineering, model selection, and hyperparameter tuning.
Explainable AI (XAI): With the increasing adoption of machine learning in critical areas like health-
care and finance, there is a growing need for models that are interpretable and transparent. XAI fo-
cuses on creating models that can explain their decisions in ways humans can understand [7].
Federated Learning: Federated learning [62] is a technique that enables training models across
decentralized devices without needing to centralize data. This approach enhances data privacy and
security, making it suitable for industries like healthcare and finance.
Deep Reinforcement Learning: Reinforcement learning combined with deep learning techniques
has been making headlines due to its success in areas such as robotics, game AI, and autonomous
driving. This trend is expected to grow as more real-world applications are explored.
AI Ethics and Fairness: As machine learning becomes more embedded in society, there is a grow-
ing focus on ensuring that AI systems are fair, transparent, and ethical. Regulatory frameworks are
being developed to ensure the responsible deployment of AI technologies [65].
Books:
• Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville — An in-depth guide to deep learning, covering foundational concepts and modern applications.
Articles:
• "A Few Useful Things to Know About Machine Learning" by Pedro Domingos — A well-known
article that covers essential tips for practitioners in the field.
• "Deep Learning for AI" by Yann LeCun, Yoshua Bengio, and Geoffrey Hinton — A foundational
paper on deep learning that outlines the concepts driving this field.
Online Resources:
• Coursera: Machine Learning by Andrew Ng — A popular course that provides an excellent intro-
duction to the field of machine learning.
• Fast.ai: Practical deep learning for coders — A hands-on course designed to teach deep learning
with PyTorch.
• Kaggle: An online platform where you can participate in machine learning competitions, learn
from other practitioners, and work on real-world datasets.
Chapter 8
Understanding Neural Networks
Neural networks (NN), central to deep learning, are computational models inspired by the structure
and functioning of the human brain. Each neural network consists of interconnected layers of artificial
neurons, which process and transmit information. The foundation of neural networks lies in their ability
to learn complex patterns from data, enabling them to perform tasks like image recognition, language
translation, and decision-making with unprecedented accuracy.
Neural networks have become indispensable in modern machine learning because they can ap-
proximate any continuous function, making them highly versatile in solving both linear and nonlin-
ear problems. They excel in domains with large amounts of data, where traditional algorithms might
struggle to capture intricate relationships between features. For example, deep learning models, par-
ticularly those based on neural networks, have set new benchmarks in fields like speech recognition,
autonomous driving, and medical diagnosis.
The perceptron, introduced by Frank Rosenblatt in 1958 [86], is the simplest form of a neural net-
work. Mathematically, it consists of an input vector X = (x1 , x2 , . . . , xn ), corresponding weights
W = (w1 , w2 , . . . , wn ), a bias term b, and an activation function ϕ. The output of the perceptron is
computed as:
y = ϕ(W · X + b),
where ϕ is often a step function (Heaviside function) that outputs a binary decision based on
whether the weighted sum is greater than a threshold.
This model can be visualized as a single neuron in which input data is passed through the weighted
sum and bias, followed by an activation function. Despite its simplicity, the perceptron is capable of
solving linearly separable problems, such as classifying points on opposite sides of a line in a 2D
space.
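To make the computation concrete, here is a minimal NumPy sketch of the perceptron described above; the weights, bias, and sample points are illustrative:

import numpy as np

def perceptron(X, W, b):
    # Heaviside step activation: output 1 when the weighted sum exceeds the threshold 0
    return np.where(np.dot(X, W) + b > 0, 1, 0)

# Classify 2D points against the line x1 + x2 = 1
W = np.array([1.0, 1.0])
b = -1.0
print(perceptron(np.array([[0.3, 0.2], [0.9, 0.8]]), W, b))  # prints [0 1]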
Softmax For classification tasks involving multiple categories, the softmax function is often used in the output layer. It converts the raw outputs z of the network into a probability distribution via softmax(z)i = exp(zi) / Σj exp(zj), ensuring that the sum of all outputs equals one. This is particularly useful in multi-class classification problems [77].
Each activation function has its specific use case, and the choice of function depends on the nature
of the problem being solved, the depth of the network, and the computational resources available. The
selection of the activation function also affects the speed and stability of model convergence during
training.
This process repeats for every layer until the final output is produced at the output layer.
Cross-Entropy Loss: Primarily used for classification tasks, cross-entropy measures the difference
between the true probability distribution and the predicted one. For binary classification, it is defined
as:
Cross-Entropy = −(1/N) Σ_{i=1}^{N} [yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ)]
Cross-entropy penalizes confident but incorrect predictions more heavily, making it effective for clas-
sification models.
The gradients of the loss function with respect to the model’s parameters are calculated and used to
update the weights via optimization algorithms like gradient descent or its variants. Minimizing the
loss function over multiple iterations ensures that the model improves its predictions by learning the
optimal set of weights.
Gradient Descent [107] is an optimization technique used to minimize the cost function of a neural
network by iteratively adjusting the model’s parameters (weights and biases). The core idea is to
move in the direction of the negative gradient of the cost function to find the global or local minimum.
Mathematically, it updates each parameter w by:
w := w − η ∂J(w)/∂w

where η is the learning rate and ∂J(w)/∂w is the gradient of the cost function with respect to the weight w. The learning rate determines the size of the step we take toward the minimum; if it is too large, the algorithm may overshoot, and if it is too small, convergence will be slow.
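As a concrete illustration of this update rule, the following sketch minimizes the one-dimensional cost J(w) = (w − 3)², an arbitrary example chosen so the gradient 2(w − 3) is easy to verify by hand:

# Gradient descent on J(w) = (w - 3)^2, whose minimum is at w = 3
w = 0.0
eta = 0.1  # learning rate
for step in range(50):
    grad = 2 * (w - 3)   # dJ/dw
    w = w - eta * grad   # w := w - eta * dJ/dw
print(w)  # converges toward 3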
Backpropagation is the key algorithm used to compute the gradients needed for gradient descent in a neural network. It works by calculating the gradient of the loss function with respect to each weight and iteratively adjusting the weights to minimize the error. This is achieved by using the chain rule to propagate errors backward from the output layer to the input layer.
Steps of Backpropagation
• Forward Pass: Input data is passed through the network layer by layer, computing activations until the output is produced. The loss function J(ŷ, y) is calculated based on the difference between the predicted output ŷ and the true output y.
• Backward Pass: The partial derivatives of the loss function with respect to the weights are calculated. Using the chain rule, the gradient of the cost function is computed for each layer, starting from the output:

∂J/∂w = (∂J/∂ŷ) · (∂ŷ/∂z) · (∂z/∂w)

where z is the weighted input to the neuron, and ŷ is the predicted output.
• Weight Update: The gradients are used to update the weights, moving them in the direction that minimizes the loss function:

w′ = w − η ∂J/∂w
This process is repeated for multiple training epochs until the network converges to a minimal loss.
Adam maintains exponentially decaying averages of past gradients and of their squares:

mt = β1 mt−1 + (1 − β1) gt
vt = β2 vt−1 + (1 − β2) gt²

where gt is the gradient at time step t, and β1, β2 are decay rates. The adaptive learning rate makes Adam particularly effective for non-stationary and noisy objectives.
Xavier Initialization:
This method is used primarily for networks with sigmoid or tanh activation functions [32]. It ensures
that the variance of the outputs of each layer remains constant by drawing the initial weights from a
distribution with variance:
Var(W) = 1/N
where N is the number of input neurons to the layer. This helps to prevent the gradients from becoming
too small as they propagate through the network.
He Initialization:
For ReLU and its variants, He initialization is more appropriate [38]. It adjusts the variance of the
weights to:
Var(W) = 2/N
where N is the number of inputs to the neuron. This helps maintain variance throughout the network,
allowing for faster convergence, especially in deep networks.
L1 and L2 Regularization
L1 Regularization (Lasso): L1 regularization [17] adds a penalty proportional to the absolute value of
the weights:
J = J0 + λ Σi |wi|
This encourages sparsity in the network by driving some weights to zero, which can be useful for
feature selection.
L2 Regularization (Ridge or Weight Decay): L2 regularization [40] adds a penalty proportional to the
square of the weights:
J = J0 + λ Σi wi²
L2 regularization tends to shrink all weights but does not force them to zero, promoting smoother,
more distributed weight values. It helps in stabilizing learning and preventing large swings in weights
during training.
Dropout
Dropout [94] is a widely used regularization technique that works by randomly "dropping" a subset of
neurons during each training iteration. Each neuron is kept active with a probability p (a hyperparam-
eter typically between 0.5 and 0.8). This prevents the network from relying too heavily on any single
neuron and forces it to learn more robust features. During inference, the full network is used, but the
weights are scaled to account for the dropped neurons during training.
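As an illustration, the following NumPy sketch implements the commonly used "inverted dropout" variant, which rescales activations during training so that no weight scaling is needed at inference:

import numpy as np

def dropout_forward(a, p=0.8, training=True):
    # Keep each activation with probability p; rescale by 1/p during training
    if not training:
        return a
    mask = (np.random.rand(*a.shape) < p) / p
    return a * mask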
CNNs differ from traditional feedforward neural networks in their architecture, utilizing convo-
lutional layers instead of fully connected layers. This enables the network to be more parameter-
efficient, making it better suited for high-dimensional inputs such as images, which can contain mil-
lions of pixels. CNNs are widely used in applications such as object detection, facial recognition,
medical imaging, and even natural language processing when processing structured data.
The convolution of an input I with a filter K can be written as S(i, j) = Σm Σn I(i + m, j + n) K(m, n), where S(i, j) is the result of the convolution at position (i, j), and m, n index the dimensions of the filter.
Convolutional layers are often followed by non-linear activation functions like ReLU (Rectified Lin-
ear Unit), which introduce non-linearity into the model and allow it to learn more complex representa-
tions.
Pooling Layers:
Pooling layers are another essential part of CNNs, typically used after convolutional layers to reduce
the spatial dimensions of the feature maps, which helps decrease the number of parameters and
computational complexity. The most common form of pooling is max pooling, where a filter selects
the maximum value from a region of the feature map. Mathematically, max pooling for a 2×2 region can be described as taking the output max(a1, a2, a3, a4) over the four activations a1, …, a4 in each 2×2 window.
Pooling helps retain the most important information while discarding less relevant details, making the
network more robust to minor translations and distortions in the input data.
• Image Classification: CNNs are used to classify images, from basic object recognition to more
complex tasks like identifying tumors in medical imaging [54].
• Object Detection: Advanced CNN architectures such as Faster R-CNN [83] and YOLO (You Only
Look Once) [82] enable real-time object detection and localization in images and videos.
• Facial Recognition: CNNs are used in facial recognition systems, where they extract unique
features from faces for identification and verification purposes [98].
• Medical Imaging: In the healthcare sector, CNNs are utilized to analyze medical images such as
X-rays, MRIs, and CT scans to detect diseases and abnormalities [90].
• Natural Language Processing: While CNNs are primarily used for image-related tasks, they are
also applied in NLP for tasks like sentence classification by treating text as a structured grid [48].
The combination of convolutional layers for feature extraction and pooling layers for dimension-
ality reduction makes CNNs highly effective for handling large-scale image data and other spatially
structured data, making them a cornerstone of modern machine learning.
The hidden state at time step t is computed as ht = f(Wxh · xt + Whh · ht−1 + bh), where Wxh and Whh are weight matrices, bh is the bias, and f is the activation function (typically tanh or ReLU).
While RNNs excel at capturing temporal dependencies, they struggle with long-term dependencies
due to issues like vanishing and exploding gradients. This led to the development of more advanced
architectures like LSTMs and GRUs.
ft = σ(Wf · [ht−1, xt] + bf)
it = σ(Wi · [ht−1, xt] + bi)
C̃t = tanh(WC · [ht−1, xt] + bC)
Ct = ft ∗ Ct−1 + it ∗ C̃t
ot = σ(Wo · [ht−1, xt] + bo)
ht = ot ∗ tanh(Ct)

Here, ft, it, and ot represent the forget, input, and output gates, respectively, while Ct is the cell state and C̃t is the candidate cell state.
The LSTM architecture allows the network to retain information over long sequences, making it ideal
for tasks like language modeling, machine translation, and video analysis.
zt = σ(Wz · [ht−1, xt] + bz)
rt = σ(Wr · [ht−1, xt] + br)
h̃t = tanh(Wh · [rt ∗ ht−1, xt] + bh)
ht = zt ∗ ht−1 + (1 − zt) ∗ h̃t

Here, zt is the update gate, rt is the reset gate, and h̃t is the candidate hidden state. GRUs are particularly useful for time series analysis,
speech recognition, and tasks where computational efficiency is important.
• Natural Language Processing: RNNs, LSTMs, and GRUs are used for tasks such as language
translation, text generation, and sentiment analysis.
• Time Series Prediction: RNNs are applied to financial forecasting, weather prediction, and stock
market analysis.
• Speech Recognition: LSTMs and GRUs are employed in speech-to-text systems to capture tem-
poral dependencies in audio signals.
• Video Analysis: These architectures are used to analyze sequential frames in video data, en-
abling tasks like action recognition and video captioning.
8.9.1 Autoencoders
Autoencoders [10] are a type of unsupervised neural network architecture designed to learn efficient
codings of input data. The network consists of two main parts: the encoder, which compresses the
input data into a latent representation, and the decoder, which reconstructs the original data from
this representation. Autoencoders are commonly used for tasks such as dimensionality reduction,
anomaly detection, and data denoising.
The architecture of an autoencoder can be represented as follows:
h = f (We · x + be )
x̂ = g(Wd · h + bd )
where x is the input, h is the encoded representation, and x̂ is the reconstructed input. The goal of
training an autoencoder is to minimize the difference between x and x̂, typically using a loss function
like mean squared error (MSE).
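A minimal PyTorch sketch of this encoder-decoder structure, with assumed layer sizes, might look like this:

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super(Autoencoder, self).__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(latent_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)     # latent representation h = f(We x + be)
        return self.decoder(h)  # reconstruction x_hat = g(Wd h + bd)

model = Autoencoder()
criterion = nn.MSELoss()  # minimize the reconstruction error between x and x_hat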
Variational Autoencoders (VAEs) are an extension of basic autoencoders that introduce a prob-
abilistic approach to the latent space, allowing for more meaningful interpolation and generation of
new data samples.
8.9.2 Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs) consist of two networks trained in opposition: a generator G, which produces synthetic samples from random noise, and a discriminator D, which tries to distinguish real samples from generated ones. A common generator loss is:

LG = − log(D(G(z)))
where G(z) is the generator’s output given noise input z, and D is the discriminator’s prediction.
GANs are widely used in tasks such as image generation, video synthesis, and style transfer. Vari-
ants like Deep Convolutional GANs (DCGANs) and Conditional GANs (CGANs) have expanded their
application scope, allowing for more controlled and realistic data generation.
8.9.3 Transformers
Transformers [99] are a type of neural network architecture designed to handle sequential data without
relying on recurrent layers. Instead, they use a mechanism called self-attention, which allows the
model to weigh the importance of different parts of the input sequence. This makes Transformers
particularly effective for tasks involving long-range dependencies.
The self-attention mechanism can be expressed as:
Attention(Q, K, V) = softmax(QKᵀ / √dk) V
where Q (queries), K (keys), and V (values) are the inputs, and dk is the dimension of the key vectors.
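A minimal PyTorch sketch of this scaled dot-product attention; the tensor shapes in the example are illustrative:

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / (d_k ** 0.5)  # (batch, seq_q, seq_k)
    weights = F.softmax(scores, dim=-1)              # attention weights sum to 1
    return weights @ V

# Example: a batch of 2 sequences of length 5 with dimension 16
Q = K = V = torch.rand(2, 5, 16)
out = scaled_dot_product_attention(Q, K, V)  # shape (2, 5, 16)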
Transformers are widely used in natural language processing tasks, particularly for sequence-to-
sequence tasks such as language translation and text summarization. The architecture was popu-
larized by models like BERT and GPT, which have set new benchmarks in NLP by leveraging massive
pre-trained models and fine-tuning them for specific tasks.
• GANs: Applied in image and video generation, deepfake technology, and creative AI systems for
art generation.
• Transformers: Power state-of-the-art language models, machine translation systems, and question-
answering tasks.
Learning Rate:
The learning rate determines the size of the steps taken during gradient descent to minimize the loss
function. A low learning rate results in slow convergence, while a high learning rate can cause the
model to overshoot the optimal values, leading to divergence. Optimizing this hyperparameter is es-
sential for training efficiency and stability.
Batch Size:
Batch size defines how many samples are processed before the model’s internal parameters (weights)
are updated. Small batch sizes can introduce noise into gradient estimates but often lead to faster
convergence. Larger batch sizes provide more stable gradients but require more memory and can lead
to slower updates.
Network Architecture:
The architecture of a neural network, such as the number of layers and the number of neurons per
layer, defines its capacity to learn complex patterns. More layers and neurons can capture higher-level
abstractions but may also lead to overfitting if not controlled through regularization.
Dropout Rate:
Dropout is a regularization technique where randomly selected neurons are ignored during training
to prevent overfitting. The dropout rate specifies the probability of dropping a neuron in each layer.
Adjusting this hyperparameter can significantly affect the generalization ability of the model.
Weight Initialization:
Proper weight initialization can prevent issues such as vanishing and exploding gradients. Techniques
like Xavier or He initialization are commonly used to ensure that the variance of activations is consis-
tent across layers, allowing for smoother learning.
Optimizer:
The choice of the optimizer (e.g., Stochastic Gradient Descent, Adam, RMSprop) controls how the
model’s parameters are updated based on the gradients. Some optimizers are better suited for spe-
cific tasks or architectures, and tuning their hyperparameters (e.g., momentum, beta parameters) can
improve convergence.
Grid Search:
Grid search is an exhaustive method for hyperparameter tuning where all possible combinations of
a predefined set of hyperparameter values are evaluated. While it guarantees finding the best com-
bination, grid search can be computationally expensive, especially when dealing with multiple hyper-
parameters or large models. It works best when the search space is small or when computational
resources are abundant.
Random Search:
In random search, hyperparameters are randomly sampled from predefined distributions. Unlike grid
search, random search does not systematically test all combinations, but it often finds good solutions
faster because it covers a wider range of the hyperparameter space. Empirical studies have shown that
random search can be more efficient than grid search for high-dimensional hyperparameter spaces.
Bayesian Optimization:
Bayesian optimization [93] models the relationship between hyperparameters and the model’s perfor-
mance as a probabilistic function. It selects hyperparameter settings based on previous evaluations,
balancing exploration (trying new regions of the hyperparameter space) with exploitation (focusing on
promising areas). This approach is more efficient than grid or random search, particularly for complex
models with many hyperparameters. Bayesian optimization is ideal when computational resources are
limited or when training is time-consuming.
The choice of hyperparameter optimization method depends on the complexity of the model and the
available computational resources:
• Grid Search: Suitable for small models and when all hyperparameter values need to be tested.
• Random Search: More practical for larger search spaces where only a subset of hyperparameter
combinations need to be evaluated.
• Bayesian Optimization: Best suited for models where each evaluation is computationally expen-
sive, and an informed search is necessary to optimize quickly.
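As a sketch of how the first two strategies look in practice, Scikit-learn provides GridSearchCV and RandomizedSearchCV; the estimator and parameter grid below are illustrative:

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier

param_grid = {'n_estimators': [10, 50, 100], 'max_depth': [None, 5, 10]}

# Grid search: exhaustively evaluates all 9 combinations with 3-fold cross-validation
grid = GridSearchCV(RandomForestClassifier(), param_grid, cv=3)

# Random search: samples 5 combinations from the same space
rand = RandomizedSearchCV(RandomForestClassifier(), param_grid, n_iter=5, cv=3)

# grid.fit(X, y); rand.fit(X, y)  # X, y are your training data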
GPUs are widely used for training neural networks due to their ability to perform parallel computa-
tions efficiently. They are particularly effective for tasks involving large datasets and models such as
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). High-end GPUs, such
as the NVIDIA A100, are ideal for large-scale deep-learning tasks, offering high throughput for both
training and inference.
TPUs, developed by Google [45], are specialized hardware designed for machine learning tasks. TPUs
are highly optimized for tensor operations and provide even faster performance than GPUs for certain
tasks, such as training large transformer models and CNNs. TPUs are particularly suited for projects
deployed on Google Cloud, where they can be scaled easily for large datasets and complex models.
Training large neural networks requires substantial memory resources. Models with millions or billions
of parameters demand high VRAM (16GB to 40GB or more) to prevent memory bottlenecks. Fast
storage solutions such as NVMe SSDs are critical for loading large datasets quickly, especially when
performing data-intensive tasks like training on large image or text corpora.
Training and inference speed depend heavily on the hardware used and the optimization of the neural
network architecture. Key considerations include:
Batch size and learning rate are critical hyperparameters that influence training speed. Larger batch
sizes reduce the variance in gradient updates but require more memory. The learning rate controls the
speed of convergence, and optimizing this can prevent unnecessary training epochs.
For very large models, such as those used in NLP (e.g., GPT-3), it is common to use parallelism strate-
gies. Model parallelism splits the model across multiple devices, while data parallelism distributes
the training data across multiple devices. These techniques are essential for scaling models across
multiple GPUs or TPUs to accelerate training.
Inference Optimization:
For inference tasks, it is essential to minimize latency. Optimizations like pruning, quantization, and
model compression reduce the size of the model, making inference faster and more efficient, espe-
cially when deploying models on edge devices or in production environments with limited resources.
Scalability:
In large-scale applications, such as recommendation systems or real-time analytics, the ability to scale
neural network models across multiple devices or cloud platforms is essential. TPUs, particularly
in Google Cloud environments, offer seamless scalability for deep learning tasks. For GPU-based
deployments, frameworks like NVIDIA’s TensorRT help optimize models for different GPUs, enhancing
inference speed and performance.
Cloud platforms like AWS, Google Cloud, and Azure provide flexible and scalable environments for
deploying machine learning models. These platforms offer pre-configured instances with GPUs or
TPUs, enabling rapid scaling as demand increases. Leveraging serverless functions and containerized
services (e.g., Kubernetes) allows for dynamic scaling of machine learning services.
For applications like autonomous vehicles and IoT devices, deploying models on edge devices is es-
sential. Accelerators such as Edge TPUs and mobile GPUs offer efficient deployment solutions by
optimizing models for lower power consumption and real-time processing.
Installing PyTorch: The installation command for PyTorch depends on your system configuration
(CPU or GPU). You can specify these options on the official PyTorch website (https://pytorch.org/).
For example, to install PyTorch with CUDA (for NVIDIA GPU support):
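The exact command depends on the CUDA version you select on the site; recent releases follow this pattern (the cu118 tag is one example):

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118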
Setting up Virtual Environments: It’s highly recommended to create a virtual environment for your
deep learning projects. This helps you manage dependencies and avoid conflicts between different
projects. Here’s how to set up a virtual environment:
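With Python's built-in venv module (the environment name dl-env is illustrative):

python -m venv dl-env
source dl-env/bin/activate   # Linux/macOS
dl-env\Scripts\activate      # Windows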
Remember to always activate the virtual environment before working on your project, and deacti-
vate it after finishing your session by running:
deactivate
1. Download and install the NVIDIA drivers for your GPU from https://www.nvidia.com/drivers.
2. Download and install the CUDA Toolkit from the NVIDIA developer site.
3. Download the cuDNN library matching your CUDA version (an NVIDIA developer account is required).
4. After installing CUDA, copy the cuDNN files to the appropriate directories:
Once CUDA and cuDNN are installed, you can verify that PyTorch is using the GPU by running the
following Python code:
import torch

if torch.cuda.is_available():
    print("CUDA is available. PyTorch is using GPU!")
else:
    print("CUDA is not available. PyTorch is using CPU.")
Similarly, you can check whether TensorFlow detects the GPU:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
import tensorflow as tf
from tensorflow.keras import layers
In this example, we define a simple fully connected neural network using two hidden layers with ReLU
activation and an output layer with softmax activation for classification. The model is compiled with
the Adam optimizer and trained on randomly generated data.
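A minimal sketch consistent with that description, where the layer sizes and the synthetic data shapes are assumptions:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Synthetic data: 1000 samples, 20 features, 10 classes
x_train = np.random.rand(1000, 20)
y_train = np.random.randint(0, 10, size=(1000,))

model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(20,)),  # first hidden layer
    layers.Dense(64, activation='relu'),                     # second hidden layer
    layers.Dense(10, activation='softmax')                   # output layer
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=32)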
Basic Neural Network in PyTorch: The following example demonstrates how to create a simple
feedforward neural network using PyTorch:
import torch
import torch.nn as nn
import torch.optim as optim

# Synthetic data: 100 samples, 20 features, 10 classes (sizes are assumptions)
x_train = torch.rand(100, 20)
y_train = torch.randint(0, 10, (100,))

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(20, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(x_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()

    if epoch % 1 == 0:
        print(f'Epoch {epoch+1}, Loss: {loss.item()}')
In this PyTorch example, we define a simple neural network with two hidden layers. The model is trained with the Adam optimizer, and the loss is computed using cross-entropy loss on randomly generated data.
• MNIST: A dataset of handwritten digits (28x28 grayscale images) used for classification tasks [57].
• CIFAR-10: A dataset containing 60,000 color images in 10 classes, with each image sized 32x32
pixels [53].
In TensorFlow, datasets like MNIST can be loaded directly using the tensorflow.keras.datasets
module:
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
In PyTorch, the torchvision package provides the same datasets:

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.ToTensor()
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
These libraries provide convenient access to widely used datasets, allowing you to focus on model
building.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),   # Flatten the input (for MNIST)
    layers.Dense(128, activation='relu'),   # Hidden layer
    layers.Dense(10, activation='softmax')  # Output layer for 10 classes
])
Here, the input is flattened to a 1D vector, followed by a dense hidden layer with ReLU activation and
an output layer with 10 units (for the 10 classes in MNIST), using softmax for classification.
PyTorch: Defining a Feedforward Neural Network
import torch.nn as nn
import torch.nn.functional as F

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)  # hidden layer
        self.fc2 = nn.Linear(128, 10)       # output layer for 10 classes

    def forward(self, x):
        x = x.view(-1, 28 * 28)             # flatten the input
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)

model = SimpleNN()
In PyTorch, the input is also flattened, followed by a hidden layer with ReLU activation, and the output
layer uses log_softmax for classification.
In TensorFlow, we compile the model with the Adam optimizer and a cross-entropy loss function, which
is suitable for classification tasks. The model is then trained using the fit method.
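A minimal sketch of what the text describes:

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)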
PyTorch: Training the Model In PyTorch, the training process is more explicit, as we need to define
the training loop manually:
criterion = nn.NLLLoss()  # paired with log_softmax outputs, this computes cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(5):
    for inputs, labels in trainloader:
        optimizer.zero_grad()               # Zero the gradients
        outputs = model(inputs)             # Forward pass
        loss = criterion(outputs, labels)   # Compute loss
        loss.backward()                     # Backward pass
        optimizer.step()                    # Update weights

    print(f'Epoch {epoch+1}, Loss: {loss.item()}')
In PyTorch, we define the optimizer and loss function manually, then write a loop to feed data through
the model, compute the loss, backpropagate the gradients, and update the model weights.
Both frameworks allow us to effectively train deep learning models, with TensorFlow offering a
more high-level API and PyTorch providing more granular control.
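TensorFlow: Evaluating on Test Data In TensorFlow, evaluation is a single call (x_test and y_test are assumed to be prepared as above):

test_loss, test_acc = model.evaluate(x_test, y_test)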
The evaluate function returns the loss and accuracy on the test dataset.
PyTorch: Evaluating on Test Data In PyTorch, we evaluate the model using the following loop:
correct = 0
total = 0
model.eval()  # Set the model to evaluation mode

with torch.no_grad():  # Disable gradient computation during evaluation
    for inputs, labels in testloader:  # testloader is assumed to be defined like trainloader
        outputs = model(inputs)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy: {100 * correct / total:.2f}%')
In this code, we loop over the test dataset, compute the predicted classes, and compare them to the
true labels to compute accuracy.
Confusion Matrices: For classification tasks, confusion matrices provide more detailed insight
into model performance by showing how many samples were correctly or incorrectly classified for
each class. A confusion matrix can be generated in Python using the sklearn.metrics library:
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# y_true and y_pred are assumed to be collected during the evaluation loop
cm = confusion_matrix(y_true, y_pred)
sns.heatmap(cm, annot=True, fmt='d')
plt.show()
In TensorFlow, Keras models can be saved to a single file in the HDF5 format and reloaded easily for further use.
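A minimal sketch (the filename is illustrative):

model.save('model.h5')                                    # save to a single HDF5 file
restored_model = tf.keras.models.load_model('model.h5')  # reload for further use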
PyTorch: Saving and Loading Models In PyTorch, saving and loading models are slightly different,
using torch.save and torch.load:
# Save the model
torch.save(model.state_dict(), 'model.pth')

# Load the model back into a fresh instance of the same architecture
model = SimpleNN()
model.load_state_dict(torch.load('model.pth'))
model.eval()
In PyTorch, we save the model’s state dictionary, which contains all the model parameters, and reload
it when needed.
• Learning rate: Determines how quickly the model’s weights are updated during training.
• Batch size: Defines the number of samples processed before updating the model’s weights.
• Number of epochs: Refers to how many times the model sees the entire training dataset.
In PyTorch, you can modify the learning rate when setting up the optimizer:
optimizer = optim.Adam(model.parameters(), lr=0.0001)
Batch Size: Increasing the batch size can lead to faster training, but larger batches require more
memory. A smaller batch size can result in more frequent updates and may help improve generaliza-
tion but could increase training time.
In TensorFlow:
model.fit(x_train, y_train, epochs=5, batch_size=64)  # Batch size is 64
Number of Epochs: The number of epochs determines how many times the model will iterate over
the entire training dataset. Increasing the number of epochs may improve model accuracy, but too
many epochs can lead to overfitting.
You can experiment with these hyperparameters to find the best combination for your specific
dataset and model.
• Convolutional Layers: Apply convolution operations to extract features from the input data.
• Pooling Layers: Reduce the dimensionality of feature maps, retaining important features.
TensorFlow: Implementing a Simple CNN Here’s how you can build a simple CNN for image clas-
sification using TensorFlow:
import tensorflow as tf
from tensorflow.keras import layers, models
In this example, the CNN consists of three convolutional layers with ReLU activation and two pool-
ing layers to down-sample the feature maps. The network ends with fully connected layers for classi-
fication into 10 classes.
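A sketch consistent with that description, where the filter counts and the 32x32x3 input shape are assumptions:

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')  # classification into 10 classes
])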
PyTorch: Implementing a Simple CNN Here’s how to build the same CNN using PyTorch:
import torch
import torch.nn as nn
import torch.nn.functional as F
In this PyTorch implementation, we define a CNN with three convolutional layers followed by max
pooling and then flatten the feature maps to feed into two fully connected layers for classification.
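A sketch consistent with that description, assuming 32x32 RGB inputs:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 64, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 4 * 4, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 32x32 -> 30x30 -> 15x15
        x = self.pool(F.relu(self.conv2(x)))  # 15x15 -> 13x13 -> 6x6
        x = F.relu(self.conv3(x))             # 6x6 -> 4x4
        x = x.view(-1, 64 * 4 * 4)            # flatten the feature maps
        x = F.relu(self.fc1(x))
        return self.fc2(x)

model = SimpleCNN()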
• Input Layers: Accept sequential input data (e.g., time series, text sequences).
TensorFlow: Implementing an LSTM for Text Classification Here’s how you can implement an
LSTM model for text classification using TensorFlow:
import tensorflow as tf
from tensorflow.keras import layers
In this example, we use an embedding layer to convert words into vectors, followed by an LSTM
layer to process the sequential data, and an output layer with 10 units for classification.
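A minimal sketch, where the vocabulary size and the embedding and hidden dimensions are assumptions:

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Embedding(input_dim=10000, output_dim=64),  # word vectors
    layers.LSTM(128),                                  # process the sequence
    layers.Dense(10, activation='softmax')             # 10 output classes
])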
PyTorch: Implementing an LSTM for Text Classification Here’s how you can implement an LSTM
model using PyTorch:
import torch
import torch.nn as nn
In this PyTorch implementation, we define an LSTM model with an embedding layer, an LSTM layer
to process the sequences, and a fully connected layer for the classification task.
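A minimal sketch with assumed dimensions:

import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=64, hidden_dim=128, num_classes=10):
        super(LSTMClassifier, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        x = self.embedding(x)        # (batch, seq) -> (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(x)   # h_n holds the final hidden state
        return self.fc(h_n[-1])      # classify from the last hidden state

model = LSTMClassifier()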
With transfer learning, you can use an existing model trained on a large dataset (e.g., ImageNet) and fine-tune it for your
specific task. This approach is particularly useful when working with small datasets or when training
deep models that would otherwise take a long time to converge.
• Pre-trained models have already learned useful features from a large dataset, which can be re-
purposed for new tasks.
• It improves performance on tasks where you don’t have enough labeled data.
• Replace the final layer(s) to fit your specific task (e.g., change the number of output classes).
• Fine-tune the model on your custom dataset, either by training only the new layers or by fine-
tuning the entire model.
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Load the pre-trained VGG16 model, exclude the top layer (for custom output)
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
In this example, we load the VGG16 model, freeze its pre-trained layers, and add custom layers to
adapt it for a new classification task. You can also unfreeze the layers and fine-tune the entire model
if needed.
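A sketch of what the freezing and custom head might look like, with assumed layer sizes:

base_model.trainable = False  # freeze the pre-trained convolutional base

model = models.Sequential([
    base_model,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),   # new classification head
    layers.Dense(10, activation='softmax')  # e.g., 10 output classes
])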
PyTorch: Using a Pre-trained Model (ResNet) Similarly, in PyTorch, you can load pre-trained mod-
els from torchvision.models and fine-tune them. Below is an example using the ResNet18 model:
import torch
import torch.nn as nn
import torchvision.models as models

# Load the pre-trained ResNet18 model
resnet = models.resnet18(pretrained=True)

# Freeze the pre-trained layers
for param in resnet.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a new one for your task
num_ftrs = resnet.fc.in_features
resnet.fc = nn.Linear(num_ftrs, 10)  # 10 output classes
In this example, we load the ResNet18 model, freeze the pre-trained layers, and replace the final
fully connected layer with one suited for our custom task (e.g., 10 output classes). We then fine-tune
the model on our dataset.
• Feature extraction: Freeze the pre-trained model’s layers and only train the new layers you’ve
added. This is useful when you don’t have much data or want to preserve the learned features
from the base model.
• Fine-tuning: Unfreeze some or all of the layers of the pre-trained model and train them alongside
the new layers. This allows the entire model to adapt to the new dataset.
Here’s how to fine-tune a model by unfreezing some layers (e.g., the top few convolutional layers)
in TensorFlow:
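A sketch continuing the VGG16 example above, where the number of unfrozen layers is an assumption:

base_model.trainable = True
for layer in base_model.layers[:-4]:  # keep all but the last few layers frozen
    layer.trainable = False

# Recompile with a low learning rate so fine-tuning does not destroy the learned features
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])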
In PyTorch, you can similarly unfreeze certain layers and adjust the optimizer to fine-tune the entire
model.
Fine-tuning allows the model to better adapt to your custom dataset while still benefiting from the
features learned in the pre-training phase.
# Serve a model
tensorflow_model_server --rest_api_port=8501 --model_name=my_model --model_base_path=/path/to/model/
TorchScript: For PyTorch models, TorchScript is a way to serialize and optimize deployment mod-
els. It converts PyTorch models into a format that can be run in a high-performance environment,
independent of Python.
import torch

# Convert a trained model to TorchScript via tracing
# (model is an already trained nn.Module; the example input shape is an assumption)
example_input = torch.rand(1, 3, 224, 224)
traced_model = torch.jit.trace(model, example_input)
traced_model.save('model_traced.pt')
Cloud Services for Model Deployment: Cloud platforms like AWS, Google Cloud, and Azure provide
robust services for deploying and scaling machine learning models.
• AWS Sagemaker: AWS Sagemaker [5] allows you to train and deploy machine learning models
at scale. It simplifies the process of building, training, and deploying models in the cloud.
• Google Cloud AI Platform: Google Cloud offers AI Platform [34], which integrates with Tensor-
Flow and other ML frameworks to deploy and manage models at scale.
• Azure Machine Learning: Azure [63] provides services for training, deploying, and managing
machine learning models, including support for PyTorch, TensorFlow, and Scikit-learn.
PyTorch Quantization: In PyTorch, you can apply dynamic quantization, which quantizes the model
weights to 8-bit integers:
import torch.quantization

# Dynamically quantize the Linear layers of a trained model to 8-bit integers
# (model is assumed to be a trained nn.Module)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)
Model Pruning: Model pruning involves removing weights or entire neurons from the network that
have little contribution to the model’s output. This reduces the number of parameters, making the
model smaller and faster without significantly impacting accuracy.
TensorFlow Pruning: TensorFlow offers pruning capabilities through the TensorFlow Model Opti-
mization Toolkit:
import tensorflow_model_optimization as tfmot

# Wrap a Keras model so that low-magnitude weights are pruned during training
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model)
PyTorch Pruning: In PyTorch, you can prune certain layers or entire networks:
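A minimal sketch using PyTorch's torch.nn.utils.prune module (the layer and pruning amount are illustrative):

import torch.nn.utils.prune as prune

# Remove the 30% smallest-magnitude weights from a layer, e.g., model.fc1
prune.l1_unstructured(model.fc1, name='weight', amount=0.3)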
Optimizing models through techniques like quantization and pruning can greatly enhance perfor-
mance, particularly when deploying models to edge devices or cloud environments with limited com-
putational resources.
Chapter 10
Data Visualization
Data visualization plays a critical role in data analysis and scientific research. By transforming com-
plex data into intuitive graphs and charts, data visualization helps us easily identify trends, patterns,
and anomalies, thereby enhancing the accuracy of decision-making. It is not only an effective way to
present the results of data but also a vital tool for data exploration. Data visualization allows complex
datasets to be displayed concisely and understandably, enabling audiences to quickly grasp the core
information. Whether used in business analysis, academic research, or everyday decision support,
data visualization is an indispensable tool. Through data visualization, we can more clearly under-
stand the story behind the data and effectively share these insights with others.
In a particular grade, we have the height data of 200 students listed below. When looking at these
numbers arranged in a table, it’s challenging to discern any meaningful patterns or trends due to the
sheer volume and scattered distribution of the data. The overall distribution, central tendencies, and
potential outliers are all hidden within this dense set of numbers, making it difficult to interpret by
eye. In such cases, using visualization tools like histograms can help us better understand the data
distribution, allowing us to see the concentration, range, and shape of the data more clearly, thus
uncovering any underlying patterns.
174 168 176 185 167 167 185 177 165 175 165 165 172 150 152 164 159 173 160 155
184 167 170 155 164 171 158 173 163 167 163 188 169 159 178 157 172 150 156 171
177 171 168 166 155 162 165 180 173 152 173 166 163 176 180 179 161 166 173 179
165 168 158 158 178 183 169 180 173 163 173 185 169 185 143 178 170 167 170 150
167 173 184 164 161 164 179 173 164 175 170 179 162 166 166 155 172 172 170 167
155 165 166 161 168 174 188 171 172 169 150 169 170 194 168 173 169 158 181 177
177 160 184 155 175 191 160 164 170 164 154 170 159 174 160 185 162 166 178 157
172 183 153 171 172 177 157 156 175 172 172 173 163 172 172 162 188 174 158 176
160 177 181 161 179 174 178 188 167 162 161 161 169 173 172 178 170 184 167 197
176 161 159 174 167 177 174 169 161 154 165 178 172 157 171 173 161 171 170 158
[Figure: Histogram of the height distribution of 200 students; x-axis: Height (cm), y-axis: Number of Students.]
This chart shows the distribution of heights among 200 students. As illustrated, the majority of
heights are concentrated between 160 cm and 180 cm, with a clear peak around 170 cm, forming an
evident normal distribution. Visualization like this allows us to easily observe the distribution, identify
the central tendencies, and spot any anomalies. If we were to rely solely on raw data, such patterns
would be much harder to detect. Thus, data visualization is crucial for understanding and analyzing
large datasets; it simplifies complex information and helps us make quick, accurate decisions.
The code presented in this section is designed to generate and visualize a distribution of student
heights using Python. It begins by importing the necessary libraries, setting a random seed for repro-
ducibility, and generating a dataset of heights following a normal distribution. The heights are then
displayed in a formatted manner with 20 numbers per row for better readability. Finally, the code
creates a histogram with improved aesthetics, including custom colors, labels, and grid settings, to
visually represent the distribution of heights among 200 students.
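A minimal sketch consistent with that description, where the mean, standard deviation, and styling details are assumptions:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # reproducibility
heights = np.random.normal(170, 10, 200)  # 200 heights, assumed mean 170 cm, std 10 cm

# Display the heights 20 per row for readability
for i in range(0, 200, 20):
    print(' '.join(f'{h:.0f}' for h in heights[i:i + 20]))

plt.hist(heights, bins=20, color='skyblue', edgecolor='black')
plt.xlabel('Height (cm)')
plt.ylabel('Number of Students')
plt.title('Height Distribution of 200 Students')
plt.grid(True)
plt.show()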
As a second example, consider plotting the quadratic function y = x² as a line chart; the listing below follows the description given after it:

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-10, 11)
y = x ** 2

plt.plot(x, y, marker='o')
plt.title('Line Chart of Quadratic Function y = x^2')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()
In the above code, the x array represents values ranging from -10 to 10, and the corresponding y
array represents the values calculated using the quadratic function y = x2 . We use the plt.plot()
function to draw a line chart, with markers (marker=’o’) highlighting each calculated point.
The chart’s title is "Line Chart of Quadratic Function y = x2 ", with the x-axis labeled "x" and the
y-axis labeled "y". The grid is enabled using plt.grid(True) to help better visualize the position of
each data point.
Below is the line chart illustrating the quadratic function y = x2 over the range -10 to 10:
[Figure: Line chart of the quadratic function y = x² over the range −10 to 10; x-axis: x, y-axis: y.]
This chart shows the typical parabolic shape of a quadratic function, demonstrating the squared
relationship between the independent variable x and the dependent variable y.
This code simulates the probability distribution of sums when rolling three dice and visualizes it
using a pie chart. Each slice represents the probability of a specific sum occurring.
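A minimal sketch of the computation it describes:

from itertools import product
from collections import Counter
import matplotlib.pyplot as plt

# Enumerate all outcomes of three dice and count each possible sum
sums = [sum(roll) for roll in product(range(1, 7), repeat=3)]
counts = Counter(sums)

labels = sorted(counts)
values = [counts[s] for s in labels]

plt.pie(values, labels=labels, autopct='%1.1f%%')
plt.show()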
[Figure: Pie chart of the probability of each possible sum of three dice; slices range from 0.5% (sums 3 and 18) up to 12.5% (sums 10 and 11).]
from itertools import product
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt

# Enumerate all 6^3 outcomes of rolling three dice and tally each sum
sums = [sum(roll) for roll in product(range(1, 7), repeat=3)]

sum_counts = Counter(sums)
total_rolls = 6**3
probabilities = {sum_value: count / total_rolls * 100 for sum_value, count in sum_counts.items()}

# Bar chart with colors drawn from the viridis colormap
keys = sorted(probabilities)
colors = plt.cm.viridis(np.linspace(0, 1, len(keys)))
plt.bar(keys, [probabilities[k] for k in keys], color=colors)
plt.xlabel('Sum of Dice Rolls')
plt.ylabel('Probability (%)')
plt.show()
This code calculates the probability distribution of the sums obtained when rolling three dice and visualizes it as a bar chart with colors drawn from the "viridis" colormap. As the chart shows, the distribution of sums from the three dice closely approximates a normal distribution; visualization makes this mathematical regularity immediately apparent.
[Figure: Bar chart of the probability (%) of each sum of three dice; x-axis: Sum of Dice Rolls, y-axis: Probability (%).]
10.2.4 Histogram
A histogram is a commonly used statistical chart to display the distribution of data. It divides the data
into continuous intervals (called "bins") and counts the number of data points within each interval
(frequency), which reflects the frequency distribution of the data. Unlike a bar chart, the horizontal
axis of a histogram represents continuous values, making it suitable for numerical data.
In this example, we simulate rolling five dice 2000 times and use a histogram to show the distribu-
tion of the sum of each roll.
import numpy as np
import matplotlib.pyplot as plt

# Simulate rolling five dice 2000 times and take the sum of each roll
rolls = np.random.randint(1, 7, size=(2000, 5))
sums = rolls.sum(axis=1)

plt.hist(sums, bins=range(5, 32), edgecolor='black')
plt.xlabel('Sum')
plt.ylabel('Frequency')
plt.show()
The resulting histogram shows the distribution of the sum of the dice rolls. As seen, the distribution
of the sums forms a bell curve, consistent with the Central Limit Theorem.
[Figure: Histogram of the sums of five dice over 2000 rolls; x-axis: Sum, y-axis: Frequency. The bell-shaped distribution is consistent with the Central Limit Theorem.]
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.datasets import load_iris
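A minimal sketch continuing from the imports above:

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = [iris.target_names[t] for t in iris.target]

plt.figure(figsize=(10, 6))
parallel_coordinates(df, 'species')
plt.show()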
Figure 10.5 shows the generated parallel coordinates chart, illustrating the differences in characteristics among the three iris species.

[Figure 10.5: Parallel coordinates plot of the Iris dataset, one line per sample, colored by species.]