Deep Learning in Bioinformatics
Techniques and Applications in Practice
Habib Izadkhah
Department of Computer Science
University of Tabriz
Tabriz, Iran
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2022 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage and retrieval system, without permission in writing from the
publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our
arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found
at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may
be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any
information, methods, compounds, or experiments described herein. In using such information or methods they should be
mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any
injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or
operation of any methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-12-823822-6
Contents
Acknowledgments
Preface
CHAPTER 1 Why life science?
1.1 Introduction
1.2 Why deep learning?
1.3 Contemporary life science is about data
1.4 Deep learning and bioinformatics
1.5 What will you learn?
CHAPTER 2 A review of machine learning
2.1 Introduction
2.2 What is machine learning?
2.3 Challenges with machine learning
2.4 Overfitting and underfitting
2.4.1 Mitigating overfitting
2.4.2 Adjusting parameters using cross-validation
2.4.3 Cross-validation methods
2.5 Types of machine learning
2.5.1 Supervised learning
2.5.2 Unsupervised learning
2.5.3 Reinforcement learning
2.6 The math behind deep learning
2.6.1 Tensors
2.6.2 Relevant mathematical operations
2.6.3 The math behind machine learning: statistics
2.7 TensorFlow and Keras
2.8 Real-world tensors
2.9 Summary
CHAPTER 3 An introduction to the Python ecosystem for deep learning
3.1 Basic setup
3.2 SciPy (scientific Python) ecosystem
3.3 Scikit-learn
3.4 A quick refresher in Python
3.4.1 Identifiers
3.4.2 Comments
3.4.3 Data types
3.4.4 Control flow statements
3.4.5 Data structures
3.4.6 Functions
3.5 NumPy
3.6 Matplotlib crash course
3.7 Pandas
3.8 How to load a dataset
3.8.1 Considerations when loading CSV data
3.8.2 Pima Indians diabetes dataset
3.8.3 Loading CSV files in NumPy
3.8.4 Loading CSV files in Pandas
3.9 Dimensions of your data
3.10 Correlations between features
3.11 Techniques to understand each feature in the dataset
3.11.1 Histograms
3.11.2 Box-and-whisker plots
3.11.3 Correlation matrix plot
3.12 Prepare your data for deep learning
3.12.1 Scaling features to a range
3.12.2 Data normalizing
3.12.3 Binarize data (make binary)
3.13 Feature selection for machine learning
3.13.1 Univariate selection
3.13.2 Recursive feature elimination
3.13.3 Principal component analysis
3.13.4 Feature importance
3.14 Split dataset into training and testing sets
3.15 Summary
CHAPTER 4 Basic structure of neural networks
4.1 Introduction
4.2 The neuron
4.3 Layers of neural networks
4.4 How a neural network is trained
4.5 Delta learning rule
4.6 Generalized delta rule
4.7 Gradient descent
4.7.1 Stochastic gradient descent
4.7.2 Batch gradient descent
4.7.3 Mini-batch gradient descent
4.8 Example: delta rule
4.8.1 Implementation of the SGD method
4.8.2 Implementation of the batch method
4.9 Limitations of single-layer neural networks
4.10 Summary
CHAPTER 5 Training multilayer neural networks
5.1 Introduction
Index
Acknowledgments
This book is the product of the sincere cooperation of many people. The author would like to thank all those who contributed to the process of writing and publishing this book. Dr. Masoud Kargar, Dr. Masoud Aghdasifam, Hamed Babaei, Mahsa Famil, Esmaeil Roohparver, Mehdi Akbari, Mahsa Hashemzadeh, and Shabnam Farsiani read the whole draft and made numerous suggestions that improved the presentation quality of the book; I thank them for all their effort and encouragement.
I wish to express my sincere appreciation to the team at Elsevier, particularly Chris Katsaropoulos, Senior Acquisitions Editor, for his guidance, comprehensive explanations of the issues, prompt replies to my e-mails, and, of course, his patience. I would also like to thank Joshua Mearns and Nirmala Arumugam for preparing the production process and coordinating the web page, and the production team. Finally, I thank "the unknown reviewers" for their great job of exposing what needed to be restated, clarified, rewritten, or complemented.
Habib Izadkhah
Preface
Artificial Intelligence, Machine Learning, Deep Learning, and Big Data have become the latest buzzwords, and deep learning and bioinformatics are two of the hottest areas of contemporary research. Deep learning, an emerging branch of machine learning, is well suited to big data analytics. Deep learning methods have been extensively applied to various fields of science and engineering, including computer vision, speech recognition, natural language processing, social network analysis, and bioinformatics, where they have produced results comparable to, and in some cases superior to, those of domain experts. A vital strength of deep learning is its ability to analyze and learn from massive amounts of data, which makes it a valuable method for big data analytics.
Bioinformatics research has entered an era of big data. With increasing data in biology, it is expected that deep learning will become increasingly important in the field and will be utilized in a vast majority of analysis problems. Mining the potential value in biological data is of great significance for researchers and the health care domain. Deep learning, which is especially powerful in handling big data, shows outstanding performance in biological data processing.
To practice deep learning, you need a basic understanding of the Python ecosystem. Python is a versatile language that offers a large number of libraries and features helpful for Artificial Intelligence and Machine Learning in particular, and, of course, you do not need to learn all of these libraries and features to work with deep learning. In this book, I first give you the Python background necessary to study deep learning. Then, I introduce deep learning in an easy-to-understand and practical way, and explore how deep learning can be utilized to address several important problems in bioinformatics, including drug discovery, de novo molecular design, protein structure prediction, gene expression regulation, protein sequence classification, and biomedical image processing. Through real-world case studies and working examples, you will discover various methods and strategies for building deep neural networks using the Keras library. The book gives you practical information on the bioinformatics domain, including best practices. I believe this book will provide valuable insights and serve as a starting point for graduate students, researchers, and applied bioinformaticians in industry and academia who wish to use deep learning techniques in their biological and bioinformatics studies.
This book
• provides necessary Python background for practicing deep learning,
• introduces deep learning in an accessible way,
• provides the most practical information available on the domain to build efficient deep learning
models,
• presents how deep learning can be utilized for addressing several important problems in bioinfor-
matics,
• explores the legendary deep learning architectures, including convolutional and recurrent neural
networks, for bioinformatics,
• discusses deep learning challenges and suggestions.
Habib Izadkhah
Since the deep neural network (deep learning) is presented in the subsequent chapters of the book, it is important first to know about some of the breakthroughs achieved with deep learning.
A common application of deep learning is image recognition. Using deep learning for facial recognition covers a wide range of applications, from security and cell phone unlocking to the automated tagging of individuals who are present in an image. Companies now seek to use this feature to enable purchases without the need for credit cards. For instance, have you noticed that Facebook has developed an extraordinary feature that lets you know about the presence of your friends in your photos? Facebook used to make you click on photos and type your friends' names to tag them. However, as soon as a photo is uploaded, Facebook now does the magic and tags everybody for you. This technology is called facial recognition.
Deep learning can also be utilized to restore images or remove noise from them. This capability is also employed in security, in the identification of criminals, and in the quality enhancement of family photos or medical images. Producing fake images is another feature of deep learning. In fact, deep learning algorithms are able to generate new images of people's faces, objects, and even scenery that have never existed. These images are utilized in graphic design, video game development, and movie production.
Many of these deep learning developments, which have led to a plethora of consumer applications, are now employed in bioinformatics and biomedicine, for example to classify tumor cells into various categories. Given the scarcity of medical data, synthetic images can also be produced to generate new training data.
Deep learning has also resulted in many speech recognition developments that have become pervasive in search engines, cell phones, computers, TV sets, and other online devices everywhere. So far, various speech recognition technologies have been developed, such as Alexa, Cortana, Google Assistant, and Siri, changing how humans interact with devices, homes, cars, and jobs. Through speech recognition technology, it is possible to talk to computers and devices, which can understand what the speech means and respond. The introduction of voice-controlled digital assistants into the speech recognition market has changed the outlook of this technology in the 21st century.
By analyzing its users' behavior, a recommender system suggests the most appropriate items (e.g., data, information, and goods). By helping users find their targets faster, such systems address the problems caused by the growing, massive amount of information available. Many companies with extensive websites now employ recommender systems to facilitate their processes. Given the different preferences of users at different ages, users naturally select different products; thus, recommender systems should yield correspondingly different results. Recommender systems have significant effects on companies' revenues. If employed correctly, these systems can bring about high profitability. For instance, Netflix has announced that 60% of the DVDs its users rent are found through recommender systems, which can greatly affect user choices of films.
Recommender systems can also be employed to prescribe appropriate medicines for patients. In
fact, prescribing the right medicines for patients is among the most important processes of their treat-
ments, for which accurate decisions must be made based on patients’ current conditions, history, and
symptoms. In many cases, patients may need more than one medicine or new medicines for another
condition in addition to a previous disease. Such cases increase the chances of medical error in the
prescription of medicines and the incidence of side effects of medicine misuse.
These are only a few of the innovations achieved through the use of deep learning methods in bioinformatics. Ranging from medical diagnosis and tumor detection to the production and prescription of customized medicines based on a specific genome, deep learning has attracted many large pharmaceutical and medical companies. Many deep learning ideas used in bioinformatics are inspired by these conventional applications of deep learning.
We are living in an interesting era when there is a convergence of biological data and the extensive
scientific methods of processing that kind of data. Those who can combine data with novel methods to
learn from data patterns can achieve significant scientific breakthroughs.
In Chapter 2, I discuss the three ways that a machine can learn: supervised learning, unsupervised learning, and reinforcement learning.
To work with deep learning, you need to be familiar with a number of mathematical and statistical
concepts. In Chapter 2, I outline some of the important concepts, e.g., tensors, you will be working with.
Chapter 2 introduces the Keras library where we will implement deep learning projects. Chapter 2 ends
by introducing several real-world tensors.
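As a taste of the tensor concept covered in Chapter 2, here is a minimal NumPy sketch of tensors of increasing rank (the shapes chosen are illustrative, not taken from the book's own examples):

```python
import numpy as np

# A scalar is a rank-0 tensor, a vector rank-1, a matrix rank-2.
scalar = np.array(7)                # rank 0
vector = np.array([1.0, 2.0, 3.0])  # rank 1, shape (3,)
matrix = np.ones((2, 3))            # rank 2, shape (2, 3)

# A real-world tensor: a batch of 4 grayscale 28x28 images is rank 3.
images = np.zeros((4, 28, 28))

print(scalar.ndim, vector.ndim, matrix.ndim, images.ndim)  # 0 1 2 3
```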
Chapter 3 provides a brief introduction to the Python ecosystem. If you would like to make a career in the domain of deep learning, you need to know Python programming along with the Python ecosystem. According to GitHub, Python is the most popular programming language for machine learning projects hosted on its service. To build effective deep learning models, you need some basic understanding of the Python ecosystem, e.g., the NumPy, Pandas, Matplotlib, and Scikit-learn libraries. This chapter introduces various Python libraries and examples that are very useful for developing deep learning applications.
The chapter begins with introducing four high-performance computing environments that you can
use to write Python programs without installing anything, including IPython, the Jupyter notebook,
Colaboratory, and Kaggle. Chapter 3 provides general descriptions about SciPy (Scientific Python)
ecosystem and the Scikit-learn library. This chapter also provides a few basic details of Python syntax that you should be familiar with to understand the code and write a typical program. The topics discussed include identifiers, comments, data types, control flow statements, data structures, and functions. I provide examples to explain each of these.
NumPy is a core Python library that is widely used in deep learning applications. It supports multidimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on them. In Chapter 3, I provide several examples of this library that you will need in deep learning applications. After the overview of NumPy, I discuss the Matplotlib library, a plotting library used for creating plots and charts. An easy way to load data is to use the Pandas library, which is built on top of Python; in Chapter 3, you learn how to use it to load data. In Python, there are several ways to load a CSV data file for use in deep learning algorithms, and in Chapter 3 you will learn two frequently used ones: (1) loading CSV files with NumPy and (2) loading CSV files with Pandas. Reviewing the shape of the dataset, i.e., how much data we have in terms of rows and columns, is one of the most frequent data manipulation operations in deep learning applications, and Chapter 3 also provides examples of this. After that, I explain how you can use the Pearson correlation coefficient to determine the correlation between features.
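The two loading routes and the correlation check can be sketched as follows. This is a minimal example: the in-memory CSV stands in for a real file such as the Pima dataset, and the column names here are only illustrative.

```python
import io
import numpy as np
import pandas as pd

# A tiny CSV standing in for a file such as 'pima-indians-diabetes.csv'
# (the file name and columns are assumptions; substitute your own data).
csv_text = """preg,glucose,pressure,class
6,148,72,1
1,85,66,0
8,183,64,1
1,89,66,0
"""

# Way 1: NumPy -- skip the header row, read everything as floats.
arr = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
print(arr.shape)   # (4, 4): 4 rows, 4 columns

# Way 2: Pandas -- column names are kept, dtypes inferred per column.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)    # (4, 4)

# Pearson correlation between every pair of features.
print(df.corr(method="pearson").round(2))
```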
In Chapter 3, I explain histograms, box-and-whisker plots, and correlation matrix plots, three techniques that you can use to understand each feature of your dataset independently. Deep learning algorithms use numerical features to learn from the data. However, when the features have different scales, such as "Age" in years and "Income" in hundreds of dollars, the features with larger scales can unduly influence the model. As a result, we want the features to be on a similar scale, which can be achieved through scaling techniques. In this chapter, you learn how to standardize the data using Scikit-learn.
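A minimal Scikit-learn sketch of the two common scaling techniques (the sample values are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales: age in years, income in dollars.
X = np.array([[25.0, 30000.0],
              [40.0, 90000.0],
              [60.0, 50000.0]])

# Rescale each feature to the [0, 1] range.
X_minmax = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# Standardize each feature to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

print(X_minmax.min(axis=0), X_minmax.max(axis=0))  # [0. 0.] [1. 1.]
print(X_std.mean(axis=0).round(6))                 # ~[0. 0.]
```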
Bioinformatics datasets are often high-dimensional. Chapter 3 introduces several feature selection
methods. Feature selection is one of the key concepts in machine learning which is used to select a
subset of features that contribute the most to the output. It thus hugely impacts the performance of the
6 Chapter 1 Why life science?
constructed model. Chapter 3 ends by introducing the train_test_split() function, which allows you to split a dataset into training and test sets.
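Univariate feature selection and the train/test split can be sketched together. This is a hedged example on synthetic data (the dataset, the choice of k, and the split ratio are all assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a high-dimensional bioinformatics table.
X, y = make_classification(n_samples=100, n_features=20,
                           n_informative=5, random_state=0)

# Univariate selection: keep the 5 features with the best ANOVA F-scores.
X_best = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)
print(X_best.shape)   # (100, 5)

# Hold out one third of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X_best, y, test_size=0.33, random_state=0)
print(X_train.shape, X_test.shape)   # (67, 5) (33, 5)
```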
Chapter 4 presents the basic structure of neural networks. In this chapter, I discuss the types of neural networks and provide an example of how to train a single-layer neural network. Chapter 4 discusses gradient descent, which is used to update the network's weights. To this end, three gradient descent methods, namely stochastic gradient descent, batch gradient descent, and mini-batch gradient descent, are discussed. Chapter 4 ends with a discussion of the limitations of single-layer neural networks.
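The three gradient descent variants differ only in how many samples feed each weight update. A plain-NumPy sketch on a toy linear model (not the book's own implementation; the learning rate and epoch count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w  # noiseless targets so all methods can recover true_w

def mini_batch_gd(X, y, batch_size, lr=0.1, epochs=200):
    """Mini-batch gradient descent on a linear model with MSE loss.
    batch_size=1 gives stochastic GD; batch_size=len(X) gives batch GD."""
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)          # shuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            err = X[b] @ w - y[b]
            w -= lr * X[b].T @ err / len(b)   # gradient of the MSE
    return w

w_sgd = mini_batch_gd(X, y, batch_size=1)       # stochastic
w_batch = mini_batch_gd(X, y, batch_size=len(X))  # batch
w_mini = mini_batch_gd(X, y, batch_size=16)     # mini-batch
print(np.round(w_mini, 2))   # close to [1.5, -2.0, 0.5]
```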
Training a multilayer neural network is discussed in Chapter 5. In this chapter, the backpropagation algorithm, an effective algorithm used to train a neural network, is introduced. After that, I explain how you can design a neural network in Keras. The MNIST dataset is often considered the "hello world" of deep learning. The purpose of this example is to classify handwritten digits based on their appearance, assigning each handwritten input to the most similar group in order to identify the corresponding digit. In this chapter, I implement a handwritten-digit classification problem with dense layers in Keras. After the implementation of this problem, you will know the components of neural networks without going into technical details. Chapter 5 ends with a discussion of two more general data preprocessing techniques, namely vectorization and value normalization. After studying this chapter, you will be able to design a deep learning network with dense layers.
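Vectorization and value normalization, as applied before a dense network, can be sketched in plain NumPy (the toy 4x4 "images" are stand-ins for the 28x28 MNIST images):

```python
import numpy as np

# Toy "images": 6 samples of 4x4 pixels with values 0..255, labels 0..2.
rng = np.random.default_rng(1)
images = rng.integers(0, 256, size=(6, 4, 4))
labels = np.array([0, 2, 1, 1, 0, 2])

# Vectorization: flatten each 4x4 image into a 16-dimensional vector,
# because dense layers expect flat feature vectors.
x = images.reshape((len(images), -1)).astype("float32")

# Value normalization: map pixel values from [0, 255] into [0, 1].
x /= 255.0

# One-hot encode the labels for a 3-class softmax output layer.
one_hot = np.eye(3)[labels]

print(x.shape, one_hot.shape)   # (6, 16) (6, 3)
```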
Chapter 6 discusses the classification problem. Classification is a very important task in bioinformatics and refers to a predictive modeling problem where a class label is predicted for given input data. The Pima Indians Diabetes dataset is employed to predict the onset of diabetes based on diagnostic measures. In this dataset, there are 768 observations with 8 input variables (i.e., the number of features) and one output variable (diabetic or nondiabetic). In this chapter, using the Pima dataset, I implement a binary classifier in Keras to classify people into diabetic and nondiabetic categories. Neural networks expect numerical input values, so nonnumerical data must be converted to numerical form before being fed to the network. Label encoding is a popular process for converting labels, i.e., categorical texts, into numeric values in order to make them understandable to machines. Chapter 6 explains how you can do this. In this chapter, I also discuss multiclass classification.
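Label encoding can be sketched with Scikit-learn's LabelEncoder (the sample labels are illustrative; the book's own Chapter 6 code may differ):

```python
from sklearn.preprocessing import LabelEncoder

# Categorical class labels as they might appear in a clinical CSV.
labels = ["Diabetic", "Nondiabetic", "Nondiabetic", "Diabetic"]

encoder = LabelEncoder()
y = encoder.fit_transform(labels)  # classes are sorted alphabetically

print(list(encoder.classes_))   # ['Diabetic', 'Nondiabetic']
print(y.tolist())               # [0, 1, 1, 0]
```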
Chapter 7 provides an overview of deep learning. Deep learning is a type of machine learning that has improved the ability to classify, recognize, detect, and generate, or, in one word, to understand. Chapter 7 helps you understand why deep learning was introduced much later than single-layer neural networks and what challenges it faces. This chapter discusses the most important challenge of deep learning, namely overfitting, and how to deal with it. It will also show you how to build a deep neural network with two examples from the bioinformatics field, namely breast cancer classification and molecular classification of cancer by gene expression, using the Keras library.
In deep learning, the main problem is overfitting. The best solution to reduce overfitting is to get more training data. When no further training data can be obtained, the next best solution is to limit the amount of information your model can store or is allowed to store. This is called regularization. In Chapter 7, I describe three techniques, namely reducing the network's size, dropout, and weight regularization, to deal with overfitting. Another important concept discussed in this chapter is how to deal with imbalanced datasets. A dataset is said to be imbalanced when there is a significant difference between the number of instances in one set of classes, called the majority class, and another set of classes, called the minority class. On imbalanced datasets, neural networks may not function well. To deal with this problem, this chapter introduces the RandomOverSampler class from the imbalanced-learn library.
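The idea behind random oversampling can be sketched in plain NumPy. This is a hand-rolled stand-in for imbalanced-learn's RandomOverSampler, written only to show what the technique does: minority-class rows are duplicated at random until the classes are balanced.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate random rows of each under-represented class until every
    class has as many rows as the largest class (random oversampling)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts_X, parts_y = [], []
    for c, count in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        extra = rng.choice(idx, size=target - count, replace=True)
        keep = np.concatenate([idx, extra])
        parts_X.append(X[keep])
        parts_y.append(y[keep])
    return np.concatenate(parts_X), np.concatenate(parts_y)

# 9 majority samples vs 3 minority samples.
X = np.arange(24).reshape(12, 2)
y = np.array([0] * 9 + [1] * 3)
X_res, y_res = random_oversample(X, y)
print(np.bincount(y_res))   # [9 9] -- classes are now balanced
```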
1.5 What will you learn? 7
Chapter 8 introduces the convolutional neural network, a deep neural network especially suited to image processing applications. Such networks significantly improve the processing of information (images) through deep layers. In Chapter 8, I briefly explain the basic parts of a convolutional architecture. How convolution works is hard to describe in words alone; however, the concept and the steps of calculating it are simpler than they first seem, and in this chapter I show how convolution works using a simple example. The chapter also discusses the pooling layer, which is utilized to reduce the image's size by summarizing neighboring pixels into a single value; in fact, it is a downsampling operation. In this chapter, I implement three medical image processing problems, namely predicting coronavirus disease (COVID-19), predicting breast cancer, and detecting diabetic retinopathy, in Keras. After studying these three problems, you will have learned many practical concepts and techniques in image processing.
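The convolution and pooling calculations can be sketched in plain NumPy (a didactic sketch, not the Chapter 8 implementation; the filter values are made up):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as used in CNNs):
    slide the kernel over the image and sum elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value per window."""
    h, w = feature_map.shape
    return feature_map[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # a simple diagonal filter
fmap = conv2d(image, kernel)   # shape (3, 3)
pooled = max_pool(fmap, 2)     # shape (1, 1) -- downsampled
print(fmap.shape, pooled.shape)   # (3, 3) (1, 1)
```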
Chapter 9 provides an overview of popular deep learning image classifiers. In this chapter, I analyze eight well-known image classification architectures that have ranked first in the ILSVRC competition in different years, along with their Keras code. After studying this chapter, you will be able to design high-precision convolutional networks for a problem of interest.
In Chapter 10, I discuss electrocardiogram (ECG) arrhythmia classification. Arrhythmia refers to
any irregular change from normal heart rhythms. This chapter helps you understand how to classify
ECG signals into normal and different types of arrhythmia using a convolutional neural network (CNN).
This chapter provides a Keras code to do this.
Chapter 11 discusses autoencoders and generative models and how to implement them. Although the networks discussed in this chapter may look similar to those of the previous chapters, they rely on two concepts, encoding and decoding, that have not appeared before. Autoencoders and generative models are a newly emerging field in deep learning that has shown a lot of success and received increasing attention over the past couple of years. In this chapter, I discuss different types of deep generative models, focus on variations of autoencoders, and show how to implement and train autoencoders and deep generators using Keras.
A large part of data, such as speech, protein sequences, sensor readings, videos, and text, is inherently serial (sequential). Sequential data are data whose current value depends on previous values. Recurrent neural networks (simple RNNs) are a good way to process sequential data because they take the sequence dependence into account in their calculations, but their capability to handle long sequences is limited. The long short-term memory (LSTM) network is a type of recurrent neural network designed to handle long sequences. In Chapter 12, I discuss RNNs and LSTMs, as well as two important topics in bioinformatics: protein sequence classification and the design of new molecules.
Chapter 13 presents several deep learning applications in bioinformatics, then discusses several deep
learning challenges and ways we can overcome them.
CHAPTER 2
A review of machine learning
Before moving on to the meaning of machine learning, let us find out what Artificial Intelligence (AI) is. According to Webster's dictionary, intelligence is the ability to learn and solve problems. In other words, intelligence is the skill of obtaining and applying knowledge, where knowledge is the information acquired through experience and/or training. "Artificial," in turn, refers to something that is simulated or made by humans rather than by nature.
Now we are ready to define AI. There is no single definition of AI. The Oxford Dictionary defines AI as "the theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages." It is, therefore, the endeavor to give a machine all the abilities that the human mind possesses.
FIGURE 2.1
The relationship between artificial intelligence, machine learning, and deep learning.
FIGURE 2.2
Traditional programming (left) vs machine learning (right).
In machine learning, we can generate a program (also known as a learned model) from examples of that program's inputs and corresponding outputs.
2.2 What is machine learning?
Machine learning is very popular now and is often synonymous with artificial intelligence.
In general, one cannot understand the concept of artificial intelligence without knowing how
machine learning works.
Let x and y be two vectors. In most machine learning problems, the aim is to create a mathematical function of the following form:
y = f (x).
This function may take many values as input, perhaps thousands or even millions, and may generate many numbers as output. Here are some examples of functions you may want to create:
• x contains the health characteristics of a large number of people, e.g., Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI, and Age, and f (x) should equal 1 if a person has diabetes and 0 if they do not.
• x is the structure of a protein (i.e., a sequence of amino acids), and f (x) must determine the function of the protein; depending on the dataset used, there can be many function classes.
• x contains a number of color images, and f (x) should equal 1 if an image shows breast cancer and 0 if it does not.
• x contains a number of chest radiograph (chest X-ray) images; f (x) should be a vector of numbers. The first element indicates whether the image contains a pleural thickening, the second whether it contains cardiomegaly, the third whether it contains a nodule, and so on for many types of findings.
As you can see, f (x) can be a very, very complex function! It usually takes a lot of inputs and tries
to extract patterns from them that cannot be extracted manually just by looking at the input numbers.
In machine learning, f (x) is called the model.
In machine learning, we basically try to build a model from a dataset, referred to as the "learned model," to predict new and unseen data. This short description has implications that may not be obvious at first glance, so let me elaborate, starting with a few words. Machine learning aims to automatically create a "model" from "data," which you can then use to make decisions. Here, data means information such as genes, proteins, images, documents, etc.
Before going further toward the model, let me step aside from it a bit. As you may have noticed, the definition of machine learning only describes the concepts of data and model, and does not discuss
anything about “learning.” The term machine learning itself describes the process of finding a model
by analyzing data without having to be done by a human. Because this process, i.e., finding a model,
is trained with the help of data, we call it the “learning process.” Therefore, the data used for building
the “model” is called the training set. The first thing you need, of course, is a training set to train the
model. Fig. 2.3 depicts the overall process of machine learning.
I need to point out that the dataset used for a problem is initially split into two sets: training set and
test set. As mentioned earlier, the samples in the training set are used to train the model and the samples
in the test set are used to evaluate the performance of the resulting model. Fig. 2.4 shows this division.
After testing the model, if it is observed that the model is performing well enough, it can be used in the
real environment for new data.
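The hold-out split of Fig. 2.4 can be sketched as follows. This is a minimal NumPy version for illustration; in practice, scikit-learn's train_test_split function does the same job:

```python
import numpy as np

def holdout_split(X, y, test_frac=0.25, seed=42):
    """Shuffle the samples, then reserve a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # random order, so the split is unbiased
    n_test = int(len(X) * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```

Shuffling before splitting matters: if the dataset is sorted by class, a naive "first 75% / last 25%" split would put entire classes in only one of the two sets.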
The model is our main interest in this section, so let us now resume this discussion. In machine learning, the model is the final product we are looking for, and it is what we actually use. The resulting
12 Chapter 2 A review of machine learning
FIGURE 2.3
The overall process of machine learning.
FIGURE 2.4
Splitting data into two parts of training and test sets.
model can be a mathematical representation of a real-world process. For example, if we are developing
a prediction system to identify the risk of breast cancer at earlier stages of the disease, the prediction
system is the model that we are talking about. If the training data used in the learning process are comprehensive, the model constructed works as well as the experts themselves. Machine learning has two
steps of training and inference:
• Training refers to the process of creating a model,
• Inference refers to the process of using a trained model to make a prediction.
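As a toy illustration of these two steps, here is a least-squares line fit playing the role of the "model" (the data values are made up for the example):

```python
import numpy as np

# Training: estimate the model's parameters from the training data.
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([2.1, 3.9, 6.2, 7.8])
slope, intercept = np.polyfit(x_train, y_train, 1)  # fit a line y = slope*x + intercept

# Inference: apply the trained model to a new, unseen input.
x_new = 5.0
y_pred = slope * x_new + intercept
```

Training happens once, on the training set; inference then reuses the frozen parameters (here, slope and intercept) on any new input.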
2.3 Challenge with machine learning
FIGURE 2.5
Using the model for prediction.
In machine learning, the output of the training process is a model that we can then apply to real-world data. This process is depicted in Fig. 2.5. The training data used to create a model and the new data encountered in the real environment are often different.
learning model’s ability to perform well on unseen data rather than just the data that it was trained on.
The ability of a model to generalize is crucial to the success of machine learning (learned model).
Overfitting is due to the model learning “too much” from the training data. When we simplify the
model to reduce the risk of overfitting, we call this process regularization.
Underfitting is when the model not only fails to learn the training data well but also performs poorly on other datasets. Underfitting is due to the model having "not learned enough" from the training data, yielding low generalization and inaccurate predictions.
In summary,
• In overfitting, the accuracy of the model is high on the training data and on data similar to it.
• In overfitting, the model's accuracy on new, never-seen data is low.
• Overfitting occurs when the model is highly dependent on training data and therefore cannot be
generalized to new data.
• Overfitting occurs when a model learns the details and noise in training data to the extent that it
negatively affects the model performance on new data.
• Overfitting occurs when the model tries to memorize the training data only, instead of learning the
scope of the problem and finding the relationship between the independent and dependent variables,
and this is what we call being very dependent on the training data.
• Underfitting occurs when the model is not sufficiently trained from the training data at the time of
learning.
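These points can be seen in a small experiment: fitting polynomials of increasing degree to a handful of noisy points. This is an illustrative sketch with synthetic data, not an example from the book's datasets:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)  # noisy training samples
x_test = np.linspace(0.03, 0.97, 50)
y_test = np.sin(2 * np.pi * x_test)                             # noise-free ground truth

errors = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    errors[degree] = (train_mse, test_mse)

# Degree 1 underfits: it cannot capture the sine shape, so it is poor everywhere.
# Degree 9 overfits: with 10 points it passes through every training sample
# (training error near zero), memorizing the noise rather than the underlying curve.
```

Lowering the degree here plays the role of regularization: a simpler model trades a little training accuracy for better behavior on unseen data.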
FIGURE 2.6
Splitting the training data into two sets, namely training and validation sets. The validation set must not share any
samples with either the training set or the test set.
Given this separation into two sets, modeling is based only on the training portion of the data. In the cross-validation method, hereinafter referred to as CV, the training set used to create the model is repeatedly split into two parts: each time the CV process repeats, one part of the data is used to train the model and the other to validate it. This process is thus a resampling method for estimating the model error. Fig. 2.6 illustrates the splitting of training data into two sets, training and validation sets.
The ratio of these parts is also debatable, which I will not discuss here, but usually 50% of the total
data is for training purposes, 25% for cross-validation, and the rest of the data for model testing.
It should be noted that the validation data in one iteration of the CV process may be used as training data in the next iteration, so their nature is different from the held-out test data introduced earlier.
At each stage of the CV process, the model trained by applying the training samples is used to
predict the other part of CV data, and the “error” or “accuracy” of the model is calculated on the samples
that were not used to train the model. The average of these errors (accuracy) is usually considered as
the overall error (accuracy) of the model. Of course, it is better to report the standard deviation of
errors (accuracy). Thus, according to the number of different parameters (model complexity), different
models can be produced and their estimation error can be measured using the CV method. At the end,
we will choose a model as the most appropriate if it has the lowest error estimate.
Leave-One-Out method. In this method, one observation is removed from the training set and the parameters are estimated based on the remaining observations. The model error is then calculated for the removed observation. Since only one observation is removed at each stage of the CV process, the number of iterations of the CV process equals the number of training samples. The method is therefore simple to implement, but the model must be trained once per sample, which can take a long time on large training sets. This method is sometimes called LOO for short.
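The LOO splits are easy to generate; a minimal sketch:

```python
import numpy as np

def loo_splits(n):
    """Yield (train_indices, test_index) pairs: each sample is held out once."""
    all_idx = np.arange(n)
    for i in range(n):
        yield np.delete(all_idx, i), np.array([i])
```

For n training samples this produces exactly n splits, which is why the model must be trained n times.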
Leave-P-Out method. If the number of observations removed from the training set is p rather than 1, the method is called Leave-P-Out, or LPO for short. As a result, if n denotes the number of observations in the training set, the number of steps in the CV process will be C(n, p) = n!/(p!(n − p)!), the number of ways to choose p observations from n. At each stage of the process, p observations are removed from the training data and the model is estimated based on the remaining observations. The model error is then calculated for the removed observations. Finally, the model error is estimated by averaging the errors obtained.
K-Fold method. If we randomly split the training samples into k equally sized subsets, or "folds," then at each stage of the CV process we can use k − 1 of these folds as the training set and the remaining one as the validation set. Fig. 2.7 illustrates the splitting of the training data into k folds. By selecting k = 5, for example, the number of iterations of the CV process will be 5, making it possible to reach an appropriate model quickly. This method is the gold standard for evaluating the performance of a machine learning algorithm.
Choosing the right number of folds is an important consideration in this approach. On the one hand, each fold must contain enough data to provide a good estimate of the model's performance; on the other hand, there must be enough folds to evaluate that performance reliably.
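The fold bookkeeping can be sketched as follows (in practice, scikit-learn's KFold class provides the same functionality):

```python
import numpy as np

def k_fold_splits(n, k, seed=0):
    """Shuffle n sample indices and split them into k folds; each fold
    serves once as the validation set while the others form the training set."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    for i, fold in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, fold
```

Averaging the k validation errors (one per split) then gives the CV estimate of the model error described above.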
Validation based on random sampling. In this method, sometimes known as Monte Carlo cross-
validation, the dataset is randomly divided into training and validation sets. The model parameters are
then estimated based on the training data and the error or accuracy of the model is calculated using the
validation data. By repeating the random separation of data, the mean error or accuracy of the models
is considered as the criterion for selecting the appropriate model (least error or highest accuracy). Due
to the random selection of data, the ratio of training data size and validation will not depend on the
number of iterations, and unlike the k-fold method, the CV process can be performed with any number
of iterations. Instead, due to the random selection of subsamples, some observations may never be used
in the validation section and others may be used more than once in the model error estimate calculations.
FIGURE 2.7
K-fold validation process.
FIGURE 2.8
Three core types of machine learning techniques differing in their approach.
Table 2.1 shows the shape of the training dataset in detail. In this table, you can see that the correct
output is provided for each input. Another name for the “correct output” is “class” or “label.”
Now that you know the meaning of the supervised process, let us look at how a supervised algorithm
works:
Step 1. Data preparation – the very first step, conducted before training a model in the supervised process, is to load labeled data into the system. This step usually takes the most time, as it includes data labeling and some preprocessing operations, such as removing invalid data. Most tasks at this stage are performed by a human. At the end of this step, the prepared dataset is divided into training and test sets.
Step 2. Training process – the goal of this step is to find a relationship between input and output with
acceptable accuracy. Machine learning algorithms are used to find such a relationship. The output of
this step is a model made for the problem.
Step 3. Testing process – the model built in the second step will be tested on new data in this step to
determine its performance in the face of new and unseen data.
Step 4. Prediction – when the model is ready after training and testing, it can start making a prediction
or decision when new data is given to it.
There are two main supervised learning techniques: Regression and Classification. Table 2.2 shows
a summary of what they perform.
A classification algorithm classifies the input data (new observation) into one of several predefined
classes. It learns from the available dataset and then uses this learning to classify new observations.
There are two types of classification, which are binary and nonbinary classification. The classification
of humans into two groups with diabetes and those without is an example of a binary classification.
Protein family classification is an example of nonbinary classification. In this problem, the proteins are
classified into classes that share similar function.
The structure of the training data of the classification problem, i.e., input and correct output pairs,
looks like in Table 2.3.
WHAT IS A FEATURE?
A feature in machine learning is any column value in the dataset that describes a piece of data.
For example, in the diagnosis of diabetes in a human, Pregnancies, Glucose, Blood Pressure,
etc., are examples of features. Note that we use features as independent variables.
Regression is another useful application of supervised machine learning algorithms; it is used to find a relationship between variables (features). It attempts to predict the output value when the input value is given. In contrast to classification, regression does not determine a class; instead, it involves predicting a numerical value.
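A toy contrast between the two tasks (the threshold and coefficients here are hypothetical, purely for illustration):

```python
def classify_diabetes(glucose):
    """Classification: map an input to one of a fixed set of class labels."""
    return 1 if glucose > 140 else 0          # hypothetical decision threshold

def predict_blood_pressure(age):
    """Regression: map an input to a continuous numerical value."""
    return 0.5 * age + 90.0                   # hypothetical linear relation
```

The classifier's output is always one of the predefined labels {0, 1}, whereas the regressor can output any number on a continuous scale.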
FIGURE 2.9
Several supervised and unsupervised algorithms.
find all kinds of unknown patterns in data. Something to bear in mind is that clustering and classification
are distinct terms. Some clustering approaches are:
• Partitioning methods
• Hierarchical clustering
• Fuzzy clustering
• Density-based clustering
• Model-based clustering
Why reduce dimensionality? Among the reasons are time or space complexity, desire to reduce the
cost of viewing and collecting additional and unnecessary data, and having better visualization when
data is 2D or 3D. Fig. 2.9 depicts several most used supervised and unsupervised algorithms.
a set of learning problems in which an “agent” must perform “actions” in an “environment” in order to
maximize the defined “reward function.”
Unlike in supervised learning, in reinforcement learning there is no labeled data, i.e., no correct input–output pairs. Thus, a large part of learning takes place "online": the agent actively interacts with its environment over many repetitions and gradually learns the "policy" that describes what to do in order to maximize the "reward."
Reinforcement learning has different goals than unsupervised learning. While the goal of unsupervised learning is to explore the distribution of the data in order to learn more about it, reinforcement learning aims to discover the policy that maximizes the "total cumulative reward" for the agent.
Q-learning and SARSA (State–Action–Reward–State–Action) are two popular, model-free algorithms for reinforcement learning. The difference between these algorithms lies in their search strategies.
2.6.1 Tensors
Tensor may be a new word for you. A tensor is a generalization of vectors and matrices: a multidimensional array of numbers arranged along any number of axes. Typically, deep learning uses tensors as the primary data structure. Tensors are the basis of this field, which is why Google's TensorFlow is so named. Now, what is a tensor? A tensor is simply a container for storing data. Let us see the several types of tensors:
Scalars (zero-dimensional tensors). A tensor that contains only one number is called a scalar. This
number can be an integer or a decimal number.
Vectors (one-dimensional tensors). An array of numbers or a one-dimensional tensor is called a vector.
In mathematical texts, we often see vectors written as follows:
⎡ ⎤
x1
⎢.⎥
x = ⎣ .. ⎦
xn
or [x1 , . . . , xn ].
A one-dimensional tensor has exactly one axis. If an array has four elements, it is called a 4D vector. Note the difference between a 4D vector and a 4D tensor: the 4D vector has only one axis containing four components, while the 4D tensor has four axes.
Matrices (two-dimensional tensors). A vector of vectors, or an array of arrays, is called a matrix. A matrix has two axes, known as the row axis and the column axis. For example, a matrix with three rows and two columns contains six numbers arranged in a 3 × 2 grid.
FIGURE 2.10
3D tensor.
FIGURE 2.11
4D tensor.
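These tensor types map directly onto NumPy arrays, where the number of axes is reported by ndim (a quick sketch):

```python
import numpy as np

scalar = np.array(3.14)                # 0D tensor: a single number
vector = np.array([1, 2, 3, 4])        # 1D tensor: a 4D vector (one axis, four components)
matrix = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])            # 2D tensor: three rows, two columns
tensor3 = np.zeros((2, 3, 4))          # 3D tensor, e.g., a stack of matrices
tensor4 = np.zeros((10, 2, 3, 4))      # 4D tensor: four axes (e.g., a batch of 3D tensors)
```

Note again the 4D-vector vs 4D-tensor distinction: `vector` has one axis with four components (ndim == 1), while `tensor4` has four axes (ndim == 4).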