Machine Learning for Beginners: Learn to Build Machine Learning Systems Using Python (English Edition)
By Harsh Bhasin
About this ebook
Towards the end, the book gives a brief overview of Unsupervised Learning. Various Feature Extraction techniques, such as Fourier Transform, STFT, and Local Binary Patterns, are covered. The book also discusses Principal Component Analysis and its implementation.
CHAPTER 1
An Introduction to Machine Learning
With the advancements in technology, data collection has become easy. When you turn on location on your mobile, upload your pictures to Facebook or Instagram, fill in online forms, browse websites, or even order items from an e-commerce website, your data is collected. What do companies do with this huge volume of data? They analyze it and find your preferences, which helps them in marketing. The advertisements shown to you generally depend on these things. Marketing professionals must lure you into buying something that you need or are even remotely interested in. Your data helps them. Likewise, the authorities may keep track of suspicious activities, track the source of transactions, or gather other important information using this data. However, this is easier said than done. The data is huge, and its analysis cannot be done using conventional methods.
Let us consider another example to understand this. Suppose Hari visits YouTube every day and watches videos related to Indian Classical Music, Hindi Poetry, and Lizzie McGuire. His friend Tarush goes to YouTube and watches Beer Biceps and other videos related to workouts. After some time, YouTube starts suggesting different relevant videos to both of them. While Hari is shown a video related to Lizzie McGuire’s reboot or Dinkar in the list of recommended videos, Tarush is not recommended any such video. On the other hand, Tarush is shown a recommendation for a workout video.
It may be stated that recommendation requires an in-depth analysis and cannot be done solely with conventional algorithms. Those using e-commerce websites or popular streaming platforms like YouTube will know that the recommendations are mostly good, if not excellent. Here the task is prediction. Your browsing history helps in this task, and for sure, it cannot be accomplished by conventional algorithms. Moreover, the improvement in the output over time means there is a well-defined performance measure for the task.
Machine learning comes to the rescue of those wanting to analyze this huge data, predict trends, find patterns, and so on. This chapter introduces machine learning, discusses its types, explains how the given data is divided, and discusses its pipeline. This chapter also presents an overview of the history of machine learning and its applications.
Structure
The main topics covered in this chapter are as follows:
Conventional algorithm and machine learning
Types of learning
Working
Applications of machine learning
History of machine learning
Objective
After reading this chapter, the reader will be able to:
Understand the definition and types of machine learning
Understand the working of a machine learning algorithm
Appreciate the applications of machine learning
Learn about the history of machine learning
Conventional algorithm and machine learning
The algorithmic solution of a problem requires the input data and a program to produce an output. Here, a program is a set of instructions, and the output is generated by applying those instructions to the input data. In a machine learning algorithm, the system takes the input data along with examples of the output (in the case of supervised learning). It creates a model, which establishes (or tries to establish) some relation between the input and the output. Learning, in general, is improving the outcome using experience (E). How do we know that we have improved? The performance measure tells us how well our model performs. As per Tom Mitchell, machine learning can be defined as follows.
If the performance measure (P) improves with experience (E) on task (T), then the system is said to have learned.
Here, the Task (T) can be classification, regression, clustering, and so on. The data constitutes Experience (E). The Performance Measure (P) can be accuracy, specificity, sensitivity, F-measure, sum of squared errors, and so on. These terms will be defined as we proceed. To understand this, let us consider an example of disease classification using Magnetic Resonance Imaging (MRI). If the fraction of patients correctly classified (accuracy) is taken as the performance measure, then this problem can be defined as follows:
T: Classify given patients as diseased or not-diseased
P: Accuracy
E: The MRI images of a patient
The task will be accomplished by pre-processing the given data, extracting relevant features from the pre-processed data, selecting the most important features, applying a classification algorithm followed by post-processing. In general, a machine learning pipeline constitutes the following steps (Figure 1.1):
Figure 1.1: Machine learning pipeline
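The steps named above can be sketched in code. The following is only a minimal illustration, assuming scikit-learn is installed and using its bundled copy of the Wine dataset (described later in this chapter). The choice of StandardScaler for pre-processing, SelectKBest for feature selection, and logistic regression for classification is an assumption made for this sketch, not the method prescribed by the book. Feature extraction and post-processing are omitted.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load the data and hold out 30% of it for testing.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Chain the stages so that each one is fitted on the training data only.
pipeline = Pipeline([
    ("preprocess", StandardScaler()),               # pre-processing
    ("select", SelectKBest(f_classif, k=8)),        # feature selection (k=8 is arbitrary)
    ("classify", LogisticRegression(max_iter=1000)),  # classification
])
pipeline.fit(X_train, y_train)

# Performance measure P: accuracy on unseen samples.
accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the stages in a single Pipeline object keeps the test data out of the fitting of the scaler and the selector, which is one common way to avoid leaking information from the test set.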
These terms will become clear in the following chapters. Pre-processing is discussed in the second chapter, which also introduces the idea of feature selection. The next six chapters discuss supervised learning techniques, and the last chapter introduces feature extraction. I decided to discuss feature extraction at the end because some of the techniques require knowledge of concepts introduced in the previous seven chapters. Having seen the definition of machine learning, let us now have a look at its types.
Types of learning
Machine learning can be classified as supervised, unsupervised, or semi-supervised. This section gives a brief overview of the types.
Supervised machine learning
This type of learning uses the labels of the data in the training set to predict the label of a sample in the test set. The training set acts as a teacher in this type of algorithm, which supervises the training process. The data in these algorithms contain samples and their correct labels. The training process tries to uncover the pattern hidden in the data. That is, the learning aims to relate the labels Y with the data X as y = f(x), where x is a sample, and y is the label.
If the label y is a discrete value, then the process is termed classification. If y is a real value, then it is called regression. Chapter 3 of this book introduces regression, and Chapter 4 to Chapter 8 discuss classification algorithms.
Examples of classification are face detection, voice detection, object detection, and so on. Classification essentially means placing the given sample into one of the predefined categories. Examples of regression include predicting the price of a commodity, predicting temperature, housing price, and so on.
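The difference between the two kinds of label can be seen in a short sketch. It assumes scikit-learn and its bundled Wine and Diabetes datasets. The two models (k-nearest neighbours and linear regression) are chosen here only for illustration. For brevity the models predict on a sample they were trained on, which shows the type of the output but says nothing about how good the model is.

```python
from sklearn.datasets import load_diabetes, load_wine
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Classification: y is one of a few discrete categories
# (scikit-learn encodes the three wine classes as 0, 1, 2).
X_cls, y_cls = load_wine(return_X_y=True)
classifier = KNeighborsClassifier(n_neighbors=5).fit(X_cls, y_cls)
class_prediction = classifier.predict(X_cls[:1])
print("Predicted class:", class_prediction[0])

# Regression: y is a real number (a disease-progression score).
X_reg, y_reg = load_diabetes(return_X_y=True)
regressor = LinearRegression().fit(X_reg, y_reg)
value_prediction = regressor.predict(X_reg[:1])
print("Predicted value:", value_prediction[0])
```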
Unsupervised learning
This type of learning uses the input data (X) but no labels. The learning aims to learn about the data by grouping similar samples or by deducing associations. Since there is no teacher involved in the algorithm, it is called unsupervised learning. Clustering and association come under unsupervised learning. Clustering uncovers the groupings in the data. Association, on the other hand, uncovers the rules which associate the events. Chapter 9 of this book discusses clustering.
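As a rough illustration of clustering, the sketch below groups the samples of the Wine dataset without ever looking at their labels. It assumes scikit-learn. K-means and the choice of three clusters are assumptions made for this example, not something this chapter specifies.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Only X is used; the labels are deliberately discarded.
X, _ = load_wine(return_X_y=True)

# Put all features on a comparable scale before measuring distances.
X_scaled = StandardScaler().fit_transform(X)

# Ask for three groups and assign every sample to one of them.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_scaled)

# Number of samples that fell into each group.
print(np.bincount(cluster_ids))
```

The cluster numbers are arbitrary identifiers. They need not match the numbering of the true wine classes, even when the groupings themselves are similar.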
There is something in between supervised and unsupervised learning. It is called semi-supervised learning. In this type of learning, a part of the input data may be labeled. Many practical problems fall into this category.
Working
This section discusses the working of a machine learning algorithm. We begin with understanding the data. It is followed by the division of data into train and test sets. The learning algorithm is then applied to the training data, and the performance is then measured.
Data
In the discussion that follows, the data is represented by X, which is a matrix with n rows and m columns (an n × m matrix). Here, n is the number of samples, and m is the number of features in each sample. The labels are represented by y, which is an n × 1 matrix. It may be noted that the ith row of y contains the label corresponding to the ith row of X.
For example, consider the Wine dataset available at the UCI Machine Learning Repository. The data considers attributes of wines from three different cultivars but from the same region in Italy. The dataset has 13 features, which are as follows (as per the official documentation at https://archive.ics.uci.edu/ml/datasets/Wine):
Alcohol
Malic acid
Ash
Alkalinity of ash
Magnesium
Total phenols
Flavanoids
Nonflavanoid phenols
Proanthocyanins
Color intensity
Hue
OD280/OD315 of diluted wines
Proline
The label is the class of the wine (1, 2, or 3). The number of samples in the dataset is 178. That is, the values of the 13 features determine the class of the wine. The value of n is 178, and that of m is 13. The data, X, is a 178 × 13 array, and the response variable, y, is a 178 × 1 array. Obtaining the data is followed by pre-processing, which involves many things, including removing null values. Some of these techniques are discussed in the second chapter. Once you have the data, create a train set and a test set out of it.
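These dimensions can be checked directly. The sketch below assumes scikit-learn, which ships a copy of the Wine dataset. Two details of that copy differ from the description above: the classes are encoded as 0, 1, and 2 rather than 1, 2, and 3, and y is returned as a one-dimensional array of length 178 rather than a 178 × 1 matrix.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
X, y = wine.data, wine.target

# n samples and m features.
n, m = X.shape
print(n, m)  # 178 13

# Names of the 13 features and the distinct class labels.
print(wine.feature_names)
print(np.unique(y))

# A simple pre-processing check: count missing (NaN) values.
missing = int(np.isnan(X).sum())
print("Missing values:", missing)
```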
Train test validation data
Suppose you are given the responsibility of teaching a topic to a group of students. How will you find whether your students have understood the topic? You will probably give a test, and based on the performance in the test, you will judge how well the topic has been understood. Wait! The performance is indicative of the learning only if the questions asked are not the same as those discussed while teaching (or during training). It is because giving the same questions in the test will judge how well the students can memorize, not their understanding of the topic.
The same is true for a machine learning model. The data used in the training phase should not be used for testing. So, to have confidence in the model, the given data is divided into two parts, train and test (Figure 1.2):
Figure 1.2: Splitting the data into train set and test set
Well! A single split may also mislead us into believing that the model so developed is good (or bad), because the result depends on which samples happen to land in the test set. So, we randomly split the data into train data (x%) and test data ((100 − x)%) and find the accuracy. We repeat this process many times and take the average accuracy. It increases the confidence in the model so developed.
It may also be stated that while training, we may need to use some data for testing. We cannot use the test data because of the reasons stated above. So we divide the train data into train and validation sets. Train the model using the train data, and once the parameters are learned, test the model using the test data (Figure 1.3):
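The repeated random split described above can be sketched as follows. The example assumes scikit-learn and its bundled Wine dataset. The 70/30 split, the 20 repetitions, and the k-nearest-neighbours classifier are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
train_percent = 70  # the x% of the data used for training

accuracies = []
for seed in range(20):
    # A different seed gives a different random split each time.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=train_percent / 100, random_state=seed
    )
    model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    accuracies.append(model.score(X_test, y_test))

# The average over all splits is less sensitive to one lucky
# or unlucky split than any single accuracy value.
average_accuracy = sum(accuracies) / len(accuracies)
print(f"Lowest: {min(accuracies):.2f}  Highest: {max(accuracies):.2f}")
print(f"Average accuracy over {len(accuracies)} splits: {average_accuracy:.2f}")
```

Printing the lowest and highest values alongside the average shows how much a single split can vary.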
Figure 1.3: Splitting the data into train set, test set, and validation set
While learning, this validation data can be used to see the training performance. Once the model has learned, the test data can be used to test the model.
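One way to obtain the three sets is to split twice, as in the sketch below. It assumes scikit-learn and its bundled Wine dataset. The split proportions, the k-nearest-neighbours classifier, and the candidate values for the number of neighbours are all assumptions made for this example. The validation set is used to choose a setting, and the test set is touched only once at the end.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

# First split: set the test data aside (about 20% of all samples).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
# Second split: divide the remainder into train and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0, stratify=y_trainval
)

# Use the validation set to compare candidate settings.
best_k, best_val_accuracy = None, -1.0
for k in (1, 3, 5, 7, 9):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_accuracy = model.score(X_val, y_val)
    if val_accuracy > best_val_accuracy:
        best_k, best_val_accuracy = k, val_accuracy

# Only now is the test set used, once, with the chosen setting.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print(f"Chosen k: {best_k}, validation accuracy: {best_val_accuracy:.2f}")
print(f"Test accuracy: {test_accuracy:.2f}")
```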
Another approach to validation is referred to as cross-validation. In this approach, the given data is divided into k parts. One of these k parts (say part 1) is used for testing and the rest for training. The process is repeated, taking part 2 as the test data and the rest as the train data. Likewise, k such models are created, and the average performance of these k models is reported. For example, in Figure 1.4, the value of k is 6. The data is split into six parts, and in each iteration, one of the parts is used for testing and the rest for training. The performance of the model is reported as the average of the six models:
Figure 1.4: K-Fold validation: K=6
To summarize, K-Fold is better than the train-test split as it gives more confidence in the results. However, the volume of the data must be considered before applying K-Fold. Also, in K-Fold, you take the average performance of K models and declare it as the output. Having seen the methods of division of data into train and test, let us move forward.
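The six-fold procedure can be sketched as follows, assuming scikit-learn and its bundled Wine dataset. The k-nearest-neighbours classifier is again an arbitrary choice for illustration. The data is shuffled before it is divided because the samples in this dataset are stored in order of class, so unshuffled folds would not be representative.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

# Divide the data into k = 6 parts after shuffling.
folds = KFold(n_splits=6, shuffle=True, random_state=0)

# Each part serves as the test data exactly once.
fold_sizes = [len(test_index) for _, test_index in folds.split(X)]
print("Test samples per fold:", fold_sizes)

# Train and evaluate one model per fold, then average the six results.
fold_accuracies = cross_val_score(
    KNeighborsClassifier(n_neighbors=5), X, y, cv=folds
)
mean_accuracy = fold_accuracies.mean()
print("Accuracy per fold:", fold_accuracies.round(2))
print(f"Average accuracy: {mean_accuracy:.2f}")
```

Since 178 is not divisible by 6, the folds cannot all be the same size; the sizes differ by at most one sample.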
Rest of the steps
The division of data is followed by choosing the target function and its representation. The learning algorithm is then applied to the training set. The algorithm learns its parameters using the training set, and the learned model is then applied to the test set. We will learn more about learning as our journey proceeds. The performance measures, which tell you how good your algorithm is, are discussed in the next chapter.
Having seen the outline of the machine learning pipeline, let us have a look at some of the exciting applications of machine learning.
Applications
Machine learning is an involved task, and along with other things, it requires algorithms that learn from data. Machine learning has been successfully applied in many domains and disciplines. From Social Science to Drug Development, ML has proved its mettle everywhere. It is creating history, and we are a part of this history. Let us not just watch it; let us live this history and immerse ourselves in ML.
Natural Language Processing (NLP)
NLP aims to process and understand natural language. It involves linguistics, engineering, and artificial intelligence. The advancements in ML have greatly helped the field. One of the fascinating examples of the advancement in this field is Alexa, the voice assistant by Amazon.
Weather forecasting
This discipline aims at predicting the weather conditions at a particular location, using the past data available. It may be stated that even before the advent of machine learning, or even computers, the weather was being predicted. However, the latest technologies have helped improve this prediction.
Robot control
As per the literature review, the mechanical parts of the robot are generally controlled by software. This software may fail if it does not update itself with time, or learn. At this point, ML comes into play. ML helps a robot in making intelligent decisions using the training data.
Speech recognition
Speech recognition aims at the translation of spoken language into text by computers. It requires computational linguistics and computer science. This field helps in understanding