ML U1 2
Machine Learning: 1) Job roles: Machine Learning Engineer, Data Architect, Data Mining Scientist, Data Specialist, Cloud Architect, Cyber Security Analyst, and more. 2) Skills: statistics, probability and data modelling; programming skills; applying ML libraries and algorithms; software design; Python. 3) Average base pay: 1123 k/year.
Artificial Intelligence: 1) Job roles: Machine Learning Engineer, Data Scientist, Business Intelligence Developer, Big Data Architect, Research Scientist. 2) Skills: mathematical and algorithmic skills; knowledge of probability and statistics; expertise in programming; awareness of advanced signal processing techniques; being well versed with Unix tools. 3) Average base pay: 14.3 lakhs per annum.
Data Science: 1) Job roles: Data Engineer, Data Scientist, Data Analyst, Data Architect, Database Administrator, Machine Learning Engineer, Statistician, Business Analyst, Data and Analytics Manager. 2) Skills: programming skills; statistics; machine learning; multivariable calculus and linear algebra; data visualisation and communication; software engineering; data intuition. 3) Average base pay: 1050 k/year.
1) Learning that takes place based on a class of examples is referred to as supervised learning. It is learning based on labelled data. In short, while learning, the system has knowledge of a set of labelled data. This is one of the most common and frequently used learning methods. 2) The supervised learning method comprises a series of algorithms that build mathematical models of data sets containing both the inputs and the desired outputs for that particular machine. 3) The data fed into a supervised learning method is known as training data; it essentially consists of training examples, each containing one or more inputs and typically a single desired output. This output is known as a "supervisory signal." 4) In supervised learning, each training example is represented by an array, also known as a vector or feature vector, and the training data as a whole is represented by a matrix. 5) The algorithm uses iterative optimization of an objective function to predict the output that will be associated with new inputs. 6) Ideally, if the supervised learning algorithm is working properly, the machine will be able to correctly determine the output for inputs that were not part of the training data. 7) Supervised learning uses classification and regression techniques to develop predictive models. Classification techniques predict categorical responses. 8) Regression techniques predict continuous responses, for example, changes in temperature or fluctuations in power demand. Typical applications include electricity load forecasting and algorithmic trading.
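The points above map directly onto a few lines of code. Below is a minimal sketch, assuming scikit-learn is available, of fitting a classifier on labelled data; the feature matrix X, the labels y, and the choice of LogisticRegression are illustrative assumptions, not part of the notes:

    # A minimal supervised-learning sketch (assumes scikit-learn is installed).
    # Each row of X is a feature vector, X as a whole is the training matrix,
    # and y holds the supervisory signals. All values are toy stand-ins.
    from sklearn.linear_model import LogisticRegression

    X = [[0.5, 1.2], [1.1, 0.9], [3.0, 3.5], [2.8, 3.1]]  # training matrix
    y = [0, 0, 1, 1]                                      # labelled outputs

    clf = LogisticRegression()
    clf.fit(X, y)                     # learn from the labelled examples
    print(clf.predict([[2.9, 3.4]]))  # predict the label of an unseen input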
Advantages of supervised learning: 1) With the help of supervised learning, the model can predict the output on the basis of prior experience. 2) In supervised learning, we can have an exact idea about the classes of objects. 3) Supervised learning models help us solve various real-world problems such as fraud detection.
Disadvantages of supervised learning: 1) Supervised learning models are not suitable for handling complex tasks. 2) Supervised learning cannot predict the correct output if the test data is different from the training dataset. 3) Training requires a lot of computation time. 4) In supervised learning, we need enough knowledge about the classes of objects.
1) Unsupervised learning refers to learning from unlabeled data. It is based more on similarities and differences than on anything else. In this type of learning, all similar items are clustered together in a particular class, where the label of the class is not known. 2) It is not possible to learn in a supervised way in the absence of properly labeled data; in such scenarios there is a need to learn in an unsupervised way. Here the learning is based more on the similarities and differences that are visible, and these differences and similarities are represented mathematically in unsupervised learning. 3) Given a large collection of objects, we often want to be able to understand these objects and visualize their relationships. For an example based on similarities, a child can separate birds from other animals. It may use some property or similarity while separating, such as that birds have wings. 4) The criterion in the initial stages is the most visible aspect of those objects. Linnaeus devoted much of his life to arranging living organisms into a hierarchy of classes, with the goal of placing similar organisms together at all levels of the hierarchy. Many unsupervised learning algorithms create similar hierarchical arrangements based on similarity mappings.
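As a hedged illustration of similarity-based grouping, the sketch below clusters unlabeled points with scikit-learn's KMeans; the coordinates and the choice of two clusters are invented for the example:

    # A minimal unsupervised-learning sketch (assumes scikit-learn is installed).
    # No labels are given; KMeans groups the points purely by distance-based
    # similarity, as described above. The points and k=2 are illustrative only.
    from sklearn.cluster import KMeans

    X = [[0.1, 0.2], [0.2, 0.1], [5.0, 5.1], [5.2, 4.9]]  # unlabeled data

    model = KMeans(n_clusters=2, n_init=10, random_state=0)
    labels = model.fit_predict(X)  # cluster indices, not true class labels
    print(labels)                  # e.g. [0 0 1 1] -- similar items grouped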
Advantages of unsupervised learning: 1) Unsupervised learning can be used for more complex tasks than supervised learning because, in unsupervised learning, we do not need labeled input data. 2) Unsupervised learning is preferable in that it is easier to obtain unlabeled data than labeled data.
Disadvantages of unsupervised learning: 1) Unsupervised learning is intrinsically more difficult than supervised learning, as it has no corresponding output labels. 2) The result of an unsupervised learning algorithm might be less accurate, as the input data is not labeled and the algorithm does not know the exact output in advance.
1) Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty. 2) In Reinforcement Learning, the agent learns automatically from feedback, without any labeled data, unlike supervised learning. 3) Since there is no labelled data, the agent is bound to learn from its experience only. 4) RL solves a specific type of problem where decision making is sequential and the goal is long-term, such as game playing, robotics, etc. 5) The agent interacts with the environment and explores it by itself. The primary goal of an agent in reinforcement learning is to improve its performance by collecting the maximum positive reward. The agent learns by trial and error, and based on that experience, it learns to perform the task in a better way. Hence, we can say that "Reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within it." How a robotic dog learns the movement of its arms is an example of reinforcement learning. 6) It is a core part of Artificial Intelligence, and all AI agents work on the concept of reinforcement learning. Here we do not need to pre-program the agent, as it learns from its own experience without any human intervention.
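The feedback loop described above can be made concrete with a tiny tabular Q-learning sketch in plain Python; the corridor environment, the rewards, and the hyperparameters are all invented assumptions:

    # A minimal tabular Q-learning sketch (pure Python, no libraries needed).
    # The agent walks a 1-D corridor of 5 states; reaching state 4 gives a
    # reward of +1, every other step a small penalty of -0.01. All numbers
    # here are illustrative assumptions, not values from the notes.
    import random

    n_states, actions = 5, [-1, +1]        # move left or right
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

    for episode in range(200):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy: mostly exploit the Q-table, sometimes explore
            a = random.choice(actions) if random.random() < epsilon \
                else max(actions, key=lambda b: Q[(s, b)])
            s2 = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else -0.01   # feedback signal
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
            s = s2

    print(max(actions, key=lambda b: Q[(0, b)]))  # learned best first move: +1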
Parametric Models: 1) Parametric methods use a fixed number of parameters to build the model. 2) A parametric analysis is for testing group means. 3) It is applicable only for variables. 4) It always makes strong assumptions about the data. 5) Parametric methods require less data than non-parametric methods. 6) Parametric methods handle interval data or ratio data. 7) Parametric methods assume a normal distribution. 8) The output generated by parametric methods can be easily affected by outliers. 9) Parametric methods function well in many situations, but their performance is at its peak when the spread of each group is different. 10) Parametric methods have more statistical power than non-parametric methods. 11) As far as computation is concerned, these methods are computationally faster than non-parametric methods. 12) Examples: Logistic Regression, Naive Bayes model, etc.
Non-Parametric Models: 1) Non-parametric methods use a flexible number of parameters to build the model. 2) A non-parametric analysis is for testing medians. 3) It is applicable for both variables and attributes. 4) It generally makes fewer assumptions about the data. 5) Non-parametric methods require much more data than parametric methods. 6) Non-parametric methods handle ordinal data. 7) There is no assumed distribution in non-parametric methods. 8) The output generated cannot be seriously affected by outliers. 9) Non-parametric methods can perform well in many situations, but their performance is at its peak when the spread of each group is the same. 10) Non-parametric methods have less statistical power than parametric methods. 11) As far as computation is concerned, these methods are computationally slower than parametric methods. 12) Examples: KNN, Decision Tree model, etc.
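To make the fixed-versus-flexible parameter count concrete, here is a minimal sketch, assuming scikit-learn, that fits the two example models named in the lists (logistic regression and KNN) on the same invented toy data:

    # Parametric vs. non-parametric sketch (assumes scikit-learn is installed).
    # LogisticRegression compresses the data into a fixed set of coefficients;
    # KNeighborsClassifier keeps the training data itself as its "parameters".
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier

    X = [[1.0], [2.0], [3.0], [4.0]]   # toy feature values
    y = [0, 0, 1, 1]                   # toy class labels

    param = LogisticRegression().fit(X, y)
    nonparam = KNeighborsClassifier(n_neighbors=3).fit(X, y)

    print(param.coef_, param.intercept_)  # fixed number of learned parameters
    print(nonparam.predict([[2.5]]))      # prediction from stored neighbours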
1) A data format describes how the input data is laid out in memory. 2) Each machine learning application performs well for a particular data format and worse for others, so choosing the correct format is a major optimisation technique. 3) There are four commonly used data formats: (1) NHWC (2) NCHW (3) NCDHW (4) NDHWC.
Each letter in these formats denotes a particular aspect or dimension of the data:
(i) N: Batch size: the number of images passed together as a group for inference.
(ii) C: Channel: the number of data components that make up a data point of the input data. It is 3 for opaque images and 4 for transparent images.
(iii) H: Height and W: Width: the spatial dimensions of each image.
(iv) D: Depth: the number of frames stacked along the depth (temporal) dimension, used for video data.
(1) NHWC: NHWC denotes (Batch Size, Height, Width, Channel). This means that there is a 4D array where the first dimension represents the batch size, and so on. This 4D array is laid out in memory in row-major order. [Commonly used data: images]
(2) NCHW: NCHW denotes (Batch Size, Channel, Height, Width). This means that there is a 4D array where the first dimension represents the batch size, and so on. This 4D array is laid out in memory in row-major order. [Commonly used data: images]
(3) NCDHW: NCDHW denotes (Batch Size, Channel, Depth, Height, Width). This means that there is a 5D array where the first dimension represents the batch size, and so on. This 5D array is laid out in memory in row-major order. [Commonly used data: video]
(4) NDHWC: NDHWC denotes (Batch Size, Depth, Height, Width, Channel). This means that there is a 5D array where the first dimension represents the batch size, and so on. This 5D array is laid out in memory in row-major order. [Commonly used data: video. Software: TensorFlow]
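A small NumPy sketch shows what converting between the two 4D image layouts amounts to in practice; the batch of 8 RGB 32x32 images is an invented example:

    # NHWC <-> NCHW layout sketch (assumes NumPy is installed).
    # A batch of 8 RGB images, 32x32 pixels, stored channels-last (NHWC).
    import numpy as np

    nhwc = np.zeros((8, 32, 32, 3))    # (Batch, Height, Width, Channel)

    # Moving the channel axis to position 1 yields the channels-first layout.
    nchw = nhwc.transpose(0, 3, 1, 2)  # (Batch, Channel, Height, Width)
    print(nchw.shape)                  # (8, 3, 32, 32)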
1) Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. 2) Statistical learning theory deals with the statistical inference problem of finding a predictive function based on data. 3) Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition and bioinformatics. 4) Statistical learning is a set of tools for understanding data. 5) These tools fall into two classes: supervised learning and unsupervised learning. 6) Statistical learning is mathematically intensive, is based on coefficient estimation, and requires a good understanding of the data. On the other hand, machine learning identifies patterns from the dataset through iterations, which does not require much human effort.
Lexical Acquisition: 1) The role of statistical learning in language acquisition is well documented in lexical acquisition. 2) One important contribution to infants' ability to segment words from speech is their ability to recognize the statistical regularities of the speech heard in their environment.
Statistical algorithms: 1) Statistical algorithms create a statistical model of the input data, which is in most cases represented as a probabilistic tree data structure. 2) Subsequences with a higher frequency are represented with shorter codes. 3) A popular such algorithm in machine learning and statistics is linear regression. 4) This model assumes a linear relationship between the input and the output variable. 5) It is represented in the form of a linear equation which maps a set of inputs to a predicted output.
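As a sketch of point 5, the linear equation y = b0 + b1*x can be fitted by least squares with NumPy; the sample points below are invented:

    # Linear-regression sketch (assumes NumPy is installed).
    # Fits y = b0 + b1*x to a few invented points by least squares.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 8.1])  # roughly y = 2x

    b1, b0 = np.polyfit(x, y, deg=1)    # slope, intercept
    print(b0 + b1 * 5.0)                # predicted output for input x = 5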
Types of statistical analysis: (i) Descriptive statistical analysis, (ii) Inferential statistical analysis, (iii) Associational statistical analysis, (iv) Predictive analysis, (v) Prescriptive analysis, (vi) Exploratory data analysis, (vii) Causal analysis, (viii) Data collection.
Geometric Models: 1) In geometric models, features can be described as points in two dimensions (x- and y-axes) or in a three-dimensional space (x, y and z). Even when features are not intrinsically geometric, they can be modelled in a geometric manner (for example, temperature as a function of time can be modelled on two axes). In geometric models, there are two ways we can impose similarity. 2) We can use geometric concepts like lines or planes to segment (classify) the instance space. These are called linear models. 3) Alternatively, we can use the geometric notion of distance to represent similarity. In this case, if two points are close together, they have similar feature values and can thus be classed as similar. We call such models distance-based models.
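A minimal sketch of the distance-based idea, assuming NumPy: classify a new point by its Euclidean distance to the mean of each class (a nearest-centroid rule); all coordinates are invented:

    # Distance-based (nearest-centroid) sketch using NumPy.
    # Each class is summarised by the mean of its points; a new point gets
    # the label of the closest centroid. All coordinates are illustrative.
    import numpy as np

    class_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    class_b = np.array([[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
    centroids = {"A": class_a.mean(axis=0), "B": class_b.mean(axis=0)}

    new_point = np.array([0.8, 0.4])
    label = min(centroids, key=lambda c: np.linalg.norm(new_point - centroids[c]))
    print(label)  # "A" -- the geometrically closer class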
Probabilistic Models: 1) In contrast to deterministic models, where the relationship between quantities is already known, probabilistic models are based on the assumption that the relationship between quantities is only reasonably accurate, and that other components must also be taken into consideration. 2) Thus probabilistic models are statistical models, which use probability distributions to account for these components. 3) Probabilistic models form the basis of other areas such as machine learning, artificial intelligence, and data analysis. Their formulation and solution rest on the two basic rules of probability theory, that is, the sum rule and the product rule. 4) As an example: if one lives in a cold climate, one knows that traffic tends to be more difficult when snow falls and covers the roads. 5) We can go a step further and make a hypothesis: there will be a strong correlation between snowy weather and increased traffic incidents. Probabilistic models are used in a variety of disciplines, including statistical physics, quantum mechanics and theoretical computer science.
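The sum and product rules can be applied directly to the snow-and-traffic example; the sketch below uses invented probabilities purely to illustrate the computation:

    # Sum-rule / product-rule sketch for the snow-traffic example (pure Python).
    # All probability values are invented assumptions for illustration.
    p_snow = 0.2                    # P(snow)
    p_incident_given_snow = 0.3     # P(incident | snow)
    p_incident_given_clear = 0.05   # P(incident | no snow)

    # Product rule: P(incident, snow) = P(incident | snow) * P(snow)
    p_joint = p_incident_given_snow * p_snow

    # Sum rule: marginalise over the weather to get P(incident)
    p_incident = p_joint + p_incident_given_clear * (1 - p_snow)

    # Bayes' rule combines both: P(snow | incident)
    print(p_joint / p_incident)     # ~0.6 -- snow is likely given an incident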
Logical Models: 1) Logical models use a logical expression to divide the instance space into segments and hence construct grouping models. A logical expression is an expression that returns a Boolean value, i.e., a True or False outcome. Once the data is grouped using a logical expression, it is divided into homogeneous groupings for the problem we are trying to solve. 2) There are two types of logical models: tree models and rule models. Rule models consist of a collection of implications or IF-THEN rules. For tree-based models, the 'if' part defines a segment and the 'then' part defines the behaviour of the model for this segment. Rule models follow the same reasoning. Tree models can be seen as a particular type of rule model where the 'if' parts of the rules are organised in a tree structure. Both tree models and rule models use the same approach to supervised learning.
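A minimal sketch of a rule model in plain Python: two hand-written IF-THEN rules that segment a toy instance space (the features "wings" and "legs" echo the bird example earlier and are otherwise invented):

    # Rule-model sketch (pure Python). Each IF-THEN rule is a logical
    # expression that carves out one homogeneous segment of the instance
    # space. The features "wings" and "legs" are invented for illustration.
    def classify(animal):
        if animal["wings"]:      # IF it has wings THEN it is a bird
            return "bird"
        if animal["legs"] == 4:  # IF it has four legs THEN it is a mammal
            return "mammal"
        return "other"           # default segment

    print(classify({"wings": True, "legs": 2}))   # bird
    print(classify({"wings": False, "legs": 4}))  # mammal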
Grouping Models: 1) Tree models repeatedly split the instance space into smaller subsets. Trees are usually of limited depth and do not contain all the available features. 2) The subsets at the leaves of the tree partition the instance space with some finite resolution. Instances filtered into the same leaf of the tree are treated the same, regardless of any features not in the tree that might be able to distinguish them.
Grading Models: 1) Grading models do not use the notion of a segment. 2) They form one global model over the instance space. 3) Grading models are (usually) able to distinguish between arbitrary instances, no matter how similar they are; their resolution is, in theory, infinite, particularly when working in a Cartesian instance space. 4) Support vector machines and other geometric classifiers are examples of grading models; they work in a Cartesian instance space. 5) They exploit the minutest differences between instances.
Grouping and Grading Models: 1) The key difference between grouping and grading models is the way they handle the instance space. Grouping models break up the instance space into groups or segments, the number of which is determined at training time. They have a fixed resolution and cannot distinguish instances beyond that resolution. At the finest resolution, grouping models assign the majority class to all instances that fall into a segment. 2) They determine the right segments and then label all the objects in each segment.
Grouping versus Grading Models: 1) Some models combine the features of both grouping and grading models. Linear classifiers are a prime example of a grading model. 2) Instances on a line or plane parallel to the decision boundary cannot be distinguished by a linear model, so there is a grouping aspect; however, there are infinitely many such segments.
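A minimal sketch of the grading behaviour: a fixed linear classifier scores every instance with a real-valued quantity, so even minutely different instances receive different grades, while points on a line parallel to the boundary share a score; the weights and points are invented:

    # Grading-model sketch (pure Python): a fixed linear classifier
    # w.x + b whose real-valued score grades every instance individually.
    # The weights, bias, and points are invented for illustration.
    w, b = (1.0, 1.0), -1.0                   # decision boundary x + y = 1

    def score(p):
        return w[0] * p[0] + w[1] * p[1] + b  # signed distance-like grade

    print(score((0.60, 0.50)))  # ~0.10 -> positive side of the boundary
    print(score((0.61, 0.50)))  # ~0.11 -> a minutely different grade
    # Points with equal scores lie on a line parallel to the boundary and
    # cannot be distinguished -- the grouping aspect mentioned above.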
Reinforcement Learning: 1) Reinforcement learning helps you make your decisions sequentially. 2) It works by interacting with the environment. 3) In the RL method, each learning decision depends on the previous ones, so labels apply to sequences of dependent decisions. 4) It supports, and works better in, AI where human interaction is prevalent. 5) Example: a chess game.
Supervised Learning: 1) In this method, a decision is made on the input given at the beginning. 2) It works on examples or given sample data. 3) In supervised learning, the decisions are independent of each other, so a label is given for every decision. 4) It is mostly operated with interactive software systems or applications. 5) Example: object recognition.