Machine Learning Notes
Features of ML:
• Machine learning uses data to detect various patterns in a given dataset.
• It can learn from past data and improve automatically.
• It is a data-driven technology.
• Machine learning is similar to data mining in that it also deals with large amounts
of data.
2. Types of ML algorithms
There are four types of machine learning algorithms: supervised, semi-supervised,
unsupervised and reinforcement.
6. Ordinal Data:
Ordinal data is a kind of qualitative data that groups variables into ordered categories.
The categories have a natural order or rank based on some hierarchical scale, such as from
high to low, but there is no clearly defined interval between the categories.
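As a small illustration of the point above, ordered categories can be mapped to integer ranks. This is only a sketch: the category names and the helper `encode_ordinal` are made up for the example, and the resulting numbers carry order only, not distance.

```python
# Hypothetical ordered categories, listed from lowest to highest rank.
LEVELS = ["low", "medium", "high"]
RANK = {level: i for i, level in enumerate(LEVELS)}

def encode_ordinal(values):
    """Map each category to its rank. Order is preserved, but the gap
    between two ranks has no defined size."""
    return [RANK[v] for v in values]

codes = encode_ordinal(["high", "low", "medium", "low"])
print(codes)
```

Comparisons such as "medium is above low" are meaningful on these codes; arithmetic such as "high minus low equals 2" is not.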
→Soft margin SVM allows some misclassification to happen by relaxing the hard
constraints of the Support Vector Machine. It is implemented with the help of the
Regularization parameter (C), which controls how heavily misclassification is penalized:
a large C tolerates few margin violations, while a small C allows more of them in
exchange for a wider margin.
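The role of C can be seen in the usual textbook form of the soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses). The sketch below evaluates that objective for a fixed, hand-picked separator; the data, weights and function name are assumptions chosen for illustration, and no training is performed.

```python
def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses. Labels in y must be +1 or -1."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    hinge_total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        # Points with margin >= 1 cost nothing; violations cost 1 - margin.
        hinge_total += max(0.0, 1.0 - margin)
    return regularizer + C * hinge_total

# Toy data: the third point lies on the wrong side of the separator.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, -1]
w, b = [1.0, 0.0], 0.0

low_c = soft_margin_objective(w, b, X, y, C=1.0)
high_c = soft_margin_objective(w, b, X, y, C=10.0)
print(low_c, high_c)
```

The same misclassified point costs far more when C is large, which is why a large C pushes the optimizer toward fewer violations.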
16. Why do you prefer Euclidean distance over Manhattan distance in the K means
Algorithm:
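Euclidean distance is preferred because K-means updates each centroid to the mean of its
assigned points, and the mean is exactly the point that minimizes the sum of squared
Euclidean distances. Manhattan distance measures distance only along the horizontal and
vertical axes, and the point that minimizes it is the component-wise median rather than
the mean, so combining it with mean updates no longer guarantees that each step reduces
the clustering cost.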
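A minimal sketch of the two distances, together with a one-dimensional check of which centre each one favours. The sample values are made up; the point is only that the mean (the K-means centroid update) gives the smallest sum of squared Euclidean distances, while the median gives the smallest sum of absolute (Manhattan) distances.

```python
import math
from statistics import mean, median

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences (axis-aligned path length)."""
    return sum(abs(a - b) for a, b in zip(p, q))

d_euc = euclidean((0, 0, 0), (3, 4, 0))
d_man = manhattan((0, 0, 0), (3, 4, 0))

# 1-D illustration with made-up values.
points = [1.0, 2.0, 10.0]

def sum_squared(c):
    return sum((x - c) ** 2 for x in points)

def sum_absolute(c):
    return sum(abs(x - c) for x in points)

print(d_euc, d_man)
print(sum_squared(mean(points)), sum_squared(median(points)))
print(sum_absolute(mean(points)), sum_absolute(median(points)))
```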
17. Ways to avoid the problem of initialization sensitivity in the K means Algorithm:
There are two ways to avoid the problem of initialization sensitivity:
• Repeat k-means: The algorithm is executed repeatedly. ...
• k-means++: This is a smart centroid initialization technique.
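Both ideas above can be sketched in plain Python. This is an illustrative implementation under simple assumptions (points are tuples of floats, at least k distinct points exist, and all function names are made up here), not a reference implementation: k-means++ picks each new centroid with probability proportional to its squared distance from the nearest centroid already chosen, and the repeated version keeps the run with the lowest total squared distance.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: first centroid uniformly at random, each later one
    with probability proportional to squared distance to the nearest centroid."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

def kmeans(points, k, rng, iters=50):
    centroids = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia

def repeated_kmeans(points, k, restarts=10, seed=0):
    """Run k-means several times and keep the run with the lowest inertia."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])

# Two well-separated toy groups.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
best_centroids, best_inertia = repeated_kmeans(data, k=2)
print(best_centroids, best_inertia)
```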
18. Why is DBSCAN needed when we already have other clustering algorithms?
K-Means and Hierarchical Clustering both fail to create clusters of arbitrary shapes. They
are also unable to form clusters based on varying densities. That's why we need DBSCAN
clustering.
20. Expectation-Maximization:
The Expectation-Maximization (EM) algorithm is an iterative method that alternates between
an expectation (E) step and a maximization (M) step. It is used to determine local maximum
likelihood estimates (MLE) or maximum a posteriori estimates (MAP) of the parameters of
statistical models that depend on unobservable (latent) variables.
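As one concrete instance, EM is commonly used to fit a Gaussian mixture, where the unobserved variable is which component generated each point. The sketch below fits two 1-D Gaussians to made-up data; the initialization (smallest and largest value as starting means), the variance floor and all names are assumptions for the example.

```python
import math

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, iters=50):
    mu = [min(data), max(data)]   # crude starting means (assumption)
    var = [1.0, 1.0]
    weight = [0.5, 0.5]
    for _ in range(iters):
        # E-step: probability that each point belongs to each component.
        resp = []
        for x in data:
            scores = [weight[k] * gaussian_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: re-estimate parameters from the weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(spread, 1e-6)  # floor keeps the variance positive
            weight[k] = nk / len(data)
    return mu, var, weight

samples = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_two_gaussians(samples)
print(means, variances, weights)
```

EM only guarantees a local optimum, so a different starting point can lead to a different fit.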
22. Gradient:
In machine learning, a gradient is the derivative of a function that has more than one input
variable: it is the vector of partial derivatives, one per input. Known as the slope of a
function in mathematical terms, the gradient measures how much the error changes with
regard to a change in each of the weights.
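A minimal sketch of this idea: the gradient of an error function is estimated one weight at a time using central differences, then used for a single gradient-descent step. The error function, the sample values and the learning rate are hypothetical choices made for the example.

```python
def error(w):
    """Squared error of the linear model w[0] * x + w[1] on one made-up sample."""
    x, target = 2.0, 7.0
    return (w[0] * x + w[1] - target) ** 2

def numerical_gradient(f, w, h=1e-6):
    """Approximate each partial derivative of f at w with a central difference."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        up[i] += h
        down = list(w)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

w = [1.0, 1.0]
grad = numerical_gradient(error, w)

# One gradient-descent step: move each weight against its partial derivative.
learning_rate = 0.05
w_new = [wi - learning_rate * gi for wi, gi in zip(w, grad)]
print(grad, error(w), error(w_new))
```

Here the analytic gradient is (-16, -8), and stepping against it lowers the error.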
Decision Trees handle missing values in the following ways:
• Fill the missing attribute value with the most common value of that attribute.
• Fill the missing value by assigning a probability to each of the possible values of the
attribute based on other samples.
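Both strategies can be sketched in a few lines. The attribute values below are made up, `None` stands for a missing value, and the helper names are hypothetical; the probabilities are simply the relative frequencies of the observed values.

```python
from collections import Counter

def fill_most_common(values):
    """Replace each missing value (None) with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    most_common = Counter(observed).most_common(1)[0][0]
    return [most_common if v is None else v for v in values]

def value_probabilities(values):
    """Estimate a probability for each possible value from the observed samples."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    return {value: count / len(observed) for value, count in counts.items()}

outlook = ["sunny", "rain", None, "sunny", "overcast", "sunny", None]
filled = fill_most_common(outlook)
probs = value_probabilities(outlook)
print(filled)
print(probs)
```

With the second strategy, a sample with a missing value can be passed down every branch of the split, weighted by these probabilities, instead of being forced into a single branch.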