Machine Learning Notes

Machine learning uses data to detect patterns and improve automatically from past data. There are four main types of machine learning algorithms: supervised, semi-supervised, unsupervised, and reinforcement learning. Feature engineering improves model performance by uncovering hidden patterns and boosting predictive power. It involves feature creation, transformations, extraction, and selection to help algorithms understand the right data.


1. Features of ML:
• Machine learning uses data to detect various patterns in a given dataset.
• It can learn from past data and improve automatically.
• It is a data-driven technology.
• Machine learning is similar to data mining, as both deal with huge amounts of data.

2. Types of ML Algorithms:
There are four main types of machine learning algorithms: supervised, semi-supervised, unsupervised, and reinforcement learning.

3. Bias and Variance:
Bias is the error introduced by approximating a real-world problem with an overly simple model; variance is the error introduced by the model's sensitivity to small fluctuations in the training data. A good model balances the two (the bias-variance trade-off).

4. Underfitting and Overfitting:
Underfitting occurs when a model is too simple to capture the underlying pattern, so it performs poorly even on the training data. Overfitting occurs when a model also learns the noise in the training data, so it performs well on the training set but poorly on unseen data.


5. No Free Lunch Theorem:
According to the "No Free Lunch" theorem, no single model works best for every problem. Because the assumptions of a good model for one problem may not hold for another, it is common in machine learning to try several models and pick the one that performs best on the specific problem.
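A minimal sketch of this try-several-models workflow, assuming scikit-learn is installed; the dataset here is synthetic and purely illustrative:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for a real problem.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
}

# Try several models; no single one is guaranteed to be best here.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")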

6. Ordinal Data:
Ordinal data is a kind of qualitative data that groups variables into ordered categories. The categories have a natural order or rank based on some hierarchical scale, such as high to low, but there is no clearly defined interval between the categories.
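As a sketch of how ordered categories can be encoded while preserving their rank, assuming scikit-learn and a hypothetical low/medium/high feature:

from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ordinal feature: ordered categories, no defined interval.
sizes = [["low"], ["high"], ["medium"], ["low"]]

# Passing categories explicitly fixes the order low < medium < high.
encoder = OrdinalEncoder(categories=[["low", "medium", "high"]])
print(encoder.fit_transform(sizes))  # [[0.], [2.], [1.], [0.]]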

7. Steps in Feature Engineering:
Feature engineering in ML contains mainly four processes: feature creation, transformations, feature extraction, and feature selection.
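A toy sketch of the first two processes (creation and transformation), assuming pandas/NumPy and a hypothetical price/area dataset; extraction and selection would typically follow with tools such as PCA or SelectKBest:

import numpy as np
import pandas as pd

# Hypothetical raw data, purely for illustration.
df = pd.DataFrame({"price": [100, 250, 80, 400],
                   "area": [50, 100, 40, 160]})

# Feature creation: derive a new feature from existing ones.
df["price_per_area"] = df["price"] / df["area"]

# Transformation: log-transform a skewed feature.
df["log_price"] = np.log1p(df["price"])

print(df)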
8. Vectorization
In Machine Learning, vectorization is a step in feature extraction. The idea is to get some
distinct features out of the text for the model to train on, by converting text to numerical
vectors.
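A minimal sketch of TF-IDF vectorization, assuming scikit-learn and a toy two-document corpus:

from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, purely for illustration.
corpus = ["machine learning uses data",
          "data driven technology learns from data"]

# Convert the raw text into numerical vectors the model can train on.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))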

9. What happens to an ML model if feature engineering is not applied?

→ If we create a model without preprocessing or data handling, it may not give good accuracy. Whereas if we apply feature engineering to the same data, the model's accuracy is enhanced. Hence, feature engineering in machine learning improves the model's performance.

→ In addition, feature engineering influences how machine learning models perform and how accurate they are. It helps uncover the hidden patterns in the data and boosts the predictive power of machine learning. For machine learning algorithms to work properly, users must input the right data that the algorithms can understand.

10. Flow diagram on feature selection process

11. Three steps to be followed for text corpus data:


12. Feature Scaling for Decision Trees:
Feature scaling is a method used to normalize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step. Decision trees, however, are largely insensitive to feature scaling, because splits depend only on the ordering of values within each feature, not on distances between features.
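A small sketch of the two common scaling methods, assuming scikit-learn and a made-up two-feature array:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up features with very different ranges.
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Standardization: zero mean and unit variance per feature.
print(StandardScaler().fit_transform(X))

# Min-max normalization: rescale each feature to [0, 1].
print(MinMaxScaler().fit_transform(X))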

13. Hard Margin and Soft Margin SVM:

→ Hard-margin SVMs address the generalization problem of perceptrons by maximizing the margin, formally defined as the minimum distance from the decision boundary to the training points. (Figure: the optimal decision boundary maximizes the margin.)

→ Soft-margin SVM allows some misclassification by relaxing the hard constraints of the support vector machine. It is implemented with the help of the regularization parameter (C), which controls how much misclassification we want to avoid: a large C penalizes misclassification heavily (approaching a hard margin), while a small C tolerates more of it.
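As a sketch of how C controls the margin's softness, assuming scikit-learn; the data and C values are purely illustrative:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=0)

# Small C tolerates misclassification (soft margin);
# large C penalizes it heavily (approaching a hard margin).
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.n_support_.sum()} support vectors")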

14. Concept of Hyperplane:
Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes. The dimension of the hyperplane depends on the number of features: if the number of input features is 2, the hyperplane is just a line; if it is 3, the hyperplane is a two-dimensional plane.
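A sketch that reads the fitted hyperplane w·x + b = 0 off a linear classifier, assuming scikit-learn and synthetic two-feature data (so the hyperplane is a line):

from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=0)

clf = LinearSVC(max_iter=10000).fit(X, y)

# With two input features, w . x + b = 0 describes a line in the plane.
w, b = clf.coef_[0], clf.intercept_[0]
print(f"boundary: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")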

15. Supervised and Unsupervised Learning:
In supervised learning, the model is trained on labeled data (input-output pairs) and learns to predict the output for new inputs; classification and regression are typical tasks. In unsupervised learning, the model is given unlabeled data and must discover structure on its own, as in clustering and dimensionality reduction.

16. Why do you prefer Euclidean distance over Manhattan distance in the K-means algorithm?
Euclidean distance is preferred over Manhattan distance because Manhattan distance measures distance only along the axes (vertically or horizontally), which restricts the directions it can capture. In contrast, Euclidean distance measures the straight-line distance between data points in any direction of the space. Moreover, k-means updates each centroid to the mean of its cluster, and the mean is exactly the point that minimizes squared Euclidean distance, making Euclidean distance the natural choice.
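The difference in a quick NumPy sketch (the two points are arbitrary):

import numpy as np

a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])

# Euclidean (L2): straight-line distance in any direction.
print(np.linalg.norm(a - b))   # 5.0

# Manhattan (L1): distance measured only along the axes.
print(np.abs(a - b).sum())     # 7.0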
17. Ways to avoid the problem of initialization sensitivity in the K-means algorithm:
There are two ways to avoid the problem of initialization sensitivity:
• Repeat k-means: the algorithm is executed repeatedly with different random initializations, and the best clustering (lowest within-cluster variance) is kept.
• k-means++: a smart centroid initialization technique that spreads the initial centroids apart.
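Both remedies appear in one scikit-learn sketch; the blobs dataset and parameters are illustrative:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_init repeats k-means with different initializations and keeps the best run;
# init="k-means++" is the smart centroid initialization.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.inertia_)  # within-cluster sum of squares of the best run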

18. Why does there arise a need for DBSCAN when we already have other clustering algorithms?
K-means and hierarchical clustering both fail to create clusters of arbitrary shapes, and they are not able to form clusters based on varying densities. That is why we need DBSCAN clustering.
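A sketch on an arbitrarily shaped dataset where k-means struggles, assuming scikit-learn; eps and min_samples are illustrative values:

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: non-convex clusters.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print(set(labels))  # cluster labels; -1 would mark noise points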

19. Hierarchical Clustering:
Hierarchical clustering, also known as hierarchical cluster analysis or HCA, is another unsupervised machine learning approach for grouping unlabeled datasets into clusters. In this technique the hierarchy of clusters is developed in the form of a tree, and this tree-shaped structure is known as the dendrogram.
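A minimal sketch that builds the tree and draws the dendrogram, assuming SciPy and Matplotlib with a tiny made-up dataset:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

X = np.array([[1, 2], [1, 4], [5, 8], [6, 8], [9, 1]])

# Build the cluster hierarchy bottom-up (Ward linkage), then plot the tree.
Z = linkage(X, method="ward")
dendrogram(Z)
plt.show()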

20. Expectation-Maximization:
The Expectation-Maximization (EM) algorithm is an iterative method, used in various unsupervised machine learning algorithms, for determining local maximum likelihood estimates (MLE) or maximum a posteriori (MAP) estimates for statistical models with unobservable (latent) variables. It alternates an expectation (E) step, which estimates the latent variables given the current parameters, with a maximization (M) step, which updates the parameters to maximize the expected likelihood.
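A Gaussian mixture model is a common example: scikit-learn fits it with EM under the hood. A sketch with synthetic data:

from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=2, random_state=0)

# E-step: soft-assign points to components given the current parameters;
# M-step: re-estimate means and covariances. Repeated until convergence.
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gm.means_)
print(gm.converged_)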

21. Main Objective of Representation Learning:
The goal of representation learning is to train machine learning algorithms to learn useful representations, such as those that are interpretable, incorporate latent features, or can be used for transfer learning.

22. Gradient:
In machine learning, a gradient is the vector of partial derivatives of a function that has more than one input variable. Known as the slope of a function in mathematical terms, the gradient measures the change in the error with respect to a change in each weight.
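A toy sketch: a hand-written loss, its gradient, and one gradient-descent step (function and values are purely illustrative):

import numpy as np

# Toy loss f(w) = w1^2 + 3*w2^2 with gradient (2*w1, 6*w2).
def loss(w):
    return w[0] ** 2 + 3 * w[1] ** 2

def grad(w):
    return np.array([2 * w[0], 6 * w[1]])

# One gradient-descent step: move the weights against the gradient
# to reduce the error.
w = np.array([1.0, 1.0])
w = w - 0.1 * grad(w)
print(w, loss(w))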

23. How does a decision tree handle missing attribute values?
Decision trees handle missing values in the following ways:
• Fill the missing attribute value with the most common value of that attribute.
• Fill the missing value by assigning a probability to each of the possible values of the attribute, based on other samples.
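A sketch of the first strategy (most-common-value imputation) before fitting a tree, assuming scikit-learn and a tiny made-up dataset:

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier

X = np.array([[1, 0], [1, 1], [np.nan, 1], [0, 0]])
y = np.array([0, 1, 1, 0])

# Fill each missing attribute value with that column's most common value.
X_filled = SimpleImputer(strategy="most_frequent").fit_transform(X)

clf = DecisionTreeClassifier(random_state=0).fit(X_filled, y)
print(clf.predict(X_filled))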
