ALGORITHMS
Linear Regression
• Good
- Simple to implement and efficient to train.
- Overfitting can be reduced by regularization.
- Performs well when the relationship between the
features and the target is approximately linear.
• Bad
- Assumes that the observations are independent,
which is rarely true in real life.
- Prone to noise and overfitting.
- Sensitive to outliers.
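A minimal sketch of ordinary least squares in Python with scikit-learn (the synthetic data and parameter values are illustrative assumptions, not part of these notes):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Illustrative synthetic data: 100 samples, 3 features, some noise.
X, y = make_regression(n_samples=100, n_features=3, noise=10.0, random_state=0)

model = LinearRegression()
model.fit(X, y)                  # ordinary least squares fit

print(model.coef_, model.intercept_)
print(model.score(X, y))         # R^2 on the training data
```

For the regularization mentioned above, scikit-learn's Ridge and Lasso classes are drop-in replacements for LinearRegression.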
Logistic Regression
• Good
- Less prone to overfitting, though it can still
overfit on high-dimensional datasets.
- Efficient when the dataset has features that
are linearly separable.
- Easy to implement and efficient to train.
• Bad
- Should not be used when the number of
observations is smaller than the number of
features.
- Assumes linearity between the features and the
log-odds, which is rare in practice.
- Can only be used to predict discrete classes, not continuous values.
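A minimal sketch of logistic regression for binary classification, again on illustrative synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative two-class data with four features.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))      # accuracy on held-out data
print(clf.predict_proba(X_test[:3]))  # class probabilities, not just labels
```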
Decision Trees
• Good
- Can solve non-linear problems.
- Can work on high-dimensional data with
excellent accuracy.
- Easy to visualize and explain.
• Bad
- Prone to overfitting, which can often be
mitigated by using a random forest.
- A small change in the data can lead to a
large change in the structure of the optimal
decision tree.
- Calculations can get very complex.
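A sketch of a small decision tree; printing the learned splits with export_text shows why trees are easy to visualize and explain (max_depth=3 is an assumed choice to curb overfitting):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Limiting the depth is one simple guard against overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree))  # human-readable view of the learned rules
```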
Random Forests
• Good
- It can perform both regression and
classification tasks.
- Handle large datasets efficiently.
- Generally achieves higher predictive accuracy
than a single decision tree.
• Bad
- Can still overfit, particularly on noisy data.
- The ensemble can be quite large, making
pruning necessary.
- Calculations can become complex when
there are many class variables.
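A sketch of a random forest classifier; the ensemble averages many randomized trees, which is what usually lifts accuracy over a single tree (n_estimators=100 is an assumed, typical setting):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample with random feature subsets.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```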
K Nearest Neighbour
• Good
- No explicit training phase; predictions are made
directly from the stored data.
- Prediction time is O(n) in the number of training
samples for a naive implementation.
- Can be used for both classification and
regression.
• Bad
- Does not work well with large datasets.
- Sensitive to noisy data, missing values
and outliers.
- Requires feature scaling.
- Requires choosing an appropriate value of K.
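A sketch of K nearest neighbours with the feature scaling noted above (K=5 is an assumed starting value; in practice it is tuned):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling matters because KNN compares raw distances between points.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)   # "training" just stores the scaled data
print(knn.score(X_test, y_test))
```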
Support Vector Machine
• Good
- Effective on high-dimensional data.
- Can work on small datasets.
- Can solve non-linear problems via kernels.
• Bad
- Inefficient on large datasets.
- Requires picking the right kernel and tuning its parameters.
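A sketch of an SVM on data that is not linearly separable; the RBF kernel is one assumed choice that lets it draw a non-linear boundary:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: a classic non-linear problem.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# The kernel and C are exactly the choices the note above warns about.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X, y)
print(svm.score(X, y))
```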
Naive Bayes
• Good
- Fast to train.
- Better suited for categorical inputs.
- Easy to implement.
• Bad
- Assumes that all features are
independent, which rarely holds in
real life.
- Suffers from the zero-frequency problem: a
categorical value never seen in training gets zero
probability (commonly fixed with Laplace smoothing).
- Probability estimates can be unreliable in some
cases.
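A sketch of Gaussian naive Bayes, which applies Bayes' theorem under the feature-independence assumption listed above (for categorical count data, MultinomialNB's alpha parameter applies the Laplace smoothing that fixes the zero-frequency problem):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fits one Gaussian per feature per class, assuming feature independence.
nb = GaussianNB()
nb.fit(X_train, y_train)
print(nb.score(X_test, y_test))
```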
K Means (Unsupervised Learning)
• Good
- Simple to implement.
- Scales to large data sets.
- Guaranteed to converge, though only to a local optimum.
- Easily adapts to new examples.
- Generalizes to clusters of different shapes
and sizes.
• Bad
- Sensitive to outliers.
- Choosing the value of k manually is difficult.
- Dependent on initial centroid values.
- Scalability decreases as the number of dimensions increases.
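A sketch of K Means clustering; n_init reruns the algorithm from several random starts, which softens the initial-value sensitivity noted above (three blobs and k=3 are illustrative assumptions):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data: three well-separated blobs of points.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_init=10 restarts from different initial centroids and keeps the best run.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)

print(km.cluster_centers_)  # the three learned centroids
print(labels[:10])          # cluster assignments of the first ten points
```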
TRAIN AND TEST
• Machine learning is about learning
some properties of a data set and
then testing those properties against
another data set.
• A common practice in machine
learning is to evaluate an algorithm
by splitting a data set into two.
• We call one of those sets the
training set, on which we learn some
properties; we call the other set the
testing set, on which we test the
learned properties.
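A sketch of this split with scikit-learn's train_test_split (the 25% test fraction and the logistic-regression model are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 25% of the data; the model learns only from the rest.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # evaluated on data the model never saw
```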