Machine Learning Techniques - Overview of Decision Trees, Logistic Regression, SVM, and k-NN

Definition and Mechanism
k-NN is a simple supervised learning technique that classifies data points based on the labels of their k nearest neighbors in the training dataset. It operates on the principle of similarity, using distance metrics such as Euclidean or Manhattan distance to determine the nearest neighbors.

Step 4: For classification, assign the class label based on majority voting among the neighbors.
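
A minimal sketch of this mechanism in Python follows; the toy dataset, the default k of 3, and the helper name knn_predict are illustrative assumptions rather than anything specified above.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify one point by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Step 4: assign the class label by majority vote among the neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy 2-D dataset: two small clusters labelled 0 and 1
X_train = np.array([[1.0, 1.2], [0.8, 0.9], [1.1, 0.8],
                    [4.0, 4.2], [3.9, 3.8], [4.2, 4.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.0, 1.0])))  # expected: 0
print(knn_predict(X_train, y_train, np.array([4.0, 4.0])))  # expected: 1

Because every query is compared against all stored training points, k-NN has no real training phase but pays for it at prediction time on large datasets.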

Advantages and Disadvantages of k-NN
Advantages: k-NN is easy to implement, adapts easily to new data, and requires few hyperparameters.
Disadvantages: It does not scale well with large datasets, is affected by the curse of dimensionality, and is prone to overfitting due to noise in the data.

Decision trees are a popular machine learning technique used for classification and regression tasks. They create a model that predicts the value of a target variable based on several input features.
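
A short sketch of decision-tree classification, assuming scikit-learn; the iris data, the max_depth cap of 3, and the train/test split are illustrative choices.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative task: predict the iris species (target variable) from four input features
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Capping the depth is a common way to keep the tree small and less prone to overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print("test accuracy:", tree.score(X_test, y_test))

Limiting the depth trades a little fit on the training data for a smaller, more stable tree.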

Limitations of Decision Trees
Instability: Small changes in the dataset can lead to a completely different tree structure, making them sensitive to data variations.
Bias in Imbalanced Datasets: Decision trees may favor the majority class if the dataset is imbalanced, leading to poor performance on minority classes.

Definition and Purpose
Logistic regression is a statistical method used for modeling the probability of a binary outcome. It predicts the likelihood of an event occurring by fitting data to a logistic function, which maps inputs to probabilities between 0 and 1. It is primarily used in classification problems where the goal is to predict categories, such as spam detection or disease diagnosis.
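
A minimal sketch with scikit-learn, using the breast-cancer dataset as a stand-in binary problem; the scaling step and the pipeline are illustrative choices.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary outcome: malignant vs. benign tumours (illustrative stand-in dataset)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling first helps the solver converge; the fitted logistic function then
# maps each input to a probability between 0 and 1
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

print(model.predict_proba(X_test[:3]))        # class probabilities in [0, 1]
print("test accuracy:", model.score(X_test, y_test))

predict_proba exposes the fitted logistic function directly, returning a probability for each class that always lies between 0 and 1.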

Overview and Functionality
Support Vector Machines are supervised learning algorithms used for classification and regression. They find a hyperplane that separates data points of different classes with the maximum margin. The support vectors are the data points closest to the hyperplane and are critical for defining it.
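
A brief sketch, assuming scikit-learn and synthetic two-class data; the linear kernel and C=1.0 are illustrative defaults.

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, roughly separable two-class data (illustrative only)
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

# A linear kernel searches for the maximum-margin separating hyperplane
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The support vectors are the training points closest to the hyperplane
print("support vectors per class:", clf.n_support_)
print(clf.support_vectors_[:3])

After fitting, support_vectors_ holds exactly those boundary points; the other training points could be removed without changing the fitted hyperplane.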