New Classification and Regression Models
o Logistic Regression
o K-Nearest Neighbours
o Support Vector Machines
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
Regression:
Regression is the process of finding correlations between dependent and
independent variables. It is used to predict continuous values, such as
market trends or house prices.
Regression vs. Classification:
o The task of a regression algorithm is to map the input value (x) to a continuous
output variable (y); the task of a classification algorithm is to map the input
value (x) to a discrete output variable (y).
o Regression algorithms are used with continuous data; classification algorithms are
used with discrete data.
o In regression, we try to find the best-fit line, which can predict the output more
accurately; in classification, we try to find the decision boundary, which can
divide the dataset into different classes.
Basic Concepts of K-Nearest Neighbours (KNN):
Idea: The core idea of KNN is to predict the class or value of a data point by looking at
the K data points that are closest to it.
Distance Metric: KNN uses a distance metric (such as Euclidean distance) to measure the
similarity between data points; the smaller the distance, the more similar the points
are. A short distance computation is sketched after this list.
Choosing K: You need to choose the value of K, which represents the number of nearest
neighbors to consider when making predictions. A small K might lead to noisy
predictions, while a large K might overlook local patterns.
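As a minimal sketch of the distance computations referenced above (the two points are
invented for illustration):

import numpy as np

# Two feature vectors, invented for illustration
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Euclidean distance: sqrt((1-4)^2 + (2-6)^2) = sqrt(9 + 16) = 5.0
print(np.sqrt(np.sum((a - b) ** 2)))

# Manhattan distance on the same pair, for comparison: |1-4| + |2-6| = 7.0
print(np.sum(np.abs(a - b)))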
Steps:
Choose K: Decide the value of K (the number of neighbors to consider).
Distance Calculation: Calculate the distance between the target data point and all other data
points in the training set.
Find K Neighbors: Identify the K data points with the smallest distances to the target data point.
Majority Vote (Classification): For classification, count the classes of the K nearest neighbors
and predict the class with the highest count.
Average (Regression): For regression, calculate the average of the output values of the K
nearest neighbors and predict that value.
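These steps translate almost line-for-line into code. Below is a minimal from-scratch
sketch; the function name and the tiny dataset are our own, invented for illustration:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 2: distance from the target point to every training point
    distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # Step 3: indices of the K nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Step 4 (classification): majority vote among the K neighbors;
    # for regression (Step 5), return y_train[nearest].mean() instead
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny illustrative dataset: two features, two classes
X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 5], [7, 7], [8, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([2, 2])))  # -> 0
print(knn_predict(X_train, y_train, np.array([7, 6])))  # -> 1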
Advanced Concepts:
Weighted KNN: Instead of considering all neighbors equally, you can assign different
weights to neighbors based on their distances, so that closer neighbors have a higher
influence (see the sketch after this list).
Distance Metrics: You can experiment with different distance metrics (Euclidean,
Manhattan, etc.) based on the nature of your data.
Curse of Dimensionality: KNN can suffer from the curse of dimensionality when working
with high-dimensional data, as distances become less meaningful in higher dimensions.
Model Complexity: The choice of K affects the model's complexity. Smaller K can lead to
overfitting noisy data, while larger K can result in oversmoothed predictions.
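As a sketch of the weighted KNN mentioned above: scikit-learn's KNeighborsClassifier
supports this directly via weights="distance", which weights each neighbor's vote by
the inverse of its distance (the toy dataset is invented for illustration):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy dataset, invented for illustration: two features, two classes
X = np.array([[1, 1], [1, 2], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# weights="distance" makes closer neighbors count more in the vote;
# the default weights="uniform" counts all K neighbors equally
knn = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X, y)
print(knn.predict([[3, 3]]))  # -> [0]: all three nearest neighbors are class 0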
Code:
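A minimal end-to-end sketch, assuming the Iris dataset as placeholder data; any dataset
with numeric features would work the same way: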
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load a sample dataset (Iris, as a placeholder) and split into train/test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Scale features so distances are comparable across dimensions
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
# Fit a KNN classifier (K=5) and predict on the test set
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = knn.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")