
What is Regression?

Regression is a statistical approach used to analyze the relationship between a dependent variable (the target variable) and one or more independent variables (the predictor variables). The objective is to determine the function that best characterizes the relationship between these variables, so that the fitted model can be used to make predictions or draw conclusions.

Linear Regression vs Logistic Regression

| Linear Regression | Logistic Regression |
| --- | --- |
| Used to predict a continuous dependent variable from a given set of independent variables. | Used to predict a categorical dependent variable from a given set of independent variables. |
| Used for solving regression problems. | Used for solving classification problems. |
| We predict the value of a continuous variable. | We predict the value of a categorical variable. |
| We find the best-fit line, by which we can easily predict the output. | We find the S-curve, by which we can classify the samples. |
| Model parameters are estimated by the least squares method. | Model parameters are estimated by maximum likelihood estimation. |
| The output must be a continuous value, such as price or age. | The output must be a categorical value, such as 0 or 1, Yes or No. |
| The relationship between the dependent and independent variables must be linear. | A linear relationship between the dependent and independent variables is not required. |
| There may be collinearity between the independent variables. | There should not be collinearity between the independent variables. |
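To make the contrast concrete, here is a minimal sketch, assuming scikit-learn is installed; the toy data and variable names are invented for illustration:

```python
# Fit both models on made-up data with two independent variables.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))

# Linear regression: continuous target (e.g., a price).
y_cont = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)
lin = LinearRegression().fit(X, y_cont)
print(lin.predict(X[:3]))        # continuous outputs

# Logistic regression: categorical target (0 or 1).
y_cat = (X[:, 0] + X[:, 1] > 0).astype(int)
log = LogisticRegression().fit(X, y_cat)
print(log.predict(X[:3]))        # class labels
```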

Applications of PCA in Machine Learning

• PCA is used to visualize multidimensional data.

• It is used to reduce the number of dimensions in healthcare data.

• PCA can help compress an image.

• It can be used in finance to analyze stock data and forecast returns.

• PCA helps find patterns in high-dimensional datasets (a minimal sketch follows this list).
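As a minimal sketch of dimensionality reduction with scikit-learn (the dataset and component count are illustrative choices, not from the text):

```python
# Reduce 64-dimensional digit images to 2 components for visualization.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)      # X has shape (1797, 64)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)              # shape (1797, 2)
print(pca.explained_variance_ratio_)     # variance captured per component
```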


Here are some examples of unsupervised learning:
• Anomaly detection
Unsupervised learning can identify data points that are unusual in a dataset. For
example, cybersecurity programs can use unsupervised learning to detect
deviations in network traffic patterns that might indicate a hacker.
• Customer segmentation
Unsupervised learning can help businesses understand their customers' common traits and purchasing habits, which can help them personalize their advertising strategies (a clustering sketch for this use case follows the list).
• Recommendation engines
Unsupervised learning can help businesses discover data trends that can be used
to develop effective cross-selling strategies. For example, e-commerce or news
websites can use unsupervised learning to analyze customer behavior and
recommend products to similar users.
• Natural language processing (NLP)
Unsupervised learning can be used for various NLP applications, such as
categorizing articles in news sections, text translation, and speech recognition.
• Time series analysis
Unsupervised learning can be used to find patterns in time series data and make
predictions about future events. This is important for things like weather
forecasting, sales prediction, and stock market predictions.
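As a minimal customer-segmentation sketch using k-means clustering (the feature names and data are invented for illustration):

```python
# Cluster customers by two invented features: annual spend and visit frequency.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
spend = np.concatenate([rng.normal(200, 30, 50), rng.normal(900, 80, 50)])
visits = np.concatenate([rng.normal(2, 0.5, 50), rng.normal(10, 2, 50)])
X = StandardScaler().fit_transform(np.column_stack([spend, visits]))

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:5])        # segment assignment per customer
```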

What is a Kernel Function?
A kernel function is a function that implicitly maps data into a higher-dimensional space, allowing linear methods to be applied to non-linear problems. Kernel functions are a key component of many machine learning algorithms, including support vector machines (SVMs).
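A minimal sketch, assuming scikit-learn: an RBF-kernel SVM separates concentric circles that no linear classifier can split.

```python
# Non-linearly separable data: two concentric circles.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)    # kernel trick: implicit mapping

print(linear_svm.score(X, y))            # near chance (~0.5)
print(rbf_svm.score(X, y))               # near 1.0
```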
What is a Confusion Matrix?
A confusion matrix is a matrix that summarizes the performance of a machine
learning model on a set of test data. It is a means of displaying the number of
accurate and inaccurate instances based on the model’s predictions. It is often
used to measure the performance of classification models, which aim to predict
a categorical label for each input instance.
The matrix breaks down the model's predictions on the test data into four counts:
• True Positive (TP): The model correctly predicted a positive outcome
(the actual outcome was positive).
• True Negative (TN): The model correctly predicted a negative
outcome (the actual outcome was negative).
• False Positive (FP): The model incorrectly predicted a positive
outcome (the actual outcome was negative). Also known as a Type I
error.
• False Negative (FN): The model incorrectly predicted a negative
outcome (the actual outcome was positive). Also known as a Type II
error.
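A minimal sketch with scikit-learn; the label arrays are made up for illustration:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]    # actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]    # model predictions

# For binary labels, rows are actual classes and columns are predictions,
# so ravel() returns the four counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)                # 3 1 1 3
```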
Metrics based on Confusion Matrix Data
1. Accuracy
Accuracy is used to measure the performance of the model. It is the ratio of Total correct
instances to the total instances.
Accuracy = (TP + TN) / (TP + TN + FP + FN)

2. Precision
Precision is a measure of how accurate a model’s positive predictions are. It is
defined as the ratio of true positive predictions to the total number of positive
predictions made by the model.
Precision = TP / (TP + FP)

3. Recall
Recall measures the effectiveness of a classification model in identifying all
relevant instances from a dataset. It is the ratio of the number of true positive
(TP) instances to the sum of true positive and false negative (FN) instances.
Recall = TP / (TP + FN)
4. F1-Score
F1-score is used to evaluate the overall performance of a classification model.
It is the harmonic mean of precision and recall:
F1-Score = 2 * (Precision * Recall) / (Precision + Recall)

5. Specificity
Specificity is another important metric in the evaluation of classification
models, particularly in binary classification. It measures the ability of a model
to correctly identify negative instances. Specificity is also known as the True
Negative Rate. The formula is:
Specificity = TN / (TN + FP)
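Continuing the sketch above, the five metrics computed from those counts (plain Python, restated here so the snippet runs on its own):

```python
# Confusion-matrix counts from the earlier illustrative example.
tn, fp, fn, tp = 3, 1, 1, 3

accuracy = (tp + tn) / (tp + tn + fp + fn)           # 0.75
precision = tp / (tp + fp)                           # 0.75
recall = tp / (tp + fn)                              # 0.75
f1 = 2 * precision * recall / (precision + recall)   # 0.75
specificity = tn / (tn + fp)                         # 0.75
print(accuracy, precision, recall, f1, specificity)
```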

What are Ensemble Methods?

Ensemble methods are techniques that aim to improve the accuracy of results by combining multiple models instead of using a single model. Combining models in this way can increase the accuracy of the results significantly.

Types of Ensemble Methods:

Bagging, or bootstrap aggregation, is a machine learning technique that uses multiple models to improve the accuracy and stability of predictive models.

Here's how bagging works:


1. Random sampling
Randomly select data points with replacement from the training set to create multiple subsets, called bootstrap samples.
2. Train models
Train multiple base models, such as decision trees or neural networks, on each bootstrap sample.
3. Aggregate predictions
For regression tasks, average the predictions from all models. For classification tasks, use majority voting to select the class with the highest number of votes. A minimal sketch is shown below.
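A minimal bagging sketch, assuming a recent scikit-learn (1.2+, where the base-model argument is named `estimator`); the base estimator and parameters are illustrative choices:

```python
# Bagging: many trees trained on bootstrap samples, predictions aggregated.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),   # base model
    n_estimators=50,                      # number of bootstrap samples
    bootstrap=True,                       # sample with replacement
    random_state=0,
).fit(X, y)
print(bag.score(X, y))
```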
Boosting is a machine learning technique that improves the accuracy of
predictive models by combining multiple weak learners into a single strong
learner.

Here's how boosting works:


• Train weak learners: Train multiple models sequentially, with each model
focusing on correcting the mistakes of the previous model.
• Iterative process: Repeat the training process until a stopping criterion is met (for example, a fixed number of weak learners or no further improvement in accuracy).
• Combine weak learners: Merge the weak rules into a single strong rule with each iteration (see the sketch below).
Stacking

Stacking, another ensemble method, is often referred to as stacked generalization. This technique works by training a meta-learner to combine the predictions of several other learning algorithms. Stacking has been successfully applied in regression, density estimation, distance learning, and classification. It can also be used to measure the error rate involved during bagging.
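A minimal stacking sketch with scikit-learn; the choice of base learners and final estimator is an illustrative assumption:

```python
# Stacking: base learners' predictions become features for a final model.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("svm", SVC())],
    final_estimator=LogisticRegression(),   # the meta-learner
).fit(X, y)
print(stack.score(X, y))
```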

Advantages of Kernel PCA

Some of the advantages of kernel PCA are:

• Higher-dimensional transformation – by mapping data into a higher-dimensional space, kernel PCA can create a more expressive representation, potentially leading to better separation of classes or clusters
• Nonlinear transformation – it has the ability to capture complex and nonlinear relationships
• Flexibility – by capturing nonlinear patterns, it's more flexible and adaptable to various data types. Thus, kernel PCA is used in many domains, including image recognition and speech processing
Advantages of Standard PCA

Some of the advantages of standard PCA are:

• Computational efficiency – standard PCA is computationally more efficient than kernel PCA, especially for high-dimensional datasets
• Interpretability – it's easier to understand and interpret the transformed data
• Linearity – excels in capturing linear patterns (a sketch contrasting the two follows)
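A minimal sketch contrasting the two, assuming scikit-learn (the dataset and the RBF gamma value are illustrative assumptions):

```python
# PCA vs Kernel PCA on concentric circles, which are not linearly separable.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

X_pca = PCA(n_components=2).fit_transform(X)       # linear projection
X_kpca = KernelPCA(n_components=2, kernel="rbf",
                   gamma=10.0).fit_transform(X)    # nonlinear projection

# With an RBF kernel, the leading components tend to separate the two rings,
# which plain PCA (a rotation of the original space) cannot do.
print(X_kpca[y == 0, 0].mean(), X_kpca[y == 1, 0].mean())
```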
K-Means vs Kernel K-Means

The same linear-versus-nonlinear trade-off appears in clustering. Here's the comparison in tabular form:

| Feature | K-Means | Kernel K-Means |
| --- | --- | --- |
| Clustering Method | Divides data into clusters based on linear separation using centroids. | Maps data to a higher-dimensional space and clusters in that space using a kernel function. |
| Distance Measure | Euclidean distance in the original space. | Kernel-induced distance in the transformed feature space. |
| Data Assumption | Assumes clusters are linearly separable and convex. | Handles non-linearly separable and complex-shaped clusters. |
| Flexibility | Limited to simple, linearly separable clusters. | Works well with overlapping or non-linear clusters. |
| Kernel Usage | Not used. | Requires a kernel function (e.g., RBF, polynomial). |
| Efficiency | Computationally efficient and fast. | Computationally expensive due to kernel calculations. |
| Parameter Tuning | No kernel hyperparameters; straightforward. | Requires selecting and tuning the kernel and its parameters. |
| Convergence | Usually converges faster. | Slower convergence due to kernel computations. |
| Applications | Simple datasets with linear structure. | Complex datasets with non-linear structures or overlapping clusters. |
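Since scikit-learn has no built-in kernel k-means, here is a minimal NumPy sketch of the algorithm; the random initialization scheme and the RBF gamma value are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel

def kernel_kmeans(K, k, n_iter=50, seed=0):
    """Cluster points given a precomputed kernel matrix K (n x n)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(0, k, size=n)
    for _ in range(n_iter):
        dists = np.zeros((n, k))
        for c in range(k):
            mask = labels == c
            if not mask.any():               # re-seed an empty cluster
                labels[rng.integers(0, n)] = c
                mask = labels == c
            size = mask.sum()
            # Squared distance to the centroid of cluster c in feature space:
            # K_ii - (2/|c|) * sum_j K_ij + (1/|c|^2) * sum_{j,l} K_jl
            dists[:, c] = (np.diag(K)
                           - 2.0 * K[:, mask].sum(axis=1) / size
                           + K[np.ix_(mask, mask)].sum() / size ** 2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Two concentric rings: plain k-means fails, kernel k-means can separate them.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
labels = kernel_kmeans(rbf_kernel(X, gamma=10.0), k=2)
print(labels[:10])
```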
