
Machine Learning

Simple Linear Regression:


· Regression is used when we try to find a relation between variables.

· Linear regression is when we try to find a linear relation,

e.g., salary as a function of experience.

Y = b0 + b1X

Y = dependent variable

X = independent variable

b0 = intercept

b1 = slope

b1 = sum{(x - x_mean)(y - y_mean)} / sum{(x - x_mean)^2}

Ordinary Least Squares Method:


· It is used to find the best-fitting line.

· The best-fitting line is the one that minimises this quantity, computed over all points:

sum{(actual - predicted)^2}
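
A minimal NumPy sketch of these two formulas, using a made-up experience/salary toy dataset (the numbers are illustrative, not from these notes):

```python
import numpy as np

# Toy data: years of experience (x) and salary in thousands (y); values are made up.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([35.0, 40.0, 50.0, 55.0, 65.0])

# Slope: b1 = sum{(x - x_mean)(y - y_mean)} / sum{(x - x_mean)^2}
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Intercept: b0 = y_mean - b1 * x_mean (so the line passes through the means)
b0 = y.mean() - b1 * x.mean()

# Ordinary least squares cost: sum{(actual - predicted)^2} over all points
predicted = b0 + b1 * x
sse = np.sum((y - predicted) ** 2)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSE = {sse:.2f}")
```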

Multiple Linear Regression


y = b0 + b1x1 + b2x2 + ... + bnxn

· It is used to model the relation between one dependent variable and many independent variables.

Multicollinearity:

It occurs when one predictor can be written in terms of the others; with categorical data, a full set of dummy variables duplicates the intercept (the dummy variable trap). A sketch follows the list below.


Solution:

1. Either eliminate the intercept,

2. or eliminate one of the dummy variables (keep d - 1 of the d dummies).
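
A minimal sketch of the d - 1 rule, assuming pandas and scikit-learn; the column names and values are made up for illustration:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Illustrative data: one numeric predictor plus one categorical predictor.
df = pd.DataFrame({
    "experience": [1, 3, 5, 2, 4, 6],
    "city": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai", "Pune"],
    "salary": [35, 50, 62, 40, 55, 70],
})

# drop_first=True keeps d - 1 dummies, avoiding the dummy variable trap.
X = pd.get_dummies(df[["experience", "city"]], columns=["city"], drop_first=True)
y = df["salary"]

model = LinearRegression().fit(X, y)  # fits y = b0 + b1x1 + ... + bnxn
print(dict(zip(X.columns, model.coef_)), model.intercept_)
```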

Backward Elimination:

1. Select a significance level SL to stay in the model (default 5%).

2. Fit the model with all predictors (input variables).

3. Find the predictor with the highest p-value. If p > SL, go to step 4; otherwise stop.

4. Remove that predictor.

5. Fit the model again without that predictor and go back to step 3.
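
A sketch of this loop, assuming statsmodels and a pandas DataFrame of predictors (the function name backward_elimination is mine, not from the notes):

```python
import statsmodels.api as sm

def backward_elimination(X, y, sl=0.05):
    """Drop the predictor with the highest p-value until all p <= sl."""
    X = sm.add_constant(X)                     # adds the intercept column "const"
    cols = list(X.columns)
    while True:
        model = sm.OLS(y, X[cols]).fit()       # fit with the remaining predictors
        pvalues = model.pvalues.drop("const")  # never drop the intercept itself
        if pvalues.empty or pvalues.max() <= sl:
            return model, cols
        cols.remove(pvalues.idxmax())          # remove the least significant predictor
```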

Polynomial Linear Regression:


y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n
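
A minimal scikit-learn sketch: PolynomialFeatures expands x into its powers, and the model stays linear in the coefficients b0..bn (data values are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Toy curved data (made-up values roughly following y = x^2 + 1).
X = np.array([[1], [2], [3], [4], [5]], dtype=float)
y = np.array([2.1, 4.9, 10.2, 17.1, 26.3])

# Expand x into columns [1, x, x^2].
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# The bias column supplies b0, so the intercept is turned off here.
model = LinearRegression(fit_intercept=False).fit(X_poly, y)
print(model.coef_)  # approximately [b0, b1, b2]
```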

Decision Tree:

· Regression

· Classification

Entropy:

It measures the impurity (disorder) of the target values in a node; the tree prefers the split that reduces entropy the most (information gain).
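
A minimal sketch of the entropy formula, -sum(p * log2(p)) over the class proportions p:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array: -sum(p * log2(p)) over the classes."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(np.array(["yes", "yes", "no", "no"])))    # 1.0 (maximum disorder)
print(entropy(np.array(["yes", "yes", "yes", "yes"])))  # 0.0 (pure node)
```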

Random Forest Algorithm:


It uses ensemble learning (i.e., either combine multiple algorithms to predict a better output, or use one algorithm multiple times to predict a better output).
Steps:

1. Select k random data points from the training set.

2. Build a decision tree on those k points.

3. Choose the number of trees n and repeat steps 1 and 2 n times.

4. For a new point, predict the y output with every tree and take the average of all outputs.
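
A minimal scikit-learn sketch; n_estimators plays the role of n, and the random sampling of data points happens internally via bootstrapping (toy values are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data (made up): experience -> salary in thousands.
X = np.array([[1], [2], [3], [4], [5], [6]], dtype=float)
y = np.array([35, 40, 50, 55, 65, 72], dtype=float)

# Each of the 100 trees sees a random bootstrap sample of the data;
# the forest averages the trees' predictions.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict([[3.5]]))
```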

Logistic Regression:
· It is used when the dependent variable is not continuous (it is categorical).

· It is a classifier.

· The target takes the form of yes or no (binary).

ln(p / (1 - p)) = b0 + b1x
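
A minimal sketch, assuming scikit-learn and made-up pass/fail data; inverting the equation above gives p = 1 / (1 + e^-(b0 + b1x)):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data (illustrative): hours studied -> passed (1) or failed (0).
X = np.array([[1], [2], [3], [4], [5], [6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
b0, b1 = clf.intercept_[0], clf.coef_[0][0]

# Recover p from the log-odds equation for a new point.
x_new = 3.5
p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x_new)))
print(p, clf.predict_proba([[x_new]])[0, 1])  # the two values agree
```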

Feature Scaling:

It is used to bring all features onto the same scale, which makes the calculations efficient and prevents one feature from dominating the others (important for distance-based methods such as KNN and SVM).
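
A minimal sketch of standardisation ((x - mean) / std) with scikit-learn; the two columns (age, salary) are made-up values on very different scales:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales: age and salary.
X = np.array([[25, 35000], [32, 52000], [41, 78000], [29, 44000]], dtype=float)

# After scaling, each column has mean 0 and standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0))
```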
Confusion Matrix:

It gives us the correct and incorrect predictions in matrix form: true positives, true negatives, false positives and false negatives.
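
A minimal scikit-learn sketch with made-up labels; rows are actual classes and columns are predicted classes:

```python
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]  # illustrative true labels
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]  # illustrative model outputs

# Layout for binary labels:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_actual, y_predicted))
```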

K Nearest Neighbours:

· It is a classification algorithm.

· By default k = 5.

· To predict the category of a new point, we do this (see the sketch after the steps):

1. Choose the number of neighbours k.

2. Find the k nearest points using the Euclidean distance formula.

3. The category having the most of those k neighbours is the category the new point belongs to.
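
A minimal scikit-learn sketch; n_neighbors=5 is the default k, and Euclidean distance is the default metric (the 2-D points are made up):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D points with two categories (illustrative values).
X = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict([[5.5, 6.0]]))  # majority vote among the 5 nearest points
```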

SVM (Support Vector Machine):


· It separates linearly separable categorical data with an optimal hyperplane.

· The optimal hyperplane is drawn by maximising the margin (found from the nearest points of each class).

· These nearest points are called support vectors.

· A new data point is classified by which side of the hyperplane it falls on.
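
A minimal scikit-learn sketch with made-up linearly separable points; a linear kernel finds the maximum-margin hyperplane:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (illustrative values).
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear").fit(X, y)
print(svm.support_vectors_)       # the nearest points that define the margin
print(svm.predict([[3.0, 3.0]]))  # classified by its side of the hyperplane
```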
