ML For Predictive Analysis

Machine learning for predictive analysis involves using historical data to predict future outcomes. Here are three widely used algorithms for predictive analysis, along with detailed explanations:

1. Linear Regression

Overview:

Linear Regression is a simple yet powerful supervised learning algorithm used for predictive
analysis. It models the relationship between a dependent variable (target) and one or more
independent variables (features) by fitting a linear equation.

Key Concepts:

● Equation: y = β0 + β1x + ϵ
   ○ y: Predicted value (target variable).
   ○ x: Feature variable.
   ○ β0: Intercept.
   ○ β1: Coefficient (slope).
   ○ ϵ: Error term.
● Works best when there is a linear relationship between features and the target.

Steps:

1. Gather and preprocess the dataset.
2. Split the dataset into training and testing sets.
3. Fit the model to the training data using least squares estimation.
4. Evaluate the model using metrics like Mean Squared Error (MSE) or R-squared (R²), as in the sketch below.
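
A minimal sketch of these four steps with scikit-learn; the one-feature synthetic dataset and all variable names below are illustrative assumptions, not part of the original notes:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Step 1: gather/preprocess - synthetic data following y = 5 + 3x + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 5.0 + 3.0 * X.ravel() + rng.normal(0, 1, size=200)

# Step 2: split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 3: fit by least squares
model = LinearRegression().fit(X_train, y_train)

# Step 4: evaluate with MSE and R-squared
y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R2:", r2_score(y_test, y_pred))
print("Intercept (β0):", model.intercept_, "Slope (β1):", model.coef_[0])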

Use Cases:

● Predicting house prices based on features like size, location, and amenities.
● Estimating sales based on advertising budgets.

Advantages:

● Easy to implement and interpret.
● Performs well for linearly related data.

Limitations:

● Assumes a linear relationship between variables.
● Sensitive to outliers.

2. Decision Trees

Overview:

Decision Trees are versatile and interpretable algorithms used for classification and
regression tasks. They partition data into subsets based on feature values, creating a
tree-like structure.

Key Concepts:

● Each node represents a feature or attribute.
● Branches represent decision rules.
● Leaf nodes represent predicted outcomes.

Steps:

1. Choose the best feature to split the data using a criterion like Gini Impurity or Entropy
(for classification) or Mean Squared Error (for regression).
2. Recursively split the data until stopping criteria are met (e.g., minimum number of
samples per leaf).
3. Predict outcomes by traversing the tree from the root to a leaf node.

Code:

from sklearn.tree import DecisionTreeClassifier

# x_train and y_train are assumed to come from an earlier train/test split
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)

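Assuming a held-out x_test from the same split, predictions traverse the fitted tree from the root to a leaf, and the learned rules can be inspected as text (a sketch, not part of the original notes):

from sklearn.tree import export_text

y_pred = classifier.predict(x_test)  # each sample is routed from root to leaf
print(export_text(classifier))       # plain-text view of the learned decision rules
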
Use Cases:

● Predicting customer churn in a subscription service.
● Estimating loan default risks.

Advantages:

● Easy to visualize and interpret.
● Handles both numerical and categorical data.
● Does not require feature scaling.

Limitations:

● Prone to overfitting if the tree is too deep.
● Sensitive to small changes in data.

3. Random Forest

Overview:

Random Forest is an ensemble learning algorithm that builds multiple decision trees and
combines their predictions for better accuracy and generalization.

Key Concepts:

● Uses Bagging (Bootstrap Aggregating): Random subsets of data are used to train
each tree.
● Aggregates predictions (e.g., by majority voting for classification or averaging for regression); see the sketch below.
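
To make bagging concrete, here is a hand-rolled sketch of the idea (not scikit-learn's internals); it assumes NumPy arrays and integer class labels:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, n_trees=10, seed=0):
    rng = np.random.default_rng(seed)
    all_votes = []
    for _ in range(n_trees):
        # Bootstrap: draw len(X_train) row indices with replacement
        idx = rng.integers(0, len(X_train), size=len(X_train))
        tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
        all_votes.append(tree.predict(X_test))
    # Majority vote per test sample across the n_trees predictions
    votes = np.stack(all_votes)  # shape: (n_trees, n_test_samples)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)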

Steps:

1. Randomly select subsets of the data (with replacement).
2. Build a decision tree for each subset.
3. Combine the predictions from all trees.

Code:

from sklearn.ensemble import RandomForestClassifier

# x_train and y_train are assumed to come from an earlier train/test split
classifier = RandomForestClassifier(n_estimators=10, criterion="entropy")
classifier.fit(x_train, y_train)
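
A usage note: scikit-learn's RandomForestClassifier combines trees by averaging their per-tree class probabilities (soft voting) rather than taking a strict hard vote, so class probabilities are available alongside predictions. Assuming a held-out x_test:

y_pred = classifier.predict(x_test)          # class with the highest averaged probability
print(classifier.predict_proba(x_test)[:5])  # averaged class probabilities across trees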

Use Cases:

● Fraud detection in banking and finance.
● Predicting customer lifetime value in marketing.

Advantages:

● Reduces overfitting by averaging results.
● Handles missing data and categorical variables.
● Works well with large datasets and high-dimensional spaces.

Limitations:

● Computationally intensive for large datasets.
● Harder to interpret compared to individual decision trees.

Comparison of Algorithms

Feature            Linear Regression   Decision Trees              Random Forest
Type               Regression          Classification/Regression   Classification/Regression
Interpretability   High                Medium                      Low
Accuracy           Moderate            Moderate to High            High
Overfitting        Low                 High (if deep)              Low (due to averaging)
Scalability        High                Moderate                    Low to Moderate
Training Speed     Fast                Medium                      Slow (due to multiple trees)
Example Applications

1. Linear Regression: Predicting future sales based on historical trends.
2. Decision Trees: Segmenting customers based on their likelihood to buy a product.
3. Random Forest: Detecting anomalies in network traffic.
