
Assignment-4

1. What is Linear Regression, and how is it used in Machine Learning?

Linear Regression is one of the fundamental supervised learning algorithms used in machine
learning for predictive modeling. It establishes a relationship between an independent
variable (input) and a dependent variable (output) by fitting a straight line to the data. This
line, known as the regression line, is represented by the equation:

Y = mX + C

where:

 Y is the predicted output,
 m is the slope (coefficient),
 X is the input variable, and
 C is the intercept.

Linear Regression is mainly used for predicting continuous values, such as stock prices,
house prices, or temperature trends. The model learns by minimizing the difference between
the actual and predicted values using techniques like Ordinary Least Squares (OLS) or
Gradient Descent.
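
As an illustration, below is a minimal NumPy sketch of the Ordinary Least Squares closed-form estimates for the slope m and intercept C; the data values are made up for demonstration.

import numpy as np

# Illustrative data (hypothetical values)
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2, 4, 5, 4, 5], dtype=float)

# OLS closed-form estimates: slope = covariance(X, Y) / variance(X)
m = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
C = Y.mean() - m * X.mean()

print("Slope (m):", m)      # coefficient of the fitted line
print("Intercept (C):", C)  # where the line crosses the Y axis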

2. How do we implement Linear Regression using a programming language like Python?

Linear Regression can be implemented in Python using libraries such as scikit-learn. Below is a simple example:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np

# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
Y = np.array([2, 4, 5, 4, 5])

# Splitting data into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Creating and training the model
model = LinearRegression()
model.fit(X_train, Y_train)

# Making predictions
predictions = model.predict(X_test)
print("Predictions:", predictions)

This script trains a linear regression model on a small dataset and makes predictions on
unseen data.

3. What are some real-world applications of Linear Regression?

Linear Regression has a wide range of real-world applications, including:

 Finance: Predicting stock prices based on historical trends.
 Healthcare: Estimating patient recovery time based on health indicators.
 Marketing: Understanding the impact of advertising budgets on sales revenue.
 Real Estate: Predicting house prices based on features like location and size.
 Manufacturing: Forecasting product demand based on past sales data.

These applications demonstrate how Linear Regression is an essential tool for making data-
driven decisions.

4. What are the key performance parameters used to evaluate a Linear Regression
model?

Key performance metrics for evaluating a Linear Regression model include:

 Mean Absolute Error (MAE): Measures the average absolute difference between
actual and predicted values.
 Mean Squared Error (MSE): Penalizes larger errors by squaring the differences.
 Root Mean Squared Error (RMSE): The square root of MSE, providing error in
original units.
 R-Squared (R²): Represents the proportion of variance in the dependent variable
explained by the model. A value close to 1 indicates a good fit.

These metrics help in assessing how well the model generalizes to unseen data.
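
For illustration, here is a minimal sketch of computing these metrics with scikit-learn's metrics module; the actual and predicted values below are made-up examples.

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np

# Hypothetical actual and predicted values
Y_test = np.array([3.0, 5.0, 4.0])
predictions = np.array([2.8, 4.6, 4.3])

mae = mean_absolute_error(Y_test, predictions)
mse = mean_squared_error(Y_test, predictions)
rmse = np.sqrt(mse)   # RMSE is the square root of MSE, in the original units
r2 = r2_score(Y_test, predictions)

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R²:", r2)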

5. What is a Decision Tree Classifier, and how does it work?

A Decision Tree Classifier is a supervised learning algorithm used for classification tasks. It
works by recursively splitting the dataset based on feature values to create a tree-like
structure of decision rules.

At each node, the algorithm selects the best feature to split the data by minimizing impurity
(measured using Gini Index or Entropy). The process continues until all samples in a node
belong to the same class or another stopping criterion is met.

Decision Trees are widely used because they are easy to interpret and handle both numerical
and categorical data effectively.
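
As a hedged illustration, a Decision Tree Classifier can be trained with scikit-learn roughly as follows; the Iris dataset and parameter values are chosen only for demonstration.

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load a small built-in dataset and split it
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# criterion="gini" uses the Gini Index; "entropy" is also available
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
tree.fit(X_train, y_train)

print("Test accuracy:", tree.score(X_test, y_test))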

6. What are the key differences between Classification and Regression Trees?

 Output Type: Classification Trees produce categorical labels (e.g., spam/non-spam), while Regression Trees produce continuous values (e.g., house prices).
 Splitting Criterion: Classification Trees use the Gini Index or Entropy; Regression Trees use Mean Squared Error (MSE).
 Application: Classification Trees are used for classification tasks; Regression Trees are used for regression tasks.

While both trees use recursive partitioning, classification trees focus on predicting categories,
whereas regression trees predict continuous values.
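
As a small illustration, scikit-learn exposes both tree types with their respective splitting criteria; note that the criterion name "squared_error" assumes a recent scikit-learn version (older releases called it "mse").

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: splits chosen with the Gini Index (or entropy)
clf = DecisionTreeClassifier(criterion="gini")

# Regression tree: splits chosen by minimizing squared error (MSE)
reg = DecisionTreeRegressor(criterion="squared_error")

print(clf.criterion, reg.criterion)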

7. How does the Gini Index help in creating a Decision Tree?

The Gini Index is a measure of impurity used to split nodes in a Decision Tree. It is
calculated as:
Gini = 1 - \sum p_i^2

where p_i is the probability of class i appearing in the node.

A lower Gini Index indicates a purer node. The algorithm selects splits that minimize
impurity, leading to better classification performance.
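
A minimal sketch of this calculation follows; the function name gini_index and the class counts are illustrative only.

import numpy as np

def gini_index(class_counts):
    """Gini = 1 - sum(p_i^2) for the class probabilities p_i in a node."""
    counts = np.asarray(class_counts, dtype=float)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# Hypothetical node with 8 samples of class A and 2 of class B
print(gini_index([8, 2]))   # 1 - (0.8^2 + 0.2^2) = 0.32

# A pure node has Gini = 0
print(gini_index([10, 0]))  # 0.0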

8. What is the ID3 algorithm, and how does it use Information Gain?

The ID3 (Iterative Dichotomiser 3) algorithm is a Decision Tree learning algorithm that
builds trees using the concept of Information Gain. Information Gain measures the reduction
in entropy (randomness) after a dataset split.

Formula for entropy:

Entropy = - \sum p_i \log_2 p_i

The feature with the highest Information Gain is chosen for splitting, as it provides the most
informative split.
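
A minimal sketch of entropy and Information Gain follows; the function names and the example class counts are illustrative only.

import numpy as np

def entropy(class_counts):
    """Entropy = -sum(p_i * log2(p_i)) over the classes present in a node."""
    counts = np.asarray(class_counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]                      # ignore empty classes (0 * log 0 = 0)
    return -np.sum(p * np.log2(p))

def information_gain(parent_counts, child_counts_list):
    """Reduction in entropy after splitting the parent node into children."""
    n = sum(sum(c) for c in child_counts_list)
    weighted = sum(sum(c) / n * entropy(c) for c in child_counts_list)
    return entropy(parent_counts) - weighted

# Hypothetical split: parent [9, 5] -> children [6, 2] and [3, 3]
print(information_gain([9, 5], [[6, 2], [3, 3]]))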

9. What is a Random Forest Classifier, and how does it improve accuracy?

A Random Forest Classifier is an ensemble learning method that combines multiple Decision
Trees to improve accuracy and reduce overfitting. It works by:

1. Creating multiple Decision Trees using different subsets of data and features.
2. Aggregating predictions from all trees (majority vote for classification, average for
regression).

This method increases robustness and generalization while reducing sensitivity to noise in individual features.
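
For illustration, a Random Forest can be trained with scikit-learn roughly as follows; the dataset and parameter values are chosen only for demonstration.

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# n_estimators trees, each trained on a bootstrap sample with random feature subsets
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# The final prediction is the majority vote of the individual trees
print("Test accuracy:", forest.score(X_test, y_test))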

10. Can you explain a real-world case study where regression and classification models
are used to solve a problem?

A real-world example of using both regression and classification is loan approval prediction
and risk assessment in banking.
1. Classification Model (Decision Tree/Random Forest):
o Used to classify loan applicants as "Approved" or "Rejected" based on factors
like credit score, income, and employment status.
o Helps automate loan processing, improving efficiency.
2. Regression Model (Linear Regression):
o Used to predict loan default probability based on factors like past loan history,
outstanding debts, and economic trends.
o Helps banks decide interest rates and loan limits.

This combination of regression and classification ensures accurate decision-making, minimizing financial risk while improving customer experience.
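
As a hedged sketch of how the two models could sit side by side, the example below trains a classifier for the approval decision and a regressor for default risk; all feature names, values, and targets are invented for illustration.

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
import numpy as np

# Hypothetical applicant features: [credit_score, income, years_employed]
X = np.array([[720, 60000, 5],
              [580, 25000, 1],
              [690, 48000, 3],
              [610, 30000, 2]])

approved = np.array([1, 0, 1, 0])                   # 1 = Approved, 0 = Rejected
default_rate = np.array([0.05, 0.40, 0.10, 0.30])   # observed default proportion

# Classification: approve or reject the application
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, approved)

# Regression: estimate default risk used to set interest rates and loan limits
reg = LinearRegression().fit(X, default_rate)

new_applicant = np.array([[650, 40000, 2]])
print("Approval decision:", clf.predict(new_applicant)[0])
print("Estimated default risk:", reg.predict(new_applicant)[0])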
