Assignment - DADS303 - MBA 3 - Set 1 and 2


NAME ANAND KUMAR

ROLL NO. 2314515035


PROGRAMME MASTER OF BUSINESS ADMINISTRATION (MBA)
SEMESTER III
COURSE NAME INTRODUCTION TO MACHINE LEARNING
CODE DADS303

ASSIGNMENT SET - 1

Q.1. What do you mean by Machine Learning? Discuss the relevance of Machine
Learning in Business.
Ans. Machine Learning (ML), a subfield of Artificial Intelligence (AI), focuses on
understanding data structure and representing it through models. These models enable
analysis and application, allowing users to derive desired outcomes.
The term "Machine Learning" was coined in 1959 by Arthur Samuel, who defined it
as the study of algorithms that allow computers to learn without explicit programming.
Professor Tom Mitchell of Carnegie Mellon University offered a more recent
definition: a computer program is said to learn from experience E with respect to a task T
and a performance measure P if its performance on T, as measured by P, improves with
experience E.
ML is a field built upon decades of research, integrating principles and techniques
from diverse disciplines. Probability, statistics, and computer science form the
theoretical foundation of modern machine learning. Furthermore, it draws inspiration
from biology, genetics, clinical trials, and various social sciences.
ML tasks are categorized based on the learning approach, specifically how the system
learns from existing data or makes predictions using feedback datasets. The following
are the commonly used classifications:
Supervised Learning: The algorithm learns from labeled data, where a "response"
column acts as a teacher. It learns from provided examples and applies that
knowledge to new, unseen data.
Unsupervised Learning: The algorithm learns without labeled data or a "teacher." It
analyzes data features to identify patterns independently.
Reinforcement Learning: Inspired by behavioral psychology, this approach presents
data points sequentially. The algorithm receives rewards for correct actions and
penalties for incorrect ones, learning through trial and error.
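As a minimal illustrative sketch of the first two paradigms (not part of the original
assignment; scikit-learn, the toy data, and the parameter values are assumptions chosen
only for demonstration):

# Supervised vs. unsupervised learning in scikit-learn (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Supervised: the labelled response column y acts as the "teacher".
clf = LogisticRegression().fit(X, y)
print("Supervised predictions:", clf.predict(X[:5]))

# Unsupervised: only the features X are used; groups are found without labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", km.labels_[:5])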
The applications of machine learning are vast, particularly with the widespread use of
smartphones. Smartphones are deeply integrated with various ML processes, and ML
is becoming increasingly popular in business through mobile applications.
Businesses are leveraging ML to enhance their operations, automate tasks, and gain
valuable insights. For example:
Uber uses ML algorithms to optimize pick-up and drop-off times.
Spotify employs ML to personalize marketing and music recommendations.
Dell utilizes ML to gather employee and customer feedback and improve its practices.
Beyond these examples, ML can benefit businesses in numerous other areas. To
effectively implement ML, a well-defined strategy and policy are essential. Here are
some potential applications:
 Monitoring of Social Media Content
 Customer Care Services
 Image Processing
 Virtual Assistance
 Product Recommendation
 Stock Market Trading and Investment
 Clinical Decision Support Systems
 Data Deduplication
 Cyber Security Enhancement

Q.2. What is Support Vector Machine? What are the various steps in using
Support Vector Machine?
Ans. Support Vector Machines (SVMs) are supervised machine learning algorithms
used for both classification and regression. Their primary goal is to identify the
optimal hyperplane that effectively separates data points belonging to different classes
or accurately predicts continuous values. This hyperplane maximizes the margin
between the classes, and the data points closest to it are termed "support vectors."
SVMs are highly valued in machine learning and data analysis for their ability to
handle high-dimensional data and generalize well to unseen data. Their relative
simplicity in implementation and efficient training make them a popular choice in
diverse real-world applications such as computer vision, natural language processing,
and finance.

Key Considerations When Using SVMs:


High-Dimensional Data: For datasets with numerous features, radial kernels may
offer a slight performance edge over linear kernels. However, consider using a linear
kernel with careful tuning of the 'C' parameter for simplicity.
Non-Linear Data with Fewer Features: If the data is non-linear with a limited number
of features, radial kernels are generally the preferred choice.
Polynomial Kernels: Exercise caution when using polynomial kernels, avoiding
degrees higher than 4, as this can negatively impact performance and increase
computational time.
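As an illustrative sketch of this kernel choice (scikit-learn's SVC is assumed; the
synthetic data and the value C = 1.0 are demonstration choices, not recommendations
taken from the text above):

# Comparing a linear and a radial (RBF) kernel SVM with cross-validation (sketch).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

for kernel in ("linear", "rbf"):
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    score = cross_val_score(model, X, y, cv=5).mean()
    print(kernel, "kernel: mean CV accuracy =", round(score, 3))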
SVM Margin Maximization
The core principle of SVMs is to maximize the margin ('m') between the separating
hyperplane and the support vectors. This involves modifying the decision rules to
ensure correct classification while achieving the largest possible margin.
To find an effective margin value ‘m’, the existing decision rules are therefore
modified as shown below.
Original decision rule:
w.x + b > 0 for x points
w.x + b < 0 for o points

Modified decision rule:


Mark x if w.x + b ≥ 1
Mark o if w.x + b ≤ -1

Compact form of new decision rule:


y= 1 if point has x mark
y= -1 if point has o mark
New Rule: y(w.x + b) ≥ 1
Taking this rule into consideration, the new margin boundary lines will be

w.x + b = c1
w.x + b = c2
After applying the formula for the distance between two parallel lines,
m = |C2 − C1| / |w|
If C1 = 1 and C2 = −1, the L2 norm of w gives us the algebraic expression for
the margin of the SVM:
Margin = m = 2 / |w|
The goal is to maximize the margin m, but one must additionally make certain that the
crosses and circles are classified correctly. In other words, for the optimal w vector
that is found, the crosses must fall on one side of the margin line and the circles on
the other. Note, however, that the quantity of interest in the objective function, the
L2 norm of the w vector, appears in the denominator and involves a square root. Both
features add complications to solving the optimization problem directly.
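The usual remedy for these complications is a standard reformulation (stated here for
completeness; it follows from the expressions above rather than being quoted from the
original text): maximizing 2/|w| is equivalent to minimizing (1/2)|w|², which removes the
square root and leaves a convex problem:
maximize 2/|w|   ⇔   minimize (1/2)|w|²   subject to   y(w.x + b) ≥ 1 for every training point.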

Q.3. Discuss all the assumptions of linear regression.


Ans. Regression analysis, a widely used analytical technique, proves valuable for
addressing various business challenges, especially in forecasting and understanding
future trends based on current consumer behavior data. This statistical method aims to
identify the strength and direction of potential causal links between observed patterns
and the variables influencing those patterns.
When implementing ordinary least squares (OLS) regression models, understanding
the assumptions that validate the OLS beta coefficient estimates is crucial. These
assumptions, some of which are essential, include:
 Linear Relationship
 Multivariate Normality
 Little to No Multicollinearity
 No Autocorrelation
 Homoscedasticity

 Linear Relationship: The model must exhibit linearity in its parameters, meaning
the beta coefficients, which are fundamental to linear regression, have a linear
nature. A linear relationship should exist between the independent variable (X)
and the mean of the dependent variable (Y). Moreover, due to the sensitivity of
linear regression to outliers, it is essential to identify and address them. Scatter
plots are an effective tool for assessing the linearity assumption.
 Multivariate Normality: Linear regression models assume that the data represents
a random sample from the underlying population, with errors that are
uncorrelated and statistically independent. All variables should follow a
multivariate normal distribution. This assumption can be assessed using
histograms or Q-Q plots, and normality can be verified through goodness-of-fit
tests like the Kolmogorov-Smirnov test. If data is not normally distributed, non-
linear transformations (e.g., a log transformation) might be necessary.
 Little to No Multicollinearity: A key assumption of linear regression is the
absence, or minimization, of multicollinearity. Multicollinearity occurs when
independent variables are highly correlated with each other.
 No Autocorrelation: Linear regression analysis requires little to no
autocorrelation in the data. Autocorrelation occurs when residuals are not
independent, often observed in time series data where current values depend on
past values (e.g., stock prices). The Durbin-Watson test can detect autocorrelation,
and scatter plots can visually reveal it. The Durbin-Watson test assesses the null
hypothesis that residuals are not linearly auto-correlated. The test statistic 'd'
ranges from 0 to 4, with values near 2 indicating no autocorrelation. Generally,
values between 1.5 and 2.5 suggest no significant autocorrelation. Note that the
Durbin-Watson test primarily examines first-order effects and linear
autocorrelation between immediate neighbors.
 Homoscedasticity: Homoscedasticity, meaning "same variance," is fundamental
to linear regression. It assumes that the error term (the random disturbance in the
relationship between independent and dependent variables) has the same variance
across all values of the independent variables. Scatter plots are a suitable method
for checking for homoscedasticity in the data.
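A hedged sketch of how a few of these assumption checks might be run in Python
(statsmodels and synthetic data are assumed; the VIF cut-off of about 5 is a common rule
of thumb rather than something stated above):

# Illustrative checks of the OLS assumptions using statsmodels on synthetic data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                       # two independent variables
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=100)

X_const = sm.add_constant(X)
model = sm.OLS(y, X_const).fit()

# No autocorrelation: a Durbin-Watson statistic near 2 suggests independent residuals.
print("Durbin-Watson:", durbin_watson(model.resid))

# Little to no multicollinearity: VIF values well below ~5 are usually acceptable.
for i in range(1, X_const.shape[1]):
    print("VIF of variable", i, ":", variance_inflation_factor(X_const, i))

# Homoscedasticity / linearity: inspect residuals against fitted values (plot omitted).
print("Residual variance:", model.resid.var())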

ASSIGNMENT SET - 2

Q.4. Explain the K-Means Clustering algorithm


Ans. The K-Means algorithm is an iterative clustering technique where 'K' represents
the user-defined number of clusters to be identified within the dataset. The
algorithm’s primary output is a cluster assignment for each data point (row).
Essentially, K-Means identifies which data points belong to which distinct groups.
The resulting cluster labels can be added as a column to the dataset, indicating each
data point's group membership.
Sales Revenue with Cluster Labels:

KMeans Algorithm - Step-by-Step

1. Initialization: The algorithm begins by randomly selecting 'k' data points to serve as
the initial cluster centers. These points represent the centroids of the clusters.
2. Assignment: Each data point is then assigned to the cluster whose center is nearest,
according to a distance metric (e.g., Euclidean distance). This creates initial clusters.
3. Update: The algorithm then calculates new cluster centers by computing the mean
of all the data points assigned to each cluster. These new means become the updated
cluster centers.
4. Iteration: Steps 2 and 3 (assignment and update) are repeated iteratively. In each
iteration, data points are reassigned to the nearest cluster center, and the cluster
centers are recalculated.
5. Convergence: The iterative process continues until a stable solution is reached.
Stability is defined as the point where re-running the algorithm no longer results in
points changing cluster memberships (or minimal changes occur).
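A minimal sketch of these five steps using only NumPy (the toy data, random seed, and
convergence test are illustrative assumptions, not production code):

# Bare-bones K-Means following the five steps described above.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k random data points as the starting cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. Assignment: attach each point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: recompute each center as the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4./5. Iterate until the centers stop moving (convergence).
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = kmeans(X, k=2)
print("Cluster centers:", centers)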

Clustering algorithm Steps

Data Preparation for K-Means

Several considerations are essential for effective K-Means clustering:

 Numeric Data: KMeans requires numerical data. Categorical variables must be
converted into a numerical format.
 Scaling: All variables should be on the same scale. Variables on disparate scales
can bias the results.
 Determining 'K': Determining the optimal number of clusters ('K') for a given
dataset can be challenging and often involves techniques beyond simple visual
inspection.

Example of converting a categorical variable into dummy variables

Handling Categorical Variables (Dummy Variables/One-Hot Encoding)

To use categorical data with KMeans, it must be transformed into a numerical
representation. One common method is to create "dummy variables," also known as
one-hot encoding. This process creates a new binary column for each unique category
within the categorical variable.
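A brief illustration using pandas (the column name "city" and its categories are made-up
examples):

# One-hot encoding with pandas: one binary column per category (illustrative).
import pandas as pd

df = pd.DataFrame({"city": ["Delhi", "Mumbai", "Delhi", "Chennai"]})
dummies = pd.get_dummies(df, columns=["city"])
print(dummies)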

Scaling Data: Z-Standardization

Bringing data onto the same scale is critical. A common scaling method is Z-
standardization (or Z-transform). This involves:

1. Subtracting the mean of the variable from each data point.


2. Dividing the result by the standard deviation of the variable.

The resulting values are called Z-scores. Z-standardization transforms the data to have
a mean of zero and a standard deviation of one, allowing for valid comparisons across
different distributions.

Computation of Z-Scores
The figure above shows the z-transformation of the income and children's data. As can
be observed, the values now lie in a common range, roughly −1 to 1.
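A short sketch of the two-step computation described above (pandas is assumed; the income
and children values are toy numbers, not those from the original figure):

# Z-standardization: subtract the column mean, then divide by the standard deviation.
import pandas as pd

df = pd.DataFrame({"income": [25000, 40000, 60000, 90000], "children": [0, 1, 2, 3]})
z = (df - df.mean()) / df.std()    # each column now has mean 0 and standard deviation 1
print(z)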

Q.5. Discuss various validation measures used for Machine Learning in detail.
Ans. Evaluating a model's performance on an unseen dataset using validation
measures is essential for assessing its generalization ability. These measures differ
depending on the type of problem, such as classification or regression, and offer
insights into the quality of the model's predictions.

For classification problems, common validation measures include accuracy, precision,
recall, F1 score, and AUC-ROC.

1. Accuracy is the proportion of correct predictions out of all predictions.
However, it can be misleading in cases of imbalanced classes or unequal costs
of false positives and false negatives.
2. Precision measures the proportion of true positive predictions out of all
positive predictions made. While it reflects the classifier's reliability when
predicting the positive class, it doesn't capture all aspects of model
performance.
3. Recall, also known as sensitivity, quantifies the proportion of actual positives
correctly predicted. It highlights how effectively the model captures all
positive instances but, like precision, doesn't provide a complete picture.
4. The F1 score is the harmonic mean of precision and recall, offering a single
metric that balances both. It is useful when both precision and recall are
important and is commonly used in imbalanced datasets.
5. AUC-ROC represents a classifier's performance by plotting the true positive
rate against the false positive rate at various thresholds. The AUC value
summarizes overall performance, with 1 being perfect and 0.5 indicating
random guessing.
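A hedged sketch of computing these classification measures with scikit-learn (the labels,
predictions, and scores below are invented purely for illustration):

# Accuracy, precision, recall, F1, and ROC-AUC on toy binary classification output.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                    # actual classes
y_pred  = [0, 1, 1, 1, 0, 0, 1, 0]                    # hard predictions
y_score = [0.2, 0.6, 0.8, 0.9, 0.4, 0.1, 0.7, 0.3]    # predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_score))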

For regression problems, validation measures like R² and adjusted R² are commonly
used:

1. R² measures the proportion of variance in the dependent variable that is
explained by the model. It ranges from 0 to 1, with 1 indicating a perfect fit.
However, R² can increase with more predictors, even if they don't improve the
model's quality.
2. Adjusted R² adjusts R² to penalize models that include too many irrelevant
predictors, providing a more reliable measure of goodness of fit.
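For reference, the standard formula (a textbook definition, not quoted from the assignment
itself) with n observations and p predictors is:
Adjusted R² = 1 − (1 − R²) × (n − 1) / (n − p − 1)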

A Confusion Matrix offers a detailed breakdown of true positives, false positives, true
negatives, and false negatives, from which other metrics like accuracy, precision, and
recall are derived.

For binary classification with imbalanced data, the AUC-PR (Precision-Recall curve)
is often more informative than ROC-AUC. The AUC-PR summarizes the classifier's
performance by combining precision and recall at different thresholds, with 1
indicating perfect performance; unlike ROC-AUC, the baseline for a random classifier
equals the proportion of positive instances rather than 0.5.
Choosing the appropriate validation measure depends on the specific problem and
data characteristics. Measures like accuracy, precision, recall, and F1 score are useful
for classification, while R² and adjusted R² serve regression tasks. The AUC-ROC and
AUC-PR provide further insights, particularly in cases of imbalanced datasets.

Q.6. Briefly explain ‘Splitting Criteria’, ‘Merging Criteria’ and ‘Stopping Criteria’ in
Decision Tree.
Ans. Decision tree algorithms, versatile for both classification and regression tasks,
utilize specific criteria to control tree growth and avoid overfitting. These criteria fall
into three main categories: splitting, merging, and stopping, each essential for
building effective and efficient decision trees.

Splitting Criteria:

The core of decision tree construction lies in splitting criteria, which identifies the
best attribute to divide a node into purer sub-nodes. The goal is to maximize
information gain, resulting in sub-nodes that are more homogeneous with respect to
the target variable. Various splitting metrics exist, each with its own advantages and
disadvantages.

For classification, common metrics include Gini impurity and information gain (based
on entropy). Gini impurity measures the probability of misclassifying a randomly
selected element if it were labeled based on the class distribution of the node. Lower
Gini impurity signifies a more homogeneous node. Information gain, conversely,
quantifies the reduction in entropy (a measure of disorder) achieved by the split. The
attribute yielding the highest information gain is chosen for the split.

For regression, variance reduction is often used. This involves selecting the attribute
that minimizes the variance of the target variable within the resulting sub-nodes. By
reducing variance, more homogeneous groups are created, improving prediction
accuracy.

The choice of splitting criterion impacts the tree's structure and performance.
Information gain tends to favor attributes with many values, while Gini impurity is
computationally cheaper.
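A small sketch showing how the two impurity measures could be computed for a node's
class distribution (plain Python; the class counts are assumed toy values):

# Gini impurity and entropy for a node, given the count of each class in that node.
import math

def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

node = [40, 10]                          # e.g. 40 samples of class A, 10 of class B
print("Gini impurity:", gini(node))      # lower value = more homogeneous node
print("Entropy      :", entropy(node))   # information gain = parent entropy - weighted child entropy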

Merging Criteria:

Merging criteria, though less frequently used, allow for combining nodes or branches
after initial tree growth. This technique, known as pruning, aims to simplify the tree
and prevent overfitting. After an initial tree is built, merging criteria assess whether
combining nodes improves performance. This involves comparing the performance of
the reduced tree with the original. The criterion often uses statistical tests to determine
if merging would significantly negatively impact accuracy. If not, the nodes are
combined.

Stopping Criteria:
Stopping criteria define the conditions under which tree growth is halted. Without
these, the tree could grow indefinitely, leading to overfitting and poor generalization.
Several common stopping criteria are used.

One criterion is a minimum number of samples required in a node before splitting is
attempted. This prevents the creation of small, specialized nodes prone to overfitting.
Another is a maximum depth for the tree, limiting complexity and preventing
memorization of training data. A minimum impurity decrease can also halt splits if the
improvement in homogeneity is below a threshold. Finally, splitting can stop when all
instances in a node belong to the same class, as further splitting is unnecessary.
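These stopping criteria correspond directly to hyperparameters in common implementations;
a hedged scikit-learn sketch follows (the specific values are assumptions chosen for
illustration, not recommendations from the text):

# Stopping criteria expressed as DecisionTreeClassifier hyperparameters (illustrative).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(
    max_depth=4,                 # maximum depth of the tree
    min_samples_split=10,        # minimum samples in a node before a split is attempted
    min_impurity_decrease=0.01,  # required improvement in homogeneity to allow a split
    random_state=0,
).fit(X, y)
print("Tree depth:", tree.get_depth(), "| leaves:", tree.get_n_leaves())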

By carefully balancing splitting criteria with appropriate stopping criteria, accurate
and generalizable decision trees can be constructed, capable of reliable predictions on
new data. Fine-tuning these parameters is crucial for optimal performance and
avoiding overfitting or underfitting.
