
Assignment Solution

Answer 1:-
Answer 2 (1)
Answer 2 (2)
Generalization:-
Generalization refers to a model's ability to adapt properly to new, previously unseen data drawn from the same distribution as the data used to create the model. A model that generalizes well has learned the underlying pattern rather than overfitting to noise in its training set.

Answer 2 (3)
Cross-validation:-
Cross-validation, sometimes called rotation estimation or out-of-sample testing, is a family of techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation includes resampling and sample-splitting methods that use different portions of the data to train and test a model on different iterations. It is often used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.
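As an illustration, here is a minimal k-fold cross-validation sketch using scikit-learn; the synthetic dataset and the logistic-regression model are placeholder choices for the example, not part of the assignment.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Placeholder data: 100 samples, 5 features (invented for illustration)
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# 5-fold cross-validation: each fold serves once as the held-out test set
scores = cross_val_score(LogisticRegression(), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores.mean())  # average accuracy across the 5 folds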

Hold-out validation:- Hold-out is when you split your dataset into a 'train' set and a 'test' set. The training set is what the model is trained on, and the test set is used to see how well that model performs on unseen data. A common split when using the hold-out method is 80% of the data for training and the remaining 20% for testing.
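A minimal hold-out sketch with scikit-learn, using the 80/20 ratio from the text (the dataset is again a placeholder):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, n_features=5, random_state=0)  # placeholder data
# 80% of the rows go to training, 20% to testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 80 20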
LOOCV (Leave-One-Out Cross-Validation):- This is a type of cross-validation in which each observation in turn serves as the validation set while the remaining (N-1) observations form the training set. The model is fitted on those N-1 observations and used to predict the single held-out observation, and the process is repeated N times so that every observation acts once as the validation set. LOOCV is the special case of K-fold cross-validation in which the number of folds equals the number of observations (K = N). Because nearly all of the data is used for training in every fit, the method reduces bias in the error estimate; it aims to estimate the mean squared error reliably and to help prevent overfitting.
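A sketch of LOOCV with scikit-learn on placeholder data; note that cv=LeaveOneOut() behaves like KFold with n_splits equal to the number of observations:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=30, n_features=5, random_state=0)  # small N keeps the N fits cheap
# One fit per observation: train on N-1 points, validate on the one left out
scores = cross_val_score(LogisticRegression(), X, y, cv=LeaveOneOut())
print(scores.mean())  # fraction of held-out observations predicted correctly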

Answer 2 (4):-
Answer 2 (5)
Feature correlation is a measure of how strongly two or more variables are
related to each other. It can help you understand the patterns and
dependencies in your data, and how they affect your machine learning
model.
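For instance, a Pearson correlation matrix can be computed directly with NumPy; the 3-feature array below is invented for the example:

import numpy as np

# Invented data: 5 samples of 3 features (columns are the variables)
data = np.array([[1.0, 2.0, 0.5],
                 [2.0, 4.1, 0.4],
                 [3.0, 5.9, 0.6],
                 [4.0, 8.2, 0.5],
                 [5.0, 9.9, 0.7]])
corr = np.corrcoef(data, rowvar=False)  # 3x3 Pearson correlation matrix
print(corr.round(2))  # entries near +/-1 mark strongly related features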

Answer 2 (6)
If the algorithm is too simple (e.g., a hypothesis based on a linear equation) it may end up in a high-bias, low-variance condition and thus be error-prone. If the algorithm fits something too complex (e.g., a hypothesis based on a high-degree equation) it may end up in a high-variance, low-bias condition, and in that case it will not perform well on new entries. The sweet spot between these two conditions is known as the Trade-off, or Bias-Variance Trade-off.

The best fit is given by the hypothesis at the trade-off point. [Figure: error versus model complexity, illustrating the trade-off point.]
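For reference, the trade-off can be stated formally. Assuming squared-error loss and irreducible noise variance $\sigma^2$, the expected test error decomposes as

E[(y - \hat{f}(x))^2] = \mathrm{Bias}[\hat{f}(x)]^2 + \mathrm{Var}[\hat{f}(x)] + \sigma^2

Simple models inflate the bias term, complex models inflate the variance term, and the trade-off point minimizes their sum.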
Answer 2 (7):-
Prior Probability :-
The prior probability is the probability assigned to an event before the
arrival of some information that makes it necessary to revise the assigned
probability.

Posterior probability:- Informally, posterior probability = prior probability + new data. In Bayesian statistics, a posterior probability is defined as the revised or updated probability of an event happening after new information has been taken into account.

Conditional probability:- Conditional probability is the likelihood of an event or outcome occurring given that a previous event or outcome has already occurred. Via the multiplication rule, the joint probability of both events is calculated by multiplying the probability of the preceding event by the conditional probability of the succeeding event.
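The three notions are tied together by Bayes' theorem, which converts a prior into a posterior through a conditional probability:

P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

Here $P(A)$ is the prior, $P(B \mid A)$ is the conditional probability (likelihood) of the new information $B$, and $P(A \mid B)$ is the posterior.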

Answer 3:-
Issues with current AI models:-
1. Lack of generalization
2. Bias in algorithms and data
3. Lack of robustness and reliability
4. Heavy data dependency
5. High computational cost
The problem of unity of perception, also known as the combination problem or
subjective unity of perception, is the question of how the brain constructs
phenomenal objects. Early philosophers like Gottfried Wilhelm Leibniz and René
Descartes noted that the apparent unity of experience is a qualitative characteristic
that doesn't seem to have an equivalent in the quantitative features of composite
matter, like cohesion or proximity.

Answer 4:-
Entropy of the full set: E(S) = E([3+, 3-]) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.
Entropy of the branches under attribute a2: E(T) = E([2+, 2-]) = 1 and E(F) = E([1+, 1-]) = 1.
Information gain: Gain(S, a2) = E(S) - (4/6) E(T) - (2/6) E(F) = 1 - 4/6 - 2/6 = 0.
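The same computation as a short Python sketch (the counts are taken from the answer above):

import numpy as np

def entropy(pos, neg):
    """Shannon entropy of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * np.log2(p)
    return result

e_s = entropy(3, 3)  # E(S) = 1.0
e_t = entropy(2, 2)  # E(T) = 1.0 (4 of the 6 examples fall in this branch)
e_f = entropy(1, 1)  # E(F) = 1.0 (the other 2 examples)
gain = e_s - (4/6) * e_t - (2/6) * e_f
print(gain)  # 0.0 -- splitting on a2 yields no information about the class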

Answer 5:-
Multicollinearity occurs when independent variables in a regression model are
correlated. This correlation is a problem because independent variables should be
independent. If the degree of correlation between variables is high enough, it can
cause problems when you fit the model and interpret the results.

Removing collinearity:-

1. Variable selection
2. Principal Component Analysis (see the sketch below)
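As a minimal illustration of the second option, PCA replaces correlated features with uncorrelated principal components; the two-feature data below is invented for the example:

import numpy as np
from sklearn.decomposition import PCA

# Invented data: two highly correlated features
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, 0.9 * x1 + rng.normal(scale=0.1, size=100)])

# Principal-component scores are uncorrelated by construction
Z = PCA(n_components=2).fit_transform(X)
print(np.corrcoef(Z, rowvar=False).round(3))  # off-diagonal entries ~ 0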

Answer 6:-
A 98.5% accuracy might not be good if the dataset is imbalanced. Check precision,
recall, and F1-score to ensure effective fraud detection. If these metrics are also
high, the model is likely good. Otherwise, address class imbalance and validate
with cross-validation.
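A sketch of the suggested check; the labels and predictions are invented to mimic a fraud-style imbalance (15 frauds in 1000 transactions):

import numpy as np
from sklearn.metrics import classification_report

# Invented example: a model that always predicts "no fraud"
y_true = np.array([1] * 15 + [0] * 985)
y_pred = np.zeros(1000, dtype=int)

# Accuracy is 98.5%, yet precision/recall for the fraud class are 0 -- the model is useless
print(classification_report(y_true, y_pred, zero_division=0))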

Answer 7:-
Pruning a decision tree reduces overfitting, simplifies the model, improves generalization performance, and decreases computation time.
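One concrete way to prune is scikit-learn's cost-complexity pruning; this sketch uses placeholder data and an arbitrary ccp_alpha:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)  # placeholder data

unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
# ccp_alpha > 0 collapses branches whose complexity outweighs their impurity reduction
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)
print(unpruned.tree_.node_count, ">=", pruned.tree_.node_count)  # the pruned tree is smaller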

Answer 8.
Answer 9.
