Ai&ml 2
K-FOLD CROSS-VALIDATION
Limitations of Cross-Validation
For ideal conditions, it provides the optimum output, but for inconsistent data it may
produce drastically varying results.
In predictive modeling, the data evolves over a period of time, due to which differences
may arise between the training and validation sets.
Applications of Cross-Validation
This technique can be used to compare the performance of different predictive modeling
methods.
It has great scope in the medical research field.
It can also be used for meta-analysis, as shown in the sketch below.
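As an illustration, here is a minimal sketch of k-fold cross-validation with scikit-learn; the iris dataset, logistic regression model, and k=5 are illustrative assumptions, not part of the notes above:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cross_val_score splits the data into 5 folds, trains on 4 folds and
# validates on the remaining one, rotating until every fold has served
# once as the validation set.
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())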
2. Define confusion matrix and explain its components like TP, TN, FP, FN, Type-1
and Type-2 errors and calculation using confusion matrix.
ANS- The confusion matrix, also known as an error matrix, is a matrix used to determine the
performance of classification models for a given set of test data. Features of the confusion
matrix are:
o For 2 prediction classes of a classifier, the matrix is a 2*2 table; for 3 classes, it is a 3*3
table, and so on.
o The matrix is divided into two dimensions, namely predicted values and actual
values, along with the total number of predictions.
o Predicted values are those values which are predicted by the model, and actual values are
the true values for the given observations.
o True Negative: The model has predicted No, and the real or actual value was also No.
o True Positive: The model has predicted Yes, and the actual value was also Yes.
o False Negative: The model has predicted No, but the actual value was Yes. It is also
called a Type-II error.
o False Positive: The model has predicted Yes, but the actual value was No. It is also
called a Type-I error.
o It evaluates the performance of the classification models, when they make predictions on
test data, and tells how good our classification model is.
o It not only tells the error made by the classifier but also the type of error, i.e., whether it
is a Type-I or Type-II error.
o Misclassification rate/error rate: It defines how often the model gives wrong
predictions. The formula is given below:
Error rate = (FP + FN) / (TP + TN + FP + FN)
o Precision: Out of all positive predictions made by the model, how many were actually
positive. It can be calculated using the below formula:
Precision = TP / (TP + FP)
o Recall: Out of the total actual positive classes, how many were correctly predicted as
positive. The recall must be as high as possible:
Recall = TP / (TP + FN)
o F-measure: This score helps us to evaluate recall and precision at the same time. The
F-score is maximum when recall equals precision (see the code sketch after this list):
F-measure = (2 * Recall * Precision) / (Recall + Precision)
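The following small Python sketch shows how these metrics follow from the confusion matrix counts; the y_true/y_pred vectors are made-up illustrations (1 = Yes, 0 = No) and scikit-learn is assumed to be available:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# For binary labels, ravel() unpacks the 2*2 matrix as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy  = (tp + tn) / (tp + tn + fp + fn)
error     = (fp + fn) / (tp + tn + fp + fn)   # misclassification rate
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)
print(accuracy, error, precision, recall, f_measure)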
ROC Curve:
ROC or Receiver Operating Characteristic curve represents a probability graph to show the
performance of a classification model at different threshold levels. The curve is plotted
between two parameters, which are:
TPR or True Positive Rate (same as recall): TPR = TP / (TP + FN)
FPR or False Positive Rate: FPR = FP / (FP + TN)
AUC Curve:
AUC stands for Area Under the ROC Curve. It calculates the two-dimensional area under the
entire ROC curve, ranging from (0,0) to (1,1).
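A hedged sketch of computing the ROC curve and AUC with scikit-learn follows; the synthetic dataset and logistic regression classifier are illustrative choices:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]   # probability of the positive class

# roc_curve sweeps the classification threshold and returns FPR/TPR
# pairs; roc_auc_score integrates the area under that curve.
fpr, tpr, thresholds = roc_curve(y_te, scores)
print("AUC:", roc_auc_score(y_te, scores))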
Applications:
1. Classification of 3D model
The curve is used to classify a 3D model and separate it from the normal models.
2. Healthcare
The curve has various applications in the healthcare sector. It can be used to detect cancer
disease in patients. It does this by using false positive and false negative rates, and accuracy
depends on the threshold value used for the curve.
3. Binary Classification
AUC-ROC curve is mainly used for binary classification problems to evaluate their
performance.
o AUC is used to measure how well the predictions are ranked instead of giving their
absolute values. Hence, we can say AUC is Scale-Invariant.
o It measures the quality of predictions of the model without considering the selected
classification threshold. It means AUC is classification-threshold-invariant.
o Further, AUC is not a useful metric when there are wide disparities in the cost of false
negatives vs false positives, and it is difficult to minimize one type of classification error.
4. Define clustering.
ANS- Clustering is an unsupervised learning technique that groups the unlabelled data points
into clusters such that data points with similar properties fall into the same group.
e.g.- Grouping documents according to the topic, the recommendation systems of Amazon and
Netflix, etc.
The clustering methods are broadly divided into Hard clustering (a data point belongs to only
one group) and Soft Clustering (a data point can also belong to another group). Various other
approaches to clustering also exist, such as partitioning, density-based, distribution
model-based, hierarchical, and fuzzy clustering.
5. Explain the K-means clustering algorithm and its working.
ANS- It is an iterative, centroid-based unsupervised learning algorithm that divides the
unlabelled dataset into k different clusters in such a way that each data point belongs to only
one group with similar properties.
o It assigns each data point to its closest k-center; the data points near a particular k-center
form a cluster.
Working (a code sketch follows these steps):
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids (they can be points other than those from the input
dataset).
Step-3: Assign each data point to its closest centroid, which will form the predefined K
clusters.
Step-4: Calculate the variance and place a new centroid in each cluster.
Step-5: Repeat step 3, i.e., reassign each data point to the new closest centroid of each cluster.
Step-6: If any reassignment occurs, go to step 4; otherwise, the model is ready.
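A minimal sketch of these steps using scikit-learn's KMeans; k=3 and the synthetic blob data are illustrative assumptions:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# n_init repeats the algorithm with different random centroids and keeps
# the best run, which mitigates a bad initial choice in step 2.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print("Final centroids:\n", kmeans.cluster_centers_)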
6. Write the algorithm of K-medoid method and its advantages and disadvantages. (for
numerical y/t)
ANS-
1. Initialize: select k random points out of the n data points as the medoids.
2. Associate each data point to the closest medoid by using any common distance metric
methods.
3. While the cost decreases: for each medoid m and for each non-medoid data point o:
Swap m and o, associate each data point to the closest medoid, and recompute the
cost.
If the total cost is more than that in the previous step, undo the swap.
(A code sketch of this swap loop is given below.)
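A plain-NumPy sketch of a naive PAM-style swap loop; the synthetic two-blob data and k=2 are illustrative assumptions, and the full cost is recomputed for every candidate swap:

import numpy as np

def pam(X, k, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = list(rng.choice(n, size=k, replace=False))  # step 1

    def total_cost(meds):
        # Step 2: each point is associated with its closest medoid.
        return dist[:, meds].min(axis=1).sum()

    cost = total_cost(medoids)
    improved = True
    while improved:  # step 3: keep swapping while the cost decreases
        improved = False
        for i in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                candidate = medoids[:i] + [o] + medoids[i + 1:]
                new_cost = total_cost(candidate)
                if new_cost < cost:  # keep the swap only if cost drops
                    medoids, cost = candidate, new_cost
                    improved = True
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])
medoids, labels = pam(X, k=2)
print("Medoid indices:", medoids)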
Advantages:
1. It is simple to understand and easy to implement.
2. It is more robust to noise and outliers than K-means, because it uses actual data points
(medoids) as cluster centers rather than means.
Disadvantages:
1. It is not suitable for clustering non-spherical (arbitrarily shaped) groups of objects. This is
because it uses compactness as clustering criteria instead of connectivity.
2. It may obtain different results for different runs on the same dataset because the first k
medoids are chosen randomly.
7. Explain the DBSCAN clustering algorithm and its parameters.
ANS- Partitioning methods (K-means, PAM clustering) and hierarchical clustering are suitable
only for compact and well-separated clusters, and they are also severely affected by the presence
of noise and outliers in the data. Real-life data may contain irregularities: clusters can be of
arbitrary shape, and data may contain noise. Thus we use DBSCAN in such cases. Its two key
parameters are:
1. eps: It defines the neighborhood around a data point. If it is chosen very large, then the
clusters will merge and the majority of the data points will be in the same cluster. To find
the eps value, we use the k-distance graph.
2. MinPts: Minimum number of neighbors (data points) within the eps radius. The larger the
dataset, the larger the value of MinPts that should be chosen. MinPts can be derived from
the number of dimensions D in the dataset as MinPts >= D + 1. The minimum value of
MinPts must be at least 3. (A code sketch using both parameters follows.)
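A hedged sketch of DBSCAN in scikit-learn; eps=0.3 and min_samples=5 are illustrative values that would normally be tuned (e.g., eps from the k-distance graph mentioned above):

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-moons: arbitrarily shaped clusters that K-means
# would handle poorly.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.3, min_samples=5).fit(X)
# Points labelled -1 are treated as noise/outliers rather than being
# forced into a cluster.
print("Cluster labels found:", set(db.labels_))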
8. Explain the K-Nearest Neighbors (KNN) algorithm with its advantages, disadvantages,
and applications.
Algorithm:
Step-1: Select the number K of the neighbors.
Step-2: Calculate the Euclidean distance from the new data point to the training points.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbors is
maximum.
Euclidean Distance: for two points A(x1, y1) and B(x2, y2), d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
Advantages:
o It is simple to implement.
Disadvantages:
o It always needs to determine the value of K, which may be complex at times.
Applications of KNN
Banking system, Calculating credit card ratings, Politics, Speech Recognition, Handwriting
detection, Image Recognition, Video Recognition. A short KNN code sketch is given below.
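A minimal KNN sketch with scikit-learn; K=5 and the iris data are illustrative assumptions. The classifier uses Euclidean distance by default, matching the formula above:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # step 1: choose K
knn.fit(X_tr, y_tr)
# Prediction follows steps 2-5: find the 5 nearest training points and
# take a majority vote among their categories.
print("Test accuracy:", knn.score(X_te, y_te))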
9. Explain the Decision Tree algorithm and its working.
ANS- Decision Tree is a supervised learning technique that can be used for both
classification and regression problems. It is called a decision tree because, similar to a tree,
it starts with the root node, which expands into further branches and constructs a tree-like
structure. In order to build the tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm. A decision tree simply asks a question
and, based on the answer (Yes/No), further splits the tree into subtrees.
Algorithm:
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values for the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in step
3. Continue this process until a stage is reached where the nodes cannot be classified further;
the final node is called a leaf node. (A code sketch follows these steps.)
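A short sketch of training a CART decision tree with scikit-learn; the iris data and max_depth=3 are illustrative assumptions (the depth cap also limits the overfitting noted below):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# criterion="gini" is one common Attribute Selection Measure (ASM).
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_tr, y_tr)
print(export_text(tree))            # the learned root/branch/leaf structure
print("Test accuracy:", tree.score(X_te, y_te))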
Disadvantages:
o It may have an overfitting issue, which can be resolved using the Random Forest
algorithm.
o For more class labels, the computational complexity of the decision tree may increase.