Data Analytics and Model Evaluation
In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-
shaped structure is known as the dendrogram.
Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two differ in how they work: unlike the K-means algorithm, hierarchical clustering does not require the number of clusters to be predetermined.
o Step-1: Treat each data point as a single cluster. If there are N data points, the number of clusters will also be N.
o Step-2: Take two closest data points or clusters and merge them to form one cluster. So,
there will now be N-1 clusters.
o Step-3: Again, take the two closest clusters and merge them together to form one cluster.
There will be N-2 clusters.
o Step-4: Repeat Step 3 until only one cluster is left, so that all the data points end up in a single big cluster.
o Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram
to divide the clusters as per the problem.
1. Single Linkage: It is the shortest distance between the closest points of two different clusters.
2. Complete Linkage: It is the farthest distance between the two points of two different clusters. It is one of the popular linkage methods, as it forms tighter clusters than single linkage.
3. Average Linkage: It is the linkage method in which the distance between each pair of
datasets is added up and then divided by the total number of datasets to calculate the
average distance between two clusters. It is also one of the most popular linkage methods.
4. Centroid Linkage: It is the linkage method in which the distance between the centroids of the clusters is calculated.
We can apply any of these approaches according to the type of problem or business requirement.
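As a concrete sketch (assuming SciPy and a small synthetic dataset of six 2-D points), the linkage function builds the hierarchy bottom-up, and its method argument selects among the linkage criteria described above:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Small synthetic dataset: six two-dimensional points (illustrative values)
X = np.array([[1.0, 1.0], [1.2, 1.1], [0.9, 1.3],
              [5.0, 5.0], [5.2, 5.1], [4.9, 5.3]])

# Build the hierarchy bottom-up; 'method' can be 'single', 'complete',
# 'average', or 'centroid', matching the linkage methods above
Z = linkage(X, method='complete')

# Each row of Z records one merge: the two clusters joined and their distance
print(Z)

# Draw the dendrogram; the height of each link is the merge distance
dendrogram(Z, labels=['P1', 'P2', 'P3', 'P4', 'P5', 'P6'])
plt.show()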
The working of the dendrogram can be explained with a two-panel diagram in which the left part shows how clusters are created in agglomerative clustering and the right part shows the corresponding dendrogram.
o As we have discussed above, the data points P2 and P3 merge first and form a cluster, and correspondingly a dendrogram link is created connecting P2 and P3 with a rectangular shape. The height is decided according to the Euclidean distance between the data points.
o In the next step, P5 and P6 form a cluster, and the corresponding dendrogram link is created. It is higher than the previous one, as the Euclidean distance between P5 and P6 is a little greater than that between P2 and P3.
o Again, two new dendrogram links are created: one combining P1, P2, and P3, and another combining P4, P5, and P6.
o Finally, the last dendrogram link is created, combining all the data points together.
We can cut the dendrogram tree structure at any level as per our requirement.
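Continuing the SciPy sketch above, cutting the tree at a chosen height is a one-liner with fcluster (the threshold of 2.0 is arbitrary):

from scipy.cluster.hierarchy import fcluster

# Cut the dendrogram at distance 2.0; points whose merges happen below
# this height end up in the same flat cluster
labels = fcluster(Z, t=2.0, criterion='distance')
print(labels)   # e.g. [1 1 1 2 2 2] for the two well-separated groups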
https://www.javatpoint.com/hierarchical-clustering-in-machine-learning
If you receive huge amounts of unstructured data in the form of text (emails, social media
conversations, chats), you’re probably aware of the challenges that come with analyzing this
data.
Manually processing and organizing text data is slow, tedious, and error-prone, and it can be expensive if you need to hire extra staff to sort through text.
In this guide, learn more about what text analysis is, how to perform text analysis using AI tools,
and why it’s more important than ever to automatically analyze your text in real time.
Text analysis (TA) is a machine learning technique used to automatically extract valuable
insights from unstructured text data. Companies use text analysis tools to quickly digest online
data and documents, and transform them into actionable insights.
You can use text analysis to extract specific information, like keywords, names, or company information, from thousands of emails, or categorize survey responses by sentiment and topic.
Text Analysis vs. Text Mining vs. Text Analytics
Firstly, let's dispel the myth that text mining and text analysis are two different processes. The
terms are often used interchangeably to explain the same process of obtaining data through
statistical pattern learning. To avoid any confusion here, let's stick to text analysis.
Text analysis delivers qualitative results and text analytics delivers quantitative results. If a
machine performs text analysis, it identifies important information within the text itself, but if it
performs text analytics, it reveals patterns across thousands of texts, resulting in graphs, reports,
tables etc.
Let's say a customer support manager wants to know how many support tickets were solved by
individual team members. In this instance, they'd use text analytics to create a graph that
visualizes individual ticket resolution rates.
However, it's likely that the manager also wants to know what proportion of tickets resulted in a positive or negative outcome.
By analyzing the text within each ticket, and subsequent exchanges, customer support managers
can see how each agent handled tickets, and whether customers were happy with the outcome.
Basically, the challenge in text analysis is decoding the ambiguity of human language, while in
text analytics it's detecting patterns and trends from the numerical results.
Text analysis tools allow businesses to structure vast quantities of information, like emails, chats,
social media, support tickets, documents, and so on, in seconds rather than days, so you can
redirect extra resources to more important business tasks.
Businesses are inundated with information and customer comments can appear anywhere on the
web these days, but it can be difficult to keep an eye on it all. Text analysis is a game-changer
when it comes to detecting urgent matters, wherever they may appear, 24/7 and in real time. By
training text analysis models to detect expressions and sentiments that imply negativity or
urgency, businesses can automatically flag tweets, reviews, videos, tickets, and the like, and take
action sooner rather than later.
Humans make errors. Fact. And the more tedious and time-consuming a task is, the more errors
they make. By training text analysis models to your needs and criteria, algorithms are able to
analyze, understand, and sort through data much more accurately than humans ever could.
Text data derived from natural language is unstructured and noisy. Text preprocessing
involves transforming text into a clean and consistent format that can then be fed into a
model for further analysis and learning.
Text preprocessing techniques may be general so that they are applicable to many
types of applications, or they can be specialized for a specific task. For example, the
methods for processing scientific documents with equations and other mathematical
symbols can be quite different from those for dealing with user comments on social
media.
Here's what you need to know about text preprocessing to improve your natural
language processing (NLP).
An NLP pipeline for document classification might include steps such as sentence
segmentation, word tokenization, lowercasing, stemming or lemmatization, stop word
removal, and spelling correction. Some or all of these commonly used text
preprocessing stages are used in typical NLP systems, although the order can vary
depending on the application.
Segmentation
Segmentation involves breaking up text into its component sentences. While this may
seem like a trivial task, it has a few challenges. For example, in the English language, a
period normally indicates the end of a sentence, but many abbreviations, including
“Inc.,” “Calif.,” “Mr.,” and “Ms.,” and all fractional numbers contain periods and introduce
uncertainty unless the end-of-sentence rules accommodate those exceptions.
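As a sketch of how a tokenizer copes with such exceptions, NLTK's punkt-based sentence tokenizer keeps common abbreviations intact (the sample text is illustrative, and the punkt model must be downloaded once):

import nltk
nltk.download('punkt')   # one-time download of the sentence tokenizer model
from nltk.tokenize import sent_tokenize

text = "Mr. Smith moved to Calif. last year. He works at Acme Inc."
print(sent_tokenize(text))
# Expected: ['Mr. Smith moved to Calif. last year.', 'He works at Acme Inc.']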
Tokenization
The tokenization stage involves converting a sentence into a stream of words, also
called “tokens.” Tokens are the basic building blocks upon which analysis and other
methods are built.
Many NLP toolkits allow users to input multiple criteria based on which word boundaries
are determined. For example, you can use whitespace or punctuation to determine if
one word has ended and the next one has started. Again, in some instances, these
rules might fail. For example, don’t, it’s, etc. are words themselves that contain
punctuation marks and have to be dealt with separately.
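For instance, NLTK's word tokenizer splits contractions such as don't into separate tokens (output shown as a comment):

from nltk.tokenize import word_tokenize

print(word_tokenize("Don't stop! It's fun."))
# ['Do', "n't", 'stop', '!', 'It', "'s", 'fun', '.']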
Change Case
Changing the case involves converting all text to lowercase or uppercase so that all
word strings follow a consistent format. Lowercasing is the more frequent choice in NLP
software.
Spell Correction
Many NLP applications include a step to correct the spelling of all words in the text.
Stop-Words Removal
"Stop words" are frequently occurring words used to construct sentences. In the English
language, stop words include is, the, are, of, in, and and. For some NLP applications,
such as document categorization, sentiment analysis, and spam filtering, these words
are redundant, and so are removed at the preprocessing stage.
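A minimal sketch of stop-word removal using NLTK's built-in English stop-word list (the stopwords corpus must be downloaded once):

import nltk
nltk.download('stopwords')   # one-time download
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
tokens = ['the', 'cat', 'is', 'in', 'the', 'hat']
filtered = [t for t in tokens if t not in stop_words]
print(filtered)   # ['cat', 'hat']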
Stemming
The term word stem is borrowed from linguistics and used to refer to the base or root
form of a word. For example, learn is a base word for its variants such as learn, learns,
learning, and learned.
Stemming is the process of converting all words to their base form, or stem. Normally, a
lookup table is used to find the word and its corresponding stem. Many search engines
apply stemming for retrieving documents that match user queries. Stemming is also
used at the preprocessing stage for applications such as emotion identification and text
classification.
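As an illustration, NLTK's PorterStemmer, a widely used rule-based stemmer (an alternative to a lookup table), reduces the variants above to a common stem:

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ['learn', 'learns', 'learning', 'learned']:
    print(word, '->', stemmer.stem(word))
# All four variants are reduced to the stem 'learn'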
Lemmatization
Lemmatization is a more advanced form of stemming and involves converting all words
to their corresponding root form, called “lemma.” While stemming reduces all words to
their stem via a lookup table, it does not employ any knowledge of the parts of speech
or the context of the word. This means stemming can’t distinguish which meaning of the
word right is intended in the sentences “Please turn right at the next light” and “She is
always right.”
The stemmer would stem right to right in both sentences; the lemmatizer would treat
right differently based upon its usage in the two phrases.
A lemmatizer also converts different word forms or inflections to a standard form. For
example, it would convert less to little, wrote to write, slept to sleep, etc.
A lemmatizer works with more rules of the language and contextual information than
does a stemmer. It also relies on a dictionary to look up matching words. Because of
that, it requires more processing power and time than a stemmer to generate output.
For these reasons, some NLP applications only use a stemmer and not a lemmatizer.
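A minimal sketch with NLTK's WordNet lemmatizer; supplying the part of speech ('v' for verb) is what lets it map inflections back to the lemma (the wordnet corpus must be downloaded once):

import nltk
nltk.download('wordnet')   # one-time download
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize('wrote', pos='v'))   # write
print(lemmatizer.lemmatize('slept', pos='v'))   # sleep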
Text Normalization
Text normalization is the preprocessing stage that converts text to a canonical
representation. A common application is the processing of social media posts, where
input text is shortened or words are spelled in different ways. For example, hello might
be written as hellooo or something might appear as smth, and different people might
choose to write real time, real-time, or realtime. Text normalization cleans the text and
ideally replaces all words with their corresponding canonical representation. In the last
example, all three forms would be converted to realtime. Many text normalization stages
also replace emojis in text with a corresponding word. For example, :-) is replaced by
happy face.
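In practice, normalization is often implemented with a substitution dictionary plus a few regular expressions; the canonical mapping below is a hypothetical example:

import re

# Hypothetical canonical mapping for illustration
CANONICAL = {
    'smth': 'something',
    'real time': 'realtime',
    'real-time': 'realtime',
    ':-)': 'happy face',
}

def normalize(text):
    # Collapse characters repeated three or more times: hellooo -> hello
    text = re.sub(r'(\w)\1{2,}', r'\1', text)
    # Replace known variants with their canonical form
    for variant, canonical in CANONICAL.items():
        text = text.replace(variant, canonical)
    return text

print(normalize('hellooo, smth happened in real time :-)'))
# hello, something happened in realtime happy face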
One of the more advanced text preprocessing techniques is parts of speech (POS)
tagging. This step augments the input text with additional information about the
sentence's grammatical structure. Each word is assigned to one of the predefined categories, such as noun, verb, or adjective. This step is also sometimes referred to as grammatical tagging.
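NLTK provides a convenient POS tagger (the averaged_perceptron_tagger model must be downloaded once; the tags follow the Penn Treebank convention):

import nltk
nltk.download('averaged_perceptron_tagger')   # one-time download
from nltk import pos_tag, word_tokenize

print(pos_tag(word_tokenize('The cat sat in the hat')))
# [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('in', 'IN'), ('the', 'DT'), ('hat', 'NN')]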
Does Text Preprocessing Improve Performance?
The simple answer is yes. Text preprocessing improves the performance of an NLP
system. For tasks such as sentiment analysis, document categorization, document
retrieval based upon user queries, and more, adding a text preprocessing layer provides
more accuracy.
Stages such as stemming, lemmatization, and text normalization make the vocabulary
size more manageable and transform the text into a more standard form across a
variety of documents acquired from different sources.
Once you have a clear idea of the type of application you are developing and the source
and nature of text data, you can decide on which preprocessing stages can be added to
your NLP pipeline. Most of the NLP toolkits on the market include options for all of the
preprocessing stages discussed above.
https://towardsdatascience.com/text-preprocessing-in-natural-language-processing-using-python-6113ff5decd8
A Simple Explanation of the Bag-of-Words
Model
A quick, easy introduction to the Bag-of-Words model and
how to implement it in Python.
NOVEMBER 30, 2019
The bag-of-words (BOW) model is a representation that turns arbitrary text into fixed-
length vectors by counting how many times each word appears. This process is often
referred to as vectorization.
Let's understand this with an example. Suppose we wanted to vectorize the three documents 'the cat sat', 'the cat sat in the hat', and 'the cat with the hat'.
Step 1: Determine the Vocabulary
We first define our vocabulary, which is the set of all words found in our document set. The only words that are found in the 3 documents above are: the, cat, sat, in, hat, and with.
Step 2: Count
To vectorize our documents, all we have to do is count how many times each word
appears:
Notice that we lose contextual information, e.g. where in the document the word
appeared, when we use BOW. It’s like a literal bag-of-words: it only tells you what words
occur in the document, not where they occurred.
# Bag-of-words with the Keras Tokenizer
from tensorflow.keras.preprocessing.text import Tokenizer

docs = [
  'the cat sat',
  'the cat sat in the hat',
  'the cat with the hat',
]

## Step 1: Determine the Vocabulary
tokenizer = Tokenizer()
tokenizer.fit_on_texts(docs)
print(f'Vocabulary: {list(tokenizer.word_index.keys())}')

## Step 2: Count
vectors = tokenizer.texts_to_matrix(docs, mode='count')
print(vectors)
Notice that the vectors here have length 7 instead of 6 because of the extra 0 element at
the beginning. This is an inconsequential detail - Keras reserves index 0 and never
assigns it to any word.
I’ve written a blog post that uses BOW for profanity detection - check it out if you’re
curious to see BOW in action!
The term "bag of words" refers to a popular and simple technique used in natural language processing
(NLP) and information retrieval tasks. It represents a text document as an unordered collection or "bag"
of its individual words, disregarding grammar and word order. This technique focuses on the presence or
absence of words in a document rather than their sequence.
The approach typically involves these steps:
1. Tokenization: The text is split into individual words or tokens.
2. Vocabulary creation: A vocabulary or dictionary is created by listing all unique words present in the document corpus. Each word is assigned a unique index or identifier.
3. Encoding: Each document is represented as a numerical vector, where the length of the vector is equal
to the size of the vocabulary. The value at each position in the vector indicates the frequency, presence,
or other statistics associated with the corresponding word in the vocabulary.
4. Vectorization: The textual data is converted into numerical feature vectors, typically using methods
such as one-hot encoding or term frequency-inverse document frequency (TF-IDF) representation.
The bag of words approach has some limitations. It discards valuable information about word order,
grammar, and semantics, as it treats each word independently. It also ignores the context and meaning
of the words. Nevertheless, it has been widely used for various text-based tasks, such as document
classification, sentiment analysis, and information retrieval, especially when the focus is on keyword-
based analysis rather than understanding the overall structure of the text.
https://www.engati.com/glossary/bag-of-words
TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure used to evaluate the
importance of a term (word) within a document in the context of a collection of documents or corpus. It
is commonly employed in natural language processing (NLP) and text mining tasks, including topic
modeling.
1. Term Frequency (TF): This measures the frequency of a term within a document. It indicates how often
a term appears in a document relative to the total number of terms in that document. A higher TF value
signifies that a term is more relevant to the document.
2. Inverse Document Frequency (IDF): This quantifies the rarity of a term across the entire corpus. It
measures the logarithmically scaled inverse fraction of documents that contain the term. Terms that
appear in fewer documents are given a higher IDF value, indicating their significance and distinctiveness.
The TF-IDF score for a term in a document is calculated by multiplying its TF value with its IDF value. The
formula for TF-IDF is as follows:
TF-IDF = (Term Frequency in Document) * (Inverse Document Frequency)
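IDF is typically computed as log(N / df(t)), where N is the total number of documents and df(t) is the number of documents containing term t. A minimal sketch with scikit-learn's TfidfVectorizer, which implements a smoothed variant of this weighting (reusing the example documents from the bag-of-words section):

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ['the cat sat',
        'the cat sat in the hat',
        'the cat with the hat']

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)   # one row per document

# Terms appearing in every document ('the', 'cat') get low weight;
# rarer terms ('in', 'with') get higher weight
print(vectorizer.get_feature_names_out())
print(tfidf.toarray().round(2))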
In the context of topic modeling, TF-IDF is often employed as a preprocessing step to identify and extract
important features from a collection of documents. These features, represented as TF-IDF vectors,
capture the relative importance of terms within each document and across the entire corpus. Topic
modeling algorithms like Latent Dirichlet Allocation (LDA) can then be applied to these TF-IDF vectors to
discover latent topics present in the corpus.
By applying TF-IDF, words that are frequent within a document but rare in the overall corpus receive
higher weights, making them more influential in determining the document's topic. This helps to identify
and prioritize key terms or features associated with specific topics. In essence, TF-IDF serves as a
weighting scheme that highlights the salient terms that contribute significantly to the characterization of
topics within a corpus.
By leveraging TF-IDF in topic modeling, researchers and analysts can effectively extract and explore the
underlying themes and topics present in a collection of documents, facilitating tasks such as document
clustering, categorization, recommendation systems, and information retrieval.
Social network analysis (SNA) is a field of study that examines the relationships and interactions between
individuals, organizations, or other entities. It focuses on understanding the structure, dynamics, and
patterns of social networks. SNA provides a framework and set of methods to analyze and visualize the
relationships within a network, uncovering valuable insights about social systems.
At its core, social network analysis recognizes that social interactions occur within a larger network of
connections. These connections can be represented graphically, where each node represents an entity
(e.g., a person, organization, or website), and the edges represent the relationships or interactions
between them (e.g., friendships, collaborations, or information flows). By studying these network
structures and properties, social network analysts aim to understand how information, influence,
resources, and behaviors flow through social systems.
Social network analysis has gained significant attention and application in various fields, including
sociology, anthropology, psychology, organizational behavior, communication studies, and computer
science. It offers a powerful lens to explore and analyze social phenomena, such as the spread of ideas,
the formation of social groups, the diffusion of innovations, the dynamics of online communities, and
the influence of individuals within a network.
1. Network visualization: Graphs and visual representations help in understanding and interpreting the
structure and patterns of a social network.
2. Centrality measures: These measures identify the most important or influential nodes within a
network, based on their connections and positions.
3. Community detection: Identifying groups or clusters of nodes that are more densely connected to one another than to the rest of the network.
4. Network metrics: Quantitative measures that capture properties of a network, such as density, clustering coefficient, path length, and assortativity.
5. Diffusion and contagion modeling: Analyzing the spread of information, behaviors, or diseases through
a network, examining how they propagate and influence individuals.
6. Social network mining: Extracting patterns and insights from large-scale social network data, often
using machine learning and data mining techniques.
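Several of these techniques take only a few lines of code with the networkx library (a toy friendship graph with hypothetical names):

import networkx as nx

# Toy friendship network (hypothetical)
G = nx.Graph()
G.add_edges_from([('Ana', 'Ben'), ('Ana', 'Cai'), ('Ben', 'Cai'),
                  ('Cai', 'Dee'), ('Dee', 'Eli')])

# Centrality measures: who occupies the most connected or most
# "between" positions in the network?
print(nx.degree_centrality(G))
print(nx.betweenness_centrality(G))

# Network metrics: density and clustering coefficient
print(nx.density(G))
print(nx.average_clustering(G))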
Social network analysis provides valuable insights into various real-world applications, including social
media analysis, organizational dynamics, epidemiology, marketing, and recommendation systems. By
uncovering hidden relationships, influential individuals, and community structures, SNA contributes to a
deeper understanding of social systems and facilitates decision-making processes in various domains.
Business analysis is a discipline that focuses on identifying, analyzing, and solving business problems and
improving organizational processes. It involves understanding the needs and objectives of a business and
using analytical techniques to drive informed decision-making and achieve desired outcomes.
The role of a business analyst is to bridge the gap between business stakeholders and technology teams,
ensuring that solutions align with business goals and requirements. Business analysts work across
various industries and sectors, including finance, healthcare, retail, and information technology, among
others.
The key activities and components of business analysis include:
1. Understanding Business Needs: Business analysts work closely with stakeholders to identify and
articulate business needs, goals, and challenges. They gather requirements by conducting interviews,
workshops, and data analysis to gain a comprehensive understanding of the organization's current state
and desired future state.
2. Requirements Elicitation and Documentation: Business analysts gather requirements by engaging with
stakeholders to identify and document their needs. This involves creating business requirements
documents (BRDs), use cases, user stories, and other artifacts that capture the functional and non-
functional requirements of a project or initiative.
3. Analysis and Problem Solving: Business analysts analyze the gathered requirements and perform gap
analysis to identify areas of improvement and potential solutions. They use various techniques such as
process modeling, data analysis, and feasibility studies to evaluate different options and recommend the
most suitable course of action.
4. Solution Design and Evaluation: Business analysts collaborate with stakeholders and subject matter
experts to design solutions that address the identified business needs. This includes creating functional
specifications, wireframes, and prototypes to communicate the proposed solution. They also participate
in solution evaluation and validation to ensure that it meets the intended objectives.
5. Facilitating Communication and Collaboration: Business analysts act as facilitators and mediators
between business stakeholders and technology teams. They bridge the communication gap, ensuring
that requirements are understood by all parties involved. They facilitate meetings, workshops, and
discussions to foster collaboration and resolve conflicts.
6. Change Management and Implementation Support: Business analysts play a crucial role in managing
organizational change and ensuring the successful implementation of solutions. They create change
management plans, conduct impact assessments, and provide support during the implementation
phase. They also assist in user training and documentation to ensure smooth adoption of new processes
or technologies.
Overall, business analysis enables organizations to make informed decisions, streamline processes, and
achieve their strategic objectives. It requires a blend of analytical skills, communication abilities, domain
knowledge, and a deep understanding of business operations. By applying business analysis techniques
and methodologies, organizations can enhance efficiency, drive innovation, and gain a competitive
advantage in the marketplace.
1. Accuracy
The overall accuracy of a model is simply the number of correct predictions divided by the total number of predictions. An accuracy score gives a value between 0 and 1; a value of 1 indicates a perfect model.
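Expressed in terms of the confusion-matrix counts used throughout this section:

Accuracy = (TP + TN) / (TP + TN + FP + FN)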
2. Confusion Matrix
A confusion matrix is an extremely useful tool to observe in which
way the model is wrong (or right!). It is a matrix that compares the
number of predictions for each class that are correct and those that are
incorrect.
A confusion matrix for a binary classifier shows, for each class, how many predictions were correct and which classes the model confused with one another.
3. ROC Curve
ROC curves plot the true positive rate against the false positive rate at a range of classification thresholds. Because the false positive rate is computed over all negative cases, ROC curves are best suited to diagnosing the performance of models where the data is not imbalanced.
[Figure: ROC curve example, plotted using PyCaret]
4. Precision
Precision measures how good the model is at correctly identifying the positive class. In other words, out of all predictions for the positive class, how many were actually correct? Optimising a model for this metric alone would minimise false positives. That might be desirable for a fraud detection example, but it would be less useful for diagnosing cancer, as we would have little understanding of the positive observations that are missed.
5. Recall
Recall measures how good the model is at identifying all the actual positive cases. In other words, out of all actual positives, how many did the model correctly predict? Optimising for recall alone would minimise false negatives, which matters when missed positives are costly, as in diagnosing cancer.
6. F1 score
The F1 score is the harmonic mean of precision and recall. The F1 score will give a number between 0 and 1. If the F1 score is 1.0, this indicates perfect precision and recall. If the F1 score is 0, it means that either the precision or the recall is 0.
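In formula form:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 * (Precision * Recall) / (Precision + Recall)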
Example

Index:  1   2   3   4   5   6   7   8   9   10
Result: TP  FN  TP  TN  TP  FP  TP  TP  TN  TN
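Working through this example: TP = 5 (indices 1, 3, 5, 7, 8), FN = 1 (index 2), FP = 1 (index 6), and TN = 3 (indices 4, 9, 10), giving Accuracy = (5 + 3) / 10 = 0.8 and Precision = Recall = F1 = 5 / 6 ≈ 0.83.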
Confusion Matrix
Steps:
o Import the necessary libraries: NumPy, confusion_matrix from sklearn.metrics, seaborn, and matplotlib.
o Create NumPy arrays for the actual and predicted labels.
o Compute the confusion matrix.
o Plot the confusion matrix with the help of a seaborn heatmap.
# Import the necessary libraries
import numpy as np
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Actual and predicted labels (matching the worked example above)
actual = np.array(['Dog', 'Dog', 'Dog', 'Not Dog', 'Dog',
                   'Not Dog', 'Dog', 'Dog', 'Not Dog', 'Not Dog'])
predicted = np.array(['Dog', 'Not Dog', 'Dog', 'Not Dog', 'Dog',
                      'Dog', 'Dog', 'Dog', 'Not Dog', 'Not Dog'])

# Compute the confusion matrix (rows = actual, columns = predicted)
cm = confusion_matrix(actual, predicted)

# Plot the confusion matrix as an annotated heatmap
sns.heatmap(cm,
            annot=True,
            fmt='g',
            xticklabels=['Dog', 'Not Dog'],
            yticklabels=['Dog', 'Not Dog'])
plt.ylabel('Actual', fontsize=13)
plt.xlabel('Prediction', fontsize=13)
plt.title('Confusion Matrix', fontsize=17)
plt.show()
Output: an annotated heatmap of the 2x2 confusion matrix.
The hold-out method for training the machine learning models is a technique that
involves splitting the data into different sets: one set for training, and other sets
for validation and testing. The hold-out method is used to check how well a machine
learning model will perform on new data. In this post, you will learn about the hold-out method used during the process of training machine learning models.
When evaluating machine learning (ML) models, several questions arise. Is the model the best available from the model's hypothesis space in terms of generalization error on the unseen / future dataset? Was the model trained and tested using the most appropriate method? Out of the available models, which one should be selected? These questions are addressed using what is called the hold-out method.
Instead of using the entire dataset for training, separate sets called the validation set and the test set are set aside (hence the name 'hold-out'), and the model is trained only on what is termed the training dataset.
When the entire data is used for training the model using different algorithms, the problem
of evaluating the models and selecting the most optimal model remains. The primary task is
to find out which model out of all models has the lowest generalization error. In other
words, which model makes a better prediction on future or unseen datasets than all other
models. This is where the need to have some mechanism arises wherein the model is trained
on one data set, and, validated and tested on another dataset. This is where the hold-out
method comes into the picture.
Hold-out method for Model Evaluation
The hold-out method for model evaluation represents the mechanism of splitting the
dataset into training and test datasets. The model is trained on the training set and then
tested on the testing set to get the most optimal model. This approach is often used when
the data set is small and there is not enough data to split into three sets (training, validation,
and testing). This approach has the advantage of being simple to implement, but it can be
sensitive to how the data is divided into two sets. If the split is not random, then the results
may be biased. Overall, the hold-out method for model evaluation is a good starting point for training machine learning models, but it should be used with caution. The following describes the hold-out method for model evaluation.
This technique is well suited if the goal is to compare the models based on the model
accuracy on the test dataset and select the best model. However, there is always a
possibility that trying to use this technique can result in the model fitting well
to the test dataset. In other words, the models are trained to improve model accuracy on
the test dataset assuming that the test dataset represents the population. The test error,
thus, becomes an optimistically biased estimation of generalization error.
However, that is not desired. The final model fails to generalize well to the unseen or future
dataset as it is trained to fit well (or overfit) concerning the test data.
The following is the process of using the hold-out method for model evaluation:
o Split the dataset into two parts (typically a 70-30% split, though the percentage will vary).
o Train the model on the training dataset; while training the model, a fixed set of hyperparameters is selected.
o Test or evaluate the model on the held-out test dataset.
o Train the final model on the entire dataset to get a model which can generalize better on the unseen or future dataset.
Note that this process is used for model evaluation based on splitting the dataset into
training and test datasets and using a fixed set of hyperparameters. There is another
technique of splitting the data into three sets and using these three sets for model selection
or hyperparameters tuning. We will look at that technique in the next section.
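The following is a minimal sketch of this process with scikit-learn (the iris dataset and logistic regression stand in for your own data and model):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data as the test set (a 70-30% split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Train on the training set with a fixed set of hyperparameters
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {accuracy}")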
1. Original Dataset: Start from an original dataset containing N observations.
2. Sample Generation: Random subsampling involves generating multiple samples by randomly selecting observations from the original dataset. Each sample is created by randomly selecting observations with replacement, which means that the same observation can be chosen more than once for a particular sample.
3. Sample Size: The sample size for each subsample is typically the same as the original dataset, but it
can also be smaller or larger, depending on the requirements of the analysis. The sample size is denoted
by n, where n ≤ N.
4. Repetition: The process of generating subsamples is typically repeated multiple times to obtain a set
of subsamples. The number of repetitions is denoted by R.
5. Analysis: Each subsample can be used independently for analysis, such as training machine learning
models, estimating statistical parameters, assessing variability, or conducting hypothesis testing.
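A minimal sketch of the sample-generation loop with NumPy (N = 100, n = N, and R = 5 are illustrative values):

import numpy as np

rng = np.random.default_rng(0)
data = np.arange(100)   # original dataset, N = 100 observations
n, R = 100, 5           # subsample size and number of repetitions

for r in range(R):
    # Draw n observations with replacement: some are duplicated, some left out
    sample = rng.choice(data, size=n, replace=True)
    print(f"repetition {r}: {len(np.unique(sample))} unique observations")
# With n = N, only about 63% of the observations appear in each subsample on average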
2. Sample Size Flexibility: Random subsampling provides flexibility in choosing the sample size for each
subsample. Researchers can control the sample size based on computational constraints or statistical
requirements.
3. Robustness: Random subsampling helps to create robust estimates by incorporating random variations
and reducing the impact of outliers or extreme observations.
4. Model Evaluation: In machine learning, random subsampling is often used for model evaluation.
Multiple subsamples can be used to train and validate models, allowing for an estimation of model
performance on unseen data.
However, it's important to note that random subsampling does not guarantee that each observation will
be selected in each subsample. Due to the random selection process, some observations may be
excluded from certain subsamples, while others may be duplicated. The number of unique observations
in each subsample is expected to be lower than the total number of observations in the original dataset.
Overall, random subsampling is a useful technique for generating multiple samples from a dataset,
enabling robust analysis, variability assessment, and model evaluation.
Understanding AUC - ROC Curve
by Sarang Narkhede
The ROC curve is plotted with TPR against the FPR where TPR is on
the y-axis and FPR is on the x-axis.
[Figure: AUC - ROC curve]
The rates involved are defined as:

TPR (Sensitivity / Recall) = TP / (TP + FN)
Specificity = TN / (TN + FP)
FPR = 1 - Specificity = FP / (TN + FP)
This is an ideal situation. When the two class distributions don't overlap at all, the model has an ideal measure of separability (AUC = 1): it is perfectly able to distinguish between the positive class and the negative class.
This is the worst situation. When AUC is approximately 0.5, the model has no discrimination capacity to distinguish between the positive class and the negative class.
The ROC curve is created by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) at
various classification thresholds. The TPR is also known as sensitivity or recall, representing the
proportion of positive instances correctly classified as positive. The FPR is the ratio of negative instances
incorrectly classified as positive. Each point on the ROC curve corresponds to a different threshold
setting for classifying positive and negative instances.
The AUC-ROC is the area under this ROC curve, ranging from 0 to 1. It provides a measure of the
classifier's ability to distinguish between positive and negative instances across all possible threshold
settings. The higher the AUC-ROC value, the better the classifier's performance.
An AUC-ROC of 0.5 indicates that the classifier performs no better than random guessing, while an AUC-
ROC of 1 indicates a perfect classifier with a clear separation between positive and negative instances.
AUC-ROC values between 0.5 and 1 represent varying degrees of classification performance.
The AUC-ROC metric is particularly useful when dealing with imbalanced datasets or when the cost of
false positives and false negatives is not balanced. It is commonly used in medical diagnostics, credit
scoring, fraud detection, and many other applications where binary classification is essential.
In summary, AUC-ROC provides a concise summary of a binary classifier's performance across various
classification thresholds, enabling effective comparison and evaluation of different models.
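As a closing sketch, scikit-learn computes the ROC curve and the AUC-ROC directly from a model's predicted scores (the labels and scores below are illustrative):

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Illustrative true labels and predicted probabilities for the positive class
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5])

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # one point per threshold
print(roc_auc_score(y_true, y_score))               # area under the ROC curve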