Unit 4 DWDM

1. What is the accuracy of a classifier?

Ans. Accuracy is one metric for evaluating classification models. Informally, accuracy is the
fraction of predictions our model got right. Formally, accuracy has the following definition:

Accuracy = Number of correct predictions / Total number of predictions

For binary classification, accuracy can also be calculated in terms of positives and negatives
as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.
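As a quick illustration, the formula can be computed directly from the four confusion-matrix counts; the counts below are made-up values, not taken from any real model.

```python
# Accuracy from confusion-matrix counts (illustrative values only)
TP, TN, FP, FN = 40, 45, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(f"Accuracy = {accuracy:.2f}")  # 85 correct out of 100 predictions -> 0.85
```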

2. What is cross-validation?


Cross-Validation
Cross-validation is a technique in which we train our model using a subset of the dataset and then evaluate it using the complementary subset of the dataset. The three steps involved in cross-validation are as follows:
Reserve some portion of the sample dataset.
Train the model using the rest of the dataset.
Test the model using the reserved portion of the dataset.
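A minimal sketch of k-fold cross-validation, assuming scikit-learn is available; the Iris dataset and the decision-tree model are purely illustrative choices.

```python
# k-fold cross-validation sketch: each fold reserves one portion for testing
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

scores = cross_val_score(model, X, y, cv=5)  # 5 train/test rounds
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```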
3. What is a Decision Tree?
A decision tree is a non-parametric supervised learning algorithm, which is utilized for both
classification and regression tasks. It has a hierarchical, tree structure, which consists of a
root node, branches, internal nodes and leaf nodes.
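The hierarchical structure can be seen by training a shallow tree and printing its learned rules; this sketch assumes scikit-learn, and the Iris dataset is used only for illustration.

```python
# Train a small decision tree and print its root, internal, and leaf nodes
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# export_text prints the learned if/else rules, one line per node
print(export_text(tree, feature_names=list(iris.feature_names)))
```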

4. How is the accuracy of a classifier measured?


Ans. The accuracy of a classifier is the number of correct predictions divided by the total number of instances, usually expressed as a percentage.

Methods to Find the Accuracy of a Classifier

 Holdout Method
 Random Subsampling
 K-fold Cross-Validation
 Bootstrap Methods
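For instance, the holdout method reserves a test split, trains on the remainder, and scores the held-out part. The sketch below assumes scikit-learn; the dataset and the 70/30 split ratio are illustrative choices.

```python
# Holdout method sketch: reserve a test split, train on the rest
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Holdout accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```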
5. What are the different types of data used in cluster analysis?
The variable types used in cluster analysis are symmetric binary, asymmetric binary, nominal, ordinal, interval-scaled, and ratio-scaled. When several of these types occur together in one dataset, they are called mixed-type variables.
6. What is Bootstrap?

Bootstrap is a resampling technique used to estimate the accuracy of a classifier. A training set of the same size as the original dataset is drawn by sampling instances uniformly with replacement, so some instances appear more than once while others are left out. The instances that were never selected (the out-of-bag instances) are used as the test set, and the procedure is repeated many times to obtain an overall accuracy estimate.
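A minimal sketch of bootstrap sampling using NumPy; the tiny integer "dataset" is only illustrative. Instances are drawn with replacement for training, and the instances never drawn serve as the test set.

```python
# Bootstrap sampling sketch: draw n instances with replacement
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)  # stand-in for a dataset of 10 instances

sample_idx = rng.choice(len(data), size=len(data), replace=True)
train = data[sample_idx]          # bootstrap sample (instances may repeat)
test = np.setdiff1d(data, train)  # out-of-bag instances, never drawn

print("Bootstrap sample:", train)
print("Out-of-bag test instances:", test)
```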
7. Explain the concept of Bagging
Bagging, also known as bootstrap aggregating, is an ensemble learning technique that helps to improve the performance and accuracy of machine learning algorithms. It trains several models on bootstrap samples of the training data and combines their predictions, typically by majority voting for classification or by averaging for regression.
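A short sketch using scikit-learn's BaggingClassifier, whose default base learner is a decision tree; the dataset and the number of estimators are illustrative choices, not prescribed by the notes.

```python
# Bagging sketch: many trees trained on bootstrap samples, combined by voting
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
bagging = BaggingClassifier(n_estimators=25, random_state=0)  # default base learner: decision tree
print("Bagged accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
```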
8. What is clustering?
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects
in the same group (called a cluster) are more similar (in some sense) to each other than to
those in other groups (clusters).
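As an illustration, k-means is one common clustering algorithm; the sketch below assumes scikit-learn and uses six invented two-dimensional points that fall into two natural groups.

```python
# k-means clustering sketch: grouping unlabeled points into 2 clusters
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],      # one natural group
              [10, 2], [10, 4], [10, 0]])  # another natural group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster labels:", kmeans.labels_)    # e.g. [1 1 1 0 0 0]
print("Cluster centers:", kmeans.cluster_centers_)
```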
9. What is classification?
Classification is a supervised machine learning method where the model tries to predict the correct label for given input data. In classification, the model is fully trained using the
training data, and then it is evaluated on test data before being used to perform prediction on
new unseen data.
10. Define Gini index.
The Gini index is a measure of the randomness or impurity in the values of a dataset. A decision tree algorithm aims to decrease the impurity from the root node (at the top of the decision tree) down to the leaf nodes of the decision tree model.
But what is actually meant by 'impurity'?
If all the elements of a node belong to a single class, the node can be called pure. The Gini index varies between 0 and 1,

where,
'0' denotes that all elements belong to a single class, i.e. there exists only one class (pure),
and
'1' denotes that the elements are randomly distributed across many classes (impure).

A Gini index of 0.5 denotes elements equally distributed between two classes.
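A small sketch, assuming the usual formula Gini = 1 - sum(p_i^2) over the class proportions p_i, which reproduces the 0, 0.5, and near-1 behaviour described above; the label lists are made up.

```python
# Gini index sketch: 1 minus the sum of squared class proportions
from collections import Counter

def gini_index(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_index(["A", "A", "A", "A"]))  # 0.0  -> pure node
print(gini_index(["A", "A", "B", "B"]))  # 0.5  -> two equally mixed classes
print(gini_index(["A", "B", "C", "D"]))  # 0.75 -> highly impure
```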

11. Define Information Gain.

Information gain is the reduction in entropy (or surprise) achieved by transforming a dataset, and it is calculated by comparing the entropy of the dataset before and after the transformation (for example, before and after splitting on an attribute).
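A short sketch of the calculation, assuming the standard definitions (entropy in bits; information gain as the parent's entropy minus the weighted entropy of the children); the labels and the candidate split are invented.

```python
# Information gain sketch: entropy before a split minus weighted entropy after
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5                        # entropy = 1.0
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]  # a candidate split
print("Information gain:", round(information_gain(parent, children), 3))
```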
12. Difference between Classification and Clustering

S.No. | Classification | Clustering
1 | It classifies input instances on the basis of known class labels. | It groups instances on the basis of their resemblance, without class labels.
2 | Classification is a type of supervised learning method. | Clustering is a kind of unsupervised learning method.
3 | It requires a labelled training dataset. | It does not require a training dataset.
4 | Classification is more complex compared to clustering. | Clustering is less complex compared to classification.
5 | Here, labels are used for the training data. | Here, labels are not used for the training data.
13. Difference between Classification and Prediction

Classification | Prediction
Classification is the process of identifying which category a new observation belongs to, based on a training dataset containing observations whose category membership is known. | Prediction is the process of identifying the missing or unavailable numerical data for a new observation.
In classification, the accuracy depends on finding the class label correctly. | In prediction, the accuracy depends on how well a given predictor can guess the value of a predicted attribute for new data.
In classification, the model can be known as the classifier. | In prediction, the model can be known as the predictor.
A model or classifier is constructed to find the categorical labels. | A model or predictor is constructed that predicts a continuous-valued function or ordered value.
For example, grouping patients based on their medical records can be considered classification. | For example, we can think of prediction as predicting the correct treatment for a particular disease for a person.
14. What is Prediction? Discuss any prediction technique in brief.

A prediction (Latin præ-, "before," and dicere, "to say"), or forecast, is a statement about a
future event or data. They are often, but not always, based upon experience or knowledge. There
is no universal agreement about the exact difference from "estimation"; different authors and
disciplines ascribe different connotations.

In prediction, the training dataset contains the inputs together with numerical output values. From this training dataset, the algorithm builds a model or predictor. When fresh data is provided, the model should produce a numerical output. Unlike classification, this approach does not use a class label.

In most cases, regression is utilized to make predictions. For example: Predicting the worth of a
home based on facts like the number of rooms, total area, and so on.

Consider the following scenario: a marketing manager needs to forecast how much a specific consumer will spend during a sale. Here we are asked to forecast a numerical value, so a model or predictor that forecasts a continuous-valued or ordered function will be built.
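As an illustrative sketch of such a predictor, the house data below is invented and scikit-learn's LinearRegression is just one possible choice of regression model.

```python
# Regression-based prediction sketch with made-up house data
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [number of rooms, total area in square metres]; target: price
X = np.array([[2, 60], [3, 80], [3, 95], [4, 120], [5, 150]])
y = np.array([120_000, 160_000, 175_000, 230_000, 290_000])

model = LinearRegression().fit(X, y)
new_house = np.array([[4, 110]])               # a previously unseen observation
print("Predicted price:", model.predict(new_house)[0])
```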
Prediction Issues:

Preparing the data for prediction is the most pressing challenge. The following activities are involved
in data preparation:

Data Cleaning: Data cleaning includes reducing noise and treating missing values. Smoothing techniques remove noise, and missing values can be handled by replacing them with the most frequently occurring value for that attribute.
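A small pandas sketch of that missing-value treatment on an invented two-column table; replacing with the most frequent value is the approach described above, though other strategies (mean, median) are also common.

```python
# Fill missing values with the most frequently occurring value per column
import pandas as pd

df = pd.DataFrame({"colour": ["red", "blue", None, "red", "red"],
                   "size": [10, 12, 11, None, 10]})

for col in df.columns:
    df[col] = df[col].fillna(df[col].mode()[0])  # mode() gives the most frequent value

print(df)
```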

Relevance Analysis: Irrelevant attributes may also be present in the database. Correlation analysis is used to determine whether two attributes are related, so that irrelevant attributes can be removed.

Data Transformation and Reduction: Any of the methods listed below can be used to transform the
data.

Normalization: Normalization is the process of scaling all values of a given attribute so that they fall within a small, specified range. Normalization is performed when neural networks or methods involving distance measurements are used in the learning process.
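For example, min-max normalization rescales each attribute into the range [0, 1]; the sketch below assumes scikit-learn's MinMaxScaler and uses invented values.

```python
# Min-max normalization sketch: scale each attribute into [0, 1]
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 600.0]])

X_scaled = MinMaxScaler().fit_transform(X)
print(X_scaled)  # every column now lies between 0 and 1
```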

Generalization: The data can also be transformed by generalizing it to higher-level concepts, using concept hierarchies.

15. Define Data Visualization.

Data visualization is the representation of data through the use of common graphics, such as charts, plots, infographics, and even animations.

Dashboards include common visualization techniques, such as:

 Tables: This consists of rows and columns used to compare variables. Tables can
show a great deal of information in a structured way, but they can also overwhelm
users that are simply looking for high-level trends.
 Pie charts and stacked bar charts: These graphs are divided into sections that represent parts of a whole. They provide a simple way to organize data and compare the size of each component to one another.
 Line charts and area charts: These visuals show change in one or more quantities
by plotting a series of data points over time and are frequently used within predictive
analytics. Line graphs utilize lines to demonstrate these changes while area charts
connect data points with line segments, stacking variables on top of one another and
using color to distinguish between variables.
 Histograms: This graph plots a distribution of numbers using a bar chart (with no
spaces between the bars), representing the quantity of data that falls within a
particular range. This visual makes it easy for an end user to identify outliers within a
given dataset.
 Scatter plots: These visuals are beneficial in revealing the relationship between two
variables, and they are commonly used within regression data analysis. However,
these can sometimes be confused with bubble charts, which are used to visualize three
variables via the x-axis, the y-axis, and the size of the bubble.
 Heat maps: These graphical representations are helpful in visualizing behavioral data by location. This can be a location on a map, or even a webpage.
 Tree maps: These display hierarchical data as a set of nested shapes, typically rectangles. Tree maps are great for comparing the proportions between categories via their area size.
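As an illustration, two of the chart types listed above can be produced with a few lines of matplotlib on synthetic, randomly generated data.

```python
# Quick matplotlib sketch of a histogram and a scatter plot (synthetic data)
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=500)    # synthetic measurements

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(values, bins=20)                          # histogram: distribution of values
ax1.set_title("Histogram")

x = rng.uniform(0, 10, 100)
ax2.scatter(x, 2 * x + rng.normal(0, 2, 100))      # scatter: relationship of two variables
ax2.set_title("Scatter plot")

plt.tight_layout()
plt.show()
```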
Data visualization is the graphical representation of information and data in a pictorial or graphical format (for example, charts, graphs, and maps). Data visualization tools provide an accessible way to see and understand trends, patterns in data, and outliers. Data visualization tools and technologies are essential for analyzing massive amounts of information and making data-driven decisions. The concept of using pictures to understand data has been used for centuries. General types of data visualization are charts, tables, graphs, maps, and dashboards.

Categories of Data Visualization

Data visualization is very important in market research, where both numerical and categorical data can be visualized; this increases the impact of insights and helps reduce the risk of analysis paralysis. Data visualization is therefore commonly categorized by the kind of data being presented, such as numerical or categorical data.
Let's now discuss some of the advantages of data visualization.

Advantages of Data Visualization

1. Better Comparison: In business it often happens that we need to compare the performance of two components or two scenarios. The conventional approach is to go through the massive data of both situations and then analyze it.

2. A Superior Method: Data visualization solves this difficulty by putting the information from both perspectives into pictorial form. For instance, Google Trends helps us understand data about top searches or queries in pictorial or graphical form.

3. Simple Sharing of Information: With visualization, organizations gain a new way of communicating. Rather than sharing cumbersome raw data, sharing the visual form is more engaging and conveys information that is easier to absorb.

4. Sales Analysis: With the help of data visualization, a salesperson can easily understand the sales chart of products. With visualization tools such as heat maps, they can understand the causes pushing the sales numbers up as well as the reasons dragging them down.

5. Discovering Relations Between Events: A business is influenced by many factors. Finding relationships between these factors or events helps decision-makers understand the issues related to their business.

6. Exploring Opportunities and Trends: With the huge amounts of data available, business leaders can explore the data in depth with regard to the trends and opportunities around them. Using data visualization, analysts can find patterns in the behaviour of their customers, paving the way to explore trends and opportunities for the business.

Disadvantages of data visualization


Can be time-consuming: Creating visualizations can be a time-consuming process, especially when
dealing with large and complex datasets. This can slow down the machine learning workflow and
reduce productivity.

Can be misleading: While data visualization can help identify patterns and relationships in data, it can
also be misleading if not done correctly. Visualizations can create the impression of patterns or
trends that may not actually exist, leading to incorrect conclusions and poor decision-making.

Can be difficult to interpret: Some types of visualizations, such as those that involve 3D or interactive
elements, can be difficult to interpret and understand. This can lead to confusion and
misinterpretation of the data.

May not be suitable for all types of data: Certain types of data, such as text or audio data, may not
lend themselves well to visualization. In these cases, alternative methods of analysis may be more
appropriate.

May not be accessible to all users: Some users may have visual impairments or other disabilities that
make it difficult or impossible for them to interpret visualizations. In these cases,
alternative methods of presenting data may be necessary to ensure accessibility.
