Unit 4 DWDM

1. What is the accuracy of a classifier?

Ans. Accuracy is one metric for evaluating classification models. Informally, accuracy is the
fraction of predictions our model got right. Formally, accuracy has the following definition:

Accuracy = Number of correct predictions / Total number of predictions

For binary classification, accuracy can also be calculated in terms of positives and negatives
as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.
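As a quick illustration, the formula can be computed directly from the four confusion-matrix counts; the counts below are made-up values, not taken from any real model.

```python
# Accuracy from confusion-matrix counts (illustrative values only)
TP, TN, FP, FN = 40, 45, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(f"Accuracy = {accuracy:.2f}")  # 85 correct out of 100 predictions -> 0.85
```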

2. What is cross-validation?


Cross-Validation
Cross-validation is a technique in which we train our model using a subset of the dataset and then evaluate it using the complementary subset of the dataset. The three steps involved in cross-validation are as follows:
Reserve some portion of the sample dataset.
Train the model using the rest of the dataset.
Test the model using the reserved portion of the dataset.
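A minimal sketch of k-fold cross-validation, assuming scikit-learn is available; the Iris dataset and the decision-tree model are purely illustrative choices.

```python
# k-fold cross-validation sketch: each fold reserves one portion for testing
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

scores = cross_val_score(model, X, y, cv=5)  # 5 train/test rounds
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```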
3. What is a Decision Tree?
A decision tree is a non-parametric supervised learning algorithm, which is utilized for both
classification and regression tasks. It has a hierarchical, tree structure, which consists of a
root node, branches, internal nodes and leaf nodes.
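The hierarchical structure can be seen by training a shallow tree and printing its learned rules; this sketch assumes scikit-learn, and the Iris dataset is used only for illustration.

```python
# Train a small decision tree and print its root, internal, and leaf nodes
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# export_text prints the learned if/else rules, one line per node
print(export_text(tree, feature_names=list(iris.feature_names)))
```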

4. How is the accuracy of a classifier measured?


Ans. The accuracy of a classifier is the number of correct predictions divided by the total number of instances, usually expressed as a percentage.

Methods to Find the Accuracy of a Classifier

 Holdout Method
 Random Subsampling
 K-fold Cross-Validation
 Bootstrap Methods
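For instance, the holdout method reserves a test split, trains on the remainder, and scores the held-out part. The sketch below assumes scikit-learn; the dataset and the 70/30 split ratio are illustrative choices.

```python
# Holdout method sketch: reserve a test split, train on the rest
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Holdout accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```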
5. What are the different types of data used in cluster analysis?
The variable types used in cluster analysis are symmetric binary, asymmetric binary, nominal, ordinal, interval-scaled, and ratio-scaled. When several of these types occur together in one dataset, they are called mixed-type variables.
6. What is Bootstrap?

Bootstrap is a resampling technique used to estimate the accuracy of a classifier. A training set of the same size as the original dataset is drawn by sampling instances uniformly with replacement, so some instances appear more than once while others are left out. The instances that were never selected (the out-of-bag instances) are used as the test set, and the procedure is repeated many times to obtain an overall accuracy estimate.
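A minimal sketch of bootstrap sampling using NumPy; the tiny integer "dataset" is only illustrative. Instances are drawn with replacement for training, and the instances never drawn serve as the test set.

```python
# Bootstrap sampling sketch: draw n instances with replacement
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)  # stand-in for a dataset of 10 instances

sample_idx = rng.choice(len(data), size=len(data), replace=True)
train = data[sample_idx]          # bootstrap sample (instances may repeat)
test = np.setdiff1d(data, train)  # out-of-bag instances, never drawn

print("Bootstrap sample:", train)
print("Out-of-bag test instances:", test)
```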
7. Explain the concept of Bagging
Bagging, also known as bootstrap aggregating, is an ensemble learning technique that helps to improve the performance and accuracy of machine learning algorithms. It trains several models on bootstrap samples of the training data and combines their predictions, typically by majority voting for classification or by averaging for regression.
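A short sketch using scikit-learn's BaggingClassifier, whose default base learner is a decision tree; the dataset and the number of estimators are illustrative choices, not prescribed by the notes.

```python
# Bagging sketch: many trees trained on bootstrap samples, combined by voting
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
bagging = BaggingClassifier(n_estimators=25, random_state=0)  # default base learner: decision tree
print("Bagged accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
```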
8. What is clustering?
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects
in the same group (called a cluster) are more similar (in some sense) to each other than to
those in other groups (clusters).
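As an illustration, k-means is one common clustering algorithm; the sketch below assumes scikit-learn and uses six invented two-dimensional points that fall into two natural groups.

```python
# k-means clustering sketch: grouping unlabeled points into 2 clusters
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],      # one natural group
              [10, 2], [10, 4], [10, 0]])  # another natural group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster labels:", kmeans.labels_)    # e.g. [1 1 1 0 0 0]
print("Cluster centers:", kmeans.cluster_centers_)
```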
9. What is classification?
Classification is a supervised machine learning method where the model tries to predict the correct label for given input data. In classification, the model is fully trained using the
training data, and then it is evaluated on test data before being used to perform prediction on
new unseen data.
10. Define Gini index.
The Gini index is a measure of the randomness or impurity in the values of a dataset. A decision tree algorithm aims to decrease the impurity from the root node (at the top of the decision tree) down to the leaf nodes of the decision tree model.
But what is actually meant by 'impurity'?
If all the elements of a node belong to a single class, the node can be called pure. The Gini index varies between 0 and 1,

where,
'0' denotes that all elements belong to a single class, i.e. there exists only one class (pure),
and
'1' denotes that the elements are randomly distributed across many classes (impure).

A Gini index of 0.5 denotes elements equally distributed between two classes.
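A small sketch, assuming the usual formula Gini = 1 - sum(p_i^2) over the class proportions p_i, which reproduces the 0, 0.5, and near-1 behaviour described above; the label lists are made up.

```python
# Gini index sketch: 1 minus the sum of squared class proportions
from collections import Counter

def gini_index(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_index(["A", "A", "A", "A"]))  # 0.0  -> pure node
print(gini_index(["A", "A", "B", "B"]))  # 0.5  -> two equally mixed classes
print(gini_index(["A", "B", "C", "D"]))  # 0.75 -> highly impure
```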

11. Define Information Gain.

Information gain is the reduction in entropy (or surprise) achieved by transforming a dataset, and it is calculated by comparing the entropy of the dataset before and after the transformation (for example, before and after splitting on an attribute).
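A short sketch of the calculation, assuming the standard definitions (entropy in bits; information gain as the parent's entropy minus the weighted entropy of the children); the labels and the candidate split are invented.

```python
# Information gain sketch: entropy before a split minus weighted entropy after
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5                        # entropy = 1.0
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]  # a candidate split
print("Information gain:", round(information_gain(parent, children), 3))
```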
12. Difference between Classification and Clustering

S.No. | Classification | Clustering
1 | It classifies input instances on the basis of known class labels. | It groups instances on the basis of their resemblance, without class labels.
2 | Classification is a type of supervised learning method. | Clustering is a kind of unsupervised learning method.
3 | It requires a labelled training dataset. | It does not require a training dataset.
4 | Classification is more complex compared to clustering. | Clustering is less complex compared to classification.
5 | Here, labels are used for the training data. | Here, labels are not used for the training data.
13. Difference between Classification and Prediction

Classification | Prediction
Classification is the process of identifying which category a new observation belongs to, based on a training dataset containing observations whose category membership is known. | Prediction is the process of identifying the missing or unavailable numerical data for a new observation.
In classification, the accuracy depends on finding the class label correctly. | In prediction, the accuracy depends on how well a given predictor can guess the value of a predicted attribute for new data.
In classification, the model can be known as the classifier. | In prediction, the model can be known as the predictor.
A model or classifier is constructed to find the categorical labels. | A model or predictor is constructed that predicts a continuous-valued function or ordered value.
For example, grouping patients based on their medical records can be considered classification. | For example, we can think of prediction as predicting the correct treatment for a particular disease for a person.
14. What is Prediction? Discuss any prediction technique in brief.

A prediction (Latin præ-, "before," and dicere, "to say"), or forecast, is a statement about a
future event or data. They are often, but not always, based upon experience or knowledge. There
is no universal agreement about the exact difference from "estimation"; different authors and
disciplines ascribe different connotations.

In prediction, the training dataset contains the inputs together with numerical output values. From this training dataset, the algorithm builds a model or predictor. When fresh data is provided, the model should produce a numerical output. Unlike classification, this approach does not use a class label.

In most cases, regression is utilized to make predictions. For example: Predicting the worth of a
home based on facts like the number of rooms, total area, and so on.

Consider the following scenario: a marketing manager needs to forecast how much a specific consumer will spend during a sale. Here we are asked to forecast a numerical value, so a model or predictor that forecasts a continuous-valued or ordered function will be built.
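As an illustrative sketch of such a predictor, the house data below is invented and scikit-learn's LinearRegression is just one possible choice of regression model.

```python
# Regression-based prediction sketch with made-up house data
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [number of rooms, total area in square metres]; target: price
X = np.array([[2, 60], [3, 80], [3, 95], [4, 120], [5, 150]])
y = np.array([120_000, 160_000, 175_000, 230_000, 290_000])

model = LinearRegression().fit(X, y)
new_house = np.array([[4, 110]])               # a previously unseen observation
print("Predicted price:", model.predict(new_house)[0])
```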
Prediction Issues:

Preparing the data for prediction is the most pressing challenge. The following activities are involved
in data preparation:

Data Cleaning: Data cleaning includes reducing noise and treating missing values. Smoothing techniques remove noise, and missing values can be handled by replacing them with the most frequently occurring value for that attribute.
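A small pandas sketch of that missing-value treatment on an invented two-column table; replacing with the most frequent value is the approach described above, though other strategies (mean, median) are also common.

```python
# Fill missing values with the most frequently occurring value per column
import pandas as pd

df = pd.DataFrame({"colour": ["red", "blue", None, "red", "red"],
                   "size": [10, 12, 11, None, 10]})

for col in df.columns:
    df[col] = df[col].fillna(df[col].mode()[0])  # mode() gives the most frequent value

print(df)
```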

Relevance Analysis: Irrelevant attributes may also be present in the database. Correlation analysis is used to determine whether two attributes are related, so that irrelevant attributes can be removed.

Data Transformation and Reduction: Any of the methods listed below can be used to transform the
data.

Normalization: Normalization is the process of scaling all values of a given attribute so that they fall within a small, specified range. Normalization is performed when neural networks or methods involving distance measurements are used in the learning process.
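For example, min-max normalization rescales each attribute into the range [0, 1]; the sketch below assumes scikit-learn's MinMaxScaler and uses invented values.

```python
# Min-max normalization sketch: scale each attribute into [0, 1]
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 600.0]])

X_scaled = MinMaxScaler().fit_transform(X)
print(X_scaled)  # every column now lies between 0 and 1
```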

Generalization: The data can also be transformed by generalizing it to higher-level concepts, using concept hierarchies.

15. Define Data Visualization.

Data visualization is the representation of data through the use of common graphics, such as charts, plots, infographics, and even animations.

Dashboards include common visualization techniques, such as:

 Tables: This consists of rows and columns used to compare variables. Tables can
show a great deal of information in a structured way, but they can also overwhelm
users that are simply looking for high-level trends.
 Pie charts and stacked bar charts: These graphs are divided into sections that represent parts of a whole. They provide a simple way to organize data and compare the size of each component to one another.
 Line charts and area charts: These visuals show change in one or more quantities
by plotting a series of data points over time and are frequently used within predictive
analytics. Line graphs utilize lines to demonstrate these changes while area charts
connect data points with line segments, stacking variables on top of one another and
using color to distinguish between variables.
 Histograms: This graph plots a distribution of numbers using a bar chart (with no
spaces between the bars), representing the quantity of data that falls within a
particular range. This visual makes it easy for an end user to identify outliers within a
given dataset.
 Scatter plots: These visuals are beneficial in revealing the relationship between two
variables, and they are commonly used within regression data analysis. However,
these can sometimes be confused with bubble charts, which are used to visualize three
variables via the x-axis, the y-axis, and the size of the bubble.
 Heat maps: These graphical representations are helpful in visualizing behavioral data by location. This can be a location on a map, or even a webpage.
 Tree maps: These display hierarchical data as a set of nested shapes, typically rectangles. Tree maps are great for comparing the proportions between categories via their area size.
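As an illustration, two of the chart types listed above can be produced with a few lines of matplotlib on synthetic, randomly generated data.

```python
# Quick matplotlib sketch of a histogram and a scatter plot (synthetic data)
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=500)    # synthetic measurements

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(values, bins=20)                          # histogram: distribution of values
ax1.set_title("Histogram")

x = rng.uniform(0, 10, 100)
ax2.scatter(x, 2 * x + rng.normal(0, 2, 100))      # scatter: relationship of two variables
ax2.set_title("Scatter plot")

plt.tight_layout()
plt.show()
```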
Data visualization is the graphical representation of information and data in a pictorial or graphical format (for example, charts, graphs, and maps). Data visualization tools provide an accessible way to see and understand trends, patterns in data, and outliers. Data visualization tools and technologies are essential for analyzing massive amounts of information and making data-driven decisions. The concept of using pictures to understand data has been used for centuries. General types of data visualization are charts, tables, graphs, maps, and dashboards.

Categories of Data Visualization

Data visualization is very important in market research, where both numerical and categorical data can be visualized; this increases the impact of insights and helps reduce the risk of analysis paralysis. Data visualization is therefore commonly categorized by the kind of data being presented, such as numerical or categorical data.
Let's now discuss some of the advantages of data visualization.

Advantages of Data Visualization

1. Better Comparison: In business it often happens that we need to compare the performance of two components or two scenarios. The conventional approach is to go through the massive data of both situations and then analyze it.

2. A Superior Method: Data visualization solves this difficulty by putting the information from both perspectives into pictorial form. For instance, Google Trends helps us understand data about top searches or queries in pictorial or graphical form.

3. Simple Sharing of Information: With visualization, organizations gain a new way of communicating. Rather than sharing cumbersome raw data, sharing the visual form is more engaging and conveys information that is easier to absorb.

4. Sales Analysis: With the help of data visualization, a salesperson can easily understand the sales chart of products. With visualization tools such as heat maps, they can understand the causes pushing the sales numbers up as well as the reasons dragging them down.

5. Discovering Relations Between Events: A business is influenced by many factors. Finding relationships between these factors or events helps decision-makers understand the issues related to their business.

6. Exploring Opportunities and Trends: With the huge amounts of data available, business leaders can explore the data in depth with regard to the trends and opportunities around them. Using data visualization, analysts can find patterns in the behaviour of their customers, paving the way to explore trends and opportunities for the business.

Disadvantages of data visualization


Can be time-consuming: Creating visualizations can be a time-consuming process, especially when
dealing with large and complex datasets. This can slow down the machine learning workflow and
reduce productivity.

Can be misleading: While data visualization can help identify patterns and relationships in data, it can
also be misleading if not done correctly. Visualizations can create the impression of patterns or
trends that may not actually exist, leading to incorrect conclusions and poor decision-making.

Can be difficult to interpret: Some types of visualizations, such as those that involve 3D or interactive
elements, can be difficult to interpret and understand. This can lead to confusion and
misinterpretation of the data.

May not be suitable for all types of data: Certain types of data, such as text or audio data, may not
lend themselves well to visualization. In these cases, alternative methods of analysis may be more
appropriate.

May not be accessible to all users: Some users may have visual impairments or other disabilities that
make it difficult or impossible for them to interpret visualizations. In these cases,
alternative methods of presenting data may be necessary to ensure accessibility.
