Unit 4 DWDM
Ans. Accuracy is one metric for evaluating classification models. Informally, accuracy is the
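fraction of predictions our model got right. Formally, accuracy has the following definition:
Accuracy = Number of correct predictions / Total number of predictions
For binary classification, accuracy can also be calculated in terms of positives and negatives
as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.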
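The accuracy formula can be checked with a short Python sketch. The function names and the sample label lists below are illustrative assumptions, not part of the original notes.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP, FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1      # predicted positive, actually positive
            else:
                fp += 1      # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1      # predicted negative, actually positive
            else:
                tn += 1      # predicted negative, actually negative
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return (tp + tn) / total


# Hypothetical labels, for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)            # 3 3 1 1
print(accuracy(tp, tn, fp, fn))  # 0.75
```

Note the parentheses: without them, TP + TN / TP + TN + FP + FN would divide only TN by TP.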
Holdout Method
Random Subsampling
K-fold Cross-Validation
Bootstrap Methods
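As a rough illustration of one of the methods listed above, K-fold cross-validation, the sketch below splits n tuple indices into k folds so that each tuple is used for testing exactly once. The function name is hypothetical, and the split is done without shuffling for clarity; in practice the data is usually shuffled first.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    indices = list(range(n))
    # The first n % k folds get one extra element so that all n are covered.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("test:", test, "train:", train)
```

A model would be trained on each train set and evaluated on the matching test set, and the k accuracy values would then be averaged.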
5. What are the different types of data used in cluster analysis?
The types of variables used in cluster analysis are symmetric binary, asymmetric binary, nominal,
ordinal, interval, and ratio variables. Variables of these types taken together are called
mixed-type variables.
6. What is Bootstrap?
In data mining, the bootstrap is a resampling method for estimating the accuracy of a model.
It samples the given training tuples uniformly with replacement, so the same tuple can be
selected more than once; the tuples that are never selected form the test set. (This is
unrelated to the Bootstrap front-end framework of pre-designed HTML, CSS, and JavaScript
components used by web developers.)
7. Explain the concept of Bagging
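Bagging, also known as bootstrap aggregating, is an ensemble learning technique that trains
several models on bootstrap samples of the training data (drawn with replacement) and combines
their predictions, by majority vote for classification or by averaging for numeric prediction.
This reduces variance and typically improves the accuracy of machine learning algorithms.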
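A minimal sketch of bagging follows, under stated assumptions: the base learner is a toy nearest-centroid classifier on a single numeric attribute, and all names and data are hypothetical. Any base learner could be substituted.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


def train_centroid_classifier(sample):
    """Toy base learner: predict the label whose mean x value is nearest."""
    by_label = {}
    for x, label in sample:
        by_label.setdefault(label, []).append(x)
    centroids = {label: sum(xs) / len(xs) for label, xs in by_label.items()}

    def predict(x):
        return min(centroids, key=lambda label: abs(x - centroids[label]))

    return predict


def bagging_fit(data, n_models, seed=0):
    """Train one base learner per bootstrap sample."""
    rng = random.Random(seed)
    return [train_centroid_classifier(bootstrap_sample(data, rng))
            for _ in range(n_models)]


def bagging_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical training tuples: (attribute value, class label).
data = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
        (8.0, "high"), (8.5, "high"), (9.0, "high")]

models = bagging_fit(data, n_models=25)
print(bagging_predict(models, 1.2), bagging_predict(models, 8.8))
```

An odd number of models is used so that a two-class vote cannot tie.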
8. What is clustering?
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects
in the same group (called a cluster) are more similar (in some sense) to each other than to
those in other groups (clusters).
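To make the idea concrete, here is a small sketch of k-means, one common clustering algorithm (the notes above do not name a specific algorithm, so this choice is an assumption). It works on one-dimensional points and takes the initial centroids as an input; the names and data are hypothetical.

```python
def k_means_1d(points, centroids, max_iter=100):
    """Group 1-D points around the nearest centroid (Lloyd's algorithm)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]  # empty cluster keeps its centroid
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters


points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(clusters)
print(centroids)
```

Points within each resulting cluster are closer to one another than to points in the other cluster, which is the similarity criterion described above.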
9. What is classification?
Classification is a supervised machine learning method in which the model tries to predict the
correct label for given input data. In classification, the model is fully trained using the
training data, and then it is evaluated on test data before being used to perform prediction on
new unseen data.
10. Define Gini index.
Gini Index is a measure of the impurity (randomness) of the class values in a dataset. When a
decision tree is built, splits are chosen to reduce the Gini Index from the root node (at the
top of the decision tree) down to the leaf nodes (at the ends of the branches) of the decision
tree model.
But what is actually meant by ‘impurity’?
If all the elements belong to a single class, then the dataset can be called pure. For class
proportions p1, p2, ..., pk, the Gini Index is defined as
Gini = 1 - (p1^2 + p2^2 + ... + pk^2)
Its value lies between 0 and 1 - 1/k,
where,
'0' denotes that all elements belong to a certain class or there exists only one class (pure),
and
values approaching '1' denote that the elements are spread evenly across many classes (impure).
For two classes the maximum is '0.5', which denotes elements equally distributed between the
two classes.
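The following sketch computes the Gini Index using the standard definition, one minus the sum of squared class proportions. The function name and the sample label lists are illustrative assumptions.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini_index(["yes"] * 6))               # 0.0  (pure: one class only)
print(gini_index(["yes"] * 3 + ["no"] * 3))  # 0.5  (two classes, evenly split)
print(gini_index(["a", "b", "c", "d"]))      # 0.75 (four classes, evenly split)
```

Under this definition the maximum for k equally likely classes is 1 - 1/k, so the value gets close to 1 only when there are many classes.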
5 | Here, we utilised the labels for training data. | Here, we don’t prefer the labels for training data.
Classification Prediction
A prediction (Latin præ-, "before," and dicere, "to say"), or forecast, is a statement about a
future event or data. Predictions are often, but not always, based upon experience or knowledge. There
is no universal agreement about the exact difference from "estimation"; different authors and
disciplines ascribe different connotations.
The training dataset contains the inputs and their numerical output values. From the training
dataset, the algorithm generates a model or predictor. When new data is provided, the model
predicts a numerical output. Unlike classification, this approach does not predict a class label.
In most cases, regression is used to make predictions. For example: predicting the worth of a
home based on facts like the number of rooms, total area, and so on.
Consider the following scenario: A marketing manager needs to forecast how much a specific
consumer will spend during a sale. In this scenario, we need to forecast a numerical value, so
a model or predictor that forecasts a continuous or ordered value will be
built.
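As an illustration of numeric prediction, the sketch below fits a straight line by ordinary least squares and uses it to predict a value for new data. Simple linear regression is only one possible predictor, and the house data and function name are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: total area and price (in thousands).
areas = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(areas, prices)
predicted = slope * 100 + intercept   # predict the price of a 100-unit home
print(slope, intercept, predicted)    # 3.0 0.0 300.0
```

The output is a continuous value rather than a class label, which is what distinguishes prediction from classification here.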
Prediction Issues:
Preparing the data for prediction is the most pressing challenge. The following activities are involved
in data preparation:
Data Cleaning: Data cleaning involves reducing noise and treating missing values. Smoothing
techniques remove noise, and missing values are handled by replacing a missing value
with the most commonly occurring value for that attribute.
Relevance Analysis: Irrelevant attributes may also be present in the database. Correlation
analysis is used to determine whether two attributes are related.
Data Transformation and Reduction: Any of the methods listed below can be used to transform the
data.
Normalization: Normalization scales all values of a given attribute so that they lie within a
narrow range, such as 0 to 1. It is typically applied when the learning process uses neural
networks or methods that involve distance measurements.
Generalization: The data can also be transformed by generalizing it to higher-level concepts.
Concept hierarchies are used for this.
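The data preparation steps above can be sketched in Python as follows. This is a minimal illustration under assumptions: missing values are represented by None, relevance is measured with the Pearson correlation coefficient, and normalization is min-max scaling. All function names and sample values are hypothetical.

```python
from collections import Counter
from math import sqrt


def fill_missing_with_mode(values, missing=None):
    """Data cleaning: replace missing entries with the most common value."""
    present = [v for v in values if v is not missing]
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v is missing else v for v in values]


def pearson_correlation(xs, ys):
    """Relevance analysis: correlation between two numeric attributes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Normalization: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot rescale")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
            for v in values]


print(fill_missing_with_mode(["red", None, "blue", "red", None]))
# ['red', 'red', 'blue', 'red', 'red']
print(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(min_max_normalize([10, 20, 30, 50]))              # [0.0, 0.25, 0.5, 1.0]
```

A correlation close to 1 or -1 suggests that one of the two attributes is redundant, while a value near 0 suggests that they are unrelated.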
Data visualization is the representation of data through use of common graphics, such as
charts, plots, infographics, and even animations.
Tables: This consists of rows and columns used to compare variables. Tables can
show a great deal of information in a structured way, but they can also overwhelm
users that are simply looking for high-level trends.
Pie charts and stacked bar charts: These graphs are divided into sections that
represent parts of a whole. They provide a simple way to organize data and compare
the size of each component to one another.
Line charts and area charts: These visuals show change in one or more quantities
by plotting a series of data points over time and are frequently used within predictive
analytics. Line graphs utilize lines to demonstrate these changes while area charts
connect data points with line segments, stacking variables on top of one another and
using color to distinguish between variables.
Histograms: This graph plots a distribution of numbers using a bar chart (with no
spaces between the bars), representing the quantity of data that falls within a
particular range. This visual makes it easy for an end user to identify outliers within a
given dataset.
Scatter plots: These visuals are beneficial in revealing the relationship between two
variables, and they are commonly used within regression data analysis. However,
these can sometimes be confused with bubble charts, which are used to visualize three
variables via the x-axis, the y-axis, and the size of the bubble.
Heat maps: These graphical displays are helpful in visualizing
behavioral data by location. This can be a location on a map, or even a webpage.
Tree maps: These display hierarchical data as a set of nested shapes, typically
rectangles. Tree maps are great for comparing the proportions between categories via
their area size.
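As a small illustration of one of the chart types above, the sketch below computes the bin counts behind a histogram and prints them as a text chart. The function name, bin settings, and sample ages are hypothetical, and a plotting library would normally draw the bars.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall in each [edge, edge + bin_width) bin."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        edge = start + index * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical ages; 95 is far from the rest of the data.
ages = [21, 23, 25, 31, 34, 35, 38, 42, 95]

counts = histogram_counts(ages, bin_width=10, start=20)
for edge, count in counts.items():
    print(f"{edge:>3}-{edge + 9:<3} {'#' * count}")
```

Only non-empty bins are printed, so the isolated 90-99 bin stands out as an outlier, which is the use of histograms described above.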
Data visualization is the representation of information and data in a
pictorial or graphical format (for example, charts, graphs, and maps). Data
visualization tools provide an accessible way to see and understand trends, patterns,
and outliers in data. Data visualization tools and technologies are essential for
analyzing massive amounts of information and making data-driven decisions. The
concept of using pictures to understand data has been used for centuries.
General types of data visualization are charts, tables, graphs, maps, and dashboards.
Data visualization is critical to market research, where both numerical and
categorical data can be visualized. This increases the impact of
insights and reduces the risk of analysis paralysis.
Let’s now discuss some of the Advantages of Data Visualization.
1. Better Comparison: In business, we often need to compare the performance of two elements or two
scenarios. The conventional approach is to go through the massive data for both situations and
then analyze it.
2. A Better Method: Data visualization removes this difficulty by putting the data for both
perspectives into pictorial form. For instance, Google Trends helps us understand data
related to top searches or queries in pictorial or graphical form.
3. Easy Sharing of Information: With the visualization of data, organizations gain a new
means of communication. Rather than sharing cumbersome raw data, sharing
visual information engages the audience and conveys the information in a more digestible form.
4. Sales Analysis: With the help of data visualization, a salesperson can easily
understand the sales graph of products. With data visualization
tools like heat maps, they can understand the factors that are pushing
sales numbers up as well as the reasons that are bringing sales numbers down.
6. Finding Opportunities and Trends: With the huge amount of data available, business leaders
can explore the data in depth for the trends and opportunities around them.
Using data visualization, analysts can find patterns in the behavior of their
customers, paving the way to explore trends and opportunities for the business.
Can be misleading: While data visualization can help identify patterns and relationships in data, it can
also be misleading if not done correctly. Visualizations can create the impression of patterns or
trends that may not actually exist, leading to incorrect conclusions and poor decision-making.
Can be difficult to interpret: Some types of visualizations, such as those that involve 3D or interactive
elements, can be difficult to interpret and understand. This can lead to confusion and
misinterpretation of the data.
May not be suitable for all types of data: Certain types of data, such as text or audio data, may not
lend themselves well to visualization. In these cases, alternative methods of analysis may be more
appropriate.
May not be accessible to all users: Some users may have visual impairments or other disabilities that
make it difficult or impossible for them to interpret visualizations. In these cases,
alternative methods of presenting data may be necessary to ensure accessibility.