
Data Visualization

Dr. B. SUJATHA
ASSOCIATE PROFESSOR
UNIT 1 - INTRODUCTION TO DATA VISUALIZATION
Data is a collection of raw facts such as numbers, text, sound, images, or content in any other format.
Data classification is the process of organizing data into groups based on shared characteristics.

The main objectives of Classification of Data are as follows:

• Explain similarities and differences within the data
• Simplify and condense the mass of data
• Facilitate comparisons
• Study relationships between characteristics
• Prepare data for tabular presentation
• Present a mental picture of the data
Basis of Classification of Data
The classification of statistical data is done after considering the scope, nature, and purpose of an investigation, generally on four bases: geographical location, chronology, qualitative characteristics, and quantitative characteristics.
1. Geographical Classification
The classification of data on the basis of geographical location or
region is known as Geographical or Spatial Classification. For
example, presenting the population of different states of a country is
done on the basis of geographical location or region.
2. Chronological Classification
The classification of data with respect to different time
periods is known as Chronological or Temporal
Classification. For example, the number of students in a
school in different years can be presented on the basis of a
time period.
3. Qualitative Classification
The classification of data on the basis of descriptive or qualitative characteristics like region, caste, sex, gender, education, etc., is known as Qualitative Classification. A qualitative classification cannot be quantified and can be of two types: Simple Classification and Manifold Classification.
Simple Classification
When based on only one attribute, the given data is classified into two classes, which is
known as Simple Classification. For example, when the population is divided into literate
and illiterate, it is a simple classification.

Manifold Classification
When based on more than one attribute, the given data is classified into different classes,
and then sub-divided into more sub-classes, which is known as Manifold Classification. For
example, when the population is divided into literate and illiterate, then sub-divided into
male and female, and further sub-divided into married and unmarried, it is a manifold
classification.
4. Quantitative Classification
The classification of data on the basis of the characteristics,
such as age, height, weight, income, etc., that can be
measured in quantity is known as Quantitative
Classification. For example, the weight of students in a
class can be classified as quantitative classification.
Data Collection refers to the systematic process of
gathering, measuring, and analyzing data from various sources to
get a complete and accurate picture of an area of interest.
Primary data refers to information collected directly from first-
hand sources specifically for a particular research purpose. This type of
data is gathered through various methods, including surveys, interviews,
experiments, observations, and focus groups. One of the main
advantages of primary data is that it provides current, relevant, and
specific information tailored to the researcher’s needs, offering a high
level of accuracy and control over data quality.
Secondary data refers to information that has already been
collected, processed, and published by others. This type of data can be
sourced from existing research papers, government reports, books,
statistical databases, and company records. The advantage of secondary
data is that it is readily available and often free or less expensive to
obtain compared to primary data. It saves time and resources since the
data collection phase has already been completed.
What is Data Visualization and Why is It Important in Analyzing Data?

Data visualization is the graphical representation of information and
data. By using visual elements like charts, graphs, and maps, data
visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data. Data visualization translates
complex data sets into visual formats that are easier for the
human brain to comprehend. This can include a variety of visual
tools such as:
• Charts: Bar charts, line charts, pie charts, etc.
• Graphs: Scatter plots, histograms, etc.
• Maps: Geographic maps, heat maps, etc.
• Dashboards: Interactive platforms that combine
multiple visualizations.
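As a minimal sketch of the first two categories, the snippet below draws a bar chart and a line chart with matplotlib; the library choice and the monthly sales figures are assumptions for illustration, not taken from the slides.

# Minimal sketch: a bar chart and a line chart with matplotlib.
# The sales figures below are made-up illustrative values.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 150, 90, 180]          # hypothetical monthly sales

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.bar(months, sales, color="steelblue")   # bar chart: compare categories
ax1.set_title("Monthly sales (bar)")

ax2.plot(months, sales, marker="o")         # line chart: show a trend over time
ax2.set_title("Monthly sales (line)")

plt.tight_layout()
plt.show()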
The Role of Data Visualization in Decision Making
Data visualization plays an integral role in the decision-
making process, as it helps stakeholders understand trends,
patterns, relationships, and outliers within data. By presenting data
in an easily digestible format, decision-makers can grasp the
implications of the information, leading to more informed choices
and better outcomes.
Furthermore, effective data visualization can foster
collaboration and facilitate communication between team members
by presenting information in a universally understandable manner.
For example, a sales team might use a data visualization tool to track
their progress toward their monthly targets. By presenting this
information in a clear and concise manner, the team can identify
areas where they need to improve and take action accordingly. This
can lead to increased sales, higher revenue, and better overall
performance.
Examples of data visualizations
1. Traditional visuals: Time-tested data visualization tools like
charts (bar, line, pie), graphs (scatter plots, histograms) and
maps remain incredibly powerful for conveying information
quickly and clearly.
2. Infographics: Combine visuals, text and data to present
complex information in a compelling and easy-to-follow
way.
3. Data dashboards: Interactive dashboards consolidate real-
time key performance indicators (KPIs), providing an at-a-
glance overview of business health.
4. Advanced visual techniques: Techniques like heatmaps,
network diagrams and treemaps are used to visualize
complex relationships or hierarchical data.
Benefits of Effective Data Visualization

Effective data visualization offers several benefits, such as:

• Improved comprehension of complex data
• Increased ability to identify trends and patterns
• Enhanced decision-making and problem-solving
capabilities
• Streamlined communication, collaboration, and sharing
of insights
• Reduced time, effort, and resources required to interpret
data
Data transformation techniques
Data transformation techniques refer to all the actions that help you transform your raw data into a clean, ready-to-use dataset. The process of data transformation involves converting, cleansing, and structuring data into a usable format that can be analyzed to support decision-making.
It includes modifying the format, organization, or values of data to prepare it for consumption by an application or for analysis. This crucial process is undertaken by organizations seeking to leverage their data for timely business insights, ensuring that the information is accessible, consistent, safe, and ultimately accepted by the targeted business users.
Different types of data transformation techniques
Data Smoothing:
Problem solved: Smoothing removes noise and fluctuations from data, making it
easier to analyze and interpret.
Use case scenarios: Smoothing can be useful in scenarios where the data is noisy or
contains fluctuations that obscure the underlying patterns.
How it works: Techniques include moving averages, exponential smoothing, and
kernel smoothing. The goal is to reduce noise and fluctuations in the data, making it
easier to analyze and interpret.
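A minimal sketch of smoothing, assuming pandas and a synthetic noisy series (neither is specified in the slides): a 7-point moving average and exponential smoothing.

# Minimal sketch: smoothing a noisy series with a moving average
# and exponential smoothing (pandas rolling/ewm). The data is synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(100)
noisy = pd.Series(np.sin(t / 10) + rng.normal(scale=0.3, size=t.size))

moving_avg = noisy.rolling(window=7, center=True).mean()  # 7-point moving average
exp_smooth = noisy.ewm(span=7).mean()                     # exponential smoothing

print(moving_avg.tail())
print(exp_smooth.tail())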
Attribute Construction (Feature Engineering):
Problem solved: Attribute construction creates new features or modifies existing
ones to improve the performance of machine learning models.
Use case scenarios: Feature engineering can be useful in various scenarios, such as
combining or aggregating features to capture higher-level patterns, applying
mathematical transformations (e.g., log, square root) to address skewed
distributions, or extracting new information from existing features (e.g., creating a
day of the week from a timestamp).
How it works: Feature engineering can be accomplished through various methods,
such as mathematical transformations, aggregation, binning, and dimensionality
reduction techniques. The goal is to create new data attributes that are more
representative of the underlying patterns in the data and that help to improve the
performance of the machine learning model.
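A minimal sketch of attribute construction, assuming pandas and hypothetical column names (income, signup_time): a log transform for a skewed column, a day-of-week feature extracted from a timestamp, and a binned income band.

# Minimal sketch: attribute construction with pandas -- a log transform for a
# skewed column and a day-of-week feature from a timestamp.
# The column names and values are made up for illustration.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [25_000, 40_000, 1_200_000],                 # heavily skewed values
    "signup_time": pd.to_datetime(
        ["2024-01-05 10:00", "2024-01-06 14:30", "2024-01-07 09:15"]),
})

df["log_income"] = np.log1p(df["income"])                  # compress the skewed scale
df["signup_dow"] = df["signup_time"].dt.day_name()         # e.g. 'Friday'
df["income_band"] = pd.cut(df["income"], bins=[0, 50_000, 2_000_000],
                           labels=["low", "high"])         # binning into bands
print(df)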
Generalization:
Problem solved: Generalization reduces the complexity of data by replacing
low-level attributes with high-level concepts.
Use case scenarios: Generalization can be useful in scenarios where the
dataset is too complex to analyze, such as in image or speech recognition.
How it works: Techniques include abstraction, summarization, and clustering.
The goal is to reduce the complexity of the data by identifying patterns and
replacing low-level attributes with high-level concepts that are easier to
understand and analyze.
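A minimal sketch of generalization, assuming pandas, made-up records, and an invented city-to-region mapping: exact ages become age groups and cities become regions.

# Minimal sketch: generalization by replacing low-level attributes with
# higher-level concepts. The records and the mapping are illustrative only.
import pandas as pd

df = pd.DataFrame({"age": [19, 34, 67], "city": ["Mumbai", "Pune", "Delhi"]})

df["age_group"] = pd.cut(df["age"], bins=[0, 25, 60, 120],
                         labels=["young", "adult", "senior"])
region_map = {"Mumbai": "West", "Pune": "West", "Delhi": "North"}  # assumed mapping
df["region"] = df["city"].map(region_map)
print(df)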

Data Aggregation:
Problem solved: Aggregation combines data at different levels of granularity,
making it easier to analyze and understand.
Use case scenarios: Aggregation can be useful in scenarios where data needs
to be analyzed at different levels of detail, such as in financial analysis or sales
forecasting.
How it works: Techniques include summarization, averaging, and grouping.
The goal is to combine data at different levels of granularity, creating
summaries or averages that are more representative of the underlying
patterns in the data.
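A minimal sketch of aggregation with pandas groupby on synthetic transaction data: totals and averages per month, plus a month-by-region summary.

# Minimal sketch: aggregating transaction-level data to monthly summaries.
# The sales records are made up for illustration.
import pandas as pd

sales = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "region": ["North", "South", "North", "South", "South"],
    "amount": [100, 250, 175, 300, 125],
})

monthly = sales.groupby("month")["amount"].agg(["sum", "mean"])   # per-month summary
by_region = sales.groupby(["month", "region"])["amount"].sum()    # month x region totals
print(monthly)
print(by_region)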
Normalization:
Problem solved: Data normalization scales numerical features to a standard
range, typically [0, 1] or [-1, 1]. This prevents features with larger scales
from dominating the model and causing biased results.
Use case scenarios: Normalization is particularly important when working
with machine learning algorithms that are sensitive to the scale of input
features.
How it works: Techniques include min-max scaling and z-score
standardization, which transform the original feature values to a standard
range or distribution, making them more suitable for analysis and modeling.
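A minimal sketch of both techniques, written directly with numpy on made-up feature values:

# Minimal sketch: min-max scaling to [0, 1] and z-score standardization.
import numpy as np

x = np.array([10.0, 20.0, 35.0, 80.0])

min_max = (x - x.min()) / (x.max() - x.min())   # scaled to the [0, 1] range
z_score = (x - x.mean()) / x.std()              # zero mean, unit variance

print(min_max)   # [0.    0.1428...  0.357...  1.  ]
print(z_score)

Min-max scaling keeps the shape of the original distribution within a fixed range, while z-score standardization centers values around zero with unit variance; which one fits best depends on the downstream model.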

Filters and slicers
Slicing is the process of extracting a part of a collection (like a list or
a string) by specifying a range. It's like cutting a piece out of a larger
set. You can decide where to start, where to end, and how to step
through the collection.
Key Points:
Start: Where to begin the slice (inclusive).
End: Where to stop the slice (exclusive).
Step: The spacing between elements to include in the slice (optional).
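A minimal Python sketch of these three parameters, using an arbitrary list and string:

# Minimal sketch of slicing: start is inclusive, end is exclusive,
# and step controls the spacing between selected elements.
numbers = [10, 20, 30, 40, 50, 60, 70]

print(numbers[1:5])           # [20, 30, 40, 50]   -> indices 1 to 4
print(numbers[::2])           # [10, 30, 50, 70]   -> every second element
print(numbers[::-1])          # reversed copy of the list
print("visualization"[0:4])   # 'visu' -- strings slice the same way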
Filtering is the process of picking out specific items from a
collection based on a condition. You can think of it like a sieve: only
the items that match your criteria pass through.
There are two main ways to filter:
Using filter(): This function selects items based on a condition.
Using List Comprehension: A more flexible way to create a new list by
filtering items that meet certain criteria.
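A minimal sketch of both approaches, keeping only the even numbers from an arbitrary list:

# Minimal sketch of filtering: filter() and an equivalent list comprehension.
numbers = [1, 2, 3, 4, 5, 6, 7, 8]

evens_filter = list(filter(lambda n: n % 2 == 0, numbers))   # keep items passing the condition
evens_comp = [n for n in numbers if n % 2 == 0]              # same result, comprehension style

print(evens_filter)   # [2, 4, 6, 8]
print(evens_comp)     # [2, 4, 6, 8]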
Filters and slicers

Aspect      | Slicing                                                | Filtering
Purpose     | Extract a specific range of items based on position.  | Select items based on a condition.
Works By    | Using start, end, and step (indices).                  | Testing each item against a condition.
Result      | A subset of the collection, based on index positions.  | A subset of the collection, based on conditions.
Syntax      | iterable[start:end:step]                               | filter() or a list comprehension with a condition.
Flexibility | Less flexible (fixed range by index).                  | More flexible (custom conditions can be applied).
Use Case    | When you need a specific range of elements.            | When you need elements that satisfy a condition (e.g., even numbers).
Data mining
Data mining is the process of discovering patterns,
trends, and useful information from large datasets using
statistical, mathematical, and computational techniques. It
involves analyzing vast amounts of data to extract valuable
insights that can help organizations make data-driven
decisions.
In simpler terms, data mining is like digging through
a pile of data to find hidden gems of information that can
be used for various purposes, such as improving business
operations, predicting future trends, or understanding
customer behavior.
Data processing
Data processing refers to the collection,
transformation, and manipulation of raw data into
meaningful information. The process involves a series
of steps to convert data into a usable format, and often
involves cleaning, organizing, and structuring the data
before it can be analyzed or used for decision-making.
In short, data processing turns raw data into
useful insights or outcomes. This is an essential activity
in fields like business analytics, data science, research,
and artificial intelligence.
Data analysis
Data analysis refers to the process of inspecting,
cleaning, transforming, and modeling data with the
goal of discovering useful information, drawing
conclusions, and supporting decision-making. It
involves applying statistical, mathematical, or
computational techniques to extract insights from
data. Data analysis is used to make sense of raw
data, uncover patterns, and provide actionable
insights that guide business, scientific, or
operational strategies.
Data report
A data report is a structured presentation of data
analysis results, often accompanied by insights,
conclusions, and recommendations. It is typically
used to communicate findings to stakeholders such
as managers, executives, clients, or teams, helping
them make informed decisions. Data reports can
take many forms, including tables, charts, graphs,
and narrative explanations, depending on the
audience and the complexity of the information.
Data cleaning
Data cleaning (also called data cleansing or data
scrubbing) is the process of identifying and rectifying (or
removing) errors, inconsistencies, and inaccuracies in raw
data to improve its quality. Data cleaning is a crucial step
in the data analysis pipeline because clean, accurate, and
consistent data leads to more reliable analysis and better
decision-making.
Raw data collected from various sources may be
incomplete, inconsistent, or erroneous, and cleaning the
data ensures that the data used for analysis is of high
quality.
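A minimal sketch of common cleaning steps, assuming pandas and a small made-up table: dropping duplicate rows, fixing whitespace and casing, coercing a column to a numeric type, and removing rows that are missing a key field.

# Minimal sketch of data cleaning with pandas. The column names and
# records are hypothetical, chosen to show typical quality problems.
import pandas as pd

raw = pd.DataFrame({
    "name": [" Alice", "bob", "bob", None],
    "age": ["34", "29", "29", "41"],
    "city": ["Hyderabad", "hyderabad", "hyderabad", "Chennai"],
})

clean = raw.drop_duplicates().copy()                     # remove exact duplicate rows
clean["name"] = clean["name"].str.strip().str.title()    # fix whitespace and casing
clean["city"] = clean["city"].str.title()                # make categories consistent
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")  # coerce to a numeric type
clean = clean.dropna(subset=["name"])                    # drop rows missing a key field
print(clean)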
