DV Chapter 1
Dr. B. SUJATHA
ASSOCIATE PROFESSOR
UNIT 1 - INTRODUCTION TO DATA VISUALIZATION
Data is a collection of raw facts, such as numbers, text, sound, or images,
in any format.
Data classification is the process of organizing data into groups
based on shared characteristics.
Manifold Classification
When data is classified on the basis of more than one attribute, it is first divided into
classes, and each class is then sub-divided into further sub-classes; this is known as Manifold
Classification. For example, a population may be divided into literate and illiterate, each group
sub-divided into male and female, and each of those further sub-divided into married and
unmarried.
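Manifold classification can be sketched in Python as a nested grouping, one level per attribute. This is a minimal illustration assuming each record is a dictionary; the function name manifold_classify, the attribute keys, and the sample population are hypothetical and not taken from the notes.

```python
def manifold_classify(records, attributes):
    """Classify records on several attributes in turn.

    The first attribute gives the classes, the second splits each class
    into sub-classes, and so on. Each leaf holds the number of records
    that fall into that innermost sub-class.
    """
    tree = {}
    for record in records:
        node = tree
        # Descend one level per attribute, creating sub-classes as needed
        for attr in attributes[:-1]:
            node = node.setdefault(record[attr], {})
        leaf = record[attributes[-1]]
        node[leaf] = node.get(leaf, 0) + 1
    return tree


# Hypothetical sample population (illustrative only)
population = [
    {"literacy": "literate", "gender": "male", "marital": "married"},
    {"literacy": "literate", "gender": "female", "marital": "unmarried"},
    {"literacy": "illiterate", "gender": "male", "marital": "unmarried"},
    {"literacy": "literate", "gender": "male", "marital": "married"},
    {"literacy": "illiterate", "gender": "female", "marital": "married"},
]

groups = manifold_classify(population, ["literacy", "gender", "marital"])
print(groups)
```

Changing the order of the attribute list changes which attribute forms the top-level classes, mirroring how the same population could be classified first by gender and then by literacy.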
4. Quantitative Classification
The classification of data on the basis of characteristics that can be
measured in quantity, such as age, height, weight, or income, is known
as Quantitative Classification. For example, grouping the students of a
class into weight intervals such as 40-50 kg and 50-60 kg is a
quantitative classification.
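Quantitative classification is commonly presented as a frequency distribution over class intervals. The following is a minimal sketch assuming equal-width intervals that include the lower limit and exclude the upper limit (so a weight of exactly 60 kg falls in 60-70); the helper frequency_table and the sample weights are hypothetical.

```python
def frequency_table(values, start, width):
    """Count how many values fall into each interval [lower, lower + width)."""
    counts = {}
    for v in values:
        k = (v - start) // width          # index of the interval containing v
        lower = start + k * width
        key = (lower, lower + width)      # interval as (lower limit, upper limit)
        counts[key] = counts.get(key, 0) + 1
    return dict(sorted(counts.items()))   # order intervals from lowest to highest


# Hypothetical weights (kg) of ten students
weights_kg = [42, 47, 51, 55, 58, 60, 63, 49, 52, 66]
table = frequency_table(weights_kg, start=40, width=10)
for (lower, upper), n in table.items():
    print(f"{lower}-{upper} kg: {n}")
```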
Data Collection refers to the systematic process of
gathering, measuring, and analyzing data from various sources to
get a complete and accurate picture of an area of interest.
Primary data refers to information collected directly from first-hand
sources specifically for a particular research purpose. This type of
data is gathered through various methods, including surveys, interviews,
experiments, observations, and focus groups. One of the main
advantages of primary data is that it provides current, relevant, and
specific information tailored to the researcher’s needs, offering a high
level of accuracy and control over data quality.
Secondary data refers to information that has already been
collected, processed, and published by others. This type of data can be
sourced from existing research papers, government reports, books,
statistical databases, and company records. The advantage of secondary
data is that it is readily available and often free or less expensive to
obtain compared to primary data. It saves time and resources since the
data collection phase has already been completed.
What Is Data Visualization and Why Is It Important in Analyzing Data?
Data Aggregation:
Problem solved: Aggregation summarizes detailed data at a coarser level of granularity
(for example, individual sales into monthly or regional totals), making it easier to
analyze and understand.
Use case scenarios: Aggregation is useful when data needs to be analyzed at different
levels of detail, such as in financial analysis or sales forecasting.
How it works: Techniques include summarization, averaging, and grouping.
Records are grouped by one or more attributes, and each group is reduced to a
summary value such as a total, count, or average. These summaries are often more
representative of the underlying patterns in the data than the individual records.
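A minimal sketch of aggregation using only the Python standard library, assuming sales records stored as (region, month, amount) tuples; the aggregate helper and the data are hypothetical. The same records are summarized at two levels of granularity: per region, and per region and month.

```python
from collections import defaultdict

# Hypothetical sales records: (region, month, amount)
sales = [
    ("North", "Jan", 120.0),
    ("North", "Jan", 80.0),
    ("North", "Feb", 160.0),
    ("South", "Jan", 200.0),
    ("South", "Feb", 50.0),
    ("South", "Feb", 50.0),
]


def aggregate(records, key_fn):
    """Group records by key_fn and return the total and average amount per group."""
    groups = defaultdict(list)
    for record in records:
        amount = record[2]                # third field is the sale amount
        groups[key_fn(record)].append(amount)
    return {
        key: {"total": sum(amounts), "average": sum(amounts) / len(amounts)}
        for key, amounts in groups.items()
    }


# Coarser level: one summary per region
by_region = aggregate(sales, lambda r: r[0])
# Finer level: one summary per (region, month)
by_region_month = aggregate(sales, lambda r: (r[0], r[1]))

print(by_region)
print(by_region_month)
```

Libraries such as pandas offer the same grouping through DataFrame.groupby; plain Python is used here only to keep the sketch self-contained.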
Normalization:
Problem solved: Data normalization scales numerical features to a standard
range, typically [0, 1] or [-1, 1]. This prevents features with larger scales
from dominating the model and causing biased results.
Use case scenarios: Normalization is particularly important when working
with machine learning algorithms that are sensitive to the scale of input
features.
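How it works: Techniques include min-max scaling and z-score
standardization. Min-max scaling maps each value x to (x - min) / (max - min),
which lies in [0, 1], while z-score standardization maps it to
(x - mean) / (standard deviation), giving values with mean 0 and standard
deviation 1. Both make features more suitable for analysis and modeling.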
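Min-max scaling and z-score standardization can each be written in a few lines of Python. This sketch assumes a single numeric feature held in a list; the helper names and the income values are hypothetical. It uses the population standard deviation (statistics.pstdev) and, as a choice made here to avoid division by zero, returns constant output when all values are equal.

```python
import statistics


def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def z_score(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical incomes
incomes = [20000, 35000, 50000, 80000]
print(min_max_scale(incomes))             # range [0, 1]
print(min_max_scale(incomes, -1.0, 1.0))  # range [-1, 1]
print(z_score(incomes))
```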
Generalization:
Problem solved: Generalization reduces the complexity of data by replacing
low-level attributes with high-level concepts.
Use case scenarios: Generalization can be useful in scenarios where the
dataset is too complex to analyze, such as in image or speech recognition.
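How it works: Techniques include abstraction, summarization, and clustering.
The goal is to reduce the complexity of the data by identifying patterns and
replacing low-level attributes with high-level concepts that are easier to
understand and analyze; for example, exact ages can be replaced by age groups
such as "young", "middle-aged", and "senior", or city names by their country.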
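A minimal sketch of generalization through a concept hierarchy, in which exact ages are replaced by age groups and cities by countries. The age cut-offs, the city-to-country mapping, and the customer records are hypothetical illustrations.

```python
def generalize_age(age):
    """Replace an exact age with a higher-level age group (hypothetical cut-offs)."""
    if age < 18:
        return "minor"
    elif age < 40:
        return "young adult"
    elif age < 60:
        return "middle-aged"
    return "senior"


# Hypothetical concept hierarchy: city -> country
CITY_TO_COUNTRY = {"Hyderabad": "India", "Chennai": "India", "Paris": "France"}

customers = [
    {"age": 23, "city": "Hyderabad"},
    {"age": 45, "city": "Paris"},
    {"age": 67, "city": "Chennai"},
    {"age": 15, "city": "Hyderabad"},
]

# Each low-level value is replaced by its higher-level concept
generalized = [
    {"age_group": generalize_age(c["age"]), "country": CITY_TO_COUNTRY[c["city"]]}
    for c in customers
]
print(generalized)
```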
Filters and slicers
Slicing is the process of extracting a part of a sequence (such as a list,
tuple, or string) by specifying a range of positions. It is like cutting a
piece out of a larger sequence: you decide where to start, where to end,
and how to step through the elements.
Key Points:
Start: Where to begin the slice (inclusive).
End: Where to stop the slice (exclusive).
Step: The spacing between elements to include in the slice (optional).
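Python's slice syntax sequence[start:end:step] follows the key points above: start is included, end is excluded, and step is optional. The list of marks and the string below are hypothetical examples.

```python
marks = [35, 48, 52, 61, 70, 84, 90]
word = "visualization"

first_three = marks[0:3]     # start=0 (inclusive), end=3 (exclusive): positions 0, 1, 2
every_other = marks[::2]     # start and end omitted, step=2
last_two = marks[-2:]        # a negative start counts from the end
prefix = word[:6]            # slicing works the same way on strings
reversed_word = word[::-1]   # a negative step walks backwards

print(first_three, every_other, last_two, prefix, reversed_word)
```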
Filtering is the process of picking out specific items from a
collection based on a condition. You can think of it like a sieve: only
the items that match your criteria pass through.
There are two main ways to filter:
Using filter(): This built-in function applies a test function to each item and keeps only the items for which it returns a true value.
Using List Comprehension: A more flexible way to create a new list by
filtering items that meet certain criteria.
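Both filtering approaches can be applied to the same hypothetical list of marks, assuming a pass mark of 50 and a distinction mark of 75 (illustrative thresholds, not from the notes). Since filter() returns an iterator, it is wrapped in list() to see the result.

```python
marks = [35, 48, 52, 61, 70, 84, 90]

# 1. filter(): keeps the items for which the lambda returns True
passed = list(filter(lambda m: m >= 50, marks))

# 2. List comprehension: the same condition written inline
passed_lc = [m for m in marks if m >= 50]

# A comprehension can also transform the items it keeps
distinctions = [f"{m}: distinction" for m in marks if m >= 75]

print(passed, passed_lc, distinctions)
```

The comprehension is usually preferred when the kept items also need to be transformed, as in the last line; filter() reads naturally when a suitable test function already exists.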
Aspect Slicing Filtering