1 Elements, Variables and Data Categorization

Uploaded by

sriyogesh223

U18AII5202 – EXPLORATORY DATA ANALYSIS AND VISUALIZATION

UNIT I – ANALYSIS TECHNIQUES

TOPIC: ELEMENTS, VARIABLES AND DATA CATEGORIZATION

UNIT I - INTRODUCTION TO ADVANCED DATA MODELING


What is a dataset?
● A dataset is an ordered collection of data, normally
presented in tabular form.
● Every column describes a particular variable, and each row
corresponds to a member of the data set.
● A dataset is a set of numbers or values that pertain to a specific topic.
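The row-and-column view described above can be sketched in plain Python. This is a minimal illustration only: the records, the variable names (name, height_cm, weight_kg) and the helper column() are hypothetical and are not part of the original slides.

```python
# Hypothetical records: each dict is one row, i.e. one member of the data set.
dataset = [
    {"name": "Asha", "height_cm": 158.0, "weight_kg": 52.5},
    {"name": "Ravi", "height_cm": 172.5, "weight_kg": 68.0},
    {"name": "Meena", "height_cm": 165.0, "weight_kg": 59.5},
]


def column(rows, variable):
    """Return every value recorded for one variable (one column)."""
    return [row[variable] for row in rows]


# The keys of a row are the variables (columns) of the data set.
variables = list(dataset[0].keys())
heights = column(dataset, "height_cm")

print(variables)
print(heights)
```

Reading across one dict gives a row; calling column() gives everything recorded for a single variable.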
Types of Datasets
● Numerical data sets
● Bivariate data sets
● Multivariate data sets
● Categorical data sets
● Correlation data sets
Numerical Datasets
● The numerical data set is a data set, where the data are expressed in
numbers rather than natural language. The numerical data is
sometimes called quantitative data.
● Examples
○ Weight and height of a person
○ The count of RBC in a medical report
○ Number of pages present in a book
Bivariate Datasets
● A data set that has two variables is called a bivariate data set. It
deals with the relationship between the two variables, which are
usually related to each other.
● Examples
○ Height and weight of a group of individuals
○ The sales of ice cream versus the temperature on that day
Multivariate Datasets
● A data set with three or more variables is called a multivariate
data set.
● In other words, a multivariate data set consists of individual
measurements that are acquired as a function of three or more
variables.
● Example: To describe the volume of a rectangular box, three
variables are required: length, width and height.
Categorical Datasets
● Categorical data sets represent features or characteristics of a person or an
object.
● A categorical data set consists of categorical variables, also called qualitative
variables.
● A categorical variable that can take exactly two values is called a dichotomous
variable.
● Categorical variables with more than two possible values are called
polytomous variables.
● Examples (both dichotomous):
○ A person’s gender (male or female)
○ Marital status (married/unmarried)
Correlation Datasets
● A set of values that demonstrate some relationship with each other forms a
correlation data set. Here the values are found to be dependent on each other.
● Generally, correlation is defined as a statistical relationship between two
entities/variables.
● Correlation is classified into three types:
○ Positive correlation – The two variables move in the same direction (either both go up
or both go down).
○ Negative correlation – The two variables move in opposite directions (one variable goes
up while the other goes down, and vice versa).
○ No or zero correlation – There is no relationship between the two variables.
● Example: A tall person is generally expected to be heavier than a short person, so here
the weight and height variables are dependent on each other.
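The three correlation types can be illustrated with a small sketch. It assumes the Pearson correlation coefficient as the measure (the slides do not name one), uses made-up height and weight values, and treats the tolerance for "zero" as an arbitrary cut-off chosen for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_type(r, tolerance=0.1):
    """Label a coefficient; the tolerance is an illustrative assumption."""
    if r > tolerance:
        return "positive"
    if r < -tolerance:
        return "negative"
    return "no or zero"


# Hypothetical measurements: taller people tend to be heavier.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 75, 83]

r = pearson_r(heights_cm, weights_kg)
print(correlation_type(r))
```

A coefficient near +1 or -1 indicates a strong linear relationship; values near 0 indicate little or no linear relationship.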
Basis of Data Classification
Geographical Classification
● The classification of data on the basis of geographical location or region is known as
Geographical or Spatial Classification.
● For example, presenting the population of different states of a country is done on the
basis of geographical location or region.
Chronological Classification
● The classification of data with respect to different time periods is known as Chronological
or Temporal Classification.
● For example, the number of students in a school in different years can be presented on
the basis of a time period.
Qualitative Classification
● The classification of data on the basis of descriptive or qualitative characteristics like
region, caste, gender, education, etc., is known as Qualitative Classification.
● A qualitative classification can be of two types, viz., Simple Classification and Manifold
Classification.
Qualitative Classification
● When the given data is classified into two classes on the basis of only one attribute,
it is known as Simple Classification.
● For example, when the population is divided into literate and illiterate, it is a simple
classification.
Qualitative Classification
● When the given data is classified into different classes on the basis of more than one
attribute, and then sub-divided into further sub-classes, it is known as Manifold
Classification.
● For example, when the population is divided into literate and illiterate, then sub-divided
into male and female, and further sub-divided into married and unmarried, it is a
manifold classification.
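Simple and manifold classification can be sketched as counting records by one attribute versus by several attributes at once. The records and the helper classify() below are hypothetical and serve only to illustrate the literate/illiterate example.

```python
from collections import Counter

# Hypothetical population records.
people = [
    {"literacy": "literate", "gender": "female", "marital": "married"},
    {"literacy": "literate", "gender": "male", "marital": "unmarried"},
    {"literacy": "illiterate", "gender": "female", "marital": "unmarried"},
    {"literacy": "literate", "gender": "female", "marital": "married"},
]


def classify(records, attributes):
    """Count the records that fall in each class formed by the attributes."""
    return Counter(tuple(r[a] for a in attributes) for r in records)


# Simple classification: one attribute, two classes.
simple = classify(people, ["literacy"])

# Manifold classification: classes sub-divided by further attributes.
manifold = classify(people, ["literacy", "gender", "marital"])

print(simple)
print(manifold)
```

Each key of the manifold result is one sub-class, for example ("literate", "female", "married").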
Quantitative Classification
● The classification of data on the basis of characteristics that can be measured in
quantity, such as age, height, weight, income, etc., is known as Quantitative Classification.
● For example, classifying the students in a class by their weight is a quantitative
classification.
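One common way to carry out a quantitative classification is to group measurements into class intervals. The sketch below assumes a fixed interval width of 10 kg and made-up student weights; neither the width nor the helper class_interval() comes from the slides.

```python
from collections import Counter


def class_interval(value, width=10):
    """Return the half-open interval [lower, lower + width) containing value."""
    lower = int(value // width) * width
    return (lower, lower + width)


# Hypothetical weights of students in a class, in kilograms.
weights_kg = [42.5, 47.0, 51.2, 55.8, 58.1, 63.4]

# Frequency of students in each class interval.
frequency = Counter(class_interval(w) for w in weights_kg)

for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower}-{upper} kg: {count}")
```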
Types of Data
The data in statistics is classified into four categories:
● Nominal data
● Ordinal data
● Discrete data
● Continuous data
Nominal Data
● Nominal data is a type of qualitative (categorical) data that
consists of categories or names that cannot be ordered or
ranked.
● Nominal data is often used to categorize observations into
groups, and the groups cannot be ranked against one another.
● In other words, nominal data has no inherent order or ranking.
● Examples of nominal data include gender (Male or Female), race
(White, Black, Asian), religion (Hinduism, Christianity, Islam,
Judaism), and blood type (A, B, AB, O).
Ordinal Data
● Ordinal data is a type of qualitative (categorical) data that
consists of categories that can be ordered or ranked.
● Ordinal data is often used to measure subjective attributes
or opinions, where there is a natural order to the
responses.
● Examples of ordinal data include education level
(Elementary, Middle, High School, College), job position
(Manager, Supervisor, Employee), etc.
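The practical difference between nominal and ordinal data can be sketched as follows. The category names are taken from the examples above; the sample responses and the integer ranks in EDUCATION_RANK are assumptions made for illustration.

```python
from collections import Counter

# Nominal: categories have no order, so counting is meaningful
# but ranking them is not.
blood_types = ["A", "O", "B", "O", "AB", "O"]
blood_counts = Counter(blood_types)
most_common_type = blood_counts.most_common(1)[0][0]

# Ordinal: categories have a natural order, encoded here as integer ranks.
# Only the order of the ranks matters, not the distance between them.
EDUCATION_RANK = {"Elementary": 0, "Middle": 1, "High School": 2, "College": 3}

responses = ["College", "Elementary", "High School", "Middle", "College"]
ordered = sorted(responses, key=EDUCATION_RANK.__getitem__)
highest = max(responses, key=EDUCATION_RANK.__getitem__)

print(most_common_type)
print(ordered)
print(highest)
```

For nominal data only equality and frequency make sense; for ordinal data, sorting and taking a minimum or maximum also make sense.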
Discrete Data
● Discrete data is a type of quantitative data that can take
only distinct, separate values.
● These values can be easily counted as whole numbers.
● Examples:
○ Number of students in a class
○ Marks of the students in a class test
Continuous Data
● Continuous data is a type of quantitative data that can
take any value in a continuous range.
● The variable can have any value between the lowest and
highest values of that range.
● Examples:
○ Temperature range
○ Salary range of workers in a factory
Basic components of a data set

● Element: The entities on which data are collected.

● Variable: A characteristic of interest for the element.

● Observation: The set of measurements collected for a
particular element.
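Elements, variables and observations can be identified in a small table as sketched below. The companies, variable names and figures are entirely hypothetical, and the rule used to label a variable as quantitative (all of its values are numbers) is a simplifying assumption.

```python
# Hypothetical data set: each outer key is an element,
# each inner key is a variable.
data = {
    "Company A": {"industry": "Retail", "employees": 1200, "sales_musd": 85.4},
    "Company B": {"industry": "Banking", "employees": 640, "sales_musd": 112.0},
    "Company C": {"industry": "Retail", "employees": 310, "sales_musd": 27.9},
}

elements = list(data)
variables = list(next(iter(data.values())))

# One observation per element: the measurements collected for that element.
observations = [tuple(row.values()) for row in data.values()]


def is_quantitative(variable):
    """Treat a variable as quantitative if every value is a number."""
    return all(isinstance(row[variable], (int, float)) for row in data.values())


kinds = {
    v: "quantitative" if is_quantitative(v) else "qualitative" for v in variables
}

print(len(elements), len(variables), len(observations))
print(kinds)
```

The numeric check is only a rough guide: numbers that act as labels, such as postal codes or jersey numbers, are still qualitative.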
Basic components - Example
Practice Problems
● A recent issue of Fortune Magazine reported that the following companies had the lowest sales
per employee among the Fortune 500 companies.
(a) How many elements are in the data set? Write down these elements.
(b) How many variables are in the data set? Write down these variables.
(c) How many observations are in the data set? Write down these observations.
(d) Which of the above variables are qualitative and which are quantitative?
Practice Problems

● Determine whether each of the following data is quantitative or qualitative.
i) the marks that students get in a test
ii) the gender of newborn babies
iii) the area codes in phone numbers
iv) the heights of buildings

Practice Problems

● Determine whether each of the following quantitative data is discrete or continuous.
i) the number of customers visiting a store over a weekend
ii) the amount of water consumed by a country over the past 10 years
iii) the outcomes of rolling a 6-sided die ten times
iv) the heights of trees in a rainforest
v) students' shoe sizes in a class
