
EDS Unit 2 ?

The document provides an overview of data types and statistical descriptions, detailing various attributes such as qualitative and quantitative types, along with their subcategories. It explains basic statistical measures including central tendency (mean, median, mode) and dispersion (range, variance, standard deviation), emphasizing their importance in data analysis. Additionally, it covers the structure of datasets and the significance of graphic displays in representing data.

☘️

Unit 2 Data Types & Statistical Description
Syllabus
Types of Data: Attributes and Measurement, What is an Attribute? The Type of
an Attribute, The Different Types of Attributes, Describing Attributes by the
Number of Values, Asymmetric Attributes, Binary Attribute, Nominal Attributes,
Ordinal Attributes, Numeric Attributes, Discrete versus Continuous Attributes.
Basic Statistical Descriptions of Data:

Measuring the Central Tendency: Mean, Median, and Mode

Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard
Deviation, and Inter-quartile Range, Graphic Displays of Basic Statistical
Descriptions of Data.

Definition of Dataset
A dataset is a collection of data objects organized in a structured format,
typically in rows and columns. Each row represents an instance (record), and
each column represents an attribute (feature or variable). Datasets are used for
analysis, prediction, and decision-making.

Data objects are described by a number of attributes (variables).

The dataset consists of rows and columns, where the rows correspond to the
data objects and the columns correspond to the attributes of the data objects.


Attributes
An Attribute is a property or characteristic of data objects. Example: In a
student dataset, attributes can be age, gender, or GPA.

Describing Attributes by the Number of Values


Finite Attributes: Limited number of possible values (e.g., the number of
days in a week).

Infinite Attributes: Unlimited or uncountable values (e.g., real numbers
within a range).

Types of Attributes



Attributes are broadly classified into two types:

1. Qualitative

2. Quantitative

https://medium.com/@netrajpatil12mati/data-objects-and-attribute-types-704d7d9ea8a8

Qualitative Attributes
These attributes are descriptive and non-numerical, and are used to describe
characteristics that can't be easily measured.

Nominal Attributes
Definition: Represent categories or labels without a meaningful order or
ranking.

Examples: Colors (Red, Green, Blue), Gender (Male, Female), Nationalities.

Key Point: No arithmetic operations can be performed.

Binary Attributes
Definition: Attributes with only two possible values.

Examples: Yes/No, True/False, On/Off.

Key Point: Often encoded as 0 (False) and 1 (True).

Symmetric Attributes:

Definition: Binary attributes where both outcomes have equal importance.

Examples: Male/Female, Pass/Fail.

Key Point: No bias toward one outcome over the other.

Asymmetric Attributes:

Definition: Binary attributes where one outcome is more significant than
the other.

Examples: Presence/Absence of a disease, Positive/Negative test results.

Key Point: The two values are not equally important. By convention, the more
important (usually rarer) outcome is coded as 1, e.g., a positive test result.

Ordinal Attributes
Definition: Represent categories with a meaningful order or ranking, but the
intervals between values are not defined.

Examples: Education levels (High School < College < Graduate), Likert
scale (Poor, Average, Good).

Key Point: Arithmetic operations are not applicable, but comparisons are.
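The key point above can be sketched in code. This is a minimal illustration, assuming the education levels from the example are stored as strings; the names `EDUCATION_LEVELS`, `RANK`, `is_lower`, and `ordinal_median` are hypothetical helpers, not part of any library.

```python
# Hypothetical ordinal scale: the position in the list encodes the rank.
EDUCATION_LEVELS = ["High School", "College", "Graduate"]
RANK = {level: i for i, level in enumerate(EDUCATION_LEVELS)}


def is_lower(a: str, b: str) -> bool:
    """Return True if level a ranks strictly below level b."""
    return RANK[a] < RANK[b]


def ordinal_median(values: list[str]) -> str:
    """Middle level after sorting by rank (lower middle for even counts)."""
    ordered = sorted(values, key=RANK.__getitem__)
    return ordered[(len(ordered) - 1) // 2]


# Made-up sample for illustration only.
sample = ["Graduate", "High School", "College", "College", "High School"]
print(is_lower("High School", "Graduate"))
print(ordinal_median(sample))
```

Comparisons and the median only need the order of the categories. A mean would need the gaps between categories to be equal, which an ordinal scale does not guarantee.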

Quantitative Attributes
These attributes are numerical and quantifiable, and are used to measure
values or counts.

Discrete Attributes
Definition: Numeric attributes with a finite or countably infinite number of values.

Examples: Number of children, Number of cars.

Key Point: Values are distinct and separate.

Continuous Attributes



Definition: Numeric attributes with an infinite number of possible values
within a range.

Examples: Height, Weight, Temperature.

Key Point: Can take any value in a given interval.

Numeric Attributes
Numeric attributes represent measurable quantities and can be classified as:

Interval-Scaled:

Definition: Interval-scaled attributes are measured on a scale with
equal-sized units. The values of interval-scaled attributes have a
definite order and can be positive, zero, or negative. However, these
attributes lack a true zero point. While we can calculate the difference
between values, we cannot express one value as a multiple of another.

Example: Temperature in Celsius or Fahrenheit.

Ratio-Scaled:

Definition: A ratio-scaled attribute is a numeric attribute with a natural
zero point. This means if a measurement is ratio-scaled, we can talk
about one value being a multiple (or ratio) of another value. Plus, the
values are ordered, and we can figure out the difference between them,
as well as calculate things like the mean, median, and mode.

Example: Height, Weight, Age.
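A short sketch of why ratios are not meaningful on an interval scale. The temperatures are made-up values, and the Kelvin conversion is brought in here only as an assumed example of a ratio-scaled temperature (Kelvin has a true zero); `celsius_to_kelvin` is a hypothetical helper.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert an interval-scaled Celsius value to ratio-scaled Kelvin."""
    return celsius + 273.15


t1_c, t2_c = 10.0, 20.0  # illustrative temperatures in Celsius

# Differences are meaningful on both scales and agree with each other.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# The Celsius ratio suggests "twice as hot", but the zero point is arbitrary.
ratio_c = t2_c / t1_c
# On the Kelvin scale the same two temperatures differ by only a few percent.
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(diff_c, diff_k)
print(ratio_c, round(ratio_k, 4))
```

The difference is the same on both scales, while the ratio changes with the choice of zero point. That is why 20 °C cannot be called twice as warm as 10 °C.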

Discrete vs Continuous attributes


A discrete attribute has a limited or countable set of values. These values might
be numbers, but they don’t have to be. For example, hair color, smoker status,
medical test results, or drink sizes — each of these has a set number of
options, so they’re discrete.
If an attribute isn’t discrete, we call it continuous. Continuous attributes can
take any value within a range. There are no gaps between possible values.

Let’s look at an example to make this clearer. Height is a continuous attribute.
Someone could be 170.5 cm tall, or 170.51 cm, or 170.513 cm; there’s no limit
to how precise we can get. We can always squeeze another possible value
between two heights, no matter how close they are.

Basic Statistical Descriptions of Data


Importance of Basic Statistical Descriptions of Data
1. Understanding Central Trends: Measures like mean, median, and mode
help in identifying the central tendency of the data, giving a snapshot of
what is typical or average in the dataset.

2. Evaluating Data Spread: Dispersion measures like range, variance, and
standard deviation indicate how spread out the data is. Understanding
variability is crucial for determining consistency in the dataset.

3. Detecting Anomalies: Statistical tools like box plots and inter-quartile range
(IQR) help identify outliers, which could indicate errors, special cases, or
significant trends worth investigating.

4. Comparing Datasets: Basic statistical measures make it easier to compare
different datasets, helping to identify trends, similarities, or differences
between groups or categories.

5. Supporting Data-Driven Decisions: By summarizing data effectively, these
statistical descriptions provide a solid foundation for making informed
decisions, conducting hypothesis testing, and selecting appropriate models
for analysis.

These basic descriptions are essential for interpreting data accurately and
making informed, data-driven decisions.

Measure of Central Tendency


A measure of central tendency is a statistical measure that identifies a single
value as representative of an entire distribution.

Mean
The average value, calculated by summing all data points and dividing by the
number of data points.

1. Mean of an Individual Series:

The individual series refers to a set of individual data points or values.

The mean (average) is calculated by summing up all the values and
dividing by the number of observations.

Formula:
$\text{Mean} = \frac{\sum x}{n}$

Example: For the data series [3, 5, 7, 9], the mean would be:
$\text{Mean} = \frac{3 + 5 + 7 + 9}{4} = \frac{24}{4} = 6$
2. Mean of a Discrete Series:

A discrete series consists of distinct, countable data points, often
associated with frequencies (i.e., how many times a value appears).

The mean is calculated by multiplying each data point by its frequency,
summing the results, and dividing by the total number of observations.

Formula:
$\text{Mean} = \frac{\sum (f \cdot x)}{\sum f}$

Where:

$f$ = frequency of each value

$x$ = value of the data point

Example: For the data series with values [2, 4, 6] and corresponding
frequencies [3, 5, 2]:
$\text{Mean} = \frac{(2 \times 3) + (4 \times 5) + (6 \times 2)}{3 + 5 + 2} = \frac{6 + 20 + 12}{10} = \frac{38}{10} = 3.8$
3. Mean of a Continuous Series:

A continuous series deals with data that can take any value within a
given range, often represented in intervals or class groups.

The mean is calculated by finding the midpoint of each class,
multiplying it by the frequency of the class, summing the results, and
dividing by the total number of observations.

Formula:
$\text{Mean} = \frac{\sum (f \cdot m)}{\sum f}$

Where:

$f$ = frequency of each class

$m$ = midpoint of each class interval

Example: For a continuous series with class intervals [10 − 20, 20 −
30, 30 − 40] and corresponding frequencies [5, 8, 7]:

Midpoints: $m = 15, 25, 35$

$\text{Mean} = \frac{(5 \times 15) + (8 \times 25) + (7 \times 35)}{5 + 8 + 7} = \frac{75 + 200 + 245}{20} = \frac{520}{20} = 26$
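The three mean calculations above can be sketched as follows. The function names are hypothetical and chosen for this illustration; class intervals are assumed to be given as (lower, upper) pairs.

```python
def mean_individual(values):
    """Mean of an individual series: sum of the values divided by their count."""
    return sum(values) / len(values)


def mean_discrete(values, freqs):
    """Mean of a discrete series: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def mean_continuous(intervals, freqs):
    """Mean of a continuous series, using class midpoints as the values."""
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]
    return mean_discrete(midpoints, freqs)


# Data taken from the three worked examples above.
print(mean_individual([3, 5, 7, 9]))
print(mean_discrete([2, 4, 6], [3, 5, 2]))
print(mean_continuous([(10, 20), (20, 30), (30, 40)], [5, 8, 7]))
```

The continuous case reuses the discrete formula, because replacing each class by its midpoint turns a grouped series into a discrete one.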

Mode
The most frequently occurring value in the dataset.

1. For an Individual Series:

Identify the value that occurs most frequently in the dataset.

Example: In the series [1, 2, 2, 3, 4], the mode is 2 because it occurs
most often.

2. For a Discrete Series (with frequencies):

The mode is the value that has the highest frequency.

Formula:
Mode = Value with highest frequency
Example: For the dataset values [2, 4, 6] with frequencies [3, 5, 2], the
mode is 4 because it has the highest frequency (5).

3. For a Continuous Series (with class intervals):

The mode for a continuous series can be found using the following
formula:

$\text{Mode} = L + \left( \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \right) \times h$

Where:

$L$ = Lower boundary of the modal class

$f_1$ = Frequency of the modal class

$f_0$ = Frequency of the class before the modal class

$f_2$ = Frequency of the class after the modal class

$h$ = Width of the class intervals

Example: For the continuous series with class intervals [10 − 20, 20 −
30, 30 − 40] and frequencies [5, 8, 7], the modal class is 20 − 30
(because it has the highest frequency of 8). Substituting $L = 20$,
$f_1 = 8$, $f_0 = 5$, $f_2 = 7$, and $h = 10$:

$\text{Mode} = 20 + \left( \frac{8 - 5}{2 \times 8 - 5 - 7} \right) \times 10 = 20 + \frac{3}{4} \times 10 = 27.5$
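A minimal sketch of the grouped-data mode formula applied to the example intervals. The function name `grouped_mode` is hypothetical. Treating a missing neighbouring class as frequency 0 is an assumption of this sketch (it is a common convention, but the notes do not specify it).

```python
def grouped_mode(intervals, freqs):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    # Index of the modal class (first class with the highest frequency).
    i = max(range(len(freqs)), key=freqs.__getitem__)
    lower, upper = intervals[i]
    f1 = freqs[i]
    # Assumption: a missing neighbour on either side counts as frequency 0.
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
    h = upper - lower
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: modal class equals both neighbours")
    return lower + (f1 - f0) / denominator * h


# Class intervals and frequencies from the example above.
mode = grouped_mode([(10, 20), (20, 30), (30, 40)], [5, 8, 7])
print(mode)  # 20 + (3 / 4) * 10 = 27.5
```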

Median
The middle value when data points are arranged in ascending order. If there is
an even number of values, the median is the average of the two middle values.

Odd Number of Data Points:

The median is the middle value.

Example: [1, 3, 5] → Median = 3.

Even Number of Data Points:

The median is the average of the two middle values.

Example: [1, 3, 5, 7] → Median = $\frac{3 + 5}{2} = 4$.
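The odd and even cases can be sketched in a few lines. The function name `median` is chosen for this illustration; Python's standard library offers `statistics.median` with the same behaviour for these inputs.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if even."""
    ordered = sorted(values)  # the data must be in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 3, 5]))
print(median([1, 3, 5, 7]))
```

Sorting inside the function matters: the middle position only identifies the median once the values are arranged in order.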

Measure of Dispersion of Data


A measure of dispersion quantifies the extent to which the data points vary
from a central value (mean, median, or mode).

Measures of dispersion are of two types:

1. Absolute Measure

2. Relative Measure

Absolute Measure of Dispersion


When dispersion is expressed in the original units of the data, it is called an
absolute measure of dispersion.

1. Range:

The difference between the maximum and minimum values in a dataset.



Formula: Range = Max − Min
Example: [1, 3, 5, 7] → Range = 7 - 1 = 6

2. Quartiles:

Q1 (First Quartile): The median of the lower half of the dataset.

Q2 (Second Quartile): The median of the entire dataset.

Q3 (Third Quartile): The median of the upper half of the dataset.

Interquartile Range (IQR): The difference between Q3 and Q1.


Formula:
IQR = Q3 − Q1
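Example: For [1, 3, 5, 7, 9], Q1 = 3, Q2 = 5, Q3 = 7, and IQR = 7 - 3 = 4.
(Here the median, 5, is counted in both halves. Conventions that exclude it
from the halves give Q1 = 2 and Q3 = 8.)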
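A sketch of the range, quartiles, and IQR using Python's standard `statistics.quantiles`. Quartile values depend on the convention used, so the choice of `method="inclusive"` here is an assumption made to reproduce the example values; the `"exclusive"` method is shown for contrast.

```python
import statistics

data = [1, 3, 5, 7, 9]

# Range: difference between the largest and smallest value.
data_range = max(data) - min(data)

# Quartiles under the "inclusive" convention, which matches the example above.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# The "exclusive" convention gives different quartiles for the same data.
q1_excl, _, q3_excl = statistics.quantiles(data, n=4, method="exclusive")

print(data_range, q1, q2, q3, iqr)
print(q1_excl, q3_excl)
```

When comparing quartiles from different tools, it helps to check which convention each tool uses, since small datasets make the difference visible.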

3. Variance:

A measure of how much the values in the dataset deviate from the
mean.

Formula: $\text{Variance} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$

Where $x_i$ is each value, $\mu$ is the mean, and $n$ is the number of data points.

Example: For [1, 2, 3], mean = 2, variance =
$\frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3} \approx 0.67$
4. Standard Deviation:

The square root of the variance. It shows how spread out the numbers
are.

Formula: $\text{Standard Deviation} = \sqrt{\text{Variance}}$

Example: For a variance of 0.67, standard deviation = $\sqrt{0.67} \approx 0.82$
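The variance and standard deviation formulas above divide by n, which is the population form. A minimal sketch, with hypothetical function names, cross-checked against the standard library's `statistics.pvariance`; the sample variance (dividing by n − 1) is included only for contrast and is not part of the notes.

```python
import math
import statistics


def population_variance(values):
    """Average squared deviation from the mean (divides by n)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def population_std_dev(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


data = [1, 2, 3]
variance = population_variance(data)
std_dev = population_std_dev(data)

# For contrast: the sample variance divides by n - 1 instead of n.
sample_variance = statistics.variance(data)

print(round(variance, 2), round(std_dev, 2), sample_variance)
```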

5. Inter-quartile Range (IQR):

The range between Q1 and Q3, representing the middle 50% of the
data.

Formula: IQR = Q3 − Q1


Example: For [1, 3, 5, 7, 9], Q1 = 3, Q3 = 7, and IQR = 7 - 3 = 4



Graphic Displays:
Box Plot: Shows the distribution of data, highlighting the median, quartiles,
and outliers.

Histogram: Displays the frequency of data within specific intervals.

Bar Chart: Compares quantities of different categories.

Relative Measure of Dispersion


When dispersion is expressed as a ratio of an absolute measure of dispersion
to a measure of central value, it is called a relative measure of dispersion.
Being unit-free, it is mostly used to compare the variation of two or more
distributions.

Coefficients of Dispersion
1. Coefficient of Range:

Measures the relative spread of the range.

Formula:
$\text{Coefficient of Range} = \frac{\text{Max} - \text{Min}}{\text{Max} + \text{Min}}$

Example: If Max = 50, Min = 10:
$\text{Coefficient of Range} = \frac{50 - 10}{50 + 10} = \frac{40}{60} \approx 0.67$
2. Coefficient of Variation (CV):

Measures relative variability compared to the mean.

Formula:
$\text{CV} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100$
Example: If mean = 20, standard deviation = 4:
$\text{CV} = \frac{4}{20} \times 100 = 20\%$
3. Coefficient of Mean Deviation:

Measures the mean deviation relative to the mean.

Formula:
$\text{Coefficient of Mean Deviation} = \frac{\text{Mean Deviation}}{\text{Mean}}$

Example: If mean = 30, mean deviation = 5:
$\text{Coefficient of Mean Deviation} = \frac{5}{30} \approx 0.167$



4. Coefficient of Quartile Deviation:

Measures the relative dispersion of the middle 50% of data.

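Formula:
$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$

Example: If Q1 = 25, Q3 = 75:
$\text{Coefficient of Quartile Deviation} = \frac{75 - 25}{75 + 25} = \frac{50}{100} = 0.5$

The quartile deviation itself is an absolute measure, half of the IQR:
$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2}$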
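The four coefficients above can be sketched as small functions and checked against the worked examples. The function names are hypothetical and chosen for this illustration.

```python
def coefficient_of_range(max_value, min_value):
    """(Max - Min) / (Max + Min)."""
    return (max_value - min_value) / (max_value + min_value)


def coefficient_of_variation(std_dev, mean):
    """Standard deviation relative to the mean, as a percentage."""
    return std_dev / mean * 100


def coefficient_of_mean_deviation(mean_deviation, mean):
    """Mean deviation relative to the mean."""
    return mean_deviation / mean


def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


def quartile_deviation(q1, q3):
    """Absolute measure: half of the inter-quartile range."""
    return (q3 - q1) / 2


# Inputs taken from the worked examples above.
print(round(coefficient_of_range(50, 10), 2))
print(coefficient_of_variation(4, 20))
print(round(coefficient_of_mean_deviation(5, 30), 3))
print(coefficient_of_quartile_deviation(25, 75))
print(quartile_deviation(25, 75))
```

Each coefficient divides an absolute measure by a value in the same units, so the units cancel. That is what makes these measures comparable across datasets measured in different units.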
 ​

Basic Graphical Representations of Data


1. Line Graphs
Single Line Graph:
Represents one variable over time or another continuous variable.
Example: A graph showing daily temperatures over a week.

Multiple Line Graphs:


Displays two or more variables on the same graph for comparison.

Example: Sales trends for two products over a year.

Compound Line Graph:

Used to show cumulative data trends, where different variables contribute
to the total.

Example: A graph showing total revenue divided into product categories
over time.

2. Pie Chart
Represents data as a circular chart divided into slices, where each slice is
proportional to the percentage of a category.

Example: Market share of different companies in a sector.

3. Histogram
Displays frequency distribution of continuous data, with adjacent bars to
indicate intervals.



Example: Distribution of student scores in an exam.
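A histogram rests on a frequency table of equal-width intervals. The sketch below builds such a table and prints a rough text histogram. The scores are made-up illustrative data, and `frequency_table` is a hypothetical helper; each interval is assumed to include its lower bound and exclude its upper bound.

```python
def frequency_table(values, start, width, bins):
    """Count how many values fall into each interval [lower, upper)."""
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < bins:  # values outside all intervals are ignored
            counts[index] += 1
    return [
        ((start + i * width, start + (i + 1) * width), count)
        for i, count in enumerate(counts)
    ]


# Made-up exam scores for illustration only.
scores = [12, 15, 21, 24, 25, 28, 29, 31, 33, 38]
table = frequency_table(scores, start=10, width=10, bins=3)

# One row of '#' marks per interval stands in for a histogram bar.
for (lower, upper), count in table:
    print(f"{lower}-{upper}: {'#' * count}")
```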

4. Bar Charts
Vertical Bar Chart:

Bars are upright, and their height represents the value of each category.

Example: Sales revenue for different products.

Horizontal Bar Chart:

Bars are horizontal, suitable when category names are long.

Example: Population of different cities.

Grouped Bar Chart:

Groups multiple bars for each category to compare subcategories.

Example: Sales revenue for two brands in different regions.

Stacked Bar Chart:

Stacks subcategories within a bar to show their contribution to the total.

Example: Total revenue with individual contributions from different
departments.
