
Unit 3

The document provides an overview of data types, including qualitative and quantitative data, and further divides quantitative data into discrete and continuous categories. It explains the construction and purpose of frequency curves, standard deviation, variance, covariance, quartiles, and percentiles, detailing their calculations and significance in statistical analysis. Additionally, it highlights the differences between frequency curves, polygons, and histograms.

Uploaded by

anjalitak906
Copyright
© © All Rights Reserved

DATA AND ITS TYPES

Data: A set of values recorded on one or more observational units; in other words, factual
information collected during research studies.

Qualitative data: Variables that yield observations by which individuals can be categorized
according to certain characteristics or qualities are referred to as qualitative variables or data,
e.g. gender, occupation, marital status, and educational level.

Quantitative data: The variables that yield observations that can be measured are considered
quantitative data, e.g. height, weight, blood pressure, serum cholesterol, body temperature.
Quantitative data are further divided into discrete and continuous data.

Discrete data: Data recorded as whole numbers are called discrete data, e.g. number of children in a
family, pulse rate, ESR, blood sugar, blood pressure, etc. This can be understood with the following
example. The pulse rates of ten people are recorded as follows:
80, 72, 75, 82, 77, 83, 86, 74, 78, 88

Continuous data: Data that can be measured in fractional values, such as height, weight, body
temperature, chest circumference, etc., are called continuous data. This can be understood with the
following example:

The weights of ten students of the 10th class, recorded in kilograms, are as follows:

45.7, 50.2, 48.9, 48.4, 56.5, 44.5, 47.8, 47.8, 45.5, 46.3

Frequency Curve:
A frequency curve is a smooth, graphical representation of a frequency distribution, which shows
how often different values occur in a dataset. The curve is formed by connecting the midpoints of
the top edges of a histogram's bars with a freehand, smooth curve.

Construction of Frequency Curve/How to draw Frequency Curve:

X-axis:
It represents the variable being measured (e.g., height, weight, scores).
Y-axis:
It represents the frequency, or how many times each value (or range of values) occurs.
Shape:
The curve's shape can reveal information about the data distribution.
Common shapes include:

1) Normal distribution (bell curve): Symmetrical, with the highest frequency in the center and
tails tapering off on both sides.
2) Skewed distributions: Asymmetrical, with a longer tail on one side, indicating a higher
concentration of values towards one end.
3) U-shaped curve: Has a low frequency in the center and higher frequencies at the extremes.
4) J-shaped curve: Has its highest frequency at one end of the range; in the reverse J-shape it
starts with a high peak and then slopes downward.
5) Mixed curve: A combination of different shapes.

[Figures: Normal (bell-shaped) frequency curve; skewed frequency curve; U-shaped frequency curve;
J-shaped frequency curve]

Purpose of a frequency curve:


Visual representation: It provides a clear visual representation of how the data is distributed.
Comparison: It allows for easy comparison of different datasets or distributions.
Understanding the underlying distribution: It helps to understand the shape, symmetry, and
spread of the data.

Difference between frequency curve, polygon, and histogram:


Histogram: Uses bars to represent the frequency of each class interval.
Frequency polygon: Joins the midpoints of the top edges of the histogram bars with straight lines.
Frequency curve: Connects the midpoints of the top edges of the histogram bars with a smooth,
freehand curve, rather than straight lines.
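
The points that a frequency polygon or frequency curve passes through can be sketched in a few lines of Python. This is only an illustration: it reuses the ten weights from the continuous-data example, and the class intervals (width 2 kg, starting at 44 kg) are an assumption chosen for this sketch, not something the notes prescribe.

```python
# Weights (kg) of ten students, from the continuous-data example.
weights_kg = [45.7, 50.2, 48.9, 48.4, 56.5, 44.5, 47.8, 47.8, 45.5, 46.3]

# Assumed class intervals: 44-46, 46-48, ..., 56-58 (lower limit inclusive).
start, width, num_classes = 44, 2, 7

# Count how many weights fall in each class (the histogram bar heights).
frequencies = [0] * num_classes
for w in weights_kg:
    index = int((w - start) // width)
    frequencies[index] += 1

# Midpoint of each class: the x-coordinate of the top-centre of each bar.
midpoints = [start + width * i + width / 2 for i in range(num_classes)]

# A frequency polygon joins these (midpoint, frequency) points with straight
# lines; a frequency curve passes a smooth freehand curve through them.
for mid, freq in zip(midpoints, frequencies):
    print(f"midpoint {mid:5.1f} kg | {'#' * freq} ({freq})")
```

The printed rows act as a rough text histogram; plotting the same points with a charting library would give the polygon itself.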
STANDARD DEVIATION:
Standard deviation is a measure of the dispersion of statistical data. It shows how much the values
vary from the mean, that is, the “typical” deviation of a value from the average. Because every
value enters the calculation, a change in even one value changes the standard deviation.

Standard Deviation is denoted by (σ)

Calculation of Standard Deviation:

σ = √[ Σ(x − x̄)² / N ]

where x̄ is the mean of the data and N is the number of values,

or

σ = √Variance
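
The remark that a change in even one value affects the standard deviation can be checked with a short Python sketch. It uses the pulse rates from the discrete-data example and the standard library function `statistics.pstdev`, which divides by N (the population form); treating the data as a whole population is an assumption made here, and the replacement value 120 is hypothetical.

```python
import statistics

# Pulse rates of ten people, from the discrete-data example.
pulse_rates = [80, 72, 75, 82, 77, 83, 86, 74, 78, 88]

# pstdev uses the population form (divide by N); this choice is an assumption.
sd_original = statistics.pstdev(pulse_rates)

# Hypothetical change: replace the last reading (88) with 120.
changed_rates = pulse_rates[:-1] + [120]
sd_changed = statistics.pstdev(changed_rates)

print(f"SD of original readings:       {sd_original:.2f}")
print(f"SD after changing one reading: {sd_changed:.2f}")
```

For sample data, `statistics.stdev` (which divides by N − 1) would be the usual alternative.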

VARIANCE:
Variance is a statistical measure that shows how much the values in a dataset deviate from the mean
(average). It gives a sense of how spread out or concentrated the data is.

• If variance is low, data points are close to the mean.


• If variance is high, data points are spread out over a wider range.

Calculation of variance:

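Variance (σ²) = Σ(x − x̄)² / N

where x̄ is the mean of the data and N is the number of values,

or
Variance = σ², the square of the standard deviation.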
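
As a rough illustration of the calculation, the sketch below computes the variance step by step for the ten weights from the continuous-data example and then takes its square root to obtain the standard deviation. The function name `population_variance` is made up for this example, and dividing by N (rather than N − 1, as is common for samples) is an assumption.

```python
import math

def population_variance(values):
    """Mean of the squared deviations from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

# Weights (kg) of ten students, from the continuous-data example.
weights_kg = [45.7, 50.2, 48.9, 48.4, 56.5, 44.5, 47.8, 47.8, 45.5, 46.3]

variance = population_variance(weights_kg)
std_dev = math.sqrt(variance)  # standard deviation is the square root of variance

print(f"Variance: {variance:.3f} kg^2")
print(f"Standard deviation: {std_dev:.3f} kg")
```

Note that the variance carries squared units (kg²), while the standard deviation is back in the units of the data (kg).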
Covariance of Data:
Covariance is a measure of the relationship between two random variables: it indicates the extent
to which they change together, that is, whether a change in one variable is accompanied by a change
in the other. Covariance is expressed in units obtained by multiplying the units of the two
variables.

Types of Covariance
Covariance can have both positive and negative values. Based on this, it has two types:

• Positive Covariance
• Negative Covariance

Positive Covariance
If the covariance of two variables is positive, the variables move in the same direction and show
similar behaviour: greater values of one variable correspond to greater values of the other, and
lesser values to lesser values.
Negative Covariance
If the covariance of two variables is negative, the variables move in opposite directions. It is
the opposite case of positive covariance: greater values of one variable correspond to lesser
values of the other, and vice versa.

The covariance of X and Y is calculated as:

cov(X, Y) = Σ(xi − x̄)(yi − ȳ) / N

Where,

xi = data value of x

yi = data value of y

x̄ = mean of x

ȳ = mean of y
N = number of data values.

If cov(X, Y) is greater than zero, the covariance is positive and the two variables move in the
same direction.

If cov(X, Y) is less than zero, the covariance is negative and the two variables move in opposite
directions.

If cov(X, Y) is zero, there is no linear relationship between the two variables.
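
The three cases can be illustrated with a small Python sketch. The function name `population_covariance` and all of the data below are hypothetical, and the sum of products is divided by N, the number of data values. The last case shows that a zero covariance only rules out a straight-line relationship: y is completely determined by x there, yet the covariance is zero.

```python
def population_covariance(xs, ys):
    """Average product of the deviations of x and y from their means."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Hypothetical data: y rises as x rises.
cov_positive = population_covariance([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Hypothetical data: y falls as x rises.
cov_negative = population_covariance([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])

# Hypothetical data: y = x squared, symmetric around x = 0.
cov_zero = population_covariance([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(cov_positive, cov_negative, cov_zero)
```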

Quartile:
Quartiles are three values that divide an ordered data set into four equal parts. The middle
quartile marks the central point of the distribution. The lower quartile is the median of the half
of the data that falls below the overall median, and the upper quartile is the median of the
remaining half, which falls above the median.

Since quartiles divide the entire set into four equal parts, there are three quartiles, the first,
second and third, represented by Q1, Q2 and Q3, respectively. Q2 is the median. Like the median,
each quartile is a positional average: it is found from the position of an item in the ordered
list. To find the quartiles of a group of data, first arrange the data in ascending order.

Quartiles (Q1, Q2, Q3)


Quartiles Formula
Q3, the upper quartile, is the median of the upper half of the data set, whereas Q1, the lower
quartile, is the median of the lower half. Q2 is the median. If the data set has n items arranged
in ascending order, the quartiles are given by:

Q1 = [(n+1)/4]th item

Q2 = [(n+1)/2]th item

Q3 = [3(n+1)/4]th item
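
A minimal Python sketch of these position formulas, applied to the pulse rates from the discrete-data example, is given below. The helper names are invented for this illustration. When a position is not a whole number, the sketch interpolates linearly between the two neighbouring items; that interpolation rule is an assumption, since the formulas above only give the position.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    lower = int(position)            # whole-number part of the position
    fraction = position - lower      # fractional part of the position
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    low_value = ordered[lower - 1]
    high_value = ordered[lower]
    # Assumed rule: linear interpolation between neighbouring items.
    return low_value + fraction * (high_value - low_value)

def quartiles(values):
    """Return (Q1, Q2, Q3) using the k(n+1)/4-th item positions."""
    ordered = sorted(values)         # arrange the data in ascending order
    n = len(ordered)
    return tuple(value_at_position(ordered, k * (n + 1) / 4) for k in (1, 2, 3))

pulse_rates = [80, 72, 75, 82, 77, 83, 86, 74, 78, 88]
q1, q2, q3 = quartiles(pulse_rates)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```

With n = 10 the positions are 2.75, 5.5 and 8.25, so each quartile falls between two neighbouring readings.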
Percentile:
A percentile is a statistical measure that indicates the relative standing of a value within a dataset.
For example, if a student scores in the 90th percentile on a test, they have scored better than 90% of
the other students who took the test. A percentile is a measure used to indicate the value below which
a given percentage of observations in a group of observations fall.

Formula of Percentile

For calculating the percentile of 'x' in the data,

Percentile = (Number of values below 'x'/Total number of values) × 100
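
As a quick sketch of this formula in Python, the hypothetical function `percentile_of` below counts the values strictly below x and expresses that count as a percentage of all values. It is applied to the pulse rates from the discrete-data example; how to treat values equal to x is not stated in the formula, so counting only strictly smaller values is an assumption.

```python
def percentile_of(values, x):
    """Percentage of values that lie strictly below x."""
    below = sum(1 for v in values if v < x)
    return below * 100 / len(values)

# Pulse rates of ten people, from the discrete-data example.
pulse_rates = [80, 72, 75, 82, 77, 83, 86, 74, 78, 88]

# Seven of the ten readings are below 83.
print(percentile_of(pulse_rates, 83))
```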

Steps for calculating percentile:

Step 1: Arrange Data

Sort the data set in ascending order.

Step 2: Calculate Rank

After arranging the data in order, we need to calculate the rank. The formula for rank is given as

Rank = (Desired Percentile/100) × (n+1)

Where n is the number of observations.

Step 3: Find the Value

Case 1: If the rank is a whole number, the value at that position in the ordered dataset is the desired
percentile.

Case 2: If the rank is a decimal, interpolate between the values at the two neighbouring
whole-number positions to find the percentile value.

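The general formula for the rank R of the Pth percentile is:

R = (P/100) × (N + 1)

where P is the desired percentile and N is the number of observations (written n in Step 2).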

Let's assume we have the following data set: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100. To find the 70th
percentile:

R = (70/100) × (10 + 1) = 7.7

The 70th percentile lies between the 7th and 8th values. Thus, the 70th percentile is a value between
70 and 80. Interpolating linearly between them gives 70 + 0.7 × (80 − 70) = 77.
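
The three steps can be put together in a short Python sketch. The function name `percentile_value` is hypothetical, and linear interpolation between the two neighbouring values is assumed for a decimal rank.

```python
def percentile_value(values, p):
    """Value at the p-th percentile using rank = (p / 100) * (n + 1)."""
    ordered = sorted(values)             # Step 1: arrange the data
    n = len(ordered)
    rank = p * (n + 1) / 100             # Step 2: calculate the rank
    lower = int(rank)                    # whole-number part of the rank
    fraction = rank - lower              # decimal part of the rank
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    # Step 3: take the value at the rank, interpolating if it is a decimal.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(round(percentile_value(data, 70), 2))
```

For the data set above the rank is 7.7, so the result lies seven-tenths of the way from the 7th value (70) to the 8th value (80).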
