ASA Notes
Index
1. Introduction
2. Measurement Scales
3. Measures of Central Tendency
4. Measures of Dispersion
1. Introduction
mean or median. Essentially, variability measures the degree to which data values differ from
each other in a given dataset.
A higher variability indicates a more scattered or diverse dataset, while lower variability
suggests that data points are closer to each other.
4) Standard Deviation: Standard deviation is a statistical measure that quantifies the amount
of dispersion or spread within a dataset. It provides an insight into the variability of
individual data points in relation to the mean (average). A higher standard deviation indicates
greater data variability, while a lower standard deviation suggests that the data points are
closer to the mean.
5) Variance: Variance is a measure of the spread or dispersion of a set of data points around
the mean. It quantifies how much each data point deviates from the mean of the dataset. The
variance is calculated by taking the average of the squared differences between each data
point and the mean. A higher variance indicates greater variability among the data points,
while a lower variance suggests that the data points are closer to the mean.
6) Measure of central tendency: The measures of central tendency in statistics are the
summary statistics which describe the center or typical value of a dataset. The three main
measures of central tendency are:
Mean: It is calculated by summing all values and dividing by the total number of
observations. The mean is sensitive to extreme values.
Median: This is the middle value in a sorted dataset. The median is less affected by extreme
values.
Mode: Mode is that value which appears most frequently in the dataset. A dataset may have
one mode, more than one mode, or no mode at all.
These measures provide us a central reference point for understanding the distribution of
data. We can choose the appropriate measure of central tendency depending on the nature of
the data and the desired interpretation from the data.
7) Normal Distribution: A normal distribution, also known as a Gaussian distribution or bell
curve, is a symmetrical probability distribution characterized by a bell-shaped curve. It is
described by the mean(center) and standard deviation(spread). In this distribution:
a) The majority of data points cluster around the mean, forming the highest point on the
curve.
b) The curve is symmetric, with tails extending infinitely in both directions.
c) The spread of the distribution is determined by the standard deviation.
8) Left Skewed Distribution: A left-skewed distribution, also known as negatively skewed
or left-tailed, is characterized by a longer left tail and a concentration of data points on the
right side. In this distribution:
a) The mean is typically less than the median, indicating that the distribution is pulled to the
left by a few lower extreme values.
b) The bulk of the data points are concentrated on the right side, and the left tail is stretched,
resulting in asymmetry.
c) Left-skewed distributions are less common than right-skewed distributions but occur when
there is an upper limit on the data, so that values pile up near the maximum.
Real-life examples of left-skewed distributions include the distribution of age at retirement or
scores on an easy exam where most students score close to full marks.
9) Right Skewed Distribution: A right-skewed distribution, also known as positively
skewed or right-tailed, is characterized by a longer right tail and a concentration of data
points on the left side. In this distribution:
a) The mean is typically greater than the median, indicating that the distribution is pulled to
the right by a few higher extreme values.
b) The bulk of the data points are concentrated on the left side, and the right tail is stretched,
resulting in asymmetry.
c) Right-skewed distributions are more common than left-skewed distributions and often
occur when there is a lower limit on the data (such as zero) but no upper limit.
Real-life examples of right-skewed distributions include income distributions in populations
with high-income inequality or the distribution of home prices in certain real estate markets.
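The definitions above (variance, standard deviation, the measures of central tendency and the effect of skew on the mean) can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module. Both datasets are made-up illustrative values, not data from these notes, and the population formulas (dividing by N) are used because they match the description of variance as the average of squared differences from the mean.

```python
import statistics

# Hypothetical dataset (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # sum of values / number of values
median = statistics.median(data)        # middle value of the sorted data
modes = statistics.multimode(data)      # most frequent value(s); may be several
variance = statistics.pvariance(data)   # average squared deviation from the mean
std_dev = statistics.pstdev(data)       # square root of the variance

print(mean, median, modes, variance, std_dev)

# Hypothetical right-skewed dataset: one large value stretches the right tail.
right_skewed = [20, 22, 25, 27, 30, 31, 250]
skewed_mean = statistics.mean(right_skewed)
skewed_median = statistics.median(right_skewed)

# The mean is pulled toward the long tail, so it is greater than the median.
print(skewed_mean > skewed_median)
```

In the second dataset the single extreme value moves the mean a long way while the median stays with the bulk of the data, which is why the median is preferred for skewed data.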
Ques: Why is the median often a better representation of data?
1. Less Affected by Outliers : The median is less influenced by extreme values or outliers
compared to the mean.
2. Robust Measure : In skewed distributions, where data is not symmetrically distributed,
the median provides a more robust measure of central tendency.
3. Reflects Central Tendency : In asymmetric distributions, the median often better reflects
the central tendency of the data.
4. Better for Skewed Data : When data exhibits significant skewness or has outliers, the
median offers a more accurate representation of the center of the distribution.
5. Suitable for Ordinal Data : When data is ordinal, the median is often preferred because it
only requires the values to be ranked and makes no assumptions about the interval between
values. (For categorical data with no order, only the mode is meaningful.)
6. Resistant to Extremes : Extreme values, which might skew the mean, have less impact
on the median, making it more resistant to fluctuations caused by such values.
7. Easier Interpretation : In some contexts, the median might be easier to interpret,
especially for non-technical audiences or when dealing with non-normal data distributions.
Ques: What is hypothesis testing?
Hypothesis testing is a statistical method used to determine whether there is enough evidence in
sample data to draw conclusions about a population. It involves testing an assumption, known
as the null hypothesis, against an alternative hypothesis. Hypothesis testing is crucial in
research and decision-making processes. It provides a systematic framework to evaluate the
validity of assumptions and draw meaningful conclusions from data.
Ques: Explain the different types of data.
There are two types of data:-
Qualitative Data
Quantitative Data
1) Qualitative Data: Qualitative data is descriptive and non-numerical. It focusses on the
qualities, characteristics and attributes of data. Qualitative data is valuable for exploring
complex phenomena and understanding context. Examples: gender male/female, smoker/non-
smoker, questionnaire response (agree, disagree, neutral), hair color, religion, political party
affiliation and profession.
2) Quantitative Data: Quantitative data is numerical and measurable, expressed in terms of
quantities or amounts. It involves objective observations and precise measurements that can
be analyzed statistically. Quantitative data is used to perform statistical analysis and precise
measurement. Examples: Revenue in dollars, age in months or years, distance in miles or
kilometres, time in days or weeks, test scores, Weight in kilograms or pounds, height in feet
or inches, length in centimetres, population size, income, sales figures, fuel consumption
and website page load speed.
Identify whether the following is qualitative data or quantitative data:
The natural hair colour of 20 randomly selected fashion models: hair colour cannot be
measured or expressed as a number; instead, it is descriptive and categorical. Hence this is
qualitative data.
The ages of 20 randomly selected fashion models: quantitative data because age is a
numerical value.
The fuel economy in miles per gallon of 20 new cars purchased last month: quantitative
data because fuel economy in miles per gallon is a numerical value.
Ques: What are the different types of sampling?
The different types of sampling are:
Random Sampling
Biased Sampling
Stratified Sampling
1) Random Sampling: Random sampling involves selecting a subset of individuals from a
larger population in such a way that each member of the population has an equal chance of
being chosen. This method makes the sample likely to be representative of the entire population
and reduces the risk of bias.
2) Biased Sampling: Biased sampling occurs when certain members of the population are
more likely to be selected than others, leading to a skewed representation of the population.
This can result from flaws in the sampling method or deliberate manipulation to favour a
particular outcome.
3) Stratified Sampling: Stratified sampling involves dividing the population into subgroups
or strata based on specific characteristics, such as age or income level. Samples are then
randomly selected from each subgroup, ensuring that each subgroup is adequately
represented in the final sample. This method allows for a more precise analysis of each
subgroup and can improve the overall accuracy of the sample.
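Stratified sampling as described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `stratified_sample`, the age-group labels and the 10% sampling fraction are all hypothetical, and the same fraction is drawn from every stratum (proportional allocation), which is one of several possible choices.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)  # divide population into strata

    sample = []
    for members in strata.values():
        # Take at least one record per stratum so no subgroup is left out.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))      # random selection within the stratum
    return sample

# Hypothetical population of 100 people: (person_id, age_group) pairs.
population = (
    [(i, "under_30") for i in range(60)]
    + [(i, "30_to_59") for i in range(60, 90)]
    + [(i, "60_plus") for i in range(90, 100)]
)

sample = stratified_sample(population, stratum_of=lambda r: r[1], fraction=0.1)
counts = defaultdict(int)
for _, group in sample:
    counts[group] += 1
print(dict(counts))
```

Each age group appears in the sample in the same proportion as in the population, which is the sense in which every subgroup is adequately represented.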
Ques: What are the advantages of performing stratified sampling?
1) By dividing the population into homogeneous subgroups or strata and then sampling from
each stratum, the variability within each group is reduced, leading to more accurate estimates.
2) Stratified sampling ensures that each subgroup or stratum is adequately represented in the
sample, allowing for a more comprehensive understanding of the population.
3) When the population is diverse, stratified sampling can be more cost-effective, because a
smaller total sample can give the same precision as a larger simple random sample.
4) Stratified sampling reduces the risk of sample bias by ensuring that each subgroup or
characteristic of interest is appropriately represented in the sample.
Ques: What is sample bias?
Sample bias refers to the systematic error introduced in a study or survey when the
characteristics of the sample differ from those of the population which the sample intends to
represent. This discrepancy can lead to inaccurate conclusions because the sample does not
adequately reflect the true population. Sample bias can occur due to various factors, such as
non-random sampling methods, voluntary response bias, or undercoverage. It can distort the
results and lead to misleading interpretations if not appropriately addressed.
Ques: What is voluntary response bias?
Voluntary response bias occurs when individuals self-select to participate in a survey or
study, leading to a non-random sample. Those who choose to respond may have stronger
opinions or experiences related to the topic, while others may opt out. This can skew the
results, as the sample may not represent the broader population accurately. Voluntary
response bias can lead to overrepresentation of certain viewpoints or demographics, while
underrepresenting others, resulting in biased conclusions.
Ques: What is population and sample?
A population is the entire group that we want to draw conclusions about. A sample is the
specific group from which we collect data. Samples are used because they are more
manageable, cost-effective, and less time-consuming to study compared to entire populations.
The size of the sample is always less than the total size of the population.
Examples of population and sample:
All the students in the school are the population and the students of class 10 are the sample.
Patients in the hospital are the population and the elderly patients are the sample.
All the people who have ID proofs are the population and the group of people who have only a
voter ID is the sample.
All the students in the class are the population whereas the top 10 students in the class are the
sample.
All the members of parliament are the population and the female members present there are the
sample.
Identify whether each of the following is a population or a sample:
The gender of every second customer entering a mall : sample
The snow crabs caught on a single fishing trip represent only a portion of the total snow crab
population in the area.
Answers:
- This represents a population if it includes the heights of all students enrolled in that
school. However, if it only includes the heights of a subset of students, such as those in a
specific grade or class, it would be a sample.
- This represents a population if it includes the count of all cars passing through the toll
booth in a day. However, if it only includes the count of cars passing through during certain
times or days, it would be a sample.
3. The salaries of employees in a specific company:
- This represents a population if it includes the salaries of all employees working for that
specific company. However, if it only includes the salaries of a subset of employees, such as
those in a particular department or location, it would be a sample.
- This represents a population if it includes the ages of all residents living in that particular
city. However, if it only includes the ages of a subset of residents, such as those living in a
specific neighbourhood, it would be a sample.
- This represents a population if it includes the scores of all students who took the
standardized test in that state. However, if it only includes the scores of a subset of students,
such as those from a specific school or district, it would be a sample.
- This represents a sample as it likely does not include the weights of all fish in the lake
but rather a subset of fish caught during a specific time or by certain methods.
- This represents a population if it includes the count of all daily visitors to the tourist
attraction. However, if it only includes the count of visitors on certain days or during specific
times, it would be a sample.
- This represents a sample as it likely does not include the prices of all houses sold in the
neighbourhood but rather a subset of houses sold during a specific period.
- This represents a sample as it likely does not include the blood pressure readings of all
patients in the hospital but rather a subset of patients measured during a specific time or for
specific reasons.
- This represents a sample if it only includes the ages of customers present during a
specific time or day at the shopping mall. However, if it includes the ages of all customers
who have visited the mall over a certain period, it could be considered a population.
Identify whether each of the following is a population or a sample:
The test scores of 200 students who took a standardized exam in a school district:
Answers:
- This represents a sample as it includes the heights of only a subset of basketball players
in the tournament, not all players.
- This represents a sample as it includes the incomes of only a subset of employees in the
company, not all employees.
- This represents a sample if it includes the ages of only a subset of individuals living in
the city. However, if it includes the ages of all individuals in the city, it would be a
population.
- This represents a sample as it includes the weights of only a subset of fish caught in the
river, not all fish.
5. The test scores of 200 students who took a standardized exam in a school district:
- This represents a sample as it includes the test scores of only a subset of students who
took the exam, not all students in the school district.
6. The number of daily visitors to a museum over the course of a month:
- This represents a population if it includes the count of all daily visitors to the museum
over the month. However, if it only includes the count of visitors on certain days or during
specific times, it would be a sample.
- This represents a sample as it includes the prices of only a subset of houses sold in the
neighbourhood, not all houses.
- This represents a sample as it includes the ages of only a subset of participants in the
study, not all participants.
- This represents a sample as it includes the blood pressure readings of only a subset of
patients in the hospital, not all patients.
- This represents a sample as it includes the heights of only a subset of trees in the forest
reserve, not all trees.
2. Measurement Scales
Ques: What is a scale?
Scales of measurement refer to ways in which variables/numbers are defined and categorized.
The four scales of measurement are nominal, ordinal, interval, and ratio.
In statistics, a "true zero" refers to a point on the measurement scale where the absence of the
attribute being measured is indicated by the value zero. Unlike other scales where zero might
be arbitrary or simply a reference point, a true zero represents an absolute absence of the
variable being measured.
1) Nominal Scale: This scale categorizes data into distinct categories or groups and does not
involve any quantitative value or order. The nominal scale only allows measurement of the
mode. Nominal data consists of categories without any inherent numerical value or order.
Therefore, it lacks the properties required for calculating the mean or median.
Examples: Gender (male/female), Eye Colour (black/brown/green), Marital
status(Married/Unmarried), Political Party Affiliation, place of abode(city/town/village),
Nationality, Blood Type, Religion, Language Spoken, Car Brands, Animal Species, colours of
crayon in a 24 crayon box
2) Ordinal Scale: This scale is used to simply depict the order of variables and not the
difference between each variable. These scales generally depict non-mathematical ideas such
as frequency, satisfaction, happiness, Grades, degree of pain, etc. The ordinal scale allows the
measurement of the mode and median. Although the ordinal data has a rank order, the
intervals between variables may not be consistent, making the calculation of the mean
incorrect. However, it is possible to identify the middle value, making it possible to calculate
the median.
Examples:
Happiness scale (e.g. Very Sad, Sad, Neutral, Happy, Very Happy)
Educational attainment (e.g., high school diploma, bachelor's degree, master's degree)
Socioeconomic status (e.g., lower class, middle class, upper class)
Likert scale responses (e.g., strongly disagree, disagree, neutral, agree, strongly agree)
Job hierarchy levels (e.g., entry-level, supervisor, manager, director)
Survey response options indicating frequency (e.g., never, rarely, sometimes, often, always)
Performance ratings (e.g., poor, fair, satisfactory, good, excellent)
Satisfaction levels (e.g., very unsatisfied, unsatisfied, neutral, satisfied, very satisfied)
Health condition severity (e.g., mild, moderate, severe)
Economic development classifications (e.g., developing, emerging, developed)
Sports rankings (e.g., first place, second place, third place)
Letter Grades (e.g. A, B, C, D, F)
Political Outcomes (e.g. left of centre, centre, right of centre)
3) Interval Scale: In this scale, the order of the variables and the difference between the
variables is known. Interval means the distance between the two entities. The interval scale
contains all the properties of the ordinal scale and offers a calculation of the difference
between variables. The main characteristic of this scale is the equidistant difference between
objects. The only drawback of this scale is that there is no pre-decided starting point or a true
zero value. This scale allows the measurement of mean, median and mode.
Temperature measured in the Celsius/Fahrenheit scale belongs to the interval scale because
both Celsius and Fahrenheit scales have consistent intervals between each unit of
measurement. For example, the difference between 10°C and 20°C is the same as the
difference between 20°C and 30°C, indicating equal intervals. The zero points on these scales
are arbitrary and do not represent an absence of temperature. In the Celsius scale, 0°C
corresponds to the freezing point of water, while in the Fahrenheit scale, 0°F was defined by the
freezing point of a brine (salt-water) mixture. However, neither zero point indicates a total absence of
temperature. Both Celsius and Fahrenheit scales maintain a clear order of values, with higher
temperatures represented by higher numerical values and vice versa.
Grading on a numeric scale is usually treated as interval data because the numerical
values assigned to grades are taken to represent consistent intervals. For example, on a
grade-point scale the difference between an "A" and a "B" is treated as the same as the
difference between a "C" and a "D", making the intervals equal. (Letter grades on their own,
without equally spaced points assigned to them, only give an order and are ordinal.) Interval
scales do not have a true zero point. In grading systems, the lowest grade (e.g., "F" or "0%")
is a defined starting point on the scale rather than an absolute absence of achievement, so the
zero point is arbitrary. Grading intervals also maintain an order.
Baking Temperature
Stock prices
4) Ratio Scale: This scale not only gives the order of the variables and the difference
between the variables but also gives information on the value of true zero. A ratio scale can
do everything that a nominal, ordinal and interval scale can do. Because of the existence of a
true zero value, the ratio scale doesn’t have negative values. This scale allows the
measurement of mean, median and mode. Examples:
- Less than 50 kilograms
- 51-70 kilograms
- 71-90 kilograms
- 91-110 kilograms
- More than 110 kilograms
Test scores with a true zero point (e.g., percentage scores out of 100)
Ratio scales have a true zero point, meaning that a value of zero indicates the absence of the
measured attribute. In the case of age, weight, and height, a value of zero implies the absence
of age (birth), weight (no mass), and height (no length), respectively. The intervals between
consecutive values on the scale are equal and have the same meaning throughout the scale.
For example, the difference between 20 and 30 years of age is the same as the difference
between 50 and 60 years, indicating equal intervals. The values on a ratio scale have
magnitude, allowing for meaningful comparisons in terms of more or less. For instance,
someone who weighs 80 kilograms is twice as heavy as someone who weighs 40 kilograms.
The values on a ratio scale are ordered from smallest to largest, allowing for comparisons in
terms of greater than, less than, or equal to.
3. Measures of Central Tendency
Ques: List the steps for computing the arithmetic mean of discrete series by direct
method and assumed mean method (shortcut method).
Page No: 4.4, 4.5
Step 1: Convert the more than series into normal lower-upper class series. Enter them in the
first column labelled class intervals.
Step 2: Construct the frequency column as follows:-
a) Every frequency = current frequency – next frequency
b) The last frequency will be as it is.
Step 3: Follow the same steps as done in case of exclusive continuous series.
Ques: What is inclusive continuous series? (It is the reverse of what you might expect: inclusive means to exclude)
An inclusive continuous series is one in which the upper limit of one class interval is not the
lower limit of the next class interval (e.g. 10-19, 20-29). In other words, the class intervals
are non-overlapping.
6) Inclusive Continuous series Page No: 4.10
Step 1: Calculate the difference between the lower limit of the second class interval and the
upper limit of the first class interval. Divide the difference by 2.
Step 2: Subtract the halved difference from the lower limit of class intervals and add the
halved difference to the upper limit of class intervals.
Step 3: Repeat the same steps as done for exclusive continuous series.
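The conversion in Steps 1 and 2 above can be sketched as follows. The function name `to_exclusive` and the class intervals are hypothetical and only illustrate the adjustment.

```python
def to_exclusive(classes):
    """Convert inclusive class limits, e.g. (10, 19), (20, 29), into
    exclusive class boundaries, e.g. (9.5, 19.5), (19.5, 29.5)."""
    # Step 1: difference between the second lower limit and the first
    # upper limit, divided by 2.
    half_gap = (classes[1][0] - classes[0][1]) / 2
    # Step 2: subtract it from every lower limit, add it to every upper limit.
    return [(lower - half_gap, upper + half_gap) for lower, upper in classes]

inclusive = [(10, 19), (20, 29), (30, 39)]  # hypothetical class intervals
exclusive = to_exclusive(inclusive)
print(exclusive)  # [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5)]
```

After the adjustment the upper boundary of each class equals the lower boundary of the next, so the exclusive-series steps apply unchanged.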
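Ques: What is exclusive continuous series? (It is the reverse of what you might expect: exclusive means to include)
An exclusive continuous series is one in which the upper limit of one class interval is the
lower limit of the next class interval (e.g. 10-20, 20-30). In other words, the class limits
overlap, and a value equal to the upper limit is counted in the next class.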
7) Exclusive Continuous series when class intervals are unequal Page No: 4.11
Step 1: Make the class intervals equal and adjust the frequencies under the assumption that
the frequencies are equally distributed throughout the class interval.
Step 2: Divide the given frequency of a class interval by the number of class intervals which
have been constructed out of a particular class interval.
Ques: List the mathematical properties of arithmetic mean.
Ques: Why is mean characterized as a point of balance?
Ques: List the merits of arithmetic mean.
Ques: What is an open-ended distribution?
Open-ended distribution is a frequency distribution where one or more classes lack definite
boundaries. These classes extend indefinitely in one or both directions without specific upper
or lower limits. Example: Consider a dataset of ages where the class intervals are "0-10
years," "11-20 years," and "21 years and above." The last class ("21 years and above") is
open-ended as it does not specify an upper limit.
Ques: What is a U-shaped distribution?
It is a statistical distribution which when plotted on a graph has a shape that resembles the
letter ‘U’. This distribution has low frequencies in the middle and higher frequencies at both
extremes. Example: age-related mortality rates (mortality rate on the Y axis, age on the X axis). The
mortality rates are higher among infants and the elderly due to vulnerability to diseases. On
the other hand, mortality rates tend to be lower for middle aged individuals who are generally
healthier and less susceptible to life-threatening illnesses.
Ques: List the limitations of arithmetic mean.
1) Extreme values, also known as outliers, can unduly influence the arithmetic mean, leading
to a distorted representation of the data.
2) In skewed distributions, where the data is not symmetrical, the arithmetic mean may not
accurately represent the central tendency due to its sensitivity to outliers.
3) Arithmetic mean cannot be computed for qualitative data.
4) When data distribution is highly irregular, the arithmetic mean may not be able to
accurately reflect the underlying trends or patterns.
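5) For an open-ended distribution the mean cannot be computed accurately without assuming the
missing class limits, and for a U-shaped distribution the mean falls in the low-frequency middle,
so it is not representative.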
Ques: List the steps involved in the computation of combined average of two or more
related groups. Page No. 4.13
Ques: When does the mean become incorrect?
The mean becomes incorrect when, while calculating it:
a) wrong values of certain items are taken,
b) values of certain items are not taken, or
c) values of certain extra items are taken.
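All three cases above can be repaired from the reported mean alone, because the reported mean times N gives back the total that was actually used. The sketch below is one common way of doing this and is not taken from the page the notes refer to. The function name `corrected_mean` and the figures are hypothetical.

```python
def corrected_mean(reported_mean, n, wrong_values=(), right_values=()):
    """Recompute a mean after fixing wrongly included or omitted items.

    wrong_values: items that were used but should not have been
    right_values: items that should have been used but were not
    """
    reported_total = reported_mean * n                      # total actually used
    corrected_total = reported_total - sum(wrong_values) + sum(right_values)
    corrected_n = n - len(wrong_values) + len(right_values)  # items that should count
    return corrected_total / corrected_n

# Hypothetical case (a): mean of 10 items reported as 50, but 45 was misread as 54.
print(corrected_mean(50, 10, wrong_values=[54], right_values=[45]))  # 491 / 10 = 49.1
```

Passing only `right_values` handles omitted items (case b), and passing only `wrong_values` handles extra items (case c).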
Ques: List the steps involved in calculating the correct mean. Page No. 4.14
Ques: What is weighted arithmetic mean?
The weighted arithmetic mean is a type of mean calculated by giving different weights to
individual data points. Unlike the regular arithmetic mean where each data point contributes
equally, in the weighted arithmetic mean, certain data points have more influence on the
arithmetic mean based on their assigned weights. It is computed by multiplying each data
point by its corresponding weight, summing these products, and dividing by the sum of the
weights.
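As a minimal sketch of the computation described above (the function name `weighted_mean`, the marks and the credit weights are made up for illustration):

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical marks in three subjects, weighted by credits.
marks = [80, 70, 90]
credits = [4, 3, 1]
print(weighted_mean(marks, credits))  # (320 + 210 + 90) / 8 = 77.5
```

The simple mean of the same marks is 80. The weighted mean is lower because the subject with the lowest mark carries more weight than the one with the highest.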
Median
Ques: What is the median?
Note: While calculating the median, begin the sequence of the items from 1.
Ques: List the steps involved in the calculation of median in case of individual series.
Case 1: When the number of items of the series (N) is even.
Step 1: Arrange the size of item in ascending or descending order. Reindex the items as 1,
2…
Step 2: Find the (N+1)/2 th item.
Step 3: Calculate Median as follows:-
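(a) If (N+1)/2 is not a whole number (always the case when N is even) then
Median = size of the item at the integer part of (N+1)/2 + 50 % of the difference
between the size of the immediate next item and the size of the item at the integer part of
(N+1)/2
Case 2: When the number of items of the series (N) is odd, (N+1)/2 is a whole number and
Median = size of the (N+1)/2 th item.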
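The (N+1)/2 rule above can be sketched as follows. The function name `median_individual` and the sample values are hypothetical. The odd-N branch, which returns the (N+1)/2 th item itself, is the usual convention and is stated here as an assumption.

```python
def median_individual(items):
    """Median of an individual series using the (N+1)/2 th item rule."""
    if not items:
        raise ValueError("the series must contain at least one item")
    ordered = sorted(items)        # Step 1: arrange in ascending order
    n = len(ordered)
    position = (n + 1) / 2         # Step 2: items are counted from 1
    whole = int(position)          # integer part of the position
    value = ordered[whole - 1]     # 1-based position -> 0-based index
    if position == whole:          # N odd: the item itself is the median
        return value
    # N even: add 50% of the difference to the immediate next item
    return value + 0.5 * (ordered[whole] - value)

print(median_individual([7, 3, 9, 5]))      # position 2.5 -> 5 + 0.5 * (7 - 5) = 6.0
print(median_individual([7, 3, 9, 5, 11]))  # position 3 -> 7
```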
Ques: List the steps involved in the calculation of median in case of exclusive continuous
series.
Step 1: Given the class intervals and the frequencies, calculate the cumulative frequency by
adding each frequency cumulatively to the sum of its predecessors and enter them in a
column headed cf.
Step 2: Find out the N/2 th item, where N equals the sum of class frequencies, Σf.
Step 3: Find the cumulative frequency which includes the N/2 th item by finding out that
cumulative frequency which is just higher than the N/2 th item.
Step 4: Corresponding to this cumulative frequency, find out the lower limit L of the class
interval, the class frequency f and the length of the interval i. In addition, find out the
cumulative frequency cf of the preceding class interval.
Step 5: Calculate the Median as given on Page No. 4.20
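The steps above can be sketched in code. The formula on the referenced page is not reproduced in these notes, so the sketch assumes the standard interpolation formula Median = L + ((N/2 - cf) / f) × i, with L, f, i and cf as defined in Step 4. The function name `grouped_median` and the frequency table are hypothetical.

```python
def grouped_median(classes, frequencies):
    """Median of an exclusive continuous series.

    classes: list of (lower, upper) class limits
    frequencies: class frequencies, in the same order
    """
    n = sum(frequencies)                 # N = sum of class frequencies
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2                         # Step 2: the N/2 th item
    cf = 0                               # cumulative frequency of preceding classes
    for (lower, upper), f in zip(classes, frequencies):
        if cf + f >= half:               # Step 3: first class whose cf reaches N/2
            i = upper - lower            # Step 4: length of the median class
            return lower + (half - cf) / f * i   # Step 5 (assumed formula)
        cf += f                          # Step 1: build the cf column as we go

# Hypothetical frequency table.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [5, 15, 20, 10]
print(grouped_median(classes, frequencies))  # 20 + (25 - 20) / 20 * 10 = 22.5
```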
Ques: List the steps involved in the calculation of median in case of less than continuous
series.
Step 1: Convert the less than series into normal lower-upper class series. Enter them in the
first column labelled class intervals.
Step 2: Construct the frequency column as follows:-
a) The first frequency will be as it is.
b) Every other frequency = current frequency – previous frequency
Step 3: Follow the same steps as done in case of exclusive continuous series.
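Step 2 above (recovering class frequencies from a less than series) can be sketched as follows. The function name `less_than_to_frequencies` and the figures are hypothetical.

```python
def less_than_to_frequencies(cumulative):
    """Turn 'less than' cumulative frequencies into class frequencies."""
    frequencies = [cumulative[0]]               # a) the first frequency as it is
    for previous, current in zip(cumulative, cumulative[1:]):
        frequencies.append(current - previous)  # b) current - previous
    return frequencies

# Hypothetical series: less than 10 -> 5, less than 20 -> 20,
# less than 30 -> 40, less than 40 -> 50.
print(less_than_to_frequencies([5, 20, 40, 50]))  # [5, 15, 20, 10]
```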
Ques: List the steps involved in the calculation of median in case of more than
continuous series.
Step 1: Convert the more than series into normal lower-upper class series. Enter them in the
first column labelled class intervals.
Step 2: Construct the frequency column as follows:-
a) Every frequency = current frequency – next frequency
b) The last frequency will be as it is.
Step 3: Follow the same steps as done in case of exclusive continuous series.
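For a more than series, the frequency column can be sketched as below. The function name `more_than_to_frequencies` and the figures are hypothetical, and keeping the last cumulative value as the last class frequency is an assumption, since there is no next value to subtract from it.

```python
def more_than_to_frequencies(cumulative):
    """Turn 'more than' cumulative frequencies into class frequencies."""
    # Every frequency = current cumulative frequency - next cumulative frequency
    frequencies = [cur - nxt for cur, nxt in zip(cumulative, cumulative[1:])]
    # Assumption: the last class has no next value, so it is kept as it is.
    frequencies.append(cumulative[-1])
    return frequencies

# Hypothetical series: more than 0 -> 50, more than 10 -> 45,
# more than 20 -> 30, more than 30 -> 10.
print(more_than_to_frequencies([50, 45, 30, 10]))  # [5, 15, 20, 10]
```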
Ques: List the steps involved in the calculation of median in case of continuous inclusive
series.
Step 1: Calculate the difference between the lower limit of the second class interval and the
upper limit of the first class interval. Divide the difference by 2.
Step 2: Subtract the halved difference from the lower limit of class intervals and add the
halved difference to the upper limit of class intervals.
Step 3: Repeat the same steps as done for exclusive continuous series.
Ques: List the steps involved in the calculation of median in case of exclusive continuous
series when class intervals are unequal.
Step 1: Make the class intervals equal and adjust the frequencies under the assumption that
the frequencies are equally distributed throughout the class interval.
Step 2: Divide the given frequency of a class interval by the number of class intervals which
have been constructed out of a particular class interval.
Ques: List the steps involved in the calculation of median in case of open ended series.
Same steps as for calculating the median for continuous exclusive series.
The values of a variate which divide a given series into 100 equal parts are called
percentiles. Since 99 points are required to divide the given series into 100 equal parts, we
have 99 percentiles, from P1 to P99.
The percentile Pj is that value of the variate up to which j% of the total observations lie when
the given series is arranged in ascending order.
For Example:
P10 is that value of the variate up to which 10% of the total observations lie.
P25 is that value of the variate up to which 25% of the total observations lie.
P50 is that value of the variate up to which 50% of the total observations lie.
P75 is that value of the variate up to which 75% of the total observations lie.
In each case the given series is arranged in ascending order.
Ques: Write the expression for computing percentiles in case of individual, discrete and
continuous series.
Ques: What is the relationship between partition values?
Q1 = P25, is that value of the variate below which exactly 25% of the observations lie.
Q2 = P50, is that value of the variate below which exactly 50% of the observations lie.
Q3 = P75, is that value of the variate below which exactly 75% of the observations lie.
Note: This holds when the series is arranged in ascending order.
Note: Quartiles are calculated in the same way as the median.
Do illustration 28, 29 and 30 for more clarity.
Note: For calculating both quartiles and percentiles, the data needs to be arranged in
ascending order. The calculation for both quartiles and percentiles is the same as
for the median, with slight modifications which can be seen in illustrations 28, 29 and
30.
Ques: What is the percentile of some value, say x, in a sorted individual series?
Count the number of items less than or equal to x (this includes the item x itself). Let y
be the number of items less than or equal to x. Let N be the total number of items in that
series. Then find:
Percentile of x = (y / N) × 100
This percentage is the percentile of x.
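The counting rule above can be sketched as follows. The function name `percentile_of` and the scores are made up for illustration.

```python
def percentile_of(x, series):
    """Percentage of items in the series that are less than or equal to x."""
    y = sum(1 for item in series if item <= x)  # count of items <= x (includes x)
    n = len(series)                             # total number of items
    return y * 100 / n

scores = [35, 42, 50, 58, 61, 67, 73, 80, 88, 95]  # hypothetical sorted series
print(percentile_of(73, scores))  # 7 of the 10 items are <= 73, so 70.0
```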
Mode
Ques: What is mode?
In statistics, mode refers to that value which occurs most frequently in a given dataset. It is one
of the measures of central tendency.
See Page No. 4.33
A dataset can have no mode at all if all the values occur with equal frequency, i.e. repeat the
same number of times.
Do the question on mode done in class.
Ques: What is the relationship between mean, median and mode in case of positively skewed and
negatively skewed distributions?
Page No. 4.46 and 4.47
Note: In class, we have done mode numerical on continuous series.
Do illustrations 36-40.
Ques: List the merits of mode.
Ques: List the demerits of mode.
Ques: What is the usefulness of mode?
Ques: According to Karl Pearson, what is the relationship between mean, median and
mode?
Mode = 3Median – 2Mean
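The relationship above is an empirical approximation for moderately skewed distributions, not an exact identity. A minimal sketch, with a hypothetical function name and made-up figures:

```python
def empirical_mode(mean, median):
    """Karl Pearson's empirical relation: Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean

# Hypothetical moderately right-skewed distribution: mean 52, median 50.
print(empirical_mode(mean=52, median=50))  # 3 * 50 - 2 * 52 = 46
```

The estimate (46) lies below the median, which lies below the mean, as expected for a positively skewed distribution.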
4. Measures of Dispersion
Ques: Write the expression for coefficient of variation or standard deviation.
Ques: List the properties of standard deviation.
Ques: List the merits of standard deviation.
Ques: List the demerits of standard deviation.
Ques: What is variance?
Ques: What can we interpret from small value of variance and what can we interpret
from large value of variance?
Ques: Write the expression for combined standard deviation. Page No. 5.33
Ques: What is a box plot?
A box plot, also called a box and whisker plot, is a graphical representation of the
distribution of a dataset. It provides the 5 point summary i.e. minimum, first quartile,
median, third quartile and maximum. The box of the box plot represents the IQR, which
spans from the 25th percentile to the 75th percentile. The median is represented by a line
inside the box. The whiskers are drawn from the box towards the minimum and maximum values.
Typically a whisker is at most 1.5 × IQR long, and values beyond that are plotted individually
as outliers.
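The quantities a box plot displays can be computed directly. The sketch below uses `statistics.quantiles` from the Python standard library. Quartile conventions differ between textbooks and libraries, so `method="inclusive"` is an assumption here and may not match the method used in class. The function name `box_plot_summary` and the data are hypothetical.

```python
import statistics

def box_plot_summary(data):
    """Five-point summary plus IQR, whisker ends and outliers."""
    ordered = sorted(data)
    # Assumption: "inclusive" quartile method; other conventions give
    # slightly different quartiles.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # whiskers do not extend past the fences
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    return {
        "min": ordered[0], "q1": q1, "median": q2, "q3": q3, "max": ordered[-1],
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in ordered if x < lower_fence or x > upper_fence],
    }

data = [2, 4, 5, 7, 8, 9, 11, 12, 30]  # hypothetical values
summary = box_plot_summary(data)
print(summary)
```

For this data the quartiles are 5, 8 and 11, so the IQR is 6 and the upper fence is 20. The upper whisker therefore stops at 12, and 30 is reported as an outlier.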
Ques: What is a z score?
A z score is a statistical measure which indicates how many standard deviations a
particular observation lies above or below the mean of a dataset.
Ques: What is the use of z score, standard deviation and variance?
Z scores standardize the data. They indicate the number of standard deviations a particular
observation lies above or below the mean of a dataset. Standard deviation and variance
indicate the spread of observations around the mean.
A z score is calculated for each observation. If the z score is -x, then
that particular observation lies x standard deviations below the mean.
The percentile rank and z-score of a measurement indicate its relative position with regard to
the other measurements in a data set.
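A z score can be computed for every observation as (value - mean) / standard deviation. The sketch below assumes the population standard deviation (`statistics.pstdev`). Using the sample standard deviation instead would change the numbers slightly. The function name `z_scores` and the data are hypothetical.

```python
import statistics

def z_scores(data):
    """Number of standard deviations each observation lies from the mean."""
    mean = statistics.mean(data)
    std_dev = statistics.pstdev(data)  # assumption: population standard deviation
    if std_dev == 0:
        raise ValueError("z scores are undefined when all values are equal")
    return [(x - mean) / std_dev for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values: mean 5, standard deviation 2
z = z_scores(data)
print(z)
```

The first observation (2) gets a z score of -1.5, meaning it lies 1.5 standard deviations below the mean, while the last (9) lies 2 standard deviations above it.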