
ASA Notes

Uploaded by

mayurachibb
Copyright
© All Rights Reserved

Applied Statistical Analysis

Notes
Index

S. No. Topic Name

1. Introduction
2. Measurement Scales
3. Measures of Central Tendency
4. Measures of Dispersion

1. Introduction

Ques: What do you mean by the term Statistics?


Statistics is the branch of mathematics that deals with the collection, analysis, interpretation,
presentation, and organization of numeric data. It involves the study of methods and
techniques for gathering, summarizing, and drawing conclusions from data. Statistics plays a
crucial role in various fields, including science, economics, business, and social sciences.
Ques: What are the different types of statistics?
The different types of statistics are:
1) Descriptive Statistics
2) Inferential Statistics
Descriptive Statistics: Descriptive statistics involves organizing, summarizing, and
presenting data to describe its main features. Common measures include mean, median, and
mode which provide insights into the central tendencies. For example, calculating the average
income of a sample population involves descriptive statistics.
Inferential Statistics: Inferential statistics involves drawing conclusions and making
predictions about a population based on a sample. In inferential statistics, techniques like
hypothesis testing and confidence intervals are used. For example, estimating the average
height of an entire forest by measuring the height of a sample of trees involves inferential
statistics.

Basic Definitions (Write formulas from textbook)


1) Mean: The mean, also known as the average, is a measure of central tendency in
statistics. It is calculated by summing all the values in a dataset and dividing the sum by
the total number of values.
2) Median: The median is a measure of central tendency in statistics that represents the
middle value of a dataset when it is ordered from least to greatest. To calculate the median:
a) Arrange the data in ascending or descending order.
b) If the dataset has an odd number of observations, the median is the middle value.
c) If the dataset has an even number of observations, the median is the average of the two
middle values.
The median is robust to extreme values, making it useful for skewed datasets. It is particularly
useful when dealing with ordinal data or skewed interval data.
3) Variability: Variability in statistics refers to the extent of dispersion or spread within a
dataset. It quantifies how individual data points deviate from the central tendency, such as the
mean or median. Essentially, variability measures the degree to which data values differ from
each other in a given dataset.
A higher variability indicates a more scattered or diverse dataset, while lower variability
suggests that data points are closer to each other.
4) Standard Deviation: Standard deviation is a statistical measure that quantifies the amount
of dispersion or spread within a dataset. It provides an insight into the variability of
individual data points in relation to the mean (average). A higher standard deviation indicates
greater data variability, while a lower standard deviation suggests that the data points are
closer to the mean.
5) Variance: Variance is a measure of the spread or dispersion of a set of data points around
the mean. It quantifies how much each data point deviates from the mean of the dataset. The
variance is calculated by taking the average of the squared differences between each data
point and the mean. A higher variance indicates greater variability among the data points,
while a lower variance suggests that the data points are closer to the mean.
6) Measure of central tendency: The measures of central tendency in statistics are the
summary statistics which describe the center or typical value of a dataset. The three main
measures of central tendency are:
Mean: It is calculated by summing all values and dividing by the total number of
observations. The mean is sensitive to extreme values.
Median: This is the middle value in a sorted dataset. The median is less affected by extreme
values.
Mode: Mode is that value which appears most frequently in the dataset. A dataset may have
one mode, more than one mode, or no mode at all.
These measures provide us a central reference point for understanding the distribution of
data. We can choose the appropriate measure of central tendency depending on the nature of
the data and the desired interpretation from the data.
7) Normal Distribution: A normal distribution, also known as a Gaussian distribution or bell
curve, is a symmetrical probability distribution characterized by a bell-shaped curve. It is
described by the mean (center) and standard deviation (spread). In this distribution:
a) The majority of data points cluster around the mean, forming the highest point on the
curve.
b) The curve is symmetric, with tails extending infinitely in both directions.
c) The spread of the distribution is determined by the standard deviation.

8) Left Skewed Distribution: A left-skewed distribution, also known as negatively skewed
or left-tailed, is characterized by a longer left tail and a concentration of data points on the
right side. In this distribution:
a) The mean is typically less than the median, indicating that the distribution is pulled to the
left by a few lower extreme values.
b) The bulk of the data points are concentrated on the right side, and the left tail is stretched,
resulting in asymmetry.
c) Left-skewed distributions are less common than right-skewed distributions and typically occur
when there is an upper limit on the data, so that values pile up near the maximum.
Real-life examples of left-skewed distributions include the distribution of age at retirement or
the distribution of scores on an easy exam, where most students score close to the maximum.
9) Right Skewed Distribution: A right-skewed distribution, also known as positively
skewed or right-tailed, is characterized by a longer right tail and a concentration of data
points on the left side. In this distribution:
a) The mean is typically greater than the median, indicating that the distribution is pulled to
the right by a few higher extreme values.
b) The bulk of the data points are concentrated on the left side, and the right tail is stretched,
resulting in asymmetry.
c) Right-skewed distributions are more common than left-skewed distributions and often
occur when there is a lower limit on the data (such as zero) but no fixed upper limit.
Real-life examples of right-skewed distributions include income distributions in populations
with high-income inequality or the distribution of home prices in certain real estate markets.
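
The basic definitions above can be checked on a small dataset. The following is a minimal sketch using Python's standard `statistics` module; the numbers are made up for illustration and form a right-skewed dataset (one large value in the right tail), so the mean should come out greater than the median.

```python
import statistics

# Hypothetical right-skewed dataset: most values are small, one is very large.
data = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(data)            # sum of the values / number of values
median = statistics.median(data)        # middle value of the sorted data
mode = statistics.mode(data)            # most frequently occurring value
variance = statistics.pvariance(data)   # average squared deviation from the mean
std_dev = statistics.pstdev(data)       # square root of the variance

print("mean:", mean, "median:", median, "mode:", mode)
print("variance:", round(variance, 2), "standard deviation:", round(std_dev, 2))
print("mean > median (right skew):", mean > median)
```

Here `pvariance` and `pstdev` divide by N (the whole dataset is treated as the population); `statistics.variance` and `statistics.stdev` divide by N - 1 and are meant for samples.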

Ques: Why is the median often a better representation of data?
1. Less Affected by Outliers : The median is less influenced by extreme values or outliers
compared to the mean.
2. Robust Measure : In skewed distributions, where data is not symmetrically distributed,
the median provides a more robust measure of central tendency.
3. Reflects Central Tendency : In asymmetric distributions, the median often better reflects
the central tendency of the data.
4. Better for Skewed Data : When data exhibits significant skewness or has outliers, the
median offers a more accurate representation of the center of the distribution.
5. Suitable for Ordinal Data : In situations where data is ordinal, the median
is often preferred as it provides a meaningful measure of central tendency without making
assumptions about the interval between values. (For purely nominal categories, only the mode applies.)
6. Resistant to Extremes : Extreme values, which might skew the mean, have less impact
on the median, making it more resistant to fluctuations caused by such values.
7. Easier Interpretation : In some contexts, the median might be easier to interpret,
especially for non-technical audiences or when dealing with non-normal data distributions.
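
A small illustration of the points above, as a sketch with made-up salary figures (in thousands): adding a single extreme value moves the mean a long way but barely changes the median.

```python
import statistics

salaries = [30, 32, 35, 38, 40]     # hypothetical salaries, in thousands
with_outlier = salaries + [500]     # the same data plus one extreme value

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("without outlier: mean =", mean_before, "median =", median_before)
print("with outlier:    mean =", mean_after, "median =", median_after)
```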
Ques: What is hypothesis testing?
Hypothesis testing is a statistical method used to determine whether there is enough evidence in
sample data to draw conclusions about a population. It involves testing an assumption, known
as the null hypothesis, against an alternative hypothesis. Hypothesis testing is crucial in
research and decision-making processes. It provides a systematic framework to evaluate the
validity of assumptions and draw meaningful conclusions from data.
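
As one possible illustration, the sketch below carries out a one-sample t-test by hand. The sample values, the hypothesized mean `mu0` and the 5% significance level are all assumptions made for this example; the critical value 2.262 is the two-sided 5% value for 9 degrees of freedom from a standard t table.

```python
import math
import statistics

# Hypothetical sample. Null hypothesis: population mean = 50.
# Alternative hypothesis: population mean != 50.
sample = [52, 55, 49, 53, 51, 56, 54, 50, 53, 57]
mu0 = 50

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)   # sample standard deviation (divides by n - 1)

# Test statistic: how many standard errors the sample mean lies from mu0.
t_stat = (sample_mean - mu0) / (sample_sd / math.sqrt(n))

# Two-sided critical value for n - 1 = 9 degrees of freedom at the 5% level.
t_critical = 2.262
reject_null = abs(t_stat) > t_critical

print("t statistic:", round(t_stat, 3))
print("reject the null hypothesis:", reject_null)
```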
Ques: Explain the different types of data.
There are two types of data:-
Qualitative Data
Quantitative Data
1) Qualitative Data: Qualitative data is descriptive and non-numerical. It focusses on the
qualities, characteristics and attributes of data. Qualitative data is valuable for exploring
complex phenomena and understanding context. Examples: gender male/female, smoker/non-
smoker, questionnaire response (agree, disagree, neutral), hair color, religion, political party
affiliation and profession.
2) Quantitative Data: Quantitative data is numerical and measurable, expressed in terms of
quantities or amounts. It involves objective observations and precise measurements that can
be analyzed statistically. Examples: revenue in dollars, age in months or years, distance in miles or
kilometres, time in days or weeks, test scores, weight in kilograms or pounds, height in feet
or inches, length in centimetres, population size, income, sales figures, fuel consumption
and website page load speed.
Identify whether the following is qualitative data or quantitative data:
The natural hair colour of 20 randomly selected fashion models: hair colour cannot be
measured or expressed as a number; instead, it is descriptive and categorical. Hence this is
qualitative data.
The ages of 20 randomly selected fashion models: quantitative data because age is a
numerical value.
The fuel economy in miles per gallon of 20 new cars purchased last month: quantitative
data because fuel economy in miles per gallon is a numerical value.

Ques: Can qualitative data generate quantitative data?


Yes, qualitative data can be transformed or quantified into quantitative data through various
methods such as coding, categorization, and numerical assignment.
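
A minimal sketch of two of these methods, using made-up questionnaire responses. The numeric codes (1, 2, 3) are an assumed coding scheme chosen for this example, not a standard one.

```python
from collections import Counter

# Hypothetical qualitative data: questionnaire responses.
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

# Categorization: count how many times each category occurs.
counts = Counter(responses)

# Numerical assignment: map each category to a numeric code (assumed scheme).
codes = {"disagree": 1, "neutral": 2, "agree": 3}
coded = [codes[r] for r in responses]

print(counts)
print(coded)
```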

Ques: What are the different types of sampling?
The different types of sampling are:
Random Sampling
Biased Sampling
Stratified Sampling
1) Random Sampling: Random sampling involves selecting a subset of individuals from a
larger population in such a way that each member of the population has an equal chance of
being chosen. This method tends to produce a sample that is representative of the entire population
and reduces the risk of bias.
2) Biased Sampling: Biased sampling occurs when certain members of the population are
more likely to be selected than others, leading to a skewed representation of the population.
This can result from flaws in the sampling method or deliberate manipulation to favour a
particular outcome.
3) Stratified Sampling: Stratified sampling involves dividing the population into subgroups
or strata based on specific characteristics, such as age or income level. Samples are then
randomly selected from each subgroup, ensuring that each subgroup is adequately
represented in the final sample. This method allows for a more precise analysis of each
subgroup and can improve the overall accuracy of the sample.
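
Random and stratified sampling can be sketched with Python's `random` module. The population of 100 people, the two age-group strata and the sample size of 10 are all hypothetical; proportional allocation (each stratum sampled in proportion to its size) is one common choice, assumed here.

```python
import random

random.seed(0)  # fixed seed so that the sketch is reproducible

# Hypothetical population: 100 people, 60 tagged "under_30" and 40 tagged "30_plus".
population = [(i, "under_30" if i < 60 else "30_plus") for i in range(100)]

# Random sampling: every member has the same chance of being chosen.
simple_sample = random.sample(population, 10)

# Stratified sampling: divide the population into strata, then sample randomly
# within each stratum in proportion to its share of the population.
strata = {}
for person in population:
    strata.setdefault(person[1], []).append(person)

stratified_sample = []
for name, members in strata.items():
    k = round(10 * len(members) / len(population))  # 6 and 4 in this example
    stratified_sample.extend(random.sample(members, k))

print(simple_sample)
print(stratified_sample)
```

In the simple random sample the split between the two age groups varies from draw to draw, whereas the stratified sample always contains 6 people under 30 and 4 people aged 30 or more.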
Ques: What are the advantages of performing stratified sampling?
1) By dividing the population into homogeneous subgroups or strata and then sampling from
each stratum, the variability within each group is reduced, leading to more accurate estimates.
2) Stratified sampling ensures that each subgroup or stratum is adequately represented in the
sample, allowing for a more comprehensive understanding of the population.
3) When the population is diverse, stratified sampling can be more cost-effective, because a smaller total sample can achieve the same precision as a larger simple random sample.
4) Stratified sampling reduces the risk of sample bias by ensuring that each subgroup or
characteristic of interest is appropriately represented in the sample.
Ques: What is sample bias?
Sample bias refers to the systematic error introduced in a study or survey when the
characteristics of the sample differ from those of the population which the sample intends to
represent. This discrepancy can lead to inaccurate conclusions because the sample does not
adequately reflect the true population. Sample bias can occur due to various factors, such as
non-random sampling methods, voluntary response bias, or undercoverage. It can distort the
results and lead to misleading interpretations if not appropriately addressed.
Ques: What is voluntary response bias?
Voluntary response bias occurs when individuals self-select to participate in a survey or
study, leading to a non-random sample. Those who choose to respond may have stronger
opinions or experiences related to the topic, while others may opt out. This can skew the
results, as the sample may not represent the broader population accurately. Voluntary
response bias can lead to overrepresentation of certain viewpoints or demographics, while
underrepresenting others, resulting in biased conclusions.
Ques: What is population and sample?
A population is the entire group that we want to draw conclusions about. A sample is the
specific group from which we collect data. Samples are used because they are more
manageable, cost-effective, and less time-consuming to study than entire populations.
The size of the sample is always less than the total size of the population.

Examples of population:

Advertisements for IT jobs in a country.

All the songs sung in a song contest.

All undergraduate students in a country.

All countries of the world.

The entire student body at a school.

Examples of sample:

The top 50 search results for advertisements for IT jobs in a country.

The winning songs in a song contest.

Examples of distinction between population and sample:

All the students in the school are the population and the students of class 10 are the sample.

Patients in the hospital are the population and the old-age patients are the sample.

All the people who have ID proofs are the population and the group of people who have only a
voter ID is the sample.

All the students in the class are the population whereas the top 10 students in the class are the
sample.

All the members of the parliament are the population and the female members present there are the
sample.
Identify whether each of the following is a population or a sample

The grade point averages of all students in ASET : sample

The grade point averages of all students in Amity : population

The ages of 9 Supreme Court judges on 1st January 1845 : population

The gender of every second customer entering a mall : sample

The number of snow crabs caught on a fishing trip : sample

The snow crabs caught on a single fishing trip represent only a portion of the total snow crab
population in the area.

The GPAs of a randomly selected group of students from a contest: sample

The weight of turtles caught in the Indian Ocean:

The height of students in a particular school:

The number of cars passing through a toll booth in a day:

The salaries of employees in a specific company:

The ages of residents in a particular city:

The scores of students who took a standardized test in a state:

The weights of fish caught in a specific lake:

The number of daily visitors to a popular tourist attraction:

The prices of houses sold in a specific neighbourhood:

The blood pressure readings of patients in a hospital:

The ages of customers at a shopping mall:

Answers:


1. The height of students in a particular school:

- This represents a population if it includes the heights of all students enrolled in that
school. However, if it only includes the heights of a subset of students, such as those in a
specific grade or class, it would be a sample.

2. The number of cars passing through a toll booth in a day:

- This represents a population if it includes the count of all cars passing through the toll
booth in a day. However, if it only includes the count of cars passing through during certain
times or days, it would be a sample.

3. The salaries of employees in a specific company:

- This represents a population if it includes the salaries of all employees working for that
specific company. However, if it only includes the salaries of a subset of employees, such as
those in a particular department or location, it would be a sample.

4. The ages of residents in a particular city:

- This represents a population if it includes the ages of all residents living in that particular
city. However, if it only includes the ages of a subset of residents, such as those living in a
specific neighbourhood, it would be a sample.

5. The scores of students who took a standardized test in a state:

- This represents a population if it includes the scores of all students who took the
standardized test in that state. However, if it only includes the scores of a subset of students,
such as those from a specific school or district, it would be a sample.

6. The weights of fish caught in a specific lake:

- This represents a sample as it likely does not include the weights of all fish in the lake
but rather a subset of fish caught during a specific time or by certain methods.

7. The number of daily visitors to a popular tourist attraction:

- This represents a population if it includes the count of all daily visitors to the tourist
attraction. However, if it only includes the count of visitors on certain days or during specific
times, it would be a sample.

8. The prices of houses sold in a specific neighbourhood:

- This represents a sample as it likely does not include the prices of all houses sold in the
neighbourhood but rather a subset of houses sold during a specific period.

9. The blood pressure readings of patients in a hospital:

- This represents a sample as it likely does not include the blood pressure readings of all
patients in the hospital but rather a subset of patients measured during a specific time or for
specific reasons.

10. The ages of customers at a shopping mall:

- This represents a sample if it only includes the ages of customers present during a
specific time or day at the shopping mall. However, if it includes the ages of all customers
who have visited the mall over a certain period, it could be considered a population.

Identify whether each of the following is a population or a sample

The heights of 20 randomly selected basketball players in a tournament:

The incomes of 50 employees working in a specific company:

The ages of 100 individuals living in a particular city:

The weights of 30 fish caught in a specific river:

The test scores of 200 students who took a standardized exam in a school district:

The number of daily visitors to a museum over a month:

The prices of 50 houses sold in a particular neighbourhood:

The ages of 15 participants in a research study:

The blood pressure readings of 25 patients in a hospital:

The heights of 50 trees in a forest reserve:

Answers:

1. The heights of 20 randomly selected basketball players in a tournament:

- This represents a sample as it includes the heights of only a subset of basketball players
in the tournament, not all players.

2. The incomes of 50 employees working in a specific company:

- This represents a sample as it includes the incomes of only a subset of employees in the
company, not all employees.

3. The ages of 100 individuals living in a particular city:

- This represents a sample, as 100 individuals are only a subset of all the people living in
the city.

4. The weights of 30 fish caught in a specific river:

- This represents a sample as it includes the weights of only a subset of fish caught in the
river, not all fish.

5. The test scores of 200 students who took a standardized exam in a school district:

- This represents a sample as it includes the test scores of only a subset of students who
took the exam, not all students in the school district.

6. The number of daily visitors to a museum over the course of a month:

- This represents a population if it includes the count of all daily visitors to the museum
over the month. However, if it only includes the count of visitors on certain days or during
specific times, it would be a sample.

7. The prices of 50 houses sold in a particular neighbourhood:

- This represents a sample as it includes the prices of only a subset of houses sold in the
neighbourhood, not all houses.

8. The ages of 15 participants in a research study:

- This represents a sample, as the 15 participants are only a subset of the wider population
that the study aims to draw conclusions about.

9. The blood pressure readings of 25 patients in a hospital:

- This represents a sample as it includes the blood pressure readings of only a subset of
patients in the hospital, not all patients.

10. The heights of 50 trees in a forest reserve:

- This represents a sample as it includes the heights of only a subset of trees in the forest
reserve, not all trees.

To Do: Introductory Statistics book, Exercise 1.1

2. Measurement Scales
Ques: What is a scale?

Scales of measurement refer to ways in which variables/numbers are defined and categorized.
The four scales of measurement are nominal, ordinal, interval, and ratio.

Ques: What is a true zero?

In statistics, a "true zero" refers to a point on the measurement scale where the absence of the
attribute being measured is indicated by the value zero. Unlike other scales where zero might
be arbitrary or simply a reference point, a true zero represents an absolute absence of the
variable being measured.

Ques: Describe the different types of scales.

1) Nominal Scale: This scale categorizes data into distinct categories or groups and does not
involve any quantitative value or order. Because nominal data consists of categories without any
inherent numerical value or order, it lacks the properties required for calculating the mean or
median; the mode is the only measure of central tendency that the nominal scale allows.
Examples: Gender (male/female), Eye Colour (black/brown/green), Marital
status (Married/Unmarried), Political Party Affiliation, place of abode (city/town/village),
Nationality, Blood Type, Religion, Language Spoken, Car Brands, Animal Species, colours of
crayons in a 24-crayon box.

2) Ordinal Scale: This scale depicts the order of the values of a variable but not the size of the
difference between them. These scales generally depict non-mathematical ideas such
as frequency, satisfaction, happiness, grades, degree of pain, etc. The ordinal scale allows the
measurement of the mode and median. Although ordinal data has a rank order, the
intervals between values may not be consistent, so the mean is not meaningful.
However, the middle value can still be identified, making it possible to calculate
the median.
Examples:
Happiness scale (e.g. Very Sad, Sad, Neutral, Happy, Very Happy)
Educational attainment (e.g., high school diploma, bachelor's degree, master's degree)
Socioeconomic status (e.g., lower class, middle class, upper class)
Likert scale responses (e.g., strongly disagree, disagree, neutral, agree, strongly agree)
Job hierarchy levels (e.g., entry-level, supervisor, manager, director)
Survey response options indicating frequency (e.g., never, rarely, sometimes, often, always)
Performance ratings (e.g., poor, fair, satisfactory, good, excellent)
Satisfaction levels (e.g., very unsatisfied, unsatisfied, neutral, satisfied, very satisfied)
Health condition severity (e.g., mild, moderate, severe)
Economic development classifications (e.g., developing, emerging, developed)
Sports rankings (e.g., first place, second place, third place)
Letter Grades (e.g. A, B, C, D, F)
Political orientation (e.g. left of centre, centre, right of centre)

3) Interval Scale: In this scale, both the order of the values and the difference between
them are known. Interval means the distance between two entities. The interval scale
has all the properties of the ordinal scale and also allows the difference
between values to be calculated. The main characteristic of this scale is the equal distance between
adjacent points on the scale. The only drawback of this scale is that there is no pre-decided starting point or true
zero value, so ratios of values are not meaningful. This scale allows the measurement of mean, median and mode.

The characteristics of the interval scale are:-


Equal or consistent intervals, Arbitrary Zero point, Absence of true zero, Order and
Magnitude
Examples:

Temperature measured in the Celsius/Fahrenheit scale belongs to the interval scale because
both Celsius and Fahrenheit scales have consistent intervals between each unit of
measurement. For example, the difference between 10°C and 20°C is the same as the
difference between 20°C and 30°C, indicating equal intervals. The zero points on these scales
are arbitrary and do not represent an absence of temperature. In the Celsius scale, 0°C
corresponds to the freezing point of water, while in the Fahrenheit scale, 0°F was originally
based on the temperature of a freezing brine mixture. Neither zero point indicates a total absence of
thermal energy. Both Celsius and Fahrenheit scales maintain a clear order of values, with higher
temperatures represented by higher numerical values and vice versa.

Grading is often treated as an interval scale when equally spaced numerical values are assigned to the
grades (e.g., grade points A = 4, B = 3, C = 2, D = 1). Under that assumption, the difference
between an "A" grade and a "B" grade is the same as the difference between a "C" grade and
a "D" grade, making the intervals equal. Without such an assignment, letter grades are only ordinal.
Interval scales do not have a true zero point. In grading systems, a score of zero or an "F" grade
is a defined starting point on the scale rather than an absolute absence of achievement.
Grading also maintains an order.

Satisfaction survey scores, when numerical values are assigned to the responses

Baking temperature

Credit scores

Dates on calendar (e.g., years, months, days)

Time of day (e.g., clock time in hours, minutes, seconds)

IQ scores from intelligence tests

Standardized test scores (e.g., SAT scores, GRE scores)

pH level measurements (e.g., acidity of a solution)

Note: stock prices, music pitch frequencies (in Hz) and wind speed (in miles per hour or kilometres per hour) have a true zero, so they belong to the ratio scale rather than the interval scale.

4) Ratio Scale: This scale not only gives the order of the variables and the difference
between the variables but also gives information on the value of true zero. A ratio scale can
do everything that a nominal, ordinal and interval scale can do. Because of the existence of a
true zero value, the ratio scale doesn’t have negative values. This scale allows the
measurement of mean, median and mode. Examples:

What is your daughter's current height?
- Less than 5 feet
- 5 feet 1 inch – 5 feet 5 inches
- 5 feet 6 inches – 6 feet
- More than 6 feet

What is your weight in kilograms?
- Less than 50 kilograms
- 51 – 70 kilograms
- 71 – 90 kilograms
- 91 – 110 kilograms
- More than 110 kilograms

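Age (measured in years)

Elapsed time between two dates (e.g., the 5 years between 1940 and 1945); the calendar dates themselves are interval, because year zero is arbitrary

Distance (measured in meters, kilometres, miles, etc.)

Duration (measured in seconds, minutes, hours, etc.)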

Volume (measured in litres, gallons, etc.)

Energy consumption (measured in kilowatt-hours)

Salary or income (measured in currency units)

Blood pressure (measured in millimetres of mercury - mmHg)

Test scores with a true zero point (e.g., percentage scores out of 100)

Why do age, weight and height belong to the ratio scale?

The characteristics of the ratio scale are :-

True Zero Point, Equal Intervals, Magnitude and Order

Ratio scales have a true zero point, meaning that a value of zero indicates the absence of the
measured attribute. In the case of age, weight, and height, a value of zero implies the absence
of age (birth), weight (no mass), and height (no length), respectively. The intervals between
consecutive values on the scale are equal and have the same meaning throughout the scale.
For example, the difference between 20 and 30 years of age is the same as the difference
between 50 and 60 years, indicating equal intervals. The values on a ratio scale have
magnitude, allowing for meaningful comparisons in terms of more or less. For instance,
someone who weighs 80 kilograms is twice as heavy as someone who weighs 40 kilograms.
The values on a ratio scale are ordered from smallest to largest, allowing for comparisons in
terms of greater than, less than, or equal to.
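
The practical difference between the interval and ratio scales can be seen in a short calculation. This is only an illustrative sketch: it takes the Celsius values and the weights used in the examples of this section and converts Celsius to Kelvin (a temperature scale with a true zero) to show why "twice as hot" is not a valid statement on the Celsius scale.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (arbitrary zero) to Kelvin (true zero)."""
    return c + 273.15

# Interval scale: differences are meaningful and equal.
diff_low = 20 - 10
diff_high = 30 - 20

# Interval scale: ratios are NOT meaningful. 20 °C looks like "twice" 10 °C,
# but on a scale with a true zero the ratio is only about 1.035.
naive_ratio = 20 / 10
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Ratio scale: ratios are meaningful. 80 kg is twice as heavy as 40 kg.
weight_ratio = 80 / 40

print(diff_low == diff_high)
print(naive_ratio, round(kelvin_ratio, 3))
print(weight_ratio)
```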

3. Measures of Central Tendency

Refer: Chapter 4 (Tulsian)


Ques: What is the measure of central tendency?
Ques: What are the objectives of the average?
Ques: What are the requisites/characteristics of average?
Ques: Draw a diagram representing the different measures of central tendency.
Mean
Ques: What is arithmetic mean?
Note: Individual series is also known as ungrouped data; no frequency is
associated with individual series. Discrete and continuous series are called grouped data;
frequency is associated with both of them.
Note: In individual series, N = number of observations.
In discrete series and continuous series, N equals the sum of the frequencies, which is Σf.
Ques: What is individual series?
In individual series, each observation is listed separately without grouping or categorization.
It consists of a single column of observations. No frequency is associated with individual
series because here each observation is unique. Example: Marks of all students in a class
along with the roll no of each student.
Ques: List the steps for computing the arithmetic mean of individual series by direct
method and assumed mean method (shortcut method).
Page No: 4.2, 4.3
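
The textbook steps are not reproduced in these notes, so the following is only a sketch based on the standard formulas: direct method, mean = ΣX / N; assumed mean method, mean = A + Σd / N, where d = X - A. The marks and the assumed mean A = 55 are made-up values.

```python
marks = [40, 50, 55, 78, 58]   # hypothetical individual series
n = len(marks)                 # N = number of observations

# Direct method: sum of the observations divided by N.
direct_mean = sum(marks) / n

# Assumed mean (shortcut) method: take deviations from an assumed mean A.
assumed_mean = 55
deviations = [x - assumed_mean for x in marks]
shortcut_mean = assumed_mean + sum(deviations) / n

print(direct_mean, shortcut_mean)
```

Both methods give the same answer whichever value is chosen for A; the shortcut only makes the arithmetic smaller.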
Ques: What is discrete series?
In a discrete series, data is represented in the form of classes along with their corresponding
frequencies. It contains two columns: one for the classes, and the other for their frequencies.
Each class is different and separate from the others. The total number of observations in a
discrete series, N, equals the sum of class frequencies, which is Σf. Example: A list of marks
of students in a class along with the number of students who have attained those marks.
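
For a discrete series, the standard direct formula is mean = Σfx / Σf. A minimal sketch with made-up marks and frequencies:

```python
marks = [10, 20, 30, 40]      # hypothetical values (x)
frequencies = [2, 3, 4, 1]    # number of students with each mark (f)

N = sum(frequencies)                                     # N = Σf
sum_fx = sum(f * x for f, x in zip(frequencies, marks))  # Σfx
mean = sum_fx / N

print(N, sum_fx, mean)
```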

Ques: List the steps for computing the arithmetic mean of discrete series by direct
method and assumed mean method (shortcut method).
Page No: 4.4, 4.5

Ques: What is continuous series?


In continuous series, the data is grouped into intervals. It includes two columns: one for class
intervals and the other for their frequencies. Class intervals are continuous and contiguous
(sharing a common border), with no gaps between them. The total number of observations in
a continuous series, N, equals the sum of class frequencies, which is Σf. Example: Height
distribution of students grouped into intervals like 150-160 cm, 160-170 cm, etc.
Ques: Midpoints for each class-interval are calculated under what assumption?
Midpoints for each class-interval are calculated under the assumption that the corresponding
frequency is evenly distributed throughout that class interval. It means that equal number of
observations are present below the midpoint and above the midpoint.
Ques: List the steps for computing the arithmetic mean for each of the following:-
1) Exclusive Continuous series (direct method) Page No: 4.6
2) Exclusive Continuous series (assumed mean method) Page No: 4.7
Note: Step deviation method is used to calculate the arithmetic mean of continuous series.
Ques: When is the step deviation method useful?
The step deviation method becomes particularly useful when we have to calculate the
arithmetic mean of continuous series by the assumed mean method but the midpoints of
various classes are large. This method simplifies calculations. In this method, the deviations
from the assumed mean are divided by a common factor ‘c’. If all class intervals are equal
then ‘c’ is the class interval.
3) Continuous series (step deviation method) Page No: 4.8
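A minimal Python sketch of the step deviation method, assuming all class intervals are equal so that the common factor c is the class width. The function name and the sample data are made up for illustration and are not from the textbook.

```python
def step_deviation_mean(intervals, freqs, assumed_mean=None):
    """Arithmetic mean of a continuous series by the step deviation method."""
    mids = [(lo + hi) / 2 for lo, hi in intervals]    # midpoint m of each class
    c = intervals[0][1] - intervals[0][0]             # common factor = class width
    if assumed_mean is None:
        assumed_mean = mids[len(mids) // 2]           # any midpoint can serve as A
    n = sum(freqs)                                    # N = sum of f
    steps = [(m - assumed_mean) / c for m in mids]    # d' = (m - A) / c
    sum_fd = sum(f * d for f, d in zip(freqs, steps)) # sum of f * d'
    return assumed_mean + (sum_fd / n) * c            # mean = A + (sum fd' / N) * c

# Hypothetical data: marks grouped into equal classes of width 10
intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 10, 25, 30, 30]
mean = step_deviation_mean(intervals, freqs)
print(mean)  # approximately 32
```

The direct method (sum of f × m divided by N) gives the same answer; the step deviations only keep the intermediate numbers small.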
4) Less than Continuous series
Step 1: Convert the less than series into normal lower-upper class series. Enter them in the
first column labelled class intervals.
Step 2: Construct the frequency column as follows:-
a) The first frequency will be as it is.
b) Every other frequency = current frequency – previous frequency
Step 3: Follow the same steps as done in case of exclusive continuous series.
5) More than Continuous series

Step 1: Convert the more than series into normal lower-upper class series. Enter them in the
first column labelled class intervals.
Step 2: Construct the frequency column as follows:-
a) Every frequency = current frequency – next frequency
Step 3: Follow the same steps as done in case of exclusive continuous series.
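The frequency conversions for less than and more than series can be sketched in Python as below. The function names and numbers are hypothetical; the only assumption added to the steps above is that in a more than series the last frequency stays as it is.

```python
def less_than_to_frequencies(cumulative):
    """First frequency stays as it is; every other = current - previous."""
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]

def more_than_to_frequencies(cumulative):
    """Every frequency = current - next; the last frequency stays as it is."""
    return [
        cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)
    ] + [cumulative[-1]]

# "less than 10, 20, 30, 40" -> classes 0-10, 10-20, 20-30, 30-40
less_than = [4, 10, 30, 40]
# "more than 0, 10, 20, 30" -> the same four classes
more_than = [40, 36, 30, 10]

print(less_than_to_frequencies(less_than))  # [4, 6, 20, 10]
print(more_than_to_frequencies(more_than))  # [4, 6, 20, 10]
```

Both series describe the same distribution, so both conversions give the same class frequencies.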

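Ques: What is inclusive continuous series? (It is the reverse of what the name suggests: the classes do not share a limit.)
An inclusive continuous series is one in which the upper limit of one class interval is not the
lower limit of the next class interval, for example 10-19, 20-29. Both limits of a class are
included in that class, so the class intervals do not overlap.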
6) Inclusive Continuous series Page No: 4.10
Step 1: Calculate the difference between the lower limit of the second class interval and the
upper limit of the first class interval. Divide the difference by 2.
Step 2: Subtract the halved difference from the lower limit of class intervals and add the
halved difference to the upper limit of class intervals.
Step 3: Repeat the same steps as done for exclusive continuous series.
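Steps 1 and 2 can be sketched in Python as follows. This is only an illustration with made-up class intervals; the function name is hypothetical.

```python
def inclusive_to_exclusive(intervals):
    """Widen every class by half the gap between consecutive classes."""
    # Step 1: lower limit of the second class - upper limit of the first class, halved
    half = (intervals[1][0] - intervals[0][1]) / 2
    # Step 2: subtract from every lower limit, add to every upper limit
    return [(lo - half, hi + half) for lo, hi in intervals]

inclusive = [(10, 19), (20, 29), (30, 39)]
print(inclusive_to_exclusive(inclusive))
# [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5)]
```

After the adjustment the upper limit of each class equals the lower limit of the next, so the exclusive series steps apply unchanged.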
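Ques: What is exclusive continuous series? (It is the reverse of what the name suggests: the classes share a limit.)
An exclusive continuous series is one in which the upper limit of one class interval is the
lower limit of the next class interval, for example 10-20, 20-30. The class limits overlap, and
an observation equal to the upper limit of a class is excluded from that class and counted in
the next one.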
7) Exclusive Continuous series when class intervals are unequal Page No: 4.11
Step 1: Make the class intervals equal and adjust the frequencies under the assumption that
the frequencies are equally distributed throughout the class interval.
Step 2: Divide the given frequency of a class interval by the number of class intervals which
have been constructed out of a particular class interval.
Ques: List the mathematical properties of arithmetic mean.
Ques: Why is mean characterized as a point of balance?
Ques: List the merits of arithmetic mean.
Ques: What is an open-ended distribution?
Open-ended distribution is a frequency distribution where one or more classes lack definite
boundaries. These classes extend indefinitely in one or both directions without specific upper
or lower limits. Example: Consider a dataset of ages where the class intervals are "0-10
years," "11-20 years," and "21 years and above." The last class ("21 years and above") is
open-ended as it does not specify an upper limit.
Ques: What is a U-shaped distribution?
It is a statistical distribution which, when plotted on a graph, has a shape that resembles the
letter ‘U’. This distribution has low frequencies in the middle and higher frequencies at both
extremes. Example: age-related mortality rates (mortality rate on the Y axis, age on the X axis).
The mortality rates are higher among infants and the elderly due to vulnerability to diseases.
On the other hand, mortality rates tend to be lower for middle-aged individuals, who are
generally healthier and less susceptible to life-threatening illnesses.
Ques: List the limitations of arithmetic mean.
1) Extreme values, also known as outliers, can unduly influence the arithmetic mean, leading
to a distorted representation of the data.
2) In skewed distributions, where the data is not symmetrical, the arithmetic mean may not
accurately represent the central tendency due to its sensitivity to outliers.
3) Arithmetic mean cannot be computed for qualitative data.
4) When data distribution is highly irregular, the arithmetic mean may not be able to
accurately reflect the underlying trends or patterns.
5) For an open-ended distribution the mean cannot be computed accurately, and for a
U-shaped distribution the mean does not represent the data well.
Ques: List the steps involved in the computation of combined average of two or more
related groups. Page No. 4.13
Ques: When does the mean become incorrect?
The mean becomes incorrect when, while calculating it:
a) wrong values of certain items are taken,
b) values of certain items are not taken, or
c) values of certain extra items are taken.
Ques: List the steps involved in calculating the correct mean. Page No. 4.14
Ques: What is weighted arithmetic mean?
The weighted arithmetic mean is a type of mean calculated by giving different weights to
individual data points. Unlike the regular arithmetic mean where each data point contributes
equally, in the weighted arithmetic mean, certain data points have more influence on the
arithmetic mean based on their assigned weights. It is computed by multiplying each data
point by its corresponding weight, summing these products, and dividing by the sum of the
weights.
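A short Python sketch of the computation described above. The marks and weights are hypothetical values chosen only to illustrate the formula.

```python
def weighted_mean(values, weights):
    """Sum of (value * weight) divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(x * w for x, w in zip(values, weights)) / total_weight

marks = [80, 60, 90]    # hypothetical marks in three subjects
weights = [3, 2, 1]     # hypothetical weights (e.g. credits) of those subjects
result = weighted_mean(marks, weights)
print(result)  # 75.0
```

With equal weights the same function reduces to the regular arithmetic mean.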

Read the underlined lines on Page No. 4.15

Median

Ques: What is the median?
Note: While calculating the median, begin the sequence of the items from 1.
Ques: List the steps involved in the calculation of median in case of individual series.
Case 1: When the number of items of the series (N) is even.
Step 1: Arrange the items in ascending or descending order of size. Reindex the items as 1,
2…
Step 2: Find the (N+1)/2 th item.
Step 3: Calculate Median as follows:-
(a) If (N+1)/2 is a whole number then

Median = size of (N+1)/2 th item

(b) If (N+1)/2 is a decimal (for even N it always ends in .5) then

Median = size of the item at the integer part of (N+1)/2 + 50 % of the difference
between the size of the immediate next item and the size of the item at the integer
part of (N+1)/2

Case 2: When the number of items of the series (N) is odd.


Step 1: Arrange the items in ascending or descending order of size. Reindex the items as 1,
2…
Step 2: Find the (N+1)/2 th item.
Step 3: Calculate Median as follows:-
Median = size of (N+1)/2 th item
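Both cases can be handled by one small Python function. This is a sketch with a hypothetical function name and made-up data, using the (N+1)/2 rule from the steps above.

```python
def median_individual(items):
    """Median of an individual series using the (N+1)/2 th item rule."""
    ordered = sorted(items)              # Step 1: arrange in ascending order
    n = len(ordered)
    position = (n + 1) / 2               # Step 2: 1-based position of the median
    whole = int(position)                # integer part of the position
    lower = ordered[whole - 1]           # size of the item at the integer part
    if position == whole:                # N is odd: the position is a whole number
        return lower
    upper = ordered[whole]               # N is even: immediate next item
    return lower + 0.5 * (upper - lower)

print(median_individual([7, 3, 9, 5, 11]))      # 7
print(median_individual([7, 3, 9, 5, 11, 13]))  # 8.0
```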
Ques: List the steps involved in the calculation of median in case of discrete series.
Step 1: Arrange the items in ascending or descending order of size. Accordingly, rearrange the
corresponding frequencies.
Step 2: Calculate the cumulative frequency by adding each frequency cumulatively to the
sum of its predecessors and enter them in a column headed cf.
Step 3: Find the (N+1)/2 th item where N, equals the sum of class frequencies, Σf.
Step 4: Calculate Median as follows:-
Median = size of (N+1)/2 th item
The size of (N+1)/2 th item can be found out through the following steps:-
a) Firstly locate the cumulative frequency which is equal to or just higher than (N+1)/2.
b) Now find the size of item corresponding to this cumulative frequency. This is the size
of (N+1)/2 th item.
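A Python sketch of these steps for a discrete series. The function name and the data are hypothetical, and the sketch assumes the cumulative frequency to pick is the first one that is equal to or higher than (N+1)/2.

```python
from itertools import accumulate

def median_discrete(sizes, freqs):
    """Median of a discrete series via cumulative frequencies."""
    pairs = sorted(zip(sizes, freqs))             # Step 1: sort sizes, keep frequencies paired
    cum = list(accumulate(f for _, f in pairs))   # Step 2: cumulative frequency column (cf)
    n = cum[-1]                                   # N = sum of f
    position = (n + 1) / 2                        # Step 3: (N+1)/2 th item
    for (size, _), cf in zip(pairs, cum):
        if cf >= position:                        # first cf equal to or just higher
            return size                           # Step 4: size at that cf

sizes = [10, 20, 30, 40, 50]   # hypothetical marks
freqs = [2, 5, 8, 4, 1]        # hypothetical number of students
print(median_discrete(sizes, freqs))  # 30
```

Here N = 20, so the 10.5 th item is needed; the cumulative frequencies are 2, 7, 15, 19, 20, and 15 is the first one that reaches 10.5.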

Ques: List the steps involved in the calculation of median in case of exclusive continuous
series.
Step 1: Given the class intervals and the frequencies, calculate the cumulative frequency by
adding each frequency cumulatively to the sum of its predecessors and enter them in a
column headed cf.
Step 2: Find out the N/2 th item where N, equals the sum of class frequencies, Σf.
Step 3: Find the median class, i.e. the class interval whose cumulative frequency is equal to
or just higher than N/2.
Step 4: For the median class, find out the lower limit L, the class frequency f and the length
of the interval i. In addition, find out the cumulative frequency cf of the preceding class
interval.
Step 5: Calculate the Median as given on Page No. 4.20
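The notes point to the textbook for the formula. As a sketch, the standard interpolation formula Median = L + ((N/2 - cf) / f) × i is assumed here, with L, f and i taken from the median class and cf from the class before it. The function name and data are hypothetical.

```python
def median_continuous(intervals, freqs):
    """Median of an exclusive continuous series by interpolation."""
    n = sum(freqs)                       # N = sum of f
    half = n / 2                         # Step 2: N/2 th item
    cf_before = 0                        # cumulative frequency of the preceding class
    for (lower, upper), f in zip(intervals, freqs):
        if cf_before + f >= half:        # Step 3: this is the median class
            width = upper - lower        # i = length of the median class
            # Step 5: Median = L + ((N/2 - cf) / f) * i
            return lower + (half - cf_before) / f * width
        cf_before += f

intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 10, 20, 10, 5]
print(median_continuous(intervals, freqs))  # 25.0
```

Here N/2 = 25 falls in the class 20-30 (cumulative frequency 35), so L = 20, cf = 15, f = 20 and i = 10.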
Ques: List the steps involved in the calculation of median in case of less than continuous
series.
Step 1: Convert the less than series into normal lower-upper class series. Enter them in the
first column labelled class intervals.
Step 2: Construct the frequency column as follows:-
a) The first frequency will be as it is.
b) Every other frequency = current frequency – previous frequency
Step 3: Follow the same steps as done in case of exclusive continuous series.
Ques: List the steps involved in the calculation of median in case of more than
continuous series.
Step 1: Convert the more than series into normal lower-upper class series. Enter them in the
first column labelled class intervals.
Step 2: Construct the frequency column as follows:-
a) Every frequency = current frequency – next frequency
Step 3: Follow the same steps as done in case of exclusive continuous series.
Ques: List the steps involved in the calculation of median in case of continuous inclusive
series.
Step 1: Calculate the difference between the lower limit of the second class interval and the
upper limit of the first class interval. Divide the difference by 2.
Step 2: Subtract the halved difference from the lower limit of class intervals and add the
halved difference to the upper limit of class intervals.
Step 3: Repeat the same steps as done for exclusive continuous series.
Ques: List the steps involved in the calculation of median in case of exclusive continuous
series when class intervals are unequal.

Step 1: Make the class intervals equal and adjust the frequencies under the assumption that
the frequencies are equally distributed throughout the class interval.
Step 2: Divide the given frequency of a class interval by the number of class intervals which
have been constructed out of a particular class interval.
Ques: List the steps involved in the calculation of median in case of open ended series.
Same steps as for calculating the median for continuous exclusive series.

Ques: List the mathematical property of median.


Ques: List the merits of median.
Ques: List the limitations of median.
Ques: What are partition values?
Ques: What is a variate?
Variate also known as variable, is a term used in statistics and mathematics to denote a
quantity which can take different numerical values.
Ques: What are quartiles?
The values of a variate which divide a given series into 4 equal parts are called as quartiles.
Since three points are required to divide the given series into 4 equal parts, we
have 3 quartiles, Q1, Q2 and Q3.
Q1 is called as the first quartile or the lower quartile. It is that value of the variate below
which 25% of the observations lie and above which 75% of the observations lie when the
given series is arranged in ascending or descending order.
Q2 is called as the second quartile or middle quartile or median. It is that value of the variate
which divides the given series into two equal parts. 50% of the observations lie below the
median and 50% of the observations lie above the median when the given series is arranged
in ascending or descending order.
Q3 is called as the third quartile or the upper quartile. It is that value of the variate below
which 75% of the observations lie and above which 25% of the observations lie when the
given series is arranged in ascending or descending order.
Ques: Compare the three quartiles.
Q1 ≤ Q2 ≤ Q3
Ques: Write the expression for computing quartiles in case of individual, discrete and
continuous series.
Ques: Is it compulsory that the values of the quartiles should be a part of the dataset?
No, it is not necessary.
Ques: What are percentiles?

The values of a variate which divide a given series into 100 equal parts are called as
percentiles. Since 99 points are required to divide the given series into 100 equal parts we
have 99 percentiles starting from P1 to P99.
The percentile Pj is that value of the variate up to which j% of the total observations lie when
the given series is arranged in ascending or descending order.
For Example:
P10 is that value of the variate up to which 10% of the total observations lie.
P25 is that value of the variate up to which 25% of the total observations lie.
P50 is that value of the variate up to which 50% of the total observations lie.
P75 is that value of the variate up to which 75% of the total observations lie.
In each case the given series is arranged in ascending or descending order.
Ques: Write the expression for computing percentiles in case of individual, discrete and
continuous series.
Ques: What is the relationship between partition values?
Q1 = P25, is that value of the variate below which exactly 25% of the observations lie.
Q2 = P50, is that value of the variate below which exactly 50% of the observations lie.
Q3 = P75, is that value of the variate below which exactly 75% of the observations lie.
Note: When the series is arranged in either ascending or descending order.
Note: Quartiles are calculated in the same way as the median.
Do illustration 28, 29 and 30 for more clarity.
Note: For calculating both quartiles and percentiles, the data needs to be arranged in
ascending or descending order. The calculation for both quartiles and percentiles is same as
seen in case of median with slight modifications which can be seen in illustration 28, 29 and
30.
Ques: What is the percentile of some value, say x, in a sorted individual series?
Count the number of items less than or equal to x (this includes x itself). Let y be this
count and let N be the total number of items in the series. Then find what percentage y is
of N:
Percentile of x = (y / N) × 100
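The counting rule above can be sketched in Python as follows. The function name and the marks are hypothetical.

```python
def percentile_of(x, items):
    """Percentage of items in the series that are less than or equal to x."""
    y = sum(1 for item in items if item <= x)   # count of items <= x (includes x)
    n = len(items)                              # N = total number of items
    return y * 100 / n                          # y as a percentage of N

marks = [35, 40, 45, 50, 55, 60, 65, 70, 75, 80]
print(percentile_of(70, marks))  # 80.0
```

Here 8 of the 10 items are less than or equal to 70, so 70 is at the 80th percentile.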

Mode

Ques: What is mode?
In statistics, mode refers to that value which occurs most frequently in a given dataset. It is one
of the measures of central tendency.
See Page No. 4.33
A dataset can have no mode at all if all the values occur with equal frequency, i.e. repeat
the same number of times.
Do the question on mode done in class.

Ques: What is the relationship in case of positively skewed and negatively skewed
distributions?
Page No. 4.46 and 4.47
Note: In class, we have done mode numerical on continuous series.
Do illustrations 36-40.
Ques: List the merits of mode.
Ques: List the demerits of mode.
Ques: What is the usefulness of mode?
Ques: According to Karl Pearson, what is the relationship between mean, median and
mode?
Mode = 3Median – 2Mean

Measures of Dispersion

Ques: What is inter quartile range?


Inter quartile range is the difference between upper quartile (Q3) and the lower quartile (Q1).
IQR = Q3 – Q1
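A Python sketch of the IQR for an individual series. The notes do not state the quartile expressions, so this assumes the common textbook convention Q1 = size of (N+1)/4 th item and Q3 = size of 3(N+1)/4 th item, with interpolation for fractional positions as in the median. The function names and data are hypothetical, and the sketch expects at least 3 items.

```python
def item_at(ordered, position):
    """Size of the item at a 1-based, possibly fractional, position."""
    whole = int(position)
    fraction = position - whole
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    # move the given fraction of the way towards the next item
    return lower + fraction * (ordered[whole] - lower)

def quartiles(items):
    """Q1 and Q3 of an individual series (assumed (N+1)/4 convention)."""
    ordered = sorted(items)
    n = len(ordered)
    q1 = item_at(ordered, (n + 1) / 4)
    q3 = item_at(ordered, 3 * (n + 1) / 4)
    return q1, q3

data = [12, 15, 18, 20, 22, 25, 30]
q1, q3 = quartiles(data)
iqr = q3 - q1            # IQR = Q3 - Q1
qd = iqr / 2             # quartile deviation = semi inter quartile range
print(q1, q3, iqr)       # 15 25 10
```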
Ques: List the steps involved in the calculation of IQR in case of individual series.
Ques: List the steps involved in the calculation of IQR in case of discrete series.
Ques: List the steps involved in the calculation of IQR in case of continuous exclusive
series.
Ques: What is quartile deviation or semi inter quartile range?
Ques: List the merits of quartile deviation.
Ques: List the limitations of quartile deviation.
Ques: List the steps involved in the calculation of quartile deviation in case of individual
series.
Ques: List the steps involved in the calculation of quartile deviation in case of discrete
series.
Ques: List the steps involved in the calculation of quartile deviation in case of
continuous series.
Ques: What is mean deviation?
Ques: List the merits of mean deviation.
Ques: List the demerits of mean deviation.
Ques: What is standard deviation? What is the property of standard deviation?
Ques: List the expressions for calculation of standard deviation in case of individual,
discrete and continuous series.

Ques: Write the expression for coefficient of variation or standard deviation.
Ques: List the properties of standard deviation.
Ques: List the merits of standard deviation.
Ques: List the demerits of standard deviation.
Ques: What is variance?
Ques: What can we interpret from small value of variance and what can we interpret
from large value of variance?
Ques: Write the expression for combined standard deviation. Page No. 5.33
Ques: What is a box plot?
A box plot, also called a box and whiskers plot, is a graphical representation of the
distribution of a dataset. It shows the 5 point summary, i.e. minimum, first quartile,
median, third quartile and maximum. The box represents the IQR, which spans from the
25th percentile to the 75th percentile. The median is shown by a line inside the box. The
whiskers are drawn from the box to the minimum and maximum values. In the common
variant that flags outliers, each whisker extends at most 1.5 × IQR beyond the box, and
observations farther out are plotted individually as outliers.
Ques: What is a z score?
Z score is a statistical measure which indicates how many standard deviations a
particular observation lies above or below the mean of a dataset.
Ques: What is the use of z score, standard deviation and variance?
Z scores standardize the data. They indicate the number of standard deviations a particular
observation lies above or below the mean of a dataset. Standard deviation and variance
indicate the spread of observations around the mean.
Z score is calculated for each observation. If the value of z score is negative x, then it means
that particular observation lies x standard deviations below the mean.
The percentile rank and z-score of a measurement indicate its relative position with regard to
the other measurements in a data set.
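A small Python sketch of computing the z score of each observation, z = (x - mean) / standard deviation. It assumes the population standard deviation (pstdev from the standard library); the data is made up for illustration.

```python
from statistics import mean, pstdev

def z_scores(items):
    """z = (x - mean) / standard deviation, for every observation x."""
    mu = mean(items)
    sigma = pstdev(items)   # population standard deviation (an assumption here)
    return [(x - mu) / sigma for x in items]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, standard deviation 2
print(z_scores(data))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The first observation has z = -1.5, so it lies 1.5 standard deviations below the mean; the last has z = 2.0, so it lies 2 standard deviations above the mean.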

Ques: What are the empirical rules? [68-95-99.7]


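For a normal distribution, about 68% of the observations lie within 1 standard deviation of
the mean, about 95% within 2 and about 99.7% within 3. Empirical rules are applicable only
to normally distributed datasets. For all other cases we use Chebyshev’s theorem.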
Ques: What is Chebyshev’s theorem?
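Chebyshev's theorem states that, for any distribution, at least 1 - 1/k² of the observations lie within k standard deviations of the mean, for k greater than 1. The Python sketch below compares this lower bound with the empirical rule figures for a normal distribution; the function and constant names are hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

# Empirical rule proportions, valid only for normally distributed data
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    print(f"k={k}: Chebyshev at least {bound:.3f}, empirical rule about {EMPIRICAL_RULE[k]}")
```

For k = 2 the bound is 75% and for k = 3 it is about 88.9%, both weaker than the empirical rule because Chebyshev's theorem makes no assumption about the shape of the distribution.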
