
Allama Iqbal Open University Islamabad: Name Semester User ID Assignment No. 1 Program BS

The document discusses statistics, defining it as the branch of mathematics focused on data collection, analysis, interpretation, and presentation. It highlights the importance of statistics across various fields such as business, medicine, education, and more, emphasizing its role in decision-making and trend analysis. Additionally, it outlines steps for constructing a frequency distribution and measures of central tendency, underscoring their significance in data analysis.

Uploaded by

jiwaxe4301
Copyright © All Rights Reserved

ALLAMA IQBAL OPEN UNIVERSITY ISLAMABAD

Name

Semester

User ID

Assignment No. 1

Program BS
Course Code 4485

Q. 1

(a) What do you understand by the term statistics? Give its chief characteristics.

(b) Give a brief account of the importance of statistics in different fields.

(a) What is Statistics?

Statistics is the branch of mathematics that involves the collection, analysis, interpretation,
presentation, and organization of data. It provides tools and techniques to make sense of
numerical data, enabling individuals and organizations to make informed decisions. The scope of
statistics spans various disciplines and applications, ranging from basic data summarization to
advanced predictive modeling.
Chief Characteristics of Statistics

1. Quantitative Nature: Statistics deals primarily with numerical data. It is concerned with
measurements and observations that can be quantified.
2. Aggregate of Facts: Statistics works with a set of data, not isolated individual values.
For example, the average income of a population is derived from data on multiple
individuals.
3. Systematic Collection: Statistical data must be collected systematically and with
precision to ensure accuracy and reliability.
4. Variability: Statistics studies the variation in data, recognizing that individual observations differ from one another and that repeated samples rarely give identical results.
5. Purpose-Driven: The collection and analysis of statistical data are conducted with
specific objectives, such as identifying trends or solving a problem.
6. Interdisciplinary Application: Statistics can be applied across diverse fields, including
economics, medicine, education, and engineering.
7. Inference: A key feature of statistics is drawing conclusions or making predictions based
on data analysis. This includes hypothesis testing and estimation.

(b) Importance of Statistics in Different Fields

Statistics plays a vital role across various sectors and disciplines. Its applications are broad and
significant, as detailed below:

1. Business and Economics

Statistics is essential for decision-making in business and economics. It helps:

• Market Research: Companies use statistical surveys to understand consumer behavior and preferences.
• Forecasting: Businesses predict sales, revenue, and market trends using statistical tools.
• Quality Control: Techniques like Six Sigma employ statistical analysis to maintain product quality.
• Economic Planning: Governments rely on statistics for GDP calculation, inflation tracking, and resource allocation.

2. Medicine and Healthcare

In the medical field, statistics is indispensable for:

• Clinical Trials: Determining the efficacy and safety of new drugs.
• Epidemiology: Tracking and analyzing disease patterns and outbreaks.
• Healthcare Planning: Allocating resources, such as hospital beds and vaccines, based on statistical analysis.
• Diagnostic Tools: Statistical methods support the development of accurate diagnostic tests.

3. Education

Statistics is crucial in educational research and planning:

• Performance Analysis: Schools and universities analyze student performance using statistical metrics.
• Curriculum Development: Research studies using statistics help design effective educational programs.
• Enrollment Trends: Forecasting future student populations aids in resource allocation.

4. Social Sciences

Sociologists, psychologists, and political scientists depend on statistics to:

• Survey Public Opinion: Polls and surveys provide insights into societal views and preferences.
• Analyze Behavior: Psychological studies often use statistical methods to interpret human behavior.
• Policy Making: Governments and organizations design policies based on statistical analysis of social data.

5. Engineering and Technology

Statistics is integral to innovation and manufacturing processes:

• Product Design: Engineers use statistical modeling to optimize designs.
• Reliability Testing: Ensuring product durability and performance under varied conditions.
• Data Science: Statistical algorithms are foundational in machine learning and artificial intelligence.

6. Environmental Science

Statistics supports environmental monitoring and conservation efforts:

• Climate Analysis: Understanding temperature trends and weather patterns.
• Biodiversity Studies: Assessing species population trends and ecosystem health.
• Pollution Control: Measuring and analyzing pollution levels to develop mitigation strategies.

7. Agriculture

In agriculture, statistics enhances productivity and efficiency:

• Crop Yield Analysis: Estimating potential output based on soil and weather conditions.
• Resource Optimization: Efficient use of fertilizers, water, and other inputs.
• Pest Control: Statistical models predict pest outbreaks and their impact on crops.

8. Sports and Entertainment

Statistics enriches both performance analysis and audience engagement:

• Player Performance: Evaluating athletes using statistical metrics.
• Game Strategies: Coaches use data analytics to devise winning tactics.
• Audience Insights: Understanding viewership patterns to tailor content delivery.

9. Law and Criminal Justice

Statistics aids in maintaining justice and public safety:

• Crime Rate Analysis: Identifying trends and hotspots to allocate resources.
• Jury Selection: Ensuring unbiased selection through demographic analysis.
• Forensic Science: Supporting evidence interpretation and validation.

10. Transport and Urban Planning

Statistics is vital for creating efficient infrastructure:

• Traffic Analysis: Managing congestion through data-driven solutions.
• Public Transit Planning: Designing optimal routes and schedules.
• Urban Development: Statistical studies guide the planning of housing, utilities, and amenities.

11. Finance and Investment

The financial sector relies heavily on statistics:

• Risk Assessment: Analyzing potential losses in investments.
• Market Analysis: Identifying profitable opportunities using data trends.
• Portfolio Management: Optimizing investment strategies based on statistical models.

12. Astronomy and Space Science

In space exploration and research, statistics helps:

• Star Mapping: Analyzing data from telescopes to chart celestial bodies.
• Mission Planning: Risk assessment and trajectory optimization.
• Cosmological Studies: Understanding the universe's origins and evolution.

Conclusion

The importance of statistics cannot be overstated. It serves as a powerful tool for understanding
the world and making informed decisions across a multitude of fields. From enhancing business
strategies to advancing scientific discoveries, the applications of statistics are both diverse and
impactful. Mastery of statistical concepts and techniques empowers individuals and
organizations to navigate complexities, solve problems, and seize opportunities effectively.

Q. 2

(a) Describe the steps you would take to construct a frequency distribution.

Steps to Construct a Frequency Distribution

A frequency distribution is a tabular representation of data that shows the frequency or number
of occurrences of each data point or a group of data points within specified intervals. It helps in
organizing raw data into a structured format, making it easier to identify patterns and interpret
the dataset. Here are the detailed steps to construct a frequency distribution:

1. Understand the Data and Objectives

Before constructing a frequency distribution, it is essential to:

• Understand the nature of the data (e.g., continuous or discrete).
• Define the purpose of the analysis (e.g., identifying trends or summarizing large datasets).

2. Collect and Organize Raw Data

Gather all the raw data and ensure it is accurate and complete. Arrange the data in ascending
order to identify its range and variations easily.

3. Determine the Range of the Data

The range is the difference between the highest and lowest values in the dataset:

• Formula: Range = Highest value − Lowest value
• Example: If the dataset is 12, 15, 20, 25, 30, the range is 30 − 12 = 18.

4. Choose the Number of Classes

The number of classes (or intervals) affects the readability of the frequency distribution. Too few
classes may oversimplify the data, while too many classes may overcomplicate it. Use Sturges’
Rule as a guideline:

• Formula: Number of classes = 1 + 3.322 × log10(N)
• Here, N is the total number of data points.

5. Determine the Class Width

Class width is the size of each class interval. It can be calculated using the formula:

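• Formula: Class width = Range / Number of classes
• Round the result up to the next whole number for simplicity, then check that the highest value still falls inside the last class; if it does not, add one more class.

Example: If the range is 18 and the number of classes is 6, the class width is 18 / 6 = 3. Starting at 12, six classes of width 3 (12–14 through 27–29) stop short of the highest value, 30, so a seventh class, 30–32, is needed.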
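The two rules of thumb above can be written as small Python helpers. This is a sketch: `sturges_classes` and `class_width` are hypothetical names, and rounding Sturges' result to the nearest whole number is an assumption (some texts round up instead).

```python
import math


def sturges_classes(n: int) -> int:
    """Suggested number of classes by Sturges' rule: 1 + 3.322 * log10(N).

    Rounding to the nearest whole number is an assumption; some texts round up.
    """
    return round(1 + 3.322 * math.log10(n))


def class_width(data_range: float, num_classes: int) -> int:
    """Class width = range / number of classes, rounded up to a whole number."""
    return math.ceil(data_range / num_classes)


print(sturges_classes(15))  # 1 + 3.322 * log10(15) is about 4.91, so 5
print(class_width(18, 6))   # 18 / 6 = 3
print(class_width(18, 5))   # 18 / 5 = 3.6, rounded up to 4
```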

6. Define the Class Intervals

Create mutually exclusive and exhaustive class intervals. Ensure:

• No overlap between intervals.
• All data points are included.

Start from the lowest value in the dataset and add the class width successively to define the
intervals.

Example:

• If the lowest value is 12 and the class width is 3, the intervals can be:
o 12–14
o 15–17
o 18–20, and so on.

7. Tally the Data into Classes

Go through the dataset and count the frequency of data points falling into each class interval:

• Use a tally mark system for simplicity during counting.
• Sum up the tallies for each class to determine the frequency.

8. Create a Frequency Table

Organize the results into a frequency table with the following columns:

1. Class intervals
2. Tally marks (optional for visual representation)
3. Frequency (actual count of data points)

Example:

Class Interval Tally Frequency

12–14

15–17

18–20

9. Calculate Additional Measures (Optional)

Enhance the frequency table by adding columns for:

• Relative Frequency: Proportion of each class frequency to the total frequency.
o Formula: Relative frequency = (Class frequency / Total frequency) × 100%
• Cumulative Frequency: Running total of frequencies up to and including a given class.

Example (first three classes of a dataset with 10 observations in total):

Class Interval Frequency Relative Frequency (%) Cumulative Frequency

12–14 4 40 4

15–17 3 30 7

18–20 2 20 9
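The relative and cumulative frequency columns can be computed directly from the class frequencies. A minimal sketch; the function name is hypothetical, and `total` is passed explicitly because the example lists only three classes, while its 40% figure for a frequency of 4 implies 10 observations overall.

```python
from itertools import accumulate


def relative_and_cumulative(freqs, total=None):
    """Return (relative frequency in %, cumulative frequency) for each class.

    `total` defaults to the sum of `freqs`; pass it explicitly when only
    some of the classes are listed.
    """
    if total is None:
        total = sum(freqs)
    relative = [100 * f / total for f in freqs]
    cumulative = list(accumulate(freqs))  # running totals: f1, f1+f2, ...
    return relative, cumulative


rel, cum = relative_and_cumulative([4, 3, 2], total=10)
print(rel)  # [40.0, 30.0, 20.0]
print(cum)  # [4, 7, 9]
```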

10. Visualize the Data

Represent the frequency distribution graphically to facilitate analysis:

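• Histogram: A bar graph where each bar represents a class interval and its height corresponds to the class frequency; adjacent bars touch because the classes are contiguous.
• Frequency Polygon: A line graph joining points plotted at the class midpoints, at heights equal to the class frequencies.
• Cumulative Frequency Curve (Ogive): A line graph of cumulative frequency plotted against the upper class boundaries.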

11. Interpret the Results

Analyze the frequency distribution to derive insights, such as:

• Identifying the class with the highest frequency (the modal class).
• Understanding data distribution trends (e.g., skewness, central tendency).

Example: Constructing a Frequency Distribution

Raw Data:

15, 12, 20, 18, 25, 15, 17, 14, 22, 30, 18, 15, 12, 28, 19
Steps:

1. Organize the Data: Arrange in ascending order:
12, 12, 14, 15, 15, 15, 17, 18, 18, 19, 20, 22, 25, 28, 30.
2. Range: 30 − 12 = 18.
3. Number of Classes: Using Sturges’ Rule (N = 15), classes = 1 + 3.322 × log10(15) ≈ 4.91 ≈ 5.
4. Class Width: 18 / 5 = 3.6, rounded up to 4.
5. Define Class Intervals:
o 12–15
o 16–19
o 20–23
o 24–27
o 28–31
6. Tally and Frequency Table:

Class Interval Tally Frequency

12–15 ||||| | 6

16–19 |||| 4

20–23 || 2

24–27 | 1

28–31 || 2

Total 15

Graphical Representation:

Use this table to create a histogram or frequency polygon to visualize the data distribution.
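Step 6 (tallying) for the worked example can be checked with a short Python sketch. The interval list is copied from step 5; `tally` is a hypothetical helper, and the printed bars stand in for tally marks.

```python
RAW_DATA = [15, 12, 20, 18, 25, 15, 17, 14, 22, 30, 18, 15, 12, 28, 19]

# Inclusive integer class limits from step 5 of the worked example.
INTERVALS = [(12, 15), (16, 19), (20, 23), (24, 27), (28, 31)]


def tally(data, intervals):
    """Count how many values fall within each inclusive (lower, upper) class."""
    return [sum(lower <= x <= upper for x in data) for lower, upper in intervals]


freqs = tally(RAW_DATA, INTERVALS)
for (lower, upper), f in zip(INTERVALS, freqs):
    print(f"{lower}-{upper}: {'|' * f} ({f})")

# Every value should land in exactly one class.
assert sum(freqs) == len(RAW_DATA)
```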

Conclusion

Constructing a frequency distribution involves systematic steps, from organizing raw data to
creating tables and graphs. By doing so, it becomes easier to identify patterns, analyze trends,
and communicate insights effectively. This process is invaluable in statistics, providing a
foundation for further analysis and decision-making.

(b) Tabulate the following marks in a grouped frequency distribution.
74 49 103 95 90 118 52 88 101

96 72 56 64 110 97 59 62 96

82 65 85 105 116 91 83 99 52

76 84 89 11 104 96 84 62 58

66 100 80 54 75 55 99 104 78

66 96 83 57 60 51 114 120 121

92 88 64 63 95 78

Here is the grouped frequency distribution for the given marks:

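Class Interval Frequency

[11, 27) 1
[27, 43) 0
[43, 59) 9
[59, 75) 12
[75, 91) 15
[91, 107) 17
[107, 123) 6
Total 60

The class intervals were chosen with Sturges’ rule: for N = 60 marks, the number of classes is 1 + 3.322 × log10(60) ≈ 6.91 ≈ 7, and the class width is (121 − 11) / 7 ≈ 15.7, rounded up to 16. The classes start at the lowest mark, 11, and each includes its lower limit but not its upper limit; for example, [43, 59) contains marks from 43 up to 58.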
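The table can be reproduced with a short Python sketch. It follows the choices described above (Sturges' rule for the number of classes, width rounded up, half-open classes starting at the lowest mark); the helper name `grouped_frequency` is hypothetical, and the marks are copied from the question.

```python
import math

MARKS = [
    74, 49, 103, 95, 90, 118, 52, 88, 101,
    96, 72, 56, 64, 110, 97, 59, 62, 96,
    82, 65, 85, 105, 116, 91, 83, 99, 52,
    76, 84, 89, 11, 104, 96, 84, 62, 58,
    66, 100, 80, 54, 75, 55, 99, 104, 78,
    66, 96, 83, 57, 60, 51, 114, 120, 121,
    92, 88, 64, 63, 95, 78,
]


def grouped_frequency(data):
    """Half-open classes [lower, lower + width) starting at the minimum value."""
    low, high = min(data), max(data)
    k = round(1 + 3.322 * math.log10(len(data)))   # Sturges' rule
    width = max(1, math.ceil((high - low) / k))    # rounded up to a whole number
    num_classes = (high - low) // width + 1        # enough classes to hold the maximum
    counts = [0] * num_classes
    for x in data:
        counts[(x - low) // width] += 1            # index of the class containing x
    return [(low + i * width, low + (i + 1) * width, c) for i, c in enumerate(counts)]


for lower, upper, count in grouped_frequency(MARKS):
    print(f"[{lower}, {upper}) {count}")
```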

Q. 3 (a) What is a measure of “Central Tendency”? What is the purpose served by it?

(a) What is a Measure of Central Tendency?

A measure of central tendency is a statistical metric used to identify the center point or typical
value of a dataset. It represents a single value that summarizes the entire dataset and provides a
central location around which data points are distributed. Central tendency is essential in
understanding the general characteristics of the data and is foundational for further statistical
analysis.

Types of Measures of Central Tendency

1. Mean (Arithmetic Average):
o The mean is calculated by summing all the data values and dividing by the total number of values.
o Formula: $\text{Mean} = \frac{\text{Sum of all data values}}{\text{Total number of data values}}$
o Example: For data 4, 6, 8, 10, the mean is (4 + 6 + 8 + 10) / 4 = 7.
2. Median:
o The median is the middle value of an ordered dataset. If the dataset has an even number of values, the median is the average of the two middle numbers.
o Example: For data 5, 9, 11, 15, 20, the median is 11. For 5, 9, 11, 15, the median is (9 + 11) / 2 = 10.
3. Mode:
o The mode is the most frequently occurring value in a dataset. A dataset can have one mode (unimodal), two modes (bimodal), or multiple modes (multimodal).
o Example: For data 2, 3, 3, 4, 5, the mode is 3.
4. Other Measures (less common):
o Geometric Mean: Used for datasets involving rates or growth.
o Harmonic Mean: Suitable for datasets involving ratios or rates, such as speeds.
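Python's standard-library `statistics` module implements these measures directly; the sketch below reproduces the examples above. `multimode` needs Python 3.8 or later and returns every value tied for the highest count, so a dataset with several modes gives a longer list.

```python
import statistics

print(statistics.mean([4, 6, 8, 10]))         # (4 + 6 + 8 + 10) / 4 = 7
print(statistics.median([5, 9, 11, 15, 20]))  # middle value: 11
print(statistics.median([5, 9, 11, 15]))      # (9 + 11) / 2 = 10
print(statistics.multimode([2, 3, 3, 4, 5]))  # [3]
```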

Purpose of Measures of Central Tendency

Measures of central tendency serve multiple purposes across various fields. Their primary
objectives include:

1. Summarizing Data:
o Central tendency provides a single representative value for a dataset, reducing
complexity and making the data easier to understand.
2. Comparison Across Groups:
o It allows for comparisons between different datasets or populations. For instance,
comparing the average income of two regions can highlight economic disparities.
3. Facilitating Decision-Making:
o In business, education, and healthcare, decisions are often based on the average
performance, cost, or outcomes derived from central tendency metrics.
4. Understanding Data Distribution:
o Comparing the mean, median, and mode helps indicate whether the data are symmetric or skewed and whether outliers are pulling the mean away from the center.
5. Foundation for Further Analysis:
o Measures of central tendency are the starting point for advanced statistical
techniques like variance, standard deviation, and hypothesis testing.

Characteristics of a Good Measure of Central Tendency

1. Representative:
o It should accurately reflect the central value of the dataset.
2. Simple and Easy to Compute:
o The measure should be straightforward to calculate and interpret.
3. Resistant to Extreme Values:
o A good measure is not unduly affected by outliers. For instance, the median is less
sensitive to extreme values than the mean.
4. Applicability:
o It should be applicable across various datasets and scenarios.

Comparison of the Mean, Median, and Mode

Measure | Advantages | Disadvantages

Mean | Easy to calculate; uses all data points | Sensitive to extreme values; can be misleading for skewed datasets
Median | Resistant to outliers; suitable for skewed data | Depends only on the middle value(s); less suited to further algebraic treatment
Mode | Identifies the most frequent value; useful for categorical data | May not exist or may not be unique; ignores all other values in the dataset

Applications of Central Tendency

1. Business and Economics:
o Analyzing average income, expenditure, or sales.
o Determining central market trends.
2. Education:
o Calculating average test scores to gauge student performance.
o Identifying the typical grade achieved by students.
3. Healthcare:
o Determining average patient recovery times.
o Analyzing central trends in disease occurrence.
4. Social Sciences:
o Studying typical behavior or responses in surveys.
o Analyzing demographic trends like average age or income.
5. Engineering and Technology:
o Identifying typical performance metrics.
o Comparing central tendencies of different systems.

Example: Comparing Measures of Central Tendency

Consider a dataset of monthly incomes in dollars: 2,000, 2,500, 3,000, and 10,000.

1. Mean: (2,000 + 2,500 + 3,000 + 10,000) / 4 = 4,375. This value is pulled upward by the outlier (10,000).
2. Median: The two middle values after ordering are 2,500 and 3,000, so the median is (2,500 + 3,000) / 2 = 2,750.
3. Mode: No mode exists since all values are unique.

In this case, the median represents typical income better than the mean does, because three of the four incomes lie below the mean.

Conclusion

A measure of central tendency is a cornerstone of statistical analysis. It provides a summary of data, simplifies decision-making, and lays the groundwork for more complex statistical methods.
Understanding the strengths and limitations of each measure (mean, median, and mode) is
essential to selecting the most appropriate one for specific datasets and objectives.
(b) The frequency distribution of a group of persons according to age is given below:

Age in years <1 1–4 5–9 10–19 20–29 30–39 40–59 60–79

No. of Persons 5 10 11 12 22 18 8 7

Calculate the Mean and the Median ages of the distribution.

Here are the detailed steps and calculations for determining the Mean and Median ages of the
distribution:

1. Mean Age

Formula:
$$\text{Mean} = \frac{\sum (f \cdot x)}{\sum f}$$

Where:

• f = frequency of each class
• x = midpoint of each class

Steps:

1. Calculate the Midpoints (x) of each age interval:
o For each class, the midpoint is $\frac{\text{lower limit} + \text{upper limit}}{2}$.
o Example: For 1–4, midpoint = (1 + 4) / 2 = 2.5. The open-ended class <1 is taken as 0 to 1, giving a midpoint of 0.5.
2. Multiply Midpoints by Frequencies (f · x):
o For each class, multiply the midpoint by its frequency.
3. Find the Sum of Products (∑(f · x)):
o Add all the f · x values.
4. Find the Total Frequency (∑f):
o Add all frequencies.
5. Calculate the Mean:
o Divide the sum of f · x by the total frequency.

Age Interval Midpoint (x) Frequency (f) f · x

<1 0.5 5 2.5

1–4 2.5 10 25.0

5–9 7.0 11 77.0

10–19 14.5 12 174.0

20–29 24.5 22 539.0

30–39 34.5 18 621.0

40–59 49.5 8 396.0

60–79 69.5 7 486.5

Total 93 2321.0

Calculation:
$$\text{Mean} = \frac{\sum (f \cdot x)}{\sum f} = \frac{2321.0}{93} \approx 24.96 \text{ years}$$

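The table's arithmetic can be checked with a few lines of Python. This is a sketch: `grouped_mean` is a hypothetical helper, and the midpoints are those listed in the table (the open-ended "<1" class is treated as 0 to 1, as the table does).

```python
# Midpoints and frequencies copied from the table above.
MIDPOINTS = [0.5, 2.5, 7.0, 14.5, 24.5, 34.5, 49.5, 69.5]
FREQUENCIES = [5, 10, 11, 12, 22, 18, 8, 7]


def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(f * x) / sum(f)."""
    total_fx = sum(f * x for f, x in zip(freqs, midpoints))
    return total_fx / sum(freqs)


print(round(grouped_mean(MIDPOINTS, FREQUENCIES), 2))  # 2321.0 / 93, about 24.96
```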
2. Median Age

Formula:
$$\text{Median} = L + \left( \frac{\frac{N}{2} - CF}{f_m} \right) \cdot h$$

Where:

• L: Lower boundary of the median class.
• N: Total frequency.
• CF: Cumulative frequency before the median class.
• f_m: Frequency of the median class.
• h: Class width.

Steps:

1. Find the Median Class:
o Locate the first class whose cumulative frequency is ≥ N/2 (i.e., 93 / 2 = 46.5).
o Cumulative frequencies:
• <1: 5
• 1–4: 15
• 5–9: 26
• 10–19: 38
• 20–29: 60 → Median class is 20–29.

2. Extract Median Class Details:
o L = 19.5 (lower boundary of 20–29).
o N = 93 (total frequency).
o CF = 38 (cumulative frequency before the median class).
o f_m = 22 (frequency of the median class).
o h = 10 (class width).

3. Substitute Values into the Formula:
$$\text{Median} = 19.5 + \left( \frac{46.5 - 38}{22} \right) \cdot 10$$

4. Simplify:
$$\text{Median} = 19.5 + \left( \frac{8.5}{22} \right) \cdot 10 = 19.5 + 3.86 \approx 23.36 \text{ years}$$

Results:

• Mean Age: 24.96 years (approximately)
• Median Age: 23.36 years (approximately)
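The median calculation can be sketched in Python as well. Assumptions: `grouped_median` is a hypothetical helper; each class is given as (lower boundary, width, frequency), with boundaries equal to the stated lower limit minus 0.5, as used above for 20–29 (L = 19.5); and the open-ended "<1" class is assumed to start at 0. Only the median class's figures affect the result.

```python
# (lower boundary, class width, frequency) for each age class.
AGE_CLASSES = [
    (0.0, 0.5, 5),    # <1 (assumed to start at age 0)
    (0.5, 4, 10),     # 1-4
    (4.5, 5, 11),     # 5-9
    (9.5, 10, 12),    # 10-19
    (19.5, 10, 22),   # 20-29
    (29.5, 10, 18),   # 30-39
    (39.5, 20, 8),    # 40-59
    (59.5, 20, 7),    # 60-79
]


def grouped_median(classes):
    """Median = L + ((N/2 - CF) / f_m) * h, locating the median class via CF."""
    half = sum(f for _, _, f in classes) / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, width, f in classes:
        if f and cf + f >= half:  # first class reaching N/2 is the median class
            return lower + (half - cf) / f * width
        cf += f
    raise ValueError("classes must contain at least one observation")


print(round(grouped_median(AGE_CLASSES), 2))  # 19.5 + (8.5 / 22) * 10, about 23.36
```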


Q. 4 (a) Define Mean, Median and Mode. Give an empirical relation between them. Does this relation give the correct value for the Mode?


Mean, Median, and Mode: Definitions, Relationship, and Accuracy

Statistics is a branch of mathematics that deals with collecting, analyzing, interpreting, and
presenting data. In statistics, measures of central tendency help summarize large data sets by
identifying a single value that represents the entire distribution. The three most commonly used
measures of central tendency are Mean, Median, and Mode. Each measure has unique
characteristics, applications, and implications, and their relationship is often described through
an empirical formula.

1. Definition of Mean
The Mean, often referred to as the average, is the most commonly used measure of central
tendency. It is calculated by dividing the sum of all data points by the total number of data
points. The formula for the arithmetic mean is:

$$\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}$$

Example:

If a dataset contains the numbers 4, 6, 8, and 10, the mean is calculated as:

$$\text{Mean} = \frac{4 + 6 + 8 + 10}{4} = \frac{28}{4} = 7$$

The mean is highly sensitive to extreme values (outliers). For example, if a large value is added
to the dataset, the mean will shift significantly, making it less representative of the majority of
the data.

2. Definition of Median

The Median is the middle value of a dataset when the data is arranged in ascending or
descending order. If the dataset contains an odd number of values, the median is the central
value. If the dataset contains an even number of values, the median is the average of the two
middle values.

Example:

• For the dataset 4, 6, 8, 10, the median is $\frac{6 + 8}{2} = 7$.
• For the dataset 4, 6, 8, the median is 6.

The median is largely unaffected by extreme values, making it a robust measure of central tendency, especially for skewed distributions.

3. Definition of Mode
The Mode is the value that occurs most frequently in a dataset. A dataset can have one mode
(unimodal), more than one mode (bimodal or multimodal), or no mode if all values occur with
the same frequency.

Example:

• In the dataset 4, 6, 6, 8, 10, the mode is 6, as it appears twice.
• In the dataset 4, 6, 8, 10, there is no mode since all values appear only once.

Mode is particularly useful for categorical data or datasets where the most common value is of
interest.

4. Empirical Relationship Between Mean, Median, and Mode

For a moderately skewed distribution, the following empirical formula describes the relationship
between the three measures of central tendency:

$$\text{Mode} = 3 \times \text{Median} - 2 \times \text{Mean}$$

This relation, usually attributed to Karl Pearson, can be rearranged as Mean − Mode ≈ 3 × (Mean − Median).

Rationale:

The relationship is empirical rather than exact. In a symmetric distribution (e.g., the normal distribution), the mean, median, and mode are equal. In a moderately skewed distribution, however:

• The mean is pulled towards the tail of the distribution by extreme values.
• The mode remains near the peak of the distribution.
• The median lies between the mean and the mode, typically about one-third of the way from the mean towards the mode.

The formula therefore provides an approximate mode based on the mean and median.

Example:

Consider a dataset where the mean is 30 and the median is 25. Using the formula:

$$\text{Mode} = 3 \times 25 - 2 \times 30 = 75 - 60 = 15$$

5. Accuracy of the Empirical Relationship

The empirical formula provides a reasonable approximation for the mode in many cases, especially when the distribution is moderately skewed. However, there are limitations to its accuracy:

When the Formula Works:

• In moderately skewed distributions where the tail is not excessively long or extreme, the formula gives a close estimate of the mode.
• It is particularly useful when the mode cannot be directly calculated or identified, such as in continuous data.

When the Formula Fails:

1. Highly Skewed Distributions:
o For highly skewed distributions (e.g., income data), the mode can deviate significantly from the value predicted by the formula.
2. Bimodal or Multimodal Distributions:
o The formula assumes the presence of a single mode, so it is not applicable to datasets
with multiple peaks.
3. Small or Irregular Datasets:
o In small datasets or those with irregular frequency distributions, the mode may differ
substantially from the formula's prediction.

Example of Failure:

Consider a dataset where the mean is 40, the median is 35, and the actual mode is 50. Using the
formula:

$$\text{Mode} = 3 \times 35 - 2 \times 40 = 105 - 80 = 25$$

The formula predicts a mode of 25, which is far from the actual mode of 50.
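The empirical formula is simple arithmetic; a one-function Python sketch (the name `empirical_mode` is hypothetical) reproduces both worked examples above. It is only a rough guide, for the reasons listed in this section.

```python
def empirical_mode(mean, median):
    """Approximate mode from the empirical relation Mode = 3*Median - 2*Mean.

    Intended only for unimodal, moderately skewed distributions.
    """
    return 3 * median - 2 * mean


print(empirical_mode(mean=30, median=25))  # 75 - 60 = 15
print(empirical_mode(mean=40, median=35))  # 105 - 80 = 25, versus an actual mode of 50
```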

6. Practical Applications of the Three Measures

• Mean:
o Used in scientific research, economics, and business analytics where the overall average is required.
• Median:
o Preferred in real estate, income distribution, and other areas with skewed data where the middle value is more representative.
• Mode:
o Useful in marketing, demographics, and fashion industries where the most frequent category or choice is of interest.

7. Conclusion

The mean, median, and mode are fundamental measures of central tendency, each with unique strengths and weaknesses. The empirical relationship $\text{Mode} = 3 \times \text{Median} - 2 \times \text{Mean}$ offers a useful approximation for the mode in moderately skewed distributions. However, it is not universally reliable, especially in cases of extreme skewness, multimodal distributions, or small datasets. Understanding when and how to apply each measure is crucial for accurate data analysis and interpretation.

(b) Calculate the modal number of persons per house from the following data:
No. of persons per house 1 2 3 4 5 6 7 8 9 10

No. of houses 26 113 120 95 60 42 21 14 5 4

Because the number of persons per house is a discrete variable, the mode can be read directly as the value with the highest frequency: 3 persons per house (120 houses). To refine this estimate, treat each value x as a class running from x − 0.5 to x + 0.5 and apply the Mode formula for grouped data:

$$\text{Mode} = L + \left( \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \right) \times h$$

Where:

• L = lower boundary of the modal class.
• f_m = frequency of the modal class.
• f_1 = frequency of the class preceding the modal class.
• f_2 = frequency of the class following the modal class.
• h = class width.

Step 1: Identify the Modal Class

• The modal class is the class with the highest frequency.
• From the data, the highest frequency is f_m = 120, corresponding to 3 persons per house.

Step 2: Define the Values

For the modal class (3 persons per house):

• L = 2.5 (lower boundary of the class, taking the class width as 1)
• f_m = 120 (frequency of the modal class)
• f_1 = 113 (frequency of the preceding class: 2 persons per house)
• f_2 = 95 (frequency of the following class: 4 persons per house)
• h = 1 (class width)

Step 3: Apply the Mode Formula

$$\text{Mode} = 2.5 + \left( \frac{120 - 113}{(120 - 113) + (120 - 95)} \right) \times 1$$

First, calculate the terms:

• f_m − f_1 = 120 − 113 = 7
• f_m − f_2 = 120 − 95 = 25
• Sum: (f_m − f_1) + (f_m − f_2) = 7 + 25 = 32

Now, substitute into the formula:

$$\text{Mode} = 2.5 + \left( \frac{7}{32} \right) \times 1 = 2.5 + 0.21875 \approx 2.72$$

Result

The most frequent number of persons per house is 3; the interpolated estimate of the mode, treating the data as continuous, is approximately 2.72 persons per house.

Q. 5

(a) Define Harmonic mean. How does it differ from arithmetic mean? What are its advantages and disadvantages?

Harmonic Mean: Definition, Differences, Advantages, and Disadvantages

In statistics, various measures of central tendency are used to summarize and analyze data. One
such measure is the Harmonic Mean, which is particularly useful in certain situations where
other measures like the Arithmetic Mean may not be suitable. Understanding the harmonic mean,
its differences from the arithmetic mean, and its pros and cons is essential for accurate data
interpretation and analysis.

Definition of Harmonic Mean


The Harmonic Mean (HM) is a type of average calculated as the reciprocal of the arithmetic
mean of the reciprocals of a given set of data values. It is especially useful in situations where
rates, ratios, or proportions are involved.

Formula for Harmonic Mean:

For a dataset with n values ($x_1, x_2, x_3, \dots, x_n$):

$$\text{Harmonic Mean (HM)} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Alternatively, it can be expressed as:

$$\text{HM} = \frac{\text{Number of Observations}}{\text{Sum of Reciprocals of Observations}}$$

Example:

Consider three values: 4, 6, and 12. The harmonic mean is:

$$\text{HM} = \frac{3}{\frac{1}{4} + \frac{1}{6} + \frac{1}{12}} = \frac{3}{0.25 + 0.1667 + 0.0833} = \frac{3}{0.5} = 6$$

The harmonic mean emphasizes smaller values in the dataset, making it especially relevant in
datasets where outliers (large values) would distort the results of other means.

Difference Between Harmonic Mean and Arithmetic Mean

The Arithmetic Mean (AM) is the simple average of a dataset, calculated by summing all
values and dividing by the number of values. The Harmonic Mean (HM), on the other hand, is
based on the reciprocals of the values.
Key Differences:

1. Formula:
o Arithmetic Mean: AM=∑xin\text{AM} = \frac{\sum x_i}{n}AM=n∑xi
o Harmonic Mean: HM=n∑1xi\text{HM} = \frac{n}{\sum \frac{1}{x_i}}HM=∑xi1n

2. Emphasis:
o Arithmetic Mean gives equal weight to all values in the dataset.
o Harmonic Mean gives more weight to smaller values, making it more sensitive to low
values.

3. Usage:
o Arithmetic Mean is used for general data and sums.
o Harmonic Mean is used for rates, speeds, or ratios.

4. Relative Size:
o For the same set of positive values, $\text{HM} \leq \text{AM}$, with equality only
when all values are the same.

Example Comparison:

For the dataset 4, 6, and 12:

• Arithmetic Mean: $\text{AM} = \frac{4 + 6 + 12}{3} = \frac{22}{3} \approx 7.33$
• Harmonic Mean: $\text{HM} = \frac{3}{\frac{1}{4} + \frac{1}{6} + \frac{1}{12}} = 6$

The harmonic mean (6) is less than the arithmetic mean (7.33), showing the harmonic mean’s
tendency to be pulled down by smaller values.
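A small sketch, assuming only positive data, that reproduces this comparison with exact fractions; the helper names are illustrative rather than taken from any library.

```python
from fractions import Fraction


def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    return Fraction(sum(values)) / len(values)


def harmonic_mean(values):
    """n divided by the sum of reciprocals (positive values only)."""
    return len(values) / sum(Fraction(1) / Fraction(x) for x in values)


data = [4, 6, 12]
am, hm = arithmetic_mean(data), harmonic_mean(data)
print(am, hm)      # 22/3 6
print(hm <= am)    # True

# When all values are equal, the two means coincide.
same = [5, 5, 5]
print(arithmetic_mean(same) == harmonic_mean(same))  # True
```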

Advantages of Harmonic Mean

1. Suitability for Rates and Ratios: The harmonic mean is ideal for situations where the
data represents rates, such as speeds, densities, or other inversely proportional quantities.
For example, when calculating average speed for a round trip with different speeds in
each direction, the harmonic mean provides the correct result.

Example: If a car travels at 40 km/h for one direction and 60 km/h for the return trip, the
average speed is:

$$\text{HM} = \frac{2}{\frac{1}{40} + \frac{1}{60}} = \frac{2}{0.025 + 0.0167} \approx 48 \text{ km/h}$$

2. Weighting Low Values: The harmonic mean gives greater importance to smaller values,
which can be beneficial when small values play a critical role in the dataset.
3. Less Affected by Large Outliers: Because large values have small reciprocals, the harmonic
mean is far less influenced by unusually large values than the arithmetic mean (although it
remains sensitive to unusually small ones).
4. Precision in Specialized Contexts: It is widely used in finance (e.g., price-to-earnings
ratios), engineering (e.g., electrical resistances), and science (e.g., averaging speeds).
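The round-trip claim can be checked by computing total distance over total time directly. The sketch below assumes a hypothetical leg length of 120 km (the example does not state one); any equal leg length gives the same result, which is why the harmonic mean of the speeds works. Function names are illustrative.

```python
from fractions import Fraction


def average_speed(distance_per_leg, speeds):
    """Total distance divided by total time when every leg has the same length."""
    total_distance = distance_per_leg * len(speeds)
    total_time = sum(Fraction(distance_per_leg) / v for v in speeds)
    return total_distance / total_time


def harmonic_mean(values):
    """n divided by the sum of reciprocals (positive values only)."""
    return len(values) / sum(Fraction(1) / Fraction(x) for x in values)


speeds = [40, 60]                    # km/h out and back, as in the example
print(average_speed(120, speeds))    # 48  (assumed 120 km each way)
print(average_speed(7, speeds))      # 48  (leg length does not matter)
print(harmonic_mean(speeds))         # 48
print(sum(speeds) / len(speeds))     # 50.0, the arithmetic mean
```

The arithmetic mean of the two speeds (50 km/h) overstates the true average because the car spends more time on the slower leg.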

Disadvantages of Harmonic Mean

1. Not Applicable for Zero or Negative Values: The harmonic mean cannot be calculated
for datasets containing zero, as division by zero is undefined. Similarly, it is not well-
suited for datasets with negative values.
2. Complexity in Interpretation: For general datasets, the harmonic mean is less intuitive
and harder to understand compared to the arithmetic mean.
3. Overemphasis on Small Values: While the harmonic mean benefits from giving weight
to smaller values, this property can sometimes distort results, especially if the smaller
values are outliers.
4. Limited Applicability: The harmonic mean is not suitable for additive data (e.g., total
income or total profit), as it is designed for rates or ratios.
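To make the outlier and zero-value points concrete, here is a sketch on made-up data (three values of 10 plus one extreme value); the data and helper names are illustrative only. Under these assumptions, a large outlier moves the arithmetic mean a lot but the harmonic mean only a little, a single small value drags the harmonic mean far down, and a zero is rejected outright.

```python
from fractions import Fraction


def harmonic_mean(values):
    """n / sum(1/x); raises for zero or negative entries."""
    if any(x <= 0 for x in values):
        raise ValueError("values must be positive")
    return len(values) / sum(Fraction(1) / Fraction(x) for x in values)


def arithmetic_mean(values):
    """Exact arithmetic mean using fractions."""
    return sum(Fraction(x) for x in values) / len(values)


base = [10, 10, 10]
large_outlier = base + [1000]
small_outlier = base + [Fraction(1, 10)]   # 0.1, written exactly

for label, data in [("large", large_outlier), ("small", small_outlier)]:
    am = float(arithmetic_mean(data))
    hm = round(float(harmonic_mean(data)), 3)
    print(label, am, hm)
# large 257.5 13.289
# small 7.525 0.388

try:
    harmonic_mean([4, 0, 12])
except ValueError as err:
    print("zero value:", err)   # zero value: values must be positive
```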

Applications of Harmonic Mean

1. Finance:
o Used in calculating average price-to-earnings (P/E) ratios in stock markets.
o Employed in weighted portfolio calculations.

2. Physics:
o Related to the equivalent resistance of resistors in parallel: for $n$ resistors,
$R_{eq} = \frac{1}{\sum 1/R_i} = \frac{\text{HM}}{n}$.

3. Transportation:
o Calculating average speed over equal distances with varying speeds.

4. Economics:
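o Useful for averaging some kinds of rate data; growth rates that compound over time, such as
population growth or inflation rates, are usually averaged with the geometric mean instead.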
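The physics and finance applications can be sketched as follows, using hypothetical resistor values (4, 6 and 12 ohms, reusing the earlier example numbers) and hypothetical P/E ratios. The P/E function assumes the same amount of money (here a hypothetical 1000 per stock) is invested in each stock; under that assumption the portfolio's P/E equals the harmonic mean of the individual ratios. All function names are illustrative.

```python
from fractions import Fraction


def harmonic_mean(values):
    """n divided by the sum of reciprocals (positive values only)."""
    return len(values) / sum(Fraction(1) / Fraction(x) for x in values)


def parallel_resistance(resistances):
    """Equivalent resistance of resistors in parallel: 1 / sum(1/R_i)."""
    return 1 / sum(Fraction(1) / Fraction(r) for r in resistances)


def equal_weight_pe(pe_ratios, amount_per_stock=1000):
    """Portfolio P/E when the same amount is invested in every stock."""
    total_price = amount_per_stock * len(pe_ratios)
    # A holding worth `amount` with ratio P/E earns amount / (P/E) per year.
    total_earnings = sum(Fraction(amount_per_stock) / pe for pe in pe_ratios)
    return total_price / total_earnings


ohms = [4, 6, 12]                          # hypothetical resistor values
print(parallel_resistance(ohms))           # 2
print(harmonic_mean(ohms) / len(ohms))     # 2  (R_eq = HM / n)

pe = [10, 20, 40]                          # hypothetical P/E ratios
print(equal_weight_pe(pe), harmonic_mean(pe))   # 120/7 120/7
```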

Conclusion

The harmonic mean is a specialized measure of central tendency that is particularly suited for
datasets involving rates, ratios, or proportions. It differs from the arithmetic mean by giving
more weight to smaller values, making it an effective tool for specific applications like
calculating average speeds or financial ratios. However, its limitations, such as sensitivity to zero
and its complexity, restrict its use to particular scenarios. Understanding its advantages and
disadvantages enables statisticians and analysts to choose the appropriate measure of central
tendency for their data, ensuring more accurate and meaningful interpretations.

(b) Calculate G.M. and H.M. for the following frequency distribution:

Income (weekly)   35–39   40–44   45–49   50–54   55–59   60–64   65–69
No. of workers       15      13      17      29      11      10       5

To calculate the Geometric Mean (G.M.) and Harmonic Mean (H.M.) for the given frequency
distribution, let's proceed step by step.
Frequency Distribution Table

We first tabulate the data and include the midpoints (x) for each income class.

Income (weekly)   Frequency (f)   Midpoint (x)   f·log x (for G.M.)   f/x (for H.M.)
35–39             15              37             15·log 37            15/37
40–44             13              42             13·log 42            13/42
45–49             17              47             17·log 47            17/47
50–54             29              52             29·log 52            29/52
55–59             11              57             11·log 57            11/57
60–64             10              62             10·log 62            10/62
65–69              5              67              5·log 67             5/67
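The tabulation step can be sketched in Python; the variable names are illustrative. Note that the f·log x column printed here uses unrounded base-10 logarithms, so it may differ from a hand-computed table (which rounds each logarithm to four places) in the fourth decimal.

```python
import math

# Class limits and numbers of workers, as given in the question.
classes = [(35, 39), (40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69)]
frequencies = [15, 13, 17, 29, 11, 10, 5]

# Midpoint of each class: (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

print("Income  f   x   f*log10(x)   f/x")
for (lower, upper), f, x in zip(classes, frequencies, midpoints):
    print(f"{lower}-{upper}  {f:>2}  {x:g}  {f * math.log10(x):9.4f}  {f / x:.4f}")
print("Sum of f:", sum(frequencies))  # Sum of f: 100
```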

Step 1: Calculate the Geometric Mean (G.M.)

The formula for G.M. is:

$$\text{G.M.} = \text{antilog}\left( \frac{\sum (f \cdot \log x)}{\sum f} \right)$$
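As a cross-check on the hand calculation, this formula can be evaluated with Python's `math.log10`, using the class midpoints as representative values (the usual assumption for grouped data). The function name is illustrative; because the logarithms are not rounded to four places, the result is best read to about two decimals.

```python
import math

midpoints = [37, 42, 47, 52, 57, 62, 67]
frequencies = [15, 13, 17, 29, 11, 10, 5]


def grouped_geometric_mean(xs, fs):
    """antilog( sum(f * log10 x) / sum f ), using base-10 logarithms."""
    log_total = sum(f * math.log10(x) for x, f in zip(xs, fs))
    return 10 ** (log_total / sum(fs))


gm = grouped_geometric_mean(midpoints, frequencies)
print(round(gm, 2))  # 49.19
```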

Step 1.1: Calculate f·log x

Find the base-10 logarithms of the midpoints and multiply them by their respective frequencies (f):

log 37 = 1.5682, log 42 = 1.6232, log 47 = 1.6721, log 52 = 1.7160, log 57 = 1.7559,
log 62 = 1.7924, log 67 = 1.8261

Midpoint (x)   Frequency (f)   f·log x
37             15              15 × 1.5682 = 23.5230
42             13              13 × 1.6232 = 21.1016
47             17              17 × 1.6721 = 28.4257
52             29              29 × 1.7160 = 49.7640
57             11              11 × 1.7559 = 19.3149
62             10              10 × 1.7924 = 17.9240
67              5               5 × 1.8261 =  9.1305

Σf = 100, Σ(f·log x) = 169.1837

Step 1.2: Apply the G.M. Formula

$$\text{G.M.} = \text{antilog}\left( \frac{\sum (f \cdot \log x)}{\sum f} \right) = \text{antilog}\left( \frac{169.1837}{100} \right) = \text{antilog}(1.691837) \approx 49.19$$

Step 2: Calculate the Harmonic Mean (H.M.)

The formula for H.M. is:

$$\text{H.M.} = \frac{\sum f}{\sum \left( \frac{f}{x} \right)}$$

Step 2.1: Calculate f/x

Find f/x for each midpoint:

Midpoint (x)   Frequency (f)   f/x
37             15              15/37 = 0.4054
42             13              13/42 = 0.3095
47             17              17/47 = 0.3617
52             29              29/52 = 0.5577
57             11              11/57 = 0.1930
62             10              10/62 = 0.1613
67              5               5/67 = 0.0746

Σ(f/x) = 2.0632

Step 2.2: Apply the H.M. Formula

$$\text{H.M.} = \frac{\sum f}{\sum \left( \frac{f}{x} \right)} = \frac{100}{2.0632} \approx 48.47$$

Final Results

• Geometric Mean (G.M.) ≈ 49.19
• Harmonic Mean (H.M.) ≈ 48.47
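A sketch for the grouped H.M., again using midpoints as representative values and exact fractions for the sum of f/x. As an addition for comparison, it also computes the grouped arithmetic mean (Σfx/Σf) to show that the harmonic mean comes out below it, as stated in the comparison of the two means. Function names are illustrative.

```python
from fractions import Fraction

midpoints = [37, 42, 47, 52, 57, 62, 67]
frequencies = [15, 13, 17, 29, 11, 10, 5]


def grouped_harmonic_mean(xs, fs):
    """H.M. = sum(f) / sum(f / x), kept exact with fractions."""
    return Fraction(sum(fs)) / sum(Fraction(f, x) for x, f in zip(xs, fs))


def grouped_arithmetic_mean(xs, fs):
    """A.M. = sum(f * x) / sum(f), for comparison."""
    return Fraction(sum(f * x for x, f in zip(xs, fs)), sum(fs))


hm = grouped_harmonic_mean(midpoints, frequencies)
am = grouped_arithmetic_mean(midpoints, frequencies)
print(round(float(hm), 2), float(am))  # 48.47 49.9
print(hm <= am)                        # True
```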
