Name Rabia Basri ID 18PKR10306 Program B.S (Library Info - Sciences) Semester SPRING 2024
ASSIGNMENT No. 1
establish relationships between variables, thereby enhancing our understanding of the world
around us.
(b) Describe the methods which can be used in the collection of statistical data, stating
the advantages and disadvantages of each method.
1. Surveys and Questionnaires: Surveys and questionnaires are used to collect data from
a large group of people by asking them a set of predefined questions.
Advantages:
o Cost-effective: Relatively inexpensive, especially for large populations.
o Standardized: Ensures uniformity in questions, making data analysis easier.
o Wide Reach: Can be distributed widely through mail, online, or in-person.
Disadvantages:
o Response Bias: Responses may be influenced by how questions are framed or
respondents' desire to present themselves in a certain way.
o Low Response Rates: Particularly for mail or online surveys.
o Limited Depth: Typically, surveys collect quantitative rather than qualitative
data.
2. Interviews: Interviews involve direct, face-to-face or virtual interaction between the
interviewer and the respondent.
Advantages:
o Depth of Information: Allows for detailed and nuanced responses.
o Clarification: Interviewers can clarify questions and probe deeper based on
responses.
o Higher Response Rates: People are more likely to respond in a personal
setting.
Disadvantages:
o Time-consuming: Conducting and transcribing interviews can be very time-
intensive.
o Expensive: Requires more resources compared to surveys.
o Interviewer Bias: Responses can be influenced by the interviewer's behavior
or phrasing.
3. Observations: Observational methods involve collecting data by watching subjects in
their natural environment without interference.
Advantages:
o Natural Behavior: Captures genuine behavior without manipulation.
Course: Introduction to Statistics (4485)
Semester: Spring, 2024
Advantages:
o Rich Qualitative Data: Provides deep insights into participants' attitudes and
perceptions.
o Interactive: Participants can build on each other's ideas, leading to more
comprehensive data.
Disadvantages:
o Group Dynamics: Dominant participants can skew the discussion, and
groupthink can occur.
o Limited Generalizability: Findings are based on a small, non-random
sample.
o Logistics: Organizing and moderating focus groups can be challenging and
costly.
Each method of data collection has its unique advantages and disadvantages. The choice of
method depends on the research objectives, resources available, and the nature of the data
required. Combining multiple methods, known as triangulation, can often provide a more
comprehensive understanding of the research question by balancing the strengths and
weaknesses of individual methods.
Q. 2
(a) Explain what is meant by classification. What are its basic principles?
Classification in statistics refers to the process of organizing data into different categories or
classes based on their characteristics or attributes. This helps in simplifying, summarizing,
and analyzing the data more effectively. By grouping similar data items together, patterns and
relationships can be more easily identified.
Basic Principles of Classification
1. Clarity and Simplicity:
o Principle: The classification should be clear and straightforward.
o Example: When classifying survey responses about customer satisfaction, use
simple categories like "Very Satisfied," "Satisfied," "Neutral," "Dissatisfied,"
and "Very Dissatisfied."
2. Mutual Exclusiveness:
o Principle: Each data item should fit into one and only one category to avoid
overlap.
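The mutual-exclusiveness principle can be illustrated with a minimal Python sketch. The scores, class limits, and the helper name classify are hypothetical and are not part of the assignment; they only show how non-overlapping classes place each item in exactly one category.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical data and class limits, chosen only for illustration.
scores = [12, 25, 30, 47, 50, 68, 75, 75, 89, 100]
lower_limits = [0, 25, 50, 75]                   # lower limit of each class
labels = ["0-24", "25-49", "50-74", "75-100"]    # classes do not overlap

def classify(value):
    """Return the single class label whose range contains the value."""
    index = bisect_right(lower_limits, value) - 1
    return labels[index]

# Build the frequency table: every score is counted in exactly one class.
frequency_table = Counter(classify(s) for s in scores)
for label in labels:
    print(label, frequency_table[label])
```

Because each score maps to one label only, the class frequencies add up to the number of observations, which is a quick check that the classes are mutually exclusive and exhaustive.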
o Label the X-axis (horizontal) and Y-axis (vertical) with appropriate titles and units
of measurement.
o Example: X-axis: "Months," Y-axis: "Sales Revenue (in $1000s)"
3. Choose the Right Type of Graph:
o Select the appropriate type of graph for the data being presented (e.g., bar graph,
line graph, pie chart, histogram).
o Example: Use a line graph for showing trends over time.
4. Consistent Scale:
o Use a consistent scale on the axes to avoid misleading representations of data.
o Example: Ensure equal intervals on the Y-axis for a bar graph to accurately
compare the heights of bars.
5. Include a Legend:
o If the graph includes multiple data sets or categories, include a legend to
differentiate between them.
o Example: Different lines on a line graph representing different products should be
identified in the legend.
6. Data Points and Lines:
o Clearly mark data points and use lines or bars that are easily distinguishable.
o Example: Use different colors or patterns for different data sets in a multi-line
graph.
7. Avoid Clutter:
o Keep the graph simple and avoid excessive grid lines, text, or other elements that
can clutter the graph.
o Example: Use only essential grid lines to enhance readability.
8. Accurate Representation:
o Ensure the graph accurately represents the data without distortion.
o Example: Avoid using a truncated Y-axis that can exaggerate differences between
data points.
9. Source of Data:
o Include the source of the data if it is not original to provide context and credibility.
o Example: "Source: Company Sales Records"
10. Annotations:
o Use annotations or labels for important data points or trends to provide additional
context.
The arithmetic mean is a fundamental statistical measure that provides a useful summary of
central tendency. While it has several advantages, including simplicity and comprehensive
inclusion of data points, it also has limitations such as sensitivity to outliers and
inapplicability to certain types of data. Various methods, including simple, weighted, and
grouped data calculations, allow for flexibility in its application depending on the dataset's
nature.
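The simple and weighted calculations, and the sensitivity to outliers, can be sketched in Python as follows. All numbers here are hypothetical illustrations (the weights are assumed to be credit hours) and do not come from the assignment.

```python
from statistics import mean

# Hypothetical values, used only to illustrate the calculations.
values = [10, 20, 30, 40, 50]
simple_mean = mean(values)            # sum of values / number of values

# Weighted mean: sum(w * x) / sum(w)
marks = [80, 70, 90]
weights = [3, 2, 1]                   # assumed weights, e.g. credit hours
weighted_mean = sum(w * x for w, x in zip(weights, marks)) / sum(weights)

# Sensitivity to outliers: one extreme value pulls the mean upward.
mean_with_outlier = mean(values + [500])

print(simple_mean, round(weighted_mean, 2), round(mean_with_outlier, 2))
```

Adding the single value 500 moves the mean well above every one of the original five observations, which is the limitation the summary refers to.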
(b) The weights of 40 male students at a university are given in the following
frequency table:
Weight 118-126 127-135 136-144 145-153 154-162 163-171 172-180
Frequency 3 5 9 12 5 4 2
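The task for this table is not included in the text. Assuming it asks for the arithmetic mean of grouped data (the topic of part (a)), a minimal Python sketch is given below. It uses class mid-points as representative values and takes the sixth class as 163-171 so that the classes do not overlap; both choices are assumptions.

```python
# Class limits and frequencies from the table.
# Assumption: the sixth class is 163-171, so that classes do not overlap.
classes = [(118, 126), (127, 135), (136, 144), (145, 153),
           (154, 162), (163, 171), (172, 180)]
frequencies = [3, 5, 9, 12, 5, 4, 2]

# Mid-point of each class represents all observations in that class.
midpoints = [(low + high) / 2 for low, high in classes]

total = sum(frequencies)                                   # number of students
grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / total

print("N =", total)
print("Grouped mean =", grouped_mean)
```

The result is an approximation of the true mean, since individual weights inside each class are replaced by the class mid-point.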
Q. 4
(a) Define the mode of a frequency distribution. How does it compare with other types
of averages?
The mode of a frequency distribution is the value that appears most frequently in the dataset.
In other words, it is the value or class interval with the highest frequency. The mode is
particularly useful in identifying the most common value or the peak in a distribution.
Comparison with Other Types of Averages
1. Mode vs. Arithmetic Mean:
o Calculation:
Mode: The value with the highest frequency.
Mean: Sum of all values divided by the number of values.
o Sensitivity to Outliers:
Mode: Not affected by extreme values (outliers).
Mean: Can be significantly affected by outliers, which can distort the
representation of central tendency.
o Usage:
Mode: Useful for categorical data and identifying the most common
category.
Mean: Best for continuous data and provides a mathematical basis for
further statistical analysis.
o Example:
Mode: In a dataset of shoe sizes: {6, 6, 7, 8, 8, 8, 9}, the mode is 8.
Mean: In a dataset of test scores: {70, 75, 80, 85, 90}, the mean is
$\frac{70 + 75 + 80 + 85 + 90}{5} = 80$.
2. Mode vs. Median:
o Calculation:
Mode: The most frequently occurring value.
Median: The middle value when the data is ordered.
o Sensitivity to Outliers:
Mode: Not affected by outliers.
Median: Less affected by outliers compared to the mean, making it a
better measure of central tendency for skewed distributions.
o Usage:
Mode: Useful for identifying the most common value in a dataset,
especially in categorical or discrete data.
Median: Best for ordinal data or skewed distributions where the
central value is more representative than the mean.
o Example:
Mode: In a dataset of favorite ice cream flavors: {Vanilla, Chocolate,
Chocolate, Strawberry}, the mode is Chocolate.
Median: In a dataset of house prices: {$100,000, $150,000, $200,000,
$250,000, $1,000,000}, the median is $200,000.
3. Advantages of Mode:
o Simple to Understand: The mode is easy to identify and interpret.
o Useful for Categorical Data: It is the only measure of central tendency that
can be used with nominal data.
o Not Affected by Outliers: The mode is robust to extreme values.
4. Disadvantages of Mode:
o May Not Be Unique: A dataset can have more than one mode (bimodal or
multimodal).
o Less Informative for Continuous Data: It may not provide a meaningful
measure of central tendency for continuous data.
o Less Stable: The mode can change with a small change in the data, especially
in small datasets.
The mode is a useful measure of central tendency, particularly for categorical data and
distributions with a clear peak. It gives a simple, intuitive measure of the most common
value, but compared with the arithmetic mean and median it summarizes continuous data
less completely and can shift with small changes in the dataset. Each measure of central
tendency (mean, median, and mode) has its own strengths and appropriate applications, and
choosing the right one depends on the nature of the data and the specific analysis
requirements.
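The worked examples in this answer can be checked with a short Python sketch using the standard statistics module (multimode requires Python 3.8 or later). The last list is a hypothetical bimodal dataset added only to show that the mode need not be unique.

```python
from statistics import mean, median, mode, multimode

# Datasets taken from the examples in the answer above.
shoe_sizes = [6, 6, 7, 8, 8, 8, 9]
test_scores = [70, 75, 80, 85, 90]
flavors = ["Vanilla", "Chocolate", "Chocolate", "Strawberry"]
house_prices = [100_000, 150_000, 200_000, 250_000, 1_000_000]

print("Mode of shoe sizes:", mode(shoe_sizes))
print("Mean of test scores:", mean(test_scores))
print("Mode of flavors:", mode(flavors))          # works for nominal data
print("Median house price:", median(house_prices))
print("Mean house price:", mean(house_prices))    # pulled up by the outlier

# Hypothetical bimodal data: two values share the highest frequency.
print("Modes:", multimode([1, 1, 2, 2, 3]))
```

For the house prices the mean is far above the median because of the single very expensive house, which shows why the median is preferred for skewed data.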
(b) The following is the distribution of wages per thousand employees in a certain
factory.
Daily Wages (Rs.) 22 24 26 28 30 32 34 36 38 40 42 44
No. of Employees 3 13 43 102 175 220 204 139 69 25 6 1
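The task for this distribution is not included in the text. Assuming it asks for the modal, median, and mean wage, a minimal Python sketch is given below. The helper median_from_frequencies is a hypothetical name introduced for illustration; it expects the values in ascending order.

```python
# Daily wages (Rs.) and number of employees from the table.
wages = [22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44]
employees = [3, 13, 43, 102, 175, 220, 204, 139, 69, 25, 6, 1]

total = sum(employees)

# Mode: the wage with the highest frequency.
modal_wage = wages[employees.index(max(employees))]

# Mean: sum(f * x) / sum(f)
mean_wage = sum(w * f for w, f in zip(wages, employees)) / total

def median_from_frequencies(values, freqs):
    """Median of a discrete frequency distribution via cumulative frequencies."""
    n = sum(freqs)
    # 1-based positions of the middle observation(s); equal when n is odd.
    positions = [(n + 1) // 2, (n + 2) // 2]
    middle = []
    for position in positions:
        cumulative = 0
        for value, freq in zip(values, freqs):
            cumulative += freq
            if cumulative >= position:
                middle.append(value)
                break
    return sum(middle) / 2

median_wage = median_from_frequencies(wages, employees)
print(total, modal_wage, median_wage, mean_wage)
```

The three averages lie close together here, which is consistent with a roughly symmetric, single-peaked distribution.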
Q. 5
(a) Define Harmonic Mean. How does it differ from arithmetic mean? What are its
advantages and disadvantages?
The harmonic mean is a measure of central tendency that is calculated as the reciprocal of the
average of the reciprocals of the data values. It is particularly useful for data sets involving
rates or ratios.
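The definition can be sketched in Python as follows. The speeds are a hypothetical example (the same distance travelled at 40 km/h and then at 60 km/h), and harmonic_mean_manual is an illustrative helper name, not part of any library.

```python
from statistics import harmonic_mean, mean

# Hypothetical example: equal distances covered at 40 km/h and 60 km/h.
speeds = [40, 60]

def harmonic_mean_manual(values):
    """Reciprocal of the arithmetic mean of the reciprocals: n / sum(1 / x)."""
    return len(values) / sum(1 / v for v in values)

hm_manual = harmonic_mean_manual(speeds)
hm_library = harmonic_mean(speeds)     # standard-library equivalent
am = mean(speeds)

print("Harmonic mean:", hm_manual, hm_library)
print("Arithmetic mean:", am)
```

For average speed over equal distances the harmonic mean gives the correct figure, while the arithmetic mean overstates it; for positive values the harmonic mean never exceeds the arithmetic mean.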
The harmonic mean is a valuable measure of central tendency, especially suited for rates and
ratios. While it has specific advantages in dealing with certain types of data, its sensitivity to
small values and less intuitive nature make it less versatile than the arithmetic mean for
general use. Its application should be chosen based on the characteristics of the data and the
specific requirements of the analysis.
(b) Calculate the G.M. and H.M. for the frequency distribution given below:
Variable 0–5 5–10 10–15 15–20 20–25 25–30 30–35
Frequency 2 5 7 13 21 16 8
import numpy as np

# Class mid-points and frequencies from the table above
mid_points = np.array([2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5])
frequencies = np.array([2, 5, 7, 13, 21, 16, 8])

# Total frequency N
N = np.sum(frequencies)

# Geometric mean via logarithms: exp(sum(f * ln x) / N).
# Equivalent to (prod(x ** f)) ** (1 / N), but avoids forming a very large product.
geometric_mean = np.exp(np.sum(frequencies * np.log(mid_points)) / N)
print(f"Geometric mean: {geometric_mean:.2f}")
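The harmonic mean asked for in the same question can be sketched in the same way. This is a minimal illustration using only the standard library, with class mid-points as representative values; the arithmetic mean is computed only as a sanity check that the harmonic mean is the smaller of the two.

```python
# Class mid-points and frequencies from the table in part (b).
mid_points = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5]
frequencies = [2, 5, 7, 13, 21, 16, 8]

n = sum(frequencies)

# Harmonic mean for grouped data: N / sum(f / x)
harmonic_mean = n / sum(f / x for f, x in zip(frequencies, mid_points))

# Arithmetic mean for comparison: sum(f * x) / N
arithmetic_mean = sum(f * x for f, x in zip(frequencies, mid_points)) / n

print(f"N = {n}")
print(f"Harmonic mean: {harmonic_mean:.2f}")
print(f"Arithmetic mean: {arithmetic_mean:.2f}")
```

With these mid-points the harmonic mean comes out at roughly 15.89 against an arithmetic mean of 21.25, reflecting how strongly the small values in the lowest classes pull the harmonic mean down.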