
Name: Fatima Ahsan

Username ID: 19psl01931

Level: B.Ed (1.5 Year)

Semester: 3rd, Spring 2023

Course Code: 8614

Assignment No. 2

Q.1 How do we calculate the median? Also mention its merits and demerits.

Ans:

The median is a statistical measure that represents the middle value in a set of data. It is a type of
measure of central tendency, along with the mean and mode. Unlike the mean, which is the arithmetic
average of all the values, the median focuses on the middle value of the data.

To calculate the median, follow these steps:

1. Arrange the data in ascending order.

2. If the number of data points is odd, the median is the middle value in the sorted list.

3. If the number of data points is even, calculate the average of the two middle values.

Let’s take an example to illustrate the calculation of the median:

Example: Consider the following set of data representing the scores of 10 students in a mathematics test:
85, 92, 78, 88, 96, 81, 90, 87, 93, 84.

Step 1: Arrange the data in ascending order: 78, 81, 84, 85, 87, 88, 90, 92, 93, 96.
Step 2: Since there are 10 data points, which is an even number, we need to calculate the average of the
two middle values.

The two middle values are 87 and 88.

Step 3: Calculate the average of the two middle values: (87 + 88) / 2 = 87.5.

Therefore, the median of the given data set is 87.5.
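
For illustration, the same procedure can be written as a short Python sketch using the example scores above (the function name and layout are just for demonstration):

scores = [85, 92, 78, 88, 96, 81, 90, 87, 93, 84]

def median(values):
    # Step 1: arrange the data in ascending order
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    # Step 2: odd number of data points -> the middle value
    if n % 2 == 1:
        return ordered[mid]
    # Step 3: even number of data points -> average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median(scores))  # 87.5, matching the worked example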

Merits of using the median:

1. Resistant to Outliers: The median is less affected by extreme values or outliers in the data. It
provides a robust measure of central tendency, particularly in cases where there are outliers that
can heavily influence the mean.

2. Appropriate for Skewed Data: The median is especially useful when dealing with skewed data
distributions. It is not affected by the skewness of the data and can better represent the central
value in such cases.

3. Simple Calculation: Calculating the median is relatively simple, as it involves arranging the data
and finding the middle value(s).

Demerits of using the median:

1. Ignores Data Magnitude: The median only considers the position of the data points and not their
actual values. It does not take into account the magnitude or distance between the data points,
potentially losing information about the data dispersion.

2. Limited Use in Statistical Analysis: The median may not be suitable for certain statistical tests and
procedures that assume a normal distribution or require the use of the mean as a measure of
central tendency.

3. Loss of Precision: The median reduces the data to a single value, resulting in a loss of precision. It
provides information about the central position but does not convey the full picture of the data
distribution.

When deciding whether to use the median or other measures of central tendency, such as the mean or
mode, consider the characteristics of the data and the specific objectives of the analysis. Each measure has
its own merits and demerits, and the choice should be based on the nature of the data and the
requirements of the analysis.

Q.2 Explain the process and errors in hypothesis testing.

Ans: Hypothesis testing is a statistical method used to make inferences or draw conclusions
about a population based on sample data. It involves setting up a null hypothesis (H0) and an
alternative hypothesis (Ha) and testing the evidence against the null hypothesis. The process
of hypothesis testing generally involves the following steps:

1. State the hypotheses: The null hypothesis (H0) is the statement of no effect or no difference, while
the alternative hypothesis (Ha) is the statement of the effect or difference that you want to test.

2. Set the significance level: The significance level (also known as alpha, typically denoted as α)
determines the threshold at which the null hypothesis will be rejected. Commonly used values for α
are 0.05 or 0.01, representing a 5% or 1% chance of rejecting the null hypothesis when it is true.

3. Collect and analyze the data: Gather the sample data and perform the necessary statistical analysis
based on the nature of the data and the hypothesis being tested. This may involve calculating test
statistics, conducting t-tests, chi-square tests, or other appropriate statistical tests.

4. Calculate the test statistic: The test statistic measures the discrepancy between the observed data
and what would be expected under the null hypothesis. The choice of the test statistic depends on
the type of data and the hypothesis being tested.

5. Determine the critical region: The critical region is the range of values in which the test statistic
must fall to reject the null hypothesis. It is determined by the significance level and the distribution
of the test statistic under the null hypothesis.

6. Compare the test statistic with the critical region: If the test statistic falls within the critical region,
the null hypothesis is rejected in favor of the alternative hypothesis. If it does not fall within the
critical region, there is insufficient evidence to reject the null hypothesis.
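
As a small illustration of these steps, here is a Python sketch of a one-sample t-test using SciPy; the sample values and the hypothesized population mean of 100 are made up for demonstration:

from scipy import stats

sample = [102, 98, 110, 105, 95, 108, 101, 99, 104, 107]  # hypothetical sample data
alpha = 0.05                                               # chosen significance level

# Test H0: population mean = 100 against Ha: population mean != 100
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)

if p_value < alpha:
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}: reject H0")
else:
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}: fail to reject H0")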

Errors in Hypothesis Testing:

1. Type I error: A Type I error occurs when the null hypothesis is rejected even though it is true. It
represents a false positive result, indicating that a significant effect or difference is detected when
there is actually no effect or difference in the population. The probability of Type I error is equal to
the chosen significance level (α).

2. Type II error: A Type II error occurs when the null hypothesis is not rejected, even though it is
false. It represents a false negative result, indicating that no significant effect or difference is
detected when there is actually an effect or difference in the population. The probability of Type II
error is denoted as β.

The goal of hypothesis testing is to minimize both Type I and Type II errors, but there is a trade-off
between them. Lowering the significance level (α) decreases the chance of Type I error but increases the
chance of Type II error. Increasing the sample size generally reduces the probability of both types of
errors.
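
The claim that the Type I error rate equals α can be illustrated with a small simulation sketch (assuming a normally distributed population for which the null hypothesis is in fact true):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_trials, n = 0.05, 10_000, 30
rejections = 0

for _ in range(n_trials):
    sample = rng.normal(loc=100, scale=15, size=n)   # H0 (mean = 100) is true here
    _, p = stats.ttest_1samp(sample, popmean=100)
    if p < alpha:
        rejections += 1                              # a false positive (Type I error)

print(rejections / n_trials)  # expected to be close to 0.05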

It is important to interpret the results of hypothesis testing carefully, considering the context and
limitations of the study. Hypothesis testing provides statistical evidence but does not prove causation or
establish absolute truth. Statistical significance should be interpreted in conjunction with effect size,
practical significance, and the overall research question.

Q.3 What do you understand by 'Pearson Correlation'? Where is it used and how is it interpreted?

Ans: Pearson correlation, also known as Pearson's correlation coefficient or Pearson's r, is a
statistical measure that quantifies the strength and direction of the linear relationship
between two continuous variables. It was developed by Karl Pearson and is widely used in
statistics and research fields to assess the association between variables.

The Pearson correlation coefficient (r) ranges from -1 to 1. The value of r indicates the strength and
direction of the relationship:

1. Perfect positive correlation: If r = 1, it indicates a perfect positive linear relationship, meaning that
as one variable increases, the other variable also increases in a consistent and proportional
manner.

2. Strong positive correlation: If r is close to 1 (0.8 to 1), it indicates a strong positive linear
relationship, suggesting that as one variable increases, the other variable tends to increase.

3. No correlation: If r is close to 0 (around -0.2 to 0.2), it suggests little or no linear relationship
between the variables.

4. Strong negative correlation: If r is close to -1 (-0.8 to -1), it indicates a strong negative linear
relationship, implying that as one variable increases, the other variable tends to decrease.

5. Perfect negative correlation: If r = -1, it represents a perfect negative linear relationship, meaning
that as one variable increases, the other variable decreases in a consistent and proportional
manner.

To calculate the Pearson correlation coefficient, the following steps are followed:

1. Organize the paired data into two columns: one column for the values of the first variable and
another column for the corresponding values of the second variable.

2. Calculate the means (average) of both variables.

3. Calculate the difference between each value and its corresponding mean for both variables.

4. Multiply the differences obtained in step 3 for each pair and sum them.

5. Calculate the standard deviation for both variables.

6. Divide the sum obtained in step 4 by (n – 1) times the product of the sample standard deviations calculated in step 5 (equivalently, divide it by the square root of the product of the two sums of squared deviations).

The resulting value is the Pearson correlation coefficient, which quantifies the strength and direction of the
linear relationship between the variables.
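
These steps can be written as a short Python sketch; the data set below is made up purely for illustration, and scipy.stats.pearsonr or numpy.corrcoef would return the same value:

import math

x = [2, 4, 5, 7, 9]          # values of the first variable (hypothetical)
y = [10, 14, 15, 19, 25]     # corresponding values of the second variable

n = len(x)
mean_x = sum(x) / n                                  # step 2: means
mean_y = sum(y) / n
dx = [xi - mean_x for xi in x]                       # step 3: deviations from the means
dy = [yi - mean_y for yi in y]
sum_products = sum(a * b for a, b in zip(dx, dy))    # step 4: sum of the products of deviations

# steps 5-6: divide by the square root of the product of the sums of squared deviations
r = sum_products / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
print(round(r, 3))  # about 0.99 for this made-up data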

Q.3 (continued): Where is the Pearson correlation used and how is it interpreted?

Ans:

The Pearson correlation coefficient (r) is widely used in various fields and disciplines, including:

1. Social Sciences: In sociology, psychology, and anthropology, Pearson correlation is used to
examine relationships between variables such as personality traits, attitudes, behaviors, and social
interactions. For example, it can be used to analyze the relationship between self-esteem and
academic performance or the association between income level and job satisfaction.

2. Economics: Pearson correlation is employed in economic research to explore the relationship
between economic variables like GDP growth and unemployment rates, interest rates and
investment, or inflation and consumer spending.

3. Business and Marketing: Pearson correlation is utilized in market research to examine the
association between variables like customer satisfaction and loyalty, advertising expenditure and
sales, or product price and demand.

4. Health and Medicine: In medical and health-related studies, Pearson correlation is used to
analyze the relationship between variables such as body mass index and cardiovascular health,
smoking and lung function, or drug dosage and treatment effectiveness.

5. Education: Pearson correlation can be used in educational research to investigate relationships
between variables like teaching methods and student performance, parental involvement and
academic achievement, or classroom climate and student engagement.

6. Environmental Science: Pearson correlation is employed to assess the relationship between
environmental variables such as pollution levels and respiratory diseases, temperature changes and
species distribution, or rainfall patterns and crop yield.

Interpreting the Pearson correlation coefficient involves considering both the magnitude and statistical
significance of the coefficient:

1. Magnitude: The magnitude of the correlation coefficient (r) indicates the strength of the
relationship. A value close to -1 or 1 suggests a strong linear relationship, while a value close to 0
indicates a weak or no linear relationship.

2. Sign: The sign of the correlation coefficient indicates the direction of the relationship. A positive
value of r indicates a positive or direct relationship, meaning that as one variable increases, the
other variable tends to increase. A negative value of r indicates a negative or inverse relationship,
implying that as one variable increases, the other variable tends to decrease.

3. Statistical Significance: To determine if the observed correlation is statistically significant,
hypothesis testing is conducted. The calculated correlation coefficient is compared to a critical
value, typically using a t-test or p-value. If the calculated coefficient is significantly different from
zero (i.e., the p-value is less than the chosen significance level, often 0.05), the relationship is
considered statistically significant.
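
In practice, the coefficient and its p-value are usually obtained together. A sketch using SciPy with the same made-up data as in the earlier example:

from scipy import stats

x = [2, 4, 5, 7, 9]
y = [10, 14, 15, 19, 25]

r, p = stats.pearsonr(x, y)         # correlation coefficient and two-sided p-value
print(f"r = {r:.3f}, p = {p:.4f}")

if p < 0.05:
    print("The correlation is statistically significant at the 0.05 level.")
else:
    print("The correlation is not statistically significant at the 0.05 level.")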

It is important to note that correlation does not imply causation. While a significant correlation indicates a
relationship between variables, it does not establish a cause-and-effect relationship. Other factors, lurking
variables, or complex interactions may be involved. Therefore, caution should be exercised in drawing
causal conclusions solely based on correlation analysis.

Q.4 Explain ANOVA and its logic.

Ans:

ANOVA, which stands for Analysis of Variance, is a statistical technique used to analyze and compare the
means of two or more groups or treatments. It helps determine whether there are significant differences
among the means and whether those differences are likely due to the treatments being compared or
simply due to chance.

The key concept behind ANOVA is the partitioning of the total variation in the data into different sources of
variation. By comparing the variation between groups with the variation within groups, ANOVA assesses
whether the differences observed among the group means are statistically significant.

The basic steps involved in conducting ANOVA are as follows:

1. Formulate the hypothesis: The first step is to state the null hypothesis (H0) and the alternative
hypothesis (Ha). The null hypothesis typically assumes that there are no significant differences
among the group means, while the alternative hypothesis suggests that at least one group mean is
different from the others.

2. Select the significance level: The significance level (often denoted as α) determines the
threshold at which the null hypothesis will be rejected. Commonly used values for α are 0.05 or
0.01, representing a 5% or 1% chance of rejecting the null hypothesis when it is true.

3. Collect and organize the data: Gather data from each group or treatment being compared and
organize it in a suitable format.

4. Calculate the necessary statistics: ANOVA involves calculating several statistics, including the
sum of squares (SS), degrees of freedom (df), mean square (MS), and the F-ratio. These statistics
help quantify the variation between groups and within groups.

5. Compute the F-ratio: The F-ratio is obtained by dividing the mean square between groups
(MSB) by the mean square within groups (MSW). The F-ratio measures the ratio of the explained
variance (variation between groups) to the unexplained variance (variation within groups).

6. Determine the critical F-value: The critical F-value is obtained from the F-distribution table
based on the chosen significance level and the degrees of freedom.

7. Compare the F-ratio with the critical F-value: If the calculated F-ratio is greater than the critical F-
value, it suggests that there are significant differences among the group means, and the null
hypothesis is rejected. On the other hand, if the calculated F-ratio is less than the critical F-value,
it implies that the observed differences in means are not statistically significant, and the null
hypothesis is not rejected.
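
As an illustration of these steps, here is a Python sketch of a one-way ANOVA for three hypothetical treatment groups using SciPy (the scores are invented for demonstration):

from scipy import stats

group_a = [85, 90, 88, 92, 87]   # hypothetical scores under treatment A
group_b = [78, 82, 80, 79, 85]   # hypothetical scores under treatment B
group_c = [91, 94, 89, 95, 90]   # hypothetical scores under treatment C

f_ratio, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_ratio:.2f}, p = {p_value:.4f}")

alpha = 0.05
if p_value < alpha:
    print("Reject H0: at least one group mean differs from the others.")
else:
    print("Fail to reject H0: no significant differences among the group means.")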

ANOVA is used in a wide range of disciplines and research areas, including psychology, social sciences,
biology, economics, and more. It allows researchers to assess the effects of different treatments,
interventions, or independent variables on a dependent variable by comparing multiple groups. ANOVA
helps determine if observed differences are meaningful or simply due to chance, providing valuable insights
for making informed decisions and drawing accurate conclusions in research studies.

Logic of ANOVA

The logic of ANOVA (Analysis of Variance) can be explained through the following steps:

1. Variation in the data: ANOVA begins by recognizing that there is variation or differences in the
data. It acknowledges that the observed values of a variable are not all the same.

2. Partitioning the total variation: The total variation in the data is divided into different sources of
variation. ANOVA decomposes the total variation into two components: variation between groups
(explained variation) and variation within groups (unexplained variation).

3. Null hypothesis and alternative hypothesis: The null hypothesis in ANOVA states that there are no
significant differences among the group means being compared. The alternative hypothesis
suggests that at least one group mean is different from the others.

4. Comparing the sources of variation: ANOVA compares the variation between groups (explained
variation) with the variation within groups (unexplained variation). It examines whether the
observed differences in group means are larger than what would be expected due to random
chance alone.

5. F-ratio and statistical test: ANOVA uses the F-ratio to quantify the ratio of explained variation to
unexplained variation. The F-ratio is obtained by dividing the mean square between groups (MSB)
by the mean square within groups (MSW). It is a measure of the signal-to-noise ratio.

6. Hypothesis testing: ANOVA involves hypothesis testing to determine the statistical significance of
the observed differences among group means. The F-ratio is compared to a critical value based on
the chosen significance level and the degrees of freedom associated with the groups and error.

7. Decision and interpretation: If the calculated F-ratio is greater than the critical value, the null
hypothesis is rejected. This indicates that there are significant differences among the group means,
and the observed variation between groups is unlikely to be due to random chance alone.
Conversely, if the calculated F-ratio is less than the critical value, the null hypothesis is not
rejected, suggesting that the observed differences in means are not statistically significant.
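
To make the partitioning of variation concrete, the following sketch computes the between-group and within-group sums of squares and the F-ratio by hand for the same hypothetical groups used in the earlier example; it should agree with the f_oneway result:

groups = {
    "A": [85, 90, 88, 92, 87],
    "B": [78, 82, 80, 79, 85],
    "C": [91, 94, 89, 95, 90],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = sum(all_values) / len(all_values)

# Between-group sum of squares (explained variation)
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
# Within-group sum of squares (unexplained variation)
ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values())

df_between = len(groups) - 1                 # k - 1
df_within = len(all_values) - len(groups)    # N - k

ms_between = ss_between / df_between         # MSB
ms_within = ss_within / df_within            # MSW
f_ratio = ms_between / ms_within
print(f"SSB = {ss_between:.1f}, SSW = {ss_within:.1f}, F = {f_ratio:.2f}")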

The logic of ANOVA is based on the idea that if the variation between groups is significantly larger than the
variation within groups, it provides evidence for the presence of real differences among the group means.
It allows researchers to assess the impact of different treatments, interventions, or independent variables
on a dependent variable by comparing multiple groups. ANOVA provides a statistical framework for
determining whether the observed differences are meaningful or simply due to random variability, allowing
researchers to draw valid conclusions and make informed decisions based on the data.

Q.5 Explain Chi-square. Also discuss it as a test of independence.

Ans:

Chi-square is a statistical test used to determine if there is a significant association between two
categorical variables. It is particularly useful when the variables are independent and the researcher wants
to investigate if there is any relationship or dependency between them. The Chi-square test assesses
whether the observed distribution of frequencies in different categories significantly differs from the
expected distribution.

The logic of the Chi-square test can be explained as follows:

1. Hypotheses: The Chi-square test involves formulating the null hypothesis (H0) and the
alternative hypothesis (Ha). The null hypothesis assumes that there is no association or
relationship between the variables, while the alternative hypothesis suggests that there is a
significant association or relationship.

2. Data Preparation: The data for the Chi-square test should be organized in a contingency table or
cross-tabulation format. This table displays the frequencies or counts of each combination of
categories for the two variables being analyzed.

3. Expected Frequencies: Under the assumption of independence between the variables, the
expected frequencies for each cell in the contingency table are calculated. These expected
frequencies are based on the total frequencies and the marginal distributions of the variables.

4. Chi-square Statistic: The Chi-square statistic (χ²) is calculated by comparing the observed
frequencies in the contingency table with the expected frequencies. It measures the discrepancy
between the observed and expected frequencies, indicating the degree of association or deviation
from independence.

5. Degrees of Freedom: The degrees of freedom (df) for the Chi-square test are determined by the
number of categories in each variable. It is calculated as (number of rows – 1) multiplied by
(number of columns – 1).

6. Critical Value and p-value: The critical value of the Chi-square statistic is obtained from the Chi-
square distribution table based on the chosen significance level and the degrees of freedom.
Additionally, the p-value is calculated, which represents the probability of obtaining the observed
Chi-square statistic or a more extreme value under the null hypothesis.

7. Decision and Interpretation: If the calculated Chi-square statistic is greater than the critical
value or the p-value is below the chosen significance level (e.g., 0.05), the null hypothesis is
rejected. This indicates that there is a significant association or relationship between the variables.
Conversely, if the calculated Chi-square statistic is less than the critical value or the p-value is
above the significance level, the null hypothesis is not rejected, suggesting no significant
association between the variables.
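
A short Python sketch of this procedure for a small made-up 2×2 contingency table, using SciPy:

from scipy.stats import chi2_contingency

# Rows: gender (male, female); columns: preference (product X, product Y) - invented counts
observed = [
    [30, 20],
    [15, 35],
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
print("Expected frequencies under independence:", expected)

if p_value < 0.05:
    print("Reject H0: the two variables appear to be associated.")
else:
    print("Fail to reject H0: no significant association detected.")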

The Chi-square test is commonly used in various fields, such as social sciences, biology, market research,
and public health, to investigate relationships between categorical variables. It provides valuable insights
into the association or dependency between variables, allowing researchers to understand patterns, make
comparisons, and draw conclusions based on observed frequencies in different categories.

The Chi-square test as a test of independence

The Chi-square test can be used as an independent test when the researcher wants to examine the
association between two categorical variables without any specific hypothesis about the direction or nature
of the relationship. In this case, the test is often referred to as the Chi-square test for independence.

The Independent Chi-square test follows a similar logic to the Chi-square test described earlier, but it
focuses on determining if the observed distribution of frequencies is independent of the variables being
studied. The goal is to assess whether the two variables are related or associated in a statistically
significant way.

The steps involved in conducting the independent Chi-square test are as follows:

1. Hypotheses: The null hypothesis (H0) assumes that there is no association or relationship
between the variables, indicating that they are independent. The alternative hypothesis (Ha)
suggests that there is a significant association between the variables.

2. Data Preparation: The data should be organized in a contingency table format, with rows
representing categories of one variable and columns representing categories of the other variable.
The table displays the observed frequencies or counts for each combination of categories.

3. Expected Frequencies: Under the assumption of independence, the expected frequencies are
calculated for each cell in the contingency table. These expected frequencies are based on the
assumption that there is no association between the variables.

4. Chi-square Statistic: The Chi-square statistic (χ²) is computed by comparing the observed
frequencies with the expected frequencies. It quantifies the discrepancy between the observed and
expected frequencies, providing a measure of the association or independence between the
variables.

5. Degrees of Freedom: The degrees of freedom (df) for the independent Chi-square test are
determined by the number of categories in each variable. It is calculated as (number of rows – 1)
multiplied by (number of columns – 1).

6. Critical Value and p-value: The critical value for the Chi-square statistic is obtained from the
Chi-square distribution table based on the chosen significance level and the degrees of freedom.
The p-value is also calculated, representing the probability of obtaining the observed Chi-square
statistic or a more extreme value under the null hypothesis.

7. Decision and Interpretation: If the calculated Chi-square statistic is greater than the critical
value or the p-value is below the chosen significance level (e.g., 0.05), the null hypothesis is
rejected. This indicates that there is a significant association between the variables. On the other
hand, if the calculated Chi-square statistic is less than the critical value or the p-value is above the
significance level, the null hypothesis is not rejected, suggesting no significant association between
the variables.
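
The expected frequencies in step 3 follow directly from the marginal totals: for each cell, E = (row total × column total) / grand total. A small sketch using the same made-up table as in the earlier example:

observed = [
    [30, 20],
    [15, 35],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under independence
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
print(expected)  # [[22.5, 27.5], [22.5, 27.5]]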

The independent Chi-square test allows researchers to examine the presence of an association or
relationship between two categorical variables, providing insights into the independence or dependency of
the variables. It is widely used in social sciences, market research, epidemiology, and other fields to
explore relationships between variables, identify patterns, and make informed decisions based on observed
frequencies.

