Emailing Assignmen 8614 Irshan


Assignment #1

Name: Tanveer Haider

Roll #: cb638821

Course code: 8614

Submitted to:

Program: B.ED (1.5 YEARS)

Semester: Autumn 2021

Subject: Educational Statistics


Q.1 What are the functions of Statistics? Discuss in detail with reference to research in

1.1 Functions of Statistics

The functions of Statistics are summarized under the following headings.

i) To present facts in a definite form

Every day we encounter a great deal of information that is often vague, indefinite and
unclear. When such information is processed with statistical techniques and presented in
the form of tables or figures, it is put in a perspective that is easy to comprehend. For
example, the statement that “some students out of 1000 who appeared for the B.Ed.
examination were declared successful” conveys little information.

But when we say that 900 students out of 1000 who appeared for the B.Ed. examination were
declared successful, and a simple calculation (900 ÷ 1000 × 100) lets us conclude that “90%
of B.Ed. students were successful”, the statement becomes clear and meaningful.
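
The arithmetic behind the 90% figure can be sketched in a few lines of Python. This is only an illustration; the function name `pass_percentage` is an assumption introduced here and does not come from the course material.

```python
def pass_percentage(passed: int, appeared: int) -> float:
    """Return the share of successful candidates as a percentage."""
    if appeared <= 0:
        raise ValueError("appeared must be a positive number")
    return 100 * passed / appeared


# Figures from the example above: 900 successful out of 1000 candidates.
print(pass_percentage(900, 1000))  # 90.0
```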

ii) To simplify unmanageable and complex data

In daily life and in research we often receive large amounts of information. To give a clear
picture, statistics simplifies such information either by taking a few figures to serve as a
representative sample or by taking an average to give a bird’s-eye view of the whole mass of
data. Complex data may also be simplified by presenting them in the form of tables, graphs
or diagrams.

iii) To use techniques for making comparisons

In research, things often become clearer and more significant when they are compared with
others of the same type. Statistical techniques such as averages, coefficients, rates and
ratios make comparison between different groups possible.

iv) To enlarge individual experience

As individuals, our knowledge is limited to what we can observe ourselves, and that is a very
small part of the ocean of knowledge. Statistics extends our knowledge and experience by
presenting conclusions and results based on numerical investigations. For example, we hear
daily, and also have a general impression, that the cost of living has increased. But to
know the extent of the increase, and how far the rise in prices has affected different income
groups, it is necessary to compare the rise in prices of the articles consumed.

v) To provide guidance in the formulation of policies

Statistics enables us to make well-informed decisions, whether they are taken by a businessman
or by a government. In fact, statistics is a great servant of business management and of government.
Statistical methods are employed in industry in tackling the problem of standardization of

Large industries maintain a separate department for statistical intelligence or statistical bureau,
the work of which is to collect, compare and coordinate figures for formulating future policies
of the firm regarding production and sales.

vi) To enable measurement of the magnitude of a phenomenon

Statistics enables us to measure the magnitude of a phenomenon under investigation. Estimates
of the population of a country, or of the quantity of wheat, rice and other agricultural
commodities produced in the country during a year, are examples of such measurement.

Q.2 Write a comprehensive essay on “Types of Variables”.

Types of Variables

Variables can be categorized in three different ways, according to:

A. The causal relationship,
B. The design of the study,
C. The unit of measurement.

Let us describe these variables in some detail.

The Causal Relationship

In studies of causal relationships, four types of variables may operate:
i. Change variables, which are responsible for bringing about change in a phenomenon;
ii. Variables which affect the link between cause and effect variables;
iii. Outcome variables, which result from the effects of a change variable;
iv. Connecting or linking variables, which in certain situations are needed to complete the
relationship between cause and effect.

In research, change variables are referred to as independent variables, while outcome
variables are known as dependent variables. In a cause-effect relationship, some unmeasured
variables also affect the relationship; these are called extraneous variables. The variables
that link cause and effect are called intervening variables. A brief summary of the
above-mentioned variables is given in the following table.

Variable               Description

Independent Variable   It is a cause that brings about changes in the situation.

Dependent Variable     It is the change that occurs due to the independent variable.

Extraneous Variable    It is a situation or factor in everyday life that influences changes in
                       the dependent variable. As these factors are not measured in the
                       research study, they can increase or decrease the magnitude of the
                       relationship between the independent and dependent variables.

Intervening Variable   It is a link between the independent and dependent variables. Sometimes,
                       without the intervention of another variable, it is impossible to
                       establish a relationship between the independent and dependent variables.

Design of the Study

A study that investigates causation or association may be a controlled or contrived
experiment, a quasi-experiment, or an ex post facto (non-experimental) study. Normally, there
are two types of variables in this category.

i. Active Variables: these variables can be changed or controlled; and

ii. Attribute Variables: these variables cannot be changed or controlled, and refer to
characteristics of the research study population. Demographic features like age, gender,
education, qualification and income are attribute variables.

Some common types of variables are given below.

i) Binary Variable
These variables take only two values. For example, male or female, true or false, yes or no,
improved or not improved, completed task or failed to complete task. These variables can be
divided into two types: opposite binary variables and conjunct binary variables. Opposite
binary variables are polar opposites of each other, for example success or failure, true or
false; there is no third or middle value. Conjunct binary variables, on the other hand,
assume two values but also admit values in between, for example agreeing 20% with the
policies of one party and 80% with those of the other.

ii) Categorical Variable

Usually an independent or predictor variable whose values indicate membership in one of
several possible categories. For example, gender (male or female), marital status (married,
single, divorced, widowed), or brand of a product.

iii) Confounding Variable

A variable that has hidden effect on the experiment.

iv) Continuous Variable

A variable that can take an infinite number of values, which are obtained by measuring. For
example, the height and weight of students in a class, the time it takes to get to school,
or the distance between Lahore and Karachi.

v) Dependent Variable
The outcome or response of an experiment. An independent variable has a direct or inverse
effect upon the dependent variable. In a graph, it is plotted on the y-axis.

vi) Independent Variable

The variable that is manipulated by the researcher. In a graph, it is plotted on the x-axis.

vii) Nominal Variable

It is another name for a categorical variable.

viii) Ordinal Variable

Similar to a categorical variable, but with a clear order. For example, income levels of low,
middle and high.

ix) Interval Variable

An interval variable is a measurement for which the difference between two values is
meaningful. For example, the difference between temperatures of 100° and 90° is the same as
the difference between 80° and 70°.

x) Ratio Variable
Similar to an interval variable, but with a meaningful zero point.

xi) Qualitative Variable

A broad category for any variable that cannot be counted (i.e. has no numerical value).
Nominal and ordinal variables fall under this umbrella.

xii) Quantitative Variable

A broad category for any variable that can be counted (i.e. has a numerical value associated
with it). Variables that fall in this category include discrete variables and ratio variables.

2.1.2 Less Common Types of Variables Data

Some less common types of variables are given below.

i) Attribute Variable
Another name for a categorical variable (in statistical software) or a variable that isn’t
manipulated (in design of experiments).

ii) Collider Variable

A variable represented by a node on a causal graph into which two or more paths (arrows) point.

iii) Covariate Variable

Similar to an independent variable, it has an effect on the dependent variable but is usually not
the variable of interest.

iv) Criterion Variable

Another name for a dependent variable, when the variable is used in nonexperimental

v) Dichotomous Variable
Another name for a binary variable.

vi) Dummy Variables

Used in regression analysis when you want to assign relationships to unconnected categorical
variables. For example, if you had the categories “has dogs” and “owns a car” you might assign
a 1 to mean “has dogs” and 0 to mean “owns a car.”
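
As a rough illustration of how dummy variables are usually built for regression, the sketch below follows the common convention that each dummy variable marks membership in one category with 1 and non-membership with 0, with one category left out as the baseline. The function name `dummy_code` and the marital-status data are hypothetical and are not taken from the text.

```python
def dummy_code(values, baseline):
    """Return one 0/1 indicator column for every category except the baseline."""
    categories = sorted(set(values) - {baseline})
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical marital-status data, with "single" as the reference category.
status = ["married", "single", "divorced", "married"]
columns = dummy_code(status, baseline="single")
print(columns)
# {'divorced': [0, 0, 1, 0], 'married': [1, 0, 0, 1]}
```

A respondent in the baseline category ("single" here) is identified by a 0 in every column, which is why no separate column is needed for it.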

vii) Endogenous Variable

Similar to dependent variables, they are affected by other variables in the system. Used almost
exclusively in econometrics.

viii) Exogenous Variable

Variables that affect others in the system.

ix) Indicator variable

Another name for a dummy variable.

x) Intervening variable
A variable that is used to explain the relationship between variables.

xi) Latent Variable

A hidden variable that can’t be measured or observed directly.

xii) Manifest variable

A variable that can be directly observed or measured.

xiii) Manipulated variable

Another name for independent variable.

xiv) Mediating variable

Variables that explain how the relationship between variables happens. For example, it could
explain the difference between the predictor and criterion.

xv) Moderating variable

Changes the strength of an effect between independent and dependent variables. For example,
psychotherapy may reduce stress levels for women more than men, so sex moderates the effect
between psychotherapy and stress levels.

xvi) Nuisance Variable

An extraneous variable that increases variability overall.

xvii) Observed Variable

A measured variable (usually used in SEM).

xviii) Outcome variable

Similar in meaning to a dependent variable, but used in a non-experimental study.

xix) Polychotomous variables

Variables that can have more than two values.

xx) Predictor variable

Similar in meaning to the independent variable, but used in regression and in non experimental

xxi) Test Variable

Another name for the Dependent Variable.

xxii) Treatment variable

Another name for independent variable.

Q.3 Explain “Exploratory Data Analysis” in detail.

3.1 Bar Chart
Bar charts are one of the most commonly used graphical representations of data; they are used
to display values visually and compare them with each other. They are easy to create and
interpret. They are also flexible, and the standard bar chart has several variations,
including vertical or horizontal bar charts, component or grouped charts, and stacked bar
charts.

Data for a bar chart are entered in columns. Each numeric data value becomes a bar. The chart
is constructed so that the lengths of the bars are proportional to the size of the category
they represent. In a vertical bar chart, the x-axis represents the different categories and
has no scale, while the y-axis has a scale and indicates the units of measurement; in a
horizontal bar chart the roles of the axes are reversed. In the following figure result of
first, second and the pak studies.

3.2 Pictograms
A pictogram is a graphical symbol that conveys its meaning through its pictorial resemblance
to a physical object. A pictogram may include a symbol plus graphic elements such as a border,
back pattern, or color that are intended to convey specific information. We can also say that
a pictogram is a kind of graph that uses pictures instead of bars to represent the data under
analysis. A pictogram is also called a “pictograph”, or simply a “picto”.

Fig 3.2 pictogram example

3.3 Histogram

A histogram is a type of graph that provides a visual interpretation of numerical data by
indicating the number of data points that lie within each range of values. These ranges are
called classes or bins.
A histogram looks similar to a bar chart, and both are ways to display a data set. The height
of each bar corresponds to the frequency of the data in the class: the higher the bar, the
greater the frequency, and vice versa. The main difference between the two graphs is the
level of measurement of the data. Bar graphs are used for data at the nominal level of
measurement and show the frequency of categorical data, whereas histograms are used for
numerical data grouped into classes.
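
The grouping of scores into classes that underlies a histogram can be sketched as follows. This is a minimal illustration that assumes whole-number scores and equal class widths; the function name `class_frequencies`, the class width of 10, and the use of the ten scores from the mean example in Q.4 are choices made here, not part of the original text.

```python
def class_frequencies(scores, start, width, n_classes):
    """Count how many scores fall in each class [lower, lower + width)."""
    counts = [0] * n_classes
    for score in scores:
        index = int((score - start) // width)
        if 0 <= index < n_classes:  # scores outside the classes are ignored
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]


scores = [40, 85, 94, 62, 76, 66, 90, 59, 68, 84]
table = class_frequencies(scores, start=40, width=10, n_classes=6)

# Print a simple text histogram: one '#' per score in the class.
for lower, upper, freq in table:
    print(f"{lower}-{upper - 1}: {'#' * freq} ({freq})")
```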

Fig: 3.3 histograms graph

3.4 Frequency Polygon

The frequency polygon is a graph that displays data by using lines that connect points plotted
for the frequencies at the midpoints of the classes. This graph is useful for understanding
the shape of a distribution. It is also a good choice for displaying cumulative frequency
distributions. A frequency polygon is similar to a histogram; the difference is that a
histogram is made up of rectangles, while a frequency polygon resembles a line graph.
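
The points of a frequency polygon can be computed from a frequency table by pairing each class midpoint with its frequency. The sketch below assumes a hypothetical table of six classes; the name `polygon_points` and the data are illustrative only.

```python
def polygon_points(classes, frequencies):
    """Pair the midpoint of each class with its frequency."""
    return [
        ((lower + upper) / 2, freq)
        for (lower, upper), freq in zip(classes, frequencies)
    ]


# Hypothetical frequency table: class boundaries and their frequencies.
classes = [(40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
frequencies = [1, 1, 3, 1, 2, 2]

points = polygon_points(classes, frequencies)
print(points)
# [(45.0, 1), (55.0, 1), (65.0, 3), (75.0, 1), (85.0, 2), (95.0, 2)]
```

Joining these points with straight lines gives the frequency polygon.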

3.5 Cumulative Frequency Polygon or Ogive

The cumulative frequency is the sum of the frequencies accumulated up to the upper boundary
of a class in the distribution. A graph that represents the cumulative frequencies for the
classes is called a cumulative frequency graph or ogive. To construct an ogive, we first form
a cumulative frequency table. The upper limits of the classes are then taken on the x-axis
and the cumulative frequencies on the y-axis, and the points are plotted.
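
A cumulative frequency table of the kind described above can be built with a running total. The sketch below uses `itertools.accumulate` from the Python standard library; the upper class limits and frequencies are hypothetical values chosen for illustration.

```python
from itertools import accumulate

# Hypothetical frequency table: upper class limits and class frequencies.
upper_limits = [50, 60, 70, 80, 90, 100]
frequencies = [1, 1, 3, 1, 2, 2]

# Running total of the frequencies up to each upper limit.
cumulative = list(accumulate(frequencies))

# Points of the ogive: (upper limit on the x-axis, cumulative frequency on the y-axis).
ogive_points = list(zip(upper_limits, cumulative))
print(ogive_points)
# [(50, 1), (60, 2), (70, 5), (80, 6), (90, 8), (100, 10)]
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.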

3.6 Scatter Plot

A scatter plot is used to plot data in the XY-plane to show how much one variable or data set
is affected by another. It consists of points that show the relationship between two
variables or two sets of data. These points are sometimes called markers, and the position of
each point depends on the values of the two variables plotted on the x- and y-axes. A scatter
plot gives a good visual picture of the relationship or association between two variables or
data sets, and aids the interpretation of the correlation coefficient or regression model.
The relationship between two data sets or variables is called correlation. If the markers
lie close together and form a straight line in the scatter plot, the two variables or data
sets have a high correlation. If the markers are spread evenly over the scatter plot, the
correlation is low.

Name of Student GPA

A 2.11
B 2.34
C 3.27
D 3.44
E 2.60
F 3.09
G 3.39
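
The strength of the linear relationship that a scatter plot shows can be summarized by the Pearson correlation coefficient. The sketch below is a plain-Python illustration, not a method prescribed by the text. The GPA values are those listed above, but the `study_hours` values are invented purely for demonstration, since the table gives only one variable.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return numerator / denominator


gpa = [2.11, 2.34, 3.27, 3.44, 2.60, 3.09, 3.39]  # students A to G
study_hours = [5, 6, 12, 14, 8, 11, 13]           # hypothetical second variable

print(round(pearson_r(study_hours, gpa), 2))
```

A value close to +1 or -1 corresponds to markers lying near a straight line; a value close to 0 corresponds to markers scattered with no linear pattern.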

Q.4 Write down the basic purpose of measurement of central tendency. Give examples
from your daily life.

5.2 Mean
Mean is the most commonly used measure in educational research. It is appropriate for
describing ratio or interval data, and it can be used for both continuous and discrete
numeric data. It is the arithmetic average of the scores: it is determined by adding up all
the scores and then dividing the sum by the total number of scores. Suppose we have the
scores 40, 85, 94, 62, 76, 66, 90, 59, 68, and 84. To find the mean of these scores we simply
add all the scores, which comes to 724, and then divide this sum by 10 (the total number of
scores). We get 72.4, which is the mean score.

The formula for computing the mean is:

Mean score: X̄ = ΣX / n

where Σ represents “sum of”, X represents any raw score value, and n represents the total
number of scores.
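
The formula can be checked against the ten scores from the worked example with a short Python sketch. The function name `arithmetic_mean` is an illustrative choice.

```python
def arithmetic_mean(scores):
    """Mean = (sum of all scores) / (number of scores)."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


scores = [40, 85, 94, 62, 76, 66, 90, 59, 68, 84]
print(sum(scores))              # 724
print(arithmetic_mean(scores))  # 72.4
```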

5.2.1 Merits of Mean

i) It is rigidly defined.
ii) It is easy to understand and calculate.
iii) It is used for further analysis and treatment.
iv) It is based upon all the values of the given data.
v) It is capable of further mathematical treatment.
vi) It is not much affected by sampling fluctuations.

5.2.2 Demerits of Mean

i) It cannot be calculated if any observation is missing.

ii) It cannot be calculated for data with open ended distribution.
iii) It may not lie in the middle of series, if series is skewed.
iv) It is affected by extreme values.
v) It cannot be located graphically.
vi) It may be a number which is not present in the data.
vii) It cannot be calculated for data representing qualitative values.

5.3 Median

Median is the middle value of rank order data. It divides the distribution in two halves (i.e. 50%
of scores or observations on either side of median value). It means that this value separates
higher half of the data set from the lower half. The goal of the median is to determine the
precise midpoint of the distribution. Median is appropriate for describing ordinal data.

5.3.1 Procedure for Determining Median

When the number of scores is odd, simply arrange the scores in order (from lower to higher or
from higher to lower). The median will be the middle score in the list. Consider the set of
scores 2, 5, 7, 10, 12. The score “7” lies in the middle of the scores, so it is the median.
When the number of scores is even, the median is taken as the average of the two middle scores.
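
The procedure can be sketched in Python as follows. The odd case follows the text; for an even number of scores the sketch uses the usual convention of averaging the two middle scores, which is an addition to the procedure described above. The function name `median_of` is illustrative.

```python
def median_of(scores):
    """Middle value of the ordered scores (mean of the two middle values if even)."""
    if not scores:
        raise ValueError("at least one score is required")
    ordered = sorted(scores)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of([2, 5, 7, 10, 12]))  # 7
print(median_of([2, 5, 7, 10]))      # 6.0
```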
5.3.2 Merits of Median

i) It is rigidly defined.
ii) It is easy to understand and calculate.
iii) It is not affected by extreme values.
iv) Even if the extreme values are not known median can be calculated.
v) It can be located just by inspection in many cases.
vi) It can be located graphically.
vii) It is not much affected by sampling fluctuations.
viii) It can be calculated by data based on ordinal scale.
ix) It is suitable for skewed distribution.
x) It is easily located in individual and discrete classes.
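
The claim that the median is not affected by extreme values, while the mean is, can be illustrated with the `statistics` module from the Python standard library. The scores are the ten used in the mean example; the extreme value of 1000 is hypothetical and added only to show the effect.

```python
from statistics import mean, median

scores = [40, 85, 94, 62, 76, 66, 90, 59, 68, 84]
with_outlier = scores + [1000]  # hypothetical extreme value

# Without the extreme value the two measures are close together.
print(mean(scores), median(scores))

# With it, the mean is pulled far upward while the median barely moves.
print(mean(with_outlier), median(with_outlier))
```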

5.3.3 Demerits of Median

i) It is not based on all values of the given data.

ii) For larger data sizes, arranging the data in increasing order is a somewhat difficult process.
iii) It is not capable of further mathematical treatment.
iv) It is not sensitive to some change in the data value.
v) It cannot be used for further mathematical processing.

5.4 Mode

The mode is the most frequently occurring score in the distribution. Consider the following
data set: 25, 43, 39, 25, 82, 77, 25, 47. The score 25 occurs most frequently, so it is the
mode. Sometimes there may be no mode, if no value appears more often than any other. There
may be one mode (uni-modal), two modes (bi-modal), three modes (tri-modal), or more than
three modes (multi-modal). The mode is useful when scores reflect a nominal scale of
measurement, but along with the mean and median it can also be used for ordinal, interval or
ratio data. It can be located graphically by drawing a histogram.
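
Finding the mode amounts to counting how often each score occurs. The sketch below uses `collections.Counter` from the Python standard library. Returning an empty list when no value repeats follows the convention in the text that such data have no mode; the function name `modes` is an illustrative choice.

```python
from collections import Counter


def modes(scores):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(scores)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([25, 43, 39, 25, 82, 77, 25, 47]))  # [25]
print(modes([1, 1, 2, 2, 3]))                   # [1, 2]  (bi-modal)
print(modes([4, 5, 6]))                         # []
```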

5.4.1 Merits of Mode

i) It is easy to understand and easy to calculate.

ii) It is not affected by extreme values.
iii) Even if the extreme values are not known mode can be calculated.
iv) It can be located just by inspection in many cases.
v) It can be located graphically.
vi) It is always present in the data.
vii) It is applicable for both quantitative and qualitative data.
viii) It is useful for methodological forecasts.

5.4.2 Demerits of Mode

i) It is not rigidly defined.

ii) It is not based upon all values of the given data.
iii) It is not capable of further mathematical calculation.
iv) There will be no mode if there is no common value in the data.

Q.5 Write down merits and demerits of Mean, Median and Mode.
1. Merits of Mean

i. It is rigidly defined.
ii. It is easy to understand and calculate.
iii. It is used for further analysis and treatment.
iv. It is based upon all the values of the given data.
v. It is capable of further mathematical treatment.
vi. It is not much affected by sampling fluctuations.

2. Demerits of Mean

i. It cannot be calculated if any observation is missing.

ii. It cannot be calculated for data with open ended distribution.
iii. It may not lie in the middle of series, if series is skewed.
iv. It is affected by extreme values.
v. It cannot be located graphically.
vi. It may be a number which is not present in the data.
vii. It cannot be calculated for data representing qualitative values.

3. Merits of Median
a. It is rigidly defined.
b. It is easy to understand and calculate.
c. It is not affected by extreme values.
d. Even if the extreme values are not known median can be calculated.
e. It can be located just by inspection in many cases.
f. It can be located graphically.
g. It is not much affected by sampling fluctuations.
h. It can be calculated by data based on ordinal scale.
i. It is suitable for skewed distribution.
j. It is easily located in individual and discrete classes.

4. Demerits of Median
a. It is not based on all values of the given data.
b. For larger data sizes, arranging the data in increasing order is a somewhat difficult process.
c. It is not capable of further mathematical treatment.
d. It is not sensitive to some change in the data value.
e. It cannot be used for further mathematical processing

5. Merits of Mode

a) It is easy to understand and easy to calculate.

b) It is not affected by extreme values.
c) Even if the extreme values are not known mode can be calculated.
d) It can be located just by inspection in many cases.
e) It can be located graphically.
f) It is always present in the data.
g) It is applicable for both quantitative and qualitative data.
h) It is useful for methodological forecasts.

6. Demerits of Mode

a) It is not rigidly defined.

b) It is not based upon all values of the given data.
c) It is not capable of further mathematical calculation.
d) There will be no mode if there is no common value in the data.
