Unit 3 Tech
The mathematical process of collecting, organizing, analyzing and interpreting data is
termed "statistics". In statistics, data related to any individual, organization, behavior, etc. are expressed
in numerical form. The word statistics is derived from the Latin word 'Status', the Italian word 'Statista',
the German word 'Statistik' and the French word 'Statistique', all of which mean 'political state'. It provides data
concerning the various attributes of a state/country that help in successful administration.
Seligman defines statistics as "the science which deals with the methods of collecting, classifying,
presenting, comparing and interpreting numerical data collected to throw some light on any sphere of
enquiry".
In fact, educationists and psychologists use statistics widely to study human behavior. At the
same time, statistics also helps a teacher analyze and judge students' performance.
Descriptive Statistics: It involves the methods of collecting, classifying, organizing, picturing and
summarizing observations or information from data and generating statistical
interpretations/conclusions about the data. It is concerned with a particular group of individuals.
Inferential Statistics: It involves the methods of using information from samples to arrive at conclusions
about the population. It is thus concerned with samples selected from the population.
Inductive Statistics: It involves the method of generating statistical interpretations about one
group/individual based on the interpretations made on another group/ individual. It is sometimes called
predictive statistics.
Analytical Statistics: It involves methods of arriving at statistical interpretations by comparing two or more
sets of data.
Applied Statistics: It involves the application of statistics to the study of real-life situations such as educational
growth, economic status, political matters, etc.
Statistics plays a pivotal role in education. Without statistics, it is difficult for the teacher to interpret
and judge the learning performance of children. Not only learning performance, but the performance of students
in co-curricular/extracurricular activities is also analysed and interpreted using statistics. For example, during a
sports meet, the running times of students participating in the 100 m race are recorded in numerical terms. As we know,
assessment and evaluation are two critical aspects of the teaching-learning process. Assessment is the
process of collecting scores and evaluation is the interpretation of those scores. For example, a teacher
conducts a unit test for a particular unit. The marks against each student depict the assessment, while
assigning them ranks is evaluation. The whole process involved is statistics.
Population: Refers to the persons/objects from which the sample is drawn. E.g., the 400 students of 8th std.,
the whole students of a school, all teachers.
Sample: A small portion of the population selected for the actual study. It represents all the characteristics of
the population. E.g., 25 students selected from 8th std., 20 teachers selected from different schools.
Evidence or fact which describes a group or a situation and from which a conclusion is drawn is
called data (Biswal & Dash, 2009).
Types of Data
Qualitative Data/Series: The data (facts, items, events, persons, phenomena, etc.) expressed in qualitative
terms are called qualitative data. Qualitative data are not measurable on a scale. For example, the gender of
students, type of school, mode of transportation of children, etc.
Quantitative Data/Series: The data expressed in numerical form are called quantitative data. Such data
are measurable and countable. For example, the number of children who passed tenth grade,
children's attendance on a particular day, etc.
Continuous Series/Data: The data expressed in sequential form are called continuous data. There will
not be any gap between the numbers. Continuous data can be expressed as fractions. For example, the
height of children may be 5'7'' or 5'4'' and so on.
Discrete Series/Data: If the data expressed have gaps in between, such data are called discrete data.
Discrete data are represented as whole numbers and not as fractions. For example, number of children in a
particular class, number of periods in a day, etc.
Often, numerical data are complex and difficult to interpret and understand. This is true for
common people and investigators and, in our context, for teachers. So a more interesting and attractive
kind of representation came into practice, namely the graphical form of data representation. In
graphical representation, the data are represented as geometric figures which can be easily interpreted and
understood by anyone. But the geometric picture needs to be drawn keeping in mind the proportion
and measurements of the data. Thus it is possible to visualize and transform numerical data into a picture or
graphic format drawn in a reasonable proportion. A graph represents numerical data in a
geometric figure drawn to scale.
Limitations
It provides only an approximate idea and therefore is not suitable for situations where greater
accuracy is needed.
It cannot show as many facts as a statistical table can.
A two- or three-dimensional representation can be more difficult to understand than a statistical table.
The data in the form of raw scores are called ungrouped data, while data organized in a frequency
distribution are called grouped data. The following are the different types of graphs/diagrams which we use
when the data are ungrouped.
i. Bar graph or bar diagram
ii. Circle or Pie graphs/diagram
The pie diagram/pie graph, also known as a circle graph, represents statistical data as a circular
figure in which each sector is proportional to the weightage of the corresponding part of the data.
The percentage break-ups are represented in pie diagrams.
To construct a pie diagram, knowledge of angle measurements and percentages is necessary.
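For illustration, the Python sketch below (with hypothetical transport-mode counts) converts each category's share of the total into a percentage and the corresponding sector angle out of 360 degrees.

```python
# Hypothetical example: converting category counts into pie-diagram angles.
modes = {"Bus": 20, "Walk": 10, "Bicycle": 15, "Car": 5}   # assumed counts

total = sum(modes.values())
for mode, count in modes.items():
    percentage = 100 * count / total   # percentage break-up
    angle = 360 * count / total        # sector angle in degrees
    print(f"{mode}: {percentage:.1f}% -> {angle:.1f} degrees")
```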
When the raw scores are arranged in frequency distribution, the data obtained is called grouped data. The
following are the graphical representations of grouped data.
Histogram or column diagram
Frequency polygon
Frequency Curve
Cumulative Frequency Graph
Cumulative Frequency Percentage Curve or Ogive.
Histogram
A histogram is essentially a bar graph of a frequency distribution. But a histogram is used when the
statistical data are arranged in class intervals. Here the frequency is represented using adjacent vertical
rectangles. Generally, the class intervals are depicted on the x axis and the frequencies on the y axis. Thus the
base of each rectangle represents the class interval and its height the frequency. A histogram is therefore the
graphical representation of grouped data in the form of vertical bars (of equal width) whose areas are
proportional to the frequencies represented. It is to be noted that histograms cannot be constructed with
open-end classes.
To construct a histogram from a frequency distribution, the following process is
followed. First, the limits of the class intervals are calculated; that is, both the lower limit and the
upper limit of each class interval are found out. The lower and upper limits are plotted on the x axis and
the frequencies on the y axis. Thereafter, each class interval is depicted using adjacent
rectangular bars of equal width.
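A minimal Python sketch of this process is given below; the class intervals and frequencies are assumed for illustration, and the matplotlib library is assumed to be available for plotting.

```python
# A minimal sketch of a histogram for a grouped frequency distribution.
# The class intervals and frequencies are hypothetical; matplotlib is assumed.
import matplotlib.pyplot as plt

lower_limits = [0, 10, 20, 30, 40]   # lower limit of each class interval
frequencies  = [4, 9, 15, 8, 3]      # frequency of each class interval
width = 10                           # equal class-interval width

plt.bar(lower_limits, frequencies, width=width, align="edge", edgecolor="black")
plt.xlabel("Class intervals (scores)")
plt.ylabel("Frequency")
plt.title("Histogram of a grouped frequency distribution")
plt.show()
```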
Frequency Polygon
A polygon is a closed figure with many sides. A frequency polygon is a line-graph representation of
statistical data/a frequency distribution. To construct a frequency polygon, the midpoints of the tops of the
histogram bars are joined together and the two ends are connected to the base line (x axis). As the end points
touch the base line, a closed figure is formed, and hence the name frequency polygon.
To draw a frequency polygon, first of all, the midpoints of the class intervals are found out and are
represented using the letter 'X'. The midpoints of the class intervals are represented on the X-axis. The
frequencies of the class intervals are indicated on the Y-axis. Then the corresponding frequency is plotted against
each midpoint in the graph, and the points are connected using straight lines. Finally, the start and end points of
the frequency polygon are connected to '0' on the x axis. This can be achieved by adding an extra class
interval, with zero frequency, below the lowest and above the highest class interval. This helps to create a closed
polygon.
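The following Python sketch illustrates the same steps with assumed midpoints and frequencies; an extra class with zero frequency is added at each end so that the polygon closes on the x axis.

```python
# A minimal sketch of a frequency polygon; midpoints and frequencies are hypothetical.
import matplotlib.pyplot as plt

midpoints   = [5, 15, 25, 35, 45]    # midpoints of the class intervals
frequencies = [4, 9, 15, 8, 3]

# add an empty class interval below and above the distribution
xs = [-5] + midpoints + [55]
ys = [0] + frequencies + [0]

plt.plot(xs, ys, marker="o")
plt.xlabel("Midpoints of class intervals")
plt.ylabel("Frequency")
plt.title("Frequency polygon")
plt.show()
```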
Frequency Curve
A frequency curve is obtained by joining the points of the frequency polygon by a freehand smoothed curve.
To plot a frequency curve without using a histogram, we plot the frequency of each class against its class
mark and join the points with a smooth curve. The following are the characteristics of a frequency curve:
A frequency curve is formed by smoothly joining the consecutive points on the graph in a
specific pattern.
A frequency curve can also be drawn with the help of a histogram by joining the midpoints of its
rectangles.
The frequency polygon and the frequency curve are the same, except that the frequency curve is drawn
freehand while the frequency polygon is drawn using a scale.
The third method of representing grouped data is through cumulative frequencies. A curve that
represents the cumulative frequency distribution of grouped data on a graph is called a Cumulative Frequency
Curve or an Ogive.
In a cumulative frequency graph, the frequencies are added successively and the resulting cumulative
frequencies are plotted on the graph. To draw the cumulative frequency graph, an extra class interval is
added at the lowest limit, with frequency 0. Thereafter, the frequencies are added and thus the
cumulative frequencies are found out. These cumulative frequencies are then plotted on the graph.
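A minimal Python sketch of this procedure is shown below; the class limits and frequencies are assumed for illustration.

```python
# A minimal sketch of a (less-than) cumulative frequency curve or ogive.
# Upper limits and frequencies are hypothetical; matplotlib is assumed.
import matplotlib.pyplot as plt

upper_limits = [10, 20, 30, 40, 50]
frequencies  = [4, 9, 15, 8, 3]

# extra class at the lowest limit with frequency 0, then running totals
xs = [0] + upper_limits
cumulative = [0]
for f in frequencies:
    cumulative.append(cumulative[-1] + f)

plt.plot(xs, cumulative, marker="o")
plt.xlabel("Upper limits of class intervals")
plt.ylabel("Cumulative frequency")
plt.title("Cumulative frequency curve (ogive)")
plt.show()
```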
Measures of Central Tendency
The most common statistics that we use in school are the measures of central tendency. Central
tendency is the central value or score of a group. For example, if the scores of 30 students in a particular
subject are given, we can calculate the average score of that group and then compare an individual
score with the central value/score of that group. Central tendency also helps us to compare the
central values of two different groups; this may be between different sections of the same class, between two
different classes in the same school, or between the same standard in two different schools. We use
measures of central tendency for making inter- and intra-group comparisons. The most commonly used
measures of central tendency are:
Mean
Median
Mode
According to Yule, the following are the requirements an ideal measure should satisfy:
It should be rigidly defined, so that the definition is clear and leads to only one
interpretation by different persons; otherwise it could lead to bias.
It should be easy to understand and calculate.
It should be based on all the observations, i.e. the entire data should be used and no information
should be deleted or ignored.
It should be suitable for further treatment, i.e. further calculations can be based on it.
It should be affected as little as possible by fluctuations of sampling.
It should not be affected much by extreme scores.
Mean: The mean is the sum of all the scores in a distribution divided by the total number of scores.
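For illustration, a minimal Python sketch with hypothetical scores:

```python
# Mean = sum of all scores / number of scores (hypothetical test scores).
scores = [12, 15, 10, 18, 20]
mean = sum(scores) / len(scores)
print(mean)   # 15.0
```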
Properties of Mean:
Unlike other measures of central tendency, the mean is responsive to the exact position of each
score in the distribution.
The mean is the balance point of a distribution.
The mean is more sensitive to the presence or absence of scores at the extremes than the
median and the mode.
When the measure of central tendency should reflect the total of the scores, the mean is the best measure
to use, e.g. test scores, intelligence scores.
When we need to do further statistical calculations, the mean can be used.
The mean is normally stable from one sample to another.
Advantages:
It is simple and easy to calculate and understand. The mean is useful for comparison between
different distributions.
It is not necessary that the items be arranged in order.
Disadvantages:
Median: It is the middle value that divides the distribution into two equal parts, i.e. half of the values lie
above the median and half below.
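For illustration, a minimal Python sketch with hypothetical scores (one odd-sized and one even-sized set):

```python
# Median: the middle score of an ordered distribution; for an even number of
# scores, the mean of the two middle scores is taken.
def median(scores):
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # odd: the single middle score
    return (s[mid - 1] + s[mid]) / 2     # even: mean of the two middle scores

print(median([7, 3, 9, 5, 11]))   # 7
print(median([7, 3, 9, 5]))       # 6.0
```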
Properties of Median:
The median responds to how many scores lie below or above it, but not to how far away the scores
may be. Hence the median is less sensitive than the mean to the presence of a few extreme scores.
The median may be a better choice for measuring central tendency in distributions that contain
deviant (extreme) scores.
The median can be used for open-ended distributions.
Advantages:
Disadvantages:
In the case of an even number of observations, the median cannot be located exactly. In such cases
we take the value midway between the two middle observations.
The median cannot be used for further calculation, i.e. we cannot apply other statistical treatments after the
calculation of the median.
Mode: In an ungrouped distribution, the mode is the score that occurs with the greatest frequency. The word
literally means 'popular style' (fashion).
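For illustration, a minimal Python sketch with hypothetical scores:

```python
# Mode: the score that occurs with the greatest frequency (hypothetical scores).
from collections import Counter

scores = [4, 7, 7, 2, 9, 7, 4]
mode, frequency = Counter(scores).most_common(1)[0]
print(mode, frequency)   # 7 occurs 3 times
```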
Properties of Mode
The mode is the only score or measure that can be used for qualitative analysis.
It describes which scores occur most frequently.
It can be used with any scale of measurement.
It is possible for a distribution to have more than one mode.
When a quick and approximate measure of central tendency is wanted, we use the mode.
When the measure of central tendency should be the most typical value, e.g. when you want to know
the dress style used most frequently by college students, we use the mode.
Advantages
Easy to calculate
Can be calculated on any scale
It is useful in the study of popular sizes, e.g. a manufacturer producing large numbers of the sizes that are in
greatest demand.
It is useful for qualitative analysis
Disadvantages
It is not a very stable measure of central tendency, as its position might shift whenever the
distribution is altered.
It cannot be subjected to further statistical treatment
Measures of Dispersion/Variability
The statistical measures used to determine the nature and extent of dispersion of the scores are
known as measures of dispersion or measures of variability. The important measures of dispersion are
Range
Quartile Deviation
Mean Deviation
Standard Deviation
Range: The range is the simplest measure of variability. It is calculated by subtracting the lowest score
in the series from the highest. Though easy to calculate, the range is not a good measure of dispersion
because it takes into consideration only the end scores, which quite often may be extreme cases.
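For illustration, a minimal Python sketch with hypothetical scores:

```python
# Range = highest score - lowest score (hypothetical scores).
scores = [35, 48, 52, 61, 74]
score_range = max(scores) - min(scores)
print(score_range)   # 39
```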
Limitations
It helps us to make only a rough comparison of two or more groups with respect to the variability
of the scores concerned.
It takes into account only the two extreme scores of a distribution and is unreliable when N is
small or when there are large gaps in the frequency distribution.
It is very greatly affected by fluctuations in sampling; its value is never stable.
The range does not take into account the composition of a group or the nature of the distribution of
the scores within the extremes. The range of a symmetrical and an asymmetrical distribution can
be identical.
When to use a Range
When knowledge of the difference between the two extreme scores is all that is wanted.
When the data are too scant or too scattered to justify the computation of a more precise measure
of variability.
Quartile Deviation: The second measure of dispersion is the Quartile Deviation. It is also known
as the 'semi-interquartile range'. Half the distance between the first and third quartiles is known as the
'Quartile Deviation', i.e. QD = (Q3 - Q1)/2. The quartile deviation is easy to calculate and interpret; it is
independent of the problem of extreme values and, therefore, is more representative and authentic than the range.
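For illustration, the Python sketch below computes the quartile deviation for a hypothetical set of scores. Note that quartiles can be computed by more than one convention; this sketch uses the default (exclusive) method of Python's statistics module.

```python
# Quartile deviation (semi-interquartile range) = (Q3 - Q1) / 2.
import statistics

scores = [20, 23, 25, 28, 30, 33, 35, 38, 40, 45, 50]   # hypothetical scores
q1, q2, q3 = statistics.quantiles(scores, n=4)           # quartile cut points
quartile_deviation = (q3 - q1) / 2
print(q1, q3, quartile_deviation)
```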
Advantages
Limitations
The mean deviation (MD), also known as the average deviation (AD), is the mean of the deviations of
all the individual scores in a distribution taken from the mean. While calculating the mean
deviation, no account is taken of signs and, therefore, all deviations, whether having a plus or minus sign, are
treated as positive.
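For illustration, a minimal Python sketch with hypothetical scores:

```python
# Mean deviation = mean of the absolute deviations of the scores from their mean.
scores = [10, 12, 15, 18, 20]   # hypothetical scores
mean = sum(scores) / len(scores)
mean_deviation = sum(abs(x - mean) for x in scores) / len(scores)
print(mean, mean_deviation)     # 15.0 3.2
```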
Advantages
It is easily understood.
It is based on all the scores.
It is not affected very much by the values of the extreme items.
Limitations
It ignores the algebraic signs of the deviations and as such it is not capable of further
mathematical treatment.
When to use Mean Deviation
When it is desired to weigh all deviations from the mean according to their size.
When extreme deviations would influence Standard Deviation unduly.
Standard Deviation
Standard deviation (SD) is the most commonly and frequently used measure of dispersion, because SD is
the most consistent and stable index of variability, and it is usually used in experimental research.
The square root of the average of the squared deviations of all scores from their mean is known as the
Standard Deviation. The standard deviation differs from the mean deviation in many respects. In calculating
the mean deviation, we ignore signs and treat all deviations as positive, whereas in the standard deviation we
avoid such complexities of signs by squaring the individual deviations. Also, unlike the mean deviation, in
the standard deviation the squared deviations used in the process are always taken from the mean only,
never from the median or mode. The commonly acknowledged symbol for the standard deviation is the
Greek letter 'σ' (sigma).
In conclusion, it may be said that 'the sum of the squared deviations from the mean, divided by
N, is known as the variance, and the square root of the variance is known as the standard deviation'.
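For illustration, a minimal Python sketch with hypothetical scores, computing the variance (sum of squared deviations from the mean divided by N) and its square root, the standard deviation:

```python
# Variance = sum of squared deviations from the mean / N; SD = its square root.
import math

scores = [10, 12, 15, 18, 20]   # hypothetical scores
n = len(scores)
mean = sum(scores) / n
variance = sum((x - mean) ** 2 for x in scores) / n
sd = math.sqrt(variance)
print(variance, sd)             # 13.6 and about 3.69
```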
Advantages
Limitations
When to use Standard Deviation
When a measure of deviation having the greatest stability and reliability is sought.
When the coefficient of correlation and other statistics are subsequently to be computed.
When interpretations related to the normal probability curve are desired.
CORRELATION
Correlation may be defined as the relationship between two variables. Two variables are said to
be correlated if the change in one variable results in a corresponding change in the other variable. That is
when two variables move together, we say they are correlated. For example, when the price of a
commodity rises, the supply for that commodity also rises. On the other hand, if price falls the supply also
falls. So both variables move together or they move in sympathy. Hence price and supply are correlated.
If the change in one variable appears to be accompanied by a change in the other variable, the
two variables are said to be co-related and this inter-variation is called correlation.
Coefficient of Correlation
The ratio indicating the degree of relationship between two related variables is called the Coefficient
of Correlation. For a perfect positive correlation the coefficient of correlation is +1 and for a perfect
negative correlation it will be -1. Perfect positive or negative correlation is possible only in the physical sciences. In
social sciences like education, the correlation between pairs of variables will lie within the limits of +1
and -1. A positive coefficient of correlation varies from 0 to +1 and a negative coefficient of correlation varies
from 0 to -1. Zero correlation indicates no consistent relationship.
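For illustration, the Python sketch below computes Pearson's product-moment coefficient of correlation, one common way of obtaining such a coefficient, for a hypothetical set of paired scores.

```python
# Pearson's product-moment coefficient of correlation for paired (x, y) scores.
import math

x = [2, 4, 6, 8, 10]    # hypothetical scores on the first variable
y = [3, 5, 7, 11, 14]   # hypothetical scores on the second variable

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))

r = sum_xy / (sd_x * sd_y)
print(round(r, 3))      # a value between -1 and +1
```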
Types of Correlation
There are three types of correlation, namely, positive, negative and zero. This classification is
done on the basis of the direction of the relation between variables.
Positive Correlation: If, when the first variable increases or decreases, the other also increases or decreases
respectively, their relationship is said to be positive because they move in the same direction.
Negative Correlation: If, when the first variable increases, the other decreases (or vice versa), their
relationship is said to be negative because they move in the opposite direction.
Simple, partial and multiple correlation: In the study of relationships between variables, if there are only
two variables, the correlation is said to be simple. For e.g., the correlation between price and demand is
simple.
Multiple correlation measures the degree of association between one variable on one side
and all the other variables together on the other side. Thus the relationship of yield with both rainfall
and temperature together is a multiple correlation.
In partial correlation we study the relationship of one variable with one of the other variables, presuming
that the remaining variables are constant. For e.g., let there be three variables: yield, rainfall and temperature.
Each is related to the others. Then the relationship between yield and rainfall (assuming temperature is
constant) is a partial correlation.
Interpretation of Correlation