
Unit 3 Tech

The document provides an overview of basic statistics, emphasizing its importance in education for analyzing and interpreting assessment data. It distinguishes between various types of statistics, including descriptive, inferential, and analytical, and discusses the significance of data types and graphical representation in understanding student performance. Additionally, it covers measures of central tendency, such as mean, median, and mode, and their applications in educational assessment.

Uploaded by Abhay Krishnan
Copyright © All Rights Reserved

UNIT 3

BASIC STATISTICS FOR ANALYSIS AND INTERPRETATION OF ASSESSMENT

The mathematical process of collecting, organizing, analyzing and interpreting data is termed “statistics”. In statistics, data related to any individual, organization, behavior, etc. are expressed in numerical form. The word statistics is derived from the Latin word ‘Status’, the Italian word ‘Statista’, the German word ‘Statistik’ and the French word ‘Statistique’, all of which mean ‘political state’. It provides data concerning the various attributes of a state or country that help in its successful administration.

Seligman defines statistics as “the science which deals with the methods of collecting, classifying, presenting, comparing and interpreting numerical data collected to throw some light on any sphere of enquiry”.

In fact, educationists and psychologists use statistics widely to study human behavior. Statistics also helps a teacher analyze and judge students’ performance.

Descriptive Statistics: It involves the methods of collecting, classifying, organizing, picturing and summarizing observations or information from data and drawing statistical interpretations or conclusions about the data. It is concerned with a particular group of individuals.

Inferential Statistics: It involves the methods of using information from samples to arrive at conclusions about the population. It is thus concerned with samples selected from the population.

Inductive Statistics: It involves the methods of generating statistical interpretations about one group or individual based on the interpretations made about another group or individual. It is sometimes called predictive statistics.

Analytical Statistics: It involves the methods of arriving at statistical interpretations by comparing two or more sets of data.

Applied Statistics: It involves the application of statistics to study real-life situations such as educational growth, economic status, political matters, etc.

Statistics plays a pivotal role in education. Without statistics, it is difficult for the teacher to interpret and judge the learning performance of children. Not only learning performance but also students’ performance in co-curricular and extracurricular activities is analysed and interpreted using statistics. For example, during a sports meet, the running time of each student participating in the 100 m race is recorded in numerical terms. Assessment and evaluation are two critical aspects of the teaching-learning process. Assessment is the process of collecting scores, and evaluation is the interpretation of those scores. For example, a teacher conducts a unit test for a particular unit. The marks obtained by each student constitute the assessment, while assigning ranks to the students is evaluation. The whole process is statistical.
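As a small illustration of the assessment-then-evaluation step described above, the Python sketch below turns a set of marks into ranks. The student names and marks are hypothetical, and the tie rule (tied marks share a rank and the next rank is skipped) is one common convention assumed here, not one the text prescribes.

```python
# Hypothetical unit-test marks (assessment: the raw scores).
marks = {"Anu": 42, "Biju": 35, "Chitra": 42, "Dev": 28}

def assign_ranks(scores):
    """Rank students from highest to lowest mark (evaluation step).

    Assumed convention: tied marks share the same rank and the next
    rank is skipped ("competition" ranking).
    """
    ordered = sorted(scores.values(), reverse=True)
    # Index of the first occurrence = number of marks strictly above it.
    return {name: ordered.index(mark) + 1 for name, mark in scores.items()}

ranks = assign_ranks(marks)
print(ranks)  # {'Anu': 1, 'Biju': 3, 'Chitra': 1, 'Dev': 4}
```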

The following points show the importance of statistics in assessment and evaluation:

• It helps teachers analyze and interpret scores.
• It helps teachers compare the scores of different groups within the school or outside it.
• It helps teachers construct standardized achievement tests.
• It helps determine individual differences among children in intelligence, aptitude, attitude, personality, etc.
• When standardized tests are administered to children, the results obtained are analysed and used for providing guidance and counselling services.
• It helps a teacher predict the future performance of children.
• It helps a teacher in the selection, categorization and promotion of students.
• It helps a teacher compare the functioning of his/her organization with that of others.
• One of the most important applications of statistics is its use in educational research.

Population and Sample

Population
• Refers to the persons or objects from which the sample is drawn.
• E.g.: the 400 students of 8th std.; all the students of a school; all teachers.

Sample
• A small portion of the population selected for the actual study.
• It should represent all the characteristics of the population.
• E.g.: 25 students selected from 8th std.; 20 teachers selected from different schools.
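The text does not say how a sample is chosen. One common way to obtain a representative sample, assumed in the Python sketch below, is simple random sampling, where every member of the population has an equal chance of selection. The roll numbers 1 to 400 are a hypothetical stand-in for the 400 students of 8th std., and the fixed seed only makes the draw repeatable.

```python
import random

# Hypothetical population: roll numbers 1-400 for the 400 students of 8th std.
population = list(range(1, 401))

# Simple random sample of 25 students, drawn without replacement.
# The fixed seed only makes this illustration repeatable.
rng = random.Random(7)
sample = rng.sample(population, 25)

print(sorted(sample))
```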

Data and Its Types

Data is the plural form of the word ‘datum’, which means ‘fact’. In statistics, data represents any kind of information obtained from an experiment, observation, interview or any other investigatory procedure. In the context of schools, the total number of students, the number of teachers, the periods allotted for each subject, the number of absentees each day, the marks scored by children in term-end examinations and assignments, participation in co-curricular activities, the mode of transportation used to reach school, etc. all represent data. This information (or data) helps a teacher in many ways, such as judging children’s learning performance, recognizing children’s talents in extracurricular activities, and providing guidance and counselling services.

Evidence or facts that describe a group or a situation and from which conclusions are drawn are called data (Biswal & Dash, 2009).

Types of Data

Primary Data
• Reported by an actual observer or participant regarding a situation, event, happening, etc.
• First-hand information or fact.
• E.g.: official records, interview records.

Secondary Data
• Reported by persons other than the original observers or participants about an event, happening, situation, etc.
• Second-hand information or fact.
• E.g.: information from newspapers, journals, articles, etc.

Qualitative Data/Series: Data (facts, items, events, persons, phenomena, etc.) expressed in qualitative terms are called qualitative data. Qualitative data are not measurable on a scale. For example, gender of students, type of school, mode of transportation of children, etc.

Quantitative Data/Series: Data expressed in numerical form are called quantitative data. Such data are measurable and countable. For example, the number of children who passed tenth grade, children’s attendance on a particular day, etc.

Continuous Series/Data: Data that can take any value within a range are called continuous data. There are no gaps between possible values, so continuous data can be expressed in fractions. For example, the height of a child may be 5’7’’, 5’4½’’ and so on.

Discrete Series/Data: If the data have gaps between possible values, they are called discrete data. Discrete data are represented as whole numbers and not as fractions. For example, the number of children in a particular class, the number of periods in a day, etc.

GRAPHICAL REPRESENTATION OF DATA

Numerical data are often complex and difficult to interpret and understand. This is true for common people and investigators alike and, in our context, for teachers. So a more interesting and attractive kind of representation came into practice: the graphical form of data representation. In graphical representation, the data are represented as geometric figures that can be easily interpreted and understood by anyone. However, the geometric figure needs to be drawn keeping in mind the proportions and measurements of the data. Thus numerical data can be visualized and transformed into a picture or graphic drawn in reasonable proportion. A graph represents numerical data as a geometric figure drawn to scale.

Objectives of Graphical Representation of Data

• It makes complex statistical data simple.
• It is used to compare two or more variables in a very short time.
• It gives a clear idea of the variation in the values of variables.

Importance of Graphical Representation

• Graphical representations are attractive and beautiful.
• They make data easy to visualize and are appealing to the eye.
• They facilitate trouble-free interpretation and judgment.
• They give a bird’s-eye view of the entire data.
• They are easy to construct.
• They save time and money.

Limitations

• A graph provides only an approximate idea and is therefore not suitable where greater accuracy is needed.
• It cannot show as many facts as a statistical table can.
• Two- or three-dimensional representations can be more difficult to understand than a statistical table.

Rules for Constructing Diagrams

• Every diagram should have a proper heading.
• Headings should be simple and clear.
• Diagrams should be drawn to a proper and accurate scale.
• A diagram must be accompanied by the original data from which it is drawn.
• Proper indications such as dots, lines, etc. should be used.
• A proper index (key) should be provided to interpret symbols, colours, lines, etc.

Graphical Representation of Ungrouped Data

The data in the form of raw scores are called ungrouped data, while data organized in a frequency distribution are called grouped data. The following are the different types of graphs/diagrams used when the data are ungrouped.
i. Bar graph or bar diagram
ii. Circle or Pie graphs/diagram

Bar graph or Bar Diagram

• It plots the frequencies (or values) of the data on X-Y axes.
• The items or categories are plotted on the X-axis and their frequencies on the Y-axis.
• Each item is represented by a rectangle (bar) of uniform width whose height is equal to the frequency associated with that item. The bars are usually separated by equal gaps.

Circle or Pie graphs/diagram

• The pie diagram (pie graph), also known as a circle graph, represents statistical data as a circle divided into sectors, each sector proportional to the share of the data it represents.
• Percentage break-ups are represented in pie diagrams.
• To construct a pie diagram, knowledge of angle measurement and percentages is necessary.
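The angle arithmetic behind a pie diagram can be sketched in Python: each sector's angle is its share of the total multiplied by 360 degrees, and its percentage is its share multiplied by 100. The transport counts below are hypothetical.

```python
# Hypothetical data: mode of transportation of 60 children.
transport = {"Walk": 15, "Bus": 30, "Bicycle": 10, "Car": 5}
total = sum(transport.values())

angles = {}
for category, count in transport.items():
    angles[category] = count * 360 / total   # sector angle in degrees
    percent = count * 100 / total            # percentage break-up
    print(f"{category:8s} {percent:5.1f}%  {angles[category]:6.1f} degrees")

print(sum(angles.values()))  # 360.0, the full circle
```

Multiplying before dividing keeps these particular angles exact in floating point.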

Graphical Representation of Grouped Data

When the raw scores are arranged in a frequency distribution, the data obtained are called grouped data. The following are the graphical representations of grouped data:
• Histogram or column diagram
• Frequency polygon
• Frequency curve
• Cumulative frequency graph
• Cumulative frequency percentage curve or ogive

Histogram

A histogram is essentially a bar graph of a frequency distribution, but it is used when the statistical data are arranged in class intervals. Here the frequencies are represented by adjacent vertical rectangles. Generally, the class intervals are shown on the X-axis and the frequencies on the Y-axis, so the base of each rectangle represents a class interval and its height the frequency. Thus a histogram is the graphical representation of grouped data in the form of vertical bars (of equal width when the class intervals are equal) whose area is proportional to the frequency represented. Note that histograms cannot be constructed with open-ended classes.

To construct a histogram using the frequency distribution given above, the following process is followed. First, the limits of the class intervals are calculated: both the lower limit and the upper limit of each class interval are found. The lower and upper limits are plotted on the X-axis and the frequencies on the Y-axis. Thereafter, each class interval is depicted by an adjacent rectangular bar of equal width.
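The preparation steps can be sketched in Python. The scores, the class width of 5 and the starting class 10-14 are hypothetical. The sketch also assumes the common textbook convention of exact (true) limits, which for whole-number scores extend each stated limit by 0.5 so that adjacent bars touch; the bars are printed as rows of # characters rather than drawn.

```python
# Hypothetical whole-number scores of 20 students.
scores = [12, 15, 18, 22, 25, 27, 11, 19, 23, 24,
          16, 28, 21, 14, 26, 20, 17, 29, 13, 22]

width = 5
# Stated class intervals 10-14, 15-19, 20-24, 25-29.
classes = [(low, low + width - 1) for low in range(10, 30, width)]

table = []
for low, high in classes:
    freq = sum(low <= s <= high for s in scores)   # count scores in this class
    exact = (low - 0.5, high + 0.5)                # assumed exact (true) limits
    table.append((exact, freq))

# Text "histogram": one row per class, bar length = frequency.
for (lower, upper), freq in table:
    print(f"{lower:5.1f} - {upper:5.1f} | {'#' * freq}")
```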

Frequency Polygon

A polygon is a closed figure with many sides. A frequency polygon is a line-graph representation of a frequency distribution. To construct a frequency polygon, the midpoints of the tops of the histogram bars are joined together and the two ends are connected to the baseline (X-axis). As the end points touch the baseline, a closed figure is formed; hence the name frequency polygon.

To draw a frequency polygon, first the midpoints of the class intervals are found and denoted by the letter ‘X’. The midpoints are represented on the X-axis and the frequencies of the class intervals on the Y-axis. The frequency of each class is then plotted against its midpoint, and the points are connected by straight lines. Finally, the start and end points of the polygon are brought down to zero frequency on the X-axis. This is achieved by adding an extra class interval with zero frequency below the lowest class and above the highest class, which closes the polygon.
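The point-plotting step can be sketched in Python, assuming hypothetical classes of width 5. The sketch adds one zero-frequency class at each end and returns the (midpoint, frequency) pairs that would be joined by straight lines.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(10, 14, 4), (15, 19, 5), (20, 24, 6), (25, 29, 5)]
width = 5

# Extra zero-frequency classes below the lowest and above the highest
# class, so the polygon closes on the x axis.
extended = ([(classes[0][0] - width, classes[0][1] - width, 0)]
            + classes
            + [(classes[-1][0] + width, classes[-1][1] + width, 0)])

# Each point is (midpoint X, frequency), where X = (lower + upper) / 2.
points = [((low + high) / 2, f) for low, high, f in extended]
print(points)  # [(7.0, 0), (12.0, 4), (17.0, 5), (22.0, 6), (27.0, 5), (32.0, 0)]
```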
Frequency Curve

A frequency curve is obtained by joining the points of a frequency polygon with a freehand smoothed curve. To plot a frequency curve without using a histogram, we plot the frequency of each class against its class mark (midpoint) and join the points with a smooth curve. The following are the characteristics of a frequency curve:
• A frequency curve is formed by smoothly joining the consecutive points on the graph, following the overall pattern of the points.
• A frequency curve can also be drawn with the help of a histogram by joining the midpoints of the tops of its rectangles.
• The frequency polygon and the frequency curve are the same except that the frequency curve is drawn freehand, while the frequency polygon is drawn using a ruler (scale).

Cumulative Frequency Curve or Ogive

Another method of representing grouped data is through cumulative frequencies. A curve that represents the cumulative frequency distribution of grouped data on a graph is called a cumulative frequency curve or an ogive.

In a cumulative frequency graph, the frequencies are added successively and the resulting cumulative frequencies are plotted. To draw the graph, an extra class interval with frequency 0 is added below the lowest class. Thereafter, the frequencies are added up class by class to obtain the cumulative frequencies, which are then plotted on the graph.
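The cumulative-frequency step can be sketched in Python with `itertools.accumulate`. The classes are hypothetical, and the sketch assumes the usual convention of plotting each cumulative frequency against the upper exact limit of its class. It also converts the cumulative frequencies to percentages, as used for a cumulative frequency percentage curve.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(10, 14, 4), (15, 19, 5), (20, 24, 6), (25, 29, 5)]

# Extra class below the lowest one with frequency 0, then running totals.
freqs = [0] + [f for _, _, f in classes]
cum_freqs = list(accumulate(freqs))                 # [0, 4, 9, 15, 20]

# Upper exact limit of each class; the extra class ends where 10-14 begins (9.5).
upper_limits = [classes[0][0] - 0.5] + [high + 0.5 for _, high, _ in classes]
ogive_points = list(zip(upper_limits, cum_freqs))

# Cumulative frequency percentages for the percentage curve.
n = cum_freqs[-1]
cum_percents = [cf * 100 / n for cf in cum_freqs]   # [0.0, 20.0, 45.0, 75.0, 100.0]
print(ogive_points)
```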

Measures of Central Tendency

The most common statistics used in school are the measures of central tendency. Central tendency is the central value or score of a group. For example, given the scores of 30 students in a particular subject, we can calculate the average score of that group and then compare each individual score with that central value. Central tendency also helps us compare the central values of two different groups, whether they are different sections of the same class, two different classes in the same school, or the same standard in two different schools. Thus we use measures of central tendency for making inter-group and intra-group comparisons. The most commonly used measures of central tendency are:
• Mean
• Median
• Mode

According to Yule, an ideal measure of central tendency should have the following characteristics:

• It should be rigidly defined, so that the definition is clear and leads to only one interpretation by different persons; otherwise it could lead to bias.
• It should be easy to understand and calculate.
• It should be based on all the observations: the entire data should be used and no information should be deleted or ignored.
• It should be suitable for further mathematical treatment.
• It should be affected as little as possible by fluctuations of sampling.
• It should not be affected much by extreme scores.
Mean: The mean is the sum of all scores in a distribution divided by the total number of scores.
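In symbols, Mean = ΣX / N, where ΣX is the sum of the scores and N the number of scores. The Python sketch below applies this definition to ungrouped scores; the marks are hypothetical.

```python
def mean(scores):
    """Arithmetic mean: sum of all scores divided by the number of scores (N)."""
    if not scores:
        raise ValueError("the mean needs at least one score")
    return sum(scores) / len(scores)

marks = [35, 42, 28, 50, 40]  # hypothetical unit-test marks
print(mean(marks))  # 39.0
```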

Properties of Mean:
• Unlike the other measures of central tendency, the mean is responsive to the exact position of each score in the distribution.
• The mean is the balance point of a distribution.
• The mean is more sensitive to the presence or absence of scores at the extremes than the median and mode.
• When a measure of central tendency should reflect the total of the scores, the mean is the best choice, e.g., for test scores and intelligence scores.
• When further statistical calculations are needed, the mean can be used.
• The mean is normally stable from one sample to another.

When can we use Mean

• When the scores are distributed symmetrically around a central point.
• When the measure of central tendency with the greatest stability is wanted, as the mean is more stable than the median and mode.
• When we want to compute other statistics later on.

Advantages:

• It is simple, easy to calculate and easy to understand. The mean is useful for comparison between different distributions.
• The items need not be arranged in order.

Disadvantages:

• It cannot be used for qualitative analysis.
• It is affected by extreme scores.
• If one value is missing, the mean cannot be calculated.
• The mean may lead to wrong conclusions when the distribution contains extreme scores.
• It gives more importance to larger frequencies than to smaller ones.

Median: It is the middle value that divides the distribution into two equal parts, i.e., half of the values lie above the median and half below.
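For ungrouped scores, the median can be found by sorting the scores and taking the middle one. When the number of scores is even, the sketch below takes the point midway between the two middle scores, which is the usual convention. The scores are hypothetical.

```python
def median(scores):
    """Middle value of the sorted scores (midway between the two middle ones if N is even)."""
    if not scores:
        raise ValueError("the median needs at least one score")
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                # odd N: a single middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: midway between the two middle scores

print(median([8, 4, 6, 5, 7]))  # 6
print(median([3, 9, 5, 7]))     # 6.0
```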
Properties of Median:

• The median responds to how many scores lie below or above it, but not to how far away those scores are. Hence the median is less sensitive than the mean to the presence of a few extreme scores.
• The median may be a better choice for measuring central tendency in distributions that are skewed.
• The median can be used for open-ended distributions.

When can we use Median

• When the exact midpoint of the distribution is wanted.
• When there are extreme scores that would affect the mean. E.g., for the five scores 4, 5, 6, 7, 8, the mean and the median are both equal to 6; if we replace 7 with 50, the median is not affected but the mean rises to 14.6.
• When the distribution is open-ended, i.e., no upper limit is given, we can still calculate the median.
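The example in the list above can be checked with Python's standard `statistics` module (`mean` and `median` are real functions there); the comments state the values each call returns.

```python
from statistics import mean, median

original = [4, 5, 6, 7, 8]
changed = [4, 5, 6, 50, 8]  # the score 7 replaced by an extreme score, 50

# Before the change: mean = 6, median = 6
print(mean(original), median(original))

# After the change: mean = 14.6, median = 6 (the median is unaffected)
print(mean(changed), median(changed))
```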

Advantages:

• The median can be calculated for all distributions.
• It is easy to calculate.
• It is very useful in quantitative analysis.
• It is not affected by extreme scores.
• It can be calculated for an open-ended distribution.
• It can be calculated when some data are missing from the distribution.

Disadvantages:

• In the case of an even number of observations, the median cannot be located exactly at an actual score. In such cases we take the point midway between the two middle observations.
• The median is not capable of further algebraic treatment, i.e., unlike the mean, it cannot be used in further statistical calculations.

Mode: In an ungrouped distribution, the mode is the score that occurs with the greatest frequency. The word ‘mode’ means the popular style.
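Python's standard `statistics` module provides `multimode` (Python 3.8 or later), which returns every value that occurs with the greatest frequency. It works for qualitative labels as well as numbers and returns more than one value when a distribution has several modes. The data below are hypothetical.

```python
from statistics import multimode

# Hypothetical marks: 5 occurs most often, so there is a single mode.
marks = [3, 5, 5, 7, 8, 5, 7]
print(multimode(marks))  # [5]

# Hypothetical qualitative data: two categories tie, so there are two modes.
transport = ["bus", "walk", "bus", "walk", "car"]
print(multimode(transport))  # ['bus', 'walk']
```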

Properties of Mode

• The mode is the only measure of central tendency that can be used for qualitative analysis.
• It describes which scores occur most frequently.
• It can be used with any scale of measurement.
• A distribution may have more than one mode.

When can we use Mode

• When a quick and approximate measure of central tendency is wanted.
• When the measure of central tendency should be the most typical value, e.g., to find the dress style most frequently worn by college students, we use the mode.

Advantages

• It is easy to calculate.
• It can be calculated on any scale of measurement.
• It is useful in the study of popular sizes, e.g., when manufacturers decide which sizes are in the greatest demand.
• It is useful for qualitative analysis.

Disadvantages

• It is not a very stable measure of central tendency, as its position may shift whenever the distribution is altered.
• It cannot be subjected to further statistical treatment.

Measures of Dispersion/Variability

The statistical measures used to determine the nature and extent of dispersion of scores are known as measures of dispersion or measures of variability. The important measures of dispersion are:
• Range
• Quartile Deviation
• Mean Deviation
• Standard Deviation

Range: The range is the simplest measure of variability. It is calculated by subtracting the lowest score in the series from the highest. Though easy to calculate, the range is not a good measure of dispersion, because it takes into consideration only the two end scores, which quite often are extreme cases.
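A short Python sketch of the range follows. The second pair of hypothetical score lists also illustrates a limitation noted below: a symmetrical and an asymmetrical set of scores can have the same range.

```python
def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)

print(score_range([35, 42, 28, 50, 39]))  # 22

# A symmetrical and an asymmetrical set of scores with identical ranges.
symmetrical = [10, 20, 30, 40, 50]
asymmetrical = [10, 11, 12, 13, 50]
print(score_range(symmetrical), score_range(asymmetrical))  # 40 40
```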

Advantages

• It can be easily calculated and understood.


Limitations

• It allows only a rough comparison of two or more groups with respect to the variability of their scores.
• It takes into account only the two extreme scores of a distribution and is unreliable when N is small or when there are large gaps in the frequency distribution.
• It is greatly affected by fluctuations in sampling; its value is never stable.
• The range does not take into account the composition of a group or the nature of the distribution of scores between the extremes. A symmetrical and an asymmetrical distribution can have identical ranges.

When to use a Range
• When knowledge of the difference between the two extreme scores is all that is wanted.
• When the data are too scant or too scattered to justify the computation of a more precise measure of variability.

Quartile Deviation: The second measure of dispersion is the quartile deviation, also known as the ‘semi-interquartile range’. The distance between the first and third quartiles divided by two is known as the quartile deviation, i.e., QD = (Q3 - Q1) / 2. The quartile deviation is easy to calculate and interpret, and it is not affected by extreme values; it is therefore more representative and reliable than the range.
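A Python sketch using `statistics.quantiles` from the standard library (Python 3.8 or later) follows. Its default `"exclusive"` method locates Q1 and Q3 at the (N + 1)/4 and 3(N + 1)/4 positions of the sorted scores, interpolating when needed. Textbook formulas for grouped data interpolate within class intervals and can give slightly different values, so treat this as one possible convention. The marks are hypothetical, and the second call shows that an extreme top score leaves QD unchanged.

```python
from statistics import quantiles

def quartile_deviation(scores):
    """Semi-interquartile range: QD = (Q3 - Q1) / 2."""
    q1, _q2, q3 = quantiles(scores, n=4)  # default method="exclusive"
    return (q3 - q1) / 2

marks = [28, 35, 39, 42, 46, 50, 61]     # hypothetical marks, N = 7
print(quartile_deviation(marks))          # 7.5 (Q1 = 35, Q3 = 50)

# Replacing the top score with an extreme one leaves QD unchanged.
print(quartile_deviation([28, 35, 39, 42, 46, 50, 99]))  # 7.5
```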

Advantages

• It is a more representative and trustworthy measure of variability than the range.
• It is a good index of score density in the middle of the distribution.
• It is useful in indicating the skewness of a distribution.
• Like the median, it is applicable to open-ended distributions.

Limitations

• It is not capable of further algebraic treatment.
• Two distributions can have equal quartile deviations but quite dissimilar variability in the lowest and highest 25% of scores. This may lead to incorrect conclusions.
• It is unduly affected by considerable clustering of scores at either end of a distribution.

When to use the Quartile Deviation:

• When the median is taken as the measure of central tendency.
• When the details of the distribution at either end are not available.

Mean Deviation

The mean deviation (MD), also known as the average deviation (AD), is the mean of the deviations of all the individual scores in a distribution from the mean. While calculating the mean deviation, signs are ignored: all deviations, whether plus or minus, are treated as positive, i.e., MD = Σ|X - M| / N, where X is a score, M the mean and N the number of scores.
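The same steps in Python, as a minimal sketch for ungrouped, hypothetical scores: find the mean, take each score's deviation from it ignoring the sign, and average those absolute deviations.

```python
def mean_deviation(scores):
    """MD = sum of |X - M| / N, where M is the mean of the scores."""
    n = len(scores)
    m = sum(scores) / n
    return sum(abs(x - m) for x in scores) / n   # signs ignored via abs()

# Deviations from the mean 6 are 2, 1, 0, 1, 2; their average is 6 / 5.
print(mean_deviation([4, 5, 6, 7, 8]))  # 1.2
```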

Advantages

• It is easily understood.
• It is based on all the scores.
• It is not affected very much by the values of extreme items.

Limitations

• It ignores the algebraic signs of the deviations and is therefore not capable of further mathematical treatment.

When to use Mean Deviation
• When it is desired to weigh all deviations from the mean according to their size.
• When extreme deviations would influence the standard deviation unduly.

Standard Deviation

The standard deviation (SD) is the most commonly used measure of dispersion, because it is the most consistent and stable index of variability; it is usually used in experimental research. The square root of the average of the squared deviations of all scores from their mean is known as the standard deviation. The standard deviation differs from the mean deviation in several respects. In calculating the mean deviation, we ignore signs and treat all deviations as positive, whereas in the standard deviation we avoid the complexity of signs by squaring the individual deviations. Also, unlike the mean deviation, the squared deviations used in the standard deviation are always taken from the mean only, never from the median or mode. The commonly accepted symbol for the standard deviation is the Greek letter ‘σ’ (sigma).

In conclusion, ‘the sum of the squared deviations from the mean, divided by N, is known as the variance, and the square root of the variance is known as the standard deviation’, i.e., σ = √(Σ(X - M)² / N), where X is a score, M the mean and N the number of scores.
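The definition above translates directly into Python. Like the text, the sketch divides by N (the population formula). The standard library's `statistics.pstdev` uses the same formula, whereas `statistics.stdev` divides by N - 1 to estimate a population SD from a sample. The scores are hypothetical.

```python
import math

def standard_deviation(scores):
    """Population SD: square root of the variance, sum((X - M)^2) / N."""
    n = len(scores)
    m = sum(scores) / n
    variance = sum((x - m) ** 2 for x in scores) / n
    return math.sqrt(variance)

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean 5
print(standard_deviation(scores))   # 2.0 (variance 4.0)
```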

Advantages

• It is well defined and its value is always definite.
• It is based on all the scores in the data.
• It is amenable to algebraic treatment and possesses many useful mathematical properties, which is why it is used in many advanced statistical studies.
• It is less affected by fluctuations in sampling than most other measures of variability.

Limitations

• Statistical interpretation using the standard deviation is comparatively difficult.
• It gives more weight to extreme scores and less to those near the mean, because the deviations are squared, and these squares become very large as the deviations increase.

When to use Standard Deviation

• When a measure of dispersion with the greatest stability and reliability is sought.
• When the coefficient of correlation and other statistics are subsequently to be computed.
• When interpretations related to the normal probability curve are desired.

CORRELATION

Correlation may be defined as the relationship between two variables. Two variables are said to be correlated if a change in one variable results in a corresponding change in the other; that is, when two variables move together, we say they are correlated. For example, when the price of a commodity rises, the supply of that commodity also rises, and if the price falls, the supply also falls. Both variables move together, or in sympathy; hence price and supply are correlated. If a change in one variable appears to be accompanied by a change in the other, the two variables are said to be co-related, and this inter-variation is called correlation.

Coefficient of Correlation

The ratio indicating the degree of relationship between two related variables is called the coefficient of correlation. For a perfect positive correlation the coefficient is +1, and for a perfect negative correlation it is -1. Perfect positive or negative correlation is generally possible only in the physical sciences. In social sciences such as education, the correlation between pairs of variables usually lies somewhere between +1 and -1. A positive coefficient of correlation varies from 0 to +1, and a negative coefficient varies from 0 to -1. Zero correlation indicates no consistent relationship.
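The text does not name a formula for the coefficient. The most widely used one is Pearson's product-moment coefficient r, sketched below in Python as an assumption. r is the sum of the products of paired deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The paired scores are hypothetical and chosen to give the perfect positive and perfect negative cases described above.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment coefficient of correlation for paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross-products
    sxx = sum((a - mx) ** 2 for a in x)                   # squared deviations of x
    syy = sum((b - my) ** 2 for b in y)                   # squared deviations of y
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
```

Any real data will give a value between these two limits.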

Types of Correlation

There are three types of correlation, namely positive, negative and zero. This classification is based on the direction of the relationship between the variables.

Positive Correlation: If, when the first variable increases or decreases, the other also increases or decreases respectively, their relationship is said to be positive, because they move in the same direction.

• E.g., intelligence and achievement.

Negative Correlation: If, when the first variable increases or decreases, the other respectively decreases or increases, their relationship is said to be negative, because they move in opposite directions.

• E.g., anxiety and performance.

Zero Correlation: When no relationship is observed or established between two variables, i.e., a change in one variable is not accompanied by a change in the other in any definite pattern.

• E.g., intelligence and weight.

Linear and Non-linear Correlation

Correlation may be linear or non-linear. When the amount of change in one variable leads to a constant ratio of change in the other variable, the correlation is said to be linear. Correlation is said to be non-linear (or curvilinear) when the amount of change in one variable is not in a constant ratio to the change in the other variable. In the case of non-linear correlation, the ratio of change fluctuates and is never constant.

Simple, Partial and Multiple Correlation: In the study of the relationship between variables, if there are only two variables, the correlation is said to be simple. For example, the correlation between price and demand is simple.

In multiple correlation, we measure the degree of association between one variable on one side and all the other variables together on the other side. Thus the relationship of yield with both rainfall and temperature together is a multiple correlation.

In partial correlation, we study the relationship of one variable with one of the other variables, presuming that the remaining variables are held constant. For example, let there be three variables: yield, rainfall and temperature, each related to the others. Then the relationship between yield and rainfall, assuming temperature is constant, is a partial correlation.
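The text does not give formulas, but the standard first-order partial correlation and the multiple correlation can both be computed from the three simple correlations. In the Python sketch below, variable 1 is yield, 2 is rainfall and 3 is temperature, following the example above; the correlation values are hypothetical. r12.3 is the partial correlation of yield and rainfall with temperature held constant, and R1.23 is the multiple correlation of yield with rainfall and temperature together.

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3 (variable 3 held constant)."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

def multiple_r(r12, r13, r23):
    """Multiple correlation R1.23 of variable 1 with variables 2 and 3 together."""
    return math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))

# Hypothetical simple correlations:
# r12 = yield & rainfall, r13 = yield & temperature, r23 = rainfall & temperature
r12, r13, r23 = 0.8, 0.4, 0.3
print(round(partial_r(r12, r13, r23), 3))   # 0.778
print(round(multiple_r(r12, r13, r23), 3))  # 0.817
```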

Interpretation of Correlation

Use of Coefficient of Correlation

• It helps to determine the validity of a test.
• It helps to determine the reliability of a test.
• It can be used to ascertain the degree of objectivity of a test.
• It can test the validity of arguments for or against a statement.
• It indicates the nature of the relationship between two variables.
• It helps predict the value of one variable given the value of another related variable.
• It helps to ascertain the traits and capacities of pupils.
