ASSIGNMENT No. 2
(Units: 5-9)
Reliability and validity are concepts used to evaluate the quality of research. They indicate how
well a method, technique or test measures something. Reliability is about the consistency of a
measure, and validity is about the accuracy of a measure.
It’s important to consider reliability and validity when you are creating your research design,
planning your methods, and writing up your results, especially in quantitative research.
Reliability vs validity
What does it tell you?
Reliability: The extent to which the results can be reproduced when the research is repeated under the same conditions.
Validity: The extent to which the results really measure what they are supposed to measure.
What is reliability?
Reliability refers to how consistently a method measures something. If the same result can be
consistently achieved by using the same methods under the same circumstances, the
measurement is considered reliable.
Example: You measure the temperature of a liquid sample several times under identical conditions. The
thermometer displays the same temperature every time, so the results are reliable.
What is validity?
Validity refers to how accurately a method measures what it is intended to measure. If research
has high validity, that means it produces results that correspond to real properties,
characteristics, and variations in the physical or social world.
High reliability is one indicator that a measurement is valid. If a method is not reliable, it
probably isn’t valid.
If the thermometer shows different temperatures each time, even though you have carefully
controlled conditions to ensure the sample’s temperature stays the same, the thermometer is
probably malfunctioning, and therefore its measurements are not valid.
If a symptom questionnaire results in a reliable diagnosis when answered at different times and
with different doctors, this indicates that it has high validity as a measurement of the medical
condition.
However, reliability on its own is not enough to ensure validity. Even if a test is reliable, it may
not accurately reflect the real situation.
Example: The thermometer that you used to test the sample gives reliable results. However, the
thermometer has not been calibrated properly, so the result is 2 degrees lower than the true
value. Therefore, the measurement is not valid.
Example: A group of participants takes a test designed to measure working memory. The results are
reliable, but participants’ scores correlate strongly with their level of reading comprehension.
This indicates that the method might have low validity: the test may be measuring participants’
reading comprehension instead of their working memory.
Validity is harder to assess than reliability, but it is even more important. To obtain useful
results, the methods you use to collect your data must be valid: the research must be measuring
what it claims to measure. This ensures that your discussion of the data and the conclusions you
draw are also valid.
How are reliability and validity assessed?
Reliability can be estimated by comparing different versions of the same measurement. Validity
is harder to assess, but it can be estimated by comparing the results to other relevant data or
theory. Methods of estimating reliability and validity are usually split up into different types.
Types of reliability
Different types of reliability can be estimated through various statistical methods.
Types of validity
Content validity: The extent to which the measurement covers all aspects of the concept being measured.
Example: A test that aims to measure a class of students' level of Spanish contains reading, writing and speaking components, but no listening component. Experts agree that listening comprehension is an essential aspect of language ability, so the test lacks content validity for measuring the overall level of ability in Spanish.
Ensuring validity
If you use scores or ratings to measure variations in something (such as psychological traits,
levels of ability or physical properties), it’s important that your results reflect the real variations
as accurately as possible. Validity should be considered in the very earliest stages of your
research, when you decide how you will collect your data.
Ensure that your method and measurement technique are high quality and targeted to measure
exactly what you want to know. They should be thoroughly researched and based on existing
knowledge.
To produce valid generalizable results, clearly define the population you are researching (e.g.
people from a specific age range, geographical location, or profession). Ensure that you have
enough participants and that they are representative of the population.
Ensuring reliability
Reliability should be considered throughout the data collection process. When you use a tool or
technique to collect data, it’s important that the results are precise, stable and reproducible.
For example, if you are conducting interviews or observations, clearly define how specific
behaviours or responses will be counted, and make sure questions are phrased the same way
each time.
When you collect your data, keep the circumstances as consistent as possible to reduce the
influence of external factors that might create variation in the results.
For example, in an experimental setup, make sure all participants are given the same
information and tested under the same conditions.
Q.2 Define scoring criteria for essay type test items for 8th grade.
Answer
Scoring Essay Type Items
A rubric or scoring criteria is developed to evaluate/score an essay type item. A rubric is a scoring guide
for subjective assessments. It is a set of criteria and standards linked to learning objectives that are used
to assess a student's performance on papers, projects, essays, and other assignments. Rubrics allow for
standardized evaluation according to specified criteria, making grading simpler and more transparent. A
rubric may vary from simple checklists to elaborate combinations of checklist and rating scales. How
elaborate your rubric is depends on what you are trying to measure. If your essay item is a
restricted-response item simply assessing mastery of factual content, a fairly simple listing of essential
points would be sufficient. An example of the rubric of restricted response item is given below.
Test Item:
Name and describe five of the most important factors of unemployment in Pakistan. (10 points)
Rubric/Scoring Criteria:
(iv) No extra credit for more than five factors named or described.
(v) Extraneous information will be ignored.
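For illustration, here is a minimal Python sketch of how a checklist rubric like the one above could be applied when scoring responses. The point scheme (one point for naming a factor and one for describing it, up to five factors) is an assumed breakdown for the example, not the official rubric.

```python
# A minimal sketch of checklist-style rubric scoring for the restricted-response
# item above. The 2-points-per-factor split (1 for naming, 1 for describing)
# is an assumed scheme for illustration, not the official rubric.

MAX_FACTORS = 5          # only the first five factors earn credit
POINTS_NAMED = 1         # hypothetical: 1 point for correctly naming a factor
POINTS_DESCRIBED = 1     # hypothetical: 1 point for correctly describing it

def score_response(factors):
    """factors: list of (named_correctly, described_correctly) booleans
    for each factor the student offered, in the order written."""
    score = 0
    for named, described in factors[:MAX_FACTORS]:   # no extra credit beyond five
        if named:
            score += POINTS_NAMED
        if described:
            score += POINTS_DESCRIBED
    return score

# Example: a student names five factors but fully describes only three of them.
student = [(True, True), (True, True), (True, True), (True, False), (True, False)]
print(score_response(student))   # -> 8 out of 10
```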
However, when essay items are measuring higher order thinking skills of the cognitive domain, more complex rubrics are needed, such as an analytic rubric for a writing test in a language course.
Advantages of Essay Type Items
Essay type items have the following advantages:
(i) They can measure complex learning outcomes which cannot be measured by other means.
(ii) They emphasize integration and application of thinking and problem solving skills.
(iii) They can be easily constructed.
(iv) Students cannot guess the answer because they have to supply it rather than select it.
(v) Practically, it is more economical to use essay type tests if the number of students is small.
(vi) They require less time for typing, duplicating or printing. They can also be written on the blackboard if the number of students is not large.
(vii) They can be used as a device for measuring and improving the language and expression skills of examinees.
(viii) They are more helpful in evaluating the quality of the teaching process.
(ix) Studies have shown that when students know that essay type questions will be asked, they focus on learning broad concepts and on articulating relationships, contrasting and comparing.
(x) They set higher standards of professional ethics for teachers because they demand more of the teacher's time in assessing and scoring.
Limitations of Essay Type Items
The essay type tests have the following serious limitations as a measuring instrument:
(i) A major problem is the lack of consistency in judgments even among competent examiners.
(ii) They have a halo effect. If the examiner is measuring one characteristic, he can be influenced in scoring by another characteristic; for example, a well behaved student may score more marks on account of his good behaviour.
(iii) They have a question-to-question carry-over effect. If the examinee has answered the opening question or questions satisfactorily, he is likely to score more than one who did not do well in the beginning but did well later on.
(iv) They have an examinee-to-examinee carry-over effect. A particular examinee gets marks not only on the basis of what he has written but also on the basis of whether the answer book of the previous examinee marked by the examiner was good or bad.
(v) They have limited content validity because only a small sample of questions can be asked in an essay type test.
(vi) They are difficult to score objectively because the examinee has wide freedom of expression and he
writes long answers.
(vii) They are time consuming both for the examiner and the examinee.
Guidelines for Constructing Essay Type Items
IV. Generally give preference to specific questions that can be answered briefly. The more questions
used, the better the test constructor can sample the domain of knowledge covered by the test. And the
more responses available for scoring, the more accurate the total test scores are likely to be. In addition,
brief responses can be scored more quickly and more accurately than long, extended responses, even
when there are fewer of the latter type.
V. Use enough items to sample the relevant content domain adequately, but not so many that
students do not have sufficient time to plan, develop, and review their responses. Some
instructors use essay tests rather than one of the objective types because they want to
encourage and provide practice in written expression. However, when time pressures become
great, the essay test is one of the most unrealistic and negative writing experiences to which
students can be exposed. Often there is no time for editing, for rereading, or for checking
spelling. Planning time is short changed so that writing time will not be. There are few, if any,
real writing tasks that require such conditions. And there are few writing experiences that
discourage the use of good writing habits as much as essay testing does. 150
VI. Avoid giving examinees a choice among optional questions unless special circumstances make
such options necessary. The use of optional items destroys the strict comparability between
student scores because not all students actually take the same test. Student A may have
answered items 1-3 and Student B may have answered 3-5. In these circumstances the
variability of scores is likely to be quite small because students were able to respond to items
they knew more about and ignore items with which they were unfamiliar. This reduced
variability contributes to reduced test score reliability. That is, we are less able to identify
individual differences in achievement when the test scores form a very homogeneous
distribution. In sum, optional items restrict score comparability between students and
contribute to low score reliability due to reduced test score variability.
VII. Test the question by writing an ideal answer to it. An ideal response is needed eventually to score the responses. If it is prepared early, it permits a check on the wording of the item, the level of completeness required for an ideal response, and the amount of time required to furnish a suitable response. It even allows the item writer to determine if there is any "correct" response to the question.
Types of Mean
There are three main types of mean that you will study in statistics:
1. Arithmetic Mean
2. Geometric Mean
3. Harmonic Mean
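The following short Python sketch illustrates the three types of mean using the statistics module; the data values are the same as in the worked example below.

```python
# Sketch of the three types of mean for a small data set.
# The statistics module provides all three directly (geometric_mean needs Python 3.8+).
import statistics

data = [13, 18, 13, 14, 13, 16, 14, 21, 13]   # same values as the worked example below

arithmetic = statistics.mean(data)             # sum of values / number of values
geometric = statistics.geometric_mean(data)    # nth root of the product of n values
harmonic = statistics.harmonic_mean(data)      # n / sum of reciprocals

print(arithmetic)            # 15
print(round(geometric, 2))   # about 14.8
print(round(harmonic, 2))    # about 14.6
```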
Solved Examples
Question: Find the mean, median, mode, and range for the following list of values:
13, 18, 13, 14, 13, 16, 14, 21, 13
Solution:
Given data: 13, 18, 13, 14, 13, 16, 14, 21, 13
The mean is the usual average.
Mean = $\frac{13+18+13+14+13+16+14+21+13}{9}=15$
(Note that the mean is not a value from the original list. This is a common result. You should not
assume that your mean will be one of your original numbers.)
The median is the middle value, so first rewrite the list in ascending order, as given below:
13, 13, 13, 13, 14, 14, 16, 18, 21
There are nine numbers in the list, so the middle one will be the $\frac{9+1}{2}=\frac{10}{2}=5$th number.
Hence, the median is 14.
The mode is the number that is repeated more often than any other, so 13 is the mode.
The largest value in the list is 21, and the smallest is 13, so the range is 21 – 13 = 8.
Mean = 15
Median = 14
Mode = 13
Range = 8
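The same worked example can be checked quickly with Python's statistics module; this is only a verification sketch, not part of the original solution.

```python
# A quick check of the worked example above using Python's statistics module.
import statistics

values = [13, 18, 13, 14, 13, 16, 14, 21, 13]

mean = statistics.mean(values)            # (13 + 18 + ... + 13) / 9 = 15
median = statistics.median(values)        # middle value of the sorted list = 14
mode = statistics.mode(values)            # most frequent value = 13
value_range = max(values) - min(values)   # 21 - 13 = 8

print(mean, median, mode, value_range)    # 15 14 13 8
```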
The purpose of this paper is to describe concepts which will give meaning to the scores achieved by students. They are: measures of central tendency (mode, median and midpoint); measures of dispersion (range and standard deviation); validity; reliability; and item analysis (facility values, discrimination index, and option analysis).
Importance of Measures of Central Tendency
Mode is the most common score or the score which occurs most frequently (Hughes, 2008:221; Brown, 2005:99). If the scores of a group of students are: 40, 55, 60, 60, 77, 75, 75, 75, 80, the mode of the scores is 75.
Median can be found by putting all the individual scores in order of magnitude (i.e., from smallest to largest) and choosing the middle one. The median of the scores exemplified above is 75. In case there is an even number of test takers, so that there are two middle scores, the median is taken to be midway between the two middle scores (Brown, 2005:100). If the scores of a group of students are: 27, 34, 56, 57, 78, 81, the two middle scores are 56 and 57, so the median is 56.5.
Midpoint in a set of scores is that point halfway between the highest score and the lowest score
on the test.
The median may be more useful than the mean when there are extreme values in the data
set as it is not affected by the extreme values. The mode is useful when the most common item,
characteristic or value of a data set is required.
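A small illustration of this point, using made-up scores: adding a single extreme value changes the mean considerably but leaves the median unchanged.

```python
# Illustration of why the median can be more useful than the mean when the
# data contain an extreme value. The scores are made up for this example.
import statistics

scores = [40, 45, 50, 55, 60]
scores_with_outlier = [40, 45, 50, 55, 300]   # one extreme score replaces the 60

print(statistics.mean(scores), statistics.median(scores))                            # 50 50
print(statistics.mean(scores_with_outlier), statistics.median(scores_with_outlier))  # 98 50
```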
Mean, median and mode are three measures of central tendency of data. Accordingly, they give the value towards which the data have a tendency to cluster. Since each of these three determines a central position, they are also interpreted as location parameters.
INTRODUCTION
Although teachers can calculate grades in countless ways, a few standard grading schemes are used across most educational institutions. Most schools have a grading scale, which is a set of standard percentage ranges indicating what students must score to receive each grade from A to F. Some teachers grade assignments as percentages, while others use straight points. Both of these schemes are considered here.
To calculate a cumulative grade point average (CGPA), two things are needed:
● Grade points earned in each subject/course
● Total credit hours (obtained by adding the credit hours of each subject/course)
The CGPA is calculated by dividing the total grade points earned by the total credit hours. For example, if a student of the M.A. Education programme has studied 12 subjects, each of 3 credit hours, the total credit hours will be 36; if the student's grade points, weighted by credit hours, add up to 108 (an average grade point of 3.0 per subject), the CGPA will be 108/36 = 3.0.
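A minimal Python sketch of this CGPA calculation is given below; the per-subject grade points are hypothetical and are chosen only so that the weighted total works out to a CGPA of 3.0.

```python
# Minimal sketch of the CGPA calculation described above: total grade points
# (grade point of each course weighted by its credit hours) divided by the
# total credit hours. The per-subject grade points below are hypothetical.

subjects = [
    # (grade_point, credit_hours) for each of the 12 subjects, 3 credit hours each
    (3.0, 3), (3.3, 3), (2.7, 3), (3.0, 3),
    (3.7, 3), (2.3, 3), (3.0, 3), (3.3, 3),
    (2.7, 3), (3.0, 3), (3.0, 3), (3.0, 3),
]

total_credit_hours = sum(ch for _, ch in subjects)         # 12 x 3 = 36
total_grade_points = sum(gp * ch for gp, ch in subjects)   # credit-hour-weighted grade points

cgpa = total_grade_points / total_credit_hours
print(total_credit_hours, round(cgpa, 2))   # 36 3.0
```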
Assigning Letter Grades
The letter grade system is very popular around the world, including in Pakistan. Most teachers face problems while assigning grades. There are four main problems or issues in this regard: 1) what should be included in the letter grade, 2) how achievement data should be combined in assigning letter grades, 3) what frame of reference should be used in grading, and 4) how the distribution of letter grades should be determined.
Letter grades are typically assigned on the basis of one of the following frames of reference.
Assigning relative letter grades is primarily a matter of ranking the students in order of overall achievement and assigning grades on the basis of each student's rank in the group. This ranking may be limited to a single class or may be based on the combined distribution of several classes taking the same course. If grading on the curve is to be done, the most sensible approach to determining the distribution of letter grades in a school is for the school staff to set general guidelines for introductory and advanced courses. All staff members must understand the basis of this grade allocation, and this basis should be clearly communicated to the users of the grades. If the objectives of the courses are clearly stated and the standards of mastery appropriately set, letter grades in an absolute system can be defined as the degree to which those objectives have been achieved.
For example: C = Satisfactory (70-79%).
Letter grades are likely to be most meaningful and useful when they represent achievement
only. If they are combined with other factors or aspects such as effort, work completed, personal
conduct, and so on, their interpretation will become hopelessly confused. For
example, a letter grade C may represent average achievement with extraordinary effort and
excellent conduct and behaviour or vice versa. If letter grades are to be valid indicators of
achievement, they must be based on valid measures of achievement. This involves defining
objectives as intended learning outcomes and developing or selecting tests and assessments
which can measure these learning outcomes.
One of the key concerns while assigning grades is to be clear what aspects of a student are to
be assessed or what will be the tentative weightage to each learning outcome. For example, if
we decide that 35 percent weightage is to be given to mid-term assessment, 40 percent final
term test or assessment, and 25% to assignments, presentations, classroom participation and
conduct and behaviour; we have to combine all elements by assigning appropriate weights to
each element, and then use these composite scores as a basis for grading.
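As an illustration, the sketch below combines the 35/40/25 weighting described above into a composite score and maps it to a letter grade. Apart from the C = Satisfactory band (70-79%) given above, the grade cut-offs in the sketch are assumed for illustration only.

```python
# Sketch of combining weighted components into a composite score and mapping it
# to a letter grade, using the 35/40/25 weighting mentioned above. The cut-offs
# other than C (70-79%, given in the text) are assumed for illustration.

WEIGHTS = {"mid_term": 0.35, "final_term": 0.40, "assignments": 0.25}

def composite_score(percentages):
    """percentages: dict of component name -> percentage score (0-100)."""
    return sum(WEIGHTS[name] * percentages[name] for name in WEIGHTS)

def letter_grade(score):
    if score >= 90: return "A"
    if score >= 80: return "B"
    if score >= 70: return "C"   # Satisfactory (70-79%) as in the text
    if score >= 60: return "D"
    return "F"

student = {"mid_term": 72, "final_term": 81, "assignments": 88}
score = composite_score(student)
print(round(score, 1), letter_grade(score))   # 79.6 C
```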
Measures of central tendency describe the tendency of the data to cluster around a certain average value. The spread of the values around this central value is usually measured using measures of dispersion such as the range and the standard deviation. Both kinds of measures are classified as summary statistics.
Measures of Central Tendency
Measures of central tendency are numbers that describe what is "in the middle" of a set of values. Three such numbers are the mean, the median, and the mode.
Mean: The sum of all the measurements divided by the number of observations. It can be used with both discrete and continuous data and is the most commonly used measure.
Median: The middle value, which separates the upper half of the data from the lower half. The mean and median can be compared with each other to determine whether the distribution is approximately normal or skewed. The numbers are arranged in ascending or descending order and the middle value is taken; if there are two middle values, their average is taken.
Mode: The most frequent value. It shows the most common option and corresponds to the highest bar in a histogram. Example of use: determining the most common blood group.
Measures of reliability
Reliability is one of the most important aspects of test quality. It is concerned with the consistency, or reproducibility, of test takers' performance. It is impossible to calculate reliability exactly; instead, we must estimate it, and such estimates always involve some error. Here, we present the major measures of reliability and discuss their strengths and weaknesses.
There are six standard categories of reliability, each of which measures reliability in a different way. These are:
(i) Inter-Rater (Inter-Observer) Reliability: Assessing the extent to which different raters/observers give consistent ratings of the same performance. That is, if two teachers mark the same test and their results agree, this shows inter-rater or inter-observer reliability.
(ii) Test-Retest Reliability: Assessing the consistency of scores from one administration to another. When the same test is administered twice and the results of both administrations are similar, this shows test-retest reliability. Memory and maturation of students between the two administrations create problems for test-retest reliability.
(iii) Parallel (Equivalent) Forms Reliability: Assessing the consistency of the results of two tests constructed in the same way from the same content domain. Here the test developer tries to construct two equivalent forms of the test; if, after administration, both forms produce similar results, they show parallel forms reliability.
(iv) Internal Consistency Reliability: Assessing the consistency of results across the items within a test, based on how the scores on the individual items relate to one another.
(v) Split-Half Reliability: Assessing the consistency of results by comparing two halves of a single test; the halves may be, for example, the odd-numbered and even-numbered items.
(vi) Assessing the consistency of results using all possible split halves of the test.
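As an illustration of the split-half approach, the Python sketch below correlates scores on the odd-numbered and even-numbered items and applies the Spearman-Brown correction; the small item-response matrix is made up for the example.

```python
# A sketch of estimating split-half reliability: correlate the scores on the
# odd-numbered items with the scores on the even-numbered items, then apply
# the Spearman-Brown correction to estimate full-test reliability.
# The item response matrix below is made up for illustration (1 = correct).

from statistics import correlation   # Python 3.10+

responses = [          # one row per student, one column per item (8 items)
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 1],
]

odd_halves = [sum(row[0::2]) for row in responses]    # items 1, 3, 5, 7
even_halves = [sum(row[1::2]) for row in responses]   # items 2, 4, 6, 8

r_half = correlation(odd_halves, even_halves)         # correlation between the two halves
r_full = 2 * r_half / (1 + r_half)                    # Spearman-Brown correction

print(round(r_half, 2), round(r_full, 2))             # roughly 0.86 and 0.92 for this data
```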
What does the term reliability mean? Reliability means consistency. A test score is said to be reliable when we have reason to believe that it is stable and trustworthy. For example, if the same test is given to two classes and marked by different teachers, and it still produces similar results, it can be considered reliable. Stability and trustworthiness depend on the degree to which the scores are free from chance error. We must first build a conceptual bridge between the question asked by a person (i.e., are my scores reliable?) and how reliability is measured scientifically. This bridge is not as simple as it may at first appear. When a person thinks about reliability, many things may come to mind: my friend is very reliable, my car is very reliable, my online payment process is very reliable, my client's performance is very reliable, and so on. The elements in question are concepts like consistency, trust, predictability, variability, and so on. Note that, as these examples suggest, people's behaviour, machine functioning, data processes, and work performance can all sometimes be unreliable. The question for testing is: how much do test scores vary from one measurement to another?
But is a measure of central tendency enough?
Suppose a teacher gave the same test to two different classes and the following results were obtained:
Class 1: 80%, 80%, 80%, 80%, 80%
Class 2: 60%, 70%, 80%, 90%, 100%
If you calculate the mean for both sets of scores, you get the same answer: 80%. But the data from which these two means were obtained are clearly very different for the two classes. It is also possible for different data sets to have the same mean, median, and mode.
For example:
Class A: 72 73 76 76 78
Class B: 67 76 76 78 80
Class A and Class B have the same median and mode, and almost identical means.
The way statisticians deal with cases like this is by measuring the variability of the sample. As with measures of central tendency, there are a number of methods for measuring sample variability.
Probably the simplest is the range of the sample, that is, the difference between the highest and the lowest observation. The range for Class 1 is 0, and the range for Class 2 is 40%. Just knowing this fact gives you a much better understanding of the data obtained from the two classes. In Class 1, the mean was 80% and the range was 0, but in Class 2, the mean was 80% and the range was 40%.
Statisticians use summary measures to describe patterns in data. Measures of central tendency refer to the summary measures used to describe the most "typical" value in a set of data. Here, we are interested in the typical, most representative score. There are three common measures of central tendency: the mean, the median, and the mode. As a teacher, you should be familiar with these common measures of central tendency.