ASSIGNMENT No. 2
(Units: 5-9)
Reliability and validity are concepts used to evaluate the quality of research. They indicate how
well a method, technique or test measures something. Reliability is about the consistency of a
measure, and validity is about the accuracy of a measure.
It’s important to consider reliability and validity when you are creating your research design,
planning your methods, and writing up your results, especially in quantitative research.
Reliability vs validity
What does it tell you?
Reliability: The extent to which the results can be reproduced when the research is repeated under the same conditions.
Validity: The extent to which the results really measure what they are supposed to measure.
What is reliability?
Reliability refers to how consistently a method measures something. If the same result can be
consistently achieved by using the same methods under the same circumstances, the
measurement is considered reliable.
Example: You measure the temperature of a liquid sample several times under identical conditions. The
thermometer displays the same temperature every time, so the results are reliable.
What is validity?
Validity refers to how accurately a method measures what it is intended to measure. If research
has high validity, that means it produces results that correspond to real properties,
characteristics, and variations in the physical or social world.
High reliability is one indicator that a measurement is valid. If a method is not reliable, it
probably isn’t valid.
If the thermometer shows different temperatures each time, even though you have carefully
controlled conditions to ensure the sample’s temperature stays the same, the thermometer is
probably malfunctioning, and therefore its measurements are not valid.
If a symptom questionnaire results in a reliable diagnosis when answered at different times and
with different doctors, this indicates that it has high validity as a measurement of the medical
condition.
However, reliability on its own is not enough to ensure validity. Even if a test is reliable, it may
not accurately reflect the real situation.
Example: The thermometer that you used to test the sample gives reliable results. However, the
thermometer has not been calibrated properly, so the result is 2 degrees lower than the true
value. Therefore, the measurement is not valid.
Example: A group of participants takes a test designed to measure working memory. The results are
reliable, but participants’ scores correlate strongly with their level of reading comprehension.
This indicates that the method might have low validity: the test may be measuring participants’
reading comprehension instead of their working memory.
Validity is harder to assess than reliability, but it is even more important. To obtain useful
results, the methods you use to collect your data must be valid: the research must be measuring
what it claims to measure. This ensures that your discussion of the data and the conclusions you
draw are also valid.
How are reliability and validity assessed?
Reliability can be estimated by comparing different versions of the same measurement. Validity
is harder to assess, but it can be estimated by comparing the results to other relevant data or
theory. Methods of estimating reliability and validity are usually split up into different types.
Types of reliability
Different types of reliability can be estimated through various statistical methods.
Types of validity
Content validity: The extent to which the measurement covers all aspects of the concept being measured.
Example: A test that aims to measure a class of students' level of Spanish contains reading, writing and speaking components, but no listening component. Experts agree that listening comprehension is an essential aspect of language ability, so the test lacks content validity for measuring the overall level of ability in Spanish.
Ensuring validity
If you use scores or ratings to measure variations in something (such as psychological traits,
levels of ability or physical properties), it’s important that your results reflect the real variations
as accurately as possible. Validity should be considered in the very earliest stages of your
research, when you decide how you will collect your data.
Ensure that your method and measurement technique are high quality and targeted to measure
exactly what you want to know. They should be thoroughly researched and based on existing
knowledge.
To produce valid generalizable results, clearly define the population you are researching (e.g.
people from a specific age range, geographical location, or profession). Ensure that you have
enough participants and that they are representative of the population.
Ensuring reliability
Reliability should be considered throughout the data collection process. When you use a tool or
technique to collect data, it’s important that the results are precise, stable and reproducible.
For example, if you are conducting interviews or observations, clearly define how specific
behaviours or responses will be counted, and make sure questions are phrased the same way
each time.
When you collect your data, keep the circumstances as consistent as possible to reduce the
influence of external factors that might create variation in the results.
For example, in an experimental setup, make sure all participants are given the same
information and tested under the same conditions.
Q.2 Define scoring criteria for essay type test items for 8th grade.
Answer
Scoring Essay Type Items
A rubric or scoring criteria is developed to evaluate/score an essay type item. A rubric is a scoring guide
for subjective assessments. It is a set of criteria and standards linked to learning objectives that are used
to assess a student's performance on papers, projects, essays, and other assignments. Rubrics allow for
standardized evaluation according to specified criteria, making grading simpler and more transparent. A
rubric may vary from simple checklists to elaborate combinations of checklist and rating scales. How
elaborate your rubric is depends on what you are trying to measure. If your essay item is a
restricted-response item simply assessing mastery of factual content, a fairly simple listing of essential
points would be sufficient. An example of the rubric of restricted response item is given below.
Test Item:
Name and describe five of the most important factors of unemployment in Pakistan. (10 points)
Rubric/Scoring Criteria:
(iv) No extra credit for more than five factors named or described.
(v) Extraneous information will be ignored.
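For illustration, here is a minimal Python sketch of how a checklist rubric like the one above could be applied when scoring responses. The point scheme (one point for naming a factor and one for describing it, up to five factors) is an assumed breakdown for the example, not the official rubric.

```python
# A minimal sketch of checklist-style rubric scoring for the restricted-response
# item above. The 2-points-per-factor split (1 for naming, 1 for describing)
# is an assumed scheme for illustration, not the official rubric.

MAX_FACTORS = 5          # only the first five factors earn credit
POINTS_NAMED = 1         # hypothetical: 1 point for correctly naming a factor
POINTS_DESCRIBED = 1     # hypothetical: 1 point for correctly describing it

def score_response(factors):
    """factors: list of (named_correctly, described_correctly) booleans
    for each factor the student offered, in the order written."""
    score = 0
    for named, described in factors[:MAX_FACTORS]:   # no extra credit beyond five
        if named:
            score += POINTS_NAMED
        if described:
            score += POINTS_DESCRIBED
    return score

# Example: a student names five factors but fully describes only three of them.
student = [(True, True), (True, True), (True, True), (True, False), (True, False)]
print(score_response(student))   # -> 8 out of 10
```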
However, when essay items are measuring higher order thinking skills of the cognitive domain, more complex rubrics are needed, such as an analytic rubric for a writing test in a language course.
Advantages of Essay Type Items
Essay type items have the following advantages:
(i) They can measure complex learning outcomes which cannot be measured by other means.
(ii) They emphasize integration and application of thinking and problem solving skills.
(iii) They can be easily constructed.
(iv) Students cannot guess the answer because they have to supply it rather than select it.
(v) Practically, it is more economical to use essay type tests if the number of students is small.
(vi) They require less time for typing, duplicating or printing. They can also be written on the blackboard if the number of students is not large.
(vii) They can be used as a device for measuring and improving the language and expression skills of examinees.
(viii) They are more helpful in evaluating the quality of the teaching process.
(ix) Studies have shown that when students know that essay type questions will be asked, they focus on learning broad concepts and on articulating relationships, contrasting and comparing.
(x) They set higher standards of professional ethics for teachers because they demand more of the teacher's time in assessing and scoring.
Limitations of Essay Type Items
The essay type tests have the following serious limitations as a measuring instrument:
(i) A major problem is the lack of consistency in judgments even among competent examiners.
(ii) They have a halo effect. If the examiner is measuring one characteristic, he can be influenced in scoring by another characteristic; for example, a well behaved student may score more marks on account of his good behaviour.
(iii) They have a question-to-question carry-over effect. If the examinee has answered the opening question or questions satisfactorily, he is likely to score more than one who did not do well in the beginning but did well later on.
(iv) They have an examinee-to-examinee carry-over effect. A particular examinee gets marks not only on the basis of what he has written but also on the basis of whether the answer book of the previous examinee marked by the examiner was good or bad.
(v) They have limited content validity because only a small sample of questions can be asked in an essay type test.
(vi) They are difficult to score objectively because the examinee has wide freedom of expression and he
writes long answers.
(vii) They are time consuming both for the examiner and the examinee.
Guidelines for Constructing Essay Type Items
IV. Generally give preference to specific questions that can be answered briefly. The more questions
used, the better the test constructor can sample the domain of knowledge covered by the test. And the
more responses available for scoring, the more accurate the total test scores are likely to be. In addition,
brief responses can be scored more quickly and more accurately than long, extended responses, even
when there are fewer of the latter type.
V. Use enough items to sample the relevant content domain adequately, but not so many that
students do not have sufficient time to plan, develop, and review their responses. Some
instructors use essay tests rather than one of the objective types because they want to
encourage and provide practice in written expression. However, when time pressures become
great, the essay test is one of the most unrealistic and negative writing experiences to which
students can be exposed. Often there is no time for editing, for rereading, or for checking
spelling. Planning time is short changed so that writing time will not be. There are few, if any,
real writing tasks that require such conditions. And there are few writing experiences that
discourage the use of good writing habits as much as essay testing does. 150
VI. Avoid giving examinees a choice among optional questions unless special circumstances make
such options necessary. The use of optional items destroys the strict comparability between
student scores because not all students actually take the same test. Student A may have
answered items 1-3 and Student B may have answered 3-5. In these circumstances the
variability of scores is likely to be quite small because students were able to respond to items
they knew more about and ignore items with which they were unfamiliar. This reduced
variability contributes to reduced test score reliability. That is, we are less able to identify
individual differences in achievement when the test scores form a very homogeneous
distribution. In sum, optional items restrict score comparability between students and
contribute to low score reliability due to reduced test score variability.
VII. Test the question by writing an ideal answer to it. An ideal response is needed eventually to score the responses. If it is prepared early, it permits a check on the wording of the item, the level of completeness required for an ideal response, and the amount of time required to furnish a suitable response. It even allows the item writer to determine if there is any "correct" response to the question.
Types of Mean
There are three main types of mean that you will study in statistics:
1. Arithmetic Mean
2. Geometric Mean
3. Harmonic Mean
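The following short Python sketch illustrates the three types of mean using the statistics module; the data values are the same as in the worked example below.

```python
# Sketch of the three types of mean for a small data set.
# The statistics module provides all three directly (geometric_mean needs Python 3.8+).
import statistics

data = [13, 18, 13, 14, 13, 16, 14, 21, 13]   # same values as the worked example below

arithmetic = statistics.mean(data)             # sum of values / number of values
geometric = statistics.geometric_mean(data)    # nth root of the product of n values
harmonic = statistics.harmonic_mean(data)      # n / sum of reciprocals

print(arithmetic)            # 15
print(round(geometric, 2))   # about 14.8
print(round(harmonic, 2))    # about 14.6
```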
Solved Examples
Question: Find the mean, median, mode, and range for the following list of values:
13, 18, 13, 14, 13, 16, 14, 21, 13
Solution:
Given data: 13, 18, 13, 14, 13, 16, 14, 21, 13
The mean is the usual average.
Mean = $\frac{13+18+13+14+13+16+14+21+13}{9}=15$
(Note that the mean is not a value from the original list. This is a common result. You should not
assume that your mean will be one of your original numbers.)
The median is the middle value, so first rewrite the list in ascending order, as given below:
13, 13, 13, 13, 14, 14, 16, 18, 21
There are nine numbers in the list, so the middle one will be the $\frac{9+1}{2}=\frac{10}{2}=5$th number.
Hence, the median is 14.
The mode is the number that is repeated more often than any other, so 13 is the mode.
The largest value in the list is 21, and the smallest is 13, so the range is 21 – 13 = 8.
Mean = 15
Median = 14
Mode = 13
Range = 8
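The same worked example can be checked quickly with Python's statistics module; this is only a verification sketch, not part of the original solution.

```python
# A quick check of the worked example above using Python's statistics module.
import statistics

values = [13, 18, 13, 14, 13, 16, 14, 21, 13]

mean = statistics.mean(values)            # (13 + 18 + ... + 13) / 9 = 15
median = statistics.median(values)        # middle value of the sorted list = 14
mode = statistics.mode(values)            # most frequent value = 13
value_range = max(values) - min(values)   # 21 - 13 = 8

print(mean, median, mode, value_range)    # 15 14 13 8
```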
The purpose of this paper is to describe concepts which will give meaning to the scores achieved by students. They are: measures of central tendency (mode, median and midpoint); measures of dispersion (range and standard deviation); validity; reliability; and item analysis (facility values, discrimination index, and option analysis).
Importance of Measures of Central Tendency
Mode is the most common score or the score which occurs most frequently (Hughes, 2008:221; Brown, 2005:99). If the scores of a group of students are: 40, 55, 60, 60, 77, 75, 75, 75, 80, the mode of the scores is 75.
Median can be found by putting all the individual scores in order of magnitude (i.e., from smallest to largest) and choosing the middle one. The median of the scores exemplified above is 75. In case there is an even number of test takers, so that there are two middle scores, the median is taken to be midway between the two middle scores (Brown, 2005:100). If the scores of a group of students are: 27, 34, 56, 57, 78, 81, the two middle scores are 56 and 57, so the median is 56.5.
Midpoint in a set of scores is that point halfway between the highest score and the lowest score
on the test.
The median may be more useful than the mean when there are extreme values in the data
set as it is not affected by the extreme values. The mode is useful when the most common item,
characteristic or value of a data set is required.
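A small illustration of this point, using made-up scores: adding a single extreme value changes the mean considerably but leaves the median unchanged.

```python
# Illustration of why the median can be more useful than the mean when the
# data contain an extreme value. The scores are made up for this example.
import statistics

scores = [40, 45, 50, 55, 60]
scores_with_outlier = [40, 45, 50, 55, 300]   # one extreme score replaces the 60

print(statistics.mean(scores), statistics.median(scores))                            # 50 50
print(statistics.mean(scores_with_outlier), statistics.median(scores_with_outlier))  # 98 50
```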
Mean, median and mode are three measures of central tendency of data. Accordingly, they give the value towards which the data have a tendency to cluster. Since each of these three determines a central position, they are also interpreted as location parameters.
INTRODUCTION
Although teachers can calculate grades in countless ways, a few standard grading schemes are used across most educational institutions. Most schools have a grading scale, which is a set of standard percentage ranges indicating what students must score to receive each grade from A to F. Some teachers grade assignments as percentages, while others use straight points. Both of these schemes are considered here.
To calculate a cumulative grade point average (CGPA), two things are needed:
● Grade points earned in each subject/course
● Total credit hours (obtained by adding the credit hours of each subject/course)
The CGPA is calculated by dividing the total grade points earned by the total credit hours. For example, if a student of the M.A. Education programme has studied 12 subjects, each of 3 credit hours, the total credit hours will be 36; if the student's grade points, weighted by credit hours, add up to 108 (an average grade point of 3.0 per subject), the CGPA will be 108/36 = 3.0.
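A minimal Python sketch of this CGPA calculation is given below; the per-subject grade points are hypothetical and are chosen only so that the weighted total works out to a CGPA of 3.0.

```python
# Minimal sketch of the CGPA calculation described above: total grade points
# (grade point of each course weighted by its credit hours) divided by the
# total credit hours. The per-subject grade points below are hypothetical.

subjects = [
    # (grade_point, credit_hours) for each of the 12 subjects, 3 credit hours each
    (3.0, 3), (3.3, 3), (2.7, 3), (3.0, 3),
    (3.7, 3), (2.3, 3), (3.0, 3), (3.3, 3),
    (2.7, 3), (3.0, 3), (3.0, 3), (3.0, 3),
]

total_credit_hours = sum(ch for _, ch in subjects)         # 12 x 3 = 36
total_grade_points = sum(gp * ch for gp, ch in subjects)   # credit-hour-weighted grade points

cgpa = total_grade_points / total_credit_hours
print(total_credit_hours, round(cgpa, 2))   # 36 3.0
```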
Assigning Letter Grades
The letter grade system is very popular around the world, including in Pakistan. Most teachers face problems while assigning grades. There are four main problems or issues in this regard: 1) what should be included in the letter grade, 2) how achievement data should be combined in assigning letter grades, 3) what frame of reference should be used in grading, and 4) how the distribution of letter grades should be determined.
Letter grades are typically assigned on the basis of one of the following frames of reference.
Assigning relative letter grades is primarily a matter of ranking the students in order of overall achievement and assigning grades on the basis of each student's rank in the group. This ranking may be limited to a single class or may be based on the combined distribution of several classes taking the same course. If grading on the curve is to be done, the most sensible approach to determining the distribution of letter grades in a school is for the school staff to set general guidelines for introductory and advanced courses. All staff members must understand the basis of this grade allocation, and this basis should be clearly communicated to the users of the grades. If the objectives of the courses are clearly stated and the standards of mastery appropriately set, letter grades in an absolute system can be defined as the degree to which those objectives have been achieved.
For example: C = Satisfactory (70-79%).
Letter grades are likely to be most meaningful and useful when they represent achievement
only. If they are combined with other factors or aspects such as effort, work completed, personal
conduct, and so on, their interpretation will become hopelessly confused. For
example, a letter grade C may represent average achievement with extraordinary effort and
excellent conduct and behaviour or vice versa. If letter grades are to be valid indicators of
achievement, they must be based on valid measures of achievement. This involves defining
objectives as intended learning outcomes and developing or selecting tests and assessments
which can measure these learning outcomes.
One of the key concerns while assigning grades is to be clear what aspects of a student are to
be assessed or what will be the tentative weightage to each learning outcome. For example, if
we decide that 35 percent weightage is to be given to mid-term assessment, 40 percent final
term test or assessment, and 25% to assignments, presentations, classroom participation and
conduct and behaviour; we have to combine all elements by assigning appropriate weights to
each element, and then use these composite scores as a basis for grading.
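As an illustration, the sketch below combines the 35/40/25 weighting described above into a composite score and maps it to a letter grade. Apart from the C = Satisfactory band (70-79%) given above, the grade cut-offs in the sketch are assumed for illustration only.

```python
# Sketch of combining weighted components into a composite score and mapping it
# to a letter grade, using the 35/40/25 weighting mentioned above. The cut-offs
# other than C (70-79%, given in the text) are assumed for illustration.

WEIGHTS = {"mid_term": 0.35, "final_term": 0.40, "assignments": 0.25}

def composite_score(percentages):
    """percentages: dict of component name -> percentage score (0-100)."""
    return sum(WEIGHTS[name] * percentages[name] for name in WEIGHTS)

def letter_grade(score):
    if score >= 90: return "A"
    if score >= 80: return "B"
    if score >= 70: return "C"   # Satisfactory (70-79%) as in the text
    if score >= 60: return "D"
    return "F"

student = {"mid_term": 72, "final_term": 81, "assignments": 88}
score = composite_score(student)
print(round(score, 1), letter_grade(score))   # 79.6 C
```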
Measures of central tendency describe the tendency of the data to cluster around a certain average value. The spread of the values around this central value is usually measured using measures of dispersion such as the range and the standard deviation. Both kinds of measures are classified as summary statistics.
Measures of Central Tendency
Measures of central tendency are numbers that describe what is "in the middle" of a set of values. Three such numbers are the mean, the median, and the mode.
Mean: The sum of all the measurements divided by the number of observations. It can be used with both discrete and continuous data and is the most commonly used measure.
Median: The middle value, which separates the upper half of the data from the lower half. The mean and median can be compared with each other to determine whether the distribution is approximately normal or skewed. The numbers are arranged in ascending or descending order and the middle value is taken; if there are two middle values, their average is taken.
Mode: The most frequent value. It shows the most common option and corresponds to the highest bar in a histogram. Example of use: determining the most common blood group.
Measures of reliability
Reliability is one of the most important aspects of test quality. It is concerned with the consistency, or reproducibility, of test takers' performance. It is impossible to calculate reliability exactly; instead, we must estimate it, and such estimates always involve some error. Here, we present the major measures of reliability and discuss their strengths and weaknesses.
There are six standard categories of reliability, each of which measures reliability in a different way. These are:
(i) Inter-Rater (Inter-Observer) Reliability: Assessing the extent to which different raters/observers give consistent ratings of the same performance. That is, if two teachers mark the same test and their results agree, this shows inter-rater or inter-observer reliability.
(ii) Test-Retest Reliability: Assessing the consistency of scores from one administration to another. When the same test is administered twice and the results of both administrations are similar, this shows test-retest reliability. Memory and maturation of students between the two administrations create problems for test-retest reliability.
(iii) Parallel (Equivalent) Forms Reliability: Assessing the consistency of the results of two tests constructed in the same way from the same content domain. Here the test developer tries to construct two equivalent forms of the test; if, after administration, both forms produce similar results, they show parallel forms reliability.
(iv) Internal Consistency Reliability: Assessing the consistency of results across the items within a test, based on how the scores on the individual items relate to one another.
(v) Split-Half Reliability: Assessing the consistency of results by comparing two halves of a single test; the halves may be, for example, the odd-numbered and even-numbered items.
(vi) Assessing the consistency of results using all possible split halves of the test.
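As an illustration of the split-half approach, the Python sketch below correlates scores on the odd-numbered and even-numbered items and applies the Spearman-Brown correction; the small item-response matrix is made up for the example.

```python
# A sketch of estimating split-half reliability: correlate the scores on the
# odd-numbered items with the scores on the even-numbered items, then apply
# the Spearman-Brown correction to estimate full-test reliability.
# The item response matrix below is made up for illustration (1 = correct).

from statistics import correlation   # Python 3.10+

responses = [          # one row per student, one column per item (8 items)
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 1],
]

odd_halves = [sum(row[0::2]) for row in responses]    # items 1, 3, 5, 7
even_halves = [sum(row[1::2]) for row in responses]   # items 2, 4, 6, 8

r_half = correlation(odd_halves, even_halves)         # correlation between the two halves
r_full = 2 * r_half / (1 + r_half)                    # Spearman-Brown correction

print(round(r_half, 2), round(r_full, 2))             # roughly 0.86 and 0.92 for this data
```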
What does the term reliability mean? Reliability means consistency. A test score is said to be reliable when we have reason to believe that it is stable and trustworthy. For example, if the same test is given to two classes and marked by different teachers, and it still produces similar results, it can be considered reliable. Stability and trustworthiness depend on the degree to which the scores are free from chance error. We must first build a conceptual bridge between the question asked by a person (i.e., are my scores reliable?) and how reliability is measured scientifically. This bridge is not as simple as it may at first appear. When a person thinks about reliability, many things may come to mind: my friend is very reliable, my car is very reliable, my online payment process is very reliable, my client's performance is very reliable, and so on. The elements in question are concepts like consistency, trust, predictability, variability, and so on. Note that, as these examples suggest, people's behaviour, machine functioning, data processes, and work performance can all sometimes be unreliable. The question for testing is: how much do test scores vary from one measurement to another?
But is a measure of central tendency enough?
Suppose a teacher gave the same test to two different classes and the following results were obtained:
Class 1: 80%, 80%, 80%, 80%, 80%
Class 2: 60%, 70%, 80%, 90%, 100%
If you calculate the mean for both sets of scores, you get the same answer: 80%. But the data from which these two means were obtained are clearly very different for the two classes. It is also possible for different data sets to have the same mean, median, and mode.
For example:
Class A: 72 73 76 76 78
Class B: 67 76 76 78 80
Class A and Class B have the same median and mode, and almost identical means.
The way statisticians deal with cases like this is by measuring the variability of the sample. As with measures of central tendency, there are a number of methods for measuring sample variability.
Probably the simplest is the range of the sample, that is, the difference between the highest and the lowest observation. The range for Class 1 is 0, and the range for Class 2 is 40%. Just knowing this fact gives you a much better understanding of the data obtained from the two classes. In Class 1, the mean was 80% and the range was 0, but in Class 2, the mean was 80% and the range was 40%.
Statisticians use summary measures to describe patterns in data. Measures of central tendency refer to the summary measures used to describe the most "typical" value in a set of data. Here, we are interested in the typical, most representative score. There are three common measures of central tendency: the mean, the median, and the mode. As a teacher, you should be familiar with these common measures of central tendency.