

Characteristics of a good test: Validity and Reliability

Criteria of Assessment and Rubric of Scoring.

Characteristics of a good test


A test is an instrument or systematic procedure for observing and describing one or more
characteristics of a student, using either a numerical scale or a classification scheme. According
to Brown (2000), there are three criteria of a good test: practicality, reliability, and validity.

A. Practicality
Practicality concerns financial limitations, time constraints, ease of administration, and
ease of scoring and interpretation. The extent to which a test is practical sometimes hinges on
whether it is designed to be norm-referenced or criterion-referenced. The purpose of a
norm-referenced test is to place the test-takers along a mathematical continuum in rank order.
Typical norm-referenced tests are standardized tests intended to be administered to large
audiences, with results quickly disseminated to test-takers. Criterion-referenced tests are
designed to give test-takers feedback on specific course or lesson objectives. Classroom tests,
which involve smaller numbers of students and are connected to a curriculum, are typical of
criterion-referenced testing.

B. Reliability
Reliability is the extent to which a test shows the same results on repeated trials; in other
words, a test is reliable if it yields consistent results time after time. Several ways of
measuring reliability are:

EQUIVALENCY RELIABILITY
Equivalency reliability is the extent to which two items measure identical concepts at an
identical level of difficulty. Equivalency reliability is determined by relating two sets of
test scores to one another to highlight the degree of relationship or association. For example,
a researcher studying university English students happened to notice that when some students
were studying for finals, they got sick. Intrigued by this, the researcher attempted to observe
how often, or to what degree, these two behaviors co-occurred throughout the academic year. The
researcher used the results of the observations to assess the correlation between studying
throughout the academic year and getting sick. The researcher concluded there was poor
equivalency reliability between the two actions. In other words, studying was not a reliable
predictor of getting sick.
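
Because equivalency reliability is established by relating two sets of scores to one another, it can be quantified with a correlation coefficient. A minimal sketch in Python (the score lists and form labels below are hypothetical):

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length lists of scores.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

# Hypothetical scores of the same six students on two parallel forms of a test.
form_a = [72, 85, 90, 64, 78, 88]
form_b = [70, 83, 92, 60, 80, 85]

print(pearson(form_a, form_b))  # a value close to 1 suggests good equivalency reliability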

STABILITY RELIABILITY
Stability reliability (sometimes called test-retest reliability) is the agreement of measuring
instruments over time. To determine stability, a measure or test is repeated on the same
subjects at a future date. Results are compared and correlated with the initial test to give a
measure of stability. This method of evaluating reliability is appropriate only if the
phenomenon that the test measures is known to be stable over the interval between
assessments. The possibility of practice effects should also be taken into account.
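
A short sketch of the same idea, assuming hypothetical scores from two administrations of the same test (statistics.correlation requires Python 3.10 or later); the mean shift is one simple way to check for the practice effect mentioned above:

from statistics import correlation, mean

first_sitting  = [65, 78, 82, 59, 90, 73]  # scores at the first administration
second_sitting = [68, 80, 85, 60, 91, 76]  # the same students, some weeks later

# A high correlation suggests the scores are stable over time.
print("stability (r):", correlation(first_sitting, second_sitting))

# A consistent rise in the mean may point to a practice effect rather than a stable trait.
print("mean shift:", mean(second_sitting) - mean(first_sitting))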

INTERNAL CONSISTENCY
Internal consistency is the extent to which tests or procedures assess the same characteristic,
skill, or quality. It is a measure of the precision between the measuring instruments used in a
study. This type of reliability often helps researchers interpret data and predict the value of
scores and the limits of the relationship among variables. For example, analyzing the internal
reliability of the items on a vocabulary quiz will reveal the extent to which the quiz focuses
on the examinees' knowledge of words.
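
Internal consistency is often quantified with Cronbach's alpha, an index not named above but commonly used for exactly this purpose. A minimal sketch in Python, assuming a hypothetical five-item vocabulary quiz scored 0/1 per item:

from statistics import pvariance

# Each row is one examinee; each column is one quiz item scored 0 (wrong) or 1 (right).
responses = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
]

k = len(responses[0])                                               # number of items
item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
total_var = pvariance([sum(row) for row in responses])              # variance of total scores
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)

print("Cronbach's alpha:", round(alpha, 2))  # closer to 1 = more internally consistent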

INTER-RATER RELIABILITY
Inter-rater reliability is the extent to which two or more individuals (coders or raters)
agree. It assesses the consistency of how a measuring system is implemented, for example when
two or more teachers use a rating scale to rate students' oral responses in an interview
(1 being most negative, 5 being most positive). Inter-rater reliability is dependent upon the
ability of two or more individuals to be consistent. Training, education, and monitoring skills
can enhance inter-rater reliability.
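
One common way to quantify this agreement (not named above, but widely used) is to compute the raters' exact agreement and Cohen's kappa, which corrects that agreement for chance. A minimal sketch, assuming two teachers and hypothetical 1-5 ratings of the same eight oral responses:

teacher_1 = [4, 3, 5, 2, 4, 3, 5, 1]  # hypothetical ratings on a 1-5 scale
teacher_2 = [4, 3, 4, 2, 4, 2, 5, 1]  # the same responses rated by a second teacher

n = len(teacher_1)
observed = sum(a == b for a, b in zip(teacher_1, teacher_2)) / n  # proportion of exact matches

# Chance agreement: the probability that both raters pick the same category by accident.
categories = set(teacher_1) | set(teacher_2)
expected = sum((teacher_1.count(c) / n) * (teacher_2.count(c) / n) for c in categories)

kappa = (observed - expected) / (1 - expected)  # Cohen's kappa: 1 = perfect, 0 = chance level
print("exact agreement:", observed)
print("Cohen's kappa:", round(kappa, 2))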

INTRA-RATER RELIABILITY
Intra-rater reliability is a type of reliability assessment in which the same assessment is
completed by the same rater on two or more occasions. These different ratings are then
compared, generally by means of correlation. Since the same individual completes both
assessments, the rater's subsequent ratings may be contaminated by knowledge of the earlier
ratings.

C. Validity
Validity refers to whether or not a test measures what it intends to measure. On a test
with high validity, the items will be closely linked to the test's intended focus. For
many certification and licensure tests this means that the items will be highly related
to a specific job, education, or occupation. If a test has poor validity, it does not
measure the job-related content and competencies it ought to. The types of validity are:
content validity, criterion-related validity, and construct validity.

Content Validity: Does the test measure the objectives of the course? Content validity is the
extent to which a test measures a representative sample of the content to be tested at the
intended level of learning. The measuring instrument is judged against what has been taught,
based on the curriculum.

Criterion-related Validity investigates the correspondence between the scores obtained from the
newly developed test and the scores obtained from some independent outside criterion. There are
two types of criterion-related validity: predictive and concurrent.
o Predictive validity: comparison (correlation) of students' scores with a criterion measured
at a later time.
o Concurrent validity: how well a data collection process correlates with some current
criterion (usually another test).

Construct Validity refers to measuring certain traits or theoretical constructs. It is based on
the degree to which the items in a test reflect the essential aspects of the theory on which
the test is based. Construct validity can be checked by identifying and pairing test items with
specific objectives intended to tap a certain level of cognitive functioning. As with content
validity, the extent of construct validity can be supported by preparing the items on the basis
of a blueprint (grid) for the measuring instrument.

The relationship of reliability and validity: test validity is requisite to test reliability.
If a test is not valid, then reliability is moot; there is no point in discussing reliability,
because validity is required before reliability can be considered in any meaningful way.
Validity is at the center of the target. At the same time, a test must be both objective and
reliable before its validity can be judged.

Criteria of Assessment and Rubric of Scoring

Criteria are developed by analysing the learning outcomes and identifying the specific
characteristics that contribute to the overall assignment. These are the standards by which
learning is judged. Ensuring that assessment criteria are clearly defined makes it easier for
students to understand what is expected of them in a particular assignment, and helps teachers
to focus on their goals in the teaching and learning process. Good assessment criteria should
always:

describe which aspects of the learning outcomes will be assessed
indicate what is needed for a pass, using positive language
state clearly what is expected to reach different levels of achievement

To assess the task, teachers often use rubrics that give students the success criteria together
with descriptions of a number of different performance levels in relation to those criteria. A
rubric is an assessment tool that clearly indicates achievement criteria across all the
components of any kind of student work, from written to oral to visual. It can be used for
marking assignments, class participation, or overall grades. There are two types of rubrics:
holistic and analytic.

A holistic rubric consists of a single scale, with all criteria to be included in the evaluation
being considered together (e.g., clarity, organization, and mechanics). With a holistic rubric
the rater assigns a single score (usually on a 1 to 4 or 1 to 6 point scale) based on an overall
judgment of the student's work. The rater matches an entire piece of student work to a single
description on the scale.

An analytic rubric resembles a grid, with the criteria for a student product listed in the
leftmost column and the levels of performance listed across the top row. When scoring with an
analytic rubric, each of the criteria is scored individually. The cells within the center of
the rubric may or may not contain descriptions of what the specified criteria look like for
each level of performance.
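
To illustrate how an analytic rubric can be applied, here is a minimal sketch in Python (the criteria, level descriptors, and ratings are hypothetical) that stores the grid and scores one piece of work criterion by criterion:

analytic_rubric = {
    # criterion: short descriptors for performance levels 1 (lowest) to 4 (highest)
    "clarity":      {1: "unclear", 2: "partly clear", 3: "mostly clear", 4: "very clear"},
    "organization": {1: "no structure", 2: "weak structure", 3: "adequate", 4: "well organized"},
    "mechanics":    {1: "many errors", 2: "frequent errors", 3: "few errors", 4: "almost error-free"},
}

def score_work(ratings):
    # Score each criterion individually, then sum into a single analytic score.
    return sum(ratings[criterion] for criterion in analytic_rubric)

# One student's essay, rated criterion by criterion.
ratings = {"clarity": 3, "organization": 4, "mechanics": 2}
print("analytic score:", score_work(ratings), "out of", 4 * len(analytic_rubric))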
Example of scoring rubrics:
REFERENCES:

Brown, H. Douglas. 2000. Teaching by Principles: An Interactive Approach to Language Pedagogy,
Second Edition. USA: Pearson Education.

Nurgiyantoro, Burhan. 2001. Penilaian Pembelajaran Bahasa. Yogyakarta: BPFE.

Rea-Dickins, P. and Germaine, K. 1993. Evaluation. Oxford: Oxford University Press.

Toendan, W.H. 2016. Research Methodology. Palangka Raya: Unpublished Teaching Material.

BY:

ANI ROSANI

MAYA NANDA RIYANTI

SUBJECT: EVALUATION IN ENGLISH TEACHING
