
VALIDATION OF DEPARTMENTAL EXAMINATION QUESTIONS: BASIS FOR TEST ITEM BANK FORMULATION


Augusto, Hanna Rica I., Caballa, Joshua M., Capacite, Clarisse Marie M., Reñono, Jeselle Marie,
Diaz, Edcel Marie M.

University of Cebu-Lapulapu and Mandaue


A.C. Cortes Ave. Looc, Mandaue City, Philippines
Email: [email protected]
May 2023

ABSTRACT

Board examination results stand as a testament to the educational caliber offered by Teacher Education Institutions (TEIs). Heeding the clarion call for educational excellence, this
study delves into the validity of departmental examination questions for the professional education
subjects in S.Y. 2022-2023, aspiring to lay the groundwork for an impeccable test item bank. Rooted
in a meticulous descriptive quantitative framework, data sourced from the College of Teacher
Education underwent comprehensive statistical scrutiny. The insights drawn were enlightening:
while the six professional education courses manifested notable reliability, an overwhelming
proportion of their questions were classified as "very easy", signaling an imperative for revision.
While this may hint at an academically adept student populace, it simultaneously accentuates the
critical role of item analysis in sculpting superior assessments. Concluding on a note of urgency,
the study underscores the paramountcy of constructing tests that faithfully capture student prowess,
advocating for an earnest overhaul of items and the inception of a robust, standardized test bank.
Keywords: Departmental Examination, Item Analysis, Item Difficulty, Item Discrimination,
Reliability Index

INTRODUCTION
As the global community grapples with sustainability challenges, the realm of education
has not remained untouched. Amidst pressing environmental, economic, and societal challenges, the
validity and integrity of educational assessments have emerged as crucial facets in achieving global
sustainability targets, particularly under the umbrella of quality education. Reliable assessment tools
are not mere evaluative instruments; they are the bedrock on which education can nurture a future
generation equipped to address multifaceted global issues.
Standardized examinations, a predominant component of educational evaluation, hold the promise
of equity, offering a level playing field for all students. Departmental exams and board exams are
a testament to this, where a universal set of questions aims to gauge the
readiness of future professionals—from doctors to educators—for their respective fields. These
exams aren't just milestones; they shape futures and hence, the weight of their influence is
formidable.
Given the stakes, Teacher Education Institutions (TEIs) wield a potent tool in mock boards and
departmental examinations, mirroring the Board Licensure Examination for Professional Teachers
(BLEPT). Designed to mimic real-world testing conditions, these mock examinations serve as
essential dress rehearsals. However, the validity of these tools hinges on the quality of their
construction. In an era advocating for revitalized research and ethics, ensuring the quality of these
exams through rigorous validation becomes a global imperative. The resilience and credibility of a
test score correlate with its reliability, and any shortcoming in this respect undermines the integrity of the
evaluations and, by extension, the future trajectories of students.
Alarmingly, many educators have not internalized the gravity of thorough item analysis, often
resting on the assumption that their constructed test items are beyond reproach (Hartoyo, 2011).
Such presumptions pose a threat to quality education as they can misguide interpretations about
student capabilities based merely on scores, without weighing the quality of the test itself. Despite
global emphasis on item analysis, the Philippines lags behind, with glaring gaps, particularly
concerning departmental exams.
In light of these issues, this research aims to validate departmental test items, emphasizing their role
in achieving holistic educational excellence in the Philippines and by extension, contributing to
global sustainability. This study's outcomes hope to illuminate the path for educators, inspiring the
crafting of reliable assessments. By sharpening the precision of test items, this study takes one step
closer to providing students with unparalleled evaluative tools, solidifying the foundations for a
globally-recognized test bank in service of sustainable futures.

THEORETICAL BACKGROUND
Anchored in the principles of Classical Test Theory (CTT), this study delves into the
quantitative assessment of test reliability and validity. CTT operates on the premise that every
individual has an inherent true score, captured by the equation X = T + E. Here, 'X' denotes the
observed score, 'T' the true score, and 'E' the random error. The core objective of CTT is to refine
both the reliability and validity of tests.
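A standard corollary of this decomposition, noted here for completeness, is that reliability can be expressed as the proportion of observed-score variance attributable to true scores, that is, reliability = Var(T) / Var(X) = Var(T) / (Var(T) + Var(E)), provided the error term E is uncorrelated with the true score T.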
In this research context, a designated sample undergoes testing. Post-examination, an
analytical evaluation ensues, leveraging various facets of CTT, including descriptive statistics like
mean and variance. A critical step involves gauging item-level statistics, especially the mean, to
discern item efficacy. Additionally, item difficulty, quantified by 'p' values, and the discrimination
index, which measures the item's validity, play pivotal roles in the analysis.
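To make these item-level quantities concrete, the following is a minimal sketch, in Python with simulated 0/1 scores, of how a difficulty index and an upper-lower discrimination index can be computed from a scored response matrix. The function names, the 27% grouping convention, and the simulated data are illustrative assumptions, not the study's exact procedure.

import numpy as np

def item_difficulty(responses):
    # Difficulty index p: proportion of examinees who answered each item correctly.
    return responses.mean(axis=0)

def item_discrimination(responses, group_frac=0.27):
    # Discrimination index D: difficulty in the upper group minus difficulty in the
    # lower group, where the groups are formed from total test scores.
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    n_group = max(1, int(round(group_frac * responses.shape[0])))
    lower = responses[order[:n_group]]
    upper = responses[order[-n_group:]]
    return upper.mean(axis=0) - lower.mean(axis=0)

# Simulated data only: 200 examinees answering 150 items, most items very easy.
rng = np.random.default_rng(0)
responses = (rng.random((200, 150)) < 0.85).astype(int)
print(item_difficulty(responses)[:5])
print(item_discrimination(responses)[:5])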
The study's crux revolves around CTT's application in evaluating departmental exam
questions' validity. Through meticulous assessments, researchers aim to categorize items as
retainable, in need of revision, or to be discarded. The findings, in essence, will chart the blueprint
for a robust test item bank and provide directives for validating exam questions, thereby enabling
institutions to offer exams that truly reflect and augment students' learning capacities.
Figure 1. Schematic Diagram of Study

OBJECTIVES
The study entitled Validation of Departmental Examination Questions: Basis for Test Item
Bank Formulation aims to validate the departmental examination questions of the professional
education subjects for the S.Y. 2022-2023 as the basis for test item bank formulation. Specifically,
this study pursues the following objectives: (1) evaluate the validity indices of subjects within
the departmental examination during the first semester of S.Y. 2022-2023, (2) evaluate the
reliability indices of these subjects within the same examination framework, (3) identify the number
of exam items categorized for retention, revision, or rejection, and (4) propose a strategic action
plan derived from the study's findings for future examination enhancements.
METHODOLOGY
This study employed a descriptive quantitative approach to assess the validity and reliability
of departmental examination questions for professional education subjects during the academic year
2022-2023. The descriptive quantitative design was chosen as it provides a structured method for
gathering quantifiable data from a sample population, facilitating statistical analysis.
In this particular study, no respondents were involved because the data collected were secondary in
nature: the college itself conducts the examinations and, consequently, collects the data. Given that
the college administers these examinations as part of its standard academic procedures, there was no
need to seek external respondents.
This study was conducted at the University of Cebu—Lapu-Lapu and Mandaue Campus,
situated in Mandaue City. This esteemed institution of higher learning is deeply dedicated to
providing a holistic education founded on the principles of humanism, nationalism, and academic
excellence. The university's overarching mission is to make quality education accessible to all and
aims to lead the industry through visionary leadership, with a strong commitment to inspiring and
positively influencing the lives of its students. In pursuit of this mission, the College of Teacher
Education (CTE) consistently administers departmental assessments, designed to equip students
with the knowledge and skills needed to excel in board examinations, with the ultimate goal of
producing topnotchers.

The data gathering process involved several distinct phases. In the preliminary stage, the
researchers secured the necessary approvals and permissions from the relevant authorities to conduct
the study and reviewed institutional policies to gain a comprehensive understanding of the procedures
required for data access. In the actual data gathering phase, after securing the required permissions
from the college, the researchers acquired the data essential to the research. In the post-data
gathering phase, they examined and curated the collected data, organizing it in preparation for item
analysis.

The data analysis drew on several techniques. Subproblem 1 was addressed through item analysis,
applying the formulas for the difficulty and discrimination indices and interpreting the results
against established reference tables, namely Matoba's difficulty index table and Dr. Roderick O.
Misa's discrimination index table. For subproblem 2, the researchers used the split-half reliability
method and the Spearman-Brown formula, interpreted through an adapted table of reliability
coefficients by Dr. Roderick O. Misa. The data were presented in tabulated form to facilitate
interpretation through the relevant statistical techniques. For subproblems 3 and 4, frequency counts
were used to identify items to be retained, revised, or rejected, providing a systematic basis for the
proposed action plan.
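As a point of reference for the split-half procedure described above, the sketch below shows one common implementation: the test is divided into odd- and even-numbered items, the two half scores are correlated, and the Spearman-Brown formula, r_SB = 2r / (1 + r), projects that correlation to the full test length. The odd-even split, the simulated data, and the function name are illustrative assumptions, not the study's exact computation; the study interpreted the resulting coefficients against Dr. Roderick O. Misa's adapted reliability table.

import numpy as np

def split_half_reliability(responses):
    # Split the test into odd- and even-numbered items, correlate the two
    # half scores, then apply the Spearman-Brown correction: 2r / (1 + r).
    odd_scores = responses[:, 0::2].sum(axis=1)
    even_scores = responses[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd_scores, even_scores)[0, 1]
    return 2 * r / (1 + r)

# Simulated responses from a simple one-parameter model (not the study's data).
rng = np.random.default_rng(1)
ability = rng.normal(size=200)
difficulty = rng.normal(loc=-1.5, size=150)          # negative values -> easy items
prob_correct = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
responses = (rng.random((200, 150)) < prob_correct).astype(int)
print(round(split_half_reliability(responses), 3))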
The ethical considerations integral to this study encompass several key facets. Notably,
participant selection did not apply as the data gathered was of a secondary nature, sourced from the
College of Teacher Education and the Computer Maintenance Old Building In-Charge. The research
procedures adhered to a meticulously defined protocol, entailing the solicitation of permissions to
access data from the departmental examination held during the 1st semester of S.Y. 2022-2023.
Subsequently, the obtained data underwent rigorous testing for validity and reliability, contributing
to the creation of a repository of sound test items. Maintaining the highest level of confidentiality,
access to this data was restricted solely to the researchers and the supervisory panel, with plans for
its secure disposal after six months. Importantly, given the absence of respondents, the study carried
minimal risk, as the data were readily available secondary sources. Strict confidentiality measures
were diligently applied, ensuring that the data and related documents were securely stored and not
disclosed to external parties. Upon the study's completion, the researchers committed to sharing
their findings with the College of Teacher Education and the school administration, conducting all
research-related communications with transparency and integrity. The trustworthiness of the
research was underscored by a meticulous commitment to precision, consistency, and
comprehensiveness throughout the data collection and analysis processes, with detailed
documentation and systematic methodology disclosure to enable readers to assess the research's
authenticity.
RESULTS AND DISCUSSION
This chapter provides the results, discussions, analyses, and interpretations of data based on
the stated objectives.

Table 1
Difficulty Level of Foundation of Special and Inclusive Education

Difficulty Level Frequency Percentage


Very Easy 129 86%
Easy 6 4%
Average 3 2%
Difficult 7 5%
Very Difficult 5 3%
Totality 150 100%

The data shows that 86% of the items in this test are very easy, 4% of the items in this test
are easy, 2% of the items in the test are average, 5% of the items in this test are difficult, and 3% of
the items in this test are very difficult. It can be deduced that this test on the Foundation of Special
and Inclusive Education has the potential to possess the qualities of an excellent test, but not in its
current form: the majority of the items in the test set have a very easy level of difficulty, while there
are barely any items with a medium level of difficulty. The Foundation of Special and Inclusive
Education examination includes questions on the learning characteristics of students with special
educational needs as well as differentiated strategies for teaching, assessing, and managing students
with special educational needs in regular classes.

Table 2
Discrimination Level of Foundation of Special and Inclusive Education

Discrimination Level Frequency Percentage


Intrinsically Ambiguous 13 9%
Zero Discrimination 1 1%
Poor Discrimination 0 0%
Marginal Discrimination 2 1%
Fair Discrimination 4 2%
Good Discrimination 1 1%
Excellent Discrimination 129 86%
Totality 150 100%

The data reveals that 9% of the items are intrinsically ambiguous, 1% of the items have zero
discrimination, 0% of the items have poor discrimination, 1% of the items have marginal
discrimination, 2% of the items have fair discrimination, 1% of the items have good discrimination,
and 86% of the items have excellent discrimination. Based on the data, most of the items were
within the acceptable parameters of excellent discrimination as revealed by the frequency count.
This means that most of the test items were able to discriminate academically between the high-
performing and low-performing examinees.

Table 3
Difficulty Level of The Child and Adolescent Learners and Learning Principles

Difficulty Level Frequency Percentage


Very Easy 137 91%
Easy 6 4%
Average 3 2%
Difficult 3 2%
Very Difficult 1 1%
Totality 150 100%

The data shows that 91% of the items in this test are very easy, 4% of the items in this test
are easy, 2% of the items in the test are average, 2% of the items in this test are difficult, and 1% of
the items in this test are very difficult. It can be inferred that this test based on The Child and
Adolescent Learners and Learning Principles is invalid, but it has the potential to possess the
qualities of an excellent test. The Child and Adolescent Learners and Learning Principles
examination includes questions regarding human development, developmental stages and tasks, and
human development theories.

Table 4
Discrimination Level of The Child and Adolescent Learners and Learning Principles

Discrimination Level Frequency Percentage


Intrinsically Ambiguous 12 8%
Zero Discrimination 2 1%
Poor Discrimination 3 2%
Marginal Discrimination 3 2%
Fair Discrimination 4 3%
Good Discrimination 4 3%
Excellent Discrimination 122 81%
Totality 150 100%

The data indicates that 8% of the items are intrinsically ambiguous, 1% of the items have
zero discrimination, 2% of the items have poor discrimination, 2% of the items have marginal
discrimination, 3% of the items have fair discrimination, 3% of the items have good discrimination,
and 81% of the items have excellent discrimination. Based on the data, it can be implied that this
test on The Child and Adolescent Learners and Learning Principles has a good positive
discrimination power which is required for a good test. With more than 50% of items having
excellent discrimination, it can be said that this test set has reached the ideal index and implies that
high achievers answer correctly more often than low achievers.
Table 5
Difficulty Level of Technology for Teaching and Learning 1

Difficulty Level Frequency Percentage


Very Easy 93 62%
Easy 15 10%
Average 26 18%
Difficult 14 9%
Very Difficult 2 1%
Totality 150 100%

The data shows that 62% of the items in this test are very easy, 10% of the items in this test
are easy, 18% of the items in the test are average, 9% of the items in this test are difficult, and 1% of
the items in this test are very difficult. This suggests that the Technology for Teaching and Learning 1
test set does not yet have the desirable qualities of a good test, although it has the potential to. The
item difficulty data indicate that most test items are very easy, which informs teachers and test
creators that the current test set lacks items with a medium degree of difficulty. The
Technology for Teaching and Learning 1 includes questions about basic concepts in ICT, digital
literacy skills, non-digital and digital tools, and the integration of ICT in the classroom.

Table 6
Discrimination Level of Technology for Teaching and Learning 1

Discrimination Level Frequency Percentage


Intrinsically Ambiguous 15 10%
Zero Discrimination 5 3%
Poor Discrimination 5 3%
Marginal Discrimination 10 7%
Fair Discrimination 14 9%
Good Discrimination 14 9%
Excellent Discrimination 87 59%
Totality 150 100%

The data exhibits that 10% of the items are intrinsically ambiguous, 3% of the items have
zero discrimination, 3% of the items have poor discrimination, 7% of the items have marginal
discrimination, 9% of the items have fair discrimination, 9% of the items have good discrimination,
and 59% of the items have excellent discrimination. On the basis of the frequency data, it is possible
to deduce that this Technology for Teaching and Learning 1 test set has a positive discrimination
power. This demonstrates that this test set has reached the optimal index and suggests that high
achievers answer accurately more frequently than low achievers.
Table 7
Difficulty Level of The Teaching Profession

Difficulty Level Frequency Percentage


Very Easy 102 68%
Easy 23 15%
Average 16 11%
Difficult 7 5%
Very Difficult 2 1%
Totality 150 100%

The data shows that 68% of the items in this test are very easy, 15% of the items in this test
are easy, 11% of the items in the test are average, 5% of the items in this test are difficult, and 1% of
the items in this test are very difficult. It is possible to conclude that this test based on The Teaching
Profession is invalid, but it has the potential to be an exceptional test. The majority of the items on
the examination were of a very easy degree of difficulty, and there were just a few with a moderate
degree of difficulty. The Teaching Profession examination consists of queries about teaching as a
profession, vocation, and mission, code of ethics, competency framework and standards, and
educational philosophies.

Table 8
Discrimination Level of The Teaching Profession

Discrimination Level Frequency Percentage


Intrinsically Ambiguous 10 7%
Zero Discrimination 5 3%
Poor Discrimination 2 1%
Marginal Discrimination 16 11%
Fair Discrimination 19 13%
Good Discrimination 20 13%
Excellent Discrimination 78 52%
Totality 150 100%

The data presents that 7% of the items are intrinsically ambiguous, 3% of the items have
zero discrimination, 1% of the items have poor discrimination, 11% of the items have marginal
discrimination, 13% of the items have fair discrimination, 13% of the items have good
discrimination, and 52% of the items have excellent discrimination. It may be inferred from the data
that this set of test on The Teaching Profession has a high positive discrimination power indicative
of a high-quality assessment tool. With more than fifty percent of items having excellent
discrimination, it can be said that this test set has attained the optimal index, indicating that high
achievers are more likely to answer accurately than low achievers.
Table 9
Difficulty Level of Assessment in Learning 1

Difficulty Level Frequency Percentage


Very Easy 126 84%
Easy 13 9%
Average 8 5%
Difficult 2 1%
Very Difficult 1 1%
Totality 150 100%

The data shows that 84% of the items in this test are very easy, 9% of the items in this test
are easy, 5% of the items in the test are average, 1% of the items in this test are difficult, and 1% of
the items in this test are very difficult. It can be deduced that this Assessment in Learning 1
examination has the potential to possess the qualities of an ideal examination. The data revealed that the
test set lacks items with a medium level of difficulty and consists primarily of items with a very
simple level of difficulty. The Assessment in Learning 1 exam includes assessment purposes,
assessment types, and item-analysis questions.
Table 10
Discrimination Level of Assessment in Learning 1

Discrimination Level Frequency Percentage


Intrinsically Ambiguous 3 2%
Zero Discrimination 2 1%
Poor Discrimination 1 1%
Marginal Discrimination 8 5%
Fair Discrimination 9 6%
Good Discrimination 19 13%
Excellent Discrimination 108 72%
Totality 150 100%

The data indicates that 2% of the items are intrinsically ambiguous, 1% of the items have
zero discrimination, 1% of the items have poor discrimination, 5% of the items have marginal
discrimination, 6% of the items have fair discrimination, 13% of the items have good
discrimination, and 72% of the items have excellent discrimination. This test set for Assessment in
Learning 1 has good discriminatory power, as shown by the frequency statistics. This indicates that
the optimal index has been reached and suggests that high achievers are more likely than low
achievers to provide accurate responses.
Table 11
Difficulty Level of The Teacher and The School Curriculum

Difficulty Level Frequency Percentage


Very Easy 101 67%
Easy 24 16%
Average 14 9%
Difficult 7 5%
Very Difficult 4 3%
Totality 150 100%

The data shows that 67% of the items in this test are very easy, 16% of the items in this test
are easy, 9% of the items in the test are average, 5% of the items in this test are difficult, and 3% of
the items in this test are very difficult. It can be gleaned that this test set on The Teacher and The
School Curriculum has the potential to exhibit the ideal characteristics of a good test, although the data
revealed that the test set lacks items with a medium degree of difficulty. The Teacher and The School Curriculum test includes
curriculum development, planning, and evaluation questions.

Table 12
Discrimination Level of The Teacher and The School Curriculum

Discrimination Level Frequency Percentage


Intrinsically Ambiguous 14 9%
Zero Discrimination 2 1%
Poor Discrimination 5 3%
Marginal Discrimination 10 7%
Fair Discrimination 17 12%
Good Discrimination 23 15%
Excellent Discrimination 79 53%
Totality 150 100%

The data presents that 9% of the items are intrinsically ambiguous, 1% of the items have
zero discrimination, 3% of the items have poor discrimination, 7% of the items have marginal
discrimination, 12% of the items have fair discrimination, 15% of the items have good discrimination,
and 53% of the items have excellent discrimination. The data suggest that this test set on The
Teacher and The School Curriculum has a high positive discrimination power, which is indicative of
a high-quality evaluation instrument. According to the presented data, the majority of items fall
within the acceptable range for excellent discrimination. It indicates that the majority of the items
were able to distinguish between high and low academic performers.
Table 13
Reliability Statistics of the Departmental Examination of Professional Education Courses

Professional Education Courses                               N      Spearman-Brown Coefficient      Verbal Interpretation

Foundation of Special and Inclusive Education                207    .733                            Good Reliability
The Child and Adolescent Learners and Learning Principles    205    .872                            Very Good Reliability
Technology for Teaching and Learning 1                       135    .752                            Good Reliability
The Teaching Profession                                      103    .070                            Questionable Reliability
Assessment in Learning 1                                     106    .885                            Very Good Reliability
The Teacher and The School Curriculum                        98     .894                            Very Good Reliability

The data reveals that three professional education courses obtained very good reliability. The
Teacher and The School Curriculum course obtained the highest reliability coefficient, .894, followed by
Assessment in Learning 1, with a reliability coefficient of .885, and The Child and Adolescent Learners
and Learning Principles, with a reliability coefficient of .872. Technology for Teaching and Learning 1
and Foundation of Special and Inclusive Education obtained good reliability, with reliability
coefficients of .752 and .733, respectively. Lastly, The Teaching Profession course obtained a
reliability coefficient of .070, interpreted as questionable reliability. According to Fulcher and
Davidson (2007), for a test to be reliable, it must produce consistent results under different
conditions. The Spearman-Brown formula was used to ascertain the reliability of the examinations, and a
calculated value greater than 0.70 is regarded as reliable (Crocker & Algina, 2008; Smith, 2018).
Consequently, the departmental examinations of the professional education courses are, on the whole,
regarded as reliable, even though the reliability of one of the exams is questionable.
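As a worked illustration of the Spearman-Brown projection applied here (the half-test correlation below is hypothetical, not taken from the study's data): if the scores on the two halves of a test correlate at r = .58, the estimated full-length reliability is 2(.58) / (1 + .58) ≈ .73, which would clear the 0.70 threshold cited above. Conversely, the reported coefficient of .070 for The Teaching Profession corresponds to a half-test correlation of only about .036.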
Table 14
Number of Items to be Retained, Revised, and Rejected

Professional Education Courses                               Items to be retained    Items to be revised    Items to be rejected    Totality

Foundation of Special and Inclusive Education                8                       126                    16                      150
The Child and Adolescent Learners and Learning Principles    45                      103                    2                       150
Technology for Teaching and Learning 1                       29                      99                     22                      150
The Teaching Profession                                      24                      106                    20                      150
Assessment in Learning 1                                     17                      124                    9                       150
The Teacher and The School Curriculum                        34                      86                     30                      150

The data presented show that the column with the highest numbers is the column for the
number of items to be revised. The only subject with a relatively high number of items to be retained is
TC112: The Child and Adolescent Learners and Learning Principles, with 45 items, followed by PC315: The
Teacher and the School Curriculum, with 34 items to be retained. The table shows that it is important
for teachers and test creators to perform item analysis to learn more about the tests they have created.
Odukoya et al. (2018) advised using item analysis when preparing for an exam. By doing this, teachers
and test creators can determine whether the test items they have created meet the psychometric
thresholds for the difficulty indices. Test items that fail to meet these thresholds require modification
or removal, which would improve the test's content and construct validity.
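To illustrate how such retain-revise-reject decisions can be operationalized, the following is a minimal sketch in Python that applies simple cut-offs to an item's difficulty index (p) and discrimination index (D). The thresholds and the function name are illustrative assumptions; the study's classifications followed Matoba's difficulty index table and Dr. Roderick O. Misa's discrimination index table rather than these exact values.

def classify_item(p, d):
    # Illustrative cut-offs only; the study used Matoba's and Misa's reference tables.
    if 0.25 <= p <= 0.75 and d >= 0.40:
        return "retain"   # acceptable difficulty and strong discrimination
    if d < 0.0:
        return "reject"   # negative discrimination: low scorers outperform high scorers
    return "revise"       # too easy, too hard, or weak discrimination

print(classify_item(p=0.86, d=0.55))   # very easy but discriminating item -> "revise"
print(classify_item(p=0.50, d=0.45))   # mid difficulty, good discrimination -> "retain"
print(classify_item(p=0.30, d=-0.10))  # negative discrimination -> "reject"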

CONCLUSION
In conclusion, this descriptive quantitative research revealed that the departmental examinations for
all six professional education courses are reliable but not yet validated. Additionally, one reason why
the majority of test questions are "very easy" could be that most students studied or reviewed well for
the exam. Finally, the findings of this study revealed that item analysis is a crucial step in the test
development process; it pertains to determining test quality, which directly influences the accuracy of
student scores.

RECOMMENDATIONS

Given the implications of the research findings and the conclusions drawn, the following
recommendations are proffered:
1. Institution of a Rigorous Test Item Refinement Protocol: Endeavor to formulate a holistic
and dynamic approach focused on the rigorous enhancement and validation of departmental
examination questions for professional education subjects. This protocol should be anchored
on contemporary best practices and should prioritize the elevation of question quality,
ensuring they align with the rigors of the professional landscape and the expectations of
Teacher Education Institutions (TEIs).
2. Establishment of a Dedicated Test Quality Oversight Committee within the College of
Teacher Education (CTE): This body should be entrusted with the responsibility of
periodically reviewing, evaluating, and refining test items. By doing so, they would be
ensuring that assessments remain contemporaneous, challenging, and reflective of the
evolving academic and professional demands.
3. Inception of a Comprehensive Teacher Training Workshop: Champion a series of dedicated
workshops tailored for test creators within the CTE. These workshops should provide a deep
dive into the nuances of designing high-quality test items, elucidate the principles of test
validity and reliability, and offer hands-on opportunities for faculty to refine and perfect
their test creation skills.
4. Rollout of a Feedback-Driven Test Revision Mechanism: Cultivate a culture of continuous
improvement by instituting a mechanism where teachers, students, and test creators
collaboratively review and provide feedback on test items. This iterative feedback loop
ensures the dynamism and relevance of questions while fostering a sense of collective
ownership over the quality of assessments.
5. The study recommends further research to establish the validation of departmental
examination questions as an important part of formulating a test item bank. The following
titles can be considered for Quantitative studies: (1) A Comparative Analysis of Item
Difficulty Index in Exam Questions for College of Teacher Education Students; (2) Validity
and Reliability of Multiple-Choice Questions for Education Students’ Exams; (3) An
Empirical Study of Item Discrimination Index and Item-Total Correlation in Exam
Questions for College of Teacher Education Students. For Qualitative studies: (1) Evaluating
the Content Validity of Examination Questions: A Qualitative Analysis of Expert Feedback;
(2) Exploring the Perceptions of Students regarding the Construct Validity of Examination
Questions; (3) A Qualitative Study on Item Analysis Techniques for Validating Examination
Questions.
REFERENCES

Abdalla, M. E. (2011). What does Item Analysis Tell Us? Factors affecting the reliability of
Multiple Choice Questions (MCQs). Gezira Journal of Health Sciences, 7(2), Article 2.
http://journals.uofg.edu.sd/index.php/gjhs/article/view/318
An, X., & Yung, Y.-F. (2014). Item Response Theory: What It Is and How You Can Use the IRT
Procedure to Apply It. 14.
Arokia Marie, S. M. J., & Edannur, S. (2015). Relevance of Item Analysis in Standardizing an
Achievement Test in Teaching of Physical Science in B.Ed Syllabus. I-Manager’s Journal
of Educational Technology, 12(3), 30–36. https://doi.org/10.26634/jet.12.3.3743
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. Longman.
Bryman, A. & Bell, E. (2007). “Business Research Methods,” 2nd edition. Oxford University
Press.
Classical Test Theory – Psychometric Tests. (n.d.). Retrieved November 9, 2022, from
https://www.psychometrictest.org.uk/classic-test-theory/
Classical Test Theory Assumptions, Equations, Limitations, and Item Analyses. (2005).
https://www.sagepub.com/sites/default/files/upm-binaries/4869_Kline_Chapter_5_Classical
_Test_Theory.pdf
Crocker, L. & Algina, J. (2008). Introduction to classical and modern test theory. Mason, OH:
Cengage Learning.
Fulcher, G., & Davidson, F. (2007). Language Testing and Assessment: An Advanced Resource
Book. Routledge.
Fraenkel, J. R. & Wallen, N. E. (2003). How to design and evaluate research in education (5th ed.).
New York, NY: McGraw-Hill Higher Education.
Hartoyo. (2011). Language assessment. Pelita Insani
Haladyna, T. M. (2004). Developing and Validating Multiple-Choice Test Items (3rd ed.). Lawrence
Erlbaum Associates Publisher.
Hingorjo, M. R., & Jaleel, F. (2012). Analysis of One-Best MCQs: The Difficulty Index,
Discrimination Index, and Distractor Efficiency. JPMA-Journal of the Pakistan Medical
Association, 62(2), 142–147.
Karim, S. A., Sudiro, S., & Sakinah, S. (2021). Utilizing test items analysis to examine the level of
difficulty and discriminating power in a teacher-made test. EduLite: Journal of English
Education, Literature and Culture, 6(2), 256. https://doi.org/10.30659/e.6.2.256-269
Kunandar, K. (2013). Penilaian Autentik: (Penilaian Hasil Belajar Peserta Didik Kurikulum 2013).
RajaGrafindo Persada.
Kline, T. J. (2005). Reliability of test scores and test items. SAGE Publications, Inc.,
https://dx.doi.org/10.4135/9781483385693
Linn, R. L. (2010). Educational Measurement: Overview. In P. Peterson, E. Baker, & B. McGaw
(Eds.), International Encyclopedia of Education (Third Edition) (pp. 45–49). Elsevier.
https://doi.org/10.1016/B978-0-08-044894-7.00243-8
Livingston, S. (2018). Test Reliability-Basic Concepts.
https://www.ets.org/Media/Research/pdf/RM-18-01.pdf
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA:
Addison-Wesley.
Maharani, A. V., & Putro, N. H. P. S. (2020). Item Analysis of English Final Semester Test.
Indonesian Journal of EFL and Linguistics, 5(2), 491.
https://doi.org/10.21462/ijefl.v5i2.302
Mardapi, D. (2015). Pengukuran, Penilaian, dan Evaluasi Pendidikan. Nuha Litera.
Matoba, B. (2020, August 6). Discriminatory item analysis: A test administrator’s best friend.
EMS1. https://www.ems1.com/ems-products/training-tools/articles/discriminatory-item-
analysis-a-test-administrators-best-friend-swTO4YwkCWH1ZXyD/
Miller, D. M., Linn, R. L., & Gronlund, N. E. (2009). Measurement and Assessment in Teaching
(10th ed.). Pearson Education.
Odukoya, J. A., Adekeye, O., Igbinoba, A. O., & Afolabi, A. (2018). Item analysis of university-
wide multiple choice objective examinations: The experience of a Nigerian private
university. Quality & Quantity, 52(3), 983–997. https://doi.org/10.1007/s11135-017-0499-2
Rosana, D., & Setyawarno, D. (2017). Statistik Terapan Untuk Penelitian Pendidikan. UNY Press.
Sabbot. (2013, May 15). Standardized Test Definition. The Glossary of Education Reform.
https://www.edglossary.org/standardized-test/
Smith, S. (2018). What is KR20? Platinum Educational Group. Retrieved from
https://platinumed.zendesk.com/hc/en-us/articles/214600006-What-is-KR20-
Tanquis, J., & Baluarte, S. (2019). Assessment of the Departmental Examination Outcomes of the
College of Computer Studies of the University of Cebu LapuLapu and Mandaue: Proposed
Intervention. Cebu Journal of Computer Studies, 2(1), 6-16.
Thompson, N. (2018, April 14). What is the Spearman-Brown Formula? - Psychometric analytics.
Assessment Systems; Assessment Systems. https://assess.com/spearman-brown-prediction-
formula/
Understanding Item Analyses. (2022). Office of Educational Assessment. Retrieved October 14,
2022, from https://www.washington.edu/assessment/scanning-scoring/scoring/reports/item-
analysis/
