ABSTRACT
INTRODUCTION
As the global community grapples with sustainability challenges, the realm of education
has not remained untouched. Amidst pressing environmental, economic, and societal challenges, the
validity and integrity of educational assessments have emerged as crucial facets in achieving global
sustainability targets, particularly under the umbrella of quality education. Reliable assessment tools
are not mere evaluative instruments; they are the bedrock upon which education can nurture a future
generation equipped to address multifaceted global issues.
Standardized examinations, a predominant component of educational evaluation, hold the promise
of equity, offering a level playing field for all students. Examples such as departmental
exams and board exams are a testament to this, where a universal set of questions aims to gauge the
readiness of future professionals—from doctors to educators—for their respective fields. These
exams are not just milestones; they shape futures, and hence the weight of their influence is
formidable.
Given the stakes, Teacher Education Institutions (TEIs) wield a potent tool in mock boards and
departmental examinations, mirroring the Board Licensure Examination for Professional Teachers
(BLEPT). Designed to mimic real-world testing conditions, these mock examinations serve as
essential dress rehearsals. However, the validity of these tools hinges on the quality of their
construction. In an era advocating for revitalized research and ethics, ensuring the quality of these
exams through rigorous validation becomes a global imperative. The resilience and credibility of a
test score correlate with its reliability, and any shortcoming in this spectrum risks the sanctity of the
evaluations, and by extension, the future trajectories of students.
Alarmingly, many educators have not internalized the gravity of thorough item analysis, often
resting on the assumption that their constructed test items are beyond reproach (Hartoyo, 2011).
Such presumptions pose a threat to quality education as they can misguide interpretations about
student capabilities based merely on scores, without weighing the quality of the test itself. Despite
global emphasis on item analysis, the Philippines lags behind, with glaring gaps, particularly
concerning departmental exams.
In light of these issues, this research aims to validate departmental test items, emphasizing their role
in achieving holistic educational excellence in the Philippines and by extension, contributing to
global sustainability. The outcomes of this study are intended to illuminate the path for educators, inspiring the
crafting of reliable assessments. By sharpening the precision of test items, this study takes one step
closer to providing students with unparalleled evaluative tools, solidifying the foundations for a
globally-recognized test bank in service of sustainable futures.
THEORETICAL BACKGROUND
Anchored in the principles of Classical Test Theory (CTT), this study delves into the
quantitative assessment of test reliability and validity. CTT operates on the premise that every
individual has an inherent true score, captured by the equation X = T + E. Here, 'X' denotes the
observed score, 'T' the true score, and 'E' the random error. The core objective of CTT is to refine
both the reliability and validity of tests.
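To make this premise concrete, the standard CTT variance decomposition can be restated formally. The identities below are textbook CTT results, included here for clarity rather than drawn from the study's own formulas:

```latex
X = T + E, \qquad
\operatorname{Var}(X) = \sigma_T^2 + \sigma_E^2 \quad (\text{assuming } \operatorname{Cov}(T, E) = 0), \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

Under this view, reliability is the proportion of observed score variance attributable to true scores, so refining a test amounts to shrinking the error variance relative to the true score variance.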
In this research context, a designated sample undergoes testing. Post-examination, an
analytical evaluation ensues, leveraging various facets of CTT, including descriptive statistics like
mean and variance. A critical step involves gauging item-level statistics, especially the mean, to
discern item efficacy. Additionally, item difficulty, quantified by 'p' values, and the discrimination
index, which measures the item's validity, play pivotal roles in the analysis.
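For illustration only, the two item-level statistics named above can be computed from a matrix of scored responses as sketched below. The sketch assumes dichotomous (0/1) scoring and the common 27% upper/lower grouping convention; the function names and the grouping rule are assumptions for this example, since the study itself interprets these statistics against Matoba's and Dr. Misa's reference tables rather than through code.

```python
import numpy as np

def item_difficulty(scores: np.ndarray) -> np.ndarray:
    """Difficulty index p: proportion of examinees answering each item correctly.

    `scores` is an (examinees x items) matrix of 0/1 responses.
    """
    return scores.mean(axis=0)

def item_discrimination(scores: np.ndarray, group_frac: float = 0.27) -> np.ndarray:
    """Discrimination index D: p(upper group) minus p(lower group).

    Examinees are ranked by total score; the top and bottom `group_frac`
    (27% is a common convention, assumed here) form the comparison groups.
    """
    totals = scores.sum(axis=1)
    order = np.argsort(totals)                       # ascending by total score
    n_group = max(1, int(round(group_frac * scores.shape[0])))
    lower = scores[order[:n_group]]                  # lowest-scoring examinees
    upper = scores[order[-n_group:]]                 # highest-scoring examinees
    return upper.mean(axis=0) - lower.mean(axis=0)
```

Items with p close to 1 correspond to the "very easy" category reported later, while negative D values correspond to items that low scorers answer correctly more often than high scorers.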
The study's crux revolves around CTT's application in evaluating departmental exam
questions' validity. Through meticulous assessments, researchers aim to categorize items as
retainable, in need of revision, or to be discarded. The findings, in essence, will chart the blueprint
for a robust test item bank and provide directives for validating exam questions, thereby enabling
institutions to offer exams that truly reflect and augment students' learning capacities.
Figure 1. Schematic Diagram of Study
OBJECTIVES
The study entitled Validation of Departmental Examination Questions: Basis for Test Item
Bank Formulation aims to validate the departmental examination questions of the professional
education subjects for the S.Y. 2022-2023 as the basis for test item bank formulation. Specifically,
this study seeks to: (1) evaluate the validity indices of subjects within
the departmental examination during the first semester of S.Y. 2022-2023, (2) evaluate the
reliability indices of these subjects within the same examination framework, (3) identify the number
of exam items categorized for retention, revision, or rejection, and (4) propose a strategic action
plan derived from the study's findings for future examination enhancements.
METHODOLOGY
This study employed a descriptive quantitative approach to assess the validity and reliability
of departmental examination questions for professional education subjects during the academic year
2022-2023. The descriptive quantitative design was chosen as it provides a structured method for
gathering quantifiable data from a sample population, facilitating statistical analysis.
In this particular study, there were no respondents involved because the data collected are
secondary in nature; the college itself is responsible for conducting the examinations and, consequently, for
collecting the data. Given that the college administers these examinations as part of its standard academic
procedures, there was no need to seek external respondents.
This study was conducted at the University of Cebu—Lapu-Lapu and Mandaue Campus,
situated in Mandaue City. This esteemed institution of higher learning is deeply dedicated to
providing a holistic education founded on the principles of humanism, nationalism, and academic
excellence. The university's overarching mission is to make quality education accessible to all and
aims to lead the industry through visionary leadership, with a strong commitment to inspiring and
positively influencing the lives of its students. In pursuit of this mission, the College of Teacher
Education (CTE) consistently administers departmental assessments, designed to equip students
with the knowledge and skills needed to excel in board examinations, with the ultimate goal of
producing topnotchers.
The data gathering process involved several distinct phases. Initially, during the preliminary
stage, the researchers diligently navigated the intricacies of obtaining necessary approvals and
permissions from relevant authorities to conduct their study. These efforts were complemented by a
meticulous review of institutional policies to gain a comprehensive understanding of the procedures
required for data access. Subsequently, in the actual data gathering phase, after securing the required
permissions from the college, the researchers adeptly acquired the essential data for their research
endeavor. Following the data collection phase, the researchers seamlessly transitioned into the post-
data gathering phase, where they undertook a thorough examination and curation of the collected
data, meticulously organizing it in preparation for subsequent item analysis.

The data analysis process was multifaceted, encompassing a variety of techniques. Subproblem 1 was addressed
through the application of item analysis methods, incorporating formulas for calculating difficulty
and discrimination indices. These calculations drew upon established reference tables, including
Matoba's difficulty index table and Dr. Roderick O. Misa's discrimination index table. For
subproblem 2, the researchers harnessed the power of the split-half reliability method and the
Spearman-Brown formula, supported by an adapted table of reliability coefficients authored by Dr.
Roderick O. Misa. The data treatment strategy involved presenting information in tabulated formats,
facilitating its interpretation through diverse statistical techniques. In addressing subproblems 3 and
4, the researchers efficiently employed frequency counts to identify items necessitating revision or
rejection, thereby adopting a comprehensive and systematic approach to data analysis in pursuit of
their research objectives.
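To complement the description above, the following is a minimal sketch of the split-half procedure with the Spearman-Brown correction, assuming dichotomous (0/1) item scores and an odd-even split. It illustrates the general method only; the researchers' actual computation and Dr. Misa's adapted coefficient table may differ in details.

```python
import numpy as np

def split_half_reliability(scores: np.ndarray) -> float:
    """Split-half reliability stepped up with the Spearman-Brown formula.

    `scores` is an (examinees x items) matrix of 0/1 responses. The test is
    split into odd- and even-numbered items (one common convention, assumed
    here), the two half-scores are correlated, and the correlation is
    corrected to estimate full-length reliability.
    """
    odd_half = scores[:, 0::2].sum(axis=1)
    even_half = scores[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd_half, even_half)[0, 1]  # correlation between halves
    return (2 * r_half) / (1 + r_half)               # Spearman-Brown correction

# Example interpretation: coefficients above 0.70 are commonly read as reliable,
# which is the threshold cited later in this study.
```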
The ethical considerations integral to this study encompass several key facets. Notably,
participant selection did not apply as the data gathered was of a secondary nature, sourced from the
College of Teacher Education and the Computer Maintenance Old Building In-Charge. The research
procedures adhered to a meticulously defined protocol, entailing the solicitation of permissions to
access data from the departmental examination held during the 1st semester of S.Y. 2022-2023.
Subsequently, the obtained data underwent rigorous testing for validity and reliability, contributing
to the creation of a repository of sound test items. Maintaining the highest level of confidentiality,
access to this data was restricted solely to the researchers and the supervisory panel, with plans for
its secure disposal after six months. Importantly, given the absence of respondents, the study carried
minimal risk, as the data were readily available secondary sources. Strict confidentiality measures
were diligently applied, ensuring that the data and related documents were securely stored and not
disclosed to external parties. Upon the study's completion, the researchers committed to sharing
their findings with the College of Teacher Education and the school administration, conducting all
research-related communications with transparency and integrity. The trustworthiness of the
research was underscored by a meticulous commitment to precision, consistency, and
comprehensiveness throughout the data collection and analysis processes, with detailed
documentation and systematic methodology disclosure to enable readers to assess the research's
authenticity.
RESULTS AND DISCUSSION
This chapter provides the results, discussions, analyses, and interpretations of data based on
the stated objectives.
Table 1
Difficulty Level of Foundation of Special and Inclusive Education
The data shows that 86% of the items in this test are very easy, 4% of the items in this test
are easy, 2% of the items in the test are average, 5% of the items in this test are difficult, and 3% of
the items in this test are very difficult. It can be deduced that this test on the Foundation of Special
and Inclusive Education could potentially possess the qualities of an excellent test. The data
revealed that the majority of the items in the test set have a very easy level of difficulty, while there
are barely any items with a medium level of difficulty. The Foundation of Special and Inclusive
Education examination includes questions on the learning characteristics of students with special
educational needs as well as differentiated strategies for teaching, assessing, and managing students
with special educational needs in regular classes.
Table 2
Discrimination Level of Foundation of Special and Inclusive Education
The data reveals that 9% of the items are intrinsically ambiguous, 1% of the items have zero
discrimination, 0% of the items have poor discrimination, 1% of the items have marginal
discrimination, 2% of the items have fair discrimination, 1% of the items have good discrimination,
and 86% of the items have excellent discrimination. Based on the data, most of the items were
within the acceptable parameters of excellent discrimination as revealed by the frequency count.
This means that most of the test items were able to discriminate between the high-performing and
low-performing examinees.
Table 3
Difficulty Level of The Child and Adolescent Learners and Learning Principles
The data shows that 91% of the items in this test are very easy, 4% of the items in this test
are easy, 2% of the items in the test are average, 2% of the items in this test are difficult, and 1% of
the items in this test are very difficult. It can be inferred that this test based on The Child and
Adolescent Learners and Learning Principles is invalid, but it has the potential to possess the
qualities of an excellent test. The Child and Adolescent Learners and Learning Principles
examination includes questions regarding human development, developmental stages and tasks, and
human development theories.
Table 4
Discrimination Level of The Child and Adolescent Learners and Learning Principles
The data indicates that 8% of the items are intrinsically ambiguous, 1% of the items have
zero discrimination, 2% of the items have poor discrimination, 2% of the items have marginal
discrimination, 3% of the items have fair discrimination, 3% of the items have good discrimination,
and 81% of the items have excellent discrimination. Based on the data, it can be implied that this
test on The Child and Adolescent Learners and Learning Principles has a good positive
discrimination power which is required for a good test. With more than 50% of items having
excellent discrimination, it can be said that this test set has reached the ideal index, which implies that
high achievers answer correctly more often than low achievers.
Table 5
Difficulty Level of Technology for Teaching and Learning 1
The data shows that 62% of the items in this test are very easy, 10% of the items in this test
are easy, 18% of the items in the test are average, 9% of the items in this test are difficult, and 1% of
the items in this test are very difficult. This suggests that the Technology for Teaching and Learning 1
test set does not yet have the desirable qualities of a good test, although it has the potential to acquire
them. The item difficulty data shows that most of the test items are very easy, which can inform teachers
and test creators that their current test set lacks items with a medium degree of difficulty. The
Technology for Teaching and Learning 1 examination includes questions about basic concepts in ICT, digital
literacy skills, non-digital and digital tools, and the integration of ICT in the classroom.
Table 6
Discrimination Level of Technology for Teaching and Learning 1
The data exhibits that 10% of the items are intrinsically ambiguous, 3% of the items have
zero discrimination, 3% of the items have poor discrimination, 7% of the items have marginal
discrimination, 9% of the items have fair discrimination, 9% of the items have good discrimination,
and 59% of the items have excellent discrimination. On the basis of the frequency data, it is possible
to deduce that this Technology for Teaching and Learning 1 test set has a positive discrimination
power. This demonstrates that this test set has reached the optimal index and suggests that high
achievers answer accurately more frequently than low achievers.
Table 7
Difficulty Level of The Teaching Profession
The data shows that 68% of the items in this test are very easy, 15% of the items in this test
are easy, 11% of the items in the test are average, 5% of the items in this test are difficult, and 1% of
the items in this test are very difficult. It is possible to conclude that this test based on The Teaching
Profession is invalid, but it has the potential to be an exceptional test. The majority of the items on
the examination were of a very easy degree of difficulty, and there were just a few with a moderate
degree of difficulty. The Teaching Profession examination consists of queries about teaching as a
profession, vocation, and mission, code of ethics, competency framework and standards, and
educational philosophies.
Table 8
Discrimination Level of The Teaching Profession
The data presents that 7% of the items are intrinsically ambiguous, 3% of the items have
zero discrimination, 1% of the items have poor discrimination, 11% of the items have marginal
discrimination, 13% of the items have fair discrimination, 13% of the items have good
discrimination, and 52% of the items have excellent discrimination. It may be inferred from the data
that this test set on The Teaching Profession has a high positive discrimination power indicative
of a high-quality assessment tool. With more than fifty percent of items having excellent
discrimination, it can be said that this test set has attained the optimal index, indicating that high
achievers are more likely to answer accurately than low achievers.
Table 9
Difficulty Level of Assessment in Learning 1
The data shows that 84% of the items in this test are very easy, 9% of the items in this test
are easy, 5% of the items in the test are average, 1% of the items in this test are difficult, and 1% of
the items in this test are very difficult. It can be deduced that this Assessment in Learning 1
examination could potentially possess the qualities of an ideal examination. The data revealed that the
test set lacks items with a medium level of difficulty and consists primarily of items with a very
easy level of difficulty. The Assessment in Learning 1 exam includes questions on assessment purposes,
assessment types, and item analysis.
Table 10
Discrimination Level of Assessment in Learning 1
The data indicates that 2% of the items are intrinsically ambiguous, 1% of the items have
zero discrimination, 1% of the items have poor discrimination, 5% of the items have marginal
discrimination, 6% of the items have fair discrimination, 13% of the items have good
discrimination, and 72% of the items have excellent discrimination. This test set for Assessment in
Learning 1 has good discriminatory power, as shown by the frequency statistics. This indicates that
the optimal index has been reached and suggests that high achievers are more likely than low
achievers to provide accurate responses.
Table 11
Difficulty Level of The Teacher and The School Curriculum
The data shows that 67% of the items in this test are very easy, 16% of the items in this test
are easy, 9% of the items in the test are average, 5% of the items in this test are difficult, and 3% of
the items in this test are very difficult. It can be gleaned that this test set on the Teacher and the
School Curriculum could have had the ideal characteristics of a good test. The data revealed that the
test set lacks items with a medium degree of difficulty. The Teacher and the School Curriculum test
includes questions on curriculum development, planning, and evaluation.
Table 12
Discrimination Level of The Teacher and The School Curriculum
The data presents that 9% of the items are intrinsically ambiguous, 1% of the items have
zero discrimination, 3% of the items have poor discrimination, 7% of the items have marginal
discrimination, 12% of the items have fair discrimination, 15% of the items have good discrimination,
and 53% of the items have excellent discrimination. The data suggest that this test set on The
Teacher and The School Curriculum has a high positive discrimination power, which is indicative of
a high-quality evaluation instrument. According to the presented data, the majority of items fall
within the acceptable range for excellent discrimination. This indicates that the majority of the items
were able to distinguish between high and low academic performers.
Table 13
Reliability Statistics of the Departmental Examination of Professional Education Courses
The data reveals that three professional education courses obtained very good reliability. The
Teacher and The School Curriculum course obtained the highest reliability coefficient, .894, followed by
the Assessment in Learning 1 course, with a .885 reliability coefficient, and then by The Child
and Adolescent Learners and Learning Principles course, with a .872 reliability coefficient. The
Technology for Teaching and Learning 1 and Foundation of Special and Inclusive Education
courses obtained good reliability, with reliability coefficients of .752 and .733, respectively. Lastly, The
Teaching Profession course obtained a reliability coefficient of .070, interpreted as questionable
reliability. According to Fulcher and Davidson (2007), for a test to be reliable, it must produce
consistent results under different conditions. By this standard, the test results are reliable. The Spearman-
Brown formula was used to ascertain the examination's reliability. A test with a calculated value
greater than 0.70 is regarded as reliable (Crocker & Algina, 2008; Smith, 2018). Consequently, the
departmental examination of the professional education courses is regarded as reliable. Even though
the reliability of one of the exams is questionable, it is still valid.
Table 14
Number of Items to be Retained, Revised, and Rejected
The data presented shows that the column with the highest numbers is the column for the
number of items to be revised. The only subject with a high number of items to be retained is
TC112: The Child and Adolescent Learners and Learning Principles, with 45 items, followed
by PC315: The Teacher and the School Curriculum, with 34 items to be retained. The
table shows that it is important for teachers and test creators to perform item analysis to learn more
about the tests they have created. Odukoya et al. (2018) advised using item analysis when preparing for an
exam. By doing this, teachers and test creators can determine whether the test items they have created
meet the psychometric thresholds for difficulty indices. If test items fail to meet the psychometric
thresholds for difficulty indices, they require modification or removal. Modifying or removing those
failing test items would improve the test's content and construct validity.
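To make the retain-revise-reject decision rule concrete, the sketch below shows one possible categorization based on an item's difficulty and discrimination values. The numeric cut-offs are illustrative placeholders only and are not the thresholds from Matoba's difficulty table or Dr. Misa's discrimination table used in this study.

```python
def categorize_item(difficulty: float, discrimination: float) -> str:
    """Classify a test item as 'retain', 'revise', or 'reject'.

    The thresholds below are placeholders for illustration; in practice they
    should be taken from the adopted difficulty and discrimination reference tables.
    """
    if 0.30 <= difficulty <= 0.70 and discrimination >= 0.40:
        return "retain"                      # adequately difficult and discriminating
    if discrimination <= 0.0:
        return "reject"                      # zero or negative discrimination
    return "revise"                          # salvageable but below threshold

# Example: a very easy item with weak discrimination is flagged for revision.
print(categorize_item(difficulty=0.90, discrimination=0.15))  # -> revise
```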
CONCLUSION
In conclusion, this descriptive quantitative research revealed that the examinations for all six
professional education courses are reliable but not yet validated. Additionally, one of the reasons why the majority of
test questions are "very easy" could be that the majority of students have studied or reviewed well
for the exam. Finally, the findings of this study revealed that item analysis is a crucial step in the
test development process. It involves determining test quality, which directly influences the
accuracy of student scores.
RECOMMENDATIONS
Given the implications of the research findings and the ensuing conclusions, the following
recommendations are proffered:
1. Institution of a Rigorous Test Item Refinement Protocol: Endeavor to formulate a holistic
and dynamic approach focused on the rigorous enhancement and validation of departmental
examination questions for professional education subjects. This protocol should be anchored
on contemporary best practices and should prioritize the elevation of question quality,
ensuring they align with the rigors of the professional landscape and the expectations of
Teacher Education Institutions (TEIs).
2. Establishment of a Dedicated Test Quality Oversight Committee within the College of
Teacher Education (CTE): This body should be entrusted with the responsibility of
periodically reviewing, evaluating, and refining test items. By doing so, they would be
ensuring that assessments remain contemporaneous, challenging, and reflective of the
evolving academic and professional demands.
3. Inception of a Comprehensive Teacher Training Workshop: Champion a series of dedicated
workshops tailored for test creators within the CTE. These workshops should provide a deep
dive into the nuances of designing high-quality test items, elucidate the principles of test
validity and reliability, and offer hands-on opportunities for faculty to refine and perfect
their test creation skills.
4. Rollout of a Feedback-Driven Test Revision Mechanism: Cultivate a culture of continuous
improvement by instituting a mechanism where teachers, students, and test creators
collaboratively review and provide feedback on test items. This iterative feedback loop
ensures the dynamism and relevance of questions while fostering a sense of collective
ownership over the quality of assessments.
5. The study recommends further research to establish the validation of departmental
examination questions as an important part of formulating a test item bank. The following
titles can be considered for Quantitative studies: (1) A Comparative Analysis of Item
Difficulty Index in Exam Questions for College of Teacher Education Students; (2) Validity
and Reliability of Multiple-Choice Questions for Education Students’ Exams; (3) An
Empirical Study of Item Discrimination Index and Item-Total Correlation in Exam
Questions for College of Teacher Education Students. For Qualitative studies: (1) Evaluating
the Content Validity of Examination Questions: A Qualitative Analysis of Expert Feedback;
(2) Exploring the Perceptions of Students regarding the Construct Validity of Examination
Questions; (3) A Qualitative Study on Item Analysis Techniques for Validating Examination
Questions.
REFERENCES
Abdalla, M. E. (2011). What does Item Analysis Tell Us? Factors affecting the reliability of
Multiple Choice Questions (MCQs). Gezira Journal of Health Sciences, 7(2), Article 2.
http://journals.uofg.edu.sd/index.php/gjhs/article/view/318
An, X., & Yung, Y.-F. (2014). Item Response Theory: What It Is and How You Can Use the IRT
Procedure to Apply It. 14.
Arokia Marie, S. M. J., & Edannur, S. (2015). Relevance of Item Analysis in Standardizing an
Achievement Test in Teaching of Physical Science in B.Ed Syllabus. I-Manager’s Journal
of Educational Technology, 12(3), 30–36. https://doi.org/10.26634/jet.12.3.3743
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. Longman.
Bryman, A. & Bell, E. (2007). “Business Research Methods,” 2nd edition. Oxford University
Press.
Classical Test Theory – Psychometric Tests. (n.d.). Retrieved November 9, 2022, from
https://www.psychometrictest.org.uk/classic-test-theory/
Classical Test Theory Assumptions, Equations, Limitations, and Item Analyses. (2005).
https://www.sagepub.com/sites/default/files/upm-binaries/4869_Kline_Chapter_5_Classical
_Test_Theory.pdf
Crocker, L. & Algina, J. (2008). Introduction to classical and modern test theory. Mason, OH:
Cengage Learning.
Fulcher, G., & Davidson, F. (2007). Language Testing and Assessment: An Advanced Resource
Book. Routledge.
Fraenkel, J. R. & Wallen, N. E. (2003). How to design and evaluate research in education (5th ed.).
New York, NY: McGraw-Hill Higher Education.
Hartoyo. (2011). Language assessment. Pelita Insani
Haladyna, T. M. (2004). Developing and Validating Multiple-Choice Test Items (3rd ed.). Lawrence
Erlbaum Associates Publisher.
Hingorjo, M. R., & Jaleel, F. (2012). Analysis of One-Best MCQs: The Difficulty Index,
Discrimination Index, and Distractor Efficiency. JPMA-Journal of the Pakistan Medical
Association, 62(2), 142–147.
Karim, S. A., Sudiro, S., & Sakinah, S. (2021). Utilizing test items analysis to examine the level of
difficulty and discriminating power in a teacher-made test. EduLite: Journal of English
Education, Literature and Culture, 6(2), 256. https://doi.org/10.30659/e.6.2.256-269
Kunandar, K. (2013). Penilaian Autentik: (Penilaian Hasil Belajar Peserta Didik Kurikulum 2013).
RajaGrafindo Persada.
Kline, T. J. (2005). Reliability of test scores and test items. SAGE Publications, Inc.,
https://dx.doi.org/10.4135/9781483385693
Linn, R. L. (2010). Educational Measurement: Overview. In P. Peterson, E. Baker, & B. McGaw
(Eds.), International Encyclopedia of Education (Third Edition) (pp. 45–49). Elsevier.
https://doi.org/10.1016/B978-0-08-044894-7.00243-8
Livingston, S. (2018). Test Reliability-Basic Concepts.
https://www.ets.org/Media/Research/pdf/RM-18-01.pdf
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA:
Addison-Wesley.
Maharani, A. V., & Putro, N. H. P. S. (2020). Item Analysis of English Final Semester Test.
Indonesian Journal of EFL and Linguistics, 5(2), 491.
https://doi.org/10.21462/ijefl.v5i2.302
Mardapi, D. (2015). Pengukuran, Penilaian, dan Evaluasi Pendidikan. Nuha Litera.
Matoba, B. (2020, August 6). Discriminatory item analysis: A test administrator’s best friend.
EMS1. https://www.ems1.com/ems-products/training-tools/articles/discriminatory-item-
analysis-a-test-administrators-best-friend-swTO4YwkCWH1ZXyD/
Miller, D. M., Linn, R. L., & Gronlund, N. E. (2009). Measurement and Assessment in Teaching
(10th ed.). Pearson Education.
Odukoya, J. A., Adekeye, O., Igbinoba, A. O., & Afolabi, A. (2018). Item analysis of university-
wide multiple choice objective examinations: The experience of a Nigerian private
university. Quality & Quantity, 52(3), 983–997. https://doi.org/10.1007/s11135-017-0499-2
Rosana, D., & Setyawarno, D. (2017). Statistik Terapan Untuk Penelitian Pendidikan. UNY Press.
Sabbot. (2013, May 15). Standardized Test Definition. The Glossary of Education Reform.
https://www.edglossary.org/standardized-test/
Smith, S. (2018). What is KR20? Platinum Educational Group. Retrieved from
https://platinumed.zendesk.com/hc/en-us/articles/214600006-What-is-KR20-
Tanquis, J., & Baluarte, S. (2019). Assessment of the Departmental Examination Outcomes of the
College of Computer Studies of the University of Cebu LapuLapu and Mandaue: Proposed
Intervention. Cebu Journal of Computer Studies, 2(1), 6-16.
Thompson, N. (2018, April 14). What is the Spearman-Brown Formula? - Psychometric analytics.
Assessment Systems; Assessment Systems. https://assess.com/spearman-brown-prediction-
formula/
Understanding Item Analyses. (2022). Office of Educational Assessment. Retrieved October 14,
2022, from https://www.washington.edu/assessment/scanning-scoring/scoring/reports/item-
analysis/