
Br. J. educ. Psychol., 52, 77-95, 1982

SEEQ: A RELIABLE, VALID, AND USEFUL INSTRUMENT FOR COLLECTING STUDENTS' EVALUATIONS OF UNIVERSITY TEACHING
BY H. W. MARSH
(Department of Education, The University of Sydney, Australia)
SUMMARY. This study describes research leading to the development and implemen-
tation of SEEQ (Students’ Evaluations of Educational Quality). SEEQ is an instrument
and programme for collecting students’ evaluations of college/university teaching. The
paper indicates that SEEQ measures nine distinct components of teaching effectiveness
that have been identified in both student ratings and faculty self evaluations of their own
teaching. Reliability is good when based upon 10 to 15 or more student responses.
The ratings have successfully been validated against the retrospective ratings of former
students, student learning as measured by objective examination, affective course
consequences, and staff self evaluations of their own teaching effectiveness. Suspected
sources of bias to the ratings have been shown to have little impact. Feedback from
student ratings, particularly when coupled with a candid discussion with an external
consultant, produced improvement in both subsequent ratings and student learning.

INTRODUCTION
THE purpose of this review is to summarise research that led to the development of
SEEQ (Students’ Evaluations of Educational Quality). SEEQ is an instrument and a
programme for collecting students’ evaluations of college/university teaching.
Research presented in this review is described in greater detail in a series of technical
reports and publications. This research, in addition to guiding SEEQ’s development,
has also provided an academic credibility that is essential in winning faculty support.
It is hoped that this review may serve as both a model and encouragement to academic
units seeking to implement or improve systematic programmes of students’ evaluations.
Research and development on the first SEEQ, which is substantially similar to the
current version, was conducted at the University of California, Los Angeles (UCLA).
This effort began with a Task Force on the Evaluation of Teaching that examined
evaluation practices at UCLA and other universities, and made recommendations that
included the development of a campus-wide programme of students’ evaluations of
teaching. Based upon current practices, interviews with students and faculty, and a
review of the evaluation literature, an extensive item pool was developed. The work
done by Hildebrand et al. (1971) at the University of California, Davis was particularly
important in developing this pool of items. Several different pilot surveys, each
consisting of 50-75 items, were administered to classes in different academic depart-
ments. Students, in addition to making ratings, were asked to indicate the items they
felt were most important in describing the quality of teaching. Similarly, staff were
asked to indicate the items they felt would provide them with the most useful feedback
about their teaching. Students’ open-ended comments were reviewed to determine if
important aspects had been excluded. Factor analysis identified the dimensions
underlying the student ratings, and the items that best measured each. Reliability
coefficients were compiled for each of the evaluation items. Finally, after several
revisions, four criteria were used to select items to be included on the UCLA version
of SEEQ. These were: (1) student ratings of item importance, (2) staff ratings of item
usefulness, (3) factor analysis, and (4) item reliabilities. During the last 6 years over
500,000 of these forms have been completed by UCLA students from more than 50
academic departments in over 20,000 courses. The results of the evaluations are
returned to faculty as feedback about their teaching, are used in tenure/promotion
decisions and are published for students to use in the selection of courses.
The current version of SEEQ (see Appendix 1) was developed at the University of
Southern California (USC). A preliminary version of the instrument was adopted on
a trial basis by the Division of Social Sciences, pending the outcome of research on the
instrument. On the basis of much of the research summarised in this review, the
current form was unanimously endorsed by the Dean and Department Chairpersons
in the Division, and its use required in all Social Science courses. The programme was
later adopted by other academic units at USC, and over 250,000 SEEQ forms have
been completed by USC students over the last 4 years.

METHOD
Description of the instrument
The SEEQ survey form is presented in Appendix 1. The two-sided evaluation
instrument is self-explanatory, easily administered, and computer scorable. The form
strives for a compromise between uniformity and flexibility. The standardised ques-
tions used to evaluate all courses measure separate components of instructional
effectiveness that have been identified with factor analysis. Provision for supplemental
questions at the bottom of the printed form allows the individual instructor or aca-
demic unit to design items unique to specific needs. Space for students' comments on
open-ended questions is provided on the back of the form.
A sample of the two-page summary report prepared for each course is presented
in Appendix 2 (the actual report appears on 8.5 inch x 15 inch computer paper). The
summary report, along with the completed surveys that contain students' open-ended
comments, is returned to the instructor. Copies of the report are also sent to the
Department Chairperson and/or the Dean of the particular academic unit. The data
upon which the report is based are permanently stored in a computer archive system
by the Office of Institutional Studies, the central office that processes the forms. In the
report, the evaluation factor scores, the overall summary ratings, and demographic/
background items are presented on page 1, while the separate rating items appear on
page 2. Each item is summarised by a frequency distribution of student responses, the
mean, the standard error, and the percentile rank that shows how the mean rating
compares with other courses. A graphic representation of the percentile rank is also
shown. If any supplemental questions were used, a summary of these responses
appears on a third page.
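As an illustration of the per-item statistics described above, the following Python sketch computes a frequency distribution, mean, and standard error for a single rating item. The function name, the invented class of ratings, and the assumption of a 1-5 response scale (taken from the form in Appendix 1) are illustrative only and are not part of the SEEQ processing software.

```python
import numpy as np

def item_summary(responses, scale=(1, 2, 3, 4, 5)):
    """Summarise one rating item: frequency distribution, mean, standard error.

    `responses` is a list of integer ratings from one class; blank (missing)
    responses are assumed to have been dropped before calling this function.
    """
    r = np.asarray(responses, dtype=float)
    freq = {point: int(np.sum(r == point)) for point in scale}
    mean = r.mean()
    # Standard error of the mean: sample standard deviation over sqrt(n).
    se = r.std(ddof=1) / np.sqrt(len(r))
    return freq, mean, se

# Hypothetical class of 43 students rating one item on the 1-5 scale.
ratings = [5] * 18 + [4] * 17 + [3] * 6 + [2] * 2
print(item_summary(ratings))
```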
The normative comparisons provided in the summary report (the percentile ranks)
play an important role in the interpretation of the ratings. First, students are
universally quite generous in their evaluations of teaching. The average overall course
and instructor ratings are typically about 4.0 on a one-to-five scale. Second, some
items receive higher responses than do others: overall instructor ratings are almost
always higher than overall course ratings. Finally, comparisons are made between
instructors teaching courses at similar levels (i.e., there are separate norms for graduate
level courses, undergraduate level courses taught by faculty members, and courses
taught by teaching assistants). Academic units at USC (e.g., the 10 departments in the
Division of Social Sciences) are given the option of using university-wide norms or
norms based upon ratings from just their unit. However, ratings are only ranked
against norms containing at least 200 courses.
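A percentile rank of the kind reported on the summary form could be computed along the following lines. The course mean and the simulated norm-group means are invented for illustration; the paper does not describe the actual computation used by the Office of Institutional Studies.

```python
import numpy as np

def percentile_rank(course_mean, norm_means):
    """Percentage of courses in the norm group whose mean rating falls below
    the given course mean (ties counted as half)."""
    norm = np.asarray(norm_means, dtype=float)
    below = np.sum(norm < course_mean)
    ties = np.sum(norm == course_mean)
    return 100.0 * (below + 0.5 * ties) / len(norm)

# Hypothetical norm group of 250 undergraduate courses (at least 200 required).
rng = np.random.default_rng(0)
norms = rng.normal(loc=4.0, scale=0.4, size=250).clip(1, 5)
print(round(percentile_rank(4.44, norms)))  # e.g., an overall course rating of 4.44
```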
A longitudinal summary report, summarising all the available courses ever taught
by each instructor, is also produced annually. The report contains means and percen-
tile ranks for the evaluation factor scores, the overall summary ratings, and selected
background/demographic items. This information is presented separately for each
course, and is averaged across all graduate level courses and across all undergraduate
courses. Courses that were evaluated by fewer than 10 students or by less than 50
per cent of the enrolled students are not included in the longitudinal averages. Impor-
tant information can be gained from examining this report, beyond the convenience of
having a summary of all the ratings for each teacher. The longitudinal average is not
unduly affected by a chance occurrence in any one course offering, and it reflects
teaching effectiveness in the range of courses that are likely to be taught by a particular
instructor. The change in ratings over time provides a measure of instructional
improvement. Furthermore, this summary provides a basis for determining the
classes in which an individual teacher is most effective.
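A minimal sketch of how the exclusion rule for the longitudinal averages might be applied is given below; the course records and field names are hypothetical.

```python
def longitudinal_average(courses):
    """Average an instructor's ratings across courses, excluding any course
    evaluated by fewer than 10 students or by less than 50 per cent of those
    enrolled (the exclusion rule described in the text)."""
    eligible = [c for c in courses
                if c["n_responses"] >= 10
                and c["n_responses"] / c["n_enrolled"] >= 0.5]
    if not eligible:
        return None
    return sum(c["overall_rating"] for c in eligible) / len(eligible)

# Hypothetical record of three course offerings by one instructor.
history = [
    {"overall_rating": 4.4, "n_responses": 43, "n_enrolled": 47},
    {"overall_rating": 3.9, "n_responses": 8,  "n_enrolled": 30},   # excluded: fewer than 10
    {"overall_rating": 4.1, "n_responses": 25, "n_enrolled": 60},   # excluded: under 50 per cent
]
print(longitudinal_average(history))  # averages only the first course
```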
In addition to the individual and longitudinal summary reports, other studies and
special analyses are performed at the request of the Dean and/or Chairpersons.
These include requests as diverse as using previous ratings for a particular course as a
baseline against which to compare ratings after an innovative change, a determination
of the trend over time in ratings of all courses within a given academic department, and
the use of supplemental questions to query students about their preferences in class
scheduling.

RESULTS
Factor analysis
Factor analysis is used to describe the different components of teaching effective-
ness actually being measured by a set of questions. Its use is particularly important
in the development of student evaluation instruments, since it provides a safeguard
against a 'halo effect', a generalisation from some subjective feeling about the teacher
which affects ratings of all the questions. To the extent that all the items are con-
taminated by this halo effect, they will all blend together and not be distinguished as the
separate components of teaching effectiveness that the evaluation form was designed
to measure.
A well-developed factor structure is also important to the interpretation of the
student ratings. Broad global ratings averaged across a collection of heterogeneous
items provide little diagnostic feedback and are difficult to interpret. For example,
Marsh, Overall and Kesler (1979b) showed that while large classes did tend to receive
lower ratings when averaged across all items, this effect was limited almost entirely
to the Group Interaction and Individual Rapport factors. Similarly, an interview
with a student about an earlier version of the evaluation form indicated that she had
given an instructor lower ratings on several more or less randomly selected items
because there were no items where she could express her sentiment that "the examin-
ations were terrible". Even if particular components of teaching effectiveness seem
less important to a particular instructor (or academic unit), their exclusion may make
other ratings more difficult to interpret.
SEEQ measures nine evaluation factors (see Table 1). Marsh (Marsh and Overall,
1979b; Marsh, in press) presented a factor analysis of student ratings that confirmed
the nine factors SEEQ was designed to measure, and these findings have been replicated
in different academic disciplines and in different academic years. Even more convinc-
ing support came from a study in which faculty in 329 classes were asked to evaluate
their own teaching effectiveness with the same SEEQ form that was used by their
students. Separate factor analyses of the student ratings and the instructor self-
evaluation both demonstrated the same nine evaluation factors that had previously
been identified (see Table 1). More recently the same nine factors have been identified
in ratings collected at the University of Sydney, Australia (Marsh, 1981a). These
analyses illustrate the replicability of the rating factors and their generalisability
across different populations of students and different methods of evaluation.
TABLE 1
FACTOR ANALYSES OF STUDENTS' EVALUATIONS OF TEACHING EFFECTIVENESS AND THE CORRESPONDING STAFF SELF EVALUATIONS OF THEIR OWN TEACHING IN ALL 329 COURSES (SELF-EVALUATION LOADINGS IN BRACKETS)

[The table reports factor pattern loadings for the paraphrased evaluation items, grouped under the nine factors; the individual loading values are not legible in this reproduction. The items are:
I Learning/Value: course challenging/stimulating; learned something valuable; increased subject interest; learned/understood subject matter; overall course rating.
II Enthusiasm: enthusiastic about teaching; dynamic and energetic; enhanced presentations with humour; teaching style held your interest; overall instructor rating.
III Organisation: instructor explanations clear; course materials prepared and clear; objectives stated and pursued; lectures facilitated note taking.
IV Group Interaction: encouraged class discussions; students shared ideas/knowledge; encouraged questions and answers; encouraged expression of ideas.
V Individual Rapport: friendly towards students; welcomed seeking help/advice; interested in individual students; accessible to individual students.
VI Breadth of Coverage: contrasted implications; gave background of ideas/concepts; gave different points of view; discussed current developments.
VII Examinations/Grading: examination feedback valuable; evaluation methods fair/appropriate; tested emphasised course content.
VIII Assignments: readings/texts valuable; added to course understanding.
IX Workload/Difficulty: course difficulty (easy-hard); course workload (light-heavy); course pace (too slow-too fast); hours/week outside of class.]

NOTE: Factor loadings in boxes are the loadings for items designed to measure each factor. All loadings are presented without decimal points. Factor analyses of student ratings and instructor self ratings (loadings in parentheses) consisted of a principal components analysis, Kaiser normalisation, and rotation to a direct oblimin criterion. The first nine unrotated factors for the instructor self ratings had eigenvalues of 9.5, 2.9, 2.5, 2.2, 2.0, 1.4, 1.3, 1.1 and 1.0, and accounted for 68 per cent of the variance. For the student ratings, the first nine eigenvalues were 19.9, 3.3, 2.3, 1.5, 1.2, 0.9, 0.7, 0.6 and 0.5, and accounted for [value illegible] per cent of the variance. The analyses were performed with the commercially available SPSS routine (see Nie et al., 1975).
Factor scores derived from the results of factor analytic research are an important
part of the summaries of the student ratings described earlier. Research described in
this section is presented in more detail in Marsh, Overall and Kesler (1979a), Marsh
and Overall (1979b), Marsh and Cooper (1981) and Marsh (in press). Further
discussion of this issue is presented in Marsh (1980b).
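The factor analyses summarised above were run with the SPSS routines of the day (principal components, Kaiser normalisation, direct oblimin rotation). A roughly comparable analysis can be sketched in Python as follows; the use of the third-party factor_analyzer package and the simulated data matrix are assumptions for illustration, not the original procedure.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer  # assumed third-party package

# Hypothetical matrix of class-average ratings: 329 classes x 35 rating items.
# Real SEEQ data would yield nine interpretable factors; random data will not.
rng = np.random.default_rng(1)
ratings = rng.normal(loc=4.0, scale=0.5, size=(329, 35))

# Extract nine factors with an oblique (oblimin) rotation, broadly analogous
# to the principal components / direct oblimin analysis described in Table 1.
fa = FactorAnalyzer(n_factors=9, rotation="oblimin")
fa.fit(ratings)

loadings = fa.loadings_                 # 35 x 9 pattern loadings
eigenvalues, _ = fa.get_eigenvalues()   # eigenvalues of the correlation matrix
print(loadings.shape, eigenvalues[:9].round(2))
```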
Reliability
Reliability refers to the relative lack of random error in student ratings, and is a
necessary prerequisite for any measurement device. Reliability is assessed by deter-
mining the consistency or stability of a measure. According to one conceptualisation
of reliability called the intraclass correlation, a reliable item is one in which there is
agreement among ratings within each class, but consistent differences between the
ratings of different classes. A similar approach would be to take a random half of the
students’ ratings from each of a large number of classes and to correlate their ratings
with those of the remaining students. The reliability of a given item depends more on
the number of students responding than on the actual item content. The average
reliability of SEEQ items is about 0.90 when based upon 25 students, but falls to 0.74
when based upon only 10 responses and is even lower for fewer responses.
An alternative determination of reliability, called coefficient alpha, considers the
relative agreement among different items designed to measure the same factor. This
approach does not include disagreement among students within the same class as a
source of unreliability, and probably results in an inflated estimate of reliability. The
coefficient alphas for the different evaluation factors in SEEQ vary between 0.88 and
0.97.
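The two notions of reliability discussed here can be illustrated with a short sketch; the simulated data and the functions below are illustrative of the split-half and coefficient alpha approaches rather than the procedures actually used in preparing the SEEQ reports.

```python
import numpy as np

def split_half_class_reliability(classes, rng):
    """Correlate random-half class means with remaining-half class means
    across many classes (the split-half approach described in the text)."""
    half_a, half_b = [], []
    for ratings in classes:
        perm = rng.permutation(len(ratings))
        mid = len(ratings) // 2
        half_a.append(np.mean(ratings[perm[:mid]]))
        half_b.append(np.mean(ratings[perm[mid:]]))
    return np.corrcoef(half_a, half_b)[0, 1]

def coefficient_alpha(items):
    """Cronbach's alpha for a classes-by-items matrix of ratings."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Simulated example: 100 classes of 25 students rating one item, where classes
# differ systematically (a class effect) plus within-class student noise.
rng = np.random.default_rng(2)
class_effects = rng.normal(4.0, 0.4, size=100)
classes = [np.clip(rng.normal(m, 0.7, size=25), 1, 5) for m in class_effects]
print(round(split_half_class_reliability(classes, rng), 2))

# Simulated 100 classes x 4 items loading on a single factor.
factor = rng.normal(0, 0.4, size=(100, 1))
items = 4.0 + factor + rng.normal(0, 0.2, size=(100, 4))
print(round(coefficient_alpha(items), 2))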
As a consequence of this research, a caution appears on any summary report that
is based upon fewer than 10 responses. Similarly, these courses are not included in
the computation of the longitudinal averages. Data on the reliability of SEEQ items
and factors are presented in Marsh and Overall (1979b).

Long-term stability
A common criticism directed at student ratings is that students do not have an
adequate perspective to recognise the value of instruction at the end of a class. Accor-
ding to this argument, students will only recognise the value of teaching after being
called upon to apply the course materials in further coursework and after graduation.
A rather unique opportunity to test this notion arose at a California State University
which had adopted an earlier version of SEEQ. Undergraduate and graduate students
in the school of management evaluated teaching effectiveness at the end of each course.
However, unlike most programmes, the forms were actually signed by the students,
allowing the identification of individual responses. One year after graduation from
the programme (and several years after taking a course) the same students were
again asked to make ‘ retrospective ratings ’ of teaching effectiveness in each course,
using a subset of the original items. Since all evaluations were signed, the end-of-term
ratings could be matched with the retrospective ratings. Over a several-year period of
time, matched sets of ratings-both end-of-term and retrospective-were collected for
students in 100 classes. Analysis of the two sets of ratings showed remarkable agree-
ment. The average correlation (relative agreement) between end-of-term and
retrospective ratings was 0.83. Mean differences between the two sets of ratings
(absolute agreement) were small; the median rating was 6.63 for retrospective ratings and
6.61 for end-of-term ratings. Separate analysis showed these results to be consistent at
both the graduate and undergraduate levels, and across different course types.
This research is described in more detail in Marsh and Overall (1979a, 1981) and
Overall and Marsh (1980). In related research, Marsh (1977) showed that responses
from graduating seniors were similar to the ratings of current students.
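With signed, matched ratings the stability analysis reduces to a correlation (relative agreement) and a mean difference (absolute agreement). A minimal sketch follows, assuming the matched end-of-term and retrospective ratings are available as arrays; the numbers shown are invented.

```python
import numpy as np

def stability(end_of_term, retrospective):
    """Correlation and mean difference between matched end-of-term and
    retrospective ratings from the same students."""
    e = np.asarray(end_of_term, dtype=float)
    r = np.asarray(retrospective, dtype=float)
    return np.corrcoef(e, r)[0, 1], float(np.mean(r - e))

# Hypothetical matched ratings from former students of one class.
end = [7, 6, 8, 5, 7, 6, 9, 7]
retro = [7, 6, 7, 5, 8, 6, 9, 6]
print(stability(end, retro))
```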
Validity-student learning
Student ratings, one measure of teaching effectiveness, are difficult to validate
since there is no universal criterion of effective teaching. Consequently, using an
approach called construct validation, student ratings have been related to other
measures that are assumed to be indicative of effective teaching. If two measures that
are supposed to measure the same thing show agreement, there is evidence that both
are valid. Clearly this approach requires that many alternative validity criteria be used.
Within this framework, evidence of the long-term stability of student ratings can be
interpreted as a validity measure. However, the most commonly used criterion has
been student learning as measured by performance on a standardised exam-
ination.
Methodological problems require a special setting for this research. Ideally,
there are many sections (i.e., different lecture groups that are part of the same course)
of a large multi-section course in which students are randomly assigned or at least
enroll without knowledge of who will be teaching the section. Each section of the
course should be taught by a separate teacher, but the course outline, textbooks,
course objectives, and most importantly the final examination, should be developed by
a course director who does not actually lecture to the students. In two separate studies
applying this methodology, it was found that the sections that evaluate teaching most
favourably during the last week of classes also perform best on the standardised examin-
ation given to all sections the following week. Since students did not know who would
be teaching different sections at the time of registration, and sections did not differ on a
pretest administered at the start of the term, these findings provide good support for
the validity of student ratings.
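In the multi-section design the unit of analysis is the section: section-average ratings are correlated with section-average performance on the common examination. A sketch with invented section data follows; the variable names and effect sizes are illustrative assumptions.

```python
import numpy as np

# Hypothetical data for 20 sections of one multi-section course: each section's
# mean end-of-term instructor rating and mean score on the common final
# examination set by the course director.
rng = np.random.default_rng(3)
teaching_quality = rng.normal(0, 1, size=20)
section_rating = 4.0 + 0.3 * teaching_quality + rng.normal(0, 0.15, size=20)
section_exam = 70 + 5.0 * teaching_quality + rng.normal(0, 3.0, size=20)

# Section-level correlation between ratings and examination performance.
validity_r = np.corrcoef(section_rating, section_exam)[0, 1]
print(round(validity_r, 2))
```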
In the second of these studies a set of affective variables was also considered as a
validity criterion. Since the course was an introduction to computer programming,
these included such variables as feelings of course mastery, plans to apply the skills that
were gained from the course, plans to pursue the subject further, and determination of
whether or not students had joined the local computer club. In each case, more fav-
ourable responses to these items were correlated with more favourable evaluations
of the teacher.
These two studies are described in more detail in Marsh, Fleiner and Thomas
(1975) and Marsh and Overall (1980). Similar findings, using this same methodology,
are presented in Frey et al. (1975), Centra (1977), in studies reviewed by McKeachie
(1979) and Marsh (1980b), and in a meta-analysis by Cohen (1980).

Validity-faculty self evaluations


Validity research such as that described above, while supporting the use of
student ratings, has generally been limited to a specialised setting (e.g., large multi-
section courses) or has employed criteria (e.g., student retrospective ratings) that are
unlikely to convince sceptics. Thus, faculty members will continue to question the
usefulness of student ratings until validity criteria that are both convincing and
applicable across a wide range of courses are utilised. Staff self evaluations of their
own teaching is one criterion that meets both these requirements. Furthermore,
instructors can be asked to evaluate their own teaching along the same dimensions
employed in the student rating form, thereby testing the specific validity of the different
rating factors. In two different studies faculty members were asked to evaluate their
own teaching with the same evaluation form used by their students, as well as to
provide background/demographic information and to express their attitudes toward
the evaluation of teaching. A letter from the Dean of the Division was also sent that
encouraged participation and guaranteed confidentiality.
A majority of the faculty (59 per cent) indicated that some measure of teaching
effectiveness should be given more emphasis in promotional decisions. Faculty mem-
bers clearly agreed that student ratings were useful to the faculty themselves as feed-
back, and a majority of them even agreed that the ratings should be made publicly
available for students to use in course selection. However, they were more sceptical
about the accuracy of the student ratings and even more critical of using classroom
visitation or faculty self evaluations in promotional decisions; they were somewhat
less critical about colleague examination of course outlines, reading lists, and class-
room examinations. Faculty also indicated a number of potential biases that they felt
would substantially affect student ratings. The most frequently mentioned were
Course Difficulty, Grading Leniency, Instructor Popularity, and Student Interest in
the Subject Before Taking the Course. A dilemma clearly exists. Faculty are
concerned about teaching effectiveness, even to the extent of wanting it to play a more
important role in their own promotions. However, many expressed doubts about
each of the proposed measures of teaching effectiveness-including student ratings.
Before the potential usefulness of the student ratings can be realised, faculty and
administrators have to be convinced that student ratings are valid.
In the first study, only undergraduate courses taught by faculty were considered.
Despite their reservations about the validity of the student ratings, there was con-
siderable student-faculty agreement in the evaluations of teaching effectiveness.
Validity coefficients, the correlation between student and faculty ratings on the same
factors, were statistically significant for all evaluation factors (median r = 0.49).
Absolute agreement was also assessed by examining the mean differences between
student and faculty self evaluations. Across all the evaluation items the median
rating was the same for both students and faculty (4.07), and few differences in either
direction reached statistical significance.
In the second study, the same general findings were replicated with a larger
sample (329 classes) that included graduate level courses and courses taught by
teaching assistants (see Table 2). Student evaluations correlated with instructor self
evaluations in courses taught by teaching assistants (r = 0.46), in undergraduate
courses taught by faculty (r = 0.41), and even in graduate level courses (r = 0.39),
demonstrating their validity at all levels of teaching. Furthermore, a multitrait-
multimethod analysis (Campbell and Fiske, 1959) also provided evidence for the
distinctiveness of each of the rating factors. For example, if a single 'generalised
rating factor' underlies both student and instructor ratings, then agreement on any
particular factor might be a function of this generalised agreement and not have
anything to do with the specific content of the factor being considered. However, if
this were the case, the correlations between student and instructor ratings on different
factors should be nearly as high as correlations between ratings on the same
factors.
In fact, while correlations between student and instructor ratings on the same
factors were high (median r = 0.45), correlations between their ratings on different
factors were low (median r = 0.02). This argues for the distinctiveness of the different
evaluation factors and for the use of multifactor evaluation instruments that have been
developed with the use of factor analytic techniques. The findings of these two studies
provide further evidence for the validity of the student ratings, suggest the possible
usefulness of faculty self evaluations, and should be particularly helpful in reassuring
lecturers about the accuracy of the student ratings.
The results of the original study appear in Marsh, Overall and Kesler (1979a),
while the findings of the second study are presented in Marsh and Overall (1979b),
Marsh and Cooper (1981) and Marsh (in press).
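The multitrait-multimethod logic can be made concrete with a small sketch: convergent validities are the student-instructor correlations on the same factor, and discriminant evidence comes from the lower correlations on different factors. The data below are simulated; the actual coefficients are reported in Table 2.

```python
import numpy as np

def mtmm_summary(student, instructor):
    """Median same-factor (convergent) and different-factor (discriminant)
    correlations between student and instructor ratings.

    `student` and `instructor` are classes-by-factors matrices."""
    n_factors = student.shape[1]
    cross = np.corrcoef(student.T, instructor.T)[:n_factors, n_factors:]
    same = np.diag(cross)
    diff = cross[~np.eye(n_factors, dtype=bool)]
    return np.median(same), np.median(diff)

# Simulated 329 classes x 9 factors, with factor-specific agreement between
# students and instructors and no generalised halo linking different factors.
rng = np.random.default_rng(4)
true_scores = rng.normal(0, 1, size=(329, 9))
student = true_scores + rng.normal(0, 1.0, size=(329, 9))
instructor = true_scores + rng.normal(0, 1.0, size=(329, 9))
print(mtmm_summary(student, instructor))
```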
TABLE 2
MULTITRAIT-MULTIMETHOD MATRIX: CORRELATIONS BETWEEN STUDENT AND FACULTY SELF EVALUATIONS IN ALL 329 COURSES

[The matrix relates the nine instructor self-evaluation factors to the nine student evaluation factors (Learning/Value, Enthusiasm, Organisation, Group Interaction, Individual Rapport, Breadth, Examinations, Assignments, Workload/Difficulty). The upper left triangular matrix contains correlations among the instructor self-evaluation factors, the lower right triangular matrix contains correlations among the student evaluation factors, and the lower left square matrix contains the correlations between instructor and student factors. The individual coefficients are not legible in this reproduction.]

NOTE: Values in the diagonals of the upper left and lower right matrices, the two triangular matrices, are reliability (coefficient alpha) coefficients (see Nie et al., 1977). Values in the diagonal of the lower left matrix, the square matrix, are convergent validity coefficients that have been corrected for unreliability according to the Spearman-Brown equation. The nine uncorrected validity coefficients, starting with Learning, would be 0.41, 0.48, 0.25, 0.46, 0.25, 0.37, 0.13, 0.36, and 0.54. All correlation coefficients are presented without decimal points. Correlations greater than 0.10 are statistically significant.
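The table note refers to a correction of the convergent validity coefficients for unreliability. The conventional form of such a correction, shown below, divides the observed correlation by the square root of the product of the two reliabilities; whether this is precisely the computation behind Table 2 is an assumption, and the reliabilities used in the example are invented.

```python
import math

def corrected_for_unreliability(r_xy, alpha_x, alpha_y):
    """Disattenuate an observed correlation using the reliabilities
    (coefficient alpha) of the two measures."""
    return r_xy / math.sqrt(alpha_x * alpha_y)

# Illustration with made-up reliabilities; the uncorrected validity
# coefficients in the table range from 0.13 to 0.54.
print(round(corrected_for_unreliability(0.41, 0.95, 0.70), 2))
```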
Relationship with student, course and instructor characteristics
It is often feared that variables unrelated to teaching excellence may affect student
ratings, and the harshest critics even suggest that faculty can ' buy ' favourable
ratings by teaching only small courses, giving high grades, and requiring little work by
students. While these attitudes are probably not held by a majority of the faculty,
results cited earlier suggest that many do feel that student ratings are biased. The
study of possible biases is complicated by a number of problems. First is the question
of how large a relationship must be before it is considered practically significant.
Second is the problem of how to interpret a relationship even if it is substantial.
There are generally several alternative explanations, and a bias may not be the most likely. For
example, the positive relationship between student ratings and student learning
supports the validity of the ratings, and it is unreasonable to say that student ratings
are biased by student learning. While the question is complex, the first step is to
determine which variables are substantially related to student ratings.
The relationship between student evaluations of 511 courses and a set of 16
student/course/instructor characteristics was examined. The set of background vari-
ables included such things as Class Size, GPA, Teacher Rank, Reason for Taking
the Course, Class Level, Year in School, Expected Grade, Workload/Difficulty, and
Prior Student Interest in the Subject. Separately, each background variable generally
explained less than 5 per cent of the variance in any of the student evaluation factors,
and there was little indication of non-linearity (see Table 3). The only variable that
consistently demonstrated non-linearity was Class Size-the smallest and largest
classes tended to be rated most favourably. Several multivariate techniques showed
that 12-14 per cent of the variance in the student ratings could be explained by the
entire set of background variables. Three background variables were most influential:
more favourable ratings were correlated with higher Prior Subject Interest, higher
Expected Grades, and higher levels of Workload/Difficulty. A path analysis showed
that Prior Subject Interest was most important, and also accounted for one-third of
the relationship between Expected Grades and ratings.
These results show that even the combined effect of the entire set of background
variables has only a small impact on student ratings, but indicated that three of these
background variables were most influential-Workload/Difficulty, Prior Subject
Interest, and Expected Grades. Although Workload/Difficulty is often suggested as a
potential bias, the relationship found in this study was the opposite of the suggested
bias. Harder, more difficult courses that require more time outside of class receive
more favourable ratings.
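The multivariate analysis described here amounts to regressing class-average ratings on the set of background variables and examining the variance explained. A minimal sketch with simulated data follows; numpy's least-squares routine is used for convenience in place of the original SPSS stepwise procedure, and the variable names and effect sizes are assumptions.

```python
import numpy as np

def variance_explained(X, y):
    """R-squared from an ordinary least-squares regression of y on X."""
    X1 = np.column_stack([np.ones(len(X)), X])          # add intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

# Simulated 511 course averages with 16 background variables, three of which
# (standing in for prior interest, expected grade, workload/difficulty) have
# small effects on the overall rating.
rng = np.random.default_rng(5)
background = rng.normal(0, 1, size=(511, 16))
ratings = (4.0 + 0.25 * background[:, 0] + 0.15 * background[:, 1]
           + 0.15 * background[:, 2] + rng.normal(0, 1.0, size=511))
print(round(variance_explained(background, ratings), 2))   # modest R-squared
```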
Prior Subject Interest, the variable with the largest impact on ratings, was
examined in greater detail in a separate study. A similar pattern of correlations was
found between Prior Subject Interest and student ratings collected at both UCLA
(using the earlier version of SEEQ) and USC. Prior Subject Interest was most highly
correlated with ratings of Learning/Value in both settings. The relationship between
Prior Subject Interest and instructor self evaluations was also explored in that study.
Prior Subject Interest, measured by both student and instructor perceptions,
showed a similar pattern of correlations with both student ratings and instructor
self evaluations. In particular, Prior Subject Interest was most highly correlated to
both student and instructor ratings of Learning/Value. These findings argue that
lecturers actually are more effective at teaching when working with motivated students,
and that this more effective teaching is accurately reflected in the student ratings.
The relationship between student ratings and Expected Grades is subject to
several alternative interpretations. First, the Expected Grade effect was reduced by
one-third by controlling for Prior Subject Interest. The best explanation is that
Prior Subject Interest caused both better grades and a better educational experience.
TABLE 3
CORRELATIONS BETWEEN 16 BACKGROUND VARIABLES AND 11 STUDENT EVALUATION SCORES (N = 511 COURSE AVERAGES)

[The table reports the correlation between each background variable (Prior Subject Interest; Workload/Difficulty; Expected Grade; five variables describing the Reason for Taking the Course; Course Level; the percentage composition of the class by year in school; average Year in School; Overall GPA (prior); % Division Majors; Enrolment; and Teacher Rank) and each evaluation score (Overall Course, Overall Instructor, Learning, Enthusiasm, Organisation, Group Interaction, Individual Rapport, Breadth, Examinations, Assignments, and Workload/Difficulty), together with the Multiple R squared (percentage of variance explained) for each evaluation score. The individual coefficients are not legible in this reproduction.]

NOTE: Correlations are presented without decimal points. Correlations in bold figures indicate background variables which account for at least 5 per cent of the variance in a particular evaluation score. The value of Multiple R squared is based upon the combined effect of the subset of background variables that is most highly correlated with the evaluation score. This was determined with a step-wise multiple regression in which a new background variable was added at each step until no additional variable could increase Multiple R squared by as much as 1 per cent. The Multiple R squared was then corrected for the number of variables in the equation. Relationships marked in the table showed substantial non-linearity (i.e., quadratic and/or cubic components add at least 1 per cent to the variance explained by the linear relationship, and the total variance explained by all components was at least 5 per cent).
According to this interpretation, part of the Expected Grade relationship with student
ratings is spurious. Second, the Expected Grade relationship can only be considered a
bias if higher grades reflect ‘ easy grading ’ on the part of the teacher. If the higher
grades reflect better student achievement, then the Expected Grade relationship may
support the validity of the student ratings, i.e., better ratings are associated with more
student learning. At least two facts support this interpretation. First, Prior Subject
Interest is related to Expected Grades and it is more reasonable to assume that it
affects student achievement rather than the instructor’s grading standards. Second,
lecturers’ self evaluations of their own grading standards showed little correlation
with student ratings. In reality, Expected Grades probably reflect some unknown com-
bination of both ‘ easy grading ’ and student achievement. However, even if Expected
Grades do represent a real bias to the student ratings, their effect is not substantial.
These studies show that none of the suspected biases to student ratings seems
actually to have much impact. Similar findings have been reported by Remmers
(1963), Hildebrand et al. (1971), McKeachie (1979), and Marsh (1980a). Neverthe-
less, as a consequence of this research, summary reports describing student evaluations
also include mean responses and percentile ranks for Prior Subject Interest and
Expected Grades (see Appendix 2). This research is described in greater detail in
Marsh (1978, 1980b). Separate studies have examined the relationship between
student ratings and: (1) Expected Grades (Marsh, Overall and Thomas, 1976), (2)
Class Size (Marsh, Overall and Kesler, 1979b), and (3) Prior Subject Interest (Marsh
and Cooper, 1981). In related research, Marsh (1981b; Marsh and Overall, 1981)
demonstrated that student ratings are primarily a function of the instructor doing the
teaching, and not the particular course or the level at which it is taught.
Instructional improvement-feedback from student ratings
There is ample reason to believe that a carefully planned programme of instruc-
tional evaluation instituted on a broad basis will lead to the improvement of teaching.
Teachers, particularly those who are most critical of the student ratings, will have to
give more serious consideration to their own teaching in order to consider the merits
of an evaluation programme. The institution of the programme and the clear
endorsement by the administrative hierarchy will give notice that quality of teaching is
being taken more seriously, an observation that both students and faculty will be
likely to make. The results of the student ratings, as one measure of teaching effective-
ness, will provide a basis for administrative decisions and thereby increase the likeli-
hood that quality teaching will be recognised and rewarded. The social reinforcement
of getting favourable ratings will provide added incentive for the improvement of
teaching, even at the tenured faculty level. Finally, the diagnostic feedback from the
student ratings may provide a basis for instructional improvement. As described
earlier, teaching staff at USC indicate that student ratings are useful in the improve-
ment of a course and/or the quality of their teaching: 80 per cent said that they were
potentially useful while 59 per cent said they actually had been useful. However, this
suggestion is more difficult to demonstrate empirically.
In two different studies the effect of feedback from midterm evaluations on end-
of-course criteria was tested. Both these studies were conducted with the multi-
section course in computer programming described earlier. In the first study, students
completed an abbreviated version of the student evaluation instrument at mid-term,
and the results were returned to a random half of the instructors. At the end of the
term, student ratings of “ perceived change in instruction between the beginning of
the term and the end of the term ” were significantly higher for the feedback group, as
were ratings on two of the seven evaluation factors. Ratings on the overall course and
instructor summary items did not differ, nor did student performance on the standar-
dised final examination given to all students.
Several changes were made in the second study that was based upon 30 classes.
First, mid-term evaluations were made on the same evaluation form that was used at
the end of the course. Second, the researchers actually met with the group of ran-
domly selected feedback instructors to discuss the ratings. At this meeting the
teachers discussed the evaluations with each other and with the researchers, but were
assured that their comments would remain confidential. A third change was the
addition of affective variables, items that focused on application of the subject matter
and student plans to pursue the subject. At the end of the term, students of the
feedback instructors: (1) rated teaching effectiveness more favourably, (2) averaged
higher scores on the standardised final examination, and (3) experienced more positive
affective outcomes than students whose instructors received no feedback. Students in
the feedback group were similar to the other students in terms of both pretest achieve-
ment scores completed at the start of the term and the midterm evaluations of their
teachers. These findings suggest that the feedback from student ratings, coupled with
a frank discussion of their implications with an external consultant, can be an effective
intervention for improving teaching effectiveness.
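The feedback experiments compare end-of-course outcomes for instructors randomly assigned to receive midterm feedback against those who received none. A sketch of that comparison with simulated section means follows; scipy's t-test is used for convenience, and the group means are invented rather than taken from the cited articles.

```python
import numpy as np
from scipy import stats

# Simulated section-average final-examination scores for 30 sections, half
# randomly assigned to receive midterm feedback plus consultation.
rng = np.random.default_rng(6)
feedback = rng.normal(74, 4, size=15)      # hypothetical feedback group
control = rng.normal(70, 4, size=15)       # hypothetical no-feedback group

t, p = stats.ttest_ind(feedback, control)
print(round(feedback.mean() - control.mean(), 1), round(t, 2), round(p, 3))
```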
The details of these studies have been described in two published articles (Marsh,
Fleiner and Thomas, 1975; Overall and Marsh, 1979). Similar findings have been
reported by McKeachie et al. (1980) and a meta-analysis by Cohen (1981).
In summary, research described in this study has indicated that:
(1) SEEQ measures nine distinct components of teaching effectiveness as demons-
trated by factor analysis. Factor analysis of faculty evaluations of their own teaching
resulted in the same factors. Factor scores based upon this research are used to
summarise the student ratings that are returned to faculty.
(2) Student evaluations are quite reliable when based upon the responses of 10 to
15 or more students. Class ratings based upon fewer than ten student responses
should be interpreted carefully.
(3) The retrospective ratings of former students agree remarkably well with the
evaluations that they made at the end of a course.
(4) Student evaluations show moderate correlations with student learning as
measured by a standardised examination and with affective course consequences such
as application of the subject matter and plans to pursue the subject further.
(5) Faculty self evaluations of their own teaching show good agreement with
student ratings.
(6) Suspected sources of bias to student ratings have little impact.
(7) Feedback from student ratings, particularly when coupled with a candid
discussion with an external consultant, can lead to improved teaching.

ACKNOWLEDGMENTS.-Grateful acknowledgments are extended to John Schutz, Joseph
Kertes, and Robert Linnell of the University of Southern California, and Raymond Orbach,
James Trent, Robert Pace, and Leon Levine of the University of California, Los Angeles, who
supported the student evaluation programme at their universities. Thanks are also extended
to each of the co-authors of studies that led to the development of SEEQ, and particularly to
Jesse Overall. The author was employed as Director of the Evaluation of Instruction Pro-
gram at UCLA (1972-74) and as Head of Evaluation Services at USC (1976-80) during the
time this research was being conducted. An earlier version of this paper was presented at
the annual meeting of the Australian Association for Research in Education, Sydney,
Australia, November, 1980. Requests for reprints should be sent to Herbert W. Marsh,
Department of Education, University of Sydney, Sydney, NSW 2006, Australia. The SEEQ
survey is copyrighted by Dr. Herbert W. Marsh, but interested parties can obtain permission
to use it by writing to the author.
REFERENCES
CAMPBELL, D. T., and FISKE, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol. Bull., 56, 81-105.
CENTRA, J. A. (1977). Student ratings of instruction and their relationship to student learning. Am. educ. Res. J., 14, 17-24.
COHEN, P. A. (1980). Effectiveness of student-rating feedback for improving college instruction: a meta-analysis. Res. higher Educ., 13, 321-341.
COHEN, P. A. (1981). Student ratings of instruction and student achievement: a meta-analysis of multisection validity studies. Rev. educ. Res., 51, 281-309.
FREY, P. W., LEONARD, D. W., and BEATTY, W. W. (1975). Student ratings of instruction: validation research. Am. educ. Res. J., 12, 327-336.
HILDEBRAND, M., WILSON, R. C., and DIENST, E. R. (1971). Evaluating University Teaching. Berkeley: Center for Research and Development in Higher Education, University of California, Berkeley.
MARSH, H. W. (1977). The validity of students' evaluations: classroom evaluations of instructors independently nominated as best and worst teachers by graduating seniors. Am. educ. Res. J., 14, 441-447.
MARSH, H. W. (1978). Students' Evaluations of Instructional Effectiveness: Relationship to Student, Course, and Instructor Characteristics. Paper presented at the Annual Meeting of the American Educational Research Association, Toronto. (ERIC Document Reproduction Service No. ED 155 217).
MARSH, H. W. (1980a). The influence of student, course and instructor characteristics on evaluations of university teaching. Am. educ. Res. J., 17, 219-237.
MARSH, H. W. (1980b). Research on students' evaluations of teaching effectiveness. Instruct. Eval., 4, 5-13.
MARSH, H. W. (1981a). Students' evaluations of tertiary instruction: testing the applicability of American surveys in an Australian setting. Aust. J. Educ., 25, 177-192.
MARSH, H. W. (1981b). The use of path analysis to estimate teacher and course effects in student ratings of instructional effectiveness. Appl. Psychol. Meas. (in press).
MARSH, H. W. (in press). Validity of students' evaluations of college teaching: a multitrait-multimethod analysis. J. educ. Psychol.
MARSH, H. W., and COOPER, T. L. (1981). Prior subject interest, students' evaluations, and instructional effectiveness. Multiv. behav. Res., 16, 82-104.
MARSH, H. W., FLEINER, H., and THOMAS, C. S. (1975). Validity and usefulness of student evaluations of instructional quality. J. educ. Psychol., 67, 833-839.
MARSH, H. W., and OVERALL, J. U. (1979a). Long-term stability of students' evaluations: a note on Feldman's "Consistency and variability among college students in rating their teachers and courses". Res. higher Educ., 10, 139-147.
MARSH, H. W., and OVERALL, J. U. (1979b). Validity of Students' Evaluations of Teaching: A Comparison with Instructor Self Evaluations by Teaching Assistants, Undergraduate Faculty and Graduate Faculty. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco. (ERIC Document Reproduction Service No. ED 177 205).
MARSH, H. W., and OVERALL, J. U. (1980). Validity of students' evaluations of teaching effectiveness: cognitive and affective criteria. J. educ. Psychol., 72, 468-475.
MARSH, H. W., and OVERALL, J. U. (1981). The relative influence of course level, course type, and instructor on students' evaluations of college teaching. Am. educ. Res. J., 18, 103-112.
MARSH, H. W., OVERALL, J. U., and KESLER, S. P. (1979a). Validity of student evaluations of instructional effectiveness: a comparison of faculty self-evaluations and evaluations by their students. J. educ. Psychol., 71, 149-160.
MARSH, H. W., OVERALL, J. U., and KESLER, S. P. (1979b). Class size, students' evaluations, and instructional effectiveness. Am. educ. Res. J., 16, 57-70.
MARSH, H. W., OVERALL, J. U., and THOMAS, C. S. (1976). The Relationship Between Students' Evaluation of Instruction and Expected Grade. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco. (ERIC Document Reproduction Service No. ED 126 140).
MCKEACHIE, W. J. (1979). Student ratings of faculty: a reprise. Academe, 384-397.
MCKEACHIE, W. J., LIN, Y-G., DAUGHERTY, M., MOFFETT, M. M., NEIGLER, C., NORK, J., WALZ, M., and BALDWIN, R. (1980). Using student ratings and consultation to improve instruction. Br. J. educ. Psychol., 50, 168-174.
NIE, N. H., HULL, C. H., JENKINS, J. G., STEINBRENNER, K., and BENT, D. H. (1975). Statistical Package for the Social Sciences. New York: McGraw-Hill.
NIE, N. H., HULL, C. H., JENKINS, J. G., STEINBRENNER, K., and BENT, D. H. (1977). Update to Statistical Package for the Social Sciences. New York: McGraw-Hill.
OVERALL, J. U., and MARSH, H. W. (1979). Midterm feedback from students: its relationship to instructional improvement and students' cognitive and affective outcomes. J. educ. Psychol., 71, 856-865.
OVERALL, J. U., and MARSH, H. W. (1980). Students' evaluations of instruction: a longitudinal study of their stability. J. educ. Psychol., 72, 321-325.
REMMERS, H. H. (1963). Teaching methods in research on teaching. In GAGE, N. L. (Ed.), Handbook of Research on Teaching. Chicago: Rand McNally.
(Manuscript received 29th December, 1980)

APPENDIX 1-Information Contained in the SEEQ Survey*


INSTRUCTIONS
This evaluation form is intended to measure your reactions to this instructor and course. Results will
be reported to the Department Chairperson to be used as part of the overall evaluation of the in-
structor. These evaluations will have budgetary and promotional implications so please take them
seriously. When you have finished, a designated student will pick up the evaluations and take them
to the Department Chairperson. Your responses will remain anonymous and the summaries will not
be given to the instructor until after final grades have been assigned.
As a description of this Course/Instructor, this statement is:
(select the best response for each of the following statements, Very Moder- Very
leaving a response blank only if it is clearly not relevant) poor Poor ate Good good
1 Learning: You found the course intellectually challenging
and stimulating 1 2 3 4 5
2 You have learned something which you
consider valuable 1 2 3 4 5
3 Your interest in the subject has increased as a
consequence of this course 1 2 3 4 5
4 You have learned and understood the subject
materials in this course 1 2 3 4 5
5 Enthusiasm: Instructor was enthusiastic about teaching
the course 1 2 3 4 5
6 Instructor was dynamic and energetic in
conducting the course 1 2 3 4 5
7 Instructor enhanced presentations with the use
of humour 1 2 3 4 5
8 Instructor’s style of presentation held your
interest during class 1 2 3 4 5
9 Organisation: Instructor’s explanations were clear 1 2 3 4 5
10 Course materials were well prepared and
carefully explained 1 2 3 4 5
11 Proposed objectives agreed with those actually
taught so you knew where course was going 1 2 3 4 5
12 Instructor gave lectures that facilitated taking
notes 1 2 3 4 5
13 Group Interaction: Students were encouraged to partici-
pate in class discussions 1 2 3 4 5
14 Students were invited to share their ideas and
knowledge 1 2 3 4 5
15 Students were encouraged to ask questions and
were given meaningful answers 1 2 3 4 5
16 Students were encouraged to express their own
ideas and/or question the instructor 1 2 3 4 5
17 Individual Rapport: Instructor was friendly towards
individual students 1 2 3 4 5
18 Instructor made students feel welcome in
seeking help/advice in or outside of class 1 2 3 4 5
19 Instructor had a genuine interest in individual
students 1 2 3 4 5
20 Instructor was adequately accessible to
students during office hours or after class 1 2 3 4 5
APPENDIX 1-continued
Very Moder- Very
poor Poor ate Good good
21 Breadth: Instructor contrasted the implications of
various theories 1 2 3 4 5
22 Instructor presented the background or origin
of ideas/concepts developed in class 1 2 3 4 5
23 Instructor presented points of view other than
his/her own when appropriate 1 2 3 4 5
24 Instructor adequately discussed current devel-
opments in the field 1 2 3 4 5
25 Examinations: Feedback on examinations/graded materi-
als was valuable 1 2 3 4 5
26 Methods of evaluating student work were fair
and appropriate 1 2 3 4 5
27 Examinations/graded materials tested course
content as emphasised by the instructor 1 2 3 4 5
28 Assignments: Required readings/texts were valuable 1 2 3 4 5
29 Readings, homework, etc. contributed to
appreciation and understanding of subject 1 2 3 4 5
30 Overall: How does this course compare with other
courses you have had at University of Southern
California (USC)? 1 2 3 4 5
31 How does this instructor compare with other
instructors you have had at USC? 1 2 3 4 5
Student and Course Characteristics
(Leave blank if no response applies)
32 Course difficulty, relative to other courses, was:
1. very easy ... 3. medium ... 5. very hard 1 2 3 4 5
33 Course workload, relative to other courses, was:
1. very light ... 3. medium ... 5. very heavy 1 2 3 4 5
34 Course pace was: 1. too slow ... 3. about right ...
5. too fast 1 2 3 4 5
35 Hours/week required outside of class: 1. 0 to 2;
2. 2 to 5; 3. 5 to 7; 4. 8 to 12; 5. over 12 1 2 3 4 5
36 Level of interest in the subject prior to this course:
1. very low ... 3. medium ... 5. very high 1 2 3 4 5
37 Overall Grade Point Average at USC: 1. below 2.5;
2. 2.5 to 3.0; 3. 3.0 to 3.4; 4. 3.5 to 3.7; 5. above 3.7.
Leave blank if not yet established at USC. 1 2 3 4 5
38 Expected grade in the course: 1. F, 2. D, 3. C, 4. B, 5. A F D C B A
39 Reason for taking the course: 1. major require; 2.
major elective; 3. general ed require; 4. minor/related
field; 5. general interest only. Select the one which is
best. 1 2 3 4 5
40 Year in school: 1. Freshman; 2. Sophomore; 3. Junior;
4. Senior; 5. Postgraduate 1 2 3 4 5
41 Major department: 1. Soc Sci/Comm; 2. Nat Sci/Math;
3. Humanities; 4. Business; 5. Education; 6. Engin- 1 2 3 4 5
eering; 7. Perf Arts; 8. Pub Affairs; 9. Other; 10.
Undeclared/undecided 6 7 8 9 10
* The material presented here represents some information extracted from the two-sided survey.
Page 1 of the computer scannable form actually contains these 41 items and provision for responses to
Supplemental Questions that can be devised by the instructor or academic unit. Page 2 contains
the instructions, and room for comments to three open-ended questions: (1) Please indicate the
important characteristics of this instructor/course which have been most valuable to your learning
experience; (2) Please indicate characteristics of this instructor/course which you felt are most im-
portant for him/her to work on improving (particularly aspects not covered by the rating items);
and (3) Please use the additional space to clarify any of your responses or to make other comments.
The SEEQ survey is copyrighted by Dr. Herbert W. Marsh.
APPENDIX 2-Information contained in the Summary Report returned to individual staff and their department chairpersons?
Instructor: Doe, John Class Schedule Number: 99999 Page 1 of 2
Department: Sample Department Term: Spring 78 Number of Students completing evaluations: 43
Course: Sample Dept 999 Percentage of enrolled students completing evaluations: 92%
Student and Course Characteristics:
For each question the percentage of students making each response and the mean average response (if appropriate) is presented. (These statistics
are based upon the number of students actually responding to the item.) In addition the percentage of students who completed the evaluation
form but did not respond to a particular question is indicated by the " No Response " percentage.
Prior Interest:     1. Very low 9%;  2. 9%;  3. Medium 42%;  4. 12%;  5. Very high 28%.  No response: 0%.  Mean: 3.39
Overall G.P.A.:     1. Below 2.5 0%;  2. 2.5-3.0 27%;  3. 3.0-3.4 39%;  4. 3.4-3.7 12%;  5. Above 3.7 22%.  No response: 5%.  Mean: 3.28
Expected Grade:     0. F 0%;  1. D 0%;  2. C 14%;  3. B 62%;  4. A 24%.  No response: 2%.  Mean: 3.09
Reason in Class:    1. Maj reqrd 44%;  2. Maj elect 37%;  3. Gen Ed req 2%;  4. Min/Reltd 2%;  5. Gen intrst 14%.  No response: 0%.
Year in School:     1. Freshman 9%;  2. Sophomore 35%;  3. Junior 33%;  4. Senior 23%;  5. Graduate 0%.  No response: 0%.  Mean: 2.69
Major Department:   2. Nat Sci 2%;  3. Humnties 9%;  4. Business 2%;  5. Education 0%;  6. Engineer 0%;  7. Perf Art 0%;  8. Pub Affr 0%;  9. Other 12%;  0. Undec 2%
Course Difficulty:  1. Very easy 2%;  2. 2%;  3. Medium 40%;  4. 40%;  5. Very hard 16%.  No response: 0%.  Mean: 3.64
Course Workload:    1. Very light 0%;  2. 0%;  3. Medium 71%;  4. 24%;  5. Very heavy 5%.  No response: 2%.  Mean: 3.32
Course Pace:        1. Too slow 2%;  2. 0%;  3. Right 71%;  4. 19%;  5. Too fast 7%.  No response: 2%.  Mean: 3.28
Outside Hrs/Wk:     1. 0 to 2 0%;  2. 2 to 5 55%;  3. 5 to 7 32%;  4. 8 to 12 13%;  5. Over 12 0%.  No response: 7%.  Mean: 2.56
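The report describes these per-item statistics only in words. Purely as an illustration (and not the actual SEEQ data-processing program), the following Python sketch computes the quantities named above for a single item: the percentage of respondents choosing each point on the scale, the "No Response" percentage among completed forms, and the mean of those who answered. The sample responses are hypothetical values chosen to roughly match the Prior Interest column of the sample report.

```python
# Illustrative sketch only -- not the SEEQ processing program.
# Summarises one rating item: percentage choosing each response,
# "No Response" percentage, and the mean of students who answered.

def summarise_item(responses, scale=(1, 2, 3, 4, 5)):
    """responses: one entry per completed form; None marks a blank item."""
    answered = [r for r in responses if r is not None]
    pct = {k: 100.0 * answered.count(k) / len(answered) for k in scale}
    no_response_pct = 100.0 * (len(responses) - len(answered)) / len(responses)
    mean = sum(answered) / len(answered)
    return pct, no_response_pct, mean

# Hypothetical "Prior Interest" responses from 43 completed forms.
prior_interest = [1] * 4 + [2] * 4 + [3] * 18 + [4] * 5 + [5] * 12
pct, no_resp, mean = summarise_item(prior_interest)
print(pct, no_resp, round(mean, 2))   # mean comes out at about 3.40
```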
Summary evaluation scores: Nine evaluation factor scores, two overall rating items, and two student/course characteristic items.
The nine evaluation factor scores are weighted averages of separate rating items and have a mean average (across all USC courses) of 50. For all scores, the Standard Error (SE) is a measure of the reliability. It is smaller (more reliable) when larger numbers of students are responding and when there is greater agreement among the students completing the evaluations. Differences of less than one standard error are too small to be reliably interpreted. In general, evaluations based upon fewer than 10 students' responses or less than 50 per cent of the students enrolled in the class should be interpreted cautiously. The percentile ranks (which may vary between 0 and 100) and the corresponding graphs show how your ratings compare with other courses in your comparison group. Higher percentile ranks and more stars indicate higher ratings. Your comparison group is:
Undergraduate courses not taught by teaching assistants
Evaluation factor scores    Mean    SE +/-    %Tile Rank    Graph of %Tile Rank (0 1 2 3 4 5 6 7 8 9)
Learning Valuable learning experience, was intellectually stimulating/challenging 58.2 1.745 80 *****************
Enthusiasm Instructor displayed enthusiasm, energy, humour and ability to hold interest 56.9 1.506 74 ***************
Organisation Organisation/clarity of explanations, course materials, objectives, lectures 61.3 1.726 90 *******************
Group Interact Students encouraged to discuss, participate, share ideas and ask questions 51.3 2.160 53 ***********
Indv. rapport Instructor accessible, friendly, and interested in students 48.7 2.362 43 *********
Breadth Presentation of broad backgrd, concepts and alternative approaches/theories 55.1 2.274 70 ***************
Examinations Student perceptions of value and fairness of exams/graded materials 50.9 2.188 55 ************
Assignments Value of assignments in adding appreciation/understanding to course 59.9 1.639 88 ******************
Workload/Diff Relative course workload, difficulty, pace, and outside hours required 51.5 1.515 60 *************
Overall summary items
Overall course How does this course compare with others at USC? (Question 30) 4.44 0.102 83 *****************
Overall Instr. How does this instructor compare with others at USC? (Question 31) 4.61 0.089 84 ****************
Important student/course characteristics
Level of interest in subject prior to this course (1. Very low ... 5. Very high) (Question 36) 3.39 0.191 49 **********
Expected grade in the course (0-F, 1-D, 2-C, 3-B, 4-A) (Question 38) 3.09 0.428 30 *******
Prepared by Dr. Herbert W. Marsh, Office of Institutional Studies, USC; L.A., CA 90007 Telephone: (213) 741-6503.
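Neither page of the report gives the formulas behind the class mean, the Standard Error, or the percentile rank. The sketch below is one plausible reading of them, under two assumptions that are mine rather than the paper's: the SE is taken as the sample standard deviation of the ratings divided by the square root of the number of raters (which behaves as the report describes, shrinking with more raters and with closer agreement), and the percentile rank is taken as the percentage of comparison-group course means falling below this class's mean. The factor scores themselves would additionally combine several items into a weighted average scaled to a USC-wide mean of 50, which is not reproduced here; all numbers in the example are hypothetical.

```python
# Illustrative sketch only -- assumed formulas, not those used by SEEQ.
import statistics

def class_mean_and_se(ratings):
    """Mean rating and an assumed standard error: s / sqrt(n)."""
    mean = statistics.mean(ratings)
    se = statistics.stdev(ratings) / len(ratings) ** 0.5
    return mean, se

def percentile_rank(score, comparison_scores):
    """Assumed definition: percentage of comparison courses scoring lower."""
    below = sum(1 for s in comparison_scores if s < score)
    return round(100.0 * below / len(comparison_scores))

# Hypothetical data: 43 students rating one item, plus course means from
# the comparison group (undergraduate courses not taught by TAs).
ratings = [5] * 23 + [4] * 15 + [3] * 5
other_course_means = [3.6, 3.9, 4.1, 4.2, 4.3, 4.5, 4.6, 4.7]

mean, se = class_mean_and_se(ratings)
print(round(mean, 2), round(se, 3), percentile_rank(mean, other_course_means))
```

Read under these assumptions, the report's rule of thumb simply says that two class means closer together than one such SE should not be treated as different.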

APPENDIX 2-continued
Instructor: Doe, John Class Schedule Number: 99999 Page 2 of 2
Department: Sample Department Term: Spring 78 Number of students completing evaluations: 43
Course: Sample Dept 999 Percentage of Enrolled Students completing evaluations: 92%
Evaluation items (Some questions have been abbreviated):
For each question, the percentage of students making each response, the mean average response, and the Standard Error (SE) of the responses are presented. (These statistics are based upon the actual number of students responding to the question.) In addition, the percentage of students who completed the evaluation form but did not respond to a particular question is indicated in the "No Resp" column. Differences in mean averages that are less than one standard error (see page one for a description) are too small to be reliably interpreted. In general, evaluations based upon fewer than 10 students' responses, evaluations based upon less than 50 per cent of the class, and evaluation items which were frequently left blank should be interpreted cautiously. The percentile ranks (which vary between 0 and 100) and the graphs show how your evaluations compare with other courses in your comparison group. (Higher percentile ranks and more stars indicate more favourable evaluations.) Your comparison group is:
Undergraduate courses not taught by teaching assistants
Columns for each item: percentage responding (Very poor, Poor, Medium, Good, Very good, No resp), followed by Mean, SE +/-, %Tile Rank, and Graph of %Tile Rank (0 1 2 3 4 5 6 7 8 9) relative to your comparison group (see above)
Learning
1. Course was intellectually challenging and stimulating 0 0 5 42 53 0 4.48 0.089 32 *****************
2. Learned something considered to be valuable 0 0 9 40 51 0 4.41 0.100 69 **************
3. Increased interest in subject as consequence of course 0 2 7 40 51 0 4.39 0.110 82 *****************
4. Learned and understood the subject materials 0 0 9 60 30 0 4.20 0.090 68 **************
Enthusiasm
5. Instructor was enthusiastic about teaching the course 0 0 2 26 72 0 4.69 0.077 84 *****************
6. Instructor was dynamic and energetic in conducting course 0 0 7 50 43 2 4.35 0.094 72 ***************
7. Instructor enhanced presentation with humour 0 5 33 36 26 2 3.82 0.135 42 *********
8. Instructor style of presentation held interest 0 7 7 53 33 0 4.11 0.125 71 ***************
Organisation
9. Instructor's explanations were clear 0 0 12 40 49 0 4.36 0.104 80 *****************
10. Course materials were well prepared and explained 0 0 7 34 59 5 4.50 0.099 88 ******************
11. Proposed objectives agreed with those actually taught 0 0 5 44 51 5 4.45 0.092 87 ******************
12. Lectures facilitated taking notes 0 2 9 28 60 0 4.46 0.116 90 *******************
Group interaction
13. Students encouraged to participate in class discussions 0 5 19 33 44 0 4.15 0.136 52 ***********
14. Students invited to share ideas and knowledge 0 2 21 40 37 0 4.11 0.125 48 **********
15. Students encouraged to ask questions and give answers 0 0 21 28 51 0 4.29 0.121 59 ************
16. Students encouraged to express own ideas 0 2 21 35 42 0 4.15 0.128 51 ***********
Individual rapport
17. Instructor was friendly towards individual students 0 5 17 45 32 7 4.04 0.133 26 ******
18. Instructor welcomed students to seek help/advice 0 7 29 29 36 2 3.92 0.149 32 *******
19. Instructor had genuine interest in individual students 0 2 25 40 32 7 4.01 0.130 48 **********
20. Instructor was accessible during office hours/after class 0 0 16 50 34 12 4.17 0.111 68 **************
Breadth
21. Instructor contrasted implications of various theories 0 2 19 36 43 2 4.18 0.128 67 **************
22. Instructor presented background of ideas/concepts 0 0 9 40 51 0 4.41 0.100 84 *****************
23. Instructor presented points of view other than own 0 2 19 45 33 2 4.09 0.121 53 ***********
24. Instructor discussed current developments in field 0 2 16 35 47 0 4.25 0.125 58 ************
Examinations
25. Feedback on exams/graded materials was valuable 2 12 31 40 14 2 3.51 0.148 38 ********
26. Method of evaluation was fair and appropriate 0 5 21 52 21 2 3.89 0.121 56 ************
27. Graded materials tested course content as emphasised 0 0 24 43 33 2 4.09 0.116 63 *************
Assignments
28. Required readings/texts were valuable 0 0 7 49 44 0 4.36 0.093 90 *******************
29. Assignments contributed to appreciation/understanding 0 0 10 48 43 2 4.32 0.099 83 *****************
Overall
30. How does this course compare with others at USC? 0 0 10 35 55 2 4.44 0.102 83 *****************
31. How does this instructor compare with others at USC? 0 0 5 29 67 2 4.61 0.089 84 *****************
† The actual Summary Report is a two-page computer printout that is produced as part of the data processing.
