sigcse2018_PythonVsCpp_Alzahrani
8   Constructor overloading                     7   Objs & Classes
9   Basic inheritance                           10  Inheritance
10  Derived class member override               10  Inheritance
11  Recursive function: Writing the base case   12  Recursion

3/10/2017 1:30:18 PM   0xxx   Yes   while (userNum >= 1){
                                        cout << userNum << " ";
                                        userNum = userNum / 2;
                                    }

As shown in Table 2, students can make multiple submissions for each CA. Each submission consists of a timestamp, a user id #, correctness, and the submitted code. To identify a struggling student for a CA, we do the following: (1) get the student’s total time spent and total number of attempts on that CA; (2) calculate the average time that the top 20% of students took to solve that CA, called “Baseline time”, and the average number of attempts that the top 20% of students needed, called “Baseline attempts”; (3) classify the student as struggling if the student’s total time spent is greater than 15 minutes, or if the student’s total time spent is greater than 5 minutes and greater than double the Baseline time, and the student’s total number of attempts is greater than 3 and greater than double the Baseline attempts. To calculate the struggle rate for a CA, we divide the number of struggling students for that CA by the total number of students for that CA.

We defined a dynamic struggle rate (by referring to the top 20% of students) rather than a static struggle rate to account for the class’s (students’) background level in programming. A CA’s struggle rate is defined as the # of struggling students divided by the # of students in that class. The following formulas summarize a struggling student and the struggle rate.

$$\text{Struggling student} = \big((\text{time} > 5\ \text{min}) \land (\text{time} > 2 \times \text{Baseline time}) \land (\#\ \text{attempts} > 3) \land (\#\ \text{attempts} > 2 \times \text{Baseline attempts})\big) \lor (\text{time} > 15\ \text{min})$$

$$\text{Struggle rate} = \frac{\#\ \text{struggling students}}{\#\ \text{students}}$$

Changing the parameters increases or decreases the struggle rate; we based these numbers on teaching experience. Other struggle-rate metrics are possible. For our purposes, the raw % is less important than the comparison of rates across languages.

4 RESULTS
We obtained anonymized student submission data for the 11 above-mentioned nearly-identical CAs, for C++ and Python courses at dozens of universities. We chose 11 C++ courses and 10 Python courses at 20 universities to represent a variety of institutions, including 4-year research institutions (none were schools typically ranked in the top 20), non-research 4-year institutions, and 2-year institutions (community colleges). To obtain roughly equal samples from both languages, we generally sought to match each C++ course with a Python course from an institution of the same type and with roughly the same number of students. Table 3 shows the number of students in these courses per language (C++ and Python). Such matching obviously can’t be perfect, but by attempting it, coupled with the large numbers of students, we can have more confidence that the two sample populations’ statistics can be meaningfully compared.

Table 4 shows the struggle rates for the 11 coding exercises, summarized for all 11 C++ courses and all 10 Python courses. For example, the first data row is for CA 1 (Coding Activity 1). 787 …
Table 3: Each row is two similar schools using different languages. A Python match for row 11 does not exist, but we kept the C++ offering to have data on no-prerequisite and non-major students for the C++ CAs.

Table 4: A comparison of struggle rates on 11 nearly-identical coding exercises for 11 C++ and 10 Python courses.

2 (Community colleges)      13   33
6 (Teaching universities)   48   35
Not shown: No prerequisite and non-CS majors (2 C++ and 3 Python courses)   < 0.0001   6%

Tables 4 and 5 show that Python students struggle more than C++ students. We want to account for the relative number of students per CA, because each struggle % is not based on a consistent number of students per CA. For example, in Table 5, CA 8 had 247 C++ students but only 32 Python students. We therefore converted the difference between the average C++ and Python Z-scores to a %, as follows. For each table, we:
1. Calculated the Z-score per CA: we used the mean and standard deviation of the C++ and Python struggle combined for that CA.
2. Calculated the p-value for the whole table: we used a Student’s t-test to compare the C++ Z-scored struggle to the Python Z-scored struggle.
3. Calculated the percentage of the average difference: we averaged the C++ Z-scores and, separately, the Python Z-scores, then used a Z-score-to-percentile calculator [7] to convert the difference in averages to a percentile.
The final step gives the % likelihood that a given student would struggle more with Python than with C++. Table 6 shows that a student is 12% more likely to struggle with Python than with C++.

6 MANUAL INVESTIGATION
The analysis above suggests that the Python learning curve, based on the metric of struggle rate on small coding exercises, is not easier than the C++ learning curve. In fact, the analysis suggests (perhaps surprisingly) that the learning curve is actually harder. To better understand why, we manually examined student submissions to many of the CAs. Because manual examination is very time …

Our analysis involved 1,927 students in courses across 20 universities. These large numbers and the diversity of populations are strengths of the data and likely minimize the impact of any one course’s policies or instructor’s teaching style. A controlled study at one university would also be useful, though it is hard to carry out, since such random assignment is rarely acceptable. So would a study at a university that switched from one language to another across semesters, though other factors, such as teacher and student population, may confound the results.

The Python/C++ textbooks use a standard approach. Other approaches, such as a media-based approach or objects-first ordering, may yield different results.

The perceived easier learning curve is just one reason some teachers have switched to Python. Other reasons exist, such as built-in libraries. Thus, the above data relate to just one factor among the many that influence a CS 1 language decision.

Possible reasons: This study analyzed struggle rates, not the reasons behind them. One possible reason that Python’s struggle rate is not lower than C++’s is that learning core programming concepts may overshadow syntax issues. The manual investigation of student submissions seemed to support this reason: few students struggled with syntax in either language. Instead, struggle was due to programming concepts, such as creating a proper loop to solve a task. It may be that college students can master the basic syntax of C++ nearly as quickly as the slightly easier syntax of Python. Also, C++ teachers can choose whether or not to dwell on C++ syntax. The textbook in this study avoids potentially problematic aspects of C++, such as branches/loops without braces (the book always uses braces), assignments in branch/loop expressions (the book avoids those), and use of the prefix/postfix increment operators (the book avoids them except in a for-loop header). It instead teaches a common and safer subset of C++.
Python’s struggle rate was surprisingly higher. One possible reason is that programming requires precision. From the beginning, C++ requires precise thought about variable declarations, variable types, data types resulting from expressions, use of braces, use of = vs. ==, etc. This precision may prime students to think more precisely about language-independent problem solving as well, such as writing loop expressions that iterate exactly as desired. Python’s forgivingness might breed a more cavalier attitude that extends beyond syntax/semantics into problem solving. This is, of course, just conjecture; future work may seek to test the idea.
We note that cloud-based programming is reducing the difference between languages, eliminating (or postponing) the need to install or even use an IDE.

Table 7: CAs 1, 2, 7, and 10 for all data and the reasons why students struggled, as determined by manual investigation.

CA 1
  C++:    1-Using / instead of *
          2-Missing tan() for the angleElevation variable
          3-Mistyping variable names
          4-Wrong assignment (using two = symbols, assigning to the wrong variable, reversed assignment, etc.)
  Python: 1-Using tan() instead of math.tan()
          2-Using / instead of *
          3-Missing tan() for the angleElevation variable
          4-Mistyping variable names

CA 2
  C++:    1-Wrong loop condition
          2-Wrong/missing loop counter update
          3-Missing/wrong location of output stmt
  Python: 1-Wrong loop condition
          2-Wrong/missing loop counter update
          3-Missing/wrong location of output statement
          4-Indentation (few: just 5 students)

CA 7
  C++:    1-Missing for-loop
          2-Missing counter inside the for-loop
          3-Wrong for-loop counter initial value
          4-Wrong for-loop condition
          5-Wrong for-loop location
          6-Wrong cout() arg inside for-loop
  Python: 1-Missing for-loop
          2-Wrong while-loop update (when using while-loop)
          3-Wrong for-loop condition
          4-Wrong print() argument inside the for-loop
          5-Missing for-loop condition variable inside the for-loop
          6-Wrong for-loop location

CA 10
  C++:    1-Missing ;
          2-Wrong cout() argument
          3-Incomplete cout() arguments
          4-Wrong location to call a member function
          5-Missing call to a member function
          6-Wrong format to call a member function
          7-Missing function def
  Python: 1-Missing function definition
          2-Missing call to member function
          3-Missing argument to call a member function
          4-Extra space when calling print()
          5-Wrong call to a member function
          6-Wrong print() statement
          7-Wrong function argument

8 CONCLUSIONS
One factor leading teachers to use Python in CS 1 courses is the belief that Python has an easier learning curve. We analyzed struggle rates for 11 nearly-identical short coding exercises in 11 C++ and 10 Python courses, involving about 1,000 students in each language at 20 universities. We found that the Python struggle rate was not lower than the C++ rate. One possible reason is that the languages’ syntax differences are eclipsed by the difficulty of learning language-independent programming concepts, especially if C++ teachers don’t dwell on C++’s complex syntax options.

In fact, our analysis showed Python’s struggle rate to be significantly higher than C++’s. One possible reason is that C++’s focus on precision translates into a more precise approach to programming. As for attrition, at our institution we have found that a caring, talented instructor, good class design and policies, and good assignments (an appropriate homework/assignment points ratio, various help resources, encouragement of collaboration, flipped lectures, interesting/relevant assignments) seem far more important than the language choice. In our most recent offering of CS 1 in C++, for example, student evaluations were in the 95th percentile of all courses at our university of 30,000 students, while students performed strongly on programming assignments and exams.

In any case, the analysis might help CS 1 teachers predict whether switching from C++ (or C or Java) to Python would yield the desired benefit of an easier learning curve. We encourage the community to perform more such analyses, so that teachers can be guided by data when making language decisions for CS 1 courses.

REFERENCES
[1] Richard J. Enbody, William F. Punch, and Mark McCullen. 2009. Python CS1 as preparation for C++ CS2. ACM SIGCSE Bulletin 41, 1 (2009), 116-120.
[2] Richard J. Enbody and William F. Punch. 2010. Performance of Python CS1 students in mid-level non-Python CS courses. In Proceedings of the 41st ACM Technical Symposium on Computer Science Education. ACM, 520-523.
[3] Michael H. Goldwasser and David Letscher. 2008. Teaching an object-oriented CS1 -: with Python. ACM SIGCSE Bulletin 40, 3 (2008), 42-46.
[4] Lutz Prechelt. 2003. Are scripting languages any good? A validation of Perl, Python, Rexx, and Tcl against C, C++, and Java. Advances in Computers 57 (2003), 205-270.
[5] John M. Zelle. 1999. Python as a first language. In Proceedings of the 13th Annual Midwest Computer Conference, Vol. 2. 145.
[6] Philip Guo. 2014. Python is now the most popular introductory teaching language at top U.S. universities. BLOG@CACM (July 2014), 47.
[7] Measuring U. Z-Score to Percentile Calculator. https://measuringu.com/pcalcz/. Accessed Aug. 2017.
[8] zyBooks. https://www.zybooks.com/. Accessed Aug. 2017.