Teaching Monte Carlo Simulation With Python

This article describes teaching Monte Carlo simulation to business students using Python. The authors first taught simulation using Google Sheets to familiarize students with the concept before introducing Python. They developed a series of simulation assignments completed first in Sheets then in Python. The goal was to support learning statistical computing for students unfamiliar with programming but familiar with spreadsheets. Teaching simulation with Python provides students with skills relevant for data analysis careers and aligns with recommendations to incorporate computing concepts and hands-on modeling experiences into statistics education.


Journal of Statistics and Data Science Education

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/ujse21

Teaching Monte Carlo Simulation with Python

Justin O. Holman & Allie Hacherl

To cite this article: Justin O. Holman & Allie Hacherl (2022): Teaching Monte Carlo Simulation with
Python, Journal of Statistics and Data Science Education, DOI: 10.1080/26939169.2022.2111008

To link to this article: https://doi.org/10.1080/26939169.2022.2111008

© 2022 The Author(s). Published with license by Taylor and Francis Group, LLC.

Published online: 11 Oct 2022.

JOURNAL OF STATISTICS AND DATA SCIENCE EDUCATION
2022, VOL. 00, NO. 0, 1–12
https://doi.org/10.1080/26939169.2022.2111008

Teaching Monte Carlo Simulation with Python


Justin O. Holman (a) and Allie Hacherl (b)
(a) Hasan School of Business, Colorado State University Pueblo, Pueblo, CO; (b) Judson University, Elgin, IL

ABSTRACT

It has become increasingly important for future business professionals to understand statistical computing methods as data science has gained widespread use in contemporary organizational decision processes in recent years. Used by scores of academics and practitioners in a variety of fields, Monte Carlo simulation is one of the most broadly applicable statistical computing methods. This article describes efforts to teach Monte Carlo simulation using Python. A series of simulation assignments are completed first in Google Sheets, as described in a previous article. Then, the same simulation assignments are completed in Python, as detailed in this article. This pedagogical strategy appears to support student learning for those who are unfamiliar with statistical computing but familiar with the use of spreadsheets. Supplementary materials for this article are available online.

ARTICLE HISTORY
Received 16 October 2021
Accepted 31 July 2022

KEYWORDS
Statistical Computing; Python programming; Spreadsheets

1. Introduction

Two semesters of applied statistics are required for undergraduates pursuing a business degree at the corresponding author’s institution, which is an Association to Advance Collegiate Schools of Business (AACSB) accredited School of Business within a regional state university. Previously, these business statistics courses would have included applied data analysis using commercially produced software programs in the curriculum. However, the field of applied statistics has moved toward free and open-source statistical computing platforms, making commercial programs less pertinent. In an effort to make statistics as relevant as possible to the future business careers our students will pursue, I, the corresponding author, began teaching a statistical computing module using Python as part of the second-semester statistics course. Initially, I incorporated statistical computing with Python into the curriculum with two free learning platforms, DataCamp.com and Repl.it, and evaluated their effectiveness in teaching programming skills to undergraduate students with limited, if any, programming backgrounds (Holman 2018). A deeper analysis of the decision to use DataCamp’s resources can be found in Holman (2018), in which I conclude that “I still recommend DataCamp as a good source of material for training data scientists but I do not suggest relying on the interactive exercises as a reliable indication or measurement of student comprehension” (109). In the next iteration of the course, I introduced Monte Carlo simulation with spreadsheets as a pedagogical tool to better support student learning of a new programming language (Holman 2019). In this article, the coauthor and I describe the sequence of Python programming exercises used to teach Monte Carlo simulation and analyze their effectiveness.

2. Literature Review

In recent years, demand has increased dramatically for programming and data analysis skills (Manyika et al. 2011; Davenport and Patil 2012). As a result, AACSB-accredited Business Schools have begun offering targeted coursework and programs in business analytics and data science (Brunner and Kim 2016; Zhao and Zhao 2016). An awareness of the importance of quantitative literacy for a variety of disciplines, particularly in probability and statistics, has risen in the past two decades as well (Hallett 2003). Accordingly, instructional strategies for data science courses have received recent attention in both statistical and pedagogical literature. A consistent theme is the need to give students opportunities to practice and apply programming techniques and languages like Python to realistic problems, both in and out of the classroom (Rufinus and Kortsarts 2006; Dichev et al. 2016; Woodard and Lee 2021). Similarly, statisticians have started to advocate for the inclusion of computing concepts in undergraduate statistics curricula (Nolan and Temple Lang 2010; Donoho 2015). This shift began in the early 2010s with a call from the American Statistical Association (ASA) to improve undergraduate curricula, to more effectively prepare both statistics students and interdisciplinary students for their future careers (Horton and Hardin 2015). In their revised Guidelines for Assessment and Instruction in Statistics Education (GAISE) College report released in 2016, the ASA added two additional emphases to the recommendation “teach students statistical thinking,” pushing for that to be in the context of investigative processes of problem-solving and decision-making and to include multivariable thinking (GAISE College Group 2016, 6). Additionally, the GAISE recommendations include “gain[ing] experience with how statistical models

CONTACT Justin O. Holman, Hasan School of Business, Colorado State University Pueblo, 2200 Bonforte Blvd., Pueblo, CO 81001-4901.
Supplementary materials for this article are available online. Please go to www.tandfonline.com/ujse.
© 2022 The Author(s). Published with license by Taylor and Francis Group, LLC.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The moral rights of the named author(s) have been asserted.

are used” and “interpret[ing] and draw[ing] conclusions from standard output from statistical software packages” (GAISE College Group 2016, 8). Simulation and statistical modeling meet these criteria and, as a result, entire statistics courses have been framed around simulation exercises (Nolan and Speed 1999; Nolan and Temple Lang 2003; Horton, Brown, and Qian 2004; Foster and Stine 2006; McLoughlin 2008; Horton 2013). Many of these simulation exercises operate along the same lines of thinking as Batanero, Tauber, and Sánchez in arguing that students develop an understanding of mathematical and statistical concepts by solving problems related to that concept (Batanero, Tauber, and Sánchez 2004). Such simulations can also encourage them to make real-life connections through examining risk in the world of financial investment (Foster and Stine 2006) and allow them to work with real data, defined by Gould (2010) as “associated with real people and places and collected to solve and answer a specific and pressing questions” (5). Both of these examples demonstrate how integrating simulation exercises into statistics courses allows further opportunities to meet the guidelines and recommendations set by the American Statistical Association by making the coursework more relevant and accessible to students. Additionally, similar arguments in favor of providing students opportunities for contextualized experience with statistical simulation in order to better prepare them for the workforce have been made in fields outside of business analytics, including engineering and chemistry (Barba, Wickenheiser, and Watkins 2017; McCluskey et al. 2019). These additional cross-discipline studies demonstrate an overall shift in the world of statistical computing to emphasize the real-world application of such skills and better prepare students for the reality of the workforce.

In this article, we will build on these pedagogical strategies by introducing a scaffolded Monte Carlo simulation as an exercise to learn Python. Research shows that this mathematical tool, which is used to estimate outcomes in cases of risk or uncertainty, has incredibly broad applications in statistical computing. The Monte Carlo method was introduced by Metropolis and Ulam (1949) and has been applied extensively in many fields including biology (Manly 1997), physics (Strawderman 2001), engineering (Arie 2000), and finance (Glasserman 2004). It has also been applied through Python programming to research in a variety of disciplines, ranging from MRI data to earth science (Karssenberg, de Jong, and van der Kwast 2007; Kerkelä et al. 2020). Though Monte Carlo simulation is not usually included in a business statistics curriculum, it has proved to be useful pedagogy for teaching production and operations management (Usher 2008; Hayes 2008), economics (Becker and Greene 2001; Craft 2003), finance (Carver 2013) and more traditional business statistics topics including Sampling Distributions and Hypothesis Testing (Weltman 2015, 2017). Finally, recent research argues that modeling and simulation are constructive to students’ statistical learning because they provide opportunities to diagnose unexpected outcomes (Peng et al. 2021), a criterion Monte Carlo simulation meets because it produces a full range of outcomes, including those that are less likely or unexpected. This article builds on these pedagogical applications to business curriculum concepts generally and draws from Carver (2013) to provide a set of teaching exercises specifically designed for analysis of financial data while also engaging students in the application of the programming language Python.

When it comes to the software package used for simulation modeling, some statisticians advocate the use of the R language (R Development Core Team 2006) in statistics curricula to teach students programming language skills (Nolan and Temple Lang 2010). As described by Horton, Brown, and Qian (2004), R can be used both for statistical analysis by professionals and as a classroom tool for exploration. Indeed, students pursuing a degree in statistics or data science would be well served to learn both R and Python. For undergraduate business students taking required courses in statistical methods, there is not sufficient time built into the curriculum to properly introduce both the R and Python environments. As a result, and due to its broad applicability and growing popularity, Python seems to be the more practical option.

In recent years Python has received increasing support for teaching data science and programming skills (Brunner and Kim 2016). According to the Python Software Foundation (2021), “Python is an interpreted, interactive, object-oriented programming language which combines remarkable power with very clear syntax.” Many researchers and instructors also emphasize Python as a preferred language for instruction because it is so marketable and its simplicity allows students to focus on the programming and problem-solving skills rather than the complex syntax of other programming languages (Rufinus and Kortsarts 2006; Brau, Brau, and Keith 2020). While R is restricted to statistical computing applications, Python is a general-purpose programming language like Java or C++ and can be used for a wide variety of purposes, which helps prepare business students in particular for the workforce (Perkel 2015). Brunner and Kim (2016) also argued that “Python (especially when using the Pandas library) is capable of performing most, if not all, of the data analysis operations that a data scientist might complete by using R” (1948). More recently, the Institute of Electrical and Electronics Engineers (IEEE) ranked Python as the number one overall programming language (Cass 2020). In 2018, they observed that R’s slight decline in their rankings likely contributed to Python’s continued success, stating “the existence of high-quality Python libraries for both statistics and machine learning may be making flexible Python a more attractive jumping-off point than the more specialized R” (Cass 2018). In the 2020 report, Cass also argued that Python’s sustained popularity can be attributed to its frequent use as a teaching language, further supporting its presence in undergraduate curricula.

Researchers have found Python to be an increasingly desirable skill for employers in both data science and business analytics and concluded that its tremendous community support has increased its utility in the workforce and its desirability for teaching at the collegiate level (Stanton and Stanton 2020). Usage statistics continue to underscore the advantages of Python. According to Stack Overflow (2018), a popular online forum for programmers, growth in the use of Python versus other popular programming languages has been unsurpassed since 2012. In its most recent survey, Stack Overflow (2021) reports usage of Python at 48.24% versus Java at 33.35% and C++ at 24.31%.

Python is easy to learn and even described as “a joy to teach” because of its simplicity and practicality (McCane 2009, 10). This sentiment comes primarily from an introductory programming standpoint but it still has some relevance in business analytics. More generally, quantitative evaluation of teaching introductory programming with Python versus Java indicated that Python is preferred for teaching basic, procedural aspects of programming and Java is more appropriate for teaching object-oriented programming (Jayal et al. 2011). Furthermore, significant literature emphasizes Python’s accessibility as a programming language for undergraduate students with little to no background knowledge in programming, which best describes the background of the majority of students in this course.

3. Course Description

The University catalog describes Advanced Business Statistics as follows: “Development of advanced statistical techniques to support business decision-making. Topics include advanced multiple regression analysis, analysis of variance, and nonparametric techniques.” The course is the second in a required two-course sequence within the School of Business, following the prerequisite first-semester course, titled “Inferential Statistics and Problem Solving.”

In Spring 2019 the course was taught in a computer lab classroom with 45 Dell Personal Computers (PCs) connected to the internet, running Microsoft Windows, and equipped with the Microsoft Office suite and other popular software applications for business productivity. The course met for 80 minutes twice per week for 15 weeks in a traditional format. Although web materials and web applications were used frequently, the course was not an officially designated “hybrid” or “online” course, and students were expected to attend class in person. There were two sections of the course with the first section beginning at 11:15 a.m. and the second beginning at 1:00 p.m. each Monday and Wednesday of the semester. A total of 79 students were enrolled in both sections combined. Students were encouraged to work together on assignments but required to submit their own work. Three exams were used to assess student proficiency. A course syllabus and individual lesson plans with detailed assignments, including Python programming assignments, are available by contacting the author.

4. Simulation Lessons

Though a detailed accounting of every aspect of the content delivered in the Advanced Business Statistics course during the Spring 2019 semester is beyond the scope of this article, this section describes the ways in which the Python simulation module is presented to students. In Holman (2019), the corresponding author described the instructions for first implementing these three simulations in Google Sheets, before moving on to repeat the same simulations using Python.

While few business students had experience with computer programming coming into the course, nearly every student had experience with spreadsheets, primarily using Microsoft Excel due to its near-ubiquitous presence in the business world. Despite this, the coursework scaffolds their experience of learning Python programming with Google Sheets instead. Google Sheets is nearly identical to Excel in function but is a “pure” web application, which eliminates concerns about compatibility given the vast variety of mobile and desktop student computing devices.

As they are completing each of the simulation lessons in Google Sheets, students also begin an introductory Python programming module using a combination of courseware from DataCamp and instructor-led in-class programming activities utilizing the Repl.it on-line development environment. DataCamp provides video and interactive programming lessons at no cost for classroom use at datacamp.com (DataCamp 2019). Students began with “Intro to Python for Data Science” where they were introduced to Python Basics, Lists, Functions, Methods and Packages, including NumPy, an extension module created to facilitate numerical computation (Oliphant 2006; Harris et al. 2020). While DataCamp serves to communicate and teach the essentials of Python, Repl.it is a browser-based development environment for Python and other programming languages that provides a space for students to practice what they are learning by writing code (Repl.it 2021). It is useful both because it is a platform-independent online environment and because it is easy to share and troubleshoot programs by simply copying and pasting a web link.

In summary, students learn the simulations in Google Sheets while they are also learning the basics of writing Python through the lessons in DataCamp. Then, they apply that knowledge of both Python basics and Monte Carlo simulation exercises by crafting the simulations themselves in Python using Repl.it. Below, we briefly describe the exercise students completed for each simulation, along with a model of the work required to run the simulation in Python using Repl.it, and the pedagogical implications of each. A line-by-line description of the simulation is beyond the scope of this article, but can be found in the supplementary materials.

4.1. Simulation 1: Coin Toss

Initially, students are introduced to the coin toss, one of the simplest possible random processes. In this exercise, students are asked to consider how the accuracy of forecasting the outcome of the coin toss changes with greater repetition. The intended learning outcome is that students are able to describe how accurately predicting a single coin flip is very difficult but, as the number of iterations increases, predicting the proportion of heads or tails becomes increasingly accurate. This sounds elementary, but hopefully, students recognize that the same simulation approach can be applied to other, more complex, processes with similar opportunities in terms of prediction accuracy. Using a simple “for loop,” a Python program iterates through multiple coin tosses. There are no inputs or parameters. Rather, this simple simulation illustrates that as the number of coin tosses increases, the proportion of coin tosses that result in “heads” converges toward 0.5. See Figure 1 or interact with the program directly by browsing to https://replit.com/@statsprof/Sim1CoinToss. Note that although Python loops are known to be relatively inefficient, no performance lags were observed during classroom simulation tests with up to 10,000 iterations.
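The coin toss loop described in this section can be sketched roughly as follows. This is a minimal illustration, not the authors' Repl.it program: it borrows the variable names "sims", "simcount", "heads", and "pcth" from the article's description, and it assumes that `random.randint(0, 1)` generates each toss. The function wrapper and the `seed` parameter are additions made here so that runs can be repeated.

```python
import random


def coin_toss_simulation(sims=10_000, seed=None):
    """Toss a fair coin `sims` times and return the running share of heads."""
    rng = random.Random(seed)  # seed is an assumption, added for reproducibility
    heads = 0
    pcth = []  # running "percent heads" after each toss
    for simcount in range(1, sims + 1):
        toss = rng.randint(0, 1)  # assumed encoding: 1 = heads, 0 = tails
        if toss > 0:
            heads += 1
        pcth.append(heads / simcount)
    return pcth


if __name__ == "__main__":
    # Repeat the experiment with more tosses to watch the share settle near 0.5.
    for sims in (100, 1_000, 10_000):
        pcth = coin_toss_simulation(sims, seed=1)
        print(f"{sims:>6} tosses: final share of heads = {pcth[-1]:.4f}")
```

Returning the whole running list, rather than only the final value, keeps the early volatility visible, which is the point the exercise is meant to make.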

Figure 1. Coin toss simulation in Google Sheets, used to scaffold student learning in Python (Holman 2019).

Figure 2. Coin toss simulation in Python on Repl.it: https://replit.com/@statsprof/Sim1CoinToss.

In the Python program above, shown in the Repl.it development environment, there are 20 lines of Python code in the file, main.py. Line 2 of the program imports the “random” module used to generate random numbers. There are four total variables: “sims” for number of simulated coin tosses, “simcount” and “heads” for integer counting variables, and “pcth” to calculate the “percent heads” result at each step in the simulation. As mentioned before, the substance of the program begins with a “for loop” which tells Python to loop through the simulation ten thousand times for the student assignment, but the count can be changed to any positive integer. Once the coin toss has been simulated, a conditional statement checks to see if the random number is greater than zero, which is the “heads” outcome, and the “heads” variable increases accordingly. As the heads variable changes, so does the “pcth” variable so that the percent heads is constantly being updated as the simulation runs. The idea is to show that early on in the iterations, the percentage of heads is fairly volatile, but it eventually begins to converge toward 0.5. The higher the number of iterations, the more complete the convergence, which is why students are asked to repeat the simulation with higher numbers of iterations each time. With only 100 iterations, the range of possibilities is wider, and anywhere from 0.4 to 0.6 is likely, but at 1,000 or 10,000 iterations, the range narrows and the result is highly likely to be within a few one-hundredths of 0.5 by the end of the simulation.

The program in Figure 2 contains everything necessary for the simulation. The remaining lines of code, shown in Figure 3, generate a line plot for visualization showing how the percentage of coin tosses resulting in “heads” converges to 0.5. This component of the exercise is both an appropriate skill for students to practice in Python and serves to reiterate the objective to students, ensuring that their takeaway from this simulation is that greater repetition leads to convergence and predictability. Two very commonly used Python modules are imported, NumPy and matplotlib, to facilitate the creation of a line plot. The list variable, “pcth” or “percent heads,” is converted into a NumPy array, which is passed to the “plot” function creating a line plot. Note that matplotlib can typically plot a list variable

Figure 3. Coin toss simulation graphic output on Repl.it: https://replit.com/@statsprof/Sim1CoinToss.

but in Repl.it the line plot was not rendering properly until the NumPy array was utilized. On lines 30–36, graph attributes are defined and the “show” command tells the program to display the resulting graphic. See Figure 3 as well as the supplementary materials for coding details and see Figure 4 for the resulting graphic output.

Figure 4. Graphic displaying repeated trial results for coin flip simulation.

4.2. Simulation 2: Dice Game

In the second simulation, students are introduced to another relatively simple discrete probability distribution. It involves repeated randomization of rolling a pair of dice, evaluating the outcome, and then calculating the resulting change to a hypothetical player’s cash balance. Students are specifically asked to simulate a simple game called “Lucky Seven” in which all rolls other than a seven result in a losing bet while rolling a seven is rewarded with a relatively large payoff. The objective of the exercise is to run the simulation under various payoff scenarios, in which the payoff multiplier for a winning roll could be four, five, six, or seven times the bet amount. Students then use the simulation to decide which payoff amount will both draw potential game players and earn profit for the casino.

A new element introduced in this simulation is the financial outcome component. An initial balance is established and the balance is updated after each roll of the dice. This enables students to analyze various payoff levels and how winnings are then distributed between the “player” and the “house” after many iterations. The program is displayed in Figure 5, in the supplemental materials, and on repl.it at https://replit.com/@statsprof/Sim2DiceGame.

This program starts the same way as the coin flip simulation, with a Python module for random number generation and another “for loop” at the core of the simulation. However, instead of randomly choosing between the integers zero and one, the random number generator now draws an integer between one and six to mimic the roll of a single die. Likewise, it must be inserted twice in the code to simulate the roll of two dice. If you generate a single random number between two and twelve, to simulate the possible range of sums of those two dice, the outcome will be a uniform distribution. It will reflect an equal likelihood of rolling a twelve or a two as rolling a seven or a nine, even though in reality a roll of double ones or double sixes is far less likely than rolling a six, seven, or eight. Therefore, the same line is inserted twice before summing the two numbers to simulate the distribution of results from rolling two dice individually. This is the simplest of games involving dice: if you roll a 7 you win, if you roll any other sum you lose. Therefore, the sum has to be tested to determine if the roll is a win or a loss, increasing the respective variable by one.
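The contrast drawn above, between drawing the sum directly and rolling two dice separately, can be checked with a short sketch. It assumes `random.randint` for each die. The function names, the `seed` parameter, and the simulation counts are illustrative choices made here and are not taken from the authors' Repl.it program.

```python
import random


def roll_two_dice(rng):
    """Roll two six-sided dice separately and return their sum."""
    die1 = rng.randint(1, 6)
    die2 = rng.randint(1, 6)
    return die1 + die2


def lucky_seven(sims=10_000, seed=None):
    """Count wins (sum of 7) and losses (any other sum) over `sims` rolls."""
    rng = random.Random(seed)  # seed is an assumption, added for reproducibility
    wins = 0
    losses = 0
    for _ in range(sims):
        if roll_two_dice(rng) == 7:
            wins += 1
        else:
            losses += 1
    return wins, losses


def uniform_sum_share_of_sevens(sims=10_000, seed=None):
    """The shortcut the text warns against: draw the sum uniformly from 2..12."""
    rng = random.Random(seed)
    sevens = sum(rng.randint(2, 12) == 7 for _ in range(sims))
    return sevens / sims


if __name__ == "__main__":
    wins, losses = lucky_seven(60_000, seed=2)
    print(f"two dice:    share of sevens = {wins / (wins + losses):.4f}")
    print(f"uniform sum: share of sevens = {uniform_sum_share_of_sevens(60_000, seed=2):.4f}")
```

With two separate dice the share of sevens should settle near 6/36, while the uniform shortcut should settle near 1/11, since each of the eleven sums from 2 to 12 is then equally likely.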

Figure 5. Dice game simulation on Repl.it: https://replit.com/@statsprof/Sim2DiceGame.

Figure 6. Dice game simulation with betting: https://replit.com/@statsprof/Sim2DiceGame-PlusBetting.

To extend the Dice Simulation to include a financial component, “begin balance” and “end balance” are added, along with the “bet” amount and the “pay off odds” variables. If the result is a win, the end balance will be equal to the beginning balance plus the bet amount times the payoff odds. This is expressed under “wins” in line 14 as “endbalance = beginbalance + (bet*payoffodds).” If it is a loss, the end balance will be equal to the beginning balance minus the bet amount. That will be expressed in line 17 (under “losses” in line 16) as “endbalance = beginbalance - bet.” At the end of the “for loop,” the beginning balance needs to be reset to reflect the end balance. That way, it will accurately reflect the impact of wins and losses for each subsequent roll of the two dice. See Figure 6, the supplemental materials, or browse to https://replit.com/@statsprof/Sim2DiceGame-PlusBetting for details.
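The balance update described above can be sketched as follows. The two assignment statements and the names "beginbalance", "endbalance", "bet", and "payoffodds" come from the text. The function wrapper, the `seed` parameter, and the example starting balance and bet size are assumptions made here for illustration. The line numbers cited in the text refer to the authors' program, not to this sketch.

```python
import random


def lucky_seven_with_betting(sims, beginbalance, bet, payoffodds, seed=None):
    """Play `sims` rounds of Lucky Seven and return (wins, losses, final balance)."""
    rng = random.Random(seed)  # seed is an assumption, added for reproducibility
    wins = 0
    losses = 0
    for _ in range(sims):
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        if roll == 7:
            wins += 1
            endbalance = beginbalance + (bet * payoffodds)
        else:
            losses += 1
            endbalance = beginbalance - bet
        # Reset the beginning balance so the next roll starts from the new total.
        beginbalance = endbalance
    return wins, losses, beginbalance


if __name__ == "__main__":
    # Hypothetical scenario: 1,000 starting balance and a bet of 1 per roll.
    for payoffodds in (4, 5, 6, 7):
        wins, losses, final = lucky_seven_with_betting(
            10_000, beginbalance=1_000, bet=1, payoffodds=payoffodds, seed=3
        )
        print(f"payoff {payoffodds}x: wins={wins}, losses={losses}, final balance={final}")
```

Under the update rule quoted in the text, the expected change in the player's balance per roll is bet × (payoffodds − 5) / 6. A multiplier of five is therefore break-even on average, lower multipliers favor the house, and higher ones favor the player. Students can compare that calculation against the simulated final balances.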

When a pair of dice are rolled, there are 36 possible outcomes. instructed to identify the distribution of data they may want to
and 6 of those outcomes result in rolling a 7. Therefore, the simulate in the future, which may not be normally distributed.
probability of rolling a 7 is 6/36, which is 1/6 or approximately Using the information provided above and the simulation
16.67%. Once again, though there will be some initial variabil- described below as a starting point, students are then asked to
ity, as the number of simulations increases, the probability of implement the necessary program elements to track a hypothet-
winning the game and rolling a 7 should get closer and closer ical retirement portfolio over a 40-year time frame. This exercise
to 16.67% overall. Possible extensions to this assignment are as is very intimidating and difficult for novice programmers but
numerous as the number of dice games that can be simulated. using the spreadsheet model to scaffold instruction as described
Prompting students to implement alternative rules is an excellent way to test and enhance their comprehension.
After presenting the Lucky 7 game simulation, students are asked to implement a simplified version of craps called “Pass,” where the player wins with a roll of 7 or 11, loses with a roll of 2, 3, or 12, and ties with rolls of 4, 5, 6, 8, 9, or 10. As a more challenging assignment, students are asked to implement a game called “Chuck-a-luck,” which involves tossing three dice and winning a large payout for rolling a “triple,” that is, three of a kind.

4.3. Simulation 3: Stock Market Returns

The stock market simulation is the most complex, but by this point in the course, students are familiar with most of the elements utilized in the model. First, we look at the distribution of historical stock market returns. Using the S&P 500 and going back to 1926, we generate a frequency distribution of historical total returns and observe that the distribution is skewed but approximately normal with a mean of 11.88% and a standard deviation of 19.76%. See Figure 7 for details. Students are cautioned that the assumption of normality may skew results and is made to facilitate a simpler simulation. Students are also

in Holman (2019) as a guide along with the code for tracking bets in the Dice Simulation (see Simulation 2 and Figure 6), students typically complete the assignment successfully.
In this simulation, similar to the way the random module was imported, the “NumPy” module is imported (numpy.org). Then, variables are established for the number of simulations, the number of years, and the beginning balance. The mean and standard deviation of the stock market return data, which are taken from Figure 7, are entered in line 3. Next, another loop is established to simulate the stock market, assuming a normal distribution and using the “normal” function with the mean, the standard deviation, and the number of years as parameters. The loop then steps through the 40 years, starting with the beginning balance and increasing the balance in each iteration by the simulated stock market return. The simulation concludes in lines 12 to 14, which print the ending balance together with the list of returns before the program ends. See details in Figure 8, the supplemental materials, or browse to the repl.it program at https://replit.com/@statsprof/Sim3StockMarket.
Students are given the program in Figure 8 and are then asked to calculate several summary statistics. Students are also asked to use the simulation to estimate the likelihood of reaching a particular target balance within a specified number of years.
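The steps described above can be sketched as follows. This is a minimal illustration, not the program shown in Figure 8: it assumes NumPy is installed and uses the mean (11.88%), standard deviation (19.76%), and 40-year horizon stated in the text, while the function names, the beginning balance of 10,000, the 1,000 simulation runs, and the target balance of 1,000,000 are hypothetical values chosen only for this example. It also draws returns through NumPy's `default_rng` generator rather than calling the “normal” function on the module directly.

```python
import numpy as np


def simulate_ending_balances(n_sims=1000, years=40, beginning_balance=10_000.0,
                             mean_return=0.1188, sd_return=0.1976, seed=None):
    """Return an array holding one simulated ending balance per run.

    Annual returns are drawn from a normal distribution, which is the
    simplifying assumption discussed in the text.
    """
    rng = np.random.default_rng(seed)
    ending_balances = np.empty(n_sims)
    for sim in range(n_sims):
        # One simulated return for each year of this run.
        annual_returns = rng.normal(mean_return, sd_return, years)
        balance = beginning_balance
        for annual_return in annual_returns:
            # Grow (or shrink) the balance by this year's return.
            balance *= 1.0 + annual_return
        ending_balances[sim] = balance
    return ending_balances


def probability_of_reaching(ending_balances, target):
    """Share of simulated runs whose ending balance meets the target."""
    return float(np.mean(ending_balances >= target))


if __name__ == "__main__":
    balances = simulate_ending_balances(seed=42)  # seed is an arbitrary choice
    print("Mean ending balance:  ", round(float(np.mean(balances)), 2))
    print("Median ending balance:", round(float(np.median(balances)), 2))
    print("Standard deviation:   ", round(float(np.std(balances)), 2))
    # Hypothetical target balance, used only for illustration.
    print("P(balance >= 1,000,000):", probability_of_reaching(balances, 1_000_000))
```

In this sketch the likelihood of reaching the target is evaluated only at the end of the horizon; checking whether the balance crosses the target in any earlier year would require recording the balance inside the yearly loop.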

Figure 7. Distribution of S&P 500 Stock Market Index annual returns (%) from 1926 to 2018.
Figure 8. Stock market simulation: https://replit.com/@statsprof/Sim3StockMarket.

After students have gained familiarity with the stock market simulation, they are asked to add a fixed income (bonds) component, or other asset classes (e.g., gold, cash, international stocks), to the simulation. In addition, students must implement the ability to designate a percent stocks and percent bonds (or other asset class), for example, 80/20 or 60/40, as a portion of the total investment portfolio with annual rebalancing.

5. Student Evaluations

Although a formal experimental design with control groups is beyond the scope of this article, voluntary student evaluations at the author’s institution can be used as anecdotal evidence to reflect the impact of this pedagogical strategy on students. These evaluations are submitted near the end of the course, and because they are voluntary, respondents are self-selected, not randomly determined. Students are presented with 18 different categories for which they select ratings on a scale with 5 options (i.e., Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree) and are subsequently prompted to provide text responses reflecting on instructor approachability, what works well, and what to improve.
In the spring semester of 2018, the corresponding author introduced Python through DataCamp lessons and Monte Carlo simulations but without first introducing simulation exercises in spreadsheets. That semester, out of 79 possible respondents in the class, 35 students completed an evaluation, yielding a response rate of 44%. Of the 35 evaluations completed, 28 students provided text comments in response to the “What works well?” prompt, with the word “Python” appearing 6 times. In addition, 29 students provided text comments to address “What to improve?” with the word “Python” appearing 11 times. Comments submitted are listed under “Student Evaluation Data” in the supplementary materials.
The comments submitted make clear that students did not confine positives of the course to their response to the first question, nor did they confine suggestions for improvement to their answer to the second question. Because many students included both positives and negatives together in their answers, the two collections of responses are best analyzed together. Many of those positive and negative comments applied to the grading and pacing of the course, which, while useful, are not pertinent to this article. However, in-class work on solving problems, a hallmark component of teaching simulations, was mentioned as a positive multiple times. Additionally, more than five students praised the real-life application of the work they learned and completed in this course, several of them specifically linking it to Python’s many applications to their future careers, which was a driving force for including it in the coursework in the first place. Several students also argued that more time should have been dedicated to Python. This is productive feedback that demonstrates the general advantages of learning Python, while also informing potential structural changes to future iterations of the course.
The evidence leads me to conclude that Python was neither consistently popular nor consistently unpopular among students. While some comments were critical of the use of Python programming as part of the course, others specifically mentioned Python as a positive element. That said, it is difficult to know whether the same students might have taken issue with being assigned to run and interpret multiple regression models in SPSS or similar software, which by nature would also have required extensive time.
In the spring semester of 2019, the corresponding author continued to teach Python with DataCamp lessons and Monte Carlo simulations. However, exercises introducing Monte Carlo simulations in Google Sheets were added to the first half of the course in order to scaffold student learning as they approached Python with the same simulations. This semester, the student evaluations yielded a response rate of 57%, with 43 students answering out of 76 possible respondents in the course. Of the 43 evaluations completed, 36 students provided additional text
comments in response to the “What works well?” prompt, with the word “Python” appearing 7 times. Additionally, 32 students provided text comments to address “What to improve?” with the word “Python” appearing 13 times. Comments submitted are listed in the supplementary materials.
Though many comments again focus on grading and assignments, the comments regarding Python once again show that students are divided in their view. While some comments specifically mention Python in a positive light, other comments mentioning Python are critical of its role in the course. Once again, five students praised the in-class demonstration, hands-on problems, and the practical applications of the assigned problems. Five students specifically praised the inclusion of outside technology like DataCamp, and several students also described the real-life utility of Python across disciplines. Several negatives were presented by students as well, though many of them appeared to be related to the challenging nature of the course, and it is possible these students exhibit a negative response to most challenging stimuli. On the other hand, there are legitimate concerns expressed by students regarding how much time is devoted to learning Python, how content is delivered in class and out of class, and how much breadth and depth are appropriate. These comments can be used to inform improvement in terms of both content and instructor delivery.
Table 1 compares ratings between the two Advanced Statistics course sessions in Spring 2018 and Spring 2019. Again, instead of looking at the numerical rating averages, I have simply provided the proportion of students indicating they “Strongly Agree,” the highest (best) rating possible, with each question about the course.
The noticeable increase in ratings across all categories for the course between 2018 and 2019 may demonstrate the effectiveness of the scaffolding approach to teaching Python, created by introducing Monte Carlo simulations in Google Sheets first and in Python second. In fact, one student’s evaluation comment specifically described this approach as a positive aspect of the course that supported their learning, responding “He went about teaching google sheets first to get us used to the soft-coding on that before moving along to Python. Great approach” to the question “What works well?” While these rating differences could be attributed to differences in the student cohorts or the comfort of the instructor with the materials, overall, the feedback and increased ratings support the hypothesis of this article that scaffolding the simulation lessons with spreadsheets before introducing Python programming makes learning programming more accessible to students.
The stated aim of the course is to teach advanced statistical methods. Some students believe this implies that the content should be akin to a “math class” rather than a “programming class,” but we disagree. The most salient changes and advances in statistical methods over the past 30 to 40 years, in the business world and elsewhere, involve the application of computational methods to large datasets. These advances have both opened up opportunities to explore problems that were previously intractable and transformed the workplace. In the 20th century, no one thought of themselves as a data scientist, but in 2012 the role of data scientist was declared the “Sexiest Job of the 21st Century” in the Harvard Business Review (Davenport and Patil 2012). Some students have complained because learning computational methods is a more challenging task than learning how to operate a statistical software program like SPSS. These are students whose primary motivation is to minimize effort toward obtaining a degree. More ambitious students who are looking ahead at these trends in employment prefer to be exposed to the latest approaches and technologies. These are likely the same students providing evaluation feedback, available in the supplementary materials, with comments such as “the material I learned will be applied in the future” and “Datacamp gears things back to the real world.” If the curriculum were limited to a “pencil and paper” math class instructional modality, these forward-thinking students would rightfully be able to complain that the professor and the academy were not keeping up with the “real world” of private industry where they will almost certainly spend their career. Therefore, we believe including Python in the curriculum meets the stated aim of teaching advanced statistical methods for these students.

Table 1. The proportion of students indicating that they “Strongly Agree” with each question about the course is shown here for the 2018 Spring and 2019 Spring Advanced Statistics Course.

Question text                                 2018 Advanced statistics   2019 Advanced statistics   Difference
Pace of course (is appropriate)               0.54                       0.72                       +0.18
Grading system is fair                        0.71                       0.86                       +0.15
Prompt grading of work                        0.71                       0.84                       +0.13
Instructor made use of class time             0.74                       0.91                       +0.17
Made difficult material understandable        0.60                       0.70                       +0.10
Communicates ideas clearly                    0.63                       0.81                       +0.18
Responded to student questions                0.74                       0.81                       +0.07
Available outside of class                    0.69                       0.84                       +0.15
Set and maintained high standards             0.71                       0.86                       +0.15
Encouraged critical thinking and analysis     0.74                       0.91                       +0.17
Instructor facilitated class participation    0.57                       0.84                       +0.27
Treated students with respect                 0.83                       0.88                       +0.05
Communicated enthusiasm for the course        0.77                       0.84                       +0.07
Teaching strategies enhanced learning         0.63                       0.72                       +0.09
Text was effective                            0.26                       0.65                       +0.39
Instructor was a successful teacher           0.66                       0.91                       +0.25
Learned from the course                       0.66                       0.79                       +0.13
Average                                       0.66                       0.82                       +0.16
6. Conclusion

In 2020, Brau, Brau, and Keith described an Information Systems Course with programming-oriented learning objectives. Similar to this article, the course targeted students with no programming language experience and strove to provide them with contextualized, workforce-applicable examples of Python’s utility in future careers in the hands-on simulation portion of the coursework. Their student evaluation results indicated that the course explained concepts effectively, and comments reflected that the ways in which these advanced programming skills were taught helped students understand their application. This case study appears to reiterate the findings of this article that such hands-on, simulation-based programming courses are useful to students in preparing them for future careers.
In 2017, through a grant from the National Science Foundation, Dr. Lorena A. Barba and her fellow Mechanical and Aerospace Engineering department faculty at George Washington University designed a two-year program of Python computing modules to accompany their undergraduate engineering coursework (Barba 2020). As we argued in this article, Barba also emphasized the value of Python programming in preparing students for the workforce across many disciplines, including engineering. For this reason, the department created modules that incorporated simulation activities and computational skills through open-source resources like the Jupyter Notebook. In student evaluation data, the majority of survey respondents said they frequently used the skills they learned in the Python modules in their other coursework. It appears that the idea of simulation-based problem-solving as a way for students to experience computing in applied contexts supports student learning and workforce preparation across disciplines, meaning that simulations created for one course or department could possibly be beneficial to another as well.
Finally, scaffolding learning is a pedagogical tool the corresponding author, as the instructor, felt was critical to student success in this course. The concept of scaffolding itself is widely attributed to Vygotsky (1978) and his research around the zone of proximal development. Maybin, Mercer, and Stierer (1992) built on this work to define scaffolding as “the process whereby one person in the role of ‘teacher’ mediates the progress of another person, the ‘learner’, by reducing the scope for failure in the task the learner is attempting” (23). Its applications to a variety of disciplines and pedagogical practices have been widely explored in the past fifty years, and the teaching of mathematics and statistics is no exception. Postsecondary instructors have argued that appropriate scaffolding materials are differentiated to groups of students in order to bridge their thinking from a familiar to an unfamiliar concept (Taber 2018) and should help students construct their own conceptual knowledge of a problem or idea by engaging in guided research and hands-on practice (Ruder, Stanford, and Gandhi 2018; Khusna 2021).
Overcoming the trepidation many students feel toward computer programming is a key challenge in teaching statistical computing. For most students, their experience in my Advanced Statistics course is their very first introduction to computer programming. However, nearly all business students have some familiarity with spreadsheets coming into the course. According to student data collected through end-of-course evaluations (Holman 2019), there was a wide range of responses to including Python programming in the curriculum, with some students believing a statistics course should include no programming elements while others wanted to spend more time on computational methods. However, the data did indicate that student familiarity with a spreadsheet environment made using Google Sheets as an introductory computing platform helpful in alleviating student anxiety around beginning Python programming. Because Monte Carlo simulation can be conducted in Google Sheets as well as in Python, it supports my efforts to scaffold student learning by allowing students to become accustomed to new simulations in a familiar spreadsheet environment, like Google Sheets, before transitioning to a new and more intimidating computational environment, like Python. With this in mind, a positive next step for the course is to continue experimenting with pedagogical approaches to introduce students to these important computational and statistical methods while working to mitigate their anxiety.
Scaffolding with spreadsheets to teach computational skills in Python offers a promising path toward filling a critical gap in workforce training and economic development found in, for example, the shortage of computational skills in the labor market. To continue assessing the spreadsheet-to-Python scaffolding approach described here, there are plans to next build comparable learning modules for teaching other advanced statistics topics such as multiple regression analysis and modeling in this and similar courses.

Supplementary Materials

Included in the supplementary materials is a line-by-line explanation of the Python code that students should input into the Repl.it environment to complete each simulation. Additionally, the full Student Evaluation Data is included in the form of course comments from the Spring 2018 and Spring 2019 semesters as evidence supporting the inclusion of Python in the course.

References

Arie, D. (2000), Monte Carlo Applications in Systems Engineering, New York: Wiley.
Barba, L. (2020), “Engineers Code: Reusable Open Learning Modules for Engineering Computations,” Computing in Science & Engineering, 22, 26–35.
Barba, L., Wickenheiser, A., and Watkins, R. (2017), “CyberTraining: DSE—The Code Maker: Computational Thinking for Engineers with Interactive, Contextual Learning,” National Science Foundation CyberTraining Proposal, Accessed April 7, 2022. Available at https://figshare.com/articles/online_resource/CyberTraining_DSE_The_Code_Maker_Computational_Thinking_for_Engineers_with_Interactive_Contextual_Learning/5662051/1.
Batanero, C., Tauber, L. M., and Sánchez, V. (2004), “Students’ Reasoning About the Normal Distribution,” in The Challenge of Developing Statistical Literacy, Reasoning and Thinking, eds. D. Ben-Zvi and J. Garfield, 257–276. Netherlands: Springer.
Becker, W. E., and Greene, W. H. (2001), “Teaching Statistics and Econometrics to Undergraduates,” Journal of Economic Perspectives, 15, 169–182.
Brau, H. C., Brau, J. C., and Keith, M. (2020), “A Pedagogical Model for Teaching Data Analytics in an Introductory Information Systems Python Course,” Business Education Innovation Journal, 12, 77–82.
Brunner, R. J., and Kim, E. J. (2016), “Teaching Data Science,” Procedia Computer Science, 80, 1947–1956.
Carver, A. B. (2013), “The Equity Indexed Annuity: A Monte Carlo Forensic Investigation into Controversial Financial Product,” Decision Sciences Journal of Innovative Education, 11, 23–28.
Cass, S. (2018), “The 2018 Top Programming Languages,” Accessed July 31, 2018, Available at https://spectrum.ieee.org/at-work/innovation/the-2018-top-programming-languages/.
Cass, S. (2020), “The Top Programming Languages: Our Latest Rankings Put Python on Top—Again [Careers],” IEEE Spectrum, 57, 22–22.
Craft, R. K. (2003), “Using Spreadsheets to Conduct Monte Carlo Experiments for Teaching Introductory Econometrics,” Southern Economic Journal, 69, 726–735.
Datacamp.com (2019), Accessed January 2019, Available at https://www.datacamp.com/.
Davenport, T. H., and Patil, D. J. (2012), “Data Scientist: The Sexiest Job of the 21st Century,” Harvard Business Review, 90, 70–76.
Dichev, C., Dicheva, D., Cassel, L., Goelman, D., and Posner, M. A. (2016), “Preparing All Students for the Data-Driven World,” Proceedings of the Symposium on Computing at Minority Institutions, Admi 346.
Donoho, D. (2015), 50 Years of Data Science, Princeton NJ, Tukey Centennial Workshop, 1–41.
Foster, D. P., and Stine, R. A. (2006), “Being Warren Buffett: A Classroom Simulation of Risk and Wealth When Investing in the Stock Market,” The American Statistician, 60, 53–60.
GAISE College Group (2016), “Guidelines for Assessment and Instruction in Statistics Education,” Accessed November 10, 2021, Available at https://www.amstat.org/asa/education/Guidelines-for-Assessment-and-Instruction-in-Statistics-Education-Reports.aspx.
Glasserman, P. (2004), Monte Carlo Methods in Financial Engineering, Germany: Springer.
Gould, R. (2010), “Statistics and the Modern Student,” International Statistical Review, 78, 297–315.
Hallett, D. H. (2003), “The Role of Mathematics Courses in the Development of Quantitative Literacy,” in Quantitative Literacy: Why Numeracy Matters for Schools and Colleges, 91–98. United States: The University of California.
Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., Del Río, J. F., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., and Oliphant, T. E. (2020), “Array Programming with NumPy,” Nature, 585, 357–362.
Hayes, J. M. (2008), “Trikes, Cars and the Theory of Constraints (TOC),” Decision Sciences Journal of Innovative Education, 6, 349–354.
Holman, J. O. (2018), “Teaching Statistical Computing with Python in a Second Semester Undergraduate Business Statistics Course,” Business Education Innovation Journal, 10, 104–110.
Holman, J. O. (2019), “Teaching Simulation Methods with Google Sheets as a Gentle Introduction to Statistical Computing with Python,” Business Education Innovation Journal, 11, 125.
Horton, N. J. (2013), “I Hear, I Forget. I Do, I Understand: A Modified Moore-Method Mathematical Statistics Course,” The American Statistician, 67, 219–228.
Horton, N. J., and Hardin, J. S. (2015), “Teaching the Next Generation of Statistics Students to ‘Think with Data’,” The American Statistician, 69, 259–265.
Horton, N. J., Brown, E. R., and Qian, L. (2004), “Use of R as a Toolbox for Mathematical Statistics Exploration,” The American Statistician, 58, 343–357.
Jayal, A., Lauria, S., Tucker, A., and Swift, S. (2011), “Python for Teaching Introductory Programming: A Quantitative Evaluation,” Innovation in Teaching and Learning in Information and Computer Sciences, 10, 86–90.
Karssenberg, D., de Jong, K., and van der Kwast, J. (2007), “Modelling Landscape Dynamics with Python,” International Journal of Geographical Information Science, 21, 483–495.
Kerkelä, L., Nery, F., Hall, M., and Clark, C. (2020), “Disimpy: A Massively Parallel Monte Carlo Simulator for Generating Diffusion-Weighted MRI Data in Python,” Journal of Open Source Software, 5, 2527.
Khusna, A. H. (2021), “Scaffolding Based Learning: Strategies for Developing Reflective Thinking Skills: A Case Study on Random Variable Material in Mathematics Statistics Courses,” Journal of Physics: Conference Series, 1940, 012093.
Manly, B. F. J. (1997), Randomization and Monte Carlo Methods in Biology, London: Chapman and Hall.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., and Byers, A. (2011), Big Data: The Next Frontier for Innovation, Competition, and Productivity, United States: McKinsey Global Institute.
Maybin, J., Mercer, N., and Stierer, B. (1992), “Scaffolding Learning in the Classroom,” in Thinking Voices: The Work of the National Oracy Project, 186–195. United Kingdom: Hodder & Stoughton.
McCane, B. (2009), “Introductory Programming with Python,” The Python Papers Monograph, 1, 1–18.
McCluskey, A. R., Grant, J., Symington, A. R., Snow, T., Doutch, J., Morgan, B. J., Parker, S. C., and Edler, K. J. (2019), “An Introduction to Classical Molecular Dynamics Simulation for Experimental Scattering Users,” Journal of Applied Crystallography, 52, 665–668.
McLoughlin, P. (2008), “A Modified Moore Approach to Teaching Probability and Mathematical Statistics: An Inquiry-Based Learning Technique,” ASA Proceedings of the Joint Statistical Meetings.
Metropolis, N., and Ulam, S. (1949), “The Monte Carlo Method,” Journal of the American Statistical Association, 44, 335–341.
Nolan, D., and Speed, T. P. (1999), “Teaching Statistics Theory Through Applications,” The American Statistician, 53, 370–375.
Nolan, D., and Temple Lang, D. (2003), “Case Studies and Computing: Broadening the Scope of Statistical Education,” Proceedings of the 2003 ISI Meeting.
Nolan, D., and Temple Lang, D. (2010), “Computing in Statistics Curricula,” The American Statistician, 64, 97–107.
NumPy.org (2021), Accessed August 31, 2021, Available at https://numpy.org/doc/stable/reference/.
Oliphant, T. E. (2006), A Guide to NumPy, USA: Trelgol Publishing.
Peng, R. D., Chen, A., Bridgeford, E., Leek, J. T., and Hicks, S. C. (2021), “Diagnosing Data Analytic Problems in the Classroom,” Journal of Statistics and Data Science Education, 29, 1–24.
Perkel, J. M. (2015), “Pickup Python,” Nature, 518, 125–126.
Python Software Foundation (2021), General Python FAQ, United States: Python Software Foundation.
R Development Core Team (2006), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna.
Repl.it (2021), Accessed August 3, 2019, Available at https://replit.com.
Ruder, S. M., Stanford, C., and Gandhi, A. (2018), “Scaffolding STEM Classrooms to Integrate Key Workplace Skills: Development of Resources for Active Learning Environments,” Journal of College Science Teaching, 47, 29–35.
Rufinus, J., and Kortsarts, Y. (2006), “Teaching an Introductory Programming Course for Non-Majors Using Python,” Information Systems Education Journal, 4, 3–8.
Stack Overflow (2018), “Developer Survey Results: 2018,” Accessed June 28, 2021, Available at https://insights.stackoverflow.com/survey/2018.
Stack Overflow (2021), “Developer Survey Results: 2021,” Accessed November 13, 2021, Available at https://insights.stackoverflow.com/survey/2021#most-popular-technologies-language.
Stanton, W. W., and Stanton, A. D. (2020), “Helping Business Students Acquire the Skills Needed for a Career in Analytics: A Comprehensive Industry Assessment of Entry-Level Requirements,” Decision Sciences Journal of Innovative Education, 18, 138–165.
Strawderman, R. L. (2001), “Monte Carlo Methods in Statistical Physics,” Journal of the American Statistical Association, 96, 778–778.
Taber, K. S. (2018), “Scaffolding Learning: Principles for Effective Teaching and the Design of Classroom Resources,” in Effective Teaching and Learning: Perspectives, Strategies, and Implementation, eds. M. Abend, 1–43. New York: Nova Science Publishers.
Usher, J. M. (2008), “Simulation Software for Illustrating the Performance Impact of Process Variation and Workstation Dependency,” Decision Sciences Journal of Innovative Education, 6, 343–347.
Vygotsky, L. S. (1978), Mind in Society: The Development of Higher Psychological Processes. Cambridge, MA: Harvard University Press.
Weltman, D. (2015), “Using Monte Carlo Simulation with Oracle ©Crystal Ball to Teach Business Students Sampling Distribution Concepts,” Business Education Innovation Journal, 7, 59–63.
Weltman, D. (2017), “Using Monte Carlo Simulation with Oracle ©Crystal Ball to Teach Business Students Hypothesis Testing Concepts and Type I Error,” Business Education Innovation Journal, 9, 184–188.
Woodard, V., and Lee, H. (2021), “How Students Use Statistical Computing in Problem Solving,” Journal of Statistics and Data Science Education, 29, S145–S156.
Zhao, J., and Zhao, S. Y. (2016), “Business Analytics Programs Offered by AACSB-accredited U.S. Colleges of Business: A Web Mining Study,” Journal of Education for Business, 91, 327–337.
