Coding With Replit Export
CONTENTS
Understanding the Repl.it IDE: a practical guide to building your first project with Repl.it
  Introduction: creating an account and starting a project
  Adding more files to your software project
  Sharing your application with others
  Sharing write-access: Multiplayer
  Make it your own
  Where next?
Productivity hacks
  Using the global command palette
  Using the code editing command palette
Creating and hosting a basic web application with Django and Repl.it
  Setting up
  Creating a static home page
  Calling IPIFY from JavaScript
  Adding a new route and view, and passing data
  Calling a Django route using Ajax and jQuery
  Using ip-api.com for geolocation
  Getting weather data from OpenWeatherMap
Learn the basics of the Repl.it IDE. Why use an online IDE and what are all those different panes?
Build a simple program to solve your maths homework.
Computers were initially created to read and write files, and although we’ve come a long way, files
remain central to everything we do. Learn how to create them, read from them, write to them, and
import and export them in bulk.
Welcome to Code with Repl.it 2
No one is an island, and if you build software you’ll build it on top of existing modules that others
have written. Here we show you how to work with other people’s code in a variety of ways: in many
cases all you need to do is import antigravity and fly away¹.
Data is only useful if it can be easily understood. Plots, charts, and graphs are the easiest way to
know what’s happening in the world around you. And did you know that data science is the sexiest
job of the 21st century²? Follow along to plot every city in the USA and find out if richer people live
longer.
Did we mention that no one is an island? Coders don’t have to work alone. You can invite your
friends to code along with you, a technique used by beginners and experts alike. Learn how to code
collaboratively, as if you were using a Google Doc.
Most open source software lives on GitHub and it’s easy to take advantage of all of this free software
by pulling code from GitHub to Repl.it and running it with one click. Some software needs to be
configured in specific ways so you’ll also learn how to modify what happens when you press that
big green “run” button.
Do you want to develop games? Of course, you can do that with Repl.it too. We’ll build a 2D juggling
game using PyGame in this lesson and you’ll learn more about graphics programming at the same
time: sprites, physics, and more.
Tutorial 8: Can you keep a secret? What about from time travellers?
Have you been hacked? It’s only a matter of time if you haven’t. Learn how to keep your secrets
safe, even when coding in public spaces. Pro tip: if you accidentally paste a password into your code
and then remove it, others might still find it in your history, so you’ll learn how to navigate that too.
¹https://xkcd.com/353/
²https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
By this stage you’ll have made a few mistakes. Learn the TDD way and how to write code that
tests your other code to catch frustrating errors before they can hurt anyone. Repetitive jobs are what
computers are best at, after all.
Have you seen the Matrix? Learn to be the Neo of coding by getting more than one cursor, using
keyboard shortcuts, and all of the other productivity features that Repl.it offers. You’ll soon be
producing more code in less time.
Tutorial 11: Keeping your data in check with the Repl.it database
Now that you are starting to build larger and more complicated applications, it is time to start using
databases to keep your data clean and secure.
Tutorial 12: Repl audio - control (or create) your music with code
Find, download, play, and control the volume of your music, all in code. If that’s not enough, create
your own music too.
This is the part where you realize that the possibilities are endless while you learn how to control
your music with code.
Learn more about what web scraping is, how websites are built, and how to automatically scrape
data from websites.
Extending the beginner’s web scraping tutorial, you’ll build a more advanced scraper that extracts
the plain text from news articles, stripping away the ‘boilerplate’ content, such as text in sidebars.
Build an echo bot using the Discord API. Your bot will always respond with exactly what you send
it, but you can customize it afterwards to do something more useful.
A NodeJS version of the Discord bot tutorial above. Even if you prefer Python, it’s good to go through
this one too to get experience with other languages.
Creating and hosting a basic web application with Django and Repl.it
Build a Django web application and host it with Repl.it. You’ll use geolocation and a weather API to
show the user their local weather forecast.
Another web application, but using NodeJS instead of Django. This is a different application where
you’ll build a basic app to manage customer information.
Build a machine-learning based text classifier. We skip the maths but show how you can use machine
learning libraries to implement useful solutions without in-depth theoretical knowledge.
Whether you’re applying for jobs or just like algorithms, it’s useful to understand how sorting works.
In real projects, most of the time you’ll just call .sort(), but here you’ll build a sorter from scratch
and understand how it works.
Understanding the Repl.it IDE: a practical guide to building your first project with Repl.it
Software developers can get pretty attached to their Integrated Development Environments (IDEs)
and if you look for advice on which one to use, you’ll find no end of people advocating strongly for
one over another: VS Code, Sublime Text, IntelliJ, Atom, Vim, Emacs, and no shortage of others.
In the end, an IDE is just a glorified text editor. It lets you type text into files and save those files,
functionality that has been present in nearly all computers since those controlled by punch cards.
In this lesson, you’ll learn how to use the Repl.it IDE. It has some features you won’t find in many
other IDEs, namely:
• It’s fully online. You can use it from any computer that can connect to the internet and run a
web browser, including a phone or tablet.
• It’ll fully manage your environment for building and running code: you won’t need to mess
around with making sure you have the right version of Python or the correct NodeJS libraries.
• You can deploy any code you build to the public in one click: no messing around with servers,
or copying code around.
In the first part of this guide, we’ll cover the basics and also show you how multiplayer works so
that you can code alone or with friends.
you can pick one yourself. Note that by default your repl will be public to anyone on the internet;
this is great for sharing and collaboration, but we’ll have to be careful to not include passwords or
other sensitive information in any of our projects.
You’ll also notice an “Import from GitHub” option. Repl.it allows you to import existing software
projects directly from GitHub, but we’ll create our own for now. Once your project is created, you’ll
be taken to a new view with several panes. Let’s take a look at what these are.
1. Left pane: files and configuration. This, by default, shows all the files that make up your
project. Because we chose a Python project, Repl.it has gone ahead and created a main.py file.
2. Middle pane: code editor. You’ll probably spend most of your time using this pane. It’s a text
editor where you can write code. In the screenshot, we’ve added two lines of Python code,
which we’ll run in a bit.
3. Right pane: output sandbox. This is where you’ll see your code in action. All output that
your program produces will appear in this pane, and it also acts as a quick sandbox to run
small pieces of code, which we’ll look at more later.
4. Run button. If you click the big green run button, your code will be executed and the output
will appear on the right.
5. Menu bar. This lets you control what you see in the main left pane (pane 1). By default, you’ll
see the files that make up your project but you can use this bar to view other things here too
by clicking on the various icons. We’ll take a look at these options later.
Don’t worry too much about all of the functionality offered right away. For now, we have a simple
goal: write some code and run it.
print("Hello World")
print(1+2)
Your script will run and the output it generates will appear on the right pane (pane 3). Our code
output the phrase “Hello World” (it’s a long-standing tradition that when you learn something new
the first thing you do is build a ‘hello world’ project), and then output the answer to the sum 1 + 2.
You probably won’t be able to turn this script into the next startup unicorn quite yet, but let’s keep
going.
⁵https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop
This is useful for prototyping or checking how things work, but for any serious program you write
you’ll want to be able to save it, and that means writing the code in a file like in our earlier example.
Let’s make Python do the repetitive steps for us by creating a program called “solver”. This could
eventually have a lot of different solvers, but for now we’ll just write one: solve_quadratic.
Add a new file to your project by clicking on the new file button, as shown below. Call the file
solver.py. You now have two files in your project: main.py and solver.py. You can switch between
your files by clicking on them.
Open the solver.py file and add the following code to it.
import math

def solve_quadratic(a, b, c):
    d = (b ** 2) - 4 * a * c
    s1 = (-b + math.sqrt(d)) / (2 * a)
    s2 = (-b - math.sqrt(d)) / (2 * a)
    return s1, s2
Note that this won’t solve all quadratic equations: math.sqrt raises an error when d, the discriminant,
is negative, and the division fails when a is 0. However, it’ll do for now.
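One way to widen what the solver accepts is sketched below. It is an illustration, not the tutorial's own code. Python's standard cmath module provides a sqrt that accepts negative numbers and returns a complex result, so a negative discriminant no longer raises an error. The function name solve_quadratic_any and the explicit check for a == 0 are assumptions added for this example.

```python
import cmath


def solve_quadratic_any(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 as complex numbers."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    d = (b ** 2) - 4 * a * c
    root = cmath.sqrt(d)  # unlike math.sqrt, this accepts negative values
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


print(solve_quadratic_any(1, -3, 2))  # positive discriminant: two real roots
print(solve_quadratic_any(1, 2, 1))   # zero discriminant: the same root twice
print(solve_quadratic_any(1, 0, 4))   # negative discriminant: complex roots
```

The roots always come back as complex numbers, even when the imaginary part is zero, which keeps the return type the same for every input.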
Navigate back to the main.py file. Delete all the code we had before and add the following code
instead.
Note how we use Python’s import functionality to import the code from our new solver script into
the main.py file so that we can run it. Python looks for .py (Python) files automatically, so you omit
the .py suffix when importing code. Here we import the solve_quadratic function (which we just
defined) from the solver.py file.
Run the code and you should see the solution to the equation, as shown below.
What does this mean? Because no one else can edit your repl, you can share it far and wide. But
because anyone can read your repl, you should be careful that you don’t share anything private or
secret in it.
If you have a friend handy, send it to them to try it out. If not, you can try out multiplayer anyway
using a separate incognito window again. Below is our main Repl.it account on the left and a second
account which opened the multiplayer invite link on the right. As you can see, all keystrokes can be
seen by all parties in real time.
Where next?
You can now create basic programs on your own or with friends, and you are familiar with the
most important Repl.it features. There’s a lot more to learn though. In the next nine lessons, you’ll
work through a series of projects that will teach you more about Repl.it features and programming
concepts along the way.
If you get stuck, you can get help from the Repl.it community⁷ or on the Repl.it Discord server⁸.
⁷https://repl.it/talk/all
⁸https://repl.it/discord
Working with Files using Repl.it
In this lesson, you’ll gain experience with using and manipulating files using Repl.it. In the previous
lesson you saw how to add new files to a project, but there’s a lot more you can do.
Files can be used for many different things. In programming, you’ll primarily use them to store data
or code. Instead of manually creating files and entering data, you can also use your programs to
create files and automatically write data to these.
Repl.it also offers functionality to mass import or export files from or to the IDE; this is useful in
cases when your program writes data to multiple files and you want to export all of these for use in
another program.
As before, you’ll get a default repl project with a main.py file. We need a data file to practise reading
data programmatically.
You should now have something similar to what you see below.
f = open("mydata.txt")
contents = f.read()
f.close()
print(contents)
This opens the file programmatically, reads the data in the file into a variable called contents, closes
the file, and prints the file contents to our output pane.
Press the Run button. If everything went well, you should see output as shown in the image below.
f = open("createdfile.txt", "w")
f.write("This is some data that our Python script created.\n")
f.close()
Note the w argument that we pass into the open function now. This means that we want to open the
file in “write” mode. Be careful: if the file already exists, Python will delete it and create a new one.
There are many different ‘modes’ to work with files in different ways which you can read about in
the Python documentation⁹.
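To see the overwrite behaviour for yourself without risking a real file, here is a small sketch that works in a throwaway temporary directory (the directory and the strings written are made up for the illustration). It also tries "x" mode, one of the other standard modes, which refuses to open a file that already exists.

```python
import os
import tempfile

tmp = tempfile.mkdtemp()  # throwaway directory, so no project files are touched
path = os.path.join(tmp, "createdfile.txt")

f = open(path, "w")  # "w" creates the file
f.write("original data\n")
f.close()

f = open(path, "w")  # "w" again: the old contents are discarded
f.write("new data\n")
f.close()

try:
    f = open(path, "x")  # "x" only creates new files
    f.close()
    opened_existing = True
except FileExistsError:
    opened_existing = False

f = open(path)  # the default mode is "r" (read)
contents = f.read()
f.close()

print(repr(contents))
print(opened_existing)
```

Only the second write survives, and the "x" attempt fails because the file is already there.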
Run the code again and you should see a new file pop up in the files pane. If you click on the file,
you’ll find the data that the Python script wrote to it, as shown below.
You should keep this key secret to stop other people using up all of your monthly calls.
import requests

# change the following line to use your own API key
API_KEY = "baaf201731c0cbc4af2c519cb578f907"
WS_URL = "http://api.weatherstack.com/current"

city = "London"

parameters = {'access_key': API_KEY, 'query': city}

response = requests.get(WS_URL, parameters)
js = response.json()
print(js)
print()
This code asks WeatherStack for the current temperature in London, gets the JSON¹¹ version of this
and prints it out. You should see something similar to what is shown below.
1. The location: To see if we found the correct London and not one of the 29 other places¹² called
London.
2. The date: We’ll record this when we save this data to a file.
3. The current temperature: This is specified by default in Celsius, but can be customised¹³ if
you prefer Fahrenheit.
Add the following code below the existing code to extract these values into a format that’s easier to
read.
temperature = js['current']['temperature']
date = js['location']['localtime']
city = js['location']['name']
country = js['location']['country']

print(f"The temperature in {city}, {country} on {date} is {temperature} degrees Celsius")
If you run the code again, you’ll see a more human-friendly output, as shown below.
¹¹https://www.w3schools.com/js/js_json_intro.asp
¹²https://www.wanderlust.co.uk/content/londons-around-the-world/
¹³https://weatherstack.com/documentation
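The extraction step can also be tried without calling the API. The sketch below runs the same dictionary lookups over a hand-written sample that only mimics the four fields used in this lesson. The values are invented, and the helper name describe_weather and its error handling are assumptions for illustration. A real response contains many more fields.

```python
def describe_weather(js):
    """Build the summary sentence from a decoded response dictionary."""
    try:
        temperature = js["current"]["temperature"]
        date = js["location"]["localtime"]
        city = js["location"]["name"]
        country = js["location"]["country"]
    except KeyError as missing:
        # A response without the expected fields (for example an error reply)
        # ends up here instead of crashing with a bare KeyError.
        raise ValueError(f"response has no {missing} field") from None
    return f"The temperature in {city}, {country} on {date} is {temperature} degrees Celsius"


# Made-up sample data shaped like the fields we read above.
sample = {
    "location": {"name": "London", "country": "United Kingdom", "localtime": "2020-07-20 14:05"},
    "current": {"temperature": 21},
}

print(describe_weather(sample))
```

Wrapping the lookups in a function makes it easy to reuse them once the script starts looping over several cities.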
This is great for getting the current weather, but now we want to extend it a bit to record weather
historically.
We’ll create a file called cities.txt containing the list of cities we want to get weather data for. Our
script will request the weather for each city, and save a new line with the weather and timestamp.
Add the cities.txt file, as in the image below (of course, you can change which cities you would
like to get weather info for).
Now remove the code we currently have in main.py and replace it with the following.
import requests

API_KEY = "baaf201731c0cbc4af2c519cb578f907"
WS_URL = "http://api.weatherstack.com/current"

cities = []
with open("cities.txt") as f:
    for line in f:
        cities.append(line.strip())
print(cities)

for city in cities:
    parameters = {'access_key': API_KEY, 'query': city}
    response = requests.get(WS_URL, parameters)
    js = response.json()

    temperature = js['current']['temperature']
    date = js['location']['localtime']

    with open(f"{city}.txt", "w") as f:
        f.write(f"{date},{temperature}\n")
• Read the city names from our cities.txt file and put each city into a Python list.
• Loop through the cities and get the weather data for each one.
• Create a new file with the same name as each city and write the date and temperature
(separated by a comma) to each file.
In our previous examples we explicitly closed files using f.close(). In this example, we instead
open our files in a with block. This is a common idiom in Python and is usually how you will open
files. You can read more about this in the files section of the Python docs¹⁴.
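As a rough sketch of what the with block saves you from writing, the example below first spells out the "always close the file" guarantee by hand using try and finally, and then does the same thing with with. The file lives in a temporary directory and the line written to it is made-up sample data.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydata.txt")  # throwaway location

# Written out by hand: make sure close() runs even if write() fails.
f = open(path, "w")
try:
    f.write("2020-07-20 14:05,21\n")
finally:
    f.close()

# The same guarantee with a with block: the file is closed automatically
# when the indented block ends, whether or not an error occurred.
with open(path) as f:
    contents = f.read()

print(f.closed)
print(contents)
```

After the with block, f.closed is True even though the code never calls f.close() itself.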
If you run this code, you’ll see it creates one file for each city.
If you open up one of the files, you’ll see it contains the date and temperature that we fetched from
WeatherStack.
¹⁴https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
If you run the script multiple times, each file will still only contain one line of data: that from the
most recent run. This is because when we open a file in “write” mode ("w"), it overwrites it
with the new data. Because we want to create historical weather logs, we need to change the second
last line to use “append” mode instead ("a").
Change
    with open(f"{city}.txt", "w") as f:
to
    with open(f"{city}.txt", "a") as f:
and run the script again. If you open one of the city files again, you’ll see it has a new line instead
of the old data being overwritten. Newer data is appended to the end of the file. WeatherStack only
updates its data every 5 minutes or so, so you might see exact duplicate lines if you run the script
multiple times in quick succession.
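If the duplicate lines bother you, one possible approach is to compare each new record with the last line already in the file before appending. This is a sketch only: the helper name append_if_new is hypothetical, the records are made-up sample data, and the file is written to a temporary directory rather than your repl.

```python
import os
import tempfile


def append_if_new(path, line):
    """Append line to the file at path unless it repeats the file's last line."""
    last = None
    if os.path.exists(path):
        with open(path) as f:
            for last in f:  # after the loop, last holds the final line
                pass
    if last is not None and last.rstrip("\n") == line:
        return False
    with open(path, "a") as f:  # "a" creates the file if it is missing
        f.write(line + "\n")
    return True


path = os.path.join(tempfile.mkdtemp(), "London.txt")
print(append_if_new(path, "2020-07-20 14:05,21"))  # first record is written
print(append_if_new(path, "2020-07-20 14:05,21"))  # exact duplicate is skipped
print(append_if_new(path, "2020-07-20 14:10,22"))  # a new reading is written
```

Only the most recent line is checked, which is enough for runs made in quick succession.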
Once you’ve downloaded the .zip file you can extract it in your local file system and find all of the
data files which can now be opened with other programs as required.
Image 13: The created data files on our local file system
From the same menu, you can also choose upload file or upload folder to import files into your
repl. For example, if you cleaned the files using external software and then wanted your repl to start
appending new data to the cleaned versions, you could re-import them.
Repl.it will warn you about overwriting your existing files if you haven’t changed the names.
Where next?
That’s it for our weather reporting project. You learned how to work with files in Python and Repl.it,
including different modes (read, write, or append) in which files can be opened.
You also worked with an external library, requests, for fetching data over the internet. This module
is not part of Python’s standard library, and in the next article you’ll learn more about how to manage
external modules or dependencies.
Managing dependencies using Repl.it
Nearly all useful programs rely to some extent on pre-existing code in various forms. The existing
code that your code relies on is known as a dependency. You have already come across some
dependencies in previous tutorials: you used the math module to solve quadratic equations in
the first tutorial and you used the requests module to fetch weather data in the second tutorial.
In the first tutorial, you also wrote the solver.py module, and imported this into the main.py file.
We can think of dependencies as falling into three broad categories:
• Internal dependencies: other code that you or your organisation wrote and which you fully
control, e.g. the solver.py file.
• Standard dependencies: code that exists as part of the standard language libraries, e.g. the math
module.
• External dependencies: code that is written by third-party developers, e.g. the requests module.
In this tutorial, you’ll gain more experience with all three categories of dependencies. Specifically,
you’ll write an NLP (natural language processing) program to analyse sentences, using spaCy¹⁵, a
third-party dependency.
Dependency management is a hugely complicated area, and there is a large ecosystem of related
tools to help manage packaging and installing Python programs. We won’t be covering all of the
options and background, but you can read an overview of the different tools here¹⁶.
¹⁵https://spacy.io/
¹⁶https://modelpredict.com/python-dependency-management-tools
In order to use this library, you would first have to install it using a command similar to pip install
requests, and only then would the import statement run correctly.
Repl.it, by contrast, can often do the installation for you completely automatically, using the
Universal Package Manager¹⁷. The moment you run the import requests line of code, the package
manager will go find the correct package and install it, or in some cases Repl.it will even have pre-
installed the package. Either way, your code will “just work”.
This is super convenient, but sometimes you need more control. For example, you might need a
specific version of a package, or the universal package manager might not be able to automatically
install all of your dependencies. In these cases, you can use more advanced ways to install packages.
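When a specific version matters, it helps to check what is actually installed. The sketch below uses importlib.metadata from the Python standard library (available from Python 3.8). The helper name installed_version and the third, deliberately fake package name are assumptions for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ["requests", "beautifulsoup4", "no-such-package-for-this-demo"]:
    print(name, installed_version(name))
```

What gets printed depends on the environment: packages that are not installed show up as None.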
This will take you to a page showing an overview and summary of the selected package. You can
install it to your repl by using the + button, as shown below.
Once the package is installed, we can use it in our code. Run the example shown below to extract
the “Google Search” text from the main button on the homepage.
import requests
from bs4 import BeautifulSoup

r = requests.get("https://google.com").text
soup = BeautifulSoup(r, "html.parser")

# Print the title attribute of every <input> element that has one
print([x.get("title") for x in soup.find_all("input") if x.get("title")])
This code uses the requests library to scrape the HTML from google.com and then uses the
beautifulsoup4 library to get the title of the button off the page and print it to the console.
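The same "collect the title of each input element" idea can be tried offline. Note that the sketch below swaps in Python's built-in html.parser module instead of beautifulsoup4, and runs over a made-up HTML snippet rather than a downloaded page. The class name InputTitleCollector is hypothetical.

```python
from html.parser import HTMLParser


class InputTitleCollector(HTMLParser):
    """Collect the title attribute of every <input> tag that has one."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        if tag == "input":
            title = dict(attrs).get("title")
            if title:
                self.titles.append(title)


# A made-up page, standing in for the HTML that requests would download.
html = """
<form>
  <input name="q" title="Search">
  <input type="submit" value="Go">
  <input type="submit" title="Example Search">
</form>
"""

collector = InputTitleCollector()
collector.feed(html)
print(collector.titles)
```

The second input has no title attribute, so it is skipped, just as the if x.get("title") filter does in the beautifulsoup4 version.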
Because requests is one of the most commonly used Python libraries, Repl.it probably installed it
in a slightly different way from most packages. However, beautifulsoup4 is less common and this
will have been installed in the standard way using poetry¹⁸.
If you go back to the files tab, you’ll see two new files, poetry.lock and pyproject.toml, which were
created automatically by the installer. Take a look inside the pyproject.toml file.
Image 5: The pyproject.toml file lists all dependencies and their versions.
In this case, line 9 says that our project relies on the beautifulsoup4 package and needs at least
version 4.9.1. If we look at the beautifulsoup page on PyPI¹⁹, we’ll see that 4.9.1 is the latest stable
version at the time of writing. If this project is run in the future and there is a new version available,
it will automatically use the updated package.
Image 6: We can access the spaCy package via the package index
Select the version at the top and hit the + button to add this package to your application. Once this
is complete, head across to your main.py and enter the following code:
import spacy
print(spacy.__version__)
This should output the version of spaCy that we are using, which means that spaCy has been added
as a dependency correctly.
If you take a look at your pyproject.toml file now, you should see that it has specified spaCy as a
dependency.
[tool.poetry]
name = "spacy-example"
version = "0.1.0"
description = ""
authors = ["Your Name <[email protected]>"]

[tool.poetry.dependencies]
python = "^3.8"
spacy = "^2.3.2"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
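The ^ in front of each version is Poetry's caret constraint. It allows any newer version that leaves the left-most non-zero part of the version number unchanged, so "^2.3.2" accepts 2.3.2 up to, but not including, 3.0.0. The sketch below works out that range for plain numeric versions. The function name caret_range is hypothetical, and Poetry's own parser handles more cases, such as pre-release tags.

```python
def caret_range(constraint):
    """Return (lowest allowed, first excluded) for a constraint like "^2.3.2"."""
    parts = [int(p) for p in constraint.lstrip("^").split(".")]
    upper = list(parts)
    for i, value in enumerate(parts):
        # Bump the first non-zero part (or the last part if all are zero)
        # and reset everything after it to zero.
        if value != 0 or i == len(parts) - 1:
            upper[i] = value + 1
            upper[i + 1:] = [0] * (len(parts) - i - 1)
            break
    return tuple(parts), tuple(upper)


print(caret_range("^2.3.2"))
print(caret_range("^3.8"))
print(caret_range("^0.2.3"))
```

Each result is a pair: the lowest version allowed and the first version that is too new.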
An important component of spaCy is a set of pretrained statistical models that support NLP. These
do not come with spaCy by default, nor are they indexed on PyPI. One of these models is
en_core_web_sm.
In your main.py file, replace your current code with the following:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog.")
for token in doc:
    print(token.text)
This code should simply break our short sentence into tokens (words), and print each one out.
However, at this point, if you run your code you will get an error, as Python cannot find the
en_core_web_sm model:
We will now explicitly tell our application how to access this dependency. To do this, we need to
find where the model is stored online.
First, we need to find the spaCy documentation for this model. This can be accessed here²¹.
²¹https://spacy.io/models/en
Selecting the RELEASE DETAILS button will guide us to where the model is stored online, on GitHub²².
GitHub is a very common place to store code and related components online.
The GitHub page also lets us know what version of spaCy is needed to make sure the model runs
correctly.
²²https://github.com/explosion/spacy-models/releases/tag/en_core_web_sm-2.3.1
Image 9: The GitHub page provides information about the requirements and features of the model
Here we see that the spaCy version should be greater than or equal to 2.3.0, but less than 2.4.0. We
should make a note of this for later, so we can check that we have pinned an appropriate spaCy version.
If you scroll right to the bottom of the page, you will see an “Assets” section, and under this you will
see the same Package icon we used in Repl.it with “en_core_web_sm-2.3.1.tar.gz” next to it. This is
what we have been looking for: the file containing the model.
Image 10: The model can be found under the “Assets” heading
Right-click on this file and select copy link address. We will need this shortly, as this is the URL
of the file.
We now need to modify our pyproject.toml file in Repl.it. Open this file and add the following
section to it:
[tool.poetry.dependencies.en_core_web_sm]
url = "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz"
The url should be the one that you copied from GitHub in the previous step. Your whole pyproject.toml
file should now look like the one below.
Image 11: Modifying pyproject.toml to explicitly point to the model allows our application to find it and use it
At this point we should also check that we are using an appropriate version of spaCy. We are
using version 2.3.2, which is in the allowed range for the model release (>=2.3.0, <2.4.0), so we do
not need to modify this.
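The same check can be written in a couple of lines of Python by comparing version numbers as tuples of integers. This is a sketch for plain numeric versions such as 2.3.2 only, and the helper names parse_version and in_range are hypothetical.

```python
def parse_version(version):
    """Turn "2.3.2" into (2, 3, 2) so versions compare part by part."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, minimum, below):
    """True if minimum <= version < below."""
    return parse_version(minimum) <= parse_version(version) < parse_version(below)


print(in_range("2.3.2", "2.3.0", "2.4.0"))  # our spaCy version: inside the range
print(in_range("2.4.0", "2.3.0", "2.4.0"))  # the upper bound itself is excluded
```

Comparing tuples rather than strings matters: as strings, "2.10.0" would sort before "2.3.0".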
Finally, hit the run button. This will cause your configuration files to be updated and then will run
your application. If everything has gone correctly, you should see the following in the output pane
once it completes.
Image 12: spaCy and the necessary components are all found as dependencies, so the application runs successfully
import spacy
import requests
from bs4 import BeautifulSoup
from collections import Counter

nlp = spacy.load("en_core_web_sm")
response = requests.get("http://lite.cnn.com/en")
soup = BeautifulSoup(response.text, "html.parser")

# Remove elements whose text is never shown on the page. See
# https://stackoverflow.com/questions/1936466/beautifulsoup-grab-visible-webpage-text
for s in soup(['style', 'script', '[document]', 'head', 'title']):
    s.extract()
text = soup.get_text()
doc = nlp(text)

# Keep only the entities that spaCy labels as people
names = []
for ent in doc.ents:
    if ent.label_ == "PERSON":
        names.append(ent.lemma_)

print("These people are in the headlines today")
print(Counter(names).most_common(10))
This pulls the HTML from the lite version of CNN, parses it, removes non-visible content
such as CSS styles and JavaScript, and processes the remaining text using spaCy.
Then we loop through all of the named entities²⁴ that spaCy detects as part of its standard parse,
collect any that are labelled as people, and print the ten most common names.
²³https://lite.cnn.com
²⁴https://en.wikipedia.org/wiki/Named_entity
If you run this code, you should see a list of people making headlines today. At the time of writing,
John Lewis is mentioned in the most headlines. (Note that named entity recognition is a difficult
task and here spaCy considers the possessive form John Lewis' to be a separate entity. We can see
that John Lewis was mentioned a total of 7 times though.)
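One possible way to fold possessive forms back into the plain name before counting is sketched below. The list of names is made up to stand in for the PERSON entities that spaCy returns, and the helper strip_possessive is a hypothetical, deliberately simple rule rather than anything spaCy provides.

```python
from collections import Counter

# Made-up entity strings, standing in for the names collected from doc.ents.
names = ["John Lewis", "John Lewis'", "Jane Doe", "John Lewis", "Jane Doe's", "John Lewis"]


def strip_possessive(name):
    """Drop a trailing 's or ' so possessive forms count with the plain name."""
    for suffix in ("'s", "’s", "'", "’"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


counts = Counter(strip_possessive(name) for name in names)
print(counts.most_common(10))
```

With the possessives folded in, each person appears once in the results, with all of their mentions counted together.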
Where next?
spaCy is a very powerful NLP library and it can do far more than simply extract people’s names. See
what other interesting insights you can automatically extract from today’s news.
Now you can use the Repl.it IDE, write programs that use files, and install third-party dependencies.
Next up, we’ll be taking a look at doing data science with Repl.it by visualising data using matplotlib
and seaborn.
Data science with Repl.it: Plots and graphs
So far, all the programs we have looked at have been entirely text based. They have taken text input
and produced text output, on the console or saved to files.
While text is flexible and powerful, sometimes a picture is worth a thousand words. Especially
when analysing data, you’ll often want to produce plots and graphs. There are three main ways of
achieving this using Repl.it.
1. Creating a front-end only project and using only JavaScript, HTML and CSS.
2. Creating a full web application with something like Flask²⁵, analysing the data in Python and
passing the results to a front end to be visualised.
3. Using Python code only, creating windows using X²⁶ and rendering the plots in there.
Option 1 is great if you’re OK with your users having access to all of your data, you like doing data
manipulation in JavaScript, and your data set is small enough to load into a web browser. Option 2
is often the most powerful, but can be overkill if you just want a few basic plots.
Here, we’ll demonstrate how to do option 3, using Python and Matplotlib²⁷.
There are many traditions in the Python data science world about how to import libraries. Many of
the libraries have long names and get imported as easier-to-type shortcuts. You’ll see that nearly all
examples import pyplot as the shorter plt before using it, as we do above. We can then generate
a basic line plot by passing two arrays to plt.plot() for X and Y values. In this example, the first
point that we plot is (1,6) (the first value from each array). We then add all of the plotted points
joined into a line graph.
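A minimal sketch of the kind of script this paragraph describes. Only the first point, (1, 6), comes from the text; the remaining values in x_values and y_values are made up for illustration, and pyplot is imported inside the hypothetical show_line_plot function only so that the data can be inspected without opening a window.

```python
# X and Y values for a basic line plot. Only the first pair, (1, 6), is taken
# from the text; the other numbers are arbitrary illustration values.
x_values = [1, 2, 3, 4, 5, 6]
y_values = [6, 3, 6, 1, 2, 3]

# pair the arrays up to see the individual points that get joined by lines
points = list(zip(x_values, y_values))


def show_line_plot():
    # conventional short alias for pyplot
    from matplotlib import pyplot as plt

    plt.plot(x_values, y_values)
    plt.show()  # on Repl.it, this is the call that starts the X server
```

Calling show_line_plot() from a repl should bring up the graphical window with the six points joined into a line.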
Repl.it knows that it needs an X server to display this plot (triggered when you call plt.show()),
so after running this code you’ll see “Starting X” in the main output console and a new graphical
window will appear.
Image 1: We can plot a basic line plot by passing in the X and Y values
The X server is very limited compared to a full operating system GUI. Beneath the plot, you’ll see
some controls to pan and zoom around the image, but if you try to use them you’ll see that the
experience is not that smooth.
Line plots are cool, but we can do more. Let’s plot a real data set.
²⁸https://www.gislounge.com/what-is-gis/
²⁹https://simplemaps.com/data/us-cities
If you run this, you’ll notice it takes a little bit longer than the six point plot we created before, as it
now has to plot nearly 30 000 data points. Once it’s done, you should see something similar to the
following (though, as the colours were chosen randomly, yours might look different).
You’ll also notice that while it’s recognisable as the US, the proportions are not right. Mapping a 3D
sphere to a 2D plane is surprisingly difficult and there are many different ways of doing it.
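One way to see why the proportions look off, as a rough sketch: plotting longitude and latitude directly treats every degree as the same distance, but a degree of longitude covers less ground the further you move from the equator. The function name and the rounded figure of 111.32 km per degree at the equator are assumptions made for this illustration, which also assumes a perfectly spherical Earth.

```python
import math

# approximate length of one degree of longitude at the equator, in kilometres
# (an equatorial circumference of roughly 40 075 km divided by 360)
KM_PER_DEGREE_AT_EQUATOR = 111.32


def km_per_degree_longitude(latitude_degrees):
    """Approximate east-west distance covered by one degree of longitude."""
    return KM_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude_degrees))


# compare a southern and a northern latitude with the equator
for latitude in (0, 25, 48):
    print(latitude, round(km_per_degree_longitude(latitude), 1))
```

A plain X/Y scatter plot ignores this shrinking, which is one reason the northern part of the map looks stretched sideways.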
³⁰https://seaborn.pydata.org/
³¹https://pandas.pydata.org/
³²https://numpy.org/
import requests
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np

# GDP and life expectancy per country
data_url = "https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/4_ThreeNum.csv"

r = requests.get(data_url)

# save a local copy so that it shows up in the files pane
with open("gdp-life.txt", "w") as f:
    f.write(r.text)

df = pd.read_csv("gdp-life.txt")
print(df.head())

print("___")
print("The correlation is: ", np.corrcoef(df["gdpPercap"], df["lifeExp"])[0, 1])
print("___")

# keyword arguments: recent seaborn versions no longer accept x, y and data positionally
sns.lmplot(x="gdpPercap", y="lifeExp", data=df).set_axis_labels(
    "GDP per capita", "Life expectancy"
)

plt.title("People live longer in richer countries")
plt.tight_layout()
plt.show()
If you run this, you’ll see it plots each country in a similar way to our previous scatter plot, but also
adds a line showing the correlation.
In the output pane below you can also see that the correlation coefficient between the two variables
is 0.67 which is a fairly strong positive correlation.
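For readers curious about what np.corrcoef computes, here is a minimal pure-Python sketch of the Pearson correlation coefficient. The function name and the toy numbers are made up for illustration; they are not the GDP data used in this tutorial.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # how the two variables vary together
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # how much each variable varies on its own
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# toy data: y tends to rise with x, but not perfectly
print(pearson_correlation([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

A value near 1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no linear relationship.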
Image 3: Using seaborn to create a scatter plot with a best fit line to see correlation
Data science and data visualisation are huge topics, and there are dozens of Python libraries that
can be used to plot data. For a good overview of all of them and their strengths and weaknesses, you
should watch Jake Vanderplas’s talk³³.
plt.savefig("GDPlife.png")
Rerun the code. Instead of seeing the plot appear in the right-hand pane, you’ll see a new file in the
files pane. Clicking on it will show you the PNG file in the editing pane.
³³https://www.youtube.com/watch?v=FytuB8nFHPQ
Where next?
You’ve learned how to make some basic plots using Python and Repl.it. There are millions of freely
available data sets on the internet, waiting for people to explore them. You can find many of these
using Google’s Dataset Search³⁴ service. Pick a topic that you’re interested in and try to find out
more about it through data visualisations.
Next up, we’ll explore the multiplayer functionality of Repl.it in more detail so that you can code
collaboratively with friends or colleagues.
³⁴https://datasetsearch.research.google.com/
Multiplayer: Pair programming with
Repl.it
Software developers have a reputation for being loners, but they don’t always code by themselves.
Pair programming³⁵ is used by many programmers to
• Write bug-free code more efficiently (for example, one person might watch for mistakes while
the other codes).
• Share knowledge (a less-experienced programmer might ‘follow along’ while a more experi-
enced programmer develops something, learning from each step of the process).
• Assess expertise (if you’re considering a new hire, watching them code first can be helpful to
assess how good a coder they are, but coding with them allows you to also see their experience
in teamwork and communication).
Pair programming intuitively sounds like it would be inefficient: after all, the two developers could
instead be working on different projects simultaneously. But on top of catching more bugs, two
people working together often display more creativity as well. You might think of an idea based on
something your buddy said that wouldn’t have come to you alone.
If you have a friend handy, work through this tutorial together to gain real pair programming
experience. If you’re alone, fire up two browsers (or use incognito mode) to sign into two Repl.it
accounts simultaneously.
Now from your own fork, press the share button, as shown below.
Copy the invite link, and note that this is different from the normal link to your repl. If you copy
the link from your URL bar, you can give people read access to your repl, but by copying the invite
link, you give them write access.
If you knew your friend’s Repl.it username or the email associated with the Repl.it account, you
could instead use the Invite box at the top. Share the link with your friend and wait for them to
join.
As soon as they do, you will see that a chat box pops up in the bottom right corner. Their profile
picture or letter will be at the top of the chat box, so you can always know who is currently active.
Remember, you forked the repl in a previous step, so you are the owner of this fork and the “host” of
this multiplayer session. If you invite multiple people and then leave, they can continue collaborating
without you, but they won’t be able to rejoin if the host is no longer in the session.
You can use the team chat feature, as shown below.
In the previous tutorial, we looked at GDP by country. Imagine that you are now interested in how
this is broken down by continent too. You still want to plot each country as a separate data point,
but you want them in different colours, one for each continent. You’re not sure how to do this, so
you ask for help.
You can see a typing indicator to help decide if you should wait around for a reply or go make coffee.
Your friend tells you about the hue argument and points out that you already have this data in the
continent column in your data frame. You add hue="continent" to the graph and re-run it, but it
doesn’t quite work out how you expect.
Image 6: Changing the plot from grouping data by country to grouping by continent
Your friend suggests maybe a scatter plot without the correlation line might look better, but when
you try that it results in an error. The error message is hidden by the chat box, so you move it to the
other side of the screen.
Image 7: You can move the chat box to the left of the IDE to see errors better
This is getting a bit more complicated than you bargained for. Sometimes showing is easier than
telling, so your friend starts editing the code directly instead of telling you how to do so using chat.
The code
sns.scatterplot(
    x="gdpPercap", y="lifeExp", data=df, hue="continent"
).set_axis_labels("GDP per capita", "Life expectancy")
changes to
ax = sns.scatterplot(
    x="gdpPercap", y="lifeExp", data=df, hue="continent"
)
ax.set(xlabel="GDP per capita", ylabel="Life expectancy")
Image 8: In our new plot, we can see that African countries tend to have low life expectancy and low GDP, but the
correlation looks weaker for the other continents
Where next?
Getting help on a single file in a program is only one use for multiplayer, but there are many
scenarios where it can be useful to see your teammates’ changes in real time. For example, if you’re a
back-end developer you could work closely with a front-end developer, ironing out any issues with
data communication between the back- and front-end in real time, instead of waiting for multiple
iterations of several days.
That brings us to the end of part 1 of this series and you should now be familiar with all of the basic
features of Repl.it.
In part 2, we’ll cover more advanced features, such as running projects from GitHub, storing secrets
securely, and productivity hacks.
Repl.it and GitHub: Using and
contributing to open-source projects
You’ve probably heard of GitHub³⁸, which hosts millions of coding projects that you can use or learn
from.
In this tutorial, we’ll see how to:
• Import open-source projects from GitHub to Repl.it so that we can run them or modify them
• Integrate your own GitHub account with your Repl.it account so that you can work on your
private projects
• Push changes back to open-source projects as pull requests.
We’ll use a basic Flask hello world app for the demonstrations. You can use this same project to
follow along, or pick any other project on GitHub that interests you.
Press the green Import from GitHub button and you’ll see Repl.it clone the repository and turn it into
a repl. In all of our previous projects, we used the main.py file that Repl.it automatically creates for
all new Python projects, and which it runs automatically when you press the run button. Note how
in this GitHub project, we have no main.py file, and our code is instead in mydemoapp.py. Therefore,
Repl.it will need some help from you to define how to run the project. This is configured through
another special file named .replit. Because there was no main.py file, Repl.it automatically created
this file and will prompt you to configure it.
Image 2: Adding a .replit file to indicate how the project should be run.
Select the language (Python) from the first dropdown and type python mydemoapp.py in the
“configure the run button” input. Every time you press the run button, Repl.it will execute the
command given here. If you prefer, you can also edit the .replit file directly. If you click on it,
you’ll see it now contains the following configuration, which matches what we provided through
the GUI panel.
language = "python3"
run = "python mydemoapp.py"
If you hit the run button, you should see the app start. As you can see, the web application is
very basic: all it can do is display a welcome message. If the configuration panel doesn’t pop up
automatically, you can manually create a file called .replit and add the configuration above to get
the same result.
Some GitHub projects are very large and complicated, and you might not be able to run everything
you need directly on Repl.it, but in many cases it just works. Open-source projects can be read and
run by anyone, but still have restrictions on who can push changes to them. Next we’ll improve this
project and request that the owner merges our changes into the original.
Image 4: Viewing details about the repository in the version control tab.
If you select this version control tab from the menu on the left, you’ll see a summary of the linked
repository. Note that it’s already figured out what changes we’ve made, and it shows that the .replit
file is new. It would be nice for other people who use this repository with GitHub to have the file
automatically, so we might want to push the changes we made back to GitHub.
Note that the owner of the repository is ritza-co though, so you won’t have write permissions for
this repository. If you press the commit & push button, you’ll see an error as shown below.
You should be taken to a new page in GitHub that looks very similar to the old one but which is
owned by your own GitHub username. My GitHub username is sixhobbits so the new URL for me
is https://github.com/sixhobbits/flask-hello-world (but yours will be different).
Now, instead of cloning the original project into Repl.it, create a new repl and import this fork of
the project instead. Instead of going through the import UI again, you can also just create and load
the relevant import URL. These URLs are in the format https://repl.it/github/<githubproject>
so in my case I open https://repl.it/github/sixhobbits/flask-hello-world in my browser (you need to
substitute your own GitHub username for this to work).
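As a small illustration of the URL format, the following sketch derives the import URL from a GitHub repository URL. The repl_import_url helper is hypothetical and simply reuses the path portion of the GitHub address.

```python
from urllib.parse import urlparse


def repl_import_url(github_url):
    """Turn a GitHub repository URL into the matching Repl.it import URL."""
    # the path looks like "/<username>/<repository>"
    path = urlparse(github_url).path.strip("/")
    return "https://repl.it/github/" + path


# prints https://repl.it/github/sixhobbits/flask-hello-world
print(repl_import_url("https://github.com/sixhobbits/flask-hello-world"))
```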
Configure the .replit file again and open the version control tab, as before. Under “what did you
change” enter “add .replit file” or a similar message to describe what contributions you’re making,
and press commit and push.
⁴¹https://github.com
You’ll see the error again and be presented with the option to connect your Repl.it account to GitHub
to prove that you authorise Repl.it to make these changes to GitHub on your behalf.
You can give Repl.it access to all of your repositories (useful if you want to use this integration a lot),
but by default it will only get permission for the specific repository that we’re working with.
Press the green approve button and you’ll be directed back to Repl.it. Press the commit & push button
again on Repl.it and this time everything should work without any errors.
Navigate back to your fork of the GitHub project, and you should see that the changes are reflected
in GitHub too.
As you can see, the new .replit file is visible and GitHub prompts you to make a pull request back
into the original repository. Press Pull request, create pull request, add a comment explaining
why your changes should be merged into the original repository, and click Create pull request
again.
Image 9: Creating a pull request from GitHub to merge our changes back into the original repository.
The owner of the repository will get a notification about your proposal and can choose to add your
changes or reject them (in this case, don’t be too hopeful about your changes being accepted, as the
missing .replit file is needed to follow along with the earlier steps of this tutorial).
Where next?
Cloning repositories and being able to immediately run them is useful in many scenarios, from just
wanting to try out a cool project that you found to running production workflows.
Open-source software only exists because of people who build, maintain, and improve it. There’s
no global committee that decides who gets to be an open-source software developer and you can be
one too. Find a project that you like, look at the “Issues” tab on GitHub to see what problems exist,
and try to fix one. Many issues on GitHub are tagged with “Good first issue” to help direct newer
developers to places where they can get started.
In the next tutorial, we’ll do something a bit different and build a 2D game using PyGame.
Building a game with PyGame and
Repl.it
So far, we’ve mainly seen how to write text-based programs, or those with a basic web front end.
In this tutorial, we’ll instead build a 2D game using PyGame. You’ll use animated sprites and learn
how to:
The basic premise of the game is as follows. You’re a juggler, learning to juggle. Balls will fall down
from the top of the screen, and you’ll need to click them to ‘throw’ them up again. After several
successful throws without dropping any balls, more balls will be added to make the game harder.
You’ll see “Python3 with PyGame” displayed in the default console and a separate pane in the Repl.it
IDE where you will be able to see and interact with the game you will create.
The first thing we need is a so-called “sprite”, which is a basic image file that we will use in our
game. Download the tennis ball file available here⁴⁴ and save it to your local machine.
Now upload it to your repl using the upload file button and you should be able to see a preview
of the image by clicking on it in the files pane.
⁴⁴https://raw.githubusercontent.com/ritza-co/public-images/master/small_tennis.png
import pygame

WIDTH = 800
HEIGHT = 600
BACKGROUND = (0, 0, 0)  # black, as an RGB tuple

class Ball:
    def __init__(self):
        self.image = pygame.image.load("small_tennis.png")
        # the rect stores where the image will be drawn (top left corner by default)
        self.rect = self.image.get_rect()

def main():
    pygame.init()
    screen = pygame.display.set_mode((WIDTH, HEIGHT))
    clock = pygame.time.Clock()
    ball = Ball()

    # game loop: redraw the whole screen on every pass
    while True:
        screen.fill(BACKGROUND)
        screen.blit(ball.image, ball.rect)
        pygame.display.flip()
        clock.tick(60)

if __name__ == "__main__":
    main()
This code looks a bit more complicated than it needs to be because in addition to drawing the ball
to the screen, it also sets up a game loop. While basic 2D games appear to move objects around
the screen, they usually actually simulate this effect by redrawing the entire screen many times per
second. To account for this we need to run our logic in a while True: loop.
We start by importing PyGame and setting up some global variables: the size of our screen and the
background color (black). We then define our Ball, setting up an object that knows where to find
the image for the ball and how to get the default coordinates of where the image should be drawn.
We then set up PyGame by calling init() and starting the screen as well as a clock. The clock
is necessary because each loop might take a different amount of time, based on how much logic
needs to run to calculate the new screen. PyGame has built-in logic to calculate how much time
elapses between calls to clock.tick() to draw frames faster or slower as necessary to keep the
game experience smooth.
We start the game loop and call blit on our ball. Blitting⁴⁵ refers to moving all of the pixels from
our sprite file (the tennis ball) to our game environment. The flip() function updates our screen
and the tick(60) call means that our game will redraw the screen around 60 times per second.
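To make the idea of a frame-rate cap concrete, here is a rough sketch using plain Python. It illustrates the principle only and is not how PyGame implements clock.tick(); the function name seconds_to_wait is hypothetical.

```python
def seconds_to_wait(frame_seconds, target_fps=60):
    """How long to pause after a frame so that frames are spaced 1/target_fps apart.

    frame_seconds is how long the drawing and game logic took for this frame.
    """
    frame_budget = 1 / target_fps
    # a slow frame has already used up its budget, so don't wait at all
    return max(0.0, frame_budget - frame_seconds)


# a quick frame leaves time to spare; a slow one doesn't
print(seconds_to_wait(0.004))
print(seconds_to_wait(0.050))
```

Pausing for the remaining time after each frame keeps fast frames from running ahead, which is the effect that tick(60) has on the game loop.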
If you run this code, you should see the ball pop up in the top right pane, as shown below.
⁴⁵https://en.wikipedia.org/wiki/Bit_blit
class Ball:
    def __init__(self):
        self.image = pygame.image.load("small_tennis.png")
        self.speed = [0, 1]  # [x, y] movement in pixels per frame
        self.rect = self.image.get_rect()

    def update(self):
        self.move()

    def move(self):
        self.rect = self.rect.move(self.speed)
Now modify your game loop to include a call to the new update() method. The loop should look as
follows.
    while True:
        screen.fill(BACKGROUND)
        screen.blit(ball.image, ball.rect)
        ball.update()
        pygame.display.flip()
        clock.tick(60)
The [0, 1] speed causes the ball’s Y coordinate to increase by 1 on each loop while its X
coordinate stays constant. This has the effect of making the ball drop slowly down the screen. Run
your code again to check that this works.
    while True:
        for event in pygame.event.get():
            if event.type == pygame.MOUSEBUTTONDOWN:
                if ball.rect.collidepoint(pygame.mouse.get_pos()):
                    ball.speed = [0, -1]
        screen.fill(BACKGROUND)
        screen.blit(ball.image, ball.rect)
        ball.update()
        pygame.display.flip()
        clock.tick(60)
With this code, we loop through all events and check for mouse click (MOUSEBUTTONDOWN) events. If we
find one, we check if the click happened on top of the ball (using collidepoint() which checks
for overlapping coordinates), and in this case we reverse the direction of the ball (still no x-axis or
horizontal movement, but we make the ball move negatively on the y-axis, which is up).
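The check behind collidepoint() can be sketched in plain Python. This is an illustration rather than PyGame’s own code, and point_in_rect is a hypothetical helper; like PyGame’s rectangles, it treats the left and top edges as inside and the right and bottom edges as outside. The ball size of 50 x 50 pixels is also just an example value.

```python
def point_in_rect(point, left, top, width, height):
    """Return True if the (x, y) point falls inside the rectangle."""
    x, y = point
    return left <= x < left + width and top <= y < top + height


# a 50 x 50 ball drawn with its top left corner at (100, 200)
print(point_in_rect((120, 230), 100, 200, 50, 50))  # a click on the ball
print(point_in_rect((10, 10), 100, 200, 50, 50))    # a click elsewhere
```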
If you run this code again, you should now be able to click on the ball (let it fall for a while first)
and see it change direction until it goes off the top of the screen.
    def update(self):
        if self.rect.top < 0:
            self.speed = [0, 1]
        self.move()
This checks to see if the top of the ball is above the top of the screen. If it is, we set the speed back
to [0, 1] (moving down).
⁴⁷https://i.ritzastatic.com/repl/codewithrepl/07-pygame/07-06-GIF-bounce-off-roof.gif
import pygame
import random

WIDTH = 800
HEIGHT = 600
BACKGROUND = (0, 0, 0)

class Ball:
    def __init__(self):
        self.image = pygame.image.load("small_tennis.png")
        # random horizontal speed, constant downward speed
        self.speed = [random.uniform(-4, 4), 2]
        self.rect = self.image.get_rect()

    def update(self):
        if self.rect.top < 0:
            # hit the ceiling: head back down at a new random angle
            self.speed[1] = -self.speed[1]
            self.speed[0] = random.uniform(-4, 4)
        elif self.rect.left < 0 or self.rect.right > WIDTH:
            # hit a side wall: reverse the horizontal direction
            self.speed[0] = -self.speed[0]
        self.move()

    def move(self):
        self.rect = self.rect.move(self.speed)

def main():
    clock = pygame.time.Clock()
    ball = Ball()
    pygame.init()
    screen = pygame.display.set_mode((WIDTH, HEIGHT))

    while True:
        for event in pygame.event.get():
            if event.type == pygame.MOUSEBUTTONDOWN:
                if ball.rect.collidepoint(pygame.mouse.get_pos()):
                    # a click throws the ball back up at a new random angle
                    ball.speed[0] = random.uniform(-4, 4)
                    ball.speed[1] = -2
        screen.fill(BACKGROUND)
        screen.blit(ball.image, ball.rect)
        ball.update()
        pygame.display.flip()
        clock.tick(60)

if __name__ == "__main__":
    main()
If you run your code again, you should be able to juggle the ball around by clicking on it and watch
it randomly bounce off the ceiling and walls.
        self.alive = True
In the main() function, directly before the while True: line, add the following code.
    ball1 = Ball()
    ball2 = Ball()
    ball3 = Ball()

    balls = [ball1, ball2, ball3]
Now remove the ball = Ball(), ball.update() and screen.blit(...) lines and replace them with
a loop that updates all of the balls and removes the dead ones (even though we haven’t written the
logic yet to stop the balls from ever being alive).
You’ll also need to account for multiple balls in the event detection loop. For each event, loop
through all of the balls and check if the mouse click collided with any of them.
At this point, the full main() function should look as follows.
def main():
    clock = pygame.time.Clock()
    pygame.init()
    screen = pygame.display.set_mode((WIDTH, HEIGHT))

    ball1 = Ball()
    ball2 = Ball()
    ball3 = Ball()

    balls = [ball1, ball2, ball3]

    while True:
        for event in pygame.event.get():
            if event.type == pygame.MOUSEBUTTONDOWN:
                for ball in balls:
                    if ball.rect.collidepoint(pygame.mouse.get_pos()):
                        ball.speed[0] = random.randrange(-4, 4)
                        ball.speed[1] = -2
                        break
        screen.fill(BACKGROUND)
        # iterate over a copy so that removing a dead ball doesn't skip the next one
        for ball in balls[:]:
            if ball.alive:
                screen.blit(ball.image, ball.rect)
            ball.update()
            if not ball.alive:
                balls.remove(ball)
        pygame.display.flip()
        clock.tick(60)
To kill balls when they fall through the floor, we can add another check to the update() function as
follows.
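The listing for this step is not reproduced here, so the following is only a sketch of what such a check might look like, assuming that a ball counts as dropped once the bottom of its rectangle passes HEIGHT. To keep it runnable without a game window, SimpleRect is a hypothetical stand-in for the PyGame rect, and only vertical movement is modelled.

```python
from dataclasses import dataclass

HEIGHT = 600


@dataclass
class SimpleRect:
    """Hypothetical stand-in for a PyGame rect: just a top edge and a height."""
    top: int
    height: int

    @property
    def bottom(self):
        return self.top + self.height


class Ball:
    def __init__(self, top=0):
        self.rect = SimpleRect(top=top, height=50)
        self.speed = [0, 2]
        self.alive = True

    def update(self):
        if self.rect.top < 0:
            # bounce off the ceiling
            self.speed[1] = -self.speed[1]
        elif self.rect.bottom > HEIGHT:
            # fell through the floor: mark the ball as dead
            self.alive = False
        self.move()

    def move(self):
        self.rect.top += self.speed[1]


# start near the floor and let the ball fall for a few frames
ball = Ball(top=540)
for _ in range(10):
    ball.update()
print(ball.alive)
```

In the game itself, the same elif branch would sit in the existing update() method alongside the ceiling and wall checks, so that the main loop can drop balls whose alive flag has been cleared.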
Run the code again and you should be able to juggle three balls. See how long you can keep them
in the air.
Now the game is to see how many balls you can juggle with. If it’s too easy, modify the speeds and
angles of the balls.
Where next?
You’ve learned how to make 2D games using PyGame. If you want to make more games but are
stuck for ideas, check out PyGame’s extensive collection of examples⁴⁹.
You could also extend the juggling game more. For example, make the balls accelerate as they fall,
or increase the speed of all balls over time.
⁴⁹https://www.pygame.org/docs/ref/examples.html
Staying safe: Keeping your passwords
and other secrets secure
While developing software fully in public has many benefits, it also means that we need to be
extra careful about leaking sensitive information. Because all of our repls are public by default,
we shouldn’t store passwords, access keys, personal information, or anything else sensitive in them.
Even if you’re coding offline or only in private repls, it’s good practice to keep your code separate
from any private information in any case.
In this tutorial, we’ll look at how to use the special .env file that Repl.it provides to set environment
variables⁵⁰. We can use these to store sensitive information and Repl.it will make sure that this file
isn’t included when others fork our repl.
This can be used to store all kinds of configuration, but it’s commonly used for passwords, API keys,
and database credentials.
SECRET_PASSWORD=ThisIsMySuperSecretP@ssword!!
With this file present, your scripts can load the variable SECRET_PASSWORD from the operating system
environment directly.
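As a minimal sketch of what loading the variable could look like in Python: the load_secret helper is hypothetical, and the setdefault line only stands in for the .env file so that the example also runs outside Repl.it.

```python
import os


def load_secret(name):
    """Read a required secret from the environment, failing loudly if it is missing."""
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# stand-in for the value that the .env file would normally provide
os.environ.setdefault("SECRET_PASSWORD", "example-only")

# print the length rather than the secret itself, to keep it out of the console
print(len(load_secret("SECRET_PASSWORD")))
```

Raising an error for a missing variable is a design choice: it surfaces a forgotten .env entry immediately instead of letting None travel further into the program.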
Unlike in Python, where x = 1 and x= 1 are the same, in .env files, spaces matter and you should
be careful to not add any extra ones.
⁵⁰https://en.wikipedia.org/wiki/Environment_variable
We have API_KEY defined near the top of main.py, and this is the value that we want to keep secret.
Let’s move it to a .env file instead.
Click on the add file icon and call your new file .env. Be sure to add the initial full stop and don’t
add any spaces.
⁵¹http://www.codewithrepl.it/02-managing-files-using-repl-it.html
⁵²https://repl.it/@GarethDwyer1/cwr-02-weather-report
Now remove the API_KEY variable from the main.py file and add it to the .env file, removing all
quotation marks (") and spaces.
Your .env file should look as follows (but use your own WeatherStack API key instead of the example
given here).
API_KEY=baaf201731c0cbc4af2c519cb578f907
Image 3: Checking that the .env file isn’t included in public versions of our repl.
import requests
import os

API_KEY = os.getenv("API_KEY")
The getenv function looks for an environment variable of a specific name. Now our code (and anyone
who sees it) only needs to know the name of the key that stores our private API key, instead of the
API key itself. You should be able to run your code again at this point to verify that new weather
entries are correctly added to the relevant files.
There are many other environment variables that make various parts of an operating system work
correctly. For example, you could also take a look at the LANG and PATH environment variables, which
will show you that Repl.it has its servers configured to use US English and the UTF-8 Unicode character
encoding, and has some default places where the system looks for executable programs.
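A short sketch for inspecting these two variables from Python. The search_path_directories helper is a made-up name; the values you see will depend on the machine the code runs on.

```python
import os


def search_path_directories():
    """Return the directories that the system searches for executable programs."""
    path = os.getenv("PATH", "")
    # entries are separated by ":" on Linux and macOS and by ";" on Windows
    return [entry for entry in path.split(os.pathsep) if entry]


print("LANG:", os.getenv("LANG"))
for directory in search_path_directories():
    print(directory)
```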
You should see a bunch of entries from each change you’ve made to this project. Click through them
and find the one where you deleted your API key. As you can see, the history viewer shows not only
which lines have been changed, but also what was there before.
Luckily history is not included when other people fork your repl so this is not a huge problem, but
it’s important to keep in mind where people might find your credentials.
In our case, the worst case scenario is that someone finds our WeatherStack API key and uses up the
quota, which is not the end of the world. A far more painful (and very common) scenario involves
real money. For example, a developer signs up for a free trial on an expensive service like AWS,
links a credit card, and then accidentally pushes the credentials for the service to GitHub or similar.
Even if they realise their mistake and delete these within seconds, the credentials themselves are still
available in the history of their repository. Hackers have bots that regularly look out for mistakes
like this and use the credentials to spin up thousands of servers (often to mine cryptocurrency or
join a botnet attack), potentially costing the poor developer thousands of dollars before they notice.
Rotating credentials
If there’s even a small chance that your API key has been exposed, it’s important to rotate it. This
involves creating a new key, ensuring the new key works with your service, and then disabling the
old one.
In the case of WeatherStack, there is no option to create a new key while keeping the old one active,
so we need to reset it and then copy the new key to our .env file (meaning that our app can’t function
between the time that we disable the old key and replace it with the new one).
Visit your WeatherStack account and press the reset button to get your new API key.
Where next?
There’s a lot more that you can do with .env files. In a later tutorial, we’ll use it to store database
credentials (which are more important to keep safe than a free API key).
You could also keep other private information in environment variables. For example, if you code a
hangman game, you could keep the word that people need to guess in there so that you can share
your code without spoiling the game.
An introduction to pytest and doing
test-driven development with Repl.it
In this tutorial we’ll introduce test-driven development and you’ll see how to use pytest⁵³ to ensure
that your code is working as expected.
pytest lets you specify inputs and expected outputs for your functions. It runs each input through
your function and validates that the output is correct. pytest is a Python library and works just like
any other Python library: you install it through your package manager and you can import it into
your Python code. Tests are written in Python too, so you’ll have code testing other code.
Test-driven development or TDD is the practice of writing tests before you write code. You can read
more about TDD and why it’s popular on Wikipedia⁵⁴.
Specifically you’ll:
• See how to structure your project to keep your tests separate but still have them refer to your
main code files
• Figure out the requirements for a function that can split a full name into first and last name
components
• Write tests for this function
• Write the actual function.
You want both the folders to be at the root level of your project.
Now add a file at the root level of the project called __init__.py. This is a special file that indicates
to Python that we want our project to be treated as a “module”: something that other files can refer
to by name and import pieces from. Also add an __init__.py file inside the utils folder and the
tests folder. These files will remain empty, but it’s important that they exist for our tests to run.
Their presence specifies that our main project should be treated as a module and that any code in
our utils and tests folders should be treated as submodules of the main one.
Finally, create the files where we’ll actually write code. Inside the utils folder create a file called
name_helper.py and inside the tests folder create one called test_name_helper.py. Your project
should now look as follows. Make sure that you have all the files and folders with exactly these
names, in the correct places.
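If you would rather check the structure from a script than by eye, a sketch like the following creates the same set of empty files. The create_layout function is a hypothetical helper, and main.py, which Repl.it creates automatically, is left out; on Repl.it you would normally just use the add file and add folder buttons.

```python
import tempfile
from pathlib import Path

# files described in the steps above, relative to the project root
PROJECT_FILES = [
    "__init__.py",
    "utils/__init__.py",
    "utils/name_helper.py",
    "tests/__init__.py",
    "tests/test_name_helper.py",
]


def create_layout(root):
    """Create the (empty) project files under root, including the two folders."""
    root = Path(root)
    for relative_path in PROJECT_FILES:
        file_path = root / relative_path
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.touch()
    return sorted(str(p.relative_to(root)) for p in root.rglob("*.py"))


# try it out in a throwaway directory
with tempfile.TemporaryDirectory() as scratch:
    for name in create_layout(scratch):
        print(name)
```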
def split_name(name):
    first_name, last_name = name.split()
    return [first_name, last_name]

print(split_name("John Smith"))
# prints ['John', 'Smith']
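A quick way to see the limits of this first attempt is to try names that don’t have exactly two parts. The sketch below repeats the function so that it is self-contained; the describe_split helper and the sample names are made up for illustration.

```python
def split_name(name):
    first_name, last_name = name.split()
    return [first_name, last_name]


def describe_split(name):
    """Show what split_name does with a name, including when it fails."""
    try:
        return split_name(name)
    except ValueError as error:
        # unpacking into exactly two variables fails for any other number of parts
        return f"failed: {error}"


print(describe_split("John Smith"))
print(describe_split("John Patrick Smith"))
print(describe_split("Madonna"))
```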
While this does indeed work in many cases, names are surprisingly complicated and it’s very
common to make mistakes when dealing with them as programmers, as discussed in this classic
article⁵⁵. It would be a huge project to try and deal with any name, but let’s imagine that you have
requirements to deal with the following kinds of names:
Specifically, you can assume that once you find a name starting with a lowercase letter, it signifies
the start of a last name, and that all other names starting with a capital letter are part of the first
and middle names. Middle names can be combined with first names.
Of course, this does not cover all possibilities, but it is a good starting point in terms of requirements.
Using TDD, we always write failing tests first. The idea is that we should write a test about how
some code should behave, check to make sure that it breaks in the way we expect (as the code isn’t
there). Only then do we write the actual code and check that the tests now pass.
⁵⁵https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
An introduction to pytest and doing test-driven development with Repl.it 87
Note that the namesplitter in the first line is taken from the name of your Repl.it project, which
defines the names of the parent module. If you called your project something else, you’ll need to
use that name in the import line. It’s important to not include special characters in the project name
(including a hyphen, so names like my-tdd-demo are out) or the import won’t work.
The assert keyword simply checks that a specific statement evaluates to True. In this case, we call
our function on the left-hand side and give the expected value on the right-hand side, and ask assert
to check if they are the same.
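As a small illustration of how assert behaves, the sketch below wraps it in a hypothetical helper called check (not part of pytest) so that both the passing and the failing case can be seen without stopping the program.

```python
def check(value, expected):
    """Return True if the assert passes, False if it raises AssertionError."""
    try:
        assert value == expected
    except AssertionError:
        return False
    return True


print(check("John Smith".split(), ["John", "Smith"]))  # the two sides are equal
print(check("John Smith".split(), ["John"]))           # the two sides differ
```

In a real test you would not catch the AssertionError yourself. pytest catches it for you and turns it into the failure report.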
This is our most basic case: we have two names and we simply split them on the single space. Of
course, we haven’t written the split_name function anywhere yet, so we expect this test to fail. Let’s
check.
Usually you would run your tests by typing py.test into your terminal, but using Repl.it things
work better if we import pytest into our code base and run it from there. This is because a) our
terminal is always already activated into a Python environment and b) caching gets updated when
we press the Run button, so invoking our tests from outside of this means that they could run on old
versions of our code, causing confusion.
Let’s run them from our main.py file for now as we aren’t using it for anything else yet. Add the
following to this file.
import pytest
pytest.main()
Press the Run button. pytest does automatic test discovery so you don’t need to tell it which tests
to run. It will look for files that start with test and for functions that start with test_ and assume
these are tests. (You can read more about exactly how test discovery works and can be configured
here⁵⁶.)
You should see some scary-looking red failures, as shown below. (pytest uses dividers such as ======
and ------ to format sections, and these can get messy if your output pane is too narrow. If things
look a bit wonky, try making it wider and rerunning.)
⁵⁶https://docs.pytest.org/en/reorganize-docs/new-docs/user/naming_conventions.html
If you read the output from the top down you’ll see a bunch of different things happened. First,
pytest ran test discovery and found one test. It ran this and it failed so you see the first red F above
the FAILURES section. That tells us exactly which line of the test failed and how. In this case, it was
an AttributeError as we tried to use split_name which was not defined. Let’s go fix that.
Head over to the utils/name_helper.py file and add the following code.
def split_name(name):
    first_name, last_name = name.split()
    return [first_name, last_name]
This is the very simple version we discussed earlier that can only handle two names, but it will solve
the name error and TDD is all about small increments. Press Run to re-run the tests and you should
see a far more friendly green output now, as below, indicating that all of our tests passed.
Before fixing our function to handle more complex cases, let’s first write the tests and check that
they fail. Go back to tests/test_name_helper.py and add the following four test functions beneath
the existing one.
Rerun the tests and you should see a lot more output now. If you scroll back up to the most recent
===== test session starts ===== section, it should look as follows.
In the top section, the .FFFF is shorthand for “five tests were run, the first one passed and the next
four failed” (a green dot indicates a pass and a red F indicates a failure). If you had more files with
tests in them, you would see a line like this per file, with one character of output per test.
The failures are described in detail after this, but they all amount to variations of the same problem.
Our code currently assumes that we will always get exactly two names, so it either has too many or
too few values after running split() on the test examples.
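The "too many or too few values" problem can be reproduced outside of pytest. The sketch below uses a hypothetical copy of the simple function, here called naive_split, and catches the ValueError that Python raises when the number of pieces does not match the two variables on the left.

```python
def naive_split(name):
    # Unpacking into exactly two variables only works for exactly two pieces.
    first_name, last_name = name.split()
    return [first_name, last_name]


for example in ["John Smith", "Smith", "John Patrick Smith", ""]:
    try:
        print(repr(example), "->", naive_split(example))
    except ValueError as err:
        print(repr(example), "-> ValueError:", err)
```

Only the first example succeeds. The others fail during unpacking, which is the same error that the four new tests report.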
def split_name(name):
    names = name.split(" ")

    if not name:
        return ["", ""]

    if len(names) == 1:
        return ["", name]

    if len(names) == 2:
        firstname, lastname = name.split(" ")
        return [firstname, lastname]
This should handle the case of zero, one, or two names. Let’s run our tests again to see if we’ve made
progress before we handle the more difficult cases. You should get a lot less output now and three
green dots, as shown below.
The rest of the output indicates that it’s the middle names and surname prefix examples that are
still tripping up our function, so let’s add the code we need to fix those. Another important aspect
of TDD is keeping your functions as small as possible so that they are easier to understand, test, and
reuse, so let’s write a second function to handle the three or more names cases.
Add the new function called split_name_three_plus() and add an else clause to the existing
split_name function where you call this new function. The entire file should now look as follows.
def split_name_three_plus(names):
    first_names = []
    last_names = []

    for i, name in enumerate(names):
        if i == len(names) - 1:
            last_names.append(name)
        elif name[0].islower():
            last_names.extend(names[i:])
            break
        else:
            first_names.append(name)
    first_name = " ".join(first_names)
    last_name = " ".join(last_names)
    return [first_name, last_name]

def split_name(name):
    names = name.split(" ")

    if not name:
        return ["", ""]

    if len(names) == 1:
        return ["", name]

    if len(names) == 2:
        firstname, lastname = name.split(" ")
        return [firstname, lastname]
    else:
        return split_name_three_plus(names)
The new function works by always appending names to the first_names list until it gets to the last
name, or until it encounters a name that starts with a lowercase letter, at which point it adds all of
the remaining names to last_names list. If you run the tests again, they should all pass now.
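For reference, here is a self-contained sketch that puts the finished functions next to four tests of the kind described above. The test names and example names are assumptions made for illustration, and in the real project the tests would live in tests/test_name_helper.py and import the function rather than sit in the same file.

```python
def split_name_three_plus(names):
    first_names = []
    last_names = []
    for i, name in enumerate(names):
        if i == len(names) - 1:
            last_names.append(name)       # the final name is always a last name
        elif name[0].islower():
            last_names.extend(names[i:])  # lowercase start: rest is the last name
            break
        else:
            first_names.append(name)
    return [" ".join(first_names), " ".join(last_names)]


def split_name(name):
    names = name.split(" ")
    if not name:
        return ["", ""]
    if len(names) == 1:
        return ["", name]
    if len(names) == 2:
        return [names[0], names[1]]
    return split_name_three_plus(names)


# Hypothetical tests, one per requirement.
def test_empty_name():
    assert split_name("") == ["", ""]

def test_single_name():
    assert split_name("Smith") == ["", "Smith"]

def test_middle_names():
    assert split_name("John Patrick Thomas Smith") == ["John Patrick Thomas", "Smith"]

def test_surname_prefixes():
    assert split_name("Pamela van der Berg") == ["Pamela", "van der Berg"]


for test in (test_empty_name, test_single_name, test_middle_names, test_surname_prefixes):
    test()
```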
The tests were already helpful in making sure that we understood the problem and that our function
worked for specific examples. If we had made any off-by-one mistakes in our code that deals with
three or more names, our tests would have caught them. If we need to refactor or change our code in
future, we can also use our tests to make sure that our new code doesn’t introduce any regressions
(where fixing problems causes code to break on other examples that worked before the fix.)
If you run this, it will prompt the user for their name and then display their first and last name.
Because you’re using the main.py file now, you can also invoke pytest directly from the output
console on the right by typing import pytest; pytest.main(). Note that updates to your code are
only properly applied when you press the Run button though, so make sure to run your code between
changes before running the tests.
Image 8: Triggering a new error and invoking pytest from the output pane.
Where next
You’ve learned to do TDD in this project. It’s a popular style of programming, but it’s not for
everyone. Even if you decide not to use TDD, having tests is still very useful and it’s not uncommon
for large projects to have thousands or millions of tests.
Take a look at the big list of naughty strings⁵⁷ for a project that collects inputs that often cause
software to break. You could also read How SQLite Is Tested⁵⁸ which explains how SQLite, a popular
lightweight database, has 150 thousand lines of code and nearly 100 million(!) lines of tests.
In the next tutorial, we’ll show you how to become a Repl.it poweruser by taking advantage of the
productivity features it offers.
⁵⁷https://github.com/minimaxir/big-list-of-naughty-strings
⁵⁸https://www.sqlite.org/testing.html
Productivity hacks
The images in this chapter are mostly .gif files; click here⁵⁹ to access the web version of this chapter.
After coding for a while, you may find that there are some repetitive things that take up unnecessary
time. For example, searching for and updating a variable name can seem laborious. Luckily, Repl.it
has some built-in productivity tools that we’ll take a look at in this tutorial.
Specifically, you’ll see how to:
• Make simultaneous changes in several parts of your file using multiple cursors
• Use keyboard shortcuts to quickly carry out tasks without the delay of reaching for your mouse
• Switch to Vim or Emacs keybindings for full mouseless control.
Similarly to learning to touch type, there is often a steep learning curve when you start to use
advanced code editing features. They might even substantially slow you down at first, but once you
master them you’ll soar past the limits of what you could achieve without these aids.
The keyboard shortcut indicated to the right of each option shows how to activate that option directly
without opening up the global command palette, but once it’s open you can type in a part of any of
the options to activate that option. For example, in our weather project app, I can type Cmd+K and
then type fi (start of find) and press Enter and then type Lo (start of London.txt) and press Enter
again to quickly open the weather log for London.
Of course, with only six files it might be faster to reach for my mouse, but as the find searches
through all files in all directories this method can be significantly faster for larger projects.
Opening up the multiplayer, version control, and settings panes using this method is also faster, once
the habit is ingrained, than moving the mouse to the small icons on the left bar. And while
pressing Ctrl+Enter or Cmd+Enter to run your code is faster than choosing “Run” from this global
command palette, Ctrl+K is only a single shortcut to remember and it will remind you of any other
shortcuts you can’t recall.
Let’s take a look at how these work by editing the PyGame juggling project⁶⁰ that we covered in a
previous tutorial⁶¹.
Instead of carrying out the suggested operations as you usually would, use Repl.it’s productivity
features instead.
ball1 = Ball()
ball2 = Ball()
ball3 = Ball()
Instead of typing out all three lines, you can type out the first one, leave your cursor position on that
line, and press Shift+Alt+down (shift+option+down on MacOS) twice. This will create two copies of
the line, directly below the original one, and then you can simply change the number in the variable
to account for the second two balls.
⁶⁰https://repl.it/@GarethDwyer1/cwr-07-juggling-with-pygame
⁶¹http://www.codewithrepl.it/07-building-a-game-with-pygame.html
As an example, below you can see how we might use this to first delete one of our elif blocks by
doing two “delete line” operations. We then change our random speed to be constant by using a
“delete to end of line” operation from the = sign and then typing our constant.
Instead you can use Ctrl+] (cmd+] on MacOS) to indent the current line and Ctrl+[ (cmd+[ on MacOS)
to dedent it, no matter where your cursor is on the line. For example, if you need to fix the
indentation in the following code, you can:
• put your cursor on the for line
• press Ctrl+]
• press down
• press Shift+down
• press Ctrl+] again twice.
Adding cursors
Sometimes it’s useful to make exactly the same changes in multiple places at once. For example, we
might want to rename our speed attribute to velocity. Put your cursor anywhere on the word that
you want to change and press Ctrl+D (cmd+D on MacOS). Repeatedly press Ctrl+D to select matching
words individually, each with their own cursor. Now you can apply edits and they will appear at
each selection, as below.
If you want multiple cursors on consecutive lines, press Ctrl+Alt+up or Ctrl+Alt+down (cmd+option+up
and cmd+option+down on MacOS). For example, if we want a square game we could change both
width and height to be 1000 simultaneously as follows.
The go to line operation (Ctrl+G) allows you to navigate to a line by giving its line number. This is
useful to track down the source of those error messages that tell you what line had an issue, or if
you’re on a call with someone who says “I’m looking at line 23” and you can quickly jump to the
same place.
Finally, you can open a specific file by searching for a part of the name by pressing Ctrl+P (cmd+P
on MacOS), which can be quicker than scrolling through the files pane if you have a lot of files.
Where next?
Now that you have mastered the productivity features of Repl.it, you can build proof of concept
applications in no time.
In the next tutorial, we’ll show you how to store data directly in the Repl.it key-value store, one
of the simplest varieties of database. This will cover the so-called “CRUD” (Create, Read, Update,
Delete) operations that are fundamental to any database-backed software.
Using the Repl.it database
In previous tutorials we used the file system to store data persistently. This works fine for smaller
projects, but there are some limitations to storing data directly in a file system. A more advanced
way to store data which is used by nearly any production application is a database.
Another advantage of storing data in a database instead of in files is that it separates our code and
data cleanly. If we build an application on Repl.it that processes any kind of data, it’s likely that we’ll
want to share the code with other people but not the data. Having our data cleanly separated into a
private database allows us to do exactly this.
In this tutorial, you’ll see how to store data from a Repl.it project directly in the Repl.it key-value
store, one of the simplest varieties of database, similar to a Python dictionary and more scalable.
As a demonstration project, we’ll build a basic phone book application, storing contact information
about friends and family and a command line application to allow users to:
This will cover the so-called “CRUD” (Create, Read, Update, Delete) operations that are fundamental
to any database-backed software.
Now create a new Python repl called “phonebook”.
Databases usually store data on a separate physical server from where your code is running, so
your code needs to know how to find the database and how to authenticate (to prove that you are
authorised to access a specific database to stop other people reading your data).
Usually we would have to supply some kind of credentials for this (e.g. a username and password),
as well as an endpoint to indicate where the database can be found. In this case, Repl.it handles
everything automatically (as long as you are signed in), so you can start storing data straight away.
The db object works very similarly to a global Python dictionary but any data is persistently stored.
You can associate a specific value with a given key in the same way. Add the following to your
main.py file.
You should see the phone number printed to the console, as shown below.
For a concrete example, consider storing the same “John Smith” contact in both a dictionary and the
database. Replace the code in your main.py file with the following and run it.
Here we store the information first in the database and print it from the database and then in a
dictionary and print it from there. In both cases, we see the result printed and the syntax is exactly
the same.
However, if we comment out the lines where we create the association between key and value, and
run the code again, we’ll see a difference.
In this case, the first print still works as the data has persisted in the database. However the dictionary
has been cleared between runs so we get the error NameError: name 'd' is not defined.
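The same contrast can be tried outside of Repl.it with Python's standard shelve module, which also offers a dictionary-like store that is saved to disk. This is only an analogy chosen for illustration and says nothing about how the Repl.it database is implemented. The phone number is made up, and the two "runs" are simulated inside one script.

```python
import os
import shelve
import tempfile

# A temporary location for the example store (any writable path would do).
path = os.path.join(tempfile.mkdtemp(), "phonebook")

# "First run": save the contact in a persistent store and in a dictionary.
with shelve.open(path) as store:
    store["John Smith"] = "0123 456 789"
d = {"John Smith": "0123 456 789"}

# "Second run": a fresh program starts with no dictionary contents,
# but the store on disk still holds what was written earlier.
d = {}
with shelve.open(path) as store:
    print("store:", store.get("John Smith"))
print("dict: ", d.get("John Smith"))
```

Here the dictionary is simply reset to empty, so the lookup returns None instead of raising the NameError described above, but the point is the same: only the store keeps its data.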
Because each Repl.it project has its own unique database, which needs a secret key to access, you can
add as much data as you like to your database and still share your project without sharing any of your data.
The database also has some functionality that Python dictionaries do not, such as searching keys by
prefix, which we will take a closer look at soon.
We’ll keep the code that interacts with users in our main.py file and the database logic in a new
module called contacts.py
As we don’t have any contacts yet, we’ll start by allowing our users to add them.
def prompt_add_contact():
    name = input("Please enter the contact's name: ")
    number = input("Please enter the contact's phone number: ")
    print(f"Adding {name} with {number}")

prompt_add_contact()
This doesn’t actually store the contact anywhere yet, but you can test it out to see how it prompts
the user for input and then displays a confirmation message.
Next we need to add some logic to store this in our database.
Create a new file called contacts.py and add the following code.
Because we will use people’s names as keys in our database and because it’s possible that different
people share the same name, it’s possible that our users could overwrite important phone numbers
by adding a new contact with the same name as an existing one. To prevent this, we’ll ensure that
they use a unique name for each contact and only add information with this method to new names.
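A minimal sketch of this uniqueness check is shown below. It assumes that the store can be used like a dictionary, so a plain dict named db stands in for the Repl.it db object to keep the example runnable anywhere. The messages, the return values, and the phone numbers are made up for illustration.

```python
db = {}  # stand-in for the Repl.it db object (assumed to behave like a dict here)


def add_contact(name, number):
    """Store a new contact, refusing to overwrite an existing name."""
    if name in db:
        print(f"{name} already exists, please choose a unique name")
        return False
    db[name] = number
    print(f"Added {name}")
    return True


add_contact("Smith, John", "0123 456 789")
add_contact("Smith, John", "0000 000 000")  # rejected, the first number is kept
print(db)
```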
Back in the main.py file add two lines to import our new module and call the add_contact function.
The new code should look as follows:
import contacts

def prompt_add_contact():
    name = input("Please enter the contact's name: ")
    number = input("Please enter the contact's phone number: ")
    print(f"Adding {name} with {number}")
    contacts.add_contact(name, number)

prompt_add_contact()
Test that this works - run it twice and enter the same name both times, with a different phone
number. You should see the confirmation the first time, but the second time it will inform you that
the contact already exists, as shown below.
def prompt_get_contact():
    name = input("Please enter the name to find: ")
    number = contacts.get_contact(name)
    if number:
        print(f"{name}'s number is {number}")
    else:
        print(f"It looks like {name} does not exist")

prompt_get_contact()
Note that this time we call the get_contact function before we write it - we have a blueprint that
works now from our previous example so we can skip some back-and-forth steps.
Add the following function to contacts.py:
def get_contact(name):
    number = db.get(name)
    return number
Our new code to go into contacts.py is very simple and it might be tempting to just put this logic
directly in the main.py file as it’s so short. However it’s good to stay consistent as each of the files
is likely to grow in length and complexity over time, and it will be easier to maintain our codebase
if our user interaction code is strictly separate from our database interaction code.
Run the code again and input the same name as before. If all went well, you’ll see the number, as in
the example below.
• removing contacts.
But before we get started on those problems, we need to allow users to choose what kind of
functionality they want to activate. With a GUI or web application, we could add some menu items
or buttons, but our command line application is driven only by text input and output on a simple
console. Let’s build a main menu that allows users to specify what they want to do.
To make life easier for our users, we’ll let them make choices by inputting a single number that’s
associated with the relevant menu item.
Change your main.py file to look as follows:
import contacts
from os import system

main_message = """WELCOME TO PHONEBOOK
----------------------------------
Please choose:
1 - to add a new contact
2 - to find a contact
----------------------------------
"""

def prompt_add_contact():
    name = input("Please enter the contact's name: ")
    number = input("Please enter the contact's phone number: ")
    print(f"Adding {name} with {number}")
    contacts.add_contact(name, number)

def prompt_get_contact():
    name = input("Please enter the name to find: ")
    number = contacts.get_contact(name)
    if number:
        print(f"{name}'s number is {number}")
    else:
        print(f"It looks like {name} does not exist")

def main():
    print(main_message)
    choice = input("Please make your choice: ").strip()
    if choice == "1":
        prompt_add_contact()
    elif choice == "2":
        prompt_get_contact()
    else:
        print("Invalid input. Please try again.")

while True:
    system("clear")
    main()
    input("Press enter to continue: ")
This looks like a lot more code than we had before, but if you ignore the multi-line string at the top
and the two functions that we already had, there’s not much more. Our new main() function asks
the users to choose an item from the menu, makes sure that it’s a valid choice, and then calls the
appropriate function.
Below our main() function, we have an infinite loop so that the user can keep using our application
without re-running it after the first action. We call system("clear") between runs to clean up the
old inputs and outputs (and we also added a new import at the top of the file for this).
def search_contacts(search):
    match_keys = db.prefix(search)
    return {k: db[k] for k in match_keys}
And over in main.py modify the prompt_get_contact() function to call this if necessary (when
there is no exact match) as follows:
def prompt_get_contact():
    name = input("Please enter the name to find: ")
    number = contacts.get_contact(name)
    if number:
        print(f"{name}'s number is {number}")
    else:
        matches = contacts.search_contacts(name)
        if matches:
            for k in matches:
                print(f"{k}'s number is {matches[k]}")
        else:
            print(f"It looks like {name} does not exist")
Run the code again and choose to add a contact. Enter “Smith, Mary” when prompted and any phone
number. When the program starts over, choose to find a contact and input “Smi”. It should print out
both “Smith” matches that we have, as shown below.
Image 5: The user menu: They can now choose what action to do.
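To see what a prefix search returns without a Repl.it database at hand, here is a small sketch in which a plain dictionary stands in for db and a hypothetical helper called prefix imitates the lookup with str.startswith. The contacts and numbers are invented for the example.

```python
# Stand-in data; in the real project these entries would live in the database.
db = {
    "Smith, John": "0123 456 789",
    "Smith, Mary": "0123 456 780",
    "Jones, Ann": "0123 456 781",
}


def prefix(search):
    """Return all keys that start with the search text (imitates db.prefix)."""
    return [key for key in db if key.startswith(search)]


def search_contacts(search):
    return {k: db[k] for k in prefix(search)}


print(search_contacts("Smi"))  # both Smith entries
print(search_contacts("Xy"))   # no matches, so an empty dictionary
```

Note that the match is case sensitive in this sketch, so "smi" would not find anything.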
1. Change the name of a contact but keep the same phone number
2. Change the phone number of a contact but keep the same name
Because we are storing contacts as keys and values, to do 1) we need to create a new contact and
remove the original one, while for 2) we can simply update the value of the existing key.
We can handle both cases with a single prompt by allowing the user to leave either field blank, in
this case preserving the old value. Add the following function to your main.py file.
def prompt_update_contact():
    old_name = input("Please enter the name of the contact to update: ")
    old_number = contacts.get_contact(old_name)
    if old_number:
        new_name = input(f"Please enter the new name for this contact (leave blank to keep {old_name}): ").strip()
        new_number = input(f"Please enter the new number for this contact (leave blank to keep {old_number}): ").strip()

        if not new_number:
            new_number = old_number

        if not new_name:
            contacts.update_number(old_name, new_number)
        else:
            contacts.update_contact(old_name, new_name, new_number)

    else:
        print(f"It looks like {old_name} does not exist")
This uses two functions in our contacts.py file that don’t exist yet. These are:
Note how we can use the del Python keyword to remove things from our database. We’ll use this
again in the next section.
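One way the two update functions could be written is sketched below, following the description above: changing a number overwrites the value of an existing key, while changing a name writes a new key and removes the old one with del. A plain dictionary stands in for db, the contact details are made up, and the guard against "renaming" a contact to its current name is an addition for safety rather than something the text requires.

```python
db = {"Smith, John": "0123 456 789"}  # stand-in for the Repl.it db object


def update_number(name, new_number):
    """Keep the key and overwrite the value."""
    db[name] = new_number


def update_contact(old_name, new_name, new_number):
    """Store the contact under the new key, then remove the old key."""
    db[new_name] = new_number
    if new_name != old_name:  # avoid deleting the contact we just wrote
        del db[old_name]


update_number("Smith, John", "0000 111 222")
update_contact("Smith, John", "Smith, Jonathan", "0000 111 222")
print(db)
```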
Now we need to allow users to choose “update” as an option from the menu. In the main.py file, add
a new line to the menu prompt to inform our users about the option and update the main() function
to call the new update function when appropriate, as follows:
def main():
    print(main_message)
    choice = input("Please make your choice: ").strip()
    if choice == "1":
        prompt_add_contact()
    elif choice == "2":
        prompt_get_contact()
    elif choice == "3":
        prompt_update_contact()
    else:
        print("Invalid input. Please try again.")
Test it out! Change someone’s name, someone else’s number, and then update both the name and
the number at once.
# In main.py
def prompt_delete_contact():
    name = input("Please enter the name to delete: ")
    contact = contacts.get_contact(name)
    if contact:
        print(f"Deleting {name}")
        contacts.delete_contact(name)
    else:
        print(f"It looks like {name} does not exist")

# In contacts.py
def delete_contact(name):
    del db[name]

# In main.py
def main():
    print(main_message)
    choice = input("Please make your choice: ").strip()
    if choice == "1":
        prompt_add_contact()
    elif choice == "2":
        prompt_get_contact()
    elif choice == "3":
        prompt_update_contact()
    elif choice == "4":
        prompt_delete_contact()
    else:
        print("Invalid input. Please try again.")
It may be a bit inconvenient to type out the whole name of a contact that you want to delete, but it’s
usually acceptable to make “dangerous” operations less user friendly. As there is no way to recover
contacts, it’s good to make it a bit more difficult to delete them. Maybe our user will change their
mind while typing out the name of an old friend to delete the record and reach out instead :).
Where next
You’ve learned how basic databases work. Databases are a complicated topic on their own and it
can take years or decades to master the more advanced aspects of them, but they can also do more
than the simple operations that we’ve covered here. Spend some time reading about PostgreSQL⁶⁴
and relational databases⁶⁵ in general, or other key-value stores⁶⁶ like the Repl.it database.
Even without further research, the basic Create, Read, Update, and Delete (CRUD) operations that
we covered here will get you far and you can build nearly any app you can imagine with just these.
Next we’ll take a look at playing audio files programmatically so you can use Python to control your
music.
⁶⁴https://www.postgresql.org/
⁶⁵https://en.wikipedia.org/wiki/Relational_database
⁶⁶https://en.wikipedia.org/wiki/Key%E2%80%93value_database
Repl.it Audio
Most people control their music players manually, pressing the pause button to pause a track
or hitting a volume up control to raise the volume. With Repl.it, you can automate your media
experience using code.
In this tutorial, we’ll build a media player that can play audio files programmatically, allowing the
user to pause playback, change the track, change the volume, or get looping information by giving
text commands.
We’ll also outline how this could be integrated into other applications, such as a chatbot, but we’ll
leave the implementation of that as an exercise for the reader.
⁶⁹https://freemusicarchive.org/search
import requests

url = "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/Oddio_Overplay/MIT_Concert_Choir/Carmina_Burana/MIT_Concert_Choir_-_01_-_O_Fortuna.mp3"

r = requests.get(url)
with open("o_fortuna.mp3", "wb") as f:
    f.write(r.content)
Change the URL to the one you chose and o_fortuna.mp3 to something more appropriate if you
chose a different song.
This downloads the song, opens up a binary file, and writes the contents of the download to the file.
You should see the new file pop up in the files tab on the left after you run this code.
Instead of downloading the audio file using requests as shown above, you can also press the add
file button in your repl and upload an audio file from your local machine.
Note that your repl usually dies the moment there is no more code to execute, and playing audio
doesn’t keep it alive. For now, we are sleeping for 10 seconds which keeps the repl alive and the
audio playing. If you run this, you should hear the first 10 seconds of the track before it cuts out.
It’s not ideal to keep the execution loop locked up in a sleep() call as we can’t interact with our
program so we can’t control the playback in any way.
To keep the music playing until the user presses a key, change the last line to:
Now the program is blocked waiting for user input and the music will keep playing until the user
enters something.
Let’s add some more useful controls.
• source.volume: an attribute that we can add to or subtract from to increase or decrease the
volume
• source.paused: an attribute we can change to True or False to pause or unpause the track
• source.set_loop(): a method we can call to specify how many times a track should loop before
ending
We can also display useful information about the current status of our media player by looking at:
• source.loops_remaining: an attribute to see how many more times a track will loop
• source.get_remaining(): a method to see the remaining playtime for the current track.
We’ll allow the user to see the current information but for simplicity we’ll only update this on each
input, so our display will often display ‘out of date’ information.
import time
from os import system
from replit import audio

main_message = """
+: volume up
-: volume down
k: add loop
j: remove loop
<space>: play/pause
"""
Here we add one more import for system which we’ll use to clear the screen so that the user doesn’t
see old information. We then define a string that will prompt the user with their options.
def show_status(source):
    time.sleep(0.2)
    system("clear")
    vbar = '|' * int(source.volume * 20)
    vperc = int(source.volume * 100)
    pp = "⏸" if source.paused else "▶"

    print(f"Volume: {vbar} {vperc}% \n")
    print(f"Looping {source.loops_remaining} time(s)")
    print(f"Time remaining: {source.get_remaining()}")
    print(f"Playing: {pp}")
    print(main_message)
Note that we add a time.sleep() at the top of this function. Because changing the status involves
writing to the /tmp/audio file we discussed before and reading the status involves reading from this
file, we want to wait a short while to ensure we don’t read stale information before showing it to
the user.
Otherwise our function clears the screen, prints out a text-based volume bar along with the current
volume percentage, and shows other information such as whether the track is currently playing or
paused, how many loops are left, and how much time is left before the track finishes.
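The text-based volume bar is the only part of this function that involves any calculation, so here it is on its own as a small sketch. The helper name volume_bar and its width parameter are assumptions for illustration, and the volume is taken to be a number between 0.0 and 1.0.

```python
def volume_bar(volume, width=20):
    """Build a text bar plus percentage for a volume between 0.0 and 1.0."""
    bar = "|" * int(volume * width)  # one bar character per 1/width of volume
    percent = int(volume * 100)
    return f"Volume: {bar} {percent}%"


for level in (0.0, 0.5, 1.0):
    print(volume_bar(level))
```

Because int() truncates rather than rounds, some volumes may display one percent lower than expected. Using round() instead would be a reasonable alternative.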
Finally, we need a loop to constantly prompt the user for the next command which will also keep
our repl alive and continue playing the track while we are waiting for user input. Add the following
main() function to main.py and call it:
def main():
    source = audio.play_file("o_fortuna.mp3")
    time.sleep(1)
    show_status(source)

    while True:
        choice = input("Enter command: ")
        if choice == '+':
            source.volume += 0.1
        elif choice == '-':
            source.volume -= 0.1
        elif choice == "k":
            source.set_loop(source.loops_remaining + 1)
        elif choice == "j":
            source.set_loop(source.loops_remaining - 1)
        elif choice == " ":
            source.paused = not source.paused
        show_status(source)

main()
Once again, you should replace the “o_fortuna” string if you downloaded or uploaded a different
audio file.
If you run the repl now you should hear your track play, and you can control it by entering the
various commands.
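One detail the loop above leaves open is what happens when + or - is entered repeatedly: source.volume would drift above 1 or below 0. How the audio library treats such values is not covered here, so a cautious option is to clamp the value before assigning it. The helper below is a minimal sketch; change_volume is a hypothetical name and is not part of the replit library.

```python
def change_volume(current, delta, lowest=0.0, highest=1.0):
    """Return current + delta, kept inside the range [lowest, highest]."""
    # round() hides floating-point noise such as 0.30000000000000004
    return round(min(highest, max(lowest, current + delta)), 2)


print(change_volume(0.95, 0.1))   # capped at the top of the range
print(change_volume(0.05, -0.1))  # capped at the bottom of the range
```

In main(), the line source.volume += 0.1 could then become source.volume = change_volume(source.volume, 0.1), and likewise with -0.1 for the volume-down branch.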
If you’ve ever played a musical instrument, you’ll probably have come across notes referred to by
the letters A-G. With digital audio, you’ll specify the pitch in hertz (Hz). “Middle C” on a piano is
usually 262 Hz and the A above this is 440 Hz.
Let’s write a program to play “Twinkle Twinkle Little Star”. Create a new Python repl and add the
following code to main.py.
import time
from replit import audio

def play_note(note, duration):
    note_to_freq = {
        "C": 262, "D": 294, "E": 330, "F": 349, "G": 392, "A": 440
    }
    audio.play_tone(duration, note_to_freq[note], 0)
    time.sleep(duration)

play_note("C", 2)
Above we set up a convenience function to play specific notes for a specific duration. It includes
a dictionary mapping the names of notes to their frequencies. We’ve only done one octave and no
sharps or flats, but you can easily extend this to add the other notes.
It then plays the tone of the note passed in for the specified duration. We sleep for that duration
too, as otherwise the next note would start before the previous one has finished. We also pass a 0
to play_tone, which specifies the default sine waveform. You can change it to 1, 2, or 3 for triangle,
saw, or square, which you can read about in more detail⁷¹.
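If you would rather not type in every frequency by hand, the other notes can be computed. The sketch below assumes standard equal-temperament tuning with the A above middle C at 440 Hz, where each semitone multiplies the frequency by the twelfth root of 2. The names NOTE_OFFSETS and note_frequency are made up for this example; rounding to whole hertz reproduces the values in the dictionary above.

```python
# Semitone position of each note within an octave, counted from C.
NOTE_OFFSETS = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}


def note_frequency(note, octave=4):
    """Frequency in whole Hz, assuming A in octave 4 is tuned to 440 Hz."""
    # Distance in semitones from the reference A (negative means lower).
    semitones = NOTE_OFFSETS[note] - NOTE_OFFSETS["A"] + 12 * (octave - 4)
    return round(440 * 2 ** (semitones / 12))


for name in ["C", "D", "E", "F", "G", "A", "B"]:
    print(name, note_frequency(name))
```

Passing octave=5 gives the same notes one octave higher, which is handy for melodies that go above the A used here.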
Test that you can play a single note as expected. Now you can play the first part of “Twinkle Twinkle
Little Star” by defining all of the notes and durations, and then looping through them, calling
play_note on each in turn.
⁷⁰https://www.perfectcircuit.com/signal/difference-between-waveforms
⁷¹https://www.perfectcircuit.com/signal/difference-between-waveforms
notes = ["C", "C", "G", "G", "A", "A", "G", "F", "F", "E", "E", "D", "D", "C"]
durations = [2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 4]

for i in range(len(notes)):
    play_note(notes[i], durations[i])
We can also control the volume of each tone by passing a volume argument to play_tone(). As for
audio files, this is a float where 1 represents 100% volume. If we wanted to implement a decrescendo
(gradual decrease in volume), we could modify our code to look as follows:
Here we added a volume argument to our play_note() function so that we can pass it along to
play_tone(). Each time around the loop we reduce the volume by 5%. Play it again and you should
hear the song slowly fade out (if you add more than 20 notes, the volume will hit 0 so you’ll have
to reduce the step or increase the volume at some point to stop the song going silent).
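The decrescendo described above can be sketched roughly as follows. This is a reconstruction from the description rather than a definitive listing: it assumes play_tone() accepts a volume keyword argument, and the decrescendo() helper and the import fallback (which lets the sketch run silently outside a repl) are additions for illustration.

```python
import time

try:
    from replit import audio  # only available inside a repl
except ImportError:
    audio = None  # outside a repl the sketch runs without sound

NOTE_TO_FREQ = {"C": 262, "D": 294, "E": 330, "F": 349, "G": 392, "A": 440}


def play_note(note, duration, volume=1.0):
    if audio is not None:
        # Assumption: play_tone takes the volume as a keyword argument.
        audio.play_tone(duration, NOTE_TO_FREQ[note], 0, volume=volume)
        time.sleep(duration)


def decrescendo(notes, durations, start=1.0, step=0.05):
    """Play each note a little quieter than the last; return what was played."""
    volume = start
    played = []
    for note, duration in zip(notes, durations):
        play_note(note, duration, volume)
        played.append((note, round(volume, 2)))
        volume = max(0.0, volume - step)  # never go below silence
    return played


print(decrescendo(["C", "C", "G", "G"], [2, 2, 2, 2]))
```

Returning the list of (note, volume) pairs is only there to make the fade easy to inspect; the max() call keeps the volume from going negative once more than 20 notes have been played.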
Where next
Controlling your audio files through a text-based interface might feel like a downgrade from using
a GUI media player, but you can use these concepts to integrate audio controls into your other
applications. For example, you could create a Discord chatbot⁷³ that plays different tracks and
automatically pauses or reduces the volume of your music when you join a Discord voice channel.
Or you could integrate audio tracks into a web application or game (e.g. playing a victory or defeat
sound at a specific volume given certain conditions).
Once you can control something using code, the possibilities are pretty broad, so use your imagina-
tion!
You’ve reached the end of this collection of tutorials that teach you the ins and outs of Repl.it, and
you should be able to build any project that you can imagine now.
If you’re stuck for ideas, continue on to Part 3 where we’ll walk you through eight practical projects,
focusing more on coding concepts than Repl.it features.
⁷³https://ritza.co/showcase/repl.it/building-a-discord-bot-with-python-and-repl-it.html
Beginner web scraping with Python
and Repl.it
In this guide, we’ll walk through how to grab data from web sites automatically. Most websites are
created with a human audience in mind - you use a search engine or type a URL into your web
browser, and see information displayed on the page. Sometimes, we might want to automatically
extract and process this data, and this is where web scraping can save us from boring repetitive
labour. We can create a custom computer program to visit web sites, extract specific data and process
this data in a particular way.
We’ll be extracting news data from the bbc.com⁷⁴ news website, but you should be able to adapt it
to extract information from any website that you want with a bit of trial and error.
There are many reasons you might wish to use web scraping. For example, you might need to:
• extract numbers from a report that is released weekly and published online
• grab the schedule for your favourite sports team as it’s released
• find the release dates for upcoming movies in your favourite genre
• be notified automatically when a website changes
There are many other use cases for web scraping. However, you should also note that copyright
law and web scraping laws are complex and differ by country. As long as you aren’t blatantly
copying their content or doing web scraping for commercial gain, people generally don’t mind web
scraping. However, there have been some legal cases involving scraping data from LinkedIn⁷⁵ and
media attention from scraping data from OKCupid⁷⁶. Web scraping can violate the law, go against
a particular website’s terms of service, or breach ethical guidelines - so take care with where you
apply this skill.
With the disclaimer out of the way, let’s learn how to scrape!
We’ll be using the online programming environment Repl.it⁷⁸ so you won’t need to install any
software locally to follow along step by step. If you want to adapt this guide to your own needs,
you should create a free account by going to repl.it⁷⁹ and follow their sign up process.
It would help if you have basic familiarity with Python or another high-level programming language,
but we’ll be explaining each line of code we write in detail so you should be able to keep up, or at
least replicate the result, even if you don’t.
1. The one you are used to, where you can see text, images, and other media. Different fonts, sizes,
and colours are used to display information in a useful and (usually) aesthetic way.
2. The “source” of the webpage. This is the computer code that tells your web browser (e.g. Mozilla
Firefox or Google Chrome) what to display and how to display it.
Websites are created through a combination of three computer languages: HTML, CSS and JavaScript.
This in itself is a huge and complicated field with a messy history, but having a basic understanding
of how some of it works is necessary to automate web scraping effectively. If you open any website
in your browser and right-click somewhere on the page, you’ll see a menu which should include
an option to “view page source” – to inspect the code form of a website, before your web browser
interprets it.
This is shown in the image below: a normal web page on the left, with an open menu (displayed
by right-clicking on the page). Clicking “view page source” on this menu produces the result on
the right – we can see the code that contains all the data and supporting information that the web
browser needs to display the complete page. While the page on the left is easy to read, use, and looks
good, the one on the right is a monstrosity. It takes some effort and experience to make any sense of
it, but it’s possible and necessary if we want to write custom web scrapers.
⁷⁷https://www.crummy.com/software/BeautifulSoup/
⁷⁸https://repl.it
⁷⁹https://repl.it
Image 1: Normal and source view of the same BBC news article.
The <p class="story-body__introduction"> just before the highlighted section is HTML code to
specify that a paragraph (<p> in HTML) starts here and that this is a special kind of paragraph (an
introduction to the story). The paragraph continues until the </p> symbol. You don’t need to worry
about understanding HTML completely, but you should be aware that it contains both the text data
that makes up the news article and additional data about how to display it.
A large part of web scraping is viewing pages like this to a) identify the data that we are interested
in and b) separate it from the markup and other code that it is mixed with. Even before we write
any code of our own, making sense of other people’s can be tricky.
In most pages, there is a lot of code to define the structure, layout, interactivity, and other function-
ality of a web page, and relatively little that contains the actual text and images that we usually view.
For especially complex pages it can be quite difficult, even with the help of the find function, to locate
the code that is responsible for a particular part of the page. For this reason, most web browsers come
with so-called “developer tools”, which are aimed primarily at programmers to assist in the creation
and maintenance of web sites, though these tools are also handy for doing web scraping.
Image 3: Opening Developer Tools in Chrome (left) and Firefox (right)
Activating the tool brings up a new panel in your web browser, usually at the bottom or on the
right-hand side. The tool contains an “Inspector” panel and a selector tool, which can be chosen by
pressing the icon highlighted in red below. Once the selector tool is active, you can click on parts
of the web page to view the corresponding source code. In the image below, we selected the same
first paragraph in the normal view and we can see the <p class="story-body__introduction"> code
again in the panel below.
Image 4: Viewing the code for a specific element using developer tools
The Developer Tools are significantly more powerful than using the simple find tool, but they are
also more complicated. You should choose a method based on your experience and the complexity
of the page that you are trying to analyze.
This will take you to a working Python coding environment where you can write and run Python
code. To start with, we’ll download the content from the BBC News homepage, and print out the
first 1000 characters of HTML source code.
You can do this with the following four lines of Python:
1 import requests
2
3 url = "https://bbc.com/news"
4 response = requests.get(url)
5 print(response.text[:1000])
Put this code in the main.py file that Repl automatically creates for you and press the “Run” button.
After a short delay, you should see the output in the output pane - the beginning of HTML source
code, similar to what we viewed in our web browser above.
• In line 1, we import the Python requests library, which is a library that allows us to make web
requests.
• In line 3, we define a variable containing the URL of the main BBC news site. You can visit this
URL in your web browser to see the BBC News home page.
• In line 4, we pass the URL we defined to the requests.get function, which will visit the web
page that the URL points to and fetch the HTML source code. We load this into a new variable
called response.
• In line 5, we access the text attribute of our response object, which contains all of the HTML
source code. We take only the first 1000 characters of this, and pass them to the print function,
which simply dumps the resulting text to our output pane.
We have now automatically retrieved a web page and we can display parts of the content. We are
unlikely to be interested in the full source code dump of a web page (unless we are storing it for
archival reasons), so let’s extract some interesting parts of the page, instead of only the first 1000
characters.
Let’s assume for now that we want to find all the news articles on the BBC News homepage, and get
their URLs. If we look at the main page below, we’ll see there are a bunch of stories on the home page.
By mousing over any of the headlines with the “inspect” tool, we can see that each has a unique
URL which takes us to that news story. For example, mousing over the main “US and Canada agree
new trade deal” story in the image below shows a link to https://www.bbc.com/news/business-45702609.
If we inspect that element using the browser’s developer tools, we can see it is an <a> element,
which is HTML for a link, with an href attribute that points to the URL. Note that the href
value contains only the last part of the URL, omitting the https://www.bbc.com part. Because we
are already on BBC, the site can use relative URLs instead of absolute URLs. This means that when
you click on the link, your browser will figure out that the URL isn’t complete and prepend it with
https://www.bbc.com. If you look around the source code of the main BBC page, you’ll find both
relative and absolute URLs, which already makes scraping all of the URLs on the page more difficult.
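The browser’s behaviour of completing a relative URL can be reproduced in Python with urljoin from the standard library. The snippet below is a small illustration; the second href is a made-up absolute URL, included to show that absolute URLs pass through unchanged.

```python
from urllib.parse import urljoin

# The page the links were found on.
base = "https://www.bbc.com/news"

# A relative href is resolved against the base page.
relative = urljoin(base, "/news/business-45702609")

# An absolute href (hypothetical example) is returned as it is.
absolute = urljoin(base, "https://www.bbc.co.uk/sport")

print(relative)
print(absolute)
```

Running every href through urljoin is one way to end up with a uniform list of absolute URLs, whichever form the page used.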
We could try to use Python’s built-in text search functions like find() or regular expressions to
extract all of the URLs from the BBC page, yet it is not actually possible to do this reliably. HTML is
a complex language which allows web developers to do many unusual things. For an amusing take
on why we should avoid a “naive” method of looking for links, see this very famous⁸¹ StackOverflow
question and the first answer.
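To make the problem concrete, here is a small comparison on a made-up HTML snippet. It uses a deliberately naive regular expression alongside Python’s standard-library html.parser module, chosen only so the example runs without installing anything. The sample HTML and the LinkCollector class are invented for this illustration.

```python
import re
from html.parser import HTMLParser

SAMPLE_HTML = """
<a href="/news/business-45702609">Trade deal</a>
<A HREF='/news/world-1'>Upper-case tag, single quotes</A>
<!-- <a href="/old-link">commented out</a> -->
"""

# Naive approach: grab anything that looks like href="...".
naive = re.findall(r'href="(.*?)"', SAMPLE_HTML)


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser hands us lower-cased tag and attribute names.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(SAMPLE_HTML)
parsed = collector.links

print("regex: ", naive)
print("parser:", parsed)
```

The regular expression misses the second link and wrongly reports the one inside the comment, while the parser skips the comment and copes with the different quoting and capitalisation. A more elaborate pattern could patch these particular cases, but real pages keep producing new ones.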
Luckily, there is a powerful and simple-to-use HTML parsing library called BeautifulSoup⁸², which
will help us extract all the links from a given piece of HTML. We can use it by modifying the code
in main.py to look as follows:
⁸¹https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
⁸²https://www.crummy.com/software/BeautifulSoup/
1 import requests
2 from bs4 import BeautifulSoup
3
4 url = "https://bbc.com/news"
5
6 response = requests.get(url)
7 html = response.text
8
9 soup = BeautifulSoup(html, "html.parser")
10 links = soup.find_all("a")
11 for link in links:
12     print(link.get("href"))
If you run this code, you’ll see that it outputs dozens of URLs, one per line. You’ll probably notice
that the code now takes quite a bit longer to run than before – BeautifulSoup is not built into Python,
it is a third-party module. This means that before running the code, Repl has to go and fetch this
library and install it for you. Subsequent runs will be faster.
• In line 2, we import the BeautifulSoup library, which is used for parsing and processing HTML.
• In line 9, we transform our HTML into “soup”. This is BeautifulSoup’s representation of a
web page, which contains a bunch of useful programmatic features to search and modify the
data in the page. We use the “html.parser” option to parse HTML, which is included by default
– BeautifulSoup also allows you to specify a custom HTML parser here. For example, you could
install and specify a faster parser, which can be useful if you need to process a lot of HTML
data.
• In line 10, we find all the a elements in our HTML and extract them to a list. Remember, when
we were looking at the URLs using our web browser (Image 7), we noted that the <a> element
in HTML was used to define links, with the href attribute being used to specify where the link
should go to. This line finds all of the HTML <a> elements.
• In line 11, we loop through all of the links we have, and in line 12 we print out the href section.
These last two lines show why BeautifulSoup is useful. To try and find and extract these elements
without it would be remarkably difficult, but now we can do it in two lines of readable code!
If we look at the URLs in the output pane, we’ll see quite a mixed bag of results. We have absolute
URLs (starting with “http”) and relative ones (starting with “/”). Most of them go to general pages
rather than specific news articles. We need to find a pattern in the links we’re interested in (that go
to news articles) so that we can extract only those.
Again, trial and error is the best way to do this. If we go to the BBC News home page and
use developer tools to inspect the links that go to news articles, we’ll find that they all have a
similar pattern. They are relative URLs which start with “/news” and end with a long number, e.g.
/news/newsbeat-45705989
We can make a small change to our code to only output URLs that match this pattern. Replace the
last two lines of our Python code with the following four lines:
Here we still loop through all of the links that BeautifulSoup found for us, but now we extract the
href to its own variable immediately after. We then inspect this variable to make sure that it matches
our conditions (starts with “/news” and ends with a digit), and only if it does, then we print it out.
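The filter described above can be sketched as a small standalone function, so it can be tried on a few sample values without fetching the page. The name is_news_href and the sample list are made up for this example. The check for an empty value is an extra precaution, since an <a> element without an href attribute gives us None instead of a string.

```python
def is_news_href(href):
    """True for relative links that start with /news and end with a digit."""
    if not href:
        # Covers None (no href attribute) and empty strings.
        return False
    return href.startswith("/news") and href[-1].isdigit()


sample_hrefs = [
    "/news/newsbeat-45705989",    # matches the pattern
    "https://www.bbc.com/sport",  # absolute URL, not an article
    "/news/world",                # section page: no trailing number
    None,                         # <a> element without an href
]

news = [href for href in sample_hrefs if is_news_href(href)]
print(news)
```

Inside the loop over links, the same test would sit between extracting the href and printing it.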
import requests
import string

from collections import Counter

from bs4 import BeautifulSoup


url = "https://bbc.com/news"


response = requests.get(url)
html = response.text
This code is quite a bit more complicated than what we previously wrote, so don’t worry if you
don’t understand all of it. The main changes are:
• At the top, we add two new imports in addition to the requests library. The first new module
is one for string, which is a standard Python module that contains some useful word and letter
shortcuts. We’ll use it to identify all the capital letters in our alphabet. The second module is
a Counter, which is part of the built-in collections module. This will let us find the most
common nouns in a list, once we have built a list of all the nouns.
• We’ve added news_urls = [] at the top of the first for loop. Instead of printing out each URL
once we’ve identified it as a “news URL”, we add it to this list so we can download each page
later. Inside the for loop two lines down, we combine the root domain (“http://bbc.com”) with
each href attribute and then add the complete URL to our news_urls list.
• We then go into another for loop, where we loop through the first 10 news URLs (if you
have more time, you can remove the [:10] part to iterate through all the news pages, but
for efficiency, we’ll just demonstrate with the first 10).
• We print out the URL that we’re fetching (as it takes a second or so to download each page, it’s
nice to display some feedback so we can see that the program is working).
• We then fetch the page and turn it into soup, as we did before.
• With words = soup.text.split() we extract all the text from the page and split this resulting
big body of text into individual words. The Python split() function splits on white space,
which is a crude way to extract words from a piece of text, but it will serve our purpose for
now.
• The next line loops through all the words in that given article and keeps only the ones that are
made up of alphabetic characters and which start with a capital letter (string.ascii_uppercase
is just the uppercase alphabet). This is also an extremely crude way of extracting nouns, and
we will get a lot of words (like those at the start of sentences) which are not actually proper
nouns, but again it’s a good enough approximation for now.
• We then add all the words that look like nouns to our all_nouns list and move on to the next
article to do the same.
• Finally, once we’ve downloaded all the pages, we print out the 100 most common nouns along
with a count of how often they appeared using Python’s convenient Counter object.
You should see output similar to that in the image below (though your words will be different, as
the news changes every few hours). We have the most common “nouns” followed by a count of how
often that noun appeared in all 10 of the articles we looked at.
We can see that our crude extraction and parsing methods are far from perfect – words like “Twitter”
and “Facebook” appear in most articles because of the social media links at the bottom of each article,
so their presence doesn’t mean that Facebook and Twitter themselves are in the news today. Similarly,
words like “From” aren’t nouns, and other words like “BBC” and “Business” are also included because
they appear on each page, outside of the main article text.
Image 10: The final output of our program, showing the words that appear most often in BBC articles.
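The word-filtering and counting steps described in the list above can be tried on their own, without downloading anything. The sketch below uses two invented sentences in place of real article text. The helper name looks_like_noun is hypothetical, and the filter is the crude one described above (alphabetic words that start with a capital letter).

```python
import string
from collections import Counter


def looks_like_noun(word):
    """Crude test: letters only, and the first letter is upper case."""
    return word.isalpha() and word[0] in string.ascii_uppercase


# Stand-ins for the text of two downloaded articles.
sample_articles = [
    "The BBC reports that Canada and Mexico agreed a deal",
    "Canada said the deal with Mexico was fair",
]

all_nouns = []
for text in sample_articles:
    words = text.split()  # split on white space
    all_nouns += [word for word in words if looks_like_noun(word)]

top = Counter(all_nouns).most_common(2)
print(top)
```

Even in this tiny sample, “The” slips through the filter because it starts a sentence, which is the same weakness visible in the real output.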
Where next?
We’ve completed the basics of web scraping and have looked at how the web works, how to
extract information from web pages, and how to do some very basic text extraction. You will
probably want to do something other than extract words from BBC! You can fork this Repl from
https://repl.it/@GarethDwyer1/beginnerwebscraping and modify it to change which site it scrapes
and what content it extracts. You can also join the Repl Discord Server⁸³ to chat with other developers
who are working on similar projects and who will happily exchange ideas with you or help if you
get stuck.
We have walked through a very flexible method of web scraping, but it’s the “quick and dirty” way.
If BBC updates their website and some of our assumptions (e.g. that news URLs will end with a
number) break, our web scraper will also break.
Once you’ve done a bit of web scraping, you’ll notice that the same patterns and problems come
up again and again. Because of this, there are many frameworks and other tools that solve these
common problems (finding all the URLs on the page, extracting text from the other code, dealing
with changing web sites, etc), and for any big web scraping project, you’ll definitely want to use
these instead of starting from scratch.
Some of the best Python web scraping tools are:
• Scrapy⁸⁴: A framework used by people who want to scrape millions or even billions of web
pages. Scrapy lets you build “spiders” – programmatic robots that move around the web at high
speed, gathering data based on rules that you specify.
• Newspaper⁸⁵: we touched on how it was difficult to separate the main text of an online news
article from all the other content on the page (headers, footers, adverts, etc). This problem is
an incredibly difficult one to solve. Newspaper uses a combination of manually specified rules
and some clever algorithms to remove the “boilerplate” or non-core text from each article.
• Selenium⁸⁶: we scraped some basic content without using a web browser, and this works fine
for images and text. Many parts of the modern web are dynamic though – e.g. they only load
when you scroll down a page far enough or click on a button to reveal more content. These
dynamic sites are challenging to scrape, but Selenium allows you to fire up a real web browser
and control it just as a human would (but automatically), and this allows you to access this
kind of dynamic content.
There is no shortage of other tools, and a lot can be done simply by using them in combination with
each other. Web scraping is a vast world that we’ve only just touched on, but we’ll explore some
more web scraping use cases in the next chapter, in particular, building news word clouds.
⁸³https://discord.com/login?redirect_to=%2Fchannels%2F%40me
⁸⁴https://scrapy.org/
⁸⁵https://github.com/codelucas/newspaper
⁸⁶https://www.seleniumhq.org/
Building news word clouds using
Python and Repl.it
Word clouds, which are images showing scattered words in different sizes, are a popular way to
visualise large amounts of text. Words that appear more frequently in the given text are larger, and
less common words are smaller or not shown at all.
In this tutorial, we’ll build a web application using Python and Flask that transforms the latest news
stories into word clouds and displays them to our visitors.
At the end of this tutorial, our users will see a page similar to the one shown below, but containing
the latest news headlines from BBC news. We’ll learn some tricks about web scraping, RSS feeds,
and building image files directly in memory along the way.
Image: 1
We’ll be using the online programming environment Repl.it⁸⁷ so you won’t need to install any
software locally to follow along step by step. If you want to adapt this guide to your own needs,
you should create a free account by going to repl.it⁸⁸ and follow their sign up process.
Web scraping
We previously looked at basic web scraping in an introduction to web scraping⁸⁹. If you’re completely
new to the idea of automatically retrieving content from the internet, have a look at that tutorial
first.
In this tutorial, instead of scraping the links to news articles directly from the BBC homepage, we’ll
be using RSS feeds⁹⁰ - an old but popular standardised format that publications use to let readers
know when new content is available.
Image: 2
If you click on the link above, you won’t see the XML directly. Instead, it has some associated styling
information so that most web browsers will display something that’s a bit more human friendly. For
example, opening the page in Google Chrome shows the page below. In order to view the raw XML
directly, you can right-click on the page and click “view source”.
Image: 3
RSS feeds are used internally by software such as the news reader Feedly⁹¹ and various email clients.
We’ll be consuming these RSS feeds with a Python library to retrieve the latest articles from BBC.
Image: 4
1 import feedparser
2
3 BBC_FEED = "http://feeds.bbci.co.uk/news/world/rss.xml"
4 feed = feedparser.parse(BBC_FEED)
5
6 for article in feed['entries']:
7     print(article['link'])
Feedparser does most of the heavy lifting for us, so we don’t have to get too close to the slightly
cumbersome XML format. In the code above, we parse the feed into a nice Python representation
(line 4), loop through all of the entries (the <item> entries from the XML we looked at earlier), and
print out the link elements.
If you run this code, you should see a few dozen URLs output on the right pane, as in the image
below.
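To get a feel for the <item> and <link> structure that feedparser is reading, it can help to parse a tiny feed by hand. The sketch below uses Python’s built-in ElementTree module on a made-up two-item feed with example.com links. It is only an illustration of the XML layout and makes no claim about how feedparser works internally.

```python
import xml.etree.ElementTree as ET

# A made-up, heavily trimmed RSS document.
SAMPLE_RSS = """
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First story</title>
      <link>https://example.com/news/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/news/2</link>
    </item>
  </channel>
</rss>
"""

root = ET.fromstring(SAMPLE_RSS.strip())

# Every <item> holds one article; its <link> child is the article URL.
links = [item.findtext("link") for item in root.iter("item")]
print(links)
```

Real feeds carry many more fields per item and come with their own quirks, which is why the tutorial relies on feedparser for the BBC feed.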
⁹³https://www.codementor.io/garethdwyer/beginner-web-scraping-with-python-and-repl-it-nzr27jvnq
⁹⁴https://www.crummy.com/software/BeautifulSoup/bs4/doc/
⁹⁵https://pythonhosted.org/feedparser/
Image: 5
⁹⁶http://flask.pocoo.org/
import feedparser
from flask import Flask

app = Flask(__name__)

BBC_FEED = "http://feeds.bbci.co.uk/news/world/rss.xml"

@app.route("/")
def home():
    feed = feedparser.parse(BBC_FEED)
    urls = []

    for article in feed['entries']:
        urls.append(article['link'])

    return str(urls)


if __name__ == '__main__':
    app.run('0.0.0.0')
Here we still parse the feed and extract all of the latest article URLs, but instead of printing them
out, we add them to a list (urls), and return them from a function. The interesting parts of this code
are:
Press “run” again, and you should see a new window appear in the top right pane. Here we can see
a basic web page (viewable already to anyone in the world by sharing the URL you see above it),
and we see the same output that we previously printed to the console.
Image: 6
import feedparser
import requests

from flask import Flask
from bs4 import BeautifulSoup

app = Flask(__name__)

BBC_FEED = "http://feeds.bbci.co.uk/news/world/rss.xml"
LIMIT = 2

def parse_article(article_url):
    print("Downloading {}".format(article_url))
    r = requests.get(article_url)
    soup = BeautifulSoup(r.text, "html.parser")
    ps = soup.find_all('p')
    text = "\n".join(p.get_text() for p in ps)
    return text

@app.route("/")
def home():
    feed = feedparser.parse(BBC_FEED)
    article_texts = []

    for article in feed['entries'][:LIMIT]:
        text = parse_article(article['link'])
        article_texts.append(text)
    return str(article_texts)

if __name__ == '__main__':
    app.run('0.0.0.0')
If you run the code now, you should see output similar to that shown in the image below (you may
need to hit refresh in the right pane). You can see text from the first article in the top-right pane now,
and the text for the second article is further down the page. You’ll notice that our text extraction
algorithm isn’t perfect and there’s still some extra text about “Share this” at the top that isn’t actually
part of the article, but this is good enough for us to create word clouds from later.
Image: 7
CSS styling, it’s better to define HTML templates, and use Flask’s template engine, Jinja, to inject
dynamic content into these. Before we get to creating image files from our text content, let’s set up
a basic Flask template.
To use Flask’s templates, we need to set up a specific file structure. Press the “new folder” button
(next to the “new file” button, on the left pane), and name the resulting new folder templates. This
is a special name recognised by Flask, so make sure you get the spelling exactly correct.
Select the new folder and press the “new file” button to create a new file inside our templates folder.
Call the file home.html. Note below how the home.html file is indented one level, showing that it is
inside the folder. If yours is not, drag and drop it into the templates folder so that Flask can find it.
Image: 8
In the home.html file, add the following code, which is a mix between standard HTML and Jinja’s
templating syntax to mix dynamic content into the HTML.
<html>
    <body>
        <h1>News Word Clouds</h1>
        <p>Too busy to click on each news article to see what it's about? Below you can see all the articles from the BBC front page, displayed as word clouds. If you want to read more about any particular article, just click on the wordcloud to go to the original article</p>
        {% for article in articles %}
            <p>{{article}}</p>
        {% endfor %}
    </body>
</html>
Jinja uses the special characters {% and {{ (in opening and closing pairs) to show where dynamic
content (e.g. variables calculated in our Python code) should be added and to define control
structures. Here we loop through a list of articles and display each one in a set of <p> tags.
We’ll also need to tweak our Python code a bit to account for the template. In the main.py file, make
the following changes.
• Add a new import near the top of the file, below the existing Flask import: from flask import render_template
• Update the last line of the home() function to make a call to render_template instead of
returning a str directly as follows.
@app.route("/")
def home():
    feed = feedparser.parse(BBC_FEED)
    article_texts = []

    for article in feed['entries'][:LIMIT]:
        text = parse_article(article['link'])
        article_texts.append(text)
    return render_template('home.html', articles=article_texts)
The render_template call tells Flask to prepare some HTML to return to the user by combining data
from our Python code and the content in our home.html template. Here we pass article_texts to
the renderer as articles, which matches the articles variable we loop through in home.html.
If everything went well, you should see different output now, which contains our header from the
HTML and static first paragraph, followed by two paragraphs showing the same article content that
we pulled before. If you don’t see the updated webpage, you may need to hit refresh in the right
pane again.
Image: 9
⁹⁷https://imgur.com/
import base64
import feedparser
import io
import requests

from bs4 import BeautifulSoup
from wordcloud import WordCloud
from flask import Flask
from flask import render_template
We’ll be converting the text from each article into a separate word cloud, so it’ll be useful to have
another helper function that can take text as input and produce the word cloud as output. We can use
base64⁹⁸ to represent the images, which can then be displayed directly in our visitors’ web browsers.
Add the following function to the main.py file.
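def get_wordcloud(text):
    # Generate the word cloud and convert it to a PIL image
    pil_img = WordCloud().generate(text=text).to_image()
    # Save the image as PNG into an in-memory buffer instead of a file
    img = io.BytesIO()
    pil_img.save(img, "PNG")
    img.seek(0)
    # Encode the PNG bytes as a base64 string that can be embedded in HTML
    img_b64 = base64.b64encode(img.getvalue()).decode()
    return img_b64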
This is probably the hardest part of our project in terms of readability. Normally, we’d generate the
word cloud using the wordcloud library and then save the resulting image to a file. However, because
we don’t want to use our file system here, we’ll create a BytesIO Python object in memory instead
and save the image directly to that. We’ll convert the resulting bytes to base64 in order to finally
return them as part of our HTML response and show the image to our visitors.
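The in-memory buffer and the base64 step can be tried on their own, without the wordcloud library. The sketch below uses a few made-up bytes (fake_png is an assumption, not a real image) to show that BytesIO behaves like a file and that base64 encoding is reversible.

```python
import base64
import io

# Stand-in for image data; a real PNG would come from pil_img.save(...).
fake_png = b"\x89PNG\r\n\x1a\n" + b"example image bytes"

buffer = io.BytesIO()   # an in-memory "file"
buffer.write(fake_png)  # roughly what pil_img.save(buffer, "PNG") does

# Convert the raw bytes to an ASCII string that is safe to place in HTML.
encoded = base64.b64encode(buffer.getvalue()).decode()

# Decoding gives back exactly the bytes we started with.
decoded = base64.b64decode(encoded)
print(decoded == fake_png)
```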
In order to use this function, we’ll have to make some small tweaks to the rest of our code.
For our template, in the home.html file, change the for loop to read as follows.
Now instead of displaying our article in <p> tags, we’ll put it inside an <img/> tag so that it can be
displayed as an image. We also specify that it is formatted as a png and encoded as base64.
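As a rough illustration of what the browser receives, the sketch below builds the same kind of <img> tag in plain Python. The helper name img_tag is hypothetical; in the template, the base64 string produced by get_wordcloud() takes the place of the encoded value after the data:image/png;base64, prefix.

```python
import base64


def img_tag(image_bytes):
    """Return an <img> tag whose src is a base64 data URI for PNG bytes."""
    b64 = base64.b64encode(image_bytes).decode()
    return '<img src="data:image/png;base64,' + b64 + '">'


print(img_tag(b"not really a png"))
```

Because the image data travels inside the HTML itself, no separate image file or URL is needed.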
The last thing we need to do is modify our home() function to call the new get_wordcloud() function
and to build and render an array of images instead of an array of text. Change the home() function
to look as follows.
⁹⁸https://en.wikipedia.org/wiki/Base64
 1 @app.route("/")
 2 def home():
 3     feed = feedparser.parse(BBC_FEED)
 4     clouds = []
 5
 6     for article in feed['entries'][:LIMIT]:
 7         text = parse_article(article['link'])
 8         cloud = get_wordcloud(text)
 9         clouds.append(cloud)
10     return render_template('home.html', articles=clouds)
We made changes on lines 4, 8, 9, and 10, to change to a clouds array, populate that with images
from our get_wordcloud() function, and return that in our render_template call.
If you restart the Repl and refresh the page, you should see something similar to the following. We
can see the same content from the articles, however, we can now see the important keywords without
having to read the entire article.
Image:10
For a larger view, you can pop out the website in a new browser tab using the button in the top right
of the Repl editor (indicated in red above).
The last thing we need to do is add some styling to make the page look a bit prettier and link the
images to the original articles.
Adding CSS
Edit the home.html file to look as follows
 1 <html>
 2   <head>
 3     <title>News in WordClouds | Home</title>
 4     <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css" integrity="sha384-HSMxcRTRxnN+Bdg0JdbxYKrThecOKuH5zCYotlSAcp1+c8xmyTe9GYg1l9a69psu" crossorigin="anonymous">
 5
 6     <style type="text/css">
 7       body {padding: 20px;}
 8       img{padding: 5px;}
 9     </style>
10   </head>
11
12   <body>
13     <h1>News Word Clouds</h1>
14     <p>Too busy to click on each news article to see what it's about? Below you can see all the articles from the BBC front page, displayed as word clouds. If you want to read more about any particular article, just click on the wordcloud to go to the original article</p>
15     {% for article in articles %}
16       <a href="{{article.url}}"><img src="data:image/png;base64,{{article.image}}"></a>
17     {% endfor %}
18   </body>
19 </html>
On line 3 we add a title, which is displayed in the browser tab. On line 4, we import Bootstrap⁹⁹,
which has some nice CSS defaults right out of the box. It’s probably a bit heavyweight for our project,
as we have so little content and won’t use most of Bootstrap’s features, but it’s nice to have if you’re
planning on extending the project.
⁹⁹https://getbootstrap.com/
On lines 6-8, we add padding to the main body to stop the text going too close to the edges of the
screen, and also add padding to our images to stop them touching each other.
On line 16, we use an <a> tag to add a link to our image. We also change the Jinja templates to
{{article.url}} and {{article.image}} so that we can have images that link back to the original
news article.
Now we need to tweak our backend code again to pass through the URL and image for each article,
as the template currently doesn’t have access to the URL.
class Article:
    def __init__(self, url, image):
        self.url = url
        self.image = image
This is a simple class with two attributes: url and image. We’ll store the original URL from the RSS
feed in url and the final base64 wordcloud in image.
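If you are on Python 3.7 or later, the same two-attribute class could also be written as a dataclass. This is an optional alternative and not what the tutorial's code uses; the URL and image values below are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Article:
    url: str    # link to the original story from the RSS feed
    image: str  # base64-encoded PNG of the word cloud


example = Article("https://example.com/story", "aGk=")
print(example.url, example.image)
```

The template can read article.url and article.image from either version, because Jinja only needs attribute access.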
To use this class, modify the home() function to look as follows.
@app.route("/")
def home():
    feed = feedparser.parse(BBC_FEED)
    articles = []

    for article in feed['entries'][:LIMIT]:
        text = parse_article(article['link'])
        cloud = get_wordcloud(text)
        articles.append(Article(article['link'], cloud))
    return render_template('home.html', articles=articles)
We changed the name of our clouds list to articles, and populated it by initialising Article objects
in the for loop and appending them to this list. We then pass across articles=articles instead of
articles=clouds in the return statement so that the template can access our list of Article objects,
which each contain the image and the URL of each article.
If you refresh the page again and expand the window using the pop out button, you’ll be able to
click any of the images to go to the original article, allowing readers to view a brief summary of the
day’s news or to read more details about any stories that catch their eye.
Where next?
We’ve included several features in our web application, and looked at how to use RSS feeds and
process and serve images directly in Python, but there are a lot more features we could add. For
example:
• Our application only shows two stories at a time as the download time is slow. We could instead
look at implementing a threaded solution to downloading web pages so that we could process
several articles in parallel. Alternatively (or in addition), we could also download the articles
on a schedule and cache the resulting images so that we don’t have to do the resource heavy
downloading and parsing each time a visitor visits our site.
• Our web application only shows articles from a single source (BBC), and only from today. We
could add some more functionality to show articles from different sources and different time
frames. We could also consider allowing the viewer to choose which category of articles to
view (news, sport, politics, finance, etc) by using different RSS feeds as sources.
• Our design and layout is very basic. We could make our site look better and be more responsive
by adding more CSS. We could lay out the images in a grid of rows and columns to make it
look better on smaller screens such as mobile phones.
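As a starting point for the first idea, here is a minimal sketch of downloading several pages in parallel with a thread pool from the standard library. The slow_fetch function is a hypothetical stand-in for parse_article() that only sleeps, and the links are placeholders, so the example runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fetch(link):
    """Hypothetical stand-in for parse_article(): pretend to download a page."""
    time.sleep(0.1)
    return "text of " + link


def fetch_all(links, max_workers=4):
    """Fetch several links in parallel threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(slow_fetch, links))


links = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
print(fetch_all(links))
```

Because the threads spend most of their time waiting, the three simulated downloads take roughly as long as one. Caching the results would be a separate step on top of this.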
If you’d like to keep working on the web application, simply head over to the Repl¹⁰⁰ and fork it to
continue your own version.
In the next chapter, we’ll be looking at how to build our own Discord Chatbot.
¹⁰⁰https://repl.it/@GarethDwyer1/news-to-wordcloud
Building a Discord Bot with Python and Repl.it
In this tutorial, we’ll use Repl.it¹⁰¹ and Python to build a Discord Chatbot. If you’re reading this
tutorial, you probably have at least heard of Discord and likely have an existing account. If not,
Discord is a VoIP and Chat application that is designed to replace Skype for gamers. The bot we
create in this tutorial will be able to join a Discord server and respond to messages sent by people.
If you prefer JavaScript, the next chapter is the same tutorial using NodeJS instead of Python.
You’ll find it easier to follow along if you have some Python knowledge and have used Discord or a
similar app such as Skype or Telegram before. We won’t be covering the very basics of Python, but
we will explain each line of code in detail, so if you have any experience with programming, you
should be able to follow along.
Let’s get through these admin steps first and then we can get to the fun part of coding our bot.
You can also rename the application and provide a description for your bot at this point and press
“Save Changes”.
You have now created a Discord application. The next step is to add a bot to this application, so head
over to the “Bot” tab using the menu on the left and press the “Add Bot” button, as indicated below.
Click “Yes, do it” when Discord asks if you’re sure about bringing a new bot to life.
The last thing we’ll need from our bot is a Token. Anyone who has the bot’s token can prove that
they own the bot, so you’ll need to be careful not to share this with anyone. You can get the token by
pressing “Click to Reveal Token”, or copy it to your clipboard without seeing it by pressing “Copy”.
Take note of your token or copy it to your clipboard, as we’ll need to add it to our code soon.
Press “Create a server” in the screen that follows, and then give your server a name. Once the server
is up and running, you can chat with yourself, or invite some friends to chat with you. Soon we’ll
invite our bot to chat with us as well.
Select the server we created in the step before this and hit the “authorize” button. After completing
the captcha, you should get an in-app Discord notification telling you that your bot has joined your
server.
Now we can get to the fun part of building a brain for our bot!
Open this new file and add a variable to define your bot’s secret token (note that this is the second
token that we got while setting up the bot – different from the Client ID that we used to add our
bot to our server). It should look something like:
DISCORD_BOT_SECRET=NDcUN5T32zcTjMYOM0Y1MTUy.Dk7JBw.ihrTSAO1GKHZSonqvuhtwta16WU
• Replace the token (after the = sign) with the token that Discord gave you when creating your
own bot.
• Be careful about spacing. Unlike in Python, if you put a space on either side of the = in your
.env file, these spaces will be part of the variable name or the value, so make sure you don’t
have any spaces around the = or at the end of the line.
• Run the code again. Sometimes you’ll need to refresh the whole page to make sure that your
environment variables are successfully loaded.
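To see why stray spaces matter, consider a deliberately naive parser that splits each line at the first = sign and trims nothing. This is an illustrative sketch only, not how Repl.it actually loads .env files; parse_env_line is a hypothetical helper and abc123 is a placeholder value.

```python
def parse_env_line(line):
    """Naively split NAME=value at the first '=' without trimming spaces."""
    name, _, value = line.rstrip("\n").partition("=")
    return name, value


print(parse_env_line("DISCORD_BOT_SECRET=abc123"))
print(parse_env_line("DISCORD_BOT_SECRET = abc123"))
```

In the second call the name ends with a space and the value starts with one, so a later lookup of DISCORD_BOT_SECRET would not find the expected entry.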
¹⁰⁸https://www.digitalocean.com/community/tutorials/how-to-read-and-set-environmental-and-shell-variables-on-a-linux-vps
Let’s make a Discord bot that repeats everything we say but in reverse. We can do this in only a few
lines of code. In your main.py file, add the following:
 1 import discord
 2 import os
 3
 4 client = discord.Client()
 5
 6 @client.event
 7 async def on_ready():
 8     print("I'm in")
 9     print(client.user)
10
11 @client.event
12 async def on_message(message):
13     if message.author != client.user:
14         await message.channel.send(message.content[::-1])
15
16 token = os.environ.get("DISCORD_BOT_SECRET")
17 client.run(token)
• Lines 1-2 import the discord library that we installed earlier and the built-in operating system
library, which we’ll need to access our bot’s secret token.
• In line 4, we create a Discord Client. This is a Python object that we’ll use to send various
commands to Discord’s servers.
• In line 6, we say we are defining an event for our client. This line is a Python decorator, which
will take the function directly below it and modify it in some way. The Discord bot is going to
run asynchronously, which might be a bit confusing if you’re used to running standard Python.
We won’t go into asynchronous Python in depth here, but if you’re interested in what this is
and why it’s used, there’s a good guide over at FreeCodeCamp¹⁰⁹. In short, instead of running
the code in our file from top to bottom, we’ll be running pieces of code in response to specific
events.
¹⁰⁹https://medium.freecodecamp.org/a-guide-to-asynchronous-programming-in-python-with-asyncio-232e2afa44f6
• In lines 7-9 we define what kind of event we want to respond to, and what the response should
be. In this case, we’re saying that in response to the on_ready event (when our bot joins a server
successfully), we should output some information server-side (i.e. this will be displayed in our
Repl’s output, but not sent as a message through to Discord). We’ll print a simple I'm in
message to see that the bot is there and print our bot’s user id (if you’re running multiple bots,
this will make it easier to work out who’s doing what).
• Lines 11-14 are similar, but instead of responding to an on_ready event, we tell our bot how
to handle new messages. Line 13 says we only want to respond to messages that aren’t from
us (otherwise our bot will keep responding to itself – you can remove this line to see why
that’s a problem), and line 14 says we’ll send a new message to the same channel where we
received a message (message.channel) and the content we’ll send will be the same message
that we received, but backwards (message.content[::-1] - ::-1 is a slightly odd but useful
Python idiom to reverse a string or list).
The last two lines get our secret token from the environment variables that we set up earlier and
then tell our bot to start up.
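If the decorator and the event-driven style still feel abstract, the toy below imitates the pattern using only the standard library. MiniClient is a hypothetical stand-in written for this example and is not how discord.py is implemented; it simply stores each decorated coroutine under its function name and runs it when that event is dispatched.

```python
import asyncio


class MiniClient:
    """Toy stand-in for a Discord client: stores handlers by function name."""

    def __init__(self):
        self.handlers = {}

    def event(self, coro):
        # Used as a decorator: register the coroutine, then return it unchanged.
        self.handlers[coro.__name__] = coro
        return coro

    async def dispatch(self, name, *args):
        # Run the handler registered for this event, if there is one.
        handler = self.handlers.get(name)
        if handler is not None:
            return await handler(*args)


client = MiniClient()


@client.event
async def on_message(text):
    return text[::-1]  # reverse the incoming text


print(asyncio.run(client.dispatch("on_message", "hello")))
```

Nothing in on_message runs until an event with that name arrives, which is the same idea the real bot relies on.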
Press the big green “Run” button again and you should see your bot reporting a successful channel
join in the Repl output.
Open Discord, and from within the server we created earlier, select your ReplBotApplication from
the pane on the right-hand side of the screen.
Once you have selected this, you will be able to send a message (by typing into the box highlighted
below) and see your bot respond!
The bot responds each time, reversing the text we enter.
¹¹⁰http://flask.pocoo.org/
We won’t go over this in detail as it’s not central to our bot, but here we start a web server that
will return “I’m alive” if anyone visits it, and we’ll provide a method to start this in a new thread
(leaving the main thread for our Repl bot).
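For readers who want to see the idea in isolation, here is a minimal sketch of a keep-alive server running in a background thread. It swaps in Python's built-in http.server so that it runs without extra packages, binds to localhost on a free port, and shuts itself down after one request; the AliveHandler and fetch names are made up for this example, and only the keep_alive name is borrowed from the tutorial.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class AliveHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a short plain-text body."""

    def do_GET(self):
        body = b"I'm alive"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def keep_alive(port=0):
    """Start the web server in a background thread and return it."""
    server = HTTPServer(("127.0.0.1", port), AliveHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port):
    """Request the home page once and return the response body as text."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/")
    text = conn.getresponse().read().decode()
    conn.close()
    return text


server = keep_alive()
print(fetch(server.server_address[1]))
server.shutdown()
server.server_close()
```

Because the server lives in its own thread, the main thread stays free, which is what lets the bot's client.run(token) call run alongside it.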
In our main.py file, we need to add an import for this server at the top. Add the following line near
the top of main.py.
In main.py we also need to start the web server just before we start the bot. Add the keep_alive()
call just before the line with token = os.environ.get("DISCORD_BOT_SECRET"), so that the last
three lines of main.py read:
keep_alive()
token = os.environ.get("DISCORD_BOT_SECRET")
client.run(token)
After doing this and hitting the green “Run” button again, you should see some changes to your Repl.
For one, you’ll see a new pane in the top right which shows the web output from your server. We
can see that visiting our Repl now returns a basic web page showing the “I’m alive” string that we
told our web server to return by default. In the bottom-right pane, you can also see some additional
output from Flask starting up and running continuously, listening for requests.
Now your bot will stay alive even after closing your browser or shutting down your development
machine. Repl will still clean up your server and kill your bot after about one hour of inactivity,
so if you don’t use your bot for a while, you’ll have to log into Repl and start the bot up again.
Alternatively, you can set up a third-party (free!) service like Uptime Robot¹¹¹. Uptime Robot pings
your site every 5 minutes to make sure it’s still working – usually to notify you of unexpected
downtime, but in this case, the constant pings have the side effect of keeping our Repl alive as it will
never go more than an hour without receiving any activity.
Let’s get through these admin steps first and then we can get to the fun part of coding our bot.
You can also rename the application and provide a description for your bot at this point and press
“Save Changes”.
You have now created a Discord application. The next step is to add a bot to this application, so head
over to the “Bot” tab using the menu on the left and press the “Add Bot” button, as indicated below.
Click “Yes, do it” when Discord asks if you’re sure about bringing a new bot to life.
The last thing we’ll need from our bot is a Token. Anyone who has the bot’s token can prove that
they own the bot, so you’ll need to be careful not to share this with anyone. You can get the token by
pressing “Click to Reveal Token”, or copy it to your clipboard without seeing it by pressing “Copy”.
Take note of your token or copy it to your clipboard, as we’ll need to add it to our code soon.
Press “Create a server” in the screen that follows, and then give your server a name. Once the server
is up and running, you can chat with yourself, or invite some friends to chat with you. Soon we’ll
invite our bot to chat with us as well.
Select the server we created in the step before this and hit the “authorize” button. After completing
the captcha, you should get an in-app Discord notification telling you that your bot has joined your
server.
Now we can get to the fun part of building a brain for our bot!
Press the “Run” button and you should see Repl.it installing the Discord library in the output pane
on the right, as in the image below.
¹²²https://repl.it
¹²³https://discord.js.org/
¹²⁴https://www.npmjs.com/
Our bot is nearly ready to go – but we still need to plug in our secret token. This will authorize our
code to control our bot.
DISCORD_BOT_SECRET=NDcUN5T32zcTjMYOM0Y1MTUy.Dk7JBw.ihrTSAO1GKHZSonqvuhtwta16WU
¹²⁵https://www.digitalocean.com/community/tutorials/how-to-read-and-set-environmental-and-shell-variables-on-a-linux-vps
• Replace the token above (after the = sign) with the token that Discord gave you when creating
your own bot.
• Be careful about spacing. If you put a space on either side of the = in your .env file, these
spaces will be part of the variable name or the value, so make sure you don’t have any spaces
around the = or at the end of the line.
• Run the code again. Sometimes you’ll need to refresh the whole page to make sure that your
environment variables are successfully loaded.
In the image below we’ve highlighted the “Add file” button, the new file (.env) and how to define
the secret token for our bot’s use.
Let’s make a Discord bot that repeats everything we say but in reverse. We can do this in only a few
lines of code. In your index.js file, add the following:
• Line 1 is what we had earlier. This line both tells Repl.it to install the third party library and
brings it into this file so that we can use it.
• In line 2, we create a Discord Client. We’ll use this client to send commands to Discord’s
servers to control our bot.
• In line 3 we retrieve our secret token from the environment variables (which Repl.it sets from
our .env file).
• In line 5, we define an event for our client, which defines how our bot should react to the
“ready” event. The Discord bot is going to run asynchronously, which might be a bit confusing
if you’re used to running standard synchronous code. We won’t go into asynchronous coding
in depth here, but if you’re interested in what this is and why it’s used, there’s a good guide
over at RisingStack¹²⁶. In short, instead of running the code in our file from top to bottom, we’ll
be running pieces of code in response to specific events.
• In lines 6-8 we define how our bot should respond to the “ready” event, which is fired when
our bot successfully joins a server. We instruct our bot to output some information server side
(i.e. this will be displayed in our Repl’s output, but not sent as a message through to Discord).
We’ll print a simple I'm in message to see that the bot is there and print our bot’s username
(if you’re running multiple bots, this will make it easier to work out who’s doing what).
• Lines 10-14 are similar, but instead of responding to a “ready” event, we tell our bot how
to handle new messages. Line 11 says we only want to respond to messages that aren’t from
us (otherwise our bot will keep responding to itself – you can remove this line to see why
that’s a problem), and line 12 says we’ll send a new message to the same channel where we
received a message (msg.channel) and the content we’ll send will be the same message that we
received, but backwards. To reverse a string, we split it into its individual characters, reverse
the resulting array, and then join it all back into a string again.
The last line fires up our bot and uses the token we loaded earlier to log into Discord.
Press the big green “Run” button again and you should see your bot reporting a successful channel
join in the Repl output.
¹²⁶https://blog.risingstack.com/node-hero-async-programming-in-node-js/
Open Discord, and from within the server we created earlier, select your ReplBotApplication from
the pane on the right-hand side of the screen.
Once you have selected this, you will be able to send a message (by typing into the box highlighted
below) and see your bot respond!
The bot responds each time, reversing the text we enter.
We won’t go over this in detail as it’s not central to our bot, but here we start a web server that will
return “I’m alive” if anyone visits it.
In our index.js file, we need to add a require statement for this server at the top. Add the following
line near the top of index.js.
After doing this and hitting the green “Run” button again, you should see some changes to your
Repl. For one, you’ll see a new pane in the top right which shows the web output from your server.
We can see that visiting our Repl now returns a basic web page showing the “I’m alive” string that
we told our web server to return by default.
Now your bot will stay alive even after closing your browser or shutting down your development
machine. Repl will still clean up your server and kill your bot after about one hour of inactivity,
so if you don’t use your bot for a while, you’ll have to log into Repl and start the bot up again.
Alternatively, you can set up a third-party (free!) service like Uptime Robot¹²⁷. Uptime Robot pings
your site every 5 minutes to make sure it’s still working – usually to notify you of unexpected
downtime, but in this case the constant pings have the side effect of keeping our Repl alive as it will
never go more than an hour without receiving any activity. Note that you need to select the HTTP
option instead of the Ping option when setting up Uptime Robot, as Repl.it requires regular HTTP
requests to keep your chatbot alive.
¹²⁷https://uptimerobot.com/
In this tutorial, we’ll be using Django to create an online service that shows visitors their current
weather and location. We’ll develop the service and host it using repl.it¹³³.
To work through this tutorial, you should ideally have basic knowledge of Python and some
knowledge of web application development. However, we’ll explain all of our reasoning and each
line of code thoroughly, so if you have any programming experience you should be able to follow
along as a complete Python or web app beginner too. We’ll also be making use of some HTML,
JavaScript, and jQuery, so if you have been exposed to these before you’ll be able to work through
more quickly. If you haven’t, this will be a great place to start.
To display the weather at the user’s current location, we’ll have to tie together a few pieces. The
main components of our system are:
By using this tutorial as a starting point, you can easily create your own bespoke web applications.
Instead of showing weather data to your visitors, you could, for example, pull and combine data
from any of the hundreds of APIs found at this list of public APIs¹³⁸.
Setting up
You won’t need to install any software or programming languages on your local machine, as we’ll
be developing our application directly through repl.it¹³⁹. Head over there and create an account.
Press the + button in the top right to create a new project and search for “Django Template”. Give
your project a name and press “Create repl”.
By default, Django comes with a pretty complicated folder structure of existing files and folders.
There’s also a README.md file that will open by default, giving you some guidance on how to find
your way around.
¹³⁸https://github.com/toddmotto/public-apis
¹³⁹repl.it
Creating and hosting a basic web application with Django and Repl.it 192
We won’t explain what all these different components are for and how they tie together in this
tutorial. If you want to understand Django better, you should go through their official tutorial¹⁴⁰.
In this tutorial, we’ll just look at the few files that we need to modify to get our basic application
working.
Hit the Run button in the bar at the top and you’ll see Repl.it install all of the required packages and
start up the default Django app.
¹⁴⁰https://docs.djangoproject.com/en/2.0/intro/tutorial01/
Let’s add the HTML templates that we will use to render our static page. Create a file called
base.html within the newly created templates folder and add the following code.
{% load static %}
<!DOCTYPE html>

<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Hello World!</title>
    <meta charset="UTF-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1"/>
    <link rel="stylesheet" href="{% static "css/style.css" %}">
</head>
<body>
    {% block content %}{% endblock content %}
</body>
</html>
The above is a basic HTML template that our Django app will use when rendering pages. We also
link to a stylesheet that Django will load from a folder called static/css, which we will create soon.
Note the {% load static %} in the first line: this tells Django that we are using static files
(i.e. style.css) in this template.
Still in the templates folder, create a file called index.html and add the following code.
{% extends "base.html" %}

{% block content %}
    <h1>Hello World!</h1>
{% endblock content %}
This is a file written in Django’s template language¹⁴¹, which often looks very much like HTML (and
is usually found in files with a .html extension), but which contains some extra functionality to help
us load dynamic data from the back end of our web application into the front end for our users to
see.
The above extends the base.html template and adds the block content to it, in this case “Hello
World!”.
Django looks for templates folders within “app” folders by default. This is helpful when you have
multiple apps within your project, but in this case we don’t, so we need to tell Django where to find
our templates folder.
Open the mysite/settings.py file, scroll down to TEMPLATES and add os.path.join(BASE_DIR,
'templates') within the square brackets next to 'DIRS':.
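After the change, the relevant part of settings.py might look roughly like the sketch below. The BASE_DIR line is an assumption added so that the snippet runs on its own (the generated settings.py already defines BASE_DIR), and the other keys that Django generates, such as OPTIONS, are omitted here.

```python
import os

# Assumed for illustration; Django's generated settings.py defines its own BASE_DIR.
BASE_DIR = os.path.dirname(os.path.abspath("manage.py"))

TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [os.path.join(BASE_DIR, "templates")],  # the entry we add
        "APP_DIRS": True,
    },
]

print(TEMPLATES[0]["DIRS"])
```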
¹⁴¹https://docs.djangoproject.com/en/2.0/topics/templates/
Now that we have our templates added, let’s add the folders and files for adding the stylesheet.
Create a folder called static and also create a folder called css within the static folder.
Create a file called style.css within the static/css/ directory and add the following code.
body {
    background-color: lightblue;
}

h1 {
    color: navy;
    margin-left: 20px;
}
Above we add basic CSS code to demonstrate how you can modify the look of your site.
Django handles static files much like templates: it automatically checks each app directory for a
directory called static. Since we only have a single static page and no apps, we need to tell Django
where our static directory is located.
Open the mysite/settings.py file, scroll all the way down and add the following code right after
STATIC_URL ='/static/'
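One common way to do this is with Django's STATICFILES_DIRS setting, as in the sketch below. Treat it as an assumption about what the addition looks like rather than the exact original snippet; the BASE_DIR line is included only so the example runs on its own.

```python
import os

# Assumed for illustration; Django's generated settings.py defines its own BASE_DIR.
BASE_DIR = os.path.dirname(os.path.abspath("manage.py"))

STATIC_URL = "/static/"
STATICFILES_DIRS = [
    os.path.join(BASE_DIR, "static"),  # the project-level static folder
]

print(STATICFILES_DIRS)
```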
Within the mysite/ directory, create a file called views.py and add the following code.
A view function in Django is a Python function that takes Web requests and returns a Web response.
This is where you add the logic that will return a certain response when called. In our case we define
a view that will return the index.html page.
To call this view function we need to add it to our url patterns. Open the mysite/urls.py file and
replace the contents with the below code.
Note that we import the views file created earlier with from . import views. We then add a URL pattern
with an empty path '' that points to the home view created earlier. The admin/ path points to the
admin page that comes with Django by default.
When Django receives a request, it goes through the urlpatterns list until it finds a match. In our
case <url>.com will match the first path and return the home view, which renders the index.html
page. If we navigate to <url>.com/admin/, Django will match the admin/ path and
return the admin page.
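The first-match behaviour can be pictured with a toy resolver in plain Python. This is a simplified model and not Django's actual resolver: the home and admin functions are placeholders, and patterns are compared by exact equality, whereas Django supports prefixes, converters and included URL configurations.

```python
def home(request):
    return "index.html page"


def admin(request):
    return "admin page"


# Checked in order, like Django's urlpatterns list.
urlpatterns = [
    ("", home),
    ("admin/", admin),
]


def resolve(path):
    """Return the view for the first pattern equal to the path, else None."""
    for pattern, view in urlpatterns:
        if path == pattern:
            return view
    return None


print(resolve("")(None))
print(resolve("admin/")(None))
```

A path that matches no pattern returns None here; in Django the equivalent outcome is a 404 response.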
Restart your server and refresh the web page on the right. You should see our “Hello World!” page.
Great, we have now put all the pieces in place for our “Hello World!” web page. Let’s expand this
and start building our weather app.
Open the templates/index.html file and change the code where it says “Hello World!” to read
“Weather” like below.
{% extends "base.html" %}

{% block content %}
    <h1>Weather</h1>
{% endblock content %}
Click the refresh button as indicated below to see the result change from “Hello World!” to “Weather”.
You can also press the pop-out button to the right of the URL bar to open only the resulting web
page that we’re building, as a visitor would see it. You can share the URL with anyone and they’ll
be able to see your Weather website already!
Changing the static text that our visitors see is a good start, but our web application still doesn’t do
anything. We’ll change that in the next step by using JavaScript to get our user’s IP Address.
1 {% load static %}
2 <!DOCTYPE html>
3
4 <html lang="en">
5 <head>
6 <meta charset="UTF-8">
7 <title>Hello World!</title>
8 <meta charset="UTF-8"/>
9 <meta name="viewport" content="width=device-width, initial-scale=1"/>
10 <link rel="stylesheet" href="{% static "css/style.css" %}">
11 </head>
12 <body>
13 {% block content %}{% endblock content %}
14 </body>
15 </html>
¹⁴²https://dyn.com/blog/finding-yourself-the-challenges-of-accurate-ip-geolocation/
¹⁴³https://www.ipify.org/
The “head” section of this template is between lines 5 and 11 – the opening and closing <head> tags.
We’ll add our scripts directly below the <link ...> on line 10 and above the closing </head> tag on
line 11. Modify this part of the code to add the following lines:
1 <script>
2 function use_ip(json) {
3 alert("Your IP address is: " + json.ip);
4 }
5 </script>
6
7 <script src="https://api.ipify.org?format=jsonp&callback=use_ip"></script>
These are two snippets of JavaScript. The first (lines 1-5) is a function that when called will display
a pop-up box (an “alert”) in our visitor’s browser showing their IP address from the json object that
we pass in. The second (line 7) loads an external script from ipify’s API and asks it to pass data
(including our visitor’s IP address) along to the use_ip function that we provide.
If you open your web app again and refresh the page, you should see this script in action (if you’re
running an adblocker, it might block the ipify scripts, so try disabling that temporarily if you have
any issues).
This code doesn’t do anything with the IP address except display it to the user, but it is enough
to see that the first component of our system (getting our user’s IP Address) is working. This also
introduces the first dynamic functionality to our app – before, any visitor would see exactly the same
thing, but now we can show each visitor something related specifically to them (IP addresses usually
differ between visitors, although people on the same network often share one).
Now, instead of simply showing this IP address to our visitor, we’ll modify the code to pass it
along to our Repl webserver (the “backend” of our application), so that we can use it to fetch location
information through a different service.
We’ve added a definition for the get_weather_from_ip route, telling our app that if anyone visits
https://django-weather-tutorial-eugenedorfling.ritza.repl.co/get_weather_from_ip¹⁴⁴ then we should
trigger a function in our views.py file that is also called get_weather_from_ip. Let’s write that
function now.
In your views.py file, add a get_weather_from_ip() function beneath the existing home() one, and
add an import for JsonResponse on line 2. Your whole views.py file should now look like this:
By default, Django passes a request argument to all views. This is an object that contains informa-
tion about our user and the connection, and any additional arguments passed in the URL. As our
application isn’t connected to any weather services yet, we’ll just make up a temperature (20) and
pass that back to our user as JSON.
In line 10, we print out the IP address that we will pass along to this route from the GET arguments
(we’ll look at how to use this later). We then create the fake data (which we’ll later replace with
real data) and return a JSON response (the data in a format that a computer can read more easily,
with no formatting). We return JSON instead of HTML because our system is going to use this route
internally to pass data between the front and back ends of our application, but we don’t expect our
users to use this directly.
To test this part of our system, open your web application in a new tab and add
/get_weather_from_ip?ip_address=123 to the URL. Here, we’re asking our system to fetch weather data for the
IP address 123 (not a real IP address). In your browser, you’ll see the fake weather data displayed in
a format that can easily be programmatically parsed.
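The round trip described above can be imitated without Django. The sketch below uses only the standard library: the fake_weather_response() helper and the example.repl.co address are hypothetical, and the function simply reads the ip_address argument from the URL and returns the made-up temperature as JSON.

```python
import json
from urllib.parse import urlparse, parse_qs

def fake_weather_response(url):
    """Read ip_address from the query string and return made-up weather data as JSON."""
    query = parse_qs(urlparse(url).query)
    # parse_qs returns a list per key, so take the first value if one was sent.
    ip_address = query.get("ip_address", [None])[0]
    print(ip_address)  # similar to the backend printing what it received
    return json.dumps({"weather_data": 20})

body = fake_weather_response(
    "https://example.repl.co/get_weather_from_ip?ip_address=123")
print(body)
```

The returned string is plain JSON, which is why the browser shows it without any formatting.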
In our Repl’s output, we can see that the backend of our application has found the “IP address” and
printed it out, between some other outputs telling us which routes are being visited and which port
our server is running on:
• pass the user’s real IP address to our new route in the background when the user visits our
main page
• add more backend logic to fetch the user’s location from the IP address
• add logic to fetch the user’s weather from their location
• display this data to the user.
Let’s start by using Ajax¹⁴⁵ to pass the user’s IP address that we collected before to our new route,
without our user having to explicitly visit the get_weather_from_ip endpoint or refresh their page.
¹⁴⁵https://en.wikipedia.org/wiki/Ajax_(programming)
Note: usually you wouldn’t add JavaScript directly to your base.html template, but to
keep things simpler and to avoid creating too many files, we’ll be diverging from some
good practices. See the Django documentation¹⁴⁷ for some guidance on how to structure
JavaScript properly in larger projects.
In your templates/base.html file, add the following script above the line where we previously
defined the use_ip() function.
<script
  src="https://code.jquery.com/jquery-3.3.1.min.js"
  integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8="
  crossorigin="anonymous"></script>
This loads the entire jQuery library from a CDN¹⁴⁸, allowing us to complete certain tasks using fewer
lines of JavaScript.
Now, modify the use_ip() script that we wrote before to call our backend route using Ajax. The
new use_ip() function should be as follows:
1 function use_ip(json) {
2   $.ajax({
3     url: "{% url 'get_weather_from_ip' %}",
4     data: {"ip": json.ip},
5     dataType: 'json',
6     success: function (data) {
7       document.getElementById("weatherdata").innerHTML = data.weather_data;
8     }
9   });
10 }
¹⁴⁶http://api.jquery.com/jquery.ajax/
¹⁴⁷https://docs.djangoproject.com/en/2.0/howto/static-files/
¹⁴⁸https://www.cloudflare.com/learning/cdn/what-is-a-cdn/
Our new use_ip() function makes an asynchronous¹⁴⁹ call to our get_weather_from_ip route, sending
along the IP address that we previously displayed in a pop-up box. If the call is successful, we
call a new function (in the success: section) with the returned data. This new function (line 7) looks
for an HTML element with the ID of weatherdata and replaces the contents with the weather_data
attribute of the response that we received from get_weather_from_ip (which at the moment is still
hardcoded to be “20”).
To see the results, we’ll need to add an HTML element as a placeholder with the id weatherdata. Do
this in the templates/index.html file as follows.
{% extends "base.html" %}

{% block content %}
    <h1>Weather</h1>
    <p id="weatherdata"></p>
{% endblock %}
This adds an empty HTML paragraph element which our JavaScript can populate once it has the
required data.
Now reload the app and you should see our fake 20 being displayed to the user. If you don’t see what
you expect, open up your browser’s developer tools (for Chrome¹⁵⁰ and Firefox¹⁵¹) and have a look
at the Console section for any JavaScript errors. A clean console (with no errors) is shown below.
¹⁴⁹http://api.jquery.com/jquery.ajax/
¹⁵⁰https://developers.google.com/web/tools/chrome-devtools/
¹⁵¹https://developer.mozilla.org/son/docs/Tools
Now it’s time to change out our mock data for real data by calling two services from the backend – the first
to get the user’s location from their IP address and the second to fetch the weather for that location.
import requests

def get_location_from_ip(ip_address):
    response = requests.get("http://ip-api.com/json/{}".format(ip_address))
    return response.json()
Note: again we are diverging from best practice in the name of simplicity. Usually
whenever you write any code that relies on networking (as above), you should add
exception handling¹⁵³ so that your code can fail more gracefully if there are problems.
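As a rough illustration of what that exception handling could look like, here is a sketch that swaps the requests library for Python's built-in urllib so that it runs without extra dependencies. The names fetch_json() and get_location_from_ip_safely() are hypothetical, and returning an empty dictionary on failure is just one possible choice.

```python
import json
import urllib.error
import urllib.request

def fetch_json(url, timeout=5):
    """Download a URL and decode its JSON body using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

def get_location_from_ip_safely(ip_address, fetch=fetch_json):
    """Return the location data, or an empty dict if the lookup fails."""
    url = "http://ip-api.com/json/{}".format(ip_address)
    try:
        return fetch(url)
    except (urllib.error.URLError, ValueError, OSError) as error:
        # Network problems, timeouts, and malformed JSON all end up here.
        print("Location lookup failed:", error)
        return {}
```

With this shape, the calling code can use location.get("city") as before and simply receives None when the lookup did not succeed. The fetch parameter exists only so that the function can be tried out without a network connection.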
You can see the response that we’ll be getting from this service by trying it out in your browser. Visit
http://ip-api.com/json/41.71.107.123¹⁵⁴ to see the JSON response for that specific IP address.
¹⁵²http://ip-api.com
¹⁵³https://docs.python.org/3/tutorial/errors.html
¹⁵⁴http://ip-api.com/json/41.71.107.123
Take a look specifically at the highlighted location information that we’ll need to extract to pass on
to a weather service.
Before we set up the weather component, let’s display the user’s current location data instead of the
hardcoded temperature that we had before. Change the get_weather_from_ip() function to call our
new function and pass along some useful data as follows:
def get_weather_from_ip(request):
    ip_address = request.GET.get("ip")
    location = get_location_from_ip(ip_address)
    city = location.get("city")
    country_code = location.get("countryCode")
    s = "You're in {}, {}".format(city, country_code)
    data = {"weather_data": s}
    return JsonResponse(data)
Now, instead of just printing the IP address that we get sent and making up some weather data,
we use the IP address to guess the user’s location, and pass the city and country code back to the
template to be displayed. If you reload your app again, you should see something similar to the
following (though hopefully with your location instead of mine).
That’s the location component of our app done and dusted – let’s move on to getting weather data
for that location now.
This key is a bit like a password – when we use OpenWeatherMap’s service, we’ll always send along
this key to indicate that it’s us making the call. Because Repl.it’s projects are public by default, we’ll
need to be careful to keep this key private and prevent other people making too many calls using our
OpenWeatherMap quota (potentially making our app fail when OpenWeatherMap starts blocking
our calls). Luckily Repl.it provides a neat way of solving this problem using .env files¹⁵⁸.
In your project, create a new file using the “New file” button as shown below. Make sure that the file
is in the root of your project and that you name the file .env (in Linux, starting a filename with a .
usually indicates that it’s a system or configuration file). Inside this file, define the OPEN_WEATHER_TOKEN
variable as follows, but using your own token instead of the fake one below. Make sure not to
have a space on either side of the = sign.
OPEN_WEATHER_TOKEN=1be9250b94bf6803234b56a87e55f
¹⁵⁵https://openweathermap.org/
¹⁵⁶https://openweathermap.org/
¹⁵⁷https://home.openweathermap.org/api_keys
¹⁵⁸https://repl.it/site/docs/secret-keys
Repl.it will load the contents of this file into our server’s environment variables¹⁵⁹. We’ll be able to
access this using the os library in Python, but when other people view or fork our Repl, they won’t
see the .env file, keeping our API key safe and private.
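Reading the variable back in Python might look like the sketch below. The get_weather_token() helper is a hypothetical name; os.environ.get() is used so that a missing variable gives None instead of raising an error.

```python
import os

def get_weather_token():
    """Read the API key from the environment; None means it was not loaded."""
    return os.environ.get("OPEN_WEATHER_TOKEN")

token = get_weather_token()
if token is None:
    print("OPEN_WEATHER_TOKEN is not set; check the .env file and reload the Repl")
```

Checking for None early makes a missing or misnamed key easier to spot than a failed call to the weather service later on.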
To fetch weather data, we need to call the OpenWeatherMap API, passing along a search term. To
make sure we’re getting the city that we want, it’s good to pass along the country code as well as the
city name. For example, to get the weather in London right now, we can visit (again, you’ll need to
add your own API key in place of the string after appid=) https://api.openweathermap.org/data/2.5/weather?q=Londo
To test this, you can visit the URL in your browser first. If you prefer Fahrenheit to Celsius, simply
change the units=metric part of the URL to units=imperial.
¹⁵⁹https://wiki.archlinux.org/index.php/environment_variables
Let’s write one last function in our views.py file to replicate this call for our visitor’s city which we
previously displayed.
First we need to add an import for the Python os (operating system) module so that we can access
our environment variables. At the top of views.py add:
import os
In line 2, we get our API key from the environment variables (note that you sometimes need to refresh
the Repl.it page containing your repl to load the environment variables properly), and we then use
this to format our URL properly in line 3. We get the response from OpenWeatherMap and return
it as JSON.
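The URL formatting step can be tried on its own, without calling the service. This is only a sketch: build_weather_url() is a hypothetical helper, and it assumes the q, units and appid query parameters with the city and country code joined by a comma.

```python
from urllib.parse import urlencode

def build_weather_url(city, country_code, token, units="metric"):
    """Build a current-weather URL for a city and country code (assumed parameter names)."""
    query = urlencode({
        "q": "{},{}".format(city, country_code),  # e.g. "London,GB"
        "units": units,                           # "metric" or "imperial"
        "appid": token,                           # the key stored in .env
    })
    return "https://api.openweathermap.org/data/2.5/weather?" + query

print(build_weather_url("London", "GB", "YOUR_TOKEN"))
```

Using urlencode() also takes care of escaping city names that contain spaces or other special characters.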
We can now use this function in our get_weather_from_ip() function by modifying it to look as
follows:
1 def get_weather_from_ip(request):
2     ip_address = request.GET.get("ip")
3     location = get_location_from_ip(ip_address)
4     city = location.get("city")
5     country_code = location.get("countryCode")
6     weather_data = get_weather_from_location(city, country_code)
7     description = weather_data['weather'][0]['description']
8     temperature = weather_data['main']['temp']
9     s = "You're in {}, {}. You can expect {} with a temperature of {} degrees".format(\
10         city, country_code, description, temperature)
11     data = {"weather_data": s}
12     return JsonResponse(data)
We now get the weather data in line 6, parse this into a description and temperature in lines 7 and 8,
and add this to the string we pass back to our template in line 9. If you reload the page, you should
see your location and your weather.
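The parsing step can be checked against a hand-written response. The sample_weather_data dictionary below is made up and contains only the two fields that our view reads, and summarise_weather() is a hypothetical helper that repeats the same lookups.

```python
# A made-up response containing only the fields used by the view.
sample_weather_data = {
    "weather": [{"description": "light rain"}],
    "main": {"temp": 14.2},
}

def summarise_weather(city, country_code, weather_data):
    """Pull the description and temperature out of the response and build the message."""
    description = weather_data["weather"][0]["description"]
    temperature = weather_data["main"]["temp"]
    return "You're in {}, {}. You can expect {} with a temperature of {} degrees".format(
        city, country_code, description, temperature)

print(summarise_weather("London", "GB", sample_weather_data))
```

Note that "weather" is a list, which is why we index it with [0] before reading the description.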
and hit the “Fork” button. If you didn’t create an account at the beginning of this tutorial, you’ll be
prompted to create one. (You can even use a lot of Repl functionality without creating an account.)
Forking a Repl
• Make the page look nicer by using Bootstrap¹⁶⁰ or another CSS framework in your template
files.
• Make the app more customizable by allowing the user to choose their own location if the IP
location that we guess is wrong
• Make the app more useful by showing the weather forecast along with the current weather.
(This data is also available¹⁶¹ from Open Weather Map).
• Add other location-related data to the web app such as news, currency conversion, translation,
postal codes. See https://github.com/toddmotto/public-apis#geocoding¹⁶² for a nice list of
possibilities.
In the next chapter, we’ll be looking at building our own CRM app with NodeJS and Repl.it. This
tutorial will also introduce you to setting up a MongoDB database and creating a user interface.
¹⁶⁰https://getbootstrap.com/
¹⁶¹https://openweathermap.org/forecast5
¹⁶²https://github.com/toddmotto/public-apis#geocoding
Building a CRM app with NodeJS, Repl.it, and MongoDB
In this tutorial, we’ll use NodeJS on Repl.it, along with a MongoDB database to build a basic
CRUD¹⁶³ (Create, Read, Update, Delete) CRM¹⁶⁴ (Customer Relationship Management) application.
A CRM lets you store information about customers to help you track the status of every customer
relationship. This can help businesses keep track of their clients and ultimately increase sales. The
application will be able to store and edit customer details, as well as keep notes about them.
This tutorial won’t be covering the basics of Node.js, but each line of code will be explained in detail.
By the end of the tutorial, the application you will have created will be able to create, update, and
delete documents in a MongoDB database. You will also have used a web application framework
called Express¹⁶⁶ and the Pug¹⁶⁷ templating engine.
After signing up, under “Shared Clusters”, press the “Create a Cluster” button.
You now have to select a provider and a region. For the purposes of this tutorial, we chose Google
Cloud Platform as the provider and Iowa (us-central1) as the region, although it should work
regardless of the provider and region.
Under “Cluster Name” you can change the name of your cluster. Note that you can only change the
name now - it can’t be changed once the cluster is created. After you’ve done that, click “Create
Cluster”.
After a bit of time, your cluster will be created. Once it’s available, click on “Database Access” under
the Security heading in the left-hand column and then click “Add New Database User”. You need a
database user to actually store and retrieve data. Enter a username and password for the user and
make a note of those details - you’ll need them later. Select “Read and write to any database” as the
user privilege. Hit “Add User” to complete this step.
Next, you need to allow network access to the database. Click on “Network Access” in the left-
hand column, and “Add IP Address”. Because we won’t have a static IP from Repl.it, we’re just
going to allow access from anywhere - don’t worry, the database is still secured with the username
and password you created earlier. In the popup, click “Allow Access From Anywhere” and then
“Confirm”.
Now select “Clusters”, under “Data Storage” in the left-hand column. Click on “Connect” and select
“Connect Your Application”. This will change the pop-up view. Copy the “Connection String” as
you will need it shortly to connect to your database from Repl.it. It will look something like this:
mongodb+srv://<username>:<password>@cluster0-zrtwi.gcp.mongodb.net/test?retryWrites=true&w=majority
MONGO_USERNAME=username
MONGO_PASSWORD=password
• Replace username and password with your database username and password
• Spacing matters. Make sure that you don’t add any spaces before or after the = sign
Now that we have credentials set up for the database, we can move on to connecting to it in our
code.
MongoDB is kind enough to provide a client that we can use. To test out our database connection,
we’re going to insert some customer data into our database. In your index.js file (created automat-
ically and found under the Files pane), add the following code:
Let’s break this down to see what is going on and what we still need to change:
• Line 1 adds the dependency for the MongoDB Client. As we have discussed before, Repl.it
makes things easy by installing all the dependencies for us, so we don’t have to use something
like npm to do it manually.
• Lines 2 & 3 retrieve our MongoDB username and password from the environment variables
that we set up earlier.
• Line 5 has a few very important details that we need to get right.
– Replace the section between the @ and the next / with the same section of your connection
string from MongoDB that we copied earlier. You may notice the ${mongo_username} and
${mongo_password} before and after the colon near the beginning of the string. These are
called Template Literals. Template Literals allow us to put variables in a string, which
Node.js will then helpfully replace with the actual values of the variables.
– Note crmdb after the / and before the ?. This will be the name of the database that we will
be using. MongoDB creates the database for us if it doesn’t exist. You can change this to
whatever you want to name the database, but remember what you changed it to for future
sections of this tutorial.
• Line 6 creates the client that we will use to connect to the database.
framework is designed to support the development of web applications - it gives you a standard way
to build your application and lets you get to building your application fast without having to do the
boilerplate code.
A really simple, fast and flexible Node.js web application framework is Express¹⁶⁹, which provides
a robust set of features for the development of web applications.
The first thing we need to do is add the dependencies we need. Right at the top of your index.js file
(above the MongoDB code), add the following lines:
• Line 1 adds the dependency for Express. Repl.it will take care of installing it for us.
• Line 2 creates a new Express app that will be needed to handle incoming requests.
• Line 3 adds a dependency for ‘body-parser’. This is needed for the Express server to be able to
handle the data that the form will send, and give it to us in a useful format to use in the code.
• Line 4 adds a dependency for a basic HTTP server.
• Lines 6 & 7 tell the Express app which parsers to use on incoming data. This is needed to handle
form data.
Next, we need to add a way for Express to handle an incoming request and give us the form that
we want. Add the following lines of code below the ones you just added:
• '/' tells Express that it should respond to GET requests sent to the root URL. A root URL looks
something like ‘https://crm.hawkiesza.repl.co’ - note that there are no slashes after the URL.
• '/create' tells Express that it should respond to GET requests to /create after the root URL i.e.
‘https://crm.hawkiesza.repl.co/create’
• res.sendFile tells Express to send the given file as a response.
Before the server will start receiving requests and sending responses, we need to tell it to run. Add
the following code below the previous line.
• Line 1 tells Express to set the port number to either a number defined as an environment
variable, or 5000 if no definition was made.
• Lines 2-4 tell the server to start listening for requests.
Now we have an Express server listening for requests, but we haven’t yet built the form that it needs
to send back if it receives a request.
Make a new file called index.html and paste the following code into it:
<!DOCTYPE html>
<html>
<body>
  <form action="/create" method="GET">
    <input type="submit" value="Create">
  </form>

</body>
</html>
This is just a simple bit of HTML that puts a single button on the page. When this button is clicked
it sends a GET request to /create, which the server will then respond to according to the code that
we wrote above - in our case it will send back the create.html file which we will define now.
Make a new file called create.html and paste the following into it:
<!DOCTYPE html>
<html>
<body>

<h2>Customer details</h2>

<form action="/create" method="POST">
  <label for="name" >Customer name *</label><br>
  <input type="text" id="name" name="name" class="textInput" placeholder="John Smith" required>
  <br>
  <label for="address" >Customer address *</label><br>
  <input type="text" name="address" class="textInput" placeholder="42 Wallaby Way, Sydney" required>
  <br>
  <label for="telephone" >Customer telephone *</label><br>
  <input type="text" name="telephone" class="textInput" placeholder="+275554202" required>
  <br>
  <label for="note" >Customer note</label><br>
  <input type="text" name="note" class="textInput" placeholder="Needs a new pair of shoes">
  <br><br>
  <input type="submit" value="Submit">
</form>

</body>
</html>
We won’t go in-depth into the above HTML. It is a very basic form with 4 fields (name, address,
telephone, note) and a Submit button, which creates an interface that will look like the one below.
When the user presses the submit button a POST request is made to /create with the data in the
form - we still have to handle this request in our code as we’re currently only handling a GET request
to /.
If you now start up your application (click the “run” button) a new window should appear on the
right that displays the “Create” button we defined just now in index.html. You can also navigate to
https://<repl_name>.<your_username>.repl.co to see the page (replace <repl_name> with whatever you
named your Repl, but with no underscores or spaces, and <your_username> with your Repl username).
You will be able to see this URL in your Repl itself.
If you select “create” and then fill in the form and hit submit, you’ll get a response back that says
Cannot POST /create. This is because we haven’t added the code that handles the form POST request,
so let’s do that.
Add the following code into your index.js file, below the app.get entry that we made above.
10 });
11 })
12 res.send('Customer created');
13 })
• Line 1 defines a new route that listens for an HTTP ‘POST’ request at /create.
• Line 2 connects to the database. This happens asynchronously, so we define a callback function
that will be called once the connection is made.
• Line 3 creates a new collection of customers. Collections in MongoDB are similar to Tables in
SQL.
• Line 5 defines customer data that will be inserted into the collection. This is taken from the
incoming request. The form data is parsed using the parsers that we defined earlier and is then
placed in the req.body variable for us to use in the code.
• Line 6 inserts the customer data into the collection. This also happens asynchronously, and so
we define another callback function that will get an error if an error occurred, or the response
if everything happened successfully.
• Line 7 throws an error if the above insert had a problem.
• Line 8 gives us some feedback that the insert happened successfully.
If you now run the Repl (you may need to refresh it) and submit the filled-in form, you’ll get a
message back that says “Customer created”. If you then go and look in your cluster in MongoDB
and select the “collections” button, you’ll see a document has been created with the details that we
submitted in the form.
• Lines 1-3: as before, this tells Express to respond to incoming GET requests on /get by sending
the get.html file which we will define below.
• Lines 5-12: this tells Express to respond to incoming GET requests on /get-client.
– Line 7 makes a call to the database to fetch a customer by name. If more than one customer
has the same name, then the first one found will be returned.
– Line 9 tells Express to render the update template, replacing variables with the given
values as it goes. Important to note here is that we are also replacing values in the hidden
form fields we created earlier with the current values of the customer details. This is to
ensure that we update or delete the correct customer.
In your index.html file, add the following code after the </form> tag:
<br>
<form action="/get" method="GET">
  <input type="submit" value="Update/Delete">
</form>
This adds a new button that will make a GET request to /get, which will then return get.html.
Image:10 Index
<!DOCTYPE html>
<html>
<body>
  <form action="/get-client" method="GET">
    <label for="name" >Customer name *</label><br>
    <input type="text" id="name" name="name" class="textInput" placeholder="John Smith" required>
    <input type="submit" value="Get customer">
  </form>
</body>
</html>
This makes a simple form with an input for the customer’s name and a button.
Clicking this button will then make a GET call to /get-client which will respond with the client
details where we will be able to update or delete them.
To actually see the customer details on a form after requesting them, we need a templating engine to
render them onto the HTML page and send the rendered page back to us. With a templating engine,
you define a template - a page with variables in it - and then give it the values you want to fill into
the variables. In our case, we’re going to request the customer details from the database and tell the
templating engine to render them onto the page.
We’re going to use a templating engine called Pug¹⁷⁰. Pug is a simple templating engine that
integrates fully with Express. The syntax that Pug uses is very similar to HTML. One important
difference in the syntax is that spacing is very important as it determines your parent/child hierarchy.
¹⁷⁰https://pugjs.org/api/getting-started.html
First, we need to tell Express which templating engine to use and where to find our templates. Put
the following lines above your route definitions (i.e. after the other app. lines in index.js):
app.engine('pug', require('pug').__express)
app.set('views', '.')
app.set('view engine', 'pug')
Now create a new file called update.pug with the following content:
html
  body
    p #{message}
    h2= 'Customer details'
    form(method='POST' action='/update')
      input(type='hidden' id='oldname' name='oldname' value=oldname)
      input(type='hidden' id='oldaddress' name='oldaddress' value=oldaddress)
      input(type='hidden' id='oldtelephone' name='oldtelephone' value=oldtelephone)
      input(type='hidden' id='oldnote' name='oldnote' value=oldnote)
      label(for='name') Customer name:
      br
      input(type='text', placeholder='John Smith' name='name' value=name)
      br
      label(for='address') Customer address:
      br
      input(type='text', placeholder='42 Wallaby Way, Sydney' name='address' value=address)
      br
      label(for='telephone') Customer telephone:
      br
      input(type='text', placeholder='+275554202' name='telephone' value=telephone)
      br
      label(for='note') Customer note:
      br
      input(type='text', placeholder='Likes unicorns' name='note' value=note)
      br
      button(type='submit' formaction="/update") Update
      button(type='submit' formaction="/delete") Delete
This is very similar to the HTML form we created previously for create.html, however this is written
in the Pug templating language. We’re creating a hidden element to store the “old” name, telephone,
address, and note of the customer - this is for when we want to do an update.
Using the old details to update the customer is an easy solution, but not the best solution as it makes
the query cumbersome and slow. If you add extra fields into your database you would have to
remember to update your query as well, otherwise it could lead to updating or deleting the wrong
customer if they have the same information. A better, but more complicated way is to use the unique
ID of the database document as that will only ever refer to one customer.
We have also put in placeholder variables for name, address, telephone, and note, and we have given
the form 2 buttons with different actions.
If you now run the code, you will have an index page with 2 buttons. Pressing the ‘Update/Delete’
button will take you to a new page that asks for a Customer name. Filling the customer name and
pressing ‘Get customer’ will, after a little time, load a page with the customer’s details and 2 buttons
below that say ‘Update’ and ‘Delete’. Make sure you enter a customer name you have entered before.
Image:12 Update-Delete
Our next step is to add the ‘Update’ and ‘Delete’ functionality. Add the following code below your
routes in index.js:
This introduces 2 new ‘POST’ handlers - one for /update, and one for /delete.
delete - if we search for a document that has no address with an address of ‘’ (empty string),
then our query won’t return anything.
• Line 5 defines the new values that we want to update our customer with.
• Line 6 updates the customer with the new values using the query
• Line 7 throws an error if there was a problem with the update.
• Line 8 logs that a document was updated.
• Line 9 re-renders the update page with a message saying that the customer was updated, and
displays the new values.
• Line 15 connects to our MongoDB database.
• Line 16 throws an error if there was a problem connecting to the database.
• Line 17 defines a query that we will use to find the document to delete. In this case we are
using all the details of the customer before any changes were made on the form to make sure
we delete that specific customer.
• Line 18 connects to the database and deletes the customer.
• Line 19 throws an error if there was a problem with the delete.
• Line 20 logs that a document was deleted.
• Line 21 sends a response to say that the customer was deleted.
Setting up
Create a new Python Repl and open the main.py file that Repl created for you automatically. Add
the following two imports to the top and run the Repl so that these dependencies are installed.
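The import listing itself is not reproduced here. Judging by the names used later in this tutorial
(tree.DecisionTreeClassifier and CountVectorizer), the two lines are most likely the following. Treat
this as a reconstruction rather than the original listing, and note that the scikit-learn package is
imported under the name sklearn.

```python
from sklearn import tree  # provides tree.DecisionTreeClassifier
from sklearn.feature_extraction.text import CountVectorizer  # turns text into word counts
```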
Introduction to Machine Learning with Python and Repl.it 232
In line 1, we import the tree module, which will give us a Decision Tree classifier that can learn from
data. In line 2, we import a vectoriser – something that can turn text into numbers. We’ll describe
each of these in more detail soon!
Throughout the next steps, you can hit the big green “run” button to run your code, check for bugs,
and view output along the way (you should do this every time you add new code).
positive_texts = [
    "we love you",
    "they love us",
    "you are good",
    "he is good",
    "they love mary"
]

negative_texts = [
    "we hate you",
    "they hate us",
    "you are bad",
    "he is bad",
    "we hate mary"
]

test_texts = [
    "they love mary",
    "they are good",
    "why do you hate mary",
    "they are almost always good",
    "we are very bad"
]
We’ve created three simple datasets of five sentences each. The first one contains positive sentences;
the second one contains negative sentences; and the last contains a mix of positive and negative
sentences.
It’s immediately obvious to a human which sentences are positive and which are negative, but can
we teach a computer to tell them apart?
We’ll use the two lists positive_texts and negative_texts to train our model. That is, we’ll show
these examples to the computer along with the correct answers for the question “is this text positive
or negative?”. The computer will try to find rules to tell the difference, and then we’ll test how well
it did by giving it test_texts without the answers and asking it to guess whether each example is
positive or negative.
Understanding vectorization
The first step in nearly all machine learning problems is to translate your data from a format that
makes sense to a human to one that makes sense to a computer. In the case of language and text data,
a simple but effective way to do this is to associate each unique word in the dataset with a number,
from 0 onwards. Each text can then be represented by an array of numbers, representing how often
each possible word appears in the text.
Let’s go through an example to see how this works. If we had the two sentences
["nice pizza is nice", "what is pizza"]
then we would have a dataset with four unique words in it. The first thing we’d want to do is create
a vocabulary mapping to map each unique word to a unique number. We could do this as follows:
{
    "nice": 0,
    "pizza": 1,
    "is": 2,
    "what": 3
}
To create this, we simply go through both sentences from left to right, mapping each new word
to the next available number and skipping words that we’ve seen before. We can now convert our
sentences into bag of words vectors as follows, where we indicate the frequency of occurrence of
each of the words in our vocabulary:
[
    [2, 1, 1, 0],  # two "nice", one "pizza", one "is", zero "what"
    [0, 1, 1, 1]   # zero "nice", one "pizza", one "is", one "what"
]
Each sentence vector is always the same length as the total vocabulary size. We have four words in
total (across all of our sentences), so each sentence is represented by an array of length four. Each
position in the array represents a word, and each value represents how often that word appears in
that sentence.
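The two steps described above (building the vocabulary, then counting words) can be sketched in a few
lines of plain Python. This is only an illustration of the idea: the function names build_vocabulary
and to_bag_of_words are made up for this sketch, and splitting on spaces is a simplification.

```python
def build_vocabulary(sentences):
    """Map each new word to the next free index, scanning left to right."""
    vocabulary = {}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocabulary:
                vocabulary[word] = len(vocabulary)
    return vocabulary


def to_bag_of_words(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    vector = [0] * len(vocabulary)
    for word in sentence.split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector


sentences = ["nice pizza is nice", "what is pizza"]
vocabulary = build_vocabulary(sentences)
vectors = [to_bag_of_words(s, vocabulary) for s in sentences]

print(vocabulary)  # {'nice': 0, 'pizza': 1, 'is': 2, 'what': 3}
print(vectors)     # [[2, 1, 1, 0], [0, 1, 1, 1]]
```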
The first sentence contains the word “nice” twice, while the second sentence does not contain the
word “nice” at all. According to our mapping, the zeroth element of each array should indicate how
often the word nice appears, so the first sentence contains a 2 in the beginning and the second
sentence contains a 0 there.
This representation is called “bag of words” because we lose all of the information represented by
the order of words. We don’t know, for example, that the first sentence starts and ends with “nice”,
only that it contains the word “nice” twice.
With real data, these arrays get very long. A large text dataset can easily contain hundreds of
thousands of distinct words, so each sentence needs to be represented by a very long array, where
nearly all values are set to zero (all the words not in that sentence). This could take up a lot of space,
but luckily scikit-learn uses a clever sparse-matrix implementation to overcome this. This doesn’t
quite look like the above, but the overall concept remains the same.
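To get a feel for why a sparse representation helps, here is a rough plain-Python sketch that stores
only the non-zero counts. It is not how scikit-learn does it internally (it relies on SciPy’s sparse
matrices); the to_sparse helper, the vocabulary size and the chosen positions are all made up for
illustration.

```python
def to_sparse(dense_vector):
    """Keep only the non-zero counts, keyed by their position."""
    return {index: count for index, count in enumerate(dense_vector) if count != 0}


# A sentence with three known words in a hypothetical 100,000-word vocabulary.
vocabulary_size = 100_000
dense = [0] * vocabulary_size
dense[17] = 2
dense[4_242] = 1
dense[99_999] = 1

sparse = to_sparse(dense)
print(len(dense))  # 100000 values stored in the dense form
print(sparse)      # {17: 2, 4242: 1, 99999: 1}
```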
Let’s see how to achieve the above using scikit-learn’s optimised vectoriser.
First we want to combine all of our “training” data (the data that we’ll show the computer along
with the correct labels of “positive” or “negative” so that it can learn), so we’ll combine our positive
and negative texts into one array. Add the following code below the datasets you created.
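The listing for this step is not reproduced here. One way to write it, consistent with the names
training_texts and training_labels used later and with the printed output, is sketched below. The two
datasets are repeated so that the snippet runs on its own; in your Repl only the statements after them
are needed.

```python
positive_texts = ["we love you", "they love us", "you are good", "he is good", "they love mary"]
negative_texts = ["we hate you", "they hate us", "you are bad", "he is bad", "we hate mary"]

# Negative examples first, then positive ones.
training_texts = negative_texts + positive_texts
# One label per text, in the same order as training_texts.
training_labels = ["negative"] * len(negative_texts) + ["positive"] * len(positive_texts)

print(training_texts)
print(training_labels)
```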
['we hate you', 'they hate us', 'you are bad', 'he is bad', 'we hate mary', 'we love you', 'they love us', 'you are good', 'he is good', 'they love mary']
['negative', 'negative', 'negative', 'negative', 'negative', 'positive', 'positive', 'positive', 'positive', 'positive']
The two arrays (texts and labels) are associated by index. The first text in the first array is negative,
and corresponds to the first label in the second array, and so on.
Now we need a vectoriser to transform the texts into numbers. We can create one in scikit-learn
with
vectorizer = CountVectorizer()
Before we can use our vectorizer, it needs to run once through all the data we have so it can build
the mapping from words to indices. This is referred to as “fitting” the vectoriser, and we can do it
like this:
vectorizer.fit(training_texts)
If we want, we can see the mapping it created (which might not be in order, as in the examples
we walked through earlier, but each word will have its own index). We can inspect the vectoriser’s
vocabulary by adding the line
print(vectorizer.vocabulary_)
(Note the underscore at the end. Scikit-learn uses a trailing underscore as a convention for attributes
that are learned when fit() is called. The mapping is exposed mainly for inspection and debugging, and
you shouldn’t need to use it in most cases.) My vocabulary mapping looked as follows:
{'we': 10, 'hate': 3, 'you': 11, 'they': 8, 'us': 9, 'are': 0, 'bad': 1, 'he': 4, 'is': 5, 'mary': 7, 'love': 6, 'good': 2}
Behind the scenes, the vectoriser inspected all of our texts, did some basic preprocessing like making
everything lowercase, split the text into words using a built-in tokenization method, and produced
a vocabulary mapping specific to our dataset.
Now that we have a vectorizer that knows what words are in our dataset, we can use it to transform
our texts into vectors. Add the following lines of code to your Repl:
training_vectors = vectorizer.transform(training_texts)
testing_vectors = vectorizer.transform(test_texts)
The first line creates a list of vectors which represent all of the training texts, still in the same order,
but now each text is a vector of numbers instead of a string.
The second line does the same with the test texts. The machine learning part isn’t looking at our
test texts (that would be cheating) – it’s just mapping the words to numbers so that it can work with
them more easily. Note that when we called fit() on the vectoriser, we only showed it the training
texts. Any words in the test texts that don’t appear in the training texts will therefore
simply be ignored and will not be represented in testing_vectors.
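The effect of unseen words can be illustrated with a hand-rolled stand-in for the transform step. This
is a sketch only: the transform function below is hypothetical and splits on spaces, which is simpler
than CountVectorizer’s real tokenizer, and the vocabulary is copied from the mapping printed earlier.

```python
# Vocabulary learned from the training texts (copied from the printed mapping).
vocabulary = {'we': 10, 'hate': 3, 'you': 11, 'they': 8, 'us': 9, 'are': 0,
              'bad': 1, 'he': 4, 'is': 5, 'mary': 7, 'love': 6, 'good': 2}


def transform(text, vocabulary):
    """Count known words and collect the words missing from the vocabulary."""
    vector = [0] * len(vocabulary)
    ignored = []
    for word in text.lower().split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
        else:
            ignored.append(word)
    return vector, ignored


vector, ignored = transform("why do you hate mary", vocabulary)
print(vector)   # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
print(ignored)  # ['why', 'do']
```

Only “hate” (index 3), “mary” (index 7) and “you” (index 11) are counted; “why” and “do” leave no trace
in the vector.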
Now that we have a vectorised representation of our problem, let’s take a look at how we can solve
it.
Understanding classification
A classifier is a statistical model that tries to predict a label for a given input. In our case, the input is
the text and the output is either “positive” or “negative”, depending on whether the classifier thinks
that the input is positive or negative.
A machine learning classifier can be “trained”. We give it labelled data and it tries to learn rules
based on that data. Every time it gets more data, it updates its rules slightly to account for the new
information. There are many kinds of classifiers, but one of the simplest is called a Decision Tree.
Decision trees learn a set of yes/no rules by building decisions into a tree structure. Each new input
moves down the tree, while various questions are asked one by one. When the input filters all the
way to a leaf node in the tree, it acquires a label.
If that’s confusing, don’t worry! We’ll walk through a detailed example with a picture soon to clarify.
First, let’s show how to get some results using Python.
Add the following lines to main.py:
classifier = tree.DecisionTreeClassifier()
classifier.fit(training_vectors, training_labels)
predictions = classifier.predict(testing_vectors)
print(predictions)
Similarly to the vectoriser, we first create a classifier by using the module we imported at the start.
Then we call fit() on the classifier and pass in our training vectors and their associated labels. The
decision tree is going to look at both and attempt to learn rules that separate the two kinds of data.
Once our classifier is trained, we can call the predict() method and pass in previously unseen data.
Here we pass in testing_vectors which is the list of vectorized test data that the computer didn’t
look at during training. It has to try and apply the rules it learned from the training data to this new
“unseen” data. Machine learning is pretty cool, but it’s not magic, so there’s no guarantee that the
rules we learned will be any good yet.
The code above produces the following output:
Let’s take a look at our test texts again to see if these predictions match up to reality.
The output maps to the input by index, so the first output label (“positive”) matches up to the first
input text (“they love mary”), and the last output label (“negative”) matches up to the last input text
(“we are very bad”).
It looks like the computer got every example right! It’s not a difficult problem to solve. The words
“bad” and “hate” appear only in the negative texts and the words “good” and “love”, only in the
positive ones. Other words like “they”, “mary”, “you” and “we” appear in both good and bad texts. If
our model did well, it will have learned to ignore the words that appear in both kinds of texts, and
focus on “good”, “bad”, “love” and “hate”.
Decision Trees are not the most powerful machine learning model, but they have one advantage
over most other algorithms: after we have trained them, we can look inside and see exactly how
they work. More advanced models like deep neural networks are nearly impossible to make sense
of after training.
The Scikit-learn tree module contains a useful function to assist in visualising trees. Add the
following code to the end of your Repl:
In the left-hand pane, you should see a file called ‘tree.png’. If you open it, your tree graph should
look as follows:
The above shows a decision tree that only learned two rules. The first rule (top square) is about the
word “hate”. The rule is “is the number of times ‘hate’ occurs in this sentence less than or equal to
0.5”. None of our sentences contain duplicate words, so each rule will really be only about whether
the word appears or not (you can think of the <= 0.5 rules as < 1 in this case).
For each sentence in our training dataset, we can ask if the first rule is True or False. If the rule is
True for a given sentence, we’ll move that sentence down the tree left. If not, we’ll go right.
Once we’ve asked this first question for each sentence in our dataset, we’ll have three sentences for
which the answer is “False”, because three of our training sentences contain the word “hate”. These
three sentences go right in the decision tree and end up at the first leaf node (an end node with no
arrows coming out the bottom). This leaf node has value = [3,0] in it, which means that three samples
reach this node, all three belonging to the negative class and none to the positive class.
For each sentence where the first rule is “True” (the word “hate” appears less than 0.5 times, or in
our case 0 times), we go down the left of the tree, to the node where value = [2,5]. This isn’t a leaf
node (it has more arrows coming out the bottom), so we’re not done yet. At this point we have two
negative sentences and all five positive sentences still.
The next rule is “bad <= 0.5”. In the same way as before, we’ll go down the right path if we have
more than 0.5 occurrences of “bad” and left if we have fewer than 0.5 occurrences of “bad”. For the
last two negative sentences that we are still evaluating (the two containing “bad”), we’ll go right
and end up at the node with value=[2,0]. This is another leaf node and when we get here we have
two negative sentences and zero positive ones.
All other data will go left, and we’ll end up at [0,5], or zero negative sentences and five positive
ones.
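The value counts at each leaf can be checked by pushing the ten training sentences through the two
rules by hand. The sketch below does this in plain Python; the leaf names and the leaf_for helper are
invented for illustration and are not part of scikit-learn.

```python
positive_texts = ["we love you", "they love us", "you are good", "he is good", "they love mary"]
negative_texts = ["we hate you", "they hate us", "you are bad", "he is bad", "we hate mary"]


def leaf_for(text):
    """Follow the two learned rules and name the leaf the sentence ends in."""
    words = text.split()
    if words.count("hate") > 0.5:  # rule "hate <= 0.5" is False: go right
        return "hate leaf"
    if words.count("bad") > 0.5:   # rule "bad <= 0.5" is False: go right
        return "bad leaf"
    return "remaining leaf"        # both rules True: left-most leaf


# value = [negative count, positive count] for each leaf
values = {"hate leaf": [0, 0], "bad leaf": [0, 0], "remaining leaf": [0, 0]}
for text in negative_texts:
    values[leaf_for(text)][0] += 1
for text in positive_texts:
    values[leaf_for(text)][1] += 1

print(values)
# {'hate leaf': [3, 0], 'bad leaf': [2, 0], 'remaining leaf': [0, 5]}
```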
As an exercise, take each of the test sentences (not represented in the annotated tree above) and try
to follow the set of rules for each one. If it ends up in a bucket with more negative sentences than
positive ones (either of the right branches), it’ll be predicted as a negative sentence. If it ends up in
the left-most leaf node, it’ll be predicted as a positive sentence.
def manual_classify(text):
    if "hate" in text:
        return "negative"
    if "bad" in text:
        return "negative"
    return "positive"

predictions = []
for text in test_texts:
    prediction = manual_classify(text)
    predictions.append(prediction)
print(predictions)
Here we have replicated the decision tree above. For each sentence, we check if it contains “hate”
and if it does we classify it as negative. If it doesn’t, we check if it contains “bad”, and if it does, we
classify it as negative. All other sentences are classified as positive.
So what’s the difference between machine learning and traditional rule-based models like this one?
The main advantage of learning the rules directly is that it doesn’t really get more difficult as the
dataset grows. Our dataset was trivially simple, but a real-world dataset might need thousands or
millions of rules, and while we could write a more complicated set of if-statements “by hand”, it’s
much easier if we can teach machines to learn these by themselves.
Also, once we’ve perfected a set of manual rules, they’ll still only work for a single dataset. But once
we’ve perfected a machine learning model, we can use it for many tasks, simply by changing the
input data!
In the example we walked through, our model was a perfect student and learned to correctly classify
all five unseen sentences. This is not usually the case in real-world settings. Because machine
learning models are based on probability, the goal is to make them as accurate as possible, but
in general you will not get 100% accuracy. Depending on the problem, you might be able to get
higher accuracy by hand-crafting rules yourself, so machine learning definitely isn’t the correct tool
to solve all classification problems.
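Accuracy here simply means the fraction of predictions that match the correct labels. A minimal sketch
of how you might measure it is shown below; the accuracy function and the one_wrong list are
hypothetical, with one_wrong pretending that the model misjudged a single sentence.

```python
def accuracy(predictions, expected):
    """Fraction of predictions that equal the expected label."""
    correct = sum(1 for p, e in zip(predictions, expected) if p == e)
    return correct / len(expected)


# Correct labels for the five test sentences, judged by a human reader.
expected = ["positive", "positive", "negative", "positive", "negative"]
perfect = ["positive", "positive", "negative", "positive", "negative"]
one_wrong = ["positive", "negative", "negative", "positive", "negative"]

print(accuracy(perfect, expected))    # 1.0
print(accuracy(one_wrong, expected))  # 0.8
```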
Try the code on bigger datasets to see how it performs. There is no shortage of interesting data sets
to experiment with. For example, you could have a look at positive vs negative movie reviews from
IMDB using the dataset here¹⁷⁶. See if you can load the dataset from there into the classifier we built
in this tutorial and compare the results.
You can fork this Repl here: https://repl.it/@GarethDwyer1/machine-learning-intro¹⁷⁷ to keep hacking
on it (it’s the same code as we walked through above but with some comments added). If you
prefer, the entire program is shown at the end of this tutorial, so you can copy paste it and work
from there.
In the next chapter, we’ll be investigating the Quicksort algorithm. Whether you’re applying for jobs
or just like algorithms, it’s useful to understand how sorting works. In real projects, most of the time
you’ll just call .sort(), but here you’ll build a sorter from scratch and understand how it works.
Quicksort tutorial: Python
implementation with line by line
explanation
In this tutorial, we’ll be going over the Quicksort¹⁷⁸ algorithm with a line-by-line explanation. We’ll
go through how the algorithm works, build it in Repl.it and then time it to see how efficient it is.
¹⁷⁸https://en.wikipedia.org/wiki/Quicksort
¹⁷⁹https://en.wikipedia.org/wiki/Sorting_algorithm
¹⁸⁰https://en.wikipedia.org/wiki/Recursion#In_computer_science
xs = [8, 4, 2, 2, 1, 7, 10, 5]
We could pick the last element (5) as the pivot point. We would want the list (after partitioning) to
look as follows:
xs = [4, 2, 2, 1, 5, 7, 10, 8]
Note that this list isn’t sorted, but it has some interesting properties. Our pivot element, 5, is in the
correct place (if we sort the list completely, this element won’t move). Also, all the numbers to the
left are smaller than 5 and all the numbers to the right are greater.
Because 5 is in the correct place, we can ignore it after the partition algorithm (we won’t need
to move it again). This means that if we can sort the two smaller sublists to the left and right of
5 ([4, 2, 2, 1] and [7, 10, 8]) then the entire list will be sorted. Any time we can efficiently
break a problem into smaller sub-problems, we should think of recursion as a tool to solve our main
problem. Using recursion, we often don’t even have to think about the entire solution. Instead, we
define a base case (a list of length 0 or 1 is always sorted), and a way to divide a larger problem into
smaller ones (e.g. partitioning a list in two), and almost by magic the problem solves itself!
But we’re getting ahead of ourselves a bit. Let’s take a look at how to actually implement the partition
algorithm on its own, and then we can come back to using it to implement a sorting algorithm.
def bad_partition(xs):
    smaller = []
    larger = []
    pivot = xs.pop()
    for x in xs:
        if x >= pivot:
            larger.append(x)
        else:
            smaller.append(x)
    return smaller + [pivot] + larger
In this implementation, we set up two temporary lists (smaller and larger). We then take the pivot
element as the last element of the list (pop takes the last element and removes it from the original
xs list).
We then consider each element x in the list xs. The ones that are smaller than the pivot, we store in
the smaller temporary list, and the others go to the larger temporary list. Finally, we combine the
two lists with the pivot item in the middle, and we have partitioned our list.
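As a quick check, here is the function run on our example list. The function is repeated so the
snippet runs on its own. Note two things: the larger items keep their original relative order (so the
right-hand side differs from the in-place result shown earlier, although it is still a valid
partition), and because of pop() the list that was passed in loses its last element.

```python
def bad_partition(xs):
    smaller = []
    larger = []
    pivot = xs.pop()  # removes the last element from xs itself
    for x in xs:
        if x >= pivot:
            larger.append(x)
        else:
            smaller.append(x)
    return smaller + [pivot] + larger


xs = [8, 4, 2, 2, 1, 7, 10, 5]
partitioned = bad_partition(xs)

print(partitioned)  # [4, 2, 2, 1, 5, 8, 7, 10]
print(xs)           # [8, 4, 2, 2, 1, 7, 10] (the pivot was popped off)
```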
This is much easier to read than the implementation at the start of this post, so why don’t we do it
like this?
The primary advantage of Quicksort is that it is an in-place sorting algorithm. Although for the toy
examples we’re looking at, it might not seem like much of an issue to create a few copies of our list,
if you’re trying to sort terabytes of data, or if you are trying to sort any amount of data on a very
limited computer (e.g. a smartwatch), then you don’t want to needlessly copy arrays around.
In Computer Science terms, this algorithm needs about 2n slots of storage, where n is the number
of elements in our xs array (Big-O notation drops the constant factor, so this is still O(n), but the
doubling matters in practice). If we consider our example above of xs = [8, 4, 2, 2, 1, 7, 10, 5],
we’ll need to store all 8 elements in the original xs array as well as three elements ([8, 7, 10]) in
the larger array and four elements ([4, 2, 2, 1]) in the smaller array. This is a waste of space!
With some clever tricks, we can do a series of swap operations on the original array and not need
to make any copies at all.
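The in-place listing is not reproduced here, so below is a minimal sketch reconstructed from the
description in this tutorial rather than copied from it. The names follower and leader come from the
text; the function name partition and the start and end parameters are assumptions. It is laid out so
that the comparison sits on line 4 and the two swaps on lines 5 and 8 of the function.

```python
def partition(xs, start, end):
    follower = leader = start
    while leader < end:  # stop just before the pivot, which sits at xs[end]
        if xs[leader] <= xs[end]:  # found an item no bigger than the pivot
            xs[follower], xs[leader] = xs[leader], xs[follower]  # push it left
            follower += 1  # follower only advances after a swap
        leader += 1  # leader advances on every iteration
    xs[follower], xs[end] = xs[end], xs[follower]  # drop the pivot into place
    return follower  # index where the pivot ended up


xs = [8, 4, 2, 2, 1, 7, 10, 5]
pivot_index = partition(xs, 0, len(xs) - 1)

print(xs)           # [4, 2, 2, 1, 5, 7, 10, 8]
print(pivot_index)  # 4
```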
In our good partition function, you can see that we do some swap operations (lines 5 and 8) on the
xs that is passed in, but we never allocate any new memory. This means that the storage stays at
the n elements of xs itself, with only a constant amount of extra memory for the counters, or O(1)
additional space in Computer Science terms. That is, this algorithm needs roughly half the
memory of the “bad” implementation above, and should therefore allow us to sort lists
that are twice the size using the same amount of memory.
The confusing part of this implementation is that although everything is based around our pivot
element (the last item of the list in our case), and although the pivot element ends up somewhere in
the middle of the list at the end, we don’t actually touch the pivot element until the very last swap.
Instead, we have two other counters (follower and leader) which move around the smaller and
bigger numbers in a clever way and implicitly keep track of where the pivot element should end up.
We then switch the pivot element into the correct place at the end of the loop (line 8).
The leader is just a loop counter. Every iteration it increments by one until it gets to the pivot
element (the end of the list). The follower is more subtle, and it keeps count of the number of swap
iterations we do, moving up the list more slowly than the leader, tracking where our pivot element
should eventually end up.
The other confusing part of this algorithm is on line 4. We move through the list from left to right.
All numbers are currently to the left of the pivot but we eventually want the “big” items to end up
on the right.
Intuitively, you would then expect us to do the swapping action when we find an item that is larger
than the pivot, but in fact, we do the opposite. When we find items that are smaller than the pivot,
we swap the leader and the follower.
You can think of this as pushing the small items further to the left. Because the leader is always
ahead of the follower, when we do a swap, we are swapping a small element with one further left in
the list. The follower only looks at “big” items (ones that the leader has passed over without action),
so when we do the swap, we’re swapping a small item (leader) with a big one (follower), meaning
that small items will move towards the left and large ones towards the right.
xs = [8, 4, 2, 2, 1, 7, 10, 5]
In xs, there are 4 items that are smaller than the pivot. Every time we find an item that is smaller
than the pivot, we increment follower by one. This means that at the end of the loop, follower will
have incremented 4 times and be pointing at index 4 in the original list. By inspection, you can see
that this is the correct place for our pivot element (5).
The last thing we do is return the follower index, which now points to our pivot element in its correct
place. We need to return this as it defines the two smaller sub-problems in our partitioned list - we
now want to sort xs[0:4] (the first 4 items, which form an unsorted list) and xs[5:] (the last 3
items, which form an unsorted list).
xs = [4, 2, 2, 1, 5, 7, 10, 8]
If you want another way to visualise exactly how this works, going over some examples by hand
(that is, writing out a short randomly ordered list with a pen and paper, and writing out the new
list at each step of the algorithm) is very helpful. You can also watch this detailed YouTube video¹⁸¹
where KC Ang demonstrates every step of the algorithm using paper cups in under 5 minutes!
To sort a list, we partition it (line 4), sort the left sublist (line 5: from the start of the original list up
to the pivot point), and then sort the right sublist (line 6: from just after the pivot point to the end
of the original list). We do this recursively with the end boundary moving left, closer to start, for
the left sublists and the start boundary moving right, closer to end, for the right sublists. When the
start and end boundaries meet (line 2), we’re done!
The first call to Quicksort will always be with the entire list that we want sorted, which means
that 0 will be the start of the list and len(xs)-1 will be the end of the list. We don’t want to have
to remember to pass these extra arguments in every time we call Quicksort from another program
(e.g. in any case where it is not calling itself), so we’ll make a prettier wrapper function with these
defaults to get the process started.
def quicksort(xs):
    return _quicksort(xs, 0, len(xs)-1)
If you run this code, you will see the sorted list. This does what we expect, but it doesn’t tell us
about how efficient Quicksort is - so let’s take a closer look. Replace the code in main.py with the
following, and again add the code listed at the beginning of this tutorial after the imports on line 3.
The code generates a random list of 100 000 numbers and sorts this list in around 5 seconds. You can
compare the performance of Quicksort to some other common sorting algorithms using this Repl¹⁸².
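The timing listing is not reproduced here. A rough sketch of such a timing harness is given below,
assuming the sorting functions look like the reconstruction used in this sketch. The time_sort helper,
its seed parameter and the range of the random numbers are made up for illustration, and the time you
measure will depend on your machine, so it may be well away from the figure quoted above.

```python
import random
import time


def partition(xs, start, end):
    """In-place partition around xs[end]; returns the pivot's final index."""
    follower = leader = start
    while leader < end:
        if xs[leader] <= xs[end]:
            xs[follower], xs[leader] = xs[leader], xs[follower]
            follower += 1
        leader += 1
    xs[follower], xs[end] = xs[end], xs[follower]
    return follower


def _quicksort(xs, start, end):
    if start >= end:
        return
    p = partition(xs, start, end)
    _quicksort(xs, start, p - 1)
    _quicksort(xs, p + 1, end)


def quicksort(xs):
    _quicksort(xs, 0, len(xs) - 1)


def time_sort(size, seed=0):
    """Sort `size` random integers and return (sorted list, seconds taken)."""
    rng = random.Random(seed)
    xs = [rng.randint(0, 1_000_000) for _ in range(size)]
    started = time.perf_counter()
    quicksort(xs)
    elapsed = time.perf_counter() - started
    return xs, elapsed


sorted_xs, seconds = time_sort(100_000)
print(f"sorted {len(sorted_xs)} numbers in {seconds:.2f} seconds")
```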
If you want to try the code from the tutorial out, visit the Repl at https://repl.it/@GarethDwyer1/quicksort¹⁸³.
You’ll be able to run the code, see the results, and even fork it to continue developing or testing it
on your own.
¹⁸²https://repl.it/@GarethDwyer1/sorting
¹⁸³https://repl.it/@GarethDwyer1/quicksort?language=python3
If you need help, the folk over at the Repl discord server¹⁸⁴ are very friendly and keen to help people
learn.
—
¹⁸⁴https://repl.it/discord
Closing note
We have now come to the end of the series of tutorials. You have learnt the basics of the Repl.it IDE,
worked with more advanced features and gone through a number of practical projects. This doesn’t
mean the end of fun projects: you should now be equipped to tackle your own, either starting
from scratch or using the code from the tutorials as a basis.