Celery Introduction
In the Terminal
Clone the repo. Your command should look something like this:
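The repository URL comes from your own GitHub account, so the value below is a placeholder:

```bash
git clone <your-repo-url>
```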
Background Tasks
When running a complex web application, sometimes we need to do things
that take a little while to execute. Some examples are fetching data from a
remote API (e.g. movie searching), sending out emails to many users,
running a reindex on a search database, and more.
This is where Celery comes in. It’s a library that is used to run tasks outside
of the main web process. At a high level, tasks are put into a queue (the time
to write the task to the queue is fast). The main webserver process can then
continue, and return a response to the client. Meanwhile, a broker reads
from the queue and passes the task on to worker(s), which execute the
task and put the results into a queue or database. When a task is
queued, a reference to it is returned. Later, we can use this reference
to get the result of the task. For example, we could return the reference to
the browser, which could then poll an endpoint to check whether the result
is available, meaning the task has finished. Sometimes we don’t need
the task reference at all: we just assume that the task will complete
successfully, and exceptions will be logged to the usual places on failure.
Celery Setup
Before setting up Celery, let’s first look at broker selection.
Broker Selection
We’re going to use Redis as the broker, and the django-celery-results project
to store the task results. This will allow us to query task results using
standard ORM methods and see them in the Django admin.
Redis
Redis works as a key-value store/cache and also has queueing
features for message brokering. If you’re using Redis for your message
brokering, you might consider using it as your general cache for Django as
well, as sketched below.
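For example, a minimal sketch of that setup (assuming Django 4.0+, which ships a built-in Redis cache backend; database number 1 is an arbitrary choice to keep it separate from the broker’s database 0):

```python
# settings.py: reuse the Redis server as Django's cache
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://localhost:6379/1",
    }
}
```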
Try It Out:
Before we start configuring Celery in our Django project, let’s get Redis
installed. Use the apt command to install it into your Codio environment;
it will install and start automatically.
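The package name can vary by distribution; on a Debian-based environment like Codio’s, something like this should work:

```bash
sudo apt install redis-server
```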
To check that it’s working, issue the ping command using the redis-cli tool:

```bash
$ redis-cli ping
PONG
```
Now let’s look at how to install and set up Celery in our project.
Installation and Configuration
Celery is installed like most Python libraries, using pip. To store the results
in the Django database, django-celery-results is also required. We’ll also
need the redis package for communication with the Redis server. They can
all be installed at the same time:
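A single command along these lines installs all three (versions unpinned here; use whatever versions your project specifies):

```bash
pip install celery django-celery-results redis
```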
Open settings.py

```python
CELERY_RESULT_BACKEND = "django-db"
CELERY_BROKER_URL = "redis://localhost:6379/0"
```
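For the result models to be created, django-celery-results also needs to be added to INSTALLED_APPS and its migrations applied; a minimal sketch (your INSTALLED_APPS will already contain other entries):

```python
INSTALLED_APPS = [
    # ... existing apps ...
    "django_celery_results",
]
```

Then run python manage.py migrate to create its tables.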
After this is done, the Celery Results models are available in the Django
admin.
[Image: the Celery Results section in the Django admin]
Validate your changes by starting the Django dev server and checking that
you can see the Celery Results section, as per the image above.
celery.py
Think of this file as being a bit like wsgi.py. WSGI-compatible web servers
know how to load the application variable from wsgi.py to run your project;
similarly, Celery knows how to load the app variable from celery.py to
start a Celery worker.
We’ll take a look at the file that we’ll be using, and then go through and
explain how it works:
```python
import os

# configure the settings environment before anything imports Django settings
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "course4_proj.settings")
os.environ.setdefault("DJANGO_CONFIGURATION", "Dev")

import configurations

configurations.setup()

from celery import Celery
from django.conf import settings

app = Celery("course4_proj")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
```
The first few lines configure the settings environment (including the
django-configurations setup) before anything Django-related is imported, so
that Celery can read the Django settings. Then we instantiate the Celery
class with a name – in this case we use the project name. We call
config_from_object() on the instance, and tell it to load the settings from
the settings variable in the django.conf module. The namespace argument is
essentially the prefix for the Celery settings, e.g. broker_url comes from
CELERY_BROKER_URL. Finally, autodiscover_tasks() tells Celery to look for
tasks in each of the apps in INSTALLED_APPS.
__all__ = ("celery_app",)
Where:
Upon running this command the Celery worker should start like this:
$ celery -A course4_proj worker -l INFO
[tasks]
Indicating that the worker has started successfully and is ready to execute
tasks.
The worker can be stopped by typing Control-C in the terminal. Next we’ll
get Celery configured in your course4_proj project, and then add some
tasks.
Try It Out
Start by creating a file called celery.py in the course4_proj module
directory. Insert this content:
```python
import os

# configure the settings environment before anything imports Django settings
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "course4_proj.settings")
os.environ.setdefault("DJANGO_CONFIGURATION", "Dev")

import configurations

configurations.setup()

from celery import Celery
from django.conf import settings

app = Celery("course4_proj")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
```
Then open __init__.py in the same directory and insert this content:
Open __init__.py

```python
from .celery import app as celery_app

__all__ = ("celery_app",)
```
Then start the Celery worker (celery -A course4_proj worker -l INFO) and
confirm that you get similar output as was shown above, finishing with the
ready message.
Now let’s look at how tasks are defined. The basic way to register a task
is to decorate a function with the task decorator on the app instance:

```python
@app.task
def my_long_running_function(arg1, arg2):
    # do something
    ...
```
However, with Django, this is not the preferred method of registering tasks.
Sometimes you might be writing tasks inside Django apps that are reused
by different projects. For example, we could open-source our movies app
and make it available to the community. When installed by a third party,
their Django project would not be called course4_proj and so trying to
import from course4_proj would fail for them.
Instead, Celery provides the shared_task decorator, which registers the
function with whichever Celery app the current project has configured:

```python
from celery import shared_task

@shared_task
def my_long_running_function(arg1, arg2):
    # do something
    ...
```
Since decorated task functions remain normal functions, calling one directly
executes it synchronously. To queue it as a task instead, call its delay()
method with the same arguments:

```python
my_long_running_function.delay("arg1", 2)
```

Therefore these two pieces of code will result in the same output; first the
“normal way”:

```python
value = my_long_running_function("arg1", 2)
print(value)
```

And then, delayed through Celery:

```python
res = my_long_running_function.delay("arg1", 2)
value = res.get()
print(value)
```
Looking at this second block of code, it’s obvious how to fetch the result
if you have access to the returned AsyncResult, but calling get() straight
away essentially makes the asynchronous code synchronous again. How can
we come back later and access the result? We’ll look at that in the next
section.
Fetching Results Later
Every AsyncResult has an id attribute containing the task’s ID, which can
be stored for later:

```python
res = my_long_running_function.delay("arg1", 2)
task_id = res.id  # store this somewhere, e.g. the session or database
```
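Given a stored ID, the AsyncResult can be rebuilt later, even in a different process or request; a minimal sketch:

```python
from celery.result import AsyncResult

res = AsyncResult(task_id)  # reconstruct the handle from the stored ID
value = res.get(timeout=2)  # then fetch the result as usual
```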
get() takes a timeout argument, which defaults to None. With the default,
the method will wait forever for a result, blocking the rest of your code
from running. When provided, timeout indicates the number of seconds that
get() should block for. If no result is received before the timeout,
celery.exceptions.TimeoutError is raised. By using this argument, you can
wait for a result for a short amount of time, then give the user some
feedback that the result is not ready; depending on your use case, you can
try again or tell the user to try again.
For example, here’s how you could loop, waiting two seconds at a time for
the result of the task until it’s received. On each iteration where the
result isn’t ready, a message is printed so we know it’s still pending.
```python
from celery.exceptions import TimeoutError

res = my_long_running_function.delay("arg1", 2)

while True:
    try:
        value = res.get(timeout=2)  # wait up to two seconds for the result
        break  # once we have a value, break from the loop
    except TimeoutError:
        # no result after two seconds; print a message, then continue looping
        print("Task not finished after another two seconds, waiting again.")
```
This allows us to give some feedback to the user so they know the program
hasn’t crashed.
Another feature of get() is that it re-raises any uncaught exception from
inside the delayed function. That is, in our example, if
my_long_running_function() raises an exception, it can be caught with an
exception handler around res.get(), as sketched below.
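A minimal sketch, reusing the illustrative task from above:

```python
from celery.exceptions import TimeoutError

res = my_long_running_function.delay("arg1", 2)

try:
    value = res.get(timeout=2)
except TimeoutError:
    print("No result yet.")  # the task simply hasn't finished
except Exception as e:
    # any exception raised inside the task is re-raised here by get()
    print(f"Task failed: {e}")
```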
Now that we’ve seen how to schedule tasks and get their results later, let’s
implement this in the course4_proj.
Try It Out
We’ll make the movie search_and_save() function into a task. Then we’ll
add three views for search.
Start by creating a file called tasks.py in the movies app directory. It will
first need to import the shared_task decorator and the omdb_integration
module so we have access to search_and_save:
```python
from celery import shared_task
from movies import omdb_integration

@shared_task
def search_and_save(search):
    return omdb_integration.search_and_save(search)
```
Separate files
Since task functions can be called normally (i.e. even if decorated we can
still call search_and_save() instead of search_and_save.delay()) we could
choose to move the functions from omdb_integration.py to tasks.py.
Having them segregated like this can make it clearer which functions
you intend to run as tasks. On the other hand, it can make the namespacing
of your code strange if tasks.py becomes a “catch-all” for many different
functions across different domains.
Now we’ll implement the views. Open the movies app’s views.py. We won’t
be using templates, so you can remove the from django.shortcuts import
render line, and instead add these imports:
Open movies/views.py

```python
import urllib.parse
```
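The views also rely on several other names; based on the code that follows, the remaining imports would be along these lines (module paths assumed from this project’s layout):

```python
from celery.exceptions import TimeoutError
from celery.result import AsyncResult
from django.http import HttpResponse
from django.shortcuts import redirect
from django.urls import reverse

from movies.models import Movie
from movies.tasks import search_and_save
```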
The first view is search(). It retrieves the search term from the search_term
query string parameter, then queues the task with
search_and_save.delay(search_term). It then waits two seconds for a
result, and if none is received, it redirects to the waiting view, passing
the ID of the task for later reference. If search results are received within
two seconds (for example, if there aren’t many results or the results have
already been cached), it redirects straight to the search results view.
Here’s the code to implement it:
```python
def search(request):
    search_term = request.GET["search_term"]
    res = search_and_save.delay(search_term)
    try:
        res.get(timeout=2)
    except TimeoutError:
        return redirect(
            reverse("search_wait", args=(res.id,))
            + "?search_term="
            + urllib.parse.quote_plus(search_term)
        )
    return redirect(
        reverse("search_results")
        + "?search_term="
        + urllib.parse.quote_plus(search_term),
        permanent=False,
    )
```
Next, the search_wait() view. It accepts the task UUID as an argument, then
uses it to fetch the AsyncResult. It tries to get the result with a timeout
of -1, which returns immediately if there’s no result; using a timeout of 0
actually means it will wait forever for a result, which is not what we want.

If get() times out (i.e. there’s no result available immediately), a
TimeoutError is raised. It’s caught, and a message is returned to the
browser telling the user to refresh the page to check again. If a result is
returned (which means the task has finished), a redirect to the results
page is returned. Here’s the search_wait() view:
```python
def search_wait(request, result_uuid):
    # the task UUID is passed in via the URL
    res = AsyncResult(result_uuid)
    search_term = request.GET["search_term"]
    try:
        res.get(timeout=-1)
    except TimeoutError:
        return HttpResponse("Task pending, please refresh.", status=200)
    return redirect(
        reverse("search_results")
        + "?search_term="
        + urllib.parse.quote_plus(search_term)
    )
```
Finally, the search_results() view. It queries the database for the search
term and returns all the matching titles as a plain-text list:

```python
def search_results(request):
    search_term = request.GET["search_term"]
    movies = Movie.objects.filter(title__icontains=search_term)
    return HttpResponse(
        "\n".join([movie.title for movie in movies]),
        content_type="text/plain",
    )
```
Minimal implementation
Note that these views are intentionally a minimal implementation to
demonstrate Celery. You’ll probably spot several ways to trigger exceptions,
for example, if search_term is missing. If we were implementing this in a
production app we’d have much better error handling, such as the sketch
below.
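For instance, a more defensive lookup of the query parameter might look like this (a sketch, not part of the course code):

```python
search_term = request.GET.get("search_term")
if not search_term:
    return HttpResponse("Missing search_term parameter.", status=400)
```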
The final file to set up is urls.py inside the course4_proj module directory.

Open urls.py

Add an import for the movies views:

```python
import movies.views
```
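A minimal sketch of the URL patterns, consistent with the view names used above (the route paths other than search/ are assumptions; keep any existing entries, such as the admin):

```python
from django.urls import path

import movies.views

urlpatterns = [
    # ... existing entries ...
    path("search/", movies.views.search, name="search"),
    path("search-wait/<uuid:result_uuid>/", movies.views.search_wait, name="search_wait"),
    path("search-results/", movies.views.search_results, name="search_results"),
]
```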
You’re ready to try it out, but there’s one last thing you can change to make
testing easier. In omdb_integration.py, the search_and_save() function
returns early if the search_term was already used. You can comment out that
early return, which will allow us to repeatedly make the same search. This
isn’t necessary, but it makes the behavior easier to observe because you can
repeat long searches.
Before these new views will work, you’ll need to make sure a Celery
worker is started:

Open a terminal

```bash
celery -A course4_proj worker -l INFO
```

The startup output should now list the new task:

```
[tasks]
  . movies.tasks.search_and_save
```
Start the Django dev server too (in another terminal), then start a search by
visiting the search page. For example: /search/?search_term=star+wars.
This is a good search as it has a lot of results. The page should load for two
seconds and then you’ll be redirected to the waiting page.
[Image: the waiting page]
Notice how the ID of the task is used in the URL. You can now refresh the
waiting page, and can do so repeatedly until the search is finished (you can
watch the Celery output terminal to see when the search has completed).
Once it’s finished you will be redirected to the results view.
[Image: the results page]
Make sure you re-enable the early return for repeated searches in
search_and_save() (if you disabled it) and retry the search. You should get
redirected to the results straight away. Or, if you try a more precise search
that returns all the results within two seconds, you’ll be redirected without
going to the waiting view.
Here are some ideas for extending this implementation:

- Use JavaScript to check if the results are ready on the waiting page. fetch() can be called multiple times and then a redirect can be performed in the browser, meaning the user doesn’t have to manually refresh the page.
- When a search is initiated, store the task ID in the database for the result (for example, in a new field on the SearchTerm model). If another user performs a search while one is already active, we could reuse the same task ID to prevent starting another query for the same results.
- The UUID is exposed to the user. In our case this is not a big security concern, and UUIDs are nearly impossible to guess, so the chance of being able to guess another user’s search based on UUID is fairly low. However, it’s better practice to not allow fetching of arbitrary tasks based on ID without checking that they belong to the logged-in user. This is something that you’d need to store in the database or session yourself; see the sketch after this list.
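A minimal sketch of one approach, using the session (the names are illustrative, not part of the course code):

```python
# when queueing the task, remember its ID in the user's session
res = search_and_save.delay(search_term)
task_ids = request.session.setdefault("task_ids", [])
task_ids.append(res.id)
request.session.modified = True  # we mutated a list in place

# later, before fetching a result, verify it belongs to this user
if result_uuid not in request.session.get("task_ids", []):
    return HttpResponse("Not found.", status=404)
res = AsyncResult(result_uuid)
```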
Task Results and Production
Task Results
Since we’re using django-celery-results, we can see the results of tasks in
the Django admin. Log into the Django admin and navigate to the Task Results
list; you should have some populated now. You can see the result of each
task, what parameters it was called with, which worker executed it, and more.
This can make it easier to debug problems with Celery tasks. Debugging
asynchronous tasks can sometimes be quite difficult as minute differences
between the web environment and Celery environment can cause the tasks
to execute differently, so any extra information can help.
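Because django-celery-results stores results as ordinary Django models, they can also be inspected from code; a minimal sketch:

```python
from django_celery_results.models import TaskResult

# the five most recent results for our search task
recent = TaskResult.objects.filter(
    task_name="movies.tasks.search_and_save"
).order_by("-date_done")[:5]

for result in recent:
    print(result.status, result.result)
```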
Celery in Production
When we take Django to production, we don’t just start up the Django dev
server in a terminal and leave it running. Similarly with Celery, we don’t
just run celery in a terminal. Instead it should be set up within your
operating system’s daemon system (supervisord, systemd, init, etc). More
information on this can be found at Celery’s Daemonizing Guide. The exact
process will depend on your OS.
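As a rough illustration only (the paths are placeholders, and a real deployment needs environment configuration, a dedicated user, and more), a minimal systemd unit might look something like this:

```ini
# /etc/systemd/system/celery-course4_proj.service (illustrative sketch)
[Unit]
Description=Celery worker for course4_proj
After=network.target

[Service]
WorkingDirectory=/path/to/project
ExecStart=/path/to/venv/bin/celery -A course4_proj worker -l INFO
Restart=on-failure

[Install]
WantedBy=multi-user.target
```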
In the next section we’ll look at Django signals and how to use them with
Celery.
Pushing to GitHub
Before continuing, you must push your work to GitHub. In the terminal:
```bash
git add .
git commit -m "Finish celery introduction"
```

Push to GitHub:

```bash
git push
```