02.omdb-api
Course 4 Project
The next few assignments will create a new Django app that uses the Open
Movie Database API. As such, we will need to fork a new GitHub repo
before creating the app.
Go to the course4_proj repo. This repo contains the starting point for
this project.
Click on the “Fork” button in the top-right corner.
Click the green “Code” button.
Copy the SSH information. It should look something like this:
git@github.com:<your_github_username>/course4_proj.git
In the Terminal
Clone the repo. Your command should look something like this:
git clone git@github.com:<your_github_username>/course4_proj.git
Intro
The Open Movie Database is a free REST web service that can be queried to
get information about movies. To follow along with these examples, and
complete modules 3 and 4, you’ll need an API key. One can be obtained free
from https://www.omdbapi.com/apikey.aspx.
Once you have a key, it’s passed in the URL as the apikey parameter. Free
keys are limited to 1,000 requests per day.
Here’s some example code that uses Requests to get movie details by title
(in this case, star wars). It loads the OMDb key from an environment
variable.
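import os
import requests

omdb_key = os.environ["OMDB_KEY"]  # the variable name OMDB_KEY is an assumption
resp = requests.get("https://www.omdbapi.com/", params={"apikey": omdb_key, "t": "star wars"})
print(resp.json())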
{'Title': 'Star Wars', 'Year': '1977', 'Rated': 'PG', 'Released': '25 May 1977', 'Runtime': '121 min'
Adventure, Fantasy', 'Director': 'George Lucas', 'Writer': 'George Lucas', 'Actors': 'Mark Hamill, Harrison Ford,
Carrie Fisher', 'Plot': "Luke Skywalker joins forces with a Jedi Knight, a cocky pilot, a Wookiee and two droids to
save the galaxy from the Empire's world-destroying battle station, while also attempting to rescue Princess Leia
from the mysterious Darth Vad", 'Language': 'English', 'Country': 'United States, United Kingdom'
Oscars. 63 wins & 29 nominations total', 'Poster': 'https://m.media-amazon.com/images/M/MV5BNzVlY2MwMjktM2E4OS00Y2Y3LWE3ZjctYzhkZGM3YzA1ZWM2XkEyXkFqcGdeQXVyNzkwMjQ5NzM@._V1_SX300.jpg'
, 'Ratings': [{'Source': 'Internet Movie Database', 'Value': '8.6/10'}, {'Source': 'Rotten Tomatoes'
'92%'}, {'Source': 'Metacritic', 'Value': '90/100'}], 'Metascore': '90', 'imdbRating': '8.6',
'1,271,153', 'imdbID': 'tt0076759', 'Type': 'movie', 'DVD': '06 Dec 2005', 'BoxOffice': '$460,998,507'
'Production': 'Lucasfilm Ltd.', 'Website': 'N/A', 'Response': 'True'}
We can also search the API. Here’s code that performs a search for star
wars:
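# "s" searches by title; the environment variable name OMDB_KEY is an assumption.
resp = requests.get("https://www.omdbapi.com/", params={"apikey": os.environ["OMDB_KEY"], "s": "star wars"})
print(resp.json())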
{'Search': [{'Title': 'Star Trek: Enterprise - In a Time of War', 'Year': '2014', 'imdbID': 'tt3445408'
'https://m.media-amazon.com/images/M/MV5BMTk4NDA4MzUwM15BMl5BanBnXkFtZTgwMTg3NjY5MDE@._V1_SX300.jpg'
Next Generation - Survive and Suceed: An Empire at War', 'Year': '2013', 'imdbID': 'tt3060318'
'https://m.media-amazon.com/images/M/MV5BMjM5ODY0MDQ2NF5BMl5BanBnXkFtZTgwMjQ5NDgwMDE@._V1_SX300.jpg'
Star Hand Kid Volume 3 - Time War', 'Year': '1989', 'imdbID': 'tt0410598', 'Type': 'movie', 'Poster'
Five Star Heroes: Gods of War", 'Year': '1998', 'imdbID': 'tt0371529', 'Type': 'movie', 'Poster'
War', 'Year': '2014', 'imdbID': 'tt4254746', 'Type': 'movie', 'Poster': 'https://m.media-amazon.com/images/M/MV5BMzI1MDczMjc4N15BMl5BanBnXkFtZTgwNjk1NjU3NTE@._V1_SX300.jpg'}, {'Title'
Cold War: Declassified', 'Year': '2014', 'imdbID': 'tt3445422', 'Type': 'movie', 'Poster': 'N/A'
Wars Fan Film', 'Year': '2017', 'imdbID': 'tt6314408', 'Type': 'movie', 'Poster': 'https://m.media-amazon.com/images/M/MV5BZjAyMzRhZDYtZTM2NC00NjgyLWE0ODItZGNjYzVlODBmZGYwL2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyNTc5MTY0OTA@._V1_SX
, {'Title': 'Sky Corps - Star Beast War', 'Year': '2019', 'imdbID': 'tt10368144', 'Type': 'movie'
'totalResults': '8', 'Response': 'True'}
Now that we have a basic idea of the data returned from the OMDb API, we
need to plan how we’re going to use it in Django. Let’s talk about the site
we’re building to use this data and what considerations need to be made.
The Site & User Behavior
A user will visit our site in order to comment on a movie, or read previous
comments about a movie. When they visit the site, let’s assume they will
begin by performing a search for that movie. How should this search
operate?
If the user is performing a search for a title that is not in our database, then
we’ll need to first find matching films on OMDb and use that to populate
our database. Then we can query our own database and get results.
Keep in mind that searching the local database is generally a lot faster
than querying a remote API, and that every request to the API costs us (it
uses up API quota). However, we also want to make sure our own database is
(reasonably) up to date.
We’re going to solve this problem by keeping a record of search terms that
have been used. We’ll only query the API if the search term hasn’t been
searched for in the past 24 hours. Otherwise, we only search our local
database.
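The 24-hour rule above can be sketched in plain Python, independent of Django. This is only an illustration: the function name should_query_api and the constant SEARCH_MAX_AGE are hypothetical, and in the real project the last-searched time would come from the database.

```python
from datetime import datetime, timedelta, timezone

# Assumed window, taken from the "past 24 hours" rule described above.
SEARCH_MAX_AGE = timedelta(hours=24)


def should_query_api(last_search, now=None):
    """Return True when the remote API should be queried for a search term.

    `last_search` is the time the term was last searched, or None if never.
    """
    if last_search is None:
        return True  # never searched before, so nothing is stored locally yet
    now = now or datetime.now(timezone.utc)
    return now - last_search >= SEARCH_MAX_AGE


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(should_query_api(None, now))                       # True
print(should_query_api(now - timedelta(hours=1), now))   # False
print(should_query_api(now - timedelta(hours=30), now))  # True
```

When the function returns False, the site would answer the search from the local database only.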
Also notice in the API responses, the list response contains only some of the
data (Title, Year, imdbID, Poster (URL) and Type), whereas the detailed
response contains a lot more data. At this point, we need to decide if we
want to store data that the detailed response contains, or if it’s only
necessary to store list (summary) data.
In our case, we want to also store the plot and genre(s) of the movie, which
means we’ll have to retrieve the detailed response too. This will allow us to
display them on a movie detail page. In theory, it would also allow
searching by these fields. However, those searches would have to go
directly to our database. We can’t search OMDb by genre or plot, so we’d
only be able to enable this once we have a fairly “decent-sized” database
that gives reasonable results.
We need to consider how we go from summary results to detail results. In
some applications, the detailed response might change over time. For
example, if we were going to display the ratings of the movie, you could
expect them to vary slightly over time. To get the latest rating values, you’d
need to make sure that the movie data was re-fetched frequently to stay up
to date, perhaps once a day or once a week. This would mean storing a last-
fetched date/time and re-fetching after a certain period of time has elapsed
since then.
Since we don’t expect any of our data to change, we’ll just store a flag to
indicate if it’s the full record or not. If someone tries to view a movie that
doesn’t have the full record, we can fetch it in real time and expect that it
will never need to be updated.
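As a rough sketch of the "fetch the full record only once" idea, the snippet below uses a plain dataclass instead of a Django model. MovieRecord, ensure_full_record and the fake fetch function are all hypothetical names used for illustration; the only assumption carried over from the API is that the detail response has a Plot key.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MovieRecord:
    """Hypothetical stand-in for a stored movie; not the Django model."""

    imdb_id: str
    title: str
    plot: Optional[str] = None
    is_full_record: bool = False


def ensure_full_record(movie: MovieRecord, fetch_detail: Callable[[str], dict]) -> MovieRecord:
    """Fetch detail data only when the stored record is still a summary."""
    if movie.is_full_record:
        return movie  # already complete, so no API request is made
    detail = fetch_detail(movie.imdb_id)
    movie.plot = detail["Plot"]
    movie.is_full_record = True
    return movie


requests_made = []


def fake_fetch_detail(imdb_id):
    """Pretend API call that records how often it is used."""
    requests_made.append(imdb_id)
    return {"Plot": "Example plot text."}


movie = MovieRecord(imdb_id="tt0076759", title="Star Wars")
ensure_full_record(movie, fake_fetch_detail)
ensure_full_record(movie, fake_fetch_detail)  # second call does not fetch again
print(movie.is_full_record, len(requests_made))  # True 1
```

Because the flag is never reset, this design only suits data that is not expected to change, which is the assumption made above.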
With all that considered, here’s how the flow would work:
We’ll go into each step in more detail as we build the site. Next we’ll look at
modeling the movie data in Django.
Django Models
It was important to see the response that was received from the API so that
we know what fields are available and what we’re going to store. We’re
going to need three models: Movie, Genre and SearchTerm. The first two
should be obvious; the third will keep a record of search terms that were
used so that we know when to re-run the queries.
Try It Out
You can follow along to set up a new Django project and get the models
created. The fork you made of the GitHub repo course4_proj is the starting
point for this new project. It is the equivalent of having run the django-
admin startproject course4_proj command.
Once it’s set up, we’ll make the changes to support Django Configurations.
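Start in the manage.py file. Change the line:
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'course4_proj.settings')
to:
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'course4_proj.settings')
os.environ.setdefault('DJANGO_CONFIGURATION', 'Dev')
Django Configurations also expects manage.py to import
execute_from_command_line from configurations.management rather than from
django.core.management.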
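Open settings.py. Import Configuration from the configurations package, then
wrap the existing settings in a Dev class:
from configurations import Configuration

class Dev(Configuration):
The easiest way to get all the settings into the Dev class is to select them
all and hit Tab twice to indent them four spaces, so they become attributes
of the Dev class.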
You should then set up the logging configurations so we can see some
debug messages. Add the LOGGING setting:
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "verbose": {
            "format": "{levelname} {asctime} {module} {process:d} {thread:d} {message}",
            "style": "{",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stdout",
            "formatter": "verbose",
        }
    },
    "root": {
        "handlers": ["console"],
        "level": "DEBUG",
    },
}
We also need to update the settings so Django works with Codio. Start by
importing the os module.
import os
MIDDLEWARE = [
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    # 'django.middleware.csrf.CsrfViewMiddleware',
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    # 'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
Next, create the movies app with the startapp management command (python
manage.py startapp movies). If all your changes went OK, the app will be
created without any issue. Then update settings.py by adding 'movies' to
INSTALLED_APPS.
Open models.py and add the SearchTerm model:
from django.db import models

class SearchTerm(models.Model):
    class Meta:
        ordering = ["id"]

    term = models.TextField(unique=True)
    last_search = models.DateTimeField(auto_now=True)
The term that’s searched for, term, is unique. The last_search is the date
that the search was last performed. We use this to make sure the search
isn’t repeated too often.
class Genre(models.Model):
    class Meta:
        ordering = ["name"]

    name = models.TextField(unique=True)

class Movie(models.Model):
    class Meta:
        ordering = ["title", "year"]

    title = models.TextField()
    year = models.PositiveIntegerField()
    runtime_minutes = models.PositiveIntegerField(null=True)
    imdb_id = models.SlugField(unique=True)
    genres = models.ManyToManyField(Genre, related_name="movies")
    plot = models.TextField(null=True, blank=True)
    is_full_record = models.BooleanField(default=False)
The Movie model has the fields that you would expect, matching what the API
supplies. We use the is_full_record flag to determine if the Movie contains
only the values in the list response, or if it has been supplemented with the
full detail response.
Notice how imdb_id (the ID of the movie from the Internet Movie DataBase)
has been defined. It’s a unique SlugField. We’re actually going to treat this
as kind of like a primary key. In the movie detail URL, we’ll use this, and
then query the database on it. It will also allow us to create a mapping
between the record in our local DB and the data provided by OMDb. OMDb
in turn uses this ID to map back to IMDb.
Add these three models to models.py and save the file. Then, run the
makemigrations and migrate management commands.
OMDb Client
When building a REST client that’s going to be used with Django, it’s
tempting to write it so that it’s tightly integrated with Django. For example,
OMDb requires an API key for each request. We’re going to store this in
settings.py, and it would be tempting to write an OMDb client that
automatically retrieves the setting value when it’s instantiated. We could
also have our client directly write to the Django database.
When writing our OMDb client, we’ll try to stick to these rules:
Have a single method that’s used to make requests, so we have just one
place to add authentication and error handling.
It should not need to know about Django at all. This will allow us to
refactor more easily and re-use the code in a non-Django project.
The transformation from JSON to Python should take place in the client.
The consumers of the client should not need to know about the data
structure of the response. This means our API can change and only our
client code needs to be updated, not different parsing code throughout
our codebase.
Now you can follow along as we implement the client and data helper class.
Try It Out
Start by creating a new directory in the root of the Django project (the first
course4_proj), called omdb. Inside it, create an empty file called
__init__.py, then another file called client.py.
Inside client.py we need to start with some imports, and set up two global
variables. Add this code:
import logging
import requests
logger = logging.getLogger(__name__)
OMDB_API_URL = "https://www.omdbapi.com/"
Here we’re just setting up the logger and defining the API URL which we’ll
use in the client.
You should be able to see the advantage of using the OmdbMovie class. It
moves all the transformations from API to Python into a single place. There
is a separation between the data that’s returned from the API and how we’re
using said data in Python. If the API response were to change, for example,
Title changes to Name, we could still refer to title in our code and just
change the key that’s being used to fetch it from the data.
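class OmdbMovie:
    """A simple class to represent movie data coming back from OMDb
    and transform to Python types."""

    # __init__ and check_for_detail_data_key are reconstructed from how the
    # properties below use them; treat their exact form as an assumption.
    def __init__(self, data):
        self.data = data  # the decoded JSON dictionary returned by OMDb

    def check_for_detail_data_key(self, key):
        """Raise if a key that only exists in the detail response is missing."""
        if key not in self.data:
            raise AttributeError(f"{key} is not in data; is this a detail response?")

    @property
    def imdb_id(self):
        return self.data["imdbID"]

    @property
    def title(self):
        return self.data["Title"]

    @property
    def year(self):
        return int(self.data["Year"])

    @property
    def runtime_minutes(self):
        self.check_for_detail_data_key("Runtime")
        # Assumed reconstruction: the runtime looks like "121 min".
        rt, units = self.data["Runtime"].split(" ")
        if units != "min":
            raise ValueError(f"Expected units 'min' for runtime. Got '{units}'")
        return int(rt)

    @property
    def genres(self):
        self.check_for_detail_data_key("Genre")
        # Assumed reconstruction: genres arrive as one comma-separated string.
        return self.data["Genre"].split(", ")

    @property
    def plot(self):
        self.check_for_detail_data_key("Plot")
        return self.data["Plot"]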
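To see the string-to-Python transformations in isolation, here is a small standalone sketch. The helper names parse_runtime_minutes and parse_genres are hypothetical, and the sample values are modeled on the Star Wars response shown earlier; it assumes runtimes look like "121 min" and genres are comma-separated.

```python
def parse_runtime_minutes(runtime):
    """Convert a string such as '121 min' to the integer 121."""
    value, units = runtime.split(" ")
    if units != "min":
        raise ValueError(f"Expected units 'min' for runtime. Got '{units}'")
    return int(value)


def parse_genres(genre):
    """Convert a comma-separated string to a list of genre names."""
    return [name.strip() for name in genre.split(",")]


# Sample values modeled on the detail response shown earlier.
sample = {"Year": "1977", "Runtime": "121 min", "Genre": "Adventure, Fantasy"}

print(int(sample["Year"]))                       # 1977
print(parse_runtime_minutes(sample["Runtime"]))  # 121
print(parse_genres(sample["Genre"]))             # ['Adventure', 'Fantasy']
```

Code that consumes these values then works with integers and lists and never needs to know how OMDb formats its strings.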
Yield
If you’re not familiar with the yield keyword, it turns a Python function
into a generator. This means the function’s “return value” must be iterated
across. For example, results = client.search(term) doesn’t mean results
contains a list of results. You would have to do something like results =
[movie for movie in client.search(term)]. Going more in-depth into
generators is beyond the scope of this course; the Python Wiki page on
Generators is a good place to start for more information.
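A short, self-contained illustration of that behavior follows. The function fake_search and its page data are made up for this example; the list fetched_pages stands in for API requests so you can see when the generator body actually runs.

```python
fetched_pages = []


def fake_search(pages):
    """Yield every result from a list of pages, one result at a time."""
    for page_number, page in enumerate(pages, start=1):
        fetched_pages.append(page_number)  # stands in for one API request
        for title in page:
            yield title


results = fake_search([["Movie A", "Movie B"], ["Movie C"]])
print(fetched_pages)  # [] because nothing runs until we iterate

first = next(results)
print(first, fetched_pages)  # Movie A [1]

remaining = list(results)
print(remaining, fetched_pages)  # ['Movie B', 'Movie C'] [1, 2]
```

Note that a generator can only be consumed once: after the list() call above, iterating over results again yields nothing.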
Both client methods, get_by_imdb_id() and search(), call just one method to
make the request: the make_request() method. Every API endpoint is on the
same URL, and requests vary only in the parameters passed. make_request()
accepts just a single argument: the dictionary of parameters to pass to the
URL. The method will automatically add the API key to the params when
sending to the API.
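# Body of the client's search() method. The initial values, the yield loop
# and the exit check are assumed reconstructions of missing lines.
page = 1
seen_results = 0
total_results = None
while True:
    logger.info("Fetching page %d", page)
    resp = self.make_request({"s": search, "type": "movie", "page": str(page)})
    resp_body = resp.json()
    if total_results is None:
        total_results = int(resp_body["totalResults"])
    for movie in resp_body["Search"]:
        seen_results += 1
        yield OmdbMovie(movie)
    if seen_results >= total_results:
        break
    page += 1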
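The make_request() method itself is not shown in this section. A minimal sketch of the "single request method" idea might look like the following. The class name SketchOmdbClient, the injected session argument, the fetch_detail_json method and the stub classes are all assumptions for illustration; the course's own client would also wrap results in OmdbMovie rather than return raw JSON. The i parameter is OMDb's lookup by IMDb ID.

```python
import logging

logger = logging.getLogger(__name__)

OMDB_API_URL = "https://www.omdbapi.com/"


class SketchOmdbClient:
    """Hypothetical minimal client; every request goes through make_request()."""

    def __init__(self, api_key, session):
        # `session` is anything with a requests-style get(url, params=...) method,
        # such as requests.Session(). Injecting it is an assumption of this sketch.
        self.api_key = api_key
        self.session = session

    def make_request(self, params):
        """Add the API key to a copy of params, send the request, check for errors."""
        params = {**params, "apikey": self.api_key}
        resp = self.session.get(OMDB_API_URL, params=params)
        resp.raise_for_status()
        return resp

    def fetch_detail_json(self, imdb_id):
        """Fetch the detail response for one movie and return the decoded JSON."""
        logger.info("Fetching detail for IMDb ID %s", imdb_id)
        return self.make_request({"i": imdb_id}).json()


class StubResponse:
    """Stand-in for a response object so the sketch runs without network access."""

    def raise_for_status(self):
        pass

    def json(self):
        return {"Title": "Star Wars", "imdbID": "tt0076759"}


class StubSession:
    """Records the parameters it was asked to send."""

    def __init__(self):
        self.calls = []

    def get(self, url, params=None):
        self.calls.append((url, params))
        return StubResponse()


session = StubSession()
client = SketchOmdbClient("abc123", session)
print(client.fetch_detail_json("tt0076759")["Title"])  # Star Wars
print(session.calls[0][1])  # {'i': 'tt0076759', 'apikey': 'abc123'}
```

Because the key is added in one place, callers never handle authentication, and error handling (here just raise_for_status()) can be extended without touching the methods that build on it.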
Even though this class is separate from Django, that doesn’t stop us from
writing some helper code to make it easier to get an OmdbClient instance.
Let’s write a function that will instantiate an OmdbClient and pass in the key
in the Django settings. Create a file called django_client.py inside the omdb
directory. Add this content:
from django.conf import settings

from omdb.client import OmdbClient


def get_client_from_settings():
    """Create an instance of an OmdbClient using the OMDB_KEY from the Django settings."""
    return OmdbClient(settings.OMDB_KEY)
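Next, add the OMDB_KEY setting to the Dev class in settings.py. With Django
Configurations (values is imported from the configurations package), a
SecretValue is read from an environment variable, here DJANGO_OMDB_KEY:
OMDB_KEY = values.SecretValue()
Set the variable before running a management command: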
$ export DJANGO_OMDB_KEY=abc123
$ python manage.py [command...]
Since you’re only using this project and key for your own personal
development, you might find it tedious having to do this. Therefore, you
can choose to hard code the API key directly into the settings.py file.
Important: use your own OMDb key in place of "abc123".
OMDB_KEY = "abc123"
As we saw in Course One, this is kind of like having the SECRET_KEY in
settings.py. It’s fine for development, but when going into production you
would want to make sure it was stored securely and not in your codebase.
Helper Functions
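Rather than integrate OmdbClient directly into our Django views, we’ll write
three helper functions that contain the logic. This will allow us to add some
management commands without repeating code. The helper functions will
be get_or_create_genres(), fill_movie_details(), and a search function that
saves the results from OMDb to our database.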
Try It Out
Now you will implement these helper functions. Start by creating the file
omdb_integration.py, inside the movies app directory. You’ll first need to
add some imports and set up the logger:
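import logging
import re
from datetime import timedelta

from django.utils.timezone import now
from movies.models import Genre, Movie, SearchTerm
from omdb.django_client import get_client_from_settings

logger = logging.getLogger(__name__)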
def get_or_create_genres(genre_names):
    for genre_name in genre_names:
        genre, created = Genre.objects.get_or_create(name=genre_name)
        yield genre
Then the fill_movie_details() function:
def fill_movie_details(movie):
    """
    Fetch a movie's full details from OMDb. Then, save it to the
    DB. If the movie is already a full record (`is_full_record`
    is True) this does nothing, so it's safe to call with any `Movie`.
    """
    if movie.is_full_record:
        logger.warning(
            "'%s' is already a full record.",
            movie.title,
        )
        return
    omdb_client = get_client_from_settings()
    movie_details = omdb_client.get_by_imdb_id(movie.imdb_id)
    movie.title = movie_details.title
    movie.year = movie_details.year
    movie.plot = movie_details.plot
    movie.runtime_minutes = movie_details.runtime_minutes
    # Replace any existing genres with the ones from the detail response.
    movie.genres.clear()
    for genre in get_or_create_genres(movie_details.genres):
        movie.genres.add(genre)
    movie.is_full_record = True
    movie.save()
Notice that if the movie already has all the data (is_full_record is True)
then we do nothing. Otherwise we instantiate an OmdbClient, fetch the
details, then update and save the Movie.
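def search_and_save(search):  # function name and the missing lines are assumed reconstructions
    normalized_search_term = re.sub(r"\s+", " ", search.lower())
    search_term, created = SearchTerm.objects.get_or_create(term=normalized_search_term)
    if not created and search_term.last_search > now() - timedelta(days=1):
        logger.warning("'%s' was searched for in the past 24 hours.", normalized_search_term)
        return
    omdb_client = get_client_from_settings()
    for omdb_movie in omdb_client.search(search):
        movie, created = Movie.objects.get_or_create(
            imdb_id=omdb_movie.imdb_id, defaults={"title": omdb_movie.title, "year": omdb_movie.year})
        if created:
            logger.info("Movie created: '%s'", movie.title)
    search_term.save()  # auto_now updates last_search on save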
Notice that the search term is normalized (multiple spaces removed, and
then lowercased) before checking if the search has been done before. If the
search has not been made, then an OmdbClient is instantiated and a search
performed, with each result being saved to the local database.
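The normalization step can be tried on its own. This is a minimal sketch: the function name normalize_search_term is hypothetical, and trimming leading and trailing whitespace is an extra assumption beyond what the text describes.

```python
import re


def normalize_search_term(term):
    """Collapse runs of whitespace to one space, trim the ends, and lowercase."""
    return re.sub(r"\s+", " ", term).strip().lower()


print(normalize_search_term("  Star   WARS "))  # star wars
print(normalize_search_term("star wars"))       # star wars
```

Normalizing first means that differently typed versions of the same search share one record, so the 24-hour check applies to all of them.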
Now in order to test these we’ll create some management commands. This
will allow testing without having to write views and templates yet. In the
movies app directory, create a directory called management with an empty
__init__.py file inside it. Then inside the management directory, create
another directory called commands, also with an empty __init__.py file
inside.
Create a file inside the commands directory called movie_search.py. Copy and
paste this content:
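from django.core.management.base import BaseCommand
from movies.omdb_integration import search_and_save  # helper name is an assumption

class Command(BaseCommand):
    help = "Searches OMDb and populates the database with results"

    # The argument handling below is an assumed reconstruction of missing lines.
    def add_arguments(self, parser):
        parser.add_argument("search", nargs="+")

    def handle(self, *args, **options):
        search_and_save(" ".join(options["search"]))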
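Django passes an argparse-based parser to a command's add_arguments() method, so the argument handling can be explored with the standard library alone. The sketch below assumes the command takes a positional argument named search that accepts one or more words; that shape is an assumption, not something confirmed by the text.

```python
import argparse

# Stand-in for the parser Django would hand to add_arguments().
parser = argparse.ArgumentParser(prog="movie_search")
parser.add_argument("search", nargs="+", help="words of the title to search for")

# Equivalent to typing: movie_search star wars
options = vars(parser.parse_args(["star", "wars"]))
search = " ".join(options["search"])
print(search)  # star wars
```

With nargs="+", the words do not need to be quoted on the command line; they are collected into a list and joined back into a single search term.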
To test this out and perform your first search, execute the movie_search
command with manage.py:
You’ll probably see a lot more output, and sometimes the search can take a
while to complete if there are a lot of results. But now, you will have Movie
objects in your database. You can verify this by starting a Django shell and
querying the database.
You can also test running the same search more than once, and verify that
the search is not performed again within 24 hours.
The output should inform the user that a search was done in the past 24
hours.
The final command we’ll create is the one to fetch the full data for a movie,
given its IMDb ID. This will use the fill_movie_details() function. Create
a file called movie_fill.py inside the commands directory.
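import logging

from django.core.management.base import BaseCommand
from movies.models import Movie
from movies.omdb_integration import fill_movie_details

logger = logging.getLogger(__name__)

class Command(BaseCommand):
    help = "Fetches the full OMDb details for a movie and saves them to the database"

    # The argument name and the lookup below are assumed reconstructions of missing lines.
    def add_arguments(self, parser):
        parser.add_argument("imdb_id")

    def handle(self, *args, **options):
        movie = Movie.objects.get(imdb_id=options["imdb_id"])
        fill_movie_details(movie)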
Try running the command more than once with the same IMDb ID.
Wrap-Up
We’ve implemented the models, client, and functions to tie them together.
At this point it would be trivial to write the views and templates. We’re not
going to do that as they should be straightforward:
Results
We should note that we’re assuming our movie search will return the same
results as OMDb’s. For example, if OMDb uses a full-text search but we use
a simple contains query then the records might not match. This is
something that you would need to test with trial and error to make sure the
results were consistent.
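To make the mismatch concrete, the sketch below compares a simple substring match with a looser word-based match on a few titles taken from the responses shown earlier. Both functions are hypothetical: all_words_match is not OMDb's actual algorithm, only an example of a search that behaves differently from a contains-style query, and the search term is chosen for illustration.

```python
def contains_match(term, title):
    """Case-insensitive substring match, in the spirit of an icontains lookup."""
    return term.lower() in title.lower()


def all_words_match(term, title):
    """Looser match: every word of the term must start some word of the title."""
    title_words = title.lower().replace(":", " ").replace("-", " ").split()
    return all(
        any(word.startswith(part) for word in title_words)
        for part in term.lower().split()
    )


titles = [
    "Star Wars",
    "Star Trek: Enterprise - In a Time of War",
    "Sky Corps - Star Beast War",
]
term = "star war"

print([t for t in titles if contains_match(term, t)])
# ['Star Wars']
print([t for t in titles if all_words_match(term, t)])
# ['Star Wars', 'Star Trek: Enterprise - In a Time of War', 'Sky Corps - Star Beast War']
```

If the remote search behaves like the second function while the local query behaves like the first, records saved from the API may never appear in local results for the same term.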
Finally it’s important to mention that you need to follow the licensing rules
of the API. The terms of use may not allow you to make a local copy of the
data. Fortunately for us the data in OMDb is licensed under the Creative
Commons Attribution-NonCommercial 4.0 International License. This
means we are allowed to copy, redistribute, remix, transform, etc, the data,
as long as the source is attributed and we use it in a non-commercial way. If
we were building a UI for our site, we’d make sure to have attribution and
follow these terms.
In the next section, we’re going to look at interacting with an API using a
third-party library instead of our own client.
Pushing to GitHub
Before continuing, you must push your work to GitHub. In the terminal:
git add .
git commit -m "Finish omdb api"
Push to GitHub:
git push