Welcome to video 4 in Generating Random Data in Python. In the previous video, we learned why, in secure applications, it’s important to generate a random number in a cryptographically secure way through entropy. But how do we effectively incorporate entropy in code?
os.urandom()
There are two standard library modules in Python, secrets and uuid, that provide us with the necessary entropy to generate cryptographically secure random numbers. Both modules get entropy from your operating system through the os module’s os.urandom() function. Let’s take a look at this function first:
>>> import math, os, sys
>>> sec = os.urandom(32)
>>> sec = int.from_bytes(sec, sys.byteorder)
>>> len(str(sec))
>>> math.log10(sec)
After importing the necessary modules, we invoke os.urandom(), passing it the size we need in bytes, and it returns a value of type bytes. To make this easier to look at, let’s convert it to an integer, where we can see that it’s 77 digits long in this run, a sufficiently large random number.
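The same steps can be sketched in script form. The variable names raw and value are illustrative and not from the video. Since 32 bytes hold 256 bits, the integer has at most 78 decimal digits, and the exact count varies from run to run.

```python
import os
import sys

# Ask the operating system for 32 random bytes (256 bits).
raw = os.urandom(32)

# Interpret those bytes as one large unsigned integer.
value = int.from_bytes(raw, sys.byteorder)

print(len(raw))            # number of bytes requested
print(value.bit_length())  # never more than 256
print(len(str(value)))     # never more than 78 decimal digits
```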
secrets
As of Python 3.6, we have secrets, a short module that’s basically a wrapper around os.urandom().
secrets
is from PEP 506, which was introduced to more or less protect developers from themselves. In other words, developers who didn’t thoroughly read the documentation and used the random
module for secure applications. If this describes you, then don’t be too embarrassed. A quick Google search will you show you that there are many others in this camp.
From now on, however, you don’t have an excuse. Use secrets instead.
secrets exports a handful of functions for generating random numbers, bytes, and strings. Let’s look at some examples.
After importing secrets and specifying a size, we can generate secure tokens from our system random source as bytes, hex strings, or URL-safe strings. We also have the familiar choice() function:
>>> import secrets
>>> n = 16
>>> secrets.token_bytes(n)
b'\xe7\x04S\x9fv\xe9\xc4\x90\xf1\xb8\x98X\xb2\x0f\xf3B'
>>> secrets.token_hex(n)
'065c070f35c8ea534c3fc0f0d6c3e8d6'
>>> secrets.token_urlsafe(n)
'qNElCiWqsg_psF_mYeRnEw'
>>> secrets.choice('abcde')
'b'
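Beyond the token functions, secrets also provides randbelow() and randbits() for random integers. The sketch below uses them alongside choice() to build a simple password. The alphabet and the 12-character length are arbitrary choices made for illustration.

```python
import secrets
import string

# A random integer in the range [0, 10).
digit = secrets.randbelow(10)

# A random non-negative integer built from 32 random bits.
bits = secrets.randbits(32)

# A 12-character password drawn from letters and digits.
alphabet = string.ascii_letters + string.digits
password = ''.join(secrets.choice(alphabet) for _ in range(12))

print(digit, bits, password)
```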
Let’s see secrets in action with a URL shortener application. The real-life versions of these are a bit more involved, but ours will be pretty simple, just enough to demonstrate the operation of token_urlsafe(). This function, as its name suggests, returns a URL-safe string generated from the number of random bytes requested. We’ve incorporated it in our shorten() function, where we keep track of our URL mappings in a global DATABASE variable.
In our main program, we pass a couple of URL strings, and we print out our returned result for each, along with their database entries.
The reason we’re getting 7-character strings back when specifying 5 bytes is that token_urlsafe() uses base64 encoding, where each character carries 6 bits. Five bytes is 8 * 5 = 40 bits, so the result has ceil(40 / 6) = 7 characters.
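The length arithmetic can be checked directly. In this sketch, expected_length() is a hypothetical helper written for illustration. It is not part of the secrets module.

```python
import math
from secrets import token_urlsafe


def expected_length(nbytes: int) -> int:
    """Characters needed to encode nbytes at 6 bits per character."""
    return math.ceil(8 * nbytes / 6)


for nbytes in (5, 16, 32):
    token = token_urlsafe(nbytes)
    # The two numbers printed after nbytes should agree.
    print(nbytes, len(token), expected_length(nbytes))
```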
shortly.py:
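from secrets import token_urlsafe

DATABASE = {}

def shorten(url: str, nbytes: int = 5) -> str:
    # Generate a random URL-safe token to use as the short path.
    ext = token_urlsafe(nbytes=nbytes)
    if ext in DATABASE:
        # Collision: this token is already taken, so try again.
        return shorten(url, nbytes=nbytes)
    else:
        DATABASE.update({ext: url})
        return f'short.ly/{ext}'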
In this example, we’re passing a string to the shorten() function to generate a random token for the URL to map to. If the token already exists, we rerun until it’s unique. We specify 5 bytes as the default length. Here’s the function in use:
>>> urls = (
...     'https://realpython.com/',
...     'https://docs.python.org/3/library/secrets.html'
... )
>>> for u in urls:
...     print(shorten(u))
...
short.ly/p_Z4fLI
short.ly/fuxSyNY
>>> DATABASE
{'p_Z4fLI': 'https://realpython.com/',
 'fuxSyNY': 'https://docs.python.org/3/library/secrets.html'}
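The retry-until-unique idea could also be written with a loop instead of recursion, with the mapping passed in rather than kept in a global. This is only one possible variation. The name shorten_iterative, the database parameter, and the example.com URL are assumptions made for illustration.

```python
from secrets import token_urlsafe


def shorten_iterative(url: str, database: dict, nbytes: int = 5) -> str:
    """Map url to a fresh random token, retrying on the rare collision."""
    ext = token_urlsafe(nbytes)
    while ext in database:
        # Token already taken: draw another one.
        ext = token_urlsafe(nbytes)
    database[ext] = url
    return f'short.ly/{ext}'


db = {}
link = shorten_iterative('https://example.com/', db)
print(link)
print(db)
```

Passing the dictionary in makes the function easier to test, since each test can start from its own empty mapping.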
uuid
The second module from the standard library, mentioned earlier, is uuid. UUID stands for universally unique identifier. A UUID is 128 bits, which is 16 bytes or 32 hex digits. In the uuid module, there’s a function, uuid4().
There are others, uuid1(), uuid3(), and uuid5(), but those variations derive their values from inputs such as your machine’s network address, the current time, or a name you supply, whereas uuid4() uses system random, so it’s the one that’s secure. Let’s see how uuid4() works:
>>> import uuid
>>> tok = uuid.uuid4()
Notice that uuid4() doesn’t return a string, but rather an instance of the UUID class. This offers some convenience, as the instance has the attributes hex, int, and bytes.
If you’re wondering about collisions (another word for generating duplicates), the chances are super small. Of the 128 bits, 6 are fixed to mark the version and variant, which leaves 122 random bits, so any two UUIDs match with a probability of one in 2^122, improbable enough to be ignored in practice.
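To make those attributes concrete, here is a short sketch that inspects a freshly generated UUID. The values themselves differ on every run, so the comments only note the properties that stay the same.

```python
import uuid

tok = uuid.uuid4()

print(type(tok).__name__)  # UUID
print(tok.version)         # 4
print(str(tok))            # canonical form with hyphens
print(tok.hex)             # the same value as 32 hex digits
print(len(tok.bytes))      # 16
print(tok.int)             # the same value as one large integer
```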
SystemRandom
If you’ve been following along by looking at the standard library documentation for the modules we’ve been working with, then you may have noticed that the random module does provide a SystemRandom class that uses os.urandom().
You might be wondering why Python’s random module wouldn’t simply default to using the safer, more secure system random. First, as we noted, it’s often necessary to reproduce test or modeling data. Second, cryptographically secure random number generation tends to be slower.
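A small sketch of that trade-off: two seeded random.Random instances replay the same sequence, while random.SystemRandom offers the same interface but draws from the operating system and cannot be replayed. The seed value 42 and the sample choices are arbitrary.

```python
import random

# Seeded generators are deterministic: same seed, same sequence.
a = random.Random(42)
b = random.Random(42)
seq_a = [a.randint(1, 100) for _ in range(5)]
seq_b = [b.randint(1, 100) for _ in range(5)]
print(seq_a == seq_b)  # True

# SystemRandom draws from os.urandom(), so there is no seed to replay.
sys_rng = random.SystemRandom()
roll = sys_rng.randint(1, 6)
pick = sys_rng.choice(['red', 'green', 'blue'])
print(roll, pick)
```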
Hashing
Sometimes, there’s confusion about whether hashing involves randomness. In short, it does not. It’s an algorithm that produces a one-way, fixed-size string from a given input. A hash function will always produce the same string if given the same input. Its value is that it’s not reversible and can be used to verify digital integrity.
Some applications store hashes of user passwords so they can avoid storing plaintext passwords. The user types in their password, and then the app hashes it and compares the hash to the database.
A single hash cycle of the password is not secure enough for user passwords because it’s trivial to generate what’s known as a rainbow table, which is a sort of lookup guide for common words and their hashed equivalents. To safeguard against this, it’s common for systems to repeat the hash many times, to salt it, or both. Salting the hash means adding some extra data to the original input before it is hashed.
Sometimes that salt is generated randomly, but otherwise, hashing and randomness are not related.
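As a minimal sketch of salting and repeating a hash, the standard library’s hashlib.pbkdf2_hmac() does both in one call. The helper names hash_password() and verify_password(), the 16-byte salt, and the iteration count are assumptions chosen for illustration, not recommendations from the video.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative value; tune it for your own system


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest), generating a random salt if none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        'sha256', password.encode('utf-8'), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare the digests."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password('correct horse')
print(verify_password('correct horse', salt, stored))  # True
print(verify_password('wrong guess', salt, stored))    # False
```

The salt is the only random part here. Given the same password and the same salt, the digest is always the same, which is what makes verification possible.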
Recap
We’ve covered a lot of ground in this video series, so let’s take a moment to recap.
We started with the random module and many of its most useful methods and operations.
We then took a look at NumPy’s version of random and how it can be useful in basic data science applications.
Finally, we wrapped up with cryptographically secure randomness in Python in the form of secrets, which wraps os.urandom(), our source of system entropy, and another module, uuid, which uses that same entropy to generate unique IDs.
I hope you found the video series useful. If you have feedback or questions, please let us know in the comments below. Thanks for watching.