How to Download Files From URLs With Python – Real Python
Table of Contents
Facilitating File Downloads With Python
Downloading a File From a URL in Python
Using urllib From the Standard Library
Using the Third-Party requests Library
Saving Downloaded Content to a File
Downloading a Large File in a Streaming Fashion
Performing Parallel File Downloads
Using a Pool of Threads With the requests Library
Using the Asynchronous aiohttp Library
Deciding Which Option to Choose
File Sizes to Download
User-Friendliness
Additional Features and Flexibility
Conclusion
Frequently Asked Questions
Python makes it straightforward to download files from a URL with its robust set of libraries. For quick tasks, you can use the
built-in urllib module or the requests library to fetch and save files. When working with large files, streaming data in chunks can
help save memory and improve performance.
https://realpython.com/python-download-file-from-url/ 1/15
30/01/2025, 20:35 How to Download Files From URLs With Python – Real Python
You can also perform parallel file downloads using ThreadPoolExecutor for multithreading or the aiohttp library for asynchronous
tasks. These approaches allow you to handle multiple downloads concurrently, significantly reducing the total download time if
you’re handling many files.
You can use Python to download files with libraries like urllib and requests.
To download a file using a URL in Python, you can use urlretrieve() or requests.get().
To extract data from a URL in Python, you use the response object from requests.
To download a CSV file from a URL in Python, you may need to specify the format in the URL or query parameters.
In this tutorial, you’ll be downloading a range of economic data from the World Bank Open Data platform. To get started on this
example project, go ahead and grab the sample code below:
Free Bonus: Click here to download your sample code for downloading files from the Web with Python.
Facilitating File Downloads With Python

One advantage of using Python for file downloads is flexibility: Python has a rich ecosystem of libraries, including ones that offer efficient ways to handle different file formats, protocols, and authentication methods. You can choose the most suitable Python tools to accomplish the task at hand and fulfill your specific requirements, whether you're downloading a plain-text CSV file or a complex binary file.
Another reason is portability. You may encounter situations where you’re working on cross-platform applications. In such cases,
using Python is a good choice because it’s a cross-platform programming language. This means that Python code can run
consistently across different operating systems, such as Windows, Linux, and macOS.
Using Python also offers the possibility of automating your processes, saving you time and effort. Some examples include
automating retries if a download fails, retrieving and saving multiple files from URLs, and processing and storing your data in
designated locations.
These are just a few reasons why downloading files using Python is better than using traditional command-line tools. Depending
on your project requirements, you can choose the approach and library that best suits your needs. In this tutorial, you’ll learn
approaches to some common scenarios requiring file retrievals.
Downloading a File From a URL in Python
While the urllib package comes with Python in its standard library, it has some limitations. So, you’ll also learn to use a popular
third-party library, requests, that offers more features for making HTTP requests. Later in the tutorial, you’ll see additional
functionalities and use cases.
Using urllib From the Standard Library

As part of the standard library, urllib has no external dependencies and doesn't require installing additional packages, which makes it a convenient choice that's readily available for development and deployment. It's also cross-platform, so you can write and run urllib code seamlessly across different operating systems without extra configuration.
The urllib package is also very versatile. It integrates well with other modules in the Python standard library, such as re for
building and manipulating regular expressions, as well as json for working with JSON data. The latter is particularly handy when
you need to consume JSON APIs.
In addition, you can extend the urllib package and use it with other third-party libraries, like requests, BeautifulSoup, and
Scrapy. This offers the possibility for more advanced operations in web scraping and interacting with web APIs.
To download a file from a URL using the urllib package, you can call urlretrieve() from the urllib.request module. This function fetches a web resource from the specified URL and then saves the response to a local file. To start, import urlretrieve()
from urllib.request:
Python

>>> from urllib.request import urlretrieve
Next, define the URL that you want to retrieve data from. If you don’t specify a path to a local file where you want to save the
data, then the function will create a temporary file for you. Since you know that you’ll be downloading a ZIP file from that URL,
go ahead and provide an optional path to the target file:
Python
>>> url = (
... "https://api.worldbank.org/v2/en/indicator/"
... "NY.GDP.MKTP.CD?downloadformat=csv"
... )
>>> filename = "gdp_by_country.zip"
Because your URL is quite long, you rely on Python’s implicit concatenation by splitting the string literal over multiple lines
inside parentheses. The Python interpreter will automatically join the separate strings on different lines into a single string. You
also define the location where you wish to save the file. When you only provide a filename without a path, Python will save the
resulting file in your current working directory.
Then, you can download and save the file by calling urlretrieve() and passing in the URL and optionally your filename:
Python

>>> urlretrieve(url, filename)
The function returns a tuple of two objects: the path to your output file and an HTTP message object. When you don’t specify a
custom filename, then you’ll see a path to a temporary file that might look like this: /tmp/tmps7qjl1tj. The HTTPMessage object
represents the HTTP headers returned by the server for the request, which can contain information like content type, content
length, and other metadata.
You can unpack the tuple into the individual variables using an assignment statement and iterate over the headers as though
they were a Python dictionary:
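The call and the header iteration can be sketched as follows — a minimal version wrapped in a helper for clarity; the download() name is purely illustrative:

```python
from urllib.request import urlretrieve

def download(url, filename=None):
    # Fetch the resource at url and save it locally. When filename
    # is None, urlretrieve() creates a temporary file instead.
    # Returns the local path and an HTTPMessage with the headers.
    path, headers = urlretrieve(url, filename)
    return path, headers

# With the World Bank URL and filename defined earlier:
# path, headers = download(url, filename)
# for name, value in headers.items():
#     print(name, value)
```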
This information might be helpful when you’re unsure about which file format you’ve just downloaded and how you’re supposed
to interpret its content. In this case, it’s a ZIP file that’s about 128 kilobytes in size. You can also deduce the original filename,
which was API_NY.GDP.MKTP.CD_DS2_en_csv_v2_5551501.zip.
Now that you’ve seen how to download a file from a URL using Python’s urllib package, it’s time to tackle the same task using a
third-party library. You’ll find out which way is more convenient for you.
Using the Third-Party requests Library
The requests library is known for its flexibility and offers tighter control over the download process, allowing you to customize it according to your project requirements. For example, you can specify request headers, handle cookies, access data behind login-gated web pages, stream data in chunks, and more.
In addition, the library is designed to be efficient and performant by supporting various features that enhance the overall
download performance. Its ability to automatically handle connection pooling and reuse optimizes network utilization and
reduces overhead.
Now, you’ll look into using the requests library to download that same ZIP file with GDP by country data from the World Bank
Open Data platform. To begin, install the requests library into your active virtual environment using pip:
Shell

$ python -m pip install requests
This command installs the latest release of the requests library into your virtual environment. Afterward, you can start a new
Python REPL session and import the requests library:
Python

>>> import requests
Before moving further, it’s worth recalling the available HTTP methods because the requests library exposes them to you
through Python functions. When you make HTTP requests to web servers, you have two commonly used methods to choose
from:
1. HTTP GET
2. HTTP POST
You’ll use the GET method to retrieve data by fetching a representation of the remote resource without modifying the server’s
state. Therefore, you’ll commonly use it to retrieve files like images, HTML web pages, or raw data. You’ll use the GET request in
later steps.
The POST method allows you to send data for the server to process or use in creating or updating a resource. In POST requests, the data is typically sent in the request body in formats like JSON or XML, and it's not visible in the URL. You can use POST requests for operations that modify server-side data, such as creating or updating resources.
In this tutorial, you’ll only use GET requests for downloading files.
Next, define the URL of the file that you want to download. To include additional query parameters in the URL, you’ll pass in a
dictionary of strings as key-value pairs:
Python

>>> url = "https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD"
>>> query_params = {"downloadformat": "csv"}
In the example above, you define the same URL as before but specify the downloadformat=csv parameter separately using a
Python dictionary. The library will append those parameters to the URL after you pass them to requests.get() using an optional
params argument:
Python

>>> response = requests.get(url, params=query_params)
This makes a GET request to retrieve data from the constructed URL with optional query parameters. The function returns an
HTTP response object with the server’s response to the request. If you’d like to see the constructed URL with the optional
parameters included, then use the response object’s .url attribute:
Python
>>> response.url
'https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD?downloadformat=csv'
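If you'd rather inspect the final URL before sending anything over the network, you can also prepare the request locally. A small sketch, assuming the same base URL and parameters as above:

```python
import requests

base_url = "https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD"
query_params = {"downloadformat": "csv"}

# Preparing a Request builds the final URL with the query string
# appended, without performing any network I/O:
prepared = requests.Request("GET", base_url, params=query_params).prepare()
print(prepared.url)
```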
The response object provides several other convenient attributes that you can check out. For example, these two will let you
determine if the request was successful and what HTTP status code the server returned:
Python
>>> response.ok
True
>>> response.status_code
200
A status code of 200 indicates that your request has been completed successfully. Okay, but how do you access the data payload that you've retrieved with the requests library? Read on to answer this question in the next section.
Saving Downloaded Content to a File
There are a few ways in which you can access data retrieved with the requests library, depending on content type. In particular,
when you want to save the original data to a local file, then you’ll use the .content attribute of the returned response object.
Because this attribute holds raw bytes, you’ll open a new file in binary mode for writing ('wb') and then write the downloaded
content to this file:
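A sketch of the write step, factored into a helper so the byte-counting behavior is easy to see; the save_content() name is illustrative:

```python
def save_content(content, filename):
    # content holds the raw bytes from response.content, so the
    # file must be opened in binary mode ("wb").
    with open(filename, "wb") as file:
        return file.write(content)  # number of bytes written

# With a real response object from the previous request:
# save_content(response.content, "gdp_by_country.zip")
```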
What you see in the output is the number of bytes saved to the file. In this case, it’s consistent with the expected content length
that you saw earlier. Fine, but that file was only a hundred or so kilobytes long. How about downloading much larger files, which
are so common in many data science projects?
Downloading a Large File in a Streaming Fashion

To overcome those issues, you can download large files in a streaming fashion, which avoids reading the content of large responses all at once. Data streams enable you to process and handle the data in manageable chunks, making the download process more efficient and saving memory.
Data streaming also offers advantages in other scenarios when downloading files in Python, such as the ability to:
Download and process a file in small chunks: This comes in handy when a network enforces restrictions on the size of
data transfer in a single request. In these instances, streaming data can allow you to bypass these limitations to download
and process the file in smaller chunks.
Process and consume the data in real time: By processing the data as it arrives, you can use and extract insights from the
downloaded content of the file while the remaining data continues to download.
Pause and resume the download process: This enables you to download a portion of the file, pause the operation, and
later resume where you left off, without having to restart the entire download.
To download a large file in a streaming manner, you keep the request connection open and download only the response headers by setting the stream keyword argument to True in the requests.get() function. Go ahead and try an example by downloading a large ZIP file, which is around 72 megabytes, containing the World Development Indicators from the World Bank Open Data platform:
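In code, the streaming request looks like this — a sketch with a stand-in URL, since only the response headers arrive at this point:

```python
import requests

def start_streaming(url):
    # With stream=True, requests.get() returns as soon as the
    # response headers arrive; the body is fetched later, on demand.
    return requests.get(url, stream=True)

# Stand-in URL; substitute the real World Development Indicators link:
# response = start_streaming("https://example.com/large-file.zip")
```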
The stream=True parameter makes the requests library send a GET request at the specified URL in a streaming fashion, which
downloads only the HTTP response headers first. You can view those response headers by accessing the .headers attribute of the
received object:
Python
>>> response.headers
{'Date': 'Wed, 28 Jun 2023 12:53:58 GMT',
'Content-Type': 'application/x-zip-compressed',
'Content-Length': '71855385',
'Connection': 'keep-alive',
'Last-Modified': 'Thu, 11 May 2023 14:56:30 GMT',
'ETag': '0x8DB522FE768EA66',
'x-ms-request-id': '8490ea74-101e-002f-73bf-a9210b000000',
'x-ms-version': '2009-09-19',
'x-ms-lease-status': 'unlocked',
'x-ms-blob-type': 'BlockBlob',
'Cache-Control': 'public, max-age=3600',
'x-azure-ref': '20230628T125357Z-99z2qrefc90b99ypt8spyt0dn40000...8dfa',
'X-Cache': 'TCP_MISS',
'Accept-Ranges': 'bytes'}
As you can see, one of the headers tells you that the server keeps the connection alive for you. This is an HTTP persistent
connection, which allows you to potentially send multiple HTTP requests within a single network connection. Otherwise, you’d
have to establish a new TCP/IP connection for each outgoing request, which is an expensive operation that takes time.
Another advantage of the streaming mode in the requests library is that you can download data in chunks even when you send
only one request. To do so, use the .iter_content() method provided by the response object. This enables you to iterate through
the response data in manageable chunks. In addition, you can specify the chunk size using the chunk_size parameter, which
represents the number of bytes that it should read into memory.
With data streaming, you’ll want to save the downloaded content locally as you progress through the download process:
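Putting the pieces together, a minimal streaming download might look like this — the stream_download() name and the 10-kilobyte default are illustrative:

```python
import requests

def stream_download(url, filename, chunk_size=10 * 1024):
    # The with statement around the request ensures the connection
    # is closed and returned to the pool when the download finishes.
    with requests.get(url, stream=True) as response:
        with open(filename, "wb") as file:
            # iter_content() yields the body in chunk_size-byte pieces
            # instead of loading the whole response into memory.
            for chunk in response.iter_content(chunk_size=chunk_size):
                file.write(chunk)

# stream_download(url, "wdi_data.zip")  # hypothetical filename
```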
You specify a desired filename or path and open the file in binary mode ('wb') using the with statement for better resource management. Then, you iterate through the response data with response.iter_content(), choosing an optional chunk size, which is 10 kilobytes in this case. Finally, you write each chunk to the file within the loop's body.
Note: When you don’t intend to consume the entire message body by reading all of the chunks, then you should also use
the with statement for the request. Doing so will ensure that you have access to the response until you’re done reading all
of the desired content:
Python

>>> with requests.get(url, stream=True) as response:
...     ...  # Read the desired content here
This will ensure that the library gracefully closes the underlying connection and releases it to a shared pool for subsequent
requests.
You’re getting better at downloading a single file under different scenarios using Python. However, in real life, you’ll often want to
download more than one file at the same time. For example, you may need to fetch a set of invoice documents from a given time
period. In the next section, you’ll explore a few different ways to download multiple files at once in Python.
Performing Parallel File Downloads

Using a Pool of Threads With the requests Library
In a multithreaded program, you typically spawn multiple threads of execution, each with its own instruction sequence that runs independently of the others. These threads can perform different tasks or execute different parts of the program concurrently, improving performance and maximizing the use of the available CPU cores. For example, each thread can make its own HTTP request.
You'll now try concurrently downloading three ZIP files from the World Bank Open Data platform.
In this example, you’ll use a pool of threads with the requests library to understand how to perform parallel file downloads.
First, import the ThreadPoolExecutor class from the concurrent.futures module and the requests library again:
Python

>>> from concurrent.futures import ThreadPoolExecutor
>>> import requests
Next, write a function that you’ll execute within each thread to download a single file from a given URL:
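A sketch of such a function — close in spirit to what the tutorial describes, with the filename logic spelled out:

```python
import requests

def download_file(url):
    response = requests.get(url)
    if "Content-Disposition" in response.headers:
        # Prefer the filename the server suggests, e.g.
        # 'attachment; filename=data.zip' -> 'data.zip'
        content_disposition = response.headers["Content-Disposition"]
        filename = content_disposition.split("filename=")[1]
    else:
        # Fall back to the last path segment of the URL.
        filename = url.split("/")[-1]
    with open(filename, "wb") as file:
        file.write(response.content)
    print(f"Downloaded file {filename}")
```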
This function takes a URL as an argument, makes a GET request with the requests library, and saves the retrieved data in a local file. It first attempts to extract the filename from the Content-Disposition response header, which tells the client whether the content should be displayed inline or downloaded as an attachment, and may suggest a filename for the latter. If that header isn't available, then the function derives the filename from part of the URL instead.
Note: Another part of the HTTP response that's worth knowing is the status code, which indicates the status of the request. For example, 200 means success, 301 is a redirect, and 404 means the page wasn't found.
You’ll be downloading three separate files from the same API, so create a URL template and populate a Python list with it:
Python
>>> template_url = (
... "https://api.worldbank.org/v2/en/indicator/"
... "{resource}?downloadformat=csv"
... )
>>> urls = [
... # Total population by country
... template_url.format(resource="SP.POP.TOTL"),
...
... # GDP by country
... template_url.format(resource="NY.GDP.MKTP.CD"),
...
... # Population density by country
... template_url.format(resource="EN.POP.DNST"),
... ]
Here, you call the .format() method on the template string with different resource names that correspond to ZIP files with CSV
data on a remote server.
To download these files concurrently using multiple threads of execution, create a new thread pool and map your
download_file() function onto each item from the list:
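The pool-and-map pattern itself can be shown with a stand-in task — here download_file() is a placeholder that only records its URL, not the real requests-based downloader:

```python
from concurrent.futures import ThreadPoolExecutor

def download_file(url):
    # Placeholder for the requests-based downloader defined earlier;
    # each call runs in its own worker thread.
    return f"downloaded {url}"

urls = [
    "https://example.com/a.zip",
    "https://example.com/b.zip",
    "https://example.com/c.zip",
]

with ThreadPoolExecutor() as executor:
    # map() distributes the URLs across the pool's worker threads.
    results = list(executor.map(download_file, urls))
```

Because .map() preserves input order, results lines up with urls even though the calls ran on different threads.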
As before, you use the with statement to help with resource management for your threads. Within the executor’s context, you call
the .map() method with the function that you want to execute. You also pass it an iterable, which is the list of URLs in this case.
The executor object allocates a pool of threads up front and assigns a separate thread to each download task.
Because downloading data from a network is an I/O-bound task, limited by the speed at which the data can be read rather than by the CPU speed, threads are a great fit here. Although Python's global interpreter lock (GIL) prevents threads from running Python code in parallel, blocking network calls release the GIL, so the threads in the executor can overlap their waits and make progress on multiple downloads at once, resulting in a faster overall completion time.
The sample ZIP files are relatively small and similar in size, so they finish downloading at roughly the same time. But what if you
used a for loop to download the files instead? Go ahead and call download_file() with each URL on the main thread:
Python

>>> for url in urls:
...     download_file(url)
You may have noticed that this operation took much longer. That’s because the files were downloaded sequentially instead of
concurrently. So when the first file starts downloading, the second one won’t start until it finishes, and so on. This means that the
total time required to download all the files is the sum of the download times for each individual file.
Note: While timing your code is beyond the scope of this tutorial, there are plenty of ways to do so in Python. Check out Python Timer Functions: Three Ways to Monitor Your Code to learn what they are.
Unfortunately, using threads isn’t always desirable due to their complexity. The example in this section was fairly straightforward
because you didn’t have to worry about thread synchronization, coordination, or resource management. But when working with
multithreaded code, you should be careful about thread safety to ensure that you safely access and modify shared resources to
avoid data corruption.
Note: It's also important to avoid deadlocks, livelocks, resource starvation, and the many other problems that can arise in concurrent programming.
There are some conditions where multithreading may not improve your performance at all. If the underlying problem is
inherently sequential, then there’s no way to parallelize it. Moreover, if your tasks involve CPU-bound operations, then Python
won’t be able to take advantage of multiple CPU cores because of the GIL from earlier. The additional cost of context switching
may actually reduce performance!
If working with threads makes your head spin, then you might be interested in exploring alternative solutions, like Python’s
asynchronous programming capabilities.
Using the Asynchronous aiohttp Library
Asynchronous tasks differ from multithreading in that they execute concurrently within a single thread instead of across multiple threads. Therefore, they must periodically give up their execution time to other tasks without hogging the CPU. By doing so, I/O-bound tasks such as asynchronous downloads achieve concurrency: the program switches between tasks and makes progress on each while the others wait for data.
A popular Python package to perform asynchronous downloads when retrieving multiple files is the aiohttp library. It’s built on
top of the standard library’s asyncio module, which provides a framework for asynchronous programming in Python.
The aiohttp library takes advantage of the concurrency features in the asyncio package, allowing you to write asynchronous
code that can handle multiple requests concurrently. The library can perform non-blocking network operations, meaning it’ll
let other code run while another task waits for data to arrive from the network.
By using the aiohttp API along with the async def and await keywords, you can write asynchronous code that makes concurrent
HTTP requests. In addition, the aiohttp library supports connection pooling, a feature that allows multiple requests to use the
same underlying connection. This helps to optimize and improve performance.
To begin, install the aiohttp library using pip in the command line:
Shell

$ python -m pip install aiohttp
This installs the aiohttp library into your active virtual environment.
In addition to this third-party library, you’ll also need the asyncio package from the Python standard library to perform
asynchronous downloads. So, import both packages now:
Python

>>> import aiohttp
>>> import asyncio
The next step is defining an asynchronous function to download a file from a URL. You can do so by creating an
aiohttp.ClientSession instance, which holds a connector reused for multiple connections. It automatically keeps them alive for
a certain time period to reuse them in subsequent requests to the same server whenever possible. This improves performance
and reduces the overhead of establishing new connections for each request.
The following function performs an asynchronous download using the ClientSession class from the aiohttp package:
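A sketch of such a coroutine, with the assumption that the filename is derived from the URL's last path segment — the tutorial's exact code may differ:

```python
import aiohttp

async def download_file(url):
    # The session holds a reusable connector; async with makes sure
    # both the session and the response are closed afterward.
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # Assumption: derive the filename from the URL path.
            filename = str(response.url).split("/")[-1] or "download.bin"
            with open(filename, "wb") as file:
                while True:
                    # Read the body in 1 KB chunks; an empty chunk
                    # signals the end of the stream.
                    chunk = await response.content.read(1024)
                    if not chunk:
                        break
                    file.write(chunk)

# To run it for a single URL:
# asyncio.run(download_file(url))
```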
The function defined above takes a URL as an argument. It creates a client session using the async with statement, ensuring that the session is properly closed and its resources are released after the program exits this code block. Within the session context, it makes an HTTP GET request to the specified URL and obtains the response object, again using async with.
Inside the loop, you read data in chunks, breaking out when there are no more chunks left. The await keyword indicates that the read is asynchronous, so other tasks can run while this one waits for data to arrive. After deriving the filename from the response object, you save each downloaded chunk of data to a local file.
Afterward, you can perform concurrent downloads using the asynchronous capabilities of the aiohttp and asyncio libraries. You
may reuse the code from an earlier example based on multithreading to prepare a list of URLs:
Python
>>> template_url = (
... "https://api.worldbank.org/v2/en/indicator/"
... "{resource}?downloadformat=csv"
... )
>>> urls = [
... # Total population by country
... template_url.format(resource="SP.POP.TOTL"),
...
... # GDP by country
... template_url.format(resource="NY.GDP.MKTP.CD"),
...
... # Population density by country
... template_url.format(resource="EN.POP.DNST"),
... ]
Finally, define and run an asynchronous main() function that will download files concurrently from those URLs:
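The structure of main() can be sketched with a stand-in coroutine, where asyncio.sleep() simulates the network wait instead of the real aiohttp downloader:

```python
import asyncio

async def download_file(url):
    # Placeholder for the aiohttp-based coroutine defined earlier.
    await asyncio.sleep(0.01)  # simulate waiting on the network
    return f"downloaded {url}"

urls = ["https://example.com/a.zip", "https://example.com/b.zip"]

async def main():
    # One task per URL; gather() runs them concurrently and
    # returns their results in the original order.
    tasks = [download_file(url) for url in urls]
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
```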
In the snippet above, the main() function defines a list comprehension to create a list of tasks, whereby each task calls the
download_file() function for each URL in the global variable urls. Note that you define the function with the async keyword to
make it asynchronous. Then, you use asyncio.gather() to wait until all tasks in the list are completed.
You run the main() function using asyncio.run(), which initiates and executes the asynchronous tasks in an event loop. This code
downloads files concurrently from the specified URLs without blocking the execution of other tasks.
Deciding Which Option to Choose
When downloading multiple files, you can either use the requests library together with multithreading or the aiohttp library
with asynchronous downloads to perform concurrent downloads. When used properly, either can improve performance and
optimize the download process.
User-Friendliness
If you’re looking for something quick and straightforward, then the urllib package is already included in Python’s standard
library, requiring no additional installation. This is a good option if you want to download small files that can fit entirely into
memory without any issues.
While you could build many of these functionalities with urllib, it may be harder to use and require more manual setup for more involved use cases, such as streaming data or downloading multiple files in parallel. Third-party libraries readily support these features; for example, the requests library supports data streaming through a single argument to the requests.get() function.
Additional Features and Flexibility

If you'd like to learn more about a project that might require extra features supported by the requests library, then check out this project on building a content aggregator. Creating a content aggregator involves downloading data from multiple websites, some of which may require authentication and session management, so requests would really come in handy.
Conclusion
You can use Python to automate your file downloads or to have better control and flexibility over this process. Python offers
several options for downloading files from URLs that cater to different scenarios, such as downloading large files, multiple files,
or files behind gated web pages that require authentication.
In this tutorial, you’ve learned the steps to download files in Python, including how to:
Download files from the Internet using both built-in and external libraries in Python
Perform data streaming and download large files in smaller, more manageable chunks
Use a pool of threads to fetch multiple files concurrently
Download multiple files asynchronously when performing bulk downloads
In addition, you’ve seen how to use the requests library for its streaming and parallel downloading capabilities. You’ve also
seen examples using the aiohttp and asyncio libraries for concurrent requests and asynchronous downloads to improve
download speeds for multiple files.
If you have questions, then feel free to reach out in the comments section below.
About Claudia Ng
Claudia is an avid Pythonista and Real Python contributor. She is a Data Scientist and has worked for several tech startups
specializing in the areas of credit and fraud risk modeling.
Keep Learning
Related Tutorials:
An Intro to Threading in Python
Async IO in Python: A Complete Walkthrough
How to Sort Unicode Strings Alphabetically in Python
Click and Python: Build Extensible and Composable CLI Apps
Socket Programming in Python (Guide)