

How to Download Files From URLs With Python


by Claudia Ng • Jan 25, 2025

Table of Contents
Facilitating File Downloads With Python
Downloading a File From a URL in Python
Using urllib From the Standard Library
Using the Third-Party requests Library
Saving Downloaded Content to a File
Downloading a Large File in a Streaming Fashion
Performing Parallel File Downloads
Using a Pool of Threads With the requests Library
Using the Asynchronous aiohttp Library
Deciding Which Option to Choose
File Sizes to Download
User-Friendliness
Additional Features and Flexibility
Conclusion
Frequently Asked Questions


Python makes it straightforward to download files from a URL with its robust set of libraries. For quick tasks, you can use the
built-in urllib module or the requests library to fetch and save files. When working with large files, streaming data in chunks can
help save memory and improve performance.

You can also perform parallel file downloads using ThreadPoolExecutor for multithreading or the aiohttp library for asynchronous
tasks. These approaches allow you to handle multiple downloads concurrently, significantly reducing the total download time if
you’re handling many files.

By the end of this tutorial, you’ll understand that:

You can use Python to download files with libraries like urllib and requests.
To download a file using a URL in Python, you can use urlretrieve() or requests.get().
To extract data from a URL in Python, you use the response object from requests.
To download a CSV file from a URL in Python, you may need to specify the format in the URL or query parameters.

In this tutorial, you’ll be downloading a range of economic data from the World Bank Open Data platform. To get started on this
example project, go ahead and grab the sample code below:

Free Bonus: Click here to download your sample code for downloading files from the Web with Python.

Facilitating File Downloads With Python


While it’s possible to download files from URLs using traditional command-line tools, Python provides several libraries that
facilitate file retrieval. Using Python to download files offers several advantages.

One advantage is flexibility, as Python has a rich ecosystem of libraries, including ones that offer efficient ways to handle
different file formats, protocols, and authentication methods. You can choose the most suitable Python tools to accomplish the
task at hand and fulfill your specific requirements, whether you’re downloading from a plain-text CSV file or a complex binary
file.

Another reason is portability. You may encounter situations where you’re working on cross-platform applications. In such cases,
using Python is a good choice because it’s a cross-platform programming language. This means that Python code can run
consistently across different operating systems, such as Windows, Linux, and macOS.

Using Python also offers the possibility of automating your processes, saving you time and effort. Some examples include
automating retries if a download fails, retrieving and saving multiple files from URLs, and processing and storing your data in
designated locations.
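
For instance, here's a minimal sketch of one such automation: retrying a failed download a few times before giving up. The retry count and delay are arbitrary choices for illustration:

Python

import time
from urllib.error import URLError
from urllib.request import urlretrieve

def download_with_retries(url, filename, retries=3, delay=2):
    """Attempt a download, retrying on network errors."""
    for attempt in range(1, retries + 1):
        try:
            return urlretrieve(url, filename)
        except URLError as error:
            print(f"Attempt {attempt} failed: {error}")
            time.sleep(delay)  # Give the network a moment before retrying
    raise RuntimeError(f"Giving up on {url} after {retries} attempts")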

These are just a few reasons why downloading files using Python is better than using traditional command-line tools. Depending
on your project requirements, you can choose the approach and library that best suits your needs. In this tutorial, you’ll learn
approaches to some common scenarios requiring file retrievals.


Downloading a File From a URL in Python


In this section, you’ll learn the basics of downloading a ZIP file containing gross domestic product (GDP) data from the World
Bank Open Data platform. You’ll use two common tools in Python, urllib and requests, to download GDP by country.

While the urllib package comes with Python in its standard library, it has some limitations. So, you’ll also learn to use a popular
third-party library, requests, that offers more features for making HTTP requests. Later in the tutorial, you’ll see additional
functionalities and use cases.

Using urllib From the Standard Library


Python ships with a package called urllib, which provides a convenient way to interact with web resources. It has a
straightforward and user-friendly interface, making it suitable for quick prototyping and smaller projects. With urllib, you can
perform different tasks dealing with network communication, such as parsing URLs, sending HTTP requests, downloading files, and handling errors related to network operations.

As a standard library package, urllib has no external dependencies and doesn’t require installing additional packages, making it
a convenient choice. For the same reason, it’s readily accessible for development and deployment. It’s also cross-platform
compatible, meaning you can write and run code seamlessly using the urllib package across different operating systems
without additional dependencies or configuration.

The urllib package is also very versatile. It integrates well with other modules in the Python standard library, such as re for
building and manipulating regular expressions, as well as json for working with JSON data. The latter is particularly handy when
you need to consume JSON APIs.

In addition, you can extend the urllib package and use it with other third-party libraries, like requests, BeautifulSoup, and
Scrapy. This offers the possibility for more advanced operations in web scraping and interacting with web APIs.
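
As a quick illustration of that json integration, here's a minimal sketch that queries the same World Bank API for JSON instead of a ZIP archive. It assumes the API's ?format=json query parameter and its usual two-element response layout of metadata followed by records:

Python

import json
from urllib.request import urlopen

url = (
    "https://api.worldbank.org/v2/country/US/"
    "indicator/NY.GDP.MKTP.CD?format=json"
)

with urlopen(url) as response:
    data = json.load(response)  # Parse the JSON body straight from the response

# The second element typically holds the data records
print(data[1][0])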

To download a file from a URL using the urllib package, you can call urlretrieve() from the urllib.request module. This
function fetches a web resource from the specified URL and then saves the response to a local file. To start, import urlretrieve()
from urllib.request:

Python

>>> from urllib.request import urlretrieve

Next, define the URL that you want to retrieve data from. If you don’t specify a path to a local file where you want to save the
data, then the function will create a temporary file for you. Since you know that you’ll be downloading a ZIP file from that URL,
go ahead and provide an optional path to the target file:

Python

>>> url = (
...     "https://api.worldbank.org/v2/en/indicator/"
...     "NY.GDP.MKTP.CD?downloadformat=csv"
... )
>>> filename = "gdp_by_country.zip"

Because your URL is quite long, you rely on Python’s implicit concatenation by splitting the string literal over multiple lines
inside parentheses. The Python interpreter will automatically join the separate strings on different lines into a single string. You
also define the location where you wish to save the file. When you only provide a filename without a path, Python will save the
resulting file in your current working directory.

Then, you can download and save the file by calling urlretrieve() and passing in the URL and optionally your filename:

Python

>>> urlretrieve(url, filename)
('gdp_by_country.zip', <http.client.HTTPMessage object at 0x7f06ee7527d0>)

The function returns a tuple of two objects: the path to your output file and an HTTP message object. When you don’t specify a
custom filename, then you’ll see a path to a temporary file that might look like this: /tmp/tmps7qjl1tj. The HTTPMessage object
represents the HTTP headers returned by the server for the request, which can contain information like content type, content
length, and other metadata.

You can unpack the tuple into the individual variables using an assignment statement and iterate over the headers as though
they were a Python dictionary:

Python


>>> path, headers = urlretrieve(url, filename)
>>> for name, value in headers.items():
...     print(name, value)
...
Date Wed, 28 Jun 2023 11:26:18 GMT
Content-Type application/zip
Content-Length 128310
Connection close
Set-Cookie api_https.cookieCORS=76a6c6567ab12cea5dac4942d8df71cc; Path=/; SameSite=None; Secure
Set-Cookie api_https.cookie=76a6c6567ab12cea5dac4942d8df71cc; Path=/
Cache-Control public, must-revalidate, max-age=1
Expires Wed, 28 Jun 2023 11:26:19 GMT
Last-Modified Wed, 28 Jun 2023 11:26:18 GMT
Content-Disposition attachment; filename=API_NY.GDP.MKTP.CD_DS2_en_csv_v2_5551501.zip
Request-Context appId=cid-v1:da002513-bd8b-4441-9f30-737944134422

This information might be helpful when you’re unsure about which file format you’ve just downloaded and how you’re supposed
to interpret its content. In this case, it’s a ZIP file that’s about 128 kilobytes in size. You can also deduce the original filename,
which was API_NY.GDP.MKTP.CD_DS2_en_csv_v2_5551501.zip.
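
Because the headers object is an http.client.HTTPMessage, which inherits from email.message.Message, you can also let it parse the Content-Disposition header for you rather than reading it by eye. A small sketch:

Python

>>> headers.get_filename()
'API_NY.GDP.MKTP.CD_DS2_en_csv_v2_5551501.zip'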

Now that you’ve seen how to download a file from a URL using Python’s urllib package, it’s time to tackle the same task using a
third-party library. You’ll find out which way is more convenient for you.


Using the Third-Party requests Library


While urllib is a good built-in option, there may be scenarios where you need to use third-party libraries to make more
advanced HTTP requests, such as those requiring some form of authentication. The requests library is a popular, user-friendly,
and Pythonic API for making HTTP requests in Python. It can handle the complexities of low-level network communication
behind the curtain.

The requests library is also known for its flexibility and offers tighter control over the download process, allowing you to
customize it according to your project requirements. Some examples include the ability to specify request headers, handle
cookies, access data behind login-gated web pages, stream data in chunks, and more.

In addition, the library is designed to be efficient and performant by supporting various features that enhance the overall
download performance. Its ability to automatically handle connection pooling and reuse optimizes network utilization and
reduces overhead.
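
If you plan to make several requests to the same server, you can make that connection reuse explicit with a requests.Session object. Here's a brief sketch using the same World Bank endpoints:

Python

import requests

# A Session reuses the underlying TCP connection across requests
with requests.Session() as session:
    for resource in ["SP.POP.TOTL", "NY.GDP.MKTP.CD"]:
        url = f"https://api.worldbank.org/v2/en/indicator/{resource}"
        response = session.get(url, params={"downloadformat": "csv"})
        print(response.status_code)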

Now, you’ll look into using the requests library to download that same ZIP file with GDP by country data from the World Bank
Open Data platform. To begin, install the requests library into your active virtual environment using pip:

Shell

(venv) $ python -m pip install requests

This command installs the latest release of the requests library into your virtual environment. Afterward, you can start a new
Python REPL session and import the requests library:

Python

>>> import requests

Before moving further, it’s worth recalling the available HTTP methods because the requests library exposes them to you
through Python functions. When you make HTTP requests to web servers, you have two commonly used methods to choose
from:


1. HTTP GET
2. HTTP POST

You’ll use the GET method to retrieve data by fetching a representation of the remote resource without modifying the server’s
state. Therefore, you’ll commonly use it to retrieve files like images, HTML web pages, or raw data. You’ll use the GET request in
later steps.

The POST method allows you to send data for the server to process or use in creating or updating a resource. In POST requests,
the data is typically sent in the request body in various formats like JSON or XML, and it’s not visible in the URL. You can use POST
requests for operations that modify server data, such as creating, updating, or submitting existing or new resources.
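
For reference, a minimal POST call with requests might look like the sketch below. The endpoint and payload are purely illustrative:

Python

import requests

# Hypothetical endpoint and payload, just to show the shape of a POST request
response = requests.post(
    "https://example.com/api/reports",
    json={"indicator": "NY.GDP.MKTP.CD", "year": 2023},
)
print(response.status_code)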

In this tutorial, you’ll only use GET requests for downloading files.

Next, define the URL of the file that you want to download. To include additional query parameters in the URL, you’ll pass in a
dictionary of strings as key-value pairs:

Python

>>> url = "https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD"
>>> query_parameters = {"downloadformat": "csv"}

In the example above, you define the same URL as before but specify the downloadformat=csv parameter separately using a
Python dictionary. The library will append those parameters to the URL after you pass them to requests.get() using an optional
params argument:

Python

>>> response = requests.get(url, params=query_parameters)

This makes a GET request to retrieve data from the constructed URL with optional query parameters. The function returns an
HTTP response object with the server’s response to the request. If you’d like to see the constructed URL with the optional
parameters included, then use the response object’s .url attribute:

Python

>>> response.url
'https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD?downloadformat=csv'

The response object provides several other convenient attributes that you can check out. For example, these two will let you
determine if the request was successful and what HTTP status code the server returned:

Python

>>> response.ok
True

>>> response.status_code
200

A status code of 200 indicates that your request has been completed successfully. Okay, but how do you access the data payload,
usually in JSON format, that you’ve retrieved with the requests library? Read on to answer this question in the next section.


Saving Downloaded Content to a File


Now that you’ve retrieved content from a URL using the requests library, you can save it to your computer locally. When saving
data to a file in Python, it’s highly recommended to use the with statement. It ensures that Python properly manages resources,
including files, and automatically closes them when you no longer need them.

There are a few ways in which you can access data retrieved with the requests library, depending on content type. In particular,
when you want to save the original data to a local file, then you’ll use the .content attribute of the returned response object.
Because this attribute holds raw bytes, you’ll open a new file in binary mode for writing ('wb') and then write the downloaded
content to this file:

Python

>>> with open("gdp_by_country.zip", mode="wb") as file:
...     file.write(response.content)
...
128310

What you see in the output is the number of bytes saved to the file. In this case, it’s consistent with the expected content length
that you saw earlier. Fine, but that file was only a hundred or so kilobytes long. How about downloading much larger files, which
are so common in many data science projects?

Downloading a Large File in a Streaming Fashion


You’ve now seen how to download a single ZIP file using both the standard urllib package and the third-party requests library. If
your project requires downloading a larger file, then you may run into issues using the steps above when you try to load the
entire file into memory.

To overcome those issues, you can download large files in a streaming fashion to avoid reading the content of large responses
all at once. Data streams enable you to process and handle the data in manageable chunks, making the download process more
efficient and saving memory.

Data streaming also offers advantages in other scenarios when downloading files in Python, such as the ability to:

Download and process a file in small chunks: This comes in handy when a network enforces restrictions on the size of
data transfer in a single request. In these instances, streaming data can allow you to bypass these limitations to download
and process the file in smaller chunks.

Process and consume the data in real time: By processing the data as it arrives, you can use and extract insights from the
downloaded content of the file while the remaining data continues to download.

Pause and resume the download process: This enables you to download a portion of the file, pause the operation, and
later resume where you left off, without having to restart the entire download. You'll see a sketch of this technique
right after this list.
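
Resuming builds on HTTP range requests. As a rough sketch, assuming the server supports them (the Accept-Ranges: bytes header later in this section suggests it does), you can request only the bytes you're missing and append them to the partial file. It uses stream=True and .iter_content(), which you'll meet in a moment:

Python

import os
import requests

url = "https://databank.worldbank.org/data/download/WDI_CSV.zip"
filename = "WDI_CSV.zip"

# Ask the server for only the bytes you don't have yet
downloaded = os.path.getsize(filename) if os.path.exists(filename) else 0
headers = {"Range": f"bytes={downloaded}-"}

with requests.get(url, headers=headers, stream=True) as response:
    # A 206 Partial Content status means the server honored the range
    with open(filename, mode="ab") as file:
        for chunk in response.iter_content(chunk_size=10 * 1024):
            file.write(chunk)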

To download a large file in a streaming manner, you’d keep the request connection open and download only the response
headers by setting the stream keyword argument in the requests.get() function. Go ahead and try an example by downloading a
large ZIP file, which is around 72 megabytes, containing the World Development Indicators from the World Bank Open Data
platform:

Python

>>> url = "https://databank.worldbank.org/data/download/WDI_CSV.zip"
>>> response = requests.get(url, stream=True)

The stream=True parameter makes the requests library send a GET request at the specified URL in a streaming fashion, which
downloads only the HTTP response headers first. You can view those response headers by accessing the .headers attribute of the
received object:

Python


>>> response.headers
{'Date': 'Wed, 28 Jun 2023 12:53:58 GMT',
'Content-Type': 'application/x-zip-compressed',
'Content-Length': '71855385',
'Connection': 'keep-alive',
'Last-Modified': 'Thu, 11 May 2023 14:56:30 GMT',
'ETag': '0x8DB522FE768EA66',
'x-ms-request-id': '8490ea74-101e-002f-73bf-a9210b000000',
'x-ms-version': '2009-09-19',
'x-ms-lease-status': 'unlocked',
'x-ms-blob-type': 'BlockBlob',
'Cache-Control': 'public, max-age=3600',
'x-azure-ref': '20230628T125357Z-99z2qrefc90b99ypt8spyt0dn40000...8dfa',
'X-Cache': 'TCP_MISS',
'Accept-Ranges': 'bytes'}

As you can see, one of the headers tells you that the server keeps the connection alive for you. This is an HTTP persistent
connection, which allows you to potentially send multiple HTTP requests within a single network connection. Otherwise, you’d
have to establish a new TCP/IP connection for each outgoing request, which is an expensive operation that takes time.

Another advantage of the streaming mode in the requests library is that you can download data in chunks even when you send
only one request. To do so, use the .iter_content() method provided by the response object. This enables you to iterate through
the response data in manageable chunks. In addition, you can specify the chunk size using the chunk_size parameter, which
represents the number of bytes that it should read into memory.

With data streaming, you’ll want to save the downloaded content locally as you progress through the download process:

Python

>>> with open("WDI_CSV.zip", mode="wb") as file:
...     for chunk in response.iter_content(chunk_size=10 * 1024):
...         file.write(chunk)
...

You specify a desired filename or path and open the file in binary mode (wb) using the with statement for better resource
management. Then, you iterate through the response data using response.iter_content(), choosing an optional chunk size,
which is 10 kilobytes in this case. Finally, you write each chunk to the file within the loop’s body.
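
Because the response headers include Content-Length, you could extend that loop with a rough progress indicator. This is a sketch, not part of the original example, and it falls back to zero when the server omits the header:

Python

import requests

url = "https://databank.worldbank.org/data/download/WDI_CSV.zip"
response = requests.get(url, stream=True)

total_bytes = int(response.headers.get("Content-Length", 0))
received = 0

with open("WDI_CSV.zip", mode="wb") as file:
    for chunk in response.iter_content(chunk_size=10 * 1024):
        file.write(chunk)
        received += len(chunk)
        if total_bytes:
            # Overwrite the same line with the current percentage
            print(f"\r{received / total_bytes:.0%}", end="")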

Note: When you don’t intend to consume the entire message body by reading all of the chunks, then you should also use
the with statement for the request. Doing so will ensure that you have access to the response until you’re done reading all
of the desired content:

Python

>>> with requests.get(url, stream=True) as response:
...     # ...
...

This will ensure that the library gracefully closes the underlying connection and releases it to a shared pool for subsequent
requests.

You’re getting better at downloading a single file under different scenarios using Python. However, in real life, you’ll often want to
download more than one file at the same time. For example, you may need to fetch a set of invoice documents from a given time
period. In the next section, you’ll explore a few different ways to download multiple files at once in Python.


Performing Parallel File Downloads


Downloading multiple files is yet another common scenario in Python. In such a case, you can speed up the download process by
getting the files in parallel. There are two popular approaches to downloading several files simultaneously:

1. Using a pool of threads with the requests library
2. Using asynchronous downloads with the aiohttp library

You’ll start with the first approach now.

Using a Pool of Threads With the requests Library


When you want to make many HTTP requests to the same or different servers, you can take advantage of multithreading to
decrease the overall execution time of your code.

In a multithreaded program, you typically spawn multiple threads of execution, each with its own instruction sequence that runs
independently and in parallel with the other. These threads can perform different tasks or execute other parts of the program
concurrently, improving performance and maximizing the use of the available CPU cores. For example, each thread can make its
own HTTP request.

You’ll now try concurrently downloading three ZIP files from the World Bank Open Data platform:

1. Total population by country
2. GDP by country
3. Population density by country

In this example, you’ll use a pool of threads with the requests library to understand how to perform parallel file downloads.
First, import the ThreadPoolExecutor class from the concurrent.futures module and the requests library again:

Python

>>> from concurrent.futures import ThreadPoolExecutor
>>> import requests

Next, write a function that you’ll execute within each thread to download a single file from a given URL:

Python

>>> def download_file(url):
...     response = requests.get(url)
...     if "content-disposition" in response.headers:
...         content_disposition = response.headers["content-disposition"]
...         filename = content_disposition.split("filename=")[1]
...     else:
...         filename = url.split("/")[-1]
...     with open(filename, mode="wb") as file:
...         file.write(response.content)
...     print(f"Downloaded file {filename}")
...

This function takes a URL as an argument, makes a GET request using the requests library, and saves the retrieved data in a local
file. In this specific example, it first attempts to extract the filename from the Content-Disposition response header, which
contains information on which items on the page are displayed inline versus as attachments. If that’s not available, then it gets
the filename from a part of the URL.

Note: Another part of the HTTP response that's worth knowing is the status code. This indicates the status of the request.
For example, 200 is successful, 301 is a redirect, and 404 means the page wasn't found.
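
If you'd rather fail loudly on a bad status than check it by hand, requests can raise an exception for you. A small sketch:

Python

import requests

response = requests.get("https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD")

# Raises requests.HTTPError for 4xx and 5xx responses, does nothing otherwise
response.raise_for_status()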

You’ll be downloading three separate files from the same API, so create a URL template and populate a Python list with it:

Python


>>> template_url = (
...     "https://api.worldbank.org/v2/en/indicator/"
...     "{resource}?downloadformat=csv"
... )

>>> urls = [
...     # Total population by country
...     template_url.format(resource="SP.POP.TOTL"),
...
...     # GDP by country
...     template_url.format(resource="NY.GDP.MKTP.CD"),
...
...     # Population density by country
...     template_url.format(resource="EN.POP.DNST"),
... ]

Here, you call the .format() method on the template string with different resource names that correspond to ZIP files with CSV
data on a remote server.

To download these files concurrently using multiple threads of execution, create a new thread pool and map your
download_file() function onto each item from the list:

Python

>>> with ThreadPoolExecutor() as executor:
...     executor.map(download_file, urls)
...
<generator object Executor.map.<locals>.result_iterator at 0x7fc9c90f0640>
Downloaded file API_SP.POP.TOTL_DS2_en_csv_v2_5551506.zip
Downloaded file API_NY.GDP.MKTP.CD_DS2_en_csv_v2_5551501.zip
Downloaded file API_EN.POP.DNST_DS2_en_csv_v2_5552158.zip

As before, you use the with statement to help with resource management for your threads. Within the executor’s context, you call
the .map() method with the function that you want to execute. You also pass it an iterable, which is the list of URLs in this case.
The executor object allocates a pool of threads up front and assigns a separate thread to each download task.

Because downloading data from a network is an I/O-bound task, which is limited by the speed at which the data can be read
rather than the CPU speed, you achieve true parallelism. Despite Python’s global interpreter lock (GIL), which would otherwise
prevent that, the threads in the executor perform multiple tasks in parallel, resulting in faster overall completion time.

The sample ZIP files are relatively small and similar in size, so they finish downloading at roughly the same time. But what if you
used a for loop to download the files instead? Go ahead and call download_file() with each URL on the main thread:

Python

>>> for url in urls:
...     download_file(url)
...

You may have noticed that this operation took much longer. That’s because the files were downloaded sequentially instead of
concurrently. So when the first file starts downloading, the second one won’t start until it finishes, and so on. This means that the
total time required to download all the files is the sum of the download times for each individual file.
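
To verify that yourself, you could wrap each approach in a simple timer: a minimal sketch using time.perf_counter(), reusing urls and download_file() from the examples above:

Python

import time

start = time.perf_counter()
for url in urls:  # Defined in the earlier examples
    download_file(url)
elapsed = time.perf_counter() - start
print(f"Sequential downloads took {elapsed:.1f} seconds")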

Note: While timing your code is beyond the scope of this tutorial, there are plenty of ways to do so in Python. Check out
Python Timer Functions: Three Ways to Monitor Your Code to learn what they are.

Unfortunately, using threads isn’t always desirable due to their complexity. The example in this section was fairly straightforward
because you didn’t have to worry about thread synchronization, coordination, or resource management. But when working with
multithreaded code, you should be careful about thread safety to ensure that you safely access and modify shared resources to
avoid data corruption.

Note: It’s also important not to run into deadlocks, livelocks, resource starvation, and many more problems related to
threads and concurrent programming in general.

There are some conditions where multithreading may not improve your performance at all. If the underlying problem is
inherently sequential, then there’s no way to parallelize it. Moreover, if your tasks involve CPU-bound operations, then Python
won’t be able to take advantage of multiple CPU cores because of the GIL from earlier. The additional cost of context switching
may actually reduce performance!

If working with threads makes your head spin, then you might be interested in exploring alternative solutions, like Python’s
asynchronous programming capabilities.


Using the Asynchronous aiohttp Library


In addition to multithreading, another method to download multiple files concurrently is by using the async/await pattern in
Python. It involves running multiple non-blocking tasks asynchronously in an event loop by allowing them to suspend and
resume execution voluntarily as a form of cooperative multitasking. This is different from threads, which require a preemptive
scheduler to manage context switching between them.

Asynchronous tasks also differ from multithreading in that they execute concurrently within a single thread instead of multiple
threads. Therefore, they must periodically give up their execution time to other tasks without hogging the CPU. By doing so, I/O-
bound tasks such as asynchronous downloads allow for concurrency, as the program can switch between tasks and make
progress on each in parallel.

A popular Python package to perform asynchronous downloads when retrieving multiple files is the aiohttp library. It’s built on
top of the standard library’s asyncio module, which provides a framework for asynchronous programming in Python.

The aiohttp library takes advantage of the concurrency features in the asyncio package, allowing you to write asynchronous
code that can handle multiple requests concurrently. The library can perform non-blocking network operations, meaning it’ll
let other code run while another task waits for data to arrive from the network.

By using the aiohttp API along with the async def and await keywords, you can write asynchronous code that makes concurrent
HTTP requests. In addition, the aiohttp library supports connection pooling, a feature that allows multiple requests to use the
same underlying connection. This helps to optimize and improve performance.

To begin, install the aiohttp library using pip in the command line:

Shell

(venv) $ python -m pip install aiohttp

This installs the aiohttp library into your active virtual environment.

In addition to this third-party library, you’ll also need the asyncio package from the Python standard library to perform
asynchronous downloads. So, import both packages now:

Python

>>> import asyncio
>>> import aiohttp

The next step is defining an asynchronous function to download a file from a URL. You can do so by creating an
aiohttp.ClientSession instance, which holds a connector reused for multiple connections. It automatically keeps them alive for
a certain time period to reuse them in subsequent requests to the same server whenever possible. This improves performance
and reduces the overhead of establishing new connections for each request.

The following function performs an asynchronous download using the ClientSession class from the aiohttp package:


Python

>>> async def download_file(url):
...     async with aiohttp.ClientSession() as session:
...         async with session.get(url) as response:
...             if "content-disposition" in response.headers:
...                 header = response.headers["content-disposition"]
...                 filename = header.split("filename=")[1]
...             else:
...                 filename = url.split("/")[-1]
...             with open(filename, mode="wb") as file:
...                 while True:
...                     chunk = await response.content.read()
...                     if not chunk:
...                         break
...                     file.write(chunk)
...             print(f"Downloaded file {filename}")
...

The function defined above takes a URL as an argument. It then creates a client session using the async with statement, ensuring
that the session is properly closed and resources are released after the program exits this code block. With the session context, it
makes an HTTP GET request to the specified URL and obtains the response object using the async with statement.

Inside the infinite loop, you read data in chunks, breaking out of the loop when there are no more chunks. The await keyword
indicates that this operation is asynchronous and other tasks can execute in parallel until the data is available. After deriving the
filename from the response object, you save the downloaded chunk of data in a local file.

Afterward, you can perform concurrent downloads using the asynchronous capabilities of the aiohttp and asyncio libraries. You
may reuse the code from an earlier example based on multithreading to prepare a list of URLs:

Python

>>> template_url = (
...     "https://api.worldbank.org/v2/en/indicator/"
...     "{resource}?downloadformat=csv"
... )

>>> urls = [
...     # Total population by country
...     template_url.format(resource="SP.POP.TOTL"),
...
...     # GDP by country
...     template_url.format(resource="NY.GDP.MKTP.CD"),
...
...     # Population density by country
...     template_url.format(resource="EN.POP.DNST"),
... ]

Finally, define and run an asynchronous main() function that will download files concurrently from those URLs:

Python

>>> async def main():
...     tasks = [download_file(url) for url in urls]
...     await asyncio.gather(*tasks)
...
>>> asyncio.run(main())
Downloaded file API_SP.POP.TOTL_DS2_en_csv_v2_5551506.zip
Downloaded file API_EN.POP.DNST_DS2_en_csv_v2_5552158.zip
Downloaded file API_NY.GDP.MKTP.CD_DS2_en_csv_v2_5551501.zip

In the snippet above, the main() function defines a list comprehension to create a list of tasks, whereby each task calls the
download_file() function for each URL in the global variable urls. Note that you define the function with the async keyword to
make it asynchronous. Then, you use asyncio.gather() to wait until all tasks in the list are completed.

You run the main() function using asyncio.run(), which initiates and executes the asynchronous tasks in an event loop. This code
downloads files concurrently from the specified URLs without blocking the execution of other tasks.


Deciding Which Option to Choose


At this point, you’ve learned how to use several tools to download files in Python. Depending on the task at hand, you may want
to choose one option over the others. Some factors to consider are the size and number of files that you’ll be working with, as
well as the tool’s ease of use, flexibility, and additional features.

File Sizes to Download


If you’re looking to download large files, then the requests library is a good option that will handle them efficiently. It can stream
data, letting you iterate over the message body in chunks for a more efficient process and better memory use.

When downloading multiple files, you can either use the requests library together with multithreading or the aiohttp library
with asynchronous downloads to perform concurrent downloads. When used properly, either can improve performance and
optimize the download process.

User-Friendliness

If you’re looking for something quick and straightforward, then the urllib package is already included in Python’s standard
library, requiring no additional installation. This is a good option if you want to download small files that can fit entirely into
memory without any issues.

While you could build many functionalities using urllib, it may be harder to use and require more manual setup for more
involved use cases, such as streaming data and downloading multiple files in parallel. In fact, there are third-party libraries with
functions that readily support these features, such as the requests library, which can support data streaming with an argument
in the requests.get() method.

Additional Features and Flexibility


The requests library has a rich set of features that can help you in numerous other download scenarios. Although you didn’t
extensively cover these tasks, the requests library has features that can handle authentication, redirects, session management,
and more. These features can give you more control and flexibility for more advanced tasks.

If you’d like to learn more about a project that might require extra features supported by the requests library, then check out this
project on building a content aggregator. Creating a content aggregator involves steps to download data from multiple websites,
some of which may require authentication and session management, so requests would really come in handy.

Conclusion

You can use Python to automate your file downloads or to have better control and flexibility over this process. Python offers
several options for downloading files from URLs that cater to different scenarios, such as downloading large files, multiple files,
or files behind gated web pages that require authentication.

In this tutorial, you’ve learned the steps to download files in Python, including how to:

Download files from the Internet using both built-in and external libraries in Python
Perform data streaming and download large files in smaller, more manageable chunks
Use a pool of threads to fetch multiple files concurrently
Download multiple files asynchronously when performing bulk downloads

In addition, you’ve seen how to use the requests library for its streaming and parallel downloading capabilities. You’ve also
seen examples using the aiohttp and asyncio libraries for concurrent requests and asynchronous downloads to improve
download speeds for multiple files.

If you have questions, then feel free to reach out in the comments section below.

Frequently Asked Questions


Now that you have some experience with downloading files from a URL in Python, you can use the questions and answers below
to check your understanding and recap what you’ve learned.

These FAQs are related to the most important concepts you’ve covered in this tutorial:

Can you use Python to download files?

How do you download a file using a URL in Python?

How do you download a CSV file from a URL in Python?
