How to extract images from PDF in Python?

Last Updated : 09 Sep, 2024
Author: devanshigupta1304

This article shows how to extract images from PDF files in Python, and how to convert an image to a PDF and a PDF to images. All three tasks use the PyMuPDF library, which is imported under the name fitz. First, install PyMuPDF and Pillow with pip:

```
pip install PyMuPDF Pillow
```

PyMuPDF is used to access PDF files. The extraction script below writes the raw image bytes straight to disk, so it only needs PyMuPDF; Pillow is useful if you want to post-process the extracted images.

To extract images from a PDF file, follow these steps:

1. Import the necessary libraries.
2. Specify the path of the file from which you want to extract images and open it.
3. Iterate through all the pages of the PDF and get the images present on every page.
4. Use the get_images() method of the page to get all image objects as a list of tuples. The first item of each tuple is the image's XREF.
5. Use the extract_image() method of the document to get the image bytes along with additional information about the image, such as its file extension.

Note: Older tutorials call these methods getImageList() and extractImage(). Those camelCase names are deprecated; current PyMuPDF releases use the snake_case names shown here.

Implementation:

```python
# STEP 1
# import the library
import fitz  # PyMuPDF

# STEP 2
# path of the file you want to extract images from
file = "/content/pdf_file.pdf"

# open the file
pdf_file = fitz.open(file)

# STEP 3
# iterate over the PDF pages
for page_index in range(len(pdf_file)):
    # load the page and list the images on it
    page = pdf_file.load_page(page_index)
    image_list = page.get_images(full=True)

    # print the number of images found on this page (1-based page number)
    if image_list:
        print(f"[+] Found a total of {len(image_list)} images on page {page_index + 1}")
    else:
        print("[!] No images found on page", page_index + 1)

    for image_index, img in enumerate(image_list, start=1):
        # the XREF of the image is the first item of the tuple
        xref = img[0]

        # extract the image bytes
        base_image = pdf_file.extract_image(xref)
        image_bytes = base_image["image"]

        # get the image extension
        image_ext = base_image["ext"]

        # save the image
        image_name = f"image{page_index+1}_{image_index}.{image_ext}"
        with open(image_name, "wb") as image_file:
            image_file.write(image_bytes)
        print(f"[+] Image saved as {image_name}")
```

Each image is saved in the current working directory as image<page>_<index>.<ext>, where the page number and the image index both start at 1.

Image to PDF and PDF to Image Conversion:

Image to PDF Conversion

```python
import fitz

doc = fitz.open()                     # new, empty PDF document
imgdoc = fitz.open('image.jpeg')      # open image
pdfbytes = imgdoc.convert_to_pdf()    # one-page PDF as bytes
imgpdf = fitz.open("pdf", pdfbytes)   # open those bytes as a PDF
doc.insert_pdf(imgpdf)                # append the page to the empty document
doc.save('imagetopdf.pdf')            # save file
```

First, we open a blank document and then open the image. The image is converted to PDF data using the convert_to_pdf() method. That data is opened as a PDF and appended to the empty document created at the start with insert_pdf(). Finally, the document is saved.

PDF to Image Conversion

Note: We are using sample.pdf for the PDF to image conversion. The file is available at:
https://www.africau.edu/images/default/sample.pdf

```python
import fitz

doc = fitz.open('sample.pdf')
for page in doc:
    # render the page to a pixmap; JPEG has no alpha channel, so alpha=False
    pix = page.get_pixmap(matrix=fitz.Identity, dpi=None,
                          colorspace=fitz.csRGB, clip=None, alpha=False, annots=True)
    pix.save("samplepdfimage-%i.jpg" % page.number)  # save file
```

We use the get_pixmap() method to render each page as an image and then save it. The page is rendered with alpha=False because the JPEG format cannot store transparency. If your PyMuPDF version rejects the .jpg extension, save the pixmap with a .png extension instead.

Output: sample.pdf is a two-page document, so two separate images are created. Because page.number starts at 0, they are named samplepdfimage-0.jpg and samplepdfimage-1.jpg.
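The extraction steps in this article save one file for every entry returned by get_images(). When the same image (for example, a logo) is placed on several pages, it is normally listed on each of those pages with the same XREF, so it gets written more than once. The following is a minimal sketch of one way to skip such repeats. The function names iter_unique_images and extract_unique_images and the output folder name extracted_images are assumptions made for this illustration; they are not part of PyMuPDF.

```python
from pathlib import Path


def iter_unique_images(pages_images):
    """Yield (page_number, image_index, xref) once per distinct XREF.

    pages_images is an iterable of (page_number, image_list) pairs, where
    image_list has the shape returned by page.get_images(full=True):
    a list of tuples whose first item is the image XREF.
    """
    seen = set()
    for page_number, image_list in pages_images:
        for image_index, img in enumerate(image_list, start=1):
            xref = img[0]
            if xref in seen:
                continue  # this image was already yielded for an earlier page
            seen.add(xref)
            yield page_number, image_index, xref


def extract_unique_images(pdf_path, out_dir="extracted_images"):
    """Save every distinct embedded image once and return the saved paths."""
    import fitz  # PyMuPDF; imported here so the helper above has no dependency

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with fitz.open(pdf_path) as doc:
        # 1-based page numbers, matching the file names used in the article
        per_page = ((page.number + 1, page.get_images(full=True)) for page in doc)
        for page_number, image_index, xref in iter_unique_images(per_page):
            info = doc.extract_image(xref)
            target = out / f"image{page_number}_{image_index}.{info['ext']}"
            target.write_bytes(info["image"])
            saved.append(target)
    return saved
```

Calling extract_unique_images("/content/pdf_file.pdf") would write the images into the extracted_images folder. A repeated image keeps the file name of the first page it appears on.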
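The PDF to image conversion in this article renders pages with the identity matrix, which corresponds to the PDF base resolution of 72 points per inch. For sharper output, a common approach is to pass a scaling matrix to get_pixmap(). The sketch below shows that idea under stated assumptions: the helper names zoom_for_dpi, rendered_size and render_pages, the default of 150 DPI and the page prefix are all chosen for this example, and rendered_size only approximates the pixel dimensions PyMuPDF will produce.

```python
PDF_POINTS_PER_INCH = 72  # PDF page sizes are measured in points (1/72 inch)


def zoom_for_dpi(dpi):
    """Return the scale factor that maps 72 DPI to the requested DPI."""
    return dpi / PDF_POINTS_PER_INCH


def rendered_size(width_pt, height_pt, dpi):
    """Approximate pixel size of a page of the given size in points."""
    zoom = zoom_for_dpi(dpi)
    return round(width_pt * zoom), round(height_pt * zoom)


def render_pages(pdf_path, dpi=150, prefix="page"):
    """Render every page of pdf_path to a PNG file and return the file names."""
    import fitz  # PyMuPDF; imported here so the helpers above stay dependency-free

    zoom = zoom_for_dpi(dpi)
    matrix = fitz.Matrix(zoom, zoom)  # scale x and y by the same factor
    names = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            pix = page.get_pixmap(matrix=matrix, alpha=False)
            name = f"{prefix}-{page.number}.png"
            pix.save(name)
            names.append(name)
    return names
```

For example, render_pages("sample.pdf", dpi=144) doubles the width and height of each rendered page compared with the default rendering.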