grep is a powerful tool for searching a text file for a pattern or regular expression, but it cannot search PDF files. That is where pdfgrep comes in: it is a simple command that searches PDF files for a regular expression. This article covers the pdfgrep command and its usage.
Syntax:
Usage: pdfgrep [OPTION]... PATTERN FILE...
Installation of pdfgrep command
pdfgrep is not pre-installed like grep, but it is available from the repositories of most Linux distributions.
1. For Ubuntu/Debian:
sudo apt-get install pdfgrep
2. For CentOS/Fedora:
sudo yum install pdfgrep
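After installation, a quick check can confirm that the binary is on the PATH. The following is a minimal sketch that uses the shell's command -v built-in together with pdfgrep's --version flag; the status variable is only an illustrative name.

```shell
# Report whether pdfgrep is available on this system.
if command -v pdfgrep >/dev/null 2>&1; then
    status="installed"
    # Print the installed version.
    pdfgrep --version
else
    status="missing"
    echo "pdfgrep not found; install it with your package manager" >&2
fi
echo "pdfgrep status: $status"
```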
Working with pdfgrep:
The pdfgrep command is largely compatible with GNU grep and adds some PDF-specific options. If you are familiar with grep, most of the options will look familiar.
1. Basic Search
Let's do a basic search for the string "General Linux" in a PDF file:
Example:
pdfgrep "General Linux" intro-linux.pdf
Output:
2. Print filename
Use the --with-filename or -H option to display the PDF file name along with each match, even when there is only one file to search.
Example:
pdfgrep -H dns intro-linux.pdf
Output:
The command prints the filename by default when there is more than one file to search (implies -H).
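To illustrate searching more than one file, here is a small sketch of a wrapper function. The name search_pdfs and the second file name in the example call are hypothetical; only the -H and -i options described in this article are used.

```shell
# Case-insensitive search across several PDFs.
# Usage: search_pdfs PATTERN FILE...
search_pdfs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: search_pdfs PATTERN FILE..." >&2
        return 2
    fi
    pattern=$1
    shift
    # -H forces the file name prefix even if only one file is given.
    pdfgrep -H -i "$pattern" "$@"
}

# Example call (more-notes.pdf is a hypothetical second file):
# search_pdfs dns intro-linux.pdf more-notes.pdf
```

With two or more files the -H flag is redundant, but keeping it makes the output format the same regardless of how many files are passed.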
3. Case-insensitive search:
Use --ignore-case or -i to do a case-insensitive search. Let's search for the word dns.
Example:
pdfgrep -i dns intro-linux.pdf
Output:
The above output shows the matches for both dns and DNS.
4. Get the match count
Use --count or -c to print the number of matches instead of the matching lines.
Example:
pdfgrep -ic dns intro-linux.pdf
Output:
Thus, ignoring case, dns appears 28 times in the file.
5. Show the page number
Use --page-number or -n to show the page number. This option prefixes each match with the number of the page on which the pattern matched.
Example:
pdfgrep -in dns intro-linux.pdf
Output:
6. Show match-count per page
Use --page-count or -p option to print the number of matches per page. This option implies page number (-n).
Example:
pdfgrep -ip dns intro-linux.pdf
Output:
Each line of the above output has the form 'page number: match count'. On page 53, dns appears once, while on page 169 it appears 5 times.
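The per-page counts can be added up to cross-check the total reported by -c. The sketch below assumes the output lines end with a colon followed by the count, as described above; the helper name sum_page_counts is hypothetical, and the demonstration uses the two pages mentioned in the text instead of a real PDF.

```shell
# Sum the per-page counts printed by `pdfgrep -p`
# (lines whose last colon-separated field is the count).
sum_page_counts() {
    awk -F: '{ total += $NF } END { print total + 0 }'
}

# Demonstration with the two pages mentioned above
# (page 53 has 1 match, page 169 has 5 matches).
printf '53:1\n169:5\n' | sum_page_counts   # prints 6

# With a real file, assuming pdfgrep is installed and intro-linux.pdf exists:
# pdfgrep -ip dns intro-linux.pdf | sum_page_counts
```

Taking the last field rather than the second keeps the helper usable when several files are searched and each line also carries a file name prefix.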
7. Stop match count
Use the --max-count or -m option to stop reading the file after NUM matching lines. This is useful when only the first few matches are needed and the rest of the file does not have to be read.
Example:
pdfgrep -inm 10 dns intro-linux.pdf
Output:
The output shows only the first 10 matches for the dns pattern; pdfgrep stopped reading the file after that.
8. Context control
The following options can be used when the user wants to know what lines are present before, after, and around the match.
8.1 Context after the match
Use --after-context or -A option to print NUM lines of context after the match.
Example:
pdfgrep -A 2 dns intro-linux.pdf
Output:
Here, 2 lines are printed after each match, and contiguous groups of matches are separated by --.
8.2 Context before the match
Use --before-context or -B to print NUM lines of context before the match.
Example:
pdfgrep -B 2 dns intro-linux.pdf
Output:
8.3 Context around match
Use --context or -C to print NUM lines of context before and after the match.
Example:
pdfgrep -C 2 dns intro-linux.pdf
Output:
9. Caching
A PDF file can contain images as well as text. When the file is large, it takes some time to skip the media and extract the text, which is frustrating when the same file is searched frequently. The --cache option caches the rendered text so that later searches of the same file run faster. It is especially helpful for large files.
Example:
time pdfgrep --cache -iq dns intro-linux.pdf
Output:
Here the pattern dns is searched with and without the cache enabled; the command that includes --cache completes faster than the one that does not. The -q option suppresses the normal output so that only the timing is shown.
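The effect of the cache is easiest to see by timing the same search twice. The following is a rough sketch, assuming a bash-like shell where time is available and that intro-linux.pdf (the sample file used in this article) is in the current directory; it skips the search if either pdfgrep or the file is missing.

```shell
# Time the same quiet, case-insensitive search twice with the cache enabled.
# The first run has to extract the text; the second can reuse the cached text.
pdf="intro-linux.pdf"   # sample file name used throughout this article
if command -v pdfgrep >/dev/null 2>&1 && [ -f "$pdf" ]; then
    for run in first second; do
        echo "== $run run =="
        # -q suppresses the matches so only the timing is printed;
        # "|| true" keeps the loop going if there is no match.
        time pdfgrep --cache -iq dns "$pdf" || true
    done
else
    echo "skipping: pdfgrep or $pdf not available" >&2
fi
```

Actual timings depend on the file and the machine, so no figures are shown here.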
10. Password protected file
With the --password option, pdfgrep can also search a password-protected file.
Usage:
pdfgrep --password [password] [pattern] [pdf_file]
Example:
pdfgrep --password "ndey" dns intro-linux-protected.pdf
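Typing the password directly on the command line leaves it in the shell history. One possible alternative, sketched below, prompts for it instead. The function name search_protected is hypothetical, and read -s is a bash extension; note that the password is still passed to pdfgrep as an argument, so it may be visible in the process list while the command runs.

```shell
# Prompt for the password without echoing it, then pass it to pdfgrep.
# Usage: search_protected PATTERN FILE
search_protected() {
    local pattern=$1 file=$2 password
    printf 'Password for %s: ' "$file" >&2
    # -s keeps the typed password off the screen (bash extension).
    IFS= read -rs password
    printf '\n' >&2
    pdfgrep --password "$password" "$pattern" "$file"
}

# Example call with the protected file from above:
# search_protected dns intro-linux-protected.pdf
```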
Differences between grep and pdfgrep
grep | pdfgrep |
---|---|
It works only on plain text files. | It works only on PDF files. |
It is installed by default. | It is not installed by default but can be downloaded from the repository. |
It operates on lines. | It operates on pages. |
The -n option shows the line number. | The -n option shows the page number. |