Extracting pages from a PDF with Acrobat JavaScript

This document provides a tutorial on using Acrobat JavaScript to automate the extraction of pages from large PDF documents. It outlines the use of the doc.extractPages() function, prerequisites for automation, and includes example scripts for extracting and emailing specific pages. The tutorial is aimed at intermediate and advanced users familiar with Acrobat JavaScript programming.
Learn how to use Acrobat JavaScript to automate splitting apart smaller subsets of pages from large PDF-based documents.

By Thom Parker – February 12, 2009

Scope: Acrobat 5.0 and later
Category: Automation
Skill Level: Intermediate and Advanced
Prerequisites: Basic Acrobat JavaScript Programming

Imagine receiving a large, automatically generated report in PDF that needs to be sliced and diced so different parts can be sent to
clients or other departments. Not an uncommon activity, and one that’s possible to do manually with Acrobat Professional. Now
imagine having to do this every week to a document that needs to be split 100 different ways. That’s a big task, one prone to human
error. Fortunately, this can be easily automated with Acrobat JavaScript.

About page extraction


Page extraction is performed with the doc.extractPages() function. This function takes three input arguments: the page numbers
for the beginning and end of the extraction, and a path to a PDF file where the extracted pages are saved.

This is a simple function to use, especially since all the input arguments are optional. But it does have a couple of restrictions. First,
page extraction cannot be done in the free Adobe Reader; it can only be done with Acrobat Professional or Standard. Second,
due to security restrictions in Acrobat scripting, the path input can only be used if this function is called from a privileged context.
This means the path input cannot be used if this function is run from a script in a PDF file. Extracting pages is for automation, not
document interactivity. Automation scripts include JavaScript code run from the JavaScript Console, a Batch Process, or a Folder
Level Script.

All the examples in this article will be run from the Acrobat Console Window, which is a privileged context and also very handy for
running quick cut-and-paste automation scripts. I’ve made up an example file for testing. Download this file and save it to a local
folder on your system:
Example file
NelsonsInc_Employee1040s.pdf

This file was generated from the accounting mainframe at Nelson’s Buggy Whips. In 1864, Nelson’s provided all its employees with
filled-out 1040s to make it easier for them to file taxes. The sample above is a single file with all employees’ 1040s included. It was
generated for print, but now needs to be split and e-mailed to the individual employees.

We’ll start off with some simple examples before getting into the full automation script.

Open the example file in Acrobat Professional, then open the JavaScript Console by pressing Ctrl+J on Windows, or Command+J
on Mac.

To extract a single page from the document, specify only the nStart input. Run the following code in the JavaScript Console:

this.extractPages({nStart:5});

If your screen isn’t large enough to accommodate both the Console Window and Acrobat, close the Console Window. Notice
Acrobat has created a new temporary file with a single page (page six) from the original document. It’s very important to remember
that page numbers in JavaScript are zero-based, i.e., page zero in JavaScript is page one in the Acrobat viewer.
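
As a quick illustration of the zero-based numbering, here is a minimal sketch that converts a page number as shown in the Acrobat viewer into the page index that extractPages() expects. The helper name viewerPageToIndex is made up for this example; it is not part of the Acrobat JavaScript API.

```javascript
// Hypothetical helper (not an Acrobat API): convert a 1-based viewer page
// number into the 0-based page index used by Acrobat JavaScript.
function viewerPageToIndex(nViewerPage) {
    if (typeof nViewerPage !== "number" || nViewerPage < 1 || nViewerPage % 1 !== 0) {
        throw new Error("Viewer page numbers are whole numbers starting at 1");
    }
    return nViewerPage - 1;
}

// Page six in the viewer is page index 5 in JavaScript.
var nIndex = viewerPageToIndex(6);

// In the Acrobat JavaScript Console this could be used as:
//   this.extractPages({nStart: viewerPageToIndex(6)});
```

Keeping the conversion in one place is a design choice rather than a requirement; it simply makes it harder to mix up the two numbering schemes in a longer script.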

Notice also that Acrobat created a temporary document to hold the extracted page. This is because the path input, cPath, was not
specified. Look back in the Console Window (Figure 1). The return value from running the code printed out the text [object Doc]. If
you are using Acrobat 7 or earlier, the output will be slightly different. For Acrobat 7, the output will be [object Global].


Figure 1 – Document object returned from extractPages() function.

The extractPages() function returns a pointer to the newly created document object with the extracted pages. If this code was part
of a larger script, then the document pointer would be critical for actually doing something with the extracted pages. We’ll get to this
in a later example.

Delete the temporary PDF. Note: be sure to do this for every example that creates a temporary PDF so you don’t get mixed up
about which document you are working on.

Let’s do this again, using a simple path argument:


this.extractPages({nStart:5, cPath: "TestExtract1.pdf"});

This time, the extractPages() function returns null, and no temporary PDF is created. Look in the folder where you saved the
example file. There will be a new file in that folder named “TestExtract1.pdf.” Acrobat saved the extracted page, so there was no
need to return a document pointer.

Before we move to the next example, it’s worthwhile to point out the notation used to pass the arguments into the function. This
“Object Style” notation is an Acrobat DOM feature, not a core JavaScript feature. It only works on functions that are part of the
Acrobat JavaScript Model. It’s useful because it eliminates having to specify the other optional arguments, but it’s not necessary.
The first example could have been run like this:

this.extractPages(5);

Or the second example like this:

this.extractPages(5, 5, "TestExtract1.pdf");

Which leads into the next example, using the nEnd input. Using nEnd by itself extracts all pages from the beginning of the
document to the page value specified by nEnd. Run this code in the Console Window:

this.extractPages({nEnd:5});

This code extracts pages one through six. It is exactly the same as running this code:

this.extractPages(0,5);

To extract the pages from page index 5 (page six in the viewer) to the end of the document, use this code:

this.extractPages(5, this.numPages-1 );

where this.numPages is a document property that returns the number of pages in the document. So, (this.numPages-1) is the
page number for the last page in the file.
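
The same arithmetic extends to extracting a fixed number of pages counted back from the end of a document whose length changes. The following is a minimal sketch; lastPagesRange is a hypothetical helper name, not an Acrobat function, and it only computes the nStart and nEnd values.

```javascript
// Hypothetical helper (not an Acrobat API): compute the zero-based
// nStart/nEnd values that cover the last nCount pages of a document.
function lastPagesRange(nNumPages, nCount) {
    var nEnd = nNumPages - 1;                      // index of the last page
    var nStart = Math.max(0, nNumPages - nCount);  // never go below page 0
    return { nStart: nStart, nEnd: nEnd };
}

// For a 45-page document, the last 20 pages are indexes 25 through 44.
var oRange = lastPagesRange(45, 20);

// In the Acrobat JavaScript Console this could be used as:
//   var r = lastPagesRange(this.numPages, 20);
//   this.extractPages(r.nStart, r.nEnd);
```

If the document has fewer pages than requested, the sketch falls back to extracting the whole document instead of passing a negative page number.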

Creating a cut-and-paste automation script


Now we’re ready to create the script to split all the 1040s and e-mail them to the right people. Let’s start with breaking out the
individual forms for the employees.

Each 1040 form has four pages. Forms were simpler in 1864 (although the tax calculations were still incomprehensible), no
schedules or related forms, so we can write a loop to both extract the pages and e-mail the documents.

for(var i=0; i<this.numPages; i+=4) {
    var oNewDoc = this.extractPages({nStart: i, nEnd: i + 3});
    oNewDoc.mailDoc( … );
    oNewDoc.closeDoc(true);
}

This script walks through the document extracting four-page blocks. The extractPages() function returns a pointer to the newly
created object, which is then used to e-mail the document, and finally to close it before moving on to the next extraction. You can
look up the mailDoc() and closeDoc() functions in the Acrobat JavaScript Reference.
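
To see which page ranges a loop like this visits, the block arithmetic can be separated from the Acrobat calls. The sketch below is an illustration only: blockRanges is a hypothetical helper, and the clamping of the final block is an addition that matters only when the page count is not an exact multiple of the block size.

```javascript
// Hypothetical helper (not an Acrobat API): list the zero-based page ranges
// for consecutive blocks of nBlockSize pages.
function blockRanges(nNumPages, nBlockSize) {
    if (nBlockSize < 1) {
        throw new Error("Block size must be at least 1");
    }
    var aRanges = [];
    for (var i = 0; i < nNumPages; i += nBlockSize) {
        aRanges.push({
            nStart: i,
            // Clamp so a short final block does not run past the last page
            nEnd: Math.min(i + nBlockSize - 1, nNumPages - 1)
        });
    }
    return aRanges;
}

// A 10-page document split into four-page blocks gives 0-3, 4-7 and 8-9.
var aBlocks = blockRanges(10, 4);

// In the Acrobat JavaScript Console the ranges could drive the extraction:
//   var a = blockRanges(this.numPages, 4);
//   for (var k = 0; k < a.length; k++) { this.extractPages(a[k]); }
```

Changing the second argument from 4 to 2 would produce two-page blocks instead.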

One thing is missing from this script: Where do the e-mail addresses come from? For simplicity, we’ll modify the code to use a list of
names and e-mail addresses.

var aEmailList = ["[email protected]","[email protected]","[email protected]"];


for(var i=0,j=0; i<this.numPages; i+=4,j++) {
    var oNewDoc = this.extractPages({nStart: i, nEnd: i + 3});
    // Build file name and path for new file
    var cFlName = aEmailList[j].split("@").shift() + "_1040.pdf";
    var cPath = oNewDoc.path.replace(oNewDoc.documentFileName,cFlName);
    oNewDoc.saveAs(cPath);
    oNewDoc.mailDoc(false, aEmailList[j]);
    oNewDoc.closeDoc(true);
}

A second variable is added to the for statement for walking through the array of e-mails, and a saveAs command is included. Copy
and paste the above code into the Console Window. Make sure to select all lines in the script before running it, so all the code is
executed at the same time. Acrobat will go out to lunch for a short time. When it returns, you should have three new e-mails in your
out folder, each with a PDF attachment.

Unfortunately, the name of the temporary file created by extracting the pages is a bit cryptic, and it ends with “.tmp” instead of “.pdf.”
Files should have sensible names so it’s easier to tell a bit about the contents from the name. But we have a potentially bigger
problem because of the “.tmp” extension. It’s possible an e-mail server will block an attachment with this extension. The code for
creating a new file name and the doc.saveAs() function were added to the script to fix these issues. It saves the temporary file to a
name derived from the e-mail address. For example, the first set of extracted pages will be saved to “HBabner_1040.pdf.” The file is
saved to a temporary file folder, so it can be cleaned up easily later.
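
The file name logic is plain string handling, so it can be tried outside of the extraction loop. Here is a small sketch of the two steps; the function names, the address jdoe@example.com and the temporary path are all placeholders invented for this illustration, not values from the example file.

```javascript
// Hypothetical helper (not an Acrobat API): derive an attachment file name
// from the part of an e-mail address before the "@".
function fileNameFromEmail(cEmail, cSuffix) {
    var cUser = cEmail.split("@").shift();
    return cUser + cSuffix;
}

// Hypothetical helper: swap the file name at the end of a path, in the same
// way the script uses the doc.path and doc.documentFileName properties.
function swapFileName(cPath, cOldName, cNewName) {
    return cPath.replace(cOldName, cNewName);
}

// Placeholder address and temporary path, for illustration only.
var cFlName = fileNameFromEmail("jdoe@example.com", "_1040.pdf");
var cNewPath = swapFileName("/C/Temp/Acr1234.tmp", "Acr1234.tmp", cFlName);
```

With these placeholder values, cFlName is "jdoe_1040.pdf" and cNewPath is "/C/Temp/jdoe_1040.pdf".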

This is a pretty simple script that can make our job a lot easier. But, what if the individual 1040s varied in page length, or the
document was so huge it wasn’t practical to set up the e-mail addresses to match the extraction order? How do we make a more
flexible automation script?

All these issues can be handled with Acrobat JavaScript. For example, we could use the doc.getPageNthWord() function to
both find the page ranges and extract the employee’s name. This information could then be used to look up the e-mails on a local
list, or even the company’s server. But, that is a much more complex script, so it will have to wait for another day.
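
To give a rough idea of the approach, here is a minimal sketch, not the full script. It assumes that the first page of each form carries a known keyword at a fixed word position, which may not hold for a real report. The function names findSectionStarts and startsToRanges are hypothetical; getPageNumWords() and getPageNthWord() are the Acrobat document functions.

```javascript
// Sketch only: find the first page of each section by checking one word
// position on every page. oDoc is an Acrobat Doc object (or any object that
// provides numPages, getPageNumWords() and getPageNthWord()).
function findSectionStarts(oDoc, nWord, cKeyWord) {
    var aStarts = [];
    for (var p = 0; p < oDoc.numPages; p++) {
        // Skip pages that do not have enough words
        if (oDoc.getPageNumWords(p) <= nWord) continue;
        if (oDoc.getPageNthWord(p, nWord, true) == cKeyWord) {
            aStarts.push(p);
        }
    }
    return aStarts;
}

// Turn the section starts into nStart/nEnd pairs. Each section runs up to
// the page before the next start; the last one runs to the final page.
function startsToRanges(aStarts, nNumPages) {
    var aRanges = [];
    for (var i = 0; i < aStarts.length; i++) {
        var nEnd = (i + 1 < aStarts.length) ? aStarts[i + 1] - 1 : nNumPages - 1;
        aRanges.push({ nStart: aStarts[i], nEnd: nEnd });
    }
    return aRanges;
}

// In the Acrobat JavaScript Console (word position and keyword are assumptions):
//   var aStarts = findSectionStarts(this, 0, "Form");
//   var aRanges = startsToRanges(aStarts, this.numPages);
```

Each resulting range could then be passed to extractPages(), which also handles forms of varying length.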

Using the example scripts


In this article, we ran the example code by copying and pasting the scripts into the JavaScript Console Window. In fact, for doing
simple automation tasks, it’s a good idea to place all your favorite scripts into a plain-text document from which you can copy and
paste.

To extract pages from a group of files, you would use a Batch Sequence. Batch Sequences are a privileged context, so all the
example code can be copied directly into a Batch Sequence.

A more interesting and useful way to run an automation script is with an Acrobat toolbar button or menu item. However, using one of
these options requires that the code be enclosed in a trusted function. Code for creating toolbar buttons and trusted functions can
be found in this article, Applying PDF security with Acrobat JavaScript.
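
For orientation only, a trusted function wrapper around extractPages() might look roughly like the sketch below. It relies on app.trustedFunction(), app.beginPriv() and app.endPriv() from the Acrobat JavaScript Reference. The name makeTrustedExtract, the choice to pass the app object in as a parameter, and the output path are assumptions made for this illustration; the referenced article remains the place to look for complete, tested code.

```javascript
// Sketch only: wrap extractPages() in a trusted function so it can be called
// from a toolbar button or menu item. In Acrobat this would live in a
// folder-level script, with the built-in app object passed in as oApp.
function makeTrustedExtract(oApp) {
    return oApp.trustedFunction(function (oDoc, nStart, nEnd, cPath) {
        oApp.beginPriv();   // raise privilege so the cPath input is allowed
        var oResult = oDoc.extractPages({ nStart: nStart, nEnd: nEnd, cPath: cPath });
        oApp.endPriv();     // drop privilege again
        return oResult;
    });
}

// In a folder-level script:
//   var trustedExtract = makeTrustedExtract(app);
// and from a toolbar button or menu item (the path is a placeholder):
//   trustedExtract(this, 0, 3, "/C/Temp/Extract.pdf");
```

Passing the app object in as a parameter is only there so the wrapper can be exercised outside Acrobat; in a real folder-level script the more common form calls app.trustedFunction() directly.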

For more information on functions used in this article, see the Acrobat JavaScript Reference and the Acrobat JavaScript Guide.

https://www.adobe.com/devnet/acrobat.html

Click on the Documentation tab and scroll down to the JavaScript section.


19 comments
Comments for this tutorial are now closed.

Lori Kassuba, 2015-03-19

Hi Linda Haworth,
Can you post your question here so some of our other experts can assist you (be sure to select the
JavaScript category):
https://answers.acrobatusers.com/AskQuestion.aspx
Thanks,
Lori

Linda Haworth, 2015-03-17

I have a form that will have a changing number of pages based on user input. I want to extract and email
the last 20 pages. Is there a way to extract counting backwards, so that whether my form has a total of
25 pages one day or 45 the next, I always extract the last 20?

Thom Parker, 2014-10-27

Hello Jean, These are both interesting questions, but not related to the article topic. Here is a link to an
article on setting e-mail addresses, subjects, and such:
https://acrobatusers.com/tutorials/dynamically-setting-submit-e-mail-address
There is in fact a JavaScript command for importing data from a CSV file. You’ll find an article on the
topic at this membership site:

http://www.pdfscripting.com/public/ExcelAndAcrobat.cfm

And here is an article on another technique for acquiring CSV data:

https://acrobatusers.com/tutorials/getting-external-data-into-acrobat-x-javascript

None of these are simple, and all require some knowledge of programming.

Jean, 2014-10-25

Is there a sample script to read the email list aEmailList from a csv file?

Jean, 2014-10-25

If I wanted to add email subject and message to the script, what do I add in oNewDoc.mailDoc(false,
aEmailList[j]);?

Lori Kassuba, 2014-05-06

Hi Bob Hurt,

Please see this discussion on extracting metadata:

http://answers.acrobatusers.com/Is-extract-metadata-PDF-file-write-file-association-PDF-q29727.aspx
Thanks,
Lori

Bob Hurt, 2014-04-28

How do I extract the PDF document description and author? I want to display that in a web page beside
a list of PDF file names.

Ed, 2013-10-24

I’m trying to extract pages. When I run the following from the Adobe console the first extract works but
the second is not processed. Can anyone help with this? Thank you.
// Extract pages1
extractPages({nStart: 103, nEnd: 104, cPath: "file1.pdf"});

// Extract pages2
extractPages({nStart: 105, nEnd: 106, cPath: "file2.pdf"});

Ed, 2013-10-23

I’m trying to extract separate files. I want to use very basic commands. How do I separate the
arguments? When I run this script it only processes the last “extractPages”. How do I separate these
arguments so both will be processed? Thank you.

this.extractPages(29, 31, "Coyotes 10-31-13 210 B 10-12.pdf");

this.extractPages(32, 35, "Coyotes 10-31-13 211 B 5-8.pdf");

Thom Parker, 2013-10-17

Milton, Read these two articles to learn about manipulating file paths in Acrobat JavaScript:
https://acrobatusers.com/tutorials/file-paths-acrobat-javascript
https://acrobatusers.com/tutorials/splitting-and-rebuilding-strings
A script in Acrobat cannot create new folders, for security reasons. So your target folder must already
exist.

Cori, 2013-10-14

I admit, I have not been on acrobatusers.com in a long time however it was another joy to see It is such
an important topic and ignored by so many, even professionals. I thank you to help making people more
aware of possible issues.

Milton Fosneca, 2013-09-29

I’m looking to extract the “selected” pages and then do a “save as” to a specific folder. The name of the
file has to be “‘date’.pdf” on a specific path. Can someone help me with this?

Thom Parker, 2012-10-23

Dennis,
Saving the extracted page in Acrobat X requires privilege. It sounds like you need to place your script
into a trusted function.

Dennis, 2012-10-19


Has anyone had issues moving to Adobe Pro X? I had a script which extracted a cover page and saved
the file with the same filename in a different folder. Now the script creates a temp file instead.

Thom Parker, 2012-10-15

Patrick, the extraction loops in the article extract in 4-page blocks. If you want 2-page blocks, then all
you have to do is change the page increment to “2” instead of “4” (“i” is used as the page increment
in all the loops) and end each block at “i + 1” instead of “i + 3”.
And since you are only saving the file, you can include the cPath parameter in the “extractPages”
function.

patrick ball, 2012-10-11

I’m trying to do this exact extraction, except two pages at a time: basically going through a doc every
2 pages and splitting a large PDF so that every 2 pages become a new file. What small addition in the script
changes this?

Thom Parker, 2012-09-13

Nathan,
Yes, this is possible. The current page is in the “this.pageNum” document property, so you would
set the nStart and nEnd variables like this:

nStart = this.pageNum-2;
nEnd = this.pageNum+2;

Although you also need to add code to check for and correct values that overrun the first and last pages.
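
One possible way to do that check is to clamp both values to the valid page range. This is an illustrative sketch added for this point in the thread, not code from the original reply; contextRange is a hypothetical helper name.

```javascript
// Hypothetical helper (not an Acrobat API): pages around the current page,
// clamped to the first and last pages of the document.
function contextRange(nPageNum, nNumPages, nContext) {
    return {
        nStart: Math.max(0, nPageNum - nContext),
        nEnd: Math.min(nNumPages - 1, nPageNum + nContext)
    };
}

// On the first page of a 10-page document the range starts at page 0.
var oContext = contextRange(0, 10, 2);

// In Acrobat this could be used as:
//   this.extractPages(contextRange(this.pageNum, this.numPages, 2));
```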

Artem Burmakin, 2012-09-13

@Nathan Gardner
This should not be too difficult to do, at least if I understood your request correctly.

All you need to know is the current page number.


this.pageNum - does this.
Then simply use it to extract pages:
this.extractPages(this.pageNum,this.pageNum+1);

the above will extract the current page and the next one.
Is it answering your question?

Nathan Gardner, 2012-09-12

I am looking to use this script to set up a function to extract 2 pages before and after the displayed page
of a large document. We want to do this to provide context to a particular search result for document
review.
Is this possible?

Any help would be much appreciated.

Artem Burmakin, 2012-08-16

Thank you Thom. You are right, it is not possible to find the problem remotely. In any case thank you for the help;
the article above was really useful, and so was your comment.

After struggling a bit with the code I finally created something that works and does what I need it to do.
If you do not mind I would like to share it here; maybe someone will find it useful.
So the task was: I have a big report in PDF with employee salaries and other payments. Employees are
grouped by country, and the country name, in a format like this - Country: Austria - is stated in the
same place, but not on every page. What I need to do is to split this big file into smaller reports by
country (all in all there are about 60 countries in the report of 650 pages).
Here is the code that does this for me (I start this code from the Console):
for (var p = this.numPages - 1; p >= 0; p--)
{
    var numWords = this.getPageNumWords(p);
    {
        var ckWord = this.getPageNthWord(p, 60, true);
        if (ckWord == "Country")
        {
            console.println(p);
            //CHANGE THE File name IN THE NEXT LINE
            this.extractPages(p, this.numPages-1, "07-Mnthly_Comp_perAsgne_by_PLS - " + this.getPageNthWord(p, 61, true) + ".pdf");
            this.deletePages(p, this.numPages-1);
        }
    }
}

Hope this will help someone.


Thanks again Thom, I could not do this without your help.

Thom Parker, 2012-08-08


Artem, The likely problem is that the words are not being found. However, just as a general rule, scripts
of any complexity are going to have issues that require debugging and extra code for testing your
values. In this case extra code needs to be added to ensure extraction only takes place when the words
are found. There may also be other issues. I don’t know, I don’t have your test documents and I have
not analyzed or debugged this code. I just wrote it off the top of my head. If you are having issues then
you need to learn about debugging or hire a programmer. I would suggest reading this article and then
asking questions on the regular forum.
Why Doesn’t my Script Work: https://acrobatusers.com/tutorials/why-doesnt-my-script-work

Artem Burmakin, 2012-08-08

I really appreciate your support, but still can’t make it work.


I am a dummy in coding, so if you could help a little bit more it would be really great.
So, I take the code you gave and I add “extract” function, but it says:
TypeError: Invalid argument type.
Doc.extractPages:26:Batch undefined:Exec
===> Parameter nStart.

What am I doing wrong?


Here is the code:
/* Test111 */
var cKeyWord1 = "Austria";
var cKeyWord2 = "Canada";
var bFound1 = false;
var nPage1 = -1;
var nPage2 = -1;
for(var nPg=0;nPg<this.numPages;nPg++)
{
    if(!bFound1)
    {
        if(this.getPageNthWord(61) == cKeyWord1)
        {
            nPage1 = nPg;
            bFound1 = true;
        }
    }
    else
    {
        if(this.getPageNthWord(61) == cKeyWord2)
        {
            nPage2 = nPg;
            break;
        }
    }
    this.extractPages({nStart: nPage1,nEnd: nPage2, cPath: "TestExtract1.pdf"});
}

thanks a million in advance

Thom Parker, 2012-08-06

The best way to find your words is to use loop to search for the words on all pages. Use a state variable
to control which word is being looked for.

var cKeyWord1 = "Key1";
var cKeyWord2 = "Key2";
var bFound1 = false;
var nPage1 = -1;
var nPage2 = -1;
for(var nPg=0;nPg<this.numPages;nPg++)
{
    if(!bFound1)
    {
        // getPageNthWord() takes the page number first, then the word index
        if(this.getPageNthWord(nPg, 65) == cKeyWord1)
        {
            nPage1 = nPg;
            bFound1 = true;
        }
    }
    else
    {
        if(this.getPageNthWord(nPg, 65) == cKeyWord2)
        {
            nPage2 = nPg;
            break;
        }
    }
}
And that is how you find the page range.

Artem Burmakin, 2012-08-03

Could you please advise how to split a PDF based on content with doc.getPageNthWord()? I have broken
my head trying to find the solution.
I have a PDF which I want to split by keywords that are always in the same place (say word 65), but not
present on every page. So I need to define the range between two keywords and extract the pages in
between.

Thank you in advance,
