6 Important Factors When Choosing A PDF Library: Adam Pez
You plan to embed PDF functionality into an application. But before you dive into the project, you must decide: do you go with a more expensive commercial PDF SDK, or a lower-cost alternative such as an open-source library or an open-source wrapper?

There are non-trivial costs to switching later. Developers have to re-learn the new library, re-adjust the backend, customize the UI to match what users are accustomed to, as well as migrate documents, form data, annotations, and more.

This dissatisfaction implies that picking the right PDF SDK is a lot harder than it seems. And to help you avoid the same mistakes as past implementations, we’ve written this article.

(We also recently surveyed 57 unique organizations that switched from PDF.js to a commercial SDK. Read our comprehensive guide to PDF.js to learn more.)
Restrictions on Features and Platforms
A first mistake organizations make when selecting a PDF library for the first time is to assume fixed feature requirements. But these are likely to evolve.

Users start to ask for more functionality as they grow dependent upon a PDF SDK. An organization will then have to consider saying no to user feature requests; building time-intensive and challenging customizations on top; or integrating additional libraries and thus adding more complexity, maintenance overhead, and risk.

Additionally, a library may work fine initially on the main platforms preferred by your users. But if you later wish to expand, the library may not support the platforms you want, or its APIs may be inconsistent, with different classes and methods across platforms, so that your engineers have to start from scratch.

To avoid this hidden cost, go with an SDK with a broad feature set across multiple platforms, giving you the flexibility to grow down the road.
Unanticipated Difficulty Adding Features
Another mistake is where organizations select a basic library to save money with the assumption that they can build anything needed on top.

But building in an unfamiliar domain can easily lead to unknown challenges, high expenses, and reduced speed to market. And PDF is unusually complex, as high-profile teams attest, including those of Slack, Dropbox, and LinkedIn.

Your devs are not necessarily PDF experts, and attempting challenging PDF features in-house involves a steep learning curve not subject to economies of scale. Throwing more devs into the equation will not shrink the ramp-up time for the first developer.

Additionally, custom features will have to be supported and maintained long-term, creating an additional ongoing opportunity cost: committed resources will be less available to work on other projects.
— Developer, Slack
Organizations that we’ve spoken to have found the most challenging features are those that require engaging PDF at a low level, where objects are defined in PDF byte code with unique byte offsets for different objects, making it difficult for devs unfamiliar with PDF’s inner workings to parse and manage these objects correctly.

Challenging PDF functionality includes:

• Managing PDF annotations from multiple users (e.g., synchronization and versioning)
• PDF generation (creating PDFs from scratch or from other documents)
• Page manipulation (add, merge, or remove)
• Layers (via Optional Content Groups)
• Color management features (e.g., ink-color separations, overprint, etc.)

While it is certainly possible to build the above in-house, PDF features can consume a shocking amount of time. And you eventually may have to decide whether to continue, or whether to bite the bullet and abandon months or years of work for an alternative that can meet your requirements cost-effectively.

To avoid this type of hidden cost, you will want to carefully consider the capacity of your existing development team should you decide to build, maintain, and support custom PDF features in-house, as these features often prove time-intensive and challenging.
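To give a sense of what working with byte offsets involves, here is a minimal Python sketch that assembles a blank one-page PDF by hand. It is an illustration only, not how any particular SDK works: the function name build_minimal_pdf is hypothetical, and the file it produces is deliberately bare, so stricter readers may reject it. The point is that the cross-reference table near the end of the file must record the exact byte position of every object.

```python
from typing import List, Tuple


def build_minimal_pdf() -> Tuple[bytes, List[int]]:
    """Assemble a blank one-page PDF by hand; return it with its object offsets."""
    # Hypothetical, deliberately bare object bodies chosen for illustration.
    bodies = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets: List[int] = []
    for number, body in enumerate(bodies, start=1):
        # Record the byte position at which "N 0 obj" begins.
        offsets.append(len(out))
        out += b"%d 0 obj\n" % number
        out += body + b"\nendobj\n"

    # The cross-reference table lists one fixed-width, 20-byte entry per object.
    xref_start = len(out)
    size = len(bodies) + 1  # object 0 is the head of the free list
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # The trailer names the root object and points back at the table.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n" % xref_start
    out += b"%%EOF\n"
    return bytes(out), offsets


if __name__ == "__main__":
    pdf, offsets = build_minimal_pdf()
    for number, offset in enumerate(offsets, start=1):
        # Show each object number, its offset, and the bytes found there.
        print(number, offset, pdf[offset:offset + 7])
```

Adding even one byte to the first object means every later offset, and the startxref value, has to be recomputed. Real-world files layer compressed streams, incremental updates, and encryption on top of this basic structure.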
“...you shouldn’t build anything that’s available off the shelf because it’s not a source of competitive advantage if everybody else can avail themselves of it. The only scenario where you should build is if it’s your core technology, the core source of your competitive differentiation and competitive advantage.”
— Mark Holst-Knudsen, President ThomasNet @
MIT’s 2014 CIO Symposium
Poor UX: Slow Performance, Crashing, and Inaccurate Rendering

Another source of hidden costs can be a poor user experience, especially as users start to upload more massive and complex documents that crash or freeze a lower-quality viewer. Construction Computer Software encountered these issues with a free PDF viewer add-on to its flagship estimation software.

As is often the case with a lower-quality library, PDFs render incorrectly. You then have to wait on the vendor to respond. But a reseller or a smaller company with many remote developers may have difficulty providing the same turnaround time and specialized support and service as a commercial SDK. If they did not build the rendering engine themselves, they may not be able to fix the issue, or fixes may take a long time, because they have difficulty finding in the code where the problem originated. If you go with open-source, you may have to fix bugs yourself.

A lower-quality library also encounters performance and memory issues, such as large documents with frustratingly long wait times for your users as well as complex documents that crash the viewer. This is often due to the absence of features such as PDF tiling, parallelization, and linearization that a more mature PDF SDK will incorporate.

Some solutions (e.g., image servers) perform excellently when tested on a small number of documents and users but then inflict unexpected hidden costs when scaled up. When hundreds or thousands of users later view, mark up, comment on, and otherwise interact with (i.e., scroll, pan, and zoom) documents, server resource and network data usage explodes. To maintain your desired UX, you have to pay higher fees or invest in more servers.

The following types of documents have much more demanding rendering requirements:

• CAD-based PDFs such as construction and engineering drawings with very large and complex designs.
• Reports, textbooks, and marketing material using advanced PDF graphics such as shadings, gradients, soft masks, and patterns.
• Geospatial maps with OCG layers that are switched off by default.
• Pre-press documents which require an SDK with advanced color management features to print colors accurately.
• High-speed accurate rendering (especially on native mobile apps and mobile browsers).
• Content extraction of tables, text, etc. with document structure (e.g., text read order or table arrangement) intact.

To prevent crashes, slowness, and rendering issues from disrupting your UX, test functionality with the types of documents your users will work with. Also test a server-based solution at the anticipated load and usage.
“If you’re looking for a PDF reader for the first time, you better make sure it can read 100% of your PDF files. Because if your client-base starts relying on that PDF reader, exactly what happened to us, they still want the absolute best quality.”
— Tony Cornwall, Construction Computer Software
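When gathering test documents, it can help to know which ones are linearized, since linearization is one of the features named above. The Python sketch below is a rough heuristic, not a validator: it relies on the PDF specification's requirement that a linearized file carry its linearization dictionary within the first 1024 bytes, and simply looks for the /Linearized key there. The function names are hypothetical.

```python
def looks_linearized(head: bytes) -> bool:
    """Return True if the /Linearized key appears in the first 1024 bytes."""
    # Heuristic only: this does not parse the dictionary or check its values.
    return b"/Linearized" in head[:1024]


def file_looks_linearized(path: str) -> bool:
    """Apply the same heuristic to a file on disk, reading only its head."""
    with open(path, "rb") as handle:
        return looks_linearized(handle.read(1024))
```

A file that was linearized and later saved incrementally can still carry the key without loading quickly over the network, so treat a positive result as a hint and confirm with the viewer or SDK you are evaluating.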
Low Adoption on a Complex UI
In 2018, AEC-software company PlanGrid partnered with FMI to survey nearly 600 construction leaders from around the world to discern why construction and engineering software succeeds or fails. The findings report “Construction Disconnected” identified a complex UI and inadequate user training as two of the top five reasons why technology fails.

Being able to slim down the interface and tailor feature sets to specific user groups is proven to significantly cut down training costs and improve user adoption. (See our OEC Graphics success story to learn more.)

However, a closed-source UI will limit you in what you can customize, and you may not be able to fully fix the UX. (And by the time you’ve discovered this, it may be too late.) A closed-source UI will make it difficult to evaluate how deeply you can customize, optimize, and add new tools or annotation types to the UI. Therefore, your team may build out a proof of concept and make their plans for future expansion, only to have to scale back their ambitions or wait on the vendor to adjust the API. A black box UI will prove especially problematic if your UI team is very strict or if you have unique UI requirements (e.g., accessibility compliance requirements such as ADA/508).

To avoid this hidden cost, choose a vendor with an open-source UI or make certain your proof of concept won’t need to change.
Security Issues

When writing PDF features from scratch, developers may be tempted to take shortcuts to save time. But these shortcuts cause the solution to become obsolete quickly as devs run into the exact security issues a more mature toolkit makes a lot easier to solve.

One recent instance our solution engineers have noted is where developers use JavaScript-based submit buttons on forms rather than uploading and parsing data out of forms, which opens up the system to phishing and man-in-the-middle attacks. Someone could easily edit the button to have it send personal information to another server, and then maliciously re-circulate the form within your organization or send it to end users.

The Bottom Line

The best way to avoid hidden costs associated with the wrong PDF library is to perform due diligence during your evaluation. To assist you in this process, we’ve written a blog post with several considerations you can add to your PDF SDK evaluation checklist.

We hope this article was helpful! If you have any questions, don’t hesitate to contact us.
Vendor Lock-in