
Book Scanning

about book scanning

Uploaded by bezveze111
Copyright: © All Rights Reserved

Book scanning

Book scanning (or magazine scanning) is the process of converting physical books and magazines into digital media such as images, electronic text, or electronic books (e-books) by using an image scanner.
Digital books can be easily distributed, reproduced, and read on-screen. Common file formats are DjVu, Portable Document Format (PDF), and Tagged Image File Format (TIFF). To convert the raw images, optical character recognition (OCR) is used to turn book pages into a digital text format such as ASCII or a similar format. This reduces the file size and allows the text to be reformatted, searched, or processed by other applications.

Image scanners may be manual or automated. In an ordinary commercial image scanner, the book is placed on a flat glass plate (or platen), and a light and optical array moves across the book underneath the glass. In manual book scanners, the glass plate extends to the edge of the scanner, making it easier to line up the book's spine. Other book scanners place the book face up in a V-shaped frame and photograph the pages from above. Pages may be turned by hand or by automated paper transport devices. Glass or plastic sheets are usually pressed against the page to flatten it.

After scanning, software adjusts the document images by aligning, cropping, and editing them, and then converts them to text and the final e-book form. Human proofreaders usually check the output for errors.

Scanning at 118 dots/centimeter (300 dpi) is adequate for conversion to digital text output. Much higher resolutions are used for archival reproduction of rare, elaborate, or illustrated books. High-end scanners capable of thousands of pages per hour can cost thousands of dollars. By contrast, do-it-yourself (DIY) manual book scanners capable of 1200 pages per hour have been built for US$300.[1]

Internet Archive Scribe book scanner in 2011

Sketch of a typical manual book scanner (labeled parts: digital camera, light source, book, white background, table)

1 Commercial book scanners

Internet Archive book scanner

Commercial book scanners are not like normal scanners. They usually consist of a high-quality digital camera with light sources on either side. These are mounted on a frame that gives a person or machine easy access to flip the pages of the book. Some models use V-shaped book cradles, which support the book's spine and also center the book's position automatically. The advantage of this type of scanner is that it is very fast compared to overhead scanners.

Sketch of a V-shaped book scanner from Atiz

2 Book scanning by organizations on a large scale

Projects like Project Gutenberg, Million Book Project, Google Books, and the Open Content Alliance scan books on a large scale.

One of the main challenges is the sheer volume of books that must be scanned. In 2010 the total number of works appearing as books in human history was estimated to be around 130 million.[2] All of these must be scanned and then made searchable online for the public to use as a universal library. Large organizations currently rely on three main approaches:
- outsourcing,
- scanning in-house using commercial book scanners, and
- scanning in-house using robotic scanning solutions.

With outsourcing, books are often shipped to low-cost scanning providers in India or China. Alternatively, many organizations choose to scan in-house for reasons of convenience, safety, and improving technology. They use either overhead scanners, which are time-consuming, or digital camera-based scanning solutions, which are substantially faster. The camera-based method is employed by the Internet Archive as well as Google. Traditional methods have included cutting off the book's spine and scanning the pages in a scanner with automatic page-feeding capability, then rebinding the loose pages afterwards.

Once the pages are scanned, the data is entered either manually or via OCR, which is another major cost of book scanning projects.

Due to copyright issues, most scanned books are those that are out of copyright. However, Google Book Search is known to scan books still protected under copyright unless the publisher specifically excludes them.

3 Destructive scanning

For book scanning on a low budget, the least expensive method to scan a book or magazine is to cut off the binding. This converts the book or magazine into a sheaf of loose-leaf papers. These can then be loaded into a standard automatic document feeder (ADF) and scanned using inexpensive and common scanning technology. This is definitely not a desirable solution for very old and uncommon books. It is, however, a useful tool for scanning books and magazines that are not expensive collector's items and whose content is easy to replace.
There are two technical difficulties with this process: first with the cutting, and second with the scanning.

3.1 Unbinding

Meticulous unbinding by hand, assisted with tools, is more precise and less destructive than cutting pages with a paper guillotine, razor, or scissors. This technique has been used successfully for the Riazanov Library digital archive project. Tens of thousands of pages of archival original paper were scanned, from newspapers, magazines, and pamphlets 50 to 100 years old and more, often composed of fragile, brittle paper.

Unbinding destroys the monetary value of such material for some collectors (and for most sellers of this sort of material). In many cases, however, it actually greatly assists preservation of the physical pages themselves. It makes them more accessible to researchers and less likely to be damaged when examined later. The downside is that unbound stacks of pages are fluffed up, and therefore more exposed to oxygen in the air. This may in some cases (theoretically) speed deterioration. It can be addressed by putting weights on the pages after they are unbound and storing them in appropriate containers.

Hand unbinding preserves text that runs into the gutters of bindings. Most critically, it makes it easier to produce complete, high-quality scans of two-page-wide material, such as center cartoons, graphic art, and photos in magazines. The digital archive of The Liberator 1918-1924 on the Marxist Internet Archive demonstrates the quality of two-page-wide graphic art scans made possible by careful hand unbinding prior to flatbed or other scanning.

Unbinding techniques vary with the binding technology. They range from simply removing a few staples, to unbending and removing nails, to meticulously grinding down layers of glue on the spine of a book to precisely the right point. The last is followed by laborious removal of the string used to hold the book together.

Note that some newspapers (such as Labor Action 1950-1952) have columns on the center facing pages that run right in between the pages. Chopping off part of the spine of a bound volume of such papers will lose part of this text. Even the Greenwood Reprint of this publication failed to preserve the text content of those center columns, cutting off significant amounts of text there. The center column content was made digitally available only after bound volumes of the original newspaper were meticulously unbound. The opened pair of center pages was then scanned as a single page on a flatbed scanner. Alternatively, one can present the two facing center pages as three scans: one of each individual page, and one of a page-sized area situated over the center of the two pages.

3.2 Cutting

One method of cutting a stack of 500 to 1000 pages in one pass uses a guillotine paper cutter. This is a large steel table with a paper vise that screws down onto the stack and firmly secures it before cutting. The cut is made with a large sharpened steel blade, which moves straight down and cuts the entire length of each sheet at once. A lever on the blade permits several hundred pounds of force to be applied to the blade for a quick one-pass cut.

A clean cut through a thick stack of paper cannot be made with a traditional inexpensive sickle-shaped hinged paper cutter. These cutters are only intended for a few sheets, with up to ten sheets being the practical cutting limit. A large stack of paper applies torsional forces on the hinge, pulling the blade away from the cutting edge on the table. As the cut moves away from the hinge, it becomes less accurate, and more force is needed to hold the blade against the cutting edge.

The guillotine cutting process dulls the blade over time, requiring that it be resharpened. Coated paper such as slick magazine paper dulls the blade more quickly than plain book paper, due to the kaolinite clay coating. Additionally, removing the binding of an entire hardcover book causes excessive wear due to cutting through the cover's stiff backing material. Instead, the outer cover can be removed so that only the interior pages need to be cut.

3.3 Scanning

Once the paper is liberated from the spine, it can be scanned one sheet at a time using a traditional flatbed scanner or automatic document feeder.

Pages with decorative riffled edging, or pages curving in an arc due to a non-flat binding, can be difficult to scan using an ADF. An ADF is designed to scan pages of uniform shape and size, and variably sized or shaped pages can lead to improper scanning. The riffled or curved edges can be guillotined off to render the outer edges flat and smooth before the binding is cut.

The coated paper of magazines and bound textbooks can make them difficult for the rollers in an ADF to pick up and guide along the paper path. An ADF that uses a series of rollers and channels to flip sheets over may jam or misfeed when fed coated paper. Generally there are fewer problems when the paper path is as straight as possible, with few bends and curves. The clay can also rub off the paper over time and coat sticky pickup rollers, causing them to grip the paper loosely. The ADF rollers may need periodic cleaning to prevent this slipping.

Magazines can pose a bulk-scanning challenge due to small nonuniform sheets of paper in the stack, such as magazine subscription cards and fold-out pages. These need to be removed before the bulk scan begins. They are then either scanned separately if they include worthwhile content, or simply left out of the scan process.

3.4 A test case: PGP

In 1995, Phil Zimmermann published PGP Source Code and Internals as a $60 hardbound book, which under the First Amendment could legally be shipped abroad. The buyer could either display it in a library or destructively scan it. Scanning would allow the source code to be compiled via freely available GNU software into the Pretty Good Privacy (PGP) cryptosystem, which the U.S. government regarded as a restricted munition. Zimmermann was under criminal investigation for distributing PGP software and wanted to test the law in the courts. It was not directly tested, but export restrictions have since eased. It is legal to export PGP anywhere except the seven countries and specified groups and individuals to which nothing can be exported from the U.S.

4 Non-destructive scanning

In recent years, software-driven machines and robots have been developed to scan books without disbinding them. This preserves the contents of the document while creating a digital image archive of its current state. The trend has been driven in part by ever-improving imaging technologies. These allow a high-quality digital archive image to be captured in a reasonably short time, with little or no damage to a rare or fragile book.

Some high-end scanning systems use vacuum, air, and static charges to turn pages while imaging is performed automatically. Imaging is usually done by a high-resolution camera located over an adjustable V-shaped cradle. Images are then shuttled from the imaging device into various editing suites, which can further process them into one of two kinds of output: an archival-quality file such as TIFF or JPEG 2000, or a web-friendly format such as JPEG or PDF.

Google's patent 7508978 describes an infrared camera technology that can detect the three-dimensional shape of the page and automatically adjust for it.[3] Researchers from the University of Tokyo have an experimental non-destructive book scanner[4] that includes a 3D surface scanner, allowing images of a curved page to be straightened in software. Thus the book or magazine can be scanned as quickly as the operator can flip through the pages: about 200 pages per minute.

Turning the pages in between taking scans.

An example of a DIY non-destructive book scanner/digitizer with a book-downwards design, allowing gravity to flatten the pages

5 See also

Digital library
Institutional repository
Optical character recognition
Planetary scanner
Robotic book scanner

6 References

[1] "DIY High-Speed Book Scanner from Trash and Cheap Cameras". instructables.com. Retrieved 19 January 2014.
[2] Taycher, Leonid (2010-08-05). "As of Aug 5, 2010, google estimates that there are 129,864,880 different books in the world". Googleblog.blogspot.co.at. Retrieved 2014-08-08.
[3] "The Secret Of Google's Book Scanning Machine Revealed", by Maureen Clements, April 30, 2009.
[4] Guizzo, Erico (2010-03-17). "Superfast Scanner Lets You Digitize Book By Flipping Pages". IEEE Spectrum, March 17, 2010. Spectrum.ieee.org. Retrieved 2014-08-08.

7 External links

Do It Yourself book scanner device forum
Google Open Source Linear Book Scanner
Wired Article on Amazon Book Scanning
Newsweek article on the future of book scanning and the publishing industry
New York Times article on book scanning and the universal library
College students are now starting to scan expensive textbooks used for only a single class and trading them like song and movie files.
The DIY Book Scanner, Slashdot, December 13, 2009, by Soulskill
DIY Book Scanners Turn Your Books Into Bytes, by Priya Ganapati, Wired, December 11, 2009
Some Important Points to Note before Handing Over Book Scanning Tasks to Somebody Else, by Don Steacy, Articlepool, March 15, 2013

8 Text and image sources, contributors, and licenses

8.1 Text

Book scanning Source: http://en.wikipedia.org/wiki/Book%20scanning?oldid=650693926 Contributors: Zundark, Patrick, Ronz, Dcoetzee, Premeditated Chaos, Alan Liefting, Beland, Piotrus, MFNickster, Pavel Vozenilek, Stbalbach, Rajah, Trevj, 99of9, Kocio, Unixxx, Mindmatrix, Scriberius, BD2412, Bubuka, DMahalko, MosheA, Kvn8907, Mysid, John hartley, GraemeL, Shawnc, Crost, Mdwyer, SmackBot, InverseHypercube, Kevinalewis, Ikiroid, Nbarth, JonHarder, Whpq, Thebt, Itsover, Spdegabrielle, CmdrObot, Hebrides, Electron9, MarshBot, RobotG, Dylan Lake, Rising Suns, Martygoodman, DGG, Gwern, Nietzscheanlie, SagaciousAWB, Nbauman, Joewski, Root2, Nemo bis, Mufka, Sbierwagen, CSumit, Headphonos, Floppydog66, Jeff G., Sarasinb, Cwcrowley, Poppyjuice, Jimthing, Meysi, Gregrebholz, RashersTierney, Saddhiyama, Alexbot, M4gnum0n, PixelBot, Pruek, DumZiBoT, Clarkdavid, Felix Folio Secundus, Addbot, Fgnievinski, Luckas-bot, Cm001, Nataku4ca, AnomieBOT, Piano non troppo, Smk65536, Pesdt, Priyankamathur, Inhibit, Tylerritchie, Downsize43, Lotje, Waters2100, Ripchip Bot, Aboutoccur, Donner60, Spicemix, BG19bot, Co2capt, Pans1978, BattyBot, May day16, WikiU2013, Don077, Oleaster and Anonymous: 76

8.2 Images

File:A_Real_Page-Turner.jpg Source: http://upload.wikimedia.org/wikipedia/commons/e/e7/A_Real_Page-Turner.jpg License: CC BY 2.0 Contributors: http://www.flickr.com/photos/textfiles/6050670124/ Original artist: Jason "Textfiles" Scott
File:Book_scanner.svg Source: http://upload.wikimedia.org/wikipedia/commons/b/bb/Book_scanner.svg License: CC-BY-SA-3.0 Contributors: Self-made in Inkscape; based on a Paintbrush drawing (en:Image:BookScannerSketch.PNG) by en:User:Joewski. Original artist: Oona Räisänen
File:Commons-logo.svg Source: http://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg License: ? Contributors: ? Original artist: ?
File:Internet_Archive_book_scanner_1.jpg Source: http://upload.wikimedia.org/wikipedia/commons/6/65/Internet_Archive_book_scanner_1.jpg License: GFDL Contributors: Own work Original artist: Dvortygirl
File:MyBookScanner,June12,2011.JPG Source: http://upload.wikimedia.org/wikipedia/commons/9/98/MyBookScanner%2CJune12%2C2011.JPG License: CC BY 3.0 Contributors: Transferred from en.wikipedia Original artist: Floppydog66 (talk). Original uploader was Floppydog66 at en.wikipedia
File:Scribe_Book_Scanner.jpg Source: http://upload.wikimedia.org/wikipedia/commons/0/06/Scribe_Book_Scanner.jpg License: CC BY 2.0 Contributors: http://www.flickr.com/photos/textfiles/6050109183/ Original artist: Jason "Textfiles" Scott
File:Text_document_with_red_question_mark.svg Source: http://upload.wikimedia.org/wikipedia/commons/a/a4/Text_document_with_red_question_mark.svg License: Public domain Contributors: Created by bdesham with Inkscape; based upon Text-x-generic.svg from the Tango project. Original artist: Benjamin D. Esham (bdesham)
File:V-shaped-cradle.jpg Source: http://upload.wikimedia.org/wikipedia/commons/1/1f/V-shaped-cradle.jpg License: CC BY 3.0 Contributors: Transferred from en.wikipedia to Commons by User:Wcam using CommonsHelper. Original artist: Pruek at en.wikipedia

8.3 Content license

Creative Commons Attribution-Share Alike 3.0
