Conclusion and Future Work

examples of an input provided to the network, examples of network output, and ground truth target output. These results are impressive for such a small dataset. In particular, the network is able to reject header and footer text extremely reliably. The network rejects most abstracts, figure captions and references, confusing only some where the text formatting is extremely similar to typical paragraph text. The per-pixel classification accuracy on the validation set was 94.32%, compared to a baseline of classifying every pixel as “not paragraph”, which would yield 79.67% accuracy.
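
The comparison between per-pixel accuracy and the all-"not paragraph" baseline can be illustrated with a minimal sketch. The function names and the toy masks are hypothetical and are not taken from the paper's implementation; the masks use 1 for paragraph pixels and 0 for everything else.

```python
def pixel_accuracy(predicted, target):
    """Fraction of pixels whose predicted label equals the target label."""
    total = 0
    correct = 0
    for pred_row, target_row in zip(predicted, target):
        for p, t in zip(pred_row, target_row):
            total += 1
            correct += (p == t)
    return correct / total


def all_negative_baseline(target, negative_label=0):
    """Accuracy obtained by labelling every pixel as "not paragraph"."""
    blank = [[negative_label] * len(row) for row in target]
    return pixel_accuracy(blank, target)


# Toy 4x5 masks (illustrative values only): 1 = paragraph, 0 = not paragraph.
target = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
predicted = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],  # one background pixel wrongly marked as paragraph
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

model_acc = pixel_accuracy(predicted, target)   # 19 of 20 pixels agree
baseline_acc = all_negative_baseline(target)    # 16 of 20 pixels are background
```

Because most pixels on a page are not paragraph text, the baseline is already high, so the useful figure is the margin by which the network exceeds it rather than the raw accuracy alone.
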
6. Conclusion and Future Work
In this paper we demonstrated that deep learning-based image analysis can be used to identify sections of scientific publications. Given the results of our current experiments, we believe that deep learning can be used successfully to enhance current PDF extraction methods. Based on our findings, we plan to continue collecting data in order to further improve our network's results, since we believe many of the misclassified portions of text are due to training data that does not yet sufficiently characterize features such as reference sections and abstracts.

Our current results show that a deep learning network can learn to distinguish body text from the other portions of a PDF document. The next step is to extend the approach to identifying each type of text (title, author, abstract, body text, etc.) rather than simply body text versus other. Additionally, we plan to increase the accuracy of our network by adding more data and to create an extraction tool that leverages the output of the deep learning network to extract text. While we currently evaluate accuracy with a per-pixel comparison of the estimated and redacted images, an improved test of accuracy would be to use such an extraction tool to measure the per-character accuracy of this text extraction approach.
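
One way the proposed per-character evaluation might be computed is sketched below. The paper does not define the metric, so this assumes a common choice: one minus the Levenshtein edit distance between extracted and reference text, divided by the reference length. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(extracted, reference):
    """Assumed metric: 1 - edit_distance / len(reference), clipped at 0."""
    if not reference:
        return 1.0 if not extracted else 0.0
    errors = edit_distance(extracted, reference)
    return max(0.0, 1.0 - errors / len(reference))


# Illustrative strings only: one character is missing from the extraction.
score = character_accuracy("body txt", "body text")
```

Unlike the per-pixel count, a measure of this kind penalizes errors in the text that is actually extracted, which is the quantity a downstream user of the tool would care about.
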
