A common and difficult problem in acquiring data is extracting tables from a PDF. Previously, I described how to extract the text from a PDF with PDF.js, a PDF rendering library made by Mozilla Labs. The rendering process requires an HTML canvas object, onto which it draws each object (character, line, rectangle, etc.). The easiest way to get a list of these is to intercept all the calls PDF.js ma