PDF Document Inspector

A PDF document structure browser.

I wrote PDF Document Inspector while developing my shareware application, Cheap Impostor. PDF Document Inspector is a wrapper around the PJX library, which I used to parse the input documents in Cheap Impostor. I found it invaluable for debugging, and I thought it might be useful to the community at large.

Download PDF Inspector here

What does it do?

PDF Document Inspector is a browser of a PDF document's contents. If you're not doing something pretty down-and-dirty with PDF, I can't imagine how it would be useful to you. On the other hand, if you are working with the innermost bits of a PDF file (like I was with Cheap Impostor), I can't think of any other tool out there that would help you figure out what's going wrong.

How does it work?

Well, you run it, and then open a PDF file. A pretty spartan window will pop up showing you the top-level objects of the file in an outline view. Click on the "disclosure triangles" to open up an item. It will all look really cryptic if you don't already have in-depth knowledge of the PDF document format. Realizing that most people don't have such knowledge, I refer you to Adobe's excellent PDF Reference (it's an 8MB PDF file). Other PDF resources are available through Adobe's Solutions Network.

Here's a screenshot of PDF Inspector in action:

There are a few things to note:

  • It's a spartan display. There's no graphical representation of the PDF file -- just a hierarchical listing of the objects inside the PDF file.
  • The top-level objects are pretty abstract. Generally you'll only see a TrailerDict and a Page list. The interesting stuff turns up once you dig into both of these.
  • There are a number of object types represented.
    • Dictionaries (Dict: in the figure) map names to objects.
    • References (Ref: in the figure) point to other objects. In this example, Page 0 is the name of a reference. The thing it refers to is object 333 version 0 (the PDF Reference calls this second number the generation number), which happens to be a Dictionary.
    • Arrays are, well, arrays. They are usually arrays of references.
    • Streams are blocks of data (often text, such as page drawing commands), which may be encoded in different ways. When you open the disclosure triangle on a stream, PDF Document Inspector unpacks the stream and opens it in your default text editor.
    • ... and so on
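
If it helps to see the object types above in something other than a screenshot, here is a rough Python sketch of the same idea. It is not how PDF Document Inspector or PJX is implemented. The Ref and Stream classes, the outline function, the sample objects, and the output format are all made up for illustration. It only handles the FlateDecode (zlib) stream encoding, and it stops when it meets a reference it has already followed, since PDF objects routinely point back at their parents.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Indirect reference; written like `333 0 R` inside a PDF file."""
    number: int
    generation: int


@dataclass
class Stream:
    """Stream attributes plus the raw (possibly encoded) bytes."""
    attrs: dict
    raw: bytes

    def decoded(self) -> bytes:
        # Assumption: only FlateDecode is handled; real files use other filters too.
        if self.attrs.get("Filter") == "FlateDecode":
            return zlib.decompress(self.raw)
        return self.raw


def outline(name, obj, objects, depth=0, seen=frozenset()):
    """Return indented outline lines for obj, following references via `objects`."""
    pad = "  " * depth
    if isinstance(obj, Ref):
        lines = [f"{pad}{name} Ref: {obj.number} {obj.generation}"]
        if obj in seen:  # already followed on this path, so stop (e.g. Parent links)
            return lines
        return lines + outline("->", objects[obj], objects, depth + 1, seen | {obj})
    if isinstance(obj, dict):
        lines = [f"{pad}{name} Dict:"]
        for key, value in obj.items():
            lines += outline(key, value, objects, depth + 1, seen)
        return lines
    if isinstance(obj, list):
        lines = [f"{pad}{name} Array:"]
        for i, item in enumerate(obj):
            lines += outline(f"[{i}]", item, objects, depth + 1, seen)
        return lines
    if isinstance(obj, Stream):
        return [f"{pad}{name} Stream: {len(obj.decoded())} bytes decoded"]
    return [f"{pad}{name} {obj!r}"]


# Hypothetical miniature document: one page, its parent, and its content stream.
content = b"BT /F1 12 Tf (Hello) Tj ET"
objects = {
    Ref(333, 0): {"Type": "Page", "Parent": Ref(1, 0), "Contents": Ref(334, 0)},
    Ref(334, 0): Stream({"Filter": "FlateDecode"}, zlib.compress(content)),
    Ref(1, 0): {"Type": "Pages", "Kids": [Ref(333, 0)]},
}

lines = outline("Page 0", Ref(333, 0), objects)
print("\n".join(lines))
```

Running it prints an indented listing that starts at the Page 0 reference, descends into the dictionary it points to, and reports the unpacked size of the content stream, which is roughly what opening disclosure triangles does.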

How much does it cost?

It's free!

Hey, PJX is GPL'd, where's my source code?

download source

dylan at aracnet dot com