Archive for the ‘PDF’ Category

The PHP routines to fill in PDF forms work great, and now my partner wants to use them too. Changing the forms to make the physician name a fill-in field rather than fixed text is easy, but what to do about the signature? It's not like a check; a pixelated image would be fine (I'm not worried about someone forging a preschool physical exam note). But it's not that easy to insert an image into an existing PDF file. I don't want to parse the entire PDF and rewrite it.

Fortunately, images are stored as individual objects in the PDF file, so if there is an image I can identify, I can easily replace it with one of my own choosing. The placement and size of the image on the page is part of the page description, so the new image will be in exactly the same place as the old.
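Here is a minimal PHP sketch of that swap. It is not the code from the post. It makes these assumptions:

- The file is uncompressed (no object streams).
- You have already found the image's object number, for example by searching for /Subtype /Image.
- The new signature is an RGB JPEG.

replaceImageObject is a hypothetical name. The page draws every image into a unit square that its cm operator stretches into place. So the new image lands in the old one's rectangle whatever its pixel dimensions, though it will look distorted if the aspect ratio differs.

```php
<?php
// Hypothetical helper: replace image XObject number $num with a JPEG.
// Assumes an uncompressed PDF (no object streams) in which the old object is
// written as "N 0 obj << ... >> stream ... endstream endobj", and an RGB JPEG.
function replaceImageObject(string $pdf, int $num, string $jpeg, int $width, int $height): string
{
    // Find "N 0 obj" at the start of a line (so object 15 doesn't match 5).
    if (!preg_match("/^$num 0 obj\\b/m", $pdf, $m, PREG_OFFSET_CAPTURE)) {
        throw new RuntimeException("object $num not found");
    }
    $start = $m[0][1];

    // The object ends at the first "endobj" after its "endstream".
    $streamEnd = strpos($pdf, 'endstream', $start);
    $end = $streamEnd === false ? false : strpos($pdf, 'endobj', $streamEnd);
    if ($end === false) {
        throw new RuntimeException("object $num is not a stream object");
    }
    $end += strlen('endobj');

    // /DCTDecode tells the viewer the stream is JPEG data, so the JPEG
    // file's bytes can be embedded unchanged.
    $obj = "$num 0 obj\n"
        . "<< /Type /XObject /Subtype /Image /Width $width /Height $height"
        . " /ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /DCTDecode"
        . " /Length " . strlen($jpeg) . " >>\n"
        . "stream\n" . $jpeg . "\nendstream\nendobj";

    return substr($pdf, 0, $start) . $obj . substr($pdf, $end);
}
```

A call would look something like this, with 12 and signature.jpg as placeholders:

[$w, $h] = getimagesize('signature.jpg');
replaceImageObject($pdf, 12, file_get_contents('signature.jpg'), $w, $h);

Every later byte offset moves, so the xref table then needs the repair that Adobe Reader and PDF Escape perform.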

Continue reading ‘Changing Images in a PDF’ »

I use a lot of forms at work. The more paperless the office gets, the more paper we generate. Every school has its own physical exam form, every government agency has its own application form, every screening test is another form for the parent to fill out. And my handwriting is atrocious. So I try to get PDF copies of everything, then use PDF Escape to add text boxes that I can fill in, and an image of my signature at the bottom. But when filling them out, that still leaves a lot of either typing or cut-and-paste from the EMR (electronic medical record) of the patient's name, birthdate, address etc. There had to be a better way, and one that uses only free tools (I'm not buying Acrobat for $400).

Fortunately, Adobe Reader can run a version of Javascript, and we can use that to help fill in the form.

Every PDF includes a /Catalog object that serves as the root object of the document. Normally it just includes a reference to the page tree, but it can include other things, like Javascript to be executed when the document is opened. The syntax is convoluted. The catalog's /Names dictionary holds a /JavaScript name tree. That tree's /Names array pairs a name string with an action dictionary, and the action dictionary finally holds the code as a string:

1 0 obj
<<
  /Type /Catalog
  /Pages 2 0 R % a standard catalog entry
  /Names << % the Javascript entry
    /JavaScript <<
      /Names [
        (EmbeddedJS) % any name string for the script
        <<
          /S /JavaScript
          /JS (
            app.alert('Hello, World!');
          )
        >>
      ]
    >>
  >> % end of the Javascript entry
>>
endobj

That's complicated, but the coding part is straightforward. Open an existing PDF in a text editor, find /Catalog, insert the boilerplate after the /Pages reference, and put in your code. The PDF parser matches parentheses, so as long as your code pairs them correctly you don't have to escape them. That means no strings like "We love smileys :)"; if you need one, escape the parentheses with a backslash. Actual backslashes in your code must be escaped too (write them as \\), since the PDF parser reads the string before it is interpreted as Javascript.
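Escaping by hand gets tedious, so here is a small PHP sketch with hypothetical helper names. It escapes every backslash and parenthesis, which is always safe whether or not the parentheses balance. It then builds the /Names entry to paste in after /Pages. (EmbeddedJS) is an arbitrary name for the script.

```php
<?php
// Escape a string for use as a PDF literal string. Backslashes are escaped
// first, so the backslashes added for parentheses aren't doubled again.
function pdfLiteralString(string $s): string
{
    return '(' . str_replace(['\\', '(', ')'], ['\\\\', '\\(', '\\)'], $s) . ')';
}

// Build the /Names entry to insert after /Pages in the catalog.
// "(EmbeddedJS)" is an arbitrary, hypothetical name for the script.
function catalogJavaScriptEntry(string $js): string
{
    return "/Names << /JavaScript << /Names [ (EmbeddedJS) << /S /JavaScript /JS "
        . pdfLiteralString($js) . " >> ] >> >>";
}
```

For example, catalogJavaScriptEntry("app.alert('We love smileys :)');") produces a /JS string in which both parentheses around the smiley are escaped. The string stays correct even though they don't pair up.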

This will create an incorrect PDF file, since the byte offsets in the xref table no longer match the file. Adobe Reader will correct this automatically, as will PDF Escape. However, both compress and otherwise munge the code so it's impossible to edit further.

See a sample blank page that says "Hello, World".

Continue reading ‘Adding Javascript to PDF Files’ »

Just found another useful tool for manipulating PDFs: the PDF Toolkit (pdftk). It's a command-line tool based on iText that I use mostly for merging PDFs together. The big downside is that embedded Javascript is lost, so it has to be added after the PDF has been put together.
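If the merging happens from PHP, a sketch like this builds the command line. It assumes pdftk is installed and on the PATH, and it uses pdftk's "input files, cat, output" form. pdftkMergeCommand is a hypothetical name.

```php
<?php
// Build a pdftk command line that concatenates $inputs into $output.
// Assumes pdftk is installed and on the PATH; escapeshellarg() guards
// against spaces and quotes in file names.
function pdftkMergeCommand(array $inputs, string $output): string
{
    $args = array_map('escapeshellarg', $inputs);
    return 'pdftk ' . implode(' ', $args) . ' cat output ' . escapeshellarg($output);
}
```

The resulting string can be passed to exec() or shell_exec(). Because pdftk drops embedded Javascript, any scripts go in after this step.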

So my free (as in free beer and free speech) PDF tools now include:

  • Open Office, for creating PDFs from word processing documents.
  • PDF Escape, for modifying and adding fields (text, checkboxes). This does preserve Javascript code, but compresses everything so you can't edit it further.
  • pdftk, for merging PDFs.
  • tcpdf, for creating PDFs with PHP.
  • A good text editor and a thorough understanding of the PDF specification, for hand-tweaking.

The PDF specification is very particular about the byte offset of each object. A cross-reference table at the end specifies exactly where in the file everything is. Still, the most recent Adobe Reader is pretty forgiving: a momentary alert pops up saying it is trying to fix the file. That's important when I'm hand-tweaking a PDF, since I can't easily correct the cross-reference table by hand. Fortunately, PDF Escape will correct everything, so if it matters I can just upload the tweaked PDF and download it again.
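When PDF Escape isn't handy, the table can be rebuilt for simple files. This PHP sketch makes these assumptions:

- The file is uncompressed, with one classic xref table.
- Objects are numbered 1 to N with generation 0, and each starts on its own line.
- No stream happens to contain text of the form "N 0 obj".

rebuildXref is a hypothetical name. It keeps only /Root from the old trailer, so an /Info dictionary would be dropped.

```php
<?php
// Hypothetical helper: rebuild the cross-reference table of a hand-edited PDF.
// Assumes an uncompressed file with a single classic xref table (no xref
// streams or incremental updates), objects numbered 1..N with generation 0,
// each starting on its own line as "N 0 obj", and no such text inside streams.
function rebuildXref(string $pdf): string
{
    // Everything from the last "xref" keyword on is the old table and trailer.
    $xrefPos = strrpos($pdf, "\nxref");
    if ($xrefPos === false) {
        throw new RuntimeException('no xref table found');
    }
    $body = substr($pdf, 0, $xrefPos + 1); // keep the newline before "xref"

    // Keep the catalog reference from the old trailer (other entries are dropped).
    if (!preg_match('/\/Root\s+(\d+\s+\d+\s+R)/', substr($pdf, $xrefPos), $m)) {
        throw new RuntimeException('no /Root entry in trailer');
    }
    $root = $m[1];

    // Byte offset of every "N 0 obj" at the start of a line.
    preg_match_all('/^(\d+) 0 obj\b/m', $body, $matches, PREG_OFFSET_CAPTURE);
    $offsets = [];
    foreach ($matches[1] as [$num, $offset]) {
        $offsets[(int) $num] = $offset;
    }
    if ($offsets === []) {
        throw new RuntimeException('no objects found');
    }
    $size = max(array_keys($offsets)) + 1;

    // Each entry is exactly 20 bytes: 10-digit offset, 5-digit generation,
    // "n" (in use) or "f" (free), and a two-character end of line.
    $xref = "xref\n0 $size\n0000000000 65535 f \n";
    for ($i = 1; $i < $size; $i++) {
        if (!isset($offsets[$i])) {
            throw new RuntimeException("object $i is missing");
        }
        $xref .= sprintf("%010d 00000 n \n", $offsets[$i]);
    }

    return $body . $xref
        . "trailer\n<< /Size $size /Root $root >>\n"
        . "startxref\n" . strlen($body) . "\n%%EOF\n";
}
```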

After playing with creating PDFs in PHP using FPDF for a while, and trying to get everything to work consistently, I discovered tcpdf, a fork of FPDF that includes everything anyone has ever added to the original. And I mean everything; this thing is huge! I printed out the source to see how it differed from the original, and it ran to more than 500 pages. Good thing they're so generous to me at work.

Most of the size is due to the SVG and HTML formatting, which I don't need, but the biggest advantage is that Unicode font subsetting works. Mostly.

tFPDF, the Unicode-enabled version that comes with FPDF, supports Unicode fonts, but they don't show up on the iPhone. Apple's PDF viewer reads fonts differently from Adobe's. tcpdf does a better job. The Droid fonts work on the iPhone, though the DejaVu fonts do not. The HumaneJenson font displays in Adobe Reader but generates an error. Try those last links on the iPhone; the built-in Helvetica fonts show up but DejaVu does not. Try refreshing the test page multiple times; it randomly selects fonts to display each time. Some fonts generate errors in Adobe Reader but still display, some don't display at all, and some don't display on the iPhone. It all seems very random, but at least I have a set of open-source TrueType fonts that I can include.

It also does most of the things I need: PNG graphics with transparency, form fields like text boxes (I played with that one for weeks with tfpdf, but it never worked the way I wanted it to), rotating text. The API is clunky and poorly documented and I definitely like my routines better, but this is done and someone else maintains it. A huge advantage. I can write my own interface routines to be more elegant if I want.

Continue reading ‘Don’t Reinvent the Wheel, PDF Style’ »

Looking at FPDF and at my PDF tutorial, it is clear that PDFs can do a few things that aren't part of FPDF's API. However, FPDF is easily extensible to include everything I might find useful, so I put together a package of those routines.

See the code.

See the sample output.

See the code that produced the sample output.

Continue reading ‘Paths, Vector Graphics and PHP images in FPDF’ »

Now we need to add text. That's the most useful part of a PDF, and the easiest. Also the hardest. Sometimes life is like that.
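The easy part is four operators. This PHP sketch writes one line of text; textOp is a hypothetical name. It assumes the page's /Resources maps /F1 to a standard font such as Helvetica, and that the text is plain ASCII. Fonts and encodings are the hard part, and this sidesteps them.

```php
<?php
// Sketch: content-stream operators that draw one line of text.
// Assumes the page's /Resources maps the hypothetical name /F1 to a standard
// font, e.g. << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Helvetica >> >> >>,
// and that $text is plain ASCII (anything else needs encoding work).
function textOp(string $text, float $x, float $y, string $font = 'F1', float $size = 12): string
{
    // Same literal-string escaping rules as anywhere else in PDF.
    $escaped = str_replace(['\\', '(', ')'], ['\\\\', '\\(', '\\)'], $text);
    // BT/ET bracket a text object; Tf picks font and size, Td moves to (x, y), Tj shows the string.
    return "BT /$font $size Tf $x $y Td ($escaped) Tj ET\n";
}
```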

See the code.

Continue reading ‘Creating PDFs with PHP, part 5: Text’ »

Now that we can draw in our PDF, we want to add images. There are two kinds of images, bitmapped and vector. In PDF, images are called XObjects (the X stands for external, meaning defined outside the page). Vector images are easier, since they are just packages of PDF drawing commands, a sort of macro. PDF calls them Forms, since they were originally used to draw the boxes on a printed form.
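A Form XObject is just a stream object whose dictionary says /Subtype /Form, plus a /BBox. This PHP sketch, with hypothetical helper names, wraps drawing commands in one and produces the operators that paint it on a page. The page's /Resources must also list the form under /XObject, for example /XObject << /Fm1 7 0 R >>, with Fm1 and 7 as placeholders.

```php
<?php
// Format a number for a content stream: fixed-point, trailing zeros trimmed
// (PDF has no exponent notation, so output like 1.0E-5 must be avoided).
function num(float $v): string
{
    return rtrim(rtrim(sprintf('%.4F', $v), '0'), '.');
}

// Wrap drawing commands in a Form XObject (a reusable "macro" of PDF operators).
// The caller still has to give it an object number and list it in /Resources.
function formXObject(string $content, float $width, float $height): string
{
    return "<< /Type /XObject /Subtype /Form /BBox [0 0 " . num($width) . ' ' . num($height) . ']'
        . ' /Length ' . strlen($content) . " >>\nstream\n" . $content . "\nendstream";
}

// Paint the form on the page: q/Q save and restore the graphics state,
// cm scales and then moves the form's origin to (x, y), Do paints it.
function placeXObject(string $name, float $x, float $y, float $scale = 1.0): string
{
    $s = num($scale);
    return "q $s 0 0 $s " . num($x) . ' ' . num($y) . " cm /$name Do Q\n";
}
```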

See the code.

Continue reading ‘Creating PDFs with PHP, part 4: Images’ »

Now that we can create blank PDFs, it's time to add some stuff. Vector drawing commands (lines and shapes) are simple; you just add the commands to the page content stream. In terms of the original class that would be:

$this->pages[count($this->pages)-1]->contents .= "the command\n";
// we just need some whitespace at the end, but the newline makes it easier to read the resulting PDF

But to make things easier, we can keep track of the last page:

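function newpage(){
  // ...create the new page and append it to $this->pages, as before...
  $this->currentPage = $this->pages[count($this->pages)-1];
}
// and now adding commands is:
$this->currentPage->contents .= "the command\n";
// PHP objects are handles, so this appends to the page stored in $this->pages;
// it also means we can point currentPage at other content streams to add commands there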

There are lots of commands, all of which are postfix (parameters come before operators). There are no math operators or stack manipulation operators; any calculation has to be done before generating the PDF and numbers inserted directly.
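Since the content stream can't calculate, the PHP side does all the math. This sketch uses hypothetical function names. It computes the corners of a regular polygon with cos() and sin(), then writes them as postfix path operators. The operators are x y m to move, x y l to draw a line, then either h S to close and stroke or f to fill.

```php
<?php
// Sketch: generate path operators for a polygon. Content streams can't do
// arithmetic, so the trigonometry happens in PHP and only numbers are written.
// Operands come first, then the operator: "x y m" = moveto, "x y l" = lineto.
function polygonPath(array $points, bool $fill = false): string
{
    $ops = [];
    foreach (array_values($points) as $i => [$x, $y]) {
        // round() keeps tiny float noise (e.g. 6.1E-15) out of the stream.
        $ops[] = round($x, 4) . ' ' . round($y, 4) . ($i === 0 ? ' m' : ' l');
    }
    $ops[] = $fill ? 'f' : 'h S'; // fill, or close the subpath and stroke it
    return implode("\n", $ops) . "\n";
}

// Vertices of a regular n-sided polygon centered on (cx, cy) with radius r.
function regularPolygon(float $cx, float $cy, float $r, int $n): array
{
    $points = [];
    for ($k = 0; $k < $n; $k++) {
        $a = 2 * M_PI * $k / $n;
        $points[] = [$cx + $r * cos($a), $cy + $r * sin($a)];
    }
    return $points;
}
```

Appending polygonPath(regularPolygon(300, 400, 50, 6)) to $this->currentPage->contents would draw a hexagon centered on (300, 400).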

See the code.

Continue reading ‘Creating PDFs with PHP, part 3: Drawing’ »

Continuing my attempt to dissect FPDF to understand PDFs, we'll create the simplest PDF: a blank page.

We need a couple of objects:

The catalog: this serves as the root object and describes the data structures in the document, which for our purposes is just the collection of printed pages. Other things, like the data for interactive forms, Javascript routines and metadata (author, subject, keywords), would go here.
1 0 obj
<<
  /Type /Catalog
  /Pages 2 0 R % reference to object number 2
>>
endobj
One useful optional entry in the catalog is /OpenAction, which can set the opening page and zoom level. /OpenAction [3 0 R /Fit] opens at the page described in object 3, zoomed so the whole page fits in the window.
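Putting the pieces together, this PHP sketch writes a complete blank PDF. blankPagePdf is a hypothetical name, and this is not FPDF's code. It writes the catalog with /OpenAction, the page tree and one empty page. As it goes, it records each object's byte offset, so the xref table comes out right. The page has an empty /Resources dictionary and no /Contents entry; per the specification, a page without /Contents is blank.

```php
<?php
// Sketch: a complete one-page blank PDF. Each object's byte offset is
// recorded as it is appended, so the xref table at the end is correct.
// 612 x 792 points is US Letter; /Fit opens the page zoomed to the window.
function blankPagePdf(float $width = 612, float $height = 792): string
{
    $objects = [
        1 => '<< /Type /Catalog /Pages 2 0 R /OpenAction [3 0 R /Fit] >>',
        2 => '<< /Type /Pages /Kids [3 0 R] /Count 1 >>',
        // No /Contents entry, so the page is empty.
        3 => "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 $width $height] /Resources << >> >>",
    ];

    $pdf = "%PDF-1.4\n";
    $offsets = [];
    foreach ($objects as $num => $dict) {
        $offsets[$num] = strlen($pdf);
        $pdf .= "$num 0 obj\n$dict\nendobj\n";
    }

    $xrefPos = strlen($pdf);
    $size = count($objects) + 1; // plus the free entry for object 0
    $pdf .= "xref\n0 $size\n0000000000 65535 f \n";
    foreach ($offsets as $offset) {
        $pdf .= sprintf("%010d 00000 n \n", $offset);
    }
    $pdf .= "trailer\n<< /Size $size /Root 1 0 R >>\nstartxref\n$xrefPos\n%%EOF\n";
    return $pdf;
}
```

Served with a Content-Type: application/pdf header, the returned string should open as a single blank page.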

See the code.

Continue reading ‘Creating PDFs with PHP, part 2: A Blank Page’ »

I wanted to allow my webservices to create PDF files, and I figured it couldn't be too hard; after all, it's just a bunch of graphics commands in a text file, right? Foolish me. The reference manual is 756 pages long, not including the Javascript reference, another 769 pages. The place to start is FPDF, which is open source and pretty easy to understand. Its derivative tFPDF lets you use and embed TrueType fonts (it's the 21st century; who uses anything but TrueType fonts?). Using it is simple:

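define('_SYSTEM_TTFONTS', '/path/to/your/truetype/fonts/'); // Took a bit of experimenting to find the right values for these
putenv('GDFONTPATH='._SYSTEM_TTFONTS); // so we can use GD images as well
require('tfpdf.php');
$pdf = new tFPDF();
$pdf->AddPage(); // Cell() needs a page to draw on
$pdf->SetFont('Helvetica', '', 14); // or a TrueType font loaded with AddFont()
$pdf->Cell(40, 10, 'Hello World!');
$pdf->Output(); // send the PDF to the browser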

One gotcha is that you need to create a unifont directory within the fonts folder and copy tFPDF's ttfonts.php file into it.

The result is here.

Continue reading ‘Creating PDFs with PHP: Syntax’ »