I use a lot of forms at work. The more paperless the office gets, the more paper we generate. Every school has its own physical exam form, every government agency has its own application form, every screening test is another form for the parent to fill out. And my handwriting is atrocious. So I try to get PDF copies of everything, then use PDF Escape to add text boxes that I can fill in, and an image of my signature at the bottom. But when filling them out, that still leaves a lot of either typing or cut-and-paste from the EMR (electronic medical record) of the patient's name, birthdate, address etc. There had to be a better way, and one that uses only free tools (I'm not buying Acrobat for $400).

Fortunately, Adobe Reader can run a version of Javascript, and we can use that to help fill in the form.

Every PDF includes a /Catalog object that serves as the root object of the document. Normally it just includes a reference to the array of pages, but it can include other things like Javascript to be executed when the document is opened. The syntax is convoluted; it is a dictionary containing a dictionary containing an array, which pairs a name with yet another dictionary that holds the code as a string:

1 0 obj
<< 
  /Type /Catalog
  /Pages 2 0 R % a standard catalog entry
  /Names << % the Javascript entry
    /JavaScript <<
      /Names [
        (EmbeddedJS)
        <<
          /S /JavaScript
          /JS (
            app.alert('Hello, World!');
          )
        >>
      ]
    >>
  >> % end of the javascript entry
>>
endobj

That's complicated, but the coding part is straightforward: take an existing PDF, open it in a text editor, find /Catalog, insert the boilerplate after the /Pages reference, and put in your code. The PDF parser is smart enough to match parentheses inside a string, so as long as your code pairs them correctly (you don't have any strings like "We love smileys :)") you don't have to escape them. If you need to, escape them with a backslash. Actual backslashes in your code need to be escaped (write them as \\), since the PDF parser will read the string before it is interpreted as Javascript.
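If you would rather not check the parentheses by hand, the escaping can be automated. Here is a minimal sketch in plain JavaScript (run with something like Node, not inside Reader); escapeForPdfString is a hypothetical helper name, not part of any PDF library. It escapes every backslash and parenthesis, which is harmless for the balanced ones because the PDF parser turns \( and \) back into plain parentheses.

```javascript
// Hypothetical helper: escape JavaScript source so it can sit inside a PDF literal string.
// Every backslash and parenthesis gets a leading backslash.
function escapeForPdfString(source) {
  return source.replace(/[\\()]/g, (ch) => '\\' + ch);
}

// The unbalanced ")" in the smiley would otherwise end the PDF string early.
const code = 'app.alert("We love smileys :)");';
console.log('/JS (' + escapeForPdfString(code) + ')');
```

Escaping everything is simpler than tracking which parentheses are balanced, at the cost of a slightly less readable file.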

This will create a technically incorrect PDF file, since the xref table no longer has the correct byte offsets for the objects. Adobe Reader will correct this automatically, as will PDF Escape, but they compress and otherwise munge the code, so it's impossible to edit it further.
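To see why the xref table goes stale, here is a rough sketch in plain JavaScript that lists where each object header starts. findObjectOffsets is a hypothetical name, and the scan is naive: it assumes the file was read as single-byte (latin1) text so that string positions equal byte offsets, and it could be fooled by binary stream data that happens to look like an object header.

```javascript
// Sketch: report the offset of every "N G obj" header that starts a line.
function findObjectOffsets(pdfText) {
  const offsets = [];
  const header = /(?:^|[\r\n])(\d+) (\d+) obj\b/g;
  let match;
  while ((match = header.exec(pdfText)) !== null) {
    const label = match[1] + ' ' + match[2] + ' obj';
    // Skip the leading line break, if any, so the offset points at the first digit.
    const start = match.index + (match[0].length - label.length);
    offsets.push({ object: Number(match[1]), generation: Number(match[2]), offset: start });
  }
  return offsets;
}

// A toy two-object file, before and after hand-editing the catalog.
const original = '%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n' +
  '2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n';
const edited = original.replace('/Pages 2 0 R', '/Pages 2 0 R /Names << >>');

console.log(findObjectOffsets(original).map((o) => o.offset));
console.log(findObjectOffsets(edited).map((o) => o.offset));
```

Everything after the inserted text moves down by the length of the insertion, while the xref table still records the old positions.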

See a sample blank page that says "Hello, World".

So how do I use this? Let's say I have a Missouri form MO 580-1878, CHILD MEDICAL EXAMINATION REPORT, for preschool. I load it into PDF Escape to generate a form with text fields, my signature and office address. I can now fill it out, but it's inconvenient to have to type everything. So I load it back into PDF Escape and pre-fill the text fields. For the instructions for specialized care I'll just enter "None," since that's by far the most common; it's still a text field that I can change if I need to. The other fields get boilerplate text, like {date}, {dob} (for date of birth), or {dos} (for date of service), etc. The result isn't so useful--the boilerplate text still needs to be replaced.

So we add a Javascript routine to replace all the text with the appropriate values (and erase any fields with text that starts with '{' so the boilerplate doesn't show up if we're missing data):

var doc = this;
function replace(replacements){
	for ( var i=0; i<doc.numFields; i++) {
		var field = doc.getField(doc.getNthFieldName(i)); // go through every text field in the document
		if (field.type == "text" && /^{/.test(field.value)){
			field.value = replacements[field.value] || '';
		}
	}
}
replace({
	'{date}' : '6/5/2012',
	'{name}' : 'John Doe',
	'{dos}' : '1/1/2012'
});
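
To try the logic without opening Adobe Reader, here is a sketch that runs the same loop in plain JavaScript against a stand-in document object. makeFakeDoc and replaceIn are hypothetical names for this illustration; the fake object only imitates the three members the routine uses (numFields, getNthFieldName, getField) and is in no way the real Acrobat API.

```javascript
// Hypothetical stand-in for the Acrobat document object.
function makeFakeDoc(fields) {
  const names = Object.keys(fields);
  return {
    numFields: names.length,
    getNthFieldName: (i) => names[i],
    getField: (name) => fields[name],
  };
}

// Same loop as the routine above, with the document passed in as a parameter.
function replaceIn(doc, replacements) {
  for (let i = 0; i < doc.numFields; i++) {
    const field = doc.getField(doc.getNthFieldName(i));
    // Only touch text fields whose value starts with "{".
    if (field.type === 'text' && /^\{/.test(field.value)) {
      field.value = replacements[field.value] || '';
    }
  }
}

const fields = {
  patient: { type: 'text', value: '{name}' },   // has a replacement
  birth: { type: 'text', value: '{dob}' },      // no data, so it gets blanked
  care: { type: 'text', value: 'None' },        // not boilerplate, left alone
  consent: { type: 'checkbox', value: 'Off' },  // not a text field, left alone
};
replaceIn(makeFakeDoc(fields), { '{name}': 'John Doe' });
console.log(fields);
```

The name field is filled in, the date of birth field is erased because no value was supplied, and the other two fields are untouched.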

Now, we have to get that code into the PDF, but customized for each patient, of course. The following PHP program takes the parameters from the URL query and uses that for the replacements argument, but uses the date and time on the client side:

$file = file_get_contents('http://bililite.com/blog/blogfiles/pdf/preschool%20physical-fields.pdf');
// JSON makes the values safe as Javascript; escaping backslashes and parentheses
// makes the result safe inside a PDF string
$data = strtr(json_encode($_GET), array('\\' => '\\\\', '(' => '\\(', ')' => '\\)'));
$js = <<<EOT
/Names << /JavaScript <</Names [ (EmbeddedJS) << /S /JavaScript /JS (
	var replacements=$data;
	replacements['{date}'] = util.printd('m/d/yyyy', new Date());
	replacements['{time}'] = util.printd('HH:MM', new Date());
	var doc = this;
	function replace(replacements){
		for ( var i=0; i<doc.numFields; i++) {
			var field = doc.getField(doc.getNthFieldName(i));
			if (field.type == "text" && /^{/.test(field.value)){
				field.value = replacements[field.value] || '';
			}
		}
	}
	replace(replacements);
) >> ]  >> >>
EOT;

// use a callback so that any $ or \ in the data is not treated as a backreference
$file =  preg_replace_callback(
	'#/Type\s*/Catalog#',
	function ($match) use ($js) { return $match[0].$js; },
	$file
);
header('Content-Type: application/pdf');
echo $file;

And you can try it out here (obviously in real life, I have it generated by the Electronic Medical Record):

Basically, it pulls the PDF file into $file and creates a JSON object from the URL query in $data, then puts that JSON into a Javascript template (note the var replacements=$data line). The Javascript template also includes the PDF hieroglyphics needed to insert it into the PDF file itself, which it does by searching for the string /Type /Catalog and putting the Javascript right after that.
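
For anyone not running PHP, here is a rough sketch of the same server-side step in plain JavaScript (Node). It is a translation for illustration, not the code I actually run: injectScript, buildNamesEntry and escapeForPdfString are hypothetical names, the PDF is assumed to be already loaded as text, and the embedded script is cut down to the single replacements line.

```javascript
// Escape backslashes and parentheses for a PDF literal string.
function escapeForPdfString(source) {
  return source.replace(/[\\()]/g, (ch) => '\\' + ch);
}

// Wrap a script in the /Names entry that the catalog needs.
function buildNamesEntry(script) {
  return ' /Names << /JavaScript << /Names [ (EmbeddedJS) << /S /JavaScript /JS (' +
    escapeForPdfString(script) + ') >> ] >> >>';
}

// Insert the script right after the first "/Type /Catalog" in the PDF text.
function injectScript(pdfText, replacements) {
  const script = 'var replacements = ' + JSON.stringify(replacements) + ';';
  // A function replacer keeps "$" in the data from being read as a replacement pattern.
  return pdfText.replace(/\/Type\s*\/Catalog/, (found) => found + buildNamesEntry(script));
}

const catalog = '1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n';
console.log(injectScript(catalog, { '{name}': 'John (Johnny) Doe', '{dos}': '1/1/2012' }));
```

The parentheses in the patient's name come out escaped, so they cannot close the PDF string early.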

And now, I don't need to do nearly as much typing!

4 Comments

  1. Hacking at 0300 : Changing Images in a PDF says:

    […] PHP routines to fill in PDF forms work great, and now my partner wants to use them too. Changing the forms to make the physician name […]

  2. Wilton says:

    Need to add submit button to pdf that will work on website.

  3. Danny says:

    I’m not sure how that would work. Adding javascript to PDF files is a whole adventure unto itself.
