Archive for June 5th, 2012

Just found another useful tool for manipulating PDFs: the PDF Toolkit (pdftk). It's a command-line tool based on iText that I use mostly for merging PDFs together. The big downside is that embedded JavaScript is lost in the merge, so it has to be added back to the PDF after it has been put together.
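For reference, a merge with pdftk takes the form `pdftk a.pdf b.pdf cat output merged.pdf`. Below is a minimal sketch of wrapping that in a script. It is written in Python purely for illustration; the function names `pdftk_merge_command` and `merge_pdfs` are made up for this example, and it assumes pdftk is installed and on the PATH.

```python
import shutil
import subprocess


def pdftk_merge_command(inputs, output):
    """Build the argv for: pdftk in1.pdf in2.pdf ... cat output merged.pdf"""
    if not inputs:
        raise ValueError("need at least one input PDF")
    return ["pdftk", *inputs, "cat", "output", output]


def merge_pdfs(inputs, output):
    """Run pdftk to concatenate the inputs, in order, into one file."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk not found on PATH")
    # check=True raises CalledProcessError if pdftk exits with an error.
    subprocess.run(pdftk_merge_command(inputs, output), check=True)
```

Calling `merge_pdfs(["cover.pdf", "form.pdf"], "packet.pdf")` would write the combined file; any JavaScript still has to be re-added afterwards, as noted above.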

So my free (as in free beer and free speech) PDF tools now include:

  • OpenOffice, for creating PDFs from word processing documents.
  • PDF Escape, for modifying and adding fields (text, checkboxes). This does preserve JavaScript code, but compresses everything so you can't edit it further.
  • pdftk, for merging PDFs.
  • tcpdf, for creating PDFs with PHP.
  • A good text editor and a thorough understanding of the PDF specification, to hand tweak.

The PDF specification is very particular about the byte length of each element, with a cross-reference table at the end that records exactly where in the file everything is, but the most recent Adobe Reader is pretty forgiving (an alert flashes up for a split second saying it is trying to repair the file). That's important if I'm hand-tweaking a PDF, since I can't correct the cross-reference table by hand. PDF Escape, fortunately, will correct everything, so if it's important I can just upload the tweaked PDF and download it back.
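To make the cross-reference table concrete: each in-use entry is a 10-digit byte offset, a 5-digit generation number and the letter `n`, padded to exactly 20 bytes. The sketch below recomputes those entries for a hand-edited file. It is a simplification under several assumptions: object headers (`1 0 obj`) start at the beginning of a line, nothing inside a stream happens to look like an object header, and missing object numbers are simply marked free without building the proper free list. The function names are invented for this example, and the `startxref` value near the end of the file would still need updating separately.

```python
import re

# Matches an object header such as "12 0 obj" at the start of a line.
OBJ_HEADER = re.compile(rb"(?m)^(\d+) (\d+) obj\b")


def object_offsets(pdf_bytes):
    """Map object number -> (byte offset, generation) for each header found."""
    return {
        int(m.group(1)): (m.start(), int(m.group(2)))
        for m in OBJ_HEADER.finditer(pdf_bytes)
    }


def xref_section(pdf_bytes):
    """Build the text of a classic xref table from the offsets found."""
    offsets = object_offsets(pdf_bytes)
    size = max(offsets, default=0) + 1
    # Entry 0 is always the head of the free list.
    lines = [b"xref\n", b"0 %d\n" % size, b"0000000000 65535 f \n"]
    for num in range(1, size):
        if num in offsets:
            off, gen = offsets[num]
            lines.append(b"%010d %05d n \n" % (off, gen))
        else:
            # Simplification: a real free entry links to the next free object.
            lines.append(b"0000000000 00000 f \n")
    return b"".join(lines)
```

The trailing space before each newline is deliberate: it is what pads every entry to the required 20 bytes.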

I use Bing for the search box on my site, and it's worked well: simple API, no need to create a custom search engine as with Google. Unfortunately, Microsoft is losing almost half a million dollars an hour on Bing, and they want me to make up the difference. Well, not me alone, but they are going to start charging for using their web services. Fortunately, they are (as of now) providing a free tier of up to 5,000 queries a month, which is far more than I need.

So I have to sign up for Azure Marketplace (Azure is Microsoft's cloud service), subscribe to the Bing Web Search API and create an application key. Then I need to convert my old requests into the new format. Luckily, Microsoft provides a migration guide (as a Word document!), and that includes sample code in PHP. The biggest difference is the need for HTTP authentication. The code from Microsoft works, as long as I leave out the proxy line in the context parameters (I guess they only tested their code on local servers) and as long as file_get_contents is allowed to open URLs, which it is on my service with Nearly Free Speech. I imagine setting the header similarly with cURL would also work.
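The authentication step boils down to sending an `Authorization: Basic ...` header with every request. Here is a rough sketch of how such a request might be assembled, in Python rather than the PHP of Microsoft's sample. Everything specific in it is an assumption to check against the migration guide: the endpoint URL, the `$format`, `$top` and `Query` parameter names, and the convention of using the account key as both user name and password. The function only builds the request; it does not send it.

```python
import base64
import urllib.parse
import urllib.request

# Assumption: endpoint as recalled from the Azure Marketplace documentation.
BING_ENDPOINT = "https://api.datamarket.azure.com/Bing/Search/Web"


def build_bing_request(query, account_key, top=10):
    """Return an unsent urllib Request carrying HTTP Basic credentials."""
    params = urllib.parse.urlencode(
        {
            "$format": "json",
            "$top": top,
            # Assumption: the query is passed as a single-quoted string literal.
            "Query": "'%s'" % query,
        },
        safe="$'",  # keep "$" and "'" readable instead of percent-encoding them
    )
    # Basic auth is base64("user:password"); here the key fills both slots.
    credentials = ("%s:%s" % (account_key, account_key)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(BING_ENDPOINT + "?" + params)
    request.add_header("Authorization", "Basic " + token)
    return request
```

Passing the result to `urllib.request.urlopen` would perform the call; the same header string could be set through a stream context in PHP or through cURL.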

The other big difference is that they no longer return the total number of results when not all of them were returned. Instead they return a parameter __next (note the two underscores) that contains the URL for getting more results, if any are available. Since I'm only showing a limited list, I just need to test for the existence of that parameter to indicate that more results are available.
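The check for more results can be sketched as follows, again in Python for illustration. The shape of the response is an assumption: a JSON object with a top-level `d` wrapper holding a `results` list and, optionally, `__next`. Only the `__next` name itself comes from the description above, and `parse_results` is a made-up helper name.

```python
import json


def parse_results(body):
    """Return (results, has_more) from a JSON response body."""
    # Assumption: the payload sits under a top-level "d" key.
    payload = json.loads(body).get("d", {})
    results = payload.get("results", [])
    # "__next" is only present when another page of results exists.
    has_more = "__next" in payload
    return results, has_more
```

With this, the page can show a "more results" link whenever `has_more` is true, without ever needing a total count.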

So the updated code is:

Continue reading ‘New Bing API’ »