Archive for January, 2010

A simple walkthrough of using the Engauge Digitizer to pull values off a "printed" graph

There's a far more complete manual that comes with the download, but these are the steps I used to generate the data for the webservices graphs. They assume you have a black-and-white image of the graph with continuous lines for the data, orthogonal gridlines, and linear axes.
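The linear-axes assumption is what makes turning pixels into values easy: each axis is a straight-line map fixed by two points whose values you know. Engauge does this calibration itself once you mark the axis points, so the Python below is only a sketch of what the mapping amounts to. It additionally assumes the axes are not rotated in the image, and the names `LinearAxis` and `pixel_to_data` are illustrative, not part of Engauge.

```python
from dataclasses import dataclass


@dataclass
class LinearAxis:
    """One linear axis, calibrated from two reference pixels with known values."""
    pixel1: float
    value1: float
    pixel2: float
    value2: float

    def to_value(self, pixel: float) -> float:
        # Data units per pixel. The sign takes care of axes that run
        # "backwards" in image coordinates (y grows downward in images).
        scale = (self.value2 - self.value1) / (self.pixel2 - self.pixel1)
        return self.value1 + (pixel - self.pixel1) * scale


def pixel_to_data(px: float, py: float, x_axis: LinearAxis, y_axis: LinearAxis) -> tuple[float, float]:
    # With orthogonal, unrotated axes, x depends only on the pixel column
    # and y only on the pixel row, so each axis is calibrated separately.
    return x_axis.to_value(px), y_axis.to_value(py)


# Made-up calibration: column 100 is x = 0 and column 300 is x = 50;
# row 400 is y = 0 and row 240 is y = 20.
x_axis = LinearAxis(100, 0.0, 300, 50.0)
y_axis = LinearAxis(400, 0.0, 240, 20.0)
print(pixel_to_data(200, 320, x_axis, y_axis))  # (25.0, 10.0)
```

Logarithmic axes would need the same idea applied to the logarithm of the values, which is why the walkthrough sticks to linear ones.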


I wanted to add Down syndrome growth charts to the bililite.com webservices, but as far as I can tell, the charts are available only as images in the AAP's guidelines (and in the original paper, which is subscription-only). The often-cited growthcharts.com has charts, and Greg Richards was generous enough to share his data with me. However, some of his data come from a different study, and he got his numbers off the original charts the old-fashioned way: with pencil, ruler, and a blown-up copy of the paper. Nothing wrong with that; it's how I got the numbers for my bilirubin chart. But I wanted all my charts to match the AAP's.

So how to get the numbers off the graph? I emailed the lead author of the original paper but haven't gotten an answer. I can pull the graphs out of the paper's PDF as GIFs (thanks to OpenOffice.org and Sun's PDF importer; Adobe's reader seems to get more limited with each upgrade). I was afraid I would have to write my own digitizer: I had read the cool article on Sudoku recognition and figured I could learn about Hough transforms to locate the graph, 2-D Fourier transforms to remove the gridlines, and blob detection to find the data lines. Turning pixels into measurements would be the trivial last step. It sounded like fun, if I had an infinite amount of free time.

Luckily, I found Engauge Digitizer. After barely glancing at the manual, I had it removing gridlines, digitizing the curves on the graph, and exporting the values at x-values I chose to CSV files. It was nearly painless: not fully automated, but with only four graphs to digitize, I was done in half an hour. Highly recommended. With my remaining free time, I'll write a quick tutorial so I don't forget what I did.
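Exporting at chosen x-values means estimating each curve between the points that were actually digitized. Engauge has its own export settings and this is not its code; the sketch below simply assumes straight-line interpolation between neighboring digitized points and leaves a cell blank when a requested x falls outside the digitized range. The functions `interpolate_at` and `curves_to_csv` and the `p50` curve are hypothetical.

```python
import bisect
import csv
import io


def interpolate_at(points, xs):
    """Linearly interpolate a digitized curve at the requested x-values.

    points: (x, y) pairs sorted by x. Requested x-values outside the
    digitized range give None instead of an extrapolated guess.
    """
    px = [x for x, _ in points]
    py = [y for _, y in points]
    values = []
    for x in xs:
        if x < px[0] or x > px[-1]:
            values.append(None)
            continue
        i = bisect.bisect_left(px, x)  # first digitized point with px[i] >= x
        if px[i] == x:
            values.append(py[i])
            continue
        x0, x1, y0, y1 = px[i - 1], px[i], py[i - 1], py[i]
        values.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return values


def curves_to_csv(curves, xs):
    """Write one row per requested x and one column per named curve."""
    names = list(curves)
    columns = [interpolate_at(curves[name], xs) for name in names]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["x"] + names)
    for row, x in enumerate(xs):
        writer.writerow([x] + ["" if col[row] is None else col[row] for col in columns])
    return out.getvalue()


# Hypothetical curve, digitized at three points.
curves = {"p50": [(0, 3.0), (12, 9.0), (24, 11.0)]}
print(curves_to_csv(curves, [0, 6, 18, 30]), end="")
# x,p50
# 0,3.0
# 6,6.0
# 18,10.0
# 30,
```

Leaving out-of-range cells empty rather than extrapolating keeps the CSV honest about where the printed curve actually ends.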

Parsing the HTTP Accept: header

I wanted the webservices to be as RESTful as possible, so they should use the Accept: header, rather than file-name extensions, to decide which format to return.
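Content negotiation with the Accept: header boils down to splitting the header into media ranges, reading each range's q-value (1 if absent), and serving the available type whose most specific matching range has the highest nonzero q. The Python below is a rough, independent sketch of that idea, not the author's code. It ignores media-type parameters other than q and quoted strings containing commas, and the function names are hypothetical.

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs, in header order."""
    ranges = []
    for part in header.split(","):
        if not part.strip():
            continue
        media_range, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range.lower(), q))
    return ranges


def specificity(media_range, offered):
    """2 for an exact match, 1 for type/*, 0 for */*, -1 for no match."""
    if media_range == offered:
        return 2
    main, _, sub = media_range.partition("/")
    if sub == "*" and main == offered.partition("/")[0]:
        return 1
    if media_range == "*/*":
        return 0
    return -1


def negotiate(header, available):
    """Return the available type the client prefers, or None if none is acceptable."""
    if not header:
        return available[0]  # no Accept header: the client takes anything
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for offered in available:
        matches = [(specificity(m, offered.lower()), q) for m, q in ranges]
        matches = [match for match in matches if match[0] >= 0]
        if not matches:
            continue
        q = max(matches)[1]  # the most specific matching range decides
        if q > best_q:  # strict: q=0 never wins, and ties keep server order
            best, best_q = offered, q
    return best


print(negotiate("text/html;q=0.9, application/json, */*;q=0.1",
                ["text/html", "application/json", "text/csv"]))  # application/json
```

Returning None lets the caller answer with 406 Not Acceptable; whether a real service should do that or fall back to a default format is a design choice this sketch leaves open.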