[H]undreds of decision tools in a variety of forms—guidelines, practice parameters, prediction rules—have been generated. Some have been good, some bad; some have been validated, others not. But what they all have in common is that their overall use remains poor at best. In the meantime, those of us in academia continue to create them and those of us on editorial boards continue to vet them for methodological rigor. The cottage industry of decision tools has at least the appearance of an academic jobs program since, to clinicians in the real world, their utilities remain largely unproven [emphasis added]. For example, there are no fewer than 10 clinical prediction rules for something as common as streptococcal pharyngitis, and I would be surprised if most clinicians even use one.
The big problem is the difference between significance and importance. Statistical significance is, roughly, a measure of how unlikely the results are to be due to chance alone (see David Sackett's article on randomized controlled trials), but it says nothing about whether the results mean anything to me in real life. A test that tells me a patient has a 50% chance of having strep throat, when before I would have guessed 20%, may have a p = 0.0001 (the test is consistently better than chance!). But who cares? For actual clinical practice, things that affect my decision-making need to be orders of magnitude apart, not a few percent. Similarly, a new fever-reducing medicine may have reduced the temperature by 0.1 degree with a really high χ², but it isn't going to make me prescribe it.
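To put a number on that intuition, you can back out the likelihood ratio such a test must have: convert the pre- and post-test probabilities to odds and divide. A minimal sketch in Python (the helper names are mine, purely for illustration):

```python
def odds(p):
    """Convert a probability to odds; p must be strictly between 0 and 1."""
    return p / (1.0 - p)

def likelihood_ratio(pre_test_prob, post_test_prob):
    """Likelihood ratio a test must have to move pre-test to post-test probability."""
    return odds(post_test_prob) / odds(pre_test_prob)

# The strep example above: my 20% guess becomes 50% after the test.
lr = likelihood_ratio(0.20, 0.50)
print(f"Implied likelihood ratio: {lr:.1f}")  # 4.0
```

A likelihood ratio of 4 is only a moderately useful test; by the usual evidence-based-medicine rule of thumb, it takes an LR above 10 (or below 0.1) to produce a large shift in probability, no matter how small the p-value is.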
But real life is rarely so cooperative as to give you high odds ratios, so guidelines based on actual data tend to be less than useful, and guidelines with advice that can actually be implemented tend to be based on "expert opinion," which is a polite way of saying "faith." ("Experience is the ability to make the same mistakes with greater and greater confidence").
So what do we do? The academics would have us analyze every article ever published and compute our own statistics, but that's impossible. All we can do is read the reviews and guidelines and sort out which conclusions are solid (those we treat with respect) and which are strongly held opinions by experts in the field (those we listen to politely, then decide whether we know enough to overrule). If the guideline won't tell you the difference, throw it out.
There's still some room out there for the art of medicine.
Update: A perfect example in this month's Pediatrics (Tasian and Copp (2011), Pediatrics 127:119-128). A review of ultrasound to look for undescended testes showed that it would change the probability of an intra-abdominal testis (requiring surgery) from 49% to 64%. No matter how statistically significant that difference is, it isn't clinically important, and therefore the ultrasound is not indicated.
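Running the same odds arithmetic on those numbers (my own back-of-the-envelope calculation, not one from the paper):

```python
# Figures as quoted above: 49% pre-test, 64% post-test probability
# of an intra-abdominal testis.
pre_odds = 0.49 / (1 - 0.49)    # ~0.96
post_odds = 0.64 / (1 - 0.64)   # ~1.78
print(f"Implied positive likelihood ratio: {post_odds / pre_odds:.2f}")  # ~1.85
```

A positive likelihood ratio under 2 barely moves the needle, whatever the confidence interval looks like.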