All I wanted was to get a better 404 page for the site and to use mod_rewrite to standardize my pages. Of course, nothing is ever simple and mod_rewrite is voodoo. So it took a week of experimenting, pulling my hair and staring at a lot of 500 errors, but I think I got things where I wanted them (still don't have a very elegant 404 page, though!). To make sure I didn't have to go through this again, and to help anyone else out there, here are the things I wish I had known:

askApache is my new best friend

There are lots of mod_rewrite tutorials out there on the web, but once you get past the simple stuff, nothing is obvious. askApache digs deeper and explains more of the hack-y stuff that you need to know in order to actually use your .htaccess file, especially in his Crazy Advanced Mod_Rewrite Tutorial. Highly recommended.

Know the difference between Redirect and Rewrite

Both of these involve changing a URL to a different one. Redirecting, however, tells the browser that it changed (by sending a 3xx status code, such as 301 for a permanent move or 302 for a temporary one) while rewriting (or "internally redirecting") keeps it a secret. Thus if I internally use URLs like // but publicly I want them to look like // I'll rewrite:

RewriteRule ^([^/\.]+)/([^/\.]+)$ index.php?main=$1&part=$2 [QSA,L]

But to publicly change a URL, say from // to // I'll redirect:

Redirect Permanent /oldname.html /newname.html
# or
RewriteRule ^oldname\.html$ /newname.html [R=301,L]

Note that it's usually easier to use Redirect or RedirectMatch rather than RewriteRule with the [R] flag. Note also that the Redirect pattern starts with the / (or whatever the per-directory prefix is) while RewriteRule patterns strip the / out. I've been burned by this! (This is true for .htaccess only; in the per-server configuration file httpd.conf, RewriteRule matches against the full URL-path, leading slash included. But I'm on a cheap shared server; I can't touch httpd.conf.)
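
To make the leading-slash difference concrete, here is a minimal sketch of the same redirect written both ways. It assumes the .htaccess file sits in the document root, and the /old/ and /new/ paths are made-up placeholders, not anything from my site. Use one version or the other, not both:

```apache
# mod_alias: the pattern is matched against the full URL-path,
# so it starts with the leading slash.
RedirectMatch permanent ^/old/(.*)\.html$ /new/$1.html

# mod_rewrite in .htaccess: the per-directory prefix (here "/") is
# stripped before matching, so the pattern has NO leading slash.
RewriteEngine On
RewriteRule ^old/(.*)\.html$ /new/$1.html [R=301,L]
```

In both versions the target starts with a slash; only the pattern side differs.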

Use Redirect to append a slash to directories

Generally, if you use a URL like // you really want //. Apache does this automatically for real directories. But if you're rewriting /foo/bar to /index.php?main=foo&part=bar and /foo to /index.php?main=foo, you really want /foo to be redirected to /foo/, then /foo/ rewritten to /index.php?main=foo. If you don't, then relative URLs won't work. If the file returned by /foo contains a link <a href="bar"> you want the browser to interpret that href as /foo/bar. But it won't. It thinks the file has the URL /foo, so its directory is / and the relative URL bar refers to /bar. Redirecting /foo to /foo/ lets the browser know that foo is to be treated as a directory, so bar refers to /foo/bar. Example:

RedirectMatch Permanent ^/([^/\.]+)$ /$1/
RewriteRule ^([^/\.]+)/$ index.php?main=$1 [QSA,L]
RewriteRule ^([^/\.]+)/([^/\.]+)$ index.php?main=$1&part=$2 [QSA,L]
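
If you'd rather not mix mod_alias and mod_rewrite, the same idea can probably be written with RewriteRule alone. This is only a sketch. It assumes the .htaccess file is in the document root and that index.php takes the main and part parameters as above. Some hosts may also need a RewriteBase / line, which I've left out:

```apache
RewriteEngine On

# /foo -> /foo/ : an external 301, so the browser treats foo as a directory
RewriteRule ^([^/\.]+)$ /$1/ [R=301,L]

# /foo/ -> index.php?main=foo : internal, the address bar is untouched
RewriteRule ^([^/\.]+)/$ index.php?main=$1 [QSA,L]

# /foo/bar -> index.php?main=foo&part=bar : internal as well
RewriteRule ^([^/\.]+)/([^/\.]+)$ index.php?main=$1&part=$2 [QSA,L]
```

Since the patterns exclude dots, the rewritten index.php doesn't match any of the three rules on the next pass, so there's no loop.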

ErrorDocument should rewrite, not redirect

It's incredibly annoying to type in // (note the typo!) and have the address bar turn into //. I want the address bar to stay the same so I can edit what I wrote! The ErrorDocument directive uses redirection if a complete URL is specified. This is bad. Just start the URL with a slash and it will use internal redirection (what I'm calling rewriting) and the browser's address bar will remain untouched.

ErrorDocument 404 /404.php
#NOT ErrorDocument 404

1&1 gets it wrong

I use the cheapest shared-server plan from 1&1 and I've been pretty happy with it: no frills, straight classic LAMP stack, reliable enough for me. But they "hijack" the ErrorDocument for active pages (.php, .cgi, .pl and I assume others) to their own advertising-laden page. That I can understand (again, it's a cheap plan) but they redirect so you can't correct the URL. So I need to do an end run around them by catching nonexistent files first:

RewriteCond %{REQUEST_FILENAME} \.php$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . /404.php [L]
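
The same trick can presumably be stretched to the other hijacked extensions. Here is a sketch that also covers .cgi and .pl. I've only needed the .php version myself, so treat the longer extension list as an assumption:

```apache
RewriteEngine On

# Only look at requests for script-like files...
RewriteCond %{REQUEST_FILENAME} \.(php|cgi|pl)$
# ...that do not exist on disk...
RewriteCond %{REQUEST_FILENAME} !-f
# ...and hand them to the custom error page internally (no redirect).
RewriteRule . /404.php [L]
```

Because this is an internal rewrite rather than a real ErrorDocument, the response status is whatever /404.php sends. The script should set the 404 status itself, or the error page goes out as 200 OK.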

RewriteCond -f is case sensitive

This burned me badly before I figured it out. The -f test for RewriteCond asks the underlying operating system whether the file exists, which means that on a Linux system it's case-sensitive. No way around it; the [NC] flag doesn't help. So when I create a GIF on my Windows machine in Paint and save it, Paint automatically names the file whatever.GIF, and the name stays that way when I copy it to the Linux server. On my server Apache still does fine finding the file if I request it with <img src="whatever.gif"/> (presumably something like mod_speling is correcting the case; a stock Apache on Linux would not), but if my .htaccess has a line RewriteCond %{REQUEST_FILENAME} !-f the condition will be true, since whatever.gif doesn't exist; whatever.GIF does.
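
One way to keep a !-f test from misfiring on these files is to exempt them from it entirely. Here is a sketch for a catch-all "missing file" rule. The catch-all itself and the list of image extensions are assumptions for illustration, and the block is meant to go after any other rewrite rules in the .htaccess:

```apache
RewriteEngine On

# Skip the existence test for image-like URLs, whatever their case ([NC]
# works here because this condition is a regex match, not a -f test).
RewriteCond %{REQUEST_URI} !\.(gif|jpe?g|png)$ [NC]
# Anything else that is neither a real file nor a real directory...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# ...goes to the custom 404 page.
RewriteRule . /404.php [L]
```

The exempted image requests fall through to Apache's normal handling. The plainer fix, of course, is to rename the files to lowercase before uploading.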

Hope this saves someone some time (and hair).
