Taco de Wolff

Data Scientist

Domain independent htaccess

Published on October 02, 2014

URL canonicalization is quite important (according to Google, for example), so it is worth using an htaccess file to route some URLs. With Apache's mod_rewrite module we can redirect URLs either explicitly (issuing HTTP code 301 - Moved Permanently) or implicitly (rewriting internally).

As a programmer of mostly tools rather than specific websites, I have encountered many cases where I needed the htaccess file to be useful for any site. So it has to be domain independent and directory independent. Apparently not many people need this, because information about it is scarce. Here is a collection of rules, with explanations, in the order in which this site uses them.

General part

You should always enclose mod_rewrite related rules within an <IfModule> block, or else the website will give a 500 - Internal Server Error when the module is not loaded! Within the block we enable rewriting, set the environment variable BASE, and add the rewrite rules discussed below.

<IfModule mod_rewrite.c>
	RewriteEngine on
	RewriteBase /

	# get rewrite base manually
	RewriteCond $0#%{REQUEST_URI} ^([^#]*)#(.*)/?\1$
	RewriteRule ^.*$ - [E=BASE:%2]

	# rewrite rules as shown below are put here
</IfModule>

Extracting the base directory manually lets the htaccess file work independently of the directory it is placed in.

The condition uses $0, which is a backreference to the complete match of the rule, i.e. the requested path relative to the directory of the htaccess file. We append the request URI after a hash, because a hash does not normally appear in either of the variables. The pattern then finds the difference between both values (%{REQUEST_URI} ends with $0) and captures the part of the URI that leads up to this directory. The rule stores that part in ENV:BASE. The /? part in the pattern covers the case where \1 is empty (a request for the directory itself), in which the URI can have a trailing slash.

For example, if this htaccess file sits in /dir/dir-with-this-htaccess and the page requested relative to it is relative-dir/page, then $0 is relative-dir/page and BASE becomes /dir/dir-with-this-htaccess/ (the greedy (.*) group includes the trailing slash).
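As a small sketch of how BASE can then be used, the rule below redirects one page to another relative to wherever the htaccess file lives. The page names old-page and new-page are hypothetical, and the rule assumes BASE ends with a slash as captured by the condition.

```apache
# Hypothetical example: old-page and new-page are made-up names.
# Redirects <dir of this htaccess>/old-page to <dir of this htaccess>/new-page
# without hard-coding the directory.
RewriteRule ^old-page$ %{ENV:BASE}new-page [L,R=301]
```

Because the target starts with the captured base path, the same line should keep working if the whole site is moved into another subdirectory.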

Disable HTTPS

You either enable or disable HTTPS for your site (or for certain pages). Since I run a simple site, I disable it entirely.

# disable https
RewriteCond %{HTTPS} on
RewriteRule ^ http://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
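For the opposite choice, a minimal sketch that forces HTTPS instead might look like this. It simply mirrors the rule above with the condition and the scheme swapped, and it assumes the server has a working certificate for the host.

```apache
# enable https (sketch; assumes HTTPS is properly configured on the server)
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
```

If you pick this variant, the other rules that redirect to http:// would need the same scheme change to avoid bouncing between the two.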

Omitting www from domain

I prefer to omit the www. part from the domain. Because the www subdomain is used so often, it’s better to save space.

# omit www
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]

The condition ensures that the host starts with www. (case insensitive). The rest of the hostname is captured in %1. The rule matches everything, drops the www. and appends the request URI, issuing a hard 301 redirect.

Redirect index.html to root

Omitting index.html is possible because, with the DirectoryIndex directive, Apache automatically serves the index file when you request a directory.

# omit index.html
RewriteRule ^(.*/)?index(\.html)?$ %{ENV:BASE}$1 [L,R=301]
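Depending on the server setup, DirectoryIndex may internally map a directory request back to index.html, which can make the rule fire again and loop. A hedged variant that could replace the rule above checks THE_REQUEST, so that only URLs the client literally asked for are redirected. It assumes BASE ends with a slash, and the exact pattern is an illustration rather than the rule this site uses.

```apache
# omit index.html, but only for URLs the client itself requested (sketch)
# THE_REQUEST looks like: GET /dir/index.html HTTP/1.1
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*/index(\.html)?[?\s]
RewriteRule ^(.*/)?index(\.html)?$ %{ENV:BASE}$1 [L,R=301]
```

The condition uses the same idea as the THE_REQUEST check in the extension rules: internal rewrites do not change the original request line, so they cannot trigger the redirect.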

Omit extensions

I only use HTML files with my static website generator, but you can easily add other extensions.

# omit .html
RewriteCond %{THE_REQUEST} ^GET\ .*\.html\ HTTP
RewriteCond %{REQUEST_URI} ^(.*)\.html$
RewriteRule ^ http://%{HTTP_HOST}%1 [L,R=301]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^ %{REQUEST_FILENAME}.html [L]

The first block catches all requests that end in .html and redirects them to the URL without the extension. The second block then internally rewrites extensionless requests to the actual .html files.
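Adding another extension follows the same two-step pattern. As a sketch, assuming a site that also serves .php files (an assumption for illustration, since this site only uses HTML), the rules could be duplicated like this:

```apache
# omit .php (sketch; mirrors the .html rules)
RewriteCond %{THE_REQUEST} ^GET\ .*\.php\ HTTP
RewriteCond %{REQUEST_URI} ^(.*)\.php$
RewriteRule ^ http://%{HTTP_HOST}%1 [L,R=301]

# serve file.php when the extensionless URL is requested
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^ %{REQUEST_FILENAME}.php [L]
```

If both page.html and page.php exist, whichever internal rewrite comes first in the file wins, so the order of the blocks decides the precedence.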

Handle 404’s

If your DocumentRoot is properly set to the root directory of your site, you can simply use the following.

ErrorDocument 404 /404.html

If not, the rules below redirect to the 404 page (404.html) in the same directory as .htaccess when the requested file does not exist. However, they do not send a 404 - Not Found HTTP code like the ErrorDocument directive above does.

# handle 404
RewriteCond %{REQUEST_FILENAME}index.html !-f
RewriteRule /$ http://%{HTTP_HOST}%{ENV:BASE}404 [L,R=301]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.html !-f
RewriteRule [^/]$ http://%{HTTP_HOST}%{ENV:BASE}404 [L,R=301]

The first rule matches directory requests for which no index file exists; the second rule handles all other non-existing files.