Sunday, December 10, 2006

HTACCESS Wrappers with PHP

HTACCESS is a remarkable tool you can use for password protection, error handling (like custom 404 pages), or HTTP redirects. It can also be used to transform whole folders in seconds: adding headers to all your HTML documents, watermarking all your images, and more.

A wrapper is like a middleman. Using htaccess you can tell your web server to "forward" certain files to PHP scripts of yours. When a visitor tries to load an image in their browser, you could activate a script that adds a watermark to the image. When an HTML page is loaded you could query an IP-to-country database and have your HTML pages translated into the native language of your visitor's country-of-origin.

Every file in a folder, or all files of a certain type in a folder, can be instructed to go through a PHP script.

TORTILLA WRAP

Pretend you host several affiliate sites, or a full-blown hosting service like Geocities. Most sites running on free hosting services carry some kind of advertisement that the host uses to generate revenue. The users of these services don't add these ads themselves, and the ads don't appear in their source files at all; they show up only when the pages are served on the web.

It's possible to replicate this feature using fewer than 10 lines of PHP and htaccess code. To start off, make a folder on your web host called "header". Create a new text file and enter the following:

AddHandler headered .htm
AddHandler headered .html

Action headered /header/header.php

This assigns files with the extensions ".htm" and ".html" to a handler called "headered". The name "headered" can really be anything; it's just a label for a group of files. The last line tells the web server that whenever a file in the "headered" group is requested, it should run the script "/header/header.php" instead. The path is relative to your site's root, so if your URL is http://your.host, this will run http://your.host/header/header.php.

That's all you've got to do for the htaccess file. Save that as "htaccess.txt" -- we'll get back to it later.

For the actual wrapper, create a new text file containing the standard PHP tags (<?php and ?>). Between them, assign your header and footer file names to variables called $header and $footer.

$header = "header.html";
$footer = "footer.html";

Redirecting a user to our script doesn't pass its contents to it, just the filename. If you call phpinfo() in your script and scroll to the bottom you can see all the server variables which give us the name. The element "REQUEST_URI" in $_SERVER gives us the relative path (/header/sample.html), but we want the full system path since we're going to be reading the actual file (/home/username/wwwroot/your.host/header/sample.html), which is "PATH_TRANSLATED".

$file = $_SERVER["PATH_TRANSLATED"];

The full path of the requested file is now stored in the variable $file. Three simple things are left: output the header, output the actual file, then output the footer.

readfile($header);
readfile($file);
readfile($footer);

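That's it. Here's the entire header.php file:

<?php
$header = "header.html";
$footer = "footer.html";
$file = $_SERVER["PATH_TRANSLATED"];
readfile($header);
readfile($file);
readfile($footer);
?>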

All that, in under ten lines of code. Download it here: http://www.jumpx.com/tutorials/wrapper/header.zip

That contains the htaccess file and PHP wrapper script, along with a sample header, footer, and a test page. Upload all five files to the "header" folder on your web host, chmod htaccess.txt to 0755, then rename it to ".htaccess". Because files whose names start with a dot are usually hidden, it may disappear from your directory listing. That's okay; it's still there.

In your browser, load the copy of sample.html on your web server. The text "This is my header" should appear at the top, and "This is my footer" at the bottom. If you open the actual sample.html file, you'll see that neither is there. They were added by the wrapper script that every HTML file in the "header" folder now passes through.

This is how wrappers work. Certain things, like adding custom headers and footers, are done "on the fly" without modifying your original file. You'll get the same effect with any other HTML files you upload to this folder.
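The header.php above trusts PATH_TRANSLATED blindly. Here's a slightly more defensive sketch of the same wrapper, written for a modern PHP (7.1 or later) and assuming the same header.html and footer.html files sit next to the script. It falls back to DOCUMENT_ROOT plus the path part of REQUEST_URI when PATH_TRANSLATED is missing, refuses anything that resolves outside the document root, and answers with a 404 if the file doesn't exist. The function names resolve_requested_file() and wrap_page() are hypothetical, not part of the downloadable script.

```php
<?php
// Hypothetical, more defensive variant of header.php (a sketch, not the
// downloadable script). Works out which file was requested, then wraps it.

// Return the real filesystem path of the requested file, or null if it
// can't be found or lies outside the document root.
function resolve_requested_file(array $server): ?string
{
    if (!empty($server['PATH_TRANSLATED'])) {
        $candidate = $server['PATH_TRANSLATED'];
    } elseif (isset($server['DOCUMENT_ROOT'], $server['REQUEST_URI'])) {
        // Drop any query string before mapping the URL onto the disk.
        $urlPath = parse_url($server['REQUEST_URI'], PHP_URL_PATH);
        $candidate = $server['DOCUMENT_ROOT'] . rawurldecode((string) $urlPath);
    } else {
        return null;
    }

    $real = realpath($candidate);
    if ($real === false || !is_file($real)) {
        return null;
    }

    // If a document root is known, refuse paths that escape it (e.g. "../").
    if (!empty($server['DOCUMENT_ROOT'])) {
        $root = realpath($server['DOCUMENT_ROOT']);
        if ($root === false || strpos($real, $root . DIRECTORY_SEPARATOR) !== 0) {
            return null;
        }
    }
    return $real;
}

// Return header + page + footer as one string.
function wrap_page(string $page, string $header, string $footer): string
{
    return file_get_contents($header) . file_get_contents($page) . file_get_contents($footer);
}

// Only act when running under a web server (skipped on the command line).
if (PHP_SAPI !== 'cli') {
    $file = resolve_requested_file($_SERVER);
    if ($file === null) {
        http_response_code(404);
        echo "Not found";
    } else {
        echo wrap_page($file, __DIR__ . '/header.html', __DIR__ . '/footer.html');
    }
}
```

The realpath() check matters because a wrapper like this will happily read any file it's pointed at; comparing against the document root keeps a crafted URL from pulling in files elsewhere on the server.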

Files without ".html" or ".htm" extensions, such as text files or images, won't show these headers or footers. This is a good thing, because text files aren't part of a web site's presentation, and adding extra text to images would corrupt them. The wrapper affects all HTML files within your "header" folder (and its subfolders), and none of the files outside it.

You can add or remove file extensions just by adding or removing "AddHandler" lines.

To get everything back to normal, either delete your .htaccess file or upload a blank .htaccess file in that folder, and all will be well again.

SHRINK-WRAP

The same basic formula can be applied to other uses, HTTP compression for example. Compression used to be impractical because computers were too slow to compress pages on the fly, and broadband (DSL and cable) has since made it largely unnecessary for most visitors.

It works like this: when an HTML page is requested, the web server sends the visitor a compressed version of that page instead. The visitor's browser downloads the compressed version, which is smaller than the real thing and arrives in less time, then decompresses it and displays the original page.
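To see that round trip in code, here's a small sketch using PHP's zlib functions gzencode() and gzdecode() (gzdecode() needs PHP 5.4 or later). The "server" half compresses a repetitive page and the "browser" half restores it byte for byte. The helper names and the sample text are made up for illustration.

```php
<?php
// Sketch of the HTTP compression round trip using PHP's zlib extension.
// The sample page is invented; any repetitive HTML behaves similarly.

// What the server would send when the browser accepts gzip.
function compress_for_transfer(string $page): string
{
    return gzencode($page, 6); // gzip format, compression level 6
}

// What the browser does after downloading the compressed bytes.
function decompress_after_transfer(string $bytes): string
{
    $page = gzdecode($bytes);
    if ($page === false) {
        throw new RuntimeException('Not valid gzip data');
    }
    return $page;
}

$page = str_repeat("<p>The decline and fall of the Roman Empire.</p>\n", 500);
$sent = compress_for_transfer($page);
$received = decompress_after_transfer($sent);

printf("original: %d bytes, sent: %d bytes, identical after decompression: %s\n",
    strlen($page), strlen($sent), $received === $page ? 'yes' : 'no');
```

Compression is lossless, so what the browser ends up displaying is exactly the page the server started with; only the bytes on the wire shrink.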

In this age of lightning-fast DSL lines, there's almost no noticeable difference. However, if you have a site that hosts large files and whose audience is mostly dialup users, it might be something to look into.

Make a new folder called "compress". Create your htaccess file again, just as before, but set the extensions to include .htm, .html, and .txt. (The group name, folder name, and script name have nothing to do with one another; you can name any of them whatever you like. I just like things to match.)

I'm calling the wrapper script for this "compress.php", so the htaccess file should look as follows:

AddHandler compress .html
AddHandler compress .htm
AddHandler compress .txt

Action compress /compress/compress.php

If our wrapper were simply going to pass through the file (in other words, just read its contents into a variable and display it), the handler script would need nothing more than the $file line from header.php plus code to read that file and echo it back.
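A minimal sketch of such a pass-through handler, assuming (as header.php does) that the requested file's path arrives in PATH_TRANSLATED. The name pass_through() is hypothetical, and the command-line guard at the bottom just keeps the script quiet when it isn't running under a web server.

```php
<?php
// Pass-through wrapper: read the requested file into a variable and
// display it unchanged. Assumes the path arrives in PATH_TRANSLATED.

function pass_through(string $path): bool
{
    // Missing or unreadable file: let the caller send a 404.
    if ($path === '' || !is_file($path) || !is_readable($path)) {
        return false;
    }
    $contents = file_get_contents($path);
    if ($contents === false) {
        return false;
    }
    echo $contents;
    return true;
}

// Only act when running under a web server (skipped on the command line).
if (PHP_SAPI !== 'cli') {
    if (!pass_through($_SERVER["PATH_TRANSLATED"] ?? '')) {
        http_response_code(404);
        echo "Not found";
    }
}
```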

"GIFT WRAPPING" YOUR OUTPUT

To make the HTTP compression work, we use two functions: ob_start() and ob_gzhandler(). Output buffering functions are strange. Normally, anything you echo is sent to the browser right away; with output buffering, PHP saves up everything you're trying to output instead. At the very end, it's all passed to a callback function of your choosing, where the text can be changed or transformed before it's sent.
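Here's a small sketch of that mechanism using a callback of our own in place of ob_gzhandler(); add_footer() and render_with_footer() are hypothetical names. Everything echoed while the inner buffer is open is handed to add_footer() when the buffer is flushed, and only its return value comes out the other end. The outer buffer is there just so the result can be captured as a string.

```php
<?php
// Output buffering with a custom callback. PHP collects everything echoed
// after ob_start() and passes it to the callback when the buffer is flushed.

function add_footer(string $buffer): string
{
    // Transform the collected output before it is sent.
    return $buffer . "<p>This is my footer</p>\n";
}

function render_with_footer(string $body): string
{
    ob_start();               // outer buffer, only so we can capture the result
    ob_start('add_footer');   // inner buffer: its output goes through add_footer()
    echo $body;
    ob_end_flush();           // runs add_footer() and passes its result outward
    return ob_get_clean();    // collect what would have been sent
}

echo render_with_footer("<p>Hello</p>\n");
```

Calling render_with_footer("<p>Hello</p>\n") returns the paragraph followed by the footer line. Swap add_footer for "ob_gzhandler" and the same plumbing compresses the output instead of decorating it.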

There is a built-in PHP function called ob_gzhandler() which, used as an output buffering callback, receives the buffered output, compresses it (with gzip or deflate, whichever the browser accepts) and sends the Content-Encoding header that tells the user's browser the data needs to be decompressed once it's downloaded. When this line is used:

ob_start("ob_gzhandler");

It tells PHP: everything output from here on has to go through the function ob_gzhandler() first. Put that line at the top of the pass-through script, before anything is output, and that's the whole wrapper.
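A sketch of what compress.php could look like, assuming (as header.php does) that the requested file's path is in PATH_TRANSLATED. The helper name serve_compressed() is hypothetical, and the ob_start() call sits inside it rather than on the very first line; either way it runs before anything is output, which is what matters.

```php
<?php
// compress.php (sketch): a pass-through wrapper with ob_gzhandler() on top.
// Assumes the requested file's path arrives in PATH_TRANSLATED.

function serve_compressed(string $path): bool
{
    if ($path === '' || !is_file($path)) {
        return false;
    }
    // From here on, output is compressed only if the browser can handle it.
    ob_start("ob_gzhandler");
    readfile($path);
    ob_end_flush();
    return true;
}

// Only act when running under a web server (skipped on the command line).
if (PHP_SAPI !== 'cli') {
    if (!serve_compressed($_SERVER["PATH_TRANSLATED"] ?? '')) {
        http_response_code(404);
        echo "Not found";
    }
}
```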

Save that as compress.php. Upload both files, chmod htaccess.txt to 0755, rename it to .htaccess, and you're done. That's all you need for it to work, and you can just as easily apply HTTP compression to any PHP script by adding that one line at the top.

To try this puppy out, I got on a dialup connection and put a copy of "The Decline And Fall Of The Roman Empire Volume 1", a 900-page book about 1.6 megabytes in size, on my web host. Without HTTP compression it took five and a half minutes to download; with compression, only two. Internet Explorer reported a download speed of 20 KB per second, impossible on a dialup connection... but since the file was compressed, I really was receiving 20 KB a second (once the data was decompressed on my end) over a 5 KB per second connection.

Though HTTP compression will work on sounds, video, and images, the space you save is negligible, often just a few bytes and sometimes nothing at all. These sorts of media are already heavily compressed, so compressing them again makes almost no difference. That's why we've told htaccess to use compression only on text and HTML: human-language text like English is full of repetition, and repetition is exactly what compression squeezes out.
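A quick way to check this claim yourself is to gzip some repetitive English text and the same number of random bytes. The random bytes stand in for already-compressed media such as JPEGs (an assumption for illustration; real files vary). gzencode() and random_bytes() (PHP 7+) are standard functions; compressed_ratio() is a hypothetical helper.

```php
<?php
// Compare how well gzip shrinks repetitive text versus random bytes.
// Random bytes stand in for already-compressed media (JPEG, MP3, ...).

function compressed_ratio(string $data): float
{
    return strlen(gzencode($data, 9)) / strlen($data);
}

$text = str_repeat("It was the best of times, it was the worst of times. ", 400);
$noise = random_bytes(strlen($text));

printf("text:  %.1f%% of original size\n", 100 * compressed_ratio($text));
printf("noise: %.1f%% of original size\n", 100 * compressed_ratio($noise));
```

The text shrinks to a tiny fraction of its size, while the random data comes out no smaller (usually a few bytes larger, because of the gzip header and trailer).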

Not all browsers support HTTP compression, but ob_gzhandler() checks the Accept-Encoding header the browser sends to find out whether it does. If it doesn't, the original, uncompressed output is sent, no harm done.
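ob_gzhandler() does this negotiation for you. Purely to illustrate what "checks the Accept-Encoding header" involves, here's a simplified sketch of such a check. accepts_gzip() is a hypothetical helper, not PHP's actual implementation; it honors quality values ("q=0" means "no") and the "*" wildcard, with an explicit gzip entry taking precedence over the wildcard.

```php
<?php
// Simplified check of the browser's Accept-Encoding request header.
// Returns true if gzip is acceptable (explicitly, or via "*").

function accepts_gzip(string $acceptEncoding): bool
{
    $gzipQ = null; // quality for an explicit gzip entry, if any
    $starQ = null; // quality for the "*" wildcard, if any

    foreach (explode(',', $acceptEncoding) as $item) {
        $parts = explode(';', trim($item));
        $coding = strtolower(trim($parts[0]));
        $q = 1.0; // default quality when no q= parameter is given
        foreach (array_slice($parts, 1) as $param) {
            $kv = explode('=', trim($param), 2);
            if (count($kv) === 2 && strtolower(trim($kv[0])) === 'q') {
                $q = (float) trim($kv[1]);
            }
        }
        if ($coding === 'gzip' || $coding === 'x-gzip') {
            $gzipQ = max($gzipQ ?? 0.0, $q);
        } elseif ($coding === '*') {
            $starQ = $q;
        }
    }
    // An explicit gzip entry wins over the wildcard.
    return ($gzipQ ?? $starQ ?? 0.0) > 0;
}

$header = $_SERVER['HTTP_ACCEPT_ENCODING'] ?? '';
echo accepts_gzip($header) ? "send gzip\n" : "send plain\n";
```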

You can get a copy of this sample script at: http://www.jumpx.com/tutorials/wrapper/compress.zip

Both of the scripts I've created for you will work only on static files, files that actually exist on disk, such as HTML or text files. If you applied these wrappers as-is to PHP scripts, Perl scripts, or even HTML pages that use SSI, those pages wouldn't run: the wrapper would send their raw source code to the visitor instead of their output. If your whole site is run by a single script, it's a better idea to hard-code these things right in, anyway.

THE BEST THING SINCE BUBBLE WRAP

This last demonstration of an htaccess wrapper is something that I think most people with content sites have a use for. On the Internet, people steal stuff. Theft of HTML source code is a nuisance, sure, but the lifting of images is more common. Someone likes a logo on your page, or an e-book cover, or a picture of a physical product you're selling, and it becomes theirs to use.

A practical way to keep this from happening is to add a watermark to all your images: your logo or name in a corner somewhere, forcing anyone who takes your graphic to either give you credit unwillingly or chop off part of the picture.

Lucky for us, PHP has a set of image functions (the GD library), and since version 4.3 a copy of GD has been bundled with PHP. Wrappers come in handy here because you might have an entire site full of images and would rather not spend three weeks watermarking them by hand. Or maybe you just don't want to juggle two sets of images, one watermarked and one normal.

Download this script from: http://www.jumpx.com/tutorials/wrapper/watermark.zip

The only files you need to worry about in that zip are htaccess.txt and wrapper.php. Upload them to a folder called "watermark", chmod htaccess.txt to 0755 and rename to ".htaccess".

The file wrapper.php can stay as is. I've put comments in it explaining most of what it does, so if you're curious, go ahead and take a peek.

Here's what the script does: it figures out which original image was requested, then loads the watermark, which I've set in wrapper.php to "watermark.png", a PNG image containing the text "THIS IS WATER MARKED". The watermark is placed on top of the original in the lower right corner, and the result is output in the same format as the original (e.g., a JPEG stays a JPEG).
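The downloadable wrapper.php isn't reproduced here, so this is an independent sketch of the same idea using PHP's GD functions: load the original, load watermark.png, copy the watermark into the lower-right corner with imagecopy(), and send the result back. The 10-pixel margin and the function names watermark_position() and watermark_image() are assumptions for illustration. It needs the GD extension and PHP 7.1 or later.

```php
<?php
// Sketch of a lower-right-corner watermark with GD. Not the original
// wrapper.php; watermark_position() and watermark_image() are hypothetical.

// Top-left coordinates that place a $wmW x $wmH watermark $margin pixels in
// from the bottom-right corner of a $imgW x $imgH image (never below 0).
function watermark_position(int $imgW, int $imgH, int $wmW, int $wmH, int $margin = 10): array
{
    return [max(0, $imgW - $wmW - $margin), max(0, $imgH - $wmH - $margin)];
}

// Load the original image, stamp the PNG watermark on it, and return the
// GD image (or false if either image can't be loaded).
function watermark_image(string $imagePath, string $watermarkPath)
{
    if (!is_file($imagePath) || !is_file($watermarkPath)) {
        return false;
    }
    $info = getimagesize($imagePath);
    if ($info === false) {
        return false;
    }
    switch ($info[2]) {
        case IMAGETYPE_JPEG: $img = imagecreatefromjpeg($imagePath); break;
        case IMAGETYPE_PNG:  $img = imagecreatefrompng($imagePath);  break;
        case IMAGETYPE_GIF:  $img = imagecreatefromgif($imagePath);  break;
        default: return false;
    }
    $mark = imagecreatefrompng($watermarkPath);
    if ($img === false || $mark === false) {
        return false;
    }
    $w = imagesx($mark);
    $h = imagesy($mark);
    [$x, $y] = watermark_position(imagesx($img), imagesy($img), $w, $h);
    imagecopy($img, $mark, $x, $y, 0, 0, $w, $h); // whole watermark at ($x, $y)
    return $img;
}

// Serve the requested image (skipped on the command line).
if (PHP_SAPI !== 'cli') {
    $path = $_SERVER['PATH_TRANSLATED'] ?? '';
    $img = watermark_image($path, __DIR__ . '/watermark.png');
    if ($img === false) {
        http_response_code(404);
        exit;
    }
    // JPEGs go back as JPEG; PNGs and GIFs go back as PNG, as in the article.
    if (getimagesize($path)[2] === IMAGETYPE_JPEG) {
        header('Content-Type: image/jpeg');
        imagejpeg($img, null, 90);
    } else {
        header('Content-Type: image/png');
        imagepng($img);
    }
}
```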

You can tell the difference by looking at these two images:

http://www.jumpx.com/tutorials/wrapper/thomas.jpg

http://www.jumpx.com/tutorials/wrapper/thomas-watermarked.jpg

I've included several types of images (GIFs, JPGs, and PNGs) in the zip file for you to test out. Once you've got everything set up, upload those images and see how they look with the watermark.

This script will work with GIFs, JPEGs, and PNGs. Due to a patent issue (the Unisys LZW patent, which expired worldwide in July 2004), the GD library could read GIFs but not write them. To make up for this, any of your GIFs will be output as PNGs, which browsers display just as well.
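Now that the patent has expired, current GD builds can write GIFs again. Here's a sketch of choosing the output format at runtime instead of always turning GIFs into PNGs. output_format() is a hypothetical helper; gd_info() is GD's own report of its capabilities, and its "GIF Create Support" entry says whether this PHP can write GIFs.

```php
<?php
// Pick an output format for a watermarked image. GIFs are written back as
// GIF only when this GD build can create them; otherwise they become PNG.

function output_format(int $imageType, bool $canWriteGif): string
{
    switch ($imageType) {
        case IMAGETYPE_JPEG:
            return 'jpeg';
        case IMAGETYPE_GIF:
            return $canWriteGif ? 'gif' : 'png';
        default:
            return 'png';
    }
}

// gd_info() only exists when the GD extension is loaded.
$canWriteGif = extension_loaded('gd') && !empty(gd_info()['GIF Create Support']);
echo output_format(IMAGETYPE_GIF, $canWriteGif), "\n";
```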

THE WRAP-UP

If you think about it, a watermark script like this could be used for a number of things. For example, if you decide to run an image hosting service like AuctionWatch does for eBay users, you could watermark your site's URL along the bottom. Your users get a free service and everyone else sees where to get free image hosting; that's some nice viral promotion right there.

You could also adapt the script to check the HTTP referer (in the variable $_SERVER["HTTP_REFERER"]) to see whether the image was requested from another site. If it was, the script would add the watermark; if the image was loaded from a page on your own site, it would be shown without one.
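A sketch of that referer check, assuming your own host name is passed in ("your.host" here, borrowed from the earlier example; is_offsite() is a hypothetical helper). Browsers don't always send a Referer header and it can be forged, so this only deters casual hotlinking. This sketch treats a missing referer as on-site so that people who open the image directly see the clean version; that's a design choice, not a rule.

```php
<?php
// Decide whether an image request came from another site, based on the
// HTTP Referer header. A missing referer is treated as local (a choice).

function is_offsite(?string $referer, string $ownHost): bool
{
    if ($referer === null || $referer === '') {
        return false;
    }
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return true; // unparseable referer: treat it as foreign
    }
    $host = strtolower($host);
    $ownHost = strtolower($ownHost);
    $suffix = '.' . $ownHost;
    // Accept the site itself and its subdomains (www.your.host, ...).
    return $host !== $ownHost && substr($host, -strlen($suffix)) !== $suffix;
}

$offsite = is_offsite($_SERVER['HTTP_REFERER'] ?? null, 'your.host');
echo $offsite ? "add watermark\n" : "serve clean image\n";
```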

Even I have put wrappers to good use. Last year I wrote a product for Teresa King called Codewarden, which uses htaccess wrappers to display all the files of a directory in an encoded JavaScript string in an effort to hide HTML source.
