This post originated from an RSS feed registered with Ruby Buzz
by Guy Naor.
Original Post: Smart Caching and mod_rewrite Black Magic
Feed Title: Famundo - The Dev Blog
Feed URL: http://devblog.famundo.com/xml/rss/feed.xml
Feed Description: A blog describing the development and related technologies involved in creating famundo.com - a family management system written using Ruby On Rails and postgres
To serve large amounts of data, especially pictures and similar files, I needed them to be served by the web server directly. Web servers are so well optimized for serving this kind of file that anything developed in Rails (or any other framework/language) will always be slower.
But there is a problem with serving those files in a system requiring authentication. Putting them anywhere the server can get at will make them available to anyone who knows the URL - not good at all. Using caching (like the one built into Rails) is possible, but it becomes impractical for a large number of files and requires cache management in the application.
So I decided to handle the different access modes for famundo in two separate ways. The private files requiring authentication for access will be delivered directly from the controllers of famundo, using send_file(). The public files will be served directly by the web server.
To make sure I don't have to copy and duplicate every file I want to share, I just create symlinks in the public area that point to the files in the private area. A note here: NEVER EVER point to a directory this way - only to files. If you point to a directory you open yourself up to directory traversal attacks.
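A minimal Ruby sketch of that rule, assuming sharing is done from application code. The helper names `share_file` and `unshare_file` are hypothetical and not taken from the famundo code base; the point is only that the link target is checked to be a regular file before the symlink is created.

```ruby
require "fileutils"

# Hypothetical helpers; names and error handling are illustrative only.

# Share a private file by creating a symlink to it in the public tree.
# Anything that is not a regular file (a directory, another symlink) is
# refused, so a directory is never exposed through the web server.
def share_file(private_path, public_path)
  unless File.file?(private_path) && !File.symlink?(private_path)
    raise ArgumentError, "refusing to share non-regular file: #{private_path}"
  end
  FileUtils.mkdir_p(File.dirname(public_path))
  File.symlink(File.expand_path(private_path), public_path)
  public_path
end

# Unshare by deleting the symlink only; the private file is left untouched.
def unshare_file(public_path)
  File.delete(public_path) if File.symlink?(public_path)
end
```

Checking `File.symlink?` on the source as well avoids following a link that might itself lead to a directory.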
So for a simple system, this will be it. But for a system hosting a large number of distinct users, each with his/her own data, some more changes are needed. Each family hosted in famundo will have its own files and directories, and we need to keep them separate. And because we don't want to keep the files under their original file names, but only under some internal ID (which keeps management MUCH easier and obviates the need for file name sanitization), we need an easy way to get at the file but still return the correct name to the browser. Enter mod_rewrite. Please note that I'm describing lighttpd's mod_rewrite and not Apache's. A similar thing can be done with Apache, but the syntax is different.
For this example, we will have a family hosted as smith3.famundo.com. The files we publish for this family will be stored internally at /data/private_data/families/s/m/smith3/files/ (Side note: the s and m before the full family name serve to distribute the families over multiple directories - I can explain the reasoning if there's interest). Files are also distributed using part of the file ID. The file my_amazing_picture.png, which has ID 835 in the database, will be stored under /3/5/835.png. So the full path to the file will be /data/private_data/families/s/m/smith3/files/3/5/835.png. In the public directory of Rails, we add a directory structure for the files:
public/family_files/s/m/smith3/files, and in it we store the symlinks to the private files. So our famous picture will be a symlink:
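A minimal Ruby sketch of this layout, under stated assumptions: the helper names are hypothetical, the two directory levels are taken to be the last two digits of the zero-padded ID (which is what the 835 example suggests), and the public tree is assumed to mirror the same 3/5 sub-directories below `files`.

```ruby
# Roots taken from the example above; a real app would configure its own.
PRIVATE_ROOT = "/data/private_data/families"
PUBLIC_ROOT  = "public/family_files"

# "smith3" -> "s/m/smith3": the first two characters spread the families
# over multiple directories.
def family_dir(family)
  raise ArgumentError, "family name needs at least 2 characters" if family.length < 2
  File.join(family[0], family[1], family)
end

# 835, ".png" -> "3/5/835.png". ASSUMPTION: the last two digits of the
# zero-padded ID select the two directory levels.
def file_rel_path(id, ext)
  name = format("%02d", id)
  File.join(name[-2], name[-1], name + ext)
end

def private_path(family, id, ext)
  File.join(PRIVATE_ROOT, family_dir(family), "files", file_rel_path(id, ext))
end

# ASSUMPTION: the public tree repeats the digit directories of the private one.
def public_path(family, id, ext)
  File.join(PUBLIC_ROOT, family_dir(family), "files", file_rel_path(id, ext))
end

puts private_path("smith3", 835, ".png")
# /data/private_data/families/s/m/smith3/files/3/5/835.png
puts public_path("smith3", 835, ".png")
# public/family_files/s/m/smith3/files/3/5/835.png
```

The symlink itself would then be something like `File.symlink(private_path("smith3", 835, ".png"), public_path("smith3", 835, ".png"))`, once the parent directories exist.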
But now we want a nice URL. We don't want to point every picture at this long ugly URL. mod_rewrite to the rescue! First things first, enable mod_rewrite in the lighttpd config. Then add the following "black-magic" entry to the config file:
A nice side effect of this trick is that we can give the file its true name in the URL. We don't even need to sanitize it, because we discard it when we retrieve the file. We retrieve the ID-based name, but to the browser the file appears correctly named.
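To make the mapping concrete, here is a rough Ruby sketch of the kind of translation such a rewrite rule performs. It is not the lighttpd rule itself, and the public URL shape used here (`/files/<id>/<any name>.<ext>` on the family's host) is a hypothetical choice made for illustration, as is the function name `rewrite`.

```ruby
# ASSUMED public URL shape: http://<family>.famundo.com/files/<id>/<name>.<ext>
# Captures: whole family name, then its first and second characters.
HOST_RE = /\A((\w)(\w)\w*)\.famundo\.com\z/
# Captures: whole ID, its last two digits, and the extension. The file name
# between the ID and the extension is matched but never captured.
PATH_RE = %r{\A/files/(\d*(\d)(\d))/[^/]+(\.\w+)\z}

# Returns the path below the Rails public directory, or nil if the request
# does not fit the pattern.
def rewrite(host, path)
  host_match = HOST_RE.match(host)
  path_match = PATH_RE.match(path)
  return nil unless host_match && path_match

  family, c1, c2 = host_match[1], host_match[2], host_match[3]
  id, d1, d2, ext = path_match[1], path_match[2], path_match[3], path_match[4]
  "/family_files/#{c1}/#{c2}/#{family}/files/#{d1}/#{d2}/#{id}#{ext}"
end

puts rewrite("smith3.famundo.com", "/files/835/my_amazing_picture.png")
# /family_files/s/m/smith3/files/3/5/835.png
```

Because the human-readable name is only matched and then thrown away, whatever appears in that position of the URL never reaches the file system.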
Some things to keep in mind:
The family name needs to be at least 2 characters long.
The file name based on the ID needs to be at least 2 digits long. You can always use sprintf "%02d", id for that.
Sharing a file just means creating a symlink in the published structure. Unsharing just means deleting that same symlink.
Different mappings can be created as long as the mapping can be represented in a regular expression.
A more advanced mapping can be created by using the Lua language together with the lighttpd config file.