This post originated from an RSS feed registered with Web Buzz
by Stuart Langridge.
Original Post: Anatomy of a simple web application
Feed Title: as days pass by
Feed URL: http://feeds.feedburner.com/kryogenix
Feed Description: scratched tallies on the prison wall
I thought I’d put together some notes on how the Tory poster generator actually works, as an example of how I write a small web application. While it is small, a reasonable amount of thought went into its design, and those design principles are extendable (in a handwavy sort of sense) to larger and more complex web apps.
Introduction to the project
What the Generator does, for those of you who haven’t tried it, is create an image of a poster, by asking the user for some text to go on said poster and then creating a downloadable PNG with that text, in a particular font, superimposed over an existing image. The font is one of a selection of “handwriting” fonts which were available for free download; the user may choose any of these fonts, or one will be randomly selected.
Too many users spoil the broth
The key point I had in mind with this app is that image generation is pretty intensive. Since I had an idea in my head that the Generator might turn out to be popular, I was worried that it would overwhelm the machine that it’s on. The obvious way to build a web app like this would be to have the image generation happen “in process”; that is, the same PHP page that takes the user’s text also generates the image before it sends its response.
The problem with that idea, where you generate the image in PHP code, is that the page hangs while you’re waiting for the image to be generated. Moreover, if a thousand people all request an image at once, then a thousand image generation PHP pages all run at once. That’s certain death to the server, unless you’ve got a pretty studly server, which I haven’t. So, time for a more efficient approach.
A better way
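Instead of doing the generation “in process”, I decided to do it “out of process”. So the bit the user sees and uses and the bit that actually generates the images would be entirely different processes. The image generation process would just run in the background on the server, generating one image at a time, and the front end would just hand “requests” for image generation to it, and then wait around until it was done. In outline: the front end takes the request (Stage 1), waits for it to be completed (Stage 2), and then displays the finished image (Stage 3), while the back end works through a queue of requests and puts each finished image into a storage area.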
Separating the processes makes each part quick and easy. The front end simply adds requests to the queue and then loads a “wait page”: this page refreshes itself every five seconds and, on load, checks whether its particular request has been completed, by looking for the completed image in the storage area. If it has, the wait page redirects to the final page, which shows the completed image.
Meanwhile, the back end, or server process, or daemon (call it what you will) checks every five seconds to see if there is a new request in the queue. If there is, it stops checking the queue and starts generating that request. When finished, it adds the completed image to the storage area and then resumes its every-five-seconds check.
This approach entirely solves the worry of too much traffic; if a thousand people all generate an image at once then a thousand image requests go into the queue (which is not intensive) and then the back end just processes them one by one. This leaves all the people near the end of our thousand waiting around a lot at Stage 2, but, critically, they’re not killing the server by doing so. Running a thousand image generating processes at once would probably leave everyone waiting nearly as long, and would, not incidentally, max out the server while it was doing it. Not good for anyone, that.
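The queue-and-daemon arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in posterd.py: it assumes the queue is a directory of JSON files and the storage area is a directory of PNG files, and the names `add_request`, `is_done`, `process_one`, `run_daemon` and the `generate` callback are all hypothetical. It also sleeps only when the queue is empty, which is one reading of the five-second check.

```python
import json
import os
import time
import uuid


def add_request(queue_dir, text, font):
    """Front end: drop a request file into the queue and return its id."""
    request_id = uuid.uuid4().hex
    payload = {"id": request_id, "text": text, "font": font}
    # Write under a temporary name, then rename, so the daemon never
    # picks up a half-written request.
    tmp_path = os.path.join(queue_dir, request_id + ".tmp")
    with open(tmp_path, "w") as handle:
        json.dump(payload, handle)
    os.rename(tmp_path, os.path.join(queue_dir, request_id + ".json"))
    return request_id


def is_done(storage_dir, request_id):
    """Wait page: a request is finished once its image exists in storage."""
    return os.path.exists(os.path.join(storage_dir, request_id + ".png"))


def next_request(queue_dir):
    """Return the path of the oldest queued request, or None if empty."""
    paths = [os.path.join(queue_dir, name)
             for name in os.listdir(queue_dir) if name.endswith(".json")]
    if not paths:
        return None
    return min(paths, key=os.path.getmtime)


def process_one(queue_dir, storage_dir, generate):
    """Daemon: handle at most one request; return True if one was handled."""
    path = next_request(queue_dir)
    if path is None:
        return False
    with open(path) as handle:
        request = json.load(handle)
    # `generate` is a stand-in for the real image code; it returns PNG bytes.
    image_bytes = generate(request["text"], request["font"])
    final_path = os.path.join(storage_dir, request["id"] + ".png")
    with open(final_path + ".part", "wb") as out:
        out.write(image_bytes)
    # Rename last, so the wait page only ever sees complete images.
    os.rename(final_path + ".part", final_path)
    os.remove(path)
    return True


def run_daemon(queue_dir, storage_dir, generate, poll_seconds=5):
    """Daemon main loop: one image at a time, sleeping only when idle."""
    while True:
        if not process_one(queue_dir, storage_dir, generate):
            time.sleep(poll_seconds)
```

Because only `process_one` ever calls `generate`, and the loop calls it once per pass, no more than one image is being generated at any moment, however many requests are sitting in the queue.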
Actually building it
So, two separate processes: one web-based, one a background daemon. The web-based process is a very simple sequence of web pages; a good solution here would be PHP. The server process needs to have good image-creation capabilities; use whatever language you feel most comfortable writing real programs in. I chose Python, which regular readers will not be surprised to hear, and the Python Imaging Library.
The back end
The back end daemon’s simple operation is described above; I’m not going to go into much detail about how it actually uses PIL and some TrueType fonts to write the requested details onto the poster image. You can browse the source for the daemon file posterd.py if you’re interested in that.
The front end
The front end is composed of three stages, as shown in the diagram; the stages correspond to three PHP files: index.php, waitpage.php, and display.php. The design is conceptually like that shown in the diagram, but in implementation it ended up slightly different, because of a technique I tend to use for multi-page processes on the web. Imagine the simplest of these processes: ask the user for some data in a form, save the data in a database, and then display a thank-you message. Some people would have a two-page process as follows: page1 contains the form, which submits to page2; page2 saves the data and then displays the message. I don’t do it like that; I have the form on page1 submit to page1, and page1 is structured as follows:
    if data has been submitted
        save the data in the database
        redirect to page 2
        end page
    end if
    display the form
So, in actuality, the “add a request for image generation” part is done in page1, and page2 doesn’t do anything but go round and round the refresh loop until the daemon has completed the request.
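As an illustration of that structure, here is a minimal Python sketch of the first two pages. It is not the Generator’s PHP: the function names, the `enqueue` and `is_done` callbacks, the `id` query parameter, and the choice of status codes are all assumptions made for the example. Each function returns a `(status, headers, body)` tuple rather than talking to a real web server.

```python
FORM_HTML = (
    '<form method="post" action="index.php">'
    '<input type="text" name="text">'
    '<input type="submit" value="Make poster">'
    "</form>"
)

WAIT_HTML = (
    # The meta tag asks the browser to reload this page every five seconds.
    '<html><head><meta http-equiv="refresh" content="5"></head>'
    "<body><p>Generating your poster...</p></body></html>"
)


def index_page(method, form, enqueue):
    """Stage 1: the form submits back to this same page.

    `enqueue` is a hypothetical callable that adds a request to the
    queue and returns an id for it.
    """
    if method == "POST" and form.get("text", "").strip():
        # Data has been submitted: save it, redirect, and end the page.
        request_id = enqueue(form["text"].strip(), form.get("font"))
        return 303, {"Location": "waitpage.php?id=" + request_id}, ""
    # Nothing submitted (or nothing usable): display the form.
    return 200, {"Content-Type": "text/html"}, FORM_HTML


def wait_page(request_id, is_done):
    """Stage 2: go round the refresh loop until the image exists.

    `is_done` is a hypothetical callable that checks the storage area.
    """
    if is_done(request_id):
        return 302, {"Location": "display.php?id=" + request_id}, ""
    return 200, {"Content-Type": "text/html"}, WAIT_HTML
```

One practical effect of redirecting after the save is that reloading the wait page repeats a plain GET, so the refresh loop never resubmits the form and never adds a second request to the queue.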
Smarty
Smarty made this process really, really easy. The way it works, for those of you not familiar with it, maps very neatly onto simple projects like this, because it helps the pages separate out. The actual PHP page that the user visits (take index.php, Stage 1, as an example) just contains page logic. It doesn’t contain any HTML at all. This means that, basically, it looks exactly like the pseudocode I outlined above; it’s about ten lines of code. The “display the form” bit actually reads $smarty->display('frontpage.tpl');, which picks up frontpage.tpl, a plain HTML file, and displays it. This means that your HTML template files, *.tpl, look like HTML, and aren’t cluttered with code. Meanwhile, your PHP files look like PHP, and aren’t cluttered up with HTML. This separation is fantastic. On more complicated projects it’s less easy, because you end up having to create little blocks of HTML in the PHP code ready to be substituted into the template, and that’s not good, but for a simple project like this it was a real boon to use.
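The same separation of page logic from markup can be shown in Python, using the standard library’s `string.Template` as a stand-in for Smarty. This is an analogy rather than Smarty’s API: the template text, the font names, and the `render_frontpage` function are all invented for the example, and the random font choice is handled the way index.php is said to do it, by picking a font in the logic and substituting it into the template.

```python
import random
from string import Template

# The "template file": plain HTML with one placeholder, no logic.
FRONTPAGE_TEMPLATE = Template("""<html>
<body>
  <form method="post" action="index.php">
    <input type="text" name="text">
    <input type="hidden" name="font" value="$font">
    <input type="submit" value="Make poster">
  </form>
</body>
</html>""")

# Hypothetical font names, standing in for the real handwriting fonts.
FONTS = ["font-a", "font-b", "font-c"]


def render_frontpage(font=None):
    """The "page logic": choose the values, then hand them to the template."""
    chosen = font if font is not None else random.choice(FONTS)
    return FRONTPAGE_TEMPLATE.substitute(font=chosen)
```

Calling `render_frontpage(font="font-b")` returns the HTML with `value="font-b"` filled in; calling it with no argument picks one of the listed fonts at random.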
Conclusions
There’s lots more I could write about this, like how I make sure the daemon stays running, and how I clear out old generated images, and how index.php randomly picks a font and substitutes it into the template, and how everything to do with the queue (adding new requests, seeing if a request is completed, ad nauseam) is separated out into a small library that other PHP code (in index.php, waitpage.php, etc.) can call, but I think I’ll stop here.
The point is that tiny projects like my poster creator give you a chance to try out these techniques. It’s all too common to think that, well, this is a quick knock-off application, so I won’t bother to apply technique to it, I’ll just code from the hip without structure. But then when you come to do a real proper project, applying the techniques is complicated and awful and so you don’t do it then either, because you’re not really sure how. Hone your skills and your approach by doing all this complex stuff on projects that don’t really need it (for example, the Generator has a spec, albeit a pretty simple one), so that when you do need it you’ll be comfortable and confident with the techniques.