Weblogs Forum - Automatically Synchronize Your Site

7 replies on 1 page. Most recent reply: Apr 27, 2007 5:36 PM by Bruce Eckel

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

Automatically Synchronize Your Site

Posted: Apr 27, 2007 5:36 PM

Summary
The most interesting part of this is not the tool -- although that was fun to create -- but the process of letting go that allowed me to build it.

Backstory: I started using Zope many years ago. I've slowly discovered that Zope is great, amazing, spectacular, as long as you live and breathe Zope (it's especially good for big, industrial-strength, powerful customized sites). But Zope pretty much reinvents the world, so if a web site is something that you do casually on the side, the Zope world is going to look different than your normal world. So you forget the Zope world every time, and either have to relearn it when you come back, or end up just using the bare bones basics of Zope. That's what I did.

As an example, consider the most trivial and basic thing you could want to do: take the fields in an input form and store them. That is very nontrivial in Zope. Using raw Python, PHP, or Perl on your server, it's fairly trivial.

So I've been slowly trying to shift away from Zope into something clean, basic, and straightforward. I've been moving towards the Unix philosophy: instead of one, big, do-all tool (Zope), piece things together using a lot of little tools. So: CSS, HTML, PHP (but only simple PHP for simple things), standalone Python or a framework like TurboGears (for more complex things). Cheap server (GoDaddy, in my case) from a service that's big enough I don't have to worry about uptime and maintenance issues (and so my system can easily be transferred to customers with low budgets). And very simple development tools.

Perhaps they've gotten better, but I'm still smarting horribly from my initial experience with web page development tools. There turned out to be no way they could isolate you from the underlying code, so you ended up hacking on horrible HTML. After looking in vain for a tool that would generate readable (that is, editable) HTML, I reverted to hand-coding everything, just so it was maintainable. I've been fairly happy with that decision.

CSS, although it is far from living up to its promise, was a big aid in this direction. They wanted to allow you to annotate your HTML with styles, defined in a single place so they could be easily changed. Noble and desirable, so probably the stupidest thing in CSS is that you can't actually create your own styles, as in <MyStyle>, but that you must use the hacky and verbose class="MyStyle" attribute on an existing tag, which was a sad and costly compromise. And of course there's the fact that CSS doesn't behave the same on all platforms, something else it promised to do. Despite all that, it's a big improvement. And the pages seem to come up a lot faster, which makes sense because pages are now primarily content and have much less markup; the styling lives in a separate stylesheet (which can be cached), thus there's a lot less to move across the wire.

So now my pages are relatively readable and so not very hard to write and maintain by hand (which is something that Zope also provided). Thus I am able to move away from Zope and as I do so things actually seem to get better in a number of places (I know, Zope can also do CSS just fine. My problem isn't that, it's the programming issues).

The point of this article is another feature of Zope that I "couldn't live without": through-the-web editing. Zope allows you to create and edit objects from any machine, using only a web browser. I've actually used this feature a handful of times while traveling, via internet cafes. But hardly ever. Nonetheless, I was convinced I needed it and that's what I found fascinating. I didn't think about it very hard before making this decision, because if I had I would have questioned the value of it.

Through-the-web editing is so amazingly primitive that I have often found myself copying the contents of a page into UltraEdit, performing more sophisticated edits there, and pasting it back in. That should have clued me in, but it took a while.

I searched for through-the-web editing tools that might be installed on GoDaddy. These are a bit vague -- the descriptions I read did not make it clear exactly what they did, but the most promising open-source one appeared to be the unfortunately-but-unforgettably-named "fckeditor." I didn't get around to trying it before I had my epiphany.

Which was this: I just want to edit files, using powerful editing tools (including Python tools I might write myself), and just have the results magically appear on the server. Through-the-web editing was holding me back, a lot. And as happens lately when I have one of these epiphanies, I always wonder what else I'm stuck on that's holding me back.

Anyway, here's my solution. This program watches the entire subtree from where it's started, and anytime a file becomes newer than the last time the program recorded the modification date, that file is copied to the server. If you add new directories, those are automatically created on the server. The program repeats every second (printing '.' when there's nothing to do, so you know it's alive), which means you get reasonably responsive results.

The directory and file information is stored in a text file using Python's pprint (pretty-printing), so the file is both readable and editable. This information is stored as the text representation of a tuple of a list and a dictionary; to recover this information all I do is eval the contents of that file, which happens on program startup. Every time the directory is successfully updated to the server, the information is stored to disk.
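
As a standalone illustration of that storage format, here is a minimal sketch of the round trip in Python 3 syntax (the program below is Python 2). The names save_state and load_state are hypothetical, and ast.literal_eval is swapped in for the eval described above; it is not what the program itself uses, but it reads the same pprint output while accepting only literals.

```python
import ast
import os
import pprint
import tempfile

def save_state(path, dirs, files):
    """Write (dirs, files) as a pretty-printed Python literal."""
    with open(path, "w") as f:
        f.write(pprint.pformat((dirs, files)))

def load_state(path):
    """Return (dirs, files) from path, or empty state if the file is missing."""
    if not os.path.exists(path):
        return [], {}
    with open(path) as f:
        # literal_eval only accepts literals (lists, dicts, strings, numbers),
        # so a hand-edited data file cannot run arbitrary code on startup.
        dirs, files = ast.literal_eval(f.read())
    return dirs, files

if __name__ == "__main__":
    # Hypothetical sample data: one directory and one file with its mtime.
    state_path = os.path.join(tempfile.mkdtemp(), "updateSiteData.txt")
    save_state(state_path, ["./images"], {"./index.html": 1177700000.0})
    print(load_state(state_path))
```

Because the file holds an ordinary Python literal, it stays readable and editable by hand, which is the property the text format was chosen for.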


"""
updateSite.py

by Bruce Eckel, www.MindView.net

Mirror (every repeatInterval seconds) the current directory
hierarchy onto a web site using ftp. Only updates files that
have changed since the last update. Does NOT compare this
hierarchy with the destination web site. If something has
changed since last time, it's pushed onto the server.

Place this program at the root of your web site tree on your
computer, then just double-click it to run it. It will run in
the background until you kill the window.

"""
repeatInterval = 1.0 # seconds
fileData = "updateSiteData.txt"
excludeFiles = (fileData, "updateSiteConfig.txt")

site, user, password = eval(file("updateSiteConfig.txt").read())

import os, ftplib, traceback, threading, pprint

class SiteUpdater(object):
    def __init__(self):
        # Read in old file data:
        if os.path.exists(fileData):
            self.oldDirs, self.oldFiles = eval(file(fileData).read())
        else:
            self.oldDirs = []
            self.oldFiles = {}
        self.__connect()
        self.updateSite()

    def __connect(self):
        self.ftp = ftplib.FTP()
        self.ftp.connect(site, port=21)
        self.ftp.login(user, password)
        print "Connected"

    def __del__(self):
        self.ftp.close()

    @staticmethod
    def getTree():
        changedFiles = {}
        newDirs = []
        for dir, subdirs, files in os.walk("."):
            newDirs += [os.path.join(dir, d) for d in subdirs]
            for path in [os.path.join(dir, f) for f in files if f not in excludeFiles]:
                changedFiles[path] = os.path.getmtime(path)
        return (newDirs, changedFiles)

    @staticmethod
    def __flush():
        """
        Force all files and dirs to be up to date
        """
        file(fileData, 'w').write(pprint.pformat(SiteUpdater.getTree()))

    def getUpdateLists(self, newDirs, changedFiles):
        dirUpdates = [path for path in newDirs if path not in self.oldDirs]
        fileUpdates = []
        for path in changedFiles:
            if path in self.oldFiles and self.oldFiles[path] == changedFiles[path]:
                continue
            fileUpdates.append(path)

        # Sorting dirUpdates produces short directories first:
        return sorted(dirUpdates), sorted(fileUpdates)

    def updateSite(self):
        newDirs, changedFiles = self.getTree()
        dirUpdates, fileUpdates = self.getUpdateLists(newDirs, changedFiles)

        if not fileUpdates and not dirUpdates:
            print ".",
        else:
            print
            try:
                for d in dirUpdates: # Create new directories
                    dir = d[1:].replace('\\', '/') # Strip leading '.'
                    self.ftp.mkd(dir)
                    print "created", dir
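                for f in fileUpdates:
                    dir, name = os.path.split(f)
                    # Strip leading '.'. Top-level files yield '', which
                    # ftplib sends as a no-op 'CWD .', leaving us in the
                    # previous file's directory, so fall back to '/':
                    dir = dir[1:].replace('\\', '/') or '/'
                    self.ftp.cwd(dir)
                    self.ftp.storbinary('STOR ' + name, file(f, "rb"))
                    print "updated", f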
                # Only store the updates if we get through the whole list:
                file(fileData, 'w').write(pprint.pformat((newDirs, changedFiles)))
                self.oldDirs = newDirs[:]
                self.oldFiles = changedFiles.copy()
            except:
                traceback.print_exc()
                # Close and restart if there are any ftp problems:
                print "Session timed out; reconnecting"
                self.ftp.close()
                self.ftp = None
                self.__connect()
        threading.Timer(repeatInterval, self.updateSite).start()

if __name__=="__main__":
    try:
        SiteUpdater()
    except:
        traceback.print_exc()
        raw_input("Press Enter...")

Most ftp servers will disconnect after a while. This program detects a disconnect because an exception is thrown when you try to write to a disconnected ftp object; in that case it automatically closes the connection and opens a new one (there may be a more elegant way to do this, and I'd love to hear about it). This way, it can just keep quietly running and updating in the background. All I have to do is edit the files in my tree and they show up on the server.
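
One possible way to tidy up the reconnect logic is to isolate it in a small wrapper, so the upload code never handles the dropped connection itself. This is only a sketch in Python 3 syntax: the class ReconnectingFTP and its connect argument are hypothetical names, and it retries the failed operation immediately rather than waiting for the next timer tick as the program above does. It catches ftplib.all_errors instead of using a bare except.

```python
import ftplib

class ReconnectingFTP:
    """Run FTP operations, reconnecting once the connection has dropped."""

    def __init__(self, connect, attempts=2):
        # connect: zero-argument callable returning a logged-in FTP object
        # (an assumption of this sketch, not part of ftplib).
        self._connect = connect
        self._attempts = attempts
        self.ftp = connect()

    def run(self, action):
        """Call action(ftp); on an FTP error, reconnect and try again."""
        for attempt in range(self._attempts):
            try:
                return action(self.ftp)
            except ftplib.all_errors:
                if attempt == self._attempts - 1:
                    raise  # out of attempts: let the caller see the error
                try:
                    self.ftp.close()
                except ftplib.all_errors:
                    pass  # the old connection is already gone
                self.ftp = self._connect()
```

With the real library, connect could be a function that creates ftplib.FTP(site), calls login(user, password) on it, and returns it; an upload then becomes something like updater.run(lambda ftp: ftp.storbinary('STOR ' + name, fileobj)). Note that a retried upload needs a freshly opened file object, since the first attempt may have consumed part of it.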

I've been using this for a couple of weeks and so far it's been very nice. Now I can edit files with a sophisticated editor, so I'm much more productive. And I don't miss through-the-web editing.

The only other things I'd like to be able to do on the GoDaddy site -- and I've read bits and pieces about Apache servers in general that make me think this is possible -- are:

Automatically open index.html if only a directory, and no file, is given in the URL.
Run PHP scripts inside .html files.
Automatically include a header and footer without having to explicitly type the lines in every file.

Pointers appreciated.

Michael Foord

Posts: 13
Nickname: fuzzyman
Registered: Jan, 2006

Re: Automatically Synchronize Your Site

Posted: Apr 27, 2007 6:10 PM

You can use SSI to automatically include headers and footers. The downside is that your files have to be '.shtml' files instead of '.html'.

Have you come across rest2web? It lets you build files from a single template and keep the content in either ReST (ReStructuredText) or HTML format. It builds static HTML files, but is a great way to use a single template.

http://www.voidspace.org.uk/python/rest2web/

Julio Aguilar

Posts: 2
Nickname: madth3
Registered: Mar, 2006

Re: Automatically Synchronize Your Site

Posted: Apr 27, 2007 7:13 PM

I've been doing some development in Java web applications and instead of including headers and footers we ended up using Sitemesh decorator.

I heard about a port called phpmesh (http://trypticon.org/software/phpmesh/). Never used it but it might be useful to you.

Andriy Shapochka

Posts: 3
Nickname: ashapochka
Registered: Oct, 2004

Re: Automatically Synchronize Your Site

Posted: Apr 27, 2007 8:02 PM

> Have you come across rest2web? It lets you build files
> from a single template and keep the content in either ReST
> (ReStructuredText) or HTML format. It builds static HTML
> files, but is a great way to use a single template.
>
> http://www.voidspace.org.uk/python/rest2web/

I think rest2web + rsync is a wonderful tool combination to build and update static sites.

David Buxton

Posts: 2
Nickname: buxtobox
Registered: Apr, 2007

Re: Automatically Synchronize Your Site

Posted: Apr 28, 2007 2:33 AM

With Apache, you can serve .html files as PHP with


<Files "*.html">
        ForceType application/x-httpd-php
</Files>

And PHP can include files automatically using this Apache configuration...


php_value auto_append_file /path/to/append_file.php
php_value auto_prepend_file /path/to/prepend_file.php

But does GoDaddy allow you to change Apache's behaviour?

References:

http://httpd.apache.org/docs/2.0/mod/core.html#files
http://www.zend.com/zend/spotlight/prepend.php

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

Re: Automatically Synchronize Your Site

Posted: Apr 29, 2007 8:59 AM

Thanks -- these are some very helpful pointers!

Mark Thornton

Posts: 275
Nickname: mthornton
Registered: Oct, 2005

Re: Automatically Synchronize Your Site

Posted: May 2, 2007 2:11 AM

Why not have your synchronizing script automatically include the header/footer lines in the files as they are transferred? Although there is more to transfer, especially when the header/footer are changed, it requires nothing from your host.
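
A rough sketch of how that could look, in Python 3 syntax. The function names decorate and upload, and the choice to wrap only files ending in .html or .htm, are assumptions made for illustration; nothing here comes from the script in the original post.

```python
import io

HTML_SUFFIXES = (".html", ".htm")

def decorate(name, data, header, footer):
    """Wrap data in header and footer if name looks like an HTML page."""
    if name.lower().endswith(HTML_SUFFIXES):
        return header + data + footer
    return data  # images, CSS and so on go up unchanged

def upload(ftp, local_path, remote_name, header, footer):
    """Send one file, adding the header and footer on the way out."""
    with open(local_path, "rb") as f:
        data = decorate(remote_name, f.read(), header, footer)
    # storbinary reads from any binary file-like object, so the
    # decorated bytes never need to be written to disk locally.
    ftp.storbinary("STOR " + remote_name, io.BytesIO(data))
```

The local files stay free of the shared lines, so they remain easy to edit. The cost is the one mentioned above: when the header or footer changes, every page has to be sent again, and a check based only on each page's modification time would not notice that by itself.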

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

Re: Automatically Synchronize Your Site

Posted: May 3, 2007 7:55 AM

Mark: Nice thought. I'm still getting used to thinking in this fashion, and the fact that I am already running an automation process -- so why not add more automation? -- didn't occur to me. Sounds like it could qualify as "simplest thing" thinking.
