Wanting to know how many people are reading my RSS feeds, I wrote a
couple of scripts to analyze my httpd logs. Their only merit, if any, is
that they use a cache (with good old Marshal) to avoid processing data
twice, so they still finish in a fraction of a second when I run them
on my >200MB access.log file, which I update with my
append-only rsync substitute.
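The caching idea can be sketched roughly as follows (the cache file name and the state layout are my own assumptions, not the actual script's): remember how far into the log the previous run got, seek past that on the next run, and only process the new lines.

```ruby
# Minimal sketch of an incremental log processor with a Marshal cache.
# CACHE_FILE and the state hash layout are hypothetical.
CACHE_FILE = "logstats.cache"

def with_cache(log, cache = CACHE_FILE)
  # Restore the previous run's state, or start from scratch.
  state = File.exist?(cache) ? Marshal.load(File.binread(cache)) :
                               {:offset => 0, :data => {}}
  File.open(log) do |f|
    f.seek(state[:offset])                 # skip data already processed
    f.each_line { |line| yield state[:data], line }
    state[:offset] = f.pos                 # remember how far we got
  end
  File.open(cache, "wb") { |f| f.write(Marshal.dump(state)) }
  state[:data]
end
```

Since appended log lines never change earlier ones (the access.log is append-only), skipping to the saved offset is safe, and each run only pays for the new data.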
The bloglines index
bloglines seems to be one of the most successful online aggregators,
and its bots leave some interesting information in the Referer field
(which e.g. Google's Feedreader doesn't), so it's quite a good way to
measure a site's growth.
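Picking those requests out of the log might look something like this; a minimal sketch assuming the common combined log format, where the Referer is the second-to-last quoted field (the sample values in the comments and test are made up):

```ruby
# Crude parser for a combined-format access.log line; returns the
# Referer when the request came through bloglines, nil otherwise.
LOG_RE = /^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d+ [\d-]+ "([^"]*)" "([^"]*)"/

def bloglines_referer(line)
  m = LOG_RE.match(line) or return nil
  referer = m[3]
  referer if referer.include?("bloglines.com")
end
```

Anything more than counting (e.g. which feeds are polled, how often) can then be read off the matched groups.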
The script aggregates requests originating from bloglines within a
one-hour interval, and saves the information needed for the next run (how
much of the file has been processed and the data corresponding to the last
requests). Pretty simple: