This post originated from an RSS feed registered with Python Buzz
by Jeremy Hylton.
Original Post: More Questions and Answers for the ZEO Cache
Feed Title: Jeremy Hylton's Web Log
Feed URL: https://www.python.org/~jeremy/weblog/rss.xml
Feed Description: A techie web log mostly about Python
The work on the ZEO cache has been delayed by some other customer
work at Zope Corp. I wanted to capture the current state of our
efforts for when we get back to it.
It seems crucial to take into account the wide range of object
sizes. In theory, the size of an object is unbounded; in practice
most objects are small, some only a few tens of bytes.
The Thor algorithm has done remarkably well compared to other
approaches; its biggest win may be that it takes object size into
account. It still needs tuning to decide how often to "tick."
The ARC algorithm does not perform well on our workloads. It
seems to require some tuning to get reasonable performance, even
though the algorithm is intended to be free of tuning parameters.
The 2Q algorithm has performed fairly well, although some variations
on the basic scheme seem to improve it. One change is to avoid
promoting an object on a hit in A1out, unless there was also a hit
in A1in. It's hard to judge other tuning parameters, like the size
of A1out. Why even bother with A1out? We could have a hit threshold
for A1in: if an object is hit more than N times while it is in
A1in, promote it to Am.
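That variant can be sketched in a few lines. This is a minimal,
count-based simulation, not the ZEO cache itself: capacities are in
object counts rather than bytes, A1out is dropped entirely, and the
class and parameter names are illustrative.

```python
from collections import OrderedDict

class TwoQVariant:
    """Sketch of the 2Q variant described above: no A1out queue; an
    object is promoted from A1in to Am only after it is hit more than
    hit_threshold times while resident in A1in.
    """

    def __init__(self, a1in_size=100, am_size=400, hit_threshold=1):
        self.a1in = OrderedDict()   # oid -> hit count, in FIFO order
        self.am = OrderedDict()     # oid -> None, in LRU order
        self.a1in_size = a1in_size
        self.am_size = am_size
        self.hit_threshold = hit_threshold
        self.hits = self.misses = 0

    def access(self, oid):
        if oid in self.am:
            self.hits += 1
            self.am.move_to_end(oid)        # refresh LRU position
        elif oid in self.a1in:
            self.hits += 1
            self.a1in[oid] += 1
            if self.a1in[oid] > self.hit_threshold:
                del self.a1in[oid]          # promote to Am
                self.am[oid] = None
                if len(self.am) > self.am_size:
                    self.am.popitem(last=False)   # evict LRU from Am
        else:
            self.misses += 1
            self.a1in[oid] = 0
            if len(self.a1in) > self.a1in_size:
                self.a1in.popitem(last=False)     # FIFO eviction, no A1out
```

Setting hit_threshold to 0 recovers something close to plain 2Q
promotion on first re-reference; raising it filters out objects that
are touched only a couple of times in quick succession.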
The ZEO client cache is a second-level buffer cache, where the
references it sees depend a lot on the size of the first-level
object cache. We think zope.org's first-level cache is configured
for 23,000 objects, which is much larger than the default of 400.
A simple FIFO algorithm is substantially better than the current
scheme with its flips and copies.
The FIFO algorithm is the only one for which we simulate actual
storage management. All the other simulations are low fidelity,
because they ignore the details of how to manage space. If a 2Q
simulation needs N bytes and evicts M objects, it doesn't, for
example, worry about whether those objects are contiguous. A good
algorithm for persistent storage allocation, assuming that the cache
is full in steady state, is hard to design.
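One reason FIFO sidesteps the allocation problem is that eviction and
space allocation become the same operation when the cache file is
treated as a circular log. The sketch below illustrates that idea; it
is a toy model, not the actual ZEO cache file format, and the record
layout and names are assumptions.

```python
from collections import OrderedDict

class FIFOFileCache:
    """Toy FIFO cache over a fixed-size byte arena treated as a
    circular log: new objects are written at the current offset, and
    any records lying in the way are evicted, so freed space is always
    contiguous with the write position.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.pos = 0                  # next write offset in the arena
        self.index = OrderedDict()    # oid -> (offset, size), in write order

    def store(self, oid, size):
        if size > self.capacity:
            return                    # object too large to cache at all
        self.index.pop(oid, None)     # drop any stale record for this oid
        if self.pos + size > self.capacity:
            self.pos = 0              # wrap to the start of the arena
        start, end = self.pos, self.pos + size
        # Evict every record overlapping the region we are about to write.
        for other, (off, sz) in list(self.index.items()):
            if off < end and off + sz > start:
                del self.index[other]
        self.index[oid] = (start, size)
        self.pos = end

    def __contains__(self, oid):
        return oid in self.index
```

Because the write pointer only ever advances (or wraps), the oldest
records are exactly the ones overwritten next, which is why FIFO and
this allocation scheme fit together so naturally; an Am/A1in split
has no such correspondence between logical age and file offset.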