This post originated from an RSS feed registered with Python Buzz
by Andrew Dalke.
Original Post: Updated ReseekFile
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
I updated ReseekFile.
I wrote it two years ago to help probe the contents of a file stream
coming over a socket connection. I was working with a server that
would normally return XML but returned a simple text file if there was
an error. It didn't change the headers so I couldn't check the
content-type or status code. Instead I read the first line of the
input to inspect the content directly. If it was XML I needed to reset
the data stream so I could pass the file handle to the XML processor.
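To make the peek-then-reset idea concrete, here is a minimal sketch in modern Python. It is not ReseekFile itself: the class name FirstLinePeeker and its methods are hypothetical, and io.BytesIO only stands in for the socket stream.

```python
import io


class FirstLinePeeker:
    """Hypothetical wrapper (not ReseekFile): lets you look at the first
    line of a non-seekable stream, then read the stream from the start."""

    def __init__(self, stream):
        self._stream = stream
        self._pending = b""  # bytes already pulled from the stream but not yet handed out

    def peek_line(self):
        # Read the first line once and hold on to it for later reads.
        if not self._pending:
            self._pending = self._stream.readline()
        return self._pending

    def read(self, size=-1):
        if size is None or size < 0:
            # Hand out the held bytes followed by everything that remains.
            data = self._pending + self._stream.read()
            self._pending = b""
            return data
        # Serve from the held bytes first, then top up from the stream.
        data = self._pending[:size]
        self._pending = self._pending[size:]
        if len(data) < size:
            data += self._stream.read(size - len(data))
        return data


# Stand-in for the server response; a real socket file would not be seekable.
response = io.BytesIO(b"<?xml version='1.0'?>\n<root/>\n")
wrapped = FirstLinePeeker(response)
looks_like_xml = wrapped.peek_line().lstrip().startswith(b"<")
```

After the peek, `wrapped` can be handed to anything that only calls read(), and it will see the data from the first byte.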
I'm currently working on a similar project. One quibble I have with
ReseekFile is that it keeps everything in memory. Machines have a lot
of memory these days but I wanted the option to use the filesystem if
need be. For example, if the file is indeed XML then I will be using
jing, a Relax-NG validator. That takes a filename so I would prefer just
pointing it to that backing file.
I started to write a new class that would have been more complicated
than ReseekFile. It would have allowed arbitrary seeks into the file.
In the spirit of YAGNI I thought
better of it and looked to see if I could modify ReseekFile. I
could. It just needed a factory function to specify how to create a
backing store. It turned out I was using cStringIO's
getvalue() to get the size of the buffer. Real file objects
don't have that method, so I added a bit of code to track how many bytes I've
written to the file.
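Here is a rough sketch of how a factory function and a byte counter can fit together. It is an illustration under my own assumptions, not the actual ReseekFile code: BufferedReplay, make_store and rewind are made-up names, and it uses io.BytesIO from Python 3 where the original used cStringIO.

```python
import io
import tempfile


class BufferedReplay:
    """Hypothetical sketch: save everything read from `source` into a
    backing store so the stream can be replayed from the beginning."""

    def __init__(self, source, make_store=io.BytesIO):
        self._source = source
        self._store = make_store()  # the factory decides memory vs. filesystem
        self._written = 0           # bytes saved so far; no getvalue() needed
        self._pos = 0               # logical read position

    def read(self, size=-1):
        want_all = size is None or size < 0
        # First serve bytes that are already in the backing store.
        self._store.seek(self._pos)
        avail = self._written - self._pos
        data = self._store.read(avail if want_all else min(size, avail))
        # Then pull fresh bytes from the source and save them as well.
        if want_all or len(data) < size:
            if want_all:
                fresh = self._source.read()
            else:
                fresh = self._source.read(size - len(data))
            self._store.seek(self._written)
            self._store.write(fresh)
            self._written += len(fresh)
            data += fresh
        self._pos += len(data)
        return data

    def rewind(self):
        # Only a return to the start is supported, not arbitrary seeks.
        self._pos = 0


# In-memory store by default; pass a different factory to use the filesystem.
in_memory = BufferedReplay(io.BytesIO(b"hello world"))
on_disk = BufferedReplay(io.BytesIO(b"hello world"),
                         make_store=tempfile.TemporaryFile)
```

Passing tempfile.NamedTemporaryFile as the factory would give the backing store a filename that could be handed to an external tool, provided the store is flushed first.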