I had the wonderful chance to do some Unicode -> ASCII translation. I had thought that this was going to be easy, but I got confounded.
So - exercise for the reader:
Given a Unicode string with characters that cannot be represented as US-ASCII, how do you get all the valid ASCII characters out of the input string in Java?
The Python version is this:
someUnicodeString.encode('ascii', 'replace') or someUnicodeString.encode('ascii', 'ignore').
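To make the difference between the two error handlers concrete, here is a small sketch. It assumes Python 3, where encode() returns bytes, and the sample string is made up for illustration:

```python
# Hypothetical sample input: "café naïve" contains two non-ASCII characters.
some_unicode_string = "caf\u00e9 na\u00efve"

# 'replace' substitutes a '?' for every character ASCII cannot represent.
replaced = some_unicode_string.encode("ascii", "replace")

# 'ignore' silently drops those characters instead.
ignored = some_unicode_string.encode("ascii", "ignore")

print(replaced)  # b'caf? na?ve'
print(ignored)   # b'caf nave'
```

With 'replace' the output keeps the same number of characters as the input, while 'ignore' shortens it. That matters if anything downstream depends on character positions.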
Luke pointed me to a solution in Java, but as it turns out - I still don't get it. Which concrete implementation of OutputStream was I supposed to use?
As an aside - I've been busy refactoring Webware. The goal has been to get Webware into a distutils friendly state. There are a couple of reasons for this:
distutils would make installing Webware almost trivial.
I personally find writing unit tests to be a lot simpler in a distutils friendly setup. Test cases go in a proper directory and the Python path can be munged inside of a distutils 'test' custom class.
Testing of Webware plugins is too hard right now. We've made modifications to FormKit using FormEncode, but testing the changes has been neglected.
It's a good excuse to get my hands dirty with the Webware internals and see how they work.
That last point is really my main reason - I've found that the best way to figure out how non-trivial code works is to gut it and put it back together. The paranoid freak in me never really trusts the API docs.
Current status:
Servlets basically work now - all the WebKit examples run. Minor "yay!".
I still need to redo the plugin packaging for PSP, MiddleKit, MiscUtils, and all that other stuff that sits as a peer to WebKit though. There's some code/data mingling going on with plugins and the examples in each plugin which is causing me some grief. An example is the WebKit/Testing/Main.py servlet which loads test cases from a data file. I'd really rather just push all the testing into the unit test suite and have some way of instrumenting Webware to load my testing servlet and then drive some tests against it.
Hopefully another two weekends of hacking and I should have something that isn't entirely embarrassing to show off.
I've noticed that the way in which I refactor Python code is different than when I'm working on Java code. In Java - I tend to lean harder on the refactoring browsers built into Eclipse. In Python - I tend to lean harder on my unit tests, sed, grep and vim.
Truth be told - I prefer the Java way of refactoring. Static type checking may be a pain in the ass when you're working on 'new' code - but I like the safety of an automated refactoring when I can get it.