So, the new job at Google is a blast! It's like being back in college. There are tons of smart people here, and that doesn't just refer to programmers; it appears that everyone they hire is above average in their domain. There's always more to learn; I expect to be overwhelmed for the first three months and it may take me a year before I feel truly comfortable. (And did I mention the free food? :-)
Python is big at Google. Since I don't want to bother with getting this blog reviewed by Google, I can't go into much detail, but it's at a secure 3rd place after C++ and Java, and it's being used for everything from build tools to managing ads. Name your third-party Python module and someone at Google is probably using it. So this is an exciting environment -- I get to see first-hand what truly large-scale Python development is like, and where the pain points are.
And did I mention that I get to spend 50% of my time on Python? No strings attached. Of course I get to spend the other 50% on Python too, but that's in a corporate setting. Fortunately it's easy to separate the two. If it uses two-space indents, it's corporate code; if it uses four-space indents, it's open source. (If it uses tabs, I didn't write it! :-)
The next US Python Conference is soon! We've got a new location, Addison TX (near Dallas). I've heard some rumblings from the organizers that attendance is lower than expected, but I'd like to point out that we've had worries about attendance at every PyCon, and in the end the results were always above all predictions. And, like most years, we're keeping early-bird registration open an extra two weeks (until January 15); the special rate at the hotel is valid until February 1st.
The program looks spectacular. We've got the Plone team keynoting, and on day three instead of a keynote we have an interview with BitTorrent creator and nouveau-entrepreneur Bram Cohen (submit your questions now!). Oh, and I believe some BDFL guy is doing a state-of-the-Python talk. BTW, at least 7 Googlers are coming (with at least three of us presenting this or that).
After the conference there will be four days of sprints. Like every year, this is an outstanding opportunity for teams that normally communicate via email and IRC to have a few days of coding in the same space -- despite the wonderful invention of the internet, there's still nothing that quite beats face-to-face contact. Several groups will be sprinting on core Python things (possibly even Python 3000!); I expect we'll also see sprints for projects like Twisted, Zope and Plone. You can sign up your own project via the wiki!
Despite my 50% Python time at Google I haven't managed to keep up-to-date with everything that's going on in python-dev. (And that's an understatement!) It seems clear however that development is picking up. Lots of bugs are being fixed (I'd especially like to mention my fellow Googler Neal Norwitz here, who seems to have no life :-). We've successfully switched to self-hosted subversion. (I'm still waiting for the switch to self-hosted Roundup as the issue tracker, but apparently nobody's volunteering.) After I made a few disparaging remarks the AST-branch group got its act together with the result that this new, abstract-syntax-tree-based approach to producing Python bytecode is now mainline for Python 2.5. The new infrastructure immediately proved itself by enabling a relative newbie to implement PEP 341 (merging try/except and try/finally).
Martin von Loewis is working on getting rid of a nasty 32-bit dependency in Python's implementation: using C ints for indexing Python sequences. On many 64-bit platforms, an int holds only 32 bits which makes it a bit of a problem to make effective use of the architecture's ability to handle strings longer than 2 GB. This is not a theoretical problem any more; servers with 6-8 GB of RAM are now commonplace.
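To make the 2 GB figure concrete, here is a quick back-of-the-envelope check. The name INT32_MAX is only illustrative and is not taken from the CPython sources; sys.maxsize is what a present-day interpreter reports as its largest supported container size.

```python
import sys

# Largest value a signed 32-bit C int can hold (illustrative name).
INT32_MAX = 2**31 - 1

# A sequence indexed by such an int can be at most one item short of
# this many GiB when each item is a single byte.
max_len_gib = (INT32_MAX + 1) / 2**30

print(INT32_MAX)      # 2147483647
print(max_len_gib)    # 2.0

# On a build that indexes with a 64-bit type this limit is far larger;
# on a 32-bit build the two values are equal.
print(sys.maxsize >= INT32_MAX)
```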
Another cool new development tool is the Python buildbot. This is a set of cron jobs that continuously check out the latest version of Python, build it, and run the unit test suite, on a variety of machines. The results can be viewed live at http://www.python.org/dev/buildbot/.
Of course, there's plenty of heat as well -- a discussion about making 'self' implicit in Python 3000 that won't go away, flame wars about the missing 'quit' command and about replacing LaTeX (really!), and the old standby, the GIL. But all in all it's an enjoyable place, and I plan to spend more time there as soon as I've handled my backlog of other Python tasks (like writing the definitive article about Python's history for the ACM HOPL-III conference, to be held in 2007)! Once that's done, expect to hear more about Python 3000 on python-dev (unless the population there banishes that subject to a separate list; I'm not sure yet whether that would be a good or a bad thing).
Trac as an issue tracker, changeset tracker, milestone tracker, source viewer and wiki is quite amazing too, and a natural choice for svn users. (Be sure to try out the latest 0.9.3 egg and put the source code highlight plugin in.)
Guido, I know you can't write about your two-space indent code, but here is my wish list of what I hope that you are developing at google:
1. python-izing sawzall and then releasing it, or even better, could there be an "import sawzall" in google's future? Can we have it too? It looks like fun.
2. creating a google web framework that builds on (and improves) zope so that it becomes THE python web framework.
3. improving goopy so that python has more "higher order" capabilities than perl.
4. boosting jython development
You probably aren't working on any of these, but still, I hope they hired you because they really want to propel the python community forward. I'm sure there would be a lot of money in it for them, which is ok by me. Tim
> 1. python-izing sawzall and then releasing it, or even better, could there be an "import sawzall" in google's future? Can we have it too? It looks like fun.
I'm not sure what you would call Pythonizing it; AFAIK sawzall is a statically typed language. Also, ISTM that sawzall essentially depends on resources that only Google has.
> 2. creating a google web framework that builds on (and improves) zope so that it becomes THE python web framework.
I actually don't think Google uses Zope (contradicting my own statement above! :-).
I just spoke to Mark Shuttleworth who also expressed concern about the lack of a "de-facto standard" Python web framework. I hope to be working on this issue somewhat, but it'll take a long time, and I doubt it'll involve open-sourcing Google code that is currently proprietary.
> 3. improving goopy so that python has more "higher order" capabilities than perl.
Python already has all the higher-order capabilities you need, even if map/reduce/filter were taken away.
> 4. boosting jython development
I'm not a Java user so I'm not motivated to do this myself, and probably not even qualified. Jython work requires a rare combination of understanding Python, Java, language implementation techniques, *and* obscure JVM and class-loader tricks.
There seems widespread misunderstanding about the significance of Jython; IMO it's very important but its importance hinges on Java's importance; Jython is important for interacting with Java libraries. There's rarely a good reason to run "pure" Python code in a JVM since Python runs anywhere Java runs (and then some).
> You probably aren't working on any of these, but still, I hope they hired you because they really want to propel the python community forward. I'm sure there would be a lot of money in it for them, which is ok by me.
I think they hired me because they like my work; Google is expressing its goals for the Python community clearly enough through sponsorship of the PSF and PyCon.
I'm not sure what you would call Pythonizing it; AFAIK sawzall is a statically typed language.
Not being familiar with google's needs (although I see from a recent sawzall paper that they want to extend sawzall to do multiple passes) I can only speak to what may one day be opensourced by google. If (hopefully) google shares sawzall and mapreduce one day, it would be nice if there was an easy way for python programmers to utilize it. Perhaps it could work like so:
1. write program in python.
2. process program with a script that strips out/converts code into a sawzall program.
3. run multiple instances of sawzall interpreter on multiple boxes.
4. write reports in python program that interfaces with these multiple instances.
Also, ISTM that sawzall essentially depends on resources that only Google has.
exactly. That is too bad for us! Maybe steps 1. and 2. and 4. listed above are done on a client program and google can sell us step 3? But isn't it just a matter of time before the same types of resources are more widely available? Certainly not the same scale but if google doesn't opensource googleFS, mapreduce, sawzall, somebody else will, no?
I dream of the day when python has a single Web framework and we all know that Mark is the guy that can get it done! But I can think of a couple of ways Google could benefit from having a widely adopted web framework. Oh well, there are probably too many things cooking down there as it is. Tim
> 1. write program in python.
> 2. process program with a script that strips out/converts code into a sawzall program.
I seriously doubt that this will work. The required semantic analysis is made very difficult by Python's dynamic nature; if you want your program to be analyzable to the point where it can be parallelized a la sawzall you're better off writing it in sawzall. Most sawzall programs are very small, but run on very large (and I mean VERY LARGE) data sets so this is a small thing to ask for.
In general I'm not very sympathetic to requests to make Python more "functional" because Python's dynamic nature makes it impossible to do the kind of analysis that is the bread and butter of functional programming languages. E.g. when you write
    for x in seq:
        f(x)
there isn't enough information for the compiler to know whether the order in which seq is iterated over matters, and whether it may be parallelized.
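A small illustration of the point, using hypothetical functions of my own naming (record, accumulate, run). The loop is identical in both cases, yet only one of the two callables can be safely reordered, and nothing in the loop itself says which.

```python
# Hypothetical callables: both are used as f(x), only one is order-independent.
log = []
state = {"total": 0}

def record(x):
    # Side effect: the final contents of `log` depend on call order.
    log.append(x)

def accumulate(x):
    # Integer addition is commutative, so call order does not change the sum.
    state["total"] += x

def run(f, seq):
    for x in seq:
        f(x)

seq = [3, 1, 2]

run(record, seq)
forward = list(log)
log.clear()
run(record, reversed(seq))
backward = list(log)
print(forward == backward)   # False

run(accumulate, seq)
first = state["total"]
state["total"] = 0
run(accumulate, reversed(seq))
second = state["total"]
print(first == second)       # True
```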
I seriously doubt that this will work. The required semantic analysis is made very difficult by Python's dynamic nature; if you want your program to be analyzable to the point where it can be parallelized a la sawzall you're better off writing it in sawzall.
But what if you wanted to sell the sawzall functionality as a service? Wouldn't the general public be more receptive to an already established language? I am far from being a language designer, but it seems that the benefits of sawzall (even strict type checking) could be repackaged into a language that is a lot like python. It would be a stripped down version of python that eliminated the dynamic problem. Executable pseudo-code for parallel programming. Perhaps you wouldn't have to throw out your [] and {} but they would go through a pre-processor that would fix them into static data structures. And you make it clear that sequence order can't matter. Then you run

    def f():
        emit <- value  # sends value to sawzall.reducer()

    for x in seq:

    r = sawzall.reducer()
    for x in range(1, 100000):
        sawzall.instance(f(x, r))
(sorry no time to find that sawzall paper and reread it, I have to go pick up my kid. BTW, how do those sawzall instances get out onto all the boxen?)
Here is what it looks like to me. Google has created a state-of-the-art parallel processing system. These systems are going to become more and more available to others outside google. Google will either help that along by selling services or just keep everything proprietary. (They have already helped it along a lot by publishing the papers that they have published.) When these types of systems become widely available, there is going to be a need to find programmers that can work with these systems. Why not tap into the python community?
Guido, I definitely want to learn Python! I remember someone at a previous job who had a PhD in Mathematics, and couldn't find a solution to a CORBA-related problem and was finally able to solve it using Python!
Also Guido, I am wondering if you might know, in general, what types of projects at Google are typically done using C++ vs Java vs Python?
One small correction: BuildBot is not a set of cron jobs, it's a Twisted-based server (and written in pure Python). This means you can e.g. run it on Windows, though that can be a bit difficult to set up.
> Python already has all the higher-order capabilities you need, even if map/reduce/filter were taken away.
Could you elaborate or point me to a resource that discusses this? This is of great interest to us latent academics. What features replace these? I know I can use generators or list comprehensions for some things, but is everything really readily supported if map/reduce/filter/lambda went away? (I added the lambda, since I know there have been rumblings about removing it too.) Thanks.
> > Python already has all the higher-order capabilities you need, even if map/reduce/filter were taken away.
>
> Could you elaborate or point me to a resource that discusses this? This is of great interest to us latent academics. What features replace these. I know I can use generators or list comprehensions for some things, but is everything really readily supported if map/reduce/filter/lambda went away?
> (I added the lambda, since I know there have been rumblings about removing it too). Thanks.
Actually, lambda isn't going away (I decided this recently).
For the others, here are some starting points -- because Python supports higher order functions natively, you can trivially write these yourself:
    def map(f, seq):
        for x in seq:
            yield f(x)

    def filter(p, seq):
        for x in seq:
            if p(x):
                yield x

    def reduce(op, seq, zero=0):
        result = zero
        for x in seq:
            result = op(result, x)
        return result
Extensions to multiple argument sequences etc. are left as an exercise to the reader.
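One possible take on that exercise, offered as a sketch rather than a reference implementation. The names map_n, filter_p and reduce_init are hypothetical, chosen only to avoid shadowing the builtins; zip() walks several sequences in step, so map_n stops at the shortest input.

```python
def map_n(f, *seqs):
    # zip() groups the i-th items of every sequence and stops at the shortest.
    for args in zip(*seqs):
        yield f(*args)

def filter_p(p, seq):
    # Yield only the items for which the predicate is true.
    for x in seq:
        if p(x):
            yield x

def reduce_init(op, seq, zero=0):
    # Fold the sequence from the left, starting from `zero`.
    result = zero
    for x in seq:
        result = op(result, x)
    return result

sums = list(map_n(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
evens = list(filter_p(lambda x: x % 2 == 0, range(6)))
product = reduce_init(lambda a, b: a * b, [1, 2, 3, 4], 1)

print(sums)      # [11, 22, 33]
print(evens)     # [0, 2, 4]
print(product)   # 24
```

For the single-sequence cases a generator expression such as (f(x) for x in seq if p(x)) covers map and filter in one line.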
> > 4. boosting jython development
>
> I'm not a Java user so I'm not motivated to do this myself, and probably not even qualified. Jython work requires a rare combination of understanding Python, Java, language implementation techniques, *and* obscure JVM and class-loader tricks.
>
> There seems widespread misunderstanding about the significance of Jython; IMO it's very important but its importance hinges on Java's importance; Jython is important for interacting with Java libraries. There's rarely a good reason to run "pure" Python code in a JVM since Python runs anywhere Java runs (and then some).
I'd say the key reason folks want a python/java bridge is to use the cross-platform GUI/Graphics capabilities of java, but from python. It'd be both a huge pain to get a cross platform GUI/Graphics suite for python, and a shame to waste a perfectly reasonable resource -- java.
For that use, it seems to me that building a Java server that Python connects to, much as the X11 window system works, would work. It's trivial to build servers in java: listen on a socket, then fork a thread connected to it. Build a simple protocol for accessing java swing/awt/swt interfaces. Painful, and getting the optimization right might be difficult.
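Here is a rough sketch of what the Python side of such a line-based protocol might look like. Everything in it is an assumption made for illustration: the LINE and QUIT commands, the OK/BYE replies, the GuiClient class, and fake_gui_server, which is a Python stand-in for the Java server so the example runs on its own. It also uses socket.socketpair() in place of a real listening TCP socket.

```python
import socket
import threading

def fake_gui_server(conn):
    # Stand-in for the Java side: one text command per line, one reply each.
    with conn, conn.makefile("r", encoding="ascii") as reader, \
            conn.makefile("w", encoding="ascii", newline="\n") as writer:
        for line in reader:
            words = line.split()
            if not words:
                continue
            if words[0] == "QUIT":
                writer.write("BYE\n")
                writer.flush()
                break
            # A real server would turn this into Swing/AWT/SWT calls.
            writer.write("OK " + words[0] + "\n")
            writer.flush()

class GuiClient:
    """Hypothetical client: sends a command, waits for the one-line reply."""

    def __init__(self, conn):
        self._conn = conn
        self._reader = conn.makefile("r", encoding="ascii")
        self._writer = conn.makefile("w", encoding="ascii", newline="\n")

    def send(self, *words):
        self._writer.write(" ".join(str(w) for w in words) + "\n")
        self._writer.flush()
        return self._reader.readline().strip()

    def close(self):
        self._writer.close()
        self._reader.close()
        self._conn.close()

# A connected socket pair replaces "listen on a socket" for this demo.
client_end, server_end = socket.socketpair()
server = threading.Thread(target=fake_gui_server, args=(server_end,))
server.start()

client = GuiClient(client_end)
replies = [client.send("LINE", 0, 0, 10, 10), client.send("QUIT")]
client.close()
server.join()
print(replies)   # ['OK LINE', 'BYE']
```

One round trip per drawing call is the simplest design, and it is also where the optimization worry comes from: a real protocol would probably need to batch commands.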
In terms of API, the Processing.org folks are doing wonderful work at simple (pythonic) 2D/3D interfaces. I suspect a first start for the java server would be access to the Processing API.