This post originated from an RSS feed registered with Python Buzz by Ben Last.
Original Post: A Matter Of Questions
Feed Title: The Law Of Unintended Consequences
Feed URL: http://benlast.livejournal.com/data/rss
Feed Description: The Law Of Unintended Consequences
The PlayStation Project revealed. Another case study of Python and open-source tools applied to another Interesting Problem.
A Little Bit Of Background
E3 thunders towards us on the calendar[0] and mortal development teams quail before the onrushing juggernauts of deadlines. Apart from our team, for whom the E3 deadline was a while back. Now we have a whole new set of amusing comedy deadlines for beta tests, but that's not important right now. E3 is a big deal for us because it's where the PlayStation Project is now exposed to the nerveless, searching gaze of the Eye of Sauron... no, wait, I mean The Games Industry Press. Same thing, smaller spiked helmets, as I understand it.
I'm a regular reader of Penny Arcade. Not that I'm really a player in any sense these days; my favourite game is still Homeworld 2, which is pretty much gathering dust on the shelves of most gamers' rooms. But I like Tycho's writing, and Gabe's style of art. A while ago, they posted some comments from Geoffrey Zatkin, on the subject of new ideas for games. Here's a quote:
At PAX this year I was a judge for their "pitch your idea for a game" sit-in. I got to break a lot of hearts by telling the audience a very sad fact – that in my 8+ years as a professional game designer, not once has any boss of mine ever asked me for an idea for a new game. Not once. Again, unless you own the company, you get assigned a project (or jump ship to another company working on a game that sounds interesting). Sure, I've helped flesh out any number of games from concept to fully realized design. And that's the hard part. Coming up with a good idea for a game is like coming up with a good idea for a novel. Everybody does that. But very few people have the discipline to sit down and write the book. The ideas are easy – the execution of the idea is the hard part.
But that's what we've done; we came up with a new idea, something that genuinely hasn't been done before[3], and we've done a massive great chunk of the work of executing that idea. And it's been interesting, in the best senses of the word.
It isn't a first-person shooter. It's not a racing game. No busty women swing over pits and solve puzzles. It's something a bit new, in a number of ways. The game's called Buzz![1], and it is, at heart, a music quiz. There are nearly 1200 different clips of tracks involved, with a total of over 47,000 questions, in ten languages. Our development partners do the clever 3D interface work (and a damn fine job they've done of it as well). It's been our job to gather, generate, edit, collate, audit, process and provide all of this data on which the quiz is based. So, since this blog is (ostensibly) chiefly about techie things, I thought it might be interesting to explain the set of open-source software and tools that we've used to manage all of this.
Of Databases And Babel
Let's start with the database engine (since that's the core of it all). The requirements that I drew up[2] were pretty much these:
Open-source database. There is no religious reason for this. It's a budget thing.
That interfaces well with Python. More on this below.
That supports Unicode properly. By that I mean it has tools that support input and output of international character sets, and it handles Unicode via the Python interface.
Supports transactions.
Accessible from Windows tools (ODBC, Python).
Runs on Linux. All our serious development servers run Fedora.
Given the above, it was pretty much a question of MySQL or PostgreSQL. I chose MySQL for two reasons. First, I'd used it many times before. Second, the various Python/PostgreSQL packages all seemed lacking in one way or another, especially when it came to sorting out Unicode. It may be that there are neat solutions to any or all of the issues I found, but there seemed to be no great advantage in swapping a database I was familiar with for one I wasn't. MySQL (as of version 4.1.1) also has excellent Unicode support and speaks unto Python via the truly excellent MySQLdb package, courtesy of Andy Dustman, to whom I shall one day build a small shrine.
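The Unicode and transaction requirements above are easy to probe from Python. What follows is not the project's code, just a minimal sketch against the generic DB-API 2.0 interface: it writes some non-ASCII strings, reads them back, rolls back, and confirms nothing was kept. The table name `probe`, the function name and the sample strings are made up for illustration, and sqlite3 stands in for MySQL so the sketch runs without a server. The commented-out lines show the MySQLdb `connect()` arguments (`charset`, `use_unicode`) that request UTF-8 on the wire and Unicode strings back; host and credentials there are placeholders.

```python
import sqlite3


def unicode_and_rollback_ok(conn, samples, placeholder="?"):
    """Check that a DB-API 2.0 connection round-trips Unicode and honours rollback.

    Assumes an empty table probe(txt) already exists. `placeholder` is the
    driver's parameter marker: "?" for sqlite3, "%s" for MySQLdb.
    """
    cur = conn.cursor()
    insert = "INSERT INTO probe (txt) VALUES (%s)" % placeholder
    for text in samples:
        cur.execute(insert, (text,))
    cur.execute("SELECT txt FROM probe")
    stored = sorted(row[0] for row in cur.fetchall())
    conn.rollback()  # discard the inserts; nothing should survive
    cur.execute("SELECT COUNT(*) FROM probe")
    survivors = cur.fetchone()[0]
    return stored == sorted(samples) and survivors == 0


if __name__ == "__main__":
    # sqlite3 stands in for MySQL here so the sketch runs without a server.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE probe (txt TEXT)")
    conn.commit()
    print(unicode_and_rollback_ok(
        conn, ["Bj\u00f6rk", "Sigur R\u00f3s", "\u30d3\u30fc\u30c8\u30eb\u30ba"]))

    # Against MySQL the same check would look roughly like this. The table
    # must use a transactional engine such as InnoDB; MyISAM ignores ROLLBACK.
    #   import MySQLdb
    #   conn = MySQLdb.connect(host="dbhost", user="quiz", passwd="secret",
    #                          db="buzz", charset="utf8", use_unicode=True)
    #   unicode_and_rollback_ok(conn, samples, placeholder="%s")
```

Taking the placeholder as a parameter is the one concession to driver differences; everything else is plain DB-API, so the same check can be pointed at either database.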
None of this would be much use if the general query tools for MySQL (like the Query Browser) didn't work properly with international strings on Windows. All our desktop machines run XP, and it's critical that display, edit and copy/paste of strings in any language work. Well, as long as you choose the right tools, they do, but that's the nature of Unicode work for you.
The Language That Gets Everywhere
Why, you might ask, purely because at this point it would help me move on to the next point, do you need a database that links with Python? Thanks for asking. Early on in the whole development process, I gave a lot of thought to the jobs that we'd have to do. There were some guiding principles I followed.
Whatever we think we're going to be doing, it probably won't happen the way we think it will. Columns will change meaning. Entities will be discarded and new ones appear. Be flexible.
We're going to be gathering data from a zillion different sources. Any data we capture is going to be dirty and need auditing and cleaning.
There's going to be so much data that any manual task applied to the thousands of records we'll have might mean that we'll run out of time. Automate.
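Cleaning dirty captured data is exactly the kind of job that gets automated rather than done by hand. As a hedged illustration (the function names and the cleaning rules are assumptions, not taken from the project), a cleaner might put every text field into Unicode NFC form, collapse stray whitespace, and flag names that differ only by letter case:

```python
import re
import unicodedata

_WHITESPACE = re.compile(r"\s+")


def clean_field(value):
    """Tidy one captured text field; return None if nothing useful is left."""
    if value is None:
        return None
    value = unicodedata.normalize("NFC", value)  # one canonical form for accented letters
    value = _WHITESPACE.sub(" ", value).strip()   # collapse tabs, newlines and runs of spaces
    return value or None                          # an empty string counts as missing


def case_duplicates(names):
    """Group cleaned names that are identical apart from letter case.

    Returns {casefolded_key: [cleaned spellings]} for keys seen more than once.
    """
    groups = {}
    for name in names:
        cleaned = clean_field(name)
        if cleaned is not None:
            groups.setdefault(cleaned.casefold(), []).append(cleaned)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


if __name__ == "__main__":
    print(clean_field("  Sigur  Ro\u0301s \n"))  # Sigur Rós
    print(case_duplicates(["Bj\u00f6rk", "BJ\u00d6RK ", "Blur"]))
```

Flagging the case duplicates rather than silently merging them leaves the final decision to a person, which matters when two spellings might really be two different acts.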
It would be nice, one day, to work again on the sort of project for which BigDesignUpFront would be applicable (that doesn't, of course, mean that I'd do it; I'm pro-iterative myself). Unfortunately, when you're starting to gather data well in advance of knowing how that data will be used, you need to come up with something that'll handle Big Change. Thus "the database" in our little world doesn't really mean the MySQL repository in which the data's kept. It's MySQL plus thirty or so Python scripts and modules that Do Things To Data.
It's always seemed to me that there are two approaches to database work. I shall, for the purposes of discussion[4], class them as Database-centric and Code-centric development.
The database-centric approach was typified for me by a database developer I worked with at breathe, several years ago. His approach to starting the working day was to (a) sit down at PC, (b) fire up a SQL Server client. And that was him done; everything (and I mean everything) else was done from within the client. List a directory? Do some file copying? It could all, apparently, be done from within the Database That God Gave Him. A nice enough guy, but a tad fixated.
Of course, there are lesser examples of the approach, and there's much to be said for keeping the business logic next to the data in funky stored procedures. But I never liked the languages in which they were written, nor the way in which the rectangular nature of the data forced its way into every corner of the design. I am a dyed-in-the-wool code-centric guy. And, naturally, the code-centric approach works thusly: your database holds your data. Operations on that data are carried out by code, which fetches the data (whether into simple structures, objects that relate to the schema or objects that relate to whatever the hell they want to), Does Stuff to the data and writes the results back to the database.
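The fetch, Do Stuff, write-back cycle described above fits in a few lines of DB-API code. This is an illustrative helper, not the project's module: `apply_to_column` and its parameters are hypothetical, and the demo uses sqlite3 so it runs standalone. With MySQLdb you would pass `placeholder="%s"`.

```python
import sqlite3


def apply_to_column(conn, table, key_col, col, transform, placeholder="?"):
    """Fetch every (key, value) pair, transform the values in Python,
    and write back only the rows whose value actually changed.

    Table and column names are pasted into the SQL text, so they must come
    from trusted code, never from user input. Returns the number of updated rows.
    """
    cur = conn.cursor()
    cur.execute("SELECT %s, %s FROM %s" % (key_col, col, table))
    rows = cur.fetchall()  # read everything first, then update
    update = "UPDATE %s SET %s = %s WHERE %s = %s" % (
        table, col, placeholder, key_col, placeholder)
    changed = 0
    for key, value in rows:
        new_value = transform(value)
        if new_value != value:
            cur.execute(update, (new_value, key))
            changed += 1
    conn.commit()  # commit the batch of updates together
    return changed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE track (id INTEGER PRIMARY KEY, artist TEXT)")
    conn.executemany("INSERT INTO track (artist) VALUES (?)",
                     [("  Blur",), ("Oasis",)])
    conn.commit()
    print(apply_to_column(conn, "track", "id", "artist", str.strip))  # 1
```

Because `transform` is an ordinary Python function, the rule being applied can change from one run to the next without touching the schema, which is the kind of flexibility the code-centric approach is meant to buy.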
The big point for me, though, that favours code-centricity is that it has proven in the past to be more flexible in the ways that matter to me. Maybe it's a sign that I came to databases after high-level programming languages (and to those after assembler). Whatever; I like the code paradigm. And in this case, it meant that I didn't even attempt to create a vast and all-encompassing ERD that encapsulated every last semantic of the data. We just created a base set of simple tables that the data gathering team could begin to populate. Over time, and as prototypes of the game were created and revised, the schema grew to reflect the uses of the data; now it's pretty complex.
So for this project, the phrase "the database" usually refers to the set of tables in the MySQL server plus the code that operates on it. And that's all in Python.
There were a couple of other options open to us. We could have started in Access... but apart from the obvious scale issues, we needed a proper multi-user database that could be efficiently hosted on a remote server, with replication. Also, I'm not a great fan of VB as a development language, though for quick-and-dirty user interface jobs it's great. Slapping together forms to edit data? Access can be the solution you need, using linked tables to get at the MySQL data where necessary. We could have used Java for the main development, but working in a dynamic language has rather spoiled me for Java in odd ways. Most of the supporting code for the database is in a single Python module, and the convenience of firing up a Python interpreter and being able to do ad-hoc processing at the command line was too good to pass up. In fact, as we've approached the last few deadlines there have been many, many audits and sanity checks all run from inside a Python interpreter (often Quasi). The Java development cycle is just that bit too slow, and the definitions of objects a little too fixed, to match that convenience.
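An ad-hoc audit of this sort is often just one small function plus a one-liner at the interpreter prompt. A hedged example, assuming (purely for illustration) that the questions have already been loaded into a dictionary keyed by question id, holding one text per language code; the function name, data shape and language codes are hypothetical:

```python
def missing_translations(questions, languages):
    """Return {question_id: [language codes with no usable text]}.

    `questions` maps a question id to {language_code: text}. The data shape
    and the language codes are hypothetical, not the project's real schema.
    Blank or whitespace-only text counts as missing.
    """
    gaps = {}
    for qid, texts in questions.items():
        missing = [lang for lang in languages
                   if not (texts.get(lang) or "").strip()]
        if missing:
            gaps[qid] = missing
    return gaps


if __name__ == "__main__":
    sample = {
        101: {"en": "Who sang this?", "fr": "Qui chante ?", "de": "Wer singt?"},
        102: {"en": "Name the band.", "fr": "   "},
    }
    print(missing_translations(sample, ["en", "fr", "de"]))  # {102: ['fr', 'de']}
```

At the prompt, something like `len(missing_translations(qs, LANGS))` gives a quick count, where `qs` and `LANGS` (hypothetical names) come from whatever loader the database module provides.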
The Swiss Army Chainsaw
There are many audio files involved in this project, which means that sox has been a complete necessity at times. The same rule applies as with the data: for a small company, even the simplest, fastest manual processing can be impossible when multiplied up by thousands of files. Thus automation's been key here, allowing all the samples to be crossfaded and normalised in batches. For a lot of editing work, though, there is really no substitute for having the waveform visible on a PC, and here, for once, proprietary tools have really won out. Two of us in the company are musicians with home studios, and applications like Cool Edit or Wavelab have done a lot of the work.
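The batch side of the sox work comes down to generating one command line per file. A minimal sketch, assuming a current SoX release (effect syntax in the 2005-era version may differ) and a folder of `.wav` clips; the function names, folder names and default fade and level values are illustrative. It covers per-clip fades and peak normalisation only: a true crossfade between two clips would need SoX's `splice` effect or a mix, which is not shown.

```python
import subprocess
from pathlib import Path


def sox_command(src, dst, stop_seconds, fade_in=0.5, fade_out=1.0, peak_db=-1.0):
    """Build a SoX command that fades a clip in and out, then normalises it.

    `fade t IN STOP OUT` is a linear fade: IN seconds of fade-in, and a
    fade-out of OUT seconds that reaches silence at STOP seconds.
    `norm LEVEL` then scales the result so its peak sits at LEVEL dBFS.
    """
    return ["sox", str(src), str(dst),
            "fade", "t", str(fade_in), str(stop_seconds), str(fade_out),
            "norm", str(peak_db)]


def process_folder(src_dir, dst_dir, stop_seconds, dry_run=True):
    """Build (and, unless dry_run is set, run) one SoX command per .wav file."""
    dst_dir = Path(dst_dir)
    if not dry_run:
        dst_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for src in sorted(Path(src_dir).glob("*.wav")):
        cmd = sox_command(src, dst_dir / src.name, stop_seconds)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first SoX failure
    return commands
```

Calling `process_folder("raw", "clean", 30)` returns the commands for inspection without touching any audio; passing `dry_run=False` executes them. Keeping the dry run as the default makes it cheap to check a batch of a thousand files before committing to it.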
So that's it. Perhaps it's not an in-depth industry exposé of the use of open-source stuff to create the next Duke Nukem or Doom, but there's a real project here resulting in a real game and that's a snapshot of some of the ways we did it. And are continuing to do it... there's still work to do. Back to Eclipse and Query Browser for me...
[0] RSS feeds and individual workloads being what they are, you may well be reading this after E3 2005 has come and gone. In which case, try to imagine yourself back in the heady old days of May 2005. Feel the period atmosphere. Good, isn't it? Do they have flying cars yet in your time?
[1] Yes, the exclamation mark's part of the name. Like Yahoo!
[2] This was way back at the end of 2003. It's taken a long time to get this thing under way, and I might do another blog entry to explain why, and what happened along the way.
[3] True. Yes, there have been quizzes. Yes, there have been quizzes with clips of music (like the DVD Pepsi Challenge game, or the CD-based Spot The Intro). But nobody's ever done one with 1200 tracks on it.
[4] As opposed to the Porpoises of Discursion. I had a small spelling checker issue that was too good to delete entirely...