Ruby Buzz Forum - Fictohedron: Writing Team Novels with the Help of a Spam Filter


Red Handed

Posts: 1158
Nickname: redhanded
Registered: Dec, 2004

Red Handed is a Ruby-focused group blog.


Posted: May 5, 2006 11:04 AM

This post originated from an RSS feed registered with Ruby Buzz by Red Handed.
Original Post: Fictohedron: Writing Team Novels with the Help of a Spam Filter
Feed Title: RedHanded
Feed URL: http://redhanded.hobix.com/index.xml
Feed Description: sneaking Ruby through the system

You feed a pile of books through a spam filter. Half are books you favor. The other half you pointedly dislike. What words and themes would rise? Obviously crossbows. But perhaps toast?

See, this is what is happening with Fictohedron. I feed ten fictional blogs into the filter. (Francis Hwang’s Ten-Sided project, for which I write as well.) And twenty real blogs. Some of the blogs are mainstream stuff (Gawker, Dooce) but most are just plain LiveJournals. Then, we watch the terms peculiar to Ten-Sided float to the top. I take weekly samples, to be sure new terms rise and timely popular terms get weeded out.

Hooking Up the Filter

I am using bogofilter, plus a very short Camping app. I fill up two directories, ham and spam, with blog entries disguised as mail. Then…

 bogofilter -s -B spam/* -d bogo
 bogofilter -n -B ham/* -d bogo
 bogoutil -d bogo/wordlist.db | awk '{print $1}' |
 bogoutil -p bogo/wordlist.db > scores
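
The post does not show how entries get disguised as mail before the commands above run. Here is a minimal Ruby sketch of one way that step might look. The helper names (entry_as_mail, file_entry), the choice of headers, and the .txt file naming are all assumptions for illustration, not the author's code.

```ruby
require "fileutils"
require "time"

# Hypothetical helper: render one blog entry as a minimal mail-like message.
# The three headers chosen here are an assumption; the post does not say
# which headers, if any, were used.
def entry_as_mail(author:, title:, body:, posted_at:)
  headers = [
    "From: #{author}",
    "Subject: #{title}",
    "Date: #{posted_at.rfc2822}"
  ]
  # A blank line separates headers from the body, as in ordinary mail.
  headers.join("\n") + "\n\n" + body.strip + "\n"
end

# Hypothetical helper: write one message into a directory such as
# "ham" or "spam", creating the directory if needed. Returns the file path.
def file_entry(dir, slug, message)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "#{slug}.txt")
  File.write(path, message)
  path
end
```

In this sketch, each Ten-Sided entry would go through file_entry("ham", ...) and each real blog entry through file_entry("spam", ...), leaving one file per entry for the bogofilter commands above to pick up.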

The scores file now contains a list of all the words found and their ratings. Low ratings are good: they indicate hamliness. In fact, the stats you see next to each word on Fictohedron are the number of total mentions and, in parens, the filter’s rating.

Terms are only stored in the database if they occur more than once and have a rating less than 25. There’s enough data to show trends and overall ratings. But I don’t want to make it too busy yet, maybe in two more months when Ten-Sided ends.
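
The storage rule above can be sketched in Ruby. This is only an illustration under stated assumptions: the Term struct, its field names, and the helper names are hypothetical, the ratings are taken to be on the same scale as the 25 cutoff, and parsing the scores file is left out because the post does not show its column layout.

```ruby
# One row per term. The struct and its field names are hypothetical.
Term = Struct.new(:word, :mentions, :rating)

MIN_MENTIONS = 2   # "occur more than once"
MAX_RATING   = 25  # "a rating less than 25"

# True when a term meets both conditions from the post.
def storable?(term)
  term.mentions >= MIN_MENTIONS && term.rating < MAX_RATING
end

# Keep only storable terms, lowest (most ham-like) rating first.
# Ties are broken by mention count, most mentioned first; the tie-break
# is an assumption, not something the post specifies.
def top_terms(terms)
  terms.select { |t| storable?(t) }
       .sort_by { |t| [t.rating, -t.mentions] }
end

# Total mentions, then the rating in parens. The exact layout used on
# the site is assumed.
def label(term)
  "#{term.word} #{term.mentions} (#{term.rating})"
end
```

Given made-up rows such as Term.new("toast", 2, 12) and Term.new("seaside", 1, 5), top_terms would keep toast and drop seaside, since a single mention is not enough no matter how low the rating.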

And This Assists Readers or Writers?

Without the tool, it’s tough for casual readers to sense the common themes between blog entries. You really have to spend time reading each blog, getting to know each character, figuring out how they relate to each other. But the whole point of team writing is to watch the interplay. Wait, which characters ended up in Vegas this week? Who’s seen Aliss and who’s looking for her?

Oh, hey, the word seaside was mentioned twice this week. Look, Toni’s staying at The Golden Chain, a seaside residence. And A.P.’s off to a week-long conference on the seaside. An obvious link. But other connections are more subtle: toast, nurse, pocket. What are people putting in/taking out of their pockets this week?

But I think the tool is even more beneficial to the writers. How do you leave clues for the others without using marquee tags?

If a word is dominating and you can fit it into your story, do it. But twist it, contrarywise.
Mid-week, grab a new word at the bottom of the list and really push it up. Surges like that keep the story from flatlining topically.
Drop a word twice within a week and it’ll likely show up. Use these sparingly or you’ll wash out the natural instincts of the filter.

The best part is that not all terms will get picked up. The spam classifier may find the terms too droll given the state of the corpus. Which means you either push it harder or wind the plot elsewhere.

Collaborating with Bots, etc.

So, what else could be done? We have the underpinnings for seeding some AI here. Or we could use the filter to seek out new writers from the Net at large. It adds a new dynamic to group writing, doesn’t it?

A few things I’d like to see:

Bots which could act as incidental characters in the story, set up new blogs, seek out likeminded kids on LJ/MySpace and coexist with them. Thus tying the fiction to the physical world.
A spider which could continually filter in new bodies of text from published books or Gutenberg, cueing the writers with possible choices of obscure references and footnotes.
Allow readers to easily rate the terms. The writers can then appease the readers by fueling the preferred terms. Or piss off the readers by bringing hated terms to prominence.

It seems like a big goal is to eliminate neutrality in the writing. If the readers don’t care and none of the other writers care, then there’s got to be a way to help it die.

And, I mean, is the story any good? Can we measure that?
