The Artima Developer Community
Ruby Buzz Forum
Fictohedron: Writing Team Novels with the Help of a Spam Filter

Red Handed

Red Handed is a Ruby-focused group blog.
Fictohedron: Writing Team Novels with the Help of a Spam Filter Posted: May 5, 2006 11:04 AM

This post originated from an RSS feed registered with Ruby Buzz by Red Handed.
Original Post: Fictohedron: Writing Team Novels with the Help of a Spam Filter
Feed Title: RedHanded
Feed URL: http://redhanded.hobix.com/index.xml
Feed Description: sneaking Ruby through the system

You feed a pile of books through a spam filter. Half are books you favor. The other half you pointedly dislike. What words and themes would rise? Obviously crossbows. But perhaps toast?

See, this is what is happening with Fictohedron. I feed ten fictional blogs into the filter. (Francis Hwang’s Ten-Sided project, for which I write as well.) And twenty real blogs. Some of the blogs are mainstream stuff (Gawker, Dooce) but most are just plain LiveJournals. Then, we watch the terms peculiar to Ten-Sided float to the top. I take weekly samples, to be sure new terms rise and timely popular terms get weeded out.

Hooking Up the Filter

I am using bogofilter and a very short Camping app. I fill up two directories, ham and spam, with blog entries disguised as mail. Then…

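 # Register everything in spam/ as spam and everything in ham/ as non-spam.
 # -B is bulk mode; -d points at the directory holding the wordlist.
 bogofilter -s -B spam/* -d bogo
 bogofilter -n -B ham/* -d bogo
 # Dump every token, keep just the word column, and ask for each word's score.
 bogoutil -d bogo/wordlist.db | awk '{print $1}' |
 bogoutil -p bogo/wordlist.db > scores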
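
The "disguised as mail" step is not spelled out in the post, so here is a minimal sketch of one way it could go, assuming a few plain headers followed by a blank line and the entry text are enough for the filter to tokenize. The method names (entry_as_mail, file_entry), the header choices, and the .txt file naming are all hypothetical, not taken from the Fictohedron code.

```ruby
require "fileutils"
require "time"

# Wrap one blog entry in minimal mail-style headers.
# The header set is an assumption chosen for illustration.
def entry_as_mail(author:, title:, body:, posted_at:)
  headers = [
    "From: #{author}",
    "Subject: #{title}",
    "Date: #{posted_at.rfc2822}"
  ]
  # A blank line separates headers from the body, as in ordinary mail.
  headers.join("\n") + "\n\n" + body.strip + "\n"
end

# Write the message into ham/ or spam/ under root, one file per entry,
# and return the path that was written.
def file_entry(root, kind, name, message)
  raise ArgumentError, "kind must be ham or spam" unless %w[ham spam].include?(kind)
  dir = File.join(root, kind)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "#{name}.txt")
  File.write(path, message)
  path
end
```

Since low ratings indicate hamliness, the Ten-Sided entries presumably go in ham and the real blogs in spam, but that mapping is an inference from the post rather than something it states.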

The scores file now contains a list of all the words found and their ratings. Low ratings are good: they indicate hamliness. In fact, the stats you see next to each word on Fictohedron are the total number of mentions and, in parens, the filter’s rating.

Terms are only stored in the database if they occur more than once and have a rating less than 25. There’s enough data to show trends and overall ratings, but I don’t want to make it too busy yet. Maybe in two more months, when Ten-Sided ends.
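
The storage rule above could be sketched roughly like this. Several things are assumptions: that each data line of the scores file is four whitespace-separated columns (word, spam count, ham count, rating), that "total mentions" means the two counts added together, and that the rating is on the same scale as the cutoff of 25. Check the column layout against your own bogoutil output before trusting it. Term, parse_scores and keep_terms are hypothetical names.

```ruby
# One term from the scores file. Field meanings are assumptions.
Term = Struct.new(:word, :mentions, :rating)

# Parse lines shaped like: word, spam count, ham count, rating.
# Lines that do not fit that shape (such as a header row) are skipped.
def parse_scores(text)
  text.each_line.map do |line|
    word, spam, ham, rating = line.split
    next unless rating && spam.match?(/\A\d+\z/) && ham.match?(/\A\d+\z/)
    score = Float(rating, exception: false)
    next unless score
    Term.new(word, spam.to_i + ham.to_i, score)
  end.compact
end

# Keep terms mentioned more than once whose rating is under the cutoff,
# most ham-like (lowest rating) first, ties broken alphabetically.
def keep_terms(terms, min_mentions: 2, max_rating: 25.0)
  terms.select { |t| t.mentions >= min_mentions && t.rating < max_rating }
       .sort_by { |t| [t.rating, t.word] }
end
```

If your scores turn out to be probabilities between 0 and 1, pass max_rating: 0.25 instead; the post does not say which scale its 25 refers to.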

And This Assists Readers or Writers?

Without the tool, it’s tough for casual readers to sense the common themes between blog entries. You really have to spend time reading each blog, getting to know each character, figuring out how they relate to each other. But the whole point of team writing is to watch the interplay. Wait, which characters ended up in Vegas this week? Who’s seen Aliss and who’s looking for her?

Oh, hey, the word seaside was mentioned twice this week. Look, Toni’s staying at The Golden Chain, a seaside residence. And A.P.’s off to a week-long conference on the seaside. An obvious link. But other connections are more subtle: toast, nurse, pocket. What are people putting in/taking out of their pockets this week?
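
The seaside link can be illustrated without the spam filter at all. What follows is a plain co-occurrence check, not what bogofilter does: it lists the words that turn up in more than one blog in a given week. The stopword list, the tokenizing regex, and the names tokens and shared_terms are hypothetical.

```ruby
require "set"

# Words too common to be interesting; a tiny illustrative list.
STOPWORDS = Set.new(%w[a an and at the to of on in is for with])

# Lowercased word tokens from one entry, minus stopwords.
def tokens(text)
  text.downcase.scan(/[a-z']+/).reject { |w| STOPWORDS.include?(w) }.to_set
end

# Map each term to the sorted list of blogs that used it this week,
# keeping only terms used by at least min_blogs different blogs.
def shared_terms(entries_by_blog, min_blogs: 2)
  blogs_for = Hash.new { |h, k| h[k] = Set.new }
  entries_by_blog.each do |blog, entries|
    entries.each do |text|
      tokens(text).each { |word| blogs_for[word] << blog }
    end
  end
  blogs_for.select { |_, blogs| blogs.size >= min_blogs }
           .transform_values { |blogs| blogs.to_a.sort }
end
```

A check like this only catches the obvious links. The subtler ones (toast, nurse, pocket) are where the filter's ratings earn their keep, since they weigh a word against the whole corpus of real blogs.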

But I think the tool is even more beneficial to the writers. How do you leave clues for the others without using marquee tags?

  • If a word is dominating and you can fit it into your story, do it. But twist it, contrariwise.
  • Mid-week, grab a new word at the bottom of the list and really push it up. Surges like that keep the story from flatlining topically.
  • Drop a word twice within a week and it’ll likely show up. Use these sparingly or you’ll wash out the natural instincts of the filter.

The best part is that not all terms will get picked up. The spam classifier may find the terms too droll given the state of the corpus. Which means you either push them harder or wind the plot elsewhere.

Collaborating with Bots, etc.

So, what else could be done? We have the underpinnings for seeding some AI here. Or we could use the filter to seek out new writers from the Net at large. It adds a new dynamic to group writing, doesn’t it?

A few things I’d like to see:

  • Bots which could act as incidental characters in the story, set up new blogs, seek out likeminded kids on LJ/MySpace and coexist with them. Thus tying the fiction to the physical world.
  • A spider which could continually filter in new bodies of text from published books or Gutenberg, cueing the writers with possible choices of obscure references and footnotes.
  • Allow readers to easily rate the terms. The writers can then appease the readers by fueling the preferred terms. Or piss off the readers by bringing hated terms to prominence.
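
The reader-rating idea in the last item could start as small as this. It is a sketch under the assumption that a rating is a simple up or down vote per term; nothing like it exists in the post, and tally and ranked are hypothetical names.

```ruby
# Tally reader votes per term. Each vote is [term, 1] or [term, -1].
def tally(votes)
  totals = Hash.new(0)
  votes.each do |term, value|
    raise ArgumentError, "vote must be 1 or -1" unless [1, -1].include?(value)
    totals[term] += value
  end
  totals
end

# Terms ranked from most loved to most hated; ties broken alphabetically.
def ranked(totals)
  totals.sort_by { |term, score| [-score, term] }
end
```

Writers who want to appease would read the ranking from the top; writers who want to annoy would read it from the bottom.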

It seems like a big goal is to eliminate neutrality in the writing. If the readers don’t care and none of the other writers care, then there’s got to be a way to help it die.

And, I mean, is the story any good? Can we measure that?
