Python Community Server has a search engine now

Phillip Pearson


Phillip Pearson is a Python hacker from New Zealand
Posted: Oct 14, 2003 4:59 AM

This post originated from an RSS feed registered with Python Buzz by Phillip Pearson.
Original Post: Python Community Server has a search engine now
Feed Title: Second p0st
Feed URL: http://www.myelin.co.nz/post/rss.xml
Feed Description: Tech notes and web hackery from the guy that brought you bzero, Python Community Server, the Blogging Ecosystem and the Internet Topic Exchange
OK, I finally got a search engine working with Python Community Server. It's a very simple one, but quite effective. I've wasted a lot of time trying to integrate first ht://Dig (in C++) and then Lucene (in Java) into my Python code ... but this one works. It does a simple linear search through a table of cached posts. You might think that would be slow, but informal benchmarks here suggest it can walk through about 20,000 posts per second. If you post five times a day, every day, it'll take about ten years before a search of your blog takes a second... so I'm not too worried.
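To make the idea concrete, here is a minimal sketch of a linear scan plus the back-of-the-envelope arithmetic. It is not the actual PyCS code: the in-memory list of dicts, the function names and the case-insensitive substring match are all assumptions made for illustration. Only the 20,000 posts per second figure comes from the informal benchmark above.

```python
def linear_search(cached_posts, term):
    """Return every cached post whose title or description contains term.

    Assumption: posts are dicts held in memory; the real server keeps
    them in a table, and its matching rules may differ.
    """
    needle = term.lower()
    return [
        post for post in cached_posts
        if needle in post["title"].lower()
        or needle in post["description"].lower()
    ]


def years_until_one_second(posts_per_day, posts_scanned_per_second=20_000):
    """Years of blogging before a full scan takes about one second."""
    return posts_scanned_per_second / (posts_per_day * 365)


# Hypothetical sample data, just to exercise the functions.
cached_posts = [
    {"title": "Search engine", "description": "PyCS can search posts now"},
    {"title": "Second post", "description": "Nothing to see here"},
]
hits = linear_search(cached_posts, "search")
years = years_until_one_second(5)
```

At five posts a day that works out to roughly eleven years, which is in line with the ten-year estimate.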

Like the rest of PyCS, it's designed to work with 'distributed' blogging tools like Radio. These have the annoying property that the server usually doesn't know anything about the text of your posts -- it only ever sees the rendered HTML. To get around this, I've added an XML-RPC method that the client needs to call after posting. It sends the content of the post to the community server so it can be searched.

    struct xmlStorageSystem.mirrorPosts(email, password, posts)

The first two parameters are the same as for the other xmlStorageSystem methods. The last one is a list of structs, one for each post you want to put in the index. (If you've just made a new post, there will only be one entry in the list, but if you're initialising the server with a few hundred old posts, you'll want to send them in batches of 10.) Each struct needs the following members:

  date: an XML-RPC DateTime value; this is the date of the post.

  postid: a string. This is used to determine if you have sent the same post before (in which case it will be updated).

  url: a string; the permalink of the post. This is used on the results page.

  guid: currently unused; expected to be the same as the url parameter, but if you have some other sort of GUID for your posts (e.g. what some Movable Type users put in their RSS), you could put it here.

  title: a string; the title of your post. This is shown on the results page.

  description: a string; the content of your post. This is shown on the results page.
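A client-side call might look something like the sketch below, using Python's standard xmlrpc.client module. The helper names, the batch loop and the sample values are my own assumptions; only the method name, the parameter order and the struct members come from the description above. The endpoint URL in the comment is a placeholder, not a real server address.

```python
import datetime
import xmlrpc.client


def make_post_struct(postid, url, title, description, when, guid=None):
    """Build one struct with the members mirrorPosts expects."""
    return {
        "date": xmlrpc.client.DateTime(when),   # XML-RPC DateTime value
        "postid": postid,
        "url": url,
        # guid is currently unused; default it to the permalink.
        "guid": guid if guid is not None else url,
        "title": title,
        "description": description,
    }


def mirror_posts(server, email, password, posts, batch_size=10):
    """Send posts to the server in batches of batch_size (10 by default)."""
    results = []
    for start in range(0, len(posts), batch_size):
        batch = posts[start:start + batch_size]
        results.append(
            server.xmlStorageSystem.mirrorPosts(email, password, batch)
        )
    return results


# Hypothetical sample post.
example = make_post_struct(
    postid="1",
    url="http://example.com/post/1",
    title="Hello",
    description="First post",
    when=datetime.datetime(2003, 10, 14, 4, 59),
)

# Usage (placeholder endpoint and credentials, not real ones):
# server = xmlrpc.client.ServerProxy("http://your-pycs-server.example/RPC2")
# mirror_posts(server, "you@example.com", "password", [example])
```

Passing the server proxy in as an argument keeps the batching logic easy to try out against a stand-in object before pointing it at a live server.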

You can search from the /system/search.py page; for example, try searching the pycs-devel mailing list archive blog. It doesn't do phrase searches, but it does let you mark terms as required (+) or excluded (-). Here are some examples:

  search: georg +xml-rpc -- Georg talking about XML-RPC

  search: georg -bauer -- people talking to Georg

  search: phil -pearson -- people talking to me
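Here is one way the + and - syntax in those examples could be interpreted, as a rough sketch only. The function names are made up, and the matching rules are assumptions rather than the server's actual behaviour: case-insensitive substring matching, with a post matching when it contains every + term, none of the - terms, and at least one plain term if any were given.

```python
def parse_query(query):
    """Split a query into (required, excluded, plain) lists of terms."""
    required, excluded, plain = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            plain.append(token)
    return required, excluded, plain


def matches(text, query):
    """Check one post's text against a query (assumed semantics)."""
    required, excluded, plain = parse_query(query)
    haystack = text.lower()
    if any(term in haystack for term in excluded):
        return False
    if not all(term in haystack for term in required):
        return False
    # With no plain terms, the +/- terms alone decide the match.
    return not plain or any(term in haystack for term in plain)
```

Only the leading character of a token is treated as an operator, so a term like xml-rpc keeps its internal hyphen.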

Right now the only tool supporting this is my bzero. If somebody wants to hack Radio or PyDS support in, that would be greatly appreciated. I might do Radio myself if I have time...
