This post originated from an RSS feed registered with Python Buzz by Phillip Pearson.
Original Post: Something I'd like to see: likesearchd
Feed Title: Second p0st
Feed URL: http://www.myelin.co.nz/post/rss.xml
Feed Description: Tech notes and web hackery from the guy that brought you bzero, Python Community Server, the Blogging Ecosystem and the Internet Topic Exchange
This would be really handy: a server, written in C or C++, that does very fast substring searches over lots of data. MySQL seems to be very slow at queries like the one below, probably because the data is spread out over many memory pages and disk blocks (and a LIKE pattern that starts with % can't use an ordinary index, so every row has to be examined):
SELECT some columns FROM sometable WHERE foo LIKE '%bar%';
For a table with 100K rows, that can take many seconds. But if I dump the foo column to a file with SELECT ... INTO OUTFILE and write a C program that loops over the data and checks each value for 'bar' with strstr(), it can run the same search many times per second.
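The dump-and-scan approach can be sketched in C roughly as follows. This is a minimal sketch, not the program from the post: it assumes the foo column has already been read into memory as newline-separated text (one value per line, no embedded newlines or escape sequences), and the names column_t, column_load(), column_search() and column_free() are hypothetical. Note that strstr() matches case-sensitively and treats % and _ literally, whereas LIKE under MySQL's default case-insensitive collations ignores case.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory copy of one column, e.g. foo dumped with
 * SELECT foo INTO OUTFILE ... (assumes one value per line, no embedded
 * newlines, no escape sequences). */
typedef struct {
    char *data;    /* owned copy of the dump, each '\n' replaced by '\0' */
    char **rows;   /* rows[i] points at the start of row i inside data */
    size_t nrows;
} column_t;

/* Split a newline-separated dump into NUL-terminated rows.
 * Returns 0 on success, -1 if memory allocation fails. */
int column_load(column_t *col, const char *dump, size_t len)
{
    size_t n = 0, r = 0;
    char *start;

    col->data = malloc(len + 1);
    if (col->data == NULL)
        return -1;
    memcpy(col->data, dump, len);
    col->data[len] = '\0';

    /* One row per '\n', plus a final row if the dump lacks a trailing newline. */
    for (size_t i = 0; i < len; i++)
        if (col->data[i] == '\n')
            n++;
    if (len > 0 && col->data[len - 1] != '\n')
        n++;

    col->rows = malloc((n > 0 ? n : 1) * sizeof *col->rows);
    if (col->rows == NULL) {
        free(col->data);
        col->data = NULL;
        return -1;
    }

    /* Terminate each row in place and remember where it starts. */
    start = col->data;
    for (size_t i = 0; i < len; i++) {
        if (col->data[i] == '\n') {
            col->data[i] = '\0';
            col->rows[r++] = start;
            start = col->data + i + 1;
        }
    }
    if (start < col->data + len)
        col->rows[r++] = start;

    col->nrows = r;
    return 0;
}

/* Rough equivalent of WHERE foo LIKE '%needle%': writes the indices of up to
 * max matching rows into out and returns the total number of matches. */
size_t column_search(const column_t *col, const char *needle,
                     size_t *out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < col->nrows; i++) {
        if (strstr(col->rows[i], needle) != NULL) {
            if (found < max)
                out[found] = i;
            found++;
        }
    }
    return found;
}

/* Release the memory owned by a loaded column. */
void column_free(column_t *col)
{
    free(col->rows);
    free(col->data);
    col->rows = NULL;
    col->data = NULL;
    col->nrows = 0;
}
```

A likesearchd-style server would presumably load the column once and keep it resident, so each query costs only the column_search() loop: a single linear pass over contiguous memory.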
Surprisingly, MySQL even seems slow at a query like this one, on a table with an index on (lastname, firstname):
SELECT lastname,firstname FROM people ORDER BY lastname,firstname LIMIT 10000,50;
That query can take over a second, so it would be much faster to have the C program find the ids of rows 10000-10049 and run this query 50 times:
SELECT lastname,firstname FROM people WHERE id=(id received from C program);
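Finding the ids for a LIMIT 10000,50 page could look something like the sketch below, assuming the C program holds an in-memory copy of (id, lastname, firstname) for every row. The person_t type and the people_sort() and people_page_ids() helpers are hypothetical. strcmp() compares bytes, so the order can differ from MySQL's collation-aware ORDER BY for mixed-case or accented names, and the id tie-break is an added assumption to make the order deterministic.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory copy of one row of the people table. */
typedef struct {
    long id;
    const char *lastname;
    const char *firstname;
} person_t;

/* Order like ORDER BY lastname, firstname (byte-wise), breaking ties by id. */
static int person_cmp(const void *a, const void *b)
{
    const person_t *p = a, *q = b;
    int c = strcmp(p->lastname, q->lastname);
    if (c != 0)
        return c;
    c = strcmp(p->firstname, q->firstname);
    if (c != 0)
        return c;
    return (p->id > q->id) - (p->id < q->id);
}

/* Sort the whole table once, up front. */
void people_sort(person_t *people, size_t n)
{
    qsort(people, n, sizeof *people, person_cmp);
}

/* Equivalent of LIMIT offset,count on the sorted array: copies the ids of
 * rows offset .. offset+count-1 into ids_out and returns how many were
 * written (fewer than count near the end of the table, 0 past it). */
size_t people_page_ids(const person_t *sorted, size_t n,
                       size_t offset, size_t count, long *ids_out)
{
    if (offset >= n)
        return 0;
    if (count > n - offset)
        count = n - offset;
    for (size_t i = 0; i < count; i++)
        ids_out[i] = sorted[offset + i].id;
    return count;
}
```

After the one-off sort, people_page_ids(sorted, n, 10000, 50, ids) is just a copy out of an array slice, and each id it returns can then be plugged into the per-id SELECT above.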
This all feels very wrong to me: there should be a better way to do this from inside MySQL, but I can't find it. Anyone know of one?