The Artima Developer Community

Python Buzz Forum
How MySQL fails to optimise LIMIT
Phillip Pearson


Phillip Pearson is a Python hacker from New Zealand
How MySQL fails to optimise LIMIT Posted: Feb 10, 2005 3:35 PM

This post originated from an RSS feed registered with Python Buzz by Phillip Pearson.
Original Post: How MySQL fails to optimise LIMIT
Feed Title: Second p0st
Feed URL: http://www.myelin.co.nz/post/rss.xml
Feed Description: Tech notes and web hackery from the guy that brought you bzero, Python Community Server, the Blogging Ecosystem and the Internet Topic Exchange

As promised, today I'm going to talk about MySQL's LIMIT keyword, and how, even for simple queries, it's not as fast as you might expect.

First, read this note on what does get optimised. The general trend is that if you use LIMIT, MySQL will do as much work as it needs to get the rows you have asked for, and then stop. So if you do a query like this:

SELECT * FROM foo LIMIT 50

... then it will grab the first 50 rows, and stop. Fast!

If you have a table with an index on column a, this will be fast:

SELECT * FROM foo ORDER BY a LIMIT 50

... and so will this:

SELECT * FROM foo ORDER BY a DESC LIMIT 50

However, if foo has 100,000 rows, this query will read almost the entire table:

SELECT * FROM foo ORDER BY a LIMIT 99950,50

To answer it, MySQL will read and discard 99950 rows (or at least that many index entries), then read the 50 you asked for and return them. So if you want the last 50 rows in a big table like this, it's quicker to run this query, then reverse the results in your application:

SELECT * FROM foo ORDER BY a DESC LIMIT 50

phpBB uses this technique [archive.org cache] to speed up fetching the last 50 posts in a very long thread -- something that people want to do quite a lot!
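To see the flip-in-the-application step concretely, here's a small runnable sketch using Python's built-in sqlite3 module as a stand-in for MySQL (the table, column names and data are all invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, a INTEGER)")
# 1000 rows with distinct values in a, so the sort order is unambiguous.
conn.executemany("INSERT INTO foo (a) VALUES (?)",
                 [((i * 7) % 1000,) for i in range(1000)])
conn.execute("CREATE INDEX idx_foo_a ON foo (a)")

# Slow way: the database walks past 950 rows to reach the last 50.
slow = conn.execute("SELECT id, a FROM foo ORDER BY a LIMIT 950, 50").fetchall()

# Fast way: grab the last 50 in descending order, then flip them back.
fast = conn.execute("SELECT id, a FROM foo ORDER BY a DESC LIMIT 50").fetchall()
fast.reverse()

assert fast == slow
```

sqlite happens to accept the MySQL-style `LIMIT offset, count` form, which keeps the demo close to the queries above; the speed difference only shows up on a real MySQL table of meaningful size.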

Of course, this is just as slow as the first way if you want to fetch 50 rows out of the middle of a table. So what can you do?

One way is to build your own index:

CREATE TABLE foo_a_index (seq INT NOT NULL AUTO_INCREMENT, PRIMARY KEY(seq), foo_id INT NOT NULL) SELECT NULL AS seq, id AS foo_id FROM foo ORDER BY a

Now you can do really quick queries into foo, by joining with foo_a_index:

SELECT foo.* FROM foo_a_index, foo
WHERE foo_a_index.seq >= 50000 AND foo_a_index.seq < 50050 AND foo.id=foo_a_index.foo_id
ORDER BY foo_a_index.seq
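Here's the same idea as a runnable sketch, again with sqlite3 standing in for MySQL; since sqlite doesn't support MySQL's CREATE TABLE ... SELECT trick, the seq numbers are assigned in Python (the names are the hypothetical ones from the queries above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, a INTEGER)")
conn.executemany("INSERT INTO foo (a) VALUES (?)",
                 [((i * 7) % 1000,) for i in range(1000)])

# Hand-built index: seq is a dense 1..N ranking of foo's rows by column a.
conn.execute("CREATE TABLE foo_a_index (seq INTEGER PRIMARY KEY, foo_id INTEGER NOT NULL)")
ordered = conn.execute("SELECT id FROM foo ORDER BY a").fetchall()
conn.executemany("INSERT INTO foo_a_index (seq, foo_id) VALUES (?, ?)",
                 [(seq, foo_id) for seq, (foo_id,) in enumerate(ordered, start=1)])

# Jump straight into the middle of foo with a range scan on seq.
page = conn.execute(
    "SELECT foo.id, foo.a FROM foo_a_index JOIN foo ON foo.id = foo_a_index.foo_id"
    " WHERE foo_a_index.seq >= 501 AND foo_a_index.seq < 551"
    " ORDER BY foo_a_index.seq").fetchall()

# Same 50 rows as the offset query, without walking the first 500 entries.
assert page == conn.execute("SELECT id, a FROM foo ORDER BY a LIMIT 500, 50").fetchall()
```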

The downside is that you now need to rebuild foo_a_index every time foo changes. If foo changes a lot, you might want to run the rebuild on a 15-minute timer or similar, and make it look more like this:

CREATE TABLE foo_a_index_new LIKE foo_a_index;
INSERT INTO foo_a_index_new (foo_id) SELECT id FROM foo ORDER BY a;
RENAME TABLE foo_a_index TO foo_a_index_old, foo_a_index_new TO foo_a_index;
DROP TABLE foo_a_index_old;

Using RENAME TABLE in this way guarantees that you always have a foo_a_index table for people to query.
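A rebuild job along those lines might look like this (a sketch only: sqlite3 stands in for MySQL, and since sqlite lacks MySQL's multi-table RENAME TABLE, the swap here is two separate ALTERs rather than one atomic statement):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, a INTEGER)")
conn.execute("CREATE TABLE foo_a_index (seq INTEGER PRIMARY KEY, foo_id INTEGER NOT NULL)")

def rebuild_foo_a_index(conn):
    """Rebuild the hand-made index off to the side, then swap it into place."""
    conn.execute("DROP TABLE IF EXISTS foo_a_index_new")
    conn.execute("CREATE TABLE foo_a_index_new (seq INTEGER PRIMARY KEY, foo_id INTEGER NOT NULL)")
    ordered = conn.execute("SELECT id FROM foo ORDER BY a").fetchall()
    conn.executemany("INSERT INTO foo_a_index_new (seq, foo_id) VALUES (?, ?)",
                     [(s, fid) for s, (fid,) in enumerate(ordered, start=1)])
    # MySQL's RENAME TABLE a TO b, c TO a swaps both names in one atomic step;
    # sqlite needs two ALTERs, so readers could briefly see no foo_a_index here.
    conn.execute("ALTER TABLE foo_a_index RENAME TO foo_a_index_old")
    conn.execute("ALTER TABLE foo_a_index_new RENAME TO foo_a_index")
    conn.execute("DROP TABLE foo_a_index_old")

conn.executemany("INSERT INTO foo (a) VALUES (?)", [(30,), (10,), (20,)])
rebuild_foo_a_index(conn)
# ids 2 (a=10), 3 (a=20), 1 (a=30), in order of a.
assert conn.execute("SELECT seq, foo_id FROM foo_a_index ORDER BY seq").fetchall() == [(1, 2), (2, 3), (3, 1)]
```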

If you want to index more than one thing -- say, you want to query lists of people in foo based on what country they are from -- you can do pretty much the same thing. If you don't mind a bit more code, you can create the index table like this:

CREATE TABLE foo_a_index_new (country_id INT NOT NULL, seq INT NOT NULL, PRIMARY KEY(country_id, seq), foo_id INT NOT NULL)

... and then use a for loop in your favourite language to populate the index. If you don't mind a bit more SQL, you can make seq an AUTO_INCREMENT column and populate it like this:

INSERT INTO foo_a_index_new (country_id, foo_id) SELECT country_id, id FROM foo ORDER BY country_id, a
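The for-loop version has the advantage that each country's seq can restart at 1, so you always know where a page starts. A sketch with sqlite3 standing in for MySQL (schema and data invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, country_id INTEGER, a INTEGER)")
conn.executemany("INSERT INTO foo (country_id, a) VALUES (?, ?)",
                 [(i % 3, (i * 7) % 100) for i in range(100)])

conn.execute("CREATE TABLE foo_a_index (country_id INTEGER NOT NULL, seq INTEGER NOT NULL,"
             " foo_id INTEGER NOT NULL, PRIMARY KEY (country_id, seq))")

# Restart seq at 1 for each country, so page boundaries are predictable.
counters = {}
rows = conn.execute("SELECT country_id, id FROM foo ORDER BY country_id, a").fetchall()
for country_id, foo_id in rows:
    counters[country_id] = seq = counters.get(country_id, 0) + 1
    conn.execute("INSERT INTO foo_a_index (country_id, seq, foo_id) VALUES (?, ?, ?)",
                 (country_id, seq, foo_id))

# Second page of 10 for country 1, straight off the (country_id, seq) primary key.
page = conn.execute(
    "SELECT foo.id FROM foo_a_index JOIN foo ON foo.id = foo_a_index.foo_id"
    " WHERE foo_a_index.country_id = 1 AND foo_a_index.seq BETWEEN 11 AND 20"
    " ORDER BY foo_a_index.seq").fetchall()

assert page == conn.execute(
    "SELECT id FROM foo WHERE country_id = 1 ORDER BY a LIMIT 10, 10").fetchall()
```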

Note that this will result in most countries having seq values that start somewhere after 1, so your query ends up looking like this:

SELECT @minseq := MIN(seq) FROM foo_a_index WHERE country_id=123;
SELECT foo.* FROM foo, foo_a_index
WHERE foo.id=foo_a_index.foo_id AND foo_a_index.country_id=123
  AND foo_a_index.seq >= @minseq+50000 AND foo_a_index.seq < @minseq+50050
ORDER BY foo_a_index.seq;

Notes

Thanks to David Wilson for suggesting the use of a range query on a temporary table, as described above.
