The Artima Developer Community
Sponsored Link

Python Buzz Forum
"sexeger"[::-1]

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Simon Willison

Posts: 282
Nickname: simonw
Registered: Jun, 2003

Simon Willison is a web technology enthusiast studying for a Computer Science degree at Bath Uni, UK
"sexeger"[::-1] Posted: Sep 16, 2003 6:30 PM
Reply to this message Reply

This post originated from an RSS feed registered with Python Buzz by Simon Willison.
Original Post: "sexeger"[::-1]
Feed Title: Simon Willison: Python
Feed URL: http://simon.incutio.com/syndicate/python/rss1.0
Feed Description: Simon Willison's Python cateory
Latest Python Buzz Posts
Latest Python Buzz Posts by Simon Willison
Latest Posts From Simon Willison: Python

Advertisement

Via Ned Batchelder, an article on Reversing Regular Expressions from Perl.com. Otherwise known as Sexeger, these offer a performance boost over normal regular expressions for certain tasks. The basic idea is pretty simple: searching backwards through a string using a regular expression can be a messy business, but by reversing both the string and the expression, running it, then reversing the result far better performance can be achieved (reversing a string is a relatively inexpensive operation). The example code is in Perl, but I couldn't resist trying it in Python. The challenge is to find the last number occurring in a string.

>>> import re
>>> lastnum = re.compile(r'(\d+)(?!\D*\d)')
>>> s = ' this isa 454 asd very very 
  very long strin9 asd9 009 76 with numbers 
  99 in it and here is the last 537 number'
  # NB this was all on one line originally
>>> lastnum.search(s).group(0)
'537'
>>> import timeit
>>> t1 = timeit.Timer('lastnum.search(s).group(0)', 
         'from __main__ import lastnum, s')
>>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=100000)/100000)
26.82 usec/pass
>>> lastnumrev = re.compile('(\d+)')
>>> lastnumrev.search(s[::-1]).group(0)[::-1]
'537'
>>> t2 = timeit.Timer('lastnumrev.search(s[::-1]).group(0)[::-1]', 
         'from __main__ import lastnumrev, s')
>>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=100000)/100000)
9.26 usec/pass

There are a few points worth explaining in the above code. The (?!\D*\d) part of the first regular expression is a negative lookahead assertion - it basically means "match the subpattern provided it isn't followed by a string of non-digits followed by at least one digit. This is the bit that ensures we only get back the last digit in the string, and is also the bit that could cause a performance problem.

'some string'[::-1] is an example of Extended Slices, introduced in Python 2.3. Its affect is to reverse the string, by stepping through it from start to end going back one character at a time.

The actual benchmarking code makes use of the new timeit module from Python 2.3 - I copied it verbatim from that module's example section in the manual.

The resutls speak for themselves: 26.82 for the lookahead assertion expression compared to just 9.26 for the reversed regular expression. This is definitely a useful trick to add to the tool box.

Read: "sexeger"[::-1]

Topic: Sweden says no to euro Previous Topic   Next Topic Topic: Google In-Depth

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use