The UIUC VW Wiki got spammed yesterday - well over a hundred pages. When it's a handful, I fix them manually (unless someone beats me to it). Yesterday's attack was still sitting out there, though, and far too big to fix by hand, so I sat down and wrote some workspace scripts. I grabbed the page source for all the modified pages on Recent Changes and stuffed it into a collection - it looked like this, but bigger:
strings := #(
'<A href="/VisualWorks/VisualWorks+WebServer+-+history">VisualWorks WebServer - history</A> 18:20:49 (ah1-p4id-56.advancedhosters.com)'
...
).
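If I'd wanted to script that grab too, something along these lines would have done it - note that the RECENT address is only a placeholder (I didn't note the wiki's real Recent Changes URL), and that it collects every /VisualWorks/ anchor on the page, not just the spammed ones:
strings := OrderedCollection new.
"Placeholder address for Recent Changes - substitute the wiki's real URL"
recent := (HttpClient new get: 'http://wiki.cs.uiuc.edu/VisualWorks/RECENT') contents readStream.
[recent atEnd] whileFalse: [
    "Skip to the next wiki link and keep the whole anchor"
    recent throughAll: '<A href="/VisualWorks/'.
    recent atEnd ifFalse: [
        strings add: '<A href="/VisualWorks/', (recent upTo: $<), '</A>']].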
From there, it was a matter of finding the right version of each page to revert to. This little snippet pulled the URLs out of that mess:
urls := OrderedCollection new.
base := 'http://wiki.cs.uiuc.edu'.
wiki := '/VisualWorks'.
strings do: [:each |
    | stream url |
    "The href is the first double-quoted attribute on each line"
    stream := each readStream.
    stream through: $".
    url := stream upTo: $".
    urls add: url].
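Run against the sample line above, that pulls out hrefs like '/VisualWorks/VisualWorks+WebServer+-+history'.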
From those, I created the page history URLs for each spammed page:
histUrls := OrderedCollection new.
urls do: [:each |
    | url |
    "Strip just the leading '/VisualWorks' prefix (page names here can contain
     'VisualWorks' themselves), then build the HISTORY url for the page"
    url := base, wiki, '/HISTORY', (each copyFrom: wiki size + 1 to: each size).
    histUrls add: url].
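For that same sample page, the history URL comes out as 'http://wiki.cs.uiuc.edu/VisualWorks/HISTORY/VisualWorks+WebServer+-+history'.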
Then I grabbed each history page, scanned down to the second "VERSION" string, pulled out the good version number, and built the URL that would restore the page to the way it should have been. I added a delay so that I wasn't mounting a DoS attack on the server:
fixUrls := OrderedCollection new.
histUrls do: [:each |
    | content stream next num url tail |
    Transcript show: 'Getting: ', each; cr.
    content := (HttpClient new get: each) contents.
    stream := content readStream.
    "Skip to the second VERSION link - that's the last good version of the page"
    stream throughAll: 'VERSION/'.
    stream throughAll: 'VERSION/'.
    stream atEnd ifFalse: [
        next := stream upTo: $/.
        num := next asNumber.
        tail := (UnixFilename named: each) tail.
        url := base, wiki, '/PROMOTE/', num printString, '/', tail.
        fixUrls add: url].
    "Be polite: don't hammer the server"
    (Delay forSeconds: 1) wait].
Now, with the set of "fix" URLs in hand, I just ran each of them - another delay for the same reason, and a handler for HTTP exceptions, so that I could collect any pages that didn't get fixed due to transient network errors.
missed := OrderedCollection new.
fixUrls do: [:each |
    Transcript show: 'Fixing: ', each; cr.
    [HttpClient new get: each]
        on: HttpException
        do: [:ex |
            Transcript show: 'Could not do: ', each; cr.
            missed add: each.
            ex return].
    (Delay forSeconds: 1) wait].
Then it was simply rinse and repeat for anything that got missed (a sketch of that retry pass is below). All the spammed pages there have been restored, and I didn't have to manually visit each one.
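That retry pass is just the same loop run over the missed collection - a minimal sketch, reusing the same HttpClient, HttpException, and Delay as above; the stillMissed temporary is my name for it, not something from the original workspace:
"Sketch of the retry pass: same loop, run over whatever was missed"
stillMissed := OrderedCollection new.
missed do: [:each |
    Transcript show: 'Retrying: ', each; cr.
    [HttpClient new get: each]
        on: HttpException
        do: [:ex |
            "Anything that fails again goes back on the list for another pass"
            stillMissed add: each.
            ex return].
    (Delay forSeconds: 1) wait].
missed := stillMissed.
Run it again until missed comes back empty.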