The UIUC VW Wiki got spammed yesterday - well over a hundred pages. When it's a handful, I fix them manually (unless someone beats me to it). Yesterday's attack was still sitting out there, though, and far too big to fix by hand, so I sat down and wrote some workspace scripts. I grabbed the page source for all the modified pages on Recent Changes and stuffed it into a collection - it looked like this, but bigger:
strings := #(
'<A href="/VisualWorks/VisualWorks+WebServer+-+history">VisualWorks WebServer - history</A> 18:20:49 (ah1-p4id-56.advancedhosters.com)'
...
).
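If I'd wanted to script that grab too, something along these lines would have done it - note that the RECENT address is only a placeholder (I didn't note the wiki's real Recent Changes URL), and that it collects every /VisualWorks/ anchor on the page, not just the spammed ones:
strings := OrderedCollection new.
"Placeholder address for Recent Changes - substitute the wiki's real URL"
recent := (HttpClient new get: 'http://wiki.cs.uiuc.edu/VisualWorks/RECENT') contents readStream.
[recent atEnd] whileFalse: [
    "Skip to the next wiki link and keep the whole anchor"
    recent throughAll: '<A href="/VisualWorks/'.
    recent atEnd ifFalse: [
        strings add: '<A href="/VisualWorks/', (recent upTo: $<), '</A>']].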
From there, it was a matter of finding the right version of each page to revert to. This little snippet pulled the URLs out of that mess:
urls := OrderedCollection new.
base := 'http://wiki.cs.uiuc.edu'.
wiki := '/VisualWorks'.
strings do: [:each |
    | stream url |
    "The href is the first double-quoted attribute on each line"
    stream := each readStream.
    stream through: $".
    url := stream upTo: $".
    urls add: url].
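Run against the sample line above, that pulls out hrefs like '/VisualWorks/VisualWorks+WebServer+-+history'.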
From those, I created the page history URLs for each spammed page:
histUrls := OrderedCollection new.
urls do: [:each |
    | url |
    "Strip just the leading '/VisualWorks' prefix (page names here can contain
     'VisualWorks' themselves), then build the HISTORY url for the page"
    url := base, wiki, '/HISTORY', (each copyFrom: wiki size + 1 to: each size).
    histUrls add: url].
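For that same sample page, the history URL comes out as 'http://wiki.cs.uiuc.edu/VisualWorks/HISTORY/VisualWorks+WebServer+-+history'.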
Then I grabbed each history page, scanned down to the second "VERSION" string, pulled out the good version number, and built the URL that would restore the page to the way it should have been. I added a delay so that I wasn't mounting a DoS attack on the server:
fixUrls := OrderedCollection new.
histUrls do: [:each |
    | content stream next num url tail |
    Transcript show: 'Getting: ', each; cr.
    content := (HttpClient new get: each) contents.
    stream := content readStream.
    "Skip to the second VERSION link - that's the last good version of the page"
    stream throughAll: 'VERSION/'.
    stream throughAll: 'VERSION/'.
    stream atEnd ifFalse: [
        next := stream upTo: $/.
        num := next asNumber.
        tail := (UnixFilename named: each) tail.
        url := base, wiki, '/PROMOTE/', num printString, '/', tail.
        fixUrls add: url].
    "Be polite: don't hammer the server"
    (Delay forSeconds: 1) wait].
Now, with the set of "fix" URLs in hand, I just ran each of them - another delay for the same reason, and a handler for HTTP exceptions, so that I could collect any pages that didn't get fixed due to transient network errors.
missed := OrderedCollection new.
fixUrls do: [:each |
    Transcript show: 'Fixing: ', each; cr.
    [HttpClient new get: each]
        on: HttpException
        do: [:ex |
            Transcript show: 'Could not do: ', each; cr.
            missed add: each.
            ex return].
    (Delay forSeconds: 1) wait].
Then it was simply rinse and repeat for anything that got missed (a sketch of that retry pass is below). All the spammed pages there have been restored, and I didn't have to manually visit each one.
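That retry pass is just the same loop run over the missed collection - a minimal sketch, reusing the same HttpClient, HttpException, and Delay as above; the stillMissed temporary is my name for it, not something from the original workspace:
"Sketch of the retry pass: same loop, run over whatever was missed"
stillMissed := OrderedCollection new.
missed do: [:each |
    Transcript show: 'Retrying: ', each; cr.
    [HttpClient new get: each]
        on: HttpException
        do: [:ex |
            "Anything that fails again goes back on the list for another pass"
            stillMissed add: each.
            ex return].
    (Delay forSeconds: 1) wait].
missed := stillMissed.
Run it again until missed comes back empty.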