Java Community News - Domas Mituzas on the Wikipedia Architecture


Frank Sommers



Posted: Oct 26, 2007 5:49 PM

Summary
In a recent presentation at the MySQL users conference, Domas Mituzas explains Wikipedia's architecture.

As the eighth-busiest Web site, Wikipedia is unusual in that it relies mostly on free, open-source software for its highly available infrastructure. At the 2007 MySQL users conference, MySQL's Domas Mituzas, who also works with Wikipedia, gave a presentation on Wikipedia's scalable architecture. The presentation is titled "Wikipedia: Site Internals, Configuration, Code Examples and Management Issues."

Mituzas points out that:

The principle of openness forced all operation to use free & open-source software only. Having commercial alternatives out of question, Wikipedia had the challenging task to build efficient platform of freely available components...

Wikipedia’s primary aim is to provide a platform for building collaborative compendium of knowledge. Due to different kind of funding (it is mostly donation driven), performance and efficiency has been prioritized above high availability or security of operation.

Mituzas highlights the key elements of Wikipedia's architecture:

Linux - operating system (Fedora, Ubuntu)

PowerDNS - geo-based request distribution

LVS - used for distributing requests to cache and application servers

Squid - content acceleration and distribution

lighttpd - static file serving

Apache - application HTTP server

PHP5 - Core language

MediaWiki - main application

Lucene, Mono - search

Memcached - various object caching
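As a rough illustration of how these layers might fit together, the sketch below models a read request that first checks an HTTP cache (the role Squid plays in the list above) and, on a miss, falls through to an application that consults an object cache (the role Memcached plays) before rendering the page. The class name, the method names, and the exact flow are hypothetical simplifications assumed for illustration; they are not taken from the presentation or from MediaWiki.

```python
# Toy model of a layered read path; names and flow are illustrative assumptions.

class LayeredSite:
    def __init__(self, render):
        self.http_cache = {}     # stands in for the Squid layer: URI -> full page
        self.object_cache = {}   # stands in for Memcached: URI -> rendered output
        self.render = render     # stands in for the application rendering a page
        self.render_calls = 0    # counts how often the expensive path is taken

    def get(self, uri):
        # 1. HTTP cache hit: the application is never touched.
        if uri in self.http_cache:
            return self.http_cache[uri], "http-cache"
        # 2. Application layer: try the object cache before rendering.
        if uri in self.object_cache:
            body, source = self.object_cache[uri], "object-cache"
        else:
            self.render_calls += 1
            body, source = self.render(uri), "rendered"
            self.object_cache[uri] = body
        # 3. Store the finished page so the next request stops at step 1.
        self.http_cache[uri] = body
        return body, source


site = LayeredSite(render=lambda uri: f"<html>{uri}</html>")
first = site.get("/wiki/Main_Page")
second = site.get("/wiki/Main_Page")
```

In this model only the first request reaches the renderer; the repeat is answered from the outermost cache, which is the effect the caching layers are meant to achieve.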

The presentation focuses on many aspects of caching and content delivery:

Content delivery network is the ‘holy grail’ of performance for Wikipedia. Most of pages (except for logged in users) end up generated in such a manner, where both caching and invalidating the content is fairly trivial...

There’re no unaccounted dynamic bits on a content page (if there are, the changes are not invalidated in cache layer, hence causing stale data)...

Every content page has strict naming, with single URI to the file (good for having uniform linking and not wasting memory on dupe cache entries)...

Caching is application-controlled (via headers) (simplifies configuration, more efficient selection of what can and cannot be cached)...

Content purging is completely application-driven (the amount of unpredictable changes in unpredictable areas would render lots of stale data otherwise)...

Application must support lightweight revalidations (If-Modified-Since requests)
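The points about application-controlled caching, application-driven purging, and lightweight revalidation can be sketched in a few lines. The example below is a minimal illustration, not Wikipedia's implementation: the function names `respond` and `purge`, the dictionary standing in for the cache layer, and the `Cache-Control` values are all assumptions made for the sketch. Only the `If-Modified-Since` and `Last-Modified` headers and the 304 status come from standard HTTP.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(request_headers, last_modified, render):
    """Return (status, headers, body) for a cacheable content page.

    last_modified must be a timezone-aware UTC datetime; render is only
    called when the client's copy is out of date.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        # The application, not the proxy configuration, decides cacheability.
        # These directive values are hypothetical.
        "Cache-Control": "public, s-maxage=3600, must-revalidate",
    }
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            client_time = parsedate_to_datetime(since)
            if client_time.tzinfo is None:
                # Treat a date without a usable zone as UTC.
                client_time = client_time.replace(tzinfo=timezone.utc)
            if last_modified <= client_time:
                return 304, headers, b""  # lightweight revalidation: no body
        except (TypeError, ValueError):
            pass  # unparsable date: fall through to a full response
    return 200, headers, render()


def purge(cache, canonical_uri):
    """Application-driven purge: drop the single cache entry for a page."""
    cache.pop(canonical_uri, None)


edited = datetime(2007, 10, 26, 17, 49, tzinfo=timezone.utc)
status_fresh, headers_fresh, _ = respond(
    {"If-Modified-Since": "Fri, 26 Oct 2007 17:49:00 GMT"}, edited, lambda: b"page")
status_stale, _, body_stale = respond(
    {"If-Modified-Since": "Thu, 25 Oct 2007 00:00:00 GMT"}, edited, lambda: b"page")
```

Because each page has a single URI, a purge in this sketch only has to remove one key when the page is edited; with several URIs per page, the application would have to track and invalidate every alias.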

What do you think of Wikipedia's architecture as presented by Mituzas?
