Google's Map-Reduce
by Keith Ray
I went to Google's open house on May 5 (which happened to be 05/05/05, but that's probably not significant) and listened to a very interesting set of lectures (and a genuine professional "Engineer/Comedian"). One of the most interesting bits was a simple algorithmic infrastructure that Google uses to build its index of web pages, among other things: the "MapReduce" programming model.
Google indexes the web using a sequence of 24 MapReduce operations. These operations are specified in a custom high-level interpreted programming language and run distributed across thousands of rack-mounted dual-processor PCs, on top of Google's fault-tolerant parallel-processing infrastructure (implemented in C++).
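To make the model concrete, here is a toy single-process word-count sketch in C++. This is not Google's actual API (their library runs these phases across thousands of machines); it only illustrates the shape of the model: map emits key/value pairs, the framework groups the values by key, and reduce folds each group into a result.

    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    using KV = std::pair<std::string, int>;

    // Map phase: for word counting, emit ("word", 1) for every word seen.
    std::vector<KV> map_phase(const std::string& document) {
        std::vector<KV> emitted;
        std::istringstream words(document);
        std::string word;
        while (words >> word) emitted.push_back({word, 1});
        return emitted;
    }

    // Reduce phase: fold the group of values for one key -- here, sum counts.
    int reduce_phase(const std::string& /*key*/, const std::vector<int>& counts) {
        int total = 0;
        for (int c : counts) total += c;
        return total;
    }

    int main() {
        std::vector<std::string> documents = {
            "the quick brown fox", "the lazy dog", "the fox"};

        // Shuffle step: group emitted values by key. In Google's system this
        // grouping happens across machines; here it is one in-memory std::map.
        std::map<std::string, std::vector<int>> groups;
        for (const auto& doc : documents)
            for (const auto& [key, value] : map_phase(doc))
                groups[key].push_back(value);

        for (const auto& [key, values] : groups)
            std::cout << key << ": " << reduce_phase(key, values) << "\n";
    }

Chaining 24 such operations just means feeding the grouped-and-reduced output of one step in as the map input of the next.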
The input is a few thousand terabytes of data, represented as files in Google's custom distributed, fault-tolerant file system. The intermediate and final output data come to several hundred terabytes.
As you might expect, the system is i/o bound -- disk i/o and network i/o -- so the overhead of an interpreted language for the map-reduce operations is not the constraint on these big jobs. Google had to build its own custom gigabit Ethernet switches to connect a thousand (or more) rack-mounted PCs; commercial Ethernet switches topped out at around 64 ports and cost too much.
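A rough back-of-envelope shows why disk i/o dominates. The numbers below are my assumptions for 2005-era commodity hardware (machine count, per-disk throughput), not figures from the talk:

    #include <cstdio>

    int main() {
        // Assumed: ~2000 PCs, each reading one disk at ~50 MB/s sequentially,
        // scanning "a few thousand terabytes" (taken as 3000 TB) of input.
        const double machines = 2000.0;
        const double disk_mb_per_sec = 50.0;
        const double input_mb = 3000.0 * 1e6;  // 3000 TB expressed in MB
        const double aggregate_mb_per_sec = machines * disk_mb_per_sec;
        const double hours = input_mb / aggregate_mb_per_sec / 3600.0;
        std::printf("one full scan of the input: ~%.1f hours\n", hours);
    }

With roughly eight hours of raw disk time per pass under these assumptions, interpreter overhead in the map and reduce functions disappears into the noise.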
Their system has to be fault tolerant: in each of their data centers, having at least one PC die (or otherwise go out of commission) every day is not unusual.
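The talk didn't dwell on the recovery mechanism, but Google's published MapReduce paper describes the simple strategy behind it: the master just re-executes any task whose worker fails. A toy sketch of that retry idea -- the failure probability and retry limit here are invented for illustration:

    #include <cstdio>
    #include <random>

    // Simulate running one task on some worker; returns false if the
    // worker "dies" mid-task. The 30% failure rate is made up.
    bool run_task_on_some_worker(std::mt19937& rng) {
        std::bernoulli_distribution worker_dies(0.3);
        return !worker_dies(rng);
    }

    int main() {
        std::mt19937 rng(42);
        const int max_attempts = 5;  // assumed cap, for the sketch only
        for (int attempt = 1; attempt <= max_attempts; ++attempt) {
            if (run_task_on_some_worker(rng)) {
                std::printf("task committed on attempt %d\n", attempt);
                return 0;
            }
            std::printf("worker lost; master reschedules (attempt %d)\n", attempt);
        }
        std::printf("task failed on every worker\n");
        return 1;
    }

Because map and reduce tasks are deterministic functions of their inputs, re-running a failed task on another machine produces the same output, which is what makes this blunt strategy safe.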