Summary
Amazon.com has for some time allowed developers to programmatically access its product catalogue and e-commerce infrastructure. But the company has lately been transforming itself into a veritable application development and hosting platform as well. Artima asked Amazon chief evangelist Jeff Barr to explain his company's developer strategy.
Amazon.com has become known as much for its huge product catalogue as for its highly reliable online platform. The company has been expanding its developer offerings in the hope that sharing some of the secret sauce that made Amazon's own infrastructure so robust will bring new business to the company. At the core of these efforts are the Amazon Web Services.
Amazon.com chief developer evangelist Jeff Barr explained to Artima in a telephone interview that developers are an increasingly important part of his company's vision:
We worked hard to take what we built to power Amazon.com, and open [that] up so that outside developers can step in and build their own applications on top of the same infrastructure. We see some economies of scale that can [benefit developers]... We currently offer ten services in four categories: e-commerce, infrastructure, Web metrics, and workflow.
This is a business for us, this isn't an experiment. We do have a business model for these ... services. We give developers the ability to create an account on Amazon. You can use your retail account, if you'd like. Once you've done that, you sign an individual license agreement, and then pay us a pay-as-you-go charge for the infrastructure services, and [some other types of pricing] for the other services.
A lot of what we learned is in efficiency and cost management. What we pass onto our developer customers is the fact that we've been running gigantic data centers for eleven years now. We know how to do that very efficiently, and how to keep those machines running at a very good, industry-leading cost level....
If you were to dig deep inside our firewall, you'd find that Amazon's own Website applications are in many cases hosted on top of these services... We consider how [this] works to be extremely proprietary, but we can give the benefits of how it works out to developers.
Barr noted that most of the Amazon.com services have both SOAP and REST APIs, and that higher-level toolkits and APIs exist for Java, PHP, and .NET.
Amazon's e-commerce services are a natural outgrowth of the company's core business as an online store, according to Barr. Of the company's Amazon E-Commerce Service, he noted that:
We give you full access to the Amazon product catalogue with the ability to do searches and queries to get the catalogue, retrieve data about millions of products. For each of those products, we return a very complete schema describing the products there, with about 100 different data elements ranging from product title, list of authors, cover art, the new price to the publisher, to market place pricing.
Another e-commerce service is the Amazon Historic Pricing Service:
This [service] enables Amazon marketplace sellers to look back in time and get a good sense of historic pricing and sales trends as sold on Amazon. That accesses a data warehouse of around 25 TB of historical data.
Perhaps the most interesting Amazon.com services from a developer's point of view are the infrastructure-related services. Barr explained that these services together form the basis of a developer platform:
These are great application building services. We have developers who [have] built complete applications that span one or more of these services.
Barr described the Amazon Simple Queue Service (SQS) as follows:
We take a relatively simple data structure, a first-in-first-out queue, and allow developers to create these queues and host them inside their Amazon accounts. Once you've done that, you can actually put any number of queues inside your account, and you can use that as a coordination tool for distributed applications.
Let's say you have a whole collection of desktop applications that are collecting local data from those desktops, and you need to put that data to some central location, and then process that, aggregate it, sum it up in some way. Each of those desktops could have the identity of a single queue, and then pass their local data into that central queue, where [the data] is stored reliably and securely. And then a central application can pull data off of the end of that queue, and do whatever centralized processing is necessary.
While we call these services "simple," that really reflects the interface to the programmer, not so much how these services are implemented. With the queue service, you have simple operations like createQueue(), listQueues(), you can insert data into the queue, you can look inside the queue, and take data off the queue, you can remove multiple messages at a time, and you can also delete a whole queue as well.
A single application can have as many queues as it would like. The elements in the queue can be up to 250,000 bytes per element, and you can have any number of queues and any capacity per queue. For the queue service, there is a certain charge for putting messages into the queue, and for storing them.
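The desktop-aggregation scenario Barr describes can be illustrated with a small in-memory model. This is only a sketch of the pattern, not Amazon's API: the class and method names (SimpleQueue, QueueAccount, send, receive, peek) are hypothetical, and the only figure taken from the interview is the 250,000-byte per-element limit.

```python
from collections import deque

MAX_MESSAGE_BYTES = 250_000  # per-element limit quoted in the interview


class SimpleQueue:
    """In-memory stand-in for a hosted FIFO queue (hypothetical, not Amazon's API)."""

    def __init__(self, name):
        self.name = name
        self._messages = deque()

    def send(self, body: bytes) -> None:
        """Insert one message at the tail of the queue."""
        if len(body) > MAX_MESSAGE_BYTES:
            raise ValueError("message exceeds the per-element size limit")
        self._messages.append(body)

    def peek(self):
        """Look at the oldest message without removing it."""
        return self._messages[0] if self._messages else None

    def receive(self, max_messages=1):
        """Remove and return up to max_messages, oldest first."""
        batch = []
        while self._messages and len(batch) < max_messages:
            batch.append(self._messages.popleft())
        return batch


class QueueAccount:
    """Holds any number of named queues, mirroring 'queues inside your account'."""

    def __init__(self):
        self._queues = {}

    def create_queue(self, name):
        return self._queues.setdefault(name, SimpleQueue(name))

    def list_queues(self):
        return sorted(self._queues)

    def delete_queue(self, name):
        self._queues.pop(name, None)


def run_demo():
    """Three desktops report a local count; a central consumer sums them."""
    account = QueueAccount()
    central = account.create_queue("desktop-reports")
    for desktop_id, count in [("pc-1", 4), ("pc-2", 7), ("pc-3", 1)]:
        central.send(f"{desktop_id}:{count}".encode())

    total = 0
    while True:
        batch = central.receive(max_messages=2)  # take several messages at a time
        if not batch:
            break
        for body in batch:
            total += int(body.decode().split(":")[1])
    return total
```

In a real deployment the producers and the consumer would be separate processes talking to the hosted service over the network. The in-memory deque only stands in for that service so the coordination pattern is visible.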
Of the Amazon Simple Storage Service (S3) Barr had this to say:
The concept here is to give developers a way to easily and reliably store online data. Any application can push data into S3. You store data in objects that range anywhere from 1 byte to 5 GB at a time. You always write entire objects into S3.
At the point where the store is complete, the data is replicated into multiple servers throughout the data center, and then it will be replicated also across multiple data centers as well. To group all the objects, the developer has to have an indexing model, or grouping model, called buckets. Inside each developer's account, you have buckets. You can do things like look at the entire contents of a bucket, or retrieve contents item by item in there.
There are access control lists for every object stored in S3. You can start out with every object [being] fully private, and you can then do things [like] leave write access only for the owner, and then give read access to the entire world. That’s great if you're putting, say, a Web page or an image in there. You can also put individual, authenticated users on the access control list. You have very fine-grained control of both reading and writing for each object stored in S3.
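The bucket and access-control model described here can be sketched as a small data structure. The names below (Bucket, StoredObject, EVERYONE) are illustrative assumptions rather than the S3 API. The sketch only mirrors what the interview states: objects are written whole, live in buckets, start out private, and can be opened up for reading to named users or to everyone.

```python
from dataclasses import dataclass, field

EVERYONE = "*"  # hypothetical marker meaning "the entire world"


@dataclass
class StoredObject:
    owner: str
    data: bytes
    readers: set = field(default_factory=set)  # empty set: private to the owner
    writers: set = field(default_factory=set)

    def can_read(self, user: str) -> bool:
        return user == self.owner or user in self.readers or EVERYONE in self.readers

    def can_write(self, user: str) -> bool:
        return user == self.owner or user in self.writers or EVERYONE in self.writers


class Bucket:
    """Groups objects by key inside one developer's account."""

    def __init__(self, owner: str):
        self.owner = owner
        self._objects = {}

    def put(self, key: str, data: bytes) -> StoredObject:
        # Objects are always written in their entirety, never patched in place.
        if len(data) < 1:
            raise ValueError("an object holds at least 1 byte")
        obj = StoredObject(owner=self.owner, data=data)
        self._objects[key] = obj
        return obj

    def get(self, key: str, user: str) -> bytes:
        obj = self._objects[key]
        if not obj.can_read(user):
            raise PermissionError(f"{user} may not read {key}")
        return obj.data

    def keys(self):
        """List the entire contents of the bucket."""
        return sorted(self._objects)
```

Publishing a Web page then amounts to adding EVERYONE to the object's readers while leaving writers empty, so only the owner can replace it.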
Every object stored in S3 has a unique URL. That makes it great for storing Web pages, or CSS pages, Web images, almost anything you would want to put online, you can simply put up in S3.
We charge for the storage, based on the number of gigabytes per month stored. Each month we measure several times per day how many gigabytes you store in your account. We then sum that up across the month, and do all the right math so that we charge you on a very fine-grained basis. We charge $0.15 per gigabyte-month for storage, and then $0.20 per gigabyte for data transfer either up to S3 or back down from S3.
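As a rough illustration of that arithmetic, the sketch below averages periodic storage measurements into gigabyte-months and adds the transfer charge. The two rates come from the interview. The function name, the sample values, and the assumption that the "right math" is a plain average of the samples are illustrative guesses, not Amazon's billing formula.

```python
from decimal import Decimal

STORAGE_RATE = Decimal("0.15")   # dollars per gigabyte-month, as quoted
TRANSFER_RATE = Decimal("0.20")  # dollars per gigabyte transferred, as quoted


def monthly_s3_charge(storage_samples_gb, transfer_gb):
    """Estimate one month's bill from storage samples and total transfer.

    Assumption: the samples are evenly spaced across the month, so their
    mean approximates the gigabyte-months stored.
    """
    if not storage_samples_gb:
        raise ValueError("need at least one storage measurement")
    samples = [Decimal(str(s)) for s in storage_samples_gb]
    average_gb = sum(samples) / len(samples)
    storage_charge = average_gb * STORAGE_RATE
    transfer_charge = Decimal(str(transfer_gb)) * TRANSFER_RATE
    return (storage_charge + transfer_charge).quantize(Decimal("0.01"))
```

For example, an account that held 10 GB for the first half of the month and 20 GB for the second half averages 15 gigabyte-months, which costs $2.25 to store. Moving 5 GB in or out adds $1.00, so monthly_s3_charge([10, 10, 20, 20], 5) comes to $3.25.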
We had some interesting success stories. For instance, small newspapers, some that would become really popular, would put content or banner ads, whatever they needed, to scale up. That provided them an instant scalability solution.
The final infrastructure service Barr described to us, the Amazon Elastic Compute Cloud (EC2), is still in beta:
I like to think of it as hardware as a service. We have a processing grid of many, many servers... and we give developers a way to actually rent time on those servers by the wall clock hour.
To do that, we use a virtualization technology called Xen. The first step a developer takes is to create something called an AMI, an Amazon Machine Image. Inside that image, you configure your local database, your local applications, whatever local services you need to have running on that machine. You then store those images inside of S3, and then, using Web services, or using a control panel, you can start up as many instances of these machine images as you would like...
You get a virtual environment with a guaranteed amount of processing and memory and bandwidth. Each of those machines is the equivalent of a 1.7GHz x86-class machine, you have also 1.6GB of RAM available, and 160GB of local disk storage, and you have 250Mb/second of bandwidth in and out... At this point, we support several different flavors of Linux. We also give you the tools to create your own machine images...
Maybe you have a classic three-tier business application. You have your Web server, your business logic, and your database server. To start that up in EC2, you make three images. If your business starts out really small, you instantiate one copy of each. When you see traffic build up on any of those three tiers, you simply add additional servers. If your Web server is the first one that needs to scale up, you add a second copy of your Web server...
You can add and release processors [in] between five and ten minutes. This provides you with essentially instant scaling. If you see that your site is about to hit a new peak load, you can dial into the EC2 control panel, and crank up a few additional servers to handle that peak. Once the peak is over, you would then just release those servers. You have no ongoing cost - you simply pay as you use those servers.
You pay $0.10 per CPU per clock hour. So your [three-tier application], running on three servers, costs $0.30 per hour to run... We charge for bandwidth transfer in and out of EC2, but we don't charge for bandwidth inside of our data center. If you have a huge amount of data and you bring it into EC2 for processing, and then store the results back into S3, that transfer from S3 to EC2 and back is at no charge, because that's local network cost.
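The three-tier example works out as follows. This is a minimal sketch that uses only the per-hour rate quoted above. The function name and the tier labels are hypothetical, and bandwidth charges are left out because the interview gives no rate for them.

```python
from decimal import Decimal

INSTANCE_HOUR_RATE = Decimal("0.10")  # dollars per server per clock hour, as quoted


def compute_charge(instances_by_tier, hours):
    """Cost of running the given number of servers per tier for `hours` hours."""
    total_instances = sum(instances_by_tier.values())
    return (INSTANCE_HOUR_RATE * total_instances * hours).quantize(Decimal("0.01"))


# One copy of each tier, as in the small-business starting point.
small = {"web": 1, "logic": 1, "database": 1}

# The Web tier is the first to need a second server.
scaled = {"web": 2, "logic": 1, "database": 1}
```

With these inputs, compute_charge(small, 1) gives the $0.30 per hour mentioned in the interview. Running the scaled configuration for a full day, compute_charge(scaled, 24), comes to $9.60.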
Because bandwidth within the data center is not charged, one EC2 user might host a really interesting Web service that you want to use. Your application that runs in EC2 can call that Web service, and make those calls back and forth for no bandwidth charges.
We don't have a built-in load-balancing solution at the application level. There are other developers building that on top of what we have, and there are also a lot of open-source solutions that people are using. We simply give you the raw compute power and open that for you to do what you like on top of that.
We've seen a huge range of applications, from [those] that are very time-sensitive, such as end-of-month billing runs for a utility or a phone company, [to those] doing huge processing-intensive calculations, or rendering 3D graphics. We have people that run businesses that are very busy during US business hours, and after business hours, are not all that busy. They can have a large number of servers during the day, and a much smaller number of servers in the off hours.
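The business-hours pattern can be put into numbers with a short sketch. Only the $0.10 per server-hour rate comes from the interview. The server counts (10 at peak, 2 off-peak) and the 9-to-17 window are made-up figures chosen purely for illustration.

```python
from decimal import Decimal

INSTANCE_HOUR_RATE = Decimal("0.10")  # dollars per server per clock hour, as quoted


def business_hours_schedule(peak, off_peak, start=9, end=17):
    """Server count for each of the 24 hours: `peak` from start to end, else `off_peak`."""
    return [peak if start <= hour < end else off_peak for hour in range(24)]


def daily_cost(servers_by_hour):
    """Total charge for one day given the number of servers running in each hour."""
    if len(servers_by_hour) != 24:
        raise ValueError("expected one server count per hour of the day")
    return (INSTANCE_HOUR_RATE * sum(servers_by_hour)).quantize(Decimal("0.01"))


elastic = daily_cost(business_hours_schedule(peak=10, off_peak=2))
always_peak = daily_cost([10] * 24)
```

Under these assumed figures the elastic schedule uses 112 server-hours a day ($11.20) against 240 server-hours ($24.00) for keeping peak capacity running around the clock.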
We are still in beta, and have a lot of people using the service. We see that it's very easy to move applications into EC2, and see people moving entire applications, and starting them up there within 24 hours.
Finally, Barr highlighted the remaining two Amazon Web services, beginning with Alexa.com:
Their business is to measure Web traffic and Web popularity. They generate about 100TB of new data every two months, and they make that data available through an API called the Alexa Web Information Service. You can search through these archives, get back detailed information about Web sites, including things like Web connectivity for any URL, traffic rating, popularity rating, speed and reliability ratings.
Like Alexa, the final service, the Amazon.com Mechanical Turk, also has its own Web site, and both HTML and API-level interfaces:
Mechanical Turk is an API to human processing power. If you need some human processing anywhere inside your application, if you need someone to look at some images or check things against some rules, you integrate those into your applications. Each of those activities is called a Human Intelligence Task, or HIT. The API is used to put the requests into the system, along with payment information and a set of rules called qualifications. Workers go to a Web site, and from there they can find work and do the work. The work is returned to the requesting organization, and workers get paid when the requesting organization is satisfied with the quality of the work.
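The HIT lifecycle described here (post a task with payment and qualifications, let a qualified worker do it, pay once the requester is satisfied) can be modelled as a small state machine. Everything in this sketch is hypothetical, including the class names, the status strings, and the reward amount. It illustrates the workflow only and does not reproduce the Mechanical Turk API.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Hit:
    question: str
    reward: Decimal
    required_qualifications: frozenset = frozenset()
    status: str = "open"  # open -> submitted -> approved or rejected
    worker: Optional[str] = None
    answer: Optional[str] = None


class Marketplace:
    """In-memory stand-in for the site where workers find and complete HITs."""

    def __init__(self):
        self.hits = []
        self.earnings = {}

    def post(self, hit: Hit) -> Hit:
        self.hits.append(hit)
        return hit

    def find_work(self, worker_qualifications):
        """Open HITs whose required qualifications the worker fully meets."""
        held = set(worker_qualifications)
        return [h for h in self.hits
                if h.status == "open" and h.required_qualifications <= held]

    def submit(self, hit: Hit, worker: str, answer: str) -> None:
        if hit.status != "open":
            raise ValueError("HIT is not open for work")
        hit.worker, hit.answer, hit.status = worker, answer, "submitted"

    def review(self, hit: Hit, satisfied: bool) -> None:
        """The requester accepts or rejects the work; only accepted work is paid."""
        if hit.status != "submitted":
            raise ValueError("nothing to review yet")
        if satisfied:
            hit.status = "approved"
            self.earnings[hit.worker] = (
                self.earnings.get(hit.worker, Decimal("0")) + hit.reward
            )
        else:
            hit.status = "rejected"
```

A requesting application would call post for each image it needs checked, then poll for submitted answers and call review on each one.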
To what extent do you think Amazon's services enable new types of applications?
I'm not often that positive, but I think Amazon's web services are amongst the most important technical initiatives of the last couple of years. The most important aspect is the freedom their services give to small teams of developers. The number of new possibilities is just mind-blowing, if you ask me. Although I don't think it would be wise, I suddenly am able to start an eBay clone. My worries about hosting/upgrading/scaling/distributing/storing etc. would be a lot smaller with Amazon WS. I can just start small and cheap, and grow dynamically when needed. Now THAT is a powerful idea, it's unprecedented.
I bet we're going to see many new and exciting applications based on Amazon's WS over the coming years.