The Artima Developer Community
Sponsored Link

Java Community News
Ted Dziuba on Why Statelessness Stops at the Database

7 replies on 1 page. Most recent reply: Oct 9, 2006 7:24 AM by nes

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 7 replies on 1 page
Frank Sommers

Posts: 0
Nickname: fsommers
Registered: Jan, 2002

Ted Dziuba on Why Statelessness Stops at the Database Posted: Oct 3, 2006 1:44 PM
Reply to this message Reply
Summary
In a recent blog entry, Ted Dziuba argues that in most Web applications that access a database, the HTTP protocol's inherent scalability is lost when a request must interact with a database. The solution he outlines to extending scalability to the database layer is the one Google employs in its user account management system.
Advertisement

While the stateless HTTP protocol facilitates the horizontal scaling of Web applications, much of the benefits of a stateless application are lost when that application must enter a database. In a recent blog entry,Web Apps: Why does Statelessness Stop at the Database?, Ted Dziuba characterizes the problem as follows:

As a user query moves up the stack of services, [the database] is where the statelessness stops. If a configuration has a few FastCGI nodes, each with a few processes, then each process maintains its own connection to the database. Not only is one database server handling all of the load, but it is also a single point of failure. Losing a FastCGI machine is not that big of a deal, but losing your SQL server means lights out.

He suggests that you can architect your database layer to be "stateless," too, which is what Dziuba's employer, Google, has done with their user account management:

You can load balance [your database] and set up some master/slave replication, but it makes more sense to keep the whole stack stateless. How is this done? With a stateless database of course.

At Google, we use a stateless database to run our user accounts authentication system. Namely, it’s BerkeleyDB HA. Many different products depend on Google Accounts: GMail, Google Groups, Personalized Home, Orkut, Writely, and anything else that requires a user to log in (the list goes on). The statelessness of BerkeleyDB HA lets Google Accounts achieve remarkably low latency, even under heavy load. As you could probably guess from the name, it provides high availability as well...

When we design tables, the typical mapping is that each table represents[an] object, with foreign keys to other objects. Stateless databases can be shard similarly. The natural division is to keep replicas of different tables on different backends. This way, you won’t have a copy of the entire database on each app server, but each app server can get the objects it needs easily. Furthermore, you can add more replicas for objects that get hit frequently, further reducing latency....

Each application server will have a replica the data, but if you can shard your data by object type as presented in the previous section, space won’t be an issue except in extreme cases.

At the end of this post, Dziuba points to a paper describing Google's use of BerkeleyDB High Availability (PDF).

What do you think of Dziuba's point in making the database layer "stateless," too? What solutions do you use to scale your database layer?


nes

Posts: 137
Nickname: nn
Registered: Jul, 2004

Re: Ted Dziuba on Why Statelessness Stops at the Database Posted: Oct 4, 2006 11:44 AM
Reply to this message Reply
The article is misleading. This has nothing to do with Statelessness. If no state was necessary we would not need a database to begin with.

The issue discussed here is that in a web stack with current technology, everything but the database can be trivially parallelized by adding more cheap machines. What we need is a cheap and efficient shared-nothing parallel database. Google is using a custom solution from BerkleyDB HA, but I am sure there are other ways to approach that. Anybody know how well Oracle’s solution perform if I wanted to distribute a database on say 200 cheap PCs for instance?

Cameron Purdy

Posts: 186
Nickname: cpurdy
Registered: Dec, 2004

Re: Ted Dziuba on Why Statelessness Stops at the Database Posted: Oct 5, 2006 8:11 PM
Reply to this message Reply
> What we need is a cheap and efficient shared-nothing parallel database.

Or even better, the ability to avoid going to a database altogether for 99% of cases .. what if what you needed were already in memory in the form you needed it in? ;-)

Peace,

Cameron Purdy
http://www.tangosol.com

Brad Schneider

Posts: 5
Nickname: snide
Registered: Jun, 2006

Re: Ted Dziuba on Why Statelessness Stops at the Database Posted: Oct 6, 2006 8:56 AM
Reply to this message Reply
It's blog entries like this that make my life miserable. I end up in a meeting with operations or management and they just can't fathom why they can't run their booking engine in multiple datacenters...now. "But Google does it, I read this blog post, they say it's super easy", the unspoken implication being that I must suck and if they just had some of those Google geniuses instead of me, their life would be sweet. There have always been techniques for scaling beyond a single database, they inevitably incur some level additional complexity (additional code is complexity, additional configuration is complexity, additional products is complexity) in the tiers above the database and more often than not, relaxing ACID and opening up yourself to potential collisions. A deep understanding of the nature of your data and the operations required is the key to scaling your database. If you have that, then you can make intelligent decisions about what path to take, be it partitioning, replication, caching or clustering (Oracle claims to have true clustering, but I've heard conflicting reports on the reality.) Yes there are some technologies that can make it more or less easy, but make no mistake, you are *always* making some tradeoff.

nes

Posts: 137
Nickname: nn
Registered: Jul, 2004

Re: Ted Dziuba on Why Statelessness Stops at the Database Posted: Oct 6, 2006 9:22 AM
Reply to this message Reply
> Or even better, the ability to avoid going to a database
> altogether for 99% of cases .. what if what you needed
> were already in memory in the form you needed it in? ;-)
>

Sure caching will get you long ways and is what should be done first. But I don’t know if caching works so well with information that is very personalized. Password, personalized layouts, shopping cart contents etc.

Cameron Purdy

Posts: 186
Nickname: cpurdy
Registered: Dec, 2004

Re: Ted Dziuba on Why Statelessness Stops at the Database Posted: Oct 6, 2006 1:38 PM
Reply to this message Reply
> Sure caching will get you long ways and is what should be
> done first. But I don’t know if caching works so well with
> information that is very personalized. Password,
> personalized layouts, shopping cart contents etc.

I wasn't only referring to caching. Many of our customers use main-memory architectures for transactional operational data (ODS).

Personalization is actually one of the "big wins" for this type of architecture, since the usage patterns tend to be very tight (e.g. high affinity within a distributed environment, highly coupled access patterns, etc.)

Peace,

Cameron Purdy
http://www.tangosol.com/

Cameron Purdy

Posts: 186
Nickname: cpurdy
Registered: Dec, 2004

Re: Ted Dziuba on Why Statelessness Stops at the Database Posted: Oct 6, 2006 1:49 PM
Reply to this message Reply
> Yes there are some technologies that can make
> it more or less easy, but make no mistake, you are
> *always* making some tradeoff.

Absolutely, and a bit more realism in our industry (e.g. "the grass is always greener in the newest framework" and "people who don't work for us must be smarter") would do us well.

Complexity, to your point, is probably the greatest threat to the growth of our industry. We are getting to the point as an industry in which a large number of our aggregate cycles are spent dealing with entropy. I think that this has happened for two reasons: First, the growth of the industry has slowed and perhaps reversed (at least in the west), which means that the total number of cycles is not growing and may be shrinking. Second, we failed to target entropy and manage its growth, so the investments made through the recent dot-com boom (etc.) are requiring many more cycles on average to maintain than the systems that came before them.

There are bright points of hope as well, such as the reduction in the number of applications that a typical client has to install / maintain, thanks to advancements in "rich client" web technologies, self-updating applications, cross-platform (e.g. Java) technology, install-on-demand extensions, etc.

Peace,

Cameron Purdy
http://www.tangosol.com/

nes

Posts: 137
Nickname: nn
Registered: Jul, 2004

Re: Ted Dziuba on Why Statelessness Stops at the Database Posted: Oct 9, 2006 7:24 AM
Reply to this message Reply
> I wasn't only referring to caching. Many of our customers
> use main-memory architectures for transactional
> operational data (ODS).
>

So you never write names and passwords to disk? I would be interested to know more details on how that technology works?

Flat View: This topic has 7 replies on 1 page
Topic: Ted Dziuba on Why Statelessness Stops at the Database Previous Topic   Next Topic Topic: Software Production: A Major Bottleneck?

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use