Summary
HTTP Authentication may be RESTful, but it's not very USEful.
Whenever I talk to a REST enthusiast, they tell me I should be using HTTP authentication, not forms with cookies or URL rewriting, for user authentication. (REST is an architectural style for distributed systems, and a popular way to think about the web.) Most web frameworks, including Java servlets, use cookies, URL rewriting, or both to make it easy to identify a server-side session object for a given request. A RESTful application, however, would model all server-side state as resources accessible via URIs, so there's no need for that session object. URL rewriting is particularly anti-REST, because a single resource ends up with many URIs, one for each session ID. Both approaches, especially when combined with server-side session data, tend to erase many of the scalability benefits of HTTP 1.1's caching mechanisms.
Nevertheless, HTTP authentication is not widely used. The two approaches currently defined in the specifications are HTTP "Basic" and "Digest" authentication. Basic is widely supported, but because it effectively transmits the username and password in the clear, it is usually appropriate only over HTTPS. Digest never sends the password in the clear, but is apparently not implemented consistently in clients. In his recent XML.com article, httplib2: HTTP Persistence and Authentication, Joe Gregorio describes the interoperability issues with Digest in more detail, and concludes:
The bad news is that current state of security with HTTP is bad. The best interoperable solution is Basic over HTTPS.
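To make the difference between the two schemes concrete, here is a minimal Python sketch. The Basic header is just Base64 of "username:password", so anyone who sees it can reverse it. The Digest response is shown in its original no-qop form from RFC 2617, where only MD5 hashes cross the wire. The function names and the sample user, realm, and nonce are my own assumptions for illustration, not anything from a particular server or client.

```python
import base64
import hashlib


def basic_header(username, password):
    """Build the Authorization header value for HTTP Basic."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic(header_value):
    """Recover (username, password) from a Basic header; no secret is needed."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


def _md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the Digest 'response' value (RFC 2617, no-qop form)."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # secret part, hashed
    ha2 = _md5_hex(f"{method}:{uri}")                 # request part, hashed
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to the server nonce
```

An eavesdropper can call decode_basic on a captured Basic header and read the password, which is why Basic is usually paired with HTTPS. The Digest response changes with every server nonce and never contains the password itself.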
With HTTP authentication, the client (such as a web browser) presents its own authentication dialog to the user when prompted for credentials by the server, after an initial request of a resource that is in a protected realm. (A realm is a URI, such as "/admin" and all resources under that URI, such as "/admin/moderation" and "/admin/users".) The user enters their username and password into the dialog, thereby authenticating themselves to the client for the requested realm. The client then uses those credentials to authenticate with the server for that realm from then on, until the client is exited. (I.e., there is usually no way to log out of a client authentication except by ending the client process.)
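The challenge step described above can be sketched from the server's side in a few lines of Python. This is only an illustration under assumed names: the "/admin" prefix follows the example in the text, while the realm label, the user table, and the handle function are hypothetical.

```python
import base64

PROTECTED_PREFIX = "/admin"    # protected area from the example above
REALM = "admin"                # hypothetical realm label sent in the challenge
USERS = {"bill": "s3cret-pw"}  # hypothetical user table, plaintext for brevity


def credentials_ok(authorization):
    """Check a Basic Authorization header value against USERS."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(token).decode("utf-8")
    except ValueError:  # malformed Base64 or non-UTF-8 bytes
        return False
    username, _, password = decoded.partition(":")
    return USERS.get(username) == password


def handle(path, authorization=None):
    """Return (status, headers) for a request, challenging when needed."""
    protected = path == PROTECTED_PREFIX or path.startswith(PROTECTED_PREFIX + "/")
    if not protected:
        return 200, {}
    if authorization is None or not credentials_ok(authorization):
        # The 401 plus this header is what makes the browser show its dialog.
        return 401, {"WWW-Authenticate": f'Basic realm="{REALM}"'}
    return 200, {}
```

The first request to /admin/users arrives without credentials and gets the 401 challenge. The client then repeats the request with an Authorization header and keeps sending it for that realm.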
Being a budding REST enthusiast myself, I investigated HTTP authentication, but discovered several usability concerns and a few security concerns:
1. Username prompt may confuse the user
With HTTP authentication, the client collects the credentials, usually a username and password. This means that the application has no control over what prompt is given for the username. At Artima, we are planning to have a network of sites with single sign-on, and a while back I took to calling the username "Artima ID" on our sign in page (just as Yahoo calls their login a "Yahoo ID"), to make it obvious what you're logging into. The C++ Source will someday be its own website, but to sign in you will need to use your Artima ID. You won't have a separate "The C++ Source" ID. My first concern is that, because HTTP authentication doesn't let me put "Artima ID" next to the empty username box, people will be confused about what to type there.
Moreover, email addresses are now unique in our database, so if you forget your Artima ID, I'd like to let you use any of the email addresses you attached to your account instead of your Artima ID. And in fact, I may have registration paths where you don't select an Artima ID, so you may not have an ID even if you have an account. So I may want to say "Artima ID or email address:" next to that empty box, or at least add a note explaining that if you've forgotten your ID, you can use your email. If people are authenticating via the client's authentication dialog, I believe it would be harder for me to explain that.
2. Not obvious what to do if you forgot your password
If you have forgotten your password, our existing sign in page has a "Forgot your password" link that leads to a way to deal with that problem. It is not as easy to make that option clear using the browser's HTTP authentication dialog.
3. Not obvious what to do if you don't have an account
If you don't have an account yet, I have on our existing sign in page a "Sign Up" link to a path that allows you to create a new account. It is not as easy to make that path obvious when using the client's HTTP authentication dialog.
4. No way to do optional authentication
With HTTP authentication, I can force the user to sign in before doing things that require being signed in, like posting to a forum or administration. However, I also want to be able to recognize via the request whether or not the user is signed in. If not, I don't want to force them to sign in, and instead will return a representation for guests. If they are signed in, I want to return a personalized representation of the resource. But the only way the server can ask for credentials is to challenge with a 401 response, which forces the dialog on everyone, guests included.
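Here is a rough Python sketch of what the server side of optional authentication would look like: inspect the Authorization header if one is present, and never issue a challenge. The function names and response strings are hypothetical, and password verification is left out to keep the sketch short. The catch is on the client side: a browser generally has no reason to send credentials for a resource that has never challenged it, so the guest branch is the one most requests would take.

```python
import base64


def current_user(authorization):
    """Return the username from a Basic header, or None for a guest."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(token).decode("utf-8")
    except ValueError:  # malformed header: treat the caller as a guest
        return None
    username, _, _password = decoded.partition(":")
    # Assumption: a real handler would verify the password here.
    return username or None


def render(path, authorization=None):
    """Return (status, body); never a 401, so no dialog is ever forced."""
    user = current_user(authorization)
    if user is None:
        return 200, f"Guest view of {path}"
    return 200, f"Personalized view of {path} for {user}"
```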
5. Difficult to do single sign-on
With cookies or URL rewriting, I can quite easily enable users to sign onto the entire network of artima.com subdomains in one shot. With HTTP Digest authentication, I can explicitly list the domains that are included in the realm, but Basic offers no such option. Even with Digest, having to list each subdomain individually is a scalability concern. It works fine for half a dozen subdomains, but what about 100 or 500?
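To show why listing subdomains does not scale, here is a small Python sketch that builds a Digest challenge whose domain parameter is a quoted, space-separated list of URIs, as RFC 2617 defines it. The subdomain names, the realm label, and the nonce are made up for illustration, and a real challenge would carry further parameters.

```python
def digest_challenge(realm, nonce, subdomains, parent="artima.com"):
    """Build a WWW-Authenticate value that lists every covered subdomain."""
    uris = " ".join(f"http://{name}.{parent}/" for name in subdomains)
    return f'Digest realm="{realm}", domain="{uris}", nonce="{nonce}"'


# Two subdomains give a compact header...
small = digest_challenge("Artima", "n1", ["www", "cpp"])

# ...but the header grows with every site added to the network.
large = digest_challenge("Artima", "n1", [f"site{i}" for i in range(500)])
```

With 500 entries the single header runs to well over ten thousand characters, and it has to be sent with every challenge.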
6. No way to log out
Users are accustomed to having a log out button that enables them to log out of a site before leaving a public terminal. The HTTP authentication protocols provide no way for the server to request that the client erase the credentials for a realm other than prompting for them again, which causes the browser to pop up the dialog again. Today's browsers do not themselves provide a way to log out of a realm, other than quitting the browser.
HTTP authentication workarounds
Nevertheless, if JavaScript and cookies are enabled on the client, I believe you can get around most of these issues. For example, optional authentication can be achieved by using cookies to indicate whether the user has authenticated, while using HTTP authentication for the actual authentication. A sign in page can be created that provides "Forgot your password" and "Sign Up" links, an explanation of what to type into the browser's dialog (i.e., Artima ID or email and password), and includes a button to press that will actually cause the HTTP authentication dialog to appear. Even better, if JavaScript is enabled on the client, some JavaScript embedded in that sign in page can replace the explanation and button with a user-friendly login form. JavaScript can capture the submit, and do the HTTP authentication itself.
In addition, if JavaScript is enabled on the client, there are even ways to effectively log people off of an HTTP authentication session. For example, if you require that passwords are at least six characters long, then no one can have the password "abc". When you want to log someone out because they've pressed the "Log Out" button, some JavaScript on the client can run and do an HTTP authentication for the realm you want to log the user out of at a special URI with the user's username and the password "abc". This authentication will succeed at that special URI, which now means that the next time the client visits any other URI in that realm, it will attempt to authenticate with the wrong password. You have effectively logged the user out of his or her authentication session for that realm.
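The logout trick is easier to follow as a simulation. In this Python sketch, the special URI, the user table, and the Client class are all assumptions of mine. The Client only imitates the one browser behavior the trick relies on, which is remembering the last credentials that succeeded for the realm.

```python
LOGOUT_PATH = "/admin/logout"  # hypothetical special URI
LOGOUT_PASSWORD = "abc"        # too short to be anyone's real password
USERS = {"bill": "s3cret-pw"}  # hypothetical user table, plaintext for brevity


def authenticate(path, username, password):
    """Server side: the logout URI accepts only the bogus password."""
    if path == LOGOUT_PATH:
        return password == LOGOUT_PASSWORD
    return USERS.get(username) == password


class Client:
    """Imitates a browser caching the last successful credentials."""

    def __init__(self):
        self.cached = None

    def request(self, path, credentials=None):
        creds = credentials or self.cached
        if creds is None:
            return 401  # a real browser would show its dialog here
        if authenticate(path, *creds):
            self.cached = creds  # remember what worked for this realm
            return 200
        return 401


client = Client()
client.request("/admin/users", ("bill", "s3cret-pw"))  # normal sign in
client.request(LOGOUT_PATH, ("bill", LOGOUT_PASSWORD))  # "Log Out" pressed
```

After the second call, the cached credentials are ("bill", "abc"), so the next request to /admin/users fails and the user has to sign in again.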
Conclusion
Despite the presence of potential workarounds, I find HTTP authentication in its current state to be too much trouble to use. I am still planning on using a cookie as an authentication token in our new architecture, or if cookies aren't enabled, URL rewriting. I'm curious if others have any success stories, or the opposite, to share about using HTTP authentication in practice.
I think HTTP auth, at least like it is implemented these days, is not good for most things. We use Basic HTTP Auth (with Apache) internally, to access a Subversion repository. There's no "sign up" page, and we don't want anyone with no username to do anything. In general, HTTP Auth is good if the application developer doesn't want to invest the time for developing a custom auth mechanism (or when you just want to password-protect an existing application).
However, I don't see the problem with cookies. Persistent cookies may be a privacy issue, but session cookies are harmless AFAIK, and I see no reason to block them(*).
One more approach that you didn't cover is the one used by default by ASP.NET applications. They use POST rather than GET, and the session object is embedded in the response sent to the user - and uploaded back whenever a user does something. I don't like it, but it's something.
(*) My cookie policy, thanks to Firefox, is this: accept *all* cookies, persistent or session. However, on browser exit, remove all but a small list of whitelisted cookies. Currently there are exactly 5 domains on this list.
> I think HTTP auth, at least like it is implemented these days, is not good for most things. We use Basic HTTP Auth (with Apache) internally, to access a Subversion repository. There's no "sign up" page, and we don't want anyone with no username to do anything.
> In general, HTTP Auth is good if the application developer doesn't want to invest the time for developing a custom auth mechanism (or when you just want to password-protect an existing application).
I was at one point considering using Basic HTTP auth over HTTPS for making admins re-authenticate before getting to the /admin realm, for example. But even that may be too much of a pain. If I'm going over HTTPS, then I can just use an admin auth cookie if I want, and it's easier. If I could depend on HTTP Digest auth, that would be different. I haven't tried to see how well it is implemented everywhere, but I see reports of problems, and I don't need to spend time testing cookies.
Another way it occurred to me to secure scary functions, like deleting an entire site's database, would be to send the admin a link by email that activates the scary admin roles, or to confirm scary things by email before actually doing them. Then even if someone steals an admin's Artima password, they still can't do the really scary things unless they can also get hold of the admin's email. But there aren't that many scary things, and I think I'm going to err on the side of ease of use for the most part. If you have the password, you will be able to do most things without needing to reauthenticate.
> However, I don't see the problem with cookies. Persistent cookies may be a privacy issue, but session cookies are harmless AFAIK, and I see no reason to block them(*).
Here's the relevant portion from Roy Fielding's dissertation:
6.3.4.2 Cookies
An example of where an inappropriate extension has been made to the protocol to support features that contradict the desired properties of the generic interface is the introduction of site-wide state information in the form of HTTP cookies [73]. Cookie interaction fails to match REST's model of application state, often resulting in confusion for the typical browser application.
An HTTP cookie is opaque data that can be assigned by the origin server to a user agent by including it within a Set-Cookie response header field, with the intention being that the user agent should include the same cookie on all future requests to that server until it is replaced or expires. Such cookies typically contain an array of user-specific configuration choices, or a token to be matched against the server's database on future requests. The problem is that a cookie is defined as being attached to any future requests for a given set of resource identifiers, usually encompassing an entire site, rather than being associated with the particular application state (the set of currently rendered representations) on the browser. When the browser's history functionality (the "Back" button) is subsequently used to back-up to a view prior to that reflected by the cookie, the browser's application state no longer matches the stored state represented within the cookie. Therefore, the next request sent to the same server will contain a cookie that misrepresents the current application context, leading to confusion on both sides.
Cookies also violate REST because they allow data to be passed without sufficiently identifying its semantics, thus becoming a concern for both security and privacy. The combination of cookies with the Referer [sic] header field makes it possible to track a user as they browse between sites.
As a result, cookie-based applications on the Web will never be reliable. The same functionality should have been accomplished via anonymous authentication and true client-side state. A state mechanism that involves preferences can be more efficiently implemented using judicious use of context-setting URI rather than cookies, where judicious means one URI per state rather than an unbounded number of URI due to the embedding of a user-id. Likewise, the use of cookies to identify a user-specific "shopping basket" within a server-side database could be more efficiently implemented by defining the semantics of shopping items within the hypermedia data formats, allowing the user agent to select and store those items within their own client-side shopping basket, complete with a URI to be used for check-out when the client is ready to purchase.
> One more approach that you didn't cover is the one used by default by ASP.NET applications. They use POST rather than GET, and the session object is embedded in the response sent to the user - and uploaded back whenever a user does something. I don't like it, but it's something.
I hadn't heard of or thought of that approach. I guess that requires JavaScript to be enabled on the client to catch the clicks and turn GETs into POSTs? Either that or you rewrite the page and put buttons for every link that needs the session ID sent along with it? But wait, wouldn't you need to return the resource as the response of the POST, and doesn't that cause the funky dialog to drop down asking the user if they want to repost old data when backing up?
Is there a URL where this approach is described?
Anyway, this sounds even worse than URL rewriting to me on first blush.
> I hadn't heard of or thought of that approach. I guess that requires JavaScript to be enabled on the client to catch the clicks and turn GETs into POSTs? Either that or you rewrite the page and put buttons for every link that needs the session ID sent along with it? But wait, wouldn't you need to return the resource as the response of the POST, and doesn't that cause the funky dialog to drop down asking the user if they want to repost old data when backing up?
Every link/button/whatever that's related to the current state calls a JavaScript function called __doPostBack(). This function POSTs to the server with whatever parameters are needed by that control, AND a hidden input field called "__VIEWSTATE". This field contains encrypted state information as what appears to be Base64-encoded data.
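The general idea of carrying state in a hidden form field can be sketched in Python as follows. This is not ASP.NET's actual serialization format, which I am not reproducing here; the JSON payload and the function names are assumptions chosen only to show the round trip. Note that Base64 by itself is an encoding, not encryption, so this sketch protects nothing.

```python
import base64
import html
import json


def encode_state(state):
    """Serialize page state into a Base64 string for a hidden field."""
    raw = json.dumps(state).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_state(value):
    """Rebuild the state from the value the browser POSTs back."""
    return json.loads(base64.b64decode(value).decode("utf-8"))


def hidden_field(state):
    """Render the hidden input that travels with every postback."""
    value = html.escape(encode_state(state), quote=True)
    return f'<input type="hidden" name="__VIEWSTATE" value="{value}">'
```

Every extra piece of state makes the field longer, and the whole field is sent down with each response and back up with each POST.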
> Is there a URL where this approach is described?
I'm not a web developer myself, so I don't know where the official documentation is. However, try searching Google for "doPostBack" or "__doPostBack", and you'll see plenty of references to it.
> Anyway, this sounds even worse than URL rewriting to me on first blush.
It is. What's really bad is this encrypted state information can get really big - even as big as a few KBs. Again, I'm not a web developer, but it appears to me that this is just the default behavior that allows applications to Just Work (TM), and it should be tweaked by the developer.
If you are using Apache, you can use something like mod_perl to 'hijack' the authentication mechanism. You can use a custom auth handler to do things like check the URI, the client IP, verify username/password (you could use this for your 'id or email' thing) or any number of other factors. Your handler will then either return 'ok' or 'auth required.' Similarly, you can hijack the access mechanism. A custom access handler would let you (for example) deny access to an admin URI without popping up a new login box.
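The decision logic of such handlers might look like the following Python sketch. It does not use the mod_perl API. The return values, the account table, and the allowed address list are all hypothetical, and only the branching is meant to carry over.

```python
OK = "ok"
AUTH_REQUIRED = "auth required"
FORBIDDEN = "forbidden"

# Hypothetical account store: one ID, several attached email addresses.
ACCOUNTS = {
    "bill": {"password": "s3cret-pw", "emails": {"bill@example.com"}},
}
ADMIN_IPS = {"10.0.0.5"}  # hypothetical addresses allowed into /admin


def resolve_account(login):
    """Accept either an account ID or any email attached to an account."""
    if login in ACCOUNTS:
        return login
    for account_id, account in ACCOUNTS.items():
        if login.lower() in account["emails"]:
            return account_id
    return None


def auth_handler(login, password):
    """Custom authentication: 'ok' or 'auth required'."""
    account_id = resolve_account(login)
    if account_id is None or ACCOUNTS[account_id]["password"] != password:
        return AUTH_REQUIRED
    return OK


def access_handler(uri, client_ip):
    """Custom access check: refuse outright instead of re-prompting."""
    if uri.startswith("/admin") and client_ip not in ADMIN_IPS:
        return FORBIDDEN
    return OK
```

Because access_handler answers "forbidden" rather than "auth required", the browser shows an error page instead of popping up another login box.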
I think there are some very simple solutions to most of the pain points described in this article.
For example, you can have a welcome page that does not require authentication. The welcome page can have links to sign up, reset password, and login. The login link can point to a page that requires basic HTTP auth. Once the user logs in, the rest of the pages he traverses to can all be covered by HTTP basic auth.
HTTP Auth does allow for providing a hint to the user as to what credentials are needed. If you are using Apache httpd, you can use the AuthName config param to say, "Artima Login".