Summary
A review of The Humane Interface, by Jef Raskin.
I've just finished Jef Raskin's The Humane Interface. Jef's
greatest claim to fame is the invention of the original Macintosh
(although I'm not sure exactly what role he played, given that the
Macintosh was surely created by a team). I found the book a quick
read, interesting and thought-provoking, although a bit uneven.
This book wants to be on the same shelf as Donald Norman's classic,
The Psychology of Everyday Things. Like Norman, Raskin revels in
pointing out examples of bad interface design right under our noses,
for example the placement of Windows menus, which he claims are hard
to aim for because they're not at the edge of the screen like Mac
menus. I once attended an entertaining talk Jef gave criticizing the
"iDrive" interface in the BMW 745i, which makes every rookie mistake
in interface design imaginable: from having too many levels of
submenus to requiring the driver's attention when his eyes should be
on the road, and, of course, the mother of all bad interface design,
modes.
Remember the editor wars, pitting vi and Emacs enthusiasts against
each other? One argument often heard against vi is that it has
modes: sometimes, typing an 'x' inserts the letter 'x' into the
text, and other times, it acts as a command (erasing a character, as
it happens). While vi's successor, vim, does a much better job of
showing which mode you're in, Jef's point is that the mode indicator
is not at the user's locus of attention: that is the insert
point in the text, while the mode indicator (at least when using the
terminal version) is at the bottom of the screen. If you are
distracted for a few seconds and then go back to your editing, you may
not notice the mode indicator in the corner of your peripheral vision,
and you will start typing assuming the wrong mode. Sure enough, that
has happened to me many times. (Emacs has modes too though! For
example, I often find myself inexplicably in a modal dialog, or
accidentally typing into a "dired" window.)
In his book, Jef argues that all modes are evil, whether we
are talking modal dialog boxes, different selection modes in graphical
editors, or even having different applications that behave
differently. Since humans are creatures of habit, we have a hard time
remembering which mode the computer is in, because we want to focus on
the task at hand; modes divert our attention to keeping track of the
mode, and hence slow us down.
Jef's alternative is a clever idea that he calls a
quasimode, which is a mode that only lasts as long as the user
keeps a key or button depressed. His key example is LEAP: an
incremental search that is triggered by depressing a key placed below
the space bar with your thumb. The LEAP mode ends as soon as the key
is released. It is impossible to forget that you're holding the LEAP
key down (this has something to do with the part of our nervous system
that keeps track of which muscles are used), so you can't forget the
mode you're in. I've seen Jef give a demo of this using a laptop's
mouse buttons as LEAP keys, and the idea seems to work well
-- if you have a keyboard with keys that can be used for this
purpose.
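The quasimode idea can be sketched in a few lines of Python. This is my own illustration, not anything from the book: the QuasimodeEditor class and the key_down/key_up event names are invented stand-ins for a real GUI toolkit's event handlers.

```python
# A minimal sketch of a LEAP-style quasimode, using a hypothetical
# key_down/key_up event model (no real GUI toolkit): incremental
# search is active only while the designated key is held down.
class QuasimodeEditor:
    def __init__(self, text):
        self.text = text
        self.cursor = 0
        self.leap_held = False
        self.pattern = ""
        self.start = 0

    def key_down(self, key):
        if key == "LEAP":
            # The quasimode begins; holding the key is what keeps it
            # active, so the user cannot forget being in it.
            self.leap_held = True
            self.pattern = ""
            self.start = self.cursor
        elif self.leap_held:
            # While the key is held, keystrokes extend an incremental
            # search instead of inserting text.
            self.pattern += key
            hit = self.text.find(self.pattern, self.start + 1)
            if hit != -1:
                self.cursor = hit
        else:
            # Ordinary (modeless) behavior: insert the character.
            self.text = self.text[:self.cursor] + key + self.text[self.cursor:]
            self.cursor += 1

    def key_up(self, key):
        if key == "LEAP":
            # Releasing the key ends the mode instantly.
            self.leap_held = False

ed = QuasimodeEditor("the quick brown fox")
for k in ["LEAP", "b", "r"]:
    ed.key_down(k)
ed.key_up("LEAP")
print(ed.cursor, ed.text[ed.cursor:ed.cursor + 5])  # → 10 brown
```

The crucial property is in key_up: there is no persistent state for the user to track, because releasing the key is the only way out of the mode.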
But for most of us, LEAP and some of Jef's other clever ideas are
part of a utopian dream which we cannot obtain. Some other ideas
discussed in the book are more down-to-earth. For example, I found
his explanation of the GOMS analysis of how long it takes to perform a
particular task very useful: some of the biggest slowdowns are
switching between mouse and keyboard, aiming the mouse at a tiny
target, and thinking about what to do next. Jef's analysis of
the amount of information conveyed by a user gesture is also
interesting: for example, clicking in a modal dialog box with only an
OK button conveys no information, and such informational messages are
better presented in a way that doesn't require explicit
acknowledgement. Jef's suggestion, to make the message transparent,
is clever and I would like to try this out -- but, heeding one of his
other points (also made in another favorite book of mine, Steve Krug's
Don't Make Me Think), it needs more user testing.
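The GOMS analysis Jef uses is the keystroke-level model, which estimates task time by summing per-operator constants, and his information argument is a simple bit count. A rough sketch in Python; the operator times are the commonly cited values from the literature, and the exact constants vary by source, so treat them as illustrative:

```python
import math

# Approximate keystroke-level model (KLM) operator times in seconds,
# as commonly cited from Card, Moran & Newell; exact values vary.
TIMES = {
    'K': 0.2,   # press a key
    'P': 1.1,   # point the mouse at a target
    'H': 0.4,   # home hands between keyboard and mouse
    'M': 1.35,  # mentally prepare
}

def klm_time(ops):
    """Sum operator times for a sequence such as 'MHPK'."""
    return sum(TIMES[op] for op in ops)

# Dismissing an OK-only dialog: think, home to the mouse, point, click.
print(round(klm_time('MHPK'), 2))   # → 3.05

# The information conveyed by that click: a gesture choosing among
# n alternatives carries log2(n) bits, so one OK button carries 0 bits.
print(math.log2(1))                 # → 0.0
```

Three seconds of the user's time spent conveying zero bits is exactly the kind of waste the book argues against.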
Jef makes a point against interface customization which I have long
observed: customizations act as long-lived modes, and can cause great
pain and confusion when you find yourself using a familiar application
on someone else's computer, or even on your own computer when you have
changed a preference that has a farther-reaching effect than expected.
(An extreme example: four out of five PythonLabs developers use XEmacs
on Red Hat Linux, and yet none of us dares drive anyone else's
keyboard, because our personalizations are all totally different.
When doing pair programming, this can be a serious slow-down!)
Jef also argues convincingly against differentiating between
beginner and expert modes, and against "learning" interfaces that
change the contents of the menus based on its observation of your
behavior. He is not against menus (menu selection is after all the
first quasimode), and seems to argue for fewer menus containing more
items: while it's easier to find something in a short list than in a
long list, if you have hundreds of items, having a small number of
long lists makes a rarely-used item easier to find than hiding it in
one of many dozens of short lists (especially submenus). He also
sensibly points out that in menus there's no particular gain in
brevity, and that text can convey more information more quickly than icons
do: never underestimate our brain's text processing ability!
One point that Jef makes seems untenable to me: he wants the
distinction between all applications to disappear. I agree with some
of his observations, e.g. that it's confusing to have a text editor
containing a simple graphics editor sitting next to a graphics editor
containing a dumbed-down text editor. (I've often deplored the
differences in behavior between Word and Excel, which sometimes seem
to be from different planets, despite being sold together as a package
deal!) But while we can hope that key productivity applications will
eventually merge (and not in the way of the dreaded OpenOffice), I
expect that there will always be a lively trade in applications that
cater to special areas, be they games, instant messaging, or video
editing. And, despite bearing a 2000 copyright, the book has
remarkably little to say about web browsers and the interfaces we
build using them. I'm looking forward to Jef's Humane Browser!
(The next book I'm reading is Bob Martin's Agile Programming book.
I'm looking forward to reviewing that, too!)
> One point that Jef makes seems untenable to me: he wants the distinction between all applications to disappear.
One (untenable?) thing I'd like to see is the distinction between doing the same thing in all languages disappear. I shuttle regularly between VB and Java and have started learning Python, so I regularly use the following statements:
Java:   System.out.println("x = " + x);
VB:     Debug.Print "x = " & x
Python: print "x =", x
Python: print "x = " + `x`   (This was the way I first learnt it.)
Jef Raskin and others have a (very alpha) open source implementation of his ideas called 'The Humane Environment' at http://humane.sourceforge.net. As far as I know it is currently only available on Mac OS 9/Mac OS X classic environment.
Mr. van Rossum would probably be interested to know that the underlying programming is done in Python and one of the main things the environment/editor is used for is Python programming.
I wonder how "Open Source" it actually is at the moment. The only thing approaching a license statement I found was in one of the docs in CVS and that clearly wasn't open source as it didn't allow you to distribute the software.
Furthermore LEAP is patented, which is always a problem for Open Source and Free Software.
I haven't read Raskin's book, but I did spend some time reading about THE (The Humane Environment, humane.sf.net), and trying it out. In the end, I remain unimpressed.
One of the problems I see with Raskin is his Macintosh background. The Mac is a great interface designed for the computing mainstream of the 1980s. The mainstream computer user has changed dramatically since then, and I don't believe the Mac serves modern users very well -- it has polish, but it no longer has vision. Certainly the recent changes to the Mac GUI are uninspired and merely incremental (which is not to say that it's not pleasant to use, as refined products often are).
The mainstream computer user of today is not someone who is just learning to use a computer. It is not someone who needs an application that is immediately intuitive. Mainstream computer users use computers many hours each day, typically using a small number of applications.
To his credit, Raskin recognizes this, and proposes interfaces that are productive for advanced users, not just novices. Where Raskin fails, however, is by using novice interfaces as an intellectual starting point.
Advanced interfaces do exist. The Humane Environment, really a text editor, is just a bad implementation of the ideas of Emacs. It's not surprising -- Emacs has a somewhat unique status. Those who use Emacs use it intensely. The users also tend to have the skills and inclination to change Emacs, and of course the license and architecture to make that possible (even encouraged).
Maybe Raskin realizes this, and that's why many of his proposals are reminiscent of Emacs (though from what I understand he doesn't give Emacs the credit it deserves). But I am more impressed with constructive work -- the creation of working interfaces -- than with critique and theorizing. THE is Raskin's constructive work (and he deserves credit for trying), but the actual interface doesn't impress me.
I wrote this comparison between Emacs and THE, and why Emacs is everything THE wants to be:
Certainly Emacs could be improved, especially the learning curve and the transparency of the interface. But THE doesn't solve those problems. LEAP, I believe, is a failure (though it does address a valid problem). Abandoning modes is misdirected effort -- making the modal context more transparent would be a better direction, though in general I don't believe modes are as bad as Raskin thinks.
In general Emacs' biggest problem is not the particulars of its interface, but the scaling of its interface to such a large domain. THE aspires to the same large domain, but offers no ideas for managing such a large command set -- THE is Emacs with every shortcut except M-x removed. That's not a step forward.
(But I'd still love to see Emacs used as an interface starting point for a reimplementation, in Python of course...)
Your qualification of the "normal" computer user as someone who uses the computer for hours a day, and your assignment of the normal user to the category of experienced or semi-experienced users, couldn't be more wrong; or perhaps misguided is a better word. Even if a user uses a computer for hours a day, most users aren't in the semi-experienced or better group.
The problem with your statement is not primarily factual, though it may be wrong even if we just look at the number of users who qualify as semi-experienced or better. The main issue with what you say is this: even if as many as 60% of computer users are semi-experienced or better (a purely imaginary number), making the normal, or regular, computer user semi-experienced or better, there are still 40% of computer users who are novices.
My experience with computer interfaces -- ranging from MacOS 7-9 and MacOS X through Windows 3.11, Windows 95 and Windows XP to Solaris, Linux and FreeBSD -- and with users of those environments, leads me to the conclusion that intuitive, simple-to-understand interfaces are important for the majority of computer users. While it may be true that many users use computers (primarily at work) for hours a day, most of them use the computers in a mechanical fashion, similar to industrial robots performing an action: without purpose or deliberation.
A lot of users don't understand their computers, don't understand what it means to open an application, don't understand when to double-click or single-click. Many users don't realise the difference between the Internet and Internet Explorer. These issues can be alleviated by an intuitive, simple-to-use interface, an interface in which actions can be easily understood and the actions a user performs are intuitive -- and, more important than that, consistent with earlier experience.
This may sound cynical, but in my experience, you can never underestimate the competence of a computer user -- not due to idiocy but due to the user not being interested. Neither can you overestimate the ability of computer users to find a way to rationalize the (faulty) result of their actions as being the computer's fault, or simply the way it's meant to be.
Intuitive interfaces are still, and will always be, an important issue. The MacOS lost some of the intuitiveness and simplicity that made it so easy and pleasant to use in the transition to MacOS X. This has been difficult for many users, even though the change wasn't great. And still the MacOS interface is superior in many ways to the Windows interface, though the two have grown closer over the last couple of years. We can only hope those who design and implement our interfaces do not think as you do.
"One point that Jef makes seems untenable to me: he wants the distinction between all applications to disappear. I agree with some of his observations, e.g. that it's confusing to have a text editor containing a simple graphics editor sitting next to a graphics editor containing a dumbed-down text editor. (I've often deplored the differences in behavior between Word and Excel, which sometimes seem to be from different planets, despite being sold together as a package deal!) But while we can hope that key productivity applications will eventually merge (and not in the way of the dreaded OpenOffice), I expect that there will always be a lively trade in applications that cater to special areas, be they games, instant messaging, or video editing."
No matter what area, once producing commands instead of applications (interlocked groups of commands) got off the ground, wouldn't competition among command vendors work better (for everyone except application vendors)?
The current way: if you discover technologically how to, say, make a typed instant message speak to its recipient in the typist's voice, you have to either sell your commands, "record voice sample" and "speak typed message", to the producer of an instant-messaging application or else produce an application that tries to do everything the others already do but looks a little different and enables "record sample" and "speak message".
Either way: I who want the new commands (technology) must change to a new application (product) that is not precisely what I want; even if I am using the application to which you sell the commands, to get them I must "upgrade" and thus have to deal with other new commands, including new ways to do old things, that I do not care about.
And a command is not just a smaller chunk of code than an application: a command codes a single human "gesture", a single hardware operation, to a definite computer event. (The Humane Interface defines a gesture as "a sequence of human actions completed automatically once set in motion.")
Once someone produces a "humane controlboard" (Jef is designing one) that eliminates current controlboards' unnecessary complexity, vagueness and redundancy and physically separates its computer controls from its QWERTY keyboard (so that learning to type becomes no harder than it was before the QWERTY keyboard got boxed in by computer controls), getting control of our computers one command at a time would keep us always more in control of them; as things are, most of us have machines whose powers far exceed our ability to control -- use -- them.
Command architecture would also lead to a more sane relationship between standardization and customization. One way or another, it would get kept track of that, say, the sender's holding the Command key with the thumb and typing s p e a k was getting used to make a recipient's computer "speak" a typed instant message in its sender's voice. Someone would no doubt then produce a cheaper command that generated the same event with the same gesture. Good. But there would be much less incentive to produce a confusingly similar command in the way applications are often confusingly similar.
And if someone did produce a command that generated the same event with a different gesture -- say, saying the word "speak" -- or a different event with the same gesture, I with your original commands could acquire (or decline to) the new one straightforwardly. In the first case, I would probably keep the original commands as is; that way, I would not have to unlearn one way of typing to make "speak" happen and learn what, with my new, voice-recognition application, I must type whenever I want to "speak" basically how I used to; nor would I have to stay fluent in two similar applications/modes.
In the case of the new command that used the same gesture -- hold the Command key and type s p e a k -- to cause a different computer event: if I wanted the new command, I would have to bite the bullet -- give up the old one -- remove it from the machine not just when I was in one application rather than another but, simply, remove it from the machine. The competition to design commands that best used the standard controlboard would lead to its becoming less powerful but more useful.
And what used to be powerful, specialized applications would get implemented as specialized controlboards or as appendages to the standard controlboard. That way, instead of each specialized designer's "deciding", first of all, what the product's interface was going to be -- namely, the current standard keyboard -- each would design and code physical controls appropriate to the specialized task. Specialized applications designed to an existing keyboard are like the approach that seems to be used by designers of remote controls: put buttons labelled 0 through 9, and buttons with the standard symbols for play, stop, rewind, et cetera.
Jef Raskin's point about per-window menus (as in MS Windows) versus a single global menu bar shared by all applications (as on the Mac) is a very flawed one.
He completely overlooks the bigger problem with the single shared menu bar: it demands excessive mouse motion from the user, and it contradicts his own "modes are bad" theory.
Having menu bars on your windows instead of at the top of the desktop is less confusing to users when there are multiple windows open on the desktop. It is often not at all obvious which window/application a menu belongs to.
Trying to point the mouse precisely is not such a big deal in the long run compared to moving the mouse the longer distance between the window work area and the menu bar. Also, pointing the mouse carefully is something that users master in a short time.
Per-window menus obviously do away with "modes" for the menu bar itself!
Microsoft's engineers certainly made a better decision in that respect.
One other positive of the Windows interface is the "Programs" menu. I have often seen Mac users creating such a folder on the desktop and placing all their application icons in it for easy accessibility. Windows does this for you, maintaining the "Programs" menu automatically. It is easier to find where an application was installed, and what apps are installed.
Sounds to me like Jef wants to make what is, in my opinion, the biggest mistake of UI design. He wants to treat everyone the same. Sure, customization is a pain when you're at someone else's desk, but I doubt it happens enough to make it worth losing the benefit of those customizations, and you can always start the editor in a default mode. If you visit often, you can keep a copy of your .emacs there.
Personally, I find myself very fast in Vi/Vim, and occasionally forgetting what mode I'm in is a small price to pay for the increase in productivity over using some dumbed-down standard text editing widget out of a GUI library, little better than Notepad on Windows.
I'm sure he'd hate my windowing environment too, since I never run into the problems reaching windows like he mentions, by avoiding mouse use as much as possible. Reaching for the mouse takes too long.
Dumbed-down GUIs are fine for systems you use rarely, or maybe once, like a Kiosk in a mall or a museum. They are not suitable for day-to-day use, and the problems Jef seems to be trying to solve are minor compared with the problems his solutions would cause.
People are unique. The only way to satisfy their requirements is to be flexible, extensible, and customizable.
I did read The Humane Interface quite some time ago. For several reasons, I was not entirely pleased. Yes, the general incompetence of computer users born of disinterest can't be underestimated -- but I find myself using programs that simply do not do the same things.
You know, just take a text editor and a spreadsheet program. The things these two programs have to be able to do are completely different, and despite the 80/20 rule, there are still users who want to access obscure functions nobody else knows or cares about. The Humane Interface concentrates on really simple tasks and very little automation of user tasks.
Yes, these simple tasks make up most of anybody's time at a computer. Anybody who actually works with his computer, that is -- games are a different thing altogether anyway.
And then, there's not only spreadsheet programs and word processors. Take a graphics program, any 3D package, a network reading application (yes, commonly referred to as a browser, with high design and graphic rendering requirements and commonly very little actual user interaction), a media player, your programming environment. Do we want one interface to accomplish everything at the drop of a hat, modelessly? You know, just like "I know those 2 million commands and can do everything instantly in my one text window"? Is that the easy interface of the future? What about desktops and windows, are they just something we got used to despite actually wanting something else, a single "this one is everything" window? So we actually don't need large screens?
And something that questions the ideas behind The Humane Interface: Do humans work modelessly? Why should we model something not after how we work ourselves? Do we have the same thought patterns whether we read our blogroll or edit computer graphics? Whether we're doing sports or programming?
I don't think so.
So, in my point of view, The Humane Interface does take a great approach for text editing in computers. Maybe even a great approach for PDAs or data collection and messaging in smartphones (note that I don't speak about calling somebody), if data entry in smartphones were much easier than it currently is. But it is not applicable, IMHO, for computers at large.
...And maybe I did get Jef Raskin's ideas all wrong anyway. :-)