Summary
A recent paper by Gilles Dubochet describes a study that concluded Scala code written in a functional style, favoring for expressions and higher-order functions, is easier to comprehend than Scala code written in a more Java-like, looping style. It also found that meaningful variable names were insignificant or even detrimental to comprehension.
A recent paper by Gilles Dubochet, Computer Code as a Medium for Human Communication: Are Programming Languages Improving?, measured eye movements of people reading Scala code written in three different styles: a Java-like, looping style and two versions of a more idiomatic Scala style that used for expressions and higher-order functions instead of loops: one with meaningful variable names and one without. Although I was not surprised to see that the more verbose looping-style code took longer and/or was harder to comprehend than the more concise functional style code, I was not sure what to make of the difference between the two variable naming styles.
In one style, called grounded in the paper, intermediate functions and variables were given domain-meaningful names. A few intermediate variables were added as well, and given meaningful names to help readers connect each intermediate value to its domain meaning. The other style, called ungrounded in the paper, concentrated solely on cleanly expressing the algorithm, using as few intermediate names as possible and always using meaningless words for them.
The results seem to show that the version with meaningless variable names was easier and/or quicker to comprehend than the one with domain-meaningful names. One caveat the paper points out is that each participant in the study was made familiar with the domain before seeing the code. It may be that people unfamiliar with the domain would have had an easier time figuring out the version with domain-meaningful names. In general, though, I'd guess programmers reading code are usually familiar with the domain of that code to at least some extent.
A wise programmer once told me that the way he recommends people name variables is he asks them what the variable is, and whatever they answer, he tells them that's what the name should be. I tend to use descriptive names, but sometimes feel longer names have a tradeoff of cluttering the code, giving it a weight that takes away from seeing the structure of the code.
How do you decide what to name variables? When do you use single character names for variables? When do you use short abbreviations (but longer than one character)? And when do you use full words?
Just yesterday I was reading some code written by a nameless co-worker. He used "x" as the variable name for a record, and you may be sure that I was not happy about it. I changed the "x" to "row". Generally speaking I hate one-letter names, unless you are directly translating from a mathematical formula. I also hate extra-long names. Still, there is a lot of middle ground in between, and I believe one can use names which are both short and meaningful. For instance, IMO, "row" is better than "x" and better than "rowFromCSVFile", if it is already obvious that the row is being read from a CSV file. Last year I dabbled for a bit in SML, and in that community one-letter variable names are quite common. I must admit I found them quite readable, but I think the reason is that I was reading algorithmic code. Of course, having the types helped too. But more often than not, and especially in untyped languages, one-letter variables are suicide. One exception is dummy variables. For instance, in this Python list comprehension
xxx_records = [r for r in records if r.code=='XXX']
I would not feel bad about using a one-letter variable "r", but I could also use "rec". I would not use "x".
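To make the comparison concrete, here is a minimal runnable sketch of that comprehension. The Record type and its sample data are assumptions added for illustration; the snippet above only shows that each record has a code attribute.

```python
from collections import namedtuple

# Hypothetical record type, invented for this sketch; the original snippet
# only tells us that each record has a `code` attribute.
Record = namedtuple("Record", ["code", "amount"])

records = [
    Record("XXX", 10),
    Record("YYY", 20),
    Record("XXX", 30),
]

# One-letter dummy variable: its entire scope is this single line.
xxx_records = [r for r in records if r.code == "XXX"]

# A slightly longer name gives exactly the same result.
xxx_records_rec = [rec for rec in records if rec.code == "XXX"]

print(xxx_records == xxx_records_rec)  # prints True
```

Both spellings behave identically, so the choice is purely about how much the reader needs reminding of what each element is within that one line.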
While I haven't read the paper yet, the briefing I read said that short variable names were OK within 18 lines of their declaration. I think this is a very important caveat. In such cases, you just refer to the definition site, particularly if they are vals, not vars. I'll keep an eye on that distinction while reading the paper.
> While I haven't read the paper yet, the briefing I read
> said that small variable names were ok within 18 lines of
> their declaration. I think this is a very important
> caveat. In such cases, you just refer to the definition
> site. Particularly if they are vals, not vars. I'll keep
> an eye on that distinction while reading the paper.
I missed that point when I read the paper, but I kind of code that way. I do like one-character variable names for some things, usually when the variable is only used very close by. Like e in:
t match {
  case _: TestPendingException =>
    report(TestPending(tracker.nextOrdinal(), thisSuite.suiteName, Some(thisSuite.getClass.getName), testName))
  case e if !anErrorThatShouldCauseAnAbort(e) =>
    val duration = System.currentTimeMillis - testStartTime
    handleFailedTest(t, hasPublicNoArgConstructor, testName, rerunnable, report, tracker, duration)
  case e => throw e
}
@"measured eye movements of people reading Scala code written in three different styles"---as a programmer, i do not like to be fettered by an eye movement test. the whole field of linguistics and cognitive psychology is in deep trouble because of its overly pavlovian-skinnerian methodology. they basically belong to the 唯境無識 ("only external objects, no consciousness") school of thought. because of this, they completely forget to ask the subjects about *their* feelings, and i do not like that. i love encounter and flow, and researching someone without communicating with them is as un-encounter as it can get in a legal setting.
i for one want to be asked, like, "did you enjoy reading this code?"; "would you like to outline with a pencil and explain what you have just understood?"; "do you like the style of programming? do you like the variable and the methods names?"; "do you like the overall lay-out of the code as it's written?"; "do you remember those names?"; "can you find ${your-keyword-here} for me, in the code?"; "do you think now that you read the code and answered all these questions, do you think you can just start now and do something with it or do you feel daunted or bored?"; questions like that.---not asking these questions is asking for a way of thinking that is purely quantitative and has no concept of quality or any other dimension except counting incidences. even epidemiology of all accounting sciences is already past that stage.
as an answer to the question put forth here, i do prefer longer (but not wordy) variable names to cryptic ones; and yes, i am using a few ultra short names (USNs) for a number of things; thus, the return value is always `R` with me; `O` is the options; `i` for counting loops is appropriate; `e` is `exception` in catch clauses; i nearly always write `def f( foo, bar, *P, **Q )` in python so extra positional arguments become `P` and named arguments become `Q` (for ‘quoted’); and so on.
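A rough sketch of how those ultra short names might look together in one small function. Only the names `R`, `O`, `i`, `e`, `P`, and `Q` come from the comment above; the function body, the `sep` and `skip_bad` options, and their defaults are made up for illustration.

```python
def f(foo, bar, *P, **Q):
    """Join foo, bar and any extra positional values that parse as integers."""
    # O: the options. The keys and defaults here are hypothetical.
    O = {"sep": "-", "skip_bad": True}
    O.update(Q)  # Q: named arguments override the defaults

    numbers = [foo, bar]
    for i in range(len(P)):  # i: counting loop over the extra positionals
        try:
            numbers.append(int(P[i]))
        except ValueError as e:  # e: the exception in a catch clause
            if not O["skip_bad"]:
                raise e

    # R: the return value
    R = O["sep"].join(str(n) for n in numbers)
    return R


print(f(1, 2, "3", "oops", sep="+"))  # prints 1+2+3
```

Whether this reads well probably depends on the reader already knowing the convention; without the comments, `O`, `P`, `Q`, and `R` carry no hint of their roles.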
This is an interesting subject, but we should be wary of drawing overly far-reaching conclusions from this study, because it doesn't study real programmers reading real code.
The test subjects are comp. sci. students, not working programmers and the code is short (the longest algorithm in the test is 73 lines). The code implements a relatively complex algorithm. Typical code in real world projects is not algorithmically advanced, but is difficult to comprehend because of its size and complex dependencies.
Yes, I like short names, but not single-character names. A name can comprise two words, not exceeding 20 characters. An additional word can also be added as a suffix if required, for example Exception, Attribute, Event, Handler, etc.
Over the years my variable & function names have become more verbose. I've found it so much easier to understand code from years ago when it reads like a sentence.
if (number_of_errors > MAX_NUMBER_OF_ERRORS) {
    Indicate_Serious_Error(number_of_errors);
}
> While I haven't read the paper yet, the briefing I read
> said that small variable names were ok within 18 lines of
> their declaration. I think this is a very important
> caveat. In such cases, you just refer to the definition
> site. Particularly if they are vals, not vars. I'll keep
> an eye on that distinction while reading the paper.
Studies show that programmers spend 50% of their time moving back toward the declaration of a variable in languages with side effects. This study does not disprove that claim, nor does it show different statistics for side-effect-free languages. I think a good study would separate out the various aspects that might affect comprehension, such as side effects, proximity of declaration to initialization, etc.
You would need about 1,000 programmers to complete such a study, though.
> > It also found that meaningful variable names were
> > insignificant or even detrimental to comprehension.
>
> This statement is self-contradictory.
>
> If the variable name is meaningful, how does it hurt
> comprehension.

My guess is that it cluttered the code with "too much information." The difference between:
class Rational(numerator: Int, denominator: Int) {
  require(denominator != 0)
  private val greatestCommonDivisor = gcd(numerator.abs, denominator.abs)
  val numer = numerator / greatestCommonDivisor
  val denom = denominator / greatestCommonDivisor
  // ...

and:

class Rational(n: Int, d: Int) {
  require(d != 0)
  private val g = gcd(n.abs, d.abs)
  val numer = n / g
  val denom = d / g
  def this(n: Int) = this(n, 1)
  // ...
If readers can figure out and remember what n, d, and g are, then maybe the algorithm they are being used in is a bit easier to see. As I think I mentioned in my post, I've always shied away from single character variable names except for common cases of i for index, e for exception in a catch clause, etc. But I have also felt that sometimes longer, "meaningful" names can shroud the larger meaning.
I use single-letter variable names for condensation of important components that are often used in construction.
I often use short acronyms of descriptive labels for names, with the non-abbreviated name noted as a comment.
Fairly often I use non-abbreviated words and phrases as names, but those tend to be large-scale or end-result entities rather than things that will apply within intricate calculations.
These habits describe my practice when programming in J, an Iverson language. In J, far fewer entities need to be named than in most languages, and long names tend to make the structure of the code harder to see.