Summary
... wherein I decide that, with winter a-cumin in, it's time for a lot of heat, so I venture into the programming language equivalent of TV Wrestling: coding style...
I'm sure this will cause me no end of grief, but I'm about to
confess publicly here that I am a heretic. (In this particular
case I'm only confessing to heresy in computer language design.
Other heresy confessions will have to await another time.)
I'll state it right out: For almost any mature language (C,
Java, C++, Python, Lisp, Ada, FORTRAN, Smalltalk, sh,
JavaScript, ...) coding style is an essentially solved problem, and
we ought to stop worrying about it. And to stop worrying about it
will require worrying about it a lot first, because the only way
to get from where we are to a place where we stop worrying about
style is to enforce it as part of the language.
Yup. I'm really saying that. I'm saying that, for example,
the next ANSI C update should define the standard K&R
C programming style into the language grammar. Programs that use
any new features should be required to be in K&R style or be rejected
by the compiler as syntactically illegal.
I'm gonna pause here. When I was talking about this on a mailing
list I had to go through this several times. People didn't quite
get me because they didn't quite believe someone was saying this.
I mean this quite literally. For example, I want the next C grammar
to define that a space comes between any keyword and an opening
parenthesis. "if (foo)" would be legal, but "if(foo)"
would not. Not a warning, not optionally checked, but actually
forbidden by the language parser. Flat out illegal. Can't compile.
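To make the rule concrete, here is a minimal sketch of such a check, written in Python for brevity. The keyword list and the function name style_errors are assumptions for illustration, not part of any real C grammar, and the sketch does not skip string literals or comments the way a real parser would.

```python
import re

# Hypothetical rule: these keywords must be followed by exactly one
# space and then an opening parenthesis.
KEYWORDS = ("if", "while", "for", "switch")

# A whole-word keyword that is NOT followed by " (".
_BAD = re.compile(r"\b(?:%s)\b(?! \()" % "|".join(KEYWORDS))


def style_errors(source: str) -> list[tuple[int, str]]:
    """Return (line number, keyword) for every violation of the rule."""
    errors = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _BAD.finditer(line):
            errors.append((lineno, match.group(0)))
    return errors
```

Under this sketch "if (foo)" produces no errors, while "if(foo)" and "if  (foo)" are both reported. A compiler that enforced the rule would treat a non-empty result as a syntax error rather than a warning.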
Here is the logic in its simplest form:
Premise 1: For any given language, there are one or a few
common coding styles.
Typically one is set by the founder(s) or earliest documenter,
but others will evolve over time. But even for C there are only a
handful of commonly used styles, ignoring trivial variations.
Premise 2: There is not now, nor will there ever be, a
programming style whose benefit is significantly greater than
any of the common styles.
Get real. Discovering a style that improves your productivity
or code quality by more than a few percent over the common
styles is about as likely as discovering a new position for
sex. [Astronauts need not apply, unless they want to invite
me along.]
Premise 3: Approximately a gaboozillion cycles are spent on
dealing with coding style variations.
Think about it: How many reformatter/pretty-printer projects
are there on SourceForge alone?
How many options does any given IDE (including emacs) have for
formatting code? How many cycles are spent deciding on a style,
documenting it, enforcing it, and updating it? How many history
logs for CVS, ClearCase, etc., have a lot of noise from varying
format changes? How many brain cycles are spent on arguing
about this topic?
Premise 4: For any non-trivial project, a common coding style
is a good thing.
I really think this is pretty well agreed on. How constraining
the style is varies, but having several folks hacking on the same
code with conflicting coding styles introduces more pain than
any single style imposes on any single person. Every project
I know of has a style, if not spelled out at least by custom.
Conclusion: Thinking of all the code in the entire world as
a single "project" with a single style, we would get more value
than we do by allowing for variations in style.
Think of it. All the programming examples in one style. Web
pages, journals, papers, emails use one style. Reformatting issues
gone. Arguments over whose style is better gone. Reformatters a
quaint historical artifact.
And most of all: No More Style Wars! Really! Think
of all those cycles that we could then plow into something more
productive, like vi/emacs wars! Or world peace! Or a really
good chocolate cookie recipe! You choose!
Of course, you will never enforce any style globally unless
people have literally no choice. How many C programmers use
"during" as a stylistic preference to "while"?
(Preprocessor abusers need not apply. On second thought, please
do: We need to identify you for our eugenics program.) Or skip the
parens around an if clause? They don't because they
can't. You know they would if they could. The thing that
stops these "personal styles" is that the C compiler will not accept
them. If you can't compile your code you fix it. It's so simple
it's stupid. And therefore it works.
So I want the owners of standards for established languages to
take this up. I want the next version of these languages to
require any code that uses new features to conform to some
style. Let the standards committees gnash and snarl and wring their
hands over which of the common styles is the winner. Sell tickets.
We all get to comment and the language lawyer standards geeks decide.
We know where they'll go -- C will go to K&R; C++ will go with
Bjarne's style (excuse me while I cringe); Java will go with the
Sun style as shown in the language spec and most of the Java books
from Sun (including mine); Lisp style is almost already set mostly
in stone. Perl is a vast swamp of lexical and syntactic swill and
nobody knows how to format even their own code well, but it's the
only major language I can think of (with the possible exception of
the recent, yet very Java-like C#) that doesn't have at least one
style that's good enough.
Some things are either uncheckable (Hungarian notation, using
"get" and "set" method prefixes) or not widely agreed upon (such
as import/#include ordering). These can be left for future standards.
Or not. The owners of the standard decide. But whatever they
do, they should set the style and build it into the actual freakin'
grammar.
This heresy encompasses one major sub-heresy: That whitespace
should matter.
Most style rules have to do with the placement of whitespace:
newlines before or after curly braces, white space around operators
or not, etc. So I'm saying that languages should indeed care about
whitespace. A lot.
Yet one of the things we supposedly learned from languages like
FORTRAN was that whitespace should only matter to separate tokens.
This was accepted wisdom because FORTRAN had columns -- the first
five columns were reserved for a statement number or a comment
indicator, the sixth column with any character in it meant a
continuation of the previous line, seven through 72 were language
statements, and the last eight were reserved for sequence numbers
useful for re-ordering the card deck if it was dropped. Yes, I
mean cards, the physical type, with rectangular holes. Also,
DO10I=1,100 was the same as
DO 10 I = 1, 100 because DO
was a keyword followed by a number and so the space wasn't required,
although it made DO10I=1 interesting, as that assigned
1 to a variable named DO10I.
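The ambiguity can be sketched in a few lines. The following is a rough illustration in Python, not a real FORTRAN front end: the function name classify is made up, and the sketch ignores commas inside parentheses or character constants, which a real compiler had to handle.

```python
import re

# After blanks are removed, a DO loop looks like DO<label><var>=<start>,<rest>.
_DO_LOOP = re.compile(r"^DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)$")
_ASSIGNMENT = re.compile(r"^[A-Z][A-Z0-9]*=")


def classify(statement: str) -> str:
    """Classify a statement the way blank-insensitive FORTRAN must."""
    # Fixed-form FORTRAN ignores blanks, so squeeze them all out first.
    squeezed = statement.replace(" ", "").upper()
    if _DO_LOOP.match(squeezed):
        return "do-loop"
    if _ASSIGNMENT.match(squeezed):
        return "assignment"
    return "other"
```

Here classify("DO10I=1,100") and classify("DO 10 I = 1, 100") both come out as a DO loop, while classify("DO10I=1") is an assignment. The only thing separating the two readings is the comma, which is why a mistyped period in its place silently changed the meaning of the statement.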
I lived this ugliness, so I feel the pain. But all it really
proved was that FORTRAN's whitespace rules sucked. License to put
whitespace anywhere has proven to be expensive and cycle-wasting
in practice. We're not editing on punched cards anymore, and
reformatters are as common as spam. We can use this power -- type
code however you want to but before you compile it, reformat it (or
reformat on the fly, whatever).
In the end, this requires only that the editors and IDEs coders use
let them type stuff however they like and then make it look right.
This is basically just reformatting on the fly, which many editors
already do. We don't need you to type zero, one or seventeen spaces
between an if and its open paren, we just need the editor
(assuming K&R C) to put exactly one space there. And getting even
this right will be easier if there is only one style to worry about.
It's one of those things that those reformatting or style adapting
cycles can go to.
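As a rough sketch of what "make it look right" means for this one rule, here is a tiny normalizer in Python. The keyword list and the name normalize are assumptions for illustration, and the sketch does not protect string literals or comments, which a real editor would have to do.

```python
import re

# A whole-word keyword, any amount of horizontal whitespace, then "(".
_KEYWORD_PAREN = re.compile(r"\b(if|while|for|switch)[ \t]*\(")


def normalize(line: str) -> str:
    """Rewrite the line so exactly one space separates keyword and paren."""
    return _KEYWORD_PAREN.sub(r"\1 (", line)
```

Whether the user typed "if(foo)", "if (foo)" or "if" followed by seventeen spaces, normalize returns "if (foo)". With only one legal style, the editor needs no option dialog to decide what the replacement should be.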
Basically, freedom for formatting style has proven extremely
expensive, and does not deliver much value for cost. Think of it
this way: Could you honestly fill in the following:
I, [insert name here], know of a programming
style whose impact on programmer productivity and/or program
quality is large enough that my freedom to choose it over any
major common style validates the programmer productivity and
investment used industry wide in arguing about style, imposing
style, and reformatting to match styles. That style is [insert
style description here] and its benefits are [insert
benefits here].
Or even the less demanding:
I, [insert name here], know of a programming
style whose impact on programmer productivity and/or program
quality is >= 5% when compared to any major common style. That
style is [insert style description here] and its benefits
are [insert benefits here].
I think you will mostly get snickers even suggesting that this
can be filled out. And even on a single project you can spend 5%
on coding style issues -- mostly up front, but it's a continuous
bleeding as style wars crop up over things as yet undefined; new
tools are suggested, written, or integrated; people forget to put
it in the right style and then it gets corrected and pollutes the
change history; training new people in the style; disciplining
engineers who are uncooperative; and just general bitching, whining
and moaning.
So 5% doesn't even touch the opportunity and other costs
associated with not having a mandated style across all the code in
the world.
Or if you prefer the question the other way 'round: What
benefits do we get from freedom of style that outweigh the cost
we pay for it?
To me the answer seems obvious: Nowhere near enough.
It doesn't work for languages that don't have a fixed grammar, such as Dylan or Lisp. This is important because it is where the next big jump in programmer productivity is probably going to come from.
The main problem is automatically determining the proper indentation level for some (but not all) intermediate words such as "elseif" or "private" in newly-defined control structures.
Other than that problem (which can probably be solved) I'm all in favour of saying that anyone is entitled to go mark-whole-buffer indent-region using the appropriate emacs mode and checking the result in.
I agree 100% that that kind of support for coding styles would benefit the economics of the software industry.
// Martin Rosén-Lidholm
PS I'd like to add, regarding the quote "with the possible exception of the recent [...] C#", that I find Juval Löwy's coding standard for C# very valuable.
First, you somehow manage to argue on one page both that how code looks does not matter and the "radical hypothesis that programmers are human".
Secondly, I have two examples. We do use vertical and horizontal whitespace to signify the importance of parts of the code. If you don't do this, you lose the ability to make visual distinctions.
zero=0;one=1;
counter = 0;
someMinorObject->action();
anotherMinorObject->anotherAction();
majorStuff(); //so important that even intention is commented
minorStuffAgain();
andAgain();
When you write tables (or other data structures), it does matter how you lay them out.
Now if these examples do not explain what I tried to say, either tic tac toe does not mean much to you, or visual arrangement doesn't, or I'll have to take more English classes. Probably mostly the last :)
For Java, Sun both wrote the compiler and the coding standard, and the standard existed from the very beginning, so there really is no debate on the big issues.
Yet this compiles:
public class test {
private static String MyString = "Hello";
public static final void main(String[ ] ARGS) { System.out.println(MyString); } }
Maybe in Java 1.5.1 (or 5.1) we could have the enforcement as a flag (say -Xcheckstyle) and make it permanent in Java 1.6?
There will always be some marginal cases. You can enforce formatting before check-in (commit) and even after taking sources from the repository.
Or make desktop icon 'Fix formatting for all my sources'.
But we should be allowed to do better than standards. Because of some (thousands of) people whose code we don't like, we should not restrict all programmers.
I agree with the author. Having so many styles makes programming more confusing, especially for entry-level programmers. Just look at it: one book advocates one style, another book advocates a different one, etc. How can he or she not feel confused? Which style is better? If we only have one style, and enforce it, that new programmer can concentrate on different aspects of that language. It makes things easier. For all of us, really.
I've been programming for a couple years now and I'm still trying to find the best style. I would be happy to have that aspect eliminated.
I've just joined a new team, and therefore I'm now using their coding guidelines. I'm convinced to 110% that it is a Good Thing that a team uses one agreed-upon coding style. So everything should be just fine, right?
Well, no. This team's guideline has a couple of extremely annoying and inexplicable rules squeezed into an otherwise decent guideline document.
The most annoying rule says that "there MUST be NO spaces between any keyword and an opening parenthesis"!! Sigh. Or rather @#£¤$%&*#. It is rules like this which give coding guidelines a bad reputation, I guess. Pity poor me...
Therefore, despite in-house programming style guideline documents, or maybe because of those amateurishly put together guidelines, I *may* welcome a style guideline defined and enforced by the language standard. I say may because it is important that such an enforcement of style as part of the language has the strongest support from the community. Of course.
I wonder if there are any style comparisons put together? Pointers to URLs, articles and books are most welcome! For example, can I find the most popular styles in C lined up side by side somewhere?
/Tommy
PS. I'm really looking forward to the forthcoming book "C++ Coding Standards" by Herb Sutter and Andrei Alexandrescu.
I agree, and I've always thought this would be extremely powerful, especially when doing collaborative programming.
Let's face it; everyone likes to see code in their own favorite way. If it was stored in a standardized format, but *your* IDE automatically formatted it so you could see it in your style, it would accomplish a lot.
No more CVS headaches when somebody "accidentally" reformats the entire project, for example.
I don't think you've gone quite far enough. I think the language specification should specify the infoset of the language rather than the syntax. This makes the relaxed vs. tight syntax debate moot.
[Aside: this closely parallels the "XML sux" thread that also showed up on Artima today - what is the best way to parse content (and its copyright) and presentation layers - I'm trying to be nonpartisan here]
By the infoset, I mean the logical structure of the construct, ie: IF has a {condition} and a {block}. No approved syntax, just structure. Not XML, though that might be a fine way to persist a program without "whitespace" issues. As I remember, RDF is defined this way.
In this way, the syntax of the program would be one of many, not all of which would have to be complete. Much like the language-VM relationship underlying Java, C#, et al.
Then a language product (eg: java) would issue one or more syntaxes on top of the language, including painfully tight ones such as you've proposed, and a given project/effort could adopt one (or choose to argue forever ;-) A professional group (as in mature and with the beat, not commercial) would adopt the rigorous norm.
Representation transformers would then have the same advantages as in VM's, pcode, and what not: they only have to be one way, text to object or object to text, rather than one style convention to another.
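A small sketch of the idea, using Python's standard ast module as a stand-in for the "infoset" (an assumption made here for illustration; the comment above does not name any particular representation). Two differently formatted sources map to the same tree, and a single object-to-text step produces one canonical rendering. It needs Python 3.9 or later for ast.unparse.

```python
import ast

# The same logical program in a loose and a tight surface syntax.
loose = "if  x>0 :\n        y=1\n"
tight = "if x > 0:\n    y = 1\n"


def infoset(source: str) -> str:
    """Text-to-object: dump the parse tree, ignoring layout and positions."""
    return ast.dump(ast.parse(source))


# Both spellings carry the same structure.
same_structure = infoset(loose) == infoset(tight)

# Object-to-text: render the tree back out in one fixed style.
canonical = ast.unparse(ast.parse(loose))
```

Here same_structure is True, and canonical is the tightly formatted version regardless of which spelling went in. Each additional surface syntax would need only its own parser and printer against the tree, not a converter to every other style.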
Wow, and I thought the naive idealist in me had died a while ago...I'm going back to go waste some time adjusting the magic formatting options in my IDE...
> I don't think you've gone quite far enough. I think the language specification should specify the infoset of the language rather than the syntax. This makes the relaxed vs. tight syntax debate moot.
This is the image I got when talking to Gosling about his project Jackpot, in which the notion of program truth is not text, but an annotated parse tree--something that was traditionally not built until the compiler compiles. But Gosling wanted to see what kind of things he could do if the IDE kept everything in a parse tree form, and he was able to do some visualization and refactoring stuff. The refactoring sounded like what IntelliJ had been doing for quite a while, but the really interesting conceptual leap I got from the conversation was that maybe future languages could be defined not as a text syntax but as some kind of data structure.
This would allow a Model/View separation, in which the model was defined as a standard, and different people could write and view the program in ways that fit their personalities. I was concerned that this would mean the actual saved program would be binary, and I asked Gosling how Jackpot saved programs. He said it is saved in Java code.
I don't think I ended up publishing this part of the conversation, but I asked him about how he deals with comments, then, because they aren't part of the syntax. He said comments were among the hardest parts for him to work with. So one thing languages in the future could do that would help is define comments as first class parts of the syntax, not as text to be stripped off before parsing. But if you can figure out a way to deal with the comments, you could create a Java IDE that does this today. (I believe Jackpot is now part of NetBeans, so maybe it already exists today in some form.)
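The comment problem is easy to reproduce. This is a minimal sketch using Python's standard ast module rather than Jackpot or Java (a substitution made here only because it is short to demonstrate, and it needs Python 3.9 or later): the comment never reaches the parse tree, so it cannot survive a round trip through it.

```python
import ast

# One statement with a trailing comment.
source = "x = 1  # the answer is elsewhere\n"

# Parse to a tree, then print the tree back out as text.
round_tripped = ast.unparse(ast.parse(source))

# The tokenizer discarded the comment before the tree was built.
comment_survived = "#" in round_tripped
```

After the round trip the statement is intact but comment_survived is False. Any tool that treats the tree as the program has to carry comments along some other way, unless the language makes them part of the syntax.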
The other comment I would make about enforcing style at the language level is that no matter what style you choose, some people won't like it. And although it would help the economy probably if everyone adopted the same style, some people may just choose a different language whose enforced style better fits their taste. So it won't necessarily help you market your language, and I doubt it would end user complaints about style. Instead of complaining about others' style, they'll complain about the language's style.
Python uses indentation to determine blocks, and I like that a lot, but I know others who hate that. Nevertheless, Python does go a bit in the direction Ken talked about in his weblog, and I think Python programmers benefit from it. One time over breakfast with Bruce Eckel I wondered if Python didn't go far enough, if a language should enforce even down to the number of spaces between an if and an open paren. I suggested we invent a new language to try this idea out, and proposed we call it ARL, for Anal-Retentive Language.
But my opinion today is that this kind of thing is better done by making comments a first class syntax citizen, and then letting programmers use tools to view and edit the program in whatever way works for their personality and situation. The saved form of the program (the "model") could be specified as Ken suggests, down to the number of spaces between if and (.
re the main post ... my main problem with layout standards is that they are indirectly a constraint on the whole development environment of the programmer. A standard which is right for a 21 inch desktop may not be right for a 15 inch laptop. Maybe I prefer to split my Emacs window vertically and you prefer to split it horizontally.
Is this just the first step towards a standard IDE and computer? Are they gonna try to take Emacs away from me?
It's a good idea, but I think that there are more important fish to fry if we start tweaking language rules. Imagine how much better code would look if C++/Java/C# compilers no longer accepted more than 5-7 statements per method.
Look, it's a well stated point. Good conception and articulation. I get it. But the way you laid out the bold text, the length of the lines in some of the paragraphs, and your use of a space between periods and the start of the next sentence...