On semicolons, pseudo-technical and subjective arguments, revisited
The JavaScript community is, indeed, very mature — just like any other programming community. We always find time to properly and politely (and heatedly!) discuss the most important kinds of topics: like whether to use semicolons or not, or whether to use tabs or spaces. Or yet, whether to use Emacs or Vi{,m}. Of course, everyone knows you should use Emacs, so I don't even know why those discussions surface.
In any case, I decided to write a post to shed some light on the subject
and let the flames begin. I tried to just shake this whole discussion
away, because it started out too subjective… but I can't. This is an
important post. Someone is wrong on the internet!
A bit of history
Discussions on the rules of Automatic Semicolon Insertion (henceforth ASI) aren't a new thing in the community. In fact, we've been seeing these for as long as the community has existed. Well, perhaps not quite that long, but the discussions definitely grew as the community as a whole grew.
On the one side, there are people who argue that ASI is harmful, and that you should be ashamed of using such an excuse for "poor coding style". Of course, the qualities of a coding style are usually extremely subjective, and make absolutely no sense taken without context, that is, without stating what concerns the particular coding style tries to address. Other arguments involve blatant (and sometimes personal) attacks on people who don't use semicolons in their code, misconceptions about when ASI applies, and things like "we've been told ASI is harmful, so it must be!" or "JS is not a semicolon-less language". I wonder if they would also consider Python and Haskell non-semicolon-less languages, since they also have semicolons and ASI (albeit with much more significant line breaks and white space handling).
On the other side, there are people who argue that semicolons are harmful, and that we're much better off without them. Arguments against semicolons vary from a simple "Yes, yes we can!", passing through "But I use other languages that don't require semicolons!", to "Omitting semicolons makes errors more apparent by turning them into unusual things". Needless to say, those are not actual quotes, but they are close enough to the crux of the arguments.
In this post, despite the hints of flamewar here and there, I'll try to lay out the discussion from a not-so-biased standpoint. I am on the "Yay ASI!" side, though, so you might see some bias here and there. I'll try to make it as overt as possible, so you can just laugh it off (or rage) and move on.
What is ASI?
We can't start this whole discussion without first clarifying what we're talking about. That's why we first need to define what in the Lord's name ASI is.
Let's start with the specification definition of this whole mess:
Certain ECMAScript statements (empty statement, variable statement, expression statement, do-while statement, continue statement, break statement, return statement, and throw statement) must be terminated with semicolons. Such semicolons may always appear explicitly in the source text. For convenience, however, such semicolons may be omitted from the source text in certain situations. These situations are described by saying that semicolons are automatically inserted into the source code token stream in those situations.
The specification then goes on to define all of the rules by which semicolon insertion takes place.
If you still don't grasp what ASI is after reading this part of the specification, don't worry, most people don't. We could summarise it as: "You need semicolons to end the statements cited above, but for convenience you can omit them in some situations, if you want to."
As Brendan Eich noted in his post, ASI is a syntactic error-correction procedure that got implemented in JavaScript engine parsers and then made it into the specification. This means that, technically speaking, all code written without semicolons has syntactic errors in it. There's no denying it, because that's how the specification describes these procedures.
Practically speaking, though, the parsing rules and ASI rules for the ECMAScript language are pretty deterministic (should we even call it "insertion", since parsers don't need to actually add that token?), and ambiguities in terms of Automatic Semicolon Insertion can be resolved with one or two tokens of look-ahead, if we ignore comments and horizontal white space.
In fact, if we looked at the practical matter of the subject, we could categorise famous languages' semicolon insertion rules in the following groups:
- White-space strict (aka Statements end as early as possible)
With Python as the best-known example of this particular style. In a language where "statements end as early as possible", you can kind of define the rules for the language's statements (or expressions) as the following:
<statement>        ::= ( lots of junk ) <end-of-statement>
<end-of-statement> ::= NEWLINE | ";"
Which, at a first glance, might look quite easy to remember, right? Unfortunately, in real world code, there are quite some more intricacies regarding these rules. Sticking with Python, since you want a little more expressiveness on how to lay out your long statements, Python offers you optional implicit and explicit statement continuations. So, a simple example with a string spanning multiple lines (and yes, I am aware of the additional constructs for multi-line strings in the language), would look like the following:
foo = "a string" \
    + "in multiple" \
    + "lines."
Notice how each line has to be terminated with an explicit continuation. Worse! If your editor can't highlight invisible characters, you risk running into the following error:
$ python test.py
  File "test.py", line 1
    foo = "a line" \
                    ^
SyntaxError: unexpected character after line continuation character.
An alternative to explicit continuations are implicit line joins, which happen on parenthesised expressions:
foo = ( "a string"
        "in multiple"
        "lines." )
Note that, in Python's particular case, the concatenation operator isn't necessary when two chunks of strings are separated by just white-space. This style saves you from worrying about whether there's white-space after your explicit continuations, but then your simple <end-of-statement> rules stop being just NEWLINE | ";".
- Non-white-space strict (aka If we can't parse, insert a damn semicolon!)
ECMAScript is actually the only language I know that fits this bill, with Haskell having even more complex statement rules, but managing to be overtly white-space strict.
In ECMAScript, your statements always end with a semicolon. Whether you need to explicitly spell one out in your source code, though, is another matter entirely. In fact, you can write large JavaScript applications without writing a single semicolon in your source code (comments excluded), as long as you stay away from old-style for loops.
In JavaScript, the same example above would look like this:
var foo = "a string"
        + "in multiple"
        + "lines."
ECMAScript's handling of statement continuation, imho, feels much more natural to read and write than Python's. Therefore, it is, of course, TEH BESTEST, or would be if my feelings constituted actual arguments; sadly, they do not.
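As a small sketch of the claim that you can get by without writing semicolons, the snippet below has none at all. The names `parts` and `shout` are made up for illustration, and the strings carry trailing spaces only so the joined result reads nicely. Lines that begin with an infix operator or a dot continue the previous statement, and array methods stand in for the old-style for loop:

```javascript
// A line starting with "+" continues the previous statement
var foo = "a string "
        + "in multiple "
        + "lines."

// Array methods stand in for the old-style `for (;;)` loop;
// a line starting with "." also continues the previous statement
var parts = ["a", "b", "c"]
var shout = parts
  .map(function (p) { return p.toUpperCase() })
  .join("-")

console.log(foo)    // a string in multiple lines.
console.log(shout)  // A-B-C
```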
How does ASI actually work?
If you've been reading the comments from this whole flamewar discussion,
you'll see that many people describe ASI in JavaScript as a guessing game. They couldn't be further from the truth. In fact, there's no guessing at
all, everything is extremely deterministic from the grammar of the language
itself.
And what's more, to check whether a statement ends, a parser needs at most two tokens of look-ahead, if we don't consider comments and white-space. So, let's first define what the usual rule for ending a statement in JavaScript is:
<end-of-statement> ::= ";"
                     | <horizontal-space>* NEWLINE blank* (and STATEMENT_ENDED? (not <continuation>))
<horizontal-space> ::= SPACE | TAB | <comment>
<continuation>     ::= "(" | "[" | <infix-operator>
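The <continuation> part of that rule can be approximated in a few lines of code. This is only a rough sketch under loud assumptions: `continuesPreviousLine` is a hypothetical helper, it looks at raw text rather than real tokens, it assumes comments have already been stripped, and its operator list is deliberately incomplete. It is not how an actual parser does it.

```javascript
// Hypothetical helper: given the text of the NEXT line, guess whether it
// continues the previous statement instead of starting a new one.
function continuesPreviousLine(nextLine) {
  var text = nextLine.trim()
  // "++" and "--" at the start of a line begin a new statement;
  // they never attach to the line above
  if (/^(\+\+|--)/.test(text)) return false
  // "(" and "[", plus an incomplete list of infix operators
  return /^(\(|\[|[+\-*\/%,.?:=<>&|^]|in\b|instanceof\b)/.test(text)
}

console.log(continuesPreviousLine("(function(){})()"))  // true
console.log(continuesPreviousLine("[1, 2].forEach(f)")) // true
console.log(continuesPreviousLine("+ b"))               // true
console.log(continuesPreviousLine("++b"))               // false
console.log(continuesPreviousLine("var x = 1"))         // false
```

The point of the sketch is only that the decision depends on the first token of the next line, which is why reading semicolon-free code mostly comes down to watching for lines that start with "(", "[" or an operator.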
Well, actually, it's not as simple as that. You have a few special cases in the language to handle prefix operators and a few others, like the return statement, which the specification calls restricted productions.
For prefix operators, it suffices to say that they require an operand after them, and so can't end early due to a line break. Thus, the following is valid JavaScript:
var a = 1
~
a // => -2
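A slightly longer sketch of the same point, with made-up variable names (`flipped`, `b`, `c`, `d`). It contrasts a pure prefix operator such as `~` with `-`, which is also an infix operator and therefore continues the previous line:

```javascript
var a = 1
var flipped =
  ~
  a                  // one expression spread over lines: ~a

var b = 5
var c = b
- 2                  // "-" is also infix, so this is `var c = b - 2`

var d = 1
~d                   // "~" cannot continue `1`, so this is a separate statement

console.log(flipped) // -2
console.log(c)       // 3
console.log(d)       // 1
```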
In the case of restricted productions, however, we have the "awesome" <no line-break here> restriction in the grammar itself. This means that, while these productions may have something following the token, they do not require it, and as such a line break indicates that we want to end the statement early. A good example of a restricted production, and one that is widely (and mistakenly [1]) used to argue that you should always end your statements with a semicolon, is the return statement, which accepts an optional return value:
<return-stmt> ::= "return" [no line break here] <expression>
(function(){ return 1 })()
// => 1

;(function(){ return
   2 })()
// => undefined
Of course, inserting a semicolon after 2 in the second example won't prevent the return statement from always returning undefined.
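To make the restricted productions a bit more concrete, here is a sketch with hypothetical functions `broken` and `fixed`, plus the classic postfix `++` case. Note that adding semicolons at the ends of the lines would not change what any of it does:

```javascript
function broken() {
  return
    { answer: 42 }   // not a return value: parsed as a block containing a label
}

function fixed() {
  return {           // the value starts on the same line as `return`
    answer: 42
  }
}

var x = 1
var y = 1
x
++
y                    // "++" cannot attach to `x` across the line break, so this is `x; ++y`

console.log(broken())      // undefined
console.log(fixed().answer) // 42
console.log(x, y)          // 1 2
```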
For more in-depth articles on the intricacies of ASI in JavaScript, please refer to the awesome people that have written about it over and over and over again. They'll probably dive into more practical (and non-grammar-ish) details than I have here. Inimino's post is particularly awesome.
Is ASI safe?
We've seen that all of the ASI rules are pretty deterministic, and they don't rely on any kind of Black Magic from hell that only parsers and parser-writers know. They are also pretty manageable for a human to keep in their head while reading the code; or at least, I consider JavaScript's ASI rules, despite all of their complexity, to be at the same level of cognitive overhead as Python's, if only because they feel more natural to me.
But do all of the engines implement it properly? Is it safe? Will my code suddenly break?! Well, if you've been blindly following the Cult of Crockford, you might have been led to believe that the answer to all of those questions would be: no, NO!, BET THE HELL IT WILL.
Of course, you should apply the not predicate from higher-order logic to all of those answers to get the actual, useful, answers. So, addressing the concerns in order:
Do all of the engines implement it properly?
Well, as inimino says in his post about JavaScript semicolons, there's no reason to fear any incompatible behaviour between browsers in regard to the "feature" (or misfeature, the definitions vary in the community). I have not been able to verify this claim with empirical evidence to date, but from my experiments in code, it seems to hold.
If any browser implementer would like to clarify the matter, it would be an interesting (and welcome) addition, indeed.
Another misconception is that bugs in browser JavaScript engines mean that using semicolons everywhere is safer, and will protect the developer from compatibility issues between browsers. This is simply not the case. All extant browsers implement the specification correctly with regard to ASI, and any bugs that may have existed are long since lost in the mists of early Web history. There is no reason to be concerned about browser compatibility in regard to semicolon insertion: all browsers implement the same rules and they are the rules given by the spec and explained above.
Is it safe?
No. I mean yes. I mean, it depends on what you mean by safe. From a technical stand-point, it is perfectly safe — after all, it is deterministic. You just need to know the rules.
Will my code suddenly break?!
As David Herman, a TC39 member, kindly clarified, TC39 won't just break working code for no good reason. All the more so when there's such a large body of working code relying on this particular feature all around.
Brendan Eich has also discussed, several times (on his blog, on es-discuss, on Twitter, in…), the issue of compatibility with older versions of ECMAScript and legacy code as they move forward to extend the language.
It can all be summed up as: "if something is working and in use in the JavaScript community, we can't just break it."
So, no, your code will not suddenly break if you start omitting semicolons here and there. Now, I can't guarantee people won't want to break you.
Reasons to use ASI, or not
So, this all said, are there any actual arguments in favour of omitting semicolons from your code, aside from the "well, duh, we can!" non-argument argument? Turns out, there are a few, but they're all overtly subjective, and not in any sense technical:
- Consistency! I write code in other languages that use no semicolons, so avoiding them in JavaScript makes it easier for me to switch back and forth, without inserting them in the wrong places, in the wrong languages.
- Semicolons are not necessary, thus removing them promotes them from ordinary statement terminators/separators to special tokens that should be present in certain situations. It's expected that this change in the role of semicolons would highlight certain kinds of bugs, by making lacks of semicolons in places where they matter more apparent.
- Removing semicolons reduces the overall noise of the source code. Given that the indication provided by semicolons is duplicated by line breaks and indentation, this means that we don't have extraneous and needless symbols to distract you from the actual code.
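The second point above is usually put into practice with a leading semicolon on the few lines that start with "(" or "[". A minimal sketch, with made-up names (`items`, `picked`, `kept`, `total`), of what goes wrong without the guard and how the guard stands out:

```javascript
var items = ["x", "y", "z"]
var total = 0

// No guard: "[" continues the previous line, so this is items[(1, 2)].toString()
var picked = items
[1, 2].toString()

// Leading ";" guard: the array literal now starts its own statement
var kept = items
;[1, 2].forEach(function (n) { total += n })

console.log(picked)          // z
console.log(kept === items)  // true
console.log(total)           // 3
```

In this style the only semicolons left in the file are the ones that actually change how the code parses, which is the "special token" role the bullet describes.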
Of course, there are reasons for not relying on ASI, and the valid ones are also overtly subjective, and not in any sense technical:
- Consistency! I write code in other languages that have no ASI, so putting semicolons everywhere in JavaScript makes it easier for me to switch back and forth, without omitting them in the wrong places, in the wrong languages.
- Avoiding semicolons means that you have a higher cognitive load, and you must always scan much more of the source code than you should, just to be sure that the next line doesn't imply a continuation.
- I use tools that don't support ASI (like JSMin, or JSLint).
Okay, the last one of those is not a subjective reason not to rely on ASI, but neither is it a technical reason why other people should do the same. After all, you're just relying on a broken tool.
Conclusion
You should omit all semicolons from all your JavaScript code. Semicolons everywhere are a freaking silly code idiom, and I'm not going to change my coding style because some punks dislike it… wait, that wasn't really the topic in discussion. Well, whatever.
Footnotes:
1 : Citation needed.