for-in grammar

2011-08-23

And down the rabbit hole I go. Again. I never knew, but there actually are some syntactic restrictions on the for-in statement. Even though you're unlikely to ever get this wrong, it's still interesting to see what the hell is going on...

Now, if you're not that interested in grammars, syntax or language edge cases, turn around now. This is not for you.

Also note that I'm talking about the formal specification / grammar level, not about what your browser does or does not do... :) Whenever this post calls something good or bad, it means on the parser level.

The common patterns for the for-in are:

Code:
for (var key in obj) ...
for (key in obj) ...

Nothing weird here. Crockford will tell you to wrap the loop body in an if that checks hasOwnProperty, but that's beyond the scope of the syntax. Far beyond it.
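For completeness, the own-property guard Crockford recommends looks roughly like this. It's a minimal sketch: ownKeys is a hypothetical helper name, and calling hasOwnProperty through Object.prototype is just a defensive habit in case the object itself shadows that method.

```javascript
// Hypothetical helper: collect an object's own enumerable keys with for-in,
// skipping anything inherited through the prototype chain.
function ownKeys(obj) {
  var keys = [];
  for (var key in obj) {
    // Borrow hasOwnProperty from Object.prototype in case obj shadows it.
    if (Object.prototype.hasOwnProperty.call(obj, key)) {
      keys.push(key);
    }
  }
  return keys;
}

var proto = { inherited: true };
var child = Object.create(proto); // child inherits "inherited" from proto
child.own = 1;

console.log(ownKeys(child).join(",")); // own
```

A plain for-in over child would visit both "own" and "inherited"; the guard drops the inherited one.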

Now, let's take a look at some more exotic versions. First, with a variable declaration.

Code:
for (var i=0 in obj) ...
for (var i,j in obj) ...
for (var foo in bar in obj) ...
for (var i=5, j=6 in obj) ...
for (var i in obj, j in obj2) ...

Can you determine which of these are wrong (on the parser level)? Don't worry, I don't think many people actually know. And why would you? There's no real reason to ever go there. The only valid versions are the first and the third. Edit: and the fifth, thanks @joseanpg. In the fifth, everything after the first in is a single comma expression, (obj, j in obj2), so you're essentially doing i in <boolean>, which won't do anything useful when executed.
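If you want to poke at this yourself, you can ask your engine's parser directly: new Function compiles a function body without running it, so anything it throws is an early (parse-time) error. Keep in mind that this tests your engine, not the 2011 grammar this post is about. As far as I know, later editions of the spec changed the initializer case (it survives only as a sloppy-mode compatibility exception), so the first line may come out differently depending on the engine; the other four should agree with the analysis here. The parses helper is a hypothetical name.

```javascript
// Hypothetical helper: returns true if the engine accepts src as a function body.
// new Function only compiles the body and never runs it, so undeclared
// names like obj or bar don't matter here.
function parses(src) {
  try {
    new Function(src);
    return true;
  } catch (e) {
    return false; // SyntaxError (or another early error) while compiling
  }
}

var headers = [
  "for (var i=0 in obj);",          // valid per the ES5 grammar; engines may differ
  "for (var i,j in obj);",          // more than one variable
  "for (var foo in bar in obj);",   // var foo in (bar in obj)
  "for (var i=5, j=6 in obj);",     // more than one variable
  "for (var i in obj, j in obj2);"  // var i in (obj, j in obj2)
];

headers.forEach(function (src) {
  console.log((parses(src) ? "ok      " : "rejected") + "  " + src);
});
```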

Although a variable declaration in the for-in header is allowed to have an initializer, you cannot declare more than one variable. That also means you cannot initialize more than one, regardless of how you write it.

Then there's the weird double in version. Note that the second in actually belongs to the whole expression that follows the first in. So you're actually doing var foo in (bar in obj). This won't do anything useful when executed, but it is not an error on the parser level.
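A quick way to see what the double in form does once it runs: the right-hand side of the first in is the boolean result of bar in obj, and a boolean has no enumerable properties, so the loop body simply never executes. A small sketch with made-up values for obj and bar:

```javascript
var obj = { a: 1, b: 2 };
var bar = "a";
var seen = [];

// Parsed as: var foo in (bar in obj)
// (bar in obj) evaluates to true, and enumerating a boolean yields no keys.
for (var foo in bar in obj) {
  seen.push(foo);
}

console.log(seen.length); // 0
```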

If you thought that was weird, just wait for the versions that don't have a variable declaration. Which of these are incorrect (on the parser level)? Ignore the numbers that precede them.

Code:
1 for (key in obj) ...
2 for (key=5 in obj) ...
3 for (key={} in obj) ...
4 for (foo in bar in obj) ...
5 for ((foo=bar) in obj) ...
6 for (new foo in obj) ...
7 for (foo() in obj) ...
8 for (new foo() in obj) ...
9 for (foo(bar in baz) in obj) ...

Yeah, not so easy now, eh? Let's discuss them one by one.

#1. Standard, won't even bother you with this one. Valid, of course.

#2. Looks standard, doesn't it? Don't trust your browser; according to the specification this is NOT allowed. Start at IterationStatement. A LeftHandSideExpression is either a CallExpression (no assignment in this grammar sub-tree) or a NewExpression. The NewExpression goes to the MemberExpression, which goes to the PrimaryExpression. The PrimaryExpression allows a generic Expression (which includes assignments), but only when wrapped in parentheses. So there is no rule that allows an assignment without parentheses. Alas, this version is invalid.
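The #2 argument boils down to a reachability question: starting from LeftHandSideExpression and following only unit rules (rules whose right-hand side is a single symbol), can you get to AssignmentExpression? The rule tables below are my own heavily trimmed copy of the ES5 expression rules involved, and reachable is a hypothetical helper; it models the reasoning, it is not a parser.

```javascript
// A trimmed subset of the ES5 expression grammar, keeping only unit rules
// (right-hand side is a single symbol). Rules such as
// "CallExpression : MemberExpression Arguments" are left out on purpose:
// they wrap the inner symbol in more syntax instead of *being* it.
var unitRules = {
  LeftHandSideExpression: ["NewExpression", "CallExpression"],
  NewExpression: ["MemberExpression"],
  MemberExpression: ["PrimaryExpression", "FunctionExpression"],
  PrimaryExpression: ["this", "Identifier", "Literal", "ArrayLiteral", "ObjectLiteral"],
  Expression: ["AssignmentExpression"]
};

// "PrimaryExpression : ( Expression )" is the only way back up to a full
// Expression, and it needs the parentheses, so it is kept separately.
var parenRules = {
  PrimaryExpression: ["Expression"]
};

// Collect every symbol reachable from start, optionally allowing the
// parenthesized rule.
function reachable(start, allowParens) {
  var seen = Object.create(null);
  var stack = [start];
  while (stack.length > 0) {
    var sym = stack.pop();
    if (seen[sym]) continue;
    seen[sym] = true;
    var next = unitRules[sym] || [];
    if (allowParens) next = next.concat(parenRules[sym] || []);
    next.forEach(function (n) { stack.push(n); });
  }
  return seen;
}

console.log(Boolean(reachable("LeftHandSideExpression", false).AssignmentExpression)); // false
console.log(Boolean(reachable("LeftHandSideExpression", true).AssignmentExpression));  // true
```

Without the parenthesized rule, AssignmentExpression never shows up, which is why key=5 can't sit on the left of the in while (foo=bar) can.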

#3. Don't be confused by the object literal. This is just an assignment, so see #2.

#4. Just like the variable declaration version, this is valid. And just like there, the first in marks the center of the for-in and the second one belongs to the expression on the right. It won't do anything useful when executed, but it parses fine.

#5. Valid, see #2.

#6. Valid. I find this a little weird, since there's no way the result of a new expression could be a valid target for the for-in to assign keys to when it runs. For the path, see #2.

#7. Valid, just like #6.

#8. Valid, just like #6; the parentheses don't change anything.

#9. That was a lame attempt, but this is of course also valid because the first in is parsed as part of the argument expression for the call, not as the in for the for-in.

Now, what makes this interesting to me is that this seems to be the only language construct that requires backtracking on the parser level. So far I haven't encountered any other part of the ECMAScript grammar that requires you to backtrack when you cannot parse the next token. This means that, apart from the for-in, you can parse JS linearly, without backtracking: if at any point you cannot continue, the source is invalid. Even labels don't need it; a label starts out looking like an identifier, and you only have to check whether a colon follows.

Now, this sounds worse than it really is. Say you have for (key=5 in obj);. By the time you reach the in token, you would in theory have to backtrack to before the = token to throw the error. But there is no other way to parse this, so all you have to do is remember that you parsed an assignment: throw an error when you encounter the in keyword, and accept if you encounter a semicolon instead (then it's simply the initializer of a regular for (;;) loop). Also note that this is an edge case, and the overhead doesn't matter for the parser since you're going to bail anyway.
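A minimal sketch of that "remember, then decide" approach, assuming a toy tokenizer that only knows identifiers, numbers, a few comparison operators and single punctuation characters (no strings, regexes or comments). classifyForHeader is a hypothetical name; it takes the text between the for parentheses and tracks only the two things this post cares about: a top-level = and a top-level comma before the first top-level in. A real parser would instead check that the whole parsed expression is a LeftHandSideExpression.

```javascript
// Hypothetical helper: classify a for-header in one left-to-right pass,
// never backing up. Toy tokenizer: identifiers, numbers, ==/===/!=/!==,
// <=/>= and single punctuation characters only.
function classifyForHeader(header) {
  var tokens = header.match(/[A-Za-z_$][\w$]*|\d+|[=!]==?|[<>]=|\S/g) || [];
  var isVar = tokens[0] === "var";
  var depth = 0;              // nesting depth of (), [] and {}
  var sawAssignment = false;  // top-level "=" seen before the first top-level "in"
  var sawComma = false;       // top-level "," seen before the first top-level "in"

  for (var i = isVar ? 1 : 0; i < tokens.length; i++) {
    var t = tokens[i];
    if (t === "(" || t === "[" || t === "{") {
      depth++;
    } else if (t === ")" || t === "]" || t === "}") {
      depth--;
    } else if (depth === 0) {
      if (t === "=") sawAssignment = true;
      else if (t === ",") sawComma = true;
      else if (t === ";") return "for(;;)"; // plain for loop, anything goes
      else if (t === "in") {
        // Only now do we know it is a for-in; the remembered flags decide.
        if (sawComma) return "invalid";                // more than one variable/expression
        if (sawAssignment && !isVar) return "invalid"; // assignment is not a LeftHandSideExpression
        return "for-in";                               // ES5: one var may have an initializer
      }
    }
  }
  return "incomplete";
}

[
  "key in obj", "key=5 in obj", "key={} in obj", "foo in bar in obj",
  "(foo=bar) in obj", "new foo in obj", "foo() in obj", "new foo() in obj",
  "foo(bar in baz) in obj", "key=5; key<10; key++"
].forEach(function (h) {
  console.log(classifyForHeader(h) + "  " + h);
});
```

Parentheses, brackets and braces are tracked so that an in or = nested inside them (as in #5 and #9) is ignored, just like the grammar ignores them.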