Closed
Bug 520095
Opened 15 years ago
Closed 13 years ago
Should encoded U+FFFE and U+FFFF decode to U+FFFD?
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
VERIFIED
FIXED
mozilla8
People
(Reporter: Waldo, Assigned: Waldo)
References
(Blocks 1 open bug, )
Details
(Whiteboard: [inbound])
Attachments
(2 files)
2.23 KB,
patch
|
Waldo
:
review+
|
Details | Diff | Splinter Review |
2.16 KB,
patch
|
masa141421356
:
review+
|
Details | Diff | Splinter Review |
Bug 172699 made these characters decode to the replacement character. Why? Spec sez this: Let V be the value obtained by applying the UTF-8 transformation to Octets, that is, from an array of octets into a 32-bit value. If Octects does not contain a valid UTF-8 encoding of a Unicode code point throw a URIError exception. According to Table 21 in the ES5 final draft (pending approval in December), U+FFFE and U+FFFF are valid (only the surrogate range is not valid). Do we preserve our still-non-standard behavior? Should this be fixed in ES5 errata (or ES6)? I have no idea, hence this bug.
Assignee | ||
Comment 1•15 years ago
|
||
(Yet another option would be to throw, which might be better than replacing silently if we actually want to treat U+FFF[EF] as error-ful, and it's arguably more consistent with how we'll handle error-ful overlong UTF-8 once I land bug 511859.)
Comment 2•15 years ago
|
||
U+FFFE has security implications similar to those for overlong sequences. By embedding U+FFFE noncharacters, attackers can get an insufficiently careful UTF-16 decoder to byte swap subsequent input, sneaking prohibited characters past security-enforcing input validators.
Assignee | ||
Comment 3•15 years ago
|
||
Hm, why not U+FEFF then, if that were part of the reasoning? Why why why couldn't the existing behavior have had an explanatory comment by it...
Comment 4•15 years ago
|
||
U+FEFF is a valid, assigned character. There is no reason to prohibit it. The code would be quite unmaintainable if every bug fix added explanatory comments. Such things generally belong in Bugzilla, for example bug 74198 comment 3.
Assignee | ||
Comment 5•15 years ago
|
||
Er, oh, never mind, I don't know what I was thinking with respect to U+FEFF. I disagree about where explanatory comments should go, but do note that the bug that introduced that code didn't have such a comment.
Comment 6•15 years ago
|
||
Bug 172699, which introduced this code, has an initial comment referring to bug 172699, which has the comment.
Assignee | ||
Comment 7•13 years ago
|
||
This causes test262 failures: S15.1.3.1_A2.4_T1 Complex tests, use RFC 3629 fail S15.1.3.2_A2.4_T1 Complex tests, use RFC 3629 fail Regardless what Unicode wants here, or what is purest from that point of view (and I do sympathize with it), I think ES5 wants us to treat Table 21 - UTF-8 Encodings as canonical. Only bad or mismatched surrogates, or overlong sequences, are to be treated as errors. That's what test262 wants. It's what Chrome, Opera, IE, and Safari implement. If anyone actually were to be broken by letting a couple more code points through, they'd also be broken in every other browser. And if we did this, it looks like it'd address the dependency bug 505991. We should make the change.
Assignee | ||
Comment 8•13 years ago
|
||
This is all straightforward changes to whitespace, variable declaration location, and so on, so it doesn't need review. I'm just posting it in case anyone wants to actually apply the next patch, which will modify a tree with this patch in it.
Attachment #546056 -
Flags: review+
Assignee | ||
Comment 9•13 years ago
|
||
This undoes replacement for these two characters and adds a test for non-replacing behavior.
Attachment #546057 -
Flags: review?(masa141421356)
Comment 10•13 years ago
|
||
I think it is a poor choice to introduce a potential security vulnerability just because some standard or test suite seems to require it. I believe the security vulnerability should be brought up with the maintainers of the standard/suite in question.
Assignee | ||
Comment 11•13 years ago
|
||
I disagree, obviously, but the concern is at least plausible. es-discuss can take a pass at it, although I doubt it will codify either current replace-with-U+FFFD behavior or throw-a-URIError behavior not implemented by any browser to date. https://mail.mozilla.org/pipermail/es-discuss/2011-July/015993.html Hopefully that request will receive more responses than it did the first time I made it: https://mail.mozilla.org/pipermail/es-discuss/2009-October/010014.html My query of a few hours ago is actually in response to this message, which received no replies when I originally posted it.
Assignee | ||
Comment 12•13 years ago
|
||
The response so far: https://mail.mozilla.org/pipermail/es-discuss/2011-July/016000.html
Comment 13•13 years ago
|
||
Comment on attachment 546057 [details] [diff] [review] Patch and test Looks fine.
Attachment #546057 -
Flags: review?(masa141421356) → review+
Assignee | ||
Comment 14•13 years ago
|
||
In the absence of feedback from ECMA saying that censoring U+FFF{E,F} is appropriate, and in the presence of tentative feedback saying that it's the wrong thing to do, I landed the patch to make the switch: http://hg.mozilla.org/integration/mozilla-inbound/rev/ffc7c2391d3f
Whiteboard: [inbound]
Target Milestone: --- → mozilla8
Comment 15•13 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/ffc7c2391d3f
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•13 years ago
|
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•