Opened 16 years ago
Last modified 12 years ago
#2728 confirmed Bug
String.prototype.Trim should also trim unicode ideographic space
Reported by: | thiloplanz | Owned by: | |
---|---|---|---|
Priority: | Normal | Milestone: | |
Component: | General | Version: | |
Keywords: | | Cc: | |
Description
String.prototype.Trim (defined in fckjscoreextensions.js) should also remove the Unicode "Ideographic Space" (U+3000), which is used in Japanese.
```javascript
String.prototype.Trim = function()
{
	// We are not using \s because we don't want "non-breaking spaces" to be caught.
	return this.replace( /(^[ \t\n\r\u3000]*)|([ \t\n\r\u3000]*$)/g, '' ) ;
}

String.prototype.LTrim = function()
{
	// We are not using \s because we don't want "non-breaking spaces" to be caught.
	return this.replace( /^[ \t\n\r\u3000]*/g, '' ) ;
}

String.prototype.RTrim = function()
{
	// We are not using \s because we don't want "non-breaking spaces" to be caught.
	return this.replace( /[ \t\n\r\u3000]*$/g, '' ) ;
}
```
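To check the proposed behaviour without modifying `String.prototype`, here is a minimal sketch that applies the same regular expression as the proposed Trim in a standalone function. The name `trimProposed` is hypothetical and not part of FCKeditor; it only illustrates that U+3000 is stripped while U+00A0 (no-break space) is left alone.

```javascript
// Hypothetical standalone helper mirroring the proposed Trim():
// strips ASCII space, tab, LF, CR and U+3000 (ideographic space)
// from both ends, but deliberately leaves U+00A0 (no-break space) alone.
function trimProposed(s) {
  return s.replace(/(^[ \t\n\r\u3000]*)|([ \t\n\r\u3000]*$)/g, "");
}

const padded = "\u3000\u3000Tokyo\u3000"; // padded with ideographic spaces
const nbsp = "\u00A0text\u00A0";          // padded with no-break spaces

console.log(trimProposed(padded));         // Tokyo
console.log(trimProposed(nbsp) === nbsp);  // true: NBSP preserved
```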
Change History (4)
comment:1 Changed 16 years ago by
comment:2 Changed 16 years ago by
I am not sure we want to strip all Unicode whitespace; for example, the "non-breaking spaces" mentioned in the code comment need to be preserved. The various types would need to be reviewed one by one by someone who actually uses them.
For Japanese users, U+3000 is commonly used and treated the same as a regular space, so they expect it to be trimmed as well.
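The one-by-one review can be started mechanically. This sketch walks the Basic Multilingual Plane and lists every code point that JavaScript's `/\s/` matches but the proposed `[ \t\n\r\u3000]` class does not. The function name `unicodeOnlyWhitespace` is hypothetical. In an ES5+ engine the list includes, among others, U+000B, U+000C, U+00A0, U+2000 to U+200A, U+2028, U+2029 and U+FEFF; the exact contents depend on the Unicode version the engine implements.

```javascript
// Character class used by the proposed Trim() (no "g" flag, so .test()
// has no lastIndex state between calls).
const proposed = /[ \t\n\r\u3000]/;

// Hypothetical helper: collect BMP code points that /\s/ matches
// but the proposed class does NOT strip, formatted as "U+XXXX".
function unicodeOnlyWhitespace() {
  const result = [];
  for (let cp = 0; cp <= 0xffff; cp++) {
    const ch = String.fromCharCode(cp);
    if (/\s/.test(ch) && !proposed.test(ch)) {
      result.push("U+" + cp.toString(16).toUpperCase().padStart(4, "0"));
    }
  }
  return result;
}

// Print the candidates that still need a keep/strip decision.
console.log(unicodeOnlyWhitespace().join(" "));
```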
comment:3 Changed 16 years ago by
Summary: String.prototype.Trim should also trim unicode two-byte whitespace → String.prototype.Trim should also trim unicode ideographic space
comment:4 Changed 12 years ago by
Status: new → confirmed
ECMAScript 5's String#trim method removes leading and trailing characters matched by /\s/, and that set includes U+3000. I think CKEditor should do the same.
https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/String/Trim
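A small sketch of what switching to the native method would mean. By the ES5 specification, `String.prototype.trim` removes WhiteSpace and LineTerminator characters, the same set `/\s/` matches. That fixes U+3000, but the same set also contains U+00A0, which the existing FCKeditor comment says should not be caught.

```javascript
// ES5+ String.prototype.trim removes WhiteSpace and LineTerminator
// characters, i.e. exactly what /\s/ matches. That set includes
// U+3000 (ideographic space) but also U+00A0 (no-break space).
const samples = ["\u3000abc\u3000", "\u00A0abc\u00A0", " \t\nabc\r\n"];

for (const s of samples) {
  const viaRegex = s.replace(/^\s+|\s+$/g, "");
  // prints "abc" true for each sample
  console.log(JSON.stringify(s.trim()), s.trim() === viaRegex);
}
```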
There are several definitions of whitespace. The base whitespace definition in Unicode is here, but most languages restrict themselves to ASCII whitespace. So either restrict to ASCII or strip all Unicode whitespace.
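The two options can be sketched side by side. Both function names are hypothetical, not CKEditor API. Note that neither option reproduces the original proposal, which strips U+3000 while keeping U+00A0: the ASCII option keeps both, and the native option strips both.

```javascript
// Option 1 (hypothetical helper): ASCII whitespace only
// (space, tab, LF, VT, FF, CR). U+3000 and U+00A0 are kept.
function trimAscii(s) {
  return s.replace(/^[ \t\n\v\f\r]+|[ \t\n\v\f\r]+$/g, "");
}

// Option 2 (hypothetical helper): all Unicode whitespace via the
// native ES5 method. Strips U+3000, but also U+00A0 and other Zs characters.
function trimUnicode(s) {
  return s.trim();
}

const input = "\u3000\u00A0abc \t";
console.log(trimAscii(input).length); // 5: leading U+3000 and U+00A0 kept
console.log(trimUnicode(input));      // abc
```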