Opened 12 years ago
Closed 11 years ago
#10276 closed New Feature (fixed)
Introduce config.disallowedContent
Reported by: | Piotrek Koszuliński | Owned by: | Piotrek Koszuliński |
---|---|---|---|
Priority: | Normal | Milestone: | CKEditor 4.4.0 |
Component: | General | Version: | 4.1 RC |
Keywords: | Drupal | Cc: | wim.leers@… |
Description (last modified by )
We have config.allowedContent
being a whitelist.
But real life cases often require also being able to define blacklist. Therefore we should introduce config.disallowedContent
.
Change History (21)
comment:1 Changed 12 years ago by
Status: | new → confirmed |
---|
comment:2 Changed 12 years ago by
comment:3 Changed 12 years ago by
Cc: | wim.leers@… added |
---|---|
Description: | modified (diff) |
Keywords: | Drupal added |
This, or something like it, is (AFAICT) necessary for https://drupal.org/node/1936392. I discussed this in that context with @Reinmar on March 27, I think slightly before this ticket was created.
Thinking purely conceptually and logically, I think there might be a case to be made for "whitelisting only" from a WYSIWYG editor perspective.
Having only a whitelist, you must enumerate all allowed tags, attributes, styles and classes. That might seem like more work, but this also ensures that you fully define the universe of all possible uses and values.
For example, rather han being able to exclude (blacklist) e.g. just the target
attribute on <a>
tags, you'll have to allow (whitelist) all of the attributes you want to allow: href
, hreflang
and rel
. More work, but also more certainty, more control. It's probably worth it.
Since a WYSIWYG editor's job is to generate markup, and it's easy to abuse it (or plugins for it) to generate unwanted markup, it makes a lot of sense to demand that the whole universe of allowed values is well-defined.
So, whitelisting only, allowedContent
only, makes sense for CKEditor.
Replying to wwalc:
I think disallowedContent
may not be necessary, since there's not really a use case for blacklisting tags. But there is a clear use case for blacklisting properties (attributes, styles, classes). What if CKEditor would just have to allow that?
Besides attributes
and requiredAttributes
, you could have forbiddenAttributes
. In the string notation, you could use the ^
character for marking forbidden attributes, similarly like !
for required attributes. (Why ^
? It could be another character of course, but ^
is used for negation in regular expressions.)
comment:4 Changed 12 years ago by
Inspired by comment:1:ticket:10471, I've built a temporary work-around (posted at https://drupal.org/node/1936392#comment-7507809) until this ticket is resolved:
/** * This is a huge hack to do ONE thing: to allow Drupal to fully mandate what * CKEditor should allow by setting CKEditor's allowedContent setting. The * problem is that allowedContent only allows for whitelisting, whereas * Drupal's default HTML filtering (the filter_html filter) also blacklists * the "style" and "on*" ("onClick" etc.) attributes. * * So this function hacks in explicit support for Drupal's filter_html's need * to blacklisting specifically those attributes, until ACF supports * blacklisting of properties: http://dev.ckeditor.com/ticket/10276. * * Limitations: * - This does not support blacklisting of other attributes, it's only * intended to implement filter_html's blacklisted attributes. * - This is only a temporary work-around; it assumes the filter_html * filter is being used whenever *any* restriction exists. This is a valid * assumption for the default text formats in Drupal 8 core, but obviously * won't work for release. * * This is the only way we could get https://drupal.org/node/1936392 committed * before Drupal 8 code freeze on July 1, 2013. CKEditor has committed to * explicitly supporting this in some way. * * @todo D8 remove this once http://dev.ckeditor.com/ticket/10276 is done. */ _ACF_HACK_to_support_blacklisted_attributes: function (element, format) { function override(rule) { var oldValue = rule.attributes; function filter_html_override_attributes (attribute) { // Disallow the "style" and "on*" attributes on any tag. if (attribute === 'style' || attribute.substr(0, 2) === 'on') { return false; } // Ensure the original logic still runs, if any. if (typeof oldValue === 'function') { return oldValue(attribute); } else if (typeof oldValue === 'boolean') { return oldValue; } // Otherwise, accept this attribute. return true; } rule.attributes = filter_html_override_attributes; } CKEDITOR.once('instanceLoaded', function(e) { if (e.editor.name === element.id) { // If everything is allowed, everything is allowed. if (format.editorSettings.allowedContent === true) { return; } // Otherwise, assume Drupal's filter_html filter is being used. else { // Get the filter object (ACF). var filter = e.editor.filter; // Find the "config" rule (the one caused by the allowedContent // setting) for each HTML tag, and override its "attributes" value. for (var el in filter._.rules.elements) { if (filter._.rules.elements.hasOwnProperty(el)) { for (var i = 0; i < filter._.rules.elements[el].length; i++) { if (filter._.rules.elements[el][i].featureName === 'config') { override(filter._.rules.elements[el][i]); } } } } } } }); }
comment:5 follow-up: 6 Changed 12 years ago by
The main use case for blacklisting is being able to trimm down automatically generated set of ACF rules. Instead of redefining entire universe of what's allowed one should be able to just remove some things from automatic configuration.
The second use case is created by the inability to define allowedContent rules that are allowing everything with some exceptions (e.g. Drupal's case: "all attributes except style
and on*
"). Even with regexp support in validators this is not easily doable.
So, I see disallowedContent
as a subtraction from allowedContent
or *[*]{*}(*)
universe. It should have higher priority (use case 1) and work with both - automatic and manual configuration (use case 2).
Question is - could blacklisting be simplified comparing to whitelisting?
- Instead of defining entire
disallowedContent
(with the same syntax asallowedContent
) maybe it could be possible to define just the 3 functions for disallowing arguments/styles/classes validation. After a bit of thinking I'm rather sceptical regarding this. By having static syntax we are able to anticipate what will happen in edge cases. With functions ACF ruleset would become a blackbox for us. I can't think of any situation right now, but I'm pretty sure there would be some problems. - The second idea was that we don't need the full format, but just blacklisting properties. Unfortunately this would be a too strong simplification. We may want to remove
border
style from images, but not from tables. Also we may want to disallow captions for tables, but not entire table plugin.
On the other hand it won't be a problem for us to implement full format (without required properties, which do not make sense in this case). Parser is ready, rules normalization is ready - just a matter of adding a prepass with disallowed content rules. (of course I miss some serious problems which will come out during implementation :D)
comment:6 Changed 12 years ago by
Replying to Reinmar:
So, I see
disallowedContent
as a subtraction fromallowedContent
or*[*]{*}(*)
universe. It should have higher priority (use case 1) and work with both - automatic and manual configuration (use case 2).
Interesting, that makes a lot of sense! I'm wondering if wwalc also looked at it that way?
However, I disagree that it's entirely analogous. If you go this route, then the string/object format will no longer be logical IMHO.
Here's the key thing: if an element is disallowed, then it is by definition impossible to set attributes, styles or classes on it.
That's why in my opinion it doesn't make any sense to list disallowed attributes etc. within disallowedContent
: it should only contain disallowed HTML elements!
Given a list of allowed elements (i.e. allowedContent
), only then it makes sense to define restrictions on which attributes, styles and classes may be set (attributes
), must be set (requiredAttributes
) and … may not be set (forbiddenAttributes
, as I proposed in comment:3).
With functions ACF ruleset would become a blackbox for us. I can't think of any situation right now, but I'm pretty sure there would be some problems.
You're right: idempotency, and because of that: caching. The current ACF configuration is static, and hence results are idempotent. Functions are dynamic, hence results are not necessarily idempotent, which prevents caching. I don't know how necessary/impactful caching is here though.
- The second idea was that we don't need the full format, but just blacklisting properties. Unfortunately this would be a too strong simplification. We may want to remove
border
style from images, but not from tables. Also we may want to disallow captions for tables, but not entire table plugin.
My proposal is that you blacklist properties for a given element.
In the modified string format I proposed at the bottom of comment:3, the per-element property blacklisting you describe could be expressed like this:
img { * , ^border }
Drupal's specific blacklisting ("disallow style
and on*
on every tag) could be expressed like this:
* [ ^style , ^on* ]
comment:7 Changed 11 years ago by
Milestone: | → CKEditor 4.4 |
---|
comment:8 Changed 11 years ago by
Owner: | set to Piotrek Koszuliński |
---|---|
Status: | confirmed → assigned |
comment:9 Changed 11 years ago by
I understood one possible limitation of blacklisting. What do you think about it?
Disallowing content will not work at all if ACF is disabled (by setting config.allowedContent = true
). So it won't be possible to subtract some rules from the *(*){*}[*]
universe. The reason is that when the filter is disabled, for performance reasons it's not applied to input data at all. Enabling it when ( allowedContent===true && disallowedContent )
would require not only enabling entire filter but also making sure that it passes all content when not having any allowed content rules defined. That's a lot of coding and I think that case is not worth it.
comment:10 Changed 11 years ago by
Two cases:
- What should
disallowedContent == 'span'
do when'span(!marker1)'
was allowed? - What should
disallowedContent == 'span(marker1)'
do when'span(!marker1)'
was allowed? - What should
disallowedContent == 'span(marker1)'
do when'span(marker1)'
was allowed?
In other words - can we make it possible and how to disallow tag names?
The second case is easy - if we remove marker1
class from span
, that span
will be removed because that class was required. This is how features usually work in CKEditor - they require some class or style, so disallowing for example 'span{font-size}' will disable font size feature.
But then what in other cases?
I would say that in the first one either nothing should happen or all rules allowing spans should be removed. The first option is nice because it's totally consistent - developer can disallow elements' properties, not elements themselves. The second option has more power and may be more intuitive, but is tricky. For example the first option answers a question what should happen in third case, but second option not.
Of course we can have a property in object format (and maybe some shortcut in string format) of disallowed content rules, but we have to pick the the default behaviour anyway. Also, implementing two behaviours now may not make sense - I would try to predict which option will be more useful and implement only that one.
My vote: 1st option - elements in disallowed content rules are used only to identify from which elements specified properties will be removed. This is what most people want - disallow width/height
styles for img
elements, disable all on*
attributes on all elements, etc. The option to disallow entire elements may be implemented in the future.
comment:11 Changed 11 years ago by
elements in disallowed content rules are used only to identify from which elements specified properties will be removed.
But if allowed content allows both tags and attributes, shouldn't disallowedContend have same capabilities?
Perhaps simple decision what has higher priority is enough? In real life we are accustomed to "what isn't forbidden is allowed" thus perhaps disallowedContent should have final word and if something was allowed by allowedContent then disallowedContent should remove it (in such case order of calling these properties would matter)?
comment:12 Changed 11 years ago by
But if allowed content allows both tags and attributes, shouldn't disallowedContend have same capabilities?
That's not a point. The point is that rule disallowedContent: 'span{foo}'
may be understood in two ways:
- Disallow
foo
style on allspan
s. - Disallow
foo
style on allspan
s and disallowspan
elements too (when ...?).
So, of course disallowedContent has higher priority than allowedContent. Question is how to make it possible to disallow elements.
I also thought about another possibility.
- If there are any elements' properties (styles, classes, attributes) defined in disallowed content rule, then the rule means "don't allow these properties on these elements". It does not disallow the element itself yet.
- A rule with only element names and without any properties means "don't allow these elements - remove them completely".
I'm not sure if this is intuitive though.
comment:13 Changed 11 years ago by
- Disallow
foo
style on allspan
s and disallowspan
elements too
As a user I would expect it to work that way otherwise it shouldn't be called disallowedContent
but rather disallowedProperties
.
Would it be possible, when editor is starting, to create set of rules and then decide which plugins (and extra rules) should be enabled and which ones not? I know that each plugin registers itself to ACF so most likely not.
Btw. disallowedContent
should probably have higher priority than extraAllowedContent
.
comment:14 Changed 11 years ago by
Disallow foo style on all spans and disallow span elements too
As a user I would expect it to work that way otherwise it shouldn't be called disallowedContent but rather disallowedProperties.
The thing is that we may want/need to implement both options in the future, so changing the config option name is not a good idea. I would rather want to keep this together, because I can reuse one parsing and optimization algorithm.
Btw. disallowedContent should probably have higher priority than extraAllowedContent.
Yes. It has the highest priority.
Would it be possible, when editor is starting, to create set of rules and then decide which plugins (and extra rules) should be enabled and which ones not? I know that each plugin registers itself to ACF so most likely not.
Interesting, but unfortunately no. For example you've got an image plugin which enables: 'img[!src,alt]{width,height}'. A popular demand is to be able to disallow these width and height styles, but keep the image. So in order to do that they would have to be a separate feature. We would end with atomic features like 'imageStyleWidth' - too complex. And there's another TC - you want to enable all attributes except on*
. The "all attributes" would have to be split into infinite number of features.
comment:15 Changed 11 years ago by
There are two things that come to my mind for one solution - ACF has to be told somehow that either all or only properties should be removed.
- You could have two properties
disallowedContent
anddisallowedProperties
using same logic but the second property would be used to keep tags but prevent properties(styles, attributes, classes) only.
- Perhaps some special character could inform ACF not to remove the tag e.g.
disallowedContent : '@img{width,height}' // remove width styles only disallowedContent : 'img{width,height}' // remove width styles plus tag (in short all img)
Just noticed that second rule doesn't make much sense as simple 'img' would be enough :)
comment:16 Changed 11 years ago by
Status: | assigned → review |
---|
Pushed branch:t/10276 on dev and tests.
There's no documentation yet - extracted to #11709.
What works:
- disallowing entire elements - e.g. 'img' will disable entire image feature,
- disallowing elements' properties - e.g. 'img{width,height}',
- DACR with wildcards: - e.g. '*[on*]',
- DACR in object format - e.g. match option.
- transformations - e.g. if
extraAllowedContent: img[width,height]
anddisallowedContent: 'img{width,height}'
, then attributes will be used. Although, you need to remember this is only an integration on data processing level - image plugin does not understand that.
So basically everything I could think of.
Interestingly, I noticed no visible performance degradation. On the same set of tests, on Chrome the results were:
- 3.481s,3.482s,3.388s on branch:t/10276
- 3.656s,3.391s,3.263s on branch:major
However, you may notice that implementation of blacklisting itself may not be optimum. The reason is that most people won't use blacklisting or will use just really few rules. Therefore, I focused on keeping code short and clear and optimized for ACRs.
comment:18 Changed 11 years ago by
Status: | review → review_failed |
---|
Though I'm not able to get through changes in dev code, the general impression is "thumbs up". There are few remarks:
- Why does
filter.disallow
need additional argument?'test disallow() aborts if editor has custom config': function() { var filter = new CKEDITOR.filter( this.editors.editorCustomAllowedContent ); assert.isFalse( filter.disallow( 'b' ) ); assert.areSame( 0, filter.disallowedContent.length ); }, 'test disallow() adds rule if editor has custom config and overrideCustom passed': function() { var filter = new CKEDITOR.filter( this.editors.editorCustomAllowedContent ); assert.isTrue( filter.disallow( 'b', true ) ); assert.areSame( 1, filter.disallowedContent.length ); },
The second "safety" argument is not necessary fordisallowedContent
because, unlikeallowedContent
, DC is always meant to overwrite (subtract) rules, right? - Missing docs for
acfTestTools
. We got lots of mysterious tools which are not documented. - It got to be explained why
disallowedContent: '*[*]'
works butdisallowedContent: '*'
does not. In fact it was the first thing I did out of curiosity and I found it quite confusing. Even if this a feature. - Some example in
samples/datafiltering.html
?
comment:19 Changed 11 years ago by
Status: | review_failed → review |
---|
Force-pushed branch:t/10276 on dev and tests. Fixed 1. & 2.
Points 3. and 4. are part of #11709. Though I don't know if we can add now another example to the sample, because it's already too long. I think that we'll focus now on extending docs and proper cross linking.
comment:20 Changed 11 years ago by
Status: | review → review_passed |
---|
comment:21 Changed 11 years ago by
Resolution: | → fixed |
---|---|
Status: | review_passed → closed |
Hurray, hurray! Merged to major with git:c6e729f on dev and a72150e on tests.
Few things:
allowedContent
anddisallowedContent
are defined in plugin definitions?1a) Will CKEditor accept anything that is accepted by
allowedContent(s)
and not disallowed bydisallowedContent(s)
?This way a naive plugin would not be able to specify
disallowedContent
to quickly specify a set of tags that can be inserted by it (e.g. an "insert embed media" plugin that wants to inject video, audio, embed, object, param, iframe and such things), because allowedContent(s) will not allow it anyway.1b) Will
allowedContent
be ignored totally if at least onedisallowedContent
rule is set? I cannot find any reasonable rule for disallowedContent that could explain such behavior. Besides it means that black listing is preferred over white listing...allowedContent
anddisallowedContent
specified by plugins when the following options are set in config.js:My opinion is that introducing
config.disallowedContent
will just increase the complexity of ACF. If we have so far just one real use case for it (blacklistingstyle
andon*
attributes), instead of supporting it "officially" in API let's just find a workaround for this particular issue.