Opened 6 years ago

Closed 5 years ago

#10276 closed New Feature (fixed)

Introduce config.disallowedContent

Reported by: Piotrek Koszuliński Owned by: Piotrek Koszuliński
Priority: Normal Milestone: CKEditor 4.4.0
Component: General Version: 4.1 RC
Keywords: Drupal Cc: wim.leers@…

Description (last modified by Wim Leers)

We have config.allowedContent being a whitelist.

But real life cases often require also being able to define blacklist. Therefore we should introduce config.disallowedContent.

Change History (21)

comment:1 Changed 6 years ago by Piotrek Koszuliński

Status: newconfirmed

comment:2 Changed 6 years ago by Wiktor Walc

Few things:

  1. What will happen if both allowedContent and disallowedContent are defined in plugin definitions?

Example 1: allowedContent: a img strong disallowedContent: script iframe img

Example 2: allowedContent: img[*](*) disallowedContent: img(width, height)

1a) Will CKEditor accept anything that is accepted by allowedContent(s) and not disallowed by disallowedContent(s)?

This way a naive plugin would not be able to specify disallowedContent to quickly specify a set of tags that can be inserted by it (e.g. an "insert embed media" plugin that wants to inject video, audio, embed, object, param, iframe and such things), because allowedContent(s) will not allow it anyway.

1b) Will allowedContent be ignored totally if at least one disallowedContent rule is set? I cannot find any reasonable rule for disallowedContent that could explain such behavior. Besides it means that black listing is preferred over white listing...

  1. what will happen with allowedContent and disallowedContent specified by plugins when the following options are set in config.js:
  • config.allowedContent=true
  • config.disallowedContent=true
  • config.allowedContent rules are set
  • config.disallowedContent rules are set
  • config.allowedContent=true AND config.disallowedContent rules are set
  • config.disallowedContent=true AND config.allowedContent rules are set

My opinion is that introducing config.disallowedContent will just increase the complexity of ACF. If we have so far just one real use case for it (blacklisting style and on* attributes), instead of supporting it "officially" in API let's just find a workaround for this particular issue.

comment:3 Changed 5 years ago by Wim Leers

Cc: wim.leers@… added
Description: modified (diff)
Keywords: Drupal added

This, or something like it, is (AFAICT) necessary for https://drupal.org/node/1936392. I discussed this in that context with @Reinmar on March 27, I think slightly before this ticket was created.


Thinking purely conceptually and logically, I think there might be a case to be made for "whitelisting only" from a WYSIWYG editor perspective.

Having only a whitelist, you must enumerate all allowed tags, attributes, styles and classes. That might seem like more work, but this also ensures that you fully define the universe of all possible uses and values. For example, rather han being able to exclude (blacklist) e.g. just the target attribute on <a> tags, you'll have to allow (whitelist) all of the attributes you want to allow: href, hreflang and rel. More work, but also more certainty, more control. It's probably worth it.

Since a WYSIWYG editor's job is to generate markup, and it's easy to abuse it (or plugins for it) to generate unwanted markup, it makes a lot of sense to demand that the whole universe of allowed values is well-defined.

So, whitelisting only, allowedContent only, makes sense for CKEditor.


Replying to wwalc:

I think disallowedContent may not be necessary, since there's not really a use case for blacklisting tags. But there is a clear use case for blacklisting properties (attributes, styles, classes). What if CKEditor would just have to allow that?

Besides attributes and requiredAttributes, you could have forbiddenAttributes. In the string notation, you could use the ^ character for marking forbidden attributes, similarly like ! for required attributes. (Why ^? It could be another character of course, but ^ is used for negation in regular expressions.)

comment:4 Changed 5 years ago by Wim Leers

Inspired by comment:1:ticket:10471, I've built a temporary work-around (posted at https://drupal.org/node/1936392#comment-7507809) until this ticket is resolved:

  /**
   * This is a huge hack to do ONE thing: to allow Drupal to fully mandate what
   * CKEditor should allow by setting CKEditor's allowedContent setting. The
   * problem is that allowedContent only allows for whitelisting, whereas
   * Drupal's default HTML filtering (the filter_html filter) also blacklists
   * the "style" and "on*" ("onClick" etc.) attributes.
   *
   * So this function hacks in explicit support for Drupal's filter_html's need
   * to blacklisting specifically those attributes, until ACF supports
   * blacklisting of properties: http://dev.ckeditor.com/ticket/10276.
   *
   * Limitations:
   *   - This does not support blacklisting of other attributes, it's only
   *     intended to implement filter_html's blacklisted attributes.
   *   - This is only a temporary work-around; it assumes the filter_html
   *     filter is being used whenever *any* restriction exists. This is a valid
   *     assumption for the default text formats in Drupal 8 core, but obviously
   *     won't work for release.
   *
   * This is the only way we could get https://drupal.org/node/1936392 committed
   * before Drupal 8 code freeze on July 1, 2013. CKEditor has committed to
   * explicitly supporting this in some way.
   *
   * @todo D8 remove this once http://dev.ckeditor.com/ticket/10276 is done.
   */
  _ACF_HACK_to_support_blacklisted_attributes: function (element, format) {
    function override(rule) {
      var oldValue = rule.attributes;
      function filter_html_override_attributes (attribute) {
        // Disallow the "style" and "on*" attributes on any tag.
        if (attribute === 'style' || attribute.substr(0, 2) === 'on') {
          return false;
        }

        // Ensure the original logic still runs, if any.
        if (typeof oldValue === 'function') {
          return oldValue(attribute);
        }
        else if (typeof oldValue === 'boolean') {
          return oldValue;
        }

        // Otherwise, accept this attribute.
        return true;
      }
      rule.attributes = filter_html_override_attributes;
    }

    CKEDITOR.once('instanceLoaded', function(e) {
      if (e.editor.name === element.id) {
        // If everything is allowed, everything is allowed.
        if (format.editorSettings.allowedContent === true) {
          return;
        }
        // Otherwise, assume Drupal's filter_html filter is being used.
        else {
          // Get the filter object (ACF).
          var filter = e.editor.filter;
          // Find the "config" rule (the one caused by the allowedContent
          // setting) for each HTML tag, and override its "attributes" value.
          for (var el in filter._.rules.elements) {
            if (filter._.rules.elements.hasOwnProperty(el)) {
              for (var i = 0; i < filter._.rules.elements[el].length; i++) {
                if (filter._.rules.elements[el][i].featureName === 'config') {
                  override(filter._.rules.elements[el][i]);
                }
              }
            }
          }
        }
      }
    });
  }

comment:5 Changed 5 years ago by Piotrek Koszuliński

The main use case for blacklisting is being able to trimm down automatically generated set of ACF rules. Instead of redefining entire universe of what's allowed one should be able to just remove some things from automatic configuration.

The second use case is created by the inability to define allowedContent rules that are allowing everything with some exceptions (e.g. Drupal's case: "all attributes except style and on*"). Even with regexp support in validators this is not easily doable.

So, I see disallowedContent as a subtraction from allowedContent or *[*]{*}(*) universe. It should have higher priority (use case 1) and work with both - automatic and manual configuration (use case 2).

Question is - could blacklisting be simplified comparing to whitelisting?

  • Instead of defining entire disallowedContent (with the same syntax as allowedContent) maybe it could be possible to define just the 3 functions for disallowing arguments/styles/classes validation. After a bit of thinking I'm rather sceptical regarding this. By having static syntax we are able to anticipate what will happen in edge cases. With functions ACF ruleset would become a blackbox for us. I can't think of any situation right now, but I'm pretty sure there would be some problems.
  • The second idea was that we don't need the full format, but just blacklisting properties. Unfortunately this would be a too strong simplification. We may want to remove border style from images, but not from tables. Also we may want to disallow captions for tables, but not entire table plugin.

On the other hand it won't be a problem for us to implement full format (without required properties, which do not make sense in this case). Parser is ready, rules normalization is ready - just a matter of adding a prepass with disallowed content rules. (of course I miss some serious problems which will come out during implementation :D)

comment:6 in reply to:  5 Changed 5 years ago by Wim Leers

Replying to Reinmar:

So, I see disallowedContent as a subtraction from allowedContent or *[*]{*}(*) universe. It should have higher priority (use case 1) and work with both - automatic and manual configuration (use case 2).

Interesting, that makes a lot of sense! I'm wondering if wwalc also looked at it that way?

However, I disagree that it's entirely analogous. If you go this route, then the string/object format will no longer be logical IMHO.

Here's the key thing: if an element is disallowed, then it is by definition impossible to set attributes, styles or classes on it. That's why in my opinion it doesn't make any sense to list disallowed attributes etc. within disallowedContent: it should only contain disallowed HTML elements! Given a list of allowed elements (i.e. allowedContent), only then it makes sense to define restrictions on which attributes, styles and classes may be set (attributes), must be set (requiredAttributes) and … may not be set (forbiddenAttributes, as I proposed in comment:3).


With functions ACF ruleset would become a blackbox for us. I can't think of any situation right now, but I'm pretty sure there would be some problems.

You're right: idempotency, and because of that: caching. The current ACF configuration is static, and hence results are idempotent. Functions are dynamic, hence results are not necessarily idempotent, which prevents caching. I don't know how necessary/impactful caching is here though.


  • The second idea was that we don't need the full format, but just blacklisting properties. Unfortunately this would be a too strong simplification. We may want to remove border style from images, but not from tables. Also we may want to disallow captions for tables, but not entire table plugin.

My proposal is that you blacklist properties for a given element.

In the modified string format I proposed at the bottom of comment:3, the per-element property blacklisting you describe could be expressed like this:

img { * , ^border }

Drupal's specific blacklisting ("disallow style and on* on every tag) could be expressed like this:

* [ ^style , ^on* ]
Last edited 5 years ago by Wim Leers (previous) (diff)

comment:7 Changed 5 years ago by Piotrek Koszuliński

Milestone: CKEditor 4.4

comment:8 Changed 5 years ago by Piotrek Koszuliński

Owner: set to Piotrek Koszuliński
Status: confirmedassigned

comment:9 Changed 5 years ago by Piotrek Koszuliński

I understood one possible limitation of blacklisting. What do you think about it?

Disallowing content will not work at all if ACF is disabled (by setting config.allowedContent = true). So it won't be possible to subtract some rules from the *(*){*}[*] universe. The reason is that when the filter is disabled, for performance reasons it's not applied to input data at all. Enabling it when ( allowedContent===true && disallowedContent ) would require not only enabling entire filter but also making sure that it passes all content when not having any allowed content rules defined. That's a lot of coding and I think that case is not worth it.

comment:10 Changed 5 years ago by Piotrek Koszuliński

Two cases:

  1. What should disallowedContent == 'span' do when 'span(!marker1)' was allowed?
  2. What should disallowedContent == 'span(marker1)' do when 'span(!marker1)' was allowed?
  3. What should disallowedContent == 'span(marker1)' do when 'span(marker1)' was allowed?

In other words - can we make it possible and how to disallow tag names?

The second case is easy - if we remove marker1 class from span, that span will be removed because that class was required. This is how features usually work in CKEditor - they require some class or style, so disallowing for example 'span{font-size}' will disable font size feature.

But then what in other cases?

I would say that in the first one either nothing should happen or all rules allowing spans should be removed. The first option is nice because it's totally consistent - developer can disallow elements' properties, not elements themselves. The second option has more power and may be more intuitive, but is tricky. For example the first option answers a question what should happen in third case, but second option not.

Of course we can have a property in object format (and maybe some shortcut in string format) of disallowed content rules, but we have to pick the the default behaviour anyway. Also, implementing two behaviours now may not make sense - I would try to predict which option will be more useful and implement only that one.

My vote: 1st option - elements in disallowed content rules are used only to identify from which elements specified properties will be removed. This is what most people want - disallow width/height styles for img elements, disable all on* attributes on all elements, etc. The option to disallow entire elements may be implemented in the future.

comment:11 Changed 5 years ago by Jakub Ś

elements in disallowed content rules are used only to identify from which elements specified properties will be removed.

But if allowed content allows both tags and attributes, shouldn't disallowedContend have same capabilities?

Perhaps simple decision what has higher priority is enough? In real life we are accustomed to "what isn't forbidden is allowed" thus perhaps disallowedContent should have final word and if something was allowed by allowedContent then disallowedContent should remove it (in such case order of calling these properties would matter)?

comment:12 Changed 5 years ago by Piotrek Koszuliński

But if allowed content allows both tags and attributes, shouldn't disallowedContend have same capabilities?

That's not a point. The point is that rule disallowedContent: 'span{foo}' may be understood in two ways:

  • Disallow foo style on all spans.
  • Disallow foo style on all spans and disallow span elements too (when ...?).

So, of course disallowedContent has higher priority than allowedContent. Question is how to make it possible to disallow elements.

I also thought about another possibility.

  • If there are any elements' properties (styles, classes, attributes) defined in disallowed content rule, then the rule means "don't allow these properties on these elements". It does not disallow the element itself yet.
  • A rule with only element names and without any properties means "don't allow these elements - remove them completely".

I'm not sure if this is intuitive though.

comment:13 Changed 5 years ago by Jakub Ś

  • Disallow foo style on all spans and disallow span elements too

As a user I would expect it to work that way otherwise it shouldn't be called disallowedContent but rather disallowedProperties.

Would it be possible, when editor is starting, to create set of rules and then decide which plugins (and extra rules) should be enabled and which ones not? I know that each plugin registers itself to ACF so most likely not.

Btw. disallowedContent should probably have higher priority than extraAllowedContent.

comment:14 Changed 5 years ago by Piotrek Koszuliński

Disallow foo style on all spans and disallow span elements too

As a user I would expect it to work that way otherwise it shouldn't be called disallowedContent but rather disallowedProperties.

The thing is that we may want/need to implement both options in the future, so changing the config option name is not a good idea. I would rather want to keep this together, because I can reuse one parsing and optimization algorithm.

Btw. disallowedContent should probably have higher priority than extraAllowedContent.

Yes. It has the highest priority.

Would it be possible, when editor is starting, to create set of rules and then decide which plugins (and extra rules) should be enabled and which ones not? I know that each plugin registers itself to ACF so most likely not.

Interesting, but unfortunately no. For example you've got an image plugin which enables: 'img[!src,alt]{width,height}'. A popular demand is to be able to disallow these width and height styles, but keep the image. So in order to do that they would have to be a separate feature. We would end with atomic features like 'imageStyleWidth' - too complex. And there's another TC - you want to enable all attributes except on*. The "all attributes" would have to be split into infinite number of features.

comment:15 Changed 5 years ago by Jakub Ś

There are two things that come to my mind for one solution - ACF has to be told somehow that either all or only properties should be removed.

  1. You could have two properties disallowedContent and disallowedProperties using same logic but the second property would be used to keep tags but prevent properties(styles, attributes, classes) only.
  1. Perhaps some special character could inform ACF not to remove the tag e.g.
    disallowedContent : '@img{width,height}' // remove width styles only
    disallowedContent : 'img{width,height}' // remove width styles plus tag (in short all img)
    

Just noticed that second rule doesn't make much sense as simple 'img' would be enough :)

comment:16 Changed 5 years ago by Piotrek Koszuliński

Status: assignedreview

Pushed branch:t/10276 on dev and tests.

There's no documentation yet - extracted to #11709.

What works:

  • disallowing entire elements - e.g. 'img' will disable entire image feature,
  • disallowing elements' properties - e.g. 'img{width,height}',
  • DACR with wildcards: - e.g. '*[on*]',
  • DACR in object format - e.g. match option.
  • transformations - e.g. if extraAllowedContent: img[width,height] and disallowedContent: 'img{width,height}', then attributes will be used. Although, you need to remember this is only an integration on data processing level - image plugin does not understand that.

So basically everything I could think of.

Interestingly, I noticed no visible performance degradation. On the same set of tests, on Chrome the results were:

However, you may notice that implementation of blacklisting itself may not be optimum. The reason is that most people won't use blacklisting or will use just really few rules. Therefore, I focused on keeping code short and clear and optimized for ACRs.

comment:17 Changed 5 years ago by Olek Nowodziński

Rebased branches on major.

comment:18 Changed 5 years ago by Olek Nowodziński

Status: reviewreview_failed

Though I'm not able to get through changes in dev code, the general impression is "thumbs up". There are few remarks:

  1. Why does filter.disallow need additional argument?
    'test disallow() aborts if editor has custom config': function() {
    	var filter = new CKEDITOR.filter( this.editors.editorCustomAllowedContent );
    
    	assert.isFalse( filter.disallow( 'b' ) );
    	assert.areSame( 0, filter.disallowedContent.length );
    },
    
    'test disallow() adds rule if editor has custom config and overrideCustom passed': function() {
    	var filter = new CKEDITOR.filter( this.editors.editorCustomAllowedContent );
    
    	assert.isTrue( filter.disallow( 'b', true ) );
    	assert.areSame( 1, filter.disallowedContent.length );
    },
    
    The second "safety" argument is not necessary for disallowedContent because, unlike allowedContent, DC is always meant to overwrite (subtract) rules, right?
  2. Missing docs for acfTestTools. We got lots of mysterious tools which are not documented.
  3. It got to be explained why disallowedContent: '*[*]' works but disallowedContent: '*' does not. In fact it was the first thing I did out of curiosity and I found it quite confusing. Even if this a feature.
  4. Some example in samples/datafiltering.html?

comment:19 Changed 5 years ago by Piotrek Koszuliński

Status: review_failedreview

Force-pushed branch:t/10276 on dev and tests. Fixed 1. & 2.

Points 3. and 4. are part of #11709. Though I don't know if we can add now another example to the sample, because it's already too long. I think that we'll focus now on extending docs and proper cross linking.

comment:20 Changed 5 years ago by Olek Nowodziński

Status: reviewreview_passed

comment:21 Changed 5 years ago by Piotrek Koszuliński

Resolution: fixed
Status: review_passedclosed

Hurray, hurray! Merged to major with git:c6e729f on dev and a72150e on tests.

Note: See TracTickets for help on using tickets.
© 2003 – 2017 CKSource – Frederico Knabben. All rights reserved. | Terms of use | Privacy policy