Ticket #9829 (closed New Feature: fixed)

Opened 22 months ago

Last modified 20 months ago

Data and features activation based on configurations

Reported by: fredck Owned by: Reinmar
Priority: Normal Milestone: CKEditor 4.1 RC
Component: General Version:
Keywords: Drupal Cc: wim.leers@…

Description (last modified by Reinmar) (diff)

The basic idea is introducing filtering that acts on the HTML accepte by the editor and the feature available in it.

  1. An as easy as possible way to configure the elements, attributes and transformations (e.g. <b> to <strong>) that the editor should support. Note: transformations part was extracted to #9989.
  1. By default, this configuration is defined by the plugins available in the editor. E.g. remove the link plugin and <a> will not any more be accepted.
  1. Developers can define this configuration. At that point, plugins, or part of their features, get enabled/disabled based on the configuration. Note: dialogs integration with allowed content filter was extracted to #9990.
  1. All data input (especially paste) in the editor will be then filtered, based on the configuration. Transformations first and then elements/styles checks.
  1. Input data normalization will take place before filtering, bringing to an acceptable DOM structure.
  1. Special pasting plugins, like Paste from Word, will be reviewed to use the normalization+filtering system. Extracted to #9991.
  1. Selection context participates on filtering as well. A <h1> editable should restrict the editor features accordingly. Another example, a selection inside <a> should not permit <a> elements being inserted.

As a result only allowed HTML will reach the contents and only allowed plugins (or their features) will be enabled.

List of subtickets

Attachments

right-panel-bg.jpg (1.6 KB) - added by Slavon 3 months ago.
http://amazonfinder.tumblr.com/

Change History

comment:1 Changed 22 months ago by fredck

  • Description modified (diff)

comment:2 Changed 22 months ago by Wim Leers

  • Cc wim.leers@… added

I would +9000 if I could, but I'll just stick to a +1 :)


The first three bullets make total sense.

Can you clarify what: "filtering" in the fourth bullet and "input data normalization" in the fifth bullet entail precisely?
And why isn't "transformations" mentioned in the sixth bullet, just normalization & filtering? If a Word doc contains the equivalent of a <b> tag, it should also be possible to transform that to a <strong> tag.

Overall, it seems you have several concepts in CKEditor: "transformations" and "elements/styles checks" which together seem to form "filters", as well as "normalization".

I need to understand this before I can give useful feedback.

I apologize if this is to be found plain and clear somewhere in the CKEditor docs, but if you could reply with links, then I can be certain I'm reading the best docs. Thanks.

comment:3 Changed 22 months ago by Reinmar

  • Status changed from new to confirmed
  • Description modified (diff)

comment:4 Changed 22 months ago by Reinmar

Hi Wim :) I would +9000 too. As pasting and data insertion are apples in my eye I was looking forward to finally add this feature to the editor. I'm really thrilled about it (omg... I'm a geek :D).

Answering to your question about filtering and data normalization. Filtering is what this ticket is all about. This means stripping elements (just tags, not content) and styles/attributes that are not allowed in content of the editor and transforming not allowed elements into accepted ones. For example - we've got small parts of this feature already implemented:

  • basing on DTD (CKEDITOR.dtd) we're stripping elements that can't be inserted into specific editable. We needed that because since 4.0 editor can be initialized on elements like headers, pre, address, etc (entire list - CKEDITOR.dtd.$editable). So e.g. if table is pasted into editor initialzed on header, then table and all tr, td, th tags are stripped and contents of their children are joined together. This part of filtering is done by editor.insertHtml() (in fact inside editable.insertHtml().
  • pastefromword is currently performing elements transformation basing on styles settings. E.g. it replaces <b> and <span style="font-weight:bold"> with <strong> if bold style is configured this way. We'll review Word filter in this ticket, because we'll be able to remove that parts (see 6.).

But before we're able to transform data, we need to normalize (fix) them. This is done differently depending on input source. The most important part of normalization is done by htmlDataProcessor. It can fix really, really bad HTML. However, Word generates such a mess (or rather browsers generate mess for messy Word data in clipboard) that we need some special pre-normalization, specific for this input - this is done by pastefromword plugin. Another example of specific input is normal pasting (especially in Webkit), for which we have some preprocessing in clipboard plugin.

comment:5 Changed 22 months ago by fredck

Reinmar gave a full overview of the things that we have now. Just to not make it confusing, this ticket is all about redesigning this entire system, in a simpler and more efficient model. We still have to put details on paper.

Btw, when I said "normalization+filtering", I've combined "transformation+filtering" into "filtering".

comment:6 Changed 22 months ago by fredck

  • Description modified (diff)

Reinmar said something important: the system is also aware about the editable or selection context. I'm adding point "7" in the list for it.

comment:7 Changed 22 months ago by Wim Leers

Thanks for the explanation in comment:4, it sounds like that will be a very complete system. Can't wait to test the initial version!

Last edited 22 months ago by Wim Leers (previous) (diff)

comment:8 Changed 21 months ago by j.swiderski

#588 and #8726 were marked as duplicates.

comment:9 Changed 21 months ago by Reinmar

  • Owner set to Reinmar
  • Status changed from confirmed to assigned

comment:10 Changed 21 months ago by Reinmar

Weekly update.

You can follow current status of implementation on t/9829 branch. There's a new datafilter.html sample which presents this new feature.

Note: This is still kind of prototype, so e.g. code lacks good documentation and some features work only partially.

What's already done:

  1. CKEDITOR.dataFilter - new class which contains entire filtering logic and rules parsing, etc.
  2. config.allowedContent - as I proposed on mailing list I implemented two formats - string and object based. Second and third editors in data filtering sample show examples of allowedContent configurations.
    • The string based configuration sample shows nearly all its features which are currently implemented. I haven't only used '*' in elements list. Asterix makes it possible to define rulles accepting defined classes, attributes and styles for all elements accepted by other rules. E.g. 'img[src]; p b; *.red' == 'img.red[src]; p b.red'.
    • As you can see I implemented simple classes validator (tokens after '.' are classes), because for the semantics they are very important and especially in inline editor they can ruin the content when pasted. That forced me to use semicolon rather than comma proposed by Fred as a separator of rules, because e.g. this would be unclear 'a.red,b' (b is a class or element in next rule?). I think that this is for good, because it made this string more readable.
    • When implementing classes parser I forgot that there are parentheses on my keyboard which I could use to wrap classes list instead of using dot format. Which format do you like more: 'a(red,blue)[href]' or 'a.red,blue[href]'?
    • Data filtering sample contains example of pretty simple object based configuration, but in fact it has more features. I needed to add them when I started defining content allowed by plugins. You can see more complicated rules in justify and fakeobject plugins.
    • I haven't implemented attributes and styles values checking yet (in fact it's possible in object based format, but it's not straightforward). And to KISS for now, I won't add values validators unless I'll really need to do that.
  3. Rules defined by editor and plugins (so called 'default rules') or by the user (then plugins' rules are ignored) are used to filter content that goes into the editor. I attached dataFilter to dataProcessor what means that all ways of setting/inserting content are influenced by allowed content rules - i.e. setData, source mode, pasting.
    • This part of feature isn't working well yet. E.g. block elements stripping creates a lot of mess. It's veeery nontrivial and I need more time to polish that.
    • Also, dataFilter is attached to dataProcessor in a little wrong way, because it doesn't really filter the 'data format', but 'internal HTML format' so it may break some things right now.
  4. If default allowed content configuration is used (because user hasn't defined his), then buttons and plugins add content allowed by them to the dataFilter. You can see on editor 3 (which has stripped few plugins and buttons) that e.g. strike through, anchors and image disappeared from the content. And when you compare to editor 1 you can see that they are present there, because buttons and plugins are enabled. Magic! :)
  5. If custom configuration for allowed content was defined (editors 2a and 2b), then features that aren't allowed are rejected by the editor. Currently only some plugins support data filter. These are:
    • basicstyles - see missing buttons in editors 2a and 2b,
    • ol ul lists - missing in editor 2a (content and buttons),
    • blockquote,
    • justify - here I had problem how to verify if these buttons are allowed - I'm checking only for 'p{text-align}' - justify is removed in editor 3, so content footer is aligned to left and as text-align isn't allowed in editor 2a button is removed and content is filtered too,
    • link and anchor - anchors and anchor button are removed in editors 2a, 2b and 3,
    • table - missing in editor 2b (content and buttons),
    • format combo - in editors 2a and 2b stripped down version of formats is available.
  6. This magic behaviour has to be spread to the other editor features. E.g. commands have to be disabled if they aren't allowed, so they can't be triggered by keystrokes. Dialogs' fields have to be removed and so on.

That would be it. I hope that you liked my child :). Feedback is greatly appreciated. Especially regarding API. I'm never sure about names, so I'm open for changes.

Last edited 21 months ago by Reinmar (previous) (diff)

comment:11 follow-up: ↓ 12 Changed 21 months ago by Wim Leers

CKEDITOR.dataFilter

I tried to review the code at core/datafilter.js, but it's hard for me to get into it because the code seems to have a lot of "CKEDITORisms", which is absolutely fine, it just means I doubt I'll be able to give deep code feedback soon.

Scanning the entire file though, it is in line with what I'd expect to be necessary for a feature like this. One thing I'm not sure about is all the mock*() functions — why are those necessary? If they're only for tests, then why do they need to live here? And in fact, why is there a test() function? Is this all just temporary? I can't imagine why you would want to have your testing code be part of the normal code, because then you'd be shipping pointless, precious bytes to every end user for no reason. On second thought and on grepping the code base, it seems like this is necessary to figure out whether a button should be allowed to appear at all in the toolbar, i.e. to check whether it doesn't conflict with the dataFilter configuration?

In any case, even the most sparse commenting would be most helpful in datafilter.js to be able to better review the code :)

Asterix makes it possible to define rulles accepting defined classes, attributes and styles for all elements accepted by other rules.

That's not what I'd expect the asterisk to do, but I think that makes sense :)

Which format do you like more: a(red,blue)[href] or a.red,blue[href]?

The latter IMO reads like this: "an a tag with a red class OR a blue tag with a href attribute". I think the former is therefor clearer. Even better would be if it would resemble a well-known syntax: inspired by regular expressions syntax, it could become this:

a(red|blue)[href]

Thoughts?

I won't add values validators unless I'll really need to do that.

+1

If default allowed content configuration is used (because user hasn't defined his), then buttons and plugins add content allowed by them to the dataFilter. You can see on 4th editors (which has stripped few plugins and buttons) that e.g. strike through, anchors and image disappeared. And when you compare to 1st editor you can see that they are present there. Magic! :)

When I read this, I was super excited, but when I went to look at the example, I wasn't anymore :( Unless I'm missing something, you're simply setting removeButtons on those things that have "disappeared as if by magic"? If they were disappearing due to the allowedContent setting, now that would be magic, but if you explicitly state that those buttons should be removed, I don't see how that is magic? :( Maybe you forgot to commit the right sample? :)

Also: in the sample, I only see editors 1, 2a, 2b and 3, not 4. So … I think you might indeed have not yet committed the right sample? If not, then the numbering you use in comment:10 is a mismatch with the numbering in the actual sample and thus very confusing.

justify - here I had problem how to verify if these buttons are allowed - I'm checking only for 'p{text-align}'

That makes sense, but I just wanted to make sure that one thing will become possible eventually: in Drupal, we don't ever want style attributes in our HTML content. Everything should be CSS-overridable. What we want instead for alignment, is the use of classes. IIRC, it's possible to have the alignment buttons set classes on elements, so that's a non-issue, but I want to make sure that it's also possible that we can also configure allowedContent accordingly.

Dialog's fields have to be removed and so on.

Indeed— do you think that will be hard or easy to achieve? (I don't know enough about CKE's internal structures or metadata to assess the complexity of this).

I hope that you liked my child :). Feedback is greatly appreciated. Especially regarding API.

I do! If you could add some high-level comments to your code, and also to the entry points (the points where the rest of CKE is using dataFilter), that would enable me to give more useful feedback.

My feedback is mostly superficial but I hope there's at least a few useful bits in there. Overall: please add more comments to enable us to give better feedback!

comment:12 in reply to: ↑ 11 Changed 21 months ago by Reinmar

Scanning the entire file though, it is in line with what I'd expect to be necessary for a feature like this. One thing I'm not sure about is all the mock*() functions — why are those necessary? If they're only for tests, then why do they need to live here? And in fact, why is there a test() function? Is this all just temporary? I can't imagine why you would want to have your testing code be part of the normal code, because then you'd be shipping pointless, precious bytes to every end user for no reason.

These methods aren't part of unit tests. DataFilter#test() is a public method which plugins use to check if some content is allowed. It accepts the string based format accepted by allowedContent configuration and basic CKEDITOR.style objects.


On second thought and on grepping the code base, it seems like this is necessary to figure out whether a button should be allowed to appear at all in the toolbar, i.e. to check whether it doesn't conflict with the dataFilter configuration?

Exactly :). BTW. git diff master...t/9892 is convenient way of checking what's changed since this branch diverged from master.


In any case, even the most sparse commenting would be most helpful in datafilter.js to be able to better review the code :)

Hmm. You're right. I asked for feedback, so at least basic comments for the most important stuff would be great. I'll add them asap.


Which format do you like more: a(red,blue)[href] or a.red,blue[href]?

The latter IMO reads like this: "an a tag with a red class OR a blue tag with a href attribute". I think the former is therefor clearer.

"blue" doesn't look like a tag for me, because semicolons are used for rules separation. But you may be right that grouping classes by "()" will be clearer for people who hasn't spent hours starring at allowedContent like me :D. Let's see what others think.


Even better would be if it would resemble a well-known syntax: inspired by regular expressions syntax, it could become this:

a(red|blue)[href]

Thoughts?

I omitted pipes leaving them for values separator. If we'll decide to implement simple values checking then it would be great to have: img[attr=val1|val2|val3, attr2=...]. Also in regexp like fashion. Comma also has one advantage over pipe for being an attributes, classes and styles separator - in natural language we used to put space after it, what makes text more readable. And this is now supported by dataFilter. However, such a space after pipe would change regexp meaning.


If default allowed content configuration is used (because user hasn't defined his), then buttons and plugins add content allowed by them to the dataFilter. You can see on 4th editors (which has stripped few plugins and buttons) that e.g. strike through, anchors and image disappeared. And when you compare to 1st editor you can see that they are present there. Magic! :)

When I read this, I was super excited, but when I went to look at the example, I wasn't anymore :( Unless I'm missing something, you're simply setting removeButtons on those things that have "disappeared as if by magic"? If they were disappearing due to the allowedContent setting, now that would be magic, but if you explicitly state that those buttons should be removed, I don't see how that is magic? :( Maybe you forgot to commit the right sample? :)

Ahh. Sorry for that - I forgot that I made a stupid decision to name editors 1, 2a, 2b and 3 :) I updated my previous comment to align it to this sample.

And there's a magic ;> Check editors 2a and 2b - buttons (and even options in format combo) are removed from toolbar basing on allowedContent settings.

Opposite way is presented by the 'editor 3' (4th one :). Buttons are explicitly removed from toolbar by 'removeButtons' and 'removePlugins' configs and their features are magically stripped out from content of the editor. See that there are no image, no anchors, no strike-through *in the content* of editor 3.


justify - here I had problem how to verify if these buttons are allowed - I'm checking only for 'p{text-align}'

That makes sense, but I just wanted to make sure that one thing will become possible eventually: in Drupal, we don't ever want style attributes in our HTML content. Everything should be CSS-overridable. What we want instead for alignment, is the use of classes. IIRC, it's possible to have the alignment buttons set classes on elements, so that's a non-issue, but I want to make sure that it's also possible that we can also configure allowedContent accordingly.

:) I was wondering if anyone will notice that I forgot about this. You're right that it is possible to force justify to apply classes rather than inline styles. Plugin should take that into consideration and basing on configuration allow styles or classes. It's of course doable, but I forgot about that :).


Dialog's fields have to be removed and so on.

Indeed— do you think that will be hard or easy to achieve? (I don't know enough about CKE's internal structures or metadata to assess the complexity of this).

I'm not sure what's possible now. Fred was opting for removing inputs, but this could lead to broken dialogs' layouts. I'm rather for disabling inputs, but only because it's now simpler for us, cause the UX may not be so great.

In any case - updating dialogs will require some amount of work, but it would be pretty easy work - each config dependent field should be displayed/enabled if it is validated by dataFilter.test(). We need to check if we can simplify this task even more by some magic.


I hope that you liked my child :). Feedback is greatly appreciated. Especially regarding API.

I do! If you could add some high-level comments to your code, and also to the entry points (the points where the rest of CKE is using dataFilter), that would enable me to give more useful feedback.

OK. I'll describe the architecture on Monday. It's quite simple and... it's not :).

Thanks for your very thorough review :).

Last edited 21 months ago by Reinmar (previous) (diff)

comment:13 Changed 21 months ago by Reinmar

Daily update.

  1. Renamed:
    • dataFilter -> filter
      • #test() -> #check()
      • #addRules() -> #allow()
  2. I replaced command's and button's content definition object (content.allowed and content.required) with simple #allows and #requires properties (more about them later in this comment).
  3. I wrote simple documentation for CKEDITOR.filter API.
  4. I fixed issue in justify plugin which was pointed out by Wim - now it allows classes instead of styles if defined in config.classesList.

Few words about architecture

In previous comments I haven't described how features (buttons, commands) are affected by filter. Let's take a closer look at this now.

The simplest way:

  1. Feature enables itself with filter#allow(). Note that executing allow() won't affect filter when custom config.allowedContent has been set by developer.
  2. Feature checks if it's allowed by executing filter#check(). If yes, then it somehow enables itself (e.g. adds button to toolbar).

When implementing buttons integration with filter, to make things even simpler, I added two properties to button's definition - allows and requires. 'Allows' contains content rules that should be enabled by this button. 'Requires' means content that has to be allowed by filter. If these properties aren't defined in button defition and this button has command associated with it (common case), I'm getting them from command definition.

Then, when adding button to toolbar, I'm calling button#registerContent( allows, requires ) which contains the logic described in previous paragraph and which calls similar method - filter#registerContent( allows, requires ) which contains logic described earlier in those two points. These methods return true if feature is allowed in editor, so e.g. I was able to do this in format plugin:

// Register content allowed by style.
if ( editor.filter.registerContent( style, style ) ) {
	styles[ tag ] = style; // Add this style to format combo.
}

One method to rule them all! Filter#registerContent contains also small improvement, which I haven't mentioned earlier. If filter has default configuration (config.allowedContent is undefined) it doesn't check if required content is allowed, because it assumes that feature allows itself.

The only problem with this method, which I find really useful, is that despite the fact that it's really short, it's very confusing. I was thinking about new name - 'addFeature', 'registerFeature', 'checkAndAllow' - does any of them sound better? Have you got other ideas? Would changing the name be enough to make it clear?

comment:14 Changed 21 months ago by wwalc

Dialog fields

Imho inputs should be simply removed if particular style/attribute is not allowed. The UI should not be broken, as long as:

  1. Empty tabs (all inputs are disallowed) will be hidden automatically
  2. When removing one of two input fields with width set to 50%, the only one that will remain will have width 100%
  3. When the whole line (hbox) with input elements will be empty, the rest of fields below, will go up.

(2) & (3) should be happening already iirc.

Syntax

Parentheses for classes

The second one looks strange to me:

a(red,blue)[href] or a.red,blue[href]?

the one proposed by Wim:

a(red|blue)[href]

looks more clear. Suppose I have classes named table and error for div elements:

div.error,table // ??

vs

div(error|table) // :-)

In any case I believe it's not issue with pipe vs comma, the thing that makes the difference are parentheses. I'm okay with pipe or comma.

Separate HTML elements with comma

This is what we usually have:

// config options
config.removePlugins = 'elementspath,save,font';

// in plugin...
requires: 'dialog,fakeobjects',
lang: 'af,ar,bg,

// how we used to list things...
buy: milk, beer, vodka...

so, for me, the following syntax would be more appropriate:

h1,h2,h3,p,blockquote,strong,em

instead of:

h1 h2 h3 p blockquote strong em

Eventually, if we go ahead with 7)

"Selection context participates on filtering as well. A <h1> editable should restrict the editor features accordingly. Another example, a selection inside <a> should not permit <a> elements being inserted."

We can use shorthand syntax like h1,h2,ul,ul li,table,table tr,... and use space to allow certain elements if only they are children of other elements.

What with attributes validation?

I omitted pipes leaving them for values separator. If we'll decide to implement simple values checking then it would be great to have: img[attr=val1|val2|val3, attr2=...]. Also in regexp like fashion. Comma also has one advantage over pipe for being an attributes, classes and styles separator - in natural language we used to put space after it, what makes text more readable. And this is now supported by dataFilter. However, such a space after pipe would change regexp meaning.

img[attr=val1|val2|val3, attr2=...]

I believe that attributes validation (if implemented) will never be that simple, we'll have rather things using regular expressions like:

widthPattern = /^(\d+(?:\.\d+)?)(px|%)$/,

which will enforce using more complex object based format. I don't think that shorthand notation should even support that, just to keep things simple.

Asterisk

The string based configuration sample shows nearly all its features which are currently implemented. I haven't only used '*' in elements list. Asterix makes it possible to define rulles accepting defined classes, attributes and styles for all elements accepted by other rules. E.g.

img[src]; p b; *.red' == 'img.red[src]; p b.red

I think we could achieve the same thing even without asterisk:

img[src],p,b,(red,blue)

This way asterisk could be used as a wildcard character: {font*,background*}. Also this way the rule would be a bit less confusing (will I allow "all" elements if I use it?).

In fact, I'm not convinced to the idea of being able to use wildcard characters in white-list approach, so I'm ok with not using this character at all...

this part did not have any sense because I totally forgot about the meaning of semicolons :-)

Last edited 21 months ago by wwalc (previous) (diff)

comment:15 Changed 21 months ago by Reinmar

Ok. So I see that parenthesis for classes are accepted.

Regarding using comma as a separator of elements - now I also find this better. BTW. vodka without juice? It doesn't work well with beer and milk :P.

As for attributes validation - that would indeed usually be done by regexp. I can only think of few examples of list of values. Anyway, not important now.

Summing up, if no one protest new format will look like this (based on editor 2a from sample):

'h1, h2, h3, p, blockquote, strong, em;' +
'a[href];' +
'img(left, right)[src, alt]{width, height};' +
'table, tr, th, td, caption;' +
'span{font-family, color}'

(I used more spaces but they are not required.)

comment:16 Changed 21 months ago by Wim Leers

comment:15: I think the format is not a truly important aspect at this point; we can still change this down the line.


comment:13:

The simplest way:

  1. Feature enables itself with filter#allow(). Note that executing allow() won't affect filter when custom config.allowedContent has been set by developer.
  2. Feature checks if it's allowed by executing filter#check(). If yes, then it somehow enables itself (e.g. adds button to toolbar).

When implementing buttons integration with filter, to make things even simpler, I added two properties to button's definition - allows and requires. 'Allows' contains content rules that should be enabled by this button. 'Requires' means content that has to be allowed by filter. If these properties aren't defined in button defition and this button has command associated with it (common case), I'm getting them from command definition.

Most of this (very important part!) is very hard to understand/parse.

If I put together what you wrote there, with what I see in plugins/link/plugin.js and https://github.com/cksource/ckeditor-dev/blob/5e297a293335cc58ccaa3c236783658c0258ae12/core/filter.js brings this thoughts to mind:

  • you frequently mix "allows", "allowed", "requires" in one sentence, which makes it all extremely hard to understand, e.g.:

    If {@link #customConfig} is true (custom {@link CKEDITOR.config#allowedContent} was defined) this method returns true (what means that tested feature is allowed) if requires is allowed or hasn't been defined. So it returns false only if required content isn't allowed.

    -> "If requires is allowed": this is obviously very confusing (though I do understand what you mean here).
  • Then, in another location, you have this:
    * @param [allows] Rules to be added as allowed.
    * @param [requires] Content to be checked by {@link #check}.
    
    -> In practice, both allows and requires are using similar syntax, but here they seem to be completely different things!
  • My understanding: each plugin's allows indicates which HTML it might generate, each requires indicates which is the minimal HTML that a plugin must be allowed to generate in order for it to make sense to be enabled at all. Correct?

I was only able to understand all of the above by reading the code.

It's very hard to come up with good names for these things of course. I also can't think of better ones right away, but this definitely needs to be improved if we want to make this understandable and thus maintainable.


My conclusions/proposals:

  • filter#registerContent() is only very hard to understand because it is has special handling for the case where the user doesn't have a custom configuration.
  • Looking at the code and taking the comments into account, I think I found a bug. Shouldn't this:
    	registerContent: function( allows, requires ) {
    			// If default configuration, then add allowed content rules.
    			this.allow( allows );
    			// If custom configuration, then check if required content is allowed.
    			if ( this.customConfig && requires )
    				return this.check( requires );
    
    be like this?
    	registerContent: function( allows, requires ) {
    			// If default configuration, then add allowed content rules.
                            if ( !this.customConfig )
      			       this.allow( allows );
    			// If custom configuration, then check if required content is allowed.
    			if ( this.customConfig && requires )
    				return this.check( requires );
    
  • The per-plugin allows setting is a bad name, because it implies that the plugin itself does filtering too, and it does not. A clear example is the table plugin: it's possible to insert e.g. an <img> or a <blockquote> in a table cell, but that's not listed in its allows setting. That appears to imply that a table does not allow for images or blockquotes, but that's of course wrong; instead allows simply indicates the entire range of HTML that might/could be generated.

I'm pretty certain you called it allows because you're essentially building a default allowedContent when no CKEDITOR.config#allowedContent is set. From that POV, it makes sense, but it just isn't the right nuance and that's why the name becomes unclear when used in other contexts. (*1)

  • Proposal for registerContent(), allows and requires in a single view so it hopefully makes sense. Note that "FOOBAR data structure" is how I refer to the "format" outlined in preceding comments and summarized in comment:15.[[br]]
    • Before:
                      /**
                       * Shorthand function that can be used during feature activation.
                       * …
                       * @param [allows] Rules to be added as allowed.
      		 * @param [requires] Content to be checked by {@link #check}.
      		 * @returns {Boolean} Whether feature is allowed.
      		 */
      		registerContent: function( allows, requires ) {
      
    • After:
                      /**
                       * Checks whether a feature can be enabled for the HTML restrictions in place
                       * for the current CKEditor instance, based on the HTML the feature might
                       * generate and the minimal HTML the feature needs to be able to generate.
                       * …
                       * @param [generatableHTML] HTML that can be generated by this feature. MUST
                       *                                            be a FOOBAR data structure.
      		 * @param [requiredHTML] Minimal HTML that this feature must be allowed to
                       *                                       generate for it to be able to function at all. MUST be
                       *                                       a FOOBAR data structure.
      		 * @returns {Boolean} Whether this feature can be enabled.
      		 */
      		featureIsCompatible: function( generatableHTML, requiredHTML ) {
      
      

I hope this was somewhat helpful.

Sidenotes

(*1): Do we actually even need to build a default allowedContent? Do we even want to do filtering when CKEDITOR.config#allowedContent has not been set? What if users paste in a Vimeo or YouTube HTML embed code, and that gets stripped away just because they don't have any plugin enabled that can insert the tags required for those embedded things? (I'm thinking about the general CKEditor user base here, this is mostly irrelevant for Drupal.)

comment:17 Changed 21 months ago by Reinmar

Most of this (very important part!) is very hard to understand/parse.

Sorry for that :). That's perfectly clear for me, but I see that I've started using my own language.


you frequently mix "allows", "allowed", "requires" in one sentence, which makes it all extremely hard to understand

That sentence would be a little bit clearer if we had better (visible) styles for code sample. In this sentence: "If requires is allowed" the word "requires" is a variable name, but this is invisible when read on Trac.

Anyway, I'll stop using pseudo-code when describing things :).


My understanding: each plugin's allows indicates which HTML it might generate, each requires indicates which is the minimal HTML that a plugin must be allowed to generate in order for it to make sense to be enabled at all. Correct?

Only one thing isn't correct: "that a plugin must be allowed to generate". Not "plugin" but "feature" - button, command, etc. Plugin is not a feature itself, it adds features. This may change in the future if we'll see that it makes sense to switch off entire plugins.


It's very hard to come up with good names for these things of course. I also can't think of better ones right away, but this definitely needs to be improved if we want to make this understandable and thus maintainable.

Exactly. And it is the biggest problem for me, because I implemented that :D. Anyway, at the end documentation will be proofread by Anna or final reviewer, so we'll end up with something understandable. But any help is of course greatly appreciated :).


Looking at the code and taking the comments into account, I think I found a bug.

It's ok. This check is done inside #allow().


The per-plugin allows setting is a bad name, because it implies that the plugin itself does filtering too, and it does not.

As I mentioned earlier - there's no per-plugin allows setting.


A clear example is the table plugin: it's possible to insert e.g. an <img> or a <blockquote> in a table cell, but that's not listed in its allows setting. That appears to imply that a table does not allow for images or blockquotes, but that's of course wrong; instead allows simply indicates the entire range of HTML that might/could be generated. I'm pretty certain you called it allows because you're essentially building a default allowedContent when no CKEDITOR.config#allowedContent is set. From that POV, it makes sense, but it just isn't the right nuance and that's why the name becomes unclear when used in other contexts.

Hm... I'll let others speak, because I've got distorted point of view :).


Do we actually even need to build a default allowedContent? Do we even want to do filtering when CKEDITOR.config#allowedContent has not been set? What if users paste in a Vimeo or YouTube HTML embed code, and that gets stripped away just because they don't have any plugin enabled that can insert the tags required for those embedded things? (I'm thinking about the general CKEditor user base here, this is mostly irrelevant for Drupal.)

I think that this feature is essential. That even if you use default editor's configuration you can't paste (or insert in any other way, including source mode) content that cannot be handled by editor. So if table cannot be edited without table plugin it should be stripped away if table plugin isn't loaded.

Of course, this is not backward compatible change, so I think that it should be possible to disable filtering at all. But I'm for making it enabled by default, because this is a feature that will improve editing a lot and if we'll leave it disabled by default less people will learn about it.

comment:18 Changed 21 months ago by Wim Leers

Yes, in the first half or so of my comment, I was saying "per-plugin" instead of "per-feature". Towards the end, I'm also saying "per-feature". I forgot to update the first half — sorry.

You didn't comment on my proposal though? :(

comment:19 follow-up: ↓ 20 Changed 21 months ago by Reinmar

You didn't comment on my proposal though? :(

I think that the best feedback will give people without so deep knowledge about implementation as mine. For me simple names like "allows", "requires", "addFeature( allows, requires )" are sufficient - they are clear and... simple. And we like simple things :).

Your proposal is verbose, so it should be clear for users, but it is also long. Especially that we should use these names ("generatableHTML" and "requiredHTML") also in commands' and buttons' definitions (so "allows" and "requires" properties would be entirely replaced).

Also, I'm not sure if I like word "generatable" and if "HTML" is a right word for what I called "content rules". And "featureIsCompatible" doesn't suggest that this method changes something and in current form it does.

So, I don't find it better than current API. But if others will like it I'll accept it too :).

BTW 1. Note that by following CKEditor coding style we would have "generatableHtml".

BTW 2. I've just thought about this:

 /**
 * Checks whether a feature can be enabled for the HTML restrictions in place
 * for the current CKEditor instance, based on the HTML the feature might
 * generate and the minimal HTML the feature needs to be able to generate.
 *
 * @param feature
 * @param feature.generates HTML that can be generated by this feature. MUST
 * be a FOOBAR data structure.
 * @param feature.requires Minimal HTML that this feature must be allowed to
 * generate for it to be able to function at all. MUST be a FOOBAR data structure.
 * @returns {Boolean} Whether this feature can be enabled.
 */
addFeature( feature ) {

The "feature" is a kind of interface that buttons and commands implement (because they may have "generates" and "requires" properties). In the future every object may become a feature, so e.g. if we'll decide that we want to disable entire plugins, then it will be enough to:

  • add "generates" and "requires" properties to chosen plugins,
  • check them by calling: addFeature( plugin ).

comment:20 in reply to: ↑ 19 Changed 21 months ago by Wim Leers

Replying to Reinmar:

Your proposal is verbose, so it should be clear for users, but it is also long. Especially that we should use these names ("generatableHTML" and "requiredHTML") also in commands' and buttons' definitions (so "allows" and "requires" properties would be entirely replaced).

I'm approaching this with a narrow POV, I don't have the holistic view of the CKEditor code base (or even the entire under-the-hood feature set), so it's of course very hard for me to take all these things into account.

I just hope that by providing feedback from another POV, the end result will have more understandable names, that require less knowledge about other CKEditor internals. That is all :)

I am of course also looking forward to feedback from others!

comment:21 Changed 21 months ago by wwalc

  • Keywords Drupal added

comment:22 Changed 21 months ago by Reinmar

Interesting, related issue: #9964.

comment:23 Changed 20 months ago by Reinmar

  • Description modified (diff)

I updated current status of works.

comment:24 Changed 20 months ago by Reinmar

  • Description modified (diff)

comment:25 follow-up: ↓ 28 Changed 20 months ago by Reinmar

Most important changes since comment:13.

  1. #9981 - this patch made it possible to apply more than one filter (htmlParser.filter) to fragment (htmlParser.fragment). It's not a core part of this ticket, but it was crucial for it. When data are transformed from internal HTML to output HTML (dataProcessor#toDataFormat) allowed content filter has to be applied after htmlDataProcessor#htmlFilter what wasn't possible before.
  2. Another change in API was the introduction of editor#toHtml and editor#toDataFormat that provide more ways of interaction with data processor. Now filter is attached to data processor in clear way (as listener for both these events). Related issue - #8784.
  3. Flexibility introduced by 1. and 2. allowed me to integrate all plugins except pagebreak, dialogadvtab and stylescombo (#9992) with allowed content filter. Dialogs integration was extracted to #9990.
  4. It's now possible to disable allowed content filter at all by setting config.allowedContent = true or calling filter#disable.
  5. I introduced editor#addFeature as a shorthand for filter#addFeature. Filter#addFeature replaced #registerContent and it accepts "feature" object as described in comment:19.
  6. I implemented some cache mechanisms in filter#check and filter#allow.
  7. As we agreed in previous comments - classes in allowedContent string format are now grouped by "()". So: "img(alingLeft,alignRight)".
  8. I implemented the asterix for classes, styles and attributes lists (e.g. "div(*)"). It means that all classes/styles/attrs are accepted for this element. I needed this for plugins like div and dialogadvtab which allow setting any style/class for element.

Things that still need to be done:

  1. Pagebreak and dialogadvtag integration with filter.
  2. Improved elements stripping. Currently e.g. when ul>li list is stripped filter creates invalid HTML.
  3. Hiding entire styles/formats combos when all their positions aren't accepted.
  4. Part of point 7. "Selection context participates on filtering as well. A <h1> editable should restrict the editor features accordingly."
  5. Disable keystrokes for not allowed commands (or simply - entire commands).

When these points will be done I'll put this ticket on review. I moved other parts of the huge "filtering, transformations and features activation" feature to separate tickets so they don't block this part (the "allowed content filtering and features activation").

Last edited 20 months ago by Reinmar (previous) (diff)

comment:26 follow-up: ↓ 27 Changed 20 months ago by wwalc

Q: The default value for allowedContent.

Considering that we're introducing a new feature, which may remove valid content on many websites and thus make users very unhappy, might be worth to think about the default behavior of CKEditor as of 4.1.

Should we start with allowedContent se to true as a default value? In other words should we offer it only as an option (and advertise it properly) that can be turned on demand by fully aware users?

Pros:

  • we'll avoid bad feedback from users that noticed that CKEditor removed their content too late
  • we'll have more time to make this feature perfect

Cons:

  • why introduce something that's not enabled by default?
  • CKEditor 4.x has been released quite recently, so the number of users using CKEditor 4.x is constantly growing. The sooner we enable it, the less number of users will be suprised.

1. Wysiwygarea and "full page" mode:

The default list of allowed tags:

allows: 'html head body title style; meta link [*]'

is incomplete. Style comes usually with a type attribute: style[media,type]

Using the docprops plugin one can easily create something like this:

<html lang="en" xml:lang="en">
<body dir="ltr" style="margin: 22px 11px; background-color: rgb(204, 0, 102);">

I mentioned the docprops plugin, it might be the right place to extend the list of allowed things?

2. HTML Global Attributes

There are some attributes that are not editable using CKEditor (not true if sourcearea is enabled), but still it may make sense to leave them in code. I'm thinking mainly about:

title, lang, dir

Imagine that one is copying text from an external site into CKEditor, for example from http://en.wikibooks.org/wiki/Arabic where we have:

<span lang="ar" xml:lang="ar" style="">الْعَرَبيّة</span>

I don't think that lang should not be removed.

id

id is also a global attribute which is a bit more tricky. I'm not sure if it should be preserved everywhere. But: for those that would like to allow this attribute (e.g. because of using it in predefined templates), it should not be painful.

So this leads me to the conclusion that a nice to have thing would be an option to *easily extend the list of allowed content*, without setting allowedContent to true.

E.g one should be able to do things like:

allow ID attribute globally

*[id]

or

tell CKEditor to not remove some useful tags, which are copied from external sites or are used in a template, e.g.:

figure,figcaption

The API probably already allows this, so maybe it's just a matter of documenting it e.g. in config.allowedContent documentation (in places where one might be looking for it).

3. Minor issues:

a) Indent plugin should respect config.indentClasses when setting "allow"

b) The same indent plugin in requires: 'p[margin-left]' has invalid brackets (it specifies an attribute)

c) bidi plugin forgot that it can set "dir" attribute also on headers :)

btw. regarding allowing just attributes:

			allows: {
				'table ul ol blockquote div tr p div li td': {
					propertiesOnly: true,
					attributes: 'dir'
				}
			},

			->

			allows: {
				attributes: {
					// allow dir in selected elements ?
					'dir' : 'table ul ol blockquote div tr p div li td'
				}
			},			

(I'm fine with the current syntax as well)

comment:27 in reply to: ↑ 26 Changed 20 months ago by Reinmar

Replying to wwalc:

Q: The default value for allowedContent.

Considering that we're introducing a new feature, which may remove valid content on many websites and thus make users very unhappy, might be worth to think about the default behavior of CKEditor as of 4.1.

Should we start with allowedContent se to true as a default value? In other words should we offer it only as an option (and advertise it properly) that can be turned on demand by fully aware users?

Pros:

  • we'll avoid bad feedback from users that noticed that CKEditor removed their content too late
  • we'll have more time to make this feature perfect

Cons:

  • why introduce something that's not enabled by default?
  • CKEditor 4.x has been released quite recently, so the number of users using CKEditor 4.x is constantly growing. The sooner we enable it, the less number of users will be suprised.

We can't release 4.1 with this feature not working perfectly, thus Fred convinced me that it should be enabled by default.

1. Wysiwygarea and "full page" mode:

The default list of allowed tags:

allows: 'html head body title style; meta link [*]'

is incomplete. Style comes usually with a type attribute: style[media,type]

Using the docprops plugin one can easily create something like this:

<html lang="en" xml:lang="en">
<body dir="ltr" style="margin: 22px 11px; background-color: rgb(204, 0, 102);">

I mentioned the docprops plugin, it might be the right place to extend the list of allowed things?

Done. One part for fullPage, second for docprops.

2. HTML Global Attributes

There are some attributes that are not editable using CKEditor (not true if sourcearea is enabled), but still it may make sense to leave them in code. I'm thinking mainly about:

title, lang, dir

Imagine that one is copying text from an external site into CKEditor, for example from http://en.wikibooks.org/wiki/Arabic where we have:

<span lang="ar" xml:lang="ar" style="">الْعَرَبيّة</span>

I don't think that lang should not be removed.

Ok, but if editor doesn't allow to modify/remove this, why would we want to leave it in editor when pasting?

Dir attr is enabled by bidi plugin and when you paste RTL text you see that in toolbar. Language is worse case, because we don't have plugin for it. So for example you wouldn't even see that span[lang] was pasted and could e.g. start typing inside it in different language making it semantically incorrect. Or you could paste looots of this spans and then see complete mess in elements path.

id

id is also a global attribute which is a bit more tricky. I'm not sure if it should be preserved everywhere. But: for those that would like to allow this attribute (e.g. because of using it in predefined templates), it should not be painful.

So this leads me to the conclusion that a nice to have thing would be an option to *easily extend the list of allowed content*, without setting allowedContent to true.

E.g one should be able to do things like:

allow ID attribute globally

*[id]

or

tell CKEditor to not remove some useful tags, which are copied from external sites or are used in a template, e.g.:

figure,figcaption

The API probably already allows this, so maybe it's just a matter of documenting it e.g. in config.allowedContent documentation (in places where one might be looking for it).

This is a good point. We have plugins and extraPlugins settings and in this case we may have extraAllowedContent too. This is the same situation - you don't want to overwrite default config, but just add few things. Currently it's only possible to do that by calling editorInstance.filter.allow( rules, true ) and that has to be done before data are loaded to the editor, so it's not really a convenient way.

3. Minor issues:

a) Indent plugin should respect config.indentClasses when setting "allow"

Added to my todo list.

b) The same indent plugin in requires: 'p[margin-left]' has invalid brackets (it specifies an attribute)

Are you sure you haven't been checking outdated branch?

c) bidi plugin forgot that it can set "dir" attribute also on headers :)

Done.

btw. regarding allowing just attributes:

			allows: {
				'table ul ol blockquote div tr p div li td': {
					propertiesOnly: true,
					attributes: 'dir'
				}
			},

			->

			allows: {
				attributes: {
					// allow dir in selected elements ?
					'dir' : 'table ul ol blockquote div tr p div li td'
				}
			},			

(I'm fine with the current syntax as well)

Nope. Let's leave attributes: { dir: 'rtl|ltr' } for values checking.

comment:28 in reply to: ↑ 25 Changed 20 months ago by Reinmar

Replying to Reinmar:

Things that still need to be done:

  1. Pagebreak and dialogadvtag integration with filter.
  2. Improved elements stripping. Currently e.g. when ul>li list is stripped filter creates invalid HTML.
  3. Hiding entire styles/formats combos when all their positions aren't accepted.
  4. Part of point 7. "Selection context participates on filtering as well. A <h1> editable should restrict the editor features accordingly."
  5. Disable keystrokes for not allowed commands (or simply - entire commands).

When these points will be done I'll put this ticket on review. I moved other parts of the huge "filtering, transformations and features activation" feature to separate tickets so they don't block this part (the "allowed content filtering and features activation").

Today I have closed all of these things, but I haven't got much time to test them precisely, so I'll put this ticket on review tomorrow.

comment:29 Changed 20 months ago by Reinmar

  • Status changed from assigned to review

OK. t/9829 is ready for review. It's not ideal yet, but definitely usable and IMO ready to land on major.

comment:30 Changed 20 months ago by Reinmar

  • Description modified (diff)

comment:31 Changed 20 months ago by fredck

  • Status changed from review to review_passed

There are still some issues to be dealt, but those have already their own tickets. Overall, the feature gained the format we wanted it to have.

We'll have all related ticket closed for the next major release, so just to have things better organized, let's merge t/9829 in major, even if it'll bring some instability to that branch.

comment:32 Changed 20 months ago by Reinmar

  • Status changed from review_passed to closed
  • Resolution set to fixed

Fixed on major with git:0b94e09 and tests with 4f7a29a.

I'll list here all sub tickets later.

comment:33 Changed 20 months ago by Reinmar

  • Description modified (diff)

comment:34 Changed 20 months ago by Reinmar

  • Description modified (diff)

Changed 3 months ago by Slavon

Note: See TracTickets for help on using tickets.
© 2003 – 2012 CKSource – Frederico Knabben. All rights reserved. | Terms of use | Privacy policy