Opened 16 years ago
Closed 15 years ago
#4395 closed New Feature (fixed)
Use htmldataprocessor to refactor pasting processor
Reported by: | Garry Yao | Owned by: | Garry Yao |
---|---|---|---|
Priority: | Normal | Milestone: | CKEditor 3.1 |
Component: | General | Version: | |
Keywords: | Paste Confirmed | Cc: |
Description (last modified by )
We should start using htmldataprocessor to process pasted input, instead of the current implementation, which is based exclusively on regexps. Such an infrastructure would bring several benefits:
- Allow structural transformations of the source instead of simple cleanup, e.g. MS Word middot bullets -> HTML unordered list;
- Leverage all the existing rules we currently have for output, e.g. flash objects, namespaced tags;
- Make it much easier for developers to extend/customize the processing by adding/altering rules.
Change History (5)
comment:1 Changed 16 years ago by
Description: | modified (diff) |
---|---|
Status: | new → assigned |
Summary: | Use htmldataprocessor to refactor pasting clean up → Use htmldataprocessor to refactor pasting processor |
comment:2 Changed 16 years ago by
Keywords: | Paste added |
---|
comment:3 Changed 16 years ago by
Migrate all the regexp-based rules in the 'cleanWord' function to filter rules with [4208].
comment:4 Changed 16 years ago by
One significant impedance mismatch has been noticed between the old regexp-based approach and the current filter-based one:
The old approach is linear, multiple-pass parsing, while our html filter is a top-down, one-pass procedure, which makes some of the rules difficult to migrate.
Consider the following example, which should be cleaned up to a single &nbsp;:
    <span lang=EN-GB style='font-family:Calibri'><o:p>&nbsp;</o:p></span>
The old rules related to this were:
    html = html.replace( /<o:p>\s*<\/o:p>/g, '' ) ;
    html = html.replace( /<o:p>[\s\S]*?<\/o:p>/g, '&nbsp;' ) ;
    html = html.replace( /<SPAN\s*[^>]*>\s*&nbsp;\s*<\/SPAN>/gi, '&nbsp;' ) ;
    html = html.replace( /<SPAN\s*[^>]*><\/SPAN>/gi, '' ) ;
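To make the multiple-pass behaviour concrete, here is a minimal sketch that runs the four rules above in order on the sample. It assumes the blank in the rules and in the sample stands for a literal `&nbsp;` entity; the wrapper name `cleanWordFragment` is hypothetical and only groups the quoted rules.

```javascript
// Hypothetical wrapper around the four legacy regexp rules (assumption:
// the blank in the quoted rules is a literal "&nbsp;" entity).
function cleanWordFragment( html ) {
	// Pass 1: drop empty <o:p> elements.
	html = html.replace( /<o:p>\s*<\/o:p>/g, '' );
	// Pass 2: collapse any remaining <o:p> element to one &nbsp;.
	html = html.replace( /<o:p>[\s\S]*?<\/o:p>/g, '&nbsp;' );
	// Pass 3: a span holding only &nbsp; becomes that &nbsp;.
	html = html.replace( /<SPAN\s*[^>]*>\s*&nbsp;\s*<\/SPAN>/gi, '&nbsp;' );
	// Pass 4: drop empty spans.
	html = html.replace( /<SPAN\s*[^>]*><\/SPAN>/gi, '' );
	return html;
}

var input = "<span lang=EN-GB style='font-family:Calibri'><o:p>&nbsp;</o:p></span>";
var output = cleanWordFragment( input );
console.log( output );
```

Pass 3 only matches because pass 2 has already rewritten the inner `<o:p>`; each pass sees the output of the previous one, which is exactly what a one-pass filter does not get for free.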
The new rules would ideally be the following, but this is actually wrong because the 'span' rule will always be executed first (determined by tree order):
    elements : {
        $ : function( element ) {
            var tagName = element.name;
            if ( tagName == 'span' ) {
                var child;
                // Only child is a text node made of whitespace/&nbsp;.
                if ( ( child = onlyChildOf( element ) ) && /(?:\s|&nbsp;)+/.exec( child.value ) ) {
                    // ...Drop this element, preserve children...
                }
            } else if ( tagName == 'o:p' ) {
                // ...Drop this element, preserve children...
            }
        }
    }
In such a case, the filter must have a mechanism to perform the filtering from bottom to top (allowing children to be filtered before the element itself); in this concrete example it would execute the <o:p> rule first, then the <span> rule.
I'm adding a function, CKEDITOR.htmlParser.element::filterChildren, to allow this to happen when necessary, as in the following; the changes were checked in on the pasting branch with [4218].
    if ( tagName == 'span' ) {
        // Filter the children first.
        element.filterChildren();
        var child;
        if ( ( child = onlyChildOf( element ) ) && /(?:\s|&nbsp;)+/.exec( child.value ) ) {
            // ...Drop this element, preserve children...
        }
    }
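The ordering problem can be reproduced without the editor. The sketch below uses a toy node model, not the CKEditor parser classes; `el`, `text`, `unwrap`, `filter`, `serialize` and this version of `onlyChildOf` are all hypothetical helpers written for illustration, and the sample again assumes the `<o:p>` holds a `&nbsp;` entity.

```javascript
// Toy node model (assumption: NOT the real CKEDITOR.htmlParser classes).
function el( name, children ) { return { name: name, children: children || [] }; }
function text( value ) { return { value: value }; }

var BLANK = /^(?:\s|&nbsp;)+$/;

function onlyChildOf( element ) {
	return element.children.length == 1 ? element.children[ 0 ] : null;
}

// Rule set: <o:p> is always unwrapped; <span> only when it holds blank text.
function shouldUnwrap( node ) {
	if ( node.name == 'o:p' ) return true;
	if ( node.name == 'span' ) {
		var child = onlyChildOf( node );
		return !!( child && child.value !== undefined && BLANK.test( child.value ) );
	}
	return false;
}

// Replace a node with its own children inside the parent.
function unwrap( parent, node ) {
	var i = parent.children.indexOf( node );
	parent.children.splice.apply( parent.children, [ i, 1 ].concat( node.children ) );
}

// childrenFirst = false: decide on the element, then descend (top-down).
// childrenFirst = true: descend first, then decide (the filterChildren idea).
function filter( parent, childrenFirst ) {
	parent.children.slice().forEach( function( node ) {
		if ( node.children === undefined ) return; // text node
		if ( childrenFirst ) filter( node, true );
		var drop = shouldUnwrap( node );
		if ( !childrenFirst ) filter( node, false );
		if ( drop ) unwrap( parent, node );
	} );
}

function serialize( node ) {
	if ( node.children === undefined ) return node.value;
	var inner = node.children.map( serialize ).join( '' );
	return node.name ? '<' + node.name + '>' + inner + '</' + node.name + '>' : inner;
}

function sample() {
	return el( '', [ el( 'span', [ el( 'o:p', [ text( '&nbsp;' ) ] ) ] ) ] );
}

var topDown = sample();
filter( topDown, false );
var bottomUp = sample();
filter( bottomUp, true );
console.log( serialize( topDown ), '|', serialize( bottomUp ) );
```

In the top-down run the span rule is evaluated while the span still contains `<o:p>`, so the span survives; filtering the children first lets the span rule see the already-unwrapped text.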
comment:5 Changed 15 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Changes committed with [4207] in pasting branch.