Opened 10 months ago

Last modified 8 months ago

#16833 assigned Bug

IE11 malformed list pasted from Word

Reported by: Tade0 Owned by: Marek Lewandowski
Priority: Nice to have (we want to work on it) Milestone:
Component: Plugin : Paste from Word Version: 4.7.0
Keywords: Cc:

Description

Steps to reproduce

  1. Open Full editor demo from: http://nightly.ckeditor.com/
  2. Open the attached file in Word, copy and paste it into the editor.

Expected result

List structure is the same as in the Word file.

Actual result

Lists are malformed - they are shifted one level up.

Other details (browser, OS, CKEditor version, installed plugins)

IE11 Only.

Attachments (1)

Numbered.and.Bulleted.Lists.with.Headings.docx (13.5 KB) - added by Tade0 10 months ago.

Download all attachments as: .zip

Change History (17)

comment:1 Changed 10 months ago by Tade0

Component: GeneralPlugin : Paste from Word

comment:2 Changed 10 months ago by Marek Lewandowski

Status: newconfirmed

comment:3 Changed 10 months ago by Tade0

Status: confirmedreview

Added a method that check whether the list levels are shifted and corrects that.

Added in Heuristics, because without the mentioned method the list is still technically valid.

Changes pushed to branch:t/16833.

comment:4 Changed 10 months ago by Marek Lewandowski

Status: reviewreview_failed

Please add also other browsers to https://github.com/cksource/ckeditor-dev/tree/02e84ce223e6e0f345a4052f390797551028bcbf/tests/plugins/pastefromword/generated/_fixtures/Tickets/16818Numbered_lists - we need to ensure that it works well also for other browsers.

I'd say edge, chrome, firefox with one version of Word is enough.

comment:5 Changed 10 months ago by Tade0

Including chrome and firefox was straightforward, but there are still lingering problems in Edge, namely some list elements are not detected - this is a known problem in Edge for which there are heuristics in place, but they need to be adjusted.

A solution has been provided and is currently tested for regressions.

comment:6 Changed 10 months ago by Tade0

Status: review_failedreview

Updated tests.

Changes pushed to branch:t/16833.

comment:7 Changed 10 months ago by Marek Lewandowski

Status: reviewreview_failed

Regarding functionallity, the branch actually does a good job.

There are just a couple things to be fixed thoguh:

  • Edge fixture was generated with word2013 - please update it as it will simplify test file a bit.
  • We need filters for all headers, in HTML we have it all the way up to h6
  • correctLevelShift, isShifted are missing API docs.
  • Calling multiple Array.concat have really poor readability, it would be better just to do:
    	var preceeding = isShiftedList( list ) ? [ list ] : list.children;
    	return preceeding.concat( acc );
    
  • I was also concerned about empty paragraphs being propagated by PFW in IE11, but these get filtered out later, so it could be extracted into a follow-up ticket.

comment:8 in reply to:  7 ; Changed 10 months ago by Tade0

Status: review_failedreview

Replying to m.lewandowski:

The rule I'm going by is: Don't introduce new names if you don't have to.. For me personally it reduces cognitive load.

Corrected everything according to the list. Changes pushed to branch:t/16833.

comment:9 in reply to:  8 Changed 10 months ago by Marek Lewandowski

Resolution: fixed
Status: reviewclosed

Replying to Tade0:

Replying to m.lewandowski:

The rule I'm going by is: Don't introduce new names if you don't have to.. For me personally it reduces cognitive load.

Corrected everything according to the list. Changes pushed to branch:t/16833.

It's a good role, but it's usage strongly depends on given case. Here it contained simply unnecessary hacks to perform an operation that .could be conveyed more easily. As long as you're not working with more than 7 names at once.

Fix LGTM, good job.

Fixed with git:2d07c870ee91d1cecf42e1cac0a6bb8f875f7f84, merged to major.

comment:10 Changed 10 months ago by Marek Lewandowski

Resolution: fixed
Status: closedreopened

comment:11 in reply to:  7 Changed 10 months ago by Marek Lewandowski

Replying to m.lewandowski:

  • I was also concerned about empty paragraphs being propagated by PFW in IE11, but these get filtered out later, so it could be extracted into a follow-up ticket.

Actually mentioned paragraphs are visible in IE11 on Windows 10. It's a critical issue and needs to be fixed.

comment:12 Changed 10 months ago by Jakub Ś

Status: reopenedconfirmed

comment:13 Changed 10 months ago by Marek Lewandowski

Owner: changed from Tade0 to Marek Lewandowski
Status: confirmedassigned

comment:14 Changed 10 months ago by Marek Lewandowski

There were three ways to fix this markup:

RegExp

Regexp first: there's not much to elaborate as RegExps are not simply a good tool for parsing, it's error prone.

Browser DOM parser

The idea is simply to preformat data before it gets used by the CKEditor, so that:

  1. Internet Explorer pastes the data into our dataValue variable.
  2. dataValue gets loaded into a Document Fragment in the memory, where we traverse the tree and do the necesarry modifications.
  3. After our modifications are done, we get the output HTML, and feed it into the CKEditor.

Problem with this approach is that browsers do fix malformed HTML on it's own. You could check it on this demo: http://codepen.io/mlewand/pen/pRZoYG (see the console logs there).

See what happened to <strong><em> paris.

And that's the very reason why CKEditor 4 comes with it's own HTML parser.

CKEditor4 parser

Ultimately the only option to fix this issue is to make a fix in CKEditor HTML parser.

In this case our parser does a good job of moving inline elements like strong, but it does bad job with paragraphs because it moves it before the list (and this is exactly what you're seeing here).

This decission has been made in #3195 - it was meant to have the same handling for wrong elements directly inside of a table and lists.

What we could do here is to bring a fix for lists only, so that:

<ul>
	<li>1</li>
	<p>2</p>
	<li>3</li>
</ul>

Gets transformed into:

<ul>
	<li>1</li>
	<li><p>2</p></li>
	<li>3</li>
</ul>

Alternatively we could restrict the change only to remove empty paragraphs like that. So that:

<ul>
	<li>1</li>
	<p></p>
	<li>3</li>
</ul>

Gets transformed into:

<ul>
	<li>1</li>
	<li>3</li>
</ul>

But no paragraph is moved around.

Last edited 10 months ago by Marek Lewandowski (previous) (diff)

comment:15 Changed 10 months ago by Marek Lewandowski

So with that above, changing our parser is the only way to fix this issue. This brings very high risk and needs to be carried out extremely carefully.

Another thing here is that it's IE11 that produces totally malformed HTML - the only problem is that as a result this problem is very visible to the end user.

comment:16 Changed 8 months ago by Marek Lewandowski

Milestone: CKEditor 4.7.0
Priority: NormalNice to have (we want to work on it)
Note: See TracTickets for help on using tickets.
© 2003 – 2017 CKSource – Frederico Knabben. All rights reserved. | Terms of use | Privacy policy