Opened 5 months ago

Last modified 3 months ago

#16833 assigned Bug

IE11 malformed list pasted from Word

Reported by: Tade0 Owned by: m.lewandowski
Priority: Nice to have (we want to work on it) Milestone:
Component: Plugin : Paste from Word Version: 4.7.0 (GitHub - major)
Keywords: Cc:

Description

Steps to reproduce

  1. Open Full editor demo from: http://nightly.ckeditor.com/
  2. Open the attached file in Word, copy and paste it into the editor.

Expected result

List structure is the same as in the Word file.

Actual result

Lists are malformed - they are shifted one level up.

Other details (browser, OS, CKEditor version, installed plugins)

IE11 Only.

Attachments (1)

Numbered.and.Bulleted.Lists.with.Headings.docx (13.5 KB) - added by Tade0 5 months ago.

Download all attachments as: .zip

Change History (17)

comment:1 Changed 5 months ago by Tade0

  • Component changed from General to Plugin : Paste from Word

comment:2 Changed 5 months ago by m.lewandowski

  • Status changed from new to confirmed

comment:3 Changed 5 months ago by Tade0

  • Status changed from confirmed to review

Added a method that check whether the list levels are shifted and corrects that.

Added in Heuristics, because without the mentioned method the list is still technically valid.

Changes pushed to branch:t/16833.

comment:4 Changed 5 months ago by m.lewandowski

  • Status changed from review to review_failed

Please add also other browsers to https://github.com/cksource/ckeditor-dev/tree/02e84ce223e6e0f345a4052f390797551028bcbf/tests/plugins/pastefromword/generated/_fixtures/Tickets/16818Numbered_lists - we need to ensure that it works well also for other browsers.

I'd say edge, chrome, firefox with one version of Word is enough.

comment:5 Changed 5 months ago by Tade0

Including chrome and firefox was straightforward, but there are still lingering problems in Edge, namely some list elements are not detected - this is a known problem in Edge for which there are heuristics in place, but they need to be adjusted.

A solution has been provided and is currently tested for regressions.

comment:6 Changed 5 months ago by Tade0

  • Status changed from review_failed to review

Updated tests.

Changes pushed to branch:t/16833.

comment:7 follow-ups: Changed 5 months ago by m.lewandowski

  • Status changed from review to review_failed

Regarding functionallity, the branch actually does a good job.

There are just a couple things to be fixed thoguh:

  • Edge fixture was generated with word2013 - please update it as it will simplify test file a bit.
  • We need filters for all headers, in HTML we have it all the way up to h6
  • correctLevelShift, isShifted are missing API docs.
  • Calling multiple Array.concat have really poor readability, it would be better just to do:
    	var preceeding = isShiftedList( list ) ? [ list ] : list.children;
    	return preceeding.concat( acc );
    
  • I was also concerned about empty paragraphs being propagated by PFW in IE11, but these get filtered out later, so it could be extracted into a follow-up ticket.

comment:8 in reply to: ↑ 7 ; follow-up: Changed 5 months ago by Tade0

  • Status changed from review_failed to review

Replying to m.lewandowski:

The rule I'm going by is: Don't introduce new names if you don't have to.. For me personally it reduces cognitive load.

Corrected everything according to the list. Changes pushed to branch:t/16833.

comment:9 in reply to: ↑ 8 Changed 5 months ago by m.lewandowski

  • Resolution set to fixed
  • Status changed from review to closed

Replying to Tade0:

Replying to m.lewandowski:

The rule I'm going by is: Don't introduce new names if you don't have to.. For me personally it reduces cognitive load.

Corrected everything according to the list. Changes pushed to branch:t/16833.

It's a good role, but it's usage strongly depends on given case. Here it contained simply unnecessary hacks to perform an operation that .could be conveyed more easily. As long as you're not working with more than 7 names at once.

Fix LGTM, good job.

Fixed with git:2d07c870ee91d1cecf42e1cac0a6bb8f875f7f84, merged to major.

comment:10 Changed 5 months ago by m.lewandowski

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:11 in reply to: ↑ 7 Changed 5 months ago by m.lewandowski

Replying to m.lewandowski:

  • I was also concerned about empty paragraphs being propagated by PFW in IE11, but these get filtered out later, so it could be extracted into a follow-up ticket.

Actually mentioned paragraphs are visible in IE11 on Windows 10. It's a critical issue and needs to be fixed.

comment:12 Changed 5 months ago by j.swiderski

  • Status changed from reopened to confirmed

comment:13 Changed 5 months ago by m.lewandowski

  • Owner changed from Tade0 to m.lewandowski
  • Status changed from confirmed to assigned

comment:14 Changed 5 months ago by m.lewandowski

There were three ways to fix this markup:

RegExp

Regexp first: there's not much to elaborate as RegExps are not simply a good tool for parsing, it's error prone.

Browser DOM parser

The idea is simply to preformat data before it gets used by the CKEditor, so that:

  1. Internet Explorer pastes the data into our dataValue variable.
  2. dataValue gets loaded into a Document Fragment in the memory, where we traverse the tree and do the necesarry modifications.
  3. After our modifications are done, we get the output HTML, and feed it into the CKEditor.

Problem with this approach is that browsers do fix malformed HTML on it's own. You could check it on this demo: http://codepen.io/mlewand/pen/pRZoYG (see the console logs there).

See what happened to <strong><em> paris.

And that's the very reason why CKEditor 4 comes with it's own HTML parser.

CKEditor4 parser

Ultimately the only option to fix this issue is to make a fix in CKEditor HTML parser.

In this case our parser does a good job of moving inline elements like strong, but it does bad job with paragraphs because it moves it before the list (and this is exactly what you're seeing here).

This decission has been made in #3195 - it was meant to have the same handling for wrong elements directly inside of a table and lists.

What we could do here is to bring a fix for lists only, so that:

<ul>
	<li>1</li>
	<p>2</p>
	<li>3</li>
</ul>

Gets transformed into:

<ul>
	<li>1</li>
	<li><p>2</p></li>
	<li>3</li>
</ul>

Alternatively we could restrict the change only to remove empty paragraphs like that. So that:

<ul>
	<li>1</li>
	<p></p>
	<li>3</li>
</ul>

Gets transformed into:

<ul>
	<li>1</li>
	<li>3</li>
</ul>

But no paragraph is moved around.

Last edited 5 months ago by m.lewandowski (previous) (diff)

comment:15 Changed 5 months ago by m.lewandowski

So with that above, changing our parser is the only way to fix this issue. This brings very high risk and needs to be carried out extremely carefully.

Another thing here is that it's IE11 that produces totally malformed HTML - the only problem is that as a result this problem is very visible to the end user.

comment:16 Changed 3 months ago by m.lewandowski

  • Milestone CKEditor 4.7.0 deleted
  • Priority changed from Normal to Nice to have (we want to work on it)
Note: See TracTickets for help on using tickets.
© 2003 – 2017 CKSource – Frederico Knabben. All rights reserved. | Terms of use | Privacy policy