Context Navigation

← Previous Ticket
Next Ticket →

#16833 assigned Bug

IE11 malformed list pasted from Word

Reported by:	Tade0	Owned by:	Marek Lewandowski
Priority:	Nice to have (we want to work on it)	Milestone:
Component:	Plugin : Paste from Word	Version:	4.7.0
Keywords:		Cc:

Description

Steps to reproduce

Open Full editor demo from: http://nightly.ckeditor.com/
Open the attached file in Word, copy and paste it into the editor.

Expected result

List structure is the same as in the Word file.

Actual result

Lists are malformed - they are shifted one level up.

Other details (browser, OS, CKEditor version, installed plugins)

IE11 Only.

Attachments (1)

Numbered.and.Bulleted.Lists.with.Headings.docx (13.5 KB) - added by Tade0 9 years ago.

Download all attachments as: .zip

Change History (17)

Changed 9 years ago by Tade0

Attachment:	Numbered.and.Bulleted.Lists.with.Headings.docx added

comment:1 Changed 9 years ago by Tade0

Component:	General → Plugin : Paste from Word

comment:2 Changed 9 years ago by Marek Lewandowski

Status:	new → confirmed

comment:3 Changed 9 years ago by Tade0

Status:	confirmed → review

Added a method that check whether the list levels are shifted and corrects that.

Added in Heuristics, because without the mentioned method the list is still technically valid.

Changes pushed to branch:t/16833.

comment:4 Changed 9 years ago by Marek Lewandowski

Status:	review → review_failed

Please add also other browsers to https://github.com/cksource/ckeditor-dev/tree/02e84ce223e6e0f345a4052f390797551028bcbf/tests/plugins/pastefromword/generated/_fixtures/Tickets/16818Numbered_lists - we need to ensure that it works well also for other browsers.

I'd say edge, chrome, firefox with one version of Word is enough.

comment:5 Changed 9 years ago by Tade0

Including chrome and firefox was straightforward, but there are still lingering problems in Edge, namely some list elements are not detected - this is a known problem in Edge for which there are heuristics in place, but they need to be adjusted.

A solution has been provided and is currently tested for regressions.

comment:6 Changed 9 years ago by Tade0

Status:	review_failed → review

Updated tests.

Changes pushed to branch:t/16833.

comment:7 follow-ups: 8 11 Changed 9 years ago by Marek Lewandowski

Status:	review → review_failed

Regarding functionallity, the branch actually does a good job.

There are just a couple things to be fixed thoguh:

Edge fixture was generated with word2013 - please update it as it will simplify test file a bit.
We need filters for all headers, in HTML we have it all the way up to h6
correctLevelShift, isShifted are missing API docs.

Calling multiple Array.concat have really poor readability, it would be better just to do:

	var preceeding = isShiftedList( list ) ? [ list ] : list.children;
	return preceeding.concat( acc );

I was also concerned about empty paragraphs being propagated by PFW in IE11, but these get filtered out later, so it could be extracted into a follow-up ticket.

comment:8 in reply to: 7 ; follow-up: 9 Changed 9 years ago by Tade0

Status:	review_failed → review

Replying to m.lewandowski:

Calling multiple Array.concat have really poor readability, it would be better just to do:
	var preceeding = isShiftedList( list ) ? [ list ] : list.children;
	return preceeding.concat( acc );

The rule I'm going by is: Don't introduce new names if you don't have to.. For me personally it reduces cognitive load.

Corrected everything according to the list. Changes pushed to branch:t/16833.

comment:9 in reply to: 8 Changed 9 years ago by Marek Lewandowski

Resolution:	→ fixed
Status:	review → closed

Replying to Tade0:

Replying to m.lewandowski:
Calling multiple Array.concat have really poor readability, it would be better just to do:
	var preceeding = isShiftedList( list ) ? [ list ] : list.children;
	return preceeding.concat( acc );
The rule I'm going by is: Don't introduce new names if you don't have to.. For me personally it reduces cognitive load.

Corrected everything according to the list. Changes pushed to branch:t/16833.

It's a good role, but it's usage strongly depends on given case. Here it contained simply unnecessary hacks to perform an operation that .could be conveyed more easily. As long as you're not working with more than 7 names at once.

Fix LGTM, good job.

Fixed with git:2d07c870ee91d1cecf42e1cac0a6bb8f875f7f84, merged to major.

comment:10 Changed 9 years ago by Marek Lewandowski

Resolution:	fixed
Status:	closed → reopened

comment:11 in reply to: 7 Changed 9 years ago by Marek Lewandowski

Replying to m.lewandowski:

I was also concerned about empty paragraphs being propagated by PFW in IE11, but these get filtered out later, so it could be extracted into a follow-up ticket.

Actually mentioned paragraphs are visible in IE11 on Windows 10. It's a critical issue and needs to be fixed.

comment:12 Changed 9 years ago by Jakub Ś

Status:	reopened → confirmed

comment:13 Changed 9 years ago by Marek Lewandowski

Owner:	changed from Tade0 to Marek Lewandowski
Status:	confirmed → assigned

comment:14 Changed 9 years ago by Marek Lewandowski

There were three ways to fix this markup:

use CKE4 parser (e.g. htmlParser.fromHtml method)
use browser DOM parsing.
use RegExp

RegExp

Regexp first: there's not much to elaborate as RegExps are not simply a good tool for parsing, it's error prone.

Browser DOM parser

The idea is simply to preformat data before it gets used by the CKEditor, so that:

Internet Explorer pastes the data into our dataValue variable.
dataValue gets loaded into a Document Fragment in the memory, where we traverse the tree and do the necesarry modifications.
After our modifications are done, we get the output HTML, and feed it into the CKEditor.

Problem with this approach is that browsers do fix malformed HTML on it's own. You could check it on this demo: http://codepen.io/mlewand/pen/pRZoYG (see the console logs there).

See what happened to <strong><em> paris.

And that's the very reason why CKEditor 4 comes with it's own HTML parser.

CKEditor4 parser

Ultimately the only option to fix this issue is to make a fix in CKEditor HTML parser.

In this case our parser does a good job of moving inline elements like strong, but it does bad job with paragraphs because it moves it before the list (and this is exactly what you're seeing here).

This decission has been made in #3195 - it was meant to have the same handling for wrong elements directly inside of a table and lists.

What we could do here is to bring a fix for lists only, so that:

<ul>
	<li>1</li>
	<p>2</p>
	<li>3</li>
</ul>

Gets transformed into:

<ul>
	<li>1</li>
	<li><p>2</p></li>
	<li>3</li>
</ul>

Alternatively we could restrict the change only to remove empty paragraphs like that. So that:

<ul>
	<li>1</li>
	<p></p>
	<li>3</li>
</ul>

Gets transformed into:

<ul>
	<li>1</li>
	<li>3</li>
</ul>

But no paragraph is moved around.

Last edited 9 years ago by Marek Lewandowski (previous) (diff)

comment:15 Changed 9 years ago by Marek Lewandowski

So with that above, changing our parser is the only way to fix this issue. This brings very high risk and needs to be carried out extremely carefully.

Another thing here is that it's IE11 that produces totally malformed HTML - the only problem is that as a result this problem is very visible to the end user.

comment:16 Changed 9 years ago by Marek Lewandowski

Milestone:	CKEditor 4.7.0
Priority:	Normal → Nice to have (we want to work on it)

Note: See TracTickets for help on using tickets.

Download in other formats: