Opened 7 years ago

Closed 6 years ago

Last modified 5 years ago

#9456 closed Bug (fixed)

Properly paste bullet list style from MS-Word

Reported by: JPG
Owned by: Piotrek Koszuliński
Priority: Normal
Milestone: CKEditor 4.0.1
Component: Plugin : Paste from Word
Version: 3.0
Keywords: Chrome
Cc: Jeff Fournier

Description (last modified by Jakub Ś)

When you use the "Paste from Word" function to paste text containing a bulleted list, there are a few problems in all the browsers I've tried (Chrome v.22, FF v.16 and IE v.9).

How to reproduce:

  • Open the attached document (text-with-bullet-list-example.doc);
  • Copy the text;
  • Click the "Paste from Word" button or press Ctrl+V.

What is the problem:

  • Each element of the list is converted into a paragraph (<p>) instead of regular HTML unordered list tags (<ul> and <li>);
  • Each element contains a character and a few spaces that visually represent the list. Depending on the level, you get:
    1. Root level: "· ";
    2. First indentation: "o ";
    3. Second indentation: "§ ".

What would be expected:

A proper HTML unordered list with <ul> and <li> tags, where each item contains only the element's text, without extra characters like "o" or "§" and without the padding spaces.
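The following is a minimal sketch of the expected transformation. It assumes each pasted paragraph's nesting level is already known (determining that level is outside the scope of this sketch). The `PastedItem` shape and the `toNestedList` and `escapeHtml` helpers are hypothetical and are not part of CKEditor's API. They only show how flat paragraphs with bullet prefixes can be turned into nested <ul>/<li> markup, with the "·", "o" and "§" prefixes removed.

```typescript
// Hypothetical input: one entry per pasted paragraph, with a 1-based nesting level.
interface PastedItem {
  level: number;
  text: string;
}

// Visible bullet glyphs seen per level ("·", "o", "§"; "•" as a variant),
// followed by padding whitespace (\s also covers non-breaking spaces).
const BULLET_PREFIX = /^[·•o§]\s+/;

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function toNestedList(items: PastedItem[]): string {
  let html = "";
  let depth = 0; // number of <ul> elements currently open
  for (const item of items) {
    // Never go more than one level deeper than the current list.
    const target = Math.min(Math.max(1, item.level), depth + 1);
    if (target > depth) {
      // One level deeper: keep the previous <li> open and nest a new list in it.
      html += "<ul>";
      depth++;
    } else {
      // Same level or shallower: close the previous item...
      html += "</li>";
      // ...then close each deeper list together with the <li> that holds it.
      while (depth > target) {
        html += "</ul></li>";
        depth--;
      }
    }
    html += "<li>" + escapeHtml(item.text.replace(BULLET_PREFIX, ""));
  }
  // Close whatever is still open at the end.
  while (depth > 0) {
    html += "</li></ul>";
    depth--;
  }
  return html;
}

console.log(toNestedList([
  { level: 1, text: "· Root item" },
  { level: 2, text: "o First indentation" },
  { level: 3, text: "§ Second indentation" },
  { level: 1, text: "· Back at root" },
]));
```

The level is clamped to at most one deeper than the current depth, so the output stays well-formed even when a paragraph skips a level. Stripping the "o " prefix is a heuristic: a real item whose text starts with "o " would lose its first word.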

Issue #6662 is similar to this one, but covers numbered lists only.


This may just be a different test case (TC) for ticket #8734. Please see comment:10.

Please see comment:5, comment:2 and comment:1 for a better view of this problem.

Attachments (7)

html-result.png (141.9 KB) - added by JPG 7 years ago.
HTML generated after the paste operation
text-with-bullet-list-example.doc (23.5 KB) - added by JPG 7 years ago.
Text to copy for testing
copy-paste-demo.swf (2.5 MB) - added by JPG 7 years ago.
Video of the problem
Stuff to get.docx (14.2 KB) - added by Henrik 6 years ago.
Simple two-level list
pasted.txt (4.6 KB) - added by Henrik 6 years ago.
Export of the complete "Stuff to get" file pasted into a contentEditable div.
pasted_libre.txt (938 bytes) - added by Henrik 6 years ago.
"Stuff to get.docx" opened in Libre Office and copied to a contentEditable div.
list_paste_from_msword.docx (17.0 KB) - added by Garry Yao 6 years ago.

Change History (30)

Changed 7 years ago by JPG

Attachment: html-result.png added

HTML generated after the paste operation

Changed 7 years ago by JPG

Attachment: text-with-bullet-list-example.doc added

Text to copy for testing

comment:1 Changed 7 years ago by Jakub Ś

Component: Core : Pasting → Core : Lists
Keywords: Chrome added; Paste Word List removed
Version: 3.6.5 → 3.0

@jipolin what you have described occurs only in Chrome and looks like the continuation of #8734.

I have noticed that your list doesn't have normal list styles but has something like - "Paragraphe de liste". Could you tell me how this list was created?

comment:2 Changed 7 years ago by Jakub Ś

Status: new → confirmed

Lists are not pasted as lists in Chrome from Word 2010 (#8734), nor from Word 2003 when some custom (non-default) formatting is applied to the list.
As discussed with @wwalc, lists should be recognized in these cases as well and pasted as HTML lists, not as paragraph (<p>) tags.

A workaround is to use normal lists in Word 2003 and then paste them into CKEditor.

Changed 7 years ago by JPG

Attachment: copy-paste-demo.swf added

Video of the problem

comment:3 Changed 7 years ago by JPG

You're right, it's working on IE9 and FF.

"Paragraphe de liste" is the French translation of "List Paragraph", which is the default style MS Word automatically applies when a list is created (see http://office.microsoft.com/en-us/word-help/style-basics-in-word-HA010230882.aspx#BM2b).

In the video copy-paste-demo.swf, you can see that the list does not paste correctly when it keeps its default style ("Paragraphe de liste"). If I change the style to "Normal", the list is handled properly.

Maybe CKEditor doesn't recognize the style because it's in French? Does the same problem occur with an English version of MS Word 2010?

comment:4 Changed 7 years ago by Jakub Ś

@jipolin sorry for the late response. The English version doesn't have this problem, because it doesn't attach such styles by default.

comment:5 Changed 7 years ago by Jakub Ś

I have noticed that when you paste a normal MS Word list (without any custom style), Chrome sees it as a list: the browser receives ul/li tags.

If, for example, you apply an MS Word indented list style, Chrome receives paragraphs with an 'MsoList...' class.
If, for example, you style text as indented text and then apply a list, Chrome receives paragraphs with the MsoBodyTextIndent class plus some HTML comments about lists.

Either way, it looks like lists are pasted correctly when Chrome receives UL. When it receives paragraphs, it always pastes paragraphs, even if they have the 'MsoList' class.
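Below is a sketch of the class-based detection discussed in this comment. It is a hypothetical illustration and not CKEditor's filter. It makes three assumptions: the helper names are made up, it looks only at the opening `<p>` tag as a string, and it accepts both Word's unquoted attribute form (`class=MsoListParagraph`) and quoted values.

```typescript
// Returns the class attribute value of an opening tag, quoted or unquoted.
function classOf(openTag: string): string | null {
  const m = /\bclass\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i.exec(openTag);
  if (!m) return null;
  return m[1] ?? m[2] ?? m[3] ?? null;
}

// A paragraph is a list candidate if any of its classes starts with "MsoList"
// (e.g. MsoListParagraph, MsoListParagraphCxSpFirst).
function isWordListParagraph(openTag: string): boolean {
  const cls = classOf(openTag);
  return cls !== null && cls.split(/\s+/).some((c) => /^MsoList/.test(c));
}

console.log(isWordListParagraph("<p class=MsoListParagraphCxSpFirst>"));
console.log(isWordListParagraph("<p class=MsoBodyTextIndent>"));
```

As the comment points out, a class check alone would miss lists that arrive as MsoBodyTextIndent paragraphs. It can therefore only be one signal among several.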

comment:6 Changed 7 years ago by JPG

Regarding your first comment, are you sure? The US English help page mentions this style, so the English version of Word should have it too.

Regarding your last comment, when Chrome receives paragraphs with an 'MsoList...' class, could CKEditor treat them as actual lists? Do you know how IE and FF behave in those cases? Does the pasted text contain paragraphs there too?

comment:7 Changed 7 years ago by Jakub Ś

Description: modified

Changed 6 years ago by Henrik

Attachment: Stuff to get.docx added

Simple two-level list

Changed 6 years ago by Henrik

Attachment: pasted.txt added

Export of the complete "Stuff to get" file pasted into a contentEditable div.

comment:8 Changed 6 years ago by Henrik

I have the same issue with both numbered and bulleted lists pasted from Word 2010 (MS Office Pro Plus, Word 14.0.6123.5000, 32bit, English), whether pasting in Chrome 22.0.1229.79 or in IE 8.0.6001.18702. The official CKEditor 3 (Full) and 4 Beta (Inline) demos behave the same. The document uses the default number and bullet styles. It doesn't matter which list type is used at which level.

I've attached the file I created and copied (Ctrl+A, Ctrl+C) as "Stuff to get.docx" and the "raw" markup pasted into a simple contentEditable div in Chrome as "pasted.txt".

This is on a Win XP Pro machine in case that matters.

comment:9 Changed 6 years ago by Jakub Ś

If I understand correctly, "Stuff to get.docx" is what you expect. Where is the file that causes the problem, i.e. the one that produces: Root level: "• "; First indentation: "o "; Second indentation: "§ "?
That is what was reported in the original ticket.

If you are just getting paragraphs please refer to #8734.

I'm waiting for your comments.

comment:10 Changed 6 years ago by Henrik

#8734 appears to be the exact same issue.

The "Stuff to get.docx" file is not what I expect, it's the actual file causing the problem. I should have renamed it to represent that instead of the "todo list" it was used as.

The inserted leading character mentioned in the original post is the "bullet item" itself. I get the default · character inserted inside span tags before each bulleted-list-paragraph, and a number before each numbered-list-paragraph, as I did not change the list style.

The "bullet item" span and the "indent spaces" also mentioned in the original post are wrapped in <!--[if !supportLists]--> <!--[endif]--> comments. These appear to be "artificial" bullet points/list numbers for renderers that do not support real lists. CKEditor seems to remove or ignore the pasted comments but leaves the span intact. This makes the "bullet items" visible and may also interfere with parsing the list as an actual list. Everything inside those comments should be removed, not just the comments themselves.
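Henrik's suggestion can be sketched with a single lazy regular expression: remove everything between the conditional comments, not only the comments. This is a hypothetical string-level helper, not what CKEditor's Paste from Word filter actually does. It assumes the comments appear exactly in the `<!--[if !supportLists]-->` / `<!--[endif]-->` form quoted above and are never nested.

```typescript
// Matches one <!--[if !supportLists]--> ... <!--[endif]--> block, including its
// content. The lazy [\s\S]*? stops at the nearest <!--[endif]-->, so several
// blocks in one document are removed independently.
const SUPPORT_LISTS_BLOCK = /<!--\[if\s+!supportLists\]-->[\s\S]*?<!--\[endif\]-->/gi;

function stripSupportListsBlocks(html: string): string {
  return html.replace(SUPPORT_LISTS_BLOCK, "");
}

const pasted =
  '<p class=MsoListParagraph><!--[if !supportLists]--><span style="mso-list:Ignore">·' +
  "<span>&nbsp;&nbsp;</span></span><!--[endif]-->Milk</p>";
console.log(stripSupportListsBlocks(pasted)); // <p class=MsoListParagraph>Milk</p>
```

This also discards the bullet glyph itself. A list converter may want to read the glyph first, for example to tell bullets from numbers.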

If you change the list style, you can get any character or numerical representation to show up as the "bullet item" in each paragraph. The screencast attached in #8734 also shows that the "bullet item" has been included inside each of the paragraphs, but as a number since he used numbered lists.

I did not have a tool to inspect the "raw" clipboard data when I tested this. Instead, I used a contentEditable div to get at least the HTML version of the clipboard contents after copying from the .docx file. To create "pasted.txt", I did the following: put basic HTML markup, including the contentEditable div, in a file; opened that file in my browser; pasted the contents of "Stuff to get.docx" into the div; used the browser's developer tools to copy the current document's markup; and saved that markup to the text file.

Trivia note: it looks like either MS Word generates the weird markup on the fly during copying, or Libre Office filters it out when opening the file. If I open the same .docx file in Libre Office, copy everything and paste it into the contentEditable div I used to create "pasted.txt", I get nearly perfect markup from the start. This is obviously much easier for CKEditor to work with, so the lists turn out great.

I attached the markup generated when copying from Libre Office and pasting into the plain contentEditable div as "pasted_libre.txt". That is much closer to what I expect when pasting into CKEditor from MS Word 2010.

Changed 6 years ago by Henrik

Attachment: pasted_libre.txt added

"Stuff to get.docx" opened in Libre Office and copied to a contentEditable div.

comment:11 Changed 6 years ago by Jakub Ś

@TwoD - Thank you for the detailed description.

The output from Libre Office consists of ul/li tags, which Chrome understands as a list. Anything else, i.e. a list that the pasted HTML represents as paragraphs, is not recognized as a list.

Last edited 6 years ago by Jakub Ś

comment:12 Changed 6 years ago by Jakub Ś

Description: modified

comment:13 Changed 6 years ago by Piotrek Koszuliński

I pushed t/9456 to the tests repo.

I gathered results from FF, Chrome and IE8/9 for pasting the contents of the documents attached to this ticket (opened in Word 2007 and 2010).

http://ckeditor4.t/tt/9456/1.html

comment:14 Changed 6 years ago by Garry Yao

Thanks for the feedback, and sorry for not joining this bug earlier; we had been working hard to launch version 4.

On topic, this is a recent WebKit regression. CKEditor has handled list transformation from MS Word very well in the past. For reference, here is a list of tickets covering this feature:

#5399: Lists pasted from Word do not maintain their nesting
#6330: Roman list style are not pasted properly from Word
#6658: Paste from Word in IE - Internal error of html formatter - Tabs in empty in lists
#6662: Lists copied from Word are not pasted properly.
#7131: Copy/Paste Word List should preserve list properties
#7269: paste from word - footnote links link to document path in webkit based browsers
#7480: Bulleted lists copied from MS Word are pasted into the Editor as Numbered lists
#7872: two-level list pasted from word gets flattened or split
#7898: Problem with Label for Lock ratio button not changing in High Contrast mode

The WebKit regression recently introduced in Chrome changes how a (circle) list bullet from MS Word is represented. Before and after:

Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.57.2 (KHTML, like Gecko)

<!--[if !supportLists]-->
<span style="mso-list:Ignore">o</span>
<!--[endif]-->

Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko)

<!--[if !supportLists]-->
<span lang="EN-US" style="font-family:Wingdings;mso-fareast-font-family:Wingdings;
mso-bidi-font-family:Wingdings">o</span>
<!--[endif]-->

As seen above, the mso-list:Ignore style that we rely on to identify a list item is gone, and this breaks the feature.
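One detection strategy that would survive this change is to read the marker from the conditional-comment block, which is present in both snippets above. It would fall back to the `mso-list:Ignore` span only when the comments have already been stripped. The sketch below is a hypothetical helper that works on a single paragraph's HTML string. It is not taken from the fix committed for this ticket.

```typescript
// Content between <!--[if !supportLists]--> and <!--[endif]--> (both WebKit builds above).
const SUPPORT_LISTS = /<!--\[if\s+!supportLists\]-->([\s\S]*?)<!--\[endif\]-->/i;
// Older markup: the marker span carries style="mso-list:Ignore". The lazy match
// stops at the first </span>, which is enough to capture the leading glyph.
const IGNORE_SPAN = /<span\b[^>]*\bstyle\s*=\s*["'][^"']*mso-list\s*:\s*Ignore[^>]*>([\s\S]*?)<\/span>/i;

// Strips tags and &nbsp; padding, leaving the visible marker text.
function textOf(fragment: string): string {
  return fragment.replace(/<[^>]*>/g, "").replace(/&nbsp;/g, " ").trim();
}

// Returns the list marker (e.g. "o" or "1.") or null if the paragraph
// does not look like a Word list item.
function extractListMarker(paragraphHtml: string): string | null {
  const block = SUPPORT_LISTS.exec(paragraphHtml);
  if (block) return textOf(block[1]);
  const span = IGNORE_SPAN.exec(paragraphHtml);
  return span ? textOf(span[1]) : null;
}

console.log(extractListMarker(
  '<p><!--[if !supportLists]--><span lang="EN-US" style="font-family:Wingdings">o</span><!--[endif]-->Eggs</p>'
));
```

Because the comment block is checked first, the missing `mso-list:Ignore` no longer matters. The span check only covers markup from which the comments were already removed.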

comment:15 Changed 6 years ago by Garry Yao

Reported Chromium bug: http://code.google.com/p/chromium/issues/detail?id=162800

Meanwhile, we will work on an urgent hotfix for the editor, depending on how the Chromium side responds.

comment:16 Changed 6 years ago by JPG

Thanks for your answers; it's now crystal clear that this is a Chrome regression. I hope you will be able to add a hotfix while waiting for the Chrome team to fix it.

comment:17 Changed 6 years ago by Garry Yao

Milestone: CKEditor 4.0.1
Owner: set to Garry Yao
Status: confirmed → review

Opened 7ac9b14 as a hotfix in v4, since the Chrome team does not seem to be actively responding.

Changed 6 years ago by Garry Yao

Attachment: list_paste_from_msword.docx added

comment:18 Changed 6 years ago by Piotrek Koszuliński

Owner: changed from Garry Yao to Piotrek Koszuliński
Status: review → assigned

comment:19 Changed 6 years ago by Piotrek Koszuliński

Pushed rebased branches.

comment:20 Changed 6 years ago by Piotrek Koszuliński

Status: assigned → review

Back on review with a few more tests and fixes for WebKit.

comment:21 Changed 6 years ago by Frederico Caldeira Knabben

Status: review → review_passed

comment:22 Changed 6 years ago by Piotrek Koszuliński

Resolution: fixed
Status: review_passed → closed

Fixed with git:e5090f2.

comment:23 Changed 5 years ago by Frederico Caldeira Knabben

Component: Core : Lists → Plugin : Paste from Word
© 2003 – 2019 CKSource – Frederico Knabben. All rights reserved.