Opened 8 years ago

Last modified 7 years ago

#9456 closed Bug

Properly paste bullet list style from MS-Word — at Version 12

Reported by: JPG
Owned by:
Priority: Normal
Milestone: CKEditor 4.0.1
Component: Plugin : Paste from Word
Version: 3.0
Keywords: Chrome
Cc: Jeff Fournier

Description (last modified by Jakub Ś)

When you use the "Paste from Word" function to paste text containing a bullet list, there are a few problems in all the browsers I've tried (Chrome v.22, FF v.16 and IE v.9).

How to reproduce :

  • Open the attached document (text-with-bullet-list-example.doc) ;
  • Copy the text ;
  • Click on the "Paste from Word" button or use <Ctrl>+<V> ;

What is the problem :

  • Each element of the list is converted into a paragraph (<p>) instead of regular HTML unordered list tags (<ul> and <li>) ;
  • Each element contains a character and a few spaces to visually represent the list. Depending on the level, you can get :
    1. Root level : "· " ;
    2. First indentation : "o " ;
    3. Second indentation : "§ ".

What would be expected :

A proper HTML unordered list with <ul> and <li> tags, containing only the text of each element, without any extra characters such as "o" or "§" and without the extra spaces.
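
As a rough illustration of the expected result, the sketch below turns paragraphs that start with the reported marker characters into nested <ul>/<li> markup. It is only an illustration working on the visible text, not how the Paste from Word filter is implemented; the names `MARKER_LEVEL` and `paragraphs_to_list` are hypothetical, and it assumes the three markers listed above map to nesting depths 0, 1 and 2.

```python
import re
from html import escape

# Marker characters reported in this ticket, mapped to nesting depth (0 = root).
# Assumption: these are the only markers; other Word list styles use others.
MARKER_LEVEL = {"·": 0, "o": 1, "§": 2}


def paragraphs_to_list(paragraphs):
    """Turn ['· Fruit', 'o Apple', ...] into nested <ul>/<li> markup."""
    out = []
    depth = -1  # -1 means no list has been opened yet
    for text in paragraphs:
        match = re.match(r"\s*([·o§])\s+(.*)", text)
        if not match:
            continue  # not a list paragraph; a real filter would keep it as <p>
        # Never jump more than one level deeper than the current item.
        level = min(MARKER_LEVEL[match.group(1)], depth + 1)
        if level > depth:
            out.append("<ul>")  # open a nested list inside the current item
        else:
            # Close the previous item and any deeper lists that have ended.
            out.append("</li>" + "</ul></li>" * (depth - level))
        out.append("<li>" + escape(match.group(2).strip()))
        depth = level
    if depth >= 0:
        out.append("</li>" + "</ul></li>" * depth + "</ul>")
    return "".join(out)
```

A marker-based approach like this is fragile (a paragraph that really begins with "o " would be misread), which is one reason to rely on the list information in the pasted markup instead.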

The issue #6662 is similar to this one but for numeric list style only.


This may be just a different test case (TC) for ticket #8734. Please see comment:10.

Please see comment:5, comment:2 and comment:1 for a better view of this problem.

Change History (18)

Changed 8 years ago by JPG

Attachment: html-result.png added

HTML generated after the paste operation

Changed 8 years ago by JPG

Attachment: text-with-bullet-list-example.doc added

Text to copy for testing

comment:1 Changed 8 years ago by Jakub Ś

Component: Core : Pasting → Core : Lists
Keywords: Chrome added; Paste Word List removed
Version: 3.6.5 → 3.0

@jipolin what you have described occurs only in Chrome and looks like a continuation of #8734.

I have noticed that your list doesn't use the normal list styles but something called "Paragraphe de liste". Could you tell me how this list was created?

comment:2 Changed 8 years ago by Jakub Ś

Status: new → confirmed

Lists don't get pasted in Chrome from Word 2010 (#8734), nor from Word 2003 when some custom (non-default) formatting is applied to the list.
As discussed with @wwalc, lists should be recognized in such cases as well and pasted as HTML lists, not as paragraph tags.

The workaround is to use normal lists in Word 2003 and then paste them into CKEditor.

Changed 8 years ago by JPG

Attachment: copy-paste-demo.swf added

Video of the problem

comment:3 Changed 8 years ago by JPG

You're right, it's working on IE9 and FF.

"Paragraphe de liste" is the French translation of "List paragraph", which is the default style automatically added by MS-Word when a list is created (see http://office.microsoft.com/en-us/word-help/style-basics-in-word-HA010230882.aspx#BM2b).

In the video copy-paste-demo.swf, you can see that if I copy/paste the list with its default style ("Paragraphe de liste"), it won't work, but if I change it to "Normal", the list is properly handled.

Maybe CKEditor doesn't recognize the style because it's in French ? Do we have the same problem with an English version of MS-Word 2010 ?

comment:4 Changed 8 years ago by Jakub Ś

@jipolin sorry for the late response - the English version doesn't have this problem. It doesn't have such styles attached by default.

comment:5 Changed 8 years ago by Jakub Ś

I have noticed that if you paste a normal MS Word list (without any custom style), Chrome sees it as a list. The browser sees ul and li tags.

If, for example, you attach some MS Word indented list style, then Chrome sees paragraphs with a 'MsoList...' class.
If, for example, you style text as indented text and then apply a list, Chrome will see it as paragraphs with class MsoBodyTextIndent and some HTML comments about lists.

Anyway, it looks like if Chrome sees UL then lists are pasted, but if it sees paragraphs it always pastes paragraphs (even if they have a 'MsoList...' class).
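
To make the observation above concrete, here is a minimal sketch of how pasted markup could be checked for paragraphs carrying a 'MsoList...' class. It uses only Python's standard `html.parser`; the class `MsoListFinder` and the function `find_mso_list_classes` are hypothetical names for this illustration, and the only thing taken from this ticket is the 'MsoList' class prefix.

```python
from html.parser import HTMLParser


class MsoListFinder(HTMLParser):
    """Collects class names starting with 'MsoList' found on <p> elements."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "p":
            return
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.hits.extend(
                    c for c in value.split() if c.startswith("MsoList")
                )


def find_mso_list_classes(html):
    """Return the 'MsoList...' class names used on paragraphs in html."""
    finder = MsoListFinder()
    finder.feed(html)
    return finder.hits
```

If this returns a non-empty list for pasted content that contains no <ul> or <ol>, the content is probably a Word list that arrived as paragraphs, which is the case described in this comment.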

comment:6 Changed 8 years ago by JPG

About your first comment, are you sure ? Because, as mentioned in the US English help page, you should have such a style in the English version of Word too.

About your last comment, when Chrome sees paragraphs with a 'MsoList...' class, would it be possible to consider those as actual lists in CKEditor ? Do you know how IE and FF react in those cases; does the pasted text contain paragraphs too ?

comment:7 Changed 8 years ago by Jakub Ś

Description: modified (diff)

Changed 8 years ago by Henrik

Attachment: Stuff to get.docx added

Simple two-level list

Changed 8 years ago by Henrik

Attachment: pasted.txt added

Export of the complete "Stuff to get" file pasted into a contentEditable div.

comment:8 Changed 8 years ago by Henrik

I have the same issue with both numbered and bulleted lists pasted from Word 2010 (MS Office Pro Plus, Word 14.0.6123.5000, 32bit, English), both when pasting in Chrome 22.0.1229.79 and IE 8.0.6001.18702. No difference between the official CKEditor 3 (Full) and 4 Beta (Inline) demos. This uses the default number and bullet styles. Doesn't matter which list type is used where though.

I've attached the file I created and copied (Ctrl+A, Ctrl+C) as "Stuff to get.docx" and the "raw" markup pasted into a simple contentEditable div in Chrome as "pasted.txt".

This is on a Win XP Pro machine in case that matters.

comment:9 Changed 8 years ago by Jakub Ś

If I understand correctly "Stuff to get.docx" is what you expect. Where is the file that is causing the problem? Where is the file which is creating : "Root level : "• " ; First indentation : "o " ; Second indentation : "§ " ".
That is what was reported in original ticket.

If you are just getting paragraphs please refer to #8734.

I'm waiting for your comments.

comment:10 Changed 8 years ago by Henrik

#8734 appears to be the exact same issue.

The "Stuff to get.docx" file is not what I expect, it's the actual file causing the problem. I should have renamed it to represent that instead of the "todo list" it was used as.

The inserted leading character mentioned in the original post is the "bullet item" itself. I get the default · character inserted inside span tags before each bulleted-list-paragraph, and a number before each numbered-list-paragraph, as I did not change the list style.

The "bullet item" span, and the "indent spaces" also mentioned in the original post, are wrapped in <!--[if !supportLists]--> <!--[endif]--> comments. To me it looks like these represent "artificial" bulletpoints/listnumbers for renderers which do not have real lists. CKEditor appears to remove/ignore the pasted comments but leave the span intact, which makes the "bullet items" visible, and possibly also interfering with parsing the list as an actual list. Everything inside those comments should be removed, not just the comments themselves.
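
A minimal sketch of the removal suggested above, assuming the markers arrive exactly as the `<!--[if !supportLists]-->` and `<!--[endif]-->` comments described in this comment. The function name `strip_support_lists` is hypothetical, and this regex approach is only an illustration, not CKEditor's actual filter.

```python
import re

# Matches an entire "supportLists" block: the opening comment, the fake
# bullet/number span inside it, and the closing comment (non-greedy).
SUPPORT_LISTS_BLOCK = re.compile(
    r"<!--\[if !supportLists\]-->.*?<!--\[endif\]-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_support_lists(html):
    """Remove the comments AND everything between them, not just the comments."""
    return SUPPORT_LISTS_BLOCK.sub("", html)
```

For example, a paragraph whose content is the comment pair wrapping a bullet span followed by the text "Milk" would be reduced to a paragraph containing only "Milk", leaving the class attribute available for list detection.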

If you change the list style, you can get any character or numerical representation to show up as the "bullet item" in each paragraph. The screencast attached in #8734 also shows that the "bullet item" has been included inside each of the paragraphs, but as a number since he used numbered lists.

I did not have a tool to inspect the "raw" clipboard data at the time of testing this, so I used a contentEditable div to at least get the HTML version of what was in there, after copying from the .docx file. I created "pasted.txt" by simply putting basic HTML markup including the contentEditable div in a file, opened that in my browser, pasted the contents from "Stuff to get.docx" inside the div, used the browser's developer tools to copy the current document's markup, and finally put that in the text file.

Trivia note: Looks like either MS Word creates the weird markup on the fly during copying, or Libre Office filters it out when opening the file. If I open the same .docx file from Libre Office, copy everything and paste it into the contentEditable div I used to create "pasted.txt", I get nearly perfect markup to begin with. This is obviously a lot easier for CKEditor to work with, so the lists turn out great.

I attached the markup generated when copying from Libre Office and pasting into the plain contentEditable div as "pasted_libre.txt". That is much closer to what I expect when pasting into CKEditor from MS Word 2010.

Changed 8 years ago by Henrik

Attachment: pasted_libre.txt added

"Stuff to get.docx" opened in Libre Office and copied to a contentEditable div.

comment:11 Changed 8 years ago by Jakub Ś

@TwoD - Thank you for detailed description.

The result from Libre Office is ul and li tags, which Chrome understands as a list. Anything else, i.e. a list that is represented by paragraphs in the pasted HTML, is not recognized as a list.

Last edited 8 years ago by Jakub Ś

comment:12 Changed 8 years ago by Jakub Ś

Description: modified (diff)