Opened 6 years ago

Closed 6 months ago

Last modified 6 months ago

#7646 closed Bug (fixed)

Paste with CKEDITOR.config.pasteFromWordRemoveFontStyles = false

Reported by: naresh.sivaraman Owned by: garry.yao
Priority: Normal Milestone: CKEditor 4.6.0
Component: Plugin : Paste from Word Version: 3.6.2
Keywords: Oracle Cc:

Description

While trying to paste the following text from word, with config variable pasteFromWordRemoveFontStyles = false set, there seems to be few issues:

3/28現在、発生した全200件の障害のうち対応完了、対応不要、再現待ちを除く残事項は15件です。対応状況は、表1の通りです。

First Part till number 200 is fine. Pasted correctly. After 200, till the end, the font size increases from 10 to 10.5. And there is a font family change when 15 appears.

When copying without the numbers, the paste seems to be working fine. This has something to do with the numbers interspersed with the text. Please see the same.

Attachments (8)

01.testcase_document.png (53.2 KB) - added by naresh.sivaraman 6 years ago.
Word File
02.paste_to_editor_wysiwig.png (46.9 KB) - added by naresh.sivaraman 6 years ago.
Wysiwyg_Mode
03.paste_to_editor_htmlsource.png (62.3 KB) - added by naresh.sivaraman 6 years ago.
Source_Mode
testcase.doc (23.5 KB) - added by naresh.sivaraman 6 years ago.
testcase_V1.0.doc (26.5 KB) - added by yaswanth.ravella 5 years ago.
Test Case doc w/ bigger font
7646.patch (2.0 KB) - added by fredck 5 years ago.
7646_2.patch (8.4 KB) - added by garry.yao 5 years ago.
7646_3.patch (11.3 KB) - added by garry.yao 5 years ago.

Download all attachments as: .zip

Change History (41)

comment:1 Changed 6 years ago by j.swiderski

  • Status changed from new to pending

I could not reproduce this issue. CKEditor doesn't show font-size or family on selected text pasted form word so I have tried to switch between source and WYSIWYG, pasting form CKEditor back to word, checking text with FF font plug-in but none of this resulted in what you have described above. Line of text had one font-size and two font-families: one for text and the other for numbers.

Could you tell me what OS and browser are you using. Also have you tested this bug in the latest demo version http://nightly.ckeditor.com/demo? Maybe it has already been fixed?

If this issue is still reproducible could you provide me with more detailed test case. I'm also very interested in how did you managed to get that font size:)

Changed 6 years ago by naresh.sivaraman

Word File

Changed 6 years ago by naresh.sivaraman

Wysiwyg_Mode

Changed 6 years ago by naresh.sivaraman

Source_Mode

comment:2 Changed 6 years ago by naresh.sivaraman

Sorry, Saw your reply today only.

Have attached three files to the ticket. Please see them. This is what we see in our environment.

And It is not reproducible in the ckeditor demo because pasteFromWordRemoveFontStyles = false is not set by default in the ckeditor demo.

comment:3 Changed 6 years ago by j.swiderski

I see you are using IE7. I have checked that browser and all other IE versions but none of them produced what you have showed on screen shots. I have received the following results:

<span style="font-size: 10pt">3/28</span><span style="font-family: ms mincho"><span style="font-size: 10pt">現在、発生した全</span></span><span style="font-size: 10pt">200</span><span style="font-family: ms mincho"><span style="font-size: 10pt">件の障害のうち対応完了、対応不要、再現待ちを除く残事項は</span></span><span style="font-size: 10pt">15</span><span style="font-family: ms mincho"><span style="font-size: 10pt">件です。対応状況は、表</span></span><span style="font-size: 10pt">1</span><span style="font-family: ms mincho"><span style="font-size: 10pt">の通りです。</span></span></p>

Have you tried to download a clean version of CKEditor, set pasteFromWordRemoveFontStyles=false and than run your example?

I have and I've got what I've pasted above. This makes me rather convinced that there is no problem with CKEditor but with your environment. If you can prove me wrong and reproduce this issue in clean version of CKEditor please do so and provide me with a test case so that I could reproduce it to.

Waiting for your comments.

comment:4 Changed 6 years ago by naresh.sivaraman

I have been using my FF3 and still I am able to reproduce the issue using a plain vanilla instance of CKEDITOR. Will just detail the steps that i did.

(1) Downloaded CKEDITOR 3.6 Zip.

(2) Changed config.js and added config.pasteFromWordRemoveFontStyles=false;

CKEDITOR.editorConfig = function( config ) {

config.pasteFromWordRemoveFontStyles=false; Define changes to default configuration here. For example: config.language = 'fr'; config.uiColor = '#AADC6E';

};

(3) Took up one of the samples provided and tried to paste this : 3/28現在、発生した全200件の障害のうち対応完了、対応不要、再現待ちを除く残事項は15件です。対応状況は、表1の通りです。(Originally MS Pincho, Size 10 in Word.) Will attach the word document along.

(4) Copied to the editor and copied back to word. Found the formatting had changed as I have mentioned earlier. The first part till 200 is fine, copied with MS Pincho Size 10. Second part from 200 to 15 is in MS Pincho Font, but size is different. Third part, after 15 is of a different font family altogether.

Please let me know If I am doing something wrong here.

Changed 6 years ago by naresh.sivaraman

comment:5 Changed 6 years ago by garry.yao

WFM in both FF and IE.

comment:6 Changed 6 years ago by j.swiderski

@krst and me also could not reproduce the issue. Perhaps this problem is related to OS font settings.

Have you tried to reproduce this problem on different OS perhaps different machine?

comment:7 Changed 6 years ago by naresh.sivaraman

I am using Microsoft Windows XP(SP3) and using Microsoft Word 2003. I am using English Locale.

We have also been able to reproduce the issue on Windows 2003 server (SP2) and using MS Word 2007 with Japanese locale.

comment:8 Changed 6 years ago by j.swiderski

  • Status changed from pending to confirmed
  • Version changed from 3.4.2 to 3.0

I have received some confirming results but they are varying depending on a browser

IE has always been changing the font name, but the font size from CKEditor 3.1: IE6, IE7, IE8 - Name is changed to Arial Unicode MS and from CKEditor, size to 9.

IE9 and Webkit has always been changing font name to MS Gothic and from CKEditor 3.1, font size to 9.

FF and Opera leave the font name untouched but starting from CKEditor 3.1 have been changing font size to 10.5

comment:9 Changed 5 years ago by naresh.sivaraman

Do you have any new updates on this ticket? It has been five months since the last update and it would be really helpful to get the fix for this bug asap.

comment:10 Changed 5 years ago by twakashi

It would be really helpful if you can give us an update on this. My customer has been waiting for the fix for months. Would you please at least let us know a tentative time by which this will be fixed.

comment:11 Changed 5 years ago by j.swiderski

This is most likely the side effect of having pasteFromWordRemoveFontStyles disabled and I don't think that we will be able to change there much.

Please bear in mind Word is not made for producing web contents. CKEditor makes its best to translate the information send from Word through the clipboard but this is not always possible to translate it 1 to 1.

As I was saying in comment:8, I didn't get the same results as you did - there were font-name or font-size changes but they concerned the whole text not parts of it. IMO those differences are not that critical.

You have set the version for CKEditor 3.4.2 - have you tried to use latest CKEditor 3.6.2 - we have done some improvements since then.

Changed 5 years ago by yaswanth.ravella

Test Case doc w/ bigger font

comment:12 follow-ups: Changed 5 years ago by yaswanth.ravella

It is reproducible in CKEditor 3.6.2 as well. We think this the bug in the CKEditor. I've uploaded the testdoc with bigger font so it will be easier to see the buggy behaviour in CKEditor also. To reproduce, disable the pasteFromWordRemoveFontStyles and paste the contents of the doc in the CKEditor.

We have tested this on other Online editors and they seem to be working fine. One example is http://developer.yahoo.com/yui/examples/editor/code_editor.html

After some debugging we found that removing Lang attribute in the pastefromword filter is causing this issue. We are able to get this bug fixed by removing the below rule from \_source\plugins\pastefromword\filter\default.js

***936***
- // Remove lang/language attributes.
- [ ( /^lang/ ), '' ]

Can you please review this fix and let us know if it would have any side effects.

Thank you.

Last edited 5 years ago by yaswanth.ravella (previous) (diff)

comment:13 Changed 5 years ago by yaswanth.ravella

  • Version changed from 3.0 to 3.6.2

comment:14 in reply to: ↑ 12 ; follow-up: Changed 5 years ago by fredck

Replying to yaswanth.ravella:

We have tested this on other Online editors and they seem to be working fine. One example is http://developer.yahoo.com/yui/examples/editor/code_editor.html

Sorry, but this is out of comparison. There is no cleanup happening at all in the above editor. If you're happy with that, you can have the same results with CKEditor by adding this configuration line:

config.removePlugins = 'pastefromword';

But of course, this is strongly not recommended and there is no guarantee on the behavior of CKEditor when editing over the raw HTML sent by Word.

comment:15 in reply to: ↑ 14 ; follow-up: Changed 5 years ago by yaswanth.ravella

Replying to fredck:

Replying to yaswanth.ravella:

Hi Fred,

Thanks a lot for responding.

I understand that CKEditor is doing a cleanup, which is good and what we need. But during the cleanup, the span tags have clearly gone out of sync.

For example in my case. Below is the HTML data in CKEDITOR.cleanWord function before calling data.replace( /<span>/g, ):

<p>
<span><span style="font-family:ms pmincho,serif;"><span style="font-size:36.0pt;">現在全</span></span></span>
<span><span style="font-family:ms pmincho,serif;"><span style="font-size:36.0pt;">3/28<span>現在全</span>200<span>件の事項は</span>15<span>件です。対応状況は、</span></span></span></span>
</p>

The above html is perfect and works fine. But then I see a call to replace( /<span>/g, ) on the above HTML which removed all the opening span tags but didn't bother removing the corresponding closing tags. Below is the result of the replace operation.

<p>
<span style="font-family:ms pmincho,serif;"><span style="font-size:36.0pt;">現在全</span></span></span>
<span style="font-family:ms pmincho,serif;"><span style="font-size:36.0pt;">3/28現在全</span>200件の事項は</span>15件です。対応状況は、</span></span></span></span>
</p>

The above html is invalid and has clearly lost the formatting context. The ideal fix for this would have been to remove the corresponding closing tags for the empty span tags we just deleted and then the HTML would look like the below one, which is Valid.

<p>
<span style="font-family:ms pmincho,serif;"><span style="font-size:36.0pt;">現在全</span></span>
<span style="font-family:ms pmincho,serif;"><span style="font-size:36.0pt;">3/28現在全200件の事項は15件です。対応状況は、</span></span>
</p>

Since, span tags in msWordHTML contained lang attribute and I didn't mind lang attributes lying around in the output(for the greater good of not loosing the formatting context), I thought it would be fine to leave the lang attribute during cleanup so that span tags won't be removed. May be we should fix this at the removal of empty span tags code and make sure we remove the corresponding closing tags properly :)

But for now. I am okey with lang attribute lying around in the output as long as it doesn't break the formatting and doesn't cause any regression. Can you please let me know if you think this fix would have any other side effects.

comment:17 in reply to: ↑ 12 ; follow-up: Changed 5 years ago by fredck

  • Owner set to fredck
  • Status changed from confirmed to assigned

Replying to yaswanth.ravella:

The problem is not really the "lang" attribute removal. That's something that just fixes your case, but the real bug will still remain there and we could have other cases that would bring the same issue to light.

I'll be proposing a patch to it.

Changed 5 years ago by fredck

comment:18 Changed 5 years ago by fredck

  • Component changed from General to Core : Pasting
  • Keywords Oracle added
  • Status changed from assigned to review

comment:19 in reply to: ↑ 17 Changed 5 years ago by yaswanth.ravella

Replying to fredck:

The problem is not really the "lang" attribute removal. That's something that just fixes your case, but the real bug will still remain there and we could have other cases that would bring the same issue to light.

Yes. My fix was merely a hack to keep the span tags intact.

I'll be proposing a patch to it.

Thanks for taking this up. I will test the patch and will update this ticket accordingly.

comment:20 in reply to: ↑ 15 ; follow-up: Changed 5 years ago by fredck

Replying to yaswanth.ravella:

You understood the problem well. The issue you've described is exactly what the patch is fixing.

comment:21 in reply to: ↑ 20 Changed 5 years ago by yaswanth.ravella

Replying to fredck:

You understood the problem well. The issue you've described is exactly what the patch is fixing.

I just got confirmation from the customer that the fix worked for them. Thanks for helping out in this.

comment:22 follow-up: Changed 5 years ago by garry.yao

  • Status changed from review to review_failed

If this patch is taken, it would double the performance cost on paste from word, a better approach has to be found.

comment:23 in reply to: ↑ 22 Changed 5 years ago by fredck

  • Milestone set to CKEditor 3.6.3
  • Owner fredck deleted
  • Status changed from review_failed to confirmed

Replying to garry.yao:

If this patch is taken, it would double the performance cost on paste from word, a better approach has to be found.

That's the only reliable way I've found to replace the hack we had there in the code. Maybe you can propose something better?

comment:24 in reply to: ↑ 16 ; follow-up: Changed 5 years ago by garry.yao

Replying to fredck:

http://ckeditor.t/tt/7646/1.html

The ticket tc is missing from the above, pls check.

comment:25 in reply to: ↑ 24 Changed 5 years ago by fredck

Replying to garry.yao:

The ticket tc is missing from the above, pls check.

Sorry... t/7646 pushed.

Changed 5 years ago by garry.yao

comment:26 Changed 5 years ago by garry.yao

  • Owner set to garry.yao
  • Status changed from confirmed to review

New patch is based on the previous one, while it makes it possible to run filter multiple times without reparsing, so it should be minimum on performance.

comment:27 Changed 5 years ago by fredck

  • Status changed from review to review_failed

I've checked the patch and the basic idea of it is interesting. Just the way it is proposed is not looking clean, in the API point of view.

The fact is that we're introducing new API stuff here. So let's do it in the right way.

The first thing to note is that the filters defined on beforeProcessData will work on element nodes only. Filters on text or comments will have no effect.

Then, we need to have coherence with the method purposes. The patch says that writeHtml can be used also NOT to write HTML, which sounds strange. In fact, we're hacking it to make it became a two purposes thing.

So, if we have two purposes, let's have two separated methods at this point: writeHtml() (the current signature) and runFilter(), which does the thing we need for this ticket. The first method should call the second.

Of course, this will involve a review on the way writeHtml works. But if the latest patch idea is really the one to follow, than this is the price we pay to go with it.

We have still another weak point in the patch. I see that it is implementing two new events: beforeProcessData and beforeProcessHtml.

These new events are ok, but again they were made to satisfy this ticket needs only, being poorly implement. They look quite generic instead and so enhancements need to be done to propose it in the right way.

As the name says, the events must be called BEFORE PROCESSING the data/html. This means that the event must be the very first thing to happen on those methods, BEFORE any PROCESSING could have taken place.

Additionally, we're just giving the possibility to set filters with those events, but to bring more sense to these generic need events, we must also pass the data/html which is about to be processed, so the event listeners can manipulate the data/html before processing.

Finally, to make it complete, the beforeProcessData/Html need its counterpart afterProcessData/Html event, again making it possible to change the data/html AFTER processing.

Changed 5 years ago by garry.yao

comment:28 Changed 5 years ago by garry.yao

  • Status changed from review_failed to review

The new patch addresses the above review comments, and make it a step further, to avoid completely the HTML parsing happens inside of word paste file.

comment:29 Changed 5 years ago by fredck

  • Milestone CKEditor 3.6.3 deleted

This new patch is making the API too counter-intuitive. It's hard to understand what CKEDITOR.cleanWord does at this point. It started to do some magic that no one understands where it gonna be execute.

I'm not saying that the solution is not good. But due to its drastic changes, I would avoid having it on the upcoming v3 release.

comment:30 Changed 3 years ago by fredck

  • Component changed from Core : Pasting to Plugin : Paste from Word

comment:31 Changed 6 months ago by Tade0

  • Resolution set to fixed
  • Status changed from review to closed

Fixed with new Paste From Word plugin in 4.6.0.

comment:32 Changed 6 months ago by Anna

  • Milestone set to CKEditor 4.6.0

comment:33 Changed 6 months ago by Anna

Also, this was probably a duplicate of #8341.

Note: See TracTickets for help on using tickets.
© 2003 – 2016 CKSource – Frederico Knabben. All rights reserved. | Terms of use | Privacy policy