Ticket #2291 (closed Bug: duplicate)
[FF3] simple copy & paste from Word document - extra code not stripped
| Reported by: | icedblind | Owned by: | |
|---|---|---|---|
| Priority: | Normal | Milestone: | FCKeditor 2.6.3 |
| Component: | General | Version: | FCKeditor 2.6.1 |
| Keywords: | Confirmed | Cc: | |
Description
You can reproduce this bug by copying and pasting (CTRL+C/V) some text from a Microsoft Word document into the FCKeditor demo pages, first in FF2 and then in FF3.
In FF3, the source of the pasted text contains extra markup that FF2 strips (meta tags, XML and style definitions):
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" /> <meta content="Word.Document" name="ProgId" /> <meta content="Microsoft Word 11" name="Generator" /> <meta content="Microsoft Word 11" name="Originator" /> <link href="file:///[...]" rel="File-List" /><!--[if gte mso 9]><xml> <w:WordDocument> <w:View>Normal</w:View> [...] </xml><![endif]--><style type="text/css"> <!-- /* Style Definitions */ [...] </style> <![endif]-->
Change History
comment:1 Changed 5 years ago by fredck
- Status changed from new to closed
- Keywords ff3 copy paste word removed
- Resolution set to invalid
- Milestone FCKeditor 2.6.2 deleted
comment:2 Changed 5 years ago by icedblind
I'm sorry fredck, but that does not resolve the problem: even with the "Paste from Word" button you get almost the same result (OK, a little more is stripped, but the output still differs from what FF2 produces).
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"> <meta content="Word.Document" name="ProgId"> <meta content="Microsoft Word 11" name="Generator"> <meta content="Microsoft Word 11" name="Originator"> <link href="file:///[...]" rel="File-List" /><!--[if gte mso 9]><xml> Normal 0 false false false [...] MicrosoftInternetExplorer4 </xml><![endif]--><!--[if gte mso 9]><![endif]--><style type="text/css"> <!-- /* Style Definitions */ [...] </style><!--[if gte mso 10]> <style> /* Style Definitions */ [...] </style> <![endif]-->
comment:3 Changed 5 years ago by icedblind
- Status changed from closed to reopened
- Resolution invalid deleted
fredck, I've taken a look at the code in fckeditor/editor/dialog/fck_paste.html, inside the CleanWord function, and I propose modifying some of it, if you all don't see any problem (I'm not a smart JS programmer).
fckeditor/editor/dialog/fck_paste.html - line 248
current line
html = html.replace(/<\!--.*?-->/g, '' ) ;
new lines
html = html.replace( /<w:[^>]*>(.*?)<\/w:[^>]*>/gi, '' ) ;
html = html.replace( /<meta[^>]*>/gi, '' ) ;
html = html.replace( /<link[^>]*>/gi, '' ) ;
html = html.replace( /<style[^>]*>([\w|\W|\n]*?)<\/style>/gim, '' ) ;
html = html.replace( /<\!--([\w|\W|\n]*?)-->/gm, '' ) ;
This at least strips the lines I reported in FF3, and also extends comment removal to multiple lines when using the "Paste from Word" button.
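A minimal sketch of why the comment pattern needed changing, using a made-up two-line sample string (the variable names are only for illustration). In JavaScript, "." does not match line terminators, so the current pattern leaves a comment that spans lines untouched. In the proposed pattern it is the character class that lets the match cross lines; the "m" flag only affects ^ and $.

```javascript
// Hypothetical sample: a Word-style conditional comment spanning two lines.
var sample = 'before<!--[if gte mso 9]>\n<xml>x</xml><![endif]-->after';

// Current pattern: "." stops at the newline, so no match is found
// and the string comes back unchanged.
var singleLine = sample.replace(/<\!--.*?-->/g, '');

// Proposed pattern: the character class also matches "\n",
// so the whole comment is removed.
var multiLine = sample.replace(/<\!--([\w|\W|\n]*?)-->/gm, '');

console.log(singleLine === sample); // true
console.log(multiLine);             // beforeafter
```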
comment:4 Changed 5 years ago by fredck
- Keywords Confirmed added
- Milestone set to FCKeditor 2.6.2
Your suggestion makes sense icedblind. Thanks for it.
The provided regexes are not definitive, though. The "catch all" for JavaScript is /[\s\S]*/. Also, <meta> and <link> could be caught by a single regex using /(?:meta|link)/.
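As a rough sketch of how those two suggestions could be combined (this is not the actual CleanWord code, and the function name stripWordHead and the sample string are made up for illustration):

```javascript
// Hypothetical helper, not part of FCKeditor: applies the suggested
// patterns to a pasted HTML string and returns the cleaned string.
function stripWordHead(html) {
    // <meta ...> and <link ...> tags caught by a single alternation.
    html = html.replace(/<(?:meta|link)\b[^>]*>/gi, '');
    // <style> blocks; [\s\S] matches any character, including newlines.
    html = html.replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '');
    // HTML comments, including multi-line conditional comments.
    html = html.replace(/<!--[\s\S]*?-->/g, '');
    return html;
}

// Made-up sample loosely modelled on the markup reported in this ticket.
var pasted =
    '<meta content="Word.Document" name="ProgId" />' +
    '<link href="file:///[...]" rel="File-List" />' +
    '<!--[if gte mso 9]><xml>\n<w:View>Normal</w:View>\n</xml><![endif]-->' +
    '<style type="text/css">\n/* Style Definitions */\n</style>' +
    '<p>Hello</p>';

console.log(stripWordHead(pasted)); // <p>Hello</p>
```

The sketch leaves out the separate <w:...> pattern, because in the sample above the Word XML sits inside a conditional comment and goes away with it. Whether that holds for every paste is an assumption.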
We are aware that the Word cleanup procedure is to be refined with time. We can't catch all cases in a set of tests, so additions like yours would just make it better.
I had previously invalidated the ticket because it was making reference to the plain pasting operation, which is out of our control. Now, enhancements to Paste from Word are definitely acceptable.

You should use the "Paste from Word" button to have better results. The normal pasting will take the clipboard data as is, and we can see that the Firefox team have worked to make it "better" for their version 3.