Opened 10 years ago

Last modified 7 years ago

#11486 review New Feature

Create a new test system for Paste from Word

Reported by: Frederico Caldeira Knabben Owned by: Frederico Caldeira Knabben
Priority: Must have (possibly next milestone) Milestone:
Component: Plugin : Paste from Word Version:
Keywords: Cc:


To better support #9991, a new test system should be available for Paste from Word.

Change History (12)

comment:1 Changed 10 years ago by Frederico Caldeira Knabben

Component: General → Core : Pasting
Milestone: CKEditor 4.4
Owner: set to Frederico Caldeira Knabben
Status: new → review

Pushed t/11486@tests.

The commit with description "Sample tests for the PFW system." is there just for review purposes.

If the idea is accepted, more work is needed. We need wwalc to check the PHP scripts to be sure that they're safe.

comment:2 Changed 10 years ago by Frederico Caldeira Knabben

Component: Core : Pasting → Plugin : Paste from Word

comment:3 Changed 10 years ago by Piotrek Koszuliński


comment:4 Changed 10 years ago by Frederico Caldeira Knabben

I’ve been working on these tests for quite some time already. I was mainly focused on understanding what we need to support and what tools we need to create tests. That way, we can define the final structure of the test system.

I’ve been trying to find out what the current usage stats for Word versions are. It’s very hard to find anything useful, but judging by the little data available, it seems reasonable to focus today on Word 2013 and Word 2010.

Let’s not forget Mac users though. So we should support the current Word for Mac 2011.

Still, we also need to keep in mind that there is a huge amount of legacy Word documents spread everywhere and people certainly still deal with them. So, other than supporting the .docx format (since Word 2007), we should also support the historical .doc files. I mean, we need to support these files being opened with Word 2010 and 2013, not the Word versions originally used to create them.

I’ve installed the above three Word versions and played with clipboard and CKEditor, using both .docx and .doc files. Part of this research can be found in the t/11486b branch @tests. Lemme describe what we have there (inside /dt/plugins/pastefromword):

  • tools/getclipboard.html: this is a tool that uses the CKEditor pasting API to catch the HTML received by the browser on paste. It also shows the final HTML produced by the paste processing. This tool is essential to create automated tests.
  • tests/: here we would have source files for automated tests.
  • pastefromword.html: this is the main test page. It’ll dynamically create tests based on the contents of the above “tests” folder. This is not working on t/11486b, but the idea can be checked on t/11486.

I didn’t clean up the tests folder for now, because we may want to discuss the files. But it looks like we’ll need far fewer files than what we have there right now, because of the following.

In fact, there is good news: Word 2010, Word 2013 and Word for Mac 2011 behave the same way when it comes to the clipboard. This means that the same file will produce similar (enough) HTML on pasting.

The bad news: .docx and .doc versions of the very same contents (I mean, created independently but in the very same way) will produce totally different HTML. The good thing here is that all Word versions produce the same HTML in this case as well.

The additional factor is that IE, Chrome and Firefox will produce totally different HTML.

All of the above leads to the following conclusions:

  • We need to produce .docx and .doc files for tests.
  • We can use any of the three target Word versions to produce the HTML input for test files.
  • We need to produce test files for every browser.

Considering this, I would propose the following structure for the tests folder:

  • /tests/
  • /tests/[test title | ticket number]/ (n)
  • /tests/[test title | ticket number]/expected.txt
  • /tests/[test title | ticket number]/[doc|docx]/
  • /tests/[test title | ticket number]/[doc|docx]/source.doc(x)
  • /tests/[test title | ticket number]/[doc|docx]/[browser].txt (n)
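
As a rough illustration of how a runner could turn this layout into test cases, here is a minimal Python sketch. It is not the code from the t/11486 branches; `PasteCase` and `discover_cases` are hypothetical names, and the sketch assumes every `.txt` file inside a `doc` or `docx` folder is a browser fixture named after the browser.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class PasteCase(NamedTuple):
    """One generated test: clipboard HTML for a given test, format and browser."""
    title: str           # test title or ticket number (the folder name)
    doc_format: str      # "doc" or "docx"
    browser: str         # taken from the fixture file name, e.g. "chrome"
    input_path: Path     # [browser].txt with the HTML captured on paste
    expected_path: Path  # shared expected.txt for the whole test


FORMATS = ("doc", "docx")


def discover_cases(tests_root: Path) -> Iterator[PasteCase]:
    """Walk the proposed folder structure and yield one case per fixture."""
    for test_dir in sorted(p for p in tests_root.iterdir() if p.is_dir()):
        expected = test_dir / "expected.txt"
        for fmt in FORMATS:
            fmt_dir = test_dir / fmt
            if not fmt_dir.is_dir():
                continue
            # source.doc(x) is skipped because only *.txt files are fixtures.
            for fixture in sorted(fmt_dir.glob("*.txt")):
                yield PasteCase(test_dir.name, fmt, fixture.stem, fixture, expected)
```

Calling `list(discover_cases(Path("tests")))` would give one entry per test, format and browser combination, which a test page could then loop over.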

Then it’s up to us to decide how many tests to create, but with the available tools it is relatively easy to handle them.

It’s still unclear if a single expected.txt file will be enough, but we can change this while creating the tests if needed. We may end up needing different versions per file format or per browser.
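
If more specific expected files do turn out to be necessary, one possible approach is a lookup that prefers the most specific file and falls back to the shared one. The sketch below is only an assumption about how that could work: the override file names (`expected.docx.chrome.txt`, `expected.docx.txt`) and the function `resolve_expected` are hypothetical and are not part of the proposal above.

```python
from pathlib import Path


def resolve_expected(test_dir: Path, doc_format: str, browser: str) -> Path:
    """Return the most specific expected file that exists for this case."""
    candidates = [
        # Hypothetical per-format-and-browser override.
        test_dir / f"expected.{doc_format}.{browser}.txt",
        # Hypothetical per-format override.
        test_dir / f"expected.{doc_format}.txt",
        # The single shared file from the proposed structure.
        test_dir / "expected.txt",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no expected file found in {test_dir}")
```

Keeping the overrides next to `expected.txt` means a test that needs no special casing stays as a single file.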

comment:5 Changed 10 years ago by Frederico Caldeira Knabben

Additional information: as for their current versions, Safari and Chrome produce the very same HTML on Word paste.
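
Since Safari and Chrome currently produce the same HTML, a test runner could reuse one fixture for both instead of storing duplicates. A minimal Python sketch of that idea follows; the alias table and the function `fixture_for` are hypothetical, and the mapping would have to be revisited if the two browsers ever diverge.

```python
from pathlib import Path
from typing import Optional

# Assumption: browsers listed here paste the same HTML as the browser they map to.
FIXTURE_ALIASES = {"safari": "chrome"}


def fixture_for(fmt_dir: Path, browser: str) -> Optional[Path]:
    """Find the [browser].txt fixture, falling back to an aliased browser."""
    own = fmt_dir / f"{browser}.txt"
    if own.is_file():
        return own
    alias = FIXTURE_ALIASES.get(browser)
    if alias is not None:
        shared = fmt_dir / f"{alias}.txt"
        if shared.is_file():
            return shared
    return None
```

A dedicated `safari.txt` still wins when present, so a future difference only requires adding a file.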

comment:6 Changed 10 years ago by Piotrek Koszuliński

Milestone: CKEditor 4.4 → CKEditor 4.5

comment:7 Changed 9 years ago by Piotrek Koszuliński

Milestone: CKEditor 4.5.0 → CKEditor 4.6.0

comment:8 Changed 8 years ago by Marek Lewandowski

Just to add a few points to @fredck's findings, testing confirms that:

  • Chrome acts pretty much the same regardless of Word version (2007, 2013 tested)
  • Firefox acts pretty much the same (the major differences are in the source, in XML elements wrapped in <!--[if gte mso 9]> conditional comments, but we don't care about those)
  • IE: behaviour actually varies between browser versions
    • IE11 is an exception; its behaviour is pretty consistent, just like Chrome's
    • But a different Word version does not make much of a difference in the HTML
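
Because the content wrapped in `<!--[if gte mso 9]>` conditional comments is not relevant here, fixtures could be normalized by dropping those blocks before comparing output across Word versions. Below is a minimal Python sketch under that assumption; `strip_mso_conditionals` is a hypothetical helper, and it only targets the hidden form that opens with `<!--[if ...]>` and closes with `<![endif]-->`.

```python
import re

# Matches hidden conditional comments such as
# <!--[if gte mso 9]><xml>...</xml><![endif]-->
# The non-greedy body plus DOTALL lets a block span several lines.
_CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if [^\]]+\]>.*?<!\[endif\]-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_mso_conditionals(html: str) -> str:
    """Remove Word's hidden conditional-comment blocks from pasted HTML."""
    return _CONDITIONAL_COMMENT.sub("", html)
```

Comments written as `<!--[if !supportLists]-->` are left alone, since they end with `]-->` rather than `]>` and their content is visible to browsers.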

Differences Between IEs

What has been tested? I've used a simple .docx file with underline, bold, strikethrough, superscript, picture, header, table and page separator.


  • IE11 - the only browser (I didn't test Edge, though) that puts a base64-encoded image instead of a file:// URL. (ie11 + word2007/2011)
  • IE9 - the single case where CKEditor does not convert the picture to an img element. It puts an empty paragraph there instead. (ie9+word2007)
  • IE8 - the clipboard markup does not contain the picture size or its file name. (ie8+word2007)
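
To check a captured fixture for the first difference above, the image sources can be classified as base64 data URIs or file:// URLs. This is a minimal Python sketch using only the standard library; `classify_image_sources` is a hypothetical helper and not part of the tools mentioned in this ticket.

```python
from html.parser import HTMLParser
from typing import List


class _ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every img element it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # A missing or valueless src is recorded as an empty string.
            self.sources.append(dict(attrs).get("src") or "")


def classify_image_sources(html: str) -> List[str]:
    """Return "base64", "file" or "other" for each img, in document order."""
    collector = _ImgSrcCollector()
    collector.feed(html)
    collector.close()
    kinds = []
    for src in collector.sources:
        lowered = src.lower()
        if lowered.startswith("data:") and ";base64," in lowered:
            kinds.append("base64")
        elif lowered.startswith("file://"):
            kinds.append("file")
        else:
            kinds.append("other")
    return kinds
```

An empty result for a fixture that should contain a picture would point at a case where no img element was produced at all.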

Page Break

  • IE11 - It converts Word page breaks into <br clear="all">. Every other browser ignores them.


  • IE8 and IE10 are the only IE versions that add extra styling to set the unordered list marker, e.g. "circle".


  • IE8 is the only browser to produce an invalid chromakey attribute for the img element when a math equation is transformed to an img.
  • IE11 is the only browser to convert a math equation to an img that contains a base64-encoded image of the equation.

To sum it up

  • Chrome, Firefox and IE11 are consistent and produce similar markup regardless of Word version.
  • Earlier IE versions show some inconsistencies from one browser version to another.
Last edited 8 years ago by Marek Lewandowski

comment:9 Changed 8 years ago by Marek Lewandowski

I've ported @fredck's tool to CKE 4.5+ (dataTransfer) and placed it in the dev/pastefromword/ directory in the pastefromword branch (though I think we can make it a more generic tool and use it as a clipboard inspector).

For my needs I've also created a branch with a utility that lets you easily save clipboard input / parsed content to a file. It appends a controls panel where you specify the Word version and the browser.

The tool is in the pastefromword_php branch in my own CKE fork, as it involves PHP scripts messing with the filesystem, and we definitely don't want such things in the main repo.

comment:10 Changed 7 years ago by Marek Lewandowski

Milestone: CKEditor 4.6.0 → CKEditor 4.6.1

We started this system with 4.6.0, and though we used it to generate test suites for PFW in this release, the code itself is not yet of production quality - thus moving to 4.6.1.

comment:11 Changed 7 years ago by Marek Lewandowski

Milestone: CKEditor 4.6.1 → CKEditor 4.6.2

comment:12 Changed 7 years ago by Marek Lewandowski

Milestone: CKEditor 4.6.2
Priority: Normal → Must have (possibly next milestone)