Opened 17 years ago

Closed 16 years ago

#1355 closed Bug (fixed)

Converting <p> to <pre> may add additional line breaks

Reported by: Frederico Caldeira Knabben Owned by: Martin Kou
Priority: Normal Milestone: FCKeditor 2.6
Component: General Version: SVN (FCKeditor) - Retired
Keywords: Review+ Cc:

Description

Steps to Reproduce

  1. Load the following HTML:
<p>Line 1<br />
Line 2<br />
Line 3</p>
  1. Change the paragraph Format to "Formatted".

Note that each line has been separated by a double line break. It doesn't happen if we have the following <p> instead:

<p>Line 1<br />Line 2<br />Line 3</p>

Things get even worst if the initial <p> looks like this:

<p>
    Some


    sample
</p>

It is evident that the whitespace of the original <p> is being preserved on the transformation, but the expected result for the above case is:

<pre>Some sample</pre>

This is not a problem with IE, as it automatically eats whitespaces when parsing.

Attachments (6)

1355.patch (2.4 KB) - added by Martin Kou 16 years ago.
Proposed patch for fixing #1355.
1355_2.patch (9.3 KB) - added by Martin Kou 16 years ago.
1355_3.patch (9.3 KB) - added by Martin Kou 16 years ago.
Fixes a minor bug in 1533_2.patch discovered when writing test cases.
1355_4.patch (9.4 KB) - added by Martin Kou 16 years ago.
Simplified the 1355_3.patch.
1355_5.patch (8.8 KB) - added by Martin Kou 16 years ago.
Furthur simplified 1355_4.patch from Fred's suggestions.
1355_6.patch (10.9 KB) - added by Martin Kou 16 years ago.

Download all attachments as: .zip

Change History (21)

comment:1 Changed 16 years ago by Martin Kou

Owner: set to Martin Kou
Status: newassigned

Changed 16 years ago by Martin Kou

Attachment: 1355.patch added

Proposed patch for fixing #1355.

comment:2 Changed 16 years ago by Martin Kou

Keywords: Review? added

comment:3 Changed 16 years ago by Frederico Caldeira Knabben

Keywords: Review- added; Review? removed

It seems a much simpler solution could be achieved by using a couple of regexes and innerHTML.

We must also check if the other way (<pre> => <p>) is properly handled.

comment:4 Changed 16 years ago by Martin Kou

The regex method sounds simple in theory, unfortunately it seems to trigger a nasty Firefox bug which causes JavaScript error in the range.SelectBookmark() function.

After we modified the <pre> node by assigning innerHTML instead of by DOM tree operations, the bookmark nodes inside the <pre> block can still be retrieved by document.getElementId() in range.SelectBookmark(), which is correct. However, the bookmark nodes retrieved this way would then have a null parentNode property, which does not make sense. That caused the SelectBookmark() function to throw JavaScript errors.

This is very unfortunate, I'm trying to see if a workaround can be created.

comment:5 Changed 16 years ago by Martin Kou

The regex method does not work <pre> -> regular block because converting simple spaces to &nbsp; with a regex would destroy the contents inside tags. Also for cases like

<pre>
hello world!
</pre>

Converting the space between "hello" and "world" to &nbsp; is not necessary.

comment:6 in reply to:  5 Changed 16 years ago by Frederico Caldeira Knabben

Replying to martinkou:

The regex method does not work <pre> -> regular block because converting simple spaces to &nbsp; with a regex would destroy the contents inside tags. Also for cases like

<pre>
hello world!
</pre>

Converting the space between "hello" and "world" to &nbsp; is not necessary.

Martin, I still think using innerHTML and Regex is the easier solution. It would also avoid problems with successive space text nodes.

One idea would be using the <string>.split, to split the text using a regex that finds tags, like /(<[^\s].*?>)/, capturing the regex matches, so they are included in the split array. Then, processing each entry in the array, replacing spaces in those that don't look like tags (don't start with <[^\s], basically), and re-joining the array.

To replace the spaces, first replace all tabs with four spaces, then a simple replace regex with / {2,}/ using a replace function would do the &nbsp; magic.

comment:7 in reply to:  4 Changed 16 years ago by Frederico Caldeira Knabben

Replying to martinkou:

After we modified the <pre> node by assigning innerHTML instead of by DOM tree operations, the bookmark nodes inside the <pre> block can still be retrieved by document.getElementId() in range.SelectBookmark(), which is correct. However, the bookmark nodes retrieved this way would then have a null parentNode property, which does not make sense. That caused the SelectBookmark() function to throw JavaScript errors.

Be sure that bookmark.StartNode and bookmark.EndNode are null before calling range.SelectBookmark(). The node could have been cached in the bookmark for performance.

comment:8 Changed 16 years ago by Martin Kou

I've written a new patch based on the regex approach, with some simple XHTML parsing logic to differentiate between XHTML tags and non-tag XHTML code.

It turns out there are quite a number of gotchas in converting between <pre> and regular blocks. e.g. when converting from <pre> to <p>... the following cases need to be handled correctly:

<pre>some sample</pre>

vs

<pre>some   sample</pre>

The first case can be converted to <p> without any problem. But the spaces in the second case must be converted to &nbsp;.

and

<pre> some sample</pre>

vs

<pre>some sample</pre>

Simply changing the tag to <p> in the first case would cause the leading space to disappear, so even though it's only a single space it still must be converted to &nbsp;.

And then there are gotchas when people put weird stuff inside tags... e.g. trying to convert the following to <pre>

<p>some <b attr="<<               >>" attr2='>     ><     <'> text </b> here</p>

Sure it's not standard but even if the user wrote &gt; and &lt; within those quotes, IE would still convert them back to the literal form for us, which causes trouble for regex substitutions since simple dumb regex does not differentiate between text inside tags and text outside tags.

I think I should write some test cases in the FCKtest SVN tree first, to record down all those gotchas I've discovered so far, before putting my patch on review.

Changed 16 years ago by Martin Kou

Attachment: 1355_2.patch added

Changed 16 years ago by Martin Kou

Attachment: 1355_3.patch added

Fixes a minor bug in 1533_2.patch discovered when writing test cases.

comment:9 Changed 16 years ago by Martin Kou

Keywords: Review? added; Review- removed

I've uploaded a number of test cases for this ticket to the FCKtest tree in [1452].

An updated patch was attached which solves a minor bug in the previous patch. The bug was discovered while writing the test cases. In particular, the previous patch (1355_2.patch) could not properly handle the following case when converting to <P>:

<pre>
some text ehre
 </pre>

comment:10 Changed 16 years ago by Martin Kou

This ticket is related to #1229.

Changed 16 years ago by Martin Kou

Attachment: 1355_4.patch added

Simplified the 1355_3.patch.

Changed 16 years ago by Martin Kou

Attachment: 1355_5.patch added

Furthur simplified 1355_4.patch from Fred's suggestions.

comment:11 Changed 16 years ago by Frederico Caldeira Knabben

Keywords: Review- added; Review? removed

Ok... the code seemed ok to me now, but then I had some failures on some TCs.

For example. Load the following code in the editor.

<pre>
    Line 1
    Line 2
    Line 3
</pre>

Note that each line has 4 spaces in the beginning.

Now, convert the above block to <p>. You will immediately note that Line 1 is not aligned with the others. Switching to Source we have:

<p>&nbsp;&nbsp;&nbsp;&nbsp;Line 1<br>
&nbsp;&nbsp;&nbsp;Line 2<br>
&nbsp;&nbsp;&nbsp;Line 3</p>

So, none of the lines have been converted properly. Also, looking at the way browses handle followed spaces, the real space is always left "after" the &nbsp; sequence (which makes sense for line wrapping), while we are making it come before of it (my fault). So, changing line 781 to the following will fix Line 2 and Line 3:

return new Array( match.length ).join( '&nbsp;' ) + ' ' ;

The "Line 1" problem still persists though.

The expected output is:

<p>&nbsp;&nbsp;&nbsp; Line 1<br>
&nbsp;&nbsp;&nbsp; Line 2<br>
&nbsp;&nbsp;&nbsp; Line 3</p>

Changed 16 years ago by Martin Kou

Attachment: 1355_6.patch added

comment:12 Changed 16 years ago by Martin Kou

Keywords: Review? added; Review- removed

I've updated the patch (yet again) to fix the newly discovered problem. New test cases will be added to FCKtest shortly.

comment:13 Changed 16 years ago by Martin Kou

Two new test cases has been added to the FCKtest source tree for this ticket.

comment:14 Changed 16 years ago by Frederico Caldeira Knabben

Keywords: Review+ added; Review? removed

Please add the changelog entry for it when committing.

comment:15 Changed 16 years ago by Martin Kou

Resolution: fixed
Status: assignedclosed

Fixed with [1559].

Click here for more info about our SVN system.

Note: See TracTickets for help on using tickets.
© 2003 – 2022, CKSource sp. z o.o. sp.k. All rights reserved. | Terms of use | Privacy policy