Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#10018 closed Bug (wontfix)

IE8 not cleaning up invalid HTML

Reported by: Jon Sykes
Owned by:
Priority: Normal
Milestone:
Component: Core : Parser
Version:
Keywords:
Cc:

Description

We had HTML corruption. During testing we identified that in IE8 some broken HTML isn't being cleaned, which allows the corrupt HTML structure to be saved.

If you open the demo page, paste the code snippet below (for example, copied from a web page that contains it), and then switch to source view, you'll see that the empty tags don't get cleaned up.

They do get cleaned in Chrome and Firefox with no problem.

<p><//><//></p>

Change History (9)

comment:1 Changed 11 years ago by Jakub Ś

Keywords: ie8 invalid html xhtml parse removed
Status: new → pending
Version: 3.6.4

I have tried pasting your code directly in source mode in CKEditor, and also displaying it first on an HTML page, then copying and pasting it into the WYSIWYG area.

<div style="border:1px solid black;width:200px;height:100px;">
<p><//><//></p>
</div>

In all cases either such 'tags' were removed or changed to HTML comments.

Could you please provide exact steps to reproduce this problem and/or a working sample page that helps to reproduce it?

I have tested editor 3.6.4, 3.6.6 and 4.0.1

comment:2 Changed 11 years ago by Jon Sykes

Sorry, I simplified it too much. I took one of the corrupt sample content blocks we have found and tried to reduce it down to the minimum.

<ul dir="ltr">
	<li align="justify">
		</>abc
		<li align="justify">
			</></></>123</></><//><//><//><//><//></li>
		<//></li>
</ul>
<p>
	</></></></></></>
	<p>
		</></></></></></>
		<p>
			</></></></></></>
			<p>
				</></></></></></>
				<p>
					</></></></></></>
					<p>
						</></></></></></>
						<p>
							</></></></></>
							<p>
								</></></></></></>
								<p>
									</></></></></>
									<p>
										</></></>
										<p>
											</></></></>
											<ul>
											</ul>
											<p>
												&nbsp;</p>
											<p>
												&nbsp;</p>
											<p>
												&nbsp;</p>
											<p>
												&nbsp;</p>
											<p>
												&nbsp;</p>
											<p>
												&nbsp;</p>
											<p>
												&nbsp;</p>
											<p>
												&nbsp;</p>
											<p>
												&nbsp;</p>
											<p>
												&nbsp;</p>
											<//><//><//><//></p>
										<//><//><//></p>
									<//><//><//><//><//></p>
								<//><//><//><//><//><//></p>
							<//><//><//><//><//></p>
						<//><//><//><//><//><//></p>
					<//><//><//><//><//><//></p>
				<//><//><//><//><//><//></p>
			<//><//><//><//><//><//></p>
		<//><//><//><//><//><//></p>
	<//><//><//><//><//><//></p>

If you drop this into CKEditor 3.6.4 in IE8 and toggle back and forth between source and edit view, you'll see the HTML get more and more crazy each time, without doing any other actions. I'll try to boil this down into an even shorter example.
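A reproduction like this can be made measurable by checking whether the markup ever settles. The sketch below is only an illustration: `checkRoundTripStability` is a hypothetical helper, and `roundTrip` is a caller-supplied function standing in for one source → edit view → source toggle. One way to approximate it might be chaining `editor.dataProcessor.toHtml()` and `editor.dataProcessor.toDataFormat()`, though that is an assumption and may not include the browser DOM step where IE8 misbehaves.

```javascript
// Hypothetical helper: apply `roundTrip` repeatedly and report whether the
// markup reaches a fixed point within `maxRounds` iterations.
// `rounds` is the number of passes that actually changed the markup.
function checkRoundTripStability(roundTrip, html, maxRounds) {
  var current = html;
  for (var round = 1; round <= maxRounds; round++) {
    var next = roundTrip(current);
    if (next === current) {
      // Nothing changed in this pass, so the markup is stable.
      return { stable: true, rounds: round - 1, html: current };
    }
    current = next;
  }
  // Still changing after maxRounds passes: the behaviour described above.
  return { stable: false, rounds: maxRounds, html: current };
}
```

A result with `stable: false` would match the "more crazy each time" behaviour, while `stable: true` with a small `rounds` value would mean the markup is merely normalized once.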

comment:3 Changed 11 years ago by Jon Sykes

<ul dir="ltr">
	<li align="justify">
		abc</li>
	<li align="justify">
		</></></>123</></><//><//><//><//><//></li>
</ul>

This doesn't generate additional corrupt HTML, but it also retains the <> tags and doesn't strip them or convert them to comments.

comment:4 Changed 11 years ago by Jon Sykes

The root cause of these additional weird tags might be related to http://dev.ckeditor.com/ticket/6789 but even if empty tags are generated I was under the impression ckeditor validated markup.

comment:5 Changed 11 years ago by Jakub Ś

Resolution: wontfix
Status: pending → closed
  1. Your first example is not valid: it has p inside p, which is not valid HTML (<> is also invalid). You can always say that the editor should not react the way it does, but there is always a way to break the editor with really complicated and invalid HTML.
  2. Second example: as you already know, this is also invalid HTML (entities should be used here). In this case the editor does no cleaning; it pastes what the browser gives it. In WebKit, Firefox and Opera you get comments, in IE9 these are removed, and in IE7 and IE8 they are left untouched.
  3. Leaving them untouched may be intentional (my assumption only), because users sometimes want to paste their own custom inline tags (ones without children). Such tags are then kept, which allows better flexibility.
  4. In CKEditor 4.1 we will introduce the allowedContent filter for filtering tags, styles and attributes. My colleague has checked your first example on a test branch: once <> is filtered, it turns into two lists with a bunch of paragraphs below them.

To summarize: in CKE 3.6.x this is a won't fix. In CKEditor 4.1 this will be handled by a new feature.
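For orientation, a configuration using the allowedContent setting in CKEditor 4.1 or later might look roughly like the sketch below. The rule string is an assumption chosen to fit the markup in this ticket (paragraphs, lists, and the dir attribute), and 'editor1' is a hypothetical textarea id; neither comes from the test branch mentioned above.

```javascript
// Sketch of a content filter configuration for CKEditor 4.1+.
// Rules are separated by ";". Each rule lists element names, optionally
// followed by allowed attributes in square brackets.
var editorConfig = {
  // Assumed rule set: plain <p>, plus <ul> and <li> with an optional dir.
  // Anything not listed (for example align on <li>) is not allowed.
  allowedContent: 'p; ul li[dir]'
};

// 'editor1' is a hypothetical textarea id. The guard keeps this snippet
// harmless when it runs outside a page that has loaded CKEditor.
if (typeof CKEDITOR !== 'undefined') {
  CKEDITOR.replace('editor1', editorConfig);
}
```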

To solve this problem in CKE 3.x you can attach listeners for Ctrl+V (on keydown, checking the keystroke) and for the paste event, and filter the content there. You could also try attaching filters to the insertHtml event - http://docs.cksource.com/ckeditor_api/symbols/CKEDITOR.editor.html#event:insertHtml.
If the problem is not in the editor but in your application, then you can filter the data when it is submitted to your app (the getData() method or event can be used here - http://docs.cksource.com/ckeditor_api/symbols/CKEDITOR.editor.html#event:getData).
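The getData approach could be sketched as follows. This is a minimal illustration, not a tested fix: `stripEmptyTags` and `attachEmptyTagFilter` are hypothetical helper names, the code assumes the getData event exposes the serialized output as `evt.data.dataValue`, and it assumes that `<>`, `</>` and `<//>` are never legitimate content.

```javascript
// Matches the stray tokens seen in this ticket: "<>", "</>" and "<//>".
// Assumption: "<", any number of "/", then ">" is always garbage.
var EMPTY_TAG_PATTERN = /<\/*>/g;

// Hypothetical helper: remove the empty pseudo-tags from an HTML string.
function stripEmptyTags(html) {
  return html.replace(EMPTY_TAG_PATTERN, '');
}

// Hypothetical helper: clean the data each time the editor serializes it.
// Assumes the "getData" event carries the output in evt.data.dataValue.
function attachEmptyTagFilter(editor) {
  editor.on('getData', function (evt) {
    evt.data.dataValue = stripEmptyTags(evt.data.dataValue);
  });
}
```

With an editor instance available, this would be hooked up with something like `attachEmptyTagFilter(CKEDITOR.instances.editor1)`, where `editor1` is a hypothetical instance name. Filtering at getData only cleans what is saved, so the tokens may still be visible in source view.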

comment:6 Changed 11 years ago by Jon Sykes

I understand it's not valid HTML; that's why it's such a concern. In our situation it's not our users entering malformed HTML (they only have the WYSIWYG view). Our suspicion is that the <> and </> are being caused by another CKEditor bug (like 6789). What I was hoping to get from this ticket was that, if malformed HTML is generated by CKEditor, it wouldn't then be saved.

It does sound like we might have to write our own parser on save to ensure that HTML is cleaned up or rejected. I just thought one of the features of CKEditor was that it validated against a DTD and always generated valid HTML.
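A reject-on-save check does not need a full parser to catch the two problems seen in this ticket. The sketch below is a hypothetical `findMarkupProblems` helper built on regular expressions; it is not DTD validation, and it only looks for empty pseudo-tags and for a `<p>` opened inside another `<p>`.

```javascript
// Hypothetical validator: returns a list of human-readable problems.
// An empty list means neither of the two checks found anything.
function findMarkupProblems(html) {
  var problems = [];

  // Check 1: empty pseudo-tags such as "<>", "</>" or "<//>".
  var emptyTags = html.match(/<\/*>/g) || [];
  if (emptyTags.length > 0) {
    problems.push(emptyTags.length + ' empty tag token(s)');
  }

  // Check 2: a <p> opened while another <p> is still open.
  // The lookahead keeps tags like <pre> or <param> from matching.
  var openParagraphs = 0;
  var tagPattern = /<(\/?)p(?=[\s>])[^>]*>/gi;
  var match;
  while ((match = tagPattern.exec(html)) !== null) {
    if (match[1] === '/') {
      openParagraphs = Math.max(0, openParagraphs - 1);
    } else {
      if (openParagraphs > 0) {
        problems.push('nested <p> at offset ' + match.index);
      }
      openParagraphs += 1;
    }
  }
  return problems;
}
```

An application could refuse to save whenever the returned list is non-empty, or log the list so the corrupt content can be traced back to the action that produced it.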

comment:7 Changed 11 years ago by Wiktor Walc

> In CKEditor 4.1 we will introduce allowedContent filter for filtering tags, styles and attributes. My colleague has checked your first example on test branch and it will change in two lists and bunch of paragraphs below them if <> will be filtered.

> It does sound like we might have to write our own parser on save to ensure that HTML is cleaned up or rejected, I just thought one of the features of Ck was that it validated against a DTD and always generated valid HTML.

No need to write anything. Wait for 4.1.

comment:8 Changed 11 years ago by Jon Sykes

OK, thanks, really appreciate your input on this matter.

comment:9 Changed 11 years ago by Jakub Ś

@jonsykes your initial report was talking about CKEditor leaving invalid code, not creating invalid code. If you are convinced that CKEditor is creating such code, could you please provide a sample file that is causing this error?
Please also note that #6789 creates the entities &lt;&gt;, which in WYSIWYG mode show up as <>, while you claim that CKEditor creates <> in HTML (Source mode). This doesn't look like the same issue.
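The difference between the two cases can be shown with a small escaping function. This is a generic sketch with a hypothetical `escapeHtml` helper, not code from CKEditor: text that has been escaped ends up as entities in the source, whereas the markup reported in this ticket contains the literal characters.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML
// text content. The ampersand is replaced first so that the entities
// produced by the later replacements are not escaped a second time.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Escaped text: stored as entities, displayed as "<>" in WYSIWYG mode.
var escapedSource = escapeHtml('<>');

// Literal text: what this ticket reports seeing in Source mode.
var literalSource = '<>';
```

Searching saved content for the literal form versus the entity form would show which of the two situations actually occurs.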
