Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#8774 closed Bug (fixed)

Output XML predefined named entities only

Reported by: garretwilson
Owned by: garry.yao
Priority: Normal
Milestone: CKEditor 3.6.3
Component: General
Version:
Keywords:
Cc:

Description

I want to output pure XHTML. I don't want any entities except the five entities supported by XML. If any other entities are added, the XHTML cannot be parsed by an XML parser. I don't know how to explain it any simpler than that.

CKEditor is entering &nbsp;, which is not well-formed XML. This 100% breaks XHTML/XML support.

For example, if I type "blah blah " (without the quotes, yet with the trailing space) and then copy that string and paste it, every time I paste it I get another &nbsp;. Yet the CKEditor entity configuration is broken (see #8755), and there is no way to turn off &nbsp;.

This should be given very high priority, as it makes the output completely unparseable and breaks XHTML/XML compliance.
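
One rough way to check editor output for this problem is to scan it for named entity references other than the five that XML predefines. The sketch below is illustrative only: `findNonXmlEntities` is a hypothetical helper, not part of CKEditor, and it assumes simple markup (it does not skip CDATA sections or comments).

```javascript
// The five named entities every XML parser recognises without a DTD.
const XML_PREDEFINED = new Set(['lt', 'gt', 'amp', 'apos', 'quot']);

// Hypothetical helper: returns the named entity references in `markup`
// that are not among the XML predefined ones. Numeric references such as
// &#160; or &#xA0; are not matched, because the name must start with a letter.
function findNonXmlEntities(markup) {
  const found = [];
  const pattern = /&([A-Za-z][A-Za-z0-9]*);/g;
  let match;
  while ((match = pattern.exec(markup)) !== null) {
    if (!XML_PREDEFINED.has(match[1])) {
      found.push(match[0]);
    }
  }
  return found;
}

console.log(findNonXmlEntities('<p>blah blah&nbsp;</p>'));       // [ '&nbsp;' ]
console.log(findNonXmlEntities('<p>a &lt; b &amp;&#160;c</p>')); // []
```

An empty result suggests the fragment contains no named references that a DTD-less XML parser would reject.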

Attachments (2)

8774.patch (481 bytes) - added by fredck 4 years ago.
8774_2.patch (1.4 KB) - added by garry.yao 4 years ago.

Change History (17)

comment:1 Changed 4 years ago by garretwilson

In fact, I find even if I copy "blah" and then paste "blah" and hit space and paste "blah" again, I get &nbsp; all over the place! This is seriously broken.

comment:2 Changed 4 years ago by garretwilson

This was tested on Chrome 17.

comment:3 Changed 4 years ago by garretwilson

...and before we get into the "but the browser adds this, so it must be a good thing" discussion: I don't care who adds it. I don't care why they add it. If there is &nbsp; in the output, it is incorrect, bad, not XHTML compliant, and unparseable as XML. End of story.

comment:4 Changed 4 years ago by garretwilson

Note that I am using the "entities : false" setting. (If entity configuration weren't broken as indicated in #8755, I could use "basicEntities : false" and then "entities_additional : 'lt,gt,amp,apos,quot'".)

I see that if I go to your XHTML compliance demonstration at http://nightly.ckeditor.com/7366/_samples/output_xhtml.html , which has no entity configuration whatsoever, and I enter the word "forró", the output gives me "forr&oacute;", which is not valid XHTML. So much for demonstrating XHTML compliance.

Let me be clear---such entities are valid XHTML if and only if you are using an XHTML DTD. But you can have XHTML without using an XHTML DTD. I want pure XML output, please.

comment:5 Changed 4 years ago by fredck

  • Status changed from new to confirmed
  • Summary changed from XHTML compliance completely broken by nbsp; output unparseable to Output XML predefined named entities only

Thanks for your input, but just to avoid misunderstandings, let's clarify a few things:

  • All your talk has nothing to do with XHTML. It's all about XML, period.
  • The concept of XHTML without DTD doesn't exist. Actually, XHTML is in fact XML with a DTD.
  • It's totally fine to have named entities inside XHTML. forr&oacute; is certainly compliant.
  • It's totally fine to have named entities inside XML. The actual problem is that they're not known by your parser/processor because you're defining them neither inline nor in a DTD.
  • CKEditor is XHTML compliant. Your application is not able/made to parse XHTML.

Now, to make it simpler, as there are too many side notes: there is no bug here. You're asking for a feature not available in CKEditor: the possibility to output only the XML predefined named entities, leaving other entities as numbers.
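
To make the requested behaviour concrete, here is a minimal sketch of such an encoding: the five predefined characters become named references and every other non-ASCII character becomes a decimal numeric reference. `encodeForXml` is a hypothetical stand-alone function written for illustration; it is not CKEditor's entities plugin and makes no claim about how the plugin works internally.

```javascript
// Named references that XML defines without any DTD.
const PREDEFINED = { '<': '&lt;', '>': '&gt;', '&': '&amp;', "'": '&apos;', '"': '&quot;' };

// Hypothetical encoder for text content: predefined characters are written
// as named references, all other non-ASCII characters as decimal numeric
// references, and plain ASCII is passed through unchanged.
function encodeForXml(text) {
  let out = '';
  for (const ch of text) {              // for...of iterates by code point
    const code = ch.codePointAt(0);
    if (PREDEFINED[ch]) {
      out += PREDEFINED[ch];
    } else if (code > 127) {
      out += '&#' + code + ';';
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(encodeForXml('forró\u00A0& co')); // forr&#243;&#160;&amp; co
```

Output of this form can be read by an XML parser that has no DTD available, since numeric references need no declaration.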

Changed 4 years ago by fredck

comment:6 Changed 4 years ago by fredck

  • Owner set to fredck
  • Status changed from confirmed to review

There is a minor issue in the entities plugin, which makes it impossible to have this feature with the current trunk.

With the provided patch, the following configuration combination can be used to solve the ticket requirement:

config.basicEntities = false;
config.entities_processNumerical = true;
config.entities_additional = 'lt,gt,amp,apos,quot';
config.entities_latin = false;
config.entities_greek = false;

I would avoid adding dedicated features for it to the plugin at this point and apply only the proposed fix.
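
As a sketch of how these settings might be applied to a single editor instance rather than globally, they can be collected into an object and passed to `CKEDITOR.replace`. The variable name `xmlSafeEntities` and the element id `editor1` are assumptions for illustration, and the result depends on the patch attached to this ticket being applied.

```javascript
// The settings listed above, collected into one configuration object.
const xmlSafeEntities = {
  basicEntities: false,
  entities_processNumerical: true,
  entities_additional: 'lt,gt,amp,apos,quot',
  entities_latin: false,
  entities_greek: false
};

// Assumed usage: a page that has loaded CKEditor and contains an element
// with the hypothetical id "editor1". The guard keeps the snippet harmless
// when CKEditor is not present.
if (typeof CKEDITOR !== 'undefined') {
  CKEDITOR.replace('editor1', xmlSafeEntities);
}
```

Passing the object per instance leaves other editors on the same page with their default entity handling.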

comment:7 Changed 4 years ago by garretwilson

Let me address each of your points, but in a different order:

  • You say, "The concept of XHTML without DTD doesn't exist. Actually, XHTML is in fact XML with a DTD." This is a false statement. In "HTML vs XHTML" at http://www.w3.org/TR/html5/introduction.html#html-vs-xhtml , the W3C explains that an HTML document is considered XHTML when it complies with XML syntax requirements and is transmitted using an XML content type. Nothing more. There is no requirement of a DTD. See also the W3C's statement on "Writing XHTML Documents" in the HTML5 spec at http://www.w3.org/TR/html5/the-xhtml-syntax.html#writing-xhtml-documents . They say that "XML documents may contain a DOCTYPE if desired, but this is not required to conform to this specification. This specification does not define a public or system identifier, nor provide a formal DTD." So not only is it possible to have an HTML5 XHTML document without a DTD, that's the only way to have an HTML5 XHTML document, because there is no XHTML5 DTD!
  • You say, "It's totally fine to have named entities inside XHTML. forr&oacute; is certainly compliant." See again the W3C's statement on "Writing XHTML Documents" at http://www.w3.org/TR/html5/the-xhtml-syntax.html#writing-xhtml-documents . They say "According to the XML specification, XML processors are not guaranteed to process the external DTD subset referenced in the DOCTYPE. This means, for example, that using entity references for characters in XHTML documents is unsafe if they are defined in an external file (except for &lt;, &gt;, &amp;, &quot; and &apos;)." So even if you use an external DTD, it's still unsafe to use those entities in XHTML.
  • You say, "It's totally fine to have named entities inside XML. The actual problem is that they're not known by your parser/processor because you're defining them neither inline nor in a DTD." Both statements are true: one can have named entities inside XML but you must define them inline or in a DTD. But as I point out, I am using XHTML without a DTD, and indeed there is no DTD for XHTML5, and even if there were, it would still be unsafe to use them unless I defined them inline. By forcing me to use these entities, you are forcing me to define them inline---but remember that CKEditor edits a fragment of a document, and it is erroneous to assume something about the context of the body of the document when you are editing a fragment. CKEditor puts out no DTD, inline or otherwise.
  • You say, "All your talk has nothing to do with XHTML. It's all about XML, period." Hopefully you now see that I am indeed talking about XHTML. Specifically I am talking about encoding HTML5 in XHTML, which is not only possible without a DTD, it is *only* possible without an external XHTML5 DTD, because such a DTD does not exist. Hopefully you also now see that even if an external DTD did exist for HTML5, it would still not be a good idea to include the entities, as explained above by the W3C itself.
  • You say, "CKEditor is XHTML compliant. Your application is not able/made to parse XHTML." I again defer to the W3C on "Writing XHTML Documents" at http://www.w3.org/TR/html5/the-xhtml-syntax.html#writing-xhtml-documents : "The syntax for using HTML with XML, whether in XHTML documents or embedded in other XML documents, is defined in the XML and Namespaces in XML specifications. This specification does not define any syntax-level requirements beyond those defined for XML proper." I am using a perfectly valid XML parser to parse my XHTML, and as the W3C points out, I should be able to parse my XHTML with only an XML parser.

comment:8 Changed 4 years ago by fredck

There was no mention of XHTML5 so far in this ticket.

HTML5 is far from being the reference for XHTML. If we want to talk about XHTML compliance, the following is the right source: http://www.w3.org/TR/xhtml1/

HTML5 is now, for simplicity, pushing a new definition for XHTML. But XHTML5 is not XHTML.

Anyway, this talk is not bringing any value to the ticket, so better to stop it now.

comment:9 Changed 4 years ago by garry.yao

  • Status changed from review to review_failed

Latin-1 entities and some other symbols are still not encoded as character entities.

http://ckeditor.t/tt/8774/1.html

Changed 4 years ago by garry.yao

comment:10 Changed 4 years ago by garry.yao

  • Owner changed from fredck to garry.yao
  • Status changed from review_failed to review

comment:11 Changed 4 years ago by fredck

  • Milestone set to CKEditor 3.6.3
  • Status changed from review to review_passed

comment:12 Changed 4 years ago by garry.yao

  • Resolution set to fixed
  • Status changed from review_passed to closed

Fixed with [7412].

comment:13 Changed 4 years ago by fredck

  • Resolution fixed deleted
  • Status changed from closed to reopened

The ticket test is not passing. Additionally, it doesn't seem to be masterised.

comment:14 Changed 4 years ago by fredck

  • Resolution set to fixed
  • Status changed from reopened to closed

My fault... everything is ok.

comment:15 Changed 4 years ago by garry.yao

Ticket test breaks in IE8, fixed with [7426].
