Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#8704 closed Bug (expired)

HTML Parsing Problem

Reported by: joydipraha Owned by:
Priority: Normal Milestone:
Component: Core : Output Data Version:
Keywords: Cc: tuonela76@…

Description

Hi there,

I have been using CKEditor 3 on one of my sites. When I put some spaces into the CKEditor content, the output shows unwanted characters instead of those spaces. Can you please help me with a solution to this problem? You can see the text that was entered through CKEditor from the back end at the following URL:

http://www.flightoffers.org.uk. See the end of the page. That description text was entered through CKEditor and is not parsed 100% correctly, as it shows some unknown characters.

Thanks & Regards

Joydip Raha +91 9477039314

Change History (21)

comment:1 Changed 7 years ago by Jakub Ś

Keywords: HTMLParse removed
Status: new → pending

I see the result, but I'm not sure how you achieved it. Could you give me more details here?

when i put some space

How do you put this space into CKEditor: by pressing the space bar, or by pasting white-space characters from an external editor?

Such characters usually occur when something is wrong with the encoding, e.g. the web page uses one encoding and the database another.
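As a rough illustration of the kind of mismatch described above (a sketch only, not taken from the reporter's setup): a non-breaking space stored as a single Latin-1 byte is not valid UTF-8, so decoding it as UTF-8 yields U+FFFD, the replacement character that browsers usually draw as a diamond with a question mark. The function name `decodeAsUtf8` is hypothetical; `TextDecoder` is the standard Encoding API found in modern browsers and Node.js.

```javascript
// Decode raw bytes as UTF-8, the way a page declared as charset=utf-8 would.
function decodeAsUtf8(bytes) {
  return new TextDecoder('utf-8').decode(new Uint8Array(bytes));
}

// "<p>", the byte 0xA0, "</p>": 0xA0 is a non-breaking space in Latin-1,
// but a lone 0xA0 is not a valid UTF-8 sequence.
const latin1Bytes = [0x3c, 0x70, 0x3e, 0xa0, 0x3c, 0x2f, 0x70, 0x3e];
const fromLatin1 = decodeAsUtf8(latin1Bytes);

// The same markup with the non-breaking space encoded as UTF-8 (0xC2 0xA0).
const utf8Bytes = [0x3c, 0x70, 0x3e, 0xc2, 0xa0, 0x3c, 0x2f, 0x70, 0x3e];
const fromUtf8 = decodeAsUtf8(utf8Bytes);

console.log(fromLatin1 === '<p>\uFFFD</p>'); // invalid byte became U+FFFD
console.log(fromUtf8 === '<p>\u00A0</p>');   // valid sequence decodes cleanly
```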

comment:2 Changed 7 years ago by Amy McCrobie

I have found that I have a similar problem. The user types a space, or sometimes a new line, and in the database the text is saved as <p> </p> (the space between the paragraph tags seems to be a "tab" space). CKEditor does not seem to know how to parse the empty space, so it is replaced with an unknown character (the diamond with the question mark) which, when saved, is replaced in the database with %uFFFD. Then, if the user displays the text again, %uFFFD is displayed where the unknown character was previously.

comment:3 Changed 7 years ago by Jakub Ś

The user types a space, or sometimes a new line, and in the database the text is saved as <p> </p>

First you make space or new line in CKEditor and then you save it in DB, is that correct?

CKEditor does not seem to know how to parse the empty space, so it is replaced with an unknown character

When does it happen: when you load data from the DB into CKEditor?

If the answer to the above questions is yes, then the problem here is with saving your data in the DB rather than with CKEditor. Most likely this is an encoding problem. Could you please check which encoding your DB uses, and how you read request parameters in your webapp and write your results?

comment:4 Changed 7 years ago by Amy McCrobie

First you make space or new line in CKEditor and then you save it in DB, is that correct?

That is correct. And, in the database the space or new line is saved as <p> </p>, but there appears to be a new line and a tab between the tags in the database. On some occasions, there is simply a space.

CKEditor does not seem to know how to parse the empty space, so it is replaced with an unknown character

When does it happen: when you load data from the DB into CKEditor?

Yes. The special character is a black diamond with a question mark.

Could you please check which encoding your DB uses, and how you read request parameters in your webapp and write your results?

I use latin1_swedish_ci on the DB.
To load the saved text into the editor, I simply get the data from the database in a mysql query:

$comments = $Rjobstatus['comments'];

Then I echo the variable:

<textarea><?php echo $comments;?></textarea>

To store the text in the DB I use a JavaScript function which makes an ajax POST request to a PHP script. I get the text:

var editor5 = escape(CKEDITOR.instances.editor5.getData());

Then, in the php script:

$comments = mysql_real_escape_string($_POST['editor5']);
$comments = preg_replace("/[\n\r]/","",$comments);
$comments = str_replace("<p>\t</p>",'',$comments);
$comments = str_replace("<p><br /></p>",'',$comments);
$comments = str_replace("<p>&nbsp;</p>",'',$comments);
$comments = str_replace("<ul>\t",'',$comments);
$comments = str_replace("<ol>\t",'',$comments);

And, from there I update the db with the new data.


Yesterday I made some changes to the config.js file:

config.enterMode = CKEDITOR.ENTER_BR;
config.autoParagraph = false;

Now, new lines are saved in the database as <br />
I have not yet had the chance to test the app since these changes.

comment:5 Changed 7 years ago by Jakub Ś

This is 99% an encoding issue.

I use latin1_swedish_ci on the DB.

And what encoding do you use on your page? E.g. in CKEditor's samples you will see UTF-8: <meta content="text/html; charset=utf-8" http-equiv="content-type" />

And, in the database the space or new line is saved as <p> </p>, but there appears to be a new line and a tab between the tags in the database. On some occasions, there is simply a space

From the above I'm guessing that you may be using the same encoding on the page and in the application but a different one in the DB, which results in strange white-space characters.

You can test this in a very simple way:

  • send contents to server and echo it right after reading request parameters
  • do your changes and echo contents right before sending it to DB
  • load your data into CKEditor and echo it right after getting it from DB.

This should give you an idea of where the encoding gets mixed up. Please check it and send a comment.
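To make the comparison between those three checkpoints easier, a small helper can list every character outside printable ASCII together with its position and code point, so that the same text can be checked at each stage. This is only a sketch: the name `findUnusualChars` and the "printable ASCII" cutoff are assumptions for illustration, not part of CKEditor.

```javascript
// Hypothetical helper: report characters outside printable ASCII (0x20-0x7E)
// with their position in the string and their Unicode code point.
function findUnusualChars(text) {
  const found = [];
  let index = 0;
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (code < 0x20 || code > 0x7e) {
      const hex = code.toString(16).toUpperCase().padStart(4, '0');
      found.push({ index: index, codePoint: 'U+' + hex });
    }
    index += ch.length; // advance by UTF-16 length so astral characters are handled
  }
  return found;
}

// Example: a paragraph containing a newline, a tab and a non-breaking space.
console.log(findUnusualChars('<p>\n\t\u00A0</p>'));
```

Running it on the editor output in the browser and on the stored value should show at which step a character such as U+00A0 turns into U+FFFD or disappears.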

comment:6 Changed 7 years ago by Amy McCrobie

You can test this in a very simple way:

  • send contents to server and echo it right after reading request parameters
  • do your changes and echo contents right before sending it to DB
  • load your data into CKEditor and echo it right after getting it from DB.

This should give you an idea of where the encoding gets mixed up. Please check it and send a comment.

Sorry for the late reply. I was out with pneumonia! I have not had a chance to try your suggestion, but I did change the encoding the database uses to utf8. I will see how that works for me... It should produce the same results - basically if I don't have the problem anymore, then encoding was the issue.

comment:7 Changed 7 years ago by Jakub Ś

Resolution: expired
Status: pending → closed

comment:8 Changed 7 years ago by Amy McCrobie

I noticed the status of the ticket was changed to closed and realized I needed to update the ticket with my findings...

Changing the encoding on the database did not resolve the issue. The database, its tables and fields, and all pages on my site are now set to encode with UTF-8.

On one occasion, the database field contains the text "<p>6/20<br />%uFFFD</p>".

On another occasion, the database field contains the text "<p>%uFFFD</p>". In this example, the database field actually contains "<p>" and then a newline, and then a tab, and then "%uFFFD</p>".

The user states that this seems to happen when he presses the "Enter" key on the keyboard to create a blank line in the text. I have also verified the user is not copying text from Microsoft Word and pasting it into the editor.


comment:9 Changed 7 years ago by Amy McCrobie

Cc: tuonela76@… added

comment:10 in reply to:  7 Changed 7 years ago by Amy McCrobie

Replying to j.swiderski: I apologize for taking so long to respond. Is it possible to reopen this ticket?

comment:11 Changed 7 years ago by Jakub Ś

Let's keep it closed, as this is IMO 100% your mistake and not a CKEditor problem.

Have you tried testing this problem as I described before, to find out where exactly it happens? Please do that, as it will help you find the area where the mistake is:

  • send contents to server and echo it right after reading request parameters
  • do your changes and echo contents right before sending it to DB
  • load your data into CKEditor and echo it right after getting it from DB.

You either still have different encodings, as described here http://cksource.com/forums/viewtopic.php?t=15640, or you are encoding but not decoding your request parameters.

Please test this and give me a comment on your results.

comment:12 Changed 7 years ago by Amy McCrobie

Okay, that sounds good to me.

By the way, I have checked and re-checked the encoding on the database and the encoding in the meta tag of the HTML pages, and I am positive it is correct. However, I agree that there may be something in my jQuery/JS code that is causing the issue, or perhaps one of the PHP functions I am using to clean the data.

Thank you for your continued support! I will let you know what I find.

comment:13 Changed 7 years ago by Jakub Ś

Please do, as it may help other users using CKEditor in their webapps.

comment:14 Changed 7 years ago by Amy McCrobie

I haven't been able to reproduce the issue yet, but I thought I would post an update explaining the PHP and JavaScript/jQuery script I use to process the data from the editor.

When the user clicks the submit button a JavaScript function is called. In the JavaScript function I store the input from the editor in a variable like so:

var editor5 = escape(CKEDITOR.instances.editor5.getData());

The variable is sent via ajax POST to a PHP script. In the PHP script, I use the following code to clean the input from the editor:

$comments = mysql_real_escape_string($_POST['editor5']);
$comments = preg_replace("/[\n\r]/","",$comments);
$comments = str_replace("<p>\t</p>",'',$comments);
$comments = str_replace("<p><br /></p>",'',$comments);
$comments = str_replace("<p>&nbsp;</p>",'',$comments);
$comments = str_replace("<ul>\t",'',$comments);
$comments = str_replace("<ol>\t",'',$comments);

Originally, I only had this line in the PHP script:

$comments = mysql_real_escape_string($_POST['editor5']);

But when users started having the issue described in this ticket, I started adding the str_replace lines. I have changed the str_replace lines several times trying to solve the issue and, at one point, I removed the str_replace lines altogether. All of the changes I made to the PHP script happened before I changed the database tables and fields to use UTF-8 encoding.

Now that I have had a chance to look at the script again, I highly suspect that one or more of the following could be causing the problem:

  • the JavaScript escape() function
  • the PHP mysql_real_escape_string() function
  • the PHP preg_replace() function
  • the PHP str_replace() function

Again, I am still trying to reproduce the issue. Meanwhile, do you have any thoughts on the code in my script and its effect on the input from the editor?


comment:15 Changed 7 years ago by Amy McCrobie

Another piece of the puzzle...

The editor is used in a customized project management application for the construction company for which I work. When a job is newly assigned to a project manager, the script which saves this assignment to the database places some "template" text into the database field that corresponds to the editor. The code looks like this:

$today = date("n/j",time());
$mycomments = '<p>'.$today.'</p>';
$mycomments .= '<ul><li>&nbsp;</li>';
$mycomments .= '<li>&nbsp;</li>';
$mycomments .= '<li>&nbsp;</li>';
$mycomments .= '<li>&nbsp;</li></ul>';
$mycomments .= '<u>Order:</u><br /><br />';
$mycomments .= '<u>Freight:</u><br /><br />';
$mycomments .= '<u>Rental:</u><br /><br />';
$mycomments .= '<u>Manpower:</u><br /><br />';

Is there anything here that could cause the issue described in this ticket?

comment:16 Changed 7 years ago by Jakub Ś

  1. You shouldn't use escape() but encodeURIComponent():

http://stackoverflow.com/questions/75980/best-practice-escape-or-encodeuri-encodeuricomponent

  2. I'm not sure whether mysql_real_escape_string should be used: http://www.php.net/manual/en/function.mysql-real-escape-string.php. Please run some tests: watch the data in the request and see if this function breaks the "encoding" after you have applied the solution from point 1.
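The difference between the two functions can be seen with a short sketch. For code points below 256, `escape()` emits a single Latin-1 style byte; for anything higher it emits the non-standard `%uXXXX` form, which a server that only decodes standard `%XX` sequences would leave as literal text. That would be consistent with the `%uFFFD` seen in the database earlier in this ticket, although the thread does not confirm it as the only cause. `encodeURIComponent()` always percent-encodes the UTF-8 bytes. The variable names are illustrative only.

```javascript
const nbsp = '\u00A0';        // non-breaking space, common in editor output
const replacement = '\uFFFD'; // the "diamond with a question mark" character

// escape(): one Latin-1 byte below U+0100, %uXXXX above it.
const escapedNbsp = escape(nbsp);
const escapedReplacement = escape(replacement);

// encodeURIComponent(): always the UTF-8 byte sequence.
const encodedNbsp = encodeURIComponent(nbsp);
const encodedReplacement = encodeURIComponent(replacement);

console.log(escapedNbsp);        // %A0
console.log(escapedReplacement); // %uFFFD
console.log(encodedNbsp);        // %C2%A0
console.log(encodedReplacement); // %EF%BF%BD
```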

comment:17 Changed 7 years ago by Amy McCrobie

I now have this code in config.js:

config.enterMode = CKEDITOR.ENTER_BR;
config.autoParagraph = false;

I have this code in the php file, which is used to clean the input before inserting into the database:

$comments = mysql_real_escape_string($_POST['editor5']);
$comments = preg_replace("/[\n\r]/","",$comments);
$comments = str_replace("<p>\t</p>",'',$comments);
$comments = str_replace("<p><br /></p>",'',$comments);
$comments = str_replace("<p>&nbsp;</p>",'',$comments);
$comments = str_replace("<ul>\t",'',$comments);
$comments = str_replace("<ol>\t",'',$comments);

And, in my ajax function I have:

var editor5 = '';
if(CKEDITOR.instances.editor5 !== undefined)
{
	var editor5 = encodeURIComponent(CKEDITOR.instances.editor5.getData());
}

The difference now is the lines I added to config.js. I have had no other reports from users about special characters and Unicode characters showing up in the text from the editor.

EDIT: I changed the code in my ajax function above to use the correct JavaScript function to encode the text from the editor to properly prepare the text to be included in the query string which is sent to the PHP script. Thank you j.swiderski for pointing out the mistake so I could make corrections! For anyone who finds themselves having similar issues, do not use the escape() function, instead use the encodeURIComponent() function. Here is the link j.swiderski posted: http://stackoverflow.com/questions/75980/best-practice-escape-or-encodeuri-encodeuricomponent.


comment:18 Changed 7 years ago by Jakub Ś

But why are you still using the JavaScript escape() function?

You shouldn't use escape() but encodeURIComponent().
As described here: http://stackoverflow.com/questions/75980/best-practice-escape-or-encodeuri-encodeuricomponent

comment:19 Changed 7 years ago by Amy McCrobie

Ah! I had originally changed from escape() to encodeURIComponent() but, during testing, I switched back to using escape(). I only meant for the revert to be temporary. Thank you for pointing it out again! My ajax function looks like this:

var editor5 = '';
if(CKEDITOR.instances.editor5 !== undefined)
{
	var editor5 = encodeURIComponent(CKEDITOR.instances.editor5.getData());
}

I have also fixed my earlier post to reflect the correct code.

comment:20 Changed 7 years ago by Amy McCrobie

Maybe you could help me with this issue?

Basically, as a user enters data into the form which also displays the CKEditor instance, a JavaScript 'autosave' function runs. The function gets the form input values and sends them to my PHP script (autosaveedit.php) via an ajax POST. I am encoding the value from the editor with encodeURIComponent() in the JavaScript function. For debugging, I am echoing the POST value from the editor at the end of my autosaveedit.php script (echo $_POST['editor5'];). Below is a screenshot of the result in Firebug:

http://eintranet.r717.net/test/editor_1.png

This is an accurate representation of what the data looks like when saved to the database (line breaks displayed after the <br /> tag, extra white space, etc.). The field in the database is data type mediumtext. I would like for the data to be stored without the extra white space and line breaks. Something more like this:

7/30<br /><br />Mobilize 5/29<br />At temp 8/24<br /><p>&nbsp;</p><ul><li>Heater drip pans</li><li>Sub electrician - through july</li><li>Insulation on-site 8/7</li><li>Primer and Paint 7/27</li><li>Heater detail</li><li>Pipe stand caps caulk</li></ul><br /><u>Order:</u><br /><br /><u>Freight:</u><br /><br />5/29 - Mobalize truck - Tools, Pipe stands, prefab, hanging steel<br />5/30 - Evaps to deliver<br />7/6 - condenser platform, pipe bridge, BR Prefab, Upper Catwalk, Receiver, Big-bore, Additional Tools, Disp Tank, Vacuum Pump &amp; Oil, Compressor Oil, Ranger #2?<br />7/9 - condenser deliver<br />7/16 - Mech house ( Need weight and dims)<br /><br /><u>Rental:</u><br /><ul><li>5/29 (2) 40&#39; Connex, 32&#39; scissor, (1) lull w/48&#39; reach</li></ul><u>Manpower:</u><br /><br />

Is there a CKEditor configuration that can be used to accomplish this? Or, is there a PHP function that I am not aware of that would produce these results?

Edit: I just found the solution. I added this code to the bottom of the page which displays the CKEditor instance:

CKEDITOR.on('instanceReady', function(ev)
{
	ev.editor.dataProcessor.writer.setRules( 'br',
	{
		indent : false,
		breakBeforeOpen : false,
		breakAfterOpen : false,
		breakBeforeClose : false,
		breakAfterClose : false
	});
	ev.editor.dataProcessor.writer.setRules( 'p',
	{
		indent : false,
		breakBeforeOpen : false,
		breakAfterOpen : false,
		breakBeforeClose : false,
		breakAfterClose : false
	});
	ev.editor.dataProcessor.writer.setRules( 'ul',
	{
		indent : false,
		breakBeforeOpen : false,
		breakAfterOpen : false,
		breakBeforeClose : false,
		breakAfterClose : false
	});
	ev.editor.dataProcessor.writer.setRules( 'li',
	{
		indent : false,
		breakBeforeOpen : false,
		breakAfterOpen : false,
		breakBeforeClose : false,
		breakAfterClose : false
	});
});

comment:21 Changed 7 years ago by Jakub Ś

Well yes, that is one way of doing it, but please note that when you switch to source mode you will get one-line code which is not very easy to read.

A much better approach is to use regular expressions (in JS or PHP) that remove white space and unwanted <br /> tags before sending or saving data to the DB.
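A minimal sketch of that approach in JavaScript is below. It assumes the only unwanted white space is the line breaks and tab indentation added for source formatting, and that the content has no <pre> blocks whose line breaks matter. The name `compactEditorHtml` is hypothetical. Which <br /> tags count as unwanted depends on the application, so none are removed here.

```javascript
// Hypothetical helper: strip the formatting white space (line breaks and
// tab indentation) from editor output before it is sent to the server.
function compactEditorHtml(html) {
  return html
    .replace(/[\r\n\t]+/g, '') // drop line breaks and tabs
    .trim();                   // drop leading/trailing spaces
}

const formatted = '<ul>\n\t<li>\n\t\tHeater detail</li>\n</ul>\n7/30<br />\n<br />\n';
console.log(compactEditorHtml(formatted));
```

Cleaning the string at save time leaves the editor's source view formatted and readable, while the stored value stays on one line.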
