#8784 closed New Feature (fixed)

Allow for external mechanisms to work on the dataprocessor

Reported by:	Sa'ar Zac Elias	Owned by:
Priority:	Normal	Milestone:
Component:	General	Version:
Keywords:		Cc:

Description

We should allow other mechanisms to interact with the data processing (both to and from HTML), so that one could, for example, process some HTML using regex.
The main idea is to use an event, which would also bring the benefit of prioritizing the order of filters.
This would help with Drupal integration, for example. Most Drupal modules use regex to filter HTML, and this approach would make it easier to integrate them with our editor.

Change History (5)

comment:1 Changed 13 years ago by Sa'ar Zac Elias

Note: this request comes from the unnatural way such changes are currently handled:

		var proto = CKEDITOR.htmlDataProcessor.prototype;
		proto.toHtml = CKEDITOR.tools.override( proto.toHtml, function( org )
		{
			return function( data )
			{
				// changes here
				return org.apply( this, arguments );
			};
		});
		proto.toDataFormat = CKEDITOR.tools.override( proto.toDataFormat, function( org )
		{
			return function( data )
			{
				data = org.apply( this, arguments );
				// changes here
				return data;
			};
		});

Last edited 13 years ago by Sa'ar Zac Elias
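The override pattern also explains the wish for priorities: when two plugins wrap the same method, the last wrapper installed runs its pre-processing first, so the order of filters follows script load order and cannot be adjusted. A minimal runnable sketch of that effect, where `override` is a stand-in assumed to behave like `CKEDITOR.tools.override()` (it passes the original function to the builder and returns the result) and `Processor` is a hypothetical substitute for `CKEDITOR.htmlDataProcessor`:

```javascript
// Stand-in assumed to behave like CKEDITOR.tools.override():
// the builder receives the original function and returns its replacement.
function override( originalFunction, functionBuilder )
{
	return functionBuilder( originalFunction );
}

// Hypothetical minimal processor standing in for htmlDataProcessor.
function Processor() {}
Processor.prototype.toHtml = function( data )
{
	return data;
};

var log = [];
var proto = Processor.prototype;

// "Plugin A" loads first, then "plugin B"; each wraps toHtml in turn.
[ 'A', 'B' ].forEach( function( name )
{
	proto.toHtml = override( proto.toHtml, function( org )
	{
		return function( data )
		{
			log.push( name );                      // record when this wrapper runs
			return org.call( this, data + name );  // pre-process, then delegate
		};
	});
});

console.log( new Processor().toHtml( '' ) ); // → BA
console.log( log );                          // → [ 'B', 'A' ]
```

Plugin B, loaded last, touches the data first; reversing that requires changing load order, whereas an event with a priority argument lets each listener state where it belongs.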

comment:2 Changed 13 years ago by Sa'ar Zac Elias

Another example, this time from phpBB:

editor.on( 'toHtml', function( ev )
{
	ev.data = ev.data.replace( /\[youtube\]([^\[]+)\[\/youtube\]/gi, function( match, url )
	{
		return '<div contenteditable="false" class="youtube">' + url + '</div>';
	});
});
editor.on( 'toDataFormat', function( ev )
{
	ev.data = ev.data.replace( /<div\s+(?:[^>]+\s+)?class=\"youtube\"(?:\s+[^>]+)?>\s*([^<]+)\s*<\/div>/gi, function( match, url )
	{
		return '[youtube]' + url + '[/youtube]';
	});
});

..instead of..

var proto = CKEDITOR.htmlDataProcessor.prototype;
proto.toHtml = CKEDITOR.tools.override( proto.toHtml, function( org )
{
	return function( data )
	{
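		// Pass the modified string on explicitly: reassigning `data` only
		// updates `arguments[ 0 ]` in non-strict code.
		var args = Array.prototype.slice.call( arguments );
		args[ 0 ] = data.replace( /\[youtube\]([^\[]+)\[\/youtube\]/gi, function( match, url )
		{
			return '<div contenteditable="false" class="youtube">' + url + '</div>';
		});
		return org.apply( this, args );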
	};
});
proto.toDataFormat = CKEDITOR.tools.override( proto.toDataFormat, function( org )
{
	return function( data )
	{
		data = org.apply( this, arguments );
		return data.replace( /<div\s+(?:[^>]+\s+)?class=\"youtube\"(?:\s+[^>]+)?>\s*([^<]+)\s*<\/div>/gi, function( match, url )
		{
			return '[youtube]' + url + '[/youtube]';
		});
	};
});

(Used with input such as:)

[youtube]http://www.youtube.com/watch?v=J---aiyznGQ[/youtube]

Last edited 13 years ago by Sa'ar Zac Elias

comment:3 Changed 13 years ago by Jakub Ś

Status:	new → confirmed

comment:4 Changed 13 years ago by Wiktor Walc

Mixing BBCode-like tags and HTML is a pretty common situation. Since many "big names" are doing it, others follow the same convention.

Drupal is probably the most interesting case for explaining why we should make the htmlDataProcessor more flexible and user friendly.

Drupal filtering

Drupal provides a filtering system that can translate, e.g., BBCodes into HTML on the server side. This way, instead of allowing users to insert HTML elements such as object, embed and param, or an iframe, which poses a security risk, a developer can easily create a handy filter that changes [video:url] into the right HTML code (as the video_filter module does), while keeping the set of allowed HTML tags to a minimum.

Writing a Drupal module requires PHP and Drupal knowledge. Most such developers know PHP well, and since browsing the DOM on the server side does not work, they are more likely to use regular expressions for everything.

Unfortunately, things get a bit harder when a PHP developer who writes a server-side module also wants to add a dedicated plugin for a WYSIWYG editor. A sample scenario:

* Custom syntax in the source.
* A "fake" element in WYSIWYG mode to represent the BBCode (e.g. for tags like [random_number]).
* The conversion between WYSIWYG <-> Source is done with regular expressions, because this is the only available and reliable way on the server side, so developers want to use exactly the same code in JavaScript too.
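The scenario can be sketched with one table of regex rules that both directions read from, so the same patterns could be mirrored from the server-side PHP filter. This is a sketch under assumptions: the `rules` structure, the `convert` helper and the `<span class="random-number" contenteditable="false">` fake-element markup are all hypothetical, not an existing CKEditor or Drupal API.

```javascript
// Hypothetical rule table shared by both directions; each rule pairs a
// pattern with a replacement, as a server-side PHP filter might.
var rules = [
	{
		// Source -> WYSIWYG: wrap the tag in a non-editable "fake" element.
		toHtml: {
			pattern: /\[random_number\]/g,
			replacement: '<span class="random-number" contenteditable="false">[random_number]</span>'
		},
		// WYSIWYG -> source: unwrap the fake element again.
		toDataFormat: {
			pattern: /<span class="random-number" contenteditable="false">\[random_number\]<\/span>/g,
			replacement: '[random_number]'
		}
	}
];

// Applies every rule for the given direction ('toHtml' or 'toDataFormat') in order.
function convert( data, direction )
{
	return rules.reduce( function( result, rule )
	{
		var step = rule[ direction ];
		return result.replace( step.pattern, step.replacement );
	}, data );
}

var source = 'Your lucky number is [random_number].';
var html = convert( source, 'toHtml' );
console.log( html );
// → Your lucky number is <span class="random-number" contenteditable="false">[random_number]</span>.
console.log( convert( html, 'toDataFormat' ) === source ); // → true
```

The exact-match unwrap pattern assumes the attributes come back in the same order; a more tolerant pattern, like the youtube one from comment:2, would be safer if the editor may reorder or add attributes.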

Unescaped HTML

Another reason why parsing code with regular expressions sometimes just fits better is the following code, which Drupal renders perfectly fine with the GeSHi Filter module:

Sample HTML <strong>text</strong> and a PHP code below:

<php>
$prefix = "<b>sample text</b><div>";
</php>
...and a closing div below:
<php>
$suffix = "</div>";
</php>
... and a header:
<php>
$header = <<<HEREDOC
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-gb" xml:lang="en-gb">
<head>
HEREDOC;
</php>

(Note that the HTML tags inside the <php> blocks are not escaped.)

Custom syntax in different text nodes, with HTML inside

Another vote for making the htmlDataProcessor less scary is the comment Quicksketch added after trying to add CKEditor support to his module (http://drupal.org/node/1286004#comment-5016902). The module basically operates on code similar to:

[caption align="right"]
<img src="example.png" alt="" width="100" height="100" />The Caption of the Image
[/caption]

using regular expressions.
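A regular-expression conversion for this caption syntax might look like the sketch below. It is not Quicksketch's code: the `captionToHtml` name, the accepted `align` values and the `<div class="caption caption-right">` output markup are hypothetical, and only the source-to-HTML direction is shown.

```javascript
// Hypothetical conversion of [caption align="..."]<img ... />Text[/caption]
// into a wrapper element. [\s\S] lets the caption text span several lines.
function captionToHtml( data )
{
	return data.replace(
		/\[caption(?:\s+align="(left|right|center)")?\]\s*(<img\b[^>]*>)\s*([\s\S]*?)\s*\[\/caption\]/gi,
		function( match, align, img, caption )
		{
			// align is undefined when the attribute is missing.
			var className = 'caption' + ( align ? ' caption-' + align.toLowerCase() : '' );
			return '<div class="' + className + '">' + img +
				'<span class="caption-text">' + caption + '</span></div>';
		});
}

var source = '[caption align="right"]\n' +
	'<img src="example.png" alt="" width="100" height="100" />The Caption of the Image\n' +
	'[/caption]';

console.log( captionToHtml( source ) );
// → <div class="caption caption-right"><img src="example.png" alt="" width="100" height="100" /><span class="caption-text">The Caption of the Image</span></div>
```

The lazy `([\s\S]*?)` stops at the first `[/caption]`, so nested captions are not supported; that limitation is typical of the regex approach discussed here.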

Basically, people have doubts about whether overriding core methods is the right way to implement such things. Providing an official mechanism and documenting it properly could help developers deal with such tasks and dispel the feeling that this is a hack.

To show that this is not a Drupal-specific issue, here are some samples of mixed syntax from other applications.

Mixed syntax in WordPress

Sample code inserted in WordPress, which basically operates on HTML:

[gallery order="DESC" columns="5" orderby="title"]

Sample BBCodes inserted by third-party plugins in WordPress:

[nggallery id=1]
[jwplayer mediaid="16"]

I'm pasting them just to show that this is a pretty common convention.

In phpBB, users can define BBCodes that are later replaced with HTML (http://www.phpbb.com/kb/article/adding-custom-bbcodes-in-phpbb3/), so the ability to define an external data processor that handles just a piece of the data can be considered a common requirement.
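A phpBB-style definition (a usage template plus an HTML template) can be compiled into a regex-based converter, which is roughly the kind of "external data processor for a piece of data" described above. This is a sketch, not phpBB's implementation: `compileBBCode` and `escapeRegExp` are hypothetical helpers, and the `{TEXT}`, `{NUMBER}` and `{URL}` patterns are simplified assumptions loosely modeled on phpBB's tokens (phpBB validates its tokens more strictly).

```javascript
// Simplified token patterns (assumptions, looser than phpBB's own checks).
var TOKEN_PATTERNS = {
	TEXT: '([\\s\\S]*?)',
	NUMBER: '(\\d+)',
	URL: '([^\\s\\[\\]"<>]+)'
};

// Escapes characters that have a special meaning in regular expressions.
function escapeRegExp( text )
{
	return text.replace( /[.*+?^${}()|[\]\\]/g, '\\$&' );
}

// Turns a usage template such as "[youtube]{URL}[/youtube]" and an HTML
// template into a function that converts matching BBCodes in a string.
function compileBBCode( usage, html )
{
	var tokens = [];
	// Escape the literal parts, then turn each escaped {TOKEN} into a capture group.
	var source = escapeRegExp( usage ).replace( /\\\{([A-Z]+)\\\}/g, function( match, name )
	{
		if ( !TOKEN_PATTERNS.hasOwnProperty( name ) )
			throw new Error( 'Unknown token: ' + name );
		tokens.push( name );
		return TOKEN_PATTERNS[ name ];
	});
	var pattern = new RegExp( source, 'gi' );

	return function( data )
	{
		return data.replace( pattern, function()
		{
			var groups = arguments;
			// Fill each {TOKEN} in the HTML template with its captured value.
			return html.replace( /\{([A-Z]+)\}/g, function( match, name )
			{
				var index = tokens.indexOf( name );
				return index === -1 ? match : groups[ index + 1 ];
			});
		});
	};
}

var youtube = compileBBCode( '[youtube]{URL}[/youtube]', '<div class="youtube">{URL}</div>' );
console.log( youtube( 'Watch [youtube]http://www.youtube.com/watch?v=J---aiyznGQ[/youtube] now' ) );
// → Watch <div class="youtube">http://www.youtube.com/watch?v=J---aiyznGQ</div> now
```

Only the source-to-HTML direction is generated here; the reverse direction would need its own pattern built from the HTML template.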

comment:5 Changed 12 years ago by Piotrek Koszuliński

Resolution:	→ fixed
Status:	confirmed → closed

This issue is fixed (currently only on the major branch, but 4.1 will be released soon) by the changes introduced in #9829.

There are a couple of new events on htmlDataProcessor: #toHtml and #toDataFormat. Depending on a listener's priority, it is now possible to process data at different stages. Quote from the #toHtml doc:

* 1-4: Data are available in the original string format.
* 5: Data are initially filtered with regexp patterns and parsed to
	{@link CKEDITOR.htmlParser.fragment} or {@link CKEDITOR.htmlParser.element}.
* 5-9: Data are available in parsed format, but {@link CKEDITOR.htmlDataProcessor#dataFilter}
	isn't applied yet.
* 10: Data are filtered with {@link CKEDITOR.htmlDataProcessor#dataFilter}.
* 10-14: Data are available in parsed format and {@link CKEDITOR.htmlDataProcessor#dataFilter}
	has already been applied.
* 15: Data are written back to an HTML string.
* 15-*: Data are available as an HTML string.
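To see how these stages line up, the phpBB listeners from comment:2 can be rewritten in the event form: the string lives in `evt.data.dataValue`, and the fifth argument of `editor.on( name, listener, scope, listenerData, priority )` is the priority. Both details are my reading of the CKEditor 4 API and should be checked against the 4.1 docs. To keep the sketch runnable outside a browser, `createStubEditor` is a hypothetical stand-in that only mimics priority ordering; it does not include the core listeners at 5, 10 and 15, and it assumes `toDataFormat` writes its string back at priority 15 as `toHtml` does.

```javascript
// Hypothetical stand-in for CKEDITOR.editor, for illustration only:
// listeners run in ascending priority order (ties keep registration order),
// and fire() returns the possibly modified event data.
function createStubEditor()
{
	var listeners = {}, counter = 0;
	var editor = {
		// listenerData is accepted only to match the argument position.
		on: function( name, fn, scope, listenerData, priority )
		{
			( listeners[ name ] = listeners[ name ] || [] ).push( {
				fn: fn,
				scope: scope,
				priority: priority === undefined ? 10 : priority,
				order: counter++
			} );
		},
		fire: function( name, data )
		{
			var evt = { name: name, editor: editor, data: data };
			( listeners[ name ] || [] )
				.slice()
				.sort( function( a, b )
				{
					return ( a.priority - b.priority ) || ( a.order - b.order );
				})
				.forEach( function( listener )
				{
					listener.fn.call( listener.scope || editor, evt );
				});
			return evt.data;
		}
	};
	return editor;
}

var editor = createStubEditor();

// Priority 4: the data is still the original string (parsing happens at 5).
editor.on( 'toHtml', function( evt )
{
	evt.data.dataValue = evt.data.dataValue.replace( /\[youtube\]([^\[]+)\[\/youtube\]/gi, function( match, url )
	{
		return '<div contenteditable="false" class="youtube">' + url + '</div>';
	});
}, null, null, 4 );

// Priority 20: after the HTML string has been written back at 15.
editor.on( 'toDataFormat', function( evt )
{
	evt.data.dataValue = evt.data.dataValue.replace( /<div\s+(?:[^>]+\s+)?class="youtube"(?:\s+[^>]+)?>\s*([^<]+)\s*<\/div>/gi, function( match, url )
	{
		return '[youtube]' + url + '[/youtube]';
	});
}, null, null, 20 );

var html = editor.fire( 'toHtml', { dataValue: '[youtube]http://www.youtube.com/watch?v=J---aiyznGQ[/youtube]' } ).dataValue;
console.log( html );
// → <div contenteditable="false" class="youtube">http://www.youtube.com/watch?v=J---aiyznGQ</div>
console.log( editor.fire( 'toDataFormat', { dataValue: html } ).dataValue );
// → [youtube]http://www.youtube.com/watch?v=J---aiyznGQ[/youtube]
```

Compared with the override version, neither listener depends on load order: each states the stage it needs through its priority.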

Also, there's one more interesting change: CKEDITOR.htmlParser.filter no longer has to be applied while serializing the pseudo DOM to a string. It can now also be applied separately with fragment/element#filter( filter ). This makes it possible to apply several separate filters to one fragment or, e.g., to use the new fragment#forEach method after the htmlDataProcessor's filter has been applied to the data being processed.

I think that these new features completely solve this issue.
