PHP: replace_tags()

Pilot-Doofy
PHP: replace_tags() 2006-02-27 17:02:39

PHP: Main

In this tutorial you will learn how to write a function that works like strip_tags(), except that it doesn't delete unwanted HTML. Instead, it converts it to harmless HTML equivalents: &lt; for < and &gt; for >. The tags you choose to allow are then converted back.

Let's start with the function.

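function replace_tags($string) {
    // func_num_args() already counts $string, so the loop bound needs no + 1.
    $total_args = func_num_args();
    $args = func_get_args();
    $tags = array(); // stays empty if no tags were passed
    for ($i = 1; $i < $total_args; $i++) {
        $tags[] = $args[$i];
    }

    // Escape everything; the allowed tags are converted back below.
    $string = htmlentities($string);

    foreach ($tags as $tag) {
        // \b stops 'b' from also matching e.g. <body>; (.+?) is lazy so
        // several tag pairs in one string are matched separately.
        $regexp = '\&lt\;\s?' . $tag . '\b(.*?)\&gt\;(.+?)\&lt\;/\s?' . $tag . '\s?\&gt\;';
        $string = preg_replace('#' . $regexp . '#i', '<' . $tag . '\\1>\\2</' . $tag . '>', $string);
    }

    return $string;
}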

To use this function you could use something like this:

$string = replace_tags($tmp_str, 'a', 'u', 'b', 'i');

This allows the a (anchor), u (underline), b (bold), and i (italics) tags. The function adapts to however many arguments you pass to it; just make sure the first argument is the string you wish to modify.
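To see what a call like this returns, here is a minimal, self-contained sketch. Note that replace_tags_demo() is a hypothetical, shortened stand-in for the function in this post, not the original: it uses array_slice() instead of the for loop, a lazy (.+?), and a \b after the tag name. The sample string is made up.

```php
<?php
// Hypothetical compact stand-in for replace_tags(); same idea, shorter loop.
function replace_tags_demo($string) {
    $tags = array_slice(func_get_args(), 1); // every argument after $string
    $string = htmlentities($string);         // escape all HTML first
    foreach ($tags as $tag) {
        // Convert the escaped form of an allowed tag pair back into real HTML.
        $regexp = '&lt;\s?' . $tag . '\b(.*?)&gt;(.+?)&lt;/\s?' . $tag . '\s?&gt;';
        $string = preg_replace('#' . $regexp . '#i', '<' . $tag . '\\1>\\2</' . $tag . '>', $string);
    }
    return $string;
}

$tmp_str = '<b>bold</b> and <script>alert(1)</script>';
$result = replace_tags_demo($tmp_str, 'b', 'i');
echo $result, "\n";
// <b>bold</b> and &lt;script&gt;alert(1)&lt;/script&gt;
```

The b pair comes back as real HTML, while the script tag stays escaped and is shown as plain text in the browser.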

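Firstly, let's explain the func_num_args() and func_get_args() functions. The func_num_args() function returns the total number of arguments passed to the function it is called from, including $string. Since $string sits at index 0 of the argument list, the for loop starts at 1 rather than the standard 0 to skip over it. The loop has to stop at func_num_args(), not func_num_args() + 1: the last tag sits at index func_num_args() - 1, so adding 1 to the count makes the loop read one element past the end of the array.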

When you call the func_get_args() function inside of a function, it returns an array of all the arguments that were passed to that function. In this case, we copy every argument after the first into an array called $tags inside of the for loop.
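As a rough illustration of what these two functions return, here is a small sketch. show_args() is a hypothetical helper that exists only for this example.

```php
<?php
// Hypothetical helper, used only to show what the two functions return.
function show_args($string) {
    $count = func_num_args(); // number of arguments actually passed
    $args  = func_get_args(); // all of them, as a zero-indexed array

    $tags = array();
    for ($i = 1; $i < $count; $i++) { // index 0 is $string, so start at 1
        $tags[] = $args[$i];
    }
    return array($count, $tags);
}

list($count, $tags) = show_args('some text', 'a', 'u', 'b');
echo $count, "\n";               // 4
echo implode(', ', $tags), "\n"; // a, u, b
```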

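Once you've stored the tags in the array, the string is passed through htmlentities(). This function converts all potentially harmful HTML characters into harmless equivalents, such as &lt; for < and &gt; for > as explained above (double quotes become &quot; as well). After that, it's time to retrieve the tags one by one and run a regular expression on the string for each of them.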
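A quick sketch of what htmlentities() does to a piece of HTML, using a made-up input string:

```php
<?php
$raw = '<a href="x">link</a>';
$escaped = htmlentities($raw); // <, > and " are turned into entities
echo $escaped, "\n";
// &lt;a href=&quot;x&quot;&gt;link&lt;/a&gt;
```

Notice that the double quotes around the attribute value are escaped too, which matters once tags with attributes are converted back.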

After we've converted all the tags to &lt; and &gt;, it's time to reconvert those we want to show up. Let's look at the regular expression piece by piece.

\s matches a single whitespace character of any kind (space, tab, or newline). This covers people who like to write their HTML like this:

< tag >

That would be valid and would be accepted by this function. However, notice that the whitespace catch (\s) has a ? attached to the end of it. This means the whitespace isn't required, but one whitespace character will be accepted if it is there. You can view it as being optional.
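Here is a small sketch of how \s? behaves, using only the opening part of the pattern for the b tag. The test strings are made up, and the pattern is simplified for the example.

```php
<?php
$pattern = '#&lt;\s?b&gt;#i'; // opening part only, for the b tag

$none = preg_match($pattern, '&lt;b&gt;');   // no whitespace
$one  = preg_match($pattern, '&lt; b&gt;');  // one space
$two  = preg_match($pattern, '&lt;  b&gt;'); // two spaces

var_dump($none); // int(1)
var_dump($one);  // int(1)
var_dump($two);  // int(0)
```

Zero or one whitespace character is accepted; two in a row are not. If you want to allow any amount, \s* would be the quantifier to use.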

You'll also notice the (.*?). The .* matches any character 0 or more times, so this part may also be empty. The ? after the * does not mean "optional" here: it makes the match lazy, so it stops at the first &gt; it finds instead of the last one. The reason this group is there is for HTML tags that take attributes in addition to the tag name; for instance, the a tag has href, target, etc. The iframe tag is another example. However, you may want to strip the style attribute from the tag so people can't enter CSS in the HTML, but that's up to you.

The pattern also matches an optional whitespace character at the end of the closing tag.
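The difference a ? makes after * or + is easiest to see side by side. This is a minimal sketch with a made-up, already escaped string and simplified patterns:

```php
<?php
$escaped = '&lt;b&gt;one&lt;/b&gt; and &lt;b&gt;two&lt;/b&gt;';

// Greedy: (.+) grabs as much as possible, up to the LAST closing tag.
preg_match('#&lt;b&gt;(.+)&lt;/b&gt;#', $escaped, $greedy);

// Lazy: (.+?) grabs as little as possible, up to the FIRST closing tag.
preg_match('#&lt;b&gt;(.+?)&lt;/b&gt;#', $escaped, $lazy);

echo $greedy[1], "\n"; // one&lt;/b&gt; and &lt;b&gt;two
echo $lazy[1], "\n";   // one
```

With the greedy version, two tag pairs in the same string are treated as one big pair, and the tags in the middle stay escaped.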

That's it. After that, the string has been converted how you like it and HTML is no longer a threat to the layout of your page. If you want to include <img> tags, you may want to read up on my image resizing tutorial, which also provides a comprehensive function that is open for modification.

Enjoy!

Khao
Response to PHP: replace_tags() 2006-02-27 17:10:33

Again you made a great tutorial :D It's really helpful!

liljim
Response to PHP: replace_tags() 2006-02-27 18:06:11

Why are you escaping ampersands and semi-colons in the expression? If you want to get this to work correctly with tags that have quotes in them, you'll have to replace &quot; with ". Having said that, you have to be very careful with stuff like this... Take for example, the following input:

[a href="http://www.google.com" onclick="for(i=0;i<100000000000;i++){alert('blah');}"]whatever[/a]

Harmless, but irritating nonetheless.
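One possible way to deal with both points for the a tag is to convert back only an href and drop every other attribute. The following is just a sketch under a few assumptions: restore_links() is a hypothetical helper, it only accepts http and https URLs wrapped in double quotes, and it rejects URLs containing an ampersand.

```php
<?php
// Hypothetical stricter variant for the a tag only.
// Expects a string that has already been through htmlentities().
function restore_links($escaped) {
    // Group 1: the URL between the &quot; pair. Group 2: the link text.
    // Anything between the href and &gt; (onclick, style, ...) is discarded.
    $pattern = '#&lt;a\s+href=&quot;(https?://[^&\s]+)&quot;.*?&gt;(.+?)&lt;/a&gt;#i';
    return preg_replace($pattern, '<a href="$1">$2</a>', $escaped);
}

$input = '<a href="http://www.google.com" onclick="alert(1)">whatever</a>';
$output = restore_links(htmlentities($input));
echo $output, "\n";
// <a href="http://www.google.com">whatever</a>
```

The replacement writes real double quotes around the URL, and the onclick never makes it into the output. Links that don't fit the pattern simply stay escaped.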

Pilot-Doofy
Response to PHP: replace_tags() 2006-02-27 18:08:03

Oh crap, I forgot I escaped those. I was writing it while working on English homework and got tired of looking up Greek writers. :-P Anyway, I didn't feel like safeguarding the code against every possibility, as you explained.

I told the user they may want to check the other input contained in (.*?) for malicious code. As you further explained and gave an example of, it can be irritating sometimes.

Claxor
Response to PHP: replace_tags() 2006-02-28 14:01:31

One thing to note is that some hosts don't support the preg functions, as they don't have the PCRE library installed, so you have to use ereg there instead. And as ereg doesn't support lazy quantifiers, that could be a bit of a problem :/

PCRE library
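If regular expressions aren't an option at all, a rough fallback for tags without attributes could use plain string replacement. This is only a sketch: replace_simple_tags() is a hypothetical helper, it takes the tags as an array, and unlike the regular expression it also converts back tags that are never closed.

```php
<?php
// Hypothetical fallback for attribute-less tags; uses no regular expressions.
function replace_simple_tags($string, $tags) {
    $string = htmlentities($string); // escape everything first
    foreach ($tags as $tag) {
        // Convert the escaped opening and closing tag back, ignoring case.
        $string = str_ireplace('&lt;' . $tag . '&gt;', '<' . $tag . '>', $string);
        $string = str_ireplace('&lt;/' . $tag . '&gt;', '</' . $tag . '>', $string);
    }
    return $string;
}

$out = replace_simple_tags('<b>hi</b> <i>x</i> <u>y</u>', array('b', 'i'));
echo $out, "\n";
// <b>hi</b> <i>x</i> &lt;u&gt;y&lt;/u&gt;
```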

