Forum Topic: regex - remove duplicate lines?


yhar

Posted at: 4/28/09 11:19 AM

yhar NEUTRAL LEVEL 03

Sign-Up: 04/02/08

Posts: 1,769

Hi,
Regex rapes my mind, so I'm struggling. I want to remove duplicate <br/>'s, I've got it working on single ones, but not consecutive.

Hello<br/>
Hello<br/><br/>
Hello<br/><br/><br/>
Hello<br/><br/><br/><br/>
Hello<br/><br/><br/><br/><br/>

The top two should work, because there is either one or two <br/>s, but the other three shouldn't. So, how can I make it say "If <br/> is repeated three or more times...". I'm using PHP to strip out the rest, I can manage that myself, it's just the actual regex that mindfucks.

thanks (L)

THIS IS CITRICSQUID POSTING


Jon-86

Posted at: 4/28/09 11:28 AM

Jon-86 NEUTRAL LEVEL 13

Sign-Up: 01/30/07

Posts: 3,929

You can solve this without using regex. Inside a loop, search your string for "<br/><br/><br/>"; if it's there, string-replace "<br/><br/><br/>" with "<br/><br/>" and loop again until every run is down to only 2.
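
A minimal sketch of the loop described above, assuming the tags are written exactly as `<br/>` with nothing between them. The function name `collapse_br_loop` is made up for illustration.

```php
<?php
// Hypothetical helper: collapses runs of three or more "<br/>" down to two
// using only plain string functions, no regex.
function collapse_br_loop($html)
{
    // Each pass shortens every run of 3+ tags; stop once no triple remains.
    while (strpos($html, '<br/><br/><br/>') !== false) {
        $html = str_replace('<br/><br/><br/>', '<br/><br/>', $html);
    }
    return $html;
}

echo collapse_br_loop('Hello<br/><br/><br/><br/><br/>'), "\n"; // Hello<br/><br/>
```

Lines that already have one or two tags never match the triple, so they pass through unchanged.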

PHP Main :: C++ Main :: Java Main :: irc.freenode.net

yhar

Posted at: 4/28/09 11:30 AM

At 4/28/09 11:28 AM, Jon-86 wrote: You can solve this without using regex. Inside a loop, search your string for "<br/><br/><br/>"; if it's there, string-replace "<br/><br/><br/>" with "<br/><br/>" and loop again until every run is down to only 2.

I considered that, and that is what I'll be doing if this doesn't work. However, it's better practice to use regex (I think?) and it requires less processing power, as opposed to looping through. It's potentially possible for there to be thousands of <br/>'s, so it'd be surely quicker to use regex to find the first 2, then replace the rest with nothing than to loop through every single one and replace.

Jon-86

Posted at: 4/28/09 11:38 AM

Regex is good for complicated patterns, as you know, but if there's an easier way like this I would normally go for it so the code is easier to follow. Unless efficiency is an issue like you said, but you could run tests on this to see how long it actually takes to loop through test data.
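
A rough way to run that kind of test, sketched with `microtime()`. The test data (a run of 1,000 tags) is an assumption, and the timings will differ between machines and PHP versions, so treat the numbers as a hint rather than a verdict.

```php
<?php
// Assumed test data: one line followed by 1,000 consecutive tags.
$data = 'Hello' . str_repeat('<br/>', 1000);

// Time the plain string-replace loop.
$start = microtime(true);
$looped = $data;
while (strpos($looped, '<br/><br/><br/>') !== false) {
    $looped = str_replace('<br/><br/><br/>', '<br/><br/>', $looped);
}
$loopSeconds = microtime(true) - $start;

// Time a single regex replace that trims runs of 3+ down to two.
$start = microtime(true);
$regexed = preg_replace('!(?:<br/>){3,}!', '<br/><br/>', $data);
$regexSeconds = microtime(true) - $start;

printf("loop:  %.6f s\nregex: %.6f s\n", $loopSeconds, $regexSeconds);
```

Both variables should end up holding the same string, so it is worth comparing `$looped` and `$regexed` before trusting either timing.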

Regex rapes my mind also, as you say :-/

yhar

Posted at: 4/28/09 11:40 AM

At 4/28/09 11:38 AM, Jon-86 wrote: Regex is good for complicated patterns, as you know, but if there's an easier way like this I would normally go for it so the code is easier to follow. Unless efficiency is an issue like you said, but you could run tests on this to see how long it actually takes to loop through test data.

It's not much of an issue, as such, but I'd like to minimise the amount of processing power used. My main concern is that it'd be much better to look at how it's done with regex so I can 'learn' and understand regex better; that way I'll not be stuck with workarounds forever. I've tried putting it together myself and I can't get it to work, which is why I'm here :D

But yeah, I'll go with a loop if I can't get regex that works.

Jon-86

Posted at: 4/28/09 11:44 AM

You're in the same boat as me, I've not had the time to learn regex properly. And that stops you from implementing mod_rewrite properly also. Blah, 'tis balls, so it is!

DFox

Posted at: 4/28/09 11:59 AM

DFox LIGHT LEVEL 30

Sign-Up: 08/09/03

Posts: 9,483

One of the most common things I've seen on the NG programming forum is the lack of use of Google when applying searches to more complex issues. I'm not picking on you. It's the common thing on this forum. Someone asks a basic question about PHP, it's automatically "OMG, use Google". If it's more complex, people usually don't even bother suggesting a search engine.

But, the truth of the matter is, for people like you, who already know PHP, Google is much more valuable than for people who don't know anything about the language. So what I'm saying is, even if your issue seems complex, try to search it, because the overwhelming odds are someone else has had that same issue.

Anyway, a simple search led me to this: http://stackoverflow.com/questions/133571/how-to-convert-multiple-br-tag-to-a-single-br-tag-in-php

There's probably 10 solutions on that page.


DFox

Posted at: 4/28/09 12:03 PM

At 4/28/09 11:38 AM, Jon-86 wrote: Unless efficiency is an issue

Efficiency should ALWAYS be an issue unless you're doing something solely for educational purposes.


yhar

Posted at: 4/28/09 12:05 PM

At 4/28/09 11:59 AM, DFox wrote: There's probably 10 solutions on that page.

I've been searching for over an hour. I don't like posting topics, because it's often that the solution is obvious, however I've searched for ages. I'm unable to find exactly what I need, therefore I'm posting. I didn't think to search for converting multiple <br/>'s, instead I was searching for "match repeated sequence" etc.

Thanks, though :)

Jon-86

Posted at: 4/28/09 12:12 PM

At 4/28/09 12:03 PM, DFox wrote: Efficiency should ALWAYS be an issue unless you're doing something solely for educational purposes.

I meant as in if the function needs to do a lot of processing. If the function is only ever gonna be passed small data sets then I would make the code clearer to read; if not, it would be heavily commented. This just makes maintaining it less tedious for people down the line.

Afro-Ninja

Posted at: 4/28/09 12:15 PM

Afro-Ninja EVIL LEVEL 38

Sign-Up: 03/02/02

Posts: 13,467

here's a simple one

$input='Hello<br/><br/><br/><br/>';
$input = preg_replace('!(<br/>){2,}!i', '<br/>', $input);
echo $input;

it doesn't take whitespace inside the tag into account though, like the ones dfox linked to

The exclamation marks delimit the whole pattern. You can use a number of different characters as the delimiter (anything that isn't alphanumeric, a backslash, or whitespace); most people use forward slashes. I didn't use them because the pattern itself contains a forward slash as part of the <br/>, meaning it would need to be escaped if I did.

The parentheses identify that we're looking to match <br/> as a whole. The {2,} means 'two or more times'

Lastly, the i at the end means case insensitive.
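
The pattern above collapses any run of two or more tags into one. The original question asked to leave one or two tags alone and only trim runs of three or more, so here is a sketch of that variant, assuming the same exact `<br/>` spelling. The function name `trim_br_runs` is made up for illustration.

```php
<?php
// Hypothetical helper: trims runs of three or more <br/> tags down to two.
function trim_br_runs($html)
{
    // (?:...) groups the tag without capturing it; {3,} means "three or more times".
    return preg_replace('!(?:<br/>){3,}!i', '<br/><br/>', $html);
}

$samples = array(
    'Hello<br/>',
    'Hello<br/><br/>',
    'Hello<br/><br/><br/><br/>',
);

foreach ($samples as $line) {
    echo trim_br_runs($line), "\n";
}
```

The first two samples come back unchanged, and the third is cut down to `Hello<br/><br/>`.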

DFox

Posted at: 4/28/09 12:15 PM

At 4/28/09 12:12 PM, Jon-86 wrote: I meant as in if the function needs to do a lot of processing. If the function is only ever gonna be passed small data sets then I would make the code clearer to read; if not, it would be heavily commented. This just makes maintaining it less tedious for people down the line.

Good point, but you just need to always remember PHP is interpreted to begin with, so making it do any significant extra work could really damage the efficiency.


Jon-86

Posted at: 4/28/09 12:20 PM

Yeah, for the most part I will implement some system, then analyse it for bottlenecks; those points are where I will spend time and energy optimising code. Unless it's a design flaw. It would always be nice to go back and do the full works, but deadlines don't permit that.

yhar

Posted at: 4/28/09 12:26 PM

At 4/28/09 12:15 PM, Afro-Ninja wrote: $input = preg_replace('!(<br/>){2,}!i', '<br/>', $input);

\s matches a blank space, so "<br />" should be "<br\s\/>", so I've changed it to:

$input = preg_replace('!(<br\s\/>){2,}!i', '<br />', $input);

with input containing:

hello<br />
<br />
<br />
<br />
<br />
<br />
hi<br />
<br />
ffffff
<br />

Then ran the code and no luck. Any ideas where I could be going wrong? I've read through the meanings of regex thingies, and \s is apparently a blank space, so my code should be working, but it doesn't. Any ideas what I'm doing wrong?

Afro-Ninja

Posted at: 4/28/09 12:43 PM

At 4/28/09 12:26 PM, yhar wrote:
At 4/28/09 12:15 PM, Afro-Ninja wrote: $input = preg_replace('!(<br/>){2,}!i', '<br/>', $input);
\s matches a blank space, so "<br />" should be "<br\s\/>", so I've changed it to:

$input = preg_replace('!(<br\s\/>){2,}!i', '<br />', $input);

try this one

$input = preg_replace('!(<br\s*/>\s*){2,}!i', '<br />', $input);

I changed the \s to \s*; the * means 'zero or more' times. Then I took out the following backslash, because you don't need to escape the forward slash with ! delimiters.

We need to take line breaks into account, so another \s* is added at the end (zero or more whitespace/linebreaks)

that *should* work, but I'm by no means a regex expert. I'm sure even this simple pattern could be made more efficient.
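
To check it, here is a quick run of that pattern against input modelled on the sample from the earlier post. This assumes each tag in the sample is followed by a single newline.

```php
<?php
// Input modelled on the earlier sample, with a newline after every tag.
$input = "hello<br />\n<br />\n<br />\n<br />\n<br />\n<br />\nhi<br />\n<br />\nffffff\n<br />\n";

// \s* inside the tag accepts both "<br/>" and "<br />"; the trailing \s*
// swallows the line breaks between tags so consecutive tags count as one run.
$output = preg_replace('!(<br\s*/>\s*){2,}!i', '<br />', $input);

echo $output;
// hello<br />hi<br />ffffff
// <br />
```

Note that the trailing \s* also eats the newline after each collapsed run, which is why "hi" and "ffffff" end up on the same line as the tag before them. The last `<br />` stands alone, so it is left as it was.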

All times are Eastern Standard Time (GMT -5)