[ic] Prevent search from matching on html
Daniel Davenport
DDavenport at newagedigital.com
Thu Oct 26 20:01:34 EDT 2006
> -----Original Message-----
> From: interchange-users-bounces at icdevgroup.org
> [mailto:interchange-users-bounces at icdevgroup.org] On Behalf Of Kevin Walsh
> Sent: 2006 October 26 -- Thursday 6:30 PM
> To: interchange-users at icdevgroup.org
> Subject: Re: [ic] Prevent search from matching on html
>
> Josh Lavin <josh at myprivacy.ca> wrote:
> > I am finding that when we use HTML in our product descriptions, the
> > search results will include products where an HTML tag matched the
> > search query.
> >
> > Simple example: if my description contains "<h2>Features</h2>" and
> > someone searches for 'h2', then that product will be returned in the
> > results.
> >
> > I would like to avoid this, and figured I needed a custom SearchOp,
> > but I'm having no luck with this one:
> >
> > CodeDef not_tags SearchOp
> > CodeDef not_tags Routine <<EOR
> > sub {
> >     my ($self, $i, $pat) = @_;
> >
> >     return sub {
> >         my $string = shift;
> >         $string =~ s:<[/\w].*?\s?/?>::gi;
> >         return $string;
> >     };
> > }
> > EOR
> >
> > The idea is to remove any HTML tags before searching. Any ideas?
> >
> You are always returning a true value. A SearchOp's coderef needs
> to return true if a match is found or false if no match is found.
>
> Try something like this instead:
>
> CodeDef not_tags SearchOp
> CodeDef not_tags Routine <<EOR
> sub {
>     my ($self, $i, $pat) = @_;
>     $pat = qr/$pat/i;
>
>     return sub {
>         my $string = shift;
>
>         $string =~ s:<[/\w].+?>::gi;
>         return $string =~ $pat;
>     };
> }
> EOR

And make sure you don't have any <text with angles around it, like this
for instance> in the text being searched, because that regexp will strip
it out before matching, so it will be ignored in your search... as will
just about any other regexp that simply matches whatever's between < and >.

If the field is known to be HTML, that shouldn't be a problem -- literal
angle brackets should be written as &lt; and &gt; anyway -- but if you're
letting other people edit descriptions, it's something to watch out for.
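
To make the caveat concrete, here is a rough standalone sketch that runs outside Interchange. It reuses the substitution from the quoted not_tags routine; strip_tags, make_matcher and the sample descriptions are made up for illustration and are not part of Interchange.

```perl
use strict;
use warnings;

# Same substitution as in the quoted not_tags routine.
sub strip_tags {
    my ($string) = @_;
    $string =~ s:<[/\w].+?>::gi;
    return $string;
}

# Hypothetical stand-in for the SearchOp: returns a coderef that
# strips tags and then reports 1 (match) or 0 (no match).
sub make_matcher {
    my ($pat) = @_;
    my $re = qr/$pat/i;
    return sub {
        my $string = strip_tags(shift);
        return $string =~ $re ? 1 : 0;
    };
}

# Made-up sample descriptions.
my $html    = '<h2>Features</h2> Sturdy steel frame';
my $raw     = 'Fits models <widget 2000 and newer> only';
my $escaped = 'Fits models &lt;widget 2000 and newer&gt; only';

# The tag no longer matches a search for 'h2'.
print make_matcher('h2')->($html), "\n";        # prints 0

# Unescaped angle-bracket text is stripped too, so 'widget' is lost.
print make_matcher('widget')->($raw), "\n";     # prints 0

# With entities the text survives and is still searchable.
print make_matcher('widget')->($escaped), "\n"; # prints 1
```

The second case is the one to watch for: nothing in the pattern can tell a real tag from ordinary text that happens to sit between angle brackets.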