[ic] weighted search result sorting

Thu Jan 2 18:07:01 2003

> Paul Jordan [paul@gishnetwork.com] wrote:
> > 
> > A pseudo idea would be like:
> > 
> > sku		keywords
> > sku123	ocean_7, island_9, sky_5, trees_4, water_5
> > sku124	sky_10, clouds_10, blue_3, day_2
> > 
> > I have no idea if this is possible but in the above it is 
> assumed that
> > with substring matching turned on, 'sky' will still be a 
> HIT for both,
> > then maybe create some custom tf=? or method of sorting based on the
> > numeric TOTALs of the corresponding _'n' with regards to the words
> > matched by the users search spec.
> > 
> > So now with the above, a search for 'sky' will still return 
> both, but
> > the first one visible will be sku124 (because sky=10) and 
> for the other
> > (sky=5)
> > 
> > But if someone searched for 'sky ocean' then both would still be
> > returned but sku123 will be first because (sky+ocean=12) 
> and the other
> > sku is (sky=10)
> > 
> > I still want to return both, because a Graphic artists can 
> just take the
> > sky from one and the ocean from another, so both are 
> relevant. I know I
> > know this is starting to sound terribly inefficient :) but 
> the but any
> > normal tf=?,?,? will simply not work well at all for us.
> > 

[ part of Kevin's solution cut]

> Query:
> 
>     SELECT  sku, SUM(weight) AS skuweight
>     FROM    keywords
>     WHERE   keyword IN ('sky','ocean')
>     GROUP BY sku
>     ORDER BY skuweight DESC
> 
> Results:
> 
>     sku     skuweight
>     ------- ---------
>     sku123  12
>     sku124  10
> 
> Or that's the theory, at least. :-)
> 
> You'd pass the keywords into a page which would split them into
> individual lower-case words for use in the WHERE clause.  If you
> wanted to allow substring matches, you could have multiple "LIKE"
> lines instead of one large "IN" line:
> 
> User types "sk oce" and the following is generated:
> 
>     SELECT  sku, SUM(weight) AS skuweight
>     FROM    keywords
>     WHERE   keyword LIKE '%sk%'
>     OR      keyword LIKE '%oce%'
>     GROUP BY sku
>     ORDER BY skuweight DESC
> 
> Results:
> 
>     sku     skuweight
>     ------- ---------
>     sku123  12
>     sku124  10
> 
> Once you have your SQL query, you'd execute it with the [query] tag
> and display the results in any way you see fit.

Not sure how one would implement this, but a site I've worked on
had this problem and we came up with a solution of doing multiple
passes, starting with Kevin's first query which finds exact matches.
Then if that pass doesn't produce <the-maximum-results-to-display>
(however you choose that), you do the second of Kevin's queries.
Obviously you rank the exact matches from the first pass above those
in the second pass.

If you just wanted to do it the simplest way, you could always do
both queries for every search and just display it as:

Exact matches on your search:

	<the results from query 1>

Items with similar keywords:

	<the results from query 2>

A slightly-less-simple way might be to do it as a [query] block and
do the second query within the [no_match] block of the first
query.  Then you would run the LIKE query whenever you got nothing
at all from the exact match query.  Not sure how well Interchange
handles queries within queries, though and I have a vague
recollection that it didn't like nesting them at all.

Patrick Bennett
http://www.ccgenesis.com