Copyright © 2005, 2006 Interchange Development Group
This documentation is free; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
It is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Abstract
The purpose of this document is to describe the search "subsystem" in Interchange and link together all search-related topics.
Table of Contents
ac — mv_all_chars
bd — mv_base_directory
bs — mv_begin_string
ck — mv_cache_key
cs — mv_case
op — mv_column_op
co — mv_coordinate
cv — mv_verbatim_columns
de — mv_dict_end
df — mv_dict_fold
di — mv_dict_limit
dl — mv_dict_look
DL — mv_raw_dict_look
do — mv_dict_order
dr — mv_record_delim
em — mv_exact_match
er — mv_spelling_errors
fc — mv_force_coordinate
ff — mv_field_file
fi — mv_search_file
ft — mv_field_title
fm — mv_first_match
fn — mv_field_names
hs — mv_head_skip
id — mv_index_delim
lb — mv_search_label
lf — mv_like_field
lo — mv_list_only
lr — mv_search_line_return
ls — mv_like_spec
ma — mv_more_alpha
mc — mv_more_alpha_chars
md — mv_more_decade
mi — mv_more_id
ml — mv_matchlimit
mm — mv_max_matches
MM — mv_more_matches
mp — mv_profile
ms — mv_min_string
ne — mv_negate
ng — mv_negate
nh — mv_no_hide
nm — mv_no_more
np — mv_nextpage
ns — mv_next_search
nu — mv_numeric
os — mv_orsearch
pm — mv_more_permanent
ra — mv_return_all
rd — mv_return_delim
re — mv_search_reference
rf — mv_return_fields
rg — mv_range_alpha
rl — mv_range_look
rm — mv_range_min
rn — mv_return_file_name
rr — mv_return_reference
rs — mv_return_spec
rx — mv_range_max
sd — mv_small_data
se — mv_searchspec
sf — mv_search_field
sg — mv_search_group
si — mv_search_immediate
sm — mv_start_match
sp — mv_search_page
sq — mv_sql_query
sr — mv_search_relate
st — mv_searchtype
su — mv_substring_match
tf — mv_sort_field
to — mv_sort_option
un — mv_unique
va — mv_value

The Swish search module allows you to search index files generated by Swish-e.
To enable any Swish searching, modify your interchange.cfg to add:
Require module Vend::Swish
AddDirective Swish hash
Variable swish Vend::Swish
To configure your catalog to use Swish, modify the appropriate catalog.cfg and add:
Swish command /usr/bin/swish-e
Swish index products/swish-e.db
Finally, in search parameters, use mv_searchtype=swish or
the shorthand notation st=swish.
The fields to be returned from Swish to Interchange are configurable, and default to:
mv_return_fields=code score title url mod_date filesize
mv_field_names=code score title url mod_date filesize
These correspond to:
Interchange field    Swish property
code                 swishreccount
score                swishrank
url                  swishdocpath
title                swishtitle
filesize             swishdocsize
mod_date             swishlastmodified
The date in the mod_date field is returned in the
format %Y-%m-%d %H:%M:%S.
You can change that with the date_format option:
Swish date_format "%d %b %Y"
See the time glossary entry for supported format strings.
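The format strings above look like strftime-style patterns. As a rough illustration only (this uses Python's datetime rather than Interchange, and assumes both share the common strftime codes), the default format and the date_format value from the example render a hypothetical timestamp like this:

```python
from datetime import datetime

# Hypothetical modification time, chosen only for illustration.
sample = datetime(2006, 1, 15, 13, 45, 0)

DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"  # default mod_date format
CUSTOM_FORMAT = "%d %b %Y"            # value from the date_format example

default_text = sample.strftime(DEFAULT_FORMAT)
custom_text = sample.strftime(CUSTOM_FORMAT)

print(default_text)
print(custom_text)
```

The first line printed is 2006-01-15 13:45:00; the second uses the locale's abbreviated month name, so its exact text depends on the locale in effect.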
Simple search for the term Swish:
swish-e -w Swish
The same search, specifying the index file:
swish-e -w Swish -f db/xmldocs
You can include properties in the output:
swish-e -w Swish -f db/xmldocs -p purpose
Or search within a property:
swish-e -w purpose=LWP -f db/xmldocs
Indexing web sites is pretty easy. Swish provides a spider script, which is
simply called with the parameters default starting_URL. Create a configuration
file similar to the following:
IndexFile db/icdevgroup
IndexDir /usr/local/lib/swish-e/spider.pl
SwishProgParameters default http://www.icdevgroup.org/docs/
Now you can start indexing with swish-e -S prog -c icdevgroup.conf.
(directory_name,
default ProductDir)
base directory in which to look up text files to search
(related option fi).
Directory paths can be absolute, provided that the pathname equals the
value of the MV_SEARCH_FILE variable, or that a scratch variable named
after the pathname is set to 1.
To enable searching in, say, /etc/dict, use either
[calcn]$Variable->{MV_SEARCH_FILE} =
'/etc/dict'; return[/calcn] or
[tmp /etc/dict]1[/tmp].
(1/0, default false)
the search string matches only at the beginning of a column.
(search_reference_pointer,
default none)
not intended for common use. When the more tag is used,
this option automatically provides a pointer to the search
reference.
(rm | eq |
tq | aq,
default rm)
operation to perform to check field for a match.
For tq and aq matching
using Text::Query module, see
Q: .
(0/1,
default 0)
the so-called "coordinated" search allows for multiple search options to be stacked on top of each other.
If the number of search fields (sf options) equals the
number of search specs (se options), the search will
return items that match all or one of the field-specification blocks
(controlled with mv_orsearch).
When the two numbers do not match, coordinated mode will be automatically
and silently turned off. To force a coordinated search, see
mv_force_coordinate.
When coordinated searching is used, case sensitivity, substring matching, negation and other options can be specified multiple times and work on a field-by-field basis, according to the following rules:
If only one instance of the option is set, it affects all fields (search specifications).
If the number of instances of the option is greater than or equal to the number of search specifications, each instance is used independently. (Any trailing, excess instances are ignored.)
If more than one instance of the option is set, but fewer than the total number of search specifications, the documented default setting is used for the trailing search specifications.
If a search specification is blank, it will be removed and all
case-sensitivity, negation, substring and other options will be
adjusted accordingly. If you need to match on a blank string,
use quotes ("").
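The three rules for repeated options can be sketched in a few lines. This is an illustrative model, not Interchange's own code: the function name expand_option and its arguments are hypothetical, and the handling of an option that is not set at all (every specification gets the default) is an assumption.

```python
def expand_option(values, spec_count, default):
    """Spread the instances of one option across all search specifications.

    values     -- list of instances of the option, in the order given
    spec_count -- number of search specifications (se options)
    default    -- documented default for the option
    """
    if not values:
        # Assumption: an unset option means the default everywhere.
        return [default] * spec_count
    if len(values) == 1:
        # A single instance affects every search specification.
        return values * spec_count
    if len(values) >= spec_count:
        # Enough instances: use them one-to-one, ignore the excess.
        return values[:spec_count]
    # Too few instances: trailing specifications get the default.
    return values + [default] * (spec_count - len(values))


# Three search specifications, case option given twice.
print(expand_option(["yes", "yes"], 3, "no"))
```

With two instances for three specifications, the third specification falls back to the default value passed in.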
(/,
default )
Make dictionary matching case-insensitive. Ignored unless
mv_dict_look is set.
(/,
default )
Make dictionary matching follow dictionary order, where only word
characters and whitespace matter.
Ignored unless mv_dict_look is set.
(record_delimiter,
default \n)
delimiter for counting records in search index files. The default, a newline, works well for most line-based index files.
(0/1,
default 0)
require that the search field matches the search specification exactly
(as opposed to the default word-based matching, or substring matching
with su). The search specification will behave as if it
were enclosed in quotes.
(0/1,
default 0)
force coordinated search (enabled with mv_coordinate).
Normally, coordinated mode is automatically turned off when the number of search specifications does not match the number of search fields. With this option, however, instead of disabling coordinated mode, Interchange ensures the number of search specifications does match the number of fields by filling the missing specifications with the last one specified, or by discarding extras.
This option is useful when you want to search for one string in multiple fields with different options.
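The balancing step described above can be modelled as follows. This is a minimal sketch under stated assumptions: force_coordinate is a hypothetical name, and returning an empty list when no specification is given at all is an assumption, not documented behaviour.

```python
def force_coordinate(fields, specs):
    """Make the number of search specifications equal the number of fields."""
    if not specs:
        # Assumption: with no specification there is nothing to repeat.
        return []
    if len(specs) < len(fields):
        # Fill the missing specifications with the last one specified.
        return specs + [specs[-1]] * (len(fields) - len(specs))
    # Discard extras beyond the number of fields.
    return specs[:len(fields)]


# One search string, applied to three fields.
print(force_coordinate(["title", "artist", "description"], ["Monet"]))
```

This mirrors the "one string in multiple fields" use case: the single specification is repeated once per field, and each field can then carry its own options.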
(header_filename,
default none)
specify filename containing a single line with the list of database fields, separated by TABs. This is used when you are searching databases without the "field header" on the first line, but you would still want to refer to fields by their names.
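To illustrate what such a field file provides, the sketch below pairs a TAB-separated line of field names with a headerless data record. The function names and the sample data are hypothetical; this is not how Interchange itself reads the file.

```python
def read_field_names(header_line):
    """Split the single TAB-separated line of a field file into names."""
    return header_line.rstrip("\r\n").split("\t")


def name_fields(field_names, record, delimiter="\t"):
    """Label the values of one headerless record with the field names."""
    return dict(zip(field_names, record.split(delimiter)))


names = read_field_names("code\ttitle\tartist\n")
row = name_fields(names, "19-202\tWater Lilies\tMonet")
print(row["artist"])
```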
(search_result_number,
default 1)
return search results from the specified result number onwards. When this option is set, Interchange will return search results starting from the match number specified even if there is only one page of results. If set to a value greater than the total number of matches, it will act as if no matches were found.
(row_count,
default 1 for text files, 0
otherwise)
number of lines to skip at the beginning of a search index or text
file. Interchange normally skips one line for text-based searches
(st=text) to exclude the header line.
(field_delimiter,
default \t)
delimiter for counting fields in search index files. The default, a TAB character, works well for most line-based index files.
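The record delimiter, field delimiter and header-skip options work together when a line-based file is read. The sketch below models that with the documented defaults for text files (newline, TAB, one skipped line); parse_index is a hypothetical helper and the sample data is made up.

```python
def parse_index(text, record_delim="\n", field_delim="\t", head_skip=1):
    """Split text into records and fields, skipping leading header records."""
    # Drop empty records, such as the one after a trailing delimiter.
    records = [r for r in text.split(record_delim) if r]
    return [r.split(field_delim) for r in records[head_skip:]]


sample = "code\ttitle\n19-202\tWater Lilies\n34-101\tSunflowers\n"
rows = parse_index(sample)
print(len(rows))
```

Passing head_skip=0 would treat the header line as data, which is what the 0 default for non-text searches implies.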
(field_name,
default none)
perform search similar to SQL "LIKE" functionality.
When defined, mv_like_spec is required as well.
(search_specification,
default none)
string to search for in mv_like_field.
The behaviour of the % character and case-sensitivity
depends upon your SQL implementation.
(record_count,
default 50)
maximum number of records (search results) to return from a search.
When all the results are
displayed on a single page, this option is equivalent to
mm. When the more tag is used
to display results across multiple pages, this option
determines the number of results per page.
To specify unlimited, use none or
all, not 0.
(record_count,
default unlimited)
final, maximum number of records (search results) to return from a search
(related option ml).
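The relationship between ml and mm can be pictured with a small model: mm caps the total number of results, and ml splits what remains into pages. The sketch is an assumption-laden illustration, not Interchange code; paginate is a hypothetical name, and None stands in for the none, all and unlimited settings.

```python
def paginate(results, matchlimit=50, max_matches=None):
    """Cap results at max_matches, then split them into pages of matchlimit."""
    if max_matches is not None:
        results = results[:max_matches]
    if matchlimit is None:
        # Unlimited page size: everything lands on a single page.
        return [results] if results else []
    return [results[i:i + matchlimit]
            for i in range(0, len(results), matchlimit)]


matches = list(range(120))
pages = paginate(matches, matchlimit=50, max_matches=100)
print([len(p) for p in pages])
```

Here 120 raw matches are cut to 100 by the overall cap and then shown as two pages of 50.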
(min_length,
default 1 for text-based searches)
minimum size of a search string for a search operation.
(1/0, default 0)
search operator will perform numeric (instead of string) comparison.
(1/0, default 0)
the one and only match from the search will be the value of the
mv_searchspec itself. Useful in testing, or as a yes/no
confirmation of whether the search string was found.
(SQL_Query,
default none)
for text-based searches (st=text only), this option
specifies the SQL query to run over the lines in the file.
This is not the same as an external SQL database search.
Furthermore, the SQL_Query undergoes a
little modification before it is used. Here's a practical
example:
Artist: <input name="artist" />
Title: <input name="title" />
<input type="hidden" name="mv_sql_query" value="
SELECT code FROM products
WHERE artist LIKE artist
AND title LIKE title
" />
In each part of the expression, if the right-hand side is an alphanumeric, unquoted word, it is replaced with the value of the form variable of that name. (In a one-click search, scratch variables are used instead.) Quoted right-hand side values are taken literally.
On the left-hand side the behavior is reversed: a quoted word is replaced with the value of the form variable of that name (or the scratch variable, in a one-click search), while unquoted left-hand side values are taken literally.
Here's an example that allows users to select whether they want to search in title or artist fields:
Search for: <input name="searchstring" /><br />
Search in <input type="radio" name="column" value="title" /> title
<input type="radio" name="column" value="artist" /> artist
<input type=hidden name="mv_sql_query" value="
SELECT code FROM products
WHERE 'column' LIKE searchstring
" />
For reference, here's what the two examples above would look like when coded "manually":
[page search="
co=yes
sf=artist
op=rm
se=[value artist]
sf=title
op=rm
se=[value title]
"]
Search for [value artist], [value title]
</a>

[page search="
co=yes
sf=[value column]
op=rm
se=[value searchstring]
"]
Search for [value searchstring] in [value column]
</a>
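The substitution rules for mv_sql_query can be modelled with a short sketch. It is deliberately simplified and is not Interchange's parser: it only handles terms of the form "lhs LIKE rhs", the function name substitute is hypothetical, and a plain dictionary stands in for the form (or scratch) variables.

```python
import re

# Matches one "lhs LIKE rhs" term; either side may be quoted or a bare word.
TERM = re.compile(r"(?P<lhs>'\w+'|\w+)\s+LIKE\s+(?P<rhs>'[^']*'|\w+)")


def substitute(where_clause, form):
    """Apply the left-hand and right-hand substitution rules to each term."""
    def replace_term(match):
        lhs, rhs = match.group("lhs"), match.group("rhs")
        if lhs.startswith("'"):
            # Quoted left-hand side: use the form value as the column name.
            lhs = form[lhs.strip("'")]
        if not rhs.startswith("'"):
            # Unquoted right-hand side: use the form value as the search string.
            rhs = "'" + form[rhs] + "'"
        return lhs + " LIKE " + rhs
    return TERM.sub(replace_term, where_clause)


first = substitute("artist LIKE artist AND title LIKE title",
                   {"artist": "Monet", "title": "Lilies"})
second = substitute("'column' LIKE searchstring",
                    {"column": "title", "searchstring": "water"})
print(first)
print(second)
```

The two calls correspond to the two forms shown earlier: the first fills in both search strings, the second lets the submitted column value pick the field to search.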
( [ glimpse | db |
sql | text |
ref ],
default none)
select search type. glimpse uses the Glimpse search
engine (see Glimpse), db (or the
equivalent sql) iterates over every row in the
SQL database, text searches the
corresponding database text source files, and
ref iterates over the results from a
previous, already-performed search (related option lb).
(0/1,
default 0)
match on substrings as well as whole words. This is typically set in dictionary-based searches.
(field_name_or_index [,field_name2_or_index2...],
default none)
determine sort order of the returned data. Columns can be referred
to either by name (if the search is such that column
names are known) or by index, starting from 0.
(0/1,
default 0)
removes duplicate records from the result
set. Duplicates are determined by comparing the value
of the first
search return field (set with rf).
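As an illustration of the comparison described above, the sketch below drops rows whose first return field has already been seen. Keeping the first occurrence and preserving order are assumptions, and unique_rows is a hypothetical helper rather than part of Interchange.

```python
def unique_rows(rows):
    """Remove rows whose first field duplicates that of an earlier row."""
    seen = set()
    result = []
    for row in rows:
        key = row[0]  # first search return field
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [["19-202", "Water Lilies"],
        ["34-101", "Sunflowers"],
        ["19-202", "Water Lilies (print)"]]
deduped = unique_rows(rows)
print(len(deduped))
```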
(value_variable_name=value,
default none)
assign value to a value variable. This
is exactly what happens with normal variables in search profiles
when you use the variable_name=value syntax,
so you should use this option only where variables cannot be set
directly (i.e. in one-click searches):
[page
href=scan
arg="se=Renaissance
se=Impressionists
va=category_name=Renaissance and Impressionist Paintings
os=yes"
]Renaissance and Impressionist Paintings</a>